
Fast reading of structured text data with MATLAB

In the following I present a way to read data from a text file that has the number of x-values (columns) in the first line, the number of y-values (rows) in the second line, and in all following lines the data matrix itself: y lines with x values each. The two header lines may carry a leading `#`, which the code below ignores.
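To make the layout concrete, here is a small sketch that writes such a file. The file name `example.txt`, the matrix values and the `# ` prefix on the header lines are assumptions chosen for illustration only.

```matlab
% Hypothetical example: a 2-by-3 matrix (y = 2 rows, x = 3 columns)
A = [1 2 3; 4 5 6];
[y, x] = size(A);

fid = fopen('example.txt', 'w');        % 'example.txt' is a placeholder name
fprintf(fid, '# %d\n', x);              % first line: number of x-values (columns)
fprintf(fid, '# %d\n', y);              % second line: number of y-values (rows)
% one matrix row per text line; fprintf consumes its data argument
% column by column, so pass the transpose to emit A row by row
fprintf(fid, [repmat('%g ', 1, x - 1) '%g\n'], A.');
fclose(fid);
```

The resulting file consists of the four lines `# 3`, `# 2`, `1 2 3` and `4 5 6`.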

The basic reading of the data is done with the following code:

% open file (stop with an error if it cannot be opened)
disp(['reading ' FILENAME]);
fid = fopen(FILENAME, 'r');
if fid == -1
    error('Cannot open file %s', FILENAME);
end
 
% read single line: number of x-values
tline = fgetl(fid);
valueLine = strrep(tline, '#', '');
valueLine = strrep(valueLine, ' ', '');
x = str2double(valueLine);
 
% read single line: number of y-values
tline = fgetl(fid);
valueLine = strrep(tline, '#', '');
valueLine = strrep(valueLine, ' ', '');
y = str2double(valueLine);
 
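% read x*y values; textscan expects a scalar repeat count here,
% not a size vector
DATACELL = textscan(fid, '%f', x*y);
DATA = DATACELL{1};
 
% close file
fclose(fid);
 
% convert data to correct matrix format: each text line holds x values,
% so fill an x-by-y matrix column-wise and transpose it to y-by-x
DATA = reshape(DATA, x, y).';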
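The orientation of the final `reshape` matters because MATLAB fills matrices column by column, while `textscan` returns the numbers in the order they appear in the file, that is, line by line. The sketch below uses made-up values to show both variants: if each text line holds one matrix row, `reshape(DATA, x, y).'` restores that layout, whereas `reshape(DATA, y, x)` is only correct for files written column by column.

```matlab
% Six values in file order, as textscan would return them from the lines
% "1 2 3" and "4 5 6" (x = 3 columns, y = 2 rows); made-up values
v = (1:6).';
x = 3;
y = 2;

% fill column-wise as x-by-y, then transpose: rows match the text lines
A = reshape(v, x, y).';   % A = [1 2 3; 4 5 6]

% filling y-by-x directly interleaves the text lines
B = reshape(v, y, x);     % B = [1 3 5; 2 4 6]
```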

This can, however, be slow, especially for text files of several hundred MB. Since disk space is usually not a limiting factor and time is much more expensive, I additionally save the data in MATLAB's binary MAT format. When the file has to be read a second time, I load the MAT file instead, which is 10 to 100 times faster. If the same data is requested again within one session, even this step can be skipped by keeping the data in persistent variables of the reading function. Altogether this is achieved with the following code:

function [r_Data, r_xsize, r_ysize] = ReadSimulationData(FILENAME)
 
persistent lastImageName;
persistent DATA;
persistent xsize;
persistent ysize;
 
% if this file is equal to the last file, reuse the data
% (strcmp returns false while lastImageName is still empty)
if strcmp(lastImageName, FILENAME)
    disp(['using ' FILENAME]);
else
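    % check if the converted MATLAB file exists
    % (same folder and name as FILENAME, extension .mat)
    [filePath, fileName] = fileparts(FILENAME);
    MatlabDataName = fullfile(filePath, [fileName '.mat']);
    if exist(MatlabDataName, 'file') == 2
        disp(['loading ' MatlabDataName]);
        % load into a struct, then copy the fields into the persistent variables
        S = load(MatlabDataName, '-mat', 'DATA', 'xsize', 'ysize');
        DATA = S.DATA;
        xsize = S.xsize;
        ysize = S.ysize;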
    else                
        %% read data from file
 
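        % open file (stop with an error if it cannot be opened)
        disp(['reading ' FILENAME]);
        fid = fopen(FILENAME, 'r');
        if fid == -1
            error('ReadSimulationData:openFailed', 'Cannot open file %s', FILENAME);
        end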
 
        % read single line: number of x-values
        tline = fgetl(fid);
        valueLine = strrep(tline, '#', '');
        valueLine = strrep(valueLine, ' ', '');
        x = str2double(valueLine);
 
        % read single line: number of y-values
        tline = fgetl(fid);
        valueLine = strrep(tline, '#', '');
        valueLine = strrep(valueLine, ' ', '');
        y = str2double(valueLine);
 
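        % read all remaining values (x*y are expected)
        DATACELL = textscan(fid, '%f');
        DATA = DATACELL{1};
 
        % close file
        fclose(fid);
 
        % check if read length is equal
        % to expected length.
        % if not resize x accordingly
        % (typically not necessary)
        lengthdiff = numel(DATA) - x*y;
        if (lengthdiff ~= 0)
            if mod(lengthdiff, y) ~= 0
                error('ReadSimulationData:badSize', ...
                    '%d values cannot be arranged in %d rows', numel(DATA), y);
            end
            x = x + lengthdiff/y;
        end
        % convert data to correct matrix format: each text line holds
        % x values, so fill an x-by-y matrix and transpose it to y-by-x
        DATA = reshape(DATA, x, y).';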
 
        xsize = x;
        ysize = y;
        disp(['saving ' MatlabDataName]);
 
        % save data in binary format
        % this speeds up reading of big matrices
        % enormously
        save(MatlabDataName, 'DATA', 'xsize', 'ysize', '-mat');
    end
    % save last read filename; done for both branches so that the
    % persistent data and its file name always belong together
    lastImageName = FILENAME;
end
 
% copy read, or old persistent data to
% return data values
r_Data = DATA;
r_xsize = xsize;
r_ysize = ysize;
 
return;
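The function trusts the .mat file whenever it exists, so if the simulation later overwrites the text file, the old binary copy is still loaded. One possible guard, sketched here as a standalone script with hypothetical file names (`cache_demo.txt`, `cache_demo.mat`), is to compare the modification times reported by `dir` and parse the text file again when the .mat copy is older. This check is an addition of mine, not part of the function above.

```matlab
% Hypothetical file names; inside ReadSimulationData they would be
% FILENAME and MatlabDataName.
txtName = 'cache_demo.txt';
matName = 'cache_demo.mat';

% create a tiny 1-by-1 data file and its binary copy for the demo
fid = fopen(txtName, 'w');
fprintf(fid, '# 1\n# 1\n42\n');
fclose(fid);
DATA = 42; xsize = 1; ysize = 1;
save(matName, 'DATA', 'xsize', 'ysize', '-mat');

% trust the .mat copy only if it exists and is not older than the text file
txtInfo = dir(txtName);
matInfo = dir(matName);
useMat = ~isempty(matInfo) && ~isempty(txtInfo) && matInfo.datenum >= txtInfo.datenum;

if useMat
    S = load(matName, '-mat', 'DATA', 'xsize', 'ysize');
    DATA = S.DATA;
else
    % here the text file would be parsed again and the .mat file rewritten
    disp(['cache for ' txtName ' is stale']);
end
```

Within a running session, `clear ReadSimulationData` resets the persistent variables, so the next call goes back to the .mat or text file.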
