Direct creation of sparse matrices in Matlab
9th October, 2012
A small Matlab mex tool for generating sparse matrices from disk.
As I've discussed previously, generating sparse matrices in Matlab is very inefficient. Using the Matlab sparse command needs at least twice the storage of the sparse matrix itself, because the index and value vectors must sit in memory alongside the matrix being built. For a large matrix, this can cause memory problems. The alternative of decomposing and recomposing the matrix as you build it is extremely slow.
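To get a rough feel for the overhead, the sketch below compares the memory held by the vectors passed to sparse with the memory of the finished matrix. The sizes and variable names are made up for illustration, and the comparison ignores any temporary workspace that sparse allocates internally, so treat the result as a lower bound.

```matlab
% Illustrative sizes only (assumptions, not taken from a real problem).
nSize = 1e3;            % the matrix is nSize-by-nSize
nNumEntries = 1e4;      % number of nonzero entries

% Deterministic, non-repeating positions spread through the matrix.
vnLinear = (1:97:97*nNumEntries)';
[vnRows, vnCols] = ind2sub([nSize nSize], vnLinear);
vfValues = ones(nNumEntries, 1);

mfSparse = sparse(vnRows, vnCols, vfValues, nSize, nSize);

% Bytes held by the three input vectors versus the finished matrix.
sInputs = whos('vnRows', 'vnCols', 'vfValues');
sMatrix = whos('mfSparse');
nInputBytes = sum([sInputs.bytes]);
nMatrixBytes = sMatrix.bytes;

fprintf('Inputs: %d bytes, matrix: %d bytes, total/matrix: %.2f\n', ...
        nInputBytes, nMatrixBytes, (nInputBytes + nMatrixBytes) / nMatrixBytes);
```

While sparse runs, both the inputs and the output are alive at once, which is where the extra storage comes from.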
Generating the matrix directly on disk
To be maximally memory efficient, you can write the sparse indices and values out to binary files on disk. To manage this, you need to understand how a sparse matrix is stored (see the Matlab documentation on sparse matrices for details). I'm providing here a small mex tool that reads the binary data from disk and builds a sparse matrix directly in the Matlab storage format. Only the memory required to store the sparse matrix is ever allocated.
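As a reminder of that storage format, here is a small sketch that rebuilds the three internal arrays of a sparse matrix (the ones the mex API exposes through mxGetIr, mxGetJc and mxGetPr) from ordinary Matlab code. The example matrix and variable names are mine. Whether the files read by the tool map one-to-one onto these arrays is an assumption on my part, so check the source code before relying on it.

```matlab
% Small 4-by-3 example matrix (made up for illustration).
mfFull = [10  0  0;
           0  0 30;
          20  0  0;
           0  0 40];
mfS = sparse(mfFull);
[nNumRows, nNumCols] = size(mfS);

% find() returns the nonzeros in column-major order, with 1-based indices.
[vnRows, vnCols, vfValues] = find(mfS);

% The internal layout uses 0-based indices.
vnIr = vnRows - 1;                                  % row index of each nonzero
vnColCounts = accumarray(vnCols, 1, [nNumCols 1]);  % nonzeros in each column
vnJc = [0; cumsum(vnColCounts)];                    % start offset of each column
vfPr = vfValues;                                    % the nonzero values

disp([vnIr vfPr]);   % one row per nonzero
disp(vnJc');         % nNumCols + 1 offsets
```

Note that vnJc has one entry per column plus one, not one entry per nonzero: column k holds the nonzeros numbered vnJc(k) up to vnJc(k+1) - 1, counting from zero.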
Three binary files are required: one to define row indices (uint64); one to define column indices (uint64); and one containing the double-precision values for each sparse entry. These files must be built in element index sorted order; the mex tool does not sort them or check them for you! Values at repeated indices are also not accumulated into the sparse matrix, as they would be by the Matlab sparse command. If you need that, use sparse instead.
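If your entries arrive unsorted or with repeats, one way to prepare them in memory before writing is sketched below. It assumes that "element index sorted order" means Matlab's column-major linear index order (sorted by column, then by row), and the data and variable names are invented for the example.

```matlab
% Unsorted 1-based triplets, with a repeated entry at row 2, column 1.
vnRows   = [3; 2; 1; 2];
vnCols   = [2; 1; 2; 1];
vfValues = [5; 1; 7; 4];

% unique(..., 'rows') sorts the (column, row) pairs and removes repeats;
% vnGroup maps each original entry onto its sorted, unique position.
[mnUnique, ~, vnGroup] = unique([vnCols vnRows], 'rows');

% Sum the values that landed on the same position.
vfSummed = accumarray(vnGroup, vfValues);

vnSortedCols = mnUnique(:, 1);
vnSortedRows = mnUnique(:, 2);

disp([vnSortedRows vnSortedCols vfSummed]);
```

This needs all the triplets in memory at once, so it only helps when you can sort the data in chunks that each cover a contiguous range of columns.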
Download and installation
Download the source code: matlab_sparsefromdisk.zip
Unzip the file into a directory on the Matlab path. The mex code will be compiled when you first use the function. Your system installation of mex must be configured properly.
Building the matrix
Create temporary files to contain sparse indices:
strRowInd = tempname;
strJ = tempname;
strValues = tempname;
Open the files:
fhRowInd = fopen(strRowInd, 'w');
fhJ = fopen(strJ, 'w');
fhValues = fopen(strValues, 'w');
Generate some sparse data, and write it to disk:
fwrite(fhRowInd, vnSortedRowIndices, 'uint64');
fwrite(fhJ, vnSortedJVector, 'uint64');
fwrite(fhValues, vfValuesInIndexSortedOrder, 'double');
Close the binary files, then construct the sparse matrix:
fclose(fhRowInd);
fclose(fhJ);
fclose(fhValues);
mfSparse = sparsefromdisk([nNumRows nNumCols], strRowInd, strJ, strValues);
Clean up by deleting the temporary files:
delete(strRowInd);
delete(strJ);
delete(strValues);
Caveats
No error checking is performed on the files you provide to sparsefromdisk. If you mess up the generation of the files and don't respect the format required by Matlab for sparse matrices, then you risk corrupting Matlab's variable storage.
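Since the tool won't check anything for you, a few cheap sanity checks of your own before calling it can save some grief. The sketch below is only a starting point: it uses small made-up demo files, and it sticks to checks that hold whether the indices are 0-based or 1-based, because I'm not assuming either here.

```matlab
% Demo files standing in for the real ones (contents are made up).
nNumRows = 4;
strRowInd = tempname;
strValues = tempname;
fhRowInd = fopen(strRowInd, 'w');
fwrite(fhRowInd, [0 2 1 3], 'uint64');
fclose(fhRowInd);
fhValues = fopen(strValues, 'w');
fwrite(fhValues, [10 20 30 40], 'double');
fclose(fhValues);

% uint64 and double entries both occupy 8 bytes, so the row index file
% and the value file should hold the same number of entries.
sRowInfo = dir(strRowInd);
sValueInfo = dir(strValues);
nNumRowEntries = sRowInfo.bytes / 8;
nNumValueEntries = sValueInfo.bytes / 8;
assert(nNumRowEntries == nNumValueEntries, 'Row index and value files differ in length.');

% Read the row indices back and check that none exceeds the matrix height.
fhRowInd = fopen(strRowInd, 'r');
vnRowInd = fread(fhRowInd, Inf, 'uint64=>double');
fclose(fhRowInd);
assert(all(vnRowInd <= nNumRows), 'Row index out of range.');

% Remove the demo files.
delete(strRowInd);
delete(strValues);
```

Reading the whole index file back costs memory, of course, so for very large matrices you would want to read and check it in blocks.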
I've made an effort to write platform-agnostic code, but I've only debugged the code on OS X. If it doesn't compile, or breaks in other ways on Windows or another OS, please let me know. Bug fixes are also welcome.