Direct creation of sparse matrices in Matlab

9th October, 2012

A small Matlab mex tool for generating sparse matrices from disk.

As I've discussed previously, generating sparse matrices in Matlab is very inefficient. Building a matrix with the Matlab sparse command requires at least double the storage of the sparse matrix itself, since the index and value vectors you pass in must sit in memory alongside the result. For a large matrix, this can cause memory problems. The alternative of decomposing and recomposing the matrix as you build it is extremely slow.

Generating the matrix directly on disk

To be maximally memory efficient, you can write the sparse indices and values out to binary files on disk. To manage this, you need to understand how a sparse matrix is stored (see the Matlab documentation on sparse matrices for details). I'm providing here a small mex tool that reads the binary data from disk and builds a sparse matrix directly in the Matlab storage format. Only the memory required to store the sparse matrix is ever allocated.
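To make the storage format concrete, here is a small sketch that derives the three compressed-column arrays from an ordinary sparse matrix. The names vnIr, vnJc and vfPr are my own illustrative choices (they mirror the mxGetIr, mxGetJc and mxGetPr accessors in the mex API); none of this is part of sparsefromdisk itself.

```matlab
% Small example matrix with five non-zero entries
mfA = sparse([1 3 2 3 1], [1 1 2 3 4], [10 20 30 40 50], 3, 4);

% find returns the non-zeros in column-major (linear index) order
[vnRow, vnCol, vfPr] = find(mfA);

% Zero-based row index of each non-zero, as held in the internal storage
vnIr = vnRow - 1;

% Column pointers: entry k is the number of non-zeros before column k,
% with one extra trailing entry holding the total count
vnJc = [0; cumsum(accumarray(vnCol, 1, [size(mfA, 2) 1]))];
```

The point to take away is that non-zeros are stored column by column, with rows ascending within each column, which is why the order of the data on disk matters.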

Three binary files are required: one to define row indices (uint64); one to define column indices (uint64); and one containing the double-precision value for each sparse entry. The entries in these files must be written in element index sorted order; the mex tool does not sort them or check them for you! Also, values for repeated indices are not accumulated into the sparse matrix. If you need that, you'll have to use the Matlab sparse command instead.
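If your entries are not generated in order, a sort along these lines should do the job. This is a sketch that assumes "element index sorted order" means Matlab's usual column-major linear indexing (column first, then row within each column); the example data are made up.

```matlab
% Example triplets in arbitrary order (hypothetical data)
vnRows = [3 1 2 1]';
vnCols = [2 1 2 3]';
vfValues = [0.5 1.5 2.5 3.5]';

% Sort by column, then by row within each column
[mnSorted, vnOrder] = sortrows([vnCols vnRows]);
vnSortedJVector = mnSorted(:, 1);
vnSortedRowIndices = mnSorted(:, 2);
vfValuesInIndexSortedOrder = vfValues(vnOrder);

% Duplicate (row, column) pairs would not be accumulated, so reject them
assert(size(unique(mnSorted, 'rows'), 1) == size(mnSorted, 1), ...
   'Duplicate entries found; use sparse() if they need to be summed.');
```

Sorting needs all the indices in memory at once, of course, so where possible it is better to generate the entries in sorted order in the first place.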

Download and installation

Download the source code: matlab_sparsefromdisk.zip
Unzip the file into a directory on the Matlab path. The mex code will be compiled when you first use the function. Your system installation of mex must be configured properly.

Building the matrix

Create temporary files to contain sparse indices:

strRowInd = tempname;
strJ = tempname;
strValues = tempname;

Open the files:

fhRowInd = fopen(strRowInd, 'w');
fhJ = fopen(strJ, 'w');
fhValues = fopen(strValues, 'w');

Generate some sparse data, and write it to disk:

fwrite(fhRowInd, vnSortedRowIndices, 'uint64');
fwrite(fhJ, vnSortedJVector, 'uint64');
fwrite(fhValues, vfValuesInIndexSortedOrder, 'double');
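The memory saving comes from never holding the full index vectors at once, so in practice you would call fwrite repeatedly as the data are generated. Here is a self-contained sketch that writes one column at a time; the matrix pattern (a full first row plus the diagonal) and the counter nNumWritten are invented for illustration.

```matlab
% Hypothetical example: 5x4 matrix with a full first row and a diagonal
nNumRows = 5;
nNumCols = 4;

strRowInd = tempname;
strJ = tempname;
strValues = tempname;
fhRowInd = fopen(strRowInd, 'w');
fhJ = fopen(strJ, 'w');
fhValues = fopen(strValues, 'w');

nNumWritten = 0;
for nCol = 1:nNumCols
   % Rows of the non-zeros in this column, in ascending order
   vnRows = unique([1 nCol]);
   vfVals = ones(1, numel(vnRows));

   % Append this column's entries; columns are visited in order,
   % so the files end up in element index sorted order
   fwrite(fhRowInd, vnRows, 'uint64');
   fwrite(fhJ, repmat(nCol, 1, numel(vnRows)), 'uint64');
   fwrite(fhValues, vfVals, 'double');
   nNumWritten = nNumWritten + numel(vnRows);
end

fclose(fhRowInd);
fclose(fhJ);
fclose(fhValues);
```

Only one column's worth of entries is in memory at any time, and each file grows by 8 bytes per entry.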

Close the binary files, then construct the sparse matrix:

fclose(fhRowInd);
fclose(fhJ);
fclose(fhValues);

mfSparse = sparsefromdisk([nNumRows nNumCols], strRowInd, strJ, strValues);

Clean up by deleting the temporary files:

delete(strRowInd);
delete(strJ);
delete(strValues);

Caveats

No error checking is performed on the files you provide to sparsefromdisk. If you mess up the generation of the files and don't respect the format required by Matlab for sparse matrices, then you risk corrupting Matlab's variable storage.
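Since the tool won't check the files for you, it may be worth running a sanity check of your own before calling sparsefromdisk. The sketch below is one way to do that, under some assumptions that you should verify against your own setup: that indices are 1-based, as elsewhere in Matlab, and that the required order is column-major. The variable names and example data are hypothetical. For brevity it reads the index files in one go; for very large files you would read and check them in chunks.

```matlab
% Build three small example files (hypothetical data, already sorted)
nNumRows = 3;
nNumCols = 3;
strRowInd = tempname;
strJ = tempname;
strValues = tempname;
fh = fopen(strRowInd, 'w'); fwrite(fh, [1 3 2 1], 'uint64'); fclose(fh);
fh = fopen(strJ, 'w'); fwrite(fh, [1 1 2 3], 'uint64'); fclose(fh);
fh = fopen(strValues, 'w'); fwrite(fh, [1.5 2.5 3.5 4.5], 'double'); fclose(fh);

% Read the indices back as doubles
fh = fopen(strRowInd, 'r'); vnRows = fread(fh, Inf, 'uint64'); fclose(fh);
fh = fopen(strJ, 'r'); vnCols = fread(fh, Inf, 'uint64'); fclose(fh);

% Number of values, from the file size (8 bytes per double)
sInfo = dir(strValues);
nNumValues = sInfo.bytes / 8;

% All three files must describe the same number of entries
bSameLength = (numel(vnRows) == numel(vnCols)) && (numel(vnRows) == nNumValues);

% Assumption: 1-based indices that fall inside the matrix
bInBounds = all(vnRows >= 1 & vnRows <= nNumRows) && ...
            all(vnCols >= 1 & vnCols <= nNumCols);

% Linear indices must be strictly increasing: sorted, with no duplicates
% (exact in double precision as long as they stay below 2^53)
vnLinear = (vnCols - 1) * nNumRows + vnRows;
bSorted = all(diff(vnLinear) > 0);

bValid = bSameLength && bInBounds && bSorted;

delete(strRowInd);
delete(strJ);
delete(strValues);
```

If bValid comes out false, fix the files before handing them to the mex tool.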

I've made an effort to write platform-agnostic code, but I've only debugged the code on OS X. If it doesn't compile, or breaks in other ways on Windows or another OS, please let me know. Bug fixes are also welcome.