Sparse Matrices in MATLAB*P: Design and Implementation
Viral Shah and John R. Gilbert
{viral,gilbert}@cs.ucsb.edu
Department of Computer Science
University of California, Santa Barbara
1 Introduction

2 User's View
In addition to Matlab’s sparse and dense matrices, Matlab*P provides sup-
port for distributed sparse (dsparse) and distributed dense (ddense) matrices.
The system design of Matlab*P and operations on ddense matrices are de-
scribed elsewhere [12, 7].
The p operator provides for parallelism in Matlab*P. For example, a random
parallel dense matrix (ddense) distributed by rows across processors is created
as follows:
>> A = rand (100000*p, 100000)
Similarly, a random parallel sparse matrix (dsparse) also distributed across pro-
cessors by rows is created as follows: (An extra argument is required to specify
the density of non-zeros.)
>> S = sprand (1000000*p, 1000000, 0.001)
We use the overloading facilities in Matlab to define a dsparse object. The
Matlab*P language requires that all operations that can be performed in Mat-
lab be possible with Matlab*P. Our current implementation provides a work-
ing basis, but is not quite a drop–in replacement for existing Matlab programs.
Matlab*P achieves parallelism through polymorphism. Operations on ddense
matrices produce ddense matrices. But once initiated, sparsity propagates. Oper-
ations on dsparse matrices produce dsparse matrices. An operation on a mixture
of dsparse and ddense matrices produces a dsparse matrix unless the operator
destroys sparsity. The user can explicitly convert a ddense matrix to a dsparse
matrix using sparse(A). Similarly a dsparse matrix can be converted to a ddense
matrix using full(S). A dsparse matrix can also be converted into a Matlab
sparse matrix using S(:,:) or p2matlab(S). In addition to the data–parallel
SIMD view of distributed data, Matlab*P also provides a task–parallel SPMD
view through the so–called “MM–mode”.
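A short sketch tying these conversions together (the size and density below are only illustrative, and we assume enough memory for the dense copy):
>> S  = sprand (10000*p, 10000, 0.001);  % distributed sparse (dsparse) matrix
>> D  = full (S);                        % distributed dense (ddense) copy
>> S2 = sparse (D);                      % convert back to dsparse
>> M  = p2matlab (S);                    % ordinary Matlab sparse matrix; S(:,:) also works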
Matlab*P currently also offers some preliminary graphics capabilities to
help users visualize dsparse matrices. This is based upon the parallel rendering
for ddense matrices [5]. Again, this demonstrates the philosophy that Matlab*P
should feel like Matlab. Figure 1 shows spy plots (which display the non-zero
positions of a matrix) of a web crawl matrix in Matlab*P and in Matlab.

Fig. 1. Matlab and Matlab*P spy plots of a web crawl dsparse matrix
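Assuming the familiar Matlab interface carries over, as the figure suggests, producing such a plot is a one-liner:
>> spy (S)                               % non-zero structure of the dsparse matrix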
Matlab*P stores the non-zeros of a dsparse matrix by rows, in a compressed sparse
row (CSR) format, for two main reasons:
– The iterative methods community largely uses row-based storage. Since we
believe that iterative methods will be the methods of choice for large sparse
matrices, we want to ensure maximum compatibility with existing code.
– A row-based data structure also allows an efficient implementation of matvec
(the sparse matrix by dense vector product), which is the workhorse of several
iterative methods such as Conjugate Gradient and Generalized Minimal Residual.
By default, a dsparse matrix in Matlab*P has the block row layout which
would be obtained by ScaLAPACK [3] for a ddense matrix of the same dimen-
sions. This allows for roughly the same number of rows on each processor. The
user can override this block row layout in a couple of ways. The Matlab sparse
function takes arguments specifying a vector of row indices i, a vector of column
indices j, a vector of non–zero values v, the number of rows m and the number
of columns n as follows:
>> S = sparse (i, j, v, m, n)
By passing a vector layout, which specifies the number of rows on each processor,
in place of the scalar row count m, the user can create a dsparse matrix with the
desired layout:
>> S = sparse (i, j, v, layout, n)
The block row layout of a dsparse matrix can also be changed after creation
with:
>> changelayout (S, newlayout)
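For example, a hypothetical layout for a matrix with m = 1000 rows on four processors (the numbers are only illustrative) might be:
>> layout = [400 200 200 200];            % rows per processor; sum(layout) = m
>> S = sparse (i, j, v, layout, n);       % dsparse matrix with this row distribution
>> changelayout (S, [250 250 250 250]);   % later, rebalance to an even block row layout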
The CSR data structure stores whole rows contiguously in a single array on each
processor. If a processor has nnz non–zeros, CSR uses an array of length nnz to
store the non–zeros and another array of length nnz to store column indices, as
shown in Figure 2. Row boundaries are specified by an array of length m + 1,
where m is the number of rows on that processor.
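As a concrete sequential illustration (the variable names are ours, not those of the internal implementation), the local CSR arrays for a small 3-by-4 submatrix might look like this:
% Local submatrix on one processor:      [ 5 0 0 7 ]
%                                        [ 0 0 3 0 ]
%                                        [ 0 2 0 0 ]
vals   = [5 7 3 2];     % the nnz non-zero values, stored row by row
colind = [1 4 3 2];     % column index of each non-zero (length nnz)
rowptr = [1 3 4 5];     % row i occupies vals(rowptr(i) : rowptr(i+1)-1); length m+1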
Fig. 4. Scalability of sparse (SGI Altix 350): time in seconds versus number of processors
(2 to 16), for matrices with 1e7 non-zeros (4 per row) and 1e6 non-zeros (32 per row).
3.2 Constructors
2. sparse – sparse works with [i, j, v] triples, which specify the row index,
the column index and the non-zero value respectively. If i, j, v are ddense
vectors with nnz non-zeros, then sparse assembles a sparse matrix with
nnz non-zeros. If there are duplicate [i, j] indices, the corresponding values
are summed. The pseudocode for sparse is shown in Figure 3. In our
implementation, this is done by sorting the vectors simultaneously, using
row numbers as the primary key and column numbers as the secondary key.
(A small example of the triple syntax appears after this list.)
The starch phase here is similar to the starching used in the parallel sort,
except that it redistributes the vectors so that row boundaries do not over-
lap among processors and the required block row distribution for the sparse
matrix is achieved. The assemble phase actually constructs a dsparse ma-
trix and fills it with the non–zero values. Figure 4 shows the scalability of
sparse on an SGI Altix 350. Although performance for commodity clusters
cannot be as good as that of an Altix, our initial experiments do indicate
good scalability on commodity clusters too. We will report more detailed
performance comparisons in a future paper.
3. spones, speye, spdiag, sprand etc. – Some basic functions implicitly
construct dsparse matrices.
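As promised in item 2 above, here is a small illustration of the triple syntax; the sizes and values are ours, and in Matlab*P the vectors i, j, v would ordinarily be ddense:
>> i = [1 3 3 2];  j = [2 1 1 4];  v = [10 20 5 7];
>> S = sparse (i, j, v, 4, 4)     % S(1,2) = 10, S(3,1) = 20 + 5 = 25, S(2,4) = 7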
One of the goals in designing a sparse matrix data structure is that, wherever
possible, it should support matrix operations in time proportional to flops. As a
result, arithmetic on dsparse matrices is performed using a sparse accumulator
(SPA). Gilbert, Moler and Schreiber [10] discuss the design of the SPA in detail.
Matlab*P uses a separate SPA for each processor.
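To make the idea concrete, here is a sequential sketch in plain Matlab (the function name and signature are ours, not Matlab*P internals): it adds two sparse rows, given as column-index and value vectors, using a dense value accumulator, occupancy flags and a list of occupied columns.
function [vals, cols] = spa_accumulate (n, ja, a, jb, b)
% Add two sparse rows of length n, given as row vectors of column indices
% (ja, jb) and values (a, b), using a sparse accumulator (SPA).
w        = zeros (1, n);          % dense value accumulator
occupied = false (1, n);          % occupancy flags
cols     = [];                    % columns touched so far, in first-touch order
pairs    = [ja jb; a b];          % row 1: column indices, row 2: values
for t = pairs                     % one (index, value) pair per iteration
    col = t(1);  val = t(2);
    if ~occupied(col)
        occupied(col) = true;
        cols = [cols col];
    end
    w(col) = w(col) + val;
end
vals = w(cols);                   % gather the accumulated values
In an actual SPA the dense work arrays are allocated once and reused from row to row, so the cost of each operation stays proportional to the number of non-zeros involved.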
3.5 Matvec
The matvec operation multiplies a dsparse matrix with a ddense column vector,
producing a ddense column vector as a result. Matvec is the kernel for many
iterative methods.
For the matvec, y = Ax, we have A and x distributed across processors by
rows. The submatrix of A at each processor will need a piece of x depending upon
its sparsity structure. When matvec is invoked for the first time on a dsparse
matrix A, Matlab*P computes a communication schedule for A and caches it.
When more matvecs are performed using A, this communication schedule does
not need to be recomputed, which saves some computing and communication
overhead, at the cost of extra space required to save the schedule. Matlab*P
also overlaps the communication and computation during matvec. This way,
each processor starts computing the result of the matvec whenever it receives a
piece of the vector from any other processor. Figure 6 also shows how matvec
scales in Matlab*P, since it forms the main computational kernel for conjugate
gradient.
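To illustrate why matvec dominates, here is a plain unpreconditioned conjugate gradient sketch of the kind referred to above; it only assumes that A*x, vector arithmetic, dot products and norm behave for dsparse and ddense operands as they do in Matlab:
function x = cg (A, b, tol, maxit)
% Unpreconditioned conjugate gradient for symmetric positive definite A.
% Each iteration costs one matvec (A*p) plus a few vector operations.
x   = 0 * b;                      % start from the zero vector
r   = b;  p = r;
rho = r' * r;
nb  = norm (b);
for k = 1:maxit
    q     = A * p;                % dsparse matvec; the cached schedule handles communication
    alpha = rho / (p' * q);
    x     = x + alpha * p;
    r     = r - alpha * q;
    rhon  = r' * r;
    if sqrt (rhon) <= tol * nb
        break                     % converged to the requested relative residual
    end
    p   = r + (rhon / rho) * p;
    rho = rhon;
end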
Communication in matvec can be reduced by performing graph partitioning
of the graph of the sparse matrix. If fewer edges cross processor boundaries, less
communication is required during matvec. Matlab*P can use several of the available
tools for graph partitioning. However, by default, Matlab*P does not perform
graph partitioning during matvec. The philosophy behind this decision is sim-
ilar to that in Matlab, that reorganizing data to make later operations more
efficient should be possible, but not automatic.
Fig. 6. Matlab versus Matlab*P as problem size increases (grid3d, 40 to 80).
Fig. 7. Geometry of the Hele-Shaw cell (left). The heavier fluid is placed above the
lighter one. Either one of the fluids can be the more viscous one. Mesh point distribution
in the computational domain (right). A Chebyshev grid is employed in the y–direction,
and compact finite differences in the z–direction.
size 20,625 × 20,625. The number of non-zeros is 3,812,450. The matrix is un-
symmetric, both in values and in nonzero structure, as shown in the spy plots in
Figure 8. In order to calculate the largest eigenvalue, we use the power method
with shift and invert in Matlab*P.
[m n] = size (A);
C = A - sigma * B;
y = rand (n*p, 1);
for k=1:iter
    q = y ./ norm (y);
    v = B * q;
    y = C \ v;
end

Fig. 9. Matlab*P code for power method with shift and invert
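The figure ends with the solve; one way (our addition, not part of the original figure) to read an eigenvalue estimate off the iteration is the Rayleigh-quotient-like value q'*y, since (A - sigma*B)\(B*q) has eigenvalues 1/(lambda - sigma):
theta  = q' * y;             % approximates 1/(lambda - sigma) for the dominant mode
lambda = sigma + 1 / theta;  % estimate of the eigenvalue of A*x = lambda*B*x nearest sigma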
The original non-Matlab*P code used LAPACK with ARPACK [13], while
the Matlab*P code uses SuperLU_DIST [14] with the power method as shown
in Figure 9. We use an initial guess of 0.1, and the power method converges
to 0.0194, which is sufficient precision for linear stability analysis. We use a cluster
with 16 processors to solve the generalized eigenvalue problem. Each node has
a 2.6GHz Pentium Xeon CPU, 3GB of RAM and a gigabit ethernet connection.
Results are presented in Table 1.
5 Conclusion
The implementation of sparse matrices in Matlab*P is work in progress. Cur-
rent available functionality includes being able to construct sparse matrices,
perform element–wise arithmetic and indexing operations on them, multiply a
sparse matrix with a dense vector and solve linear systems. This level of func-
tionality allows us to implement several algorithms such as conjugate gradient
and the power method.
Much remains to be done. A complete implementation of sparse matrices
requires matrix-matrix multiplication and several factorizations (Cholesky, QR,
SVD, etc.). Improvements in the sorting code can lead to general improvements in
many parts of Matlab*P. It is also important to make existing graph partitioners,
such as Meshpart and ParMetis, available in Matlab*P. Several preconditioning
methods also need to be implemented for Matlab*P, since we expect iterative
methods to be the methods of choice for solving large sparse linear systems.
The goal of sparse matrix support in Matlab*P is to provide an interactive
environment for users to perform operations on large sparse matrices in parallel,
while being compatible with Matlab. Our current implementation is ready to
be used for simple real-life problems.
6 Acknowledgements
David Cheng made useful contributions to the sorting code. We had several useful
discussions with Parry Husbands, Per–Olof Persson, Ron Choy, Alan Edelman,
Nisheet Goyal and Eckart Meiburg.
References
1. Patrick Amestoy, Iain S. Duff, and Jean-Yves L’Excellent. Multifrontal solvers
within the PARASOL environment. In PARA, pages 7–11, 1998.
2. D. Arnold, S. Agrawal, S. Blackford, J. Dongarra, M. Miller, K. Seymour, K. Sagi,
Z. Shi, and S. Vadhiyar. Users’ Guide to NetSolve V1.4.1. Innovative Computing
Dept. Technical Report ICL-UT-02-05, University of Tennessee, Knoxville, TN,
June 2002.
3. L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Don-
garra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C.
Whaley. ScaLAPACK Users’ Guide. SIAM, Philadelphia, PA, 1997.
4. Guy E. Blelloch, Charles E. Leiserson, Bruce M. Maggs, C. Greg Plaxton,
Stephen J. Smith, and Marco Zagha. A comparison of sorting algorithms for the
Connection Machine CM-2. In Proceedings of the Third Annual ACM Symposium
on Parallel Algorithms and Architectures, pages 3–16. ACM Press, 1991.
5. Oskar Bruning, Jack Holloway, and Adnan Sulejmanpasic. MATLAB*P visualization
package, 2002.
6. Long Yin Choy. Parallel Matlab survey, 2001. http://theory.csail.mit.edu/~cly/survey.html.
7. Long Yin Choy. MATLAB*P 2.0: Interactive supercomputing made practical. M.S.
Thesis, EECS, 2002.
8. D. E. Culler, A. Dusseau, R. Martin, and K. E. Schauser. Fast parallel sorting under
LogP: from theory to practice. In Proceedings of the Workshop on Portability and
Performance for Parallel Processing, Southampton, England, July 1993. Wiley.
9. John. R. Gilbert, Gary L. Miller, and Shang-Hua Teng. Geometric mesh partition-
ing: Implementation and experiments. SIAM Journal on Scientific Computing,
19(6):2091–2110, 1998.
10. John R. Gilbert, Cleve Moler, and Robert Schreiber. Sparse matrices in MATLAB:
Design and implementation. SIAM Journal on Matrix Analysis and Applications,
13(1):333–356, 1992.
11. Nisheet Goyal and Eckart Meiburg. Unstable density stratification of miscible fluids
in a vertical Hele-Shaw cell: Influence of variable viscosity on the linear stability.
Journal of Fluid Mechanics, to appear, 2004.
12. P. Husbands and C. Isbell. MATLAB*P: A tool for interactive supercomputing.
The Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.
13. R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users Guide: Solution of
Large Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM,
Philadelphia, 1998.
14. Xiaoye S. Li and James W. Demmel. SuperLU_DIST: A scalable distributed-memory
sparse direct solver for unsymmetric linear systems. ACM Trans. Math. Softw.,
29(2):110–140, 2003.