Project Topics (PT) for CENG577 Parallel Computing - Fall 2011
Deadline: January 6th, 2012 @ 23:55

General notes:
In every project (when possible) you are expected to:
* use nonblocking operations
* overlap communication with computation
* use efficient kernels

In addition to the above, grading will be based on the correctness of your implementation and the final report (5-10 pages long), which should contain the algorithm description, scalability plots, comparisons, and an interpretation of the results.

What to submit:
Submit (on cow) the final report and the source code as a zip file before the deadline.

Important deadlines:
October 26th, 2011 - email the names of the people in your group to: manguoglu@[Link]
November 16th, 2011 - choose your project topic
January 6th, 2012 - submit the final reports and files

PT#1: Parallel sparse matrix vector multiplication with graph and hypergraph partitioning
Given a sparse input matrix, apply a graph (METIS or other) and a hypergraph (hMETIS and/or PaToH or others) partitioning algorithm to the matrix, and implement parallel sparse matrix vector multiplication in MPI using the partitioning computed for a given number of MPI processes. Time the parts of the code (such as the partitioning algorithm, the reordering/partitioning operation itself, and the actual sparse matrix vector multiplication). Provide parallel scalability plots as a function of the number of MPI processes using ~10 large square matrices from the University of Florida Sparse Matrix Collection. Compare the graph and hypergraph partitioning algorithms on the same set of matrices.

PT#2: Parallel banded lower (or upper) triangular solver
Read the paper: "Practical Parallel Band Triangular System Solvers", ACM Transactions on Mathematical Software, Volume 4, Issue 3, Sept. 1978, [Link], [Link], and A.H. Sameh.
Implement the algorithm described in the paper (and in class) for solving lower (or upper) triangular systems using MPI. The input should be a banded matrix stored in LAPACK banded storage format and a right-hand side, which could be a single vector or a group of vectors (i.e. a dense matrix). Measure the parallel scalability as a function of the number of MPI processes, and compare the results to the banded solver provided in ScaLAPACK as the matrix size and the lower (or upper) bandwidth are varied.
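For PT#1, the communication cost that graph/hypergraph partitioning minimizes can be seen even in a serial sketch. The helper below (a hypothetical name; in the actual project the partition would come from METIS/PaToH rather than the contiguous row blocks assumed here) performs a row-partitioned CSR matrix vector product and counts the x-entries each "process" would have to fetch from other owners:

```python
import numpy as np

def csr_matvec_partitioned(indptr, indices, data, x, parts):
    """Row-partitioned CSR matvec y = A x over a square matrix.

    parts is a list of (lo, hi) row ranges, one per simulated process.
    Also returns the total number of nonlocal x-entries the processes
    would need to receive -- the communication volume that graph and
    hypergraph partitioning try to minimize.
    """
    n = len(indptr) - 1
    owner = np.zeros(n, dtype=int)          # which process owns each row/x-entry
    for p, (lo, hi) in enumerate(parts):
        owner[lo:hi] = p
    y = np.zeros(n)
    comm_volume = 0
    for p, (lo, hi) in enumerate(parts):
        needed = set()                      # nonlocal x-entries this process touches
        for i in range(lo, hi):
            s = 0.0
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if owner[j] != p:
                    needed.add(j)
                s += data[k] * x[j]
            y[i] = s
        comm_volume += len(needed)
    return y, comm_volume
```

In the MPI version the `needed` sets become actual point-to-point exchanges of x-entries before the local multiply, which is where nonblocking sends/receives can be overlapped with the multiplication of purely local rows.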
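PT#2 through PT#5 all take their input in LAPACK band storage, which (in 0-based indexing) maps entry A(i, j) to AB(ku + i - j, j). Below is a minimal serial sketch, with hypothetical helper names, of packing a lower triangular banded matrix (so ku = 0) and of the forward substitution that PT#2 asks you to parallelize:

```python
import numpy as np

def to_band_lower(A, kl):
    """Pack a lower triangular matrix with lower bandwidth kl into
    LAPACK-style band storage with ku = 0: ab[i - j, j] = A[i, j]."""
    n = A.shape[0]
    ab = np.zeros((kl + 1, n))
    for j in range(n):
        for i in range(j, min(n, j + kl + 1)):
            ab[i - j, j] = A[i, j]
    return ab

def band_forward_solve(ab, f):
    """Serial forward substitution on band storage: the baseline that the
    parallel algorithm in the paper is measured against."""
    kl = ab.shape[0] - 1
    n = ab.shape[1]
    x = np.zeros(n)
    for i in range(n):
        # only the kl entries inside the band contribute to row i
        s = sum(ab[i - j, j] * x[j] for j in range(max(0, i - kl), i))
        x[i] = (f[i] - s) / ab[0, i]
    return x
```

The sequential dependence of x[i] on the previous kl unknowns is exactly what makes this problem interesting to parallelize.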
PT#3: Parallel banded linear system solver (Spike algorithm)
Read the paper: "On Stable Parallel Linear System Solvers", Sameh, A. H.; Kuck, D. J., Journal of the ACM 25(1), pp. 81-91, 1978.
Implement the algorithm described in the paper (and in class) for solving banded linear systems using MPI. The input should be a banded matrix stored in LAPACK banded storage format and a right-hand side, which could be a single vector or a group of vectors (i.e. a dense matrix). Measure the parallel scalability as a function of the number of MPI processes, and compare the results to the banded solver provided in ScaLAPACK as the matrix size and the bandwidth are varied.
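The core idea of the Spike scheme can be sketched serially for the simplest case: a tridiagonal matrix split into two partitions, so each coupling block has a single nonzero entry. This is an illustrative sketch only (hypothetical function name; dense solves stand in for the per-partition banded factorizations, and the real project distributes the partitions across MPI processes):

```python
import numpy as np

def spike_tridiag_2part(A, f, m):
    """Two-partition Spike sketch for a tridiagonal system A x = f,
    split after row m. Each partition solves independently; a 2x2
    reduced system couples the two interface unknowns."""
    A1, A2 = A[:m, :m], A[m:, m:]
    a = A[m - 1, m]                    # coupling entry above the split
    b = A[m, m - 1]                    # coupling entry below the split
    n2 = A2.shape[0]
    e_last = np.zeros(m);  e_last[-1] = a
    e_first = np.zeros(n2); e_first[0] = b
    v = np.linalg.solve(A1, e_last)    # right "spike" of partition 1
    w = np.linalg.solve(A2, e_first)   # left "spike" of partition 2
    g1 = np.linalg.solve(A1, f[:m])    # modified right-hand sides
    g2 = np.linalg.solve(A2, f[m:])
    # reduced system in the interface unknowns x[m-1] and x[m]
    R = np.array([[1.0, v[-1]], [w[0], 1.0]])
    xm1, xm = np.linalg.solve(R, np.array([g1[-1], g2[0]]))
    # retrieve the remaining unknowns from the spikes
    x1 = g1 - v * xm
    x2 = g2 - w * xm1
    return np.concatenate([x1, x2])
```

With p partitions and a wider band the same structure appears: independent local solves, spikes of width equal to the bandwidth, and a small reduced system at the interfaces, which is the only part that requires communication.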
PT#4: Parallel banded linear system solver (Cyclic reduction algorithm)
Read the papers:
(1) R. W. Hockney. 1965. A Fast Direct Solution of Poisson's Equation Using Fourier Analysis. J. ACM 12, 1 (January 1965), 95-113.
(2) Harold S. Stone. 1975. Parallel Tridiagonal Equation Solvers. ACM Trans. Math. Softw. 1, 4 (December 1975).
Implement the algorithm described in the papers (and in class) for solving banded linear systems using MPI. The input should be a banded matrix stored in LAPACK banded storage format and a right-hand side, which could be a single vector or a group of vectors (i.e. a dense matrix). Measure the parallel scalability as a function of the number of MPI processes, and compare the results to the banded solver provided in ScaLAPACK as the matrix size and the bandwidth are varied.

PT#5: Parallel banded linear system solver (Balance algorithm)
Read the paper: [Link], A.H. Sameh, V. Sarin, A parallel balance scheme for banded linear systems, Numerical Linear Algebra with Applications, 8(5), pp. 297-316, 2001.
Implement the algorithm described in the paper (and in class) for solving banded linear systems using MPI. The input should be a banded matrix stored in LAPACK banded storage format and a right-hand side, which could be a single vector or a group of vectors (i.e. a dense matrix). Measure the parallel scalability as a function of the number of MPI processes, and compare the results to the banded solver provided in ScaLAPACK as the matrix size and the bandwidth are varied.
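For the tridiagonal (bandwidth-1) case treated in Stone's paper, cyclic reduction can be sketched as a recursion: each level eliminates the even-indexed unknowns, halving the system, and a back-substitution pass recovers them. A serial sketch under those assumptions (hypothetical function name; an MPI version would instead assign groups of equations to processes, with the eliminations at each level running concurrently):

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i]
    (with a[0] = c[n-1] = 0) by recursive cyclic reduction."""
    n = len(b)
    if n == 1:
        return np.array([d[0] / b[0]])
    # Reduction: combine each odd-indexed equation with its neighbours to
    # eliminate the even-indexed unknowns, leaving a half-size system.
    na, nb, nc, nd = [], [], [], []
    for i in range(1, n, 2):
        al = -a[i] / b[i - 1]
        be = -c[i] / b[i + 1] if i + 1 < n else 0.0
        na.append(al * a[i - 1])
        nb.append(b[i] + al * c[i - 1] + (be * a[i + 1] if i + 1 < n else 0.0))
        nc.append(be * c[i + 1] if i + 1 < n else 0.0)
        nd.append(d[i] + al * d[i - 1] + (be * d[i + 1] if i + 1 < n else 0.0))
    x_odd = cyclic_reduction(np.array(na), np.array(nb), np.array(nc), np.array(nd))
    # Back substitution: recover the even-indexed unknowns.
    x = np.zeros(n)
    x[1::2] = x_odd
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Each reduction level is embarrassingly parallel across the odd-indexed equations, which is what makes the scheme attractive for MPI despite the extra arithmetic compared with serial Gaussian elimination.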