

CS708: Scientific Computing
Prof. Zeyun Yu

Project 1: Parallel Implementation of Matrix Multiplication

Assigned: February 26, 2014 (Wednesday)
Due 11:59pm: March 14, 2014 (Friday)


In this project, you will implement the block-based matrix multiplication in C/C++ and
MPI and run it on the peregrine cluster.

Specific Requirements:
1. On the root processor, you will make two matrices, A and B, of 512*512 entries,
whose values are randomly generated in the range of [0, 1]. You will then partition
each matrix into s*s sub-matrices (the choice of s is specified below). The resulting
matrix C should be partitioned in the same way as shown below.

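\[
\begin{pmatrix}
C_{0,0} & \cdots & C_{0,s-1} \\
\vdots & \ddots & \vdots \\
C_{s-1,0} & \cdots & C_{s-1,s-1}
\end{pmatrix}
=
\begin{pmatrix}
A_{0,0} & \cdots & A_{0,s-1} \\
\vdots & \ddots & \vdots \\
A_{s-1,0} & \cdots & A_{s-1,s-1}
\end{pmatrix}
\begin{pmatrix}
B_{0,0} & \cdots & B_{0,s-1} \\
\vdots & \ddots & \vdots \\
B_{s-1,0} & \cdots & B_{s-1,s-1}
\end{pmatrix}
\]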


2. You will allocate s^2 processors, each of which will calculate one sub-matrix C_{p,q}.
The required sub-matrices A_{p,k} and B_{k,q} (k = 0, ..., s-1) will be passed along with
MPI from the root processor to the processor working on C_{p,q}. The resulting sub-matrix
will be returned back to the root processor for further verification (see below).
3. In your code, you need to test four cases with different values of s: (1) s = 1 (i.e., no
partitioning), (2) s = 2, (3) s = 4, (4) s = 8. For each case, you will end up with a
512*512 matrix C_k (k = 1, 2, 3, 4). To verify that the four cases return the same matrix,
randomly pick an entry index and check that C_1, C_2, C_3, and C_4 hold the same value
at that entry. Repeat the verification for at least ten randomly picked entries.
4. You also need to record the running time for each of the four cases. The time should
be recorded on the root processor, starting right before sending the messages to other
processors and ending right after receiving the messages from other processors. In
other words, the running time should NOT include the time for random matrix
generation and final matrix verification.
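
The block arithmetic behind requirements 1 to 3 can be sketched serially, without MPI, as a way to check indexing before adding communication. The sketch below is only an illustration under stated assumptions: matrices are stored row-major in flat arrays, s divides the matrix size, and every function name (fill_random, pack_block, block_multiply_add, blocked_multiply, max_sample_diff) is hypothetical rather than part of the assignment. Each (p, q) iteration in blocked_multiply stands in for the work one of the s^2 processors would do, and the packed buffers are the kind of contiguous data that could be handed to MPI_Send and MPI_Recv.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fill an n*n row-major matrix with pseudo-random values in [0, 1]. */
void fill_random(double *m, int n) {
    for (int i = 0; i < n * n; ++i)
        m[i] = (double)rand() / RAND_MAX;
}

/* Copy block (p, q) of the n*n matrix src into the contiguous buffer dst.
   The block edge length is n / s. */
void pack_block(const double *src, int n, int s, int p, int q, double *dst) {
    assert(n % s == 0);
    int bs = n / s;
    for (int i = 0; i < bs; ++i)
        memcpy(dst + (size_t)i * bs,
               src + (size_t)(p * bs + i) * n + (size_t)q * bs,
               (size_t)bs * sizeof(double));
}

/* acc += x * y, where x, y and acc are contiguous bs*bs blocks. */
void block_multiply_add(const double *x, const double *y, double *acc, int bs) {
    for (int i = 0; i < bs; ++i)
        for (int k = 0; k < bs; ++k) {
            double xik = x[(size_t)i * bs + k];
            for (int j = 0; j < bs; ++j)
                acc[(size_t)i * bs + j] += xik * y[(size_t)k * bs + j];
        }
}

/* Compute c = a * b block by block: C_{p,q} = sum over k of A_{p,k} * B_{k,q}.
   Returns 0 on success and -1 if a scratch buffer cannot be allocated. */
int blocked_multiply(const double *a, const double *b, double *c, int n, int s) {
    assert(n % s == 0);
    int bs = n / s;
    size_t bytes = (size_t)bs * bs * sizeof(double);
    double *ab = malloc(bytes), *bb = malloc(bytes), *cb = malloc(bytes);
    if (!ab || !bb || !cb) { free(ab); free(bb); free(cb); return -1; }
    for (int p = 0; p < s; ++p)
        for (int q = 0; q < s; ++q) {
            memset(cb, 0, bytes);              /* start C_{p,q} at zero */
            for (int k = 0; k < s; ++k) {
                pack_block(a, n, s, p, k, ab); /* A_{p,k} */
                pack_block(b, n, s, k, q, bb); /* B_{k,q} */
                block_multiply_add(ab, bb, cb, bs);
            }
            for (int i = 0; i < bs; ++i)       /* write C_{p,q} back into c */
                memcpy(c + (size_t)(p * bs + i) * n + (size_t)q * bs,
                       cb + (size_t)i * bs, (size_t)bs * sizeof(double));
        }
    free(ab); free(bb); free(cb);
    return 0;
}

/* Largest absolute difference between c1 and c2 over `samples` randomly
   chosen entries (i, j). */
double max_sample_diff(const double *c1, const double *c2, int n, int samples) {
    double worst = 0.0;
    for (int t = 0; t < samples; ++t) {
        int i = rand() % n, j = rand() % n;
        double d = c1[(size_t)i * n + j] - c2[(size_t)i * n + j];
        if (d < 0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling blocked_multiply with s = 1, 2, 4 and 8 on the same inputs and comparing the results with max_sample_diff mirrors the ten-entry check in requirement 3. Comparing against a small tolerance instead of exact equality is a design choice of this sketch, since a parallel version may sum the partial block products in a different order. In the MPI version, the interval described in requirement 4 could be measured with MPI_Wtime on the root processor.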

Notes:
1. Please submit your source code and a readme (or PDF report) file on D2L.
2. Your readme (or PDF report) should include at least three parts: (1) instructions
on how to compile and run your code on the cluster, including the detailed
command line and arguments; (2) the running time recorded for each of the four
cases; (3) the ten verification results, including the random entry index (i, j) you
picked and the corresponding entry values for each of the four cases.
