King Fahd University of Petroleum & Minerals
Information and Computer Science Department
ICS 507: Design and Analysis of Parallel Algorithms
Project
(Due: Tuesday May 14, 2024 at Midnight)
In this project, we will study the parallelization of matrix multiplication with respect
to two divide and conquer algorithms. The straightforward sequential algorithm for
matrix multiplication is given below.
/* Sequential Version */
void multiply(int A[][N], int B[][N], int C[][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}
The two divide and conquer versions are the straightforward divide and conquer algorithm and
Strassen’s algorithm. The project will be carried out in phases.
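For orientation, the straightforward divide and conquer scheme can be sketched as follows. This is a minimal sketch under stated assumptions, not the required implementation: the names dc_multiply and dc_accumulate, the BASE_CASE cutoff, the flat row-major storage, and the use of long values are all illustrative choices. Each matrix is split into four quadrants and the product is formed from eight recursive quadrant multiplications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: blocks of this size or smaller use plain loops. */
#define BASE_CASE 2

/* Accumulates C += A * B for n x n blocks that live inside larger row-major
   arrays whose consecutive rows are `stride` elements apart. */
static void dc_accumulate(const long *A, const long *B, long *C,
                          size_t n, size_t stride)
{
    if (n <= BASE_CASE) {
        for (size_t i = 0; i < n; i++)
            for (size_t k = 0; k < n; k++)
                for (size_t j = 0; j < n; j++)
                    C[i * stride + j] += A[i * stride + k] * B[k * stride + j];
        return;
    }

    size_t h = n / 2;
    /* Top-left corners of the four quadrants of each matrix. */
    const long *A11 = A, *A12 = A + h, *A21 = A + h * stride, *A22 = A21 + h;
    const long *B11 = B, *B12 = B + h, *B21 = B + h * stride, *B22 = B21 + h;
    long *C11 = C, *C12 = C + h, *C21 = C + h * stride, *C22 = C21 + h;

    /* Eight recursive products, two per quadrant of C. */
    dc_accumulate(A11, B11, C11, h, stride);
    dc_accumulate(A12, B21, C11, h, stride);
    dc_accumulate(A11, B12, C12, h, stride);
    dc_accumulate(A12, B22, C12, h, stride);
    dc_accumulate(A21, B11, C21, h, stride);
    dc_accumulate(A22, B21, C21, h, stride);
    dc_accumulate(A21, B12, C22, h, stride);
    dc_accumulate(A22, B22, C22, h, stride);
}

/* Computes C = A * B for n x n row-major matrices; n must be a power of 2. */
void dc_multiply(const long *A, const long *B, long *C, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0;
    dc_accumulate(A, B, C, n, n);
}
```

Passing a stride instead of copying quadrants avoids temporary storage. Strassen's algorithm, by contrast, needs temporaries for its sums and differences of quadrants.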
Phase I (10 points): Background (Due Saturday April 27, 2024 at midnight)
In this phase, you will study and report on the two divide and conquer versions. First, you will
describe the straightforward divide and conquer version, write the algorithm, and analyze its
time complexity in terms of counting the number of scalar multiplications. Second, you will
describe Strassen’s divide and conquer algorithm, write the algorithm, and analyze its time
complexity, again in terms of the number of scalar multiplications. Compare and contrast the
two versions with respect to the efficiency of their sequential algorithms.
The outcome of this phase is a report on the above consisting of 2-4 pages.
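As a starting point for the analysis, the counts of scalar multiplications can be set up as recurrences. The sketch below assumes that 𝑛 is a power of 2 and that the recursion continues down to 1 × 1 blocks. The symbols M_DC and M_S are introduced here for illustration only.

```latex
% Assumption: n is a power of 2 and the recursion stops at 1 x 1 blocks.
% M(n) = number of scalar multiplications for an n x n product.
\begin{align*}
M_{\mathrm{DC}}(n) &= 8\,M_{\mathrm{DC}}(n/2), \quad M_{\mathrm{DC}}(1) = 1
  &&\Longrightarrow\quad M_{\mathrm{DC}}(n) = 8^{\log_2 n} = n^{3},\\
M_{\mathrm{S}}(n) &= 7\,M_{\mathrm{S}}(n/2), \quad M_{\mathrm{S}}(1) = 1
  &&\Longrightarrow\quad M_{\mathrm{S}}(n) = 7^{\log_2 n} = n^{\log_2 7} \approx n^{2.807}.
\end{align*}
```

The factor 8 corresponds to the eight quadrant products of the straightforward scheme, and the factor 7 to the seven products used by Strassen's algorithm.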
Phase II (70 points): Implementation, Experimentation and Analysis (Due Tuesday May
14, 2024 at midnight)
Question 1 [25 points] Program Implementation
Use OpenMP to develop
1. (5 points) The sequential multiplication version.
2. (5 points) A parallelization of the sequential multiplication version.
3. (5 points) The sequential straightforward divide and conquer version.
4. (5 points) A parallelization of the straightforward divide and conquer version.
5. (5 points) The sequential Strassen’s divide and conquer version.
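As a hedged illustration of item 2, one common way to parallelize the triple loop shown in the introduction is to distribute the rows of the result among OpenMP threads. The function name multiply_parallel, the flat row-major layout, and the use of long values are assumptions made for this sketch. Other decompositions (for example over columns or blocks) are equally valid.

```c
#include <assert.h>
#include <stddef.h>

/* Row-parallel variant of the triple loop. Each thread computes whole rows
   of C, so no two threads write to the same element.
   Build with an OpenMP flag (for example -fopenmp with GCC); without it the
   pragma is ignored and the loop simply runs sequentially. */
void multiply_parallel(const long *A, const long *B, long *C, int n)
{
    assert(A && B && C && n > 0);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            long sum = 0; /* private to the thread: declared inside the loop */
            for (int k = 0; k < n; k++)
                sum += A[(size_t)i * n + k] * B[(size_t)k * n + j];
            C[(size_t)i * n + j] = sum;
        }
    }
}
```

Accumulating into a local variable, rather than directly into C, keeps the inner loop free of shared writes.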
Your implementation needs to take the following considerations into account:
The input should be read from a text file containing the following information:
o The matrix dimension 𝑛 in the first line. The input matrices are going to be
𝑛 × 𝑛 matrices. You may assume that 𝑛 is a power of 2.
o 2𝑛² integer values, entered in row-major form starting from the second
line onward, representing the two input matrices, respectively. Each input
value is a random integer between -9 and 9, inclusive.
The values should all be of type “long integer”, so that you do not run into overflow
problems due to multiplying matrices of large sizes.
Your code should be able to support matrices of sizes up to 2 × 2 (or as far as
you can get).
There shall be ten output text files generated, two for each multiplication method.
The first file contains the resulting matrix. The filename shall consist of the input
filename, followed by the dimension of the output matrix n followed by the word
“output” followed by the method of multiplication. Please use the following for
the name of the multiplication method: Sequential, SequentialP,
StraightDivAndConq, StraightDivAndConqP and StrassenDivAndConq. For
example, if the input filename is input1 and the dimension of the matrices is
128 × 128, then the output files containing the results of the multiplication for
each method are:
input1_128_output_Sequential.txt
input1_128_output_SequentialP.txt
input1_128_output_StraightDivAndConq.txt
input1_128_output_StraightDivAndConqP.txt
input1_128_output_StrassenDivAndConq.txt
The second file contains the time (hh:mm:ss) taken by the matrix multiplication
algorithm. Note that you need to exclude the time of reading the input or writing
the output. The filename shall consist of the input filename, followed by the
dimension of the output matrix n followed by the word “info” followed by the
method of multiplication.
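The input and output conventions above can be sketched with a few helpers. This is an illustrative sketch only: the function names read_input, make_filename and format_hms are hypothetical, and the ".txt" extension and underscore separators for the "info" file are assumed by analogy with the "output" examples.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads n from the first line, then 2*n*n values: matrix A followed by
   matrix B, both row-major. Returns 0 on success, -1 on failure.
   On success the caller is responsible for freeing *A and *B. */
int read_input(const char *path, size_t *n, long **A, long **B)
{
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    if (fscanf(f, "%zu", n) != 1 || *n == 0) { fclose(f); return -1; }

    size_t count = *n * *n;
    *A = malloc(count * sizeof **A);
    *B = malloc(count * sizeof **B);
    int ok = (*A != NULL && *B != NULL);
    for (size_t i = 0; ok && i < count; i++)
        ok = (fscanf(f, "%ld", &(*A)[i]) == 1);
    for (size_t i = 0; ok && i < count; i++)
        ok = (fscanf(f, "%ld", &(*B)[i]) == 1);
    fclose(f);

    if (!ok) { free(*A); free(*B); *A = *B = NULL; return -1; }
    return 0;
}

/* Builds a name such as "input1_128_output_Sequential.txt".
   `kind` must be "output" or "info". */
void make_filename(char *buf, size_t size, const char *input_name,
                   size_t n, const char *kind, const char *method)
{
    assert(strcmp(kind, "output") == 0 || strcmp(kind, "info") == 0);
    snprintf(buf, size, "%s_%zu_%s_%s.txt", input_name, n, kind, method);
}

/* Formats elapsed seconds as hh:mm:ss; fractions of a second are dropped. */
void format_hms(double seconds, char *buf, size_t size)
{
    assert(seconds >= 0.0);
    unsigned long total = (unsigned long)seconds;
    snprintf(buf, size, "%02lu:%02lu:%02lu",
             total / 3600, (total / 60) % 60, total % 60);
}
```

Since many runs on small matrices finish in under a second, it may be worth recording the raw elapsed seconds alongside the hh:mm:ss value.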
Question 2 [35 points] Experimental Evaluation
Run the five implementations on different matrices with varying parameters of
dimensions, number of processors used, and base cases for the stopping criteria of the
recursion. Tabulate your results and give your assessment of the five
algorithms based on your experimentation.
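One possible way to collect the measurements is a small timing harness that fixes the number of threads, repeats each run, and keeps the best time. The sketch below is an assumption-laden illustration: multiply_fn, naive_multiply and time_multiply are hypothetical names, and naive_multiply merely stands in for whichever implementation is being measured. It uses omp_get_wtime and omp_set_num_threads when compiled with OpenMP and falls back to clock() otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Common signature assumed for all implementations under test. */
typedef void (*multiply_fn)(const long *A, const long *B, long *C, int n);

/* Placeholder implementation so that the harness can be tried on its own. */
void naive_multiply(const long *A, const long *B, long *C, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            long sum = 0;
            for (int k = 0; k < n; k++)
                sum += A[(size_t)i * n + k] * B[(size_t)k * n + j];
            C[(size_t)i * n + j] = sum;
        }
}

/* Wall-clock seconds with OpenMP; CPU seconds from clock() without it. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Runs fn `repeats` times with `threads` OpenMP threads and returns the
   smallest elapsed time in seconds. Only the multiplication is timed;
   reading the input and writing the output stay outside this function. */
double time_multiply(multiply_fn fn, const long *A, const long *B, long *C,
                     int n, int threads, int repeats)
{
    assert(fn && n > 0 && threads > 0 && repeats > 0);
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads; /* no effect without OpenMP */
#endif
    double best = -1.0;
    for (int r = 0; r < repeats; r++) {
        double start = now_seconds();
        fn(A, B, C, n);
        double elapsed = now_seconds() - start;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Taking the minimum over several repeats reduces the influence of other processes on the machine. Reporting the mean or median instead is an equally defensible choice as long as it is stated in the report.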
Question 3 [10 points]
Analyze the performance of each parallel algorithm theoretically by providing the
following information:
1. The shared memory model used for each parallel implementation.
2. The expected running time of each parallel implementation as a function of the
size 𝑛 and the number of processors 𝑝.
3. An explanation of any discrepancies between the theoretical analysis and the
actual performance of each implementation.
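When comparing theory with measurements, the standard definitions of speedup and efficiency are a convenient frame. The block below is a sketch, not a prescribed answer: the symbols T_p, S_p and E_p are introduced here, and the bound in the last line assumes the rows of the result are divided evenly among 𝑝 ≤ 𝑛 processors, ignoring memory contention and scheduling overhead.

```latex
% T_1(n): running time on one processor; T_p(n): running time on p processors.
\begin{align*}
S_p(n) &= \frac{T_1(n)}{T_p(n)}, &
E_p(n) &= \frac{S_p(n)}{p}, \\
% Assumption: row-wise partitioning of the triple loop, p <= n, no overheads.
T_p(n) &= \Theta\!\left(\frac{n^{3}}{p}\right).
\end{align*}
```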
Phase III (20 points): Presentation
In this phase, an online presentation of each project will be given, which will be attended by all
the students. The presentation should not exceed 10 minutes and should have the following
information:
a. Status of the project: what was implemented, what was not implemented, what
was implemented beyond the requirements (if any), and any limitations of your
implementation.
b. Description of the experimentation and summary of the results. This includes the
specifications of the machine used.
c. Conclusions.
The date and time of the presentation will be agreed upon later, in shaa Allaah.
IMPORTANT NOTES REGARDING THE DELIVERABLES
2. This project will be done in groups of 3 students.
3. Your submission must be a zip file containing the following directories:
a. src: Your Project Implementation [Answer to Question 1].
b. doc: A short README file on how to compile and run your code.
c. rep: The project report.
d. pres: The project presentation.
4. The report should contain the following:
a. Cover page.
b. Status of the project. This includes:
i. What has been implemented from the required algorithms.
ii. Any additional algorithms implemented.
iii. The limitations, if any, of your implementation(s) and/or the
difficulties encountered.
iv. Who did what in the project.
c. Experimentation Results
i. Specification of the machine used in running the algorithms.
ii. How many runs were performed, what sizes of matrices were used, etc.
[Answer to Question 2]
iii. Analysis of the performance of all algorithms. [Answer to Question 3]
d. Summary and Conclusion
e. Any references used in the development of the parallel algorithms or preparing
the background.