Assignment_4
You are given N sparse matrices Ai, 0 ≤ i < N (of the kind given in Assignment 1; you may employ
the generate_matrix function from Assignment 1 for testing as well). Your task is to multiply this
sequence of matrices: produce ∏Ai in the same format as the input matrices (all integers are 64-bit
long long, and all arithmetic is performed modulo the maximum long long value). This sequence
multiplication is to be parallelized across multiple nodes (fewer than 8), each with multiple CPU
cores (up to 16) and one GPU.
Combine MPI, OpenMP, and CUDA (same versions as you have used in the earlier assignments).
You will need to implement a CUDA kernel for multiplying two sparse matrices. You may reuse
your OpenMP parallelization of matrix multiplication. You will distribute these pairwise
multiplications across all available MPI processes.
The command line of your program has one input string: the path of a readable folder, call it F.
Folder F will contain N+1 files, named matrix1 .. matrixN, and size. The file
F/size contains two integers (separated by white-space). The first denotes N and the second denotes
k, the block size used for all matrices in this folder. Each file matrixi has the following format:
<Height:int> <Width:int>
<Number of non-zero blocks:int>
For each non-zero block:
<Block starting row number:int> <Block starting column number:int>
For each non-zero block (in the same order as above):
k*k values: <value:int> ... <value:int>
Correct matrix files ensure that the <Width> value in matrixi equals the <Height> value in
matrixi+1. It is possible (if rare) that a matrix's width or height is not divisible by k; in that case,
the last block row or column is truncated, so those blocks have fewer than k rows or columns.
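As a concreteness check on this format, here is a minimal reader sketch. It assumes a
whitespace-separated text file (consistent with the <...:int> notation above) and that every block
carries exactly k*k values; the struct and function names are mine, error handling is omitted, and
the truncated-block case from the previous paragraph would need extra care.

#include <cstdio>
#include <vector>

typedef long long ll;

struct SparseMatrix {
    int height = 0, width = 0, nblocks = 0;
    std::vector<int> row, col;   // top-left coordinates of each block
    std::vector<ll>  vals;       // nblocks * k * k values, block by block
};

SparseMatrix read_matrix(const char *path, int k)
{
    SparseMatrix m;
    FILE *f = std::fopen(path, "r");
    std::fscanf(f, "%d %d %d", &m.height, &m.width, &m.nblocks);
    m.row.resize(m.nblocks);
    m.col.resize(m.nblocks);
    for (int i = 0; i < m.nblocks; ++i)            // block coordinates first
        std::fscanf(f, "%d %d", &m.row[i], &m.col[i]);
    m.vals.resize((size_t)m.nblocks * k * k);
    for (ll &v : m.vals)                           // then the values, in the
        std::fscanf(f, "%lld", &v);                // same block order
    std::fclose(f);
    return m;
}

Writing the output matrix is the mirror image of this routine, emitted into the execution directory
rather than into F.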
The output will be a single matrix, produced in the execution directory with the name matrix, in
the same format as the input matrices. (Note that F may be read-only; do not attempt to write there.)
For performance tuning, expect input matrix dimensions to be around s*1024, with s a small single
digit. N may be up to a few hundred. The value of k will be small (e.g., 4, 16, 32). Fewer than 10% of
the values in each matrix will be non-zero.
Note that finding the most efficient order of multiplications in the parallel context is a hard
problem. Consider greedy heuristics that reduce the number of operations. For example, if
multiplying a particular pair of consecutive matrices produces fewer non-zero blocks, that pair
may be multiplied first. Remember that the time taken to compute the heuristic ordering is
included in the total multiplication time.
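The sketch below shows one cheap way to score adjacent pairs without forming any products: for
each block index t shared between the left matrix's block columns and the right matrix's block
rows, count the block-level multiplications the pair would require, and always multiply the
cheapest pair next. The maps and their meaning are assumptions of this sketch; other scores (for
example, the estimated number of non-zero blocks in the product) are equally legitimate.

#include <map>

// colsOfLeft[t]  = number of non-zero blocks of the left matrix whose
//                  block-column index is t;
// rowsOfRight[t] = number of non-zero blocks of the right matrix whose
//                  block-row index is t.
long long pair_cost(const std::map<int, int> &colsOfLeft,
                    const std::map<int, int> &rowsOfRight)
{
    long long cost = 0;
    for (const auto &e : colsOfLeft) {
        auto it = rowsOfRight.find(e.first);
        if (it != rowsOfRight.end())
            cost += (long long)e.second * it->second;  // k x k block products
    }
    return cost;
}

// Driver idea: while more than one matrix remains, pick the adjacent pair
// with the smallest pair_cost, multiply it (on whichever MPI rank is free),
// splice the product back into the chain, and recompute the costs of the
// two neighbouring pairs it touched.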
Submission:
Submit any number of source files, but you must also include a Makefile that produces an
executable named a4. The tester will call make, followed by a4 foldername (with the
appropriate mpirun, of course).
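One possible Makefile layout is sketched here; the source file names, compiler wrappers, and
CUDA library path are assumptions about your environment, not requirements.

# Hypothetical build: MPI + OpenMP host code in main.cpp, CUDA kernels
# in kernels.cu. Adjust CUDA_HOME and the flags to your cluster.
NVCC      ?= nvcc
MPICXX    ?= mpicxx
CUDA_HOME ?= /usr/local/cuda

a4: main.o kernels.o
	$(MPICXX) -fopenmp -o a4 main.o kernels.o -L$(CUDA_HOME)/lib64 -lcudart

main.o: main.cpp
	$(MPICXX) -fopenmp -O2 -c main.cpp

kernels.o: kernels.cu
	$(NVCC) -O2 -c kernels.cu

clean:
	rm -f a4 *.o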
In addition to the source files, also submit a report in the file report.pdf (only for part 2) explaining
your parallelization strategy. Include your performance analysis in the report itself. (No csv files are
required.) The total execution time will include file I/O times (and will be determined by the slowest
MPI process). You may also separately analyze the relative performance of your CUDA and
OpenMP parts.