Matrix Multiplication-Javan.

This document outlines a lab focused on matrix multiplication, covering serial, cache efficient, and parallel methods using programming concepts such as loops and conditionals. It requires basic programming knowledge in C or Java and provides detailed methodologies for implementing the different multiplication techniques, including pseudo code examples. The lab emphasizes the importance of cache locality and data parallelism, particularly in the context of using OpenMP for parallel processing.

Uploaded by

meghakr803

Matrix Multiplication

Course Level:
CS2

PDC Concepts Covered:

PDC Concept       Bloom Level
Concurrency       C
Data Parallel     A
Cache Locality    A

Programming Knowledge Prerequisites:

Basic programming knowledge in C or Java is required for this lab. More specifically, the skills below
are needed to complete this lab.
• Variable declaration
• If-else
• For loop, while loop
• Functions/methods

Tools Required:
Java: Java Development Kit (JDK) 8 or later

Prolog and Review:

One of the important concepts in programming is the if-else statement. Using the if-else statement,
a programmer can decide which actions the program takes. The if-else syntax is shown below.

    if (boolean_expression) {
        /* statement(s) will execute if the boolean expression is true */
    } else {
        /* statement(s) will execute if the boolean expression is false */
    }

If the Boolean expression evaluates to true, the if block is executed; otherwise the else block is
executed. Another important programming concept is looping. Using a loop, a programmer can
execute a set of statements multiple times. A common looping structure in programming
languages is the for loop. Consider the following example.

    for (initialization; condition; increment) {
        statement(s);
    }

The initialization is executed once upon entering the loop. Next, the condition is evaluated. If it is
true, the body of the loop is executed; otherwise the loop exits and the program continues with the
next statement after the for loop. After the body executes, control flow jumps to the
increment statement. Typically, the increment statement updates a loop control variable that is used
in the condition to determine whether the loop should continue. The condition is evaluated again and the
process repeats. Another common loop structure is the while loop. In a while loop,
statements are executed repeatedly as long as a given condition is true. The syntax is given below.

    while (condition) {
        statement(s);
    }
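
As a small illustration of the three constructs reviewed above, the sketch below counts the even
numbers in a range, first with a for loop and an if-else, then with a while loop. The class and
method names are made up for this example and are not part of the lab.

```java
// Hypothetical review example; all names here are illustrative only.
class ControlFlowReview {

    // Count the even numbers in [0, limit) with a for loop and an if-else.
    static int countEvensFor(int limit) {
        int evens = 0;
        for (int i = 0; i < limit; i++) {
            if (i % 2 == 0) {
                evens++;        // executed when the condition is true
            } else {
                // executed when the condition is false: nothing to count
            }
        }
        return evens;
    }

    // The same count written with a while loop.
    static int countEvensWhile(int limit) {
        int evens = 0;
        int i = 0;              // initialization happens before the loop
        while (i < limit) {     // condition is checked before every pass
            if (i % 2 == 0) {
                evens++;
            }
            i++;                // increment is written inside the body
        }
        return evens;
    }

    public static void main(String[] args) {
        System.out.println(countEvensFor(10));   // prints 5
        System.out.println(countEvensWhile(10)); // prints 5
    }
}
```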

For this lab you will also need some understanding of cache locality. Programs tend to reuse the
data and instructions they have used recently, or data and instructions stored near them.
There are two basic types of locality: temporal and spatial. With temporal locality, recently referenced
items are likely to be referenced again in the near future. With spatial locality, items whose addresses
are near a recently referenced item are likely to be referenced in the near future. Strong cache locality
improves performance because it reduces cache misses.
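
Spatial locality can be observed with a rough experiment. The sketch below, which is an illustration
and not part of the lab, sums the same row-major array twice: once along the rows (consecutive
addresses) and once along the columns (addresses that are a full row apart). Both methods return the
same sum; on most machines the row-order pass is faster, but the timings depend on the hardware and
the JVM, so no specific numbers are claimed here. The names LocalityDemo, sumRowOrder and
sumColumnOrder are assumptions.

```java
// Hypothetical demo of spatial locality on a row-major one-dimensional array.
class LocalityDemo {

    // Inner loop walks consecutive addresses: good spatial locality.
    static double sumRowOrder(double[] m, int rows, int cols) {
        double sum = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                sum += m[j + i * cols];
            }
        }
        return sum;
    }

    // Inner loop jumps cols elements at a time: poor spatial locality.
    static double sumColumnOrder(double[] m, int rows, int cols) {
        double sum = 0.0;
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                sum += m[j + i * cols];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int rows = 1000, cols = 1000;
        double[] m = new double[rows * cols];
        java.util.Arrays.fill(m, 1.0);

        long t0 = System.nanoTime();
        double a = sumRowOrder(m, rows, cols);
        long t1 = System.nanoTime();
        double b = sumColumnOrder(m, rows, cols);
        long t2 = System.nanoTime();

        System.out.println("row order:    sum=" + a + " time(ns)=" + (t1 - t0));
        System.out.println("column order: sum=" + b + " time(ns)=" + (t2 - t1));
    }
}
```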

Problem description:

In this lab, you will write programs to multiply two matrices. The matrices can be square or
rectangular, as long as the number of columns of A equals the number of rows of B. Because you will be
using a personal computer, keep the matrices smaller than 2000 by 2000. Consider the diagram below,
where matrices A and B have dimensions (M x K) and (K x N) respectively. The resultant matrix C
therefore has dimension (M x N).

[Figure 1: C (M x N) = A (M x K) x B (K x N)]
Fig 1: Matrix A and matrix B multiply to generate matrix C.

You will write three functions that multiply two matrices in a serial, a cache efficient and a
parallel way.
a) Serial method: The serial method is the one you learned in high school. It is also known as the
naïve method: a row of A is multiplied element by element with a column of B, and the
products are added to produce a single cell of C.
b) Cache efficient method: In this method, cache misses are reduced by multiplying one cell of A
with a whole row of B at a time. Each such step adds a partial result to one row of C.
c) Parallel method: In this method, you will divide the rows of A among the processors.
Each processor multiplies its part of A with the whole B matrix to generate the corresponding
part of C. Matrix B is not partitioned, to keep the program as simple as possible.

Methodology:

The serial and the cache efficient multiplications are straightforward because the matrices do not
need to be partitioned. To implement the parallel version you will partition the A matrix row-wise.
Each part of A will be multiplied with the whole B matrix. If the number of rows is not
evenly divisible by the number of processors, the last processor gets the extra rows of A.
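
The row split described above can be worked through with concrete numbers. The sketch below is an
illustration under the assumption that each processor's range is written as [first, last) with the
last row excluded; the names RowPartition and rowRange are hypothetical. With 10 rows and 3
processors, integer division gives 3 rows per processor and the last processor also takes the one
leftover row.

```java
// Hypothetical helper that computes the block of rows owned by one processor.
class RowPartition {

    // Returns {first, last}: rows first .. last-1 belong to processor `rank`.
    static int[] rowRange(int rank, int processorCount, int M) {
        int numParts = M / processorCount;   // rows per processor (integer division)
        int first = numParts * rank;
        // The last processor runs to M so it also picks up any leftover rows.
        int last = (rank == processorCount - 1) ? M : first + numParts;
        return new int[] { first, last };
    }

    public static void main(String[] args) {
        int M = 10, processors = 3;
        for (int rank = 0; rank < processors; rank++) {
            int[] r = rowRange(rank, processors, M);
            System.out.println("processor " + rank + ": rows " + r[0] + " to " + (r[1] - 1));
        }
        // prints:
        // processor 0: rows 0 to 2
        // processor 1: rows 3 to 5
        // processor 2: rows 6 to 9
    }
}
```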

[Figure 2: C (M x N) = A (M x K) x B (K x N), with the rows of A and of C split into an upper
block for Processor 1 and a lower block for Processor 2; B is not split]
Fig 2: Data Partitioning of A matrix among the two processors.

Figure 2 shows how matrix A is partitioned into two equal halves that are assigned to two different
processors. Both processors then concurrently multiply their half of A with the whole B matrix.

*Note to the instructor: The instructor can illustrate data parallelism by pointing out that the
processors perform the same computation on different pieces of data.

Implementation:

For this lab, you will implement three versions of matrix multiplication: serial, cache efficient
and parallel. Matrices can be stored in two-dimensional arrays, but for this lab a one-dimensional
array per matrix keeps the code simpler. Consider three matrices with the dimensions
below.

A: M x K (M and K are number of rows and columns of A matrix)

B: K x N (K and N are number of rows and columns of B matrix)
C: M x N (M and N are number of rows and columns of C matrix)
You can declare the three matrices as shown below.

    double[] A = new double[M*K];
    double[] B = new double[K*N];
    double[] C = new double[M*N];

Now, any 2D cell location, say (i, j), where i is the row and j is the column, maps to the following
index in the one-dimensional array (row-major order).

    j + (i * num_of_columns)
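
As a quick check of this index expression, the sketch below wraps it in a small helper and uses it
on a 3 x 4 matrix. The names FlatIndex and index are hypothetical and exist only for this example.

```java
// Hypothetical helper that maps a (row, column) pair to a one-dimensional index.
class FlatIndex {

    // Row-major mapping: skip i complete rows, then move j cells into the row.
    static int index(int i, int j, int numOfColumns) {
        return j + (i * numOfColumns);
    }

    public static void main(String[] args) {
        int rows = 3, cols = 4;
        double[] matrix = new double[rows * cols];

        // Cell (1, 2) lives at index 2 + 1*4 = 6.
        matrix[index(1, 2, cols)] = 7.5;

        System.out.println(index(1, 2, cols));   // prints 6
        System.out.println(matrix[6]);           // prints 7.5
    }
}
```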

The A and B matrices must be filled with values before they are multiplied. The code below fills
each cell with i + j, a simple and reproducible stand-in for a random floating-point number.

    static void initialize_matrix(double[] matrix, int row, int col) {
        for (int i = 0; i < row; i++) {
            for (int j = 0; j < col; j++) {
                // Cell (i, j) in row-major order; i + j stands in for a random number
                matrix[j + i * col] = i + j;
            }
        }
    }
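
The listing above stores i + j in each cell. If genuinely random values are wanted, one possible
variant uses java.util.Random, as sketched below. The class name, the method name
initializeRandom and the seed parameter are assumptions added for this example; a fixed seed keeps
runs repeatable, which helps when comparing the three multiplication methods.

```java
import java.util.Random;

// Hypothetical variant of the initialization routine using pseudo-random values.
class RandomInit {

    // Fill a row x col matrix (stored row-major) with values in [0.0, 1.0).
    static void initializeRandom(double[] matrix, int row, int col, long seed) {
        Random rng = new Random(seed);   // same seed gives the same sequence
        for (int i = 0; i < row; i++) {
            for (int j = 0; j < col; j++) {
                matrix[j + i * col] = rng.nextDouble();
            }
        }
    }

    public static void main(String[] args) {
        int M = 2, K = 3;
        double[] A = new double[M * K];
        initializeRandom(A, M, K, 42L);
        System.out.println(java.util.Arrays.toString(A));
    }
}
```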

Now, the serial implementation can be written with three nested for loops. Consider the following
pseudo code.

Serial_matrix_multiplication()
Inputs:
    A - Array of matrix A
    B - Array of matrix B
    C - Array of matrix C
    M - Rows of matrix A
    K - Columns of matrix A
    N - Columns of matrix B
Outputs:
    C - Computed cells of C matrix
Begin
    Loop index i from 0 to M-1
        Loop index j from 0 to N-1
            Set my_sum to zero
            Loop index k from 0 to K-1
                Multiply A[k + i*K] with B[j + k*N] and add the product to my_sum
            Store my_sum to C[j + i*N]
End
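
The pseudo code above might be translated to Java roughly as follows. This is a minimal sketch, not
the reference solution for the lab; the class name SerialMatMul and the method name multiply are
assumptions, while the array layout and index expressions follow the text.

```java
// Hypothetical Java rendering of the serial (i, j, k) pseudo code.
class SerialMatMul {

    // C (M x N) = A (M x K) * B (K x N); all matrices are row-major 1D arrays.
    static void multiply(double[] A, double[] B, double[] C, int M, int K, int N) {
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                double mySum = 0.0;
                for (int k = 0; k < K; k++) {
                    // Row i of A times column j of B
                    mySum += A[k + i * K] * B[j + k * N];
                }
                C[j + i * N] = mySum;
            }
        }
    }

    public static void main(String[] args) {
        int M = 2, K = 3, N = 2;
        double[] A = {1, 2, 3,
                      4, 5, 6};
        double[] B = {7, 8,
                      9, 10,
                      11, 12};
        double[] C = new double[M * N];
        multiply(A, B, C, M, K, N);
        System.out.println(java.util.Arrays.toString(C)); // prints [58.0, 64.0, 139.0, 154.0]
    }
}
```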

In the cache efficient version, the second and third loops are interchanged. The innermost loop
then walks along a row of B instead of down a column of B. This reduces cache misses because
each matrix is stored in row-major order, so the elements of a row are adjacent in memory.

Cache_efficient_matrix_multiplication()
Inputs:
    A - Array of matrix A
    B - Array of matrix B
    C - Array of matrix C
    M - Rows of matrix A
    K - Columns of matrix A
    N - Columns of matrix B
Outputs:
    C - Computed cells of C matrix
Begin
    Set every cell of C to zero
    Loop index i from 0 to M-1
        Loop index k from 0 to K-1
            Set temp_var to A[k + i*K]
            Loop index j from 0 to N-1
                Multiply temp_var with B[j + k*N] and add the product to C[j + i*N]
End
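
A possible Java version of the loop-interchanged pseudo code is sketched below. As before, the class
and method names are assumptions. Because this version accumulates into C, the sketch clears C
first; a freshly allocated Java array is already zero, so the fill only matters when C is reused.

```java
// Hypothetical Java rendering of the cache efficient (i, k, j) pseudo code.
class CacheEfficientMatMul {

    // C (M x N) = A (M x K) * B (K x N); all matrices are row-major 1D arrays.
    static void multiply(double[] A, double[] B, double[] C, int M, int K, int N) {
        java.util.Arrays.fill(C, 0.0);          // C is accumulated into below
        for (int i = 0; i < M; i++) {
            for (int k = 0; k < K; k++) {
                double tempVar = A[k + i * K];  // reused for the whole inner loop
                for (int j = 0; j < N; j++) {
                    // Row k of B and row i of C are both read left to right
                    C[j + i * N] += tempVar * B[j + k * N];
                }
            }
        }
    }

    public static void main(String[] args) {
        int M = 2, K = 3, N = 2;
        double[] A = {1, 2, 3,
                      4, 5, 6};
        double[] B = {7, 8,
                      9, 10,
                      11, 12};
        double[] C = new double[M * N];
        multiply(A, B, C, M, K, N);
        System.out.println(java.util.Arrays.toString(C)); // prints [58.0, 64.0, 139.0, 154.0]
    }
}
```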

In the above pseudo code, variable temp_var is set to A[k + i*K] and is reused in every iteration of
the third loop, so it stays in a register or in the cache until that loop ends. This is an example of
the temporal cache locality we discussed earlier. When B[j + k*N] is first referenced in the third
loop, a whole cache block of consecutive elements of that row is loaded from main memory into
the cache. The following iterations find their data already in the cache, which reduces the time
the processor spends reading memory. This is an example of spatial cache locality.

At this point, you have seen the serial and cache efficient implementations of matrix
multiplication. Now, we turn to the parallel implementation. As before, you will use OpenMP
threads for this lab. To spawn threads, the directive below is placed before the code block that is
run by the threads.

    //#omp parallel num_threads(thread_count) shared(A, B, C, M, N, K, thread_count)

As we discussed earlier, you will partition the A matrix and assign the parts to the threads. The
matrix B is not partitioned because each thread needs the entire B matrix. You can do the
partitioning with the code snippet below.

    int my_rank = Pyjama.omp_get_thread_num();
    int num_parts = M/thread_count;
    int my_first_i = num_parts * my_rank;
    int my_last_i;
    if (my_rank == thread_count-1) my_last_i = M;
    else my_last_i = my_first_i + num_parts;

In the above code, M is the number of rows of the A matrix. Every thread calculates the first row
(my_first_i) and the end row (my_last_i, not included) of its part of A. Thus, every thread works on
a smaller block of A (num_parts x K, or a few more rows for the last thread) and multiplies it with
the B matrix at the same time as the other threads.
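
To show the whole parallel method in one runnable piece, the sketch below applies the same row
partitioning with plain java.lang.Thread objects. This is a substitute chosen for illustration: the
lab itself uses the //#omp directive and Pyjama.omp_get_thread_num(), and this sketch does not use
Pyjama at all. The class name ThreadedMatMul, the method name multiply and the camel-case variable
names are assumptions; the inner loops follow the cache efficient ordering.

```java
// Hypothetical plain-thread version of the parallel method (not Pyjama).
class ThreadedMatMul {

    // C (M x N) = A (M x K) * B (K x N), rows of A split among threadCount threads.
    static void multiply(double[] A, double[] B, double[] C,
                         int M, int K, int N, int threadCount) {
        Thread[] workers = new Thread[threadCount];
        final int numParts = M / threadCount;

        for (int t = 0; t < threadCount; t++) {
            final int myRank = t;   // plays the role of omp_get_thread_num()
            workers[t] = new Thread(() -> {
                int myFirstI = numParts * myRank;
                // The last thread also takes any leftover rows.
                int myLastI = (myRank == threadCount - 1) ? M : myFirstI + numParts;

                for (int i = myFirstI; i < myLastI; i++) {
                    for (int j = 0; j < N; j++) {
                        C[j + i * N] = 0.0;     // each thread clears only its own rows
                    }
                    for (int k = 0; k < K; k++) {
                        double tempVar = A[k + i * K];
                        for (int j = 0; j < N; j++) {
                            C[j + i * N] += tempVar * B[j + k * N];
                        }
                    }
                }
            });
            workers[t].start();
        }

        // Wait for every thread so that C is complete when this method returns.
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    public static void main(String[] args) {
        int M = 2, K = 3, N = 2;
        double[] A = {1, 2, 3,
                      4, 5, 6};
        double[] B = {7, 8,
                      9, 10,
                      11, 12};
        double[] C = new double[M * N];
        multiply(A, B, C, M, K, N, 2);
        System.out.println(java.util.Arrays.toString(C)); // prints [58.0, 64.0, 139.0, 154.0]
    }
}
```

No locking is needed in this sketch because the threads write to disjoint rows of C and only read
A and B.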
