
Lecture 6: Parallel Matrix Algorithms (part 3)

A Simple Parallel Matrix-Matrix Multiplication

Let A = [a_ij] and B = [b_ij] be n × n matrices. Compute C = AB.
• Computational complexity of the sequential algorithm: O(n³).
• Partition A and B into p square blocks A_{i,j} and B_{i,j} (0 ≤ i, j < √p) of size (n/√p) × (n/√p) each.
• Use a Cartesian topology to set up the process grid (a minimal MPI sketch follows below). Process P_{i,j} initially stores A_{i,j} and B_{i,j} and computes block C_{i,j} of the result matrix.
• Remark: Computing submatrix C_{i,j} requires all submatrices A_{i,k} and B_{k,j} for 0 ≤ k < √p.
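
A minimal sketch of the grid setup, assuming MPI (which the later slides use) and that p is a perfect square; the names setup_grid, grid_comm and coords are illustrative, not part of the lecture:

#include <mpi.h>
#include <math.h>

/* Sketch: build a √p × √p Cartesian process grid.  Assumes p is a perfect square. */
void setup_grid(MPI_Comm *grid_comm, int coords[2])
{
    int p, my_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int q = (int)(sqrt((double)p) + 0.5);   /* q = √p processes per row and per column */
    int dims[2]    = {q, q};
    int periods[2] = {1, 1};                /* wraparound links, needed for the circular shifts later */

    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, grid_comm);
    MPI_Comm_rank(*grid_comm, &my_rank);
    MPI_Cart_coords(*grid_comm, my_rank, 2, coords);  /* coords = (i, j) of process P_{i,j} */
}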

• Algorithm:
  – Perform an all-to-all broadcast of the blocks of A in each row of processes.
  – Perform an all-to-all broadcast of the blocks of B in each column of processes.
  – Each process P_{i,j} then computes C_{i,j} = Σ_{k=0}^{√p−1} A_{i,k} B_{k,j} (one possible MPI realization of the broadcasts is sketched below).
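
One possible realization of the two broadcasts, assuming the grid communicator from the sketch above and a block dimension nb = n/√p; MPI_Cart_sub splits the grid into row and column communicators, and MPI_Allgather performs the all-to-all broadcast inside each. The buffer names are illustrative:

#include <mpi.h>

/* Sketch: all-to-all broadcast of the local A block along the process row and of the
   local B block along the process column.  allA and allB must each hold √p blocks. */
void broadcast_blocks(MPI_Comm grid_comm, int nb,
                      const double *Ablock, const double *Bblock,
                      double *allA, double *allB)
{
    MPI_Comm row_comm, col_comm;
    int keep_row[2] = {0, 1};   /* drop dimension 0: the processes of one row form a communicator */
    int keep_col[2] = {1, 0};   /* drop dimension 1: the processes of one column form a communicator */

    MPI_Cart_sub(grid_comm, keep_row, &row_comm);
    MPI_Cart_sub(grid_comm, keep_col, &col_comm);

    /* After these calls every process of a row holds all A blocks of that row and
       every process of a column holds all B blocks of that column. */
    MPI_Allgather(Ablock, nb * nb, MPI_DOUBLE, allA, nb * nb, MPI_DOUBLE, row_comm);
    MPI_Allgather(Bblock, nb * nb, MPI_DOUBLE, allB, nb * nb, MPI_DOUBLE, col_comm);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
}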

Performance Analysis
• √p rows of all-to-all broadcasts, each among a group of √p processes. The message size is n²/p, so the communication time is t_s log √p + t_w (√p − 1) n²/p.
• √p columns of all-to-all broadcasts, communication time: t_s log √p + t_w (√p − 1) n²/p.
• Computation time: √p × (n/√p)³ = n³/p.
• Parallel time: T_p = n³/p + 2(t_s log √p + t_w (√p − 1) n²/p).

Memory Efficiency of the Simple Parallel Algorithm

• Not memory efficient:
  – Each process P_{i,j} holds 2√p blocks (the A_{i,k} and the B_{k,j}).
  – Each process therefore needs Θ(n²/√p) memory.
  – Total memory over all the processes is Θ(n² × √p), i.e., √p times the memory of the sequential algorithm.

Cannon’s Algorithm for Matrix-Matrix Multiplication

Goal: to improve the memory efficiency.

Let A = [a_ij] and B = [b_ij] be n × n matrices. Compute C = AB.
• Partition A and B into p square blocks A_{i,j} and B_{i,j} (0 ≤ i, j < √p) of size (n/√p) × (n/√p) each.
• Use a Cartesian topology to set up the process grid. Process P_{i,j} initially stores A_{i,j} and B_{i,j} and computes block C_{i,j} of the result matrix.
• Remark: Computing submatrix C_{i,j} requires all submatrices A_{i,k} and B_{k,j} for 0 ≤ k < √p.
• The contention-free formula:
  C_{i,j} = Σ_{k=0}^{√p−1} A_{i, (i+j+k) mod √p} B_{(i+j+k) mod √p, j}
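
For example, with √p = 3 the formula gives, for process P_{0,1} (i = 0, j = 1):
  C_{0,1} = A_{0,1} B_{1,1} + A_{0,2} B_{2,1} + A_{0,0} B_{0,1}    (k = 0, 1, 2).
At k = 0 the processes of row 0 use A_{0,0}, A_{0,1} and A_{0,2} (one block each), and the processes of column 1 use B_{1,1}, B_{2,1} and B_{0,1}, so no two processes need the same block at the same step; this is what makes the schedule contention-free.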

Cannon’s Algorithm

// make the initial alignment
for i, j := 0 to √p − 1 do
    Send block A_{i,j} to process (i, (j − i + √p) mod √p) and block B_{i,j} to process ((i − j + √p) mod √p, j);
endfor;
Process P_{i,j} multiplies the received submatrices together and adds the result to C_{i,j};

// compute-and-shift: a sequence of single-step shifts pairs up A_{i,k} and B_{k,j}
// on process P_{i,j}, which accumulates C_{i,j} = C_{i,j} + A_{i,k} B_{k,j}
for step := 1 to √p − 1 do
    Shift A_{i,j} one step left (with wraparound) and B_{i,j} one step up (with wraparound);
    Process P_{i,j} multiplies the received submatrices together and adds the result to C_{i,j};
endfor;

Remark: In the initial alignment, the send operation shifts A_{i,j} to the left (with wraparound) by i steps and shifts B_{i,j} up (with wraparound) by j steps. An MPI sketch of the compute-and-shift phase follows.
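
The pseudocode maps directly onto MPI. Below is a compact sketch of the compute-and-shift phase, assuming the periodic √p × √p Cartesian communicator from the earlier sketch, row-major blocks of dimension nb = n/√p, and that the initial alignment has already been done; matmul_add and all variable names are illustrative:

#include <mpi.h>

/* Illustrative helper: C += A * B for nb × nb row-major blocks. */
static void matmul_add(int nb, const double *A, const double *B, double *C)
{
    for (int i = 0; i < nb; i++)
        for (int k = 0; k < nb; k++)
            for (int j = 0; j < nb; j++)
                C[i * nb + j] += A[i * nb + k] * B[k * nb + j];
}

/* Sketch of Cannon's compute-and-shift phase (q = √p).  Assumes Ablock and Bblock
   have already been through the initial alignment. */
void cannon_compute_shift(MPI_Comm grid_comm, int q, int nb,
                          double *Ablock, double *Bblock, double *Cblock)
{
    int left, right, up, down;
    MPI_Status status;

    /* Neighbours along the row (dimension 1) and the column (dimension 0);
       with displacement -1 the source is the right/lower neighbour and the
       destination is the left/upper neighbour. */
    MPI_Cart_shift(grid_comm, 1, -1, &right, &left);
    MPI_Cart_shift(grid_comm, 0, -1, &down, &up);

    matmul_add(nb, Ablock, Bblock, Cblock);   /* multiply the aligned blocks */

    for (int step = 1; step < q; step++) {
        /* Shift A one step left and B one step up (with wraparound), then multiply. */
        MPI_Sendrecv_replace(Ablock, nb * nb, MPI_DOUBLE, left, 1, right, 1,
                             grid_comm, &status);
        MPI_Sendrecv_replace(Bblock, nb * nb, MPI_DOUBLE, up, 2, down, 2,
                             grid_comm, &status);
        matmul_add(nb, Ablock, Bblock, Cblock);
    }
}

The initial alignment can reuse the same pattern: calling MPI_Cart_shift with a displacement of −i along the row (for A) and −j along the column (for B) yields the source and destination ranks for the i-step and j-step circular shifts, again combined with MPI_Sendrecv_replace.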
Cannon’s Algorithm for 3 × 3 Matrices

[Figure: A and B initially, after the initial alignment, after shift step 1, and after shift step 2.]
Performance Analysis
• In the initial alignment step, the maximum distance over which a block shifts is √p − 1.
  – The circular shift operations in the row and column directions take time t_comm = 2(t_s + t_w n²/p).
• Each of the √p single-step shifts in the compute-and-shift phase takes time t_s + t_w n²/p.
• Multiplying √p submatrices of size (n/√p) × (n/√p) takes time n³/p.
• Parallel time: T_p = n³/p + 2√p(t_s + t_w n²/p) + 2(t_s + t_w n²/p).
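
A rough, purely illustrative comparison (all numbers below are assumptions, not from the lecture): take n = 1024 and p = 16, so √p = 4, n²/p = 65,536 and n³/p = 67,108,864, and let t_s = 100 and t_w = 1 in units of one multiply-add, with log taken base 2.
  – Simple algorithm communication: 2(t_s log √p + t_w(√p − 1) n²/p) = 2(200 + 196,608) = 393,616.
  – Cannon’s algorithm communication: 2√p(t_s + t_w n²/p) + 2(t_s + t_w n²/p) = 10 × 65,636 = 656,360.
Both terms are around one percent of the n³/p computation term, so the two algorithms run in essentially the same time in this setting; the gain from Cannon’s algorithm is the factor-√p reduction in total memory noted earlier.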

int MPI_Sendrecv_replace( void *buf, int count,
    MPI_Datatype datatype, int dest, int sendtag, int source,
    int recvtag, MPI_Comm comm, MPI_Status *status );
• Executes a blocking send and a blocking receive. The same buffer is used for both the send and the receive, so the message sent is replaced by the message received.
• buf [in/out]: initial address of the send and receive buffer.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])


{
int myid, numprocs, left, right;
int buffer[10];
MPI_Request request;
MPI_Status status;

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);

right = (myid + 1) % numprocs;


left = myid - 1;
if (left < 0)
left = numprocs - 1;

MPI_Sendrecv_replace(buffer, 10, MPI_INT, left, 123, right, 123, MPI_COMM_WORLD,


&status);

MPI_Finalize();
return 0; 11
}
