06 cmsc416 Algorithms
Parallel Algorithms
Abhinav Bhatele, Alan Sussman
Matrix multiplication
https://en.wikipedia.org/wiki/Matrix_multiplication
A (rows i, columns k)    B (rows k, columns j)    C (rows i, columns j)
A00 A01 A02 A03          B00 B01 B02 B03          C00 C01 C02 C03
A10 A11 A12 A13          B10 B11 B12 B13          C10 C11 C12 C13
A20 A21 A22 A23          B20 B21 B22 B23          C20 C21 C22 C23
A30 A31 A32 A33          B30 B31 B32 B33          C30 C31 C32 C33
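The block layout above can be made concrete with a small sequential sketch. It assumes square matrices stored as Python lists and a block count `nb` that divides the matrix size; the function names (`matmul`, `get_block`, `blocked_matmul`) are illustrative and do not come from the slides. Each block C_ij is accumulated as the sum over k of A_ik * B_kj.

```python
def matmul(A, B):
    """Plain triple-loop product of an n x m matrix A and an m x p matrix B."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def get_block(M, bi, bj, bs):
    """Return the bs x bs block of M at block-row bi, block-column bj."""
    return [row[bj * bs:(bj + 1) * bs] for row in M[bi * bs:(bi + 1) * bs]]


def blocked_matmul(A, B, nb):
    """Multiply square matrices by blocks: C_ij = sum over k of A_ik * B_kj."""
    n = len(A)
    bs = n // nb  # block size; assumes nb divides n
    C = [[0] * n for _ in range(n)]
    for bi in range(nb):
        for bj in range(nb):
            acc = [[0] * bs for _ in range(bs)]
            for bk in range(nb):
                prod = matmul(get_block(A, bi, bk, bs), get_block(B, bk, bj, bs))
                for r in range(bs):
                    for c in range(bs):
                        acc[r][c] += prod[r][c]
            # Write the finished block back into its place in C.
            for r in range(bs):
                C[bi * bs + r][bj * bs:(bj + 1) * bs] = acc[r]
    return C
```

With `nb = 4` on an 8 x 8 input, the loop over `(bi, bj)` visits the same sixteen blocks C00 through C33 shown in the layout.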
https://link.springer.com/chapter/10.1007/978-3-319-67630-2_36
• Each process requires the other processes in its row and column to send it A and B blocks so that it can
compute the final values of its sub-block
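As a rough illustration of which processes are involved, the sketch below lists the ranks that share a row or a column with a given process. It assumes ranks are numbered row by row on a q x q grid; `row_and_column_peers` is a hypothetical helper name, not something defined in the slides.

```python
def row_and_column_peers(rank, q):
    """Ranks sharing a grid row or column with `rank` on a q x q grid.

    Assumes row-major numbering: rank = row * q + col.
    """
    row, col = divmod(rank, q)
    row_peers = [row * q + c for c in range(q) if c != col]  # hold the needed A blocks
    col_peers = [r * q + col for r in range(q) if r != row]  # hold the needed B blocks
    return row_peers, col_peers


print(row_and_column_peers(5, 4))  # ([4, 6, 7], [1, 9, 13])
```

On a 4 x 4 grid, process 5 therefore depends on three row peers for A blocks and three column peers for B blocks.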
2D process grid
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
[Figure: blocks of A and B laid out on the 4 x 4 process grid. Row 1 of A is shifted left by one position (A11 A12 A13 A10) and row 2 by two positions (A22 A23 A20 A21). Columns of B are shifted upward in the same way (column 2 becomes B22 B32 B02 B12), then shifted up by one more position in the next step (B32 B02 B12 B22).]
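The shifting pattern in the figure can be simulated in a few lines. This is a minimal sketch under two assumptions: each process holds a single scalar "block", and the grid is stored as nested lists rather than distributed over real processes. The pattern matches what is commonly called Cannon's algorithm, although the slides do not name it here. The names `skew` and `shifted_grid_multiply` are illustrative.

```python
def skew(A, B):
    """Initial alignment: shift row i of A left by i, column j of B up by j."""
    q = len(A)
    a = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    return a, b


def shifted_grid_multiply(A, B):
    """Simulate the shift-and-multiply scheme on a q x q grid of processes."""
    q = len(A)
    a, b = skew(A, B)
    C = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Every process multiplies the pair of blocks it currently holds.
        for i in range(q):
            for j in range(q):
                C[i][j] += a[i][j] * b[i][j]
        # Then A moves one step left along rows, B one step up along columns.
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b = [[b[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return C


# Reproduce the alignment from the figure using block labels instead of numbers.
labels_A = [[f"A{i}{j}" for j in range(4)] for i in range(4)]
labels_B = [[f"B{i}{j}" for j in range(4)] for i in range(4)]
a0, b0 = skew(labels_A, labels_B)
print(a0[1])                         # ['A11', 'A12', 'A13', 'A10']
print([b0[i][2] for i in range(4)])  # ['B22', 'B32', 'B02', 'B12']
```

After the initial alignment, process (i, j) sees the pair A_ik, B_kj with k = (i + j + s) mod q at step s, so q steps cover every k exactly once.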
• Data movement is done only once before computation and once after computation
[Figure: three-dimensional decomposition with axes i, j, k. Processes 0, 1, 2 hold A00, A01, A02; processes 3, 4, 5 hold A10, A11, A12 and B00, B10, B20; processes 12, 13, 14 hold A10, A11, A12 and B01, B11, B21; processes 15, 16, 17 hold A20, A21, A22. Processes 12, 13, 14 compute A10 * B01, A11 * B11, A12 * B21. Blocks of C: C00 C01 C10 C11 C20 C21.]
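One way to read the figure is as a three-dimensional decomposition in which process (i, j, k) receives A_ik and B_kj once, computes a single product, and the products along k are then combined. The sketch below simulates that reading with scalar blocks; the function name `multiply_3d` and the dictionary of partial products are illustrative assumptions, not the slides' implementation.

```python
def multiply_3d(A, B):
    """Simulate a 3D decomposition: one product per process, then reduce over k.

    A is n x m and B is m x p, so n * p * m logical processes are used.
    Returns the product C and the number of processes simulated.
    """
    n, m, p = len(A), len(B), len(B[0])
    # Step 1: data is distributed once, and process (i, j, k) computes one product.
    partial = {}
    for i in range(n):
        for j in range(p):
            for k in range(m):
                partial[(i, j, k)] = A[i][k] * B[k][j]
    # Step 2: a reduction along k combines the products into C[i][j].
    C = [[sum(partial[(i, j, k)] for k in range(m)) for j in range(p)]
         for i in range(n)]
    return C, len(partial)
```

With a 3 x 3 matrix A and a 3 x 2 matrix B this uses 18 logical processes, the same count as ranks 0 through 17 in the figure, and the products A10 * B01, A11 * B11 and A12 * B21 reduce to C11.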
• Reduction
• All-to-all
• Naive algorithm: every process sends the data pair-wise to all other processes
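The naive all-to-all can be sketched as a simulation, assuming `send[src][dst]` is the item that process `src` wants to deliver to process `dst`. The function name and the message counter are illustrative; a real program would use the message-passing library's collective rather than this loop.

```python
def naive_all_to_all(send):
    """Simulate pairwise all-to-all among p processes.

    send[src][dst] is the item src sends to dst. Returns the receive buffers,
    where recv[dst][src] is the item dst got from src, and the message count.
    """
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    messages = 0
    for src in range(p):
        for dst in range(p):
            recv[dst][src] = send[src][dst]
            if src != dst:
                messages += 1  # a process's own item needs no message
    return recv, messages


send = [[f"{s}->{d}" for d in range(4)] for s in range(4)]
recv, messages = naive_all_to_all(send)
print(recv[2])   # ['0->2', '1->2', '2->2', '3->2']
print(messages)  # 12
```

The count grows as p * (p - 1), which is why the pairwise scheme is called naive.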
https://www.codeproject.com/Articles/896437/A-Gentle-Introduction-to-the-Message-Passing-Inter
https://en.wikipedia.org/wiki/Hypercube