02 cmsc416 Parallel
• Assignment 0 will be posted tonight (Sep 3, 11:59 pm) and is due on Sep 10 at 11:59 pm
SPMD model
• Decide the serial algorithm first
for (i ...) {
    for (j ...) {
        A_new[i, j] = (A[i, j] + A[i-1, j] + A[i+1, j] + A[i, j-1] + A[i, j+1]) * 0.2
        ...
    }
}
For correctness, we have to ensure that elements in A are not overwritten before they are read in the same timestep/iteration.
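The stencil update above can be sketched in C as follows. This is a minimal illustration, not the course's reference code: the function name `stencil_step`, the row-major layout, and the choice to copy boundary cells unchanged are all assumptions, since the slide elides the loop bounds. Writing into a separate `A_new` buffer is what guarantees that no element of `A` is overwritten before it is read within a timestep.

```c
#include <assert.h>
#include <stddef.h>

/* One sweep of the 5-point averaging stencil on an n x n grid stored
 * row-major. Hypothetical helper: boundary handling is an assumption
 * (boundary cells are copied unchanged). */
void stencil_step(size_t n, const double *A, double *A_new)
{
    /* Separate input and output buffers: every read in this sweep
     * sees values from the previous timestep only. */
    assert(n >= 1 && A != A_new);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            size_t k = i * n + j;           /* flat index of A[i, j] */
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1) {
                A_new[k] = A[k];            /* assumed fixed boundary */
            } else {
                /* A[i, j] plus its north, south, west, east neighbors */
                A_new[k] = (A[k] + A[k - n] + A[k + n]
                            + A[k - 1] + A[k + 1]) * 0.2;
            }
        }
    }
}
```

Between timesteps the caller would swap the two buffers rather than copy them.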
• 1D decomposition
• Divide rows (or columns) among processes
• 2D decomposition
• Divide both rows and columns (2D blocks) among processes
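One way to turn these decompositions into index ranges is sketched below. The names `range_t`, `block_range`, and `block_range_2d` are hypothetical, and two conventions are assumed: leftover items go to the lowest ranks, and ranks on the 2D process grid are numbered row-major.

```c
#include <assert.h>

/* Half-open index range [lo, hi) owned by one process. */
typedef struct { int lo, hi; } range_t;

/* 1D decomposition: split n rows (or columns) among p processes as
 * evenly as possible; the first n % p ranks get one extra item. */
range_t block_range(int n, int p, int rank)
{
    assert(n >= 0 && p > 0 && rank >= 0 && rank < p);
    int base = n / p, extra = n % p;
    range_t r;
    r.lo = rank * base + (rank < extra ? rank : extra);
    r.hi = r.lo + base + (rank < extra ? 1 : 0);
    return r;
}

/* 2D decomposition on an assumed pr x pc process grid: rows are split
 * by the process's grid row, columns by its grid column. */
void block_range_2d(int nrows, int ncols, int pr, int pc, int rank,
                    range_t *rows, range_t *cols)
{
    assert(pr > 0 && pc > 0 && rank >= 0 && rank < pr * pc);
    *rows = block_range(nrows, pr, rank / pc);
    *cols = block_range(ncols, pc, rank % pc);
}
```

With 10 rows and 4 processes, for example, ranks 0 and 1 own 3 rows each and ranks 2 and 3 own 2 rows each.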
pSum[0] = A[0]
pSum[i] = pSum[i-1] + A[i] for i > 0
A     1  2  3  4  5  6 …
pSum  1  3  6 10 15 21 …
Input      2  8  3  5  7  4  1  6
Stride 1   2 10 11  8 12 11  5  7
Stride 2   2 10 13 18 23 19 17 18
Stride 4   2 10 13 18 25 29 30 36
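The stride-doubling steps shown above can be reproduced with the short C sketch below. It runs serially and only illustrates the data flow: at each stride, element i adds the value that element i - stride held at the end of the previous step. The function name `scan_stride_doubling` and the scratch buffer `prev` are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* In-place inclusive prefix sum by stride doubling (strides 1, 2, 4, ...).
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int scan_stride_doubling(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0) return 0;

    int *prev = malloc(n * sizeof *prev);
    if (prev == NULL) return -1;

    for (size_t stride = 1; stride < n; stride *= 2) {
        /* Snapshot the previous step so every read sees old values. */
        memcpy(prev, a, n * sizeof *a);
        /* These iterations are independent of each other, so in a
         * parallel setting each could run on a different worker. */
        for (size_t i = stride; i < n; i++)
            a[i] = prev[i] + prev[i - stride];
    }

    free(prev);
    return 0;
}
```

For n elements the outer loop runs about log2(n) times, which matches the three strides needed for the 8-element example.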
• Then run the parallel algorithm on the partial prefix sums (using the last element from each local block)
• The last element from the sending process is added to all elements in the receiving process's sub-block
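The blocked scheme can be illustrated with a serial simulation in C. This sketch assumes that each of p simulated processes holds an equal block (n divisible by p) and has already computed, or computes here in phase 1, a local prefix sum of its block. For brevity, phase 2 passes totals along a simple chain from block to block instead of using the stride-doubling exchange, so it shows the arithmetic rather than the communication pattern. The name `blocked_prefix_sum` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Serial simulation of a blocked prefix sum over p equal-sized blocks.
 * Assumption: n is divisible by p. */
void blocked_prefix_sum(int *a, size_t n, size_t p)
{
    assert(p > 0 && n % p == 0);
    size_t b = n / p;                   /* elements per simulated process */
    if (b == 0) return;

    /* Phase 1: each "process" scans its own block independently. */
    for (size_t r = 0; r < p; r++)
        for (size_t i = r * b + 1; i < (r + 1) * b; i++)
            a[i] += a[i - 1];

    /* Phase 2: the last element of block r-1 (already globally correct
     * at this point) is added to every element of block r. */
    for (size_t r = 1; r < p; r++) {
        int offset = a[r * b - 1];
        for (size_t i = r * b; i < (r + 1) * b; i++)
            a[i] += offset;
    }
}
```

In a real message-passing program, phase 1 would run concurrently on all processes and only the single last element of each block would be communicated.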
• Load balance: try to balance the amount of work (computation) assigned to different threads/processes
• Bring ratio of maximum to average load as close to 1.0 as possible
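The ratio mentioned above is straightforward to compute once a per-process load is measured. The sketch below assumes the load is given as one non-negative number per process (for example, seconds of compute time or number of grid cells); the function name `load_imbalance` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of maximum to average load across p processes.
 * A result of 1.0 means the work is perfectly balanced. */
double load_imbalance(const double *load, size_t p)
{
    assert(load != NULL && p > 0);
    double max = load[0], sum = 0.0;
    for (size_t i = 0; i < p; i++) {
        if (load[i] > max) max = load[i];
        sum += load[i];
    }
    assert(sum > 0.0);                  /* average must be nonzero */
    return max / (sum / (double)p);
}
```

For loads of 20, 10, 5, and 5 the average is 10 and the maximum is 20, giving a ratio of 2.0: the slowest process does twice the average work, and the others wait for it.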