chap4_selected_slides
[Figure: residue of a worked example distributing data among processes P0–P7 to compute an output vector]
[Figure: one-to-all broadcast on an eight-node hypercube; nodes 0–7 carry binary labels (000)–(111), and edges are numbered by communication step 1–3]
• Assume that the source processor is the root of this tree. In the first step, the source sends the data to the right child (assuming the source is also the left child). The problem has now been decomposed into two subproblems, each with half the number of processors.
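This recursive halving can be sketched as a short simulation (not from the slides; the function name and pair-list representation are illustrative). Starting from node 0 on p = 2^d nodes, each step doubles the set of nodes that hold the data:

```python
# Sketch (not from the slides): simulate one-to-all broadcast by
# recursive halving on p = 2**d nodes, with node 0 as the source.
def broadcast_steps(p, source=0):
    """Return, per step, the list of (sender, receiver) pairs."""
    assert p & (p - 1) == 0, "p must be a power of two"
    have = {source}                  # nodes that already hold the data
    steps = []
    d = p.bit_length() - 1           # log2(p) steps in total
    for i in range(d):
        half = p >> (i + 1)          # distance to the partner in the other half
        pairs = [(s, s ^ half) for s in sorted(have)]
        steps.append(pairs)
        have |= {r for _, r in pairs}
    return steps

steps = broadcast_steps(8)
# After log2(8) = 3 steps, every node 0..7 has received the data.
assert len(steps) == 3
assert {r for step in steps for _, r in step} | {0} == set(range(8))
```

Each step splits every subproblem in two, so log p steps suffice.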
Broadcast and Reduction on a Balanced Binary Tree
[Figure: one-to-all broadcast on an eight-node balanced binary tree; the processing nodes 0–7 are the leaves]
• Each node first sends to one of its neighbors the data it needs to broadcast. In subsequent steps, it forwards the data received from one neighbor to its other neighbor.
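Run on a p-node ring, this neighbor-forwarding scheme can be simulated as follows (a sketch, not from the slides; names are illustrative):

```python
# Sketch (not from the slides): all-to-all broadcast on a p-node ring.
# Each node starts with its own message and, for p-1 steps, forwards
# the message it received in the previous step to its right neighbor.
def ring_all_to_all_broadcast(p):
    collected = [[i] for i in range(p)]   # messages each node has received
    in_flight = list(range(p))            # message each node forwards next
    for _ in range(p - 1):
        new_in_flight = [0] * p
        for node in range(p):
            left = (node - 1) % p
            msg = in_flight[left]         # receive from left neighbor
            collected[node].append(msg)
            new_in_flight[node] = msg     # pass it along next step
        in_flight = new_in_flight
    return collected

result = ring_all_to_all_broadcast(8)
# After p-1 = 7 steps, every node holds all eight messages.
assert all(sorted(msgs) == list(range(8)) for msgs in result)
```

After p − 1 steps each node has accumulated all p messages, which is exactly what the ring figure below depicts.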
[Figure: all-to-all broadcast on an eight-node ring; in step k each node forwards one message of size m to its neighbor, and after p − 1 = 7 steps every node holds all eight messages]
[Figure: all-to-all broadcast on a 3×3 mesh (nodes 0–8): (a) initial data distribution; (b) data distribution after rowwise broadcast]
[Figure: all-to-all broadcast on an eight-node hypercube: (a) initial distribution of messages; (b) distribution before the second step; (c) distribution before the third step; (d) final distribution of messages]
• On a hypercube, we have:

  T = Σ_{i=1}^{log p} (ts + 2^{i−1} tw m)

    = ts log p + tw m (p − 1).    (2)
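A quick numeric check of equation (2) (not from the slides): the message size doubles each step, so the geometric series 1 + 2 + ... + p/2 sums to p − 1.

```python
# Check (not from the slides): the step-by-step sum in (2) equals its
# closed form ts*log2(p) + tw*m*(p - 1), because the message size
# doubles each of the log2(p) steps.
from math import log2

def t_all_to_all_hcube(ts, tw, m, p):
    d = int(log2(p))
    return sum(ts + (2 ** (i - 1)) * tw * m for i in range(1, d + 1))

ts, tw, m, p = 5.0, 0.5, 4.0, 16
assert t_all_to_all_hcube(ts, tw, m, p) == ts * log2(p) + tw * m * (p - 1)
```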
All-Reduce and Prefix-Sum Operations
[Figure: computing prefix sums on an eight-node hypercube; at each node, parentheses show the running sum exchanged with the partner and square brackets show the locally accumulated prefix: (a) initial distribution of values; (b) distribution of sums before second step; (c) distribution of sums before third step; (d) final distribution of prefix sums]
• We must account for the fact that in prefix sums, the node with label k uses information only from the subset of nodes whose labels are less than or equal to k.
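This rule can be sketched as a simulation (not from the slides; variable names are illustrative): each node exchanges a running total with its partner in every dimension, but folds the received value into its prefix buffer only when the partner's label is smaller than its own.

```python
# Sketch (not from the slides): prefix sums on a p = 2**d node hypercube.
# 'total' is the running subcube sum exchanged with partners each step;
# 'prefix' is updated only with contributions from smaller-labeled nodes.
def hypercube_prefix_sums(values):
    p = len(values)
    d = p.bit_length() - 1
    total = list(values)
    prefix = list(values)
    for i in range(d):
        # Snapshot of what each node receives from its dimension-i partner.
        received = [total[node ^ (1 << i)] for node in range(p)]
        for node in range(p):
            if (node ^ (1 << i)) < node:    # partner has a smaller label
                prefix[node] += received[node]
            total[node] += received[node]
    return prefix

# Node k ends with the sum of values on nodes 0..k.
assert hypercube_prefix_sums(list(range(8))) == [0, 1, 3, 6, 10, 15, 21, 28]
```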
[Figure: one-to-all personalized communication (scatter) on an eight-node hypercube: (a) initial distribution of messages; (b) distribution before the second step; (c) distribution before the third step; (d) final distribution of messages]
• There are log p steps; in each step, the machine size halves and the data size halves.
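Summing those halving message sizes (m·p/2, m·p/4, ..., m) gives m(p − 1) words in total, so the scatter time works out to ts log p + tw m(p − 1); this closed form is standard but not stated on the slide, and the check below (not from the slides) verifies the algebra numerically:

```python
# Check (not from the slides): with log2(p) steps and the transferred
# data halving each step, the per-step sizes m*p/2, m*p/4, ..., m sum
# to m*(p - 1), giving T = ts*log2(p) + tw*m*(p - 1) for scatter.
from math import log2

def t_scatter(ts, tw, m, p):
    d = int(log2(p))
    return sum(ts + tw * m * (p >> i) / 2 for i in range(d))

ts, tw, m, p = 3.0, 0.25, 2.0, 32
assert t_scatter(ts, tw, m, p) == ts * log2(p) + tw * m * (p - 1)
```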
• Each node extracts the information meant for it from the data
received, and forwards the remaining (p − 2) pieces of size m
each to the next node.
  T = Σ_{i=1}^{p−1} (ts + tw m (p − i))

    = ts (p − 1) + Σ_{i=1}^{p−1} i tw m

    = (ts + tw m p/2)(p − 1).    (4)
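Equation (4) can be checked numerically (not from the slides): the per-step sum over shrinking message bundles matches the closed form.

```python
# Check (not from the slides): the ring all-to-all personalized sum in
# (4) -- step i sends a bundle of (p - i) messages of size m -- matches
# its closed form (ts + tw*m*p/2)*(p - 1).
def t_ring_personalized(ts, tw, m, p):
    return sum(ts + tw * m * (p - i) for i in range(1, p))

ts, tw, m, p = 2.0, 0.5, 4.0, 8
assert t_ring_personalized(ts, tw, m, p) == (ts + tw * m * p / 2) * (p - 1)
```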
[Figure: all-to-all personalized communication on a 3×3 mesh, where {i,j} denotes the message from node i to node j: (a) data distribution at the beginning of the first phase; (b) data distribution at the beginning of the second phase]
• Time for the first phase is identical to that in a ring with √p processors, i.e., (ts + tw m p/2)(√p − 1).
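This follows from substituting √p nodes and bundled messages of size m√p into the ring expression (4); a numeric check (not from the slides):

```python
# Check (not from the slides): substituting sqrt(p) nodes and bundled
# message size m*sqrt(p) into the ring expression (4) yields the mesh
# first-phase time (ts + tw*m*p/2)*(sqrt(p) - 1).
from math import isqrt

def t_ring_personalized(ts, tw, m, p):
    return sum(ts + tw * m * (p - i) for i in range(1, p))

ts, tw, m, p = 2.0, 0.5, 4.0, 16
q = isqrt(p)                        # sqrt(p) nodes per mesh row
assert t_ring_personalized(ts, tw, m * q, q) == (ts + tw * m * p / 2) * (q - 1)
```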
All-to-All Personalized Communication on a
Hypercube
[Figure: all-to-all personalized communication on an eight-node hypercube, where {i,j} denotes the message from node i to node j: (a) initial distribution of messages; (b) distribution before the second step; (c) distribution before the third step; (d) final distribution of messages]
[Figure: seven steps in all-to-all personalized communication on an eight-node hypercube]
All-to-All Personalized Communication on a
Hypercube: Cost Analysis of Optimal Algorithm
• We have:
  T = (ts + tw m)(p − 1).    (7)
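The p − 1 factor comes from the pairing schedule: at step i (i = 1, ..., p − 1) every node exchanges one m-word message with the node whose label differs from its own by XOR i, so every ordered pair of nodes communicates exactly once. A sketch of that schedule (not from the slides; names are illustrative):

```python
# Sketch (not from the slides): the pairing schedule behind (7). At
# step i, node j exchanges one message with node j XOR i (the partner
# may be several links away; with E-cube routing the paths at each
# step are congestion-free). Over steps i = 1..p-1, each ordered pair
# of distinct nodes appears exactly once.
def pairwise_exchange_schedule(p):
    return [[(node, node ^ i) for node in range(p)] for i in range(1, p)]

p = 8
schedule = pairwise_exchange_schedule(p)
assert len(schedule) == p - 1
seen = {(a, b) for step in schedule for a, b in step}
assert seen == {(a, b) for a in range(p) for b in range(p) if a != b}
```

Since each of the p − 1 steps costs ts + tw m, the total is (ts + tw m)(p − 1), as in (7).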