chap4_selected_slides
[Figure: residue of a worked example distributing data among processes P0–P7 to compute an output vector]
[Figure: one-to-all broadcast on an eight-node hypercube; nodes 0–7 carry binary labels (000)–(111), and edges are numbered by communication step 1–3]
• Assume that the source processor is the root of this tree. In the first step, the source sends the data to the right child (assuming the source is also the left child). The problem has now been decomposed into two subproblems, each with half the number of processors.
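This recursive halving can be sketched as a short simulation (not from the slides; the function name and pair-list representation are illustrative). Starting from node 0 on p = 2^d nodes, each step doubles the set of nodes that hold the data:

```python
# Sketch (not from the slides): simulate one-to-all broadcast by
# recursive halving on p = 2**d nodes, with node 0 as the source.
def broadcast_steps(p, source=0):
    """Return, per step, the list of (sender, receiver) pairs."""
    assert p & (p - 1) == 0, "p must be a power of two"
    have = {source}                  # nodes that already hold the data
    steps = []
    d = p.bit_length() - 1           # log2(p) steps in total
    for i in range(d):
        half = p >> (i + 1)          # distance to the partner in the other half
        pairs = [(s, s ^ half) for s in sorted(have)]
        steps.append(pairs)
        have |= {r for _, r in pairs}
    return steps

steps = broadcast_steps(8)
# After log2(8) = 3 steps, every node 0..7 has received the data.
assert len(steps) == 3
assert {r for step in steps for _, r in step} | {0} == set(range(8))
```

Each step splits every subproblem in two, so log p steps suffice.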
Broadcast and Reduction on a Balanced Binary Tree
[Figure: one-to-all broadcast on an eight-node balanced binary tree; the processing nodes 0–7 are the leaves]
• Each node first sends to one of its neighbors the data it needs to broadcast. In subsequent steps, it forwards the data received from one neighbor to its other neighbor.
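Run on a p-node ring, this neighbor-forwarding scheme can be simulated as follows (a sketch, not from the slides; names are illustrative):

```python
# Sketch (not from the slides): all-to-all broadcast on a p-node ring.
# Each node starts with its own message and, for p-1 steps, forwards
# the message it received in the previous step to its right neighbor.
def ring_all_to_all_broadcast(p):
    collected = [[i] for i in range(p)]   # messages each node has received
    in_flight = list(range(p))            # message each node forwards next
    for _ in range(p - 1):
        new_in_flight = [0] * p
        for node in range(p):
            left = (node - 1) % p
            msg = in_flight[left]         # receive from left neighbor
            collected[node].append(msg)
            new_in_flight[node] = msg     # pass it along next step
        in_flight = new_in_flight
    return collected

result = ring_all_to_all_broadcast(8)
# After p-1 = 7 steps, every node holds all eight messages.
assert all(sorted(msgs) == list(range(8)) for msgs in result)
```

After p − 1 steps each node has accumulated all p messages, which is exactly what the ring figure below depicts.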
[Figure: all-to-all broadcast on an eight-node ring; in step k each node forwards one message of size m to its neighbor, and after p − 1 = 7 steps every node holds all eight messages]
[Figure: all-to-all broadcast on a 3×3 mesh (nodes 0–8): (a) initial data distribution; (b) data distribution after rowwise broadcast]
[Figure: all-to-all broadcast on an eight-node hypercube: (a) initial distribution of messages; (b) distribution before the second step; (c) distribution before the third step; (d) final distribution of messages]
• On a hypercube, we have:

  T = Σ_{i=1}^{log p} (ts + 2^{i−1} tw m)

    = ts log p + tw m (p − 1).    (2)
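A quick numeric check of equation (2) (not from the slides): the message size doubles each step, so the geometric series 1 + 2 + ... + p/2 sums to p − 1.

```python
# Check (not from the slides): the step-by-step sum in (2) equals its
# closed form ts*log2(p) + tw*m*(p - 1), because the message size
# doubles each of the log2(p) steps.
from math import log2

def t_all_to_all_hcube(ts, tw, m, p):
    d = int(log2(p))
    return sum(ts + (2 ** (i - 1)) * tw * m for i in range(1, d + 1))

ts, tw, m, p = 5.0, 0.5, 4.0, 16
assert t_all_to_all_hcube(ts, tw, m, p) == ts * log2(p) + tw * m * (p - 1)
```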
All-Reduce and Prefix-Sum Operations
[Figure: computing prefix sums on an eight-node hypercube; at each node, parentheses show the running sum exchanged with the partner and square brackets show the locally accumulated prefix: (a) initial distribution of values; (b) distribution of sums before second step; (c) distribution of sums before third step; (d) final distribution of prefix sums]
• We must account for the fact that in prefix sums, the node with label k uses information only from the subset of nodes whose labels are less than or equal to k.
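This rule can be sketched as a simulation (not from the slides; variable names are illustrative): each node exchanges a running total with its partner in every dimension, but folds the received value into its prefix buffer only when the partner's label is smaller than its own.

```python
# Sketch (not from the slides): prefix sums on a p = 2**d node hypercube.
# 'total' is the running subcube sum exchanged with partners each step;
# 'prefix' is updated only with contributions from smaller-labeled nodes.
def hypercube_prefix_sums(values):
    p = len(values)
    d = p.bit_length() - 1
    total = list(values)
    prefix = list(values)
    for i in range(d):
        # Snapshot of what each node receives from its dimension-i partner.
        received = [total[node ^ (1 << i)] for node in range(p)]
        for node in range(p):
            if (node ^ (1 << i)) < node:    # partner has a smaller label
                prefix[node] += received[node]
            total[node] += received[node]
    return prefix

# Node k ends with the sum of values on nodes 0..k.
assert hypercube_prefix_sums(list(range(8))) == [0, 1, 3, 6, 10, 15, 21, 28]
```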
[Figure: one-to-all personalized communication (scatter) on an eight-node hypercube: (a) initial distribution of messages; (b) distribution before the second step; (c) distribution before the third step; (d) final distribution of messages]
• There are log p steps; in each step, the machine size halves and the data size halves.
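Summing those halving message sizes (m·p/2, m·p/4, ..., m) gives m(p − 1) words in total, so the scatter time works out to ts log p + tw m(p − 1); this closed form is standard but not stated on the slide, and the check below (not from the slides) verifies the algebra numerically:

```python
# Check (not from the slides): with log2(p) steps and the transferred
# data halving each step, the per-step sizes m*p/2, m*p/4, ..., m sum
# to m*(p - 1), giving T = ts*log2(p) + tw*m*(p - 1) for scatter.
from math import log2

def t_scatter(ts, tw, m, p):
    d = int(log2(p))
    return sum(ts + tw * m * (p >> i) / 2 for i in range(d))

ts, tw, m, p = 3.0, 0.25, 2.0, 32
assert t_scatter(ts, tw, m, p) == ts * log2(p) + tw * m * (p - 1)
```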
• Each node extracts the information meant for it from the data
received, and forwards the remaining (p − 2) pieces of size m
each to the next node.
  T = Σ_{i=1}^{p−1} (ts + tw m (p − i))

    = ts (p − 1) + Σ_{i=1}^{p−1} i tw m

    = (ts + tw m p/2)(p − 1).    (4)
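Equation (4) can be checked numerically (not from the slides): the per-step sum over shrinking message bundles matches the closed form.

```python
# Check (not from the slides): the ring all-to-all personalized sum in
# (4) -- step i sends a bundle of (p - i) messages of size m -- matches
# its closed form (ts + tw*m*p/2)*(p - 1).
def t_ring_personalized(ts, tw, m, p):
    return sum(ts + tw * m * (p - i) for i in range(1, p))

ts, tw, m, p = 2.0, 0.5, 4.0, 8
assert t_ring_personalized(ts, tw, m, p) == (ts + tw * m * p / 2) * (p - 1)
```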
[Figure: all-to-all personalized communication on a 3×3 mesh, where {i,j} denotes the message from node i to node j: (a) data distribution at the beginning of the first phase; (b) data distribution at the beginning of the second phase]
• Time for the first phase is identical to that in a ring with √p processors, i.e., (ts + tw m p/2)(√p − 1).
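This follows from substituting √p nodes and bundled messages of size m√p into the ring expression (4); a numeric check (not from the slides):

```python
# Check (not from the slides): substituting sqrt(p) nodes and bundled
# message size m*sqrt(p) into the ring expression (4) yields the mesh
# first-phase time (ts + tw*m*p/2)*(sqrt(p) - 1).
from math import isqrt

def t_ring_personalized(ts, tw, m, p):
    return sum(ts + tw * m * (p - i) for i in range(1, p))

ts, tw, m, p = 2.0, 0.5, 4.0, 16
q = isqrt(p)                        # sqrt(p) nodes per mesh row
assert t_ring_personalized(ts, tw, m * q, q) == (ts + tw * m * p / 2) * (q - 1)
```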
All-to-All Personalized Communication on a
Hypercube
[Figure: all-to-all personalized communication on an eight-node hypercube, where {i,j} denotes the message from node i to node j: (a) initial distribution of messages; (b) distribution before the second step; (c) distribution before the third step; (d) final distribution of messages]
[Figure: seven steps in all-to-all personalized communication on an eight-node hypercube]
All-to-All Personalized Communication on a
Hypercube: Cost Analysis of Optimal Algorithm
• We have:
  T = (ts + tw m)(p − 1).    (7)
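The p − 1 factor comes from the pairing schedule: at step i (i = 1, ..., p − 1) every node exchanges one m-word message with the node whose label differs from its own by XOR i, so every ordered pair of nodes communicates exactly once. A sketch of that schedule (not from the slides; names are illustrative):

```python
# Sketch (not from the slides): the pairing schedule behind (7). At
# step i, node j exchanges one message with node j XOR i (the partner
# may be several links away; with E-cube routing the paths at each
# step are congestion-free). Over steps i = 1..p-1, each ordered pair
# of distinct nodes appears exactly once.
def pairwise_exchange_schedule(p):
    return [[(node, node ^ i) for node in range(p)] for i in range(1, p)]

p = 8
schedule = pairwise_exchange_schedule(p)
assert len(schedule) == p - 1
seen = {(a, b) for step in schedule for a, b in step}
assert seen == {(a, b) for a in range(p) for b in range(p) if a != b}
```

Since each of the p − 1 steps costs ts + tw m, the total is (ts + tw m)(p − 1), as in (7).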