
CSS490 Group Communication and MPI
Textbook Ch3
Instructor: Munehiro Fukuda

These slides were compiled from the course textbook, the reference books, and
the instructor’s original materials.
Winter, 2004
Group Communication

- Communication types:
  - One-to-many: broadcast
  - Many-to-one: synchronization, collective communication
  - Many-to-many: gather and scatter
- Group addressing:
  - Using a special network address: IP Class D and UDP
  - Emulating a broadcast with one-to-one communication:
    - Performance drawback on bus-type networks
    - Simpler for switching-based networks
- Semantics:
  - Send-to-all and bulletin-board semantics
  - 0-, 1-, m-out-of-n, and all-reliable



Atomic Multicast

- Send-to-all semantics and all-reliable
- Simple emulation:
  - A repetition of one-to-one communication with acknowledgment
- What if a receiver fails?
  - Time-out and retransmission
- What if a sender fails before all receivers receive a message?
  - All receivers forward the message to the same group.
  - A receiver discards the second and following copies.
  (A minimal sketch of this emulation is shown below.)
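The following is a minimal, in-process sketch of this emulation. The Message and Receiver types, the message ids, and the forwarding loop are illustrative assumptions rather than part of any PVM or MPI API; a real implementation would also need acknowledgments and time-out retransmission over the network.

#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Message { int id; std::string payload; };

// Each receiver remembers the ids it has already delivered and silently
// discards the second and following copies of a message.
struct Receiver {
    std::set<int> seen;
    void deliver(const Message& m) {
        if (!seen.insert(m.id).second) return;      // duplicate: discard
        std::cout << "delivered msg " << m.id << ": " << m.payload << "\n";
    }
};

int main() {
    std::vector<Receiver> group(3);
    Message m = { 1, "update X" };

    // The sender emulates the multicast by repeated one-to-one sends
    // (a real system adds acknowledgment and time-out retransmission).
    for (size_t r = 0; r < group.size(); ++r) group[r].deliver(m);

    // Every receiver that got the message forwards it to the whole group, so
    // even if the sender crashed part-way through, all live receivers end up
    // delivering the message exactly once; duplicates are discarded above.
    for (size_t f = 0; f < group.size(); ++f)
        if (group[f].seen.count(m.id))
            for (size_t r = 0; r < group.size(); ++r) group[r].deliver(m);
    return 0;
}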
Message Ordering

[Figure: S1 and S2 multicast m1 and m2 to R1 and R2; R1 and R2 receive m1 and m2 in a different order!]

- Some message ordering is required:
  - Absolute ordering
  - Consistent ordering
  - Causal ordering
  - FIFO ordering



Absolute Ordering

[Figure: messages mi and mj with timestamps Ti < Tj are delivered at every receiver in timestamp order, mi before mj.]

- Rule:
  - mi must be delivered before mj if Ti < Tj
- Implementation:
  - A clock synchronized among machines
  - A sliding time window used to commit the delivery of messages whose timestamps fall within the window
- Example:
  - Distributed simulation
- Drawbacks:
  - Too strict a constraint
  - No absolutely synchronized clock exists
  - No guarantee of catching all tardy messages



Consistent Ordering

[Figure: messages mi and mj (timestamps Ti < Tj) are received in the same order at every receiver, even if that order disagrees with the timestamps.]

- Rule:
  - Messages received in the same order at all receivers (regardless of their timestamps)
- Implementation:
  - A message is sent to a sequencer, assigned a sequence number, and finally multicast to the receivers
  - A receiver retrieves messages in increasing sequence-number order
- Example:
  - Replicated database update
- Drawback:
  - A centralized algorithm
  (A minimal sequencer sketch is shown below.)

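Below is a minimal, in-process sketch of the sequencer idea. The Sequencer and Receiver types are illustrative assumptions; a real system would run the sequencer as a separate process and multicast the stamped messages to every receiver.

#include <iostream>
#include <map>
#include <string>

// The sequencer hands out one global sequence number per multicast message.
struct Sequencer {
    int next;
    Sequencer() : next(0) {}
    int stamp() { return next++; }
};

// A receiver buffers out-of-order messages and delivers them strictly in
// increasing sequence-number order, so every receiver sees the same order.
struct Receiver {
    int expected;
    std::map<int, std::string> pending;
    Receiver() : expected(0) {}
    void receive(int seq, const std::string& msg) {
        pending[seq] = msg;
        while (pending.count(expected)) {
            std::cout << "deliver #" << expected << ": " << pending[expected] << "\n";
            pending.erase(expected++);
        }
    }
};

int main() {
    Sequencer sequencer;
    Receiver r;
    int s1 = sequencer.stamp();            // first multicast gets number 0
    int s2 = sequencer.stamp();            // second multicast gets number 1
    r.receive(s2, "m2");                   // arrives first but is buffered
    r.receive(s1, "m1");                   // now m1 and then m2 are delivered
    return 0;
}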


Causal Ordering

[Figure: S1 and S2 multicast m1-m4 to R1, R2, and R3; from R2's viewpoint m1 → m2, so m2 must not be delivered before m1.]

- Rule: the happened-before relation
  - If e_i^k and e_i^l are events of the same process i and k < l, then e_i^k → e_i^l
  - If e_i = send(m) and e_j = receive(m), then e_i → e_j
  - If e → e' and e' → e'', then e → e''
- Implementation:
  - Use of a vector attached to each message
- Example:
  - Distributed file system
- Drawbacks:
  - The vector is an overhead
  - Broadcast is assumed



Vector Message

[Figure: Sites A-D; Site A (vector 2,1,1,0) multicasts a message stamped 3,1,1,0; it is delivered immediately at the site whose vector is already 2,1,1,0 but delayed at the sites holding 1,1,1,0 and 2,1,0,0.]

Delivery conditions for a message stamped with vector S, sent by source i, at a receiver with vector R:
- S[i] = R[i] + 1 (it is the next message expected from source i)
- S[j] ≤ R[j] for every j ≠ i (no causally earlier message is missing)
(A small code sketch of this test follows.)
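A small sketch of this delivery test; the vectors come from the figure above, while the function name canDeliver and the assertion-style driver are illustrative assumptions.

#include <cassert>
#include <vector>

// Delivery test from this slide: a message stamped with the sender's vector S,
// coming from source i, may be delivered at a receiver whose vector is R only
// if S[i] == R[i] + 1 and S[j] <= R[j] for every j != i.
bool canDeliver(const std::vector<int>& S, const std::vector<int>& R, int i) {
    if (S[i] != R[i] + 1) return false;             // must be the next message from i
    for (int j = 0; j < (int)S.size(); ++j)
        if (j != i && S[j] > R[j]) return false;    // a causally earlier message is missing
    return true;
}

int main() {
    std::vector<int> S = {3, 1, 1, 0};                        // stamp from site A (source id 0)
    assert( canDeliver(S, std::vector<int>{2, 1, 1, 0}, 0));  // delivered immediately
    assert(!canDeliver(S, std::vector<int>{1, 1, 1, 0}, 0));  // delayed: an earlier message from A is missing
    assert(!canDeliver(S, std::vector<int>{2, 1, 0, 0}, 0));  // delayed: a message from site C is missing
    return 0;
}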
FIFO Ordering

[Figure: sender S sends m1-m4 to receiver R through routers 1 and 2; R receives them in the order they were sent.]

- Rule:
  - Messages received in the same order as they were sent
- Implementation:
  - Messages assigned a sequence number
- Example:
  - TCP
- This is the weakest ordering.
Why High-Level Message Passing Tools?

- Data formatting
  - Data formatted into appropriate types at the user level
- Non-blocking communication
  - Polling and interrupts handled at the system-call level
- Process addressing
  - Inflexible hardwired addressing with machine id + local id
- Group communication
  - Group server implemented at the user level
  - Broadcasting simulated by a repetition of one-to-one communication
PVM and MPI

- PVM: Parallel Virtual Machine
  - Developed in the 1980s
  - The pioneering library to provide high-level message passing functions
  - A PVM daemon process takes care of message transfer for user processes in the background
- MPI: Message Passing Interface
  - Defined in the 1990s
  - A specification of high-level message passing functions
  - Several implementations available: MPICH, LAM/MPI
  - Library functions are linked directly into user programs (no background daemons)
- The detailed differences are described in:
  - PVMvsMPI.ps



Getting Started with MPI

- Website:
  http://www-unix.mcs.anl.gov/mpi/mpich/
- Creating a hostfile:
  [mfukuda@UW1-320-00 mfukuda]$ vi hosts
  uw1-320-00
  uw1-320-01
  uw1-320-02
  uw1-320-03
- Compile a source program:
  [mfukuda@UW1-320-00 mfukuda]$ mpiCC source.cpp -o myProg
- Run the executable file:
  [mfukuda@UW1-320-00 mfukuda]$ mpirun -np 4 myProg args



Program Using MPI

#include <iostream.h>
#include "mpi++.h"

int main(int argc, char *argv[])
{
  MPI::Init(argc, argv);                        // Start MPI computation

  int rank = MPI::COMM_WORLD.Get_rank();        // Process ID (from 0 to #processes - 1)
  int size = MPI::COMM_WORLD.Get_size();        // # participating processes

  cout << "Hello World! I am " << rank << " of " << size << endl;

  MPI::Finalize();                              // Finish MPI computation
}
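Note: the MPI-2 C++ bindings used on these slides (MPI::Init, MPI::COMM_WORLD, ...) were later deprecated and removed from the MPI standard. For reference, a sketch of the same program written against the plain C API, which every MPI implementation still provides (compile with mpiCC or mpicxx):

#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);                  // Start MPI computation

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // Process ID (from 0 to #processes - 1)
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // # participating processes

    std::cout << "Hello World! I am " << rank << " of " << size << std::endl;

    MPI_Finalize();                          // Finish MPI computation
    return 0;
}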
MPI_Send and MPI_Recv

void MPI::COMM_WORLD.Send(
        void*          message   /* in */,
        int            count     /* in */,
        MPI::Datatype  datatype  /* in */,
        int            dest      /* in */,
        int            tag       /* in */)

void MPI::COMM_WORLD.Recv(
        void*          message   /* out */,
        int            count     /* in */,
        MPI::Datatype  datatype  /* in */,
        int            source    /* in */,   /* MPI::ANY_SOURCE */
        int            tag       /* in */,
        MPI::Status&   status    /* out */)  /* can be omitted */

MPI::Datatype = CHAR, SHORT, INT, LONG, UNSIGNED_CHAR, UNSIGNED_SHORT,
                UNSIGNED, UNSIGNED_LONG, FLOAT, DOUBLE, LONG_DOUBLE,
                BYTE, PACKED

MPI::Status: the receiver can query status.Get_source(), status.Get_tag(), and
status.Get_error() (the MPI_SOURCE, MPI_TAG, and MPI_ERROR fields of the C API).
MPI_Send and MPI_Recv

#include <iostream.h>
#include "mpi++.h"

int main(int argc, char *argv[])
{
  int tag0 = 0;
  MPI::Init(argc, argv);                                    // Start MPI computation
  if (MPI::COMM_WORLD.Get_rank() == 0) {                    // rank 0: sender
    int loop = 3;
    MPI::COMM_WORLD.Send( "Hello World!", 12, MPI::CHAR, 1, tag0 );
    MPI::COMM_WORLD.Send( &loop, 1, MPI::INT, 1, tag0 );
  } else {                                                  // rank 1: receiver
    int loop; char msg[13];
    MPI::COMM_WORLD.Recv( msg, 12, MPI::CHAR, 0, tag0 );
    msg[12] = '\0';                                         // terminate the received string
    MPI::COMM_WORLD.Recv( &loop, 1, MPI::INT, 0, tag0 );
    for (int i = 0; i < loop; i++) cout << msg << endl;
  }
  MPI::Finalize();                                          // Finish MPI computation
}
Message Ordering in MPI

- FIFO ordering in each data type
  [Figure: messages flow from source to destination and are received in send order.]
- Messages reordered with a tag in each data type
  [Figure: messages carrying tags 1, 3, and 2 can be retrieved in a different order by selecting tags.]
  (A small example follows below.)

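A small sketch of tag-based reordering, assuming the two messages are short enough to be delivered eagerly (otherwise posting the receives out of send order could block); the tag values and payloads are illustrative.

#include <iostream>
#include "mpi++.h"

int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);
    int tag1 = 1, tag2 = 2;

    if (MPI::COMM_WORLD.Get_rank() == 0) {            // rank 0: sender
        int a = 111, b = 222;
        MPI::COMM_WORLD.Send(&a, 1, MPI::INT, 1, tag1);
        MPI::COMM_WORLD.Send(&b, 1, MPI::INT, 1, tag2);
    } else if (MPI::COMM_WORLD.Get_rank() == 1) {     // rank 1: receiver
        int x, y;
        // The tag-2 message is retrieved first even though it was sent second;
        // messages with the same source and tag still arrive in FIFO order.
        MPI::COMM_WORLD.Recv(&y, 1, MPI::INT, 0, tag2);
        MPI::COMM_WORLD.Recv(&x, 1, MPI::INT, 0, tag1);
        std::cout << "received " << y << " then " << x << std::endl;
    }
    MPI::Finalize();
    return 0;
}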


MPI_Bcast

void MPI::COMM_WORLD.Bcast(
        void*          message   /* in (at root) / out (at the others) */,
        int            count     /* in */,
        MPI::Datatype  datatype  /* in */,
        int            root      /* in */)

[Figure: ranks 0-4; rank 2 is the root, and its message is delivered to all the other ranks.]

MPI::COMM_WORLD.Bcast( &msg, 1, MPI::INT, 2 );

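A short usage sketch matching the call above (rank 2 is the root, so run with at least three processes, e.g. mpirun -np 4); the value 77 is illustrative.

#include <iostream>
#include "mpi++.h"

int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);
    int rank = MPI::COMM_WORLD.Get_rank();

    int msg = (rank == 2) ? 77 : 0;                  // only the root has the value initially
    MPI::COMM_WORLD.Bcast(&msg, 1, MPI::INT, 2);     // root = rank 2

    std::cout << "rank " << rank << " now holds " << msg << std::endl;
    MPI::Finalize();
    return 0;
}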


MPI_Reduce

void MPI::COMM_WORLD.Reduce(
        void*          operand   /* in */,
        void*          result    /* out */,
        int            count     /* in */,
        MPI::Datatype  datatype  /* in */,
        MPI::Op        operator  /* in */,
        int            root      /* in */)

MPI::Op = MPI::MAX (maximum), MPI::MIN (minimum), MPI::SUM (sum),
          MPI::PROD (product), MPI::LAND (logical and), MPI::BAND (bitwise and),
          MPI::LOR (logical or), MPI::BOR (bitwise or), MPI::LXOR (logical xor),
          MPI::BXOR (bitwise xor), MPI::MAXLOC (max location), MPI::MINLOC (min location)

[Figure: ranks 0-4 hold 15, 10, 12, 8, and 4; MPI::SUM collects the total 49 at the root, rank 2.]

MPI::COMM_WORLD.Reduce( &msg, &result, 1, MPI::INT, MPI::SUM, 2 );
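A short usage sketch of Reduce with the same root as the figure (rank 2); each rank's operand (rank + 1) is illustrative.

#include <iostream>
#include "mpi++.h"

int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);
    int rank = MPI::COMM_WORLD.Get_rank();

    int operand = rank + 1;                          // each rank's local contribution
    int result = 0;
    MPI::COMM_WORLD.Reduce(&operand, &result, 1, MPI::INT, MPI::SUM, 2);

    if (rank == 2)                                   // only the root holds the sum
        std::cout << "sum = " << result << std::endl;
    MPI::Finalize();
    return 0;
}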
MPI_Allreduce

void MPI::COMM_WORLD.Allreduce(
        void*          operand   /* in */,
        void*          result    /* out */,
        int            count     /* in */,
        MPI::Datatype  datatype  /* in */,
        MPI::Op        operator  /* in */)

[Figure: ranks 0-7 combine their operands; every rank receives the reduced result (there is no root argument).]

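A short usage sketch of Allreduce; unlike Reduce, every rank ends up holding the result. Each rank's operand (its own id) is illustrative.

#include <iostream>
#include "mpi++.h"

int main(int argc, char *argv[]) {
    MPI::Init(argc, argv);
    int rank = MPI::COMM_WORLD.Get_rank();

    int operand = rank;                              // each rank contributes its own id
    int result = 0;
    MPI::COMM_WORLD.Allreduce(&operand, &result, 1, MPI::INT, MPI::SUM);

    // Unlike Reduce, every rank (not just a root) now holds the global sum.
    std::cout << "rank " << rank << " sees sum " << result << std::endl;
    MPI::Finalize();
    return 0;
}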


Exercises (No turn-in)

1. Consider an application requiring both one-to-many and many-to-one communication.
2. Consider an application requiring atomic multicast.
3. Assume that four processes communicate with one another in causal ordering. Their current vectors are shown below. If Process A sends a message, which processes can receive it immediately?

   Process A     Process B     Process C     Process D
   3, 5, 2, 1    2, 5, 2, 1    3, 5, 2, 1    3, 4, 2, 1

4. Consider the pros and cons of PVM's daemon-based and MPI's library-linked message passing.
5. Why can MPI maintain FIFO ordering?

