Module 5
• The logical view of a machine supporting the message-passing paradigm consists of p processes,
each with its own exclusive address space.
• Each data element must belong to one of the partitions of the space; hence, data must be explicitly
partitioned and placed.
• All interactions (read-only or read/write) require cooperation of two processes - the process
that has the data and the process that wants to access the data.
• These two constraints make underlying costs very explicit to the programmer.
• Message-passing programs are often written using the asynchronous or loosely synchronous
paradigms.
• In the asynchronous paradigm, all concurrent tasks execute asynchronously.
• In the loosely synchronous model, tasks or subsets of tasks synchronize to perform
interactions. Between these interactions, tasks execute completely asynchronously.
• Most message-passing programs are written using the single program multiple data (SPMD)
model.
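For illustration (not part of the original notes), a minimal SPMD sketch in MPI: every process runs the same program and branches on its rank.

#include <mpi.h>
#include <stdio.h>

/* SPMD sketch: one program, many processes; behaviour branches on rank. */
int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* coordinator-style work, e.g. distributing data */
        printf("Coordinator among %d processes\n", size);
    } else {
        /* worker-style work on this process's partition of the data */
        printf("Worker %d\n", rank);
    }

    MPI_Finalize();
    return 0;
}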
Figure: Blocking buffered transfer protocols: (a) in the presence of communication hardware with buffers at both the send and receive ends; and (b) in the absence of communication hardware, where the sender interrupts the receiver and deposits the data in a buffer at the receiving end.
• Bounded buffer sizes can have significant impact on performance.
P0:
for (i = 0; i < 1000; i++) {
    produce_data(&a);
    send(&a, 1, 1);
}

P1:
for (i = 0; i < 1000; i++) {
    receive(&a, 1, 0);
    consume_data(&a);
}

What if the consumer was much slower than the producer? With a bounded buffer, the sender eventually blocks once the buffer fills up, throttling the producer to the consumer's rate; with unbounded buffering, buffer space would grow without limit.
• Deadlocks are still possible with buffering since receive operations block.
P0:
receive(&a, 1, 1);
send(&b, 1, 1);

P1:
receive(&a, 1, 0);
send(&b, 1, 0);

Both processes block in their receive calls before either reaches its send, so neither message is ever sent and the program deadlocks.
Non-Blocking Message Passing Operations
• This class of non-blocking protocols returns from the send or receive operation before it is semantically safe to do so, i.e., before the data has necessarily been copied out of (or into) the user buffer.
• The programmer must therefore ensure the semantics of the send and receive, for example by not reusing a buffer that is still involved in a pending transfer.
• Non-blocking operations are generally accompanied by a check-status operation.
• When used correctly, these primitives are capable of overlapping communication overheads
with useful computations.
• Message passing libraries typically provide both blocking and non-blocking primitives.
Figure: Non-blocking non-buffered send and receive operations (a) in the absence of communication hardware; (b) in the presence of communication hardware.
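As a concrete illustration of these bullets, the sketch below (not from the original notes) uses MPI's non-blocking primitives MPI_Isend, MPI_Irecv and MPI_Wait; it assumes at least two processes, with rank 0 sending a single integer to rank 1 while other work could proceed in the gap.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, a = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        a = 42;
        /* returns immediately; 'a' must not be modified until MPI_Wait */
        MPI_Isend(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    } else if (rank == 1) {
        /* returns immediately; 'a' must not be read until MPI_Wait */
        MPI_Irecv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    }

    /* ... useful computation that does not touch 'a' can overlap here ... */

    if (rank < 2) {
        MPI_Wait(&req, &status);   /* now it is semantically safe to use 'a' */
    }
    if (rank == 1)
        printf("received %d\n", a);

    MPI_Finalize();
    return 0;
}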
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm,
MPI_Status *status)
• MPI provides equivalent datatypes for all C datatypes. This is done for portability reasons.
• The message-tag can take values ranging from zero up to the MPI defined constant
MPI_TAG_UB.
• If source is set to MPI_ANY_SOURCE, then any process of the communication domain can be
the source of the message.
• If tag is set to MPI_ANY_TAG, then messages with any tag are accepted.
• On the receive side, the message must be of length equal to or less than the length field
specified.
• On the receiving end, the status variable can be used to get information about the MPI_Recv
operation.
• The corresponding data structure contains:
typedef struct MPI_Status {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status;
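A minimal sketch (not from the notes, assuming at least two processes) of how the status object is typically used after MPI_Recv, together with MPI_Get_count to find out how many elements actually arrived:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, buf[10], count;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int data[4] = {1, 2, 3, 4};
        MPI_Send(data, 4, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* the receive buffer may be larger than the incoming message */
        MPI_Recv(buf, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &count);
        printf("got %d ints from rank %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}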
Avoiding deadlock: a common issue in MPI programs arises when processes send and receive messages in mismatched orders. Consider a scenario where process 0 sends two messages (tag 1, then tag 2) to process 1, and process 1 posts its receives in the reverse order.
If MPI_Send blocks until the matching receive is posted (i.e., the message is not buffered), process 0 waits for process 1 to receive the tag-1 message, while process 1 is waiting for the tag-2 message from process 0. Both processes wait on each other, leading to a deadlock.
Solution: Ensure that send and receive operations are matched in order; for instance, process 0 and process 1 should agree on the same ordering of the two messages, as in the sketch below.
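A sketch of this scenario for two processes; process 1 posts its receives in the same order as process 0's sends, which is the safe ordering (reversing the two receives deadlocks when the sends are not buffered):

#include <mpi.h>

int main(int argc, char *argv[])
{
    int a[10] = {0}, b[10] = {0}, myrank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {
        MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);   /* tag 1 first */
        MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);   /* tag 2 second */
    } else if (myrank == 1) {
        /* receives matched in the same order as the sends: tag 1, then tag 2 */
        MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    }

    MPI_Finalize();
    return 0;
}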
Sending and Receiving Messages Simultaneously: In a circular communication pattern, each process sends a message to one neighbour and receives a message from the other neighbour. If every process issues a blocking MPI_Send first, none of them reaches its receive, so no send can ever complete; all processes wait indefinitely, causing a circular deadlock.
Solution: Split the processes into two groups (even and odd ranks) that alternate the order of their send and receive operations, or use MPI_Sendrecv, which combines the send and receive into a single call and avoids deadlock even in such communication patterns, as illustrated below.
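A minimal ring-shift sketch using MPI_Sendrecv (not from the notes); each process passes its rank to its right neighbour and receives from its left neighbour, with neighbours computed modulo the number of processes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int npes, myrank, sendval, recvval, dest, source;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    dest    = (myrank + 1) % npes;          /* right neighbour in the ring */
    source  = (myrank - 1 + npes) % npes;   /* left neighbour in the ring */
    sendval = myrank;

    /* The combined call posts both halves together, so the ring cannot
       deadlock the way a naive blocking send-then-receive can. */
    MPI_Sendrecv(&sendval, 1, MPI_INT, dest, 0,
                 &recvval, 1, MPI_INT, source, 0,
                 MPI_COMM_WORLD, &status);

    printf("Process %d received %d from process %d\n", myrank, recvval, source);

    MPI_Finalize();
    return 0;
}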
MPI Program:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int npes, myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("From process %d out of %d, Hello World!\n", myrank, npes);
    MPI_Finalize();
    return 0;
}
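One common way to build and run this program, assuming an MPI implementation such as MPICH or Open MPI is installed (compiler wrapper and launcher names can vary between installations):

mpicc hello.c -o hello
mpirun -np 4 ./hello      (or: mpiexec -n 4 ./hello)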
Topology and Embedding
• MPI allows a programmer to organize processors into logical k-d meshes.
• The processor ids in MPI_COMM_WORLD can be mapped to other communicators (corresponding
to higher-dimensional meshes) in many ways.
• The goodness of any such mapping is determined by the interaction pattern of the underlying
program and the topology of the machine.
• MPI does not provide the programmer any control over these mappings.
int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart)
This function takes the processes in the old communicator comm_old and creates a new communicator comm_cart with ndims dimensions; dims gives the number of processes along each dimension, periods indicates which dimensions wrap around, and reorder allows MPI to renumber the ranks.
Each process can now be identified in this new Cartesian topology by a coordinate vector of length ndims.
Since sending and receiving messages still require (one-dimensional) ranks, MPI provides routines (MPI_Cart_rank and MPI_Cart_coords) to convert ranks to Cartesian coordinates and vice versa, as used in the sketch below.
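A small sketch (not in the original notes) showing MPI_Cart_create together with the conversion routines MPI_Cart_rank and MPI_Cart_coords; MPI_Dims_create is used to pick a balanced 2-D factorization of the processes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int npes, cart_rank;
    int dims[2] = {0, 0}, periods[2] = {1, 1}, coords[2];
    MPI_Comm cart_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);

    /* Let MPI pick a balanced 2-D factorization of the npes processes. */
    MPI_Dims_create(npes, 2, dims);

    /* Create a 2-D wraparound (toroidal) topology; reorder = 1 lets MPI
       renumber ranks to better match the physical machine. */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart_comm);

    /* Convert between one-dimensional ranks and 2-D Cartesian coordinates. */
    MPI_Comm_rank(cart_comm, &cart_rank);
    MPI_Cart_coords(cart_comm, cart_rank, 2, coords);
    MPI_Cart_rank(cart_comm, coords, &cart_rank);

    printf("Rank %d sits at (%d, %d) in a %d x %d mesh\n",
           cart_rank, coords[0], coords[1], dims[0], dims[1]);

    MPI_Comm_free(&cart_comm);
    MPI_Finalize();
    return 0;
}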
6) If the result of the reduction operation is needed by all processes, MPI provides:
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op,
MPI_Comm comm)
7) To compute prefix-sums, MPI provides:
int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op,
MPI_Comm comm)
8) The gather operation is performed in MPI using:
int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype senddatatype, void *recvbuf, int
recvcount, MPI_Datatype recvdatatype, int target, MPI_Comm comm)
9) MPI also provides the MPI_Allgather function in which the data are gathered at all the processes.
int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype senddatatype, void *recvbuf,
int recvcount, MPI_Datatype recvdatatype, MPI_Comm comm)
10) The corresponding scatter operation is:
int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype senddatatype, void *recvbuf, int
recvcount, MPI_Datatype recvdatatype, int source, MPI_Comm comm)
Using this core set of collective operations, a number of programs can be greatly simplified.
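To make the signatures above concrete, a minimal sketch (not from the notes) that scatters one integer to each process from rank 0 and then combines the local values with MPI_Allreduce and MPI_Scan:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int npes, myrank, i;
    int local, total, prefix;
    int *data = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    /* Rank 0 owns the full array: one element per process for simplicity. */
    if (myrank == 0) {
        data = malloc(npes * sizeof(int));
        for (i = 0; i < npes; i++)
            data[i] = i + 1;
    }

    /* Scatter one element to each process. */
    MPI_Scatter(data, 1, MPI_INT, &local, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Every process gets the global sum. */
    MPI_Allreduce(&local, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Inclusive prefix sum over the ranks. */
    MPI_Scan(&local, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("Rank %d: local = %d, prefix = %d, total = %d\n",
           myrank, local, prefix, total);

    if (myrank == 0)
        free(data);
    MPI_Finalize();
    return 0;
}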