Unit3 All
• UNIT-III 8 Hours
Programming using the Message-Passing Paradigm: Principles of Message-Passing Programming, The Building
Blocks: Send and Receive Operations, MPI: The Message Passing Interface, Topologies and Embedding,
Overlapping Communication with Computation, Collective Communication and Computation Operations. Self-
Study: Groups and Communicators.
Reference Book
Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Introduction to Parallel Computing, Second Edition, Pearson Education, 2003.
Chapter 6
2
MPI
Message Passing Interface
3
Outline
• Background
• Message Passing
• Principles of Message-Passing Programming
• MPI
• Group and Context
• Communication Modes
• Blocking/Non-blocking
• Features
• Programming / issues
• The Building Blocks: Send and Receive Operations
• MPI: the Message Passing Interface
• Topologies and Embedding
• Overlapping Communication with Computation
• Collective Communication and Computation Operations
• Groups and Communicators
4
Distributed Computing Paradigms
• Communication Models
• Computation Models
5
Distributed Memory Parallelism
• Each processing element cannot access all data natively
• The scale can go up considerably
• The penalty for coordinating with other processing elements is now significantly higher
• Approaches change accordingly
6
Performance Metrics: Latency and Bandwidth
Latency
• Performance is affected since the processor may have to wait
• Harder to overlap communication and computation
• The overhead to communicate is a problem in many machines
Latency hiding
• Increases the programming system burden
• Examples: communication/computation overlap, prefetch
9
Advantages of Distributed Memory
Architectures
10
Types of Parallel Computing Models
• Data parallel
  • Simultaneous execution on multiple data items
  • Example: Single Instruction, Multiple Data (SIMD)
• Task parallel
  • Different instructions on different data (MIMD)
• SPMD (Single Program, Multiple Data)
  • Combination of data parallel and task parallel
  • Not synchronized at the individual operation level
• Message passing is for MIMD/SPMD parallelism
  • Can be used for data parallel programming
11
Message Passing
• A process is a program counter and address space.
• Message passing is used for communication among processes.
• Inter-process communication:
  • Type: Synchronous / Asynchronous
  • Movement of data from one process's address space to another's
12
The Message-Passing Model
• A process is a program counter and address space
• Processes can have multiple threads (program counters and associated stacks) sharing a single address space
[Figure: four processes P1-P4, each with one or more threads sharing the process's address space (memory)]
14
Synchronous vs. Asynchronous (cont.)
15
What is message passing?
16
SPMD
[Figure: one shared program operating on multiple data partitions]
• Shared program, multiple data
• "Owner compute" rule: the process that "owns" the data (local data) performs the computations on that data
• Message passing infrastructure attempts to support the forms of communication most often used or desired
18
Principles of
Message-Passing Programming
19
Principles of Message-Passing Programming
• Message-passing programs are often written using the asynchronous or loosely synchronous paradigms.
• In the asynchronous paradigm, all concurrent tasks execute asynchronously.
20
Communication Types
• Two ideas for communication:
  • Cooperative operations
  • One-sided operations
21
Cooperative Operations for Communication
• Data is cooperatively exchanged in message-passing
• Explicitly sent by one process and received by another
• Advantage of local control of memory
• Any change in the receiving process’s memory is made with the
receiver’s explicit participation
• Communication and synchronization are combined
[Figure: Process 0 executes Send(data); Process 1 executes Receive(data); time flows downward]
22
One-Sided Operations for Communication
• One-sided operations between processes include remote memory reads and writes
[Figure: Process 0 performs Put(data) into and Get(data) from Process 1's memory; time flows downward]
23
Pairwise vs. Collective
Communication
• Communication between process pairs
• Send/Receive or Put/Get
• Synchronous or asynchronous (we’ll talk about this later)
• Collective communication between multiple processes
• Process group (collective)
• Several processes logically grouped together
• Communication within group
• Collective operations
• Communication patterns
• broadcast, multicast, subset, scatter/gather, …
• Reduction operations
24
The Building Blocks:
Send and Receive Operations
The prototypes of these operations are as follows:
send(void *sendbuf, int nelems, int dest)
receive(void *recvbuf, int nelems, int source)
Consider the following code segments:
P0:                      P1:
a = 100;                 receive(&a, 1, 0);
send(&a, 1, 1);          printf("%d\n", a);
a = 0;
The semantics of the send operation require that the value received by process P1 must be 100 as opposed to 0.
25
Blocking vs. Non-Blocking
26
Non-Buffered Blocking Message Passing Operations: Send/Receive
27
Deadlock in Blocking Non-Buffered Operations
28
Buffered Blocking Message Passing Operations: Send/Receive
• A simple solution to the idling and deadlocking problem outlined above is to rely on buffers at the sending and receiving ends.
• The sender simply copies the data into the designated buffer and returns after the copy operation has been completed.
• The data must be buffered at the receiving end as well.
• Buffering trades off idling overhead for buffer copying overhead.
29
Buffered Blocking Message Passing Operations
• Blocking buffered transfer protocols:
  • (a) in the presence of communication hardware with buffers at the send and receive ends;
  • (b) in the absence of communication hardware, the sender interrupts the receiver and deposits the data in a buffer at the receiver end.
31
Buffered Blocking Message Passing Operations
Example 1: producer/consumer with buffered sends.
P0:
for (i = 0; i < 1000; i++) {
    produce_data(&a);
    send(&a, 1, 1);
}
P1:
for (i = 0; i < 1000; i++) {
    receive(&a, 1, 0);
    consume_data(&a);
}
Example 2: this exchange deadlocks even with buffering, because a blocking receive returns only after the data arrives, so both processes wait in their receives.
P0:
receive(&a, 1, 1);
send(&b, 1, 1);
P1:
receive(&a, 1, 0);
send(&b, 1, 0);
33
Non-Blocking
Message Passing Operations
34
Non-Blocking Send and Receive
Non-Blocking Message Passing Operations
• Non-blocking non-buffered send and receive operations:
  • (a) in the absence of communication hardware;
  • (b) in the presence of communication hardware.
36
Send and Receive Protocols
37
MPI
38
• A message-passing library specification:
• Extended message-passing model
• Not a language or compiler specification
• Not a specific implementation or product
• For parallel computers, clusters, and heterogeneous networks.
• Communication modes: standard, synchronous, buffered, and ready.
• Designed to permit the development of parallel software libraries.
• Designed to provide access to advanced parallel hardware for
• End users
• Library writers
• Tool developers
39
MPI
• MPI defines a standard library for message-passing that can be used
to develop portable message-passing programs using either C or
Fortran.
• The MPI standard defines both the syntax as well as the semantics of
a core set of library routines.
• Vendor implementations of MPI are available on almost all
commercial parallel computers.
• It is possible to write fully functional message-passing programs using only six basic routines.
41
Why Use MPI?
• Message passing is a mature parallel programming model
• Well understood
• Efficient match to hardware (interconnection networks)
• Many applications
• MPI provides a powerful, efficient, and portable way to express parallel programs
• MPI was explicitly designed to enable libraries …
• … which may eliminate the need for many users to learn (much of) MPI
• Need standard, rich, and robust implementation
• Four major versions: MPI-1, MPI-2, MPI-3, MPI-4
• Robust implementations including free MPICH (ANL)
42
Features
General:
• Point-to-point communication
• Structured buffers and derived datatypes, heterogeneity
Collective:
• Both built-in and user-defined collective operations
• Large number of data movement routines
• Subgroups defined directly or by topology
43
Features that are NOT part of MPI
45
Is MPI Large or Small?
MPI is large
• MPI-1 has 128 functions; MPI-2 has 152 functions
• Extensive functionality requires many functions
• Not necessarily a measure of complexity
MPI is small (6 functions)
• Many parallel programs use just 6 basic functions
46
To use or not to use MPI? That is the question.
47
MPI: the Message Passing Interface
The minimal set of MPI routines:
MPI_Init         Initializes MPI.
MPI_Finalize     Terminates MPI.
MPI_Comm_size    Determines the number of processes.
MPI_Comm_rank    Determines the label (rank) of the calling process.
MPI_Send         Sends a message.
MPI_Recv         Receives a message.
48
Starting and Terminating the MPI Library
MPI_Init is called prior to any calls to other MPI routines. Its purpose is to initialize the MPI
environment.
MPI_Finalize is called at the end of the computation, and it performs various clean-up tasks to
terminate the MPI environment.
All MPI routines, data-types, and constants are prefixed by “MPI_”. The return code for successful
completion is MPI_SUCCESS.
49
Initialization and Finalization
MPI_Init
•gather information about the parallel job
•set up internal library state
•prepare for communication
MPI_Finalize
•cleanup
50
Communicators
51
Group and Context
52
Communicators
• A communicator defines a communication domain - a set of processes
that are allowed to communicate with each other.
• Information about communication domains is stored in variables of
type MPI_Comm.
• Communicators are used as arguments to all message transfer MPI
routines.
• A process can belong to many different (possibly overlapping)
communication domains.
• MPI defines a default communicator called MPI_COMM_WORLD
which includes all the processes.
53
MPI_COMM_WORLD
54
MPI_COMM_WORLD
55
Communication Scope
Communicator (communication handle)
• Defines the scope
• Specifies the communication context
Process
• Belongs to a group
• Identified by a rank within a group
Identification
• MPI_Comm_size - total number of processes in the communicator
• MPI_Comm_rank - rank in the communicator
56
Querying Information
1. The MPI_Comm_size and MPI_Comm_rank functions are used to determine the number of processes and the label of the calling process, respectively.
2. The calling sequences of these routines are as follows:
   • int MPI_Comm_size(MPI_Comm comm, int *size)
   • int MPI_Comm_rank(MPI_Comm comm, int *rank)
3. The rank of a process is an integer that ranges from zero up to the size of the communicator minus one.
57
#include <mpi.h>
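Only the #include line of this example survived the export. Below is a minimal sketch of the kind of program this slide most likely showed, using only the routines introduced above (MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize); the message text and the use of mpicc/mpirun to build and launch it are assumptions, not taken from the slide:

#include <mpi.h>
#include <stdio.h>

/* Query the communicator size and the calling process's rank,
   then print a greeting from every process. */
int main(int argc, char *argv[])
{
    int npes, myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("Hello from process %d of %d\n", myrank, npes);
    MPI_Finalize();
    return 0;
}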
58
Sending and Receiving Messages
• The basic functions for sending and receiving messages in MPI are the MPI_Send
and MPI_Recv, respectively.
• The calling sequences of these routines are as follows:
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
• MPI provides equivalent datatypes for all C datatypes. This is done for portability
reasons.
• The datatype MPI_BYTE corresponds to a byte (8 bits) and MPI_PACKED corresponds to a collection of data items that has been created by packing non-contiguous data.
• The message tag can take values ranging from zero up to the MPI-defined constant MPI_TAG_UB.
59
MPI Datatypes
MPI Datatype           C Datatype
MPI_CHAR               signed char
MPI_SHORT              signed short int
MPI_INT                signed int
MPI_LONG               signed long int
MPI_UNSIGNED_CHAR      unsigned char
MPI_UNSIGNED_SHORT     unsigned short int
MPI_UNSIGNED           unsigned int
MPI_UNSIGNED_LONG      unsigned long int
MPI_FLOAT              float
MPI_DOUBLE             double
MPI_LONG_DOUBLE        long double
MPI_BYTE
MPI_PACKED
60
MPI allows specification of wildcard arguments
for both source and tag.
61
Sending and Receiving Messages
• On the receiving end, the status variable can be used to get information about the MPI_Recv operation.
• The corresponding data structure contains:
typedef struct MPI_Status {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
};
• The MPI_Get_count function returns the precise count of data items received:
int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count)
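As a brief illustration (not from the slide), the sketch below shows how status and MPI_Get_count are typically used together with the wildcard arguments discussed on the following slides; the buffer size of 100 and the use of MPI_INT are illustrative choices, and the fragment is assumed to run between MPI_Init and MPI_Finalize:

int buf[100], source, tag, count;
MPI_Status status;

/* Receive from any source with any tag, then inspect the envelope. */
MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
source = status.MPI_SOURCE;              /* who actually sent the message */
tag    = status.MPI_TAG;                 /* which tag was used            */
MPI_Get_count(&status, MPI_INT, &count); /* how many items arrived        */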
62
Avoiding Deadlocks
Consider:
if (myrank == 0) {
    MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
}
else if (myrank == 1) {
    MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
}
If MPI_Send blocks until the matching receive is posted, this code deadlocks: each process is waiting for a matching operation on the other side (P0 blocks in the tag-1 send while P1 blocks in the tag-2 receive).
63
Avoiding Deadlocks
• Deadlock may also occur when a process sends a message to itself.
• It is legal, but the behavior is implementation dependent and must be avoided.
64
Avoiding Deadlocks
• Improper use of MPI_Send and MPI_Recv can also lead to deadlocks in situations where each process needs to send and receive a message in a circular fashion.
...
MPI_Comm_size(MPI_COMM_WORLD, &npes);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1, MPI_COMM_WORLD);
MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1, MPI_COMM_WORLD, &status);
...
65
Consider the preceding piece of code, in which process i sends a message to process i + 1 (modulo the number of processes) and receives a message from process i - 1 (modulo the number of processes).
66
We can break the circular wait to avoid deadlocks as follows:
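The code for this fix is not reproduced on the slide; the sketch below follows the approach used in the reference text, in which odd-ranked processes send first and even-ranked processes receive first (the fragment is assumed to run between MPI_Init and MPI_Finalize):

int a[10], b[10], npes, myrank;
MPI_Status status;
MPI_Comm_size(MPI_COMM_WORLD, &npes);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank % 2 == 1) {   /* odd ranks: send first, then receive */
    MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1, MPI_COMM_WORLD);
    MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1, MPI_COMM_WORLD, &status);
}
else {                   /* even ranks: receive first, then send */
    MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1, MPI_COMM_WORLD, &status);
    MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1, MPI_COMM_WORLD);
}

Breaking the symmetry this way ensures that every blocking send has a matching receive already posted on the other side.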
67
Sending and Receiving
Messages Simultaneously
MPI_Sendrecv_replace:
• uses a single buffer for both sending and receiving
• performs a blocking send and receive
• the send and receive must transfer data of the same datatype
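For reference, the calling sequences of the combined send/receive routines (standard MPI; they are not shown on the slide) are:

int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                 int dest, int sendtag,
                 void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                 int source, int recvtag,
                 MPI_Comm comm, MPI_Status *status)

int MPI_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
                         int dest, int sendtag, int source, int recvtag,
                         MPI_Comm comm, MPI_Status *status)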
69
Topologies and Embedding
70
MPI allows a programmer to organize processors
into logical k-d meshes.
71
Topologies and Embeddings
• Different ways to map a set of processes to a two-dimensional grid:
  • (a) and (b) show a row- and column-wise mapping of these processes,
  • (c) shows a mapping that follows a space-filling curve (dotted line), and
  • (d) shows a mapping in which neighboring processes are directly connected in a hypercube.
72
Creating and Using Cartesian Topologies
• MPI_Cart_create takes the processes in the old communicator and creates a new communicator with dims dimensions.
• Each processor can now be identified in this new Cartesian topology by a vector of dimension dims.
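The calling sequence referred to above is not shown in the export; it is the standard MPI routine:

int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods,
                    int reorder, MPI_Comm *comm_cart)

A small illustrative use (the 4 x 4 grid size and the periodicity are assumptions, not from the slide):

int dims[2]    = {4, 4};   /* 4 x 4 process grid             */
int periods[2] = {1, 1};   /* wrap around in both dimensions */
MPI_Comm comm_2d;
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm_2d);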
73
Since sending and receiving messages still require (one-dimensional)
ranks, MPI provides routines to convert ranks to cartesian coordinates
and vice-versa.
• int MPI_Cart_coord(MPI_Comm comm_cart, int rank, int maxdims, int
*coords)
• int MPI_Cart_rank(MPI_Comm comm_cart, int *coords, int *rank)
The most common operation on cartesian topologies is a shift. To determine the rank
of source and destination of such shifts, MPI provides the following function:
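The prototype of that routine (standard MPI, omitted from the slide) is:

int MPI_Cart_shift(MPI_Comm comm_cart, int dir, int s_step,
                   int *rank_source, int *rank_dest)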
74
Overlapping Communication with
Computation
75
Overlapping Communication
with Computation
• int MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)
• int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)
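A non-blocking operation must later be completed with MPI_Test or MPI_Wait:

int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)
int MPI_Wait(MPI_Request *request, MPI_Status *status)

A minimal sketch of the overlap pattern (do_local_work() is a placeholder for local computation, not a routine from the slides):

MPI_Request request;
MPI_Status status;
int b[10];

MPI_Irecv(b, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &request);
do_local_work();             /* computation that does not touch b          */
MPI_Wait(&request, &status); /* b may be used safely only after this call  */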
77
Avoiding Deadlocks: Non-Blocking Counterpart
Using non-blocking operations removes most deadlocks. Consider the blocking version:
int a[10], b[10], myrank;
MPI_Status status;
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
}
else if (myrank == 1) {
    MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
}
...
Replacing either the send or the receive operations with non-blocking counterparts fixes this deadlock:
int a[10], b[10], myrank;
MPI_Status status;
MPI_Request requests[2];
...
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
    MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
}
else if (myrank == 1) {
    MPI_Irecv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &requests[0]);
    MPI_Irecv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &requests[1]);
}
78
Collective Communication and Computation Operations
• MPI provides an extensive set of functions for performing common collective communication operations.
• Each of these operations is defined over a group corresponding to the communicator.
79
int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int source, MPI_Comm comm)
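A short usage sketch (the variable name n, the value 100, and the assumption that myrank was obtained with MPI_Comm_rank are illustrative, not from the slide):

int n;
if (myrank == 0)
    n = 100;                          /* value known only at the source */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* after the call, every process in MPI_COMM_WORLD has n == 100 */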
82
Reduction
• int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int target, MPI_Comm comm)
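A short sketch computing a global sum at process 0 (each process's contribution, here its rank, is purely illustrative; myrank is assumed to come from MPI_Comm_rank):

int local_value = myrank, global_sum;
MPI_Reduce(&local_value, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
/* global_sum is defined only at the target process (rank 0) */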
84
Predefined Reduction Operations
Operation      Meaning                       Datatypes
MPI_MAX        Maximum                       C integers and floating point
MPI_MIN        Minimum                       C integers and floating point
MPI_SUM        Sum                           C integers and floating point
MPI_PROD       Product                       C integers and floating point
MPI_LAND       Logical AND                   C integers
MPI_BAND       Bit-wise AND                  C integers and byte
MPI_LOR        Logical OR                    C integers
MPI_BOR        Bit-wise OR                   C integers and byte
MPI_LXOR       Logical XOR                   C integers
MPI_BXOR       Bit-wise XOR                  C integers and byte
MPI_MAXLOC     Maximum value and location    Data-pairs
MPI_MINLOC     Minimum value and location    Data-pairs
85
An example use of the MPI_MINLOC and MPI_MAXLOC operators.
• The operation MPI_MAXLOC combines pairs of values (vi, li) and returns the pair (v, l) such that v is the maximum among all the vi and l is the corresponding li (if there is more than one, it is the smallest among all these li).
• MPI_MINLOC does the same, except for the minimum value of vi.
86
MPI datatypes for data-pairs used with the MPI_MAXLOC and
MPI_MINLOC reduction operations.
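The table itself did not survive the export; to the best of my knowledge, the predefined MPI pair datatypes are:

MPI_2INT              pair of int
MPI_SHORT_INT         short and int
MPI_LONG_INT          long and int
MPI_LONG_DOUBLE_INT   long double and int
MPI_FLOAT_INT         float and int
MPI_DOUBLE_INT        double and int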
87
Collective Communication Operations
• If the result of the reduction operation is needed by all processes, MPI provides:
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
• To compute prefix-sums, MPI provides:
int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
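A brief sketch of an inclusive prefix sum with MPI_Scan (the per-process value is illustrative; myrank is assumed to come from MPI_Comm_rank):

int value = myrank + 1, prefix;
MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
/* process i ends up with prefix == 1 + 2 + ... + (i + 1) */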
88
Collective Communication Operations
• The barrier synchronization operation is performed in MPI using:
int MPI_Barrier(MPI_Comm comm)
89
• The all-to-all reduction operation is:
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
90
Scatter and Gather
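The prototypes for this slide are missing from the export; the calling sequences, with parameter names as used in the reference text (the MPI standard calls the source/target parameter root), are:

int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                int source, MPI_Comm comm)

int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
               void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
               int target, MPI_Comm comm)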
91
Vector variants of gather and allgather
92
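The prototypes are not in the export; the vector variants, which allow a different number of elements to be gathered from each process, are (parameter names as in the reference text):

int MPI_Gatherv(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                void *recvbuf, int *recvcounts, int *displs,
                MPI_Datatype recvdatatype, int target, MPI_Comm comm)

int MPI_Allgatherv(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                   void *recvbuf, int *recvcounts, int *displs,
                   MPI_Datatype recvdatatype, MPI_Comm comm)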
Vector variants of scatter
93
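Similarly, the vector variant of scatter (prototype not in the export; parameter names as in the reference text):

int MPI_Scatterv(void *sendbuf, int *sendcounts, int *displs,
                 MPI_Datatype senddatatype, void *recvbuf, int recvcount,
                 MPI_Datatype recvdatatype, int source, MPI_Comm comm)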
Groups and Communicators
Communicators
int MPI_Cart_sub(MPI_Comm comm_cart, int *keep_dims, MPI_Comm *comm_subcart)
• If keep_dims[i] is true (a non-zero value in C), then the ith dimension is retained in the new sub-topology.
• The coordinate of a process in a sub-topology created by MPI_Cart_sub can be obtained from its coordinate in the original topology by disregarding the coordinates that correspond to the dimensions that were not retained.
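The reference text's Groups and Communicators material also covers general (non-Cartesian) splitting of a communicator with MPI_Comm_split; a minimal sketch, in which the color and key choices (even/odd rank) are purely illustrative and myrank is assumed to come from MPI_Comm_rank:

MPI_Comm newcomm;
/* processes that pass the same color end up in the same new communicator;
   key determines the ordering of ranks within it */
MPI_Comm_split(MPI_COMM_WORLD, myrank % 2, myrank, &newcomm);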
Groups and Communicators
#include <mpi.h>
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /*
      Use MPI function calls here, depending on your data partitioning and
      the parallelization architecture.
    */
    MPI_Finalize();
    return 0;
}
Initializing MPI
• The initialization routine MPI_Init is the first MPI routine called.
• MPI_Init is called only once.
int MPI_Init(int *argc, char ***argv);
A minimal MPI program (C)
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    printf("Hello, world!\n");
    MPI_Finalize();
    return 0;
}
A minimal MPI program (C), cont.
• #include "mpi.h" provides basic MPI definitions and types.
• Note that all non-MPI routines are local; thus printf runs on each process.
Data Types
MPI datatype           C datatype
MPI_UNSIGNED_SHORT     unsigned short
MPI_INT                signed int
MPI_UNSIGNED           unsigned int
MPI_LONG               signed long
MPI_UNSIGNED_LONG      unsigned long
MPI_FLOAT              float
MPI_DOUBLE             double
MPI_LONG_DOUBLE        long double
Why define the data types during the send of a message?
Because communication takes place between heterogeneous machines, which may have different data representations and lengths in memory.
int MPI_Send(void *start, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
MPI Blocking Receive
• The receiver can specify a wildcard value for the source (MPI_ANY_SOURCE) and/or a wildcard value for the tag (MPI_ANY_TAG), indicating that any source and/or tag are acceptable.
• Status is used for extra information about the received message when a wildcard receive mode is used.
• If the count of the message received is less than or equal to that specified in the MPI receive call, then the message is successfully received; otherwise it is considered a buffer overflow error.
Status is a data structure. In C, it is the MPI_Status structure shown earlier, with the fields MPI_SOURCE, MPI_TAG, and MPI_ERROR.