05 Distributed Memory and Message Passing Programming
Programming with MPI
Francesco Moscato
Outline
1 MPI Basics
2 Point to Point Communication
3 Collective Communication
4 Derived DataTypes
5 IO
6 Questions
Distributed Memory
Through message passing, nodes with local memories build an abstract memory that is distributed among the nodes.
Passing Data
Data in local memories are shared in the sense that the node owning a piece of data in its local memory can send it to other nodes through explicit communication.
MPI
MPI is a standard for a portable, vendor-independent message-passing library
Supported Languages
C, C++, Fortran, Fortran90, Fortran 2008
Wrappers exist for interpreted languages, e.g. Python with mpi4py
Common Implementations
MPICH
http://www.mpich.org
OpenMPI
http://www.open-mpi.org
Intel MPI
https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/mpi-library.html
Running MPI
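As a minimal sketch (the file name hello.c and the process count are illustrative), an MPI program is compiled with the mpicc wrapper compiler and launched with mpirun:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);                   // initialize the MPI environment
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     // id of this process
    MPI_Comm_size(MPI_COMM_WORLD, &size);     // total number of processes
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                           // shut down MPI
    return 0;
}

mpicc hello.c -o hello
mpirun -np 4 ./hello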
Synchronous Communications
Implement blocking p2p communications
E.g.: MPI_Send() and MPI_Recv()
Asynchronous Communications
Implement non-blocking communications
E.g.: MPI_Isend() and MPI_Irecv()
Blocking Send
MPI_Send
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
    int dest, int tag, MPI_Comm comm)
MPI Datatypes
• MPI_CHAR, MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_LONG
• Array of DataTypes
• indexed arrays of blocks of datatypes
• arbitrary structure of datatypes
• Custom Datatypes ...
Blocking Receive
MPI_Recv
int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
    int source, int tag, MPI_Comm comm, MPI_Status *status)
Example
array sending
Consider two processes: one owns a vector of positive numbers, the other owns a vector of negative numbers.
Let the two processes exchange their vectors.
Running
mpirun -np 2 ./exchange
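A minimal sketch of the exchange with blocking calls (vector length, tag, and variable names are illustrative): rank 0 sends first and then receives, while rank 1 does the opposite, so the two blocking calls cannot deadlock.

#include <mpi.h>
#include <stdio.h>
#define N 5

int main(int argc, char *argv[]) {
    int rank, i, mine[N], other[N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < N; i++)                       // rank 0: positives, rank 1: negatives
        mine[i] = (rank == 0) ? (i + 1) : -(i + 1);
    if (rank == 0) {                              // send first, then receive
        MPI_Send(mine, N, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(other, N, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {                       // receive first, then send
        MPI_Recv(other, N, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(mine, N, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    if (rank < 2)
        printf("rank %d received first element %d\n", rank, other[0]);
    MPI_Finalize();
    return 0;
}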
Example
MPI_Sendrecv
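The same exchange can be written with a single combined call. A minimal sketch, to be run with two processes (buffer names and tag are illustrative):

#include <mpi.h>
#include <stdio.h>
#define N 5

int main(int argc, char *argv[]) {
    int rank, partner, i, mine[N], other[N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    partner = 1 - rank;                           // rank 0 <-> rank 1
    for (i = 0; i < N; i++)
        mine[i] = (rank == 0) ? (i + 1) : -(i + 1);
    // send our vector and receive the partner's vector in a single call:
    // MPI matches the two halves internally, so no deadlock is possible
    MPI_Sendrecv(mine, N, MPI_INT, partner, 0,
                 other, N, MPI_INT, partner, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d got %d ... %d\n", rank, other[0], other[N - 1]);
    MPI_Finalize();
    return 0;
}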
Test
MPI_Test
Tests the status of a request (created by MPI_Isend or MPI_Irecv)
int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)
Wait
MPI_Wait
Waits for the completion of a request (created by MPI_Isend or MPI_Irecv)
int MPI_Wait(MPI_Request *request, MPI_Status *status)
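A minimal non-blocking sketch combining MPI_Isend, MPI_Irecv and MPI_Wait (run with two processes; variable names are illustrative). Useful work can be overlapped between posting the requests and waiting for them.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, partner, sendval, recvval;
    MPI_Request reqs[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    partner = 1 - rank;                 // exchange between rank 0 and rank 1
    sendval = rank * 100;
    MPI_Irecv(&recvval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]); // post receive
    MPI_Isend(&sendval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]); // post send
    /* ... computation can overlap with communication here ... */
    MPI_Wait(&reqs[0], MPI_STATUS_IGNORE);   // block until the receive completes
    MPI_Wait(&reqs[1], MPI_STATUS_IGNORE);   // block until the send completes
    printf("rank %d received %d\n", rank, recvval);
    MPI_Finalize();
    return 0;
}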
Exercise
Ring Topology
Implement a ring topology using MPI synchronous and asynchronous calls.
Sync Solution
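A possible synchronous (blocking) solution is sketched below: every rank passes a token to its right neighbour and receives one from its left neighbour; using MPI_Sendrecv avoids the deadlock that a naive ordering of MPI_Send/MPI_Recv could cause. Variable names are illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, right, left, token, received;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    right = (rank + 1) % size;            // next rank in the ring
    left  = (rank - 1 + size) % size;     // previous rank in the ring
    token = rank;
    // send to the right neighbour while receiving from the left one
    MPI_Sendrecv(&token, 1, MPI_INT, right, 0,
                 &received, 1, MPI_INT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d received token %d from rank %d\n", rank, received, left);
    MPI_Finalize();
    return 0;
}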
Collective Operations
Barrier
Barrier API
int MPI_Barrier(MPI_Comm communicator)
Broadcast
Broadcast API
int MPI_Bcast(void *data, int count, MPI_Datatype datatype,
    int root, MPI_Comm communicator)
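A minimal broadcast sketch (root and the broadcast value are illustrative): rank 0 initializes a value and, after the call, every rank holds a copy of it.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        value = 42;                                        // only the root knows the value beforehand
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);      // now every rank has it
    printf("rank %d has value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}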
Scatter
Scatter API
int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, int recvcount, MPI_Datatype recvtype,
    int root, MPI_Comm communicator)
scatter
Data in sendbuf on root is split into chunks of sendcount elements of type sendtype; chunk i is delivered to rank i and written to its recvbuf. Usually recvcount = sendcount, so sendbuf on root holds Nranks * sendcount elements in total.
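A minimal scatter sketch (the chunk size of 4 elements per rank is illustrative): root fills one big array and every rank, root included, receives its own chunk.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define CHUNK 4

int main(int argc, char *argv[]) {
    int rank, size, i;
    int *sendbuf = NULL;
    int recvbuf[CHUNK];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {                              // only root needs the full array
        sendbuf = malloc(size * CHUNK * sizeof(int));
        for (i = 0; i < size * CHUNK; i++)
            sendbuf[i] = i;
    }
    // each rank receives CHUNK consecutive elements of sendbuf
    MPI_Scatter(sendbuf, CHUNK, MPI_INT, recvbuf, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d got %d..%d\n", rank, recvbuf[0], recvbuf[CHUNK - 1]);
    if (rank == 0) free(sendbuf);
    MPI_Finalize();
    return 0;
}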
Gather
Gather API
int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
    void *recvbuf, int recvcount, MPI_Datatype recvtype,
    int root, MPI_Comm communicator)
Reduce
Reduce API
int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
    MPI_Datatype datatype, MPI_Op op, int root,
    MPI_Comm communicator)
Reduce
Each rank sends a piece of data; the pieces are combined on their way to rank root into a single result.
MPI_MAXLOC, MPI_MINLOC
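A minimal reduction sketch: every rank contributes its own rank number and the root obtains the sum (the choice of MPI_SUM is illustrative).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, local, total = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    local = rank;                                  // each rank's contribution
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)                                 // only the root holds the result
        printf("sum of ranks 0..%d = %d\n", size - 1, total);
    MPI_Finalize();
    return 0;
}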
Allreduce
Allreduce API
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count,
    MPI_Datatype datatype, MPI_Op op, MPI_Comm communicator)
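The same pattern with MPI_Allreduce makes the result available on every rank; a minimal sketch computing the maximum rank (the choice of MPI_MAX is illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, maxrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // every rank gets the maximum over all contributions; note there is no root argument
    MPI_Allreduce(&rank, &maxrank, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    printf("rank %d knows the highest rank is %d\n", rank, maxrank);
    MPI_Finalize();
    return 0;
}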
Exercise
Gather
Use the Gather API to write a program where each rank sends a “Hello World” message to rank 0.
Possible Solution
Solution
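A possible solution sketch (the fixed message length and formatting are illustrative): each rank fills a fixed-size char buffer and rank 0 gathers all of them into one array.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define MSGLEN 32

int main(int argc, char *argv[]) {
    int rank, size, i;
    char msg[MSGLEN];
    char *all = NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    snprintf(msg, MSGLEN, "Hello World from rank %d", rank);
    if (rank == 0)
        all = malloc(size * MSGLEN);               // room for one message per rank
    MPI_Gather(msg, MSGLEN, MPI_CHAR, all, MSGLEN, MPI_CHAR, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        for (i = 0; i < size; i++)
            printf("%s\n", &all[i * MSGLEN]);      // print each gathered message
        free(all);
    }
    MPI_Finalize();
    return 0;
}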
Derived Datatypes
• User-defined datatypes
• Based on basic MPI datatypes
• Useful when dealing with messages containing non-contiguous data of a single type, or with contiguous or non-contiguous data of mixed datatypes
• User datatypes improve readability and portability
Construction of datatype
Datatype Construction
Example
MPI_Datatype new_type;          // datatype name declaration
...
MPI_Type_XXX(..., &new_type);   // construct the datatype
MPI_Type_commit(&new_type);     // allocate
// ... some work here
...
MPI_Type_free(&new_type);       // remove
MPI_Type_contiguous()
MPI_Type_vector()
MPI_Type_hvector()
Example
Solution
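A minimal sketch of MPI_Type_vector: it describes one column of a 4x4 row-major matrix (4 blocks of 1 element, stride 4), so the whole column can be sent with a single call. The matrix size and the ranks involved are illustrative.

#include <mpi.h>
#include <stdio.h>
#define N 4

int main(int argc, char *argv[]) {
    int rank, i;
    double A[N][N], col[N];
    MPI_Datatype column;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // N blocks of 1 double, separated by a stride of N doubles = one matrix column
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);
    if (rank == 0) {
        for (i = 0; i < N * N; i++) ((double *)A)[i] = i;
        MPI_Send(&A[0][1], 1, column, 1, 0, MPI_COMM_WORLD);   // send column 1
    } else if (rank == 1) {
        MPI_Recv(col, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received column: %g %g %g %g\n", col[0], col[1], col[2], col[3]);
    }
    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}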
MPI Structure
• nblocks: 3
• array of blocklen: 2,3,1
• array of displacement: 0, 3*sizeof(A),
3*sizeof(A)+5*sizeof(B)
• array of types: A,B,C (where A,B,C can be any MPI basic
type)
auto alignment
The compiler may insert one or more padding bytes into a structure (e.g. when mixing chars with ints or doubles).
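A minimal sketch of how the three arrays above map onto MPI_Type_create_struct, using offsetof() so that any compiler padding is accounted for automatically (the struct layout with 2 ints, 3 doubles and 1 char is illustrative):

#include <mpi.h>
#include <stddef.h>    // offsetof
#include <stdio.h>

typedef struct {
    int    a[2];       // block 0: 2 ints
    double b[3];       // block 1: 3 doubles
    char   c;          // block 2: 1 char (padding may follow)
} particle_t;

int main(int argc, char *argv[]) {
    MPI_Datatype particle_type;
    int          blocklens[3] = {2, 3, 1};
    MPI_Aint     displs[3]    = { offsetof(particle_t, a),
                                  offsetof(particle_t, b),
                                  offsetof(particle_t, c) };
    MPI_Datatype types[3]     = {MPI_INT, MPI_DOUBLE, MPI_CHAR};
    MPI_Init(&argc, &argv);
    MPI_Type_create_struct(3, blocklens, displs, types, &particle_type);
    MPI_Type_commit(&particle_type);
    // particle_type can now be used in any point-to-point or collective call
    MPI_Type_free(&particle_type);
    MPI_Finalize();
    return 0;
}

When sending arrays of such structs, the extent of the committed type can also be matched to sizeof(particle_t) with MPI_Type_create_resized, so trailing padding is handled correctly.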
Example
SubArrays
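A minimal MPI_Type_create_subarray sketch (the sizes are illustrative): it selects a 2x2 block starting at position (1,1) inside a 4x4 C-ordered array.

#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Datatype block;
    int sizes[2]    = {4, 4};   // full array: 4 x 4
    int subsizes[2] = {2, 2};   // sub-block:  2 x 2
    int starts[2]   = {1, 1};   // upper-left corner of the block
    MPI_Init(&argc, &argv);
    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &block);
    MPI_Type_commit(&block);
    // "block" now describes the 4 selected elements and their layout in memory
    MPI_Type_free(&block);
    MPI_Finalize();
    return 0;
}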
Example
Master-Slave
File Opening
amode
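A minimal sketch of opening (and creating) a file collectively; the amode argument is a bitwise OR of access flags such as MPI_MODE_CREATE and MPI_MODE_WRONLY (the file name is illustrative).

#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_File fh;
    MPI_Init(&argc, &argv);
    // all ranks of the communicator open the same file collectively
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}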
File Views
File View
Defines the part of the file that is visible to a process, as well as
the type of data in the file.
A view consists of a displacement, an etype, and a filetype.
Views setting
Example
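A minimal sketch of setting a view and writing collectively: each rank writes CHUNK ints at its own displacement, so the file ends up in rank order (file name and chunk size are illustrative).

#include <mpi.h>
#define CHUNK 4

int main(int argc, char *argv[]) {
    int rank, i, buf[CHUNK];
    MPI_File fh;
    MPI_Offset disp;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < CHUNK; i++)
        buf[i] = rank * CHUNK + i;
    MPI_File_open(MPI_COMM_WORLD, "view.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    disp = (MPI_Offset)rank * CHUNK * sizeof(int);   // displacement: skip the previous ranks' data
    // view = displacement + etype + filetype (here both types are MPI_INT)
    MPI_File_set_view(fh, disp, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buf, CHUNK, MPI_INT, MPI_STATUS_IGNORE);  // collective write
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}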
File Views
Multidimensional Arrays
• I/O on multidimensional arrays should be managed independently from the decomposition
• datafiles should be written in a “serial order” (e.g.: row major
order in C)
• use a subarray datatype
• Use a Cartesian Decomposition
Cartesian Decomposition
Cartesian Decomposition
is a parallelization method whereby different portions of the
domain are assigned to individual processes
Cartesian Decomposition
Maps each rank to a coordinate in the Cartesian grid
Cartesian Decomposition
Example
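A possible sketch: build a 2D periodic Cartesian communicator, query this rank's coordinates, and find the neighbours along the first dimension (the grid shape is chosen by MPI_Dims_create; all parameters are illustrative).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int size, cart_rank, coords[2], up, down;
    int dims[2] = {0, 0};        // let MPI choose the grid shape
    int periods[2] = {1, 1};     // periodic in both dimensions
    MPI_Comm cart;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Dims_create(size, 2, dims);                        // e.g. 4 ranks -> 2 x 2 grid
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);
    MPI_Comm_rank(cart, &cart_rank);                       // rank in the new communicator
    MPI_Cart_coords(cart, cart_rank, 2, coords);           // rank -> (row, col)
    MPI_Cart_shift(cart, 0, 1, &up, &down);                // neighbours along dimension 0
    printf("rank %d -> coords (%d,%d), neighbours %d and %d\n",
           cart_rank, coords[0], coords[1], up, down);
    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}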
Any Questions?
Image from: https://pigswithcrayons.com/illustration/dd-players-strategy-guide-illustrations/