Distributed Memory and Message Passing Programming
Programming with MPI
Francesco Moscato
Università degli Studi di Salerno
[email protected]
Outline
1 MPI Basics
2 Point to Point Communication
3 Collective Communication
4 Derived DataTypes
5 IO
6 Questions
Distributed Memory and Message Passing Programming – Francesco Moscato 2/77
Distributed Memory Model
Distributed Memory and Message Passing
Distributed Memory
Through message passing, nodes with local memory build an abstract
memory that is distributed among the nodes
Passing Data
Data in local memories are shared in the sense that the node owning
data in its local memory can send them to other nodes through
explicit communication
Nodes and Communication
A node accesses its local memory quickly; other nodes reach the
same data through communication networks (usually
high-speed networks)
Message Passing Interface (MPI)
MPI
MPI is a standard, vendor-independent, portable library
Supported Languages
C, C++, Fortran, Fortran90, Fortran 2008
Some wrappers for other Interpreted languages:
e.g.: Python with mpi4py
MPI is not ...
A Language;
a framework for automatic parallelization
Common Implementations
MPICH
http://www.mpich.org
OpenMPI
http://www.open-mpi.org
Intel MPI
https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/mpi-library.html
Running MPI
Single Machine or Clusters
MPI creates processes: you can get some speedup on a single
multi-core CPU, but performance does not improve when you use more
processes than cores.
Of course, you can also install the MPI middleware to run on many
nodes of a cluster, or even on different nodes across local networks
or the Internet.
Communicators and Groups
• MPI clusters processes into groups for
communication
• A Communicator is a container of processes
that can communicate with each other
• First, Groups are created; then a Group is
used to create a Communicator
• MPI_COMM_WORLD is the default
communicator containing ALL processes
• Processes in a Communicator are identified by a
unique rank (i.e. an ID)
• The size of a communicator is the number of
processes it manages; it cannot be changed
after the Communicator is created
First MPI Example
Hello World
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
  int rank, size;
  MPI_Init(&argc, &argv); //init MPI environment
  //get rank and size of communicator
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("Hello! I am rank # %d of %d processes\n", rank, size);

  MPI_Finalize(); //Terminate MPI execution env.
  exit(EXIT_SUCCESS);
}
Compile and Run First Example
MPICH version on Linux
$ mpicc firstmpi.c -o firstmpi
$ mpirun -np 2 ./firstmpi

Hello! I am rank # 1 of 2 processes
Hello! I am rank # 0 of 2 processes
Blocking or Non Blocking ?
Synchronous Communications
Implement blocking p2p communications
E.g.: MPI_Send() and MPI_Recv()
Asynchronous Communications
Implement non-blocking communications
E.g.: MPI_Isend() and MPI_Irecv()
Non-p2p communications
Collective communication calls such as MPI_Bcast(),
MPI_Reduce(), MPI_Barrier() ...
Blocking (Synchronous) calls
• Peer is waiting for a
message
• Message is sent
• Sender waits for a reply
• Peer sends the reply
• Sender gets the reply (ack of
receipt)
Non-Blocking (Asynchronous) calls
• Sender uses a Mailbox to
send the message, then it
can do other work
• Message arrives at peer’s
mailbox
• Peer reads the message
• Peer replies by using a
Mailbox
• Message arrives at sender’s
mailbox
• Sender reads the message
from the mailbox
Blocking Send
MPI− Send
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
• *buf points to a memory buffer with data
• count is the number of elements in buf
• datatype is the (MPI) type of data in buf
• dest is the rank of the destination process
• tag is an int that identifies the type of communication (it can
be used to define channels in communicators)
• comm is the Communicator where process sends data
MPI Datatypes
• MPI_CHAR, MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_LONG
• Array of DataTypes
• indexed arrays of blocks of datatypes
• arbitrary structure of datatypes
• Custom Datatypes ...
MPI_Send() and MPI_Ssend()
Blocking calls, with some differences:
• MPI_Send(): returns to the application when the buffer is
available to (re)use (e.g. when a small message has been copied
to an internal buffer, before the receiving process executes the
receive)
• MPI_Ssend(): always waits for the receiver to complete the
Recv call, even if the message is buffered internally
Blocking Receive
MPI_Recv
int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status)
• *buf points to a memory buffer with data
• count is the number of elements in buf
• datatype is the (MPI) type of data in buf
• source is the rank of sending process
• tag is an int that identifies the type of communication (it can
be used to define channels in communicators)
• comm is the Communicator where the process receives data
• status is a data structure containing info on received message
Example
array sending
Let’s have two processes, one owning a vector of positive numbers,
another owning a vector of negative numbers.
Let’s these processes exchange their vectors.
runinng
mpirun -np 2 ./exchange
Examples
Wrong use of Send and Recv
Click Here!
Good Use of Send and Recv
Click Here!
Send and Receive call
MPI_Sendrecv
int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                 int dest, int sendtag, void *recvbuf, int recvcount,
                 MPI_Datatype recvdatatype, int src, int recvtag,
                 MPI_Comm comm, MPI_Status *status);
• sendbuf send buffer
• sendcount is the number of elements in sendbuf
• senddatatype is the (MPI) type of data in sendbuf
• dest is the rank of the recipient process
• sendtag tag for send message
• recvbuf buffer at receive point
• recvcount is the number of elements in recvbuf
• recvdatatype is the type of elements in recvbuf
• src is the rank of the sender
• recvtag is the tag for receive messages
• comm is the Communicator where process sends data
• status is a data structure containing info on received message
Example
MPI_Sendrecv
Click Here!
Non Blocking Send
MPI_Isend
int MPI_Isend(
    void *data,
    int count,
    MPI_Datatype datatype,
    int destination,
    int tag,
    MPI_Comm communicator,
    MPI_Request *request)
• data: pointer to data to be written
• count: number of elems in data
• datatype: type of elements in data
• destination: destination rank
• tag: message tag
• communicator: The communicator
• request: handle used later to test or wait for completion
Non Blocking Recv
MPI_Irecv
int MPI_Irecv(
    void *data,
    int count,
    MPI_Datatype datatype,
    int source,
    int tag,
    MPI_Comm communicator,
    MPI_Request *request)
• data: pointer to data to be written
• count: number of elems in data
• datatype: type of elements in data
• source: source rank
• tag: message tag
• communicator: The communicator
• request: handle used later to test or wait for completion
Test
MPI_Test
Tests the status of a request (created by an Isend or an Irecv)
int MPI_Test(
    MPI_Request *request,
    int *flag,
    MPI_Status *status)
Wait
MPI_Wait
Waits for the completion of a request (created by an Isend or an Irecv)
int MPI_Wait(
    MPI_Request *request,
    MPI_Status *status)
Async Communication Example
ISend and IRecv
Click Here!
Exercise
Ring Topology
Realize a Ring Topology by using MPI Synchronous and
Asynchronous calls.
Sync Solution
ISend and IRecv
Click Here!
Collective Operations
The most commonly used are:
• Synchronization:
• MPI_Barrier()
• One-To-All Communication
• MPI_Bcast(), MPI_Scatter()
• All-to-One Communication
• MPI_Reduce(), MPI_Gather()
• All-to-All Communication
• MPI_Alltoall(), MPI_Allgather(), MPI_Allreduce()
Barrier
Barrier API
int MPI_Barrier(MPI_Comm communicator)
In a barrier, all ranks in the communicator wait for each other to
reach the barrier
Broadcast
Broadcast API
int MPI_Bcast(void *data, int count, MPI_Datatype datatype,
              int root, MPI_Comm communicator)
Scatter
Scatter API
int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm communicator)
Scatter
Data in sendbuf on root is split into chunks, each of sendcount
elements of type sendtype; chunk i is delivered to rank i. Received
data are written to recvbuf. Usually recvcount = sendcount, so
sendbuf on root holds N_ranks * sendcount elements
Gather
Gather API
int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm communicator)
Reduce
Reduce API
int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
               MPI_Datatype datatype, MPI_Op op, int root,
               MPI_Comm communicator)
Reduce
Each rank sends a piece of data; the pieces are combined on their way
to rank root into a single result.
Combination operations include: MPI_SUM, MPI_MAX, MPI_MIN, MPI_PROD,
MPI_MAXLOC, MPI_MINLOC
Allreduce
Allreduce API
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm communicator)
Exercise
Gather
Use the Gather API to write a program where each rank sends a
“Hello World” message to rank 0
Possible Solution
Solution
Click Here!
Exercise
Use Reduce and Allreduce
This Code! is wrong: it returns only the local sum and maximum.
Modify it by using Allreduce and Reduce to obtain the right results.
Solution
Solution
Click Here!
Derived Datatypes
• User-defined datatypes
• built from basic MPI datatypes
• Useful when dealing with messages containing non-contiguous
data of a single type, or contiguous or non-contiguous data of
mixed datatypes
• user datatypes improve readability and portability
Construction of datatype
1 Build the datatype by using a template. The new datatype has
type MPI_Datatype
2 allocate (commit) the datatype (MPI_Type_commit())
3 use the datatype
4 deallocate the datatype (MPI_Type_free())
Datatype Construction
Example
MPI_Datatype new_type;          //datatype name declaration
...
MPI_Type_XXX(..., &new_type);   //construct the datatype
MPI_Type_commit(&new_type);     //Allocate
// ... Some work here
...
MPI_Type_free(&new_type);       //remove
Most used Constructors
• MPI_Type_contiguous(): replicates a datatype into contiguous
locations
• MPI_Type_vector(): replicates with a stride
• MPI_Type_hvector(): like vector, but strides are given in bytes
• MPI_Type_indexed(): creates a new type from blocks comprising
identical elements with varying sizes and displacements
• MPI_Type_hindexed(): like indexed, but displacements are in bytes
• MPI_Type_create_subarray(): creates a datatype corresponding
to a distributed multidimensional array
• MPI_Type_create_struct(): creates a datatype from a generic set
of datatypes, displacements and block sizes
MPI_Type_contiguous()
int MPI_Type_contiguous(int count, MPI_Datatype oldtype,
                        MPI_Datatype *newtype)
MPI_Type_vector
int MPI_Type_vector(int count, int blocklen, int stride,
                    MPI_Datatype oldtype, MPI_Datatype *newtype)
MPI_Type_hvector
int MPI_Type_hvector(int count, int blocklen, MPI_Aint stride,
                     MPI_Datatype oldtype, MPI_Datatype *newtype)
The same as the previous datatype, but the stride is specified in
bytes
Example
Create two datatypes in order to:
• Exchange a given row and a given column of a matrix
• process rank 0 owns the matrix, process rank 1 has to receive
one row and one column
• Let the Matrix be MxN
• Since C stores elements row by row: the row type is contiguous;
the column type must be strided
Solution
Click Here!
MPI Structure
int MPI_Type_create_struct(int nblocks, const int array_of_blocklen[],
                           const MPI_Aint array_of_displacements[],
                           const MPI_Datatype array_of_types[],
                           MPI_Datatype *newtype)
• nblocks: number of blocks. A block is a collection of data of
the same type
• array of blocklen: an array of int with the size of each block
• array of displacements: array that specifies the offset of
each block (in bytes)
• array of types: array with (old) datatypes
• newtype: handle for new datatype
MPI Structure Example
• nblocks: 3
• array of blocklen: 2,3,1
• array of displacement: 0, 3*sizeof(A),
3*sizeof(A)+5*sizeof(B)
• array of types: A,B,C (where A,B,C can be any MPI basic
type)
auto alignment
the compiler may insert one or more padding bytes into the structure
(e.g. when mixing chars with ints or doubles)
Safety and Portability
Use MPI_Get_address() to get displacements ...
Example
typedef struct st {float x; float y; int type; } ST;

int nblocks = 2, blocklen[] = {2, 1};
MPI_Datatype oldtypes[] = {MPI_FLOAT, MPI_INT};
MPI_Aint displ[] = {0, 8}; // Manual setting (not recommended)
MPI_Datatype MPI_ST;
ST s;
//...
MPI_Get_address(&(s.x), &displ[0]);
MPI_Get_address(&(s.type), &displ[1]);
displ[1] -= displ[0]; displ[0] -= displ[0];
MPI_Type_create_struct(nblocks, blocklen, displ, oldtypes, &MPI_ST);
MPI_Type_commit(&MPI_ST);
s.x = ... // Initialize record here
int dst = 0, src = 1;
if (rank == src) MPI_Send(&s, 1, MPI_ST, dst, 10, MPI_COMM_WORLD);
else MPI_Recv(&s, 1, MPI_ST, src, 10, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
SubArrays
int MPI_Type_create_subarray(int ndims, const int sizes[],
                             const int subsizes[], const int starts[],
                             int order, MPI_Datatype oldtype,
                             MPI_Datatype *newtype)
• ndims: number of array dimensions
• sizes: number of elements of type oldtype in each dimension
of the full array
• subsizes: number of elements of type oldtype in each
dimension of the subarray
• starts: starting coordinates of the subarray in each dimension
• order: array storage order flag (in C: MPI_ORDER_C)
• newtype: the new datatype handler
Example
Managing Files in MPI
• MPI has many routines to manage data from files
• We see here some basic routines
• Properties in basic routines
• Positioning (with MPI file pointers)
• Synchronization (blocking or non-blocking)
• Coordination (local or collective)
I/O in Parallel Programs
Three different approaches:
• Master-Slave (or sequential)
• Distributed I/O on local files
• Fully parallel I/O
Master-Slave
• Pros: data consistency, parallel machines may have disks only
on one node
• Cons: lack of parallelism, lots of communications
Distributed I/O on Separate Files
• Pros: Scalable, no communications
• Cons: not very practical: too many files if you have lots of
processes
Fully Parallel I/O
• Pros: high performance, avoids communication, single file
• Cons: extra coding
MPI I/O functions
File Opening
int MPI_File_open(MPI_Comm comm, char *filename, int amode,
                  MPI_Info info, MPI_File *fh)
• amode is the opening mode
• info provides additional information (it is system dependent
and you can use MPI_INFO_NULL)
• the call is a collective routine: all processes must provide the
same amode and the same filename
• This supports ONLY binary I/O
amode
Shared and Local File Pointers
MPI supports read/write ops with:
• Shared fp: one rank at a time owns the shared pointer for
r/w. This may lead to performance loss. All functions are
collective (e.g. MPI_File_write_shared(),
MPI_File_write_ordered(), MPI_File_seek_shared(), etc.)
• Local fp: each rank has its own fp. There are both collective
and non-collective operations (e.g. non-collective:
MPI_File_write(), MPI_File_read(); collective:
MPI_File_write_all())
• File Views: map data from multiple processes to the file
representation on disk
I/O and Shared Pointers
int MPI_File_write_ordered(MPI_File fh, void *buf, int count,
                           MPI_Datatype datatype, MPI_Status *status)
• collective access using the shared fp
• accesses ordered by rank
• the fp moves as processes access the file
• the same view has to be used on all processes
• read with MPI_File_read_ordered()
I/O and local pointers
int MPI_File_seek(MPI_File mpi_fh, MPI_Offset offset, int whence)

int MPI_File_write(MPI_File mpi_fh, void *buf, int count,
                   MPI_Datatype datatype, MPI_Status *status);

int MPI_File_write_all();
• Seek operations update the local pointer
• whence is the update mode (MPI_SEEK_SET, MPI_SEEK_CUR,
MPI_SEEK_END)
• MPI_File_write() is not collective; the collective version is
MPI_File_write_all()
File Views
File View
Defines the part of the file that is visible to a process, as well as
the type of data in the file.
Read and Write
Processes access bytes (binary I/O)
A View consists of
displacement: the number of bytes from the beginning of the file
etype: the basic unit of data access
filetype: the type of elements in the visible part
Views setting
int MPI_File_set_view(MPI_File mpi_fh, MPI_Offset disp,
                      MPI_Datatype etype, MPI_Datatype filetype,
                      char *datarep, MPI_Info info);
datarep is the data representation (string)
info describes the object (handle)
datarep in File view
• native: (default) use the memory layout without conversion.
No precision loss, no portability
• internal: layout implementation-dependent. It is portable
within the same MPI implementation
• external32: This uses an MPI Standard (32-bit big endian
IEEE). It is portable, it has some conversion overhead, it is
not implemented everywhere
internal and external32 portability is guaranteed only when using
correct MPI datatypes and not MPI_BYTE
Default File View
• the default view is defined by MPI_File_open()
• disp = 0; etype = filetype = MPI_BYTE
Example
Click Here!
File Views and Non-Contiguous Data
File views are good when writing non-contiguously to files
In the example the file view has: count=3, blocklen=1, stride=4
File Views
for (i = 0; i < NELEM; i++) buf[i] = rank + 0.1*i; // Fill buffer
MPI_Datatype vec_type;
MPI_Type_vector(NELEM, 1, size, MPI_DOUBLE, &vec_type); // Create vector type
MPI_Type_commit(&vec_type);
disp = rank*sizeof(double); // Compute offset (in bytes)
MPI_File_set_view(fh, disp, MPI_DOUBLE, vec_type, "native", MPI_INFO_NULL); // Set view
MPI_File_write(fh, buf, NELEM, MPI_DOUBLE, MPI_STATUS_IGNORE); // Write
MPI_Type_free(&vec_type);
Multidimensional Arrays
• I/O on multidimensional arrays should be managed independently
from the decomposition
• datafiles should be written in a “serial order” (e.g.: row-major
order in C)
• use a subarray datatype
• use a Cartesian decomposition
Cartesian Decomposition
Cartesian Decomposition
is a parallelization method whereby different portions of the
domain are assigned to individual processes
Cartesian Decomposition
Maps a rank to a Coordinate
int MPI_Cart_create(MPI_Comm comm_old, int ndims, const int dims[],
                    const int periods[], int reorder,
                    MPI_Comm *comm_cart)
Cartesian Decomposition
• comm_old: input communicator
• ndims: number of dimensions of the Cartesian grid (integer)
• dims: integer array of size ndims specifying the number of
processes in each dimension
• periods: logical array of size ndims specifying periodicity
(true) or not (false) in each dimension
• reorder: ranking may be reordered (true) or not (false)
• comm_cart: communicator with the new Cartesian topology
Example
Click Here!
Any Questions?