
Distributed Memory and Message Passing Programming
Programming with MPI

Francesco Moscato

Università degli Studi di Salerno


[email protected]

Outline

1 MPI Basics

2 Point to Point Communication

3 Collective Communication

4 Derived DataTypes

5 IO

6 Questions


Distributed Memory Model


Distributed Memory and Message Passing

Distributed Memory
Through message passing, nodes with local memory build an abstract
memory that is distributed among the nodes.

Passing Data
Data in local memories are shared among nodes in the sense that a
node owning data in its local memory can send them to other nodes
through appropriate communication calls.

Nodes and Communication

Nodes access their own local memory quickly; other nodes reach the
same data through communication networks (usually high-speed
networks).


Message Passing Interface (MPI)

MPI
MPI is a standard, vendor-independent, portable library

Supported Languages
C, C++, Fortran, Fortran90, Fortran 2008
Wrappers also exist for some interpreted languages,
e.g. Python with mpi4py

MPI is not ...

• a language;
• a framework for automatic parallelization


Common Implementations

MPICH
http://www.mpich.org

OpenMPI
http://www.open-mpi.org

Intel MPI
https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/mpi-library.html


Running MPI

Single Machine or Clusters

MPI creates processes: you can obtain some speedup on a single
multi-core CPU, but performance does not improve once you use more
processes than cores.
You can also install the MPI middleware to run across many nodes of
a cluster, or even across different nodes on local networks or the
Internet.


Communicators and Groups

• MPI clusters processes into groups for communication
• A Communicator is a container of processes that can communicate
  together
• First, Groups are created; then a Group is used to create a
  Communicator
• MPI_COMM_WORLD is the default communicator containing ALL
  processes
• Processes in a Communicator are identified by a unique rank (i.e.
  an ID)
• The size of a communicator is the number of processes it manages;
  it cannot be changed after the Communicator has been created


First MPI Example


Hello World
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char * argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv); //init MPI environment
    //get rank and size of communicator
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello! I am rank # %d of %d processes\n", rank, size);

    MPI_Finalize(); //Terminate MPI execution env.
    exit(EXIT_SUCCESS);
}

Compile and Run First Example

MPICH version on Linux


$ mpicc firstmpi.c -o firstmpi
$ mpirun -np 2 ./firstmpi

Hello! I am rank # 1 of 2 processes
Hello! I am rank # 0 of 2 processes


Blocking or Non Blocking ?

Synchronous Communications
Implement blocking p2p communications,
e.g. MPI_Send() and MPI_Recv()

Asynchronous Communications
Implement non-blocking communications,
e.g. MPI_Isend() and MPI_Irecv()

Non-p2p communications

Collective communication calls like MPI_Bcast(),
MPI_Reduce(), MPI_Barrier() ...


Blocking (Synchronous) calls

• Peer is waiting for a message
• Message is sent
• Sender waits for a reply
• Peer sends the reply
• Sender gets the reply as an ack of receipt


Non-Blocking (Asynchronous) calls

• Sender uses a Mailbox to send the message, then it can do other
  work
• Message arrives at peer's mailbox
• Peer reads the message
• Peer replies by using a Mailbox
• Reply arrives at sender's mailbox
• Sender reads the reply from the mailbox


Blocking Send

MPI_Send
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)

• buf points to a memory buffer with the data to send
• count is the number of elements in buf
• datatype is the (MPI) type of the data in buf
• dest is the rank of the destination process
• tag is an int that identifies the type of communication (it can
  be used to define channels in communicators)
• comm is the Communicator in which the process sends data


MPI Datatypes

• MPI_CHAR, MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_LONG
• Arrays of datatypes
• Indexed arrays of blocks of datatypes
• Arbitrary structures of datatypes
• Custom datatypes ...


MPI_Send() and MPI_Ssend()

Blocking calls, with some differences:

• MPI_Send(): returns to the application when the buffer is
  available to (re)use, e.g. when a small message has been copied
  to an internal buffer, possibly before the receiving process
  executes the receive
• MPI_Ssend(): always waits for the receiver to post the matching
  Recv call, even if the message could be buffered internally


Blocking Receive
MPI_Recv
int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status)

• buf points to the memory buffer where received data will be stored
• count is the maximum number of elements buf can hold
• datatype is the (MPI) type of the data in buf
• source is the rank of the sending process
• tag is an int that identifies the type of communication (it can
  be used to define channels in communicators)
• comm is the Communicator in which the process receives data
• status is a data structure containing info on the received message

Example

Array exchange
Let there be two processes, one owning a vector of positive numbers,
the other owning a vector of negative numbers.
Let these processes exchange their vectors.

Running
mpirun -np 2 ./exchange
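
A minimal sketch of such an exchange (names and values are
illustrative, not taken from the linked code):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define N 4

int main(int argc, char *argv[])
{
    int rank, i, other;
    int mine[N], theirs[N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank; // the peer's rank (run with -np 2)
    for (i = 0; i < N; i++) // rank 0: positives, rank 1: negatives
        mine[i] = (rank == 0) ? (i + 1) : -(i + 1);
    if (rank == 0) { // order the calls so the two ranks cannot deadlock
        MPI_Send(mine, N, MPI_INT, other, 0, MPI_COMM_WORLD);
        MPI_Recv(theirs, N, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(theirs, N, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(mine, N, MPI_INT, other, 0, MPI_COMM_WORLD);
    }
    for (i = 0; i < N; i++)
        printf("rank %d received %d\n", rank, theirs[i]);
    MPI_Finalize();
    exit(EXIT_SUCCESS);
}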


Examples

Wrong use of Send and Recv


Click Here!

Good Use of Send and Recv


Click Here!
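
A sketch of the pattern behind these examples (a fragment with
hypothetical buffers and a peer rank other, inside an initialized
MPI program; not the linked code):

/* WRONG: if both ranks block in the send (always with MPI_Ssend,
   or with MPI_Send once messages exceed the internal buffers),
   neither reaches its receive: deadlock. */
MPI_Ssend(sendbuf, N, MPI_INT, other, 0, MPI_COMM_WORLD);
MPI_Recv(recvbuf, N, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

/* GOOD: break the symmetry so one side receives first. */
if (rank == 0) {
    MPI_Send(sendbuf, N, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, N, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
} else {
    MPI_Recv(recvbuf, N, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(sendbuf, N, MPI_INT, other, 0, MPI_COMM_WORLD);
}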


Send and Receive call


MPI_Sendrecv
int MPI_Sendrecv(void* sendbuf, int sendcount, MPI_Datatype senddatatype,
                 int dest, int sendtag, void* recvbuf, int recvcount,
                 MPI_Datatype recvdatatype, int src, int recvtag,
                 MPI_Comm comm, MPI_Status *status);

• sendbuf is the send buffer
• sendcount is the number of elements in sendbuf
• senddatatype is the (MPI) type of the data in sendbuf
• dest is the rank of the recipient process
• sendtag is the tag for the sent message
• recvbuf is the receive buffer
• recvcount is the number of elements in recvbuf
• recvdatatype is the type of the elements in recvbuf
• src is the rank of the sender
• recvtag is the tag for received messages
• comm is the Communicator in which the process sends data
• status is a data structure containing info on the received message


Example

MPI_Sendrecv
Click Here!
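
A sketch of how the call replaces a matched Send/Recv pair
(fragment; buffer names are hypothetical):

int other = 1 - rank;
MPI_Sendrecv(mine, N, MPI_INT, other, 0,     /* what we send */
             theirs, N, MPI_INT, other, 0,   /* where we receive */
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
/* MPI schedules the send and the receive internally, so the
   symmetric call is deadlock-free. */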


Non Blocking Send


MPI_Isend
int MPI_Isend(
    void* data,
    int count,
    MPI_Datatype datatype,
    int destination,
    int tag,
    MPI_Comm communicator,
    MPI_Request* request)

• data: pointer to the data to be sent
• count: number of elements in data
• datatype: type of the elements in data
• destination: destination rank
• tag: message tag
• communicator: the communicator
• request: handle used later to test or wait for completion

Non Blocking Recv


MPI_Irecv
int MPI_Irecv(
    void* data,
    int count,
    MPI_Datatype datatype,
    int source,
    int tag,
    MPI_Comm communicator,
    MPI_Request* request)

• data: pointer to the buffer where data will be written
• count: number of elements in data
• datatype: type of the elements in data
• source: source rank
• tag: message tag
• communicator: the communicator
• request: handle used later to test or wait for completion

Test

MPI_Test
Tests the status of a request (created by an Isend or an Irecv)

int MPI_Test(
    MPI_Request* request,
    int* flag,
    MPI_Status* status)


Wait

MPI_Wait
Waits for the completion of a request (created by an Isend or an Irecv)

int MPI_Wait(
    MPI_Request* request,
    MPI_Status* status)


Async Communication Example

ISend and IRecv


Click Here!
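
A minimal sketch of the non-blocking pattern (fragment; buffers and
the peer rank other are assumed declared):

MPI_Request reqs[2];
MPI_Irecv(theirs, N, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(mine, N, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);
/* ... useful work here, without touching mine/theirs ... */
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE); // complete both requests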


Exercise

Ring Topology
Realize a Ring Topology by using MPI Synchronous and
Asynchronous calls.


Sync Solution

ISend and IRecv


Click Here!
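
One possible synchronous sketch, not the linked solution (fragment;
rank and size come from the usual MPI_Comm_rank/MPI_Comm_size calls):

int next = (rank + 1) % size;          // neighbour ahead in the ring
int prev = (rank - 1 + size) % size;   // neighbour behind
int token = rank, received;
MPI_Sendrecv(&token, 1, MPI_INT, next, 0,
             &received, 1, MPI_INT, prev, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("rank %d received token from rank %d\n", rank, received);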


Collective Operations

The most commonly used are:

• Synchronization:
  • MPI_Barrier()
• One-To-All Communication:
  • MPI_Bcast(), MPI_Scatter()
• All-to-One Communication:
  • MPI_Reduce(), MPI_Gather()
• All-to-All Communication:
  • MPI_Alltoall(), MPI_Allgather(), MPI_Allreduce()


Barrier

Barrier API
int MPI_Barrier(MPI_Comm communicator)

In a barrier, all ranks in the communicator wait for each other to
reach the barrier.


Broadcast
Broadcast API
int MPI_Bcast(void* data, int count, MPI_Datatype datatype,
              int root, MPI_Comm communicator)


Scatter
Scatter API
int MPI_Scatter(void* sendbuf, int sendcount, MPI_Datatype sendtype,
                void* recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm communicator)

Scatter
Data in sendbuf on root is split into chunks of sendcount elements
of type sendtype, one chunk per rank. Each rank's chunk is written
to its recvbuf. Usually recvcount = sendcount, so sendbuf must hold
N_ranks * sendcount elements in total.
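
A minimal sketch of the call (fragment; CHUNK and NPROCS are
hypothetical constants, with NPROCS equal to the communicator size):

int sendbuf[NPROCS * CHUNK]; // significant at root only
int recvbuf[CHUNK];          // every rank, root included, gets CHUNK ints
MPI_Scatter(sendbuf, CHUNK, MPI_INT,
            recvbuf, CHUNK, MPI_INT,
            0, MPI_COMM_WORLD);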


Gather
Gather API
int MPI_Gather(void* sendbuf, int sendcount, MPI_Datatype sendtype,
               void* recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm communicator)


Reduce
Reduce API
int MPI_Reduce(void* sendbuf, void* recvbuf, int count,
               MPI_Datatype datatype, MPI_Op op, int root,
               MPI_Comm communicator)

Reduce
Each rank sends a piece of data, and these pieces are combined on
their way to rank root into a single piece of data.

Combination operations include: MPI_SUM, MPI_MAX, MPI_MIN,
MPI_PROD, MPI_MAXLOC, MPI_MINLOC


Allreduce
Allreduce API
int MPI_Allreduce(void* sendbuf, void* recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm communicator)


Exercise

Gather
Use the Gather API to write a program where each rank sends a
“Hello World” message to rank 0


Possible Solution

Solution
Click Here!
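
A possible solution sketch, not the linked code (the message length
LEN is an arbitrary choice; fixed-size slots keep the gathered
buffer simple):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define LEN 32

int main(int argc, char *argv[])
{
    int rank, size, i;
    char msg[LEN], *all = NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    snprintf(msg, LEN, "Hello World from rank %d", rank);
    if (rank == 0) all = malloc((size_t)size * LEN); // one slot per rank
    MPI_Gather(msg, LEN, MPI_CHAR, all, LEN, MPI_CHAR, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        for (i = 0; i < size; i++) printf("%s\n", &all[i * LEN]);
        free(all);
    }
    MPI_Finalize();
    exit(EXIT_SUCCESS);
}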


Exercise

Use Reduce and Allreduce

This Code! is wrong: it returns the local sum and maximum.
Modify it by using Allreduce and Reduce to obtain the right results.


Solution

Solution
Click Here!
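
The core of a possible fix (fragment; local_sum and local_max stand
in for the per-rank partial results computed by the original code):

double local_sum = rank + 1.0, global_sum; // stand-in partial sum
double local_max = rank + 1.0, global_max; // stand-in local maximum
// global sum available on root only
MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
// global maximum available on every rank
MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);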


Derived Datatypes

• User-defined datatypes
• based on basic MPI datatypes
• Useful when dealing with messages containing non-contiguous data
  of a single type, or contiguous or non-contiguous data of mixed
  datatypes
• user datatypes improve readability and portability


Construction of datatype

1 Build the datatype by using a template. The new datatype has
  type MPI_Datatype
2 allocate the datatype (MPI_Type_commit())
3 use the datatype
4 deallocate the datatype (MPI_Type_free())

Datatype Construction

Example
MPI_Datatype new_type;           //datatype name declaration
...
MPI_Type_XXX(..., &new_type);    //construct the datatype
MPI_Type_commit(&new_type);      //Allocate
// ... Some work here
...
MPI_Type_free(&new_type);        //remove


Most used Constructors

• MPI_Type_contiguous(): replicates a type into contiguous locations
• MPI_Type_vector(): replicates with a stride
• MPI_Type_hvector(): as vector, but strides are given in bytes
• MPI_Type_indexed(): creates a new type from blocks comprising
  identical elements with varying size and displacement
• MPI_Type_hindexed(): as indexed, but displacements are in bytes
• MPI_Type_create_subarray(): creates a datatype corresponding to a
  distributed multidimensional array
• MPI_Type_create_struct(): creates a datatype from a generic set
  of datatypes, displacements and block sizes


MPI_Type_contiguous()

int MPI_Type_contiguous(int count, MPI_Datatype oldtype,
                        MPI_Datatype *newtype)


MPI_Type_vector

int MPI_Type_vector(int count, int blocklen, int stride,
                    MPI_Datatype oldtype, MPI_Datatype *newtype)


MPI_Type_hvector

int MPI_Type_hvector(int count, int blocklen, MPI_Aint stride,
                     MPI_Datatype oldtype, MPI_Datatype *newtype)

The same as the previous datatype, but the stride is specified in
“bytes” (the “h” stands for heterogeneous)


Example

Create two datatypes in order to:

• Exchange a given row and a given column of a matrix
• process rank 0 owns the matrix, process rank 1 has to receive one
  row and one column
• Let the matrix be MxN
• Since C stores elements row by row: the row type is contiguous;
  the column type must be strided
Solution
Click Here!
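
A sketch under the stated assumptions, not the linked code (rank 0
owns a row-major double matrix a[M][N] and sends row r and column c
to rank 1; row, col, r and c are hypothetical):

MPI_Datatype row_t, col_t;
MPI_Type_contiguous(N, MPI_DOUBLE, &row_t);   // one row: N adjacent elements
MPI_Type_vector(M, 1, N, MPI_DOUBLE, &col_t); // one column: M elements, stride N
MPI_Type_commit(&row_t);
MPI_Type_commit(&col_t);
if (rank == 0) {
    MPI_Send(&a[r][0], 1, row_t, 1, 0, MPI_COMM_WORLD);
    MPI_Send(&a[0][c], 1, col_t, 1, 1, MPI_COMM_WORLD);
} else if (rank == 1) {
    // type signatures match: N (resp. M) plain doubles on the receive side
    MPI_Recv(row, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv(col, M, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
MPI_Type_free(&row_t);
MPI_Type_free(&col_t);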


MPI Structure

int MPI_Type_create_struct(int nblocks, const int array_of_blocklen[],
    const MPI_Aint array_of_displacements[],
    const MPI_Datatype array_of_types[], MPI_Datatype *newtype)

• nblocks: number of blocks. A block is a collection of data of
  the same type
• array_of_blocklen: an array of int with the size of each block
• array_of_displacements: array that specifies the offset of each
  block (in bytes)
• array_of_types: array with (old) datatypes
• newtype: handle for the new datatype


MPI Structure Example

• nblocks: 3
• array_of_blocklen: 2, 3, 1
• array_of_displacements: 0, 3*sizeof(A), 3*sizeof(A)+5*sizeof(B)
• array_of_types: A, B, C (where A, B, C can be any MPI basic
  type)

Auto alignment
The compiler may insert one or more empty bytes to pad a structure
(e.g. when mixing chars with int or double)

Safety and Portability

Use MPI_Get_address() to get displacements ...


Example

typedef struct st {float x; float y; int type; } ST;

int nblocks = 2, blocklen[] = {2, 1};
MPI_Datatype oldtypes[] = {MPI_FLOAT, MPI_INT};
MPI_Aint displ[] = {0, 8}; // Manual setting (not recommended)
MPI_Datatype MPI_ST;
ST s;
//...
MPI_Get_address(&(s.x), &displ[0]);
MPI_Get_address(&(s.type), &displ[1]);
displ[1] -= displ[0]; displ[0] -= displ[0];
MPI_Type_create_struct(nblocks, blocklen, displ, oldtypes, &MPI_ST);
MPI_Type_commit(&MPI_ST);
s.x = ... // Initialize record here
int dst = 0, src = 1;
if (rank == src) MPI_Send(&s, 1, MPI_ST, dst, 10, MPI_COMM_WORLD);
else MPI_Recv(&s, 1, MPI_ST, src, 10, MPI_COMM_WORLD, MPI_STATUS_IGNORE);


SubArrays

int MPI_Type_create_subarray(int ndims, const int sizes[],
    const int subsizes[], const int starts[], int order,
    MPI_Datatype oldtype, MPI_Datatype *newtype)

• ndims: number of array dimensions
• sizes: number of elements of type oldtype in each dimension of
  the full array
• subsizes: number of elements of type oldtype in each dimension
  of the subarray
• starts: starting coordinates of the subarray in each dimension
• order: array storage order flag (in C: MPI_ORDER_C)
• newtype: the new datatype handle

Example
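
A small illustrative sketch (the 4x4 array, the 2x2 block and the
starting coordinates are made-up values):

int sizes[2]    = {4, 4};  // full array: 4x4
int subsizes[2] = {2, 2};  // subarray: a 2x2 block
int starts[2]   = {1, 1};  // block starts at row 1, column 1
MPI_Datatype block_t;
MPI_Type_create_subarray(2, sizes, subsizes, starts,
                         MPI_ORDER_C, MPI_INT, &block_t);
MPI_Type_commit(&block_t);
// block_t can now describe the block in sends/receives,
// or serve as the filetype of a file view
MPI_Type_free(&block_t);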


Managing Files in MPI

• MPI has many routines to manage data in files
• We see here some basic routines
• Properties of the basic routines:
  • Positioning (with MPI file pointers)
  • Synchronization (blocking or non-blocking)
  • Coordination (local or collective)


I/O in Parallel Programs

Three different approaches:


• Master-Slave (or sequential)
• Distributed I/O on local files
• Fully parallel I/O


Master-Slave

• Pros: data consistency; parallel machines may have disks on one
  node only
• Cons: lack of parallelism, lots of communication


Distributed I/O on Separate Files

• Pros: scalable, no communication
• Cons: not very usable: too many files if you have lots of
  processes


Fully Parallel I/O

• Pros: high performance, avoids communication, single file
• Cons: extra coding


MPI I/O functions


File Opening

int MPI_File_open(MPI_Comm comm, char *filename, int amode,
                  MPI_Info info, MPI_File *fh)

• amode is the opening mode
• info provides additional information (it is system dependent and
  you can use MPI_INFO_NULL)
• the call is a collective routine: all processes must provide the
  same amode and the same filename
• This supports ONLY binary I/O


amode
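
The common amode flags defined by the MPI standard (a selection;
they are combined with bitwise OR):

• MPI_MODE_RDONLY: read only
• MPI_MODE_WRONLY: write only
• MPI_MODE_RDWR: read and write
• MPI_MODE_CREATE: create the file if it does not exist
• MPI_MODE_EXCL: error if creating a file that already exists
• MPI_MODE_APPEND: set the initial file pointer to the end of file
• MPI_MODE_DELETE_ON_CLOSE: delete the file when it is closed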


Shared and Local File Pointers

MPI supports read/write ops with:

• Shared fp: one rank at a time owns the shared pointer for r/w.
  This may lead to performance loss. There are non-collective
  calls (e.g. MPI_File_write_shared()) and collective, rank-ordered
  calls (e.g. MPI_File_write_ordered()), plus
  MPI_File_seek_shared() to move the pointer
• Local fp: each rank has its own fp. There are both collective
  and non-collective operations (e.g. non-collective:
  MPI_File_write(), MPI_File_read(); collective:
  MPI_File_write_all())
• File Views: map data from multiple processes to the file
  representation on disk


I/O and Shared Pointers

int MPI_File_write_ordered(MPI_File fh, void *buf, int count,
                           MPI_Datatype datatype, MPI_Status *status)

• collective access using the shared fp
• accesses are ordered by rank
• the fp moves as processes access the file
• the same view has to be used on all processes
• read with MPI_File_read_ordered()


I/O and local pointers

int MPI_File_seek(MPI_File mpi_fh, MPI_Offset offset, int whence)

int MPI_File_write(MPI_File mpi_fh, void *buf, int count,
                   MPI_Datatype datatype, MPI_Status *status);

int MPI_File_write_all();

• Seek operations update the local pointer
• whence is the update mode (MPI_SEEK_SET, MPI_SEEK_CUR,
  MPI_SEEK_END)
• MPI_File_write() is not collective; the collective version is
  MPI_File_write_all()


File Views

File View
Defines the part of the file that is visible to a process, as well
as the type of the data in the file.

Read and Write

Processes access bytes (binary I/O)


A view consists of

displacement: the number of bytes from the beginning of the file
etype: the basic unit of data access
filetype: the type of the elements in the visible part


Setting views

int MPI_File_set_view(MPI_File mpi_fh, MPI_Offset disp,
                      MPI_Datatype etype, MPI_Datatype filetype,
                      char *datarep, MPI_Info info);

datarep is the data representation (string)
info describes the object (handle)


datarep in File view

• native: (default) use the memory layout without conversion. No
  precision loss, no portability
• internal: layout is implementation-dependent. It is portable
  within the same MPI implementation
• external32: uses an MPI-standard representation (32-bit big
  endian IEEE). It is portable, has some conversion overhead, and
  is not implemented everywhere

internal and external32 portability is guaranteed only when using
correct MPI datatypes and not MPI_BYTE


Default File View

• the default view is set by MPI_File_open()
• disp = 0; etype = filetype = MPI_BYTE


Example

Click Here!
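
A minimal sketch of the pattern, not the linked code (the file name
and NELEM are illustrative; each rank writes its own contiguous
block of doubles into one shared binary file):

MPI_File fh;
MPI_Offset disp = (MPI_Offset)rank * NELEM * sizeof(double);
MPI_File_open(MPI_COMM_WORLD, "out.dat",
              MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
// each rank sees the file starting at its own byte offset
MPI_File_set_view(fh, disp, MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);
MPI_File_write_all(fh, buf, NELEM, MPI_DOUBLE, MPI_STATUS_IGNORE);
MPI_File_close(&fh);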


File Views and Non-Contiguous Data

File views are convenient when writing non-contiguously to files

In the example the file view has: count=3, blocklen=1, stride=4


File Views

for (i = 0; i < NELEM; i++) buf[i] = rank + 0.1*i; // Fill buffer
MPI_Datatype vec_type;
// Create vector type: NELEM blocks of 1 double, stride = size (n. of ranks)
MPI_Type_vector(NELEM, 1, size, MPI_DOUBLE, &vec_type);
MPI_Type_commit(&vec_type);
disp = rank*sizeof(double); // Compute this rank's offset (in bytes)
// Set view
MPI_File_set_view(fh, disp, MPI_DOUBLE, vec_type, "native", MPI_INFO_NULL);
// Write
MPI_File_write(fh, buf, NELEM, MPI_DOUBLE, MPI_STATUS_IGNORE);
MPI_Type_free(&vec_type);


Multidimensional Arrays
• I/O on multidimensional arrays should be managed independently
  from the decomposition
• datafiles should be written in a “serial order” (e.g. row-major
  order in C)
• use a subarray datatype
• use a Cartesian decomposition


Cartesian Decomposition
Cartesian Decomposition
A parallelization method whereby different portions of the domain
are assigned to individual processes; it maps each rank to a
coordinate.

int MPI_Cart_create(MPI_Comm comm_old, int ndims, const int dims[],
                    const int periods[], int reorder,
                    MPI_Comm *comm_cart)

Cartesian Decomposition

• comm_old: input communicator
• ndims: number of dimensions of the Cartesian grid (integer)
• dims: integer array of size ndims specifying the number of
  processes in each dimension
• periods: logical array of size ndims specifying periodicity
  (true) or not (false) in each dimension
• reorder: ranking may be reordered (true) or not (false)
• comm_cart: communicator with the new Cartesian topology
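
A minimal sketch of creating a 2D grid (fragment; size and rank come
from the usual calls, and MPI_Dims_create picks a balanced
factorization):

int dims[2] = {0, 0}, periods[2] = {1, 1}, coords[2], cart_rank;
MPI_Comm cart;
MPI_Dims_create(size, 2, dims); // e.g. 6 procs -> a 3x2 grid
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);
MPI_Comm_rank(cart, &cart_rank); // reorder=1 may renumber ranks
MPI_Cart_coords(cart, cart_rank, 2, coords);
printf("rank %d -> (%d, %d)\n", cart_rank, coords[0], coords[1]);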


Example

Click Here!


Any Questions?

Image from: https://pigswithcrayons.com/illustration/dd-players-strategy-guide-illustrations/
