Message-Passing Multicomputer
Complete computers connected through an interconnection network:
(Figure: each node pairs a processor with its own local memory; programming is done in terms of processes on the nodes, which exchange messages over the interconnection network and deliver results.)
The phase-parallel model offers a paradigm that is widely used in parallel programming.
The parallel program consists of a number of super steps, and each has two phases.
In a computation phase, multiple processes each perform an independent computation C.
In the subsequent interaction phase, the processes perform one or more synchronous interaction operations, such as a barrier or a blocking communication.
Then the next super step is executed.
List of a few active products/projects
CRI/EPCC
Hitachi MPI
HP MPI
IBM Parallel Environment for AIX-MPI Library
LAM/MPI (Supplier: Indiana University)
MPI for UNICOS Systems
MPICH (Supplier: Argonne National Laboratory)
OS/390 Unix System Services Parallel
RACE-MPI
SGI Message Passing Toolkit
Sun MPI
What is MPI
A message-passing library specification
Not a compiler specification
Not a specific product
Used for parallel computers, clusters, and heterogeneous networks as a message-passing library
Designed to be used for the development of parallel software libraries
Designed to provide access to advanced parallel hardware for
End users
Library writers
Tool developers
"
#
A History of MPICH
MPICH was developed during the MPI standards process
to provide feedback to the MPI forum on implementation
and usability issues.
William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. A high-performance, portable implementation of the MPI Message-Passing Interface standard. Parallel Computing, 22(6):789–828, 1996.
MPICH implementations
C data types
• MPI_CHAR: char
• MPI_BYTE: like unsigned char
• MPI_SHORT: short
• MPI_INT: int
• MPI_LONG: long
• MPI_FLOAT: float
• MPI_DOUBLE: double
• MPI_UNSIGNED_CHAR: unsigned char
• MPI_UNSIGNED_SHORT: unsigned short
• MPI_UNSIGNED: unsigned int
• MPI_UNSIGNED_LONG: unsigned long
• MPI_LONG_DOUBLE: long double (some systems may not implement)
• MPI_LONG_LONG_INT: long long (some systems may not implement)
FORTRAN data types
• MPI_REAL: REAL
• MPI_INTEGER: INTEGER
• MPI_LOGICAL: LOGICAL
• MPI_DOUBLE_PRECISION: DOUBLE PRECISION
• MPI_COMPLEX: COMPLEX
• MPI_DOUBLE_COMPLEX: complex*16 (or complex*32) where supported
The following data types are optional:
• MPI_INTEGER1: integer*1 if supported
• MPI_INTEGER2: integer*2 if supported
• MPI_INTEGER4: integer*4 if supported
• MPI_REAL4: real*4 if supported
• MPI_REAL8: real*8 if supported
Caution!
Predefined reduction operations (MPI_Op values):
MPI_BAND   MPI_MAX
MPI_BOR    MPI_MAXLOC*
MPI_BXOR   MPI_MIN
MPI_LAND   MPI_MINLOC
MPI_LOR    MPI_PROD
MPI_LXOR   MPI_SUM
Communicators
Defines the scope of a communication operation.
MPI_COMM_WORLD is the predefined communicator that includes all the processes.
MPI Messages
MPICH Send and Recv
MPI Basic (Blocking) Send
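For reference, the blocking send and receive used here are the standard MPI calls, with the following C prototypes:
int MPI_Send (void *buf, int count, MPI_Datatype datatype, int dest,
              int tag, MPI_Comm comm)
int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source,
              int tag, MPI_Comm comm, MPI_Status *status)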
Buffers
• When you send data, where does it go? One possibility is:
(Figure: on Process 0 the user data may first be copied into a local buffer, sent across the network, received into a local buffer on Process 1, and finally copied into the user's data area.)
Message Tag
Used to differentiate between different types of messages
being sent.
Include files
• mpi.h (C)
• mpif.h (Fortran)
Initiation of MPI
• MPI_INIT
Completion of MPI
• MPI_FINALIZE
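A minimal sketch of how these pieces fit together in a C program (the printed message is only illustrative):
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                 /* initiation of MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                         /* completion of MPI */
    return 0;
}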
MPI_Init
At least one process has access to stdin, stdout, and stderr.
The user can find out which process this is by querying the attribute MPI_IO on MPI_COMM_WORLD.
In MPICH all processes have access to stdin, stdout, and stderr, and on networks these I/O streams are routed back to the process with rank 0 in MPI_COMM_WORLD.
Example
To send an integer x from process 0 to process 1:
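A sketch of the calls involved, assuming myid holds the rank returned by MPI_Comm_rank and using message tag 0 (both choices are illustrative):
int x;
if (myid == 0) {
    x = 42;                                   /* value to transfer (illustrative) */
    MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
}
else if (myid == 1) {
    MPI_Status status;
    MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
}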
Preliminaries
Set up paths
Create required directory structure
Modify makefile to match your source file
Create a file (hostfile) listing machines to be
used (required)
Collective Communications
Collective Communication
Two broad classes:
Data movement routines
Global computation routines
Called by all processes in a communicator
Examples:
• Barrier synchronization
• Broadcast, scatter, gather
• Global sum, global maximum, etc.
"
Broadcast
Sending the same message to all processes concerned with the problem.
Multicast: sending the same message to a defined group of processes.
"
! .. ! .. ! ..
.
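A sketch of a broadcast with the standard MPI_Bcast call, assuming myid holds this process's rank (buffer name and length are illustrative):
int data[4];
if (myid == 0) {
    /* only the root needs meaningful values before the call */
    data[0] = 1; data[1] = 2; data[2] = 3; data[3] = 4;
}
/* every process in the communicator makes the same call;
   afterwards all of them hold the root's values */
MPI_Bcast(data, 4, MPI_INT, 0, MPI_COMM_WORLD);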
Procedure Specification
Opaque objects
Scatter
In its simplest form, sending each element of an array in the root process to a separate process. The contents of the ith location of the array are sent to the ith process.
"
! ..
.
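A sketch with the standard MPI_Scatter call, sending one int to each process (NPROCS, myid, and the names are illustrative):
int sendarray[NPROCS];   /* significant only at the root */
int myval;               /* each process receives one element */
/* element i of sendarray on the root arrives in myval on process i */
MPI_Scatter(sendarray, 1, MPI_INT, &myval, 1, MPI_INT, 0, MPI_COMM_WORLD);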
Parallel Computing (Intro-06): Rajeev Wankar
66
Gather
The reverse of scatter: one process (the root) collects individual values from the set of processes into an array.
Reduce
It is a gather operation combined with a specified arithmetic/logical operation.
For example, values could be gathered and then added together by the root:
"
! ..
6
.
Parallel Computing (Intro-06): Rajeev Wankar
72
MPI_Reduce
int MPI_Reduce (void *sendbuf, void *recvbuf, int count,
MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm
comm )
Input Parameters
sendbuf address of send buffer (choice)
count number of elements in send buffer
datatype data type of elements of send buffer (handle)
op reduce operation
root rank of root process (integer)
comm communicator (handle)
Output Parameter
recvbuf address of receive buffer (significant only at root)
Example
Computing pi by numerically integrating f(x) = 4 / (1 + x*x) over the interval [0, 1].
#include "mpi.h"
#include <math.h>
double f(a)
double a;
{
return (4.0 / (1.0 + a*a));
}
int main(argc,argv)
int argc;
char *argv[];
{
    int done = 0, n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
double mypi, pi, h, sum, x, a;
double startwtime, endwtime;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
n = 0;
while (!done)
{
if (myid == 0)
{
printf("Enter the number of
intervals: (0 quits) ");
scanf("%d",&n);
if (n==0) n=100; else n=0;
startwtime = MPI_Wtime();
}
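        /* The middle of this program follows the standard MPICH cpi example:
           broadcast n, quit when n == 0, otherwise each process sums its
           share of the rectangles (reconstructed from that well-known
           example, so treat it as an assumption). */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0)
            done = 1;
        else
        {
            h   = 1.0 / (double) n;
            sum = 0.0;
            /* process myid handles rectangles myid+1, myid+1+numprocs, ... */
            for (i = myid + 1; i <= n; i += numprocs)
            {
                x = h * ((double)i - 0.5);
                sum += f(x);
            }
            mypi = h * sum;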
MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,
MPI_COMM_WORLD);
if (myid == 0)
{
printf("pi is approximately
%.16f, Error is %.16f\n",pi, fabs(pi
- PI25DT));
endwtime = MPI_Wtime();
printf("wall clock time = %f\n",
endwtime-startwtime);
}
}
}
MPI_Finalize();
}
Scatterv Operation
(Figure: an array of 10 elements 3 4 6 8 4 9 3 6 7 8 held by P0 is scattered in blocks of varying sizes to P0, P1, P2, and P3.)
MPI_SCATTERV (sendbuf, sendcounts, displs, sendtype, recvbuf,
              recvcount, recvtype, root, comm)
int MPI_Scatterv (
void *send_buffer,
int *send_cnt,
int *send_disp,
MPI_Datatype send_type,
void *receive_buffer,
int receive_cnt,
MPI_Datatype receive_type,
int root,
MPI_Comm communicator)
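A sketch matching the figure above, in which the root scatters 10 ints unevenly to 4 processes (the counts, displacements, and myid are illustrative):
int sendbuf[10] = {3, 4, 6, 8, 4, 9, 3, 6, 7, 8};   /* significant only at root */
int sendcounts[4] = {3, 3, 2, 2};                   /* elements for P0..P3 */
int displs[4]     = {0, 3, 6, 8};                   /* offset of each block */
int recvbuf[3];                                     /* large enough for any block */
MPI_Scatterv(sendbuf, sendcounts, displs, MPI_INT,
             recvbuf, sendcounts[myid], MPI_INT, 0, MPI_COMM_WORLD);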
Gatherv Operation
(Figure: blocks of varying sizes held by P0, P1, P2, and P3 are gathered into a single 10-element array 3 4 6 8 4 9 3 6 7 8 on P0.)
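For reference, the C prototype mirrors MPI_Scatterv, with the counts and displacements now describing the receive side (significant only at the root):
int MPI_Gatherv (
void *send_buffer,
int send_cnt,
MPI_Datatype send_type,
void *receive_buffer,
int *receive_cnt,
int *receive_disp,
MPI_Datatype receive_type,
int root,
MPI_Comm communicator)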
Alltoall
Input Parameters
sendbuf     starting address of send buffer
sendcount   number of elements to send to each process (integer)
sendtype    data type of send buffer elements
recvcount   number of elements received from any process (integer)
recvtype    data type of receive buffer elements
comm        communicator
Output Parameter
recvbuf     starting address of receive buffer
Alltoall Operation
(Figure: each of P0..P3 starts with 8 values split into 4 blocks; after the all-to-all, block j of process i's send buffer has moved to block i of process j's receive buffer.)
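A sketch of the call itself, exchanging two ints with every process (NPROCS and myid are illustrative; in practice the buffers are sized from MPI_Comm_size):
int sendbuf[2 * NPROCS];   /* block j is destined for process j */
int recvbuf[2 * NPROCS];   /* block i arrives from process i */
int j;
for (j = 0; j < 2 * NPROCS; j++)
    sendbuf[j] = myid * 100 + j;            /* illustrative data */
MPI_Alltoall(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, MPI_COMM_WORLD);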
MPI_Allgatherv
int MPI_Allgatherv (
void *send_buffer,
int send_cnt,
MPI_Datatype send_type,
void *receive_buffer,
int *receive_cnt,
int *receive_disp,
MPI_Datatype receive_type,
MPI_Comm communicator)
The block of data sent from the jth process is received by every
process and placed in the jth block of the buffer recvbuf.
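A sketch of a call in which process j contributes cnts[j] ints and every process ends up with the full concatenation (the names, and that cnts and displs are known on all processes, are assumptions):
int cnts[NPROCS], displs[NPROCS];   /* block sizes and offsets, same on all processes */
int myblock[MAXCNT];                /* this process's cnts[myid] values */
int allblocks[TOTAL];               /* receives every block, in rank order */
MPI_Allgatherv(myblock, cnts[myid], MPI_INT,
               allblocks, cnts, displs, MPI_INT, MPI_COMM_WORLD);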
Function MPI_Alltoallv
Matrix-vector Multiplication
Storing Vectors
Matrix-Vector Multiplication
c0 = a0,0 b0 + a0,1 b1 + a0,2 b2 + a0,3 b3 + a0,4 b4
c1 = a1,0 b0 + a1,1 b1 + a1,2 b2 + a1,3 b3 + a1,4 b4
c2 = a2,0 b0 + a2,1 b1 + a2,2 b2 + a2,3 b3 + a2,4 b4
c3 = a3,0 b0 + a3,1 b1 + a3,2 b2 + a3,3 b3 + a3,4 b4
c4 = a4,0 b0 + a4,1 b1 + a4,2 b2 + a4,3 b3 + a4,4 b4
(Figure: in the rowwise decomposition, Processors 0 through 4 each take one row; processor i initially computes the terms of ci.)
Multiplications
(Figure: in the columnwise decomposition, each process holds column i of A together with bi; it first forms the partial products (multiplications), then an all-to-all exchange redistributes the partial results, and a reduction produces the elements of c.)
Matrix-Vector
Worked example: A b with
A = | 2 1 0 4 |        b = | 1 |
    | 3 2 1 1 |            | 3 |
    | 4 3 1 2 |            | 4 |
    | 3 0 2 0 |            | 1 |
Each process Pi holds column i of A together with bi and multiplies them:
P0: (2, 3, 4, 3) x 1 = (2, 3, 4, 3)
P1: (1, 2, 3, 0) x 3 = (3, 6, 9, 0)
P2: (0, 1, 1, 2) x 4 = (0, 4, 4, 8)
P3: (4, 1, 2, 0) x 1 = (4, 1, 2, 0)
This is an alltoall operation with reduction: summing the partial vectors elementwise gives
c = (9, 14, 19, 11).
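A minimal sketch of this columnwise scheme, using MPI_Reduce for the combining step in place of the all-to-all-with-reduction shown in the figure (N = 4 and one column per process are assumptions):
#define N 4
double col[N];        /* this process's column of A */
double bi;            /* this process's element of b */
double partial[N], c[N];
int row;

/* multiplications: partial = (column i of A) * bi */
for (row = 0; row < N; row++)
    partial[row] = col[row] * bi;

/* reduction: element-wise sum of the partial vectors gives c on the root */
MPI_Reduce(partial, c, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);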
What’s in MPICH-2
• Extensions to the message-passing model
– Dynamic process management
– One-sided operations (remote memory access)
– Parallel I/O
– Thread support
• Making MPI more robust and convenient
– C++ and Fortran 90 bindings
– External interfaces, handlers
– Extended collective operations
– Language interoperability
#include "mpi.h"
#include <stdio.h>
#define BUFSIZE 100
MPI_File thefile;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
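    /* The omitted middle follows the well-known parallel-write example from
       "Using MPI-2": fill a buffer and write it at a rank-dependent offset
       of a shared file. The file name "testfile" is an assumption. */
    int i, buf[BUFSIZE];
    for (i = 0; i < BUFSIZE; i++)
        buf[i] = myrank * BUFSIZE + i;
    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &thefile);
    MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int),
                      MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
    MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);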
MPI_File_close(&thefile);
MPI_Finalize();
return 0;
}
MPICH-G2
References