
Message-Passing Computing

for Distributed Multi-Computers

A review of basic concepts

Parallel Computing (Intro-06): Rajeev Wankar


1

Message-Passing Multicomputer
Complete computers connected through an interconnection
network:

[Figure: each node is a processor with its own local memory; nodes exchange messages over an interconnection network]

Parallel Computing (Intro-06): Rajeev Wankar


2
Programming
Programming a message-passing multicomputer can
be achieved by

• Designing a special parallel programming language

• Extending the syntax/reserved words of an existing sequential high-level language to handle message passing

• Using an existing sequential high-level language and providing a library of external procedures for message passing

Parallel Computing (Intro-06): Rajeev Wankar


3

Programming

• Involves dividing the problem into parts that are intended to be executed simultaneously to solve the problem

• Each part is executed by a separate computer

• Parts (processes) communicate by sending messages - the only way to distribute data and collect results

Parallel Computing (Intro-06): Rajeev Wankar


4
Message Passing Parallel Programming
Software Tools

Parallel Virtual Machine (PVM) - developed in the late 1980s. Became very popular.

Message-Passing Interface (MPI) - standard defined in the 1990s.

Both provide a set of user-level libraries for message passing. Used with regular programming languages (FORTRAN, C, C++, ...).
Parallel Computing (Intro-06): Rajeev Wankar
5

Embarrassingly Parallel Computations

A computation that can be divided into a number of completely independent parts, each of which can be executed by a separate process(or).

No communication, or very little communication, between processes.

[Figure: input data is split among the processes, which compute independently and return their results]
Parallel Computing (Intro-06): Rajeev Wankar
6
The phase-parallel model offers a paradigm that is widely used in parallel programming.

The parallel program consists of a number of supersteps, and each has two phases.

In a computation phase, multiple processes each perform an independent computation C.

In the subsequent interaction phase, the processes perform one or more synchronous interaction operations, such as a barrier or a blocking communication.

Then the next superstep is executed.

[Figure: the phase-parallel (nearly embarrassingly parallel) model]
Parallel Computing (Intro-06): Rajeev Wankar
7

Process farm

This paradigm is also known as the master-slave paradigm.
A master process executes the
essentially sequential part of
the parallel program and
spawns a number of slave
processes to execute the
parallel workload.
When a slave finishes its
workload, it informs the master
which assigns a new workload
to the slave.
This is a very simple paradigm,
where the coordination is done
by the master.
Parallel Computing (Intro-06): Rajeev Wankar
8
MPI (Message Passing Interface)

Standard developed by a group of academics and industrial partners to foster more widespread use and portability.

Defines routines, not implementation

Several free implementations of MPI standard exist.

Parallel Computing (Intro-06): Rajeev Wankar


9

List of a few active products/projects

CRI/EPCC
Hitachi MPI
HP MPI
IBM Parallel Environment for AIX-MPI Library
LAM/MPI (Supplier: Indiana University)
MPI for UNICOS Systems
MPICH (Supplier: Argonne National Laboratory)
OS/390 Unix System Services Parallel
RACE-MPI
SGI Message Passing Toolkit
Sun MPI
Parallel Computing (Intro-06): Rajeev Wankar
10

What is MPI
A message-passing library specification
Not a compiler specification
Not a specific product
Used for parallel computers, clusters, and heterogeneous
networks as a message passing library
Designed to be used for the development of parallel software
libraries
Designed to provide access to advanced parallel hardware for
End users
Library writers
Tool developers

Parallel Computing (Intro-06): Rajeev Wankar


12
Where to use MPI?

• We need a portable parallel program


• We are writing a parallel library

Why learn MPI?


• Portable
• Expressive
• Good way to learn about subtle issues in parallel
computing
• Universal acceptance

Parallel Computing (Intro-06): Rajeev Wankar


13

The attractiveness of the message-passing paradigm


is its wide portability.

Programs expressed this way may run on distributed-


memory multiprocessors, networks of workstations,
and combinations of all of these.

In addition, shared-memory implementations are


possible.

Parallel Computing (Intro-06): Rajeev Wankar


14
Basics of Message-Passing
Programming

Two primary mechanisms needed:

1. A method of creating separate processes for execution


on different computers

2. A method of sending and receiving messages

Parallel Computing (Intro-06): Rajeev Wankar


15

Single Program Multiple Data (SPMD) model


Different processes are merged into one program. Within the program, control statements select different parts for each processor to execute. All executables start together - static process creation.

"

Parallel Computing (Intro-06): Rajeev Wankar


16
"

Parallel Computing (Intro-06): Rajeev Wankar


17

General Message-Passing SPMD: C program


main (int argc, char **argv)
{
   if (process is to become a controller process) {
      Controller(/* arguments */);
   } else {
      Worker(/* arguments */);
   }
}

Parallel Computing (Intro-06): Rajeev Wankar


18

A History of MPICH
MPICH was developed during the MPI standards process
to provide feedback to the MPI forum on implementation
and usability issues.

With the release of the MPI standard, MPICH was


designed to provide an implementation of the MPI
standard.

It supported both MIMD programming and heterogeneous


clusters from the very beginning.

Parallel Computing (Intro-06): Rajeev Wankar


19

MPICH

MPICH is an open-source, portable implementation of


the Message-Passing Interface Standard.

Designed at Mathematics and Computer Science


Division of Argonne National Laboratory.

The “CH” in MPICH stands for “Chameleon,” a symbol of adaptability to one's environment and thus of portability.

Parallel Computing (Intro-06): Rajeev Wankar


20

"
MPICH

William Gropp,
Gropp, Ewing Lusk, Nathan Doss, and
Skjellum A high performance, portable
Anthony Skjellum.
implementation of the MPI Message-Passing
Interface standard. Parallel Computing, 22(6):789–
828, 1996.

It contains a complete implementation of version 1.2


of the MPI Standard and also significant parts of
MPI-2, particularly in the area of parallel I/O.

Parallel Computing (Intro-06): Rajeev Wankar


21

MPICH

MPICH2 is the latest implementation of MPI.

In addition to the features in MPICH, MPICH2 includes support for one-sided communication, dynamic processes, intercommunicator collective operations, and expanded MPI-IO functionality.

Clusters consisting of both single-processor and SMP


nodes are supported.

Parallel Computing (Intro-06): Rajeev Wankar


22
MPICH Architecture

• Most code is completely portable


• An Abstract Device defines the communication layer
• The abstract device can have widely varying
instantiations, using:
– sockets
– shared memory
– other special interfaces
• e.g. Myrinet, Quadrics, InfiniBand, Grid
protocols

Parallel Computing (Intro-06): Rajeev Wankar


23

MPICH implementations

MPICH is freely available and is distributed as


open source.

• The Unix/Linux (all flavors) version of MPICH


• The Microsoft Windows version of MPICH
(MPICH.NT 1.2.5)
• MPICH-G2, the Globus version of MPICH

Parallel Computing (Intro-06): Rajeev Wankar


24
Is MPICH Large or Small?

MPICH is large (> 210 functions)


• MPICH’s extensive functionality requires many functions
• Number of functions not necessarily a measure of
complexity
MPICH is small (6 Functions)
• Many parallel programs can be written with just 6 basic
functions
MPICH is just right for message passing
• One need not master all parts of MPICH to use it
Parallel Computing (Intro-06): Rajeev Wankar
25

MPI Basic Send/Receive


• We need to fill in the details in:

  Process 0: Send(data)          Process 1: Receive(data)

• Things that need specifying:


– How will “data” be described?
– How will processes be identified?
– How will the receiver recognize/screen messages?
– What will it mean for these operations to complete?

Parallel Computing (Intro-06): Rajeev Wankar


26
Some Basic Concepts

• Processes can be collected into groups


• Each message is sent in a context, and must be
received in the same context
• A group and context together form a communicator
• A process is identified by its rank in the group
associated with a communicator
• There is a default communicator whose group
contains all initial processes, called
MPI_COMM_WORLD

Parallel Computing (Intro-06): Rajeev Wankar


27

Begin programming with 6 MPI function calls

• MPI_INIT Initializes MPI


• MPI_COMM_SIZE Determines number of processes
• MPI_COMM_RANK Determines the label of the
calling process
• MPI_SEND Sends a message
• MPI_RECV Receives a message
• MPI_FINALIZE Terminates MPI
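A minimal sketch (added here, not part of the original slide) of a complete program using only these six calls; it assumes at least two processes, and the payload value and tag are arbitrary choices for illustration:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int msg = 42;                           /* hypothetical payload */
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int msg;
        MPI_Status status;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 of %d received %d\n", size, msg);
    }

    MPI_Finalize();
    return 0;
}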

Parallel Computing (Intro-06): Rajeev Wankar


28
MPI Datatypes
• The data in a message to send or receive is described by a triple (address, count, datatype)
• An MPI datatype is recursively defined as:
– predefined, corresponding to a data type from the
language (e.g., MPI_INT, MPI_DOUBLE)
– a contiguous array of MPI datatypes
– a strided block of datatypes
– an indexed array of blocks of datatypes
– an arbitrary structure of datatypes
• There are MPI functions to construct custom
datatypes, in particular ones for subarrays
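As an illustration (a sketch added here, not from the slide), MPI_Type_vector can describe a strided block such as one column of a row-major matrix; the matrix size N and the ranks used are assumptions:

#include <mpi.h>
#include <stdio.h>
#define N 4

int main(int argc, char **argv)
{
    int rank, i, j;
    double a[N][N], col[N];
    MPI_Datatype column;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* A strided datatype picking one column out of a row-major N x N matrix:
       N blocks of 1 double, each block N doubles apart. */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0) {
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                a[i][j] = i * N + j;
        MPI_Send(&a[0][2], 1, column, 1, 0, MPI_COMM_WORLD);   /* send column 2 */
    } else if (rank == 1) {
        MPI_Recv(col, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("column: %g %g %g %g\n", col[0], col[1], col[2], col[3]);
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}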

Parallel Computing (Intro-06): Rajeev Wankar


29

C data types

• MPI_CHAR            char
• MPI_BYTE            like unsigned char
• MPI_SHORT           short
• MPI_INT             int
• MPI_LONG            long
• MPI_FLOAT           float
• MPI_DOUBLE          double
• MPI_UNSIGNED_CHAR   unsigned char
• MPI_UNSIGNED_SHORT  unsigned short
• MPI_UNSIGNED        unsigned int
• MPI_UNSIGNED_LONG   unsigned long
• MPI_LONG_DOUBLE     long double (some systems may not implement)
• MPI_LONG_LONG_INT   long long (some systems may not implement)
Parallel Computing (Intro-06): Rajeev Wankar
30
FORTRAN data types

• MPI_REAL               REAL
• MPI_INTEGER            INTEGER
• MPI_LOGICAL            LOGICAL
• MPI_DOUBLE_PRECISION   DOUBLE PRECISION
• MPI_COMPLEX            COMPLEX
• MPI_DOUBLE_COMPLEX     complex*16 (or complex*32) where supported

The following data types are optional:
• MPI_INTEGER1   integer*1 if supported
• MPI_INTEGER2   integer*2 if supported
• MPI_INTEGER4   integer*4 if supported
• MPI_REAL4      real*4 if supported
• MPI_REAL8      real*8 if supported

Parallel Computing (Intro-06): Rajeev Wankar


31

Caution!

Fortran types should only be used in Fortran


programs,
C types should only be used in C programs.

For example, it is an error to use MPI_INT for a Fortran INTEGER, which should be MPI_INTEGER.

Parallel Computing (Intro-06): Rajeev Wankar


32
MPI_Op Options (Collective Operation)

MPI_BAND MPI_MAX
MPI_BOR MPI_MAXLOC*
MPI_BXOR MPI_MIN
MPI_LAND MPI_MINLOC
MPI_LOR MPI_PROD
MPI_LXOR MPI_SUM

* Maximum and Location


Parallel Computing (Intro-06): Rajeev Wankar
33

Communicators
Defines scope of a communication operation.

Processes have ranks associated with the communicator.

Initially, all processes enrolled in a “universe” called


MPI_COMM_WORLD and each process is given a unique
rank, a number from 0 to n - 1, where there are n
processes.

Other communicators can be established for groups of


processes.
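A small sketch (added for illustration) of establishing another communicator with MPI_Comm_split; the group size of 4 is an arbitrary assumption:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, group_rank, color;
    MPI_Comm group_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Split MPI_COMM_WORLD into groups of 4 processes each:
       'color' selects the group, the key (world_rank) orders ranks inside it. */
    color = world_rank / 4;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &group_comm);
    MPI_Comm_rank(group_comm, &group_rank);

    printf("world rank %d -> group %d, rank %d\n", world_rank, color, group_rank);

    MPI_Comm_free(&group_comm);
    MPI_Finalize();
    return 0;
}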
Parallel Computing (Intro-06): Rajeev Wankar
34
MPI_COMM_WORLD communicator

This Default Communicator is MPI’s mechanism for


establishing individual communication universes

" MPI_COMM_WORLD

Parallel Computing (Intro-06): Rajeev Wankar


35

MPI Messages

Message : data (3 parameters) + envelope (3 parameters)


Data: startbuf, count, datatype
• Startbuf: address where the data starts
• Count: number of elements (items) of data in the message
Envelope: dest, tag, comm
• Destination or Source: Sending or Receiving processes
• Tag: Integer to distinguish messages
Communicator:
The communicator is the communication “universe.”
Messages are sent or received within a given “universe.”

Parallel Computing (Intro-06): Rajeev Wankar


36
Synchronous Message Passing (Blocking)

Routines that return only when the message transfer has completed.

Synchronous send routine - waits until the complete message can be accepted by the receiving process before sending the message.

Synchronous receive routine - waits until the message it is expecting arrives.

Synchronous routines intrinsically perform two actions:


They transfer data and they synchronize processes.
Parallel Computing (Intro-06): Rajeev Wankar
37

Asynchronous Message Passing (Non-Blocking)

Routines that do not wait for actions to complete before


returning. Usually require local storage for messages.

More than one version depending upon the actual


semantics for returning.

In general, they do not synchronize processes but allow


processes to move forward sooner. Must be used with
care.
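One common non-blocking pattern in MPI is MPI_Isend/MPI_Irecv followed by MPI_Waitall; the sketch below (added here, assuming exactly two processes) posts both operations before waiting:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, other, sendval, recvval;
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    other = 1 - rank;            /* assumes exactly 2 processes */
    sendval = rank;

    /* Post both operations, optionally overlap with independent work,
       then wait for both to complete. */
    MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);
    /* ... independent computation could go here ... */
    MPI_Waitall(2, reqs, stats);

    printf("rank %d got %d\n", rank, recvval);
    MPI_Finalize();
    return 0;
}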

Parallel Computing (Intro-06): Rajeev Wankar


38

MPICH Send and Recv

• Communication between two processes


• Source process sends message to destination
process
• Communication takes place within a
communicator
• Destination process is identified by its rank in the
communicator

Parallel Computing (Intro-06): Rajeev Wankar


39

Parameters of the blocking send

MPI_Send(buf, count, datatype, dest, tag, comm)

buf        address of send buffer
count      number of items to send
datatype   data type of each item
dest       rank of destination process
tag        message tag
comm       communicator

Parallel Computing (Intro-06): Rajeev Wankar


40

"
MPI Basic (Blocking) Send

• When this function returns, the data has been delivered


to the system and the buffer can be reused. The
message may not have been received by the target
process.

Parallel Computing (Intro-06): Rajeev Wankar


41

Parameters of the blocking receive

MPI_Recv(buf, count, datatype, src, tag, comm, status)

buf        address of receive buffer
count      maximum number of items to receive
datatype   data type of each item
src        rank of source process
tag        message tag
comm       communicator
status     status after operation

Parallel Computing (Intro-06): Rajeev Wankar


42
MPI Basic (Blocking) Receive

• Waits until a matching (both source and tag) message is


received from the system, and the buffer can be used
• source is rank in communicator specified by comm, or
MPI_ANY_SOURCE
• tag is a tag to be matched on or MPI_ANY_TAG
• receiving fewer than count occurrences of datatype is OK,
but receiving more is an error
• status contains further information (e.g. size of message)
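A short sketch (added for illustration) of receiving with the wild cards and then inspecting the status; the payloads and tags are arbitrary:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Accept one message from every other rank, in whatever order
           they arrive, and inspect the envelope afterwards. */
        for (i = 1; i < size; i++) {
            int value, count;
            MPI_Status status;
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count);
            printf("%d item(s) from rank %d, tag %d, value %d\n",
                   count, status.MPI_SOURCE, status.MPI_TAG, value);
        }
    } else {
        int value = rank * 10;               /* hypothetical payload */
        MPI_Send(&value, 1, MPI_INT, 0, rank, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}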

Parallel Computing (Intro-06): Rajeev Wankar


43

Buffers

• When you send data, where does it go? One possibility is:

[Figure: user data in Process 0 is copied into a local buffer, sent across the network into a local buffer in Process 1, and finally copied into the user's data area]
Parallel Computing (Intro-06): Rajeev Wankar
44
Message Tag
Used to differentiate between different types of messages
being sent.

Message tag is carried within message.

If special type matching is not required, a wild card message tag is used, so that the recv() will match with a message from any send().

Parallel Computing (Intro-06): Rajeev Wankar


45

Message Tag Example


To send a message x with message tag 5 from a source
process 1 to a destination process 2 and assign to y:

send(&x, 2, 5);    /* in source process 1: destination 2, tag 5 */

recv(&y, 1, 5);    /* in destination process 2: waits for a message from process 1 with a tag of 5 */


Parallel Computing (Intro-06): Rajeev Wankar
46
Initializing MPICH

• Must be first routine called


• int MPI_Init(int *argc, char ***argv);

Parallel Computing (Intro-06): Rajeev Wankar


47

What makes an MPICH Program?

Include files
• mpi.h (c)
• mpif.h (Fortran)
Initiation of MPI
• MPI_INIT
Completion of MPI
• MPI_FINALIZE

Parallel Computing (Intro-06): Rajeev Wankar


48
MPI_Init
The command-line arguments are provided to MPI_Init to allow an MPI implementation to use them in initializing the MPI environment.

They are passed by reference to allow an


MPI implementation to provide them in
environments where the command-line
arguments are not provided to main.

Parallel Computing (Intro-06): Rajeev Wankar


49

MPI_Init
At least one process has access to stdin,
stdout, and stderr
The user can find out which process this is by querying the attribute MPI_IO on MPI_COMM_WORLD.
In MPICH all processes have access to stdin,
stdout, and stderr and on networks these
I/O streams are routed back to the process with
rank 0 in MPI_COMM_WORLD.

Parallel Computing (Intro-06): Rajeev Wankar


50
MPI_Init
On most systems, these streams also can be
redirected through mpirun, as follows

mpirun -np 64 myprog -myarg 13 < data.in > results.out

Here we assume that –myarg 13 are


command-line arguments processed by the
application myprog. After MPI_Init, each
process will have these arguments in its argv
Parallel Computing (Intro-06): Rajeev Wankar
51

Example
To send an integer x from process 0 to process 1,

int x, msgtag = 1;          /* arbitrary, agreed-upon tag value */
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);    /* find rank */

if (myrank == 0) {
   MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
   MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}

Parallel Computing (Intro-06): Rajeev Wankar


52

Parallel Computing (Intro-06): Rajeev Wankar


53

Write a simple parallel program in which every


process with rank greater than 0 sends a message
“Hello-Participants” to the process with rank 0. The process with rank 0 receives the messages and prints them.

Parallel Computing (Intro-06): Rajeev Wankar


54
#include "mpi.h"
#include <stdio.h>
#include <string.h>

main (int argc, char **argv) {
   int MyRank, Numprocs, tag, i;
   MPI_Status status;
   char send_message[20], recv_message[20];

   MPI_Init (&argc, &argv);
   MPI_Comm_size (MPI_COMM_WORLD, &Numprocs);
   MPI_Comm_rank (MPI_COMM_WORLD, &MyRank);
   tag = 100;
   strcpy (send_message, "Hello-Participants");
   if (MyRank == 0) {
      for (i = 1; i < Numprocs; i++) {
         MPI_Recv (recv_message, 20, MPI_CHAR, i, tag, MPI_COMM_WORLD, &status);
         printf ("node %d : %s \n", i, recv_message);
      }
   } else
      MPI_Send (send_message, 20, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
   MPI_Finalize();
}

Parallel Computing (Intro-06): Rajeev Wankar


55

Basic instructions for compiling/executing


MPICH programs

Preliminaries

Set up paths
Create required directory structure
Modify makefile to match your source file
Create a file (hostfile) listing machines to be
used (required)

Parallel Computing (Intro-06): Rajeev Wankar


56
Compiling/executing (SPMD) C/MPICH program

To compile MPI programs:


mpicc -o file file.c
Or
mpiCC -o file file.cpp

To execute MPI program: mpirun -np no_processors file

Parallel Computing (Intro-06): Rajeev Wankar


57

Collective Communications

The sending and/or receiving of messages to/from


groups of processes. A collective communication
implies that all processes need to participate in the communication.
Involves coordinated communication within a
group of processes
No message tags used
All collective routines block until they are locally
complete

Parallel Computing (Intro-06): Rajeev Wankar


58

Collective Communication
Two broad classes:
Data movement routines
Global computation routines
Called by all processes in a communicator
Examples:
• Barrier synchronization
• Broadcast, scatter, gather
• Global sum, global maximum, etc.
Parallel Computing (Intro-06): Rajeev Wankar
59

Parallel Computing (Intro-06): Rajeev Wankar


60

"
Broadcast
Sending same message to all processes concerned with problem.
Multicast - sending same message to defined group of processes.

"

& & &

! .. ! .. ! ..

2 '$3 2 '$3 2 '$3

.
Parallel Computing (Intro-06): Rajeev Wankar
61

Procedure Specification

MPI procedures are specified using a language-independent notation. The arguments of procedure calls are marked as IN, OUT or INOUT. The meanings of these are:

• IN - the call uses but does not update the argument,
• OUT - the call may update the argument,
• INOUT - the call both uses and updates the argument.

Parallel Computing (Intro-06): Rajeev Wankar


62
Procedure Specification

There is one special case: if an argument is a handle to an opaque object and the object is updated by the procedure call, then the argument is marked OUT.

It is marked this way even though the handle itself is not modified - we use the OUT attribute to denote that the object the handle references is updated.

Parallel Computing (Intro-06): Rajeev Wankar


63

Opaque objects

MPI manages system memory that is used for buffering


messages and for storing internal representations of
various MPI objects such as groups, communicators,
datatypes, etc.
This memory is not directly accessible to the user, and
objects stored there are opaque: their size and shape is
not visible to the user. Opaque objects are accessed via
handles, which exist in user space.
In Fortran, all handles have type INTEGER. In C, a
different handle type is defined for each category of
objects.

Parallel Computing (Intro-06): Rajeev Wankar


64
Broadcast
A broadcast sends data from one processor to all other
processors, including itself.
C:
int MPI_Bcast ( void *buffer, int count, MPI_Datatype
datatype, int root, MPI_Comm comm);
Input/output Parameters
INOUT buffer starting address of buffer
IN count number of entries in buffer
IN Datatype data type of buffer
IN root rank of broadcast root
IN comm communicator
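A usage sketch (added here; the broadcast value is an arbitrary assumption):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, n = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        n = 100;                 /* hypothetical value known only at the root */

    /* Every process calls MPI_Bcast; after the call all of them hold n. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d sees n = %d\n", rank, n);

    MPI_Finalize();
    return 0;
}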
Parallel Computing (Intro-06): Rajeev Wankar
65

Scatter
In its simplest form sending each element of an array in
root process to a separate process. Contents of ith location
of array sent to ith process.

"

& & &

! ..

24 '$3 24 '$3 24 '$3

.
Parallel Computing (Intro-06): Rajeev Wankar
66
Scatter

The process with rank root distributes the contents of send-


buffer among the processes. The contents of send-buffer
are split into p segments each consisting of send_count
elements. The first segment goes to process 0, the second
to process 1, etc… The send arguments are significant only
on process root.
C:
int MPI_Scatter( void *send_buffer, int send_count,
MPI_Datatype send_type, void *recv_buffer, int recv_count,
MPI_Datatype recv_type, int root, MPI_Comm comm);
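A usage sketch (added for illustration; a send_count of 2 per process is an arbitrary choice):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    int sendbuf[64], recvbuf[2];             /* assumes at most 32 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The root fills size*2 integers; each process receives a segment of 2. */
    if (rank == 0)
        for (i = 0; i < size * 2; i++)
            sendbuf[i] = i;

    MPI_Scatter(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d received %d %d\n", rank, recvbuf[0], recvbuf[1]);

    MPI_Finalize();
    return 0;
}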

Parallel Computing (Intro-06): Rajeev Wankar


67

Scatter
"

Parallel Computing (Intro-06): Rajeev Wankar


68
Gather
Having one process collect individual values from set of processes.

"

& & &

! ..

25 - '$3 25 - '$3 25 - '$3

.
Parallel Computing (Intro-06): Rajeev Wankar
69

Gather

Each process in comm sends the contents of send-buffer to


the process with rank root. The process with rank root
concatenates the received data in the process rank order in
recv-buffer. The argument recv_count indicates the number
of items received from each process – not the total number
received.
C:
int MPI_Gather( void *send_buffer, int send_count,
MPI_Datatype send_type, void *recv_buffer, int recv_count,
MPI_Datatype recv_type, int root, MPI_Comm comm);
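A usage sketch (added for illustration; one int gathered from each process):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i, myval;
    int gathered[64];                        /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    myval = rank * rank;                     /* hypothetical per-process result */

    /* recv_count is the number of items expected from EACH process (here 1). */
    MPI_Gather(&myval, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        for (i = 0; i < size; i++)
            printf("from rank %d: %d\n", i, gathered[i]);

    MPI_Finalize();
    return 0;
}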
Parallel Computing (Intro-06): Rajeev Wankar
70
Gather

"

Parallel Computing (Intro-06): Rajeev Wankar


71

Reduce
It is a Gather operation combined with specified arithmetic/
logical operation.
Values could be gathered and then added together by root:
"

& & &

! ..
6

2 & '$3 2 & '$3 2 & '$3

.
Parallel Computing (Intro-06): Rajeev Wankar
72
MPI_Reduce
int MPI_Reduce (void *sendbuf, void *recvbuf, int count,
MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm
comm )

Input Parameters
sendbuf address of send buffer (choice)
count number of elements in send buffer
datatype data type of elements of send buffer (handle)
op reduce operation
root rank of root process (integer)
comm communicator (handle)

Output Parameter
recvbuf address of receive buffer (significant only at root)
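A usage sketch (added for illustration) summing one integer per process at rank 0:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, local, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = rank + 1;                        /* hypothetical local value */

    /* Combine the local values with MPI_SUM; only rank 0 receives the result. */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d processes = %d\n", size, total);  /* size*(size+1)/2 */

    MPI_Finalize();
    return 0;
}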
Parallel Computing (Intro-06): Rajeev Wankar
73

Example

Write a simple parallel program to calculate value


of pi by numerical integration. Since
$\int_0^1 \frac{1}{1+x^2}\,dx \;=\; \tan^{-1}(x)\Big|_0^1 \;=\; \tan^{-1}(1) - \tan^{-1}(0) \;=\; \frac{\pi}{4}$

We will integrate the function f(x) = 4/(1 + x^2)

Parallel Computing (Intro-06): Rajeev Wankar


74
Graph

[Figure: plot of f(x) = 4/(1 + x^2) over the interval 0.0 to 1.0]

Parallel Computing (Intro-06): Rajeev Wankar


75

#include "mpi.h"
#include <math.h>
double f(a)
double a;
{
return (4.0 / (1.0 + a*a));
}
int main(argc,argv)
int argc;
char *argv[];
{
int done = 0, n, myid, numprocs, i, rc;
double PI25DT = 3.14159265;
double mypi, pi, h, sum, x, a;
double startwtime, endwtime;
Parallel Computing (Intro-06): Rajeev Wankar
76
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
n = 0;
while (!done)
{
if (myid == 0)
{
printf("Enter the number of
intervals: (0 quits) ");
scanf("%d",&n);
if (n==0) n=100; else n=0;
startwtime = MPI_Wtime();
}
Parallel Computing (Intro-06): Rajeev Wankar
77

MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);


if (n == 0)
done = 1;
else
{
h = 1.0 / (double) n;
sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs)
{
x = h * ((double)i - 0.5);
sum += f(x);
}
mypi = h * sum;
Parallel Computing (Intro-06): Rajeev Wankar
78

MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,
MPI_COMM_WORLD);
if (myid == 0)
{
printf("pi is approximately
%.16f, Error is %.16f\n",pi, fabs(pi
- PI25DT));
endwtime = MPI_Wtime();
printf("wall clock time = %f\n",
endwtime-startwtime);
}
}
}
MPI_Finalize();
}
Parallel Computing (Intro-06): Rajeev Wankar
79

Scatterv Operation
[Figure: Scatterv — the array {3, 4, 6, 8, 4, 9, 3, 6, 7, 8} held by P0 is split into segments of varying length and distributed to P0, P1, P2, P3]

Parallel Computing (Intro-06): Rajeev Wankar


80

"
MPI SCATTERV( sendbuf, sendcounts, displs, sendtype, recvbuf,
recvcount, recvtype, root, comm)

IN sendbuf address of send buffer (choice, significant only at


root)
IN sendcounts integer array (of length group size) specifying the
number of elements to send to each processor
IN displs integer array (of length group size). Entry i specifies
the displacement (relative to sendbuf) from which to
take the outgoing data destined for process i
IN sendtype data type of send buffer elements
OUT recvbuf address of receive buffer
IN recvcount number of elements in receive buffer (integer)
IN recvtype data type of receive buffer elements
IN root rank of sending process (integer)
IN comm communicator
Parallel Computing (Intro-06): Rajeev Wankar
81

Header for MPI_Scatterv

int MPI_Scatterv (
void *send_buffer,
int *send_cnt,
int *send_disp,
MPI_Datatype send_type,
void *receive_buffer,
int receive_cnt,
MPI_Datatype receive_type,
int root,
MPI_Comm communicator)
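A usage sketch (added for illustration) in which process i receives i + 1 elements; the counts and displacements arrays are computed on every process so that each knows its own receive count:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i, offset = 0;
    int counts[64], displs[64];              /* assumes at most 64 processes */
    int sendbuf[64 * 65 / 2], recvbuf[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Give process i exactly i + 1 elements (an uneven partition). */
    for (i = 0; i < size; i++) {
        counts[i] = i + 1;
        displs[i] = offset;    /* where process i's segment starts in sendbuf */
        offset += counts[i];
    }
    if (rank == 0)
        for (i = 0; i < offset; i++)
            sendbuf[i] = i;

    MPI_Scatterv(sendbuf, counts, displs, MPI_INT,
                 recvbuf, counts[rank], MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d received %d element(s), first = %d\n",
           rank, counts[rank], recvbuf[0]);

    MPI_Finalize();
    return 0;
}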
Parallel Computing (Intro-06): Rajeev Wankar
82
Gatherv Operation

[Figure: Gatherv — segments of varying length from P0, P1, P2, P3 are concatenated, in rank order, into the array {3, 4, 6, 8, 4, 9, 3, 6, 7, 8} on P0]

Parallel Computing (Intro-06): Rajeev Wankar


83

Header for MPI_Gatherv


int MPI_Gatherv (
void *send_buffer,
int send_cnt,
MPI_Datatype send_type,
void *receive_buffer,
int *receive_cnt,
int *receive_disp,
MPI_Datatype receive_type,
int root,
MPI_Comm communicator)

Parallel Computing (Intro-06): Rajeev Wankar


84
Allgather

Gathers data from all tasks and distributes it to all tasks

Parallel Computing (Intro-06): Rajeev Wankar


85

Alltoall

• Sends data from all to all processes


• Useful in getting transpose of a matrix

Parallel Computing (Intro-06): Rajeev Wankar


86
MPI_Alltoall
int MPI_Alltoall( void *sendbuf, int sendcount,
MPI_Datatype sendtype, void *recvbuf, int recvcnt,
MPI_Datatype recvtype, MPI_Comm comm )

Input Parameters
sendbuf starting address of send buffer
sendcount number of elements to send to
each process (integer)
sendtype data type of send buffer elements
recvcount number of elements received from
any process (integer)
recvtype data type of receive buffer
elements
comm communicator
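A usage sketch (added for illustration; two ints sent to each process) showing how block j of the send buffer ends up at process j:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, j;
    int sendbuf[128], recvbuf[128];          /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process sends 2 ints to every process (sendcount = 2);
       elements 2*j and 2*j+1 of sendbuf go to process j. */
    for (j = 0; j < size; j++) {
        sendbuf[2 * j]     = rank;           /* who sent it       */
        sendbuf[2 * j + 1] = j;              /* intended receiver */
    }

    MPI_Alltoall(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, MPI_COMM_WORLD);

    /* recvbuf block j now holds the pair that process j addressed to us. */
    printf("rank %d: block 0 = (%d,%d)\n", rank, recvbuf[0], recvbuf[1]);

    MPI_Finalize();
    return 0;
}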

Parallel Computing (Intro-06): Rajeev Wankar


87

Alltoall Operation
[Figure: before and after views on P0-P3 — block j of each process's send buffer becomes block i of process j's receive buffer]

Sends data from all to all processes


All-to-All operation for an integer array of size 8 on 4 processors
Parallel Computing (Intro-06): Rajeev Wankar
88
MPI_Allgatherv

Parallel Computing (Intro-06): Rajeev Wankar


89

MPI_Allgatherv

int MPI_Allgatherv (
void *send_buffer,
int send_cnt,
MPI_Datatype send_type,
void *receive_buffer,
int *receive_cnt,
int *receive_disp,
MPI_Datatype receive_type,
MPI_Comm communicator)

Parallel Computing (Intro-06): Rajeev Wankar


90
Input Parameters (MPI_Allgatherv)
sendbuf starting address of send buffer (choice)
sendcount number of elements in send buffer (integer)
sendtype data type of send buffer elements
recvcounts integer array containing the number of
elements that are received from each process
displs integer array (of length group size). Entry i
specifies the displacement (relative to recvbuf)
at which to place the incoming data from
process i
recvtype data type of receive buffer elements
comm communicator
recvbuf address of receive buffer (choice)

The block of data sent from the jth process is received by every
process and placed in the jth block of the buffer recvbuf.

Parallel Computing (Intro-06): Rajeev Wankar


91

Function MPI_Alltoallv

Parallel Computing (Intro-06): Rajeev Wankar


92
Header for MPI_Alltoallv
int MPI_Alltoallv (
void *send_buffer,
int *send_cnt,
int *send_disp,
MPI_Datatype send_type,
void *receive_buffer,
int *receive_cnt,
int *receive_disp,
MPI_Datatype receive_type,
MPI_Comm communicator)

Parallel Computing (Intro-06): Rajeev Wankar


93

Matrix-vector Multiplication

Parallel Computing (Intro-06): Rajeev Wankar


94
Outline

• At least three parallel implementations are possible:

– Rowwise block striped


– Columnwise block striped
– Block decomposition

Parallel Computing (Intro-06): Rajeev Wankar


95

Storing Vectors

• Divide vector elements among processes


• Replicate vector elements
• Vector replication is acceptable because vectors have only n elements, versus n^2 elements in matrices

Parallel Computing (Intro-06): Rajeev Wankar


96
Rowwise Block Striped Matrix

• Partitioning through domain decomposition


• Primitive task associated with
– Row of matrix
– Entire vector

Parallel Computing (Intro-06): Rajeev Wankar


97

Columnwise Block Striped Matrix

• Partitioning through domain decomposition


• Task associated with
– Column of matrix
– Vector element
• MPICH functions used are MPI_Scatter, MPI_Gather, and MPI_Alltoall

Parallel Computing (Intro-06): Rajeev Wankar


98

Matrix-Vector Multiplication
c0 = a0,0 b0 + a0,1 b1 + a0,2 b2 + a0,3 b3 + a0,4 b4
c1 = a1,0 b0 + a1,1 b1 + a1,2 b2 + a1,3 b3 + a1,4 b4
c2 = a2,0 b0 + a2,1 b1 + a2,2 b2 + a2,3 b3 + a2,4 b4
c3 = a3,0 b0 + a3,1 b1 + a3,2 b2 + a3,3 b3 + a3,4 b4
c4 = a4,0 b0 + a4,1 b1 + a4,2 b2 + a4,3 b3 + a4,4 b4

[Figure annotation: the terms computed initially by processors 0 through 4]
Parallel Computing (Intro-06): Rajeev Wankar
99

Phases of Parallel Algorithm


1. Scatter: column i of A and element i of b are delivered to task i.
2. Multiplications: task i multiplies its column of A by b_i, producing a partial result vector ~c.
3. All-to-all exchange: the partial results are redistributed so that each task holds the pieces contributing to its element(s) of c.
4. Reduction: each task sums the pieces it received to form its part of c.

Parallel Computing (Intro-06): Rajeev Wankar


100

"
Matrix-Vector

2 1 0 4 1
3 2 1 1 3
×
4 3 1 2 4
3 0 2 0 1

Parallel Computing (Intro-06): Rajeev Wankar


101

This is an all-to-all operation followed by a reduction:

P0: (2, 3, 4, 3) × 1 = (2, 3, 4, 3)
P1: (1, 2, 3, 0) × 3 = (3, 6, 9, 0)
P2: (0, 1, 1, 2) × 4 = (0, 4, 4, 8)
P3: (4, 1, 2, 0) × 1 = (4, 1, 2, 0)

Element-wise SUM = (9, 14, 19, 11)

Parallel Computing (Intro-06): Rajeev Wankar


102
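A simplified sketch of the columnwise computation above (added here; it assumes exactly four processes and uses MPI_Reduce to form the element-wise sums, whereas the slides' variant uses an all-to-all exchange followed by local sums):

#include <mpi.h>
#include <stdio.h>
#define N 4

int main(int argc, char **argv)
{
    int A[N][N] = { {2,1,0,4}, {3,2,1,1}, {4,3,1,2}, {3,0,2,0} };
    int b[N]    = { 1, 3, 4, 1 };
    int rank, i, partial[N], c[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* assumes exactly N processes */

    /* Each process scales its own column of A by its element of b. */
    for (i = 0; i < N; i++)
        partial[i] = A[i][rank] * b[rank];

    /* Element-wise sum of the four partial vectors gives c on rank 0. */
    MPI_Reduce(partial, c, N, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("c = %d %d %d %d\n", c[0], c[1], c[2], c[3]);  /* 9 14 19 11 */

    MPI_Finalize();
    return 0;
}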
Evaluating Programs Empirically

Measuring Execution Time


To measure the execution time between point L1 and point L2 in the code, we might have a construction such as:

L1: time(&t1);                        /* start timer */
    ...
L2: time(&t2);                        /* stop timer */
    elapsed_time = difftime(t2, t1);  /* elapsed_time = t2 - t1 */
    printf("Elapsed time = %5.2f seconds", elapsed_time);
MPICH provides the routine MPI_Wtime() for returning time
(in seconds).
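The equivalent construction with MPI_Wtime (a fragment added for illustration, mirroring the code above; MPI_Wtick gives the timer resolution):

double t1, t2;

MPI_Barrier(MPI_COMM_WORLD);    /* optional: start all processes together */
t1 = MPI_Wtime();
/* ... code section to be timed ... */
t2 = MPI_Wtime();
printf("Elapsed time = %5.2f seconds\n", t2 - t1);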
Parallel Computing (Intro-06): Rajeev Wankar
103

What’s in MPICH-2
• Extensions to the message-passing model
– Dynamic process management
– One-sided operations (remote memory access)
– Parallel I/O
– Thread support
• Making MPI more robust and convenient
– C++ and Fortran 90 bindings
– External interfaces, handlers
– Extended collective operations
– Language interoperability

Parallel Computing (Intro-06): Rajeev Wankar


104
I/O in MPICH-2 Advanced Features

• Non-Contiguous access in both memory and file


• Collective I/O Operations
• Both Individual and Shared File pointers
• Non-blocking I/O

Parallel Computing (Intro-06): Rajeev Wankar


105

/*example of parallel MPI write into a single file */

#include "mpi.h"
#include <stdio.h>
#define BUFSIZE 100

int main(int argc, char *argv[]) {


int i, myrank, buf[BUFSIZE];

MPI_File thefile;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

for (i=0; i<BUFSIZE; i++)


buf[i] = myrank * BUFSIZE + i;

Parallel Computing (Intro-06): Rajeev Wankar


106
MPI_File_open(MPI_COMM_WORLD, "testfile",
MPI_MODE_CREATE | MPI_MODE_WRONLY,
MPI_INFO_NULL, &thefile);

MPI_File_set_view (thefile, myrank * BUFSIZE *


sizeof(int), MPI_INT,
MPI_INT, "native", MPI_INFO_NULL);

MPI_File_write(thefile, buf, BUFSIZE, MPI_INT,


MPI_STATUS_IGNORE);

MPI_File_close(&thefile);

MPI_Finalize();

return 0;
}

Parallel Computing (Intro-06): Rajeev Wankar


107

Remote Memory Operations


• In MPI - the data is moved from the address space of
one process to that of another by means of a co-
operative operations such as send/receive pair
• In the shared-memory model, processes have access to a common pool of memory and can simply perform ordinary memory operations (load from, store into) on some set of addresses
• In MPICH-2, an API is defined that provides elements of the shared-memory model in an MPI environment. These are called MPI's “one-sided” or “remote memory” operations
Parallel Computing (Intro-06): Rajeev Wankar
108
Remote Memory Operations

• Design is based on the idea of remote memory access


windows
• Balance efficiency and portability across classes of
architecture, including shared memory multi-processors
(SMPs), NUMA machines and distributed memory parallel
processors
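A sketch (added for illustration, assuming at least two processes) of the window-based model using MPI_Win_create, MPI_Win_fence, and MPI_Put:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, window_buf = -1, value;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each process exposes one int as a window that others may write into. */
    MPI_Win_create(&window_buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                   /* open an access epoch */
    if (rank == 0) {
        value = 42;                          /* hypothetical payload */
        /* Write 'value' into the window of process 1 at displacement 0. */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);                   /* complete all transfers */

    if (rank == 1)
        printf("rank 1 window now holds %d\n", window_buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}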

Parallel Computing (Intro-06): Rajeev Wankar


109

MPICH-G2

• MPICH-G2 is a grid-enabled implementation of the MPI v1.1


standard
• Using services from the Globus Toolkit® (e.g., job startup,
security) MPICH-G2 allows you to couple multiple machines,
potentially of different architectures, to run MPI applications on
Wide Area Network
• MPICH-G2 automatically converts data in messages sent between machines of different architectures and supports multiprotocol communication by automatically selecting TCP for intermachine messaging and (where available) vendor-supplied MPI for intramachine messaging

Parallel Computing (Intro-06): Rajeev Wankar


110
References

• William Gropp, Ewing Lusk, Nathan Doss, and Anthony


Skjellum. A high performance, portable implementation of the
MPI Message-Passing Interface standard. Parallel Computing,
22(6):789–828, 1996.
• William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI:
Portable Parallel Programming with the Message Passing
Interface, 2nd edition. MIT Press, Cambridge, MA, 1999.
• William Gropp, Ewing Lusk, and Rajeev Thakur. Using MPI-2:
Advanced Features of the Message-Passing Interface. MIT
Press, Cambridge, MA, 1999.

Parallel Computing (Intro-06): Rajeev Wankar


111

References

• Message Passing Interface Forum. MPI: A Message-Passing


Interface standard. International Journal of Supercomputer
Applications, 8(3/4):165–414, 1994.
• Peter S. Pacheco. Parallel Programming with MPI. Morgan
Kaufman, 1997.
• Marc Snir, Steve W. Otto, Steven Huss-Lederman, David W.
Walker, and Jack Dongarra. MPI—The Complete Reference:
Volume 1, The MPI Core, 2nd edition. MIT Press, Cambridge,
MA, 1998.
• http://www.mcs.anl.gov/mpi/mpich

Parallel Computing (Intro-06): Rajeev Wankar


112
References

• William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI:


Portable Parallel Programming with the Message Passing
Interface, 2nd edition. MIT Press, Cambridge, MA, 1999.
• Michael J Quinn. Parallel Programming in C with MPI and
OpenMP, Tata-McGraw-Hill Edition, 2003.
• Barry Wilkinson And Michael Allen, Parallel Programming:
Techniques and Applications Using Networked Workstations
and Parallel Computers, Prentice Hall, Upper Saddle River, NJ,
1999.

Parallel Computing (Intro-06): Rajeev Wankar


113
