
slides2-1

Chapter 2

Message-Passing Computing

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, 2004 Pearson Education Inc. All rights reserved.

slides2-2

Basics of Message-Passing Programming using User-level Message Passing Libraries

Two primary mechanisms needed:

1. A method of creating separate processes for execution on different computers

2. A method of sending and receiving messages


slides2-3

Multiple program, multiple data (MPMD) model

[Figure: separate source files, each compiled to suit its processor, produce executables that run on Processor 0 through Processor p - 1.]


slides2-4

Single Program Multiple Data (SPMD) model

Different processes merged into one program. Within program, control statements select different parts for each processor to execute. All executables started together - static process creation. Basic MPI way.

[Figure: one source file compiled to suit each processor, producing executables for Processor 0 through Processor p - 1.]


slides2-5

Multiple Program Multiple Data (MPMD) Model

Separate programs for each processor. Master-slave approach usually taken. One processor executes master process. Other processes started from within master process - dynamic process creation.

[Figure: Process 1 calls spawn() to start execution of Process 2 at a later point in time.]


slides2-6

Basic point-to-point Send and Receive Routines

Passing a message between processes using send() and recv() library calls:

[Figure: Process 1 executes send(&x, 2); the data moves to Process 2, which executes recv(&y, 1).]

Generic syntax (actual formats later)



slides2-7

Synchronous Message Passing


Routines that actually return when message transfer completed.
Synchronous send routine
Waits until complete message can be accepted by the receiving
process before sending the message.
Synchronous receive routine
Waits until the message it is expecting arrives.
Synchronous routines intrinsically perform two actions: They
transfer data and they synchronize processes.


slides2-8

Synchronous send() and recv() library calls using 3-way protocol

(a) When send() occurs before recv(): Process 1 issues send() with a request to send and suspends; when Process 2 reaches recv(), an acknowledgment is returned, the message is transferred, and both processes continue.

(b) When recv() occurs before send(): Process 2 reaches recv() first and suspends; when Process 1 reaches send(), a request to send is issued, the message is transferred, an acknowledgment is returned, and both processes continue.



slides2-9

Asynchronous Message Passing

Routines that do not wait for actions to complete before returning. Usually require local storage for messages.

More than one version depending upon the actual semantics for returning.

In general, they do not synchronize processes but allow processes to move forward sooner. Must be used with care.


slides2-10

MPI Definitions of Blocking and Non-Blocking

Blocking - return after their local actions complete, though the message transfer may not have been completed.

Non-blocking - return immediately. Assumes that the data storage to be used for the transfer is not modified by subsequent statements prior to its being used for the transfer; it is left to the programmer to ensure this.

These terms may have different interpretations in other systems.



slides2-11

How message-passing routines can return before message transfer completed

Message buffer needed between source and destination to hold message:

[Figure: Process 1 calls send(), the message is copied into a message buffer, and the process continues; later, Process 2 calls recv() and reads the message from the buffer.]


slides2-12

Asynchronous (blocking) routines changing to synchronous routines

Once local actions completed and message is safely on its way, sending process can continue with subsequent work.

Buffers are only of finite length and a point could be reached when the send routine is held up because all available buffer space has been exhausted.

Then, the send routine will wait until storage becomes re-available - i.e., the routine then behaves as a synchronous routine.


slides2-13

Message Tag

Used to differentiate between different types of messages being sent.

Message tag is carried within message.

If special type matching is not required, a wild card message tag is used, so that the recv() will match with any send().


slides2-14

Message Tag Example

To send a message, x, with message tag 5 from a source process, 1, to a destination process, 2, and assign to y:

[Figure: Process 1 executes send(&x,2,5); the data moves to Process 2, which executes recv(&y,1,5) and waits for a message from process 1 with a tag of 5.]



slides2-15

Group message passing routines

Apart from point-to-point message passing routines, there are routines that send message(s) to a group of processes or receive message(s) from a group of processes - higher efficiency than separate point-to-point routines, although not absolutely necessary.


slides2-16

Broadcast

Sending same message to all processes concerned with problem.
Multicast - sending same message to defined group of processes.

[Figure, MPI form: Process 0 through Process p - 1 all execute the same bcast() call; the data in the root's buffer buf is copied to every process's buffer.]
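As a minimal sketch (not from the book), an MPI broadcast of an array of four ints from rank 0 might look like this; the array size and values are invented for illustration:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int data[4] = {0, 0, 0, 0};
    int myrank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {            /* only the root fills the buffer */
        data[0] = 1; data[1] = 2; data[2] = 3; data[3] = 4;
    }

    /* every process makes the same call; after it returns,
       all processes hold the root's values in data[] */
    MPI_Bcast(data, 4, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}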

slides2-17

Scatter

Sending each element of an array in root process to a separate process. Contents of ith location of array sent to ith process.

[Figure, MPI form: every process executes the same scatter() call; element i of the root's buffer buf is delivered to the data location of process i.]
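A minimal sketch of an MPI scatter, assuming one int per process and a hypothetical sendbuf allocated only on the root:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int myrank, p, myvalue;
    int *sendbuf = NULL;          /* significant only on the root */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (myrank == 0) {            /* root prepares one int per process */
        int i;
        sendbuf = (int *)malloc(p * sizeof(int));
        for (i = 0; i < p; i++) sendbuf[i] = i * i;   /* illustrative values */
    }

    /* element i of sendbuf goes to process i */
    MPI_Scatter(sendbuf, 1, MPI_INT, &myvalue, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (myrank == 0) free(sendbuf);
    MPI_Finalize();
    return 0;
}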

slides2-18

Gather

Having one process collect individual values from set of processes.

[Figure, MPI form: every process executes the same gather() call; the data item from process i is placed in location i of the root's buffer buf.]

slides2-19

Reduce

Gather operation combined with specified arithmetic/logical operation.

Example
Values could be gathered and then added together by root:

[Figure, MPI form: every process executes the same reduce() call; the data items from Process 0 through Process p - 1 are combined with + and the result is placed in the root's buffer buf.]
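A minimal sketch of the summation example above using MPI_Reduce(); the local values are invented for illustration:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int myrank, myvalue, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    myvalue = myrank + 1;         /* each process contributes one value */

    /* values from all processes are added; result appears only on rank 0 */
    MPI_Reduce(&myvalue, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}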

slides2-20

PVM (Parallel Virtual Machine)


Perhaps first widely adopted attempt at using a workstation cluster
as a multicomputer platform, developed by Oak Ridge National
Laboratories. Available at no charge.
Programmer decomposes problem into separate programs (usually
a master program and a group of identical slave programs).
Each program compiled to execute on specific types of computers.
Set of computers used on a problem first must be defined prior to
executing the programs (in a hostfile).


slides2-21

Message routing between computers done by PVM daemon processes installed by PVM on computers that form the virtual machine. Can have more than one process running on each computer.

[Figure: several workstations, each running a PVM daemon and an application program (executable); messages are sent through the network between the daemons.]

MPI implementation we use is similar.



slides2-22

MPI (Message Passing Interface)


Standard developed by group of academics and industrial partners
to foster more widespread use and portability.

Defines routines, not implementation.

Several free implementations exist.


slides2-23

MPI
Process Creation and Execution

Purposely not defined and will depend upon the implementation.

Only static process creation is supported in MPI version 1. All processes must be defined prior to execution and started together.

Originally SPMD model of computation.

MPMD also possible with static creation - each program to be started together is specified.

slides2-24

Communicators

Defines scope of a communication operation.

Processes have ranks associated with communicator.

Initially, all processes enrolled in a universe called MPI_COMM_WORLD, and each process is given a unique rank, a number from 0 to p - 1, where there are p processes.

Other communicators can be established for groups of processes.
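As an illustrative sketch (not from the book), another communicator can be formed with MPI_Comm_split(); here the assumed split divides MPI_COMM_WORLD into even-ranked and odd-ranked groups:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int worldrank, newrank;
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &worldrank);

    /* processes passing the same "color" end up in the same new communicator */
    MPI_Comm_split(MPI_COMM_WORLD, worldrank % 2, worldrank, &newcomm);

    /* each process has a new rank within its new communicator */
    MPI_Comm_rank(newcomm, &newrank);

    MPI_Comm_free(&newcomm);
    MPI_Finalize();
    return 0;
}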


slides2-25

Using the SPMD Computational Model

main (int argc, char *argv[])
{
MPI_Init(&argc, &argv);
.
.
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find process rank */
if (myrank == 0)
master();
else
slave();
.
.
MPI_Finalize();
}

where master() and slave() are procedures to be executed by master process and slave process, respectively.
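A complete, compilable version of this skeleton might look as follows; the master() and slave() bodies are illustrative stubs only:

#include <mpi.h>
#include <stdio.h>

void master(void) { printf("I am the master\n"); }   /* illustrative stub */
void slave(void)  { printf("I am a slave\n"); }      /* illustrative stub */

int main(int argc, char *argv[])
{
    int myrank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find process rank */

    if (myrank == 0)
        master();
    else
        slave();

    MPI_Finalize();
    return 0;
}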

slides2-26

Unsafe Message Passing


MPI specifically addresses unsafe message passing.


slides2-27

Unsafe message passing with libraries

[Figure (a) Intended behavior: Process 0 issues send(,1,), calls lib(), then issues send(,1,); Process 1 issues recv(,0,), calls lib(), then issues recv(,0,). Each user-level send matches its intended user-level recv, and the sends and recvs inside the two lib() calls match each other.

(b) Possible behavior: the first send(,1,) of Process 0 is instead matched by the recv(,0,) inside Process 1's lib(), so messages are delivered to the wrong receives.]


slides2-28

MPI Solution
Communicators

A communication domain that defines a set of processes that are allowed to communicate between themselves.

The communication domain of the library can be separated from that of a user program.

Used in all point-to-point and collective MPI message-passing communications.


slides2-29

Default Communicator
MPI_COMM_WORLD, exists as the first communicator for all the
processes existing in the application.

A set of MPI routines exists for forming communicators.


Processes have a rank in a communicator.


slides2-30

Point-to-Point Communication
Uses send and receive routines with message tags (and communicator). Wild card message tags are available.
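A minimal sketch of a receive using the wild cards MPI_ANY_SOURCE and MPI_ANY_TAG; the tag value 7 and the single-int message are assumptions for illustration. The actual source and tag can be read from the status parameter:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank, x;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {
        x = 42;
        MPI_Send(&x, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);      /* tag 7 */
    } else if (myrank == 1) {
        /* accept one int from any source, with any tag */
        MPI_Recv(&x, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        printf("got %d from process %d, tag %d\n",
               x, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}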


slides2-31

Blocking Routines
Return when they are locally complete - when location used to hold
message can be used again or altered without affecting message
being sent.

A blocking send will send the message and return. This does not
mean that the message has been received, just that the process is
free to move on without adversely affecting the message.


slides2-32

Parameters of the blocking send

MPI_Send(buf, count, datatype, dest, tag, comm)

buf      - address of send buffer
count    - number of items to send
datatype - datatype of each item
dest     - rank of destination process
tag      - message tag
comm     - communicator


slides2-33

Parameters of the blocking receive

MPI_Recv(buf, count, datatype, src, tag, comm, status)

buf      - address of receive buffer
count    - maximum number of items to receive
datatype - datatype of each item
src      - rank of source process
tag      - message tag
comm     - communicator
status   - status after operation


slides2-34

Example
To send an integer x from process 0 to process 1,

MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
int x;
MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
int x;
MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}


slides2-35

Nonblocking Routines
Nonblocking send - MPI_Isend(), will return immediately even before source location is safe to be altered.

Nonblocking receive - MPI_Irecv(), will return even if there is no message to accept.


slides2-36

Nonblocking Routine Formats

MPI_Isend(buf, count, datatype, dest, tag, comm, request)

MPI_Irecv(buf, count, datatype, source, tag, comm, request)

Completion detected by MPI_Wait() and MPI_Test().

MPI_Wait() waits until operation completed and returns then.

MPI_Test() returns with flag set indicating whether operation completed at that time.

Need to know whether particular operation completed. Determined by accessing the request parameter.
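A minimal sketch of overlapping computation with a nonblocking receive, polling with MPI_Test(); do_some_work() and the single-int message are illustrative assumptions:

#include <mpi.h>

static void do_some_work(void) { /* placeholder for useful computation */ }

void receive_overlapped(void)
{
    int x, flag = 0;
    MPI_Request request;
    MPI_Status status;

    /* post the receive; the call returns immediately */
    MPI_Irecv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);

    /* keep computing until the message has actually arrived */
    while (!flag) {
        do_some_work();
        MPI_Test(&request, &flag, &status);
    }
    /* once flag is set, x holds the received value */
}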

slides2-37

Example
To send an integer x from process 0 to process 1 and allow process 0 to continue,

MPI_Request req1;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
int x;
MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
compute();
MPI_Wait(&req1, &status);
} else if (myrank == 1) {
int x;
MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}


slides2-38

Four Send Communication Modes

Standard Mode Send
Not assumed that corresponding receive routine has started. Amount of buffering not defined by MPI. If buffering provided, send could complete before receive reached.

Buffered Mode
Send may start and return before a matching receive. Necessary to specify buffer space via routine MPI_Buffer_attach().

Synchronous Mode
Send and receive can start before each other but can only complete together.

Ready Mode
Send can only start if matching receive already reached, otherwise error. Use with care.
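A minimal sketch of a buffered-mode send with MPI_Buffer_attach(); the buffer size here is an arbitrary assumption:

#include <mpi.h>
#include <stdlib.h>

void buffered_send_example(int dest, int tag)
{
    int x = 42;                                    /* illustrative value */
    int bufsize = 1024 + MPI_BSEND_OVERHEAD;       /* generous for one int */
    char *buffer = (char *)malloc(bufsize);

    /* supply MPI with explicit buffer space for buffered-mode sends */
    MPI_Buffer_attach(buffer, bufsize);

    /* returns once the message has been copied into the attached buffer */
    MPI_Bsend(&x, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);

    /* detach waits until buffered messages have been delivered */
    MPI_Buffer_detach(&buffer, &bufsize);
    free(buffer);
}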

slides2-39

Each of the four modes can be applied to both blocking and nonblocking send routines.

Only the standard mode is available for the blocking and nonblocking receive routines.

Any type of send routine can be used with any type of receive routine.


slides2-40

Collective Communication
Involves set of processes, defined by an intra-communicator. Message tags not present.

Broadcast and Scatter Routines

The principal collective operations operating upon data are:

MPI_Bcast()          - Broadcast from root to all other processes
MPI_Gather()         - Gather values for group of processes
MPI_Scatter()        - Scatters buffer in parts to group of processes
MPI_Alltoall()       - Sends data from all processes to all processes
MPI_Reduce()         - Combine values on all processes to single value
MPI_Reduce_scatter() - Combine values and scatter results
MPI_Scan()           - Compute prefix reductions of data on processes


slides2-41

Example
To gather items from the group of processes into process 0, using dynamically allocated memory in the root process, we might use

int data[10];            /* data to be gathered from processes */
int *buf;                /* receive buffer, needed only on root */
.
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
MPI_Comm_size(MPI_COMM_WORLD, &grp_size); /* find group size */
buf = (int *)malloc(grp_size*10*sizeof(int)); /* allocate memory */
}
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);

Note that the recvcount parameter of MPI_Gather() is the number of items received from each process, not the total, and that MPI_Gather() gathers from all processes, including root.


slides2-42

Barrier
As in all message-passing systems, MPI provides a means of
synchronizing processes by stopping each one until they all have
reached a specific barrier call.
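A minimal sketch of the MPI barrier call, MPI_Barrier(); the printf() calls are only there to make the synchronization visible:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    printf("process %d before the barrier\n", myrank);

    /* no process passes this point until every process in
       MPI_COMM_WORLD has reached the barrier */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("process %d after the barrier\n", myrank);

    MPI_Finalize();
    return 0;
}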


slides2-43

Sample MPI program.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000
void main(int argc, char *argv[])
{
int myid, numprocs;
int data[MAXSIZE], i, x, low, high, myresult = 0, result;
char fn[255];
FILE *fp;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if (myid == 0) {
/* Open input file and initialize data */
strcpy(fn,getenv("HOME"));
strcat(fn,"/MPI/rand_data.txt");
if ((fp = fopen(fn,"r")) == NULL) {
printf("Can't open the input file: %s\n\n", fn);
exit(1);
}
for(i = 0; i < MAXSIZE; i++) fscanf(fp,"%d", &data[i]);
}
/* broadcast data */
MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);
/* Add my portion of data */
x = MAXSIZE/numprocs;
low = myid * x;
high = low + x;
for(i = low; i < high; i++)
myresult += data[i];
printf("I got %d from %d\n", myresult, myid);
/* Compute global sum */
MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (myid == 0) printf("The sum is %d.\n", result);
MPI_Finalize();
}

slides2-44

Evaluating Parallel Programs


slides2-45

Equations for Parallel Execution Time

First concern is how fast parallel implementation is likely to be. Might begin by estimating execution time on a single computer, ts, by counting computational steps of best sequential algorithm.

For a parallel algorithm, in addition to number of computational steps, need to estimate communication overhead.

Parallel execution time, tp, composed of two parts: a computation part, say tcomp, and a communication part, say tcomm; i.e.,

tp = tcomp + tcomm

slides2-46

Computational Time
Can be estimated in a similar way to that of a sequential algorithm,
by counting number of computational steps. When more than one
process being executed simultaneously, count computational steps
of most complex process. Generally, some function of n and p, i.e.
tcomp = f(n, p)
The time units of tp are that of a computational step.
Often break down computation time into parts. Then
tcomp = tcomp1 + tcomp2 + tcomp3 + ...
where tcomp1, tcomp2, tcomp3 are computation times of each part.
Analysis usually done assuming that all processors are same and
operating at same speed.

slides2-47

Communication Time
Will depend upon the number of messages, the size of each
message, the underlying interconnection structure, and the mode of
transfer. Many factors, including network structure and network
contention. For a first approximation, we will use
tcomm1 = tstartup + n * tdata
for communication time of message 1.
tstartup is the startup time, essentially the time to send a message
with no data. Assumed to be constant.
tdata is the transmission time to send one data word, also assumed
constant, and there are n data words.
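As a small worked sketch of this model (the constants below are invented for illustration, not measured values):

#include <stdio.h>

/* idealized model: tcomm = tstartup + n * tdata, in computational steps */
static double comm_time(double t_startup, double t_data, long n)
{
    return t_startup + n * t_data;
}

int main(void)
{
    double t_startup = 1000.0;   /* assumed startup cost, in steps */
    double t_data    = 50.0;     /* assumed cost per data word, in steps */
    long   n         = 10000;    /* number of data words in the message */

    double t_comm = comm_time(t_startup, t_data, n);
    double t_comp = 2.0 * n;     /* assumed computation: 2 steps per word */

    printf("tcomm = %.0f, tcomp = %.0f, tp = %.0f steps\n",
           t_comm, t_comp, t_comm + t_comp);
    return 0;
}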

slides2-48

Idealized Communication Time

[Figure: communication time plotted against number of data items (n) - a straight line whose intercept is the startup time.]


slides2-49

Final communication time, tcomm, will be the summation of the communication times of all the sequential messages from a process, i.e.

tcomm = tcomm1 + tcomm2 + tcomm3 + ...

Typically, the communication patterns of all the processes are the same and assumed to take place together so that only one process need be considered.

Both startup and data transmission times, tstartup and tdata, are measured in units of one computational step, so that we can add tcomp and tcomm together to obtain the parallel execution time, tp.

slides2-50

Benchmark Factors
With ts, tcomp, and tcomm, can establish speedup factor and computation/communication ratio for a particular algorithm/implementation:

Speedup factor = ts / tp = ts / (tcomp + tcomm)

Computation/communication ratio = tcomp / tcomm

Both are functions of number of processors, p, and number of data elements, n.

Will give an indication of the scalability of the parallel solution with increasing number of processors and problem size. Computation/communication ratio will highlight effect of communication with increasing problem size and system size.

slides2-51

Debugging and Evaluating Parallel Programs Empirically

Visualization Tools
Programs can be watched as they are executed in a space-time diagram (or process-time diagram):

[Figure: space-time diagram for Process 1, Process 2, and Process 3, with time along the horizontal axis; shading distinguishes computing, waiting, and message-passing system routines, and arrows show messages between processes.]

slides2-52

Implementations of visualization tools are available for MPI.

An example is the Upshot program visualization system.


slides2-53

Evaluating Programs Empirically

Measuring Execution Time
To measure the execution time between point L1 and point L2 in the code, we might have a construction such as

.
L1: time(&t1);                    /* start timer */
.
.
L2: time(&t2);                    /* stop timer */
.
elapsed_time = difftime(t2, t1);  /* elapsed_time = t2 - t1 */
printf("Elapsed time = %5.2f seconds", elapsed_time);

MPI provides the routine MPI_Wtime() for returning time (in seconds).
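A minimal sketch using MPI_Wtime() to time a section of code; the loop being timed is just a placeholder:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    double t1, t2, s = 0.0;
    long i;

    MPI_Init(&argc, &argv);

    t1 = MPI_Wtime();                 /* start timer */
    for (i = 0; i < 1000000; i++)     /* placeholder work to be timed */
        s += (double)i;
    t2 = MPI_Wtime();                 /* stop timer */

    printf("Elapsed time = %5.2f seconds (s = %f)\n", t2 - t1, s);

    MPI_Finalize();
    return 0;
}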

slides2-54

Parallel Programming Home Page

http://www.cs.uncc.edu/par_prog

Gives step-by-step instructions for compiling and executing programs, and other information.


slides2-55

Basic Instructions for Compiling/Executing MPI Programs

Preliminaries

Set up paths

Create required directory structure

Create a file (hostfile) listing machines to be used (required)

Details described on home page.


slides2-56

Hostfile
Before starting MPI for the first time, need to create a hostfile
Sample hostfile
ws404
#is-sm1 //Currently not executing, commented
pvm1 //Active processors, UNCC sun cluster called pvm1 - pvm8
pvm2
pvm3
pvm4
pvm5
pvm6
pvm7
pvm8

slides2-57

Compiling/executing (SPMD) MPI program


For LAM MPI version 6.5.2. At a command line:
To start MPI:
First time:
lamboot -v hostfile
Subsequently:
lamboot
To compile MPI programs:
mpicc -o file file.c
or
mpiCC -o file file.cpp
To execute MPI program:
mpirun -v -np no_processors file
To remove processes for reboot
lamclean -v
Terminate LAM
lamhalt
If fails
wipe -v lamhost

slides2-58

Compiling/Executing Multiple MPI Programs


Create a file specifying programs:
Example
1 master and 2 slaves, appfile contains
n0 master
n0-1 slave
To execute:
mpirun -v appfile
Sample output
3292 master running on n0 (o)
3296 slave running on n0 (o)
412 slave running on n1
