Message-Passing Computing
We will concentrate upon the third option: using an existing sequential language together with a message-passing library. It is then necessary to say explicitly what processes are to be executed, when to pass messages between concurrent processes, and what to pass in the messages.
Slides for Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers, Barry Wilkinson and Michael Allen, Prentice Hall, 1999. All rights reserved.
Single Program Multiple Data (SPMD) model
Different processes are merged into one program. Within the program are control
statements that will customize the code; i.e. select different parts for each process.
[Figure: a single source file is compiled to suit each processor, producing executables for Processor 0 through Processor n − 1.]
[Figure: dynamic process creation — an executing process calls spawn() to start execution of process 2.]
Basic Send and Receive Routines
Process 1: send(&x, 2);        Process 2: recv(&y, 1);
Figure 2.3 Passing a message between processes using send() and recv() library calls — the data in x on process 1 is moved into y on process 2.
Synchronous Message Passing
Routines that return when the message transfer has been completed; they do not need message buffer storage. A synchronous send routine could wait until the complete message can be accepted by the receiving process before sending the message.
A synchronous receive routine will wait until the message it is expecting arrives.
Synchronous routines intrinsically perform two actions: They transfer data and they
synchronize processes.
Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol (request to send, acknowledgment, message). (a) When send() occurs before recv(): process 1 issues a request to send and suspends until process 2 reaches its recv() and returns an acknowledgment; the message is then transferred and both processes continue. (b) When recv() occurs before send(): process 2 suspends at recv() until the message arrives.
Non-blocking - has been used to describe routines that return whether or not the message has been received.
MPI Definitions of Blocking and Non-Blocking
Blocking - return after their local actions complete, though the message transfer may
not have been completed.
Non-blocking - return immediately. It is assumed that the data storage being used for the transfer is not modified by subsequent statements before the transfer has completed, and it is left to the programmer to ensure this.
[Figure: message passing using a message buffer — process 1's send() places the message in a message buffer and the process continues; process 2's recv() later reads the message from the buffer.]
Message Tag
Used to differentiate between different types of messages being sent.
Example
To send a message, x, with message tag 5 from a source process, 1, to a destination
process, 2, and assign to y, we might have
send(&x, 2, 5);
in the source process and
recv(&y, 1, 5);
in the destination process. The message tag is carried within the message.
If special type matching is not required, a wild card message tag is used, so that the
recv() will match with any send().
Broadcast
Sending the same message to all the processes concerned with the problem.
Multicast - sending the same message to a defined group of processes.
[Figure: broadcast — the contents of the root's buffer (buf) are sent to every process.]
Scatter
Sending each element of an array of data in the root to a separate process. The contents
of the ith location of the array are sent to the ith process.
[Figure: scatter — element i of the root's buffer (buf) is sent to process i.]
Gather
Having one process collect individual values from a set of processes.
[Figure: gather — each process's value is collected into consecutive locations of the root's buffer (buf).]
Reduce
Gather operation combined with a specified arithmetic or logical operation. For example, the values could be gathered and then added together by the root:
[Figure: reduce — the values from all the processes are combined (here added) into the root's buffer (buf).]
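These operations appear directly in message-passing libraries such as MPI. A minimal sketch combining a scatter with a reduce (the array size of 64, the values scattered, and the variable names are illustrative, not from the text):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
   int rank, nprocs, i;
   int sendbuf[64];                /* root's array: one element per process (<= 64 processes assumed) */
   int myval, mysquare, total;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

   if (rank == 0)
      for (i = 0; i < nprocs; i++) sendbuf[i] = i + 1;

   /* scatter: element i of the root's array is sent to process i */
   MPI_Scatter(sendbuf, 1, MPI_INT, &myval, 1, MPI_INT, 0, MPI_COMM_WORLD);

   mysquare = myval * myval;

   /* reduce: one value from each process is combined with + at the root */
   MPI_Reduce(&mysquare, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

   if (rank == 0) printf("Sum of squares = %d\n", total);
   MPI_Finalize();
   return 0;
}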
PVM (Parallel Virtual Machine) - Perhaps the first widely adopted attempt at using a workstation cluster as a multicomputer platform; developed by Oak Ridge National Laboratories.
PVM
The programmer decomposes the problem into separate programs. Each program is
written in C (or Fortran) and compiled to run on specific types of computers in the
network.
The set of computers used on a problem must first be defined prior to running the programs.
The most convenient way of doing this is by creating a list of the names of the
computers available in a hostfile. The hostfile is then read by PVM.
[Figure: message passing between workstations using PVM — each workstation runs a PVM daemon and an application program (executable); messages are sent between the application programs through the workstation network via the PVM daemons.]
Figure 2.11 Multiple processes allocated to each processor (workstation). [Each workstation runs a single PVM daemon; one or more application programs (executables) run on each workstation, with messages sent through the workstation network.]
pvm_psend() and pvm_precv()
If data being sent is a list of items of the same data type, the PVM routines
pvm_psend() and pvm_precv() can be used.
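A minimal sketch of their use (the PVM 3 signatures are given from memory; the tag value 10, the array size, and the function name exchange() are illustrative):

#include <pvm3.h>

void exchange(int other_tid)
{
   int data[100];
   int rtid, rtag, rlen;            /* sender tid, tag, and length of the message received */

   /* pack-and-send a contiguous block of one data type in a single call */
   pvm_psend(other_tid, 10, data, 100, PVM_INT);

   /* receive-and-unpack in a single call; -1 matches any sender */
   pvm_precv(-1, 10, data, 100, PVM_INT, &rtid, &rtag, &rlen);
}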
[Figure: pvm_psend() and pvm_precv() — the sending process continues immediately after pvm_psend(); the receiving process waits for the message at pvm_precv().]
If the data comprises items of different types, it is packed into a send buffer before sending and unpacked at the destination:

Process_1:
   pvm_initsend();
   pvm_pkint( … &x …);
   pvm_pkstr( … &s …);
   pvm_pkfloat( … &y …);
   pvm_send(process_2 …);

Process_2:
   pvm_recv(process_1 …);
   pvm_upkint( … &x …);
   pvm_upkstr( … &s …);
   pvm_upkfloat( … &y …);

[Figure: x, s, and y are packed into the send buffer on Process_1, sent as a single message, and unpacked from the receive buffer on Process_2.]
Broadcast, Multicast, Scatter, Gather, and Reduce
The pvm_bcast(), when called, would send a message to each member of the named
group.
Similarly, pvm_gather() would collect values from each member of the named group.
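A minimal sketch of a group broadcast using the PVM 3 group routines (the group name "workers", the tag 20, and the function name are illustrative assumptions):

#include <pvm3.h>

void broadcast_n(int n)
{
   /* every participating task first joins the named group */
   pvm_joingroup("workers");

   /* pack the data and broadcast the send buffer to the other members of the group */
   pvm_initsend(PvmDataDefault);
   pvm_pkint(&n, 1, 1);
   pvm_bcast("workers", 20);        /* 20 is an illustrative message tag */
}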
Master:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pvm3.h>
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000

main() {
   int mytid, tids[PROC];
   int n = NELEM, nproc = PROC;
   int no, i, who, msgtype;
   int data[NELEM], result[PROC], tot = 0;
   char fn[255];
   FILE *fp;

   mytid = pvm_mytid();             /* Enroll in PVM */

   /* Start Slave Tasks */
   no = pvm_spawn(SLAVE, (char**)0, 0, "", nproc, tids);
   if (no < nproc) {
      printf("Trouble spawning slaves \n");
      for (i = 0; i < no; i++) pvm_kill(tids[i]);
      pvm_exit(); exit(1);
   }

   /* Open Input File and Initialize Data */
   strcpy(fn, getenv("HOME"));
   strcat(fn, "/pvm3/src/rand_data.txt");
   if ((fp = fopen(fn, "r")) == NULL) {
      printf("Can't open input file %s\n", fn);
      exit(1);
   }
   for (i = 0; i < n; i++) fscanf(fp, "%d", &data[i]);

   /* Broadcast data to slaves */
   pvm_initsend(PvmDataDefault);
   msgtype = 0;
   pvm_pkint(&nproc, 1, 1);
   pvm_pkint(tids, nproc, 1);
   pvm_pkint(&n, 1, 1);
   pvm_pkint(data, n, 1);
   pvm_mcast(tids, nproc, msgtype);

   /* Get results from slaves */
   msgtype = 5;
   for (i = 0; i < nproc; i++) {
      pvm_recv(-1, msgtype);
      pvm_upkint(&who, 1, 1);
      pvm_upkint(&result[who], 1, 1);
      printf("%d from %d\n", result[who], who);
   }

   /* Compute global sum */
   for (i = 0; i < nproc; i++) tot += result[i];
   printf("The total is %d.\n\n", tot);

   pvm_exit();                      /* Program finished. Exit PVM */
   return(0);
}

Slave:

#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000

main() {
   int mytid;
   int tids[PROC];
   int n, me, i, msgtype;
   int x, low, high, nproc, master;
   int data[NELEM], sum = 0;

   mytid = pvm_mytid();

   /* Receive data from master */
   msgtype = 0;
   pvm_recv(-1, msgtype);
   pvm_upkint(&nproc, 1, 1);
   pvm_upkint(tids, nproc, 1);
   pvm_upkint(&n, 1, 1);
   pvm_upkint(data, n, 1);

   /* Determine my tid (my index in tids) */
   for (i = 0; i < nproc; i++)
      if (mytid == tids[i]) { me = i; break; }

   /* Add my portion of data */
   x = n/nproc;
   low = me * x;
   high = low + x;
   for (i = low; i < high; i++)
      sum += data[i];

   /* Send result to master */
   pvm_initsend(PvmDataDefault);
   pvm_pkint(&me, 1, 1);
   pvm_pkint(&sum, 1, 1);
   msgtype = 5;
   master = pvm_parent();
   pvm_send(master, msgtype);

   /* Exit PVM */
   pvm_exit();
   return(0);
}

Figure 2.15 Sample PVM program.
MPI
Process Creation and Execution
Only static process creation is supported in MPI version 1. All the processes must be
defined prior to execution and started together. MPI uses the SPMD model of computation.
Communicators
Defines the scope of a communication operation.
Processes have ranks associated with the communicator.
Initially, all processes enrolled in a “universe” called MPI_COMM_WORLD, and each process is
given a unique rank, a number from 0 to n − 1, where there are n processes.
Other communicators can be established for groups of processes.
Using the SPMD Computational Model
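In the SPMD model, a single program tests its own rank and branches to master or slave code. A minimal sketch (the empty master() and slave() bodies are placeholders):

#include "mpi.h"

static void master(void) { /* code for the master process */ }
static void slave(void)  { /* code for a slave process */ }

int main(int argc, char *argv[])
{
   int myrank;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find process rank */
   if (myrank == 0)
      master();                              /* one process acts as master */
   else
      slave();                               /* all others act as slaves */
   MPI_Finalize();
   return 0;
}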
where master() and slave() are procedures to be executed by the master process and
slave process, respectively.
Variables that are not to be duplicated will need to be declared within code only
executed by that process.
Example
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);/* find process rank */
if (myrank == 0) { /* process 0 actions/local variables */
int x, y;
.
.
} else if (myrank == 1) {/* process 1 actions/local variables */
int x, y;
.
.
}
Here, x and y in process 0 are different local variables from x and y in process 1.
Unsafe Message Passing
[Figure: sends and receives in the user program can be confused with sends and receives issued inside a library routine lib() when they use the same source/destination identifiers, so a message intended for the library may be taken by the user code, or vice versa, depending on the order in which the calls occur.]
MPI Solution
Communicators - used in MPI for all point-to-point and collective MPI message-
passing communications.
In this way, the communication domain of the library can be separated from that of a
user program.
Each process has a rank within the communicator, an integer from 0 to n − 1, where
there are n processes.
Communicator Types
An intracommunicator is used for communicating within a group of processes; an intercommunicator is used for communication between groups. A process has a unique rank in a group (an integer from 0 to m − 1, where there are m processes in the group). A process could be a member of more than one group.
Point-to-Point Communication
Message tags are present, and wild cards can be used in place of the tag (MPI_ANY_TAG)
and in place of the source in receive routines (MPI_ANY_SOURCE).
PVM-style packing and unpacking of data is generally avoided by specifying an MPI datatype in the send/receive parameters, together with the source or destination of the message.
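For example, a receive that accepts a message from any source with any tag might be written as follows (the variable names are illustrative):

int x;
MPI_Status status;

MPI_Recv(&x, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
/* the actual source and tag can then be read from status.MPI_SOURCE and status.MPI_TAG */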
Blocking Routines
Return when they are locally complete - when the location used to hold the message
can be used again or altered without affecting the message being sent.
A blocking send will send the message and return. This does not mean that the message
has been received, just that the process is free to move on without adversely affecting
the message.
The general format of the parameters of the blocking send and receive routines is shown below.
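A sketch of the standard MPI C prototypes, with the role of each parameter noted in comments:

int MPI_Send(void *buf,             /* address of send buffer             */
             int count,             /* number of items to send            */
             MPI_Datatype datatype, /* datatype of each item              */
             int dest,              /* rank of destination process        */
             int tag,               /* message tag                        */
             MPI_Comm comm);        /* communicator                       */

int MPI_Recv(void *buf,             /* address of receive buffer          */
             int count,             /* maximum number of items to receive */
             MPI_Datatype datatype, /* datatype of each item              */
             int source,            /* rank of source process             */
             int tag,               /* message tag                        */
             MPI_Comm comm,         /* communicator                       */
             MPI_Status *status);   /* status after the operation         */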
Example: to send an integer x from process 0 to process 1.
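A sketch along the usual lines (msgtag is an arbitrary tag value chosen for illustration):

int x, myrank, msgtag = 1;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);    /* find process rank */
if (myrank == 0) {
   MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
   MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}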
Nonblocking Routines
Nonblocking send - MPI_Isend() - will return "immediately", even before the source location is safe to be altered. Nonblocking receive - MPI_Irecv() - will return even if there is no message to accept.
MPI_Wait() waits until the operation has actually completed and will return then.
MPI_Test() returns with a flag set indicating whether operation completed at that time.
These routines need to know whether the particular operation has completed, which is
determined by accessing the request parameter.
Example: to send an integer x from process 0 to process 1 and allow process 0 to continue.
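A sketch along the usual lines (msgtag, the request name req1, and the routine compute() are illustrative):

int x, myrank, msgtag = 1;
MPI_Request req1;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
   MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
   compute();                          /* overlap computation with the send */
   MPI_Wait(&req1, &status);           /* x must not be altered before this point */
} else if (myrank == 1) {
   MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}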
Send Communication Modes
Standard Mode Send
It is not assumed that the corresponding receive routine has started. The amount of buffering is not defined by MPI. If buffering is provided, the send could complete before the receive is reached.
Buffered Mode
Send may start and return before a matching receive. Necessary to specify buffer space
via routine MPI_Buffer_attach() - removed with MPI_Buffer_detach().
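A sketch of attaching buffer space for buffered-mode sends (the buffer size, x, and msgtag are illustrative; this fragment belongs inside a function after MPI_Init()):

int  x = 0, msgtag = 1;
int  size = 10000 + MPI_BSEND_OVERHEAD;    /* data space plus MPI's per-message overhead */
char *buf = malloc(size);

MPI_Buffer_attach(buf, size);              /* give MPI buffer space for buffered sends */
MPI_Bsend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
/* ... */
MPI_Buffer_detach(&buf, &size);            /* reclaim the buffer once the sends have completed */
free(buf);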
Synchronous Mode
Send and receive can start before each other but can only complete together.
Ready Mode
Send can only start if matching receive already reached, otherwise error. Use with care.
Each of the four modes can be applied to both blocking and nonblocking send routines.
Only the standard mode is available for the blocking and nonblocking receive routines.
Any type of send routine can be used with any type of receive routine.
Collective Communication
Involves the set of processes defined by a communicator; message tags are not used in collective routines.
Example
To gather items from the group of processes into process 0, using dynamically allocated
memory in the root process, we might use
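a sketch such as the following (10 items per process is an illustrative count):

int data[10];                                      /* data to be gathered from each process */
int grp_size;
int *buf;

MPI_Comm_size(MPI_COMM_WORLD, &grp_size);          /* find group size */
buf = (int *)malloc(grp_size * 10 * sizeof(int));  /* allocate memory for the gathered data (needed at the root) */
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);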
Note that MPI_Gather() gathers from all processes, including the root.
Barrier
As in all message-passing systems, MPI provides a means of synchronizing processes
by stopping each one until they all have reached a specific “barrier” call.
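In MPI this is a single call on a communicator; for example:

MPI_Barrier(MPI_COMM_WORLD);   /* each process blocks here until all processes have arrived */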
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
   int myid, numprocs;
   int data[MAXSIZE], i, x, low, high, myresult = 0, result;
   char fn[255];
   FILE *fp;

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);

   if (myid == 0) {                 /* Open input file and initialize data */
      strcpy(fn, getenv("HOME"));
      strcat(fn, "/MPI/rand_data.txt");
      if ((fp = fopen(fn, "r")) == NULL) {
         printf("Can't open the input file: %s\n\n", fn);
         exit(1);
      }
      for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
   }

   /* broadcast data */
   MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

   /* Add my portion of data */
   x = MAXSIZE/numprocs;
   low = myid * x;
   high = low + x;
   for (i = low; i < high; i++)
      myresult += data[i];
   printf("I got %d from %d\n", myresult, myid);

   /* Compute global sum */
   MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
   if (myid == 0) printf("The sum is %d.\n", result);

   MPI_Finalize();
   return 0;
}
Figure 2.17 Sample MPI program.
Pseudocode Constructs
To send a message consisting of an integer x and a float y from the process called master to the process called slave, assigning to a and b, we simply write in the master process

   send(&x, &y, Pslave);

and in the slave process

   recv(&a, &b, Pmaster);

where x and a are integers and y and b are floats. x will be copied to a, and y copied to b.

The ith process is given the notation Pi, and a tag may be present; i.e.,

   send(&x, Pi, data_tag);
Locally blocking send() and recv() written as given. Other forms with prefixes; i.e.,
ssend(&data1, Pdestination); /* Synchronous send */
Parallel Execution Time
Two parts: a computation part, say tcomp, and a communication part, say tcomm; i.e.,
tp = tcomp + tcomm
Communication Time
As a first approximation, we will use
tcomm = tstartup + ntdata
where tstartup is the startup time, sometimes called the message latency - essentially
time to send a message with no data. Startup time is assumed constant.
The term tdata is the transmission time to send one data word, also assumed a constant,
and there are n data words.
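As an illustration only (the numbers are made up, not taken from the text): suppose tstartup = 100 time units and tdata = 1 time unit per word. Sending a message of n = 1000 words then costs

   tcomm = tstartup + n × tdata = 100 + 1000 × 1 = 1100 time units,

so the startup term dominates for short messages and the per-word term dominates for long ones.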
[Figure: theoretical communication time as a function of the number of data items, showing the startup time.]
Important Note on Interpretation of Equations
The equations are only intended to give a starting point for how an algorithm might perform in practice. The parallel execution time, tp, is normalized in units of an arithmetic operation, which will depend upon the computer system.
tcomp = m
Since we are measuring time in units of computational steps, the communication time
has to be measured in the same way.
Latency Hiding
A way to ameliorate the situation of significant message communication times is to overlap the communication with subsequent computations.
Latency hiding can also be achieved by mapping multiple processes onto a processor and using a time-sharing facility that switches from one process to another when the first process is stalled because of incomplete message passing or otherwise.
Relies upon an efficient method of switching from one process to another. Threads
offer an efficient mechanism.
Time Complexity
As with sequential computations, a parallel algorithm can be evaluated through the use
of time complexity (notably the Ο notation — “order of magnitude,” big-oh).
Start with an estimate of the number of computational steps, considering all arithmetic and logical operations to be equal and ignoring other aspects of the computation, such as computational tests.
The Ο notation
f(x) = Ο(g(x)) if and only if there exist positive constants c and x0 such that 0 ≤ f(x) ≤ cg(x) for all x ≥ x0.
For example, if f(x) = 4x² + 2x + 12, the constant c = 6 would work with the formal definition to establish that f(x) = Ο(x²), since 0 < 4x² + 2x + 12 ≤ 6x² for x ≥ 3.
There are alternative functions for g(x) that will satisfy the definition; we use the function that grows least for g(x).
Ω notation - lower bound
f(x) = Ω(g(x)) if and only if there exist positive constants c and x0 such that 0 ≤ cg(x) ≤ f(x) for all x ≥ x0.
Θ notation - exact bound
f(x) = Θ(g(x)) if and only if there exist positive constants c1, c2, and x0 such that 0 ≤ c1g(x) ≤ f(x) ≤ c2g(x) for all x ≥ x0.
We can read Ο() as "grows at most as fast as" and Ω() as "grows at least as fast as."
Example
The execution time of a sorting algorithm often depends upon the original order of the numbers to be sorted. It may be that it requires at least n log n steps, but could require n² steps for n numbers, depending upon the order of the numbers. This would be described by a time complexity of Ω(n log n) and Ο(n²).
Figure 2.19 Growth of the function f(x) = 4x² + 2x + 12, bounded below by c1g(x) = 2x² and above by c2g(x) = 6x² for x ≥ x0.
Cost-Optimal Algorithms
A cost-optimal (or work-efficient, or processor-time optimal) algorithm is one in which the cost to solve a problem is proportional to the execution time on a single processor system (using the fastest known sequential algorithm); i.e.,

   Cost = tp × n = k × ts

where k is a constant and n is the number of processors.
Given time complexity analysis, we can say that a parallel algorithm is cost-optimal if

   (parallel time complexity) × (number of processors) = sequential time complexity.
Example
Suppose the best known sequential algorithm for a problem has time complexity of
Ο(n log n). A parallel algorithm for the same problem that uses n processes and has a
time complexity of Ο(log n) is cost optimal, whereas a parallel algorithm that uses n²
processors and has time complexity of Ο(1) is not cost optimal.
Broadcast on a Hypercube Network
[Figure: broadcast in a three-dimensional hypercube — step 1: node 000 sends to node 001; step 2: nodes 000 and 001 send to nodes 010 and 011; step 3: nodes 000–011 send to nodes 100–111.]
The time complexity for a hypercube system will be Ο(log n), using this algorithm,
which is optimal because the diameter of a hypercube network is log n. It is necessary to use at least this number of links in the broadcast to reach the furthest node.
[Figure: the same broadcast viewed as a tree — in step 1, P000 sends the message to P001; in step 2, P000 and P001 each send to a further process; the number of processes holding the message doubles at each step.]
Gather on a Hypercube Network
The reverse algorithm can be used to gather data from all the nodes to, say, node 000 of a three-dimensional hypercube: each step reverses the corresponding broadcast step, halving the number of nodes that still hold data.
In the case of gather, the messages become longer as the data is gathered, and hence the
time complexity is increased over Ο(log n).
Broadcast on a Workstation Cluster
A broadcast on a single Ethernet connection can be done using a single message that is read by all the destinations on the network simultaneously.
[Figure: broadcast on an Ethernet — a single message from the source is read by all destinations simultaneously.]
[Figure: 1-to-N fan-out broadcast — the source sends a separate, sequential message to each of the N destinations.]
Figure 2.25 1-to-N fan-out broadcast on a tree structure.
Visualization Tools
Program execution can be watched in a space-time diagram (process-time diagram).
[Figure: space-time diagram of a parallel program — for each process, periods of computing, waiting, and executing a message-passing system routine are shown against time, with messages drawn between the processes.]
PVM has a visualization tool called XPVM.
Implementations of visualization tools are available for MPI. An example is the Upshot
program visualization system.
Debugging Strategies
Geist et al. (1994a) suggest a three-step approach to debugging message-passing
programs:
1. If possible, run the program as a single process and debug as a normal sequential
program.
2. Execute the program using two to four multitasked processes on a single computer.
Now examine actions such as checking that messages are indeed being sent to the
correct places. It is very common to make mistakes with message tags and have
messages sent to the wrong places.
3. Execute the program using the same two to four processes but now across several
computers. This step helps find problems that are caused by network delays related
to synchronization and timing.
Evaluating Programs Empirically
Measuring Execution Time
To measure the execution time between point L1 and point L2 in the code, we might have
a construction such as
.
L1: time(&t1); /* start timer */
.
.
L2: time(&t2); /* stop timer */
.
elapsed_time = difftime(t2, t1); /* elapsed_time = t2 - t1 */
printf("Elapsed time = %5.2f seconds", elapsed_time);
MPI provides the routine MPI_Wtime() for returning time (in seconds).
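A sketch of the corresponding MPI timing idiom:

double start, elapsed;

start = MPI_Wtime();                 /* wall-clock time in seconds */
/* ... section of code being timed ... */
elapsed = MPI_Wtime() - start;
printf("Elapsed time = %5.2f seconds\n", elapsed);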
One way to time communication is for P0 to send a message that P1 immediately returns; half the round-trip time estimates the one-way time:

P0
.
L1: time(&t1);
send(&x, P1);
recv(&x, P1);
L2: time(&t2);
elapsed_time = 0.5 * difftime(t2, t1);
printf("Elapsed time = %5.2f seconds", elapsed_time);
.
P1
.
recv(&x, P0);
send(&x, P0);
.
Profiling
A profile is a histogram or graph showing the time spent on different parts of the program:
[Figure: profile histogram — time (or number of repetitions) plotted against statement number or regions of the program.]