Message-Passing Computing
We will concentrate upon the third option: using an existing sequential language together with a message-passing library. It is then necessary to say explicitly what processes are to be executed, when to pass messages between concurrent processes, and what to pass in the messages.
Slides for Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers, Barry Wilkinson and Michael Allen, Prentice Hall, 1999. All rights reserved.
Single Program Multiple Data (SPMD) model
Different processes are merged into one program. Within the program are control
statements that will customize the code; i.e. select different parts for each process.
[Figure: a single source file is compiled to suit each processor, producing executables for Processor 0 through Processor n − 1.]
[Figure: dynamic process creation — an executing process calls spawn() to start execution of process 2.]
Basic Send and Receive Routines
Process 1: send(&x, 2);        Process 2: recv(&y, 1);
Figure 2.3 Passing a message between processes using send() and recv() library calls — the data in x on process 1 is moved into y on process 2.
Synchronous Message Passing
Routines that return when the message transfer has been completed; they do not need message buffer storage. A synchronous send routine could wait until the complete message can be accepted by the receiving process before sending the message.
A synchronous receive routine will wait until the message it is expecting arrives.
Synchronous routines intrinsically perform two actions: They transfer data and they
synchronize processes.
Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol (request to send, acknowledgment, message). (a) When send() occurs before recv(): process 1 issues a request to send and suspends until process 2 reaches its recv() and returns an acknowledgment; the message is then transferred and both processes continue. (b) When recv() occurs before send(): process 2 suspends at recv() until the message arrives.
Non-blocking - has been used to describe routines that return whether or not the message has been received.
MPI Definitions of Blocking and Non-Blocking
Blocking - return after their local actions complete, though the message transfer may
not have been completed.
Non-blocking - return immediately. It is assumed that the data storage being used for the transfer is not modified by subsequent statements before the transfer has completed, and it is left to the programmer to ensure this.
[Figure: message passing using a message buffer — process 1's send() places the message in a message buffer and the process continues; process 2's recv() later reads the message from the buffer.]
Message Tag
Used to differentiate between different types of messages being sent.
Example
To send a message, x, with message tag 5 from a source process, 1, to a destination
process, 2, and assign to y, we might have
send(&x, 2, 5);
in the source process and
recv(&y, 1, 5);
in the destination process. The message tag is carried within the message.
If special type matching is not required, a wild card message tag is used, so that the
recv() will match with any send().
Broadcast
Sending the same message to all the processes concerned with the problem.
Multicast - sending the same message to a defined group of processes.
[Figure: broadcast — the contents of the root's buffer (buf) are sent to every process.]
Scatter
Sending each element of an array of data in the root to a separate process. The contents
of the ith location of the array are sent to the ith process.
[Figure: scatter — element i of the root's buffer (buf) is sent to process i.]
Gather
Having one process collect individual values from a set of processes.
[Figure: gather — each process's value is collected into consecutive locations of the root's buffer (buf).]
Reduce
Gather operation combined with a specified arithmetic or logical operation. For example, the values could be gathered and then added together by the root:
[Figure: reduce — the values from all the processes are combined (here added) into the root's buffer (buf).]
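These operations appear directly in message-passing libraries such as MPI. A minimal sketch combining a scatter with a reduce (the array size of 64, the values scattered, and the variable names are illustrative, not from the text):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
   int rank, nprocs, i;
   int sendbuf[64];                /* root's array: one element per process (<= 64 processes assumed) */
   int myval, mysquare, total;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

   if (rank == 0)
      for (i = 0; i < nprocs; i++) sendbuf[i] = i + 1;

   /* scatter: element i of the root's array is sent to process i */
   MPI_Scatter(sendbuf, 1, MPI_INT, &myval, 1, MPI_INT, 0, MPI_COMM_WORLD);

   mysquare = myval * myval;

   /* reduce: one value from each process is combined with + at the root */
   MPI_Reduce(&mysquare, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

   if (rank == 0) printf("Sum of squares = %d\n", total);
   MPI_Finalize();
   return 0;
}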
PVM (Parallel Virtual Machine) - Perhaps the first widely adopted attempt at using a workstation cluster as a multicomputer platform; developed by Oak Ridge National Laboratories.
PVM
The programmer decomposes the problem into separate programs. Each program is
written in C (or Fortran) and compiled to run on specific types of computers in the
network.
The set of computers used on a problem must first be defined prior to running the programs.
The most convenient way of doing this is by creating a list of the names of the
computers available in a hostfile. The hostfile is then read by PVM.
[Figure: message passing between workstations using PVM — each workstation runs a PVM daemon and an application program (executable); messages are sent between the application programs through the workstation network via the PVM daemons.]
Figure 2.11 Multiple processes allocated to each processor (workstation). [Each workstation runs a single PVM daemon; one or more application programs (executables) run on each workstation, with messages sent through the workstation network.]
pvm_psend() and pvm_precv()
If data being sent is a list of items of the same data type, the PVM routines
pvm_psend() and pvm_precv() can be used.
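A minimal sketch of their use (the PVM 3 signatures are given from memory; the tag value 10, the array size, and the function name exchange() are illustrative):

#include <pvm3.h>

void exchange(int other_tid)
{
   int data[100];
   int rtid, rtag, rlen;            /* sender tid, tag, and length of the message received */

   /* pack-and-send a contiguous block of one data type in a single call */
   pvm_psend(other_tid, 10, data, 100, PVM_INT);

   /* receive-and-unpack in a single call; -1 matches any sender */
   pvm_precv(-1, 10, data, 100, PVM_INT, &rtid, &rtag, &rlen);
}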
[Figure: pvm_psend() and pvm_precv() — the sending process continues immediately after pvm_psend(); the receiving process waits for the message at pvm_precv().]
If the data comprises items of different types, it is packed into a send buffer before sending and unpacked at the destination:

Process_1:
   pvm_initsend();
   pvm_pkint( … &x …);
   pvm_pkstr( … &s …);
   pvm_pkfloat( … &y …);
   pvm_send(process_2 …);

Process_2:
   pvm_recv(process_1 …);
   pvm_upkint( … &x …);
   pvm_upkstr( … &s …);
   pvm_upkfloat( … &y …);

[Figure: x, s, and y are packed into the send buffer on Process_1, sent as a single message, and unpacked from the receive buffer on Process_2.]
Broadcast, Multicast, Scatter, Gather, and Reduce
The pvm_bcast(), when called, would send a message to each member of the named
group.
Similarly, pvm_gather() would collect values from each member of the named group.
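A minimal sketch of a group broadcast using the PVM 3 group routines (the group name "workers", the tag 20, and the function name are illustrative assumptions):

#include <pvm3.h>

void broadcast_n(int n)
{
   /* every participating task first joins the named group */
   pvm_joingroup("workers");

   /* pack the data and broadcast the send buffer to the other members of the group */
   pvm_initsend(PvmDataDefault);
   pvm_pkint(&n, 1, 1);
   pvm_bcast("workers", 20);        /* 20 is an illustrative message tag */
}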
Master:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pvm3.h>
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000

main() {
   int mytid, tids[PROC];
   int n = NELEM, nproc = PROC;
   int no, i, who, msgtype;
   int data[NELEM], result[PROC], tot = 0;
   char fn[255];
   FILE *fp;

   mytid = pvm_mytid();             /* Enroll in PVM */

   /* Start Slave Tasks */
   no = pvm_spawn(SLAVE, (char**)0, 0, "", nproc, tids);
   if (no < nproc) {
      printf("Trouble spawning slaves \n");
      for (i = 0; i < no; i++) pvm_kill(tids[i]);
      pvm_exit(); exit(1);
   }

   /* Open Input File and Initialize Data */
   strcpy(fn, getenv("HOME"));
   strcat(fn, "/pvm3/src/rand_data.txt");
   if ((fp = fopen(fn, "r")) == NULL) {
      printf("Can't open input file %s\n", fn);
      exit(1);
   }
   for (i = 0; i < n; i++) fscanf(fp, "%d", &data[i]);

   /* Broadcast data to slaves */
   pvm_initsend(PvmDataDefault);
   msgtype = 0;
   pvm_pkint(&nproc, 1, 1);
   pvm_pkint(tids, nproc, 1);
   pvm_pkint(&n, 1, 1);
   pvm_pkint(data, n, 1);
   pvm_mcast(tids, nproc, msgtype);

   /* Get results from slaves */
   msgtype = 5;
   for (i = 0; i < nproc; i++) {
      pvm_recv(-1, msgtype);
      pvm_upkint(&who, 1, 1);
      pvm_upkint(&result[who], 1, 1);
      printf("%d from %d\n", result[who], who);
   }

   /* Compute global sum */
   for (i = 0; i < nproc; i++) tot += result[i];
   printf("The total is %d.\n\n", tot);

   pvm_exit();                      /* Program finished. Exit PVM */
   return(0);
}

Slave:

#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000

main() {
   int mytid;
   int tids[PROC];
   int n, me, i, msgtype;
   int x, low, high, nproc, master;
   int data[NELEM], sum = 0;

   mytid = pvm_mytid();

   /* Receive data from master */
   msgtype = 0;
   pvm_recv(-1, msgtype);
   pvm_upkint(&nproc, 1, 1);
   pvm_upkint(tids, nproc, 1);
   pvm_upkint(&n, 1, 1);
   pvm_upkint(data, n, 1);

   /* Determine my tid (my index in tids) */
   for (i = 0; i < nproc; i++)
      if (mytid == tids[i]) { me = i; break; }

   /* Add my portion of data */
   x = n/nproc;
   low = me * x;
   high = low + x;
   for (i = low; i < high; i++)
      sum += data[i];

   /* Send result to master */
   pvm_initsend(PvmDataDefault);
   pvm_pkint(&me, 1, 1);
   pvm_pkint(&sum, 1, 1);
   msgtype = 5;
   master = pvm_parent();
   pvm_send(master, msgtype);

   /* Exit PVM */
   pvm_exit();
   return(0);
}

Figure 2.15 Sample PVM program.
MPI
Process Creation and Execution
Only static process creation is supported in MPI version 1. All the processes must be
defined prior to execution and started together. MPI uses the SPMD model of computation.
Communicators
Defines the scope of a communication operation.
Processes have ranks associated with the communicator.
Initially, all processes enrolled in a “universe” called MPI_COMM_WORLD, and each process is
given a unique rank, a number from 0 to n − 1, where there are n processes.
Other communicators can be established for groups of processes.
Using the SPMD Computational Model
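In the SPMD model, a single program tests its own rank and branches to master or slave code. A minimal sketch (the empty master() and slave() bodies are placeholders):

#include "mpi.h"

static void master(void) { /* code for the master process */ }
static void slave(void)  { /* code for a slave process */ }

int main(int argc, char *argv[])
{
   int myrank;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find process rank */
   if (myrank == 0)
      master();                              /* one process acts as master */
   else
      slave();                               /* all others act as slaves */
   MPI_Finalize();
   return 0;
}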
where master() and slave() are procedures to be executed by the master process and
slave process, respectively.
Variables that are not to be duplicated will need to be declared within code only
executed by that process.
Example
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);/* find process rank */
if (myrank == 0) { /* process 0 actions/local variables */
int x, y;
.
.
} else if (myrank == 1) {/* process 1 actions/local variables */
int x, y;
.
.
}
Here, x and y in process 0 are different local variables from x and y in process 1.
Unsafe Message Passing
[Figure: sends and receives in the user program can be confused with sends and receives issued inside a library routine lib() when they use the same source/destination identifiers, so a message intended for the library may be taken by the user code, or vice versa, depending on the order in which the calls occur.]
MPI Solution
Communicators - used in MPI for all point-to-point and collective MPI message-
passing communications.
In this way, the communication domain of the library can be separated from that of a
user program.
Each process has a rank within the communicator, an integer from 0 to n − 1, where
there are n processes.
Communicator Types
An intracommunicator is used for communicating within a group of processes; an intercommunicator is used for communication between groups. A process has a unique rank in a group (an integer from 0 to m − 1, where there are m processes in the group). A process could be a member of more than one group.
Point-to-Point Communication
Message tags are present, and wild cards can be used in place of the tag (MPI_ANY_TAG)
and in place of the source in receive routines (MPI_ANY_SOURCE).
PVM-style packing and unpacking of data is generally avoided by specifying an MPI datatype in the send/receive parameters, together with the source or destination of the message.
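For example, a receive that accepts a message from any source with any tag might be written as follows (the variable names are illustrative):

int x;
MPI_Status status;

MPI_Recv(&x, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
/* the actual source and tag can then be read from status.MPI_SOURCE and status.MPI_TAG */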
Blocking Routines
Return when they are locally complete - when the location used to hold the message
can be used again or altered without affecting the message being sent.
A blocking send will send the message and return. This does not mean that the message
has been received, just that the process is free to move on without adversely affecting
the message.
The general format of the parameters of the blocking send and receive routines is shown below.
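A sketch of the standard MPI C prototypes, with the role of each parameter noted in comments:

int MPI_Send(void *buf,             /* address of send buffer             */
             int count,             /* number of items to send            */
             MPI_Datatype datatype, /* datatype of each item              */
             int dest,              /* rank of destination process        */
             int tag,               /* message tag                        */
             MPI_Comm comm);        /* communicator                       */

int MPI_Recv(void *buf,             /* address of receive buffer          */
             int count,             /* maximum number of items to receive */
             MPI_Datatype datatype, /* datatype of each item              */
             int source,            /* rank of source process             */
             int tag,               /* message tag                        */
             MPI_Comm comm,         /* communicator                       */
             MPI_Status *status);   /* status after the operation         */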
Example: to send an integer x from process 0 to process 1.
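A sketch along the usual lines (msgtag is an arbitrary tag value chosen for illustration):

int x, myrank, msgtag = 1;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);    /* find process rank */
if (myrank == 0) {
   MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
   MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}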
Nonblocking Routines
Nonblocking send - MPI_Isend() - will return "immediately", even before the source location is safe to be altered. Nonblocking receive - MPI_Irecv() - will return even if there is no message to accept.
MPI_Wait() waits until the operation has actually completed and will return then.
MPI_Test() returns with a flag set indicating whether operation completed at that time.
These routines need to know whether the particular operation has completed, which is
determined by accessing the request parameter.
Example: to send an integer x from process 0 to process 1 and allow process 0 to continue.
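A sketch along the usual lines (msgtag, the request name req1, and the routine compute() are illustrative):

int x, myrank, msgtag = 1;
MPI_Request req1;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
   MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
   compute();                          /* overlap computation with the send */
   MPI_Wait(&req1, &status);           /* x must not be altered before this point */
} else if (myrank == 1) {
   MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}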
Send Communication Modes
Standard Mode Send
It is not assumed that the corresponding receive routine has started. The amount of buffering is not defined by MPI. If buffering is provided, the send could complete before the receive is reached.
Buffered Mode
Send may start and return before a matching receive. Necessary to specify buffer space
via routine MPI_Buffer_attach() - removed with MPI_Buffer_detach().
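A sketch of attaching buffer space for buffered-mode sends (the buffer size, x, and msgtag are illustrative; this fragment belongs inside a function after MPI_Init()):

int  x = 0, msgtag = 1;
int  size = 10000 + MPI_BSEND_OVERHEAD;    /* data space plus MPI's per-message overhead */
char *buf = malloc(size);

MPI_Buffer_attach(buf, size);              /* give MPI buffer space for buffered sends */
MPI_Bsend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
/* ... */
MPI_Buffer_detach(&buf, &size);            /* reclaim the buffer once the sends have completed */
free(buf);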
Synchronous Mode
Send and receive can start before each other but can only complete together.
Ready Mode
Send can only start if matching receive already reached, otherwise error. Use with care.
Each of the four modes can be applied to both blocking and nonblocking send routines.
Only the standard mode is available for the blocking and nonblocking receive routines.
Any type of send routine can be used with any type of receive routine.
Collective Communication
Involves the set of processes defined by a communicator; message tags are not used in collective routines.
Example
To gather items from the group of processes into process 0, using dynamically allocated
memory in the root process, we might use
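a sketch such as the following (10 items per process is an illustrative count):

int data[10];                                      /* data to be gathered from each process */
int grp_size;
int *buf;

MPI_Comm_size(MPI_COMM_WORLD, &grp_size);          /* find group size */
buf = (int *)malloc(grp_size * 10 * sizeof(int));  /* allocate memory for the gathered data (needed at the root) */
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);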
Note that MPI_Gather() gathers from all processes, including the root.
Barrier
As in all message-passing systems, MPI provides a means of synchronizing processes
by stopping each one until they all have reached a specific “barrier” call.
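In MPI this is a single call on a communicator; for example:

MPI_Barrier(MPI_COMM_WORLD);   /* each process blocks here until all processes have arrived */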
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
   int myid, numprocs;
   int data[MAXSIZE], i, x, low, high, myresult = 0, result;
   char fn[255];
   FILE *fp;

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);

   if (myid == 0) {                 /* Open input file and initialize data */
      strcpy(fn, getenv("HOME"));
      strcat(fn, "/MPI/rand_data.txt");
      if ((fp = fopen(fn, "r")) == NULL) {
         printf("Can't open the input file: %s\n\n", fn);
         exit(1);
      }
      for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
   }

   /* broadcast data */
   MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

   /* Add my portion of data */
   x = MAXSIZE/numprocs;
   low = myid * x;
   high = low + x;
   for (i = low; i < high; i++)
      myresult += data[i];
   printf("I got %d from %d\n", myresult, myid);

   /* Compute global sum */
   MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
   if (myid == 0) printf("The sum is %d.\n", result);

   MPI_Finalize();
   return 0;
}
Figure 2.17 Sample MPI program.
Pseudocode Constructs
To send a message consisting of an integer x and a float y from the process called master to the process called slave, assigning to a and b, we simply write in the master process

   send(&x, &y, Pslave);

and in the slave process

   recv(&a, &b, Pmaster);

where x and a are integers and y and b are floats. x will be copied to a, and y copied to b.

The ith process is given the notation Pi, and a tag may be present; i.e.,

   send(&x, Pi, data_tag);
Locally blocking send() and recv() written as given. Other forms with prefixes; i.e.,
ssend(&data1, Pdestination); /* Synchronous send */
Parallel Execution Time
Two parts: a computation part, say tcomp, and a communication part, say tcomm; i.e.,
tp = tcomp + tcomm
Communication Time
As a first approximation, we will use
tcomm = tstartup + ntdata
where tstartup is the startup time, sometimes called the message latency - essentially
time to send a message with no data. Startup time is assumed constant.
The term tdata is the transmission time to send one data word, also assumed a constant,
and there are n data words.
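As an illustration only (the numbers are made up, not taken from the text): suppose tstartup = 100 time units and tdata = 1 time unit per word. Sending a message of n = 1000 words then costs

   tcomm = tstartup + n × tdata = 100 + 1000 × 1 = 1100 time units,

so the startup term dominates for short messages and the per-word term dominates for long ones.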
[Figure: theoretical communication time as a function of the number of data items, showing the startup time.]
Important Note on Interpretation of Equations
The equations are only intended to give a starting point for how an algorithm might perform in practice. The parallel execution time, tp, is normalized in units of an arithmetic operation, which will depend upon the computer system.
tcomp = m
Since we are measuring time in units of computational steps, the communication time
has to be measured in the same way.
Latency Hiding
A way to ameliorate the situation of significant message communication times is to overlap the communication with subsequent computations.
Latency hiding can also be achieved by mapping multiple processes onto a processor and using a time-sharing facility that switches from one process to another when the first process is stalled because of incomplete message passing or otherwise.
Relies upon an efficient method of switching from one process to another. Threads
offer an efficient mechanism.
Time Complexity
As with sequential computations, a parallel algorithm can be evaluated through the use
of time complexity (notably the Ο notation — “order of magnitude,” big-oh).
Start with an estimate of the number of computational steps, considering all arithmetic and logical operations to be equal and ignoring other aspects of the computation, such as computational tests.
The Ο notation
f(x) = Ο(g(x)) if and only if there exist positive constants c and x0 such that 0 ≤ f(x) ≤ cg(x) for all x ≥ x0.
For example, if f(x) = 4x² + 2x + 12, the constant c = 6 would work with the formal definition to establish that f(x) = Ο(x²), since 0 < 4x² + 2x + 12 ≤ 6x² for x ≥ 3.
There are alternative functions for g(x) that will satisfy the definition; we use the function that grows least for g(x).
Ω notation - lower bound
f(x) = Ω(g(x)) if and only if there exist positive constants c and x0 such that 0 ≤ cg(x) ≤ f(x) for all x ≥ x0.
Θ notation - exact bound
f(x) = Θ(g(x)) if and only if there exist positive constants c1, c2, and x0 such that 0 ≤ c1g(x) ≤ f(x) ≤ c2g(x) for all x ≥ x0.
We can read Ο() as "grows at most as fast as" and Ω() as "grows at least as fast as."
Example
The execution time of a sorting algorithm often depends upon the original order of the numbers to be sorted. It may be that it requires at least n log n steps, but could require n² steps for n numbers, depending upon the order of the numbers. This would be described by a time complexity of Ω(n log n) and Ο(n²).
Figure 2.19 Growth of the function f(x) = 4x² + 2x + 12, bounded below by c1g(x) = 2x² and above by c2g(x) = 6x² for x ≥ x0.
Cost-Optimal Algorithms
A cost-optimal (or work-efficient, or processor-time optimal) algorithm is one in which the cost to solve a problem is proportional to the execution time on a single processor system (using the fastest known sequential algorithm); i.e.,

   Cost = tp × n = k × ts

where k is a constant and n is the number of processors.
Given time complexity analysis, we can say that a parallel algorithm is cost-optimal if

   (parallel time complexity) × (number of processors) = sequential time complexity.
Example
Suppose the best known sequential algorithm for a problem has time complexity of
Ο(n log n). A parallel algorithm for the same problem that uses n processes and has a
time complexity of Ο(log n) is cost optimal, whereas a parallel algorithm that uses n²
processors and has time complexity of Ο(1) is not cost optimal.
Broadcast on a Hypercube Network
[Figure: broadcast in a three-dimensional hypercube — step 1: node 000 sends to node 001; step 2: nodes 000 and 001 send to nodes 010 and 011; step 3: nodes 000–011 send to nodes 100–111.]
The time complexity for a hypercube system will be Ο(log n), using this algorithm,
which is optimal because the diameter of a hypercube network is log n. It is necessary to use at least this number of links in the broadcast to reach the furthest node.
[Figure: the same broadcast viewed as a tree — in step 1, P000 sends the message to P001; in step 2, P000 and P001 each send to a further process; the number of processes holding the message doubles at each step.]
Gather on a Hypercube Network
The reverse algorithm can be used to gather data from all the nodes to, say, node 000 of a three-dimensional hypercube: each step reverses the corresponding broadcast step, halving the number of nodes that still hold data.
In the case of gather, the messages become longer as the data is gathered, and hence the
time complexity is increased over Ο(log n).
Broadcast on a Workstation Cluster
A broadcast on a single Ethernet connection can be done using a single message that is read by all the destinations on the network simultaneously.
[Figure: broadcast on an Ethernet — a single message from the source is read by all destinations simultaneously.]
[Figure: 1-to-N fan-out broadcast — the source sends a separate, sequential message to each of the N destinations.]
Figure 2.25 1-to-N fan-out broadcast on a tree structure.
Visualization Tools
Program execution can be watched in a space-time diagram (process-time diagram).
[Figure: space-time diagram of a parallel program — for each process, periods of computing, waiting, and executing a message-passing system routine are shown against time, with messages drawn between the processes.]
PVM has a visualization tool called XPVM.
Implementations of visualization tools are available for MPI. An example is the Upshot
program visualization system.
Debugging Strategies
Geist et al. (1994a) suggest a three-step approach to debugging message-passing
programs:
1. If possible, run the program as a single process and debug as a normal sequential
program.
2. Execute the program using two to four multitasked processes on a single computer.
Now examine actions such as checking that messages are indeed being sent to the
correct places. It is very common to make mistakes with message tags and have
messages sent to the wrong places.
3. Execute the program using the same two to four processes but now across several
computers. This step helps find problems that are caused by network delays related
to synchronization and timing.
Evaluating Programs Empirically
Measuring Execution Time
To measure the execution time between point L1 and point L2 in the code, we might have
a construction such as
.
L1: time(&t1); /* start timer */
.
.
L2: time(&t2); /* stop timer */
.
elapsed_time = difftime(t2, t1); /* elapsed_time = t2 - t1 */
printf("Elapsed time = %5.2f seconds", elapsed_time);
MPI provides the routine MPI_Wtime() for returning time (in seconds).
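A sketch of the corresponding MPI timing idiom:

double start, elapsed;

start = MPI_Wtime();                 /* wall-clock time in seconds */
/* ... section of code being timed ... */
elapsed = MPI_Wtime() - start;
printf("Elapsed time = %5.2f seconds\n", elapsed);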
One way to time communication is for P0 to send a message that P1 immediately returns; half the round-trip time estimates the one-way time:

P0
.
L1: time(&t1);
send(&x, P1);
recv(&x, P1);
L2: time(&t2);
elapsed_time = 0.5 * difftime(t2, t1);
printf("Elapsed time = %5.2f seconds", elapsed_time);
.
P1
.
recv(&x, P0);
send(&x, P0);
.
Profiling
A profile is a histogram or graph showing the time spent on different parts of the program:
[Figure: profile histogram — time (or number of repetitions) plotted against statement number or regions of the program.]