Parallel Programming - Slides
Astrophysical N-body simulation by Scott Linssen (undergraduate student, University of North Carolina at Charlotte [UNCC]).
Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers
Barry Wilkinson and Michael Allen Prentice Hall, 1998
Main memory
Instructions (to processor)
Data (to or from processor)
Processor
One
address
space
Memory modules
Interconnection
network
Processors
Interconnection
network
Messages
Processor
Local
memory
Computers
Interconnection
network
Messages
Processor
Shared
memory
Computers
Program
Instructions
Program
Instructions
Processor
Processor
Data
Data
Figure 1.6 MPMD structure.
Computers
M
C
C
P
Computer (node)
Links
to other
nodes
Switch
Processor
Links
to other
nodes
Memory
Link
Node
Node
Links
Computer/
processor
Figure 1.11 Two-dimensional array (mesh).
Root
Links
Processing
element
Nodes labeled with 3-bit addresses: 000, 001, 010, 011, 100, 101, 110, 111
Nodes labeled with 4-bit addresses 0000 through 1111.
Figure 1.14 Four-dimensional hypercube.
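The binary node labels in the hypercube figures suggest how addressing works. Assuming the usual convention that two nodes are linked exactly when their addresses differ in one bit, a neighbor is found by flipping a bit, and the number of differing bits gives the minimum number of links between two nodes. The following C sketch illustrates that convention; the function names are hypothetical and do not come from the slides.

```c
/* Illustrative only: assumes nodes are linked when their addresses
   differ in exactly one bit. */

/* Address of the neighbor of `node` along dimension `dim` (0 = lowest bit). */
unsigned hypercube_neighbor(unsigned node, unsigned dim)
{
    return node ^ (1u << dim);
}

/* Minimum number of links between nodes a and b: count the differing bits. */
unsigned hypercube_distance(unsigned a, unsigned b)
{
    unsigned diff = a ^ b, count = 0;
    while (diff) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}
```

Under this assumption, node 0101 in a four-dimensional hypercube has the neighbors 0100, 0111, 0001, and 1101.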
Ring
Nodal address
1011
10
11
01
00
x
00
01
11
10
A
Root
A
Packet
Head
Movement
Flit buffer
Request/
Acknowledge
signal(s)
Figure 1.18 Distribution of flits.
Source
processor
Destination
processor
Data
R/A
Packet switching
Network
latency
Wormhole routing
Circuit switching
Distance
(number of nodes between source and destination)
Figure 1.20
Node 4
Node 3
Messages
Node 1
Node 2
Virtual channel
buffer
Node
Node
Route
Physical link
Figure 1.22 Multiple virtual channels mapped onto a single physical channel.
Ethernet
Workstation/
file server
Workstations
Ethernet frame fields, in order of transmission:
Preamble (64 bits)
Destination address (48 bits)
Source address (48 bits)
Type (16 bits)
Data (variable)
Frame check sequence (32 bits)
Figure 1.24
Network
Workstation/
file server
Workstations
Figure 1.25 Network of workstations connected via a ring.
Workstations
Workstation/
file server
Figure 1.26 Star connected network.
Process 1
Process 2
Computing
Process 3
Process 4
Message
Time
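One processor: total time $t_s$, made up of a serial section $f t_s$ and parallelizable sections $(1 - f) t_s$
(b) Multiple processors: with n processors the parallelizable sections take $(1 - f) t_s / n$, so $t_p = f t_s + (1 - f) t_s / n$
Figure 1.29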
Figure 1.30 (a) Speedup against number of processors, n (curves for f = 0%, 5%, 10%, and 20%). (b) Speedup against serial fraction, f (curves for n = 16 and n = 256).
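The curves in Figure 1.30 follow from dividing the one-processor time $t_s$ by the n-processor time $f t_s + (1 - f) t_s / n$, which simplifies to $S(n) = n / (1 + (n - 1) f)$. A minimal C sketch of that formula, with a hypothetical function name:

```c
/* Speedup predicted by Amdahl's law for serial fraction f (0 to 1)
   and n processors: S(n) = n / (1 + (n - 1) * f). */
double amdahl_speedup(double f, int n)
{
    return (double)n / (1.0 + (double)(n - 1) * f);
}
```

With f = 0 the speedup equals n. As n grows the speedup approaches 1/f, so a serial fraction of 5% caps it at 20 however many processors are used.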
Source
file
Compile to suit
processor
Executables
Processor 0
Processor n - 1
Process 1
spawn();
Start execution
of process 2
Process 2
Time
Process 1
Process 2
send(&x, 2);
Movement
of data
recv(&y, 1);
Process 1
Time
send();
Suspend
process
Both processes
continue
Process 2
Request to send
Acknowledgment
recv();
Message
Process 2
Time
recv();
Request to send
send();
Both processes
continue
Suspend
process
Message
Acknowledgment
(b) When recv() occurs before send()
Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol.
Process 1
Process 2
Message buffer
Time
send();
Continue
process
recv();
Read
message buffer
Process 0
Process 1
data
data
Process n - 1
data
Action
buf
bcast();
bcast();
bcast();
Code
Figure 2.6 Broadcast operation.
Process 0
Process 1
Process n - 1
data
data
data
scatter();
scatter();
scatter();
Action
buf
Code
Process 0
Process 1
Process n - 1
data
data
data
gather();
gather();
gather();
Action
buf
Code
Process 0
Process 1
data
Process n - 1
data
data
reduce();
reduce();
Action
buf
+
reduce();
Code
Workstation
PVM
daemon
Application
program
(executable)
Messages
sent through
network
Workstation
Workstation
PVM
daemon
Application
program
(executable)
PVM
daemon
Application
program
(executable)
Figure 2.10
Workstation
PVM
daemon
Messages
sent through
network
Workstation
PVM
daemon
Workstation
PVM
daemon
Application
program
(executable)
Figure 2.11 Multiple processes allocated to each processor (workstation).
Array
holding
data
Process 1
Send buffer
Pack
Process 2
Array to
receive
data
pvm_psend();
Continue
process
Process_1
Process_2
pvm_initsend();
pvm_pkint( &x );
pvm_pkstr( &s );
pvm_pkfloat( &y );
pvm_send(process_2 );
x
s
y
Send
buffer
Message
Receive
buffer
Figure 2.13
pvm_recv(process_1 );
pvm_upkint( &x );
pvm_upkstr( &s );
pvm_upkfloat( &y );
Master
#include <stdio.h>
#include <stdlib.h>
#include <pvm3.h>
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000
main() {
int mytid,tids[PROC];
int n = NELEM, nproc = PROC;
int no, i, who, msgtype;
int data[NELEM],result[PROC],tot=0;
char fn[255];
FILE *fp;
mytid=pvm_mytid();/*Enroll in PVM */
Slave
#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000
main()
{
int mytid;
int tids[PROC];
int n, me, i, msgtype;
int x, nproc, master;
int data[NELEM], sum;
mytid = pvm_mytid();
/* Determine my tid */
for (i=0; i<nproc; i++)
if(mytid==tids[i])
{me = i;break;}
Broadcast data
Process 0
Process 1
Destination
send(,1,);
lib()
send(,1,);
Source
recv(,0,);
lib()
recv(,0,);
Process 1
send(,1,);
lib()
send(,1,);
recv(,0,);
lib()
recv(,0,);
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000
int main(int argc, char **argv)
{
int myid, numprocs;
int data[MAXSIZE], i, x, low, high, myresult = 0, result;
char fn[255];
FILE *fp;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
if (myid == 0) {
/* Open input file and initialize data */
strcpy(fn,getenv("HOME"));
strcat(fn,"/MPI/rand_data.txt");
if ((fp = fopen(fn,"r")) == NULL) {
printf("Can't open the input file: %s\n\n", fn);
exit(1);
}
for(i = 0; i < MAXSIZE; i++) fscanf(fp,"%d", &data[i]);
fclose(fp);
}
/* broadcast data */
MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);
/* Add my portion of data (assumes numprocs divides MAXSIZE evenly) */
x = MAXSIZE/numprocs;
low = myid * x;
high = low + x;
for(i = low; i < high; i++)
myresult += data[i];
printf("I got %d from %d\n", myresult, myid);
/* Compute global sum */
MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (myid == 0) printf("The sum is %d.\n", result);
MPI_Finalize();
return 0;
}
Figure 2.16
Time
Startup time
Number of data items (n)
Plot of $f(x) = 4x^2 + 2x + 12$ lying between $c_1 g(x) = 2x^2$ and $c_2 g(x) = 6x^2$ for $x \ge x_0$, with $x_0 = 3$ marked on the x-axis (y-axis from 0 to 160).
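The bounds in the plot can be checked numerically. With $g(x) = x^2$, $c_1 = 2$, and $c_2 = 6$, the inequality $c_1 g(x) \le f(x) \le c_2 g(x)$ holds from $x_0 = 3$ onward, since $4x^2 + 2x + 12 \le 6x^2$ reduces to $(x - 3)(x + 2) \ge 0$. A small C sketch (hypothetical function name) that tests the inequality at an integer point:

```c
/* Returns 1 if 2x^2 <= 4x^2 + 2x + 12 <= 6x^2 at the given x, else 0. */
int within_bounds(long x)
{
    long f = 4 * x * x + 2 * x + 12;
    return 2 * x * x <= f && f <= 6 * x * x;
}
```

At x = 2 the upper bound fails (32 > 24), while at x = 3 the function meets the upper bound exactly (54 = 54).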
110
100
111
101
3rd step
010
2nd step
1st step
000
011
001
P000
Message
Step 1: P000 P001
Step 2: P000 P010 P001 P011
Step 3: P000 P100 P010 P110 P001 P101 P011 P111
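The steps listed above are consistent with a simple rule, assumed here for illustration: in step k, every node that already holds the message sends it to the node whose address differs in bit k - 1. The C sketch below (hypothetical function name) returns the destination for a given node and step.

```c
/* Destination to which `node` sends in broadcast step `step` (1, 2, 3, ...),
   or -1 if `node` does not yet hold the message at the start of that step.
   Assumes node 0 is the source and step k uses address bit k - 1. */
int bcast_dest(int node, int step)
{
    int bit = 1 << (step - 1);
    if (node >= bit)          /* only nodes 0 .. bit-1 hold the message so far */
        return -1;
    return node | bit;
}
```

For eight nodes this gives P000 to P001 in step 1, P000 to P010 and P001 to P011 in step 2, and the four transfers into P100, P101, P110, and P111 in step 3.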
Steps
1
6
Figure 2.21 Broadcast in a mesh.
Message
Source
Destinations
Source
Sequential
N destinations
Source
Destinations
Process 1
Process 2
Process 3
Time
Computing
Waiting
Message-passing system routine
Message
Figure 2.25 Space-time diagram of a parallel program.
2
3
4
5
6
7
8
9
Statement number or regions of program
10
Figure 2.26 Program profile.
Input data
Processes
Results
spawn()
send()
Master
send()
recv()
Collect results
Figure 3.2 Practical embarrassingly parallel computational graph with dynamic process creation and the master-slave approach.
x
Process
80
640
Map
80
480
640
Map
480
Imaginary axis: -2 to +2
Real axis: -2 to +2
Work pool
(xc, yc)
(xa, ya)
(xb, yb)
(xe, ye)
(xd, yd)
Task
Return results/
request new task
Row sent
disp_height
Increment
Row returned
Terminate
Decrement
Total area = 4
Area = π
Plot of $f(x)$ against $x$ on $[0, 1]$: $y = \sqrt{1 - x^2}$
Figure 3.8 Function being integrated in computing π by a Monte Carlo method.
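One way to see the method at work, assuming the integrand is the quarter circle $y = \sqrt{1 - x^2}$ on $[0, 1]$ (area π/4): pick random points in the unit square and count the fraction that fall under the curve, that is, points with $x^2 + y^2 \le 1$. The C sketch below uses a small linear congruential generator of its own so that results are repeatable; the generator constants and function names are illustrative choices, not taken from the slides.

```c
#include <stdint.h>

/* Simple linear congruential generator; returns a value in [0, 1). */
static double next_uniform(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (double)(*state >> 8) / 16777216.0;   /* top 24 bits / 2^24 */
}

/* Estimate pi from `samples` random points in the unit square. */
double estimate_pi(long samples, uint32_t seed)
{
    long under_curve = 0;
    for (long i = 0; i < samples; i++) {
        double x = next_uniform(&seed);
        double y = next_uniform(&seed);
        if (x * x + y * y <= 1.0)      /* point lies under y = sqrt(1 - x^2) */
            under_curve++;
    }
    return 4.0 * (double)under_curve / (double)samples;
}
```

The samples are independent of one another, so in a parallel version each slave could process its own share of the samples and return a partial count to the master.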
Master
Partial sum
Request
Slaves
Random
number
Random number
process
x1
x2
xk-1
xk
xk+1
xk+2
x2k-1
x2k
x(m-1)n/m ... xn-1
+
Partial sums
+
Sum
Figure 4.1
Initial problem
Divide
problem
Final tasks
Figure 4.2 Tree construction.
Original list
P0
P0 P4
P0 P2 P4 P6
P0 P1 P2 P3 P4 P5 P6 P7
x0 ... xn-1
Figure 4.3 Dividing a list into parts.
x0 ... xn-1
P0 P1 P2 P3 P4 P5 P6 P7
P0 P2 P4 P6
P0 P4
P0
Final sum
Figure 4.4 Partial summation.
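The combining pattern in Figure 4.4 can be sketched sequentially: at each level, the process at the left of each pair adds in the partial sum held by its partner, and the distance between partners doubles until P0 holds the final sum. The array-based C function below is a hypothetical stand-in for the message passing between processes.

```c
/* Combine p partial sums pairwise, tree fashion; partial[i] stands for the
   value held by process Pi. The final sum ends up in partial[0]. */
int tree_sum(int *partial, int p)
{
    for (int stride = 1; stride < p; stride *= 2)
        for (int i = 0; i + stride < p; i += 2 * stride)
            partial[i] += partial[i + stride];   /* Pi receives from Pi+stride */
    return p > 0 ? partial[0] : 0;
}
```

With eight partial sums the loop runs three levels (strides 1, 2, and 4), matching the three combining levels between the eight processes and the final sum.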
Found/
Not found
OR
OR
OR
Figure 4.5
Figure 4.6 Quadtree.
Image area
First division
into four parts
Second division
Figure 4.7 Dividing an image.
Unsorted numbers
Buckets
Sort
contents
of buckets
Merge lists
Sorted numbers
Figure 4.8 Bucket sort.
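A sequential sketch of the steps in Figure 4.8, assuming the numbers are integers in a known range [0, range) and that each bucket covers an equal share of that range. The function name and the use of insertion sort inside each bucket are illustrative choices, not taken from the slides.

```c
#include <stdlib.h>
#include <string.h>

/* Sort n integers, each in [0, range), using m buckets.
   If memory cannot be allocated the array is left unchanged. */
void bucket_sort(int *a, int n, int m, int range)
{
    int *start = calloc((size_t)m + 1, sizeof *start); /* bucket offsets */
    int *fill = calloc((size_t)m, sizeof *fill);       /* items placed so far */
    int *tmp = malloc((size_t)n * sizeof *tmp);
    int i, j, b;
    if (!start || !fill || !tmp) { free(start); free(fill); free(tmp); return; }

    /* Count how many numbers fall into each bucket. */
    for (i = 0; i < n; i++)
        start[(int)((long)a[i] * m / range) + 1]++;
    /* A running total turns the counts into starting offsets. */
    for (b = 0; b < m; b++)
        start[b + 1] += start[b];
    /* Place each number into its bucket. */
    for (i = 0; i < n; i++) {
        b = (int)((long)a[i] * m / range);
        tmp[start[b] + fill[b]++] = a[i];
    }
    /* Sort the contents of each bucket (insertion sort). */
    for (b = 0; b < m; b++)
        for (i = start[b] + 1; i < start[b + 1]; i++) {
            int key = tmp[i];
            for (j = i - 1; j >= start[b] && tmp[j] > key; j--)
                tmp[j + 1] = tmp[j];
            tmp[j + 1] = key;
        }
    /* Buckets are stored in increasing order, so merging is a plain copy. */
    memcpy(a, tmp, (size_t)n * sizeof *tmp);
    free(start); free(fill); free(tmp);
}
```

Because the buckets cover disjoint, increasing ranges, the merge step needs no comparisons, which is what makes it natural to hand each bucket to a separate processor.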
Unsorted numbers
p processors
Buckets
Sort
contents
of buckets
Merge lists
Sorted numbers
Figure 4.9 One parallel version of bucket sort.
n/m numbers
Unsorted numbers
p processors
Small
buckets
Empty
small
buckets
Large
buckets
Sort
contents
of buckets
Merge lists
Sorted numbers
Figure 4.10 Parallel version of bucket sort.
Process n - 1
Process 0
Receive
buffer
Send
buffer
Send
buffer
n - 1
Process 1
n - 1
Process n - 1
n - 1
Process 0
n - 1
Process n - 2
All-to-all
P0
P1
P2
P3
f(x)
f(p)
f(q)
f(x)
f(p)
f(q)
f(x)
f(p)
f(q)
f(x)
C
f(x)
C=0
Center of mass
Subdivision
direction
Particles
Partial quadtree
log n numbers
+
+
+
+
+
+
+
+
Binary Tree
Result
Figure 4.21
y
f(x)
f(a)
b
a
f(b)
P0
P1
P2
P3
P4
P5
sum
a[0]
a[1]
a[2]
a[3]
a[4]
sin
sout
sin
sout
Figure 5.2
sin
sout
sin
sout
sin
sout
fin
f1
fout
fin
f2
fout
fin
f3
fout
fin
f4
fout
fin
fout
Filtered signal
Figure 5.4 Space-time diagram of a pipeline. (Vertical axis: processes P0 to P5; horizontal axis: time; instances 1 to 7 enter in turn and each advances one process per time step.)
Figure 5.5 Alternative space-time diagram. (Rows: instances 0 to 4; each instance passes through P0 to P5 in turn; horizontal axis: time.)
Input sequence: d9 d8 d7 d6 d5 d4 d3 d2 d1 d0, entering a pipeline of processes P0 to P9
(b) Timing diagram: rows P0 to P9, each handling d0 to d9 in turn; horizontal axis: time
Figure 5.6
P5
P5
P4
Information
transfer
sufficient to
start next
process
P4
P3
P3
P2
P2
P1
P0
P1
Information passed
to next stage
Time
(a) Processes with the same
execution time
P0
Time
(b) Processes not with the
same execution time
Figure 5.7 Pipeline processing where information passes to next stage before end of process.
Processor 0
P0
P1
P2
Processor 1
P3
P4
P5
P6
Processor 2
P7
P8
P9
P10
P11
Multiprocessor
Host
computer
P0 P1 P2 P3 P4 (each stage passes an accumulated sum to the next)
Figure 5.10 Pipelined addition.
Master process
dn-1 ... d2 d1 d0
Slaves
P0
P1
P2
Pn-1
Sum
Figure 5.11 Pipelined addition of numbers with a master process and ring configuration.
Master process
Numbers
d0
d1
P0
P1
Slaves
P2
dn-1
Pn-1
Sum
Figure 5.12 Pipelined addition of numbers with direct access to slave processes.
P0
1
4, 3, 1, 2, 5
4, 3, 1, 2
4, 3, 1
4, 3
P1
P2
P3
P4
2
1
2
3
Time
(cycles)
1
2
10
1
2
P0
Series of numbers
xn-1 ... x1 x0
Smaller
numbers
P1
P2
Compare
xmax
Largest number
Next largest
number
Master process
dn-1 ... d2 d1 d0
Sorted sequence
P0
P1
P2
Pn-1
Figure 5.15 Insertion sort with results returned to the master process using a bidirectional line configuration.
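The pipelined insertion sort can be simulated sequentially. This sketch assumes each process keeps the largest number it has seen so far and passes smaller ones on to the next process; `stage[k]` stands for the number held by Pk, and all names are hypothetical.

```c
/* Feed in[0], in[1], ... into the pipeline one at a time. Afterwards
   stage[0] holds the largest number, stage[1] the next largest, and so on.
   `stage` must have room for n values. */
void pipeline_insertion_sort(const int *in, int n, int *stage)
{
    int filled = 0;               /* number of stages already holding a value */
    for (int t = 0; t < n; t++) {
        int x = in[t];            /* number entering stage 0 */
        for (int k = 0; k < filled; k++) {
            if (x > stage[k]) {   /* keep the larger, pass on the smaller */
                int smaller = stage[k];
                stage[k] = x;
                x = smaller;
            }
        }
        stage[filled++] = x;      /* first empty stage stores what arrives */
    }
}
```

In the real pipeline the inner loop is spread across the processes, so different numbers are being compared in different stages at the same time.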
Sorting phase
2n - 1
n
Shown for n = 5
P4
P3
P2
P1
P0
Time
Figure 5.16
P0
Not multiples of
1st prime number
P1
P2
2nd prime
number
3rd prime
number
Series of numbers
xn-1 ... x1 x0
Compare
multiples
1st prime
number
P0
P1
Compute x0
x0
Compute x1
P2
x0
x1
Compute x2
P3
x0
x1
x2
Compute x3
x0
x1
x2
x3
Figure 5.18 Solving an upper triangular set of linear equations using a pipeline.
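A sequential sketch of the computation the pipeline stages share, assuming (as the order x0, x1, x2, x3 in the figure suggests) that equation i involves only x0 through xi, so each unknown follows from the ones already computed. The function name and the row-by-row storage of the coefficients are illustrative assumptions.

```c
/* Solve a[i][0]*x[0] + ... + a[i][i]*x[i] = b[i] for i = 0 .. n-1.
   `a` is an n-by-n matrix stored row by row; entries to the right of the
   diagonal are ignored. Diagonal entries must be nonzero. */
void solve_triangular(int n, const double *a, const double *b, double *x)
{
    for (int i = 0; i < n; i++) {
        double sum = b[i];
        for (int j = 0; j < i; j++)
            sum -= a[i * n + j] * x[j];   /* multiply/add with a known x[j] */
        x[i] = sum / a[i * n + i];        /* divide to obtain x[i] */
    }
}
```

In the pipelined version, process Pi performs the inner loop for row i, receiving each x[j] from the previous process and forwarding it before using it.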
P5
P4
Processes
P3
P2
P1
P0
P0
divide
send(x0)
end
Time
P1
recv(x0)
send(x0)
multiply/add
divide/subtract
send(x1)
end
P2
recv(x0)
send(x0)
multiply/add
recv(x1)
send(x1)
multiply/add
divide/subtract
send(x2)
end
P3
recv(x0)
send(x0)
multiply/add
recv(x1)
send(x1)
multiply/add
recv(x2)
send(x2)
multiply/add
divide/subtract
send(x3)
end
P4
recv(x0)
send(x0)
multiply/add
recv(x1)
send(x1)
multiply/add
recv(x2)
send(x2)
multiply/add
recv(x3)
send(x3)
multiply/add
divide/subtract
send(x4)
end
y4y3y2y1
x1
x2
x3
x4
yin
yout
yin
yout
yin
yout
yin
yout
a1
a2
a3
a4
Figure 5.21
Output
Display
Display
Audio input
(digitized)
Pipeline
Audio input
(digitized)
Processes
P0
P1
P2
Pn-1
Active
Time
Waiting
Barrier
Processes
P0
P1
Pn-1
Barrier();
Barrier();
Processes wait until
all reach their
barrier call
Barrier();
Processes
P0
P1
Pn-1
Counter, C
Increment
and check for n
Barrier();
Barrier();
Barrier();
Slave processes
Master
Arrival
phase
Departure
phase
for(i=0;i<n;i++)
recv(Pany);
for(i=0;i<n;i++)
send(Pi);
Barrier:
send(Pmaster);
recv(Pmaster);
Barrier:
send(Pmaster);
recv(Pmaster);
P0
P1
P2
P3
Arrival
at barrier
P4
P5
P6
P7
Synchronizing
message
Departure
from barrier
P0
P1
P2
P3
P4
P5
P6
P7
1st stage
Time
2nd stage
3rd stage
Instruction
a[] = a[] + k;
Processors
a[0]=a[0]+k;
a[1]=a[1]+k;
a[n-1]=a[n-1]+k;
a[0]
a[1]
a[n-1]
Figure 6.8 Data parallel prefix sum operation. (Numbers x0 to x15; an Add is performed in each of Step 1 (j = 0), Step 2 (j = 1), Step 3 (j = 2), and the final step (j = 3), after which position i holds the sum of x0 through xi.)
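A sequential C sketch of the operation in Figure 6.8, assuming the usual scheme in which step j adds to every element x[i] with i >= 2^j the value that x[i - 2^j] had before the step. Walking i downward within a step keeps those earlier values intact, which stands in for all additions of a step happening at once; the function name is hypothetical.

```c
/* In-place prefix sum: afterwards x[i] = x[0] + x[1] + ... + x[i]. */
void prefix_sum(int *x, int n)
{
    for (int dist = 1; dist < n; dist *= 2)   /* dist = 2^j for step j */
        for (int i = n - 1; i >= dist; i--)   /* downward: x[i - dist] is still the old value */
            x[i] += x[i - dist];
}
```

For 16 numbers the outer loop runs four times (dist = 1, 2, 4, 8), matching the four steps j = 0 to j = 3 in the figure.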
Computed
value
Error
Exact value
t+1
Iteration
Process 0
Send
buffer
data
x0
Process 1
data
x1
Process n - 1
data
xn-1
Receive
buffer
Allgather();
Allgather();
Allgather();
[Plot: execution time (axis marked 1 × 10^6 and 2 × 10^6, with one parameter set to 1) against number of processors, p (0 to 32). Curves: Overall, Communication, Computation.]
Figure 6.11 Effects of computation and communication in Jacobi iteration.
[Figure: metal plate with an enlarged view of the point hi,j and its four neighbors hi-1,j, hi+1,j, hi,j-1, and hi,j+1.]
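The four-neighbor arrangement in this figure is the stencil used by Jacobi iteration on the heat distribution problem: each interior point is replaced by the average of its four neighbors. A minimal sequential sketch, assuming a small fixed grid (`GRID`, `jacobi_sweep`, and `jacobi` are names invented here):

```c
#include <assert.h>

#define GRID 6   /* hypothetical grid size, including the fixed boundary points */

/* One Jacobi sweep: new values are computed only from the previous
 * iteration's values (held in h) and written to g, then copied back. */
void jacobi_sweep(double h[GRID][GRID], double g[GRID][GRID])
{
    for (int i = 1; i < GRID - 1; i++)
        for (int j = 1; j < GRID - 1; j++)
            g[i][j] = 0.25 * (h[i-1][j] + h[i+1][j] + h[i][j-1] + h[i][j+1]);
    for (int i = 1; i < GRID - 1; i++)
        for (int j = 1; j < GRID - 1; j++)
            h[i][j] = g[i][j];
}

/* Repeat the sweep a fixed number of times; boundary points never change. */
void jacobi(double h[GRID][GRID], int iterations)
{
    double g[GRID][GRID];

    assert(iterations >= 0);
    for (int t = 0; t < iterations; t++)
        jacobi_sweep(h, g);
}
```

A fixed iteration count keeps the sketch short; a practical version would stop when the change between sweeps falls below a tolerance.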
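[Figure: points numbered in natural order: x1, x2, ..., xk-1, xk in the first row, xk+1, xk+2, ..., x2k-1, x2k in the second row, and so on up to xk². The point xi has neighbors xi-1, xi+1, xi-k, and xi+k.]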
[Figure: mesh of processes indexed by row i and column j; every process Pi,j executes the same message-passing code:]
send(g, Pi-1,j);
send(g, Pi+1,j);
send(g, Pi,j-1);
send(g, Pi,j+1);
recv(w, Pi-1,j);
recv(x, Pi+1,j);
recv(y, Pi,j-1);
recv(z, Pi,j+1);
Figure 6.14
[Figure: two ways of partitioning the array among processes P0, P1, ..., Pp-1: blocks, and strips (columns).]
[Figure: square blocks, each of side n/√p, and strips.]
Figure 6.16 Communication consequences of partitioning.
[Plot: tstartup (0 to 2000) against processors, p (1 to 1000), marking the region where strip partition is best.]
Process i
Array held
by process i
One row
of points
Ghost points
Copy
Array held
by process i+1
Process i+1
Figure 6.18 Configuring array into contiguous rows for each process, with ghost points.
[Figure: room measuring 10 ft by 10 ft, with labels 20°C, 100°C, and 4 ft.]
Figure 6.19 Room for Problem 6-14.
vehicle
Airflow
Actual dimensions
selected at will
Figure 6.21 Figure for Problem 6-23.
[Figure: two timing charts of processors P0 to P5 against time.]
(a) Imperfect load balancing leading to increased execution time
Work pool
Queue
Master
process
Tasks
Send task
Request task
(and possibly
submit new tasks)
Slave worker processes
Figure 7.2 Centralized work pool.
Initial tasks
Master, Pmaster
Process M0
Process Mn-1
Slaves
Process
Process
Requests/tasks
Process
Process
Figure 7.4 Decentralized work pool.
Slave Pi
Requests
Local
selection
algorithm
Requests
Slave Pj
Local
selection
algorithm
Master
process
P0
P1
Figure 7.6
P2
P3
Pn-1
Pcomm
If buffer empty,
make request
Receive task
from request
If buffer full,
send task
If free,
request
task
Receive
task from
request
Ptask
P0
Task
when
requested
P1
P3
P2
P5
P4
P6
Parent
Process
Inactive
Final
acknowledgment
First task
Acknowledgment
Task
Other processes
Active
P0
P1
Figure 7.10
P2
Pn-1
Token
AND
Terminated
Task
P0
Pj
Figure 7.12
Pi
Pn-1
AND
Terminated
AND
AND
Terminated
Terminated
Figure 7.13 Tree termination.
Summit
F
E
D
C
A
Base camp
Climbing a mountain.
17
E
9
51
24
D
13
10
A
14
8
[Panel with sources as rows and destinations as columns; surviving entries: 10, 13, 24, 51, 14, 17.]

Source   Destination and weight
A        B 10
B        C 8, D 13, E 24, F 51
C        D 14
D        E 9
E        F 17
F
(b) Adjacency list
Figure 7.16 Representing a graph.
[Figure: vertex i, at distance di, joined to vertex j, at distance dj, by an edge of weight wi,j.]
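The quantities di, wi,j, and dj in this figure are the ones compared in Moore's single-source shortest-path algorithm: when di + wi,j is smaller than the current dj, dj is updated and vertex j is queued for examination. The sketch below is a sequential version under stated assumptions. The names `moore`, `example_graph`, `NV`, and `NO_EDGE` are invented here, and the edge weights in `example_graph` are a reconstruction of the flattened adjacency list of Figure 7.16, so treat them as an assumption.

```c
#include <assert.h>
#include <limits.h>

#define NV 6          /* vertices A..F, indexed 0..5 */
#define NO_EDGE 0     /* hypothetical marker meaning "no edge" */

/* Edge weights as reconstructed from Figure 7.16 (an assumption). */
void example_graph(int w[NV][NV])
{
    for (int i = 0; i < NV; i++)
        for (int j = 0; j < NV; j++)
            w[i][j] = NO_EDGE;
    w[0][1] = 10;                                         /* A -> B          */
    w[1][2] = 8; w[1][3] = 13; w[1][4] = 24; w[1][5] = 51; /* B -> C, D, E, F */
    w[2][3] = 14;                                         /* C -> D          */
    w[3][4] = 9;                                          /* D -> E          */
    w[4][5] = 17;                                         /* E -> F          */
}

/* Moore's algorithm with a FIFO vertex queue. */
void moore(int w[NV][NV], int source, int dist[NV])
{
    int queue[NV];            /* circular queue; a vertex is queued at most once at a time */
    int in_queue[NV] = {0};
    int head = 0, count = 0;

    assert(source >= 0 && source < NV);
    for (int v = 0; v < NV; v++)
        dist[v] = INT_MAX;    /* "infinity" */
    dist[source] = 0;
    queue[0] = source; in_queue[source] = 1; count = 1;

    while (count > 0) {
        int i = queue[head];
        head = (head + 1) % NV; count--; in_queue[i] = 0;
        for (int j = 0; j < NV; j++) {
            if (w[i][j] == NO_EDGE)
                continue;
            int newdist = dist[i] + w[i][j];
            if (newdist < dist[j]) {          /* found a shorter route to j through i */
                dist[j] = newdist;
                if (!in_queue[j]) {
                    queue[(head + count) % NV] = j;
                    in_queue[j] = 1; count++;
                }
            }
        }
    }
}
```

In a work pool version, the queue would be held by the master and each dequeued vertex handed to a slave as a task.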
Master process
Start at
source
vertex
Vertex
Vertex w[]
w[]
New
distance
dist
dist
Process A
Vertex w[]
New
distance
Process C
Other processes
dist
Process B
Figure 7.18
Entrance
Search path
Exit
Gold
Entrance
Room B
Door
Room A
Bus
Cache
Processors
Memory modules
TABLE 8.1
Language
Comments
Concurrent Pascal
Extension to Pascal
Ada
Modula-P
Bräunl, 1986c
Extension to Modula 2
C*
Concurrent C
Extension to C
Fortran D
a. Brinch Hansen, P. (1975), The Programming Language Concurrent Pascal, IEEE Trans. Software Eng.,
Vol. 1, No. 2 (June), pp. 199–207.
b. U.S. Department of Defense (1981), The Programming Language Ada Reference Manual, Lecture
Notes in Computer Science, No. 106, Springer-Verlag, Berlin.
c. Bräunl, T., R. Norz (1992), Modula-P User Manual, Computer Science Report, No. 5/92 (August), Univ.
Stuttgart, Germany.
d. Thinking Machines Corp. (1990), C* Programming Guide, Version 6, Thinking Machines System Documentation.
e. Gehani, N., and W. D. Roome (1989), The Concurrent C Programming Language, Silicon Press, New
Jersey.
f. Fox, G., S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu (1990), Fortran D
Language Specification, Technical Report TR90-141, Dept. of Computer Science, Rice University.
Main program
FORK
Spawned processes
FORK
FORK
JOIN
JOIN
JOIN
JOIN
Figure 8.2 FORK-JOIN construct.
Code
Heap
IP
Stack
Interrupt routines
Files
(a) Process
Code
Stack
Heap
Thread
IP
Interrupt routines
Stack
Thread
IP
Files
(b) Threads
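Main program
    pthread_create(&thread1, NULL, proc1, &arg);   /* start thread1 running proc1(&arg) */
    ...
    pthread_join(thread1, &status);                /* wait for thread1; status receives its return value */

thread1
    void *proc1(void *arg)
    {
        ...
        return status;                             /* pointer handed back through pthread_join */
    }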
Main program
Thread
pthread_create();
pthread_create();
Thread
pthread_create();
Thread
Termination
Termination
Termination
Shared variable, x
Write
Write
Read Read
+1
+1
Process 1
Process 2
Process 1                            Process 2
while (lock == 1) do_nothing;        while (lock == 1) do_nothing;
lock = 1;
    Critical section
lock = 0;
                                     lock = 1;
                                         Critical section
                                     lock = 0;
Figure 8.7 Control of critical sections through busy waiting.
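As written in Figure 8.7, testing `lock` and then setting it are two separate operations, so two processes could both see `lock == 0` and both enter the critical section. One way to close that gap is an atomic test-and-set. The sketch below uses the C11 `atomic_flag` type for this; it is an illustration rather than the textbook's code, and the names `SpinLock`, `spin_init`, `spin_trylock`, `spin_lock`, and `spin_unlock` are invented here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_flag held;    /* set while some thread is in the critical section */
} SpinLock;

void spin_init(SpinLock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* Atomically read the old value and set the flag; succeed only if it was clear. */
bool spin_trylock(SpinLock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy wait, as in the figure, but with an indivisible test-and-set. */
void spin_lock(SpinLock *l)
{
    while (!spin_trylock(l)) {
        /* do nothing */
    }
}

void spin_unlock(SpinLock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A critical section is then bracketed by `spin_lock(&l)` and `spin_unlock(&l)`. Busy waiting still wastes processor time while the lock is held, which is the usual argument for blocking locks such as Pthreads mutexes.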
R1
R2
Resource
P1
P2
Process
R1
R2
Rn-1
Rn
P1
P2
Pn-1
Pn
Main memory
Block
7
6
5
4
3
2
1
0
Address
tag
Cache
Cache
Block in cache
Processor 1
Processor 2
sum
Array a[]
addr
Figure 8.10 Shared memory locations for Section 8.4.1 program example.
global_index sum
Array a[]
addr
Figure 8.11 Shared memory locations for Section 8.4.2 program example.
[Figure: logic circuit with inputs Test1, Test2, Test3, internal signal Gate1, and outputs Output1, Output2.]

Gate function   Input 1   Input 2   Output
AND             Test1     Test2     Gate1
NOT             Gate1               Output1
OR              Test3     Gate1     Output2
Log
Movement
of logs
River
Frog
Figure 8.13 River and frog for Problem 8-23.
Pool of threads
Request
Request
serviced
Slaves
Master Signal
Figure 8.14 Thread pool for Problem 8-24.
a[i] a[0]
a[i] a[n-1]
Compare
Increment
counter, x
b[x] = a[i]
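The steps labeled in this figure (compare a[i] with a[0] through a[n-1], increment the counter x, then store b[x] = a[i]) are those of rank sort. A minimal sequential sketch follows. The function name `rank_sort` is an assumption, and the tie-break on equal values is an addition not shown in the figure, included so that duplicates land in distinct positions.

```c
#include <assert.h>
#include <stddef.h>

/* Rank sort: the number of elements that must precede a[i] is its
 * position in the sorted array b. */
void rank_sort(const int a[], int b[], size_t n)
{
    assert(a != b);                        /* b must be a separate array */
    for (size_t i = 0; i < n; i++) {       /* iterations are independent: one per processor */
        size_t x = 0;                      /* counter */
        for (size_t j = 0; j < n; j++)
            if (a[i] > a[j] || (a[i] == a[j] && j < i))   /* tie-break for duplicates */
                x++;
        b[x] = a[i];
    }
}
```

Because each value of i is handled independently, n processors can each take one number, giving O(n) time, and the inner comparisons can themselves be done in parallel with more processors.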
Compare
0/1
0/1
0/1
0/1
Add
Add
0/1/2
0/1/2
Tree
Add
0/1/2/3/4
Master
a[]
b[]
Read
numbers
Place selected
number
Slaves
Sequence of steps
P1
A
P2
1
Send(A)
If A > B send(B)
else send(A)
If A > B load A
else load B
2
Compare
P1
A
P2
1
Send(A)
B
Send(B)
2
If A > B load B
3
If A > B load A
Compare
Compare
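[Figure: merging two sublists held by P1 and P2. The original numbers 88, 50, 28, 25 are sent to the other process, which holds 98, 80, 43, 42 and merges them into 98, 88, 80, 50, 43, 42, 28, 25. That process keeps the higher numbers (98, 88, 80, 50) and returns the lower numbers (43, 42, 28, 25), which become the final numbers of the first process.]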
[Figure: merging two sublists held by P1 and P2, with both processes merging. P1 and P2 exchange their original numbers (88, 50, 28, 25 and 98, 80, 43, 42), and each merges the two lists into 98, 88, 80, 50, 43, 42, 28, 25. P1 keeps the lower numbers as its final numbers; P2 keeps the higher numbers as its final numbers.]
[Figure: original sequence, followed along a time axis by Phase 1 (place largest number), Phase 2 (place next largest number), and Phase 3.]
Figure 9.8 Steps in bubble sort.
[Figure: Phase 1 to Phase 4 overlapping along a time axis.]
P0
P1
P2
P3
P4
P5
P6
P7
Step
Time
Figure 9.10
Smallest
number
Largest
number
[Figure: arrangements of the numbers 10 to 16 at successive stages of a sort.]
[Figure: mergesort tree with process allocation. The unsorted list starts in P0 and is divided among P0 and P4, then P0, P2, P4, P6, then P0 to P7; the sublists are merged back in the reverse order until the sorted list is in P0.]
[Figure: quicksort tree with process allocation. The unsorted list in P0 is split about a pivot among P0 and P4, then P0, P2, P4, P6, and so on down to the sorted list.]
Unsorted list
Pivot
4
6
Sorted list
Pivots
Work pool
Sublists
Request
sublist
Return
sublist
Slave processes
[Figure panels over nodes 000 to 111: (a) Phase 1, list split about pivot p1; (b) Phase 2, sublists split about pivots p2 and p3; (c) Phase 3, sublists split about pivots p4 to p7.]
Figure 9.18 Hypercube quicksort algorithm when the numbers are originally in node 000.
[Figure panels over nodes 000 to 111: (a) Phase 1, broadcast pivot p1; (b) Phase 2, broadcast pivots p2 and p3; (c) Phase 3, broadcast pivots p4, p5, p6, and p7.]
Figure 9.19 Hypercube quicksort algorithm when numbers are distributed among nodes.
[Figure: three drawings of the 3-dimensional hypercube with nodes 000 to 111.]
Figure 9.20 Hypercube quicksort communication.
[Figure panels with nodes listed in the order 000, 001, 011, 010, 110, 111, 101, 100: (a) Phase 1, broadcast pivot p1; (b) Phase 2, broadcast pivots p2 and p3; (c) Phase 3, broadcast pivots p4, p5, p6, and p7.]
Figure 9.21
Sorted lists:            a[] = 2 4 5 8      b[] = 1 3 6 7
Merge (odd indices):     c[] = 1 2 5 6
Merge (even indices):    d[] = 3 4 7 8
Final list:              e[]
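[Figure: inputs a1, a2, a3, a4, ..., an-1, an and b1, b2, b3, b4, ..., bn-1, bn feed an odd mergesort and an even mergesort; a compare-and-exchange stage produces the outputs c1, c2, c3, c4, c5, c6, c7, ..., c2n-2, c2n-1, c2n.]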
Value
an-2, an-1
an-2, an-1
Bitonic sequence
3
Compare and
exchange
Bitonic sequence
Bitonic sequence
Unsorted numbers
8
9
7
4
Compare and
exchange
4
5
Sorted list
Unsorted numbers
Bitonic
sorting
operation
Direction
of increasing
numbers
Sorted list
Figure 9.27 Bitonic mergesort.
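Bitonic mergesort can be sketched recursively: sort the first half in increasing order and the second half in decreasing order, which gives a bitonic sequence, then sort that sequence with compare-and-exchange operations between elements half a list apart. The code below is a sequential illustration for list sizes that are powers of two; the function names are assumptions, and on a parallel machine the compare-and-exchange operations within each loop would run simultaneously.

```c
#include <assert.h>
#include <stddef.h>

/* Put the pair in the requested order (ascending if the flag is nonzero). */
void compare_exchange(int *lo, int *hi, int ascending)
{
    if ((*lo > *hi) == (ascending != 0)) {
        int t = *lo; *lo = *hi; *hi = t;
    }
}

/* Sort a bitonic sequence: compare a[i] with a[i + n/2], which leaves two
 * bitonic halves with every element of one half no larger than every
 * element of the other, then sort each half the same way. */
void bitonic_merge(int a[], size_t n, int ascending)
{
    if (n < 2)
        return;
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++)
        compare_exchange(&a[i], &a[i + half], ascending);
    bitonic_merge(a, half, ascending);
    bitonic_merge(a + half, half, ascending);
}

/* Sort an arbitrary list whose length is a power of two. */
void bitonic_sort(int a[], size_t n, int ascending)
{
    assert(n != 0 && (n & (n - 1)) == 0);   /* power of two only */
    if (n < 2)
        return;
    size_t half = n / 2;
    bitonic_sort(a, half, 1);               /* increasing half */
    bitonic_sort(a + half, half, 0);        /* decreasing half */
    bitonic_merge(a, n, ascending);         /* the whole list is now bitonic */
}
```

Calling `bitonic_sort(a, 8, 1)` sorts eight numbers into increasing order.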
[Figure: compare-and-exchange pattern of bitonic mergesort on eight numbers, in steps numbered 1 to 9. Phases: form bitonic lists of four numbers; form a bitonic list of eight numbers; sort. Each phase splits lists into higher and lower halves by comparing ai with ai+4 (n = 8), ai with ai+2 (n = 4), and ai with ai+1 (n = 2). The bitonic list symbol is that of Fig. 9.24 (a) or (b).]
Step 1:   88 50 28 25 98 80 43 42
Step 2:   50 42 28 25 98 88 80 43
Step 3:   43 42 28 25 98 88 80 50
                    Column
       a0,0     a0,1     ...   a0,m-2     a0,m-1
       a1,0     a1,1     ...   a1,m-2     a1,m-1
Row     ...
       an-2,0   an-2,1   ...   an-2,m-2   an-2,m-1
       an-1,0   an-1,1   ...   an-1,m-2   an-1,m-1
Column
Multiply
Sum
results
Row
i
ci,j
A
Row
sum
i
ci
Figure 10.3 Matrix-vector multiplication c = A × b.
Multiply
Sum
results
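$$
A = \begin{bmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} \\
a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} \\
a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} \\
a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3}
\end{bmatrix},
\qquad
B = \begin{bmatrix}
b_{0,0} & b_{0,1} & b_{0,2} & b_{0,3} \\
b_{1,0} & b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,0} & b_{2,1} & b_{2,2} & b_{2,3} \\
b_{3,0} & b_{3,1} & b_{3,2} & b_{3,3}
\end{bmatrix}
$$
(a) Matrices

$$
A_{0,0} B_{0,0} + A_{0,1} B_{1,0}
= \begin{bmatrix} a_{0,0} & a_{0,1} \\ a_{1,0} & a_{1,1} \end{bmatrix}
  \begin{bmatrix} b_{0,0} & b_{0,1} \\ b_{1,0} & b_{1,1} \end{bmatrix}
+ \begin{bmatrix} a_{0,2} & a_{0,3} \\ a_{1,2} & a_{1,3} \end{bmatrix}
  \begin{bmatrix} b_{2,0} & b_{2,1} \\ b_{3,0} & b_{3,1} \end{bmatrix}
$$
$$
= \begin{bmatrix}
a_{0,0}b_{0,0} + a_{0,1}b_{1,0} + a_{0,2}b_{2,0} + a_{0,3}b_{3,0} &
a_{0,0}b_{0,1} + a_{0,1}b_{1,1} + a_{0,2}b_{2,1} + a_{0,3}b_{3,1} \\
a_{1,0}b_{0,0} + a_{1,1}b_{1,0} + a_{1,2}b_{2,0} + a_{1,3}b_{3,0} &
a_{1,0}b_{0,1} + a_{1,1}b_{1,1} + a_{1,2}b_{2,1} + a_{1,3}b_{3,1}
\end{bmatrix}
= C_{0,0}
$$
(b) Multiplying A0,0 × B0,0 to obtain C0,0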
Figure 10.5 Submatrix multiplication.
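Submatrix multiplication treats each block as if it were a single matrix element: C(p,q) is the sum over r of A(p,r) × B(r,q), where each product is an ordinary small matrix multiplication. A sequential sketch, assuming a 4 × 4 matrix divided into 2 × 2 submatrices as in Figure 10.5 (`DIM`, `BLK`, `block_multiply_add`, and `block_matmul` are names invented here):

```c
#include <assert.h>

#define DIM 4   /* matrices are DIM x DIM */
#define BLK 2   /* submatrices are BLK x BLK; BLK must divide DIM */

/* C(p,q) += A(p,r) * B(r,q), where (p,q), (p,r), (r,q) index submatrices. */
void block_multiply_add(int a[DIM][DIM], int b[DIM][DIM], int c[DIM][DIM],
                        int p, int q, int r)
{
    for (int i = 0; i < BLK; i++)
        for (int j = 0; j < BLK; j++)
            for (int k = 0; k < BLK; k++)
                c[p * BLK + i][q * BLK + j] +=
                    a[p * BLK + i][r * BLK + k] * b[r * BLK + k][q * BLK + j];
}

/* c = a * b computed one submatrix product at a time. */
void block_matmul(int a[DIM][DIM], int b[DIM][DIM], int c[DIM][DIM])
{
    int s = DIM / BLK;               /* number of submatrices per row or column */

    assert(DIM % BLK == 0);
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            c[i][j] = 0;
    for (int p = 0; p < s; p++)
        for (int q = 0; q < s; q++)      /* each C(p,q) could go to its own process */
            for (int r = 0; r < s; r++)
                block_multiply_add(a, b, c, p, q, r);
}
```

With p = q = 0 the inner loop over r adds A0,0 × B0,0 and A0,1 × B1,0, the two products that make up C0,0.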
Column j
Row i
b[][j]
a[i][]
Processor Pi,j
c[i][j]
P0
P1
P2
P3
P0
P2
P0
+
c0,0
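[Figure: submatrices App, Apq, Aqp, Aqq of A and Bpp, Bpq, Bqp, Bqq of B, multiplied by processes P0 to P7. The products are summed in pairs: P0 + P1 gives Cpp, P2 + P3 gives Cpq, P4 + P5 gives Cqp, and P6 + P7 gives Cqq.]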
Figure 10.8 Submatrix multiplication and summation.
i
A
Pi,j
B
j
B
i
i places
A
j places
ai,j+i
bi+j,j
j
B
i
A
Pi,j
Figure 10.11 Step 4: one-place shift of elements of A and B.
[Figure: 4 × 4 array of cells accumulating c0,0 to c3,3. The columns of B (b0,j, b1,j, b2,j, b3,j for j = 0 to 3) enter the array by a pumping action.]
[Figure: cells accumulating c0, c1, c2, c3, with b0, b1, b2, b3 entering by a pumping action.]
Column
Row
Row i
aji
Step through
Row j
Already
cleared
to zero
Cleared
to zero
Column i
Figure 10.14 Gaussian elimination.
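The elimination pictured in Figure 10.14 clears column i below the diagonal by subtracting a multiple of row i from each row j beneath it. A sequential sketch of this, followed by back substitution, is given below. It is an illustration only: `NEQ` and `gaussian_solve` are invented names, and the code does no pivoting, so it simply reports failure if it meets a zero on the diagonal.

```c
#include <assert.h>

#define NEQ 3   /* hypothetical number of equations */

/* Solve a x = b in place. Returns 0 on success, -1 on a zero pivot. */
int gaussian_solve(double a[NEQ][NEQ], double b[NEQ], double x[NEQ])
{
    assert(x != b);                                  /* x must be a separate array */

    for (int i = 0; i < NEQ - 1; i++) {              /* row i supplies the pivot */
        if (a[i][i] == 0.0)
            return -1;
        for (int j = i + 1; j < NEQ; j++) {          /* step through each row j below row i */
            double m = a[j][i] / a[i][i];            /* multiplier that clears a[j][i] */
            for (int k = i; k < NEQ; k++)
                a[j][k] -= m * a[i][k];
            b[j] -= m * b[i];
        }
    }
    for (int i = NEQ - 1; i >= 0; i--) {             /* back substitution */
        if (a[i][i] == 0.0)
            return -1;
        double s = b[i];
        for (int k = i + 1; k < NEQ; k++)
            s -= a[i][k] * x[k];
        x[i] = s / a[i][i];
    }
    return 0;
}
```

The loop over j is the part that parallelizes: once row i is available, every row below it can be updated independently.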
Column
Row
n - i + 1 elements
(including b[i])
Row i
Broadcast
ith row
Already
cleared
to zero
Figure 10.15 Broadcast in parallel implementation of Gaussian elimination.
P0
P1
P2
Pn-1
Row
Broadcast
rows
Row
0
P0
n/p
P1
2n/p
P2
3n/p
P3
Figure 10.17 Strip partitioning.
Row
0
n/p
P0
2n/p
P1
3n/p
Solution space
f(x, y)
y
x
[Figure: mesh of points numbered in order up to x100.]
Figure 10.20
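[Figure: sparse matrix equation for the mesh points, with unknowns x = (x1, x2, ..., xN-1, xN) and a right-hand side of zeros. The ith equation (row i) has the nonzero entries ai,i-n = 1, ai,i-1 = 1, ai,i = -4, ai,i+1 = 1, ai,i+n = 1. The outer rows are to include boundary values and some zero entries (see text).]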
Point
computed
Point to be
computed
Red
Black
50°C
40°C
60°C
Origin (0, 0)
i
Picture element
(pixel)
p(i, j)
[Plot: number of pixels against gray level (axis up to 255).]
x0   x1   x2
x3   x4   x5
x6   x7   x8
Step 1: Each pixel adds pixel from left
Step 2: Each pixel adds pixel from right
Step 3: Each pixel adds pixel from above
Step 4: Each pixel adds pixel from below
[Figure: four panels, (a) Step 1, (b) Step 2, (c) Step 3, (d) Step 4, showing partial sums accumulating over the pixels x0 to x8: first pairs such as x0 + x1, x3 + x4, and x6 + x7, then the row sums x0 + x1 + x2, x3 + x4 + x5, and x6 + x7 + x8, which are then combined from above and below.]
Largest
in row
Next largest
in row
Next largest
in column
Mask             Pixels           Result
w0  w1  w2       x0  x1  x2
w3  w4  w5       x3  x4  x5       x4'
w6  w7  w8       x6  x7  x8
Figure 11.7 Using a 3 × 3 weighted mask.
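Applying a 3 × 3 weighted mask replaces the center pixel x4 by x4' = (w0x0 + w1x1 + ... + w8x8) / k, where k is a scale factor. The sketch below applies a mask to every interior pixel of a small image. The names `ROWS`, `COLS`, and `apply_mask`, the use of integer division, and the choice to copy border pixels unchanged are all assumptions made for illustration.

```c
#include <assert.h>

#define ROWS 4   /* hypothetical image height */
#define COLS 4   /* hypothetical image width  */

/* out[i][j] = (sum over the 3 x 3 neighborhood of w * in) / k for interior
 * pixels; border pixels, which lack a full neighborhood, are copied. */
void apply_mask(int in[ROWS][COLS], int out[ROWS][COLS], int w[3][3], int k)
{
    assert(k != 0);
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            if (i == 0 || j == 0 || i == ROWS - 1 || j == COLS - 1) {
                out[i][j] = in[i][j];
                continue;
            }
            int sum = 0;
            for (int di = -1; di <= 1; di++)          /* w0..w8 against x0..x8 */
                for (int dj = -1; dj <= 1; dj++)
                    sum += w[di + 1][dj + 1] * in[i + di][j + dj];
            out[i][j] = sum / k;
        }
}
```

With all nine weights equal to 1 and k = 9 the mask computes the mean of each neighborhood. Every output pixel depends only on input pixels, so the pixels can be divided among processes with only the partition edges exchanged.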
k=
1
9
Figure 11.8
k=
1
16
1
k=
9
Intensity transition
First derivative
Second derivative
Image
y
Constant
intensity
f(x, y)
Gradient
                     x1 (upper pixel)
x3 (left pixel)      x4                  x5 (right pixel)
                     x7 (lower pixel)
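(a) (x, y) plane: the line y = ax + b passes through the pixel in the image at (x1, y1).
(b) Parameter space: that pixel maps to the line b = -x1a + y1 (in general, b = -xa + y), and the image line corresponds to the single point (a, b).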
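The mapping from the (x, y) plane to parameter space is the basis of the Hough transform: every edge pixel votes for all (a, b) pairs satisfying b = -xa + y, and a cell of the accumulator that collects many votes indicates a line in the image. A small sketch with integer slopes and intercepts follows; the ranges `A_MIN` to `A_MAX` and `B_MIN` to `B_MAX` and the name `hough_lines` are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <string.h>

#define A_MIN (-4)    /* hypothetical range of slopes a */
#define A_MAX 4
#define B_MIN (-16)   /* hypothetical range of intercepts b */
#define B_MAX 16
#define A_BINS (A_MAX - A_MIN + 1)
#define B_BINS (B_MAX - B_MIN + 1)

/* For each pixel (x, y) and each candidate slope a, increment the
 * accumulator cell for (a, b) with b = -x*a + y. */
void hough_lines(const int xs[], const int ys[], int npoints,
                 int acc[A_BINS][B_BINS])
{
    assert(npoints >= 0);
    memset(acc, 0, sizeof(int) * A_BINS * B_BINS);
    for (int p = 0; p < npoints; p++)
        for (int a = A_MIN; a <= A_MAX; a++) {
            int b = -xs[p] * a + ys[p];
            if (b >= B_MIN && b <= B_MAX)
                acc[a - A_MIN][b - B_MIN]++;
        }
}
```

Three pixels on the line y = 2x + 1 all vote for the cell a = 2, b = 1, so that cell reaches a count of 3. The (a, b) form cannot represent vertical lines, which is one reason the normal form r = x cos θ + y sin θ is commonly used instead.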
(a) (x, y) plane: the line y = ax + b, written in normal form as r = x cos θ + y sin θ, is identified by the pair (r, θ).
[Figure: accumulator array; one axis is marked 0, 5, 10, 15 and the other 0, 10, 20, 30]
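The voting step behind an accumulator like this can be sketched in a few lines. This is only an illustration under stated assumptions, not the book's code: the function name `hough_accumulate`, the one-degree angle step, and the unit-width bins for r are all choices made here. Each edge pixel votes once per angle using the normal form $r = x\cos\theta + y\sin\theta$.

```python
import math
from collections import Counter

def hough_accumulate(points, n_theta=180, r_step=1.0):
    """Return a Counter mapping (r_bin, theta_index) to a vote count.

    theta_index t stands for the angle pi * t / n_theta, so with the
    default n_theta=180 the index is the angle in degrees.
    """
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            r = x * math.cos(theta) + y * math.sin(theta)
            acc[(round(r / r_step), t)] += 1  # one vote per (pixel, angle)
    return acc

# Ten pixels on the line y = x, which has r = 0 at theta = 135 degrees.
pixels = [(10 * i, 10 * i) for i in range(10)]
acc = hough_accumulate(pixels)
peak, votes = acc.most_common(1)[0]
print(peak, votes)  # (0, 135) 10
```

Because every pixel votes independently, the pixels can be divided among processes, with the partial accumulators summed at the end.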
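[Figure: two-dimensional transform done in two passes: transform the rows ($x_{jk}$ to $X_{jm}$), then transform the columns ($X_{jm}$ to $X_{lm}$)]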
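[Figure: (a) Direct convolution: image $f_{j,k}$ convolved with filter $h_{j,k}$ gives $g_{j,k}$. Second panel, using the transform: the image $f(j, k)$ and filter $h(j, k)$ are transformed to $F(j, k)$ and $H(j, k)$, multiplied to give $G(j, k)$, and inverse transformed to give $g(j, k)$.]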
[Figure: master process sending $w^0, w^1, \ldots, w^{n-1}$ to slave processes, which produce $X[0], X[1], \ldots, X[n-1]$]
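[Figure: one pipeline stage. Process j holds $x[j]$ and receives $X[k]$, $a$, and $w^k$; it passes on $X[k] + a \times x[j]$, $a \times w^k$, and $w^k$ as the values for the next iteration.]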
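[Figure: first panel: inputs $x[0], x[1], x[2], x[3], \ldots, x[N-1]$ feed pipeline processes $P_0, P_1, P_2, P_3, \ldots, P_{N-1}$; $X[k]$ enters as 0 and $a$ as 1, together with $w^k$, and the output sequence is $X[0], X[1], X[2], X[3], \ldots$ (b) Timing diagram: pipeline stages $P_0, P_1, P_2, \ldots, P_{N-2}, P_{N-1}$ plotted against time.]
Figure 11.27 Discrete Fourier transform with a pipeline.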
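[Figure: the input sequence $x_0, x_1, x_2, x_3, \ldots, x_{N-2}, x_{N-1}$ is split between two N/2-point DFTs that produce $X_{even}$ and $X_{odd}$; these are combined using $w^k$ to give the transform outputs $X_k$ and $X_{k+N/2}$ for $k = 0, 1, \ldots, N/2 - 1$]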
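The decomposition into two N/2-point DFTs can be sketched recursively. This is a minimal illustration rather than the book's code, and it rests on assumptions not stated on the slide: the function name `fft`, the sign convention $w = e^{-2\pi i/N}$, no 1/N scaling, and an input length that is a power of two.

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    x_even = fft(x[0::2])  # N/2-point DFT of the even-indexed inputs
    x_odd = fft(x[1::2])   # N/2-point DFT of the odd-indexed inputs
    out = [0j] * n
    for k in range(n // 2):
        w_k = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor w^k
        out[k] = x_even[k] + w_k * x_odd[k]
        out[k + n // 2] = x_even[k] - w_k * x_odd[k]
    return out

# A unit impulse transforms to a constant spectrum.
print(fft([1, 0, 0, 0]))
```

The two recursive calls are independent of each other, which is what makes the decomposition attractive for parallel execution.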
[Figure: four-point transform with inputs $x_0, x_1, x_2, x_3$ and outputs $X_0, X_1, X_2, X_3$]
$X_k = (0,2,4,6,8,10,12,14) + w^k(1,3,5,7,9,11,13,15)$
$= \{(0,4,8,12) + w^k(2,6,10,14)\} + w^k\{(1,5,9,13) + w^k(3,7,11,15)\}$
$= \{[(0,8) + w^k(4,12)] + w^k[(2,10) + w^k(6,14)]\} + w^k\{[(1,9) + w^k(5,13)] + w^k[(3,11) + w^k(7,15)]\}$
Inputs in the resulting (bit-reversed) order, with the binary form of each index:
x0   x8   x4   x12  x2   x10  x6   x14  x1   x9   x5   x13  x3   x11  x7   x15
0000 1000 0100 1100 0010 1010 0110 1110 0001 1001 0101 1101 0011 1011 0111 1111
Figure 11.30 Sixteen-point DFT decomposition.
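The input ordering x0, x8, x4, x12, ... in Figure 11.30 is what results from reversing the bits of each position's four-bit index. A short sketch, with `bit_reverse` as a hypothetical helper name chosen here:

```python
def bit_reverse(i, bits):
    """Return i with its lowest `bits` bits in reverse order."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # move the lowest bit of i onto the end of r
        i >>= 1
    return r

order = [bit_reverse(i, 4) for i in range(16)]
print(order)
# [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```

Applying the function twice returns the original index, so the same permutation maps the reordered inputs back to natural order.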
[Figure: flow diagram with inputs $x_0$ to $x_{15}$ on the left and outputs $X_0$ to $X_{15}$ on the right]
Figure 11.31 Sixteen-point FFT computational flow.
[Figure: sixteen rows, each labeled with a four-bit binary number under the heading "Process / Row" (P/r), from 0000 (input $x_0$, output $X_0$) through 1111 (input $x_{15}$, output $X_{15}$); the rows are mapped onto processes P0, P1, P2, and P3]
Figure 11.32 Mapping processors onto 16-point FFT computation.
[Two figures with the same surviving labels: inputs $x_0$ to $x_{15}$, in index order, distributed across processes P0, P1, P2, and P3]
[Figure: inputs distributed across processes P0, P1, P2, and P3 in the order x0, x4, x8, x12, x1, x5, x9, x13, x2, x6, x10, x14, x3, x7, x11, x15]
[Figure: mask; surviving labels are 7, 6, 5, 4, 3, 2]
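[Figure: tree of successive choices (first choice, second choice, third choice, ...); the branches are labeled $C_0$ or "not including $C_0$", $C_1$ or "not including $C_1$", through $C_{n-1}$ or "not including $C_{n-1}$"]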
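[Figure: Parent A and Parent B, each of length m, are cut between positions p and p+1 into parts A1, A2 and B1, B2. The two children, labeled Child 1 and Child 2, are formed by exchanging the parts after the cut: A1 followed by B2, and B1 followed by A2.]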
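The exchange of parts between the two parents can be written as a single-point crossover. The sketch below is illustrative only: the function name `single_point_crossover` and the representation of parents as equal-length strings are assumptions made here, and which result the figure calls Child 1 or Child 2 is not recoverable from the labels.

```python
def single_point_crossover(parent_a, parent_b, p):
    """Cut both parents after position p (1-based) and swap the tails."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length m")
    child_1 = parent_a[:p] + parent_b[p:]  # A1 followed by B2
    child_2 = parent_b[:p] + parent_a[p:]  # B1 followed by A2
    return child_1, child_2

print(single_point_crossover("AAAAAA", "BBBBBB", 2))
# ('AABBBB', 'BBAAAA')
```

In a genetic algorithm the cut point p would normally be chosen at random for each pair of parents.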
[Figure: subpopulations (islands) joined by migration paths; every island sends to every other island]
[Figure: island subpopulations]
[Figure: a single program issues instructions, under a common clock, to processors with local memory; the processors exchange data with a shared memory]
Figure D.1 PRAM model.
[Figure: list of eight elements, each with fields d[i] and s[i] for i = 0 to 7; the surviving d values are 1 for elements 0, 1, 4, 5, and 6 and 0 for element 7, which ends the list with Null]
Figure D.2
[Figure: threads or processes perform local computation (maximum time w), then communication (a maximum of h sends or receives), then barrier synchronization]
Figure D.3
[Figure: processors $P_i$ and $P_k$ plotted against time, showing a message and the next message]
Figure D.4 LogP parameters.