L8 Parallel Algorithms
Parallel algorithms
TOPICS:
Sorting
Merging
List ranking in PRAMs and
Applications.
What is a parallel approach?
Imagine you needed to find a lost child in the woods. Even in a small area, searching by yourself would be very time-consuming. If you gathered some friends and family to help you, you could cover the woods much faster.
Parallel Architectures
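Static
Processors are hard-wired to one another by fixed point-to-point links; in a completely connected network, each processor is hard-wired to every other processor.
[Figure: static networks: completely connected, star-connected, bounded-degree (degree 4)]
Dynamic
Processors are connected to a series of switches.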
Parallel Algorithms
A parallel algorithm is an algorithm that has been
specifically written for execution on a computer with
two or more processing units.
Can also run on single-processor computers that have internal parallelism
(multiple functional units, pipelined functional units, pipelined memory
systems).
When designing a parallel algorithm, take into account the cost of
communication and the number of processors (efficiency).
Parallel Algorithms
Parallel: perform more than one operation at a time.
PRAM model: Parallel Random Access Machine.
Example: Finding the largest key in an array
Array (8 keys): 3 12 7 6 8 11 19 13
Step 1 (P1 to P4): each processor reads a pair of keys (first, second), then writes the larger key into the first cell of its pair:
12 12 7 6 11 11 19 13
Step 2 (P1, P3): P1 reads 12 and 7, P3 reads 11 and 19; each writes the larger key into its first cell:
12 12 7 6 19 11 19 13
Step 3 (P1): P1 reads 12 and 19 and writes 19:
19 12 7 6 19 11 19 13
The largest key, 19, is now in the first cell.
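The rounds above can be sketched as a small Python simulation. In each round the active processors first read their two cells (all reads before any write), then write the larger key into the first cell of their pair, so every write goes to a distinct cell. The function name `pram_max` is illustrative, and the sketch assumes a non-empty list:

```python
def pram_max(keys):
    """Simulate the PRAM tournament for the largest key.

    Returns the maximum and a snapshot of the array after each round.
    Assumes keys is non-empty.
    """
    a = list(keys)
    n = len(a)
    history = []
    stride = 1
    while stride < n:
        # Read phase: every active processor reads its pair before anyone writes.
        reads = [(i, a[i], a[i + stride]) for i in range(0, n - stride, 2 * stride)]
        # Write phase: each processor writes only to its own cell (exclusive write).
        for i, first, second in reads:
            a[i] = max(first, second)
        history.append(list(a))
        stride *= 2
    return a[0], history


largest, steps = pram_max([3, 12, 7, 6, 8, 11, 19, 13])
print(largest, len(steps))  # 19 3
```

For 8 keys the loop runs log2 8 = 3 rounds, and the three snapshots match the array after Steps 1 to 3.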
Example: Merge Sort
Input: 8 1 4 5 2 7 3 6
P1 to P4 each merge one pair: 1 8 | 4 5 | 2 7 | 3 6
P1 and P2 each merge two runs: 1 4 5 8 | 2 3 6 7
P1 merges the last two runs: 1 2 3 4 5 6 7 8
Merge Sort Analysis
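Number of comparisons on the critical path (merges at the same level run in parallel, and at level i merging two runs of length $2^{i-1}$ takes at most $2^i - 1$ comparisons):
$$1 + 3 + \cdots + (2^i - 1) + \cdots + (n - 1) = \sum_{i=1}^{\log_2 n} (2^i - 1) = 2n - 2 - \log_2 n < 2n - \log_2 n = \Theta(n)$$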
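A sequential Python simulation of the bottom-up scheme can check this count. It assumes each merge at a level is done by its own processor, so a level costs as much as its slowest merge; the names `merge` and `parallel_merge_sort` are illustrative:

```python
def merge(left, right):
    """Sequential two-way merge; returns the merged run and the comparisons used."""
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out, comparisons


def parallel_merge_sort(keys):
    """Bottom-up merge sort in which all merges of one level run in parallel.

    Returns the sorted list and the parallel comparison time: for each level,
    the comparison count of the slowest merge, summed over all levels.
    """
    runs = [[k] for k in keys]
    if not runs:
        return [], 0
    parallel_time = 0
    while len(runs) > 1:
        results = [merge(runs[p], runs[p + 1]) for p in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:  # an odd run out is carried to the next level unchanged
            results.append((runs[-1], 0))
        runs = [run for run, _ in results]
        parallel_time += max(c for _, c in results)
    return runs[0], parallel_time


print(parallel_merge_sort([8, 1, 4, 5, 2, 7, 3, 6]))  # ([1, 2, 3, 4, 5, 6, 7, 8], 11)
```

For the example input the levels cost 1 + 3 + 7 = 11 = 2·8 - 2 - log2 8 comparisons, which is the worst case of the sum above.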
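Unsorted list {4, 2, 1, 3}
Processor Pij compares the i-th key with the j-th key and outputs 1 if the j-th key is smaller, else 0. Summing row i gives the rank of the i-th key, i.e. its final (0-based) position in the sorted list:
P11(4,4) P12(4,2) P13(4,1) P14(4,3): 0+1+1+1 = 3
P31(1,4) P32(1,2) P33(1,1) P34(1,3): 0+0+0+0 = 0
Row for key 3: 0+1+1+0 = 2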
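The comparison grid can be sketched in Python as a sequential simulation of n² independent comparisons followed by row sums. Ties are broken by position, which is an assumption added here (the slide uses distinct keys) so that every key gets a distinct rank and the final writes are exclusive. The name `rank_sort` is illustrative:

```python
def rank_sort(keys):
    """Sort by ranks computed from an n-by-n grid of independent comparisons.

    P[i][j] outputs 1 if key j must precede key i; ties are broken by
    position so every key gets a distinct rank (an assumption added here).
    """
    n = len(keys)
    # All n*n comparisons are independent: on n*n processors this is one step.
    votes = [
        [1 if keys[j] < keys[i] or (keys[j] == keys[i] and j < i) else 0 for j in range(n)]
        for i in range(n)
    ]
    ranks = [sum(row) for row in votes]  # one sum per row
    out = [None] * n
    for i, r in enumerate(ranks):
        out[r] = keys[i]  # distinct ranks, so every write targets its own cell
    return ranks, out


print(rank_sort([4, 2, 1, 3]))  # ([3, 1, 0, 2], [1, 2, 3, 4])
```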
Applications
Computer Graphics Processing
Video Encoding
Accurate weather forecasting
Potential Speedup
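The best sequential comparison sort takes O(n log n) time, so with n processors the optimal parallel time complexity is
$$\frac{O(n \log n)}{n} = O(\log n)$$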
Odd-Even Transposition Sort - example
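Each PE gets n/p numbers. First, the PEs sort their n/p numbers locally; then they run the odd-even transposition algorithm, each time doing a merge-split on 2n/p numbers (two neighboring PEs merge their blocks; the lower-numbered PE keeps the smaller half and the higher-numbered PE keeps the larger half).
Example with n = 12 numbers on p = 4 PEs (P0, P1, P2, P3):
             P0         P1         P2         P3
Initial      13 7 12  | 8 5 4    | 6 1 3    | 9 2 10
Local sort   7 12 13  | 4 5 8    | 1 3 6    | 2 9 10
O-E          4 5 7    | 8 12 13  | 1 2 3    | 6 9 10
E-O          4 5 7    | 1 2 3    | 8 12 13  | 6 9 10
O-E          1 2 3    | 4 5 7    | 6 8 9    | 10 12 13
E-O          1 2 3    | 4 5 6    | 7 8 9    | 10 12 13
SORTED: 1 2 3 4 5 6 7 8 9 10 12 13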
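The block version can be sketched in Python with p simulated PEs. This assumes n is a multiple of p and runs p exchange phases, matching the four phases of the example; the first phase pairs (P0,P1), (P2,P3) as the first O-E step above does. The function name is illustrative, and `heapq.merge` from the standard library does the linear merge of two sorted blocks:

```python
import heapq


def odd_even_merge_split_sort(data, p):
    """Block odd-even transposition sort on p simulated PEs.

    Assumes len(data) is a multiple of p, so each PE holds k = n/p keys.
    """
    k = len(data) // p
    # Local sort: each PE sorts its own n/p keys.
    blocks = [sorted(data[i * k:(i + 1) * k]) for i in range(p)]
    for phase in range(p):
        # Even phases pair (P0,P1), (P2,P3), ...; odd phases pair (P1,P2), (P3,P4), ...
        for i in range(phase % 2, p - 1, 2):
            # Merge-split of 2n/p keys: the lower PE keeps the smaller half.
            merged = list(heapq.merge(blocks[i], blocks[i + 1]))
            blocks[i], blocks[i + 1] = merged[:k], merged[k:]
    return [x for block in blocks for x in block]


data = [13, 7, 12, 8, 5, 4, 6, 1, 3, 9, 2, 10]
print(odd_even_merge_split_sort(data, 4))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13]
```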
Time complexity: Tpar = (local sort) + (p merge-splits) + (p exchanges)
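As a rough worked form of this sum, assuming a local comparison sort of n/p keys, merge-splits that cost O(2n/p) comparisons each, and exchanges that move n/p keys each (these unit costs are assumptions, not taken from the slide):

$$T_{par} = O\!\left(\frac{n}{p}\log\frac{n}{p}\right) + p \cdot O\!\left(\frac{2n}{p}\right) + p \cdot O\!\left(\frac{n}{p}\right) = O\!\left(\frac{n}{p}\log\frac{n}{p} + n\right)$$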
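Parallel:
$$T_{par} = 2\left(\frac{n}{2^0} + \frac{n}{2^1} + \frac{n}{2^2} + \cdots + \frac{n}{2^k}\right) = 2n\left(2^0 + 2^{-1} + 2^{-2} + \cdots + 2^{-\log n}\right) < 4n, \qquad k = \log n$$
$$T_{par} = O(4n) = O(n)$$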
Bitonic Merge sort
Bitonic Sequence
A bitonic sequence is defined as a list with no more than one
LOCAL MAXIMUM and no more than one LOCAL MINIMUM.
(Endpoints must be considered: the sequence wraps around.)
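One way to test this definition in Python, assuming numeric keys: walk the sequence cyclically (so the last element is compared with the first), skip equal neighbors, and count direction changes. At most one local maximum and one local minimum means at most two changes. The name `is_bitonic` is illustrative:

```python
def is_bitonic(seq):
    """Return True if seq has at most one local maximum and one local minimum,
    treating the sequence as circular (the last element wraps to the first)."""
    n = len(seq)
    # Direction of each step, including the wraparound step; equal neighbors are skipped.
    ups = [seq[(i + 1) % n] > seq[i] for i in range(n) if seq[(i + 1) % n] != seq[i]]
    # Count direction changes around the circle; ups[-1] closes the loop.
    changes = sum(1 for i in range(len(ups)) if ups[i] != ups[i - 1])
    return changes <= 2


print(is_bitonic([24, 20, 15, 9, 4, 2, 5, 8, 10, 11, 12, 13, 22, 30, 32, 45]))  # True
print(is_bitonic([1, 3, 2, 4]))  # False
```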
Binary Split
1. Divide the bitonic list into two equal halves.
2. Compare-Exchange each item on the first half
with the corresponding item in the second half.
Result:
Two bitonic sequences where the numbers in one sequence are all less
than the numbers in the other sequence.
Repeated application of binary split
Bitonic list:
24 20 15 9 4 2 5 8 | 10 11 12 13 22 30 32 45
After split 1: 10 11 12 9 4 2 5 8 | 24 20 15 13 22 30 32 45
After split 2: 4 2 5 8 | 10 11 12 9 | 22 20 15 13 | 24 30 32 45
After split 3: 4 2 | 5 8 | 10 9 | 12 11 | 15 13 | 22 20 | 24 30 | 32 45
After split 4: 2 4 5 8 9 10 11 12 13 15 20 22 24 30 32 45
Sorting a bitonic sequence
Compare-and-exchange moves the smaller number of each pair to the left
and the larger number of the pair to the right.
Given a bitonic sequence,
recursively performing ‘binary split’ will sort the list.
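The recursion can be sketched in Python, assuming the length is a power of two. `binary_split` performs one compare-exchange pass between the two halves (one parallel step on n/2 processors), and `sort_bitonic` recurses on both halves; both names are illustrative:

```python
def binary_split(seq):
    """Compare-exchange item i of the first half with item i of the second half."""
    half = len(seq) // 2
    low, high = list(seq[:half]), list(seq[half:])
    for i in range(half):  # the compare-exchanges are independent: one parallel step
        if low[i] > high[i]:
            low[i], high[i] = high[i], low[i]
    return low, high


def sort_bitonic(seq):
    """Sort a bitonic sequence whose length is a power of two by recursive binary splits."""
    if len(seq) <= 1:
        return list(seq)
    low, high = binary_split(seq)
    return sort_bitonic(low) + sort_bitonic(high)


bitonic = [24, 20, 15, 9, 4, 2, 5, 8, 10, 11, 12, 13, 22, 30, 32, 45]
print(binary_split(bitonic))
# ([10, 11, 12, 9, 4, 2, 5, 8], [24, 20, 15, 13, 22, 30, 32, 45])
print(sort_bitonic(bitonic))
# [2, 4, 5, 8, 9, 10, 11, 12, 13, 15, 20, 22, 24, 30, 32, 45]
```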
Sorting an arbitrary sequence
To sort an unordered sequence, sequences are merged into larger
bitonic sequences, starting with pairs of adjacent numbers.
$$T_{par}^{bitonic} = \sum_{i=1}^{\log n} i = \frac{\log n\,(\log n + 1)}{2} = O(\log^2 n)$$
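A recursive Python sketch of the whole procedure, assuming the length is a power of two: sort one half ascending and the other descending so their concatenation is bitonic, then merge it with binary splits. `parallel_steps` counts compare-exchange steps under the assumption that both half-size sorts run in parallel, reproducing the formula above. All names are illustrative:

```python
def bitonic_merge(seq, ascending):
    """Sort a bitonic sequence (power-of-two length) in the given direction."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    a = list(seq)
    for i in range(half):  # one parallel compare-exchange step
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)


def bitonic_sort(seq, ascending=True):
    """Sort any sequence whose length is a power of two."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    # Sorting one half up and the other down makes their concatenation bitonic.
    first = bitonic_sort(seq[:half], True)
    second = bitonic_sort(seq[half:], False)
    return bitonic_merge(first + second, ascending)


def parallel_steps(n):
    """Compare-exchange steps for n = 2**k keys: the final merge needs k steps
    and the two half-size sorts run in parallel, giving k(k + 1)/2 in total."""
    return 0 if n <= 1 else parallel_steps(n // 2) + (n.bit_length() - 1)


print(bitonic_sort([8, 1, 4, 5, 2, 7, 3, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(parallel_steps(16))  # 10
```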
[Figure: bitonic sort for N >> P]
Parallel sorting - summary
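List ranking example: the list order is 3 -> 4 -> 6 -> 1 -> 0 -> 5. Each row gives d[i] for the objects in list order, from the initial values (a) to the final ranks (d):
Object:  3  4  6  1  0  5
(a)      1  1  1  1  1  0
(b)      2  2  2  2  1  0
(c)      4  4  3  2  1  0
(d)      5  4  3  2  1  0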
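The table can be reproduced with a short Python simulation of the update loop d[i] ← d[i] + d[next[i]], next[i] ← next[next[i]] (often called pointer jumping). Each synchronous step computes all new values from the old ones before assigning them, mirroring the rule that reads precede writes. The dictionary interface (`succ` maps each object to its successor, `None` at the tail) and the name `list_rank` are assumptions for illustration:

```python
def list_rank(succ):
    """Simulate the EREW list-ranking loop with synchronous parallel steps.

    succ maps each object to its successor (None at the tail).
    Returns the final d values and a snapshot of d after every step.
    """
    nxt = dict(succ)
    d = {i: 0 if nxt[i] is None else 1 for i in nxt}  # initialization
    history = [dict(d)]
    while any(p is not None for p in nxt.values()):
        # All reads use the old d and next values; all writes happen afterwards.
        new_d = {i: (d[i] + d[p]) if p is not None else d[i] for i, p in nxt.items()}
        new_nxt = {i: (nxt[p] if p is not None else None) for i, p in nxt.items()}
        d, nxt = new_d, new_nxt
        history.append(dict(d))
    return d, history


succ = {3: 4, 4: 6, 6: 1, 1: 0, 0: 5, 5: None}
order = [3, 4, 6, 1, 0, 5]
d, history = list_rank(succ)
for row in history:  # prints rows (a) to (d) of the table
    print([row[i] for i in order])
```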
List ranking: correctness of the EREW algorithm
Loop invariant: for each i, the sum of the d values in the sublist headed by i is the
correct distance from i to the end of the original list L.
Parallel memory must be synchronized: in the updates d[i] ← d[i] + d[next[i]] and
next[i] ← next[next[i]], all reads on the right-hand side must occur before any write
on the left-hand side. Moreover, each processor reads d[i] first and then d[next[i]].
An EREW algorithm: every read and write is exclusive. For an object i, its own
processor reads d[i] first; then the processor of its predecessor j reads the same
cell as d[next[j]]. Writes all go to distinct locations.
List ranking EREW algorithm: running time O(log n)
The initialization for loop runs in O(1).
Each iteration of the while loop runs in O(1).
There are exactly ⌈log2 n⌉ iterations: each iteration transforms each list into two
interleaved lists, one consisting of the objects in even positions and the other of
the objects in odd positions. Thus, each iteration doubles the number of lists but
halves their lengths.
The termination test in line 5 runs in O(1).
Define work = (#processors) × (running time). With n processors, the work is
O(n log n).
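A worked check of the work bound, assuming one processor per list object and ⌈log2 n⌉ iterations of O(1) time each. Since a sequential program can rank the list in Θ(n) time by walking it once, this EREW algorithm does a factor of log n more work than necessary:

$$W(n) = n \cdot O(\lceil \log_2 n \rceil) = O(n \log n), \qquad \text{e.g. } n = 6:\; 6 \cdot \lceil \log_2 6 \rceil = 6 \cdot 3 = 18 \text{ processor-steps}$$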
Sorting on Specific Networks