Parallel Distributed Computing Unit-4
Sorting Algorithms
A. Grama, A. Gupta, G. Karypis, and V. Kumar
• Quicksort
Sorting: Basics
Figure: a compare-split operation between processes Pi and Pj.
Step 1: Pi holds 1 6 8 11 13 and Pj holds 2 7 9 10 12; each sends its block to the other.
Steps 2–4: both processes merge the two blocks into 1 2 6 7 8 9 10 11 12 13; Pi keeps the smaller half (1 2 6 7 8) and Pj keeps the larger half (9 10 11 12 13).
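The compare-split primitive illustrated above can be sketched serially in Python; in an actual message-passing program each process would send its block to its partner, merge locally, and discard the unwanted half.

```python
def compare_split(low_block, high_block):
    """Compare-split of two sorted blocks: the 'low' process keeps
    the smaller half of the merged data, the 'high' process keeps
    the larger half. Serial simulation of the exchange."""
    merged = sorted(low_block + high_block)
    k = len(low_block)
    return merged[:k], merged[k:]
```

With the figure's data, `compare_split([1, 6, 8, 11, 13], [2, 7, 9, 10, 12])` yields `([1, 2, 6, 7, 8], [9, 10, 11, 12, 13])`, matching steps 2–4.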
procedure ODD-EVEN(n)
begin
    for i := 1 to n do
    begin
        if i is odd then
            for j := 0 to n/2 − 1 do
                compare-exchange(a[2j+1], a[2j+2]);
        if i is even then
            for j := 1 to n/2 − 1 do
                compare-exchange(a[2j], a[2j+1]);
    end for
end ODD-EVEN
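The pseudocode above translates directly into a runnable serial sketch (0-based indexing in place of the pseudocode's 1-based indexing):

```python
def odd_even_sort(a):
    """Odd-even transposition sort: n phases of pairwise
    compare-exchanges, alternating odd and even pairs."""
    n = len(a)
    for i in range(1, n + 1):
        # Odd phases compare (a[0],a[1]), (a[2],a[3]), ...;
        # even phases compare (a[1],a[2]), (a[3],a[4]), ...
        start = 0 if i % 2 == 1 else 1
        for j in range(start, n - 1, 2):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
```

Running it on the example array below, `odd_even_sort([3, 2, 3, 8, 5, 6, 4, 1])` returns `[1, 2, 3, 3, 4, 5, 6, 8]`.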
Initial:         3 2 3 8 5 6 4 1
Phase 1 (odd):   2 3 3 8 5 6 1 4
Phase 2 (even):  2 3 3 5 8 1 6 4
Phase 3 (odd):   2 3 3 5 1 8 4 6
Phase 4 (even):  2 3 3 1 5 4 8 6
Phase 5 (odd):   2 3 1 3 4 5 6 8
Phase 6 (even):  2 1 3 3 4 5 6 8
Phase 7 (odd):   1 2 3 3 4 5 6 8
Phase 8 (even):  1 2 3 3 4 5 6 8  (sorted)
• This formulation is cost-optimal with respect to the base serial algorithm (odd-even transposition, which performs Θ(n²) work), but not with respect to the optimal serial sorting algorithms, which run in Θ(n log n) time.
Parallel Odd-Even Transposition
TP = Θ((n/p) log(n/p)) + Θ(n) + Θ(n),

where the three terms are, respectively, the local sort, the comparisons, and the communication.
Shellsort
• During the first phase, processes that are far away from each
other in the array compare-split their elements.
Figure: serial quicksort. At each step a pivot is selected and moved to its final position:
(b) 1 2 3 5 8 4 3 7
(c) 1 2 3 3 4 5 8 7
(d) 1 2 3 3 4 5 7 8
(e) 1 2 3 3 4 5 7 8  (sorted)
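A compact runnable version of the serial algorithm sketched in the figure (a minimal Python sketch; the pivot choice and list-comprehension partitioning are one common convention, not the only one):

```python
def quicksort(a):
    """Serial quicksort: pick a pivot, partition the remaining
    elements into smaller and larger sublists, recurse on each."""
    if len(a) <= 1:
        return a
    pivot = a[0]
    smaller = [x for x in a[1:] if x <= pivot]  # elements <= pivot
    larger = [x for x in a[1:] if x > pivot]    # elements > pivot
    return quicksort(smaller) + [pivot] + quicksort(larger)
```

For the figure's data, `quicksort([1, 2, 3, 5, 8, 4, 3, 7])` returns `[1, 2, 3, 3, 4, 5, 7, 8]`.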
• In the best case, the pivot divides the list in such a way that the
larger of the two lists does not have more than αn elements (for
some constant α).
• Each processor partitions its list into two, say Si and Li, based
on the selected pivot.
• All of the Si lists are merged and all of the Li lists are merged
separately.
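These two steps can be sketched serially (Python; the function names `partition_local` and `global_split` are illustrative, not from the source):

```python
def partition_local(local_list, pivot):
    """Each process splits its local list around the broadcast pivot."""
    S_i = [x for x in local_list if x <= pivot]  # elements <= pivot
    L_i = [x for x in local_list if x > pivot]   # elements > pivot
    return S_i, L_i

def global_split(lists, pivot):
    """Concatenate all S_i into S and all L_i into L, in process order."""
    parts = [partition_local(lst, pivot) for lst in lists]
    S = [x for S_i, _ in parts for x in S_i]
    L = [x for _, L_i in parts for x in L_i]
    return S, L
```

Using the running example's initial distribution with pivot 7, `global_split` produces S = 7 2 1 6 3 4 5 and L = 18 13 17 14 20 10 15 9 19 16 12 11 8, matching the "after global rearrangement" row.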
First step
  after local rearrangement:   P0: 7 2 18 13 | P1: 1 17 14 20 | P2: 6 10 15 9 | P3: 3 4 19 16 | P4: 5 12 11 8
  after global rearrangement:  7 2 1 6 3 4 5 | 18 13 17 14 20 10 15 9 19 16 12 11 8
  pivot selection:             pivot = 5 (left part), pivot = 17 (right part)

Second step
  after local rearrangement:   1 2 7 6 3 4 5 | 14 13 17 18 20 10 15 9 19 16 12 11 8
  after global rearrangement:  1 2 3 4 5 | 7 6 | 14 13 17 10 15 9 16 12 11 8 | 18 20 19
  pivot selection:             pivot = 11 (for the part spanning P2 and P3)

Third step
  after local rearrangement:   1 2 3 4 5 6 7 | 10 13 17 14 15 9 8 12 11 | 16 18 19 20
  after global rearrangement:  P2: 10 9 8 12 11 | P3: 13 17 14 15 16

Fourth step
  after local rearrangement:   P2 and P3 each sort their local lists.

Solution: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Parallelizing Quicksort: Shared Address Space Formulation
• How to globally merge the local lists (S0, L0, S1, L1, . . .) to form
S and L?
after local rearrangement:  P0: 7 2 18 13 | P1: 1 17 14 20 | P2: 6 10 15 9 | P3: 3 4 19 16 | P4: 5 12 11 8
|Si| = 2, 1, 1, 2, 1  →  prefix sums 0, 2, 3, 4, 6, 7
|Li| = 2, 3, 3, 2, 3  →  prefix sums 0, 2, 5, 8, 10, 13
after global rearrangement (global indices 0–19):
7 2 1 6 3 4 5 | 18 13 17 14 20 10 15 9 19 16 12 11 8
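The prefix sums over the |Si| and |Li| give each process the offset at which to write its elements into the global array; a serial sketch of this scheme (Python, using `itertools.accumulate` for the scan; in a real shared-address-space program the scan and writes would run in parallel):

```python
from itertools import accumulate

def merge_with_prefix_sums(local_lists, pivot):
    """Compute write offsets for each process's S_i and L_i via
    prefix sums, then scatter the elements into the global array."""
    parts = [([x for x in lst if x <= pivot], [x for x in lst if x > pivot])
             for lst in local_lists]
    s_off = [0] + list(accumulate(len(s) for s, _ in parts))
    l_off = [0] + list(accumulate(len(l) for _, l in parts))
    total_s = s_off[-1]                      # size of S; L starts here
    out = [None] * (total_s + l_off[-1])
    for i, (s, l) in enumerate(parts):
        out[s_off[i]:s_off[i] + len(s)] = s              # write S_i
        base = total_s + l_off[i]
        out[base:base + len(l)] = l                      # write L_i
    return out
```

With the example distribution and pivot 7, the offsets come out as 0 2 3 4 6 7 for the Si and 0 2 5 8 10 13 for the Li, reproducing the "after global rearrangement" row.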
• The parallel time depends on the split and merge time, and the
quality of the pivot.
• The first step takes time Θ(log p), the second, Θ(n/p), the third,
Θ(log p), and the fourth, Θ(n/p).
• The process recurses until there are p lists, at which point, the
lists are sorted locally.
TP = Θ((n/p) log(n/p)) + Θ((n/p) log p) + Θ(log² p).   (2)

The first term is the local sort and the second the array splits; the Θ(log² p) term accounts for the Θ(log p) pivot-selection and prefix-sum steps performed in each of the Θ(log p) levels of recursion.
Parallelizing Quicksort: Message Passing Formulation
• Each processor splits its local list into two lists: Si, holding the elements smaller than the pivot, and Li, holding those larger than the pivot.
• A processor in the low half of the machine sends its list Li to the
paired processor in the other half. The paired processor sends
its list Si.
• It is easy to see that after this step, all elements less than the
pivot are in the low half of the machine and all elements
greater than the pivot are in the high half.
• The above process recurses until each process holds its own local list, which it then sorts locally.