0% found this document useful (0 votes)
3 views24 pages

Bitonic Sort algorithm used on Nvidia GPUs

The document presents a lecture on Bitonic Sort, a parallel sorting algorithm that sorts a bitonic sequence in Θ(log2 n) time using a sorting network. It explains the structure and properties of bitonic sequences, the process of building and sorting them, and provides pseudocode for the algorithm. The complexity of Bitonic Sort is discussed, highlighting its efficiency in comparison to other sorting methods.

Uploaded by

Bob Zigon
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views24 pages

Bitonic Sort algorithm used on Nvidia GPUs

The document presents a lecture on Bitonic Sort, a parallel sorting algorithm that sorts a bitonic sequence in Θ(log2 n) time using a sorting network. It explains the structure and properties of bitonic sequences, the process of building and sorting them, and provides pseudocode for the algorithm. The complexity of Bitonic Sort is discussed, highlighting its efficiency in comparison to other sorting methods.

Uploaded by

Bob Zigon
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

COMP 322: Fundamentals of

Parallel Programming

Lecture 28: Bitonic Sort

John Mellor-Crummey
Department of Computer Science, Rice University
[email protected]

https://wiki.rice.edu/confluence/display/PARPROG/COMP322

COMP 322 Lecture 28 26 March 2012


Introduction

• Why study sorting?


—one of the most common operations performed on computers
• Sorting algorithm attributes
—internal vs. external Today’s focus
– internal: data fits in memory Bitonic sort: internal,
– external: uses tape or disk comparison-based,
—comparison-based or not parallel sort
– comparison sort
basic operation: compare elements and exchange as necessary
Θ(n log n) comparisons to sort n numbers
– non-comparison-based sort
e.g. radix sort based on the binary representation of data
Θ(n) operations to sort n numbers
—parallel vs. sequential

2
Sorting Network

• Network of comparators designed for sorting


• Comparator : two inputs x and y; two outputs x' and y’
—types
– increasing (denoted ⊕): x' = min(x,y) and y' = max(x,y)
x ⊕ min(x,y)

y ⊕ max(x,y)
– decreasing (denoted Ө) : x' = max(x,y) and y' = min(x,y)

x Ө max(x,y)

y Ө min(x,y)
• Sorting network speed is proportional to its depth

3
Sorting Networks

• Network structure: a series of columns


• Each column consists of a vector of comparators (in parallel)
• Sorting network organization:

4
Example: Bitonic Sorting Network

• Bitonic sequence
—two parts: increasing and decreasing
– 〈1,2,4,7,6,0〉: first increases and then decreases (or vice versa)
—cyclic rotation of a bitonic sequence is also considered bitonic
– 〈8,9,2,1,0,4〉: cyclic rotation of 〈0,4,8,9,2,1〉

• Bitonic sorting network


—sorts n elements in Θ(log2 n) time
—network kernel: rearranges a bitonic sequence into a sorted one

5
Bitonic Split (Batcher, 1968)

• Let s = 〈a0,a1,…,an-1〉 be a bitonic sequence such that


—a0 ≤ a1 ≤ ··· ≤ an/2-1 , and
—an/2 ≥ an/2+1 ≥ ··· ≥ an-1

• Consider the following subsequences of s


s1 = 〈min(a0,an/2),min(a1,an/2+1),…,min(an/2-1,an-1)〉
s2 = 〈max(a0,an/2),max(a1,an/2+1),…,max(an/2-1,an-1)〉

• Sequence properties
—s1 and s2 are both bitonic
—∀x ∀y x ∈ s1, y ∈ s2 , x < y

• Apply recursively on s1 and s2 to produce a sorted sequence

• Works for any bitonic sequence, even if |s1| ≠ |s2|


6
Splitting Bitonic Sequences - I
Sequence properties
s1 and s2 are both bitonic
∀x ∀y x ∈ s1, y ∈ s2 , x < y

min max 7
Splitting Bitonic Sequences - I
Sequence properties
s1 and s2 are both bitonic
∀x ∀y x ∈ s1, y ∈ s2 , x < y

min max 8
Splitting Bitonic Sequences - I
Sequence properties
s1 and s2 are both bitonic
∀x ∀y x ∈ s1, y ∈ s2 , x < y

min max 9
Splitting Bitonic Sequences - I
Sequence properties
s1 and s2 are both bitonic
∀x ∀y x ∈ s1, y ∈ s2 , x < y

min max 10
Splitting Bitonic Sequences - I
Sequence properties
s1 and s2 are both bitonic
∀x ∀y x ∈ s1, y ∈ s2 , x < y

min max 11
Splitting Bitonic Sequences - II
Sequence properties
s1 and s2 are both bitonic
∀x ∀y x ∈ s1, y ∈ s2 , x < y

min max 12
Bitonic Merge

Sort a bitonic sequence through a series of bitonic splits


Example: use bitonic merge to sort 16-element bitonic sequence
How: perform a series of log2 16 = 4 bitonic splits

13
Sorting via Bitonic Merging Network

• Sorting network can implement bitonic merge algorithm


—bitonic merging network
• Network structure
—log2 n columns
—each column
– n/2 comparators
– performs one step of the bitonic merge

• Bitonic merging network with n inputs: ⊕BM[n]


—produces an increasing sequence

• Replacing ⊕ comparators by Ө comparators: ӨBM[n]


—produces a decreasing sequence

14
Bitonic Merging Network, ⊕ BM[16]

• Input: bitonic sequence


— input wires are numbered 0,1,…, n - 1 (shown in binary)

• Output: sequence in sorted order


• Each column of comparators is drawn separately 15
Bitonic Sort

How do we sort an unsorted sequence using a bitonic merge?


Two steps
• Build a bitonic sequence
• Sort it using a bitonic merging network

16
Building a Bitonic Sequence

• Build a single bitonic sequence from the given sequence


—any sequence of length 2 is a bitonic sequence.
—build bitonic sequence of length 4
– sort first two elements using ⊕BM[2]
– sort next two using ӨBM[2]

• Repeatedly merge to generate larger bitonic sequences


—⊕BM[k] & ӨBM[k]: bitonic merging networks of size k

17
Building a Bitonic Sequence

Input: sequence of 16 unordered numbers


Output: a bitonic sequence of 16 numbers
18
Bitonic Sort, n = 16

• First 3 stages create bitonic sequence input to stage 4


• Last stage (⊕BM[16]) yields sorted sequence
19
Complexity of Bitonic Sorting Networks

• Depth of the network is Θ(log2 n)


—log2 n merge stages
—jth merge stage depth is log2 2j = j
log 2 n log 2 n

—depth = ∑ log 2 2j = ∑ j = (log 2 n + 1)(log 2 n) /2 = θ (log 2 n)


j=1 i=1

• Each stage of the network contains n/2 comparators


• Complexity
€ of serial implementation = Θ(n log2 n)

20
From Sorting Network to Pseudocode

Batcher’s Bitonic Sort


• bmerge(s): recursively sort a bitonic sequence as follows
1. compute s1 and s2 as shown earlier for ascending sort of s
both will be bitonic by Batcher’s Lemma
note: for a descending sort, just swap min & max
2. recursively call bmerge on s1 and s2
3. return s = concat(bmerge(s1), bmerge(s2))
• bsort(s): create a bitonic sequence then sort it
1. convert an arbitrary sequence s into a bitonic sequence
• sort s[1 … n/2] in place in ascending order (recursive call to sort)
• sort s[n/2+1 … n] in place in descending order (recursive call to sort)
2. after step 1, the sequence will be bitonic; sort it using bmerge(s)

21
Bitonic Sort in HJ
void bmerge(final int[] A, final int low, final int high, boolean asc) {
if ( high-low > 1 ) {
final int mid = (low + high)/2 ;
final int size = high – low + 1;
forall (point[i]:[low:mid]) orderElementPair(A, i, i+size/2, asc);
finish {
async bmerge(A, low, mid, asc); async bmerge(A, mid+1, high, asc);
} // finish
} // if
} // bmerge

void bsort (final int[] A, final int low, final int high, boolean asc) {
if ( high-low > 1 ) {
finish {
final int mid = (low + high)/2 ;
async bsort(A, low, mid, asc); async bsort(A, mid+1, high, !asc);
} // finish
bmerge(A, low, high, asc); // asc = true is for ascending order
} // if
} // sort
22
Batcher’s Bitonic Sort in NESL
function bitonic_merge(a) =
if (#a == 1) then a
else
let
halves = bottop(a)
mins = {min(x, y) : x in halves[0]; y in halves[1]};
maxs = {max(x, y) : x in halves[0]; y in halves[1]};
in flatten({bitonic_merge(x) : x in [mins,maxs]});

function bitonic_sort(a) =
if (#a == 1) then a
else
let b = {bitonic_sort(x) : x in bottop(a)};
in bitonic_merge(b[0]++reverse(b[1]));

Run this code at http://www.cs.rice.edu/~johnmc/nesl.html 23


References

• Adapted from slides “Sorting” by Ananth Grama


• Based on Chapter 9 of “Introduction to Parallel Computing”
by Ananth Grama, Anshul Gupta, George Karypis, and Vipin
Kumar. Addison Wesley, 2003
• “Programming Parallel Algorithms.” Guy Blelloch.
Communications of the ACM, volume 39, number 3, March
1996.
• http://www.cs.cmu.edu/~scandal/nesl/algorithms.html#sort
• “Highly Scalable Parallel Sorting.” Edgar Solomonik and
Laxmikant V. Kale. Proc. of the IEEE Intl. Parallel and
Distributed Processing Symp., April, 2010, Atlanta, GA.

24

You might also like