Advanced Topics in Sorting
‣ selection
‣ duplicate keys
‣ system sorts
‣ comparators
Algorithms in Java, 4th Edition · Robert Sedgewick and Kevin Wayne · Copyright © 2008 · May 2, 2008 10:45:30 AM
Selection
Applications.
• Order statistics.
• Find the “top k.”
Which is true?
• Ω(N log N) lower bound? [Is selection as hard as sorting?]
Quick-select
}
return a[k];
}
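The fragment above is the tail of the quick-select loop. A minimal sketch of a whole method follows, assuming a standard two-way quicksort partition and a Fisher-Yates shuffle. The class name `QuickSelect` and the helpers `partition`, `exch`, and `shuffle` are illustrative names, not taken from the slide.

```java
import java.util.Random;

class QuickSelect {
    // Returns the k-th smallest key of a[] (k = 0 is the minimum); rearranges a[].
    static <Key extends Comparable<Key>> Key select(Key[] a, int k) {
        if (k < 0 || k >= a.length) throw new IllegalArgumentException("k out of range");
        shuffle(a);                          // random order makes quadratic behavior unlikely
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);    // a[j] is now in its final sorted position
            if      (j < k) lo = j + 1;      // target lies to the right of j
            else if (j > k) hi = j - 1;      // target lies to the left of j
            else            return a[k];
        }
        return a[k];
    }

    // Two-way quicksort partition around v = a[lo]; returns the final index of v.
    private static <Key extends Comparable<Key>> int partition(Key[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        Key v = a[lo];
        while (true) {
            while (a[++i].compareTo(v) < 0) if (i == hi) break;
            while (v.compareTo(a[--j]) < 0) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Fisher-Yates shuffle (assumed here in place of a library shuffle).
    private static void shuffle(Object[] a) {
        Random random = new Random();
        for (int i = a.length - 1; i > 0; i--) exch(a, i, random.nextInt(i + 1));
    }
}
```

Each pass fixes one element and discards one side of the partition, which is why only one subarray is ever examined again.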
Quick-select: mathematical analysis
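$C_N \sim 2N + 2k \ln(N/k) + 2(N-k)\ln\bigl(N/(N-k)\bigr)$ is the average number of compares to find the kth smallest of N keys.
Ex. $(2 + 2\ln 2)\,N \approx 3.39\,N$ compares to find the median ($k = N/2$).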
Theoretical context for selection
Theorem. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] There exists a compare-based selection algorithm that takes linear time in the worst case.
Generic methods
% javac Quick.java
Note: Quick.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Q. How to fix?
Generic methods
private static <Key extends Comparable<Key>> int partition(Key[] a, int lo, int hi)
{ /* as before */ }
Remark. Obnoxious code needed in system sort; not in this course (for brevity).
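To show how the bounded type parameter removes the unchecked warning, here is a small sketch of a quicksort in which every method carries the `<Key extends Comparable<Key>>` bound. The class name `GenericQuick` and the helper names are assumptions for illustration, and the random shuffle is left out to keep the example short.

```java
class GenericQuick {
    // The bound lets the compiler type-check every compareTo() call.
    static <Key extends Comparable<Key>> void sort(Key[] a) {
        sort(a, 0, a.length - 1);
    }

    private static <Key extends Comparable<Key>> void sort(Key[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    private static <Key extends Comparable<Key>> int partition(Key[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        Key v = a[lo];
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static <Key extends Comparable<Key>> boolean less(Key v, Key w) {
        return v.compareTo(w) < 0;           // typed call, no raw Comparable involved
    }

    private static <Key> void exch(Key[] a, int i, int j) {
        Key t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

A client calls `GenericQuick.sort(a)` on an `Integer[]` or `String[]` exactly as before; the compiler infers `Key` from the argument.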
Duplicate keys
• Huge file.
• Small number of key values.
[Figure: timestamped records keyed by city (Chicago, Houston, Phoenix, Seattle); sorting by city brings records with equal keys together.]
Duplicate keys: the problem
Assume all keys are equal. Recursive code guarantees this case predominates!
Mistake. Put all keys equal to the partitioning element on one side.
Consequence. $\sim N^2/2$ compares when all keys are equal.
B A A B A B B B C C C   A A A A A A A A A A A
B A A B A B C C B C B   A A A A A A A A A A A
3-way partitioning
3-way partitioning.
• Let v be partitioning element a[lo].
• Scan i from left to right.
- a[i] less than v: exchange a[lt] with a[i] and increment both lt and i
- a[i] greater than v: exchange a[gt] with a[i] and decrement gt
- a[i] equal to v: increment i
• In-place.
• Not much code.
• Small overhead if no equal keys.
before   a[lo] = v;  a[lo+1..hi] not yet examined
during   a[lo..lt-1] < v;  a[lt..i-1] = v;  a[i..gt] not yet examined;  a[gt+1..hi] > v
after    a[lo..lt-1] < v;  a[lt..gt] = v;  a[gt+1..hi] > v
3-way partitioning: trace
                 a[]
 lt  i gt    0  1  2  3  4  5  6  7  8  9 10 11
  0  0 11    R  B  W  W  R  W  B  R  R  W  B  R
  0  1 11    R  B  W  W  R  W  B  R  R  W  B  R
  1  2 11    B  R  W  W  R  W  B  R  R  W  B  R
  1  2 10    B  R  R  W  R  W  B  R  R  W  B  W
  1  3 10    B  R  R  W  R  W  B  R  R  W  B  W
  1  3  9    B  R  R  B  R  W  B  R  R  W  W  W
  2  4  9    B  B  R  R  R  W  B  R  R  W  W  W
  2  5  9    B  B  R  R  R  W  B  R  R  W  W  W
  2  5  8    B  B  R  R  R  W  B  R  R  W  W  W
  2  5  7    B  B  R  R  R  R  B  R  W  W  W  W
  2  6  7    B  B  R  R  R  R  B  R  W  W  W  W
  3  7  7    B  B  B  R  R  R  R  R  W  W  W  W
  3  8  7    B  B  B  R  R  R  R  R  W  W  W  W
3-way partitioning trace (array contents after each loop iteration; v = R)
3-way quicksort: Java implementation
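The slide's code did not survive extraction. The following is a sketch of a 3-way quicksort built from the partitioning steps described earlier (scan i left to right, maintain lt and gt). The class name `Quick3way` and the helper `exch` are assumed names. The scan starts at i = lo so that it agrees with the trace, and the random shuffle is omitted for brevity.

```java
class Quick3way {
    static <Key extends Comparable<Key>> void sort(Key[] a) {
        sort(a, 0, a.length - 1);
    }

    private static <Key extends Comparable<Key>> void sort(Key[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, i = lo, gt = hi;
        Key v = a[lo];                            // partitioning element
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) exch(a, lt++, i++); // a[i] < v: move it into the left part
            else if (cmp > 0) exch(a, i, gt--);   // a[i] > v: move it into the right part
            else              i++;                // a[i] = v: leave it in the middle
        }
        // Now a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi].
        sort(a, lo, lt - 1);
        sort(a, gt + 1, hi);
    }

    private static <Key> void exch(Key[] a, int i, int j) {
        Key t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Keys equal to v end up in a[lt..gt] and are never touched again, so an array with all keys equal is finished after a single pass.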
3-way quicksort: visual trace
Duplicate keys: lower bound
Natural order
Generalized compare
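import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;
...
String[] a;
...
Arrays.sort(a);                                                     // natural order
Arrays.sort(a, String.CASE_INSENSITIVE_ORDER);                      // ignore case
Arrays.sort(a, Collator.getInstance(Locale.forLanguageTag("es")));  // Spanish collation order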
Comparators
Comparator example
comparator implementation
...
Arrays.sort(a, new ReverseOrder());
...
client
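The comparator implementation itself is missing from the text above. One plausible version, assuming `ReverseOrder` compares strings and simply inverts their natural order, could look like this:

```java
import java.util.Comparator;

// Assumed implementation: orders strings in reverse of their natural order.
class ReverseOrder implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        return b.compareTo(a);   // swapping the arguments reverses the order
    }
}
```

With this class on the classpath, the client call `Arrays.sort(a, new ReverseOrder())` sorts a `String[]` from largest to smallest.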
Sort implementation with comparators
Ex. Insertion sort. The key type is a type variable (not necessarily Comparable).
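As a sketch of what such an implementation might look like, the insertion sort below takes a `Comparator` and uses it in `less()` instead of calling `compareTo()`. The class name `InsertionWithComparator` and the helper names are assumptions, not the slide's code.

```java
import java.util.Comparator;

class InsertionWithComparator {
    // Key is an unbounded type variable: the comparator supplies the order.
    static <Key> void sort(Key[] a, Comparator<? super Key> comparator) {
        for (int i = 1; i < a.length; i++)
            for (int j = i; j > 0 && less(comparator, a[j], a[j - 1]); j--)
                exch(a, j, j - 1);
    }

    private static <Key> boolean less(Comparator<? super Key> c, Key v, Key w) {
        return c.compare(v, w) < 0;
    }

    private static <Key> void exch(Key[] a, int i, int j) {
        Key t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

For example, `InsertionWithComparator.sort(a, String.CASE_INSENSITIVE_ORDER)` sorts a `String[]` while ignoring case.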
Generalized compare
Arrays.sort(students, Student.BY_NAME);
Arrays.sort(students, Student.BY_SECT);
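The two calls above assume that `Student` exposes comparators as static constants. A minimal sketch of such a class follows; the fields `name` and `section` and the nested class names are hypothetical, chosen only to make the example compile.

```java
import java.util.Comparator;

class Student {
    // One comparator object per sort order, shared by all clients.
    static final Comparator<Student> BY_NAME = new ByName();
    static final Comparator<Student> BY_SECT = new BySect();

    final String name;     // hypothetical field
    final int section;     // hypothetical field

    Student(String name, int section) {
        this.name = name;
        this.section = section;
    }

    private static class ByName implements Comparator<Student> {
        public int compare(Student a, Student b) {
            return a.name.compareTo(b.name);
        }
    }

    private static class BySect implements Comparator<Student> {
        public int compare(Student a, Student b) {
            return Integer.compare(a.section, b.section);
        }
    }
}
```

Keeping the comparators inside `Student` lets them read the fields directly, and clients pick a sort order by name without writing any comparison code.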
Generalized compare problem
A stable sort preserves the relative order of records with equal keys.
Stability
• Mergesort?

sorted by time        sorted by city        sorted by city
                      (times NOT sorted)    (times sorted)
Phoenix 09:00:03      Chicago 09:03:13      Chicago 09:00:59
Houston 09:00:13      Chicago 09:21:05      Chicago 09:03:13
Chicago 09:00:59      Chicago 09:19:46      Chicago 09:19:32
Houston 09:01:10      Chicago 09:19:32      Chicago 09:19:46
Chicago 09:03:13      Chicago 09:25:52      Chicago 09:21:05
Seattle 09:10:11      Chicago 09:35:21      Chicago 09:25:52
Seattle 09:10:25      Chicago 09:00:59      Chicago 09:35:21
Phoenix 09:14:25      Houston 09:01:10      Houston 09:00:13
Chicago 09:19:32      Houston 09:00:13      Houston 09:01:10
Chicago 09:19:46      Phoenix 09:37:44      Phoenix 09:00:03
Chicago 09:21:05      Phoenix 09:00:03      Phoenix 09:14:25
Seattle 09:22:43      Phoenix 09:14:25      Phoenix 09:37:44
Seattle 09:22:54      Seattle 09:10:25      Seattle 09:10:11
Chicago 09:25:52      Seattle 09:36:14      Seattle 09:10:25
Chicago 09:35:21      Seattle 09:22:43      Seattle 09:22:43
Seattle 09:36:14      Seattle 09:10:11      Seattle 09:22:54
Phoenix 09:37:44      Seattle 09:22:54      Seattle 09:36:14
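To make the stable case concrete, the sketch below sorts a few of the records by time and then by city using `Arrays.sort` on an object array, which the Java documentation guarantees to be stable. The class name `StabilityDemo`, the method name, and the `{city, time}` array representation are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class StabilityDemo {
    // Each record is {city, time}. Returns a copy sorted by city, with ties in time order.
    static String[][] sortByTimeThenCity(String[][] records) {
        String[][] a = records.clone();                              // leave the input untouched
        Arrays.sort(a, Comparator.comparing((String[] r) -> r[1]));  // first pass: by time
        Arrays.sort(a, Comparator.comparing((String[] r) -> r[0]));  // second pass: by city (stable)
        return a;
    }
}
```

Because the second sort is stable, records with the same city keep the time order produced by the first sort. An unstable second sort would give no such guarantee.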
Sorting applications
Non-obvious applications.
• Data compression.
• Computer graphics.
• Computational biology.
• Supply chain management.
• Load balancing on a parallel computer.
• ...
Every system needs (and has) a system sort!
Java system sorts
import java.util.Arrays;
Java system sort for primitive types
nine evenly spaced elements   R L A P M C G A X Z K R B R J J E
groups of 3                   R A M   G X K   B J E
medians                       M   K   E
ninther                       K
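The computation in the example above can be sketched in a few lines. The class name `Ninther`, the method names, and the index rule used to pick the nine evenly spaced elements are assumptions; the rule was chosen so that it reproduces the sample shown for the 17-element array.

```java
class Ninther {
    // Median of three characters.
    static char median3(char a, char b, char c) {
        if (a < b) {
            if (b < c) return b;
            return (a < c) ? c : a;
        } else {
            if (a < c) return a;
            return (b < c) ? c : b;
        }
    }

    // Picks nine evenly spaced elements (assumes a.length >= 9).
    static char[] sample9(char[] a) {
        char[] s = new char[9];
        for (int i = 0; i < 9; i++) s[i] = a[i * (a.length - 1) / 8];
        return s;
    }

    // Ninther: the median of the medians of three groups of three samples.
    static char ninther(char[] a) {
        char[] s = sample9(a);
        char m1 = median3(s[0], s[1], s[2]);
        char m2 = median3(s[3], s[4], s[5]);
        char m3 = median3(s[6], s[7], s[8]);
        return median3(m1, m2, m3);
    }
}
```

On the array from the example, `sample9` yields R A M G X K B J E, the group medians are M, K, and E, and the ninther is K.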
Achilles heel in Bentley-McIlroy implementation (Java system sort)
Consequences.
• Confirms theoretical possibility.
• Algorithmic complexity attack: you enter linear amount of data;
server performs quadratic amount of work.
Internal sorts.
• Insertion sort, selection sort, bubblesort, shaker sort.
• Quicksort, mergesort, heapsort, samplesort, shellsort.
• Solitaire sort, red-black sort, splaysort, Dobosiewicz sort, psort, ...
Parallel sorts.
• Bitonic sort, Batcher even-odd sort.
• Smooth sort, cube sort, column sort.
• GPUsort.
System sort: Which algorithm to use?