Advanced Sorts
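Outline: complexity, selection, system sorts, duplicate keys, comparators

Decision Tree
[Figure: decision tree for sorting keys a, b, c. The root asks "a<b?"; its yes/no branches lead to "b<c?" and "a<c?", and so on down to the leaves. Between comparisons the algorithm may run other code (e.g., a sequence of exchanges).]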
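A minimal sketch, not taken from the slides, that writes the decision tree for three distinct keys as nested if statements. Each root-to-leaf path is one possible execution, and each leaf returns one of the 3! = 6 orderings. The class name DecisionTree3 and the compare counter are illustrative assumptions. No input needs more than 3 compares, and no algorithm can do better, because a binary tree with 6 leaves has height at least ⌈lg 6⌉ = 3.

```java
import java.util.Arrays;

public class DecisionTree3 {
    static int compares = 0;  // key comparisons made so far

    static boolean less(int x, int y) {
        compares++;
        return x < y;
    }

    // Returns (a, b, c) in sorted order; every branch is one path in the decision tree.
    static int[] sort3(int a, int b, int c) {
        if (less(a, b)) {                                      // a < b
            if (less(b, c)) return new int[] {a, b, c};        // a < b < c
            else if (less(a, c)) return new int[] {a, c, b};   // a < c < b
            else return new int[] {c, a, b};                   // c < a < b
        } else {                                               // b < a
            if (less(a, c)) return new int[] {b, a, c};        // b < a < c
            else if (less(b, c)) return new int[] {b, c, a};   // b < c < a
            else return new int[] {c, b, a};                   // c < b < a
        }
    }

    public static void main(String[] args) {
        int[][] inputs = { {1, 2, 3}, {1, 3, 2}, {2, 1, 3}, {2, 3, 1}, {3, 1, 2}, {3, 2, 1} };
        int worst = 0;
        for (int[] in : inputs) {
            compares = 0;
            int[] out = sort3(in[0], in[1], in[2]);
            worst = Math.max(worst, compares);
            System.out.println(Arrays.toString(in) + " -> " + Arrays.toString(out)
                               + " (" + compares + " compares)");
        }
        System.out.println("worst case: " + worst + " compares");
    }
}
```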
Comparison-based lower bound for sorting
Theorem. Any comparison-based sorting algorithm must use more than N lg N - 1.44 N comparisons in the worst case.
Pf.
- Assume input consists of N distinct values a1 through aN.
- Worst case dictated by tree height h.
- N! different orderings.
- (At least) one leaf corresponds to each ordering.
- Binary tree with N! leaves cannot have height less than lg (N!):
  $h \ge \lg N! \ge \lg (N/e)^N = N \lg N - N \lg e \approx N \lg N - 1.44\,N$   (Stirling's formula)

Upper bound. Cost guarantee provided by some algorithm for X.
Lower bound. Proven limit on cost guarantee of any algorithm for X.
Optimal algorithm. Algorithm with best cost guarantee for X.
Example: sorting.
- Machine model = # comparisons
- Upper bound = N lg N (mergesort)
- Lower bound = N lg N - 1.44 N
- Mergesort is optimal (to within a small additive factor): lower bound ≈ upper bound
First goal of algorithm design: optimal algorithms

Drawbacks of complexity results
Mergesort is optimal (to within a small additive factor).
Other operations?
- statement is only about number of compares
- quicksort is faster than mergesort (lower use of other operations)
Space?
- mergesort is not optimal with respect to space usage
- insertion sort, selection sort, shellsort, quicksort are space-optimal
- is there an algorithm that is both time- and space-optimal?
  (stay tuned for radix sorts)
Is my case the worst case?
- statement is only about guaranteed worst-case performance
- quicksort's probabilistic guarantee is just as good in practice

Lower bound may not hold if the algorithm has information about
- the key values
- their initial arrangement
Partially ordered arrays. Depending on the initial order of the input, we may not need N log N compares.
- insertion sort requires O(N) compares on an already sorted array
Duplicate keys. Depending on the input distribution of duplicates, we may not need N log N compares.
- stay tuned for 3-way quicksort
Digital properties of keys. We can use digit/character comparisons instead of key comparisons for numbers and strings.
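To see how tight the bound is, a small sketch computes lg N! exactly as a sum of logarithms, so N! itself is never formed, and compares it with the Stirling estimate N lg N - N lg e ≈ N lg N - 1.44 N used in the proof. The class and method names are illustrative assumptions.

```java
public class SortingLowerBound {
    // lg(N!) = lg 1 + lg 2 + ... + lg N, summed term by term so N! never overflows
    static double lgFactorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) sum += Math.log(i) / Math.log(2);
        return sum;
    }

    // Stirling-based estimate from the proof: N lg N - N lg e, where lg e = 1/ln 2 ≈ 1.44
    static double stirlingEstimate(int n) {
        double ln2 = Math.log(2);
        return n * Math.log(n) / ln2 - n / ln2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 10, 100, 1000, 1000000}) {
            System.out.printf("N = %7d   ceil(lg N!) = %10.0f   N lg N - 1.44 N = %12.1f%n",
                              n, Math.ceil(lgFactorial(n)), stirlingEstimate(n));
        }
    }
}
```

Because N! > (N/e)^N, the exact value always sits above the estimate. The gap grows only like a logarithm of N, which is why the slides call mergesort optimal to within a small additive factor.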
Selection: quick-select algorithm
Finished when m = k: a[k] is in place, with no larger element to its left and no smaller element to its right.

public static void select(Comparable[] a, int k)
{
    StdRandom.shuffle(a);             // random shuffle provides a probabilistic guarantee
    int l = 0;
    int r = a.length - 1;
    while (r > l)
    {
        int m = partition(a, l, r);   // quicksort's partitioning method; a[m] ends up in place
        if (m > k) r = m - 1;         // index k lies in the left subarray
        else if (m < k) l = m + 1;    // index k lies in the right subarray
        else return;                  // m == k: a[k] is in place
    }
}
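The select() above relies on two helpers that are not shown: StdRandom.shuffle() from the course's standard library and quicksort's partition(). A self-contained sketch under those assumptions swaps in a Knuth shuffle driven by java.util.Random, plus the usual quicksort partitioning method, whose scans stop on equal keys. For convenience it also returns a[k]. The class name QuickSelect is hypothetical, and k is a 0-based index.

```java
import java.util.Random;

public class QuickSelect {
    private static final Random RNG = new Random();

    // Rearranges a so that a[k] holds the value it would have if a were sorted
    // (k is a 0-based index), and returns that value.
    public static <Key extends Comparable<Key>> Key select(Key[] a, int k) {
        if (k < 0 || k >= a.length) throw new IllegalArgumentException("k out of range");
        shuffle(a);                         // probabilistic guarantee, as with quicksort
        int l = 0, r = a.length - 1;
        while (r > l) {
            int m = partition(a, l, r);     // a[m] is now in its final position
            if      (m > k) r = m - 1;      // continue in the left subarray
            else if (m < k) l = m + 1;      // continue in the right subarray
            else return a[k];
        }
        return a[k];
    }

    // Partitions a[l..r] around v = a[l]; both scans stop on keys equal to v.
    private static <Key extends Comparable<Key>> int partition(Key[] a, int l, int r) {
        Key v = a[l];
        int i = l, j = r + 1;
        while (true) {
            while (less(a[++i], v)) if (i == r) break;
            while (less(v, a[--j])) if (j == l) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, l, j);                      // put the partitioning element in place
        return j;
    }

    private static <Key extends Comparable<Key>> boolean less(Key v, Key w) {
        return v.compareTo(w) < 0;
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Knuth (Fisher-Yates) shuffle
    private static void shuffle(Object[] a) {
        for (int i = a.length - 1; i > 0; i--) exch(a, i, RNG.nextInt(i + 1));
    }

    public static void main(String[] args) {
        Integer[] a = {9, 2, 7, 4, 5, 1, 8, 3, 6};
        System.out.println(select(a, 4));   // median of 1..9: prints 5
    }
}
```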
Selection
Find the kth smallest element.
- Min: k = 1.
- Max: k = N.
- Median: k = N/2.
Applications.
- Order statistics.
- Find the "top k".
Use theory as a guide.
- easy O(N log N) upper bound: sort, return a[k]
- easy O(N) upper bound for some k: min, max
- easy O(N) lower bound: must examine every element

Theorem. Quick-select takes linear time on average.
Pf.
- Intuitively, each partitioning step roughly splits the array in half:
  N + N/2 + N/4 + … + 1 ≈ 2N comparisons.
- Formal analysis similar to quicksort analysis:
  $C_N \approx 2N + 2k \ln(N/k) + 2(N-k) \ln\big(N/(N-k)\big)$
  Ex: (2 + 2 ln 2) N comparisons to find the median.
Note. Might use ~N²/2 comparisons, but as with quicksort, the random shuffle provides a probabilistic guarantee.

Theorem. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] There exists a selection algorithm that takes linear time in the worst case.
Note. Algorithm is far too complicated to be useful in practice.
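A worked check, not part of the original slides, of where the median example comes from. The check uses the analysis formula with a factor of 2 on both logarithmic terms, which is the form that makes the median come out to (2 + 2 ln 2) N. Substituting k = N/2 gives:

```latex
C_N \approx 2N + 2k\ln\frac{N}{k} + 2(N-k)\ln\frac{N}{N-k}

k = \tfrac{N}{2}:\quad
C_N \approx 2N + N\ln 2 + N\ln 2 = (2 + 2\ln 2)\,N \approx 3.39\,N
```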
internal sorts.
- Insertion sort, selection sort, bubblesort, shaker sort.
- Quicksort, mergesort, heapsort, samplesort, shellsort.
- Solitaire sort, red-black sort, splaysort, Dobosiewicz sort, psort, ...
external sorts. Poly-phase mergesort, cascade-merge, oscillating sort.
radix sorts.
- Distribution, MSD, LSD.
- 3-way radix quicksort.
parallel sorts.
- Bitonic sort, Batcher even-odd sort.
- Smooth sort, cube sort, column sort.
- GPUsort.
Sorting algorithms are essential in a broad variety of applications
- Display Google PageRank results.
- List RSS news items in reverse chronological order.
- Find the median.
- Find the closest pair.
- Binary search in a database.
- Identify statistical outliers.
- Find duplicates in a mailing list.
  (problems become easy once items are in sorted order)
- Data compression.
- Computer graphics.
- Computational biology.
- Supply chain management.
- Load balancing on a parallel computer.
- ...
  (non-obvious applications)
Every system needs (and has) a system sort!

Applications have diverse attributes
- Deterministic?
- Keys all distinct?
- Multiple key types?
- Linked list or arrays?
- Large or small records?
- Is your file randomly ordered?
- Need guaranteed performance?
(many more combinations of attributes than algorithms)
Elementary sort may be method of choice for some combination.
Cannot cover all combinations of attributes.
Q. Is the system sort good enough?
A. Maybe (no matter which algorithm it uses).
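A sketch of the "problems become easy once items are in sorted order" point, applied to finding duplicates in a mailing list. After Arrays.sort(), equal entries are adjacent, so one pass over neighboring pairs finds every duplicate, and the sort dominates the total cost. The class name and sample addresses are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FindDuplicates {
    // Returns every value that occurs more than once, each reported once, in sorted order.
    public static List<String> duplicates(String[] items) {
        String[] a = items.clone();              // sort a copy; leave the caller's array alone
        Arrays.sort(a);                          // equal entries are now adjacent
        List<String> dups = new ArrayList<>();
        for (int i = 1; i < a.length; i++) {
            boolean sameAsPrevious = a[i].equals(a[i - 1]);
            boolean alreadyReported = !dups.isEmpty() && dups.get(dups.size() - 1).equals(a[i]);
            if (sameAsPrevious && !alreadyReported) dups.add(a[i]);
        }
        return dups;
    }

    public static void main(String[] args) {
        String[] mailingList = {
            "carol@example.com", "alice@example.com", "bob@example.com",
            "alice@example.com", "carol@example.com", "alice@example.com"
        };
        System.out.println(duplicates(mailingList));  // [alice@example.com, carol@example.com]
    }
}
```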
Duplicate keys: the problem
Mistake: Put all keys equal to the partitioning element on one side
- easy to code
- guarantees N² running time when all keys equal
  B A A B A B C C B C B A A A A A A A A A A A
Recommended: Stop scans on keys equal to the partitioning element
- easy to code
- guarantees N lg N compares when all keys equal
  B A A B A B C C B C B A A A A A A A A A A A
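A sketch that counts compares for the two strategies on an array of identical keys. A Lomuto-style partition that sends every key ≤ v to the left stands in for the "mistake." The "recommended" version is the standard quicksort partition whose scans stop on keys equal to v. Neither is claimed to be the slides' exact code, and the class name is hypothetical. With all keys equal, the first strategy always splits off only the partitioning element and uses N(N-1)/2 compares. The second splits each subarray near its middle.

```java
import java.util.Arrays;

public class EqualKeysPartitioning {
    static long compares = 0;   // key comparisons made by the most recent sort

    static boolean less(int v, int w)   { compares++; return v < w; }
    static boolean lessEq(int v, int w) { compares++; return v <= w; }

    static void exch(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    // "Mistake": every key <= v moves left, so all keys equal to v land on one side.
    static int partitionOneSide(int[] a, int l, int r) {
        int v = a[l], m = l;                 // invariant: a[l+1..m] <= v < a[m+1..i-1]
        for (int i = l + 1; i <= r; i++)
            if (lessEq(a[i], v)) exch(a, ++m, i);
        exch(a, l, m);                       // put the partitioning element in place
        return m;
    }

    // "Recommended": both scans stop on keys equal to v, so equal keys are split evenly.
    static int partitionStopOnEqual(int[] a, int l, int r) {
        int v = a[l], i = l, j = r + 1;
        while (true) {
            while (less(a[++i], v)) if (i == r) break;
            while (less(v, a[--j])) if (j == l) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, l, j);                       // put the partitioning element in place
        return j;
    }

    static void sort(int[] a, int l, int r, boolean stopOnEqual) {
        if (r <= l) return;
        int m = stopOnEqual ? partitionStopOnEqual(a, l, r) : partitionOneSide(a, l, r);
        sort(a, l, m - 1, stopOnEqual);
        sort(a, m + 1, r, stopOnEqual);
    }

    // Sorts a copy of a and returns how many compares that took.
    static long countCompares(int[] a, boolean stopOnEqual) {
        int[] b = a.clone();
        compares = 0;
        sort(b, 0, b.length - 1, stopOnEqual);
        return compares;
    }

    public static void main(String[] args) {
        int[] allEqual = new int[1000];
        Arrays.fill(allEqual, 7);
        System.out.println("equal keys on one side: " + countCompares(allEqual, false)); // 499500
        System.out.println("stop scans on equal:    " + countCompares(allEqual, true));
    }
}
```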
Often, the purpose of a sort is to bring records with duplicate keys together.
- Sort population by age.
- Finding collinear points.
- Remove duplicates from mailing list.
- Sort job applicants by college attended.

3-way partitioning. Partition elements into 3 parts:
- Elements between i and j equal to partition element v.
- No larger elements to left of i.
- No smaller elements to right of j.
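The three-part invariant above can be sketched with Dijkstra's 3-way partitioning, one standard solution to the Dutch national flag problem. The slides' own partitioning code survives only in part, so this is not claimed to be their version. In the sketch, lt and gt mark the boundaries of the run of keys equal to v. The class name Quick3way is hypothetical, and a production version would shuffle the array first.

```java
public class Quick3way {
    // Sorts a[l..r] by partitioning around v = a[l] into
    //   a[l..lt-1] < v,   a[lt..gt] == v,   a[gt+1..r] > v
    private static <Key extends Comparable<Key>> void sort(Key[] a, int l, int r) {
        if (r <= l) return;
        Key v = a[l];
        int lt = l, i = l + 1, gt = r;
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) exch(a, lt++, i++);   // smaller key: move to the left part
            else if (cmp > 0) exch(a, i, gt--);     // larger key: move to the right part
            else              i++;                  // equal key: leave it in the middle
        }
        sort(a, l, lt - 1);     // keys smaller than v
        sort(a, gt + 1, r);     // keys larger than v; keys equal to v are already in place
    }

    public static <Key extends Comparable<Key>> void sort(Key[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        String[] a = "B A A B A B C C B C B A A A A A A A A A A A".split(" ");
        sort(a);
        System.out.println(String.join(" ", a));
    }
}
```

When there are only a few distinct key values, each value is finished after one partitioning step, so the recursion depth depends on the number of distinct keys rather than on N.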
Solution to Dutch national flag problem.

Duplicate keys: lower bound

        sort(a, l, j);   // recursively sort left and right
        sort(a, i, r);
    }
Generalized compare
import java.text.Collator;
Comparable interface: sort uses type's compareTo() function.
- Problem 1: May want to use a different order.
- Problem 2: Some types may have no "natural" order.
Solution: Use Comparator interface.

import java.util.Comparator;

public class ReverseOrder implements Comparator<String>
{
    // reverse alphabetical order: swap the arguments of String.compareTo()
    public int compare(String a, String b)
    { return b.compareTo(a); }
}
Easy modification to support comparators in our sort implementations:
- pass comparator to sort(), less()
- use it in less()

private static <Key> boolean less(Comparator<Key> c, Key v, Key w)
{ return c.compare(v, w) < 0; }

private static void exch(Object[] a, int i, int j)
{ Object t = a[i]; a[i] = a[j]; a[j] = t; }

Comparators enable multiple sorts of single file (different keys).
Example. Enable sorting students by name or by section.

    private static class BySect implements Comparator<Student>
    {
        public int compare(Student a, Student b)
        { return a.section - b.section; }   // only use this trick if no danger of overflow
    }
}
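A self-contained sketch of the "pass a comparator to sort() and less()" modification, applied to insertion sort. The class name is hypothetical. The comparators in the usage example are real JDK ones: String.CASE_INSENSITIVE_ORDER, and Comparator.comparingInt() with a method reference. Because insertion sort is stable, words of equal length keep their previous relative order.

```java
import java.util.Arrays;
import java.util.Comparator;

public class InsertionWithComparator {
    // Sorts a[] in the order defined by c; insertion sort, so equal keys keep their order.
    public static <Key> void sort(Key[] a, Comparator<Key> c) {
        for (int i = 1; i < a.length; i++)
            for (int j = i; j > 0 && less(c, a[j], a[j - 1]); j--)
                exch(a, j, j - 1);
    }

    private static <Key> boolean less(Comparator<Key> c, Key v, Key w) {
        return c.compare(v, w) < 0;
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "Apple", "fig", "banana"};
        sort(words, String.CASE_INSENSITIVE_ORDER);
        System.out.println(Arrays.toString(words));   // [Apple, banana, fig, pear]
        sort(words, Comparator.comparingInt(String::length));
        System.out.println(Arrays.toString(words));   // [fig, pear, Apple, banana]
    }
}
```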
Generalized compare problem
A typical application:
- first, sort by name
- then, sort by section

Arrays.sort(students, Student.BY_NAME);
Arrays.sort(students, Student.BY_SECT);

Sorted by name:
Andrews  3  A  664-480-0023  097 Little
Battle   4  C  874-088-1212  121 Whitman
Chen     2  A  991-878-4944  308 Blair
Fox      1  A  884-232-5341  11 Dickinson
Furia    3  A  766-093-9873  101 Brown
Gazsi    4  B  665-303-0266  22 Brown
Kanaga   3  B  898-122-9643  22 Brown
Rohde    3  A  232-343-5555  343 Forbes

Then sorted by section:
Fox      1  A  884-232-5341  11 Dickinson
Chen     2  A  991-878-4944  308 Blair
Kanaga   3  B  898-122-9643  22 Brown
Andrews  3  A  664-480-0023  097 Little
Furia    3  A  766-093-9873  101 Brown
Rohde    3  A  232-343-5555  343 Forbes
Battle   4  C  874-088-1212  121 Whitman
Gazsi    4  B  665-303-0266  22 Brown

@#%&@!! Students in section 3 no longer in order by name.

A stable sort preserves the relative order of records with equal keys.
Is the system sort stable?

Java system sorts
Use theory as a guide: Java uses both mergesort and quicksort.
- Can sort array of type Comparable or any primitive type.
- Uses quicksort for primitive types.
- Uses mergesort for objects.
(In Java 7 and later, Arrays.sort() uses a dual-pivot quicksort for primitive types and TimSort, a mergesort variant, for object arrays.)

import java.util.Arrays;

public class IntegerSort
{
    public static void main(String[] args)
    {
        int N = Integer.parseInt(args[0]);   // number of integers to read
        int[] a = new int[N];
        for (int i = 0; i < N; i++)
            a[i] = StdIn.readInt();           // StdIn: the course's standard-input library
        Arrays.sort(a);                       // system sort for a primitive type
        for (int i = 0; i < N; i++)
            System.out.println(a[i]);
    }
}

Q. Why use two different sorts?
A. Use of primitive types indicates time and space are critical: quicksort.
A. Use of objects indicates time and space are not so critical: mergesort provides worst-case guarantee and stability.
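Arrays.sort() on an object array is documented to be stable. Repeating the two-pass sort from the table with it therefore keeps section 3 in name order (Andrews, Furia, Kanaga, Rohde). The sketch below uses a hypothetical, stripped-down Student type that keeps only the name and section columns. Its BY_NAME and BY_SECT are built with Comparator.comparing() and Comparator.comparingInt() rather than the nested-class style shown earlier.

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    // Hypothetical, stripped-down Student: only the name and section columns of the table.
    static class Student {
        final String name;
        final int section;
        Student(String name, int section) { this.name = name; this.section = section; }
        @Override public String toString() { return name + " " + section; }
    }

    static final Comparator<Student> BY_NAME = Comparator.comparing(s -> s.name);
    static final Comparator<Student> BY_SECT = Comparator.comparingInt(s -> s.section);

    // Sorts by name, then by section, with the system sort for object arrays.
    static Student[] sortByNameThenSection() {
        Student[] students = {
            new Student("Rohde", 3),   new Student("Battle", 4), new Student("Kanaga", 3),
            new Student("Andrews", 3), new Student("Chen", 2),   new Student("Gazsi", 4),
            new Student("Fox", 1),     new Student("Furia", 3)
        };
        Arrays.sort(students, BY_NAME);   // first, sort by name
        Arrays.sort(students, BY_SECT);   // then by section; stability keeps name order within a section
        return students;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortByNameThenSection()));
        // [Fox 1, Chen 2, Andrews 3, Furia 3, Kanaga 3, Rohde 3, Battle 4, Gazsi 4]
    }
}
```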
Annoying fact. Many useful sorting algorithms are unstable.
Easy solutions.
- add an integer rank to the key
- careful implementation of mergesort
Open: Stable, in-place, optimal, practical sort??

Groups of 3:  R A M | G X K | B J E
Medians:      M K E
Ninther:      K
Why use ninther?
- better partitioning than sampling
- quick and easy to implement with macros
- less costly than random
Good idea? Stay tuned.
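A minimal sketch of Tukey's ninther as drawn above: take the median of each group of three, then the median of those three medians. For arrays longer than nine keys, it assumes the nine samples come from evenly spaced positions. That sampling choice is an illustration, not necessarily how any particular system sort picks its samples. The class name is hypothetical.

```java
public class Ninther {
    // Median of three keys, using at most 3 compares.
    static <Key extends Comparable<Key>> Key median3(Key a, Key b, Key c) {
        if (a.compareTo(b) < 0) {
            if (b.compareTo(c) < 0) return b;        // a < b < c
            return (a.compareTo(c) < 0) ? c : a;     // a < b, c <= b: larger of a and c
        } else {
            if (a.compareTo(c) < 0) return a;        // b <= a < c
            return (b.compareTo(c) < 0) ? c : b;     // b <= a, c <= a: larger of b and c
        }
    }

    // Median of the medians of three groups of three keys,
    // sampled from nine evenly spaced positions (a must hold at least 9 keys).
    static <Key extends Comparable<Key>> Key ninther(Key[] a) {
        if (a.length < 9) throw new IllegalArgumentException("need at least 9 keys");
        int n = a.length;
        int[] p = new int[9];
        for (int i = 0; i < 9; i++) p[i] = (int) ((long) i * (n - 1) / 8);
        Key m1 = median3(a[p[0]], a[p[1]], a[p[2]]);
        Key m2 = median3(a[p[3]], a[p[4]], a[p[5]]);
        Key m3 = median3(a[p[6]], a[p[7]], a[p[8]]);
        return median3(m1, m2, m3);
    }

    public static void main(String[] args) {
        Character[] keys = {'R', 'A', 'M', 'G', 'X', 'K', 'B', 'J', 'E'};
        System.out.println(ninther(keys));   // K
    }
}
```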
A final caution
Consequences.
- Confirms theoretical possibility.
- Algorithmic complexity attack: you enter linear amount of data; server performs quadratic amount of work.
A final caution
A killer input. Blows function call stack in Java and crashes program.

% more 250000.txt
0
218750
222662
11
166672
247070
83339
156253
...

% java IntegerSort < 250000.txt
Exception in thread "main" java.lang.StackOverflowError
    at java.util.Arrays.sort1(Arrays.java:562)
    at java.util.Arrays.sort1(Arrays.java:606)
    at java.util.Arrays.sort1(Arrays.java:608)
    at java.util.Arrays.sort1(Arrays.java:608)
    at java.util.Arrays.sort1(Arrays.java:608)
    . . .