Advanced Sorts

Uploaded by

Amritesh Kumar
Copyright
© Attribution Non-Commercial (BY-NC)

Advanced Topics in Sorting

• complexity
• selection
• system sorts
• equal keys
• comparators

References: Algorithms in Java, Chapter 8
Intro to Algs and Data Structs, Chapter 3

Copyright © 2007 by Robert Sedgewick and Kevin Wayne.

Complexity of sorting

Computational complexity. Framework to study efficiency of algorithms for solving a particular problem X.

Machine model. Focus on fundamental operations.
Upper bound. Cost guarantee provided by some algorithm for X.
Lower bound. Proven limit on cost guarantee of any algorithm for X.
Optimal algorithm. Algorithm with best cost guarantee for X (lower bound ~ upper bound).

Example: sorting.
• Machine model = # comparisons (access information only through compares).
• Upper bound = N lg N from mergesort.
• Lower bound ?

Decision Tree

                        a < b
                  yes /       \ no
                b < c           a < c
           yes /    \ no   yes /    \ no
            abc    a < c    bac    b < c
              yes /   \ no    yes /   \ no
               acb     cab     bca     cba

Each edge stands for the code between comparisons (e.g., sequence of exchanges).
Comparison-based lower bound for sorting

Theorem. Any comparison-based sorting algorithm must use more than
N lg N - 1.44 N comparisons in the worst case.

Pf.
• Assume input consists of N distinct values a1 through aN.
• Worst case dictated by tree height h.
• N! different orderings.
• (At least) one leaf corresponds to each ordering.
• Binary tree with N! leaves cannot have height less than lg (N!).

h ≥ lg N!
  ≥ lg (N / e)^N          (Stirling's formula)
  = N lg N - N lg e
  ≈ N lg N - 1.44 N       (lg e ≈ 1.44)

Drawbacks of complexity results

Mergesort is optimal (to within a small additive factor)

Other operations?
• statement is only about number of compares
• quicksort is faster than mergesort (lower use of other operations)

Space?
• mergesort is not optimal with respect to space usage
• insertion sort, selection sort, shellsort, quicksort are space-optimal
• is there an algorithm that is both time- and space-optimal? (stay tuned for radix sorts)

Is my case the worst case?
• statement is only about guaranteed worst-case performance
• quicksort’s probabilistic guarantee is just as good in practice

Lessons
• use theory as a guide (don’t try to design an algorithm that uses half as many compares as mergesort)
• know your algorithms (use quicksort when time and space are critical)
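To make the bound concrete, here is a small sketch that evaluates the decision-tree height bound, the ceiling of lg(N!), next to the Stirling-based estimate N lg N - N lg e. The class and method names are made up for illustration and are not part of the lecture code.

```java
// Sketch only: names are illustrative, not from the slides.
class SortLowerBound {
    // lg(N!) computed as a sum of base-2 logarithms, which avoids overflowing N!.
    static double lgFactorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++)
            sum += Math.log(i) / Math.log(2);
        return sum;
    }

    // Height of the shortest binary tree with N! leaves: ceil(lg N!).
    // The tiny epsilon guards against floating-point noise when lg N! is a whole number.
    static long minCompares(int n) {
        return (long) Math.ceil(lgFactorial(n) - 1e-9);
    }

    // Stirling-based estimate N lg N - N lg e.
    static double stirlingEstimate(int n) {
        double ln2 = Math.log(2);
        return n * Math.log(n) / ln2 - n / ln2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 10, 100, 1000})
            System.out.printf("N=%d  ceil(lg N!)=%d  N lg N - N lg e=%.1f%n",
                              n, minCompares(n), stirlingEstimate(n));
    }
}
```

For N = 3 the bound is 3 compares, which matches the height of the decision tree for three keys.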

Complexity of sorting

Upper bound. Cost guarantee provided by some algorithm for X.
Lower bound. Proven limit on cost guarantee of any algorithm for X.
Optimal algorithm. Algorithm with best cost guarantee for X.

Example: sorting.
• Machine model = # comparisons
• Upper bound = N lg N (mergesort)
• Lower bound = N lg N - 1.44 N

Mergesort is optimal (to within a small additive factor): lower bound ~ upper bound

First goal of algorithm design: optimal algorithms

More drawbacks of complexity results

Lower bound may not hold if the algorithm has information about
• the key values
• their initial arrangement

Partially ordered arrays. Depending on the initial order of the input, we may not need N lg N compares.
(insertion sort requires O(N) compares on an already sorted array)

Duplicate keys. Depending on the input distribution of duplicates, we may not need N lg N compares.
(stay tuned for 3-way quicksort)

Digital properties of keys. We can use digit/character comparisons instead of key comparisons for numbers and strings.
(stay tuned for radix sorts)
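The remark about partially ordered arrays can be checked by counting compares. The sketch below is an instrumented insertion sort on int arrays; the class name and the counting approach are assumptions made for illustration.

```java
// Sketch only: an insertion sort that counts key compares.
class InsertionCompareCount {
    // Sorts a in place and returns the number of key compares performed.
    static int sortAndCount(int[] a) {
        int compares = 0;
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0; j--) {
                compares++;                       // one compare of a[j] with a[j-1]
                if (a[j] < a[j - 1]) {
                    int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
                } else {
                    break;                        // a[j] is already in place
                }
            }
        }
        return compares;
    }
}
```

On an already sorted array of N keys this performs N - 1 compares; on a reverse-ordered array it performs N(N - 1)/2.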
Selection: quick-select algorithm

Partition array so that:
• element a[m] is in place
• no larger element to the left of m
• no smaller element to the right of m

Repeat in one subarray, depending on m:
• if k is to the left of m, set r to m-1
• if k is to the right of m, set l to m+1

Finished when m = k: a[k] is in place, no larger element to the left, no smaller element to the right.

public static void select(Comparable[] a, int k)
{
   StdRandom.shuffle(a);
   int l = 0;
   int r = a.length - 1;
   while (r > l)
   {
      int m = partition(a, l, r);   // a[m] is now in its final position
      if      (m > k) r = m - 1;
      else if (m < k) l = m + 1;
      else return;
   }
}
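The slide's select() depends on partition() and StdRandom, which are defined elsewhere. As a self-contained sketch of the same idea, the version below works on int arrays, uses java.util.Random for the shuffle, and returns the selected value; those choices and all names are assumptions for illustration, not the lecture's code.

```java
import java.util.Random;

// Sketch only: quick-select on int[] with a 0-based rank k.
class QuickSelectSketch {
    // Rearranges a so that a[k] is the element of rank k, then returns it.
    static int select(int[] a, int k) {
        shuffle(a);                                // random order gives the probabilistic guarantee
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int m = partition(a, lo, hi);
            if      (m > k) hi = m - 1;
            else if (m < k) lo = m + 1;
            else break;
        }
        return a[k];
    }

    // Partitions a[lo..hi] around v = a[hi]; returns the final index of v.
    private static int partition(int[] a, int lo, int hi) {
        int v = a[hi];
        int i = lo - 1, j = hi;
        while (true) {
            while (a[++i] < v) ;                   // stops at hi at the latest
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, hi);
        return i;
    }

    // Fisher-Yates shuffle.
    private static void shuffle(int[] a) {
        Random rnd = new Random();
        for (int i = a.length - 1; i > 0; i--)
            swap(a, i, rnd.nextInt(i + 1));
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

For example, calling select on {7, 3, 9, 1, 5, 3, 8} with k = 3 returns 5, the median.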

Selection

Find the kth smallest element.
• Min: k = 1.
• Max: k = N.
• Median: k = N/2.

Applications.
• Order statistics.
• Find the “top k”

Use theory as a guide
• easy O(N log N) upper bound: sort, return a[k]
• easy O(N) upper bound for some k: min, max
• easy O(N) lower bound: must examine every element

Which is true?
• O(N log N) lower bound? [is selection as hard as sorting?]
• O(N) upper bound? [linear algorithm for all k]

Selection analysis

Theorem. Quick-select takes linear time on average.
Pf.
• Intuitively, each partitioning step roughly splits array in half.
• N + N/2 + N/4 + … + 1 ~ 2N comparisons.
• Formal analysis similar to quicksort analysis:

  C_N = 2N + 2k ln(N / k) + 2(N - k) ln(N / (N - k))

  Ex: (2 + 2 ln 2) N comparisons to find the median

Note. Might use ~N^2/2 comparisons, but as with quicksort, the random shuffle provides a probabilistic guarantee.

Theorem. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] There exists a selection algorithm that takes linear time in the worst case.

Note. Algorithm is far too complicated to be useful in practice.

Use theory as a guide
• still worthwhile to seek practical linear-time (worst-case) algorithm
• until one is discovered, use quick-select if you don’t need a full sort
System sort: Which algorithm to use?

Many sorting algorithms to choose from

internal sorts.
• Insertion sort, selection sort, bubblesort, shaker sort.
• Quicksort, mergesort, heapsort, samplesort, shellsort.
• Solitaire sort, red-black sort, splaysort, Dobosiewicz sort, psort, ...

external sorts. Poly-phase mergesort, cascade-merge, oscillating sort.

radix sorts.
• Distribution, MSD, LSD.
• 3-way radix quicksort.

parallel sorts.
• Bitonic sort, Batcher even-odd sort.
• Smooth sort, cube sort, column sort.
• GPUsort.

Sorting Applications

Sorting algorithms are essential in a broad variety of applications

obvious applications
• Sort a list of names.
• Organize an MP3 library.
• Display Google PageRank results.
• List RSS news items in reverse chronological order.

problems become easy once items are in sorted order
• Find the median.
• Find the closest pair.
• Binary search in a database.
• Identify statistical outliers.
• Find duplicates in a mailing list.

non-obvious applications
• Data compression.
• Computer graphics.
• Computational biology.
• Supply chain management.
• Load balancing on a parallel computer.
...

Every system needs (and has) a system sort!

System sort: Which algorithm to use?

Applications have diverse attributes
• Stable?
• Multiple keys?
• Deterministic?
• Keys all distinct?
• Multiple key types?
• Linked list or arrays?
• Large or small records?
• Is your file randomly ordered?
• Need guaranteed performance?

(many more combinations of attributes than algorithms)

Elementary sort may be method of choice for some combination.
Cannot cover all combinations of attributes.

Q. Is the system sort good enough?
A. Maybe (no matter which algorithm it uses).
Duplicate keys: the problem

Assume all keys are equal.
Recursive code guarantees that case will predominate!

Mistake: Put all keys equal to the partitioning element on one side
• easy to code
• guarantees N^2 running time when all keys equal

B A A B A B C C B C B        A A A A A A A A A A A

Recommended: Stop scans on keys equal to the partitioning element
• easy to code
• guarantees N lg N compares when all keys equal

B A A B A B C C B C B        A A A A A A A A A A A

Desirable: Put all keys equal to the partitioning element in place

A A A B B B B B C C C        A A A A A A A A A A A

Common wisdom to 1990s: not worth adding code to inner loop

Duplicate keys

Often, purpose of sort is to bring records with duplicate keys together.
• Sort population by age.
• Finding collinear points.
• Remove duplicates from mailing list.
• Sort job applicants by college attended.

Typical characteristics of such applications.
• Huge file.
• Small number of key values.

Mergesort with duplicate keys: always ~ N lg N compares

Quicksort with duplicate keys
• algorithm goes quadratic unless partitioning stops on equal keys!
• [many textbook and system implementations have this problem]
• 1990s Unix user found this problem in qsort()

3-Way Partitioning

3-way partitioning. Partition elements into 3 parts:
• Elements between i and j equal to partition element v.
• No larger elements to left of i.
• No smaller elements to right of j.

Dutch national flag problem.
• not done in practical sorts before mid-1990s.
• new approach discovered when fixing mistake in Unix qsort()
• now incorporated into Java system sort
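One way to establish the three-part arrangement described above is Dijkstra's single-pass scheme for the Dutch national flag problem. Note that this is a different method from the Bentley-McIlroy partitioning these slides present; it is sketched here on int arrays, without the random shuffle, and with illustrative names.

```java
// Sketch only: Dijkstra-style 3-way partitioning, not the Bentley-McIlroy code.
class ThreeWayPartitionSketch {
    // Partitions a[lo..hi] around v = a[lo] and returns {lt, gt} such that
    // a[lo..lt-1] < v, a[lt..gt] == v, and a[gt+1..hi] > v.
    static int[] partition(int[] a, int lo, int hi) {
        int v = a[lo];
        int lt = lo, i = lo + 1, gt = hi;
        while (i <= gt) {
            if      (a[i] < v) swap(a, lt++, i++);   // grow the "less" region
            else if (a[i] > v) swap(a, i, gt--);     // grow the "greater" region
            else               i++;                  // equal key stays in the middle
        }
        return new int[] {lt, gt};
    }

    // Quicksort that never recurses into the block of keys equal to v.
    static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int[] bounds = partition(a, lo, hi);
        sort(a, lo, bounds[0] - 1);
        sort(a, bounds[1] + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

When all keys are equal, one partitioning pass puts every key in the middle block and both recursive calls return immediately, so the cost is linear.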
Solution to Dutch national flag problem.

3-way partitioning (Bentley-McIlroy).
• Partition elements into 4 parts:
  – no larger elements to left of i
  – no smaller elements to right of j
  – equal elements to left of p
  – equal elements to right of q
• Afterwards, swap equal keys into center.

All the right properties.
• in-place.
• not much code.
• linear if keys are all equal.
• small overhead if no equal keys.

Duplicate keys: lower bound

Theorem. [Sedgewick-Bentley] Quicksort with 3-way partitioning is optimal for random keys with duplicates.

Proof (beyond scope of 226).
• generalize decision tree
• tie cost to entropy
• note: cost is linear when number of key values is O(1)

Bottom line: Randomized Quicksort with 3-way partitioning reduces cost from quadratic to linear (!) in broad class of applications

3-way Quicksort: Java Implementation

private static void sort(Comparable[] a, int l, int r)
{
   if (r <= l) return;
   int i = l-1, j = r;
   int p = l-1, q = r;

   // 4-way partitioning
   while(true)
   {
      while (less(a[++i], a[r])) ;
      while (less(a[r], a[--j])) if (j == l) break;
      if (i >= j) break;
      exch(a, i, j);
      // swap equal keys to left or right
      if (eq(a[i], a[r])) exch(a, ++p, i);
      if (eq(a[j], a[r])) exch(a, --q, j);
   }
   exch(a, i, r);

   // swap equal keys back to middle
   j = i - 1;
   i = i + 1;
   for (int k = l  ; k <= p; k++) exch(a, k, j--);
   for (int k = r-1; k >= q; k--) exch(a, k, i++);

   // recursively sort left and right
   sort(a, l, j);
   sort(a, i, r);
}

3-way partitioning animation

[Animation not reproduced; its frames mark the indices p, i, j, and q.]
Generalized compare

Comparable interface: sort uses type’s compareTo() function:

Problem 1: May want to use a different order.
Problem 2: Some types may have no “natural” order.

Ex. Sort strings by:
• Natural order.       Now is the time
• Case insensitive.    is Now the time
• French.              real réal rico
• Spanish.             café cuidado champiñón dulce   (ch and rr are single letters)

import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

String[] a;
...
Arrays.sort(a);
Arrays.sort(a, String.CASE_INSENSITIVE_ORDER);
Arrays.sort(a, Collator.getInstance(Locale.FRENCH));
Arrays.sort(a, Collator.getInstance(Locale.SPANISH));

Generalized compare

Comparable interface: sort uses type’s compareTo() function:

public class Date implements Comparable<Date>
{
   private int month, day, year;

   public Date(int m, int d, int y)
   {
      month = m;
      day   = d;
      year  = y;
   }

   public int compareTo(Date b)
   {
      Date a = this;
      if (a.year  < b.year ) return -1;
      if (a.year  > b.year ) return +1;
      if (a.month < b.month) return -1;
      if (a.month > b.month) return +1;
      if (a.day   < b.day  ) return -1;
      if (a.day   > b.day  ) return +1;
      return 0;
   }
}

Generalized compare

Comparable interface: sort uses type’s compareTo() function:

Problem 1: May want to use a different order.
Problem 2: Some types may have no “natural” order.

Solution: Use Comparator interface

Comparator interface. Require a method compare() so that compare(v, w) is a total order that behaves like compareTo().

Advantage. Separates the definition of the data type from definition of what it means to compare two objects of that type.
• add any number of new orders to a data type.
• add an order to a library data type with no natural order.
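As one illustration of adding a new order to a library type, the sketch below orders strings by length using java.util.Comparator. The class name ByLength and the helper sortedByLength are hypothetical; only Comparator, Arrays.sort, and Integer.compare come from the standard library.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch only: a hypothetical comparator that orders strings by length.
class ByLength implements Comparator<String> {
    public int compare(String v, String w) {
        // Integer.compare returns a negative, zero, or positive value.
        return Integer.compare(v.length(), w.length());
    }

    // Returns a sorted copy so the caller's array is left untouched.
    static String[] sortedByLength(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy, new ByLength());
        return copy;
    }
}
```

Sorting {"pear", "fig", "banana", "kiwi"} this way gives fig, pear, kiwi, banana. The String class itself is unchanged; the order lives entirely in the comparator.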
Generalized compare

Comparable interface: sort uses type’s compareTo() function:

Problem 1: May want to use a different order.
Problem 2: Some types may have no “natural” order.

Solution: Use Comparator interface

Example:

import java.util.Comparator;

public class ReverseOrder implements Comparator<String>
{
   public int compare(String a, String b)
   {  return - a.compareTo(b);  }   // reverse sense of comparison
}

...
Arrays.sort(a, new ReverseOrder());
...

Generalized compare

Comparators enable multiple sorts of single file (different keys)

Example. Enable sorting students by name or by section.

Arrays.sort(students, Student.BY_NAME);
Arrays.sort(students, Student.BY_SECT);

sort by name

Andrews  3  A  664-480-0023  097 Little
Battle   4  C  874-088-1212  121 Whitman
Chen     2  A  991-878-4944  308 Blair
Fox      1  A  884-232-5341  11 Dickinson
Furia    3  A  766-093-9873  101 Brown
Gazsi    4  B  665-303-0266  22 Brown
Kanaga   3  B  898-122-9643  22 Brown
Rohde    3  A  232-343-5555  343 Forbes

then sort by section

Fox      1  A  884-232-5341  11 Dickinson
Chen     2  A  991-878-4944  308 Blair
Andrews  3  A  664-480-0023  097 Little
Furia    3  A  766-093-9873  101 Brown
Kanaga   3  B  898-122-9643  22 Brown
Rohde    3  A  232-343-5555  343 Forbes
Battle   4  C  874-088-1212  121 Whitman
Gazsi    4  B  665-303-0266  22 Brown

Generalized compare

Easy modification to support comparators in our sort implementations
• pass comparator to sort(), less()
• use it in less

Example: (insertion sort)

public static void sort(Object[] a, Comparator comparator)
{
   int N = a.length;
   for (int i = 0; i < N; i++)
      for (int j = i; j > 0; j--)
         if (less(comparator, a[j], a[j-1]))
            exch(a, j, j-1);
         else break;
}

private static boolean less(Comparator c, Object v, Object w)
{  return c.compare(v, w) < 0;  }

private static void exch(Object[] a, int i, int j)
{  Object t = a[i]; a[i] = a[j]; a[j] = t;  }

Generalized compare

Comparators enable multiple sorts of single file (different keys)

Example. Enable sorting students by name or by section.

public class Student
{
   public static final Comparator<Student> BY_NAME = new ByName();
   public static final Comparator<Student> BY_SECT = new BySect();

   private String name;
   private int section;
   ...

   private static class ByName implements Comparator<Student>
   {
      public int compare(Student a, Student b)
      {  return a.name.compareTo(b.name);  }
   }

   private static class BySect implements Comparator<Student>
   {
      public int compare(Student a, Student b)
      {  return a.section - b.section;  }   // only use this trick if no danger of overflow
   }
}
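The overflow caveat on the subtraction trick is easy to reproduce. The sketch below contrasts a subtraction-based comparator with one built on Integer.compare; the class and field names are made up for illustration.

```java
import java.util.Comparator;

// Sketch only: why "a - b" is risky as a comparator result.
class OverflowDemo {
    // Subtraction wraps around when the true difference does not fit in an int.
    static final Comparator<Integer> BY_SUBTRACTION = new Comparator<Integer>() {
        public int compare(Integer a, Integer b) { return a - b; }
    };

    // Integer.compare never overflows.
    static final Comparator<Integer> SAFE = new Comparator<Integer>() {
        public int compare(Integer a, Integer b) { return Integer.compare(a, b); }
    };

    public static void main(String[] args) {
        // MIN_VALUE - 1 wraps to MAX_VALUE, so the sign is wrong.
        System.out.println(BY_SUBTRACTION.compare(Integer.MIN_VALUE, 1) > 0);  // prints true
        System.out.println(SAFE.compare(Integer.MIN_VALUE, 1) < 0);            // prints true
    }
}
```

Section numbers are small and non-negative, so the subtraction in BySect is safe there; for arbitrary int keys, Integer.compare is the safer choice.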
Generalized compare problem

A typical application
• first, sort by name
• then, sort by section

Arrays.sort(students, Student.BY_NAME);

Andrews  3  A  664-480-0023  097 Little
Battle   4  C  874-088-1212  121 Whitman
Chen     2  A  991-878-4944  308 Blair
Fox      1  A  884-232-5341  11 Dickinson
Furia    3  A  766-093-9873  101 Brown
Gazsi    4  B  665-303-0266  22 Brown
Kanaga   3  B  898-122-9643  22 Brown
Rohde    3  A  232-343-5555  343 Forbes

Arrays.sort(students, Student.BY_SECT);

Fox      1  A  884-232-5341  11 Dickinson
Chen     2  A  991-878-4944  308 Blair
Kanaga   3  B  898-122-9643  22 Brown
Andrews  3  A  664-480-0023  097 Little
Furia    3  A  766-093-9873  101 Brown
Rohde    3  A  232-343-5555  343 Forbes
Battle   4  C  874-088-1212  121 Whitman
Gazsi    4  B  665-303-0266  22 Brown

@#%&@!! Students in section 3 no longer in order by name.

A stable sort preserves the relative order of records with equal keys.
Is the system sort stable?

Java system sorts

Use theory as a guide: Java uses both mergesort and quicksort.
• Can sort array of type Comparable or any primitive type.
• Uses quicksort for primitive types.
• Uses mergesort for objects.

import java.util.Arrays;
public class IntegerSort
{
   public static void main(String[] args)
   {
      int N = Integer.parseInt(args[0]);
      int[] a = new int[N];
      for (int i = 0; i < N; i++)
         a[i] = StdIn.readInt();
      Arrays.sort(a);
      for (int i = 0; i < N; i++)
         System.out.println(a[i]);
   }
}

Q. Why use two different sorts?
A. Use of primitive types indicates time and space are critical: quicksort.
A. Use of objects indicates time and space not so critical: mergesort provides worst-case guarantee and stability.
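The definition of stability can be tried directly. The Java documentation guarantees that Arrays.sort on object arrays is stable, so sorting by name and then by section should leave each section in name order. The sketch below uses a pared-down, hypothetical Student class with only the two fields that matter.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch only: a reduced Student record to show the effect of a stable sort.
class StableSortDemo {
    static class Student {
        final String name;
        final int section;
        Student(String name, int section) { this.name = name; this.section = section; }
    }

    static final Comparator<Student> BY_NAME = new Comparator<Student>() {
        public int compare(Student a, Student b) { return a.name.compareTo(b.name); }
    };
    static final Comparator<Student> BY_SECT = new Comparator<Student>() {
        public int compare(Student a, Student b) { return Integer.compare(a.section, b.section); }
    };

    // Sorts by name, then by section, and returns the names in final order.
    static String[] namesAfterBothSorts() {
        Student[] s = {
            new Student("Furia", 3), new Student("Andrews", 3), new Student("Fox", 1),
            new Student("Rohde", 3), new Student("Chen", 2), new Student("Kanaga", 3),
            new Student("Battle", 4), new Student("Gazsi", 4)
        };
        Arrays.sort(s, BY_NAME);
        Arrays.sort(s, BY_SECT);   // stable: ties on section keep their name order
        String[] names = new String[s.length];
        for (int i = 0; i < s.length; i++) names[i] = s[i].name;
        return names;
    }
}
```

The result is Fox, Chen, Andrews, Furia, Kanaga, Rohde, Battle, Gazsi: section 3 stays in name order. An unstable sort gives no such assurance.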

Stability

Q. Which sorts are stable?
• Selection sort?
• Insertion sort?
• Shellsort?
• Quicksort?
• Mergesort?

A. Careful look at code required.

Annoying fact. Many useful sorting algorithms are unstable.

Easy solutions.
• add an integer rank to the key
• careful implementation of mergesort

Open: Stable, inplace, optimal, practical sort??

Arrays.sort() for primitive types

Bentley-McIlroy. [Engineering a Sort Function]
• Original motivation: improve qsort() function in C.
• Basic algorithm = 3-way quicksort with cutoff to insertion sort.
• Partition on Tukey's ninther: median-of-3 elements, each of which is a median-of-3 elements (approximate median-of-9).

nine evenly spaced elements    R A M G X K B J E
groups of 3                    R A M   G X K   B J E
medians                        M K E
ninther                        K

Why use ninther?
• better partitioning than sampling
• quick and easy to implement with macros
• less costly than random

Good idea? Stay tuned.
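The ninther computation in the example above can be written in a few lines. This sketch takes the nine sampled elements directly as a char array; picking nine evenly spaced positions from a larger array is left out, and the names are illustrative.

```java
// Sketch only: Tukey's ninther on nine already-sampled elements.
class NintherSketch {
    // Median of three values using at most three compares.
    static char median3(char a, char b, char c) {
        if (a < b) {
            if (b < c) return b;        // a < b < c
            else if (a < c) return c;   // a < c <= b
            else return a;              // c <= a < b
        } else {
            if (a < c) return a;        // b <= a < c
            else if (b < c) return c;   // b < c <= a
            else return b;              // c <= b <= a
        }
    }

    // Median of the three group medians; s must hold exactly nine elements.
    static char ninther(char[] s) {
        return median3(median3(s[0], s[1], s[2]),
                       median3(s[3], s[4], s[5]),
                       median3(s[6], s[7], s[8]));
    }
}
```

For the sample R A M G X K B J E the group medians are M, K, and E, and the ninther is K.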
A final caution

Based on all this research, Java’s system sort is solid, right?

McIlroy's devious idea. [A Killer Adversary for Quicksort]
• Construct malicious input while running system quicksort, in response to elements compared.
• If p is pivot, commit to (x < p) and (y < p), but don't commit to (x < y) or (x > y) until x and y are compared.

Consequences.
• Confirms theoretical possibility.
• Algorithmic complexity attack: you enter linear amount of data; server performs quadratic amount of work.

A final caution

A killer input. Blows function call stack in Java and crashes program.
(more disastrous possibilities in C)

% more 250000.txt
0
218750
222662
11
166672
247070
83339
156253
...

(250,000 integers between 0 and 250,000)

% java IntegerSort < 250000.txt
Exception in thread "main" java.lang.StackOverflowError
   at java.util.Arrays.sort1(Arrays.java:562)
   at java.util.Arrays.sort1(Arrays.java:606)
   at java.util.Arrays.sort1(Arrays.java:608)
   at java.util.Arrays.sort1(Arrays.java:608)
   at java.util.Arrays.sort1(Arrays.java:608)
   . . .

Java's sorting library crashes, even if you give it as much stack space as Windows allows.

Achilles heel: no guarantees in implementation (randomization is required)
