Selection
The Selection Problem
Given an integer k and n elements x1, x2, …, xn, taken from a total order, find the k-th smallest element in this set. Of course, we can sort the set in O(n log n) time and then index the k-th element.
Example: k = 3, S = (7 4 9 6 2); sorted: (2 4 6 7 9); the 3rd smallest element is 6.
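As a baseline, the sort-then-index approach can be sketched in a few lines of Python. This is a minimal sketch: the function name `select_by_sorting` is hypothetical, and k is taken as 1-based to match the example.

```python
def select_by_sorting(S, k):
    """Return the k-th smallest element of S (k is 1-based) in O(n log n) time."""
    if not 1 <= k <= len(S):
        raise ValueError("k must satisfy 1 <= k <= len(S)")
    return sorted(S)[k - 1]  # sort, then index the k-th element


print(select_by_sorting([7, 4, 9, 6, 2], 3))  # 6
```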
Can we solve the selection problem faster?
Quick-Select (§4.7)
Quick-select is a randomized selection algorithm based on the prune-and-search paradigm:
Prune: pick a random element x (called the pivot) and partition S into
  L, the elements less than x,
  E, the elements equal to x,
  G, the elements greater than x.
Search: depending on k, either the answer is in E, or we need to recurse in either L or G:
  if k ≤ |L|, recurse on L with the same k;
  if |L| < k ≤ |L| + |E|, we are done: the answer is x;
  if k > |L| + |E|, recurse on G with k' = k - |L| - |E|.
Partition
We partition an input sequence as in the quick-sort algorithm:
  We remove, in turn, each element y from S, and we insert y into L, E, or G, depending on the result of the comparison with the pivot x.
  Each insertion and removal is at the beginning or at the end of a sequence, and hence takes O(1) time.
  Thus, the partition step of quick-select takes O(n) time.
Algorithm partition(S, p)
    Input sequence S, position p of pivot
    Output subsequences L, E, G of the elements of S less than, equal to, or greater than the pivot, resp.
    L, E, G ← empty sequences
    x ← S.remove(p)
    E.insertLast(x)    { keep the pivot itself in E }
    while ¬S.isEmpty()
        y ← S.remove(S.first())
        if y < x
            L.insertLast(y)
        else if y = x
            E.insertLast(y)
        else { y > x }
            G.insertLast(y)
    return L, E, G
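A minimal Python sketch of partition and quick-select, assuming S is a Python list and k is 1-based; the names `partition` and `quick_select` are illustrative. Unlike the pseudocode, this version builds new lists instead of removing elements from S, so the input is left unchanged, and the pivot lands in E because it compares equal to itself.

```python
import random


def partition(S, p):
    """Split S around the pivot S[p] into lists L, E, G (elements <, ==, > the pivot)."""
    x = S[p]
    L, E, G = [], [], []
    for y in S:  # the pivot itself compares equal, so it lands in E
        if y < x:
            L.append(y)
        elif y == x:
            E.append(y)
        else:  # y > x
            G.append(y)
    return L, E, G


def quick_select(S, k):
    """Return the k-th smallest element of S (k is 1-based); expected O(n) time."""
    if not 1 <= k <= len(S):
        raise ValueError("k must satisfy 1 <= k <= len(S)")
    while True:  # the recursion on L or G is a tail call, written as a loop
        L, E, G = partition(S, random.randrange(len(S)))  # random pivot position
        if k <= len(L):
            S = L                    # answer is in L; k is unchanged
        elif k <= len(L) + len(E):
            return E[0]              # answer equals the pivot
        else:
            k -= len(L) + len(E)     # skip past L and E
            S = G


print(quick_select([7, 4, 9, 6, 2], 3))  # 6
```

Because E always contains the pivot, every iteration shrinks S, and the invariant 1 ≤ k ≤ len(S) keeps `random.randrange` from being called on an empty list.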
Quick-Select Visualization
An execution of quick-select can be visualized by a recursion path. Each node represents a recursive call of quick-select and stores k and the remaining sequence. For example:
  k=5, S=(7 4 9 3 2 6 5 1 8)
  k=2, S=(7 4 9 6 5 8)
  k=2, S=(7 4 6 5)
  k=1, S=(7 6 5)
  answer: 5
Expected Running Time
Consider a recursive call of quick-select on a sequence of size s.
  Good call: the sizes of L and G are each less than 3s/4.
  Bad call: one of L and G has size at least 3s/4.
[Figure: a good call and a bad call of quick-select on the sequence (7 2 9 4 3 7 6 1 9)]
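To replay the example recursion path and label each call good or bad, the Python sketch below replaces random pivots with a fixed, caller-supplied pivot sequence. The function name `quick_select_trace`, the `pivots` parameter, and the pivots 3, 8, 4, 5 are assumptions for illustration: they are one choice consistent with the path, since the example does not say which pivots were drawn.

```python
def quick_select_trace(S, k, pivots):
    """Quick-select for the k-th smallest element of S (k is 1-based).

    `pivots` is a hypothetical sequence of pivot values used in place of random
    choices, so a particular recursion path can be replayed. Returns
    (answer, path), where path holds one (k, sequence, label) entry per call;
    label is "good" when |L| and |G| are both below 3s/4, otherwise "bad".
    """
    path = []
    pivot_iter = iter(pivots)
    while True:  # quick-select is tail-recursive, so a loop replaces the recursion
        s = len(S)
        x = next(pivot_iter)  # assumed to be an element of S
        L = [y for y in S if y < x]
        E = [y for y in S if y == x]
        G = [y for y in S if y > x]
        good = len(L) < 3 * s / 4 and len(G) < 3 * s / 4
        path.append((k, list(S), "good" if good else "bad"))
        if k <= len(L):
            S = L                         # recurse on L, same k
        elif k <= len(L) + len(E):
            return x, path                # answer is the pivot
        else:
            k = k - len(L) - len(E)       # recurse on G with adjusted k
            S = G


answer, path = quick_select_trace([7, 4, 9, 3, 2, 6, 5, 1, 8], 5, [3, 8, 4, 5])
for k, seq, label in path:
    print(k, seq, label)
print(answer)  # 5
```

With these pivots the third call, on (7 4 6 5) with pivot 4, is bad: G = (7 6 5) has size 3, which equals 3s/4 for s = 4.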
A call is good with probability 1/2: half of the possible pivots cause good calls.
[Figure: the 16 possible pivots of a 16-element sequence in sorted order (1 through 16); the middle half are good pivots and the quarter at each end are bad pivots]
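Assuming the s elements are distinct, a pivot of rank r leaves |L| = r - 1 and |G| = s - r, so the claim that half of the pivots cause good calls can be checked by counting ranks. A minimal sketch; the function name `count_good_pivots` is hypothetical.

```python
def count_good_pivots(s):
    """Count pivot ranks r in 1..s that give a good call on s distinct elements.

    A pivot of rank r leaves |L| = r - 1 and |G| = s - r; the call is good
    when both sizes are below 3s/4.
    """
    return sum(1 for r in range(1, s + 1)
               if r - 1 < 3 * s / 4 and s - r < 3 * s / 4)


print(count_good_pivots(16))  # 8
```

For s = 16 the good ranks are 5 through 12, the middle half of the sequence.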
Expected Running Time, Part 2
Probabilistic Fact #1: The expected number of coin tosses required in order to get one head is two.
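Fact #1 can be derived directly, assuming a fair coin with independent tosses. Let X be the number of tosses up to and including the first head; X is geometrically distributed, and the sum uses the identity Σ i p^i = p/(1 - p)^2 for |p| < 1. The same reasoning applies to waiting for a good call, which occurs with probability 1/2.

```latex
\Pr[X = i] = \left(\tfrac{1}{2}\right)^{i-1}\cdot\tfrac{1}{2} = \left(\tfrac{1}{2}\right)^{i},
\qquad
E[X] = \sum_{i \ge 1} i\left(\tfrac{1}{2}\right)^{i}
     = \frac{1/2}{(1 - 1/2)^{2}} = 2 .
```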
Probabilistic Fact #2: Expectation is a linear function:
$$E(X + Y) = E(X) + E(Y), \qquad E(cX) = cE(X)$$
Let T(n) denote the expected running time of quick-select, and let b be a constant such that one call on a sequence of size n, excluding its recursive calls, takes at most bn time. By Fact #2,
$$T(n) \le T(3n/4) + bn \cdot (\text{expected number of calls before a good call})$$
By Fact #1 (a call is good with probability 1/2, just as a toss lands heads with probability 1/2),
$$T(n) \le T(3n/4) + 2bn$$
That is, T(n) is bounded by a geometric series:
$$T(n) \le 2bn + 2b\left(\tfrac{3}{4}\right)n + 2b\left(\tfrac{3}{4}\right)^{2}n + 2b\left(\tfrac{3}{4}\right)^{3}n + \cdots \le \frac{2bn}{1 - 3/4} = 8bn$$
So T(n) is O(n). We can solve the selection problem in O(n) expected time.
Deterministic Selection
We can do selection in O(n) worst-case time.
Main idea: recursively use the selection algorithm itself to find a good pivot for quick-select:
  Divide S into n/5 sets of 5 each.
  Find a median in each set.
  Recursively find the median of the "baby" medians.
[Figure: the n/5 sets of 5 elements, with the minimum size for L and the minimum size for G marked]
See Exercise C-4.24 for details of analysis.
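A minimal Python sketch of the deterministic approach (often called median of medians), assuming a list input and 1-based k; the name `select` is illustrative. Groups are formed from consecutive elements, the last group may have fewer than 5, and each group is simply sorted to find its median. The median of the baby medians is guaranteed to have roughly 3n/10 elements on each side, which is what makes the worst case linear.

```python
def select(S, k):
    """Return the k-th smallest element of S (k is 1-based) in O(n) worst-case time."""
    if not 1 <= k <= len(S):
        raise ValueError("k must satisfy 1 <= k <= len(S)")
    if len(S) <= 5:
        return sorted(S)[k - 1]  # small base case: sort directly
    # Divide S into groups of 5 (the last group may be smaller).
    groups = [S[i:i + 5] for i in range(0, len(S), 5)]
    # Find a median in each group by sorting it (constant time per group).
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursively find the median of the baby medians; use it as the pivot.
    x = select(medians, (len(medians) + 1) // 2)
    L = [y for y in S if y < x]
    E = [y for y in S if y == x]
    G = [y for y in S if y > x]
    if k <= len(L):
        return select(L, k)
    if k <= len(L) + len(E):
        return x
    return select(G, k - len(L) - len(E))


print(select([7, 4, 9, 3, 2, 6, 5, 1, 8], 5))  # 5
```

The pivot is always an element of S, so E is never empty and both recursive calls run on strictly smaller lists.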