Introduction To Algorithms: Order Statistics

Copyright
© Attribution Non-Commercial (BY-NC)
Introduction to Algorithms

6.046J/18.401J
LECTURE 6
Order Statistics
• Randomized divide and conquer
• Analysis of expected time
• Worst-case linear-time order statistics
• Analysis

Prof. Erik Demaine


September 28, 2005 Copyright © 2001-5 by Erik D. Demaine and Charles E. Leiserson L6.1
Order statistics
Select the ith smallest of n elements (the
element with rank i).
• i = 1: minimum;
• i = n: maximum;
• i = ⎣(n+1)/2⎦ or ⎡(n+1)/2⎤: median.
Naive algorithm: Sort and index ith element.
Worst-case running time = Θ(n lg n) + Θ(1) = Θ(n lg n),
using merge sort or heapsort (not quicksort, whose worst case is Θ(n²)).
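The naive algorithm can be sketched in a few lines of Python. This is only an illustration: the function name `naive_select` and the sample data are assumptions, and Python's built-in `sorted` stands in for merge sort or heapsort (it also runs in O(n log n) time in the worst case).

```python
def naive_select(values, i):
    """Return the i-th smallest element (1-indexed rank) by sorting first."""
    if not 1 <= i <= len(values):
        raise ValueError("rank i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]   # sort, then index the i-th element

data = [6, 10, 13, 5, 8, 3, 2, 11]
n = len(data)
minimum = naive_select(data, 1)                   # i = 1
maximum = naive_select(data, n)                   # i = n
lower_median = naive_select(data, (n + 1) // 2)   # i = floor((n+1)/2)
```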
Randomized divide-and-conquer algorithm
RAND-SELECT(A, p, q, i)    ⊳ ith smallest of A[p . . q]
    if p = q then return A[p]
    r ← RAND-PARTITION(A, p, q)
    k ← r – p + 1    ⊳ k = rank(A[r])
    if i = k then return A[r]
    if i < k
        then return RAND-SELECT(A, p, r – 1, i)
        else return RAND-SELECT(A, r + 1, q, i – k)
[Figure: after partitioning, the k – 1 elements of A[p . . r – 1] are ≤ A[r] and the elements of A[r + 1 . . q] are ≥ A[r].]
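A minimal Python sketch of RAND-SELECT follows. The slides do not show RAND-PARTITION, so the `rand_partition` helper below is an assumption: it picks a uniformly random pivot, moves it to the front, and sweeps once over the subarray. The tail recursion of the pseudocode is written as a loop here so that an unlucky run cannot exhaust Python's recursion limit.

```python
import random

def rand_partition(a, p, q):
    """Partition a[p..q] around a uniformly random pivot; return its final index."""
    s = random.randint(p, q)       # randint is inclusive on both ends
    a[p], a[s] = a[s], a[p]        # move the pivot to the front
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]        # pivot lands at index i
    return i

def rand_select(a, p, q, i):
    """Return the i-th smallest (1-indexed) element of a[p..q]; reorders a in place."""
    while True:
        if p == q:
            return a[p]
        r = rand_partition(a, p, q)
        k = r - p + 1              # rank of the pivot within a[p..q]
        if i == k:
            return a[r]
        if i < k:
            q = r - 1              # continue in the lower part
        else:
            p = r + 1              # continue in the upper part
            i -= k
```

A call such as `rand_select(list(data), 0, len(data) - 1, 7)` returns the 7th smallest element of `data` without modifying the original list.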
Example
Select the i = 7th smallest:
    6  10  13  5  8  3  2  11        i = 7, pivot = 6
Partition:
    2  5  3  6  8  13  10  11        k = 4

Select the 7 – 4 = 3rd smallest recursively.
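The example can be replayed in Python. The sketch assumes the array is 6 10 13 5 8 3 2 11, that the pivot is its first element 6, and that partitioning uses a single left-to-right sweep. The `partition` helper is not shown on these slides; it is a hypothetical stand-in that happens to produce the same arrangement.

```python
def partition(a, p, q):
    """Partition a[p..q] in place around the pivot a[p]; return the pivot's final index."""
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i

a = [6, 10, 13, 5, 8, 3, 2, 11]
i = 7
r = partition(a, 0, len(a) - 1)
k = r + 1                           # rank of the pivot (p = 0 here)
upper = a[r + 1:]                   # the part searched recursively
answer = sorted(upper)[i - k - 1]   # (i - k)th = 3rd smallest of the upper part
```

Under these assumptions the partitioned array is 2 5 3 6 8 13 10 11 with k = 4, and the 3rd smallest element of the upper part is 11, which is also the 7th smallest element overall.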


Intuition for analysis
(All our analyses today assume that all elements
are distinct.)
Lucky:
T(n) = T(9n/10) + Θ(n)        ($n^{\log_{10/9} 1} = n^0 = 1$)
     = Θ(n)                   CASE 3
Unlucky:
T(n) = T(n – 1) + Θ(n)        arithmetic series
     = Θ(n²)
Worse than sorting!
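Both bounds can also be seen by unrolling the recurrences. The sketch below writes the Θ(n) term as cn for an assumed constant c > 0 and ignores the base case:

```latex
% Lucky case: the subproblem sizes shrink geometrically.
T(n) \le cn \sum_{j=0}^{\infty} \left(\frac{9}{10}\right)^{j} = 10\,cn = \Theta(n)

% Unlucky case: the subproblem sizes form an arithmetic series.
T(n) = \sum_{k=1}^{n} ck = c\,\frac{n(n+1)}{2} = \Theta(n^2)
```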
Analysis of expected time
The analysis follows that of randomized
quicksort, but it’s a little different.
Let T(n) = the random variable for the running
time of RAND-SELECT on an input of size n,
assuming random numbers are independent.
For k = 0, 1, …, n–1, define the indicator
random variable
$$X_k = \begin{cases} 1 & \text{if PARTITION generates a } k : n-k-1 \text{ split,} \\ 0 & \text{otherwise.} \end{cases}$$
Analysis (continued)
To obtain an upper bound, assume that the ith
element always falls in the larger side of the
partition:
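$$
T(n) = \begin{cases}
T(\max\{0, n-1\}) + \Theta(n) & \text{if } 0 : n-1 \text{ split,} \\
T(\max\{1, n-2\}) + \Theta(n) & \text{if } 1 : n-2 \text{ split,} \\
\qquad \vdots \\
T(\max\{n-1, 0\}) + \Theta(n) & \text{if } n-1 : 0 \text{ split,}
\end{cases}
$$
$$
= \sum_{k=0}^{n-1} X_k \bigl( T(\max\{k, n-k-1\}) + \Theta(n) \bigr).
$$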
Calculating expectation
$$
\begin{aligned}
E[T(n)] &= E\left[ \sum_{k=0}^{n-1} X_k \bigl( T(\max\{k, n-k-1\}) + \Theta(n) \bigr) \right] && \text{(take expectations of both sides)} \\
&= \sum_{k=0}^{n-1} E\bigl[ X_k \bigl( T(\max\{k, n-k-1\}) + \Theta(n) \bigr) \bigr] && \text{(linearity of expectation)} \\
&= \sum_{k=0}^{n-1} E[X_k] \cdot E\bigl[ T(\max\{k, n-k-1\}) + \Theta(n) \bigr] && \text{(independence of } X_k \text{ from other random choices)} \\
&= \frac{1}{n} \sum_{k=0}^{n-1} E\bigl[ T(\max\{k, n-k-1\}) \bigr] + \frac{1}{n} \sum_{k=0}^{n-1} \Theta(n) && \text{(linearity of expectation; } E[X_k] = 1/n \text{)} \\
&\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + \Theta(n) && \text{(upper terms appear twice)}
\end{aligned}
$$
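The last step relies on each index k ≥ ⎣n/2⎦ showing up at most twice among the values max{k, n–k–1}. The following Python sketch checks that inequality numerically for arbitrary nonnegative values standing in for E[T(k)]. The helper names `lhs` and `rhs` are illustrative, and the check is a sanity test rather than a proof.

```python
import random

def lhs(f, n):
    """Sum of f[max(k, n-k-1)] over k = 0, ..., n-1."""
    return sum(f[max(k, n - k - 1)] for k in range(n))

def rhs(f, n):
    """Twice the sum of f[k] over k = floor(n/2), ..., n-1."""
    return 2 * sum(f[k] for k in range(n // 2, n))

rng = random.Random(0)
checks = []
for n in range(1, 200):
    f = [rng.randint(0, 1000) for _ in range(n)]   # arbitrary nonnegative values
    checks.append(lhs(f, n) <= rhs(f, n))
all_hold = all(checks)
```

For even n the two sides are equal. For odd n the middle index ⎣n/2⎦ occurs only once on the left, which is why the step is an inequality.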
Hairy recurrence
(But not quite as hairy as the quicksort one.)
$$E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + \Theta(n)$$
Prove: E[T(n)] ≤ cn for constant c > 0.
• The constant c can be chosen large enough so that E[T(n)] ≤ cn for the base cases.
Use fact: $\sum_{k=\lfloor n/2 \rfloor}^{n-1} k \le \frac{3}{8} n^2$ (exercise).
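Before attempting the exercise, the fact can be checked numerically. The short Python sketch below tests it for n up to 2000. The function name `upper_half_sum` is an assumption, and a finite check of this kind does not replace the proof.

```python
def upper_half_sum(n):
    """Sum of k for k = floor(n/2), ..., n-1."""
    return sum(range(n // 2, n))

# Compare 8 * sum with 3 * n^2 so that everything stays in exact integer arithmetic.
fact_holds = all(8 * upper_half_sum(n) <= 3 * n * n for n in range(1, 2001))
```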
Substitution method
$$
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + \Theta(n) && \text{(substitute inductive hypothesis)} \\
&\le \frac{2c}{n} \left( \frac{3}{8} n^2 \right) + \Theta(n) && \text{(use fact)} \\
&= cn - \left( \frac{cn}{4} - \Theta(n) \right) && \text{(express as desired minus residual)} \\
&\le cn,
\end{aligned}
$$
if c is chosen large enough so that cn/4 dominates the Θ(n).
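The conclusion can be illustrated by tabulating the recurrence exactly. The sketch below assumes the Θ(n) term is exactly a·n with a = 1 and takes T(0) = 0 as the base case. Under those assumptions, c = 4a makes cn/4 match the a·n term, so the bound T(n) ≤ cn should hold for every n. The function name `tabulate_recurrence` is hypothetical.

```python
from fractions import Fraction

def tabulate_recurrence(n_max, a=1):
    """Tabulate T(n) = (2/n) * sum_{k=floor(n/2)}^{n-1} T(k) + a*n, with T(0) = 0."""
    t = [Fraction(0)]
    for n in range(1, n_max + 1):
        upper = sum(t[n // 2:n], Fraction(0))   # T(floor(n/2)) + ... + T(n-1)
        t.append(Fraction(2, n) * upper + a * n)
    return t

a = 1
c = 4 * a                                       # chosen so that cn/4 >= a*n
t = tabulate_recurrence(300, a)
bound_holds = all(t[n] <= c * n for n in range(len(t)))
```

Exact fractions are used instead of floats so that the comparison with cn is not affected by rounding.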
Summary of randomized order-statistic selection
• Works fast: linear expected time.
• Excellent algorithm in practice.
• But, the worst case is very bad: Θ(n²).
Q. Is there an algorithm that runs in linear time in the worst case?
A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].
IDEA: Generate a good pivot recursively.

Worst-case linear-time order statistics
SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⎣n/5⎦ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. if i = k then return x
   elseif i < k
       then recursively SELECT the ith smallest element in the lower part
       else recursively SELECT the (i–k)th smallest element in the upper part
(Step 4 is the same as in RAND-SELECT.)
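The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's implementation: elements are assumed distinct, inputs of at most 5 elements are handled by sorting, the partition builds new lists instead of working in place, and the median of m group medians is taken to be the one of rank ⎣(m+1)/2⎦.

```python
def select(values, i):
    """Return the i-th smallest (1-indexed) of distinct values via median of medians."""
    a = list(values)
    while True:
        n = len(a)
        if n <= 5:
            return sorted(a)[i - 1]            # constant-size base case
        # Step 1: median of each full group of 5, found by sorting 5 elements.
        medians = [sorted(a[j:j + 5])[2] for j in range(0, n - n % 5, 5)]
        # Step 2: recursively select the median x of the floor(n/5) group medians.
        x = select(medians, (len(medians) + 1) // 2)
        # Step 3: partition around x (elements are assumed distinct).
        lower = [v for v in a if v < x]
        upper = [v for v in a if v > x]
        k = len(lower) + 1                     # rank of x
        # Step 4: return x, or continue in the part that contains rank i.
        if i == k:
            return x
        if i < k:
            a = lower
        else:
            a, i = upper, i - k
```

Only the pivot-finding call in Step 2 is a true recursive call here. Step 4 is written as a loop iteration, which leaves the running-time analysis unchanged.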
Choosing the pivot
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⎣n/5⎦ group medians to be the pivot.
[Figure: the 5-element groups, each ordered from lesser to greater.]
Analysis (Assume all elements are distinct.)
At least half the group medians are ≤ x, which is at least ⎣⎣n/5⎦/2⎦ = ⎣n/10⎦ group medians.
• Therefore, at least 3⎣n/10⎦ elements are ≤ x.
• Similarly, at least 3⎣n/10⎦ elements are ≥ x.
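These counts can be checked empirically. The Python sketch below finds the pivot by sorting, which is enough for a check, and counts the elements on each side. It assumes distinct elements, uses only the ⎣n/5⎦ full groups, and takes the median of m group medians to be the one of rank ⎣(m+1)/2⎦. The name `pivot_counts` is hypothetical.

```python
import random

def pivot_counts(a):
    """Return (x, #elements <= x, #elements >= x) for the median-of-medians pivot x."""
    n = len(a)
    medians = sorted(sorted(a[j:j + 5])[2] for j in range(0, n - n % 5, 5))
    x = medians[(len(medians) + 1) // 2 - 1]   # median of the group medians
    return x, sum(v <= x for v in a), sum(v >= x for v in a)

rng = random.Random(0)
bound_holds = True
for n in range(5, 300):
    a = rng.sample(range(10 * n), n)           # n distinct elements
    x, at_most, at_least = pivot_counts(a)
    bound_holds &= at_most >= 3 * (n // 10) and at_least >= 3 * (n // 10)
```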
Minor simplification
• For n ≥ 50, we have 3 ⎣n/10⎦ ≥ n/4.
• Therefore, for n ≥ 50 the recursive call to
SELECT in Step 4 is executed on ≤ 3n/4 elements,
because at least n/4 elements lie on the other side of x.
• Thus, the recurrence for running time
can assume that Step 4 takes time
T(3n/4) in the worst case.
• For n < 50, we know that the worst-case
time is T(n) = Θ(1).
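The threshold n ≥ 50 can be checked with a few lines of Python. The sketch tests 3⎣n/10⎦ ≥ n/4 in integer arithmetic (as 12⎣n/10⎦ ≥ n) over a finite range. The function name is an assumption, and the range is only a sanity check.

```python
def quarter_bound_holds(n):
    """True when 3 * floor(n/10) >= n/4, tested in integer arithmetic."""
    return 12 * (n // 10) >= n

holds_from_50 = all(quarter_bound_holds(n) for n in range(50, 100_000))
failures_below_50 = [n for n in range(1, 50) if not quarter_bound_holds(n)]
```

The inequality already fails at n = 49, where 3⎣49/10⎦ = 12 < 12.25, so 50 is the smallest threshold from which it holds for every larger n.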

Developing the recurrence
SELECT(i, n)    [total: T(n)]
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.    [Θ(n)]
2. Recursively SELECT the median x of the ⎣n/5⎦ group medians to be the pivot.    [T(n/5)]
3. Partition around the pivot x. Let k = rank(x).    [Θ(n)]
4. if i = k then return x    [T(3n/4)]
   elseif i < k
       then recursively SELECT the ith smallest element in the lower part
       else recursively SELECT the (i–k)th smallest element in the upper part
Solving the recurrence
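$$T(n) = T\left(\frac{1}{5} n\right) + T\left(\frac{3}{4} n\right) + \Theta(n)$$
Substitution: T(n) ≤ cn
$$
\begin{aligned}
T(n) &\le \frac{1}{5} cn + \frac{3}{4} cn + \Theta(n) \\
&= \frac{19}{20} cn + \Theta(n) \\
&= cn - \left( \frac{1}{20} cn - \Theta(n) \right) \\
&\le cn,
\end{aligned}
$$
if c is chosen large enough to handle both the Θ(n) and the initial conditions.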
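As a sanity check, the recurrence can be tabulated directly. The sketch below assumes the Θ(n) term is exactly a·n with a = 1, rounds the subproblem sizes down, and uses T(n) = a·n for n < 50 as a stand-in for the constant-time base cases. With these assumptions c = 20a makes cn/20 match the a·n term. The function name `tabulate` is hypothetical.

```python
def tabulate(n_max, a=1, cutoff=50):
    """Tabulate T(n) = T(n // 5) + T(3n // 4) + a*n, with T(n) = a*n below the cutoff."""
    t = [a * n for n in range(cutoff)]
    for n in range(cutoff, n_max + 1):
        t.append(t[n // 5] + t[(3 * n) // 4] + a * n)
    return t

a = 1
c = 20 * a                      # chosen so that cn/20 >= a*n
t = tabulate(100_000, a)
bound_holds = all(t[n] <= c * n for n in range(len(t)))
```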
Conclusions
• Since the work at each level of recursion
is a constant fraction (19/20) smaller, the
work per level is a geometric series
dominated by the linear work at the root.
• In practice, this algorithm runs slowly,
because the constant in front of n is large.
• The randomized algorithm is far more
practical.
Exercise: Why not divide into groups of 3?