Slides04 Selection

The document discusses the selection problem, which involves finding the k-th smallest integer from a set of n integers. It outlines various methods for solving this problem, including one-by-one selection, sorting, and a divide-and-conquer approach. The divide-and-conquer method is further refined by introducing randomness to improve average-case performance, ultimately leading to an expected running time of O(n).

Uploaded by

ez4uke

Divide and Conquer

Selection
Selection Problem

▪ Input: A set 𝑆 of 𝑛 integers 𝑥1, 𝑥2, …, 𝑥𝑛 and an integer 𝑘

▪ Output: The 𝑘-th smallest integer 𝑥∗ among 𝑥1, 𝑥2, …, 𝑥𝑛
One-by-one Selection

▪ Plan 1
– Select the smallest integer. 𝑂(𝑛)
– Select the 2nd smallest integer. 𝑂(𝑛 − 1)
– …
– Select the 𝑘-th smallest integer. 𝑂(𝑛 − 𝑘 + 1)

– Total running time: 𝑶(𝒏𝒌)
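Plan 1 can be sketched in a few lines of Python. This is only an illustration, not code from the slides: the function name `select_one_by_one` and the choice to copy the input list are assumptions made here.

```python
def select_one_by_one(values, k):
    """Return the k-th smallest (1-indexed) value by removing the minimum k times."""
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    remaining = list(values)  # work on a copy so the caller's list is untouched
    smallest = None
    for _ in range(k):
        # One linear scan over what is left: O(n), O(n - 1), ..., O(n - k + 1).
        idx = min(range(len(remaining)), key=remaining.__getitem__)
        smallest = remaining.pop(idx)
    return smallest


print(select_one_by_one([1, 2, 5, 3, 1, 3, 4, 0], 4))  # prints 2
```

Each of the 𝑘 rounds scans the remaining integers once, which is where the 𝑂(𝑛𝑘) total comes from.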


Sorting

▪ Plan 2
– Sort the integers 𝑥1, 𝑥2, …, 𝑥𝑛 in ascending order. 𝑂(𝑛 log 𝑛)
– Output the 𝑘-th integer. 𝑂(1)
– Total running time: 𝑶(𝒏 𝐥𝐨𝐠 𝒏)
[Figure: a big problem is divided into two small problems, each of which is divided into two smaller problems.]

OK! Let’s move on to divide and conquer!
Divide and Conquer

▪ Plan 3: Divide and Conquer
– Divide:
▪ Pick an arbitrary value 𝑣 among 𝑥1, 𝑥2, 𝑥3, … .
▪ Divide 𝑥1, 𝑥2, 𝑥3, … into three subsets:
– L: 𝑥 < 𝑣
– M: 𝑥 = 𝑣
– R: 𝑥 > 𝑣
– Recurse: find 𝑥∗ in the subset that contains 𝑥∗.
– Combine: nothing to do, we already have 𝑥∗!
Divide

▪ Choose 𝑣 = 3.
1 2 5 3 1 3 4 0

▪ What are L, M, and R?

▪ L: 1 2 1 0

▪ M: 3 3

▪ R: 5 4
Recurse

▪ Roughly sorted list: L, then M, then R

– L: 1 2 1 0 (covers 𝑘 = 1, 2, 3, 4)
– M: 3 3 (covers 𝑘 = 5, 6)
– R: 5 4 (covers 𝑘 = 7, 8)


Recurse

▪ How to find 𝑥∗ in L, M, R?

– Recall 𝑥∗ is the 𝑘-th smallest integer in 𝑆.

▪ L: 1 2 1 0
– If 𝑘 ≤ |L|, 𝑥∗ is the 𝑘-th smallest integer in L

▪ M: 3 3
– If |L| < 𝑘 ≤ |L| + |M|, 𝑥∗ = 3

▪ R: 5 4
– If 𝑘 > |L| + |M|, 𝑥∗ is the (𝑘 − |L| − |M|)-th smallest integer in R
Example: 𝑘 = 4

▪ Input: 1 2 5 3 1 3 4 0, with 𝑣 = 3
▪ Roughly sorted list: 1 2 1 0 | 3 3 | 5 4
▪ Since 𝑘 = 4 ≤ |L| = 4, recurse on L: 1 2 1 0
▪ In sorted order L is 0 1 1 2, so its 4th smallest integer is 2
▪ Output 2
Formalize

Function Select(𝑺,𝒌)
▪ Divide:
– Pick an arbitrary value 𝑣 among 𝑥1 , 𝑥2 , 𝑥3 , … .
– Divide 𝑥1 , 𝑥2 , 𝑥3 , … into three subsets:
▪ L : 𝑥 < 𝑣,
▪ M : 𝑥 = 𝑣,
▪ R : 𝑥 > 𝑣.

▪ Recurse:
– Recurse on the subset that contains 𝑥∗.
▪ If 𝑘 ≤ |𝐿|, output Select(𝐿, 𝑘);
▪ If |𝐿| < 𝑘 ≤ |𝐿| + |𝑀|, output 𝑣;
▪ If |𝐿| + |𝑀| < 𝑘, output Select(𝑅, 𝑘 − |𝐿| − |𝑀|).
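The Select procedure above translates almost line by line into Python. This is a minimal sketch, not the slides' own code: taking the first element as the "arbitrary" pivot and using 1-indexed `k` are assumptions made for the illustration.

```python
def select(values, k):
    """Return the k-th smallest (1-indexed) integer of a non-empty list."""
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    v = values[0]  # assumption: "arbitrary" pivot = first element
    # Divide: three-way partition around v, O(n).
    left = [x for x in values if x < v]
    middle = [x for x in values if x == v]
    right = [x for x in values if x > v]
    # Recurse: only into the subset that contains the answer.
    if k <= len(left):
        return select(left, k)
    if k <= len(left) + len(middle):
        return v
    return select(right, k - len(left) - len(middle))


print(select([1, 2, 5, 3, 1, 3, 4, 0], 4))  # prints 2
```

Because M always contains at least the pivot itself, both L and R are strictly smaller than the input, so the recursion terminates.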
Running Time

We want to know 𝑻(𝒏).

Function Select(𝑺, 𝒌)
▪ Divide: (cost: 𝑶(𝒏))
– Pick an arbitrary value 𝑣 among 𝑥1, 𝑥2, 𝑥3, … .
– Divide 𝑥1, 𝑥2, 𝑥3, … into three subsets:
▪ L : 𝑥 < 𝑣,
▪ M : 𝑥 = 𝑣,
▪ R : 𝑥 > 𝑣.

▪ Recurse:
– Recurse on the subset that contains 𝑥∗.
▪ If 𝑘 ≤ |𝐿|, output Select(𝐿, 𝑘); (cost: 𝑻(|𝑳|))
▪ If |𝐿| < 𝑘 ≤ |𝐿| + |𝑀|, output 𝑣; (cost: 𝑶(𝟏))
▪ If |𝐿| + |𝑀| < 𝑘, output Select(𝑅, 𝑘 − |𝐿| − |𝑀|). (cost: 𝑻(|𝑹|))
Running Time

Fact
▪ $|L| + |M| + |R| = |S| = n$
▪ $|M| \ge 1$
▪ $|L|, |R| \le n - 1$

▪ $T(n) \le O(n) + \max\{T(|L|), T(|R|)\}$
$\le O(n) + T(n-1)$
$\le O(n) + O(n-1) + T(n-2) \le \cdots$
$= O(n) + O(n-1) + O(n-2) + \cdots + O(1) = O(n^2)$
▪ Very bad!
– One-by-one: $O(nk)$
– Sorting: $O(n \log n)$
Is it really that bad?

▪ Yes, the unluckiest case:
– $k = 1$
– Each time, $v$ is the largest integer.
– $T(n) = O(n) + T(n-1) = O(n) + O(n-1) + T(n-2) = \cdots = O(n^2)$
▪ What if we are super lucky?
– Each time, $v$ is in the middle.
– $T(n) = T(\frac{n}{2}) + O(n) = T(\frac{n}{4}) + O(\frac{n}{2}) + O(n) = \cdots = O(n)$.
▪ What if we are lucky?
– Each time, the rank of $v$ is in the middle range $[\frac{1}{3}n, \frac{2}{3}n]$.
– $T(n) = T(\frac{2}{3}n) + O(n) = T(\frac{4}{9}n) + O(\frac{2}{3}n) + O(n) = \cdots = O(n)$.
▪ Idea: use randomness to make us reasonably lucky on average.
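The gap between the unluckiest and the super lucky case can be checked by counting how many integers Select scans in total. The sketch below is an illustration only: the helper `scanned_elements`, its `pick_pivot` parameter, and the input size 64 are assumptions introduced here, and the cost of choosing the pivot itself is deliberately not counted.

```python
def scanned_elements(values, k, pick_pivot):
    """Run Select iteratively; return (answer, total number of integers partitioned)."""
    total = 0
    while True:
        total += len(values)  # one partition pass touches every remaining integer
        v = pick_pivot(values)
        left = [x for x in values if x < v]
        middle_size = sum(1 for x in values if x == v)
        right = [x for x in values if x > v]
        if k <= len(left):
            values = left
        elif k <= len(left) + middle_size:
            return v, total
        else:
            k -= len(left) + middle_size
            values = right


n = 64
data = list(range(1, n + 1))
# Unluckiest case: k = 1 and the pivot is always the largest integer.
_, worst = scanned_elements(data, 1, max)
# Super lucky case: the pivot is always the middle integer.
_, best = scanned_elements(data, 1, lambda s: sorted(s)[len(s) // 2])
print(worst, best)  # prints 2080 127
```

The first count is $n + (n-1) + \cdots + 1 = n(n+1)/2$, while the second stays below $2n$, matching the two recurrences above.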


What is next?

▪ Improving the running time with randomness.


▪ Improving the running time without randomness.
Using Randomness!

Function Select(𝑺,𝒌)
▪ Divide:
– Pick an arbitrary value 𝑣 among 𝑥1 , 𝑥2 , 𝑥3 , … .
– Divide 𝑥1 , 𝑥2 , 𝑥3 , … into three subsets:
▪ L : 𝑥 < 𝑣,
▪ M : 𝑥 = 𝑣,
▪ R : 𝑥 > 𝑣.

▪ Recurse:
– Recurse on the subset that contains 𝑥∗.
▪ If 𝑘 ≤ |𝐿|, output Select(𝐿, 𝑘);
▪ If |𝐿| < 𝑘 ≤ |𝐿| + |𝑀|, output 𝑣;
▪ If |𝐿| + |𝑀| < 𝑘, output Select(𝑅, 𝑘 − |𝐿| − |𝑀|).
Quick Selection

Function Select(𝑺,𝒌)
▪ Divide:
– Pick a random value 𝑣 among 𝑥1 , 𝑥2 , 𝑥3 , … .
– Divide 𝑥1 , 𝑥2 , 𝑥3 , … into three subsets:
▪ L : 𝑥 < 𝑣,
▪ M : 𝑥 = 𝑣,
▪ R : 𝑥 > 𝑣.

▪ Recurse:
– Recurse on the subset that contains 𝑥∗.
▪ If 𝑘 ≤ |𝐿|, output Select(𝐿, 𝑘);
▪ If |𝐿| < 𝑘 ≤ |𝐿| + |𝑀|, output 𝑣;
▪ If |𝐿| + |𝑀| < 𝑘, output Select(𝑅, 𝑘 − |𝐿| − |𝑀|).
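Quick Selection differs from the earlier procedure only in how the pivot is picked. A minimal Python sketch follows; the name `quick_select`, the optional `rng` parameter (added here so runs can be made reproducible), and the iterative loop in place of recursion are choices made for this illustration.

```python
import random


def quick_select(values, k, rng=random):
    """Return the k-th smallest (1-indexed) integer using a uniformly random pivot."""
    values = list(values)
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    while True:
        v = rng.choice(values)  # the only change: a random pivot
        left = [x for x in values if x < v]
        middle_size = values.count(v)
        right = [x for x in values if x > v]
        if k <= len(left):
            values = left
        elif k <= len(left) + middle_size:
            return v
        else:
            k -= len(left) + middle_size
            values = right


print(quick_select([1, 2, 5, 3, 1, 3, 4, 0], 4))  # prints 2
```

The answer does not depend on the random choices; only the running time does.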
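When we are lucky

[Figure: 𝑆 in sorted order, split into three regions of sizes $\frac{|S|}{4}$, $\frac{|S|}{2}$, $\frac{|S|}{4}$; the middle region is the lucky pivot area and the two outer regions are the bad pivot areas.]

▪ Fact 1: With probability $\frac{1}{2}$, we are lucky!
▪ Fact 2: If we are always lucky, $T(n) = T(\frac{3n}{4}) + O(n) = O(n)$.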
Analysis

▪ $\tau(n)$: the time it takes to reduce $n$ to $\frac{3n}{4}$
▪ $T(n) = \tau(n) + T(\frac{3n}{4})$
▪ $E[\tau(n)]$: the expected time it takes to reduce $n$ to $\frac{3n}{4}$
▪ $E[T(n)] = E[\tau(n) + T(\frac{3n}{4})] = E[\tau(n)] + E[T(\frac{3n}{4})]$
▪ $E[\tau(n)] = O(n)$
– Fact: since we are lucky with probability $\frac{1}{2}$, the expected number of rounds it takes to become lucky is 2.
▪ $E[T(n)] = O(n) + E[T(\frac{3n}{4})] = O(n)$

Evaluate a randomized algorithm by its expected running time!
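The two steps of the expectation argument can be written out with explicit constants. This is a sketch under an added assumption: one partition round on at most $n$ integers costs at most $cn$. The constant $c$, the round count $X$, and the success probability $p$ are names introduced here, not taken from the slides.

```latex
% X = number of rounds until the first lucky pivot;
% each round is lucky with probability p >= 1/2, independently of the others.
E[X] = \frac{1}{p} \le 2
\quad\Longrightarrow\quad
E[\tau(n)] \le cn \cdot E[X] \le 2cn .

% Unrolling E[T(n)] <= 2cn + E[T(3n/4)] gives a geometric series:
E[T(n)] \le 2cn \left( 1 + \frac{3}{4} + \left(\frac{3}{4}\right)^{2} + \cdots \right)
        = 2cn \cdot \frac{1}{1 - 3/4} = 8cn = O(n).
```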
Other Viewpoints

▪ Worst-case running time
– 𝑂(𝑛²)

▪ The probability that it runs in 𝑂(𝑛)?

What if we do not want randomness?
Throw Randomness!

Function Select(𝑺,𝒌)
▪ Divide:
– Pick a random value 𝑣 among 𝑥1 , 𝑥2 , 𝑥3 , … .
– Divide 𝑥1 , 𝑥2 , 𝑥3 , … into three subsets:
▪ L : 𝑥 < 𝑣,
▪ M : 𝑥 = 𝑣,
▪ R : 𝑥 > 𝑣.

▪ Recurse:
– Recurse on the subset that contains 𝑥∗.
▪ If 𝑘 ≤ |𝐿|, output Select(𝐿, 𝑘);
▪ If |𝐿| < 𝑘 ≤ |𝐿| + |𝑀|, output 𝑣;
▪ If |𝐿| + |𝑀| < 𝑘, output Select(𝑅, 𝑘 − |𝐿| − |𝑀|).
Median of medians (1973)
Blum, M.; Floyd, R. W.; Pratt, V. R.; Rivest, R. L.; Tarjan, R. E.

Function Select(𝑺,𝒌)
▪ Divide:
– Pick a good pivot value 𝑣 among 𝑥1 , 𝑥2 , 𝑥3 , … .
– Divide 𝑥1 , 𝑥2 , 𝑥3 , … into three subsets:
▪ L : 𝑥 < 𝑣,
▪ M : 𝑥 = 𝑣,
▪ R : 𝑥 > 𝑣.

▪ Recurse:
– Recurse on the subset that contains 𝑥∗.
▪ If 𝑘 ≤ |𝐿|, output Select(𝐿, 𝑘);
▪ If |𝐿| < 𝑘 ≤ |𝐿| + |𝑀|, output 𝑣;
▪ If |𝐿| + |𝑀| < 𝑘, output Select(𝑅, 𝑘 − |𝐿| − |𝑀|).
Trade-off

▪ The time of finding a good pivot.

▪ The quality of the pivot.
▪ We can find the median by sorting the array, but it takes too much time.
▪ We can find an arbitrary pivot in 𝑂(1), but it may be too bad.
▪ Look at the recursive running time:
– 𝑇(𝑛) = 𝑇(𝑐 ⋅ 𝑛) + 𝑓𝑖𝑛𝑑𝑃𝑖𝑣𝑜𝑡 + 𝑂(𝑛).
How to select a good pivot?

▪ Partition 𝑆 into subsets with size 5.

12 23 53 84 90 | 32 4 5 32 63 | 13 14 8 2 42
How to select a good pivot?

▪ Partition 𝑆 into subsets with size 5.


12 23 53 84 90

32 4 5 32 63

13 14 8 2 42

▪ Find the medians of them: 𝑣1 , 𝑣2 , 𝑣3


– 53, 32, 13

▪ Fix 𝑣 to be the median of 𝑣1 , 𝑣2 , 𝑣3


– 𝑣 = 32
How long does it take?

▪ Partition 𝑆 into subsets with size 5. (cost: 𝑶(𝒏))

12 23 53 84 90

32 4 5 32 63

13 14 8 2 42

▪ Find the medians of them: 𝑣1, 𝑣2, 𝑣3 (cost: 𝑶(𝒏))
– 53, 32, 13

▪ Fix 𝑣 to be the median of 𝑣1, 𝑣2, 𝑣3 (cost: 𝑻(𝒏/𝟓))
– 𝑣 = 32
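The pivot rule can be tried directly on the 15 integers from the slide. In this sketch the function name `median_of_medians_pivot` is hypothetical, and the median of the group medians is found by sorting only to keep the example short; the slides instead find it with a recursive call costing 𝑇(𝑛/5).

```python
def median_of_medians_pivot(values, group_size=5):
    """Return (pivot, group_medians) for the groups-of-5 pivot rule."""
    # Partition into consecutive groups of 5 (the last group may be smaller).
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    # Sorting a constant-size group is O(1), so all medians together cost O(n).
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Shortcut for the sketch: sort the medians instead of recursing.
    pivot = sorted(medians)[(len(medians) - 1) // 2]
    return pivot, medians


s = [12, 23, 53, 84, 90, 32, 4, 5, 32, 63, 13, 14, 8, 2, 42]
print(median_of_medians_pivot(s))  # prints (32, [53, 32, 13])
```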
Why is it good?

▪ The pivot 𝑣 should be in the middle range.

▪ Why? Two questions:
– How many integers are guaranteed to be no greater than 𝑣?
– How many integers are guaranteed to be no less than 𝑣?
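Answer them step by step

▪ Partition 𝑆 into subsets

12 23 53 84 90

32 4 5 32 63

13 14 8 2 42

[Figure: the group medians arranged around 𝑣, with one region labeled "Smaller than 𝑣" and the other labeled "Larger than 𝑣".]

▪ Answer
– We have $\frac{n}{5}$ groups, so $\frac{n}{5}$ medians.
– $v$ is no greater than $\frac{n}{10}$ medians, no less than $\frac{n}{10}$ medians.
– Each median is no greater than 2 other integers in its group and no less than 2 other integers in its group, so it accounts for 3 integers on each side when counted together with itself.
– $v$ is no greater than $\frac{3n}{10}$ integers, no less than $\frac{3n}{10}$ integers.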
The running time

Function Select(𝑺, 𝒌)
▪ Divide:
– Pick a good pivot value 𝑣 among 𝑥1, 𝑥2, 𝑥3, … . (cost: $T(\frac{n}{5}) + O(n)$)
– Divide 𝑥1, 𝑥2, 𝑥3, … into three subsets: (cost: $O(n)$)
▪ L : 𝑥 < 𝑣,
▪ M : 𝑥 = 𝑣,
▪ R : 𝑥 > 𝑣.

▪ Recurse:
– Recurse on the subset that contains 𝑥∗.
▪ If 𝑘 ≤ |𝐿|, output Select(𝐿, 𝑘); (cost: $T(|L|) \le T(n - \frac{3}{10}n)$)
▪ If |𝐿| < 𝑘 ≤ |𝐿| + |𝑀|, output 𝑣;
▪ If |𝐿| + |𝑀| < 𝑘, output Select(𝑅, 𝑘 − |𝐿| − |𝑀|). (cost: $T(|R|) \le T(n - \frac{3}{10}n)$)
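Putting the pivot rule and the three-way partition together gives a deterministic Select. The following is a sketch rather than a reference implementation: the name `mom_select`, the base case for inputs of at most 5 integers, and the handling of a short last group are assumptions made here.

```python
def mom_select(values, k):
    """Return the k-th smallest (1-indexed) integer using a median-of-medians pivot."""
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    if len(values) <= 5:
        return sorted(values)[k - 1]  # constant-size base case
    # Pick a good pivot: median of the group medians, found recursively (T(n/5) + O(n)).
    groups = [values[i:i + 5] for i in range(0, len(values), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    v = mom_select(medians, (len(medians) + 1) // 2)
    # Divide: three-way partition around v (O(n)).
    left = [x for x in values if x < v]
    middle_size = sum(1 for x in values if x == v)
    right = [x for x in values if x > v]
    # Recurse into the one subset that contains the answer.
    if k <= len(left):
        return mom_select(left, k)
    if k <= len(left) + middle_size:
        return v
    return mom_select(right, k - len(left) - middle_size)


s = [12, 23, 53, 84, 90, 32, 4, 5, 32, 63, 13, 14, 8, 2, 42]
print(mom_select(s, 8))  # prints 23
```

No random choices are made, so the same input always follows the same sequence of pivots.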
Guess time!
𝑻(𝒏)=𝑻(𝟎.𝟐𝒏)+𝑻(𝟎.𝟕𝒏)+𝑶(𝒏)
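Observation: Comparing to Master Theorem

▪ Level 0: subproblem size $n$; total work $O(n)$
▪ Level 1: subproblem sizes $0.2n$, $0.7n$; total work $O(0.9n)$
▪ Level 2: subproblem sizes $0.04n$, $0.14n$, $0.14n$, $0.49n$; total work $O(0.81n)$
▪ Level $k$: …; total work $O(0.9^{k} n)$
▪ Level $\log_{10/7} n$: subproblem sizes $1, 1, \ldots, 1, 1$; total work $O(0.9^{\log_{10/7} n}\, n)$
– We allow some problems with size $\le 1$.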
Make a guess

$T(n) = T(0.2n) + T(0.7n) + O(n)$
▪ Guess: $T(n) \le Bn$!
▪ Try to prove it inductively, assuming the $O(n)$ term is at most $Cn$.
– Basic step: $T(1) = 1 \le B \cdot 1$
– Inductive step:
$T(n) \le T(0.2n) + T(0.7n) + Cn$
$\le 0.9Bn + Cn$
$\le Bn$, which holds when $B \ge 10C$.
▪ We have $T(n) \le 10Cn = O(n)$.

Remember not to use induction with Big $O$ notation!
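The guess can also be checked numerically. The sketch below fixes concrete details that the slides leave open, all of them assumptions made here: the $O(n)$ term is taken to be exactly $n$ (so $C = 1$), subproblem sizes are rounded down, and $T(0) = 0$, $T(1) = 1$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """T(n) = T(floor(0.2 n)) + T(floor(0.7 n)) + n, with T(0) = 0 and T(1) = 1."""
    if n <= 1:
        return n
    return T(n // 5) + T(7 * n // 10) + n


# With C = 1 the induction predicts T(n) <= 10 * n for every n >= 1.
worst_ratio = max(T(n) / n for n in range(1, 10_001))
print(worst_ratio <= 10)  # prints True
```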
One more Question
What if we group them by 2,3,4,5,…?
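One way to start on this question is to redo the counting argument for a general odd group size 𝑔. The sketch below ignores rounding and is not from the slides: the function name `recursion_fractions` and the closed form it uses (half of the $n/g$ medians, each contributing $(g+1)/2$ integers) are assumptions derived by repeating the groups-of-5 argument.

```python
from fractions import Fraction


def recursion_fractions(g):
    """For odd group size g, return (pivot-finding fraction, worst subproblem fraction)."""
    pivot_fraction = Fraction(1, g)  # median of n/g medians
    # Half of the n/g medians are on one side of v, each with (g + 1) / 2 integers.
    guaranteed = Fraction(1, 2 * g) * Fraction(g + 1, 2)
    return pivot_fraction, 1 - guaranteed


for g in (3, 5, 7):
    a, b = recursion_fractions(g)
    print(g, a, b, a + b)
# prints:
# 3 1/3 2/3 1
# 5 1/5 7/10 9/10
# 7 1/7 5/7 6/7
```

For 𝑔 = 5 this reproduces the 0.2𝑛 and 0.7𝑛 terms. The guess 𝑇(𝑛) ≤ 𝐵𝑛 relied on the two fractions summing to less than 1, so it is worth checking which group sizes keep that property.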
Today’s goal

▪ Learn the quick selection algorithm

▪ Learn to analyze how randomness makes it run in linear time (in expectation)
▪ Learn to analyze how median of medians makes it run in linear time
▪ Remember to try grouping by 2, 3, 4, 5, 6, …
