
K - Select Using Data Structures

The document describes the SELECT algorithm for finding the kth smallest element in an unsorted array. It works by using a divide and conquer approach. It selects a pivot element, partitions the array around the pivot so that elements less than the pivot are left and greater elements are right. It then recursively searches in the left or right partition depending on if k is less than or greater than the pivot position. This allows it to find the kth element in linear O(n) time on average.

Uploaded by

Someone Out here

The SELECT Algorithm

Data Structures and Algorithms


Samyan Qayyum Wahla
1
THE SELECT PROBLEM
INPUT:
an unsorted array A of n elements (assume all elements are distinct),
& an integer k in {1, …, n}

7 2 6 9 1 5 4 11

OUTPUT of SELECT(A, k): the kth smallest element of A


SELECT(A, 1) = 1
SELECT(A, 2) = 2
SELECT(A, 3) = 4
SELECT(A, 8) = 11

SELECT(A, 1) = MIN(A)
SELECT(A, n/2) = MEDIAN(A)
SELECT(A, n) = MAX(A)

Note: k is a 1-indexed number!
2
Can you come up with an O(n log n) algorithm for SELECT?
3
AN O(n log n) ALGORITHM

SELECT(A,k):
    A = MERGESORT(A)
    return A[k-1]

It’s k-1 (rather than k) since my pseudocode is 0-indexed and k is a 1-indexed number.

Okay, great! We’re done!

4
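The sort-then-index idea can be sketched in Python as below. This is a minimal sketch, not the slides' own code: it uses Python's built-in sorted() in place of MERGESORT (both are O(n log n)), and the function name select_by_sorting is made up for illustration.

```python
def select_by_sorting(A, k):
    """Return the kth smallest element of A (k is 1-indexed) by sorting first."""
    if not 1 <= k <= len(A):
        raise ValueError("k must be in {1, ..., n}")
    # sorted() stands in for MERGESORT (assumption); it is also O(n log n).
    # Index k-1 because Python lists are 0-indexed while k is 1-indexed.
    return sorted(A)[k - 1]


A = [7, 2, 6, 9, 1, 5, 4, 11]
print(select_by_sorting(A, 3))  # 4
```

The example array is the one from the problem statement, so the result agrees with SELECT(A, 3) = 4.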
AN O(n log n) ALGORITHM

THE QUESTION IS... CAN WE DO BETTER?


5
GOAL: AN O(n) ALGORITHM
If k = 1, then we want the minimum of A. There’s an easy O(n) algorithm for that.
(Pretty much the same if k = n: we’re just finding MAX(A) instead.)

SELECT-1(A):
    result = infinity
    for i in [0,...,n-1]:
        if A[i] < result:
            result = A[i]
    return result

This loop runs O(n) times, and the body of each iteration is O(1) work.

Runtime of SELECT-1: O(n)


6
GOAL: AN O(n) ALGORITHM
If k = 2, then we want the second-smallest element in A.
There’s an easy-ish O(n) algorithm for that:
(Not a very important algorithm, because this will end up being a bad idea…)

SELECT-2(A):
    result = infinity
    minSoFar = infinity
    for i in [0,...,n-1]:
        if A[i] < result & A[i] < minSoFar:
            result = minSoFar
            minSoFar = A[i]
        else if A[i] < result & A[i] >= minSoFar:
            result = A[i]
    return result

This loop runs O(n) times, and the body of each iteration is still O(1) work.

Runtime of SELECT-2: O(n)
7


GOAL: AN O(n) ALGORITHM
If k = n/2, then we want the median element in A.

SELECT-n/2(A):
result = infinity
minSoFar = infinity
secondMinSoFar = infinity
thirdMinSoFar = infinity
fourthMinSoFar = infinity
fifthMinSoFar = infinity
...

Runtime of SELECT-n/2: O(n^2)

Clearly, this algorithm style isn’t a good idea for large k (e.g. n/2).
This basically ends up looking like InsertionSort.
8
LINEAR SELECTION: THE IDEA
Let’s use DIVIDE-and-CONQUER!

Select a pivot
Partition around it
Recurse!

(Kind of like a “binary search” for the kth smallest element, except that the array isn’t sorted!)
9
LINEAR SELECTION: THE IDEA
3 2 9 8 1 6 4 11

Select a pivot: How do we pick a pivot? We’ll see this later. For now, imagine we pick it randomly. Here the pivot is 6.

Partition around it:
L = 3 2 1 4    pivot = 6    R = 9 8 11
L has elements less than the pivot, and R has elements greater than the pivot.
(Note that L and R remain unsorted.)

Recurse! The pivot is in position 5. We have three cases:
1. if k = 5: return pivot (the kth smallest element is the pivot!)
2. if k < 5: return SELECT(L, k) (the kth smallest element lives in L)
3. if k > 5: return SELECT(R, k-5) (the kth smallest element is the (k-5)th smallest element in R)
13
LINEAR SELECTION: EXAMPLE
SELECT(A, 7):
1 12 4 20 31 6 18 9

PICK A PIVOT: 18
PARTITION: L = 1 12 4 6 9    pivot = 18    R = 20 31
RECURSE: recurse on R (since 18 occupies position 6 and k = 7 > 6)

SELECT(R, 1):
20 31
(1 = 7 - 6, aka k minus pivot position)

17
LINEAR SELECTION: EXAMPLE
SELECT(A, 7):
1 12 4 20 31 6 18 9

Pivot: 18
L = 1 12 4 6 9    R = 20 31
Recurse on R (since 18 occupies position 6 and k = 7 > 6)

SELECT(R, 1):
20 31

Pivot: 20
L is empty    R = 31
20 is in the 1st position, and k = 1! No need to recurse further!

20 IS OUR ANSWER!
(20 is the 1st smallest in R, and 7th smallest overall)
20
LINEAR SELECTION: PSEUDOCODE

SELECT(A,k):
    if len(A) == 1:
        return A[0]
    p = GET_PIVOT(A)
    L, R = PARTITION(A,p)
    if len(L) == k-1:
        return p
    else if len(L) > k-1:
        return SELECT(L, k)
    else if len(L) < k-1:
        return SELECT(R, k-len(L)-1)

Base Case: if len(A) = 1, then just go ahead and return the element itself.
Case 1: We got lucky and found exactly the kth smallest!
Case 2: The kth smallest is in the first part of the array (L).
Case 3: The kth smallest is in the second part of the array (R).

21
LINEAR SELECTION: PSEUDOCODE
SELECT(A,k):
    if len(A) == 1:
        return A[0]
    p = GET_PIVOT(A)
    L, R = PARTITION(A,p)
    if len(L) == k-1:
        return p
    else if len(L) > k-1:
        return SELECT(L, k)
    else if len(L) < k-1:
        return SELECT(R, k-len(L)-1)

PARTITION(A, pivot):
    L, R = [], []
    for i in [0,...,len(A)-1]:
        if A[i] == pivot:
            continue
        else if A[i] < pivot:
            add A[i] to L
        else:
            add A[i] to R
    return L, R
22
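The SELECT and PARTITION pseudocode can be made runnable roughly as follows. This is a sketch under stated assumptions: the pivot is chosen uniformly at random via random.choice (the slides leave GET_PIVOT open at this point), elements are distinct, and the lowercase names select and partition are illustrative.

```python
import random


def partition(A, pivot):
    """Split A into elements less than and greater than pivot (distinct elements assumed)."""
    L = [x for x in A if x < pivot]
    R = [x for x in A if x > pivot]
    return L, R


def select(A, k):
    """Return the kth smallest element of A, where k is 1-indexed."""
    if len(A) == 1:
        return A[0]
    p = random.choice(A)  # placeholder pivot rule (assumption, stands in for GET_PIVOT)
    L, R = partition(A, p)
    if len(L) == k - 1:
        return p  # exactly k-1 elements are smaller, so p is the answer
    elif len(L) > k - 1:
        return select(L, k)  # the answer lives in L
    else:
        return select(R, k - len(L) - 1)  # skip L and the pivot


A = [1, 12, 4, 20, 31, 6, 18, 9]
print(select(A, 7))  # 20
```

The array and k match the worked example, so the result is 20 regardless of which pivots happen to be drawn.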
RUNTIME
Recurrence Relation for SELECT
For now, assume we’ll pick the pivot in time O(n).

        O(n)                  if len(L) == k-1
T(n) =  T(len(L)) + O(n)      if len(L) > k-1
        T(len(R)) + O(n)      if len(L) < k-1

But what are len(L) and len(R)? That depends on how we pick the pivot...

23
RUNTIME
What’s a “good” pivot?
What’s a “bad” pivot?

24
THE WORST PIVOT
The WORST pivot: picking the max or the min each time!
Then, in the worst case, the recurrence relation looks like T(n) = T(n-1) + O(n).

        O(n)                  if len(L) == k-1
T(n) =  T(len(L)) + O(n)      if len(L) > k-1
        T(len(R)) + O(n)      if len(L) < k-1

In the worst case: T(n) ≤ T(n-1) + O(n)

This ends up being Ω(n^2)!


A call to SELECT(A, n/2) would already consist of ~n/2 recursive calls
(each with a subarray of length at least n/2)!
25
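To see the quadratic behaviour concretely, the sketch below counts how many elements PARTITION scans when the pivot is always the minimum and we ask for the maximum (k = n). The function name worst_case_partition_work and the counting convention (one unit of work per element scanned by PARTITION, ignoring the cost of finding the pivot) are assumptions made for illustration.

```python
def worst_case_partition_work(n):
    """Total elements scanned by PARTITION when the pivot is always the minimum
    and we ask for the maximum (k = n) of n distinct elements."""
    A = list(range(n))
    k = n
    work = 0
    while len(A) > 1:
        p = min(A)                    # deliberately bad pivot: the minimum
        work += len(A)                # PARTITION looks at every element once
        L = [x for x in A if x < p]   # always empty with this pivot
        R = [x for x in A if x > p]
        if len(L) == k - 1:
            break
        A, k = R, k - len(L) - 1      # recurse into R, which shrank by only one
    return work


for n in (10, 100, 1000):
    # The sum n + (n-1) + ... + 2 equals n(n+1)/2 - 1, which grows like n^2.
    print(n, worst_case_partition_work(n), n * (n + 1) // 2 - 1)
```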
THE IDEAL PIVOT
The IDEAL pivot: splits the input array exactly in half!
len(L) = len(R) = (n-1)/2

        O(n)                  if len(L) == k-1
T(n) =  T(len(L)) + O(n)      if len(L) > k-1
        T(len(R)) + O(n)      if len(L) < k-1

With the ideal pivot: T(n) ≤ T(n/2) + O(n)
Use the Master Theorem to solve the recurrence.

With the ideal pivot, the runtime would be: O(n)
26
THE IDEAL PIVOT
The IDEAL pivot: splits the input array exactly in half!
len(L) = len(R) = (n-1)/2

T(n) ≤ T(n/2) + O(n)

Sadly, the pivot to divide the input in half is the MEDIAN,
aka SELECT(A, n/2),
aka exactly the problem we’re trying to solve...

27
THE GOOD-ENOUGH PIVOT
The GOOD-ENOUGH pivot: splits the input array kind of in half!
3n/10 < len(L) < 7n/10
3n/10 < len(R) < 7n/10
If we could fetch this good-enough pivot in time O(n), let’s say, the recurrence looks like:

        O(n)                  if len(L) == k-1
T(n) =  T(len(L)) + O(n)      if len(L) > k-1
        T(len(R)) + O(n)      if len(L) < k-1

T(n) ≤ T(7n/10) + O(n)

With the Master Theorem, this good-enough pivot would still give us: O(n)
29
OUR GOAL
Efficiently pick the pivot in time O(n) so that

3n/10 < len(L) < 7n/10 and 3n/10 < len(R) < 7n/10

L (array with things smaller than pivot): 1 12 4 6 9
pivot: 18
R (array with things larger than pivot): 20 31

Then, our recurrence T(n) ≤ T(7n/10) + O(n) comes out to O(n)!


30
MEDIAN-OF-MEDIANS
The Big Idea in Linear-Selection!

31
MEDIAN-OF-MEDIANS
The ideal world wasn’t feasible because we can’t just compute SELECT(A, n/2) ⇒ that would
throw us into infinite recursion since problem sizes aren’t shrinking between recursive calls…
But we can instead generate a smaller list and call SELECT on that smaller list!

OUR GAME PLAN:


We’ll make a smaller list out of SUB-MEDIANS.
Then, we’ll use SELECT to find the median of the sub-medians.
This “median of medians” will be our proxy for the true median!

32
MEDIAN-OF-MEDIANS
GOAL: get a proxy for the true median by finding the exact median of all the sub-medians!

Divide the original list into ⌈n/5⌉ groups (each group has 5 elements)
Find the sub-median of each small group (3rd smallest out of the 5)
Find the median of all the sub-medians (call SELECT)

Original list: 1 14 4 18 25 | 6 17 9 3 5 | 10 16 12 23 19 | 13 20 8 15 24 | 7 21 22 2 11
Sub-medians: 14 6 16 15 11
Median of the sub-medians: 14

14 isn’t the exact median, but it’s pretty close!
36
MEDIAN-OF-MEDIANS
GOAL: get a proxy for the true median by finding the exact median of all the sub-medians!

Divide the original list into ⌈n/5⌉ groups (each group has 5 elements)
Find the sub-median of each small group (3rd smallest out of the 5): constant work for each group, ⌈n/5⌉ groups total ⇒ O(n) work.
Find the median of all the sub-medians (call SELECT)

Original list: 1 14 4 18 25 | 6 17 9 3 5 | 10 16 12 23 19 | 13 20 8 15 24 | 7 21 22 2 11
Sub-medians: 14 6 16 15 11
Median of the sub-medians: 14

To compute our pivot:
Do O(n) work to set up (divide into groups & get a list of submedians),
then make a call to SELECT(Submedians, |Submedians|/2)
38
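The pivot rule above can be sketched in Python together with the SELECT routine it feeds into. Several details here are assumptions that the slides do not fix: a leftover group of fewer than 5 elements uses its lower median, |Submedians|/2 is rounded up, each small group is sorted with sorted(), and elements are distinct.

```python
def median_of_medians(A):
    """Pick a pivot: the median of the sub-medians of groups of (at most) 5."""
    groups = [A[i:i + 5] for i in range(0, len(A), 5)]
    # Sorting a group of <= 5 elements is constant work per group, so O(n) overall.
    # For a full group this is index 2, i.e. the 3rd smallest of the 5.
    submedians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursive call on a list of about n/5 elements (rounding up is an assumption).
    return select(submedians, (len(submedians) + 1) // 2)


def select(A, k):
    """Return the kth smallest element of A, where k is 1-indexed."""
    if len(A) == 1:
        return A[0]
    p = median_of_medians(A)
    L = [x for x in A if x < p]
    R = [x for x in A if x > p]
    if len(L) == k - 1:
        return p
    elif len(L) > k - 1:
        return select(L, k)
    else:
        return select(R, k - len(L) - 1)


A = [1, 14, 4, 18, 25, 6, 17, 9, 3, 5, 10, 16, 12, 23, 19,
     13, 20, 8, 15, 24, 7, 21, 22, 2, 11]
print(median_of_medians(A))  # 14
print(select(A, 13))         # 13
```

On the 25-element example the pivot comes out as 14, matching the slides, while the true median is 13.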
ANALYZING RUNTIME
SELECT(A,k):
    if len(A) == 1:
        return A[0]
    p = MEDIAN_OF_MEDIANS(A)
    L, R = PARTITION(A,p)
    if len(L) == k-1:
        return p
    else if len(L) > k-1:
        return SELECT(L, k)
    else if len(L) < k-1:
        return SELECT(R, k-len(L)-1)

What does the recurrence relation for T(n) look like?
39
ANALYZING RUNTIME
SELECT(A,k):
    if len(A) == 1:
        return A[0]
    p = MEDIAN_OF_MEDIANS(A)
    L, R = PARTITION(A,p)
    if len(L) == k-1:
        return p
    else if len(L) > k-1:
        return SELECT(L, k)
    else if len(L) < k-1:
        return SELECT(R, k-len(L)-1)

O(n) work outside of recursive calls (base case, set-up within MEDIAN_OF_MEDIANS, partitioning).
T(n/5) work hidden in the MEDIAN_OF_MEDIANS call (remember, MEDIAN_OF_MEDIANS calls SELECT on a ⌈n/5⌉-size array).
T(???) work hidden in the recursive call on L or R. What is the maximum size of either L or R?
40
ANALYZING RUNTIME
What is the smallest number of elements that could be smaller than our MEDIAN OF MEDIANS?
41
ANALYZING RUNTIME
MEDIAN_OF_MEDIANS will choose a pivot greater than at least 3n/10 - 6 elements.
(The same reasoning we’re about to do also shows that the pivot will be less than at least 3n/10 - 6 elements.)

m = ⌈n/5⌉ groups, each with at most 5 elements:
1 14 4 18 25
6 17 9 3 5
10 16 12 23 19
13 20 8 15 24
7 21 22 2 11
42
ANALYZING RUNTIME
MEDIAN_OF_MEDIANS will choose a pivot greater than at least 3n/10 - 6
elements
(The same reasoning we’re about to do also shows that the pivot will be less than at least 3n/10 - 6 elements)

At least how many elements are guaranteed to be smaller than the median of medians?

m = ⌈n/5⌉ groups, each with at most 5 elements (each group sorted, groups ordered by sub-median):
3 5 6 9 17
2 7 11 21 22
1 4 14 18 25
8 13 15 20 24
10 12 16 19 23
44
ANALYZING RUNTIME
MEDIAN_OF_MEDIANS will choose a pivot greater than at least 3n/10 - 6
elements
(The same reasoning we’re about to do also shows that the pivot will be less than at least 3n/10 - 6 elements)

At least how many elements are guaranteed to be smaller than the median of medians?

m = ⌈n/5⌉ groups, each with at most 5 elements:
3 5 6 9 17
2 7 11 21 22
1 4 14 18 25
8 13 15 20 24
10 12 16 19 23

3 · (⌈m/2⌉ - 1) + 2
3: 3 elements from each group that has a median smaller than the median of medians
- 1: to exclude the group with the median of medians
+ 2: 2 elements from the group containing the median of medians
45
ANALYZING RUNTIME
MEDIAN_OF_MEDIANS will choose a pivot greater than at least 3n/10 - 6
elements
(The same reasoning we’re about to do also shows that the pivot will be less than at least 3n/10 - 6 elements)

At least how many elements are guaranteed to be smaller than the median of medians?

m = ⌈n/5⌉ groups, each with at most 5 elements:
5 6 9
2 7 11 21 22
1 4 14 18 25
8 13 15 20 24
10 12 16 19 23

3 · (⌈m/2⌉ - 1 - 1) + 2
3: 3 elements from each (non-leftover) group that has a median smaller than the median of medians
first - 1: to exclude the group with the median of medians
second - 1: to exclude any of those groups that might be a “leftover” group!
+ 2: 2 elements from the group containing the median of medians
46
ANALYZING RUNTIME
MEDIAN_OF_MEDIANS will choose a pivot greater than at least 3n/10 - 6
elements
(The same reasoning we’re about to do also shows that the pivot will be less than at least 3n/10 - 6 elements)

At least how many elements are guaranteed to be smaller than the median of medians?

m = ⌈n/5⌉ groups, each with at most 5 elements:
3 5 6 9 17
2 7 11 21 22
4 14 18
8 13 15 20 24
10 12 16 19 23

3 · (⌈m/2⌉ - 1 - 1) + 2
3: 3 elements from each (non-leftover) group that has a median smaller than the median of medians
first - 1: to exclude the group with the median of medians
second - 1: to exclude any of those groups that might be a “leftover” group!
+ 2: 2 elements from the group containing the median of medians

The group with the median of medians might be a “leftover” group! Might as well just get rid of the +2 to be safe.
47
ANALYZING RUNTIME
MEDIAN_OF_MEDIANS will choose a pivot greater than at least 3n/10 - 6
elements
(The same reasoning we’re about to do also shows that the pivot will be less than at least 3n/10 - 6 elements)

At least how many elements are guaranteed to be smaller than the median of medians?

m = ⌈n/5⌉ groups, each with at most 5 elements:
3 5 6 9 17
2 7 11 21 22
1 4 14 18 25
8 13 15 20 24
10 12 16 19 23

3 elements from each (non-leftover) group that has a median smaller than the median of medians:

3 · (⌈m/2⌉ - 2)
= 3 · (⌈⌈n/5⌉/2⌉ - 2)
≥ 3 · (n/10 - 2)
= 3n/10 - 6
48
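The 3n/10 - 6 bound can be sanity-checked empirically. The sketch below is only a spot check on random inputs, not a proof. It makes these assumptions: the median of the sub-medians is found by sorting (for brevity, rather than by a recursive SELECT call), a leftover group uses its lower median, and the names mom_pivot and check_bound are hypothetical.

```python
import random


def mom_pivot(A):
    """Median of the sub-medians of groups of (at most) 5, computed by sorting."""
    groups = [A[i:i + 5] for i in range(0, len(A), 5)]
    submedians = sorted(sorted(g)[(len(g) - 1) // 2] for g in groups)
    return submedians[(len(submedians) - 1) // 2]  # the ⌈m/2⌉-th smallest sub-median


def check_bound(n, trials=200, seed=0):
    """Check that at least 3n/10 - 6 elements lie on each side of the pivot."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = rng.sample(range(10 * n), n)  # n distinct values in random order
        p = mom_pivot(A)
        smaller = sum(x < p for x in A)
        larger = sum(x > p for x in A)
        if smaller < 3 * n / 10 - 6 or larger < 3 * n / 10 - 6:
            return False
    return True


for n in (25, 57, 103, 1000):
    print(n, check_bound(n))
```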
ANALYZING RUNTIME
We just showed:

3n/10 - 6 ≤ len(L)
len(R) ≤ 7n/10 + 5

49
ANALYZING RUNTIME
We can similarly show the inverse:

3n/10 - 6 ≤ len(L) ≤ 7n/10 + 5
3n/10 - 6 ≤ len(R) ≤ 7n/10 + 5

What does the recurrence relation for T(n) look like?
T(n) ≤ T(n/5) + T(???) + O(n)
(from before the break)
50
ANALYZING RUNTIME
We can similarly show the inverse:

3n/10 - 6 ≤ len(L) ≤ 7n/10 + 5
3n/10 - 6 ≤ len(R) ≤ 7n/10 + 5

What does the recurrence relation for T(n) look like?
T(n) ≤ T(n/5) + T(7n/10) + O(n)
51
ANALYZING RUNTIME

T(n) ≤ T(n/5) + T(7n/10) + O(n)

Solve this recurrence using the Substitution Method.

52
ANALYZING RUNTIME

T(n) ≤ T(n/5) + T(7n/10) + O(n)

Solve this recurrence using the Substitution Method.

O(n) worst-case runtime!
53
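The slides leave the substitution step to the reader. It can be sketched as follows. The constants c (hidden in the O(n) term) and d (in the guess T(n) ≤ dn) are names introduced here for illustration, and the sketch ignores floors, ceilings, and the small additive terms such as the +5 in 7n/10 + 5.

```latex
\begin{align*}
T(n) &\le T(n/5) + T(7n/10) + cn \\
     &\le d \cdot \frac{n}{5} + d \cdot \frac{7n}{10} + cn
        && \text{(inductive hypothesis: } T(m) \le dm \text{ for } m < n\text{)} \\
     &= \frac{9}{10}\, dn + cn \\
     &\le dn && \text{whenever } d \ge 10c .
\end{align*}
```

The guess works because 1/5 + 7/10 = 9/10 < 1: the two recursive calls together cover strictly less than the whole input, so the leftover dn/10 absorbs the cn term.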
PSEUDOCODE & RUNTIME
SELECT(A,k):
    if len(A) == 1:
        return A[0]
    p = MEDIAN_OF_MEDIANS(A)
    L, R = PARTITION(A,p)
    if len(L) == k-1:
        return p
    else if len(L) > k-1:
        return SELECT(L, k)
    else if len(L) < k-1:
        return SELECT(R, k-len(L)-1)

O(n) work outside of recursive calls (base case, set-up within MEDIAN_OF_MEDIANS, partitioning).
T(n/5) work hidden in the MEDIAN_OF_MEDIANS call (remember, MEDIAN_OF_MEDIANS calls SELECT on a ⌈n/5⌉-size array).
T(7n/10) work hidden in the recursive call on L or R: 7n/10 is the maximum size of either L or R (this is what the median-of-medians technique guarantees us)!
54
LINEAR-TIME SELECTION
SELECT(A,k):
    if len(A) == 1:
        return A[0]
    p = MEDIAN_OF_MEDIANS(A)
    L, R = PARTITION(A,p)
    if len(L) == k-1:
        return p
    else if len(L) > k-1:
        return SELECT(L, k)
    else if len(L) < k-1:
        return SELECT(R, k-len(L)-1)

O(n) worst-case runtime!
55
HIGHLIGHTS OF SELECT
We covered a lot of details - here are the big picture takeaways.

56
LINEAR SELECTION: THE BIG IDEA

Select a pivot: Median of Medians

Partition around pivot

Recurse!

57
LINEAR SELECTION: RUNTIME
Select a pivot: Median of (sub)Medians
    Divide the original list into ⌈n/5⌉ groups (each group has ≤ 5 elements)
    Find the sub-median of each small group (3rd smallest out of the 5)
    O(n): non-recursive “shallow” work!
    Find the median of all the sub-medians (via recursive call to SELECT!!)
    T(n/5): recursive work, we call SELECT on an array of size n/5

Partition around pivot

Recurse!
    T(7n/10): recursive work, we call SELECT on either L or R (size ≤ 7n/10)

61
WAIT: WHERE DID WE GET 7n/10?
We proved this claim:
3n/10 - 6 ≤ len(L) ≤ 7n/10 + 5
3n/10 - 6 ≤ len(R) ≤ 7n/10 + 5
(The upper bounds hold because len(L) + len(R) = n-1, so if 3n/10 - 6 ≤ len(L) then len(R) ≤ 7n/10 + 5.)

We asked ourselves: At least how many elements are guaranteed to be smaller than the median of medians?

m = ⌈n/5⌉ groups, each with at most 5 elements:
3 5 6 9 17
2 7 11 21 22
1 4 14 18 25
8 13 15 20 24
10 12 16 19 23

The shaded region (here 3 5 6, 2 7 11, and 1 4) denotes the only elements that are guaranteed to be smaller than 14 (the median of medians). We counted that up, took care of some off-by-one errors just to be safe (i.e. just to make sure we’re underestimating), and we got 3n/10 - 6!
62
(DETAILS IF YOU’RE CURIOUS)
We proved this claim:
3n/10 - 6 ≤ len(L) ≤ 7n/10 + 5
3n/10 - 6 ≤ len(R) ≤ 7n/10 + 5

m = ⌈n/5⌉ groups, each with at most 5 elements:
3 5 6 9 17
2 7 11 21 22
4 14 18
8 13 15 20 24
10 12 16 19 23

3 · (⌈m/2⌉ - 1 - 1) + 2
3: 3 elements from each (non-leftover) group that has a median smaller than the median of medians
first - 1: to exclude the group with the median of medians
second - 1: to exclude any of those groups that might be a “leftover” group!
+ 2: 2 elements from the group containing the median of medians

The group with the median of medians might be a “leftover” group! Might as well just get rid of the +2 to be safe.
63
LINEAR SELECTION: RUNTIME
Select a pivot: Median of (sub)Medians
    Divide the original list into ⌈n/5⌉ groups (each group has ≤ 5 elements)
    Find the sub-median of each small group (3rd smallest out of the 5)
    O(n): non-recursive “shallow” work!
    Find the median of all the sub-medians (via recursive call to SELECT!!)
    T(n/5): recursive work, we call SELECT on an array of size n/5

Partition around pivot

Recurse!
    T(7n/10): recursive work, we call SELECT on either L or R (size ≤ 7n/10)

T(n) ≤ T(n/5) + T(7n/10) + O(n)

65
LINEAR SELECTION: RUNTIME

T(n) ≤ T(n/5) + T(7n/10) + O(n)

Solve this equation using substitution Method

O(n) worst-case runtime!
66
LINEAR SELECTION: THE BIG IDEA
Select a pivot: Median of Medians

Partition around pivot

Recurse!
Median of Medians is really cool! The math was a little detailed, but worth the time to digest so that you’re 110% convinced that the technique does give a ~7n/10 bound on the max size of either L or R. Solving the recurrence can be done via the Substitution Method.
SELECT as a whole is an amazing display of Divide-and-Conquer!
67
