
CS 161 Lecture 4: Median and Selection

Winter 2021 Mon, Jan 25

Adapted from Virginia Williams’ lecture notes. Additional credits go to Albert Chen, Juliana
Cook (2015), Ofir Geri, Sam Kim (2016), Gregory Valiant (2017), Aviad Rubinstein (2018).
Please direct all typos and mistakes to Moses Charikar and Nima Anari (2021).

1 Selection

The selection problem is to find the k-th smallest number in an array A.


Input: array A of n numbers, and an integer k ∈ {1, . . . , n}.
Output: the k-th smallest number in A.
One approach is to sort the numbers in ascending order and then return the k-th number in
the sorted list. This takes O(n log n) time: O(n log n) for the sort (e.g., by MergeSort)
and O(1) to return the k-th number.
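This baseline is a one-liner in Python (k is 1-indexed, as in the problem statement):

```python
def select_by_sorting(A, k):
    """Return the k-th smallest number in A (k is 1-indexed).

    O(n log n): dominated by the sort; the final lookup is O(1).
    """
    return sorted(A)[k - 1]
```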

1.1 Minimum Element

As always, we ask if we can do better (i.e., faster in big-O terms). In the special case where
k = 1, selection is the problem of finding the minimum element. We can do this in O(n)
time by scanning through the array and keeping track of the minimum element so far. If the
current element is smaller than the minimum so far, we update the minimum.

Algorithm 1: SelectMin(A)
m←∞
n ← length(A)
for i = 1 to n do
if A[i ] < m then
m ← A[i ]

return m
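A direct Python rendering of Algorithm 1:

```python
def select_min(A):
    """Algorithm 1 (SelectMin): one linear scan, tracking the minimum so far."""
    m = float("inf")  # m <- infinity
    for x in A:
        if x < m:     # current element beats the minimum so far
            m = x
    return m
```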

In fact, this is the best running time we could hope for.


Definition 1. A deterministic algorithm is one which, given a fixed input, always performs
the same operations (as opposed to an algorithm which uses randomness).
Proposition 2. Any deterministic algorithm for finding the minimum has runtime Ω(n).

Proof. Intuitively, the claim holds because any algorithm for the minimum must look at
all the elements, each of which could be the minimum. Suppose a correct deterministic
algorithm does not look at A[i ] for some i . Then the output cannot depend on A[i ], so
the algorithm returns the same value whether A[i ] is the minimum element or the maximum
element. Therefore the algorithm is not always correct, which is a contradiction. So there is
no sublinear deterministic algorithm for finding the minimum.

So for k = 1, we have an algorithm which achieves the best running time possible. By similar
reasoning, this lower bound of Ω(n) applies to the general selection problem. So ideally we
would like to have a linear-time selection algorithm in the general case.

2 Linear-Time Selection

In fact, a linear-time selection algorithm does exist. Before showing the linear time selection
algorithm, it’s helpful to build some intuition on how to approach the problem. The high-level
idea will be to try to do a Binary Search over an unsorted input. At each step, we hope to
divide the input into two parts, the subset of smaller elements of A, and the subset of larger
elements of A. We will then determine whether the k-th smallest element lies in the first part
(with the “smaller” elements) or the part with larger elements, and recurse on exactly one of
those two parts.
How do we decide how to partition the array into these two pieces? Suppose we have a
black-box algorithm ChoosePivot that chooses some element in the array A, and we use this
pivot to define the two sets: any A[i ] less than the pivot is in the set of “smaller” values, and
any A[i ] greater than the pivot is in the other part. We will figure out precisely how to specify
this subroutine ChoosePivot a bit later, after specifying the high-level algorithm structure.
For clarity we’ll assume all elements are distinct from now on, but the idea generalizes easily.
Let n be the size of the array and assume we are trying to find the k-th element.
At each iteration, we use the element p to partition the array into two parts: all elements
smaller than the pivot and all elements larger than the pivot, which we denote A< and A> ,
respectively.
Depending on the sizes of the resulting sub-arrays, the runtime can differ. For example, if
one of these sub-arrays has size n − 1, then each iteration decreases the size of the problem
by only 1, resulting in a total running time of O(n2 ). If the array is split into two equal
parts, then the size of the problem reduces by half at each iteration, resulting in a linear-time
solution. (We assume ChoosePivot runs in O(n) time.)
Algorithm 2: Select(A, n, k)
if n = 1 then
return A[1]
p ← ChoosePivot(A, n)
A< ← {A[i ] | A[i ] < p}
A> ← {A[i ] | A[i ] > p}
if |A< | = k − 1 then
return p
else if |A< | > k − 1 then
return Select(A< , |A< |, k)
else if |A< | < k − 1 then
return Select(A> , |A> |, k − |A< | − 1)

Proposition 3. If the pivot p is chosen to be the minimum or maximum element, then Select
runs in Θ(n2 ) time.

Proof. At each iteration, the number of elements decreases by 1. Since running ChoosePivot
and creating A< and A> takes linear time, the recurrence for the runtime is
T (n) = T (n − 1) + Θ(n). Expanding this,

T (n) ≤ c1 n + c1 (n − 1) + c1 (n − 2) + ... + c1 = c1 n(n + 1)/2

and
T (n) ≥ c2 n + c2 (n − 1) + c2 (n − 2) + ... + c2 = c2 n(n + 1)/2.
We conclude that T (n) = Θ(n2 ).
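Algorithm 2 can be sketched in Python as follows. The pivot rule is left pluggable; the default of taking the first element is only a placeholder (on already-sorted input it exhibits exactly the Θ(n2 ) behavior of Proposition 3):

```python
def select(A, k, choose_pivot=lambda A: A[0]):
    """Return the k-th smallest element of A (k is 1-indexed).

    Elements are assumed distinct, as in the notes. choose_pivot is
    pluggable; the first-element default is a placeholder only.
    """
    if len(A) == 1:
        return A[0]
    p = choose_pivot(A)
    A_less = [x for x in A if x < p]     # A_<
    A_greater = [x for x in A if x > p]  # A_>
    if len(A_less) == k - 1:
        return p                         # pivot is exactly the k-th smallest
    elif len(A_less) > k - 1:
        return select(A_less, k, choose_pivot)
    else:
        return select(A_greater, k - len(A_less) - 1, choose_pivot)
```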
Proposition 4. If the pivot p is chosen to be the median element, then Select runs in O(n)
time.

Proof. Intuitively, the running time is linear since we remove half of the elements from
consideration at each iteration. Formally, each recursive call is made on an input of half the
size, so T (n) ≤ T (n/2) + cn. Expanding this, the runtime is T (n) ≤ cn + cn/2 + cn/4 + ... + c ≤ 2cn,
which is O(n).

So how do we design ChoosePivot that chooses a pivot in linear time? In the following, we
describe three ideas.

2.1 Idea #1: Choose a random pivot

As we saw earlier, depending on the pivot chosen, the worst-case runtime can be O(n2 ) if we
are unlucky in the choice of the pivot at every iteration. As you might expect, it is extremely
unlikely to be this unlucky, and one can prove that the expected runtime is O(n) provided
the pivot is chosen uniformly at random from the set of elements of A. In practice, this
randomized algorithm is what is implemented, and the hidden constant in the O(n) runtime
is very small.
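A sketch of this randomized variant in Python (the name quickselect is ours; k is 1-indexed and elements are assumed distinct, as in the notes):

```python
import random

def quickselect(A, k):
    """Expected-O(n) selection with a uniformly random pivot (k is 1-indexed)."""
    if len(A) == 1:
        return A[0]
    p = random.choice(A)                 # pivot chosen uniformly at random
    A_less = [x for x in A if x < p]
    A_greater = [x for x in A if x > p]
    if len(A_less) == k - 1:
        return p
    elif len(A_less) > k - 1:
        return quickselect(A_less, k)
    else:
        return quickselect(A_greater, k - len(A_less) - 1)
```

The output is always correct regardless of the random choices; only the running time is random, with worst case Θ(n2 ) but expectation O(n).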

2.2 Idea #2: Choose a pivot that creates the most “balanced” split

Consider ChoosePivot that returns the pivot that creates the most “balanced” split, which
would be the median of the array. However, this is exactly the selection problem we are trying to
solve, with k = n/2! As long as we do not know how to find the median in linear time, we
cannot use this procedure as ChoosePivot.

2.3 Idea #3: Find a pivot "close enough" to the median

Given a linear-time median algorithm, we can solve the selection problem in linear time (and
vice versa). Although ideally we would want to find the median, notice that as far as cor-
rectness goes, there was nothing special about partitioning around the median. We could
use this same idea of partitioning and recursing on a smaller problem even if we partition
around an arbitrary element. To get a good runtime, however, we need to guarantee that
the subproblems get smaller quickly. In 1973, Blum, Floyd, Pratt, Rivest, and Tarjan came
up with the Median of Medians algorithm. It is similar to the previous algorithm, but rather
than partitioning around the exact median, it uses a surrogate “median of medians”. We update
ChoosePivot accordingly.

Algorithm 3: ChoosePivot(A, n)
Split A into g = ⌈n/5⌉ groups p1 , . . . , pg , each of size at most 5
for i = 1 to g do
pi ← MergeSort(pi )
C ← {median of pi | i = 1, . . . , g}
p ← Select(C, g, ⌈g/2⌉)
return p

What is this algorithm doing? First it divides A into groups of size at most 5. Within each group,
it finds the median by first sorting the elements with MergeSort. Recall that MergeSort
sorts in O(n log n) time. However, since each group has a constant number of elements, it
takes constant time to sort. Then it makes a recursive call to Select to find the median
of C, the median of medians. Intuitively, by partitioning around this value, we are able to
find something that is close to the true median for partitioning, yet is ‘easier’ to compute,
because it is the median of g = ⌈n/5⌉ elements rather than n. The last part is as before:
once we have our pivot element p, we split the array and recurse on the proper subproblem,
or halt if we found our answer.
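Putting Algorithm 3 together with Select gives worst-case linear selection. A Python sketch (function names are illustrative; each group's median is taken after sorting the group, and the pivot is the ⌈g/2⌉-th smallest of the group medians, as in the notes):

```python
def choose_pivot(A):
    """Algorithm 3: median of medians. Split A into groups of (at most) 5,
    sort each group, and recursively select the median of the group medians."""
    groups = [A[i:i + 5] for i in range(0, len(A), 5)]   # g = ceil(n/5) groups
    C = [sorted(g)[len(g) // 2] for g in groups]         # group medians
    g = len(C)
    return select(C, (g + 1) // 2)                       # ceil(g/2)-th smallest

def select(A, k):
    """Worst-case O(n) selection (k is 1-indexed, elements distinct)."""
    if len(A) <= 5:
        return sorted(A)[k - 1]     # base case: sort a constant-size array
    p = choose_pivot(A)
    A_less = [x for x in A if x < p]
    A_greater = [x for x in A if x > p]
    if len(A_less) == k - 1:
        return p
    elif len(A_less) > k - 1:
        return select(A_less, k)
    else:
        return select(A_greater, k - len(A_less) - 1)
```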
We have devised a slightly complicated method to determine which element to partition
around, but the algorithm remains correct for the same reasons as before. So what is its
running time? As before, we’re going to show this by examining the size of the recursive
subproblems. As it turns out, by taking the median of medians approach, we have a guarantee

on how much smaller the problem gets each iteration. The guarantee is good enough to
achieve O(n) runtime.

2.3.1 Running Time

Lemma 5. |A< | ≤ 7n/10 + 5 and |A> | ≤ 7n/10 + 5.

Proof. p is the median of C, the set of group medians. Because p is the median of g = ⌈n/5⌉
medians, the medians of ⌈g/2⌉ − 1 groups pi are smaller than p. If p is larger than a group’s
median, it is larger than at least three elements in that group (the median and the two
smaller numbers). This applies to all such groups except possibly the remainder group, which
might have fewer than 5 elements. Accounting for the remainder group, p is greater than at
least 3 · (⌈g/2⌉ − 2) elements of A. By symmetry, p is less than at least the same number
of elements.
Now,
|A> | = # of elements greater than p
≤ (n − 1) − 3 · (⌈g/2⌉ − 2)
= n + 5 − 3 · ⌈g/2⌉
≤ n + 5 − 3n/10 (since ⌈g/2⌉ ≥ n/10)
= 7n/10 + 5.

By symmetry, |A< | ≤ 7n/10 + 5 as well.


Intuitively, 60% of the elements (3 out of 5) in half of the groups are less than the pivot,
which is roughly 30% of the total number of elements n. Therefore, at most 70% of the
elements are greater than the pivot; hence |A> | ≈ 7n/10. The same argument applies to |A< |.

The recursive call used to find the median of medians has input of size ⌈n/5⌉ ≤ n/5 + 1.
The other work in the algorithm takes linear time: constant time on each of the ⌈n/5⌉ groups
for MergeSort (linear time in total), and O(n) time scanning A to form A< and A> .
Thus, we can write the full recurrence for the runtime,
T (n) ≤ c1 n + T (n/5 + 1) + T (7n/10 + 5) if n > 5
T (n) ≤ c2 if n ≤ 5.

How do we prove that T (n) = O(n)? The master theorem does not apply here. Instead, we
will prove this using the substitution method.

2.4 Solving the Recurrence of Select Using the Substitution Method

For simplicity, we consider the recurrence T (n) ≤ T (n/5) + T (7n/10) + cn instead of the
exact recurrence of Select.

To prove that T (n) = O(n), we guess:
T (n) ≤ d · n0 if n = n0
T (n) ≤ d · n if n > n0

For the base case, we pick n0 = 1 and use the standard assumption that T (1) = 1 ≤ d. For
the inductive hypothesis, we assume that our guess is correct for any n < k, and we prove
our guess for k. That is, consider d such that for all n0 ≤ n < k, T (n) ≤ dn.
For n = k, we need T (k) ≤ dk. Using the recurrence and the inductive hypothesis,

T (k) ≤ T (k/5) + T (7k/10) + ck ≤ dk/5 + 7dk/10 + ck = (9d/10 + c)k,

which is at most dk provided

9d/10 + c ≤ d, i.e., c ≤ d/10, i.e., d ≥ 10c.

Therefore, we can choose d = max(1, 10c), which is a constant. This completes the induction,
and by the definition of big-O, the recurrence solves to T (n) = O(n).
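As a numerical sanity check (not a proof), we can tabulate an integer version of the simplified recurrence with c = 1 and confirm that the bound T (n) ≤ dn = 10n holds:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(n) = T(n/5) + T(7n/10) + n with c = 1, using integer floors;
    T(n) = 1 for n <= 1."""
    if n <= 1:
        return 1
    return T(n // 5) + T(7 * n // 10) + n

# The substitution method gave d = 10c; with c = 1 we expect T(n) <= 10n.
for n in (10, 1000, 10**6):
    assert T(n) <= 10 * n
```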

2.5 Issues When Using the Substitution Method

Now we will try out an example where our guess is incorrect. Consider the recurrence
T (n) = 2T (n/2) + n (similar to MergeSort). We will guess that the algorithm is linear:

T (n) ≤ d · n0 if n = n0
T (n) ≤ d · n if n > n0

We try the inductive step. We try to pick some d such that for all n ≥ n0 , T (n) ≤ dn.
Substituting the inductive hypothesis T (n/2) ≤ dn/2 into the recurrence, we would need

n + 2 · d · n/2 ≤ dn
n + dn ≤ dn
n ≤ 0.

However, the above can never be true, and there is no choice of d that works! Thus our
guess was incorrect.

This time the guess was incorrect since MergeSort takes superlinear time. Sometimes, however,
the guess can be asymptotically correct but the induction might not work out. Consider
for instance T (n) ≤ 2T (n/2) + 1.
We know that the runtime is O(n) so let’s try to prove it with the substitution method. Let’s
guess that T (n) ≤ cn for all n ≥ n0 .
First we do the induction step: We assume that T (n/2) ≤ cn/2 and consider T (n). We
want that 2 · cn/2 + 1 ≤ cn, that is, cn + 1 ≤ cn. However, this is impossible.
This doesn’t mean that T (n) is not O(n); in this case we simply chose the wrong linear function.
We could guess instead that T (n) ≤ cn − 1. Now the induction step gives 2 · (cn/2 − 1) + 1 =
cn − 1, as desired, for any c. For the base case T (1) = 1 we need 1 ≤ c − 1, which holds for
any c ≥ 2.
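For powers of two, this recurrence in fact solves exactly to T (n) = 2n − 1, matching the strengthened guess cn − 1 with c = 2. A quick check:

```python
def T(n):
    """T(n) = 2 T(n/2) + 1 with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + 1

# Solves to exactly 2n - 1 on powers of two, i.e., T(n) is O(n).
for n in (1, 2, 4, 8, 1024):
    assert T(n) == 2 * n - 1
```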
