CS4311 Design and Analysis of Algorithms: Lecture 8: Order Statistics

Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

CS4311

Design and Analysis of


Algorithms
Lecture 8: Order Statistics

1
About this lecture
•Finding max, min in an unsorted array
(upper bound and lower bound)

•Selecting the kth smallest element in an


unsorted array

kth smallest element  kth order statistics

2
Finding Maximum (Method I)
• Let S denote the input set of n items
• To find the maximum of S, we can:
Step 1: Set max = item 1
Step 2: for k = 2, 3, …, n
if (item k is larger than max)
Update max = item k;
Step 3: return max;
# comparisons = n –1
3
Finding Maximum (Method II)
• Define a function Find-Max as follows:
Find-Max(R, k) /* R is a set with k items */
1. if (k 2) return maximum of R;
2. Partition items of R into bk/2c pairs;
3. Delete smaller item from R in each pair;
4. return Find-Max(R, k - bk/2c);
Calling Find-Max(S,n) gives the maximum of S

4
Finding Maximum (Method II)
Let T(n) = # comparisons for Find-Max with
problem size n

So, T(n) = T(n -bn/2c) + bn/2c for n ¸ 3


T(2) = 1
Solving the recurrence (by substitution),
we get T(n) = n - 1

5
Lower Bound
Question: Can we find the maximum using
fewer than n –1 comparisons?
Answer: No ! To ensure that an item x is
not the maximum, there must be at least
one comparison in which x is the smaller
of the compared items
So, we need to ensure n-1 items not max
 at least n –1 comparisons are needed

6
Finding Both Max and Min
Can we find both max and min quickly?
Solution 1:
First, find max with n –1 comparisons
Then, find min with n –1 comparisons
 Total = 2n –2 comparisons

Is there a better solution ??

7
Finding Both Max and Min
Solution 2: (if n is even)
First, partition items into n/2 pairs;

Next, compare items within each pair;


= larger = smaller
8
Finding Both Max and Min
Then, max = Find-Max in larger items
min = Find-Min in smaller items

Find-Max Find-Min
# comparisons = 3n/2 –2
9
Finding Both Max and Min
Solution 2: (if n is odd)
We find max and min of first n - 1 items;
if (last item is larger than max)
Update max = last item;
if (last item is smaller than min )
Update min = last item;

# comparisons = 3(n-1)/2
10
Finding Both Max and Min
Conclusion:
To find both max and min:
if n is odd: 3(n-1)/2 comparisons
if n is even: 3n/2 –2 comparisons

Combining: at most 3bn/2c comparisons

 better than finding max and min separately

11
Lower Bound
Textbook Ex 9.1-2 (Very challenging):
• Show that we need at least
d3n/2e–2 comparisons
to find both max and min in worst-case

Hint: Consider how many numbers may be


max or min (or both). Investigate how a
comparison affects these counts
12
Selection in Linear Time
• In next slides, we describe a recursive call
Select(S,k)
which supports finding the kth smallest
element in S
• Recursion is used for two purposes:
(1) selecting a good pivot (as in Quicksort)
(2) solving a smaller sub-problem

13
Select(S, k)
/* First,find a good pivot */
1. Partition S into d|S|/5e groups, each
group has five items (one group may
have fewer items);
2. Sort each group separately;
3. Collect median of each group into S’;
4. Find median m of S’
:
m = Select(S’
,dd|S|/5e/2e);

14
4. Let q = # items of S smaller than m;
5. If (k == q + 1)
return m;
/* Partition with pivot */
6. Else partition S into X and Y
X = {items smaller than m}
Y = {items larger than m}
/* Next,form a sub-problem */
7. If (k q + 1)
return Select(X, k)
8. Else
return Select(Y, k– (q+1));
15
Selection in Linear Time
Questions:

1. Why is the previous algorithm correct?


(Prove by Induction)

2. What is its running time?

16
Running Time
• In our selection algorithm, we chose m,
which is the median of medians, to be a
pivot and partition S into two sets X and Y
• In fact, if we choose any other item as the
pivot, the algorithm is still correct
• Why don’ t we just pick an arbitrary pivot
so that we can save some time ??

17
Running Time
• A closer look reviews that the worst-case
running time depends on |X| and |Y|
• Precisely, if T(|S|) denote the worst-case
running time of the algorithm on S, then

T(|S|) = T(d|S|/5e) + (|S|)

+ max {T(|X|),T(|Y|) }

18
Running Time
• Later, we show that if we choose m, the
“median of medians”, as the pivot,

both |X| and |Y| will be at most 3|S|/4

• Consequently,
T(n) = T(dn/5e) + (n) + T(3n/4)

 T(n) = (n) (obtained by substitution)

19
Median of Medians
• Let’
s begin with dn/5e sorted groups, each
has 5 items (one group may have fewer)

= larger = median = smaller

20
Median of Medians
• Then, we obtain the median of medians, m

Groups with median Groups with median


smaller than m =m larger than m 21
Median of Medians
Then, we know that all items marked with
X have value at most m
X X
X
X X X X
X X
X X
X

Groups with median


smaller than m =m X=“
value m”
22
Median of Medians
The number of items with value at most m
is at least
3(ddn/5e/2e–1) - 2

one group may have


each full group has min # of only 1 ‘
crossed’
item
3‘crossed’ items groups

 number of items: at least 3n/10 –5


23
Median of Medians
Previous page implies that at most
7n/10 + 5 items
are greater than m

 For large enough n (say, n 100)


7n/10 + 5  3n/4
 |Y| is at most 3n/4 for large enough n

24
Median of Medians
Similarly, we can show that at most
7n/10 + 5 items are smaller than m
 |X| is at most 3n/4 for large enough n

Conclusion:
The “ median of medians”helps us control
the worst-case size of the sub-problem
 without it, the algorithm runs in (n2)
time in the worst-case
25

You might also like