Announcements: Weekly Reading: Chap 11 (CLRS) (Not On Upcoming Exam)

This document describes algorithms for finding order statistics like the minimum, maximum, and median of a data set. It introduces an algorithm called Randomized-Select that can find the ith order statistic in average O(n) time by recursively partitioning the data around a random pivot. It also presents an algorithm that can find the ith order statistic in worst-case linear time using the "median of medians" technique to select a good partitioning element at each step.

Uploaded by

Ibrahim Hawari

UNC Chapel Hill

Announcements
Weekly Reading: Chap 11 (CLRS) [not on upcoming exam]
Assignment 3, due today; solutions on Sakai today; graded results back on Tuesday
Exam on Sorting and Its Analysis on Thursday, 26 September



Order Statistics
The ith order statistic of a set of n elements is
the ith smallest element

Minimum: the first order statistic
Maximum: the nth order statistic
Median: the ⌈n/2⌉th order statistic (the lower median when n is even)

The selection problem can be specified as
Input: A set A of n (distinct) numbers and a number i,
with 1 ≤ i ≤ n
Output: the element x ∈ A that is larger than
exactly i-1 other elements of A

Minimum (A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.         then min ← A[i]
5. return min

Note: lines 1 and 4 are the only lines that assign ("swap") a new
value into min
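
As a minimal sketch, the Minimum pseudocode can be written in Python as follows. The function name `minimum` and the returned comparison count are illustrative additions that are not in the slides, and indices are 0-based here.

```python
def minimum(a):
    """Return (smallest element, number of element comparisons) for a non-empty list."""
    smallest = a[0]            # corresponds to line 1 of the pseudocode
    comparisons = 0
    for value in a[1:]:        # line 2: scan the remaining elements
        comparisons += 1       # one comparison per remaining element
        if smallest > value:   # line 3
            smallest = value   # line 4
    return smallest, comparisons
```

For example, `minimum([8, 1, 5, 3, 4])` returns `(1, 4)`: the minimum is 1 and n-1 = 4 comparisons were made.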

Algorithm Analysis:
Min or Max
Worst case and average case comparisons:
T(n) = n-1 = Θ(n) for Minimum(A) or Maximum(A)

Average case number of swaps s: line 1 or 4
(swap) is executed Θ(lg n) times, proved as follows:
Let s_i = 1 if A[i] is swapped into min and s_i = 0 otherwise, so that
s = s_1 + s_2 + ... + s_n (s_1 = 1 because of line 1).
For any 2 ≤ i ≤ n, the probability line 4 is executed for A[i] is
the probability that A[i] is the minimum among all
A[j] for 1 ≤ j ≤ i, which is 1/i when the input order is random. So, the expectation of s is

\[
E[s] = E[s_1 + s_2 + \cdots + s_n]
     = \frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n}
     = \ln n + O(1) = \Theta(\lg n)
\]
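
The expectation argument can be checked exhaustively for small n. The sketch below is an illustration, not part of the slides: it averages the number of assignments to min over all n! orderings of n distinct values and compares the result with the harmonic number 1/1 + 1/2 + ... + 1/n. The helper names `count_min_updates`, `expected_updates`, and `harmonic` are made up for this example.

```python
from fractions import Fraction
from itertools import permutations


def count_min_updates(seq):
    """Count how often min is assigned (lines 1 and 4 of Minimum)."""
    current = seq[0]
    updates = 1                      # the assignment in line 1
    for value in seq[1:]:
        if current > value:
            current = value
            updates += 1             # an assignment in line 4
    return updates


def expected_updates(n):
    """Exact average of count_min_updates over all orderings of n distinct values."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_min_updates(p) for p in perms), len(perms))


def harmonic(n):
    """Exact harmonic number H_n = 1/1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, i) for i in range(1, n + 1))
```

Using exact fractions avoids floating-point noise, so `expected_updates(n) == harmonic(n)` can be compared directly for small n (the enumeration grows as n!, so it is only practical for n up to about 8).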

Simultaneous Minimum and
Maximum (A), Basic Method
min ← A[1]; max ← A[1]
for i ← 2 to length[A]
    do if min > A[i]
           then min ← A[i]
       if max < A[i]
           then max ← A[i]
return min, max

Algorithm Analysis for
Min and Max, Basic Method
Worst case and average case comparisons:
T(n) = 2n-2 = Θ(n) for finding both minimum(A) and maximum(A)
In the following we will see an algorithm that also
requires Θ(n) comparisons but has constant
multiplier 3/2 rather than 2


Simultaneous Minimum and
Maximum (A), Improved Method
1. For i = 1 to ⌊length[A]/2⌋
       Sort the pair A[2i-1] and A[2i] (one comparison)
2. Do the normal algorithm for max on the second
   elements of the sorted pairs,
   plus the singleton at the end if odd # of elements
3. Do the normal algorithm for min on the first
   elements of the sorted pairs,
   plus the singleton at the end if odd # of elements

Idea: if you have pairs ordered by value, the min can come
only from the first elements, and the max can come only
from the second elements

Min and Max, Algorithm Analysis,
Improved Method

Pair sorting takes ⌊n/2⌋ comparisons
Max finding and min finding each take ⌈n/2⌉ - 1
comparisons
Thus, only 3n/2 - O(1) comparisons suffice
to find both the minimum and the maximum
(exactly 3n/2 - 2 when n is even).
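
The improved method and its comparison count can be sketched as follows. This is one possible rendering under stated assumptions: the name `min_max_pairs` and the explicit `firsts`/`seconds` lists are illustrative choices, the input is assumed non-empty, and only element-to-element comparisons are counted.

```python
def min_max_pairs(a):
    """Return (min, max, comparisons) for a non-empty list using sorted pairs."""
    comparisons = 0
    firsts, seconds = [], []
    # Step 1: order each adjacent pair with a single comparison.
    for j in range(0, len(a) - 1, 2):
        comparisons += 1
        if a[j] <= a[j + 1]:
            firsts.append(a[j])
            seconds.append(a[j + 1])
        else:
            firsts.append(a[j + 1])
            seconds.append(a[j])
    # An odd-length input leaves a singleton that is a candidate for both.
    if len(a) % 2:
        firsts.append(a[-1])
        seconds.append(a[-1])
    # Step 3: ordinary minimum over the first elements.
    lo = firsts[0]
    for v in firsts[1:]:
        comparisons += 1
        if v < lo:
            lo = v
    # Step 2: ordinary maximum over the second elements.
    hi = seconds[0]
    for v in seconds[1:]:
        comparisons += 1
        if v > hi:
            hi = v
    return lo, hi, comparisons
```

For example, `min_max_pairs([3, 1, 4, 1, 5, 9])` returns `(1, 9, 7)`, and 7 = 3·6/2 - 2, compared with 2·6 - 2 = 10 comparisons for the basic method.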

Selection of ith Order Stat
Other Than Min or Max
At worst, it can be done by a full O(n log n) sort and
then selecting A[i]
But with the quicksort idea of reorganizing with respect to a pivot
element, you can recur into only the part in which
the ith element falls

Randomized-Select
Partition the input array around a randomly
chosen element x using Randomized-Partition.
Let k be the number of elements on the low
side and n-k on the high side.

Use Randomized-Select recursively to find the
ith smallest element on the low side if i ≤ k, or
the (i-k)th smallest element on the high side if
i > k

Randomized-Select (A, p, r, i)
1. if p = r
2.     then return A[p]
3. q ← Randomized-Partition(A, p, r)
4. k ← q - p + 1
5. if i ≤ k
6.     then return Randomized-Select(A, p, q, i)
7.     else return Randomized-Select(A, q+1, r, i-k)

The worst-case running time can be Θ(n²), but
we will see that the average performance is
O(n).
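
A runnable sketch of Randomized-Select is given below. The slides do not show the body of Randomized-Partition, so the Hoare-style partition here is an assumption chosen because it matches the recursion on A[p..q] and A[q+1..r]; it returns q with p ≤ q < r such that every element of a[p..q] is ≤ every element of a[q+1..r]. Array indices are 0-based, while the rank i stays 1-based.

```python
import random


def randomized_partition(a, p, r):
    """Hoare-style partition of a[p..r] (p < r) around a randomly chosen pivot."""
    s = random.randint(p, r)          # pick the pivot position uniformly
    a[p], a[s] = a[s], a[p]
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:               # move j left to an element <= x
            j -= 1
        i += 1
        while a[i] < x:               # move i right to an element >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]; reorders a in place."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                     # size of the low side a[p..q]
    if i <= k:
        return randomized_select(a, p, q, i)
    return randomized_select(a, q + 1, r, i - k)
```

For example, `randomized_select([8, 1, 5, 3, 4], 0, 4, 3)` returns 4 regardless of which pivots are drawn; only the running time depends on the random choices.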

Randomized-Partition Example
Goal: find the 3rd smallest element (i = 3)
8 1 5 3 4    input array
1 3 | 8 5 4  after partitioning: low side 1 3 (k = 2), high side 8 5 4
8 5 4        i > k, so recur on the high side for its (i-k) = 1st smallest element
4 5 8        high side after partitioning again
4            the 1st smallest element of the high side, i.e. the 3rd smallest overall

Upper Bound Analysis
At a recursive step, we first partition, which takes O(n)
comparisons.
When we have n elements partitioned into k-1, 1, and n-k,
we can do no worse than having to work with the larger of
k and n-k
So T(n) ≤ T(max(k, n-k)) + O(n)
If each k is equally likely, with probability 1/n, all
partition sizes can happen in two ways (as k or as n-k), so each
size from ⌊n/2⌋ to n-1 is counted twice:

\[
E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n)
\]
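
Average-Case Analysis
cont.

Writing T(n) for E[T(n)]:

\[
T(n) \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k) + O(n)
\]

Substitution Method: Guess T(n) ≤ cn. Taking n even for simplicity, so that ⌊n/2⌋ = n/2:

\[
\begin{aligned}
T(n) &\le \frac{2}{n} \sum_{k=n/2}^{n-1} ck + O(n) \\
     &= \frac{2c}{n} \left( \sum_{k=1}^{n-1} k - \sum_{k=1}^{n/2-1} k \right) + O(n) \\
     &= \frac{2c}{n} \left( \frac{(n-1)n}{2} - \frac{(n/2-1)(n/2)}{2} \right) + O(n) \\
     &= c(n-1) - \frac{c}{n} \left( \frac{n}{2} - 1 \right) \frac{n}{2} + O(n) \\
     &= c \left( \frac{3n}{4} - \frac{1}{2} \right) + O(n) \\
     &\le cn
\end{aligned}
\]

The last step holds if we pick c large enough so that
c(n/4 + 1/2) dominates the O(n) term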
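
The substitution argument can be sanity-checked numerically. The sketch below is an illustration under explicit assumptions: the O(n) term is taken to be exactly a·n with a = 1, T(0) is set to 0, and the recurrence is treated as an equality. With those choices the guess T(n) ≤ cn holds for c = 4, since c(n/4 + 1/2) = n + 2 ≥ n. The function name `average_case_recurrence` is made up for this example.

```python
def average_case_recurrence(n_max, a=1.0):
    """Evaluate T(n) = (2/n) * sum_{k=floor(n/2)}^{n-1} T(k) + a*n, with T(0) = 0.

    Returns the list [T(0), T(1), ..., T(n_max)].
    """
    t = [0.0] * (n_max + 1)
    # prefix[m] holds T(0) + T(1) + ... + T(m-1), so window sums cost O(1).
    prefix = [0.0] * (n_max + 2)
    for n in range(1, n_max + 1):
        window = prefix[n] - prefix[n // 2]   # T(floor(n/2)) + ... + T(n-1)
        t[n] = 2.0 * window / n + a * n
        prefix[n + 1] = prefix[n] + t[n]
    return t
```

Calling `average_case_recurrence(2000)` and comparing each `t[n]` with `4 * n` shows the linear bound holding across the whole range.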

Selection in
Worst-Case Linear Time
It finds the desired element(s) by
recursively partitioning the input
array

Basic idea: generate a good split
of the array by computing a good
partitioning element,
where that computation is itself efficient
enough

Algorithm to select ith order stat
in worst case linear time
1 Divide the n elements of the input array into ⌊n/5⌋ groups of
  5 elements each and at most one group made up of the
  remaining (n mod 5) elements.
2 Find the median of each of the ⌈n/5⌉ groups by insertion sort & take
  its middle element (the smaller of the 2 middle elements if the group size is even).
3 Use this method recursively to find the median x of the
  ⌈n/5⌉ medians found in step 2.
4 Partition the input array around the median-of-medians x.
  Let k be the number of
  elements on the low side and n-k on the high side.
5 Use Select recursively to find the ith smallest element
  on the low side if i ≤ k, or the (i-k)th smallest element on
  the high side if i > k
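
The five steps above can be sketched in Python as follows. This is a simplified variant rather than a literal transcription: it builds new lists instead of partitioning in place, uses Python's `sorted` on each group of at most 5 elements where the slides use insertion sort, and partitions three ways (less than, equal to, greater than x) so that duplicates are handled and each recursive call is strictly smaller. The name `select` and the 1-based rank `i` are choices made for this example.

```python
def select(a, i):
    """Return the i-th smallest (1-based) element of the non-empty list a."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Steps 1-2: split into groups of 5 and take each group's (lower) median.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the (lower) median of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (three-way, see the note above).
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    n_equal = len(a) - len(low) - len(high)
    # Step 5: recurse only into the side that contains the answer.
    if i <= len(low):
        return select(low, i)
    if i <= len(low) + n_equal:
        return x
    return select(high, i - len(low) - n_equal)
```

For example, `select(list(range(100, 0, -1)), 10)` returns 10. Because x itself is never passed down, both `low` and `high` are strictly shorter than `a`, which guarantees termination.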

[Figure: Pictorial Analysis of Select]

Algorithm Analysis (I)
At least half of the medians found in step 2 are
greater than or equal to the median-of-medians x.
Thus, at least half of the ⌈n/5⌉ groups contribute
3 elements that are greater than x, except the
one that has < 5 elements and the one group containing x.
The number of elements > x is at least
3 (⌈(1/2)⌈n/5⌉⌉ - 2) ≥ 3n/10 - 6
Similarly, the number of elements < x is at least
3n/10 - 6. In the worst case, SELECT is called
recursively on at most 7n/10 + 6 elements.
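
The counting bound can be checked with integer arithmetic. The helper `min_elements_greater` below is a hypothetical name for this illustration; it evaluates the left-hand side 3(⌈(1/2)⌈n/5⌉⌉ - 2) so that it can be compared with 3n/10 - 6 for any n.

```python
def min_elements_greater(n):
    """Guaranteed number of elements greater than x: 3 * (ceil(ceil(n/5) / 2) - 2)."""
    groups = -(-n // 5)        # ceil(n / 5), the number of groups
    half = -(-groups // 2)     # ceil(groups / 2), groups whose median is >= x
    return 3 * (half - 2)      # drop the short group and x's own group
```

For n = 100 there are 20 groups, so at least 3(10 - 2) = 24 elements exceed x. That equals 3·100/10 - 6, and the recursive call in the worst case sees at most 100 - 24 = 76 = 7·100/10 + 6 elements.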

Algorithm Analysis (II), for
computing the partitioner
1 Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements
each and at most one group made up of the remaining (n mod 5)
elements.
2 Find the median of each group by insertion sort & take its middle
element (the smaller of the 2 middle elements if the group size is even).
3 Use this method recursively to find the median x of the ⌈n/5⌉
medians
Steps 1 and 2 together take O(n) time.
Step 3 takes time T(⌈n/5⌉)
So to find the median of medians: solve the
recurrence T(n) = T(n/5) + O(n)
Yields T(n) = O(n)

Creating the Recurrence Equation
for the full method
Steps 1, 2 and 4 take O(n) time. Step 3 is a recursive call that takes
time T(⌈n/5⌉). Step 5 takes time at
most T(7n/10 + 6). Assume we apply the recursion
only for n > some n_0

T(n) = Θ(1), if n ≤ n_0
T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n), if n > n_0

1-3. Compute x = the median of medians.
4. Partition the input array around x. Let k be the number of
elements on the low side and n-k on the high side.
5. Use Select recursively to find the ith smallest element on the low
side if i ≤ k, or the (i-k)th smallest element on the high side if i > k
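
Solving Recurrence
Steps 1, 2 and 4 take O(n) time. Step 3 takes time
T(⌈n/5⌉) and step 5 takes time at most T(7n/10 + 6).

T(n) = Θ(1), if n ≤ n_0
T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n), if n > n_0

Substitution Method: Guess T(n) ≤ cn

\[
\begin{aligned}
T(n) &\le c \lceil n/5 \rceil + c(7n/10 + 6) + O(n) \\
     &\le cn/5 + c + 7cn/10 + 6c + O(n) \\
     &= 9cn/10 + 7c + O(n) \\
     &= cn - c(n/10 - 7) + O(n) \\
     &\le cn
\end{aligned}
\]

if we choose c large enough such
that c(n/10 - 7) is larger than the O(n) term (which requires n > 70)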
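
The linear bound can also be checked numerically. The sketch below makes several assumptions that the slides leave open: n_0 is set to 140, the O(n) term is taken to be exactly n, the base-case cost is taken to be n (any cost bounded by a constant would do), and the recurrence is treated as an equality. Under these assumptions c = 20 works, because 20(n/10 - 7) ≥ n exactly when n ≥ 140.

```python
from functools import lru_cache

N0 = 140  # assumed cutoff below which no recursion happens


@lru_cache(maxsize=None)
def worst_case_cost(n):
    """Evaluate T(n) = T(ceil(n/5)) + T(floor(7n/10) + 6) + n, with T(n) = n for n <= N0."""
    if n <= N0:
        return n
    medians = -(-n // 5)            # ceil(n / 5): size of the step-3 subproblem
    larger_side = 7 * n // 10 + 6   # upper bound on the step-5 subproblem
    return worst_case_cost(medians) + worst_case_cost(larger_side) + n
```

Both subproblem sizes are smaller than n whenever n > 140, so the recursion terminates, and memoization with `lru_cache` keeps the evaluation fast. Comparing `worst_case_cost(n)` with `20 * n` over a range of n illustrates the bound T(n) ≤ cn.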

The Full Algorithm
Our analysis shows that on small inputs, e.g., n < 80, the
fancy algorithm with a median-of-medians pivot is
slower on average than the simple algorithm
with a random pivot.
Thus, the full algorithm does two things:
If n < 80, run with a random pivot
Otherwise, subdivide recursively with the median-of-medians
pivot until the subdivision yields a subfile of size
< 80, in which case switch to the algorithm with the
random pivot.
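
The two-part scheme can be sketched as a single dispatching function. As before this is an illustration under assumptions: `CUTOFF`, `random_pivot_select`, and `full_select` are names invented here, lists are copied rather than partitioned in place, and a three-way split is used so that duplicate values are handled. Only the threshold of 80 comes from the slides.

```python
import random

CUTOFF = 80  # threshold quoted in the slides


def random_pivot_select(a, i):
    """i-th smallest (1-based) element of a, partitioning around random pivots."""
    while True:
        x = random.choice(a)
        low = [v for v in a if v < x]
        high = [v for v in a if v > x]
        n_equal = len(a) - len(low) - len(high)
        if i <= len(low):
            a = low
        elif i <= len(low) + n_equal:
            return x
        else:
            i -= len(low) + n_equal
            a = high


def full_select(a, i):
    """Use the median-of-medians pivot for large inputs, a random pivot below CUTOFF."""
    if len(a) < CUTOFF:
        return random_pivot_select(a, i)
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    x = full_select(medians, (len(medians) + 1) // 2)
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    n_equal = len(a) - len(low) - len(high)
    if i <= len(low):
        return full_select(low, i)
    if i <= len(low) + n_equal:
        return x
    return full_select(high, i - len(low) - n_equal)
```

Note that the recursive call on the medians also goes through `full_select`, so it switches to the random pivot as soon as the list of medians drops below the cutoff.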
