Median Order Statistics

The document discusses order statistics and selection problems. It defines order statistics as the ith smallest element of a set and discusses how the minimum, maximum, and median are special cases of order statistics. It then defines the selection problem and discusses algorithms for finding the minimum, maximum, and a general selection in linear time using randomization.

Uploaded by

Munawar Ahmed

Order Statistics

Comp 122, Spring 2004


Order Statistic
 ith order statistic: ith smallest element of a set of n
elements.
 Minimum: first order statistic.
 Maximum: nth order statistic.
 Median: “half-way point” of the set.
» Unique, when n is odd – occurs at i = (n+1)/2.
» Two medians when n is even.
• Lower median, at i = n/2.
• Upper median, at i = n/2+1.
• For consistency, “median” will refer to the lower median.
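
The index formulas above can be checked with a small Python sketch. The function name `median_positions` is an assumption made for illustration; it does not appear in the slides.

```python
def median_positions(n):
    """Return the 1-indexed (lower, upper) median positions for n elements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    lower = (n + 1) // 2   # n/2 for even n, (n+1)/2 for odd n
    upper = n // 2 + 1     # n/2 + 1 for even n, (n+1)/2 for odd n
    return lower, upper


odd_case = median_positions(7)    # unique median: both positions coincide
even_case = median_positions(8)   # two medians: lower and upper differ
```

For odd n the two positions coincide, which matches the statement that the median is unique in that case.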

der - 2 Comp 122 Lin / Devi


Selection Problem
 Selection problem:
» Input: A set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
» Output: the element x ∈ A that is larger than exactly i – 1 other elements of A.
 Can be solved in O(n lg n) time. How?
 We will study faster linear-time algorithms.
» For the special cases when i = 1 and i = n.
» For the general problem.
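
One way to get the O(n lg n) bound mentioned above is to sort and then index. The sketch below assumes Python's built-in `sorted` as the O(n lg n) sort; the name `select_by_sorting` is hypothetical.

```python
def select_by_sorting(values, i):
    """Return the ith smallest element (1-indexed) by sorting a copy of values."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]   # the sort dominates: O(n lg n)


second_smallest = select_by_sorting([7, 3, 9, 1, 5], 2)
```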



Minimum (Maximum)
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.         then min ← A[i]
5. return min

Maximum can be determined similarly.

• T(n) = Θ(n).
• No. of comparisons: n – 1.
• Can we do better? No: every element except the minimum must lose at least one comparison.
• Minimum(A) has worst-case optimal # of comparisons.
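
A minimal Python rendering of Minimum(A) that also counts element comparisons. The returned pair and the name `minimum` are choices made for this sketch, not part of the slides.

```python
def minimum(a):
    """Return (smallest element, number of element comparisons made)."""
    if not a:
        raise ValueError("minimum() of an empty sequence")
    smallest = a[0]
    comparisons = 0
    for x in a[1:]:
        comparisons += 1      # the test "min > A[i]" on line 3
        if smallest > x:
            smallest = x      # line 4 of the pseudocode
    return smallest, comparisons


result = minimum([4, 2, 8, 1, 6])
```

On an input of n elements the counter always ends at n – 1, whatever the order of the input.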
Problem
 Average for random input: How many times do we expect line 4 of Minimum(A) to be executed?
» X = RV for # of executions of line 4.
» Xi = Indicator RV for the event that line 4 is executed on the ith iteration.
» X = Σ_{i=2..n} Xi
» E[Xi] = 1/i, since in a random input A[i] is the smallest of the first i elements with probability 1/i.
» Hence, E[X] = Σ_{i=2..n} 1/i = H_n – 1 = ln n + O(1) = Θ(lg n), where H_n is the nth harmonic number.
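
The expectation can be checked exactly for small n by averaging over all orderings of n distinct values. This is only a brute-force sketch; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def count_updates(a):
    """Count how many times line 4 (min <- A[i]) runs on input a."""
    smallest, updates = a[0], 0
    for x in a[1:]:
        if smallest > x:
            smallest = x
            updates += 1
    return updates


def average_updates(n):
    """Exact average of count_updates over all n! orderings of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(count_updates(p) for p in perms), len(perms))


def harmonic_minus_one(n):
    """H_n - 1 = 1/2 + 1/3 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, i) for i in range(2, n + 1)), Fraction(0))
```

For every n small enough to enumerate, `average_updates(n)` should equal `harmonic_minus_one(n)`.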



Simultaneous Minimum and Maximum
 Some applications need to determine both the
maximum and minimum of a set of elements.
» Example: Graphics program trying to fit a set of
points onto a rectangular display.
 Independent determination of maximum and
minimum requires 2n – 2 comparisons.
 Can we reduce this number?
» Yes.



Simultaneous Minimum and Maximum
 Maintain minimum and maximum elements seen so far.
 Process elements in pairs.
» Compare the smaller to the current minimum and the larger to
the current maximum.
» Update current minimum and maximum based on the outcomes.
 No. of comparisons per pair = 3: one between the two elements, one against the current minimum, and one against the current maximum.
 No. of pairs ≤ ⌊n/2⌋.
» For odd n: initialize min and max to A[1]. Pair the remaining elements. So, no. of pairs = ⌊n/2⌋.
» For even n: initialize min to the smaller of the first pair and max to the larger. So, remaining no. of pairs = (n – 2)/2 < ⌊n/2⌋.



Simultaneous Minimum and Maximum
 Total no. of comparisons, C ≤ 3⌊n/2⌋.
» For odd n: C = 3(n – 1)/2 = 3⌊n/2⌋.
» For even n: C = 3(n – 2)/2 + 1 (for the initial comparison)
= 3n/2 – 2 < 3⌊n/2⌋.
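
The pairing scheme and its comparison count can be sketched in Python as follows. The function name `min_and_max` and the explicit counter are assumptions added for illustration.

```python
def min_and_max(a):
    """Return (minimum, maximum, comparisons), processing elements in pairs."""
    n = len(a)
    if n == 0:
        raise ValueError("empty input")
    comparisons = 0
    if n % 2 == 1:
        lo = hi = a[0]              # odd n: start from the first element
        start = 1
    else:
        comparisons += 1            # even n: one comparison for the first pair
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for j in range(start, n, 2):
        x, y = a[j], a[j + 1]
        comparisons += 1            # compare the two elements of the pair
        small, large = (x, y) if x < y else (y, x)
        comparisons += 2            # smaller vs. lo, larger vs. hi
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons


odd_result = min_and_max([5, 3, 9, 1, 7])        # n = 5
even_result = min_and_max([5, 3, 9, 1, 7, 2])    # n = 6
```

The counts come out as 3(n – 1)/2 for odd n and 3n/2 – 2 for even n, compared with 2n – 2 for two independent scans.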



General Selection Problem
 Seems more difficult than Minimum or
Maximum.
» Yet, has solutions with same asymptotic complexity
as Minimum and Maximum.
 We will study 2 algorithms for the general
problem.
» One with expected linear-time complexity.
» A second, whose worst-case complexity is linear.



Selection in Expected Linear Time
 Modeled after randomized quicksort.
 Exploits the abilities of Randomized-Partition (RP).
» RP returns the index k in the sorted order of a randomly chosen
element (pivot).
• If the order statistic we are interested in, i, equals k, then we are done.
• Else, reduce the problem size using its other ability.
» RP rearranges the other elements around the random pivot.
• If i < k, selection can be narrowed down to A[1..k – 1].
• Else, select the (i – k)th element from A[k+1..n].
(Assuming RP operates on A[1..n]. For A[p..r], change k appropriately.)



Randomized Quicksort: review

Quicksort(A, p, r)
    if p < r then
        q := Rnd-Partition(A, p, r);
        Quicksort(A, p, q – 1);
        Quicksort(A, q + 1, r)
    fi

Rnd-Partition(A, p, r)
    i := Random(p, r);
    A[r] ↔ A[i];
    x, i := A[r], p – 1;
    for j := p to r – 1 do
        if A[j] ≤ x then
            i := i + 1;
            A[i] ↔ A[j]
        fi
    od;
    A[i + 1] ↔ A[r];
    return i + 1

[Figure: Partition rearranges A[p..r] around the pivot (5 in the slide's example) into A[p..q – 1], the pivot, and A[q+1..r].]



Randomized-Select
Randomized-Select(A, p, r, i)   // select ith order statistic.
    if p = r
        then return A[p]
    q ← Randomized-Partition(A, p, r)
    k ← q – p + 1
    if i = k
        then return A[q]
    elseif i < k
        then return Randomized-Select(A, p, q – 1, i)
    else return Randomized-Select(A, q+1, r, i – k)
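
The pseudocode translates almost line for line into Python. The sketch below uses 0-based inclusive bounds p and r, keeps i 1-based as in the slides, and rearranges the list in place; the function names are illustrative.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot and return the pivot's index."""
    k = random.randint(p, r)
    a[r], a[k] = a[k], a[r]
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the ith smallest element (1-indexed) of a[p..r]."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                    # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    elif i < k:
        return randomized_select(a, p, q - 1, i)
    else:
        return randomized_select(a, q + 1, r, i - k)


data = [9, 4, 7, 1, 8, 3, 6]
median = randomized_select(list(data), 0, len(data) - 1, 4)
```

Unlike quicksort, only one of the two sides is visited after each partition.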



Analysis
 Worst-case Complexity:
» Θ(n²) – As we could get unlucky and always recurse on a subarray that is only one element smaller than the previous subarray.
 Average-case Complexity:
» Θ(n) – Intuition: Because the pivot is chosen at random, we expect that we get rid of half of the list each time we choose a random pivot q.
» Why Θ(n) and not Θ(n lg n)?
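
An informal way to see the answer, under the simplifying assumption that every pivot discards exactly half of the current subarray: the partitioning costs form a geometric series, because only one side is kept at each step.

```latex
\[
  n + \frac{n}{2} + \frac{n}{4} + \cdots
  \;\le\; n \sum_{j=0}^{\infty} \frac{1}{2^{j}}
  \;=\; 2n \;=\; \Theta(n).
\]
```

Quicksort, by contrast, recurses on both sides, so each of its roughly lg n levels costs about n in total, which is where the n lg n comes from.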



Average-case Analysis
 Define Indicator RV’s Xk, for 1 ≤ k ≤ n.
» Xk = I{subarray A[p…q] has exactly k elements}.
» Pr{subarray A[p…q] has exactly k elements} = 1/n for all k = 1..n.
» Hence, E[Xk] = 1/n. (9.1)
 Let T(n) be the RV for the time required by Randomized-Select (RS) on A[p…r] of n elements.
 Determine an upper bound on E[T(n)].



Average-case Analysis
 A call to RS may
» Terminate immediately with the correct answer,
» Recurse on A[p..q – 1], or
» Recurse on A[q+1..r].
 To obtain an upper bound, assume that the ith smallest element that we want is always in the larger subarray.
 RP takes O(n) time on a problem of size n.
 Hence, the recurrence for T(n) is:
\[
T(n) \le \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr)
\]
 For a given call of RS, Xk = 1 for exactly one value of k, and Xk = 0 for all other k.


Average-case Analysis
\[
\begin{aligned}
T(n) &\le \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr) \\
     &= \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n)
\end{aligned}
\]
Taking expectation, we have
\[
\begin{aligned}
E[T(n)] &\le E\Bigl[ \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n) \Bigr] \\
        &= \sum_{k=1}^{n} E\bigl[ X_k \cdot T(\max(k-1,\, n-k)) \bigr] + O(n) && \text{(by linearity of expectation)} \\
        &= \sum_{k=1}^{n} E[X_k] \cdot E\bigl[ T(\max(k-1,\, n-k)) \bigr] + O(n) && \text{(by Eq. (C.23))} \\
        &= \sum_{k=1}^{n} \frac{1}{n} \cdot E\bigl[ T(\max(k-1,\, n-k)) \bigr] + O(n) && \text{(by Eq. (9.1))}
\end{aligned}
\]
The step using Eq. (C.23) relies on Xk and T(max(k – 1, n – k)) being independent.



Average-case Analysis (Contd.)
\[
\max(k-1,\, n-k) =
\begin{cases}
k-1 & \text{if } k > \lceil n/2 \rceil \\
n-k & \text{if } k \le \lceil n/2 \rceil
\end{cases}
\]
The summation is expanded:
\[
E[T(n)] \le O(n) + \frac{1}{n} \Bigl( E[T(n-1)] + E[T(n-2)] + \cdots + E[T(n - \lceil n/2 \rceil)]
+ E[T(\lceil n/2 \rceil)] + \cdots + E[T(n-1)] \Bigr)
\]
• If n is odd, T(n – 1) thru T(⌈n/2⌉) occur twice and T(⌊n/2⌋) occurs once.
• If n is even, T(n – 1) thru T(n/2) occur twice.

Thus, we have
\[
E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n).
\]



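Average-case Analysis (Contd.)
 We solve the recurrence by substitution.
 Guess E[T(n)] = O(n), i.e., E[T(n)] ≤ cn for a suitable constant c, and bound the O(n) term by an.
\[
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
        &= \frac{2c}{n} \Bigl( \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \Bigr) + an \\
        &\le \frac{c}{n} \Bigl( \frac{3n^2}{4} + \frac{n}{2} - 2 \Bigr) + an \\
        &= c \Bigl( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \Bigr) + an \\
        &\le \frac{3cn}{4} + \frac{c}{2} + an \\
        &= cn - \Bigl( \frac{cn}{4} - \frac{c}{2} - an \Bigr).
\end{aligned}
\]
The right-hand side is at most cn exactly when
\[
\frac{cn}{4} - \frac{c}{2} - an \ge 0
\iff n \Bigl( \frac{c}{4} - a \Bigr) \ge \frac{c}{2}
\iff n \ge \frac{c/2}{c/4 - a} = \frac{2c}{c - 4a}, \quad \text{if } c > 4a.
\]
Thus, if we assume T(n) = O(1) for n < 2c/(c – 4a), we have E[T(n)] = O(n).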



Selection in Worst-Case Linear Time
 Algorithm Select:
» Like RandomizedSelect, finds the desired element by
recursively partitioning the input array.
» Unlike RandomizedSelect, is deterministic.
• Uses a variant of the deterministic Partition routine.
• Partition is told which element to use as the pivot.
» Achieves linear-time complexity in the worst case by
• Guaranteeing that the split is always “good” at each
Partition.
• How can a good split be guaranteed?



Guaranteeing a Good Split
 We will have a good split if we can ensure that
the pivot is the median element or an element
close to the median.
 Hence, determining a reasonable pivot is the first
step.



Choosing a Pivot
 Median-of-Medians:
» Divide the n elements into ⌈n/5⌉ groups.
• ⌊n/5⌋ groups contain 5 elements each. If 5 does not divide n, 1 more group contains the remaining n mod 5 < 5 elements.
• Determine the median of each of the groups.
– Sort each group using Insertion Sort. Pick the median from the sorted list of group elements.
• Recursively find the median x of the ⌈n/5⌉ medians.
 Recurrence for running time (of median-of-medians):
» T(n) = O(n) + T(⌈n/5⌉) + ….
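
The grouping step can be sketched in Python as below. Python's `sorted` stands in for Insertion Sort on each group of at most 5 elements, and the lower median is taken, following the convention stated earlier; the name `group_medians` is an assumption.

```python
def group_medians(values):
    """Split values into groups of 5 (the last may be smaller); return each group's lower median."""
    medians = []
    for start in range(0, len(values), 5):
        group = sorted(values[start:start + 5])    # constant work per group
        medians.append(group[(len(group) - 1) // 2])
    return medians


medians = group_medians([9, 1, 8, 2, 7, 3, 6, 4, 5, 0, 11, 10])
```

Each group has constant size, so this step takes O(n) time overall. The recursive call on the resulting medians is what contributes the T(⌈n/5⌉) term.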
Algorithm Select
 Determine the median-of-medians x (using the procedure
on the previous slide.)
 Partition the input array around x using the variant of
Partition.
 Let k be the index of x that Partition returns.
 If k = i, then return x.
 Else if i < k, then apply Select recursively to A[1..k–1] to
find the ith smallest element.
 Else if i > k, then apply Select recursively to A[k+1..n] to
find the (i – k)th smallest element.
(Assumption: Select operates on A[1..n]. For subarrays A[p..r],
suitably change k. )
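
The steps above can be sketched in Python. This version is a simplification rather than the slides' in-place algorithm: it builds new lists instead of partitioning A[p..r] in place, it assumes distinct elements, and it sorts directly once at most 5 elements remain. The name `select` follows the slides; everything else is illustrative.

```python
def select(values, i):
    """Return the ith smallest (1-indexed) of a list of distinct values."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(values) <= 5:
        return sorted(values)[i - 1]            # small base case
    # Lower median of each group of 5 (the last group may be smaller).
    medians = []
    for start in range(0, len(values), 5):
        group = sorted(values[start:start + 5])
        medians.append(group[(len(group) - 1) // 2])
    # Median-of-medians x, found recursively.
    x = select(medians, (len(medians) + 1) // 2)
    # Partition around x.
    smaller = [v for v in values if v < x]
    larger = [v for v in values if v > x]
    k = len(smaller) + 1                        # rank of x
    if i == k:
        return x
    if i < k:
        return select(smaller, i)
    return select(larger, i - k)


data = [23, 5, 17, 42, 8, 15, 4, 16, 31, 1, 9, 12, 27]
fourth = select(data, 4)
```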



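Worst-case Split
[Figure: the n elements drawn as ⌈n/5⌉ groups: ⌊n/5⌋ groups of 5 elements each, plus a ⌈n/5⌉th group of n mod 5 elements. Arrows point from larger to smaller elements. The median-of-medians x is marked, together with the regions of elements < x and elements > x.]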



Worst-case Split
 Assumption: Elements are distinct. Why?
 At least half of the ⌈n/5⌉ medians are greater than or equal to x.
 Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are greater than x.
» The last group and the group containing x may contribute fewer than 3 elements. Exclude these groups.
 Hence, the no. of elements > x is at least
\[
3 \Bigl( \Bigl\lceil \frac{1}{2} \Bigl\lceil \frac{n}{5} \Bigr\rceil \Bigr\rceil - 2 \Bigr) \ge \frac{3n}{10} - 6.
\]
 Analogously, the no. of elements < x is at least 3n/10 – 6.
 Thus, in the worst case, Select is called recursively on at most 7n/10 + 6 elements.
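
The 3n/10 – 6 bound can be checked empirically. In this sketch the median-of-medians is found by plain sorting rather than recursively, and lower medians are used throughout. The helper names and the random test inputs are assumptions made for illustration.

```python
import random


def split_counts(values):
    """Return (# elements < x, # elements > x), where x is the median-of-medians."""
    groups = [sorted(values[s:s + 5]) for s in range(0, len(values), 5)]
    medians = sorted(g[(len(g) - 1) // 2] for g in groups)
    x = medians[(len(medians) - 1) // 2]
    smaller = sum(1 for v in values if v < x)
    larger = sum(1 for v in values if v > x)
    return smaller, larger


def bound_holds(n, trials=20, seed=0):
    """Check that both sides have at least 3n/10 - 6 elements on random distinct inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = rng.sample(range(10 * n), n)    # n distinct integers
        smaller, larger = split_counts(values)
        if min(smaller, larger) < 3 * n / 10 - 6:
            return False
    return True


example = split_counts(list(range(1, 26)))   # 25 elements, x = 13
```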

Recurrence for worst-case running time
 T(Select) ≤ T(Median-of-medians) + T(Partition) + T(recursive call to Select)
 T(n) ≤ O(n) + T(⌈n/5⌉) + O(n) + T(7n/10 + 6)
» O(n) + T(⌈n/5⌉) is T(Median-of-medians), the second O(n) is T(Partition), and T(7n/10 + 6) is T(recursive call).
= T(⌈n/5⌉) + T(7n/10 + 6) + O(n)
 Assume T(n) ≤ Θ(1), for n ≤ 140.



Solving the recurrence
 To show: T(n) = O(n), i.e., T(n) ≤ cn for suitable c and all n > 0.
 Assume: T(n) ≤ cn for suitable c and all n ≤ 140.
 Substituting the inductive hypothesis into the recurrence,
» T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
≤ cn/5 + c + 7cn/10 + 6c + an
= 9cn/10 + 7c + an
= cn + (–cn/10 + 7c + an)
≤ cn, if –cn/10 + 7c + an ≤ 0.
» –cn/10 + 7c + an ≤ 0 ⇔ c ≥ 10a(n/(n – 70)), when n > 70.
» For n ≥ 140, n/(n – 70) ≤ 2, so c ≥ 20a suffices.
 n/(n – 70) is a decreasing function of n. Verify.
 Hence, c can be chosen for any n = n0 > 70, provided it can be assumed that T(n) = O(1) for n ≤ n0.
 Thus, Select has linear-time complexity in the worst case.
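
The choice c = 20a can be checked numerically. The sketch below makes assumptions beyond the slides: the inequality is treated as an equality, the O(n) term is exactly A·n, 7n/10 is rounded down, and the base case for n < 140 is A·n. The names `A` and `select_cost` are hypothetical.

```python
from functools import lru_cache

A = 1  # assumed constant hidden in the O(n) term


@lru_cache(maxsize=None)
def select_cost(n):
    """T(n) = T(ceil(n/5)) + T(floor(7n/10) + 6) + A*n for n >= 140, else A*n."""
    if n < 140:
        return A * n                       # assumed base case
    return (select_cost((n + 4) // 5)      # ceil(n/5): median-of-medians call
            + select_cost(7 * n // 10 + 6) # recursive call on the larger side
            + A * n)                       # grouping and partitioning


worst_ratio = max(select_cost(n) / n for n in range(1, 5001))
```

Under these assumptions `worst_ratio` stays at or below 20·A, which agrees with the bound T(n) ≤ 20an from the substitution argument.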
