4 Quicksort.v2
Partition example: A[p…r] = 5 7 1 2 8 4 3 6, pivot A[r] = 6; i starts at p-1 and j scans A[p…r-1]
j at 5 (5 ≤ 6): i advances; A[i] is swapped with itself: 5 7 1 2 8 4 3 6
j at 7 (7 > 6): no change: 5 7 1 2 8 4 3 6
j at 1 (1 ≤ 6): i advances; swap 7 and 1: 5 1 7 2 8 4 3 6
j at 2 (2 ≤ 6): i advances; swap 7 and 2: 5 1 2 7 8 4 3 6
j at 8 (8 > 6): no change: 5 1 2 7 8 4 3 6
What’s happening?
j at 4 (4 ≤ 6): i advances; swap 7 and 4: 5 1 2 4 8 7 3 6
j at 3 (3 ≤ 6): i advances; swap 8 and 3: 5 1 2 4 3 7 8 6
Finally, swap A[i+1] = 7 with the pivot A[r] = 6: 5 1 2 4 3 6 8 7
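The trace above is consistent with the Lomuto partition scheme (pivot A[r], i marking the end of the ≤-pivot region while j scans). The Partition pseudocode itself is not reproduced in this section, so the following Python is a minimal sketch under that assumption. It uses 0-based indices, so the pivot's final index q = 5 corresponds to position 6 in the trace.

```python
# Illustrative sketch: assumes the lecture's Partition is the Lomuto scheme.
def partition(A, p, r):
    """Partition A[p..r] (inclusive, 0-based) around the pivot A[r].

    Afterwards A[p..q-1] <= A[q] < A[q+1..r]; returns q, the pivot's final index.
    """
    x = A[r]                          # pivot: the last element of the subarray
    i = p - 1                         # A[p..i] holds the elements seen so far that are <= x
    for j in range(p, r):             # j scans A[p..r-1]
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]   # place the pivot between the two sets
    return i + 1


A = [5, 7, 1, 2, 8, 4, 3, 6]
q = partition(A, 0, len(A) - 1)
print(A, q)   # [5, 1, 2, 4, 3, 6, 8, 7] 5
```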
Is Partition correct?
Does it partition the elements A[p…r-1] into two sets, those ≤ pivot and those > pivot?
Loop invariant: A[p…i] ≤ A[r] and A[i+1…j-1] > A[r]
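Proof by induction
Loop invariant: A[p…i] ≤ A[r] and A[i+1…j-1] > A[r]
Base case: before the first iteration (i = p-1, j = p), A[p…i] and A[i+1…j-1] are empty
Inductive step: assume the invariant holds before iteration j; two cases:
1st case: A[j] > A[r]
Only j advances, so A[p…i] remains unchanged (still ≤ A[r]), and A[i+1…j] > A[r] because A[j] > A[r]
2nd case: A[j] ≤ A[r]
i is incremented and A[i] is swapped with A[j]
The new A[i] is the old A[j] ≤ A[r], so A[p…i] ≤ A[r]; the element moved into position j came from the old A[i+1…j-1] (all > A[r]), or A[i+1…j] is empty, so A[i+1…j] > A[r]
Termination: when j = r, A[p…i] ≤ A[r] and A[i+1…r-1] > A[r]; swapping A[i+1] with A[r] places the pivot between the two sets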
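To check the invariant mechanically, here is a hypothetical instrumented variant of a Lomuto-style partition (the name `partition_checked` is illustrative, not from the lecture). It asserts both halves of the loop invariant before every iteration and at termination, and the usage loop runs it on random inputs.

```python
import random

# Illustrative sketch: partition_checked is a hypothetical instrumented Lomuto partition.
def partition_checked(A, p, r):
    """Partition A[p..r] around A[r], asserting the loop invariant at every step."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        # Invariant: A[p..i] <= x and A[i+1..j-1] > x
        assert all(v <= x for v in A[p:i + 1])
        assert all(v > x for v in A[i + 1:j])
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    # Termination (j = r): A[p..i] <= x and A[i+1..r-1] > x
    assert all(v <= x for v in A[p:i + 1])
    assert all(v > x for v in A[i + 1:r])
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

for _ in range(1000):
    A = [random.randint(0, 20) for _ in range(random.randint(1, 15))]
    q = partition_checked(A, 0, len(A) - 1)
    assert all(v <= A[q] for v in A[:q]) and all(v > A[q] for v in A[q + 1:])
```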
What happens after an element is selected as a pivot?
1 3 2 4 6 8 7 5
Is Quicksort correct?
Assuming Partition is correct
Proof by induction
Base case: Quicksort works on a list of 0 or 1 elements, which is already sorted
Inductive case:
Assume Quicksort correctly sorts arrays of fewer than n elements, and show that it sorts an array of n elements
If Partition works correctly, then we have:
A = [ elements ≤ pivot | pivot | elements > pivot ]
and, by our inductive assumption, we have:
A = [ sorted | pivot | sorted ]
so all of A is sorted
Quicksort worst case: when does it happen?
sorted, reverse sorted
near sorted/reverse sorted
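A minimal sketch of last-element-pivot Quicksort that counts comparisons, assuming a Lomuto-style Partition (`quicksort` and `comparisons` are illustrative names). With this pivot choice, sorted and reverse-sorted inputs leave one side of every split empty, so the count is exactly n(n-1)/2.

```python
import random

# Illustrative sketch: assumes a Lomuto-style Partition with pivot A[r].
def quicksort(A, p, r, counter):
    """Sort A[p..r] in place; counter[0] accumulates pivot comparisons."""
    if p < r:
        x, i = A[r], p - 1
        for j in range(p, r):
            counter[0] += 1              # one comparison of A[j] against the pivot
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        q = i + 1
        quicksort(A, p, q - 1, counter)
        quicksort(A, q + 1, r, counter)

def comparisons(data):
    """Sort a copy of data and return the number of comparisons made."""
    A, counter = list(data), [0]
    quicksort(A, 0, len(A) - 1, counter)
    assert A == sorted(data)
    return counter[0]

n = 300
shuffled = list(range(n))
random.shuffle(shuffled)
print(comparisons(range(n)))               # sorted input: n(n-1)/2 = 44850
print(comparisons(range(n - 1, -1, -1)))   # reverse sorted input: also 44850
print(comparisons(shuffled))               # random order: far fewer
```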
Quicksort best case?
Each call to Partition splits the array into two equal parts
$T(n) = 2T(n/2) + \Theta(n)$
$O(n \log n)$
When does this happen?
random data?
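A quick numeric check of the best-case recurrence, taking c = 1 and T(1) = 0 (both assumptions made only for this example; the helper name is hypothetical): for n = 2^k, unrolling T(n) = 2T(n/2) + n gives exactly n log2 n.

```python
# Illustrative sketch: best_case_cost is a hypothetical helper (c = 1, T(1) = 0).
def best_case_cost(n):
    """Unroll T(n) = 2T(n/2) + n for n a power of two."""
    if n <= 1:
        return 0
    return 2 * best_case_cost(n // 2) + n

for k in range(1, 11):
    n = 2 ** k
    assert best_case_cost(n) == n * k     # exactly n log2 n
print(best_case_cost(1024))               # 10240
```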
Quicksort Average case?
How close to “even” do the splits need to be to maintain an O(n log n) running time?
Say the Partition procedure always splits the array
into some constant ratio b-to-a, e.g. 9-to-1
What is the recurrence?
$T(n) = T\left(\frac{a}{a+b}n\right) + T\left(\frac{b}{a+b}n\right) + cn$
Recursion tree, summing the cost of each level:
Level 0: $cn$
Level 1: $\frac{a}{a+b}cn + \frac{b}{a+b}cn = cn$
Level 2: $\frac{a^2}{(a+b)^2}cn + \frac{ab}{(a+b)^2}cn + \frac{ab}{(a+b)^2}cn + \frac{b^2}{(a+b)^2}cn = \frac{a^2 + 2ab + b^2}{(a+b)^2}cn = cn$
Level 3: $\frac{a^3 + 3a^2b + 3ab^2 + b^3}{(a+b)^3}cn = \frac{(a+b)(a+b)^2}{(a+b)^3}cn = cn$
Level d: $\frac{(a+b)^d}{(a+b)^d}cn = cn$
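The per-level sums can be checked with exact rational arithmetic. This sketch (the helper `level_sizes` is hypothetical) expands the first few levels of the tree for a 9-to-1 split, before any leaf is reached, and sums the subproblem sizes; multiplying each total by c gives the level cost cn.

```python
from fractions import Fraction

# Illustrative sketch: level_sizes is a hypothetical helper; leaves are ignored.
def level_sizes(n, a, b, levels):
    """Total subproblem size at each of the first `levels` levels of the tree for
    T(n) = T(a/(a+b) n) + T(b/(a+b) n) + cn."""
    left, right = Fraction(a, a + b), Fraction(b, a + b)
    level, totals = [Fraction(n)], []
    for _ in range(levels):
        totals.append(sum(level))
        level = [s * left for s in level] + [s * right for s in level]
    return totals

print(level_sizes(1000, 1, 9, 6))   # six copies of Fraction(1000, 1)
```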
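What is the depth of the tree?
Leaves will have different heights; we want the deepest leaf
Assume a < b: the deepest leaf is on the path that always takes the larger $\frac{b}{a+b}$ branch, so it is at the depth $d$ where
$\left(\frac{b}{a+b}\right)^d n = 1$
$d = \log_{(a+b)/b} n$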
Cost of the tree
Cost of each level ≤ cn
Times the maximum depth
Why not?
$O\left(n \log_{(a+b)/b} n\right)$
Since $(a+b)/b$ is a constant greater than 1, this is $O(n \log n)$ for any constant split ratio, even 9-to-1
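To see the bound in action, this sketch (hypothetical helper `split_cost`, c = 1, subproblems smaller than 1 treated as free leaves) evaluates the recurrence directly for a 9-to-1 split and compares it with n times the number of levels, n(log_{(a+b)/b} n + 1). The computed cost stays below that bound because levels deeper than the shallowest leaf are only partly filled.

```python
import math

# Illustrative sketch: split_cost is a hypothetical helper (c = 1).
def split_cost(n, a, b):
    """Total cost of T(n) = T(a/(a+b) n) + T(b/(a+b) n) + n,
    treating subproblems of size < 1 as free leaves."""
    if n < 1:
        return 0.0
    return n + split_cost(n * a / (a + b), a, b) + split_cost(n * b / (a + b), a, b)

n, a, b = 1000, 1, 9                      # a 9-to-1 split
depth = math.log(n, (a + b) / b)          # log_{(a+b)/b} n, depth of the deepest leaf
print(split_cost(n, a, b), n * (depth + 1))
```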
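Quicksort average case: take 2
What would happen if half the time Partition produced a “bad” split and the other half a “good” one?
A “good” 50/50 split: a node of cost $cn$ with children $T\left(\frac{n-1}{2}\right)$ and $T\left(\frac{n-1}{2}\right)$
$T(n) = 2T\left(\frac{n-1}{2}\right) + \Theta(n)$
A “bad” split followed by a “good” 50/50 split: the root costs $cn$ and has children $T(1)$ and $T(n-1)$; the $T(n-1)$ node costs $c(n-1)$ and has children $T\left(\frac{n-1}{2} - 1\right)$ and $T\left(\frac{n-1}{2}\right)$
$T(n) = T(1) + T\left(\frac{n-1}{2} - 1\right) + T\left(\frac{n-1}{2}\right) + \Theta(n) + \Theta(n-1)$
The bad split only adds $\Theta(n-1)$, which is absorbed into the $\Theta(n)$ term, so this has the same form as the all-good recurrence and is still $O(n \log n)$
We could still get very unlucky and pick “bad” partitions at every step: $O(n^2)$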
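A sketch of the take-2 scenario, counting n - 1 comparisons per Partition call and assuming the splits strictly alternate “bad” (0 and n-1 elements) and “good” (even), starting with a bad one. Both helpers are illustrative. The alternating cost is larger than the all-good cost but stays within a constant factor of n log2 n, whereas all-bad splits would cost n(n-1)/2.

```python
import math

# Illustrative sketch: both helpers are hypothetical; a partition of n elements
# is charged n - 1 comparisons.
def alternating_cost(n, bad=True):
    """Comparisons when splits alternate bad (0 / n-1) and good (even)."""
    if n <= 1:
        return 0
    if bad:
        return (n - 1) + alternating_cost(n - 1, bad=False)
    left = (n - 1) // 2
    return (n - 1) + alternating_cost(left, True) + alternating_cost(n - 1 - left, True)

def good_cost(n):
    """Comparisons when every split is an even one."""
    if n <= 1:
        return 0
    left = (n - 1) // 2
    return (n - 1) + good_cost(left) + good_cost(n - 1 - left)

n = 4096
print(alternating_cost(n), good_cost(n), round(n * math.log2(n)))
```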
Randomized Quicksort: expected running time
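How many calls are made to Partition for an input of size n? At most n, since each element is chosen as a pivot at most once
What is the cost of a call to Partition?
Cost is proportional to the number of iterations of the for loop, and each iteration makes one comparison
So the total number of comparisons (plus O(1) per call) gives us a bound on the running time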
Counting the number of comparisons
Let $z_i$ (of $z_1, z_2, \dots, z_n$) be the $i$th smallest element
Let $Z_{ij}$ be the set of elements $Z_{ij} = \{z_i, z_{i+1}, \dots, z_j\}$
Example: A = [3, 9, 7, 2]
$z_1 = 2$, $z_2 = 3$, $z_3 = 7$, $z_4 = 9$
$Z_{24} = \{3, 7, 9\}$
Counting comparisons
Let $X_{ij} = I\{z_i \text{ is compared to } z_j\}$, which is 1 if $z_i$ is compared to $z_j$ and 0 otherwise (indicator random variable)
Total number of comparisons: $X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$
Counting comparisons: average running time
$E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right]$
$= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]$ (expectation of sums is the sum of expectations)
$= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} p\{z_i \text{ is compared to } z_j\}$ (remember, $X_{ij}$ is an indicator random variable, so $E[X_{ij}] = p\{z_i \text{ is compared to } z_j\}$)
$p\{z_i \text{ is compared to } z_j\} = ?$
The pivot element separates the set of numbers into two sets (those less than the pivot and those larger). Elements from one set will never be compared to elements of the other set
If a pivot $x$ with $z_i < x < z_j$ is chosen, how many times will $z_i$ and $z_j$ be compared?
What is the only time that $z_i$ and $z_j$ will be compared?
In $Z_{ij}$, when will $z_i$ and $z_j$ be compared?
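$z_i$ and $z_j$ are compared only if one of them is the first pivot chosen from $Z_{ij}$, and the pivot is chosen randomly over the $j-i+1$ elements of $Z_{ij}$:
$p\{z_i \text{ is compared to } z_j\} = \frac{1}{j-i+1} + \frac{1}{j-i+1} = \frac{2}{j-i+1}$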
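$E[X] = ?$
$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}$
Let $k = j - i$:
$= \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1}$
$< \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k}$
$= \sum_{i=1}^{n-1} O(\log n)$, since $\sum_{k=1}^{n} \frac{2}{k} = 2(\ln n + O(1)) = O(\log n)$
$= O(n \log n)$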
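A sketch that checks the analysis numerically. `expected_comparisons` evaluates the double sum of 2/(j-i+1) exactly with `fractions.Fraction`, and `count_comparisons` averages the comparisons made by a randomized Quicksort over several runs. Both names are hypothetical, and the randomization (swap a uniformly random element into A[r], then partition Lomuto-style) is an assumption, since the lecture's randomized Partition is not shown here. The two numbers should agree closely, and both fall below 2n ln n.

```python
import math
import random
from fractions import Fraction

# Illustrative sketch: helper names and the randomization strategy are assumptions.
def expected_comparisons(n):
    """Exact E[X] = sum over i < j of 2/(j-i+1), for n distinct keys."""
    return sum(Fraction(2, j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))

def count_comparisons(data):
    """Randomized Quicksort on a copy of data; returns the A[j] <= pivot comparisons."""
    A, count = list(data), 0

    def sort(p, r):
        nonlocal count
        if p >= r:
            return
        k = random.randint(p, r)          # random pivot, moved into A[r]
        A[k], A[r] = A[r], A[k]
        x, i = A[r], p - 1
        for j in range(p, r):
            count += 1
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        sort(p, i)                        # left part A[p..q-1], where q = i + 1
        sort(i + 2, r)                    # right part A[q+1..r]

    sort(0, len(A) - 1)
    return count

n, trials = 200, 500
exact = float(expected_comparisons(n))
average = sum(count_comparisons(range(n)) for _ in range(trials)) / trials
print(exact, average, 2 * n * math.log(n))
```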
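Memory usage?
Quicksort sorts in place: beyond the recursion stack it uses only O(1) additional memory (the stack has expected depth O(log n) for randomized Quicksort, but O(n) in the worst case)
How does randomized Quicksort compare to Mergesort?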
Medians
The median of a set of numbers is the number such
that half of the numbers are larger and half smaller
Sorting the numbers and taking the middle one: Θ(n log n)
Can we do better? By sorting the data, it
seems like we’re doing more work than we
need to.
Partition takes Θ(n) time and performs a
similar operation
Selection problem: find the kth smallest
element of A
If we could solve the selection problem, how
would this help us?
Selection: divide and conquer
Partition splits the data into 3 sets: those ≤ A[q] (left of q), A[q] itself, and those > A[q] (right of q)
With each recursive call, we can narrow the search into one of these sets
Like binary search on unsorted data
Selection(A, k, p, r)   // Selection(A, k, 1, n) returns the kth smallest element
  q <- Partition(A,p,r)
  if k = q
    return A[q]
  else if k < q
    return Selection(A, k, p, q-1)
  else // k > q
    return Selection(A, k, q+1, r)
Example: Selection(A, 3, 1, 8) with A = 5 7 1 4 8 3 2 6
Partition(A, 1, 8), pivot 6: A = 5 1 4 3 2 6 8 7, q = 6; k = 3 < 6, so Selection(A, 3, 1, 5)
Partition(A, 1, 5), pivot 2: A = 1 2 4 3 5 6 8 7, q = 2; k = 3 > 2, so Selection(A, 3, 3, 5)
Partition(A, 3, 5), pivot 5: A = 1 2 4 3 5 6 8 7, q = 5; k = 3 < 5, so Selection(A, 3, 3, 4)
Partition(A, 3, 4), pivot 3: A = 1 2 3 4 5 6 8 7, q = 3 = k, so return A[3] = 3
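A runnable Python sketch of Selection, assuming the Lomuto-style Partition (pivot A[r]); the names are illustrative. It uses 0-based indices, so the slides' Selection(A, 3, 1, 8) becomes selection(A, 2, 0, 7). It reproduces the example, ending with A = 1 2 3 4 5 6 8 7 and returning 3.

```python
# Illustrative sketch: assumes a Lomuto-style Partition; names are hypothetical.
def partition(A, p, r):
    """Partition A[p..r] around A[r]; returns the pivot's final index."""
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def selection(A, k, p, r):
    """Return the element that belongs at index k (p <= k <= r) once A[p..r] is sorted.

    Indices are 0-based, so selection(A, k, 0, len(A) - 1) is the (k+1)th smallest.
    """
    q = partition(A, p, r)
    if k == q:
        return A[q]
    elif k < q:
        return selection(A, k, p, q - 1)
    else:                                  # k > q
        return selection(A, k, q + 1, r)

A = [5, 7, 1, 4, 8, 3, 2, 6]
third = selection(A, 2, 0, len(A) - 1)    # the slides' Selection(A, 3, 1, 8)
print(third, A)                            # 3 [1, 2, 3, 4, 5, 6, 8, 7]
```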
Running time of Selection?
Best case?
Each call to Partition throws away half the data
Recurrence?
$T(n) = T(n/2) + \Theta(n)$
$O(n)$
Worst case?
Each call to Partition only reduces our search by 1
Recurrence?
$T(n) = T(n-1) + \Theta(n)$
$O(n^2)$
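Unrolling both recurrences with c = 1 (an assumption for this example; the helpers are hypothetical) shows the gap: the best case sums n + n/2 + n/4 + … < 2n, while the worst case sums n + (n-1) + … + 1 = n(n+1)/2.

```python
# Illustrative sketch: best_case and worst_case are hypothetical helpers (c = 1).
def best_case(n):
    """Unroll T(n) = T(n/2) + n: total work n + n/2 + n/4 + ... + 1."""
    total = 0
    while n >= 1:
        total += n
        n //= 2
    return total

def worst_case(n):
    """Unroll T(n) = T(n-1) + n: total work n + (n-1) + ... + 1."""
    return sum(range(1, n + 1))

for n in (16, 1024, 10 ** 6):
    print(n, best_case(n), worst_case(n))
```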
How can randomness help us?
RSelection(A, k, p, r)
  q <- RPartition(A,p,r)
  if k = q
    return A[q]
  else if k < q
    return RSelection(A, k, p, q-1)
  else // k > q
    return RSelection(A, k, q+1, r)
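A sketch of RSelection in Python. RPartition is not defined in this section, so this assumes it swaps a uniformly random element of A[p…r] into A[r] and then partitions Lomuto-style; the names are illustrative. The usage finds the median of 10001 random integers and checks it against sorting.

```python
import random

# Illustrative sketch: the RPartition strategy and all names are assumptions.
def rpartition(A, p, r):
    """Swap a random element of A[p..r] into A[r], then partition around it."""
    s = random.randint(p, r)              # randint includes both endpoints
    A[s], A[r] = A[r], A[s]
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def rselection(A, k, p, r):
    """Element that belongs at index k once A[p..r] is sorted (0-based)."""
    q = rpartition(A, p, r)
    if k == q:
        return A[q]
    elif k < q:
        return rselection(A, k, p, q - 1)
    else:                                  # k > q
        return rselection(A, k, q + 1, r)

data = [random.randint(0, 10 ** 6) for _ in range(10001)]
median = rselection(list(data), 5000, 0, len(data) - 1)
print(median == sorted(data)[5000])        # True
```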
Running time of RSelection?
Best case
O(n)
Worst case
Still $O(n^2)$
As with Quicksort, we can get unlucky
Average case?
Average case
Depends on how much data we throw away
at each step
We’ll call a partition “good” if the pivot falls within the 25th and 75th percentiles
Or, each of the partitions contains at least 25% of the data
What is the probability of a “good” partition? 1/2, since half of the elements lie between the 25th and 75th percentiles
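Let X be the number of partitions until we get a “good” one; X is geometric with success probability p:
$E[X] = \sum_{j=1}^{\infty} j\,p(1-p)^{j-1}$
$= \frac{p}{1-p} \sum_{j=1}^{\infty} j(1-p)^{j}$
$= \frac{p}{1-p} \cdot \frac{1-p}{p^2}$
$= \frac{1}{p}$
With p = 1/2, we expect 2 partitions to get a “good” one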
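Both numbers can be checked directly (helper names are hypothetical). Counting the pivot ranks that leave at least n/4 elements on each side gives p = 1/2 for n = 100, and a truncated sum of the geometric series converges to 1/p.

```python
from fractions import Fraction

# Illustrative sketch: both helpers are hypothetical.
def good_pivot_probability(n):
    """Fraction of pivot ranks q in 1..n that leave at least n/4 elements on each side."""
    good = sum(1 for q in range(1, n + 1) if q - 1 >= n / 4 and n - q >= n / 4)
    return Fraction(good, n)

def expected_trials(p, terms=200):
    """Truncated sum of E[X] = sum_{j>=1} j p (1-p)^(j-1) for a geometric X."""
    return sum(j * p * (1 - p) ** (j - 1) for j in range(1, terms + 1))

p = good_pivot_probability(100)
print(p, expected_trials(float(p)))     # 1/2 and (approximately) 2
```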
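Average case
$T(n) = T(3n/4) + O(n)$: a “good” partition keeps at most 3/4 of the data, and we expect to spend 2 partitions, each O(n), to get one
$= O(n)$, since the work forms a geometric series $cn\left(1 + \frac{3}{4} + \left(\frac{3}{4}\right)^2 + \dots\right) \le 4cn$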
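Unrolling the recurrence with c = 1 (an assumption for this example; the helper name is hypothetical) shows why it is linear: the costs n, 3n/4, (3/4)^2 n, … form a geometric series bounded by 4n.

```python
# Illustrative sketch: shrinking_cost is a hypothetical helper (c = 1).
def shrinking_cost(n):
    """Unroll T(n) = T(3n/4) + n until the subproblem has size < 1."""
    total = 0.0
    while n >= 1:
        total += n
        n *= 0.75                          # a "good" split keeps at most 3/4 of the data
    return total

for n in (10, 1000, 10 ** 6):
    print(n, shrinking_cost(n), 4 * n)     # the total stays below 4n
```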