4 Quicksort.v2

The document explains the Quicksort algorithm, focusing on the partitioning process which divides elements into those less than or equal to a pivot and those greater than it. It discusses the correctness of the partitioning method, its running time, and the average, best, and worst-case scenarios for Quicksort. Additionally, it highlights the impact of randomness on the performance of the algorithm.

 What does it do?
 A[r] is called the pivot
 Partitions the elements A[p…r-1] into two sets, those ≤ pivot
and those > pivot
 Operates in place
 Final result:

  [ ≤ pivot | pivot | > pivot ]
   p                         r


Tracing Partition on … 5 7 1 2 8 4 3 6 … (p at the 5, r at the pivot 6;
i starts just left of p, j scans from p toward r):

…5 7 1 2 8 4 3 6…   5 ≤ 6: advance i, “swap” 5 with itself
…5 1 7 2 8 4 3 6…   7 > 6: j moves on; then 1 ≤ 6: advance i, swap 1 and 7
…5 1 2 7 8 4 3 6…   2 ≤ 6: advance i, swap 2 and 7

What’s happening? At every step the array looks like

  [ ≤ pivot | > pivot | unprocessed ]
   p        i          j            r

…5 1 2 4 8 7 3 6…   4 ≤ 6: advance i, swap 4 and 7
…5 1 2 4 3 7 8 6…   3 ≤ 6: advance i, swap 3 and 8
…5 1 2 4 3 6 8 7…   the loop ends; swap the pivot into position i+1
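The procedure traced above is a Lomuto-style partition; a minimal Python sketch (the function name and 0-based indexing are my own, since the slides never show the code itself):

```python
def partition(A, p, r):
    """Partition A[p..r] in place around the pivot A[r].

    Returns the pivot's final index q, with A[p..q-1] <= A[q] < A[q+1..r].
    """
    pivot = A[r]
    i = p - 1                        # boundary of the "<= pivot" region
    for j in range(p, r):            # j scans the unprocessed region
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]  # move the pivot into its final slot
    return i + 1

A = [5, 7, 1, 2, 8, 4, 3, 6]
q = partition(A, 0, len(A) - 1)
# A is now [5, 1, 2, 4, 3, 6, 8, 7] and q == 5, matching the trace above
```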
Is Partition correct?
 Partitions the elements A[p…r-1] into two sets, those ≤ pivot
and those > pivot?
 Loop Invariant: A[p…i] ≤ A[r] and A[i+1…j-1] > A[r]

Proof by induction
 Loop Invariant: A[p…i] ≤ A[r] and A[i+1…j-1] > A[r]
 Base case: A[p…i] and A[i+1…j-1] are empty
 Assume the invariant holds for j-1; there are two cases:
 1st case: A[j] > A[r]
 A[p…i] remains unchanged
 A[i+1…j] contains one additional element, A[j], which is > A[r]
 2nd case: A[j] ≤ A[r]
 i is incremented
 A[i] is swapped with A[j] – A[p…i] contains one additional
element which is ≤ A[r]
 A[i+1…j-1] contains the same elements, except the last
element is now the old first element
Partition running time?
 Θ(n)
Quicksort
8 5 1 3 6 2 7 4
8 5 1 3 6 2 7 4
1 3 2 4 6 8 7 5
1 3 2 4 6 8 7 5
1 3 2 4 6 8 7 5
1 2 3 4 6 8 7 5
1 2 3 4 6 8 7 5
1 2 3 4 6 8 7 5
1 2 3 4 6 8 7 5
1 2 3 4 6 8 7 5
1 2 3 4 5 8 7 6

What happens here?


1 2 3 4 5 8 7 6
1 2 3 4 5 8 7 6
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
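The full sort traced above is just Partition plus two recursive calls; a minimal Python sketch (my own version, with the Lomuto partition repeated so the example runs on its own):

```python
def partition(A, p, r):
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p=0, r=None):
    """Sort A[p..r] in place: partition, then recurse on each side."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = partition(A, p, r)   # the work happens before recursing
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

A = [8, 5, 1, 3, 6, 2, 7, 4]
quicksort(A)
# A is now [1, 2, 3, 4, 5, 6, 7, 8]
```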
Some observations
 Divide and conquer: different than MergeSort
– do the work before recursing
 How many times is/can an element be selected
as a pivot?
 What happens after an element is selected
as a pivot?

1 3 2 4 6 8 7 5
Is Quicksort correct?
 Assuming Partition is correct
 Proof by induction
 Base case: Quicksort works on a list of 1 element
 Inductive case:
 Assume Quicksort correctly sorts arrays of fewer than n
elements; show that it works to sort n elements
 If Partition works correctly then we have:

  [ ≤ pivot | pivot | > pivot ]

and, by our inductive assumption, both sides are sorted,
so the whole array A is sorted
Running time of Quicksort?
 Worst case?
 Each call to Partition splits the array into an empty
array and an array of n-1 elements

Quicksort: worst case running time

  T(n) = T(n-1) + Θ(n)

Which is? Θ(n²)

 When does this happen?


 sorted

 reverse sorted
 near sorted/reverse sorted
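To see the worst case concretely, here is a sketch (my own instrumentation, not the text’s code) that counts comparisons against the pivot when sorting already-sorted input:

```python
def partition_counting(A, p, r, counter):
    """Lomuto partition that tallies comparisons against the pivot."""
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        counter[0] += 1            # one comparison per scanned element
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort_counting(A, p, r, counter):
    if p < r:
        q = partition_counting(A, p, r, counter)
        quicksort_counting(A, p, q - 1, counter)
        quicksort_counting(A, q + 1, r, counter)

n = 100
counter = [0]
quicksort_counting(list(range(n)), 0, n - 1, counter)
# on already-sorted input every split is (n-1, 0), so the count is
# (n-1) + (n-2) + ... + 1 = n(n-1)/2 = 4950, i.e. Θ(n²)
```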
Quicksort best case?
 Each call to Partition splits the array into two
equal parts

  T(n) = 2T(n/2) + Θ(n)

 O(n log n)
 When does this happen?
 random data
 ?
Quicksort Average case?
 How close to “even” splits do they need to be to
maintain an O(n log n) running time?
 Say the Partition procedure always splits the array
into some constant ratio b-to-a, e.g. 9-to-1
 What is the recurrence?

  T(n) = T(a/(a+b)·n) + T(b/(a+b)·n) + cn

Draw out the recursion tree:

Level 0: cost cn, with subproblems of size a/(a+b)·n and b/(a+b)·n

Level 1: cost cn·a/(a+b) + cn·b/(a+b) = cn

Level 2: the four subproblems have sizes a²/(a+b)²·n, ab/(a+b)²·n,
ab/(a+b)²·n and b²/(a+b)²·n, so the cost is

  cn·(a² + 2ab + b²)/(a+b)² = cn·(a+b)²/(a+b)² = cn

Level 3: cost cn·(a+b)³/(a+b)³ = cn

Level d: cost cn·(a+b)^d/(a+b)^d = cn
What is the depth of the tree?
 Leaves will have different depths
 Want to pick the deepest leaf
 Assume a < b, so the b-side subproblems shrink more slowly

The deepest leaf is reached when

  (b/(a+b))^d · n = 1

which gives

  d = log_(a+b)/b n
Cost of the tree
 Cost of each level ≤ cn
 Times the maximum depth:

  O(n log_(a+b)/b n) = O(n log n)

 Why not Θ(n log_(a+b)/b n)?
Quicksort average case: take 2
 What would happen if half the time Partition produced
a “bad” split and the other half a “good” split?

A “good” 50/50 split costs cn and leaves T((n-1)/2) and T((n-1)/2):

  T(n) = 2T((n-1)/2) + Θ(n)
Quicksort average case: take 2

A “bad” split costs cn and leaves subproblems T(1) and T(n-1).
Suppose each “bad” split is followed by a “good” 50/50 split
of the n-1 elements, into T((n-1)/2 - 1) and T((n-1)/2):

        recursion cost                      partition cost
  T(n) = T(1) + T((n-1)/2 - 1) + T((n-1)/2) + Θ(n) + Θ(n-1)

We absorb the “bad” partition:

  T(n) = T((n-1)/2 - 1) + T((n-1)/2) + Θ(n)

In general, we can absorb any constant number of “bad”
partitions
How can we avoid the worst case?
 Inject randomness into the data
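A common way to do this (a sketch; the names are mine): choose the pivot uniformly at random and swap it into the last slot before running the usual partition, so no fixed input is reliably the worst case:

```python
import random

def partition(A, p, r):
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    """Pick a uniformly random pivot from A[p..r]."""
    k = random.randint(p, r)   # randint is inclusive on both ends
    A[k], A[r] = A[r], A[k]    # move the random pivot to the pivot slot
    return partition(A, p, r)

def randomized_quicksort(A, p=0, r=None):
    if r is None:
        r = len(A) - 1
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```

Sorted and reverse-sorted inputs are no longer pathological: the split sizes now depend only on the random pivot choices, not on the input order.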
What is the running time of randomized Quicksort?
 Worst case?

  O(n²)

 Still could get very unlucky and pick “bad”
partitions at every step

Randomized Quicksort: expected running time
 How many calls are made to Partition for an input of
size n?  n
 What is the cost of a call to Partition?
 Cost is proportional to the number of iterations of
the for loop

The total number of comparisons will give us a bound
on the running time
Counting the number of comparisons
 Let zi be the i-th smallest element of z1, z2, …, zn
 Let Zij be the set of elements Zij = zi, zi+1, …, zj

A = [3, 9, 7, 2]
z1 = 2
z2 = 3     Z24 = [3, 7, 9]
z3 = 7
z4 = 9
Counting comparisons

  Xij = I{zi is compared to zj} = 1 if zi is compared to zj,
                                  0 otherwise

(indicator random variable)

 How many times can zi be compared to zj?
 At most once. Why?

Total number of comparisons:

  X = Σ i=1..n-1 Σ j=i+1..n Xij
Counting comparisons: average running time

  E[X] = E[ Σ i=1..n-1 Σ j=i+1..n Xij ]

       = Σ i=1..n-1 Σ j=i+1..n E[Xij]        the expectation of a sum is
                                             the sum of the expectations

       = Σ i=1..n-1 Σ j=i+1..n p{zi is compared to zj}

remember,
  Xij = I{zi is compared to zj} = 1 if zi is compared to zj, 0 otherwise
p{zi is compared to zj}?
 The pivot element separates the set of numbers into
two sets (those less than the pivot and those larger).
Elements from one set will never be compared to
elements of the other set
 If a pivot x with zi < x < zj is chosen, how many
times will zi and zj be compared?
 What is the only time that zi and zj will be
compared?
 In Zij, when will zi and zj be compared?
p{zi is compared to zj}?

  p{zi is compared to zj} = p{zi or zj is first pivot chosen from Zij}

                          = p{zi is first pivot chosen from Zij} +
                            p{zj is first pivot chosen from Zij}

  (p(a or b) = p(a) + p(b) for mutually exclusive events)

The pivot is chosen uniformly at random over the j-i+1 elements of Zij:

                          = 1/(j-i+1) + 1/(j-i+1)

                          = 2/(j-i+1)
E[X]?

  E[X] = Σ i=1..n-1 Σ j=i+1..n 2/(j-i+1)

Let k = j-i:

       = Σ i=1..n-1 Σ k=1..n-i 2/(k+1)

       < Σ i=1..n-1 Σ k=1..n 2/k            Σ k=1..n 2/k = 2 ln n + O(1)
                                            = O(log n)

       = Σ i=1..n-1 O(log n)

       = O(n log n)
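As a sanity check on the bound (my own sketch, not part of the text), the double sum can be evaluated exactly and compared against 2n ln n:

```python
import math

def expected_comparisons(n):
    """Evaluate E[X] = sum over all pairs i < j of 2/(j-i+1) exactly."""
    total = 0.0
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            total += 2.0 / (j - i + 1)
    return total

n = 200
ex = expected_comparisons(n)
# the sum grows like 2n ln n, i.e. O(n log n), far below the n(n-1)/2
# comparisons of the worst case
assert ex < 2 * n * math.log(n)
```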
Memory usage?
 Quicksort only uses O(1) additional memory
 How does randomized Quicksort compare to
MergeSort?
Medians
 The median of a set of numbers is the number such
that half of the numbers are larger and half smaller

A = [50, 12, 1, 97, 30]

 How might we calculate the median of a set?
 Sort the numbers, then pick the n/2 element

A = [1, 12, 30, 50, 97]

Θ(n log n)
Medians
 Can we do better? By sorting the data, it
seems like we’re doing more work than we
need to.
 Partition takes Θ(n) time and performs a
similar operation
 Selection problem: find the kth smallest
element of A
 If we could solve the selection problem, how
would this help us?
Selection: divide and conquer
 Partition splits the data into 3 sets:
those < A[q], = A[q] and > A[q]
 With each recursive call, we can narrow the search
to one of these sets
 Like binary search on unsorted data

Selection(A, k, p, r)
q <- Partition(A,p,r)
if k = q
Return A[q]
else if k < q
Selection(A, k, p, q-1)
else // k > q
Selection(A, k, q+1, r)
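A runnable Python version of the pseudocode above (a sketch with 0-based indices, so k = 0 selects the smallest element; the Lomuto partition is repeated to keep the example self-contained):

```python
def partition(A, p, r):
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def selection(A, k, p, r):
    """Return the element of rank k in A[p..r] (0-based: k = 0 is the
    smallest).  Like binary search, but on unsorted data."""
    q = partition(A, p, r)
    if k == q:
        return A[q]
    elif k < q:
        return selection(A, k, p, q - 1)
    else:                      # k > q
        return selection(A, k, q + 1, r)

A = [5, 7, 1, 4, 8, 3, 2, 6]
print(selection(A, 2, 0, len(A) - 1))   # prints 3, the 3rd smallest
```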
Selection(A, 3, 1, 8)

5 7 1 4 8 3 2 6     Partition around the pivot 6 gives
5 1 4 3 2 6 8 7     q = 6; k = 3 < q, so recurse on A[1…5]

At each call, discard part of the array

1 2 4 3 5 6 8 7     Partition A[1…5] around 2 gives q = 2;
                    k = 3 > q, so recurse on A[3…5]

                    Partition A[3…5] around 5 gives q = 5;
                    k = 3 < q, so recurse on A[3…4]

1 2 3 4 5 6 8 7     Partition A[3…4] around 3 gives q = 3 = k:
                    return A[3] = 3
Running time of Selection?
 Best case?
 Each call to Partition throws away half the data
 Recurrence?

  T(n) = T(n/2) + Θ(n)

 O(n)

Running time of Selection?
 Worst case?
 Each call to Partition only reduces our search by 1
 Recurrence?

  T(n) = T(n-1) + Θ(n)

 O(n²)
How can randomness help us?

RSelection(A, k, p, r)
q <- RPartition(A,p,r)
if k = q
Return A[q]
else if k < q
RSelection(A, k, p, q-1)
else // k > q
RSelection(A, k, q+1, r)
Running time of RSelection?
 Best case
 O(n)
 Worst case
 Still O(n²)
 As with Quicksort, we can get unlucky
 Average case?
Average case
 Depends on how much data we throw away
at each step
Average case
 We’ll call a partition “good” if the pivot falls
within the 25th and 75th percentile
 Or, each of the partitions contains at least 25%
of the data
 What is the probability of a “good” partition?
 Half of the elements lie within this range and
half outside, so 50% chance


Average case
 Recall that, like Quicksort, we can absorb the
cost of a number of “bad” partitions
Average case
 On average, how many times will Partition need to
be called before we get a good partition?
 Let E be the number of times
 Recurrence:

  E = 1 + (1/2)E          half the time we get a good
                          partition on the first try and half
                          of the time, we have to try again

  E = 1 + 1/2 + 1/4 + 1/8 + 1/16 + … = 2
Average case
 Another look. Let p be the probability of success
 Let X be the number of calls required

  E[X] = Σ j≥1 j · p(1-p)^(j-1)

       = p/(1-p) · Σ j≥1 j(1-p)^j

       = p/(1-p) · (1-p)/p²

       = 1/p
Average case

  T(n) = T(3n/4) + O(n)

We throw away at least ¼ of the data at each good partition;
the O(n) term rolls in the cost of the “bad” partitions
Randomized algorithms
 We used randomness to minimize the
possibility of worst-case running time
 Other uses
 approximation algorithms, e.g. random walks
 initialization, e.g. K-means clustering
 contention resolution
 reinforcement learning/interacting with an ill-defined world
