
Lecture 7

Quicksort is an efficient sorting algorithm that works in O(n log n) time on average. It uses a divide-and-conquer approach, recursively partitioning the array around a pivot element into smaller subarrays until each subarray contains a single element. The key partitioning step takes linear time O(n) but balances the partitions well on average, resulting in the efficient O(n log n) runtime. The analysis considers both the worst case O(n^2) for already sorted inputs and the average case, solving a recurrence relation to show the average runtime is O(n log n).

CS 332: Algorithms

Quicksort

Homework 2
Assigned today, due next Wednesday
Will be on web page shortly after class
Go over now

Review: Quicksort
Sorts in place
Runs in O(n lg n) time in the average case
Runs in O(n^2) time in the worst case

But in practice, it's quick

And the worst case doesn't happen often (but more on this later)

Quicksort

Another divide-and-conquer algorithm

The array A[p..r] is partitioned into two non-empty subarrays A[p..q] and A[q+1..r]

Invariant: All elements in A[p..q] are less than or equal to all elements in A[q+1..r]

The subarrays are recursively sorted by calls to quicksort

Unlike merge sort, no combining step: two subarrays form an already-sorted array

Quicksort Code
Quicksort(A, p, r)
{
    if (p < r)
    {
        q = Partition(A, p, r);
        Quicksort(A, p, q);
        Quicksort(A, q+1, r);
    }
}

Partition

Clearly, all the action takes place in the partition() function

Rearranges the subarray in place

End result:
Two subarrays
All values in first subarray ≤ all values in second

Returns the index q separating the two subarrays

How do you suppose we implement this?

Partition In Words

Partition(A, p, r):

Select an element to act as the pivot (which?)

Grow two regions, A[p..i] and A[j..r]
All elements in A[p..i] <= pivot
All elements in A[j..r] >= pivot

Increment i until A[i] >= pivot
Decrement j until A[j] <= pivot
Swap A[i] and A[j]
Repeat until i >= j
Return j

Note: slightly different from the book's partition()

Partition Code
Partition(A, p, r)
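    x = A[p];              // pivot: first element of the subarray
    i = p - 1;
    j = r + 1;
    while (TRUE)
        repeat             // scan left from the right end
            j--;
        until A[j] <= x;
        repeat             // scan right from the left end
            i++;
        until A[i] >= x;
        if (i < j)
            Swap(A, i, j); // both elements are on the wrong side
        else
            return j;      // scans have met: A[p..j] <= A[j+1..r]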

Illustrate on
A = {5, 3, 2, 6, 4, 1, 3, 7};

What is the running time of partition()?

partition() runs in O(n) time
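
As a quick check of the partition procedure, here is a minimal Python sketch of the same scheme together with the quicksort that calls it, run on the example array from the slides. The Python names `partition` and `quicksort`, the 0-based indexing, and the default arguments are choices made for this illustration only.

```python
def partition(a, p, r):
    """Partition a[p..r] around x = a[p]; return j with a[p..j] <= a[j+1..r]."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1                      # repeat j-- until a[j] <= x
        while a[j] > x:
            j -= 1
        i += 1                      # repeat i++ until a[i] >= x
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place (whole list by default)."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q)
        quicksort(a, q + 1, r)

a = [5, 3, 2, 6, 4, 1, 3, 7]
q = partition(a, 0, len(a) - 1)
print(q, a)    # 4 [3, 3, 2, 1, 4, 6, 5, 7]
quicksort(a)
print(a)       # [1, 2, 3, 3, 4, 5, 6, 7]
```

Each index only ever moves toward the other, so the two scans together touch every element a constant number of times, which is where the O(n) bound comes from.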

Analyzing Quicksort

What will be the worst case for the algorithm?

Partition is always unbalanced

What will be the best case for the algorithm?

Partition is perfectly balanced

Which is more likely?

The latter, by far, except...

Will any particular input elicit the worst case?

Yes: Already-sorted input

Analyzing Quicksort

In the worst case:

T(1) = Θ(1)
T(n) = T(n - 1) + Θ(n)

Works out to
T(n) = Θ(n^2)
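
To see the quadratic behaviour concretely, the following sketch counts element-versus-pivot comparisons for the first-element-pivot partition on an already-sorted array and on a shuffled one. The function name `quicksort_count`, the array size 200, and the fixed shuffle seed are assumptions made for this illustration.

```python
import random

def quicksort_count(a):
    """Sort a in place; return the number of element-vs-pivot comparisons."""
    count = 0

    def partition(p, r):
        nonlocal count
        x = a[p]
        i, j = p - 1, r + 1
        while True:
            j -= 1
            count += 1
            while a[j] > x:
                j -= 1
                count += 1
            i += 1
            count += 1
            while a[i] < x:
                i += 1
                count += 1
            if i < j:
                a[i], a[j] = a[j], a[i]
            else:
                return j

    def sort(p, r):
        if p < r:
            q = partition(p, r)
            sort(p, q)
            sort(q + 1, r)

    sort(0, len(a) - 1)
    return count

n = 200
worst = quicksort_count(list(range(n)))      # already sorted: every split is 1 : n-1
data = list(range(n))
random.Random(0).shuffle(data)
typical = quicksort_count(data)
print(worst, typical)
```

On sorted input of distinct keys each call on m elements spends m + 1 comparisons and peels off a single element, so the total is the sum of (m + 1) for m = 2..n, which grows as n^2/2.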

Analyzing Quicksort

In the best case:

T(n) = 2T(n/2) + Θ(n)

What does this work out to?

T(n) = Θ(n lg n)
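
One way to see the best-case answer is to unroll the recurrence. The sketch below assumes n is a power of 2 and writes the Θ(n) term as cn for some constant c; both are simplifications made for this illustration.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^{i}\,T(n/2^{i}) + i\,cn \\
     &= n\,T(1) + cn \lg n \qquad (i = \lg n) \\
     &= \Theta(n \lg n)
\end{aligned}
```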

Improving Quicksort
The real liability of quicksort is that it runs in O(n^2) time on already-sorted input
Book discusses two solutions:

Randomize the input array, OR
Pick a random pivot element

How will these solve the problem?

By ensuring that no particular input can be chosen to make quicksort run in O(n^2) time
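
The second solution can be sketched as follows: choose a random index in the subarray, swap that element into the first position, and then partition as before. This is a minimal illustration under the assumption that the first-element-pivot partition is used; the names `randomized_partition` and `randomized_quicksort` are made up for the sketch.

```python
import random

def partition(a, p, r):
    """First-element-pivot partition; returns j with a[p..j] <= a[j+1..r]."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:
            j -= 1
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def randomized_partition(a, p, r):
    """Move a uniformly random element of a[p..r] to the front, then partition."""
    k = random.randint(p, r)        # randint is inclusive on both ends
    a[p], a[k] = a[k], a[p]
    return partition(a, p, r)

def randomized_quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q)
        randomized_quicksort(a, q + 1, r)

data = list(range(2000))            # already sorted: bad for a fixed first-element pivot
randomized_quicksort(data)
print(data == list(range(2000)))
```

The running time now depends on the random choices rather than on the order of the input, so a sorted array is no worse than any other.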

Analyzing Quicksort: Average Case


Assuming random input, average-case running time is much closer to O(n lg n) than O(n^2)
First, a more intuitive explanation/example:

Suppose that partition() always produces a 9-to-1 split. This looks quite unbalanced!
The recurrence is thus:
T(n) = T(9n/10) + T(n/10) + n
Use n instead of O(n) for convenience (how?)
How deep will the recursion go? (draw it)
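
A rough answer to the depth question, sketched under the assumption that the split is exactly 9-to-1 at every level: the longest root-to-leaf path always follows the 9n/10 branch, and every level of the tree costs at most n in total.

```latex
\begin{aligned}
\left(\tfrac{9}{10}\right)^{d} n = 1
  \;&\Longrightarrow\;
  d = \log_{10/9} n = \frac{\lg n}{\lg(10/9)} \approx 6.58 \lg n \\
T(n) \;&\le\; n \cdot \left(\log_{10/9} n + 1\right) = O(n \lg n)
\end{aligned}
```

So even a consistently lopsided split only changes the constant factor, not the O(n lg n) growth rate.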

Analyzing Quicksort: Average Case

Intuitively, a real-life run of quicksort will produce a mix of bad and good splits

Randomly distributed among the recursion tree

Pretend for intuition that they alternate between best-case (n/2 : n/2) and worst-case (n-1 : 1)
What happens if we bad-split root node, then good-split the resulting size (n-1) node?
We end up with three subarrays, size 1, (n-1)/2, (n-1)/2
Combined cost of splits = n + n - 1 = 2n - 1 = O(n)
No worse than if we had good-split the root node!

Analyzing Quicksort: Average Case


Intuitively, the O(n) cost of a bad split
(or 2 or 3 bad splits) can be absorbed
into the O(n) cost of each good split
Thus running time of alternating bad and good
splits is still O(n lg n), with slightly higher
constants
How can we be more rigorous?
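
Before the rigorous argument, the alternating bad/good intuition can be checked numerically. The sketch below charges n for splitting a size-n node, alternates a worst-case (1 : n-1) split with a balanced split, and compares the total against n lg n. The function names, the zero cost for subarrays of size 1 or less, and the integer rounding of the halves are assumptions made for this illustration.

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def bad_then_good(n):
    """Total cost when a size-n node gets a worst-case (1 : n-1) split."""
    if n <= 1:
        return 0
    return n + good_then_bad(n - 1)

@lru_cache(maxsize=None)
def good_then_bad(n):
    """Total cost when a size-n node gets a balanced split."""
    if n <= 1:
        return 0
    half = n // 2
    return n + bad_then_good(half) + bad_then_good(n - half)

for n in (10, 100, 1000, 10000):
    cost = bad_then_good(n)
    print(n, cost, round(cost / (n * log2(n)), 2))
```

The printed ratio stays bounded as n grows, which is what "still O(n lg n), with slightly higher constants" predicts.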

Analyzing Quicksort: Average Case

For simplicity, assume:

All inputs distinct (no repeats)

Slightly different partition() procedure
partition around a random element, which is not included in subarrays
all splits (0:n-1, 1:n-2, 2:n-3, …, n-1:0) equally likely

What is the probability of a particular split happening?
Answer: 1/n

Analyzing Quicksort: Average Case


So partition generates splits (0:n-1, 1:n-2, 2:n-3, …, n-2:1, n-1:0) each with probability 1/n
If T(n) is the expected running time,

$$T(n) = \frac{1}{n}\sum_{k=0}^{n-1}\left[T(k) + T(n-1-k)\right] + \Theta(n)$$

What is each term under the summation for?
What is the Θ(n) term for?

Analyzing Quicksort: Average Case

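So

$$
\begin{aligned}
T(n) &= \frac{1}{n}\sum_{k=0}^{n-1}\left[T(k) + T(n-1-k)\right] + \Theta(n) \\
     &= \frac{2}{n}\sum_{k=0}^{n-1} T(k) + \Theta(n)
\end{aligned}
$$

Write it on the board

Note: this is just like the book's recurrence (p166), except that the summation starts with k=0
We'll take care of that in a second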
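
As a sanity check before solving it, the recurrence can be evaluated numerically. The sketch below assumes T(0) = 0 and replaces the Θ(n) term by exactly n; both are simplifications for illustration, and the name `expected_cost` is made up.

```python
from math import log2

def expected_cost(n_max):
    """Return T[0..n_max] for T(n) = (2/n) * sum_{k=0}^{n-1} T(k) + n, T(0) = 0."""
    T = [0.0] * (n_max + 1)
    prefix = 0.0                    # running sum T[0] + ... + T[n-1]
    for n in range(1, n_max + 1):
        T[n] = 2.0 * prefix / n + n
        prefix += T[n]
    return T

T = expected_cost(2000)
for n in (10, 100, 1000, 2000):
    print(n, round(T[n], 1), round(T[n] / (n * log2(n)), 3))
```

The ratio T(n) / (n lg n) stays bounded by a small constant, consistent with an O(n lg n) solution.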

Analyzing Quicksort: Average Case

We can solve this recurrence using the dreaded substitution method

Guess the answer

T(n) = O(n lg n)

Assume that the inductive hypothesis holds

T(n) ≤ an lg n + b for some constants a and b

Substitute it in for some value < n

The value k in the recurrence

Prove that it follows for n

Grind through it

Analyzing Quicksort: Average Case


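$$
\begin{aligned}
T(n) &= \frac{2}{n}\sum_{k=0}^{n-1} T(k) + \Theta(n) && \text{The recurrence to be solved} \\
&\le \frac{2}{n}\sum_{k=0}^{n-1} \left(ak \lg k + b\right) + \Theta(n) && \text{Plug in inductive hypothesis} \\
&= \frac{2}{n}\left[\, b + \sum_{k=1}^{n-1} \left(ak \lg k + b\right) \right] + \Theta(n) && \text{Expand out the } k=0 \text{ case} \\
&= \frac{2}{n}\sum_{k=1}^{n-1} \left(ak \lg k + b\right) + \frac{2b}{n} + \Theta(n) && 2b/n \text{ is just a constant, so fold it into } \Theta(n) \\
&= \frac{2}{n}\sum_{k=1}^{n-1} \left(ak \lg k + b\right) + \Theta(n) && \text{Note: leaving the same recurrence as the book}
\end{aligned}
$$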

Analyzing Quicksort: Average Case


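$$
\begin{aligned}
T(n) &\le \frac{2}{n}\sum_{k=1}^{n-1} \left(ak \lg k + b\right) + \Theta(n) && \text{The recurrence to be solved} \\
&= \frac{2}{n}\sum_{k=1}^{n-1} ak \lg k + \frac{2}{n}\sum_{k=1}^{n-1} b + \Theta(n) && \text{Distribute the summation} \\
&= \frac{2a}{n}\sum_{k=1}^{n-1} k \lg k + \frac{2b}{n}(n-1) + \Theta(n) && \text{Evaluate the summation: } b + b + \dots + b = b(n-1) \\
&\le \frac{2a}{n}\sum_{k=1}^{n-1} k \lg k + 2b + \Theta(n) && \text{Since } n-1 < n,\ 2b(n-1)/n < 2b
\end{aligned}
$$

This summation gets its own set of slides later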

Analyzing Quicksort: Average Case


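$$
\begin{aligned}
T(n) &\le \frac{2a}{n}\sum_{k=1}^{n-1} k \lg k + 2b + \Theta(n) && \text{The recurrence to be solved} \\
&\le \frac{2a}{n}\left(\frac{1}{2}n^2 \lg n - \frac{1}{8}n^2\right) + 2b + \Theta(n) && \text{We'll prove this later} \\
&= an \lg n - \frac{a}{4}n + 2b + \Theta(n) && \text{Distribute the } (2a/n) \text{ term} \\
&= an \lg n + b + \left(\Theta(n) + b - \frac{a}{4}n\right) && \text{Remember, our goal is to get } T(n) \le an \lg n + b \\
&\le an \lg n + b && \text{Pick } a \text{ large enough that } an/4 \text{ dominates } \Theta(n) + b
\end{aligned}
$$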

Analyzing Quicksort: Average Case

So T(n) ≤ an lg n + b for certain a and b

Thus the induction holds
Thus T(n) = O(n lg n)
Thus quicksort runs in O(n lg n) time on average (phew!)

Oh yeah, the summation

Tightly Bounding
The Key Summation
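$$
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k &= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg k && \text{Split the summation for a tighter bound} \\
&\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg n && \text{The } \lg k \text{ in the second term is bounded by } \lg n \\
&= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Move the } \lg n \text{ outside the summation}
\end{aligned}
$$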

Tightly Bounding
The Key Summation
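$$
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k &\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The summation bound so far} \\
&\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg (n/2) + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The } \lg k \text{ in the first term is bounded by } \lg (n/2) \\
&= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \left(\lg n - 1\right) + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \lg (n/2) = \lg n - 1 \\
&= \left(\lg n - 1\right) \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Move } (\lg n - 1) \text{ outside the summation}
\end{aligned}
$$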

Tightly Bounding
The Key Summation
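$$
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k &\le \left(\lg n - 1\right) \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{The summation bound so far} \\
&= \lg n \sum_{k=1}^{\lceil n/2 \rceil - 1} k - \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k && \text{Distribute the } (\lg n - 1) \\
&= \lg n \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The summations overlap in range; combine them} \\
&= \lg n \left(\frac{(n-1)\,n}{2}\right) - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The Gaussian series}
\end{aligned}
$$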

Tightly Bounding
The Key Summation
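$$
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k &\le \left(\frac{(n-1)\,n}{2}\right) \lg n - \sum_{k=1}^{\lceil n/2 \rceil - 1} k && \text{The summation bound so far} \\
&\le \frac{1}{2}\left[n(n-1)\right] \lg n - \sum_{k=1}^{n/2 - 1} k && \text{Rearrange first term, place upper bound on second} \\
&= \frac{1}{2}\left[n(n-1)\right] \lg n - \frac{1}{2}\left(\frac{n}{2}\right)\left(\frac{n}{2} - 1\right) && \text{Gaussian series} \\
&= \frac{1}{2}n^2 \lg n - \frac{1}{2}n \lg n - \frac{1}{8}n^2 + \frac{n}{4} && \text{Multiply it all out}
\end{aligned}
$$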

Tightly Bounding
The Key Summation
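$$
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k &\le \frac{1}{2}n^2 \lg n - \frac{1}{2}n \lg n - \frac{1}{8}n^2 + \frac{n}{4} \\
&\le \frac{1}{2}n^2 \lg n - \frac{1}{8}n^2 \quad \text{when } n \ge 2
\end{aligned}
$$

Done!!!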
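
The final bound is easy to spot-check numerically. The sketch below compares the sum of k lg k for k = 1..n-1 against n^2 lg n / 2 - n^2 / 8 over a range of n; the function name `bound_holds` and the range tested are choices made for this illustration, and a numerical check is of course not a substitute for the proof.

```python
from math import log2

def bound_holds(n):
    """True if sum_{k=1}^{n-1} k lg k <= (1/2) n^2 lg n - (1/8) n^2."""
    lhs = sum(k * log2(k) for k in range(1, n))
    rhs = 0.5 * n * n * log2(n) - n * n / 8
    return lhs <= rhs

print(all(bound_holds(n) for n in range(2, 1001)))
```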
