Mergesort / Quicksort
Problem of the Day
Give an efficient algorithm to determine whether two sets (of
size m and n) are disjoint. Analyze the complexity of your
algorithm in terms of m and n. Be sure to consider the case
where m is substantially smaller than n.
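One candidate solution, sketched below in C (our own illustration, not part of the slide; the function names are ours): sort the smaller set in O(m log m), then binary-search it for each of the n elements of the larger set, for O((n + m) log m) total time when m ≤ n.

    #include <stdlib.h>

    static int cmp_int(const void *x, const void *y) {
        return (*(const int *)x > *(const int *)y) - (*(const int *)x < *(const int *)y);
    }

    /* Return 1 if sets a (size m) and b (size n) share no element, assuming m <= n.
       Sorting the smaller set costs O(m log m); each of the n lookups costs
       O(log m), so the total is O((n + m) log m). */
    int disjoint(int a[], int m, int b[], int n) {
        qsort(a, m, sizeof(int), cmp_int);
        for (int i = 0; i < n; i++)
            if (bsearch(&b[i], a, m, sizeof(int), cmp_int) != NULL)
                return 0;   /* common element found */
        return 1;
    }

When m is substantially smaller than n, this beats sorting the larger set, since the log factor depends only on m.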
Mergesort
Recursive algorithms are based on reducing large problems
into small ones.
A nice recursive approach to sorting involves partitioning
the elements into two groups, sorting each of the smaller
problems recursively, and then interleaving the two sorted
lists to totally order the elements.
https://upload.wikimedia.org/wikipedia/commons/c/cc/Merge-sort-example-300px.gif
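A minimal C sketch of this idea (our own illustration; names like merge_sort and the int element type are assumptions, not from the lecture): merge_sort splits the array in half, sorts each half recursively, and merge interleaves the two sorted runs through a temporary buffer.

    #include <stdlib.h>
    #include <string.h>

    /* Merge the sorted runs a[lo..mid] and a[mid+1..hi] into sorted order. */
    static void merge(int a[], int lo, int mid, int hi) {
        int *tmp = malloc((hi - lo + 1) * sizeof(int));
        int i = lo, j = mid + 1, k = 0;
        while (i <= mid && j <= hi)          /* take the smaller head element */
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i <= mid) tmp[k++] = a[i++];  /* copy whichever run remains */
        while (j <= hi)  tmp[k++] = a[j++];
        memcpy(a + lo, tmp, k * sizeof(int));
        free(tmp);
    }

    /* Sort a[lo..hi]: split in half, sort each half, then interleave. */
    void merge_sort(int a[], int lo, int hi) {
        if (lo >= hi) return;                /* 0 or 1 elements: already sorted */
        int mid = lo + (hi - lo) / 2;
        merge_sort(a, lo, mid);
        merge_sort(a, mid + 1, hi);
        merge(a, lo, mid, hi);
    }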
Quicksort Implementation
Quicksort uses the same divide-and-conquer idea, but does its work before the recursive calls: partition the array around a pivot element, then sort the two sides recursively.

Sort(A)
    Quicksort(A, 1, n)

Quicksort(A, low, high)
    if (low < high)
        pivotloc = Partition(A, low, high)
        Quicksort(A, low, pivotloc - 1)
        Quicksort(A, pivotloc + 1, high)

Partition(A, low, high)
    pivot = A[low]
    leftwall = low
    for i = low+1 to high
        if (A[i] < pivot) then
            leftwall = leftwall + 1
            swap(A[i], A[leftwall])
    swap(A[low], A[leftwall])
    return leftwall
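The same routine translated to runnable C (a sketch, using 0-based indexing rather than the slides' 1-based arrays):

    static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

    /* Partition a[low..high] around pivot a[low]; return the pivot's final index. */
    static int partition(int a[], int low, int high) {
        int pivot = a[low];
        int leftwall = low;                      /* boundary of elements < pivot */
        for (int i = low + 1; i <= high; i++)
            if (a[i] < pivot)
                swap(&a[i], &a[++leftwall]);
        swap(&a[low], &a[leftwall]);             /* place pivot between the halves */
        return leftwall;
    }

    void quicksort(int a[], int low, int high) {
        if (low < high) {
            int p = partition(a, low, high);
            quicksort(a, low, p - 1);            /* everything left of the pivot */
            quicksort(a, p + 1, high);           /* everything right of it */
        }
    }

Calling quicksort(a, 0, n-1) sorts an n-element array in place.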
Best Case for Quicksort
Since each element ultimately ends up in the correct position,
the algorithm correctly sorts. But how long does it take?
The best case for divide-and-conquer algorithms comes when
we split the input as evenly as possible. Thus in the best case,
each subproblem is of size n/2.
The partition step on each subproblem is linear in its size.
Thus the total effort in partitioning the 2^k problems of size n/2^k at depth k is O(n).
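In other words, the best case satisfies the standard divide-and-conquer recurrence (a worked statement, not on the slide):

    T(n) = 2 T(n/2) + Θ(n)  ⇒  T(n) = Θ(n lg n)

since the Θ(n) of partitioning work per level is repeated over the lg n levels of halving.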
Best Case Recursion Tree
(figure: best-case recursion tree — the pivot p splits each subproblem in half, giving lg n levels with O(n) total partitioning work per level)
Intuition: The Average Case
Half the time, the pivot element will be from the center half of the sorted array.
Whenever the pivot element is from positions n/4 to 3n/4, the larger remaining subarray contains at most 3n/4 elements.
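Carried one step further (a standard back-of-the-envelope argument, not spelled out on the slide): call such a pivot "good". A good pivot occurs with probability 1/2, and after k good splits the larger subproblem has at most (3/4)^k · n elements. Thus at most

    log_{4/3} n ≈ 2.41 lg n

good splits can occur on any root-to-leaf path, and since roughly every second pivot is good, the expected depth is O(lg n) and the expected total work is O(n lg n).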
Optional: Average-Case Analysis of Quicksort
Each call to Partition costs n − 1 comparisons, and the pivot is equally likely to land in any position p, leaving subproblems of sizes p − 1 and n − p. The expected number of comparisons T(n) therefore satisfies

    T(n) = (2/n) Σ_{p=1}^{n} T(p−1) + n − 1

    nT(n) = 2 Σ_{p=1}^{n} T(p−1) + n(n−1)                multiply by n

    (n−1)T(n−1) = 2 Σ_{p=1}^{n−1} T(p−1) + (n−1)(n−2)    apply to n−1

Subtracting the second equation from the first eliminates the sums:

    nT(n) − (n−1)T(n−1) = 2T(n−1) + 2(n−1)

Rearranging the terms gives us:

    T(n)/(n+1) = T(n−1)/n + 2(n−1)/(n(n+1))

Setting a_n = T(n)/(n+1) and telescoping from a_0 = 0:

    a_n ≈ 2 Σ_{i=1}^{n} 1/(i+1) ≈ 2 ln n

We are really interested in T(n), so

    T(n) = (n+1) a_n ≈ 2(n+1) ln n ≈ 1.38 n lg n
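The 1.38 n lg n estimate is easy to sanity-check empirically. The C sketch below (entirely our own, not from the lecture) counts the comparisons quicksort makes on shuffled arrays and prints the average next to the predicted value; on shuffled input the first element is already a random pivot, so the plain algorithm matches the analysis.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    static long comparisons;                 /* total comparisons across all runs */

    static void qsort_count(int a[], int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo], wall = lo;
        for (int i = lo + 1; i <= hi; i++) {
            comparisons++;                   /* one comparison against the pivot */
            if (a[i] < pivot) { ++wall; int t = a[i]; a[i] = a[wall]; a[wall] = t; }
        }
        int t = a[lo]; a[lo] = a[wall]; a[wall] = t;
        qsort_count(a, lo, wall - 1);
        qsort_count(a, wall + 1, hi);
    }

    int main(void) {
        const int n = 10000, trials = 100;
        int *a = malloc(n * sizeof(int));
        srand(1);
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) a[i] = i;
            for (int i = n - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
                int j = rand() % (i + 1);
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            }
            qsort_count(a, 0, n - 1);
        }
        printf("average: %.0f   1.38 n lg n: %.0f\n",
               (double)comparisons / trials, 1.38 * n * log2(n));
        free(a);
        return 0;
    }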
Randomized Quicksort
Suppose you are writing a sorting program to run on data
given to you by your worst enemy. Quicksort is good on
average, but bad on certain worst-case instances.
If you used Quicksort, what kind of data would your enemy
give you to run it on? Exactly the worst-case instance, to
make you look bad.
But suppose you picked the pivot element at random.
Now your enemy cannot design a worst-case instance to give
to you, because no matter which data they give you, you
would have the same probability of picking a good pivot!
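In code, randomizing costs one extra line per call: swap a uniformly chosen element into the pivot slot before partitioning. A sketch, reusing the swap() and partition() routines from the earlier C code:

    #include <stdlib.h>

    /* swap() and partition() as defined in the earlier quicksort sketch */
    void swap(int *x, int *y);
    int partition(int a[], int low, int high);

    /* Randomized quicksort: move a random element into the pivot slot first,
       so no fixed input can force worst-case behavior. */
    void randomized_quicksort(int a[], int low, int high) {
        if (low < high) {
            int r = low + rand() % (high - low + 1);  /* index in [low, high];
                                                         slight modulo bias is fine
                                                         for illustration */
            swap(&a[low], &a[r]);                     /* random element becomes pivot */
            int p = partition(a, low, high);
            randomized_quicksort(a, low, p - 1);
            randomized_quicksort(a, p + 1, high);
        }
    }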
Randomized Guarantees
Randomization is a very important and useful idea. By either
picking a random pivot or scrambling the permutation before
sorting it, we can say:
“With high probability, randomized quicksort runs in
Θ(n lg n) time.”
Where before, all we could say is:
“If you give me random input data, quicksort runs in
expected Θ(n lg n) time.”
Importance of Randomization
Since the time bound now does not depend upon your input
distribution, this means that unless we are extremely unlucky
(as opposed to ill prepared or unpopular) we will certainly get
good performance.
Randomization is a general tool to improve algorithms with
bad worst-case but good average-case complexity.
The worst case is still there, but we almost certainly won't
see it.
Pick a Better Pivot
Having the worst case occur when the input is sorted or
almost sorted is very bad, since nearly-sorted data is exactly
what certain applications will feed the algorithm.
To eliminate this problem, pick a better pivot:
1. Use the middle element of the subarray as pivot.
2. Use a random element of the array as the pivot.
3. Perhaps best of all, take the median of three elements
(first, last, middle) as the pivot, as sketched after this
list. Why should we use the median instead of the mean?
Whichever of these three rules we use, the worst case remains
O(n²).
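A sketch of rule 3, again reusing the earlier swap() and partition() routines (the helper name is ours): order the first, middle, and last elements so the median sits in the pivot slot, then partition as usual.

    /* swap() and partition() as defined in the earlier quicksort sketch */
    void swap(int *x, int *y);
    int partition(int a[], int low, int high);

    /* Median-of-three pivot selection: sort a[low], a[mid], a[high] in place,
       move the median into the pivot slot a[low], then partition as usual. */
    int median_of_three_partition(int a[], int low, int high) {
        int mid = low + (high - low) / 2;
        if (a[mid]  < a[low]) swap(&a[mid],  &a[low]);
        if (a[high] < a[low]) swap(&a[high], &a[low]);
        if (a[high] < a[mid]) swap(&a[high], &a[mid]); /* now a[low]<=a[mid]<=a[high] */
        swap(&a[low], &a[mid]);       /* the median becomes the pivot */
        return partition(a, low, high);
    }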
Is Quicksort really faster than Heapsort?