
Lecture 5

Quicksort is a fast sorting algorithm that uses partitioning to divide an array into subarrays. It selects a pivot element and partitions the array into three sections: elements less than, equal to, and greater than the pivot. The pivot ends up in its correct sorted position, which allows the subarrays to be sorted independently in recursive calls. Quicksort runs in O(n log n) time on average but can take O(n^2) time in the worst case if the pivot always splits the array unevenly. Selecting the pivot at random means no particular input forces the worst case, giving expected O(n log n) time on any input.


Although mergesort is O(n lg n), it is inconvenient to implement with arrays, since we need extra space to merge.

In practice, the fastest sorting algorithm is Quicksort, which uses partitioning as its main idea. Example: Pivot about 10.
before:  17 12  6 19 23  8  5 10
after:    6  8  5 10 23 19 12 17

Quicksort

Partitioning places all the elements less than the pivot in the left part of the array, and all elements greater than the pivot in the right part of the array. The pivot fits in the slot between them. Note that the pivot element ends up in the correct place in the total order!

Once we have selected a pivot element, we can partition the array in one linear scan, by maintaining three sections of the array: < pivot, > pivot, and unexplored. Example: pivot about 10
| 17 12  6 19 23  8  5 |            10
|  5 12  6 19 23  8 | 17            10
 5 | 12  6 19 23  8 | 17            10
 5 |  8  6 19 23 | 12 17            10
 5  8 |  6 19 23 | 12 17            10
 5  8  6 | 19 23 | 12 17            10
 5  8  6 | 23 | 19 12 17            10
 5  8  6 || 23 19 12 17             10
 5  8  6 10 19 12 17 23

Partitioning the elements

As we scan from left to right, we move the left bound to the right when the element is less than the pivot, otherwise we swap it with the rightmost unexplored element and move the right bound one step closer to the left.
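In Python, this scan might look like the sketch below (0-indexed, with the pivot parked in the array's last slot as in the example above; the helper name partition_scan is ours, not the notes'). The print call reproduces the trace table:

    def partition_scan(A):
        # Partition A about the pivot stored in its last slot, printing
        # the three sections (< pivot | unexplored | > pivot) after each step.
        pivot_pos = len(A) - 1
        pivot = A[pivot_pos]
        left, right = 0, pivot_pos - 1        # bounds of the unexplored region
        while left <= right:
            if A[left] < pivot:
                left += 1                     # grow the "< pivot" section
            else:
                A[left], A[right] = A[right], A[left]
                right -= 1                    # grow the "> pivot" section
            print(A[:left], "|", A[left:right + 1], "|", A[right + 1:])
        A[left], A[pivot_pos] = A[pivot_pos], A[left]   # drop the pivot between the sections
        return left

    partition_scan([17, 12, 6, 19, 23, 8, 5, 10])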

Since the partitioning step consists of at most n swaps, it takes time linear in the number of keys. But what does it buy us?

1. The pivot element ends up in the position it retains in the final sorted order.
2. After a partitioning, no element flops to the other side of the pivot in the final sorted order.

Thus we can sort the elements to the left of the pivot and the right of the pivot independently! This gives us a recursive sorting algorithm, since we can use the partitioning approach to sort each subproblem.


Sort(A)
    Quicksort(A, 1, n)

Pseudocode

Quicksort(A, low, high)
    if (low < high)
        pivot-location = Partition(A, low, high)
        Quicksort(A, low, pivot-location - 1)
        Quicksort(A, pivot-location + 1, high)

Partition(A, low, high)
    pivot = A[low]
    leftwall = low
    for i = low + 1 to high
        if (A[i] < pivot) then
            leftwall = leftwall + 1
            swap(A[i], A[leftwall])
    swap(A[low], A[leftwall])
    return leftwall
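The pseudocode translates almost line for line into a runnable sketch; here is one in Python, assuming 0-indexed arrays:

    def quicksort(A, low, high):
        # Sort A[low..high] in place by partitioning about A[low].
        if low < high:
            pivot_location = partition(A, low, high)
            quicksort(A, low, pivot_location - 1)
            quicksort(A, pivot_location + 1, high)

    def partition(A, low, high):
        # Partition A[low..high] about the pivot A[low] and return the
        # index where the pivot comes to rest.
        pivot = A[low]
        leftwall = low                                 # right edge of the "< pivot" section
        for i in range(low + 1, high + 1):
            if A[i] < pivot:
                leftwall += 1
                A[i], A[leftwall] = A[leftwall], A[i]
        A[low], A[leftwall] = A[leftwall], A[low]      # drop the pivot between the sections
        return leftwall

    data = [17, 12, 6, 19, 23, 8, 5, 10]
    quicksort(data, 0, len(data) - 1)
    print(data)    # [5, 6, 8, 10, 12, 17, 19, 23]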

Since each element ultimately ends up in the correct position, the algorithm correctly sorts. But how long does it take?

The best case for divide-and-conquer algorithms comes when we split the input as evenly as possible. Thus in the best case, each subproblem is of size n/2. The partition step on each subproblem is linear in its size. Thus the total effort in partitioning the 2 problems of size n/2 is O(n). The recursion tree for the best case looks like this:
[Figure: balanced recursion tree; level k has 2^k subproblems of size n/2^k, so each level does O(n) total partitioning work.]

Best Case for Quicksort

The total partitioning on each level is O(n), and it takes lg n levels of perfect partitions to get to single-element subproblems. When we are down to single elements, the problems are sorted. Thus the total time in the best case is O(n lg n).

Suppose instead our pivot element splits the array as unequally as possible. Thus instead of n/2 elements in the smaller half, we get zero, meaning that the pivot element is the biggest or smallest element in the array.

Worst Case for Quicksort

Now we have n − 1 levels, instead of lg n, for a worst-case time of Θ(n²), since the first n/2 levels each have more than n/2 elements to partition. Thus the worst-case time for Quicksort is worse than Heapsort or Mergesort. To justify its name, Quicksort had better be good in the average case. Showing this requires some fairly intricate analysis.

The divide-and-conquer principle applies to real life. If you break a job into pieces, it is best to make the pieces of equal size!

Suppose we pick the pivot element at random in an array of n keys.


[Figure: the sorted positions 1 ... n/4 ... n/2 ... 3n/4 ... n, with the middle half (n/4 through 3n/4) marked as the good-pivot region.]

Intuition: The Average Case for Quicksort

Half the time, the pivot element will be from the center half of the sorted array. Whenever the pivot element is from positions n/4 to 3n/4, the larger remaining subarray contains at most 3n/4 elements. If we assume that the pivot element always lands in this range, what is the maximum number l of partitions we need to get from n elements down to 1 element?

(3/4)^l · n = 1  ⟹  n = (4/3)^l

lg n = l · lg(4/3), therefore l = lg(n) / lg(4/3) ≈ 2.4 lg n good partitions suffice.

About 2.4 lg n levels of decent partitions suffice to sort an array of n elements. But how often will an arbitrarily picked pivot element generate a decent partition? Since any number between n/4 and 3n/4 would make a decent pivot, we get one half the time on average. If we need about 2.4 lg n levels of decent partitions to finish the job, and half of random partitions are decent, then on average the recursion tree to quicksort the array has about 4.8 lg n levels, which is still O(lg n).
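A short simulation makes this intuition concrete (a Python sketch; the helper name tree_depth is ours). Only subproblem sizes matter for the depth of the recursion tree, so we can sample random pivot ranks without sorting anything:

    import math
    import random

    def tree_depth(n):
        # Depth of the randomized-quicksort recursion tree on n distinct
        # keys: a uniformly random pivot of rank k leaves subproblems of
        # sizes k and n - 1 - k.
        if n <= 1:
            return 0
        k = random.randrange(n)
        return 1 + max(tree_depth(k), tree_depth(n - 1 - k))

    n = 10_000
    trials = [tree_depth(n) for _ in range(50)]
    print(sum(trials) / len(trials))    # typically around 3 lg n, i.e. about 40
    print(4.8 * math.log2(n))           # the looser bound argued above, about 64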

What have we shown?

Since O(n) work is done partitioning on each level, the average time is O(n lg n). More careful analysis shows that the expected number of comparisons is ≈ 1.38 n lg n.

To do a precise average-case analysis of quicksort, we formulate a recurrence giving the exact expected time T(n):

Average-Case Analysis of Quicksort


T(n) = (1/n) · Σ_{p=1}^{n} [ T(p−1) + T(n−p) ] + (n − 1)

Each possible pivot p is selected with equal probability 1/n. The number of comparisons needed to do the partition is n − 1. We will need one useful fact about the Harmonic numbers H_n, namely

H_n = Σ_{i=1}^{n} 1/i ≈ ln n

It is important to understand (1) where the recurrence relation comes from and (2) how the log comes out from the summation. The rest is just messy algebra.

T(n) = (1/n) · Σ_{p=1}^{n} [ T(p−1) + T(n−p) ] + (n − 1)

By symmetry, the two sums Σ T(p−1) and Σ T(n−p) are identical, so

T(n) = (2/n) · Σ_{p=1}^{n} T(p−1) + (n − 1)

Multiply by n:

n·T(n) = 2 · Σ_{p=1}^{n} T(p−1) + n(n−1)

Apply the same equation to n−1:

(n−1)·T(n−1) = 2 · Σ_{p=1}^{n−1} T(p−1) + (n−1)(n−2)

Subtracting the second line from the first cancels almost all of the summation:

n·T(n) − (n−1)·T(n−1) = 2·T(n−1) + 2(n−1)

Rearranging the terms gives us:

T(n) / (n+1) = T(n−1) / n + 2(n−1) / (n(n+1))

Substituting a_n = T(n)/(n+1) gives

a_n = a_{n−1} + 2(n−1)/(n(n+1)) = Σ_{i=1}^{n} 2(i−1)/(i(i+1)) ≈ 2 · Σ_{i=1}^{n} 1/(i+1) ≈ 2 ln n

We are really interested in T(n), so

T(n) = (n+1)·a_n ≈ 2(n+1) ln n ≈ 1.38 n lg n,

since 2 ln n = (2 ln 2) lg n ≈ 1.386 lg n.
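As a sanity check on the algebra, the recurrence can be evaluated exactly and compared against the known closed form 2(n+1)·H_n − 4n and the leading-term estimate (a Python sketch; the variable names are ours):

    import math

    N = 2000
    T = [0.0] * (N + 1)     # T[n] = expected comparisons to quicksort n keys
    prefix = 0.0            # running value of sum_{p=1..n} T[p-1]
    for n in range(1, N + 1):
        prefix += T[n - 1]
        T[n] = (2.0 / n) * prefix + (n - 1)

    H = sum(1.0 / i for i in range(1, N + 1))    # harmonic number H_N
    print(T[N])                        # exact expected comparisons, about 24730
    print(2 * (N + 1) * H - 4 * N)     # closed form: matches T[N]
    print(1.38 * N * math.log2(N))     # about 30266: the leading term only;
                                       # the -4n lower-order term is still visible here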

The worst case for Quicksort depends upon how we select our partition or pivot element. If we always select either the first or last element of the subarray, the worst case occurs when the input is already sorted!

[Figure: recursion tree on the already-sorted keys A B D F H J K; each partition peels off only the pivot, so the tree degenerates into a path of n − 1 levels.]

What is the Worst Case?
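To see the effect concretely, the sketch below counts comparisons using the quicksort and partition sketches from the Pseudocode section (n is kept small so the already-sorted input stays within Python's default recursion limit):

    import random

    def quicksort_comparisons(A, low, high):
        # Sort A[low..high] with the first-element-pivot quicksort from
        # earlier, returning how many "A[i] < pivot" comparisons are made.
        if low >= high:
            return 0
        p = partition(A, low, high)     # partition() from the sketch above
        return (high - low) + quicksort_comparisons(A, low, p - 1) \
                            + quicksort_comparisons(A, p + 1, high)

    n = 500
    print(quicksort_comparisons(list(range(n)), 0, n - 1))
    # sorted input: n(n-1)/2 = 124750 comparisons
    print(quicksort_comparisons(random.sample(range(n), n), 0, n - 1))
    # random input: about 4800 on average (the 1.38 n lg n leading term overshoots at this n)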

Having the worst case occur on sorted or almost-sorted input is very bad, since that is likely to be the case in certain applications. To eliminate this problem, pick a better pivot:

1. Use the middle element of the subarray as pivot.
2. Use a random element of the array as the pivot.
3. Perhaps best of all, take the median of three elements (first, last, middle) as the pivot, as sketched below.

Why should we use the median instead of the mean? The median of three is always one of the array's own elements, so we can pivot on it directly; the mean generally is not an element of the array at all. Whichever of these three rules we use, the worst case remains O(n²). However, because the worst case is no longer a natural ordering, it is much less likely to occur.
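Here is a sketch of rule 3: select the median of the first, middle, and last elements and move it into the pivot slot, so the Partition routine above (which pivots on A[low]) can be used unchanged. The helper name is ours:

    def median_of_three(A, low, high):
        # Sort the three candidates so A[low] <= A[mid] <= A[high],
        # then move the median into the pivot slot A[low].
        mid = (low + high) // 2
        if A[mid] < A[low]:
            A[low], A[mid] = A[mid], A[low]
        if A[high] < A[low]:
            A[low], A[high] = A[high], A[low]
        if A[high] < A[mid]:
            A[mid], A[high] = A[high], A[mid]
        A[low], A[mid] = A[mid], A[low]    # median now sits at A[low]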

Since Heapsort is Θ(n lg n) and selection sort is Θ(n²), there is no debate about which will be better for decent-sized files. But how can we compare two Θ(n lg n) algorithms to see which is faster? Using the RAM model and the big Oh notation, we can't!

When Quicksort is implemented well, it is typically 2-3 times faster than mergesort or heapsort. The primary reason is that the operations in the innermost loop are simpler. The best way to see this is to implement both and experiment with different inputs, as in the sketch below. Since the difference between the two programs will be limited to a multiplicative constant factor, the details of how you program each algorithm will make a big difference. If you don't want to believe me when I say Quicksort is faster, I won't argue with you. It is a question whose solution lies outside the tools we are using.
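A rough timing harness in that spirit (a sketch: it reuses the quicksort sketch from earlier and stands in the standard-library heapq for a hand-rolled heapsort, so the absolute numbers say more about Python's constant factors than about the 2-3x claim, which assumes both algorithms are implemented equally well):

    import heapq
    import random
    import time

    def heapsort(A):
        # Heapsort via the standard-library binary heap.
        heapq.heapify(A)
        return [heapq.heappop(A) for _ in range(len(A))]

    data = [random.random() for _ in range(100_000)]

    for name, run in [("quicksort", lambda A: quicksort(A, 0, len(A) - 1)),
                      ("heapsort", heapsort)]:
        trial = list(data)                  # fresh copy for each algorithm
        start = time.perf_counter()
        run(trial)
        print(name, time.perf_counter() - start)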

Is Quicksort really faster than Heapsort?

Suppose you are writing a sorting program to run on data given to you by your worst enemy. Quicksort is good on average, but bad on certain worst-case instances. If you used Quicksort, what kind of data would your enemy give you to run it on? Exactly the worst-case instance, to make you look bad.

But instead of picking the median of three or the first element as pivot, suppose you picked the pivot element at random. Now your enemy cannot design a worst-case instance to give to you, because no matter which data they give you, you have the same probability of picking a good pivot!

Randomization is a very important and useful idea. By either picking a random pivot or scrambling the permutation before sorting it, we can say: "With high probability, randomized quicksort runs in Θ(n lg n) time." Whereas before, all we could say is: "If you give me random input data, quicksort runs in expected Θ(n lg n) time."
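The change to the code is one line: swap a uniformly random element into the pivot slot, then fall through to the deterministic partition sketched earlier (a sketch; the function name is ours):

    import random

    def randomized_partition(A, low, high):
        # Make the pivot choice independent of the input order, then
        # partition exactly as before.
        r = random.randint(low, high)
        A[low], A[r] = A[r], A[low]
        return partition(A, low, high)      # partition() from the sketch above

Calling randomized_partition in place of Partition inside Quicksort gives randomized quicksort.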

Randomization

Since this time bound does not depend upon your input distribution, it means that unless we are extremely unlucky (as opposed to ill prepared or unpopular) we will certainly get good performance. Randomization is a general tool to improve algorithms with bad worst-case but good average-case complexity. The worst case is still there, but we almost certainly won't see it.
