Data Structures and Algorithms: (CS210/ESO207/ESO211)
Lecture 31
Average running time of QuickSort
Overview of this lecture
Main objective:
Analyzing the average time complexity of QuickSort:
deriving a recurrence,
bounding it using mathematical induction,
solving the recurrence exactly.
The outcome of this analysis will be quite surprising!
Extra benefit:
You will learn a standard way of using mathematical induction to bound the
time complexity of an algorithm. You must try to internalize it.
QuickSort
Pseudocode for QuickSort(S)

QuickSort(S)
{  If (|S| > 1)
      Pick and remove an element x from S;
      (S_{<x}, S_{>x}) ← Partition(S, x);
      return( Concatenate(QuickSort(S_{<x}), x, QuickSort(S_{>x})) )
}
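The set-based pseudocode above can be sketched in Python as follows (a sketch, assuming distinct elements as the analysis later does; the function name `quicksort` is ours):

```python
def quicksort(a):
    """List-based QuickSort mirroring the pseudocode: pick the first
    element as pivot, partition the rest into smaller and larger
    elements, sort each part recursively, and concatenate."""
    if len(a) <= 1:
        return list(a)
    x = a[0]                                  # pivot
    smaller = [e for e in a[1:] if e < x]
    larger = [e for e in a[1:] if e > x]      # elements assumed distinct
    return quicksort(smaller) + [x] + quicksort(larger)
```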
Pseudocode for QuickSort(A, l, r)
When the input is stored in an array:

QuickSort(A, l, r)
{  If ( l < r )
      i ← Partition(A, l, r);
      QuickSort(A, l, i − 1);
      QuickSort(A, i + 1, r)
}

Partition selects an element x from A[l..r] as a pivot element, and permutes
the subarray A[l..r] such that elements preceding x are smaller than x, and
elements succeeding x are greater than x. It achieves this task in O(r − l) time
using O(1) extra space only.
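A minimal in-place sketch in Python, assuming distinct elements and using the first element as pivot (a Lomuto-style partition consistent with the O(n)-time, O(1)-space specification above; the names are ours):

```python
def partition(A, l, r):
    """Partition A[l..r] around the pivot x = A[l]; return the pivot's
    final index.  Runs in O(r - l) time with O(1) extra space."""
    x = A[l]
    i = l                          # A[l+1..i] holds the elements < x
    for j in range(l + 1, r + 1):
        if A[j] < x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[l], A[i] = A[i], A[l]        # move pivot to its final position
    return i

def quicksort(A, l, r):
    """Sort A[l..r] in place, mirroring the array pseudocode above."""
    if l < r:
        i = partition(A, l, r)
        quicksort(A, l, i - 1)
        quicksort(A, i + 1, r)
```

For example, `A = [4, 7, 1, 3]; quicksort(A, 0, len(A) - 1)` leaves `A` sorted.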
Example: Partition(A, 0, 8) on a 9-element array (indices 0 to 8).
Without loss of generality, the first element is selected as the pivot element.
After Partition(A, 0, 8), the pivot x occupies its final position, preceded by the elements smaller than x and followed by the elements greater than x.
[Figure: the array before and after Partition, showing the blocks "< x", x, "> x".]
Analyzing average time complexity of
QuickSort
Part 1
Deriving the recurrence
Analyzing average time complexity of QuickSort

Assumptions (for a neat analysis):
All elements are distinct.
Each recursive call selects the first element as the pivot element.

Let e_i denote the i-th smallest element of the input.
Observation: the running time of QuickSort depends upon the permutation of the e_i's and not on the values taken by the e_i's.

Q(n): average running time of QuickSort on an input of size n.
Question: average over what?
Answer: average over all n! possible permutations of {e_1, e_2, …, e_n}.
Hence Q(n) = (1/n!) Σ_π Q(π),
where Q(π) is the time complexity (or the number of comparisons) when the input is the permutation π.

Calculating Q(n) from the definition/scratch is impractical, if not impossible.
[Figure: all n! permutations of {e_1, e_2, …, e_n}.]
Analyzing average time complexity of QuickSort

Let P(i) be the set of all those permutations of {e_1, e_2, …, e_n} that begin with e_i.
Question: What fraction of all permutations constitutes P(i)?
Answer: 1/n.
Let Q(n, i) be the average running time of QuickSort over P(i).
Question: What is the relation between Q(n) and the Q(n, i)'s?
Answer: Q(n) = (1/n) Σ_{i=1}^{n} Q(n, i).
Observation: We now need to derive an expression for Q(n, i). For this purpose, we need to have a closer look at the execution of QuickSort over P(i).
[Figure: the n! permutations grouped into P(1), P(2), P(3), …, the permutations beginning with e_1, e_2, e_3, …]
Analyzing average time complexity of QuickSort

Question: How does QuickSort proceed on a permutation from P(i)?
Answer: The procedure Partition(A, 0, n−1) permutes the array such that e_i occupies position i−1, preceded by a permutation of e_1, …, e_{i−1} and followed by a permutation of e_{i+1}, …, e_n.
Observation: For any given implementation of Partition(A, 0, n−1) and a permutation from P(i), the resulting permutation is well-defined. Let S(i) be the set of permutations that are outcomes of executing Partition(A, 0, n−1) on P(i).
Inference: Partition() can be viewed as a mapping from set P(i) to S(i).
Question: What kind of mapping from set P(i) to S(i) is defined by Partition() in this way?
Lemma 1: The mapping defined by Partition() is a many-to-one uniform mapping from P(i) to S(i). In particular, exactly (n−1)!/((i−1)!(n−i)!) permutations from P(i) get mapped to a single permutation in S(i).
Analyzing average time complexity of QuickSort

[Figure: after Partition, A[0..i−2] holds a permutation of e_1, …, e_{i−1}, A[i−1] holds e_i, and A[i..n−1] holds a permutation of e_{i+1}, …, e_n; each such outcome arises from (n−1)!/((i−1)!(n−i)!) permutations of P(i).]

It follows from Lemma 1 that after Partition() on P(i), each permutation of e_1, …, e_{i−1} occurs with the same (uniform) frequency in A[0..i−2] and each permutation of e_{i+1}, …, e_n occurs with the same (uniform) frequency in A[i..n−1].
Hence we can express Q(n, i) (the time complexity of QuickSort averaged over P(i)) as:
Q(n, i) = Q(i−1) + Q(n−i) + dn     ----(1)
We showed previously that:
Q(n) = (1/n) Σ_{i=1}^{n} Q(n, i)     ----(2)
Question: Can you express Q(n) recursively using (1) and (2)?
Q(n) = (1/n) Σ_{i=1}^{n} (Q(i−1) + Q(n−i)) + dn
Q(1) = c
Analyzing average time complexity of
QuickSort
Part 2
Solving the recurrence through
mathematical induction
Q(1) = c
Q(n) = (1/n) Σ_{i=1}^{n} (Q(i−1) + Q(n−i)) + dn
     = (2/n) Σ_{i=1}^{n−1} Q(i) + dn

Assertion A(n): Q(n) ≤ a·n·log n + b for all n ≥ 1.
Base case A(1): holds for b ≥ c.
Induction step: Assuming A(i) holds for all i < n, we have to prove A(n).

Q(n) ≤ (2/n) Σ_{i=1}^{n−1} (a·i·log i + b) + dn
     ≤ (2/n) (Σ_{i=1}^{n−1} a·i·log i) + 2b + dn
     = (2/n) (Σ_{i=1}^{n/2} a·i·log i + Σ_{i=n/2+1}^{n−1} a·i·log i) + 2b + dn
     ≤ (2/n) (Σ_{i=1}^{n/2} a·i·log(n/2) + Σ_{i=n/2+1}^{n−1} a·i·log n) + 2b + dn
     = (2/n) (Σ_{i=1}^{n−1} a·i·log n − Σ_{i=1}^{n/2} a·i) + 2b + dn
     = (2/n) (a·log n · n(n−1)/2 − a·(n/2)(n/2 + 1)/2) + 2b + dn
     ≤ a·n·log n − a·n/4 + 2b + dn
     = a·n·log n + b + (b + dn − a·n/4)
     ≤ a·n·log n + b     for a ≥ 4(b + d)

(Here log is to base 2, so log(n/2) = log n − 1.)
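The bound just proved can be sanity-checked numerically. The sketch below instantiates the recurrence with illustrative constants c = d = 1 and picks b = c, a = 4(b + d) (our choices, not from the slides), then verifies Q(n) ≤ a·n·log₂ n + b for the first two thousand values of n:

```python
import math

c, d = 1.0, 1.0      # illustrative constants (assumptions)
b = c                # base case requires b >= c
a = 4 * (b + d)      # sufficient for the induction step

Q = {1: c}
running_sum = c      # sum of Q(1), ..., Q(n-1)
for n in range(2, 2001):
    Q[n] = 2.0 * running_sum / n + d * n   # Q(n) = (2/n)*sum + d*n
    running_sum += Q[n]
    assert Q[n] <= a * n * math.log2(n) + b
```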
Analyzing average time complexity of
QuickSort
Part 3
Solving the recurrence exactly
Some elementary tools

H(n) = Σ_{i=1}^{n} 1/i   (the n-th harmonic number)
Question: How to approximate H(n)?
Answer: H(n) → ln n + γ as n increases, where γ is Euler's constant ≈ 0.58.
Hint: look at the figure of unit-width bars of heights 1, 1/2, 1/3, 1/4, 1/5, 1/6, and relate it to the curve of the function f(x) = 1/x and its integral.

We shall calculate the average number of comparisons during QuickSort using:
our knowledge of solving recurrences by substitution,
our knowledge of solving recurrences by unfolding,
our knowledge of simplifying a partial fraction (from JEE days).
Students should try to internalize the way the above tools are used.
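The approximation H(n) ≈ ln n + γ can be checked directly (a small sketch; γ ≈ 0.577):

```python
import math

def H(n):
    """n-th harmonic number: 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))

# H(n) - ln(n) approaches the Euler-Mascheroni constant (about 0.577)
for n in (10, 100, 1000, 10000):
    print(n, H(n) - math.log(n))
```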
C(n): average number of comparisons during QuickSort on n elements.
C(0) = 0, C(1) = 0.
C(n) = (1/n) Σ_{i=1}^{n} (C(i−1) + C(n−i)) + (n−1)
     = (2/n) Σ_{i=1}^{n−1} C(i) + (n−1)
Multiplying by n:
n·C(n) = 2 Σ_{i=1}^{n−1} C(i) + n(n−1)     ----(1)
Question: How will this equation appear for n−1?
(n−1)·C(n−1) = 2 Σ_{i=1}^{n−2} C(i) + (n−1)(n−2)     ----(2)
Subtracting (2) from (1), we get
n·C(n) − (n−1)·C(n−1) = 2·C(n−1) + 2(n−1)
n·C(n) = (n+1)·C(n−1) + 2(n−1)
Question: How to solve/simplify it further?
Dividing by n(n+1):
C(n)/(n+1) = C(n−1)/n + 2(n−1)/(n(n+1))
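The recurrence for C(n) can be verified against a brute-force average over all n! permutations for small n (a sketch; it charges exactly n − 1 comparisons per Partition call, matching the recurrence's additive term):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def comparisons(a):
    """Comparisons made by QuickSort with the first element as pivot:
    partitioning m elements costs m - 1 comparisons."""
    if len(a) <= 1:
        return 0
    x = a[0]
    smaller = [e for e in a[1:] if e < x]
    larger = [e for e in a[1:] if e > x]
    return (len(a) - 1) + comparisons(smaller) + comparisons(larger)

def C(n):
    """C(n) = (1/n) * sum_{i=1..n} (C(i-1) + C(n-i)) + (n-1), C(0)=C(1)=0."""
    table = [Fraction(0)] * (n + 1)
    for m in range(2, n + 1):
        table[m] = Fraction(1, m) * sum(table[i - 1] + table[m - i]
                                        for i in range(1, m + 1)) + (m - 1)
    return table[n]

# exact agreement between the recurrence and the n!-permutation average
for n in range(1, 8):
    avg = Fraction(sum(comparisons(list(p)) for p in permutations(range(n))),
                   factorial(n))
    assert avg == C(n)
```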
C(n)/(n+1) = C(n−1)/n + 2(n−1)/(n(n+1))

B(n) = B(n−1) + 2(n−1)/(n(n+1)),   where B(n) = C(n)/(n+1)

Question: How to simplify the RHS?
2(n−1)/(n(n+1)) = (2(n+1) − 4)/(n(n+1))
               = 2/n − 4/(n(n+1))
               = 2/n − 4(1/n − 1/(n+1))
               = 4/(n+1) − 2/n
Hence
B(n) = B(n−1) + 4/(n+1) − 2/n
B(n) = B(n−1) + 4/(n+1) − 2/n
Question: How to calculate B(n)?
Unfolding the recurrence:
B(n)   = B(n−1) + 4/(n+1) − 2/n
B(n−1) = B(n−2) + 4/n − 2/(n−1)
  ⋮
B(2)   = B(1) + 4/3 − 2/2
Adding all these equations (note that B(1) = C(1)/2 = 0):
B(n) = Σ_{i=2}^{n} (4/(i+1) − 2/i)
     = 4(H(n+1) − 3/2) − 2(H(n) − 1)
     = 4H(n+1) − 2H(n) − 4
     = 4/(n+1) + 2H(n) − 4     (using H(n+1) = H(n) + 1/(n+1))
Hence
C(n) = (n+1)·B(n) = (n+1)(4/(n+1) + 2H(n) − 4) = 2(n+1)H(n) − 4n
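The closed form can be checked exactly against the one-step recurrence n·C(n) = (n+1)·C(n−1) + 2(n−1), using exact rational arithmetic (a sketch):

```python
from fractions import Fraction

def H(n):
    """n-th harmonic number as an exact rational."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

C = Fraction(0)                  # C(1) = 0
for n in range(2, 60):
    # C(n) = ((n+1)*C(n-1) + 2*(n-1)) / n
    C = Fraction(n + 1, n) * C + Fraction(2 * (n - 1), n)
    assert C == 2 * (n + 1) * H(n) - 4 * n   # closed form
```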
C(n) = 2(n+1)H(n) − 4n
     ≈ 2(n+1)(ln n + 0.58) − 4n
     = 2n·ln n − 2.84n + O(log n)
     = 1.39 n·log₂ n − O(n)

Theorem: The average number of comparisons during QuickSort on n elements
approaches 2n·ln n − 2.84n, i.e., 1.39 n·log₂ n − O(n).

The best case number of comparisons during QuickSort on n elements ≈ n·log₂ n.
The worst case number of comparisons during QuickSort on n elements = O(n²).
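The theorem's estimate can be compared with the exact average computed from the recurrence (a floating-point sketch; the ratio should approach 1 as n grows):

```python
import math

# Exact-average recurrence n*C(n) = (n+1)*C(n-1) + 2*(n-1), in floats,
# compared with the asymptotic estimate 2*n*ln(n) - 2.84*n.
C = 0.0                           # C(1) = 0
ratios = {}
for n in range(2, 10001):
    C = (n + 1) / n * C + 2 * (n - 1) / n
    if n in (100, 1000, 10000):
        ratios[n] = C / (2 * n * math.log(n) - 2.84 * n)
print(ratios)
```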
QuickSort versus MergeSort

Theorem: The average number of comparisons by QuickSort on n elements = 1.39 n·log₂ n − O(n).
Theorem: The worst case number of comparisons during MergeSort on n elements = n·log₂ n + O(n).
Thus the average number of comparisons of QuickSort is about 39% more than the worst case number of comparisons of MergeSort.
Is this fact not surprising, given that QuickSort outperforms MergeSort in real life almost always?
QuickSort versus MergeSort

Question: What makes QuickSort outperform MergeSort in real life?
Answer:
Fewer recursive calls (MergeSort: 2n − 1, QuickSort: n).
QuickSort makes better use of the cache (a fast but small memory unit between the processor and the RAM).
QuickSort uses no extra space.
QuickSort has no overhead of copying the array, which MergeSort incurs during merging.
The likelihood of QuickSort deviating from its average case behavior drops drastically as the input size increases (read the following theorem).

Theorem: On a permutation of 1 million elements selected uniformly at random from all possible permutations, the probability that QuickSort takes more than twice the average time is less than 10⁻²⁴.
The proof of this theorem is discussed in almost every course on randomized algorithms.