
Algorithms

Module: Asymptotic Analysis, Sorting


Asymptotic Performance
• In Algorithm Analysis, we care most about
asymptotic performance
– How does the algorithm behave as the problem
size gets very large?
• Running time
• Memory/storage requirements
• Bandwidth/power requirements/logic gates/etc.
Asymptotic Notation
• By now you should have an intuitive feel for
asymptotic (big-O) notation:
– What does O(n) running time mean? O(n2)?
O(n lg n)?
– How does asymptotic running time relate to
asymptotic memory usage?
• Our first task is to define this notation more
formally and completely
Analysis of Algorithms
• Analysis is performed with respect to a
computational model
• We will usually use a generic uniprocessor
random-access machine (RAM)
– All memory equally expensive to access
– No concurrent operations
– All reasonable instructions take unit time
• Except, of course, function calls
– Constant word size
• Unless we are explicitly manipulating bits
Input Size
• Time and space complexity
– This is generally a function of the input size
• E.g., sorting, multiplication
– How we characterize input size depends on the problem:
• Sorting: number of input items
• Multiplication: total number of bits
• Graph algorithms: number of nodes & edges
• Etc.
Running Time
• Number of primitive steps that are executed
– Except for the time to execute a function call, most statements require roughly the same amount of time
• y=m*x+b
• c = 5 / 9 * (t - 32 )
• z = f(x) + g(y)
• We can be more exact if need be
Analysis
• Worst case
– Provides an upper bound on running time
– An absolute guarantee
• Average case
– Provides the expected running time
– Very useful, but treat with care: what is
“average”?
• Random (equally likely) inputs
• Real-life inputs
Design and Analysis of Algorithms
• Analysis: predict the cost of an algorithm in terms of resources and performance
• Design: design algorithms which minimize the cost
Insertion Sort
InsertionSort(A, n) {
  for i = 2 to n {
    key = A[i]
    j = i - 1;
    while (j > 0) and (A[j] > key) {
      A[j+1] = A[j]
      j = j - 1
    }
    A[j+1] = key
  }
}
Insertion Sort: Trace on A = [30, 10, 40, 20] (indices 1..4)
• i=2, key=10: A[1]=30 > 10, shift → 30 30 40 20; j reaches 0, insert key → 10 30 40 20
• i=3, key=40: A[2]=30 ≤ 40, no shifts; insert key → 10 30 40 20
• i=4, key=20: A[3]=40 > 20, shift → 10 30 40 40; A[2]=30 > 20, shift → 10 30 30 40; A[1]=10 ≤ 20, stop; insert key → 10 20 30 40
Done!
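A runnable C version of the pseudocode, for experimentation (a sketch: 0-indexed, unlike the 1-indexed slides; the name insertion_sort is ours):

#include <stdio.h>

void insertion_sort(int A[], int n) {
    for (int i = 1; i < n; i++) {       /* slides: for i = 2 to n */
        int key = A[i];
        int j = i - 1;
        while (j >= 0 && A[j] > key) {  /* shift larger elements right */
            A[j + 1] = A[j];
            j--;
        }
        A[j + 1] = key;                 /* drop key into its slot */
    }
}

int main(void) {
    int A[] = {30, 10, 40, 20};
    insertion_sort(A, 4);
    for (int i = 0; i < 4; i++) printf("%d ", A[i]);  /* prints 10 20 30 40 */
    printf("\n");
    return 0;
}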
Insertion Sort
• What is the precondition for the outer for loop in InsertionSort?
• How many times will the inner while loop execute?
Insertion Sort
Statement                                  Effort
InsertionSort(A, n) {
  for i = 2 to n {                         c1·n
    key = A[i]                             c2·(n-1)
    j = i - 1;                             c3·(n-1)
    while (j > 0) and (A[j] > key) {       c4·T
      A[j+1] = A[j]                        c5·(T-(n-1))
      j = j - 1                            c6·(T-(n-1))
    }                                      0
    A[j+1] = key                           c7·(n-1)
  }                                        0
}
T = t2 + t3 + … + tn, where ti is the number of while-condition evaluations for the ith for-loop iteration
Analyzing Insertion Sort
• T(n) = c1n + c2(n-1) + c3(n-1) + c4T + c5(T - (n-1)) + c6(T - (n-1)) + c7(n-1)
= c8T + c9n + c10
• What can T be?
– Best case -- inner loop body never executed
• ti = 1 ➔ T(n) is a linear function
– Worst case -- inner loop body executed for all
previous elements
• ti = i ➔ T(n) is a quadratic function
– Average case
• ???
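A quick worked check of the two extremes (not on the original slide; it follows directly from T = t2 + … + tn):
• Best case: ti = 1 for all i, so T = n − 1 and T(n) = c8(n−1) + c9·n + c10 = Θ(n)
• Worst case: ti = i, so T = Σ_{i=2}^{n} i = n(n+1)/2 − 1, and T(n) = Θ(n2)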
Analysis
• Simplifications
– Ignore actual and abstract statement costs
– Order of growth is the interesting measure:
• Highest-order term is what counts
– Remember, we are doing asymptotic analysis
– As the input size grows larger it is the high order term that
dominates
Upper Bound Notation
• We say InsertionSort’s run time is O(n2)
– Properly we should say run time is in O(n2)
– Read O as “Big-O” (you’ll also hear it as “order”)
• In general a function
– f(n) is O(g(n)) if there exist positive constants c and n0 such that f(n) ≤ c·g(n) for all n ≥ n0
• Formally
– O(g(n)) = { f(n): ∃ positive constants c and n0 such that f(n) ≤ c·g(n) ∀ n ≥ n0 }
Insertion Sort Is O(n2)
• Proof
– Suppose runtime is an2 + bn + c
• If any of a, b, and c are less than 0, replace the constant with its absolute value
– an2 + bn + c ≤ (a + b + c)n2 + (a + b + c)n + (a + b + c)
– ≤ 3(a + b + c)n2 for n ≥ 1
– Let c’ = 3(a + b + c) and let n0 = 1
• Question
– Is InsertionSort O(n3)?
– Is InsertionSort O(n)?
Big O Fact
• A polynomial of degree k is O(n^k)
• Proof:
– Suppose f(n) = bk·n^k + bk-1·n^(k-1) + … + b1·n + b0
• Let ai = | bi |
– f(n) ≤ ak·n^k + ak-1·n^(k-1) + … + a1·n + a0
       = n^k · Σ ai·(n^i / n^k)
       ≤ n^k · Σ ai ≤ c·n^k   (for n ≥ 1, since n^i / n^k ≤ 1 when i ≤ k)
Lower Bound Notation
• We say InsertionSort’s run time is Ω(n)
• In general a function
– f(n) is Ω(g(n)) if ∃ positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) ∀ n ≥ n0
• Proof:
– Suppose run time is an + b
• Assume a and b are positive (what if b is negative?)
– an ≤ an + b
Asymptotic Tight Bound
• A function f(n) is Θ(g(n)) if ∃ positive constants c1, c2, and n0 such that
c1·g(n) ≤ f(n) ≤ c2·g(n) ∀ n ≥ n0
• Theorem
– f(n) is Θ(g(n)) iff f(n) is both O(g(n)) and Ω(g(n))
The problem of sorting
Input: sequence a1, a2, …, an of numbers.
Output: permutation a'1, a'2, …, a'n such that a'1 ≤ a'2 ≤ … ≤ a'n.
Example:
Input: 8 2 4 9 3 6
Output: 2 3 4 6 8 9
Example of insertion sort
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6   (9 is already in place)
2 3 4 8 9 6
2 3 4 6 8 9   done
Running time

• The running time depends on the input: an


already sorted sequence is easier to sort.
• Major Simplifying Convention:
Parameterize the running time by the size of
the input, since short sequences are easier to
sort than long ones.
➢TA(n) = time of A on length n inputs
• Generally, we seek upper bounds on the
running time, to have a guarantee of
performance.
Θ-notation
DEF:
Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 }
Basic manipulations:
• Drop low-order terms; ignore leading constants.
• Example: 3n3 + 90n2 – 5n + 6046 = Θ(n3)
Asymptotic performance
When n gets large enough, a Θ(n2) algorithm always beats a Θ(n3) algorithm.
[Figure: T(n) vs. n, with the crossover point n0 marked]
• Asymptotic analysis is a useful tool to help to structure our thinking toward better algorithms
• We shouldn’t ignore asymptotically slower algorithms, however.
• Real-world design situations often call for a careful balancing
Insertion sort analysis
Worst case: Input reverse sorted.
T(n) = Σ_{j=2}^{n} Θ(j) = Θ(n2)   [arithmetic series]
Average case: All permutations equally likely.
T(n) = Σ_{j=2}^{n} Θ(j/2) = Θ(n2)
Is insertion sort a fast sorting algorithm?
• Moderately so, for small n.
• Not at all, for large n.
Example 2: Merge sort
MERGE-SORT A[1 . . n]
1. If n = 1, done.
2. Recursively sort A[ 1 . . ⌈n/2⌉ ] and A[ ⌈n/2⌉+1 . . n ].
3. “Merge” the 2 sorted lists.
Key subroutine: MERGE
Merging two sorted arrays
• Repeatedly compare the smallest remaining elements of the two arrays; output the smaller and remove it from its array.
• Example: merging 2 7 13 20 with 1 9 11 12 outputs
1 2 7 9 11 12 … (13 and 20 remain to be drained)

Time = Θ(n) to merge a total of n elements (linear time).
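A C sketch of the MERGE subroutine (assumptions: 0-indexed sorted subarrays A[lo..mid] and A[mid+1..hi], and the caller supplies a scratch buffer tmp of at least hi-lo+1 ints):

void merge(int A[], int lo, int mid, int hi, int tmp[]) {
    int i = lo, j = mid + 1, k = 0;
    while (i <= mid && j <= hi)              /* take the smaller head element */
        tmp[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
    while (i <= mid) tmp[k++] = A[i++];      /* drain whichever side is left */
    while (j <= hi)  tmp[k++] = A[j++];
    for (k = 0; k < hi - lo + 1; k++)        /* copy back: Theta(n) work total */
        A[lo + k] = tmp[k];
}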
Analyzing merge sort
T(n)      MERGE-SORT A[1 . . n]
Θ(1)      1. If n = 1, done.
2T(n/2)   2. Recursively sort A[ 1 . . ⌈n/2⌉ ] and A[ ⌈n/2⌉+1 . . n ].
Θ(n)      3. “Merge” the 2 sorted lists
Sloppiness: Should be T( ⌈n/2⌉ ) + T( ⌊n/2⌋ ), but it turns out not to matter asymptotically.
Recurrence for merge sort
T(n) = Θ(1)             if n = 1;
       2T(n/2) + Θ(n)   if n > 1.
• We shall usually omit stating the base case when T(n) = Θ(1) for sufficiently small n, but only when it has no effect on the asymptotic solution to the recurrence.
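A matching C sketch of the recursive sort, which is exactly this recurrence in code (assumes the merge() sketch above):

void merge_sort(int A[], int lo, int hi, int tmp[]) {
    if (lo >= hi) return;              /* base case: n = 1, Theta(1) */
    int mid = lo + (hi - lo) / 2;
    merge_sort(A, lo, mid, tmp);       /* sort A[lo..mid]:   T(n/2) */
    merge_sort(A, mid + 1, hi, tmp);   /* sort A[mid+1..hi]: T(n/2) */
    merge(A, lo, mid, hi, tmp);        /* Theta(n) merge */
}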

Recursion tree
Solve T(n) = 2T(n/2) + cn, where c > 0 is constant.
Expanding level by level (each level of the tree sums to cn):

                    cn                          cn
              cn/2      cn/2                    cn
h = lg n   cn/4  cn/4  cn/4  cn/4               cn
              …                                  …
           Θ(1) … Θ(1)   #leaves = n            Θ(n)

Total = Θ(n lg n)
Conclusions
• Θ(n lg n) grows more slowly than Θ(n2).
• Therefore, merge sort asymptotically beats insertion sort in the worst case.
• In practice, merge sort beats insertion sort for n > 30 or so.
Heaps
• A heap can be seen as a complete binary tree:

                16
           14        10
         8    7    9    3
       2   4  1

– What makes a binary tree complete?
– Is the example above complete?
– The book calls them “nearly complete” binary trees; can think of the unfilled slots in the bottom level as null pointers
Heaps
• In practice, heaps are usually implemented as arrays:

                16
           14        10
         8    7    9    3
       2   4  1

A = 16 14 10 8 7 9 3 2 4 1
Heaps
• To represent a complete binary tree as an array:
– The root node is A[1]
– Node i is A[i]
– The parent of node i is A[i/2] (note: integer divide)
– The left child of node i is A[2i]
– The right child of node i is A[2i + 1]

                16
           14        10
         8    7    9    3
       2   4  1

A = 16 14 10 8 7 9 3 2 4 1
Referencing Heap Elements
• So…
Parent(i) { return i/2; }
Left(i) { return 2*i; }
Right(i) { return 2*i + 1; }
• An aside: How would you implement this most efficiently?
• Another aside: Really?
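One possible answer to the first aside, as a hedged C sketch: with 1-indexed heaps the index arithmetic is a shift and an OR (the second aside hints that this rarely matters; a compiler typically emits the same code for the */2 and 2* versions):

static inline int parent(int i) { return i >> 1; }        /* i / 2   */
static inline int left(int i)   { return i << 1; }        /* 2 * i   */
static inline int right(int i)  { return (i << 1) | 1; }  /* 2*i + 1 */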
The Heap Property
• Heaps also satisfy the heap property:
A[Parent(i)] ≥ A[i] for all nodes i > 1
– In other words, the value of a node is at most the value of its parent
– Where is the largest element in a heap stored?
• Definitions:
– The height of a node in the tree = the number of edges on the longest downward path to a leaf
– The height of a tree = the height of its root
Heap Height
• What is the height of an n-element heap?
Why?
• This is nice: basic heap operations take at
most time proportional to the height of the
heap
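(A hedged answer sketch: a heap of height h is full down to the next-to-last level, so 2^h ≤ n ≤ 2^(h+1) − 1, which gives height h = ⌊lg n⌋ = Θ(lg n); “time proportional to the height” therefore means O(lg n).)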
Heap Operations: Heapify()
• Heapify(): maintain the heap property
– Given: a node i in the heap with children l and r
– Given: two subtrees rooted at l and r, assumed to
be heaps
– Problem: The subtree rooted at i may violate the
heap property (How?)
– Action: let the value of the parent node “float
down” so subtree at i satisfies the heap property
• What do you suppose will be the basic operation
between i, l, and r?
Heap Operations: Heapify()
Heapify(A, i)
{
l = Left(i); r = Right(i);
if (l <= heap_size(A) && A[l] > A[i])
largest = l;
else
largest = i;
if (r <= heap_size(A) && A[r] > A[largest])
largest = r;
if (largest != i)
Swap(A, i, largest);
Heapify(A, largest);
}
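The same routine as runnable C, for reference (a sketch assuming a 1-indexed heap stored in A[1..heap_size] and the parent/left/right helpers above):

void heapify(int A[], int heap_size, int i) {
    int l = left(i), r = right(i);
    int largest = (l <= heap_size && A[l] > A[i]) ? l : i;
    if (r <= heap_size && A[r] > A[largest])
        largest = r;
    if (largest != i) {
        int t = A[i]; A[i] = A[largest]; A[largest] = t;  /* float the value down */
        heapify(A, heap_size, largest);
    }
}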
Heapify() Example
Call Heapify(A, 2) on A = 16 4 10 14 7 9 3 2 8 1:
• Node 2 holds 4; its children (nodes 4, 5) hold 14 and 7; largest is 14, so swap:
A = 16 14 10 4 7 9 3 2 8 1; recurse on node 4
• Node 4 holds 4; its children (nodes 8, 9) hold 2 and 8; largest is 8, so swap:
A = 16 14 10 8 7 9 3 2 4 1; recurse on node 9
• Node 9 holds 4 and has no children, so the recursion stops: the heap property is restored
Analyzing Heapify(): Informal
• Aside from the recursive call, what is the
running time of Heapify()?
• How many times can Heapify() recursively
call itself?
• What is the worst-case running time of
Heapify() on a heap of size n?
Analyzing Heapify(): Formal
• Fixing up relationships between i, l, and r takes Θ(1) time
• If the heap at i has n elements, how many elements can the subtrees at l or r have?
– Draw it
• Answer: 2n/3 (worst case: bottom row 1/2 full)
• So time taken by Heapify() is given by
T(n) ≤ T(2n/3) + Θ(1)
2/3 bound
• (For a tree in which every internal node has two children: with i internal nodes and r leaves, 2i = r + i − 1, so r = i + 1.)
• Worst case: the heap’s bottom row is exactly half full, lying entirely inside the left subtree. If the right subtree is a full tree with m nodes, the left subtree is a full tree one level deeper with 2m+1 nodes, and the whole heap has 3m+2 nodes.
• Fraction of nodes in left subtree = (2m+1)/(3m+2) ≤ 2/3
Analyzing Heapify(): Formal
• So we have
T(n) ≤ T(2n/3) + Θ(1)
• By case 2 of the Master Theorem,
T(n) = O(lg n)
• Thus, Heapify() takes logarithmic time
Heap Operations: BuildHeap()
• We can build a heap in a bottom-up manner by running Heapify() on successive subarrays
– Fact: for an array of length n, all elements in range A[⌊n/2⌋ + 1 .. n] are heaps (Why?)
– So:
• Walk backwards through the array from ⌊n/2⌋ down to 1, calling Heapify() on each node.
• Order of processing guarantees that the children of node i are heaps when i is processed
BuildHeap()
// given an unsorted array A, make A a heap
BuildHeap(A)
{
heap_size(A) = length(A);
for (i = length[A]/2 downto 1)
Heapify(A, i);
}
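In runnable C, under the same 1-indexed convention as the heapify() sketch above:

void build_heap(int A[], int n) {
    for (int i = n / 2; i >= 1; i--)   /* A[n/2 + 1 .. n] are leaves, already heaps */
        heapify(A, n, i);
}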
BuildHeap() Example
• Work through example A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}

                 4
            1         3
          2   16    9   10
       14   8  7
Analyzing BuildHeap()
• Each call to Heapify() takes O(lg n) time
• There are O(n) such calls (specifically, n/2)
• Thus the running time is O(n lg n)
– Is this a correct asymptotic upper bound?
– Is this an asymptotically tight bound?
• A tighter bound is O(n)
– How can this be? Is there a flaw in the above
reasoning?
Analyzing BuildHeap(): Tight
• To Heapify() a subtree takes O(h) time where h is the height of the subtree
– h = O(lg m), m = # nodes in subtree
– The height of most subtrees is small
• Fact: an n-element heap has at most ⌈n/2^(h+1)⌉ nodes of height h
• Can be used to prove that BuildHeap() takes O(n) time
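(A sketch of that proof, combining the two facts above: total cost ≤ Σ_{h=0}^{⌊lg n⌋} ⌈n/2^(h+1)⌉ · O(h) = O( n · Σ_{h≥0} h/2^h ) = O(2n) = O(n), since Σ_{h≥0} h/2^h = 2.)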
Heapsort
• Given BuildHeap(), an in-place sorting
algorithm is easily constructed:
– Maximum element is at A[1]
– Discard by swapping with element at A[n]
• Decrement heap_size[A]
• A[n] now contains correct value
– Restore heap property at A[1] by calling
Heapify()
– Repeat, always swapping A[1] for A[heap_size(A)]
Heapsort
Heapsort(A)
{
BuildHeap(A);
for (i = length(A) downto 2)
{
Swap(A[1], A[i]);
heap_size(A) -= 1;
Heapify(A, 1);
}
}
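And the whole thing in runnable C (a sketch built on the heapify()/build_heap() sketches above; 1-indexed, so A[0] is unused):

void heap_sort(int A[], int n) {
    build_heap(A, n);
    for (int i = n; i >= 2; i--) {
        int t = A[1]; A[1] = A[i]; A[i] = t;  /* move the max to its final slot */
        heapify(A, i - 1, 1);                 /* shrink the heap, restore the property */
    }
}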
Analyzing Heapsort
• The call to BuildHeap() takes O(n) time
• Each of the n - 1 calls to Heapify() takes
O(lg n) time
• Thus the total time taken by HeapSort()
= O(n) + (n - 1) O(lg n)
= O(n) + O(n lg n)
= O(n lg n)
Problems
• 1. Design an algorithm to merge k large sorted
arrays of a total of n elements using no more
than O(k) additional storage. Here k << n.
• 2. Design an algorithm which reads in n
integers one by one and then returns the k
smallest ones. Here k << n and n is unknown
to start with.
Priority Queues
• Heapsort is a nice algorithm, but in practice
Quicksort usually wins
• But the heap data structure is incredibly
useful for implementing priority queues
– A data structure for maintaining a set S of
elements, each with an associated value or key
– Supports the operations Insert(),
Maximum(), and ExtractMax()
– What might a priority queue be useful for?
Priority Queue Operations
• Insert(S, x) inserts the element x into set S
• Maximum(S) returns the element of S with
the maximum key
• ExtractMax(S) removes and returns the
element of S with the maximum key
• How could we implement these operations
using a heap?
Example of Insertion to Max Heap
• Add at the end, move upwards

initial heap:        insert 5:           insert 21:
      20                 20                  21
   15     2           15     5            15    20
  14 10              14 10  2            14 10  2  5
Example of Deletion from Max Heap
• EXTRACT_MAX: remove the root, move the last element to the root, then float it down

remove 20:          last element up:     after Heapify:
      20                 10                   15
   15     2           15     2             14     2
  14 10              14                   10
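A C sketch of both operations under the earlier 1-indexed conventions (the names pq_insert/pq_extract_max are ours; capacity and empty-queue checks omitted):

void pq_insert(int A[], int *n, int x) {
    A[++*n] = x;                               /* add at the end */
    for (int i = *n; i > 1 && A[parent(i)] < A[i]; i = parent(i)) {
        int t = A[i]; A[i] = A[parent(i)]; A[parent(i)] = t;  /* move upwards */
    }
}

int pq_extract_max(int A[], int *n) {
    int max = A[1];
    A[1] = A[(*n)--];     /* last element replaces the root */
    heapify(A, *n, 1);    /* float it down to restore the heap property */
    return max;
}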
Analysis of Algorithms

Quicksort, Lower Bounds


Module 4: Topics
• Quicksort
• Lower Bounds
Problem 1:
Rearranging elements in an array
• Given an array A[0..n-1] of integers and an integer x given by the user, design an algorithm to rearrange the elements so that the elements less than or equal to x occur first, followed by elements greater than x.
• Here x is the “pivot”
Problem 2:
Rearranging elements in an array
• Given an array A[0..n-1] of integers and an integer x given by the user, implement an algorithm to rearrange the elements so that the elements less than x occur first, followed by elements equal to x, and then followed by elements greater than x.
• Here x is the “pivot”
Rearranging elements in an array
• Maintain 4 partitions of the array
• Elements less than the pivot in A[0..r-1]
• Elements equal to the pivot in A[r..s-1]
• Unclassified elements in A[s..t]
• Elements greater than the pivot in A[t+1..n-1]

< pivot   == pivot   ?   > pivot
         r          s   t
Rearranging Elements In An Array
while(s ≤ t) {
  if(A[s] < pivot) {
    swap A[r] with A[s];
    increment r and s;
  }
  else if(A[s] == pivot) s++;
  else {   // A[s] > pivot
    swap A[s] with A[t];
    t--;
  }
}
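As runnable C (a sketch; three_way_partition is our name, and the invariant is the one stated above: A[0..r-1] < pivot, A[r..s-1] == pivot, A[s..t] unclassified, A[t+1..n-1] > pivot):

void three_way_partition(int A[], int n, int pivot) {
    int r = 0, s = 0, t = n - 1;
    while (s <= t) {
        if (A[s] < pivot) {
            int tmp = A[r]; A[r] = A[s]; A[s] = tmp;
            r++; s++;
        } else if (A[s] == pivot) {
            s++;
        } else {                       /* A[s] > pivot */
            int tmp = A[s]; A[s] = A[t]; A[t] = tmp;
            t--;                       /* the swapped-in A[s] is still unclassified */
        }
    }
}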
Quicksort
• Sorts in place
• Runs in O(n lg n) time in the average case
• Runs in O(n2) time in the worst case
Quicksort
• Another divide-and-conquer algorithm
– The array A[p..r] is partitioned into two non-empty
subarrays A[p..q] and A[q+1..r]
• Invariant: All elements in A[p..q] are less than all
elements in A[q+1..r]
– The subarrays are recursively sorted by calls to
quicksort
– Unlike merge sort, no combining step: once the two subarrays are sorted, the whole array is sorted
Quicksort Code
Quicksort(A, p, r)
{
if (p < r)
{
q = Partition(A, p, r);
Quicksort(A, p, q);
Quicksort(A, q+1, r);
}
}
Partition
• Clearly, all the action takes place in the partition() function
– Rearranges the subarray in place
– End result:
• Two subarrays
• All values in first subarray ≤ all values in second
– Returns the index of the “pivot” element separating the two subarrays
• How do you suppose we implement this function?
Partition In Words
• Partition(A, p, r):
– Select an element to act as the “pivot” (which?)
– Grow two regions, A[p..i] and A[j..r]
• All elements in A[p..i] <= pivot
• All elements in A[j..r] >= pivot
– Increment i until A[i] >= pivot
– Decrement j until A[j] <= pivot
– Swap A[i] and A[j]
– Repeat until i >= j
– Return j
Partition Code
Partition(A, p, r)
  x = A[p];
  i = p - 1;
  j = r + 1;
  while (TRUE)
    repeat
      j--;
    until A[j] <= x;
    repeat
      i++;
    until A[i] >= x;
    if (i < j)
      Swap(A, i, j);
    else
      return j;

Illustrate on A = {5, 3, 2, 6, 4, 1, 3, 7}.
What is the running time of partition()?
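A runnable C rendering of this Hoare-style scheme plus the Quicksort driver above (a sketch; 0-indexed, pivot = first element of the range):

int partition(int A[], int p, int r) {
    int x = A[p], i = p - 1, j = r + 1;
    for (;;) {
        do { j--; } while (A[j] > x);   /* scan right-to-left for an element <= x */
        do { i++; } while (A[i] < x);   /* scan left-to-right for an element >= x */
        if (i < j) { int t = A[i]; A[i] = A[j]; A[j] = t; }
        else return j;
    }
}

void quicksort(int A[], int p, int r) {
    if (p < r) {
        int q = partition(A, p, r);
        quicksort(A, p, q);       /* note: q, not q-1, with this partition */
        quicksort(A, q + 1, r);
    }
}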
Analyzing Quicksort
• What will be the worst case for the algorithm?
– Partition is always unbalanced
• What will be the best case for the algorithm?
– Partition is perfectly balanced
• Which is more likely?
– The latter, by far, except...
• Will any particular input elicit the worst case?
– Yes: Already-sorted input
Analyzing Quicksort
• In the worst case:
T(1) = Θ(1)
T(n) = T(n - 1) + Θ(n)
• Works out to
T(n) = Θ(n2)
Analyzing Quicksort
• In the best case:
T(n) = 2T(n/2) + Θ(n)
• What does this work out to?
T(n) = Θ(n lg n)
Improving Quicksort
• The real liability of quicksort is that it runs in
O(n2) on already-sorted input
• Book discusses two solutions:
– Randomize the input array, OR
– Pick a random pivot element
• How will these solve the problem?
– By ensuring that no particular input can be chosen to make quicksort run in O(n2) time
Analyzing Quicksort: Average Case
• Assuming random input, average-case running time is much closer to O(n lg n) than O(n2)
• First, a more intuitive explanation/example:
– Suppose that partition() always produces a 9-to-1 split. This looks quite unbalanced!
– The recurrence is thus (using n instead of Θ(n) for convenience; how?):
T(n) = T(9n/10) + T(n/10) + n
– How deep will the recursion go? (draw it)
Analyzing Quicksort: Average Case
• Intuitively, a real-life run of quicksort will produce a mix of “bad” and “good” splits
– Randomly distributed among the recursion tree
– Pretend for intuition that they alternate between best-case (n/2 : n/2) and worst-case (n-1 : 1)
– What happens if we bad-split the root node, then good-split the resulting size (n-1) node?
• We end up with three subarrays, size 1, (n-1)/2, (n-1)/2
• Combined cost of splits = n + n - 1 = 2n - 1 = O(n)
• No worse than if we had good-split the root node!
Analyzing Quicksort: Average Case
• Intuitively, the O(n) cost of a bad split
(or 2 or 3 bad splits) can be absorbed
into the O(n) cost of each good split
• Thus running time of alternating bad and good
splits is still O(n lg n), with slightly higher
constants
• How can we be more rigorous?
Analyzing Quicksort: Average Case
• For simplicity, assume:
– All inputs distinct (no repeats)
– Slightly different partition() procedure
• partition around a random element, which is not
included in subarrays
• all splits (0:n-1, 1:n-2, 2:n-3, … , n-1:0) equally likely
• What is the probability of a particular split
happening?
• Answer: 1/n
Analyzing Quicksort: Average Case
• So partition generates splits
(0:n-1, 1:n-2, 2:n-3, … , n-2:1, n-1:0)
each with probability 1/n
• If T(n) is the expected running time,
T(n) = (1/n) Σ_{k=0}^{n−1} [ T(k) + T(n−1−k) ] + Θ(n)
• What is each term under the summation for?
• What is the Θ(n) term for?
Analyzing Quicksort: Average Case
• So…
T(n) = (1/n) Σ_{k=0}^{n−1} [ T(k) + T(n−1−k) ] + Θ(n)
     = (2/n) Σ_{k=0}^{n−1} T(k) + Θ(n)
Analyzing Quicksort: Average Case
• We can solve this recurrence using the dreaded substitution method
– Guess the answer
• T(n) = O(n lg n)
– Assume that the inductive hypothesis holds
– Substitute it in for some value < n
– Prove that it follows for n
Tightly Bounding the Key Summation
(This is the summation that appears when the guess T(k) = O(k lg k) is substituted into the recurrence.)
Split the sum at n/2: terms with k < n/2 satisfy lg k ≤ lg(n/2) = lg n − 1; the rest satisfy lg k ≤ lg n. So:

Σ_{k=1}^{n−1} k lg k ≤ ((n−1)n/2) lg n − Σ_{k=1}^{n/2−1} k        (rearrange first term, place upper bound on second)
                     = (1/2) n(n−1) lg n − (1/2)(n/2)(n/2 − 1)     (Gaussian series)
                     ≤ (1/2)(n2 lg n − n lg n) − n2/8 + n/4        (multiply it all out)
                     ≤ (1/2) n2 lg n − (1/8) n2   when n ≥ 2

Done!!!
How Fast Can We Sort?
• We will provide a lower bound, then beat it
– How do you suppose we’ll beat it?
• First, an observation: all of the sorting algorithms so far are comparison sorts
– The only operation used to gain ordering information about a sequence is the pairwise comparison of two elements
– Theorem: all comparison sorts are Ω(n lg n)
• A comparison sort must do Ω(n) comparisons (why?)
• What about the gap between Ω(n) and Ω(n lg n)?
Decision Trees
• Decision trees provide an abstraction of comparison sorts
– A decision tree represents the comparisons made by a comparison sort; everything else is ignored
• What do the leaves represent?
• How many leaves must there be?

Decision Trees
• Decision trees can model comparison sorts.
For a given algorithm:
– One tree for each n
– Tree paths are all possible execution traces
– What’s the longest path in a decision tree for
insertion sort? For merge sort?
• What is the asymptotic height of any decision tree for sorting n elements?
• Answer: Ω(n lg n) (now let’s prove it…)
Lower Bound For
Comparison Sorting
• Thm: Any decision tree that sorts n elements has height Ω(n lg n)
• What’s the minimum # of leaves?
• What’s the maximum # of leaves of a binary tree of height h?
• Clearly the minimum # of leaves is less than or equal to the maximum # of leaves

Lower Bound For
Comparison Sorting
• So we have…
n! ≤ 2^h
• Taking logarithms:
lg(n!) ≤ h
• Stirling’s approximation tells us:
n! ≥ (n/e)^n
• Thus:
h ≥ lg (n/e)^n
Lower Bound For
Comparison Sorting
• So we have
h ≥ lg (n/e)^n
  = n lg n − n lg e
  = Ω(n lg n)
• Thus the minimum height of a decision tree is Ω(n lg n)
Lower Bound For
Comparison Sorts
• Thus the time to comparison sort n elements is Ω(n lg n)
• Corollary: Heapsort and Mergesort are asymptotically optimal comparison sorts
• But sorting in linear time is possible!
– How can we do better than Ω(n lg n)?
Search Vs. Sort and Search
• We are given an unsorted array of n integers
• How do you search for a user-defined key q?
• Linear search?
• Sort and then binary search?
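A sketch of the trade-off in C: linear search costs O(n) per query; after one O(n lg n) sort (e.g., with the heap_sort sketch earlier), each query costs only O(lg n), which wins when there are many queries:

int linear_search(const int A[], int n, int q) {   /* unsorted array: O(n) */
    for (int i = 0; i < n; i++)
        if (A[i] == q) return i;
    return -1;
}

int binary_search(const int A[], int n, int q) {   /* sorted array: O(lg n) */
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  /* avoids overflow of (lo + hi) / 2 */
        if (A[mid] == q) return mid;
        if (A[mid] < q) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}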
