
Introduction to Heapsort and Quicksort

Tarunpreet Bhatia
Lecturer, CSED, Thapar University
Sorting Revisited

• So far we've talked about algorithms to sort an array of numbers
• What is the advantage of merge sort?
  – Answer: O(n lg n) worst-case running time
• What is the advantage of insertion sort?
  – Answer: sorts in place
  – Also: when the array is nearly sorted, it runs fast in practice
• Next on the agenda: Heapsort
  – Combines the advantages of both previous algorithms
Heaps

• A heap can be seen as a nearly complete binary tree:

                16
              /    \
            14      10
           /  \    /  \
          8    7  9    3
         / \  /
        2   4 1

• What makes a binary tree complete?
• Is the example above complete?
Heaps

• In practice, heaps are usually implemented as arrays:
• The tree above, read level by level, becomes the array
  A = 16 14 10 8 7 9 3 2 4 1
Heaps

• To represent a complete binary tree as an array:
  – The root node is A[1]
  – Node i is A[i]
  – The parent of node i is A[⌊i/2⌋] (note: integer division)
  – The left child of node i is A[2i]
  – The right child of node i is A[2i + 1]
• Check these rules against the tree and array above
Referencing Heap Elements

• So…
  Parent(i) { return i/2; }
  Left(i)   { return 2*i; }
  Right(i)  { return 2*i + 1; }
• An aside: how would you implement this most efficiently?
  – Trick question: I was looking for "i << 1", etc.
  – But any modern compiler is smart enough to do this for you (and it makes the code harder to follow); a C sketch follows
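
For instance, a minimal C sketch of the shift versions (the lowercase names are mine, not the slides'):

/* 1-based heap index helpers. The shift forms are what the "trick
   question" is after; a modern compiler emits the same instructions
   for plain i/2 and 2*i, so prefer whichever reads better. */
static inline int parent(int i) { return i >> 1; }        /* i / 2   */
static inline int left(int i)   { return i << 1; }        /* 2 * i   */
static inline int right(int i)  { return (i << 1) | 1; }  /* 2*i + 1 */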
The Heap Property
• Max-heaps also satisfy the max-heap property:
  A[Parent(i)] ≥ A[i] for all nodes i > 1
  – In other words, the value of a node is at most the value of its parent
  – Where is the largest element in a max-heap stored?
• Min-heaps satisfy the min-heap property instead:
  A[Parent(i)] ≤ A[i] for all nodes i > 1
  – In other words, the value of a node is at least the value of its parent
  – Where is the smallest element in a min-heap stored?
Heap Height

• Definitions:
  – The height of a node = the number of edges on the longest downward path from that node to a leaf
  – The height of a tree = the height of its root
• What is the height of an n-element heap? Why?
• This is nice: basic heap operations take at most time proportional to the height of the heap
• The depth of a node x in a tree T = the number of edges on the simple path from the root of T to x
• The depth of a tree = the number of edges from the root to the deepest leaf
• The level of a node = 1 + the number of edges between the node and the root
• The height of a tree equals the maximum depth of any of its nodes
• The depth of a node and the height of a node are not necessarily equal
• What are the minimum and maximum numbers of elements in a heap of height h?
• Show that an n-element heap has height ⌊lg n⌋
• Is an array that is in sorted order a min-heap?
• Is the sequence ⟨23, 17, 14, 6, 13, 10, 1, 5, 7, 12⟩ a max-heap? (A checker sketch follows this list.)
• Show that, with the array representation for storing an n-element heap, the leaves are indexed by ⌊n/2⌋ + 1, ⌊n/2⌋ + 2, …, n
• Show that there are at most ⌈n/2^(h+1)⌉ nodes of height h in any n-element heap
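
A tiny C checker (a sketch, not part of the slides; the function name is mine) makes the max-heap question mechanical to test. On the sequence above it reports a violation, since A[9] = 7 exceeds its parent A[4] = 6:

#include <stdio.h>
#include <stdbool.h>

/* Check the max-heap property A[parent(i)] >= A[i] for every
   node i > 1; A is 1-based, so A[0] is unused. */
bool is_max_heap(const int A[], int n)
{
    for (int i = 2; i <= n; i++)
        if (A[i / 2] < A[i])
            return false;
    return true;
}

int main(void)
{
    int A[] = {0, 23, 17, 14, 6, 13, 10, 1, 5, 7, 12};
    printf("%s\n", is_max_heap(A, 10) ? "max-heap" : "not a max-heap");
    return 0;
}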
Heap Operations: Max-Heapify()

• Max-Heapify(): maintain the heap property
  – Given: a node i in the heap with children l and r
  – Given: two subtrees rooted at l and r, assumed to be heaps
  – Problem: the subtree rooted at i may violate the heap property (how?)
  – Action: let the value of the parent node "float down" so the subtree at i satisfies the heap property
• What do you suppose will be the basic operation between i, l, and r?
Heap Operations: Max-Heapify()
Max-Heapify(A, i)
{
    l = Left(i); r = Right(i);
    // find the index of the largest of A[i], A[l], A[r]
    if (l <= heap_size(A) && A[l] > A[i])
        largest = l;
    else
        largest = i;
    if (r <= heap_size(A) && A[r] > A[largest])
        largest = r;
    if (largest != i)
    {
        Swap(A, i, largest);       // float A[i] down one level
        Max-Heapify(A, largest);   // continue in the affected subtree
    }
}
Max-Heapify(A, 2) Example

Step 1: A = 16 4 10 14 7 9 3 2 8 1
  Node i = 2 holds 4; its larger child is 14 (node 4), so swap A[2] and A[4].

Step 2: A = 16 14 10 4 7 9 3 2 8 1
  Now i = 4 holds 4; its larger child is 8 (node 9), so swap A[4] and A[9].

Step 3: A = 16 14 10 8 7 9 3 2 4 1
  Node i = 9 is a leaf, so the recursion stops; the max-heap property is restored.
Analyzing Max-Heapify(): Informal

• Aside from the recursive call, what is the running time of Max-Heapify()?
• How many times can Max-Heapify() recursively call itself?
• What is the worst-case running time of Max-Heapify() on a heap of size n?
Analyzing Max-Heapify() Formal

• Fixing up the relationships between i, l, and r takes Θ(1) time
• If the heap at i has n elements, how many elements can the subtrees at l or r have?
  – Draw it
  – Answer: at most 2n/3 (worst case: bottom row half full)
• So the time taken by Max-Heapify() satisfies
  T(n) ≤ T(2n/3) + Θ(1)
Analyzing Max-Heapify() Formal

• So we have
  T(n) ≤ T(2n/3) + Θ(1)
• By case 2 of the Master Theorem,
  T(n) = O(lg n)
• Thus, Max-Heapify() takes logarithmic time
Heap Operations: BuildMaxHeap()

• We can build a heap in a bottom-up manner by running Max-Heapify() on successive subarrays
• Fact: for an array of length n, all elements in the range A[⌊n/2⌋ + 1 .. n] are leaves, hence one-element heaps
• So:
  – Walk backwards through the array from ⌊n/2⌋ to 1, calling Max-Heapify() on each node
  – The order of processing guarantees that the children of node i are already heaps when i is processed
BuildMaxHeap()
// given an unsorted array A, make A a heap
BuildMaxHeap(A)
{
    heap_size(A) = length(A);
    for (i = ⌊length(A)/2⌋ downto 1)
        Max-Heapify(A, i);
}
BuildMaxHeap() Example

• Work through this example:
  A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}

As a tree, level by level:

                 4
              /     \
             1       3
           /  \     /  \
          2    16  9    10
         / \   |
        14  8  7
Analyzing BuildMaxHeap()

• Each call to Max-Heapify() takes O(lg n) time
• There are O(n) such calls (specifically, ⌊n/2⌋)
• Thus the running time is O(n lg n)
  – Is this a correct asymptotic upper bound?
  – Is this an asymptotically tight bound?
• A tighter bound is O(n)
  – How can this be? Is there a flaw in the above reasoning?
Analyzing BuildMaxHeap(): Tight

• To Max-Heapify() a subtree takes O(h) time, where h is the height of the subtree
  – h = O(lg n), where n = the number of nodes in the subtree
  – The height of most subtrees is small
• Fact: an n-element heap has at most ⌈n/2^(h+1)⌉ nodes of height h
• Hence we can BuildMaxHeap() from an unordered array in linear time, as the following sum shows
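
Summing the Max-Heapify() cost over all heights makes the bound precise (a standard derivation, spelled out here since the slide omits it):

T(n) \;=\; \sum_{h=0}^{\lfloor \lg n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
      \;=\; O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
      \;=\; O(2n) \;=\; O(n),

using the identity \sum_{h \ge 0} h/2^h = 2 (differentiate the geometric series \sum_h x^h and set x = 1/2).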
Heapsort

• Given BuildMaxHeap(), an in-place sorting algorithm is easily constructed:
  – The maximum element is at A[1]
  – Discard it by swapping it with the element at A[n]
  – Decrement heap_size(A); A[n] now contains its correct final value
  – Restore the heap property at A[1] by calling Max-Heapify()
  – Repeat, always swapping A[1] with A[heap_size(A)]
Heapsort
Heapsort(A)
{
    BuildMaxHeap(A);
    for (i = length(A) downto 2)
    {
        Swap(A[1], A[i]);     // move the current max to its final slot
        heap_size(A) -= 1;    // shrink the heap by one
        Max-Heapify(A, 1);    // restore the heap property at the root
    }
}
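
For concreteness, here is a minimal runnable C translation of the pseudocode above (a sketch, not the slides' code: it uses 0-based arrays, so the children of node i are 2i+1 and 2i+2, and the helper names are mine), run on the BuildMaxHeap() example array:

#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Sift A[i] down within A[0..n-1] until the max-heap property
   holds for the subtree rooted at i. */
static void max_heapify(int A[], int n, int i)
{
    int l = 2 * i + 1, r = 2 * i + 2, largest = i;
    if (l < n && A[l] > A[largest]) largest = l;
    if (r < n && A[r] > A[largest]) largest = r;
    if (largest != i) {
        swap(&A[i], &A[largest]);
        max_heapify(A, n, largest);
    }
}

static void build_max_heap(int A[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)  /* last internal node down to root */
        max_heapify(A, n, i);
}

void heapsort(int A[], int n)
{
    build_max_heap(A, n);
    for (int i = n - 1; i >= 1; i--) {
        swap(&A[0], &A[i]);   /* move current max to its final slot   */
        max_heapify(A, i, 0); /* heap shrank by one; fix up the root  */
    }
}

int main(void)
{
    int A[] = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7};
    int n = sizeof A / sizeof A[0];
    heapsort(A, n);
    for (int i = 0; i < n; i++) printf("%d ", A[i]);  /* 1 2 3 4 7 8 9 10 14 16 */
    printf("\n");
    return 0;
}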
Analyzing Heapsort

• The call to BuildMaxHeap() takes O(n) time
• Each of the n − 1 calls to Max-Heapify() takes O(lg n) time
• Thus the total time taken by Heapsort()
  = O(n) + (n − 1) · O(lg n)
  = O(n) + O(n lg n)
  = O(n lg n)
Priority Queues

• Heapsort is a nice algorithm, but in practice Quicksort (coming up) usually wins
• But the heap data structure is incredibly useful for implementing priority queues
  – A data structure for maintaining a set S of elements, each with an associated value or key
• What might a priority queue be useful for?
  – The ready list of processes in an operating system, ordered by priority – the list is highly dynamic
  – Event-driven simulators, to maintain the list of events to be simulated in order of their time of occurrence
Priority Queue Operations

• Insert(S, x): inserts the element x into set S
• Maximum(S): returns the element of S with the maximum key
• ExtractMax(S): removes and returns the element of S with the maximum key
• IncreaseKey(S, x, k): increases the value of element x's key to the new value k
• How could we implement these operations using a heap?
Implementing Priority Queues
HeapMaximum(A)
{
    return A[1];
}

HeapExtractMax(A)
{
    if (heap_size(A) < 1) { error; }
    max = A[1];
    A[1] = A[heap_size(A)];   // move the last element to the root
    heap_size(A) -= 1;
    Max-Heapify(A, 1);        // sift it down to restore the heap
    return max;
}
Implementing Priority Queues
Example (figure not reproduced): the element at node i has its key increased from 4 to 15, and the new key is then propagated up toward the root.
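
The operation the figure illustrates is Heap-Increase-Key; here is a C sketch of it, together with Max-Heap-Insert built on top (1-based indexing with A[0] unused, matching the slides' pseudocode; the names and signatures are mine):

#include <limits.h>

/* Raise A[i] to the larger key, then float it up; O(lg n). */
void heap_increase_key(int A[], int i, int key)
{
    if (key < A[i]) return;            /* new key must not be smaller */
    A[i] = key;
    while (i > 1 && A[i / 2] < A[i]) { /* parent smaller: swap upward */
        int t = A[i]; A[i] = A[i / 2]; A[i / 2] = t;
        i /= 2;
    }
}

/* Insert grows the heap with a -infinity sentinel, then increases
   it to the real key.  Assumes the array has spare capacity. */
void max_heap_insert(int A[], int *heap_size, int key)
{
    A[++(*heap_size)] = INT_MIN;
    heap_increase_key(A, *heap_size, key);
}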
• Exercise: give an implementation of Heap-Delete(A, i) that deletes the item in node i from heap A in O(lg n) time for an n-element max-heap.
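
The slides state the exercise without a solution; one possible O(lg n) answer, sketched in C under the same 1-based convention (not the author's solution), moves the last element into slot i and then repairs the heap in whichever direction is violated:

void heap_delete(int A[], int *n, int i)
{
    A[i] = A[(*n)--];                    /* overwrite with the last element */
    if (i > *n) return;                  /* the deleted node was the last   */
    while (i > 1 && A[i / 2] < A[i]) {   /* new key larger: float it up     */
        int t = A[i]; A[i] = A[i / 2]; A[i / 2] = t;
        i /= 2;
    }
    for (;;) {                           /* new key smaller: sift it down   */
        int l = 2 * i, r = 2 * i + 1, largest = i;
        if (l <= *n && A[l] > A[largest]) largest = l;
        if (r <= *n && A[r] > A[largest]) largest = r;
        if (largest == i) break;
        int t = A[i]; A[i] = A[largest]; A[largest] = t;
        i = largest;
    }
}

At most one of the two loops actually moves anything, so the total work is one root-to-leaf path: O(lg n).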
Quicksort

• Sorts in place
• Runs in O(n lg n) time in the average case
• Runs in O(n²) time in the worst case
• So why would people use it instead of merge sort?
Quicksort

• Another divide-and-conquer algorithm:
  – The array A[p..r] is partitioned around a pivot A[q] into two (possibly empty) subarrays A[p..q−1] and A[q+1..r]
  – Invariant: all elements in A[p..q−1] are ≤ A[q], which is ≤ all elements in A[q+1..r]
  – The subarrays are recursively sorted by calls to Quicksort
  – Unlike merge sort, there is no combining step: the two sorted subarrays already form a sorted array
Quicksort Code
Quicksort(A, p, r)
{
    if (p < r)
    {
        q = Partition(A, p, r);   // pivot A[q] ends up in its final position
        Quicksort(A, p, q-1);
        Quicksort(A, q+1, r);
    }
}
Partition

• Clearly, all the action takes place in the Partition() function
  – Rearranges the subarray in place
  – End result: two subarrays, with all values in the first subarray ≤ all values in the second
  – Returns the index of the "pivot" element separating the two subarrays
• How do you suppose we implement this function? One common answer is sketched below.
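
The slides never show the body, so here is a C sketch of one standard answer, the Lomuto partition scheme from CLRS, which matches the index arithmetic in the Quicksort code above (pivot = A[r]):

/* Everything in A[p..i] stays <= the pivot, A[i+1..j-1] > pivot;
   finally the pivot is swapped into slot i+1, its sorted position. */
int partition(int A[], int p, int r)
{
    int x = A[r];                 /* pivot value          */
    int i = p - 1;                /* end of the <= region */
    for (int j = p; j < r; j++) {
        if (A[j] <= x) {
            i++;
            int t = A[i]; A[i] = A[j]; A[j] = t;
        }
    }
    int t = A[i + 1]; A[i + 1] = A[r]; A[r] = t;
    return i + 1;                 /* pivot's final index  */
}

With this scheme the pivot A[q] is already in its final sorted position, which is why Quicksort recurses on A[p..q−1] and A[q+1..r] and skips A[q].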
Analyzing Quicksort

• What will be the worst case for the algorithm?
  – The partition is always maximally unbalanced
• What will be the best case for the algorithm?
  – The partition is perfectly balanced
• Which is more likely?
  – The latter, by far, except...
• Will any particular input elicit the worst case?
  – Yes: already-sorted input
Analyzing Quicksort

• In the worst case:
  T(0) = Θ(1)
  T(n) = T(n − 1) + Θ(n)
• This works out to
  T(n) = Θ(n²)
Analyzing Quicksort

• In the best case:
  T(n) ≤ 2T(n/2) + Θ(n)
• What does this work out to?
  T(n) = Θ(n lg n)
Analyzing Quicksort: Average Case

• Assuming random input, the average-case running time is much closer to O(n lg n) than O(n²)
• First, a more intuitive explanation/example:
  – Suppose that Partition() always produces a 9-to-1 split; this looks quite unbalanced!
  – The recurrence is then (using n instead of Θ(n) for convenience – how can we justify that?):
    T(n) = T(9n/10) + T(n/10) + n
  – How deep will the recursion go? (Draw it; the depths are worked out below.)
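
Working out the depths (the numbers below follow from the recurrence; they are not on the slide): a branch that always takes the 1/10 side shrinks fastest, and one that always takes the 9/10 side shrinks slowest, so

\text{shallowest leaf at depth } \log_{10} n, \qquad
\text{deepest leaf at depth } \log_{10/9} n \;=\; \frac{\lg n}{\lg(10/9)} \;\approx\; 6.58\,\lg n.

Every level of the recursion tree costs at most n, so the total work is at most n \log_{10/9} n = O(n \lg n) even for this seemingly unbalanced split.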
Analyzing Quicksort: Average Case

• Intuitively, a real-life run of quicksort will produce a mix of "bad" and "good" splits
  – Randomly distributed over the recursion tree
  – Pretend, for intuition, that they alternate between best case and worst case
• What happens if we bad-split the root node, then good-split the resulting size-(n−1) node?
  – We end up with three subarrays of sizes 0, (n−1)/2 − 1, and (n−1)/2
  – Combined cost of the two splits = n + (n − 1) = 2n − 1 = O(n)
• No worse than if we had good-split the root node!

Analyzing Quicksort: Average Case

• Intuitively, the O(n) cost of a bad split (or 2 or 3 bad splits) can be absorbed into the O(n) cost of each good split
• Thus the running time of alternating bad and good splits is still O(n lg n), just with slightly larger constants
• What value of q does Partition return when all the elements in the array have the same value? What is the running time in that case?
• How would you modify Quicksort to sort in nonincreasing order?
Randomized Quicksort

Randomized-Quicksort(A, p, r)
{
    if (p < r)
    {
        q = Randomized-Partition(A, p, r);
        Randomized-Quicksort(A, p, q-1);
        Randomized-Quicksort(A, q+1, r);
    }
}

Randomized-Partition(A, p, r)
{
    i = Random(p, r);         // choose a pivot index uniformly at random
    exchange A[r] and A[i];   // move it into the pivot slot
    return Partition(A, p, r);
}
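
Random(p, r) is left abstract in the pseudocode; a common C realization (a sketch with my own name for it; rand()'s modulo bias makes this teaching-quality only) is:

#include <stdlib.h>

/* Return an integer drawn roughly uniformly from [p, r]. */
int random_pr(int p, int r)
{
    return p + rand() % (r - p + 1);
}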
