0% found this document useful (0 votes)
76 views35 pages

CS 332: Algorithms: Introduction To Heapsort

Heapsort is an in-place sorting algorithm that uses a heap data structure. It works in two steps: 1. It uses the BuildHeap() function to convert the input array into a heap in O(n) time. 2. It then repeatedly extracts the maximum element from the heap, exchanges it with the last element, and sifts it down to maintain the heap property. This is done n-1 times to sort the entire array in O(n log n) time. The heap data structure can also be used to implement a priority queue, which supports operations like Insert(), Maximum(), and ExtractMax() in logarithmic time by leveraging the heap properties.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
76 views35 pages

CS 332: Algorithms: Introduction To Heapsort

Heapsort is an in-place sorting algorithm that uses a heap data structure. It works in two steps: 1. It uses the BuildHeap() function to convert the input array into a heap in O(n) time. 2. It then repeatedly extracts the maximum element from the heap, exchanges it with the last element, and sifts it down to maintain the heap property. This is done n-1 times to sort the entire array in O(n log n) time. The heap data structure can also be used to implement a priority queue, which supports operations like Insert(), Maximum(), and ExtractMax() in logarithmic time by leveraging the heap properties.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 35

CS 332: Algorithms

Introduction to heapsort

Review: The Master Theorem

Given: a divide and conquer algorithm

An algorithm that divides the problem of size n


into a subproblems, each of size n/b
Let the cost of each stage (i.e., the work to divide
the problem + combine solved subproblems) be
described by the function f(n)

Then, the Master Theorem gives us a


cookbook for the algorithms running time:

Review: The Master Theorem

if T(n) = aT(n/b) + f(n) then

log b a

T ( n) n
log n

f ( n )

log b a

f (n) O n log b a

f ( n) n

log b a

c 1

f (n) n logb a AND

af (n / b) cf (n) for large n

Sorting Revisited

So far weve talked about two algorithms to sort


an array of numbers

What is the advantage of merge sort?

Answer: O(n lg n) worst-case running time

What is the advantage of insertion sort?


Answer: sorts in place
Also: When array nearly sorted, runs fast in practice

Next on the agenda: Heapsort

Combines advantages of both previous algorithms

Heaps

A heap can be seen as a complete binary tree:


16

14

10

What makes a binary tree complete?


Is the example above complete?

Heaps

A heap can be seen as a complete binary tree:


16

14

10

The book calls them nearly complete binary trees;


can think of unfilled slots as null pointers

Heaps

In practice, heaps are usually implemented as


arrays:
16
14

A = 16 14 10 8

10

1 =
2

7
4

Heaps

To represent a complete binary tree as an array:

The root node is A[1]


Node i is A[i]
The parent of node i is A[i/2] (note: integer divide)
The left child of node i is A[2i]
The right child of node i is A[2i + 1]
16
14

A = 16 14 10 8

10

1 =
2

7
4

Referencing Heap Elements

So
Parent(i) { return i/2 ; }
Left(i) { return 2*i; }
right(i) { return 2*i + 1; }

An aside: How would you implement this


most efficiently?

Trick question, I was looking for i << 1, etc.


But, any modern compiler is smart enough to do this
for you (and it makes the code hard to follow)

The Heap Property

Heaps also satisfy the heap property:


A[Parent(i)] A[i]
for all nodes i > 1

In other words, the value of a node is at most the


value of its parent
Where is the largest element in a heap stored?

Heap Height

Definitions:

The height of a node in the tree = the number of


edges on the longest downward path to a leaf
The height of a tree = the height of its root

What is the height of an n-element heap?


Why?
This is nice: basic heap operations take at most
time proportional to the height of the heap

Heap Operations: Heapify()

Heapify(): maintain the heap property

Given: a node i in the heap with children l and r


Given: two subtrees rooted at l and r, assumed to be
heaps
Problem: The subtree rooted at i may violate the heap
property (How?)
Action: let the value of the parent node float down so
subtree at i satisfies the heap property

What do you suppose will be the basic operation between i, l,


and r?

Heap Operations: Heapify()


Heapify(A, i)
{
l = Left(i); r = Right(i);
if (l <= heap_size(A) && A[l] > A[i])
largest = l;
else
largest = i;
if (r <= heap_size(A) && A[r] > A[largest])
largest = r;
if (largest != i)
Swap(A, i, largest);
Heapify(A, largest);
}

Heapify() Example

16
4

10

14
2

7
8

A = 16 4 10 14 7

Heapify() Example

16
4

10

14
2

7
8

A = 16 4 10 14 7

Heapify() Example

16
4

10

14
2

7
8

A = 16 4 10 14 7

Heapify() Example

16
14

10

4
2

7
8

A = 16 14 10 4

Heapify() Example

16
14

10

4
2

7
8

A = 16 14 10 4

Heapify() Example

16
14

10

4
2

7
8

A = 16 14 10 4

Heapify() Example

16
14

10

8
2

7
4

A = 16 14 10 8

Heapify() Example

16
14

10

8
2

7
4

A = 16 14 10 8

Heapify() Example

16
14

10

8
2

7
4

A = 16 14 10 8

Analyzing Heapify(): Informal


Aside from the recursive call, what is the
running time of Heapify()?
How many times can Heapify() recursively
call itself?
What is the worst-case running time of
Heapify() on a heap of size n?

Analyzing Heapify(): Formal


Fixing up relationships between i, l, and r takes
(1) time
If the heap at i has n elements, how many
elements can the subtrees at l or r have?

Draw it

Answer: 2n/3 (worst case: bottom row 1/2 full)


So time taken by Heapify() is given by
T(n) T(2n/3) + (1)

Analyzing Heapify(): Formal


So we have
T(n) T(2n/3) + (1)
By case 2 of the Master Theorem,
T(n) = O(lg n)
Thus, Heapify() takes logarithmic time

Heap Operations: BuildHeap()

We can build a heap in a bottom-up manner by


running Heapify() on successive subarrays

Fact: for array of length n, all elements in range


A[ n/2 + 1 .. n] are heaps (Why?)
So:
Walk backwards through the array from n/2 to 1, calling
Heapify() on each node.
Order of processing guarantees that the children of node
i are heaps when i is processed

BuildHeap()
// given an unsorted array A, make A a heap
BuildHeap(A)
{
heap_size(A) = length(A);
for (i = length[A]/2 downto 1)
Heapify(A, i);
}

BuildHeap() Example

Work through example


A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}
4
1

2
14

16
8

10

Analyzing BuildHeap()
Each call to Heapify() takes O(lg n) time
There are O(n) such calls (specifically, n/2 )
Thus the running time is O(n lg n)

Is this a correct asymptotic upper bound?


Is this an asymptotically tight bound?

A tighter bound is O(n)

How can this be? Is there a flaw in the above


reasoning?

Analyzing BuildHeap(): Tight

To Heapify() a subtree takes O(h) time


where h is the height of the subtree

h = O(lg m), m = # nodes in subtree


The height of most subtrees is small

Fact: an n-element heap has at most n/2h+1


nodes of height h
CLR 7.3 uses this fact to prove that
BuildHeap() takes O(n) time

Heapsort

Given BuildHeap(), an in-place sorting


algorithm is easily constructed:

Maximum element is at A[1]


Discard by swapping with element at A[n]
Decrement heap_size[A]
A[n] now contains correct value

Restore heap property at A[1] by calling


Heapify()
Repeat, always swapping A[1] for A[heap_size(A)]

Heapsort
Heapsort(A)
{
BuildHeap(A);
for (i = length(A) downto 2)
{
Swap(A[1], A[i]);
heap_size(A) -= 1;
Heapify(A, 1);
}
}

Analyzing Heapsort
The call to BuildHeap() takes O(n) time
Each of the n - 1 calls to Heapify() takes
O(lg n) time
Thus the total time taken by HeapSort()
= O(n) + (n - 1) O(lg n)
= O(n) + O(n lg n)
= O(n lg n)

Priority Queues
Heapsort is a nice algorithm, but in practice
Quicksort (coming up) usually wins
But the heap data structure is incredibly useful
for implementing priority queues

A data structure for maintaining a set S of elements,


each with an associated value or key
Supports the operations Insert(), Maximum(),
and ExtractMax()
What might a priority queue be useful for?

Priority Queue Operations


Insert(S, x) inserts the element x into set S
Maximum(S) returns the element of S with
the maximum key
ExtractMax(S) removes and returns the
element of S with the maximum key
How could we implement these operations
using a heap?

You might also like