HW 3
https://web.engr.oregonstate.edu/~huanlian/algorithms_course/1-datastructures/pq_heap.html 1/19
Example of PQ: Emergency Room
2025/4/17 4:29 PM -
the nurse also needs to delete that patient from the list (pop is O(n)).
sorted array (highest priority first). A better and more natural idea is not
to be so lazy, and maintain a sorted list from the most urgent to the least
(which is probably what most ERs would do). Then push is like insertion
sort, where you can use binary search (O(log n) time) to find the right
spot to insert, but insertion itself still costs O(n) (so push is O(n)). Peek
We can see that all these implementations would suffer from an O(n)
What does O(log n) remind you of? Binary search, and balanced binary search
balancing binary search tree, but there is another, much simpler, data
structure that is more suitable for PQ: binary heap.
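Before moving on to heaps, the sorted-array idea described earlier can be made concrete. This is only a minimal sketch: the class name `SortedListPQ` is hypothetical, and it assumes that a smaller number means higher priority. It uses Python's standard `bisect.insort`, which finds the position by binary search but still has to shift elements to insert.

```python
import bisect

class SortedListPQ:
    """Hypothetical priority queue kept as a sorted list (smallest = most urgent)."""

    def __init__(self):
        self.items = []

    def push(self, x):
        # binary search is O(log n), but shifting elements makes push O(n)
        bisect.insort(self.items, x)

    def peek(self):
        # the most urgent item sits at the front: O(1)
        return self.items[0]

    def pop(self):
        # removing from the front shifts everything left: O(n)
        return self.items.pop(0)

pq = SortedListPQ()
for priority in [5, 2, 8, 1]:
    pq.push(priority)
```

Either push or pop ends up linear here, which is the cost the binary heap avoids.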
Binary Heap as Trees
a complete tree
Why are complete trees even better than general balanced trees? We’ll see
below that they are much more convenient to represent (or “linearize”) as
arrays (via level-order traversal), and arrays are much more efficient to
manipulate than trees.
How would we organize the ER patients in such a tree? Naturally, the root is
the most urgent (highest priority) patient, so that peek is O(1). By default
we talk about min-heaps (but you can also use max-heaps), where the root
is the smallest. In terms of priority (urgency) in ER, think about it as a
number meaning “distance to death”, so a smaller number indicates higher
priority. What about the rest of the heap? Well, each node needs to be
smaller than both children (or actually, all descendants, i.e., in each subtree,
the root is the smallest). Like other structures such as BSTs, a heap can also
be defined recursively:
a singleton node is a heap
a heap is a complete binary tree whose root is the smallest node, and
both left and right subtrees are heaps.
example of a min-heap
When performing the operations, we need to maintain both the shape
property (complete tree) and heap property (each node is smaller than both
children).
push: append at the end of the tree so that it remains a complete tree. It
likely violates the heap property (if the number is smaller than its parent,
it needs to move up). So bubble-up to maintain the heap property, i.e.,
swap with parent if needed, and keep swapping upwards until no longer
violating the heap property. For example:
initial min-heap:
        1
      /   \
     2     4
    / \   /
   8   5 6
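The push procedure described above can be sketched on the array form of this heap. This is an illustrative sketch, not the course's own code: the function name `heappush_sketch` and the pushed values 3 and 0 are assumptions, and it relies on the standard 0-based layout where the parent of index i is at (i - 1) // 2.

```python
def heappush_sketch(h, x):
    """Append x at the end (keeps the tree complete), then bubble up."""
    h.append(x)
    i = len(h) - 1
    while i > 0:
        parent = (i - 1) // 2
        if h[i] >= h[parent]:
            break                      # heap property holds, stop
        h[i], h[parent] = h[parent], h[i]
        i = parent                     # keep moving up

h = [1, 2, 4, 8, 5, 6]                 # the initial min-heap, level by level
heappush_sketch(h, 3)                  # one swap (with 4)
after_3 = h.copy()
heappush_sketch(h, 0)                  # bubbles all the way up to the root
```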
pop: return the root, and replace it with the last node on the last level
(which is big, and likely violates the heap property). Do bubble-down to
maintain the heap property. Here bubble-down is more involved than
bubble-up, because when the root violates the heap property, it could be
larger than either or both of its children, e.g.:
 *4*       *4*       *4*
 / \       / \       / \
2   3     5   3     3   5
But in any case, we should choose the smallest out of the three numbers
(root, left node, right node) and swap that number with the root:
  2         3         3
 / \       / \       / \
*4* 3     5  *4*   *4*  5
and keep going down the tree until no longer violating the heap property.
Here is a complete example of pop:
initial min-heap:
        1
      /   \
     2     4
    / \   /
   8   5 6
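As a rough sketch of the pop procedure on this heap (the function name `heappop_sketch` is an assumption, and the array layout with children of index i at 2i+1 and 2i+2 is the standard 0-based one), the root is replaced by the last node and then bubbled down by repeatedly swapping with the smallest of the three candidates:

```python
def heappop_sketch(h):
    """Remove and return the root of a non-empty min-heap stored as a list."""
    root = h[0]
    last = h.pop()                     # last node on the last level
    if h:
        h[0] = last                    # move it to the root, then bubble down
        i, n = 0, len(h)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and h[child] < h[smallest]:
                    smallest = child
            if smallest == i:
                break                  # no violation left
            h[i], h[smallest] = h[smallest], h[i]
            i = smallest
    return root

h = [1, 2, 4, 8, 5, 6]
popped = heappop_sketch(h)             # 6 moves to the root, then sinks two levels
```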
Because heaps are complete trees which are in turn balanced trees, the
height is O(log n), thus both push and pop are O(log n) time.
array representation of a complete tree (indices 0 1 2 3 4 5 6)
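For reference, here is a small sketch of the index arithmetic behind this level-order layout, assuming 0-based indices. The helper names `parent`, `left`, `right`, and `is_min_heap` are illustrative only, and the check uses `<=` so that equal priorities are allowed.

```python
def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def is_min_heap(h):
    """True if every non-root node is no smaller than its parent."""
    return all(h[parent(i)] <= h[i] for i in range(1, len(h)))

ok = is_min_heap([1, 2, 4, 8, 5, 6])               # the min-heap example
bad = is_min_heap([5, 2, 6, 1, 4, 3, 7, 8, 0, 9])  # an arbitrary array
```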
This example is identical to the one above. Another very useful function is
heapreplace() which is conceptually a combination of heappop() and
heappush(), but is more efficient because it just replaces the root by a new
element, followed by a bubble-down:
>>> heapq.heapreplace(h, 9) # pops root, replaces it by 9
2
>>> h
[4, 5, 9, 8, 6]
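To compare the two routes side by side, the sketch below assumes h was [2, 5, 4, 8, 6] before the call, which is consistent with the output shown. Both routes return the same root and leave the same set of elements, though the internal layout of the list may differ.

```python
import heapq

h1 = [2, 5, 4, 8, 6]           # assumed starting heap
h2 = h1.copy()

r1 = heapq.heapreplace(h1, 9)  # replace the root, one bubble-down

r2 = heapq.heappop(h2)         # bubble-down ...
heapq.heappush(h2, 9)          # ... followed by a separate bubble-up
```

Note that heapreplace() always returns the old root, even if the new element is smaller; heapq.heappushpop() is the variant that pushes first and then pops.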
O(n).
heapify as divide-n-conquer (top-down)
Let’s first view heapify as divide-n-conquer. A random array is still a
complete tree in our linearized representation of heaps, e.g.:
            5
         /     \
       2         6
      / \       / \
     1   4     3   7
    / \  /
   8   0 9
But it’s clearly not a heap. How should we make it a heap? Just
divide-n-conquer, in a post-order traversal:
heapify left subtree
heapify right subtree
now both left and right subtrees are heaps, but the root might still cause
a violation; if so, keep bubbling down until it fits
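The three steps above can be sketched recursively on the array form. The names `heapify_topdown` and `sift_down` are hypothetical, and this is only one possible rendering of the idea, assuming the 0-based layout with children at 2i+1 and 2i+2.

```python
def sift_down(h, i):
    """Bubble h[i] down until neither child is smaller."""
    smallest = i
    for child in (2 * i + 1, 2 * i + 2):
        if child < len(h) and h[child] < h[smallest]:
            smallest = child
    if smallest != i:
        h[i], h[smallest] = h[smallest], h[i]
        sift_down(h, smallest)

def heapify_topdown(h, i=0):
    """Post-order: heapify both subtrees of node i, then fix node i itself."""
    if i >= len(h):
        return
    heapify_topdown(h, 2 * i + 1)      # heapify left subtree
    heapify_topdown(h, 2 * i + 2)      # heapify right subtree
    sift_down(h, i)                    # the root might still cause a violation

h = [5, 2, 6, 1, 4, 3, 7, 8, 0, 9]     # the array drawn above
heapify_topdown(h)
```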
because the tree height is O(log n). We will see how to solve it to O(n)
below.
heapify as bottom-up
Since the above top-down heapify is a post-order traversal, the real
execution is in bottom-up order. Note that leaf nodes are already heaps, so
we just need to adjust the [1,8,0] tree first by a 1-step bubble-down:
            5
         /     \
       2         6
      / \       / \
     0   4     3   7
    / \  /
   8  *1 9
Next, the subtree rooted at 2 needs a 2-step bubble-down:
            5
         /     \
       0         6
      / \       / \
    *2   4     3   7
    / \  /
   8   1 9

            5
         /     \
       0         6
      / \       / \
     1   4     3   7
    / \  /
   8  *2 9
Now the left subtree of the root is a heap, so we visit its sibling, the [6,3,7]
subtree, and need a 1-step bubble-down:
            5
         /     \
       0         3
      / \       / \
     1   4    *6   7
    / \  /
   8   2 9
Now both left and right subtrees are done, and the only remaining step is to
bubble-down from the root, which needs 3 steps:
            0
         /     \
      *5         3
      / \       / \
     1   4     6   7
    / \  /
   8   2 9

            0
         /     \
       1         3
      / \       / \
    *5   4     6   7
    / \  /
   8   2 9

            0
         /     \
       1         3
      / \       / \
     2   4     6   7
    / \  /
   8  *5 9
For the above example with |h|=10 nodes, we have i <= 10//2 -1 = 4. So
the first node to adjust is h[4]:
index: 0  1  2  3  4  5  6  7  8  9
h =   [5, 2, 6, 1, 4, 3, 7, 8, 0, 9]
       <-non-leaves-||---leaves--->
            5
         /     \
       2         6
      / \       / \
     1   4<-   3   7
    / \  /
   8   0 9
Indeed, this node is the first one that has children in the backward order. All
nodes after it in the linear order are leaf nodes (half of the array). So we can
write a simple loop, which is how heapify is implemented in practice:
def heapify2(h):
    for i in range(len(h)//2-1, -1, -1): # downto 0
        bubbledown(h, i)
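The loop above relies on a bubbledown helper that is not shown in this excerpt. The following is a self-contained sketch under that assumption: this `bubbledown` is a guess at its behavior, and both functions additionally return swap counts (not part of the original loop) so the run can be compared with the step counts in the walkthrough above.

```python
def bubbledown(h, i):
    """Assumed helper: sink h[i] to its place; returns the number of swaps."""
    n, steps = len(h), 0
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and h[child] < h[smallest]:
                smallest = child
        if smallest == i:
            return steps
        h[i], h[smallest] = h[smallest], h[i]
        i = smallest
        steps += 1

def heapify2(h):
    """Bottom-up heapify; returns {index: swaps} for each non-leaf node."""
    steps = {}
    for i in range(len(h) // 2 - 1, -1, -1):   # down to 0
        steps[i] = bubbledown(h, i)
    return steps

h = [5, 2, 6, 1, 4, 3, 7, 8, 0, 9]
steps = heapify2(h)
```

On this input the per-node counts are 0, 1, 1, 2, and 3 swaps for h[4] down to h[0], matching the 1-step, 2-step, 1-step, and 3-step bubble-downs traced above.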
heapify is faster because the vast majority (lower levels) requires very
little or no work (bubble-down to the leaves); in fact, half of the nodes
have 0 work;
while heappush one by one is slower because the majority requires the
most work (bubble-up all the way to the root); in fact, half of the nodes
have O(log n) work.
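One way to see the gap numerically, as a sketch with hypothetical helper names: in the worst case a bubble-down from a node costs at most its height, while a bubble-up to the root costs its depth. Summing each over a complete tree with n nodes (0-based array layout) gives the two totals.

```python
def total_height(n):
    """Sum of node heights: worst-case bubble-down swaps for heapify."""
    total = 0
    for i in range(n):
        j = i
        while 2 * j + 1 < n:           # follow leftmost children to the bottom
            j = 2 * j + 1
            total += 1
    return total

def total_depth(n):
    """Sum of node depths: worst-case bubble-up swaps for n pushes."""
    return sum((i + 1).bit_length() - 1 for i in range(n))

small = (total_height(10), total_depth(10))
n = 100000
large = (total_height(n), total_depth(n))
```

For the 10-node example this gives 8 versus 19; the first total stays below n, while the second grows like n log n.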
= n ⋅ (0 + 1/4 + 2/8 + 3/16 + … + h/2^{h+1})
Here 1/4 + 2/8 + 3/16 + … is a very interesting series called an
arithmetico-geometric series, because the numerators form an arithmetic
sequence while the denominators form a geometric one. It still converges:
  1/4 + 2/8 + 3/16 + 4/32 + …
= 1/4 + 1/8 + 1/16 + 1/32 + …    (= 1/2)
+  0  + 1/8 + 1/16 + 1/32 + …    (= 1/4)
+  0  +  0  + 1/16 + 1/32 + …    (= 1/8)
+  0  +  0  +  0   + 1/32 + …    (= 1/16)
+ …
= 1/2 + 1/4 + 1/8 + 1/16 + …
= 1
So heapify is O(n) time.
Note that these two derivations are much simpler than the ones in most
textbooks (such as CLRS) which require calculus.
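As a quick sanity check of the series, the partial sums can be computed exactly with Python's `fractions` module. The function name `partial_sum` is illustrative, and the closed form 1 - (m+2)/2^{m+1} for the first m terms is stated here as an aside (it follows by induction), not taken from the derivation above.

```python
from fractions import Fraction

def partial_sum(m):
    """Exact value of 1/4 + 2/8 + ... + m/2^(m+1)."""
    return sum(Fraction(k, 2 ** (k + 1)) for k in range(1, m + 1))

first_three = partial_sum(3)           # 1/4 + 2/8 + 3/16
gap = 1 - partial_sum(50)              # distance from the limit after 50 terms
```

The partial sums stay below 1 and approach it quickly, which is why the total work is bounded by n.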
Empirical Comparison between heapify and n heappush’s:
To see if this analysis makes a difference in practice, I wrote a little program
to compare them on the worst-case input (inversely sorted array, so that
you need to bubble-down all the way to the leaf node or bubble-up all the
way to the root):
import heapq
import time

for i in range(5):
    n = 1000000 * 2**i
    a = list(range(n, 0, -1)) # worst-case input
    h = a.copy()
    t = time.time()
    heapq.heapify(h) # one O(n) bottom-up pass
    t1 = time.time()
    h = []
    for x in a: heapq.heappush(h, x) # n pushes, each bubbling up to the root
    t2 = time.time()
    print("n=%9d heapify: %.4f n heappushes: %.4f ratio: %.2f" % (n, t1-t, t2-t1, (t2-t1)/(t1-t)))
You can see that heapify is much faster, and as n grows, its advantage is
Binary heap and heap sort were invented by British computer scientist JWJ
Williams in 1963. Further improvements on the binary heap include the
Fibonacci heap, which improves push to O(1) amortized time, but it’s too
complicated for practical use.