AlgoXY
Xinyu LIU

Version: e = ∑_{n=0}^{∞} 1/n! = 1 + 1/1 + 1/(1·2) + 1/(1·2·3) + ··· = 2.718283

Email: [email protected]
Preface
How can we find the smallest free number, 10, from this list? It seems quite easy with
exhaustive search:
1: function Min-Free(A)
2:   x ← 0
3:   loop
4:     if x ∉ A then
5:       return x
6:     else
7:       x ← x + 1
Where ∉ is realized as below:

1: function '∉'(x, X)
2:   for i ← 1 to |X| do
3:     if x = X[i] then
4:       return False
5:   return True
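The same brute-force idea can be written in Haskell as a one-liner (a minimal sketch; the name minFree is ours):

minFree :: [Int] -> Int
minFree xs = head [x | x <- [0..], x `notElem` xs]

It scans the naturals 0, 1, 2, ... and returns the first one absent from xs, exactly mirroring the loop above.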
Where |X| = n is the length of X. Some environments have a built-in implementation to test the existence of an element. However, this solution performs poorly with millions of numbers; the time spent is quadratic in n. On a computer with a 2-core 2.10 GHz CPU and 2GB RAM, the C implementation takes 5.4s to find the answer among 100K numbers, and exceeds 8 minutes to handle 1 million numbers.
Improvement
For n numbers x1, x2, ..., xn, if there is a free number, some xi must be out of the range [0, n)¹; otherwise the list is exactly some permutation of 0, 1, ..., n − 1, hence n is the minimum free number.
We divide and conquer: search(A, l, u) finds the minimum free number within the range [l, u), starting from search(A, 0, |A|):

search(∅, l, u) = l

search(A, l, u) = | |A′| = m − l + 1 : search(A″, m + 1, u)
                  | otherwise : search(A′, l, m)

where:

m = ⌊(l + u)/2⌋
A′ = [x ≤ m, x ← A], A″ = [x > m, x ← A]
This algorithm needs no additional space². Each recursive call performs O(|A|) comparisons to partition A′ and A″, and halves the problem, as T(n) = T(n/2) + O(n). It reduces to O(n) according to the master theorem³. The example program below implements this algorithm.
¹Range [a, b) starts from a and excludes b.
²The recursion takes O(lg n) stack space, but it can be eliminated through tail recursion optimization.
³Alternatively, the first call takes O(n) time to partition A′ and A″, the second call takes O(n/2) time, the third call takes O(n/4) time, ... The total time is O(n + n/2 + n/4 + · · ·) = O(2n) = O(n).
bsearch xs l u | xs == [] = l
| length as == m - l + 1 = bsearch bs (m + 1) u
| otherwise = bsearch as l m
where
m = (l + u) `div` 2
as = [x | x ← xs, x ≤ m]
bs = [x | x ← xs, x > m]
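To answer the puzzle, we search over the full range (a small wrapper; the name minFree is ours):

minFree :: [Int] -> Int
minFree xs = bsearch xs 0 (length xs)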
There are O(lg n) recursive calls. We can eliminate the recursion with loops:
1: function Min-Free(A)
2:   l ← 0, u ← |A|
3:   while u − l > 0 do
4:     m ← ⌊(u + l)/2⌋
5:     left ← l
6:     for right ← l to u − 1 do
7:       if A[right] ≤ m then
8:         Exchange A[left] ↔ A[right]
9:         left ← left + 1
10:    if left < m + 1 then
11:      u ← left
12:    else
13:      l ← left
As shown in fig. 1, this program re-arranges the array such that all elements before left are less than or equal to m, while those between left and right are greater than m.

Figure 1: All A[i] ≤ m where 0 ≤ i < left, while A[i] > m where left ≤ i < right. The rest are yet to be scanned.
Regular number
The second problem is to find the 1,500-th number which contains only the factors 2, 3, or 5. Such numbers are called regular numbers⁴. 2, 3, and 5 are definitely regular numbers. 60 = 2²3¹5¹ is the 25-th regular number. 21 = 2⁰3¹7¹ is not, because it has a factor of 7. Define 1 = 2⁰3⁰5⁰ as the 0-th regular number. The first 10 are:

1, 2, 3, 4, 5, 6, 8, 9, 10, 12, ...

⁴Also known as Hamming numbers, after Richard Hamming.
The brute-force solution checks numbers one by one:

1: function Regular-Number(n)
2:   x ← 1
3:   while n > 0 do
4:     x ← x + 1
5:     if Valid?(x) then
6:       n ← n − 1
7:   return x

8: function Valid?(x)
9:   while x mod 2 = 0 do
10:    x ← x/2
11:   while x mod 3 = 0 do
12:    x ← x/3
13:   while x mod 5 = 0 do
14:    x ← x/5
15:   return x = 1
This 'brute-force' algorithm performs poorly when n increases. The C implementation takes 40.39s on the above computer to find the 1500-th number (860934420).
Improvement
Modulo and division are expensive operations [2]. Instead of checking every number, we can generate regular numbers from 1 in ascending order. We use a queue, which allows adding a number at one end (enqueue) and removing from the other end (dequeue). The number enqueued first is dequeued first (First In First Out). Initialize the queue with 1, the 0th regular number. We repeatedly dequeue a number, multiply it by 2, 3, 5 to generate 3 new numbers, then add them to the queue in ascending order. We drop a number if it is already in the queue, as shown in fig. 2.
Figure 2: The first steps. Dequeue 1 and enqueue 2, 3, 5, giving Q = [2, 3, 5]; dequeue 2 and enqueue 4, 6, 10, giving Q = [3, 4, 5, 6, 10]; dequeue 3 and enqueue 9, 15 (6 is dropped), giving Q = [4, 5, 6, 9, 10, 15].
1: function Regular-Number(n)
2: Q ← [1]
3: while n > 0 do
4: x ← Dequeue(Q)
5: Unique-Enqueue(Q, 2x)
6: Unique-Enqueue(Q, 3x)
7: Unique-Enqueue(Q, 5x)
8: n←n−1
9: return x
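Below is an example program. As a sketch, we name the stream xs, and realize the merge operator ∪ (explained next) as merge:

xs :: [Integer]
xs = 1 : merge (map (*2) xs) (merge (map (*3) xs) (map (*5) xs))

merge :: [Integer] -> [Integer] -> [Integer]
merge (a:as) (b:bs) | a < b     = a : merge as (b:bs)
                    | a == b    = a : merge as bs
                    | otherwise = b : merge (a:as) bs
merge as bs = as ++ bs    -- unreachable for the infinite streams used here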
Where x:xs links x before the list xs; it is called 'cons' in Lisp. 1 is linked as the head (the 0th regular number). ∪ merges two ascending lists:

(a:as) ∪ (b:bs) = | a < b : a : (as ∪ (b:bs))
                  | a = b : a : (as ∪ bs)
                  | a > b : b : ((a:as) ∪ bs)
This example program gives the 1500th number, 860934420, by xs !! 1500 in 0.03s on the same computer.
Queues
The above solution needs to filter out duplicated numbers and scan the queue to maintain the ascending order. We categorize all regular numbers into 3 disjoint buckets: Q2 = {2^i | i > 0}, Q23 = {2^i 3^j | i ≥ 0, j > 0}, and Q235 = {2^i 3^j 5^k | i, j ≥ 0, k > 0}. We constrain j ≠ 0 in Q23 and k ≠ 0 in Q235 such that there is no overlap. Realize the buckets as 3 queues starting from Q2 = {2}, Q23 = {3}, and Q235 = {5}. Each time, extract the smallest x from the queues, then do the following:

• If x comes from Q2: enqueue 2x to Q2, 3x to Q23, and 5x to Q235;
• If x comes from Q23: enqueue 3x to Q23, and 5x to Q235;
• If x comes from Q235: enqueue 5x to Q235.

We reach the answer after dequeuing n numbers. Figure 4 gives the first 4 steps.
Figure 4: The states of the three queues Q2, Q23, and Q235 in the first 4 steps.
1: function Regular-Number(n)
2: x←1
3: Q2 ← {2}, Q23 ← {3}, Q235 ← {5}
4: while n > 0 do
5: x ← min(Head(Q2 ), Head(Q23 ), Head(Q235 ))
6: if x = Head(Q2 ) then
7: Dequeue(Q2 )
8: Enqueue(Q2 , 2x)
9: Enqueue(Q23 , 3x)
10: Enqueue(Q235 , 5x)
11: else if x = Head(Q23 ) then
12: Dequeue(Q23 )
13: Enqueue(Q23 , 3x)
14: Enqueue(Q235 , 5x)
15: else
16: Dequeue(Q235 )
17: Enqueue(Q235 , 5x)
18: n←n−1
19: return x
This algorithm loops n times. Each iteration extracts the minimum number in constant time, then adds at most 3 numbers to the queues, each in constant time. The overall performance is O(n).
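A Haskell sketch of this three-queue algorithm follows (the names regular and go are ours; plain lists stand in for the queues, so enqueue via ++ is not truly constant time here):

regular :: Int -> Integer
regular n = go n [2] [3] [5] 1
  where
    go 0 _ _ _ x = x
    go k q2 q23 q235 _
      | x == head q2  = go (k - 1) (tail q2 ++ [2 * x]) (q23 ++ [3 * x]) (q235 ++ [5 * x]) x
      | x == head q23 = go (k - 1) q2 (tail q23 ++ [3 * x]) (q235 ++ [5 * x]) x
      | otherwise     = go (k - 1) q2 q23 (tail q235 ++ [5 * x]) x
      where x = minimum [head q2, head q23, head q235]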
Summary
Both brute-force solutions can't scale up. This book is not about coding contests or code interviews; it aims to provide both purely functional algorithms and their counterpart imperative implementations. We referenced many results from Okasaki's work [3] and classic textbooks [4]. We avoid relying on a specific programming language, because the reader may or may not be familiar with it, and programming languages keep changing. Instead, we use pseudo code or mathematical notation to make the algorithm definitions generic. When giving code examples, the functional ones look like Haskell, and the imperative ones look like a mix of several languages.

I wrote the first edition from 2009 to 2017, then rewrote the second edition and added answers to the 119 exercises from 2020 to 2023. The PDF can be downloaded from GitHub.
Exercise 1
1.1. For the free number puzzle, since all numbers are non-negative, we can reuse the sign as a flag. For every |x| < n (where n is the length), negate the number at position |x|. Then scan to find the first positive number; its position is the answer. Write a program to realize this solution.
1.2. There are n numbers 1, 2, ..., n. After some processing, they are shuffled, and a
number x is altered to y. Suppose 1 ≤ y ≤ n, design a solution to find x and y in
linear time with constant space.
1.3. The example program below is a solution for the regular number puzzle. Is it equivalent to the queue based solution?
Int regularNum(Int m) {
    [Int] nums(m + 1)
    Int n = 0, i = 0, j = 0, k = 0
    nums[0] = 1
    Int x2 = 2 * nums[i]
    Int x3 = 3 * nums[j]
    Int x5 = 5 * nums[k]
    while n < m {
        n = n + 1
        nums[n] = min(x2, x3, x5)
        if x2 == nums[n] {
            i = i + 1
            x2 = 2 * nums[i]
        }
        if x3 == nums[n] {
            j = j + 1
            x3 = 3 * nums[j]
        }
        if x5 == nums[n] {
            k = k + 1
            x5 = 5 * nums[k]
        }
    }
    return nums[m]
}
Contents
Preface i

1 List 1
  1.1 Introduction 1
  1.2 Definition 1
    1.2.1 Access 2
  1.3 Basic operations 2
    1.3.1 index 2
    1.3.2 Last 3
    1.3.3 Right index 3
    1.3.4 Mutate 5
      insert 5
      delete 7
      concatenate 8
    1.3.5 sum and product 8
    1.3.6 maximum and minimum 10
  1.4 Transform 11
    1.4.1 map and for-each 12
      For each 13
    1.4.2 reverse 15
  1.5 Sub-list 16
    1.5.1 break and group 17
  1.6 Fold 19
  1.7 Search and filter 22
  1.8 zip and unzip 24

3 Insertion sort 39
  3.1 Introduction 39
  3.2 Insertion 40
  3.3 Binary search 41
  3.4 List 41
  3.5 Binary search tree 42

4 Red-black tree 43
  4.1 Balance 44
  4.2 Definition 46
  4.3 Insert 47
  4.4 Delete 48
  4.5 Imperative red-black tree 52
  4.6 Appendix: Example programs 54

5 AVL tree 57
  5.1 Definition 57
  5.2 Insert 59
    5.2.1 Balance 60
    5.2.2 Verification 61
  5.3 Imperative algorithm 62
  5.4 Appendix: Example programs 64

6 Radix tree 65
  6.1 Integer trie 65
    6.1.1 Definition 66
    6.1.2 Insert 66
    6.1.3 Lookup 68
  6.2 Integer prefix tree 68
    6.2.1 Definition 68
    6.2.2 Insert 69
    6.2.3 Lookup 73
  6.3 Trie 74
    6.3.1 Insert 74
    6.3.2 Lookup 75
  6.4 Prefix tree 75
    6.4.1 Insert 76
    6.4.2 Lookup 79
  6.5 Applications of trie and prefix tree 79
    6.5.1 Dictionary and input completion 80
    6.5.2 Predictive text input 82
  6.6 Appendix: Example programs 84

7 B-Tree 89
  7.1 Introduction 89
  7.2 Insert 91
    7.2.1 Insert then split 91
    7.2.2 Split before insert 94
    7.2.3 Paired lists 95
  7.3 Lookup 98
  7.4 Delete 99
    7.4.1 Delete and fix 99
    7.4.2 Merge before delete 101
  7.5 Summary 106
  7.6 Appendix: Example programs 106

11 Queue 163
  11.1 Linked-list queue 163
  11.2 Circular buffer 164
  11.3 Paired-list queue 165
  11.4 Balance Queue 167
  11.5 Real-time queue 167
  11.6 Lazy real-time queue 170
  11.7 Appendix - example programs 171

12 Sequence 173
  12.1 Binary random access list 173

Appendices

Answers 293
1 List
1.1 Introduction
List and array are the building blocks of other complex data structures. Both hold multiple elements as a container. An array is a range of consecutive cells indexed by a number (address); it is typically bounded with a fixed size, while a list grows on demand. One can traverse a list one by one from head to tail. Particularly in functional settings, list plays a critical role in controlling computation and logic flow¹. Readers already familiar with map, filter, and fold can safely skip this chapter and start directly from chapter 2.
1.2 Definition
A list, or singly linked-list, is a data structure defined recursively: a list is either empty, denoted as [ ] or NIL, or contains an element (also called key) linked with a sub-list (called next). Figure 1.1 shows a list of nodes; every node links to the next, and the last links to NIL. We often define a list with a compound structure², for example:
Figure 1.1: A list of nodes; the last one links to NIL.
data List<A> {
A key
List<A> next
}
Many programming environments support the NIL concept. There are two ways to represent the empty list: use NIL (or null, or ∅) directly, or create a list but put nothing in it, as [ ]. From the implementation perspective, NIL needs no memory allocation, while [ ] does.
¹At the low level, lambda calculus plays the most critical role as one of the computation models equivalent to the Turing machine.
1.2.1 Access
Given a non-empty list X, define two functions³ to access the first element and the rest sub-list. They are often called first X and rest X, or head X and tail X⁴. Conversely, we can construct a list from an element x and another list xs (possibly empty) as x:xs. It is called the cons operation. We have the following equations:
head (x:xs) = x
tail (x:xs) = xs    (1.1)
For a non-empty list X, we also denote the first element as x1, and the rest sub-list as X′. For example, when X = [x1, x2, x3, ...], then X′ = [x2, x3, ...].
Exercise 1.2
1.2.1. For lists of type A, suppose we can test whether any two elements x, y ∈ A are equal; define an algorithm to test whether two lists are equal.
1.3 Basic operations

We traverse the list to count its length; the performance is bound to O(n), where n is the number of elements. We use |X| for the length of X when the context is clear. To avoid repeated counting, we can persist the length in a variable and update it on mutation (add or delete). Below is the iterative length counting:
1: function Length(X)
2:   n ← 0
3:   while X ≠ NIL do
4:     n ← n + 1
5:     X ← Next(X)
6:   return n
1.3.1 index
An array supports random access at position i in constant time, while we need to traverse the list i steps to access the target element.
getAt i (x:xs) = | i = 0 : x
                 | i ≠ 0 : getAt (i − 1) xs    (1.3)
We leave the empty list unhandled; the behavior for [ ] is undefined. As such, the out-of-bound case also leads to undefined behavior: if i > |X|, we end up at the edge case of accessing the (i − |X|)-th position of the empty list. On the other hand, if i < 0, after repeatedly decreasing it by one, it moves even farther away from 0, and finally ends at some negative position of the empty list. getAt is bound to O(i) time as it advances the list i steps.
Below is the imperative implementation:
³We often write function f(x) as f x, and f(x, y, ..., z) as f x y ... z.
⁴They are named car and cdr in Lisp due to the design of machine registers [63].
1: function Get-At(i, X)
2:   while i ≠ 0 do
3:     X ← Next(X)    ▷ Error when X = NIL
4:     i ← i − 1
5:   return First(X)
Exercise 1.3
1.3.1. For the iterative Get-At(i, X), what is the behavior when X is empty? what if i
is out of bound?
1.3.2 Last
There is a pair of operations symmetric to 'first/rest', namely 'last/init'. For a non-empty list X = [x1, x2, ..., xn], the function last returns the tail element xn, while init returns the sub-list [x1, x2, ..., xn−1]. Although they are symmetric left to right, 'last/init' need to traverse the list in linear time.
last [x] = x                init [x] = [ ]
last (x:xs) = last xs       init (x:xs) = x : init xs    (1.4)
Neither handles the empty list; the behavior is undefined for [ ]. Below are the iterative implementations:
1: function Last(X)
2:   x ← NIL
3:   while X ≠ NIL do
4:     x ← First(X)
5:     X ← Rest(X)
6:   return x
7: function Init(X)
8:   X′ ← NIL
9:   while Rest(X) ≠ NIL do    ▷ Error when X is NIL
10:    X′ ← Cons(First(X), X′)
11:    X ← Rest(X)
12:   return Reverse(X′)
Init accumulates the result through Cons, but in reversed order. We need to reverse it back (section 1.4.2).
1.3.3 Right index

The right index problem is to access the i-th element from the end. A better solution uses two pointers p1, p2 at distance i, i.e., rest^i(p2) = p1, where rest^i(p2) means repeatedly applying rest i times. p2 starts from the head; advancing p2 by i steps makes it meet p1. Then advance both pointers in parallel till p1 arrives at the tail. At that point, p2 exactly points to the i-th element from the right, as shown in fig. 1.2. p1 and p2 form a sliding window of width i.
Figure 1.2: Sliding window. (a) p2 starts from the head, behind p1 in i steps; (b) When
p1 reaches the tail, p2 points to the i-th element from right.
1: function Last-At(i, X)
2:   p ← X
3:   while i > 0 do
4:     X ← Rest(X)    ▷ Error if out of bound
5:     i ← i − 1
6:   while Rest(X) ≠ NIL do
7:     X ← Rest(X)
8:     p ← Rest(p)
9:   return First(p)
We can't alter pointers in purely functional settings. Instead, we advance two lists X = [x1, x2, ..., xn] and Y = [xi, xi+1, ..., xn] simultaneously, where Y is the sub-list without the first i − 1 elements.
The drop function removes the first m elements:
drop 0 xs = xs
drop m [ ] = [ ]
drop m (x:xs) = drop (m − 1) xs    (1.7)
Exercise 1.4
1.4.1. In the Init algorithm, can we use Append(X′, First(X)) instead of Cons?
1.4.2. How to handle empty list or out of bound error in Last-At?
1.3.4 Mutate
Mutate includes append, insert, update, and delete. The functional environment actually implements mutation by creating a new list for the changed part, while keeping (persisting) the original one for reuse, or releasing it at some point (chapter 2 in [3]). Append is the symmetric operation of cons: it appends an element to the tail, while cons adds to the head. It is also known as 'snoc' (the reverse of 'cons'). As it needs to traverse the list to the tail, the performance is O(n), where n is the length. To avoid repeated traversal, we can persist the tail reference and update it on changes.
append [ ] x = [x]
append (y:ys) x = y : append ys x    (1.8)
Similar to getAt, we need to advance to the target position i in O(i) time to change the element:

setAt 0 x (y:ys) = x : ys
setAt i x (y:ys) = y : setAt (i − 1) x ys    (1.9)
Exercise 1.5
1.5.1. Add the 'tail' reference, and optimize append to constant time.
1.5.2. When do we need to update the tail reference? How does it affect the performance?
1.5.3. Handle the empty list and out-of-bound error for setAt.
insert
There are two different cases of insertion: (1) insert an element at a given position: insert i x X, similar to setAt; (2) insert an element into a sorted list, maintaining the ordering⁵.

⁵The parameter orders are also symmetric: cons x xs and append xs x.
insert 0 x ys = x : ys
insert i x (y:ys) = y : insert (i − 1) x ys    (1.10)
When i exceeds the length, treat it as append (the exercise of this section). Below is
the iterative implementation:
1: function Insert(i, x, X)
2:   if i = 0 then
3:     return Cons(x, X)
4:   H ← X
5:   p ← X
6:   while i > 0 and X ≠ NIL do
7:     p ← X
8:     X ← Rest(X)
9:     i ← i − 1
10:  Rest(p) ← Cons(x, X)
11:  return H
Let the list L = [x1, x2, ..., xn] be sorted, i.e., for any positions 1 ≤ i ≤ j ≤ n, we have xi ≤ xj, where ≤ is an abstract ordering: it can be ≥, the subset relation between sets, etc. We define insert to maintain the ordering.
insert x [ ] = [x]
insert x (y:ys) = | x ≤ y : x : y : ys
                  | otherwise : y : insert x ys    (1.11)
Since it needs to compare elements one by one, the performance is bound to O(n) time, where n is the length. Below is the iterative implementation:
1: function Insert(x, X)
2:   if X = NIL or x < First(X) then
3:     return Cons(x, X)
4:   H ← X
5:   while Rest(X) ≠ NIL and First(Rest(X)) < x do
6:     X ← Rest(X)
7:   Rest(X) ← Cons(x, Rest(X))
8:   return H
With insert, we can further define insertion sort: repeatedly insert elements into an empty list. Since each insert takes linear time, the overall time is bound to O(n²).
sort [ ] = [ ]
sort (x:xs) = insert x (sort xs)    (1.12)
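The two equations transcribe directly to Haskell (a minimal sketch, reusing insert from eq. (1.11)):

insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys) | x <= y    = x : y : ys
                | otherwise = y : insert x ys

sort :: Ord a => [a] -> [a]
sort []     = []
sort (x:xs) = insert x (sort xs)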
We can eliminate the recursion to implement the iterative sort. Scan the list, and
insert elements one by one:
1: function Sort(X)
2:   S ← NIL
3:   while X ≠ NIL do
4:     S ← Insert(First(X), S)
5:     X ← Rest(X)
6:   return S
At any time during the loop, S is sorted. The recursive implementation processes the list from the right, while the iterative one works from the left. We'll use 'tail-recursion' in section 1.3.5 to eliminate this difference. Chapter 3 covers insertion sort in detail, including performance analysis and optimization.
Exercise 1.6
1.6.1. Handle the out-of-bound case in insert; treat it as append.
1.6.2. Implement insert for arrays. When inserting at position i, all elements after i need to shift toward the end.
delete
Symmetric to insert, there are two cases of deletion: (1) delete the element at a position: delAt i X; (2) look up, then delete the element of a given value: delete x X. To delete the element at position i, we advance i steps, bypass the element, and link the rest sub-list.
delAt i [ ] = [ ]
delAt 0 (x:xs) = xs
delAt i (x:xs) = x : delAt (i − 1) xs    (1.13)
delete x [ ] = [ ]
delete x (y:ys) = | x = y : ys
                  | x ≠ y : y : delete x ys    (1.14)
Because we scan the list to find the target element, the time is bound to O(n), where n is the length. We use a sentinel node to simplify the iterative implementation:
1: function Delete(x, X)
2:   S ← Cons(⊥, X)    ▷ the sentinel
3:   p ← S
4:   while X ≠ NIL and First(X) ≠ x do
5:     p ← X
6:     X ← Rest(X)
7:   if X ≠ NIL then
8:     Rest(p) ← Rest(X)
9:   return Rest(S)
Exercise 1.7
1.7.1. Implement an algorithm to find and delete all occurrences of a given value.
1.7.2. Design the delete algorithm for arrays: all elements after the delete position need to shift toward the front.
concatenate
Append is a special case of concatenation: it adds only one element, while concatenation adds multiple. However, the performance would be quadratic if we repeatedly append. Let |xs| = n and |ys| = m be the lengths; we need to advance to the tail of xs m times, so the performance is O(n + (n + 1) + ... + (n + m)) = O(nm + m²):
xs ++ [ ] = xs
xs ++ (y:ys) = (append xs y) ++ ys
While 'cons' is fast (constant time), we can traverse to the tail of xs only once, then link it to ys:
[ ] ++ ys = ys
xs ++ [ ] = xs
(x:xs) ++ ys = x : (xs ++ ys)    (1.15)
This improvement has O(n) performance, where n = |xs|. In imperative settings, we can implement concatenation in constant time with the tail reference variable (see exercise).
1: function Concat(X, Y)
2:   if X = NIL then
3:     return Y
4:   if Y = NIL then
5:     return X
6:   H ← X
7:   while Rest(X) ≠ NIL do
8:     X ← Rest(X)
9:   Rest(X) ← Y
10:  return H
1.3.5 sum and product

Define sum and product for a list of numbers recursively:

sum [ ] = 0                        prod [ ] = 1
sum (x:xs) = x + sum xs            prod (x:xs) = x · prod xs    (1.16)

Both need to traverse the list, hence the performance is O(n). They compute from right to left. We can change to accumulate the result from left to right:
sum′ a [ ] = a                          prod′ a [ ] = a
sum′ a (x:xs) = sum′ (x + a) xs         prod′ a (x:xs) = prod′ (x · a) xs    (1.17)

Given a list, we call sum′ with 0 and prod′ with 1 to start accumulating, or in Curried form: sum = sum′ 0, prod = prod′ 1.
The Curried form was introduced by Schönfinkel (1889–1942) in 1924, then widely used by Haskell Curry from 1958; it is known as Currying [73]. For a function of 2 parameters f(x, y), when we fix x with a value, it becomes a function of y: g(y) = f(x, y), or g = f x. For multiple variables of f(x, y, ..., z), we convert it to a series of Curried functions: f, f x, f x y, ..., each taking one parameter: f(x, y, ..., z) = f(x)(y)...(z) = f x y ... z.
The accumulated implementation computes from left to right and needn't bookkeep any context, state, or intermediate result for the recursion: all state is either passed as an argument (for example a) or dropped (for example the previous element). We can further optimize such recursive calls into loops. Because the recursion happens at the tail of the function, we call it tail recursion (or 'tail call'), and the process of eliminating the recursion 'tail recursion optimization' [61]. It greatly improves performance and avoids stack overflow due to deep recursion. In eq. (1.12), the recursive insertion sort implementation sorts elements from the right. We can also optimize it to a tail call:
sort′ a [ ] = a
sort′ a (x:xs) = sort′ (insert x a) xs    (1.19)
We pass [ ] to start sorting: sort = sort′ [ ] (Curried form). As a typical tail call example, consider how to compute bⁿ efficiently (problem 1.16 in [63]). A direct implementation repeatedly multiplies by b n times, starting from 1, which is bound to O(n) time:
1: function Pow(b, n)
2: x←1
3: loop n times
4: x←x·b
5: return x
When computing b⁸, after the first 2 loops we get x = b². At this stage, we needn't multiply x by b to get b³; we can directly compute x², which gives b⁴. Doing this again gives (b⁴)² = b⁸. We then only need to loop 4 times, not 8 times. If n = 2^m for some integer m ≥ 0, we can compute bⁿ fast as below:
b¹ = b
bⁿ = (b^{n/2})²
We next extend this divide and conquer method to any integer n ≥ 0: if n = 0, define b⁰ = 1; if n is even, we halve n to compute b^{n/2}, then square it; if n is odd, since n − 1 is even, we recursively compute b^{n−1}, then multiply by b:
b⁰ = 1
bⁿ = | 2|n : (b^{n/2})²
     | otherwise : b · b^{n−1}    (1.20)
However, (b^{n/2})² is not tail recursive. Alternatively, we can square the base number and halve the exponent:
b⁰ = 1
bⁿ = | 2|n : (b²)^{n/2}
     | otherwise : b · b^{n−1}    (1.21)
With this change, we get a tail recursive function to compute bn = pow(b, n, 1).
pow(b, 0, a) = a
pow(b, n, a) = | 2|n : pow(b², n/2, a)
               | otherwise : pow(b, n − 1, ab)    (1.22)
This implementation is bound to O(lg n) time. Writing n in binary format n = (a_m a_{m−1} ... a_1 a_0)₂, we need to compute b^{2^i} whenever a_i = 1, similar to the Binomial heap algorithm (section 10.1), and finally multiply them together. For example, to compute b¹¹, as 11 = (1011)₂ = 2³ + 2 + 1, we have b¹¹ = b^{2³} × b² × b. We follow these steps:

1. compute b¹, which is b;
2. square it to b²;
3. square it to b^{2²};
4. square it to b^{2³}.
pow(b, 0, a) = a
pow(b, n, a) = | 2|n : pow(b², n/2, a)
               | otherwise : pow(b², ⌊n/2⌋, ab)    (1.23)
This algorithm essentially shifts n one bit to the right each time (divides n by 2). If the LSB (least significant bit) is 0, n is even; it squares the base and keeps the accumulator a unchanged. If the LSB is 1, n is odd; it squares the base and accumulates it to a. When n becomes zero, we have exhausted all bits, and a is the final result. At any time, the updated base b′, the shifted exponent n′, and the accumulator a satisfy the invariant bⁿ = a(b′)^{n′}. The previous implementation subtracts one for odd n; the improved one halves n every time. It runs exactly m rounds, where m is the number of bits. We leave the imperative implementation as an exercise.
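A Haskell sketch of eq. (1.23), for non-negative exponents (pow and go are our names):

pow :: Integer -> Integer -> Integer
pow b n = go b n 1
  where
    go _  0  a = a
    go b' n' a | even n'   = go (b' * b') (n' `div` 2) a
               | otherwise = go (b' * b') (n' `div` 2) (a * b')

For example, pow 2 10 gives 1024 after 4 rounds, one per bit of 10 = (1010)₂.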
Back to the sum and product, the iterative implementation applies plus and multiply
while traversing:
1: function Sum(X)
2:   s ← 0
3:   while X ≠ NIL do
4:     s ← s + First(X)
5:     X ← Rest(X)
6:   return s

7: function Product(X)
8:   p ← 1
9:   while X ≠ NIL do
10:    p ← p · First(X)
11:    X ← Rest(X)
12:   return p
With product, we can define factorial of n as: n! = product [1..n].
1.3.6 maximum and minimum

We can define min and max with an accumulator in the same way; take min′ for example:

min′ a [ ] = a
min′ a (x:xs) = | x < a : min′ x xs
                | otherwise : min′ a xs    (1.25)
Different from sum′/prod′, we can't pass a fixed starting value to min′/max′, unless it is ±∞ (in Curried form): min = min′ ∞, max = max′ (−∞). Instead, we can pass the first element, given that min/max only take non-empty lists: min (x:xs) = min′ x xs, and max (x:xs) = max′ x xs.
We can optimize the tail recursive implementation with loops; take Min for example:
1: function Min(X)
2:   m ← First(X)
3:   X ← Rest(X)
4:   while X ≠ NIL do
5:     if First(X) < m then
6:       m ← First(X)
7:     X ← Rest(X)
8:   return m
Alternatively, we can reuse the first element as the accumulator: every time, compare the first two elements and drop one. Below is the example for min:
min [x] = x
min (x₁:x₂:xs) = | x₁ < x₂ : min (x₁:xs)
                 | otherwise : min (x₂:xs)    (1.27)
Exercise 1.8
1.8.1. Change length to tail recursive.
1.8.2. Compute bn through the binary format of n.
1.4 Transform
In algebra, there are two types of transformation: one keeps the list structure and only transforms the elements; the other alters the list structure, hence the result is not isomorphic. We call the former map.
The first example converts a list of numbers to strings:

toStr [ ] = [ ]
toStr (x:xs) = (str x) : toStr xs    (1.28)
For the second example, given a dictionary, which is a list of words grouped by their
initials:
[[a, an, another, ... ],
[bat, bath, bool, bus, ...],
...,
[zero, zoo, ...]]
Next process a text (Hamlet for example), augment each word with the number of
occurrences, like:
[[(a, 1041), (an, 432), (another, 802), ... ],
[(bat, 5), (bath, 34), (bool, 11), (bus, 0), ...],
...,
[(zero 12), (zoo, 0), ...]]
Now, for every initial letter, which word occurs most? The answer is a list of words, each having the most occurrences within its group, like [a, but, can, ...]. We need a program that transforms a list of groups of word-number pairs into a list of words. First, define a function that takes a list of word-number pairs and finds the word paired with the biggest number. Sorting is overkill; we need a special max function, maxBy cmp xs, where cmp is the generic compare function.
Then pass less to maxBy (in Curried form): max″ = maxBy less. Finally, call max″ to process the list:
solve [ ] = [ ]
solve (x:xs) = (fst (max″ x)) : solve xs    (1.32)
map f [ ] = [ ]
map f (x:xs) = (f x) : map f xs    (1.33)
map takes a function f and applies it to every element to form a new list. A function that computes with other functions is called a high-order function. Let the type of f be A → B: it sends an element of A to a result in B. The type of map is:

map :: (A → B) → [A] → [B]
Read as: map takes a function of A → B and converts a list [A] to another list [B]. We can define the above two examples with map (in Curried form): toStr = map str, and solve = map (fst ◦ max″).
Where f ◦ g is function composition, i.e., first apply g, then apply f: (f ◦ g) x = f(g(x)), read as f after g. From the set theory point of view, a function y = f(x) defines a map from x in set X to y in set Y:

Y = {f(x) | x ∈ X}    (1.35)
As another example, perm X r picks r elements out of X to enumerate permutations:

perm X r = | |X| < r or r = 0 : [[ ]]
           | otherwise : [x:ys | x ← X, ys ← perm (delete x X) (r − 1)]    (1.36)

If we pick zero elements, or there are too few (less than r), the result is a list of the empty list: [[ ]]. Otherwise, for every x in X, we recursively pick r − 1 out of the rest n − 1 elements, then prepend x to each result.
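Eq. (1.36) transcribes almost verbatim to Haskell (a sketch; delete comes from Data.List):

import Data.List (delete)

perm :: Eq a => [a] -> Int -> [[a]]
perm xs r | length xs < r || r == 0 = [[]]
          | otherwise = [x:ys | x <- xs, ys <- perm (delete x xs) (r - 1)]

For example, perm [1, 2, 3] 2 gives [[1,2],[1,3],[2,1],[2,3],[3,1],[3,2]].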
We use a sentinel node in the iterative Map implementation.
1: function Map(f, X)
2:   X′ ← Cons(⊥, NIL)    ▷ the sentinel
3:   p ← X′
4:   while X ≠ NIL do
5:     x ← First(X)
6:     X ← Rest(X)
7:     Rest(p) ← Cons(f(x), NIL)
8:     p ← Rest(p)
9:   return Rest(X′)    ▷ discard the sentinel
For each
Sometimes we only need to process the elements one by one without building a new list, for example, to print every element:
1: function Print(X)
2:   while X ≠ NIL do
3:     print First(X)
4:     X ← Rest(X)
More generally, we pass a procedure P , then apply P to each element.
1: function For-Each(P, X)
2:   while X ≠ NIL do
3:     P(First(X))
4:     X ← Rest(X)
For example, consider the "n-lights puzzle" [96]. There are n lights in a room, all off. We execute the following for n rounds:

1. Switch every light in the first round;
2. Switch the lights at positions 2, 4, 6, ... in the second round;
3. Switch the lights at positions 3, 6, 9, ... in the third round;
4. ...

At the last round, only the n-th light is switched. How many lights are on in the end?
We start with a brute-force solution. Represent the n lights as a list of 0/1 numbers (0: off, 1: on). Start from all zeros: [0, 0, ..., 0]. Label the lights from 1 to n, then map them to (i, on/off) pairs.
This binds each number to zero, i.e., a list of pairs: L = [(1, 0), (2, 0), ..., (n, 0)]. We operate on this list of pairs for n rounds. In the i-th round, for every pair (j, x), if i|j (meaning j mod i = 0), then switch it on/off. As 1 − 0 = 1 and 1 − 1 = 0, we switch x to 1 − x.
switch i (j, x) = | j mod i = 0 : (j, 1 − x)
                  | otherwise : (j, x)    (1.37)
Realize the i-th round of operation as map (switch i) L (we use the Curried form of switch). Next, define a function op, which performs the mapping on L over and over for n rounds: op [1, 2, ..., n] L.
op [ ] L = L
op (i:is) L = op is (map (switch i) L)    (1.38)
Finally, sum the second value of each pair to get the answer.
Running it for 1 to 100 lights gives the answers below (line breaks added):
[1,1,1,
2,2,2,2,2,
3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,10]
They form a pattern: the first 3 answers are 1; the 4-th to the 8-th answers are 2; the
9-th to the 15-th answers are 3; ... It seems that from the i2 -th to the ((i + 1)2 − 1)-th
answers are i. Let’s prove it:
Proof. Given n lights labeled from 1 to n, all lights are off at the start. A light is on at the end if and only if it is switched an odd number of times. Light i is switched at round j if j divides i (j|i); hence only the lights with an odd number of factors are on in the end. The key to solving this puzzle is to find all numbers with an odd number of factors. For any natural number n, let S be the set of all factors of n, initialized as ∅. If p is a factor of n, there must exist a natural number q such that n = pq, which means q is also a factor of n. We add 2 different factors to S if and only if p ≠ q, which keeps |S| even, unless p = q. In that case n is a square number, and we add only 1 factor, leading to an odd number of factors.
The example program below outputs the answers for 1, 2, ..., 100 lights:
map (floor ◦ sqrt) [1..100]
Map is abstract and not limited to lists: it applies to many complex algebraic structures. The next chapter explains how to map trees. We can apply mapping as long as we can traverse the structure and the empty case is defined.
1.4.2 reverse
It's a good exercise to reverse a singly linked-list with constant space. One must carefully manipulate the node references, yet there is an easy way: (1) write a purely recursive solution; (2) change it to tail recursive; (3) convert it to an imperative implementation. The purely recursive solution is direct:
reverse [ ] = [ ]
reverse (x:xs) = append (reverse xs) x
Next, convert it to tail recursive: use an accumulator to store the reversed part, starting from empty: reverse = reverse′ [ ].
reverse′ a [ ] = a
reverse′ a (x:xs) = reverse′ (x:a) xs    (1.41)
Different from appending, cons (:) takes constant time. We repeatedly extract the head element and prepend it to the accumulator. It is like pushing the elements onto a stack, then popping them out. The overall performance is O(n), where n is the length. Since a tail call needn't keep the context, we next convert it to iterative loops:
1: function Reverse(X)
2:   A ← NIL
3:   while X ≠ NIL do
4:     A ← Cons(First(X), A)
5:     X ← Rest(X)
6:   return A
However, this implementation creates a new reversed list, rather than reversing in place. We change it further to flip the 'next' references node by node:

1: function Reverse(X)
2:   p ← NIL
3:   while X ≠ NIL do
4:     q ← Rest(X)
5:     Rest(X) ← p    ▷ flip the reference
6:     p ← X
7:     X ← q
8:   return p
Exercise 1.9
1.9.1. Find the maximum v in a list of pairs [(k, v)] in tail recursive way.
1.5 Sub-list
One can slice an array fast, but it takes linear time to traverse and extract a sub-list from a list. take extracts the first n elements; it is equivalent to sublist 1 n X. drop discards the first n elements; it is equivalent to sublist (n + 1) |X| X, symmetric to take⁶:

take 0 xs = [ ]                          drop 0 xs = xs
take n [ ] = [ ]                         drop n [ ] = [ ]
take n (x:xs) = x : take (n − 1) xs      drop n (x:xs) = drop (n − 1) xs    (1.42)
When n > |X| or n < 0, it ends up with the empty list case. We leave the imperative implementation as an exercise. We can extract the sub-list at any position for a given length, where the range [s, e] includes both ends; we can also split the list at a position.
take and drop cut the list at a fixed position; takeWhile and dropWhile instead stop at the first element that violates a predicate p:

takeWhile p [ ] = [ ]
takeWhile p (x:xs) = | p(x) : x : takeWhile p xs
                     | otherwise : [ ]

dropWhile p [ ] = [ ]
dropWhile p (x:xs) = | p(x) : dropWhile p xs
                     | otherwise : x:xs    (1.46)
⁶Some programming languages provide built-in implementations, for example in Python: xs[:m] and xs[m:].
span p [ ] = ([ ], [ ])
span p (x:xs) = | p(x) : (x:as, bs), where (as, bs) = span p xs
                | otherwise : ([ ], x:xs)    (1.47)
We define break by negating the predicate: break p = span (¬p). span and break find the longest prefix; they stop immediately when the condition is broken and ignore the rest. Below is the iterative implementation of span:
1: function Span(p, X)
2:   A ← X
3:   tail ← NIL
4:   while X ≠ NIL and p(First(X)) do
5:     tail ← X
6:     X ← Rest(X)
7:   if tail = NIL then
8:     return (NIL, X)
9:   Rest(tail) ← NIL
10:  return (A, X)
span and break cut the list into two parts; group divides a list into multiple sub-lists. For example, we can group a long string into small units, each containing consecutive identical characters: group "Mississippi" = ["M", "i", "ss", "i", "ss", "i", "pp", "i"].
For another example, given a list of numbers: X = [15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1,
4, 8, 3, 14, 2], divide it into small descending sub-lists:
group X = [[15, 9, 0], [12, 11, 7], [10, 5], [6], [13, 1], [4], [8, 3], [14, 2]]
Both are useful: we can build a radix tree from string groups to support fast text search (chapter 6), and implement the natural merge sort algorithm from number groups (chapter 13). Abstract the grouping condition as a relation ∼: it tests whether two consecutive elements x, y are 'equivalent': x ∼ y. We scan the list, comparing two adjacent elements each time; if they are equivalent, we add both to a group, otherwise they go to two different groups.
group ∼ [ ] = [[ ]]
group ∼ [x] = [[x]]
group ∼ (x:y:xs) = | x ∼ y : (x:ys):yss
                   | otherwise : [x]:ys:yss
    where (ys:yss) = group ∼ (y:xs)    (1.48)
It is bound to O(n) time, where n is the length. For the iterative implementation, if
X isn’t empty, initialize the result groups as [[x1 ]]. Scan from the second element, append
it to the last group if the two consecutive elements are ‘equivalent’; otherwise start a new
group.
1: function Group(∼, X)
2:   if X = NIL then
3:     return [[ ]]
4:   x ← First(X)
5:   X ← Rest(X)
6:   g ← [x]
7:   G ← [g]
8:   while X ≠ NIL do
9:     y ← First(X)
10:    if x ∼ y then
11:      g ← Append(g, y)
12:    else
13:      g ← [y]
14:      G ← Append(G, g)
15:    x ← y
16:    X ← Next(X)
17:   return G
However, the performance degrades to quadratic without the tail reference optimization for Append. We can change to Cons if we don't care about the order. We can define the above 2 examples as group (=) "Mississippi" and group (≥) X. Alternatively, we can realize grouping with span: repeatedly apply span to the rest till it becomes empty. However, span takes a unary function as the predicate, while group needs a binary one. We solve it with Currying: pass and fix the first argument.
group ∼ [ ] = [[ ]]
group ∼ (x:xs) = (x:as) : group ∼ bs, where (as, bs) = span (x ∼) xs
(1.49)
Although the new function groups strings correctly, it can't group numbers into descending lists: group (≥) X = [[15,9,0,12,11,7,10,5,6,13,1,4,8,3,14,2]]. When the first number 15 is put on the left hand side of ≥, as the maximum it relates to every other number, hence span puts all numbers into as and leaves bs empty. This is not a defect, but the correct behavior, because group is defined to put equivalent elements together. An equivalence relation (∼) must satisfy three axioms:

1. Reflexive: x ∼ x;
2. Symmetric: x ∼ y ⇔ y ∼ x;
3. Transitive: x ∼ y, y ∼ z ⇒ x ∼ z.

When grouping "Mississippi", the equality (=) operator satisfies the three axioms, hence generates the correct result. However, the Curried (≥), used as an equivalence relation, violates the symmetric axiom (15 ≥ 9 does not imply 9 ≥ 15), hence generates the unexpected result. The second implementation, via span, limits its use to strict equivalence, while the first one does not: it only tests that the predicate holds for every two adjacent elements, which is weaker than equivalence.
Exercise 1.10
1.10.1. Change the take/drop implementation: when n is negative, return [ ] for take, and the entire list for drop.
1.10.2. Implement the in-place imperative take/drop.
1.10.3. Define sublist and slice in Curried form, without X as a parameter.
1.6 Fold
Almost all list algorithms share a common structure. This is not by chance: the commonality is rooted in the recursive nature of lists. We can abstract the list algorithm to a high-level concept, fold⁷, which is essentially the initial algebra of all list computations [99]. Observe sum, product, and sort for the common structure: the result for the empty list is 0 for sum, 1 for product, and [ ] for sort; and there is a binary operation applied to the head and the recursive result — plus for sum, multiply for product, and ordered insertion for sort. We abstract the result for the empty list as the initial value z (a generic zero), the binary operation as ⊕, and define:

h ⊕ z [ ] = z
h ⊕ z (x:xs) = x ⊕ (h ⊕ z xs)    (1.50)
Consider folding a paper fan: think of it as a list of bamboo frames. The binary operation folds one frame onto the top of the stack (initialized empty). To fold the fan, start from one end and repeatedly apply the binary operation till all the frames are stacked. The sum and product algorithms do essentially the same thing.

⁷Also known as reduce.
sum [1, 2, 3, 4, 5] = 1 + (2 + (3 + (4 + 5)))      product [1, 2, 3, 4, 5] = 1 × (2 × (3 × (4 × 5)))
                    = 1 + (2 + (3 + 9))                                    = 1 × (2 × (3 × 20))
                    = 1 + (2 + 12)                                         = 1 × (2 × 60)
                    = 1 + 14                                               = 1 × 120
                    = 15                                                   = 120
We name this kind of process fold. Particularly, since the computation proceeds from the right, we denote it as foldr:

foldr f z [ ] = z
foldr f z (x:xs) = f x (foldr f z xs)    (1.51)
Or in Curried form: sum = foldr (+) 0, product = foldr (×) 1, and for insertion sort: sort = foldr insert [ ]. Converting foldr to tail recursive generates the result from the left; we denote it as foldl:

foldl f z [ ] = z
foldl f z (x:xs) = foldl f (f z x) xs    (1.54)
Using sum as an example, we can see how the computation expands from left to right:

foldl (+) 0 [1, 2, 3, 4, 5]
= foldl (+) (0 + 1) [2, 3, 4, 5]
= foldl (+) ((0 + 1) + 2) [3, 4, 5]
= ...
= 0 + 1 + 2 + 3 + 4 + 5

The evaluation of f z x is delayed at every step (lazy evaluation); otherwise, the intermediate results would be evaluated in the sequence of 1, 3, 6, 10, 15 along the calls. Generally, we can expand foldl as (in infix notation): foldl (⊕) z [x1, x2, ..., xn] = z ⊕ x1 ⊕ x2 ⊕ · · · ⊕ xn.
The initial value z is often an empty container ∅. The singly linked-list is such a container: it performs well (constant time) when adding an element to the head, but needs linear time to append to the tail. foldr is the natural choice to duplicate a list while keeping the order, while foldl generates a reversed list. As a workaround, we can first reverse the list, then reduce it:
1: function Reduce-Right(f, z, X)
2: return Reduce(f, z, Reverse(X))
One may prefer foldl as it is tail recursive and fits both functional and imperative settings as an online algorithm. However, foldr plays a critical role when handling infinite lists (modeled as streams) with lazy evaluation. For example, the program below wraps every natural number into a singleton list and returns the first 10:

take 10 (foldr (x xs ↦ [x]:xs) [ ] [1, 2, ...])
⇒ [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
It does not work with foldl: the evaluation would never end. We use the unified notation fold when both fold left and fold right work, and foldl or foldr when the direction matters. Although this chapter is about lists, the fold concept is generic and applies to other algebraic structures. We can fold a tree (section 2.6 in [99]), a queue, and many other objects, as long as the following 2 things are defined: (1) the empty case (for example, the empty tree); (2) how to decompose the recursive structure (like decomposing a tree into sub-trees and a key). People abstract them further with concepts like foldable, monoid, and traversable.
For example, let us implement the n-lights puzzle with fold and map. In the brute-force solution, we create a list of pairs; each pair (i, s) has a light number i and an on/off state s. For every round j, switch the i-th light when j|i. Define this process with fold:

foldr step [(1, 0), (2, 0), ..., (n, 0)] [1, 2, ..., n]

All lights are off at the beginning. We fold the list of rounds 1 to n. The function step takes two parameters: the round number i and the list of pairs: step i L = map (switch i) L. The result of foldr is the list of pairs of light numbers and their final on/off states. We extract the states with map, and count the on lights with sum:

sum (map snd (foldr step [(1, 0), (2, 0), ..., (n, 0)] [1, 2, ..., n]))    (1.56)
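Putting eq. (1.56) together as a runnable Haskell sketch (lights is our name):

lights :: Int -> Int
lights n = sum (map snd (foldr step [(i, 0) | i <- [1..n]] [1..n]))
  where
    step i ps = map (switch i) ps
    switch i (j, x) | j `mod` i == 0 = (j, 1 - x)
                    | otherwise      = (j, x)

Evaluating map lights [1..100] reproduces the earlier answer list.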
With fold, we can also define the concatenation of multiple lists:

concat = foldr (++) [ ]    (1.57)

For example: concat [[1], [2, 3, 4], [5, 6, 7, 8, 9]] ⇒ [1, 2, 3, 4, 5, 6, 7, 8, 9].
Exercise 1.11
1.11.1. To define insertion sort with foldr, we design the insert function as insert x X, and sort as sort = foldr insert [ ]. The type of foldr is:

foldr :: (A → B → B) → B → [A] → B

Where its first parameter f has the type A → B → B, and the initial value z has the type B. It folds a list of A, and builds the result of B. How to define insertion sort with foldl? What is the type of foldl?
1.11.2. What's the performance of concat? Design a linear time concat algorithm.
1.11.3. Define map with foldr.
1.7 Search and filter

The existence check is also called elem; its performance is O(n). We cannot improve it to O(lg n) with binary search directly, even for an ordered list, because lists do not support constant-time random access (chapter 3). Let's extend elem. In the n-lights puzzle, we used a list of pairs [(k, v)]; every pair contains a key and a value. Such a list is called an 'association list' (abbrev. assoc list). We can look up the value with a key:
lookup x [ ] = Nothing
lookup x ((k, v):kvs) = | k = x : Just (k, v)
                        | k ≠ x : lookup x kvs    (1.61)
Different from elem, we want to find the corresponding value besides checking the existence of the key x. However, the value is not guaranteed to exist. We use the algebraic type 'Maybe': a value of type Maybe A is either some a in A or nothing, denoted Just a and Nothing respectively. This is a way to deal with null references⁸ (4.2.2 in [99]). We can make lookup generic, to find the element that satisfies a given predicate:
find p [ ] = Nothing
find p (x:xs) = | p(x) : Just x
                | otherwise : find p xs    (1.62)
Although there can be multiple elements satisfying p, find picks the first. We can extend it to find all, which is called filter, as shown in fig. 1.4. Define it as a ZF expression: filter p X = [x | x ← X, p(x)]. We can use logical and to chain multiple filters: filter (p1 ∧ p2 ∧ · · ·) X = [x | x ← X, p1(x), p2(x), · · ·]⁹.
Figure 1.4: Input: [x1, x2, ..., xn], output: [x′1, x′2, ..., x′m], where p(x′i) holds for every x′i.
Different from find, filter returns the empty list instead of Nothing when no element satisfies the predicate:

filter p [ ] = [ ]
filter p (x:xs) = | p(x) : x : filter p xs
                  | otherwise : filter p xs    (1.63)
This definition builds the result from the right. For an iterative implementation, the performance drops to O(n²) with Append; if we change to Cons, the order is reversed, and we need to reverse it back in linear time (see the exercise).
⁸Similar to Optional<A> in some environments.
⁹We may use the simplified notation [x ← X, p1(x), p2(x), ...], or even [p1(x), p2(x), ...] when the context is clear.
1: function Filter(p, X)
2:   X′ ← NIL
3:   while X ≠ NIL do
4:     if p(First(X)) then
5:       X′ ← Append(X′, First(X))    ▷ Linear time
6:     X ← Rest(X)
7:   return X′
The nature of building the result from the right reminds us of foldr. Define f to test an element against the predicate and prepend it to the result: f p x as = if p x then x:as else as. Use its Curried form to define filter: filter p = foldr (f p) [ ].
Filter is a generic concept not limited to lists; we can apply a predicate to any traversable structure to extract things.
Match is to find a pattern in some structure. Even limited to lists and strings, there are still too many things to cover (chapter 14). The most basic problem is to test whether the list as exists in bs as a sub-list. There are two special cases: to test if as is a prefix or a suffix of bs. The span function actually finds the longest prefix under a given predicate; similarly, we can compare the elements of as and bs one by one. Define as ⊆ bs if as is a prefix of bs:
[ ] ⊆ bs = True
(a:as) ⊆ [ ] = False
(a:as) ⊆ (b:bs) = | a = b : as ⊆ bs
                  | a ≠ b : False    (1.66)
Prefix testing takes linear time to scan the two lists. However, we cannot do suffix testing this way, because it is expensive to align the right ends of two lists and scan backwards; this is different from arrays. Alternatively, we can reverse both lists in linear time, converting the problem to prefix testing: as is a suffix of bs if and only if reverse as ⊆ reverse bs.
With ⊆, we can test whether a list is a sub-list of another (infix testing). Define the empty list to be an infix of any list; we repeatedly apply prefix testing while traversing bs:
infix? (a:as) [ ] = False
infix? as (b:bs) = | as ⊆ (b:bs) : True
                   | otherwise : infix? as bs    (1.68)
Because prefix testing runs in linear time and is called at every step of the traversal, this implementation is bound to O(mn) time, where m and n are the lengths of the two lists. Symmetrically, we can enumerate all suffixes of B, and test if A is a prefix of any:

isInfixOf a b = any (isPrefixOf a) (tails b)

Where isPrefixOf does the prefix testing, and tails generates all suffixes of a given list (an exercise of this section).
Exercise 1.12
1.12.1. Implement the linear time filter algorithm through reverse.
1.12.2. Enumerate all suffixes of a list.
1.8 zip and unzip

zip combines two lists into a list of pairs:

zip as [ ] = [ ]
zip [ ] bs = [ ]
zip (a:as) (b:bs) = (a, b) : zip as bs    (1.70)
This implementation works even when the two lists have different lengths; the result has the same length as the shorter one. We can even zip infinite lists (under lazy evaluation), for example¹⁰: zip [0, 0, ...] [1, 2, ..., n]. For a list of words, we can index it as zip [1, 2, ...] [a, an, another, ...]. zip builds the result from the right; we can define it with foldr. It is bound to O(m) time, where m is the length of the shorter list. When implementing the iterative zip, the performance drops to quadratic with Append; we can use Cons then reverse the result, but that method can't handle two infinite lists. In imperative settings, we can reuse A to hold the zip result (treating it as transforming every element to a pair):
1: function Zip(A, B)
2:   C ← NIL
3:   while A ≠ NIL and B ≠ NIL do
4:     C ← Append(C, (First(A), First(B)))    ▷ Linear time
5:     A ← Rest(A)
6:     B ← Rest(B)
7:   return C
We can extend zip to multiple lists; some programming environments provide zip, zip3, zip4, and so on. Sometimes we want to apply a binary function to combine elements, not just form pairs. For example, given a list of unit prices [1.00, 0.80, 10.05, ...] for fruits (apple, orange, banana, ...), and a list of quantities, like [3, 1, 0, ...] (buy 3 apples, 1 orange, 0 bananas, ...), the program below generates the payment list:
10 Or zip (repeat 0) [1..n], where repeat x = x : repeat x.
pays us [ ] = [ ]
pays [ ] qs = [ ]
pays (u:us) (q:qs) = uq : pays us qs
It has the same structure as zip, except that it multiplies the two elements instead of forming a pair. We can abstract the binary function as f:

zipWith f as [ ] = [ ]
zipWith f [ ] bs = [ ]
zipWith f (a:as) (b:bs) = (f a b) : zipWith f as bs    (1.71)
We can define the inner product (or dot product) [98] as A·B = sum (zipWith (·) A B), or define the infinite Fibonacci sequence with lazy evaluation. Let F be the infinite list of Fibonacci numbers, starting from 0 and 1, and F′ be F without the head. From the third number on, every Fibonacci number is the sum of the corresponding numbers from F and F′ at the same position. The example program below takes the first 15 Fibonacci numbers:
fib = 0 : 1 : zipWith (+) fib (tail fib)
take 15 fib
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377]
unzip is the inverse of zip: it converts a list of pairs into two separate lists. Define it with foldr in Curried form:

unzip = foldr ((a, b) (as, bs) ↦ (a:as, b:bs)) ([ ], [ ])
For the fruits example, given the unit prices as an assoc list U = [(apple, 1.00), (orange, 0.80), (banana, 10.05), ...], and the purchased quantities as another assoc list Q = [(apple, 3), (orange, 1), (banana, 0), ...], we extract the unit prices and the quantities, then compute their inner product.
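A sketch of that computation in Haskell (the name pay is ours; quantities are taken as numbers for simplicity):

pay :: [(String, Double)] -> [(String, Double)] -> Double
pay us qs = sum (zipWith (*) (snd (unzip us)) (snd (unzip qs)))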
zip and unzip are generic. We can extend them to zip two trees, where the nodes contain paired elements from both. When traversing a collection of elements, we can also use the generic zip and unzip to track the path; this is a way to mimic the 'parent' reference of imperative implementations (last chapter of [10]).
List is fundamental for building more complex data structures and algorithms, particularly in functional settings. We introduced elementary algorithms to construct, access, update, and transform lists, and showed how to search, filter, and compute with them. Although most programming environments provide pre-defined tools and libraries to support lists, we should not simply treat them as black boxes. Rabhi and Lapalme introduce many functional list algorithms [72]. The Haskell library provides detailed documentation about the basic list algorithms. Bird gives good examples of folding [1], and introduces the fold fusion law.
Exercise 1.13
1.13.1. Design the iota (the Greek letter I) operator for list, below are the use cases:
• iota(..., n) = [1, 2, 3, ..., n];
• iota(m, n) = [m, m + 1, m + 2, ..., n], where m ≤ n;
2 Binary search tree
Array and list are typically considered the basic data structures; however, we'll see in chapter 12 that they are not necessarily easy to implement. In imperative settings, array is the most elementary data structure. It is possible to implement a linked-list using arrays (section 3.4), while in functional settings, the linked-list acts as the building block for arrays and other data structures. The binary search tree is another basic data structure. Jon Bentley gives a problem in Programming Pearls [2]: how to count the number of word occurrences in a text. Here is a solution:
void wordCount(Input in) {
Map<String, Int> map
while String w = read(in) {
map[w] = if map[w] == null then 1 else map[w] + 1
}
for var (w, c) in map {
print(w, ":", c)
}
}
2.1 Definition
The map is a binary search tree. Here we use the word as the key, and its occurrence
number as the value. This program is a typical application of binary search tree. Let
us firstly define the binary tree. A binary tree is either empty (∅)1 ; or contains 3 parts:
an element k, and two sub-trees called left(l) and right(r) children, denoted as (l, k, r).
A none empty binary tree consists of multiple nodes, each is either empty or stores the
element of type K. We define the type of the binary tree as T ree K. We say a node is a
leaf if both sub-trees are empty, otherwise it’s a branch node.
A binary search tree is a special binary tree whose elements are comparable2, and satisfies: for any non-empty node (l, k, r), all the keys in the left sub-tree < k, and k < any key in the right sub-tree. Figure 2.2 shows an example of a binary search tree. Comparing with fig. 2.1, we can see the different ordering. For this reason, we call the comparable element the key, and the augmented data the value. The type is Tree (K, V). A node contains a key, a value (optional), left and right sub-tree references, and a parent reference
for easy backtracking. When the context is clear, we skip the value. The appendix of this chapter includes example definitions.
1 The great mathematician André Weil invented this symbol for the null set. It comes from the Norwegian alphabet.
2 The ordering is abstract, not limited to magnitude: it can be precedence, the subset relation, etc. The 'less than' (<) here is abstract.
Figure 2.1: A binary tree (diagram omitted).
Figure 2.2: A binary search tree (diagram omitted).
2.2 Insert
When insert a key k (or with the value) to the binary search tree T , we need maintain
the ordering. If the tree is empty, create a leaf of k. Otherwise, let the tree be (l, x, r). If
k < x, insert it to the left sub-tree l; otherwise, insert to the right r. If k = x, it already
exists in the tree. We overwrite the value (update). Alternatively, we can append the
data or do nothing. We skip this case.
insert k ∅ = (∅, k, ∅)
insert k (l, x, r) =
    k < x:       (insert k l, x, r)
    otherwise:   (l, x, insert k r)    (2.1)
This implementation uses the pattern matching feature. There is an example without
pattern matching in the appendix. We can eliminate the recursion with iterative loops:
2.3. TRAVERSE 29
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: parent ← NIL
5: while T 6= NIL do
6: parent ← T
7: if k < Key(T ) then
8: T ← Left(T )
9: else
10: T ← Right(T )
11: Parent(x) ← parent
12: if parent = NIL then . T is empty
13: return x
14: else if k < Key(parent) then
15: Left(parent) ← x
16: else
17: Right(parent) ← x
18: return root
key ∅ = Nothing
key (l, k, r) = Just k    (2.2)
We can repeatedly insert every element from a list, converting the list to a binary search tree:
fromList [ ] = ∅
fromList (x:xs) = insert x (fromList xs)
2.3 Traverse
There are 3 ways to visit the elements in a binary tree: pre-order, in-order, and post-order. They are named to highlight the order of visiting the key between/before/after the sub-trees.
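The three traversals can be transcribed directly as a sketch in Haskell; the Tree definition here is only for self-containedness:

data Tree a = Empty | Node (Tree a) a (Tree a)

preorder Empty = []
preorder (Node l k r) = k : preorder l ++ preorder r         -- key before sub-trees

inorder Empty = []
inorder (Node l k r) = inorder l ++ [k] ++ inorder r         -- key between sub-trees

postorder Empty = []
postorder (Node l k r) = postorder l ++ postorder r ++ [k]   -- key after sub-trees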
Exercise 2.1
2.1.1. Given the in-order and pre-order traverse results, rebuild the tree, and output the
post-order traverse result. For example:
• Pre-order: 1, 2, 4, 3, 5, 6;
• In-order: 4, 2, 1, 5, 3, 6;
• Post-order: ?
2.1.2. Write a program to rebuild the binary tree from the pre-order and in-order traverse
lists.
2.1.3. For a binary search tree, prove that in-order traversal always gives an ordered list.
2.1.4. What is the complexity of tree sort?
2.1.5. Define toList with fold.
2.1.6. Define depth t with fold, to calculate the height of a binary tree.
2.4 Query
Because the binary search tree organises ordered elements recursively, it supports varies
of query efficiently. This is the reason we name it ‘search’ tree. There are three types
of query: (1) lookup a key; (2) find the minimum or maximum; (3) given a node, find
its predecessor or successor. When lookup the value of some key x in a tree of type
T ree (K, V ):
lookup x ∅ = Nothing
lookup x (l, (k, v), r) =
    k = x:       Just v
    x < k:       lookup x l
    otherwise:   lookup x r    (2.9)
We use the Maybe type3 to handle the 'not found' case. Let the height of the tree be h; the performance of lookup is O(h). If the tree is balanced (see chapter 4), the performance is O(lg n), where n is the number of elements. It degrades to O(n) time in the worst case for an extremely unbalanced tree. Below implementation eliminates the recursion with loops:
1: function Lookup(T, x)
2: while T 6= NIL and Key(T ) 6= x do
3: if x < Key(T ) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: return Value(T ) . returns ∅ if T =NIL
3 Also known as Optional<T> type, see chapter 1.
In a binary search tree, the smaller keys are on the left, while the greater keys are on the right. To locate the minimum, we keep going left till the left sub-tree is empty. Symmetrically, we keep going right to find the maximum. Both min/max are bound to O(h) time, where h is the height of the tree.
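As a sketch over the Tree type above (named minT/maxT to avoid clashing with the Prelude):

minT (Node Empty k _) = k      -- keep going left
minT (Node l _ _)     = minT l

maxT (Node _ k Empty) = k      -- keep going right
maxT (Node _ _ r)     = maxT r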
We sometimes traverse a binary search tree as a container: start from the minimum and keep moving forward step by step towards the maximum, or go back and forth. Below example program prints the elements in sorted order.
void printTree (Node<T> t) {
for var it = Iterator(t), it.hasNext(), it = it.next() {
print(it.get(), ", ")
}
}
Such use cases need to find the successor or predecessor of a node. Define the successor of x as the smallest y such that x < y. If x has a non-empty right sub-tree r, the minimum of r is the successor. As shown in fig. 2.3, to find the successor of 8, we search the minimum in its right sub-tree, which is 9. If the right sub-tree of x is empty, we back-track along the parent references till the closest ancestor whose left sub-tree is also an ancestor of x. In fig. 2.3, since node 2 does not have a right sub-tree, we go up to its parent, node 1. But node 2 is the right child of node 1, so we go up again and reach node 3. As node 2 lies in the left sub-tree of node 3, node 3 is the successor of node 2.
Figure 2.3: The successor of 8 is 9, the minimum of its right; for the successor of 2, we
go up to its parent 1, then 3.
If we finally reach the root along the parent path but still cannot find such an ancestor, then the node does not have a successor (it holds the last element). Below algorithm finds the successor of x:
1: function Succ(x)
2: if Right(x) 6= NIL then
3: return Min(Right(x))
4: else
5: p ← Parent(x)
6: while p 6= NIL and x = Right(p) do
7: x←p
8: p ← Parent(p)
9: return p
This algorithm returns NIL when x does not have a successor. The predecessor algorithm is symmetric:
1: function Pred(x)
2: if Left(x) 6= NIL then
3: return Max(Left(x))
4: else
5: p ← Parent(x)
6: while p 6= NIL and x = Left(p) do
7: x←p
8: p ← Parent(p)
9: return p
Purely functional settings don't use the parent reference4. Some implementations record the visited path for back-tracking or tree rebuilding, called a zipper (last chapter of [10]). The original purpose of Succ and Pred is to traverse the tree as a container. However, we typically in-order traverse the tree through map in functional settings; finding the successor and predecessor is only meaningful in imperative settings.
Exercise 2.2
2.5 Delete
We need to maintain the ordering while deleting: for any node (l, k, r), all keys on the left remain less than k, and all keys on the right remain greater than k after the delete. To delete x [6]: (1) if x is a leaf or has only one non-empty sub-tree, cut x off; (2) if x has two non-empty sub-trees, use the minimum y of its right sub-tree to replace x, then cut the original y off. Because the minimum of the right sub-tree cannot have two non-empty sub-trees, we eventually convert case 2 to case 1 and directly cut the minimum node off, as shown in figs. 2.4 to 2.6.
delete x ∅ = ∅
delete x (l, k, r) =
    x < k:   (delete x l, k, r)
    x > k:   (l, k, delete x r)
    x = k:   del l r    (2.11)
Where:
del ∅ r = r
del l ∅ = l
del l r = (l, y, delete y r), where y = min r    (2.12)
The performance of delete is O(h), where h is the height. The imperative implementation additionally needs to set the parent reference.
1: function Delete(T, x)
2: r←T
3: x0 ← x . save x
4: p ← Parent(x)
5: if Left(x) = NIL then
6: x ← Right(x)
7: else if Right(x) = NIL then
8: x ← Left(x)
9: else . neither sub-tree is empty
10: y ← Min(Right(x))
11: Key(x) ← Key(y)
12: Value(x) ← Value(y)
13: if Parent(y) 6= x then . y does not have left sub-tree
14: Left(Parent(y)) ← Right(y)
15: else . y is the root of the right sub-tree
16: Right(x) ← Right(y)
17: if Right(y) 6= NIL then
18: Parent(Right(y)) ← Parent(y)
19: Remove y
20: return r
21: if x 6= NIL then
22: Parent(x) ← p
23: if p = NIL then . remove the root
24: r←x
25: else
26: if Left(p) = x0 then
27: Left(p) ← x
28: else
29: Right(p) ← x
30: Remove x0
31: return r
Assume x is not empty. We first record the root, and copy references to x and its parent. When deleting, we also need to handle the special case that y is the root of the right sub-tree. Finally, we need to reset the stored parent if x has only one non-empty sub-tree. If the copied parent is empty, we are deleting the root, and we return the new root in this case. After setting the parent, we can safely remove x.
The performance of the binary search tree algorithms depends on the height h. When unbalanced, O(h) is close to O(n); for a well balanced tree, O(h) is close to O(lg n). Chapters 4 and 5 introduce self-balancing solutions. There is another simple balancing method: shuffle the elements, then build the tree [4]. It decreases the chance of a poorly balanced tree.
We can use the binary search tree to realize the map data structure (also known as associative data structure or dictionary). A finite map is a collection of key-value pairs. Each key is unique and mapped to some value. For keys of type K and values of type V, the type of the map is Map K V or Map<K, V>. A non-empty map contains n mappings {k1 ↦ v1, k2 ↦ v2, ..., kn ↦ vn}. When using the binary search tree to implement the map, we constrain K to be an ordered set. Every node stores a pair of key and value. The type of the tree is Tree (K, V). We use the tree insert/update operation to associate a key with a value. Given a key k, we use lookup to find the mapped value v, or return nothing or ∅ when k does not exist. The red-black tree and AVL tree in chapters 4 and 5 can also implement the map.
Exercise 2.3
2.3.1. There is a symmetric deletion algorithm. When neither sub-tree is empty, we
replace with the maximum of the left sub-tree, then cut it off. Write a program to
implement this solution.
2.3.2. Write a randomly building algorithm for binary search tree.
2.3.3. How to find the two nodes with the greatest distance in a binary tree?
Node(Node<T> l, T k, Node<T> r) {
left = l, key = k, right = r
if (left 6= null) then left.parent = this
if (right 6= null) then right.parent = this
}
}
foldt _ _ z Empty = z
foldt f g z (Node l k r) = g (foldt f g z l) (f k) (foldt f g z r)
fold _ z Empty = z
fold f z (Node l k r) = fold f (k `f` (fold f z r)) l
delete:
delete _ Empty = Empty
delete x (Node l k r) | x < k = Node (delete x l) k r
| x > k = Node l k (delete x r)
| otherwise = del l r
where
del Empty r = r
del l Empty = l
del l r = let k' = min r in Node l k' (delete k' r)
Chapter 3
Insertion sort
3.1 Introduction
Insertion sort is a straightforward sort algorithm1. We give its preliminary definition for lists in chapter 1. For a collection of comparable elements, we repeatedly pick one and insert it into a list, maintaining the ordering. As every insertion takes linear time, its performance is bound to O(n²), where n is the number of elements. This performance is not as good as divide and conquer sort algorithms, like quick sort and merge sort. However, it still finds application today. For example, a well tuned quick sort implementation falls back to insertion sort for small data sets. The idea of insertion sort is similar to sorting a deck of poker cards ([4] pp. 15). The cards are shuffled, and the player takes them one by one. At any time, all cards on hand are sorted. When drawing a new card, the player inserts it in the proper position according to the order of points, as shown in fig. 3.1.
1: function Sort(A)
2: for i ← 2 to |A| do
3:     ordered insert A[i] to A[1...(i − 1)]
Where the index i ranges from 1 to n = |A|. We start from 2, because the singleton sub-array A[1] is ordered. When processing the i-th element, all elements before i are sorted. We continuously insert elements till n, as shown in fig. 3.2.
3.2 Insertion
In chapter 1, we define the ordered insertion for lists. For arrays, we scan to locate the insert position either from the left or the right. Below algorithm scans from the right:
1: function Sort(A)
2: for i ← 2 to |A| do . Insert A[i] to A[1...(i − 1)]
3: x ← A[i] . Save A[i] to x
4: j ←i−1
5: while j > 0 and x < A[j] do
6: A[j + 1] ← A[j]
7: j ←j−1
8: A[j + 1] ← x
It's expensive to insert at an arbitrary position, as the array stores elements continuously. When inserting x at position i, we need to shift all elements after i (i.e. A[i + 1], A[i + 2], ...) to the right, then put x in the freed cell, as shown in fig. 3.3.
For an array of length n, suppose after comparing x to the first i elements, we locate the position to insert. Then we shift the remaining n − i + 1 elements, and put x in the i-th cell. Overall, we traverse the entire array if we scan from the left. If we scan from the right, we examine n − i + 1 elements and do the same amount of shifts. The insertion takes linear time no matter whether we scan from the left or the right, hence the sort algorithm is bound to O(n²). We can also define a separate Insert() function, and call it inside the loop.
Exercise 3.1
3.1.1. Implement the insert to scan from left to right.
3.1.2. Define the insert function, and call it from the sort algorithm.
3.4 List
With binary search, the total number of comparisons reduces to O(n lg n). However, as we need to shift array cells when inserting, the overall time is still bound to O(n²). On the other hand, when using a list, the insert operation is constant time at a given node reference. In chapter 1, we defined the insertion sort algorithm for lists as below:
sort [ ] = [ ]
sort (x:xs) = insert x (sort xs)    (3.1)
Or define with foldr in Curried form: sort = foldr insert [ ]. However, the list insert algorithm still takes linear time, because we need to scan to locate the insert position:
insert x [ ] = [x]
insert x (y:ys) =
    x ≤ y:       x : y : ys
    otherwise:   y : insert x ys    (3.2)
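Putting eqs. (3.1) and (3.2) together gives a complete runnable sketch in Haskell:

sort :: Ord a => [a] -> [a]
sort = foldr insert []
  where
    insert x [] = [x]
    insert x (y:ys)
      | x <= y    = x : y : ys        -- found the position
      | otherwise = y : insert x ys   -- keep scanning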
Instead of using node references, we can also realize the list through an additional index array. For every element A[i], Next[i] stores the index of the element that follows A[i], i.e. A[Next[i]] is the next element after A[i]. There are two special indexes: for the tail node A[m], we define Next[m] = −1, indicating it points to NIL; we also define Next[0] to index the head element. With the index array, we can implement the insertion algorithm as below:
1: function Insert(A, N ext, i)
2: j←0 . N ext[0] for head
3: while N ext[j] 6= −1 and A[N ext[j]] < A[i] do
4: j ← N ext[j]
5: N ext[i] ← N ext[j]
6: N ext[j] ← i
7: function Sort(A)
8: n ← |A|
9: N ext = [1, 2, ..., n, −1] . n + 1 indexes
10: for i ← 1 to n do
11: Insert(A, N ext, i)
12: return N ext
With the list, although the insert operation becomes constant time, we need to traverse the list to locate the position. It still takes O(n²) comparisons. Unlike an array, a list does not support random access, hence we cannot use binary search to speed up.
Exercise 3.2
3.2.1. For the index array based list, we return the re-arranged index as result. Design
an algorithm to re-order the original array A from the index N ext.
Chapter 4
Red-black tree
As in the example in chapter 2, we use the binary search tree as a dictionary to count word occurrences. One may want to feed it an address book, and use the binary search tree to look up contacts, for example:
void addrBook(Input in) {
Map<String, String> dict
while (String name, String addr) = read(in) {
dict[name] = addr
}
loop {
string name = read(Console)
var addr = dict[name]
if (addr == null) {
print("not found")
} else {
print("address: ", addr)
}
}
}
Unlike the word counter program, this one performs poorly, especially when searching names like Zara, Zed, Zulu, etc. This is because the address entries are typically in lexicographic order. If we insert the numbers 1, 2, 3, ..., n to a binary search tree, it ends up like fig. 4.1: an extremely unbalanced binary search tree. The lookup is bound to O(h) time for a tree of height h. When the tree is well balanced, the performance is O(lg n), where n is the number of elements. But in this extreme case, the performance downgrades to O(n), the same as a list scan.
Figure 4.1: Inserting 1, 2, ..., n gives an extremely unbalanced tree (diagram omitted).
Exercise 4.1
4.1.1. For a big address book in lexicographic order, one may want to speed up with two concurrent tasks: one reads from the head, the other from the tail. They meet and stop at some middle point. What does the binary search tree look like? What if we split the list into multiple sections to scale up the concurrency?
Figure 4.2: Several extremely unbalanced trees (diagrams omitted).
4.1 Balance
To avoid extremely unbalanced tree, we can shuffle the input(section 2.5), however, we can
not randomize interactive input (e.g. entered by user). Most re-balancing solutions rely
on the tree rotation. It changes the tree structure while maintain the elements ordering.
This chapter introduces the red-black tree, a popular self-balancing binary search tree.
Next chapter is about AVL tree, another self-balancing tree. Chapter 8 introduces the
splay tree. It adjusts the tree in steps. Multiple binary trees can have the same in-order
traverse result. Figure 4.3 shows the tree rotation. We can define them with pattern
matching:
Figure 4.3: 'Left rotate' and 'right rotate' (diagram omitted).
1: function Left-Rotate(T, x)
2: p ← Parent(x)
3: y ← Right(x) . assume y 6= NIL
4: a ← Left(x)
5: b ← Left(y)
6: c ← Right(y)
7: Replace(x, y) . replace node x with y
8: Set-Subtrees(x, a, b) . Set a, b as the sub-trees of x
9: Set-Subtrees(y, x, c) . Set x, c as the sub-trees of y
10: if p = NIL then . x was the root
11: T ←y
12: return T
The Right-Rotate is symmetric (as Exercise 4.2). The Replace(x, y) uses node y
to replace x:
1: function Replace(x, y)
2: p ← Parent(x)
3: if p = NIL then . x is the root
4: if y 6= NIL then Parent(y) ← NIL
5: else if Left(p) = x then
6: Set-Left(p, y)
7: else
8: Set-Right(p, y)
9: Parent(x) ← NIL
Procedure Set-Subtrees(x, L, R) assigns L as the left, and R as the right sub-trees
of x:
1: function Set-Subtrees(x, L, R)
2: Set-Left(x, L)
3: Set-Right(x, R)
It further calls Set-Left and Set-Right to set the two sub-trees:
1: function Set-Left(x, y)
2: Left(x) ← y
3: if y 6= NIL then Parent(y) ← x
4: function Set-Right(x, y)
5: Right(x) ← y
6: if y 6= NIL then Parent(y) ← x
We can see how pattern matching simplifies the tree rotation. Based on this idea,
Okasaki developed the purely functional algorithm for red-black tree in 1995 [13] .
Exercise 4.2
4.2 Definition
A red-black tree is a self-balancing binary search tree [14]. It is equivalent to the 2-3-4 tree1. By coloring each node red or black and performing rotations, the red-black tree provides an efficient way to keep the tree balanced. On top of the binary search tree definition, we label each node with a color. We say it is a red-black tree if the coloring satisfies the following 5 rules ([4] pp. 273):
1. Every node is either red or black.
2. The root is black.
3. Every NIL node is black.
4. If a node is red, then both sub-trees are black.
5. For every node, all paths from it to descendant leaves contain the same number of
black nodes.
Why do these rules keep the red-black tree balanced? The key point is that the longest path from the root to a leaf cannot exceed 2 times the shortest path. Consider rule 4: there cannot be two adjacent red nodes, hence the shortest path contains only black nodes, and any longer path must contain red ones in addition. Further, rule 5 ensures all paths from the root contain the same number of black nodes. Together they guarantee that no path can exceed 2 times the length of any other [14]. Figure 4.4 gives an example of a red-black tree.
Figure 4.4: A red-black tree (diagram omitted).
As all NIL nodes are black, we can hide them, as shown in fig. 4.5. All operations including lookup and min/max are the same as for the binary search tree. However, insert and delete are special, as we need to maintain the coloring rules. Below example program adds the color variable atop the binary search tree definition. Denote the empty tree as ∅, and a non-empty tree as (c, l, k, r), where c is the color (red/black), k is the element, and l and r are the left and right sub-trees.
data Color = R | B
data RBTree a = Empty | Node Color (RBTree a) a (RBTree a)
Exercise 4.3
4.3.1. Prove the height h of a red-black tree of n nodes is at most 2 lg(n + 1)
1 Chapter 7, B-tree. For any 2-3-4 tree, there is at least one red-black tree with the same ordered data.
Figure 4.5: The red-black tree with the NIL nodes hidden (diagram omitted).
4.3 Insert
The insert operation takes two steps. The first is the same as for the binary search tree. The second is to restore the coloring if it breaks the rules. We always color the new element red unless it is the root; this breaks no coloring rule except the 4th (it may bring two adjacent red nodes). There are 4 cases that violate rule 4. They all share the same structure after fixing [13], as shown in fig. 4.6.
Figure 4.6: The 4 cases that violate rule 4 transform to the same uniform structure (diagram omitted).
All 4 transformations move the redness one level up. When fixing recursively bottom-up, we may color the root red, violating rule 2; we finally revert the root back to black. Define a balance function to fix the coloring with pattern matching. Denote the color as C, with values black B and red R:

balance B (R, (R, a, x, b), y, c) z d = (R, (B, a, x, b), y, (B, c, z, d))
balance B (R, a, x, (R, b, y, c)) z d = (R, (B, a, x, b), y, (B, c, z, d))
balance B a x (R, b, y, (R, c, z, d)) = (R, (B, a, x, b), y, (B, c, z, d))
balance B a x (R, (R, b, y, c), z, d) = (R, (B, a, x, b), y, (B, c, z, d))
balance C l k r = (C, l, k, r)    (4.2)

If none of the 4 patterns matches, the last row leaves the tree unchanged. Define insert x T = makeBlack (ins x T), or in Curried form:

insert x = makeBlack ◦ ins x    (4.3)
Where:
ins x ∅ = (R, ∅, x, ∅)
ins x (C, l, k, r) =
    x < k:   balance C (ins x l) k r
    x > k:   balance C l k (ins x r)    (4.4)
If the tree is empty, we create a red leaf of x; otherwise, we compare x and k, and recursively insert x to a sub-tree. After that, we call balance to fix the coloring, and finally force the root to be black.
makeBlack (C, l, k, r) = (B, l, k, r) (4.5)
Below is the example program:
insert x = makeBlack ◦ (ins x) where
ins x Empty = Node R Empty x Empty
ins x (Node color l k r)
| x < k = balance color (ins x l) k r
| otherwise = balance color l k (ins x r)
makeBlack (Node _ l k r) = Node B l k r
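To build a red-black tree from a list, we can fold insert over the elements; this fromList helper is our addition, mirroring the one for the plain binary search tree:

fromList :: Ord a => [a] -> RBTree a
fromList = foldr insert Empty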
We skip handling duplicated keys: if the key already exists, we can overwrite it, drop it, or store the values in a list ([4], pp. 269). Figure 4.7 shows two red-black trees built from the sequences 11, 2, 14, 1, 7, 15, 5, 8, 4 and 1, 2, ..., 8. The second one is well balanced even for ordered input.
Figure 4.7: Red-black trees built from the two sequences (diagrams omitted).
The insert performs top-down recursive fixing. It is bound to O(h) time, where h is the height. As the coloring rules are maintained, h is logarithmic in n, the number of elements. The overall performance is O(lg n).
Exercise 4.4
4.4.1. Implement the insert without pattern matching, handle the 4 cases separately.
4.4 Delete
Delete is more complex than insert. We can simplify the recursive implementation with pattern matching2. There are alternative implementations that mimic delete: build a read-only tree for frequent lookups [5]; when deleting a node, mark it with a flag, and trigger
2 Actually, we reuse the unchanged part to rebuild the tree in purely functional settings, known as the 'persistent' feature.
tree rebuilding when such marked nodes exceed 50%. Delete may also violate the coloring rules, hence it needs fixing. The violation only happens when deleting a black node, which breaks rule 5: the black nodes along the path decrease by one, so not all paths contain the same number of black nodes anymore. To resume the blackness, we introduce a special 'doubly-black' node ([4], pp. 290). Such a node is counted as 2 black nodes. When deleting a black node x, we move the blackness up to the parent or down to a sub-tree. Let node y accept the blackness. If y was red, turn it black; if y was already black, turn it 'doubly-black', denoted B². Below example program adds the 'doubly-black' to the color definition.
data Color = R | B | BB
data RBTree a = Empty | BBEmpty | Node Color (RBTree a) a (RBTree a)
Because NIL is black, when we push the blackness down to NIL, it becomes the 'doubly-black' empty (BBEmpty, denoted ∅²). The first step is the normal binary search tree delete; then, if we cut a black node off, we shift the blackness and fix the coloring (in Curried form):

delete x = makeBlack ◦ del x    (4.6)
When we delete from a singleton tree, it becomes empty. To cover this case, we modify makeBlack as:

makeBlack ∅ = ∅
makeBlack (C, l, k, r) = (B, l, k, r)    (4.7)
del x ∅ = ∅
del x (C, l, k, r) =
    x < k:   fixB² (C, del x l, k, r)
    x > k:   fixB² (C, l, k, del x r)
    x = k:
        l = ∅:        if C = B then shiftB r else r
        r = ∅:        if C = B then shiftB l else l
        otherwise:    fixB² (C, l, m, del m r), where m = min r    (4.8)
When the tree is empty, the result is ∅; otherwise, we compare x and k. If x < k, we recursively delete from the left, otherwise from the right. Because the result may contain a doubly-black node, we apply fixB². When x = k, we have found the node to cut. If either sub-tree is empty, we replace the node with the non-empty one, then shift the blackness if the node was black. If neither sub-tree is empty, we cut the minimum m = min r off, and use m to replace k. To preserve the blackness, shiftB makes a black node doubly-black, and forces it to black otherwise. Applied twice, it flips a doubly-black back to normal black.
shiftB (B, l, k, r) = (B², l, k, r)
shiftB (C, l, k, r) = (B, l, k, r)
shiftB ∅ = ∅²
shiftB ∅² = ∅    (4.9)
Below example program updates makeBlack accordingly:
makeBlack (Node _ l k r) = Node B l k r
makeBlack _ = Empty
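The shiftB function of eq. (4.9) transcribes similarly; shiftBlack is the name used by the fixDB program below:

shiftBlack (Node B l k r) = Node BB l k r
shiftBlack (Node _ l k r) = Node B l k r
shiftBlack Empty = BBEmpty
shiftBlack BBEmpty = Empty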
The fixB² function eliminates the doubly-black by rotation and re-coloring. The doubly-black node can be a branch or the empty ∅². There are three cases:
Case 1. The sibling of the doubly-black node is black, and it has a red sub-tree. We can fix this case with a rotation. There are 4 sub-cases, all transforming to the same pattern, as shown in fig. 4.8.
Figure 4.8: Case 1, the doubly-black node has a black sibling with a red sub-tree; all 4 sub-cases transform to the same pattern (diagram omitted).
fixB² C aᴮ² x (B, (R, b, y, c), z, d) = (C, (B, shiftB a, x, b), y, (B, c, z, d))
fixB² C aᴮ² x (B, b, y, (R, c, z, d)) = (C, (B, shiftB a, x, b), y, (B, c, z, d))
fixB² C (B, a, x, (R, b, y, c)) z dᴮ² = (C, (B, a, x, b), y, (B, c, z, shiftB d))
fixB² C (B, (R, a, x, b), y, c) z dᴮ² = (C, (B, a, x, b), y, (B, c, z, shiftB d))    (4.10)
Where aᴮ² means node a is doubly-black.
Case 2. The sibling of the doubly-black node is red. We can rotate the tree to turn it into case 1 or 3, as shown in fig. 4.9. We add this fixing as 2 additional rows to eq. (4.10):
...
fixB² B aᴮ² x (R, b, y, c) = fixB² B (fixB² R a x b) y c
fixB² B (R, a, x, b) y cᴮ² = fixB² B a x (fixB² R b y c)    (4.11)
Case 3. The sibling of the doubly-black node and its two sub-trees are all black. In this case, we change the sibling to red, flip the doubly-black node back to black, and propagate the doubly-blackness a level up to the parent, as shown in fig. 4.10. There are two symmetric sub-cases. For the upper case, x was either red or black; x changes to black if it was red, otherwise it changes to doubly-black. The same coloring change applies to y in the lower case. We add this fixing to eq. (4.11):
Figures 4.9 and 4.10: Case 2 rotates the tree into case 1 or 3; case 3 re-colors the sibling red and propagates the doubly-blackness up (diagrams omitted).
...
fixB² C aᴮ² x (B, b, y, c) = shiftB (C, shiftB a, x, (R, b, y, c))
fixB² C (B, a, x, b) y cᴮ² = shiftB (C, (R, a, x, b), y, shiftB c)
fixB² C l k r = (C, l, k, r)    (4.12)
If none of the patterns matches, the last row keeps the node unchanged. The doubly-black fixing is recursive. It terminates in two ways: either case 1 eliminates the doubly-black node, or the blackness moves up till the root, where we finally force the root to be black. Below example program puts all three cases together:
fixDB color a@(Node BB _ _ _) x (Node B (Node R b y c) z d)
= Node color (Node B (shiftBlack a) x b) y (Node B c z d)
fixDB color BBEmpty x (Node B (Node R b y c) z d)
= Node color (Node B Empty x b) y (Node B c z d)
fixDB color a@(Node BB _ _ _) x (Node B b y (Node R c z d))
= Node color (Node B (shiftBlack a) x b) y (Node B c z d)
fixDB color BBEmpty x (Node B b y (Node R c z d))
= Node color (Node B Empty x b) y (Node B c z d)
fixDB color (Node B a x (Node R b y c)) z d@(Node BB _ _ _)
= Node color (Node B a x b) y (Node B c z (shiftBlack d))
fixDB color (Node B a x (Node R b y c)) z BBEmpty
= Node color (Node B a x b) y (Node B c z Empty)
fixDB color (Node B (Node R a x b) y c) z d@(Node BB _ _ _)
= Node color (Node B a x b) y (Node B c z (shiftBlack d))
fixDB color (Node B (Node R a x b) y c) z BBEmpty
= Node color (Node B a x b) y (Node B c z Empty)
fixDB B a@(Node BB _ _ _) x (Node R b y c)
= fixDB B (fixDB R a x b) y c
fixDB B a@BBEmpty x (Node R b y c)
= fixDB B (fixDB R a x b) y c
fixDB B (Node R a x b) y c@(Node BB _ _ _)
= fixDB B a x (fixDB R b y c)
fixDB B (Node R a x b) y c@BBEmpty
= fixDB B a x (fixDB R b y c)
fixDB color a@(Node BB _ _ _) x (Node B b y c)
= shiftBlack (Node color (shiftBlack a) x (Node R b y c))
fixDB color BBEmpty x (Node B b y c)
= shiftBlack (Node color Empty x (Node R b y c))
fixDB color (Node B a x b) y c@(Node BB _ _ _)
= shiftBlack (Node color (Node R a x b) y (shiftBlack c))
fixDB color (Node B a x b) y BBEmpty
= shiftBlack (Node color (Node R a x b) y Empty)
fixDB color l k r = Node color l k r
The delete algorithm is bound to O(h) time, where h is the height of the tree. As the red-black tree maintains its balance, h = O(lg n) for n nodes.
Exercise 4.5
4.5.1. Implement the ‘mark-rebuild’ delete algorithm: mark the node as deleted without
actually removing it. When the marked nodes exceed 50%, rebuild the tree.
Below is the imperative insert algorithm:
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: Color(x) ← RED
5: p ← NIL
6: while T 6= NIL do
7: p←T
8: if k < Key(T ) then
9: T ← Left(T )
10: else
11: T ← Right(T )
12: Parent(x) ← p
13: if p = NIL then . tree T is empty
14: return x
15: else if k < Key(p) then
16: Left(p) ← x
17: else
18: Right(p) ← x
19: return Insert-Fix(root, x)
We make the new node red, and then perform fixing before returning. There are 3 basic cases, each with a symmetric case, hence 6 cases in total. Among them, two can be merged, because both have a red 'uncle' node: we change the parent and uncle to black, and set the grandparent to red:
1: function Insert-Fix(T, x)
2: while Parent(x) 6= NIL and Color(Parent(x)) = RED do
3: if Color(Uncle(x)) = RED then . Case 1, x’s uncle is red
4: Color(Parent(x)) ← BLACK
5: Color(Grand-Parent(x)) ← RED
6: Color(Uncle(x)) ← BLACK
7: x ← Grand-Parent(x)
8: else . x’s uncle is black
9: if Parent(x) = Left(Grand-Parent(x)) then
10: if x = Right(Parent(x)) then . Case 2, x is on the right
11: x ← Parent(x)
12: T ← Left-Rotate(T, x)
. Case 3, x is on the left
13: Color(Parent(x)) ← BLACK
14: Color(Grand-Parent(x)) ← RED
15: T ← Right-Rotate(T , Grand-Parent(x))
16: else
17: if x = Left(Parent(x)) then . Case 2, Symmetric
18: x ← Parent(x)
19: T ← Right-Rotate(T, x)
. Case 3, Symmetric
20: Color(Parent(x)) ← BLACK
21: Color(Grand-Parent(x)) ← RED
22: T ← Left-Rotate(T , Grand-Parent(x))
23: Color(T ) ← BLACK
24: return T
This algorithm takes O(lg n) time to insert a key, where n is the number of nodes. Compared to the balance function defined previously, it has different logic: even for the same sequence of keys, they build different red-black trees, as shown in fig. 4.11 and fig. 4.7. There is a bit of performance overhead in the pattern matching algorithm.
Figure 4.11: Red-black trees built by the imperative algorithm from the same sequences (diagrams omitted).
Self setLeft(l) {
left = l
if l 6= null then l.parent = this
}
Self setRight(r) {
right = r
if r 6= null then r.parent = this
}
Node<T> insert(Node<T> t, T key) {
root = t
x = Node(key)
parent = null
while (t 6= null) {
parent = t
t = if (key < t.key) then t.left else t.right
}
if (parent == null) { //tree is empty
root = x
} else if (key < parent.key) {
parent.setLeft(x)
} else {
parent.setRight(x)
}
return insertFix(root, x)
}
Chapter 5
AVL tree
The idea of the red-black tree is to limit the number of nodes along a path within a range. The AVL tree takes a direct approach: quantify the difference between branches. For a node T, define:

δ(T) = |r| − |l|    (5.1)

Where |T| is the height of tree T, and l and r are the left and right sub-trees. Define δ(∅) = 0 for the empty tree. If δ(T) = 0 for every node T, the tree is definitely balanced. For example, a complete binary tree of height h has n = 2^h − 1 nodes, and there are no empty branches except at the leaves. The smaller |δ(T)| is, the more balanced the sub-trees are. We call δ(T) the balance factor of a binary tree.
5.1 Definition
Figure 5.1: An AVL tree (diagram omitted).
An AVL tree is a binary search tree in which the balance factor of every sub-tree T satisfies:
|δ(T)| ≤ 1    (5.2)
There are three valid values of δ(T): ±1 and 0. Figure 5.1 shows an AVL tree. This definition ensures the tree height h = O(lg n), where n is the number of nodes. Let's prove it. For an AVL tree of height h, the number of nodes varies: there are at most 2^h − 1 nodes (the complete binary tree case). We are interested in the minimum. Let the minimum number of nodes be N(h).
Figure 5.2 shows an AVL tree T of height h. It contains three parts: the key k, and two sub-trees l and r.
Figure 5.2: An AVL tree of height h. The height of one sub-tree is h − 1, the other is no less than h − 2.
There must be a sub-tree of height h − 1. From the definition, ||l| − |r|| ≤ 1 holds, hence the height of the other sub-tree cannot be lower than h − 2. The total number of nodes in T is the sum of both sub-trees plus 1 (for the root):
N(h) = N(h − 1) + N(h − 2) + 1    (5.4)
Define N′(h) = N(h) + 1. Adding 1 to both sides of eq. (5.4) gives the Fibonacci relation:
N′(h) = N′(h − 1) + N′(h − 2)    (5.5)
Lemma 5.1.1. Let N(h) be the minimum number of nodes for an AVL tree of height h, and N′(h) = N(h) + 1, then
N′(h) ≥ φ^h    (5.6)
Where φ = (√5 + 1)/2 is the golden ratio.
Proof. For h = 0 or 1, we have:
• h = 0: N′(0) = 1 ≥ φ⁰ = 1
• h = 1: N′(1) = 2 ≥ φ¹ = 1.618...
For the induction case, assume N′(h) ≥ φ^h:
N′(h + 1) = N′(h) + N′(h − 1)    {Fibonacci relation}
          ≥ φ^h + φ^(h−1)        {induction hypothesis}
          = φ^(h−1) (φ + 1)      {φ + 1 = φ² = (3 + √5)/2}
          = φ^(h+1)
Hence n + 1 ≥ φ^h, i.e., the height is bounded by h ≤ log_φ(n + 1) = O(lg n), indicating the AVL tree is balanced. When we insert or delete, the balance factor may exceed the valid range, and we need fixing to resume |δ| ≤ 1. Traditionally, the fixing is through tree rotations. We simplify the
implementation with pattern matching. The idea is similar to the functional red-black tree [13]. Because of this 'modify-fix' approach, the AVL tree is also self-balancing. We can re-use the binary search tree definition. Although the balance factor δ can be computed recursively, we record it inside each node as T = (l, k, r, δ), and update it when we mutate the tree1. Below example program adds δ as an Int:
data AVLTree a = Empty | Br (AVLTree a) a (AVLTree a) Int
For the AVL tree, lookup, max, and min are the same as for the binary search tree. We focus on the insert and delete algorithms.
5.2 Insert
When we insert a new element x, some |δ(T)| may exceed 1. For those sub-trees which are ancestors of x, the height may increase by at most 1. We need to recursively update the balance factor along the path of insertion. Define the insert result as a pair (T′, ΔH), where T′ is the updated tree and ΔH is the increase in height. We modify the binary search tree insert function as below:
insert x = fst ◦ ins x (5.8)
Where fst (a, b) = a returns the first component of a pair. ins x T inserts element x
into tree T :
ins x ∅ = ((∅, x, ∅, 0), 1)
ins x (l, k, r, δ) =
    x < k:   tree (ins x l) k (r, 0) δ
    x > k:   tree (l, 0) k (ins x r) δ    (5.9)
If the tree is empty ∅, the result is a leaf of x with balance factor 0, and the height increases to 1. Otherwise let T = (l, k, r, δ). We compare x with k: if x < k, we recursively insert x to l, otherwise to r. As the recursive insert result is a pair (l′, Δl) or (r′, Δr), we adjust the balance factor and update the tree height through function tree. It takes 4 parameters: (l′, Δl), k, (r′, Δr), and δ. The result is (T′, ΔH), where T′ is the new tree, and ΔH is defined as:
ΔH = |T′| − |T|    (5.10)
We can further expand it into 4 cases:
ΔH = |T′| − |T|
   = 1 + max(|r′|, |l′|) − (1 + max(|r|, |l|))
   = max(|r′|, |l′|) − max(|r|, |l|)
   =   δ ≥ 0, δ′ ≥ 0:   Δr
       δ ≤ 0, δ′ ≥ 0:   δ + Δr
       δ ≥ 0, δ′ ≤ 0:   Δl − δ
       otherwise:       Δl    (5.11)
Where δ′ = δ(T′) = |r′| − |l′| is the updated balance factor. Appendix B provides the proof. We need to determine δ′ before the balance adjustment:
δ′ = |r′| − |l′|
   = |r| + Δr − (|l| + Δl)
   = |r| − |l| + Δr − Δl
   = δ + Δr − Δl    (5.12)
1 Alternatively, we can record the height instead of δ [20] .
With the changes in height and balance factor, we can define the tree function in
eq. (5.9):
tree (l0 , ∆l) k (r0 , ∆r) δ = balance (l0 , k, r0 , δ 0 ) ∆H (5.13)
Below example program implements what we deduced so far:
insert x = fst ◦ ins x where
ins x Empty = (Br Empty x Empty 0, 1)
ins x (Br l k r d)
| x < k = tree (ins x l) k (r, 0) d
| x > k = tree (l, 0) k (ins x r) d
tree (l, dl) k (r, dr) d = balance (Br l k r d') deltaH where
d' = d + dr - dl
deltaH | d ≥ 0 && d' ≥ 0 = dr
| d ≤ 0 && d' ≥ 0 = d + dr
| d ≥ 0 && d' ≤ 0 = dl - d
| otherwise = dl
5.2.1 Balance
There are 4 cases that need fixing, as shown in fig. 5.3. The balance factor is ±2, exceeding the range [−1, 1]. We adjust them to a uniform structure in the center, with δ(y) = 0.
Figure 5.3: The 4 fixing cases (left-left, right-right, right-left, left-right) transform to the uniform structure with δ(y) = 0 (diagram omitted).
We call the 4 cases: left-left, right-right, right-left, and left-right. Denote the balance factors before fixing as δ(x), δ(y), and δ(z); after fixing, they change to δ′(x), δ′(y) = 0, and δ′(z) respectively. The values of δ′(x) and δ′(z) are given below; Appendix B gives the proof.
Left-left:           Right-right:
δ′(x) = δ(x)         δ′(x) = 0
δ′(y) = 0            δ′(y) = 0
δ′(z) = 0            δ′(z) = δ(z)    (5.14)
The performance of insert is proportional to the height of the tree. From eq. (5.7), it is bound to O(lg n).
5.2.2 Verification
To validate an AVL tree, we need to verify two things: (1) it is a binary search tree; (2) for every sub-tree T, eq. (5.2), |δ(T)| ≤ 1, holds. Below function examines the height difference between the two sub-trees recursively:
avl? ∅ = True
avl? (l, k, r) = avl? l ∧ avl? r ∧ |δ| ≤ 1, where δ = |r| − |l|    (5.17)
Where the height is defined as:
|∅| = 0
|(l, k, r, δ)| = 1 + max(|r|, |l|)    (5.18)
height Empty = 0
height (Br l _ r _) = 1 + max (height l) (height r)
Exercise 5.1
5.1.1. We only give the algorithm to test the AVL height property. Complete the program to test whether a binary tree is an AVL tree.
5.3 Imperative algorithm
When fixing bottom-up after an insert, let δ be the balance factor of a node before the fix and δ′ after. There are three cases:
• |δ| = 1, |δ′| = 0: the new node makes the tree well balanced; the height of the parent keeps unchanged.
• |δ| = 0, |δ′| = 1: either the left or the right sub-tree increases its height; we need to go on checking the upper level.
• |δ| = 1, |δ′| = 2: we need to rotate the tree to fix the balance factor.
1: function AVL-Insert-Fix(T, x)
2: while Parent(x) 6= NIL do
3: P ← Parent(x)
4: L ← Left(x)
5: R ← Right(x)
6: δ ← δ(P )
7: if x = Left(P ) then
8: δ0 ← δ − 1
9: else
10: δ0 ← δ + 1
11: δ(P ) ← δ 0
The red-black tree performs better where there are frequent insertions and removals; many popular self-balancing binary search tree libraries are implemented on top of it. The AVL tree also provides an intuitive and effective solution to the balance problem.
Chapter 6
Radix tree
The binary search tree stores data in nodes. Can we use the edges to carry information? Radix trees, including the trie, prefix tree, and suffix tree, are data structures developed from this idea in the 1960s. They are widely used in compiler design [21] and bio-information processing, like DNA pattern matching [23].
Figure 6.1: A radix tree containing the bit keys 1011, 10, 011, 100, and 0 (diagram omitted).
Figure 6.1 shows a radix tree. It contains the bit strings 1011, 10, 011, 100, and 0. When looking up a key k = (b₀b₁...bₙ)₂, we take the first bit b₀ (MSB from the left) and check whether it is 0 or 1: for 0, turn left; otherwise turn right. Then take the second bit and repeat, till we either reach a leaf node or finish all n bits. We needn't store the keys in the nodes because they are represented by the edges. The nodes labelled with keys in fig. 6.1 are for illustration purposes. For integer keys, we represent them in binary format, and implement lookup with bit-wise manipulations.
A binary trie treats bit 0 as 'go left' and bit 1 as 'go right' [21]. Consider the binary trie in fig. 6.2: the three keys "11", "011", and "0011" all equal 3.
Figure 6.2: A big-endian binary trie where the keys "11", "011", and "0011" all lead to the value for 3 (diagram omitted).
It is inefficient to treat the prefix zeros as valid bits: we would need a tree of 32 levels to insert 1 as a 32-bit integer. Okasaki suggests using little-endian integers instead [21]: 1 is represented as bits (1)₂, 2 as (01)₂, 3 as (11)₂, and so on.
6.1.1 Definition
Re-using the binary tree definition, a node is either empty, or a branch containing the left and right sub-trees and an optional value v. The left sub-tree l is encoded as 0 and the right r is encoded as 1.
data IntTrie a = Empty | Branch (IntTrie a) (Maybe a) (IntTrie a)
Given a node, the corresponding integer key is uniquely determined by its position. That is the reason we need not save the key, but only the value, in the node. The type of the tree is IntTrie A, where A is the type of the value.
6.1.2 Insert
When inserting a key k with a value x, we convert the integer k to binary. If k is even, the lowest bit is 0 and we recursively insert to the left sub-tree; otherwise k is odd, the lowest bit is 1, and we insert to the right. We then divide k by 2 to remove the lowest bit. For a non-empty trie T = (l, v, r), the function insert is defined as below:

insert k x ∅ = insert k x (∅, Nothing, ∅)
insert 0 x (l, v, r) = (l, Just x, r)
insert k x (l, v, r) =
    even(k):     (insert ⌊k/2⌋ x l, v, r)
    otherwise:   (l, v, insert ⌊k/2⌋ x r)
Figure 6.3: A little-endian integer trie mapping 1↦a, 4↦b, 5↦c, 9↦d (diagram omitted).
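A direct Haskell transcription of insert, as a sketch over the IntTrie definition above; `div` by 2 drops the lowest bit:

insert :: Int -> a -> IntTrie a -> IntTrie a
insert k x Empty = insert k x (Branch Empty Nothing Empty)
insert 0 x (Branch l _ r) = Branch l (Just x) r
insert k x (Branch l v r)
  | even k    = Branch (insert (k `div` 2) x l) v r   -- lowest bit 0: go left
  | otherwise = Branch l v (insert (k `div` 2) x r)   -- lowest bit 1: go right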
We can test even/odd with the remainder modulo 2: even(k) = (k mod 2 = 0), or use a bit-wise operation, like (k & 0x1) == 0. We can eliminate the recursion through loops to realize an iterative implementation:
1: function Insert(T, k, x)
2: if T = NIL then
3: T ← Empty-Node . (NIL, Nothing, NIL)
4: p←T
5: while k 6= 0 do
6: if Even?(k) then
7: if Left(p) = NIL then
8: Left(p) ← Empty-Node
9: p ← Left(p)
10: else
11: if Right(p) = NIL then
12: Right(p) ← Empty-Node
13: p ← Right(p)
14: k ← bk/2c
15: Value(p) ← x
16: return T
Insert takes a trie T, a key k, and a value x. For an integer k of m bits, it goes down m levels, so the performance is bound to O(m). We design insert k x T and Insert(T, k, x) to be symmetric: apply foldr to the former, and foldl (or for-loops) to the latter, to convert a list of key-value pairs to a tree. For example:

fromList = foldr (uncurry insert) ∅

The usage is fromList [(1, a), (4, b), (5, c), (9, d)], where uncurry is the revert of Currying; it unpacks a pair and feeds the components to insert.
6.1.3 Lookup
When looking up key k: if k = 0, the root holds the target value. Otherwise, we check the lowest bit, then recursively look up the left or right sub-tree accordingly.
lookup k ∅ = Nothing
lookup 0 (l, v, r) = v
lookup k (l, v, r) =
    even(k):     lookup ⌊k/2⌋ l
    otherwise:   lookup ⌊k/2⌋ r    (6.4)
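And a Haskell sketch of eq. (6.4) over the same IntTrie type:

lookup :: Int -> IntTrie a -> Maybe a
lookup _ Empty = Nothing
lookup 0 (Branch _ v _) = v
lookup k (Branch l _ r)
  | even k    = lookup (k `div` 2) l
  | otherwise = lookup (k `div` 2) r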
We can eliminate the recursion to implement the iterative lookup:
1: function Lookup(T, k)
2: while k 6= 0 and T 6=NIL do
3: if Even?(k) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: k ← bk/2c
8: if T 6= NIL then
9: return Value(T )
10: else
11: return NIL
The lookup function is bound to O(m) time, where m is the number of bits of k.
Exercise 6.1
6.1.1. Can we change the definition from Branch (IntTrie a) (Maybe a) (Int-
Trie a) to Branch (IntTrie a) a (IntTrie a), and return Nothing if
the value does not exist, and Just v otherwise?
6.2.1 Definition
The integer prefix tree is a special binary tree. It is either empty ∅, a leaf (k, v) that contains an integer key k and a value v, or a branch whose left and right sub-trees share the longest common prefix bits of their keys. For the left sub-tree, the next bit is 0; for the right, it is 1. Denote the branch as (p, m, l, r). Below example program defines the integer prefix tree. The branch node contains 4 components: the longest prefix p, a mask integer m indicating from which bit the sub-trees branch out, and the left and right sub-trees l and r.
Figure 6.4: An integer prefix tree mapping 1↦a, 4↦b, 5↦c, 9↦d (diagram omitted).
The mask m = 2ⁿ for some integer n ≥ 0. All bits below bit n do not belong to the common prefix.
data IntTree a = Empty
| Leaf Int a
| Branch Int Int (IntTree a) (IntTree a)
6.2.2 Insert
When inserting integer y to tree T: if T is empty, we create a leaf of y; if T is a singleton leaf of x, we create a new leaf of y and a branch, setting x and y as the two sub-trees. To determine which of x and y goes on the left, we find their longest common prefix p. For example, if x = 12 = (1100)₂ and y = 15 = (1111)₂, then p = (11oo)₂, where o denotes the bits we don't care about. We can use another integer m to mask those bits; in this example, m = 4 = (100)₂. The next bit after p represents 2¹. It is 0 in x, and 1 in y. Hence we set x as the left sub-tree and y as the right in fig. 6.5.
Figure 6.5: Insert 15 = (1111)₂ next to 12 = (1100)₂: prefix = 1100, mask = 100; 12 goes left, 15 goes right (diagram omitted).
If T is neither empty nor a leaf, we first check whether y matches the longest common prefix p in the root, then recursively insert it to a sub-tree according to the next bit after p. For example, when inserting y = 14 = (1110)₂ to the tree in fig. 6.5, since p = (11oo)₂ and the next bit (the bit of 2¹) is 1, we recursively insert to the right sub-tree. If y does not match p in the root, we need to branch out a new leaf, as shown in fig. 6.6.
insert k v ∅ = (k, v)
insert k v (k, v′) = (k, v)
insert k v (k′, v′) = join k (k, v) k′ (k′, v′)
insert k v (p, m, l, r) =
    match(k, p, m):   zero(k, m):   (p, m, insert k v l, r)
                      otherwise:    (p, m, l, insert k v r)
    otherwise:        join k (k, v) p (p, m, l, r)    (6.5)
Figure 6.6: Insert 14 = (1110)₂: it matches the prefix and goes to the right sub-tree (prefix = 1110, mask = 10); when the prefix does not match, branch out a new leaf (diagrams omitted).
We create a leaf of (k, v) when T = ∅, and overwrite the value for the same key. Function match(k, p, m) tests whether k and p have the same bits after being masked with m: match(k, p, m) = (mask(k, m) = p), where mask(k, m) = ¬(m − 1) & k. It applies bit-wise not to m − 1, then bit-wise and with k. zero(k, m) tests whether the next bit in k below the mask is 0: we shift m one bit to the right, then do bit-wise and with k:

zero(k, m) = (k & (m ≫ 1) = 0)    (6.6)
Function join(p₁, T₁, p₂, T₂) takes two prefixes and two trees. It extracts the longest common prefix of p₁ and p₂ as (p, m) = LCP(p₁, p₂), creates a new branch node, then sets T₁ and T₂ as the sub-trees:
join(p₁, T₁, p₂, T₂) =
    zero(p₁, m):   (p, m, T₁, T₂)
    otherwise:     (p, m, T₂, T₁)    (6.7)
To calculate the longest common prefix, we first compute the bit-wise exclusive-or of p₁ and p₂, then count the highest bit h = highest(xor(p₁, p₂)), where:
highest(0) = 0
highest(n) = 1 + highest(n ≫ 1)
Then generate a mask m = 2^h. The longest common prefix p is given by masking the bits with m on either p₁ or p₂, e.g. p = mask(p₁, m). The following example program implements the insert function:
implements the insert function:
insert k x t
= case t of
Empty → Leaf k x
Leaf k' x' → if k == k' then Leaf k x
else join k (Leaf k x) k' t
Branch p m l r
| match k p m → if zero k m
then Branch p m (insert k x l) r
else Branch p m l (insert k x r)
| otherwise → join k (Leaf k x) p t
match k p m = (mask k m) == p
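The program references the helpers mask, zero, and join, which are not shown above; below is a sketch of them using Data.Bits:

import Data.Bits

mask :: Int -> Int -> Int
mask x m = x .&. complement (m - 1)          -- clear all bits below the mask bit

zero :: Int -> Int -> Bool
zero x m = x .&. (m `shiftR` 1) == 0         -- test the bit just below the mask

join :: Int -> IntTree a -> Int -> IntTree a -> IntTree a
join p1 t1 p2 t2 = if zero p1 m then Branch p m t1 t2
                                else Branch p m t2 t1
  where
    m = 2 ^ highest (p1 `xor` p2)            -- mask from the highest differing bit
    p = mask p1 m
    highest 0 = 0
    highest n = 1 + highest (n `shiftR` 1)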
The imperative implementation computes the same mask:
8: function MaskBit(x, m)
9:     return x & ¬(m − 1)
Figure 6.7 gives an example tree. Although the integer prefix tree consolidates the chained nodes, the operation to extract the longest common prefix needs to scan the bits. For an integer of m bits, it is bound to O(m).
Figure 6.7: An example integer prefix tree (prefix = 0, mask = 8 at the root; prefix = 100, mask = 2 at the right branch; diagram omitted).
6.2.3 Lookup
When looking up key k: if the tree T = ∅, or T is a leaf T = (k′, v) with a different key, then k does not exist; if k = k′, then v is the result; if T = (p, m, l, r), we check whether the common prefix p matches k under the mask m, then recursively look up the sub-tree l or r. If k fails to match p, then k does not exist.
lookup k ∅ = Nothing
lookup k (k′, v) =
    k = k′:      Just v
    otherwise:   Nothing
lookup k (p, m, l, r) =
    match(k, p, m):   zero(k, m):   lookup k l
                      otherwise:    lookup k r
    otherwise:        Nothing    (6.8)
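The corresponding Haskell sketch of eq. (6.8), re-using mask and zero from the sketch above:

lookup :: Int -> IntTree a -> Maybe a
lookup _ Empty = Nothing
lookup k (Leaf k' v) = if k == k' then Just v else Nothing
lookup k (Branch p m l r)
  | mask k m == p = if zero k m then lookup k l else lookup k r
  | otherwise     = Nothing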
Exercise 6.2
6.3 Trie
When we extend the key type from 0/1 bits to a generic list, the tree changes from a binary tree to one with multiple sub-trees. Taking English characters for example, there are up to 26 sub-trees when we ignore the case, as shown in fig. 6.8.
Figure 6.8: A trie of 26 branches, containing key ‘a’, ‘an’, ‘another’, ‘bool’, ‘boy’, and
‘zoo’.
Not all the 26 sub-trees contain data. In fig. 6.8, there are only three non-empty sub-trees, bound to 'a', 'b', and 'z'; other sub-trees, such as 'c', are empty. We will hide them later. When the key is case sensitive, or extends from alphabetic strings to generic lists, we adopt a collection type, like a map, to define the trie. A trie of type Trie K V is either empty ∅ or a node of 2 cases:
1. A leaf of value v without any sub-trees, as (v, ∅), where the type of v is V;
2. A branch (v, ts), containing an optional value v and multiple sub-trees, where ts maps an element of type K to a sub-trie.
Let the empty content be (Nothing, ∅). Below example program defines the trie.
data Trie k v = Trie { value :: Maybe v
, subTrees :: Map k (Trie k v)}
6.3.1 Insert
When inserting a pair of key and value, the key is a list of elements. Let the trie be T = (v, ts); ts[k] looks up k in the map ts, returning the empty trie when k doesn't exist; ts[k] ← t inserts a mapping from k to tree t, and returns the updated map:

insert [ ] x (v, ts) = (Just x, ts)
insert (k:ks) x (v, ts) = (v, ts[k] ← insert ks x ts[k])
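A Haskell sketch over the Trie record above, using Data.Map for the sub-trees:

import qualified Data.Map as Map

empty :: Trie k v
empty = Trie Nothing Map.empty

insert :: Ord k => [k] -> v -> Trie k v -> Trie k v
insert [] x (Trie _ ts) = Trie (Just x) ts
insert (k:ks) x (Trie v ts) = Trie v (Map.alter ins k ts)
  where
    ins Nothing  = Just (insert ks x empty)   -- k absent: extend with a new branch
    ins (Just t) = Just (insert ks x t)       -- descend into the existing sub-trie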
6.3.2 Lookup
When looking up a non-empty key (k:ks) in trie T = (v, ts), starting from the first element k: if there exists a sub-tree T′ mapped to k, we recursively look up ks in T′. When the key is empty, we return the value as the result:

lookup [ ] (v, ts) = v
lookup (k:ks) (v, ts) =
    k ∈ ts:       lookup ks ts[k]
    otherwise:    Nothing
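And the corresponding Haskell sketch:

lookup :: Ord k => [k] -> Trie k v -> Maybe v
lookup [] (Trie v _) = v
lookup (k:ks) (Trie _ ts) = case Map.lookup k ts of
  Nothing -> Nothing
  Just t  -> lookup ks t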
6.4 Prefix tree
A prefix tree consolidates the chained nodes of a trie. A node t contains multiple sub-trees; each tᵢ is bound to a list sᵢ, as [sᵢ ↦ tᵢ]. These lists share the longest common prefix s bound to the node t, i.e. s is the longest common prefix of s ++ s₁, s ++ s₂, ...; for any i ≠ j, lists sᵢ and sⱼ don't have a non-empty common prefix. Consolidating the chained nodes in fig. 6.8, we get the prefix tree in fig. 6.9.
Figure 6.9: A prefix tree with keys: ‘a’, ‘an’, ‘another’, ‘bool’, ‘boy’, ‘zoo’.
Denote the prefix tree as t = (v, ts). Particularly, (Nothing, [ ]) is the empty node, and (Just v, [ ]) is a leaf of v.
6.4.1 Insert
When inserting key s: if the tree is empty, we create a leaf of s, as in fig. 6.10 (a); otherwise, if there is a non-empty common prefix between s and sᵢ, where sᵢ is bound to some sub-tree tᵢ, we branch out a new leaf tⱼ, extract the common prefix, and map it to a new internal branch node t′; then put tᵢ and tⱼ as the two sub-trees of t′. Figure 6.10 (b) shows this case. There are two special cases: s is a prefix of sᵢ, as shown in fig. 6.10 (c) → (e); or sᵢ is a prefix of s, as shown in fig. 6.10 (d) → (e).
Below function inserts key s and value v to the prefix tree t = (v′, ts). If the key s is empty, we overwrite the value with v; otherwise, we call ins to examine the sub-trees and their prefixes. If the node hasn't any sub-trees, we create a leaf of v as the only sub-tree, and map s to it; otherwise, for each mapping s′ ↦ t, we compare s′ with s. If they have a common prefix (tested by the match function), we branch out a new sub-tree. We define two lists as matching if they have a non-empty common prefix:
1 Alternatively, we can use Map [k] (PrefixTree k v) to manage the sub-trees.
Figure 6.10: (a) insert ‘boy’ to empty tree; (b) insert ‘bool’, branch a new node out; (c)
insert ‘another’ to (b); (d) insert ‘an’ to (b); (e) insert ‘an’ to (c), same result as insert
‘another’ to (d)
match [ ] B = True
match A [ ] = True
match (a:as) (b:bs) = (a = b)    (6.13)
To extract the longest common prefix of A and B, define (C, A′, B′) = lcp A B, where C ++ A′ = A and C ++ B′ = B. If either A or B is empty, or their first elements differ, then the common prefix C = [ ]; otherwise, we recursively extract the longest common prefix of the rests, and prepend the head:
lcp [ ] B = ([ ], [ ], B)
lcp A [ ] = ([ ], A, [ ])
lcp (a:as) (b:bs) =
    a ≠ b:       ([ ], a:as, b:bs)
    otherwise:   (a:cs, as′, bs′), where (cs, as′, bs′) = lcp as bs    (6.14)
Function branch A v B t takes two keys A and B, a value v, and a tree t. It extracts the longest common prefix C of A and B, maps it to a new branch node, and assigns the sub-trees:
branch A v B t =
    lcp A B = (C, [ ], B′):   (C, (Just v, [B′ ↦ t]))
    lcp A B = (C, A′, [ ]):   (C, insert A′ v t)
    lcp A B = (C, A′, B′):    (C, (Nothing, [A′ ↦ (Just v, [ ]), B′ ↦ t]))    (6.15)
If A is a prefix of B (A′ = [ ]), then A is mapped to the node holding v, and the remaining list B′ is re-mapped to t, the only sub-tree in the branch; if B is a prefix of A, then we recursively insert the remaining list and the value to t; otherwise, we create a leaf of v, and put it together with t as the two sub-trees. The following example program implements the insert algorithm:
match [] _ = True
match _ [] = True
match (a:_) (b:_) = a == b
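The rest of the insert program is not shown above; the sketch below completes it, assuming the sub-trees are stored as an assoc list (this PrefixTree shape is an assumption, consistent with the lookup program in the next section):

data PrefixTree k v = PrefixTree (Maybe v) [([k], PrefixTree k v)]

insert :: Eq k => [k] -> v -> PrefixTree k v -> PrefixTree k v
insert [] v (PrefixTree _ ts) = PrefixTree (Just v) ts
insert s v (PrefixTree v' ts) = PrefixTree v' (ins ts)
  where
    leaf x = PrefixTree (Just x) []
    ins [] = [(s, leaf v)]
    ins ((s', t) : ts')
      | match s s' = branch s v s' t : ts'   -- non-empty common prefix: branch out
      | otherwise  = (s', t) : ins ts'
    branch a x b t = case lcp a b of
      (c, [], b') -> (c, PrefixTree (Just x) [(b', t)])              -- a is a prefix of b
      (c, a', []) -> (c, insert a' x t)                              -- b is a prefix of a
      (c, a', b') -> (c, PrefixTree Nothing [(a', leaf x), (b', t)])
    lcp [] bs = ([], [], bs)
    lcp as [] = ([], as, [])
    lcp aas@(a:as) bbs@(b:bs)
      | a /= b    = ([], aas, bbs)
      | otherwise = let (cs, as', bs') = lcp as bs in (a:cs, as', bs')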
6.4.2 Lookup
When looking up a key k, we start from the root. If k = [ ], we return the root value; otherwise, we examine the sub-tree mappings, locate the one sᵢ ↦ tᵢ such that sᵢ is a prefix of k, then recursively look up k − sᵢ in sub-tree tᵢ. If no sᵢ is a prefix of k, then the key is not in the tree.
lookup [] (PrefixTree v _) = v
lookup ks (PrefixTree v ts) =
case find (λ(s, t) → s `isPrefixOf` ks) ts of
Nothing → Nothing
Just (s, t) → lookup (drop (length s) ks) t
The prefix testing is linear in the length of the list; the lookup algorithm is bound to O(mn) time, where m is the size of the element set and n is the length of the key. We leave the imperative implementation as an exercise.
Exercise 6.3
6.3.1. Eliminate the recursion to implement the prefix tree lookup purely with loops.
6.5 Applications of trie and prefix tree
As shown in fig. 6.11, when the user enters some characters, the dictionary application searches the library and populates a list of candidate words or phrases that start with the input.
Given a prefix s, function startsWith searches all candidates in the prefix tree that start with s. If s is empty, it enumerates all sub-trees, and prepends ([ ], x) for a non-empty value x in the root. Function enum ts is defined as:

enum ts = concatMap (λ(k, t) → [(k ++ a, x) | (a, x) ← startsWith [ ] t]) ts
Where concatMap (also known as flatMap) is an important list computation concept: it maps over each element, then concatenates the results together. It's typically realized with the 'build-foldr' fusion law to eliminate the intermediate list overhead (chapter 5 of [99]). If the input prefix s is not empty, we examine the sub-tree mappings; for each list and sub-tree pair (k, t), if either s is a prefix of k or vice versa, we recursively expand t and prepend k to each result key; otherwise, s does not match any sub-tree, hence the result is empty. Below example program implements this algorithm.
startsWith [] (PrefixTree Nothing ts) = enum ts
startsWith [] (PrefixTree (Just v) ts) = ([], v) : enum ts
startsWith k (PrefixTree _ ts) =
case find (λ(s, t) → s `isPrefixOf` k || k `isPrefixOf` s) ts of
Nothing → []
Just (s, t) → [(s + + a, b) |
(a, b) ← startsWith (drop (length s) k) t]
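A hypothetical usage, assuming t was built from the words in fig. 6.8; take limits the number of candidates:

take 5 (startsWith "an" t)   -- all candidates starting with "an", at most 5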
We can also realize the algorithm Starts-With(T, k, n) imperatively. From the root, we loop over every sub-tree mapping kᵢ ↦ Tᵢ. If k is a prefix of some sub-tree key kᵢ, we expand everything in that sub-tree, up to n items; if kᵢ is a prefix of k, we drop that prefix, update the key to k − kᵢ, then search Tᵢ for the new key.
1: function Starts-With(T, k, n)
2: if T = NIL then
3: return NIL
4: s ← NIL
5: repeat
6: match ← FALSE
7: for ki 7→ Ti in Sub-Trees(T ) do
8: if k is prefix of ki then
9: return Expand(s + + ki , Ti , n)
10: if ki is prefix of k then
11: match ← TRUE
12: k ← k − ki . drop the prefix
13: T ← Ti
14: s←s+ + ki
15: break
16: until not match
17: return NIL
Where function Expand(s, T, n) populates up to n results from T and prepends s to each key. We implement it with the 'breadth first search' method (see section 14.6.1):
1: function Expand(s, T, n)
2: R ← NIL
3: Q ← [(s, T )]
4: while |R| < n and Q 6= NIL do
5: (k, T ) ← Pop(Q)
6: v ← Value(T )
7: if v 6= NIL then
8: Insert(R, (k, v))
9: for ki 7→ Ti in Sub-Trees(T ) do
10: Push(Q, (k + + ki , Ti ))
Another application is input on a digital keypad. For example, to input 'home':
1. Press the key sequence '4', '6', '6', '3'; the word 'home' appears as a candidate;
2. Press key '*' to change to the next candidate; the word 'good' appears;
3. Press key '*' again for another candidate; the word 'gone' appears;
4. ...
This is called predictive input, abbreviated as 'T9' [25], [26]. Commercial implementations use multiple layers of caches/indexes in both memory and the file system. We simplify it here as an example prefix tree application. First, we define the digit key mappings:
MT 9 = { 2 7→ "abc", 3 7→ "def", 4 7→ "ghi",
5 7→ "jkl", 6 7→ "mno", 7 7→ "pqrs", (6.19)
8 7→ "tuv", 9 7→ "wxyz" }
M_T9[i] gives the corresponding characters for digit i. We can also define the reversed mapping from a character back to a digit:

digits(s) = [M⁻¹_T9[c] | c ← s]    (6.21)
For any character not in [a..z], we map it to a special key ‘#’. Below example program
defines the two mappings.
mapT9 = Map.fromList [('2', "abc"), ('3', "def"), ('4', "ghi"),
                      ('5', "jkl"), ('6', "mno"), ('7', "pqrs"),
                      ('8', "tuv"), ('9', "wxyz")]
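The program above only shows the forward mapping; a minimal sketch of the reversed mapping could look like below (the names rmapT9 and digitsOf are illustrative, not from the original text):

import qualified Data.Map as Map

-- reversed mapping: character → digit, falling back to '#'
-- for any character outside [a..z]
rmapT9 = Map.fromList [(c, d) | (d, cs) ← Map.toList mapT9, c ← cs]

digitsOf = map (λc → Map.findWithDefault '#' c rmapT9)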
Suppose we have already built the prefix tree (v, ts) from the words. We need to change the above auto-completion algorithm to process a digit string ds. For every sub-tree mapping (s ↦ t) ∈ ts, we convert the prefix s to digits(s) and check whether it matches ds (either one is a prefix of the other). There can be multiple sub-trees matching ds:
findT9 t [ ] = [[ ]]
findT9 (v, ts) ds = concatMap find pfx     (6.22)

where pfx = [(s, t) | (s ↦ t) ∈ ts, digits(s) matches ds]
For each mapping (s, t) in pfx, function find recursively looks up the remaining digits ds′ in t, where ds′ = drop |s| ds, then prepends s to every candidate. However, the length may exceed the number of input digits; we need to cut and only take n = |ds| characters:

find (s, t) = [take n (s ++ w) | w ← findT9 t (drop |s| ds)]
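Putting the pieces together, a possible Haskell sketch is shown below (it assumes the PrefixTree type from the auto-completion section and the illustrative digitsOf above):

import Data.List (isPrefixOf)

findT9 _ [] = [[]]
findT9 (PrefixTree _ ts) ds = concatMap find pre where
    n = length ds
    -- keep the mappings whose digits match ds in either direction
    pre = [(s, t) | (s, t) ← ts, let ds' = digitsOf s,
                    ds' `isPrefixOf` ds || ds `isPrefixOf` ds']
    find (s, t) = map (take n ∘ (s ++)) (findT9 t (drop (length s) ds))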
To realize the predictive text input imperatively, we can perform breadth-first search with a queue Q of tuples (prefix, D, t). Every tuple records the prefix matched so far, the remaining digits D to be matched, and the sub-tree t we are going to search. Q is initialized with the empty prefix, the whole digit sequence, and the root. We repeatedly pop a tuple from the queue and examine the sub-tree mappings. For every mapping (s ↦ T′), we convert s to digits(s). If D is a prefix of it, we find a candidate: we append s to prefix and record it in the result. If digits(s) is a prefix of D, we need to further search the sub-tree T′: we create a new tuple (prefix ++ s, D′, T′), where D′ is the remaining digits to be matched, and add it back to the queue.
1: function Look-Up-T9(T, D)
2: R ← NIL
3: if T = NIL or D = NIL then
4: return R
5: n ← |D|
6: Q ← {(NIL, D, T )}
7: while Q 6= NIL do
8: (prefix, D, T ) ← Pop(Q)
9:   for (s ↦ T′) ∈ Sub-Trees(T) do
10:    D′ ← Digits(s)
11:    if D ⊑ D′ then        ▷ D is a prefix of D′: found a candidate
12:      Append(R, (prefix ++ s)[1..n])     ▷ limit the length to n
13:    else if D′ ⊑ D then   ▷ D′ is a prefix of D: search deeper
14:      Push(Q, (prefix ++ s, D − D′, T′))
15:  return R
We started from the integer trie and the integer prefix tree: with the binary format of the key, we re-use the binary tree to realize an integer-based map. We then extended the key from an integer to a generic list over a finite set. Particularly for alphabetic strings, the generic trie and prefix tree are powerful tools for text manipulation. We gave examples of auto-completion and predictive text input. As another instance of radix tree, the suffix tree, which is widely used in text/DNA processing, is closely related to the trie and prefix tree.
Exercise 6.4
6.4.1. Implement the auto-completion and predictive text input with trie.
6.4.2. How to ensure the candidates in lexicographic order in the auto-completion and
predictive text input program? What’s the performance change accordingly?
IntTree(Int k, T v) {
key = k, value = v, prefix = k
}
Trie<K, V> insert(Trie<K, V> t, [K] key, V value) {
    if t == null then t = Trie<K, V>()
    var p = t
    for c in key {
        if p.subTrees[c] == null then p.subTrees[c] = Trie<K, V>()
        p = p.subTrees[c]
    }
    p.value = Optional.of(value)
    return t
}
Self PrefixTree(V v) {
value = Optional.of(v)
}
}
PrefixTree<K, V> branch([K] key1, PrefixTree<K, V> tree1,
                        [K] key2, PrefixTree<K, V> tree2) {
    if key1 == [] {
        // key1 is fully consumed, tree1 becomes the parent
        tree1.subtrees[key2] = tree2
        return tree1
    }
    var t = PrefixTree()
    t.subtrees[key1] = tree1
    t.subtrees[key2] = tree2
    return t
}
Chapter 7
B-Tree
7.1 Introduction
The integer prefix tree in the previous chapter encodes information in the edges of the binary tree. Another way to extend the binary search tree is to increase the number of sub-trees from 2 to k. The B-tree is such a data structure: it can be considered a generic form of k-ary search tree, and it is developed to be self-balancing [39]. B-trees are widely used in file systems (some based on the B+ tree, an extension of the B-tree) and database systems. Figure 7.1 gives an example B-tree; we can see the difference and similarity between the B-tree and the binary search tree.
Figure 7.1: An example B-tree.
A binary search tree is either empty or contains a key k and two sub-trees l and r.
Every key in l is less than k, while k is less than every key in r:
Extend to multiple keys and sub-trees: a B-tree is either empty or contains n keys and
n+1 sub-trees, each sub-tree is also a B-tree, denoted as k1 , k2 , ..., kn and t1 , t2 , ..., tn , tn+1 ,
as shown in fig. 7.2.
Figure 7.2: A B-tree node of n keys (k1, k2, ..., kn) and n + 1 sub-trees (t1, t2, ..., tn+1).
For every node, the keys and sub-trees satisfy the following rules:
• For every key ki, all keys in sub-tree ti are less than ki, while ki is less than every key in sub-tree ti+1:

∀xi ∈ ti, i = 1, 2, ..., n + 1 ⇒ x1 < k1 < x2 < k2 < ... < xn < kn < xn+1     (7.2)
A leaf node has no sub-trees (accurately, all its sub-trees are empty). There can be optional values bound to the keys; we skip the values for simplicity. The type of the B-tree is BTree K (or BTree<K>), where K is the type of keys. On top of it, we also need to define a set of self-balancing rules:

1. All leaves have the same depth;

2. Let d be the minimum degree number of a B-tree, such that each node:
   • contains at most 2d − 1 keys;
   • contains at least d − 1 keys, except for the root, which may hold fewer.

In summary:

d − 1 ≤ |keys(t)| ≤ 2d − 1     (7.3)
Proof. Consider a B-tree of n keys. The minimum degree d ≥ 2. Let the height be h. All the nodes have at least d − 1 keys, except for the root, which contains at least 1 key. There are at least 2 nodes at depth 1, at least 2d nodes at depth 2, at least 2d² nodes at depth 3, ..., and at least 2d^(h−1) nodes at depth h. Multiplying the node counts by d − 1 (except for the root), the total number of keys satisfies:

n ≥ 1 + (d − 1)(2 + 2d + 2d² + ... + 2d^(h−1)) = 2d^h − 1     (7.4)

Hence:

h ≤ log_d((n + 1)/2)     (7.5)
Hence the B-tree is balanced. The simplest B-tree is called the 2-3-4 tree, where d = 2: every node except the root contains 2, 3, or 4 sub-trees. Essentially, a red-black tree can be mapped to a 2-3-4 tree. For a non-empty B-tree of degree d, we denote it as (d, (ks, ts)), where ks are the keys and ts are the sub-trees. Below example program defines the B-tree.
data BTree a = BTree [a] [BTree a]
Let the empty node be (∅, ∅) (or BTree [] []). Instead of storing d in every node,
we pass it together with B-tree t as a pair (d, t).
7.2 Insert
The idea is similar to the binary search tree insert, but we need to deal with multiple keys and sub-trees. When inserting key x into B-tree t, starting from the root, we examine the keys in the node to locate a position¹ where all keys on the left are less than x, while the remaining keys on the right are greater than x. If the node is a leaf and it is not full (|keys(t)| < 2d − 1), we insert x at this position. Otherwise, the position points to a sub-tree t′, and we recursively insert x into t′.
Figure 7.3: Insert 22 to a 2-3-4 tree. 22 > 20, go to the right sub-tree; next as 22 < 26,
go to the first sub-tree; finally, 21 < 22 < 25, and the leaf is not full.
For example, consider the 2-3-4 tree in fig. 7.3. When inserting x = 22, because 20 < 22, we next examine the sub-tree on the right, which contains 26, 38, 45. Since 22 < 26, we next go to the first sub-tree, containing 21 and 25. This is a leaf, and it is not full, hence we insert 22 into this node.
However, if there are already 2d − 1 keys in the leaf, inserting would break the B-tree rules (the node becomes too 'full'). For the same B-tree in fig. 7.3, we meet this issue when inserting 18. There are two solutions: insert then split, and split before insert.
When a node contains too many keys and sub-trees, define the split function to break it into 3 parts at position m, as shown in fig. 7.4:

Figure 7.4: Split the node at position m: the keys and sub-trees left of km form one node, those right of km form another, and km is pushed up.
We first insert x into the tree t, then call fix to restore the B-tree balancing rules of degree d. After ins, if the root contains too many keys, function fix calls split to break it and build a new root.
ins needs to handle two cases: for a leaf, we reuse the ordered list insert defined in eq. (1.11); otherwise, we need to find the position of the sub-tree to recursively insert into. Define partition as:

partition x (ks, ts) = (l, t′, r)

Where l = (ksl, tsl) and r = (ksr, tsr). It further calls the list partition function span defined in eq. (1.47):

(ksl, ksr) = span (< x) ks
(tsl, t′:tsr) = splitAt |ksl| ts
As such, we separate all the keys and sub-trees less than x on the left as l, and those greater than x on the right as r. The last sub-tree whose keys are less than x is extracted as t′. We then recursively insert x into t′, as shown in fig. 7.5. After inserting x into t′, it may contain too many keys, violating the B-tree rules. We define function balance to recursively restore the B-tree rules by splitting the sub-tree.
Figure 7.5: Partition the node with x, locating the sub-tree t′ with ki−1 < x < ki, then recursively insert x into t′.
balance d (ksl, tsl) t (ksr, tsr) =
    full d t:     fixf
    otherwise:    (ksl ++ ksr, tsl ++ [t] ++ tsr)
(7.13)

Where fixf splits sub-tree t as (t1, k, t2) = split d t, then combines the parts into a new node:

fixf = (ksl ++ [k] ++ ksr, tsl ++ [t1, t2] ++ tsr)     (7.14)
split d (BTree ks ts) = (BTree ks1 ts1, k, BTree ks2 ts2) where
(ks1, k:ks2) = splitAt (d - 1) ks
(ts1, ts2) = splitAt d ts
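Assembling split with the equations above, one possible sketch of the complete insert program for this representation is as follows (the helpers full and fix, and the use of List.insert for the ordered leaf insert, are our assumptions, not from the original text):

import qualified Data.List as List

full d (BTree ks _) = length ks > 2 * d - 1

partition x (BTree ks ts) = ((ksl, tsl), t, (ksr, tsr)) where
    (ksl, ksr) = span (< x) ks
    (tsl, t:tsr) = splitAt (length ksl) ts

balance d (ksl, tsl) t (ksr, tsr)
    | full d t = let (t1, k, t2) = split d t
                 in BTree (ksl ++ [k] ++ ksr) (tsl ++ [t1, t2] ++ tsr)
    | otherwise = BTree (ksl ++ ksr) (tsl ++ [t] ++ tsr)

insert x (d, t) = (d, fix (ins t)) where
    fix r | full d r = let (l, k, r') = split d r in BTree [k] [l, r']
          | otherwise = r
    ins (BTree ks []) = BTree (List.insert x ks) []   -- leaf: ordered insert
    ins node = let (l, t', r) = partition x node
               in balance d l (ins t') r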
Figure 7.6: Example B-trees built by insertion.
1: function Insert(T, k)
2:   r ← T
3:   if r is full then      ▷ the root is full, split it first
4:     s ← Create-Node
5:     Sub-Trees(s)[1] ← r
6:     Split(s, 1)
7:     r ← s
8:   return Insert-Nonfull(r, k)
Where Insert-Nonfull assumes the node r passed in is not full. If r is a leaf, we insert k into the keys based on order (Exercise 7.1.3 asks to realize the ordered insert with binary search); otherwise, we locate the position where ki(r) < k < ki+1(r), split the sub-tree ti(r) if it is full, and go on inserting into this sub-tree.
1: function Insert-Nonfull(r, k)
2: n ← |K(r)|
3: if r is leaf then
4: i←1
5: while i ≤ n and k > ki (r) do
6: i←i+1
7: Insert-At(K(r), i, k)
8: else
9: i←n
10: while i > 1 and k < ki (r) do
11: i←i−1
12: if ti (r) is full then
13: Split(r, i)
14: if k > ki (r) then
15: i←i+1
16: Insert-Nonfull(ti (r), k)
17: return r
This algorithm is recursive. Exercise 7.1.2 asks to eliminate the recursion with pure
loops. Figure 7.7 gives the result with the same input of “GMPXACDEJKNORSTUVYZ”.
Figure 7.7: The B-trees built from "GMPXACDEJKNORSTUVYZ".
Figure 7.8: Define the B-tree node with a sub-tree and paired lists
Below example program defines the B-tree node. It is either empty, or contains 3 parts: the left list of (key, sub-tree) pairs in reversed order, a middle sub-tree, and the right list of (key, sub-tree) pairs. We denote the non-empty node as (l, t′, r).
data BTree a = Empty
| BTree [(a, BTree a)] (BTree a) [(a, BTree a)]
When moving right by one step, we take the first pair (k, t) from r, prepend the pair (k, t′) to l, and replace the middle sub-tree t′ with t. Moving left by one step is symmetric. Both operations take constant time.

stepl ((k, t):l, t′, r) = (l, t, (k, t′):r)
stepr (l, t′, (k, t):r) = ((k, t′):l, t, r)
(7.15)
With the left/right moves, we can implement a generic partition p t, which separates the tree t with a predicate p into 3 parts, left, middle, and right (l, m, r), such that all sub-trees in l and m satisfy p, while those in r do not. Let hd = fst ∘ head, which picks the first pair (a, b) from a list and extracts a.
partition p (∅, m, r) =
    p(hd(r)):     partition p (stepr t)
    otherwise:    (∅, m, r)

partition p (l, m, ∅) =
    (not ∘ p)(hd(l)):    partition p (stepl t)
    otherwise:           (l, m, ∅)

partition p (l, m, r) =
    p(hd(l)) and (not ∘ p)(hd(r)):    (l, m, r)
    p(hd(r)):                         partition p (stepr t)
    (not ∘ p)(hd(l)):                 partition p (stepl t)
(7.16)
For example, partition (< k) t moves all keys and sub-trees in t less than k out of the
right part. Below example program implements the partition function:
partition p t@(BTree [] m r)
| p (hd r) = partition p (stepR t)
| otherwise = ([], m, r)
partition p t@(BTree l m [])
| (not ◦ p) (hd l) = partition p (stepL t)
| otherwise = (l, m, [])
partition p t@(BTree l m r)
| p (hd l) && (not ◦ p) (hd r) = (l, m, r)
| p (hd r) = partition p (stepR t)
| (not ◦ p) (hd l) = partition p (stepL t)
We can use stepl/stepr to split a B-tree at position d when it is overly full. Let n = |l| be the number of keys/sub-trees in the left part; f^n(x) means repeatedly applying f to x n times.

split d t =
    n < d:        sp(stepr^(d−n)(t))
    n > d:        sp(stepl^(n−d)(t))
    otherwise:    sp(t)
(7.17)
Function ins needs to handle both the t = ∅ and t ≠ ∅ cases. For the empty case, we create a singleton leaf; otherwise, we call (l, t′, r) = partition (< x) t to locate the position for the recursive insert:

ins ∅ = (∅, ∅, [(x, ∅)])
ins t =
    t′ = ∅:    balance d l ∅ ((x, ∅):r)
    t′ ≠ ∅:    balance d l (ins t′) r
(7.23)
Function balance examines whether the sub-tree t contains too many keys, and splits it if so:

balance d l t r =
    full d t:     fixFull
    otherwise:    (l, t, r)
(7.24)
Where fixFull = (l, t1, (k, t2):r), and (t1, k, t2) = split d t. Below example program implements the insert algorithm:
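The example program itself appears to be missing from this copy; a minimal Haskell sketch under the definitions above (full counts the keys in both paired lists, fixRoot is our name for the root fix, and split d is assumed to return (t1, k, t2) per eq. (7.17)):

full d (BTree l _ r) = length l + length r > 2 * d - 1
full _ Empty = False

balance d l t r | full d t = fixFull
                | otherwise = BTree l t r
    where fixFull = let (t1, k, t2) = split d t in BTree l t1 ((k, t2):r)

insert x (d, t) = (d, fixRoot (ins t)) where
    ins Empty = BTree [] Empty [(x, Empty)]
    ins u = case partition (< x) u of
              (l, Empty, r) -> balance d l Empty ((x, Empty):r)
              (l, t', r)    -> balance d l (ins t') r
    fixRoot u | full d u = let (t1, k, t2) = split d u
                           in BTree [] t1 [(k, t2)]   -- new one-key root
              | otherwise = u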
Exercise 7.1
7.1.1. Can we use ≤ to support duplicated keys in B-Tree?
7.1.2. For the ‘split then insert’ algorithm, eliminate the recursion with loops.
7.1.3. We use linear search among keys to find the proper insert position. Improve the im-
perative implementation with binary search. Is the big-O performance improved?
7.3 Lookup
For lookup, we extend the binary search tree method to multiple branches and obtain the generic B-tree lookup solution. There are only two directions when looking up a binary search tree, left and right, while there are multiple candidates in a B-tree. Consider looking up k in B-tree t = (ks, ts): if t is a leaf (ts is empty), the problem becomes list lookup; otherwise, we partition t with k into three parts, l = (ksl, tsl), t′, and r = (ksr, tsr), where all keys in l and sub-tree t′ are less than k, and the remaining (≥ k) are in r. If the first key in ksr equals k, we have found the answer; otherwise, we recursively look up in sub-tree t′.
lookup k (ks, ∅) =
    k ∈ ks:       Just (ks, ∅)
    otherwise:    Nothing

lookup k (ks, ts) =
    Just k = safeHd ksr:    Just (ks, ts)
    otherwise:              lookup k t′
(7.25)
safeHd [] = Nothing
safeHd (x:xs) = Just x
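A Haskell rendering of eq. (7.25) for the (ks, ts) representation might look like below (it assumes the element-based partition from the insert sketch; lookup' is our name, chosen to avoid clashing with the Prelude):

lookup' k (BTree ks [])
    | k `elem` ks = Just (BTree ks [])
    | otherwise   = Nothing
lookup' k t
    | safeHd ksr == Just k = Just t
    | otherwise            = lookup' k t'
    where (_, t', (ksr, _)) = partition k t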
For the paired-list implementation, the idea is similar. If the tree is not empty, we partition it with the predicate (< k), then check whether the first key in the right part equals k, or recursively look up in the partitioned sub-tree:
lookup k ∅ = Nothing
lookup k t =
    Just k = safeFst (safeHd r):    Just (l, t′, r)
    otherwise:                      lookup k t′
(7.26)
Where (l, t′, r) = partition (< k) t for the non-empty tree case. safeFst applies the fst function to a 'Maybe' value. Below example program utilizes fmap to do this:
lookup x Empty = Nothing
lookup x t = let (l, t', r) = partition (< x) t in
if (Just x) == fmap fst (safeHd r) then Just (BTree l t' r)
else lookup x t'
For the imperative implementation, we start from the root r, find a position i among
the keys, such that ki (r) ≤ k < ki+1 (r). If ki (r) = k then return the node r and i as a
pair; otherwise, move to sub-tree ti (r), and go on looking up. If r is a leaf and k is not
in the keys, then return nothing as k does not exist in the tree.
1: function Lookup(r, k)
2: loop
3: i ← 1, n ← |K(r)|
4: while i ≤ n and k > ki (r) do
5: i←i+1
6: if i ≤ n and k = ki (r) then
7: return (r, i)
8: if r is leaf then
9: return Nothing . k does not exist
10: else
11: r ← ti (r) . go to the i-th sub-tree
Exercise 7.2
7.2.1. Improve the imperative lookup with binary search among keys.
7.4 Delete
After deleting a key, the number of keys may become too few to form a valid B-tree node. Except for the root, a node should contain at least d − 1 keys, where d is the minimum degree. Symmetric to insert, there are two methods: we can either delete then fix, or merge before delete.
Figure 7.9: Replace ki = x with k′ = max(ti), then recursively delete k′ from ti.
Function last returns the last element of a list (eq. (1.4)); deletel is the list delete algorithm (eq. (1.14)); tail drops the first element of a list and returns the rest (eq. (1.1)). We need to modify the balance function, which was defined for insert, with the additional logic to merge a node if it contains too few keys.
balance d (ksl, tsl) t (ksr, tsr) =
    full d t:     fixf
    low d t:      fixl
    otherwise:    (ksl ++ ksr, tsl ++ [t] ++ tsr)
(7.30)
If t is overly low (fewer than d − 1 keys), we call fixl to merge it with the left part (ksl, tsl) or the right part (ksr, tsr), depending on which side is not empty. Take the left part for example: we extract the last elements of ksl and tsl respectively, say km and tm, then call unsplit (eq. (7.8)) to merge them with t as unsplit tm km t. This forms a new sub-tree with more keys. Finally, we call balance again to build the result B-tree.
fixl =
    ksl ≠ ∅:      balance d (init ksl, init tsl) (unsplit tm km t) (ksr, tsr)
    ksr ≠ ∅:      balance d (ksl, tsl) (unsplit t k1 t1) (tail ksr, tail tsr)
    otherwise:    t
(7.31)
The last case (otherwise) means ksl = ksr = ∅: both sides are empty, the tree is a singleton leaf, hence it needs no fixing. k1 and t1 are the first elements of ksr and tsr respectively. Finally, we need to modify the fix function defined for insert, adding new logic for delete:
fix (d, (∅, [t])) = (d, t)
fix (d, t) =
    full d t:     (d, ([k], [l, r])), where (l, k, r) = split d t
    otherwise:    (d, t)
(7.32)
What we add is the first case: after delete, if the root contains nothing but a single sub-tree, we shrink the height by pulling that sub-tree up as the new root. The following example program implements the delete algorithm.
delete x (d, t) = fixRoot (d, del x t) where
del x (BTree ks []) = BTree (List.delete x ks) []
del x t = if (Just x) == safeHd ks' then
let k' = max t' in
balance d l (del k' t') (k':(tail ks'), ts')
else balance d l (del x t') r
where
(l, t', r@(ks', ts')) = partition x t
We leave the delete function for the ‘paired list’ implementation as an exercise. Fig-
ures 7.10 to 7.12 give examples of delete.
Figures 7.10, 7.11, 7.12: The B-trees before and after deleting keys.
Case 1. If x exists in a leaf node t, we remove it from the keys directly.
Case 2. If x exists in node t, but t is not a leaf, there are three sub-cases:
Case 2a. As shown in fig. 7.9, let the predecessor of ki = x be k′, where k′ = max(ti). If ti has sufficient keys (≥ d), we replace ki with k′, then recursively delete k′ from ti.
Case 2b. If ti does not have enough keys, but the sub-tree ti+1 does (≥ d). Symmet-
rically, we replace ki with its successor k 00 , where k 00 = min(ti+1 ), then recursively delete
k 00 from ti+1 , as shown in fig. 7.13.
Figure 7.13: Replace ki = x with k″ = min(ti+1), then recursively delete k″ from ti+1.
Case 2c. If neither ti nor ti+1 contains sufficient keys (|ti | = |ti+1 | = d − 1), we merge
ti , x, ti+1 to a new node. This new node has 2d − 1 keys, we can safely perform delete on
it as shown in fig. 7.14.
Figure 7.14: Merge ti, ki, ti+1 into one node of 2d − 1 keys.
Merge pushes the key ki down into the sub-tree. After that, if node t becomes empty, it means ki was the only key in t, and ti, ti+1 were the only two sub-trees. We need to shrink the tree height, as shown in fig. 7.15.
Figure 7.15: Shrink the height when the root becomes empty after the merge.
Case 3. If node t does not contain x, we need recursively delete x from a sub-tree ti .
There are two sub-cases if there are too few keys in ti :
Case 3a. Among the two siblings ti−1, ti+1, if either one has enough keys (≥ d), we move a key from t down to ti, move a key from that sibling up to t, and move the corresponding sub-tree from the sibling to ti. As shown in fig. 7.16, ti receives one more key. We next recursively delete x from ti.
Figure 7.16: Borrow a key from a sibling through the parent.
Case 3b. If neither sibling has sufficient keys (|ti−1 | = |ti+1 | = d − 1), we merge ti ,
a key from t, and either sibling into a new node, as shown in fig. 7.17. Then recursively
delete x from it.
Figure 7.17: Merge ti, a key from t, and a sibling into one node.
1: function Delete(t, k)
2:   if t = NIL then
3:     return t
4:   i ← 1, n ← |K(t)|
5:   while i ≤ n and k > ki(t) do
6:     i ← i + 1
7: if k = ki (t) then
8: if t is leaf then . case 1
9: Remove(K(t), k)
10: else . case 2
11: if |K(ti (t))| ≥ d then . case 2a
12: ki (t) ← Max(ti (t))
13: Delete(ti (t), ki (t))
14: else if |K(ti+1 (t))| ≥ d then . case 2b
15: ki (t) ← Min(ti+1 (t))
16: Delete(ti+1 (t), ki (t))
17: else . case 2c
18: Merge-At(t, i)
19: Delete(ti (t), k)
20: if K(t) is empty then
21: t ← ti (t) . Shrinks height
22: return t
23: if t is not leaf then
24: if k > kn (t) then
25: i←i+1
26: if |K(ti (t))| < d then . case 3
27: if i > 1 and |K(ti−1 (t))| ≥ d then . case 3a: left
28: Insert(K(ti (t)), ki−1 (t))
29: ki−1 (t) ← Pop-Last(K(ti−1 (t)))
30: if ti (t) is not leaf then
31: Insert(T(ti(t)), Pop-Last(T(ti−1(t))))
32: else if i ≤ n and |K(ti+1 (t))| ≥ d then . case 3a: right
33: Append(K(ti (t)), ki (t))
34: ki (t) ← Pop-First(K(ti+1 (t)))
35: if ti (t) is not leaf then
36: Append(T (ti (t)), Pop-First(T (ti+1 (t))))
37: else . case 3b
38: if i = n + 1 then
39: i←i−1
40: Merge-At(t, i)
41: Delete(ti (t), k)
42: if K(t) is empty then . Shrinks height
43: t ← t1 (t)
44: return t
Where Merge-At(t, i) merges sub-tree ti(t), key ki(t), and ti+1(t) into one sub-tree.
1: procedure Merge-At(t, i)
2:   x ← ti(t)
3:   y ← ti+1(t)
4:   K(x) ← K(x) ++ [ki(t)] ++ K(y)
5:   T(x) ← T(x) ++ T(y)
6:   Remove-At(K(t), i)
7:   Remove-At(T(t), i + 1)
Exercise 7.3
7.3.1. When deleting a key k from a branch node, we use the maximum key of the predecessor sub-tree to replace it. Implement the symmetric solution that replaces k with the minimum key of the successor sub-tree.
7.5 Summary
We extend the binary search tree to multiple branches, then constrain the branch number within a range to develop the B-tree. The B-tree is used as a tool to control magnetic disk access (chapter 18, [4]). Because every B-tree node stores a bounded number of keys, neither too few nor too many, the B-tree is balanced. Most of the tree operations are proportional to the height, so the performance is bound to O(lg n) time, where n is the number of keys.
Split node
void split(BTree<K, deg> z, Int i) {
    var d = deg
    var x = z.subTrees[i]
    var y = BTree<K, deg>()
    var k = x.keys[d - 1]       // the middle key, to be pushed up into z
    y.keys = x.keys[d ...]
    x.keys = x.keys[... d - 1]
    if not isLeaf(x) {
        y.subTrees = x.subTrees[d ...]
        x.subTrees = x.subTrees[... d]
    }
    z.keys.insert(i, k)
    z.subTrees.insert(i + 1, y)
}
Iterative lookup:
Optional<(BTree<K, deg>, Int)> lookup(BTree<K, deg> tr, K key) {
loop {
Int i = 0, n = length(tr.keys)
while i < n and key > tr.keys[i] {
i = i + 1
}
if i < n and key == tr.keys[i] then return Optional.of((tr, i))
if isLeaf(tr) {
return Optional.Nothing
} else {
tr = tr.subTrees[i]
}
}
}
BTree<K, deg> delete(BTree<K, deg> t, K x) {
    Int i = 0, n = length(t.keys)
    while i < n and x > t.keys[i] { i = i + 1 }
    if i < n and x == t.keys[i] {       // found x in node t
        if isLeaf(t) {                  // case 1: remove from the leaf
            remove(t.keys, x)
        } else {
            var tl = t.subtrees[i]
            var tr = t.subtrees[i + 1]
            if not low(tl) {            // case 2a
                t.keys[i] = max(tl)
                delete(tl, t.keys[i])
            } else if not low(tr) {     // case 2b
                t.keys[i] = min(tr)
                delete(tr, t.keys[i])
            } else {                    // case 2c
                mergeSubtrees(t, i)
                delete(t.subtrees[i], x)
                if empty(t.keys) then t = t.subtrees[0]  // shrink height
            }
        }
        return t
    }
if not isLeaf(t) {
if x > t.keys[n - 1] then i = i + 1
if low(t.subtrees[i]) {
var tl = if i == 0 then null else t.subtrees[i - 1]
var tr = if i == n then null else t.subtrees[i + 1]
if tl 6= null and (not low(tl)) { // case 3a, left
insert(t.subtrees[i].keys, 0, t.keys[i - 1])
t.keys[i - 1] = popLast(tl.keys)
if not isLeaf(tl) {
insert(t.subtrees[i].subtrees, 0, popLast(tl.subtrees))
}
} else if tr 6= null and (not low(tr)) { // case 3a, right
append(t.subtrees[i].keys, t.keys[i])
t.keys[i] = popFirst(tr.keys)
if not isLeaf(tr) {
append(t.subtrees[i].subtrees, popFirst(tr.subtrees))
}
} else { // case 3b
mergeSubtrees(t, if i < n then i else (i - 1))
if i == n then i = i - 1
}
delete(t.subtrees[i], x)
if empty(t.keys) then t = t.subtrees[0] // shrink height
}
}
return t
}
K max(BTree<K, deg> t) {
while not empty(t.subtrees) {
t = last(t.subtrees)
}
return last(t.keys)
}
K min(BTree<K, deg> t) {
while not empty(t.subtrees) {
t = t.subtrees[0]
}
return t.keys[0]
}
Chapter 8
Binary Heaps
8.1 Definition
Heaps are widely used for sorting, priority scheduling, graph algorithms, and so on [40]. The most popular implementation uses an array to represent the heap as a complete binary tree [4]. Robert W. Floyd developed an efficient heap sort algorithm based on this idea [41] [42]. We can implement the heap with various data structures, not limited to the array. This chapter focuses on heaps implemented with binary trees, including the leftist heap, skew heap, and splay heap [3]. A heap is either empty, or stores comparable elements satisfying a property, and defines the following operations:

1. Top: returns the minimum (the top element) of the heap;

2. Pop: removes the top element from the heap and maintains the heap property: the new top is still the minimum of the rest;

3. Insert: adds a new element to the heap and maintains the heap property;
Alternatively, we can define the heap to always keep the maximum on top. We call the heap with the minimum on top a min-heap, and the one with the maximum on top a max-heap. When implementing the heap with a tree, we can put the minimum (or maximum) in the root. After pop, we remove the root and rebuild the tree from the sub-trees. We call a heap implemented with a binary tree a binary heap.
Figure 8.1: A complete binary tree and its array representation (1-based indices).
parent(i) = ⌊i/2⌋
left(i) = 2i
right(i) = 2i + 1
(8.1)
8.2.1 Heapify
Heapify is the process that maintains the heap property, keeping the minimum element on top. For the binary heap, we obtain a stronger property, as the binary tree is recursive: every sub-tree stores its minimum element in its root. In other words, every sub-tree is also a binary heap. For the cell indexed i in the array representation, we examine whether all sub-tree elements are greater than or equal to it (≥), and exchange when they are not.
1: function Heapify(A, i)
2: n ← |A|
3: loop
4: s←i . s is the smallest
5: l ← Left(i), r ← Right(i)
6: if l ≤ n and A[l] < A[i] then
7: s←l
8: if r ≤ n and A[r] < A[s] then
9: s←r
10: if s 6= i then
11: Exchange A[i] ↔ A[s]
12: i←s
13: else
14: return
Because we check the sub-trees recursively, the processing time is proportional to the height of the tree: Heapify is bound to O(lg n), where n is the length of the array. Figure 8.2 gives the steps when applying Heapify from index 2 to the array [1, 13, 7, 3, 10, 12, 14, 15, 9, 16]. The result is [1, 3, 7, 9, 10, 12, 14, 15, 13, 16].
Figure 8.2: Heapify. Step 1: the minimum of 13, 3, 10 is 3, exchange 3 ↔ 13; Step 2: the
minimum of 13, 15, 9 is 9, exchange 13 ↔ 9; Step 3: 13 is leaf, terminate.
8.2.2 Build
We can build the heap from an array with Heapify. List the number of nodes in each level of a complete binary tree: 1, 2, 4, 8, .... They are all powers of 2, except for the last level, because the tree is not necessarily full. The last level has at most 2^(p−1) nodes, where p is the smallest integer satisfying 2^p − 1 ≥ n, and n is the length of the array. Skipping all leaves (Heapify takes no effect on them), we apply Heapify from the last branch node (indexed ≤ ⌊n/2⌋) bottom-up. The build function is defined as below:
1: function Build-Heap(A)
2: n ← |A|
3: for i ← bn/2c down to 1 do
4: Heapify(A, i)
Although Heapify is bound to O(lg n) time, Build-Heap is bound to O(n), not O(n lg n). We skip all leaves; we check and move down one level for at most 1/4 of the nodes, check and move down two levels for at most 1/8 of the nodes, check and move down three levels for at most 1/16 of the nodes, and so on. The total number of comparisons and moves is at most:

S = n(1/4 + 2/8 + 3/16 + ...)     (8.2)

Multiply both sides by 2:

2S = n(1/2 + 2/4 + 3/8 + ...)     (8.3)

Shift by one position and subtract:

2S − S = n[1/2 + (2/4 − 1/4) + (3/8 − 2/8) + ...]
S = n[1/2 + 1/4 + 1/8 + ...] = n     (geometric series)
Figure 8.3 shows the steps to build a min-heap from array [4, 1, 3, 2, 16, 9, 10, 14, 8, 7].
The black node is where Heapify is applied. The grey nodes are swapped to maintain
the heap property.
Figure 8.3: Build heap. (1) 16 > 7; (2) exchange 16 ↔ 7; (3) 2 < 14 and 2 < 8; (4) 3 < 9
and 3 < 10; (5) 1 < 2 and 1 < 7; (6) 1 < 4 and 1 < 3; (7) exchange 4 ↔ 1; (8) exchange
4 ↔ 2, end.
It takes constant time to remove the last element of the array; pop is bound to O(lg n) time as it calls Heapify. With pop, we can find the top k elements of an array: first build a heap from the array, then repeatedly pop k times:
1: function Top-k(A, k)
2: R←[]
3: Build-Heap(A)
4: loop Min(k, |A|) times . cut off when k > |A|
5: Append(R, Pop(A))
6: return R
Further, we can implement a priority queue to schedule tasks with priorities. Every
time, we peek the highest priority task to run. To run an urgent task earlier, we can
increase its priority, meaning to decrease an element in a min-heap, as shown in fig. 8.4.
Figure 8.4: Decrease an element in the min-heap, then restore the heap property bottom-up.
The heap property may be broken when we decrease some element in a min-heap. Let the decreased element be A[i]; the function below restores the heap property bottom-up. It is bound to O(lg n) time.
1: function Heap-Fix(A, i)
2: while i > 1 and A[i] < A[ Parent(i) ] do
3: Exchange A[i] ↔ A[ Parent(i) ]
4: i ← Parent(i)
We can realize push with Heap-Fix [4] . Append the new element k to the tail, then
apply Heap-Fix to recover the heap property:
1: function Push(A, k)
2: Append(A, k)
3: Heap-Fix(A, |A|)
We can sort with the heap: build a heap from the elements, then repeatedly pop the top to form the result. Since each pop is bound to O(lg n) time, the total time is bound to O(n lg n). Besides, we need another list of length n to hold the result.
1: function Heap-Sort(A)
2: R←[]
3: Build-Heap(A)
4: while A 6= [ ] do
5: Append(R, Pop(A))
6: return R
Floyd developed a fast implementation with a max-heap. Every time, swap the head and the tail of the array: the maximum is swapped to its expected position, the tail, and the previous tail becomes the new top. Next, decrease the heap size by one and apply Heapify to restore the heap property. Repeat this until the heap size decreases to one. This algorithm doesn't need additional space.
1: function Heap-Sort(A)
2: Build-Max-Heap(A)
3: n ← |A|
4: while n > 1 do
5: Exchange A[1] ↔ A[n]
6: n←n−1
7: Heapify(A[1...n], 1)
Exercise 8.1
8.1.1. Consider another idea about in-place heap sort: Build a min-heap from the array
A, the first element a1 is in the right position. Treat the rest [a2 , a3 , ..., an ] as the
new heap, and apply Heapify from a2 . Repeat this till the last element. Is this
method correct?
1: function Heap-Sort(A)
2: Build-Heap(A)
3: for i = 1 to n − 1 do
4: Heapify(A[i...n], 1)
8.1.2. Similarly, can we apply Heapify k times from left to right to get the top-k ele-
ments?
1: function Top-K(A, k)
2: Build-Heap(A)
3: n ← |A|
4: for i ← 1 to min(k, n) do
5: Heapify(A[i...n], 1)
merge ∅ R = R
merge L ∅ = L
merge L R = ?
Both the left and right sub-trees are heaps. When neither is empty, each root stores its minimum respectively. We compare the two roots and pick the smaller one as the new root. Let L = (A, x, B), R = (A′, y, B′), where A, A′, B, B′ are sub-trees. If x < y, then x is the new root. We can keep A and merge B and R recursively; alternatively, we can keep B and merge A and R. The new heap can be (merge A R, x, B) or (A, x, merge B R). We always merge into the right sub-tree for simplicity. This method generates the leftist heap.
The rank of a node is the length of its right spine: the rank of the empty tree is 0, and the rank of a node is one plus the rank of its right sub-tree. Define an auxiliary make function, which compares the ranks of the two sub-trees and swaps them if necessary.
make(A, k, B) =
    rank(A) < rank(B):    (1 + rank(A), B, k, A)
    otherwise:            (1 + rank(B), A, k, B)
(8.5)
For the two trees A and B, if rank(A) is smaller, set B as the left sub-tree, and A as
the right. The rank of the new node is 1 + rank(A); otherwise if rank(B) is smaller, set A
as the left sub-tree, and B as the right; the rank of the new node is 1 + rank(B). Given two non-empty leftist heaps H1 = (r1, L1, k1, R1) and H2 = (r2, L2, k2, R2), define merge:

merge ∅ H2 = H2
merge H1 ∅ = H1
merge H1 H2 =
    k1 < k2:      make(L1, k1, merge R1 H2)
    otherwise:    make(L2, k2, merge H1 R2)
(8.6)
We always merge into the right sub-tree recursively to maintain the leftist property, so merge is bound to O(lg n) time. The binary heap implemented with an array performs well in most cases and suits modern cache technology. However, it takes O(n) time to merge two array-based heaps, because we need to concatenate the two arrays and rebuild the heap [50]:
1: function Merge-Heap(A, B)
2: C ← Concat(A, B)
3: Build-Heap(C)
We can access the top of the leftist heap in constant time (assuming it is not empty). After popping the root, we merge the two sub-trees in O(lg n) time; insert can also be defined with merge.
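The defining equations for these operations seem to be missing in this copy; a self-contained Haskell sketch of the leftist heap, our own rendering of eqs. (8.5) and (8.6), is:

data LHeap a = E | LHeap Int (LHeap a) a (LHeap a)  -- rank, left, key, right

rank E = 0
rank (LHeap r _ _ _) = r

make a k b | rank a < rank b = LHeap (1 + rank a) b k a
           | otherwise       = LHeap (1 + rank b) a k b

merge E h = h
merge h E = h
merge h1@(LHeap _ l1 k1 r1) h2@(LHeap _ l2 k2 r2)
    | k1 < k2   = make l1 k1 (merge r1 h2)
    | otherwise = make l2 k2 (merge h1 r2)

insert x = merge (LHeap 1 E x E)
top (LHeap _ _ k _) = k
pop (LHeap _ l _ r) = merge l r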
We can build a leftist heap from a list (in Curried form): build = foldr insert ∅. Then repeatedly pop the minimum to output the sorted result:

heapSort ∅ = [ ]
heapSort H = (top H) : heapSort (pop H)
(8.11)
It pops n times, each takes O(lg n) time. The total time is bound to O(n lg n).
Figure 8.7: Build the leftist heap from [9, 4, 16, 7, 10, 2, 14, 3, 8, 1].
The leftist heap may lead to an unbalanced tree in some cases, as shown in fig. 8.8. The skew heap is a self-adjusting heap: it simplifies the leftist heap and improves balance [46] [47]. To build the leftist heap, we swap the left and right sub-trees when the rank on the left is smaller than on the right. However, this method can't handle the case where a sub-tree's right spine ends immediately at NIL: its rank is always 1, no matter how big the sub-tree is. The skew heap always swaps the sub-trees for simplification.
Figure 8.8: Leftist heap built from [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].
The skew heap is implemented with a skew tree. A skew tree is a binary tree: the root stores the minimum element, and every sub-tree is also a skew tree. The skew tree doesn't need the rank; we can directly re-use the binary tree definition. Denote the non-empty tree as (L, k, R).
When merging two non-empty skew trees H1 = (L1, k1, R1) and H2 = (L2, k2, R2), if k1 < k2 then choose k1 (otherwise k2) as the new root, and merge the greater tree into a sub-tree. We can merge H2 with either L1 or R1; we choose R1, and swap the left and right sub-trees. The result is (merge R1 H2, k1, L1).
merge ∅ H2 = H2
merge H1 ∅ = H1
merge H1 H2 =
    k1 < k2:      (merge R1 H2, k1, L1)
    otherwise:    (merge H1 R2, k2, L2)
(8.12)
The other operations, including insert, top, and pop are implemented with merge.
Skew heap outputs a balanced tree even for ordered list as shown in fig. 8.9.
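As a quick sketch of how those operations reduce to merge (using the Empty/Node constructors of the appendix code, where Node takes the key first):

insert x h = merge (Node x Empty Empty) h
top (Node x _ _) = x
pop (Node _ l r) = merge l r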
Figure 8.9: The skew tree built from the ordered list [1, 2, ..., 10].
The splay tree rotates the accessed node x to the root step by step. Let p be the parent of x and g the grandparent (when they exist):

1. Zig-zig: x and p are both on the left, or both on the right. We rotate twice to make x the root.

2. Zig-zag: x and p are on different sides. We rotate to make x the root, with p and g as its two children.

3. Zig: p is the root. We rotate once to make x the root.

There are 6 cases in total: zig-zig, zig-zag, and zig, each with a symmetric variant. Let the non-empty tree be T = (L, k, R); define splay as below when accessing element y:
Figure 8.10: zig-zig: x and p are both on left or right, x becomes the new root. zig-zag:
x and p are on different sides, x becomes the new root, p and g are siblings. zig: p is the
root, rotate to make x as the root.
splay y (((a, x, b), p, c), g, d) =
    x = y:        (a, x, (b, p, (c, g, d)))           zig-zig
    otherwise:    T
splay y (a, g, (b, p, (c, x, d))) =
    x = y:        (((a, g, b), p, c), x, d)           zig-zig symmetric
    otherwise:    T
splay y ((a, p, (b, x, c)), g, d) =
    x = y:        ((a, p, b), x, (c, g, d))           zig-zag
    otherwise:    T
splay y (a, g, ((b, x, c), p, d)) =
    x = y:        ((a, g, b), x, (c, p, d))           zig-zag symmetric
    otherwise:    T
splay y ((a, x, b), p, c) =
    x = y:        (a, x, (b, p, c))                   zig
    otherwise:    T
splay y (a, p, (b, x, c)) =
    x = y:        ((a, p, b), x, c)                   zig symmetric
    otherwise:    T
splay y T = T                                          others
(8.13)
The tree is unchanged for all other cases. Every insert triggers splay to adjust the balance. If the tree is empty, the result is a singleton leaf; otherwise, we compare the new element with the root, recursively insert into the left (less than) or right (otherwise) sub-tree, and splay:

insert y ∅ = (∅, y, ∅)
insert y (L, x, R) =
    y < x:        splay y (insert y L, x, R)
    otherwise:    splay y (L, x, insert y R)
(8.14)
Figure 8.11: The splay tree built from [1, 2, ..., 10].
Figure 8.11 gives the splay tree built from [1, 2, ..., 10]; it generates a relatively balanced tree. Okasaki gives a simple rule for splaying [3]: whenever we follow two left branches or two right branches in a row, rotate the two nodes. When accessing x, if we have moved left or right twice, we partition T into L and R recursively, where L contains all elements less than x, while R contains the rest. Then we create a new tree with x as the root and L, R as the sub-trees.
partition y ∅ = (∅, ∅)
partition y T@(L, x, R) =
    x < y:
        R = ∅:                 (T, ∅)
        R = (L′, x′, R′):
            x′ < y:            (((L, x, L′), x′, A), B)
                               where (A, B) = partition y R′
            otherwise:         ((L, x, A), (B, x′, R′))
                               where (A, B) = partition y L′
    otherwise:
        L = ∅:                 (∅, T)
        L = (L′, x′, R′):
            y < x′:            (A, (B, x′, (R′, x, R)))
                               where (A, B) = partition y L′
            otherwise:         ((L′, x′, A), (B, x, R))
                               where (A, B) = partition y R′
(8.15)
We partition the tree T with a pivot y. For the empty tree, the result is (∅, ∅). Otherwise, for tree (L, x, R), if x < y there are two sub-cases: (1) R is empty: all elements in the tree are less than y, and the result is (T, ∅); (2) R = (L′, x′, R′): if x′ < y, we recursively partition R′ with y, putting all elements less than y in A and the rest in B; the result is the pair ((L, x, L′), x′, A) and B. If x′ > y, we recursively partition L′ with y into (A, B); the result is the pair (L, x, A) and (B, x′, R′). The case y ≤ x is symmetric.
Alternatively, we can define insert with partition. When inserting an element k into T, we first partition the heap into two sub-trees L and R, satisfying L < k < R (L contains all elements smaller than k, and R contains the rest), then create a new tree (L, k, R) rooted at k with L, R as the sub-trees.
insert k T = (L, k, R), where (L, R) = partition k T (8.16)
Since splay tree is essentially a binary search tree, the minimum is at the left most.
We keep traversing the left sub-tree to access the ‘top’ of the heap:
top (∅, k, R) = k
(8.17)
top (L, k, R) = top L
This is equivalent to min of the binary search tree (alternatively, we can define top = min). For pop, we remove the minimum, splaying whenever we move left twice.
pop (∅, k, R) = R
pop ((∅, k 0 , R0 ), k, R) = (R0 , k, R) (8.18)
pop ((L0 , k 0 , R0 ), k, R) = (pop L0 , k 0 , (R0 , k, R))
The third row performs splaying based on the binary search tree property without
calling partition. Top and pop both are bound to O(lg n) time when the splay tree is
balanced.
We can implement merge in O(lg n) time with partition. When merging two non-empty trees, we choose either root as the pivot to partition the other tree, then recursively merge the sub-trees:

merge T ∅ = T
merge ∅ T = T
merge (L, x, R) T = (merge L L′, x, merge R R′)
    where (L′, R′) = partition x T
(8.19)
8.5 Summary
We gave the generic definition of the binary heap in this chapter. There are several implementations. The array-based representation is suitable for imperative implementation: it maps a complete binary tree to an array, supporting random access in constant time. We directly used the binary tree to implement the heap in a functional way. Most operations are bound to O(lg n) time; some are amortized O(1) time [3]. When extending from the binary tree to the k-ary tree, we obtain the binomial heap, Fibonacci heap, and pairing heap (see chapter 10).
Exercise 8.2
8.2.1. Implement leftist heap and skew heap imperatively.
8.2.2. Define fold for heap.
Pop:
K pop([K] a, Less<K> lt) {
    var n = length(a)
    var t = a[0]          // the top element
    swap(a, 0, n - 1)
    remove(a, n - 1)
    if a ≠ [] then heapify(a, 0, lt)
    return t
}
Push:
void push([K] a, K k, less<K> lt) {
append(a, k)
heapFix(a, length(a) - 1, lt)
}
Heap sort:
void heapSort([K] a, less<K> lt) {
buildHeap(a, not ◦ lt)
n = length(a)
while n > 1 {
swap(a, 0, n - 1)
n = n - 1
heapify(a[0 .. (n - 1)], 0, not ◦ lt)
}
}
Skew heap merge:
merge Empty h = h
merge h Empty = h
merge h1@(Node x l r) h2@(Node y l' r') =
if x < y then Node x (merge r h2) l
else Node y (merge h1 r') l'
Splay operation:
−− zig-zig
splay t@(Node (Node (Node a x b) p c) g d) y =
if x == y then Node a x (Node b p (Node c g d)) else t
splay t@(Node a g (Node b p (Node c x d))) y =
if x == y then Node (Node (Node a g b) p c) x d else t
−− zig-zag
splay t@(Node (Node a p (Node b x c)) g d) y =
if x == y then Node (Node a p b) x (Node c g d) else t
splay t@(Node a g (Node (Node b x c) p d)) y =
if x == y then Node (Node a g b) x (Node c p d) else t
−− zig
splay t@(Node (Node a x b) p c) y = if x == y then Node a x (Node b p c) else t
splay t@(Node a p (Node b x c)) y = if x == y then Node (Node a p b) x c else t
−− others
splay t _ = t
Chapter 9

Selection Sort

Selection sort repeatedly selects the minimum of the remaining elements and appends it to the sorted part, sorting the elements in ascending order as shown in fig. 9.1. If we select the maximum instead, it sorts in descending order. The compare operation can be abstracted.
sort [ ] = [ ]
(9.1)
sort A = m : sort (A − [m]) where m = min A
Figure 9.1: The left is sorted, repeatedly select the minimum of the rest and append.
Figure 9.2: The left is sorted, repeatedly find the minimum and swap to the right position.
We can further make it tail recursive. Divide the elements into two groups A and B; A is initialized empty ([ ]) and B contains all elements. We pick two elements from B, compare them, put the greater one into A, and keep the smaller one as m. Then we repeatedly pick elements from B, comparing with m, until B becomes empty; m finally holds the minimum. At any time, we have the invariant that L is a permutation of A ++ [m] ++ B, where m ≤ a for every a ∈ A.
Where:
min′ as m [ ] = (m, as)
min′ as m (b:bs) =
    b < m:        min′ (m:as) b bs
    otherwise:    min′ (b:as) m bs
(9.4)
Function min′ returns a pair: the minimum and the list of the remaining elements (min (x:xs) = min′ [ ] x xs). We define selection sort as below:

sort [ ] = [ ]
sort xs = m : sort xs′, where (m, xs′) = min xs
(9.5)
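As a runnable Haskell rendering of eqs. (9.4) and (9.5) (the names minExtract and ssort are ours, chosen to avoid clashing with the Prelude):

minExtract :: Ord a => [a] -> (a, [a])
minExtract (x:xs) = go [] x xs where
    go as m []     = (m, as)
    go as m (b:bs) | b < m     = go (m:as) b bs
                   | otherwise = go (b:as) m bs

ssort :: Ord a => [a] -> [a]
ssort [] = []
ssort xs = m : ssort xs' where (m, xs') = minExtract xs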
9.1.1 Performance
Selection sort needs to scan and find the minimum n times. It compares n + (n − 1) + (n − 2) + ... + 1 = n(n + 1)/2 times, which is O(n²). Selection sort performs the same in the best, worst, and average cases, while insertion sort performs best at O(n) (when the list is already ordered) and worst at O(n²).
Exercise 9.1
9.1.1. What is the problem with the below implementation of min′?
min′ as m [ ] = (m, as)
min′ as m (b:bs) =
    b < m:        min′ (as ++ [m]) b bs
    otherwise:    min′ (as ++ [b]) m bs
9.2 Improvement
To sort in ascending or descending order flexibly, we abstract the comparison as C:

sortBy C [ ] = [ ]
sortBy C xs = m : sortBy C xs′, where (m, xs′) = minBy C xs
(9.6)

For example, we pass (<) to sort a collection of numbers in ascending order: sortBy (<) [3, 1, 4, ...]. As the constraint, the comparison C needs to satisfy strict weak ordering [52].
1: procedure Sort(A)
2:   for i ← 1 to |A| do
3:     m ← i
4:     for j ← i + 1 to |A| do
5:       if A[j] < A[m] then
6:         m ← j
7:     Exchange A[i] ↔ A[m]
We only need to sort n − 1 elements, leaving the last one in place to save the last loop. Besides, we needn't swap if A[i] is exactly the i-th smallest.
1: procedure Sort(A)
2:   for i ← 1 to |A| − 1 do
3:     m ← i
4:     for j ← i + 1 to |A| do
5:       if A[j] < A[m] then
6:         m ← j
7:     if m ≠ i then
8:       Exchange A[i] ↔ A[m]
Further, we can pick both the minimum and maximum in one pass, swap the minimum
to the head, and the maximum to the tail, hence halve the inner loops.
1: procedure Sort(A)
2:   for i ← 1 to ⌊|A|/2⌋ do
3:     min ← i
4:     max ← |A| + 1 − i
5:     if A[max] < A[min] then
6:       Exchange A[min] ↔ A[max]
7:     for j ← i + 1 to |A| − i do
8:       if A[j] < A[min] then
9:         min ← j
10:      if A[max] < A[j] then
11:        max ← j
12:    Exchange A[i] ↔ A[min]
13:    Exchange A[|A| + 1 − i] ↔ A[max]
It is necessary to swap first if the rightmost element is less than the leftmost one, because the inner-loop scan excludes these two positions. We can also implement the cock-tail sort recursively:

Figure 9.4: Find the minimum and maximum, swap both to the right positions.
sort [ ] = [ ]
sort [x] = [x]
sort xs = a : (sort xs′) ++ [b], where (a, b, xs′) = min-max xs
(9.8)
Where function min-max extracts the minimum and maximum from a list. We initialize the minimum as the first element x0 and the maximum as the second element x1, then process the rest with foldr:

min-max (x0:x1:xs) = foldr sel (min x0 x1, max x0 x1, [ ]) xs     (9.9)

Define sel as:

sel x (x0, x1, xs) =
    x < x0:       (x, x1, x0:xs)
    x1 < x:       (x0, x, x1:xs)
    otherwise:    (x0, x1, x:xs)
Although min-max is bound to O(n) time, ++[b] is expensive. As shown in fig. 9.4, let the left sorted part be A and the right sorted part be B. We can make the cock-tail sort tail recursive with A and B as accumulators:

sort′ A B [ ] = A ++ B
sort′ A B [x] = A ++ (x:B)
sort′ A B xs = sort′ (A ++ [x0]) xs′ (x1:B), where (x0, x1, xs′) = min-max xs
(9.10)
Start sorting with empty accumulators: sort = sort′ [ ] [ ]. The append only happens in A ++ [x0], while x1 is linked before B. To eliminate the ++[x0] as well, we maintain A in reversed order, denoted ←A, hence x0 is prepended instead of appended. We have the following equations:

A′ = A ++ [x]
   = reverse (x : reverse A)
   = reverse (x : ←A)

Hence ←A′ = x : ←A.     (9.11)
Finally, we reverse ←A back to A. The improved algorithm is:

sort′ A B [ ] = (reverse A) ++ B
sort′ A B [x] = (reverse (x:A)) ++ B
sort′ A B xs = sort′ (x0:A) xs′ (x1:B), where (x0, x1, xs′) = min-max xs
(9.12)
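A runnable Haskell sketch of eqs. (9.9) to (9.12) (the names minMax and csort are ours):

minMax :: Ord a => [a] -> (a, a, [a])
minMax (x0:x1:xs) = foldr sel (min x0 x1, max x0 x1, []) xs where
    sel x (lo, hi, ys) | x < lo    = (x, hi, lo:ys)
                       | hi < x    = (lo, x, hi:ys)
                       | otherwise = (lo, hi, x:ys)

csort :: Ord a => [a] -> [a]
csort = go [] [] where
    go as bs []  = reverse as ++ bs
    go as bs [x] = reverse (x:as) ++ bs
    go as bs xs  = go (lo:as) xs' (hi:bs) where (lo, hi, xs') = minMax xs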
Figure 9.5: The tournament tree built from [7, 6, 15, 16, 8, 4, 13, 3, 5, 10, 9, 1, 12, 2, 11, 14]; the champion 16 is at the root.
Assign every team a number to measure its strength. Suppose the team with the greater number always beats the smaller one (this is obviously not true in the real world). The champion number is 16. The runner-up is not 14, but 15, which was knocked out in the first round. We need a way to quickly identify the second greatest number in the tournament tree, then apply it to select the 3rd, the 4th, ... to sort. We can overwrite the champion with a very small number, i.e. −∞, so that it won't be selected again, and the previous runner-up becomes the new champion. For 2^m teams, where m is some natural number, it takes 2^(m−1) + 2^(m−2) + ... + 2 + 1 = 2^m − 1 comparisons to determine the new champion.
This is the same as before. Actually, we needn't perform bottom-up comparisons, because the tournament tree stores sufficient ordering information: the champion must have beaten the runner-up at some time, so we can locate the runner-up along the path from the root to the champion's leaf. We grey this path in fig. 9.5; compare with [14, 13, 7, 15]. This method is defined as below:

1. Build a tournament tree with the maximum (the champion) at the root;

2. Take the champion, then replace it with −∞ top-down along its path to the leaf;

3. Perform a bottom-up back-track along the path, find the new champion and store it in the root;
Figure 9.6: Overwrite 16 with −∞; back-tracking finds the new champion 15.
Figure 9.7: The tournament tree after popping 15.
Figure 9.8: The tournament tree after popping 14.
1: function Build-Tree(A)
2:   T ← [ ]
3:   for each x ∈ A do
4: Append(T , Node(NIL, x, NIL))
5: while |T | > 1 do
6: T0 ← [ ]
7: for every t1 , t2 ∈ T do
8: k ← Max(Key(t1 ), Key(t2 ))
9: Append(T 0 , Node(t1 , k, t2 ))
10: if |T| is odd then
11: Append(T 0 , Last(T ))
12: T ← T0
13: return T [1]
When pop, we replace the root with −∞ top-down, then back-track through the parent
field to find the new maximum.
1: function Pop(T )
2: m ← Key(T )
3: Key(T ) ← −∞
4: while T is not leaf do . top-down replace m with −∞.
5: if Key(Left(T )) = m then
6: T ← Left(T )
7: else
8: T ← Right(T )
9: Key(T ) ← −∞
10: while Parent(T ) 6= NIL do . bottom-up to find the new maximum.
11: T ← Parent(T )
12: Key(T ) ← Max(Key(Left(T )), Key(Right(T )))
13: return (m, T ) . the maximum and the new tree.
Pop processes the tree in two passes, top-down, then bottom-up along the path of
the champion. Because the tournament tree is balanced, the length of this path, i.e. the
height of the tree, is bound to O(lg n), where n is the number of the elements. Below is
the tournament tree sort. We first build the tree in O(n) time, then pop the maximum
for n times, each pop takes O(lg n) time. The total time is bound to O(n lg n).
procedure Sort(A)
T ← Build-Tree(A)
for i ← |A| down to 1 do
(A[i], T ) ← Pop(T )
We can also implement the tournament tree sort recursively. Reusing the binary search tree definition, let a non-empty tree be (l, k, r), where k is the element and l, r are the left and right sub-trees. Define wrap x = (∅, x, ∅) to create a leaf node, and convert the n elements to a list of single trees: ts = map wrap xs. For every pair of trees t1, t2, we merge them into a bigger tree, picking the greater element as the new root with t1, t2 as the sub-trees:

merge t1 t2 = (t1, max (key t1) (key t2), t2)
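A Haskell sketch of the recursive build could look like below (our own rendering; the Tree type and the helper names go/pairs are assumptions):

data Tree a = Empty | Node (Tree a) a (Tree a)

key (Node _ k _) = k
wrap x = Node Empty x Empty

merge t1 t2 = Node t1 (max (key t1) (key t2)) t2

-- repeatedly merge adjacent pairs until one tree remains
build = go ∘ map wrap where
    go []  = Empty
    go [t] = t
    go ts  = go (pairs ts)
    pairs (t1:t2:ts) = merge t1 t2 : pairs ts
    pairs ts = ts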
When popping the champion, we examine the sub-trees to see which one holds the same element as the root, then recursively pop the champion from that sub-tree until reaching the leaf, where we replace it with −∞. Sorting repeatedly extracts the root:

sort ∅ = [ ]
sort (l, −∞, r) = [ ]
sort t = (key t) : sort (pop t)
(9.17)
Exercise 9.2
9.2.1. Implement the recursive tournament tree sort in ascending order.
9.2.2. How to handle duplicated elements with the tournament tree? is tournament tree
sort stable?
9.2.3. Compare the tournament tree sort and binary search tree sort in terms of space
and time performance.
9.2.4. Compare heap sort and tournament tree sort in terms of space and time perfor-
mance.
sort ∅ = [ ]
sort t = (top t) : sort (pop t)
(9.18)

This is exactly the same as the definition of heap sort. A heap always keeps the minimum (or the maximum) on top and provides a fast pop operation. The array implementation encodes the binary tree structure in the indices, using exactly n cells to represent the heap. The functional heaps, like the leftist heap and splay heap, use n nodes as well. We'll introduce better-performing heaps in the next chapter.
Cock-tail sort:
[A] cocktailSort([A] xs) {
Int n = length(xs)
for Int i = 0 to n / 2 {
var (mi, ma) = (i, n - 1 - i)
if xs[ma] < xs[mi] then swap(xs[mi], xs[ma])
for Int j = i + 1 to n - 1 - i {
if xs[j] < xs[mi] then mi = j
if xs[ma] < xs[j] then ma = j
}
swap(xs[i], xs[mi])
swap(xs[n - 1 - i], xs[ma])
}
return xs
}
key (Br _ k _) = k

toList Empty = []
toList (Br _ Inf _) = []
toList t@(Br _ (Only k) _) = k : toList (pop t)
Chapter 10

Binomial Heap, Fibonacci Heap, and Pairing Heap

The binary heap stores elements in a binary tree. We can extend it to a k-ary tree [54] (a multi-way tree with k > 2 branches), or to multiple trees. The binomial heap is a forest of k-ary trees. When delaying some operations of the binomial heap, we obtain the Fibonacci heap: it improves the heap merge performance from O(lg n) to amortized constant time, which is critical for graph algorithm design. The pairing heap gives a simplified implementation with good overall performance.
When n is a natural number, the list of coefficients is some row in Pascal’s triangle1 [55] .
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
...
The first row is 1, and the first and last numbers are 1 in every row. Any other number is the sum of the top-left and top-right numbers in the previous row. There are many ways to generate Pascal's triangle, for example with recursion.
1070). Newton generalized n to rational numbers, later Euler expanded it to real exponents.
1. B0 has only one node;

2. Bn is formed by two Bn−1 trees; the one with the greater root element becomes the leftmost sub-tree of the other, as shown in fig. 10.1.

Figure 10.1: B0, and Bn formed by linking two Bn−1 trees.
A binomial heap is a forest of binomial trees that satisfies the following rules:

1. Every tree satisfies the heap property, i.e. for a min-heap, the element in every node is not less than (≥) its parent;

2. Every tree has a unique rank, i.e. any two trees have different ranks.
From the 2nd rule, for a binomial heap of n elements, convert n to its binary format n = (am ... a1 a0)₂, where a0 is the least significant bit (LSB) and am is the most significant bit (MSB). There is a tree of rank i if and only if ai = 1. For example, consider a binomial heap of 5 elements: as 5 = (101)₂ in binary, there are 2 binomial trees, B0 and B2. The binomial heap in fig. 10.3 has 19 elements; 19 = (10011)₂, so there are three trees: B0, B1, and B4.
We define the binomial tree as (r, k, ts), where r is the rank, k is the root element,
and ts is the list of sub-trees ordered by rank.
data BiTree a = Node Int a [BiTree a]
type BiHeap a = [BiTree a]
There is a method called ‘left-child, right-sibling’ [4] , that reuses the binary tree data
structure to define multi-ways tree. Every node has the left and right parts. the left
references to the first sub-tree; the right references to its sibling. All siblings form a list
as shown in fig. 10.4. Alternatively, we can use an array or a list to hold the sub-trees.
Figure 10.2: Binomial trees of rank 0, 1, 2, 3, ...

Figure 10.3: A binomial heap of 19 elements, consisting of trees B0, B1, and B4.
Figure 10.4: R is the root; T1, T2, ..., Tm are the sub-trees of R. The left of R is T1, the right is NIL. T11, ..., T1p are the sub-trees of T1. The left of T1 is T11, the right is its sibling T2. The left of T2 is T21, the right is its sibling.
10.1.2 Link
To link two Bn trees to a Bn+1 tree, we compare the two root elements, choose the smaller
one as the root, and put the other tree ahead of other sub-trees as shown in fig. 10.5.
link (r, x, ts) (r, y, ts′) =
    x < y:        (r + 1, x, (r, y, ts′):ts)
    otherwise:    (r + 1, y, (r, x, ts):ts′)
(10.2)
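In Haskell, using the BiTree defined above, link might be rendered as:

link t1@(Node r x ts) t2@(Node _ y ts')
    | x < y     = Node (r + 1) x (t2:ts)
    | otherwise = Node (r + 1) y (t1:ts')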
We can implement link with ‘left child, right sibling’ method as below. Link operation
is bound to constant time.
1: function Link(x, y)
2: if Key(y) < Key(x) then
3: Exchange x ↔ y
4:   Sibling(y) ← Sub-Trees(x)
5: Sub-Trees(x) ← y
6: Parent(y) ← x
7: Rank(x) ← Rank(y) + 1
8: return x
Exercise 10.1
10.1.1. Write a program to generate Pascal’s triangle.
10.1.2. Prove that the i-th row of tree Bn has C(n, i) (the binomial coefficient 'n choose i') nodes.
10.1.3 Insert
When inserting a tree, we keep the forest ordered by rank (ascending):
ins t [ ] = [t]
ins t (t′:ts) =
    rank t < rank t′:    t : t′ : ts
    rank t′ < rank t:    t′ : ins t ts
    otherwise:           ins (link t t′) ts
(10.3)
Where rank (r, k, ts) = r returns the rank of a tree. For the empty heap [ ], the result is the singleton [t]; otherwise, we compare the rank of t with that of the first tree t′: if t has the smaller rank, it becomes the new first tree; if t′ has the smaller rank, we recursively insert t into the rest of the trees; if they have the same rank, we link t and t′ into a bigger tree and recursively insert it. For n elements, there are at most O(lg n) binomial trees in the heap, so ins performs at most O(lg n) links. As linking is constant time, the overall performance is bound to O(lg n). We define insert for the binomial heap with ins: first wrap the new element x in a singleton tree, then insert it into the heap:

insert x = ins (0, x, [ ])     (10.4)

This is in Curried form. We can further insert a list of elements with a fold:

fromList = foldr insert [ ]     (10.5)
10.1.4 Merge
When merging two binomial heaps, we actually merge two lists of binomial trees. Every tree in the merged result has a unique rank, in ascending order. The tree merge process is similar to merge sort (see chapter 13): every time, we pick the first tree from each heap and compare their ranks, putting the one with the smaller rank into the result. If the two trees have the same rank, we link them into a bigger tree and recursively insert it into the merge result.
merge ts1 [ ] = ts1
merge [ ] ts2 = ts2
merge (t1:ts1) (t2:ts2) =
    rank t1 < rank t2:    t1 : merge ts1 (t2:ts2)
    rank t2 < rank t1:    t2 : merge (t1:ts1) ts2
    otherwise:            ins (link t1 t2) (merge ts1 ts2)
(10.6)
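A direct Haskell rendering of eqs. (10.3) and (10.6), using the BiTree defined before (insTree is our name for ins):

rank (Node r _ _) = r

insTree t [] = [t]
insTree t (t':ts)
    | rank t < rank t' = t : t' : ts
    | rank t' < rank t = t' : insTree t ts
    | otherwise        = insTree (link t t') ts

merge ts1 [] = ts1
merge [] ts2 = ts2
merge (t1:ts1) (t2:ts2)
    | rank t1 < rank t2 = t1 : merge ts1 (t2:ts2)
    | rank t2 < rank t1 = t2 : merge (t1:ts1) ts2
    | otherwise         = insTree (link t1 t2) (merge ts1 ts2)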
Alternatively, when t1 and t2 have the same rank, we can insert the linked tree back
to either heap, and recursively merge:
merge (ins (link t1 t2 ) ts1 ) ts2
We can eliminate recursion, and implement the iterative merge:
1: function Merge(H1 , H2 )
2: H ← p ← Node(0, NIL, NIL)
3: while H1 6= NIL and H2 6= NIL do
4: if Rank(H1 ) < Rank(H2 ) then
5: Sibling(p) ← H1
6: p ← Sibling(p)
7: H1 ← Sibling(H1 )
8: else if Rank(H2 ) < Rank(H1 ) then
9: Sibling(p) ← H2
10: p ← Sibling(p)
11: H2 ← Sibling(H2 )
12: else . same rank
13: T1 ← H1 , T2 ← H2
14: H1 ← Sibling(H1 ), H2 ← Sibling(H2 )
15: H1 ← Insert-Tree(Link(T1 , T2 ), H1 )
16: if H1 6= NIL then
17: Sibling(p) ← H1
18: if H2 6= NIL then
19: Sibling(p) ← H2
20: return Remove-First(H)
If there are m1 trees in H1 and m2 trees in H2, there are at most m1 + m2 trees after the merge. The merge is bound to O(m1 + m2) time if all trees have different ranks; if there are trees of the same rank, we call ins up to O(m1 + m2) times. Consider m1 = 1 + ⌊lg n1⌋ and m2 = 1 + ⌊lg n2⌋, where n1, n2 are the numbers of elements in each heap, and ⌊lg n1⌋ + ⌊lg n2⌋ ≤ 2⌊lg n⌋, where n = n1 + n2. The final performance of merge is O(lg n).
10.1.5 Pop
Although every tree keeps its minimal element in its root, we don't know which tree holds the overall minimum of the heap; we need to locate it among all the trees. As there are O(lg n) trees, it takes O(lg n) time to find the top element:

top (t:ts) = foldr f (key t) ts, where f (r, x, ts′) y = min x y     (10.7)
1: function Top(H)
2: m←∞
3: while H 6= NIL do
4: m ← Min(m, Key(H))
5: H ← Sibling(H)
6: return m
For pop, we further remove the top element and maintain the heap property. Let the trees be Bi, Bj, ..., Bp, ..., Bm in the heap, and let the minimum be in the root of Bp. After removing this root, there remain p sub binomial trees, with ranks p − 1, p − 2, ..., 0. We reverse them to form a new binomial heap Hp. The other trees, without Bp, also form a binomial heap H′ = H − [Bp]. We merge Hp and H′ to get the final result, as shown in fig. 10.6. To support pop, we need to extract the tree containing the minimum:
Figure 10.6: Pop: reverse the sub-trees B′p−1, B′p−2, ..., B′0 of Bp (which holds the minimum) to form heap Hp, then merge Hp with the remaining trees Bi, Bj, ..., Bm.
Where key (r, k, ts) = k accesses the root element. The result of min′ is a pair: the tree containing the minimum, and the remaining trees. We next define pop with it. Below imperative procedure extracts the min-tree:
1: function Extract-Min(H)
2: H′ ← H, p ← NIL
3: Tm ← Tp ← NIL
4: while H ≠ NIL do
5: if Tm = NIL or Key(H) < Key(Tm ) then
6: Tm ← H
7: Tp ← p
8: p←H
9: H ← Sibling(H)
10: if Tp ≠ NIL then
11: Sibling(Tp ) ← Sibling(Tm )
12: else
13: H′ ← Sibling(Tm)
14: Sibling(Tm ) ← NIL
15: return (Tm, H′)
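Below is a minimal Haskell sketch of min′ and pop, reusing the (rank, key, sub-trees) representation and merge from the sketches above. The sub-trees are reversed to restore ascending rank order:

key :: BiTree a -> a
key (_, x, _) = x

-- extract the tree holding the minimum root, paired with the rest
min' :: Ord a => [BiTree a] -> (BiTree a, [BiTree a])
min' [t] = (t, [])
min' (t:ts) | key t < key t' = (t, ts)
            | otherwise      = (t', t:ts')
  where (t', ts') = min' ts

pop :: Ord a => [BiTree a] -> [BiTree a]
pop ts = merge (reverse subTrees) rest
  where ((_, _, subTrees), rest) = min' ts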
We can implement heap sort with pop: first build a binomial heap from a list, then repeatedly pop the minimum.

heapSort [ ] = [ ]
heapSort H = (top H) : heapSort (pop H)        (10.11)
Binomial heap insert and merge are bound to O(lg n) time in the worst case; their amortized performance is constant time. We skip the proof.

10.2 Fibonacci Heap

When inserting x to a binomial heap, we wrap it in a singleton tree, then insert the tree to the forest, maintaining the rank order: if two ranks are the same, we link the trees and recursively insert. The performance is bound to O(lg n) time. Taking a lazy strategy, we delay the rank-ordered insert and link: just put the singleton tree of x directly to the forest. To access the top element in constant time, we record which tree holds the overall minimum. A
3 Michael L. Fredman and Robert E. Tarjan used Fibonacci numbers to prove the performance bound, hence they named this data structure after Fibonacci. [4]
Fibonacci heap is either empty ∅, or a forest of trees denoted as (n, tm, ts), where n is the number of elements in the heap, tm is the tree that holds the top element, and ts are the rest of the trees. Below example program defines the Fibonacci heap (reusing the binomial tree definition).
data FibHeap a = Empty | FibHeap { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]}
We can access the top in constant time (Curried form): top = key ◦ minTree.
10.2.1 Insert
We define insert as a special case of merge, where one heap is a singleton tree: insert x H = merge (singleton x) H, or simplified in Curried form: insert = merge ∘ singleton. Where:

singleton x = (1, (1, x, [ ]), [ ])
Below is the imperative implementation:
1: function Insert(k, H)
2: x ← Singleton(k) . wrap k to a tree
3: Add(x, Trees(H))
4: Tm ← Min-Tree(H)
5: if Tm = NIL or k < Key(Tm ) then
6: Min-Tree(H) ← x
7: Size(H) ← Size(H) + 1
Where Trees(H) accesses the list of trees in H, and Min-Tree(H) returns the tree that holds the minimal element.
10.2.2 Merge
When merging two heaps, we delay the linking, only putting the trees together, then pick the new top.
merge h ∅ = h
merge ∅ h = h

                                   key tm < key t′m : (n + n′, tm, t′m : ts ++ ts′)
merge (n, tm, ts) (n′, t′m, ts′) =                                                      (10.13)
                                   otherwise : (n + n′, t′m, tm : ts ++ ts′)
When neither heap is empty, the ++ takes time proportional to the number of trees in one heap. We can improve it to constant time with a doubly linked-list, as in below example program.
data Node<K> {
K key
Int rank
Node<K> next, prev, parent, subTrees
}
data FibHeap<K> {
Int size
Node<K> minTree, trees
}
1: function Merge(H1 , H2 )
2: H ← Fib-Heap
3: Trees(H) ← Concat(Trees(H1 ), Trees(H2 ))
4: if Key(Min-Tree(H1 )) < Key(Min-Tree(H2 )) then
5: Min-Tree(H) ← Min-Tree(H1 )
6: else
7: Min-Tree(H) ← Min-Tree(H2 )
8: Size(H) ← Size(H1) + Size(H2)
9: return H
10: function Concat(s1, s2)
11: e1 ← Prev(s1)
12: e2 ← Prev(s2)
13: Next(e1) ← s2
14: Prev(s2) ← e1
15: Next(e2) ← s1
16: Prev(s1) ← e2
17: return s1
10.2.3 Pop
As the merge function delays the linking, we need to 'compensate' for it during pop. We define this as tree consolidation. Consider a similar problem: given a list of numbers, each a power of two (2^m for some integer m ≥ 0), e.g., L = [2, 1, 1, 4, 8, 1, 1, 2, 4], we repeatedly sum two equal numbers until every number is unique. The result is [8, 16], as shown in table 10.2. The first column gives the number we are 'scanning'; the second is the middle step, i.e., compare the current number with the first number in the result list, and add them up when equal; the last column is the result passed to the next step. We can define the consolidation with fold.
Let n = sum L. consolidate essentially represents n in binary format: the result contains 2^i (i starting from 0) if and only if the i-th bit of n is 1. For example, sum [2, 1, 1, 4, 8, 1, 1, 2, 4] = 24.
It is (11000)₂ in binary; the 3rd and 4th bits are 1, hence the result contains 2³ = 8 and 2⁴ = 16.
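Below is a sketch of the number consolidation with fold; meltNum is a hypothetical helper that mirrors the tree version given in eq. (10.16) below:

meltNum :: Int -> [Int] -> [Int]
meltNum x [] = [x]
meltNum x (x':xs) | x == x'   = meltNum (x + x') xs
                  | x <  x'   = x : x' : xs
                  | otherwise = x' : meltNum x xs

consolidateNum :: [Int] -> [Int]
consolidateNum = foldr meltNum []

-- consolidateNum [2,1,1,4,8,1,1,2,4] evaluates to [8,16]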
We can consolidate the trees in a similar way: compare the ranks, and link when equal:

melt t [ ] = [t]

                 rank t = rank t' : melt (link t t') ts
melt t (t':ts) = rank t < rank t' : t : t' : ts              (10.16)
                 rank t > rank t' : t' : melt t ts
Figure 10.7 gives the consolidation steps; it is similar to the number consolidation in table 10.2. We can use an auxiliary array A for consolidation, where A[i] stores the tree of rank i. We traverse the trees in the heap. If we meet another tree of rank i, we link the two together to obtain a bigger tree of rank i + 1, clean A[i], then check whether A[i + 1] is empty or not. If there is already a tree of rank i + 1, we link again. Array A stores the final consolidation result after the traversal.
Figure 10.7: Consolidation. In step 3, link d and c, then link to a; in steps 7 and 8, link r and q, then link s and q.
1: function Consolidate(H)
2: R ← Max-Rank(Size(H))
3: A ← [NIL, NIL, ..., NIL] . total R cells
4: for each T in Trees(H) do
5: r ← Rank(T )
6: while A[r] ≠ NIL do
7: T 0 ← A[r]
8: T ← Link(T, T 0 )
9: A[r] ← NIL
10: r ←r+1
11: A[r] ← T
12: Tm ← NIL
13: Trees(H) ← NIL
14: for each T in A do
15: if T ≠ NIL then
16: append T to Trees(H)
17: if Tm = NIL or Key(T) < Key(Tm) then
18: Tm ← T
19: Min-Tree(H) ← Tm
It becomes a binomial heap after consolidation, with O(lg n) trees. Max-Rank(n) returns the upper limit of the rank R in a heap of n elements. From the binomial tree result, the biggest tree BR has 2^R elements, and 2^R ≤ n < 2^(R+1); the rough upper limit is R ≤ log₂ n. We'll give a more accurate estimation of R in a later section. We additionally scan all the trees to find the minimal root element; we can reuse min′ defined in eq. (10.8) to extract the min-tree.
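In Haskell, the consolidation is a fold of melt (eq. 10.16) over the trees; a minimal sketch reusing rank and link from the binomial heap sketches:

melt :: Ord a => BiTree a -> [BiTree a] -> [BiTree a]
melt t [] = [t]
melt t (t':ts) | rank t == rank t' = melt (link t t') ts
               | rank t <  rank t' = t : t' : ts
               | otherwise         = t' : melt t ts

consolidate :: Ord a => [BiTree a] -> [BiTree a]
consolidate = foldr melt []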
E = mgh

As shown in fig. 10.8, consider some process that moves an object of mass m up and down, and finally stops at height h′. Let the friction resistance be Wf. The process does the following work:

W = mg(h′ − h) + Wf
Consider heap pop. To evaluate the cost, let the potential be Φ(H) before pop. It
is the result accumulated by a series of insert and merge operations. The heap becomes
H′ after the tree consolidation, with the new potential Φ(H′). The difference between Φ(H′) and Φ(H), plus the cost of the tree consolidation, gives the amortized performance. Define the potential as:

Φ(H) = t(H)        (10.18)

Where t(H) is the number of trees in the heap. Let the upper bound of the rank of all trees be R(n), where n is the number of elements in the heap. After the tree consolidation, there are at most t(H′) = R(n) + 1 trees. Before the consolidation, there is another operation that contributes to the running time: we remove the root of the min-tree, then add all its sub-trees to the heap. Hence we consolidate at most R(n) + t(H) − 1 trees. Let the pop performance be bound to T, and the consolidation to Tc; the amortized time is given as below:
T = Tc + Φ(H 0 ) − Φ(H)
= O(R(n) + t(H) − 1) + (R(n) + 1) − t(H) (10.19)
= O(R(n))
Insert, merge, and pop ensure that all trees are binomial trees, therefore the upper bound of R(n) is O(lg n).
Figure 10.9: If key x < key y, cut x off and add to the heap.
Exercise 10.2

For any tree x in a Fibonacci heap, we have |x| ≥ Frank(x)+2, where Fk is the Fibonacci sequence:

F0 = 0
F1 = 1
Fk = Fk−1 + Fk−2
Proof. For tree x, let its k sub-trees be y1, y2, ..., yk, ordered by the time when they were linked to x, where y1 is the earliest and yk is the latest. Obviously, |yi| ≥ 1. When linking yi to x, there were already sub-trees y1, y2, ..., yi−1. Because we only link nodes of the same rank, at that time we had:

rank(yi) = rank(x) = i − 1

After that, yi can lose at most one sub-tree (through Decrease); once it loses the second sub-tree, it is cut off and added to the forest. For any i = 2, 3, ..., k, we have:

rank(yi) ≥ i − 2

Let sk be the minimum possible size of tree x, where k = rank(x). It starts from s0 = 1, s1 = 2, i.e., there is at least one node in a tree of rank 0, at least two nodes in a tree of rank 1, and at least sk nodes in a tree of rank k.
|x| ≥ sk
    = 2 + s_rank(y2) + s_rank(y3) + ... + s_rank(yk)
    ≥ 2 + s0 + s1 + ... + sk−2

The last row holds because rank(yi) ≥ i − 2, and sk is monotonic, hence s_rank(yi) ≥ s_(i−2). We next show that sk ≥ Fk+2, applying induction. For the edge cases, s0 = 1 ≥ F2 = 1, and s1 = 2 ≥ F3 = 2. For the induction case k ≥ 2:

|x| ≥ sk
    ≥ 2 + s0 + s1 + ... + sk−2
    ≥ 2 + F2 + F3 + ... + Fk          induction hypothesis
    = 1 + F0 + F1 + F2 + ... + Fk     F0 = 0, F1 = 1
    = Fk+2                            by eq. (10.21) below
Next, we prove:

Fk+2 = 1 + Σ_{i=0}^{k} Fi        (10.21)
• Edge case: F2 = 1 + F0 = 1;
• Induction case:

Fk+2 = Fk+1 + Fk
     = (1 + Σ_{i=0}^{k−1} Fi) + Fk        induction hypothesis
     = 1 + Σ_{i=0}^{k} Fi
For the Fibonacci sequence, Fk+2 ≥ φ^k, where φ = (1 + √5)/2 is the golden ratio. Hence n ≥ sk ≥ Fk+2 ≥ φ^k, i.e., k ≤ log_φ n. This proves that pop is an amortized O(lg n) algorithm. We can define maxRank as:
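The definition is lost in this copy; below is a sketch from the bound n ≥ φ^R (hence R ≤ log_φ n), where the exact constant offset is an assumption:

maxRank :: Int -> Int
maxRank n = 1 + floor (logBase phi (fromIntegral n))
  where phi = (1 + sqrt 5) / 2 :: Double

10.3 Pairing Heaps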
10.3.1 Definition
A pairing heap is a multi-way tree where the root holds the minimum. It is either empty ∅, or a k-ary tree consisting of a root and multiple sub-trees, denoted as (x, ts). We can also define the tree in the 'left child, right sibling' way.
data PHeap a = Empty | Node a [PHeap a]
We define merge by cases:
1. If either heap is empty, the result is the other heap;
2. Otherwise, compare the two roots, and set the greater one as a new sub-tree of the other.
merge ∅ h2 = h2
merge h1 ∅ = h1

                           x < y : (x, (y, ts2) : ts1)
merge (x, ts1) (y, ts2) =                                    (10.24)
                           otherwise : (y, (x, ts1) : ts2)
merge is in constant time. Below is the imperative implementation with the ‘left-child,
right sibling’ method:
1: function Merge(H1 , H2 )
2: if H1 = NIL then
3: return H2
4: if H2 = NIL then
5: return H1
6: if Key(H2 ) < Key(H1 ) then
7: Exchange(H1 ↔ H2 )
8: Sub-Trees(H1 ) ← Link(H2 , Sub-Trees(H1 ))
9: Parent(H2 ) ← H1
10: return H1
Similar to the Fibonacci heap, we implement insert with merge as in eq. (10.12), and access the top from the root: top (x, ts) = x. Both operations are constant time.
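Below is a minimal Haskell sketch of merge (eq. 10.24), insert, and top, using the PHeap definition above:

merge :: Ord a => PHeap a -> PHeap a -> PHeap a
merge Empty h = h
merge h Empty = h
merge h1@(Node x ts1) h2@(Node y ts2)
  | x < y     = Node x (h2:ts1)
  | otherwise = Node y (h1:ts2)

insert :: Ord a => a -> PHeap a -> PHeap a
insert x = merge (Node x [])

top :: PHeap a -> a
top (Node x _) = x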
10.3.4 Pop
After popping the root, we consolidate the sub-trees into one tree: we first merge every two sub-trees from left to right, then merge these paired results from right to left into a single tree. This explains the name 'pairing heap'. Figures 10.10 and 10.11 show the paired merge.
consolidate [ ] = ∅
consolidate [t] = t (10.26)
consolidate (t1 :t2 :ts) = merge (merge t1 t2 ) (consolidate ts)
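A sketch of pop: drop the root, then consolidate the sub-trees pairwise (eq. 10.26), reusing merge from the sketch above.

pop :: Ord a => PHeap a -> PHeap a
pop (Node _ ts) = consolidate ts
  where consolidate []          = Empty
        consolidate [t]         = t
        consolidate (t1:t2:ts') = merge (merge t1 t2) (consolidate ts')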
Figure 10.10: Pop the root, merge the 9 sub-trees in pairs, leave the last tree.
5: L ← Link(T, L)
6: H ← NIL
7: for T in L do
8: H ← Merge(H, T )
9: return H
We iterate to merge each pair Tx, Ty into T, and link T ahead of L. When we later loop over L, we actually traverse from right to left. If there is an odd number of sub-trees, then Ty = NIL at the last step, hence T = Tx in this case.
10.3.5 Delete
To delete a node x, we can first decrease its value to −∞, then follow with a pop. Alternatively, if x is the root, pop it; otherwise, cut x off H, pop x, and merge the result back to H:
1: function Delete(H, x)
2: if H = x then
3: Pop(H)
4: else
5: H ← Cut(H, x)
6: x ← Pop(x)
7: Merge(H, x)
As delete is implemented with pop, the performance is conjectured to be amortized
O(lg n) time.
Figure 10.11: Merge from right to left. (d) merge 9, 6; (e) merge 7; (f) merge 3; (g) merge
4.
Exercise 10.3
10.3.1. Suppose we continuously insert n elements, then follow with a pop. The overhead of the pop is big when n is large (although the amortized performance is O(lg n)). How can we mitigate this worst case?
10.3.2. Implement delete for the pairing heap.
10.3.3. Implement Decrease-Key for the pairing heap.
Node(K x) {
key = x
rank = 0
parent = subTrees = sibling = null
mark = false
}
}
Node<K> removeFirst(Node<K> h) {
var next = h.sibling
h.sibling = null
return next
}
insertTree t [] = [t]
insertTree t ts@(t':ts') | rank t < rank t' = t:ts
| rank t > rank t' = t' : insertTree t ts'
| otherwise = insertTree (link t t') ts'
} else {
x.mark = true
}
}
Chapter 11
Queue
Queue supports first-in, first-out (FIFO) access. There are many ways to implement queue, e.g., through linked-list, doubly linked list, circular buffer, etc. Okasaki gives 16 different implementations in [3]. A queue satisfies two requirements: elements are added to one end and removed from the other, and both operations are efficient (constant time). It's easy to realize queue with a doubly linked-list. We skip this implementation, and focus on other basic data structures, like the (singly) linked-list or array.
data Queue<K> {
Node<K> head, tail
}
Figure 11.1: Both head and tail point to S for empty queue.
Define 'enqueue' (also called push, snoc, append, or push back) and 'dequeue' (also called pop, or pop front) to add and remove elements respectively. When implementing the queue with a singly linked-list, we enqueue at the tail and dequeue from the head.
1: function Enqueue(Q, x)
2: p ← Node(x)
3: Next(p) ← NIL
4: Next(Tail(Q)) ← p
5: Tail(Q) ← p
As there is at least the sentinel node S even for an empty queue, we needn't check whether the tail is NIL.
1: function Dequeue(Q)
2: x ← Head(Q)
3: Next(Head(Q)) ← Next(x)
4: if x = Tail(Q) then . Q is empty
5: Tail(Q) ← Head(Q)
6: return Key(x)
As the S node is ahead of all other nodes, Head actually returns the next node of S, as shown in fig. 11.2. It's easy to extend this implementation to a concurrent environment with two locks on the head and tail respectively. The S node helps to prevent deadlock when the queue is empty [59] [60].
Figure 11.2: The queue with the sentinel node S; Head returns the node next to S.
Figure: Circular buffer queue. (c) Enqueue elements up to the boundary; (d) enqueue the next element, wrapping around to the first cell.
Exercise 11.1
11.1.1. The circular buffer is allocated with a predefined size. We can use two references,
head and tail instead. How to determine if a circular buffer queue is full or empty?
(the head can be either ahead of tail or behind it.)
11.3 Paired-list Queue

Denote the queue as a pair of lists (f, r): we push to the rear list r, and pop from the front list f. f may become empty after a series of pops while r still contains elements. To continue popping, we reverse r to replace f, i.e., ([ ], r) ↦ (reverse r, [ ]). We need to check and balance
in every push/pop:
balance [ ] r = (reverse r, [ ])
(11.2)
balance f r = (f, r)
Although it takes linear time to reverse r, the amortized performance is constant time. Update push/pop with balance, as the sketch below shows.
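A minimal Haskell sketch; here pop also returns the removed element (an assumption for convenience; the book's pop only drops it):

type Queue a = ([a], [a])   -- (front, rear)

balance :: Queue a -> Queue a
balance ([], r) = (reverse r, [])
balance q       = q

push :: a -> Queue a -> Queue a
push x (f, r) = balance (f, x:r)

pop :: Queue a -> (a, Queue a)
pop (x:f, r) = (x, balance (f, r))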
There is a symmetric implementation with a pair of arrays (table 11.1). Connect the two arrays head to head, as shown in fig. 11.6. When R becomes empty, reverse array F to replace R.

Figure 11.6: A pair of arrays connected head to head: push to the end of the front array [x1, x2, ..., xn], pop from the end of the rear array [y1, y2, ..., ym].
Exercise 11.2
11.2.1. Why do we need the balance check and adjustment after push?
11.2.2. Do the amortized analysis for the paired-list queue.
11.2.3. Implement the paired-array queue.
11.4 Balance Queue

We add a balance rule to ensure the front list is never shorter than the rear:

|r| ≤ |f|        (11.4)
We check the lengths in every push/pop; however, it takes linear time to compute the length of a list. Instead, we record the length in a variable and update it during push/pop. Denote the paired-list queue as (f, n, r, m), where n = |f|, m = |r|. From the balance rule eq. (11.4), we only need to check the length of f to test whether the queue is empty:

Q = ∅ ⟺ n = 0        (11.5)
Where balance is defined as:

                        m ≤ n : (f, n, r, m)
balance (f, n, r, m) =                                               (11.7)
                        otherwise : (f ++ reverse r, m + n, [ ], 0)
11.5 Real-time Queue

The balanced queue reverses r in one go, which still takes linear time for a single operation in the worst case. To make the queue real-time, we break the computation into steps, where:

reverse′ a [ ] = a
reverse′ a (x:xs) = reverse′ (x:a) xs        (11.9)

step Sr a [ ] = (Sf, a)
step Sr a (x:xs) = (Sr, (x:a), xs)           (11.10)
At each step, we check and transform the state. Sr means the reverse is ongoing. If there is no remaining element to reverse, we change the state to Sf (done); otherwise, we link the head x ahead of a. Every step terminates instead of recursing to completion; the new state with the intermediate reverse result is the input to the next step. For example:
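The original example is lost in this copy; a reconstructed trace of stepping the reverse of [1, 2, 3]:

step Sr []      [1,2,3]  ==> (Sr, [1], [2,3])
step Sr [1]     [2,3]    ==> (Sr, [2,1], [3])
step Sr [2,1]   [3]      ==> (Sr, [3,2,1], [])
step Sr [3,2,1] []       ==> (Sf, [3,2,1])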
However, this only solves half of the problem. We also need to slow down the ++ computation, which is more complex. We use a state machine again. To concatenate xs ++ ys, we first reverse xs to ←xs, then pick elements from ←xs one by one, and link each ahead of ys. The idea is similar to reverse′:

xs ++ ys = (reverse (reverse xs)) ++ ys
         = (reverse′ [ ] ←xs) ++ ys          (11.11)
         = reverse′ ys ←xs
We need to add another state: after reversing r, concatenate from ←f step by step. The three states are: Sr for reversing, Sc for concatenating, and Sf for completion. The two phases are:

1. Stepped reverse f to ←f, and r to ←r, in parallel;
2. Stepped take elements from ←f, and link each ahead of ←r.
|r′| = |f′| + 1
     = |f| + |r| + 1        (11.13)
     = 2n + 2
Thanks to the balance rule, even if we push as much as possible, the 2n + 2 steps are guaranteed to complete before the next time the queue becomes unbalanced, hence the new f will be ready. We can safely start to compute f′ ++ reverse r′. However, pop may happen before the completion of the 2n + 2 steps. We then face the situation of extracting an element from f while the new front list f′ = f ++ reverse r isn't ready yet. To solve this, we duplicate a copy of f when starting to reverse f. We are safe even if we pop n times. Table 11.2 shows the queue during phase 1 (reverse f and r in parallel)1.
The copy of f is exhausted after n pops. We are then about to start the stepped concatenation. What if pop happens at that time? Since f is exhausted, it becomes [ ], and we needn't concatenate anymore, because f ++ ←r = [ ] ++ ←r = ←r. In fact,
1 Although it takes linear time to duplicate a list, the one-shot copy won't happen at all: we duplicate only the reference to the front list, and delay the element-level copying to each step.
we only need to concatenate the elements in f that haven't been popped. Because we pop from the head of f, we use a counter to record the number of remaining elements in f, initialized to 0. We apply +1 every time we reverse an element, meaning we need to concatenate it in the future; whenever pop happens, we apply −1, meaning one element fewer to concatenate. We also decrease the counter during concatenation, and cancel the process when it reaches 0. Below is the updated state transformation:
Where abort decreases the counter to cancel an element for concatenation. We’ll define
it later. balance triggers stepped f ++ reverse r if the queue is unbalanced, otherwise runs
a step:
                    m ≤ n : step f n S r m
balance f n S r m =                                                                  (11.16)
                    otherwise : step f (n + m) (next (Sr, 0, [ ], f, [ ], r)) [ ] 0
Where step transforms the state machine, ending with the idle state S0 when it completes. Where:

queue (Sf, f′) = (f′, n, S0, r, m)    -- replace f with f′        (11.18)
queue S′ = (f, n, S′, r, m)
Exercise 11.3
11.3.1. Why do we need to rollback an element (cancel the previous 'cons', remove x, and return a as the result) when n = 0 in abort?
11.3.2. Implement the real-time queue with paired arrays. We can't copy the array when starting the rotation, or the performance downgrades to linear time. Implement a 'lazy' copy, i.e., copy one element per step.
rotate xs ys a = xs ++ (reverse ys) ++ a        (11.20)
We initialize xs as the front list f, ys as the rear list r, and the accumulator a as empty [ ]. We implement rotate from the edge case, then summarize with the recursive case, as the sketch below shows.
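The equations are lost in this copy; a sketch reconstructed from eq. (11.20), using the invariant |ys| = |xs| + 1 that holds when a rotation starts:

rotate :: [a] -> [a] -> [a] -> [a]
rotate []     [y]    a = y : a
rotate (x:xs) (y:ys) a = x : rotate xs ys (y:a)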
In lazy evaluation settings, (:) is delayed to push/pop, hence the rotation is broken into steps. We change the paired-list queue definition to (f, r, rot), where rot is the ongoing computation of f ++ reverse r. It is initialized empty [ ].
push x (f, r, rot) = balance f (x:r) rot
pop (x:f, r, rot) = balance f r rot            (11.24)
Every time, balance advances the rotation one step, and starts another round when
completes.
Exercise 11.4
Implement the bidirectional queue, supporting adding/removing elements at both head and tail in constant time.
K deQ(Queue<K> q) {
var p = q.head.next //the next of S
q.head.next = p.next
if q.tail == p then q.tail = q.head //empty
return p.key
}
Queue(int max) {
buf = Array<K>(max)
size = max
head = cnt = 0
}
}
void enQ(Queue<K> q, K x) {
    if q.cnt < q.size {
        q.buf[offset(q.head + q.cnt, q.size)] = x
        q.cnt = q.cnt + 1
    }
}

// map an index onto the circular buffer, wrapping around the boundary
Int offset(Int i, Int size) = if i < size then i else i - size

K deQ(Queue<K> q) {
    K x = null
    if q.cnt > 0 {
        x = q.buf[q.head]
        q.head = offset(q.head + 1, q.size)
        q.cnt = q.cnt - 1
    }
    return x
}
Real-time queue:
data State a = Empty
| Reverse Int [a] [a] [a] [a] −− n, acc f, f, acc r, r
| Concat Int [a] [a] −− n, acc, reversed f
| Done [a] −− f’ = f ++ reverse r
balance f n s r m
| m ≤ n = step f n s r m
| otherwise = step f (m + n) (next (Reverse 0 [] f [] r)) [] 0
empty = LQ [] [] []
Sequence
Sequence is the combination of array and list. We set the following goals for the ideal sequence: efficient operations at both head and tail, efficient concatenation and split, and efficient random access. Array and list only satisfy these goals partially. Let n be the length of the sequence (n1, n2 for the lengths if there are two). We give three implementations: binary random access list, concatenate-able list, and finger tree.
12.1 Binary Random Access List

Any natural number n can be uniquely represented in binary:

n = 2⁰e₀ + 2¹e₁ + ... + 2^m e_m        (12.1)
If ei ≠ 0, there is a full binary tree ti of size 2^i. For example in fig. 12.1, the length of the sequence is 6 = (110)₂. The lowest bit is 0, so there is no tree of size 1; the second bit is 1, so there is t1 of size 2; the highest bit is 1, so there is t2 of size 4. In this way, we represent the sequence [x1, x2, ..., xn] as a list of trees, each of a unique size, in ascending order. We call it binary random access list [3]. We customize the binary tree definition: (1) only store elements in leaf nodes, as (x); (2) augment the size in each branch node, as (s, l, r), where s is the size of the tree, and l, r are the left and right sub-trees respectively. We get the size as below:

size (x) = 1
size (s, l, r) = s        (12.2)
To add a new element y before sequence S, we create a singleton tree t0 = (y), then insert it to the forest: insert y S = insertT (y) S. Compare t0 with the first tree t1 in the forest: if t1 is bigger, put t0 ahead of the forest (constant time); if they have the same size, link them into a bigger tree (constant time): t′ = (2s, t1, t0), then recursively insert t′ to the forest, as shown in fig. 12.2.
insertT t [ ] = [t]

                    size t < size t1 : t : t1 : ts
insertT t (t1:ts) =                                       (12.4)
                    otherwise : insertT (link t t1) ts
Figure 12.2: Insert x1 , x2 , ..., x6 . (a) Insert x1 , (b) Insert x2 , link to [t1 ]. (c) Insert x3 ,
result [t0 , t1 ]. (d) Insert x4 , link twice, generate [t2 ]. (e) Insert x5 , result [t0 , t2 ]. (f) Insert
x6 , result [t1 , t2 ].
Figure 12.3: Remove: (a) x1 , x2 , ..., x5 as [t0 , t2 ]. (b) Remove x5 (t0 ) directly. (c) Remove
x4 . Split twice to get [t0 , t0 , t1 ], then remove the head to get [t0 , t1 ].
1. For the first tree t in the forest, if i ≤ size(t), then the element is in t; we next look up t for the target element;
2. Otherwise, let i′ = i − size(t), then recursively look up the i′-th element in the rest of the trees.
            i ≤ size t : lookupT i t
(t:ts)[i] =                                   (12.7)
            otherwise : ts[i − size t]
Where lookupT applies binary search: if i = 1, return the root; otherwise halve the tree and recursively look up:

lookupT 1 (x) = x

                        i ≤ ⌊s/2⌋ : lookupT i t1
lookupT i (s, t1, t2) =                                        (12.8)
                        otherwise : lookupT (i − ⌊s/2⌋) t2
Figure 12.4 gives the steps to look up the 4-th element in a sequence of length 6. The size of the first tree is 2 < 4, so we move to the next tree and update the index to i′ = 4 − 2. The size of the second tree is 4 > i′ = 2, so the element is in it. Because the index 2 is not greater than the half size 4/2 = 2, we look up the left sub-tree, then the right, and finally locate the element. Similarly, we can alter an element at a given position.
Figure 12.4: Steps to access S[4]: (a) S[4], 4 > size(t1) = 2; (b) S′[4 − 2] ⇒ lookupT 2 t2; (c) 2 ≤ ⌊size(t2)/2⌋ ⇒ lookupT 2 left(t2); (d) lookupT 1 right(left(t2)), return x3.
There are O(lg n) full binary trees holding n elements. For index i, it takes at most O(lg n) time to locate the tree, and the subsequent lookup time is proportional to the height, at most O(lg n). The overall random access time is bound to O(lg n).
Exercise 12.1
The LSB changes on every insert, 2^m times in total. The second bit changes every other time (when linking trees), 2^(m−1) times in total. The second highest bit changes only once, linking all trees into the final one. The highest bit changes to 1 after inserting the last element. Summing up: T = 1 + 1 + 2 + 4 + ... + 2^(m−1) + 2^m = 2^(m+1). Hence the amortized performance is:
O(T/n) = O(2^(m+1)/2^m) = O(1)        (12.11)

This proves the amortized constant time performance.
Exercise 12.2
12.2.1. Implement the random access for numeric representation S[i], 1 ≤ i ≤ n, where n
is the length of the sequence.
12.2.2. Analyze the amortized performance of delete.
12.2.3. We can represent a full binary tree with an array of length 2^m, where m is a non-negative integer. Implement the binary tree forest, insert, and random access.
12.3 Paired-array Sequence
1: function Insert(x, S)
2: Append(x, Front(S))
3: function Append(x, S)
4: Append(x, Rear(S))
When accessing the i-th element, we first determine which array it indexes into, f or r. If i ≤ |f|, the element is in f; because f and r are connected head to head, we index from the right of f, at position |f| − i + 1. If i > |f|, the element is in r; we index from the left, at position i − |f|.
1: function Get(i, S)
2: f, r ← Front(S), Rear(S)
3: n ← Size(f )
4: if i ≤ n then
5: return f [n − i + 1] . reversed
6: else
7: return r[i − n]
Removing can make f or r empty ([ ]) while the other is not. To re-balance, we halve the non-empty one, and reverse either half to form a new pair. As f and r are symmetric, we can swap them, call Balance, then swap back.
1: function Balance(S)
2: f ← Front(S), r ← Rear(S)
3: n ← Size(f), m ← Size(r)
4: if f = [ ] then
5: k ← ⌊m/2⌋
6: return (Reverse(r[1...k]), r[(k + 1)...m])
7: if r = [ ] then
8: k ← ⌊n/2⌋
9: return (f[(k + 1)...n], Reverse(f[1...k]))
10: return (f, r)
Every time when delete, we check f , r and balance them:
1: function Remove-Head(S)
2: Balance(S)
3: f, r ← Front(S), Rear(S)
4: if f = [ ] then . S = ([ ], [x])
5: r←[]
6: else
7: Remove-Last(f )
8: function Remove-Tail(S)
9: Balance(S)
10: f, r ← Front(S), Rear(S)
11: if r = [ ] then . S = ([x], [ ])
12: f ←[]
13: else
14: Remove-Last(r)
Due to reverse, the performance is O(n) in the worst case, where n is the number of
elements, while it is amortized constant time.
Exercise 12.3
12.3.1. Analyze the amortized performance for paired-array delete.
12.4 Concatenate-able List

s ++ ∅ = s
∅ ++ s = s                        (12.12)
(x, Q) ++ s = (x, push s Q)
When inserting a new element z, we create a singleton (z, ∅), then concatenate it to the sequence:

insert z s = (z, ∅) ++ s
append z s = s ++ (z, ∅)        (12.13)
Figure 12.6: Concatenate-able list: (a) (x1, Qx) = [x1, x2, ..., xn]; (b) concatenate with (y1, Qy) = [y1, y2, ..., ym]: add cn+1 to Qx.
When deleting x1 from the head, we lose the root. The remaining sub-trees are all concatenate-able lists; we concatenate them together into a new sequence.
concat ∅ = ∅
concat Q = (top Q) ++ concat (pop Q)        (12.14)
The real-time queue holds the sub-trees. We pop the first one c1, recursively concatenate the rest to s, then concatenate c1 and s. We define delete-from-head with concat. Function concat traverses the queue and reduces to a result; it essentially folds on Q [10].
fold f z ∅ = z
fold f z Q = f (top Q) (fold f z (pop Q))        (12.16)

Where f is a binary function, and z is the zero unit. For example, folding over the queue Q = {1, 2, ..., 5}: fold (+) 0 Q = 15.
concat = fold (++) ∅        (12.17)
Normally, add, append, and delete happen randomly. The performance is bound to linear time in the worst case: delete after repeatedly adding n elements, all n − 1 sub-trees are singletons, and concat takes O(n) time to consolidate. Nevertheless, the amortized performance is constant time.

12.5 Finger Tree

A finger tree is one of:
1. empty ∅;
2. a singleton leaf (x);
3. a tree with three parts: a sub-tree, and the left and right fingers, denoted as (f, t, r). Each finger is a list of up to 3 elements1.
1 f: front, r: rear
12.5.1 Insert
Figure 12.7: Insert elements a, b, c, d, e, f to a finger tree, re-balancing when f exceeds 3 elements.
As shown in fig. 12.7: (1) is ∅; (2) is a singleton; (3) has two elements, one in f and one in r; (4) f would exceed a 2-3 tree when adding more, so we re-balance as in (5): there are two elements in f, and the middle is a singleton of a 2-3 tree. These examples are listed as below:

∅                          Empty
(a)                        Lf a
([b], ∅, [a])              Tr [b] Empty [a]
([d, c, b], ∅, [a])        Tr [d, c, b] Empty [a]
([f, e], (d, c, b), [a])   Tr [f, e] (Lf (Br3 d c b)) [a]
In (5), the middle component is a singleton leaf. Finger tree is recursive: apart from f and r, the middle is a finger tree one level deeper, of type Tree (Node a). One more wrap, one level deeper. Summarizing the above examples, we define the insertion of a to tree T as:

1. For the empty tree ∅, the result is a singleton leaf (a);
2. For a singleton leaf (b), the result is a tree with f = [a], r = [b], and an empty middle;
3. For T = (f, t, r), if there are < 3 elements in f, we insert a to f; otherwise (≥ 3), extract the last 3 elements of f into a 2-3 tree t′, recursively insert t′ to t, then insert a to f.
insert a ∅ = (a)
insert a (b) = ([a], ∅, [b])
insert a ([b, c, d], t, r) = ([a], insert (b, c, d) t, r)        (12.18)
insert a (f, t, r) = (a:f, t, r)
The insert performance is constant time except for the recursive case. The recursion time is proportional to the height h of the tree. Because of the 2-3 trees, it is balanced, hence h = O(lg n), where n is the number of elements. Distributing the recursion over the other cases, the amortized performance is constant time [3] [65]. We can repeatedly insert a list of elements by folding:

xs ⋘ t = foldr insert t xs        (12.19)
Exercise 12.4
12.4.1. Eliminate recursion, implement insert with loop.
12.5.2 Extract
We implement extract as the reverse of insert.

extract (a) = (a, ∅)
extract ([a], ∅, [b]) = (a, (b))
extract ([a], ∅, b:bs) = (a, ([b], ∅, bs))        (12.20)
extract ([a], t, r) = (a, (toList f, t′, r)), where (f, t′) = extract t
extract (a:as, t, r) = (a, (as, t, r))
Where toList flattens a 2-3 tree to list:
toList (a, b) = [a, b]
(12.21)
toList (a, b, c) = [a, b, c]
We skip error handling (e.g., extracting from the empty tree). If the tree is a singleton leaf, the rest is empty; if there are two elements, the rest is a singleton; if f is a singleton list, the middle is empty, and r isn't empty, we extract the only element of f, then borrow one from r as the new f; if the middle isn't empty, we recursively extract a node from it, and flatten that node into a list to replace f (the original f is extracted). If f has more than one element, we extract its first one. Figure 12.8 gives an example of extracting 2 elements. We define head and tail with extract.
head = fst ∘ extract
tail = snd ∘ extract        (12.22)
Exercise 12.5
12.5.1. Eliminate recursion, implement extract in loops.
Figure 12.8: Extract: (a) A sequence of 10 elements. (b) Extract one, f becomes a
singleton list. (c) Extract another, borrow an element from the middle, flatten the 2-3
tree to a list as the new f .
If there are less than 3 elements in r, we append the new element to the tail of r; otherwise, we extract 3 elements from r into a 2-3 tree, and recursively append the tree to the middle, as the sketch below shows.
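The corresponding equation is lost in this copy; below is a Haskell sketch mirroring insert (eq. 12.18). The simplified constructors are assumptions, not the appendix's sized nodes:

data Node a = Br2 a a | Br3 a a a
data Tree a = Empty | Lf a | Br [a] (Tree (Node a)) [a]

append :: Tree a -> a -> Tree a
append Empty a              = Lf a
append (Lf b) a             = Br [b] Empty [a]
append (Br f t [b, c, d]) a = Br f (append t (Br3 b c d)) [a]
append (Br f t r) a         = Br f t (r ++ [a])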
We can repeatedly append a list of elements by folding from the left:

t ⋙ xs = foldl append t xs        (12.24)
remove is the reverse of append:

remove (a) = (∅, a)
remove ([a], ∅, [b]) = ((a), b)
remove (f, ∅, [a]) = ((init f, ∅, [last f]), a)        (12.25)
remove (f, t, [a]) = ((f, t′, toList r), a), where (t′, r) = remove t
remove (f, t, r) = ((f, t, init r), last r)
Where last accesses the tail element of a list, init returns the rest (eq. (1.4) in chapter
1).
12.5.4 Concatenate

When concatenating (++) two non-empty finger trees T1 = (f1, t1, r1) and T2 = (f2, t2, r2), we use f1 as the front of the result, and r2 as the rear. Then we merge t1, r1, f2, t2 as the middle tree. Because both r1 and f2 are lists of nodes, it is equivalent to the below problem:

merge t1 (r1 ++ f2) t2 = ?
Both t1 and t2 are finger trees one level deeper than T1 and T2: if the element type of T1 is a, then the element type of t1 is Node a. We recursively merge: keep the front of t1 and the rear of t2, then further merge the middle of t1 and t2 with the rear of t1 and the front of t2.
merge ∅ ts t2 = ts ⋘ t2
merge t1 ts ∅ = t1 ⋙ ts
merge (a) ts t2 = merge ∅ (a:ts) t2
merge t1 ts (a) = merge t1 (ts ++ [a]) ∅                                             (12.26)
merge (f1, t1, r1) ts (f2, t2, r2) = (f1, merge t1 (nodes (r1 ++ ts ++ f2)) t2, r2)
Where function nodes collects elements to a list of 2-3 trees. This is because the type
of the element in the middle is deeper than the finger.
nodes [a, b] = [(a, b)]
nodes [a, b, c] = [(a, b, c)]
(12.27)
nodes [a, b, c, d] = [(a, b), (c, d)]
nodes (a:b:c:ts) = (a, b, c):nodes ts
We define finger tree concatenation with merge:

(f1, t1, r1) ++ (f2, t2, r2) = (f1, merge t1 (r1 ++ f2) t2, r2)        (12.28)

Compared with eq. (12.26), concatenation is essentially merge, so we define them in a unified way:

T1 ++ T2 = merge T1 [ ] T2        (12.29)
The performance is proportional to the number of recursions, which is the smaller height of the two trees. The 2-3 trees are balanced, with height O(lg n), where n is the number of elements. In edge cases, merge performs the same as insert (it calls insert at most 8 times), in amortized constant time. In the worst case, the performance is O(m), where m is the height difference between the two trees. The overall performance is bound to O(lg n), where n is the total number of elements in the two trees.
size ∅ = 0
size (x) = size x        (12.30)
size (s, f, t, r) = s

Here size (x) is not necessarily 1: x can be a deeper node, e.g., of type Node a; it is 1 only at level one. For termination, we wrap a level-one element x as an element cell (x)e, and define size (x)e = 1 (see the example in the appendix).
x ◁ t = insert (x)e t
t ▷ x = append t (x)e        (12.31)

and:

xs ⋘ t = foldr (◁) t xs
t ⋙ xs = foldl (▷) t xs        (12.32)
Given a list of nodes (e.g., a finger at a deeper level), we calculate the size by sum ∘ (map size). We need to update the size when inserting or deleting elements. With the size augmented, we can look up the tree at any position i. The finger tree (s, f, t, r) is recursive. Let the sizes of the three components be sf, st, sr, where s = sf + st + sr. If i ≤ sf, the location is in f, and we further look up f; if sf < i ≤ sf + st, the location is in t, and we recursively look up t; otherwise, we look up r. We also handle the leaf case (x). We use a pair (i, t) to define the position i at data structure t, and define lookupT as below:
Where sf = sum (map size f) and st = size t are the sizes of the first two components. When looking up location i, if the tree is a leaf (x), the result is (i, x); otherwise we figure out which component of (s, f, t, r) that i points to. If it is in f or r, we look up the finger:
                   i < size x : (i, x)
lookups i (x:xs) =                                          (12.35)
                   otherwise : lookups (i − size x) xs
If i is in some element x (i < size x), we return (i, x); otherwise, we continue looking
up the rest elements. If i points to the middle t, we recursively lookup to obtain a place
(i0 , m), where m is a 2-3 tree. We next lookup m:
                     i < size t1 : (i, t1)
lookupN i (t1, t2) =
                     otherwise : (i − size t1, t2)

                         i < size t1 : (i, t1)
lookupN i (t1, t2, t3) = size t1 ≤ i < size t1 + size t2 : (i − size t1, t2)        (12.36)
                         otherwise : (i − size t1 − size t2, t3)
Because we previously wrapped x inside (x)e, we finally extract x out. We return a result of type Maybe a = Nothing | Just a, meaning either found, or lookup failed2. The random access looks up the finger tree recursively, proportional to the tree depth. Because the finger tree is balanced, the performance is bound to O(lg n), where n is the number of elements.
We achieve balanced performance with the finger tree implementation: the operations at head and tail are amortized constant time; concatenation, split, and random access are logarithmic time [67]. By the end of this chapter, we've seen many elementary data structures. They are useful for solving classic problems. For example, we can use a sequence to implement the MTF (move-to-front3) encoding algorithm [68]: MTF moves the element at position i to the front of the sequence (see Exercise 12.6.2). In the next two chapters, we'll go through the classic divide and conquer sorting algorithms, including quick sort, merge sort, and their variants; then give the string matching algorithms and elementary search algorithms.
Exercise 12.6
12.6.1. For random access, how to handle the empty tree ∅ and out of bound cases?
12.6.2. Implement cut i S, split sequence S at position i.
size (Leaf _) = 1
size (Node sz _ _) = sz
2 Some programming environments provide equivalent tool, like the Optional<T> in Java/C++.
3 Used in Burrows-Wheeler transform (BWT) data compression algorithm.
Paired-array sequence:

data Seq<K> {
    [K] front = [], rear = []
}
K get(Int i, Seq<K> s) {
Int n = length(s.front)
return if i < n then s.front[n - i - 1] else s.rear[i - n]
}
Concatenate-able list:
data CList a = Empty | CList a (Queue (CList a))
x ++ Empty = x
Empty ++ y = y
(CList x q) ++ y = CList x (push q y)
fold f z q | isEmpty q = z
| otherwise = (top q) `f` fold f z (pop q)
concat = fold (++) Empty

insert x xs = (wrap x) ++ xs
append xs x = xs ++ wrap x
head (CList x _) = x
tail (CList _ q) = concat q
Finger tree:
−− 2-3 tree
data Node a = Tr2 Int a a
| Tr3 Int a a a
−− finger tree
data Tree a = Empty
| Lf a
| Br Int [a] (Tree (Node a)) [a] −− size, front, mid, rear
−− left
x <| Seq xs = Seq (Elem x `cons` xs)
−− right
Seq xs |> x = Seq (xs `snoc` Elem x)
−− concatenate
Seq xs +++ Seq ys = Seq (xs >+< ys)
xs >+< ys = merge xs [] ys
−− index
data Place a = Place Int a
It is proven that comparison-based sort takes at least O(n lg n) time [51]. This chapter gives two divide and conquer sort algorithms: quick sort and merge sort. Both achieve the O(n lg n) bound. We also give their variants, like natural merge sort, in-place merge sort, etc.

13.1 Quick Sort

Quick sort works by picking an element as the pivot, partitioning the rest into the part not greater than the pivot and the part greater than it, and recursively sorting the left and right parts.
We say 'and', not 'then', to indicate that we can sort the left and the right parts in parallel. C. A. R. Hoare developed quick sort in 1960 [51] [78]. There are various ways to pick the pivot; for example, always choose the first element.
sort [ ] = [ ]
sort (x:xs) = sort [y ← xs, y ≤ x] ++ [x] ++ sort [y ← xs, x < y]        (13.1)
We use ZF expression (see sections 1.4.1 and 1.7) to filter the list. Below is the example
program:
sort [] = []
sort (x:xs) = sort [y | y←xs, y ≤ x] ++ [x] ++ sort [y | y←xs, x < y]
We assume sorting in ascending order. We can abstract (≤) as a generic comparison to sort different things like numbers, strings, etc. We needn't a total ordering, but at least a strict weak ordering [79] [52] (see section 9.2).
13.1.1 Partition
We traverse the elements in two passes: first filter all elements ≤ x, then filter all > x. Let us combine them into one pass:
part p [ ] = ([ ], [ ])

                p(x) : (x:as, bs)
part p (x:xs) =                      where (as, bs) = part p xs        (13.2)
                otherwise : (as, x:bs)
sort [ ] = [ ]
sort (x:xs) = sort as ++ [x] ++ sort bs, where (as, bs) = part (≤ x) xs        (13.3)
Where:

               p(x) : (x:as, bs)
f (as, bs) x =                        (13.5)
               otherwise : (as, x:bs)
It essentially accumulates into (as, bs): if p(x) holds, add x to as, otherwise to bs. We can also change the partition to tail recursive:
part p [ ] as bs = (as, bs)

                      p(x) : part p xs (x:as) bs
part p (x:xs) as bs =                                   (13.6)
                      otherwise : part p xs as (x:bs)
After partition, we need to recursively sort as and bs. We can first sort bs, prepend x, then pass it as the accumulator to sort as:

sort′ s [ ] = s
sort′ s (x:xs) = sort′ (x : sort′ s bs) as        (13.7)

We start with an empty list: sort = sort′ [ ].
sort = sort' []
sort' s [] = s
sort' s (x:xs) = sort' (x : sort' s bs) as where
(as, bs) = part xs [] []
part [] as bs = (as, bs)
part (y:ys) as bs | y ≤ x = part ys (y:as) bs
| otherwise = part ys as (y:bs)
Figure 13.1: Partition with two pointers L and R: (b) initialize; (c) terminate, then swap the pivot p into its final position.
1. The pivot is the left element: p = A[l]. It moves to its final position after the partition;
Exercise 13.1
13.1.1. Optimize the basic quick sort definition for the singleton list case.
13.1.3 Performance
Quick sort performs well in most cases. Consider the best and worst cases. In the best case, it always halves the elements into two equal-sized parts. As shown in fig. 13.2, there are O(lg n) levels of recursion in total. At level one, it processes n elements with one partition; at level two, it partitions twice, each processing n/2 elements, taking 2O(n/2) = O(n) time in total; at level three, it partitions four times, each processing n/4 elements, again taking O(n) time, ...; at the last level, there are n singleton segments, taking O(n) time in total. Summing all levels, the time is bound to O(n lg n).
Figure 13.2: The balanced partition tree in the best case: each level halves the segments, down to n singletons.
In the worst case, the partition is totally unbalanced: one part has O(1) elements, the other O(n). The recursion depth decays to O(n). Model the partition as a tree: it is a balanced binary tree in the best case, but becomes a linked-list of length O(n) in the worst case, where every branch node has an empty sub-tree. At each level, we process all the elements, hence the total time is bound to O(n²), the same as insertion sort and selection sort. There are several challenging cases, for example, when the sequence has many duplicated elements, or is largely ordered. No method avoids the worst case completely.
Average case

Quick sort performs well on average. For example, even if every partition gives two parts in the ratio 1:9, the performance still achieves O(n lg n) [4]. We give two methods to evaluate the performance. The first is based on the fact that the performance is proportional to the number of comparisons. In selection sort, every pair of elements is compared, while quick sort saves many comparisons. When partitioning sequence [a1, a2, a3, ..., an] with a1 as the pivot, we obtain two sub-sequences A = [x1, x2, ..., xk] and B = [y1, y2, ..., yn−k−1]. After that, no element xi in A will be compared with any yj in B. Let the sorted result be [a1, a2, ..., an]. For ai < aj, we do not compare them if and only if some element ak, where ai < ak < aj, is picked as the pivot before either ai or aj. In other words, the only chance we compare ai and aj is when either of them is chosen as the pivot before any of ai+1, ai+2, ..., aj−1. Let P(i, j) be the probability that we compare ai and aj. We have:
P(i, j) = 2/(j − i + 1)        (13.8)
The total number of comparisons is:

C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P(i, j)        (13.9)
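The evaluation steps are lost in this copy; a sketch: substitute eq. (13.8) and bound the inner sum with the harmonic series:

C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
     = Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 2/k
     < 2n(1/2 + 1/3 + ... + 1/n)
     = O(n lg n)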
The other method uses the recursion. Let the length of the sequence be n; we partition it into two parts of lengths i and n − i − 1. The partition takes cn time because it compares every element with the pivot. The total time is:

T(n) = T(i) + T(n − i − 1) + cn

Where T(n) is the time to sort n elements. Since i distributes equally across 0, 1, ..., n − 1, taking the mathematical expectation gives:

nT(n) = 2 Σ_{i=0}^{n−1} T(i) + cn²        (13.14)
Substituting n − 1 for n:

(n − 1)T(n − 1) = 2 Σ_{i=0}^{n−2} T(i) + c(n − 1)²        (13.15)
Subtracting eq. (13.15) from eq. (13.14) cancels all T(i) for 0 ≤ i < n − 1:

nT(n) = (n + 1)T(n − 1) + 2cn − c

Dividing both sides by n(n + 1) (and dropping the lower-order −c term) gives a telescoping chain:

T(n)/(n + 1) = T(n − 1)/n + 2c/(n + 1)
T(n − 1)/n = T(n − 2)/(n − 1) + 2c/n
T(n − 2)/(n − 1) = T(n − 3)/(n − 2) + 2c/(n − 1)
...
T(2)/3 = T(1)/2 + 2c/3
Summing up and canceling the same components on both sides, we get a function of n:

T(n)/(n + 1) = T(1)/2 + 2c Σ_{k=3}^{n+1} 1/k        (13.18)

The harmonic series sums to O(ln n), hence T(n) = O(n lg n).
13.1.4 Improvement
The Partition procedure doesn't perform well when there are many duplicated elements. Consider the extreme case where all n elements are equal, [x, x, ..., x]:

1. From the quick sort definition: pick any element as the pivot, hence p = x, and partition into two sub-sequences. One is [x, x, ..., x] of length n − 1, the other is empty. Next recursively sort the n − 1 elements; the total time decays to O(n²).

2. Modify the partition with < x and > x. The results are two empty sub-sequences, and n elements equal to x. The recursion on the empty sequences terminates immediately. The result is [ ] ++ [x, x, ..., x] ++ [ ]. The performance is O(n).
We improve the partition from binary to ternary to handle duplicated elements:

sort [ ] = [ ]
sort (x:xs) = sort S ++ sort E ++ sort G        (13.21)
Where:
S = [y ← xs, y < x]
E = [y ← xs, y = x]
G = [y ← xs, y > x]
sort A [ ] = A
sort A (x:xs) = sort (E ++ sort A G) S        (13.22)
The sub-list E contains elements of the same value, hence already sorted. We first sort G with the accumulator A, then prepend E as the new accumulator, and use it to sort S. We also improve the partition with accumulators:
part S E G x [ ] = (S, E, G)

                      y < x : part (y:S) E G x ys
part S E G x (y:ys) = y = x : part S (y:E) G x ys        (13.23)
                      y > x : part S E (y:G) x ys
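Below is a runnable Haskell sketch of the ternary quick sort of eq. (13.21), using list comprehensions instead of the accumulated partition:

sort3 :: Ord a => [a] -> [a]
sort3 [] = []
sort3 (x:xs) = sort3 small ++ (x : same) ++ sort3 large
  where small = [y | y <- xs, y < x]
        same  = [y | y <- xs, y == x]
        large = [y | y <- xs, y > x]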
Richard Bird gives another improvement [1]: collect the recursive sort results in a list, then concatenate finally:

sort :: (Ord a) ⇒ [a] → [a]
sort = concat ◦ (pass [])
Robert Sedgewick developed the two-way partition method [69] [2]: use two pointers i, j at the left and right boundaries, and pick the first element as the pivot p. Advance i to the right until meeting an element ≥ p; in parallel, move j to the left until meeting an element ≤ p. At this time, all elements to the left of i are less than the pivot (< p), while those to the right of j are greater (> p); i points to an element ≥ p, and j points to an element ≤ p, as shown in fig. 13.3 (a). To move all elements ≤ p to the left and the rest to the right, we exchange A[i] ↔ A[j], then continue scanning. Repeat this until i meets j. At any time, we keep the invariant: all elements to the left of i (including i) are ≤ p, while all to the right of j (including j) are ≥ p; the elements between i and j are yet to be scanned, as shown in fig. 13.3 (b).
Figure 13.3: 2-way scan: (a) i pauses at an element ≥ p, j pauses at an element ≤ p; (b) elements A[l..i] are < p, A[j..u] are > p, and the middle part is yet to be scanned.
When i meets j, we need an additional exchange: swap the pivot p to position j. Then recursively sort the sub-arrays A[l...j) and A[i...u).
1: procedure Sort(A, l, u) . Sort range [l, u)
2: if u − l > 1 then . At least 2 elements
3: i ← l, j ← u
4: pivot ← A[l]
5: loop
6: repeat
7: i←i+1
8: until A[i] ≥ pivot . Ignore i ≥ u
9: repeat
10: j ←j−1
11: until A[j] ≤ pivot . Ignore j < l
12: if j < i then
13: break
14: Exchange A[i] ↔ A[j]
15: Exchange A[l] ↔ A[j] . Move the pivot
16: Sort(A, l, j)
17: Sort(A, i, u)
For the extreme case that all elements are equal, the array is partitioned into two equal parts with n/2 swaps. Because of the balanced partition, the performance is O(n lg n). It takes fewer swaps than the one-pass scan method, since it skips the elements on the right side of the pivot. We can combine the 2-way scan and the ternary partition, to recursively sort only the elements different from the pivot. Jon Bentley and Douglas McIlroy developed a method, as shown in fig. 13.4 (a), that stores the elements equal to the pivot on both sides [70] [71].
Figure 13.4: (a) Elements equal to the pivot are stored at both ends; the middle holds < p, the unscanned part, and > p, with boundaries i, j and p, q.
We scan from both sides, pausing when i reaches an element ≥ the pivot and j reaches one ≤ the pivot. If i hasn't met or passed j, we exchange A[i] ↔ A[j], then check whether A[i] or A[j] equals the pivot. If yes, we exchange A[i] ↔ A[p] or A[j] ↔ A[q] respectively. Finally, we swap all the elements equal to the pivot to the middle; this step does nothing if all elements are distinct. The partition result is shown in fig. 13.4 (b). We next recursively sort only the elements not equal to the pivot.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: i ← l, j ← u
4: p ← l, q ← u . point to the boundaries of duplicated elements
5: pivot ← A[l]
6: loop
7: repeat
8: i←i+1
9: until A[i] ≥ pivot . Ignore i ≥ u case
10: repeat
11: j ←j−1
12: until A[j] ≤ pivot . Ignore j < l case
13: if j ≤ i then
14: break
15: Exchange A[i] ↔ A[j]
16: if A[i] = pivot then . duplicated element
17: p←p+1
18: Exchange A[p] ↔ A[i]
19: if A[j] = pivot then
20: q ←q−1
21: Exchange A[q] ↔ A[j]
22: if i = j and A[i] = pivot then
23: j ← j − 1, i ← i + 1
24: for k from l to p do . Swap the duplicated elements to the middle
25: Exchange A[k] ↔ A[j]
26: j ←j−1
27: for k from u − 1 down-to q do
28: Exchange A[k] ↔ A[i]
29: i←i+1
30: Sort(A, l, j + 1)
31: Sort(A, i, u)
Combining the 2-way scan with the ternary partition is complex. Alternatively, we can change the one-pass scan to a ternary partition directly. Pick the first element as the pivot, as shown in fig. 13.5. At any time, the left part contains the elements < p; the next part contains those = p; and the right part contains those > p. The boundaries are i, k, j; the elements in [k, j) are yet to be partitioned. We scan from left to right. At the start, the part < p is empty, the part = p has one element, i points to the lower boundary, and k points to the next position; the part > p is also empty, and j points to the upper boundary.
Figure 13.5: 1-pass ternary partition: < p, = p, the part yet to scan, and > p, with boundaries i, k, j.
We iterate on k: if A[k] = p, move k forward; if A[k] > p, exchange A[k] ↔ A[j − 1], so the range of elements > p grows by one, and its boundary j moves a step to the left. Because we don't know whether the element moved to position k is still > p, we compare again and repeat. Otherwise, if A[k] < p, we exchange A[k] ↔ A[i], where A[i] is the first element = p. The partition terminates when k meets j.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: i ← l, j ← u, k ← l + 1
4: pivot ← A[i]
5: while k < j do
6: while pivot < A[k] do
7: j ←j−1
8: Exchange A[k] ↔ A[j]
Challenging cases
Although the ternary partition handles duplicated elements well, there are other challenging cases. For example, when most elements are ordered (ascending or descending), the partition is unbalanced. Figure 13.6 gives two such cases: [x1 < x2 < ... < xn] and [y1 > y2 > ... > yn]. It's easy to construct more, for example: [xm, xm−1, ..., x2, x1, xm+1, xm+2, ..., xn] where [x1 < x2 < ... < xn], and [xn, x1, xn−1, x2, ...], as shown in fig. 13.7.
Figure 13.6: (a) The partition tree of [x1 < x2 < ... < xn]: the sub-trees of ≤ p are empty; (b) the partition tree of [y1 > y2 > ... > yn]: the sub-trees of ≥ p are empty.
In these challenging cases, the partition is unbalanced when we choose the first element as the pivot. Robert Sedgewick improved the pivot selection [69]: instead of picking a fixed position, sample several elements to avoid a bad pivot. We sample the first, the middle, and the last elements, and pick the median as the pivot. We can either compare every two of them (3 comparisons in total) [70], or swap the least one to the head, swap the greatest one to the end, and move the median to the middle.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
Figure 13.7: The partition trees of the other two challenging cases are unbalanced as well.
3: m ← ⌊(l + u)/2⌋ . or l + ⌊(u − l)/2⌋ to avoid overflow
4: if A[m] < A[l] then . Ensure A[l] ≤ A[m]
5: Exchange A[l] ↔ A[m]
6: if A[u − 1] < A[l] then . Ensure A[l] ≤ A[u − 1]
7: Exchange A[l] ↔ A[u − 1]
8: if A[u − 1] < A[m] then . Ensure A[m] ≤ A[u − 1]
9: Exchange A[m] ↔ A[u − 1]
10: Exchange A[l] ↔ A[m]
11: (i, j) ← Partition(A, l, u)
12: Sort(A, l, i)
13: Sort(A, j, u)
This implementation handles the above four challenging cases well, known as the
‘median of three’. Alternatively, we can randomly pick the pivot:
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: Exchange A[l] ↔ A[ Random(l, u) ]
4: (i, j) ← Partition(A, l, u)
5: Sort(A, l, i)
6: Sort(A, j, u)
Where Random(l, u) returns a random integer i with l ≤ i < u. We swap A[i] with the first element as the pivot. This method is called random quick sort [4]. Theoretically, neither 'median of three' nor random quick sort avoids the worst case completely. If the sequence is random, choosing any position as the pivot is equivalent. Nonetheless, these improvements are widely used in engineering practice.
There are other improvements besides partitioning. Sedgewick found that quick sort has overhead for short lists, where insertion sort performs better [2] [70]. Sedgewick, Bentley, and McIlroy evaluated various thresholds, known as the 'cut-off': when there are fewer elements than the cut-off, switch to insertion sort.
1: procedure Sort(A, l, u)
2: if u − l > Cut-Off then
3: Quick-Sort(A, l, u)
4: else
5: Insertion-Sort(A, l, u)
unfold [ ] = ∅
unfold (x:xs) = (unfold [a ← xs, a ≤ x], x, unfold [a ← xs, a > x])        (13.24)
Compared with binary tree insert (see chapter 2), unfold creates the tree differently. If the list is empty, the tree is empty; otherwise, use the first element x as the key, then recursively build the left and right sub-trees, where the left sub-tree holds the elements ≤ x, and the right holds those > x. We define the in-order traversal to convert a binary search tree to an ordered list:
toList ∅ = [ ]
toList (l, k, r) = toList l ++ [k] ++ toList r        (13.25)
We first build the binary search tree with unfold, then pass it to toList to generate the ordered list, and discard the tree. When we eliminate the intermediate tree (through deforestation, by Burstall and Darlington's work [73]), we obtain the quick sort.
13.2 Merge Sort

sort [ ] = [ ]
sort [x] = [x]        (13.27)
sort xs = merge (sort as) (sort bs), where (as, bs) = halve xs
Where halve splits the sequence. For an array, we can cut at the middle: splitAt ⌊|xs|/2⌋ xs. However, it takes linear time to move to the middle of a list (see eq. (1.45)):
Because halve needn't keep the relative order among the elements, we can simplify it with the odd-even split: there are the same number of elements at the odd and the even positions, or they differ by at most one. Define halve = split [ ] [ ], with split as the sketch below shows.
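The definition of split is lost in this copy; a sketch reconstructed from the prose above:

split :: [a] -> [a] -> [a] -> ([a], [a])
split as bs []       = (as, bs)
split as bs [x]      = (x:as, bs)
split as bs (x:y:xs) = split (x:as) (y:bs) xs

halve :: [a] -> ([a], [a])
halve = split [] []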
As in fig. 13.8, consider two groups of kids, each already ordered from short to tall. They need to pass a gate, one kid at a time. We let the first kids of the two groups compare, and the shorter one passes. Repeat this until either group completely passes the gate; then the remaining kids go one by one.
1 For example in the standard library of Haskell, Python, and Java.
merge [ ] bs = bs
merge as [ ] = as

                      a < b : a : merge as (b:bs)
merge (a:as) (b:bs) =                                   (13.31)
                      otherwise : b : merge (a:as) bs
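Putting eq. (13.27) and eq. (13.31) together gives a runnable sketch, reusing halve from the sketch above:

msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge' (msort as) (msort bs)
  where (as, bs) = halve xs

merge' :: Ord a => [a] -> [a] -> [a]
merge' [] bs = bs
merge' as [] = as
merge' (a:as) (b:bs) | a < b     = a : merge' as (b:bs)
                     | otherwise = b : merge' (a:as) bs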
For array, we directly cut at the middle, recursively sort two halves, then merge:
1: procedure Sort(A)
2: n ← |A|
3: if n > 1 then
4: m ← ⌊n/2⌋
5: X ← Copy-Array(A[1...m])
6: Y ← Copy-Array(A[m + 1...n])
7: Sort(X)
8: Sort(Y )
9: Merge(A, X, Y )
We allocate additional space of the same size as A because Merge is not in-place. We repeatedly compare elements from X and Y, and pick the smaller one into A. When either sub-array finishes, we append all the remaining elements to A.
1: procedure Merge(A, X, Y )
2: i ← 1, j ← 1, k ← 1
3: m ← |X|, n ← |Y |
4: while i ≤ m and j ≤ n do
5: if X[i] < Y [j] then
6: A[k] ← X[i]
7: i←i+1
8: else
9: A[k] ← Y [j]
10: j ←j+1
11: k ←k+1
12: while i ≤ m do
13: A[k] ← X[i]
14: k ←k+1
15: i←i+1
16: while j ≤ n do
17: A[k] ← Y [j]
18: k ←k+1
19: j ←j+1
To simplify merge, we adjoin ∞ to X and Y 2 .
1: procedure Merge(A, X, Y )
2: Append(X, ∞)
2 −∞ for descending order
3: Append(Y, ∞)
4: i ← 1, j ← 1, n ← |A|
5: for k ← from 1 to n do
6: if X[i] < Y [j] then
7: A[k] ← X[i]
8: i←i+1
9: else
10: A[k] ← Y [j]
11: j ←j+1
13.2.1 Performance
Merge sort takes two steps: partition and merge. It always halves the sequence; the binary partition tree is balanced, as shown in fig. 13.2, and its height is O(lg n), so is the recursion depth. The merge happens at every level; it compares elements one by one from each sorted sub-sequence, hence takes linear time. For a sequence of length n, let T(n) be the merge sort time. We have the recursive breakdown:

T(n) = T(n/2) + T(n/2) + cn = 2T(n/2) + cn        (13.32)

The time consists of three parts: sorting the first and the second halves, each taking T(n/2) time, and merging in cn time, where c is a constant. Solving this equation gives T(n) = O(n lg n).
For space, varies implementation differ a lot. The basic merge sort allocates the space of
the same size as the array in each recursion, copies elements and sorts, then release the
space. It consumes the largest space of O(n lg n) when reaches to the deepest recursion.
It's expensive to allocate and release space repeatedly [2]. We can instead pre-allocate a
work area of the same size as A, reuse it during the recursion, and release it at the end.
1: procedure Sort(A)
2:   n ← |A|
3:   Sort′(A, Create-Array(n), 1, n)

4: procedure Sort′(A, B, l, u)
5:   if u − l > 0 then
6:     m ← ⌊(l + u)/2⌋
7:     Sort′(A, B, l, m)
8:     Sort′(A, B, m + 1, u)
9:     Merge′(A, B, l, m, u)
We need to update merge with the passed-in work area:
1: procedure Merge0 (A, B, l, m, u)
2: i ← l, j ← m + 1, k ← l
3: while i ≤ m and j ≤ u do
4: if A[i] < A[j] then
5: B[k] ← A[i]
6: i←i+1
7: else
8: B[k] ← A[j]
9: j ←j+1
10: k ←k+1
11: while i ≤ m do
12: B[k] ← A[i]
13: k ←k+1
14: i←i+1
15: while j ≤ u do
16: B[k] ← A[j]
17: k ←k+1
18: j ←j+1
19: for i ← l to u do    . copy back
20: A[i] ← B[i]
This implementation reduces the space from O(n lg n) to O(n), and improves performance
by 20% to 25% when sorting 100K numeric elements. Alternatively, one may try to merge
in place: compare the two sorted halves, and shift a segment to insert the smaller element:
1: procedure Merge(A, l, m, u)
2: while l ≤ m ∧ m ≤ u do
3: if A[l] < A[m] then
4: l ←l+1
5: else
6: x ← A[m]
7: for i ← m down-to l + 1 do . Shift
8: A[i] ← A[i − 1]
9: A[l] ← x
However, it downgrades to O(n²) time, because the array shift takes linear time.
When sorting a sub-array, we want to reuse the remaining part as the work area, and must
avoid overwriting any elements. We compare elements from the sorted sub-arrays X and Y,
pick the smaller one, and store it in the work area. However, we need to exchange the
element out to free up the cell. After merging, X and Y together store the content of the
original work area, as shown in fig. 13.10.
The sorted X, Y , and the work area Z are all sub-arrays. We pass the start, end
positions of X and Y as ranges [i, m), [j, n). The work area starts from k.
1: procedure Merge(A, [i, m), [j, n), k)
2: while i < m and j < n do
3: if A[i] < A[j] then
4: Exchange A[k] ↔ A[i]
5: i←i+1
6: else
7: Exchange A[k] ↔ A[j]
3 range [a, b) includes a, but excludes b.
8: j ←j+1
9: k ←k+1
10: while i < m do
11: Exchange A[k] ↔ A[i]
12: i←i+1
13: k ←k+1
14: while j < n do
15: Exchange A[k] ↔ A[j]
16: j ←j+1
17: k ←k+1
The work area has two properties: 1) it has sufficient size to hold the elements swapped
in; 2) it can overlap with either sorted sub-array, but must not overwrite any unmerged
elements. One idea is to use half of the array as the work area to sort the other half, as
shown in fig. 13.11.
We next sort half of the work area further (the remaining 1/4), as shown in fig. 13.12. We
must merge X (1/2 of the array) and Y (1/4 of the array) later. However, the work area
can only hold 1/4 of the array, insufficient for the size of X + Y.
The second property gives a way out: arrange the work area to overlap with either
sub-array, and only overwrite the already merged part. We first sort the second 1/2 of the
work area; as the result, Y is swapped to the first 1/2, and the new work area lies between
X and Y, as shown in the upper part of fig. 13.13. The work area overlaps with X [74].
Consider two extremes:
1. y < x, for all y in Y, x in X. After merging, the contents of Y and the work area are
swapped (the size of Y equals that of the work area);
The other cases fall between the above two extremes. The work area finally moves to
the first 1/4 of the array. Repeat this: we always sort the second 1/2 of the work area,
swap the result to the first 1/2, and keep the work area in the middle. We halve the work
area every time, covering 1/2, 1/4, 1/8, ... of the array, and terminate when there is only
one element left. Alternatively, we can fall back to insert sort for the last few elements.
1: procedure Sort(A, l, u)
2:   if u − l > 0 then
3:     m ← ⌊(l + u)/2⌋
4:     w ← l + u − m
5:     Sort′(A, l, m, w)    . sort half
6:     while w − l > 1 do
7:       u′ ← w
8:       w ← ⌈(l + u′)/2⌉    . halve the work area
9:       Sort′(A, w, u′, l)    . sort the remaining half
10:      Merge(A, [l, l + u′ − w], [u′, u], w)
11:    for i ← w down-to l do    . switch to insert sort
12:      j ← i
13:      while j ≤ u and A[j] < A[j − 1] do
14:        Exchange A[j] ↔ A[j − 1]
15:        j ← j + 1
We round the work area up to ensure it has sufficient size, then pass the ranges and the
work area to Merge. We next update Sort′, which calls Sort, then swaps the merged part
with the work area.
1: procedure Sort′(A, l, u, w)
2:   if u − l > 0 then
3:     m ← ⌊(l + u)/2⌋
4:     Sort(A, l, m)
5:     Sort(A, m + 1, u)
T(n/2) = T(n/4) + c·n/4 + T(n/8) + c·3n/8 + T(n/16) + c·7n/16 + ...        (13.34)

Subtract eq. (13.33) from eq. (13.34):

T(n) − T(n/2) = T(n/2) + cn(1/2 + 1/2 + ...)

The 1/2 term adds up lg n times in total, hence:

T(n) = 2T(n/2) + (c/2) n lg n

Applying the telescoping method (or the master theorem) gives the result O(n lg² n).
Knuth gives another implementation, called natural merge sort. It is like burning a
candle from both ends [51]. For any sequence, we can always find an ordered segment
starting from any position; in particular, we can find such a segment from the left end,
as shown in the table below. The first row is the extreme case of a singleton segment; the
third row is the other extreme, where the segment extends to the right end and the whole
sequence is ordered. Symmetrically, we can always find an ordered segment from the right
end. We then merge the two sorted segments, one from the left and another from the right.
The advantage is that we reuse the naturally ordered sub-sequences for the partition.
As shown in fig. 13.15, we scan from both ends, finding the two longest ordered segments
respectively, and merge them to the left side of the work area. Next, we restart scanning
from the left and the right towards the center; this time, we merge the two segments to
the right side of the work area. We switch the merge direction between right and left in
turns. After scanning all elements and merging them to the work area, we swap the original
array and the work area, then start a new round of bi-directional scan and merge. We
terminate when the ordered segment extends to cover the whole array. This implementation
processes the array from both directions based on the natural ordering; it is called natural
two-way merge sort. As shown in fig. 13.16, the elements before a and after d have been
scanned. We span the ordered segment [a, b) to the right; meanwhile, we span [c, d) to the
left. In the work area, the elements before f and after r are merged (consisting of
multiple sub-sequences). In odd rounds, we merge [a, b) and [c, d) from f rightwards; in
even rounds, we merge from r leftwards.

Figure 13.16: The array: scanned | span [a, b) | yet to scan | span [c, d) | scanned.
The work area: merged | free cells | merged, bounded by f and r.

At the start, we allocate a work area of the same size as the array. a, b point to the left
end; c, d point to the right end; f, r point to the two ends of the work area respectively.
1: function Sort(A)
2: if |A| > 1 then
3: n ← |A|
4: B ← Create-Array(n) . the work area
5: loop
6: [a, b) ← [1, 1)
7: [c, d) ← [n + 1, n + 1)
8: f ← 1, r ← n . front, rear of the work area
9: t←1 . even/odd round
10: while b < c do . elements yet to scan
11: repeat . Span [a, b)
12: b←b+1
13: until b ≥ c ∨ A[b] < A[b − 1]
14: repeat . Span [c, d)
15: c←c−1
16: until c ≤ b ∨ A[c − 1] < A[c]
17: if c < b then . Avoid overlap
18: c←b
19: if b − a ≥ n then . Terminate if [a, b) spans the whole array
20: return A
21: if t is odd then . merge to front
22: f ← Merge(A, [a, b), [c, d), B, f, 1)
23: else . merge to rear
24: r ← Merge(A, [a, b), [c, d), B, r, −1)
25: a ← b, d ← c
26: t←t+1
27: Exchange A ↔ B . Switch work area
28: return A
We need to pass the merge direction in:
1: function Merge(A, [a, b), [c, d), B, w, ∆)
2: while a < b and c < d do
3: if A[a] < A[d − 1] then
4: B[w] ← A[a]
5: a←a+1
6: else
7: B[w] ← A[d − 1]
8: d←d−1
9: w ←w+∆
10: while a < b do
11: B[w] ← A[a]
12: a←a+1
13: w ←w+∆
14: while c < d do
15: B[w] ← A[d − 1]
16: d←d−1
17: w ←w+∆
18: return w
The performance does not depend on how ordered the elements are. In the 'worst'
case, the ordered sub-sequences are all singletons; after merging, the new ordered
sub-sequences have length at least 2. Suppose we still encounter the 'worst' case in the
second round: the merged sub-sequences have length at least 4, and so on. Every round
doubles the sub-sequence length, hence we need at most O(lg n) rounds. Because we scan
all elements in every round, the total time is bound to O(n lg n). For a list, we can't
easily scan from the tail back as with an array. A list consists of multiple ordered
sub-lists; we merge them in pairs, which halves the number of sub-lists every round and
finally builds the sorted result. Define this (in Curried form):
group [ ] = [[ ]]
group [x] = [[x]]
group (x:y:xs) = x < y :      (x:g):gs                                     (13.36)
                 otherwise :  [x]:g:gs
  where (g:gs) = group (y:xs)

sort′ [ ] = [ ]
sort′ [g] = g                                                              (13.37)
sort′ gs = sort′ (mergePairs gs)
Where mergePairs merges the ordered sub-lists in pairs, halving their number each round;
a sketch follows.
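A minimal sketch of mergePairs (assuming the standard pairwise form; merge is eq. (13.31)):

mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
mergePairs xss         = xss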
Exercise 13.2
13.2.1. One defines sort′ = foldr merge [ ]. Is the performance the same as the pairwise
merge (mergePairs)? If yes, prove it; if not, which one is faster?
We needn't partition the list. At the start, convert [x1, x2, ..., xn] to
[[x1], [x2], ..., [xn]], then apply pairwise merge: sort = sort′ ◦ map (x ↦ [x]).
We reuse the mergePairs defined for natural merge sort, and terminate when the lists
consolidate into one [3]. The bottom-up sort is similar to natural merge sort, differing
only in the partition method; it can be deduced from natural merge sort as the special
('worst') case: natural merge sort spans each ordered sub-sequence as long as possible,
while bottom-up merge sort only spans it to length 1. From the tail recursive
implementation, we can eliminate the recursion and convert it to iterative loops.
1: function Sort(A)
2: n ← |A|
3: B ← Create-Array(n)
4:   for i ← 1 to n do
5:     B[i] ← [A[i]]
6:   while n > 1 do
7:     for i ← 1 to ⌊n/2⌋ do
8:       B[i] ← Merge(B[2i − 1], B[2i])
9:     if Odd(n) then
10:      B[⌈n/2⌉] ← B[n]
11:    n ← ⌈n/2⌉
12: if B = [ ] then
13: return [ ]
14: return B[1]
Exercise 13.3
13.3.1. Define the generic pairwise fold foldp, and use it to implement the bottom-up
merge sort.
13.3 Parallelism
In the quick sort implementation, we can sort the two sub-sequences in parallel after the
partition; similarly for merge sort. Actually, we are not limited to two concurrent tasks,
but can divide into p sub-sequences, where p is the number of processors. Ideally, if we
achieve sorting in T′ time with parallelism, where O(n lg n) = pT′, we call it a linear
speed up, and the algorithm parallel optimal. However, choosing p − 1 pivots and
partitioning the sequence into p parts is not parallel optimal for quick sort: the
bottleneck is in the divide phase, which can only be achieved in O(n) time. For parallel
merge sort, the bottleneck is the merge phase instead. Both need specific designs to speed
up. Basically, the divide and conquer nature makes merge sort and quick sort relatively
easy to parallelize. Richard Cole developed a parallel merge sort that achieves O(lg n)
performance with n processors in 1986 [76]. Parallelism is a big and complex topic beyond
this elementary scope [76] [77].
13.4 Summary
This chapter gives two popular divide and conquer sort algorithms: quick sort and merge
sort. Both achieve the best performance of O(n lg n) for comparison based sorting.
Sedgewick cites quick sort as the greatest algorithm developed in the 20th century. Many
programming environments provide sort tools based on it. Merge sort is a powerful tool
when handling sequences of complex entities, or sequences not stored in arrays5. Quick sort performs
5 In practice, most implementations are hybrid sorts that, for example, fall back to insert sort for small sequences.
well in most cases with fewer swaps than other methods. However, swapping does not suit
linked lists, while merge sort does: it costs constant space, and its performance is
guaranteed in all cases. Quick sort has the advantage for vector storage like arrays,
because it needn't an extra work area and sorts in place. This is a valuable feature,
particularly in embedded systems where memory is limited. In-place merging is still an
active research area.
We can consider quick sort an optimized tree sort; similarly, we can also deduce merge
sort from tree sort [75]. People categorize sort algorithms in different ways [51], for
example, from the perspective of partition and merge [72]. Quick sort is easy to merge,
because all the elements in one sub-sequence are not greater than those in the other: the
merge is equivalent to concatenation. Merge sort, on the other hand, is easy to partition,
no matter whether we cut at the middle, use the odd-even split, the natural split, or the
bottom-up split. It's difficult to achieve a perfect partition in quick sort; we can't
completely avoid the worst case, whether with the median-of-three pivot, random quick
sort, or ternary quick sort.
As of this chapter, we've seen the elementary sort algorithms, including insert sort, tree
sort, selection sort, heap sort, quick sort, and merge sort. Sorting is an important
domain in algorithm design. People are facing the 'big data' challenge as I write this
chapter; it has become routine to sort hundreds of gigabytes of data with limited
resources and time.
Exercise 13.4
13.4.1. Build a binary search tree from a sequence using the idea of merge sort.
13.5 Appendix: example programs

Bi-directional scan:
Void sort([K] xs, Int l, Int u) {
    if l < u - 1 {
        Int pivot = l, Int i = l, Int j = u
        loop {
            i = i + 1
            while i < u and xs[i] < xs[pivot] {
                i = i + 1
            }
            j = j - 1
            while j ≥ l and xs[pivot] < xs[j] {
                j = j - 1
            }
            if j < i then break
            swap(xs[i], xs[j])
        }
        swap(xs[pivot], xs[j])
        sort(xs, l, j)
        sort(xs, i, u)
    }
}
Merge sort:
[K] sort([K] xs) {
Int n = length(xs)
if n > 1 {
var ys = sort(xs[0 ... n/2 - 1])
var zs = sort(xs[n/2 ...])
xs = merge(xs, ys, zs)
}
return xs
}
In-place merge with the work area (fragment; the head of the function exchanges the
smaller of the two sub-array heads into the work area while both are non-empty):

while i < m {
    swap(xs, w++, i++)
}
while j < n {
    swap(xs, w++, j++)
}
}
Solution search
Computers enable people to search for solutions to many problems: we build robots that
search for and pick gadgets on the assembly line; we develop car navigators that search
the map for the best route; we make smart phone applications that search for the best
shopping plan. This chapter is about elementary lookup, matching, and solution search
algorithms.
In the ideal case, the split is balanced (the sizes of as and bs are almost the same),
halving the size every time; the performance is O(n + n/2 + n/4 + ...) = O(n). As with the
quick sort algorithm, the worst case happens when the partition is always unbalanced, and
the performance downgrades to O(kn) or O((n − k)n). On average, we can find the k-th
element in linear time. Most engineering practices in quick sort are applicable too, like
the 'median of three'1 and the random pivot:
1: function Top(k, xs, l, u)
2: Exchange xs[l] ↔ xs[ Random(l, u) ] . Randomly select in [l, u]
3: p ← Partition(xs, l, u)
4: if p − l + 1 = k then
5: return xs[p]
6: if k < p − l + 1 then
7: return Top(k, xs, l, p − 1)
8: return Top(k − p + l − 1, xs, p + 1, u)
We can change it to return all the top k elements (in arbitrary order), as in the example
program below:
tops _ [] = []
tops 0 _ = []
tops n (x:xs) | len == n = as
              | len < n = as ++ [x] ++ tops (n - len - 1) bs
              | otherwise = tops n as
    where
      (as, bs) = partition (≤ x) xs
      len = length as
1 Blum, Floyd, Pratt, Rivest, and Tarjan developed a linear time algorithm in 1973 [4] [81]: split the
elements into groups of at most 5 elements each, giving n/5 medians; repeat this to pick the median
of medians.
2 There's a 'mind reading' game in social networks. One thinks of a person; the AI robot
asks 16 questions, and tells who that person is from the yes/no answers.
bsearch x A (l, u) = u < l :       Nothing
                     x = A[m] :    Just m, where m = l + ⌊(u − l)/2⌋       (14.5)
                     x < A[m] :    bsearch x A (l, m − 1)
                     otherwise :   bsearch x A (m + 1, u)

Binary search extends from array lookup to solving a monotone equation f(m) = y:

bsearch f y (l, u) = u < l :       Nothing
                     f(m) = y :    Just m, where m = ⌊(l + u)/2⌋           (14.6)
                     f(m) < y :    bsearch f y (m + 1, u)
                     f(m) > y :    bsearch f y (l, m − 1)
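A runnable sketch of eq. (14.5), assuming an immutable array from Data.Array (the name
bsearch follows the text):

import Data.Array (Array, (!), listArray)

bsearch :: Ord a => a -> Array Int a -> (Int, Int) -> Maybe Int
bsearch x a (l, u)
  | u < l      = Nothing
  | x == a ! m = Just m
  | x <  a ! m = bsearch x a (l, m - 1)
  | otherwise  = bsearch x a (m + 1, u)
  where m = l + (u - l) `div` 2

For example, bsearch 7 (listArray (0, 4) [1, 3, 7, 9, 11]) (0, 4) gives Just 2.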
14.2.1 2D search
We can extend binary search to 2D or even higher dimensions. Consider a matrix M of size
m × n, where the elements in each row and each column are ascending natural numbers, as
shown in fig. 14.1. How do we locate all elements equal to z, i.e., find all locations
(i, j) such that M(i,j) = z?

[(x, y) | x ← [1, 2, ..., m], y ← [1, 2, ..., n], M(x,y) = z]        (14.7)

Richard Bird used to interview students with this question [1]. Those who had programming
experience at school tended to apply binary search, but it was easy to get stuck. One
often checks the middle point M(m/2, n/2): if it is less than z, drop the top-left
rectangle; if greater than z, drop the bottom-right rectangle, as shown in fig. 14.2
(discard the shaded rectangle). Both cases lead to an L-shaped search area, where one
can't directly apply recursive search any more. Define the 2D search as: given f(x, y),
search for the integer solutions (x, y) such that f(x, y) = z. The matrix search is just a
special case:
3 One can reuse the result of a^n to compute a^(n+1) = a·a^n. We consider a generic monotone f(n).
1 2 3 4 ...
2 4 5 6 ...
3 5 7 8 ...
4 6 8 9 ...
...

Figure 14.1: A matrix where every row and every column is ascending.

Figure 14.2: Left: the middle point < z, and the whole shaded (top-left) rectangle is < z.
Right: the middle point > z, and the shaded (bottom-right) rectangle is > z.
f(x, y) = 1 ≤ x ≤ m, 1 ≤ y ≤ n :  M(x,y)
          otherwise :             −1
For a monotone increasing function f(x, y), e.g., f(x, y) = x^a + y^b where a, b are
natural numbers, the effective method is to search from the top-left corner, not the
bottom-left [82]. As shown in fig. 14.3, start from (0, z). For each location (p, q),
compare f(p, q) with z:

1. If f(p, q) < z, since f is monotone increasing, f(p, y) < z for all 0 ≤ y < q. Drop
all points in the vertical line segment (red);
2. If f(p, q) > z, then f(x, q) > z for all p < x ≤ z. Drop all points in the horizontal
line segment (blue);
3. If f(p, q) = z, record (p, q) as a solution, then drop both the row and the column
(move to (p + 1, q − 1)).

We reduce the search rectangle line by line, every time dropping a row, a column, or
both. Define the search function as below, and pass the top-left corner to it: search f z 0 z
search f z p q = p > z or q < 0 :  [ ]
                 f(p, q) < z :     search f z (p + 1) q                    (14.8)
                 f(p, q) > z :     search f z p (q − 1)
                 f(p, q) = z :     (p, q) : search f z (p + 1) (q − 1)
Every time, at least one of p and q advances towards the bottom or the right by one, so it
needs at most 2(z + 1) steps. There are three best cases: (1) both p and q advance at each
step, giving z + 1 steps in total; as in fig. 14.4 (a), all points on the diagonal
(x, z − x) satisfy f(x, z − x) = z, and we reach (z, 0) in z + 1 steps; (2) we move right
horizontally till p exceeds z; as in fig. 14.4 (b), all points on the top horizontal line
(x, z) satisfy f(x, z) < z, and we terminate after z + 1 steps; (3) we move down
vertically till q becomes negative; as in fig. 14.4 (c), all points on the left vertical
line (0, x) satisfy f(0, x) > z, and we terminate after z + 1 steps. Case (d) is the
worst: if we project all the horizontal sections of the search path onto the x axis, and
all the vertical sections onto the y axis, the total is 2(z + 1) steps. This method
improves the performance from O(z²) for exhaustive search to O(z).
This algorithm is called 'saddleback' search: the plot of f has the smallest value at the
bottom-left and the largest at the top-right, and looks like a saddle with two wings, as
shown in fig. 14.5. We can further reduce the initial search rectangle (0, z) − (z, 0).
Since f is monotone increasing, find the maximum m along the y axis satisfying
f(0, m) ≤ z, and the maximum n along the x axis satisfying f(n, 0) ≤ z; then reduce the
search rectangle to (0, m) − (n, 0), as shown in fig. 14.6.

m = max {0 ≤ y ≤ z, f(0, y) ≤ z}
n = max {0 ≤ x ≤ z, f(x, 0) ≤ z}        (14.9)
Finally, apply the saddleback search in this smaller rectangle: solve(f, z) = search f z 0 m
search f z p q = p > n or q < 0 :  [ ]
                 f(p, q) < z :     search f z (p + 1) q                    (14.12)
                 f(p, q) > z :     search f z p (q − 1)
                 f(p, q) = z :     (p, q) : search f z (p + 1) (q − 1)
We apply two rounds of binary search to find m and n; each round computes f O(lg z)
times. The saddleback search computes f O(m + n) times in the worst case, and
O(min(m, n)) times in the best case, as in the table below. For functions like
f(x, y) = x^a + y^b, a, b ∈ N, the boundaries m, n are very small, and the total
performance is close to O(lg z).

case  | steps to compute f
worst | 2 lg z + m + n
best  | 2 lg z + min(m, n)
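A runnable sketch of the saddleback search with the reduced rectangle (eqs. (14.9) and
(14.12)); for brevity the boundaries are found here by a linear scan, while the text uses
binary search:

solve :: (Int -> Int -> Int) -> Int -> [(Int, Int)]
solve f z = search 0 m
  where
    -- the maximum q with f(0, q) <= z, and the maximum p with f(p, 0) <= z
    m = bound (f 0)
    n = bound (`f` 0)
    bound g = length (takeWhile (<= z) (map g [0 .. z])) - 1
    search p q
      | p > n || q < 0 = []
      | f p q < z      = search (p + 1) q
      | f p q > z      = search p (q - 1)
      | otherwise      = (p, q) : search (p + 1) (q - 1)

For example, solve (\x y -> x^2 + y^3) 100 gives [(6, 4), (10, 0)].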
As shown in fig. 14.7, for a point (p, q) in the rectangle (a, b) − (c, d), if
f(p, q) ≠ z, we can only discard the shaded part (≤ 1/4). If f(p, q) = z, we can discard
the bottom-left and top-right parts, and all points in row p and column q, since f is
monotone; this reduces the search rectangle by 1/2. To find a point satisfying
f(p, q) = z, we apply binary search along the horizontal or the vertical central line.
Because the performance is bound to O(lg |L|) for a line L, we choose the shorter central
line, as shown in fig. 14.8.
If there is no point satisfying f(p, q) = z, we find a point such that
f(p, q) < z < f(p + 1, q) on the horizontal central line (or f(p, q) < z < f(p, q + 1) on
the vertical one). In that case we can't discard all points in row p and column q. In
summary, we apply binary search along the horizontal central line for the point satisfying
f(p, q) ≤ z < f(p + 1, q), or along the vertical central line for the point satisfying
f(p, q) ≤ z < f(p, q + 1). If all points on the line give f(p, q) < z, we return the upper
bound; if all give f(p, q) > z, we return the lower bound; we still discard half a side in
these cases. Below is the improved saddleback search:
1. Apply binary search along the x and y axes to obtain the search rectangle (0, m) − (n, 0);
2. For a rectangle (a, b) − (c, d), if the height > width, apply binary search along the
horizontal central line; otherwise search along the vertical central line for the point (p, q);
3. If f(p, q) = z, record (p, q); recursively search the two remaining rectangles;
4. If f(p, q) ≠ z, we can only drop the shaded area, and the remainder is an 'L' shape:
recursively search the two rectangles and a line section, either (p, q + 1) − (p, b) as in
fig. 14.9 (a), or (p + 1, q) − (c, q) as in fig. 14.9 (b).

Figure 14.9: Recursively search the shaded parts, including the bold line if f(p, q) ≠ z.
search (a, b) (c, d) = c < a or d < b :  [ ]
                       c − a < b − d :   csearch                           (14.13)
                       otherwise :       rsearch
Where csearch applies binary search to the horizontal central line for the point (p, q),
such that f(p, q) ≤ z < f(p + 1, q), as shown in fig. 14.9 (a). If all the function values
are greater than z, return the lower bound (a, ⌊(b + d)/2⌋), and drop the upper side
(including the central line) as shown in fig. 14.10 (a). Let:

q = ⌊(b + d)/2⌋
p = bsearch (x ↦ f(x, q)) z (a, c)
As we halve the rectangle every time, we search O(lg(mn)) rounds. We apply binary search
along the central line for (p, q), computing f O(lg(min(m, n))) times. Let T(m, n) be the
time to search an m × n rectangle. We have the recursive equation:

T(m, n) = lg(min(m, n)) + 2T(m/2, n/2)        (14.15)
Exercise 14.1
14.1.1. Prove that the average performance of the k-selection problem is O(n).
14.1.2. To find the top k elements of A, we can compute x = max (take k A) and
y = min (drop k A). If x < y, then the first k elements of A are the answer; otherwise,
partition the first k elements with x, partition the rest with y, then recursively find
the top k′ elements in the sub-sequence [a | a ← A, x < a < y], where
k′ = k − |[a | a ← A, a ≤ x]|. Implement this solution, and evaluate its performance.
14.1.3. Find the 'simplified' median of two sorted arrays A and B in O(lg(m + n)) time,
where m = |A|, n = |B|. The array index starts from 0. The simplified median is defined as
median(A, B) = C[⌊(m + n)/2⌋], where C = merge(A, B) is the merged sorted array4.
14.1.4. For the saddleback search, eliminate the recursion and implement it with loops
that update the boundaries.
14.1.5. For the 2D search, let the bottom-left be the minimum and the top-right the
maximum. If z is less than the minimum or greater than the maximum, there is no solution;
otherwise cut the rectangle into 4 parts with a horizontal line and a vertical line
crossing at the center, then recursively search these 4 small rectangles. Implement this
solution and evaluate its performance.
14.3 The majority number

To count the votes for each candidate, we can use a map, often implemented as a red-black
tree or a hash table. For m candidates and n votes, the table below gives the performance:
Define the element that occurs more than 50% of the time as the 'majority'. Boyer and
Moore developed an algorithm in 1980 that picks the majority element, if it exists, in one
scan with constant space [83]. There is at most one majority. Repeatedly drop two
different elements till all the remaining are the same; if the majority exists, it is the
remaining element. Start from the first vote: let that candidate be the winner so far with
point 1. If the next vote is for the same candidate, add 1 to the winner's point;
otherwise subtract 1. The candidate is no longer the winner when the point drops to 0; we
then pick the candidate of the next vote as the
4 In statistics, the median of an ascending data set x with n elements is defined as:

median(x) = odd(n) :   x[(n + 1)/2]
            even(n) :  (x[n/2] + x[n/2 + 1])/2
6 There is a probabilistic sub-linear space counting algorithm published in 2004, named
'Count-min sketch' [84].
new winner and go on. As shown in the table below, if there exists a majority m, the other
candidates can't beat m. Otherwise, if the majority doesn't exist (an invalid vote result
with no winner), we discard the recorded 'winner'. We need another scan to validate the winner.
maj [ ] = ∅
maj (x:xs) = scan (x, 1) xs        (14.17)

Or implement with fold (Curried form): maj = foldr f (∅, 0), where:

f x (m, v) = x = m :       (m, v + 1)
             v = 0 :       (x, 1)                                          (14.19)
             otherwise :   (m, v − 1)

verify m = if 2|filter (= m) xs| > |xs| then Just m else Nothing        (14.20)
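A runnable sketch of the Boyer-Moore vote (eqs. (14.19) and (14.20)), with the fold
written as a left-to-right scan as in the prose:

import Data.List (foldl')

maj :: Eq a => [a] -> Maybe a
maj [] = Nothing
maj xs@(x:rest) = verify (fst (foldl' f (x, 1 :: Int) rest))
  where
    f (m, v) y
      | y == m    = (m, v + 1)
      | v == 0    = (y, 1)
      | otherwise = (m, v - 1)
    -- the second scan validates the recorded winner
    verify m | 2 * length (filter (== m) xs) > length xs = Just m
             | otherwise = Nothing

For example, maj "AAACABA" gives Just 'A', while maj "ABCA" gives Nothing.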
10: c←0
11: for each a in A do
12: if a = m then
13: c←c+1
14: if c > |A|/2 then
15: return Just m
16: else
17: return Nothing
Exercise 14.2
14.2.1. Extend the algorithm to find all k-majorities that occur more than ⌊n/k⌋ times in
a collection A, where n = |A|. Hint: drop k different elements each time, till fewer than
k distinct candidates remain. Any k-majority (an element occurring more than ⌊n/k⌋ times)
must remain at the end.
Figure 14.11: A: the maximum sum so far; B: the sum of the sub-vector ending at i.
1: function Max-Sum(V )
2: A ← 0, B ← 0, n ← |V |
3: for i ← 1 to n do
4: B ← Max(B + V [i], 0)
5: A ← Max(A, B)
6: return A
Or implement with fold (Curried form): Smax = fst ◦ foldr f (0, 0), where f updates the
maximum sum so far:

f x (Sm, S) = (max(Sm, S′), S′), where S′ = max(0, x + S)        (14.23)
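Eq. (14.23) transcribes to a few lines of runnable Haskell (maxSum is a name assumed here):

maxSum :: [Int] -> Int
maxSum = fst . foldr f (0, 0)
  where f x (m, s) = let s' = max 0 (x + s) in (max m s', s')

For example, maxSum [3, -13, 19, -12, 1, 9, 18, -16, 15, -15] gives 35, the sum of the
sub-vector [19, -12, 1, 9, 18].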
Exercise 14.3
14.3.1. Modify the solution that finds the maximum sum of the sub-vector, to return the
sub-vector with the maximum sum.
14.3.2. Bentley gives a divide and conquer algorithm that finds the maximum sum in
O(n lg n) time [2]: split the vector at the middle, recursively find the maximum sums of
the two halves and the maximum sum crossing the middle, then pick the greatest. Implement
this solution.
14.3.3. Find the sub-matrix of an m × n matrix that gives the maximum sum.
Figure: matching the pattern P = 'sananym' against the text T = 'anyananthousananym flower';
(b) after a partial match fails, move s = 4 + 2 = 6.
When matching the pattern P against the text T from offset s, if it fails after matching q
characters, we look up q′ = π(q) to get the fallback position, then retry comparing P[q′]
with the text:
1: function KMP(T, P )
2: π ← Build-Prefixes(P )
3: n ← |T |, m ← |P |, q ← 0
4: for i ← 1 to n do
q | Pq     | k | Pk
1 | a      | 0 | ""
2 | an     | 0 | ""
3 | ana    | 1 | a
4 | anan   | 2 | an
5 | anany  | 0 | ""
6 | ananym | 0 | ""
1: function Build-Prefixes(P )
2: m ← |P |, k ← 0
3: π(1) ← 0
4: for q ← 2 to m do
5: while k > 0 and P [q] 6= P [k + 1] do
6: k ← π(k)
7: if P [q] = P [k + 1] then
8: k ←k+1
9: π(q) ← k
10: return π
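A transcription of Build-Prefixes into runnable Haskell, using a lazily self-referential
array (1-indexed as in the pseudocode; a non-empty pattern is assumed):

import Data.Array

prefixTable :: String -> Array Int Int
prefixTable p = pi'
  where
    m   = length p
    pa  = listArray (1, m) p
    -- pi'(1) = 0; each later entry only depends on earlier entries
    pi' = listArray (1, m) (0 : [go (pi' ! (q - 1)) q | q <- [2 .. m]])
    go k q
      | k > 0 && pa ! q /= pa ! (k + 1) = go (pi' ! k) q
      | pa ! q == pa ! (k + 1)          = k + 1
      | otherwise                       = k

For example, elems (prefixTable "ananym") gives [0, 0, 1, 2, 0, 0], matching the table above.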
The KMP algorithm builds the prefix function in amortized O(m) time [4], and matches the
string in amortized O(n) time, where m = |P| and n = |T| are the lengths. The total
amortized performance is O(m + n), with additional O(m) space to store the prefix
function. Variations of the pattern string P don't impact the performance. Consider
matching the pattern 'aaa...ab' (of length m) against the string 'aaa...a' (of length n):
the m-th character never matches, and we can only fall back by 1 repeatedly, yet the
algorithm is still bound to linear time.
In the early years of artificial intelligence, people developed methods to search for
solutions. Different from sequence and string matching, a solution may not directly exist
among a set of candidates; we need to construct it while trying various options. Some
problems are not solvable; among the solvable ones, there can be multiple solutions (for
example, a maze may have multiple ways out), and sometimes we need to find the optimal
one. DFS stands for depth-first search, and BFS for breadth-first search. They are typical
graph search algorithms; we give some examples and skip the formal definition of the graph.
Maze
Maze is a classic puzzle. There is a saying: always turn right. However, this can end up
looping, as shown in fig. 14.15 (b). The decision matters when there are multiple ways. In
fairy tales, one takes some bread crumbs into the maze: select a way and leave a piece of
bread; upon entering a dead end, go back to the last place along the bread crumbs and take
another way. Whenever one sees bread crumbs, the place was visited before; go back and try
a different way. Repeating this 'try and check' step, one will either find the way out, or
go back to the starting point (no solution). We use an m × n matrix to define a maze; each
element is 0 or 1, meaning there is a way or not. The matrix below defines the maze in
fig. 14.15 (b):
Given a start point s = (i, j) and a destination e = (p, q), we need to find all paths
from s to e. We first examine all the points connected with s; for every such point k, we
recursively find all paths from k to e, then prepend the path s-k to every path from k to
e. We need to leave some 'bread crumbs' to avoid looping: we use a list P to record all
the visited points, look it up, and only try new ways.
Where:

solve s P = s = e :       map (reverse ◦ (s:)) P                           (14.26)
            otherwise :   concat [solve k (map (s:) P) | k ← adj s, k ∉ P]
The paths in P are reversed; we reverse the result back at the end. Function adj p
enumerates the points adjacent to p:
This essentially searches all possible paths exhaustively, while we often only need one
way out. We need some data structure to serve as the 'bread crumbs' that record the
previous decisions, so that we always search on top of the latest decision. We can use a
stack to realize this last-in, first-out order. The stack starts from [s]. Pop s out and
find its connected points a, b, ...; push the new paths [a, s], [b, s] to the stack. Next
pop [a, s] out, examine all the points connected to a, then push the new paths of 3 steps
to the stack, and so on. The stack records each path in reversed order, from the farthest
place back to the starting point, as shown in fig. 14.16. If the stack becomes empty,
we've tried all ways and terminate the search; otherwise, we pop a path, expand it to new
adjacent points, and push the new paths back.
Figure 14.16: The point p connects to i, j, k; pop the path [p, ..., s], then push
[i, p, ..., s], [j, p, ..., s], ...
Where:
solve [ ] = [ ]
solve ((p:ps):cs) = p = e :      reverse (p:ps)
                    ks = [ ] :   solve cs                                  (14.29)
                    otherwise :  solve ((map (: p:ps) ks) ++ cs)
  where ks = filter (∉ ps) (adj p)
Below is the iterative implementation:
1: function Solve-Maze(M, s, e)
2:   S ← [s], L ← [ ]
3:   while S ≠ [ ] do
4:     P ← Pop(S)
5:     p ← Last(P)
6:     if e = p then
7:       Add(L, P)    . find a solution
8:     else
9:       for each k in Adjacent(M, p) do
10:        if k ∉ P then
11:          Push(S, P ++ [k])
12:  return L
Each step tries 4 options (up, down, left, and right) through the backtracking, so it
seems the performance is O(4^n), where n is the length of the path. The actual time isn't
that large, because we skip the visited places. In the worst case, we traverse all the
reachable points exactly once; hence the time is bound to O(n), where n is the number of
connected points. We need additional O(n²) space for the stack.
Exercise 14.4
14.4.1. Modify the stack-based implementation to find all the ways out of the maze.
[1, 2, 3, 4, 5, 6, 7, 8]. For example, the permutation [6, 2, 7, 1, 3, 5, 8, 4] means the
first queen is at row 1, column 6, the second queen at row 2, column 2, ..., and the 8th
queen at row 8, column 4. As such, we reduce the solution domain to 8! = 40320
permutations. Arrange the queens from the first row; there are 8 options (columns). For
the next queen, we need to skip some columns to avoid attacking the first queen. For the
i-th queen, find the columns in row i that are not attacked by the first i − 1 queens. If
all 8 columns are invalid, go back and adjust the previous i − 1 queens. We find a
solution after arranging all 8 queens; record it and backtrack further to find all the
solutions. Start searching with an empty stack and list: solve [[ ]] [ ]
solve [ ] s = s
solve (c:cs) s = |c| = 8 :     solve cs (c:s)                              (14.30)
                 otherwise :   solve ([x:c | x ← [1..8], x ∉ c, safe x c] ++ cs) s
When the stack becomes empty, we've exhausted all the options, and s records all the
solutions. If the arrangement c at the stack top has length 8, we add this newly found
solution to s, then continue the search; if |c| < 8, we find the columns that are neither
occupied (x ∉ c) nor attacked diagonally by other queens (through safe x c), then push the
new valid arrangements to the stack.
safe x c = ∀(i, j) ← zip (reverse c) [1, 2, ...] ⇒ |x − i| ≠ |y − j|, where y = 1 + |c|        (14.31)

Function safe checks whether the queen at row y = 1 + |c|, column x shares a diagonal with
any other queen. Let c = [i(y−1), i(y−2), ..., i1] be the columns of the first y − 1
queens. Reverse c and zip it with 1, 2, ... to form the coordinates
[(i1, 1), (i2, 2), ..., (i(y−1), y − 1)], then check that no (i, j) forms a diagonal with
(x, y): |x − i| ≠ |y − j|. This implementation is tail recursive; we can turn it into loops:
1: function Solve-Queens
2: S ← [[ ]]
3: L←[] . Stores the solution
4: while S 6= [ ] do
5: A ← Pop(S) . A: arrangement
6: if |A| = 8 then
7: Add(L, A)
8: else
9: for i ← 1 to 8 do
10: if Valid(i, A) then
11: Push(S, A ++ [i])
12: return L
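A runnable sketch of eqs. (14.30) and (14.31); queens is a wrapper name assumed here:

queens :: [[Int]]
queens = solve [[]] []
  where
    solve [] s = s
    solve (c:cs) s
      | length c == 8 = solve cs (c : s)
      | otherwise = solve ([x : c | x <- [1 .. 8],
                                    x `notElem` c, safe x c] ++ cs) s
    -- no two queens share a column (notElem) or a diagonal (safe)
    safe x c = and [abs (x - i) /= abs (y - j)
                   | (i, j) <- zip (reverse c) [1 ..]]
      where y = 1 + length c

Evaluating length queens gives 92, the number of solutions.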
Exercise 14.5
14.5.1. Extend the 8 queens to n queens.
14.5.2. There are 92 solutions to the 8 queens puzzle. Rotating a solution by 90° gives
another solution, and so does flipping it. There are essentially 12 distinct solutions.
Write a program to find them.
This is a special form of the peg puzzle. The number of pegs can be 8 or another even
number (as shown in fig. 14.20⁸). Label the stones from the left as 1, 2, ..., 7. There
are at most 4 options for every move. At the start, for example, the frog on the 3rd stone
can hop right to the empty stone; the
8 from http://home.comcast.net/~stegmann/jumping.htm
frog on the 5th stone can hop left; the frog on the 2nd stone can leap right; and the frog
on the 6th stone can leap left. We record the stone status and try the 4 options at every
step, backtracking to try another option when stuck. Because every frog can only move
forward, no movement is revertible; hence we needn't worry about repetition, and we record
the steps only for the final output. The state L is some permutation of s: L[i] is ±1 or
0, indicating a frog on the i-th stone heading left, a frog heading right, or an empty
stone. Let the empty stone be p; the 4 movements are:

1. Leap left: p < 6 and L[p + 2] > 0, swap L[p] ↔ L[p + 2];
2. Hop left: p < 7 and L[p + 1] > 0, swap L[p] ↔ L[p + 1];
3. Leap right: p > 2 and L[p − 2] < 0, swap L[p] ↔ L[p − 2];
4. Hop right: p > 1 and L[p − 1] < 0, swap L[p] ↔ L[p − 1].
Define four functions: leapl, hopl, leapr, and hopr, each transitioning the status
L ↦ L′, and returning L unchanged if the move is impossible. We use a stack S to record
the attempts. The stack starts as a singleton list containing the initial status, and a
list M records all the solutions. We repeatedly pop the stack: if the state L = e, we add
the new solution to M; otherwise, we try the 4 moves on top of L and push the new statuses back.
Where:

solve [ ] s = s
solve (c:cs) s = L = e :       solve cs (reverse c : s)                    (14.33)
                 otherwise :   solve ((map (: c) (moves L)) ++ cs) s
  where L = head c
step | state
0    | -1 -1 -1  0  1  1  1
1    | -1 -1  0 -1  1  1  1
2    | -1 -1  1 -1  0  1  1
3    | -1 -1  1 -1  1  0  1
4    | -1 -1  1  0  1 -1  1
5    | -1  0  1 -1  1 -1  1
6    |  0 -1  1 -1  1 -1  1
7    |  1 -1  0 -1  1 -1  1
8    |  1 -1  1 -1  0 -1  1
9    |  1 -1  1 -1  1 -1  0
10   |  1 -1  1 -1  1  0 -1
11   |  1 -1  1  0  1 -1 -1
12   |  1  0  1 -1  1 -1 -1
13   |  1  1  0 -1  1 -1 -1
14   |  1  1  1 -1  0 -1 -1
15   |  1  1  1  0 -1 -1 -1
For 3 frogs on each side, it takes 15 steps. Extending 3 to n and listing the steps
against n, the step counts are all square numbers minus one: (n + 1)² − 1. Let us prove it:
Proof. Compare the start and end states: every frog moves ahead n + 1 stones, so the 2n
frogs move 2n(n + 1) stones in total. Every frog from the left must meet every frog from
the right exactly once, and one frog must leap over the other when they meet. Because
there are n² meets in total, the leaps move the frogs ahead 2n² stones in total. The
remaining moves are hops, not leaps: there are 2n(n + 1) − 2n² = 2n hops in total. Summing
the n² leaps and 2n hops, the total number of steps is n² + 2n = (n + 1)² − 1.
The three puzzles share a common solution structure. Start from some state, e.g., the
entrance to the maze, the empty chess board, or the pegs [-1, -1, -1, 0, 1, 1, 1]. Search
for the solution, trying multiple options at every step, e.g., the 4 directions of up,
down, left, and right in the maze; the 8 columns in each row; or the leaps and hops, right
and left. Although we don't know how far a decision leads, we clearly know the final
state, e.g., the exit of the maze, the complete arrangement of 8 queens, or all pegs
swapped. We apply the same strategy: repeatedly try an option and record it with the newly
obtained state; backtrack when stuck and try another option. We either find a solution, or
exhaust all the options and hence know the problem is unsolvable. There are variants, like
stopping at the first solution found, or continuing to search for all solutions. If we
build a tree rooted at the starting state, where every branch represents an option, the
search grows the tree deeper and deeper; we don't try the alternatives at the same depth
until we fail and backtrack. Figure 14.21 shows the search order with arrows that go down
and then backtrack.
We call it depth-first search (DFS), which is widely used in practice. Some programming
environments, like Prolog, use DFS as the default evaluation model. For example, define a
maze with rules in Prolog:
c(a, b). c(a, e).
c(b, c). c(b, f).
Figure 14.21: The DFS visit order. Figure 14.22: An example maze of connected places
a, b, c, d, e, f, g, h.
go(X, X).
go(X, Y) :- c(X, Z), go(Z, Y).
This program says: a place X is connected with itself; and given two places X and Y, if X
is connected with Z, and Z is connected with Y, then X is connected with Y. For multiple
choices of Z, Prolog picks one and goes on searching; it only tries another Z when the
recursive search completes (fails or succeeds) and backtracks. This is exactly DFS. We can
apply DFS when we only need a solution and don't care about the number of steps, for
example, to find a way out of the maze, even though it may not be the shortest one.
Exercise 14.6
14.6.1. Extend the pegs puzzle solution for n pegs on each side.
can only carry one thing at a time. The wolf would kill the goat, and the goat would bite
the cabbage, if they stayed alone without the farmer. The puzzle asks for the best
(fastest) solution to cross the river. Since the wolf doesn't bite the cabbage, the farmer
can safely carry the goat to the other side and come back. No matter whether he carries
the wolf or the cabbage next, he needs to carry one thing back to avoid a conflict. To
find the best solution, we try all the options in parallel and compare them. Regardless of
the direction, each crossing counts as a step. We check all the possible statuses after 1,
2, 3, ... steps, till the farmer and all the things have moved to the other side at some
step n: that is the best solution.
But how do we try all the options in parallel? Consider a lucky draw: people pick balls
from a box of colored balls. There is one black ball, and the rest are white. Whoever
picks the black ball wins; otherwise, the white ball is returned to the box and the person
waits for the next draw. We set the rule that nobody draws a second time before all the
others have drawn, and line people up in a queue: every time, the first person picks a
ball and moves to the tail if he doesn't win. The queue ensures the fairness.
Figure 14.23: The i-th person de-queues, draws, then en-queues if he doesn't win.
We apply the same method to the cross river puzzle. Let sets A and B contain the things on
the two sides. At the start, A = {w, g, c, p} holds the wolf, the goat, the cabbage, and
the farmer, while B = ∅. Each move transfers the farmer, with or without another element,
between A and B. If a set doesn't contain the farmer, it must not contain conflicting
elements. The goal is to swap the elements of A and B in the fewest steps. We initialize a
queue Q with the start status (A = {w, g, c, p}, B = ∅). As long as Q isn't empty, we
de-queue the head, try all the options, then en-queue the new statuses to the tail. We
find the solution when the head becomes A = ∅, B = {w, g, c, p}. Figure 14.24 shows the
search order. As all the options at the same level are tried, we needn't backtrack.
We can represent a set with a 4-bit binary number, each bit standing for an element: the
wolf w = 1, the goat g = 2, the cabbage c = 4, and the farmer p = 8. 0 is the empty set
and 15 the full set. 3 = 1 + 2 means the set {wolf, goat}; it's invalid because the wolf
would kill the goat. 6 = 2 + 4 is another conflict: {goat, cabbage}. Every time, we move
the highest bit (8), with or without another bit (4, 2, or 1), from one number to the
other. The options are:
Figure 14.24: Start from 1; check all the options 2, 3, 4 for the next step; then all the
options for the 3rd step, ...
Given two water jugs of 9 and 4 litres, how can we fetch 6 litres of water from the river?
This puzzle has a history back to ancient Greece. A story says the French mathematician
Poisson solved it as a child. It also appears in the Hollywood movie 'Die Hard 3'. Pólya
uses this puzzle as an example of backwards induction [90].
Figure 14.25: Two jugs of 9 and 4 litres; the goal is 6 litres.
After filling the 9 litre jug, then pouring into the 4 litre jug twice, one obtains 1
litre of water, as shown in fig. 14.26. Backwards induction is a strategy, not a detailed
algorithm: it can't directly answer how to get 2 litres of water from two jugs of 899 and
1147 litres, for example.
Figure 14.26: Fill the bigger jug, then pour into the smaller one twice.
Let the small jug be A and the big jug be B. There are 6 operations at each step: (1) fill
jug A; (2) fill jug B; (3) empty jug A; (4) empty jug B; (5) pour from jug A to B; (6)
pour from jug B to A. Below is a series of operations (assuming a < b < 2a):
A B operation
0 0 start
a 0 fill A
0 a pour A to B
a a fill A
2a - b b pour A to B
2a - b 0 empty B
0 2a - b pour A to B
a 2a - b fill A
3a - 2b b pour A to B
... ... ...
Whatever the operations, the amount of water in each jug must be xa + yb for some integers
x and y, where a and b are the jug volumes. From number theory, we can get g litres of
water if and only if g is a multiple of the greatest common divisor of a and b, i.e.,
gcd(a, b) | g. If gcd(a, b) = 1 (a and b are coprime), then we can get any natural number
g of litres. Although we know the solution exists, we don't know the detailed steps. We
can solve the Diophantine equation g = xa + yb, then design the operations from x and y.
Assume x > 0, y < 0: we fill jug A x times and empty jug B |y| times in total. For
example, with the small jug a = 3 litres, the big jug b = 5 litres, and the goal of g = 4
litres: because 4 = 3 × 3 − 5, we design the operations below:
A B operation
0 0 start
3 0 fill A
0 3 pour A to B
3 3 fill A
1 5 pour A to B
1 0 empty B
0 1 pour A to B
3 1 fill A
0 4 pour A to B
We fill jug A 3 times, and empty jug B once. We can apply the extended Euclid algorithm
from number theory to find x and y. Let b = aq + r; suppose we have recursively solved
d = gcd(r, a) = x′r + y′a. Then:

d = x′(b − aq) + y′a
  = (y′ − x′q)a + x′b        (14.40)

The edge case happens when a = 0: gcd(0, b) = b = 0a + 1b. Hence the extended Euclid
algorithm can be defined as below.
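A runnable sketch from eq. (14.40) and the edge case (gcmex is a name assumed here; it
returns (d, x, y) with d = gcd(a, b) = xa + yb):

gcmex :: Integer -> Integer -> (Integer, Integer, Integer)
gcmex 0 b = (b, 0, 1)                -- gcd(0, b) = b = 0·a + 1·b
gcmex a b = (d, y' - x' * q, x')     -- d = (y' − x'q)a + x'b, by eq. (14.40)
  where
    (q, r) = b `divMod` a
    (d, x', y') = gcmex r a

For example, gcmex 3 5 gives (1, 2, -1), i.e., 1 = 2 × 3 − 1 × 5; scaling by g then yields
a solution of g = xa + yb.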
There are infinitely many solutions to the Diophantine equation g = xa + yb. The smaller
|x| + |y| is, the fewer the steps. We can also apply the same method as in the 'cross
river' puzzle: try the 6 operations (fill A, fill B, pour A into B, pour B into A, empty
A, and empty B) in parallel, and find the best solution with a queue that arranges the
attempts. The elements in the queue are series of pairs (p, q), where p and q are the
amounts of water in the two jugs, recording the operations from the beginning. The queue
starts from {[(0, 0)]}. As long as the queue isn't empty, we pop a sequence from the head.
If the last pair of the sequence contains g litres, we have found a solution, and we
reverse and output the sequence; otherwise, we try the 6 operations atop the latest pair,
filter out the duplicates, and add the new sequences back to the queue.
bfs ∅ = [ ]
bfs Q = p = g or q = g :  reverse s                                        (14.44)
        otherwise :       bfs (pushAll (map (: s) (try s)) Q′)
  where (s, Q′) = pop Q, (p, q) = head s

try s = filter (∉ s) [f (p, q) | f ← {flA, flB, prA, prB, emA, emB}], where (p, q) = head s        (14.45)
Where:

flA (p, q) = (a, q)
flB (p, q) = (p, b)
emA (p, q) = (0, q)
emB (p, q) = (p, 0)                                                        (14.46)
prA (p, q) = (max(0, p + q − b), min(p + q, b))
prB (p, q) = (min(p + q, a), max(0, p + q − a))
This method returns the solution with the fewest steps. To avoid storing the complete
operation sequence in the queue, we can use a global history list, and link every operation
back to its predecessor. As shown in fig. 14.27, the start state is (0, 0); only 'fill A'
and 'fill B' are applicable. We next try 'fill B' atop (3, 0) and record the new state
(3, 5). If we applied 'empty A' to (3, 0), we would go back to the starting point (0, 0);
we skip it (the shaded state). We add a 'parent' reference to each node in fig. 14.27, and
can backtrack along it to the beginning.
Figure 14.27: The search tree of states: from (0, 0), 'fill A' gives (3, 0) and 'fill B'
gives (0, 5); expanding (3, 0) gives (3, 5), (0, 3), ..., skipping the visited (0, 0).
Every node links back to its parent.
1: function Solve(a, b, g)
2: Q ← {(0, 0, NIL)} . Queue
3: V ← {(0, 0, NIL)} . Visited set
4: while Q 6= ∅ do
5: s ← Pop(Q)
6: if p(s) = g or q(s) = g then
7: return Back-track(s)
8: else
9: for each c in Expand(s, a, b) do
10: if c 6= s and c ∈
/ V then
11: Push(Q, c)
12: Add(V, c)
13: return NIL
Exercise 14.7
14.7.1. Improve the extended Euclid algorithm: find the x and y that minimize |x| + |y|,
giving the optimal solution for the two jugs puzzle.
Klotski

Klotski is a block sliding puzzle, as shown in fig. 14.28. There are 10 blocks of 3 sizes:
4 pieces of 1 × 1, 4 pieces of 1 × 2, 1 piece of 2 × 1, and 1 piece of 2 × 2. The goal is
to slide the big block to the bottom slot. Figure 14.29 shows variants of this puzzle in Japan.
We define the board as a 5 × 4 matrix, with rows and columns starting from 0. Label the
pieces from 1 to 10, and let 0 mean an empty cell. The matrix M gives the initial layout:
the cells of value i are occupied by piece i. We use a map L to represent the layout,
where L[i] is the set of cells occupied by piece i. For example, L[4] = {(2, 1), (2, 2)}
means the 4th piece occupies the cells (2, 1) and (2, 2). Labeling all 20 cells from 0 to
19, we can convert a (row, col) pair to a label by c = 4y + x; the 4th piece then occupies
the cells L[4] = {9, 10}.
M = 1 10 10 2
    1 10 10 2
    3  4  4 5
    3  7  8 5
    6  0  0 9

L = {1 ↦ {0, 4}, 2 ↦ {3, 7}, 3 ↦ {8, 12}, 4 ↦ {9, 10}, 5 ↦ {11, 15},
     6 ↦ {16}, 7 ↦ {13}, 8 ↦ {14}, 9 ↦ {19}, 10 ↦ {1, 2, 5, 6}}
Define map ϕ(M ) 7→ L and its reverse ϕ−1 (L) 7→ M to convert between board and
layout:
1: function ϕ(M )
2: L ← {}
3: for y ← 0 ∼ 4 do
4: for x ← 0 ∼ 3 do
5: k ← M [y][x]
6: L[k] ← Add(L[k], 4y + x)
7: return L
Figure 14.30: Left: the two cells of piece 1 can move; Right: the lower cell of piece 1
conflicts with the cell of piece 2.
For the movement of piece k by offset d, it is valid if every target cell has value 0 or k:

valid L[k] d: ∀c ∈ L[k] ⇒ y = ⌊c/4⌋ + ⌊d/4⌋, x = (c mod 4) + (d mod 4),
              (0, 0) ≤ (y, x) ≤ (4, 3), M[y][x] ∈ {k, 0}        (14.47)
We may return to a previous layout after a series of slides, and it's insufficient to only
avoid duplicated matrices: although M1 ≠ M2 below, they are essentially the same layout.
M1 = 1 10 10 2        M2 = 2 10 10 1
     1 10 10 2             2 10 10 1
     3  4  4 5             3  4  4 5
     3  7  8 5             3  7  6 5
     6  0  0 9             8  0  0 9
We need to avoid duplicated layouts. Treating all the pieces of the same size as
identical, we define the normalized layout as ∥L∥ = {p | (k ↦ p) ∈ L}, the set of all cell
sets in L. Both matrices above have the same normalized layout {{1, 2, 5, 6}, {0, 4},
{3, 7}, {8, 12}, {9, 10}, {11, 15}, {16}, {13}, {14}, {19}}. We also need to avoid
mirrored layouts, for example:
M1 = 10 10 1 2        M2 = 3 1 10 10
     10 10 1 2             3 1 10 10
      3  5 4 4             4 4  2  5
      3  5 8 9             7 6  2  5
      6  7 0 0             0 0  9  8
solve ∅ H = [ ]
solve Q H = L[10] = t :   reverse ms                                       (14.49)
            otherwise :   solve (pushAll cs Q′) H′
  where ((L, ms), Q′) = pop Q
Function move slides piece L[k] by d: move L (k, d) = map (+d) L[k]. unique checks that
the normalized layout ∥L′∥ ∉ H and that its mirror(∥L′∥) ∉ H, and adds them to H′ if new.
Below is the iterative implementation. The solution has 116 steps (1 cell per step); the
last 3 are:
1: function Solve(s, e)
2: H ← {ksk}
3: Q ← {(s, ∅)}
4: while Q 6= ∅ do
5: (L, p) ← Pop(Q)
6: if L[10] = e then
7: return (L, p)
8: else
9: for each L0 in Expand(L, H) do
10: Push(Q, (L0 , L))
11: Add(H, kL0 k)
12: return ∅
The cross river puzzle, the water jugs puzzle, and the Klotski puzzle share another common
solution structure. Similar to DFS, they have start and end states: the cross river puzzle
starts with all the things on one side and the other side empty, and ends with everything
on the other side; the water jugs puzzle starts with two empty jugs, and ends with either
jug holding g litres of water; the Klotski puzzle starts with a given layout, and ends
with some layout where the big block arrives at the bottom slot. Every puzzle has a set of
rules that transfer one state to another. We try all the options 'in parallel', and don't
search further until all the options of the same step have been tried. This strategy
ensures that we find the solution with the fewest steps before any other. Because we
expand horizontally (as in fig. 14.31), it is called breadth-first search (BFS).
Figure 14.31: The BFS visit order: 1; then 2, 3, 4; then 5, 6, 7, ... level by level.
Because we can't really search in parallel, we realize BFS with a queue: repeatedly
de-queue the candidate with fewer steps from the head, and en-queue the new candidates
with more steps to the tail. BFS provides a simple method to find the solution with the
fewest steps. However, it can't directly search for the generic optimal solution. Consider
the directed graph in fig. 14.32, where the lengths of the sections vary: we can't use BFS
to find the shortest path between two nodes. For example, the shortest path from a to c is
not the one with the fewest steps, a → b → c (total length 22), but the path with more
steps, a → e → f → c (total length 20).
Figure 14.32: A weighted directed graph, in which the shortest path is not the one with
the fewest steps.
Exercise 14.8
14.8.1. John Conway9 gives a slide tile puzzle. Figure 14.33 is a simplified example.
There are 8 cells, 7 of which are occupied. Label the pieces from 1 to 7. Each piece can
slide to the connected free cell (two cells are connected if there is a line between
them). How can we reverse the pieces from 1, 2, 3, 4, 5, 6, 7 to 7, 6, 5, 4, 3, 2, 1 by
sliding? Write a program to solve this puzzle.

9 John Conway (1937 - 2020), British mathematician.
Figure 14.33: The simplified slide tile puzzle, with pieces 1-7 around a free cell.
Huffman coding
Huffman coding encodes information with the shortest length. The ASCII code needs 7 bits
per character to encode characters, digits, and symbols; it can represent 2^7 = 128
symbols. We need at least log2(n) bits of 0/1 to distinguish n symbols. The table below
encodes the upper case English letters, mapping A to Z to 0 to 25, each with 5 bits; zero
is padded as 00000, not 0. Such a scheme is called fixed-length coding. For example, it
encodes 'INTERNATIONAL' as a binary number of 65 bits:
00010101101100100100100011011000000110010001001110101100000011010
with 101 (decoded as 'BF'), or 110 followed by 1 (decoded as 'GB'), or 1101 (decoded as
'N'). The Morse code is variable-length: it encodes the most used letter 'E' as '.', and
'Z' as '- -..'. It needs a special pause separator to mark the end of a code and eliminate
the ambiguity. The code table below is ambiguity free; it encodes 'INTERNATIONAL' with
only 38 bits:

10101100111000101110100101000011101111
It is ambiguity free because no code is the prefix of another; such a code is called a
prefix code (not a 'non-prefix code'). Since a prefix code needs no separator, we can
further shorten the code length. Given a text, can we find a prefix-code scheme that
produces the shortest code? In 1951, Robert M. Fano told his class that whoever solved
this problem needn't take the final exam. Huffman, then a student at MIT [91], almost gave
up and started preparing for the final exam when he found the answer. Huffman creates the
coding table according to the frequency of each symbol in the text: the more used a
symbol, the shorter its code. He processes the text, counts the occurrences of each
symbol, and defines the weight as the frequency. Huffman uses a binary tree to generate
the prefix code: the symbols are stored in the leaf nodes, and traversing from the root
generates the code, adding 0 when going left and 1 when going right, as shown in fig.
14.34. For example, starting from the root, going left then right, we arrive at 'N';
therefore 'N' is encoded as '01', while the path to 'A' is right → right → left, encoded
as '110'.
Figure 14.34: The Huffman tree of weight 13 built for 'INTERNATIONAL', with the leaves
N,3; A,2; O,1; R,1; T,2; I,2; E,1; L,1.
We can use the tree to decode as well: scan the binary bits, going left for 0 and right
for 1; upon arriving at a leaf, output its symbol, then restart from the root to continue
scanning. Huffman builds the tree bottom-up. At the start, wrap all the symbols in leaves.
Every time, pick the two nodes with the minimum weights and merge them into a branch node
of weight w = w1 + w2, the sum of the two. Repeatedly pick and merge the two smallest
weighted trees till we get the final tree, as shown in fig. 14.35. We reuse the binary
tree definition for the Huffman tree, augmented with the weight, and only hold the symbol
in the leaf node. Let the branch node be (w, l, r), where w is the weight, l
Figure 14.35: Repeatedly merging the two minimum weighted trees, from the leaves E,1; L,1;
O,1; R,1; T,2; I,2; A,2; N,3 up to the final tree of weight 13.
and r are the left and right sub-trees; let the leaf be (w, c), where c is the symbol.
When merging trees, we sum the weights: merge a b = (weight a + weight b, a, b), where:

weight (w, c) = w
weight (w, l, r) = w        (14.51)
The function below repeatedly picks and merges the two minimum weighted trees:

build [t] = t
build ts = build ((merge t1 t2) : ts′), where (t1, t2, ts′) = extract ts        (14.52)
Function extract picks the two trees with the minimal weights (define t1 < t2 if
weight t1 < weight t2). It can be implemented with a linear scan, as sketched below.
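A runnable sketch of build with a linear-scan extract (assuming this standard form; the
names follow the equations above):

data HTree = Leaf Int Char | Branch Int HTree HTree

weight :: HTree -> Int
weight (Leaf w _) = w
weight (Branch w _ _) = w

merge :: HTree -> HTree -> HTree
merge a b = Branch (weight a + weight b) a b

build :: [HTree] -> HTree
build [t] = t
build ts = build (merge t1 t2 : ts')
  where (t1, t2, ts') = extract ts

-- keep the two smallest trees seen so far while scanning the rest
extract :: [HTree] -> (HTree, HTree, [HTree])
extract (x:y:ts) = go (s0, b0, []) ts
  where
    (s0, b0) = if weight x <= weight y then (x, y) else (y, x)
    go (s, b, acc) [] = (s, b, acc)
    go (s, b, acc) (t:rest)
      | weight t < weight s = go (t, s, b : acc) rest
      | weight t < weight b = go (s, t, b : acc) rest
      | otherwise           = go (s, b, t : acc) rest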
To build the Huffman tree iteratively, we store the n sub-trees in an array A, and scan A
from right to left. If the weight of A[i] is less than that of A[n − 1] or A[n], we swap
A[i] with Max(A[n − 1], A[n]). We merge A[n] and A[n − 1] after the scan, and shrink the
array by one. Repeat this to build the Huffman tree:
1: function Huffman(A)
2: while |A| > 1 do
3: n ← |A|
4: for i ← n − 2 down to 1 do
5: T ← Max(A[n], A[n − 1])
6: if A[i] < T then
7: Exchange A[i] ↔ T
When encoding, we scan the text w while looking up the code table dict to generate the
binary bits:
Conversely, when decoding, we scan the binary bits bs while looking up the tree: start
from the root, go left for 0 and right for 1; output the symbol c upon arriving at a leaf;
then reset to the root and continue. decode T bs = lookup T bs, where:
Huffman tree building reflects a special strategy: always pick and merge the two trees
with the minimal weights. This series of locally optimal choices generates a globally
optimal prefix code. Locally optimal sub-solutions do not necessarily lead to the globally
optimal solution in general; Huffman coding is an exception. We call the strategy of
always choosing the locally optimal option the greedy strategy. The greedy method
simplifies things and works for many problems; however, it's not easy to tell whether it
generates the global optimum. The generic formal proof is still an active research area [4].
Exercise 14.9
14.9.1. Implement the imperative Huffman code table algorithm.
change 0 = [ ]
change x = cm : change (x − cm), where cm = max {c | c ∈ C, c ≤ x}        (14.58)
For example, to change money of value 142, this function outputs the coin list [100, 25,
5, 5, 5, 1, 1]. We can convert it to [(100, 1), (25, 1), (5, 3), (1, 2)], meaning 1 coin
of 100, 1 coin of 25, 3 coins of 5, and 2 coins of 1. For this coin system C, the greedy
method finds the optimal solution; actually, it is applicable to most coin systems in the
world. There are exceptions, for example C = {1, 3, 4}: to change x = 6, the optimal
solution is 2 coins of 3, while the greedy method gives 6 = 4 + 1 + 1, 3 coins in total.
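As a quick check of both behaviors, here is a sketch of eq. (14.58) with the coin system passed as a plain list sorted in descending order (an assumption; the appendix version uses a set):

change :: Int → [Int] → [Int]
change 0 _ = []
change x cs = c : change (x - c) cs where (c:_) = dropWhile (> x) cs

Then change 142 [100, 50, 25, 5, 1] gives [100, 25, 5, 5, 5, 1, 1], while change 6 [4, 3, 1] gives [4, 1, 1]: three coins instead of the optimal two.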
Although it’s not the optimal solution, the greedy method often gives a simplified
sub-optimal implementation. The result is often good enough in practice. For example,
the word-wrap is a common functionality in editors. If the length of the text T exceeds
the page width w, we need break it into lines. Let the space between words be s, below
greedy implementation gives the approximate optimal solution: put as many words as
possible in a line.
1: L ← W
2: for w ∈ T do
3: if |w| + s > L then
4: Insert line break
5: L ← W − |w|
6: else
7: L ← L − |w| − s
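The same greedy rule can be sketched functionally. We assume every word's width is its character count and the space width s is 1; wordWrap and its helpers are names made up for this example:

wordWrap :: Int → [String] → [[String]]
wordWrap w = foldl put [] where
  put [] x = [[x]]
  put lns x | width (last lns) + 1 + length x ≤ w = init lns ++ [last lns ++ [x]]
            | otherwise = lns ++ [[x]]
  width ws = sum (map length ws) + length ws - 1   -- words plus the spaces between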
Exercise 14.10
14.10.1. Use heap to build the Huffman tree: take two trees from the top, merge then add
back to the heap.
14.10.2. If we sort the symbols by their weight into A, there is a linear time algorithm to
build the Huffman tree: use a queue Q to store the merge results; repeatedly take the
minimal weighted trees from Q and the head of A, merge the two smallest, and add the
result to the queue. After processing all trees in A, the single tree left in Q is the
Huffman tree. Implement this algorithm.
14.10.3. Given a Huffman tree T , implement the decode algorithm with fold left.
For such coin systems, we can find the optimal solution by exhaustively examining every
option:

change 0 = [ ]
change x = min [c : change (x − c) | c ∈ C, c ≤ x]        (14.59)
Where min picks the shortest list. However, this definition is impractical, as there are
too many duplicated computations. For example, with C = {1, 2, 25, 50, 100}, computing
change(142) needs the further computation of change(141), change(137), change(117),
change(92), and change(42). For change(141), subtracting 1, 2, 25, 50, 100 in turn, we go
back to 137, 117, 92, 42. The search domain expands as 5^n. Recall the idea used to
generate Fibonacci numbers: we can use a table T to record the optimal solutions to the
sub-problems. T starts from empty. When changing money y, we look up T[y] first; if
T[y] = ∅, we recursively compute the sub-problem, and save the sub-solution in T[y].
1: T ← [[ ], ∅, ∅, ...] . T [0] = [ ]
2: function Change(x)
3: if x > 0 and T [x] = ∅ then
4: for each c in C and c ≤ x do
5: Cm ← c : Change(x − c)
6: if T [x] = ∅ or |Cm | < |T [x]| then
7: T [x] ← Cm
8: return T [x]
We can also bottom-up generate the optimal solutions for each sub-problem. From T[0] = [ ],
we generate T[1] = [1], T[2] = [1, 1], T[3] = [1, 1, 1], T[4] = [1, 1, 1, 1], as shown in
table 14.1(a). There are two options for T[5]: 5 coins of 1, or a single coin of 5; the
latter needs fewer coins. We update the optimal table to table 14.1(b): T[5] = [5]. Next,
change money x = 6. Both 1 and 5 are less than 6, so there are two options: (1) 1 + T[5]
gives [1, 5]; (2) 5 + T[1] gives [5, 1]. They are equivalent, and we pick either:
T[6] = [1, 5]. In general, for each T[i] with i ≤ x, we check every coin value c ≤ i,
look up T[i − c] for the sub-problem, and prepend c to get a new solution. We choose the
one with the fewest coins as T[i].
x 0 1 2 3 4
optimal solution [ ] [1] [1, 1] [1, 1, 1] [1, 1, 1, 1]
(a) Optimal solution for x ≤ 4
x 0 1 2 3 4 5
optimal solution [ ] [1] [1, 1] [1, 1, 1] [1, 1, 1, 1] [5]
(b) Optimal solution for x ≤ 5
1: function Change(x)
2: T ← [[ ], ∅, ...]
3: for i ← 1 to x do
4: for each c in C and c ≤ i do
5: if T [i] = ∅ or 1 + |T [i − c]| < |T [i]| then
6: T [i] ← c : T [i − c]
7: return T [x]
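A direct translation of this bottom-up table into Haskell uses a lazily self-referential list as T. This sketch assumes 1 ∈ C, so every amount has a solution; list indexing makes it O(kn²), while an array would restore Θ(nk):

import Data.List (minimumBy)
import Data.Ord (comparing)

change :: Int → [Int] → [Int]
change x cs = t !! x where
  t = [] : [minimumBy (comparing length) [c : t !! (i - c) | c ← cs, c ≤ i]
           | i ← [1..x]]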
There is much duplicated content in the optimal solution table: a solution contains its
sub-solutions. We can record only the incremental part: the coin c chosen for T[i], and
the number n of coins, i.e., T[i] = (n, c). To generate the list of coins for x, we look
up T[x] to get c, then look up T[x − c] to get c′, and so on, repeating until T[0].
value 6 7 8 9 10 ...
optimal solution [1, 5] [1, 1, 5] [1, 1, 1, 5] [1, 1, 1, 1, 5] [5, 5] ...
1: function Change(x)
2: T ← [(0, ∅), (∞, ∅), (∞, ∅), ...]
3: for i ← 1 to x do
4: for each c in C and c ≤ i do
5: (n, _) ← T [i − c], (m, _) ← T [i]
6: if 1 + n < m then
7: T [i] ← (1 + n, c)
8: s←[]
9: while x > 0 do
10: (_, c) ← T [x]
11: s←c:s
12: x←x−c
13: return s
We can build the optimal solution table T with a left fold: foldl fill [(0, 0)] [1, 2, ..., x],
where:

fill T x = T ▷ min {(1 + fst T[x − c], c) | c ∈ C, c ≤ x}        (14.60)

Where s ▷ a appends a to the right of s (see finger tree in section 12.5). Then we rebuild
the optimal solution backwards from T:
change 0 T = [ ]
change x T = c : change (x − c) T,  where c = snd T[x]        (14.61)
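Eqs. (14.60) and (14.61) can be sketched with a plain list as the table, built here with a list comprehension instead of the fold (again assuming 1 ∈ C; minimum on pairs compares the coin count first):

change :: Int → [Int] → [Int]
change x cs = rebuild x where
  t = (0, 0) : [minimum [(1 + fst (t !! (i - c)), c) | c ← cs, c ≤ i] | i ← [1..x]]
  rebuild 0 = []
  rebuild i = c : rebuild (i - c) where c = snd (t !! i)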
For x = n, we loop n times and check at most k = |C| coins each time. The performance is
bound to Θ(nk)10, and we need O(n) space to persist T in both the top-down and bottom-up
implementations. The solution to a sub-problem is used many times when computing the
global optimal solution; we call this overlapping sub-problems. Richard Bellman developed
dynamic programming in the 1940s. It has two properties.
1. Optimal sub-structure. The problem can be broken down into small problems. The
optimal solution can be constructed from the solutions of these sub-problems;
2. Overlapping sub-problems. The solution of the sub-problem is reused multiple times
to find the overall solution.
Consider the longest common sub-sequence (LCS) problem, for example between ‘Mississippi’
and ‘Missunderstanding’:
LCS([ ], ys) = [ ]
LCS(xs, [ ]) = [ ]
LCS(x:xs, y:ys) = { x = y : x : LCS(xs, ys)
                    otherwise : max (LCS(x:xs, ys)) (LCS(xs, y:ys)) }        (14.62)
10 upper bound
Where max picks the longer sequence. There is optimal sub-structure in the definition of
LCS: the problem can be broken down into sub-problems, and the sequence length reduces by
at least 1 every time. There are also overlapping sub-problems: the longest common
sub-sequences of the sub-strings are reused multiple times to find the global optimal
solution. We use a 2D table T to record the optimal solutions of the sub-problems. The
rows and columns correspond to ys and xs respectively, with the sequences indexed from 0;
row 0 and column 0 represent the empty sequence, and T[i][j] is the length of
LCS(xs[0..j], ys[0..i]). We finally build the longest common sub-sequence from T. Because
LCS([ ], ys) = LCS(xs, [ ]) = [ ], row 0 and column 0 are all 0s. Consider ‘antenna’ and
‘banana’ for example: we fill row 1 from T[1][1]. ‘b’ differs from every character in
‘antenna’, hence row 1 is all 0s. For T[2][1], the row and column both correspond to ‘a’,
so T[2][1] = T[1][0] + 1 = 1, i.e., LCS(a, ba) = a. Next, move to T[2][2]: ‘a’ ≠ ‘n’, so
we choose the greater of the above (LCS(an, b)) and the left (LCS(a, ba)) as T[2][2],
which equals 1, i.e., LCS(ba, an) = a. In this way, we fill out the table step by step.
The rule is: for T[i][j], if xs[j − 1] = ys[i − 1], then T[i][j] = T[i − 1][j − 1] + 1;
otherwise, pick the greater of the above T[i − 1][j] and the left T[i][j − 1].
0 1 2 3 4 5 6 7
[] a n t e n n a
0 [] 0 0 0 0 0 0 0 0
1 b 0 0 0 0 0 0 0 0
2 a 0 1 1 1 1 1 1 1
3 n 0 1 2 2 2 2 2 2
4 a 0 1 2 2 2 2 2 3
5 n 0 1 2 2 2 3 3 3
6 a 0 1 2 2 2 3 3 4
From the filled table, we can build the longest common sub-sequence backwards, starting
from the bottom-right entry:
1: function Get(T, xs, ys)
2: m ← |ys|, n ← |xs|
3: r ← [ ]
4: while m > 0 and n > 0 do
5: if xs[n] = ys[m] then
6: r ← xs[n] : r
7: m ← m − 1
8: n ← n − 1
9: else if T[m − 1][n] > T[m][n − 1] then
10: m ← m − 1
11: else
12: n ← n − 1
13: return r
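The table can also be built row by row in Haskell. The sketch below stores the sub-sequence itself in each cell, so no separate back-tracking pass is needed (the names are ours, not from the text):

lcs :: Eq a ⇒ [a] → [a] → [a]
lcs xs ys = last (foldl nextRow row0 xs) where
  row0 = replicate (length ys + 1) []      -- row 0: all empty
  nextRow row x = row' where
    row' = [] : zipWith3 pick ys (zip row (tail row)) row'
    pick y (nw, n) w | x == y = nw ++ [y]  -- diagonal cell plus the matched symbol
                     | length n ≥ length w = n
                     | otherwise = w

For example, lcs "antenna" "banana" gives "anna".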
Exercise 14.11
14.11.1. For the longest common sub-sequence, build the optimal solution table with fold.
Subset sum
Given a set X of integers, how do we find all the subsets S ⊆ X such that the sum of the
elements in S is s, i.e., ΣS = s? For example, for X = {11, 64, -82, -68, 86, 55, -88,
-21, 51}, there are three subsets with sum s = 0: S = ∅, {64, -82, 55, -88, 51}, and
{64, -82, -68, 86}. A brute-force solution exhausts all 2^n subset sums, where n = |X|;
the performance is O(n 2^n).
sets s ∅ = [∅]
sets s (x:xs) = { s = x : {x} : sets s xs
                  otherwise : (sets s xs) ++ [x:S | S ∈ sets (s − x) xs] }        (14.63)
For the dynamic programming solution, the possible subset sums are bounded below and
above by:

l = Σ{x ∈ X, x < 0},  u = Σ{x ∈ X, x > 0}        (14.64)
When adding the element xi to fill row i, we already have all the subset sums from the
previous elements {x1, x2, ..., xi−1}, hence all the true entries in the previous row
remain true. Because Σ{xi} = xi, we set T[i][xi] = T (true). Adding xi to each previous
sum generates new sums, whose entries all become true. After adding all n elements, the
Boolean value of T[n][s] tells whether a subset with sum s exists.
1: function Subset-Sum(X, s)
2: l ← Σ{x ∈ X, x < 0}, u ← Σ{x ∈ X, x > 0}
3: n ← |X|
4: T ← {{F, F, ...}, {F, F, ...}, ...} . (n + 1) × (u − l + 1)
5: T[0][0] ← T . Σ∅ = 0
6: for i ← 1 to n do
7: T[i][X[i]] ← T
8: for j ← l to u do
9: T[i][j] ← T[i][j] ∨ T[i − 1][j]
10: j′ ← j − X[i]
11: if l ≤ j′ ≤ u then
12: T[i][j] ← T[i][j] ∨ T[i − 1][j′]
13: return T[n][s]
The column index j does not start from 0, but ranges from l to u; we can shift it by −l
in a programming environment. We next generate all the subsets S satisfying ΣS = s from
table T. If T[n][s] = F, there is no solution; otherwise, there are two cases: (1) if
xn = s, then the singleton set {xn} is a solution; we then look up T[n − 1][s], and if it
is true (T), recursively generate all subsets from {x1, x2, ..., xn−1} with sum s.
(2) Let s′ = s − xn. If l ≤ s′ ≤ u and T[n − 1][s′] is true, we recursively generate the
subsets from {x1, x2, ..., xn−1} with sum s′, then add xn to each of them.
1: function Get(X, s, T, n)
2: r ← [ ]
3: if X[n] = s then
4: r ← {X[n]} : r
5: if n > 1 then
6: if T[n − 1][s] then
7: r ← r ++ Get(X, s, T, n − 1)
8: s′ ← s − X[n]
9: if l ≤ s′ ≤ u and T[n − 1][s′] then
10: r ← r ++ [X[n]:r′ | r′ ∈ Get(X, s′, T, n − 1)]
11: return r
The dynamic programming method loops O(n(u − l + 1)) times to build table T, then
recursively generates the solution in O(n) levels. The 2D table needs O(n(u − l + 1))
space. We can replace it with a 1D vector V of u − l + 1 entries, where each
V[j] = {S1, S2, ...} stores the subsets that sum to j. V starts from all empty entries.
For each xi, we update V a round, adding the new sums obtained with xi. The final
solution is in V[s].
1: function Subset-Sum(X, s)
2: l ← Σ{x ∈ X, x < 0}, u ← Σ{x ∈ X, x > 0}
3: V ← [∅, ∅, ...] . u − l + 1 entries
4: for each x in X do
5: U ← Copy(V)
6: for j ← l to u do
7: if x = j then
8: U[j] ← {{x}} ∪ U[j]
9: j′ ← j − x
...
The update of each entry is defined as:

f V j = { j = x : V[j] ∪ {{x}}
          l ≤ j′ ≤ u and V[j′] ≠ ∅ : V[j] ∪ {{x} ∪ S | S ∈ V[j′]},  where j′ = j − x
          otherwise : V[j] }        (14.66)
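As a cross-check of eq. (14.66), the vector can be modeled with Data.Map keyed by the subset sum, which sidesteps the l..u index shifting (a sketch; like the problem itself it is exponential in the worst case, since the output can be):

import qualified Data.Map as Map

subsetSum :: Int → [Int] → [[Int]]
subsetSum s xs = Map.findWithDefault [] s (foldl add v0 xs) where
  v0 = Map.singleton 0 [[]]          -- the empty subset sums to 0
  add v x = Map.unionWith (++) v     -- keep the subsets without x, plus
              (Map.map (map (x:)) (Map.mapKeys (+ x) v))  -- each extended by x

For the example above, subsetSum 0 [11, 64, -82, -68, 86, 55, -88, -21, 51] returns the three subsets, including the empty one.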
Exercise 14.12
14.12.1. For the longest common sub-sequence problem, an alternative solution is to record
the length and the direction in the table. There are three directions: ‘N’ for north,
‘W’ for west, and ‘NW’. Given such a table, we can build the longest common
sub-sequence from the bottom-right entry. If the entry is ‘NW’, next go to the
upper-left entry; if it’s ‘N’, go to the above row; and go to the previous entry if
it’s ‘W’. Implement this solution.
14.12.2. For the subset sum upper/lower bounds, does l ≤ 0 ≤ u always hold? Can we reduce
the range between the bounds?
14.12.3. Given a list of non-negative integers, find the maximum sum composed of numbers
that are not adjacent to each other.
14.12.4. Edit distance (also known as Levenshtein edit distance) is defined as the cost of
converting one string s to another string t. It is widely used in spell-checking,
OCR correction, etc. There are three kinds of changes: insert, delete, and replace;
each operation mutates one character at a time. For example, the edit distance
from ‘kitten’ to ‘sitting’ is 3:
1. kitten → sitten (k 7→ s);
2. sitten → sittin (e 7→ i);
3. sittin → sitting (+ g).
Compute the edit distance with dynamic programming.
14.7 Appendix - example programs
bsearch f y (l, u) | u ≤ l = l
| f m ≤ y = if f (m + 1) ≤ y then bsearch f y (m + 1, u) else m
| otherwise = bsearch f y (l, m-1)
where m = (l + u) `div` 2
Boyer-Moore majority:
Optional<T> majority([T] xs) {
var (m, c) = (Optional<T>.Nothing, 0)
for var x in xs {
if c == 0 then (m, c) = (Optional.of(x), 0)
if x == m then c++ else c--
}
c = 0
for var x in xs {
if x == m then c++
}
return if c > length(xs)/2 then m else Optional<T>.Nothing
}
[Int] prefixes([T] p) {
m = length(p)
[Int] t = [0] ∗ m  // fallback table
Int k = 0
for i = 2 to m {
    while k > 0 and p[i-1] ≠ p[k] {
        k = t[k-1]  // fallback
    }
    if p[i-1] == p[k] then k = k + 1
t[i] = k
}
return t
}
leapLeft [] = []
leapLeft (0:y:1:ys) = 1:y:0:ys
leapLeft (y:ys) = y:leapLeft ys
hopLeft [] = []
hopLeft (0:1:ys) = 1:0:ys
hopLeft (y:ys) = y:hopLeft ys
leapRight [] = []
leapRight (-1:y:0:ys) = 0:y:(-1):ys
leapRight (y:ys) = y:leapRight ys
hopRight [] = []
hopRight (-1:0:ys) = 0:(-1):ys
hopRight (y:ys) = y:hopRight ys
[[Int]] moves([Int] s) {
[[Int]] ms = []
n = length(s)
p = find(s, 0)
if p < n - 2 and s[p+2] > 0 then ms += swap(s, p, p+2)
if p < n - 1 and s[p+1] > 0 then ms += swap(s, p, p+1)
if p > 1 and s[p-2] < 0 then ms += swap(s, p, p-2)
if p > 0 and s[p-1] < 0 then ms += swap(s, p, p-1)
return ms
}
valid (a, b) r = not $ or [a `elem` [3, 6], b `elem` [3, 6], (a, b) `elem` r]

moves (a, b) = if b < 8 then trans a b else map swap (trans b a) where
  trans x y = [(x - 8 - i, y + 8 + i) | i ← [0, 1, 2, 4], i == 0 || (x .&. i) ≠ 0]
solve a b g | g `mod` d ≠ 0 = []
| otherwise = solve' (x ∗ g `div` d)
where
(d, x, y) = extGcd a b
solve' x | x < 0 = solve' (x + b)
| otherwise = pour x [(0, 0)]
pour 0 ps = reverse ((0, g):ps)
pour x ps@((a', b'):_) | a' == 0 = pour (x - 1) ((a, b'):ps)
| b' == b = pour x ((a', 0):ps)
| otherwise = pour x ((max 0 (a' + b' - b),
min (a' + b') b):ps)
[Step] backtrack(Step s) {
[Step] seq
while s ≠ null {
seq = s : seq
s = s.parent
}
return seq
}
Klotski puzzle:
import qualified Data.Map as Map
import qualified Data.Set as Set
import qualified Data.Sequence as Queue
import Data.Sequence (Seq((:<|)), (><))
cellOf (y, x) = y ∗ 4 + x
posOf c = (c `div` 4, c `mod` 4)
end = cellSet [(3, 1), (3, 2), (4, 1), (4, 2)]
solve Queue.Empty _ = []
solve ((x, ms) :<| cs) visited | Map.lookup 10 x == Just end = reverse ms
| otherwise = solve q visited'
where
q = cs >< (Queue.fromList [(move x op, op:ms) | op ← ops ])
visited' = foldr Set.insert visited (map (normalize ◦ move x) ops)
Layout START = [{0, 4}, {3, 7}, {8, 12}, {9, 10},
{11, 15},{16},{13}, {14}, {19}, {1, 2, 5, 6}]
data Node {
Node parent
Layout layout
}
return Optional.None
}
var m = matrix(layout)
Bool valid(Set<Int> piece, Int d, Int i) {
for c in piece {
y, x = pos(c + d)
if m[y][x] not in [0, i] then return False
}
return True
}
[Layout] s = []
for i, p in zip([1, 2, ...], layout) {
for d in [-1, 1, -4, 4] {
if bound(p, d) and valid(p, d, i) {
ly = move(layout, i - 1, d)
if unique(ly) then s.append(ly)
}
}
}
return
}
Greedy change-making:
import qualified Data.Set as Set
import Data.List (group)
change 0 _ = []
change x cs = let c = Set.findMax $ Set.filter ( ≤ x) cs in c : change (x - c) cs
assoc = (map (λcs → (head cs, length cs))) ◦ group
}
return reverse(r)
}
We need to handle more cases for imperative delete than insert. To resume balance after
cutting a node off the red-black tree, we perform rotations and re-coloring. When
deleting a black node, rule 5 is violated, because the number of black nodes along the
path through that node reduces by one. We introduce a ‘doubly-black’ color to keep the
number of black nodes unchanged. Below example program adds ‘doubly black’ to the
color definition:
data Color {RED, BLACK, DOUBLY_BLACK}
When deleting a node, we re-use the binary search tree delete in the first step, then
further fix the balance if the node is black.
1: function Delete(T, x)
2: p ← Parent(x)
3: q ← NIL
4: if Left(x) = NIL then
5: q ← Right(x)
6: Replace(x, Right(x)) . replace x with its right sub-tree
7: else if Right(x) = NIL then
8: q ← Left(x)
9: Replace(x, Left(x)) . replace x with its left sub-tree
10: else
11: y ← Min(Right(x))
12: p ← Parent(y)
13: q ← Right(y)
14: Key(x) ← Key(y)
15: copy data from y to x
16: Replace(y, Right(y)) . replace y with its right sub-tree
17: x←y
18: if Color(x) = BLACK then
19: T ← Delete-Fix(T , Make-Black(p, q), q = NIL?)
20: release x
21: return T
Delete takes the root T and the node x to be deleted as the parameters. x can be
located through lookup. If x has an empty sub-tree, we cut off x, then replace it with
the other sub-tree q. Otherwise, we locate the minimum node y in the right sub-tree of
x, then replace x with y. We cut off y after that. If x is black, we call Make-Black(p,
q) to maintain the blackness before further fixing.
1: function Make-Black(p, q)
2: if p = NIL and q = NIL then
Figure 37: The doubly black node has a black sibling, and a red nephew. It can be fixed
with a rotation.
1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then . x is doubly black NIL
4: n←x
5: if x = NIL then . Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do . x is doubly black, but not the root
8: if Sibling(x) ≠ NIL then . The sibling is not empty
9: s ← Sibling(x)
10: ...
11: if s is black and Left(s) is red then
12: if x = Left(Parent(x)) then . x is the left
13: set x, Parent(x), and Left(s) all black
14: T ← Rotate-Right(T , s)
15: T ← Rotate-Left(T , Parent(x))
16: else . x is the right
17: set x, Parent(x), s, and Left(s) all black
18: T ← Rotate-Right(T , Parent(x))
19: else if s is black and Right(s) is red then
20: if x = Left(Parent(x)) then . x is the left
21: set x, Parent(x), s, and Right(s) all black
22: T ← Rotate-Left(T , Parent(x))
23: else . x is the right
24: set x, Parent(x), and Right(s) all black
25: T ← Rotate-Left(T , s)
26: T ← Rotate-Right(T , Parent(x))
27: ...
Case 2. The sibling of the doubly black node is red. We can rotate the tree to change the
doubly black node to black: as shown in fig. 38, change a or c to black. We can add this
fixing to the previous implementation.
1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then . x is doubly black NIL
4: n←x
5: if x = NIL then . Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do
8: if Sibling(x) ≠ NIL then
9: s ← Sibling(x)
10: if s is red then . The sibling is red
11: set Parent(x) red
12: set s black
13: if x = Left(Parent(x)) then . x is the left
14: T ← Rotate-Left(T, Parent(x))
15: else . x is the right
16: T ← Rotate-Right(T, Parent(x))
17: else if s is black and Left(s) is red then
18: ...
Case 3. The sibling of the doubly black node and its two sub-trees are all black. In this
case, we re-color the sibling to red, change the doubly black node back to black, then
move the doubly blackness up to the parent. As shown in fig. 39, there are two symmetric
sub-cases.
In all three cases above, the sibling of the doubly black node is not empty. Otherwise,
we change the doubly black node back to black and move the blackness up. When reaching
the root, we force the root to be black to complete the fixing; it also terminates if the
doubly blackness is eliminated by re-coloring midway. At last, if the doubly black node
passed in is empty, we turn it back to a normal NIL.
1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then . x is a doubly black NIL
4: n←x
5: if x = NIL then . Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do
8: if Sibling(x) ≠ NIL then . The sibling is not empty
9: s ← Sibling(x)
10: if s is red then . The sibling is red
11: set Parent(x) red
12: set s black
13: if x = Left(Parent(x)) then . x is the left
14: T ← Rotate-Left(T, Parent(x))
15: else . x is the right
16: T ← Rotate-Right(T, Parent(x))
17: else if s is black and Left(s) is red then
18: if x = Left(Parent(x)) then . x is the left
19: set x, Parent(x), and Left(s) all black
20: T ← Rotate-Right(T , s)
21: T ← Rotate-Left(T , Parent(x))
22: else . x is the right
23: set x, Parent(x), s, and Left(s) all black
24: T ← Rotate-Right(T , Parent(x))
if x.left == null {
db = x.right
x.replaceWith(db)
} else if x.right == null {
db = x.left
x.replaceWith(db)
} else {
var y = min(x.right)
parent = y.parent
db = y.right
x.key = y.key
y.replaceWith(db)
x = y
}
if x.color == Color.BLACK {
t = deleteFix(t, makeBlack(parent, db), db == null);
}
remove(x)
return t
}
Where makeBlack checks if the node changes to doubly black, and handles the special
case of doubly black NIL.
Node makeBlack(Node parent, Node x) {
    if parent == null and x == null then return null
    return if x == null
        then replace(parent, x, Node(Color.DOUBLY_BLACK))  // assumption: attach a doubly black NIL under parent
        else blacken(x)
}
The function blacken(node) changes the red node to black, and the black node to
doubly black:
Node blacken(Node x) {
x.color = if isRed(x) then Color.BLACK else Color.DOUBLY_BLACK
return x
}
db.color = Color.BLACK
p.color = Color.BLACK
s.right.color = p.color
t = leftRotate(t, s)
t = rightRotate(t, p)
}
} else if isBlack(s) and isBlack(s.left) and
isBlack(s.right) {
// the sibling and both sub-trees are black.
// move blackness up
db.color = Color.BLACK
s.color = Color.RED
blacken(p)
db = p
}
} else { // no sibling, move blackness up
db.color = Color.BLACK
blacken(p)
db = p
}
}
t.color = Color.BLACK
if (dbEmpty ≠ null) { // change the doubly black nil to nil
dbEmpty.replaceWith(null)
delete dbEmpty
}
return t
}
Where isBlack(x) tests if a node is black; the NIL node is also black.
Bool isBlack(Node x) = (x == null or x.color == Color.BLACK)
Before returning the final result, we check the doubly black NIL, and call the re-
placeWith function defined in Node.
data Node<T> {
//...
void replaceWith(Node y) = replace(parent, this, y)
}
The program terminates when it reaches the root or when the doubly blackness is
eliminated. As we keep the red-black tree balanced, the delete algorithm is bound to
O(lg n) time for a tree of n nodes.
AVL tree - proofs and the
delete algorithm
I Height increment
When inserting an element, the height increment can be deduced into 4 cases:

ΔH = |T′| − |T|
   = 1 + max(|r′|, |l′|) − (1 + max(|r|, |l|))
   = max(|r′|, |l′|) − max(|r|, |l|)
   = { δ ≥ 0, δ′ ≥ 0 : Δr
       δ ≤ 0, δ′ ≥ 0 : δ + Δr
       δ ≥ 0, δ′ ≤ 0 : Δl − δ
       otherwise : Δl }        (67)
Proof. On insert, the height cannot increase on both the left and the right. We can
explain the 4 cases from the balance factor definition, which is the difference between
the right and left sub-tree heights:
1. If δ ≥ 0 and δ′ ≥ 0, the height of the right sub-tree is not less than that of the
left sub-tree, both before and after the insertion. In this case, the height increment
is only ‘contributed’ from the right, which is Δr.
2. If δ ≤ 0, the height of the left sub-tree is not less than the right before insert.
Since δ′ ≥ 0 after insert, we know the height of the right sub-tree increases while the
left keeps the same (|l′| = |l|). The height increment is:
ΔH = |r′| − |l| = (|r| + Δr) − |l| = δ + Δr
3. If δ ≥ 0 and δ′ ≤ 0, symmetrically, the right sub-tree is not lower than the left
before, and the left is not lower than the right after insert (|r′| = |r|). The height
increment is:
ΔH = |l′| − |r| = (|l| + Δl) − (|l| + δ) = Δl − δ
4. Otherwise, δ and δ′ are both not bigger than zero, meaning the height of the left
sub-tree is not less than the right. The height increment is only ‘contributed’ from the
left, which is Δl.
The four cases are left-left, right-right, right-left, and left-right. Let the balance
factors before fixing be δ(x), δ(y), and δ(z); after fixing, they change to δ′(x), δ′(y),
and δ′(z) respectively. We next prove that δ′(y) = 0 after fixing in all 4 cases, and
give the results for δ′(x) and δ′(z).
Left-left
Summarizing the derivation, the balance factors change to the following in the left-left
case:
δ′(x) = δ(x)
δ′(y) = 0        (71)
δ′(z) = 0

Right-right
The right-right case is symmetric to left-left:

δ′(x) = 0
δ′(y) = 0        (72)
δ′(z) = δ(z)
Right-left
Consider δ′(x) after fixing. If δ(y) = 1, (eq. (75)) gives δ′(x) = −1; if δ(y) ≠ 1, then
max(|b|, |c|) = |b|, and taking this into (eq. (75)) gives δ′(x) = 0. Summarizing the two
cases, we obtain δ′(x) in terms of δ(y):

δ′(x) = { δ(y) = 1 : −1
          otherwise : 0 }        (79)
If δ(y) = |c| − |b| = −1, then max(|b|, |c|) = |b| = |c| + 1. Taking this into (eq. (80)),
we have δ′(z) = 1. If δ(y) ≠ −1, then max(|b|, |c|) = |c|, and we have δ′(z) = 0.
Combining the two cases, we obtain δ′(z) in terms of δ(y):

δ′(z) = { δ(y) = −1 : 1
          otherwise : 0 }        (81)
1. If δ(y) = 0, then |b| = |c|. According to (eq. (79)) and (eq. (81)), we have
δ′(x) = 0 ⇒ |a| = |b|, and δ′(z) = 0 ⇒ |c| = |d|. These lead to δ′(y) = 0.
All three cases lead to the same result δ′(y) = 0. Summarizing all the above, we get the
updated balance factors after fixing:

δ′(x) = { δ(y) = 1 : −1
          otherwise : 0 }
δ′(y) = 0        (83)
δ′(z) = { δ(y) = −1 : 1
          otherwise : 0 }
Left-right
Left-right is symmetric to the right-left case. With a similar method, we obtain the new
balance factors, identical to (eq. (83)).
II Functional delete
When deleting, we re-use the binary search tree delete in the first step, then check the
balance factors and perform fixing. The result is a pair (T′, ΔH), where T′ is the new
tree and ΔH is the height decrement. We define delete as below:
del ∅ k = (∅, 0)
del (l, k′, r, δ) k = { k < k′ : tree (del l k) k′ (r, 0) δ
                        k > k′ : tree (l, 0) k′ (del r k) δ
                        k = k′ : { l = ∅ : (r, −1)
                                   r = ∅ : (l, −1)
                                   else : tree (l, 0) k″ (del r k″) δ,
                                          where k″ = min(r) } }        (85)
If the tree is empty, the result is (∅, 0); otherwise, let the tree be T = (l, k′, r, δ).
We compare k and k′, then look up and delete recursively. When k = k′, we locate the node
to be deleted: if either sub-tree is empty, we cut the node off and replace it with the
other sub-tree; otherwise, we use the minimum k″ of the right sub-tree to replace k′, and
cut k″ off. We re-use the tree function and the ΔH result. In addition to the insert
cases, there are two cases that violate the AVL rule and need fixing. As shown in
fig. 41, both can be fixed by a tree rotation. We define them as pattern matching:
(Figure 41: the two delete-specific cases, (a) fix case A and (b) fix case B; each is
fixed by a single rotation. In case B, δ(x) = 2 changes to δ′(x) = 1 and δ′(y) = δ(y) − 1.)
...
balance ((a, x, b, δ(x)), y, c, −2) ΔH = ((a, x, (b, y, c, −1), δ(x) + 1), ΔH)
balance (a, x, (b, y, c, δ(y)), 2) ΔH = (((a, x, b, 1), y, c, δ(y) − 1), ΔH)        (86)
...
With these two additional cases, there are in total 7 cases in the balance implementation:
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2), dH) =
(Br (Br a x b dx) y (Br c z d 0) 0, dH-1)
balance (Br a x (Br b y (Br c z d dz) 1) 2, dH) =
(Br (Br a x b 0) y (Br c z d dz) 0, dH-1)
balance (Br (Br a x (Br b y c dy) 1) z d (-2), dH) =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2, dH) =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
−− Delete specific
balance (Br (Br a x b dx) y c (-2), dH) =
(Br a x (Br b y c (-1)) (dx+1), dH)
balance (Br a x (Br b y c dy) 2, dH) =
(Br (Br a x b 1) y c (dy-1), dH)
balance (t, d) = (t, d)
III Imperative delete
The imperative delete uses tree rotations for fixing. In the first step, we re-use the
binary search tree algorithm to delete the node x from tree T; in the second step, we
check the balance factors and perform rotations.
1: function Delete(T, x)
2: if x = NIL then
3: return T
4: p ← Parent(x)
5: if Left(x) = NIL then
6: y ← Right(x)
7: replace x with y
8: else if Right(x) = NIL then
9: y ← Left(x)
10: replace x with y
11: else
12: z ← Min(Right(x))
13: copy data from z to x
14: p ← Parent(z)
15: y ← Right(z)
1. |δ(p)| = 0, |δ′(p)| = 1. After delete, although one sub-tree's height decreases, the
parent still satisfies the AVL rule. The algorithm terminates as the tree is still
balanced;
2. |δ(p)| = 1, |δ′(p)| = 0. Before the delete, the height difference between the two
sub-trees was 1; after delete, the higher sub-tree shrinks by 1, and both sub-trees now
have the same height. As a result, the height of the parent also decreases by 1, and we
need to continue the bottom-up update along the parent references to the root;
3. |δ(p)| = 1, |δ′(p)| = 2. After delete, the tree violates the AVL height rule; we need
to rotate the tree to fix it.
For case 3, the implementation is similar to the insert fixing. We need to add the two
additional sub-cases shown in fig. 41.
1: function AVL-Delete-Fix(T, p, x)
2: while p ≠ NIL do
3: l ← Left(p), r ← Right(p)
4: δ ← δ(p), δ′ ← δ
5: if x = l then
6: δ′ ← δ′ + 1
7: else
8: δ′ ← δ′ − 1
9: if p is leaf then . l = r = NIL
10: δ′ ← 0
11: if |δ| = 1 ∧ |δ′| = 0 then
12: x ← p
13: p ← Parent(x)
14: else if |δ| = 0 ∧ |δ′| = 1 then
15: return T
16: else if |δ| = 1 ∧ |δ′| = 2 then
17: if δ′ = 2 then
18: if δ(r) = 1 then . Right-right
19: δ(p) ← 0
20: δ(r) ← 0
21: p←r
22: T ← Left-Rotate(T, p)
23: else if δ(r) = −1 then . Right-left
24: δy ← δ( Left(r) )
25: if δy = 1 then
26: δ(p) ← −1
27: else
28: δ(p) ← 0
29: δ( Left(r) ) ← 0
30: if δy = −1 then
31: δ(r) ← 1
32: else
33: δ(r) ← 0
34: else . Delete specific right-right
35: δ(p) ← 1
36: δ(r) ← δ(r) − 1
37: T ← Left-Rotate(T, p)
38: break . No further height change
39: else if δ′ = −2 then
40: ...
IV Example program
The main delete program:
Node del(Node t, Node x) {
if x == null then return t
Node y
var parent = x.parent
if x.left == null {
y = x.replaceWith(x.right)
} else if x.right == null {
y = x.replaceWith(x.left)
} else {
y = min(x.right)
x.key = y.key
parent = y.parent
x = y
y = y.replaceWith(y.right)
}
t = deleteFix(t, parent, y)
release(x)
return t
}
}
}
// height decreases, go on bottom-up update
x = parent
parent = x.parent
}
}
if parent == null then return x // delete the root
return t
}
Answers
Answer of exercise 1
1.1. For the free number puzzle: since all the numbers are non-negative, we can reuse the
sign as a flag. For every |x| < n (where n is the length), negate the number at
position |x|. Then scan to find the first positive number; its position is the answer.
Write a program to realize this solution.
Int minFree([Int] nums) {
var n = length(nums)
for Int i = 0 to n - 1 {
var k = abs(nums[i])
if k < n then nums[k] = -abs(nums[k])
}
for Int i = 0 to n - 1 {
if nums[i] > 0 then return i
}
return n
}
1.2. There are n numbers 1, 2, ..., n. After some processing, they are shuffled, and a
number x is altered to y. Suppose 1 ≤ y ≤ n, design a solution to find x and y in
linear time with constant space.
For example X = [3, 1, 3, 5, 4], the missing number x = 2, the duplicated one
y = 3. We give 4 methods: (1) divide and conquer; (2) pigeon hole sort; (3) sign
encoding; and (4) equations.
Divide and conquer: Partition the numbers with the middle point m = ⌊(1 + n)/2⌋: the left
as = [a ≤ m, a ← X], and the right bs = [b > m, b ← X]. If the length |as| < m, the
missing number is on the left. Let s = 1 + 2 + ... + m = m(m + 1)/2; then x = s − sum(as).
We can calculate the duplicated one on the right accordingly: let
s′ = (m + 1) + (m + 2) + ... + n = (n + m + 1)(n − m)/2, then y = sum(bs) − s′. If the
length |as| > m, the duplicated number is on the left; with the similar method, the
missing number is x = s′ − sum(bs), and the duplicated number is y = sum(as) − s.
Otherwise, if the length |as| = m, there are m numbers not greater than m, but we don't
know whether they are some permutation of 1, 2, ..., m. We can compute and compare
sum(as) with s: if equal, we drop all the numbers on the left and recursively find x and
y on the right; otherwise, we drop the right and recursively find on the left. In the
recursive calls, we replace the lower bound 1 with l. Because we halve the list every
time, the overall performance is O(n) according to the master theorem.
missDup xs = solve xs 1 (length xs) where
solve xs@(_:_:_) l u | k < m - l + 1 = (sl - sl', sr' - sr)
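The listing above breaks off; the sketch below completes the same search following the prose (the names are fresh, not the original's):

missDup :: [Int] → (Int, Int)
missDup xs = go xs 1 (length xs) where
  go xs l u | length as < k = (sl - sum as, sum bs - sr)   -- missing left, dup right
            | length as > k = (sr - sum bs, sum as - sl)   -- missing right, dup left
            | sum as == sl = go bs (m + 1) u               -- both in the right half
            | otherwise = go as l m                        -- both in the left half
    where
      m = (l + u) `div` 2
      k = m - l + 1
      as = [x | x ← xs, x ≤ m]
      bs = [x | x ← xs, x > m]
      sl = (l + m) ∗ k `div` 2                 -- l + (l+1) + ... + m
      sr = (m + 1 + u) ∗ (u - m) `div` 2       -- (m+1) + ... + u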
Pigeon hole sort. Since all the numbers are within the range from 1 to n, we can
pigeon-hole sort them. Scan from left to right: for every number x at position i, if x is
not at its home position, swap it with the number y at x's home. If x = y, we find the
duplicated number, and besides, position i reveals the missing number. Repeat this till x
settles or we meet the duplicated number. Because every number is swapped to its home
position at most once, the total performance is O(n).
(Int, Int) missDup([Int] xs) {
    Int miss = -1, dup = -1
    for Int i = 0 to length(xs) - 1 {
        while xs[i] ≠ i + 1 {
            Int j = xs[i] - 1     // home position of the value xs[i]
            if xs[j] == xs[i] {
                dup = xs[i]
                miss = i + 1
                break
            }
            (xs[i], xs[j]) = (xs[j], xs[i])
        }
    }
    return (miss, dup)
}
Sign encoding. Set up an array of n flags. For every number x, mark the x-th flag in the
array true. When we meet the duplicated number, the corresponding flag was already
marked. Let the duplicated number be d. We know s = 1 + 2 + ... + n = n(n + 1)/2, and the
sum s′ of all the numbers; the missing number is m = d + s − s′. However, this method
needs n additional flags. The existence of a number is a type of binary information
(yes/no); we can encode it as the positive/negative sign, hence re-using the space. For
every x, flip the number at position |x| to negative, where |x| is the absolute value. If
the number at some position is already negative, it's the duplicated one, and we can then
calculate the missing one.
(Int, Int) missDup([Int] xs) {
    Int miss = -1, dup = -1
    Int n = length(xs)
    Int s = sum(xs)        // the sum before flipping any sign
    for i = 0 to n - 1 {
        Int j = abs(xs[i]) - 1
        if xs[j] < 0 {
            dup = j + 1    // the value whose flag is already set
            miss = dup + n ∗ (n + 1) / 2 - s
            break
        }
        xs[j] = -abs(xs[j])
    }
    return (miss, dup)
}
Where m is the missing number, s is the sum from 1 to n, and s′ is the sum of all the
numbers. However, for a missing number and a duplicated number together, we can't solve
with only one equation:

Σ(x[i] − i) = d − m        (3)

Where the left hand side sums the difference between the i-th number and i. Can we figure
out a second equation? We can use squares: sum the difference between the square of the
i-th number and the square of i:

Σ(x[i]² − i²) = d² − m² = (d + m)(d − m)        (4)

Since d − m ≠ 0, we can divide eq. (4) by eq. (3) on both sides to get another equation:

Σ(x[i]² − i²) / Σ(x[i] − i) = d + m        (5)

Comparing eq. (3) and eq. (5), we have two equations with two unknowns, and can solve
them:

m = (Σ(x[i]² − i²)/Σ(x[i] − i) − Σ(x[i] − i)) / 2
d = (Σ(x[i]² − i²)/Σ(x[i] − i) + Σ(x[i] − i)) / 2
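The two solved equations translate directly to code; a sketch (the integer divisions are exact: d − m divides the sum of squared differences, and both halves are integers):

missDup :: [Int] → (Int, Int)
missDup xs = ((b - a) `div` 2, (a + b) `div` 2)   -- (missing, duplicated)
  where
    a = sum (zipWith (-) xs [1..])                           -- d - m
    b = sum (zipWith (λx i → x∗x - i∗i) xs [1..]) `div` a    -- d + m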
1.5.2. When do we need to update the tail reference? How does it affect the performance?
We need to update the tail reference when appending to or deleting from the tail, adding
to an empty list, deleting the element of a singleton list, and splitting the list. All
these operations take constant time except the splitting as = bs ++ cs, which needs
linear time to reach the tail of bs.
1.5.3. Handle the empty list and out of bound error for setAt.
setAt 0 x [ ] means x : [ ]. Let |xs| = n; we treat setAt n x xs the same as xs ++ [x].
Other out-of-bound cases raise an exception.
1.6.2. Implement insert for array. When insert at position i, all elements after i need
shift to the end.
[K] insert([K] xs, Int i, K x) {
append(xs, x)
for Int j = length(xs) - 1, j > i, j-- {
swap(xs[j], xs[j-1])
}
return xs
}
Or use foldr:
delAll x = foldr f [] where
f y ys = if x == y then ys else y : ys
1.7.2. Design the delete algorithm for array, all elements after the delete position need
shift to front.
[K] delAt(Int i, [K] xs) {
Int n = length(xs)
if 0 ≤ i < n {
while i < n - 1 {
xs[i] = xs[i + 1]
i++
}
popLast(xs)
}
return xs
}
safeDrop n [] = []
safeDrop n (x:xs) | n ≤ 0 = x:xs
| otherwise = safeDrop (n - 1) xs
pop(xs)
m--
}
return xs
}
foldr :: (A → B → B) → B → [A] → B
Where the first parameter f has the type A → B → B, and the initial value z has type B.
It folds a list of A and builds a result of B. How to define insertion-sort with foldl?
What is the type of foldl?
Method 3: use foldr, starting from [[ ]], adding each element to the front to build the suffixes:
tails = foldr f [[]] where
f x xss@(xs:_) = (x:xs) : xss
2. With start value: [m, m + 1, ..., n]. Change the lower limit from 1 to m:
iota m n = iota' [] n where
iota' ns n | n < m = ns
| otherwise = iota' (n : ns) (n - 1)
3. With step: [m, m + a, m + 2a, ..., m + ka], where k is the maximum integer
satisfying m + ka ≤ n.
iota m n a | m ≤ n = m : iota (m + a) n a
| otherwise = []
4. Infinite, without the upper limit: iota m = m : iota (m + 1). For example, generate the first 10 natural numbers: take 10 (iota 1).
5. Remove +1, to implement repeat:
repeat m = m : repeat m
Summarize above, we define the iterate function that covers all iota use cases:
iterate f x = x : iterate f (f x)
1.13.3. Define zip with fold (hint: define fold for two lists, foldr2 f z xs ys).
Define fold for two lists:

foldr2 f z [ ] ys = z
foldr2 f z xs [ ] = z        (1.75)
foldr2 f z (x:xs) (y:ys) = f x y (foldr2 f z xs ys)
1.13.5. Write a program to remove the duplicated elements in a list while maintain the
original order. For imperative implementation, the elements should be removed in-
place. What is the complexity? How to simplify it with additional data structure?
dedup [] = []
dedup (x : xs) = x : dedup (filter (x ≠) xs)

Because we filter for every element, the time is bound to O(n²). We can use a set
(see chapters 3, 4) to implement dedup in O(n lg n) time:

dedup = Set.toList ◦ Set.fromList
1.13.6. List can represent decimal non-negative integer. For example 1024 as list is 4 →
2 → 0 → 1. Generally, n = dm ...d2 d1 can be represented as d1 → d2 → ... → dm .
Given two numbers a, b in list form. Realize arithmetic operations such as add
and subtraction.
Wrap every digit of a decimal natural number in a list, with the lower digits in front
(the higher digit on the right). Convert n = (dm ... d2 d1)10 to the list
[d1, d2, ..., dm]. For example, 1024 is represented as [4, 2, 0, 1]. We can convert such
a list back to a number by foldr (λc d → 10d + c) 0. Conversely, the function below
converts a natural number to a list:

toList n | n < 10 = [n]
         | otherwise = (n `mod` 10) : toList (n `div` 10)
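1. Add. A sketch (ours, in the same style as minus below, and used by mul1 in the multiply case); the carry is pushed into the higher digits the same way borrowing is:

add as [] = as
add [] bs = bs
add (a:as) (b:bs) | s < 10 = s : add as bs
                  | otherwise = (s - 10) : add (add as [1]) bs
  where s = a + b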
2. Minus. as − 0 = as; otherwise, subtract the digits of as and bs one by one. If a digit
a < b, we need to borrow: the digit becomes 10 + a − b, and we subtract 1 from the
remaining higher digits: minus as [1].

minus as [] = as
minus as [0] = as
minus (a:as) (b:bs) | a < b = (10 + a - b) : minus (minus as [1]) bs
                    | otherwise = (a - b) : minus as bs
3. Multiply. For as × bs, we multiply as by every digit b in bs, then multiply the
accumulated result by 10 and add: cs′ = 10 × cs + (b × as). When computing b × as, if
b = 0 then it's 0; otherwise, multiply b with the first digit a to get d = ab mod 10, and
add the carry c = ⌊ab/10⌋ to the further result:

mul1 0 _ = []
mul1 b [] = []
mul1 b (a:as) = (b ∗ a `mod` 10) : add [b ∗ a `div` 10] (mul1 b as)
4. Divide (with remainder). First, define how to test zero and how to compare two
numbers. A number is zero if it's an empty list or all its digits are 0:

isZero = all (== 0)

To compare as and bs: 0 is less than any non-zero number; otherwise, compare from the
highest digit to the lowest (EQ: equal, LT: less than, GT: greater than):
cmp [] [] = EQ
cmp [] bs = if isZero bs then EQ else LT
cmp as [] = if isZero as then EQ else GT
cmp (a:as) (b:bs) = case cmp as bs of EQ → compare a b
r → r
Then we can define equal, and less than test with cmp:
eq as bs = EQ == cmp as bs
lt as bs = LT == cmp as bs
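With cmp in hand, division with remainder can be sketched by repeated subtraction; it is simple but slow (proper long division goes digit by digit). divModL is our name, and add is from the sketch in the add case:

divModL as bs | isZero bs = error "divide by zero"
              | lt as bs = ([], as)
              | otherwise = let (q, r) = divModL (minus as bs) bs
                            in (add q [1], r)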
1.13.7. In imperative settings, a circular linked-list is corrupted, that some node points
back to previous one, as shown in fig. 1.6. When traverse, it falls into infinite
loops. Design an algorithm to detect if a list is circular. On top of that, improve
it to find the node where loop starts (the node being pointed by two precedents).
Robert W. Floyd calls this the ‘tortoise and hare’ algorithm. We can further find the
node where the circle starts. As shown in fig. 1.7, let OA contain k nodes; the loop
starts at A, and the circle contains n nodes. When p arrives at A, the faster pointer q
(at doubled speed) arrives at B. From this point on, the two pointers loop around the
circle, with p behind q by k nodes. Viewed from the circle, it is equivalent to q
catching up with p, which is ahead by n − k nodes. It takes time:

t = (n − k)/(2v − v) = (n − k)/v
(Figure 1.7: the list starts at O, the loop starts at A with |OA| = k, and the pointers
meet at B, which is n − k nodes past A along the circle.)
When they meet, p has moved from A by the distance v · t = n − k, so p will arrive at A
again after moving forward k more nodes. At this time, if we reset q to the head O and
let it move one node at a time, then q will also arrive at A after k nodes, i.e., p and
q meet at A. Although we assumed k < n, the result also holds for k ≥ n.
Proof. p and q start from the head. When p arrives at A, q arrives at a place in the
circle past A by k mod n nodes. Equivalently, q needs to catch up with p, which is ahead
by n − (k mod n) nodes. It takes time:

t = (n − (k mod n))/(2v − v) = (n − (k mod n))/v

When they meet, p has moved from A by the distance v · t = n − (k mod n). If p further
moves forward k nodes, it arrives at position (k mod n) + n − (k mod n) = n, i.e.,
exactly at A again. Hence after p and q meet, if we reset q to the head O and let both
move one node at a time, then q will also arrive at A after k nodes, i.e., p and q meet
at A again.
We can implement a program to find A:
Optional<List<K>> findCycle(List<K> h) {
var p = h, q = h
while p ≠ null and q ≠ null {
p = p.next
q = q.next
if q == null then return Optional.Nothing
q = q.next
if p == q {
q = h
while p ≠ q {
p = p.next
q = q.next
}
return Optional.of(p)
}
}
return Optional.Nothing
}
[a1, a2, ..., ai−1, m, ai+1, ai+2, ..., an]. Let Il = I[1, i), Ir = I[i + 1, n], where
[l, r) includes l but excludes r; either can be empty [ ]. Of these three parts Il, m,
Ir: Il is the in-order traverse result of the left sub-tree, and Ir is the in-order
result of the right sub-tree. Let k = |Il| be the size of the left sub-tree. We can split
P[2, n] at k into two parts, Pl and Pr, where Pl contains the first k elements. We next
recursively rebuild the left sub-tree from (Pl, Il), and the right sub-tree from (Pr, Ir):

rebuild [ ] [ ] = ∅
rebuild (m:ps) I = (rebuild Pl Il, m, rebuild Pr Ir)

Where:

(Il, Ir) = splitWith m I
(Pl, Pr) = splitAt |Il| ps
2.1.3. For binary search tree, prove that the in-order traverse always gives ordered list.
Proof. Proof by contradiction. Suppose there exists a (finite sized) binary search tree
whose in-order result is not ordered; among all such trees, select the smallest one T.
First, T can't be ∅, as the in-order result would be [ ], which is ordered. Second, T
can't be a singleton (∅, k, ∅), as the in-order result [k] is ordered. Hence T must be a
branch node (l, k, r), with the in-order result toList l ++ [k] ++ toList r. Because T is
the smallest tree whose in-order result is not ordered, while l and r are smaller than T,
both toList l and toList r are ordered. According to the binary search tree definition,
every x ∈ l satisfies x < k, and every y ∈ r satisfies y > k. Hence the in-order result
toList l ++ [k] ++ toList r is ordered, which contradicts the assumption that the
in-order result of T is not ordered.
Therefore, for any binary search tree, the in-order result is ordered.
2.1.6. Define depth t with fold, to calculate the height of a binary tree.
2.2.2. Use Pred and Succ to write an iterator to traverse the binary search tree as a
generic container. What’s the time complexity to traverse a tree of n elements?
data TreeIterator<T> {
Node<T> node = null
T get() = node.key
Although we need to find the min/max of a sub-tree or back-track along the parent
reference, iterating the whole tree as a container takes linear time: during the
traverse, we visit every node a constant number of times (arriving and leaving), for
example:
for var it = TreeIterator(root), it.hasNext(), it = it.next() {
print(it.get())
}
2.3.3. How to find the two nodes with the greatest distance in a binary tree?
We first find the maximum distance m, then give the longest path [s, a, b, ..., e]. The
two ends s and e are the two nodes in question. To define the distance between two
nodes, let the connected path (without direction) be s → n1 → n2 → ... → nm →
e, every edge has the length of 1, then the total length from s to e is the distance
between them, which is m + 1. Define the maximum distance of the empty tree
as 0. For singleton leaf (∅, k, ∅), as the longest path is [k], the maximum distance
is also 0 (from k to k). Consider the branch node (l, k, r), the maximum distance
must be one of the three: (1) from the deepest node on the left to the root, then
to the deepest node on the right: depth l + depth r; (2) the maximum distance of
the left sub-tree l; (3) the maximum distance of the right sub-tree r.
maxDistance Empty = 0
maxDistance (Node Empty _ Empty) = 0
maxDistance (Node l _ r) = maximum [depth l + depth r,
maxDistance l, maxDistance r]
Where the definition of depth is in Exercise 2.1.6. We can adjust it to find the
longest path. For the empty tree, the longest path is [ ], for singleton leaf, the
longest path is [k], for branch node (l, k, r), the longest path is the maximum of
the three: (1) The reverse of the path from the root to the deepest node on the left,
and k, and the path from the root the deepest node on the right; (2) the longest
path of the left; (3) the longest path of the right.
maxPath Empty = []
maxPath (Node Empty k Empty) = [k]
maxPath (Node l k r) = longest [reverse (depthPath l) ++ k : depthPath r,
                                maxPath l, maxPath r] where
  longest = maximumBy (compare `on` length)
  depthPath = foldt id (λ xs k ys → k : longest [xs, ys]) []
This implementation traverses the tree to calculate the depth, then traverses the left
and right sub-trees another two rounds. To avoid the duplication, we can bottom-up fill
the depth d and the maximum distance m in each node. This can be done with a tree map
Tree A ↦ Tree (Int, Int) in one traverse:
maxDist = extract ◦ mapTr where
extract Empty = 0
extract (Node _ (_, m) _) = m
mapTr Empty = Empty
mapTr (Node l _ r) = f (mapTr l) (mapTr r)
f l r = Node l (1 + max d1 d2, maximum [d1 + d2, m1, m2]) r where
(d1, m1) = pairof l
(d2, m2) = pairof r
pairof Empty = (0, 0)
pairof (Node _ k _) = k
3.1.2. Define the insert function, and call it from the sort algorithm.
Void insert([T] xs, T x) {
append(xs, x)
Int i = length(xs) - 1
while i > 0 and xs[i] < xs[i-1] {
swap(xs[i], xs[i-1])
i--
}
}
Because the address entries are in lexicographic order, the two tasks generate two
unbalanced trees. The head:
Th = ((...(∅, k1, ∅), ...), km, ∅),
and the tail:
Tt = (∅, km+1, (∅, km+2, ...(∅, kn, ∅))...),
as shown in fig. 4.2(c). With multiple slices, each one generates a tree like Th; they
then combine into a big unbalanced tree. Figure 4.2(b) builds a zig-zag tree from
elements in interleaved order; each node has one empty sub-tree.
n ≥ 2^(h/2) − 1 ⇒ 2^(h/2) ≤ n + 1

Take the logarithm on both sides: h/2 ≤ lg(n + 1), i.e., h ≤ 2 lg(n + 1).
Node<T> makeBlack(Node<T> t) {
t.color = Color.BLACK
return t
Node<T> ins(Node<T> t, T x) {
    if t == null then return Node(null, x, null, Color.RED)  // a new node is red
    return if x < t.key
        then balance(t.color, ins(t.left, x), t.key, t.right)
        else balance(t.color, t.left, t.key, ins(t.right, x))
}
If more than half of the nodes are inactive, we convert the tree to a list, filter the
active nodes, and rebuild the tree:

rebuild t = { size t < (cap t)/2 : (fromList ◦ toList) t
              otherwise : t }

Where toList traverses the tree and skips the deleted nodes:

toList ∅ = [ ]
toList (c, l, (k, a), r) = { a : toList l ++ [k] ++ toList r
                             otherwise : toList l ++ toList r }
To avoid traversing the entire tree every time when counting the nodes, we save the tree
capacity and size in each node. Extend the type of the tree to Tree (K, Bool, Int, Int),
and define node as:

node ∅ = ∅
node c l (k, a, _, _) r = (c, l, (k, a, sz, ca), r)

Where:

ca = 1 + cap l + cap r

Functions size and cap access the stored size and capacity:

size ∅ = 0
size (_, (_, _, sz, _), _) = sz
cap ∅ = 0
cap (_, (_, _, _, ca), _) = ca
active (Elem _ a _ _) = a
getElem (Elem x _ _ _) = x
size Empty = 0
size (Node _ _ (Elem _ _ sz _) _) = sz
cap Empty = 0
cap (Node _ _ (Elem _ _ _ ca) _) = ca
toList Empty = []
toList (Node _ l e r) | active e = toList l ++ [getElem e] ++ toList r
                      | otherwise = toList l ++ toList r
6.2.2. Implement the pre-order traverse for both integer trie and integer tree. Only
output the keys when the nodes store values. What pattern does the result follow?
We first convert an integer trie to an assoc-list. The pre-order is the recursive
‘middle-left-right’ order. Traversing the empty tree gives [ ]. For a branch node
(l, m, r), let the recursive pre-order lists of the left and right sub-trees be as and bs
respectively. For the middle Maybe value m: if it is Nothing, the result is as ++ bs; if
it is Just v, the result is (k, v) : as ++ bs, where k is the corresponding binary
integer (little endian: 0 for left, 1 for right).
toList = go 0 1 where
go _ _ Empty = []
go k n (Branch l m r) = case m of
Nothing → as ++ bs
(Just v) → (k, v) : as ++ bs
where
as = go k (2 ∗ n) l
bs = go (n + k) (2 ∗ n) r
Further, we can define generic pre-order fold for integer trie. Different from the
f old in chapter 2, the keys are computed while folding.
foldpre f z = go 0 1 z where
go _ _ z Empty = z
go k n z (Branch l m r) = f k m (go k (2 ∗ n) (go (n + k) (2 ∗ n) z r) l)
It’s more straightforward to implement the pre-order traverse for integer prefix
tree than trie. We needn’t compute keys. Below is the generic fold in pre-order:
foldpre _ z Empty = z
foldpre f z (Leaf k v) = f k v z
foldpre f z (Branch p m l r) = foldpre f (foldpre f z r) l
We can convert an integer tree to an assoc-list and get the keys with fold:
toList = foldpre (λk v xs → (k, v):xs) []
keys = fst ◦ unzip ◦ toList
When we populate the keys of a tree, their binary bits are in ascending order, for both
the little endian binary trie and the big endian integer prefix tree. To verify it, we
define a function bitsLE that converts an integer to its list of bits in little endian,
then use it to check the key ordering of the trie:
verify kvs = sorted $ map bitsLE $ keys $ fromList kvs where
sorted [] = True
sorted xs = and $ zipWith ( ≤ ) xs (tail xs)
bitsLE 0 = []
bitsLE n = (n `mod` 2) : bitsLE (n `div` 2)
Where kvs is a list of random key-value pairs. The corresponding verification for
big endian integer prefix tree is as below:
verify kvs = sorted $ keys $ fromList kvs
Where first f (a, b) = (f a, b), it applies the function f to the first one of a pair.
When implement predictive input with trie, we lookup MT 9 for all characters
mapped to a digit, then lookup the trie for candidate words.
findT9 [] _ = [[]]
findT9 (d:ds) (Trie _ ts) = concatMap find cts where
cts = case Map.lookup d mapT9 of
Nothing → []
Just cs → Map.assocs $ Map.filterWithKey (λc _ → c `elem` cs) ts
find (c, t) = map (c:) (findT9 ds t)
6.4.2. How to ensure the candidates in lexicographic order in the auto-completion and
predictive text input program? What’s the performance change accordingly?
From Exercise 6.2.2, if traverse a binary prefix tree in pre-order, the result is
in lexicographic order. For multi-way prefix tree, we need traverse sub-trees in
lexicographic order. If the sub-trees are managed with self-balancing tree (like the
red-black tree or AVL tree), we can do this in linear time (Exercise 2.2.2). If the
sub-trees are stored in hash table or assoc-list, then we need O(n lg n) time to sort
them.
7.1.3. We use linear search among keys to find the proper insert position. Improve the im-
perative implementation with binary search. Is the big-O performance improved?
The performance is still linear. Although binary search speeds the lookup up to O(lg n),
the insertion still takes O(n) time to shift the elements.
When partition the tree with x, we need use ≤, but not <, because x may be the
last one on the left. We abstract the partition predicate as a parameter:
partitionWith p (BTree ks ts) = (l, t, r) where
l = (ks1, ts1)
r = (ks2, ts2)
(ks1, ks2) = L.span p ks
(ts1, (t:ts2)) = L.splitAt (length ks1) ts
7.3.2. Define the delete function for the ‘paired list’ implementation.
delete x (d, t) = fixRoot (d, del x t) where
del _ Empty = Empty
del x t = if (Just x) == fmap fst (listToMaybe r) then
case t' of
Empty → balance d l Empty (tail r)
_ → let k' = max' t' in
balance d l (del k' t') ((k', snd $ head r):(tail r))
else balance d l (del x t') r
where
(l, t', r) = partition (< x) t
We need to add additional logic to balance to fix the too-low cases after delete:
balance :: Int → [(a, BTree a)] → BTree a → [(a, BTree a)] → BTree a
balance d l t r | full d t = fixFull
| low d t = fixLow l t r
| otherwise = BTree l t r
where
fixFull = let (t1, k, t2) = split d t in BTree l t1 ((k, t2):r)
fixLow ((k', t'):l) t r = balance d l (unsplit t' k' t) r
fixLow l t ((k', t'):r) = balance d l (unsplit t k' t') r
fixLow l t r = t  -- l == r == []
When merging two leftist heaps, we first merge top-down along the right sub-trees, then
bottom-up update the rank along the parent references, swapping the sub-trees if the left
one has a smaller rank. To simplify the empty tree handling, we use a sentinel node.
Node<T> merge(Node<T> a, Node<T> b) {
var h = Node(null) // the sentinel node
while a ≠ null and b ≠ null {
if b.value < a.value then swap(a, b)
var c = Node(a.value, parent = h, left = a.left)
h.right = c
h = c
a = a.right
}
h.right = if a ≠ null then a else b
while h.parent 6= null {
if rank(h.left) < rank(h.right) then swap(h.left, h.right)
h.rank = 1 + rank(h.right)
h = h.parent
}
h = h.right
if h ≠ null then h.parent = null
return h
}
T top(Node<T> h) = h.value
It’s simpler to merge skew heaps, as we needn’t update the rank, the parent, or
back-track.
data Node<T> {
T value
Node<T> left = null, right = null
}
Below is the merge function, the others are same as the leftist heap.
Node<T> merge(Node<T> a, Node<T> b) {
var h = Node(None)
var root = h
while a ≠ null and b ≠ null {
if b.value < a.value then swap(a, b)
var c = Node(a.value, left = null, right = a.left)
h.left = c
h = c
a = a.right
}
h.left = if a ≠ null then a else b
root = root.left
return root
}
fold f z ∅ = z
fold f z H = fold f (f (top H) z) (pop H)

sortBy (<) ∞ defines the ascending sort, and sortBy (>) −∞ defines the descending sort.
9.2.2. How to handle duplicated elements with the tournament tree? is tournament tree
sort stable?
From Exercise 9.2.1, we can handle the duplicated elements with sortBy (≤) ∞
(ascending sort) for example. The tournament tree sort is not stable.
9.2.3. Compare the tournament tree sort and binary search tree sort in terms of space
and time performance.
They are both bound to O(n lg n) time, and O(n) space. The difference is, the
binary search tree does not change after build (unless insert, delete), while the
tournament tree changes to a tree with n nodes of infinity.
9.2.4. Compare heap sort and tournament tree sort in terms of space and time perfor-
mance.
They are both bound to O(n lg n) time, and O(n) space. The difference is, the
heap becomes empty after sort complete, while the tournament tree still occupies
O(n) space.
Given any row in Pascal’s triangle, this function generates the next row. We can
build the first n rows of pascal triangle as: take n (iterate pascal [1]).
10.1.2. Prove that the i-th row in tree Bn has C(n, i) = n!/(i!(n − i)!) nodes.
Proof. Use induction. There is only one node (the root) in B0. Assume every row in Bn is
a list of binomial numbers. Tree Bn+1 is composed from two Bn trees. The 0-th row
contains the root: 1 = C(n + 1, 0). The i-th row has two parts: one from the (i − 1)-th
row of the left-most sub-tree Bn, the other from the i-th row of the other Bn tree.
In total:

C(n, i − 1) + C(n, i) = n!/((i − 1)!(n − i + 1)!) + n!/(i!(n − i)!)
                      = n![i + (n − i + 1)]/(i!(n − i + 1)!)
                      = (n + 1)!/(i!(n − i + 1)!)
                      = C(n + 1, i)
10.1.4. Use a container to store sub-trees, how to implement link? How to ensure it is in
constant time?
If store all sub-trees in an array, we need linear time to insert a new tree ahead:
1: function Link’(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Parent(T2 ) ← T1
5: Insert(Sub-Trees(T1 ), 1, T2 )
6: Rank(T1 ) ← Rank(T2 ) + 1
7: return T1
Alternatively, we can store the sub-trees in reversed order; then it takes constant time
to append the new tree at the tail.
Where t(H) is the number of trees in the heap, and m(H) is the number of marked nodes. As
we mark a node, and later cut it and clear the flag, its coefficient is 2. Decrease takes
O(1) time to cut x off, then recursively calls Cascade-Cut. Assume it recurses c times;
each call takes O(1) time to Cut, then continues the recursion. Hence the total cost of
Decrease is O(c).
For the potential change, let H be the heap before we call Decrease, every recursive
Cascade-Cut cuts a marked node off, then clear the flag (except for the last call). After
that, there are t(H) + c trees, including the original t(H) trees, the cut and added back
c − 1 trees, and the tree with x as the root. There are at most m(H) − c + 2 marked
nodes, including the original m(H) nodes, minus the c − 1 nodes being cleared in the
Cascade-Cut call. The last call may mark another node. The potential changes at
most:
Node<K> insert(Node<K> h, K x) {
if h ≠ null and length(h.subTrees) > MAX_SUBTREES {
h = insert(pop(h), top(h))
}
return merge(h, Node(x))
}
where the pairing heap node is defined as:

data Node<K> {
    K key
    Node<K> parent = null
    [Node<K>] subTrees = []

    Node<K>(K k) { key = k }
}
To delete x, first look up the heap h to find the sub-tree t rooted at x. If t is the root of h, then merely pop; otherwise, get the parent p of t, remove t from p's sub-trees, and finally merge pop(t) back with h.
Node<K> delete(Node<K> h, K x) {
var tr = lookuptr(h, x)
if tr == null then return h
if tr == h then return pop(h)
tr.parent.subTrees.remove(tr)
tr.parent = null
return merge(pop(tr), h)
}
Node<K> lookuptr(Node<K> h, K x) {
if h.key == x then return h
for var t in h.subTrees {
var tr = lookuptr(t, x)
if tr ≠ null then return tr
}
return null
}
The recursive lookup takes O(n) time, where n is the number of elements in the
heap. Then it takes O(m) time to remove t from m sub-trees. The total perfor-
mance is O(n).
10.3.3. Implement Decrease-Key for the pairing heap.
If we decrease the key of the root of h, we directly update it to x; otherwise, we get the parent of tr, cut tr off from its sub-trees, update the key of tr to x, then merge tr back into h:
Node<K> decreaseKey(Node<K> h, Node<K> tr, K x) {
if tr == null or tr.key < x then return h
tr.key = x
if tr == h then return h
tr.parent.subTrees.remove(tr) // O(m), where m = length(subTrees)
tr.parent = null
return merge(tr, h)
}
To use it, we first call lookuptr(h, y) to find the node, then update the key from y to x, i.e., decreaseKey(h, lookuptr(h, y), x). The performance is the same as delete, which is O(n).
Although we can deal with the different cases that the head is before/behind the tail, as shown in fig. 11.4, let us seek a simple, unified solution. Consider an array open at both ends, extending infinitely. Let the head index be h, the tail be t, and the size of the buffer be s. The range [h, t) (left closed, right open) is occupied with elements. The empty/full tests are as below:
empty(h, t) : h = t
full(h, t) : t − h = s
The circular buffer is essentially modular arithmetic on top of this. Write [n]s = n mod s. Applying it to the above full test: [t]s − [h]s = [s]s = 0, which gives [t]s = [h]s. This is exactly the empty test condition: we can't differentiate empty from full with the modular result alone. We need either an extra flag (indicating the order between h and t), or the original index (of the infinitely long segment) without the mod applied for the empty/full tests. Since the integer type has a limited byte size, we can take the index modulo a big number p (p > s) which is co-prime with s:
empty(h, t) : [h]p = [t]p
full(h, t) : [t − h]p = s
void enqueue(Queue<K> q, K x) {
    if not full(q) {
        q.buf[q.t mod q.s] = x      // write at the tail slot first,
        q.t = (q.t + 1) mod P_LIMIT // then advance t, keeping [h, t)
    }
}
Optional<K> dequeue(Queue<K> q) {
Optional<K> x = Optional.Nothing
if not empty(q) {
x = Optional.of(q.buf[q.h mod q.s])
q.h = (q.h + 1) mod P_LIMIT
}
return x
}
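The empty/full tests themselves are not listed above; a minimal Haskell sketch consistent with the equations (a stand-alone illustration, not the book's implementation), where p > s is co-prime with s, and h, t are the indices kept modulo p:

-- s: buffer size; p: the big modulus (co-prime with s, p > s)
isEmpty :: Int → Int → Int → Int → Bool
isEmpty p s h t = h `mod` p == t `mod` p

isFull :: Int → Int → Int → Int → Bool
isFull p s h t = (t - h) `mod` p == s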
1: function Push(Q, x)
2: Append(Front(Q), x)
3: function Pop(Q)
4: if Rear(Q) = [ ] then
5: Rear(Q) ← Reverse(Front(Q))
6: Front(Q) ← [ ]
7: n ← Length(Rear(Q))
8: x ← Rear(Q)[n]
9: Length(Rear(Q)) ← n − 1
10: return x
State([K] f, [K] r) {
acc = [], front = f, rear = r
idx = 0
}
data RealtimeQueue<K> {
[K] front = []
[K] rear = []
State<K> state = null
Self push(K x) {
front.append(x)
balance()
}
K pop() {
x = rear.popLast()
balance()
return x
}
Void balance() {
if state == null and length(rear) < length(front) {
state = State(front, rear).step()
front = []
}
if state ≠ null and state.step().done() {
rear = state.acc
state = null
}
}
}
We can use the Maybe type to handle the out-of-bound cases. If the index i < 0, return Nothing; if the index exceeds the total size, the recursion eventually reaches the empty forest, and we return Nothing in that case too.
getAt [] _ = Nothing
getAt (t:ts) i | i < 0 = Nothing
| i < size t = lookupTree i t
| otherwise = getAt ts (i - size t)
where
lookupTree 0 (Leaf x) = Just x
lookupTree i (Node sz t1 t2) = if i < sz `div` 2 then lookupTree i t1
else lookupTree (i - sz `div` 2) t2
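For example, consider a hypothetical forest of trees sized 1, 2, and 4 holding 0, 1, ..., 6 in order (assuming size returns 1 for a leaf and the stored size for a node):

forest = [Leaf 0,
          Node 2 (Leaf 1) (Leaf 2),
          Node 4 (Node 2 (Leaf 3) (Leaf 4)) (Node 2 (Leaf 5) (Leaf 6))]

Then getAt forest 5 = Just 5, while getAt forest 9 and getAt forest (-1) are both Nothing.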
data List<K> {
Int size = 0
[[K]] trees = [[]]
}
Int nbits(Int n) {
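// number of bits of n, i.e., ⌊lg n⌋ + 1 for n > 0 (0 when n = 0)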
Int i = 0
while n ≠ 0 {
i = i + 1
n = n / 2
}
return i
}
Define the potential of the paired-array sequence as the difference of the array
lengths: Φ(s) = |r| − |f | = n − m, where m = |f | and n = |r|. When deleting from the head, if f ≠ [ ], it takes O(1) time to remove the last element of f . If f = [ ], it takes O(n) time to halve r, reversing the moved half into the new front f ′, and then another O(1) time to remove the last element of f ′. The amortized cost is:

c = n + 1 + Φ(s′) − Φ(s)
  = n + 1 + (|r′| − |f ′|) − (|r| − |f |)
  = n + 1 + ((n − ⌈n/2⌉) − (⌈n/2⌉ − 1)) − (n − 0)
  ≤ 2
Hence the amortized time is O(1) when deleting from the head; symmetrically, the amortized time is O(1) when deleting from the tail too.
1: function Push-Front(T, x)
2: n ← (x)
3: ⊥ ← ([ ], T, [ ]), p ← ⊥ ▷ ⊥ is a special parent of the root
4: while |Front(T )| ≥ 3 do
5: f ← Front(T )
6: n ← (f [2], f [3], ...)
7: Front(T ) ← [n, f [1]]
8: p←T
9: T ← Mid(T )
10: if T = NIL then
11: T ← ([n], NIL, [ ])
12: else if |Front(T )| = 1 and Rear(T ) = [ ] then
13: Rear(T ) ← Front(T )
14: Front(T ) ← [n]
15: else
16: Insert(Front(T ), n)
17: Mid(p) ← T
18: T ← Mid(⊥), Mid(⊥) ← NIL
19: return T
We wrap x in a leaf (x). While there are more than 3 elements in the front finger f, we go top-down along the middle part: we extract all elements of f except the first one, wrap them in a node n (of depth + 1) to be inserted into the middle, and form the new front finger from n and the remaining first element. At the end of the traversal, we either reach an empty tree, or a tree whose front finger can hold more elements. In the empty case, we create a new leaf node; otherwise, we insert n at the head of f. To simplify the implementation, we create a special node ⊥ as the parent of the root, and finally return Mid(⊥) as the new root.
Where the function Elem(n) accesses the element in sub-tree n. We need to change the way we access the first/last element of the finger tree: if the finger is empty while the middle isn't, we search along the middle.
1: function First-Leaf(T )
2: while Front(T ) = [ ] and Mid(T ) ≠ NIL do
3: T ← Mid(T )
4: if Front(T ) = [ ] then
5: return Rear(T )[1]
6: else
7: return Front(T )[1]
∅[i] = Nothing

T [i] = { i < 0 or i ≥ size T : Nothing
        { otherwise : ...
cutTree splits the tree into three parts: left, middle, and right. We wrap the middle in the Maybe type to handle the not-found case; when found, the result is a pair of position i′ and node a, wrapped in the Place type. If i points into finger f or r, we call cutList to further split it, then build the result; if i points into the middle, we recursively cut the middle to obtain a place Place i′ a, then cut the 2-3 tree a at position i′:
cutTree :: (Sized a) ⇒ Int → Tree a → (Tree a, Maybe (Place a), Tree a)
cutTree _ Empty = (Empty, Nothing, Empty)
cutTree i (Lf a) | i < size a = (Empty, Just (Place i a), Empty)
| otherwise = (Lf a, Nothing, Empty)
cutTree i (Br s f m r)
| i < sf = case cutList i f of
(xs, x, ys) → (Empty <<< xs, x, tree ys m r)
| i < sm = case cutTree (i - sf) m of
(t1, Just (Place i' a), t2) →
let (xs, x, ys) = cutNode i' a
in (tree f t1 xs, x, tree ys t2 r)
| i < s = case cutList (i - sm) r of
(xs, x, ys) → (tree f m xs, x, ys >>> Empty)
where
  sf = size f
  sm = sf + size m
With cut defined, we can update or delete the element at any given position, or move it to the front (MTF); they are all bound to O(lg n) time.
setAt s i x = case cut i s of
(_, Nothing, _) → s
(xs, Just y, ys) → xs +++ (x <| ys)
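As a sketch, move-to-front can be built on cut the same way (assuming the finger tree's <| and +++ operators):

mtf i s = case cut i s of
  (_, Nothing, _) → s
  (xs, Just y, ys) → y <| (xs +++ ys)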
time in total; the next round merges (s1 ⊕ s2) ⊕ (s3 ⊕ s4), ..., taking O(kn) time too. There are lg k rounds in total, hence the overall complexity is O(nk lg k). Therefore the pairwise merge performs better.

We can also merge with a min-heap of size k: store the minimum element of each sequence in the heap, keep popping the overall minimum, and replace it with the next element from the corresponding sequence. The complexity is O(nk lg k) too.
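A sketch of the heap based k-way merge, using Data.Map as a stand-in priority queue (kMerge is a hypothetical name, not from the book):

import qualified Data.Map as Map

kMerge :: Ord a ⇒ [[a]] → [a]
kMerge xss = go (Map.fromListWith (++) [(x, [xs]) | (x:xs) ← xss]) where
  go q = case Map.minViewWithKey q of
    Nothing → []
    Just ((x, rests), q') →
      replicate (length rests) x ++ go (foldr push q' rests)
  push (y:ys) q = Map.insertWith (++) y [ys] q  -- feed the next element of that sequence
  push []     q = q                             -- that sequence is exhausted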
For example, we can define sum = f oldp (+) 0, and the bottom-up merge sort is
defined as4 :
sort = foldp merge [] ◦ map (:[])
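Here foldp is the pairwise fold defined earlier in the book; a sketch consistent with both uses above (combine adjacent pairs repeatedly until one value remains):

foldp :: (a → a → a) → a → [a] → a
foldp _ z [] = z
foldp f _ xs = go xs where
  go [x] = x
  go ys = go (pairs ys)
  pairs (x:y:rest) = f x y : pairs rest
  pairs rest = rest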
We can build the binary search tree either top-down or bottom-up. For top-down, halve the list into two halves, recursively build two trees from them, then merge; for bottom-up, wrap each element in a singleton leaf, then merge the trees pairwise repeatedly to the final result. Both approaches depend on tree merge, so let us define it6. If either tree is empty, the merge result is the other one; otherwise, let the two trees be (A, x, B) and (C, y, D). If x < y, use y to partition B into (B1, B2), where B1 holds the elements {b ∈ B | b < y} and B2 the rest {b ∈ B | b ≥ y}; symmetrically use x to partition C into (C1, C2). We then form the merge result as:
merge Empty t = t
4 (:[]) is equivalent to λx → [x].
6 For the standalone problem of binary search tree merge, we can flatten the trees to two arrays (or lists, via in-order traversal), apply merge (see eq. (13.31)), and rebuild the tree with the middle element as the root.
merge t Empty = t
merge (Node a x b) (Node c y d)
| x < y = let
(b1, b2) = partition y b
(c1, c2) = partition x c
in Node (Node (merge a c1) x (merge b1 c2)) y (merge d b2)
| otherwise = let
(a1, a2) = partition y a
(d1, d2) = partition x d
in Node (merge a1 c) y (Node (merge a2 d1) x (merge b d2))
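merge relies on partition; a sketch under the same Empty/Node constructors, splitting a tree by pivot p into elements less than p and the rest:

partition _ Empty = (Empty, Empty)
partition p (Node l x r)
  | x < p = let (r1, r2) = partition p r in (Node l x r1, r2)
  | otherwise = let (l1, l2) = partition p l in (l1, Node l2 x r)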
If B[j] ≤ A[i] ≤ B[j + 1] holds, then the guessed A[i] is the median; otherwise, we update the boundary l or u depending on whether the guess is too big or too small. Below example program implements this solution:
K median([K] a, [K] b) {
if a == [] then return b[length(b) / 2]
if b == [] then return a[length(a) / 2]
Int i = medianOf(a, b)
return if i == -1 then median(b, a) else a[i]
}
The second solution is to develop a generic function that looks for the k-th element. Assume m ≥ n (otherwise swap A and B). If either array is empty, return the k-th element of the other array. If k = 1, return the smaller one of A[0] and B[0]. Otherwise guess j = min(k/2, n) and i = k − j, then check A[i] and B[j]. If A[i] < B[j], drop all elements before A[i] and after B[j], then recursively find the (k − i)-th element of the remaining; otherwise, drop all before B[j] and after A[i], then recurse.
K median([K] xs, [K] ys) {
Int n = length(xs), m = length(ys)
return kth(xs, 0, n, ys, 0, m, (m + n) / 2 + 1)
}
K kth([K] xs, Int x0, Int x1, [K] ys, Int y0, Int y1, Int k) {
if x1 - x0 < y1 - y0 then return kth(ys, y0, y1, xs, x0, x1, k)
if x1 ≤ x0 then return ys[y0 + k - 1]
if y1 ≤ y0 then return xs[x0 + k - 1]
if k == 1 then return min(xs[x0], ys[y0])
var j = min(k / 2, y1 - y0), i = k - j
i = x0 + i, j = y0 + j
if xs[i - 1] < ys[j - 1] then
return kth(xs, i, x1, ys, y0, j, k - i + x0)
else
return kth(xs, x0, i, ys, j, y1, k - j + y0)
}
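For example, with xs = [1, 3, 5] and ys = [2, 4], we have k = (3 + 2)/2 + 1 = 3; kth returns the 3rd smallest element, 3, which is the median of the merged [1, 2, 3, 4, 5].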
1: function Solve(f, z)
2: p ← 0, q ← z
3: S ← ∅
4: while p ≤ z and q ≥ 0 do
5: z′ ← f (p, q)
6: if z′ < z then
7: p ← p + 1
8: else if z′ > z then
9: q ← q − 1
10: else
11: S ← S ∪ {(p, q)}
12: p ← p + 1, q ← q − 1
13: return S
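Every iteration either increases p or decreases q, so the loop runs at most 2(z + 1) times: the saddleback search takes O(z) time.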
14.1.5. For 2D search, let the bottom-left be the minimum, and the top-right be the maximum. If z is less than the minimum or greater than the maximum, there is no solution; otherwise cut the rectangle into 4 parts with a horizontal line and a vertical line crossed at the center, then recursively search in these 4 small rectangles. Implement this solution and evaluate its performance.
1: procedure Search(f, z, a, b, c, d) ▷ (a, b): bottom-left, (c, d): top-right
2: if z ≤ f (a, b) or z ≥ f (c, d) then
3: if z = f (a, b) then
4: record (a, b) as a solution
5: if z = f (c, d) then
6: record (c, d) as a solution
7: return
8: p ← ⌊(a + c)/2⌋
9: q ← ⌊(b + d)/2⌋
10: Search(f, z, a, q, p, d)
11: Search(f, z, p, q, c, d)
12: Search(f, z, a, b, p, q)
13: Search(f, z, p, b, c, q)
Let the time to search in a rectangle of area A be T (A). We take O(1) time to check whether z ≤ f (a, b) or z ≥ f (c, d), then divide into 4 smaller areas, i.e., T (A) = 4T (A/4) + O(1). Applying the master theorem, the complexity is O(A) = O(mn), proportional to the area. It is essentially the same as exhaustive search in the rectangle.
For every a in A: if a ∉ m (new to the dictionary) and there are fewer than k candidates in m, we add a to m with one net-win vote: m[a] ← 1; if a ∈ m, we increase its vote by 1: m[a] ← m[a] + 1; otherwise, if there are already k candidates, we decrease every candidate's vote by 1, and remove a candidate when its vote drops to 0.

We need to verify the remaining candidates at last, checking whether their counts > n/k. Let m′ = {(a, 0) | a ∈ m}, and scan A again: foldr cnt m′ A, where cnt is defined as:

cnt a m′ = if a ∈ m′ then m′[a] ← m′[a] + 1 else m′        (14.22)

After the scan, m′ records the votes for each candidate; we filter the true winners with: keys (filter (> n/k) m′).
majorities k xs = verify $ foldr maj Map.empty xs where
maj x m | x `Map.member` m = Map.adjust (1+) x m
| Map.size m < k = Map.insert x 1 m
| otherwise = Map.filter (≠ 0) $ Map.map (-1 +) m
verify m = Map.keys $ Map.filter (> th) $ foldr cnt m' xs where
m' = Map.map (const 0) m
cnt x m = if x `Map.member` m then Map.adjust (1+) x m else m
th = (length xs) `div` k
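For example, majorities 2 "aabcccc" returns "c": among the 7 elements, only 'c' appears more than ⌊7/2⌋ = 3 times.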
14.3.2. Bentley gives a divide and conquer algorithm to find the max sum in O(n lg n) time [2]: split the vector at the middle, recursively find the max sum of the two halves, and the max sum that crosses the middle; then pick the greatest. Implement this solution.
1: function Max-Sum(A)
2: if A = ∅ then
3: return 0
4: else if |A| = 1 then
5: return Max(0, A[1])
6: else
7: m ← ⌊|A|/2⌋
8: a ← Max-From(Reverse(A[1...m]))
9: b ← Max-From(A[m + 1...|A|])
10: c ← Max-Sum(A[1...m])
11: d ← Max-Sum(A[m + 1...|A|])
12: return Max(a + b, c, d)

13: function Max-From(A)
14: sum ← 0, m ← 0
15: for i ← 1 to |A| do
16: sum ← sum + A[i]
17: m ← Max(m, sum)
18: return m
Consider the recurrence T (n) = 2T (n/2) + O(n); from the master theorem, the performance is O(n lg n).
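A Haskell rendering of this divide and conquer solution (a sketch; maxFrom gives the max prefix sum, at least 0):

maxSumDC :: [Int] → Int
maxSumDC [] = 0
maxSumDC [x] = max 0 x
maxSumDC xs = maximum [a + b, c, d] where
  (ls, rs) = splitAt (length xs `div` 2) xs
  a = maxFrom (reverse ls)  -- max sum ending at the middle, extending left
  b = maxFrom rs            -- max sum starting right after the middle
  c = maxSumDC ls
  d = maxSumDC rs
  maxFrom = maximum ◦ (0:) ◦ scanl1 (+)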
14.3.3. Find the sub-matrix in an m × n matrix that gives the maximum sum.

We start from the first row of the matrix, and add one row per round: [M [1, ∗], M [2, ∗], ..., M [i, ∗]]. We sum the numbers in each column to convert them into a vector:

V = (\sum_{j=1}^{i} M[j, 1], \sum_{j=1}^{i} M[j, 2], ..., \sum_{j=1}^{i} M[j, n])

Next, use maxsum to find the max sum within vector V , and record the global maximum sum.
maxSum = maximum ◦ (map maxS) ◦ acc ◦ rows where
rows = init ◦ tails −− exclude the empty row
acc = concatMap (scanl1 (zipWith (+))) −− accumulated sum along columns
maxS = snd ◦ (foldl f (0, 0)) −− max sum in a vector
f (m, s) x = let m' = max (m + x) 0
s' = max m' s in (m', s')
K maxsum1([K] xs) {
K s = 0 // max so far
K m = 0 // max end here
for x in xs {
m = max(m + x, 0)
s = max(m, s)
}
return s
}
n    solution
1    [1]
2    no solution
3    no solution
4    [2,4,1,3], [3,1,4,2]
5    [2,4,1,3,5], [3,1,4,2,5], [1,3,5,2,4], ... (10 solutions in total)
14.5.2. There are 92 solutions to the 8 queens puzzle. Rotating any solution by 90° gives another solution, and so does flipping it. There are essentially 12 distinct solutions. Write a program to find them.

The solutions are symmetric because the square board is symmetric. The dihedral group D8 describes the symmetries of the square; it contains 8 permutations: the identity id; the counter-clockwise rotations around the center by 90°, 180°, and 270°; and the reflections horizontally, vertically, and along the two diagonals. They send the queen at location (i, j) to:

permutation                        position
id                                 (i, j)
reflect along Y and X              (9 − i, j), (i, 9 − j)
reflect along the two diagonals    (j, i), (9 − j, 9 − i)
rotate 90°, 180°, 270°             (9 − j, i), (9 − i, 9 − j), (j, 9 − i)
d8 = [id,
reverse, map (9 - ), −− reflect Y, X
trans swap, trans (λ(i, j) → (9 - j, 9 - i)), −− reflect AC, BD
trans (λ(i, j) → (9 - j, i)), −− 90
trans (λ(i, j) → (9 - i, 9 - j)), −− 180
trans (λ(i, j) → (j, 9 - i))] −− 270
where
trans f xs = snd $ unzip $ sortOn fst $ map f $ zip [1..8] xs
...
[1,0,7,6,5,4,3,2],[1,7,0,6,5,4,3,2],[1,7,6,0,5,4,3,2],[1,7,6,5,0,4,3,2],
[1,7,6,5,4,0,3,2],[1,7,6,5,4,3,0,2],[1,7,6,5,4,3,2,0],[0,7,6,5,4,3,2,1]
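To pick the 12 essentially distinct solutions, one sketch (assuming solutions holds all 92 solutions as permutations of [1..8]) keeps one canonical representative per symmetry orbit:

import Data.List (nub)

distinct = nub [minimum (map ($ s) d8) | s ← solutions]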
Map<T, [T]> codeTab(Node<T> t, [T] bits = [], Map<T, [T]> codes = {}) {
if t.isLeaf() {
codes[t.c] = bits
} else {
codeTab(t.left, bits + [0], codes)
codeTab(t.right, bits + [1], codes)
}
return codes
}
14.10.3. Given a Huffman tree T , implement the decode algorithm with fold left.
decode = snd ◦ (foldl lookup (T, [ ])), where:
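One possible definition of the lookup step (a sketch, assuming Huffman trees built from Leaf and Branch nodes, bits being 0/1, and T the whole tree; at a leaf we emit the symbol and restart from the root):

lookup (t, cs) b = case go t b of
    Leaf c → (T, cs ++ [c])  -- a symbol is decoded; restart from the root
    t' → (t', cs)            -- keep walking down the tree
  where
    go (Branch l r) 0 = l
    go (Branch l r) 1 = r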
if d == DIR.NW {
r.append(xs[m - 1]) // or ys[n - 1]
m = m - 1, n = n - 1
} else if d == DIR.N {
m = m - 1
} else if d == DIR.W {
n = n - 1
}
}
return reverse(r)
}
14.12.2. For the subset sum upper/lower bounds, does l ≤ 0 ≤ u always hold? Can we reduce the range between the bounds?

For non-empty subsets (the sum of the empty subset is 0), l ≤ 0 ≤ u does not necessarily hold. Consider a set X of only positive numbers: every subset sum is positive, hence the lower bound is l = min(X) > 0. For a set of only negative numbers, every subset sum is negative, hence the upper bound is u = max(X) < 0.
14.12.3. Compute the edit distance between two strings.
Int lev([K] s, [K] t) {
    Int m = length(s), n = length(t)
    [[Int]] d = [[0]*(n + 1)]*(m + 1)  // d[i][j]: distance between s[:i] and t[:j]
    for Int i = 0 to m {
        d[i][0] = i  // drop all chars of the source prefix to get []
    }
    for Int j = 0 to n {
        d[0][j] = j  // insert all chars of the target prefix into []
    }
    for Int j = 1 to n {
        for Int i = 1 to m {
            c = if s[i-1] == t[j-1] then 0 else 1
            d[i][j] = min([d[i-1][j] + 1,    // deletion
                           d[i][j-1] + 1,    // insertion
                           d[i-1][j-1] + c]) // substitution
        }
    }
    return d[m][n]
}
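For example, lev("kitten", "sitting") = 3: substitute 'k' with 's', substitute 'e' with 'i', then append 'g'.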
Bibliography
[1] Richard Bird. “Pearls of functional algorithm design”. Cambridge University Press;
1 edition (November 1, 2010). ISBN-10: 0521513383. pp1 - pp6.
[2] Jon Bentley. “Programming Pearls (2nd Edition)”. Addison-Wesley Professional; 2
edition (October 7, 1999). ISBN-13: 978-0201657883.
[3] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[5] Chris Okasaki. “Ten Years of Purely Functional Data Structures”. http://okasaki.
blogspot.com/2008/02/ten-years-of-purely-functional-data.html
[6] SGI. “Standard Template Library Programmer’s Guide”. http://www.sgi.com/tech/
stl/
[7] Wikipedia. “Fold(high-order function)”. https://en.wikipedia.org/wiki/Fold_
(higher-order_function)
[8] Wikipedia. “Function Composition”. https://en.wikipedia.org/wiki/Function_
composition
[9] Wikipedia. “Partial application”. https://en.wikipedia.org/wiki/Partial_application
[10] Miran Lipovaca. “Learn You a Haskell for Great Good! A Beginner’s Guide”. No
Starch Press; 1 edition April 2011, 400 pp. ISBN: 978-1-59327-283-8
[11] Wikipedia. “Bubble sort”. https://en.wikipedia.org/wiki/Bubble_sort
[12] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[13] Chris Okasaki. “FUNCTIONAL PEARLS Red-Black Trees in a Functional Setting”.
J. Functional Programming. 1998
[14] Wikipedia. “Red-black tree”. https://en.wikipedia.org/wiki/Red-black_tree
[15] Lyn Turbak. “Red-Black Trees”. http://cs.wellesley.edu/~cs231/fall01/red-black.pdf
Nov. 2, 2001.
[16] Rosetta Code. “Pattern matching”. http://rosettacode.org/wiki/Pattern_matching
[17] Hackage. “Data.Tree.AVL”. http://hackage.haskell.org/packages/archive/AvlTree/
4.2/doc/html/Data-Tree-AVL.html
[38] Esko Ukkonen “Approximate string-matching over suffix trees”. Proc. CPM 93. Lec-
ture Notes in Computer Science 684, pp. 228-242, Springer 1993. http://www.cs.
helsinki.fi/u/ukkonen/cpm931.ps
[39] Wikipedia. “B-tree”. https://en.wikipedia.org/wiki/B-tree
[40] Wikipedia. “Heap (data structure)”. https://en.wikipedia.org/wiki/Heap_(data_
structure)
[41] Wikipedia. “Heapsort”. https://en.wikipedia.org/wiki/Heapsort
[42] Rosetta Code. “Sorting algorithms/Heapsort”. http://rosettacode.org/wiki/Sorting_
algorithms/Heapsort
[43] Wikipedia. “Leftist Tree”. https://en.wikipedia.org/wiki/Leftist_tree
[44] Bruno R. Preiss. Data Structures and Algorithms with Object-Oriented Design Pat-
terns in Java. http://www.brpreiss.com/books/opus5/index.html
[45] Donald E. Knuth. “The Art of Computer Programming. Volume 3: Sorting and
Searching.”. Addison-Wesley Professional; 2nd Edition (October 15, 1998). ISBN-13:
978-0201485417. Section 5.2.3 and 6.2.3
[46] Wikipedia. “Skew heap”. https://en.wikipedia.org/wiki/Skew_heap
[47] Sleator, Daniel Dominic; Tarjan, Robert Endre. “Self-adjusting heaps”. SIAM Journal
on Computing 15(1):52-69. doi:10.1137/0215004 ISSN 00975397 (1986)
[48] Wikipedia. “Splay tree”. https://en.wikipedia.org/wiki/Splay_tree
[49] Sleator, Daniel D.; Tarjan, Robert E. (1985), “Self-Adjusting Binary Search Trees”,
Journal of the ACM 32(3):652 - 686, doi: 10.1145/3828.3835
[50] NIST, “binary heap”. http://xw2k.nist.gov/dads//HTML/binaryheap.html
[51] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[52] Wikipedia. “Strict weak order”. https://en.wikipedia.org/wiki/Strict_weak_order
[53] Wikipedia. “FIFA world cup”. https://en.wikipedia.org/wiki/FIFA_World_Cup
[54] Wikipedia. “K-ary tree”. https://en.wikipedia.org/wiki/K-ary_tree
[55] Wikipedia, “Pascal’s triangle”. https://en.wikipedia.org/wiki/Pascal's_triangle
[56] Hackage. “An alternate implementation of a priority queue based on a
Fibonacci heap.”, http://hackage.haskell.org/packages/archive/pqueue-mtl/1.0.7/
doc/html/src/Data-Queue-FibQueue.html
[57] Chris Okasaki. “Fibonacci Heaps.” http://darcs.haskell.org/nofib/gc/fibheaps/orig
[58] Michael L. Fredman, Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan.
“The Pairing Heap: A New Form of Self-Adjusting Heap” Algorithmica (1986) 1:
111-129.
[59] Maged M. Michael and Michael L. Scott. “Simple, Fast, and Practical Non-
Blocking and Blocking Concurrent Queue Algorithms”. http://www.cs.rochester.
edu/research/synchronization/pseudocode/queues.html
[60] Herb Sutter. “Writing a Generalized Concurrent Queue”. Dr. Dobb’s Oct 29, 2008.
http://drdobbs.com/cpp/211601363?pgno=1
[61] Wikipedia. “Tail-call”. https://en.wikipedia.org/wiki/Tail_call
[62] Wikipedia. “Recursion (computer science)”. https://en.wikipedia.org/wiki/
Recursion_(computer_science)#Tail-recursive_functions
[63] Harold Abelson, Gerald Jay Sussman, Julie Sussman. “Structure and Interpretation
of Computer Programs, 2nd Edition”. MIT Press, 1996, ISBN 0-262-51087-1.
[64] Chris Okasaki. “Purely Functional Random-Access Lists”. Functional Programming
Languages and Computer Architecture, June 1995, pages 86-95.
[65] Ralf Hinze and Ross Paterson. “Finger Trees: A Simple General-purpose Data
Structure,” in Journal of Functional Programming 16:2 (2006), pages 197-217.
http://www.soi.city.ac.uk/~ross/papers/FingerTree.html
[66] Guibas, L. J., McCreight, E. M., Plass, M. F., Roberts, J. R. (1977), ”A new repre-
sentation for linear lists”. Conference Record of the Ninth Annual ACM Symposium
on Theory of Computing, pp. 49-60.
[67] Generic finger-tree structure. http://hackage.haskell.org/packages/archive/
fingertree/0.0/doc/html/Data-FingerTree.html
[68] Wikipedia. “Move-to-front transform”. https://en.wikipedia.org/wiki/Move-to-
front_transform
[69] Robert Sedgewick. “Implementing quick sort programs”. Communication of ACM.
Volume 21, Number 10. 1978. pp.847 - 857.
[70] Jon Bentley, Douglas McIlroy. “Engineering a sort function”. Software Practice and
experience VOL. 23(11), 1249-1265 1993.
[71] Robert Sedgewick, Jon Bentley. “Quicksort is optimal”. http://www.cs.princeton.
edu/~rs/talks/QuicksortIsOptimal.pdf
[72] Fethi Rabhi, Guy Lapalme. “Algorithms: a functional programming approach”. Sec-
ond edition. Addison-Wesley, 1999. ISBN: 0201-59604-0
[73] Simon Peyton Jones. “The Implementation of functional programming languages”.
Prentice-Hall International, 1987. ISBN: 0-13-453333-X
[74] Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. “Practical in-place mergesort”.
Nordic Journal of Computing, 1996.
[75] Josè Bacelar Almeida and Jorge Sousa Pinto. “Deriving Sorting Algorithms”. Tech-
nical report, Data structures and Algorithms. 2008.
[76] Cole, Richard (August 1988). “Parallel merge sort”. SIAM J. Comput. 17 (4): 770-
785. doi:10.1137/0217049. (August 1988)
[77] Powers, David M. W. “Parallelized Quicksort and Radixsort with Optimal Speedup”,
Proceedings of International Conference on Parallel Computing Technologies. Novosi-
birsk. 1991.
[78] Wikipedia. “Quicksort”. https://en.wikipedia.org/wiki/Quicksort
[79] Wikipedia. “Total order”. https://en.wikipedia.org/wiki/Total_order
[86] Robert Boyer, Strother Moore. “A Fast String Searching Algorithm”. Comm. ACM
(New York, NY, USA: Association for Computing Machinery) 20 (10): 762-772. 1977
[87] R. N. Horspool. “Practical fast searching in strings”. Software - Practice & Experience
10 (6): 501-506. 1980.
[90] George Pólya. “How to solve it: A new aspect of mathematical method”. Princeton
University Press(April 25, 2004). ISBN-13: 978-0691119663
[91] Wikipedia. “David A. Huffman”. https://en.wikipedia.org/wiki/David_A._Huffman
[92] Andrei Alexandrescu. “Modern C++ design: Generic Programming and Design Pat-
terns Applied”. Addison Wesley February 01, 2001, ISBN 0-201-70431-5
[93] Benjamin C. Pierce. “Types and Programming Languages”. The MIT Press, 2002.
ISBN:0262162091
[94] Joe Armstrong. “Programming Erlang: Software for a Concurrent World”. Pragmatic
Bookshelf; 1 edition (July 18, 2007). ISBN-13: 978-1934356005
GNU Free Documentation License

Version 1.3, 3 November 2008

Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
<http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies of this license document,
but changing it is not allowed.
Preamble
The purpose of this License is to make a manual, textbook, or other functional and
useful document “free” in the sense of freedom: to assure everyone the effective freedom
to copy and redistribute it, with or without modifying it, either commercially or noncom-
mercially. Secondarily, this License preserves for the author and publisher a way to get
credit for their work, while not being considered responsible for modifications made by
others.
This License is a kind of “copyleft”, which means that derivative works of the document
must themselves be free in the same sense. It complements the GNU General Public
License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because
free software needs free documentation: a free program should come with manuals provid-
ing the same freedoms that the software does. But this License is not limited to software
manuals; it can be used for any textual work, regardless of subject matter or whether it
is published as a printed book. We recommend this License principally for works whose
purpose is instruction or reference.
textbook of mathematics, a Secondary Section may not explain any mathematics.) The
relationship could be a matter of historical connection with the subject or with related
matters, or of legal, commercial, philosophical, ethical or political position regarding
them.
The “Invariant Sections” are certain Secondary Sections whose titles are designated,
as being those of Invariant Sections, in the notice that says that the Document is released
under this License. If a section does not fit the above definition of Secondary then it is
not allowed to be designated as Invariant. The Document may contain zero Invariant
Sections. If the Document does not identify any Invariant Sections then there are none.
The “Cover Texts” are certain short passages of text that are listed, as Front-Cover
Texts or Back-Cover Texts, in the notice that says that the Document is released under
this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy, represented
in a format whose specification is available to the general public, that is suitable for re-
vising the document straightforwardly with generic text editors or (for images composed
of pixels) generic paint programs or (for drawings) some widely available drawing editor,
and that is suitable for input to text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an otherwise Transparent
file format whose markup, or absence of markup, has been arranged to thwart or dis-
courage subsequent modification by readers is not Transparent. An image format is not
Transparent if used for any substantial amount of text. A copy that is not “Transparent”
is called “Opaque”.
Examples of suitable formats for Transparent copies include plain ASCII without
markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly
available DTD, and standard-conforming simple HTML, PostScript or PDF designed
for human modification. Examples of transparent image formats include PNG, XCF
and JPG. Opaque formats include proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or processing
tools are not generally available, and the machine-generated HTML, PostScript or PDF
produced by some word processors for output purposes only.
The “Title Page” means, for a printed book, the title page itself, plus such following
pages as are needed to hold, legibly, the material this License requires to appear in the
title page. For works in formats which do not have any title page as such, “Title Page”
means the text near the most prominent appearance of the work’s title, preceding the
beginning of the body of the text.
The “publisher” means any person or entity that distributes copies of the Document
to the public.
A section “Entitled XYZ” means a named subunit of the Document whose title
either is precisely XYZ or contains XYZ in parentheses following text that translates
XYZ in another language. (Here XYZ stands for a specific section name mentioned below,
such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To
“Preserve the Title” of such a section when you modify the Document means that it
remains a section “Entitled XYZ” according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that
this License applies to the Document. These Warranty Disclaimers are considered to be
included by reference in this License, but only as regards disclaiming warranties: any
other implication that these Warranty Disclaimers may have is void and has no effect on
the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the license notice
saying this License applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You may not use technical
measures to obstruct or control the reading or further copying of the copies you make
or distribute. However, you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may
publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers)
of the Document, numbering more than 100, and the Document’s license notice requires
Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the
back cover. Both covers must also clearly and legibly identify you as the publisher of
these copies. The front cover must present the full title with all words of the title equally
prominent and visible. You may add other material on the covers in addition. Copying
with changes limited to the covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put
the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest
onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100,
you must either include a machine-readable Transparent copy along with each Opaque
copy, or state in or with each Opaque copy a computer-network location from which
the general network-using public has access to download using public-standard network
protocols a complete Transparent copy of the Document, free of added material. If you use
the latter option, you must take reasonably prudent steps, when you begin distribution of
Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible
at the stated location until at least one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well
before redistributing any large number of copies, to give them a chance to provide you
with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions
of sections 2 and 3 above, provided that you release the Modified Version under precisely
this License, with the Modified Version filling the role of the Document, thus licensing
distribution and modification of the Modified Version to whoever possesses a copy of it.
In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the
Document, and from those of previous versions (which should, if there were any, be
listed in the History section of the Document). You may use the same title as a
previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for
authorship of the modifications in the Modified Version, together with at least five
of the principal authors of the Document (all of its principal authors, if it has fewer
than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the
publisher.
E. Add an appropriate copyright notice for your modifications adjacent to the other
copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public
permission to use the Modified Version under the terms of this License, in the form
shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover
Texts given in the Document’s license notice.
I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item
stating at least the title, year, new authors, and publisher of the Modified Version as
given on the Title Page. If there is no section Entitled “History” in the Document,
create one stating the title, year, authors, and publisher of the Document as given
on its Title Page, then add an item describing the Modified Version as stated in the
previous sentence.
J. Preserve the network location, if any, given in the Document for public access to
a Transparent copy of the Document, and likewise the network locations given in
the Document for previous versions it was based on. These may be placed in the
“History” section. You may omit a network location for a work that was published
at least four years before the Document itself, or if the original publisher of the
version it refers to gives permission.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in
their titles. Section numbers or the equivalent are not considered part of the section
titles.
M. Delete any section Entitled “Endorsements”. Such a section may not be included in
the Modified Version.
If the Modified Version includes new front-matter sections or appendices that qualify
as Secondary Sections and contain no material copied from the Document, you may at
your option designate some or all of these sections as invariant. To do this, add their
titles to the list of Invariant Sections in the Modified Version’s license notice. These titles
must be distinct from any other section titles.
You may add a section Entitled “Endorsements”, provided it contains nothing but
endorsements of your Modified Version by various parties—for example, statements of
peer review or that the text has been approved by an organization as the authoritative
definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up
to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified
Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added
by (or through arrangements made by) any one entity. If the Document already includes
a cover text for the same cover, previously added by you or by arrangement made by the
same entity you are acting on behalf of, you may not add another; but you may replace
the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission
to use their names for publicity for or to assert or imply endorsement of any Modified
Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License,
under the terms defined in section 4 above for modified versions, provided that you in-
clude in the combination all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your combined work in its license
notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical
Invariant Sections may be replaced with a single copy. If there are multiple Invariant
Sections with the same name but different contents, make the title of each such section
unique by adding at the end of it, in parentheses, the name of the original author or
publisher of that section if known, or else a unique number. Make the same adjustment
to the section titles in the list of Invariant Sections in the license notice of the combined
work.
In the combination, you must combine any sections Entitled “History” in the various
original documents, forming one section Entitled “History”; likewise combine any sections
Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete
all sections Entitled “Endorsements”.
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released
under this License, and replace the individual copies of this License in the various docu-
ments with a single copy that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individ-
ually under this License, provided you insert a copy of this License into the extracted
document, and follow this License in all other respects regarding verbatim copying of
that document.
Cover Texts may be placed on covers that bracket the Document within the aggregate, or
the electronic equivalent of covers if the Document is in electronic form. Otherwise they
must appear on printed covers that bracket the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of
the Document under the terms of section 4. Replacing Invariant Sections with translations
requires special permission from their copyright holders, but you may include translations
of some or all Invariant Sections in addition to the original versions of these Invariant
Sections. You may include a translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include the original
English version of this License and the original versions of those notices and disclaimers.
In case of a disagreement between the translation and the original version of this License
or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “His-
tory”, the requirement (section 4) to Preserve its Title (section 1) will typically require
changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly
provided under this License. Any attempt otherwise to copy, modify, sublicense, or dis-
tribute it is void, and will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your license from a particular
copyright holder is reinstated (a) provisionally, unless and until the copyright holder
explicitly and finally terminates your license, and (b) permanently, if the copyright holder
fails to notify you of the violation by some reasonable means prior to 60 days after the
cessation.
Moreover, your license from a particular copyright holder is reinstated permanently if
the copyright holder notifies you of the violation by some reasonable means, this is the
first time you have received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after your receipt of the
notice.
Termination of your rights under this section does not terminate the licenses of parties
who have received copies or rights from you under this License. If your rights have been
terminated and not permanently reinstated, receipt of a copy of some or all of the same
material does not give you any rights to use it.
11. RELICENSING
“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide
Web server that publishes copyrightable works and also provides prominent facilities for
anybody to edit those works. A public wiki that anybody can edit is an example of such a
server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means
any set of copyrightable works thus published on the MMC site.
“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license pub-
lished by Creative Commons Corporation, a not-for-profit corporation with a principal
place of business in San Francisco, California, as well as future copyleft versions of that
license published by that same organization.
“Incorporate” means to publish or republish a Document, in whole or in part, as part
of another Document.
An MMC is “eligible for relicensing” if it is licensed under this License, and if all
works that were first published under this License somewhere other than this MMC, and
subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or
invariant sections, and (2) were thus incorporated prior to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site under
CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is
eligible for relicensing.
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the
“with … Texts.” line with this:
with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover
Texts being LIST, and with the Back-Cover Texts being LIST.
If you have Invariant Sections without Cover Texts, or some other combination of the
three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend re-
leasing these examples in parallel under your choice of free software license, such as the
GNU General Public License, to permit their use in free software.
Index
Radix tree, 65
range traverse, 33
red-black tree, 46
  delete, 48
  imperative delete, 275
  imperative insertion, 52
  insert, 47
  red-black properties, 46
reduce, 20