
Elementary Algorithms

Xinyu LIU

August 31, 2023

Version: e = Σ_{n=0}^{∞} 1/n! = 1 + 1/1 + 1/(1·2) + 1/(1·2·3) + ··· = 2.718283

Email: [email protected]
Preface

Programmers learn elementary algorithms at school. Except for programming contests or code interviews, they seldom develop algorithms at work. Most of the time, the needed algorithms or data structures are provided in libraries; we needn't bother to 're-invent the wheel'. When talking about algorithms in AI and machine learning, people actually mean scientific modeling. Still, elementary algorithms are fundamental. Let's start with two puzzles.

The smallest free number

Richard Bird gives a problem: find the minimum number that does not appear in a list (Chapter 1, [1]). People use numbers to index entities. A number is either occupied or free. When acquiring, we always want to allocate the smallest available one. Suppose the numbers are non-negative integers, and those being used are recorded in a list, for example:

[18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]

How can we find the smallest free number, 10, from this list? It seems quite easy with exhaustive search:
1: function Min-Free(A)
2: x←0
3: loop
4: if x ∉ A then
5: return x
6: else
7: x←x+1
Where ∉ is realized as below.
1: function '∉'(x, X)
2: for i ← 1 to |X| do
3: if x = X[i] then
4: return False
5: return True
Where |X| = n is the length of X. Some environments have a built-in implementation to test the existence of an element. However, this solution performs poorly with millions of numbers. The time spent is quadratic in n. On a computer with a 2-core 2.10 GHz CPU and 2 GB RAM, the C implementation takes 5.4s to find the answer among 100K numbers, and exceeds 8 minutes to handle 1 million numbers.
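For reference, the same exhaustive search can be written as a one-line functional sketch (in Haskell; the name minFree is ours, not from any library):

minFree :: [Int] -> Int
minFree xs = head [x | x <- [0..], x `notElem` xs]

It scans the candidates 0, 1, 2, ... and returns the first one not in xs; like the pseudo-code above, it is quadratic in the worst case.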


Improvement
For n numbers x1 , x2 , ..., xn , if there is a free number, some xi must be out of the range
[0, n)1 ; otherwise the list is exactly some permutation of 0, 1, ..., n − 1 hence n is the
minimum free number.

minfree(x1 , x2 , ..., xn ) ≤ n (1)

We use an array F of n + 1 flags to mark whether a number is free in [0, n].


1: function Min-Free(A)
2: F ← [False, False, ..., False]        ▷ n + 1 flags
3: for x in A do
4: if x < n then
5: F [x] ← True
6: for i ← 0 to n do
7: if F [i] = False then
8: return i
Initialize F with all False values. For every number x in A, mark the flag F[x] True if x < n. Finally, scan F to find the first False flag. This program takes time proportional to n. It uses n + 1 flags to cover the special case that sort(A) = [0, 1, 2, ..., n − 1]. To avoid repeatedly allocating and then releasing the flags, we can pre-allocate a sufficiently big buffer for reuse, and switch to bit-wise flags instead of an array. The C implementation handles 1 million numbers in 0.023s on the same computer.
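The flag method also has a purely functional sketch (in Haskell, using an immutable accumulation array instead of the mutable buffer described above; minFree is again our own name):

import Data.Array (accumArray, elems)

minFree :: [Int] -> Int
minFree xs = length (takeWhile id (elems flags))
  where
    n = length xs
    -- flags ! i is True iff i occurs in xs, for i in [0, n]
    flags = accumArray (||) False (0, n) [(x, True) | x <- xs, x < n]

Counting the leading True flags gives the index of the first False flag, i.e., the smallest free number.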

Divide and Conquer

The divide and conquer method breaks the problem into smaller ones, then solves them and consolidates the results. Collect the elements xi ≤ ⌊n/2⌋ in a sub-list A′, and the rest in another sub-list A″. According to eq. (1), if A′ is 'full' (its length equals the number of slots in [0, ⌊n/2⌋]), the minimum free number must be in A″; otherwise it is in A′. Both cases lead to a smaller problem. When searching in A″, we start from ⌊n/2⌋ + 1, not from 0. We define the algorithm as search(A, l, u), where l and u are the lower and upper bounds, and start from minfree(A) = search(A, 0, |A| − 1):

search(∅, l, u) = l

search(A, l, u) = { |A′| = m − l + 1 : search(A″, m + 1, u)
                  { otherwise :        search(A′, l, m)

where:

m = ⌊(l + u)/2⌋
A′ = [x ≤ m, x ∈ A],    A″ = [x > m, x ∈ A]

This algorithm needn't allocate additional space². Each recursive call performs O(|A|) comparisons to partition A′ and A″, and halves the problem: T(n) = T(n/2) + O(n), which reduces to O(n) according to the master theorem³. The example program below implements this algorithm.

¹ The range [a, b) starts from a and excludes b.
² The recursion takes O(lg n) stack space, but it can be eliminated through tail recursion optimization.
³ Alternatively, the first call takes O(n) time to partition A′ and A″, the second call takes O(n/2) time, the third call takes O(n/4) time, ... The total time is O(n + n/2 + n/4 + ...) = O(2n) = O(n).

minFree xs = bsearch xs 0 (length xs - 1)

bsearch xs l u | xs == [] = l
               | length as == m - l + 1 = bsearch bs (m + 1) u
               | otherwise = bsearch as l m
  where
    m = (l + u) `div` 2
    as = [x | x ← xs, x ≤ m]
    bs = [x | x ← xs, x > m]

There are O(lg n) recursive calls. We can eliminate the recursion with loops:
1: function Min-Free(A)
2: l ← 0, u ← |A|
3: while u − l > 0 do
4: m ← ⌊(l + u)/2⌋
5: left ← l
6: for right ← l to u − 1 do
7: if A[right] ≤ m then
8: Exchange A[left] ↔ A[right]
9: left ← left + 1
10: if left < m + 1 then
11: u ← left
12: else
13: l ← left
As shown in fig. 1, this program re-arranges the array such that all elements before left are less than or equal to m, while those between left and right are greater than m.

Figure 1: All A[i] ≤ m where 0 ≤ i < left; while A[i] > m where left ≤ i < right. The rest are yet to be scanned.

Regular number

The second problem is to find the 1,500th number whose only factors are 2, 3, or 5. Such numbers are called regular numbers⁴. 2, 3, and 5 are definitely regular numbers. 60 = 2²3¹5¹ is the 25th regular number. 21 = 2⁰3¹7¹ is not, because it has a factor of 7. Define 1 = 2⁰3⁰5⁰ as the 0th regular number. The first 10 are:

1, 2, 3, 4, 5, 6, 8, 9, 10, 12, ...

The brute-force solution

We can check numbers one by one from 1, extracting all factors of 2, 3 and 5 to see whether the remaining part is 1:

⁴ Also known as 5-smooth numbers in number theory, and as Hamming numbers, named after Richard Hamming.

1: function Regular-Number(n)

2: x←1
3: while n > 0 do
4: x←x+1
5: if Valid?(x) then
6: n←n−1
7: return x

8: function Valid?(x)
9: while x mod 2 = 0 do
10: x ← x/2
11: while x mod 3 = 0 do
12: x ← x/3
13: while x mod 5 = 0 do
14: x ← x/5
15: return x = 1
This 'brute-force' algorithm performs poorly as n increases. The C implementation takes 40.39s on the same computer to find the 1500th number (860934420).
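A functional sketch of this brute-force check (in Haskell; the names regular, valid, and reduce are ours):

regular :: Int -> Int
regular n = [x | x <- [1..], valid x] !! n
  where
    valid = (== 1) . reduce 5 . reduce 3 . reduce 2
    -- divide out the factor p repeatedly
    reduce p x = if x `mod` p == 0 then reduce p (x `div` p) else x

Since 1 is the 0th regular number, regular 1500 gives the answer, though just as slowly as the pseudo-code above.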

Improvement

Modulo and division are expensive operations [2]. Instead of checking every number, we can generate regular numbers from 1 in ascending order. We use a queue, which allows adding a number at one end (enqueue) and removing from the other end (dequeue): the number enqueued first is dequeued first (First In, First Out). Initialize the queue with 1, the 0th regular number. We repeatedly dequeue a number, multiply it by 2, 3, and 5 to generate 3 new numbers, then add them to the queue in ascending order. We drop a number if it is already in the queue, as shown in fig. 2.

Figure 2: The first 4 steps of generating regular numbers with a queue.

1: function Regular-Number(n)
2: Q ← [1]
3: while n > 0 do
4: x ← Dequeue(Q)
5: Unique-Enqueue(Q, 2x)
6: Unique-Enqueue(Q, 3x)
7: Unique-Enqueue(Q, 5x)
8: n←n−1
9: return x

10: function Unique-Enqueue(Q, x)


11: i ← 0, m ← |Q|
12: while i < m and Q[i] < x do
13: i←i+1
14: if i ≥ m or x ≠ Q[i] then
15: Insert(Q, i, x)
The Unique-Enqueue function takes O(m) time to insert a number uniquely in ascending order, where m = |Q| is the length of the queue. The queue length grows proportionally to n (each time we dequeue one element and enqueue at most 3, so the net growth per step is at most 2). The total time is therefore O(1 + 2 + 3 + ... + n) = O(n²). Figure 3 plots the queue access count against n; the quadratic curve reflects the O(n²) performance.

Figure 3: Queue access count against n.

The corresponding C implementation takes 0.016s to output 860934420, about 2500 times faster than the brute-force solution. Let xs = [x1, x2, x3, ...] be the infinite list of all regular numbers. Multiplying every number by 2 yields again infinitely many regular numbers: [2x1, 2x2, 2x3, ...], and likewise for 3 and 5. If we merge the three infinite lists, filter out the duplicates, and prepend 1 as the head, we get xs again:

xs = 1 : [2x | x ← xs] ∪ [3x | x ← xs] ∪ [5x | x ← xs]        (2)

Where x:xs links x before the list xs (it is called 'cons' in Lisp), and 1 is linked as the head (the 0th regular number). ∪ merges two lists:

(a:as) ∪ (b:bs) = { a < b : a : (as ∪ (b:bs))
                  { a = b : a : (as ∪ bs)
                  { a > b : b : ((a:as) ∪ bs)

Below is the example program:



xs = 1 : [2∗x | x ← xs] `merge` [3∗x | x ← xs] `merge` [5∗x | x ← xs]

merge (a:as) (b:bs) | a < b = a : merge as (b:bs)
                    | a == b = a : merge as bs
                    | otherwise = b : merge (a:as) bs

This example program gives the 1500th number, 860934420, via xs !! 1500 in 0.03s on the same computer.

Queues

The above solution needs to filter out duplicated numbers and scan the queue to maintain ascending order. We can categorize all regular numbers into 3 disjoint buckets: Q2 = {2^i | i > 0}, Q23 = {2^i 3^j | i ≥ 0, j > 0}, and Q235 = {2^i 3^j 5^k | i, j ≥ 0, k > 0}. The constraints j ≠ 0 in Q23 and k ≠ 0 in Q235 ensure there is no overlap. Realize the buckets as 3 queues, starting from Q2 = {2}, Q23 = {3}, and Q235 = {5}. Each time, extract the smallest x from the queues, then do the following:

• If x comes from Q2, enqueue 2x to Q2, 3x to Q23, and 5x to Q235;

• If x comes from Q23, enqueue 3x to Q23, and 5x to Q235. We do not add 2x to Q2, because Q2 does not hold any numbers divisible by 3;

• If x comes from Q235, enqueue 5x to Q235. We do not add 2x to Q2 or 3x to Q23, because they don't hold numbers divisible by 5.

We reach the answer after dequeuing n numbers. Figure 4 gives the first 4 steps.

Figure 4: The first 4 steps with Q2, Q23, and Q235.

1: function Regular-Number(n)
2: x←1
3: Q2 ← {2}, Q23 ← {3}, Q235 ← {5}
4: while n > 0 do
5: x ← min(Head(Q2 ), Head(Q23 ), Head(Q235 ))
6: if x = Head(Q2 ) then
7: Dequeue(Q2 )
8: Enqueue(Q2 , 2x)
9: Enqueue(Q23 , 3x)
10: Enqueue(Q235 , 5x)
11: else if x = Head(Q23 ) then
12: Dequeue(Q23 )
13: Enqueue(Q23 , 3x)
14: Enqueue(Q235 , 5x)
15: else
16: Dequeue(Q235 )
17: Enqueue(Q235 , 5x)
18: n←n−1
19: return x
This algorithm loops n times. Each iteration extracts the minimum number in constant time, then adds at most 3 numbers to the queues, each in constant time. The overall performance is O(n).
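A sketch of this three-queue method in Haskell follows (the names are ours; plain lists stand in for queues, so enqueueing with ++ is not truly constant time here):

regular :: Int -> Integer
regular 0 = 1
regular n = go n [2] [3] [5]
  where
    go k (a:as) (b:bs) (c:cs)
      | k == 1 = x  -- the n-th dequeued number is the answer
      | x == a = go (k - 1) (as ++ [2*x]) (b : bs ++ [3*x]) (c : cs ++ [5*x])
      | x == b = go (k - 1) (a:as) (bs ++ [3*x]) (c : cs ++ [5*x])
      | otherwise = go (k - 1) (a:as) (b:bs) (cs ++ [5*x])
      where x = minimum [a, b, c]

For example, regular 1500 yields 860934420.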

Summary

Both brute-force solutions fail to scale. This book is not about coding contests or code interviews, but aims to provide both purely functional algorithms and their counterpart imperative implementations. We reference many results from Okasaki's work [3] and from classic textbooks [4]. We avoid relying on a specific programming language, because the reader may or may not be familiar with it, and programming languages keep changing; instead, we use pseudo-code or mathematical notation to keep the algorithm definitions generic. When giving code examples, the functional ones look like Haskell, and the imperative ones look like a mix of several languages.

I wrote the first edition from 2009 to 2017, then rewrote the second edition and added answers to the 119 exercises from 2020 to 2023. The PDF can be downloaded from GitHub.

Exercise 1

1.1. For the free number puzzle, since all numbers are non-negative, we can reuse the sign as a flag. For every |x| < n (where n is the length), negate the number at position |x|. Then scan to find the first positive number; its position is the answer. Write a program to realize this solution.
1.2. There are n numbers 1, 2, ..., n. After some processing, they are shuffled, and one number x is altered to y. Suppose 1 ≤ y ≤ n; design a solution to find x and y in linear time with constant space.
1.3. Below example program is a solution for the regular number puzzle. Is it equivalent to the queue based solution?
Int regularNum(Int m) {
[Int] nums(m + 1)
Int n = 0, i = 0, j = 0, k = 0
nums[0] = 1
Int x2 = 2 ∗ nums[i]
Int x3 = 3 ∗ nums[j]
Int x5 = 5 ∗ nums[k]
while n < m {
n = n + 1
nums[n] = min(x2, x3, x5)
if x2 == nums[n] {
i = i + 1
x2 = 2 ∗ nums[i]
}
if x3 == nums[n] {
j = j + 1
x3 = 3 ∗ nums[j]
}
if x5 == nums[n] {
k = k + 1
x5 = 5 ∗ nums[k]
}
}
return nums[m]
}
Contents

Preface i

1 List 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.1 Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Basic operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.1 index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3.2 Last . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.3 Right index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.4 Mutate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
concatenate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.5 sum and product . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.6 maximum and minimum . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.1 map and for-each . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
For each . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.2 reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5 Sub-list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1 break and group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Fold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.7 Search and filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.8 zip and unzip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2 Binary Search Tree 27


2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Traverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 36

3 Insertion sort 39
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Binary search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4 List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5 Binary search tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4 Red-black tree 43


4.1 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 Delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.5 Imperative red-black tree? . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.6 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 54

5 AVL tree 57
5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.1 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2.2 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3 Imperative algorithm ⋆ . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 64

6 Radix tree 65
6.1 Integer trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.1.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.1.3 Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2 Integer prefix tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2.3 Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Trie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.3.1 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.3.2 Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.4 Prefix tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.4.1 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.4.2 Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.5 Applications of trie and prefix tree . . . . . . . . . . . . . . . . . . . . . . 79
6.5.1 Dictionary and input completion . . . . . . . . . . . . . . . . . . . 80
6.5.2 Predictive text input . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.6 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 84

7 B-Tree 89
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.2 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.2.1 Insert then split . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.2.2 Split before insert . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.2.3 Paired lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.3 Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.4 Delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.4.1 Delete and fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.4.2 Merge before delete . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.6 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 106

8 Binary Heaps 109


8.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.2 Binary heap by array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.2.1 Heapify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

8.2.2 Build . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112


8.2.3 Heap operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
8.2.4 Heap sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
8.3 Leftist heap and skew heap . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.3.1 Leftist heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
8.3.2 Skew heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
8.4 Splay heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.6 Appendix - example programs . . . . . . . . . . . . . . . . . . . . . . . . . 123

9 Selection sort 127


9.1 Find the minimum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
9.1.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
9.2 Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
9.2.1 Cock-tail sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
9.3 Further improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
9.3.1 Tournament knock out . . . . . . . . . . . . . . . . . . . . . . . . . 132
9.3.2 Heap sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.4 Appendix - example programs . . . . . . . . . . . . . . . . . . . . . . . . . 135

10 Binomial heap, Fibonacci heap, and pairing heap 139


10.1 Binomial Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
10.1.1 Binomial tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
10.1.2 Link . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
10.1.3 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10.1.4 Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
10.1.5 Pop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
10.2 Fibonacci heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
10.2.1 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
10.2.2 Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
10.2.3 Pop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
10.2.4 Increase priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
10.2.5 The name of Fibonacci heap . . . . . . . . . . . . . . . . . . . . . 153
10.3 Pairing Heaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
10.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
10.3.2 Merge, insert, and top . . . . . . . . . . . . . . . . . . . . . . . . . 154
10.3.3 decrease key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
10.3.4 Pop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
10.3.5 Delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
10.4 Appendix - example programs . . . . . . . . . . . . . . . . . . . . . . . . . 158

11 Queue 163
11.1 Linked-list queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
11.2 Circular buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
11.3 Paired-list queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
11.4 Balance Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
11.5 Real-time queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
11.6 Lazy real-time queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
11.7 Appendix - example programs . . . . . . . . . . . . . . . . . . . . . . . . . 171

12 Sequence 173
12.1 Binary random access list . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

12.2 Numeric representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177


12.3 paired-array sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
12.4 Concatenate-able list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
12.5 Finger tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
12.5.1 Insert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
12.5.2 Extract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
12.5.3 Append and remove . . . . . . . . . . . . . . . . . . . . . . . . . . 183
12.5.4 concatenate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
12.5.5 Random access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
12.6 Appendix - example programs . . . . . . . . . . . . . . . . . . . . . . . . . 187

13 Quick sort and merge sort 193


13.1 Quick sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
13.1.1 Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
13.1.2 In-place sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
13.1.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Average case ⋆ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
13.1.4 Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
Challenging cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
13.1.5 quick sort and tree sort . . . . . . . . . . . . . . . . . . . . . . . . 205
13.2 Merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
13.2.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
13.2.2 In-place merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
13.2.3 Nature merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
13.2.4 Bottom-up merge sort . . . . . . . . . . . . . . . . . . . . . . . . . 215
13.3 Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
13.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
13.5 Appendix: Example programs . . . . . . . . . . . . . . . . . . . . . . . . . 217

14 Solution search 221


14.1 k selection problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
14.2 Binary search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
14.2.1 2D search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
14.3 The majority number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
14.4 Maximum sum of sub-vector . . . . . . . . . . . . . . . . . . . . . . . . . . 233
14.5 String matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
14.6 Solution search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
14.6.1 DFS and BFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Maze . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
The eight queens puzzle . . . . . . . . . . . . . . . . . . . . . . . . 239
The peg puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
The wolf, goat, and cabbage puzzle . . . . . . . . . . . . . . . . . . 244
Water jugs puzzle . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Kloski . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
14.6.2 Greedy algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Huffman coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Change making problem . . . . . . . . . . . . . . . . . . . . . . . . 258
14.6.3 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . 259
Longest common sub-sequence . . . . . . . . . . . . . . . . . . . . 261
Subset sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
14.7 Appendix - example programs . . . . . . . . . . . . . . . . . . . . . . . . . 265

Appendices

Imperative delete for red-black tree 275

AVL tree - proofs and the delete algorithm 283


I Height increment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
II Balance adjustment after insert . . . . . . . . . . . . . . . . . . . . . . . . 284
III Delete algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
∗ Functional delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
† Imperative delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
IV Example program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

Answers 293

GNU Free Documentation License 351


1. APPLICABILITY AND DEFINITIONS . . . . . . . . . . . . . . . . . . . . 351
2. VERBATIM COPYING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
3. COPYING IN QUANTITY . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
4. MODIFICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
5. COMBINING DOCUMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . 355
6. COLLECTIONS OF DOCUMENTS . . . . . . . . . . . . . . . . . . . . . . 355
7. AGGREGATION WITH INDEPENDENT WORKS . . . . . . . . . . . . . 355
8. TRANSLATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
9. TERMINATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
10. FUTURE REVISIONS OF THIS LICENSE . . . . . . . . . . . . . . . . . 356
11. RELICENSING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
ADDENDUM: How to use this License for your documents . . . . . . . . . . . 357
Chapter 1

List

1.1 Introduction

List and array are building blocks for more complex data structures. Both hold multiple elements as a container. An array is a range of consecutive cells indexed by a number (its address), and is typically bounded with a fixed size, while a list grows on demand. One can traverse a list one by one from head to tail. Particularly in functional settings, the list plays a critical role in controlling computation and logic flow¹. Readers already familiar with map, filter, and fold can safely skip this chapter and start directly from chapter 2.

1.2 Definition

A list, or singly linked list, is a data structure defined recursively: a list is either empty, denoted as [ ] or NIL; or contains an element (also called key) and links to a sub-list (called next). Figure 1.1 shows a list of nodes; every node links to the next node, or to NIL (for the last one). We often define the list as a compound structure², for example:

Figure 1.1: A list of nodes

data List<A> {
A key
List<A> next
}

Many programming environments support the NIL concept. There are two ways to represent the empty list: use NIL (or null, or ∅) directly, or create a list but put nothing in it, as [ ]. From the implementation perspective, NIL needn't allocate any memory, while [ ] does.

¹ At the low level, lambda calculus plays the most critical role as one of the computation models equivalent to the Turing machine [93], [99].
² In most cases, the data stored in a list have the same type. However, there are also heterogeneous lists, like the lists in Lisp for example.

1.2.1 Access

Given a non-empty list X, define two functions³ to access the first element and the rest sub-list. They are often called first X and rest X, or head X and tail X⁴. Conversely, we can construct a list from an element x and another list xs (which can be empty) as x:xs; this is called the cons operation. We have the following equations:

head (x:xs) = x
tail (x:xs) = xs        (1.1)

For a non-empty list X, we also denote the first element as x1, and the rest sub-list as X′. For example, when X = [x1, x2, x3, ...], then X′ = [x2, x3, ...].

Exercise 1.2

1.2.1. For a list of type A, suppose we can test whether any two elements x, y ∈ A are equal. Define an algorithm to test whether two lists are equal.

1.3 Basic operations

From the definition, we can count the length recursively: the length of the empty list is 0; otherwise it is 1 plus the length of the sub-list.

length [ ] = 0
length (x:xs) = 1 + length xs        (1.2)

We traverse the list to count the length, so the performance is bound to O(n), where n is the number of elements. We write |X| for the length of X when the context is clear. To avoid repeated counting, we can persist the length in a variable and update it upon mutation (add or delete). Below is the iterative length counting:
1: function Length(X)
2: n←0
3: while X ≠ NIL do
4: n←n+1
5: X ← Next(X)
6: return n

1.3.1 index

An array supports random access at position i in constant time, while for a list we need to traverse i steps to reach the target element.

getAt i (x:xs) = { i = 0 : x
                 { i ≠ 0 : getAt (i − 1) xs        (1.3)

We leave the empty list unhandled: the behavior for [ ] is undefined, and hence the out-of-bound case also leads to undefined behavior. If i > |X|, we end up at the edge case of accessing the (i − |X|)-th position of the empty list. On the other hand, if i < 0, subtracting one makes it even farther away from 0, finally ending at some negative position of the empty list. getAt is bound to O(i) time, as it advances the list i steps. Below is the imperative implementation:
³ We often write function f(x) as f x, and f(x, y, ..., z) as f x y ... z.
⁴ They are named car and cdr in Lisp due to the design of machine registers [63].

1: function Get-At(i, X)
2: while i 6= 0 do
3: X ← Next(X)        ▷ Error when X = NIL
4: i←i−1
5: return First(X)

Exercise 1.3

1.3.1. For the iterative Get-At(i, X), what is the behavior when X is empty? What if i is out of bound?

1.3.2 Last

There is a pair of operations symmetric to 'first/rest': 'last/init'. For a non-empty list X = [x1, x2, ..., xn], the function last returns the tail element xn, while init returns the sub-list [x1, x2, ..., xn−1]. Although the pairs are mirror images of each other, 'last/init' need to traverse the list, taking linear time.

last [x] = x                 init [x] = [ ]
last (x:xs) = last xs        init (x:xs) = x : init xs        (1.4)

Neither handles the empty list: the behavior with [ ] is undefined. Below are the iterative implementations:
1: function Last(X)
2: x ← NIL
3: while X ≠ NIL do
4: x ← First(X)
5: X ← Rest(X)
6: return x

7: function Init(X)
8: X 0 ← NIL
9: while Rest(X) ≠ NIL do        ▷ Error when X is NIL
10: X 0 ← Cons(First(X), X 0 )
11: X ← Rest(X)
12: return Reverse(X 0 )
Init accumulates the result through Cons, but in reversed order; we need to reverse (section 1.4.2) it back.

1.3.3 Right index

last is a special case of right indexing. The generic case is to find the i-th element from the right. The naive implementation traverses in two rounds: count the length n first, then access the (n − i − 1)-th element from the left:

lastAt i X = getAt (|X| − i − 1) X

A better solution uses two pointers p1, p2 at distance i, i.e., rest^i(p2) = p1, where rest^i(p2) means repeatedly applying rest i times: after advancing p2 by i steps, it meets p1. Let p2 start from the head. Advance both pointers in parallel till p1 arrives at the tail. At this point, p2 points exactly to the i-th element from the right, as shown in fig. 1.2. p1 and p2 form a sliding window of width i.

Figure 1.2: Sliding window. (a) p2 starts from the head, i steps behind p1; (b) when p1 reaches the tail, p2 points to the i-th element from the right.

1: function Last-At(i, X)
2: p←X
3: while i > 0 do
4: X ← Rest(X)        ▷ Error if out of bound
5: i←i−1
6: while Rest(X) ≠ NIL do
7: X ← Rest(X)
8: p ← Rest(p)
9: return First(p)
We can't alter pointers in purely functional settings. Instead, we advance along two lists X = [x1, x2, ..., xn] and Y = [xi+1, xi+2, ..., xn] simultaneously, where Y is the sub-list of X without the first i elements.

lastAt i X = slide X (drop i X)        (1.5)

Where:

slide (x:xs) [y] = x
slide (x:xs) (y:ys) = slide xs ys        (1.6)

Function drop m X discards the first m elements.

drop 0 xs = xs
drop m [ ] = [ ]
drop m (x:xs) = drop (m − 1) xs        (1.7)
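These equations translate directly into a Haskell sketch (lastAt and slide as our own names; the index i counts from the right, starting at 0):

lastAt :: Int -> [a] -> a
lastAt i xs = slide xs (drop i xs)
  where
    slide (x:_) [_] = x            -- the window reached the tail
    slide (_:xs') (_:ys) = slide xs' ys

For example, lastAt 0 [1,2,3] is 3, and lastAt 1 [1,2,3] is 2.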

Exercise 1.4

1.4.1. In the Init algorithm, can we use Append(X′, First(X)) instead of Cons?
1.4.2. How to handle the empty list or out-of-bound error in Last-At?

1.3.4 Mutate

Mutation includes append, insert, update, and delete. The functional environment actually implements mutation by creating a new list for the changed part, while keeping (persisting) the original one for reuse, or releasing it at some time (chapter 2 in [3]). Append is the symmetric operation of cons: it appends an element at the tail, while cons adds at the head. It is also known as 'snoc' ('cons' reversed). As it needs to traverse the list to the tail, the performance is O(n), where n is the length. To avoid repeated traversal, we can persist the tail reference and update it upon changes.

append [ ] x = [x]
append (y:ys) x = y : append ys x        (1.8)

Below is the corresponding iterative implementation⁵:

1: function Append(X, x)
2: if X = NIL then
3: return Cons(x, NIL)
4: H ← X        ▷ Copy of the head
5: while Rest(X) ≠ NIL do
6: X ← Rest(X)
7: Rest(X) ← Cons(x, NIL)
8: return H
To update the Rest, it is typically implemented by updating the next reference, for
example:
List<A> append(List<A> xs, A x) {
if xs == null then return cons(x, null)
var head = xs
while xs.next ≠ null {
xs = xs.next
}
xs.next = cons(x, null)
return head
}

Similar to getAt, we need to advance to the target position i in O(i) time to change the element.

setAt 0 x (y:ys) = x : ys
setAt i x (y:ys) = y : setAt (i − 1) x ys        (1.9)

Exercise 1.5

1.5.1. Add a 'tail' reference, and optimize append to constant time.
1.5.2. When do we need to update the tail reference? How does it affect the performance?
1.5.3. Handle the empty list and out-of-bound error for setAt.

insert

There are two different cases for insertion: (1) insert an element at a given position: insert i x X, similar to setAt; (2) insert an element into a sorted list, maintaining the ordering.

⁵ The parameter orders are also symmetric: cons x xs and append xs x.

insert 0 x ys = x : ys
insert i x (y:ys) = y : insert (i − 1) x ys        (1.10)

When i exceeds the length, treat it as append (see the exercise of this section). Below is the iterative implementation:
the iterative implementation:
1: function Insert(i, x, X)
2: if i = 0 then
3: return Cons(x, X)
4: H←X
5: p←X
6: while i > 0 and X ≠ NIL do
7: p←X
8: X ← Rest(X)
9: i←i−1
10: Rest(p) ← Cons(x, X)
11: return H
Let the list L = [x1, x2, ..., xn] be sorted, i.e., for any positions 1 ≤ i ≤ j ≤ n, we have xi ≤ xj, where ≤ is an abstract ordering (it can be ≥, the subset relation between sets, etc.). We define insert to maintain the ordering.

insert x [ ] = [x]
insert x (y:ys) = { x ≤ y : x : y : ys
                  { otherwise : y : insert x ys        (1.11)

Since it needs to compare elements one by one, the performance is bound to O(n) time, where n is the length. Below is the iterative implementation:

1: function Insert(x, X)
2: if X = NIL or x < First(X) then
3: return Cons(x, X)
4: H ← X
5: while Rest(X) ≠ NIL and First(Rest(X)) < x do
6: X ← Rest(X)
7: Rest(X) ← Cons(x, Rest(X))
8: return H
With insert, we can further define insertion sort: repeatedly insert elements into an (initially empty) sorted list. Since each insert takes linear time, the overall time is bound to O(n²).

sort [ ] = [ ]
sort (x:xs) = insert x (sort xs)        (1.12)
We can eliminate the recursion to implement the iterative sort: scan the list, and insert the elements one by one:
1: function Sort(X)
2: S ← NIL
3: while X ≠ NIL do
4: S ← Insert(First(X), S)
5: X ← Rest(X)
6: return S
At any time during the loop, S is sorted. The recursive implementation processes the list from the right, while the iterative one goes from the left. We'll use 'tail recursion' in section 1.3.5 to eliminate this difference. Chapter 3 covers insertion sort in detail, including performance analysis and optimization.
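As a sketch, eqs. (1.11) and (1.12) transcribe directly to Haskell:

insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys) | x <= y = x : y : ys
                | otherwise = y : insert x ys

sort :: Ord a => [a] -> [a]
sort [] = []
sort (x:xs) = insert x (sort xs)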

Exercise 1.6

1.6.1. Handle the out-of-bound case in insert, treating it as append.
1.6.2. Implement insert for arrays. When inserting at position i, all elements after i need to shift toward the end.

delete

Symmetric to insert, there are two cases for deletion: (1) delete the element at a position: delAt i X; (2) look up, then delete the element of a given value: delete x X. To delete the element at position i, we advance i steps, bypass the element, and link the rest sub-list.

delAt i [ ] = [ ]
delAt 0 (x:xs) = xs
delAt i (x:xs) = x : delAt (i − 1) xs        (1.13)

It is bound to O(i) time. Below is the iterative implementation.


1: function Del-At(i, X)
2: S ← Cons(⊥, X)        ▷ Sentinel node
3: p←S
4: while i > 0 and X ≠ NIL do
5: i←i−1
6: p←X
7: X ← Rest(X)
8: if X ≠ NIL then
9: Rest(p) ← Rest(X)
10: return Rest(S)
To simplify the implementation, we introduce a sentinel node S: it contains a special value ⊥ and points to X. With S, we are safe to cut off any node in X, even the head. Finally, we return the list after S as the result, and discard S. For 'find and delete', there are two sub-cases: (1) find and delete the first occurrence of a value; (2) remove all occurrences. The latter is more generic (see the exercise).

delete x [ ] = [ ]
delete x (y:ys) = { x = y : ys
                  { x ≠ y : y : delete x ys        (1.14)

Because we scan the list to find the target element, the time is bound to O(n), where
n is the length. We use a sentinel node to simplify the iterative implementation too:
1: function Delete(x, X)
2: S ← Cons(⊥, X)
3: p←X
4: while X ≠ NIL and First(X) ≠ x do
5: p←X
6: X ← Rest(X)
7: if X ≠ NIL then
8: Rest(p) ← Rest(X)
9: return Rest(S)

Exercise 1.7

1.7.1. Implement the algorithm to find and delete all occurrences of a given value.
1.7.2. Design the delete algorithm for arrays: all elements after the deleted position need to shift to the front.

concatenate

Append is a special case of concatenation: it adds only one element, while concatenation adds multiple. However, the performance would be quadratic if we repeatedly append. Let |xs| = n and |ys| = m be the lengths; we would need to advance to the tail of xs for m rounds, giving performance O(n + (n + 1) + ... + (n + m)) = O(nm + m²).

xs ++ [ ] = xs
xs ++ (y:ys) = (append xs y) ++ ys

Since 'cons' is fast (constant time), we can instead traverse to the tail of xs only once, then link to ys.

[ ] ++ ys = ys
xs ++ [ ] = xs
(x:xs) ++ ys = x : (xs ++ ys)        (1.15)

This improvement gives O(n) performance. In imperative settings, we can implement concatenation in constant time with a tail reference variable (see exercise).
1: function Concat(X, Y )
2: if X = NIL then
3: return Y
4: if Y = NIL then
5: return X
6: H←X
7: while Rest(X) ≠ NIL do
8: X ← Rest(X)
9: Rest(X) ← Y
10: return H

1.3.5 sum and product

Sum and product have the same structure. We will introduce how to abstract them into higher-order computations in section 1.6. For the empty list, define the sum as 0 and the product as 1.

sum [ ] = 0                    product [ ] = 1
sum (x:xs) = x + sum xs        product (x:xs) = x · product xs        (1.16)

Both need to traverse the list, hence the performance is O(n). They compute from right to left. We can change them to accumulate the result from left to right.

sum′ a [ ] = a                       prod′ a [ ] = a
sum′ a (x:xs) = sum′ (x + a) xs      prod′ a (x:xs) = prod′ (x · a) xs        (1.17)

Given a list, we call sum′ with 0 and prod′ with 1 to start accumulating:

sum xs = sum′ 0 xs        product xs = prod′ 1 xs        (1.18)

Or in Curried form:

sum = sum′ 0        product = prod′ 1
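As a small Haskell sketch of the accumulating versions (our own names):

sum' :: Num a => a -> [a] -> a
sum' a [] = a
sum' a (x:xs) = sum' (x + a) xs

prod' :: Num a => a -> [a] -> a
prod' a [] = a
prod' a (x:xs) = prod' (x * a) xs

For example, sum' 0 [1..100] is 5050, and prod' 1 [1..5] is 120.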

Curried form was introduced by Schönfinkel (1889–1942) in 1924, then widely used by Haskell Curry from 1958; it is known as Currying [73]. For a function of 2 parameters f(x, y), when we fix x with a value, it becomes a function of y: g(y) = f(x, y), or g = f x. For multiple variables of f(x, y, ..., z), we convert it to a series of Curried functions f, f x, f x y, ..., each taking one parameter: f(x, y, ..., z) = f(x)(y)...(z) = f x y ... z.

The accumulating implementation computes from left to right, and needn't bookkeep any context, state, or intermediate result for the recursion. All states are either passed as arguments (for example a) or dropped (for example the previous element). We can further optimize such recursive calls into loops. Because the recursion happens at the tail of the function, we call it tail recursion (or a 'tail call'), and the process of eliminating the recursion 'tail recursion optimization' [61]. It greatly improves the performance and avoids stack overflow due to deep recursion. In eq. (1.12) about insertion sort, the recursive implementation sorts elements from the right. We can also optimize it to a tail call:

sort′ a [ ] = a
sort′ a (x:xs) = sort′ (insert x a) xs        (1.19)

We pass [ ] to start sorting (Curried form): sort = sort′ [ ]. As a typical tail call example, consider how to compute bⁿ effectively (problem 1.16 in [63]). A direct implementation repeatedly multiplies b for n times starting from 1, which is bound to O(n) time:
1: function Pow(b, n)
2: x←1
3: loop n times
4: x←x·b
5: return x
To compute b⁸, after the first 2 loops we get x = b². At this stage, we needn't multiply x with b to get b³: we can directly compute x², which gives b⁴. Doing this once more gives (b⁴)² = b⁸. We only need to loop 4 times, not 8. If n = 2^m for some integer m ≥ 0, we can compute bⁿ fast as below:

b¹ = b
bⁿ = (b^(n/2))²

We next extend this divide and conquer method to any integer n ≥ 0: if n = 0, define b⁰ = 1; if n is even, we halve n to compute b^(n/2), then square it; if n is odd, since n − 1 is even, we recursively compute b^(n−1), then multiply by b:

b⁰ = 1
bⁿ = { 2|n : (b^(n/2))²
     { otherwise : b · b^(n−1)        (1.20)

However, (b^(n/2))² is not tail recursive. Alternatively, we square the base number and halve the exponent.

b⁰ = 1
bⁿ = { 2|n : (b²)^(n/2)
     { otherwise : b · b^(n−1)        (1.21)

With this change, we get a tail recursive function to compute bⁿ = pow(b, n, 1).

pow(b, 0, a) = a
pow(b, n, a) = { 2|n : pow(b², n/2, a)
               { otherwise : pow(b, n − 1, ab)        (1.22)

This implementation is bound to O(lg n) time. Writing n in binary format, n = (a_m a_{m−1} ... a_1 a_0)₂, we need to compute b^(2^i) whenever a_i = 1, similar to the Binomial heap algorithm (section 10.1), and finally multiply these powers together. For example, to compute b¹¹: as 11 = (1011)₂ = 2³ + 2 + 1, we have b¹¹ = b^(2³) × b² × b. We follow these steps:

1. Compute b¹, which is b;
2. Square it to b²;
3. Square it to b^(2²);
4. Square it to b^(2³).

Finally, multiply the results of steps 1, 2, and 4 to get b¹¹.

pow(b, 0, a) = a
pow(b, n, a) = { 2|n : pow(b², n/2, a)
               { otherwise : pow(b², ⌊n/2⌋, ab)        (1.23)
This algorithm essentially shifts n one bit to the right each time (i.e., divides n by 2). If the LSB (least significant bit) is 0, n is even; it squares the base and keeps the accumulator a unchanged. If the LSB is 1, n is odd; it squares the base and accumulates it into a. When n becomes zero, we have exhausted all bits, and a is the final result. At any time, the updated base b′, the shifted exponent n′, and the accumulator a satisfy the invariant bⁿ = a(b′)^(n′). The previous implementation subtracts one for odd n; the improvement halves n every time, so it runs exactly m rounds, where m is the number of bits. We leave the imperative implementation as an exercise.
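As a sketch, eq. (1.23) transcribes to Haskell as follows (our own transcription, assuming n ≥ 0):

pow :: Integer -> Integer -> Integer
pow b n = go b n 1
  where
    go _ 0 a = a
    go b n a | even n = go (b * b) (n `div` 2) a
             | otherwise = go (b * b) (n `div` 2) (a * b)

For example, pow 2 11 is 2048. Both branches square the base and halve the exponent; the odd branch also accumulates one factor of b.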
Back to sum and product: the iterative implementations apply plus and multiplication while traversing:
1: function Sum(X)
2: s←0
3: while X ≠ NIL do
4: s ← s + First(X)
5: X ← Rest(X)
6: return s

7: function Product(X)
8: p←1
9: while X ≠ NIL do
10: p ← p · First(X)
11: X ← Rest(X)
12: return p
With product, we can define the factorial of n as n! = product [1..n].

1.3.6 maximum and minimum

For a list of comparable elements (we can define an order between any two elements), there are the maximum and minimum. max/min share the same structure:

min [x] = x
min (x:xs) = { x < min xs : x
             { otherwise : min xs

max [x] = x
max (x:xs) = { x > max xs : x
             { otherwise : max xs        (1.24)
Both process the list from the right. We can change them to tail recursive, which also makes the computation 'on-line': at any time, the accumulator is the min/max so far. Take min for example:

min′ a [ ] = a
min′ a (x:xs) = { x < a : min′ x xs
                { otherwise : min′ a xs        (1.25)

Different from sum′/prod′, we can't pass a fixed starting value to min′/max′, unless it is ±∞ (Curried form):

min = min′ ∞        max = max′ (−∞)

Since min/max only take a non-empty list, we can pass the first element instead:

min (x:xs) = min′ x xs        max (x:xs) = max′ x xs        (1.26)

We can optimize the tail recursive implementation with loops. Take Min for example:
1: function Min(X)
2: m ← First(X)
3: X ← Rest(X)
4: while X ≠ NIL do
5: if First(X) < m then
6: m ← First(X)
7: X ← Rest(X)
8: return m
Alternatively, we can reuse the first element as the accumulator: every time, we compare the first two elements and drop one. Below is the example for min.

min [x] = x
min (x1:x2:xs) = { x1 < x2 : min (x1:xs)
                 { otherwise : min (x2:xs)        (1.27)

Exercise 1.8
1.8.1. Change length to tail recursive.
1.8.2. Compute bn through the binary format of n.

1.4 Transform

In algebra, there are two types of transformation: one keeps the list structure and only transforms the elements; the other alters the list structure, so the result is not isomorphic to the input. We call the former map.

1.4.1 map and for-each

The first example converts a list of numbers to strings: transform [3, 1, 2, 4, 5] to [“three”, “one”, “two”, “four”, “five”].

toStr [ ] = [ ]
toStr (x:xs) = (str x) : toStr xs        (1.28)

For the second example, given a dictionary, which is a list of words grouped by their
initials:
[[a, an, another, ... ],
[bat, bath, bool, bus, ...],
...,
[zero, zoo, ...]]

Next, process a text (Hamlet for example) and augment each word with its number of occurrences, like:

[[(a, 1041), (an, 432), (another, 802), ... ],
[(bat, 5), (bath, 34), (bool, 11), (bus, 0), ...],
...,
[(zero, 12), (zoo, 0), ...]]

Now, for every initial letter, which word occurs most often? The answer is a list of words, each having the most occurrences in its group, like [a, but, can, ...]. We need a program that transforms a list of groups of word-number pairs into a list of words. First, define a function that takes a list of word-number pairs and finds the word paired with the biggest number. Sorting is overkill; we need a special max function, maxBy cmp xs, where cmp is a generic compare function.

maxBy cmp [x] = x
maxBy cmp (x1:x2:xs) = { cmp x1 x2 : maxBy cmp (x2:xs)
                       { otherwise : maxBy cmp (x1:xs)        (1.29)

For a pair p = (a, b), we define two functions:

fst (a, b) = a
snd (a, b) = b        (1.30)

Then define a special compare function for word-count pairs:

less p1 p2 = snd p1 < snd p2        (1.31)

Then pass less to maxBy (in Curried form): max″ = maxBy less. Finally, call max″ to process the list:

solve [ ] = [ ]
solve (x:xs) = (fst (max″ x)) : solve xs        (1.32)

solve and toStr share the same structure; we abstract it as map:

map f [ ] = [ ]
map f (x:xs) = (f x) : map f xs        (1.33)

map takes a function f and applies it to every element to form a new list. A function that computes with other functions is called a high-order function. Let the type of f be A → B: it sends an element of A to a result of B. The type of map is:
1.4. TRANSFORM 13

map :: (A → B) → [A] → [B] (1.34)

Read as: map takes a function of A → B, and converts a list [A] to another list [B]. We can define the above two examples with map as below (in Curried form):

toStr = map str        solve = map (fst ◦ max″)

Where f ◦ g is function composition: first apply g, then apply f, i.e., (f ◦ g) x = f(g(x)), read as 'f after g'. From the set theory point of view, a function y = f(x) defines the map from x in set X to y in set Y:

Y = {f(x) | x ∈ X}        (1.35)

This type of set definition is called Zermelo–Fraenkel set abstraction (known as a ZF expression) [72]. The difference here is that the mapping is from a list (not a set) to another: Y = [f(x) | x ← X], where duplicated elements are allowed. For lists, such a ZF-style expression is called a list comprehension. It is a powerful tool. Let us see how to realize the permutation algorithm, for example. Extending from full permutations [72][94], we define a generic perm X r, which permutes r out of the total n elements in list X. There are P(n, r) = n!/(n − r)! such permutations.

perm X r = { |X| < r or r = 0 : [[ ]]
           { otherwise : [x:ys | x ← X, ys ← perm (delete x X) (r − 1)]        (1.36)

If we pick zero elements, or there are too few of them (less than r), the result is a list containing the empty list: [[ ]]. Otherwise, for every x in X, we recursively pick r − 1 out of the rest n − 1 elements, then prepend x to each result.
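Eq. (1.36) is itself nearly executable; a Haskell sketch (using delete from Data.List):

import Data.List (delete)

perm :: Eq a => [a] -> Int -> [[a]]
perm xs r | length xs < r || r == 0 = [[]]
          | otherwise = [x:ys | x <- xs, ys <- perm (delete x xs) (r - 1)]

For example, perm [1,2,3] 2 gives [[1,2],[1,3],[2,1],[2,3],[3,1],[3,2]].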
We use a sentinel node in the iterative Map implementation.
1: function Map(f, X)
2: X′ ← Cons(⊥, NIL)        ▷ the sentinel
3: p ← X′
4: while X ≠ NIL do
5: x ← First(X)
6: X ← Rest(X)
7: Rest(p) ← Cons(f (x), NIL)
8: p ← Rest(p)
9: return Rest(X′)        ▷ discard the sentinel

For each

Sometimes we only need to process the elements one by one, without building a new list; for example, to print every element:

1: function Print(X)
2: while X ≠ NIL do
3: print First(X)
4: X ← Rest(X)
More generally, we pass a procedure P , then apply P to each element.
1: function For-Each(P, X)
2: while X ≠ NIL do
3: P(First(X))
4: X ← Rest(X)

For example, consider the “n-lights puzzle” [96]: there are n lights in a room, all off. We execute the following for n rounds:

1. Switch all lights on;

2. Switch the lights numbered 2, 4, 6, ..., i.e., every other light;

3. Switch every third light, numbered 3, 6, 9, ...;

4. ...

In the last round, only the n-th light is switched. How many lights are on in the end? We start with a brute-force solution: represent the n lights as a list of 0/1 numbers (0: off, 1: on), starting from all zeros: [0, 0, ..., 0]. Label the lights from 1 to n, then map them to (i, on/off) pairs:

lights = map (i ↦ (i, 0)) [1, 2, ..., n]

It binds each number to zero, giving a list of pairs: L = [(1, 0), (2, 0), ..., (n, 0)]. We operate on this list of pairs for n rounds. In the i-th round, for every pair (j, x), if i|j (meaning j mod i = 0), then we switch it on/off. As 1 − 0 = 1 and 1 − 1 = 0, switching turns x into 1 − x.

switch i (j, x) = { j mod i = 0 : (j, 1 − x)
                  { otherwise : (j, x)        (1.37)

Realize the i-th round of operation as map (switch i) L (we use the Curried form of switch). Next, define a function op that performs the mapping on L over and over, for n rounds: op [1, 2, ..., n] L.

op [ ] L = L
op (i:is) L = op is (map (switch i) L)        (1.38)

Finally, sum the second value of each pair to get the answer.

solve n = sum (map snd (op [1, 2, ..., n] L))        (1.39)

Below is the example program:

solve = sum ◦ (map snd) ◦ proc where
  proc n = operate [1..n] (map (λi → (i, 0)) [1..n])
  operate [] xs = xs
  operate (i:is) xs = operate is (map (switch i) xs)
  switch i (j, x) = if j `mod` i == 0 then (j, 1 - x) else (j, x)

Run from 1 to 100 lights to give the answers below (line breaks added):

[1,1,1,
2,2,2,2,2,
3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,10]

They form a pattern: the first 3 answers are 1; the 4th to the 8th answers are 2; the 9th to the 15th answers are 3; ... It seems that the i²-th to the ((i + 1)² − 1)-th answers are all i. Let's prove it:

Proof. Given n lights labeled from 1 to n, all lights are off at the start. A light is on at the end iff it is switched an odd number of times. Light i is switched at round j exactly when j divides i (j|i), so only the lights with an odd number of factors are on in the end. The key to solving this puzzle is therefore to find all numbers that have an odd number of factors. For any natural number n, let S be the set of all factors of n, initialized as ∅. If p is a factor of n, there must exist a natural number q such that n = pq, meaning q is also a factor of n. We add 2 different factors to S if and only if p ≠ q, which keeps |S| even, unless p = q; in that case n is a square number, and we add only 1 factor to S, making the number of factors odd.

Hence we have a fast solution: count the square numbers not exceeding n.

solve(n) = ⌊√n⌋        (1.40)

Below example program outputs the answers for 1, 2, ..., 100 lights:

map (floor ◦ sqrt) [1..100]

Map is abstract and not limited to lists; it applies to many complex algebraic structures. The next chapter explains how to map trees. We can apply mapping as long as we can traverse the structure and the empty case is defined.

1.4.2 reverse

It's a good exercise to reverse a singly linked list with constant space. One must carefully manipulate the node references, but there is an easier path: (1) write a purely recursive solution; (2) change it to tail recursive; (3) convert it to an imperative implementation. The purely recursive solution is direct:

reverse [ ] = [ ]
reverse (x:xs) = append (reverse xs) x

Next, convert it to tail recursive: use an accumulator to store the reversed part, starting from empty: reverse = reverse′ [ ].

reverse′ a [ ] = a
reverse′ a (x:xs) = reverse′ (x:a) xs        (1.41)

Different from append, cons (:) takes constant time. We repeatedly extract the head element and prepend it to the accumulator, like pushing elements onto a stack and then popping them out. The overall performance is O(n), where n is the length. Since a tail call needn't keep the context, we next convert it to iterative loops:

1: function Reverse(X)
2: A ← NIL
3: while X ≠ NIL do
4: A ← Cons(First(X), A)
5: X ← Rest(X)
6: return A

However, this implementation creates a new reversed list rather than reversing in place. We change it further:

List<T> reverse(List<T> xs) {
    List<T> p, ys = null
    while xs ≠ null {
        p = xs           // detach the head node
        xs = xs.next
        p.next = ys      // prepend it to the reversed part
        ys = p
    }
    return ys
}

Exercise 1.9

1.9.1. Find the maximum v in a list of pairs [(k, v)] in a tail recursive way.

1.5 Sub-list
One can slice an array fast, but it takes linear time to traverse a list and extract a sub-list.
take extracts the first n elements, equivalent to sublist 1 n X. drop discards the first n
elements, equivalent to sublist (n + 1) |X| X, which is symmetric to take6:

take 0 xs     = [ ]                      drop 0 xs     = xs
take n [ ]    = [ ]                      drop n [ ]    = [ ]
take n (x:xs) = x : take (n − 1) xs      drop n (x:xs) = drop (n − 1) xs     (1.42)
When n > |X| or n < 0, it ends up with the empty list case. We leave the imperative
implementation as an exercise. We can extract the sub-list at any position for a given length:

sublist s n X = take n (drop (s − 1) X) (1.43)

Or slice the list with left and right boundaries:

slice s e X = drop (s − 1) (take e X) (1.44)

The range [s, e] includes both ends. We can split the list at a position:

splitAt i X = (take i X, drop i X) (1.45)
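
Below is a Haskell sketch of eqs. (1.43) to (1.45); it merely restates the equations
directly (positions are 1-based, as above):

sublist s n = take n ◦ drop (s - 1)
slice s e = drop (s - 1) ◦ take e
splitAt i xs = (take i xs, drop i xs)

-- sublist 3 2 [1..10] ⇒ [3, 4]
-- slice 3 5 [1..10] ⇒ [3, 4, 5]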

We can extend take/drop to keep taking or dropping as long as some condition is
satisfied. Define takeWhile/dropWhile, which scan every element with a predicate p, and
stop at the first element that does not satisfy it. They ignore the rest even if some
elements still satisfy p. We'll see this difference in the section about filtering.

takeWhile p [ ]    = [ ]
takeWhile p (x:xs) = x : takeWhile p xs,  if p(x)
                   = [ ],                 otherwise

dropWhile p [ ]    = [ ]                                         (1.46)
dropWhile p (x:xs) = dropWhile p xs,  if p(x)
                   = x:xs,            otherwise

6 Some programming languages provide built-in implementation, for example in Python: xs[:m] and

xs[m:] correspond to take and drop.



1.5.1 break and group


Break and group re-arrange a list into multiple sub-lists. They typically collect the sub-
lists while traversing to achieve linear performance. We can consider break/span a generic
splitting: not at a given position, break/span scan the list and extract the longest prefix
with a predicate p. There are two options for p: pick the elements that satisfy it, or pick
those that do not. The former is span, the latter is break.

span p [ ]    = ([ ], [ ])
span p (x:xs) = (x:as, bs),  if p(x), where (as, bs) = span p xs     (1.47)
              = ([ ], x:xs), otherwise

We define break by negating the predicate: break p = span (¬p). span and break
find the longest prefix. They stop immediately when the condition is broken, and ignore
the rest. Below is the iterative implementation of span:
1: function Span(p, X)
2: A←X
3: tail ← NIL
4: while X ≠ NIL and p(First(X)) do
5: tail ← X
6: X ← Rest(X)
7: if tail = NIL then
8: return (NIL, X)
9: Rest(tail) ← NIL
10: return (A, X)
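
As a small sketch, break can be defined from span by negating the predicate in Haskell;
the primed name merely avoids clashing with the standard library:

break' p = span (not ◦ p)

-- span even [2, 4, 6, 1, 2, 3] ⇒ ([2, 4, 6], [1, 2, 3])
-- break' (> 5) [1, 2, 9, 3] ⇒ ([1, 2], [9, 3])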
span and break cut the list into two parts, while group divides a list into multiple sub-lists.
For example, group a long string into small units, each containing consecutive identical characters:

group “Mississippi” = [“M”, “i”, “ss”, “i”, “ss”,“i”, “pp”, “i”]

For another example, given a list of numbers: X = [15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1,
4, 8, 3, 14, 2], divide it into small descending sub-lists:

group X = [[15, 9, 0], [12, 11, 7], [10, 5], [6], [13, 1], [4], [8, 3], [14, 2]]

Both are useful. We can build a radix tree from string groups to support fast text
search (chapter 6). We can implement the natural merge sort algorithm from number
groups (chapter 13). Abstract the group condition as a relation ∼. It tests whether two
consecutive elements x, y are ‘equivalent’: x ∼ y. We scan the list and compare two
consecutive elements each time. If they are equivalent, we add both to a group; otherwise
they go to two different ones.
group ∼ [ ]      = [[ ]]
group ∼ [x]      = [[x]]
group ∼ (x:y:xs) = (x:ys):yss,  if x ∼ y                         (1.48)
                 = [x]:ys:yss,  otherwise
    where (ys:yss) = group ∼ (y:xs)
It is bound to O(n) time, where n is the length. For the iterative implementation, if
X isn’t empty, initialize the result groups as [[x1 ]]. Scan from the second element, append
it to the last group if the two consecutive elements are ‘equivalent’; otherwise start a new
group.
1: function Group(∼, X)

2: if X = NIL then
3: return [[ ]]
4: x ← First(X)
5: X ← Rest(X)
6: g ← [x]
7: G ← [g]
8: while X ≠ NIL do
9: y ← First(X)
10: if x ∼ y then
11: g ← Append(g, y)
12: else
13: g ← [y]
14: G ← Append(G, g)
15: x←y
16: X ← Rest(X)
17: return G
However, the performance downgrades to quadratic without the tail reference opti-
mization for Append. We can change to Cons if we don't care about the order. We can
define the above two examples as group (=) “Mississippi” and group (≥) X. Alternatively,
we can realize grouping with span: repeatedly apply span to the rest till it becomes empty.
However, span takes a unary function as the predicate, while group needs a binary one.
We solve it with Currying: pass and fix the first argument.

group ∼ [ ]    = [[ ]]
group ∼ (x:xs) = (x:as) : group ∼ bs, where (as, bs) = span (x ∼) xs     (1.49)
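
Below is a Haskell sketch of eq. (1.49), where r stands for the relation ∼. Note we map
the empty list to [ ] here (a small deviation from eq. (1.49)) so that the recursion does
not leave a trailing empty group:

groupBy' _ [] = []
groupBy' r (x:xs) = (x:as) : groupBy' r bs where (as, bs) = span (r x) xs

-- groupBy' (==) "Mississippi" ⇒ ["M", "i", "ss", "i", "ss", "i", "pp", "i"]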
Although the new function groups strings correctly, it can't group numbers into de-
scending lists: group (≥) X = [[15,9,0,12,11,7,10,5,6,13,1,4,8,3,14,2]]. When we put the first
number 15 on the left hand side of ≥, it is the maximum, hence span ends with putting all
numbers into as and leaves bs empty. It is not a defect, but the correct behavior, because
group is defined to put equivalent elements together. The equivalence relation (∼) must
satisfy three axioms: reflexive, symmetric, and transitive:

1. Reflexive: x ∼ x;

2. Symmetric: x ∼ y ⇔ y ∼ x;

3. Transitive: x ∼ y, y ∼ z ⇒ x ∼ z.

When grouping “Mississippi”, the equality (=) operator satisfies the three axioms, hence
generates the correct result. However, the Curried (≥), viewed as an equivalence relation,
violates the symmetric axiom (x ≥ y does not imply y ≥ x), hence generates an unexpected
result. The second implementation via span limits its use case to strict equivalence, while
the first one does not: it only tests that the predicate holds for every two consecutive
elements, which is weaker than equivalence.

Exercise 1.10
1.10.1. Change the take/drop implementation: when n is negative, return [ ] for take,
and the entire list for drop.
1.10.2. Implement the in-place imperative take/drop.
1.10.3. Define sublist and slice in Curried Form without X as parameter.

1.10.4. Consider the below span implementation:


span p [ ]    = ([ ], [ ])
span p (x:xs) = (x:as, bs),  if p(x), where (as, bs) = span p xs
              = (as, x:bs),  otherwise

What is the difference here?

1.6 Fold
Almost all list algorithms share a common structure. It is not by chance: the common-
ality is rooted in the recursive nature of list. We can abstract the list algorithm to a
high level concept, fold7, which is essentially the initial algebra of all list computations [99] .
Observe sum, product, and sort for the common structure: the result for the empty list is
0 for sum, 1 for product, and [ ] for sort; and there is a binary operation applied to the head
and the recursive result: plus for sum, multiply for product, and ordered insertion for
sort. We abstract the result for the empty list as the initial value z (a generic zero), and the
binary operation as ⊕. Define h(⊕, z) as:

h(⊕, z) [ ]    = z                                               (1.50)
h(⊕, z) (x:xs) = x ⊕ (h(⊕, z) xs)

Feed a list X = [x1 , x2 , ..., xn ] and expand:


  h(⊕, z) [x1, x2, ..., xn]
= x1 ⊕ (h(⊕, z) [x2, x3, ..., xn])
= x1 ⊕ (x2 ⊕ (h(⊕, z) [x3, ..., xn]))
...
= x1 ⊕ (x2 ⊕ (...(xn ⊕ (h(⊕, z) [ ]))...))
= x1 ⊕ (x2 ⊕ (...(xn ⊕ z)...))
The parentheses are necessary, because the computation starts from the right-most
(xn ⊕ z) and repeatedly folds left towards x1. This is quite similar to folding a fan, as in
fig. 1.3. A fold-fan is made of bamboo and paper. Multiple frames stack together with an
axis at one end. The arc shaped paper is fully expanded by these frames; we can close the
fan by folding the paper. It ends up as a stick.

Figure 1.3: Fold fan

Consider the fold-fan as a list of bamboo frames. The binary operation is to fold a
frame to the top of the stack (initialized empty). To fold the fan, start from one end,
7 also known as reduce

repeatedly apply the binary operation, till all the frames are stacked. The sum and
product algorithms essentially do the same thing.

sum [1, 2, 3, 4, 5]           product [1, 2, 3, 4, 5]
= 1 + (2 + (3 + (4 + 5)))     = 1 × (2 × (3 × (4 × 5)))
= 1 + (2 + (3 + 9))           = 1 × (2 × (3 × 20))
= 1 + (2 + 12)                = 1 × (2 × 60)
= 1 + 14                      = 1 × 120
= 15                          = 120

We name this kind of process fold. Particularly, since the computation starts from the
right, we denote it as foldr:

foldr f z [ ]    = z                                             (1.51)
foldr f z (x:xs) = f x (foldr f z xs)

Define sum and product with foldr as below:

x1 + x2 + ... + xn = x1 + (x2 + (x3 + ... + (xn−1 + xn))...)     (1.52)
                   = foldr (+) 0 [x1, x2, ..., xn]

x1 × x2 × ... × xn = x1 × (x2 × (x3 × ... × (xn−1 × xn))...)     (1.53)
                   = foldr (×) 1 [x1, x2, ..., xn]

Or in Curried form: sum = foldr (+) 0, product = foldr (×) 1. For insertion-sort, it
is: sort = foldr insert [ ]. Next we convert foldr to tail recursive, generating the result
from the left, and denote it foldl:

foldl f z [ ]    = z                                             (1.54)
foldl f z (x:xs) = foldl f (f z x) xs

Use sum as an example; we can see how the computation expands from left to right:

  foldl (+) 0 [1, 2, 3, 4, 5]
= foldl (+) (0 + 1) [2, 3, 4, 5]
= foldl (+) (0 + 1 + 2) [3, 4, 5]
= foldl (+) (0 + 1 + 2 + 3) [4, 5]
= foldl (+) (0 + 1 + 2 + 3 + 4) [5]
= foldl (+) (0 + 1 + 2 + 3 + 4 + 5) [ ]
= 0 + 1 + 2 + 3 + 4 + 5

The evaluation of f(z, x) is delayed in every step (lazy evaluation); otherwise, they
would be evaluated as the sequence [1, 3, 6, 10, 15] along the calls. Generally, we can
expand foldl as (infix notation):

foldl (⊕) z [x1, x2, ..., xn] = z ⊕ x1 ⊕ x2 ⊕ ... ⊕ xn           (1.55)

foldl is tail recursive. We can convert it to a loop, called Reduce.


1: function Reduce(f, z, X)
2: while X ≠ NIL do
3: z ← f (z, First(X) )
4: X ← Rest(X)
5: return z
Both foldr and foldl have their own suitable use cases; they are not necessarily
exchangeable. For example, some containers only allow adding elements to one end (like
a stack). We can define a function fromList to build such a container from a list (in Curried
form):

fromList = foldr add ∅

Where ∅ is the empty container. The singly linked list is such a container: it performs
well (constant time) when adding an element to the head, but needs linear time to append
to the tail. foldr is a natural choice when duplicating a list while keeping the order, while
foldl generates a reversed list. As a workaround, we first reverse the list, then reduce it:
1: function Reduce-Right(f, z, X)
2: return Reduce(f, z, Reverse(X))
One may prefer foldl as it is tail recursive, and fits both functional and imperative
settings as an online algorithm. However, foldr plays a critical role when handling infinite
lists (modeled as streams) with lazy evaluation. For example, below program wraps every
natural number in a singleton list, and returns the first 10:
take 10 (foldr (x xs ↦ [x]:xs) [ ] [1, 2, ...])
⇒ [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
It does not work with foldl, as the evaluation would never end. We use the unified
notation fold when folding from either direction works, and foldl or foldr when the
direction matters. Although this chapter is about list, the fold concept is generic, and
applies to other algebraic structures. We can fold a tree (section 2.6 in [99]), a queue, and
many other objects, as long as the following 2 things are defined: (1) the empty case (for
example the empty tree); (2) how to decompose the recursive structure (like decomposing
a tree into sub-trees and a key). People abstract them further with concepts like foldable,
monoid, and traversable.
For example, let us implement the n-lights puzzle with fold and map. In the brute-
force solution, we create a list of pairs. Each pair (i, s) has a light number i, and an on/off
state s. For every round j, we switch the i-th light when j|i. Define this process with fold:

foldr step [(1, 0), (2, 0), ..., (n, 0)] [1, 2, ..., n]

All lights are off at the beginning. We fold the list of rounds 1 to n. Function step takes
two parameters: the round number i, and the list of pairs: step i L = map (switch i) L.
The result of foldr is the list of pairs of light number and final on/off state. We extract the
states through map, and count the on lights with sum:

sum (map snd (foldr step [(1, 0), (2, 0), ..., (n, 0)] [1, 2, ..., n]))     (1.56)
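
Below is a fold-based Haskell sketch of eq. (1.56); it merely restates the definitions
above in code:

solve n = sum (map snd (foldr step lights [1..n])) where
    lights = [(i, 0) | i ← [1..n]]
    step i ls = map (switch i) ls
    switch i (j, x) = if j `mod` i == 0 then (j, 1 - x) else (j, x)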

What if we fold a list of lists with ++ (eq. (1.15))? It concatenates them into one list,
just like sum for numbers.

concat = foldr (++) [ ]                                          (1.57)

For example: concat [[1], [2, 3, 4], [5, 6, 7, 8, 9]] ⇒ [1, 2, 3, 4, 5, 6, 7, 8, 9].

Exercise 1.11
1.11.1. To define insertion-sort with foldr, we design the insert function as insert x X,
and sort as sort = foldr insert [ ]. The type of foldr is:

foldr :: (A → B → B) → B → [A] → B

Where its first parameter f has the type A → B → B, and the initial value z has
the type B. It folds a list of A, and builds the result of B. How to define the
insertion-sort with foldl? What is the type of foldl?
1.11.2. What’s the performance of concat? Design a linear time concat algorithm.
1.11.3. Define map with foldr.

1.7 Search and filter


Search and filter are generic concepts applying to a wide range of things. For list, it
often takes linear time to scan and find the result. First, consider how to test whether x
is in list X. We compare every element with x, until either we find an equal one or reach
the end:

a ∈ [ ]    = False
a ∈ (b:bs) = True,    if b = a                                   (1.60)
           = a ∈ bs,  otherwise

The existence check is also called elem. The performance is O(n). We can not improve
it to O(lg n) with binary search directly, even for an ordered list, because list does not
support constant time random access (chapter 3). Let's extend elem. In the n-lights
puzzle, we use a list of pairs [(k, v)]. Every pair contains a key and a value. Such a list is
called an ‘associate list’ (abbrev. assoc list). We can look up the value with a key:

lookup x [ ]          = Nothing
lookup x ((k, v):kvs) = Just (k, v),   if k = x                  (1.61)
                      = lookup x kvs,  otherwise
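
A Haskell sketch of eq. (1.61) can read as below; we use a primed name, as the Prelude
lookup returns Just v rather than the pair:

lookup' _ [] = Nothing
lookup' x ((k, v):kvs) = if k == x then Just (k, v) else lookup' x kvs

-- lookup' 2 [(1, 'a'), (2, 'b')] ⇒ Just (2, 'b')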

Different from elem, we want to find the corresponding value besides the existence of
key x. However, the value is not guaranteed to exist. We use the algebraic type ‘Maybe’.
A value of type Maybe A is either some a in A or nothing, denoted Just a and Nothing
respectively. This is a way to deal with null references8 (4.2.2 in [99]). We can make lookup
generic, to find the element that satisfies a given predicate:

find p [ ]    = Nothing
find p (x:xs) = Just x,     if p(x)                              (1.62)
              = find p xs,  otherwise

Although there can be multiple elements satisfying p, the f ind function picks the
first. We can expand to find all, which is called f ilter as shown in fig. 1.4. Define (ZF
expression): f ilter p X = [x|x ← X, p(x)]. We can use logic and to chain multiple filters,
as: f ilter (p1 ∧ p2 ∧ · · · ) X = [x|x ← X, p1 (x), p2 (x), · · · ]9 .

Figure 1.4: Input: [x1, x2, ..., xn], Output: [x′1, x′2, ..., x′m], where p(x′i) holds for every x′i.

Different from find, filter returns the empty list instead of Nothing when no element
satisfies the predicate.

filter p [ ]    = [ ]
filter p (x:xs) = x : filter p xs,  if p(x)                      (1.63)
                = filter p xs,      otherwise
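
The Haskell renderings of eqs. (1.62) and (1.63) follow the same shape (primed names
keep the standard ones free):

find' _ [] = Nothing
find' p (x:xs) = if p x then Just x else find' p xs

filter' _ [] = []
filter' p (x:xs) = if p x then x : filter' p xs else filter' p xs

-- find' (> 10) [5, 8, 13, 21] ⇒ Just 13
-- filter' even [1..10] ⇒ [2, 4, 6, 8, 10]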

This definition builds the result from the right. For the iterative implementation, the
performance drops to O(n2) with Append. If we change to Cons, then the order is reversed,
and we need to reverse it back in linear time (see the exercise).
8 Similar to Optional<A> in some environments.
9 We may use the simplified notation: [x ← X, p1(x), p2(x), ...], or even [p1(x), p2(x), ...] when the
context is clear

1: function Filter(p, X)
2: X′ ← NIL
3: while X ≠ NIL do
4: if p(First(X)) then
5: X′ ← Append(X′, First(X)) . Linear time
6: X ← Rest(X)
7: return X′
The nature of building the result from the right reminds us of foldr. Define f to test
an element against the predicate, and prepend it to the result: f p x as = if p x then x:as
else as. Use its Curried form to define filter:

filter p = foldr (x as ↦ f p x as) [ ]                           (1.64)

We can further simplify it (called η-conversion [73]) as:

filter p = foldr (f p) [ ]                                       (1.65)

Filter is a generic concept not limited to list: we can apply a predicate to any
traversable structure to extract things. Match is to find a pattern in some structure. Even
limited to list and string, there are still too many things to cover (chapter 14). The very
basic problem is to test whether list as exists in bs as a sub-list. There are two special
cases: to test whether as is a prefix or a suffix of bs. The span function actually finds the
longest prefix under a given predicate. Similarly, we can compare elements between as and
bs one by one. Define as ⊆ bs if as is a prefix of bs:

[ ] ⊆ bs        = True
(a:as) ⊆ [ ]    = False                                          (1.66)
(a:as) ⊆ (b:bs) = False,    if a ≠ b
                = as ⊆ bs,  if a = b
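
A direct Haskell sketch of eq. (1.66) (named isPrefix to keep Data.List's isPrefixOf free):

isPrefix [] _ = True
isPrefix _ [] = False
isPrefix (a:as) (b:bs) = a == b && isPrefix as bs

-- isPrefix "Mi" "Mississippi" ⇒ True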

Prefix testing takes linear time to scan the two lists. However, we can not do suffix
testing this way, because it is expensive to align the right ends and scan backwards.
This is different from array. Alternatively, we can reverse both lists in linear time,
converting the problem to prefix testing:

as ⊇ bs = reverse(as) ⊆ reverse(bs) (1.67)

With ⊆, we can test if a list is a sub-list of another one (infix testing). Define the empty
list as an infix of any list, then repeatedly apply prefix testing while traversing bs:

infix? (a:as) [ ] = False
infix? as (b:bs)  = True,          if as ⊆ (b:bs)                (1.68)
                  = infix? as bs,  otherwise

Below is the iterative implementation:


1: function Is-Infix(A, B)
2: if A = NIL then
3: return TRUE
4: n ← |A|
5: while B ≠ NIL and n ≤ |B| do
6: if A ⊆ B then
7: return TRUE
8: B ← Rest(B)

9: return FALSE
Because prefix testing runs in linear time and is called in every loop, this implemen-
tation is bound to O(mn) time, where m and n are the lengths of the two lists. Symmetrically,
we can enumerate all suffixes of B, and test if A is a prefix of any:

infix? A B = ∃S ∈ suffixes B, A ⊆ S                              (1.69)

Below example program implements infix testing with list comprehension:

isInfixOf a b = (not ◦ null) [s | s ← tails b, a `isPrefixOf` s]

Where isPrefixOf does the prefix testing, and tails generates all suffixes of a given
list (an exercise of this section).

Exercise 1.12
1.12.1. Implement the linear time filter algorithm through reverse.
1.12.2. Enumerate all suffixes of a list.

1.8 zip and unzip


The assoc list is a light weight dictionary (map) for small data sets. It is simpler than a
tree or heap based dictionary, but has the overhead of linear time lookup. In the ‘n-lights’
puzzle, we build the assoc list as: map (i ↦ (i, 0)) [1, 2, ..., n]. We define a zip function:

zip as [ ] = [ ]
zip [ ] bs = [ ] (1.70)
zip (a:as) (b:bs) = (a, b) : zip as bs

This implementation works even when the two lists have different lengths. The result has
the same length as the shorter one. We can even zip infinite lists (under lazy evaluation),
for example10: zip [0, 0, ...] [1, 2, ..., n]. For a list of words, we can index it as: zip [1, 2,
...] [a, an, another, ...]. zip builds the result from the right. We can define it with foldr. It
is bound to O(m) time, where m is the length of the shorter list. When implementing the
iterative zip, the performance drops to quadratic if using Append; we can use Cons
then reverse the result. However, this method can't handle two infinite lists. In imperative
settings, we can reuse A to hold the zip result (treat it as transforming every element to a pair).
1: function Zip(A, B)
2: C ← NIL
3: while A ≠ NIL and B ≠ NIL do
4: C ← Append(C, (First(A), First(B))) . Linear time
5: A ← Rest(A)
6: B ← Rest(B)
7: return C
We can extend to zip multiple lists. Some programming environments provide zip,
zip3, zip4, etc. Sometimes, we want to apply a binary function to combine elements,
not just form a pair. For example, given a list of unit prices [1.00, 0.80, 10.05, ...] for
fruits: apple, orange, banana, ..., and a list of quantities, like [3, 1, 0, ...] (buy 3
apples, 1 orange, 0 bananas, ...), below program generates the payment list:
10 Or zip (repeat 0) [1..n], where repeat x = x : repeat x.

pays us [ ]        = [ ]
pays [ ] qs        = [ ]
pays (u:us) (q:qs) = u × q : pays us qs

It has the same structure as zip except using multiplication instead of ‘cons’. We can
abstract the binary function as f:

zipWith f as [ ]          = [ ]
zipWith f [ ] bs          = [ ]                                  (1.71)
zipWith f (a:as) (b:bs)   = (f a b) : zipWith f as bs

We can define the inner product (or dot product) [98] as: A·B = sum (zipWith (·) A B),
or define the infinite Fibonacci sequence with lazy evaluation:

F = 0 : 1 : zipWith (+) F F′                                     (1.72)

Let F be the infinite Fibonacci sequence, starting from 0 and 1. F′ drops the head. From
the third number on, every Fibonacci number is the sum of the corresponding numbers from
F and F′ at the same position. Below example program takes the first 15 Fibonacci
numbers:
fib = 0 : 1 : zipWith (+) fib (tail fib)

take 15 fib
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377]

unzip is the inverse of zip. It converts a list of pairs to two separate lists. Define it
with foldr in Curried form:

unzip = foldr ((a, b) (as, bs) ↦ (a:as, b:bs)) ([ ], [ ])        (1.73)
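
For example, below is the Haskell equivalent of eq. (1.73) with a sample run (primed to
keep the Prelude name free):

unzip' = foldr (λ(a, b) (as, bs) → (a:as, b:bs)) ([], [])

-- unzip' [(1, 'a'), (2, 'b'), (3, 'c')] ⇒ ([1, 2, 3], "abc")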

For the fruits example, given the unit prices as an assoc list: U = [(apple, 1.00),
(orange, 0.80), (banana, 10.05), ...], and the purchased quantities as another: Q =
[(apple, 3), (orange, 1), (banana, 0), ...], we extract the unit prices and the quantities,
then compute their inner-product:

pay = sum (zipWith (·) (snd (unzip U)) (snd (unzip Q)))          (1.74)

zip and unzip are generic. We can extend them to zip two trees, where the nodes contain
paired elements from both. When traversing a collection of elements, we can also use the
generic zip and unzip to track the path. This is a method to mimic the ‘parent’ reference
of imperative implementations (last chapter of [10]).
List is fundamental for building more complex data structures and algorithms, particularly
in functional settings. We introduced elementary algorithms to construct, access, update,
and transform lists; and how to search, filter, and compute with lists. Although most pro-
gramming environments provide pre-defined tools and libraries to support list, we should
not simply treat them as black-boxes. Rabhi and Lapalme introduce many functional al-
gorithms about list [72]. The Haskell library provides detailed documentation about the basic
list algorithms. Bird gives good examples of folding [1], and introduces the fold fusion
law.

Exercise 1.13
1.13.1. Design the iota (the Greek letter I) operator for list, below are the use cases:
• iota(..., n) = [1, 2, 3, ..., n];
• iota(m, n) = [m, m + 1, m + 2, ..., n], where m ≤ n;

• iota(m, m + a, ..., n) = [m, m + a, m + 2a, ..., m + ka], where k is the maximum
integer satisfying m + ka ≤ n;
• iota(m, m, ...) = repeat(m) = [m, m, m, ...];
• iota(m, ...) = [m, m + 1, m + 2, ...].
1.13.2. Implement the linear time imperative zip.
1.13.3. Define zip with fold (hint: define fold for two lists foldr2 f z xs ys).
1.13.4. Implement lastAt with zip.
1.13.5. Write a program to remove the duplicated elements in a list while maintaining the
original order. For the imperative implementation, the elements should be removed in-
place. What is the complexity? How can it be simplified with an additional data structure?
1.13.6. A list can represent a non-negative decimal integer. For example, 1024 as a list is 4 →
2 → 0 → 1. Generally, n = dm...d2d1 can be represented as d1 → d2 → ... → dm.
Given two numbers a, b in list form, realize arithmetic operations such as addition
and subtraction.
1.13.7. In imperative settings, a circular linked list is corrupted: some node points
back to a previous one, as shown in fig. 1.5. Traversing it falls into an infinite
loop. Design an algorithm to detect whether a list is circular. On top of that, improve
it to find the node where the loop starts (the node pointed to by two predecessors).

Figure 1.5: A circular linked-list


Chapter 2

Binary Search Tree

Array and list are typically considered the basic data structures. However, we'll see in
chapter 12 that they are not necessarily easy to implement. In imperative settings, array is
the most elementary data structure. It is possible to implement linked-list using arrays
(section 3.4), while in functional settings, linked-list acts as the building block to create
arrays and other data structures. The binary search tree is another basic data structure.
Jon Bentley gives a problem in Programming Pearls [2]: how to count the number of word
occurrences in a text. Here is a solution:
void wordCount(Input in) {
Map<String, Int> map
while String w = read(in) {
map[w] = if map[w] == null then 1 else map[w] + 1
}
for var (w, c) in map {
print(w, ":", c)
}
}

2.1 Definition
The map is a binary search tree, where we use the word as the key, and its occurrence
number as the value. This program is a typical application of binary search tree. Let
us first define the binary tree. A binary tree is either empty (∅)1, or contains 3 parts:
an element k, and two sub-trees called the left (l) and right (r) children, denoted (l, k, r).
A non-empty binary tree consists of multiple nodes, each either empty or storing an
element of type K. We define the type of the binary tree as Tree K. We say a node is a
leaf if both sub-trees are empty; otherwise it's a branch node.
A binary search tree is a special binary tree whose elements are comparable2, and which
satisfies: for any non-empty node (l, k, r), all the keys in the left sub-tree < k, and k < any
key in the right sub-tree. Figure 2.2 shows an example of binary search tree. Comparing
with fig. 2.1, we can see the different ordering. For this reason, we call the comparable
element the key, and the augmented data the value. The type of such a tree is Tree (K, V). A node
contains a key, a value (optional), left and right sub-tree references, and a parent reference
for easy backtracking. When the context is clear, we skip the value.
1 The great mathematician André Weil invented this symbol for the null set. It comes from the
Norwegian alphabet.
2 It is an abstract ordering, not limited to magnitude; it can be precedence, subset of, etc. The ‘less
than’ (<) here is abstract.


Figure 2.1: Binary tree

The appendix of this chapter includes an example definition. We don't need the parent
reference for backtracking in functional settings; we use top-down recursive computation
instead. Below is the example functional definition (known as algebraic data type,
abbrev. ADT):

Figure 2.2: A binary search tree

data Tree a = Empty | Node (Tree a) a (Tree a)

2.2 Insert
When inserting a key k (or with its value) to the binary search tree T, we need to maintain
the ordering. If the tree is empty, create a leaf of k. Otherwise, let the tree be (l, x, r). If
k < x, insert it to the left sub-tree l; otherwise, insert to the right r. If k = x, it already
exists in the tree, and we overwrite the value (update). Alternatively, we can append the
data or do nothing. We skip this case.
insert k ∅         = (∅, k, ∅)
insert k (l, x, r) = (insert k l, x, r),  if k < x               (2.1)
                   = (l, x, insert k r),  otherwise

insert k Empty = Node Empty k Empty


insert k (Node l x r) | k < x = Node (insert k l) x r
| otherwise = Node l x (insert k r)

This implementation uses the pattern matching feature. There is an example without
pattern matching in the appendix. We can eliminate the recursion with iterative loops:

1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: parent ← NIL
5: while T ≠ NIL do
6: parent ← T
7: if k < Key(T ) then
8: T ← Left(T )
9: else
10: T ← Right(T )
11: Parent(x) ← parent
12: if parent = NIL then . T is empty
13: return x
14: else if k < Key(parent) then
15: Left(parent) ← x
16: else
17: Right(parent) ← x
18: return root

19: function Create-Leaf(k)


20: x ← Empty-Node
21: Key(x) ← k
22: Left(x) ← NIL
23: Right(x) ← NIL
24: Parent(x) ← NIL
25: return x
Where Key(T ) accesses the key of the node:

key ∅         = Nothing                                          (2.2)
key (l, k, r) = Just k

We can repeatedly insert every element from a list, converting the list to a binary search tree:

fromList [ ] = ∅
fromList (x:xs) = insert x (fromList xs)

Or define it with fold (chapter 1) in Curried form: fromList = foldr insert ∅. We
arrange the arguments in symmetric order: insert k t and Insert(T, k), applying foldr for
the former, and foldl (or a for-loop) for the latter:
1: function From-List(X)
2: T ← NIL
3: for each x in X do
4: T ← Insert(T, x)
5: return T

2.3 Traverse
There are 3 ways to visit the elements in a binary tree: pre-order, in-order, and post-order.
They are named to highlight the order of visiting key between/before/after sub-trees.

• pre-order: key - left - right;



• in-order: left - key - right;


• post-order: left - right - key.
The ‘visit’ is recursive. For the tree in fig. 2.2, the corresponding orders are:
• pre-order: 4, 3, 1, 2, 8, 7, 16, 10, 9, 14
• in-order: 1, 2, 3, 4, 7, 8, 9, 10, 14, 16
• post-order: 2, 1, 3, 7, 9, 14, 10, 16, 8, 4
It is not by accident that the in-order traverse gives an ascending list; it is guaranteed
(Exercise 2.1.3). Define map that traverses in-order and applies a function f to every
element. It transforms a tree to another tree of the same structure (isomorphic).
map f ∅         = ∅                                              (2.3)
map f (l, k, r) = (map f l, f k, map f r)
If we only need to process keys without transforming the tree, we can implement the
in-order traverse as below:
1: function Traverse(T, f)
2: if T ≠ NIL then
3: Traverse(Left(T), f)
4: f(Key(T))
5: Traverse(Right(T), f)
We can change the map function to convert a binary search tree to a sorted list
(also called flatten).

toList ∅         = [ ]                                           (2.4)
toList (l, k, r) = toList l ++ [k] ++ toList r
We can develop a sort algorithm from this: convert a list to a binary search tree, then
convert the tree back to an ordered list, namely ‘tree sort’: sort X = toList (fromList X),
or written as function composition [8]:

sort = toList ◦ fromList                                         (2.5)
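
Gathering the pieces, a minimal Haskell sketch of tree sort reads as below, with insert
as defined in section 2.2:

fromList = foldr insert Empty

toList Empty = []
toList (Node l k r) = toList l ++ [k] ++ toList r

sort = toList ◦ fromList

-- sort [8, 2, 9, 1] ⇒ [1, 2, 8, 9]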
Define the generic fold for binary trees (see chapter 1 for fold):

foldt f g z ∅         = z                                        (2.6)
foldt f g z (l, k, r) = g (foldt f g z l) (f k) (foldt f g z r)

Where f : A → B sends the key k of type A to m = f(k) of type B. It recursively
folds the left and right sub-trees (from z) to get x and y respectively, then combines the
three things together as g x m y. We can define map with foldt:

map f = foldt f (x m y ↦ (x, m, y)) ∅                            (2.7)

foldt preserves the tree structure through the ternary function g. If we don't care about the
tree structure, we can use a binary function f : A × B → B to simplify, and fold a tree of
type Tree A to a value of type B:

fold f z ∅         = z                                           (2.8)
fold f z (l, k, r) = fold f (f k (fold f z r)) l

For example: sum = fold (+) 0 sums all elements of the tree; length = fold (x n ↦
n + 1) 0 counts the number of elements. However, fold can not define map, as the binary
function f loses the tree structure.

Exercise 2.1
2.1.1. Given the in-order and pre-order traverse results, rebuild the tree, and output the
post-order traverse result. For example:
• Pre-order: 1, 2, 4, 3, 5, 6;
• In-order: 4, 2, 1, 5, 3, 6;
• Post-order: ?

2.1.2. Write a program to rebuild the binary tree from the pre-order and in-order traverse
lists.
2.1.3. For binary search tree, prove that the in-order traverse always gives ordered list.
2.1.4. What is the complexity of tree sort?
2.1.5. Define toList with fold.
2.1.6. Define depth t with fold, which calculates the height of a binary tree.

2.4 Query
Because the binary search tree organises ordered elements recursively, it supports varies
of query efficiently. This is the reason we name it ‘search’ tree. There are three types
of query: (1) lookup a key; (2) find the minimum or maximum; (3) given a node, find
its predecessor or successor. When lookup the value of some key x in a tree of type
T ree (K, V ):

• If the tree is empty, x does not exist;

• For tree (l, (k, v), r), if k = x, then v is the result;

• If x < k, then recursively lookup l, otherwise, lookup r.

lookup x ∅              = Nothing
lookup x (l, (k, v), r) = Just v,      if k = x                  (2.9)
                        = lookup x l,  if x < k
                        = lookup x r,  otherwise

We use the Maybe type3 to handle the ‘not found’ case. Let the height of the tree
be h; the performance of lookup is O(h). If the tree is balanced (see chapter 4), the
performance is O(lg n), where n is the number of elements. It degrades to O(n) time
in the worst case for an extremely unbalanced tree. Below implementation eliminates the
recursion with loops:
1: function Lookup(T, x)
2: while T ≠ NIL and Key(T) ≠ x do
3: if x < Key(T ) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: return Value(T) . returns ∅ if T = NIL
3 Also known as Optional<T> type, see chapter 1.

In a binary search tree, the smaller keys are on the left, while the greater keys are on the
right. To locate the minimum, we keep going to the left till the left sub-tree is empty.
Symmetrically, we keep going to the right to find the maximum. Both min/max are
bound to O(h) time, where h is the height of the tree.

min (∅, k, r) = k           max (l, k, ∅) = k                    (2.10)
min (l, k, r) = min l       max (l, k, r) = max r
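
In Haskell, eq. (2.10) can be sketched as below; minT/maxT keep the Prelude names
free, and both are partial (undefined for the empty tree):

minT (Node Empty k _) = k
minT (Node l _ _) = minT l

maxT (Node _ k Empty) = k
maxT (Node _ _ r) = maxT r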

We sometimes traverse a binary search tree as a container: start from the minimum,
keep moving forward step by step towards the maximum, or go back and forth. Below
example program prints the elements in sorted order.
void printTree (Node<T> t) {
for var it = Iterator(t), it.hasNext(), it = it.next() {
print(it.get(), ", ")
}
}

Such use cases need to find the successor or predecessor of a node. Define the successor
of x as the smallest y such that x < y. If x has a non-empty right sub-tree r, the minimum of r
is the successor. As shown in fig. 2.3, to find the successor of 8, we search the minimum
in its right sub-tree, which is 9. If the right sub-tree of x is empty, we need to back-track along
the parent references till the closest ancestor whose left child is also an ancestor of x (or x
itself). In fig. 2.3, since node 2 does not have a right sub-tree, we go up to its parent,
node 1. Node 2 is the right child of node 1, so we go up again and reach node 3. As
node 1, the left child of node 3, is an ancestor of node 2, node 3 is the successor of node 2.


Figure 2.3: The successor of 8 is 9, the minimum of its right; for the successor of 2, we
go up to its parent 1, then 3.

If we finally reach the root along the parent path but still can not find an ancestor
on the right, then the node does not have a successor (it holds the last element). Below
algorithm finds the successor of x:
1: function Succ(x)
2: if Right(x) ≠ NIL then
3: return Min(Right(x))
4: else
5: p ← Parent(x)
6: while p ≠ NIL and x = Right(p) do
7: x←p
8: p ← Parent(p)
9: return p

This algorithm returns NIL when x does not have a successor. The predecessor algorithm
is symmetric:
1: function Pred(x)
2: if Left(x) ≠ NIL then
3: return Max(Left(x))
4: else
5: p ← Parent(x)
6: while p ≠ NIL and x = Left(p) do
7: x←p
8: p ← Parent(p)
9: return p
Purely functional settings don't use the parent reference4. Some implementations
record the visited path for back-tracking or tree rebuilding, called a zipper (last chapter
of [10]). The original purpose of Succ and Pred is to traverse the tree as a container.
However, we typically in-order traverse the tree through map in functional settings, so
finding the successor and predecessor is only meaningful in imperative settings.

Exercise 2.2

2.2.1. How to test whether an element k exists in the tree t?


2.2.2. Use Pred and Succ to write an iterator to traverse the binary search tree as a
generic container. What’s the time complexity to traverse a tree of n elements?
2.2.3. One can traverse the elements inside range [a, b], for example:
for_each (m.lower_bound(12), m.upper_bound(26), f);
Write an equivalent functional program for binary search tree.

2.5 Delete
We need to maintain the ordering while deleting: for any node (l, k, r), all keys on the left
remain less than k, and all keys on the right remain greater than k after the deletion. To
delete x [6]: (1) if x is a leaf or has only one non-empty sub-tree, cut x off; (2) if x has
two non-empty sub-trees, use the minimum y of its right sub-tree to replace x, then cut
the original y off. Because the minimum of the right sub-tree can not have two non-empty
sub-trees, case 2 eventually converts to case 1, and we directly cut the minimum node off,
as shown in figs. 2.4 to 2.6.


Figure 2.4: Cut the leaf x off.

4 There is ref in ML and OCaml, we limit to the purely functional settings.




Figure 2.5: Delete a node with only one non-empty sub-tree.

Figure 2.6: Delete a branch with two non-empty sub-trees.



delete x ∅         = ∅
delete x (l, k, r) = (delete x l, k, r),  if x < k
                   = (l, k, delete x r),  if x > k               (2.11)
                   = del l r,             if x = k

Where:

del ∅ r = r
del l ∅ = l                                                      (2.12)
del l r = (l, y, delete y r), where y = min r

The performance of delete is O(h), where h is the height. The imperative implemen-
tation additionally needs to set the parent reference.
1: function Delete(T, x)
2: r←T
3: x0 ← x . save x
4: p ← Parent(x)
5: if Left(x) = NIL then
6: x ← Right(x)
7: else if Right(x) = NIL then
8: x ← Left(x)
9: else . neither sub-tree is empty
10: y ← Min(Right(x))
11: Key(x) ← Key(y)
12: Value(x) ← Value(y)
13: if Parent(y) ≠ x then . y does not have left sub-tree
14: Left(Parent(y)) ← Right(y)
15: else . y is the root of the right sub-tree
16: Right(x) ← Right(y)
17: if Right(y) ≠ NIL then
18: Parent(Right(y)) ← Parent(y)
19: Remove y
20: return r
21: if x ≠ NIL then
22: Parent(x) ← p
23: if p = NIL then . remove the root
24: r←x
25: else
26: if Left(p) = x0 then
27: Left(p) ← x
28: else
29: Right(p) ← x
30: Remove x0
31: return r
Assume x is not empty. First record the root, and copy references to x and its parent. When
deleting, we also need to handle the special case where y is the root of the right sub-tree.
Finally, we need to reset the stored parent if x has only one non-empty sub-tree. If the
copied parent is empty, we are deleting the root, and we return the new root in this case.
After setting the parent, we can safely remove x.
The performance of the binary search tree algorithms depends on the height h. When
unbalanced, O(h) is close to O(n); for a well balanced tree, O(h) is close to O(lg n).

Chapters 4 and 5 introduce self-balancing solutions. There is another simple balancing
method: shuffle the elements, then build the tree [4]. It decreases the possibility of a
poorly balanced tree. We can use the binary search tree to realize the map data structure
(also known as associative data structure or dictionary). A finite map is a collection of
key-value pairs. Each key is unique and mapped to some value. For keys of type K and
values of type V, the type of the map is Map K V or Map<K, V>. A non-empty map contains
n mappings {k1 ↦ v1, k2 ↦ v2, ..., kn ↦ vn}. When using the binary search tree to implement
a map, we constrain K to be an ordered set. Every node stores a pair of key and value;
the type of the tree is Tree (K, V). We use the tree insert/update operation to associate a
key with a value. Given a key k, we use lookup to find the mapped value v, or return
nothing or ∅ when k does not exist. The red-black tree and AVL tree in chapters 4 and 5
can also implement maps.

Exercise 2.3
2.3.1. There is a symmetric deletion algorithm. When neither sub-tree is empty, we
replace with the maximum of the left sub-tree, then cut it off. Write a program to
implement this solution.
2.3.2. Write a randomly building algorithm for binary search tree.
2.3.3. How to find the two nodes with the greatest distance in a binary tree?

2.6 Appendix: Example programs


Definition of binary search tree node with parent reference.
data Node<T> {
T key
Node<T> left
Node<T> right
Node<T> parent

Node(T k) = Node(null, k, null)

Node(Node<T> l, T k, Node<T> r) {
left = l, key = k, right = r
if (left ≠ null) then left.parent = this
if (right ≠ null) then right.parent = this
}
}

Recursive insert without using pattern matching.


Node<T> insert (Node<T> t, T x) {
    if (t == null) {
        return Node(null, x, null)
    } else if (x < t.key) {
        return Node(insert(t.left, x), t.key, t.right)
    } else {
        return Node(t.left, t.key, insert(t.right, x))
    }
}

Map and fold:


mapt _ Empty = Empty
mapt f (Node l x r)= Node (mapt f l) (f x) (mapt f r)

foldt _ _ z Empty = z
foldt f g z (Node l k r) = g (foldt f g z l) (f k) (foldt f g z r)

maptr :: (a → b) → Tree a → Tree b


maptr f = foldt f Node Empty

fold _ z Empty = z
fold f z (Node l k r) = fold f (k `f` (fold f z r)) l

Iterative lookup without recursion:


Optional<Node<T>> lookup (Node<T> t, T x) {
while (t ≠ null and t.key ≠ x) {
if (x < t.key) {
t = t.left
} else {
t = t.right
}
}
return Optional.of(t);
}

Example iterative program to find the minimum of a tree.


Optional<Node<T>> min (Node<T> t) {
while (t ≠ null and t.left ≠ null) {
t = t.left
}
return Optional.of(t);
}

Iteratively find the successor.


Optional<Node<T>> succ (Node<T> x) {
if (x == null) {
return Optional.Nothing
} else if (x.right ≠ null) {
return min(x.right)
} else {
p = x.parent
while (p ≠ null and x == p.right) {
x = p
p = p.parent
}
return Optional.of(p);
}
}

delete:
delete _ Empty = Empty
delete x (Node l k r) | x < k = Node (delete x l) k r
| x > k = Node l k (delete x r)
| otherwise = del l r
where
del Empty r = r
del l Empty = l
del l r = let k' = min r in Node l k' (delete k' r)
Chapter 3

Insertion sort

3.1 Introduction
Insertion sort is a straightforward sort algorithm1. We give its preliminary definition
for list in chapter 1. For a collection of comparable elements, we repeatedly pick one and
insert it to a list, maintaining the ordering. As every insertion takes linear time, its
performance is bound to O(n2), where n is the number of elements. This performance is
not as good as the divide and conquer sort algorithms, like quick sort and merge sort.
However, we can still find its applications today. For example, a well tuned quick sort
implementation falls back to insertion sort for small data sets. The idea of insertion sort is
similar to sorting a deck of poker cards ([4] pp.15). The cards are shuffled, and the player
takes them one by one. At any time, all cards on hand are sorted. When drawing a new card,
the player inserts it in the proper position according to the order of points, as shown in fig. 3.1.

Figure 3.1: Insert card 8 to a deck.

Based on this idea, we can implement insertion sort as below:


1: function Sort(A)
2: S←[]
3: for each a ∈ A do
4: Insert(a, S)
5: return S
We store the sorted result in a new array; alternatively, we can change it to in-place:
1: function Sort(A)
1 We skip the ‘Bubble sort’ method


2: for i ← 2 to |A| do
3: ordered insert A[i] to A[1...(i − 1)]
Where the array index ranges from 1 to n = |A|. We start from 2, because the singleton
sub-array A[1] is already ordered. When processing the i-th element, all elements before i are
sorted. We continuously insert elements till n, as shown in fig. 3.2.


Figure 3.2: Continuously insert elements to the sorted part.

3.2 Insertion
In chapter 1, we define the ordered insertion for list. For array, we scan to locate the
insert position either from left or right. Below algorithm is from right:
1: function Sort(A)
2: for i ← 2 to |A| do . Insert A[i] to A[1...(i − 1)]
3: x ← A[i] . Save A[i] to x
4: j ←i−1
5: while j > 0 and x < A[j] do
6: A[j + 1] ← A[j]
7: j ←j−1
8: A[j + 1] ← x
It's expensive to insert at an arbitrary position, as array stores elements continuously.
When inserting x at position i, we need to shift all elements after i (i.e. A[i + 1], A[i + 2], ...)
to the right, then put x in the freed cell, as shown in fig. 3.3.


Figure 3.3: Insert x to A at i.

For an array of length n, suppose we locate the position to insert after comparing x
with the first i elements. We then shift the remaining n − i + 1 elements, and put x in the i-th
cell. Overall, we traverse the entire array if we scan from the left; if we scan from the right,
we examine n − i + 1 elements and do the same amount of shifts. The insertion takes linear
time no matter whether we scan from left or right, hence the sort algorithm is bound to O(n2). We
can also define a separate Insert() function, and call it inside the loop.

Exercise 3.1
3.1.1. Implement the insert to scan from left to right.

3.1.2. Define the insert function, and call it from the sort algorithm.

3.3 Binary search


When inserting a poker card, a player does not scan, but takes a quick glance at the deck to
locate the position. We can do this because the deck is sorted. Binary search is such a
method, applying to ordered sequences.
1: function Sort(A)
2: for i ← 2 to |A| do
3: x ← A[i]
4: p ← Binary-Search(x, A[1...(i − 1)])
5: for j ← i down to p do
6: A[j] ← A[j − 1]
7: A[p] ← x
Because the slice A[1...(i − 1)] is already ordered, to find the position j such that
A[j − 1] ≤ x ≤ A[j], we compare x to the middle element A[m], where m = ⌊i/2⌋. If x < A[m],
we recursively apply binary search to the first half; otherwise, we search the second
half. As we halve the elements every time, binary search takes O(lg i) time to locate the
insert position.
1: function Binary-Search(x, A)
2: l ← 1, u ← 1 + |A|
3: while l < u do
4: m ← ⌊(l + u)/2⌋
5: if A[m] = x then
6: return m . Duplicated element
7: else if A[m] < x then
8: l ← m + 1
9: else
10: u ← m
11: return l
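
For comparison, below is a Haskell sketch of the same idea over an immutable array.
Unlike the pseudocode above, it simply returns the leftmost position where x can be
inserted (the names here are ours, not from the text):

import Data.Array

binarySearch x a = go l (u + 1) where
    (l, u) = bounds a
    go lo hi | lo ≥ hi = lo
             | a ! m < x = go (m + 1) hi
             | otherwise = go lo m
        where m = (lo + hi) `div` 2

-- binarySearch 5 (listArray (1, 4) [1, 3, 7, 9]) ⇒ 3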
The improved sort algorithm is still bound to O(n2). The scan based one takes O(n2)
comparisons and O(n2) shifts; with binary search, it takes O(n lg n) comparisons but still
O(n2) shifts.

3.4 List
With binary search, the total number of comparisons reduces to O(n lg n). However, as
we need to shift array cells when inserting, the overall time is still bound to O(n2). On the
other hand, with a list, the insert operation takes constant time at a given node reference.
In chapter 1, we define the insertion sort algorithm for list as below:
sort [ ]    = [ ]                                                (3.1)
sort (x:xs) = insert x (sort xs)
Or define it with foldr in Curried form: sort = foldr insert [ ]. However, the list insert
algorithm still takes linear time, because we need to scan to locate the insert position:

insert x [ ]    = [x]
insert x (y:ys) = x : y : ys,       if x ≤ y                     (3.2)
                = y : insert x ys,  otherwise

Instead of using node references, we can also realize a list through an additional index
array. For every element A[i], Next[i] stores the index of the element that follows A[i],
i.e. A[Next[i]] is the next element after A[i]. There are two special indexes: for the tail node
A[m], we define Next[m] = −1, indicating it points to NIL; and Next[0] indexes
the head element. With the index array, we can implement the insertion algorithm
as below:
1: function Insert(A, Next, i)
2: j ← 0 . Next[0] for head
3: while Next[j] ≠ −1 and A[Next[j]] < A[i] do
4: j ← Next[j]
5: Next[i] ← Next[j]
6: Next[j] ← i

7: function Sort(A)
8: n ← |A|
9: Next = [1, 2, ..., n, −1] . n + 1 indexes
10: for i ← 1 to n do
11: Insert(A, Next, i)
12: return Next
With list, although the insert operation becomes constant time, we need to traverse
the list to locate the position. It is still bound to O(n2) comparisons. Unlike array,
list does not support random access, hence we can not use binary search to speed up.

Exercise 3.2
3.2.1. For the index array based list, we return the re-arranged indexes as the result. Design
an algorithm to re-order the original array A from the index array Next.

3.5 Binary search tree


We are driven into a corner. We want to improve both comparison and insertion at the same
time, or we will end up with O(n2) performance. For comparison, we need binary search
to achieve O(lg n) time; on the other hand, we need to change the data structure, because
array can not support constant time insertion at a position. We introduced a powerful
data structure in chapter 2, the binary search tree. It supports binary search by its very
definition. At the same time, we can insert a new node into the binary search tree
fast, at a given location.
1: function Sort(A)
2: T ←∅
3: for each x ∈ A do
4: T ← Insert-Tree(T, x)
5: return To-List(T )
Or sort = toList ◦ fromList for list, where Insert-Tree(), To-List(), and fromList
are defined in chapter 2. In the average case, the performance of tree sort is bound to
O(n lg n). This is the lower limit of comparison based sort ([12] pp.180-193). However, in
the worst case, when the tree is poorly balanced, the performance drops to O(n2).
Insertion sort is often used as the first example of sorting. It is straightforward and
easy to implement. However, its performance is quadratic. Insertion sort does not only
appear in textbooks; it also has practical use in the quick sort implementation. It is an
engineering practice to fall back to insertion sort when the number of elements is small.
Chapter 4

Red-black tree

As in the example of chapter 2, we use the binary search tree as a dictionary to count
word occurrences. One may want to feed it an address book, and use the binary search tree
to look up contacts, for example:
void addrBook(Input in) {
Map<String, String> dict
while (String name, String addr) = read(in) {
dict[name] = addr
}
loop {
string name = read(Console)
var addr = dict[name]
if (addr == null) {
print("not found")
} else {
print("address: ", addr)
}
}
}

Unlike the word counter program, this one performs poorly, especially when searching
names like Zara, Zed, Zulu, and so on. This is because the address entries are typically fed in
lexicographic order. If we insert numbers 1, 2, 3, ..., n to a binary search tree, it ends up
like in fig. 4.1: an extremely unbalanced binary search tree. The lookup is bound
to O(h) time for a tree of height h. When the tree is well balanced, the performance is
O(lg n), where n is the number of elements. But in this extreme case, the performance
downgrades to O(n), same as a list scan.


Figure 4.1: unbalanced tree


Exercise 4.1
4.1.1. For a big address book in lexicographic order, one may want to speed up with two
concurrent tasks: one reads from the head, the other from the tail. They meet and
stop at some middle point. What does the binary search tree look like? What if we
split the list into multiple sections to scale up the concurrency?


Figure 4.2: Unbalanced trees

4.1 Balance
To avoid extremely unbalanced tree, we can shuffle the input(section 2.5), however, we can
not randomize interactive input (e.g. entered by user). Most re-balancing solutions rely
on the tree rotation. It changes the tree structure while maintain the elements ordering.
This chapter introduces the red-black tree, a popular self-balancing binary search tree.
Next chapter is about AVL tree, another self-balancing tree. Chapter 8 introduces the
splay tree. It adjusts the tree in steps. Multiple binary trees can have the same in-order
traverse result. Figure 4.3 shows the tree rotation. We can define them with pattern
matching:

rotatel (a, x, (b, y, c)) = ((a, x, b), y, c)
rotatel T = T                                                    (4.1)
rotater ((a, x, b), y, c) = (a, x, (b, y, c))
rotater T = T
The second clause of each keeps the tree unchanged if the pattern does not match (e.g. both
sub-trees are empty). We can also implement tree rotation imperatively. We need to re-
assign the sub-trees and parent references. When rotating, we pass both the root T and the
node x as parameters:


Figure 4.3: ‘left rotate’ and ‘right rotate’.

1: function Left-Rotate(T, x)
2: p ← Parent(x)
3: y ← Right(x) . assume y 6= NIL
4: a ← Left(x)
5: b ← Left(y)
6: c ← Right(y)
7: Replace(x, y) . replace node x with y
8: Set-Subtrees(x, a, b) . Set a, b as the sub-trees of x
9: Set-Subtrees(y, x, c) . Set x, c as the sub-trees of y
10: if p = NIL then . x was the root
11: T ←y
12: return T
The Right-Rotate is symmetric (as Exercise 4.2). The Replace(x, y) uses node y
to replace x:
1: function Replace(x, y)
2: p ← Parent(x)
3: if p = NIL then . x is the root
4: if y ≠ NIL then Parent(y) ← NIL
5: else if Left(p) = x then
6: Set-Left(p, y)
7: else
8: Set-Right(p, y)
9: Parent(x) ← NIL
Procedure Set-Subtrees(x, L, R) assigns L as the left, and R as the right sub-trees
of x:
1: function Set-Subtrees(x, L, R)
2: Set-Left(x, L)
3: Set-Right(x, R)
It further calls Set-Left and Set-Right to set the two sub-trees:
1: function Set-Left(x, y)
2: Left(x) ← y
3: if y ≠ NIL then Parent(y) ← x

4: function Set-Right(x, y)
5: Right(x) ← y
6: if y ≠ NIL then Parent(y) ← x
We can see how pattern matching simplifies the tree rotation. Based on this idea,
Okasaki developed the purely functional algorithm for red-black tree in 1995 [13] .

Exercise 4.2

4.2.1. Implement the Right-Rotate.

4.2 Definition
A red-black tree is a self-balancing binary search tree [14]. It is equivalent to the 2-3-4 tree1.
By coloring the nodes red or black, and performing rotations, the red-black tree provides an
efficient way to keep the tree balanced. On top of the binary search tree definition, we
label each node with a color. We say it is a red-black tree if the coloring satisfies the
following 5 rules ([4] pp273):
1. Every node is either red or black.
2. The root is black.
3. Every NIL node is black.
4. If a node is red, then both sub-trees are black.
5. For every node, all paths from it to descendant leaves contain the same number of
black nodes.
Why do these rules keep the red-black tree balanced? The key point is that the longest
path from the root to a leaf can not exceed 2 times the shortest path. Consider rule 4:
there can not be two adjacent red nodes, hence the shortest path contains only black
nodes, and any longer path must have red ones in addition. Further, rule 5 ensures all paths
from the root contain the same number of black nodes. Together they ensure that no path
can exceed 2 times the length of another [14]. Figure 4.4 gives an example of red-black tree.


Figure 4.4: A red-black tree

As all NIL nodes are black, we can hide them, as shown in fig. 4.5. All operations
including lookup and min/max are the same as in the binary search tree. However, insert and
delete are special, as we need to maintain the coloring rules. Below example program adds
the color variable atop the binary search tree definition. Denote the empty tree as ∅ and the
non-empty tree as (c, l, k, r), where c is the color (red/black), k is the element, and l, r
are the left and right sub-trees.
data Color = R | B
data RBTree a = Empty | Node Color (RBTree a) a (RBTree a)

Exercise 4.3
4.3.1. Prove the height h of a red-black tree of n nodes is at most 2 lg(n + 1)
1 Chapter 7, B-tree. For any 2-3-4 tree, there is at least one red-black tree with the same ordered data.


Figure 4.5: Hide the NIL nodes

4.3 Insert
The insert operation takes two steps. The first is as same as the binary search tree.
The second is to restore the coloring if it becomes unbalanced. We always color the new
element red unless it is the root. Hence don’t break any coloring rules except the 4-th
(As it may bring two adjacent red nodes). There are 4 cases violate rule 4. They share
the same structure after fixing [13] as shown in fig. 4.6.


Figure 4.6: Fix 4 cases to the same structure.

All 4 transformations move the redness one level up. When fixing recursively bottom-up,
we may color the root red, violating rule 2; we finally revert the root to black.
Define a balance function to fix the coloring with pattern matching. Denote the color as
C, with values black B and red R.

balance B (R, (R, a, x, b), y, c) z d = (R, (B, a, x, b), y, (B, c, z, d))
balance B (R, a, x, (R, b, y, c)) z d = (R, (B, a, x, b), y, (B, c, z, d))
balance B a x (R, b, y, (R, c, z, d)) = (R, (B, a, x, b), y, (B, c, z, d))  (4.2)
balance B a x (R, (R, b, y, c), z, d) = (R, (B, a, x, b), y, (B, c, z, d))
balance C l k r = (C, l, k, r)

If none of the 4 patterns matches, we leave the tree unchanged. Define insert x T =
makeBlack (ins x T ), or in Curried form:

insert x = makeBlack ◦ ins x (4.3)



Where:

ins x ∅            = (R, ∅, x, ∅)
ins x (C, l, k, r) = balance C (ins x l) k r,  if x < k          (4.4)
                   = balance C l k (ins x r),  if x > k

If the tree is empty, we create a red leaf of x; otherwise, we compare x and k, and recursively
insert x to a sub-tree. After that, we call balance to fix the coloring, and finally force the
root to be black.

makeBlack (C, l, k, r) = (B, l, k, r)                            (4.5)
Below is the example program:
insert x = makeBlack ◦ (ins x) where
ins x Empty = Node R Empty x Empty
ins x (Node color l k r)
| x < k = balance color (ins x l) k r
| otherwise = balance color l k (ins x r)
makeBlack (Node _ l k r) = Node B l k r

balance B (Node R (Node R a x b) y c) z d = Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) = Node R (Node B a x b) y (Node B c z d)
balance color l k r = Node color l k r

We skip handling duplicated keys. If the key already exists, we can overwrite it,
drop it, or store the values in a list ([4], pp269). Figure 4.7 shows two red-black trees built
from the sequences 11, 2, 14, 1, 7, 15, 5, 8, 4 and 1, 2, ..., 8. The second example is well
balanced even for ordered input.
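To reproduce these examples, we can fold insert over the input sequence. Below is a sketch; fromList is an assumed helper, not part of the book's program:

fromList = foldl (flip insert) Empty    -- insert the keys in the given order

t1 = fromList [11, 2, 14, 1, 7, 15, 5, 8, 4]   -- the first tree of fig. 4.7
t2 = fromList [1..8]                           -- stays balanced for the ordered input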

Figure 4.7: Red-black tree examples

The insert recursively descends, then fixes the coloring on the way back. It is bound to O(h)
time, where h is the height. As the coloring rules are maintained, h is logarithmic to n, the
number of elements. The overall performance is O(lg n).

Exercise 4.4
4.4.1. Implement the insert without pattern matching, handle the 4 cases separately.

4.4 Delete
Delete is more complex than insert. We can simplify the recursive implementation with
pattern matching². There is an alternative implementation that mimics delete: build a read-only
tree for frequent lookup [5]; when deleting a node, mark it with a flag, and trigger
tree rebuilding if such nodes exceed 50%. Delete may also violate the coloring rules, hence
needs fixing. The violation only happens when we delete a black node, breaking rule 5:
the black nodes along the path decrease by one, so not all paths contain the same
number of black nodes. To resume the blackness, we introduce a special 'doubly-black'
node ([4], pp290). Such a node is counted as 2 black nodes. When deleting a black node x,
we move the blackness up to the parent or down to a sub-tree. Let node y accept the blackness.
If y was red, we turn it black; if y was already black, we turn it 'doubly-black', denoted B². Below
example program adds the 'doubly-black' to the color definition.

²Actually, we reuse the unchanged part to rebuild the tree in purely functional settings, known as the 'persist' feature.
data Color = R | B | BB
data RBTree a = Empty | BBEmpty | Node Color (RBTree a) a (RBTree a)

Because NIL is black, when we push the blackness down to NIL, it becomes the 'doubly-black'
empty (BBEmpty, or bold ∅). The first step is the normal binary search tree delete; then, if
we cut a black node off, we shift the blackness and fix the coloring (Curried form):

delete x = makeBlack ◦ del x (4.6)

When we delete the only element of a singleton tree, it becomes empty. To cover this case, we modify
makeBlack as:

makeBlack ∅ = ∅
makeBlack (C, l, k, r) = (B, l, k, r)    (4.7)

Where del accepts x and the tree:

del x ∅ = ∅
del x (C, l, k, r) =
    | x < k : fixB² (C, del x l, k, r)
    | x > k : fixB² (C, l, k, del x r)
    | x = k : | l = ∅ : if C = B then shiftB r else r
              | r = ∅ : if C = B then shiftB l else l
              | otherwise : fixB² (C, l, m, del m r), where m = min(r)    (4.8)
When the tree is empty, the result is ∅; otherwise, we compare x and k. If x < k, we
recursively delete from the left; otherwise from the right. Because the result may contain a doubly-black
node, we apply fixB². When x = k, we have found the node to cut. If either sub-tree is
empty, we replace the node with the non-empty one, then shift the blackness if the node was black.
If neither sub-tree is empty, we cut the minimum m = min r off, and use m to replace k.
To preserve the blackness, shiftB makes a black node doubly-black, and forces it black
otherwise. It flips a doubly-black node back to normal black when applied twice.

shiftB (B, l, k, r) = (B², l, k, r)
shiftB (C, l, k, r) = (B, l, k, r)    (4.9)

For the empty trees, shiftB turns the normal empty ∅ into the doubly-black empty (bold ∅), and vice versa.

Below is the example program (except the doubly-black fixing part).


delete x = makeBlack ◦ (del x) where
del x Empty = Empty
del x (Node color l k r)
| x < k = fixDB color (del x l) k r
| x > k = fixDB color l k (del x r)
| isEmpty l = if color == B then shiftBlack r else r
| isEmpty r = if color == B then shiftBlack l else l
| otherwise = fixDB color l m (del m r) where m = min r
makeBlack (Node _ l k r) = Node B l k r
makeBlack _ = Empty

isEmpty Empty = True
isEmpty _ = False

shiftBlack (Node B l k r) = Node BB l k r
shiftBlack (Node _ l k r) = Node B l k r
shiftBlack Empty = BBEmpty
shiftBlack BBEmpty = Empty

The fixB² function eliminates the doubly-black by rotation and re-coloring. The
doubly-black node can be a branch or the empty ∅. There are three cases:
Case 1. The sibling of the doubly-black node is black, and it has a red sub-tree. We
can fix this case with a rotation. There are 4 sub-cases, all of which can transform to the same
pattern, as shown in fig. 4.8.

Figure 4.8: Transform 4 sub-cases to the same pattern

fixB² C a^B² x (B, (R, b, y, c), z, d) = (C, (B, shiftB a, x, b), y, (B, c, z, d))
fixB² C a^B² x (B, b, y, (R, c, z, d)) = (C, (B, shiftB a, x, b), y, (B, c, z, d))
fixB² C (B, a, x, (R, b, y, c)) z d^B² = (C, (B, a, x, b), y, (B, c, z, shiftB d))
fixB² C (B, (R, a, x, b), y, c) z d^B² = (C, (B, a, x, b), y, (B, c, z, shiftB d))    (4.10)

Where a^B² means node a is doubly-black.
Case 2. The sibling of the doubly-black is red. We can rotate the tree to turn it into
case 1 or 3, as shown in fig. 4.9. We add this fixing as 2 additional rows to eq. (4.10):

...
fixB² B a^B² x (R, b, y, c) = fixB² B (fixB² R a x b) y c    (4.11)
fixB² B (R, a, x, b) y c^B² = fixB² B a x (fixB² R b y c)

Case 3. The sibling of the doubly-black node and its two sub-trees are all black. In
this case, we change the sibling to red, flip the doubly-black node to black, and propagate
the doubly-blackness one level up to the parent, as shown in fig. 4.10. There are two symmetric
sub-cases. For the upper case, x was either red or black: it changes to black if it was red,
and to doubly-black otherwise; the same coloring change applies to y in the lower case. We add
this fixing to eq. (4.11):

Figure 4.9: The sibling of the doubly-black is red.

Figure 4.10: Move the blackness up.



...
fixB² C a^B² x (B, b, y, c) = shiftB (C, shiftB a, x, (R, b, y, c))
fixB² C (B, a, x, b) y c^B² = shiftB (C, (R, a, x, b), y, shiftB c)    (4.12)
fixB² C l k r = (C, l, k, r)

If none of the patterns matches, the last row keeps the node unchanged. The doubly-black
fixing is recursive. It terminates in two ways: either case 1 applies and the doubly-black
node is eliminated, or the blackness moves up until it reaches the root. Finally, we
force the root to be black. Below example program puts all three cases together:
fixDB color a@(Node BB _ _ _) x (Node B (Node R b y c) z d)
= Node color (Node B (shiftBlack a) x b) y (Node B c z d)
fixDB color BBEmpty x (Node B (Node R b y c) z d)
= Node color (Node B Empty x b) y (Node B c z d)
fixDB color a@(Node BB _ _ _) x (Node B b y (Node R c z d))
= Node color (Node B (shiftBlack a) x b) y (Node B c z d)
fixDB color BBEmpty x (Node B b y (Node R c z d))
= Node color (Node B Empty x b) y (Node B c z d)
fixDB color (Node B a x (Node R b y c)) z d@(Node BB _ _ _)
= Node color (Node B a x b) y (Node B c z (shiftBlack d))
fixDB color (Node B a x (Node R b y c)) z BBEmpty
= Node color (Node B a x b) y (Node B c z Empty)
fixDB color (Node B (Node R a x b) y c) z d@(Node BB _ _ _)
= Node color (Node B a x b) y (Node B c z (shiftBlack d))
fixDB color (Node B (Node R a x b) y c) z BBEmpty
= Node color (Node B a x b) y (Node B c z Empty)
fixDB B a@(Node BB _ _ _) x (Node R b y c)
= fixDB B (fixDB R a x b) y c
fixDB B a@BBEmpty x (Node R b y c)
= fixDB B (fixDB R a x b) y c
fixDB B (Node R a x b) y c@(Node BB _ _ _)
= fixDB B a x (fixDB R b y c)
fixDB B (Node R a x b) y c@BBEmpty
= fixDB B a x (fixDB R b y c)
fixDB color a@(Node BB _ _ _) x (Node B b y c)
= shiftBlack (Node color (shiftBlack a) x (Node R b y c))
fixDB color BBEmpty x (Node B b y c)
= shiftBlack (Node color Empty x (Node R b y c))
fixDB color (Node B a x b) y c@(Node BB _ _ _)
= shiftBlack (Node color (Node R a x b) y (shiftBlack c))
fixDB color (Node B a x b) y BBEmpty
= shiftBlack (Node color (Node R a x b) y Empty)
fixDB color l k r = Node color l k r

The delete algorithm is bound to O(h) time, where h is the height of the tree. As the
red-black tree maintains balance, h = O(lg n) for n nodes.
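As a sanity check (a sketch, not from the book; isRedBlack is an assumed helper), we can verify rule 2 at the root, rule 4 locally, and rule 5 by computing the black height of every path:

isRedBlack t = rootBlack t && fst (walk t) where
  rootBlack (Node R _ _ _) = False   -- rule 2
  rootBlack _ = True
  black (Node R _ _ _) = False       -- Empty (NIL) counts as black, rule 3
  black _ = True
  walk Empty = (True, 1)             -- count the black NIL
  walk BBEmpty = (False, 1)          -- no doubly-black empty may remain
  walk (Node c l _ r) = (vl && vr && hl == hr && ok, h) where
    (vl, hl) = walk l
    (vr, hr) = walk r
    ok = case c of R → black l && black r   -- rule 4
                   BB → False               -- no doubly-black node may remain
                   _ → True
    h = case c of B → hl + 1                -- rule 5 via the black height
                  _ → hl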

Exercise 4.5
4.5.1. Implement the ‘mark-rebuild’ delete algorithm: mark the node as deleted without
actually removing it. When the marked nodes exceed 50%, rebuild the tree.

4.5 Imperative red-black tree ⋆

We simplify the red-black tree implementation with pattern matching. In this section, we
give the imperative algorithm for completeness. When inserting, the first step is the same as in
the binary search tree; as the second step, we fix the balance through tree rotations.
1: function Insert(T, k)
2: root ← T

3: x ← Create-Leaf(k)
4: Color(x) ← RED
5: p ← NIL
6: while T ≠ NIL do
7: p←T
8: if k < Key(T ) then
9: T ← Left(T )
10: else
11: T ← Right(T )
12: Parent(x) ← p
13: if p = NIL then . tree T is empty
14: return x
15: else if k < Key(p) then
16: Left(p) ← x
17: else
18: Right(p) ← x
19: return Insert-Fix(root, x)
We make the new node red, and then perform fixing before return. There are 3 basic
cases, each with a symmetric case, hence 6 cases in total. Among them, we can
merge two cases, because both have a red 'uncle' node: we change the parent and uncle
to black, and set the grandparent to red:
1: function Insert-Fix(T, x)
2: while Parent(x) ≠ NIL and Color(Parent(x)) = RED do
3: if Color(Uncle(x)) = RED then . Case 1, x’s uncle is red
4: Color(Parent(x)) ← BLACK
5: Color(Grand-Parent(x)) ← RED
6: Color(Uncle(x)) ← BLACK
7: x ← Grand-Parent(x)
8: else . x’s uncle is black
9: if Parent(x) = Left(Grand-Parent(x)) then
10: if x = Right(Parent(x)) then . Case 2, x is on the right
11: x ← Parent(x)
12: T ← Left-Rotate(T, x)
. Case 3, x is on the left
13: Color(Parent(x)) ← BLACK
14: Color(Grand-Parent(x)) ← RED
15: T ← Right-Rotate(T , Grand-Parent(x))
16: else
17: if x = Left(Parent(x)) then . Case 2, Symmetric
18: x ← Parent(x)
19: T ← Right-Rotate(T, x)
. Case 3, Symmetric
20: Color(Parent(x)) ← BLACK
21: Color(Grand-Parent(x)) ← RED
22: T ← Left-Rotate(T , Grand-Parent(x))
23: Color(T ) ← BLACK
24: return T
This algorithm takes O(lg n) time to insert a key, where n is the number of nodes.
Compared to the balance function defined previously, it has different logic. Even for
the same sequence of keys, they build different red-black trees, as shown in fig. 4.11
and fig. 4.7. There is a slight performance overhead in the pattern matching algorithm.

Okasaki discussed the difference in detail in [13] .

Figure 4.11: Red-black trees created by the imperative algorithm.

We provide the imperative delete algorithm in Appendix A of the book. The red-black
tree is a popular self-balancing binary search tree. It is a good start for more complex
data structures. If we extend from 2 to k sub-trees and maintain the balance, we obtain the
B-tree; if we store the data along the edges instead of in the nodes, we obtain the Radix tree.
To maintain the balance, we need to handle multiple cases. Okasaki developed a method
that makes the red-black tree easy to implement. There are many implementations based
on this idea [16]. We also implement the AVL tree and the Splay tree based on pattern matching
in this book.

4.6 Appendix: Example programs


Definition of red-black tree node with parent reference. Set the color red by default.
data Node<T> {
T key
Color color
Node<T> left, right, parent

Node(T x) = Node(null, x, null, Color.RED)

Node(Node<T> l, T k, Node<T> r, Color c) {
    left = l, key = k, right = r, color = c
    if left ≠ null then left.parent = this
    if right ≠ null then right.parent = this
}

Self setLeft(l) {
left = l
if l 6= null then l.parent = this
}

Self setRight(r) {
right = r
if r 6= null then r.parent = this
}

Node<T> sibling() = if parent.left == this then parent.right else parent.left

Node<T> uncle() = parent.sibling()

Node<T> grandparent() = parent.parent


}

Insert a key to red-black tree:


Node<T> insert(Node<T> t, T key) {

root = t
x = Node(key)
parent = null
while (t ≠ null) {
parent = t
t = if (key < t.key) then t.left else t.right
}
if (parent == null) { //tree is empty
root = x
} else if (key < parent.key) {
parent.setLeft(x)
} else {
parent.setRight(x)
}
return insertFix(root, x)
}

Fix the balance:


// Fix the red→red violation
Node<T> insertFix(Node<T> t, Node<T> x) {
while (x.parent ≠ null and x.parent.color == Color.RED) {
if (x.uncle().color == Color.RED) {
// case 1: ((a:R x:R b) y:B c:R) =⇒ ((a:R x:B b) y:R c:B)
x.parent.color = Color.BLACK
x.grandparent().color = Color.RED
x.uncle().color = Color.BLACK
x = x.grandparent()
} else {
if (x.parent == x.grandparent().left) {
if (x == x.parent.right) {
// case 2: ((a x:R b:R) y:B c) =⇒ case 3
x = x.parent
t = leftRotate(t, x)
}
// case 3: ((a:R x:R b) y:B c) =⇒ (a:R x:B (b y:R c))
x.parent.color = Color.BLACK
x.grandparent().color = Color.RED
t = rightRotate(t, x.grandparent())
} else {
if (x == x.parent.left) {
// case 2': (a x:B (b:R y:R c)) =⇒ case 3'
x = x.parent
t = rightRotate(t, x)
}
// case 3': (a x:B (b y:R c:R)) =⇒ ((a x:R b) y:B c:R)
x.parent.color = Color.BLACK
x.grandparent().color = Color.RED
t = leftRotate(t, x.grandparent())
}
}
}
t.color = Color.BLACK
return t
}
Chapter 5

AVL tree

The idea of the red-black tree is to limit the number of nodes along a path within a range.
The AVL tree takes a direct approach: quantify the difference between branches. For a node
T, define:

δ(T) = |r| − |l|    (5.1)

Where |T| is the height of tree T, and l, r are the left and right sub-trees. Define
δ(∅) = 0 for the empty tree. If δ(T) = 0 for every node T, the tree is definitely balanced.
For example, a complete binary tree of height h has n = 2^h − 1 nodes, and no
empty branches except at the leaves. The smaller the absolute value of δ(T), the more balanced
the tree is between its sub-trees. We call δ(T) the balance factor of a binary tree.

5.1 Definition

Figure 5.1: An AVL tree

A binary search tree is an AVL tree if every sub-tree T satisfies:

|δ(T )| ≤ 1 (5.2)

There are three valid values of δ(T): ±1 and 0. Figure 5.1 shows an AVL tree. This
definition ensures the tree height h = O(lg n), where n is the number of nodes. Let's
prove it. For an AVL tree of height h, the number of nodes varies: there are at most
2^h − 1 nodes (the complete binary tree case). We are interested in the minimum number
of nodes; let it be N(h). We have the following results:

• Empty tree ∅: h = 0, N (0) = 0;


• Singleton tree: h = 1, N (1) = 1;


Figure 5.2 shows an AVL tree T of height h. It contains three parts: the key k, and
two sub-trees l and r. We have the following equation:

Figure 5.2: An AVL tree of height h. The height of one sub-tree is h − 1, the other is no
less than h − 2.

h = max(|l|, |r|) + 1 (5.3)

There must be a sub-tree of height h − 1. From the definition, ||l| − |r|| ≤ 1
holds, hence the height of the other sub-tree cannot be lower than h − 2. The total number
of nodes in T is the sum of both sub-trees plus 1 (for the root):

N (h) = N (h − 1) + N (h − 2) + 1 (5.4)

This recursive equation is similar to the Fibonacci numbers. Actually, we can transform
it to Fibonacci numbers through N'(h) = N(h) + 1. Equation (5.4) then changes to:

N'(h) = N'(h − 1) + N'(h − 2)    (5.5)

Lemma 5.1.1. Let N(h) be the minimum number of nodes for an AVL tree of height h,
and N'(h) = N(h) + 1, then

N'(h) ≥ φ^h    (5.6)

Where φ = (√5 + 1)/2 is the golden ratio.
Proof. When h = 0 or 1, we have:

• h = 0: N'(0) = 1 ≥ φ⁰ = 1
• h = 1: N'(1) = 2 ≥ φ¹ = 1.618...

For the induction case, assume N'(h) ≥ φ^h:

N'(h + 1) = N'(h) + N'(h − 1)    {Fibonacci}
          ≥ φ^h + φ^(h−1)        {induction hypothesis}
          = φ^(h−1) (φ + 1)      {φ + 1 = φ² = (√5 + 3)/2}
          = φ^(h+1)
From lemma 5.1.1, we immediately obtain:

h ≤ logφ (n + 1) = logφ 2 · lg(n + 1) ≈ 1.44 lg(n + 1) (5.7)

The height of the AVL tree is bounded by O(lg n), indicating the AVL tree is balanced.
When insert or delete makes the balance factor exceed the valid range, we need to fix
it to restore |δ| ≤ 1. Traditionally, the fixing is through tree rotations. We simplify the
implementation with pattern matching. The idea is similar to the functional red-black
tree [13]. Because of this 'modify-fix' approach, the AVL tree is also self-balancing. We can
re-use the binary search tree definition. Although the balance factor δ can be computed
recursively, we record it inside each node as T = (l, k, r, δ), and update it when we mutate
the tree¹. Below example program adds δ as an Int:
data AVLTree a = Empty | Br (AVLTree a) a (AVLTree a) Int

For the AVL tree, lookup, max, and min are the same as in the binary search tree. We focus on
the insert and delete algorithms.

5.2 Insert
When inserting a new element x, some |δ(T)| may exceed 1. For the sub-trees that are
ancestors of x, the height may increase by at most 1. We need to recursively update the
balance factor along the path of insertion. Define the insert result as a pair (T', ∆H),
where T' is the updated tree and ∆H is the increment of height. We modify the binary
search tree insert function as below:
insert x = fst ◦ ins x (5.8)
Where fst (a, b) = a returns the first component of a pair. ins x T inserts element x
into tree T :
ins x ∅ = ((∅, x, ∅, 0), 1)
ins x (l, k, r, δ) = | x < k : tree (ins x l) k (r, 0) δ
                     | x > k : tree (l, 0) k (ins x r) δ    (5.9)

If the tree is empty ∅, the result is a leaf of x with balance factor 0, and the height
increases to 1. Otherwise, let T = (l, k, r, δ). We compare x with k: if x < k, we
recursively insert x to l, otherwise to r. As the recursive insert result is a pair
(l', ∆l) or (r', ∆r), we need to adjust the balance factor and update the tree height through
function tree. It takes 4 parameters: (l', ∆l), k, (r', ∆r), and δ. The result is (T', ∆H),
where T' is the new tree, and ∆H is defined as:

∆H = |T'| − |T|    (5.10)
We can further break it down into 4 cases:
∆H = |T'| − |T|
   = 1 + max(|r'|, |l'|) − (1 + max(|r|, |l|))
   = max(|r'|, |l'|) − max(|r|, |l|)
   = | δ ≥ 0, δ' ≥ 0 : ∆r
     | δ ≤ 0, δ' ≥ 0 : δ + ∆r
     | δ ≥ 0, δ' ≤ 0 : ∆l − δ
     | otherwise : ∆l    (5.11)
Where δ' = δ(T') = |r'| − |l'| is the updated balance factor. Appendix B provides the
proof. We need to determine δ' before the balance adjustment:

δ' = |r'| − |l'|
   = |r| + ∆r − (|l| + ∆l)    (5.12)
   = |r| − |l| + ∆r − ∆l
   = δ + ∆r − ∆l
1 Alternatively, we can record the height instead of δ [20] .

With the changes in height and balance factor, we can define the tree function in
eq. (5.9):

tree (l', ∆l) k (r', ∆r) δ = balance (l', k, r', δ') ∆H    (5.13)
Below example programs implements what we deduced so far:
insert x = fst ◦ ins x where
ins x Empty = (Br Empty x Empty 0, 1)
ins x (Br l k r d)
| x < k = tree (ins x l) k (r, 0) d
| x > k = tree (l, 0) k (ins x r) d

tree (l, dl) k (r, dr) d = balance (Br l k r d') deltaH where
d' = d + dr - dl
deltaH | d ≥ 0 && d' ≥ 0 = dr
| d ≤ 0 && d' ≥ 0 = d + dr
| d ≥ 0 && d' ≤ 0 = dl - d
| otherwise = dl

5.2.1 Balance
There are 4 cases that need fixing, as shown in fig. 5.3. The balance factor is ±2, exceeding
the range [−1, 1]. We adjust them to a uniform structure in the center, with δ(y) = 0.

Figure 5.3: Fix 4 cases to the same structure

We call the 4 cases: left-left, right-right, right-left, and left-right. Denote the balance
factors before fixing as δ(x), δ(y), and δ(z); after fixing, they change to δ'(x), δ'(y) = 0,
and δ'(z) respectively. The values of δ'(x) and δ'(z) are given below; Appendix B
gives the proof.
Left-left:           Right-right:
δ'(x) = δ(x)         δ'(x) = 0
δ'(y) = 0            δ'(y) = 0    (5.14)
δ'(z) = 0            δ'(z) = δ(z)

Right-left and Left-right are the same:

δ'(x) = | δ(y) = 1 : −1
        | otherwise : 0

δ'(y) = 0    (5.15)

δ'(z) = | δ(y) = −1 : 1
        | otherwise : 0

Based on this, we can implement the pattern matching fix as below:

balance (((a, x, b, δ(x)), y, c, −1), z, d, −2) ∆H = (((a, x, b, δ(x)), y, (c, z, d, 0), 0), ∆H − 1)
balance (a, x, (b, y, (c, z, d, δ(z)), 1), 2) ∆H = (((a, x, b, 0), y, (c, z, d, δ(z)), 0), ∆H − 1)
balance ((a, x, (b, y, c, δ(y)), 1), z, d, −2) ∆H = (((a, x, b, δ'(x)), y, (c, z, d, δ'(z)), 0), ∆H − 1)
balance (a, x, ((b, y, c, δ(y)), z, d, −1), 2) ∆H = (((a, x, b, δ'(x)), y, (c, z, d, δ'(z)), 0), ∆H − 1)
balance T ∆H = (T, ∆H)    (5.16)

Where δ'(x) and δ'(z) are defined in eq. (5.15). If none of the patterns matches, we keep
the tree unchanged. Below example program implements balance:
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2)) dH =
(Br (Br a x b dx) y (Br c z d 0) 0, dH-1)
balance (Br a x (Br b y (Br c z d dz) 1) 2) dH =
(Br (Br a x b 0) y (Br c z d dz) 0, dH-1)
balance (Br (Br a x (Br b y c dy) 1) z d (-2)) dH =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2) dH =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance t d = (t, d)

The performance of insert is proportional to the height of the tree. From eq. (5.7), it
is bound to O(lg n).

5.2.2 Verification
To validate an AVL tree, we need to verify two things: (1) it is a binary search tree; (2) for
every sub-tree T, eq. (5.2) holds: |δ(T)| ≤ 1. Below function examines the height difference
between the two sub-trees recursively:

avl? ∅ = True
avl? (l, k, r, δ) = avl? l and avl? r and δ = |r| − |l| and |δ| ≤ 1    (5.17)

Where the height is calculated recursively:

|∅| = 0
(5.18)
|(l, k, r, δ)| = 1 + max(|r|, |l|)

Below example program implements AVL tree height verification:


isAVL Empty = True
isAVL (Br l _ r d) = isAVL l && isAVL r &&
    d == (height r - height l) && abs d ≤ 1

height Empty = 0
height (Br l _ r _) = 1 + max (height l) (height r)
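As a quick check (a sketch; fromList is an assumed helper), we can fold insert over an ordered sequence and verify the result:

fromList = foldr insert Empty

test = isAVL (fromList [1..100])   -- True: the tree stays balanced even for ordered input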

Exercise 5.1
5.1.1. We only give the algorithm to test AVL height. Complete the program to test if a
binary tree is AVL tree.

5.3 Imperative algorithm ⋆


This section gives the imperative algorithm for completeness. Similar to the red-black
tree algorithm, we first re-use the binary search tree insert, then fix the balance through
tree rotations.
1: function Insert(T, k)
2: root ← T
3: x ← Create-Leaf(k)
4: δ(x) ← 0
5: parent ← NIL
6: while T ≠ NIL do
7: parent ← T
8: if k < Key(T ) then
9: T ← Left(T )
10: else
11: T ← Right(T )
12: Parent(x) ← parent
13: if parent = NIL then . tree T is empty
14: return x
15: else if k < Key(parent) then
16: Left(parent) ← x
17: else
18: Right(parent) ← x
19: return AVL-Insert-Fix(root, x)
After insert, the balance factor δ may change because of the tree growth. Inserting to the
right may increase δ by 1, while inserting to the left may decrease it. We perform bottom-up
fixing from x to the root. Denote the new balance factor as δ'; there are 3 cases:

• |δ| = 1, |δ'| = 0. The new node makes the tree well balanced. The height of the
parent keeps unchanged.
• |δ| = 0, |δ'| = 1. Either the left or the right sub-tree increases its height. We need to
go on checking the upper level.
• |δ| = 1, |δ'| = 2. We need to rotate the tree to fix the balance factor.

1: function AVL-Insert-Fix(T, x)
2: while Parent(x) ≠ NIL do
3: P ← Parent(x)
4: L ← Left(P )
5: R ← Right(P )
6: δ ← δ(P )
7: if x = Left(P ) then
8: δ' ← δ − 1
9: else
10: δ' ← δ + 1
11: δ(P ) ← δ'

12: if |δ| = 1 and |δ'| = 0 then . Height unchanged
13: return T
14: else if |δ| = 0 and |δ'| = 1 then . Go on bottom-up update
15: x ← P
16: else if |δ| = 1 and |δ'| = 2 then
17: if δ' = 2 then
18: if δ(R) = 1 then . Right-right
19: δ(P ) ← 0 . By eq. (5.14)
20: δ(R) ← 0
21: T ← Left-Rotate(T, P )
22: if δ(R) = −1 then . Right-left
23: δy ← δ(Left(R)) . By eq. (5.15)
24: if δy = 1 then
25: δ(P ) ← −1
26: else
27: δ(P ) ← 0
28: δ(Left(R)) ← 0
29: if δy = −1 then
30: δ(R) ← 1
31: else
32: δ(R) ← 0
33: T ← Right-Rotate(T, R)
34: T ← Left-Rotate(T, P )
35: if δ' = −2 then
36: if δ(L) = −1 then . Left-left
37: δ(P ) ← 0
38: δ(L) ← 0
39: Right-Rotate(T, P )
40: else . Left-Right
41: δy ← δ(Right(L))
42: if δy = 1 then
43: δ(L) ← −1
44: else
45: δ(L) ← 0
46: δ(Right(L)) ← 0
47: if δy = −1 then
48: δ(P ) ← 1
49: else
50: δ(P ) ← 0
51: Left-Rotate(T, L)
52: Right-Rotate(T, P )
53: break
54: return T
Besides the rotations, we also need to update δ for the impacted nodes. The right-right and
left-left cases need one rotation, while the right-left and left-right cases need two rotations.
Appendix B provides the delete implementation.
The AVL tree was developed in 1962 (earlier than the red-black tree) by Adelson-Velskii
and Landis [18], [19]; it is named after the two authors. Most tree operations are bound to
O(lg n) time. From eq. (5.7), the AVL tree is more rigidly balanced, and performs faster than
the red-black tree in lookup-intensive applications [18]. However, the red-black tree performs
better with frequent insertion and removal. Many popular self-balancing binary search
tree libraries are implemented on top of the red-black tree. The AVL tree also provides an intuitive
and effective solution to the balance problem.

5.4 Appendix: Example programs


Definition of AVL tree node.
data Node<T> {
int delta
T key
Node<T> left, right, parent
}

Fix the balance:


Node<T> insertFix(Node<T> t, Node<T> x) {
while (x.parent ≠ null) {
var (p, l, r) = (x.parent, x.parent.left, x.parent.right)
var d1 = p.delta
var d2 = if x == p.left then d1 - 1 else d1 + 1
p.delta = d2

if abs(d1) == 1 and abs(d2) == 0 {
    return t
} else if abs(d1) == 0 and abs(d2) == 1 {
x = p
} else if abs(d1) == 1 and abs(d2) == 2 {
if d2 == 2 {
if r.delta == 1 { //Right-right
p.delta = 0
r.delta = 0
t = rotateLeft(t, p)
} else if r.delta == -1 { //Right-Left
var dy = r.left.delta
p.delta = if dy == 1 then -1 else 0
r.left.delta = 0
r.delta = if dy == -1 then 1 else 0
t = rotateRight(t, r)
t = rotateLeft(t, p)
}
} else if d2 == -2 {
if l.delta == -1 { //Left-left
p.delta = 0
l.delta = 0
t = rotateRight(t, p)
} else if l.delta == 1 { //Left-right
var dy = l.right.delta
l.delta = if dy == 1 then -1 else 0
l.right.delta = 0
p.delta = if dy == -1 then 1 else 0
t = rotateLeft(t, l)
t = rotateRight(t, p)
}
}
break
}
}
return t
}
Chapter 6

Radix tree

The binary search tree stores data in nodes. Can we use the edges to carry information? Radix
trees, including the trie, prefix tree, and suffix tree, are data structures developed based
on this idea in the 1960s. They are widely used in compiler design [21] and bio-information
processing, like DNA pattern matching [23].

Figure 6.1: Radix tree.

Figure 6.1 shows a Radix tree containing the bit strings 1011, 10, 011, 100 and 0. When looking up
a key k = (b0 b1 ... bn)₂, we take the first bit b0 (MSB from the left) and check whether it is 0 or 1:
for 0, turn left; else turn right. Then take the second bit and repeat, until we either
reach a leaf node or finish all the n bits. We needn't store keys in the nodes because they are
represented by the edges. The nodes labelled with keys in fig. 6.1 are for illustration purposes.
For integer keys, we represent them in binary format, and implement lookup with
bit-wise manipulations.

6.1 Integer trie


We call the data structure in fig. 6.1 a binary trie, which was developed by Edward Fredkin
in 1960. The name comes from "retrieval", pronounced as /'tri:/ by Fredkin, while others
pronounce it as /'trai/ "try" [24]. Although it is also called a prefix tree, we treat the trie and
the prefix tree differently in this book. A binary trie is a special binary tree in which the
placement of each key is controlled by its bits: each 0 means 'go left' and each 1 means
'go right' [21]. Consider the binary trie in fig. 6.2: the three keys "11", "011", and
"0011" all equal 3.

Figure 6.2: A big-endian trie.

It is inefficient to treat the prefix zeros as valid bits: we would need a tree of 32 levels to
insert 1 as a 32-bit integer. Okasaki suggests using little-endian integers instead [21]:
1 is represented as bits (1)₂, 2 as (01)₂, 3 as (11)₂, and so on.

6.1.1 Definition
Re-using the binary tree definition, a node is either empty, or a branch containing the left and right
sub-trees and an optional value v. The left sub-tree l is encoded as 0 and the right r is
encoded as 1.
data IntTrie a = Empty | Branch (IntTrie a) (Maybe a) (IntTrie a)

Given a node, the corresponding integer key is uniquely determined by its position.
That is the reason we need not save the key, but only the value, in the node. The
type of the tree is IntTrie A, where A is the type of the value.

6.1.2 Insert
When inserting a key k and a value x, we convert the integer k into binary. If k is even, the
lowest bit is 0, and we recursively insert to the left sub-tree; otherwise k is odd, the lowest bit
is 1, and we insert to the right. Next, we divide k by 2 to remove the lowest bit. For a non-empty
trie T = (l, v, r), the function insert is defined as below:

insert k x ∅ = insert k x (∅, Nothing, ∅)
insert 0 x (l, v, r) = (l, Just x, r)
insert k x (l, v, r) = | even(k) : (insert ⌊k/2⌋ x l, v, r)
                       | odd(k) : (l, v, insert ⌊k/2⌋ x r)    (6.1)
If k = 0, we put x in the node. This algorithm overrides the value if k already
exists; alternatively, we can store a list and append x to it. As long as k ≠ 0, we go
down the tree based on the parity of k, creating an empty leaf (∅, Nothing, ∅)
whenever we meet ∅. Figure 6.3 shows an example trie, generated with the key-value pairs
{1 ↦ a, 4 ↦ b, 5 ↦ c, 9 ↦ d}. Below example program implements insert:

Figure 6.3: A little-endian integer binary trie of {1 ↦ a, 4 ↦ b, 5 ↦ c, 9 ↦ d}.

insert k x Empty = insert k x (Branch Empty Nothing Empty)


insert 0 x (Branch l v r) = Branch l (Just x) r
insert k x (Branch l v r) | even k = Branch (insert (k `div` 2) x l) v r
| otherwise = Branch l v (insert (k `div` 2) x r)

We can test even/odd with the remainder modulo 2: even(k) = (k mod 2 = 0), or
use a bit-wise operation, like (k & 0x1) == 0. We can eliminate the recursion through
loops to realize an iterative implementation:
1: function Insert(T, k, x)
2: if T = NIL then
3: T ← Empty-Node . (NIL, Nothing, NIL)
4: p←T
5: while k 6= 0 do
6: if Even?(k) then
7: if Left(p) = NIL then
8: Left(p) ← Empty-Node
9: p ← Left(p)
10: else
11: if Right(p) = NIL then
12: Right(p) ← Empty-Node
13: p ← Right(p)
14: k ← bk/2c
15: Value(p) ← x
16: return T
Insert takes a trie T, a key k, and a value x. For an integer k of m bits, it goes down m
levels, so the performance is bound to O(m). We design insert k x T and Insert(T, k, x)
to be symmetric: apply foldr to the former, and foldl (or for-loops) to the latter, to convert a
list of key-value pairs to a tree. For example:

fromList = foldr (uncurry insert) ∅    (6.2)

The usage is fromList [(1, a), (4, b), (5, c), (9, d)], where uncurry is the inverse of
Currying; it unpacks a pair and feeds the components to insert:

uncurry f (a, b) = f a b (6.3)



6.1.3 Lookup
When looking up a key k, if k = 0, then the root is the target. Otherwise, we check the lowest
bit, then recursively look up the left or right sub-tree accordingly.

lookup k ∅ = Nothing
lookup 0 (l, v, r) = v
lookup k (l, v, r) = | even(k) : lookup ⌊k/2⌋ l
                     | odd(k) : lookup ⌊k/2⌋ r    (6.4)
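Eq. (6.4) transcribes directly into an example program (a sketch; it shadows the lookup from the standard library):

lookup _ Empty = Nothing
lookup 0 (Branch _ v _) = v
lookup k (Branch l _ r) | even k = lookup (k `div` 2) l
                        | otherwise = lookup (k `div` 2) r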
We can eliminate the recursion to implement the iterative lookup:
1: function Lookup(T, k)
2: while k ≠ 0 and T ≠ NIL do
3: if Even?(k) then
4: T ← Left(T )
5: else
6: T ← Right(T )
7: k ← bk/2c
8: if T ≠ NIL then
9: return Value(T )
10: else
11: return NIL
The lookup function is bound to O(m) time, where m is the number of bits of k.

Exercise 6.1
6.1.1. Can we change the definition from Branch (IntTrie a) (Maybe a) (IntTrie a)
to Branch (IntTrie a) a (IntTrie a), and return Nothing if the value does not
exist, and Just v otherwise?

6.2 Integer prefix tree

The trie is not space efficient. As shown in fig. 6.3, there are only 4 nodes with values, while
the other 5 are empty; the space usage is less than 50%. To improve the efficiency, we can
consolidate the chained nodes. The integer prefix tree is such a data structure, developed by
Donald R. Morrison in 1968. He named it 'Patricia', standing for Practical Algorithm
To Retrieve Information Coded In Alphanumeric [22]. When the keys are integers, we
call it the integer prefix tree, or simply the integer tree when the context is clear. Consolidating
the chained nodes in fig. 6.3 gives the integer tree in fig. 6.4. The key of a branch node is
the longest common prefix of its descendant trees. In other words, the sibling sub-trees
branch out at the bit where their longest common prefix ends.

6.2.1 Definition
The integer prefix tree is a special binary tree. It is either empty ∅, or a leaf (k, v) containing
an integer key k and a value v, or a branch (p, m, l, r) with left and right sub-trees that share
the longest common prefix bits of their keys; for the left sub-tree, the next bit is 0,
for the right, it is 1. Below example program defines the integer
prefix tree. The branch node contains 4 components: the longest prefix p, a mask integer
m indicating from which bit the sub-trees branch out, and the left and right sub-trees l and r.

Figure 6.4: Little-endian integer tree of the map {1 ↦ a, 4 ↦ b, 5 ↦ c, 9 ↦ d}.

The mask m = 2^n for some integer n ≥ 0. All bits below n do not belong to the common
prefix.
data IntTree a = Empty
| Leaf Int a
| Branch Int Int (IntTree a) (IntTree a)

6.2.2 Insert
When inserting an integer y to tree T, if T is empty, we create a leaf of y. If T is a singleton
leaf of x, we create a new leaf of y, and a branch with x and y as the two sub-trees. To
determine which of x and y goes on the left, we find their longest common prefix p.
For example, if x = 12 = (1100)₂ and y = 15 = (1111)₂, then p = (11oo)₂, where o
denotes the bits we don't care about. We can use another integer m to mask those bits; in this
example, m = 4 = (100)₂. The next bit after p represents 2¹: it is 0 in x and 1 in y. Hence,
we set x as the left sub-tree and y as the right, as shown in fig. 6.5.

Figure 6.5: Left: T is a leaf of 12; Right: After insert 15.

If T is neither empty nor a leaf, we first check whether y matches the longest common
prefix p in the root, then recursively insert it to the sub-tree according to the next bit after
p. For example, when inserting y = 14 = (1110)₂ to the tree in fig. 6.5, since p = (11oo)₂
and the next bit (the bit of 2¹) is 1, we recursively insert to the right sub-tree. If y does
not match p in the root, we need to branch out a new leaf, as shown in fig. 6.6.

insert k v ∅ = (k, v)
insert k v (k, v') = (k, v)
insert k v (k', v') = join k (k, v) k' (k', v')
insert k v (p, m, l, r) = | match(k, p, m) : | zero(k, m) : (p, m, insert k v l, r)
                                             | otherwise : (p, m, l, insert k v r)
                          | otherwise : join k (k, v) p (p, m, l, r)    (6.5)

Figure 6.6: Insert to a branch. (a) Insert 14 = (1110)₂, which matches p = (1100)₂, to the right. (b) Insert 5 = (101)₂, which does not match p = (1100)₂; branch out a new leaf.



We create a leaf of (k, v) when T = ∅, and override the value for the same key. Function
match(k, p, m) tests whether k and p have the same bits after masking with m:
match(k, p, m) = (mask(k, m) = p), where mask(k, m) = k & ¬(m − 1), i.e., apply bit-wise
not to m − 1, then bit-wise and with k. zero(k, m) tests whether the next bit in k under the
mask m is 0: we shift m one bit to the right, then do bit-wise and with k:

zero(k, m) = (k & (m ≫ 1)) = 0    (6.6)

Function join(p1, T1, p2, T2) takes two prefixes and trees. It extracts the longest common
prefix (p, m) = LCP(p1, p2), creates a new branch node, then sets T1 and T2 as the sub-trees:

join(p1, T1, p2, T2) = | zero(p1, m) : (p, m, T1, T2)
                       | otherwise : (p, m, T2, T1)    (6.7)

To calculate the longest common prefix, we first compute the bit-wise exclusive-or of
p1 and p2, then count the highest bit h = highest(xor(p1, p2)), where:

highest(0) = 0
highest(n) = 1 + highest(n ≫ 1)

Then generate a mask m = 2^h. The longest common prefix p can be given by masking
the bits with m for either p1 or p2, like p = mask(p1, m). The following example program
implements the insert function:
insert k x t
= case t of
Empty → Leaf k x
Leaf k' x' → if k == k' then Leaf k x
else join k (Leaf k x) k' t
Branch p m l r
| match k p m → if zero k m
then Branch p m (insert k x l) r
else Branch p m l (insert k x r)
| otherwise → join k (Leaf k x) p t

join p1 t1 p2 t2 = if zero p1 m then Branch p m t1 t2


else Branch p m t2 t1
where
(p, m) = lcp p1 p2

lcp p1 p2 = (p, m) where


m = bit (highestBit (p1 `xor` p2))
p = mask p1 m

highestBit x = if x == 0 then 0 else 1 + highestBit (shiftR x 1)

mask x m = x .&. complement (m - 1)

zero x m = x .&. (shiftR m 1) == 0

match k p m = (mask k m) == p

We can also implement insert imperatively:


1: function Insert(T, k, v)
2: if T = NIL then
3: return Create-Leaf(k, v)
4: y←T
5: p ← NIL

6: while y is not leaf, and Match(k, Prefix(y), Mask(y)) do
7: p ← y
8: if Zero?(k, Mask(y)) then
9: y ← Left(y)
10: else
11: y ← Right(y)
12: if y is leaf, and k = Key(y) then
13: Value(y) ← v
14: else
15: z ← Branch(y, Create-Leaf(k, v))
16: if p = NIL then
17: T ←z
18: else
19: if Left(p) = y then
20: Left(p) ← z
21: else
22: Right(p) ← z
23: return T
Where Branch(T1 , T2 ) creates a new branch node, extracts the longest common pre-
fix, then sets T1 and T2 as the two sub-trees.
1: function Branch(T1 , T2 )
2: T ← Empty-Node
3: (Prefix(T ), Mask(T )) ← LCP(Prefix(T1 ), Prefix(T2 ))
4: if Zero?(Prefix(T1 ), Mask(T )) then
5: Left(T ) ← T1
6: Right(T ) ← T2
7: else
8: Left(T ) ← T2
9: Right(T ) ← T1
10: return T

11: function Zero?(x, m)
12: return (x & ⌊m/2⌋) = 0
Function LCP finds the longest common bit prefix of two integers:
1: function LCP(a, b)
2: d ← xor(a, b)
3: m ← 1
4: while d ≠ 0 do
5: d ← ⌊d/2⌋
6: m ← 2m
7: return (MaskBit(a, m), m)

8: function MaskBit(x, m)
9: return x & ¬(m − 1)
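For example, LCP(12, 15): d starts as xor(12, 15) = (11)₂ = 3; the loop halves d twice, so m = 4 = (100)₂; masking 12 with m keeps the bits above it, giving the prefix (1100)₂, which matches the earlier example.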

Figure 6.7 gives an example tree. Although the integer prefix tree consolidates the
chained nodes, the operation to extract the longest common prefix needs to scan the bits.
For an integer of m bits, it is bound to O(m).
Figure 6.7: Insert {1 ↦ x, 4 ↦ y, 5 ↦ z} to the big-endian integer tree.
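For example, the tree in fig. 6.7 can be built by folding insert over the key-value pairs (a sketch; fromList is an assumed helper):

fromList = foldr (uncurry insert) Empty

t = fromList [(1, 'x'), (4, 'y'), (5, 'z')]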

6.2.3 Lookup
When looking up a key k, if the tree T = ∅, or T is a leaf (k', v) with a different key, then k does
not exist; if k = k', then v is the result; if T = (p, m, l, r), we check whether the common
prefix p matches k under the mask m, then recursively look up the sub-tree l or r. If k fails
to match p, it does not exist.

lookup k ∅ = Nothing
lookup k (k', v) = | k = k' : Just v
                   | otherwise : Nothing
lookup k (p, m, l, r) = | match(k, p, m) : | zero(k, m) : lookup k l
                                           | otherwise : lookup k r
                        | otherwise : Nothing    (6.8)
We can also eliminate the recursion to implement the iterative lookup.


1: function Look-Up(T, k)
2: if T = NIL then
3: return NIL
4: while T is not leaf, and Match(k, Prefix(T ), Mask(T )) do
5: if Zero?(k, Mask(T )) then
6: T ← Left(T )
7: else
8: T ← Right(T )
9: if T is leaf, and Key(T ) = k then
10: return Value(T )
11: else
12: return NIL
The lookup algorithm is bound to O(m), where m is the number of bits in the key.

Exercise 6.2

6.2.1. Implement the lookup function for integer tree.


6.2.2. Implement the pre-order traverse for both integer trie and integer tree. Only
output the keys when the nodes store values. What pattern does the result follow?

6.3 Trie
When we extend the key type from 0/1 bits to a generic list, the tree structure changes from a
binary tree to one with multiple sub-trees. Taking English characters for example, there are up to
26 sub-trees when ignoring case, as shown in fig. 6.8.

Figure 6.8: A trie of 26 branches, containing keys 'a', 'an', 'another', 'bool', 'boy', and 'zoo'.

Not all the 26 sub-trees contain data. In fig. 6.8, there are only three non-empty
sub-trees, bound to 'a', 'b', and 'z'. Other sub-trees, such as the one for 'c', are empty; we will hide
them later. When it is case sensitive, or when we extend the key from alphabetic strings to generic
lists, we adopt collection types, like a map, to define the trie. A trie of type Trie K V is either
empty ∅ or a node of 2 cases:

1. A leaf of value v without any sub-trees, as (v, ∅), where the type of v is V;

2. A branch, containing a value v and multiple sub-trees. Each sub-tree is bound to an
element k of type K. Denoted as (v, ts), where ts = {k1 ↦ T1, k2 ↦ T2, ..., km ↦ Tm}
contains the mappings from ki to sub-trees Ti. The mapping can be an assoc list
or a self-balancing tree (see chapters 4, 5).

Let the empty content be (Nothing, ∅). Below example program defines the trie.
data Trie k v = Trie { value :: Maybe v
, subTrees :: Map k (Trie k v)}

6.3.1 Insert
The key to insert is a list of elements. Let the trie be T = (v, ts); ts[k] looks up k in the
map ts, returning empty when k doesn't exist; ts[k] ← t inserts a mapping from k to tree t,
and returns the updated map.

insert [ ] v (v', ts) = (Just v, ts)
insert (k:ks) v (v', ts) = (v', ts[k] ← insert ks v ts[k])    (6.9)

Below is the example program:


insert [] x (Trie _ ts) = Trie (Just x) ts
insert (k:ks) x (Trie v ts) = Trie v (Map.insert k (insert ks x t) ts) where
t = case Map.lookup k ts of
Nothing → Trie Nothing Map.empty
(Just t) → t

We can eliminate the recursion with loops:


1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node
4: p←T
5: for each c in k do
6: if Sub-Trees(p)[c] = NIL then
7: Sub-Trees(p)[c] ← Empty-Node
8: p ← Sub-Trees(p)[c]
9: Value(p) ← v
10: return T
For the key type [K] (list of K), if K is a finite set of m elements, and the length of the
key is n, then the insert algorithm is bound to O(n lg m). For lower-case English strings,
m = 26, and the insert operation is proportional to the length of the key string.
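For example, we can build the trie of fig. 6.8 by folding insert over the key-value pairs (a sketch; fromList is an assumed helper, and the values are arbitrary):

fromList = foldr (uncurry insert) (Trie Nothing Map.empty)

t = fromList [("a", 1), ("an", 2), ("another", 7),
              ("bool", 4), ("boy", 3), ("zoo", 3)]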

6.3.2 Lookup
When looking up a non-empty key (k:ks) in the trie T = (v, ts), starting from the first
element k, if there exists a sub-tree T' mapped to k, we recursively look up ks in T'.
When the key is empty, we return the value as the result:

lookup [ ] (v, ts) = v
lookup (k:ks) (v, ts) = | ts[k] = Nothing : Nothing
                        | ts[k] = Just t : lookup ks t    (6.10)

Below is the corresponding iterative implementation:


1: function Look-Up(T, key)
2: if T = NIL then
3: return Nothing
4: for each c in key do
5: if Sub-Trees(T )[c] = NIL then
6: return Nothing
7: T ← Sub-Trees(T )[c]
8: return Value(T )
The lookup algorithm is bound to O(n lg m), where n is the length of the key, and m
is the size of the element set.

6.4 Prefix tree

The trie is not space efficient. We consolidate the chained nodes to obtain the prefix tree.
A prefix tree node t contains two parts: an optional value v, and zero or multiple sub prefix
trees, each ti bound to a list si, as [si ↦ ti]. These lists share the longest common
prefix s bound to the node t, i.e., s is the longest common prefix of s ++ s1, s ++ s2, ...; for
any i ≠ j, the lists si and sj have no non-empty common prefix. Consolidating the chained
nodes in fig. 6.8, we get the prefix tree in fig. 6.9.

Figure 6.9: A prefix tree with keys: 'a', 'an', 'another', 'bool', 'boy', 'zoo'.

Below example program defines the prefix tree¹:


data PrefixTree k v = PrefixTree { value :: Maybe v
, subTrees :: [([k], PrefixTree k v)]}

Denote prefix tree t = (v, ts). Particularly, (Nothing, [ ]) is the empty node, and
(Just v, [ ]) is the leaf of v.

6.4.1 Insert
When inserting a key s, if the tree is empty, we create a leaf of s, as in fig. 6.10 (a). Otherwise, if
there is a non-empty common prefix between s and si, where si is bound to some sub-tree
ti, we branch out a new leaf tj: extract the common prefix, map it to a new internal
branch node t', and put ti and tj as the two sub-trees of t'. Figure 6.10 (b) shows this case.
There are two special cases: s is a prefix of si, as shown in fig. 6.10 (c) → (e); or si is
a prefix of s, as shown in fig. 6.10 (d) → (e).
Below function inserts key s and value v to the prefix tree t = (v', ts):

insert [ ] v (v', ts) = (Just v, ts)
insert s v (v', ts) = (v', ins ts)    (6.11)

If the key s is empty, we overwrite the value with v; otherwise, we call ins to examine
the sub-trees and their prefixes.

ins [ ] = [s ↦ (Just v, [ ])]
ins ((s' ↦ t):ts') = | match s s' : (branch s v s' t) : ts'
                     | otherwise : (s' ↦ t) : ins ts'    (6.12)

If the node hasn’t any sub-trees, we create a leaf of v as the only sub-tree, and map
s to it; otherwise, for each mapping s0 7→ t, we compare s0 with s. If they have common
prefix (tested by the match function), then we branch out new sub-tree. We define two
lists matching if they have common prefix:
¹Alternatively, we can use Map [k] (PrefixTree k v) to manage the sub-trees.

Figure 6.10: (a) insert 'boy' to empty tree; (b) insert 'bool', branch a new node out; (c) insert 'another' to (b); (d) insert 'an' to (b); (e) insert 'an' to (c), same result as insert 'another' to (d)

match [ ] B = True
match A [ ] = True    (6.13)
match (a:as) (b:bs) = a = b

To extract the longest common prefix of A and B, define (C, A', B') = lcp A B, where
C ++ A' = A and C ++ B' = B. If either A or B is empty, or their first elements
differ, then the common prefix C = [ ]; otherwise, we recursively extract the longest
common prefix from the rest, and prepend the head:

lcp [ ] B = ([ ], [ ], B)
lcp A [ ] = ([ ], A, [ ])
lcp (a:as) (b:bs) = | a ≠ b : ([ ], a:as, b:bs)
                    | otherwise : (a:cs, as', bs'), where (cs, as', bs') = lcp as bs    (6.14)
Function branch A v B t takes two keys A and B, a value v, and a tree t. It extracts
the longest common prefix C from A and B, maps it to a new branch node, and assigns
the sub-trees:

branch A v B t = case lcp A B of
    | (C, [ ], B') : (C, (Just v, [B' ↦ t]))
    | (C, A', [ ]) : (C, insert A' v t)    (6.15)
    | (C, A', B') : (C, (Nothing, [A' ↦ (Just v, [ ]), B' ↦ t]))

If A is a prefix of B, then A is mapped to the node of v, and the remaining list B' is
re-mapped to t, the only sub-tree in the branch; if B is a prefix of A, then we
recursively insert the remaining list and the value to t; otherwise, we create a leaf of v,
and put it together with t as the two sub-trees. The following example program implements
the insert algorithm:

insert [] v (PrefixTree _ ts) = PrefixTree (Just v) ts


insert k v (PrefixTree v' ts) = PrefixTree v' (ins ts) where
ins [] = [(k, leaf v)]
ins ((k', t) : ts) | match k k' = (branch k v k' t) : ts
| otherwise = (k', t) : ins ts

leaf v = PrefixTree (Just v) []

match [] _ = True
match _ [] = True
match (a:_) (b:_) = a == b

branch a v b t = case lcp a b of


(c, [], b') → (c, PrefixTree (Just v) [(b', t)])
(c, a', []) → (c, insert a' v t)
(c, a', b') → (c, PrefixTree Nothing [(a', leaf v), (b', t)])

lcp [] bs = ([], [], bs)


lcp as [] = ([], as, [])
lcp (a:as) (b:bs) | a ≠ b = ([], a:as, b:bs)
| otherwise = (a:cs, as', bs') where
(cs, as', bs') = lcp as bs

We can eliminate the recursion to implement the insert algorithm in loops.


1: function Insert(T, k, v)
2: if T = NIL then
3: T ← Empty-Node
4: p←T
5: loop
6: match ← FALSE
7: for each si ↦ Ti in Sub-Trees(p) do
8: if k = si then
9: Value(Ti ) ← v . Overwrite
10: return T
11: c ← LCP(k, si )
12: k1 ← k − c, k2 ← si − c
13: if c 6= NIL then
14: match ← TRUE
15: if k2 = NIL then . si is prefix of k
16: p ← Ti , k ← k1
17: break
18: else . Branch out a new leaf
19: Add(Sub-Trees(p), c ↦ Branch(k1, Leaf(v), k2, Ti))
20: Delete(Sub-Trees(p), si ↦ Ti)
21: return T
22: if not match then . Add a new leaf
23: Add(Sub-Trees(p), k ↦ Leaf(v))
24: break
25: return T
Function LCP extracts the longest common prefix from two lists.
1: function LCP(A, B)
2: i←1
3: while i ≤ |A| and i ≤ |B| and A[i] = B[i] do
4: i←i+1
5: return A[1...i − 1]

There is a special case in Branch(s1, T1, s2, T2): if s1 is empty, the key to be inserted
is a prefix of the existing one, so we set T2 as the sub-tree of T1. Otherwise, we create a new branch node
and set T1 and T2 as the two sub-trees.
1: function Branch(s1 , T1 , s2 , T2 )
2: if s1 = NIL then
3: Add(Sub-Trees(T1), s2 ↦ T2)
4: return T1
5: T ← Empty-Node
6: Sub-Trees(T) ← {s1 ↦ T1, s2 ↦ T2}
7: return T
Although the prefix tree improves the space efficiency, it is still bound to O(mn),
where n is the length of the key, and m is the size of the element set.
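Similarly, the prefix tree of fig. 6.9 can be built by folding insert (a sketch; fromList is an assumed helper, and the values are arbitrary):

fromList = foldr (uncurry insert) (PrefixTree Nothing [])

t = fromList [("a", 1), ("an", 2), ("another", 7),
              ("bool", 4), ("boy", 3), ("zoo", 3)]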

6.4.2 Lookup
When looking up a key k, we start from the root. If k = [ ], we return the root value;
otherwise, we examine the sub-tree mappings to locate the one si ↦ ti such that si is a
prefix of k, then recursively look up k − si in the sub-tree ti. If no si is a prefix
of k, then the key is not in the tree.

lookup [ ] (v, ts) = v
lookup k (v, ts) = case find ((s, t) ↦ s ⊑ k) ts of
    | Nothing : Nothing
    | Just (s, t) : lookup (k − s) t    (6.16)

Where A ⊑ B means list A is a prefix of B. Function find is defined in chapter 1; it
searches a list for an element satisfying a given predicate. Below example program implements
lookup.
lookup.

lookup [] (PrefixTree v _) = v
lookup ks (PrefixTree v ts) =
case find (λ(s, t) → s `isPrefixOf` ks) ts of
Nothing → Nothing
Just (s, t) → lookup (drop (length s) ks) t

The prefix testing is linear in the length of the list; the lookup algorithm is bound to
O(mn) time, where m is the size of the element set, and n is the length of the key. We
leave the imperative implementation as an exercise.

Exercise 6.3

6.3.1. Eliminate the recursion to implement the prefix tree lookup purely with loops.

6.5 Applications of trie and prefix tree

We can use the trie and prefix tree to solve many interesting problems, like developing a smart
dictionary that populates candidate words, and realizing the textonym input method.
Different from commercial software, we give these examples to illustrate the ideas of the
trie and prefix tree.

6.5.1 Dictionary and input completion

As shown in fig. 6.11, when the user enters some characters, the dictionary application
searches its library, and populates a list of candidate words or phrases that start with the
input.

Figure 6.11: A dictionary application

A dictionary contains hundreds of thousands of words, so it's expensive to perform a
complete search. Commercial dictionaries adopt various engineering approaches, like caching and
indexing, to speed up the search. Figure 6.12 shows a smart text input component: when the user
types some characters, it populates a candidate list, with all items starting with the input.

Figure 6.12: A smart text input component

Both examples give the 'auto-completion' functionality. We can implement it with the
prefix tree. For illustration purposes, we limit ourselves to English characters, and set an upper
bound n for the number of candidates. A dictionary stores key-value pairs, where the
key is an English word or phrase, and the value is the corresponding meaning and explanation.
When the user inputs a string s, we look up the prefix tree for all keys starting with s. If s is empty,
we expand all sub-trees until we reach n candidates; otherwise, we locate the sub-tree from
the mapped key, and look up recursively. If the environment supports lazy evaluation,
we can expand all candidates, and take the first n on demand: take n (startsWith s t),
where t is the prefix tree.

startsWith [ ] (Nothing, ts) = enum ts
startsWith [ ] (Just x, ts) = ([ ], x) : enum ts
startsWith s (v, ts) = case find ((k, t) ↦ s ⊑ k or k ⊑ s) ts of    (6.17)
    | Nothing : [ ]
    | Just (k, t) : [(k ++ a, b) | (a, b) ∈ startsWith (s − k) t]

Given a prefix s, function startsWith searches all candidates in the prefix tree starting
with s. If s is empty, it enumerates all sub-trees, and prepends ([ ], x) for a non-empty
value x in the root. Function enum ts is defined as:

enum = concatMap ((k, t) ↦ [(k ++ a, b) | (a, b) ∈ startsWith [ ] t])    (6.18)

Where concatMap (also known as flatMap) is an important concept for list computation.
Literally, it maps over each element, then concatenates the results together.
It's typically realized with the 'build-foldr' fusion law to eliminate the intermediate
list overhead (chapter 5 of [99]). If the input prefix s is not empty, we examine the
sub-tree mappings: for each list and sub-tree pair (k, t), if either s is a prefix of k or vice
versa, we recursively expand t and prepend k to each result key; otherwise, s does not
match any sub-tree, hence the result is empty. Below example program implements this
algorithm.
startsWith [] (PrefixTree Nothing ts) = enum ts
startsWith [] (PrefixTree (Just v) ts) = ([], v) : enum ts
startsWith k (PrefixTree _ ts) =
case find (λ(s, t) → s `isPrefixOf` k | | k `isPrefixOf` s) ts of
Nothing → []
Just (s, t) → [(s ++ a, b) |
(a, b) ← startsWith (drop (length s) k) t]

enum = concatMap (λ(k, t) → [(k ++ a, b) | (a, b) ← startsWith [] t])
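For instance, with the prefix tree t built from the example words earlier (a usage sketch), we can populate up to 5 candidates on demand:

candidates = take 5 (startsWith "bo" t)   -- e.g. [("bool", 4), ("boy", 3)]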

We can also realize the algorithm Starts-With(T, k, n) imperatively. From the root,
we loop over every sub-tree mapping ki ↦ Ti. If k is a prefix of some sub-tree key ki, we
expand everything in that sub-tree, up to n items; if ki is a prefix of k, we drop that prefix,
update the key to k − ki, then search Ti for this new key.
1: function Starts-With(T, k, n)
2: if T = NIL then
3: return NIL
4: s ← NIL
5: repeat
6: match ← FALSE
7: for ki ↦ Ti in Sub-Trees(T) do
8: if k is prefix of ki then
9: return Expand(s ++ ki, Ti, n)
10: if ki is prefix of k then
11: match ← TRUE
12: k ← k − ki . drop the prefix
13: T ← Ti
14: s ← s ++ ki
15: break
16: until not match
17: return NIL
Where function Expand(s, T, n) populates up to n results from T and prepends s to each
key. We implement it with the 'breadth first search' method (see section 14.6.1):

1: function Expand(s, T, n)
2: R ← NIL
3: Q ← [(s, T )]
4: while |R| < n and Q ≠ NIL do
5: (k, T ) ← Pop(Q)
6: v ← Value(T )
7: if v 6= NIL then
8: Insert(R, (k, v))
9: for ki ↦ Ti in Sub-Trees(T) do
10: Push(Q, (k ++ ki, Ti))

6.5.2 Predictive text input

Before 2010, most mobile phones had a small keypad as shown in fig. 6.13, called the ITU-T
keypad. It maps a digit to 3 - 4 characters. For example, to input the word 'home',
one presses the keys in the below sequence:

Figure 6.13: The mobile phone ITU-T keypad.

1. Press key ‘4’ twice to enter ‘h’;


2. Press key ‘6’ three times to enter ‘o’;
3. Press key ‘6’ to enter ‘m’;
4. Press key ‘3’ twice to enter ‘e’;

A smarter input method allows to press less keys:

1. Press key sequence ‘4’, ‘6’, ‘6’, ‘3’, the word ‘home’ appears as a candidate;
2. Press key ‘*’ to change to next candidate, word ‘good’ appears;
3. Press key ’*’ again for another candidate, word ‘gone’ appears;
4. ...

This is called predictive input, abbreviated as 'T9' [25], [26]. The commercial
implementations use multiple layers of caches/indexes in both memory and the file system. We
simplify it as an example of prefix tree application. First, we define the digit key
mappings:

M_T9 = { 2 ↦ "abc", 3 ↦ "def", 4 ↦ "ghi",
         5 ↦ "jkl", 6 ↦ "mno", 7 ↦ "pqrs",    (6.19)
         8 ↦ "tuv", 9 ↦ "wxyz" }

M_T9[i] gives the corresponding characters for digit i. We can also define the reversed
mapping from a character back to a digit:

M_T9⁻¹ = concatMap ((d, s) ↦ [(c, d) | c ∈ s]) M_T9    (6.20)

Given a string, we can convert it to a sequence of digits by looking up M_T9⁻¹:

digits(s) = [M_T9⁻¹[c] | c ∈ s]    (6.21)

For any character not in [a..z], we map it to a special key ‘#’. Below example program
defines the two mappings.
mapT9 = Map.fromList [('2', ”abc ”), ('3', ” d e f ”), ('4', ” ghi ”),
('5', ” j k l ”), ('6', ”mno”), ('7', ” pqrs ”),
('8', ” tuv ”), ('9', ”wxyz”)]

rmapT9 = Map.fromList $ concatMap (λ(d, s) → [(c, d) | c ← s]) $


Map.toList mapT9

digits = map (λc → Map.findWithDefault '#' c rmapT9)

Suppose we already builded the prefix tree (v, ts) from the words. We need change the
above auto completion algorithm to process digit string ds. For every sub-tree mappings
(s 7→ t) ∈ ts, we convert the prefix s to digits(s), check if it matches to ds (either one is
the prefix of the other). There can be multiple sub-trees match ds as:

pfx = [(s, t)|(s 7→ t) ∈ ts, digits(s) v ds or ds v digits(s)]

f indT 9 t [ ] = [[ ]]
(6.22)
f indT 9 (v, ts) ds = concatMap f ind pfx

For each mapping (s, t) in pfx, function f ind recursively looks up the remaining digits
ds0 in t, where ds0 = drop |s| ds, then prepend s to every candidate. However, the length
may exceed the number of input digits, we need cut and only take n = |ds| characters:

+ si )|si ∈ f indT 9 t ds0 ]


f ind (s, t) = [take n (s + (6.23)

The following example program implements the predictive input algorithm:


findT9 _ [] = [[]]
findT9 (PrefixTree _ ts) k = concatMap find pfx where
find (s, t) = map (take (length k) ◦ (s++)) $ findT9 t (drop (length s) k)
pfx = [(s, t) | (s, t) ← ts, let ds = digits s in
ds `isPrefixOf` k | | k `isPrefixOf` ds]

To realize the predictive text input imperatively, we can perform breadth first search
with a queue Q of tuples (prefix, D, t). Every tuple records the possible prefix searched so
far; the remaining digits D to be searched; and the sub-tree t we are going to search. Q is
initialized with the empty prefix, the whole digits sequence, and the root. We repeatedly
pop the tuple from the queue, and examine the sub-tree mappings. for every mapping
(s 7→ T 0 ), we convert s to digits(s). If D is prefix of it, then we find a candidate. We
append s to prefix, and record it in the result. If digits(s) is prefix of D, we need further
search the sub-tree T 0 . We create a new tuple of (prefix + + s, D0 , T 0 ), where D0 is the
remaining digits to be searched. Then add this new tuple back to the queue.
1: function Look-Up-T9(T, D)
2: R ← NIL
3: if T = NIL or D = NIL then
4: return R
84 CHAPTER 6. RADIX TREE

5: n ← |D|
6: Q ← {(NIL, D, T )}
7: while Q 6= NIL do
8: (prefix, D, T ) ← Pop(Q)
9: for (s 7→ T 0 ) ∈ Sub-Trees(T ) do
10: D0 ← Digits(s)
11: if D0 @ D then . D0 is prefix of D
12: Append(R, (prefix + + s)[1..n]) . limit the length to n
13: else if D @ D0 then
14: Push(Q, (prefix +
+ s, D − D0 , T 0 ))
15: return R
We start from integer trie and prefix tree. With binary format, we re-use binary tree
to realize the integer based map data structure. Then extend the key from integer to
generic list of finite set. Particularly for alphabetic strings, the generic trie and prefix
tree are powerful tools to manipulate the text. We give examples about auto-completion
and predictive text input. As another instance of radix tree, the suffix tree, that is widely
used in text/DNA processing, is closely related to trie and prefix tree.

Exercise 6.4
6.4.1. Implement the auto-completion and predictive text input with trie.
6.4.2. How to ensure the candidates in lexicographic order in the auto-completion and
predictive text input program? What’s the performance change accordingly?

6.6 Appendix: Example programs


Definition of integer binary trie:
data IntTrie<T> {
IntTrie<T> left = null, right = null
Optional<T> value = Optional.Nothing
}

Imperative insert with bit-wise operations:


IntTrie<T> insert(IntTrie<T> t, Int key,
Optional<T> value = Optional.Nothing) {
if t == null then t = IntTrie<T>()
p = t
while key 6= 0 {
if key & 1 == 0 {
p = if p.left == null then IntTrie<T>() else p.left
} else {
p = if p.right == null then IntTrie<T>() else p.right
}
key = key >> 1
}
p.value = Optional.of(value)
return t
}

Definition of integer prefix tree:


data IntTree<T> {
Int key
T value
Int prefix = 0, mask = 1
IntTree<T> left = null, right = null
6.6. APPENDIX: EXAMPLE PROGRAMS 85

IntTree(Int k, T v) {
key = k, value = v, prefix = k
}

bool isLeaf = (left == null and right == null)

Self replace(IntTree<T> x, IntTree<T> y) {


if left == x then left = y else right = y
}

bool match(Int k) = maskbit(k, mask) == prefix


}

Int maskbit(Int x, Int mask) = x & (~(mask - 1))

Insert key-value to integer prefix tree.


IntTree<T> insert(IntTree<T> t, Int key, T value) {
if t == null then return IntTree(key, value)
node = t
Node<T> parent = null
while (not node.isLeaf()) and node.match(key) {
parent = node
node = if zero(key, node.mask) then node.left else node.right
}
if node.isleaf() and key == node.key {
node.value = value
} else {
p = branch(node, IntTree(key, value))
if parent == null then return p
parent.replace(node, p)
}
return t
}

IntTree<T> branch(IntTree<T> t1, IntTree<T> t2) {


var t = IntTree<T>()
(t.prefix, t.mask) = lcp(t1.prefix, t2.prefix)
(t.left, t.right) = if zero(t1.prefix, t.mask) then (t1, t2)
else (t2, t1)
return t
}

bool zero(int x, int mask) = (x & (mask >> 1) == 0)

Int lcp(Int p1, Int p2) {


Int diff = p1 ^ p2
Int mask = 1
while diff 6= 0 {
diff = diff >> 1
mask = mask << 1
}
return (maskbit(p1, mask), mask)
}

Definition of trie and the insert:


data Trie<K, V> {
Optional<V> value = Optional.Nothing
Map<K, Trie<K, V>> subTrees = Map.empty()
}

Trie<K, V> insert(Trie<K, V> t, [K] key, V value) {


if t == null then t = Trie<K, V>()
var p = t
86 CHAPTER 6. RADIX TREE

for c in key {
if p.subTrees[c] == null then p.subTrees[c] = Trie<K, V>()
p = p.subTrees[c]
}
p.value = Optional.of(value)
return t
}

Definition of Prefix Tree and insert:


data PrefixTree<K, V> {
Optional<V> value = Optional.Nothing
Map<[K], PrefixTree<K, V>> subTrees = Map.empty()

Self PrefixTree(V v) {
value = Optional.of(v)
}
}

PrefixTree<K, V> insert(PrefixTree<K, V> t, [K] key, V value) {


if t == null then t = PrefixTree()
var node = t
loop {
bool match = false
for var (k, tr) in node.subtrees {
if key == k {
tr.value = value
return t
}
prefix, k1, k2 = lcp(key, k)
if prefix 6= [] {
match = true
if k2 == [] {
node = tr
key = k1
break
} else {
node.subtrees[prefix] = branch(k1, PrefixTree(value),
k2, tr)
node.subtrees.delete(k)
return t
}
}
}
if !match {
node.subtrees[key] = PrefixTree(value)
break
}
}
return t
}

The longest common prefix lcp and branch.


([K], [K], [K]) lcp([K] s1, [K] s2) {
j = 0
while j < length(s1) and j < length(s2) and s1[j] == s2[j] {
j = j + 1
}
return (s1[0..j-1], s1[j..], s2[j..])
}

PrefixTree<K, V> branch([K] key1, PrefixTree<K, V> tree1,


[K] key2, PrefixTree<K, V> tree2) {
if key1 == []:
tree1.subtrees[key2] = tree2
6.6. APPENDIX: EXAMPLE PROGRAMS 87

return tree1
t = PrefixTree()
t.subtrees[key1] = tree1
t.subtrees[key2] = tree2
return t
}

Populate multiple candidates, they share the common prefix


[([K], V)] startsWith(PrefixTree<K, V> t, [K] key, Int n) {
if t == null then return []
[T] s = []
repeat {
bool match = false
for var (k, tr) in t.subtrees {
if key.isPrefixOf(k) {
return expand(s ++ k, tr, n)
} else if k.isPrefixOf(key) {
match = true
key = key[length(k)..]
t = tr
s = s ++ k
break
}
}
} until not match
return []
}

[([K], V)] expand([K] s, PrefixTree<K, V> t, Int n) {


[([K], V)] r = []
var q = Queue([(s, t)])
while length(r) < n and !q.isEmpty() {
var (s, t) = q.pop()
v = t.value
if v.isPresent() then r.append((s, v.get()))
for k, tr in t.subtrees {
q.push((s ++ k, tr))
}
}
return r
}

Predictive text input lookup


var T9MAP={'2':"abc", '3':"def", '4':"ghi", '5':"jkl",
'6':"mno", '7':"pqrs", '8':"tuv", '9':"wxyz"}

var T9RMAP = { c : d for var (d, cs) in T9MAP for var c in cs }

string digits(string w) = ''.join([T9RMAP[c] for c in w])

[string] lookupT9(PrefixTree<char, V> t, string key) {


if t == null or key == "" then return []
res = []
n = length(key)
q = Queue(("", key, t))
while not q.isEmpty() {
(prefix, key, t) = q.pop()
for var (k, tr) in t.subtrees {
ds = digits(k)
if key.isPrefixOf(ds) {
res.append((prefix ++ k)[:n])
} else if ds.isPrefixOf(key) {
q.append((prefix ++ k, key[length(k)..], tr))
}
88 B-Tree

}
}
return res
}
Chapter 7

B-Tree

7.1 Introduction
The integer prefix tree in previous chapter gives a way to encode the information in the
edge of the binary tree. Another way to extend the binary search tree is to increase
the sub-trees from 2 to k. B-tree is such a data structure, that can be considered as a
generic form of k-ary search tree. It is also developed to be self-balancing [39] . B-tree is
widely used in computer file system (some are based on B+ tree, an extension of B-tree)
and database system. Figure 7.1 gives an example B-tree, we can find the difference and
similarity between B-tree and binary search tree.

E J S V

A C D G K M O P R T U X Y Z

Figure 7.1: A B-Tree

A binary search tree is either empty or contains a key k and two sub-trees l and r.
Every key in l is less than k, while k is less than every key in r:

∀ x ∈ l, y ∈ r ⇒ x < k < y (7.1)

Extend to multiple keys and sub-trees: a B-tree is either empty or contains n keys and
n+1 sub-trees, each sub-tree is also a B-tree, denoted as k1 , k2 , ..., kn and t1 , t2 , ..., tn , tn+1 ,
as shown in fig. 7.2.

k1 k2 … kn

t1 t2 tn tn+1

Figure 7.2: A B-tree node

For every node, the keys and sub-trees satisfy the following rules:

89
90 CHAPTER 7. B-TREE

• Keys are in ascending order: k1 < k2 < ... < kn ;

• For every key ki , all keys in sub-tree ti are less than it, while ki is less than every
key in sub-tree ti+1 :

∀ xi ∈ ti , i = 0, 1, ..., n ⇒ x1 < k1 < x2 < k2 < ... < xn < kn < xn+1 (7.2)

Leaf node has no sub-tree (accurately, all sub-trees are empty). There can be optional
values bound to the keys. We skip the values for simplicity. The type of B-tree is BT ree K
(or BTree<K>), where K is the type of keys. On top of it, we also need define a set of
self-balancing rules:

1. All leaves have the same depth;

2. Let d be the minimum degree number of a B-tree, such that each node:

• has at most 2d − 1 keys;


• has at least d − 1 keys, except for the root;

In summary:

d − 1 ≤ |keys(t)| ≤ 2d − 1 (7.3)

We next prove that a B-tree satisfying these rules is always balanced.

Proof. Consider a B-tree of n keys. The minimum degree d ≥ 2. Let the height be h. All
the nodes have at least d − 1 keys except for the root. The root contains at least 1 key.
There are at least 2 nodes at depth 1, at least 2d nodes at depth 2, at least 2d2 nodes at
depth 3, ..., at least 2dh−1 nodes at depth h. Multiply all nodes with d − 1 except for the
root, the total number of keys satisfies the following:

n ≥ 1 + (d − 1)(2 + 2d + 2d2 + ... + 2dh−1 )


h−1
X
= 1 + 2(d − 1) dk
k=0 (7.4)
dh − 1
= 1 + 2(d − 1)
d−1
= 2dh − 1

It limits the tree height with logarithm of the number of keys.

n+1
h ≤ logd (7.5)
2

Hence B-tree is balanced. The simplest B-tree is called 2-3-4 tree, where d = 2. Every
node except for the root contains 2, 3, or 4 sub-trees. Essentially, a red-black tree can be
mapped to a 2-3-4 tree. For a none empty B-tree of degree d, we denote it as (d, (ks, ts)),
where ks are the keys, ts are the sub-trees. Below example program defines the B-tree.
data BTree a = BTree [a] [BTree a]

Let the empty node be (∅, ∅) (or BTree [] []). Instead of storing d in every node,
we pass it together with B-tree t as a pair (d, t).
7.2. INSERT 91

7.2 Insert
The idea is similar to the binary search tree. While we need deal with multiple keys and
sub-trees. When insert key x to B-tree t, starting from the root, we examine the keys in
the node to locate a position1 where all keys on the left are less than x, while the rest keys
on the right are greater than x. If the node is a leaf, and it is not full (|keys(t)| < 2d − 1),
we insert x at this position. Otherwise, this position points to a sub-tree t0 , we recursively
insert x to t0 .
20

4 11 26 38 45

1 2 5 8 9 12 15 16 17 21 25 30 31 37 40 42 46 47 50

20

4 11 26 38 45

1 2 5 8 9 12 15 16 17 21 22 25 30 31 37 40 42 46 47 50

Figure 7.3: Insert 22 to a 2-3-4 tree. 22 > 20, go to the right sub-tree; next as 22 < 26,
go to the first sub-tree; finally, 21 < 22 < 25, and the leaf is not full.

For example, consider the 2-3-4 tree in fig. 7.3. when insert x = 22, because 20 < 22,
we next examine the sub-tree on the right, which contains 26, 38, 45. Since 22 < 26, we
next go to the first sub-tree containing 21 and 25. This is a leaf, and it is not full. Hence
we insert 22 to this node.
However, if there are already 2d − 1 keys in the leaf, we will break the B-tree rules
after insert (too ’full’). For the same B-tree in fig. 7.3, we’ll meet this issue when insert
18. There are two solutions: insert then split, and split before insert.

7.2.1 Insert then split


We can adopt the similar ‘insert then fix’ method for the red-black tree. First, we insert
the key to the proper ordering position without considering the B-tree balancing rules.
Next, if the new tree violates any rule, we recursively bottom-up split and fix the overly full
node. Define a function to test whether a node satisfies the minimum degree constraint:
(
f ull d (ks, ts) = |ks| > 2d − 1
(7.6)
low d (ks, ts) = |ks| < d − 1

When the node contains too many keys and sub-trees, define split function to break
it into 3 parts at position m as shown in fig. 7.4:

split m (ks, ts) = ((ksl , tsl ), k, (ksr , tsr )) (7.7)

We reuse the list splitAt function in eq. (1.45).


(
(ksl , (k : ksr )) = splitAt (m − 1) ks
(tsl , tsr ) = splitAt m ts
1 In fact, it is sufficient to only support less-than and equality. See Exercise 7.1.1.
92 CHAPTER 7. B-TREE

k1 … km-1 km km+1 … kn

t1 tm tm+1 tn+1

km

k1 … km-1 km+1 … kn

t1 tm tm+1 tn+1

Figure 7.4: Split the node into 3 parts at m

Conversely, unsplit combines the 3 parts back into a B-tree node:

unsplit (ksl , tsl ) k (ksr , tsr ) = (ksl +


+ [k] +
+ ksr , tsl +
+ tsr ) (7.8)

We first insert x to the tree t, then call f ix to resume the B-tree balancing rules with
degree d.

insert x (d, t) = f ix (d, ins t) (7.9)

After ins, if the root contains too many keys, function f ix calls split to break it and
build a new root.

(d, ([k], [l, r])), where (l, k, r) = split d t


(
f ull d t :
f ix (d, t) = (7.10)
otherwise : (d, t)

ins need handle two cases: for leaf, reuse the list ordered insert defined in eq. (1.11);
otherwise, we need find the position of the sub-tree where recursively insert to. Define a
partition as:

partition x (ks, ts) = (l, t0 , r) (7.11)

Where l = (ksl , tsl ) and r = (ksr , tsr ). It further calls the list partition function span
defined in eq. (1.47):
(
(ksl , ksr ) = span (< x) ks
0
(tsl , (t : tsr )) = splitAt |ksl | ts

As such, we separate all the keys and sub-trees less than x on the left as l, and those
greater than x on the right as r. The last sub-tree that less than x is extracted as t0 . We
then recursively insert x to t0 , as shown in fig. 7.5.

ins (ks, ∅) = (insertL x ks, ∅) list insert for leaf


(7.12)
ins (ks, ts) = balance d l (ins t0 ) r where (l, t0 , r) = partition x t

After insert x to t0 , it may contains too many keys that violates B-tree rules. We
define function balance to recursively recover B-tree rules by splitting sub-tree.
7.2. INSERT 93

ki-1 < x < ki


insert

k1 … ki-1 ki … kn

t1 ti tn+1

k1 … ki-1 ki … kn
ki-1 < x < ki
insert

t1 ti-1 ti ti+1 tn+1

l t’ r

Figure 7.5: partition a node with x

(
f ull d t : f ixf
balance d (ksl , tsl ) t (ksr , tsr ) = (7.13)
otherwise : (ksl ++ ksr , tsl +
+ [t] +
+ tsr )

where f ixf splits sub-tree t as (t1 , k, t2 ) = split d t, then combine them to a new
node:

f ixf = (ksl +
+ [k] +
+ ksr , tsl +
+ [t1 , t2 ] +
+ tsr ) (7.14)

The following example program implements insert for B-tree.

partition x (BTree ks ts) = (l, t, r) where


l = (ks1, ts1)
r = (ks2, ts2)
(ks1, ks2) = span (< x) ks
(ts1, (t:ts2)) = splitAt (length ks1) ts

split d (BTree ks ts) = (BTree ks1 ts1, k, BTree ks2 ts2) where
(ks1, k:ks2) = splitAt (d - 1) ks
(ts1, ts2) = splitAt d ts

insert x (d, t) = fixRoot (d, ins t) where


ins (BTree ks []) = BTree (List.insert x ks) []
ins t = balance d l (ins t') r where (l, t', r) = partition x t

fixRoot (d, t) | full d t = let (t1, k, t2) = split d t in


(d, BTree [k] [t1, t2])
| otherwise = (d, t)

balance d (ks1, ts1) t (ks2, ts2)


| full d t = fixFull
| otherwise = BTree (ks1 ++ ks2) (ts1 ++ [t] ++ ts2)
where
fixFull = let (t1, k, t2) = split d t in
BTree (ks1 ++ [k] ++ ks2) (ts1 ++ [t1, t2] +
+ ts2)

Figure 7.6 shows the example B-trees built from “GMPXACDEJKNORSTUVYZ”.


94 CHAPTER 7. B-TREE

E J S V

A C D G K M O P R T U X Y Z

E O U

A C D G J K M N P R S T V X Y Z

Figure 7.6: Repeatedly insert elements from “GMPXACDEJKNORSTUVYZ”. above:


d = 2 (2-3-4 tree), below: d = 3

7.2.2 Split before insert


The second method is to split a node before insertion to prevent it overly full. We often
see this method in imperative implementation. When perform top-down recursive insert,
if we reach to a node with 2d − 1 keys, we divide it into 3 parts as shown in fig. 7.4, such
that each new node has d − 1 keys. They will be valid B-tree node after insertion. For
node x, let K(x) be the keys, T (x) be the sub-trees. Denote the i-th key of x as ki (x),
the j-th sub-tree as tj (x). Below algorithm splits the i-th sub-tree of node z:
1: procedure Split(z, i)
2: d ← Deg(z)
3: x ← ti (z)
4: y ← Create-Node
5: K(y) ← [kd+1 (x), kd+2 (x), ..., k2d−1 (x)]
6: K(x) ← [k1 (x), k2 (x), ..., kd−1 (x)]
7: if x is not leaf then
8: T (y) ← [td+1 (x), td+2 (x), ..., t2d (x)]
9: T (x) ← [t1 (x), t2 (x), ..., td (x)]
10: Insert-At(K(z), i, kd (x))
11: Insert-At(T (z), i + 1, y)
When split the node x = ti (z), we push the d-th key kd (x) up to the parent node z.
If z is already full, the pushing will break B-tree rules. To solve this problem, we need
do the top-down check from the root along the path when insert. Split any node with
2d − 1 keys. Since all parent nodes are processed to be not full, they can accept the
additional key pushed up. This method needs one single pass down the tree without any
back-tracking. If the root is full, we create a new node, and put the root as it singleton
sub-tree. Below is the insert algorithm:
1: function Insert(t, k)
2: r←t
3: if r is full then . root is full
4: s ← CREATE-NODE
5: T (s) ← [r]
6: Split(s, 1)
7.2. INSERT 95

7: r←s
8: return Insert-Nonfull(r, k)
Where Insert-Nonfull assumes the node r passed in is not full. If r is a leaf, we
insert k to the keys based on order (Exercise 7.1.3 asks to realize the ordered insert with
binary search); otherwise, we locate the position, where ki (r) < k < ki+1 (r). Split the
sub-tree ti (r) if it is full, and go on insert to this sub-tree.
1: function Insert-Nonfull(r, k)
2: n ← |K(r)|
3: if r is leaf then
4: i←1
5: while i ≤ n and k > ki (r) do
6: i←i+1
7: Insert-At(K(r), i, k)
8: else
9: i←n
10: while i > 1 and k < ki (r) do
11: i←i−1
12: if ti (r) is full then
13: Split(r, i)
14: if k > ki (r) then
15: i←i+1
16: Insert-Nonfull(ti (r), k)
17: return r
This algorithm is recursive. Exercise 7.1.2 asks to eliminate the recursion with pure
loops. Figure 7.7 gives the result with the same input of “GMPXACDEJKNORSTUVYZ”.

E P

C M S U X

A D G J K N O R T V Y Z

D M P T

A C E G J K N O R S U V X Y Z

Figure 7.7: Insert from “GMPXACDEJKNORSTUVYZ”. up: d = 2, 2-3-4 tree; bottom:


d = 3.

7.2.3 Paired lists


When use list to store ordered keys, we always start from the first key, and scan the list
to find the insert position. If the keys are stored in array, we can improve it with binary
search. Can we start somewhere in the node, go left or right depending on the order of
keys? One idea is to separate the B-tree node into three parts: left l, a sub-tree t0 , and
right r. Where left and right are lists of pairs, each pair contains a key and a sub-tree:
96 CHAPTER 7. B-TREE

(ki , ti ). However, l is reversed. In other words, l and r are head-to-head connected by


t0 as a U-shape shown in fig. 7.8. We can move forward and backward both in constant
time.

k1 … ki-1 ki … kn

t1 ti-1 ti ti+1 tn+1

l t’ r

(ki-1, ti-1) … (k1 , t1)

t’ l
ti head tail

(ki , ti+1) … (kn , tn+1)

Figure 7.8: Define the B-tree node with a sub-tree and paired lists

Below example program defines B-tree node. It’s either empty, or contains 3 parts:
the left list of (key, sub-tree) pairs in reversed order, a sub-tree, and the right list of (key,
sub-tree) pairs. We denoted the none empty node as (l, t0 , r).
data BTree a = Empty
| BTree [(a, BTree a)] (BTree a) [(a, BTree a)]

When move to right by a step, we take the first pair (k, t) from r, then form another
pair (k, t0 ) in front of l, and replace t0 with t. When move to left a step, it is symmetric.
Both operations take constant time.
stepl ((k, t):l, t0 , r) = (l, t, (k, t0 ):r)
(7.15)
stepr (l, t0 , (k, t):r) = ((k, t0 ):l, t, r)

With the left/right moves, we can implement a generic partition p t, that separates
the tree t with a predicate p into 3 parts: left, middle, right: (l, m, r), such that all sub-
trees in l and m satisfy p, while the sub-trees in r do not. Let hd = f st ◦ head, picks the
first pair (a, b) from a list, then extracts a out.
(
p(hd(r)) : partition p (stepr t)
partition p (∅, m, r) =
otherwise : (∅, m, r)
(
(not ◦ p)(hd(l)) : partition p (stepl t)
partition p (l, m, ∅) =
otherwise : (l, m, ∅)
p(hd(l)) and (not ◦ p)(hd(r)) : (l, m, r)


partition p (l, m, r) = p(hd(r)) : partition p (stepr t)

(not ◦ p)(hd(l)) : partition p (stepl t)

(7.16)
For example, partition (< k) t moves all keys and sub-trees in t less than k out of the
right part. Below example program implements the partition function:
7.2. INSERT 97

partition p t@(BTree [] m r)
| p (hd r) = partition p (stepR t)
| otherwise = ([], m, r)
partition p t@(BTree l m [])
| (not ◦ p) (hd l) = partition p (stepL t)
| otherwise = (l, m, [])
partition p t@(BTree l m r)
| p (hd l) && (not ◦ p) (hd r) = (l, m, r)
| p (hd r) = partition p (stepR t)
| (not ◦ p) (hd l) = partition p (stepL t)

We can use stepl /stepr to split a B-tree at position d when it’s overly full. Let n = |l|
be the number of keys/sub-trees of the left part. f n (x) means repeatedly apply function
f to x for n times.

n < d :
 sp(stepd−n
r (t))
split d t = n > d : n−d
sp(stepr (t)) (7.17)

otherwise : sp(t)

Where sp does the separation work as below:


sp (l, t, (k, t0 ):r) = ((l, t, ∅), k, (∅, t0 , r)) (7.18)
With partition and split defined, we can define B-tree insert algorithm for the paired
lists implementation. Firstly, we need modify the low/full testing to count both left and
right parts:
f ull d ∅ = F alse
(7.19)
f ull d (l, t0 , r) = |l| + |r| > 2d − 1
and
low d ∅ = F alse
(7.20)
low d (l, t0 , r) = |l| + |r| < d − 1
When insert key x to B-tree t of degree d, we do the recursive insertion, then fix the
root if it gets overly full:
insert x (d, t) = f ix (d, ins t) (7.21)
Where f ix splits the root at d if needed:

(d, (∅, t1 , [(k, t2 )] where (t1 , k, t2 ) = split d t


(
f ull d t :
f ix (d, t) = (7.22)
otherwise : (d, t)

Function ins need handle both t = ∅, and t 6= ∅ cases. For empty case, we create
a singleton leaf; otherwise, we call (l, t0 , r) = partition (< x) t to locate the position for
recursive insert:
ins ∅ = (∅,
( ∅, [(x, ∅)])
t0 = ∅ : balance d l ∅ ((x, ∅):r) (7.23)
ins t =
t0 6= ∅ : balance d l (ins t0 ) r

Function balance examines if the sub-tree t contains too many keys, and splits it.
(
f ull d t : f ixF ull
balance d l t r = (7.24)
otherwise : (l, t, r)

Where f ixF ull = (l, t1 , ((k, t2 ):r), and (t1 , k, t2 ) = split d t. Below example program
implements the insert algorithm:
98 CHAPTER 7. B-TREE

insert x (d, t) = fixRoot (d, ins t) where


ins Empty = BTree [] Empty [(x, Empty)]
ins t = let (l, t', r) = partition (< x) t in
case t' of
Empty → balance d l Empty ((x, Empty):r)
_ → balance d l (ins t') r

fixRoot (d, t) | full d t = let (t1, k, t2) = split d t in


(d, BTree [] t1 [(k, t2)])
| otherwise = (d, t)

balance d l t r | full d t = fixFull


| otherwise = BTree l t r
where
fixFull = let (t1, k, t2) = split d t in BTree l t1 ((k, t2):r)

split d t@(BTree l _ _) | n < d = sp $ iterate stepR t !! (d - n)


| n > d = sp $ iterate stepL t !! (n - d)
| otherwise = sp t
where
n = length l
sp (BTree l t ((k, t'):r)) = (BTree l t [], k, BTree [] t' r)

Exercise 7.1
7.1.1. Can we use ≤ to support duplicated keys in B-Tree?
7.1.2. For the ‘split then insert’ algorithm, eliminate the recursion with loops.
7.1.3. We use linear search among keys to find the proper insert position. Improve the im-
perative implementation with binary search. Is the big-O performance improved?

7.3 Lookup
For lookup, we extend from the binary search tree to multiple branches, and obtain the
generic B-tree lookup solution. There are only two directions when lookup the binary
search tree: left and right, while, there are multiple ways in B-tree. Consider lookup k
in B-tree t = (ks, ts), if t is a leaf (ts is empty), then the problem becomes list lookup;
otherwise, we partition t with k into three parts: l = (ksl , tsl ), t0 , r = (ksr , tsr ), where all
keys in l and sub-tree t0 are less then k, and the remaining (≥ k) is in r. If the first key
in ksr equals k, then we find the answer; otherwise, we recursive look up in sub-tree t0 .

Just (ks, ∅)
(
k ∈ ks :
lookup k (ks, ∅) =
otherwise : Nothing
(7.25)
Just k = safeHd ksr : Just (ks, ts)
(
lookup k (ks, ts) =
otherwise : lookup k t0

Where ((ksl , tsl ), t0 , (ksr , tsr )) = partition k t, and

safeHd [] = Nothing
safeHd (x:xs) = Just x

Below example program2 implements lookup.


2 safeHd is provided as listToMaybe in some library.
7.4. DELETE 99

lookup k t@(BTree ks []) = if k `elem` ks then Just t else Nothing


lookup k t = if (Just k) == safeHd ks then Just t
else lookup k t' where
(_, t', (ks, _)) = partition k t

For the paired list implementation, the idea is similar. If the tree is not empty, we
partition it with the predicate ‘< k’. Then check if the first key in the right part equals
to k, or recursively look up the partitioned sub-tree:

lookup k ∅ = Nothing
Just k = safeFst (safeHd r) : Just (l, t0 , r)
(
(7.26)
lookup k t =
otherwise : lookup k t0

Where (l, t0 , r) = partition (< k) t for the none empty tree case. safeFst applies fst
function to a ‘Maybe’ value. Below example program utilizes fmap to do this:
lookup x Empty = Nothing
lookup x t = let (l, t', r) = partition (< x) t in
if (Just x) == fmap fst (safeHd r) then Just (BTree l t' r)
else lookup x t'

For the imperative implementation, we start from the root r, find a position i among
the keys, such that ki (r) ≤ k < ki+1 (r). If ki (r) = k then return the node r and i as a
pair; otherwise, move to sub-tree ti (r), and go on looking up. If r is a leaf and k is not
in the keys, then return nothing as k does not exist in the tree.
1: function Lookup(r, k)
2: loop
3: i ← 1, n ← |K(r)|
4: while i ≤ n and k > ki (r) do
5: i←i+1
6: if i ≤ n and k = ki (r) then
7: return (r, i)
8: if r is leaf then
9: return Nothing . k does not exist
10: else
11: r ← ti (r) . go to the i-th sub-tree

Exercise 7.2
7.2.1. Improve the imperative lookup with binary search among keys.

7.4 Delete
After delete a key, the number of keys may be too few to be a valid B-tree node. Except
the root, the keys should not be less than d−1, where d is the minimum degree. There are
two methods symmetric to insert: we can either delete then fix, or merge before delete.

7.4.1 Delete and fix


We first extend the delete algorithm from binary search tree to multiple branches, then
fix the B-tree balancing rules. The main program is defined with two steps:

delete x (d, t) = f ix (d, del x t) (7.27)


100 CHAPTER 7. B-TREE

Where del is the function we extend to support multiple branches. If t is a leaf,


we merely delete x from the keys; otherwise, we partition the tree with x into 3 parts:
(l, t0 , r). Where all the keys in l and sub-tree t0 are less than x, and the rest in r are
(≥ x). When r isn’t empty, we pick the first key ki from it. If the key ki = x, we next
replace it with the maximum key k 0 = max(t0 ), and recursively delete k 0 from t0 as shown
in fig. 7.9. Otherwise (either r is empty, or ki 6= x), we recursively delete x from sub-tree
t0 .
delete x

k1 … ki-1 ki=x … kn

replace ki with k’
delete k’
t1 tn+1

ti

k’=max(ti)

Figure 7.9: Replace ki with k 0 = max(t0 ), then recursively delete k 0 from t0 .

del x (ks, ∅) = (delete l x ks, ∅)


Just x = safeHd ks0 : balance d l (del k 0 t0 ) (k 0 : (tail ks0 ), ts0 )
(
del x t =
otherwise : balance d l (del x t0 ) (ks0 , ts0 )
(7.28)
Where (l, t0 , (ks0 , ts0 )) = partition x t, are the 3 parts partitioned by x. On top of it,
we extract the maximum key k 0 from t0 . The max function is defined as:
max (ks, ∅) = last ks
(7.29)
max (ks, ts) = max (last ts)

Function last returns the last element from a list (eq. (1.4)). deletel is the list delete al-
gorithm (eq. (1.14)). tail drops the first element from a list and returns the rest (eq. (1.1)).
We need modify the balance function, which defined for insert before, with the additional
logic to merge the node if it contains too few keys.

f ull d t : f ixf

balance d (ksl , tsl ) t (ksr , tsr ) = low d t : f ixl (7.30)

otherwise : (ksl +
+ ksr , tsl +
+ [t] +
+ tsr )

If t is overly low (< d − 1 keys), we call f ixl to merge it with the left part (ksl , tsl )
or right part (ksr , tsr ) depends on which side of keys is not empty. Use the left part for
example: we extract the last element from ksl and tsl respectively, say km and tm . Then
call unsplit (eq. (7.8)) to merge them with t as unsplit tm km t. It forms a new sub-tree
with more keys. Finally we call balance again to build the result B-tree.

ksl 6= ∅ :
 balance d (init ksl , init tsl ) (unsplit tm km t) (ksr , tsr )
f ixl = ksr 6= ∅ : balance d (ksl , tsl ) (unsplit t k1 t1 ) (tail ksr , tail tsr ) (7.31)

otherwise : t

The last case (otherwise) means ksl = ksr = ∅, both sides are empty. The tree is
a singleton leaf hence need not fixing. k1 and t1 are the first element in ksr and tsr
7.4. DELETE 101

respectively. Finally, we need modify the f ix function defined for insert, add new logic
for delete:
f ix (d, (∅, [t])) =(d,
( t)
f ull d t : (d, ([k], [l, r])), where (l, k, r) = split d t (7.32)
f ix (d, t) =
otherwise : (d, t)

What we add is the first case. After delete, if the root contains nothing but a sub-tree,
we can shrink the height, pull the single sub-tree as the new root. The following example
program implements the delete algorithm.
delete x (d, t) = fixRoot (d, del x t) where
del x (BTree ks []) = BTree (List.delete x ks) []
del x t = if (Just x) == safeHd ks' then
let k' = max t' in
balance d l (del k' t') (k':(tail ks'), ts')
else balance d l (del x t') r
where
(l, t', r@(ks', ts')) = partition x t

fixRoot (d, BTree [] [t]) = (d, t)


fixRoot (d, t) | full d t = let (t1, k, t2) = split d t in
(d, BTree [k] [t1, t2])
| otherwise = (d, t)

balance d (ks1, ts1) t (ks2, ts2)


| full d t = fixFull
| low d t = fixLow
| otherwise = BTree (ks1 ++ ks2) (ts1 ++ [t] ++ ts2)
where
fixFull = let (t1, k, t2) = split d t in
BTree (ks1 ++ [k] ++ ks2) (ts1 ++ [t1, t2] ++ ts2)
fixLow | not $ null ks1 = balance d (init ks1, init ts1)
(unsplit (last ts1) (last ks1) t)
(ks2, ts2)
| not $ null ks2 = balance d (ks1, ts1)
(unsplit t (head ks2) (head ts2))
(tail ks2, tail ts2)
| otherwise = t

We leave the delete function for the ‘paired list’ implementation as an exercise. Fig-
ures 7.10 to 7.12 give examples of delete.

E J S V

A C D G K M O P R T U X Y Z

Figure 7.10: Before delete

7.4.2 Merge before delete


The other way is to merge the nodes before delete if there are too few keys. Consider
delete key x from the tree t, let us start from the easy case.
Case 1. If x exists in node t, and t is a leaf, we can directly remove x from t. If t is
the singleton node in the tree (root), we needn’t worry about too few keys.
102 CHAPTER 7. B-TREE

E J S V

A D G K M O P R T U X Y Z

G S V

A D E K M O P R T U X Y Z

Figure 7.11: Delete ‘C’, then delete ‘J’

G S V

A D E M O P R T U X Y Z

D S V

A E G O P R T U X Y Z

Figure 7.12: Delete ‘K’, then delete ‘N’


7.4. DELETE 103

Case 2. If x exists in node t, but t is not a leaf. There are three sub-cases:
Case 2a. As shown in fig. 7.9, let the predecessor of ki = x be k 0 , where k 0 = max(ti ).
If ti has sufficient keys (≥ d), we replace ki with k 0 , then recursively delete k 0 from ti .
Case 2b. If ti does not have enough keys, but the sub-tree ti+1 does (≥ d). Symmet-
rically, we replace ki with its successor k 00 , where k 00 = min(ti+1 ), then recursively delete
k 00 from ti+1 , as shown in fig. 7.13.

delete x

k1 … ki-1 ki=x … kn

delete k”

t1 replace ki with k” tn+1

ti+1

k"=min(ti+1)

Figure 7.13: Replace ki with k 00 = min(ti+1 ), then delete k 00 from ti+1 .

Case 2c. If neither ti nor ti+1 contains sufficient keys (|ti | = |ti+1 | = d − 1), we merge
ti , x, ti+1 to a new node. This new node has 2d − 1 keys, we can safely perform delete on
it as shown in fig. 7.14.

... x = ki ... ... ...


merge

ti ti+1

k'1, ..., k'd-1 k''1, ..., k''d-1 k'1, ..., k'd-1 ki k''1, ..., k''d-1

Figure 7.14: Merge before delete

Merge pushes a key ki down to the sub-tree. After that, if node t becomes empty, it
means ki is the only key in t, and ti , ti+1 are the only two sub-trees. We need shrink the
tree height as shown in fig. 7.15.

shrink
k'1, ..., k'd-1 ki k''1, ..., k''d-1 k'1, ..., k'd-1 ki k''1, ..., k''d-1

Figure 7.15: Shrink

Case 3. If node t does not contain x, we need recursively delete x from a sub-tree ti .
There are two sub-cases if there are too few keys in ti :
Case 3a. Among the two siblings ti−1 , ti+1 , if either one has enough keys (≥ d),
we move a key from t to ti , then move a key from the sibling up to t, and move the
104 CHAPTER 7. B-TREE

corresponding sub-tree from the sibling to ti . As shown in fig. 7.16, ti received one more
key. We next recursively delete x from ti .

... ki ...

ti ti+1

k'1, ..., k'd-1 k''1, k''2, ... , k''m

t'1 t'd t''1 t''m+1

... k''1 ...

ti ti+1

k'1, ..., k'd-1 ki k''2, ... , k''m

t'1 t'd t''1 t''2 t''m+1

Figure 7.16: Borrow from the right sibling.

Case 3b. If neither sibling has sufficient keys (|ti−1 | = |ti+1 | = d − 1), we merge ti ,
a key from t, and either sibling into a new node, as shown in fig. 7.17. Then recursively
delete x from it.

... ki ...

ti ti+1

k'1, ..., k'd-1 k''1, ... , k''m

t'1 t'd t''1 t''m+1

... ...

ti

k'1, ..., k'd-1 ki k''1, ... , k''m

t'1 t'd t''1 t''m+1

Figure 7.17: Merge ti , k, ti+1

Below Delete algorithm implements the ‘merge then delete’ method:


1: function Delete(t, k)
2: if t is empty then
3: return t
4: i ← 1, n ← |K(t)|
5: while i ≤ n and k > ki (t) do
7.4. DELETE 105

6: i←i+1
7: if k = ki (t) then
8: if t is leaf then . case 1
9: Remove(K(t), k)
10: else . case 2
11: if |K(ti (t))| ≥ d then . case 2a
12: ki (t) ← Max(ti (t))
13: Delete(ti (t), ki (t))
14: else if |K(ti+1 (t))| ≥ d then . case 2b
15: ki (t) ← Min(ti+1 (t))
16: Delete(ti+1 (t), ki (t))
17: else . case 2c
18: Merge-At(t, i)
19: Delete(ti (t), k)
20: if K(T ) is empty then
21: t ← ti (t) . Shrinks height
22: return t
23: if t is not leaf then
24: if k > kn (t) then
25: i←i+1
26: if |K(ti (t))| < d then . case 3
27: if i > 1 and |K(ti−1 (t))| ≥ d then . case 3a: left
28: Insert(K(ti (t)), ki−1 (t))
29: ki−1 (t) ← Pop-Last(K(ti−1 (t)))
30: if ti (t) is not leaf then
31: Insert(T (ti (t)), Pop-Back(T (ti−1 (t))))
32: else if i ≤ n and |K(ti+1 (t))| ≥ d then . case 3a: right
33: Append(K(ti (t)), ki (t))
34: ki (t) ← Pop-First(K(ti+1 (t)))
35: if ti (t) is not leaf then
36: Append(T (ti (t)), Pop-First(T (ti+1 (t))))
37: else . case 3b
38: if i = n + 1 then
39: i←i−1
40: Merge-At(t, i)
41: Delete(ti (t), k)
42: if K(t) is empty then . Shrinks height
43: t ← t1 (t)
44: return t
Where Merge-At(t, i) merges sub-tree ti (t), key ki (t), and ti+1 (t) into one sub-tree.
1: procedure Merge-At(t, i)
2: x ← ti (t)
3: y ← ti+1 (t)
4: K(x) ← K(x) + + [ki (t)] +
+ K(y)
5: T (x) ← T (x) + + T (y)
6: Remove-At(K(t), i)
7: Remove-At(T (t), i + 1)

Exercise 7.3
7.3.1. When delete a key k from the branch node, we use the maximum key from the
106 CHAPTER 7. B-TREE

predecessor sub-tree k 0 = max(t0 ) to replace k, then recursively delete k 0 from


t0 . There is a symmetric method, to replace k with the minimum key from the
successor sub-tree. Implement this solution.
7.3.2. Define the delete function for the ‘paired list’ implementation.

7.5 Summary
We extend the binary search tree to multiple branches, then constrain the branches within
a range to develop the B-tree. B-tree is used as a tool to control the magnetic disk access
(chapter 18, [4] ). Because all B-tree nodes store keys in a range, not too few or too
many, B-tree is balanced. Most of the tree operations are proportion to the height. The
performance is bound to O(lg n) time, where n is the number of keys.

7.6 Appendix: Example programs


Definition of B-tree:
data BTree<K, Int deg> {
[K] keys
[BTree<K>] subStrees;
}

Split node
void split(BTree<K, deg> z, Int i) {
var d = deg
var x = z.subTrees[i]
var y = BTree<K, deg>()
y.keys = x.keys[d ...]
x.keys = x.keys[ ... d - 1]
if not isLeaf(x) {
y.subTrees = x.subTrees[d ... ]
x.subTrees = x.subTrees[... d]
}
z.keys.insert(i, x.keys[d - 1])
z.subTrees.insert(i + 1, y)
}

Bool isLeaf(BTree<K, deg> t) = t.subTrees == []

Insert a key to B-tree:


BTree<K, deg> insert(BTree<K, deg> tr, K key) {
var root = tr
if isFull(root) {
var s = BTree<K, deg>()
s.subTrees.insert(0, root)
split(s, 0)
root = s
}
return insertNonfull(root, key)
}

Insert a key to a non-full node.


BTree<K, deg> insertNonfull(BTree<K, deg> tr, K key) {
if isLeaf(tr) {
orderedInsert(tr.keys, key)
} else {
Int i = length(tr.keys)
7.6. APPENDIX: EXAMPLE PROGRAMS 107

while i > 0 and key < tr.keys[i - 1] {


i = i - 1
}
if isFull(tr.subTrees[i]) {
split(tr, i)
if key > tr.keys[i] then i = i + 1
}
insertNonfull(tr.subTree[i], key)
}
return tr
}

Where orderedInsert inserts an element to an ordered list.


void orderedInsert([K] lst, K x) {
Int i = length(lst)
lst.append(x)
while i > 0 and lst[i] < lst[i-1] {
(lst[i-1], lst[i]) = (lst[i], lst[i-1])
i = i - 1
}
}

Bool isFull(BTree<K, deg> x) = length(x.keys) ≥ 2 ∗ deg - 1


Bool isLow(BTree<K, deg> x) = length(x.keys) ≤ deg - 1

Iterative lookup:
Optional<(BTree<K, deg>, Int)> lookup(BTree<K, deg> tr, K key) {
loop {
Int i = 0, n = length(tr.keys)
while i < n and key > tr.keys[i] {
i = i + 1
}
if i < n and key == tr.keys[i] then return Optional.of((tr, i))
if isLeaf(tr) {
return Optional.Nothing
} else {
tr = tr.subTrees[i]
}
}
}

Imperative merge before delete:


BTree<K, deg> delete(BTree<K, deg> t, K x) {
if empty(t.keys) then return t
Int i = 0, n = length(t.keys)
while i < n and x > t.keys[i] { i = i + 1 }
if x == t.keys[i] {
if isLeaf(t) { // case 1
removeAt(t.keys, i)
} else {
var tl = t.subtrees[i]
var tr = t.subtrees[i + 1]
if not low(tl) { // case 2a
t.keys[i] = max(tl)
delete(tl, t.keys[i])
} else if not low(tr) { // case 2b
t.keys[i] = min(tr)
delete(tr, t.keys[i])
} else { // case 2c
mergeSubtrees(t, i)
delete(d, tl, x)
if empty(t.keys) then t = tl // shrink height
}
108 Binary Heaps

return t
}
if not isLeaf(t) {
if x > t.keys[n - 1] then i = i + 1
if low(t.subtrees[i]) {
var tl = if i == 0 then null else t.subtrees[i - 1]
var tr = if i == n then null else t.subtrees[i + 1]
if tl 6= null and (not low(tl)) { // case 3a, left
insert(t.subtrees[i].keys, 0, t.keys[i - 1])
t.keys[i - 1] = popLast(tl.keys)
if not isLeaf(tl) {
insert(t.subtrees[i].subtrees, 0, popLast(tl.subtrees))
}
} else if tr 6= null and (not low(tr)) { // case 3a, right
append(t.subtrees[i].keys, t.keys[i])
t.keys[i] = popFirst(tr.keys)
if not isLeaf(tr) {
append(t.subtrees[i].subtrees, popFirst(tr.subtrees))
}
} else { // case 3b
mergeSubtrees(t, if i < n then i else (i - 1))
if i == n then i = i - 1
}
delete(t.subtrees[i], x)
if empty(t.keys) then t = t.subtrees[0] // shrink height
}
}
return t
}

merge sub-trees, find the min/max key from a B-tree.


void mergeSubtrees(BTree<K, deg>, Int i) {
t.subtrees[i].keys += [t.keys[i]] + t.subtrees[i + 1].keys
t.subtrees[i].subtrees += t.subtrees[i + 1].subtrees
removeAt(t.keys, i)
removeAt(t.subtrees, i + 1)
}

K max(BTree<K, deg> t) {
while not empty(t.subtrees) {
t = last(t.subtrees)
}
return last(t.keys)
}

K min(BTree<K, deg> t) {
while not empty(t.subtrees) {
t = t.subtrees[0]
}
return t.keys[0]
}
Chapter 8

Binary Heaps

8.1 Definition
Heaps are widely used for sorting, priority scheduling, graph algorithms, and etc. [40] . The
most popular implementation uses array to represent the heap as a complete binary tree [4] .
Robert W. Floyd developed an efficient heap sort algorithm based on this idea [41] [42] . We
can implement the heap with varies data structures but not limit to array. This chapter
focuses on the heaps implemented with binary trees, including leftist heap, skew heap,
and splay heap [3] . A heap is either empty, or stores comparable elements that satisfy a
property, and define the following operations:

1. The heap property: the top element is always the minimum;

2. Pop: removes the top element from the heap and maintain the heap property: the
new top is still the minimum of the rest;

3. Insert: add a new element to the heap and maintain the heap property;

4. Other: operations like merge also maintain the heap property.

Alternatively, we can define the heap that always keeps the maximum on the top. We
call the heap with the minimum on the top as min-heap, the maximum on the top as max-
heap. When implement heap with a tree, we can put the minimum (or the maximum) in
the root. After pop, remove the root, and rebuild the tree from the sub-trees. We call
the heap implemented with binary tree as binary heap.

8.2 Binary heap by array


The first implementation is to represent the complete binary tree with array. The complete
binary tree is ‘almost’ full. The full binary tree of depth k contains 2k − 1 nodes. We can
number every node top-down, from left to right as 1, 2, ..., 2k − 1. The node number i in
a complete binary tree is located at the same position in the full binary tree. The leaves
only appear in the bottom layer, or the second last layer. Figure 8.1 shows a complete
binary tree and the array. The i-th cell in array corresponds to a node, its parent maps
to the bi/2c-th cell; the left sub-tree maps to the 2i-th cell, and the right sub-tree maps
to the (2i + 1)-th cell. If any sub-tree maps to an index out of the array bound, then the
sub-tree does not exist (i.e. leaf node). We can define the map as below (index starts
from 1):

109
110 CHAPTER 8. BINARY HEAPS

16

14 10

8 7 9 3

2 4 1

16 14 10 8 7 9 3 2 4 1

1 2 3 4 5 6 7 8 9 10

Figure 8.1: Map between a complete binary tree and an array.

i

parent(i)

 =b c
2
lef t(i) = 2i (8.1)


right(i) = 2i + 1

8.2.1 Heapify
Heapify is the process that maintains heap property, keeping the minimum element on
the top. For binary heap, we obtain a stronger property as the binary tree is recursive:
every sub-tree stores its minimum element in the root. In other words, every sub-tree is
also a binary heap. For the cell indexed i in array representation, we examine whether all
the sub-tree elements are greater than or equal to it (≥), exchange when do not satisfy.
1: function Heapify(A, i)
2: n ← |A|
3: loop
4: s←i . s is the smallest
5: l ← Left(i), r ← Right(i)
6: if l ≤ n and A[l] < A[i] then
7: s←l
8: if r ≤ n and A[r] < A[s] then
9: s←r
10: if s 6= i then
11: Exchange A[i] ↔ A[s]
12: i←s
13: else
14: return
Because we recursive check sub-trees, the process time is proportion to the height of
the tree. Heapify is bound to O(lg n), where n is the length of the array. Figure 8.2
gives the steps when apply Heapify from 2 to array [1, 13, 7, 3, 10, 12, 14, 15, 9, 16].
The result is [1, 3, 7, 9, 10, 12, 14, 15, 13, 16].
8.2. BINARY HEAP BY ARRAY 111

13 7

3 10 12 14

15 9 16

3 7

13 10 12 14

15 9 16

3 7

9 10 12 14

15 13 16

Figure 8.2: Heapify. Step 1: the minimum of 13, 3, 10 is 3, exchange 3 ↔ 13; Step 2: the
minimum of 13, 15, 9 is 9, exchange 13 ↔ 9; Step 3: 13 is leaf, terminate.
112 CHAPTER 8. BINARY HEAPS

8.2.2 Build
We can build heap from an array with Heapify. List the numberf of nodes in each level
of a complete binary tree: 1, 2, 4, 8, .... They are all powers of 2 except for the last level,
because the tree is not necessarily full. There are at most 2p−1 nodes, where p is the
smallest integer satisfying 2p − 1 ≥ n, and n is the length of the array. Skip all leaves
because Heapify takes no effect on them, we start applying Heapify to the last branch
node (indexed at ≤ bn/2c) bottom-up. The build function is defined as below:
1: function Build-Heap(A)
2: n ← |A|
3: for i ← bn/2c down to 1 do
4: Heapify(A, i)
Although Heapify is bound O(lg n) time, Build-Heap is bound to O(n), but not
O(n lg n). We skip all leaves, check and move down a level at most for 1/4 nodes; check
and move down two levels at most for 1/8 nodes; check and move down three levels at
most for 1/16 nodes... the total number of comparisons and moves is up to:

1 1 1
S = n( + 2 + 3 + ...) (8.2)
4 8 16
Multiply by 2 for both sides:

1 1 1
2S = n( + 2 + 3 + ...) (8.3)
2 4 8

Subtract eq. (8.2) from eq. (8.3):

1 1 1 1 1
2S − S = n[ + (2 − ) + (3 − 2 ) + ...] shift by one and subtract
2 4 4 8 8
1 1 1
S = n[ + + + ...] geometric series
2 4 8
= n

Figure 8.3 shows the steps to build a min-heap from array [4, 1, 3, 2, 16, 9, 10, 14, 8, 7].
The black node is where Heapify is applied. The grey nodes are swapped to maintain
the heap property.

8.2.3 Heap operations


Heap operations include access the top, pop, lookup the top k elements, decrease an
element in min-heap (or increase an element in max-heap), and insert. The root of the
binary heap, which is the first array cell stores the minimum: T op(A) = A[1].
After pop, the remaining elements all shift ahead a cell of the array. However, without
the root, the rest is not a binary tree any more. Alternatively, we swap the head and the
tail of the array, then reduce the array length by one. It’s equivalent to remove a leaf but
not the root. We then apply Heapify to the root to recover the heap property:
1: function Pop(A)
2: x ← A[1], n ← |A|
3: Exchange A[1] ↔ A[n]
4: Remove(A, n)
5: if A is not empty then
6: Heapify(A, 1)
7: return x
8.2. BINARY HEAP BY ARRAY 113

4 1 3 2 16 9 10 14 8 7

4 4

1 3 1 3

2 16 9 10 2 7 9 10

14 8 7 14 8 16

(1) (5)

4 4

1 3 1 3

2 7 9 10 2 7 9 10

14 8 16 14 8 16

(2) (6)

4
1

1 3
4 3

2 7 9 10
2 7 9 10

14 8 16
14 8 16

(3)
(7)
4 1

1 3 2 3

2 7 9 10 4 7 9 10

14 8 16 14 8 16
(8)
(4)
1 2 3 4 7 9 10 14 8 16

Figure 8.3: Build heap. (1) 16 > 7; (2) exchange 16 ↔ 7; (3) 2 < 14 and 2 < 8; (4) 3 < 9
and 3 < 10; (5) 1 < 2 and 1 < 7; (6) 1 < 4 and 1 < 3; (7) exchange 4 ↔ 1; (8) exchange
4 ↔ 2, end.
114 CHAPTER 8. BINARY HEAPS

It takes constant time to remove the last element from array, pop is bound to O(lg n)
time as it calls Heapify. We can develop a solution to find the top k elements from an
array. First build a heap from the array, then repeatedly pop k times:
1: function Top-k(A, k)
2: R←[]
3: Build-Heap(A)
4: loop Min(k, |A|) times . cut off when k > |A|
5: Append(R, Pop(A))
6: return R
Further, we can implement a priority queue to schedule tasks with priorities. Every
time, we peek the highest priority task to run. To run an urgent task earlier, we can
increase its priority, meaning to decrease an element in a min-heap, as shown in fig. 8.4.

1 1

3 7 3 7

9 10 12 14 2 10 12 14

15 13 16 15 9 16

(1) (3)

1 1

3 7 2 7

9 10 12 14 3 10 12 14

15 2 16 15 9 16

(2) (4)

Figure 8.4: Decrease 13 to 2. Exchange 2 and 9, then exchange with 3.

The heap property may be broken when decrease some element in a min-heap. Let
the decreased element be A[i], below function resumes the heap property bottom-up. It
is bound to O(lg n) time.
1: function Heap-Fix(A, i)
2: while i > 1 and A[i] < A[ Parent(i) ] do
3: Exchange A[i] ↔ A[ Parent(i) ]
4: i ← Parent(i)
We can realize push with Heap-Fix [4] . Append the new element k to the tail, then
apply Heap-Fix to recover the heap property:
1: function Push(A, k)
2: Append(A, k)
3: Heap-Fix(A, |A|)

8.2.4 Heap sort


We can sort elements with heap. Build a min-heap from n elements (with (n) time), then
repeatedly pop the top to build the ascending result. Each pop is bound to O(lg n) time,
8.3. LEFTIST HEAP AND SKEW HEAP 115

the total time is bound to O(n lg n). Besides, we need another list of length n to hold the
result.
1: function Heap-Sort(A)
2: R←[]
3: Build-Heap(A)
4: while A 6= [ ] do
5: Append(R, Pop(A))
6: return R
Floyd developed a fast implementation with max-heap. Every time, swap the head
and the tail of the array. The maximum is swapped to the expected position, the tail; and
the previous tail becomes the new top. Next decrease the heap size by one, and apply
Heapify to restore the heap property. Repeat this till the heap size decrease to one.
This algorithm needn’t the additional space.
1: function Heap-Sort(A)
2: Build-Max-Heap(A)
3: n ← |A|
4: while n > 1 do
5: Exchange A[1] ↔ A[n]
6: n←n−1
7: Heapify(A[1...n], 1)

Exercise 8.1
8.1.1. Consider another idea about in-place heap sort: Build a min-heap from the array
A, the first element a1 is in the right position. Treat the rest [a2 , a3 , ..., an ] as the
new heap, and apply Heapify from a2 . Repeat this till the last element. Is this
method correct?
1: function Heap-Sort(A)
2: Build-Heap(A)
3: for i = 1 to n − 1 do
4: Heapify(A[i...n], 1)
8.1.2. Similarly, can we apply Heapify k times from left to right to get the top-k ele-
ments?
1: function Top-K(A, k)
2: Build-Heap(A)
3: n ← |A|
4: for i ← 1 to min(k, n) do
5: Heapify(A[i...n], 1)

8.3 Leftist heap and skew heap


With an explicit binary tree, after pop the root, there remain two sub-trees. Both are
heaps as shown in fig. 8.5. How can we merge them to a new heap? To maintain the heap
property, the new root must be the minimum of the remaining. We can give the two edge
cases easily:

merge ∅ R = R
merge L ∅ = L
merge L R = ?
116 CHAPTER 8. BINARY HEAPS

Merge

L R L R

Figure 8.5: Merge left and right sub-trees after pop.

Both left and right sub-trees are heaps. If neither is empty, each root stores the
minimum respectively. We compare the two roots, and peek the smaller one as the new
root. Let L = (A, x, B), R = (A0 , y, B 0 ), where A, A0 , B, B 0 are sub-trees. If x < y, then
x is the new root. We keep A, and merge B and R recursively; alternatively, we can keep
B, and merge A and R. The new heap can be (merge A R, x, B) or (A, x, merge B R).
We always merge the right sub-tree for simplicity. This method generates the leftist heap.

8.3.1 Leftist heap


The leftist heap is implemented with leftist tree. C. A. Crane in 1972 [43] developed the
leftist tree. He defines a rank for every node (also known as S-value) as the distance to
the nearest NIL. The rank of NIL is 0. As shown in fig. 8.6, The nearest leaf node to 4 is
8, the rank of 4 is 2; Both 6 and 8 are leaves, their ranks are 1. Although the left sub-tree
of 5 is not empty, its right sub-tree is NIL, hence the rank is 1. Define the merge method
with rank as below. Denote the rank of the left and right sub-trees be rl , rr respectively:

5 8

6 NIL NIL NIL

NIL NIL

Figure 8.6: rank(4) = 2, rank(6) = rank(8) = rank(5) = 1.

1. Always merge the right sub-tree;


2. When rl < rr , swap the left and right sub-trees.
We call this merge method the ‘leftist property’. Basically, a leftist tree always has the
shortest path to some NIL on the right. Although it seems unbalanced, there is critical
fact:
Theorem 8.3.1. For a leftist tree T of n nodes, the path from the root to the rightmost
NIL has at most blog(n + 1)c nodes.
We skip the proof [44] [51] . This theorem ensures any algorithm processes along this path
is bound to O(lg n) time. We define the leftist tree as a binary tree plus an additional
rank. Let the none empty leftist tree be (r, L, k, R):
8.3. LEFTIST HEAP AND SKEW HEAP 117

data LHeap a = Empty | Node Int (LHeap a) a (LHeap a)

Function rank returns the rank value:


rank ∅ = 0
(8.4)
rank (r, L, k, R) = r

Define an auxiliary make function. It compares the ranks of the sub-trees and swap
them if necessary.
(
rank(A) < rank(B) : (1 + rank(A), B, k, A)
make(A, k, B) = (8.5)
otherwise : (1 + rank(B), A, k, B)

For the two trees A and B, if rank(A) is smaller, set B as the left sub-tree, and A as
the right. The rank of the new node is 1 + rank(A); otherwise if rank(B) is smaller, set A
as the left sub-tree, and B as the right. The rank of the new node is 1 + rank(B). Given
two leftist heaps, and denote them as H1 = (r1 , L1 , K1 , R1 ) and H2 = (r2 , L2 , k2 , R2 ) if
not empty, define merge:

merge ∅ H2 = H2
merge H1 ∅ = H
(1
(8.6)
k1 < k 2 : make(L1 , k1 , merge R1 H2 )
merge H1 H2 =
otherwise : make(L2 , k2 , merge H1 R2 )

We always merge to the right sub-tree recursively to maintain the leftist property. It
is bound to O(lg n) time. The binary heap implemented with array performs well in most
cases, suitable for the modern cache technology. However, it takes O(n) time to merge,
because it needs concatenate two arrays, and rebuild the heap [50] :
1: function Merge-Heap(A, B)
2: C ← Concat(A, B)
3: Build-Heap(C)
We can access the top of the leftist heap in constant time(assume it is not empty):

top (r, L, k, R) = k (8.7)

After pop the root, we merge the two sub-trees in O(lg n) time.

pop (r, L, k, R) = merge L R (8.8)

To insert a new element k, create a leaf of k, then merge it to the heap:

insert k H = merge (1, ∅, k, ∅) H (8.9)

We can build a leftist heap from a list as (Curried form): build = f oldr insert ∅.
Then repeatedly pop the minimum to output the sorted result:

sort = heapSort ◦ build (8.10)

Where
heapSort ∅ = []
(8.11)
heapSort H = (top H) : heapSort (pop H)

It pops n times, each takes O(lg n) time. The total time is bound to O(n lg n).
118 CHAPTER 8. BINARY HEAPS

4 3

7 9 14 8

16 10

Figure 8.7: Build the leftist heap from [9, 4, 16, 7, 10, 2, 14, 3, 8, 1].

8.3.2 Skew heap

Leftist heap may lead to unbalanced tree in some cases as shown in fig. 8.8. Skew heap is
a self-adjusting heap. It simplifies the leftist heap and improves balance [46] [47] . To build
the leftist heap, we swap the left and right sub-trees when the rank on left is smaller than
the right. However, this method can’t handle the case that either sub-tree has a NIL
node. The rank is always 1 no matter how big the sub-tree is. Skew heap always swaps
the sub-trees for simplification.

3 4

8
9

10

14

16

Figure 8.8: Leftist heap built from [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].

Skew heap is implemented with skew tree. A skew tree is a binary tree. The root
stores the minimum element, every sub-tree is also a skew tree. Skew tree needn’t the
rank. We can directly re-use the binary tree definition. Denote the none empty tree as
(L, k, R).

data SHeap a = Empty | Node (SHeap a) a (SHeap a)

When merge two none empty skew trees H1 = (L1 , k1 , R1 ) and H2 = (L2 , k2 , R2 ), if
k1 < k2 , then choose k1 (otherwise k2 ) as the new root. Then merge the greater one with
a sub-tree. We can either merge H2 with L1 , or merge H2 with R1 . We choose R1 , and
swap the left and right sub-trees. The result is (merge R1 H2 , k1 , L1 ).
8.4. SPLAY HEAP 119

merge ∅ H2 = H2
merge H1 ∅ = H
(1
(8.12)
k1 < k 2 : (merge R1 H2 , k1 , L1 )
merge H1 H2 =
otherwise : (merge H1 R2 , k2 , L2 )

The other operations, including insert, top, and pop are implemented with merge.
Skew heap outputs a balanced tree even for ordered list as shown in fig. 8.9.

2 3

6 4 5 7

8 9
10

Figure 8.9: Skew tree built from [1, 2, ..., 10].

8.4 Splay heap


Both the leftist heap and the skew heap are implemented with binary trees. If we change to the binary search tree, the minimum will no longer be in the root; we need O(lg n) time to locate it, and the performance degrades if the tree is not balanced. Although we can use the red-black tree to secure balancing, the splay tree provides a simpler solution. It dynamically 'splays' to balance the tree: taking a cache-like approach, it rotates the node currently being accessed to the root, reducing the access time for the next visit. The tree tends to be more balanced after multiple splays. Most splay tree operations perform in amortized O(lg n) time. Sleator and Tarjan developed the splay tree in 1985 [48] [49] .
We give two implementations. The first uses pattern matching and needs to match multiple cases; the second has a uniform form, but the implementation is complex. When accessing node x, denote its parent node as p, and its grandparent node (if there is one) as g. There are 3 cases, each with two symmetric sub-cases. Figure 8.10 shows one of each:

1. Zig-zig: Both x and p are on the left, or both on the right. We rotate twice to make x the root.

2. Zig-zag: x is on the left while p is on the right, or x is on the right while p is on the left. After rotation, x becomes the root, with p and g as siblings.

3. Zig: p is the root; we rotate to make x the root.

There are 6 cases in total. Let the non-empty tree be T = (L, k, R); define splay as below when accessing element y:

Figure 8.10: zig-zig: x and p are both on the left or right, x becomes the new root. zig-zag: x and p are on different sides, x becomes the new root, p and g are siblings. zig: p is the root, rotate to make x the root.
splay y (((a, x, b), p, c), g, d) =
    x = y :      (a, x, (b, p, (c, g, d)))          zig-zig
    otherwise :  T
splay y (a, g, (b, p, (c, x, d))) =
    x = y :      (((a, g, b), p, c), x, d)          zig-zig symmetric
    otherwise :  T
splay y ((a, p, (b, x, c)), g, d) =
    x = y :      ((a, p, b), x, (c, g, d))          zig-zag
    otherwise :  T
splay y (a, g, ((b, x, c), p, d)) =
    x = y :      ((a, g, b), x, (c, p, d))          zig-zag symmetric
    otherwise :  T
splay y ((a, x, b), p, c) =
    x = y :      (a, x, (b, p, c))                  zig
    otherwise :  T
splay y (a, p, (b, x, c)) =
    x = y :      ((a, p, b), x, c)                  zig symmetric
    otherwise :  T
splay y T = T                                      others
(8.13)

The tree is unchanged for all other cases. Every time we insert, we trigger a splay to adjust the balance. If the tree is empty, the result is a leaf; otherwise, we compare the new element with the root, recursively insert it into the left (if less) or right (if greater) sub-tree, and splay.

insert y ∅ = (∅, y, ∅)
insert y (L, x, R) =
    y < x :      splay y (insert y L, x, R)
    otherwise :  splay y (L, x, insert y R)
(8.14)

Figure 8.11: Splay tree built from [1, 2, ..., 10].

Figure 8.11 gives the splay tree built from [1, 2, ..., 10]. It generates a relatively balanced tree. Okasaki gives a simple rule for splaying [3] : whenever we follow two left branches or two right branches in a row, rotate the two nodes. When accessing x, if we have moved left or right twice, we partition T into L and R recursively, where L contains all the elements less than x, and R contains the rest. Then we create a new tree with x as the root, and L, R as the sub-trees.

partition y ∅ = (∅, ∅)
partition y T, where T = (L, x, R):
    x < y :
        R = ∅ :              (T, ∅)
        R = (L′, x′, R′) :
            x′ < y :         (((L, x, L′), x′, A), B), where (A, B) = partition y R′
            otherwise :      ((L, x, A), (B, x′, R′)), where (A, B) = partition y L′
    otherwise :
        L = ∅ :              (∅, T)
        L = (L′, x′, R′) :
            y < x′ :         (A, (B, x′, (R′, x, R))), where (A, B) = partition y L′
            otherwise :      ((L′, x′, A), (B, x, R)), where (A, B) = partition y R′
(8.15)
We partition the tree T with a pivot y. For the empty tree, the result is (∅, ∅); otherwise, for tree T = (L, x, R), if x < y, there are two sub-cases: (1) R is empty: all elements in the tree are less than y, and the result is (T, ∅); (2) R = (L′, x′, R′): if x′ < y, recursively partition R′ with y, putting all elements less than y in A and the rest in B; the result is the pair of trees ((L, x, L′), x′, A) and B. If x′ > y, recursively partition L′ with y into (A, B); the result is the pair (L, x, A) and (B, x′, R′). The result is symmetric when y ≤ x.
Alternatively, we can define insert with partition. When inserting element k into T, first partition the heap into two sub-trees L and R satisfying L < k < R (L contains all elements smaller than k, and R contains the rest). Then create a new tree (L, k, R), rooted at k with L and R as sub-trees.
insert k T = (L, k, R), where (L, R) = partition k T (8.16)
Since the splay tree is essentially a binary search tree, the minimum is at the leftmost node. We keep traversing the left sub-tree to access the 'top' of the heap:
top (∅, k, R) = k
(8.17)
top (L, k, R) = top L
This is equivalent to min for the binary search tree (alternatively, we can define: top = min). For pop, we remove the minimum, splaying whenever we move left twice.
pop (∅, k, R) = R
pop ((∅, k 0 , R0 ), k, R) = (R0 , k, R) (8.18)
pop ((L0 , k 0 , R0 ), k, R) = (pop L0 , k 0 , (R0 , k, R))
The third row performs splaying based on the binary search tree property without calling partition. Both top and pop are bound to O(lg n) time when the splay tree is balanced.
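As a sketch, eqs. (8.17) and (8.18) transcribe directly to Haskell over the same Node/Empty tree used by the appendix programs:

top (Node Empty x _) = x
top (Node l _ _) = top l

pop (Node Empty _ r) = r
pop (Node (Node Empty _ r') x r) = Node r' x r
pop (Node (Node l' x' r') x r) = Node (pop l') x' (Node r' x r)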
We can implement merge in O(lg n) time with partition. When merging two non-empty trees, we choose either root as the pivot to partition the other, then recursively merge the sub-trees:
merge T ∅ = T
merge ∅ T = T
merge (L, x, R) T = (merge L L′, x, merge R R′)
    where (L′, R′) = partition x T
(8.19)

8.5 Summary
We gave the generic definition of the binary heap in this chapter. There are several implementations. The array based representation is suitable for the imperative implementation: it maps a complete binary tree to an array, supporting random access in constant time. We directly use the binary tree to implement the heap in the functional way. Most operations are bound to O(lg n) time; some are amortized O(1) time [3] . When extending from the binary tree to the k-ary tree, we obtain the binomial heap, Fibonacci heap, and pairing heap (see chapter 10).

Exercise 8.2
8.2.1. Implement leftist heap and skew heap imperatively.
8.2.2. Define fold for heap.

8.6 Appendix - example programs


Access the parent and sub-trees of a complete binary tree with bit-wise operations (indexed from 0) in the array representation:
Int parent(Int i) = ((i + 1) >> 1) - 1

Int left(Int i) = (i << 1) + 1

Int right(Int i) = (i + 1) << 1

Heapify, parameterized the comparison:


void heapify([K] a, Int i, Less<K> lt) {
Int l, r, m
Int n = length(a)
loop {
m = i
l = left(i)
r = right(i)
if l < n and lt(a[l], a[i]) then m = l
if r < n and lt(a[r], a[m]) then m = r
if m ≠ i {
swap(a, i, m);
i = m
} else {
break
}
}
}

Build the binary heap from array:


void buildHeap([K] a, Less<K> lt) {
Int n = length(a)
for Int i = (n-1) / 2 downto 0 {
heapify(a, i, lt)
}
}

Pop:
K pop([K] a, Less<K> lt) {
    var n = length(a)
    K t = a[0]
    swap(a, 0, n - 1)
    remove(a, n - 1)
    if a ≠ [] then heapify(a, 0, lt)
    return t
}

Find the top-k elements:


[K] topk([K] a, Int k, Less<K> lt) {
buildHeap(a, lt)
[K] r = []
loop min(k, length(a)) {
append(r, pop(a, lt))
}
return r
}

Decrease the key in min-heap:


void decreaseKey([K] a, Int i, K k, Less<K> lt) {
if lt(k, a[i]) {
a[i] = k
heapFix(a, i, lt)
}
}

void heapFix([K] a, Int i, Less<K> lt) {


while i > 0 and lt(a[i], a[parent(i)]) {
swap(a, i, parent(i))
i = parent(i)
}
}

Push:
void push([K] a, K k, less<K> lt) {
append(a, k)
heapFix(a, length(a) - 1, lt)
}

Heap sort:
void heapSort([K] a, less<K> lt) {
buildHeap(a, not ◦ lt)
n = length(a)
while n > 1 {
swap(a, 0, n - 1)
n = n - 1
heapify(a[0 .. (n - 1)], 0, not ◦ lt)
}
}

Merge two leftist heaps:


merge Empty h = h
merge h Empty = h
merge h1@(Node _ x l r) h2@(Node _ y l' r') =
if x < y then makeNode x l (merge r h2)
else makeNode y l' (merge h1 r')

makeNode x a b = if rank a < rank b then Node (rank a + 1) x b a
                 else Node (rank b + 1) x a b

Merge two skew heaps:



merge Empty h = h
merge h Empty = h
merge h1@(Node x l r) h2@(Node y l' r') =
if x < y then Node x (merge r h2) l
else Node y (merge h1 r') l'

Splay operation:
−− zig-zig
splay t@(Node (Node (Node a x b) p c) g d) y =
if x == y then Node a x (Node b p (Node c g d)) else t
splay t@(Node a g (Node b p (Node c x d))) y =
if x == y then Node (Node (Node a g b) p c) x d else t
−− zig-zag
splay t@(Node (Node a p (Node b x c)) g d) y =
if x == y then Node (Node a p b) x (Node c g d) else t
splay t@(Node a g (Node (Node b x c) p d)) y =
if x == y then Node (Node a g b) x (Node c p d) else t
−− zig
splay t@(Node (Node a x b) p c) y = if x == y then Node a x (Node b p c) else t
splay t@(Node a p (Node b x c)) y = if x == y then Node (Node a p b) x c else t
−− others
splay t _ = t

Splay heap insert:


insert Empty y = Node Empty y Empty
insert (Node l x r) y
| x > y = splay (Node (insert l y) x r) y
| otherwise = splay (Node l x (insert r y)) y

Partition the splay tree:


partition Empty _ = (Empty, Empty)
partition t@(Node l x r) y
| x < y =
case r of
Empty → (t, Empty)
Node l' x' r' →
if x' < y then
let (small, big) = partition r' y in
(Node (Node l x l') x' small, big)
else
let (small, big) = partition l' y in
(Node l x small, Node big x' r')
| otherwise =
case l of
Empty → (Empty, t)
Node l' x' r' →
if y < x' then
    let (small, big) = partition l' y in
    (small, Node big x' (Node r' x r))
else
let (small, big) = partition r' y in
(Node l' x' small, Node big x r)

Merge two splay trees:


merge t Empty = t
merge Empty t = t
merge (Node l x r) t = Node (merge l l') x (merge r r')
where (l', r') = partition t x
Chapter 9

Selection sort

Selection sort is a straightforward sorting algorithm. It repeatedly selects the minimum


(or maximum) from a collection of elements. Its performance is below the divide and conquer algorithms, like quick sort and merge sort. We'll seek various improvements, and finally evolve it to heap sort, achieving the O(n lg n) time bound, the upper limit of comparison based sort algorithms.
When facing a bunch of grapes, there are two types of people. One loves to pick the biggest grape every time, the other always picks the smallest one. The former enjoys the grapes in descending order of size, while the latter in ascending order. In either case, one essentially applies the selection sort method, defined as:

1. If the collection is empty, the sorted result is empty;


2. Otherwise, select the minimum element, and append it to the sorted result.

It sorts elements in ascending order, as shown in fig. 9.1. If we select the maximum instead, it sorts in descending order. The compare operation can be abstracted.

sort [ ] = [ ]
(9.1)
sort A = m : sort (A − [m]) where m = min A

A − [m] means the remaining elements in A except m. The corresponding imperative


implementation is as below:
1: function Sort(A)
2: X←[]
3: while A ≠ [ ] do
4: x ← Min(A)
5: Del(A, x)
6: Append(X, x)
7: return X
We can improve it to an in-place sort by reusing A. Place the minimum at A[1], the second smallest at A[2], ... When we find the i-th smallest element, swap it with A[i].
1: function Sort(A)
2: for i ← 1 to |A| do
3: m ← Min-At(A, i)
4: Exchange A[i] ↔ A[m]
Let A = [a1 , a2 , ..., an ]. When selecting the i-th smallest element, [a1 , a2 , ..., ai−1 ] are already sorted. We call Min-At(A, i) to find the minimum of [ai , ai+1 , ..., an ], then swap it with ai . Repeat this to process all elements, as shown in fig. 9.2.


Figure 9.1: The left is sorted; repeatedly select the minimum of the rest and append.

Figure 9.2: The left is sorted; repeatedly find the minimum and swap it to the right position.

9.1 Find the minimum


We use the ‘compare and swap’ method to find the minimum. Label the elements with
1, 2, ..., n. Compare the elements of number 1 and 2, pick the smaller and compare it with
number 3, ... repeat till the last element of number n.
1: function Min-At(A, i)
2: m←i
3: for i ← m + 1 to |A| do
4: if A[i] < A[m] then
5: m←i
6: return m
Min-At locates the minimum of the slice A[i...] at position m, starting from A[i], then scanning A[i + 1], A[i + 2], .... To find the minimum of a list L: if L is a singleton [x], then x is the minimum; otherwise pick an element x from L, recursively find the minimum y of the remaining, and the smaller of x and y is the minimum of L.

min [x] = (x, [ ])
min (x:xs) =
    x < y :      (x, xs), where (y, ys) = min xs
    otherwise :  (y, x:ys)
(9.2)

We can further improve it to be tail recursive. Divide the elements into two groups A and B. A is initialized empty ([ ]); B contains all the elements. We pick two elements from B, compare them, put the greater one into A, and keep the smaller one as m. Then repeatedly pick elements from B, comparing with m, till B becomes empty; m finally holds the minimum. At any time, we have the invariant: L is a permutation of A ++ [m] ++ B, where m ≤ a for every a ∈ A, while the elements in B are yet to be examined.

min (x:xs) = min′ [ ] x xs    (9.3)

Where:

min′ as m [ ] = (m, as)
min′ as m (b:bs) =
    b < m :      min′ (m:as) b bs
    otherwise :  min′ (b:as) m bs
(9.4)

Function min returns a pair: the minimum and the list of the remaining elements. We define selection sort as below:

sort [ ] = [ ]
sort xs = m : sort xs′, where (m, xs′) = min xs
(9.5)

9.1.1 Performance
Selection sort needs to scan and find the minimum n times. It compares n + (n − 1) + (n − 2) + ... + 1 = n(n + 1)/2 times, hence the performance is O(n²). Compared to insertion sort, selection sort performs the same in the best, worst, and average cases, while insertion sort performs best at O(n) (when the list is already ordered), and worst at O(n²).

Exercise 9.1
9.1.1. What is the problem with below implementation of min?

min′ as m [ ] = (m, as)
min′ as m (b:bs) =
    b < m :      min′ (as ++ [m]) b bs
    otherwise :  min′ (as ++ [b]) m bs

9.1.2. Implement the in-place selection sort.

9.2 Improvement
To sort flexibly in ascending or descending order, we abstract the comparison as C.

sortBy C [ ] = [ ]
sortBy C xs = m : sortBy C xs′, where (m, xs′) = minBy C xs
(9.6)

And use C to find the ‘minimum’:


minBy C [x] = (x, [ ])
minBy C (x:xs) =
    x C y :      (x, xs), where (y, ys) = minBy C xs
    otherwise :  (y, x:ys)
(9.7)

For example, we pass (<) to sort a collection of numbers in ascending order: sortBy (<) [3, 1, 4, ...]. As a constraint, the comparison C needs to satisfy strict weak ordering [52] (a usage sketch follows the list below).

• Irreflexivity: for all x, x ≮ x (x is not less than itself);

• Asymmetry: for all x and y, if x < y, then y ≮ x;

• Transitivity, for all x, y, and z, if x < y, and y < z, then x < z.
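For instance, a minimal Haskell sketch of eqs. (9.6) and (9.7), with the comparison passed as a parameter:

minBy _ [x] = (x, [])
minBy cmp (x:xs) = let (y, ys) = minBy cmp xs in
                   if cmp x y then (x, xs) else (y, x:ys)

sortBy _ [] = []
sortBy cmp xs = m : sortBy cmp xs' where (m, xs') = minBy cmp xs

-- sortBy (<) [3,1,4,1,5] gives [1,1,3,4,5]
-- sortBy (>) [3,1,4,1,5] gives [5,4,3,1,1]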

As a compact in-place implementation, we traverse the elements, finding the minimum with an inner loop:
1: procedure Sort(A)
2:     for i ← 1 to |A| do
3:         m ← i
4:         for j ← i + 1 to |A| do
5:             if A[j] < A[m] then
6:                 m ← j
7:         Exchange A[i] ↔ A[m]
We only need to sort n − 1 elements; we can leave the last one in place and save the last loop. Besides, we needn't swap if A[i] is exactly the i-th smallest.
1: procedure Sort(A)
2:     for i ← 1 to |A| − 1 do
3:         m ← i
4:         for j ← i + 1 to |A| do
5:             if A[j] < A[m] then
6:                 m ← j
7:         if m ≠ i then
8:             Exchange A[i] ↔ A[m]

9.2.1 Cocktail sort


Knuth gives another implementation [51] called 'cocktail sort'. We select the maximum, instead of the minimum, and move it to the tail, as shown in fig. 9.3. At any time, the rightmost part is sorted. We scan the unsorted part, find the maximum, and swap it to the right position.
1: procedure Sort'(A)
2:     for i ← |A| down-to 2 do
3:         m ← i
4:         for j ← 1 to i − 1 do
5:             if A[m] < A[j] then
6:                 m ← j
7:         Exchange A[i] ↔ A[m]

Figure 9.3: Select the maximum and swap it to the tail.

Further, we can pick both the minimum and maximum in one pass, swap the minimum
to the head, and the maximum to the tail, hence halve the inner loops.
1: procedure Sort(A)
2:     for i ← 1 to ⌊|A|/2⌋ do
3: min ← i
4: max ← |A| + 1 − i
5: if A[max] < A[min] then
6: Exchange A[min] ↔ A[max]
7: for j ← i + 1 to |A| − i do
8: if A[j] < A[min] then
9: min ← j
10: if A[max] < A[j] then
11: max ← j
12: Exchange A[i] ↔ A[min]
13: Exchange A[|A| + 1 − i] ↔ A[max]
It's necessary to swap first if the rightmost element is less than the leftmost one, because the inner scan excludes both ends.

Figure 9.4: Find the minimum and maximum, swap both to the right positions.

We can also implement the cocktail sort recursively:

1. If the list is empty or singleton, it’s sorted;


2. Otherwise, select the minimum and the maximum, move them to the head and tail,
then recursively sort the rest elements.

sort [ ] = [ ]
sort [x] = [x]
sort xs = a : (sort xs′) ++ [b], where (a, b, xs′) = min-max xs
(9.8)

Where function min-max extracts the minimum and maximum from a list:

min-max (x:y:xs) = foldr sel (min x y, max x y, [ ]) xs    (9.9)

We initialize the minimum and the maximum from the first two elements, and process the rest of the list with foldr. Define sel as:

sel x (x0, x1, xs) =
    x < x0 :     (x, x1, x0:xs)
    x1 < x :     (x0, x, x1:xs)
    otherwise :  (x0, x1, x:xs)

Although min-max is bound to O(n) time, ++[b] is expensive. As shown in fig. 9.4, let the left sorted part be A, and the right sorted part be B. We can turn the cocktail sort tail recursive with A and B as accumulators.
sort′ A B [ ] = A ++ B
sort′ A B [x] = A ++ (x:B)
sort′ A B xs = sort′ (A ++ [x0]) xs′ (x1:B), where (x0, x1, xs′) = min-max xs
(9.10)
Start sorting with empty accumulators: sort = sort′ [ ] [ ]. The append only happens in A ++ [x0], while x1 is linked before B. To further eliminate the ++[x0], we maintain A in reversed order, denoted Ā = reverse A, so that x0 is prepended rather than appended. From A′ = A ++ [x0], we have the following equations:

Ā′ = reverse (A ++ [x0])
   = x0 : reverse A
   = x0 : Ā
(9.11)

Finally, we reverse Ā back to A. We improve the algorithm as:

sort′ A B [ ] = (reverse A) ++ B
sort′ A B [x] = (reverse (x:A)) ++ B
sort′ A B xs = sort′ (x0:A) xs′ (x1:B), where (x0, x1, xs′) = min-max xs
(9.12)

9.3 Further improvement


Although cocktail sort halves the number of loops, it's still bound to O(n²) time. To sort by comparison, we need the outer loop to examine all the elements for ordering. But do we need to scan all the elements to select the minimum every time? After finding the first smallest element, we have traversed the whole collection and obtained some ordering information: which elements are greater, and which are smaller. However, we discard this information and restart a fresh scan for the next selection. The idea is to reuse information. Let's see one approach inspired by football matches.

9.3.1 Tournament knock out


The football world cup is held every four years, with 32 teams from different continents playing the final games. Before 1982, there were 16 teams in the finals. Let's go back to 1978 and imagine a special way to determine the champion: in the first round, the teams are grouped into 8 pairs to play; there will be 8 winners, and 8 teams will be out. In the second round, the 8 teams are grouped into 4 pairs, giving 4 winners. Then the top 4 teams are grouped into 2 pairs, leaving two teams for the final. The champion is determined after 4 rounds of games, with 8 + 4 + 2 + 1 = 15 games in total. Besides the champion, we also want to know the silver medal team. In the real world cup, the team that loses the final is the runner-up. However, this isn't fair in some sense. We often hear about the 'group of death': suppose Brazil is paired with Germany in round one. Although both teams are strong, one of them is knocked out. It's quite possible that this team could beat all the others except the champion, as shown in fig. 9.5.

Figure 9.5: The element 15 is knocked out in the first round.

Assign every team a number to measure its strength. Suppose the team with the greater number always beats the smaller one (this is obviously not true in the real world). The champion number is 16. The runner-up is not 14, but 15, which was knocked out in the first round. We need to figure out a way to quickly identify the second greatest number in the tournament tree, then apply it to select the 3rd, the 4th, ... to sort. We can overwrite the champion with a very small number, i.e. −∞, so that it won't be selected next time, and the previous runner-up will become the new champion. For 2^m teams, where m is some natural number, it takes 2^(m−1) + 2^(m−2) + ... + 2 + 1 = 2^m − 1 comparisons to determine the new champion. This is the same as before. Actually, we needn't perform bottom-up comparisons, because the tournament tree stores sufficient ordering information: the champion must have beaten the runner-up at some point, so we can locate the runner-up along the path from the root to the champion's leaf. This path is greyed in fig. 9.5; the runner-up is the maximum of [14, 13, 7, 15]. This method is defined as below:

1. Build a tournament tree with the maximum (the champion) at the root;

2. Extract the root, replace it with −∞ along the path to leaf;



3. Perform a bottom-up back-track along the path, find the new champion and store
it in the root;

4. Repeat step 2 to process all elements.

Figure 9.6: The first 3 steps of tournament tree sort. (1) Take 16, replace it with −∞; 15 becomes the new root. (2) Take 15, replace it with −∞; 14 becomes the new root. (3) Take 14, replace it with −∞; 13 becomes the new root.

To sort a collection of elements, we build a tournament tree and repeatedly select the champion from it. Figure 9.6 gives the first 3 steps. We reuse the binary tree definition; to make back-tracking easy, we add a parent field to each node. When n is not 2^m for some natural number m, there is an element left without a "player", which directly enters the next round. To build the tournament tree, we first build n singleton trees from the elements. Then we pick every two trees t1, t2 and create a bigger binary tree t, where the root of t is max(key(t1), key(t2)), and the left and right sub-trees are t1, t2. Repeat to obtain a collection of new trees, each one level higher; enter the next round if there is a tree left over (odd number of trees). The trees halve in every round: ⌊n/2⌋, ⌊n/4⌋, ..., till the final tournament tree. This process is bound to O(n + n/2 + n/4 + ...) = O(2n) = O(n) time.
1: function Build-Tree(A)
2: T ←[]

3: for each x ∈ A do
4: Append(T , Node(NIL, x, NIL))
5:     while |T| > 1 do
6:         T′ ← [ ]
7:         for every t1, t2 ∈ T do
8:             k ← Max(Key(t1), Key(t2))
9:             Append(T′, Node(t1, k, t2))
10:         if |T| is odd then
11:             Append(T′, Last(T))
12:         T ← T′
13: return T [1]
When popping, we replace the root with −∞ top-down, then back-track through the parent field to find the new maximum.
1: function Pop(T )
2: m ← Key(T )
3: Key(T ) ← −∞
4: while T is not leaf do . top-down replace m with −∞.
5: if Key(Left(T )) = m then
6: T ← Left(T )
7: else
8: T ← Right(T )
9: Key(T ) ← −∞
10: while Parent(T ) 6= NIL do . bottom-up to find the new maximum.
11: T ← Parent(T )
12: Key(T ) ← Max(Key(Left(T )), Key(Right(T )))
13: return (m, T ) . the maximum and the new tree.
Pop processes the tree in two passes, top-down, then bottom-up along the path of
the champion. Because the tournament tree is balanced, the length of this path, i.e. the
height of the tree, is bound to O(lg n), where n is the number of the elements. Below is
the tournament tree sort. We first build the tree in O(n) time, then pop the maximum
for n times, each pop takes O(lg n) time. The total time is bound to O(n lg n).
procedure Sort(A)
T ← Build-Tree(A)
for i ← |A| down to 1 do
(A[i], T ) ← Pop(T )
We can also implement tournament tree sort recursively. Reuse the binary search tree definition: let a non-empty tree be (l, k, r), where k is the element, and l, r are the left and right sub-trees. Define wrap x = (∅, x, ∅) to create a leaf node. Convert the n elements into a list of singleton trees: ts = map wrap xs. For every pair of trees t1, t2, we merge them into a bigger tree, pick the greater element as the new root, and set t1, t2 as the sub-trees.

merge t1 t2 = (t1 , max k1 k2 , t2 ) (9.13)

Where k1 = key t1 , k2 = key t2 respectively. Define build ts to repeatedly merge trees,


and build the final tournament tree.
build [ ] = ∅
build [t] = t (9.14)
build ts = build (pairs ts)

Where:

pairs (t1:t2:ts) = (merge t1 t2) : pairs ts
pairs ts = ts
(9.15)

When popping the champion, we examine the sub-trees to see which one holds the same element as the root. We recursively pop the champion from that sub-tree, down to the leaf node, which we replace with −∞.

pop (∅, k, ∅) = (∅, −∞, ∅)
pop (l, k, r) =
    k = key l :  (l′, max (key l′) (key r), r), where l′ = pop l
    k = key r :  (l, max (key l) (key r′), r′), where r′ = pop r
(9.16)
Then repeatedly pop the tournament tree to sort (in descending order):

sort ∅ = [ ]
sort (l, −∞, r) = [ ] (9.17)
sort t = (key t) : sort (pop t)

Exercise 9.2
9.2.1. Implement the recursive tournament tree sort in ascending order.
9.2.2. How to handle duplicated elements with the tournament tree? is tournament tree
sort stable?
9.2.3. Compare the tournament tree sort and binary search tree sort in terms of space
and time performance.
9.2.4. Compare heap sort and tournament tree sort in terms of space and time perfor-
mance.

9.3.2 Heap sort


We improved the selection based sort to O(n lg n) time through the tournament tree. This is the upper limit of comparison based sort [51] . However, there is still room for improvement. After sorting, the binary tree holds all −∞ values, occupying 2n nodes for n elements. Is there a way to release nodes after pop? Can we halve the 2n nodes to n? If we treat the tree as empty when the root element is −∞, and rename key to top, we can write eq. (9.17) in a generic way:

sort ∅ = []
(9.18)
sort t = (top t) : sort (pop t)

This is exactly the same as the definition of heap sort. A heap always stores the minimum (or the maximum) on top, and provides a fast pop operation. The array implementation encodes the binary tree structure as indices, using exactly n cells to represent the heap. The functional heaps, like the leftist heap and the splay heap, use n nodes as well. We'll introduce more performant heaps in the next chapter.

9.4 Appendix - example programs


Tail recursive selection sort:
sort [] = []
sort xs = x : sort xs'
where
(x, xs') = extractMin xs

extractMin (x:xs) = min' [] x xs


where
min' ys m [] = (m, ys)
min' ys m (x:xs) = if m < x then min' (x:ys) m xs
else min' (m:ys) x xs

Cocktail sort:
[A] cocktailSort([A] xs) {
Int n = length(xs)
for Int i = 0 to n / 2 {
var (mi, ma) = (i, n - 1 - i)
if xs[ma] < xs[mi] then swap(xs[mi], xs[ma])
for Int j = i + 1 to n - 1 - i {
if xs[j] < xs[mi] then mi = j
if xs[ma] < xs[j] then ma = j
}
swap(xs[i], xs[mi])
swap(xs[n - 1 - i], xs[ma])
}
return xs
}

Tail recursive cocktail sort:


csort xs = cocktail [] [] xs
where
cocktail as bs [] = reverse as ++ bs
cocktail as bs [x] = reverse (x:as) ++ bs
cocktail as bs xs = let (mi, ma, xs') = minMax xs
in cocktail (mi:as) (ma:bs) xs'

minMax (x:y:xs) = foldr sel (min x y, max x y, []) xs


where
sel x (mi, ma, ys) | x < mi = (x, ma, mi:ys)
| ma < x = (mi, x, ma:ys)
| otherwise = (mi, ma, x:ys)

Build the tournament tree (reuse the binary tree structure):


Node<T> build([T] xs) {
[T] ts = []
for x in xs {
append(ts, Node(null, x, null))
}
while length(ts) > 1 {
[T] ts' = []
for l, r in ts {
append(ts', Node(l, max(l.key, r.key), r))
}
if odd(length(ts)) then append(ts', last(ts))
ts = ts'
}
return ts[0];
}

Pop from the tournament tree:


T pop(Node<T> t) {
T m = t.key
t.key = -INF
while not isLeaf(t) {
t = if t.left.key == m then t.left else t.right
t.key = -INF
}

while (t.parent ≠ null) {


t = t.parent
t.key = max(t.left.key, t.right.key)
}
return (m, t);
}

Tournament tree sort:


void sort([A] xs) {
Node<T> t = build(xs)
for Int n = length(xs) - 1 downto 0 {
(xs[n], t) = pop(t)
}
}

Recursive tournament tree sort (descending order):


data Tr a = Empty | Br (Tr a) a (Tr a)

data Infinite a = NegInf | Only a | Inf deriving (Eq, Ord)

key (Br _ k _ ) = k

wrap x = Br Empty (Only x) Empty

merge t1@(Br _ k1 _) t2@(Br _ k2 _) = Br t1 (max k1 k2) t2

fromList = build ◦ (map wrap) where


build [] = Empty
build [t] = t
build ts = build (pairs ts)
pairs (t1:t2:ts) = (merge t1 t2) : pairs ts
pairs ts = ts

pop (Br Empty _ Empty) = Br Empty NegInf Empty


pop (Br l k r) | k == key l = let l' = pop l in Br l' (max (key l') (key r)) r
| k == key r = let r' = pop r in Br l (max (key l) (key r')) r'

toList Empty = []
toList (Br _ NegInf _) = []
toList t@(Br _ (Only k) _) = k : toList (pop t)

sort = toList ◦ fromList


Chapter 10

Binomial heap, Fibonacci heap, and pairing heap

The binary heap stores elements in a binary tree. We can extend it to the k-ary tree [54] (multi-way tree, k > 2), or to multiple trees. The binomial heap is a forest of k-ary trees. When we delay some operations of the binomial heap, we obtain the Fibonacci heap. It improves the heap merge performance from O(lg n) to amortized constant time, which is critical for graph algorithm design. The pairing heap gives a simplified implementation with good overall performance.

10.1 Binomial Heaps


The binomial heap is named after Newton's binomial theorem. It consists of a set of k-ary trees (a forest). Every tree has a size equal to a binomial coefficient. Newton proved that (a + b)^n expands to:

(a + b)^n = a^n + C(n,1) a^(n−1) b + ... + C(n,n−1) a b^(n−1) + b^n    (10.1)

where C(n,i) denotes the binomial coefficient 'n choose i'.

When n is a natural number, the list of coefficients is some row in Pascal's triangle¹ [55] .
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
...

The first row is 1, and the first and last numbers are 1 in every row. Any other number is the sum of the top-left and top-right numbers in the previous row. There are many ways to generate Pascal's triangle; one recursive way is sketched below.
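A short Haskell sketch: every row is derived from the previous one by summing adjacent pairs.

pascal = iterate next [1] where
    next row = zipWith (+) (0 : row) (row ++ [0])

-- take 5 pascal gives [[1],[1,1],[1,2,1],[1,3,3,1],[1,4,6,4,1]]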

10.1.1 Binomial tree


A binomial tree is a multi-way tree with an integer rank, denoted B0 if the rank is 0, and Bn for rank n.

¹ Also known as Jia Xian's triangle, named after the ancient Chinese mathematician Jia Xian (1010–1070). Newton generalized n to rational numbers; later, Euler extended it to real exponents.

1. B0 has only one node;

2. Bn is formed by two Bn−1 trees; the one with the greater root element becomes the leftmost sub-tree of the other, as shown in fig. 10.1.

Figure 10.1: Binomial tree.

Figure 10.2 gives examples of B0 to B4. The number of nodes in every row of Bn is a binomial coefficient. For example in B4, there is one node (the root) at level 0, 4 nodes at level 1, 6 nodes at level 2, 4 nodes at level 3, and one node at level 4. They are exactly the same as the 4th row (counting from 0) of Pascal's triangle: 1, 4, 6, 4, 1. This is why we name it the binomial tree. We further know there are 2^n elements in a Bn tree.
A binomial heap is a set of binomial trees (a forest) satisfying the following two rules:
A binomial heap is a set of binomial trees (a forest) satisfying the following two rules:

1. Every tree satisfies the heap property, i.e. for min heap, the element in every node
is not less than (≥) its parent;

2. Every tree has the unique rank. i.e. any two trees have different ranks.

From the 2nd rule, for a binomial heap of n elements, convert n to its binary format n = (am...a1a0)₂, where a0 is the least significant bit (LSB) and am is the most significant bit (MSB). There is a tree of rank i if and only if ai = 1. For example, consider a binomial heap of 5 elements. As 5 = (101)₂ in binary, there are 2 binomial trees: B0 and B2. The binomial heap in fig. 10.3 has 19 elements; 19 = (10011)₂, so there are three trees: B0, B1, and B4.
We define the binomial tree as (r, k, ts), where r is the rank, k is the root element,
and ts is the list of sub-trees ordered by rank.
data BiTree a = Node Int a [BiTree a]
type BiHeap a = [BiTree a]

There is a method called 'left-child, right-sibling' [4] , which reuses the binary tree data structure to define the multi-way tree. Every node has left and right references: the left references the first sub-tree; the right references its sibling. All siblings form a list, as shown in fig. 10.4. Alternatively, we can use an array or a list to hold the sub-trees.

Figure 10.2: Binomial trees of rank 0, 1, 2, 3, 4, ...

Figure 10.3: A binomial heap with 19 elements.



Figure 10.4: R is the root, T1, T2, ..., Tm are sub-trees of R. The left of R is T1, the right is NIL. T11, ..., T1p are sub-trees of T1. The left of T1 is T11, the right is its sibling T2. The left of T2 is T21, the right is its sibling.

10.1.2 Link
To link two Bn trees into a Bn+1 tree, we compare the two root elements, choose the smaller one as the root, and put the other tree ahead of the other sub-trees, as shown in fig. 10.5.
link (r, x, ts) (r, y, ts′) =
    x < y :      (r + 1, x, (r, y, ts′):ts)
    otherwise :  (r + 1, y, (r, x, ts):ts′)
(10.2)

Figure 10.5: If x < y, link y as the first sub-tree of x.

We can implement link with the 'left child, right sibling' method as below. The link operation is bound to constant time.
1: function Link(x, y)
2:     if Key(y) < Key(x) then
3:         Exchange x ↔ y
4:     Sibling(y) ← Sub-Trees(x)
5:     Sub-Trees(x) ← y
6:     Parent(y) ← x
7:     Rank(x) ← Rank(y) + 1
8:     return x

Exercise 10.1
10.1.1. Write a program to generate Pascal’s triangle.
10.1.2. Prove that the i-th row in tree Bn has C(n, i) nodes.


10.1.3. Prove there are 2^n elements in a Bn tree.


10.1.4. Use a container to store sub-trees, how to implement link? How to ensure it is in
constant time?

10.1.3 Insert
When inserting a tree, we keep the forest ordered by rank (ascending):

ins t [ ] = [t]
ins t (t′:ts) =
    rank t < rank t′ :  t : t′ : ts
    rank t′ < rank t :  t′ : ins t ts
    otherwise :         ins (link t t′) ts
(10.3)

Where rank (r, k, ts) = r returns the rank of a tree. For the empty heap [ ], the result is the singleton [t]; otherwise, we compare the rank of t with the first tree t′: if t has the smaller rank, it becomes the new first tree; if t′ has the smaller rank, we recursively insert t into the rest of the trees; if they have the same rank, we link t and t′ into a bigger tree and recursively insert it. For n elements, there are at most O(lg n) binomial trees in the heap, so ins calls link at most O(lg n) times; as linking is constant time, the overall performance is bound to O(lg n)². We define insert for the binomial heap with ins: first wrap the new element x in a singleton tree, then insert it into the heap:

insert x = ins (0, x, [ ]) (10.4)

This is in Curried form. We can further insert a list of elements with fold:

fromList = foldr insert [ ]    (10.5)

Below is the implementation with ’left child, right sibling’ method:


1: function Insert-Tree(T, H)
2: ⊥← p ← Node(0, NIL, NIL)
3: while H ≠ NIL and Rank(H) ≤ Rank(T ) do
4: T1 ← H
5: H ← Sibling(H)
6: if Rank(T ) = Rank(T1 ) then
7: T ← Link(T, T1 )
8: else
9: Sibling(p) ← T1
10: p ← T1
11: Sibling(p) ← T
12: Sibling(T ) ← H
13: return Remove-First(⊥)

14: function Remove-First(H)


15: n ← Sibling(H)
16: Sibling(H) ← NIL
17: return n
² It's similar to adding two binary numbers. A more generic topic is numeric representation [3] .

10.1.4 Merge
When merging two binomial heaps, we actually merge two lists of binomial trees. Every tree has a unique rank in the merged result, in ascending order. The tree merge process is similar to merge sort (see chapter 13). Each time, we pick the first tree from each heap and compare their ranks, putting the smaller one into the result. If the two trees have the same rank, we link them into a bigger one, and recursively insert it into the merge result.

merge ts1 [ ] = ts1
merge [ ] ts2 = ts2
merge (t1:ts1) (t2:ts2) =
    rank t1 < rank t2 :  t1 : merge ts1 (t2:ts2)
    rank t2 < rank t1 :  t2 : merge (t1:ts1) ts2
    otherwise :          ins (link t1 t2) (merge ts1 ts2)
(10.6)
Alternatively, when t1 and t2 have the same rank, we can insert the linked tree back
to either heap, and recursively merge:
merge (ins (link t1 t2 ) ts1 ) ts2
We can eliminate recursion, and implement the iterative merge:
1: function Merge(H1 , H2 )
2: H ← p ← Node(0, NIL, NIL)
3: while H1 ≠ NIL and H2 ≠ NIL do
4: if Rank(H1 ) < Rank(H2 ) then
5: Sibling(p) ← H1
6: p ← Sibling(p)
7: H1 ← Sibling(H1 )
8: else if Rank(H2 ) < Rank(H1 ) then
9: Sibling(p) ← H2
10: p ← Sibling(p)
11: H2 ← Sibling(H2 )
12: else . same rank
13: T1 ← H1 , T2 ← H2
14: H1 ← Sibling(H1 ), H2 ← Sibling(H2 )
15: H1 ← Insert-Tree(Link(T1 , T2 ), H1 )
16: if H1 ≠ NIL then
17:     Sibling(p) ← H1
18: if H2 ≠ NIL then
19: Sibling(p) ← H2
20: return Remove-First(H)
If there are m1 trees in H1 and m2 trees in H2, there are at most m1 + m2 trees after the merge. The merge is bound to O(m1 + m2) time if all the trees have different ranks. If there are trees of the same rank, we call ins up to O(m1 + m2) times. Consider m1 = 1 + ⌊lg n1⌋ and m2 = 1 + ⌊lg n2⌋, where n1, n2 are the numbers of elements in each heap, and ⌊lg n1⌋ + ⌊lg n2⌋ ≤ 2⌊lg n⌋, where n = n1 + n2. The final performance of merge is O(lg n).

10.1.5 Pop
Although every tree keeps its minimal element in its root, we don't know which tree holds the overall minimum of the heap. We need to locate it among all the trees. As there are O(lg n) trees, it takes O(lg n) time to find the top element.

top (t:ts) = foldr f (key t) ts, where f (r, x, ts′) y = min x y    (10.7)

1: function Top(H)
2: m←∞
3: while H ≠ NIL do
4: m ← Min(m, Key(H))
5: H ← Sibling(H)
6: return m
For pop, we further need to remove the top element and maintain the heap property. Let the trees be Bi, Bj, ..., Bp, ..., Bm in the heap, with the minimum in the root of Bp. After removing the top, there remain p sub binomial trees, with ranks p − 1, p − 2, ..., 0. We reverse them to form a new binomial heap Hp. The other trees, without Bp, form another binomial heap H′ = H − [Bp]. We merge Hp and H′ to get the final result, as shown in fig. 10.6. To support pop, we need to extract the tree containing the minimum:

Figure 10.6: Binomial heap pop.

min′ [t] = (t, [ ])
min′ (t:ts) =
    key t < key t′ :  (t, ts), where (t′, ts′) = min′ ts
    otherwise :       (t′, t:ts′)
(10.8)

Where key (r, k, ts) = k accesses the root element. The result of min′ is a pair: the tree containing the minimum, and the remaining trees. We next define pop with it:

pop H = merge (reverse ts) H′, where ((r, k, ts), H′) = min′ H    (10.9)

The iterative implementation is as below:


1: function Pop(H)
2:     (Tm, H) ← Extract-Min(H)
3:     H ← Merge(H, Reverse(Sub-Trees(Tm)))
4:     Sub-Trees(Tm) ← NIL
5:     return (Key(Tm), H)
Where the list reverse is defined in chapter 1, Extract-Min is implemented as below:
1: function Extract-Min(H)
2:     H′ ← H, p ← NIL
3:     Tm ← Tp ← NIL
4:     while H ≠ NIL do
5:         if Tm = NIL or Key(H) < Key(Tm) then
6:             Tm ← H
7:             Tp ← p
8:         p ← H
9:         H ← Sibling(H)
10:     if Tp ≠ NIL then
11:         Sibling(Tp) ← Sibling(Tm)
12:     else
13:         H′ ← Sibling(Tm)
14:     Sibling(Tm) ← NIL
15:     return (Tm, H′)
We can implement heap sort with pop. First build a binomial heap from a list, then
repeatedly pop the minimum.

sort = heapSort ◦ fromList    (10.10)

Where heapSort is defined as:

heapSort [ ] = [ ]
(10.11)
heapSort H = (top H) : heapSort (pop H)

Binomial heap insert and merge are bound to O(lg n) time in the worst case; their amortized performance is constant time (we skip the proof).

10.2 Fibonacci heap


The binomial heap is named after the binomial theorem; the Fibonacci heap is named after the Fibonacci numbers³. The Fibonacci heap is essentially a 'lazy' binomial heap that delays some operations. However, this does not mean a binomial heap turns into a Fibonacci heap automatically in a lazy evaluation environment; such an environment only makes the implementation easy [56] . All operations except pop are bound to amortized constant time [57] .

operation      Binomial heap    Fibonacci heap
insertion      O(lg n)          O(1)
merge          O(lg n)          O(1)
top            O(lg n)          O(1)
pop            O(lg n)          amortized O(lg n)

Table 10.1: Performance of Fibonacci heap and binomial heap

When inserting x into a binomial heap, we wrap it in a singleton tree and insert it into the forest, maintaining the rank ordering: if two ranks are the same, we link and recursively insert. The performance is bound to O(lg n) time. Taking the lazy strategy, we delay the rank-ordered insert and link, and put the singleton tree of x directly into the forest. To access the top element in constant time, we record which tree holds the overall minimum. A Fibonacci heap is either empty, ∅, or a forest of trees denoted as (n, tm, ts), where n is the number of elements in the heap, tm is the tree holding the top element, and ts is the list of the remaining trees. The example below defines the Fibonacci heap (reusing the binomial tree definition).

³ Michael L. Fredman and Robert E. Tarjan used Fibonacci numbers to prove the performance time bound, and decided to name this data structure after Fibonacci [4] .
data FibHeap a = Empty | FibHeap { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]}

We can access the top in constant time (Curried form): top = key ◦ minTree.

10.2.1 Insert
We define insert as a special case of merge, where one heap is a singleton tree: insert x H = merge (singleton x) H. Or, in point-free form:

insert = merge ◦ singleton    (10.12)

Where:
singleton x = (1, (0, x, [ ]), [ ])
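As a Haskell sketch over the FibHeap definition above (using the merge defined in the next sub-section; the singleton tree has rank 0):

singleton x = FibHeap 1 (Node 0 x []) []

insert x = merge (singleton x)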
Below is the imperative implementation:
1: function Insert(k, H)
2: x ← Singleton(k) . wrap k to a tree
3: Add(x, Trees(H))
4: Tm ← Min-Tree(H)
5: if Tm = NIL or k < Key(Tm ) then
6: Min-Tree(H) ← x
7: Size(H) ← Size(H) + 1
Where Trees(H) accesses the list of trees in H, and Min-Tree(H) returns the tree that holds the minimal element.

10.2.2 Merge
When merging two heaps, we delay the link: we only put the trees together, then pick the new top.

merge h ∅ = h
merge ∅ h = h
merge (n, tm, ts) (n′, t′m, ts′) =
    key tm < key t′m :  (n + n′, tm, t′m:ts ++ ts′)
    otherwise :         (n + n′, t′m, tm:ts ++ ts′)
(10.13)

When neither heap is empty, the ++ takes time proportional to the number of trees in one heap. We can improve it to constant time with a doubly linked list, as in the example program below.
data Node<K> {
K key
Int rank
Node<K> next, prev, parent, subTrees
}

data FibHeap<K> {
Int size
Node<K> minTree, trees
}

1: function Merge(H1 , H2 )
2: H ← Fib-Heap
3: Trees(H) ← Concat(Trees(H1 ), Trees(H2 ))
4: if Key(Min-Tree(H1 )) < Key(Min-Tree(H2 )) then
5: Min-Tree(H) ← Min-Tree(H1 )
6: else
7: Min-Tree(H) ← Min-Tree(H2 )
8: Size(H) ← Size(H1 ) + Size(H2 )
9: return H

9: function Concat(s1 , s2 )
10: e1 ← Prev(s1 )
11: e2 ← Prev(s2 )
12: Next(e1 ) ← s2
13: Prev(s2 ) ← e1
14: Next(e2 ) ← s1
15: Prev(s1 ) ← e2
16: return s1

10.2.3 Pop
As the merge function delays the linking, we need to compensate for it during pop; we define this as tree consolidation. Consider a similar problem first: given a list of numbers, each of the form 2^m (for some integer m ≥ 0), e.g., L = [2, 1, 1, 4, 8, 1, 1, 2, 4], we repeatedly sum every two equal numbers until all are unique. The result is [8, 16], as shown in table 10.2. The first column gives the number we are scanning; the second is the middle step, i.e., comparing the current number with the first number in the result list and adding them when equal; the last column is the result passed to the next step. We can define this consolidation with fold:

number   compare, add    result
2        2               2
1        1, 2            1, 2
1        (1+1), 2        4
4        (4+4)           8
8        (8+8)           16
1        1, 16           1, 16
1        (1+1), 16       2, 16
2        (2+2), 16       4, 16
4        (4+4), 16       8, 16

Table 10.2: Consolidation steps.

consolidate = foldr melt [ ]    (10.14)

Where melt is defined as below:

melt x [ ] = [x]
melt x (x′:xs) =
    x = x′ :  melt 2x xs
    x < x′ :  x:x′:xs
    x > x′ :  x′ : melt x xs
(10.15)

Let n = sum L. consolidate actually represents n in binary format. The result contains 2^i (i starting from 0) if and only if the i-th bit of n is 1. For example, sum [2, 1, 1, 4, 8, 1, 1, 2, 4] = 24, which is (11000)₂ in binary; the 3rd and 4th bits are 1, hence the result contains 2³ = 8 and 2⁴ = 16.
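A runnable Haskell sketch of eqs. (10.14) and (10.15) on numbers:

melt x [] = [x]
melt x (x':xs) | x == x'   = melt (2 * x) xs
               | x < x'    = x : x' : xs
               | otherwise = x' : melt x xs

consolidate = foldr melt []

-- consolidate [2,1,1,4,8,1,1,2,4] gives [8,16]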
We can consolidate the trees in a similar way: compare the ranks, and link trees of equal rank:

melt t [ ] = [t]
melt t (t′:ts) =
    rank t = rank t′ :  melt (link t t′) ts
    rank t < rank t′ :  t:t′:ts
    rank t > rank t′ :  t′ : melt t ts
(10.16)

Figure 10.7 gives the consolidation steps; it is similar to the number consolidation in table 10.2. We can use an auxiliary array A to do the consolidation, where A[i] stores the tree of rank i. We traverse the trees in the heap. If we meet another tree of rank i, we link the two into a bigger tree of rank i + 1, clean A[i], and then check whether A[i + 1] is empty or not; if there is a tree of rank i + 1, we link again. Array A stores the final consolidation result after the traversal.

Figure 10.7: Consolidation. Step 3, link d and c, then link a; Steps 7, 8, link r and q, then link s and q.

1: function Consolidate(H)
2:     R ← Max-Rank(Size(H))
3:     A ← [NIL, NIL, ..., NIL]    . total R cells
4:     for each T in Trees(H) do
5:         r ← Rank(T)
6:         while A[r] ≠ NIL do
7:             T′ ← A[r]
8:             T ← Link(T, T′)
9:             A[r] ← NIL
10:             r ← r + 1
11:         A[r] ← T
12:     Tm ← NIL
13:     Trees(H) ← NIL
14:     for each T in A do
15:         if T ≠ NIL then
16:             append T to Trees(H)
17:             if Tm = NIL or Key(T) < Key(Tm) then
18:                 Tm ← T
19:     Min-Tree(H) ← Tm
It becomes a binomial heap after the consolidation, with O(lg n) trees. Max-Rank(n) returns the upper limit of the rank R in a heap of n elements. From the binomial tree result, the biggest tree BR has 2^R elements, and 2^R ≤ n < 2^(R+1), so a rough upper limit is R ≤ log₂ n. We'll give a more accurate estimation of R in a later section. We additionally need to scan all the trees to find the minimal root element. We can reuse min′ defined in eq. (10.8) to extract the min-tree.

pop (1, (0, x, [ ]), [ ]) = (x, [ ])
pop (n, (r, x, tsm), ts) = (x, (n − 1, tm, ts′))
(10.17)

Where (tm, ts′) = min′ (consolidate (tsm ++ ts)). It takes O(|tsm|) time for ++ to concatenate the trees. The corresponding iterative implementation is as below:
1: function Pop(H)
2: Tm ← Min-Tree(H)
3: for each T in Sub-Trees(Tm ) do
4: append T to Trees(H)
5: Parent(T ) ← NIL
6: remove Tm from Trees(H)
7: Size(H) ← Size(H) - 1
8: Consolidate(H)
9: return (Key(Tm ), H)
We use the 'potential' method to evaluate the amortized performance. The gravitational potential energy in physics is defined as:

E = mgh

As shown in fig. 10.8, consider some process that moves an object of mass m up and down, finally stopping at height h′. Let the friction resistance be Wf; the process does the following amount of work:

W = mg(h′ − h) + Wf

Consider heap pop. To evaluate the cost, let the potential be Φ(H) before the pop; it is the result accumulated by a series of insert and merge operations. The heap becomes

Figure 10.8: Gravity potential energy.

H′ after the tree consolidation, with new potential Φ(H′). The difference between Φ(H′) and Φ(H), plus the cost of the tree consolidation, gives the amortized performance. Define the potential as:

Φ(H) = t(H)    (10.18)

Where t(H) is the number of trees in the heap. Let the upper bound of the rank of all trees be R(n), where n is the number of elements in the heap. After the tree consolidation, there are at most t(H′) = R(n) + 1 trees. Before the consolidation, there is another operation that contributes to the running time: we removed the root of the min-tree, then added all its sub-trees to the heap, so we consolidate at most R(n) + t(H) − 1 trees. Let the pop performance be bound to T, and the consolidation to Tc; the amortized time is given by:

T = Tc + Φ(H′) − Φ(H)
  = O(R(n) + t(H) − 1) + (R(n) + 1) − t(H)    (10.19)
  = O(R(n))

Insert, merge, and pop ensure all trees are binomial trees, therefore, the upper bound
of R(n) is O(lg n).

10.2.4 Increase priority


We can use a heap to manage tasks with priorities. When we need to prioritize a task, we decrease its value (for the min heap), moving it closer to the heap top. Some graph algorithms, like minimum spanning tree and Dijkstra's algorithm, require this heap operation to meet amortized constant time [4] . Let x be a node whose value we need to decrease to k. As shown in fig. 10.9, if x is less than its parent y (in terms of key), we cut x off y, then add it to the heap (forest). Although the parent still holds the minimum of its tree, the tree is not a binomial tree any more, and the performance drops when a tree loses too many sub-trees. We add another rule to address this problem: if a node loses its second sub-tree, it is immediately cut from its parent and added to the heap (forest).
1: function Decrease(H, x, k)
2:     Key(x) ← k
3:     p ← Parent(x)
4:     if p ≠ NIL and k < Key(p) then
5:         Cut(H, x)
6:         Cascade-Cut(H, p)
7:     if k < Top(H) then
8:         Min-Tree(H) ← x

Figure 10.9: If key x < key y, cut x off and add to the heap.
Where function Cascade-Cut uses a mark to record whether a node has lost a sub-tree before. The mark is cleared later in the Cut function.
1: function Cut(H, x)
2: p ← Parent(x)
3: remove x from p
4: Rank(p) ← Rank(p) - 1
5: add x to Trees(H)
6: Parent(x) ← NIL
7: Mark(x) ← False
During the cascade cut, if node x is marked, it has lost some sub-tree before. We recursively cut along the parents up to the root.
1: function Cascade-Cut(H, x)
2:     p ← Parent(x)
3:     if p ≠ NIL then
4:         if Mark(x) = False then
5:             Mark(x) ← True
6:         else
7:             Cut(H, x)
8:             Cascade-Cut(H, p)

Exercise 10.2

Why is Decrease bound to amortized O(1) time?



10.2.5 The name of Fibonacci heap


We have yet to implement Max-Rank(n), which defines the upper bound of the tree rank for a Fibonacci heap of n elements.
Lemma 10.2.1. For any tree x in a Fibonacci Heap, let k = rank(x), and |x| = size(x),
then

|x| ≥ Fk+2 (10.20)

Where Fk is the k-th Fibonacci number:

F0 = 0
F1 = 1
Fk = Fk−1 + Fk−2

Proof. For tree x, let its k sub-trees be y1, y2, ..., yk, ordered by the time when they were linked to x, where y1 is the earliest and yk is the latest. Obviously, |yi| ≥ 0. When yi was linked to x, there were already the sub-trees y1, y2, ..., yi−1. Because we only link nodes of the same rank, at that time we had:

rank(yi) = rank(x) = i − 1

After that, yi can lose at most one additional sub-tree (through Decrease): once it loses a second sub-tree, it is cut off and added to the forest. For any i = 2, 3, ..., k, we have:

rank(yi ) ≥ i − 2

Let sk be the minimum possible size of a tree x with k = rank(x). It starts from s0 = 1, s1 = 2; i.e., a tree of rank 0 has at least one node, a tree of rank 1 has at least two nodes, and a tree of rank k has at least sk nodes.

|x| ≥ sk
    = 2 + s_rank(y2) + s_rank(y3) + ... + s_rank(yk)
    ≥ 2 + s0 + s1 + ... + s_(k−2)

The last row holds because rank(yi) ≥ i − 2 and sk is monotonic, hence s_rank(yi) ≥ s_(i−2). We next show that sk ≥ Fk+2 by induction. For the edge cases, s0 = 1 ≥ F2 = 1, and s1 = 2 ≥ F3 = 2. For the induction case k ≥ 2:

|x| ≥ sk
    ≥ 2 + s0 + s1 + ... + s_(k−2)
    ≥ 2 + F2 + F3 + ... + Fk        induction hypothesis
    = 1 + F0 + F1 + F2 + ... + Fk   since F0 = 0, F1 = 1

Next, we prove:

Fk+2 = 1 + Σ_{i=0}^{k} Fi    (10.21)

Use induction again:

• Edge case: F2 = 1 + F0 = 1.

• Induction case: suppose it holds for k + 1, i.e., Fk+1 = 1 + Σ_{i=0}^{k−1} Fi. Then:

Fk+2 = Fk+1 + Fk
     = (1 + Σ_{i=0}^{k−1} Fi) + Fk    induction hypothesis
     = 1 + Σ_{i=0}^{k} Fi

Wrapping up, we have the final result:

n ≥ |x| ≥ Fk+2    (10.22)

For the Fibonacci sequence, Fk+2 ≥ φ^k, where φ = (1 + √5)/2 is the golden ratio. Hence k ≤ log_φ n, which proves that pop is an amortized O(lg n) algorithm. We can define maxRank as:

maxRank(n) = 1 + ⌊log_φ n⌋    (10.23)
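A direct Haskell sketch of eq. (10.23), where phi is the golden ratio:

maxRank n = 1 + floor (logBase phi (fromIntegral n)) where
    phi = (1 + sqrt 5) / 2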

We can also implement Max-Rank from Fibonacci numbers:


1: function Max-Rank(n)
2:     F0 ← 0, F1 ← 1
3:     k ← 2
4:     repeat
5:         Fk ← Fk−1 + Fk−2
6:         k ← k + 1
7:     until n < Fk
8:     return k − 2

10.3 Pairing Heaps


It's complex to implement the Fibonacci heap. The pairing heap provides another option: it's easy to implement, and its performance is good. Most operations, like insert, top, and merge, are bound to constant time; pop is conjectured to be amortized O(lg n) time [58] [3] .

10.3.1 Definition
A pairing heap is a multi-way tree whose root holds the minimum. It is either empty, ∅, or a k-ary tree consisting of a root and multiple sub-trees, denoted as (x, ts). We can also use the 'left child, right sibling' way to define the tree.
data PHeap a = Empty | Node a [PHeap a]

10.3.2 Merge, insert, and top


There are two cases for merge:

1. Either heap is ∅, the result is the other heap;

2. Otherwise, compare the two roots, set the greater one as the new sub-tree of the
other.

merge ∅ h2 = h2
merge h1 ∅ = h1
merge (x, ts1) (y, ts2) =
    x < y :      (x, (y, ts2):ts1)
    otherwise :  (y, (x, ts1):ts2)
(10.24)

merge is in constant time. Below is the imperative implementation with the ‘left-child,
right sibling’ method:
1: function Merge(H1 , H2 )
2: if H1 = NIL then
3: return H2
4: if H2 = NIL then
5: return H1
6: if Key(H2 ) < Key(H1 ) then
7: Exchange(H1 ↔ H2 )
8: Sub-Trees(H1 ) ← Link(H2 , Sub-Trees(H1 ))
9: Parent(H2 ) ← H1
10: return H1
Similar to the Fibonacci heap, we implement insert with merge as in eq. (10.12), and access the top from the root: top (x, ts) = x. Both operations are in constant time; a short sketch follows.
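A Haskell sketch of these operations over the PHeap definition above (min-heap):

merge Empty h = h
merge h Empty = h
merge h1@(Node x ts1) h2@(Node y ts2) =
    if x < y then Node x (h2:ts1) else Node y (h1:ts2)

insert x = merge (Node x [])

top (Node x _) = x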

10.3.3 Decrease key


When decreasing the value of a node: if it is the root, directly alter the value; otherwise, cut the sub-tree rooted at this node, then merge it back into the heap.
1: function Decrease(H, x, k)
2:     Key(x) ← k
3:     p ← Parent(x)
4:     if p ≠ NIL then
5:         Remove x from Sub-Trees(p)
6:         Parent(x) ← NIL
7:         return Merge(H, x)
8:     return H

10.3.4 Pop
After popping the root, we consolidate the sub-trees into one tree:

pop (x, ts) = consolidate ts    (10.25)

We first merge every two sub-trees from left to right, then merge these paired results from right to left into one tree. This explains the name 'pairing heap'. Figures 10.10 and 10.11 show the paired merge.

consolidate [ ] = ∅
consolidate [t] = t (10.26)
consolidate (t1 :t2 :ts) = merge (merge t1 t2 ) (consolidate ts)
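Eq. (10.26) transcribes directly to Haskell; a sketch of pop with it, reusing the merge sketched earlier:

consolidate [] = Empty
consolidate [t] = t
consolidate (t1:t2:ts) = merge (merge t1 t2) (consolidate ts)

pop (Node _ ts) = consolidate ts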

The corresponding ‘left child, right sibling’ implementation is as below:


1: function Pop(H)
2:     L ← NIL
3:     for every Tx, Ty in Sub-Trees(H) do
4:         T ← Merge(Tx, Ty)
5:         L ← Link(T, L)
6:     H ← NIL
7:     for T in L do
8:         H ← Merge(H, T)
9:     return H

We iterate to merge Tx, Ty into T, and link T ahead of L. When we then loop over L, we actually traverse the pairs from right to left. If there is an odd number of sub-trees, Ty = NIL at the last step, hence T = Tx in that case.

Figure 10.10: Pop the root, merge the 9 sub-trees in pairs, leave the last tree.

10.3.5 Delete
To delete a node x, we can first decrease its value to −∞, then follow with a pop. Alternatively: if x is the root, pop it; otherwise, cut x off H, then pop from x, and merge the result back into H:
1: function Delete(H, x)
2: if H = x then
3: Pop(H)
4: else
5: H ← Cut(H, x)
6: x ← Pop(x)
7: Merge(H, x)
As delete is implemented with pop, its performance is also conjectured to be amortized O(lg n) time.

Figure 10.11: Merge from right to left. (d) merge 9, 6; (e) merge 7; (f) merge 3; (g) merge 4.

Exercise 10.3
10.3.1. Consider continuously inserting n elements and then popping: the overhead of that pop is big when n is large (although the amortized performance is O(lg n)). How can we mitigate such a worst case?
10.3.2. Implement delete for the pairing heap.
10.3.3. Implement Decrease-Key for the pairing heap.

10.4 Appendix - example programs


Definition of multi-way tree (left child, right sibling):
data Node<K> {
Int rank
K key
Node<K> parent, subTrees, sibling
Bool mark

Node(K x) {
key = x
rank = 0
parent = subTrees = sibling = null
mark = false
}
}

Link binomial trees:


Node<K> link(Node<K> t1, Node<K> t2) {
if t2.key < t1.key then (t1, t2) = (t2, t1)
t2.sibling = t1.subTrees
t1.subTrees = t2
t2.parent = t1
t1.rank = t1.rank + 1
return t1
}

Binomial heap insert:


Node<K> insert(K x, Node<K> h) = insertTree(Node(x), h)

Node<K> insertTree(Node<K> t, Node<K> h) {


var h1 = Node()
var prev = h1
while h ≠ null and h.rank ≤ t.rank {
var t1 = h
h = h.sibling
if t.rank == t1.rank {
t = link(t, t1)
} else {
prev.sibling = t1
prev = t1
}
}
prev.sibling = t
t.sibling = h
return removeFirst(h1)
}

Node<K> removeFirst(Node<K> h) {
var next = h.sibling
h.sibling = null

return next
}

Binomial heap recursive insert:


data BiTree a = Node { rank :: Int
, key :: a
, subTrees :: [BiTree a]}

type BiHeap a = [BiTree a]

link t1@(Node r x c1) t2@(Node _ y c2) =


if x < y then Node (r + 1) x (t2:c1)
else Node (r + 1) y (t1:c2)

insertTree t [] = [t]
insertTree t ts@(t':ts') | rank t < rank t' = t:ts
| rank t > rank t' = t' : insertTree t ts'
| otherwise = insertTree (link t t') ts'

insert x = insertTree (Node 0 x [])

Binomial heap merge:


Node<K> merge(h1, h2) {
var h = Node()
var prev = h
while h1 ≠ null and h2 ≠ null {
if h1.rank < h2.rank {
prev.sibling = h1
prev = prev.sibling
h1 = h1.sibling
} else if h2.rank < h1.rank {
prev.sibling = h2
prev = prev.sibling
h2 = h2.sibling
} else {
var (t1, t2) = (h1, h2)
(h1, h2) = (h1.sibling, h2.sibling)
h1 = insertTree(link(t1, t2), h1)
}
}
if h1 ≠ null then prev.sibling = h1
if h2 ≠ null then prev.sibling = h2
return removeFirst(h)
}

Binomial heap recursive merge:


merge ts1 [] = ts1
merge [] ts2 = ts2
merge ts1@(t1:ts1') ts2@(t2:ts2')
| rank t1 < rank t2 = t1:(merge ts1' ts2)
| rank t1 > rank t2 = t2:(merge ts1 ts2')
| otherwise = insertTree (link t1 t2) (merge ts1' ts2')

Binomial heap pop:


Node<K> reverse(Node<K> h) {
Node<K> prev = null
while h ≠ null {
var x = h
h = h.sibling
x.sibling = prev
prev = x
}
return prev
}

(Node<K>, Node<K>) extractMin(Node<K> h) {


var head = h
Node<K> tp = null
Node<K> tm = null
Node<K> prev = null
while h ≠ null {
if tm == null or h.key < tm.key {
tm = h
tp = prev
}
prev = h
h = h.sibling
}
if tp ≠ null {
tp.sibling = tm.sibling
} else {
head = tm.sibling
}
tm.sibling = null
return (tm, head)
}

(K, Node<K>) pop(Node<K> h) {


var (tm, h) = extractMin(h)
h = merge(h, reverse(tm.subTrees))
tm.subTrees = null
return (tm.key, h)
}

Binomial heap recursive pop:


pop h = merge (reverse $ subTrees t) ts where
(t, ts) = extractMin h

extractMin [t] = (t, [])


extractMin (t:ts) = if key t < key t' then (t, ts)
else (t', t:ts') where
(t', ts') = extractMin ts

Merge Fibonacci heaps with bidirectional linked list:


FibHeap<K> merge(FibHeap<K> h1, FibHeap<K> h2) {
if isEmpty(h1) then return h2
if isEmpty(h2) then return h1
FibHeap<K> h = FibHeap<K>()
h.trees = concat(h1.trees, h2.trees)
h.minTree = if h1.minTree.key < h2.minTree.key
then h1.minTree else h2.minTree
h.size = h1.size + h2.size
return h
}

bool isEmpty(FibHeap<K> h) = (h == null or h.trees == null)

Node<K> concat(Node<K> first1, Node<K> first2) {


var last1 = first1.prev
var last2 = first2.prev
last1.next = first2
first2.prev = last1
last2.next = first1
first1.prev = last2
return first1
}

Consolidate trees in Fibonacci heap:


consolidate = foldr meld [] where
  meld t [] = [t]
  meld t (t':ts) | rank t == rank t' = meld (link t t') ts
                 | rank t < rank t' = t : t' : ts
                 | otherwise = t' : meld t ts

Consolidate trees with auxiliary array:


void consolidate(FibHeap<K> h) {
Int R = maxRank(h.size) + 1
Node<K>[R] a = [null, ...]
while h.trees ≠ null {
var x = h.trees
h.trees = remove(h.trees, x)
Int r = x.rank
while a[r] ≠ null {
var y = a[r]
x = link(x, y)
a[r] = null
r = r + 1
}
a[r] = x
}
h.minTree = null
h.trees = null
for var t in a if t ≠ null {
h.trees = append(h.trees, t)
if h.minTree == null or t.key < h.minTree.key then h.minTree = t
}
}

Fibonacci heap pop:


pop (FibHeap _ (Node _ x []) []) = (x, Empty)
pop (FibHeap sz (Node _ x tsm) ts) = (x, FibHeap (sz - 1) tm ts') where
(tm, ts') = extractMin $ consolidate (tsm ++ ts)

Decrease value in Fibonacci heap:


void decrease(FibHeap<K> h, Node<K> x, K k) {
var p = x.parent
x.key = k
if p ≠ null and k < p.key {
cut(h, x)
cascadeCut(h, p)
}
if k < h.minTree.key then h.minTree = x
}

void cut(FibHeap<K> h, Node<K> x) {


var p = x.parent
p.subTrees = remove(p.subTrees, x)
p.rank = p.rank - 1
h.trees = append(h.trees, x)
x.parent = null
x.mark = false
}

void cascadeCut(FibHeap<K> h, Node<K> x) {


var p = x.parent
if p == null then return
if x.mark {
cut(h, x)
cascadeCut(h, p)
} else {
x.mark = true
}
}
Chapter 11

Queue

Queue supports first-in, first-out (FIFO). There are many ways to implement queue, e.g., with a linked-list, a doubly linked list, a circular buffer, etc. Okasaki gives 16 different implementations in [3] . A queue satisfies the following two requirements:

1. Add a new element to the tail in constant time;


2. Access or remove an element from head in constant time.

It’s easy to realize a queue with a doubly linked-list. We skip this implementation, and focus on other basic data structures, like the (singly) linked-list or array.

11.1 Linked-list queue


We can insert or remove an element at the head of a linked-list in constant time. However, to support FIFO, we have to do one operation at the head and the other at the tail. It takes O(n) time to traverse to the tail, where n is the length. To achieve the constant time performance goal, we need an additional variable to record the tail position, and a sentinel node S to simplify the empty queue handling, as shown in fig. 11.1.
data Node<K> {
K key
Node<K> next
}

data Queue<K> {
Node<K> head, tail
}

Figure 11.1: Both head and tail point to S for empty queue.

Define ‘enqueue’ (also called push, snoc, append, or push back) and ‘dequeue’ (also called pop, or pop front) to add and remove an element respectively. When implementing the queue with this list, we enqueue at the tail, and dequeue from the head.


1: function Enqueue(Q, x)
2: p ← Node(x)
3: Next(p) ← NIL
4: Next(Tail(Q)) ← p
5: Tail(Q) ← p
As there is at least the S node even for the empty queue, we needn't check whether the tail is NIL.
1: function Dequeue(Q)
2: x ← Head(Q)
3: Next(Head(Q)) ← Next(x)
4: if x = Tail(Q) then . Q is empty
5: Tail(Q) ← Head(Q)
6: return Key(x)
As the S node is ahead of all other nodes, Head actually returns the node next to S, as shown in fig. 11.2. It's easy to extend this implementation to a concurrent environment with two locks on the head and tail respectively. The S node helps to prevent dead-lock when the queue is empty [59] [60] .

Figure 11.2: List with S node.

11.2 Circular buffer


Symmetrically, we can append an element to the tail of an array in constant time, but it takes O(n) time to remove from the head, because we need to shift all elements one cell ahead. The idea of the circular buffer is to reuse the free cells before the first valid element after removing from the head, as shown in figs. 11.3 and 11.4. We can define the queue with the head index, the length count, and the size of the array. It's empty when count = 0, and full when count = size; we can also simplify the enqueue/dequeue implementation with the modular operation.
Figure 11.3: Circular buffer.

Figure 11.4: Circular buffer queue. (a) Enqueue some elements. (b) Free cells after dequeue. (c) Enqueue more elements up to the boundary. (d) Enqueue the next element at the first cell. (e) All cells are occupied: full.

1: function Enqueue(Q, x)
2: if not Full(Q) then
3: tail ← (Head(Q) + Count(Q)) mod Size(Q)
4: Buf(Q)[tail] ← x
5: Count(Q) ← Count(Q) + 1
1: function Dequeue(Q)
2: x ← NIL
3: if not Empty(Q) then
4: h ← Head(Q)
5: x ← Buf(Q)[h]
6: Head(Q) ← (h + 1) mod Size(Q)
7: Count(Q) ← Count(Q) - 1
8: return x

Exercise 11.1
11.1.1. The circular buffer is allocated with a predefined size. We can use two references,
head and tail instead. How to determine if a circular buffer queue is full or empty?
(the head can be either ahead of tail or behind it.)

11.3 Paired-list queue


We can access the head of a list in constant time, but it takes linear time to access the tail. We connect two lists ‘tail to tail’ to implement the queue, as shown in fig. 11.5. Define the queue as (f, r), where f is the front list, and r is the rear list. The empty queue is ([ ], [ ]). We push new elements to the head of r, and pop from the head of f. Both are constant time.
push x (f, r) = (f, x:r)        (11.1)
pop (x:f, r) = (f, r)

f may become empty after a series of pops, while r still contains elements. To continue

Figure 11.5: Paired-list queue.

pop, we reverse r to replace f, i.e., ([ ], r) ↦ (reverse r, [ ]). We check and balance in every push/pop:

balance [ ] r = (reverse r, [ ])        (11.2)
balance f r = (f, r)

Although it takes linear time to reverse r, the amortized performance is constant time. Update push/pop as below:

push x (f, r) = balance f (x:r)        (11.3)
pop (x:f, r) = balance f r
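Below is a minimal Haskell sketch of eqs. (11.1) to (11.3); the helper front is an assumption for illustration, and error handling for the empty queue is omitted:

type Queue a = ([a], [a])   -- (front, rear)

balance :: [a] -> [a] -> Queue a
balance [] r = (reverse r, [])
balance f r  = (f, r)

push :: a -> Queue a -> Queue a
push x (f, r) = balance f (x:r)

pop :: Queue a -> Queue a
pop (_:f, r) = balance f r

front :: Queue a -> a
front (x:_, _) = x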

There is a symmetric implementation with a pair of arrays (table 11.1). Connect two
arrays head to head, as shown in fig. 11.6. When R becomes empty, reverse array F to
replace R.

operation array list


insert to head O(n) O(1)
append to tail O(1) O(n)
remove from head O(n) O(1)
remove from tail O(1) O(n)

Table 11.1: array and list

Figure 11.6: Paired-array queue.

Exercise 11.2
11.2.1. Why need balance check and adjustment after push?
11.2.2. Do the amortized analysis for the paired-list queue.
11.2.3. Implement the paired-array queue.

11.4 Balance Queue


Although the paired-list queue performs in amortized constant time, it is linear time in the worst case. For example, suppose there is one element in f, then we repeatedly push n elements. Now it takes O(n) time to pop: f and r are unbalanced in this case. To solve it, we add another rule: the length of r is not greater than that of f; otherwise we reverse.

|r| ≤ |f | (11.4)

We check the lengths in every push/pop; however, it takes linear time to compute the length. Instead, we record the length in a variable, and update it during push/pop. Denote the paired-list queue as (f, n, r, m), where n = |f| and m = |r|. From the balance rule eq. (11.4), we only need to check the length of f to test whether a queue is empty:

Q = ∅ ⇐⇒ n = 0        (11.5)

Change the definition of push/pop to:

push x (f, n, r, m) = balance (f, n, x:r, m + 1)        (11.6)
pop (x:f, n, r, m) = balance (f, n − 1, r, m)

Where:
balance (f, n, r, m) = m ≤ n :       (f, n, r, m)        (11.7)
                       otherwise :   (f ++ reverse r, m + n, [ ], 0)
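A Haskell sketch of the balanced queue, eqs. (11.6) and (11.7); caching the lengths makes the balance check constant time:

data BQueue a = BQueue [a] Int [a] Int   -- f, n = |f|, r, m = |r|

push :: a -> BQueue a -> BQueue a
push x (BQueue f n r m) = balance f n (x:r) (m + 1)

pop :: BQueue a -> BQueue a
pop (BQueue (_:f) n r m) = balance f (n - 1) r m

balance :: [a] -> Int -> [a] -> Int -> BQueue a
balance f n r m
  | m <= n    = BQueue f n r m
  | otherwise = BQueue (f ++ reverse r) (n + m) [] 0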

11.5 Real-time queue


It still takes linear time to reverse and concatenate lists in the balanced queue. The real-time queue needs to guarantee constant time in every push/pop operation. The performance bottleneck happens in f ++ reverse r. At this time m > n, which breaks the balance rule. Since m, n are integers, we know m = n + 1. ++ takes O(n) time, and reverse takes O(m) time. The total time is bound to O(n + m), which is proportional to the number of elements. Our solution is to distribute the computation across multiple push and pop operations. Revisit the tail recursive [61] [62] reverse (in Curried form):

reverse = reverse′ [ ]        (11.8)

where:

reverse′ a [ ] = a        (11.9)
reverse′ a (x:xs) = reverse′ (x:a) xs

We turn the tail recursive implementation into a stepped computation, modeled as a series of state transformations. Define a state machine with two states: the reverse state Sr, and the complete state Sf. We slow down the reverse computation as below:

step Sr a [ ] = (Sf, a)        (11.10)
step Sr a (x:xs) = (Sr, x:a, xs)
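Encoded directly in Haskell, eq. (11.10) becomes the sketch below, bundling each state tag with its payload:

data State a = Sr [a] [a]   -- reversing: accumulator, remaining
             | Sf [a]       -- complete: the reversed result

step :: State a -> State a
step (Sr a [])     = Sf a
step (Sr a (x:xs)) = Sr (x:a) xs
step s             = s

For example, iterating step on Sr [ ] "hello" reaches Sf "olleh" after 6 steps.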

In each step, we check and transform the state. Sr means the reverse is ongoing. If there is no remaining element to reverse, we change the state to Sf (done); otherwise, we link the head x ahead of a. Every step terminates instead of running recursively. The new state with the intermediate reverse result is the input to the next step. For example:

step Sr [ ] “hello” = (Sr, “h”, “ello”)
step Sr “h” “ello” = (Sr, “eh”, “llo”)
...
step Sr “lleh” “o” = (Sr, “olleh”, [ ])
step Sr “olleh” [ ] = (Sf, “olleh”)

However, this only solves half of the problem. We also need to slow down the ++ computation, which is more complex. We use a state machine again. To concatenate xs ++ ys, we first reverse xs to ←xs (writing ←xs for reverse xs), then pick elements from ←xs one by one, and link each ahead of ys. The idea is similar to reverse′:

xs ++ ys = (reverse (reverse xs)) ++ ys
         = (reverse′ [ ] ←xs) ++ ys        (11.11)
         = reverse′ ys ←xs
We need to add another state: after reversing r, we concatenate from ←f step by step. The three states are: Sr for reverse, Sc for concatenate, and Sf for completion. The two phases are:

1. Stepped reverse f to ←f, and r to ←r, in parallel;

2. Stepped take elements from ←f, and link each ahead of ←r.

next (Sr, f′, x:f, r′, y:r) = (Sr, x:f′, f, y:r′, r)    reverse f, r
next (Sr, f′, [ ], r′, [y]) = next (Sc, y:r′, f′)       reverse done, start concatenation
next (Sc, a, [ ]) = (Sf, a)                             done
next (Sc, a, x:f′) = (Sc, x:a, f′)                      concatenation
        (11.12)
We need to arrange these steps among the push/pop operations. From the balance rule, when m = n + 1, f ++ reverse r is triggered. It takes n + 1 steps to reverse r; within these steps, we reverse f in parallel. After that, it takes another n + 1 steps to concatenate, 2n + 2 steps in total. The critical question is: before the 2n + 2 steps complete, can the queue become unbalanced again due to a series of push/pop operations?
Luckily, repeated pushing won't break the balance rule again before we complete f ++ reverse r. We will obtain the new front list f′ = f ++ reverse r after 2n + 2 steps, while the earliest time to break the balance rule again is when:

|r′| = |f′| + 1
     = |f| + |r| + 1        (11.13)
     = 2n + 2

Thanks to the balance rule, even pushing as much as possible, the 2n + 2 steps are guaranteed to complete before the next time the queue becomes unbalanced, hence the new f will be ready. We can then safely start to compute f′ ++ reverse r′. However, pop may happen before the completion of the 2n + 2 steps. We face the situation of needing to extract an element from f, while the new front list f′ = f ++ reverse r isn't ready yet. To solve this issue, we keep a copy of f when reversing it. We are then safe even if pop happens n times. Table 11.2 shows the queue during phase 1 (reverse f and r in parallel)¹.
(reverse f and r in parallel)1 .
The copy of f is exhausted after n pops. Then we are about to start the stepped concatenation. What if pop happens at that time? Since f is exhausted, it becomes [ ]; we needn't concatenate anymore, because f ++ ←r = [ ] ++ ←r = ←r. In fact,
¹Although it takes linear time to duplicate a list, the one-shot copying won't happen at all. We actually duplicate the reference to the front list, and delay the element-level copying to each step.

f copy: {fi, fi+1, ..., fn} (the first i − 1 elements are popped out)
on-going part: (Sr, ←f, ..., ←r, ...) (the intermediate ←f and ←r)
new r: {...} (the newly pushed elements)

Table 11.2: Before completion of the first n steps.

we only need to concatenate the elements in f that haven't been popped. Because we pop from the head of f, we use a counter to record the remaining elements in f, initialized as 0. We apply +1 every time we reverse an element, meaning we need to concatenate it in the future; whenever pop happens, we apply −1, meaning we needn't concatenate that one anymore. We also decrease the counter during concatenation, and cancel the process when it reaches 0. Below is the updated state transformation:

next (Sr, n, f′, x:f, r′, y:r) = (Sr, n + 1, x:f′, f, y:r′, r)    reverse f, r
next (Sr, n, f′, [ ], r′, [y]) = next (Sc, n, y:r′, f′)           reverse done, start concatenation
next (Sc, 0, a, f) = (Sf, a)                                      done
next (Sc, n, a, x:f′) = (Sc, n − 1, x:a, f′)                      concatenation
next S0 = S0                                                      idle
        (11.14)
We define an additional idle state S0 to simplify the logic. The queue contains 3 parts: the front list f with its length n, the state S of the on-going f ++ reverse r, and the rear list r with its length m, denoted as (f, n, S, r, m). The empty queue is ([ ], 0, S0, [ ], 0). The queue is empty when n = 0 according to the balance rule. Update push/pop as:
push x (f, n, S, r, m) = balance f n S (x:r) (m + 1)        (11.15)
pop (x:f, n, S, r, m) = balance f (n − 1) (abort S) r m

Where abort decreases the counter to cancel an element for concatenation (defined later). balance triggers the stepped f ++ reverse r if the queue is unbalanced; otherwise it runs a step:

balance f n S r m = m ≤ n :       step f n S r m        (11.16)
                    otherwise :   step f (n + m) (next (Sr, 0, [ ], f, [ ], r)) [ ] 0
Where step transforms the state machine, ending with the idle state S0 upon completion:

step f n S r m = queue (next S)        (11.17)

Where:

queue (Sf, f′) = (f′, n, S0, r, m)    replace f with f′        (11.18)
queue S′ = (f, n, S′, r, m)

Finally, define abort to cancel an element:

abort (Sc, 0, x:a, f′) = (Sf, a)
abort (Sc, n, a, f′) = (Sc, n − 1, a, f′)        (11.19)
abort (Sr, n, f′, f, r′, r) = (Sr, n − 1, f′, f, r′, r)
abort S = S

Exercise 11.3

11.3.1. Why do we need to rollback an element (we cancel the previous ‘cons’, remove x, and return a as the result) when n = 0 in abort?
11.3.2. Implement the real-time queue with paired arrays. We can't copy the array when the rotation starts, or the performance will downgrade to linear time. Implement a ‘lazy’ copy, i.e., copy one element per step.

11.6 Lazy real-time queue


The key to realizing the real-time queue is to break down the expensive f ++ reverse r. We can simplify it with lazy evaluation. Assume function rotate computes f ++ reverse r in steps, i.e., the two sides below are equivalent, with an accumulator a:

rotate xs ys a = xs ++ (reverse ys) ++ a        (11.20)

Initialize xs as the front list f, ys as the rear list r, and the accumulator a as empty [ ]. We implement rotate from the edge case:

rotate [ ] [y] a = y:a (11.21)

The recursive case is:

rotate (x:xs) (y:ys) a
  = (x:xs) ++ (reverse (y:ys)) ++ a          from eq. (11.20)
  = x : (xs ++ reverse (y:ys) ++ a)          concatenation is associative        (11.22)
  = x : (xs ++ reverse ys ++ (y:a))          reverse property, and associativity
  = x : rotate xs ys (y:a)                   reverse of eq. (11.20)

Summarize together:

rotate [ ] [y] a = y:a        (11.23)
rotate (x:xs) (y:ys) a = x : rotate xs ys (y:a)

In a lazy evaluation setting, (:) is delayed to push/pop, hence the rotate is broken down. We change the paired-list queue definition to (f, r, rot), where rot is the on-going f ++ reverse r computation. It is initialized empty [ ].
push x (f, r, rot) = balance f (x:r) rot        (11.24)
pop (x:f, r, rot) = balance f r rot

Every time, balance advances the rotation one step, and starts another round when it completes:

balance f r [ ] = (f′, [ ], f′), where f′ = rotate f r [ ]        (11.25)
balance f r (x:rot) = (f, r, rot)    advance the rotation

Exercise 11.4
Implement the bidirectional queue (deque), supporting add/remove of elements on both head and tail in constant time.

11.7 Appendix - example programs


List implemented queue:
Queue<K> enQ(Queue<K> q, K x) {
var p = Node(x)
p.next = null
q.tail.next = p
q.tail = p
return q
}

K deQ(Queue<K> q) {
var p = q.head.next //the next of S
q.head.next = p.next
if q.tail == p then q.tail = q.head //empty
return p.key
}

Circular buffer queue:


data Queue<K> {
[K] buf
int head, cnt, size

Queue(int max) {
buf = Array<K>(max)
size = max
head = cnt = 0
}
}

Enqueue, dequeue implementation for circular buffer queue:


N offset(N i, N size) = if i < size then i else i - size

void enQ(Queue<K> q, K x) {
if q.cnt < q.size {
q.buf[offset(q.head + q.cnt, q.size)] = x;
q.cnt = q.cnt + 1
}
}

K head(Queue<K> q) = if q.cnt == 0 then null else q.buf[q.head]

K deQ(Queue<K> q) {
K x = null
if q.cnt > 0 {
x = head(q)
q.head = offset(q.head + 1, q.size);
q.cnt = q.cnt -1
}
return x
}

Real-time queue:
data State a = Empty
| Reverse Int [a] [a] [a] [a] −− n, acc f, f, acc r, r
| Concat Int [a] [a] −− n, acc, reversed f
| Done [a] −− f’ = f ++ reverse r

−− f, n = length f, state, r, m = length r


data RealtimeQueue a = RTQ [a] Int (State a) [a] Int

push x (RTQ f n s r m) = balance f n s (x:r) (m + 1)

pop (RTQ (_:f) n s r m) = balance f (n - 1) (abort s) r m

top (RTQ (x:_) _ _ _ _) = x

balance f n s r m
| m ≤ n = step f n s r m
| otherwise = step f (m + n) (next (Reverse 0 [] f [] r)) [] 0

step f n s r m = queue (next s) where


queue (Done f') = RTQ f' n Empty r m
queue s' = RTQ f n s' r m

next (Reverse n f' (x:f) r' (y:r)) = Reverse (n + 1) (x:f') f (y:r') r


next (Reverse n f' [] r' [y]) = next $ Concat n (y:r') f'
next (Concat 0 acc _) = Done acc
next (Concat n acc (x:f')) = Concat (n-1) (x:acc) f'
next s = s

abort (Concat 0 (_:acc) _) = Done acc −− rollback 1 elem


abort (Concat n acc f') = Concat (n - 1) acc f'
abort (Reverse n f' f r' r) = Reverse (n - 1) f' f r' r
abort s = s

Lazy real-time queue:


data LazyRTQueue a = LQ [a] [a] [a] −− front, rear, f ++ reverse r

empty = LQ [] [] []

push (LQ f r rot) x = balance f (x:r) rot

pop (LQ (_:f) r rot) = balance f r rot

top (LQ (x:_) _ _) = x

balance f r [] = let f' = rotate f r [] in LQ f' [] f'


balance f r (_:rot) = LQ f r rot

rotate [] [y] acc = y:acc


rotate (x:xs) (y:ys) acc = x : rotate xs ys (y:acc)
Chapter 12

Sequence

Sequence is the combination of array and list. We set the following goals for the ideal
sequence:

1. Add and remove elements at head and tail in constant time;

2. Fast (no slower than linear time) concatenation of two sequences;

3. Fast access and update of the element at any position;

4. Fast split at any position.

Array and list only partially satisfy these goals, as shown in the table below, where n is the length of the sequence, and n1, n2 are the lengths when there are two.

operation array list


add/remove on head O(n) O(1)
add/remove on tail O(1) O(n)
concatenate O(n2 ) O(n1 )
random access at i O(1) O(i)
remove at i O(n − i) O(i)

We give three implementations: binary random access list, concatenate-able list, and
finger tree.

12.1 Binary random access list


The binary random access list is a set of full binary trees (a forest). The elements are stored in leaves. For any integer n ≥ 0, we know how many trees are needed to hold n elements from its binary representation. Every bit of 1 represents a binary tree, whose size is determined by the magnitude of the bit. For any index 1 ≤ i ≤ n, we can locate the binary tree that stores the i-th element. As shown in fig. 12.1, trees t1, t2 represent the sequence [x1, x2, x3, x4, x5, x6]. Denote the full binary tree of depth i + 1 as ti; t0 only has a leaf node, and there are 2^i leaves in ti. For a sequence of n elements, represent n in binary as n = (em em−1 ... e1 e0)2, where ei is either 1 or 0.

n = 20 e0 + 21 e1 + ... + 2m em (12.1)


Figure 12.1: A sequence of 6 elements.

If ei ≠ 0, there is a full binary tree ti of size 2^i. For example in fig. 12.1, the length of the sequence is 6 = (110)2: the lowest bit is 0, so there's no tree of size 1; the 2nd bit is 1, so there is t1 of size 2; the highest bit is 1, so there is t2 of size 4. In this way, we represent the sequence [x1, x2, ..., xn] as a list of trees, each of unique size, in ascending order.
We call it binary random access list [3] . Customize the binary tree definition: (1) only
store the element in leaf node as (x); (2) augment the size in each branch node as (s, l, r),
where s is the size of the tree, l, r are left and right sub-trees respectively. We get the
size as below:
size (x) = 1
(12.2)
size (s, l, r) = s

To add a new element y before sequence S, we create a singleton t0 tree t0 = (y), then
insert it to the forest. insert y S = insertT (y) S, or in Curried form:

insert y = insertT (y) (12.3)

Compare t0 with the first tree ti in the forest: if ti is bigger, put t0 ahead of the forest (in constant time); if they have the same size, link them into a bigger tree (in constant time): t′i+1 = (2s, ti, t0), then recursively insert t′i+1 into the forest, as shown in fig. 12.2.

insertT t [ ] = [t]
insertT t (t1:ts) = size t < size t1 :   t : t1 : ts        (12.4)
                    otherwise :          insertT (link t t1) ts

Where: link t1 t2 = (size t1 + size t2 , t1 , t2 ).


For n elements, there are m = O(lg n) trees in the forest. The insert performance is bound to O(lg n) time. We'll prove that the amortized performance is constant time. We define remove inversely: if the first tree is a t0 (singleton leaf), remove it; otherwise, recursively split the first tree to obtain a t0 and remove it, as shown in fig. 12.3.

extract ((x):ts) = (x, ts)        (12.5)
extract ((s, t1, t2):ts) = extract (t1:t2:ts)

We call extract to remove element from head:

head = fst ◦ extract        (12.6)
tail = snd ◦ extract

Where fst (a, b) = a and snd (a, b) = b access the components of a pair.


The trees divide elements into chunks. For a given index 1 ≤ i ≤ n, first locate the
corresponding tree, then lookup the tree to access the element.

Figure 12.2: Insert x1 , x2 , ..., x6 . (a) Insert x1 , (b) Insert x2 , link to [t1 ]. (c) Insert x3 ,
result [t0 , t1 ]. (d) Insert x4 , link twice, generate [t2 ]. (e) Insert x5 , result [t0 , t2 ]. (f) Insert
x6 , result [t1 , t2 ].

Figure 12.3: Remove: (a) x1 , x2 , ..., x5 as [t0 , t2 ]. (b) Remove x5 (t0 ) directly. (c) Remove
x4 . Split twice to get [t0 , t0 , t1 ], then remove the head to get [t0 , t1 ].

1. For the first tree t in the forest, if i ≤ size(t), then the element is in t; we next lookup t for the target element;

2. Otherwise, let i′ = i − size(t), then recursively lookup the i′-th element in the rest of the trees.
(t:ts)[i] = i ≤ size t :   lookupT i t        (12.7)
            otherwise :    ts[i − size t]

Where lookupT applies binary search: if i = 1, it returns the root; otherwise it divides the tree and recursively looks up:

lookupT 1 (x) = x
lookupT i (s, t1, t2) = i ≤ ⌊s/2⌋ :   lookupT i t1        (12.8)
                        otherwise :   lookupT (i − ⌊s/2⌋) t2

Figure 12.4 gives the steps to lookup the 4-th element in a sequence of length 6. The size of the first tree is 2 < 4, so we move to the next tree and update the index to i′ = 4 − 2. The size of the second tree is 4 > i′ = 2, so we lookup it. Because the index 2 is not greater than the half size 4/2 = 2, we lookup the left sub-tree, then the right, and finally locate the element. Similarly, we can alter an element at a given position.

Figure 12.4: Steps to access S[4]: (a) S[4], 4 > size(t1) = 2; (b) S′[4 − 2] ⇒ lookupT 2 t2; (c) 2 ≤ ⌊size(t2)/2⌋ ⇒ lookupT 2 left(t2); (d) lookupT 1 right(left(t2)), return x3.

There are O(lg n) full binary trees to hold n elements. For index i, we need at most O(lg n) time to locate the tree; the next lookup time is proportional to the height, which is O(lg n) at most. The overall random access time is bound to O(lg n).

Exercise 12.1

12.1.1. How to handle the out of bound exception?



12.2 Numeric representation


The binary form of n = 2^0 e0 + 2^1 e1 + ... + 2^m em maps to the forest: ei is the i-th bit, and if ei = 1, there is a full binary tree of size 2^i. Adding an element corresponds to +1 on a binary number, while deleting corresponds to −1. We call such correspondence the numeric representation [3] . To express it explicitly, we define two states: Zero means non-existence of the binary tree, while One t means the tree t exists. As such, we represent the forest as a list of binary states, and implement insert as binary addition.
add t [ ] = [One t]
add t (Zero:ds) = (One t):ds        (12.9)
add t ((One t′):ds) = Zero : add (link t t′) ds
When adding tree t, if the forest is empty, we create a state One t; it's the only bit, corresponding to 0 + 1 = 1. If the forest isn't empty and the first bit is Zero, we use the state One t to replace Zero, corresponding to the binary add (...digits...0)2 + 1 = (...digits...1)2, e.g., 6 + 1 = (110)2 + 1 = (111)2 = 7. If the first bit is One t′, we assume t and t′ have the same size, because we always start inserting from a singleton leaf t0 = (x); the tree sizes increase as 1, 2, 4, ..., 2^i, .... We link t and t′, recursively insert the result to the rest bits, and replace the original One t′ with Zero. It corresponds to the binary add (...digits...1)2 + 1 = (...digits′...0)2, e.g., 7 + 1 = (111)2 + 1 = (1000)2 = 8.
Symmetrically, we implement remove as binary subtraction. If the sequence is a singleton bit One t, it becomes empty after remove, corresponding to 1 − 1 = 0. If there are multiple bits and the first one is One t, we replace it by Zero, corresponding to (...digits...1)2 − 1 = (...digits...0)2, e.g., 7 − 1 = (111)2 − 1 = (110)2 = 6. If the first bit is Zero, we need to borrow: we recursively extract a tree from the rest bits, split it into two trees t1, t2, replace Zero with One t2, and remove t1. It corresponds to (...digits...0)2 − 1 = (...digits′...1)2, e.g., 4 − 1 = (100)2 − 1 = (11)2 = 3.
minus [One t] = (t, [ ])
minus ((One t):ts) = (t, Zero:ts)        (12.10)
minus (Zero:ts) = (t1, (One t2):ts′), where ((s, t1, t2), ts′) = minus ts
The numeric representation doesn't change the performance. We next evaluate the amortized time by aggregation. The steps to insert n = 2^m elements into an empty forest are given in table 12.1:
i : binary (MSB ... LSB)
0 : 0, 0, ..., 0, 0
1 : 0, 0, ..., 0, 1
2 : 0, 0, ..., 1, 0
3 : 0, 0, ..., 1, 1
... : ...
2^m − 1 : 1, 1, ..., 1, 1
2^m : 1, 0, 0, ..., 0, 0
bits changed : 1, 1, 2, ..., 2^(m−1), 2^m

Table 12.1: Insert 2m elements.

The LSB changes every time we insert, 2^m times in total. The second bit changes every other time (when linking trees), 2^(m−1) times in total. The second highest bit only changes once, linking all trees into the final one. The highest bit changes to 1 after inserting the last element. Summing up: T = 1 + 1 + 2 + 4 + ... + 2^(m−1) + 2^m = 2^(m+1). Hence the amortized performance is:

O(T/n) = O(2^(m+1)/2^m) = O(1)        (12.11)

This proves the amortized constant time performance.

Exercise 12.2
12.2.1. Implement the random access for numeric representation S[i], 1 ≤ i ≤ n, where n
is the length of the sequence.
12.2.2. Analyze the amortized performance of delete.
12.2.3. We can represent the full binary tree with an array of length 2^m, where m is a non-negative integer. Implement the binary tree forest, insert, and random access.

12.3 Paired-array sequence


Expand the paired-array queue (section 11.3 in chapter 11) to the paired-array sequence. As shown in fig. 12.5, link two arrays head to head. When adding an element on the left, we append to the tail of f; when adding on the right, we append to the tail of r. Denote the sequence as a pair S = (f, r); Front(S) = f and Rear(S) = r access them respectively. Implement insert/append as below:

Figure 12.5: Paired-array sequence.

1: function Insert(x, S)
2: Append(x, Front(S))
3: function Append(x, S)
4: Append(x, Rear(S))
When accessing the i-th element, we first determine which array i indexes into: f or r. If i ≤ |f|, the element is in f; because f and r are connected head to head, we index from the right of f, at position |f| − i + 1. If i > |f|, the element is in r; we index from the left at position i − |f|.
1: function Get(i, S)
2: f, r ← Front(S), Rear(S)
3: n ← Size(f )
4: if i ≤ n then
5: return f [n − i + 1] . reversed
6: else
7: return r[i − n]
Removing can make f or r empty ([ ]) while the other one is not. To re-balance, we halve the non-empty one, and reverse one half to form the new pair. As f and r are symmetric, we can swap them, call Balance, then swap back.
1: function Balance(S)
2: f ← Front(S), r ← Rear(S)
3: n ← Size(f ), m ← Size(r)
4: if f = [ ] then

5: k ← ⌊m/2⌋
6: return (Reverse(r[1...k]), r[(k + 1)...m])
7: if r = [ ] then
8: k ← ⌊n/2⌋
9: return (f[(k + 1)...n], Reverse(f[1...k]))
10: return (f, r)
Every time we delete, we check f, r and balance them:
1: function Remove-Head(S)
2: Balance(S)
3: f, r ← Front(S), Rear(S)
4: if f = [ ] then . S = ([ ], [x])
5: r←[]
6: else
7: Remove-Last(f )

8: function Remove-Tail(S)
9: Balance(S)
10: f, r ← Front(S), Rear(S)
11: if r = [ ] then . S = ([x], [ ])
12: f ←[]
13: else
14: Remove-Last(r)
Due to reverse, the performance is O(n) in the worst case, where n is the number of
elements, while it is amortized constant time.

Exercise 12.3
12.3.1. Analyze the amortized performance for paired-array delete.

12.4 Concatenate-able list


We achieve O(lg n) time insert, delete, and random index with the binary tree forest. However, it's not easy to concatenate two sequences: we can't merely merge the trees, but need to link trees of the same size. Figure 12.6 shows an implementation of the concatenate-able list. The first element x1 is in the root; the rest are organized in smaller sequences, each one a sub-tree. These sub-trees are put in a real-time queue (section 11.5 in chapter 11). We denote the sequence as (x1, Qx) = [x1, x2, ..., xn]. To concatenate with another sequence (y1, Qy) = [y1, y2, ..., ym], we push it to Qx. The real-time queue guarantees the enqueue in constant time, hence the concatenation performance is constant time.

s ++ ∅ = s
∅ ++ s = s        (12.12)
(x, Q) ++ s = (x, push s Q)

When inserting a new element z, we create a singleton (z, ∅), then concatenate it to the sequence:

insert z s = (z, ∅) ++ s        (12.13)
append z s = s ++ (z, ∅)

Figure 12.6: Concatenate-able list: (a) (x1, Qx) = [x1, x2, ..., xn]; (b) concatenate with (y1, Qy) = [y1, y2, ..., ym]: add cn+1 to Qx.

When deleting x1 from the head, we lose the root. The remaining sub-trees are all concatenate-able lists; we concatenate them together into a new sequence.

concat ∅ = ∅        (12.14)
concat Q = (top Q) ++ concat (pop Q)

The real-time queue holds the sub-trees. We pop the first one, c1, recursively concatenate the rest into s, then concatenate c1 and s. We define delete from head with concat.

tail (x, Q) = concat Q (12.15)

Function concat traverses the queue and reduces it to a result; it essentially folds on Q [10] .

fold f z ∅ = z        (12.16)
fold f z Q = f (top Q) (fold f z (pop Q))

Where f is a binary function and z is the zero unit. Here are examples of folding on the queue Q = {1, 2, ..., 5}:

fold (+) 0 Q = 1 + (2 + (3 + (4 + (5 + 0)))) = 15
fold (×) 1 Q = 1 × (2 × (3 × (4 × (5 × 1)))) = 120
fold (×) 0 Q = 1 × (2 × (3 × (4 × (5 × 0)))) = 0

We can define concat with fold (Curried form):

concat = fold (++) ∅        (12.17)

Normally, add, append, and delete happen randomly. The performance is bound to linear time in the worst case: delete after repeatedly adding n elements. All n − 1 sub-trees are singletons, and concat takes O(n) time to consolidate. Nevertheless, the amortized performance is constant time.

12.5 Finger tree


The binary random access list supports insert and remove at the head in amortized constant time, and index in logarithmic time, but we can't easily append an element to the tail, or concatenate fast. With the concatenate-able list, we can concatenate, insert, and append in amortized constant time, but can't easily index an element. From these two examples, we need: (1) fast access to head and tail for insert and delete; (2) a recursive structure, e.g., a tree, to realize random access as divide and conquer search. Finger tree [66] implements the sequence with these two ideas [65] . It's critical to keep the tree balanced to guarantee the search performance. Finger tree leverages the 2-3 tree (a type of B-tree). A 2-3 tree consists of 2 or 3 sub-trees, as (t1, t2) or (t1, t2, t3).
data Node a = Br2 a a | Br3 a a a

We define a finger tree as one of below three:

1. empty ∅;
2. a singleton leaf (x);
3. a tree with three parts: a sub-tree, and the left and right fingers, denoted as (f, t, r). Each finger is a list of up to 3 elements¹.
1 f: front, r: rear

data Tree a = Empty


| Lf a
| Tr [a] (Tree (Node a)) [a]

12.5.1 Insert

Figure 12.7: Finger tree example.

As shown in fig. 12.7: (1) is ∅; (2) is a singleton; (3) has elements in both f and r; (4) f will exceed the 2-3 tree limit when we add more, so we re-balance as in (5): two elements stay in f, and the middle is a singleton of a 2-3 tree. These examples are listed below:

∅ Empty
(a) Lf a
([b], ∅, [a]) Tr [b] Empty [a]
([d, c, b], ∅, [a]) Tr [d, c, b] Empty [a]
([f, e], (d, c, b), [a]) Tr [f, e] Lf (Br3 d c b) [a]

In (5), the middle component is a singleton leaf. The finger tree is recursive: apart from f and r, the middle is a deeper finger tree of type Tree (Node a); one more wrap, one level deeper. Summarizing the above examples, we define inserting a into tree T as below:

1. If T = ∅, the result is a singleton (a);

2. If T = (b) is a leaf, the result is ([a], ∅, [b]);

3. For T = (f, t, r), if there are < 3 elements in f, we insert a into f; otherwise (≥ 3), extract the last 3 elements of f into a 2-3 tree t′, recursively insert t′ into t, then insert a into f.

insert a ∅ = (a)
insert a (b) = ([a], ∅, [b])        (12.18)
insert a ([b, c, d], t, r) = ([a], insert (b, c, d) t, r)
insert a (f, t, r) = (a:f, t, r)
The insert performance is constant time except for the recursive case. The recursion time is proportional to the height of the tree h. Because of the 2-3 trees, the tree is balanced, hence h = O(lg n), where n is the number of elements. When distributing the recursion over the other cases, the amortized performance is constant time [3] [65] . We can repeatedly insert a list of elements by folding:

xs ≫ t = foldr insert t xs        (12.19)

Exercise 12.4
12.4.1. Eliminate recursion, implement insert with loop.

12.5.2 Extract
We implement extract as the reverse of insert.
extract (a) = (a, ∅)
extract ([a], ∅, [b]) = (a, (b))
extract ([a], ∅, b:bs) = (a, ([b], ∅, bs))        (12.20)
extract ([a], t, r) = (a, (toList f, t′, r)), where (f, t′) = extract t
extract (a:as, t, r) = (a, (as, t, r))
Where toList flattens a 2-3 tree to list:
toList (a, b) = [a, b]
(12.21)
toList (a, b, c) = [a, b, c]
We skip error handling (e.g., extract from an empty tree). If the tree is a singleton leaf, the result is empty; if there are two elements, the result is a singleton; if f is a singleton list, the middle is empty, and r isn't empty, we extract the only element in f, then borrow one from r to f; if the middle isn't empty, we recursively extract a node from the middle, and flatten that node to a list to replace f (the original one was extracted). If f has more than one element, we extract the first. Figure 12.8 gives examples that extract 2 elements. We define head, tail with extract.
head = fst ◦ extract        (12.22)
tail = snd ◦ extract

Exercise 12.5
12.5.1. Eliminate recursion, implement extract in loops.

12.5.3 Append and remove


We implement append, remove on right symmetrically.
append ∅ a = (a)
append (a) b = ([a], ∅, [b])        (12.23)
append (f, t, [a, b, c]) d = (f, append t (a, b, c), [d])
append (f, t, r) a = (f, t, r ++ [a])

Figure 12.8: Extract: (a) a sequence of 10 elements; (b) extract one, f becomes a singleton list; (c) extract another, borrow an element from the middle, flatten the 2-3 tree to a list as the new f.

If there are fewer than 3 elements in r, we append the new element to the tail of r; otherwise, we extract 3 elements to form a 2-3 tree, and recursively append that tree to the middle. We can repeatedly append a list of elements by folding from the left:

t ≪ xs = foldl append t xs        (12.24)
remove is the reverse of append:

remove (a) = (∅, a)
remove ([a], ∅, [b]) = ((a), b)
remove (f, ∅, [a]) = ((init f, ∅, [last f]), a)        (12.25)
remove (f, t, [a]) = ((f, t′, toList r), a), where (t′, r) = remove t
remove (f, t, r) = ((f, t, init r), last r)
Where last accesses the tail element of a list, init returns the rest (eq. (1.4) in chapter
1).

12.5.4 Concatenate

When concatenating (++) two non-empty finger trees T1 = (f1, t1, r1) and T2 = (f2, t2, r2), we use f1 as the result front f, and r2 as the result rear r, then merge t1, r1, f2, t2 as the middle tree. Because both r1 and f2 are lists of nodes, it is equivalent to the problem below:

merge t1 (r1 ++ f2) t2 = ?
Both t1 and t2 are finger trees one level deeper than T1 and T2. If the type of element in T1 is a, then the type of element in t1 is Node a. We recursively merge: keep the front of t1 and the rear of t2, then further merge the middle of t1, t2 with the rear of t1 and the front of t2.
merge ∅ ts t2 = ts ≫ t2
merge t1 ts ∅ = t1 ≪ ts
merge (a) ts t2 = merge ∅ (a:ts) t2        (12.26)
merge t1 ts (a) = merge t1 (ts ++ [a]) ∅
merge (f1, t1, r1) ts (f2, t2, r2) = (f1, merge t1 (nodes (r1 ++ ts ++ f2)) t2, r2)
Where function nodes collects elements to a list of 2-3 trees. This is because the type
of the element in the middle is deeper than the finger.
nodes [a, b] = [(a, b)]
nodes [a, b, c] = [(a, b, c)]
(12.27)
nodes [a, b, c, d] = [(a, b), (c, d)]
nodes (a:b:c:ts) = (a, b, c):nodes ts
We define finger tree concatenation with merge:
(f1, t1, r1) ++ (f2, t2, r2) = (f1, merge t1 (r1 ++ f2) t2, r2)        (12.28)
Comparing with eq. (12.26), concatenation is essentially merge; let us define them in a unified way:

T1 ++ T2 = merge T1 [ ] T2        (12.29)
The performance is proportional to the number of recursions, which is the smaller height of the two trees. The 2-3 trees are balanced, so the height is O(lg n), where n is the number of elements. In edge cases, merge performs the same as insert (calling insert at most 8 times), in amortized constant time; in the worst case, the performance is O(m), where m is the height difference between the two trees. The overall performance is bound to O(lg n), where n is the total number of elements in the two trees.

12.5.5 Random access


The idea is to turn random access into tree search. To avoid repeatedly computing tree
size, we augment a size variable s to each branch node as (s, f, t, r).
data Tree a = Empty
| Lf a
| Tr Int [a] (Tree (Node a)) [a]

size ∅ = 0
size (x) = size x (12.30)
size (s, f, t, r) = s

Here size (x) is not necessarily 1: x can be a deeper node, like Node a; it is 1 only at level one. For termination, we wrap a level-one x as an element cell (x)e, and define size (x)e = 1 (see the example in the appendix).
x ◁ t = insert (x)e t        (12.31)
t ▷ x = append t (x)e
and:
xs ≫ t = foldr (◁) t xs        (12.32)
t ≪ xs = foldl (▷) t xs

We also need to calculate the size of a 2-3 tree:

size (t1, t2) = size t1 + size t2        (12.33)
size (t1, t2, t3) = size t1 + size t2 + size t3

Given a list of nodes (e.g., a finger at a deeper level), we calculate the size by sum ◦ (map size). We need to update the size when inserting or deleting an element. With the size augmented, we can lookup the tree at any position i. The finger tree (s, f, t, r) has a recursive structure. Let the sizes of the three components be sf, st, sr, where s = sf + st + sr. If i ≤ sf, the location is in f, and we further lookup f; if sf < i ≤ sf + st, the location is in t, and we recursively lookup t; otherwise, we lookup r. We also need to handle the leaf case (x). We use a pair (i, t) to define the position i at data structure t, and define lookupT as below:

lookupT i (x) = (i, x)
lookupT i (s, f, t, r) = i < sf :             lookups i f
                         sf ≤ i < sf + st :   lookupN (lookupT (i − sf) t)        (12.34)
                         otherwise :          lookups (i − sf − st) r

Where sf = sum (map size f) and st = size t are the sizes of the first two components. When looking up location i, if the tree is a leaf (x), the result is (i, x); otherwise we figure out which component among (s, f, t, r) i points to. If it is in f or r, we look up the finger:
lookups i (x:xs) = i < size x :   (i, x)        (12.35)
                   otherwise :    lookups (i − size x) xs

If i is inside some element x (i < size x), we return (i, x); otherwise, we continue looking up the rest of the elements. If i points to the middle t, we recursively look up to obtain a place (i′, m), where m is a 2-3 tree. We next lookup m:
lookupN i (t1, t2) = i < size t1 :   (i, t1)
                     otherwise :     (i − size t1, t2)

lookupN i (t1, t2, t3) = i < size t1 :                       (i, t1)
                         size t1 ≤ i < size t1 + size t2 :   (i − size t1, t2)        (12.36)
                         otherwise :                         (i − size t1 − size t2, t3)
Because we previously wrapped x inside (x)e, we finally need to extract x out:

T[i] = if lookupT i T = (i′, (x)e) :   Just x        (12.37)
       otherwise :                     Nothing

We return a result of type Maybe a = Nothing | Just a, meaning either found, or lookup failed². The random access looks up the finger tree recursively, in time proportional to the tree depth. Because the finger tree is balanced, the performance is bound to O(lg n), where n is the number of elements.
We achieve balanced performance with the finger tree implementation: the operations at head and tail are bound to amortized constant time; concatenation, split, and random access are in logarithmic time [67] . By the end of this chapter, we've seen many elementary data structures. They are useful to solve some classic problems. For example, we can use the sequence to implement the MTF (move-to-front³) encoding algorithm [68] . MTF moves the element at position i to the front of the sequence (see Exercise 12.6.2):

mtf i S = x ◁ S′, where (x, S′) = extractAt i S
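On a plain list, MTF can be sketched as below (assuming 1 ≤ i ≤ |s|; extractAt on the sequence is left to Exercise 12.6.2, here splitAt plays its role at linear cost):

mtf :: Int -> [a] -> [a]
mtf i s = x : (front ++ rest) where
  (front, x:rest) = splitAt (i - 1) s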

In the next two chapters, we’ll go through the classic divide and conquer sorting
algorithms, including quick sort, merge sort, and their variants; then give the string
matching algorithms and elementary search algorithms.

Exercise 12.6
12.6.1. For random access, how to handle the empty tree ∅ and out of bound cases?
12.6.2. Implement cut i S, split sequence S at position i.

12.6 Appendix - example programs


Binary random access list (forest):
data Tree a = Leaf a
| Node Int (Tree a) (Tree a)

type BRAList a = [Tree a]

size (Leaf _) = 1
size (Node sz _ _) = sz

link t1 t2 = Node (size t1 + size t2) t1 t2

insert x = insertTree (Leaf x) where


insertTree t [] = [t]
insertTree t (t':ts) = if size t < size t' then t:t':ts
else insertTree (link t t') ts

2 Some programming environments provide equivalent tool, like the Optional<T> in Java/C++.
3 Used in Burrows-Wheeler transform (BWT) data compression algorithm.

extract ((Leaf x):ts) = (x, ts)


extract ((Node _ t1 t2):ts) = extract (t1:t2:ts)

head' = fst ◦ extract


tail' = snd ◦ extract

getAt i (t:ts) | i < size t = lookupTree i t


| otherwise = getAt (i - size t) ts
where
lookupTree 0 (Leaf x) = x
lookupTree i (Node sz t1 t2)
| i < sz `div` 2 = lookupTree i t1
| otherwise = lookupTree (i - sz `div` 2) t2

Numeric representation of binary random access list:


data Digit a = Zero | One (Tree a)

type RAList a = [Digit a]

insert x = add (Leaf x) where


add t [] = [One t]
add t (Zero:ts) = One t : ts
add t (One t' :ts) = Zero : add (link t t') ts

minus [One t] = (t, [])


minus (One t:ts) = (t, Zero:ts)
minus (Zero:ts) = (t1, One t2:ts') where
(Node _ t1 t2, ts') = minus ts

head' ts = x where (Leaf x, _) = minus ts


tail' = snd ◦ minus

Paired-array sequence:
data Seq<K> {
[K] front = [], rear = []
}

Int length(S<K> s) = length(s.front) + length(s.rear)

void insert(K x, Seq<K> s) = append(x, s.front)

void append(K x, Seq<K> s) = append(x, s.rear)

K get(Int i, Seq<K> s) {
Int n = length(s.front)
return if i < n then s.front[n - i - 1] else s.rear[i - n]
}

Concatenate-able list:
data CList a = Empty | CList a (Queue (CList a))

wrap x = CList x emptyQ

x ++ Empty = x
Empty ++ y = y
(CList x q) ++ y = CList x (push q y)

fold f z q | isEmpty q = z
| otherwise = (top q) `f` fold f z (pop q)

concat = fold (++) Empty

insert x xs = (wrap x) ++ xs

append xs x = xs ++ wrap x

head (CList x _) = x
tail (CList _ q) = concat q

Finger tree:
−− 2-3 tree
data Node a = Tr2 Int a a
| Tr3 Int a a a

−− finger tree
data Tree a = Empty
| Lf a
| Br Int [a] (Tree (Node a)) [a] −− size, front, mid, rear

newtype Elem a = Elem { getElem :: a } −− wrap element

newtype Seq a = Seq (Tree (Elem a)) −− sequence

class Sized a where −− support size measurement


size :: a → Int

instance Sized (Elem a) where


size _ = 1 −− 1 for any element

instance Sized (Node a) where


size (Tr2 s _ _) = s
size (Tr3 s _ _ _) = s

instance Sized a ⇒ Sized (Tree a) where


size Empty = 0
size (Lf a) = size a
size (Br s _ _ _) = s

instance Sized (Seq a) where


size (Seq xs) = size xs

tr2 a b = Tr2 (size a + size b) a b


tr3 a b c = Tr3 (size a + size b + size c) a b c

nodesOf (Tr2 _ a b) = [a, b]


nodesOf (Tr3 _ a b c) = [a, b, c]

−− left
x <| Seq xs = Seq (Elem x `cons` xs)

cons :: (Sized a) ⇒ a → Tree a → Tree a


cons a Empty = Lf a
cons a (Lf b) = Br (size a + size b) [a] Empty [b]
cons a (Br s [b, c, d] m r) = Br (s + size a) [a] ((tr3 b c d) `cons` m) r
cons a (Br s f m r) = Br (s + size a) (a:f) m r

head' (Seq xs) = getElem $ fst $ uncons xs


tail' (Seq xs) = Seq $ snd $ uncons xs

uncons :: (Sized a) ⇒ Tree a → (a, Tree a)


uncons (Lf a) = (a, Empty)
uncons (Br _ [a] Empty [b]) = (a, Lf b)
uncons (Br s [a] Empty (r:rs)) = (a, Br (s - size a) [r] Empty rs)
uncons (Br s [a] m r) = (a, Br (s - size a) (nodesOf f) m' r)
where (f, m') = uncons m
uncons (Br s (a:f) m r) = (a, Br (s - size a) f m r)

−− right
Seq xs |> x = Seq (xs `snoc` Elem x)

snoc :: (Sized a) ⇒ Tree a → a → Tree a


snoc Empty a = Lf a
snoc (Lf a) b = Br (size a + size b) [a] Empty [b]
snoc (Br s f m [a, b, c]) d = Br (s + size d) f (m `snoc` (tr3 a b c)) [d]
snoc (Br s f m r) a = Br (s + size a) f m (r ++ [a])

last' (Seq xs) = getElem $ snd $ unsnoc xs


init' (Seq xs) = Seq $ fst $ unsnoc xs

unsnoc :: (Sized a) ⇒ Tree a → (Tree a, a)


unsnoc (Lf a) = (Empty, a)
unsnoc (Br _ [a] Empty [b]) = (Lf a, b)
unsnoc (Br s f@(_:_:_) Empty [a]) = (Br (s - size a) (init f) Empty [last f], a)
unsnoc (Br s f m [a]) = (Br (s - size a) f m' (nodesOf r), a)
where (m', r) = unsnoc m
unsnoc (Br s f m r) = (Br (s - size a) f m (init r), a) where a = last r

−− concatenate
Seq xs +++ Seq ys = Seq (xs >+< ys)

xs >+< ys = merge xs [] ys

t <<< xs = foldl snoc t xs


xs >>> t = foldr cons t xs

merge :: (Sized a) ⇒ Tree a → [a] → Tree a → Tree a


merge Empty es t2 = es >>> t2
merge t1 es Empty = t1 <<< es
merge (Lf a) es t2 = merge Empty (a:es) t2
merge t1 es (Lf a) = merge t1 (es++[a]) Empty
merge (Br s1 f1 m1 r1) es (Br s2 f2 m2 r2) =
Br (s1 + s2 + (sum $ map size es)) f1 (merge m1 (trees (r1 ++ es ++ f2)) m2) r2

trees [a, b] = [tr2 a b]


trees [a, b, c] = [tr3 a b c]
trees [a, b, c, d] = [tr2 a b, tr2 c d]
trees (a:b:c:es) = (tr3 a b c):trees es

−− index
data Place a = Place Int a

getAt :: Seq a → Int → Maybe a


getAt (Seq xs) i | i < size xs = case lookupTree i xs of
Place _ (Elem x) → Just x
| otherwise = Nothing

lookupTree :: (Sized a) ⇒ Int → Tree a → Place a


lookupTree n (Lf a) = Place n a
lookupTree n (Br s f m r) | n < sf = lookups n f
| n < sm = case lookupTree (n - sf) m of
Place n' xs → lookupNode n' xs
| n < s = lookups (n - sm) r
where sf = sum $ map size f
sm = sf + size m

lookupNode :: (Sized a) ⇒ Int → Node a → Place a


lookupNode n (Tr2 _ a b) | n < sa = Place n a
| otherwise = Place (n - sa) b
where sa = size a
lookupNode n (Tr3 _ a b c) | n < sa = Place n a
| n < sab = Place (n - sa) b
| otherwise = Place (n - sab) c
where sa = size a
sab = sa + size b

lookups :: (Sized a) ⇒ Int → [a] → Place a


lookups n (x:xs) = if n < sx then Place n x
else lookups (n - sx) xs
where sx = size x
Chapter 13

Quick sort and merge sort

People have proved that O(n lg n) is the performance limit of comparison based sort [51] . This chapter gives two divide and conquer sort algorithms: quick sort and merge sort; both achieve the O(n lg n) time bound. We also give their variants, like natural merge sort, in-place merge sort, etc.

13.1 Quick sort

Consider arranging kids in a line ordered by height.


1. The first kid raises hand, all shorter ones move to left, and the others move to right;
2. All kids on the left and right repeat.
For example, the heights (in cm) are [102, 100, 98, 95, 96, 99, 101, 97]. Table 13.1 gives
the steps. (1) The kid of 102 cm raises hand as the pivot (underlined in the first row).
He happens to be the tallest, hence all others move to the left as shown in the second row
in the table. (2) The kid of 100 cm is the pivot. Those of height 98, 95, 96, and 99 cm
move to the left, and the one of 101 cm moves to the right, as shown in the third row.
(3) The kid of 98 cm is the left pivot, while 101 cm is the right pivot. Because there is
only one kid on the right, it’s sorted. Repeat this to sort all kids.
To summarize quick sort, let the list be L:


102 100 98 95 96 99 101 97


100 98 95 96 99 101 97 ‘102’
98 95 96 99 97 ‘100’ 101 ‘102’
95 96 97 ‘98’ 99 ‘100’ ‘101’ ‘102’
‘95’ 96 97 ‘98’ ‘99’ ‘100’ ‘101’ ‘102’
‘95’ ‘96’ 97 ‘98’ ‘99’ ‘100’ ‘101’ ‘102’
‘95’ ‘96’ ‘97’ ‘98’ ‘99’ ‘100’ ‘101’ ‘102’

Table 13.1: Sort steps

• If L is empty [ ], the result is [ ];

• Otherwise, select an element as the pivot p, recursively sort the elements ≤ p on the left; and sort the other elements > p on the right.

We say ‘and’, not ‘then’, indicating that we can sort the left and right in parallel. C. A. R. Hoare developed quick sort in 1960 [51] [78] . There are various ways to pick the pivot, for example, always choosing the first element.

sort [ ] = [ ]        (13.1)
sort (x:xs) = sort [y | y ← xs, y ≤ x] ++ [x] ++ sort [y | y ← xs, x < y]

We use ZF expression (see sections 1.4.1 and 1.7) to filter the list. Below is the example
program:
sort [] = []
sort (x:xs) = sort [y | y←xs, y ≤ x] ++ [x] ++ sort [y | y←xs, x < y]

We assume sorting in ascending order. We can abstract (≤) as a generic comparison to sort different things, like numbers, strings, etc. We needn't a total ordering, but at least a strict weak ordering [79] [52] (see section 9.2).

13.1.1 Partition
We traverse the elements in two passes: first filter all elements ≤ x; then filter all > x. Let us combine them into one pass:

part p [ ] = ([ ], [ ])
part p (x:xs) = p(x) :       (x:as, bs)        (13.2)
                otherwise :  (as, x:bs)
  where (as, bs) = part p xs

And change the quick sort to:

sort [ ] = [ ]        (13.3)
sort (x:xs) = sort as ++ [x] ++ sort bs, where (as, bs) = part (≤ x) xs

Alternatively, we can define partition with fold (in Curried form):

part p = foldr f ([ ], [ ])        (13.4)

Where:
f (as, bs) x = p(x) :       (x:as, bs)        (13.5)
               otherwise :  (as, x:bs)
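In Haskell, foldr passes the element before the accumulated pair, so eqs. (13.4) and (13.5) transcribe as the sketch below (equivalent to partition in Data.List):

part :: (a -> Bool) -> [a] -> ([a], [a])
part p = foldr f ([], []) where
  f x (as, bs) | p x       = (x:as, bs)
               | otherwise = (as, x:bs)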

It essentially accumulates into (as, bs): if p(x) holds, add x to as, otherwise to bs. Change the partition to tail recursive:

part p [ ] as bs = (as, bs)
part p (x:xs) as bs = p(x) :       part p xs (x:as) bs        (13.6)
                      otherwise :  part p xs as (x:bs)

To partition (x:xs), call (as, bs) = part (≤ x) xs [ ] [ ]. We can eliminate the concatenation (++) in ‘sort as ++ [x] ++ sort bs’ with an accumulator s:

sort′ s [ ] = s        (13.7)
sort′ s (x:xs) = sort′ (x : sort′ s bs) as

Start it with the empty list: sort = sort′ [ ]. After partition, we need to recursively sort as and bs. We can first sort bs, prepend x, then pass it as the new accumulator when sorting as:
sort = sort' []

sort' s [] = s
sort' s (x:xs) = sort' (x : sort' s bs) as where
  (as, bs) = part xs [] []
  part [] as bs = (as, bs)
  part (y:ys) as bs | y ≤ x = part ys (y:as) bs
                    | otherwise = part ys as (y:bs)

13.1.2 In-place sort


Figure 13.1 gives a way to partition in-place [2] [4] . Scan from left to right. At any time, the array consists of four parts (fig. 13.1 (a)):

Figure 13.1: In-place partition, pivot p = A[l]. (a) Partition invariant; (b) Initialize: L points to the pivot, R to the next element; (c) Terminate: swap p ↔ A[L].

1. The pivot is the left element: p = A[l]. It moves to the final position after partition;

2. A section of elements ≤ p, extending right to L;

3. A section of elements > p, extending right to R; the elements between L and R are > p;

4. Elements after R haven't been partitioned yet (may be >, =, or < p).
When the partition starts, L points to p, and R points to the next element (fig. 13.1 (b)). We advance R to the right boundary. Each time, compare A[R] and p. If A[R] > p, it should stay between L and R; we move R forward. Otherwise, if A[R] ≤ p, it should be on the left of L; we advance L a step, then swap A[L] ↔ A[R]. The partition ends when R passes the last element. All elements > p are moved to the right of L, while the others are on the left. We need to move p to the position between the two parts. To do that, swap p ↔ A[L], as shown in fig. 13.1 (c). L finally points to p, partitioning the array into two parts. We return L + 1 as the result, which points to the first element > p. Let the array be A, with lower and upper boundaries l, u. The in-place partition is defined below:
1: function Partition(A, l, u)
2: p ← A[l] . pivot
3: L←l . left
4: for R in [l + 1, u] do . iterate on right
5: if p ≥ A[R] then
6: L←L+1
7: Exchange A[L] ↔ A[R]
8: Exchange A[L] ↔ p
9: return L + 1 . partition position
Table 13.2 lists the steps to partition [3, 2, 5, 4, 0, 1, 6, 7].

3(l) 2(r) 5 4 0 1 6 7 start, p = 3, l = 1, r = 2


3 2(l)(r) 5 4 0 1 6 7 2 < 3, advance l (r = l)
3 2(l) 5(r) 4 0 1 6 7 5 > 3, move on
3 2(l) 5 4(r) 0 1 6 7 4 > 3, move on
3 2(l) 5 4 0(r) 1 6 7 0<3
3 2 0(l) 4 5(r) 1 6 7 advance l, swap with r
3 2 0(l) 4 5 1(r) 6 7 1<3
3 2 0 1(l) 5 4(r) 6 7 advance l, swap with r
3 2 0 1(l) 5 4 6(r) 7 6 > 3, move on
3 2 0 1(l) 5 4 6 7(r) 7 > 3, move on
1 2 0 3 5(l+1) 4 6 7 terminate, swap p and l

Table 13.2: Partition array

We implement quick sort with Partition as below:


1: procedure Quick-Sort(A, l, u)
2: if l < u then
3: m ← Partition(A, l, u)
4: Quick-Sort(A, l, m − 1)
5: Quick-Sort(A, m, u)
We pass the array and its boundaries as Quick-Sort(A, 1, |A|) to sort. It returns
immediately if the array is empty or a singleton.

Exercise 13.1
13.1.1. Optimize the basic quick sort definition for the singleton list case.

13.1.3 Performance
Quick sort performs well in most cases. Consider the best/worst cases. In the best case, it always halves the elements into two equal-sized parts. As shown in fig. 13.2, there are O(lg n) levels of recursion in total. At level one, it processes n elements with one partition; at level two, it partitions twice, each processing n/2 elements, taking 2O(n/2) = O(n) time in total; at level three, it partitions four times, each processing n/4 elements, also taking O(n) time in total; ...; at the last level, there are n singleton segments, taking O(n) time in total. Summing all levels, the time is bound to O(n lg n).

Figure 13.2: The best case, halve every time.

In the worst case, the partition is totally unbalanced: one part is of O(1) length, the other of O(n). The recursion depth degrades to O(n). Model the partition as a tree: it's a balanced binary tree in the best case, while it becomes a linked list of O(n) length in the worst case, where every branch node has an empty sub-tree. At each level, we process all elements, hence the total time is bound to O(n²), the same as insertion sort and selection sort. There are several challenging cases, for example, when the sequence has many duplicated elements, or is largely ordered. No method can avoid the worst case completely.

Average case

Quick sort performs well on average. For example, even if every partition gives two parts in a 1:9 ratio, the performance still achieves O(n lg n) [4]. We give two methods to evaluate the performance. The first is based on the fact that the performance is proportional to the number of comparisons. In selection sort, every two elements are compared, while in quick sort we save many comparisons. When partitioning the sequence [a1, a2, a3, ..., an] with a1 as the pivot, we obtain two sub-sequences A = [x1, x2, ..., xk] and B = [y1, y2, ..., yn−k−1]. After that, no element xi in A will ever be compared with any yj in B. Let the sorted result be [a1, a2, ..., an]. If ai < aj, we do not compare them if and only if some element ak with ai < ak < aj is picked as the pivot before either ai or aj. In other words, the only chance that we compare ai and aj is when either ai or aj is chosen as the pivot before any other element in ai+1 < ai+2 < ... < aj−1. Let P(i, j) be the probability that we compare ai and aj. We have:
P(i, j) = 2/(j − i + 1)      (13.8)
The total number of comparisons is:

C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P(i, j)      (13.9)

If we compare ai and aj, we won't compare aj and ai again, and we never compare ai with itself. The upper bound of i is n − 1, and the lower bound of j is i + 1. Substitute the probability:
C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
     = Σ_{i=1}^{n−1} Σ_{k=1}^{n−i} 2/(k + 1)      (13.10)

Use the result of the harmonic series [80]:

Hn = 1 + 1/2 + 1/3 + ... = ln n + γ + εn

C(n) = Σ_{i=1}^{n−1} O(lg n) = O(n lg n)      (13.11)

The other method uses recursion. Let the length of the sequence be n. We partition it into two parts of length i and n − i − 1. The partition takes cn time, because it compares every element with the pivot. The total time is:

T (n) = T (i) + T (n − i − 1) + cn (13.12)

Where T(n) is the time to sort n elements. i is uniformly distributed across 0, 1, ..., n − 1. Taking the mathematical expectation:

T(n) = E(T(i)) + E(T(n − i − 1)) + cn
     = (1/n) Σ_{i=0}^{n−1} T(i) + (1/n) Σ_{i=0}^{n−1} T(n − i − 1) + cn
     = (1/n) Σ_{i=0}^{n−1} T(i) + (1/n) Σ_{j=0}^{n−1} T(j) + cn
     = (2/n) Σ_{i=0}^{n−1} T(i) + cn      (13.13)

Multiply both sides by n:


nT(n) = 2 Σ_{i=0}^{n−1} T(i) + cn²      (13.14)

Substitute n with n − 1:

(n − 1)T(n − 1) = 2 Σ_{i=0}^{n−2} T(i) + c(n − 1)²      (13.15)

Subtract eq. (13.15) from eq. (13.14); all T(i) for 0 ≤ i < n − 1 cancel:

nT (n) = (n + 1)T (n − 1) + 2cn − c (13.16)

Divide both sides by n(n + 1) and drop the constant term; we obtain:

T(n)/(n + 1) = T(n − 1)/n + 2c/(n + 1)      (13.17)

Substitute n with n − 1, n − 2, ..., to give n − 1 equations:

T(n − 1)/n = T(n − 2)/(n − 1) + 2c/n
T(n − 2)/(n − 1) = T(n − 3)/(n − 2) + 2c/(n − 1)
...
T(2)/3 = T(1)/2 + 2c/3
Sum them up and cancel the identical components on both sides; we get a function of n:

T(n)/(n + 1) = T(1)/2 + 2c Σ_{k=3}^{n+1} 1/k      (13.18)

Use the result of the harmonic series:

O(T(n)/(n + 1)) = O(T(1)/2 + 2c(ln n + γ + εn)) = O(lg n)      (13.19)
Therefore:

O(T (n)) = O(n lg n) (13.20)

13.1.4 Improvement
The Partition procedure doesn’t perform well when there are many duplicated elements.
Consider the extreme case that all n elements are equal [x, x, ..., x]:

1. From the quick sort definition: pick any element as the pivot, hence p = x, and partition into two sub-sequences. One is [x, x, ..., x] of length n − 1; the other is empty. Then recursively sort the n − 1 elements; the total time decays to O(n²).

2. Modify the partition with < x and > x. The results are two empty sub-sequences and n elements equal to x. The recursion on an empty sequence terminates immediately. The result is [ ] ++ [x, x, ..., x] ++ [ ]. The performance is O(n).

We improve from binary partition to ternary partition to handle the duplicated ele-
ments:
sort [ ] = [ ]
sort (x:xs) = sort S ++ (x:E) ++ sort G      (13.21)

Where:

S = [y ← xs, y < x]

E = [y ← xs, y = x]

G = [y ← xs, y > x]

We use an accumulator to improve the list concatenation: qsort = sort [ ], where:

sort A [ ] = A
sort A (x:xs) = sort (x:E ++ sort A G) S, where (S, E, G) = part [ ] [ ] [ ] x xs      (13.22)

The sub-list E contains elements of the same value, hence it is already sorted. We first sort G with the accumulator A, then prepend x and E as the new accumulator, and use it to sort S. We also improve the partition with accumulators:
part S E G x [ ] = (S, E, G)
part S E G x (y:ys) = | y < x : part (y:S) E G x ys
                      | y = x : part S (y:E) G x ys
                      | y > x : part S E (y:G) x ys      (13.23)
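The below Haskell program is a runnable sketch of eqs. (13.22) and (13.23); the names qsort3 and part3 are ours for illustration:

qsort3 :: (Ord a) => [a] -> [a]
qsort3 = sort' [] where
  sort' acc [] = acc
  sort' acc (x:xs) = sort' (x : e ++ sort' acc g) s
    where (s, e, g) = part3 [] [] [] x xs
  -- ternary partition with accumulators, as in eq. (13.23)
  part3 s e g _ [] = (s, e, g)
  part3 s e g x (y:ys)
    | y < x     = part3 (y:s) e g x ys
    | y == x    = part3 s (y:e) g x ys
    | otherwise = part3 s e (y:g) x ys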

Richard Bird gives another improvement [1]: collect the recursive sort results in a list of lists, and concatenate them at the end:
sort :: (Ord a) ⇒ [a] → [a]
sort = concat ◦ (pass [])

pass xss [] = xss
pass xss (x:xs) = step xs [] [x] [] xss where
  step [] as bs cs xss = pass (bs : pass xss cs) as
  step (x':xs') as bs cs xss | x' < x = step xs' (x':as) bs cs xss
                             | x' == x = step xs' as (x':bs) cs xss
                             | x' > x = step xs' as bs (x':cs) xss

Robert Sedgewick developed the two-way partition method [69][2]. Use two pointers i, j from the left and right boundaries. Pick the first element as the pivot p. Advance i to the right till it meets an element ≥ p; meanwhile, move j to the left till it meets an element ≤ p. At this time, all elements to the left of i are less than the pivot (< p), while those to the right of j are greater than the pivot (> p); i points to an element ≥ p, and j points to an element ≤ p, as shown in fig. 13.3 (a). To move all elements ≤ p to the left and the remaining to the right, we exchange A[i] ↔ A[j], then continue scanning. Repeat this till i meets j. At any time, we keep the invariant: all elements to the left of i (including i) are ≤ p, while all to the right of j (including j) are ≥ p. The elements between i and j are yet to be scanned, as shown in fig. 13.3 (b).

pivot p           i (≥ p)          j (≤ p)
A[l]  ... < p ...  A[i]  ... ? ...  A[j]  ... > p ...

(a) When i and j stop

pivot p           i                j
A[l]  ... ≤ p ...       ... ? ...       ... ≥ p ...

(b) Partition invariant

Figure 13.3: 2-way scan

When i meets j, we need an additional exchange: swap the pivot p to position j. Then recursively sort the sub-arrays A[l...j) and A[i...u).
1: procedure Sort(A, l, u) . Sort range [l, u)
2: if u − l > 1 then . At least 2 elements
3: i ← l, j ← u
4: pivot ← A[l]
5: loop

6: repeat
7: i←i+1
8: until A[i] ≥ pivot . Ignore i ≥ u
9: repeat
10: j ←j−1
11: until A[j] ≤ pivot . Ignore j < l
12: if j < i then
13: break
14: Exchange A[i] ↔ A[j]
15: Exchange A[l] ↔ A[j] . Move the pivot
16: Sort(A, l, j)
17: Sort(A, i, u)
For the extreme case that all elements are equal, the array is partitioned into two equal parts with n/2 swaps. Because of the balanced partition, the performance is O(n lg n). It takes fewer swaps than the one-pass scan method, since it skips the elements on the right side of the pivot. We can combine the 2-way scan and the ternary partition, and only recursively sort the elements that differ from the pivot. Jon Bentley and Douglas McIlroy developed a method, shown in fig. 13.4 (a), that stores the elements equal to the pivot on both sides [70][71].

p           i            j            q
A[l]  ... = ...  ... < ...  ... ? ...  ... > ...  ... = ...

(a) Ternary partition invariant.

        i                        j
 ... < ...      ... = ...      ... > ...

(b) Swap the elements = p to the middle.

Figure 13.4: Ternary partition

We scan from the two sides, pausing when i reaches an element ≥ the pivot and j reaches one ≤ the pivot. If i doesn't meet or pass j, we exchange A[i] ↔ A[j], then check whether A[i] or A[j] equals the pivot. If yes, we exchange A[i] ↔ A[p] or A[j] ↔ A[q], respectively. Finally, we swap all the elements equal to the pivot to the middle. This step does nothing if all elements are distinct. The partition result is shown in fig. 13.4 (b). We next recursively sort only the elements not equal to the pivot.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: i ← l, j ← u
4: p ← l, q ← u . point to the boundaries of duplicated elements
5: pivot ← A[l]
6: loop
7: repeat
8: i←i+1
9: until A[i] ≥ pivot . Ignore i ≥ u case
10: repeat

11: j ←j−1
12: until A[j] ≤ pivot . Ignore j < l case
13: if j ≤ i then
14: break
15: Exchange A[i] ↔ A[j]
16: if A[i] = pivot then . duplicated element
17: p←p+1
18: Exchange A[p] ↔ A[i]
19: if A[j] = pivot then
20: q ←q−1
21: Exchange A[q] ↔ A[j]
22: if i = j and A[i] = pivot then
23: j ← j − 1, i ← i + 1
24: for k from l to p do . Swap the duplicated elements to the middle
25: Exchange A[k] ↔ A[j]
26: j ←j−1
27: for k from u − 1 down-to q do
28: Exchange A[k] ↔ A[i]
29: i←i+1
30: Sort(A, l, j + 1)
31: Sort(A, i, u)
It becomes complex when combining the 2-way scan and the ternary partition. Alternatively, we can change the one-pass scan to a ternary partition directly. Pick the first element as the pivot, as shown in fig. 13.5. At any time, the left part contains elements < p; the next part contains those = p; and the right part contains those > p. The boundaries are i, k, j. Elements in [k, j) are yet to be partitioned. We scan from left to right. At the start, the part < p is empty, the part = p has one element, i points to the lower boundary, and k points to the next position. The part > p is empty too; j points to the upper boundary.

i k j

... < p ... ... = p ... ... ? ... ... > p ...

Figure 13.5: 1 way scan ternary partition

Iterate on k. If A[k] = p, move k to the next position; if A[k] > p, exchange A[k] ↔ A[j − 1]: the range of elements > p increases by one, and its boundary j moves left a step. Because we don't know whether the element moved to k is still > p, we compare again and repeat. Otherwise, if A[k] < p, we exchange A[k] ↔ A[i], where A[i] is the first element = p. The partition terminates when k meets j.
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: i ← l, j ← u, k ← l + 1
4: pivot ← A[i]
5: while k < j do
6: while pivot < A[k] do
7: j ←j−1
8: Exchange A[k] ↔ A[j]

9: if A[k] < pivot then


10: Exchange A[k] ↔ A[i]
11: i←i+1
12: k ←k+1
13: Sort(A, l, i)
14: Sort(A, j, u)
This implementation is less complex, but needs more swaps than the ternary partition through 2-way scan.

Challenging cases
Although ternary partition handles duplicated elements well, there are other challenging
cases. For example, when most elements are ordered (ascending or descending), the
partition is unbalanced. Figure 13.6 gives two cases: [x1 < x2 < ... < xn] and [y1 > y2 > ... > yn]. It's easy to give more, for example [xm, xm−1, ..., x2, x1, xm+1, xm+2, ..., xn] where [x1 < x2 < ... < xn], and [xn, x1, xn−1, x2, ...], as shown in fig. 13.7.

(a) Partition tree of [x1 < x2 < ... < xn], the sub-trees of ≤ p are empty.
(b) Partition tree of [y1 > y2 > ... > yn], the sub-trees of ≥ p are empty.

Figure 13.6: The challenging cases - 1.

In these challenging cases, the partition is unbalanced when we choose the first element as the pivot. Robert Sedgewick improves the pivot selection [69]: instead of picking a fixed position, sample several elements to avoid a bad pivot. We sample the first, the middle, and the last elements, and pick the median as the pivot. We can either compare every two of them (3 comparisons in total) [70], or swap the least one to the head, swap the greatest one to the end, and move the median to the middle.
1: procedure Sort(A, l, u)
2: if u − l > 1 then

(a) Unbalanced partitions except for the first time.
(b) A zig-zag partition tree.

Figure 13.7: The challenging cases - 2.



3: m ← ⌊(l + u)/2⌋ . or l + ⌊(u − l)/2⌋ to avoid overflow
4: if A[m] < A[l] then . Ensure A[l] ≤ A[m]
5: Exchange A[l] ↔ A[m]
6: if A[u − 1] < A[l] then . Ensure A[l] ≤ A[u − 1]
7: Exchange A[l] ↔ A[u − 1]
8: if A[u − 1] < A[m] then . Ensure A[m] ≤ A[u − 1]
9: Exchange A[m] ↔ A[u − 1]
10: Exchange A[l] ↔ A[m]
11: (i, j) ← Partition(A, l, u)
12: Sort(A, l, i)
13: Sort(A, j, u)
This implementation handles the above four challenging cases well, and is known as the ‘median of three’. Alternatively, we can randomly pick the pivot:
1: procedure Sort(A, l, u)
2: if u − l > 1 then
3: Exchange A[l] ↔ A[ Random(l, u) ]
4: (i, j) ← Partition(A, l, u)
5: Sort(A, l, i)
6: Sort(A, j, u)
Where Random(l, u) returns a random integer l ≤ i < u. We swap A[i] with the first element as the pivot. This method is called random quick sort [4]. Theoretically, neither ‘median of three’ nor random quick sort can avoid the worst case completely. If the sequence is random, it makes no difference which element we choose as the pivot. Nonetheless, these improvements are widely used in engineering practice.
There are other improvements besides partition. Sedgewick found that quick sort has overhead when the list is short, where insertion sort performs better [2][70]. Sedgewick, Bentley, and McIlroy evaluated various thresholds, called the ‘cut-off’: when there are fewer elements than the cut-off, switch to insertion sort.
1: procedure Sort(A, l, u)
2: if u − l > Cut-Off then
3: Quick-Sort(A, l, u)
4: else
5: Insertion-Sort(A, l, u)

13.1.5 Quick sort and tree sort

The ‘true quick sort’ is a combination of multiple engineering improvements, e.g., falling back to insertion sort for short sequences, in-place swaps, ‘median of three’ pivot selection, 2-way scan, and ternary partition. Some people consider the basic recursive definition essentially tree sort. Richard Bird derives quick sort from binary tree sort by deforestation [72]. Define unfold, which converts a list to a binary search tree:

unfold [ ] = ∅
unfold (x:xs) = (unfold [a ← xs, a ≤ x], x, unfold [a ← xs, a > x])      (13.24)

Compared with binary tree insertion (see chapter 2), unfold creates the tree differently. If the list is empty, the tree is empty; otherwise, use the first element x as the key, then recursively build the left and right sub-trees, where the left sub-tree holds the elements ≤ x, and the right holds the elements > x. Define the in-order traversal toList to convert a binary search tree to an ordered list:

toList ∅ = [ ]
toList (l, k, r) = toList l ++ [k] ++ toList r      (13.25)

Then define quick sort as:

sort = toList ◦ unfold (13.26)

We first build the binary search tree through unfold, then pass it to toList to generate the ordered list, and discard the tree. When we eliminate the intermediate tree (through deforestation, after Burstall and Darlington's work [73]), we obtain quick sort.
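Before deforestation, the two steps can run separately. Below is a minimal runnable Haskell sketch of eqs. (13.24)–(13.26); the Tree type and the name treeSort are ours for illustration:

data Tree a = Empty | Node (Tree a) a (Tree a)

-- eq. (13.24): the first element is the key; smaller-or-equal
-- elements go left, greater ones go right
unfold :: (Ord a) => [a] -> Tree a
unfold [] = Empty
unfold (x:xs) = Node (unfold [a | a <- xs, a <= x]) x
                     (unfold [a | a <- xs, a > x])

-- eq. (13.25): in-order traversal
toList :: Tree a -> [a]
toList Empty = []
toList (Node l k r) = toList l ++ [k] ++ toList r

-- eq. (13.26)
treeSort :: (Ord a) => [a] -> [a]
treeSort = toList . unfold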

13.2 Merge sort


Quick sort performs well in most cases. However, it can't avoid the worst case completely. Merge sort guarantees O(n lg n) performance in all cases, and supports both arrays and lists. Many programming environments provide merge sort as the standard tool1. Merge sort takes the divide and conquer approach: it always splits the sequence into two halves, recursively sorts them, and merges the results.

sort [ ] = [ ]
sort [x] = [x]
sort xs = merge (sort as) (sort bs), where (as, bs) = halve xs      (13.27)

Where halve splits the sequence. For an array, we can cut at the middle: splitAt ⌊|xs|/2⌋ xs. However, it takes linear time to move to the middle point of a list (see eq. (1.45)):

splitAt n xs = shift n [ ] xs      (13.28)

Where:

shift 0 as bs = (as, bs)
shift n as (b:bs) = shift (n − 1) (b:as) bs      (13.29)

Because halve needn't keep the relative order among elements, we can simplify it with an odd-even split: there are the same number of elements in the odd and even positions, or they differ by one. Define halve = split [ ] [ ], where:

split as bs [ ] = (as, bs)
split as bs [x] = (x:as, bs)
split as bs (x:y:xs) = split (x:as) (y:bs) xs      (13.30)

We can further simplify it with folding. As in the example below, we add x to as each time, then swap as ↔ bs:

halve = foldr f ([], []) where
  f x (as, bs) = (bs, x:as)

As in fig. 13.8, consider two groups of kids, each already ordered from short to tall. They need to pass a gate, one kid at a time. We compare the first kids from the two groups, and the shorter one passes. Repeat this till either group completely passes the gate; then the remaining kids go through one by one.
1 For example in the standard library of Haskell, Python, and Java.

Figure 13.8: Merge

merge [ ] bs = bs
merge as [ ] = as
merge (a:as) (b:bs) = | a < b :      a : merge as (b:bs)
                      | otherwise :  b : merge (a:as) bs      (13.31)
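Putting halve and merge together gives a complete merge sort for lists. The below Haskell program is a minimal runnable sketch; the name msort is ours:

msort :: (Ord a) => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort as) (msort bs)
  where
    -- odd-even split; merge sort needn't keep the relative order
    (as, bs) = foldr (\x (us, vs) -> (vs, x:us)) ([], []) xs
    merge [] vs = vs
    merge us [] = us
    merge (u:us) (v:vs)
      | u < v     = u : merge us (v:vs)
      | otherwise = v : merge (u:us) vs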

For arrays, we directly cut at the middle, recursively sort the two halves, then merge:
1: procedure Sort(A)
2: n ← |A|
3: if n > 1 then
4: m ← ⌊n/2⌋
5: X ← Copy-Array(A[1...m])
6: Y ← Copy-Array(A[m + 1...n])
7: Sort(X)
8: Sort(Y )
9: Merge(A, X, Y )
We allocate additional space of the same size as A, because Merge is not in-place. We repeatedly compare elements from X and Y, and pick the smaller one into A. When either sub-array finishes, we add all the remaining elements to A.
1: procedure Merge(A, X, Y )
2: i ← 1, j ← 1, k ← 1
3: m ← |X|, n ← |Y |
4: while i ≤ m and j ≤ n do
5: if X[i] < Y [j] then
6: A[k] ← X[i]
7: i←i+1
8: else
9: A[k] ← Y [j]
10: j ←j+1
11: k ←k+1
12: while i ≤ m do
13: A[k] ← X[i]
14: k ←k+1
15: i←i+1
16: while j ≤ n do
17: A[k] ← Y [j]
18: k ←k+1
19: j ←j+1
To simplify the merge, we adjoin ∞ to X and Y2.
1: procedure Merge(A, X, Y )
2: Append(X, ∞)
2 −∞ for descending order

3: Append(Y, ∞)
4: i ← 1, j ← 1, n ← |A|
5: for k ← from 1 to n do
6: if X[i] < Y [j] then
7: A[k] ← X[i]
8: i←i+1
9: else
10: A[k] ← Y [j]
11: j ←j+1

13.2.1 Performance
Merge sort takes two steps: partition and merge. It always halves the sequence. The binary partition tree is balanced, as shown in fig. 13.2; its height is O(lg n), and so is the recursion depth. The merge happens at all levels; it compares elements one by one from each sorted sub-sequence, hence taking linear time. For a sequence of length n, let T(n) be the merge sort time; we have the below recursive breakdown:
T(n) = T(n/2) + T(n/2) + cn = 2T(n/2) + cn      (13.32)

The time consists of three parts: sorting the first and second halves, each taking T(n/2) time, and merging in cn time, where c is a constant. Solving this equation gives T(n) = O(n lg n).
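As a quick sanity check, assume n = 2^k and expand the recurrence by telescoping:

T(n) = 2T(n/2) + cn
     = 4T(n/4) + 2cn
     = ...
     = 2^k T(1) + k·cn
     = O(n lg n)

since k = lg n.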
For space, various implementations differ a lot. The basic merge sort allocates space of the same size as the array in each recursion, copies the elements and sorts, then releases the space; summed over all the recursion levels, it allocates O(n lg n) space in total. It's expensive to allocate then release space repeatedly [2]. We can pre-allocate a work area of the same size as A, reuse it during the recursion, and release it at the end.
1: procedure Sort(A)
2: n ← |A|
3: Sort0 (A, Create-Array(n), 1, n)

4: procedure Sort0 (A, B, l, u)


5: if u − l > 0 then
l+u
6: m←b c
2
7: Sort (A, B, l, m)
0

8: Sort0 (A, B, m + 1, u)
9: Merge0 (A, B, l, m, u)
We need update merge with the passed-in work area:
1: procedure Merge0 (A, B, l, m, u)
2: i ← l, j ← m + 1, k ← l
3: while i ≤ m and j ≤ u do
4: if A[i] < A[j] then
5: B[k] ← A[i]
6: i←i+1
7: else
8: B[k] ← A[j]
9: j ←j+1
10: k ←k+1
11: while i ≤ m do
12: B[k] ← A[i]

13: k ←k+1
14: i←i+1
15: while j ≤ u do
16: B[k] ← A[j]
17: k ←k+1
18: j ←j+1
19: for i ← from l to u do . copy back
20: A[i] ← B[i]
This implementation reduces the space from O(n lg n) to O(n), improves performance
20% to 25% for 100K numeric elements.

13.2.2 In-place merge sort


To avoid additional space, consider how to reuse the array itself as the work area. As shown in fig. 13.9, sub-arrays X and Y are sorted. When merging in-place, the part before l is merged and ordered. If A[l] < A[m], move l right a step; otherwise (A[l] ≥ A[m]), move A[m] to the merged part before l. We need to shift all elements in the range [l, m)3 right a step.

merged A[l] ... sorted X... A[m] ... sorted Y ...

if A[l] ≥ A[m], then shift X

Figure 13.9: In-place shift and merge

1: procedure Merge(A, l, m, u)
2: while l ≤ m ∧ m ≤ u do
3: if A[l] < A[m] then
4: l ←l+1
5: else
6: x ← A[m]
7: for i ← m down-to l + 1 do . Shift
8: A[i] ← A[i − 1]
9: A[l] ← x
However, it downgrades to O(n²) time, because the array shift takes linear time (O(|X|)). When sorting a sub-array, we want to reuse the remaining part as the work area, and must avoid overwriting any elements. We compare elements from the sorted sub-arrays X and Y, pick the smaller one, and store it in the work area; however, we need to exchange the element out to free up the cell. After the merge, X and Y together store the content of the original work area, as shown in fig. 13.10.
The sorted X, Y , and the work area Z are all sub-arrays. We pass the start, end
positions of X and Y as ranges [i, m), [j, n). The work area starts from k.
1: procedure Merge(A, [i, m), [j, n), k)
2: while i < m and j < n do
3: if A[i] < A[j] then
4: Exchange A[k] ↔ A[i]
5: i←i+1
6: else
7: Exchange A[k] ↔ A[j]
3 range [a, b) includes a, but excludes b.

compare

... reuse ... X[i] ... ... reuse ... Y [j] ...

if X[i] < Y [j] then exchange X[i] ↔ Z[k]

... merged ... Z[k] ...

Figure 13.10: Merge and swap

8: j ←j+1
9: k ←k+1
10: while i < m do
11: Exchange A[k] ↔ A[i]
12: i←i+1
13: k ←k+1
14: while j < n do
15: Exchange A[k] ↔ A[j]
16: j ←j+1
17: k ←k+1
The work area has two properties: 1) it has sufficient size to hold the elements swapped in; 2) it can overlap with either sorted sub-array, but must not overwrite any unmerged elements. One idea is to use half of the array as the work area to sort the other half, as shown in fig. 13.11.

... unsorted ... ... sorted ...

Figure 13.11: Merge and sort half array

We next sort further half of the work area (the remaining 1/4), as shown in fig. 13.12. We must merge X (1/2 of the array) and Y (1/4 of the array) later at some point. However, the work area can only hold 1/4 of the array, insufficient for the size of X + Y.

1/4 (work area) : 1/4 (sorted) : 1/2 (sorted)

Figure 13.12: The work area can't support merging X and Y.

The second property gives a way out: arrange the work area to overlap with either sub-array, and only overwrite the merged part. We first sort the second 1/2 of the work area; as a result, Y is swapped to the first 1/2, and the new work area sits between X and Y, as shown in the upper part of fig. 13.13. The work area overlaps with X [74]. Consider two extremes:

1. y < x for all y in Y, x in X. After the merge, the contents of Y and the work area are swapped (the size of Y equals the size of the work area);

1/4 (sorted Y) : 1/4 (work area) : 1/2 (sorted X)

merge ↓

1/4 (work area) : 3/4 (sorted)

Figure 13.13: Merge X and Y with the work area.

2. x < y for all y in Y, x in X. During the merge, we repeatedly swap content between X and the work area. After half of X is swapped, we start overwriting X. Fortunately, we only overwrite the already merged content. The right boundary of the work area keeps extending to 3/4 of the array. After that, we start swapping the content of Y and the work area. Finally, the work area moves to the left side of the array, as shown at the bottom of fig. 13.13.

The other cases fall between the above two extremes. The work area finally moves to the first 1/4 of the array. Repeating this, we always sort the second 1/2 of the work area, swap the result to the first 1/2, and keep the work area in the middle. We halve the work area every time: 1/2, 1/4, 1/8, ... of the array, terminating when only one element is left. Alternatively, we can fall back to insertion sort for the last few elements.
1: procedure Sort(A, l, u)
2: if u − l > 0 then
3: m ← ⌊(l + u)/2⌋
4: w ←l+u−m
5: Sort’(A, l, m, w) . sort half
6: while w − l > 1 do
7: u′ ← w
8: w ← ⌈(l + u′)/2⌉ . halve the work area
9: Sort’(A, w, u′, l) . sort the remaining half
10: Merge(A, [l, l + u′ − w), [u′, u), w)
11: for i ← w down-to l do . Switch to insert sort
12: j←i
13: while j ≤ u and A[j] < A[j − 1] do
14: Exchange A[j] ↔ A[j − 1]
15: j ←j+1
We round the work area up to ensure sufficient size, then pass the ranges and the work area to Merge. We next update Sort’: it recursively sorts the two halves, then merges them into the work area; in the base case, it swaps the elements directly to the work area.
1: procedure Sort’(A, l, u, w)
2: if u − l > 0 then
3: m ← ⌊(l + u)/2⌋
4: Sort(A, l, m)
5: Sort(A, m + 1, u)

6: Merge(A, [l, m), [m + 1, u), w)


7: else . Swap elements to the work area
8: while l ≤ u do
9: Exchange A[l] ↔ A[w]
10: l ←l+1
11: w ←w+1
This implementation needn't shift sub-arrays; it keeps reducing the unordered part: n/2, n/4, n/8, ..., and completes in O(lg n) steps. Every step sorts half of the remaining part, then merges in linear time. Let the time to sort n elements be T(n); we have the following recurrence:

T(n) = T(n/2) + c·n/2 + T(n/4) + c·3n/4 + T(n/8) + c·7n/8 + ...      (13.33)
For half of the elements, the time is:

T(n/2) = T(n/4) + c·n/4 + T(n/8) + c·3n/8 + T(n/16) + c·7n/16 + ...      (13.34)
Subtract eq. (13.34) from eq. (13.33):

T(n) − T(n/2) = T(n/2) + cn(1/2 + 1/2 + ...)

The 1/2 term adds up lg n times in total, hence:

T(n) = 2T(n/2) + (c/2) n lg n

Applying the telescope method (or the master theorem) gives the result O(n lg² n).

13.2.3 Nature merge sort

Figure 13.14: Burn from both ends

Knuth gives another implementation, called nature merge sort. It is like burning a candle from both ends [51]. For any sequence, one can always find an ordered segment starting from any position. Particularly, we can find such a segment from the left end, as shown in the below table.

15, 0, 4, 3, 5, 2, 7, 1, 12, 14, 13, 8, 9, 6, 10, 11


8, 12, 14, 0, 1, 4, 11, 2, 3, 5, 9, 13, 10, 6, 15, 7
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

The first row is the extreme case of a singleton segment; the third row is the other extreme, where the segment extends to the right end and the whole sequence is ordered. Symmetrically, we can always find an ordered segment from the right end, then merge the two sorted segments, one from the left and one from the right. The advantage is that we reuse the naturally ordered sub-sequences for partitioning.

8, 12, 14 0, 1, 4, 11 2, 3, 5 9 13, 10, 6 15, 7

merge

7, 8, 12, 14, 15 ... free cells ... 13, 11, 10, 6, 4, 1, 0

merge

Figure 13.15: Nature merge sort

As shown in fig. 13.15, we scan from both ends and find the two longest ordered segments, respectively. Then we merge them to the left end of the work area. Next, we restart scanning from the left and right towards the center. This time, we merge the two segments to the right end of the work area. We switch the merge direction between right and left in turns. After scanning all elements and merging them to the work area, we swap the original array and the work area, then start a new round of bi-directional scan and merge. Terminate when the ordered segment extends to cover the whole array. This implementation processes the array from both directions based on the natural ordering, and is called nature two-way merge sort. As shown in fig. 13.16, the elements before a and after d have been scanned. We span the ordered segment [a, b) to the right; meanwhile, we span [c, d) to the left. In the work area, the elements before f and after r are merged (consisting of multiple sub-sequences). In odd rounds, we merge [a, b) and [c, d) from f to the right; in even rounds, we merge from r to the left.

a b c d

... scanned ... ... span [a, b) ... ... ? ... ... span [c, d) ... ... scanned ...

f r

... merged ... ... free cells ... ... merged ...

Figure 13.16: A status of nature merge sort

At the start, we allocate a work area of the same size as the array. a, b point to the left side; c, d point to the right side; f and r point to the two ends of the work area, respectively.
1: function Sort(A)
2: if |A| > 1 then
3: n ← |A|
4: B ← Create-Array(n) . the work area

5: loop
6: [a, b) ← [1, 1)
7: [c, d) ← [n + 1, n + 1)
8: f ← 1, r ← n . front, rear of the work area
9: t←1 . even/odd round
10: while b < c do . elements yet to scan
11: repeat . Span [a, b)
12: b←b+1
13: until b ≥ c ∨ A[b] < A[b − 1]
14: repeat . Span [c, d)
15: c←c−1
16: until c ≤ b ∨ A[c − 1] < A[c]
17: if c < b then . Avoid overlap
18: c←b
19: if b − a ≥ n then . Terminate if [a, b) spans the whole array
20: return A
21: if t is odd then . merge to front
22: f ← Merge(A, [a, b), [c, d), B, f, 1)
23: else . merge to rear
24: r ← Merge(A, [a, b), [c, d), B, r, −1)
25: a ← b, d ← c
26: t←t+1
27: Exchange A ↔ B . Switch work area
28: return A
We pass the merge direction in:
1: function Merge(A, [a, b), [c, d), B, w, ∆)
2: while a < b and c < d do
3: if A[a] < A[d − 1] then
4: B[w] ← A[a]
5: a←a+1
6: else
7: B[w] ← A[d − 1]
8: d←d−1
9: w ←w+∆
10: while a < b do
11: B[w] ← A[a]
12: a←a+1
13: w ←w+∆
14: while c < d do
15: B[w] ← A[d − 1]
16: d←d−1
17: w ←w+∆
18: return w
The performance does not depend on how ordered the elements are. In the ‘worst’ case, the ordered sub-sequences are all singletons. After merging, the lengths of the new ordered sub-sequences are at least 2. If we still encounter the ‘worst’ case in the second round, the merged sub-sequences have length at least 4, and so on. Every round doubles the sub-sequence length, hence we need at most O(lg n) rounds. Because we scan all elements every round, the total time is bound to O(n lg n). For lists, we can't scan from the tail back as easily as for arrays. A list consists of multiple ordered sub-lists; we merge them in pairs. This halves the number of sub-lists every round, and finally builds the sorted result. Define this as (in Curried form):

sort = sort′ ◦ group      (13.35)

Where group breaks the list into ordered sub-lists:

group [ ] = [[ ]]
group [x] = [[x]]
group (x:y:xs) = | x < y :      (x:g):gs
                 | otherwise :  [x]:g:gs
where (g:gs) = group (y:xs)      (13.36)

sort′ [ ] = [ ]
sort′ [g] = g
sort′ gs = sort′ (mergePairs gs)      (13.37)

Where:

mergePairs (g1:g2:gs) = merge g1 g2 : mergePairs gs
mergePairs gs = gs      (13.38)
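The below Haskell program puts eqs. (13.35)–(13.38) together as a runnable sketch; naturalSort is our name, and merge is the same function as in eq. (13.31):

naturalSort :: (Ord a) => [a] -> [a]
naturalSort = sort' . group
  where
    group []  = [[]]
    group [x] = [[x]]
    group (x:y:xs)
      | x < y     = (x:g):gs
      | otherwise = [x]:g:gs
      where (g:gs) = group (y:xs)
    sort' []  = []
    sort' [g] = g
    sort' gs  = sort' (mergePairs gs)
    mergePairs (g1:g2:gs) = merge g1 g2 : mergePairs gs
    mergePairs gs = gs
    merge [] vs = vs
    merge us [] = us
    merge (u:us) (v:vs)
      | u < v     = u : merge us (v:vs)
      | otherwise = v : merge (u:us) vs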

Exercise 13.2
13.2.1. One defines sort′ = foldr merge [ ]. Is the performance the same as the pairwise merge (mergePairs)? If yes, prove it; if not, which one is faster?

13.2.4 Bottom-up merge sort


We can develop the bottom-up merge sort from the above performance analysis. First wrap all elements as n singletons, then merge them in pairs to obtain ⌈n/2⌉ ordered sub-lists of length 2; if n is odd, a single list remains at the end. Repeat this paired merge to sort them all. Knuth calls it ‘straight two-way merge sort’ [51], as shown in fig. 13.17.

Figure 13.17: Bottom-up merge sort

We needn't partition the list. At the start, convert [x1, x2, ..., xn] to [[x1], [x2], ..., [xn]], then apply the paired merge:

sort = sort′ ◦ map (x ↦ [x])      (13.39)



We reuse the mergePairs defined for nature merge sort, and terminate when the lists consolidate into one [3]. The bottom-up sort is similar to the nature merge sort, differing only in the partition method; it can be deduced from nature merge sort as a special case (the ‘worst’ case). Nature merge sort always spans the ordered sub-sequences as long as possible, while the bottom-up merge sort only spans them to length 1. From the tail recursive implementation, we can eliminate the recursion and convert it to iterative loops.
1: function Sort(A)
2: n ← |A|
3: B ← Create-Array(n)
4: for i from 1 to n do
5: B[i] = [A[i]]
6: while n > 1 do
7: for i ← from 1 to ⌊n/2⌋ do
8: B[i] ← Merge(B[2i − 1], B[2i])
9: if Odd(n) then
10: B[⌈n/2⌉] ← B[n]
11: n ← ⌈n/2⌉
12: if B = [ ] then
13: return [ ]
14: return B[1]

Exercise 13.3
13.3.1. Define the generic pairwise fold foldp, and use it to implement the bottom-up merge sort.

13.3 Parallelism
In the quick sort implementation, we can sort the two sub-sequences in parallel after partitioning. Similarly, we can parallelize merge sort. Actually, we are not limited to two concurrent tasks, but can divide into p sub-sequences, where p is the number of processors. Ideally, if we can achieve sorting in T′ time with parallelism, where O(n lg n) = pT′, we call it a linear speed up, and the algorithm is parallel optimal. However, choosing p − 1 pivots and partitioning the sequence into p parts for quick sort is not parallel optimal: the bottleneck happens in the divide phase, which can only be achieved in O(n) time. For parallel merge sort, the bottleneck is the merge phase instead. Both need specific designs to speed up. Basically, the divide and conquer nature makes merge sort and quick sort relatively easy to parallelize. Richard Cole developed a parallel merge sort that achieves O(lg n) performance with n processors in 1986 [76]. Parallelism is a big and complex topic beyond the elementary scope [76][77].

13.4 Summary
This chapter gives two popular divide and conquer sort algorithms: quick sort and merge sort. Both achieve the best performance of O(n lg n) for comparison-based sort. Sedgewick regards quick sort as one of the greatest algorithms developed in the 20th century. Many programming environments provide sort tools based on it. Merge sort is a powerful tool when handling sequences of complex entities, or sequences not persisted in arrays5. Quick sort performs
5 In practice, most implementations are hybrid sorts, for example, falling back to insertion sort for short sequences.

well in most cases with fewer swaps than other methods. However, swapping is not suitable for linked lists, while merge sort is ideal for them: it costs constant space, and the performance is guaranteed in all cases. Quick sort has the advantage for vector storage like arrays, because it needs no extra work area and sorts in place. This is a valuable feature, particularly in embedded systems where memory is limited. In-place merging is still an active research area.
We can consider quick sort an optimized tree sort. Similarly, we can also deduce merge sort from tree sort [75]. People categorize sort algorithms in different ways [51], for example, from the perspective of partition and merge [72]. Quick sort is easy to merge, because all the elements in one sub-sequence are not greater than those in the other: the merge is equivalent to concatenation. Merge sort is easy to partition, whether we cut at the middle, use the even-odd split, the nature split, or the bottom-up split. It's difficult to achieve a perfect partition in quick sort; we can't completely avoid the worst case, whether with the median-of-three pivot, random quick sort, or ternary quick sort.
As of this chapter, we've seen the elementary sort algorithms, including insertion sort, tree sort, selection sort, heap sort, quick sort, and merge sort. Sorting is an important domain in computer algorithm design. People are facing the ‘big data’ challenge as I write this chapter; it has become routine to sort hundreds of gigabytes of data with limited resources and time.

Exercise 13.4
13.4.1. Build a binary search tree from a sequence using the idea of merge sort.

13.5 Appendix: Example programs


In-place partition:
Int partition([K] xs, Int l, Int u) {
    for (Int pivot = l, Int r = l + 1; r < u; r = r + 1) {
        if xs[pivot] ≥ xs[r] {
            l = l + 1
            swap(xs[l], xs[r])
        }
    }
    swap(xs[pivot], xs[l])
    return l + 1
}

Void sort([K] xs, Int l, Int u) {
    if l < u {
        Int m = partition(xs, l, u)
        sort(xs, l, m - 1)
        sort(xs, m, u)
    }
}

Bi-directional scan:
Void sort([K] xs, Int l, Int u) {
    if l < u - 1 {
        Int pivot = l, Int i = l, Int j = u
        loop {
            i = i + 1    // always advance at least one step, as in the pseudocode
            while i < u and xs[i] < xs[pivot] {
                i = i + 1
            }
            j = j - 1
            while j ≥ l and xs[pivot] < xs[j] {
                j = j - 1
            }
            if j < i then break
            swap(xs[i], xs[j])
        }
        swap(xs[pivot], xs[j])
        sort(xs, l, j)
        sort(xs, i, u)
    }
}

Merge sort:
[K] sort([K] xs) {
    Int n = length(xs)
    if n > 1 {
        var ys = sort(xs[0 ... n/2 - 1])
        var zs = sort(xs[n/2 ...])
        xs = merge(xs, ys, zs)
    }
    return xs
}

[K] merge([K] xs, [K] ys, [K] zs) {
    Int i = 0
    while ys ≠ [] and zs ≠ [] {
        xs[i] = if ys[0] < zs[0] then pop(ys) else pop(zs)
        i = i + 1
    }
    xs[i...] = if ys ≠ [] then ys else zs
    return xs
}

Merge sort with work area:


Void sort([K] xs) = msort(xs, copy(xs), 0, length(xs))

Void msort([K] xs, [K] ys, Int l, Int u) {
    if (u - l > 1) {
        Int m = l + (u - l) / 2
        msort(xs, ys, l, m)
        msort(xs, ys, m, u)
        merge(xs, ys, l, m, u)
    }
}

Void merge([K] xs, [K] ys, Int l, Int m, Int u) {
    Int i = l, Int k = l, Int j = m
    while i < m and j < u {
        ys[k++] = if xs[i] < xs[j] then xs[i++] else xs[j++]
    }
    while i < m {
        ys[k++] = xs[i++]
    }
    while j < u {
        ys[k++] = xs[j++]
    }
    while l < u {
        xs[l] = ys[l]
        l++
    }
}

In-place merge sort:


Void merge([K] xs, (Int i, Int m), (Int j, Int n), Int w) {
    while i < m and j < n {
        swap(xs, w++, if xs[i] < xs[j] then i++ else j++)
    }
    while i < m {
        swap(xs, w++, i++)
    }
    while j < n {
        swap(xs, w++, j++)
    }
}

Void wsort([K] xs, (Int l, Int u), Int w) {
    if u - l > 1 {
        Int m = l + (u - l) / 2
        imsort(xs, l, m)
        imsort(xs, m, u)
        merge(xs, (l, m), (m, u), w)
    }
    else {
        while l < u { swap(xs, l++, w++) }
    }
}

Void imsort([K] xs, Int l, Int u) {
    if u - l > 1 {
        Int m = l + (u - l) / 2
        Int w = l + u - m
        wsort(xs, (l, m), w)
        while w - l > 2 {
            Int n = w
            w = l + (n - l + 1) / 2
            wsort(xs, (w, n), l)
            merge(xs, (l, l + n - w), (n, u), w)
        }
        for Int n = w; n > l; --n {
            for Int m = n; m < u and xs[m] < xs[m-1]; m++ {
                swap(xs, m, m - 1)
            }
        }
    }
}

Iterative bottom up merge sort:


[K] sort([K] xs) {
    var ys = [[x] | x in xs]
    while length(ys) > 1 {
        ys += merge(pop(ys), pop(ys))
    }
    return if ys == [] then [] else pop(ys)
}

[K] merge([K] xs, [K] ys) {
    [K] zs = []
    while xs ≠ [] and ys ≠ [] {
        zs += if xs[0] < ys[0] then pop(xs) else pop(ys)
    }
    return zs ++ (if xs ≠ [] then xs else ys)
}
Chapter 14

Solution search

Computers enable people to search for solutions to many problems: we build robots that search for and pick gadgets on the assembly line; we develop car navigators that search the map for the best route; we make smart phone applications that search for the best shopping plan. This chapter is about elementary lookup, matching, and solution search algorithms.

14.1 k selection problem


A selection problem is to find the k-th smallest (or largest, k > 0) element in a collection xs (a list or array). The ordering is abstract, denoted as ≤. The intuitive method repeatedly finds the minimum, k times. It takes O(n) time to find the minimum, where n = |xs| is the size; the total performance is bound to O(kn). Alternatively, we can use a heap to access and update the top element in O(lg n) time, hence finding the k-th element in O(k lg n) time.
top k xs = find k (heapify xs)      (14.1)

Or in Curried form:

top k = (find k) ◦ heapify      (14.2)

Where:

find 1 = top
find k = (find (k − 1)) ◦ pop      (14.3)
We can do even better with the divide and conquer method. Pick an arbitrary element p in xs, and split xs into as and bs with p (xs = as ++ [p] ++ bs), where as = [x ← xs, x ≤ p] and bs = [x ← xs, x > p]. Let m = |as| be the size of as, and compare m and k:

1. If m = k − 1, then p is the k-th element;

2. If m < k − 1, the k-th element is in bs; drop as and the pivot, and recursively search for the (k − m − 1)-th element in bs;

3. If m > k − 1, the k-th element is in as; drop bs and recursively search in as.
Reuse the part function from quick sort (see eq. (13.2)):

top k (x:xs) = | m = k − 1 :  x
               | m < k − 1 :  top (k − m − 1) bs
               | otherwise :  top k as
where m = |as|, (as, bs) = part (≤ x) xs      (14.4)

In the ideal case, the split is balanced (the sizes of as and bs are almost the same), halving the size every time. The performance is O(n + n/2 + n/4 + ...) = O(n). As with quick sort, the worst case happens when the partition is always unbalanced; the performance downgrades to O(kn) or O((n − k)n). On average, we can find the k-th element in linear time. Most engineering practices in quick sort are applicable too, like the ‘median of three’1 and the random pivot:
1: function Top(k, xs, l, u)
2: Exchange xs[l] ↔ xs[ Random(l, u) ] . Randomly select in [l, u]
3: p ← Partition(xs, l, u)
4: if p − l + 1 = k then
5: return xs[p]
6: if k < p − l + 1 then
7: return Top(k, xs, l, p − 1)
8: return Top(k − p + l − 1, xs, p + 1, u)
We can change it to return all the top k elements (in arbitrary order), as in the below example program:

tops _ [] = []
tops 0 _ = []
tops n (x:xs) | len == n = as
              | len < n = as ++ [x] ++ tops (n - len - 1) bs
              | otherwise = tops n as
    where
      (as, bs) = partition (≤ x) xs
      len = length as

14.2 Binary search


My high school teacher once played a ‘math magic’. He asked a student to pick a number between 0 and 1000 in mind, then asked 10 questions, and figured out that number from the yes/no answers. For example: is it even? is it prime? can it be divided by 3? etc. If every question halves the candidate numbers, one can find any number within 1000, because 2^10 = 1024 > 1000. The question of whether it is even perfectly halves the numbers2. The game becomes less interesting when the player guesses like: 1000, high; 50, low; 750, low; 890, low; 990, correct! This is the binary search method. To find x in an ordered sequence A, one first tries the middle point y. Done if x = y; if x < y, then drop the second half of A, as it's ordered; otherwise drop the first half. When A = [ ], then x doesn't exist. A must be ordered; I often see people struggle with unordered data, confused about why binary search does not work. ‘Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky’, said Donald Knuth. Jon Bentley said most binary search implementations had errors, including the one he gave in ‘Programming Pearls’; he corrected the error after two decades [2]. Below is the binary search definition, where the lower and upper bounds of A are l and u.

1 Blum, Floyd, Pratt, Rivest, and Tarjan developed a linear time algorithm in 1973 [4][81]: split the elements into groups of at most 5 elements each, giving n/5 medians; repeat this to pick the median of medians.
2 There's a ‘mind reading’ game on social networks: one thinks of a person; the AI robot asks 16 questions, and tells who that person is from the yes/no answers.

bsearch x A (l, u) = | u < l :       Nothing
                     | x = A[m] :    Just m, where m = l + ⌊(u − l)/2⌋
                     | x < A[m] :    bsearch x A (l, m − 1)
                     | otherwise :   bsearch x A (m + 1, u)      (14.5)

Or implement with loops:


1: function Binary-Search(x, A, l, u)
2: while l < u do
3: m ← l + ⌊(u − l)/2⌋ . avoid ⌊(l + u)/2⌋ overflow
4: if A[m] = x then
5: return m
6: if x < A[m] then
7: u←m−1
8: else
9: l ←m+1
10: Not found
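For reference, the below Haskell program is a direct transcription of eq. (14.5) over an immutable array (the Array representation is our choice for this sketch):

import Data.Array (Array, (!), listArray)

bsearch :: (Ord a) => a -> Array Int a -> (Int, Int) -> Maybe Int
bsearch x a (l, u)
  | u < l      = Nothing
  | x == a ! m = Just m
  | x <  a ! m = bsearch x a (l, m - 1)
  | otherwise  = bsearch x a (m + 1, u)
  where m = l + (u - l) `div` 2

-- e.g. bsearch 9 (listArray (0, 4) [1, 3, 7, 9, 12]) (0, 4) == Just 3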
The performance of binary search is bound to O(lg n), because it halves A every time. We can extend it to solve equations of monotone functions, for example a^x = y, where a ≤ y, and a and y are natural numbers. To find the integral x, we could exhaust a^0, a^1, a^2, ..., till either a^i = y, or a^i < y < a^(i+1) (no solution). If a and x are big numbers, it's expensive to compute a^x in loops3. Let's apply binary search instead. As a^y ≥ y, we search in [0, 1, ..., y]. The function f(x) = a^x is monotone. We examine the middle point x_m = ⌊(0 + y)/2⌋: if f(x_m) = y, then x_m is the solution; if f(x_m) < y, we discard the range before x_m; otherwise we discard the range after x_m. Either way halves the search range. When the range becomes empty, there is no solution. Below is the implementation. To start, denote the monotone function as f and call bsearch f y (0, y), where f(x) = a^x. This method computes f(x) O(lg y) times, better than the exhaustive search.

bsearch f y (l, u) = | u < l :       Nothing
                     | f(m) = y :    Just m, where m = ⌊(l + u)/2⌋
                     | f(m) < y :    bsearch f y (m + 1, u)
                     | f(m) > y :    bsearch f y (l, m − 1)      (14.6)
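A Haskell sketch of eq. (14.6); the wrapper name solveMono is ours:

bsearchM :: (Integer -> Integer) -> Integer -> (Integer, Integer) -> Maybe Integer
bsearchM f y (l, u)
  | u < l     = Nothing
  | f m == y  = Just m
  | f m < y   = bsearchM f y (m + 1, u)
  | otherwise = bsearchM f y (l, m - 1)
  where m = (l + u) `div` 2

-- solve f x == y for monotone f by searching x in [0, y]
solveMono :: (Integer -> Integer) -> Integer -> Maybe Integer
solveMono f y = bsearchM f y (0, y)

-- e.g. solveMono (2 ^) 1024 == Just 10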

14.2.1 2D search
Extend binary search to 2D or even higher dimensions. Consider a matrix M of size m × n. The elements in each row and each column are ascending natural numbers, as shown in fig. 14.1. How do we locate all elements equal to z, i.e., find all locations (i, j) such that M[i, j] = z?

[(x, y) | x ← [1, 2, ..., m], y ← [1, 2, ..., n], M[x, y] = z]      (14.7)

Richard Bird used to interview students with this question [1]. Those who had programming experience at school tended to apply binary search, but it was easy to get stuck. One often checks the middle point M[m/2, n/2]: if it is less than z, drop the top-left rectangle; if greater than z, drop the bottom-right rectangle, as shown in fig. 14.2 (discard the shaded rectangle). Both cases lead to an L-shaped search area, where one can't apply recursive search directly any more. Define the 2D search as: given f(x, y), search for integer solutions (x, y) such that f(x, y) = z. The matrix search is just a special case:
3 One can reuse the result of a^n to compute a^(n+1) = a·a^n. We consider a generic monotone f(n).
 
1 2 3 4 ...
2 4 5 6 ...
3 5 7 8 ...
4 6 8 9 ...
...

Figure 14.1: Each row and each column is ascending.

Figure 14.2: Left: the middle point < z; the whole shaded rectangle is < z. Right: the middle point > z; the whole shaded rectangle is > z.

f(x, y) = | 1 ≤ x ≤ m, 1 ≤ y ≤ n :  M[x, y]
          | otherwise :              −1

For a monotone function f(x, y), e.g., f(x, y) = x^a + y^b where a, b are natural numbers, the effective method is to search from the top-left corner, not the bottom-left [82]. As shown in fig. 14.3, start from (0, z). For each location (p, q), compare f(p, q) and z:

1. If f (p, q) < z, since f is monotone increasing, f (p, y) < z for all 0 ≤ y < q. Drop
all points in the vertical line segment (red);

2. If f (p, q) > z, then f (x, q) > z for all p < x ≤ z. Drop all points in the horizontal
line segment (blue);

3. If f (p, q) = z, then (p, q) is a solution. Drop both line segments.

Reduce the search rectangle line by line, every time drop a row, or a column, or both.

Figure 14.3: Search from top-left.

Define the search function as below, and start it from the top-left corner: search f z 0 z.

search f z p q = | p > z or q < 0 :  [ ]
                 | f(p, q) < z :     search f z (p + 1) q
                 | f(p, q) > z :     search f z p (q − 1)
                 | f(p, q) = z :     (p, q) : search f z (p + 1) (q − 1)      (14.8)
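The below Haskell program is a direct transcription of eq. (14.8); saddleback is our name for the starting wrapper:

saddleback :: (Integer -> Integer -> Integer) -> Integer -> [(Integer, Integer)]
saddleback f z = search 0 z
  where
    search p q
      | p > z || q < 0 = []
      | f p q < z      = search (p + 1) q
      | f p q > z      = search p (q - 1)
      | otherwise      = (p, q) : search (p + 1) (q - 1)

-- e.g. saddleback (\x y -> x*x + y*y) 25 == [(0,5),(3,4),(4,3),(5,0)]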

Every time, at least one of p and q advances towards the bottom or the right by one, so it needs at most 2(z + 1) steps. There are three best cases: (1) both p and q advance once per step, giving z + 1 steps in total; as in fig. 14.4 (a), all points on the diagonal (x, z − x) satisfy f(x, z − x) = z, and we reach (z, 0) in z + 1 steps. (2) Move right horizontally till p exceeds z; as in fig. 14.4 (b), all points on the top horizontal line (x, z) satisfy f(x, z) < z, and it terminates after z + 1 steps. (3) Move down vertically till q becomes negative; as in fig. 14.4 (c), all points on the left vertical line (0, x) satisfy f(0, x) > z, and it terminates after z + 1 steps. Figure 14.4 (d) shows the worst case: projecting all the horizontal sections of the search path to the x axis, and all the vertical sections to the y axis, gives the total of 2(z + 1) steps. This method improves the performance from O(z²) of exhaustive search to O(z).

Figure 14.4: The best and worst cases.

This algorithm is called the ‘saddle back’ search: the plot of f has the smallest value at the bottom-left and the largest at the top-right, and looks like a saddle with two wings, as shown in fig. 14.5. We can further reduce the search rectangle (0, z) − (z, 0). Since f is monotone increasing, find the maximum m along the y axis satisfying f(0, m) ≤ z, and the maximum n along the x axis satisfying f(n, 0) ≤ z. This reduces the search rectangle to (0, m) − (n, 0), as shown in fig. 14.6.

m = max {0 ≤ y ≤ z, f(0, y) ≤ z}
n = max {0 ≤ x ≤ z, f(x, 0) ≤ z}      (14.9)

We can apply binary search to find m and n (fix x = 0 to search for m; fix y = 0 to search for n). Modify eq. (14.6) to search for l ≤ x ≤ u satisfying f(x) ≤ y < f(x + 1):

Figure 14.5: Plot of f(x, y) = x² + y².

Figure 14.6: Reduced search rectangle.



bsearch f y (l, u) = | u ≤ l :                  l
                     | f(m) ≤ y < f(m + 1) :   m, where m = ⌊(l + u)/2⌋
                     | f(m) ≤ y :              bsearch f y (m + 1, u)
                     | f(m) > y :              bsearch f y (l, m − 1)      (14.10)

Then determine m, n with binary search:

m = bsearch (y ↦ f(0, y)) z (0, z)
n = bsearch (x ↦ f(x, 0)) z (0, z)      (14.11)

Finally, apply the saddle back search in this smaller rectangle: solve(f, z) = search f z 0 m, where:

search f z p q = | p > n or q < 0 :  [ ]
                 | f(p, q) < z :     search f z (p + 1) q
                 | f(p, q) > z :     search f z p (q − 1)
                 | f(p, q) = z :     (p, q) : search f z (p + 1) (q − 1)      (14.12)

We apply two rounds of binary search to find m and n; each round computes f O(lg z) times. The saddle back search computes f O(m + n) times in the worst case, and O(min(m, n)) times in the best case, as in the below table. For functions like f(x, y) = x^a + y^b, a, b ∈ N, the boundaries m, n are very small, and the total performance is close to O(lg z).

           steps to compute f
worst      2 lg z + m + n
best       2 lg z + min(m, n)

As shown in fig. 14.7, for a point (p, q) in the rectangle (a, b) − (c, d), if f(p, q) ≠ z, we can only discard the shaded part (≤ 1/4 of the rectangle). If f(p, q) = z, we can discard the bottom-left and top-right parts, plus all points in row p and column q, since f is monotone; hence we reduce the search rectangle by 1/2. To find a point satisfying f(p, q) = z, we apply binary search along the horizontal or the vertical central line. Because the performance is bound to O(lg |L|) for a line L, we choose the shorter central line, as shown in fig. 14.8.
If there is no point satisfying f(p, q) = z, we instead find a point such that f(p, q) < z < f(p + 1, q) on the horizontal central line (or f(p, q) < z < f(p, q + 1) on the vertical one). In this case, we can't discard all the points in row p and column q. In summary, we apply binary search along the horizontal central line for the point f(p, q) ≤ z < f(p + 1, q), or along the vertical central line for the point f(p, q) ≤ z < f(p, q + 1). If all points on the line segment satisfy f(p, q) < z, return the upper bound; if all satisfy f(p, q) > z, return the lower bound. We still discard half a side in this case. Below is the improved saddle back search:

1. Apply binary search along the x, y axes for the search rectangle (0, m) − (n, 0);

2. For rectangle (a, b) − (c, d), if the height > width, apply binary search along the
horizontal central line; otherwise search along the vertical central line for the point
(p, q);

3. If f(p, q) = z, it is a solution. Recursively search the rectangles (a, b) − (p − 1, q + 1) and (p + 1, q − 1) − (c, d);

(a) If f(p, q) ≠ z, we can only drop the shaded area; the remainder is an ‘L’ shape.
(b) If f(p, q) = z, we can drop 1/2 of the rectangle.

Figure 14.7: Reduce the search rectangle.

Figure 14.8: Choose the shorter central line.



Figure 14.9: Recursively search the shaded parts, including the bold line if f(p, q) ≠ z.

4. If f(p, q) ≠ z, recursively search the two rectangles and a line section: either (p, q + 1) − (p, b) in fig. 14.9 (a), or (p + 1, q) − (c, q) in fig. 14.9 (b).

search (a, b) (c, d) = | c < a or b < d :  [ ]
                       | c − a < b − d :   csearch
                       | otherwise :       rsearch      (14.13)

Where csearch applies binary search along the horizontal central line for the point (p, q) such that f(p, q) ≤ z < f(p + 1, q), as shown in fig. 14.9 (a). If all the function values are greater than z, return the lower bound (a, ⌊(b + d)/2⌋), and drop the upper side (including the central line), as shown in fig. 14.10 (a).

Figure 14.10: Special case.

Let:

q = ⌊(b + d)/2⌋
p = bsearch (x ↦ f(x, q)) z (a, c)

csearch = | f(p, q) > z :  search (p, q − 1) (c, d)
          | f(p, q) = z :  search (a, b) (p − 1, q + 1) ++ [(p, q)] ++ search (p + 1, q − 1) (c, d)
          | f(p, q) < z :  search (a, b) (p, q + 1) ++ search (p + 1, q − 1) (c, d)      (14.14)
The function rsearch is symmetric, working along the vertical central line. The below example program implements the improved saddle back search:
solve f z = search f z (0, m) (n, 0) where
  m = bsearch (f 0) z (0, z)
  n = bsearch (λx → f x 0) z (0, z)

search f z (a, b) (c, d)
    | c < a || b < d = []
    | c - a < b - d = let q = (b + d) `div` 2 in
                      csearch (bsearch (λx → f x q) z (a, c), q)
    | otherwise = let p = (a + c) `div` 2 in
                  rsearch (p, bsearch (f p) z (d, b))
  where
    csearch (p, q)
      | z < f p q = search f z (p, q - 1) (c, d)
      | f p q == z = search f z (a, b) (p - 1, q + 1) ++
                     (p, q) : search f z (p + 1, q - 1) (c, d)
      | otherwise = search f z (a, b) (p, q + 1) ++
                    search f z (p + 1, q - 1) (c, d)
    rsearch (p, q)
      | z < f p q = search f z (a, b) (p - 1, q)
      | f p q == z = search f z (a, b) (p - 1, q + 1) ++
                     (p, q) : search f z (p + 1, q - 1) (c, d)
      | otherwise = search f z (a, b) (p - 1, q + 1) ++
                    search f z (p + 1, q) (c, d)

As we halve the rectangle every time, the search takes O(lg(mn)) rounds. In each round, we apply binary search along the central line for (p, q), computing f O(lg(min(m, n))) times. Let the time be T(m, n) when searching an m × n rectangle. We have the following recursive equation:

T(m, n) = lg(min(m, n)) + 2T(m/2, n/2)      (14.15)

Suppose m = 2^i > n = 2^j; use the telescope method:

T(2^i, 2^j) = j + 2T(2^(i−1), 2^(j−1))
            = Σ_{k=0}^{i−1} 2^k (j − k)
            = O(2^i (j − i))
            = O(m lg(n/m))      (14.16)

Richard Bird proves this is asymptotically optimal, by giving a lower bound for searching a given value in an m × n rectangle [1].

Exercise 14.1
14.1.1. Prove that the performance of the k-selection problem is O(n) on average.
14.1.2. To find the top k elements in A, we can search x = max (take k A), y = min (drop k A). If x < y, then the first k elements in A are the answer; otherwise, partition the first k elements with x, partition the rest with y, then recursively find the top k′ elements in the sub-sequence [a ← A, x < a < y], where k′ = k − |[a ← A, a ≤ x]|. Implement this solution, and evaluate its performance.

14.1.3. Find the ‘simplified’ median of two sorted arrays A and B in O(lg(m + n)) time, where m = |A|, n = |B|. The array index starts from 0. The simplified median is defined as median(A, B) = C[⌊(m + n)/2⌋], where C = merge(A, B) is the merged sorted array4.
14.1.4. For the saddle back search, eliminate the recursion and implement it with loops that update the boundaries.
14.1.5. For the 2D search, the bottom-left is the minimum, and the top-right is the maximum. If z is less than the minimum or greater than the maximum, there is no solution; otherwise, cut the rectangle into 4 parts with a horizontal line and a vertical line crossing at the center, then recursively search these 4 small rectangles. Implement this solution and evaluate its performance.

14.3 The majority number


People often vote and use computers to count the result. Suppose a candidate wins if and only if he or she gets more than half of the votes. From the vote sequence A, B, A, C, B, B, D, ..., how do we find the winner efficiently? We can use a map to count the result (see chapter 2)⁶.
Optional<T> majority([T] xs) {
    Map<T, Int> m
    for var x in xs {
        if x in m then m[x]++ else m[x] = 1
    }
    var (r, v) = (Optional<T>.Nothing, length(xs) / 2)
    for var (x, c) in m {
        if c > v then (r, v) = (Optional.of(x), c)
    }
    return r
}

The map is often implemented as a red-black tree or a hash table. For m candidates and n votes, the below table gives the performance:

map                   time        space
self-balancing tree   O(n lg m)   O(m)
hash table            O(n)        at least O(m)

Define the element that occurs over 50% of the time as the ‘majority’. Boyer and Moore developed an algorithm in 1980, which picks the majority element in one scan with constant space, if it exists [83]. There is at most 1 majority. Repeatedly drop two different elements till all the remaining are the same; if the majority exists, then it is what remains. Start from the first vote: let that candidate be the winner so far, with a count of 1. If the next one votes for the same candidate, then increase the winner's count by 1, otherwise decrease it by 1. The candidate is no longer the winner when the count drops to 0. We pick the candidate of the next vote as the
⁴ In statistics, the median of an ascending data set x with n elements is defined as:

median(x) = | odd(n) : x[(n + 1)/2]
            | even(n) : (x[n/2] + x[n/2 + 1])/2

⁶ There is a probabilistic sub-linear space counting algorithm published in 2004, named ‘Count-min sketch’ [84].

new winner and go on. As shown in the below table, if there exists a majority m, then the other candidates can't beat m. Otherwise, if the majority doesn't exist (an invalid vote result, no winner), we discard the recorded ‘winner’. We need another scan to validate the winner.

winner  count  votes scanned so far
A       1      A
A       0      A, B
C       1      A, B, C
C       0      A, B, C, B
B       1      A, B, C, B, B
B       0      A, B, C, B, B, C
A       1      A, B, C, B, B, C, A
A       0      A, B, C, B, B, C, A, B
A       1      A, B, C, B, B, C, A, B, A
A       0      A, B, C, B, B, C, A, B, A, B
B       1      A, B, C, B, B, C, A, B, A, B, B
B       0      A, B, C, B, B, C, A, B, A, B, B, D
B       1      A, B, C, B, B, C, A, B, A, B, B, D, B

maj [ ] = ∅
maj (x:xs) = scan (x, 1) xs        (14.17)

Where scan is defined as:


scan (m, v) [ ] = m
scan (m, v) (x:xs) = | x = m : scan (m, v + 1) xs
                     | v = 0 : scan (x, 1) xs        (14.18)
                     | otherwise : scan (m, v − 1) xs

Or implement with fold (Curried form): maj = foldr f (∅, 0), where:

f x (m, v) = | x = m : (m, v + 1)
             | v = 0 : (x, 1)        (14.19)
             | otherwise : (m, v − 1)

Finally, verify that the winner is the true majority:

verify m xs = if 2 |filter (= m) xs| > |xs| then Just m else Nothing        (14.20)
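Putting eq. (14.17)–(14.20) together gives a runnable Haskell sketch (a sketch only: the name majority is ours, and Maybe plays the role of the optional result):

-- Boyer-Moore voting: one scan with constant space,
-- then a second scan to validate the candidate
majority :: Eq a => [a] -> Maybe a
majority [] = Nothing
majority (x:xs) = verify (scan (x, 1) xs) (x:xs)
  where
    scan (m, _) [] = m
    scan (m, v) (y:ys)
      | y == m = scan (m, v + 1) ys
      | v == 0 = scan (y, 1) ys
      | otherwise = scan (m, v - 1) ys

verify :: Eq a => a -> [a] -> Maybe a
verify m xs = if 2 * length (filter (== m) xs) > length xs
              then Just m else Nothing

For example, majority "abcbbcababbdb" gives Just 'b' ('b' occurs 7 times out of 13).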

Below is the corresponding iterative implementation:


1: function Majority(A)
2: c ← 0, m ← ∅
3: for each a in A do
4: if c = 0 then
5: m←a
6: if a = m then
7: c←c+1
8: else
9: c←c−1

10: c←0
11: for each a in A do
12: if a = m then
13: c←c+1
14: if c > |A|/2 then
15: return Just m
16: else
17: return Nothing

Exercise 14.2

14.2.1. Extend the algorithm to find the k majorities that each occur more than ⌊n/k⌋ times in a collection A, where n = |A|. Hint: drop k different elements every time, till there remain fewer than k distinct candidates. Any k-majority (one that occurs over ⌊n/k⌋ times) must remain in the end.

14.4 Maximum sum of sub-vector


For a vector V, define the range V[i...j] as a sub-vector; the sum of the sub-vector is S = V[i] + V[i + 1] + ... + V[j]. The empty [ ] is a sub-vector of any vector, with sum 0. How do we find the maximum sub-vector sum of a given vector V [2]? For example, in the vector [3, -13, 19, -12, 1, 9, 18, -16, 15, -15], the sub-vector [19, -12, 1, 9, 18] gives the maximum sum of 35. If all elements are positive, then the max is the total sum. If all are negative, then the empty vector gives the max sum of 0. Below is the exhaustive search implementation:
1: function Max-Sum(V )
2: m ← 0, n ← |V |
3: for i ← 1 to n do
4: s←0
5: for j ← i to n do
6: s ← s + V [j]
7: m ← Max(m, s)
8: return m
The performance of the exhaustive search is O(n²), where n is the vector length. Similar to the majority number algorithm, we scan the vector. For every position i, record the sum of the sub-vector ending at i as B, and the maximum sum so far as A, as shown in fig. 14.11. B is not necessarily equal to A; we maintain B ≤ A at all times. When B + V[i] > A, we replace A with this greater value. When B + V[i] < 0, we reset B to 0. The below table gives the steps when scanning [3, −13, 19, −12, 1, 9, 18, −16, 15, −15].

Figure 14.11: A: max sum so far; B: sum of the sub-vector ending at i.

max sum max end at i yet to scan


0 0 [3, −13, 19, −12, 1, 9, 18, −16, 15, −15]
3 3 [−13, 19, −12, 1, 9, 18, −16, 15, −15]
3 0 [19, −12, 1, 9, 18, −16, 15, −15]
19 19 [−12, 1, 9, 18, −16, 15, −15]
19 7 [1, 9, 18, −16, 15, −15]
19 8 [9, 18, −16, 15, −15]
19 17 [18, −16, 15, −15]
35 35 [−16, 15, −15]
35 19 [15, −15]
35 34 [−15]
35 19 []

1: function Max-Sum(V )
2: A ← 0, B ← 0, n ← |V |
3: for i ← 1 to n do
4: B ← Max(B + V [i], 0)
5: A ← Max(A, B)
6: return A
Or implement with fold (Curried form): S_max = fst ◦ foldr f (0, 0), where f updates the maximum sum so far:

f x (S_m, S) = (max(S_m, S′), S′), where S′ = max(0, x + S)        (14.23)
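A direct Haskell rendering of eq. (14.23) (a sketch; the name maxSum is ours):

-- max sub-vector sum: fold from the right, carrying
-- (max sum so far, max sum of a sub-vector starting at the current position)
maxSum :: [Int] -> Int
maxSum = fst . foldr f (0, 0)
  where f x (sm, s) = let s' = max 0 (x + s) in (max sm s', s')

For example, maxSum [3, -13, 19, -12, 1, 9, 18, -16, 15, -15] gives 35.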

Exercise 14.3
14.3.1. Modify the solution that finds the max sum, to return the sub-vector that gives the maximum sum.
14.3.2. Bentley gives a divide and conquer algorithm to find the max sum in O(n lg n) time [2]: split the vector in the middle, recursively find the max sum of the two halves and the max sum that crosses the middle, then pick the greatest. Implement this solution.
14.3.3. Find the sub-matrix in an m × n matrix that gives the maximum sum.

14.5 String matching


String matching is widely used in editor applications. We can use data structures like the radix tree or prefix tree (see chapter 6) to search, or directly match the string as shown in fig. 14.12⁷. We match a pattern P in text T character by character. At offset s = 4, the first 4 characters are the same, as shown in fig. 14.12 (a); the 5th is ‘y’ in P, but ‘t’ in T. We stop here, increase s by 1 (move P right by 1), then restart matching ‘ananym’ against ‘nantho...’. Actually, we can increase s by more than 1: the first two characters ‘an’ happen to be the suffix of ‘anan’, so we can increase s by 2, as shown in fig. 14.12 (b). We reuse the information from the 4 matched characters to skip some positions. Knuth, Morris, and Pratt developed an efficient matching algorithm from this idea [85], known as ‘KMP’, the initials of the three authors.

⁷ Some programming environments provide matching tools, like strstr in the C library, find in the C++ library, and indexOf in the Java library.



(a) The offset s = 4; after matching q = 4 characters, the 5th mismatches.

(b) Move s = 4 + 2 = 6.

Figure 14.12: Match ‘ananym’ in ‘any ananthous ananym flower’.

Denote the first k characters of text T as T_k (the k-character prefix of T). To shift P to the right by s steps as far as possible, we need to reuse the information of the matched q characters. As shown in fig. 14.13, if we can shift P ahead, there exists some k, such that the first k characters of P are the same as the last k characters of P_q, i.e., the prefix P_k is a suffix of P_q. Define the empty string “” to be both a prefix and a suffix of any string, hence the minimum k = 0 always exists. We need to find the maximum k for the string that is both a prefix and a suffix. Define the prefix function π(q), which gives where to fall back when the (q + 1)-th character doesn't match [4].

Figure 14.13: P_k is both a prefix and a suffix of P_q.

π(q) = max{k|0 ≤ k < q, and Pk is suffix of Pq } (14.24)

When matching pattern P against text T from offset s, if it fails after matching q characters, we look up q′ = π(q) to get the fallback position, then retry comparing P[q′ + 1] with the text:
1: function KMP(T, P )
2: π ← Build-Prefixes(P )
3: n ← |T |, m ← |P |, q ← 0
4: for i ← 1 to n do

5: while q > 0 and P [q + 1] 6= T [i] do


6: q ← π(q)
7: if P [q + 1] = T [i] then
8: q ←q+1
9: if q = m then
10: position i − m is a solution
11: q ← π(q) . search further
The definition in eq. (14.24) is not practical for building π(q). Let us pre-process P. If the first character doesn't match, then the longest prefix and suffix is empty: π(1) = 0, i.e., P_k = P_0 = “”. When scanning the q-th character in P, the prefix function values π(i), i = 1, 2, ..., q − 1, are already known, and so far, the longest prefix P_k is also a suffix of P_{q−1}. As shown in fig. 14.14, if P[q] = P[k + 1], we find a greater k, and increase k by 1; otherwise, if P[q] ≠ P[k + 1], we look up π(k) and fall back to a shorter prefix P_{k′}, where k′ = π(k), then compare the next character of this new prefix with the q-th character. Repeat this till k becomes zero (the empty string), or the q-th character matches. The below table gives the prefix function values of ‘ananym’. The k column is the maximum satisfying eq. (14.24).

Figure 14.14: P_k is a suffix of P_{q−1}; compare P[q] and P[k + 1].

q Pq k Pk
1 a 0 “”
2 an 0 “”
3 ana 1 a
4 anan 2 an
5 anany 0 “”
6 ananym 0 “”

1: function Build-Prefixes(P )
2: m ← |P |, k ← 0
3: π(1) ← 0
4: for q ← 2 to m do
5: while k > 0 and P [q] 6= P [k + 1] do
6: k ← π(k)
7: if P [q] = P [k + 1] then
8: k ←k+1
9: π(q) ← k
10: return π
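The lazy array idiom gives a compact Haskell rendering of Build-Prefixes (a sketch for a non-empty pattern; the names prefixes and fallback are ours):

import Data.Array

-- prefix function: the q-th entry is the length of the longest
-- proper prefix of P[1..q] that is also its suffix
prefixes :: String -> Array Int Int
prefixes p = piA
  where
    m = length p
    pa = listArray (1, m) p
    -- piA ! q only depends on entries before q, so the
    -- self-reference is well-founded under lazy evaluation
    piA = listArray (1, m) (0 : [fallback (piA ! (q - 1)) q | q <- [2 .. m]])
    fallback k q
      | k > 0 && pa ! q /= pa ! (k + 1) = fallback (piA ! k) q
      | pa ! q == pa ! (k + 1)          = k + 1
      | otherwise                       = 0

For example, elems (prefixes "ananym") gives [0, 0, 1, 2, 0, 0], matching the table above.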
The KMP algorithm builds the prefix function in amortized O(m) time [4], and matches the string in amortized O(n) time, where m = |P| and n = |T| are the lengths. The total amortized performance is O(m + n), with additional O(m) space to store the prefix function. Variations of the pattern string P don't impact the performance. Consider matching the pattern ‘aaa...ab’ (of length m) in the string ‘aaa...a’ (of length n): the m-th character doesn't match, and we can only fall back by 1 repeatedly. The algorithm is still bound to linear time in this case.

14.6 Solution search

In the early years of artificial intelligence, people developed methods to search for solutions. Different from sequence and string matching, the solution may not directly exist among a set of candidates. We need to construct the solution while trying various options. Some problems are not solvable. Among the solvable ones, there can be multiple solutions; for example, a maze may have multiple ways out. Sometimes we need to find the optimal solution.

14.6.1 DFS and BFS

DFS stands for depth-first search, and BFS stands for breadth-first search. They are typical graph search algorithms. We give some examples and skip the formal definition of the graph.

Maze

The maze is a classic puzzle. There is a saying: always turn right. However, this ends up looping, as shown in fig. 14.15 (b). The decision matters when there are multiple ways. In fairy tales, one takes some bread crumbs into a maze: select a way and leave a piece of bread; if one later enters a dead end, one goes back to the last fork through the bread crumbs and takes another way. Whenever one sees bread crumbs left, one knows the place was visited before, goes back, and tries a different way. Repeating this ‘try and check’ step, one either finds the way out, or goes back to the starting point (no solution). We use an m × n matrix to define a maze; each element is 0 or 1, meaning there is a way or not. The below matrix defines the maze in fig. 14.15 (b):

(a) Maze. (b) Loop when we keep turning right.

Figure 14.15: Maze


0 0 0 0 0 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 1 1 1 1 0
0 0 0 0 0 0
1 1 1 1 1 0

Given a start point s = (i, j) and a destination e = (p, q), we need to find all paths from s to e. We first examine all points connected with s. For every such point k, recursively find all paths from k to e, then prepend the path s-k to every path from k to e. We need to leave some ‘bread crumbs’ to avoid looping: we use a list P to record all visited points, look it up, and only try new ways.

solveMaze M s e = solve s [[ ]] (14.25)

Where:
solve s P = | s = e : map (reverse ◦ (s :)) P
            | otherwise : concat [solve k (map (s :) P) | k ← adj s, k ∉ P]        (14.26)

The paths in P are reversed; we reverse the result back at the end. Function adj p enumerates the points adjacent to p:

adj (x, y) = [(x′, y′) | (x′, y′) ← [(x − 1, y), (x + 1, y), (x, y − 1), (x, y + 1)],
              1 ≤ x′ ≤ m, 1 ≤ y′ ≤ n, M_{x′y′} = 0]        (14.27)

This essentially exhaustively searches all possible paths. Often we only need one way out. We need some data structure to serve as the ‘bread crumbs’, recording the previous decisions, and we always search on top of the latest decision. We can use a stack to realize this last-in, first-out order. The stack starts from [s]. Pop s out, find all its connected points a, b, ..., and push the new paths [a, s], [b, s] to the stack. Next pop [a, s] out, examine all points connected to a, then push the new paths of 3 steps to the stack. Repeat this. The stack records paths in reversed order: from the farthest place back to the starting point, as shown in fig. 14.16. If the stack becomes empty, we've tried all the ways and terminate the search; otherwise, we pop a path, expand it to new adjacent points, and push the new paths back.

Figure 14.16: Search with a stack.

solveMaze M s e = solve [[s]] (14.28)

Where:

solve [ ] = []
c = e :
 reverse (p:ps)
solve ((p:ps):cs) = solve cs, where ks = f ilter (∈
ks = [ ] : / ps) (adj p)
solve ((map (: p:ps) ks) +

ks 6= [ ] : + cs)

(14.29)
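For example, a runnable Haskell sketch of eq. (14.29), under the assumption that the maze is a list of rows of 0/1 and points are 1-indexed (row, col) pairs (it returns Nothing when there is no way out):

type Point = (Int, Int)

-- free neighbor cells (0 means a way) of a point
adjacent :: [[Int]] -> Point -> [Point]
adjacent m (x, y) =
  [ (x', y') | (x', y') <- [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
             , 1 <= x', x' <= length m
             , 1 <= y', y' <= length (head m)
             , m !! (x' - 1) !! (y' - 1) == 0 ]

-- DFS with an explicit stack of reversed partial paths
solveMaze :: [[Int]] -> Point -> Point -> Maybe [Point]
solveMaze m s e = solve [[s]]
  where
    solve [] = Nothing
    solve ((p : ps) : cs)
      | p == e = Just (reverse (p : ps))
      | otherwise = solve ([k : p : ps | k <- adjacent m p, k `notElem` (p : ps)] ++ cs)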
Below is the iterative implementation:
1: function Solve-Maze(M, s, e)
2: S ← [s], L = [ ]
3: while S 6= [ ] do
4: P ← Pop(S)
5: p ← Last(P )
6: if e = p then
7: Add(L, P ) . find a solution
8: else
9: for each k in Adjacent(M, p) do
10: if k ∈
/ P then
11: Push(S, P +
+ [k])
12: return L
Each step tries 4 options (up, down, left, and right) through backtracking. It seems the performance is O(4^n), where n is the length of the path. The actual time isn't that large, because we skip the visited places. In the worst case, we traverse all the reachable points exactly once; hence the time is bound to O(n), where n is the number of connected points. We need additional O(n²) space for the stack.

Exercise 14.4
14.4.1. Modify the stack-based implementation to find all the ways out of the maze.

The eight queens puzzle


Although chess has a very long history, it was as late as 1848 that Max Bezzel gave the 8 queens puzzle [89]. The queen is a powerful piece: it attacks any other piece in the same row, column, or diagonal at any distance, as shown in fig. 14.17 (a). How do we arrange 8 queens on the chess board, such that none of them attacks another? Figure 14.17 (b) gives a solution.

(a) Queen (b) A solution

Figure 14.17: The eight queens puzzle.

There are in total C(64, 8) ≈ 4.4 × 10⁹ ways to put 8 queens in 64 cells. Since no two queens can be in the same row or column, a solution must be a permutation of

[1, 2, 3, 4, 5, 6, 7, 8]. For example, the permutation [6, 2, 7, 1, 3, 5, 8, 4] means the first queen is at row 1, column 6, the second queen is at row 2, column 2, ..., and the 8th queen is at row 8, column 4. As such, we reduce the solution domain to 8! = 40320 permutations. Arrange the queens from the first row; there are 8 options (columns). For the next queen, we need to skip some columns to avoid attacking the first queen. For the i-th queen, find the columns in row i that are not attacked by the first i − 1 queens. If all 8 columns are invalid, then go back to adjust the previous i − 1 queens. We find a solution after arranging all 8 queens; record it and search/backtrack further to find all solutions. Start searching with an empty stack and list: solve [[ ]] [ ]

solve [ ] s = s
solve (c:cs) s = | |c| = 8 : solve cs (c:s)        (14.30)
                 | otherwise : solve ([x:c | x ← [1..8], x ∉ c, safe x c] ++ cs) s
We've exhausted all options when the stack becomes empty; s records all the solutions. If the arrangement c at the stack top has a length of 8, we add this newly found solution to s, then continue searching; if |c| < 8, find the columns that are neither occupied (x ∉ c) nor attacked by another queen along a diagonal (through safe x c), then push the new valid arrangements to the stack.

safe x c = ∀(i, j) ← zip (reverse c) [1, 2, ...] ⇒ |x − i| ≠ |y − j|, where y = 1 + |c|        (14.31)

Function safe checks whether the queen at row y = 1 + |c|, column x shares a diagonal with any other queen. Let c = [i_{y−1}, i_{y−2}, ..., i_1] be the columns of the first y − 1 queens. Reverse c and zip it with 1, 2, ... to form the coordinates [(i_1, 1), (i_2, 2), ..., (i_{y−1}, y − 1)], then check that no (i, j) forms a diagonal with (x, y): |x − i| ≠ |y − j|. This implementation is tail recursive; we can turn it into loops:
1: function Solve-Queens
2: S ← [[ ]]
3: L←[] . Stores the solution
4: while S 6= [ ] do
5: A ← Pop(S) . A: arrangement
6: if |A| = 8 then
7: Add(L, A)
8: else
9: for i ← 1 to 8 do
10: if Valid(i, A) then
11: Push(S, A ++ [i])
12: return L

13: function Valid(x, A)


14: y ← 1 + |A|
15: for i ← 1 to |A| do
16: if x = A[i] or |y − i| = |x − A[i]| then
17: return False
18: return True
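Eq. (14.30) and (14.31) also run directly as a program. Below is a self-contained Haskell sketch (each solution lists the column per row, most recent row first):

queens :: [[Int]]
queens = solve [[]] []
  where
    solve [] s = s
    solve (c : cs) s
      | length c == 8 = solve cs (c : s)
      | otherwise =
          solve ([x : c | x <- [1 .. 8], x `notElem` c, safe x c] ++ cs) s
    -- the new queen sits at row y = 1 + |c|, column x
    safe x c = and [ abs (x - i) /= abs (y - j)
                   | (i, j) <- zip (reverse c) [1 ..] ]
      where y = 1 + length c

Evaluating length queens gives 92, the known number of solutions.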
We only try the unoccupied columns out of the 8, in total 15,720 arrangements, which is less than 8! = 40,320 [89]. Because the square board is horizontally and vertically symmetric, when we find a solution, we can rotate and flip it to obtain other symmetric solutions (see Exercise 14.5.2). We can extend the method to solve the n queens puzzle (n ≥ 4). However, the time increases fast along with n. The backtracking algorithm is only slightly better than the exhaustive permutations (bound to o(n!)).

Exercise 14.5
14.5.1. Extend the 8 queens puzzle to n queens.
14.5.2. There are 92 solutions to the 8 queens puzzle. For any solution, it's also a solution if rotated 90°; we can also flip a solution to get another one. There are essentially 12 distinct solutions. Write a program to find them.

The peg puzzle


As shown in fig. 14.18, there are 6 frogs on 7 stones, grouped on two sides. Each frog can hop to the next stone if it is not occupied, or leap over one frog to an empty stone. The frogs can only move forward or stay, but not go back. Figure 14.19 gives the rules. How do the frogs hop and leap, such that the left and right sides swap? Mark a left frog as -1, a right frog as 1, and the empty stone as 0. We are seeking the solution that transitions from s = [-1, -1, -1, 0, 1, 1, 1] to e = [1, 1, 1, 0, -1, -1, -1].

Figure 14.18: The leap frogs puzzle.

(a) Hop to the next stone. (b) Leap over to the right. (c) Leap over to the left.

Figure 14.19: Moving rules.

This is a special form of the peg puzzle. The number of pegs can be 8 or other even
numbers (as shown in Figure 14.208 ).

(a) Solitaire. (b) Hop over. (c) Draught board.

Figure 14.20: Variants of the peg puzzle

Label the stones from the left as 1, 2, ..., 7. There are at most 4 options for every move. At the start, for example, the frog on the 3rd stone can hop right to the empty stone; the frog on the 5th stone can hop left; the frog on the 2nd stone can leap right; the frog on the 6th stone can leap left. We record the stone status and try the 4 options at every step, backtracking to try other options when stuck. Because every frog can only move forward, a movement is not revertible; hence, we needn't worry about repetition. We record the steps only for the final output. A state L is some permutation of s; L[i] is ±1 or 0, indicating a frog on the i-th stone heading left (1), a frog heading right (-1), or an empty stone. Let the position of the empty stone be p. The 4 movements are:

⁸ from http://home.comcast.net/~stegmann/jumping.htm

1. Leap left: p < 6 and L[p + 2] > 0, swap L[p] ↔ L[p + 2];

2. Hop left: p < 7 and L[p + 1] > 0, swap L[p] ↔ L[p + 1];

3. Leap right: p > 2 and L[p − 2] < 0, swap L[p − 2] ↔ L[p];

4. Hop right: p > 1 and L[p − 1] < 0, swap L[p − 1] ↔ L[p].

Define four functions, leap_l, hop_l, leap_r, and hop_r, transitioning the state L ↦ L′. If a move isn't possible, the function returns L unchanged. We use a stack S to record the attempts. The stack starts from a singleton list containing the initial state. A list M records all solutions. We repeatedly pop the stack: if the state L = e, we add this new solution to M; otherwise, we try the 4 moves on top of L, and push the new states back.

solve [[−1, −1, −1, 0, 1, 1, 1]] [ ] (14.32)

Where:

solve [ ] s = s
solve (c:cs) s = | L = e : solve cs (reverse c : s)        (14.33)
                 | otherwise : solve ((map (: c) (moves L)) ++ cs) s
    where L = head c
Function moves tries the 4 movements from L:

moves L = filter (≠ L) [leap_l L, hop_l L, leap_r L, hop_r L]        (14.34)
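A runnable sketch of the move functions in Haskell (we use 0-indexed positions, so the boundary checks shift by one; free and swapAt are our helper names):

import Data.List (elemIndex)
import Data.Maybe (fromJust)

-- position of the empty stone (0-indexed)
free :: [Int] -> Int
free = fromJust . elemIndex 0

-- swap the elements at positions i and j
swapAt :: Int -> Int -> [Int] -> [Int]
swapAt i j l = [pick k x | (k, x) <- zip [0 ..] l]
  where pick k x | k == i = l !! j
                 | k == j = l !! i
                 | otherwise = x

leapL, hopL, leapR, hopR :: [Int] -> [Int]
leapL l = let p = free l in if p < 5 && l !! (p + 2) > 0 then swapAt p (p + 2) l else l
hopL  l = let p = free l in if p < 6 && l !! (p + 1) > 0 then swapAt p (p + 1) l else l
leapR l = let p = free l in if p > 1 && l !! (p - 2) < 0 then swapAt p (p - 2) l else l
hopR  l = let p = free l in if p > 0 && l !! (p - 1) < 0 then swapAt p (p - 1) l else l

moves :: [Int] -> [[Int]]
moves l = filter (/= l) [leapL l, hopL l, leapR l, hopR l]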

Below is the corresponding iterative implementation:


1: function Solve(s, e)
2: S ← [[s]]
3: M ←[]
4: while S 6= [ ] do
5: s ← Pop(S)
6: if s[1] = e then
7: Add(M , Reverse(s))
8: else
9: for each m in Moves(s[1]) do
10: Push(S, m:s)
11: return M
This method gives two symmetric solutions (15 steps for each). Below table lists one:

step -1 -1 -1 0 1 1 1
1 -1 -1 0 -1 1 1 1
2 -1 -1 1 -1 0 1 1
3 -1 -1 1 -1 1 0 1
4 -1 -1 1 0 1 -1 1
5 -1 0 1 -1 1 -1 1
6 0 -1 1 -1 1 -1 1
7 1 -1 0 -1 1 -1 1
8 1 -1 1 -1 0 -1 1
9 1 -1 1 -1 1 -1 0
10 1 -1 1 -1 1 0 -1
11 1 -1 1 0 1 -1 -1
12 1 0 1 -1 1 -1 -1
13 1 1 0 -1 1 -1 -1
14 1 1 1 -1 0 -1 -1
15 1 1 1 0 -1 -1 -1

For 3 frogs on each side, it takes 15 steps. We extend 3 to n, and list the steps against
n:

frogs on each side   1   2   3    4    5   ...
number of steps      3   8   15   24   35  ...

The numbers of steps are all square numbers minus one: (n + 1)² − 1. Let us prove it:

Proof. Compare the start and end states: every frog moves ahead n + 1 stones, so the 2n frogs in total move 2n(n + 1) stones. Every frog on the left must meet every frog from the right exactly once, and one frog must leap over the other when they meet. Because there are in total n² meetings, and each leap moves a frog ahead 2 stones, the leaps move the frogs ahead 2n² stones in total. The remaining moves are hops, each moving a frog ahead 1 stone; there are 2n(n + 1) − 2n² = 2n hops. Summing the n² leaps and 2n hops, the total number of steps is n² + 2n = (n + 1)² − 1.
The three puzzles share a common solution structure: start from some state, e.g., the entrance of the maze, the empty chess board, or the pegs of [-1, -1, -1, 0, 1, 1, 1]; search for the solution, trying multiple options at every step, e.g., the 4 directions of up, down, left, and right in the maze, the 8 columns in each row, or leap and hop, right and left. Although we don't know how far a decision leads, we clearly know the final state, e.g., the exit of the maze, the complete arrangement of the 8 queens, or all pegs swapped. We apply the same strategy: repeatedly try an option; record it together with the newly obtained state; backtrack when stuck and try another option. We either find a solution or exhaust all the options, hence knowing the problem is unsolvable. There are variants, like stopping at the first solution, or continuing to search for all solutions. If we build a tree rooted at the starting state, where every branch represents an option, the search grows the tree deeper and deeper; we don't try alternatives at the same depth until we fail and backtrack. Figure 14.21 shows the search order with arrows that go down and then backtrack.

Figure 14.21: DFS search order.
We call it depth-first search (DFS), which is widely used in practice. Some programming environments, like Prolog, use DFS as the default evaluation model. For example, define a maze with rules in Prolog:
c(a, b). c(a, e).
c(b, c). c(b, f).

c(e, d). c(e, f).


c(f, c).
c(g, d). c(g, h).
c(h, f).

Where the predicate c(X, Y) means X is connected with Y. This is a directed predicate. We can add a reverse symmetric rule c(Y, X) or create an undirected predicate. Figure 14.22 shows the directed graph. Given two places X and Y, Prolog tells whether they are connected with the following program:

Figure 14.22: A directed graph.

go(X, X).
go(X, Y) :- c(X, Z), go(Z, Y).

This program says: a place X is connected with itself; given two places X and Y, if X is connected with Z, and Z is connected with Y, then X is connected with Y. When there are multiple choices of Z, Prolog picks one, then goes on searching. It only tries another Z if the recursive search completes (fails or succeeds) and backtracks. This is exactly DFS. We can apply DFS when we only need a solution, but don't care about the number of steps. For example, we need a way out of the maze, although it may not be the shortest one.

Exercise 14.6
14.6.1. Extend the pegs puzzle solution for n pegs on each side.

The wolf, goat, and cabbage puzzle


This traditional puzzle says that a farmer is going to cross the river with a wolf, a goat, and a bucket of cabbage. There is a small boat, and only the farmer can row it. The boat can only carry one thing at a time. The wolf would kill the goat, and the goat would eat the cabbage, if they stayed together without the farmer. The puzzle asks for the best (fastest) solution to cross the river.

Since the wolf doesn't eat the cabbage, the farmer can safely carry the goat to the other side and go back. No matter whether the farmer carries the wolf or the cabbage next, he needs to carry one thing back to avoid a conflict. To find the best solution, we try all options in parallel and compare. Regardless of the direction, count back and forth as 2 steps. We check all possible states after 1, 2, 3, ... steps, till the farmer and all things have moved to the other side at some step n. This is the best solution.
But how do we try all options in parallel? Consider a lucky draw. People pick a ball from a box of colored balls. There is one black ball, and the rest are white. Whoever picks the black ball wins; the others return their white balls to the box and wait for the next draw. We define the rule that nobody can try a second draw before all the others have picked. We line people up in a queue: every time, the first person picks a ball, and moves to the tail if he doesn't win. The queue ensures the fairness.

Figure 14.23: The i-th person de-queues, draws, then en-queues if he doesn't win.

We apply the same method to the cross river puzzle. Let the sets A and B contain the things on each side. At the start, A = {w, g, c, p} includes the wolf, the goat, the cabbage, and the farmer; B = ∅. We move the farmer, with or without another element, between A and B. If a set doesn't contain the farmer, then it must not contain conflicting elements. The goal is to swap the elements of A and B in the fewest steps. We initialize a queue Q with the start state: A = {w, g, c, p}, B = ∅. As far as Q isn't empty, we de-queue the head, try all the options, then en-queue the new states to the tail. We find the solution when the head becomes A = ∅, B = {w, g, c, p}. Figure 14.24 shows the search order. As all options at the same level are tried, we needn't backtrack.
We can represent a set with a binary number of four bits, each bit standing for an element: the wolf w = 1, the goat g = 2, the cabbage c = 4, and the farmer p = 8. 0 is the empty set, 15 is the full set. 3 = 1 + 2 means the set {wolf, goat}; it's invalid because the wolf would kill the goat. 6 = 2 + 4 is the other conflict, {goat, cabbage}. Every time, we move the highest bit (8), with or without another bit (4, 2, or 1), from one number to the other. The options are:

Figure 14.24: Start from 1; check all options 2, 3, 4 for the next step; then all options for the 3rd step, ...

mv A B = | B < 8 : [(A − 8 − i, B + 8 + i) | i ← [0, 1, 2, 4], i = 0 or A ∧ i ≠ 0]        (14.35)
         | otherwise : [(A + 8 + i, B − 8 − i) | i ← [0, 1, 2, 4], i = 0 or B ∧ i ≠ 0]
Where ∧ is the bitwise-and. We start searching from Q = {[(15, 0)]}, as solve Q:

solve ∅ = ∅
solve Q = | A = 0 : reverse c        (14.36)
          | otherwise : solve (pushAll (map (: c) (filter (valid c) (mv A B))) Q′)
    where (A, B) = head c, (c, Q′) = pop Q
Where valid c checks that the new state (A, B) is valid, i.e., neither side is 3 or 6, and the state is new ((A, B) ∉ c):

A ≠ 3, A ≠ 6, B ≠ 3, B ≠ 6, (A, B) ∉ c        (14.37)
Below is the iterative implementation:
1: function Solve
2: S ← []
3: Q ← {[(15, 0)]}
4: while Q 6= ∅ do
5: C ← DeQ(Q)
6: if C[1] = (0, 15) then
7: Add(S, Reverse(C))
8: else
9: for each m in Moves(C) do
10: if Valid(m, C) then
11: EnQ(Q, m:C)
12: return S
It outputs two best solutions:

Left river Right


wolf, goat, cabbage, farmer
wolf, cabbage goat, farmer
wolf, cabbage, farmer goat
cabbage wolf, goat, farmer
goat, cabbage, farmer wolf
goat wolf, cabbage, farmer
goat, farmer wolf, cabbage
wolf, goat, cabbage, farmer
14.6. SOLUTION SEARCH 247

Left river Right


wolf, goat, cabbage, farmer
wolf, cabbage goat, farmer
wolf, cabbage, farmer goat
wolf goat, cabbage, farmer
wolf, goat, farmer cabbage
goat wolf, cabbage, farmer
goat, farmer wolf, cabbage
wolf, goat, cabbage, farmer

Water jugs puzzle

Given two water jugs of 9 and 4 litres, how do we get 6 litres of water from the river? This puzzle has a history going back to ancient Greece. A story says the French mathematician Poisson solved this puzzle when he was a child. It also appears in the Hollywood movie ‘Die Hard 3’. Pólya uses this puzzle as an example of backwards induction [90].

Figure 14.25: The last two steps.

After filling the 9 litre jug, then pouring into the 4 litre jug twice, one obtains 1 litre of water, as shown in fig. 14.26. Backwards induction is a strategy, but not a detailed algorithm. It can't directly answer, for example, how to get 2 litres of water from two jugs of 899 and 1147 litres.

Figure 14.26: Fill the bigger jug, then pour into the smaller one twice.

Let the small jug be A, the big jug be B. There are 6 operations each time: (1) Fill
jug A; (2) Fill jug B; (3) Empty jug A; (4) Empty jug B; (5) Pour from jug A to B; (6)
Pour water from jug B to A. Below lists a series of operations (assume a < b < 2a).

A B operation
0 0 start
a 0 fill A
0 a pour A to B
a a fill A
2a - b b pour A to B
2a - b 0 empty B
0 2a - b pour A to B
a 2a - b fill A
3a - 2b b pour A to B
... ... ...

Whatever the operations are, the water in each jug must be xa + yb for some integers x and y, where a and b are the jug volumes. From number theory, we can get g litres of water if and only if g is a multiple of the greatest common divisor of a and b, i.e., gcd(a, b) | g. If gcd(a, b) = 1 (a and b are coprime), then we can get any natural number of g litres of water. Although we know a solution exists, we don't know the detailed steps. We can solve the Diophantine equation g = xa + yb, then design the operations from x and y. Assume x > 0, y < 0: we fill jug A in total x times, and empty jug B in total −y times. For example, with the small jug of a = 3 litres, the big jug of b = 5 litres, and the goal of g = 4 litres of water: because 4 = 3 × 3 − 5, we design the below operations:

A B operation
0 0 start
3 0 fill A
0 3 pour A to B
3 3 fill A
1 5 pour A to B
1 0 empty B
0 1 pour A to B
3 1 fill A
0 4 pour A to B

We fill jug A 3 times, empty jug B 1 time. We can apply the Extended Euclid algorithm
in number theory to find x and y:

(d, x, y) = gcdext (a, b) (14.38)

Where d = gcd(a, b), ax + by = d. Assume a < b, the quotient q and remainder r


satisfy b = aq + r. The common divisor d divides both a and b, hence d divides r too.
Because r < a, we can scale down the problem to find gcd(a, r):

(d, x′, y′) = gcdext (r, a)        (14.39)

Where d = x′r + y′a. Substitute r = b − aq in:

d = x′(b − aq) + y′a
  = (y′ − x′q)a + x′b        (14.40)

Comparing with d = ax + by, we have the following recursion:

x = y′ − ⌊b/a⌋ x′
y = x′        (14.41)

The edge case happens when a = 0: gcd(0, b) = b = 0a + 1b. Hence the extended Euclid algorithm can be defined as:

gcdext (0, b) = (b, 0, 1)
gcdext (a, b) = (d, y′ − ⌊b/a⌋ x′, x′)        (14.42)
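Eq. (14.42) translates to Haskell directly (a sketch; the name gcdExt is ours):

-- extended Euclid: returns (d, x, y) with a*x + b*y = d = gcd(a, b)
gcdExt :: Integer -> Integer -> (Integer, Integer, Integer)
gcdExt 0 b = (b, 0, 1)
gcdExt a b = let (d, x', y') = gcdExt (b `mod` a) a
             in (d, y' - (b `div` a) * x', x')

For example, gcdExt 4 9 gives (1, -2, 1).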
Where d, x′, y′ are defined in eq. (14.39), i.e., (d, x′, y′) = gcdext (b mod a, a). If g = md, then mx and my give a solution. If x < 0, for example gcdext (4, 9) = (1, −2, 1), then since d = xa + yb, we repeatedly add b to x and subtract a from y till x > 0. Such a solution may not be the best one. For example, to get 4 litres of water from two jugs of 3 and 5 litres, the extended Euclid algorithm gives 23 steps:
[(0,0),(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),
(0,4),(3,4),(2,5),(2,0),(0,2),(3,2),(0,5),(3,5),
(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),(0,4)]

While the best solution only needs 6 steps:


[(0,0),(0,5),(3,2),(0,2),(2,0),(2,5),(3,4)]

There are infinitely many solutions to the Diophantine equation g = xa + yb. The smaller |x| + |y| is, the fewer the steps. We can apply the same method as in the ‘cross river’ puzzle: try the 6 operations (fill A, fill B, pour A into B, pour B into A, empty A, and empty B) in parallel to find the best solution. We use a queue to arrange the attempts. An element of the queue is a series of pairs (p, q), where p and q are the amounts of water in each jug, recording the operations from the beginning. The queue starts from {[(0, 0)]}.

solve a b g = bfs {[(0, 0)]} (14.43)

As far as the queue isn't empty, we pop a sequence from the head. If the last pair of the sequence contains g litres, we have found a solution: we reverse and output the sequence. Otherwise, we try the 6 operations atop the latest pair, filter out the duplicated states, and add the new sequences back to the queue.

bfs ∅ = [ ]
bfs Q = | p = g or q = g : reverse s        (14.44)
        | otherwise : bfs (pushAll (map (: s) (try s)) Q′)
    where (p, q) = head s, (s, Q′) = pop Q

try s = filter (∉ s) [f (p, q) | f ← {fl_A, fl_B, pr_A, pr_B, em_A, em_B}]        (14.45)

Where:

fl_A (p, q) = (a, q)
fl_B (p, q) = (p, b)
em_A (p, q) = (0, q)
em_B (p, q) = (p, 0)        (14.46)
pr_A (p, q) = (max(0, p + q − b), min(p + q, b))
pr_B (p, q) = (min(p + q, a), max(0, p + q − a))
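For example, the six operations of eq. (14.46) as a Haskell sketch (the jug volumes a and b are passed in; the name ops is ours):

-- the six operations on a state (p, q), where a, b are the jug volumes
ops :: Int -> Int -> [(Int, Int) -> (Int, Int)]
ops a b =
  [ \(_, q) -> (a, q)                             -- fill A
  , \(p, _) -> (p, b)                             -- fill B
  , \(_, q) -> (0, q)                             -- empty A
  , \(p, _) -> (p, 0)                             -- empty B
  , \(p, q) -> (max 0 (p + q - b), min (p + q) b) -- pour A into B
  , \(p, q) -> (min (p + q) a, max 0 (p + q - a)) -- pour B into A
  ]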

This method returns the solution with the fewest steps. To avoid storing the complete
operation sequence in the queue, we can use a global history list, and link every operation
250 CHAPTER 14. SOLUTION SEARCH

back to its predecessor. As shown in fig. 14.27, the start state is (0, 0); only ‘fill A’ and ‘fill B’ are applicable. We next try ‘fill B’ atop (3, 0), and record the new state (3, 5). If we applied ‘empty A’ to (3, 0), we would go back to the starting point (0, 0), so we skip it (the shaded state). We add a ‘parent’ reference to each node in fig. 14.27, and backtrack along it to the beginning.

Figure 14.27: Store all states in a global list.

1: function Solve(a, b, g)
2: Q ← {(0, 0, NIL)} . Queue
3: V ← {(0, 0, NIL)} . Visited set
4: while Q 6= ∅ do
5: s ← Pop(Q)
6: if p(s) = g or q(s) = g then
7: return Back-track(s)
8: else
9: for each c in Expand(s, a, b) do
10: if c 6= s and c ∈
/ V then
11: Push(Q, c)
12: Add(V, c)
13: return NIL

14: function Expand(s, a, b)


15: p ← p(s), q ← q(s)
16: return [(a, q, s), (p, b, s), (0, q, s), (p, 0, s), (max(0, p + q − b), min(p + q, b), s), (min(p + q, a), max(0, p + q − a), s)]

17: function Back-track(s)


18: r←[]
19: while s 6= NIL do
20: (p, q, s0 ) = s
21: r ← (p, q):r
22: s ← s0
23: return r

Exercise 14.7

14.7.1. Improve the extended Euclid algorithm: find the x and y that minimize |x| + |y| to give the optimal solution for the two jugs puzzle.

Kloski
Kloski is a block sliding puzzle, as shown in fig. 14.28. There are 10 blocks of 3 sizes: 4 pieces of 1 × 1, 4 pieces of 1 × 2, 1 piece of 2 × 1, and 1 piece of 2 × 2. The goal is to slide the big block to the bottom slot. Figure 14.29 shows variants of this puzzle in Japan.

(a) Initial layout. (b) Layout after several movements.

Figure 14.28: ‘Huarong Escape’, the traditional Chinese Kloski puzzle.

Figure 14.29: ‘Daughter in the box’, the Japanese Kloski puzzle.

We define the board as a 5 × 4 matrix, with the rows and columns starting from 0. Label the pieces from 1 to 10; 0 means an empty cell. The matrix M gives the initial layout; the cells with value i are occupied by piece i. We use a map L to represent the layout, where L[i] is the set of cells occupied by piece i. For example, L[4] = {(2, 1), (2, 2)} means the 4th piece occupies cells (2, 1) and (2, 2). Label all 20 cells from 0 to 19; we can convert a (row, col) pair to a label by c = 4y + x. Then the 4th piece occupies cells L[4] = {9, 10}.
 
M = | 1 10 10 2 |        L = { 1 ↦ {0, 4}, 2 ↦ {3, 7}, 3 ↦ {8, 12},
    | 1 10 10 2 |              4 ↦ {9, 10}, 5 ↦ {11, 15},
    | 3  4  4 5 |              6 ↦ {16}, 7 ↦ {13}, 8 ↦ {14},
    | 3  7  8 5 |              9 ↦ {19}, 10 ↦ {1, 2, 5, 6} }
    | 6  0  0 9 |

Define the map ϕ(M) ↦ L and its inverse ϕ⁻¹(L) ↦ M to convert between board and layout:
1: function ϕ(M )
2: L ← {}
3: for y ← 0 ∼ 4 do
4: for x ← 0 ∼ 3 do

5: k ← M [y][x]
6: L[k] ← Add(L[k], 4y + x)
7: return L

8: function ϕ−1 (L)


9: M ← [[0] × 4] × 5
10: for each (k 7→ S) in L do
11: for each c in S do
12: x ← c mod 4, y ← bc/4c
13: M [y][x] ← k
14: return M
We try all the 10 blocks in 4 directions: up, down, left, and right. For the board matrix, a movement means (∆y, ∆x) = (0, ±1), (±1, 0); for the layout of cell labels, it means d = ±1, ±4. For example, moving piece L[i] = {c₁, c₂} to the left turns it into {c₁ − 1, c₂ − 1}. We need to avoid invalid movements in two edge cases: d = 1, c mod 4 = 3 and d = −1, c mod 4 = 0. They are invalid because the piece would jump from one side of the board to the other. Considering the two free cells, there are at most 8 movements. For example, the first step has only 4 options: move piece 6 right, move piece 7 or 8 down, or move piece 9 left. Figure 14.30 shows how to verify that a movement is valid.

Figure 14.30: Left: the two cells of 1 can move; right: the lower cell of 1 conflicts with the cell of 2.

The movement of piece k is valid if all the target cells have a value of 0 or k:

valid L[k] d : ∀c ∈ L[k] ⇒ y = ⌊c/4⌋ + ⌊d/4⌋, x = (c mod 4) + (d mod 4),        (14.47)
               (0, 0) ≤ (y, x) ≤ (4, 3), M[y][x] ∈ {k, 0}
We may return to some layout after a series of slides. It's insufficient to only avoid duplicated matrices. Although M₁ ≠ M₂ below, they are essentially the same layout.

M₁ = | 1 10 10 2 |        M₂ = | 2 10 10 1 |
     | 1 10 10 2 |             | 2 10 10 1 |
     | 3  4  4 5 |             | 3  4  4 5 |
     | 3  7  8 5 |             | 3  7  6 5 |
     | 6  0  0 9 |             | 8  0  0 9 |
We need to avoid duplicated layouts. Treating all pieces of the same size as the same, we define the normalized layout as ‖L‖ = {p | (k ↦ p) ∈ L}, the set of all cell-label sets in L. Both matrices above have the same normalized layout {{1, 2, 5, 6}, {0, 4}, {3, 7}, {8, 12}, {9, 10}, {11, 15}, {16}, {13}, {14}, {19}}. We also need to avoid mirrored layouts, for example:

M₁ = | 10 10 1 2 |        M₂ = | 3 1 10 10 |
     | 10 10 1 2 |             | 3 1 10 10 |
     |  3  5 4 4 |             | 4 4  2  5 |
     |  3  5 8 9 |             | 7 6  2  5 |
     |  6  7 0 0 |             | 0 0  9  8 |

They have symmetric normalized layouts. Define the mirror function:

mirror(‖L‖) = {{f (c) | c ∈ s} | s ∈ ‖L‖}        (14.48)

Where f (c) = 4y′ + x′, y′ = ⌊c/4⌋, x′ = 3 − (c mod 4). We use a queue to arrange the search. An element of the queue has two parts: a series of movements, and the resulting layout. A movement is a pair (k, d), meaning to move piece k by d (some value of ±1, ±4). Initialize the queue as Q = {(s, [ ])}, where s is the starting layout. As far as the queue isn't empty (Q ≠ ∅), we get its head and examine whether the big block (piece 10) has arrived at t = {13, 14, 17, 18}, i.e., L[10] = t. Terminate if yes; otherwise, we try up, down, left, and right for every piece, and add every valid (k, d) that leads to a distinct layout to the queue. We use a set H to record all the visited normalized layouts to avoid repetition.

solve ∅ H = [ ]
solve Q H = | L[10] = t : reverse ms        (14.49)
            | otherwise : solve (pushAll cs Q′) H′
    where ((L, ms), Q′) = pop Q

Where cs = [(move L e, e:ms) | e ← expand L] are the newly expanded movements.

expand L = {(k, d) | k ← [1, 2, ..., 10], d ← [±1, ±4], valid k d, unique k d}        (14.50)

Function move slides piece L[k] by d: move L (k, d) = map (+d) L[k]. Function unique checks that the normalized layout ‖L′‖ ∉ H and its mirror mirror(‖L′‖) ∉ H, and adds them to H′ if new. Below is the iterative implementation. The solution has 116 steps (1 cell a step); the last 3 are listed after the code:
1: function Solve(s, e)
2: H ← {‖s‖}
3: Q ← {(s, ∅)}
4: while Q 6= ∅ do
5: (L, p) ← Pop(Q)
6: if L[10] = e then
7: return (L, p)
8: else
9: for each L0 in Expand(L, H) do
10: Push(Q, (L0 , L))
11: Add(H, ‖L′‖)
12: return ∅

['5', '3', '2', '1']
['5', '3', '2', '1']
['7', '9', '4', '4']
['A', 'A', '6', '0']
['A', 'A', '0', '8']

['5', '3', '2', '1']
['5', '3', '2', '1']
['7', '9', '4', '4']
['A', 'A', '0', '6']
['A', 'A', '0', '8']

['5', '3', '2', '1']
['5', '3', '2', '1']
['7', '9', '4', '4']
['0', 'A', 'A', '6']
['0', 'A', 'A', '8']

The cross river puzzle, the water jugs puzzle, and the Kloski puzzle share another common solution structure. Similar to DFS, they have start and end states: the cross river puzzle starts with everything on one side and the other side empty, and ends with everything on the other side; the water jugs puzzle starts with two empty jugs, and ends with either jug holding g litres of water; the Kloski puzzle starts with a given layout, and ends with some layout where the big block arrives at the bottom slot. Every puzzle has a set of rules that transfer one state to another. We try all the options ‘in parallel’, and don't search further until all the options of the same step have been tried. This search strategy ensures that we find the solution with the fewest steps before the others. Because we expand horizontally (as in fig. 14.31), it's called breadth-first search (BFS).

(a) DFS. (b) BFS.

Figure 14.31: DFS and BFS.

Because we can't really search in parallel, we realize BFS with a queue: repeatedly de-queue the candidate with fewer steps from the head, and en-queue the new candidates with more steps to the tail. BFS provides a simple method to search for the solution with the fewest steps. However, it can't directly search for the generic optimal solution. Consider the directed graph in fig. 14.32, where the length of each section varies: we can't use BFS to find the shortest path between two nodes. For example, the shortest path from a to c is not the one with the fewest steps, a → b → c (total length 22), but the path with more steps, a → e → f → c (total length 20).

Figure 14.32: A weighted directed graph.

Exercise 14.8
14.8.1. John Conway⁹ gives a slide tile puzzle. Figure 14.33 is a simplified example. There are 8 cells, 7 of which are occupied. Label the pieces from 1 to 7. Each piece can slide to the connected free cell (two cells are connected if there is a line between them). How can we reverse the pieces from 1, 2, 3, 4, 5, 6, 7 to 7, 6, 5, 4, 3, 2, 1 by sliding? Write a program to solve this puzzle.

⁹ John Conway (1937 - 2020), British mathematician.

Figure 14.33: Conway slide puzzle.

14.6.2 Greedy algorithm


People need to find the ‘best’ solution to minimize time, space, cost, energy, etc. It's not easy to find the optimal solution with limited resources. Many problems don't have a solution in polynomial time; however, there exist simple solutions for a small portion of special problems.

Huffman coding
Huffman coding encodes information with the shortest length of code. The ASCII code needs 7 bits to encode characters, digits, and symbols; it can represent 2⁷ = 128 symbols. We need at least ⌈log₂ n⌉ bits of 0/1 to distinguish n symbols. The below table encodes the upper case English letters, mapping A to Z to 0 to 25, each with 5 bits. Zero is padded as 00000, not as 0. Such a scheme is called fixed-length coding. For example, it encodes ‘INTERNATIONAL’ to a binary number of 65 bits:

char code char code


A 00000 N 01101
B 00001 O 01110
C 00010 P 01111
D 00011 Q 10000
E 00100 R 10001
F 00101 S 10010
G 00110 T 10011
H 00111 U 10100
I 01000 V 10101
J 01001 W 10110
K 01010 X 10111
L 01011 Y 11000
M 01100 Z 11001

00010101101100100100100011011000000110010001001110101100000011010

Another scheme is variable-length coding. Encode A as the single bit 0, encode C as 10 of two bits, and encode Z as 11001 of 5 bits. Although the code length is shorter, there is ambiguity when decoding. For example, the binary number 1101 may stand for 1 followed by 101 (decoded as ‘BF’), or 110 followed by 1 (decoded as ‘GB’), or 1101 (decoded as ‘N’). The Morse code is variable-length: it encodes the most used letter ‘E’ as ‘.’, and encodes ‘Z’ as ‘- -..’. Particularly, it uses a special pause separator to indicate the termination of a code, hence eliminating the ambiguity. The below code table is ambiguity free; it encodes ‘INTERNATIONAL’ with only 38 bits.

char code char code


A 110 E 1110
I 101 L 1111
N 01 O 000
R 001 T 100

10101100111000101110100101000011101111

The reason why it's ambiguity free is that no code is the prefix of another. Such a code is called a prefix code (the defining property is actually being prefix-free). Since the prefix code needn't a separator, we can further shorten the code length. Given a text, can we find a prefix-code scheme that produces the shortest code? In 1951, Robert M. Fano told his class that whoever could solve this problem needn't take the final exam. Huffman was still a student at MIT [91]; he almost gave up and started preparing for the final exam when he found the answer. Huffman created the coding table according to the frequency of each symbol appearing in the text: the more a symbol is used, the shorter the code assigned to it. He processed the text, calculated the occurrences of each symbol, then defined the weight as the frequency. Huffman uses a binary tree to generate the prefix code. The symbols are stored in the leaf nodes; traversing from the root generates the code, appending 0 when going left and 1 when going right, as shown in fig. 14.34. For example, starting from the root, going left then right, we arrive at ‘N’; therefore, ‘N’ is encoded as ‘01’, while the path to ‘A’ is right → right → left, encoded as ‘110’.

Figure 14.34: Huffman tree.

We can use the tree to decode as well: scan the binary bits, go left for 0 and right for 1; when arriving at a leaf, decode the symbol from it, then restart from the root to continue scanning. Huffman builds the tree in a bottom-up way. At the start, wrap all the symbols in leaves. Every time, pick the two nodes with the minimum weights, and merge them into a branch node of weight w = w₁ + w₂, the sum of the two weights. Repeatedly pick and merge the two smallest weighted trees till we get the final tree, as shown in fig. 14.35. We reuse the binary tree definition for the Huffman tree, augmented with the weight, and only hold the symbol in the leaf nodes. Let a branch node be (w, l, r), where w is the weight, l

Figure 14.35: Build a Huffman tree.

and r are the left and right sub-trees. Let a leaf be (w, c), where c is the symbol. When merging trees, we sum the weights: merge a b = (weight a + weight b, a, b), where:

weight (w, a) = w
(14.51)
weight (w, l, r) = w

The below function repeatedly picks and merges the two minimum weighted trees:

build [t] = t
build ts = build ((merge t₁ t₂) : ts′), where (t₁, t₂, ts′) = extract ts        (14.52)

Function extract picks the two trees with the minimal weights. Define t₁ < t₂ if weight t₁ < weight t₂.

extract (t₁:t₂:ts) = foldr min2 (min t₁ t₂, max t₁ t₂, [ ]) ts        (14.53)

Where:

min2 t (t₁, t₂, ts) = | t < t₂ : (min t t₁, max t t₁, t₂:ts)        (14.54)
                      | otherwise : (t₁, t₂, t:ts)

To build the Huffman tree iteratively, we store the n sub-trees in an array A, and scan A from right to left. If the weight of A[i] is less than that of A[n − 1] or A[n], we swap A[i] with the greater of A[n − 1] and A[n]. Merge A[n] and A[n − 1] after the scan, and shrink the array by one. Repeat this to build the Huffman tree:
1: function Huffman(A)
2: while |A| > 1 do
3: n ← |A|
4: for i ← n − 2 down to 1 do
5: T ← Max(A[n], A[n − 1])
6: if A[i] < T then
7: Exchange A[i] ↔ T

8: A[n − 1] ← Merge(A[n], A[n − 1])


9: Drop(A[n])
10: return A[1]
We can build the code table from the Huffman tree. Let p = [ ]. Traverse from the root, updating p ← 0:p when going left, and p ← 1:p when going right. When arriving at a leaf of symbol c, record the map c ↦ reverse p in the code table. Define (in Curried form): code = traverse [ ], where:

traverse p (w, c) = [c ↦ reverse p]
traverse p (w, l, r) = traverse (0:p) l ++ traverse (1:p) r        (14.55)

When encoding, we scan the text w while looking up the code table dict to generate the binary bits:

encode dict w = concatMap (c ↦ dict[c]) w, where dict = code T        (14.56)

Conversely, when decoding, we scan the binary bits bs while looking up the tree. Start from the root, go left for 0, right for 1; output the symbol c when arriving at a leaf; then reset to the root and continue. decode T bs = lookup T bs, where:

lookup (w, c) [ ] = [c]
lookup (w, c) bs = c : lookup T bs        (14.57)
lookup (w, l, r) (b:bs) = lookup (if b = 0 then l else r) bs
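The tree building and the code table can be sketched in Haskell as below (for brevity we re-sort the forest with sortOn weight instead of the one-pass extract of eq. (14.53); the type HTr and the other names are ours):

import Data.List (sortOn)

data HTr = Leaf Int Char | Branch Int HTr HTr

weight :: HTr -> Int
weight (Leaf w _) = w
weight (Branch w _ _) = w

merge :: HTr -> HTr -> HTr
merge a b = Branch (weight a + weight b) a b

-- repeatedly merge the two minimum weighted trees
build :: [HTr] -> HTr
build [t] = t
build ts = build (merge t1 t2 : ts') where (t1 : t2 : ts') = sortOn weight ts

-- traverse the tree: 0 for going left, 1 for going right
codeTable :: HTr -> [(Char, String)]
codeTable = trav []
  where
    trav p (Leaf _ c) = [(c, reverse p)]
    trav p (Branch _ l r) = trav ('0' : p) l ++ trav ('1' : p) r

For the example text, codeTable (build [Leaf 2 'A', Leaf 1 'E', Leaf 2 'I', Leaf 1 'L', Leaf 3 'N', Leaf 1 'O', Leaf 1 'R', Leaf 2 'T']) yields a prefix code of the same total length (the exact codes depend on how ties are broken).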

Huffman tree building reflects a special strategy: always pick and merge the two trees with the minimal weights. This series of locally optimal choices generates a globally optimal prefix code. Usually, locally optimal sub-solutions do not necessarily lead to a globally optimal solution; Huffman coding is an exception. We call the strategy of always choosing the locally optimal option the greedy strategy. The greedy method simplifies things and works for many problems. However, it's not easy to tell whether the greedy method generates the globally optimal solution; the generic formal proof is still an active research area [4].

Exercise 14.9
14.9.1. Implement the imperative Huffman code table algorithm.

Change making problem


How do we change money with as few coins as possible? Suppose there are 5 coin values: 1, 5, 25, 50, and 100, as a set C = {1, 5, 25, 50, 100}. To change money of value x, we can apply the greedy method: always choose the largest valued coin:

change 0 = [ ]
change x = c_m : change (x − c_m), where c_m = max {c ∈ C, c ≤ x}        (14.58)
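Eq. (14.58) translates to Haskell directly (a sketch, with the coin set passed in as a list):

-- greedy change: always take the largest coin that fits
change :: [Int] -> Int -> [Int]
change _ 0 = []
change coins x = c : change coins (x - c)
  where c = maximum (filter (<= x) coins)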

For example, to change money of value 142, this function outputs the coin list [100, 25, 5, 5, 5, 1, 1]. We can convert it to [(100, 1), (25, 1), (5, 3), (1, 2)], meaning 1 coin of 100, 1 coin of 25, 3 coins of 5, and 2 coins of 1. For the coin system C, the greedy method finds the optimal solution; actually, it is applicable to most coin systems in the world. There are exceptions, for example C = {1, 3, 4}: to change money x = 6, the optimal solution is 2 coins of 3, while the greedy method gives 6 = 4 + 1 + 1, in total 3 coins.
14.6. SOLUTION SEARCH 259

Although it may not give the optimal solution, the greedy method often gives a simplified sub-optimal implementation, and the result is often good enough in practice. For example, word wrap is a common functionality in editors. If the length of the text T exceeds the page width W, we need to break it into lines. Let the space between words be s; the below greedy implementation gives an approximately optimal solution: put as many words as possible in each line.
1: L ← W
2: for w ∈ T do
3: if |w| + s > L then
4: Insert line break
5: L ← W − |w|
6: else
7: L ← L − |w| − s

Exercise 14.10
14.10.1. Use a heap to build the Huffman tree: take two trees from the top, merge, then add back to the heap.
14.10.2. If we sort the symbols by weight as A, there is a linear time algorithm to build the Huffman tree: use a queue Q to store the merge results; repeatedly take the minimal weighted tree among the head of Q and the head of A, merge, then add to the queue. After processing all the trees in A, there is a single tree in Q, which is the Huffman tree. Implement this algorithm.
14.10.3. Given a Huffman tree T, implement the decode algorithm with fold left.

14.6.3 Dynamic programming


Consider how to find the best solution C_m (the list of coins) to change money x for an arbitrary coin system. We can partition C_m into two groups, C₁ and C₂, with values x₁ and x₂ respectively, i.e., C_m = C₁ ++ C₂ and x = x₁ + x₂. We'll prove that C₁ is the optimal solution to change x₁, and C₂ is the optimal solution to change x₂.

Proof. For x₁, suppose there exists another solution C₁′ with fewer coins than C₁. Then the solution C₁′ ++ C₂ changes x with fewer coins than C_m. This conflicts with the fact that C_m is the optimal solution to change x. We can prove that C₂ is the optimal solution to change x₂ in the same way.
However, the converse is not true. For an arbitrary integer y < x, divide the original problem into two sub-problems: change y and change x − y. Combining the two optimal sub-solutions does not necessarily give the overall optimal solution. For example, use the 3 values C = {1, 2, 4} to change x = 6. The optimal solution needs two coins: 2 + 4. But if we divide 6 = 3 + 3 into two identical sub-problems of changing 3, each sub-problem has the optimal solution 3 = 1 + 2, while the combination (1 + 2) + (1 + 2) needs 4 coins. If an optimization problem can be divided into several optimal sub-problems, we say it has optimal substructure. The change making problem has optimal substructure, but we need to divide it based on the coin values, not on arbitrary integers.

change 0 = [ ]
change x = min [c : change (x − c) | c ∈ C, c ≤ x]        (14.59)

Where min picks the shortest list. However, this definition is impractical: there are too many duplicated computations. For example, with C = {1, 5, 25, 50, 100}, computing change(142) needs to further compute change(141), change(137), change(117), change(92), and change(42). For change(141), subtracting 1, 5, 25, 50, 100 again spawns five more sub-problems; the search domain expands as 5ⁿ. Recall the idea used to generate Fibonacci numbers: we can use a table T to record the optimal solutions to the sub-problems. T starts empty. When changing money y, we look up T[y] first; if T[y] = ∅, we recursively compute the sub-problem, and save the sub-solution in T[y].
1: T ← [[ ], ∅, ∅, ...] . T [0] = [ ]
2: function Change(x)
3: if x > 0 and T [x] = ∅ then
4: for each c in C and c ≤ x do
5: Cm ← c : Change(x − c)
6: if T [x] = ∅ or |Cm | < |T [x]| then
7: T [x] ← Cm
8: return T [x]
We can generate the optimal solutions for each sub-problem bottom-up. From T[0] = [ ], generate T[1] = [1], T[2] = [1, 1], T[3] = [1, 1, 1], T[4] = [1, 1, 1, 1], as shown in table 14.1 (a). There are two options for T[5]: 5 coins of 1, or 1 coin of 5; the latter needs fewer coins. We update the optimal table to table 14.1 (b): T[5] = [5]. Next, change money x = 6. Both 1 and 5 are less than 6, so there are two options: (1) 1 + T[5] gives [1, 5]; (2) 5 + T[1] gives [5, 1]. They are equivalent, and we pick either: T[6] = [1, 5]. For T[i], where i ≤ x, we check every coin value c ≤ i: look up T[i − c] for the sub-problem, then add c to get a new solution, and choose the fewest one as T[i].

x 0 1 2 3 4
optimal solution [ ] [1] [1, 1] [1, 1, 1] [1, 1, 1, 1]
(a) Optimal solution for x ≤ 4

x 0 1 2 3 4 5
optimal solution [ ] [1] [1, 1] [1, 1, 1] [1, 1, 1, 1] [5]
(b) Optimal solution for x ≤ 5

Table 14.1: Optimal solution table

1: function Change(x)
2: T ← [[ ], ∅, ...]
3: for i ← 1 to x do
4: for each c in C and c ≤ i do
5: if T [i] = ∅ or 1 + |T [i − c]| < |T [i]| then
6: T [i] ← c : T [i − c]
7: return T [x]
There are many duplicated contents in the below optimal solution table: a solution contains its sub-solutions. We can record only the incremental part: the coin c chosen for T[i], and the number n of coins, i.e., T[i] = (n, c). To generate the list of coins for x, we look up T[x] to get c, then look up T[x − c] to get c′, ..., repeating this down to T[0].

value 6 7 8 9 10 ...
optimal solution [1, 5] [1, 1, 5] [1, 1, 1, 5] [1, 1, 1, 1, 5] [5, 5] ...

1: function Change(x)
2: T ← [(0, ∅), (∞, ∅), (∞, ∅), ...]
3: for i ← 1 to x do
4: for each c in C and c ≤ i do
5: (n, _) ← T [i − c], (m, _) ← T [i]
6: if 1 + n < m then
7: T [i] ← (1 + n, c)
8: s←[]
9: while x > 0 do
10: (_, c) ← T [x]
11: s←c:s
12: x←x−c
13: return s
We can build the optimal solution table T with a left fold: T = foldl fill [(0, 0)] [1, 2, ..., x],
where:

fill T x = T ▷ min {(1 + fst T[x − c], c) | c ∈ C, c ≤ x}     (14.60)

Where s ▷ a appends a to the right of s (see finger tree in section 12.5). Then rebuild
the optimal solution backwards from T:

change 0 T = [ ]
change x T = c : change (x − c) T, where c = snd T[x]     (14.61)
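As a sketch of the same bottom-up idea without finger trees, we could also build T as a lazy immutable array (Data.Array); once more we assume 1 ∈ C, so every amount is changeable:

import Data.Array

-- tab ! i = (number of coins, last coin chosen) for amount i
change :: [Int] → Int → [Int]
change cs x = rebuild x where
  tab = listArray (0, x) [entry i | i ← [0..x]]
  entry 0 = (0, 0)
  entry i = minimum [(1 + fst (tab ! (i - c)), c) | c ← cs, c ≤ i]
  rebuild 0 = []
  rebuild i = let c = snd (tab ! i) in c : rebuild (i - c)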
For x = n, we loop n times, checking at most k = |C| coins each time. The performance
is bound to Θ(nk) (as the upper bound), and we need O(n) space to persist T in both the
top-down and the bottom-up implementations. The solution to a sub-problem is used many
times to compute the global optimal solution. We call this overlapping sub-problems.
Richard Bellman developed dynamic programming in the 1940s. It has two properties:
1. Optimal sub-structure. The problem can be broken down into smaller problems; the
optimal solution can be constructed from the solutions of these sub-problems;
2. Overlapping sub-problems. The solution of a sub-problem is reused multiple times
to find the overall solution.

Longest common sub-sequence


Different from a sub-string, a sub-sequence needn't be consecutive. For example, the
longest common sub-string of 'Mississippi' and 'Missunderstanding' is 'Miss', while the
longest common sub-sequence is 'Misssi', as shown in fig. 14.36. If we rotate the figure by 90°,
it becomes a 'diff' result between the two words; this is a common function in version control
tools. The longest common sub-sequence of xs and ys is defined as below:

M i s s i s s i p p i

M i s s u n d e r s t a n d i n g

Figure 14.36: The longest common sub-sequence.

LCS([ ], ys) = [ ]
LCS(xs, [ ]) = [ ]
LCS(x:xs, y:ys) = { x = y :       x : LCS(xs, ys)
                  ; otherwise :   max LCS(x:xs, ys) LCS(xs, y:ys) }     (14.62)
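For reference, (14.62) transcribes to Haskell directly; this sketch runs in exponential time and motivates the table method developed next (maxBy is our helper):

lcs :: Eq a ⇒ [a] → [a] → [a]
lcs [] _ = []
lcs _ [] = []
lcs (x:xs) (y:ys)
  | x == y = x : lcs xs ys
  | otherwise = maxBy length (lcs (x:xs) ys) (lcs xs (y:ys))
  where maxBy f a b = if f a ≥ f b then a else b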

In (14.62), max picks the longer sequence. There is optimal sub-structure in the definition
of LCS: it can be broken down into sub-problems, and the sequence length reduces by at least
1 every time. There are also overlapping sub-problems: the longest common sub-sequences
of the sub-strings are reused multiple times to find the global optimal solution. We use
a 2D table T to record the optimal solutions of the sub-problems. The rows and columns
represent ys and xs respectively. Index the sequences from 0; row 0 and column 0 represent
the empty sequence. T[i][j] is the length of LCS(xs[0..j], ys[0..i]). We finally build the
longest common sub-sequence from T. Because LCS([ ], ys) = LCS(xs, [ ]) = [ ], row 0
and column 0 are all 0s. Consider xs = 'antenna' and ys = 'banana' for example. We fill row 1 from
T[1][1]. 'b' differs from every character in 'antenna', hence row 1 is all 0s. For T[2][1], both the
row and the column correspond to 'a', hence T[2][1] = T[1][0] + 1 = 1, i.e., LCS(a, ba) = a.
Next move to T[2][2]: 'a' ≠ 'n', so we choose the greater of the above (LCS(an, b))
and the left (LCS(a, ba)) as T[2][2], which equals 1, i.e., LCS(an, ba) = a. In this way,
we fill the table out step by step. The rule is: for T[i][j], if xs[j − 1] = ys[i − 1], then
T[i][j] = T[i − 1][j − 1] + 1; otherwise, pick the greater of the above T[i − 1][j] and the
left T[i][j − 1].

0 1 2 3 4 5 6 7
[] a n t e n n a
0 [] 0 0 0 0 0 0 0 0
1 b 0 0 0 0 0 0 0 0
2 a 0 1 1 1 1 1 1 1
3 n 0 1 2 2 2 2 2 2
4 a 0 1 2 2 2 2 2 3
5 n 0 1 2 2 2 3 3 3
6 a 0 1 2 2 2 3 3 4

1: function LCS(xs, ys)
2: m ← |xs|, n ← |ys|
3: T ← [[0, 0, ...], [0, 0, ...], ...] . (m + 1) × (n + 1)
4: for i ← 1 to m do
5: for j ← 1 to n do
6: if xs[i − 1] = ys[j − 1] then . xs and ys are indexed from 0
7: T [i][j] ← T [i − 1][j − 1] + 1
8: else
9: T [i][j] ← Max(T [i − 1][j], T [i][j − 1])
10: return Fetch(T, xs, ys) . build the LCS
We next build the longest common sub-sequence from T. Start from the bottom-right:
if the last elements xs[m − 1] = ys[n − 1], it is the tail of the LCS, and we next compare
xs[m − 2] and ys[n − 2]; otherwise, we move to the greater of T[m − 1][n] and T[m][n − 1] and go on.
1: function Fetch(T, xs, ys)
2: m ← |xs|, n ← |ys|
3: r←[]
4: while m > 0 and n > 0 do
5: if xs[m − 1] = ys[n − 1] then
6: r ← xs[m − 1] : r
7: m←m−1
8: n←n−1
9: else if T [m − 1][n] > T [m][n − 1] then
10: m←m−1
11: else
12: n←n−1
13: return r
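Putting the two steps together, here is a compact Haskell sketch of the same table method, using a lazy array for T (the name lcsDP is ours):

import Data.Array

lcsDP :: Eq a ⇒ [a] → [a] → [a]
lcsDP xs ys = reverse (fetch m n) where
  m = length xs
  n = length ys
  xa = listArray (1, m) xs
  ya = listArray (1, n) ys
  t = listArray ((0, 0), (m, n)) [cell i j | i ← [0..m], j ← [0..n]]
  cell 0 _ = 0
  cell _ 0 = 0
  cell i j | xa ! i == ya ! j = 1 + t ! (i - 1, j - 1)
           | otherwise = max (t ! (i - 1, j)) (t ! (i, j - 1))
  fetch 0 _ = []
  fetch _ 0 = []
  fetch i j | xa ! i == ya ! j = (xa ! i) : fetch (i - 1) (j - 1)
            | t ! (i - 1, j) > t ! (i, j - 1) = fetch (i - 1) j
            | otherwise = fetch i (j - 1)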

Exercise 14.11
14.11.1. For the longest common sub-sequence, build the optimal solution table with fold.

Subset sum
Given a set X of integers, how to find all the subsets S ⊆ X, such that the sum of the
elements in S is s, i.e., ∑i∈S i = s? For example, when X = {11, 64, -82, -68, 86, 55, -88, -21, 51},
there are three subsets with sum s = 0: S = ∅, {64, -82, 55, -88, 51}, {64, -82, -68, 86}.
We need to exhaust 2^n subset sums, where n = |X|; the performance is O(n2^n).

sets s ∅ = [∅]
sets s (x:xs) = { s = x :       {x} : sets s xs
                ; otherwise :   (sets s xs) ++ [x:S | S ∈ sets (s − x) xs] }     (14.63)
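A direct Haskell transcription as a sketch; note we tighten the base case so that the empty list yields a solution only when the remaining sum is 0, otherwise ∅ would appear as a spurious solution for every s:

-- all subsets of xs that sum to s; exhausts O(2^n) subset sums
sets :: Int → [Int] → [[Int]]
sets 0 [] = [[]]
sets _ [] = []
sets s (x:xs) = sets s xs ++ [x : ys | ys ← sets (s - x) xs]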

There are sub-structure and overlapping sub-problems in the above exhaustive search
definition, hence we can apply the dynamic programming method. We bottom-up build a
solution table T, then generate the final subsets from it. First consider the existence of
some subset S satisfying ∑S = s. We scan the elements to determine the lower/upper
bounds of the subset sum: l ≤ s ≤ u. If s < l or s > u, then there is no solution.

l = ∑{x ∈ X, x < 0},  u = ∑{x ∈ X, x > 0}     (14.64)

As the elements are integers, there are m = u − l + 1 columns in table T, each
corresponding to a value l ≤ j ≤ u. There are n + 1 rows, where n = |X|; row i corresponds
to the element xi. T[i][j] indicates whether there exists some subset S ⊆ {x1, x2, ..., xi}
satisfying ∑S = j. Row 0 is special: it represents the sum of the empty set ∅. All entries
in T start from false F except T[0][0] = T, meaning ∑∅ = 0. Start from x1 to build row 1.
As ∑∅ = 0 and ∑{x1} = x1, we have T[1][0] = T, T[1][x1] = T, and the rest are all F.

      l   l+1  ...  0  ...  x1  ...  u
∅     F   F    ...  T  ...  F   ...  F
x1    F   F    ...  T  ...  T   ...  F
...   F   F    ...  T  ...  T   ...  F

Add x2; we get 4 possible subset sums: ∑∅ = 0, ∑{x1} = x1, ∑{x2} = x2,
∑{x1, x2} = x1 + x2.

      l   l+1  ...  0  ...  x1  ...  x2  ...  x1 + x2  ...  u
∅     F   F    ...  T  ...  F   ...  F   ...  F        ...  F
x1    F   F    ...  T  ...  T   ...  F   ...  F        ...  F
x2    F   F    ...  T  ...  T   ...  T   ...  T        ...  F
...   F   F    ...  T  ...  T   ...  T   ...  T        ...  F
When adding element xi to fill row i, we already have all the subset sums from the previous
elements {x1, x2, ..., xi−1}, hence all the true entries in the previous row are still true.
Because ∑{xi} = xi, we have T[i][xi] = T. Adding xi to each previous sum generates some
new sums, whose corresponding entries are all set true. After adding all n elements,
the Boolean value of T[n][s] tells whether the subset sum s exists.
1: function Subset-Sum(X, s)
2: l ← ∑{x ∈ X, x < 0}, u ← ∑{x ∈ X, x > 0}
3: n ← |X|
4: T ← {{F, F, ...}, {F, F, ...}, ...} . (n + 1) × (u − l + 1)
5: T [0][0] ← T . ∑∅ = 0
6: for i ← 1 to n do
7: T [i][X[i]] ← T
8: for j ← l to u do
9: T [i][j] ← T [i][j] ∨ T [i − 1][j]
10: j′ ← j − X[i]
11: if l ≤ j′ ≤ u then
12: T [i][j] ← T [i][j] ∨ T [i − 1][j′]
13: return T [n][s]
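As an aside, a small functional sketch of the same existence test can maintain the set of reachable sums directly instead of the Boolean table (the name is ours):

import qualified Data.Set as Set

-- fold over the elements; the set holds every subset sum reachable so far
subsetSumExists :: [Int] → Int → Bool
subsetSumExists xs s = s `Set.member` foldl step (Set.singleton 0) xs where
  step sums x = sums `Set.union` Set.map (+ x) sums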
The column index j does not start from 0, but ranges from l to u. We can convert it to
j − l in a programming environment. We next generate all the subsets S satisfying ∑S = s
from table T. If T[n][s] = F, there is no solution; otherwise, there are two cases: (1) if
xn = s, then the singleton set {xn} is a solution; we then look up T[n − 1][s], and if it is true
T, we recursively generate all subsets from {x1, x2, ..., xn−1} with sum s; (2) let
s′ = s − xn; if l ≤ s′ ≤ u and T[n − 1][s′] is true, we recursively generate the subsets from
{x1, x2, ..., xn−1} with sum s′, then add xn to each subset.
1: function Get(X, s, T, n)
2: r←[]
3: if X[n] = s then
4: r ← {X[n]} : r
5: if n > 1 then
6: if T [n − 1][s] then
7: r ← r++ Get(X, s, T, n − 1)
8: s′ ← s − X[n]
9: if l ≤ s′ ≤ u and T [n − 1][s′] then
10: r ← r ++ [(X[n]:r′) | r′ ← Get(X, s′, T, n − 1)]
11: return r
The dynamic programming method loops O(n(u − l + 1)) times to build table T, then
recursively generates the solution in O(n) levels. The 2D table needs O(n(u − l + 1)) space.
We can replace it with a 1D vector V of u − l + 1 entries. Each V[j] = {S1, S2, ...} stores
the subsets satisfying ∑S1 = ∑S2 = ... = j. V starts from all empty entries. For each xi, we
update V a round, adding the new sums obtained with xi. The final solution is in V[s].
1: function Subset-Sum(X, s)
2: l ← ∑{x ∈ X, x < 0}, u ← ∑{x ∈ X, x > 0}
3: V ← [∅, ∅, ...] . u − l + 1
4: for each x in X do
5: U ← Copy(V)
6: for j ← l to u do
7: if x = j then
8: U [j] ← {{x}} ∪ U [j]
9: j′ ← j − x
10: if l ≤ j′ ≤ u and V [j′] ≠ ∅ then
11: U [j] ← U [j] ∪ {({x} ∪ S) | S ∈ V [j′]}
12: V ←U
13: return V [s]
We can build the solution vector with a left fold: V = foldl bld (replicate (u − l + 1) ∅) X,
where replicate n a generates the list [a, a, ..., a] of length n, and bld updates V with each
element in X:

bld V x = foldl f V [l, l + 1, ..., u]     (14.65)

Where:

f V j = { j = x :                        V [j] ∪ {{x}}
        ; l ≤ j′ ≤ u and V [j′] ≠ ∅ :    V [j] ∪ {{x} ∪ S | S ∈ V [j′]}, where j′ = j − x
        ; otherwise :                    V [j] }     (14.66)

Exercise 14.12
14.12.1. For the longest common sub-sequence problem, an alternative solution is to record
the length and the direction in the table. There are three directions: ‘N’ for north,
‘W’ for west, and ‘NW’. Given such a table, we can build the longest common
sub-sequence from the bottom-right entry. If the entry is ‘NW’, next go to the
upper-left entry; if it’s ‘N’, go to the above row; and go to the previous entry if
it’s ‘W’. Implement this solution.
14.12.2. For the subset sum lower/upper bounds, does l ≤ 0 ≤ u always hold? Can we reduce
the range between the bounds?
14.12.3. Given a list of non-negative integers, find the maximum sum composed of numbers
such that no two of them are adjacent.
14.12.4. Edit distance (also known as Levenshtein edit distance) is defined as the cost of
converting one string s to another string t. It is widely used in spell-checking,
OCR correction, etc. There are three types of changes: insert, delete, and replace.
Each operation mutates one character at a time. For example, the edit distance is 3 for
'kitten' ↦ 'sitting':
1. kitten → sitten (k ↦ s);
2. sitten → sittin (e ↦ i);
3. sittin → sitting (+ g).
Compute the edit distance with dynamic programming.

14.7 Appendix - example programs


Find the top-k element:
Optional<K> top(Int k, [K] xs, Int l, Int u) {
    if l < u {
        swap(xs, l, rand(l, u))
        var p = partition(xs, l, u)
        if p - l + 1 == k then return Optional.of(xs[p])
        return if k < p - l + 1 then top(k, xs, l, p)
               else top(k - p + l - 1, xs, p + 1, u)
    }
    return Optional.Nothing
}

Int partition([K] xs, Int l, Int u) {


var p = l
for var r = l + 1 to u {
if not xs[p] < xs[r] {
l = l + 1
swap(xs, l, r)
}
}
swap(xs, p, l)
return l
}

Saddle back search:


solve f z = search 0 m where
search p q | p > n || q < 0 = []
| z' < z = search (p + 1) q
| z' > z = search p (q - 1)
| otherwise = (p, q) : search (p + 1) (q - 1)
where z' = f p q
m = bsearch (f 0) z (0, z)
n = bsearch (λx→f x 0) z (0, z)

bsearch f y (l, u) | u ≤ l = l
| f m ≤ y = if f (m + 1) ≤ y then bsearch f y (m + 1, u) else m
| otherwise = bsearch f y (l, m-1)
where m = (l + u) `div` 2

Boyer-Moore majority:
Optional<T> majority([T] xs) {
var (m, c) = (Optional<T>.Nothing, 0)
for var x in xs {
if c == 0 then (m, c) = (Optional.of(x), 0)
if x == m then c++ else c--
}
c = 0
for var x in xs {
if x == m then c++
}
return if c > length(xs)/2 then m else Optional<T>.Nothing
}

Find the majority with fold:


majority xs = verify $ foldr maj (Nothing, 0) xs where
maj x (Nothing, 0) = (Just x, 1)
maj x (Just y, v) | x == y = (Just y, v + 1)
| v == 0 = (Just x, 1)
| otherwise = (Just y, v - 1)
verify (Nothing, _) = Nothing
verify (Just m, _) = if 2 ∗ (length $ filter (==m) xs) > length xs
then Just m else Nothing

The maximum sum of sub-vector:


maxSum :: (Ord a, Num a) ⇒ [a] → a
maxSum = fst ◦ foldr f (0, 0) where
f x (m, mSofar) = (m', mSofar') where
mSofar' = max 0 (mSofar + x)
m' = max mSofar' m

KMP string matching:


[Int] match([T] w, [T] p) {
    Int n = length(w), m = length(p)
    [Int] fallback = prefixes(p)
    [Int] r = []
    Int k = 0    // number of characters of p matched so far
    for i = 0 to n - 1 {
        while k > 0 and p[k] ≠ w[i] {
            k = fallback[k]    // fall back to the longest border
        }
        if p[k] == w[i] then k = k + 1
        if k == m {
            add(r, i + 1 - m)    // record the occurrence
            k = fallback[m]      // prepare for the next match
        }
    }
    return r
}

[Int] prefixes([T] p) {
    Int m = length(p)
    [Int] t = [0] ∗ (m + 1)    // fallback table: t[i] is the longest border length of p[0..i-1]
    Int k = 0
    for i = 2 to m {
        while k > 0 and p[i-1] ≠ p[k] {
            k = t[k]    // fallback
        }
        if p[i-1] == p[k] then k = k + 1
        t[i] = k
    }
    return t
}

The maze puzzle:


dfsSolve m from to = solve [[from]] where
solve [] = []
solve (c@(p:path):cs)
| p == to = reverse c
| otherwise = let os = filter (`notElem` path) (adj p) in
if os == [] then solve cs
else solve ((map (:c) os) ++ cs)
adj (x, y) = [(x', y') | (x', y') ← [(x-1, y), (x+1, y), (x, y-1), (x, y+1)],
inRange (bounds m) (x', y'), m ! (x', y') == 0]

The eight queens puzzle:


solve = dfsSolve [[]] [] where
dfsSolve [] s = s
dfsSolve (c:cs) s
| length c == 8 = dfsSolve cs (c:s)
    | otherwise = dfsSolve ([(x:c) | x ← [1..8] \\ c,
                                     not $ attack x c] ++ cs) s
attack x c = let y = 1 + length c in
any (λ(i, j) → abs(x - i) == abs(y - j)) $
zip (reverse c) [1..]

The peg puzzle:


solve = dfsSolve [[[-1, -1, -1, 0, 1, 1, 1]]] [] where
dfsSolve [] s = s
dfsSolve (c:cs) s
| head c == [1, 1, 1, 0, -1, -1, -1] = dfsSolve cs (reverse c:s)
| otherwise = dfsSolve ((map (:c) $ moves $ head c) ++ cs) s

moves s = filter (≠ s) [leapLeft s, hopLeft s, leapRight s, hopRight s] where


leapLeft [] = []
leapLeft (0:y:1:ys) = 1:y:0:ys
leapLeft (y:ys) = y:leapLeft ys
hopLeft [] = []
hopLeft (0:1:ys) = 1:0:ys
hopLeft (y:ys) = y:hopLeft ys
leapRight [] = []
leapRight (-1:y:0:ys) = 0:y:(-1):ys
leapRight (y:ys) = y:leapRight ys
hopRight [] = []
hopRight (-1:0:ys) = 0:(-1):ys
hopRight (y:ys) = y:hopRight ys

Iterative solution to the peg puzzle:


[Int] solve([Int] start, [Int] end) {
stack = [[start]]
s = []
while stack ≠ [] {
c = pop(stack)
if c[0] == end {
s += reverse(c)
} else {
for [Int] m in moves(c[0]) {
stack += (m:c)
}
}
}
return s
}

[[Int]] moves([Int] s) {
[[Int]] ms = []
n = length(s)
p = find(s, 0)
if p < n - 2 and s[p+2] > 0 then ms += swap(s, p, p+2)
if p < n - 1 and s[p+1] > 0 then ms += swap(s, p, p+1)
if p > 1 and s[p-2] < 0 then ms += swap(s, p, p-2)
if p > 0 and s[p-1] < 0 then ms += swap(s, p, p-1)
return ms
}

[Int] swap([Int] s, Int i, Int j) {


a = copy(s)
(a[i], a[j]) = (a[j], a[i])
return a
}

The wolf, goat, cabbage cross river puzzle:


import Data.Bits
import qualified Data.Sequence as Queue
import Data.Sequence (Seq((:<| )), (><))

solve = bfsSolve $ Queue.singleton [(15, 0)] where


bfsSolve Queue.Empty = [] −− no solution
bfsSolve (c@(p:_) :<| cs)
| fst p == 0 = reverse c
| otherwise = bfsSolve (cs >< (Queue.fromList $ map (:c)
(filter (`valid` c) $ moves p)))

valid (a, b) r = not $ or [ a `elem` [3, 6], b `elem` [3, 6], (a, b) `elem` r]

moves (a, b) = if b < 8 then trans a b else map swap (trans b a) where
trans x y = [(x - 8 - i, y + 8 + i)
| i ← [0, 1, 2, 4], i == 0 || (x .&. i) ≠ 0]
swap (x, y) = (y, x)

The extended Euclid algorithm to solve the water jugs puzzle:


extGcd 0 b = (b, 0, 1)
extGcd a b = let (d, x', y') = extGcd (b `mod` a) a in
(d, y' - x' ∗ (b `div` a), x')

solve a b g | g `mod` d ≠ 0 = []
| otherwise = solve' (x ∗ g `div` d)
where
(d, x, y) = extGcd a b
solve' x | x < 0 = solve' (x + b)
| otherwise = pour x [(0, 0)]
pour 0 ps = reverse ((0, g):ps)
pour x ps@((a', b'):_) | a' == 0 = pour (x - 1) ((a, b'):ps)
| b' == b = pour x ((a', 0):ps)
| otherwise = pour x ((max 0 (a' + b' - b),
min (a' + b') b):ps)

BFS solution to the water jugs puzzle:


import qualified Data.Sequence as Queue
import Data.Sequence (Seq((:<| )), (><))

solve' a b g = bfs $ Queue.singleton [(0, 0)] where


bfs Queue.Empty = []
bfs (c@(p:_) :<| cs)
| fst p == g || snd p == g = reverse c
| otherwise = bfs (cs >< (Queue.fromList $ map (:c) $ expand c))
expand ((x, y):ps) = filter (`notElem` ps) $ map (λf → f x y)
[fillA, fillB, pourA, pourB, emptyA, emptyB]
fillA _ y = (a, y)
fillB x _ = (x, b)
emptyA _ y = (0, y)
emptyB x _ = (x, 0)
pourA x y = (max 0 (x + y - b), min (x + y) b)
pourB x y = (min (x + y) a, max 0 (x + y - a))

Iterative BFS for water jugs puzzle:


data Step {
Pair<Int> (p, q)
Step parent
Step(Pair<Int>(x, y), Step p = null) {
(p, q) = (x, y), parent = p
}
}

Bool (==) (Step a, Step b) = {a.(p, q) == b.(p, q)}


Bool (≠) (Step a, Step b) = not ◦ (==)

[Step] expand(Step s, Int a, Int b) {


var (p, q) = s.(p, q)
return [Step(a, q, s), /∗fill A∗/
Step(p, b, s), /∗fill B∗/
Step(0, q, s), /∗empty A∗/
Step(p, 0, s), /∗empty B∗/
Step(max(0, p + q - b), min(p + q, b), s), /∗pour A into B∗/
Step(min(p + q, a), max(0, p + q - a), s)] /∗pour B into A∗/
}

Optional<[Step]> solve(Int a, Int b, Int g) {


q = Queue<Step>(Step(0, 0))
Set<Step> visited = {head(q)}
while not empty(q) {
var cur = pop(q)


if cur.p == g || cur.q == g {
return Optional.of(backtrack(cur))
} else {
for s in expand(cur, a, b) {
if cur ≠ s and s not in visited {
push(q, s)
visited += s
}
}
}
}
return Optional.Nothing
}

[Step] backtrack(Step s) {
[Step] seq
while s ≠ null {
seq = s : seq
s = s.parent
}
return seq
}

Klotski puzzle:
import qualified Data.Map as Map
import qualified Data.Set as Set
import qualified Data.Sequence as Queue
import Data.Sequence (Seq((:<| )), (><))

cellOf (y, x) = y ∗ 4 + x
posOf c = (c `div` 4, c `mod` 4)

cellSet = Set.fromList ◦ (map cellOf)

type Layout = Map.Map Integer (Set.Set Integer)


type NormLayout = Set.Set (Set.Set Integer)
type Move = (Integer, Integer)

start = Map.map cellSet $ Map.fromList


[(1, [(0, 0), (1, 0)]),
(2, [(0, 3), (1, 3)]),
(3, [(2, 0), (3, 0)]),
(4, [(2, 1), (2, 2)]),
(5, [(2, 3), (3, 3)]),
(6, [(4, 0)]), (7, [(3, 1)]), (8, [(3, 2)]), (9, [(4, 3)]),
(10, [(0, 1), (0, 2), (1, 1), (1, 2)])]

end = cellSet [(3, 1), (3, 2), (4, 1), (4, 2)]

normalize = Set.fromList ◦ Map.elems

mirror = Map.map (Set.map f) where


f c = let (y, x) = posOf c in cellOf (y, 3 - x)

klotski = solve q visited where


q = Queue.singleton (start, [])
visited = Set.singleton (normalize start)

solve Queue.Empty _ = []
solve ((x, ms) :<| cs) visited | Map.lookup 10 x == Just end = reverse ms
| otherwise = solve q visited'
where
q = cs >< (Queue.fromList [(move x op, op:ms) | op ← ops ])
visited' = foldr Set.insert visited (map (normalize ◦ move x) ops)
ops = expand x visited

expand x visited = [(i, d) | i ←[1..10], d ← [-1, 1, -4, 4],


valid i d, unique i d]
where
valid i d = let p = trans d (maybe Set.empty id $ Map.lookup i x) in
(not $ any (outside d) p) &&
(Map.keysSet $ Map.filter (overlapped p) x)
`Set.isSubsetOf` Set.singleton i
outside d c = c < 0 || c ≥ 20 ||
              (d == 1 && c `mod` 4 == 0) || (d == -1 && c `mod` 4 == 3)
unique i d = let ly = move x (i, d) in all (`Set.notMember` visited)
[normalize ly, normalize (mirror ly)]

move x (i, d) = Map.update (Just ◦ trans d) i x

trans d = Set.map (d+)

overlapped :: (Set.Set Integer) → (Set.Set Integer) → Bool


overlapped a b = (not ◦ Set.null) $ Set.intersection a b

Iterative solution to the Klotski puzzle:


type Layout = [Set<Int>]

Layout START = [{0, 4}, {3, 7}, {8, 12}, {9, 10},
{11, 15},{16},{13}, {14}, {19}, {1, 2, 5, 6}]

Set<Int> END = {13, 14, 17, 18}

(Int, Int) pos(Int c) = (y = c / 4, x = c mod 4)

[[Int]] matrix(Layout layout) {


[[Int]] m = replicate(replicate(0, 4), 5)
for Int i, var p in (zip([1, 2, ...], layout)) {
for var c in p {
y, x = pos(c)
m[y][x] = i
}
}
return m
}

data Node {
Node parent
Layout layout

Node(Layout l, Node p = null) {


layout = l, parent = p
}
}

//usage: solve(START, END)


Optional<Node> solve(Layout start, Set<Int> end) {
var visit = {Set(start)}
var queue = Queue.of(Node(start))
while not empty(queue) {
cur = pop(queue)
if last(cur.layout) == end {
return Optional.of(cur)
} else {
for ly in expand(cur.layout, visit) {
push(queue, Node(ly, cur))
add(visit, Set(ly))
}
}
}
return Optional.None
}

[Layout] expand(Layout layout, Set<Set<Layout>> visit) {


Bool bound(Set<Int> piece, Int d) {
for c in piece {
if c + d < 0 or c + d ≥ 20 then return False
if d == 1 and c mod 4 == 3 then return False
if d == -1 and c mod 4 == 0 then return False
}
return True
}

var m = matrix(layout)
Bool valid(Set<Int> piece, Int d, Int i) {
for c in piece {
y, x = pos(c + d)
if m[y][x] not in [0, i] then return False
}
return True
}

Bool unique(Layout ly) {


n = Set(ly)
Set<Set<Int>> m = map(p → map(c → 4 ∗ (c / 4) + 3 - (c mod 4), p), n)
return (n not in visit) and (m not in visit)
}

[Layout] s = []
for i, p in zip([1, 2, ...], layout) {
for d in [-1, 1, -4, 4] {
if bound(p, d) and valid(p, d, i) {
ly = move(layout, i - 1, d)
if unique(ly) then s.append(ly)
}
}
}
return s
}

Layout move(Layout layout, Int i, Int d) {


ly = clone(layout)
ly[i] = map((d+), layout[i])
return ly
}

Code, decode with a Huffman tree:


code = Map.fromList ◦ (traverse []) where
traverse bits (Leaf _ c) = [(c, reverse bits)]
    traverse bits (Branch _ l r) = traverse (0:bits) l ++ traverse (1:bits) r

encode dict = concatMap (dict !)

decode tr cs = find tr cs where


find (Leaf _ c) [] = [c]
find (Leaf _ c) bs = c : find tr bs
find (Branch _ l r) (b:bs) = find (if b == 0 then l else r) bs

Greedy change-making:
import qualified Data.Set as Set
import Data.List (group)

solve x = assoc ◦ change x where


change 0 _ = []
change x cs = let c = Set.findMax $ Set.filter ( ≤ x) cs in c : change (x - c) cs
assoc = (map (λcs → (head cs, length cs))) ◦ group

example = solve 142 $ Set.fromList [1, 5, 25, 50, 100]

Dynamic programming change-making:


[Int] change(Int x, Set<Int> cs) {
    t = [(0, None)] ++ [(x + 1, None)] ∗ x
    for i = 1 to x {
        for c in cs {
            if c ≤ i {
                (n, _) = t[i - c]
                (m, _) = t[i]
                if 1 + n < m then t[i] = (1 + n, c)
            }
        }
    }
    [Int] s = []
    while x > 0 {
        (_, c) = t[x]
        s += c
        x = x - c
    }
    return s
}

Dynamic programming with fold to solve the change-making problem:


import qualified Data.Set as Set
import Data.Sequence (( |>), singleton, index)

changemk x cs = makeChange x $ foldl fill (singleton (0, 0)) [1..x] where


fill tab i = tab |> (n, c) where
(n, c) = minimum $ Set.map lookup $ Set.filter ( ≤ i) cs
lookup c = (1 + fst (tab `index` (i - c)), c)
makeChange 0 _ = []
makeChange x tab = let c = snd $ tab `index` x in c : makeChange (x - c) tab
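A hypothetical usage, mirroring the greedy example above; the DP version finds an optimal (seven-coin) change for 142:

-- example = changemk 142 (Set.fromList [1, 5, 25, 50, 100])
-- evaluates to a 7-coin list summing to 142, e.g. 100+25+5+5+5+1+1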

The longest common sub-sequence:


[K] lcs([K] xs, [K] ys) {
Int m = length(xs), n = length(ys)
[[Int]] c = [[0]∗(n + 1)]∗(m + 1)
for i = 1 to m {
for j = 1 to n {
if xs[i-1] == ys[j-1] {
c[i][j] = c[i-1][j-1] + 1
} else {
c[i][j] = max(c[i-1][j], c[i][j-1])
}
}
}
return fetch(c, xs, ys)
}

[K] fetch([[Int]] c, [K] xs, [K] ys) {


[K] r = []
var m = length(xs), n = length(ys)
while m > 0 and n > 0 {
if xs[m - 1] == ys[n - 1] {
r += xs[m - 1]
m = m - 1
n = n - 1
} else if c[m - 1][n] > c[m][n - 1] {
m = m - 1
} else {
n = n - 1
}
}
return reverse(r)
}

Existence of the subset sum:


Bool subsetsum([Int] xs, Int s) {
    Int l = 0, u = 0, n = length(xs)
    for x in xs {
        if x > 0 then u = u + x else l = l + x    // sums of positives/negatives
    }
    tab = [[False] ∗ (u - l + 1)] ∗ (n + 1)
    tab[0][0 - l] = True
    for i, x in zip([1, 2, ..., n], xs) {
        tab[i][x - l] = True
        for j = l to u {
            tab[i][j - l] |= tab[i-1][j - l]
            j1 = j - x
            if l ≤ j1 ≤ u then tab[i][j - l] |= tab[i-1][j1 - l]
        }
    }
    return tab[n][s - l]
}

Solve the subset sum with a vector:


{{Int}} subsetsum(xs, s) {
    Int l = 0, u = 0, n = length(xs)
    for x in xs {
        if x > 0 then u = u + x else l = l + x    // sums of positives/negatives
    }
    tab = {} ∗ (u - l + 1)
    for x in xs {
        tab1 = copy(tab)
        for j = l to u {
            if x == j then add(tab1[j], {x})
            j1 = j - x
            if l ≤ j1 ≤ u and tab[j1] ≠ {} {
                tab1[j] |= {add(ys, x) for ys in tab[j1]}
            }
        }
        tab = tab1
    }
    return tab[s]
}
Imperative delete for red-black tree

We need to handle more cases for the imperative delete than for insert. To resume balance after
cutting a node off the red-black tree, we perform rotations and re-coloring. When
deleting a black node, rule 5 will be violated, because the number of black nodes along the
path through that node reduces by one. We introduce 'doubly black' to keep the
number of black nodes unchanged. The below example program adds 'doubly black' to the
color definition:
data Color {RED, BLACK, DOUBLY_BLACK}

When deleting a node, we re-use the binary search tree delete in the first step, then
further fix the balance if the node is black.
1: function Delete(T, x)
2: p ← Parent(x)
3: q ← NIL
4: if Left(x) = NIL then
5: q ← Right(x)
6: Replace(x, Right(x)) . replace x with its right sub-tree
7: else if Right(x) = NIL then
8: q ← Left(x)
9: Replace(x, Left(x)) . replace x with its left sub-tree
10: else
11: y ← Min(Right(x))
12: p ← Parent(y)
13: q ← Right(y)
14: Key(x) ← Key(y)
15: copy data from y to x
16: Replace(y, Right(y)) . replace y with its right sub-tree
17: x←y
18: if Color(x) = BLACK then
19: T ← Delete-Fix(T , Make-Black(p, q), q = NIL?)
20: release x
21: return T
Delete takes the root T and the node x to be deleted as parameters; x can be
located through a lookup. If x has an empty sub-tree, we cut x off, then replace it with
the other sub-tree q. Otherwise, we locate the minimum node y in the right sub-tree of
x, replace x with y, then cut y off. If x is black, we call Make-Black(p,
q) to maintain the blackness before further fixing.
1: function Make-Black(p, q)
2: if p = NIL and q = NIL then

275
276 IMPERATIVE DELETE FOR RED-BLACK TREE

3: return NIL . The tree was singleton


4: else if q = NIL then
5: n ← Doubly Black NIL
6: Parent(n) ← p
7: return n
8: else
9: return Blacken(q)
If both p and q are empty, we are deleting the only leaf from a singleton tree; the
result is empty. If the parent p is not empty but q is, we are deleting a black leaf. We
use NIL to replace it; as NIL is already black, we change it to 'doubly black' NIL to
maintain the blackness. Otherwise, if neither p nor q is empty, we call Blacken(q): if
q is red, it changes to black; if q is already black, it changes to doubly black. As the
next step, we need to eliminate the doubly blackness through tree rotations and re-coloring.
There are three different cases ([4], pp. 292). In each case, the doubly black node may or may not
be NIL.
Case 1. The sibling of the doubly black node is black, and it has a red sub-tree. We
can rotate the tree to fix the doubly black. There are 4 sub-cases; all can be transformed
to a uniform structure, as shown in fig. 37.

x x

a z a y

y d b z
y
b c c
x z d

a d z
z b c

y d
x d

x c
a y

a b
b c

Figure 37: The doubly black node has a black sibling, and a red nephew. It can be fixed
with a rotation.

1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then . x is doubly black NIL
4: n←x
5: if x = NIL then . Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do . x is doubly black, but not the root
8: if Sibling(x) 6= NIL then . The sibling is not empty
9: s ← Sibling(x)
10: ...
11: if s is black and Left(s) is red then
12: if x = Left(Parent(x)) then . x is the left
13: set x, Parent(x), and Left(s) all black
277

14: T ← Rotate-Right(T , s)
15: T ← Rotate-Left(T , Parent(x))
16: else . x is the right
17: set x, Parent(x), s, and Left(s) all black
18: T ← Rotate-Right(T , Parent(x))
19: else if s is black and Right(s) is red then
20: if x = Left(Parent(x)) then . x is the left
21: set x, Parent(x), s, and Right(s) all black
22: T ← Rotate-Left(T , Parent(x))
23: else . x is the right
24: set x, Parent(x), and Right(s) all black
25: T ← Rotate-Left(T , s)
26: T ← Rotate-Right(T , Parent(x))
27: ...
Case 2. The sibling of the doubly black is red. We can rotate the tree to change the
doubly black node to black: as shown in fig. 38, change a or c to black. We can
add this fixing to the previous implementation.

x y

a y x c

b c a b

y x

x c a y

a b b c

Figure 38: The sibling of the doubly black is red

1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then . x is doubly black NIL
4: n←x
5: if x = NIL then . Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do
8: if Sibling(x) 6= NIL then
9: s ← Sibling(x)
10: if s is red then . The sibling is red
11: set Parent(x) red
12: set s black
13: if x = Left(Parent(x)) then . x is the left
14: T ← Rotate-Left(T , Parent(x))
15: else . x is the right
16: T ← Rotate-Right(T , Parent(x))
17: else if s is black and Left(s) is red then
18: ...
Case 3. The sibling of the doubly black node and its two sub-trees are all black. In
this case, we re-color the sibling to red, change the doubly black node back to black, then
move the doubly blackness up to the parent. As shown in fig. 39, there are two
symmetric sub-cases.

x x

a y a y

b c b c

y y

x c x c

a b a b

Figure 39: move the blackness up

In all the above 3 cases, the sibling of the doubly black node isn't empty. Otherwise, we change
the doubly black node back to black, and move the blackness up. When reaching the root,
we force the root to be black to complete the fixing. The process also terminates if the doubly black
node is eliminated midway through re-coloring. Finally, if the doubly black node passed
in is empty, we turn it back to normal NIL.
1: function Delete-Fix(T , x, f )
2: n ← NIL
3: if f = True then . x is a doubly black NIL
4: n←x
5: if x = NIL then . Delete the singleton leaf
6: return NIL
7: while x ≠ T and Color(x) = B² do
8: if Sibling(x) 6= NIL then . The sibling is not empty
9: s ← Sibling(x)
10: if s is red then . The sibling is red
11: set Parent(x) red
12: set s black
13: if x = Left(Parent(x)) then . x is the left
14: T ← Rotate-Left(T , Parent(x))
15: else . x is the right
16: T ← Rotate-Right(T , Parent(x))
17: else if s is black and Left(s) is red then
18: if x = Left(Parent(x)) then . x is the left
19: set x, Parent(x), and Left(s) all black
20: T ← Rotate-Right(T , s)
21: T ← Rotate-Left(T , Parent(x))
22: else . x is the right
23: set x, Parent(x), s, and Left(s) all black
24: T ← Rotate-Right(T , Parent(x))
25: else if s is black and Right(s) is red then


26: if x = Left(Parent(x)) then . x is the left
27: set x, Parent(x), s, and Right(s) all black
28: T ← Rotate-Left(T , Parent(x))
29: else . x is the right
30: set x, Parent(x), and Right(s) all black
31: T ← Rotate-Left(T , s)
32: T ← Rotate-Right(T , Parent(x))
33: else if s, Left(s), and Right(s) are all black then
34: set x black
35: set s red
36: Blacken(Parent(x))
37: x ← Parent(x)
38: else . move the blackness up
39: set x black
40: Blacken(Parent(x))
41: x ← Parent(x)
42: set T black
43: if n 6= NIL then
44: replace n with NIL
45: return T
When fixing, we pass in the root T , the node x (can be doubly black), and a flag f .
The flag is true if x is doubly black NIL. We record it with n, and replace n with the
normal NIL after fixing.
Below is the example program implements delete:
Node del(Node t, Node x) {
if x == null then return t
var parent = x.parent;
Node db = null; //doubly black

if x.left == null {
db = x.right
x.replaceWith(db)
} else if x.right == null {
db = x.left
x.replaceWith(db)
} else {
var y = min(x.right)
parent = y.parent
db = y.right
x.key = y.key
y.replaceWith(db)
x = y
}
if x.color == Color.BLACK {
t = deleteFix(t, makeBlack(parent, db), db == null);
}
remove(x)
return t
}

Where makeBlack checks if the node changes to doubly black, and handles the special
case of doubly black NIL.
Node makeBlack(Node parent, Node x) {
if parent == null and x == null then return null
return if x == null
        then replace(parent, x, Node(0, Color.DOUBLY_BLACK))
        else blacken(x)
}

The function replace(parent, x, y) replaces the child of the parent, which is x, with y.
Node replace(Node parent, Node x, Node y) {
    if parent == null {
        if y ≠ null then y.parent = null
    } else if parent.left == x {
        parent.setLeft(y)
    } else {
        parent.setRight(y)
    }
    if x ≠ null then x.parent = null
    return y
}

The function blacken(node) changes the red node to black, and the black node to
doubly black:
Node blacken(Node x) {
x.color = if isRed(x) then Color.BLACK else Color.DOUBLY_BLACK
return x
}

Below example program implements the fixing:


Node deleteFix(Node t, Node db, Bool isDBEmpty) {
var dbEmpty = if isDBEmpty then db else null
if db == null then return null // delete the root
while (db ≠ t and db.color == Color.DOUBLY_BLACK) {
var s = db.sibling()
var p = db.parent
if (s ≠ null) {
if isRed(s) {
// the sibling is red
p.color = Color.RED
s.color = Color.BLACK
t = if db == p.left then leftRotate(t, p)
else rightRotate(t, p)
} else if isBlack(s) and isRed(s.left) {
// the sibling is black, and one sub-tree is red
if db == p.left {
db.color = Color.BLACK
p.color = Color.BLACK
s.left.color = p.color
t = rightRotate(t, s)
t = leftRotate(t, p)
} else {
db.color = Color.BLACK
p.color = Color.BLACK
s.color = p.color
s.left.color = Color.BLACK
t = rightRotate(t, p)
}
} else if isBlack(s) and isRed(s.right) {
if (db == p.left) {
db.color = Color.BLACK
p.color = Color.BLACK
s.color = p.color
s.right.color = Color.BLACK
t = leftRotate(t, p)
} else {
db.color = Color.BLACK
p.color = Color.BLACK
s.right.color = p.color
t = leftRotate(t, s)
t = rightRotate(t, p)
}
} else if isBlack(s) and isBlack(s.left) and
isBlack(s.right) {
// the sibling and both sub-trees are black.
// move blackness up
db.color = Color.BLACK
s.color = Color.RED
blacken(p)
db = p
}
} else { // no sibling, move blackness up
db.color = Color.BLACK
blacken(p)
db = p
}
}
t.color = Color.BLACK
if (dbEmpty ≠ null) { // change the doubly black nil to nil
dbEmpty.replaceWith(null)
delete dbEmpty
}
return t
}

Where isBlack(x) tests if a node is black, the NIL node is also black.
Bool isBlack(Node x) = (x == null or x.color == Color.BLACK)

Bool isRed(Node x) = (x ≠ null and x.color == Color.RED)

Before returning the final result, we check the doubly black NIL, and call the replaceWith function defined in Node.
data Node<T> {
//...
void replaceWith(Node y) = replace(parent, this, y)
}

The program terminates when reaching the root, or when the doubly blackness is eliminated.
As we keep the red-black tree balanced, the delete algorithm is bound to O(lg n) time
for a tree of n nodes.
AVL tree - proofs and the delete algorithm

I Height increment
When inserting an element, the height increment can be deduced into 4 cases:

∆H = |T′| − |T|
   = 1 + max(|r′|, |l′|) − (1 + max(|r|, |l|))
   = max(|r′|, |l′|) − max(|r|, |l|)
   = { δ ≥ 0, δ′ ≥ 0 :   ∆r
     ; δ ≤ 0, δ′ ≥ 0 :   δ + ∆r
     ; δ ≥ 0, δ′ ≤ 0 :   ∆l − δ
     ; otherwise :       ∆l }     (67)

Proof. Upon insertion, the height cannot increase on both the left and the right. We can explain
the 4 cases from the balance factor definition, which is the difference between the right and left
sub-trees:
1. If δ ≥ 0 and δ′ ≥ 0, the height of the right sub-tree is not less than that of the
left sub-tree, before and after the insertion. In this case, the height increment is only
'contributed' by the right side, which is ∆r.
2. If δ ≤ 0, the height of the left sub-tree is not less than that of the right before insertion. Since
δ′ ≥ 0 after insertion, we know the height of the right sub-tree increases, and the left side
keeps the same (|l′| = |l|). The height increment is:

∆H = max(|r′|, |l′|) − max(|r|, |l|)   {δ ≤ 0 and δ′ ≥ 0}
   = |r′| − |l|                        {|l| = |l′|}
   = |r| + ∆r − |l|
   = δ + ∆r

3. If δ ≥ 0 and δ′ ≤ 0, similar to the above case, we have:

∆H = max(|r′|, |l′|) − max(|r|, |l|)   {δ ≥ 0 and δ′ ≤ 0}
   = |l′| − |r|
   = |l| + ∆l − |r|
   = ∆l − δ

4. Otherwise, δ and δ′ are both not bigger than zero. The height of the left sub-tree
is not less than that of the right. The height increment is only 'contributed' by the left side,
which is ∆l.

II Balance adjustment after insert

The balance factors are ±2 in the 4 cases shown in fig. 40. After fixing, δ(y) resumes
to 0, and the heights of the left and right sub-trees become equal.

Figure 40: Fix 4 cases to the same structure

The four cases are left-left, right-right, right-left, and left-right. Let the balance
factors before fixing be δ(x), δ(y), and δ(z); after fixing, they change to δ′(x), δ′(y), and
δ′(z) respectively. We next prove that δ′(y) = 0 in all 4 cases after fixing, and give the
results of δ′(x) and δ′(z).

Proof. We break into 4 cases:


Left-left
The sub-tree x keeps unchanged, hence δ′(x) = δ(x). As δ(y) = −1 and δ(z) = −2,
we have:

δ(y) = |c| − |x| = −1 ⇒ |c| = |x| − 1
δ(z) = |d| − |y| = −2 ⇒ |d| = |y| − 2     (68)

After fixing:

δ′(z) = |d| − |c|               {from (eq. (68))}
      = |y| − 2 − (|x| − 1)
      = |y| − |x| − 1           {x is a sub-tree of y ⇒ |y| − |x| = 1}
      = 0                                                 (69)

For δ′(y), we have the following:

δ′(y) = |z| − |x|
      = 1 + max(|c|, |d|) − |x|   {by (eq. (69)), |c| = |d|}
      = 1 + |c| − |x|             {by (eq. (68))}
      = 1 + |x| − 1 − |x|
      = 0                                                 (70)

Summarizing the above, the balance factors change as follows in the left-left case:

δ′(x) = δ(x)
δ′(y) = 0                                                 (71)
δ′(z) = 0

Right-right
The right-right case is symmetric to left-left:

δ′(x) = 0
δ′(y) = 0                                                 (72)
δ′(z) = δ(z)

Right-left
Consider δ′(x); after fixing, it is:

δ′(x) = |b| − |a|                                         (73)

Before fixing, the height of z can be obtained as:

|z| = 1 + max(|y|, |d|)    {δ(z) = −1 ⇒ |y| > |d|}
    = 1 + |y|                                             (74)
    = 2 + max(|b|, |c|)

Since δ(x) = 2, we have:

δ(x) = 2 ⇒ |z| − |a| = 2                {by (eq. (74))}
         ⇒ 2 + max(|b|, |c|) − |a| = 2                    (75)
         ⇒ max(|b|, |c|) − |a| = 0

If δ(y) = |c| − |b| = 1, then:

max(|b|, |c|) = |c| = |b| + 1                             (76)

Take this into (eq. (75)):

|b| + 1 − |a| = 0 ⇒ |b| − |a| = −1    {by (eq. (73))}
                  ⇒ δ′(x) = −1                            (77)

If δ(y) ≠ 1, then max(|b|, |c|) = |b|. Take this into (eq. (75)):

|b| − |a| = 0    {by (eq. (73))}
⇒ δ′(x) = 0                                               (78)

Summarizing the 2 cases, we obtain the result of δ′(x) in terms of δ(y):

δ′(x) = { δ(y) = 1 :   −1
        ; otherwise :  0 }                                (79)

For δ′(z), from the definition, it equals:

δ′(z) = |d| − |c|             {δ(z) = −1 = |d| − |y|}
      = |y| − |c| − 1         {|y| = 1 + max(|b|, |c|)}   (80)
      = max(|b|, |c|) − |c|
If δ(y) = |c| − |b| = −1, then max(|b|, |c|) = |b| = |c| + 1. Take this into (eq. (80)),
we have δ′(z) = 1. If δ(y) ≠ −1, then max(|b|, |c|) = |c|, and δ′(z) = 0. Combining
these two cases, we obtain the result of δ′(z) in terms of δ(y):

δ′(z) = { δ(y) = −1 :  1
        ; otherwise :  0 }                                (81)

Finally, for δ′(y), we deduce it as below:

δ′(y) = |z| − |x|
      = max(|c|, |d|) − max(|a|, |b|)                     (82)

There are three cases:

1. If δ(y) = 0, then |b| = |c|. According to (eq. (79)) and (eq. (81)), we have δ′(x) =
0 ⇒ |a| = |b|, and δ′(z) = 0 ⇒ |c| = |d|. These lead to δ′(y) = 0.

2. If δ(y) = 1, from (eq. (81)), we have δ′(z) = 0 ⇒ |c| = |d|.

δ′(y) = max(|c|, |d|) − max(|a|, |b|)   {|c| = |d|}
      = |c| − max(|a|, |b|)   {from (eq. (79)): δ′(x) = −1 ⇒ |b| − |a| = −1}
      = |c| − (|b| + 1)       {δ(y) = 1 ⇒ |c| − |b| = 1}
      = 0

3. If δ(y) = −1, from (eq. (79)), we have δ′(x) = 0 ⇒ |a| = |b|.

δ′(y) = max(|c|, |d|) − max(|a|, |b|)   {|a| = |b|}
      = max(|c|, |d|) − |b|   {from (eq. (81)): |d| − |c| = 1}
      = |c| + 1 − |b|         {δ(y) = −1 ⇒ |c| − |b| = −1}
      = 0

All three cases lead to the same result δ′(y) = 0. Summarizing all the above, we get the
updated balance factors after fixing:

δ′(x) = { δ(y) = 1 :   −1
        ; otherwise :  0 }
δ′(y) = 0                                                 (83)
δ′(z) = { δ(y) = −1 :  1
        ; otherwise :  0 }

Left-right
The left-right case is symmetric to the right-left case. With a similar method, we can obtain
new balance factors that are identical to (eq. (83)).

III Delete algorithm

Deletion may reduce the height of a sub-tree. If the balance factor exceeds the range
[−1, 1], then we need fixing.
∗ Functional delete
When deleting, we re-use the binary search tree delete in the first step, then check the
balance factors and perform fixing. The result is a pair (T′, ∆H), where T′ is the new
tree and ∆H is the height decrement. We define delete as below:

delete = fst ◦ del                                        (84)

where del(T, k) does the actual work to delete element k from T:

del ∅ k = (∅, 0)
del (l, k′, r, δ) k = { k < k′ :  tree (del l k) k′ (r, 0) δ
                      ; k > k′ :  tree (l, 0) k′ (del r k) δ
                      ; k = k′ :  { l = ∅ :  (r, −1)
                                  ; r = ∅ :  (l, −1)
                                  ; else :   tree (l, 0) k″ (del r k″) δ
                                             where k″ = min(r) } }     (85)

 

If the tree is empty, the result is (∅, 0); otherwise, let the tree be T = (l, k′, r, δ). We
compare k and k′, then look up and delete recursively. When k = k′, we locate the node to
be deleted. If either sub-tree of it is empty, we cut the node off, and replace it with the
other sub-tree; otherwise, we use the minimum k″ of the right sub-tree to replace k′, and
cut k″ off. We re-use the tree function and the ∆H result. In addition to the insert cases,
there are two cases that violate the AVL rule and need fixing. As shown in fig. 41, both
cases can be fixed by a tree rotation. We define them as pattern matching:

(a) Fix case A: δ(y) = −2, δ(x) = 0; after the rotation, δ(x)′ = δ(x) + 1, δ(y)′ = −1.
(b) Fix case B: δ(x) = 2, δ(y) = 0; after the rotation, δ(x)′ = 1, δ(y)′ = δ(y) − 1.

Figure 41: delete fix

...
balance ((a, x, b, δ(x)), y, c, −2) ∆H = ((a, x, (b, y, c, −1), δ(x) + 1), ∆H)
balance (a, x, (b, y, c, δ(y)), 2) ∆H = (((a, x, b, 1), y, c, δ(y) − 1), ∆H)     (86)
...

Below is the example program:


delete t x = fst $ del t x where


del Empty _ = (Empty, 0)
del (Br l k r d) x
| x < k = node (del l x) k (r, 0) d
| x > k = node (l, 0) k (del r x) d
| isEmpty l = (r, -1)
| isEmpty r = (l, -1)
| otherwise = node (l, 0) k' (del r k') d where k' = min r

Where min and isEmpty are defined as below:


min (Br Empty x _ _) = x
min (Br l _ _ _) = min l

isEmpty Empty = True


isEmpty _ = False

With the additional two, there are total 7 cases in balance implementation:
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2), dH) =
(Br (Br a x b dx) y (Br c z d 0) 0, dH-1)
balance (Br a x (Br b y (Br c z d dz) 1) 2, dH) =
(Br (Br a x b 0) y (Br c z d dz) 0, dH-1)
balance (Br (Br a x (Br b y c dy) 1) z d (-2), dH) =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2, dH) =
(Br (Br a x b dx') y (Br c z d dz') 0, dH-1) where
dx' = if dy == 1 then -1 else 0
dz' = if dy == -1 then 1 else 0
−− Delete specific
balance (Br (Br a x b dx) y c (-2), dH) =
(Br a x (Br b y c (-1)) (dx+1), dH)
balance (Br a x (Br b y c dy) 2, dH) =
(Br (Br a x b 1) y c (dy-1), dH)
balance (t, d) = (t, d)

† Imperative delete
The imperative delete uses tree rotations for fixing. In the first step, we re-use the binary
search tree algorithm to delete the node x from tree T ; then in the second step, check the
balance factor and perform rotation.
1: function Delete(T, x)
2: if x = NIL then
3: return T
4: p ← Parent(x)
5: if Left(x) = NIL then
6: y ← Right(x)
7: replace x with y
8: else if Right(x) = NIL then
9: y ← Left(x)
10: replace x with y
11: else
12: z ← Min(Right(x))
13: copy data from z to x
14: p ← Parent(z)
15: y ← Right(z)
16: replace z with y


17: return AVL-Delete-Fix(T, p, y)
When deleting node x, we record its parent in p. If either sub-tree is empty, we cut
x off, and replace it with the other sub-tree. Otherwise, we
locate the minimum element z of the right sub-tree, copy data from z to x, then cut z off.
Finally, we call AVL-Delete-Fix with the root T, the parent p, and the replacement
node y. Let the balance factor of p be δ(p); it changes to δ′(p) after delete. There
are three cases:

1. |δ(p)| = 0, |δ′(p)| = 1. After delete, although a sub-tree height decreases, the parent
still satisfies the AVL rule. The algorithm terminates as the tree is still balanced;

2. |δ(p)| = 1, |δ′(p)| = 0. Before the delete, the height difference between the two
sub-trees was 1; after the delete, the higher sub-tree shrinks by 1, and both sub-trees
now have the same height. As a result, the height of the parent also decreases by
1. We need to continue the bottom-up update along the parent references to the root;

3. |δ(p)| = 1, |δ′(p)| = 2. After delete, the tree violates the AVL height rule; we need to
rotate the tree to fix it.

For case 3, the implementation is similar to the insert fixing. We need to add two
additional sub-cases, as shown in fig. 41.
1: function AVL-Delete-Fix(T, p, x)
2: while p 6= NIL do
3: l ← Left(p), r ← Right(p)
4: δ ← δ(p), δ′ ← δ
5: if x = l then
6: δ′ ← δ′ + 1
7: else
8: δ′ ← δ′ − 1
9: if p is a leaf then . l = r = NIL
10: δ′ ← 0
11: if |δ| = 1 ∧ |δ′| = 0 then
12: x ← p
13: p ← Parent(x)
14: else if |δ| = 0 ∧ |δ′| = 1 then
15: return T
16: else if |δ| = 1 ∧ |δ′| = 2 then
17: if δ′ = 2 then
18: if δ(r) = 1 then . Right-right
19: δ(p) ← 0
20: δ(r) ← 0
21: p←r
22: T ← Left-Rotate(T, p)
23: else if δ(r) = −1 then . Right-left
24: δy ← δ( Left(r) )
25: if δy = 1 then
26: δ(p) ← −1
27: else
28: δ(p) ← 0
29: δ( Left(r) ) ← 0
30: if δy = −1 then
31: δ(r) ← 1
32: else
33: δ(r) ← 0
34: else . Delete specific right-right
35: δ(p) ← 1
36: δ(r) ← δ(r) − 1
37: T ← Left-Rotate(T, p)
38: break . No further height change
39: else if δ′ = −2 then

40: if δ(l) = −1 then . Left-left


41: δ(p) ← 0
42: δ(l) ← 0
43: p←l
44: T ← Right-Rotate(T, p)
45: else if δ(l) = 1 then . Left-right
46: δy ← δ( Right(l) )
47: if δy = −1 then
48: δ(p) ← 1
49: else
50: δ(p) ← 0
51: δ( Right(l) ) ← 0
52: if δy = 1 then
53: δ(l) ← −1
54: else
55: δ(l) ← 0
56: else . Delete specific left-left
57: δ(p) ← −1
58: δ(l) ← δ(l) + 1
59: T ← Right-Rotate(T, p)
60: break . No further height change
. Height decreases, go on bottom-up updating
61: x←p
62: p ← Parent(x)
63: if p = NIL then . Delete the root
64: return x
65: return T

IV Example program
The main delete program:
Node del(Node t, Node x) {
if x == null then return t
Node y
var parent = x.parent
if x.left == null {
y = x.replaceWith(x.right)
} else if x.right == null {
y = x.replaceWith(x.left)
} else {
y = min(x.right)
x.key = y.key
parent = y.parent
x = y
y = y.replaceWith(y.right)
}
t = deleteFix(t, parent, y)
release(x)
return t
}

Where replaceWith is defined in the chapter of red-black tree. release(x) releases the memory of a node. Function deleteFix is implemented as below:
Node deleteFix(Node t, Node parent, Node x) {
int d1, d2, dy
Node p, l, r
while parent ≠ null {
d2 = d1 = parent.delta
d2 = d2 + if x == parent.left then 1 else -1
if isLeaf(parent) then d2 = 0
parent.delta = d2
p = parent
l = parent.left
r = parent.right
if abs(d1) == 1 and abs(d2) == 0 {
x = parent
parent = x.parent
} else if abs(d1) == 0 and abs(d2) == 1 {
return t
} else if abs(d1) == 1 and abs(d2) == 2 {
if d2 == 2 {
if r.delta == 1 { // right-right
p.delta = 0
r.delta = 0
parent = r
t = leftRotate(t, p)
} else if r.delta == -1 { // right-left
dy = r.left.delta
p.delta = if dy == 1 then -1 else 0
r.left.delta = 0
r.delta = if dy == -1 then 1 else 0
parent = r.left
t = rightRotate(t, r)
t = leftRotate(t, p)
} else { // delete specific right-right
p.delta = 1
r.delta = r.delta - 1
t = leftRotate(t, p)
break // no further height change
}
} else if d2 == -2 {
if (l.delta == -1) { // left-left
p.delta = 0
l.delta = 0
parent = l
t = rightRotate(t, p)
} else if l.delta == 1 { // left-right
dy = l.right.delta
l.delta = if dy == 1 then -1 else 0
l.right.delta = 0
p.delta = if dy == -1 then 1 else 0
parent = l.right;
t = leftRotate(t, l)
t = rightRotate(t, p)
} else { // delete specific left-left
p.delta = -1
l.delta = l.delta + 1
t = rightRotate(t, p)
break // no further height change
}
}
// height decreases, go on bottom-up update
x = parent
parent = x.parent
    }
    if parent == null then return x // delete the root
    return t
}
Answers

Answer of exercise 1
1.1. For the free number puzzle, since all numbers are non-negative, we can reuse the
sign as a flag. For every |x| < n (where n is the length), mark the number at
position |x| negative; to let a zero be markable too, we store −(v + 1) instead of −v.
Then scan to find the first unmarked (non-negative) position: it is the answer.
Write a program to realize this solution.
Int minFree([Int] nums) {
    var n = length(nums)
    for Int i = 0 to n - 1 {
        var k = if nums[i] < 0 then -nums[i] - 1 else nums[i]    // decode
        if k < n then nums[k] = -(abs(nums[k]) + 1)    // mark position k used
    }
    for Int i = 0 to n - 1 {
        if nums[i] ≥ 0 then return i    // never marked: i is free
    }
    return n
}

1.2. There are n numbers 1, 2, ..., n. After some processing, they are shuffled, and a
number x is altered to y. Suppose 1 ≤ y ≤ n, design a solution to find x and y in
linear time with constant space.
For example X = [3, 1, 3, 5, 4], the missing number x = 2, the duplicated one
y = 3. We give 4 methods: (1) divide and conquer; (2) pigeon hole sort; (3) sign
encoding; and (4) equations.
Divide and conquer: partition the numbers by the middle point m = ⌊(1 + n)/2⌋:
the left as = [a | a ← X, a ≤ m], and the right bs = [b | b ← X, b > m]. If the length
|as| < m, then the missing number is on the left; let s = 1 + 2 + ... + m = m(m + 1)/2,
then x = s − sum(as). The duplicated one is on the right: let
s′ = (m + 1) + (m + 2) + ... + n = (n + m + 1)(n − m)/2, then y = sum(bs) − s′.
If the length |as| > m, then the duplicated number is on the left and the missing one on the
right; similarly, we calculate the missing number x = s′ − sum(bs),
and the duplicated number y = sum(as) − s. Otherwise, if the length |as| = m,
there are m numbers not greater than m, but we don't know whether they
are some permutation of 1, 2, ..., m. We can calculate and compare sum(as) and
s. If they equal, we can drop all numbers on the left, then recursively find x and y
on the right; otherwise, we drop the right and recursively find them on the left. In the recursive
finding, we need to replace the lower bound 1 with l. Because we halve the list
every time, the overall performance is O(n) according to the master theorem.
missDup xs = solve xs 1 (length xs) where
solve xs@(_:_:_) l u | k < m - l + 1 = (sl - sl', sr' - sr)


| k > m - l + 1 = (sr - sr', sl' - sl)


| sl == sl' = solve bs (m + 1) u
| otherwise = solve as l m
where
m = (l + u) `div` 2
(as, bs) = partition ( ≤ m) xs
k = length as
sl = (l + m) ∗ (m - l + 1) `div` 2
sr = (m + 1 + u) ∗ (u - m) `div` 2
(sl', sr') = (sum as, sum bs)

Pigeon hole sort. Since all numbers are within the range from 1 to n, we can
pigeon-hole sort them: scan from left to right, and for every number x at position i,
if x ≠ i + 1, swap it to its target position x − 1. If the target position already holds
the same value, we have found the duplicated number, and we stop moving it. After
this pass, a second scan finds the position that still holds a wrong value: that
position gives the missing number, and the value it holds is the duplicate. Because
every swap puts at least one number to its right position, the total performance is O(n).
(Int, Int) missDup([Int] xs) {
    Int n = length(xs)
    for Int i = 0 to n - 1 {
        // move xs[i] to its slot until fixed or blocked by the duplicate
        while xs[i] ≠ i + 1 and xs[xs[i] - 1] ≠ xs[i] {
            Int j = xs[i] - 1
            (xs[i], xs[j]) = (xs[j], xs[i])
        }
    }
    for Int i = 0 to n - 1 {
        if xs[i] ≠ i + 1 then return (i + 1, xs[i])    // (miss, dup)
    }
    return (-1, -1)    // no altered number found
}

Sign encoding. Set up an array of n flags. For every number x, mark the x-th
flag in the array true. When we meet the duplicated number, the corresponding flag
was marked before. Let the duplicated number be d; we know s = 1 + 2 + ... + n =
n(n + 1)/2, and the sum s′ of all the numbers. We can calculate the missing number
m = d + s − s′. However, this method needs n additional flags. The existence
of a number is a type of binary information (yes/no); we can encode it as the
positive/negative sign, hence re-using the space. For every x, flip the number at
position |x| − 1 to negative, where |x| is the absolute value. If the number at some
position is already negative, it marks the duplicated one, and we can next calculate the
missing one.
(Int, Int) missDup([Int] xs) {
    Int miss = -1, dup = -1
    Int n = length(xs)
    Int s = sum(xs)    // the sum before any mutation
    for i = 0 to n - 1 {
        Int j = abs(xs[i]) - 1
        if xs[j] < 0 {
            dup = j + 1
            miss = dup + n ∗ (n + 1) / 2 - s
            break
        }
        xs[j] = -abs(xs[j])
    }
    return (miss, dup)
}
Equation. Consider a simplified problem first: randomly drop one number after shuffling 1
to n; how to find it? We sum all the numbers, then subtract the sum from n(n + 1)/2:

m = s − s′

Where m is the missing number, s = n(n + 1)/2 is the sum from 1 to n, and s′ is the sum of all
the remaining numbers. However, for a missing number and a duplicated number together, we can't
solve the problem with only one equation:

∑(x[i] − i) = d − m                                       (3)

Where the left hand side is the sum of the i-th number minus i. Can we figure out a
second equation? We can use squares: sum the difference between the square of the
i-th number and the square of i:

∑(x[i]² − i²) = d² − m² = (d + m)(d − m)                  (4)

Since d − m ≠ 0, we can divide eq. (4) by eq. (3) on both sides to get another
equation:

∑(x[i]² − i²) / ∑(x[i] − i) = d + m                       (5)

Comparing eq. (3) and eq. (5), we have two equations with two unknowns. We can solve them:

m = (∑(x[i]² − i²)/∑(x[i] − i) − ∑(x[i] − i)) / 2
d = (∑(x[i]² − i²)/∑(x[i] − i) + ∑(x[i] − i)) / 2

missDup xs = ((b `div` a - a) `div` 2, (b `div` a + a) `div` 2)


where
ys = zip xs [1..]
a = sum [x - y | (x, y) ← ys]
b = sum [x^2 - y^2 | (x, y) ← ys]
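For instance, evaluating it on the example above:

-- missDup [3, 1, 3, 5, 4] evaluates to (2, 3): 2 is missing, 3 duplicated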

1.3. Yes, it’s essentially a solution based on queue.

Answer of exercise 1.2


1.2.1. For list of type A, suppose we can test if any two elements x, y ∈ A are equal,
define the algorithm to test if two lists are equal.
We use a type constraint to ensure we compare lists of the same type.
(==) :: Eq a ⇒ [a] → [a] → Bool
[] == [] = True
(x:xs) == (y:ys) = x == y && (xs == ys)
xs == ys = False

Answer of exercise 1.3


1.3.1. For the iterative Get-At(i, X), what is the behavior when X is empty? what if i
is out of bound?
The behavior is undefined. We can handle it with Optional<T>:
Optional<T> getAt(List<T> xs, Int i) {


while i ≠ 0 and xs ≠ null {
xs = xs.next
i--
}
return if xs 6= null then Optional.of(xs.key) else Optional.Nothing
}

Answer of exercise 1.4


1.4.1. In the Init algorithm, can we use Append(X 0 , First(X)) instead of Cons?
Append need traverse to the tail every time, hence degrades the performance from
O(n) to O(n2 ), where n is the length.
1.4.2. How to handle empty list or out of bound error in Last-At?
Use Optional<T>:
Optional<T> lastAt(List<T> xs, Int i) {
List<T> p = xs
while i ≠ 0 and xs ≠ null {
xs = xs.next
i--
}
if xs == null then return Optional.Nothing
while xs.next ≠ null {
xs = xs.next
p = p.next
}
return Optional.of(p.key)
}

Answer of exercise 1.5


1.5.1. Add the ‘tail’ reference, optimize the append to constant time.
We need wrap the linked-list with the head and tail variables:
data List<A> {
data Node<A> {
A key
Node<A> next
}

Node<A> head = null


Node<A> tail = null
Int length = 0
}

List<A> append(List<A> xs, A x) {


List.Node<A> tl = xs.tail
xs.tail = List.Node<A>(x, null)
if tl == null {
xs.head = xs.tail
} else {
tl.next = xs.tail
}
xs.length++
return xs
}
1.5.2. When need update the tail reference? How does it affect the performance?
We need to update the tail reference when appending to or deleting from the tail, adding to
an empty list, deleting the element from a singleton list, and splitting the list. All operations
are in constant time except for splitting as = bs ++ cs: we need linear time to
reach the tail of bs.
1.5.3. Handle the empty list and out of bound error for setAt.
setAt 0 x [ ] means x : [ ]. Let |xs| = n; we treat setAt n x xs the same as xs ++ [x].
Other out-of-bound cases raise an exception.

Answer of exercise 1.6


1.6.1. Handle the out-of-bound case when insert, treat it as append.
insert n x [] = [x]
insert n x (y:ys) | n ≤ 0 = x : y : ys
                  | otherwise = y : insert (n - 1) x ys

1.6.2. Implement insert for array. When insert at position i, all elements after i need
shift to the end.
[K] insert([K] xs, Int i, K x) {
append(xs, x)
for Int j = length(xs) - 1, j > i, j-- {
swap(xs[j], xs[j-1])
}
return xs
}

Answer of exercise 1.7


1.7.1. Implement the algorithm to find and delete all occurrences of a given value.
delAll x [] = []
delAll x (y:ys) | x == y = delAll x ys
| otherwise = y : delAll x ys

Or use f oldr:
delAll x = foldr f [] where
f y ys = if x == y then ys else y : ys

1.7.2. Design the delete algorithm for array, all elements after the delete position need
shift to front.
[K] delAt(Int i, [K] xs) {
Int n = length(xs)
if 0 ≤ i < n {
while i < n - 1 {
xs[i] = xs[i + 1]
i++
}
popLast(xs)
}
return xs
}

Answer of exercise 1.8



1.8.1. Change length to tail recursive.


length = len 0 where
len n [] = n
len n (x:xs) = len (n + 1) xs

1.8.2. Compute bⁿ through the binary format of n.


b needn't be an integer; it can be real, or any element of a set that defines a unit e (e.g., 1) and a binary operation (e.g., multiplication). We call such a set a monoid in mathematics.
Monoid pow(Monoid b, Int n) {
    Monoid a = Monoid.e
    while n ≠ 0 {
        if n & 1 == 1 then a = a ∗ b
        b = b ∗ b
        n = n >> 1
    }
    return a
}
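The same idea in Haskell, as a sketch using the standard Monoid type class (mempty for the unit e, <> for the binary operation):

-- fast exponentiation over any monoid; a sketch, not from the chapter
pow :: Monoid a ⇒ a → Int → a
pow b n | n == 0 = mempty
        | even n = pow (b <> b) (n `div` 2)
        | otherwise = b <> pow (b <> b) (n `div` 2)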

Answer of exercise 1.9


1.9.1. Find the maximum v in a list of pairs [(k, v)] in tail recursive way.
maxValue ((k, v):kvs) = maxV v kvs where
maxV m [] = m
maxV m ((_, v):kvs) = maxV (max m v) kvs

Answer of exercise 1.10


1.10.1. Change the take/drop implementation: when n is negative, return [ ] for take, and the entire list for drop.
safeTake n [] = []
safeTake n (x:xs) | n ≤ 0 = []
| otherwise = x : safeTake (n - 1) xs

safeDrop n [] = []
safeDrop n (x:xs) | n ≤ 0 = x:xs
| otherwise = safeDrop (n - 1) xs

1.10.2. Implement the in-place imperative take/drop.


We need to append to/delete from the tail of the array; otherwise, we need to shift elements. Let the length of the array xs be n. We implement take(m, xs) by deleting n − m elements from the tail. For drop(m, xs), we start from xs[m] and move every element xs[i] ahead to xs[i − m]. Finally, we remove m elements from the tail.
[K] take(Int m, [K] xs) {
Int d = length(xs) - m
while d > 0 and xs 6= [] {
pop(xs)
d--
}
return xs
}

[K] drop(Int m, [K] xs) {
    if m ≤ 0 then return xs
    for Int i = m to length(xs) - 1 { xs[i - m] = xs[i] }
    while m > 0 and xs ≠ [] {
        pop(xs)
        m--
    }
    return xs
}

1.10.3. Define sublist and slice in Curried Form without X as parameter.


sublist from cnt = take cnt ◦ drop (from - 1)

slice from to = drop (from - 1) ◦ take to
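For instance (positions are 1-based):

sublist 2 3 [1..5]    -- [2,3,4]
slice 2 4 [1..5]      -- [2,3,4]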

1.10.4. Consider the span implementation below:

span p [ ] = ([ ], [ ])
span p (x:xs) = { p(x) : (x:as, bs), where (as, bs) = span p xs
                { otherwise : (as, x:bs)

What is the difference here?


It separates xs into two parts: the elements satisfying p and the others: as = [a | a ← xs, p(a)] and bs = [b | b ← xs, not p(b)]. as is not necessarily the longest prefix that satisfies p, but the longest sub-sequence satisfying p.

Answer of exercise 1.11


1.11.1. To define insertion-sort with foldr, we design the insert function as insert x X, and sort as sort = foldr insert [ ]. The type for foldr is:

foldr :: (A → B → B) → B → [A] → B

Where its first parameter f has the type A → B → B, and the initial value z has the type B. It folds a list of A, and builds the result of B. How to define the insertion-sort with foldl? What is the type of foldl?

foldl has the type foldl :: (B → A → B) → B → [A] → B. We need to swap the arguments of insert for foldl:

sort = foldl (xs x ↦ insert x xs) [ ]

Or define a flip function:

flip f x y = f y x    (1.58)

Then define the insertion sort as: sort = foldl (flip insert) [ ]
1.11.2. What's the performance of concat? Design a linear time concat algorithm.
The performance varies when defining concat with foldl and foldr. as ++ bs takes O(m) time, where m = |as|. When using foldl for xs1 ++ xs2 ++ ... ++ xsn, as gets longer and longer; the performance is O(m1 + (m1 + m2) + ... + (m1 + m2 + ... + mn−1)). While using foldr for xs1 ++ (xs2 ++ (... ++ xsn)...), the length of as doesn't keep increasing, but is the length of each xsi. The performance is O(m1 + m2 + ... + mn−1), bound
to linear time. We can expand its definition:
concat [] = []
concat ([]:xss) = concat xss
concat ((x:xs):xss) = x : concat (xs:xss)

1.11.3. Define map in foldr.

map f = foldr (x xs ↦ (f x):xs) [ ]    (1.59)



Answer of exercise 1.12


1.12.1. Implement the linear time filter algorithm through reverse.
List<K> filterL((K → Bool) p, List<K> xs) {
List<K> ys = null
while xs 6= null {
if p(xs.key) then ys = cons(xs.key, ys)
xs = xs.next
}
return reverse(ys)
}

1.12.2. Enumerate all suffixes of a list.


Method 1: repeatedly drop the head of xs:

[xs, tail xs, tail (tail xs), ..., [ ]]

Method 2: use pattern matching to drop the head:


tails [] = [[]]
tails xs@(_:ys) = xs : tails ys

Method 3: use foldr; starting from [[ ]], prepend elements to build the suffixes:
tails = foldr f [[]] where
f x xss@(xs:_) = (x:xs) : xss

Answer of exercise 1.13


1.13.1. Design the iota (the Greek letter I) operator for list, below are the use cases:
• iota(..., n) = [1, 2, 3, ..., n];
• iota(m, n) = [m, m + 1, m + 2, ..., n], where m ≤ n;
• iota(m, m + a, ..., n) = [m, m + a, m + 2a, ..., m + ka], where k is the maximum
integer satisfying m + ka ≤ n;
• iota(m, m, ...) = repeat(m) = [m, m, m, ...];
• iota(m, ...) = [m, m + 1, m + 2, ...].
1. Tail recursive way to generate [1, 2, ..., n]:
iota = iota' [] where
iota' ns n | n < 1 = ns
| otherwise = iota' (n : ns) (n - 1)

2. With start value: [m, m + 1, ..., n]. Change the lower limit from 1 to m:
iota m n = iota' [] n where
iota' ns n | n < m = ns
| otherwise = iota' (n : ns) (n - 1)

3. With step: [m, m + a, m + 2a, ..., m + ka], where k is the maximum integer
satisfying m + ka ≤ n.
iota m n a | m ≤ n = m : iota (m + a) n a
| otherwise = []

4. Remove the termination condition to generate the open sequence with n = ∞: [m, m + 1, ...].
iota m = m : iota (m + 1)

For example, generate the first 10 natural numbers: take 10 (iota 1).
5. Remove +1, to implement repeat:
repeat m = m : repeat m

Summarizing the above, we define the iterate function that covers all the iota use cases:
iterate f x = x : iterate f (f x)

iota1 n = take n $ iterate (+1) 1
iota2 m n = takeWhile ( ≤ n) $ iterate (+1) m
iota3 m n a = takeWhile ( ≤ n) $ iterate (+a) m
repeat m = iterate id m
iota5 m = iterate (+1) m

1.13.2. Implement the linear time imperative zip.


[(A, B)] zip([A] xs, [B] ys) {
[(A, B)] zs = null
while xs 6= null and ys 6= null {
zs = cons((xs.key, ys.key), zs)
xs = xs.next
ys = ys.next
}
return reverse(zs)
}

1.13.3. Define zip with fold (hint: define fold for two lists foldr2 f z xs ys).
Define fold for two lists:

foldr2 f z [ ] ys = z
foldr2 f z xs [ ] = z    (1.75)
foldr2 f z (x:xs) (y:ys) = f x y (foldr2 f z xs ys)

Then define zip with foldr2 (Curried form):

zip = foldr2 f [ ], where f x y xys = (x, y):xys

1.13.4. Implement lastAt with zip.

To index the k-th element from the right, we drop the first k elements of xs = [x0, x1, ..., xn−1] to get ys = drop k xs. Then zip xs ys; the last pair is (xn−k−1, xn−1).

lastAt k xs = (fst ◦ last) (zip xs (drop k xs))
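For example, let xs = [1, 2, 3, 4, 5] and k = 2: drop 2 xs = [3, 4, 5], zip xs (drop 2 xs) = [(1, 3), (2, 4), (3, 5)], hence lastAt 2 xs = 3, the third element from the end.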

1.13.5. Write a program to remove the duplicated elements in a list while maintaining the original order. For the imperative implementation, the elements should be removed in-place. What is the complexity? How to simplify it with an additional data structure?
dedup [] = []
dedup (x : xs) = x : dedup (filter (x 6= ) xs)

We can define dedup with foldr:


dedup = foldr f [] where
f x xs = x : filter (x 6= ) xs

Because we filter for every element, the time is bound to O(n²). We can use a set (see chapters 3 and 4) to implement dedup in O(n lg n) time:
dedup = Set.toList ◦ Set.fromList

The corresponding iterative in-place implementation:


[K] dedup([K] xs) {
    Int n = length(xs)
    for Int i = 0, i < n, i++ {
        Int j = i + 1
        while j < n {
            if xs[i] == xs[j] {
                // shift the rest ahead to keep the original order
                for Int k = j, k < n - 1, k++ {
                    xs[k] = xs[k + 1]
                }
                n--
            } else {
                j++
            }
        }
    }
    Int m = length(xs) - n
    loop m { pop(xs) }
    return xs
}

1.13.6. A list can represent a decimal non-negative integer, the lowest digit first. For example, 1024 as a list is 4 → 2 → 0 → 1. Generally, n = dm...d2d1 can be represented as d1 → d2 → ... → dm. Given two numbers a, b in list form, realize the arithmetic operations such as addition and subtraction.

Wrap every digit of a decimal natural number into a list, with the higher digits on the right: n = (dm...d2d1)₁₀ converts to the list [d1, d2, ..., dm]. For example, 1024 is represented as [4, 2, 0, 1]. We can convert such a list back to a number by foldr (c d ↦ 10d + c) 0. Conversely, the function below converts a natural number to a list:
toList n | n < 10 = [n]
| otherwise = (n `mod` 10) : toList (n `div` 10)

1. Add. 0 is the unit element, i.e., 0 + as = as + 0 = as, where 0 is represented as the empty list [ ] or a list of all zeros, e.g., [0].

[ ] + bs = bs
[0] + bs = bs
as + [ ] = as
as + [0] = as

Add the lowest digits together: d = (a + b) mod 10, and add the carry c = ⌊(a + b)/10⌋ to the next digit:

(a:as) + (b:bs) = d:(as + bs + [c])
Below is the example program:
add [] bs = bs
add [0] bs = bs
add as [] = as
add as [0] = as
add (a:as) (b:bs) = ((a + b) `mod` 10) : add as (add bs [(a + b) `div` 10])
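A quick check that the carry propagates: 999 + 1 = 1000, i.e.:

add [9,9,9] [1] -- [0,0,0,1], i.e., 1000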

2. Minus. as − 0 = as; otherwise subtract each digit of as and bs. If the digit a < b, we need to borrow: d = 10 + a − b, and the remaining becomes as′ = as − [1].
minus as [] = as
minus as [0] = as
minus (a:as) (b:bs) | a < b = (10 + a - b) : minus (minus as [1]) bs
| otherwise = (a - b) : minus as bs

3. Multiply. For as × bs, we multiply every digit b in bs with as, multiply the accumulated result by 10, then add them up: cs′ = (b × as) + 10 × cs. When computing b × as, if b = 0, then it is 0; otherwise, multiply b with the first digit a to get d = ab mod 10, and add the carry c = ⌊ab/10⌋ to the further result:

b × (a:as) = d : ([c] + (b × as))

Below is the corresponding example program:


mul as = foldr (λ b cs → add (mul1 b as) (0:cs)) []

mul1 0 _ = []
mul1 b [] = []
mul1 b (a:as) = (b ∗ a `mod` 10) : add [b ∗ a `div` 10] (mul1 b as)

4. Divide (with remainder). First, define how to test zero, how to compare two
numbers. A number is zero if it’s an empty list or all digits are 0:
isZero = all (== 0)

To compare as and bs: 0 is less than any non-zero number; otherwise, compare from the highest digit to the lowest (EQ: equal, LT: less than, GT: greater than):
cmp [] [] = EQ
cmp [] bs = if isZero bs then EQ else LT
cmp as [] = if isZero as then EQ else GT
cmp (a:as) (b:bs) = case cmp as bs of EQ → compare a b
r → r

Then we can define equal, and less than test with cmp:
eq as bs = EQ == cmp as bs
lt as bs = LT == cmp as bs

The brute-force way to implement divide is to repeatedly subtract bs (bs ≠ 0) from as:

⌊as/bs⌋ = { as < bs : 0
          { otherwise : 1 + ⌊(as − bs)/bs⌋

To improve it, consider as = q · bs + r, where q is the quotient and 0 ≤ r < bs is the remainder. Let as = (am...a2a1)₁₀ = [a1, a2, ..., am]. We first use the brute-force way to divide am by bs, giving the quotient qm and the remainder rm = am − qm · bs. Then we bring in the next digit and divide 10rm + am−1 by bs, giving the quotient qm−1 and the remainder rm−1. Finally, we put all the digits together. This is exactly the foldr process:
ldiv as bs | isZero bs = error "divide by 0"
| otherwise = foldr f ([], []) as where
f a (qs, rs) = (q:qs, (a:rs) `minus` (q `mul1` bs)) where
q = ndec (a:rs)
ndec as = if as `lt` bs then 0 else 1 + ndec (as `minus` bs)
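For instance, dividing 1024 by 4 (both in little-endian list form):

ldiv [4,2,0,1] [4] -- ([6,5,2,0], [0]): quotient 256, remainder 0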

1.13.7. In imperative settings, a circular linked-list is corrupted: some node points back to a previous one, as shown in fig. 1.6. When traversing, it falls into an infinite loop. Design an algorithm to detect if a list is circular. On top of that, improve it to find the node where the loop starts (the node pointed to by two predecessors).

Figure 1.6: List with cycle


If a list has a cycle, the traversal will not stop. Consider two runners in a circle: the faster runner catches, meets, and surpasses the slower one again and again¹¹. We use two pointers to traverse the list: p moves 1 node at a time, q moves 2 nodes at a time. If they meet at some time, there is a circle.
Bool hasCycle(List<K> h) {
var p = h, q = h
while p 6= null and q 6= null {
p = p.next
q = q.next
if q == null then return False
q = q.next
if p == q return True
}
return False
}

Robert W. Floyd calls this method the ‘tortoise and hare’ algorithm. We can further find the node where the circle starts. As shown in fig. 1.7, let OA have k nodes. The loop starts from A, and the circle contains n nodes. When p arrives at A, the faster pointer (at doubled speed) q arrives at B. From this point on, the two pointers loop. p is behind q by k nodes; from the circle's perspective, it is equivalent to q catching up with p, which is ahead by n − k nodes. It takes time:

t = (n − k)/(2v − v) = (n − k)/v

Figure 1.7: Tortoise and hare


¹¹Strictly speaking, this holds in the continuous situation. For a discrete case like a linked-list, they meet if the two integer speeds are co-prime.



When they meet, p has moved from A by a distance of ((n − k)/v) · v = n − k, so p will arrive at A again after moving forward k nodes. At this time, if we reset q to the head O and let it move 1 node at a time, then q will also arrive at A after k nodes, i.e., p and q meet at A. Although we assumed k < n, it holds for k ≥ n too.
Proof. p and q start from the head. When p arrives at A, q arrives at a place in the circle past A by k mod n nodes. Equivalently, q is going to catch up with p, which is ahead by n − (k mod n) nodes. It takes time:

t = (n − (k mod n))/(2v − v) = (n − (k mod n))/v

When they meet, p has moved from A by a distance of ((n − (k mod n))/v) · v = n − (k mod n). If p further moves forward k nodes, it arrives at the position (k mod n) + n − (k mod n) = n, which is exactly A again. Hence, after p and q meet, if we reset q to the head O and let it move 1 node at a time, then q will also arrive at A after k nodes, i.e., p and q meet at A again.
We can implement a program to find A:
Optional<List<K>> findCycle(List<K> h) {
var p = h, q = h
while p 6= null and q 6= null {
p = p.next
q = q.next
if q == null then return Optional.Nothing
q = q.next
if p == q {
q = h
while p 6= q {
p = p.next
q = q.next
}
return Optional.of(p)
}
}
return Optional.Nothing
}

Answer of exercise 2.1


2.1.1. Given the in-order and pre-order traverse results, rebuild the tree, and output the
post-order traverse result. For example:
• Pre-order: 1, 2, 4, 3, 5, 6;
• In-order: 4, 2, 1, 5, 3, 6;
• Post-order: ?
[4, 2, 5, 6, 3, 1]
2.1.2. Write a program to rebuild the binary tree from the pre-order and in-order traverse
lists.

Let P be the pre-order traverse result, and I the in-order result. If P = I = [ ], then the binary tree is empty ∅. Otherwise, the pre-order is recursive ‘key - left - right’, hence the first element m in P is the key of the root. The in-order is recursive ‘left - key - right’; we can find m in I, which splits I into three parts: [a1, a2, ..., ai−1, m, ai+1, ai+2, ..., an]. Let Il = I[1, i), Ir = I[i + 1, n], where [l, r) includes l but excludes r. Either can be empty [ ]. Among these three parts Il, m, Ir: Il is the in-order traverse result of the left sub-tree, and Ir of the right sub-tree. Let k = |Il| be the size of the left sub-tree; we can split P[2, n] at k into two parts Pl, Pr, where Pl contains the first k elements. We next recursively rebuild the left sub-tree from (Pl, Il), and the right sub-tree from (Pr, Ir):

rebuild [ ] [ ] = ∅
rebuild (m:ps) I = (rebuild Pl Il, m, rebuild Pr Ir)

Where:

{ (Il, Ir) = splitWith m I
{ (Pl, Pr) = splitAt |Il| ps

Below is the example program:


rebuild [] _ = Empty
rebuild [c] _ = Node Empty c Empty
rebuild (x:xs) ins = Node (rebuild prl inl) x (rebuild prr inr) where
(inl, _:inr) = (takeWhile ( 6= x) ins, dropWhile ( 6= x) ins)
(prl, prr) = splitAt (length inl) xs
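As a cross-check with exercise 2.1.1: rebuilding from the pre-order [1, 2, 4, 3, 5, 6] and the in-order [4, 2, 1, 5, 3, 6] yields the tree whose post-order traverse result is [4, 2, 5, 6, 3, 1].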

We can also update the left and right boundary imperatively:


Node<T> rebuild([T] pre, [T] ins, Int l = 0, Int r = length(ins)) {
if l ≥ r then return null
T c = popFront(pre)
Int m = find(c, ins)
var left = rebuild(pre, ins, l, m)
var right = rebuild(pre, ins, m + 1, r)
return Node(left, c, right)
}

2.1.3. For the binary search tree, prove that the in-order traverse always gives an ordered list.
Proof. By contradiction. Suppose there exists a (finite sized) binary search tree whose in-order result is not ordered, and among all such trees, select the smallest T. First, T can't be ∅, as the in-order result of ∅ is [ ], which is ordered. Second, T can't be a singleton (∅, k, ∅), as the in-order result [k] is ordered. Hence T must be a branch node (l, k, r), with the in-order result toList l ++ [k] ++ toList r. Because T is the smallest tree whose in-order result is not ordered, while l and r are smaller than T, both toList l and toList r are ordered. According to the binary search tree definition, every x ∈ l satisfies x < k, and every y ∈ r satisfies y > k. Hence the in-order result toList l ++ [k] ++ toList r is ordered, which conflicts with the assumption that the in-order result of T is not ordered.
Therefore, for any binary search tree, the in-order result is ordered.

2.1.4. What is the complexity of tree sort?

O(n lg n), where n is the number of elements.
2.1.5. Define toList with fold.

toList = foldt id (as b bs ↦ as ++ b : bs) [ ]
       = fold (:) [ ]

2.1.6. Define depth t with fold, to calculate the height of a binary tree.

depth = foldt (x ↦ 1) (x d y ↦ d + max x y) 0



Answer of exercise 2.2


2.2.1. How to test whether an element k exists in the tree t?
member _ Empty = False
member x (Node l k r) | x == k = True
                      | x < k = member x l
                      | otherwise = member x r

2.2.2. Use Pred and Succ to write an iterator to traverse the binary search tree as a
generic container. What’s the time complexity to traverse a tree of n elements?
data TreeIterator<T> {
Node<T> node = null

TreeIterator(Node<T> root) { node = min(root) }

T get() = node.key

Bool hasNext() = node 6= null

Self next() { if hasNext() then node = succ(node) }


}

Although we need to find the min/max of a sub-tree or back-track along the parent reference, it takes linear time to iterate the whole tree container: during the traverse, we visit every node a constant number of times (arrive and leave). For example:
for var it = TreeIterator(root), it.hasNext(), it = it.next() {
print(it.get())
}

The traverse performance is O(n).


2.2.3. One can traverse the elements inside range [a, b], for example:
for_each (m.lower_bound(12), m.upper_bound(26), f);
Write an equivalent functional program for binary search tree.
mapR f a b t = map' t where
map' Empty = Empty
map' (Node l k r) | k < a = map' r
| a ≤ k && k ≤ b = Node (map' l) (f k) (map' r)
| k > b = map' l

Answer of exercise 2.3


2.3.1. There is a symmetric deletion algorithm. When neither sub-tree is empty, we
replace with the maximum of the left sub-tree, then cut it off. Write a program to
implement this solution.
delete _ Empty = Empty
delete x (Node l k r) | x < k = Node (delete x l) k r
| x > k = Node l k (delete x r)
| otherwise = del l r
where
del Empty r = r
del l Empty = l
del l r = let m = max l in Node (delete m l) m r

2.3.2. Write a randomly building algorithm for binary search tree.


fromList = (foldr insert Empty) ◦ shuffle

2.3.3. How to find the two nodes with the greatest distance in a binary tree?
We first find the maximum distance m, then give the longest path [s, a, b, ..., e]. The
two ends s and e are the two nodes in question. To define the distance between two
nodes, let the connected path (without direction) be s → n1 → n2 → ... → nm →
e, every edge has the length of 1, then the total length from s to e is the distance
between them, which is m + 1. Define the maximum distance of the empty tree
as 0. For singleton leaf (∅, k, ∅), as the longest path is [k], the maximum distance
is also 0 (from k to k). Consider the branch node (l, k, r), the maximum distance
must be one of the three: (1) from the deepest node on the left to the root, then
to the deepest node on the right: depth l + depth r; (2) the maximum distance of
the left sub-tree l; (3) the maximum distance of the right sub-tree r.
maxDistance Empty = 0
maxDistance (Node Empty _ Empty) = 0
maxDistance (Node l _ r) = maximum [depth l + depth r,
maxDistance l, maxDistance r]

Where the definition of depth is in Exercise 2.1.6. We can adjust it to find the
longest path. For the empty tree, the longest path is [ ], for singleton leaf, the
longest path is [k], for branch node (l, k, r), the longest path is the maximum of
the three: (1) The reverse of the path from the root to the deepest node on the left,
and k, and the path from the root the deepest node on the right; (2) the longest
path of the left; (3) the longest path of the right.
maxPath Empty = []
maxPath (Node Empty k Empty) = [k]
maxPath (Node l k r) = longest [reverse (depthPath l) ++ k : depthPath r,
                                maxPath l, maxPath r] where
maxPath l, maxPath r] where
longest = maximumBy (compare `on` length)
depthPath = foldt id (λ xs k ys → k : longest [xs, ys]) []

This implementation traverses the tree to calculate the depth, then traverses the left and right sub-trees another two rounds. To avoid the duplication, we can bottom-up fill the depth d and the maximum distance m in each node. This can be done through the tree map Tree A ↦ Tree (Int, Int) in one traverse:
maxDist = extract ◦ mapTr where
extract Empty = 0
extract (Node _ (_, m) _) = m
mapTr Empty = Empty
mapTr (Node l _ r) = f (mapTr l) (mapTr r)
f l r = Node l (1 + max d1 d2, maximum [d1 + d2, m1, m2]) r where
(d1, m1) = pairof l
(d2, m2) = pairof r
pairof Empty = (0, 0)
pairof (Node _ k _) = k

We can further simplify it with fold:


maxDist = snd ◦ pair ◦ foldt id g Empty where
g l _ r = Node l (1 + max d1 d2, maximum [d1 + d2, m1, m2]) r where
(d1, m1) = pair l
(d2, m2) = pair r
pair = (maybe (0, 0) id) ◦ key

Answer of exercise 3.1


3.1.1. Implement the insert to scan from left to right.

Void insert([T] xs, T x) {
    Int i = 0, n = length(xs)
append(xs, x)
while i < n and xs[i] < x { i++ }
while i < n {
xs[n] = xs[n - 1]
n = n - 1
}
xs[i] = x
}

3.1.2. Define the insert function, and call it from the sort algorithm.
Void insert([T] xs, T x) {
append(xs, x)
Int i = length(xs) - 1
while i > 0 and xs[i] < xs[i-1] {
swap(xs[i], xs[i-1])
i--
}
}

[T] sort([T] xs) {
    [T] ys = []
for x in xs {
insert(ys, x)
}
return ys
}

Answer of exercise 3.2


3.2.1. For the index array based list, we return the re-arranged index as result. Design
an algorithm to re-order the original array A from the index N ext.
[K] reorder([K] xs, [Int] next) {
    Int i = -1
    [K] ys = []
while next[i] 6= -1 {
append(ys, xs[next[i]])
i = next[i]
}
return ys
}

Answer of exercise 4.1


4.1.1. For a big address book in lexicographic order, one may want to speed up with two
concurrent tasks: one reads from the head; the other from the tail. They meet and
stop at some middle point. What does the binary search tree look like? What if
split the list into multiple sections to scale up the concurrency?

Because the address entries are in lexicographic order, the two tasks generate two unbalanced trees. From the head:
Th = ((...(∅, k1, ∅), ...), km, ∅),
and from the tail:
Tt = (∅, km+1, (∅, km+2, ...(∅, kn, ∅))...),
as shown in fig. 4.2(c). With multiple slices, each one generates a tree like Th; they then combine into a big unbalanced tree. Figure 4.2(b) builds a zig-zag tree from the elements in interleaved order; each node has an empty sub-tree.

Answer of exercise 4.2


4.2.1. Implement the Right-Rotate.
1: function Right-Rotate(T, y)
2: p ← Parent(y)
3: x ← Left(y) . assert x 6= NIL
4: a ← Left(x)
5: b ← Right(x)
6: c ← Right(y)
7: Replace(y, x) . replace y with x
8: Set-Subtrees(y, b, c) . assign b, c sub-trees of y
9: Set-Subtrees(x, a, y) . assign a, y sub-trees of x
10: if p = NIL then . y was the root
11: T ←x
12: return T

Answer of exercise 4.3


4.3.1. Prove the height h of a red-black tree of n nodes is at most 2 lg(n + 1).
We first define the black-height of a node x, denoted bh(x), as the number of black nodes on any path from x (not including x) down to a leaf. By property 5, all descending paths from the node have the same number of black nodes, hence the black-height is well defined. Particularly, we define the black-height of a red-black tree to be the black-height of its root.
Proof. We first prove that any sub-tree rooted at x contains at least 2^{bh(x)} − 1 nodes, by mathematical induction on the height of x. If h(x) = 0, then x is NIL; it contains at least 2⁰ − 1 = 0 nodes. Consider a branch node x. The black-height of each of its sub-trees is either bh(x) (when the sub-tree root is red) or bh(x) − 1 (when the sub-tree root is black). Since the height of each sub-tree of x is less than h(x), by the induction hypothesis, each sub-tree contains at least 2^{bh(x)−1} − 1 nodes. Hence the sub-tree rooted at x contains at least 2(2^{bh(x)−1} − 1) + 1 = 2^{bh(x)} − 1 nodes.
Let the height of the tree be h. According to property 4, there are no adjacent red nodes, so at least half of the nodes on any path from the root to a leaf are black. Hence the black-height of the tree is at least h/2. We have:

n ≥ 2^{h/2} − 1 ⇒ 2^{h/2} ≤ n + 1

Taking the logarithm on both sides: h/2 ≤ lg(n + 1), i.e., h ≤ 2 lg(n + 1).

Answer of exercise 4.4


4.4.1. Implement the insert without pattern matching, handle the 4 cases separately.
Node<T> insert(Node<T> t, T x) = makeBlack(ins(t, x))

Node<T> makeBlack(Node<T> t) {
    t.color = Color.BLACK
    return t
}

Node<T> ins(Node<T> t, T x) {
    if t == null then return Node(null, x, null, Color.RED)
    return if x < t.key
        then balance(t.color, ins(t.left, x), t.key, t.right)
        else balance(t.color, t.left, t.key, ins(t.right, x))
}

Node<T> balance(Color c, Node<T> l, T k, Node<T> r) {
    return if c == Color.BLACK {
if isRed(l) and isRed(l.left) {
Node(Node(l.left.left, l.left.key, l.left.right, Color.BLACK),
l.key,
Node(l.right, k, r, Color.BLACK),
Color.RED)
} else if isRed(l) and isRed(l.right) {
Node(Node(l.left, l.key, l.right.left, Color.BLACK),
l.right.key,
Node(l.right.right, k, r, Color.BLACK),
Color.RED)
} else if isRed(r) and isRed(r.right) {
    Node(Node(l, k, r.left, Color.BLACK),
         r.key,
         Node(r.right.left, r.right.key, r.right.right, Color.BLACK),
         Color.RED)
} else if isRed(r) and isRed(r.left) {
Node(Node(l, k, r.left.left, Color.BLACK),
r.left.key,
Node(r.left.right, r.key, r.right, Color.BLACK),
Color.RED)
} else {
Node(l, k, r, c)
}
} else {
Node(l, k, r, c)
}
}

Bool isRed(Node<T> t) = (t 6= null and t.color == Color.RED)

Answer of exercise 4.5


4.5.1. Implement the ‘mark-rebuild’ delete algorithm: mark the node as deleted without
actually removing it. When the marked nodes exceed 50%, rebuild the tree.
We augment each key x with a flag a in the node. The type of the tree is Tree (K, Bool). When inserting x, call insert (x, True) to mark the node active. When deleting, flip a to False, then count the active nodes and trigger a rebuild if the inactive ones exceed half.
delete x = rebuild ◦ del x
Where:
del x ∅ = ∅
del x (c, l, (k, a), r) = { x < k : (c, del x l, (k, a), r)
                          { x > k : (c, l, (k, a), del x r)
                          { x = k : (c, l, (k, False), r)

If more than half of the nodes are inactive, we convert the tree to a list, filter the active nodes, and rebuild the tree.

rebuild t = { size t < ½ (cap t) : (fromList ◦ toList) t
            { otherwise : t

Where toList traverses the tree and skips the deleted nodes:

toList ∅ = [ ]
toList (c, l, (k, a), r) = { a : toList l ++ [k] ++ toList r
                           { otherwise : toList l ++ toList r

To avoid traversing the entire tree every time when counting the nodes, we save the tree capacity and size in each node. Extend the type of the tree to Tree (K, Bool, Int, Int), and define node as:
node ∅ = ∅
node c l (k, a, _, _) r = (c, l, (k, a, sz, ca), r)

Where:

{ sz = size l + size r + (if a then 1 else 0)
{ ca = 1 + cap l + cap r

Functions size and cap access the stored size and capacity:

size ∅ = 0
size (_, (_, _, sz, _), _) = sz
cap ∅ = 0
cap (_, (_, _, _, ca), _) = ca

We replace the node constructor (c, l, k, r) in the insert/delete implementation with the node function. Below is the example program.
data Elem a = Elem a Bool Int Int deriving (Eq)

active (Elem _ a _ _) = a
getElem (Elem x _ _ _) = x

instance Ord a ⇒ Ord (Elem a) where
    compare = compare `on` getElem

insert x = makeBlack ◦ ins (Elem x True 1 1) where
    ins e Empty = Node R Empty e Empty
ins e (Node color l k r)
| e < k = balance color (ins e l) k r
| otherwise = balance color l k (ins e r)
makeBlack (Node _ l k r) = Node B l k r

balance B (Node R (Node R a x b) y c) z d = node R (node B a x b) y (node B c z d)
balance B (Node R a x (Node R b y c)) z d = node R (node B a x b) y (node B c z d)
balance B a x (Node R b y (Node R c z d)) = node R (node B a x b) y (node B c z d)
balance B a x (Node R (Node R b y c) z d) = node R (node B a x b) y (node B c z d)
balance color l k r = node color l k r

node c l (Elem k a _ _) r = Node c l (Elem k a sz ca) r where
    sz = size l + size r + if a then 1 else 0
ca = cap l + cap r + 1

size Empty = 0
size (Node _ _ (Elem _ _ sz _) _) = sz

cap Empty = 0
cap (Node _ _ (Elem _ _ _ ca) _) = ca

delete x = rebuild ◦ del x where
    del _ Empty = Empty
del x (Node c l e@(Elem k a sz ca) r)
| x < k = node c (del x l) e r
| x > k = node c l e (del x r)
| x == k = node c l (Elem k False 0 0) r

rebuild t | 2 ∗ size t < cap t = (fromList ◦ toList) t
          | otherwise = t

fromList :: (Ord a) ⇒ [a] → RBTree (Elem a)
fromList = foldr insert Empty

toList Empty = []
toList (Node _ l e r) | active e = toList l ++ [getElem e] ++ toList r
                      | otherwise = toList l ++ toList r

Answer of exercise 5.1


5.1.1. We only give the algorithm to test the AVL height. Complete the program to test if a binary tree is an AVL tree.
Besides verifying the height property, we reuse the toList function of the binary tree to verify that the in-order traverse result is sorted:
verifyAVL t = isAVL t && sorted (toList t) where
sorted [] = True
sorted xs = and (zipWith ( ≤ ) xs (tail xs))

Answer of exercise 6.1


6.1.1. Can we change the definition from Branch (IntTrie a) (Maybe a) (IntTrie a) to Branch (IntTrie a) a (IntTrie a), and return Nothing if the value does not exist, and Just v otherwise?
Besides lookup, we need to handle branch nodes without a value (the blank circle nodes in fig. 6.3) when inserting. Alternatively, we can add an additional constructor in the algebraic data type (ADT) to replace the Maybe type:
data IntTrie a = Empty
| Branch (IntTrie a) a (IntTrie a)
| EmptyBranch (IntTrie a) (IntTrie a)

Answer of exercise 6.2


6.2.1. Implement the lookup function for integer tree.
import Data.Bits

type Key = Int


type Prefix = Int
type Mask = Int

data IntTree a = Empty


| Leaf Key a
| Branch Prefix Mask (IntTree a) (IntTree a)

lookup :: Key → IntTree a → Maybe a


lookup _ Empty = Nothing
lookup k (Leaf k' v) = if k == k' then Just v else Nothing

lookup k (Branch p m l r) | match k p m = if zero k m then lookup k l
                                          else lookup k r
                          | otherwise = Nothing

match :: Key → Prefix → Mask → Bool


match k p m = (mask k m) == p

mask :: Int → Mask → Int


mask x m = (x .&. complement (m - 1))

zero :: Int → Mask → Bool


zero x m = x .&. (shiftR m 1) == 0

6.2.2. Implement the pre-order traverse for both integer trie and integer tree. Only
output the keys when the nodes store values. What pattern does the result follow?

We first convert an integer trie to an assoc-list. The pre-order is the recursive ‘middle-left-right’ order. When traversing the empty tree, the result is [ ]. For a branch node (l, m, r), let the recursive pre-order lists of the left and right sub-trees be as and bs respectively. For the middle Maybe value m: if it's Nothing, the result is as ++ bs; if it's Just v, then the result is (k, v):as ++ bs, where k is the corresponding binary integer (in little endian, 0 for left, 1 for right).
toList = go 0 1 where
go _ _ Empty = []
go k n (Branch l m r) = case m of
Nothing → as + + bs
(Just v) → (k, v) : as + + bs
where
as = go k (2 ∗ n) l
bs = go (n + k) (2 ∗ n) r

We start from the root. Let k = 0 and the depth d = 0. If we go left, then k′ = 0; if we go right, then k′ = 1 = 2⁰ + k. For the next level d = 1, the corresponding k for the four nodes are (00)₂ = 0, (10)₂ = 2¹ + 0, (01)₂ = 1, and (11)₂ = 2¹ + 1. Basically, for a node at level d with k = (ad...a2a1a0)₂: when going left, k′ = k; when going right, k′ = 2^d + k. In the above implementation, we start with k = 0, n = 2⁰ = 1 to call go k n. We call go k 2n when going left, and go (n + k) 2n when going right. We get the keys in pre-order through keys = fst ◦ unzip ◦ toList. We can use tail recursive calls to optimize as ++ bs:
toList = go 0 1 [] where
go _ _ z Empty = z
go k n z (Branch l m r) = case m of
Nothing → xs
(Just v) → (k, v) : xs
where xs = go k (2 ∗ n) (go (n + k) (2 ∗ n) z r) l

Further, we can define a generic pre-order fold for the integer trie. Different from the fold in chapter 2, the keys are computed while folding.
foldpre f z = go 0 1 z where
go _ _ z Empty = z
go k n z (Branch l m r) = f k m (go k (2 ∗ n) (go (n + k) (2 ∗ n) z r) l)

We redefine the toList with fold:


toList = foldpre f [] where
f _ Nothing xs = xs
f k (Just v) xs = (k, v) : xs

It's more straightforward to implement the pre-order traverse for the integer prefix tree than for the trie: we needn't compute the keys. Below is the generic fold in pre-order:
foldpre _ z Empty = z
foldpre f z (Leaf k v) = f k v z
foldpre f z (Branch p m l r) = foldpre f (foldpre f z r) l

We can convert an integer tree to an assoc-list and get the keys with fold:
toList = foldpre (λk v xs → (k, v):xs) []
keys = fst ◦ unzip ◦ toList

When populating the keys of a tree, their binary bits are in ascending order for both the little endian binary trie and the big endian integer prefix tree. To verify it, we define a function bitsLE that converts an integer to a list of bits in little endian, then use it to verify the key ordering of the trie:
verify kvs = sorted $ map bitsLE $ keys $ fromList kvs where
sorted [] = True
sorted xs = and $ zipWith ( ≤ ) xs (tail xs)
bitsLE 0 = []
bitsLE n = (n `mod` 2) : bitsLE (n `div` 2)

Where kvs is a list of random key-value pairs. The corresponding verification for
big endian integer prefix tree is as below:
verify kvs = sorted $ keys $ fromList kvs

Answer of exercise 6.3


6.3.1. Eliminate the recursion to implement the prefix tree lookup purely with loops.
Optional<V> lookup(PrefixTree<K, V> t, K key) {
if t == null then return Optional.Nothing
Bool match
repeat {
match = False
for k, tr in t.subtrees {
if k == key then return Optional.of(tr.value)
(K prefix, K k1, K k2) = lcp(key, k)
if prefix 6= [] and k2 == [] {
match = True
key = k1
t = tr
break
}
}
} until not match
return Optional.Nothing
}

Answer of exercise 6.4


6.4.1. Implement the auto-completion and predictive text input with trie.
For the input prefix ks, we advance to node t in the trie, expand all its sub-trees, then take the first n items:
import Data.Map (Map)
import qualified Data.Map as Map

startsWith :: Ord k ⇒ [k] → Trie k v → [([k], v)]



startsWith [] (Trie Nothing ts) = enum ts
startsWith [] (Trie (Just v) ts) = ([], v) : enum ts
startsWith (k:ks) (Trie _ ts) = case Map.lookup k ts of
Nothing → []
Just t → map (first (k:)) (startsWith ks t)

enum :: Ord k ⇒ Map k (Trie k v) → [([k], v)]
enum = (concatMap (λ(k, t) →
map (first (k:)) (startsWith [] t))) ◦ Map.assocs

get n k t = take n $ startsWith k t

Where first f (a, b) = (f a, b) applies the function f to the first element of a pair. When implementing predictive input with the trie, we look up the T9 map (mapT9) for all characters mapped to a digit, then look up the trie for candidate words.
findT9 [] _ = [[]]
findT9 (d:ds) (Trie _ ts) = concatMap find cts where
cts = case Map.lookup d mapT9 of
Nothing → []
Just cs → Map.assocs $ Map.filterWithKey (λc _ → c `elem` cs) ts
find (c, t) = map (c:) (findT9 ds t)
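The program above assumes the digit-to-characters table mapT9 from the chapter; a minimal sketch of such a table, with the standard T9 keypad layout, could be:

mapT9 :: Map Char String
mapT9 = Map.fromList [('2', "abc"), ('3', "def"), ('4', "ghi"),
                      ('5', "jkl"), ('6', "mno"), ('7', "pqrs"),
                      ('8', "tuv"), ('9', "wxyz")]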

6.4.2. How to ensure the candidates in lexicographic order in the auto-completion and
predictive text input program? What’s the performance change accordingly?
From Exercise 6.2.2, if we traverse a binary prefix tree in pre-order, the result is in lexicographic order. For a multi-way prefix tree, we need to traverse the sub-trees in lexicographic order. If the sub-trees are managed with a self-balancing tree (like the red-black tree or AVL tree), we can do this in linear time (Exercise 2.2.2). If the sub-trees are stored in a hash table or an assoc-list, then we need O(n lg n) time to sort them.

Answer of exercise 7.1


7.1.1. Can we use ≤ to support duplicated keys in B-Tree?
We can use ≤ to allow duplicated keys in the B-tree: all keys on the left are ≤ x, while x ≤ all keys on the right. However, we need additional logic to handle duplicated keys when looking up and deleting a key. Typically, we constrain the keys to be unique, while a key can map to multiple values: k ↦ [v1, v2, ...], called a multi-map.
7.1.2. For the ‘split then insert’ algorithm, eliminate the recursion with loops.
BTree<K, deg> insertNonfull(BTree<K, deg> t, K key) {
    var root = t
    while not is_leaf(t) {
        Int i = length(t.keys)
        while i > 0 and key < t.keys[i-1] {
            i = i - 1
        }
        if full(d, t.subTrees[i]) {
            split(d, t, i)
            if key > t.keys[i] then i = i + 1
        }
        t = t.subTrees[i]
    }
    orderedInsert(t.keys, key)
    return root
}

7.1.3. We use linear search among keys to find the proper insert position. Improve the im-
perative implementation with binary search. Is the big-O performance improved?

void orderedInsert([K] xs, K x) {
    append(xs, x)
    Int p = binarySearch(xs, x)
    for Int i = length(xs) - 1, i > p, i-- {
        xs[i] = xs[i-1]
    }
    xs[p] = x
}

Int binarySearch([K] xs, K x) {
    Int l = 0, u = length(xs)
while l < u {
Int m = (l + u) / 2
if xs[m] == x {
return m
} else if xs[m] < x {
l = m + 1
} else {
u = m
}
}
return l
}

The performance is still linear: although binary search speeds the position lookup up to O(lg n), the insertion still takes O(n) time to shift the elements.

Answer of exercise 7.2


7.2.1. Improve the imperative lookup with binary search among keys.
Optional<(BTree<K, deg>, Int)> lookup(BTree<K, deg> tr, K key) {
loop {
Int l = 0, u = length(tr.keys)
while l < u {
Int m = (l + u) / 2
if key == tr.keys[m] {
return Optional.of((tr, m))
} else if tr.keys[m] < key {
l = m + 1
} else {
u = m
}
}
if isLeaf(tr) {
return Optional.Nothing
} else {
tr = tr.subTrees[l]
}
}
}

Answer of exercise 7.3


7.3.1. When delete a key k from the branch node, we use the maximum key from the
predecessor sub-tree k 0 = max(t0 ) to replace k, then recursively delete k 0 from
t0 . There is a symmetric method, to replace k with the minimum key from the
successor sub-tree. Implement this solution.
We first define the min function to get the minimum key:
min' (BTree ks []) = head ks

min' (BTree _ ts) = min' $ head ts

When partitioning the tree with x, we need to use ≤ rather than <, because x may be the last one on the left. We abstract the partition predicate as a parameter:
partitionWith p (BTree ks ts) = (l, t, r) where
l = (ks1, ts1)
r = (ks2, ts2)
(ks1, ks2) = L.span p ks
(ts1, (t:ts2)) = L.splitAt (length ks1) ts

We can then implement the delete with min′:

delete' x (d, t) = fixRoot (d, del x t) where
del x (BTree ks []) = BTree (L.delete x ks) []
del x t = if (Just x) == (listToMaybe $ reverse ks') then
let k' = min' t' in
balance d ((init ks') ++ [k'], ts') (del k' t') r
else balance d l (del x t') r
where
(l@(ks', ts'), t', r) = partitionWith ( ≤ x) t

7.3.2. Define the delete function for the ‘paired list’ implementation.
delete x (d, t) = fixRoot (d, del x t) where
del _ Empty = Empty
del x t = if (Just x) == fmap fst (listToMaybe r) then
case t' of
Empty → balance d l Empty (tail r)
_ → let k' = max' t' in
balance d l (del k' t') ((k', snd $ head r):(tail r))
else balance d l (del x t') r
where
(l, t', r) = partition (< x) t

max' t@(BTree _ _ []) = max' (stepL t)


max' (BTree _ _ [(k, Empty)]) = k
max' (BTree _ _ [(k, t)]) = max' t
max' t = max' (stepR t)

We need to add additional logic in balance to fix the ‘too low’ cases after delete:
balance :: Int → [(a, BTree a)] → BTree a → [(a, BTree a)] → BTree a
balance d l t r | full d t = fixFull
| low d t = fixLow l t r
| otherwise = BTree l t r
where
fixFull = let (t1, k, t2) = split d t in BTree l t1 ((k, t2):r)
fixLow ((k', t'):l) t r = balance d l (unsplit t' k' t) r
fixLow l t ((k', t'):r) = balance d l (unsplit t k' t') r
fixLow l t r = t -- l == r == []

Where unsplit is the reverse of split:

unsplit t1 k t2@(BTree (_:_) _ _) = unsplit t1 k (stepL t2)
unsplit t1@(BTree _ _ (_:_)) k t2 = unsplit (stepR t1) k t2

Answer of exercise 8.1


8.1.1. No, it is not correct. The sub-array [a2, a3, ..., an] can't map back to a binary heap. It's insufficient to only apply Heapify from a2; we need to run Build-Heap to rebuild the heap.

8.1.2. For the same reason, it does not work.

Answer of exercise 8.2


8.2.1. Implement leftist heap and skew heap imperatively.
We add a parent reference in the node definition for easy back-tracking.
data Node<T> {
T value
Int rank = 1
Node<T> left = null, right = null, parent = null
}

When merging two leftist heaps, we first merge top-down along the right sub-trees, then bottom-up update the rank along the parent references, swapping the sub-trees when the left one has the smaller rank. To simplify the empty tree handling, we use a sentinel node.
Node<T> merge(Node<T> a, Node<T> b) {
var h = Node(null) // the sentinel node
while a 6= null and b 6= null {
if b.value < a.value then swap(a, b)
var c = Node(a.value, parent = h, left = a.left)
h.right = c
h = c
a = a.right
}
h.right = if a 6= null then a else b
while h.parent 6= null {
if rank(h.left) < rank(h.right) then swap(h.left, h.right)
h.rank = 1 + rank(h.right)
h = h.parent
}
h = h.right
if h 6= null then h.parent = null
return h
}

Int rank(Node<T> x) = if x 6= null then x.rank else 0

Node<T> insert(Node<T> h, T x) = merge(Node(x), h)

T top(Node<T> h) = h.value

Node<T> pop(Node<T> h) = merge(h.left, h.right)

It’s simpler to merge skew heaps, as we needn’t update the rank, the parent, or
back-track.
data Node<T> {
T value
Node<T> left = null, right = null
}

Below is the merge function, the others are same as the leftist heap.
Node<T> merge(Node<T> a, Node<T> b) {
var h = Node(null) // the sentinel node
var root = h
while a 6= null and b 6= null {
if b.value < a.value then swap(a, b)
var c = Node(a.value, left = null, right = a.left)
h.left = c
h = c

a = a.right
}
h.left = if a 6= null then a else b
root = root.left
return root
}

8.2.2. Define fold for heap.

fold f z ∅ = z
fold f z H = fold f (f (top H) z) (pop H)
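For example, with a min-heap and a fromList builder (an assumption for this sketch), fold (:) [ ] conses each popped top onto the accumulator, yielding the elements in descending order, so a possible heap sort is:

sort = reverse ◦ fold (:) [] ◦ fromList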

Answer of exercise 9.1


9.1.1. We should use link but not append. Appending is linear to the length of the list,
while linking is constant time.
9.1.2. Implement the in-place selection sort.
Void sort([K] xs) {
var n = length(xs)
for var i = 0 to n - 1 {
var m = i
for Int j = i + 1 to n - 1 {
if xs[j] < xs[m] then m = j
}
swap(xs[i], xs[m])
}
}

Answer of exercise 9.2


9.2.1. Implement the recursive tournament tree sort in ascending order.
We can realize the ascending sort by replacing the max and −∞ with the min and
∞. Further, we can abstract them as two parameters:
minBy p a b = if p a b then a else b

merge p t1 t2 = Br t1 (minBy p (key t1) (key t2)) t2

fromListWith p xs = build $ map wrap xs where
    build [] = Empty
build [t] = t
build ts = build $ pair ts
pair (t1:t2:ts) = (merge p t1 t2) : pair ts
pair ts = ts

popWith p inf = delMin where
    delMin (Br Empty _ Empty) = Br Empty inf Empty
delMin (Br l k r) | k == key l = let l' = delMin l in
Br l' (minBy p (key l') (key r)) r
| k == key r = let r' = delMin r in
Br l (minBy p (key l) (key r')) r'

toListWith p inf = flat where
    flat Empty = []
flat t | inf == key t = []
| otherwise = (top t) : (flat $ popWith p inf t)

sortBy p inf xs = toListWith p inf $ fromListWith p xs

sortBy (<) ∞ defines the ascending sort, and sortBy (>) −∞ defines the descending sort.
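For instance, taking maxBound as the ∞ for Int (an assumption for this example):

sortBy (<) (maxBound :: Int) [5, 3, 8, 1] -- [1,3,5,8]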
9.2.2. How to handle duplicated elements with the tournament tree? is tournament tree
sort stable?
From Exercise 9.2.1, we can handle the duplicated elements with sortBy (≤) ∞
(ascending sort) for example. The tournament tree sort is not stable.
9.2.3. Compare the tournament tree sort and binary search tree sort in terms of space
and time performance.
They are both bound to O(n lg n) time, and O(n) space. The difference is, the
binary search tree does not change after build (unless insert, delete), while the
tournament tree changes to a tree with n nodes of infinity.
9.2.4. Compare heap sort and tournament tree sort in terms of space and time perfor-
mance.
They are both bound to O(n lg n) time, and O(n) space. The difference is, the
heap becomes empty after sort complete, while the tournament tree still occupies
O(n) space.

Answer of exercise 10.1


10.1.1. Write a program to generate Pascal’s triangle.
pascal = gen [1] where
gen cs (x:y:xs) = gen ((x + y) : cs) (y:xs)
gen cs _ = 1 : cs

Given any row in Pascal's triangle, this function generates the next row. We can build the first n rows of Pascal's triangle as take n (iterate pascal [1]).
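For example:

take 4 (iterate pascal [1]) -- [[1],[1,1],[1,2,1],[1,3,3,1]]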
10.1.2. Prove that the i-th row in tree Bₙ has C(n, i) nodes, where C(n, i) = n!/(i!(n − i)!) is the binomial coefficient.

Proof. Use induction. There is only one node (the root) in B₀. Assume every row in Bₙ is a list of binomial coefficients. Tree Bₙ₊₁ is composed from two Bₙ trees. The 0-th row contains the root: 1 = C(n+1, 0). The i-th row has two parts: one from the (i − 1)-th row of the left-most sub-tree Bₙ, the other from the i-th row of the other Bₙ tree. In total:

C(n, i−1) + C(n, i) = n!/((i−1)!(n−i+1)!) + n!/(i!(n−i)!)
                    = n![i + (n − i + 1)]/(i!(n−i+1)!)
                    = (n+1)!/(i!(n−i+1)!)
                    = C(n+1, i)

10.1.3. Prove there are 2ⁿ elements in a Bₙ tree.

Proof. From the previous exercise, sum all the rows of a Bₙ tree:

C(n, 0) + C(n, 1) + ... + C(n, n) = (1 + 1)ⁿ    (let a = b = 1 in (a + b)ⁿ)
                                  = 2ⁿ

10.1.4. Use a container to store the sub-trees; how do we implement link? How to ensure it runs in constant time?

If we store all sub-trees in an array, we need linear time to insert a new tree at the front:
1: function Link’(T1 , T2 )
2: if Key(T2 ) < Key(T1 ) then
3: Exchange T1 ↔ T2
4: Parent(T2 ) ← T1
5: Insert(Sub-Trees(T1 ), 1, T2 )
6: Rank(T1 ) ← Rank(T2 ) + 1
7: return T1
We can store the sub-trees in reversed order; then it takes constant time to append the new tree to the tail.

Answer of exercise 10.2


Why is Decrease bound to amortized O(1) time?

Define the potential function as:

Φ(H) = t(H) + 2m(H)

Where t(H) is the number of trees in the heap, and m(H) is the number of the nodes
being marked. As we mark, then later cut and clear the flag, its coefficient is 2. Decrease
takes O(1) time to cut x off, then recursively call Cascade-Cut. Assume it’s recursively
called c times. Each time takes O(1) time to call Cut, then continue recursion. Hence
the total cost of Decrease is O(c).
For the potential change, let H be the heap before we call Decrease. Every recursive Cascade-Cut cuts a marked node off, then clears its flag (except for the last call). After that, there are t(H) + c trees: the original t(H) trees, the c − 1 trees cut and added back, and the tree with x as the root. There are at most m(H) − c + 2 marked nodes: the original m(H) nodes, minus the c − 1 nodes cleared in the Cascade-Cut calls, plus possibly one node marked by the last call. The potential changes at most:

t(H) + c + 2(m(H) − c + 2) − [t(H) + 2m(H)] = 4 − c

Hence the amortized cost is at most O(c) + 4 − c = O(1).

Answer of exercise 10.3


10.3.1. If we continuously insert n elements and then follow with a pop, the performance overhead is big when n is a large number (although the amortized performance is O(lg n)). How to mitigate such a worst case?

Set a threshold m on the number of sub-trees. When inserting, check whether it exceeds m; if yes, perform a pop, then add the top element back.
MAX_SUBTREES = 16

Node<K> insert(Node<K> h, K x) {
if h 6= null and length(h.subTrees) > MAX_SUBTREES {
h = insert(pop(h), top(h))
}
return merge(h, Node(x))
}

10.3.2. Implement delete for the pairing heap.


Add a parent reference to the node definition:
data Node<K> {
K key
Node<K> parent = null
[Node<K>] subTrees = []

Node<K>(K k) { key = k }
}

When deleting x, first look up the heap h to find the sub-tree t rooted at x. If t is the root of h, merely do a pop; otherwise, get the parent of t, remove t from its sub-trees, and finally merge pop(t) back with h.
Node<K> delete(Node<K> h, K x) {
var tr = lookuptr(h, x)
if tr == null then return h
if tr == h then return pop(h)
tr.parent.subTrees.remove(tr)
tr.parent = null
return merge(pop(tr), h)
}

Node<K> lookuptr(Node<K> h, K x) {
if h.key == x then return h
for var t in h.subTrees {
var tr = lookuptr(t, x)
if tr 6= null then return tr
}
return null
}

The recursive lookup takes O(n) time, where n is the number of elements in the
heap. Then it takes O(m) time to remove t from m sub-trees. The total perfor-
mance is O(n).
10.3.3. Implement Decrease-Key for the pairing heap.
If we decrease the key of the root of h, we directly update it to x; otherwise, we get the parent of tr, cut tr off from its sub-trees, update the key of tr to x, then merge tr back to h:
Node<K> decreaseKey(Node<K> h, Node<K> tr, K x) {
if tr == null or tr.key < x then return h
tr.key = x
if tr == h then return h
tr.parent.subTrees.remove(tr) // O(m), where m = length(subTrees)
tr.parent = null
return merge(tr, h)
}

To use it, we first call lookuptr(h, y) to find the node, then update the key from y to x, i.e., decreaseKey(h, lookuptr(h, y), x). The performance is the same as delete, which is O(n).

Answer of exercise 11.1


11.1.1. The circular buffer is allocated with a predefined size. We can use two references,
head and tail instead. How to determine if a circular buffer queue is full or empty?
(the head can be either ahead of tail or behind it.)

Although we can deal with the different cases where the head is before/behind the tail, as shown in fig. 11.4, let us seek a simple and unified solution. Consider an array open at both ends, extending infinitely. Let the head index be h, the tail be t, and the size of the buffer be s. The range [h, t) (left closed, right open) is occupied with elements. The empty/full tests are as below:

empty(h, t) : h = t
full(h, t) : t − h = s

The circular buffer is essentially modular arithmetic on top of this. Write [n]_s = n mod s. Applying it to the above full test: [t]_s − [h]_s = [s]_s = 0, which gives [t]_s = [h]_s. But that is exactly the empty test condition: we can't differentiate empty from full with the modular result alone. We need to either add a flag (indicating the order between h and t), or use the original indices (of the infinitely long segment) without applying mod for the empty/full test. Since integers have a limited byte size, we can take the indices modulo a big number p (p > s) that is co-prime with s:

empty(h, t) : [h]_p = [t]_p
full(h, t) : [t − h]_p = s

Below is the example program:


Int P_LIMIT = 104743 // the 10000th prime
Bool empty(Queue<K> q) = (q.h == q.t)
Bool full(Queue<K> q) = (q.s == (q.t - q.h) mod P_LIMIT)

void enqueue(Queue<K> q, K x) {
    if not full(q) {
        q.buf[q.t mod q.s] = x        // store at t first,
        q.t = (q.t + 1) mod P_LIMIT   // then advance the tail
    }
}

Optional<K> dequeue(Queue<K> q) {
Optional<K> x = Optional.Nothing
if not empty(q) {
x = Optional.of(q.buf[q.h mod q.s])
q.h = (q.h + 1) mod P_LIMIT
}
return x
}

Answer of exercise 11.2


11.2.1. Why do we need the balance check and adjustment after push?
Consider pushing a to the empty queue ([ ], [ ]) and then popping: without the adjustment, the front list stays empty while a sits in the rear.
11.2.2. Do the amortized analysis for the paired-list queue.
We use the banker's accounting method. Each element in the rear list r has 1 credit. When pushing to the rear, we take one step to add the element and increase the credit by 1; the amortized cost is O(2). When popping, if it doesn't cause a list reversal, we take one step to remove an element without decreasing the credit; the amortized cost is O(1). If it causes a list reversal, we take m steps to reverse plus one step to remove an element, where m is the length of r; we also spend the m credits saved in r. The amortized cost is O(m + 1 − m) = O(1).
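For reference, a minimal paired-list queue sketch in Haskell (our own naming, assuming the balance rule that the front list is only empty when the whole queue is):

data Queue a = Queue [a] [a]

push :: a → Queue a → Queue a
push x (Queue f r) = balance f (x:r)

pop :: Queue a → (a, Queue a)
pop (Queue (x:f) r) = (x, balance f r)

-- reverse the rear to the front when the front becomes empty
balance :: [a] → [a] → Queue a
balance [] r = Queue (reverse r) []
balance f r = Queue f r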
11.2.3. Implement the paired-array queue.
1: function Push(Q, x)

2: Append(Front(Q), x)

3: function Pop(Q)
4: if Rear(Q) = [ ] then
5: Rear(Q) ← Reverse(Front(Q))
6: Front(Q) ← [ ]
7: n ← Length(Rear(Q))
8: x ← Rear(Q)[n]
9: Length(Rear(Q)) ← n − 1
10: return x

Answer of exercise 11.3


11.3.1. Why do we need to roll back an element (we cancelled the previous ‘cons’, removed x, and returned a as the result) when n = 0 in abort?
The abort function is only called during pop. When n = 0, the rotation has just finished, and it is about to transform the state from (Sc, 0, (x:a), f′) to (Sf, a). But x, which was linked previously, is exactly the element being popped; hence we need to remove x and return a as the result.
11.3.2. Implement the real-time queue with paired arrays. We can't copy the array when starting the rotation, or the performance will downgrade to linear time. Please implement the ‘lazy’ copy, i.e., copy one element per step.
Assume we can get the length of an array in constant time. We push elements to the tail of array f, and pop from the tail of array r. When the two arrays are not balanced, we start a state machine to compute acc = reverse(f) ++ r step by step. If f ≠ [ ], we extract the tail element and append it to the tail of acc. After f is reversed, we copy the elements from the left of r and append them to the tail of acc one by one, i.e., append(acc, r[i]), where i = 0, 1, ..., |r| − 1. While rotating in steps, one can still pop from the tail of r; the rotation completes when i exceeds |r|.
data State<K> {
[K] acc, front, rear
Int idx

State([K] f, [K] r) {
acc = [], front = f, rear = r
idx = 0
}

// compute reverse(f) ++ r step by step
Self step() {
if front 6= [] then acc.append(front.popLast()) // reversing
if front == [] and idx < length(rear) { // concatenating
acc.append(rear[idx])
idx = idx + 1
}
}

Bool done() = (front == [] and length(rear) ≤ idx)


}

data RealtimeQueue<K> {
[K] front = []
[K] rear = []
State<K> state = null

Bool isEmpty() = (front == [] and rear == [])



Self push(K x) {
front.append(x)
balance()
}

K pop() {
x = rear.popLast()
balance()
return x
}

Void balance() {
if state == null and length(rear) < length(front) {
state = State(front, rear).step()
front = []
}
if state 6= null and state.step().done() {
rear = state.acc
state = null
}
}
}

Answer of exercise 12.1


12.1.1. How to handle the out of bound exception?

We can use the Maybe type to handle the out of bound cases. If the index i < 0, return Nothing; if the index exceeds the size, the recursion eventually reaches the empty forest, and we return Nothing in this case too.
getAt [] _ = Nothing
getAt (t:ts) i | i < 0 = Nothing
| i < size t = lookupTree i t
| otherwise = getAt ts (i - size t)
where
lookupTree 0 (Leaf x) = Just x
lookupTree i (Node sz t1 t2) = if i < sz `div` 2 then lookupTree i t1
else lookupTree (i - sz `div` 2) t2

Answer of exercise 12.2


12.2.1. Implement the random access for numeric representation S[i], 1 ≤ i ≤ n, where n
is the length of the sequence.
We skip the index out of bound exception:
getAt (Zero:ts) i = getAt ts i
getAt (One t:ts) i = if i < size t then lookupTree i t
else getAt ts (i - size t)
where
lookupTree 0 (Leaf x) = x
lookupTree i (Node sz t1 t2) = if i < sz `div` 2 then lookupTree i t1
else lookupTree (i - sz `div` 2) t2

12.2.2. Analyze the amortized performance of delete.

Consider the reverse process of insert: delete from a sequence with n = 2^m elements repeatedly till it becomes empty. The analysis is symmetric to that of insert, and gives amortized constant time.
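A sketch of the aggregate argument (assuming, as for insert, that a delete touching k trees costs O(k)): counting down from n = 2^m to 0, bit i flips 2^{m−i} times, so the total cost is proportional to

Σ_{i=0}^{m} 2^{m−i} = 2^{m+1} − 1 < 2n

hence O(1) amortized per delete.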
12.2.3. We can represent a full binary tree with an array of length 2^m, where m is a non-negative integer. Implement the binary tree forest, insert, and random access.

data List<K> {
Int size = 0
[[K]] trees = [[]]
}

Int nbits(Int n) {
Int i = 0
while n 6= 0 {
i = i + 1
n = n / 2
}
return i
}

List<K> insert(List<K> xs, K x) {
    var i = nbits(xs.size xor (1 + xs.size)) // locate the bit that flips to 1
if i ≥ length(xs.trees) then xs.trees.append([])
xs.trees[i] = [x]
for Int j = 0, j < i, j++ {
xs.trees[i] = xs.trees[i] ++ xs.trees[j]
xs.trees[j] = []
}
xs.size = xs.size + 1
return xs
}

Optional<K> get(List<K> xs, Int i) {
    for t in xs.trees {
Int size = length(t)
if i < size then return Optional.of(t[i])
i = i - size
}
return Optional.Nothing
}

Answer of exercise 12.3


12.3.1. Analyze the amortized performance for paired-array delete.

Define the potential of the paired-array sequence as the difference of the array lengths: Φ(s) = |r| − |f| = n − m, where m = |f| and n = |r|. When deleting from the head, if f ≠ [ ], it takes O(1) time to remove the last element of f. If f = [ ], it takes O(n) time to halve r, reverse one half to replace f (giving f′ and r′), and then another O(1) time to remove the last element of f′. The amortized cost is:

c = n + 1 + Φ(s′) − Φ(s)
  = n + 1 + (|r′| − |f′|) − (|r| − |f|)
  = n + 1 + (n − ⌈n/2⌉) − (⌈n/2⌉ − 1) − (n − 0)
  = n + 2 − 2⌈n/2⌉ ≤ 2

Hence the amortized time is O(1) when deleting from the head; symmetrically, the amortized time is O(1) when deleting from the tail too.

Answer of exercise 12.4


12.4.1. Eliminate recursion, implement insert with loop.
Let Mid(T ) = t access the middle part of tree T = (f, t, r).
1: function Insert(x, T )

2: n ← (x) . wrap x in a leaf
3: p ← ⊥ ← ([ ], T, [ ]) . ⊥ is a sentinel parent of the root
4: while |Front(T )| ≥ 3 do
5: f ← Front(T )
6: Front(T ) ← [n, f [1]] . the inserted node and f [1] form the new finger
7: n ← (f [2], f [3], ...) . wrap the rest, to be inserted at the next level
8: p←T
9: T ← Mid(T )
10: if T = NIL then
11: T ← ([n], NIL, [ ])
12: else if |Front(T )| = 1 and Rear(T ) = [ ] then
13: Rear(T ) ← Front(T )
14: Front(T ) ← [n]
15: else
16: Insert(Front(T ), n)
17: Mid(p) ← T
18: T ← Mid(⊥), Mid(⊥) ← NIL
19: return T
We wrap x in a leaf (x). While the front finger f is full (holding 3 elements), we go top-down along the middle part: we keep n together with the first element of f as the new front finger, wrap the remaining elements of f in a node n (one level deeper), and continue to insert this node to the middle. At the end of the traverse, we either reach an empty tree, or a tree whose f can hold more elements. For the empty tree case, we create a new tree holding the single element n; otherwise, we insert n to the head of f. Finally, we return the root T. To simplify the implementation, we create a special ⊥ node as the parent of the root.

Answer of exercise 12.5

12.5.1. Eliminate recursion, implement extract in loops.


We borrow nodes from the middle when f is empty. However, the tree may not be well formed, e.g., both f and the middle can be empty. This is caused by splitting.

Figure 12.9: The f isn't empty at level i.


To extract the first element, we make a top-down pass to locate a sub-tree where either f isn't empty, or both f and the middle are empty, as shown in fig. 12.9. For the former, we extract the first node from f; for the latter, we swap f and r to convert it to the former case. If the node extracted from f isn't a leaf, we go on extracting: we back track along the parents, till we extract a leaf and reach the root, as shown in fig. 12.10.

Figure 12.10: Bottom-up back track to extract a leaf: extract the first node, move its sub-trees to the f one level up; repeat i times to extract x[1].

Assuming the tree isn't empty, we implement extract as below:


1: function Extract(T)
2:   ⊥ ← ([ ], T, [ ])
3:   while Front(T) = [ ] and Mid(T) ≠ NIL do
4:     T ← Mid(T)
5:   if Front(T) = [ ] and Rear(T) ≠ [ ] then
6:     Exchange Front(T) ↔ Rear(T)
7:   f ← Front(T), r ← Rear(T)
8:   n ← (f[1], f[2], ...)   . wrap the front elements in a temporary node n
9:   repeat
10:    Front(T) ← [n2, n3, ...]
11:    n ← n1
12:    T ← Parent(T)
13:    if Mid(T) becomes empty then
14:      Mid(T) ← NIL
15:  until n is a leaf
16:  return (Elem(n), Mid(⊥))

Where the function Elem(n) accesses the element stored in the leaf n. We also need to change the way we access the first/last element of the finger tree: if the finger is empty and the middle isn't, we search along the middle.
1: function First-Leaf(T )

2: while Front(T) = [ ] and Mid(T) ≠ NIL do
3:   T ← Mid(T)
4: if Front(T) = [ ] and Rear(T) ≠ [ ] then
5:   n ← Rear(T)[1]
6: else
7:   n ← Front(T)[1]
8: while n is not a leaf do
9:   n ← n1
10: return n

11: function First(T)
12:   return Elem(First-Leaf(T))
In the second loop, if the node is not a leaf, we traverse along its first sub-tree. The method to access the last element is symmetric.

Answer of exercise 12.6


12.6.1. For random access, how to handle the empty tree ∅ and out of bound cases?
We check the boundaries during random access, for example:

∅[i] = Nothing

T[i] = | i < 0 or i ≥ size T : Nothing
       | otherwise : ...

12.6.2. Implement cut i S, split sequence S at position i.


We give an implementation based on the tree definition in the appendix. First do the boundary check: if 0 ≤ i < size S, we call cutTree i S to split the tree:
cut :: Int → Seq a → (Seq a, Maybe a, Seq a)
cut i (Seq xs) | i < 0 = (Seq Empty, Nothing, Seq xs)
| i < size xs = case cutTree i xs of
(a, Just (Place _ (Elem x)), b) → (Seq a, Just x, Seq b)
(a, Nothing, b) → (Seq a, Nothing, Seq b)
| otherwise = (Seq xs, Nothing, Seq Empty)

cutTree splits the tree into three parts: left, middle, and right. We wrap the middle in Maybe type to handle the not-found case; when found, the result is a pair of position i′ and node a, wrapped in the Place type. If i points to the finger f or r, we call cutList to further split, then build the result; if i points to the middle, we recursively cut the middle to obtain a place Place i′ a, then cut the 2-3 tree a at position i′:
cutTree :: (Sized a) ⇒ Int → Tree a → (Tree a, Maybe (Place a), Tree a)
cutTree _ Empty = (Empty, Nothing, Empty)
cutTree i (Lf a) | i < size a = (Empty, Just (Place i a), Empty)
| otherwise = (Lf a, Nothing, Empty)
cutTree i (Br s f m r)
| i < sf = case cutList i f of
(xs, x, ys) → (Empty <<< xs, x, tree ys m r)
| i < sm = case cutTree (i - sf) m of
(t1, Just (Place i' a), t2) →
let (xs, x, ys) = cutNode i' a
in (tree f t1 xs, x, tree ys t2 r)
| i < s = case cutList (i - sm) r of
(xs, x, ys) → (tree f m xs, x, ys >>> Empty)
where
  sf = sum $ map size f
  sm = sf + size m

Where tree f m r builds a finger tree, and simplifies the result:


tree as Empty [] = as >>> Empty
tree [] Empty bs = Empty <<< bs
tree [] m r = Br (size m + sum (map size r)) (nodesOf f) m' r
where (f, m') = uncons m
tree f m [] = Br (size m + sum (map size f)) f m' (nodesOf r)
where (m', r) = unsnoc m
tree f m r = Br (size m + sum (map size f) + sum (map size r)) f m r

We implement the finger cut and 2-3 tree cut as below:


cutList :: (Sized a) ⇒ Int → [a] → ([a], Maybe (Place a), [a])
cutList _ [] = ([], Nothing, [])
cutList i (x:xs) | i < sx = ([], Just (Place i x), xs)
| otherwise = let (xs', y, ys) = cutList (i - sx) xs
in (x:xs', y, ys)
where sx = size x

cutNode :: (Sized a) ⇒ Int → Node a → ([a], Maybe (Place a), [a])


cutNode i (Tr2 _ a b) | i < sa = ([], Just (Place i a), [b])
| otherwise = ([a], Just (Place (i - sa) b), [])
where sa = size a
cutNode i (Tr3 _ a b c) | i < sa = ([], Just (Place i a), [b, c])
| i < sab = ([a], Just (Place (i - sa) b), [c])
| otherwise = ([a, b], Just (Place (i - sab) c), [])
where sa = size a
sab = sa + size b

With cut defined, we can update or delete the element at any position, or move an element to the front (MTF), all bound to O(lg n) time.

setAt s i x = case cut i s of
    (_, Nothing, _) → s
    (xs, Just y, ys) → xs ++ (x <| ys)

extractAt s i = case cut i s of (xs, Just y, ys) → (y, xs ++ ys)

moveToFront i s = if i < 0 || i ≥ size s then s
                  else let (a, s') = extractAt s i in a <| s'

Answer of exercise 13.1


13.1.1. Optimize the basic quick sort definition for the singleton list case.
Add a case:
sort [x] = [x]

Answer of exercise 13.2


13.2.1. The pairwise merge is faster. This problem is essentially to merge k ordered sequences. Assume the average length of the k sequences is n for simplification. When merging with fold, we first merge s1 and the empty sequence: merge([ ], s1) = [ ] ⊕ s1, then merge s2 to get [ ] ⊕ s1 ⊕ s2, then merge s3 to get [ ] ⊕ s1 ⊕ s2 ⊕ s3, ..., hence the total complexity is O(n + 2n + 3n + 4n + ... + kn) = O(n·k(k + 1)/2) = O(nk²). For pairwise merging, the first round merges s1 ⊕ s2, s3 ⊕ s4, ..., taking O(kn) time in total; the next round merges (s1 ⊕ s2) ⊕ (s3 ⊕ s4), ..., taking O(kn) time too. There are O(lg k) rounds in total, hence the overall complexity is O(nk lg k). Therefore the pairwise merge performs better.

We can also merge with a min-heap of size k: store the minimum element of each sequence in the heap, keep popping the overall minimum, and replace it with the next element from the sequence it came from. The complexity is O(nk lg k) too.
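Below is a minimal sketch of this heap based merge in Haskell (the function kMerge and its list-based 'heap' are illustrative assumptions, not from the text): we simulate the min-heap with a list of non-empty sequences kept sorted by their head elements, so each push is O(k); swapping in a real heap makes it O(lg k), giving the stated O(nk lg k) bound.

import Data.List (sortOn, insertBy)
import Data.Ord (comparing)

kMerge :: (Ord a) ⇒ [[a]] → [a]
kMerge = go ◦ sortOn head ◦ filter (not ◦ null)
  where
    go [] = []
    go ((x:xs):rest)
      | null xs = x : go rest −− this sequence is exhausted
      | otherwise = x : go (insertBy (comparing head) xs rest) −− put the tail back

For example, kMerge [[1, 4], [2, 5], [3]] gives [1, 2, 3, 4, 5].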

Answer of exercise 13.3


13.3.1. Define the generic pairwise fold foldp, and use it to implement the bottom-up merge sort.

It suffices that the binary combination function f is associative, i.e., f(f(x, y), z) = f(x, f(y, z)). Let z be the unit of f; define the pairwise fold as:
foldp f z [] = z
foldp f z [x] = f x z
foldp f z xs = foldp f z (pairs xs) where
pairs (x:y:ys) = (f x y) : pairs ys
pairs ys = ys

For example, we can define sum = foldp (+) 0, and the bottom-up merge sort is defined as4:
sort = foldp merge [] ◦ map (:[])

Answer of exercise 13.4


13.4.1. Build a binary search tree from a sequence using the idea of merge sort.

We can build the binary search tree either top-down or bottom-up. For top-down, halve the list into two halves, recursively build two trees from them, then merge the trees; for bottom-up, wrap each element in a singleton leaf, then pairwise merge the trees repeatedly to the final result. Both approaches depend on tree merge; let us define it6. If either tree is empty, the merge result is the other one; otherwise, let the two trees be (A, x, B) and (C, y, D). If x < y, use y to partition B into two trees B_y and B^y, where B_y holds the elements {b ∈ B, b < y} and B^y holds the rest {b ∈ B, b ≥ y}; similarly use x to partition C into C_x and C^x. We then form the merge result as:

(A, x, B) ⊕ (C, y, D) = ((A ⊕ C_x, x, B_y ⊕ C^x), y, D ⊕ B^y)

Where '⊕' means tree merge. Symmetrically, when x ≥ y, we partition A and D.


Below example program implements tree partition and merge. It uses foldp defined
in Exercise 13.3.
toTree :: (Ord a) ⇒ [a] → Tree a
toTree = foldp merge Empty ◦ map leaf

partition x Empty = (Empty, Empty)


partition x (Node a y b)
| x < y = let (a1, a2) = partition x a in (a1, Node a2 y b)
| otherwise = let (b1, b2) = partition x b in (Node a y b1, b2)

merge Empty t = t
merge t Empty = t
merge (Node a x b) (Node c y d)
    | x < y = let
          (b1, b2) = partition y b
          (c1, c2) = partition x c
      in Node (Node (merge a c1) x (merge b1 c2)) y (merge d b2)
    | otherwise = let
          (a1, a2) = partition y a
          (d1, d2) = partition x d
      in Node (merge a1 c) y (Node (merge a2 d1) x (merge b d2))

4 (:[]) is equivalent to x ↦ [x].
6 For the standalone problem of binary search tree merge, we can flatten both trees to sorted lists through in-order traversal, apply merge (see eq. (13.31)), and rebuild the tree with the middle element as the root.

Answer of exercise 14.1


14.1.1. Prove that the performance of the k-selection problem is O(n) on average.

Refer to the performance analysis of quick sort in section 13.1.3.
14.1.2. To find the top k elements in A, we can search x = max(take k A), y = min(drop k A). If x < y, then the first k elements in A are the answer; otherwise, we partition the first k elements with x, partition the rest with y, then recursively find the top k′ elements in the sub-sequence [a ← A, x < a < y], where k′ = k − |[a ← A, a ≤ x]|. Implement this solution, and evaluate its performance.
1: procedure Tops(k, A)
2: l←1
3: u ← |A|
4: loop
5: i ← Max-At(A[l..k])
6: j ← Min-At(A[k + 1..u])
7: if A[i] < A[j] then
8: break
9: Exchange A[l] ↔ A[j]
10: Exchange A[k + 1] ↔ A[i]
11: l ← Partition(A, l, k)
12: u ← Partition(A, k + 1, u)
The performance is O(n) on average. Each loop takes linear time to locate the max position i and the min position j, then performs two rounds of partition in linear time. If the partition is balanced, we discard half of the elements on average, hence the total time is bounded by O(n + n/2 + n/4 + ...) = O(n).
14.1.3. Find the 'simplified' median of two sorted arrays A and B in O(lg(m + n)) time, where m = |A|, n = |B|. The array index starts from 0. The simplified median is defined as median(A, B) = C[⌊(m + n)/2⌋], where C = merge(A, B) is the merged sorted array5.

5 In statistics, the median of an ascending data set x with n elements is defined as x[(n + 1)/2] when n is odd, and (x[n/2] + x[n/2 + 1])/2 when n is even (1-based index).

We give two solutions. The first applies binary search to each array. Let l = 0 and u = m be the lower and upper bounds respectively. We guess that the median in A is at index i = ⌊(l + u)/2⌋. According to the definition of the simplified median, there are in total h = ⌊(m + n)/2⌋ elements before it, i of which are before A[i] in A. If we guess right, then there are j = h − i elements before A[i] in B. If B[j − 1] ≤ A[i] ≤ B[j] holds, then the guessed A[i] is the median; otherwise, we update the boundary l or u depending on whether the guess is too small or too big. Below example program implements this solution:
K median([K] a, [K] b) {
    if a == [] then return b[length(b) / 2]
    if b == [] then return a[length(a) / 2]
    Int i = medianOf(a, b)
    return if i == -1 then median(b, a) else a[i]
}

Int medianOf([K] a, [K] b) {
    Int l = 0, u = length(a)
    while l < u {
        var i = (l + u) / 2
        var j = (length(a) + length(b)) / 2 - i
        if j < 1 or j ≥ length(b) {
            if (j == 0 and a[i] ≤ b[0]) or
               (j == length(b) and b[j - 1] ≤ a[i]) then return i
            if j ≥ length(b) then l = i + 1 else u = i
        } else {
            if b[j - 1] ≤ a[i] and a[i] ≤ b[j] then return i
            if a[i] < b[j - 1] then l = i + 1 else u = i
        }
    }
    return -1
}

The second solution is to develop a generic function that looks for the k-th element. Assume |A| ≥ |B| (otherwise swap A and B). If either array is empty, return the k-th element of the other array. If k = 1, return the smaller of A[0] and B[0]. Otherwise guess j = min(k/2, n) and i = k − j, then compare the i-th element of A and the j-th of B. If the i-th of A is smaller, we drop the first i elements of A and the elements of B from the j-th onward, then recursively find the (k − i)-th element of the remaining; otherwise, we drop the first j elements of B and the elements of A from the i-th onward, and recursively find the (k − j)-th element.
K median([K] xs, [K] ys) {
Int n = length(xs), m = length(ys)
return kth(xs, 0, n, ys, 0, m, (m + n) / 2 + 1)
}

K kth([K] xs, Int x0, Int x1, [K] ys, Int y0, Int y1, Int k) {
if x1 - x0 < y1 - y0 then return kth(ys, y0, y1, xs, x0, x1, k)
if x1 ≤ x0 then return ys[y0 + k - 1]
if y1 ≤ y0 then return xs[x0 + k - 1]
if k == 1 then return min(xs[x0], ys[y0])
var j = min(k / 2, y1 - y0), i = k - j
i = x0 + i, j = y0 + j
if xs[i - 1] < ys[j - 1] then
return kth(xs, i, x1, ys, y0, j, k - i + x0)
else
return kth(xs, x0, i, ys, j, y1, k - j + y0)
}
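For example, for a = [1, 3, 5] and b = [2, 4], both solutions return 3, the element at index ⌊5/2⌋ = 2 of the merged array [1, 2, 3, 4, 5].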

Note that we can't define the simplified median as the m ∈ A ++ B satisfying:

|[y ← A ++ B, y < m]| − |[y ← A ++ B, y > m]| = 0 or ±1

i.e., the number of elements less than m and the number greater than m are equal or differ only by 1. Consider the counter example [0, 1, 2, 3, 3, 3, 3, 3, 5]: no such m exists. It doesn't work even if we use ≤ and ≥ instead.
14.1.4. For the saddle back search, eliminate recursion, implement it in loops to update
the boundary.

1: function Solve(f, z)
2: p ← 0, q ← z
3: S←φ
4: while p ≤ z and q ≥ 0 do
5: z 0 ← f (p, q)
6: if z 0 < z then
7: p←p+1
8: else if z 0 > z then
9: q ←q−1
10: else
11: S ← S ∪ {(p, q)}
12: p ← p + 1, q ← q − 1
13: return S
14.1.5. For 2D search, let the bottom-left be the minimum, and the top-right be the maximum. If z is less than the minimum or greater than the maximum, then there is no solution; otherwise cut the rectangle into 4 parts with a horizontal line and a vertical line crossed at the center, then recursively search these 4 small rectangles. Implement this solution and evaluate its performance.

1: procedure Search(f, z, a, b, c, d)   . (a, b): bottom-left, (c, d): top-right
2:   if z ≤ f(a, b) or f(c, d) ≤ z then
3:     if z = f(a, b) then
4:       record (a, b) as a solution
5:     if z = f(c, d) then
6:       record (c, d) as a solution
7:     return
8:   p ← ⌊(a + c)/2⌋
9:   q ← ⌊(b + d)/2⌋
10:  Search(f, z, a, q, p, d)
11:  Search(f, z, p, q, c, d)
12:  Search(f, z, a, b, p, q)
13:  Search(f, z, p, b, c, q)

Let the time to search a rectangle of area A be T(A). We take O(1) time to check whether z ≤ f(a, b) or f(c, d) ≤ z, then divide into 4 smaller areas: T(A) = 4T(A/4) + O(1). Applying the master theorem, the complexity is O(A) = O(mn), proportional to the area. It is essentially the same as exhaustively searching the rectangle.

Answer of exercise 14.2


14.2.1. Extend to find the k majorities that each occurs more than ⌊n/k⌋ times in collection A, where n = |A|.

We use a dictionary of Map: T ↦ Int, where T is the element type in A. It records the net wins for each candidate a. Starting from the empty dictionary ∅, we scan A while updating the dictionary: foldr maj ∅ A, where maj is defined as:

maj a m = | a ∈ m : m[a] ← m[a] + 1
          | |m| < k : m[a] ← 1
          | otherwise : filter (b ↦ m[b] ≠ 0) {b ↦ m[b] − 1 | b ∈ m}     (14.21)


For every a in A: if a ∈ m, we increase its vote by 1: m[a] ← m[a] + 1; if a ∉ m (new to the dictionary) and there are fewer than k candidates in m, we add a with one net-win vote: m[a] ← 1; otherwise, if there are already k candidates, we decrease every vote by 1, and remove a candidate when its vote drops to 0.

Finally, we need to verify whether each remaining candidate really wins more than ⌊n/k⌋ votes. Let m′ = {(a, 0) | a ∈ m}, and scan A again: foldr cnt m′ A, where cnt is defined as:

cnt a m′ = if a ∈ m′ then m′[a] ← m′[a] + 1 else m′     (14.22)

After the scan, m′ records the votes for each candidate, and we filter the true winners: keys (filter (> ⌊n/k⌋) m′).
majorities k xs = verify $ foldr maj Map.empty xs where
maj x m | x `Map.member` m = Map.adjust (1+) x m
| Map.size m < k = Map.insert x 1 m
| otherwise = Map.filter (≠ 0) $ Map.map (-1 +) m
verify m = Map.keys $ Map.filter (> th) $ foldr cnt m' xs where
m' = Map.map (const 0) m
cnt x m = if x `Map.member` m then Map.adjust (1+) x m else m
th = (length xs) `div` k
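For example, majorities 3 [1, 1, 1, 2, 2, 3] returns [1]: among the n = 6 elements, only 1 occurs more than ⌊6/3⌋ = 2 times.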

Below is the corresponding iterative implementation:


1: function Maj(k, A)
2: m ← {}
3: for each a in A do
4: if a ∈ m then
5: m[a] ← m[a] + 1
6: else if |m| < k then
7: m[a] ← 1
8: else
9: for each c in m do
10: m[c] ← m[c] − 1
11: if m[c] = 0 then
12: Remove(c, m)
13: for each c in m do
14: m[c] ← 0
15: for each a in A do . verify
16: if a ∈ m then
17: m[a] ← m[a] + 1
18: r = [ ], n ← |A|
19: for each c in m do
20:   if m[c] > ⌊n/k⌋ then
21:     Add(c, r)
22: return r

Answer of exercise 14.3


14.3.1. Modify the solution that finds the max sum of the sub-vector, to return the sub-vector of the maximum sum.

To return the sub-vector together with the maximum sum, we maintain two pairs Pm and P′ during folding; each pair contains the sum and the corresponding sub-vector (S, L):

maxs = 1st ◦ foldr f ((0, [ ]), (0, [ ]))
  where f x (Pm, (S, L)) = (max(Pm, P′), P′), and P′ = max((0, [ ]), (x + S, x:L))
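Below is one possible Haskell rendering of this fold (maxSub is an illustrative name; ties between equal sums are broken by the derived lexicographic comparison on pairs, which is arbitrary but harmless):

maxSub :: (Num a, Ord a) ⇒ [a] → (a, [a])
maxSub = fst ◦ foldr f ((0, []), (0, []))
  where
    −− pm: the best (sum, sub-vector) so far;
    −− (s, l): the best one starting at the current position
    f x (pm, (s, l)) = (max pm p', p')
      where p' = max (0, []) (x + s, x : l)

For example, maxSub [3, -4, 5, -2, 6] gives (9, [5, -2, 6]).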

14.3.2. Bentley gives a divide and conquer algorithm to find the max sum in O(n lg n) time [2]. Split the vector in the middle, recursively find the max sum of the two halves and the max sum that crosses the middle, then pick the greatest. Implement this solution.
1: function Max-Sum(A)
2: if A = φ then
3: return 0
4: else if |A| = 1 then
5: return Max(0, A[1])
6: else
7: m ← ⌊|A|/2⌋
8: a ← Max-From(Reverse(A[1...m]))
9: b ← Max-From(A[m + 1...|A|])
10: c ← Max-Sum(A[1...m])
11: d ← Max-Sum(A[m + 1...|A|])
12: return Max(a + b, c, d)

13:function Max-From(A)
14: sum ← 0, m ← 0
15: for i ← 1 to |A| do
16: sum ← sum + A[i]
17: m ← Max(m, sum)
18: return m
Consider the recurrence T(n) = 2T(n/2) + O(n); by the master theorem, the performance is O(n lg n).
14.3.3. Find the sub-matrix in an m × n matrix that gives the maximum sum.

We start from the first row of the matrix and add one more row per round, considering the rows [M[1, ∗], M[2, ∗], ..., M[i, ∗]]. We sum the numbers in each column to convert them into a vector:

V = (Σ_{j=1}^{i} M[j, 1], Σ_{j=1}^{i} M[j, 2], ..., Σ_{j=1}^{i} M[j, n])

Next we use maxsum to find the max sum of the vector V, and record the global maximum.
maxSum = maximum ◦ (map maxS) ◦ acc ◦ rows where
rows = init ◦ tails −− exclude the empty row
acc = concatMap (scanl1 (zipWith (+))) −− accumulated sum along columns
maxS = snd ◦ (foldl f (0, 0)) −− max sum in a vector
f (m, s) x = let m' = max (m + x) 0
s' = max m' s in (m', s')

Where tails is defined in Exercise 1.12.2, zipWith is defined in section 1.8, and concatMap is defined in section 6.5.1. The scanl is similar to foldl, but records every intermediate result in a list. scanl1 is a special case of scanl, where the initial value is the first element:

scanl1 f [ ] = [ ]
scanl1 f (x:xs) = scanl f x xs
  where
    scanl f q [ ] = [q]
    scanl f q (x:xs) = q : scanl f (f q x) xs

Below is the corresponding imperative program:


K maxsum2([[K]] m) {
Int n = length(m), k = length(m[0]) // number of row, col
K maxs = 0 // max so far
for i = 0 to n - 1 {
xs = [0] ∗ k
for j = i to n - 1 {
xs = [x + y for (x, y) in zip(xs, m[j])]
maxs = max(maxs, maxsum1(xs))
}
}
return maxs
}

K maxsum1([K] xs) {
K s = 0 // max so far
K m = 0 // max end here
for x in xs {
m = max(m + x, 0)
s = max(m, s)
}
return s
}

Answer of exercise 14.4


14.4.1. Modify the implementation with the stack to find all the ways through the maze.
dfsSolveAll m from to = map reverse $ solve [[from]] [] where
solve [] ss = ss
solve (c@(p:path):cs) ss
| p == to = solve cs (c:ss) −− find a solution, go on search
| otherwise = let os = filter (`notElem` path) (adjacent p) in
    if os == [] then solve cs ss
    else solve ((map (:c) os) ++ cs) ss
adjacent (x, y) = [(x', y') |
(x', y') ← [(x-1, y), (x+1, y), (x, y-1), (x, y+1)],
inRange (bounds m) (x', y'), m ! (x', y') == 0]

The corresponding imperative program:


[[(Int, Int)]] solve([[Int]] m, (Int, Int) src, (Int, Int) dst) {
[[(Int, Int)]] stack = [[src]]
[[(Int, Int)]] s = []
while stack ≠ [] {
path = stack.pop()
if last(path) == dst {
s.append(path)
} else {
for p in adjacent(m, last(path)) {
if not p in path then stack.append(path + [p])
}
}
}
return s
}

[(Int, Int)] adjacent([[Int]] m, (Int x, Int y)) {
    [(Int, Int)] ps = []
    for (dx, dy) in [(0, 1), (0, -1), (1, 0), (-1, 0)] {
        Int x1 = x + dx, y1 = y + dy
        if 0 ≤ x1 < len(m[0]) and 0 ≤ y1 < len(m)
            and m[y1][x1] == 0 then ps.append((x1, y1))

}
return ps
}

Answer of exercise 14.5


14.5.1. Extend the 8 queens to n queens.
We put n queens on the n × n board, replacing 8 with the passed-in parameter n. The table below lists the solutions and their count for n up to 5.

n solution
1 [1]
2 no solution
3 no solution
4 [2,4,1,3],[3,1,4,2]
5 [2,4,1,3,5],[3,1,4,2,5],[1,3,5,2,4], ... total 10 solutions

14.5.2. There are 92 solutions to the 8 queens puzzle. For any solution, rotating it by 90° gives another solution; flipping it gives another one too. There are essentially 12 distinct solutions. Write a program to find them.
The solutions are symmetric because the square board is. The dihedral group D8 describes the symmetries of the square with 8 permutations: the identity id; the counterclockwise rotations around the center by 90°, 180°, and 270°; and the reflections: horizontal, vertical, and along the two diagonals. They send the queen at location (i, j) to:

permutation position
id (i, j)
reflect along Y and X (9 − i, j), (i, 9 − j)
reflect along two diagonals (j, i), (9 − j, 9 − i)
rotate 90°, 180°, 270° (9 − j, i), (9 − i, 9 − j), (j, 9 − i)

We apply the 8 permutations to every solution to generate its variants, treat them as the same, and use a set to store the distinct solutions.
import Data.List ((\\), sortOn)
import Data.Set (Set, empty, insert, notMember, size)
import Data.Tuple (swap)

d8 = [id,
reverse, map (9 - ), −− reflect Y, X
trans swap, trans (λ(i, j) → (9 - j, 9 - i)), −− reflect AC, BD
trans (λ(i, j) → (9 - j, i)), −− 90
trans (λ(i, j) → (9 - i, 9 - j)), −− 180
trans (λ(i, j) → (j, 9 - i))] −− 270
where
trans f xs = snd $ unzip $ sortOn fst $ map f $ zip [1..8] xs

uniqueSolve = dfs [[]] (empty :: Set [Int]) where


dfs [] s = s
dfs (c:cs) s
| length c == 8 = dfs cs (uniqueAdd c s)
| otherwise = dfs ([(x:c) | x ← [1..8] \\ c,
not (attack x c)] ++ cs) s
uniqueAdd c s = if all (`notMember` s) [f c | f ← d8]

then insert c s else s


attack x cs = let y = 1 + length cs in
any (λ(c, r) → abs(x - c) == abs(y - r)) $ zip (reverse cs) [1..]

The 12 solutions are:


[3,6,4,1,8,5,7,2],[3,6,8,1,4,7,5,2],[4,1,5,8,6,3,7,2],[4,2,7,3,6,8,5,1]
[4,6,8,3,1,7,5,2],[4,7,1,8,5,2,6,3],[5,2,4,7,3,8,6,1],[5,3,8,4,7,1,6,2]
[5,7,1,3,8,6,4,2],[5,7,4,1,3,8,6,2],[6,2,7,1,4,8,5,3],[6,4,7,1,8,2,5,3]

Answer of exercise 14.6


14.6.1. Extend the pegs puzzle solution for n pegs on each side.
We use n to build the start/end states:
solve n = dfs [[start]] [] where
dfs [] s = s
dfs (c:cs) s
| head c == end = dfs cs (reverse c:s)
| otherwise = dfs ((map (:c) $ moves $ head c) ++ cs) s
start = replicate n (-1) ++ [0] ++ replicate n 1
end = reverse start

Answer of exercise 14.7


14.7.1. Improve the extended Euclid algorithm: find the x and y that minimize |x| + |y| for the optimal solution of the two jugs puzzle.
import Data.List
import Data.Function (on)

−− Extended Euclidean Algorithm


gcmex a 0 = (a, 1, 0)
gcmex a b = (g, y', x' - y' ∗ (a `div` b)) where
(g, x', y') = gcmex b (a `mod` b)

−− Solve the linear Diophantine equation ax + by = c


solve a b c | c `mod` g ≠ 0 = (0, 0, 0, 0) −− no solution
| otherwise = (x1, u, y1, v)
where
(g, x0, y0) = gcmex a b
(x1, y1) = (x0 ∗ c `div` g, y0 ∗ c `div` g)
(u, v) = (b `div` g, a `div` g)

−− Minimize |x| + |y|


jars a b c = (x, y) where
(x1, u, y1, v) = solve a b c
x = x1 - k ∗ u
y = y1 + k ∗ v
k = minimumBy (compare `on` (λi → abs (x1 - i ∗ u) +
abs (y1 + i ∗ v))) [-m..m]
m = max (abs x1 `div` u) (abs y1 `div` v)

−− Populate the steps


water a b c = if x > 0 then pour a x b y
else map swap $ pour b y a x
where
(x, y) = jars a b c

−− Pour from a to b, fill a for x times, and empty b for y times.


pour a x b y = steps x y [(0, 0)]
where
steps 0 0 ps = reverse ps

steps x y ps@((a', b'):_)


| a' == 0 = steps (x - 1) y ((a, b'):ps) −− fill a
| b' == b = steps x (y + 1) ((a', 0):ps) −− empty b
| otherwise = steps x y ((max (a' + b' - b) 0,
min (a' + b') b):ps) −− a to b

See section 2.2.3, chapter 2 in ‘Isomorphism - mathematics of programming’ for


more details.

Answer of exercise 14.8


14.8.1. Conway slide puzzle.
We use the 8 numbers from 0 to 7 to represent the pieces, where 0 is the free cell. A piece next to 0 can slide into it; we represent this as the number 0 moving forward or backward in a permutation. When it moves forward beyond the tail, it wraps around to the head; when it moves backward beyond the head, it wraps to the tail. Particularly, when 0 is the first, it can swap with the 5th piece and vice versa. The solution is very long, with 12948 steps. The last several steps are as below.
start = [0..7]
end = 0:[7,6..1]

solve1 = dfs [[start]] where


dfs [] = []
dfs (c:cs)
| head c == end = reverse c
| otherwise = dfs ((map (:c) $ moves c) ++ cs)

moves (s:visited) = filter (`notElem` visited) [fwd s, bk s, cut s]
  where
    fwd xs = case break (0 ==) xs of
      (as, 0:b:bs) → as ++ (b:0:bs)
      (a:as, [0]) → 0:as ++ [a]
    bk xs = case break (0 ==) xs of
      ([], 0:bs) → bs ++ [0]
      (as, 0:bs) → (init as) ++ (0 : last as : bs)
    cut xs = case splitAt 4 xs of
      ((0:as), (x:bs)) → (x:as) ++ (0:bs)
      ((x:as), (0:bs)) → (0:as) ++ (x:bs)
      _ → xs

...
[1,0,7,6,5,4,3,2],[1,7,0,6,5,4,3,2],[1,7,6,0,5,4,3,2],[1,7,6,5,0,4,3,2],
[1,7,6,5,4,0,3,2],[1,7,6,5,4,3,0,2],[1,7,6,5,4,3,2,0],[0,7,6,5,4,3,2,1]

Answer of exercise 14.9


14.9.1. Implement the imperative Huffman code table algorithm.
data Node<T> {
Optional<T> c = Nothing
Int w
Node<T> left = null, right = null

Bool isLeaf() = (left == null and right == null)


}

Node<T> merge(Node<T> a, Node<T> b) = Node(Nothing, a.w + b.w, a, b)

Bool (<)(Node<T> a, Node<T> b) = (a.w < b.w)

Node<T> huffman([Node<T>] ts) {



while length(ts) > 1 {


Int n = length(ts)
for Int i = n - 3 down to 0 {
if ts[i] < max(ts[n-1], ts[n-2]) {
Int j = if ts[n-1] < ts[n-2] then n - 2 else n - 1
swap(ts[i], ts[j])
}
}
ts[n-2] = merge(ts[n-1], ts[n-2])
ts.popLast()
}
return ts[0]
}

Map<T, [T]> codeTab(Node<T> t, [T] bits = [], Map<T, [T]> codes = {}) {
if t.isLeaf() {
codes[t.c] = bits
} else {
codeTab(t.left, bits + [0], codes)
codeTab(t.right, bits + [1], codes)
}
return codes
}

Answer of exercise 14.10


14.10.1. Use a heap to build the Huffman tree: take two trees from the top, merge them, then add the result back to the heap.

Huffman H = | H = ∅ : ∅
            | |H| = 1 : ta
            | otherwise : Huffman (push (merge ta tb) H′′)

Where (ta, H′) = pop H, (tb, H′′) = pop H′


1: function Huffman(H)
2: while |H| > 1 do
3: ta ← Pop(H)
4: tb ← Pop(H)
5: Push(H, Merge(ta , tb ))
6: return Pop(H)
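As a sketch in Haskell (the HTree type, weight, merge, and the list-based heap below are illustrative assumptions): we simulate the min-heap with a list kept sorted by weight, where pop takes the head and push is insertBy; a real heap improves both to O(lg n). The input list of leaves is assumed non-empty.

import Data.List (sortOn, insertBy)
import Data.Ord (comparing)

data HTree a = Leaf Int a | Node Int (HTree a) (HTree a)

weight (Leaf w _) = w
weight (Node w _ _) = w

merge a b = Node (weight a + weight b) a b

huffman :: [HTree a] → HTree a
huffman = build ◦ sortOn weight
  where
    build [t] = t
    build (a:b:ts) = build (insertBy (comparing weight) (merge a b) ts)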
14.10.2. If we sort the symbols by weight as A, there is a linear time algorithm to build the Huffman tree: use a queue Q to store the merged results; repeatedly take the minimal weighted tree from the head of A or the head of Q, merge two of them, then push the result to the queue. After processing all the trees, a single tree remains, which is the Huffman tree. Implement this algorithm.

Huffman (t:ts) = build (t, (ts, ∅)), where:

build (t, ([ ], ∅)) = t
build (t, h) = build (extract (ts, push (merge t t′) q)), where (t′, (ts, q)) = extract h

extract (t:ts, ∅) = (t, (ts, ∅))
extract ([ ], q) = (t, ([ ], q′)), where (t, q′) = pop q
extract (t:ts, q) = | t′ < t : (t′, (t:ts, q′)), where (t′, q′) = pop q
                    | otherwise : (t, (ts, q))
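Below is a Haskell sketch of this algorithm, reusing the assumed HTree, weight, and merge from the sketch above. For brevity the queue is a plain list appended at the tail; a real FIFO queue keeps every operation O(1), making the whole build linear.

huffmanSorted :: [HTree a] → HTree a
huffmanSorted (t:ts) = go (t, (ts, []))
  where
    go (t, ([], [])) = t
    go (t, h) = go (extract (ts', q' ++ [merge t t']))
      where (t', (ts', q')) = extract h
    −− extract the lighter of: the next sorted leaf, the head of the queue
    extract ([], q) = (head q, ([], tail q))
    extract (t:ts, []) = (t, (ts, []))
    extract (t:ts, q@(t':q'))
      | weight t' < weight t = (t', (t:ts, q'))
      | otherwise = (t, (ts, q))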

14.10.3. Given a Huffman tree T, implement the decode algorithm with fold left.

decode = reverse ◦ snd ◦ foldl lookup (T, [ ]), where:

lookup ((w, l, r), cs) b = deliver (if b = 0 then l else r) cs
deliver (w, c) cs = (T, c:cs)
deliver t cs = (t, cs)

For each bit we descend one branch; whenever we land on a leaf (w, c), we output the symbol c and restart from the root T.
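Below is a Haskell sketch of this fold, with the assumed HTree type as above and bits represented as Ints. After each descent, deliver checks whether we landed on a leaf; if so it emits the symbol and restarts from the root:

decode :: HTree a → [Int] → [a]
decode root = reverse ◦ snd ◦ foldl step (root, [])
  where
    step (Node _ l r, cs) b = deliver (if b == 0 then l else r) cs
    step st _ = st −− ill-formed input: stay put
    deliver (Leaf _ c) cs = (root, c : cs) −− reached a symbol: emit and restart
    deliver t cs = (t, cs)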

Answer of exercise 14.11


14.11.1. For the longest common sub-sequence, build the optimal solution table with fold.
import Data.Sequence (Seq, singleton, fromList, index, (|>))

lcs xs ys = construct $ foldl f (singleton $ fromList $ replicate (n+1) 0) (zip [1..] xs) where
(m, n) = (length xs, length ys)
f tab (i, x) = tab |> (foldl longer (singleton 0) (zip [1..] ys)) where
longer r (j, y) = r |> if x == y
then 1 + (tab `index` (i-1) `index` (j-1))
else max (tab `index` (i-1) `index` j) (r `index` (j-1))
construct tab = get (reverse xs, m) (reverse ys, n) where
get ([], 0) ([], 0) = []
get ((x:xs), i) ((y:ys), j)
| x == y = get (xs, i-1) (ys, j-1) ++ [x]
| (tab `index` (i-1) `index` j) > (tab `index` i `index` (j-1)) =
get (xs, i-1) ((y:ys), j)
| otherwise = get ((x:xs), i) (ys, j-1)

Answer of exercise 14.12


14.12.1. For the longest common sub-sequence problem, an alternative solution is to record
the length and the direction in the table. There are three directions: ‘N’ for north,
‘W’ for west, and ‘NW’. Given such a table, we can build the longest common
sub-sequence from the bottom-right entry. If the entry is ‘NW’, next go to the
upper-left entry; if it’s ‘N’, go to the above row; and go to the previous entry if
it’s ‘W’. Implement this solution.
data DIR = N | W | NW

[K] lcs([K] xs, [K] ys) {


Int m = length(xs), n = length(ys)
[[(Int, DIR)]] c = [[(0, null)] ∗ (n + 1)] ∗ (m + 1)
for i = 1 to m {
for j = 1 to n {
if xs[i-1] == ys[j-1] {
c[i][j] = (fst(c[i-1][j-1]) + 1, DIR.NW)
} else {
c[i][j] = if fst(c[i-1][j]) > fst(c[i][j-1])
then (fst(c[i-1][j]), DIR.N)
else (fst(c[i][j-1]), DIR.W)
}
}
}
return rebuild(c, xs, ys)
}

[K] rebuild([[(Int, DIR)]] c, [K] xs, [K] ys) {


[K] r = []
Int m = length(xs), n = length(ys)
while m > 0 and n > 0 {
DIR d = snd(c[m][n])

if d == DIR.NW {
r.append(xs[m - 1]) // or ys[n - 1]
m = m - 1, n = n - 1
} else if d == DIR.N {
m = m - 1
} else if d == DIR.W {
n = n - 1
}
}
return reverse(r)
}

14.12.2. For the subset sum upper/lower bound, does l ≤ 0 ≤ u always hold? Can we reduce the range between the bounds?

For non-empty subsets (the sum of the empty subset is 0), l ≤ 0 ≤ u does not necessarily hold. Consider a set X of only positive numbers: every subset sum is positive, hence l = min(X) > 0. Symmetrically, for a set of only negative numbers, u = max(X) < 0.
14.12.3. Compute the edit distance between two strings.

Int lev([K] s, [K] t) {
    Int m = length(s), n = length(t)
    [[Int]] d = [[0] ∗ (n + 1)] ∗ (m + 1) // d[i][j]: distance between s[:i] and t[:j]
    for Int i = 0 to m {
        d[i][0] = i // drop all chars of the source prefix to give []
    }
    for Int j = 0 to n {
        d[0][j] = j // insert all chars of the target prefix to []
    }
    for Int j = 1 to n {
        for Int i = 1 to m {
            c = if s[i-1] == t[j-1] then 0 else 1
            d[i][j] = min([d[i-1][j] + 1,    // deletion
                           d[i][j-1] + 1,    // insertion
                           d[i-1][j-1] + c]) // substitution
        }
    }
    return d[m][n]
}
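For example, lev "kitten" "sitting" gives 3: substitute 'k' with 's', substitute 'e' with 'i', and insert 'g' at the end.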
Bibliography

[1] Richard Bird. “Pearls of functional algorithm design”. Cambridge University Press;
1 edition (November 1, 2010). ISBN-10: 0521513383. pp1 - pp6.
[2] Jon Bentley. “Programming Pearls(2nd Edition)”. Addison-Wesley Professional; 2
edition (October 7, 1999). ISBN-13: 978-0201657883.
[3] Chris Okasaki. “Purely Functional Data Structures”. Cambridge university press,
(July 1, 1999), ISBN-13: 978-0521663502
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. “In-
troduction to Algorithms, Second Edition”. The MIT Press, 2001. ISBN: 0262032937.
[5] Chris Okasaki. “Ten Years of Purely Functional Data Structures”. http://okasaki.
blogspot.com/2008/02/ten-years-of-purely-functional-data.html
[6] SGI. “Standard Template Library Programmer’s Guide”. http://www.sgi.com/tech/
stl/
[7] Wikipedia. “Fold(high-order function)”. https://en.wikipedia.org/wiki/Fold_
(higher-order_function)
[8] Wikipedia. “Function Composition”. https://en.wikipedia.org/wiki/Function_
composition
[9] Wikipedia. “Partial application”. https://en.wikipedia.org/wiki/Partial_application
[10] Miran Lipovaca. “Learn You a Haskell for Great Good! A Beginner’s Guide”. No
Starch Press; 1 edition April 2011, 400 pp. ISBN: 978-1-59327-283-8
[11] Wikipedia. “Bubble sort”. https://en.wikipedia.org/wiki/Bubble_sort
[12] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[13] Chris Okasaki. “FUNCTIONAL PEARLS Red-Black Trees in a Functional Setting”.
J. Functional Programming. 1998
[14] Wikipedia. “Red-black tree”. https://en.wikipedia.org/wiki/Red-black_tree
[15] Lyn Turbak. “Red-Black Trees”. http://cs.wellesley.edu/~cs231/fall01/red-black.pdf
Nov. 2, 2001.
[16] Rosetta Code. “Pattern matching”. http://rosettacode.org/wiki/Pattern_matching
[17] Hackage. “Data.Tree.AVL”. http://hackage.haskell.org/packages/archive/AvlTree/
4.2/doc/html/Data-Tree-AVL.html


[18] Wikipedia. “AVL tree”. https://en.wikipedia.org/wiki/AVL_tree


[19] Guy Cousinear, Michel Mauny. “The Functional Approach to Programming”. Cam-
bridge University Press; English Ed edition (October 29, 1998). ISBN-13: 978-
0521576819
[20] Pavel Grafov. “Implementation of an AVL tree in Python”. http://github.com/
pgrafov/python-avl-tree
[21] Chris Okasaki and Andrew Gill. “Fast Mergeable Integer Maps”. Workshop on ML,
September 1998, pages 77-86.
[22] D.R. Morrison, “PATRICIA – Practical Algorithm To Retrieve Information Coded
In Alphanumeric”, Journal of the ACM, 15(4), October 1968, pages 514-534.
[23] Wikipedia. “Suffix Tree”. https://en.wikipedia.org/wiki/Suffix_tree
[24] Wikipedia. “Trie”. https://en.wikipedia.org/wiki/Trie
[25] Wikipedia. “T9 (predictive text)”. https://en.wikipedia.org/wiki/T9_(predictive_
text)
[26] Wikipedia. “Predictive text”. https://en.wikipedia.org/wiki/Predictive_text
[27] Esko Ukkonen. “On-line construction of suffix trees”. Algorithmica 14 (3): 249–260.
doi:10.1007/BF01206331. http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.
pdf
[28] Weiner, P. “Linear pattern matching algorithms”, 14th Annual IEEE Symposium on
Switching and Automata Theory, pp. 1-11, doi:10.1109/SWAT.1973.13
[29] Esko Ukkonen. “Suffix tree and suffix array techniques for pattern analysis in strings”.
http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt
[30] Suffix Tree (Java). http://en.literateprograms.org/Suffix_tree_(Java)
[31] Robert Giegerich and Stefan Kurtz. “From Ukkonen to McCreight and Weiner: A
Unifying View of Linear-Time Suffix Tree Construction”. Science of Computer Pro-
gramming 25(2-3):187-218, 1995. http://citeseer.ist.psu.edu/giegerich95comparison.
html
[32] Robert Giegerich and Stefan Kurtz. “A Comparison of Imperative and
Purely Functional Suffix Tree Constructions”. Algorithmica 19 (3): 331–353.
doi:10.1007/PL00009177. http://www.zbh.uni-hamburg.de/pubs/pdf/GieKur1997.
pdf
[33] Bryan O’Sullivan. “suffixtree: Efficient, lazy suffix tree implementation”. http://
hackage.haskell.org/package/suffixtree
[34] Danny. http://hkn.eecs.berkeley.edu/~dyoo/plt/suffixtree/
[35] Dan Gusfield. “Algorithms on Strings, Trees and Sequences Computer Science and
Computational Biology”. Cambridge University Press; 1 edition (May 28, 1997)
ISBN: 9780521585194
[36] Lloyd Allison. “Suffix Trees”. http://www.allisons.org/ll/AlgDS/Tree/Suffix/
[37] Esko Ukkonen. “Suffix tree and suffix array techniques for pattern analysis in strings”.
http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt

[38] Esko Ukkonen “Approximate string-matching over suffix trees”. Proc. CPM 93. Lec-
ture Notes in Computer Science 684, pp. 228-242, Springer 1993. http://www.cs.
helsinki.fi/u/ukkonen/cpm931.ps
[39] Wikipeida. “B-tree”. https://en.wikipedia.org/wiki/B-tree
[40] Wikipedia. “Heap (data structure)”. https://en.wikipedia.org/wiki/Heap_(data_
structure)
[41] Wikipedia. “Heapsort”. https://en.wikipedia.org/wiki/Heapsort
[42] Rosetta Code. “Sorting algorithms/Heapsort”. http://rosettacode.org/wiki/Sorting_
algorithms/Heapsort
[43] Wikipedia. “Leftist Tree”. https://en.wikipedia.org/wiki/Leftist_tree
[44] Bruno R. Preiss. Data Structures and Algorithms with Object-Oriented Design Pat-
terns in Java. http://www.brpreiss.com/books/opus5/index.html
[45] Donald E. Knuth. “The Art of Computer Programming. Volume 3: Sorting and
Searching.”. Addison-Wesley Professional; 2nd Edition (October 15, 1998). ISBN-13:
978-0201485417. Section 5.2.3 and 6.2.3
[46] Wikipedia. “Skew heap”. https://en.wikipedia.org/wiki/Skew_heap
[47] Sleator, Daniel Dominic; Tarjan, Robert Endre. “Self-adjusting heaps”. SIAM Journal
on Computing 15(1):52-69. doi:10.1137/0215004 ISSN 00975397 (1986)
[48] Wikipedia. “Splay tree”. https://en.wikipedia.org/wiki/Splay_tree
[49] Sleator, Daniel D.; Tarjan, Robert E. (1985), “Self-Adjusting Binary Search Trees”,
Journal of the ACM 32(3):652 - 686, doi: 10.1145/3828.3835
[50] NIST, “binary heap”. http://xw2k.nist.gov/dads//HTML/binaryheap.html
[51] Donald E. Knuth. “The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)”. Addison-Wesley Professional; 2 edition (May 4, 1998)
ISBN-10: 0201896850 ISBN-13: 978-0201896855
[52] Wikipedia. “Strict weak order”. https://en.wikipedia.org/wiki/Strict_weak_order
[53] Wikipedia. “FIFA world cup”. https://en.wikipedia.org/wiki/FIFA_World_Cup
[54] Wikipedia. “K-ary tree”. https://en.wikipedia.org/wiki/K-ary_tree
[55] Wikipedia, “Pascal’s triangle”. https://en.wikipedia.org/wiki/Pascal's_triangle
[56] Hackage. “An alternate implementation of a priority queue based on a
Fibonacci heap.”, http://hackage.haskell.org/packages/archive/pqueue-mtl/1.0.7/
doc/html/src/Data-Queue-FibQueue.html
[57] Chris Okasaki. “Fibonacci Heaps.” http://darcs.haskell.org/nofib/gc/fibheaps/orig
[58] Michael L. Fredman, Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan.
“The Pairing Heap: A New Form of Self-Adjusting Heap” Algorithmica (1986) 1:
111-129.
[59] Maged M. Michael and Michael L. Scott. “Simple, Fast, and Practical Non-
Blocking and Blocking Concurrent Queue Algorithms”. http://www.cs.rochester.
edu/research/synchronization/pseudocode/queues.html

[60] Herb Sutter. “Writing a Generalized Concurrent Queue”. Dr. Dobb’s Oct 29, 2008.
http://drdobbs.com/cpp/211601363?pgno=1
[61] Wikipedia. “Tail-call”. https://en.wikipedia.org/wiki/Tail_call
[62] Wikipedia. “Recursion (computer science)”. https://en.wikipedia.org/wiki/
Recursion_(computer_science)#Tail-recursive_functions
[63] Harold Abelson, Gerald Jay Sussman, Julie Sussman. “Structure and Interpretation
of Computer Programs, 2nd Edition”. MIT Press, 1996, ISBN 0-262-51087-1.
[64] Chris Okasaki. “Purely Functional Random-Access Lists”. Functional Programming
Languages and Computer Architecture, June 1995, pages 86-95.
[65] Ralf Hinze and Ross Paterson. “Finger Trees: A Simple General-purpose Data
Structure,” in Journal of Functional Programming 16:2 (2006), pages 197-217.
http://www.soi.city.ac.uk/~ross/papers/FingerTree.html
[66] Guibas, L. J., McCreight, E. M., Plass, M. F., Roberts, J. R. (1977), ”A new repre-
sentation for linear lists”. Conference Record of the Ninth Annual ACM Symposium
on Theory of Computing, pp. 49-60.
[67] Generic finger-tree structure. http://hackage.haskell.org/packages/archive/
fingertree/0.0/doc/html/Data-FingerTree.html
[68] Wikipedia. “Move-to-front transform”. https://en.wikipedia.org/wiki/Move-to-
front_transform
[69] Robert Sedgewick. “Implementing quick sort programs”. Communication of ACM.
Volume 21, Number 10. 1978. pp.847 - 857.
[70] Jon Bentley, Douglas McIlroy. “Engineering a sort function”. Software Practice and
experience VOL. 23(11), 1249-1265 1993.
[71] Robert Sedgewick, Jon Bentley. “Quicksort is optimal”. http://www.cs.princeton.
edu/~rs/talks/QuicksortIsOptimal.pdf
[72] Fethi Rabhi, Guy Lapalme. “Algorithms: a functional programming approach”. Sec-
ond edition. Addison-Wesley, 1999. ISBN: 0201-59604-0
[73] Simon Peyton Jones. “The Implementation of functional programming languages”.
Prentice-Hall International, 1987. ISBN: 0-13-453333-X
[74] Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. “Practical in-place mergesort”.
Nordic Journal of Computing, 1996.
[75] Josè Bacelar Almeida and Jorge Sousa Pinto. “Deriving Sorting Algorithms”. Tech-
nical report, Data structures and Algorithms. 2008.
[76] Cole, Richard (August 1988). “Parallel merge sort”. SIAM J. Comput. 17 (4): 770-
785. doi:10.1137/0217049. (August 1988)
[77] Powers, David M. W. “Parallelized Quicksort and Radixsort with Optimal Speedup”,
Proceedings of International Conference on Parallel Computing Technologies. Novosi-
birsk. 1991.
[78] Wikipedia. “Quicksort”. https://en.wikipedia.org/wiki/Quicksort
[79] Wikipedia. “Total order”. http://en.wokipedia.org/wiki/Total_order

[80] Wikipedia. “Harmonic series (mathematics)”. https://en.wikipedia.org/wiki/


Harmonic_series_(mathematics)
[81] M. Blum, R.W. Floyd, V. Pratt, R. Rivest and R. Tarjan, ”Time bounds for selec-
tion,” J. Comput. System Sci. 7 (1973) 448-461.
[82] Edsger W. Dijkstra. “The saddleback search”. EWD-934. 1985. https://www.cs.
utexas.edu/users/EWD/ewd09xx/EWD934.PDF.
[83] Robert Boyer, and Strother Moore. “MJRTY - A Fast Majority Vote Algorithm”.
Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning
Series, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991, pp. 105-117.
[84] Cormode, Graham; S. Muthukrishnan (2004). “An Improved Data Stream Summary:
The Count-Min Sketch and its Applications”. J. Algorithms 55: 29-38.
[85] Knuth Donald, Morris James H., jr, Pratt Vaughan. “Fast pattern matching in
strings”. SIAM Journal on Computing 6 (2): 323-350. 1977.

[86] Robert Boyer, Strother Moore. “A Fast String Searching Algorithm”. Comm. ACM
(New York, NY, USA: Association for Computing Machinery) 20 (10): 762-772. 1977
[87] R. N. Horspool. “Practical fast searching in strings”. Software - Practice & Experience
10 (6): 501-506. 1980.

[88] Wikipedia. “Boyer-Moore string search algorithm”. https://en.wikipedia.org/wiki/


Boyer-Moore_string_search_algorithm
[89] Wikipedia. “Eight queens puzzle”. https://en.wikipedia.org/wiki/Eight_queens_
puzzle

[90] George Pólya. “How to solve it: A new aspect of mathematical method”. Princeton
University Press(April 25, 2004). ISBN-13: 978-0691119663
[91] Wikipedia. “David A. Huffman”. https://en.wikipedia.org/wiki/David_A._Huffman
[92] Andrei Alexandrescu. “Modern C++ design: Generic Programming and Design Pat-
terns Applied”. Addison Wesley February 01, 2001, ISBN 0-201-70431-5

[93] Benjamin C. Pierce. “Types and Programming Languages”. The MIT Press, 2002.
ISBN:0262162091
[94] Joe Armstrong. “Programming Erlang: Software for a Concurrent World”. Pragmatic
Bookshelf; 1 edition (July 18, 2007). ISBN-13: 978-1934356005

[95] SGI. “transform”. http://www.sgi.com/tech/stl/transform.html


[96] ACM/ICPC. “The drunk jailer.” Peking University judge online for ACM/ICPC.
http://poj.org/problem?id=1218.
[97] Haskell wiki. “Haskell programming tips”. 4.4 Choose the appropriate fold. http:
//www.haskell.org/haskellwiki/Haskell_programming_tips
[98] Wikipedia. “Dot product”. https://en.wikipedia.org/wiki/Dot_product
[99] Xinyu LIU. “Isomorphism - mathematics of programming”. https://github.com/
liuxinyu95/unplugged
GNU Free Documentation License

Version 1.3, 3 November 2008


Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.

<http://fsf.org/>

Everyone is permitted to copy and distribute verbatim copies of this license document,
but changing it is not allowed.

Preamble
The purpose of this License is to make a manual, textbook, or other functional and
useful document “free” in the sense of freedom: to assure everyone the effective freedom
to copy and redistribute it, with or without modifying it, either commercially or noncom-
mercially. Secondarily, this License preserves for the author and publisher a way to get
credit for their work, while not being considered responsible for modifications made by
others.
This License is a kind of “copyleft”, which means that derivative works of the document
must themselves be free in the same sense. It complements the GNU General Public
License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because
free software needs free documentation: a free program should come with manuals provid-
ing the same freedoms that the software does. But this License is not limited to software
manuals; it can be used for any textual work, regardless of subject matter or whether it
is published as a printed book. We recommend this License principally for works whose
purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS


This License applies to any manual or other work, in any medium, that contains a
notice placed by the copyright holder saying it can be distributed under the terms of this
License. Such a notice grants a world-wide, royalty-free license, unlimited in duration,
to use that work under the conditions stated herein. The “Document”, below, refers
to any such manual or work. Any member of the public is a licensee, and is addressed
as “you”. You accept the license if you copy, modify or distribute the work in a way
requiring permission under copyright law.
A “Modified Version” of the Document means any work containing the Document
or a portion of it, either copied verbatim, or with modifications and/or translated into
another language.
A “Secondary Section” is a named appendix or a front-matter section of the Doc-
ument that deals exclusively with the relationship of the publishers or authors of the
Document to the Document’s overall subject (or to related matters) and contains nothing
that could fall directly within that overall subject. (Thus, if the Document is in part a


textbook of mathematics, a Secondary Section may not explain any mathematics.) The
relationship could be a matter of historical connection with the subject or with related
matters, or of legal, commercial, philosophical, ethical or political position regarding
them.
The “Invariant Sections” are certain Secondary Sections whose titles are designated,
as being those of Invariant Sections, in the notice that says that the Document is released
under this License. If a section does not fit the above definition of Secondary then it is
not allowed to be designated as Invariant. The Document may contain zero Invariant
Sections. If the Document does not identify any Invariant Sections then there are none.
The “Cover Texts” are certain short passages of text that are listed, as Front-Cover
Texts or Back-Cover Texts, in the notice that says that the Document is released under
this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy, represented
in a format whose specification is available to the general public, that is suitable for re-
vising the document straightforwardly with generic text editors or (for images composed
of pixels) generic paint programs or (for drawings) some widely available drawing editor,
and that is suitable for input to text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an otherwise Transparent
file format whose markup, or absence of markup, has been arranged to thwart or dis-
courage subsequent modification by readers is not Transparent. An image format is not
Transparent if used for any substantial amount of text. A copy that is not “Transparent”
is called “Opaque”.
Examples of suitable formats for Transparent copies include plain ASCII without
markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly
available DTD, and standard-conforming simple HTML, PostScript or PDF designed
for human modification. Examples of transparent image formats include PNG, XCF
and JPG. Opaque formats include proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or processing
tools are not generally available, and the machine-generated HTML, PostScript or PDF
produced by some word processors for output purposes only.
The “Title Page” means, for a printed book, the title page itself, plus such following
pages as are needed to hold, legibly, the material this License requires to appear in the
title page. For works in formats which do not have any title page as such, “Title Page”
means the text near the most prominent appearance of the work’s title, preceding the
beginning of the body of the text.
The “publisher” means any person or entity that distributes copies of the Document
to the public.
A section “Entitled XYZ” means a named subunit of the Document whose title
either is precisely XYZ or contains XYZ in parentheses following text that translates
XYZ in another language. (Here XYZ stands for a specific section name mentioned below,
such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To
“Preserve the Title” of such a section when you modify the Document means that it
remains a section “Entitled XYZ” according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that
this License applies to the Document. These Warranty Disclaimers are considered to be
included by reference in this License, but only as regards disclaiming warranties: any
other implication that these Warranty Disclaimers may have is void and has no effect on
the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the license notice
saying this License applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You may not use technical
measures to obstruct or control the reading or further copying of the copies you make
or distribute. However, you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may
publicly display copies.

3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers)
of the Document, numbering more than 100, and the Document’s license notice requires
Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the
back cover. Both covers must also clearly and legibly identify you as the publisher of
these copies. The front cover must present the full title with all words of the title equally
prominent and visible. You may add other material on the covers in addition. Copying
with changes limited to the covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put
the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest
onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100,
you must either include a machine-readable Transparent copy along with each Opaque
copy, or state in or with each Opaque copy a computer-network location from which
the general network-using public has access to download using public-standard network
protocols a complete Transparent copy of the Document, free of added material. If you use
the latter option, you must take reasonably prudent steps, when you begin distribution of
Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible
at the stated location until at least one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well
before redistributing any large number of copies, to give them a chance to provide you
with an updated version of the Document.

4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions
of sections 2 and 3 above, provided that you release the Modified Version under precisely
this License, with the Modified Version filling the role of the Document, thus licensing
distribution and modification of the Modified Version to whoever possesses a copy of it.
In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the
Document, and from those of previous versions (which should, if there were any, be
listed in the History section of the Document). You may use the same title as a
previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for
authorship of the modifications in the Modified Version, together with at least five
of the principal authors of the Document (all of its principal authors, if it has fewer
than five), unless they release you from this requirement.

C. State on the Title page the name of the publisher of the Modified Version, as the
publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications adjacent to the other
copyright notices.

F. Include, immediately after the copyright notices, a license notice giving the public
permission to use the Modified Version under the terms of this License, in the form
shown in the Addendum below.

G. Preserve in that license notice the full lists of Invariant Sections and required Cover
Texts given in the Document’s license notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item
stating at least the title, year, new authors, and publisher of the Modified Version as
given on the Title Page. If there is no section Entitled “History” in the Document,
create one stating the title, year, authors, and publisher of the Document as given
on its Title Page, then add an item describing the Modified Version as stated in the
previous sentence.

J. Preserve the network location, if any, given in the Document for public access to
a Transparent copy of the Document, and likewise the network locations given in
the Document for previous versions it was based on. These may be placed in the
“History” section. You may omit a network location for a work that was published
at least four years before the Document itself, or if the original publisher of the
version it refers to gives permission.

K. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title
of the section, and preserve in the section all the substance and tone of each of the
contributor acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document, unaltered in their text and in
their titles. Section numbers or the equivalent are not considered part of the section
titles.

M. Delete any section Entitled “Endorsements”. Such a section may not be included in
the Modified Version.

N. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in
title with any Invariant Section.

O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify
as Secondary Sections and contain no material copied from the Document, you may at
your option designate some or all of these sections as invariant. To do this, add their
titles to the list of Invariant Sections in the Modified Version’s license notice. These titles
must be distinct from any other section titles.
You may add a section Entitled “Endorsements”, provided it contains nothing but
endorsements of your Modified Version by various parties—for example, statements of
peer review or that the text has been approved by an organization as the authoritative
definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up
to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified
Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added
by (or through arrangements made by) any one entity. If the Document already includes
a cover text for the same cover, previously added by you or by arrangement made by the
same entity you are acting on behalf of, you may not add another; but you may replace
the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission
to use their names for publicity for or to assert or imply endorsement of any Modified
Version.

5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License,
under the terms defined in section 4 above for modified versions, provided that you in-
clude in the combination all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your combined work in its license
notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical
Invariant Sections may be replaced with a single copy. If there are multiple Invariant
Sections with the same name but different contents, make the title of each such section
unique by adding at the end of it, in parentheses, the name of the original author or
publisher of that section if known, or else a unique number. Make the same adjustment
to the section titles in the list of Invariant Sections in the license notice of the combined
work.
In the combination, you must combine any sections Entitled “History” in the various
original documents, forming one section Entitled “History”; likewise combine any sections
Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete
all sections Entitled “Endorsements”.

6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released
under this License, and replace the individual copies of this License in the various docu-
ments with a single copy that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individ-
ually under this License, provided you insert a copy of this License into the extracted
document, and follow this License in all other respects regarding verbatim copying of
that document.

7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and independent
documents or works, in or on a volume of a storage or distribution medium, is called an
“aggregate” if the copyright resulting from the compilation is not used to limit the legal
rights of the compilation’s users beyond what the individual works permit. When the
Document is included in an aggregate, this License does not apply to the other works in
the aggregate which are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Docu-
ment, then if the Document is less than one half of the entire aggregate, the Document’s
Cover Texts may be placed on covers that bracket the Document within the aggregate, or
the electronic equivalent of covers if the Document is in electronic form. Otherwise they
must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of
the Document under the terms of section 4. Replacing Invariant Sections with translations
requires special permission from their copyright holders, but you may include translations
of some or all Invariant Sections in addition to the original versions of these Invariant
Sections. You may include a translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include the original
English version of this License and the original versions of those notices and disclaimers.
In case of a disagreement between the translation and the original version of this License
or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “His-
tory”, the requirement (section 4) to Preserve its Title (section 1) will typically require
changing the actual title.

9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly
provided under this License. Any attempt otherwise to copy, modify, sublicense, or dis-
tribute it is void, and will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your license from a particular
copyright holder is reinstated (a) provisionally, unless and until the copyright holder
explicitly and finally terminates your license, and (b) permanently, if the copyright holder
fails to notify you of the violation by some reasonable means prior to 60 days after the
cessation.
Moreover, your license from a particular copyright holder is reinstated permanently if
the copyright holder notifies you of the violation by some reasonable means, this is the
first time you have received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after your receipt of the
notice.
Termination of your rights under this section does not terminate the licenses of parties
who have received copies or rights from you under this License. If your rights have been
terminated and not permanently reinstated, receipt of a copy of some or all of the same
material does not give you any rights to use it.

10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU Free
Documentation License from time to time. Such new versions will be similar in spirit to
the present version, but may differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document
specifies that a particular numbered version of this License “or any later version” applies
to it, you have the option of following the terms and conditions either of that specified
version or of any later version that has been published (not as a draft) by the Free Software
Foundation. If the Document does not specify a version number of this License, you may
choose any version ever published (not as a draft) by the Free Software Foundation. If
the Document specifies that a proxy can decide which future versions of this License can
be used, that proxy’s public statement of acceptance of a version permanently authorizes
you to choose that version for the Document.

11. RELICENSING
“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide
Web server that publishes copyrightable works and also provides prominent facilities for
anybody to edit those works. A public wiki that anybody can edit is an example of such a
server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means
any set of copyrightable works thus published on the MMC site.
“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license pub-
lished by Creative Commons Corporation, a not-for-profit corporation with a principal
place of business in San Francisco, California, as well as future copyleft versions of that
license published by that same organization.
“Incorporate” means to publish or republish a Document, in whole or in part, as part
of another Document.
An MMC is “eligible for relicensing” if it is licensed under this License, and if all
works that were first published under this License somewhere other than this MMC, and
subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or
invariant sections, and (2) were thus incorporated prior to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site under
CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is
eligible for relicensing.

ADDENDUM: How to use this License for your documents
To use this License in a document you have written, include a copy of the License in
the document and put the following copyright and license notices just after the title page:

Copyright © YEAR YOUR NAME. Permission is granted to copy, distribute
and/or modify this document under the terms of the GNU Free Documenta-
tion License, Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-
Cover Texts. A copy of the license is included in the section entitled “GNU
Free Documentation License”.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the
“with … Texts.” line with this:

with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover
Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the
three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend re-
leasing these examples in parallel under your choice of free software license, such as the
GNU General Public License, to permit their use in free software.
Index

8 queens puzzle, 239
Auto completion, 80
AVL tree, 57
    balance, 60
    definition, 57
    imperative insert, 62
    insert, 59
    verification, 61
B-tree, 89
    delete, 99
    insert, 91
    lookup, 98
BFS, 254
Binary heap, 109
    build, 112
    decrease key, 114
    Heapify, 110
    insertion, 114
    pop, 112
    push, 114
    top, 112
    top-k, 114
binary heap by array, 109
Binary Random Access List
    insert, 174
    random access, 174
    remove, 174
Binary search, 222
binary search tree, 27
    delete, 33
    insert, 28
    looking up, 31
    min/max, 32
    random build, 35
    succ/pred, 32
    traverse, 29
binary tree, 27
Binomial Heap
    Link, 142
Binomial heap, 139
    definition, 140
    insert, 143
    pop, 144
    push, 143
Binomial tree, 139
    merge, 144
Boyer-Moore majority number, 231
Breadth-first search, 254
Change making problem, 258
Cocktail sort, 130
complete binary tree, 109
Curried Form, 9
Currying, 9
Depth-first search, 237
DFS, 237
Dynamic programming, 259
equivalent, 18
Fibonacci Heap
    decrease key, 151
    delete min, 148
    insert, 147
    merge, 147
    pop, 148
Fibonacci heap, 146
Finger tree
    Append to right, 183
    Concatenate, 185
    insert to left, 182
    Random access, 186
    Remove from left, 183
    Remove from right, 185
fold, 19
Greedy algorithm, 255
Heap sort, 114
Huffman coding, 255
in-order traverse, 30
Insertion sort
    binary search, 41
    binary search tree, 42
    linked-list setting, 41
insertion sort, 39
    insertion, 40
Integer Patricia, 68
Integer prefix tree, 68
Integer tree
    insert, 69
    lookup, 73
Integer trie, 65
    insert, 66
    lookup, 68
Kloski puzzle, 251
KMP, 234
Knuth-Morris-Pratt algorithm, 234
LCS, 261
left child, right sibling, 140
Leftist heap, 116
    heap sort, 117
    insert, 117
    merge, 117
    pop, 117
    rank, 116
    S-value, 116
    top, 117
List
    append, 5
    break, 17
    concat, 8
    concats, 21
    cons, 2
    Construction, 2
    definition, 1
    delete, 7
    delete at, 7
    drop, 16
    drop while, 16
    elem, 22
    empty, 1
    empty testing, 1
    existence testing, 22
    Extract sub-list, 16
    filter, 22
    find, 22
    fold from left, 20
    fold from right, 19
    foldl, 20
    foldr, 19
    for each, 13
    get at, 2
    group, 17
    head, 2
    index, 2
    infix, 23
    init, 3
    insert, 5
    insert at, 5
    last, 3
    length, 2
    list comprehension, 13
    lookup, 22
    map, 12
    matching, 23
    maximum, 10
    minimum, 10
    mutate, 5
    prefix, 23
    product, 8
    reverse, 15
    Right index, 3
    rindex, 3
    set at, 5
    span, 17
    split at, 16
    suffix, 23
    sum, 8
    tail, 2
    take, 16
    take while, 16
    Transform, 11
    unzip, 24
    ZF expression, 13
    zip, 24
Longest common sub-sequence, 261
Maximum sum problem, 233
Maze problem, 237
Merge Sort, 206
    Bottom-up merge sort, 215
    In-place merge sort, 209
    Merge, 206
    Natural merge sort, 212
    Performance, 208
    Work area, 208, 209
minimum free number, i
MTF, 187
Paired-array sequence
    random access, 178
    remove and balance, 178
Pairing heap, 154
    decrease key, 155
    definition, 154
    delete, 156
    insert, 154
    pop, 155
    top, 154
Parallel merge sort, 216
Parallel quick sort, 216
Patricia, 75
Peg puzzle, 241
post-order traverse, 30
pre-order traverse, 30
Prefix tree, 75
    insert, 76
    look up, 79
Queue
    Balance Queue, 167
    Circular buffer, 164
    Lazy real-time queue, 170
    linked-list, 163
    Paired-array queue, 166
    Paired-list queue, 165
    Real-time queue, 167
Quick Sort
    2-way partition, 200
    3-way partition, 201
    Average case, 197
Quick sort, 193
    Improvement, 199
    partition, 194
    Performance, 197
    Ternary partition, 199
Radix tree, 65
range traverse, 33
Red-black tree
    Imperative delete, 275
red-black tree, 46
    delete, 48
    imperative insertion, 52
    insert, 47
    red-black properties, 46
reduce, 20
Saddle back search, 224
Selection algorithm, 221
selection sort, 127
    min, 128
    tail-recursive min, 128
Sequence
    Binary random access list, 173
    Concatenate-able list, 179
    finger tree, 181
    numeric representation, 177
    Paired-array sequence, 178
Skew heap, 118
    insertion, 118
    merge, 118
    pop, 118
    top, 118
Splay heap, 119
    insert, 122
    merge, 122
    pop, 122
    splay, 119
    top, 122
strict weak order, 129
Subset sum, 263
T9, 82
Tail call, 8
Tail recursion, 8
Tail recursive call, 8
The wolf, goat, and cabbage puzzle, 244
Tournament knock out, 132
tree reconstruction, 31
tree rotation, 44
Trie, 74
    insert, 74
    lookup, 75
Water jugs puzzle, 247
word counter, 27
