Elementary Algorithms

Contents

Part I  Preface
    0.1  Why?
    0.2  The smallest free ID problem, the power of algorithms
        0.2.1  Improvement 1
        0.2.2  Improvement 2, Divide and Conquer
        0.2.3  Expressiveness vs. Performance
    0.3  The number puzzle, power of data structure
        0.3.1  The brute-force solution
        0.3.2  Improvement 1
        0.3.3  Improvement 2
    0.4  Notes and short summary
    0.5  Structure of the contents

Part II  Trees
    4  AVL tree
        4.1  Introduction
            4.1.1  How to measure the balance of a tree?
        4.2  Definition of AVL tree
        4.3  Insertion
            4.3.1  Balancing adjustment
            4.3.2  Pattern Matching
        4.4  Deletion
        4.5  Imperative AVL tree algorithm
        4.6  Chapter note
    6  Suffix Tree
        6.1  Introduction
        6.2  Suffix trie
            6.2.1  Node transfer and suffix link
            6.2.2  On-line construction
        6.3  Suffix Tree
            6.3.1  On-line construction
        6.4  Suffix tree applications
            6.4.1  String/Pattern searching
            6.4.2  Find the longest repeated sub-string
            6.4.3  Find the longest common sub-string
            6.4.4  Find the longest palindrome
            6.4.5  Others
        6.5  Notes and short summary
    7  B-Trees
        7.1  Introduction
        7.2  Insertion
            7.2.1  Splitting
        7.3  Deletion
            7.3.1  Merge before delete method
            7.3.2  Delete and fix method
        7.4  Searching
        7.5  Notes and short summary

Part III  Heaps
    8  Binary Heaps
        8.1  Introduction
        8.2  Implicit binary heap by array
            8.2.1  Definition
            8.2.2  Heapify
            8.2.3  Build a heap
            8.2.4  Basic heap operations
            8.2.5  Heap sort
        8.3  Leftist heap and Skew heap, the explicit binary heaps
            8.3.1  Definition
            8.3.2  Merge
            8.3.3  Basic heap operations
            8.3.4  Heap sort by Leftist Heap
            8.3.5  Skew heaps
        8.4  Splay heap
            8.4.1  Definition
            8.4.2  Heap sort
        8.5  Notes and short summary

    14  Searching
        14.1  Introduction
        14.2  Sequence search
            14.2.1  Divide and conquer search
            14.2.2  Information reuse
        14.3  Solution searching
            14.3.1  DFS and BFS
            14.3.2  Search the optimal solution
        14.4  Short summary

Part VI  Appendix
    A  Lists
        A.1  Introduction
        A.2  List Definition
            A.2.1  Empty list
            A.2.2  Access the element and the sub list
        A.3  Basic list manipulation
            A.3.1  Construction
            A.3.2  Empty testing and length calculating
            A.3.3  indexing
            A.3.4  Access the last element
            A.3.5  Reverse indexing
            A.3.6  Mutating
            A.3.7  sum and product
            A.3.8  maximum and minimum
        A.4  Transformation
            A.4.1  mapping and for-each
            A.4.2  reverse
        A.5  Extract sub-lists
            A.5.1  take, drop, and split-at
            A.5.2  breaking and grouping
        A.6  Folding
            A.6.1  folding from right
            A.6.2  folding from left
            A.6.3  folding in practice
        A.7  Searching and matching
            A.7.1  Existence testing
            A.7.2  Looking up
            A.7.3  finding and filtering
            A.7.4  Matching
        A.8  zipping and unzipping
        A.9  Notes and short summary
Part I

Preface
0.1 Why?
Are algorithms useful? Some programmers say that they seldom use any serious data structures or algorithms in real work such as commercial application development. Even when they need some of them, they have already been provided by libraries. For example, the C++ standard template library (STL) provides sort and selection algorithms as well as the vector, queue, and set data structures. It seems that knowing how to use the library as a tool is quite enough.
Instead of answering this question directly, I would like to say that algorithms and data structures are critical in solving interesting problems, the usefulness of the problem set aside.
Let's start with two problems that look like they can be solved in a brute-force way even by a fresh programmer.
0.2 The smallest free ID problem, the power of algorithms
This problem is discussed in Chapter 1 of Richard Bird's book [1]. It's common that applications and systems use IDs (identifiers) to manage objects and entities. At any time, some IDs are used, and some of them are available for use. When some client tries to acquire a new ID, we want to always allocate it the smallest available one. Suppose IDs are non-negative integers and all IDs in use are kept in a list (or an array) which is not ordered. For example:
[18, 4, 8, 9, 16, 1, 14, 7, 19, 3, 0, 5, 2, 11, 6]
How can you find the smallest free ID, which is 10, from the list?
It seems the solution is quite easy even without any serious algorithms.
1: function Min-Free(A)
2:     x ← 0
3:     loop
4:         if x ∉ A then
5:             return x
6:         else
7:             x ← x + 1
Where the ∉ test is realized like below.
1: function ∉(x, X)
2:     for i ← 1 to |X| do
3:         if x = X[i] then
4:             return False
5:     return True
Some languages provide handy tools which wrap this linear time process. For
example in Python, this algorithm can be directly translated as the following.
def brute_force(lst):
    i = 0
    while True:
        if i not in lst:
            return i
        i = i + 1
0.2.1 Improvement 1
The key idea to improve the solution is based on the fact that, for a series of n numbers x1, x2, ..., xn, if there are free numbers, some of the xi are outside the range [0, n); otherwise the list is exactly a permutation of 0, 1, ..., n − 1 and n should be returned as the minimum free number. We have the following fact:

    minfree(x1, x2, ..., xn) ≤ n    (1)
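The check-list idea this fact enables can be sketched in Python as follows; this is a minimal sketch, and the function name `min_free_flags` is our own.

```python
def min_free_flags(lst):
    # Since minfree(x1, ..., xn) <= n, flags for 0..n are enough.
    n = len(lst)
    used = [False] * (n + 1)
    for x in lst:
        if x <= n:          # values beyond n cannot affect the answer
            used[x] = True
    # The index of the first unset flag is the smallest free ID.
    return used.index(False)
```

Applied to the example list above, it finds 10 with one linear pass plus one linear scan, at the cost of O(n) extra flags.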
0.2.2 Improvement 2, Divide and Conquer
Although the above improvement is much faster, it costs O(n) extra space to keep a check list. If n is a huge number, this means a huge amount of space is wasted.
The typical divide and conquer strategy is to break the problem into some
smaller ones, and solve these to get the final answer.
We can put all numbers xi ≤ ⌊n/2⌋ into a sub-list A′ and put all the others into a second sub-list A″. Based on formula 1, if A′ is full (its length equals the number of candidates in the lower half), the first half of the numbers is completely occupied, which indicates that the minimum free number must be in A″, and so we'll need to recursively seek in the shorter list A″. Otherwise, it means the minimum free number is located in A′, which again leads to a smaller problem.
When we search the minimum free number in A″, the conditions change a little bit: we are not searching for the smallest free number starting from 0, but actually from ⌊n/2⌋ + 1 as the lower bound. So the algorithm is something like minfree(A, l, u), where l is the lower bound and u is the upper bound index of the element.
Note that there is a trivial case: if the number list is empty, we merely return the lower bound as the result.
This divide and conquer solution can be formally expressed as a function:

    minfree(A) = search(A, 0, |A| − 1)

    search(A, l, u) =
        l                     : A = ∅
        search(A″, m + 1, u)  : |A′| = m − l + 1
        search(A′, l, m)      : otherwise

where

    m = ⌊(l + u)/2⌋
    A′ = {x ∈ A | x ≤ m}
    A″ = {x ∈ A | x > m}
It is obvious that this algorithm doesn't need any extra space. Each call performs O(|A|) comparisons to build A′ and A″. After that the problem scale halves. So the time needed for this algorithm is T(n) = T(n/2) + O(n), which reduces to O(n). Another way to analyze the performance is by observing that the first call takes O(n) to build A′ and A″, the second call takes O(n/2), O(n/4) for the third, and so on. The total time is O(n + n/2 + n/4 + ...) = O(2n) = O(n).
In functional programming languages such as Haskell, partitioning a list is already provided in the basic library, and this algorithm can be translated as the following.

import Data.List

minFree xs = bsearch xs 0 (length xs - 1)

bsearch xs l u | xs == [] = l
               | length as == m - l + 1 = bsearch bs (m+1) u
               | otherwise = bsearch as l m
    where
      m = (l + u) `div` 2
      (as, bs) = partition (<= m) xs
0.2.3 Expressiveness vs. Performance
This program uses a quick-sort like approach to re-arrange the array so that all the elements before left are less than or equal to m, while those between left and right are greater than m. This is shown in figure 1.
Figure 1: Divide the array: all x[i] ≤ m where 0 ≤ i < left, while all x[i] > m where left ≤ i < right. The remaining elements are unknown.
This program is fast and it doesn't need extra stack space. However, compared to the previous Haskell program, it's hard to read and the expressiveness decreased. We have to balance performance and expressiveness.
3. This is done automatically in most functional languages, since our function is in tail-recursive form, which lends itself perfectly to this transformation.
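The imperative program described above did not survive extraction; a Python sketch of the same idea might look like this. The names `min_free_dc`, `lo`, `hi`, and `lower` are our own; only the quick-sort style partitioning around m is from the text.

```python
def min_free_dc(a):
    # Iterative divide and conquer over a mutable array: repeatedly
    # partition the active slice a[lo:hi] around the midpoint m of the
    # candidate range, then keep the half that must contain the answer.
    lo, hi = 0, len(a)
    lower = 0                       # smallest candidate free number
    while hi > lo:
        n = hi - lo
        m = lower + (n - 1) // 2    # midpoint of candidates [lower, lower + n)
        left = lo
        for i in range(lo, hi):     # partition: move elements <= m to the front
            if a[i] <= m:
                a[i], a[left] = a[left], a[i]
                left += 1
        if left - lo == m - lower + 1:
            lo = left               # lower half full: answer is in the rest
            lower = m + 1
        else:
            hi = left               # a free number exists in the lower half
    return lower
```

As in the Haskell version, each pass halves the problem, so the total work is O(n); unlike the recursion, it reuses the input array in place.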
0.3 The number puzzle, power of data structure
If the first problem, to find the minimum free number, is somewhat useful in practice, this problem is a pure one for fun. The puzzle is to find the 1,500th number which only contains the factors 2, 3, or 5. The first 3 numbers are of course 2, 3, and 5. Number 60 = 2^2 * 3^1 * 5^1 is such a number; however, it is the 25th one. Number 21 = 2^0 * 3^1 * 7^1 isn't a valid number because it contains a factor 7. The first 10 such numbers are listed as the following:

2, 3, 4, 5, 6, 8, 9, 10, 12, 15

If we consider 1 = 2^0 * 3^0 * 5^0, then 1 is also a valid number, and it is the first one.
0.3.1 The brute-force solution
It seems the solution is quite easy without the need for any serious algorithms. We can check all numbers from 1, then extract all factors of 2, 3, and 5 to see if the remaining part is 1.
1:  function Get-Number(n)
2:      x ← 1
3:      i ← 0
4:      loop
5:          if Valid?(x) then
6:              i ← i + 1
7:              if i = n then
8:                  return x
9:          x ← x + 1

10: function Valid?(x)
11:     while x mod 2 = 0 do
12:         x ← x/2
13:     while x mod 3 = 0 do
14:         x ← x/3
15:     while x mod 5 = 0 do
16:         x ← x/5
17:     if x = 1 then
18:         return True
19:     else
20:         return False
This brute-force algorithm works for most small n. However, to find the 1,500th number (which is 859963392), the C program based on this algorithm takes 40.39 seconds on my computer. I had to kill the program after 10 minutes when I increased n to 15,000.
0.3.2 Improvement 1
Analysis of the above algorithm shows that modular and division calculations are very expensive [2], and they are executed many times in loops. Instead of checking whether a number contains only 2, 3, or 5 as factors, one alternative solution is to construct such numbers from these factors.
(Figure: new candidates are generated by multiplying existing ones by 2, 3, and 5: 1*2=2, 1*3=3, 1*5=5, 2*2=4, 2*3=6, 2*5=10, 3*2=6, 3*3=9, 3*5=15, 4*2=8, 4*3=12, 4*5=20, ...)
The insert function takes O(|Q|) time to find the proper position and insert the element. If the element already exists, it just returns.
A rough estimation tells that the length of the queue increases in proportion to n (each time we extract one element and push 3 new ones, an increase ratio of about 2), so the total running time is O(1 + 2 + 3 + ... + n) = O(n^2).
Figure 3 shows the number of queue accesses against n. It is a quadratic curve, which reflects the O(n^2) performance.
The C program based on this algorithm takes only 0.016 s to get the right answer 859963392, which is 2500 times faster than the brute force solution.
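The single-queue algorithm described above did not survive extraction intact; it can be sketched in Python like this (a sketch with our own names; a plain list stands in for the ordered queue).

```python
def get_number_queue(n):
    # One ordered, duplicate-free queue of candidates, seeded with 1.
    # Each step extracts the head and inserts 2x, 3x, 5x in order;
    # the linear scan before insertion is the O(|Q|) step noted above.
    q = [1]
    x = 1
    for _ in range(n):
        x = q.pop(0)
        for f in (2, 3, 5):
            v = x * f
            i = 0
            while i < len(q) and q[i] < v:
                i += 1
            if i == len(q) or q[i] != v:   # skip duplicates such as 2*3 = 3*2
                q.insert(i, v)
    return x
```

Counting 1 as the first number, the 10th value is 12 and the 26th is 60, matching the sequence given earlier.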
Improvement 1 can also be considered in a recursive way. Suppose X is the infinite series of all numbers which only contain factors of 2, 3, or 5. The following formula shows an interesting relationship:

    X = {1} ∪ {2x : x ∈ X} ∪ {3x : x ∈ X} ∪ {5x : x ∈ X}    (2)

where we can define ∪ in a special form so that all elements are stored in order as well as unique to each other. Suppose X = {x1, x2, x3, ...}, Y = {y1, y2, y3, ...}, X′ = {x2, x3, ...} and Y′ = {y2, y3, ...}. We have

    X ∪ Y =
        X             : Y = ∅
        Y             : X = ∅
        {x1, X′ ∪ Y}  : x1 < y1
        {x1, X′ ∪ Y′} : x1 = y1
        {y1, X ∪ Y′}  : x1 > y1
In a functional programming language such as Haskell, which supports lazy evaluation, the above infinite series can be translated into the following program.

ns = 1 : merge (map (*2) ns) (merge (map (*3) ns) (map (*5) ns))

merge [] l = l
merge l [] = l
merge (x:xs) (y:ys) | x < y = x : merge xs (y:ys)
                    | x == y = x : merge xs ys
                    | otherwise = y : merge (x:xs) ys
0.3.3 Improvement 2
Considering the above solution, although it is much faster than the brute-force one, it still has some drawbacks. It produces many duplicated numbers, which are finally dropped when examining the queue. Secondly, it does a linear scan and insertion to keep the order of all elements in the queue, which degrades the ENQUEUE operation from O(1) to O(|Q|).
If we use three queues instead of only one, we can improve the solution one step further. Denote these queues as Q2, Q3, and Q5, and initialize them as Q2 = {2}, Q3 = {3} and Q5 = {5}. Each time we DEQUEUE the smallest element among the heads of Q2, Q3, and Q5 as x, and do the following test:

If x comes from Q2, we ENQUEUE 2x, 3x, and 5x back to Q2, Q3, and Q5 respectively;

If x comes from Q3, we only need to ENQUEUE 3x to Q3 and 5x to Q5. We needn't ENQUEUE 2x to Q2, because 2x already exists in Q3;

If x comes from Q5, we only need to ENQUEUE 5x to Q5; there is no need to ENQUEUE 2x or 3x to Q2 or Q3, because they are already in the queues.
We repeatedly ENQUEUE the smallest one until we find the n-th element. The algorithm based on this idea is implemented as below.

1:  function Get-Number(n)
2:      if n = 1 then
3:          return 1
4:      else
5:          Q2 ← {2}
6:          Q3 ← {3}
7:          Q5 ← {5}
8:          while n > 1 do
9:              x ← min(Head(Q2), Head(Q3), Head(Q5))
10:             if x = Head(Q2) then
11:                 Dequeue(Q2)
12:                 Enqueue(Q2, 2x)
13:                 Enqueue(Q3, 3x)
14:                 Enqueue(Q5, 5x)
15:             else if x = Head(Q3) then
16:                 Dequeue(Q3)
17:                 Enqueue(Q3, 3x)
18:                 Enqueue(Q5, 5x)
19:             else
20:                 Dequeue(Q5)
21:                 Enqueue(Q5, 5x)
22:             n ← n − 1
23:         return x

(Figure: the three queues evolve as the minimum is repeatedly extracted; for min = 2, 3, 4, 5, ..., the products 2*min, 3*min and 5*min are enqueued according to the rules above.)
This algorithm loops n times, and within each loop it extracts one head element from the three queues, which takes constant time. Then it appends one to three new elements at the end of the queues, which is bounded by constant time too. So the total time of the algorithm is bound to O(n). The C++ program translated from this algorithm, shown below, takes less than 1 s to produce the 1,500th number, 859963392.
// Note: the head of this listing was lost; it is reconstructed here from
// the pseudocode above.
unsigned long get_number(int n){
    if(n == 1)
        return 1;
    queue<unsigned long> Q2, Q3, Q5;
    Q2.push(2); Q3.push(3); Q5.push(5);
    unsigned long x = 0;
    while(n-- > 1){
        x = min(min(Q2.front(), Q3.front()), Q5.front());
        unsigned long x2 = 2 * x, x3 = 3 * x, x5 = 5 * x;
        if(x == Q2.front()){
            Q2.pop();
            Q2.push(x2);
            Q3.push(x3);
            Q5.push(x5);
        }
        else if(x == Q3.front()){
            Q3.pop();
            Q3.push(x3);
            Q5.push(x5);
        }
        else{
            Q5.pop();
            Q5.push(x5);
        }
    }
    return x;
}
This solution can also be implemented in a functional way. We define a function take(n), which returns the first n numbers containing only the factors 2, 3, or 5:

    take(n) = f(n, {1}, {2}, {3}, {5})

where

    f(n, X, Q2, Q3, Q5) =
        X                                  : n = 1
        f(n − 1, X ∪ {x}, Q2′, Q3′, Q5′)   : otherwise

Here x is the smallest head among the three queues, and Q2′, Q3′, Q5′ are the queues updated by the same rules as above. Invoking last takeN 1500 generates the correct answer, 859963392.
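The three-queue idea can also be sketched in Python, mirroring the pseudocode; this is a sketch with our own function name, counting 1 as the first number.

```python
from collections import deque

def nth_235(n):
    # Three queues, one per factor; each extraction enqueues only the
    # products that cannot already be in some queue.
    if n == 1:
        return 1
    q2, q3, q5 = deque([2]), deque([3]), deque([5])
    x = 1
    while n > 1:
        x = min(q2[0], q3[0], q5[0])
        if x == q2[0]:
            q2.popleft()
            q2.append(2 * x)
            q3.append(3 * x)
            q5.append(5 * x)
        elif x == q3[0]:
            q3.popleft()
            q3.append(3 * x)
            q5.append(5 * x)
        else:
            q5.popleft()
            q5.append(5 * x)
        n -= 1
    return x
```

Every extraction and every append is O(1), so finding the n-th number is O(n) overall, with no duplicate candidates ever produced.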
0.4 Notes and short summary
If we review the two puzzles, we find that in both cases the brute-force solutions are weak. In the first problem, the brute-force approach is quite poor at dealing with a long ID list, while in the second problem, it doesn't work at all.
The first problem shows the power of algorithms, while the second problem tells why data structures are important. There are plenty of interesting problems which were hard to solve before the computer was invented. With the aid of computers and programming, we are able to find the answer in a quite different way.
0.5 Structure of the contents
In the following series of posts, I'll first introduce elementary data structures before algorithms, because many algorithms need knowledge of data structures as a prerequisite.
The "hello world" data structure, the binary search tree, is the first topic. Then we introduce how to solve the balance problem of binary search trees. After that, I'll show other interesting trees. Trie, Patricia, and suffix trees are useful in text manipulation, while B-trees are commonly used in file system and database implementation.
The second part of data structures is about heaps. We'll provide a general heap definition and introduce binary heaps by array and by explicit binary trees. Then we'll extend to k-ary heaps, including Binomial heaps, Fibonacci heaps, and pairing heaps.
Arrays and queues are typically considered among the easiest data structures; however, we'll show how difficult it is to implement them in the third part.
As the elementary sort algorithms, we'll introduce insertion sort, quick sort, merge sort, etc., in both imperative and functional ways.
The final part is about searching; besides element searching, we'll also show string matching algorithms such as KMP.
Bibliography

[1] Richard Bird. Pearls of Functional Algorithm Design. Cambridge University Press; 1st edition (November 1, 2010). ISBN-10: 0521513383
[2] Jon Bentley. Programming Pearls (2nd Edition). Addison-Wesley Professional; 2nd edition (October 7, 1999). ISBN-13: 978-0201657883
[3] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press (July 1, 1999). ISBN-13: 978-0521663502
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937
Part II

Trees
Chapter 1
Introduction
Arrays or lists are typically considered the "hello world" data structures. However, we'll see they are not actually particularly easy to implement. In some procedural settings, arrays are the most elementary data structures, and it is possible to implement linked lists using arrays (see section 10.3 in [2]). On the other hand, in some functional settings, linked lists are the elementary building blocks used to create arrays and other data structures.
Considering these factors, we start with Binary Search Trees (or BSTs) as the "hello world" data structure, using an interesting problem Jon Bentley mentioned in Programming Pearls [2]. The problem is to count the number of times each word occurs in a large text. One solution in C++ is below:
int main(int, char**){
    map<string, int> dict;
    string s;
    while(cin >> s)
        ++dict[s];
    map<string, int>::iterator it = dict.begin();
    for(; it != dict.end(); ++it)
        cout << it->first << ": " << it->second << "\n";
}
And we can run it to produce the result using the following UNIX commands
1.2 Data Layout
Based on the recursive definition of BSTs, we can draw the data layout in a
procedural setting with pointers as in Figure 1.3.
The node first contains a field for the key, which can be augmented with
satellite data. The next two fields contain pointers to the left and right children,
respectively. To make backtracking to ancestors easy, a parent field is sometimes
provided as well.
In this section, we'll ignore the satellite data for the sake of simplifying the illustrations. Based on this layout, the node of a BST can be defined in a procedural language, such as C++:

template<class T>
struct node{
    node(T x): key(x), left(0), right(0), parent(0){}
    ~node(){
        delete left;
        delete right;
    }
    node* left;
    node* right;
    node* parent; // Optional, it's helpful for succ and pred
    T key;
};
There is another setting, for instance in Scheme/Lisp languages, where the elementary data structure is a linked list. Figure 1.4 shows how a BST node can be built on top of a linked list.
In more functional settings, it's hard to use pointers for backtracking (and typically, there is no need for backtracking, since there are usually top-down recursive solutions), and so the parent field has been omitted in that layout.
To simplify things, we'll skip the detailed layouts in the future and only focus on the logic layouts of data structures. For example, below is the definition of the binary search tree node.
Figure 1.4: Binary search tree node layout on top of a linked list, where "left ..." and "right ..." are either empty or BST nodes composed in the same way.
1.3 Insertion
insert(T, k) =
    node(∅, k, ∅)                : T = ∅
    node(insert(Tl, k), k′, Tr)  : k < k′
    node(Tl, k′, insert(Tr, k))  : otherwise
This program utilizes the pattern matching features provided by the language. However, even in functional settings without this feature (e.g. Scheme/Lisp), the program is still expressive:
(define (insert tree x)
  (cond ((null? tree) (list '() x '()))
        ((< x (key tree))
         (make-tree (insert (left tree) x)
                    (key tree)
                    (right tree)))
        ((> x (key tree))
         (make-tree (left tree)
                    (key tree)
                    (insert (right tree) x)))))
1:  function Insert(T, k)
2:      root ← T
3:      x ← Create-Leaf(k)
4:      parent ← NIL
5:      while T ≠ NIL do
6:          parent ← T
7:          if k < Key(T) then
8:              T ← Left(T)
9:          else
10:             T ← Right(T)
11:     Parent(x) ← parent
12:     if parent = NIL then    ▷ tree T is empty
13:         return x
14:     else if k < Key(parent) then
15:         Left(parent) ← x
16:     else
17:         Right(parent) ← x
18:     return root

19: function Create-Leaf(k)
20:     x ← Empty-Node
21:     Key(x) ← k
22:     Left(x) ← NIL
23:     Right(x) ← NIL
24:     Parent(x) ← NIL
25:     return x
While more complex than the functional algorithm, it is still fast, even when presented with very deep trees. Complete C++ and Python programs are available along with this section for reference.
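A Python rendering of the imperative insertion might look like this; it is a sketch, and the minimal `Node` class is our own.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def insert(t, key):
    # Imperative insertion: walk down to a leaf position, then link
    # the new node to its parent, as in the pseudocode above.
    root, parent = t, None
    x = Node(key)
    while t is not None:
        parent = t
        t = t.left if key < t.key else t.right
    x.parent = parent
    if parent is None:          # tree was empty: new node is the root
        return x
    if key < parent.key:
        parent.left = x
    else:
        parent.right = x
    return root
```

Used as `t = insert(t, k)` starting from `t = None`, it builds the same tree shape as the functional version.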
1.4 Traversing
Traversing means visiting every element one-by-one in a BST. There are 3 ways to traverse a binary tree: a pre-order tree walk, an in-order tree walk, and a post-order tree walk. The names of these traversal methods highlight the order in which we visit the root of a BST:

pre-order traversal: visit the key, then the left child, finally the right child;
in-order traversal: visit the left child, then the key, finally the right child;
post-order traversal: visit the left child, then the right child, finally the key.

Note that each visiting operation is recursive. As mentioned before, the order in which the key is visited determines the name of the traversal method.
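The three orders can be illustrated with a small Python sketch over nested tuples of the form (left, key, right); the function names are ours.

```python
def preorder(t, out):
    # key first, then the left sub-tree, then the right sub-tree
    if t:
        left, key, right = t
        out.append(key)
        preorder(left, out)
        preorder(right, out)

def inorder(t, out):
    # left sub-tree, then key, then right sub-tree
    if t:
        left, key, right = t
        inorder(left, out)
        out.append(key)
        inorder(right, out)

def postorder(t, out):
    # both sub-trees first, key last
    if t:
        left, key, right = t
        postorder(left, out)
        postorder(right, out)
        out.append(key)
```

For the tiny BST ((None, 1, None), 2, (None, 3, None)), the in-order walk yields 1, 2, 3; pre-order yields 2, 1, 3; and post-order yields 1, 3, 2.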
For the BST shown in figure 1.2, below are the three different traversal results.
map(f, T) =
    ∅                   : T = ∅
    node(Tl′, k′, Tr′)  : otherwise    (1.2)

where

    Tl′ = map(f, Tl)
    Tr′ = map(f, Tr)
    k′ = f(k)

and Tl, Tr and k are the children and key when the tree isn't empty.
If we only need to access the keys without creating the transformed tree, we can realize this algorithm in a procedural way, like the C++ program below.

template<class T, class F>
void in_order_walk(node<T>* t, F f){
    if(t){
        in_order_walk(t->left, f);
        f(t->value);
        in_order_walk(t->right, f);
    }
}
toList(T) =
    ∅                              : T = ∅
    toList(Tl) ∪ {k} ∪ toList(Tr)  : otherwise    (1.3)

A tree can be built from a list by inserting its elements one by one:

    fromList(X) = foldL(insert, ∅, X)    (1.4)

For the readers who are not familiar with folding from left, this function can also be defined recursively as the following:

    fromList(X) =
        ∅                                      : X = ∅
        insert(fromList({x2, x3, ..., xn}), x1) : otherwise    (1.5)
We'll make intensive use of the folding function, as well as function composition and partial evaluation, in the future; please refer to the appendix of this book or [6], [7] and [8] for more information.
Exercise 1.1

Given the in-order traversal result and pre-order traversal result, can you re-construct the tree from these results and figure out the post-order traversal result?
    Pre-order result: 1, 2, 4, 3, 5, 6;
    In-order result: 4, 2, 1, 5, 3, 6;
    Post-order result: ?

Write a program in your favorite language to re-construct the binary tree from the pre-order result and in-order result.

Prove that an in-order walk outputs the elements stored in a binary search tree in increasing order.

Can you analyze the performance of tree sort with big-O notation?
2. Also known as Curried form, to memorialize the mathematician and logician Haskell Curry.
1.5 Querying a binary search tree
There are three types of querying for a binary search tree: searching for a key in the tree, finding the minimum or maximum element in the tree, and finding the predecessor or successor of an element in the tree.
1.5.1 Looking up

According to the definition of the binary search tree, searching for a key in a tree can be realized as follows:

If the tree is empty, the search fails;
If the key of the root is equal to the value to be found, the search succeeds, and the root is returned as the result;
If the value is less than the key of the root, search in the left child;
Else, which means the value is greater than the key of the root, search in the right child.
This algorithm can be described with a recursive function as below.

lookup(T, x) =
    ∅              : T = ∅
    T              : k = x
    lookup(Tl, x)  : x < k
    lookup(Tr, x)  : otherwise    (1.6)
where Tl, Tr and k are the children and key when T isn't empty. In a real application, we may return the satellite data instead of the node as the search result. This algorithm is simple and straightforward. Here is a translation to a Haskell program.

lookup Empty _ = Empty
lookup t@(Node l k r) x | k == x = t
                        | x < k = lookup l x
                        | otherwise = lookup r x
If the BST is well balanced, which means that almost all nodes have both non-NIL left and right children, then for n elements the search algorithm takes O(lg n) time. This is not a formal definition of balance; we'll show one in a later post about red-black trees. If the tree is poorly balanced, the worst case takes O(n) time to search for a key. If we denote the height of the tree as h, we can express the performance of the algorithm uniformly as O(h).
The search algorithm can also be realized without recursion, in a procedural manner.
1: function Search(T, x)
2:     while T ≠ NIL and Key(T) ≠ x do
3:         if x < Key(T) then
4:             T ← Left(T)
5:         else
6:             T ← Right(T)
7:     return T
1.5.2 Minimum and maximum

Minimum and maximum can be implemented from the property of the binary search tree: lesser keys are always in the left child, and greater keys are in the right.
For the minimum, we keep traversing the left sub-tree until it is empty, while for the maximum we traverse the right.
min(T) = k : Tl = ∅
       = min(Tl) : otherwise
(1.7)

max(T) = k : Tr = ∅
       = max(Tr) : otherwise
(1.8)
Both functions are bound to O(h) time, where h is the height of the tree. For a
balanced BST, min/max are bound to O(lg n) time, while they are O(n) in the
worst case. We skip translating them to programs; it's also possible to
implement them in a purely procedural way without recursion.
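For reference, an iterative sketch in Python (reusing the hypothetical node class with `key`, `left` and `right` fields; the names `tree_min` and `tree_max` match the helpers used by the succ/pred programs later in this section):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_min(t):
    # keep going left; the leftmost node holds the smallest key
    while t.left is not None:
        t = t.left
    return t

def tree_max(t):
    # keep going right; the rightmost node holds the greatest key
    while t.right is not None:
        t = t.right
    return t
```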
1.5.3 Successor and predecessor
When finding the successor of element x, which is the smallest y that
satisfies y > x, there are two cases. If the node with value x has a non-NIL
right child, the minimum element of the right child is the answer. For example,
in Figure 1.5, in order to find the successor of 8, we search its right sub-tree
for the minimum one, which yields 9 as the result. If node x doesn't have a
right child, we need to back-track to find the closest ancestor whose left child
is also an ancestor of x. In Figure 1.5, since 2 doesn't have a right sub-tree,
we go back to its parent 1. However, node 1 doesn't have a left child, so we go
back again and reach node 3; the left child of 3 is also an ancestor of 2, thus
3 is the successor of node 2.
Figure 1.5: The successor of 8 is the minimum one in its right sub-tree, 9.
In order to find the successor of 2, we go up to its parent 1, but 1 doesn't
have a left child; we go up again and find 3. Because its left child is also an
ancestor of 2, 3 is the result.
function Succ(x)
    if Right(x) ≠ NIL then
        return Min(Right(x))
    else
        p ← Parent(x)
        while p ≠ NIL and x = Right(p) do
            x ← p
            p ← Parent(p)
        return p
If x doesn't have a successor, this algorithm returns NIL. The predecessor case
is quite similar to the successor algorithm; they are symmetrical to each other.
function Pred(x)
    if Left(x) ≠ NIL then
        return Max(Left(x))
    else
        p ← Parent(x)
        while p ≠ NIL and x = Left(p) do
            x ← p
            p ← Parent(p)
        return p
Below are the Python programs based on these algorithms; their while-loop
conditions are changed a bit.
def succ(x):
    if x.right is not None:
        return tree_min(x.right)
    p = x.parent
    while p is not None and p.left != x:
        x = p
        p = p.parent
    return p

def pred(x):
    if x.left is not None:
        return tree_max(x.left)
    p = x.parent
    while p is not None and p.right != x:
        x = p
        p = p.parent
    return p
Exercise 1.2

Can you figure out how to iterate a tree as a generic container by using
Pred/Succ? What's the performance of such a traversing process in terms
of big-O?

A reader discussed traversing all elements inside a range [a, b]. In
C++, the algorithm looks like the below code:

for_each(m.lower_bound(12), m.upper_bound(26), f);

Can you provide the purely functional solution for this problem?
1.6
Deletion
Deletion is another 'imperative only' topic for the binary search tree. This is
because deletion mutates the tree, while in purely functional settings we don't
modify the tree after building it in most applications.
However, one method of deleting an element from a binary search tree in a purely
functional way is shown in this section. It actually reconstructs the tree
rather than modifying it.
Deletion is the most complex operation for the binary search tree. This is
because we must keep the BST property: for any node, all keys in the left
sub-tree are less than the key of this node, and the key of this node is less
than any key in the right sub-tree. Deleting a node can break this property.
Instead of the algorithm described in [2], this section uses a simpler one from
the SGI STL implementation [6].
To delete a node x from the tree, we handle it case by case, as the following figures show.
Figure 1.7: Delete a node which has only one non-NIL child.
Figure 1.8: Delete a node x with two non-NIL children: replace its key with
min(R), the minimum of the right sub-tree R, then recursively perform
delete(R, min(R)).
delete(T, x) = ∅ : T = ∅
             = node(delete(Tl, x), k, Tr) : x < k
             = node(Tl, k, delete(Tr, x)) : x > k
             = Tr : x = k ∧ Tl = ∅
             = Tl : x = k ∧ Tr = ∅
             = node(Tl, y, delete(Tr, y)) : otherwise
(1.9)

Where

Tl = left(T), Tr = right(T), k = key(T), y = min(Tr)
Translating the function to Haskell yields the below program.
delete Empty _ = Empty
delete (Node l k r) x | x < k = Node (delete l x) k r
                      | x > k = Node l k (delete r x)
                      -- x == k
                      | isEmpty l = r
                      | isEmpty r = l
                      | otherwise = Node l k' (delete r k')
    where k' = min r
Function isEmpty tests if a tree is empty (∅). Note that the algorithm
first performs a search to locate the node where the element needs to be
deleted, and after that it executes the deletion. This algorithm takes O(h)
time, where h is the height of the tree.
It's also possible to pass the node, rather than the element, to the algorithm
for deletion, so that searching is no longer needed.
The imperative algorithm is more complex because it needs to set the parent
pointers properly. The function returns the root of the resulting tree.
function Delete(T, x)
    r ← T
    x' ← x                        ▷ save x
    p ← Parent(x)
    if Left(x) = NIL then
        x ← Right(x)
    else if Right(x) = NIL then
        x ← Left(x)
    else                          ▷ both children are non-NIL
        y ← Min(Right(x))
        Key(x) ← Key(y)
        copy other satellite data from y to x
        if Parent(y) ≠ x then     ▷ y hasn't a left sub-tree
            Left(Parent(y)) ← Right(y)
        else                      ▷ y is the root of the right sub-tree of x
            Right(x) ← Right(y)
        Remove y
        return r
    if x ≠ NIL then
        Parent(x) ← p
    if p = NIL then               ▷ we are removing the root of the tree
        r ← x
    else
        if Left(p) = x' then
            Left(p) ← x
        else
            Right(p) ← x
    Remove x'
    return r
Here we assume the node to be deleted is not empty (otherwise we simply
return the original tree). The algorithm first records the root of the tree, a
copy of the pointer to x, and x's parent.
If either of the children is empty, the algorithm just splices x out. If x has
two non-NIL children, we first locate the minimum y of the right child, replace
the key of x with y's, copy the satellite data as well, then splice y out. Note
there is a special case where y is the root node of x's right sub-tree.
Finally, if the original x had at most one non-NIL child, we need to reset the
stored parent. If the parent pointer we copied before is empty, it means that we
are deleting the root node, so we need to return the new root. After the parent
is set properly, we finally remove the old x from memory.
The relative Python program for the deletion algorithm is given below. Because
Python provides GC, we needn't explicitly remove the node from memory.
def tree_delete(t, x):
    if x is None:
        return t
    [root, old_x, parent] = [t, x, x.parent]
    if x.left is None:
        x = x.right
    elif x.right is None:
        x = x.left
    else:  # both children are non-NIL
        y = tree_min(x.right)
        x.key = y.key  # copy other satellite data here as well
        if y.parent != x:  # y hasn't a left sub-tree
            y.parent.left = y.right
        else:  # y is the root of x's right sub-tree
            x.right = y.right
        if y.right is not None:
            y.right.parent = y.parent
        return root
    if x is not None:
        x.parent = parent
    if parent is None:  # we are removing the root
        root = x
    elif parent.left == old_x:
        parent.left = x
    else:
        parent.right = x
    return root
Exercise 1.3

There is a symmetrical solution for deleting a node which has two non-NIL
children: replace the element by splicing the maximum one out of the
left sub-tree. Write a program to implement this solution.
1.7 Randomly build binary search tree
It can be found that all operations given in this chapter are bound to O(h)
time for a tree of height h. The height affects the performance a lot. For a
very unbalanced tree, h tends to O(n), which leads to the worst case, while for
a balanced tree h is close to O(lg n) and we gain good performance.
How to make the binary search tree balanced will be discussed in the next
chapter. However, there exists a simple way: the binary search tree can be
randomly built, as described in [2]. Random building helps to decrease the
possibility of an unbalanced binary tree. The idea is that before building the
tree, we call a random process to shuffle the elements.
Exercise 1.4
Write a randomly building process for binary search tree.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
Stein. "Introduction to Algorithms, Second Edition". The MIT Press, 2001.
ISBN: 0262032937
[2] Jon Bentley. "Programming Pearls (2nd Edition)". Addison-Wesley
Professional, 1999. ISBN-13: 978-0201657883
[3] Chris Okasaki. "Ten Years of Purely Functional Data Structures".
http://okasaki.blogspot.com/2008/02/ten-years-of-purely-functional-data.html
[4] SGI. "Standard Template Library Programmer's Guide".
http://www.sgi.com/tech/stl/
Chapter 2

The evolution of insertion sort

2.1 Introduction
(2.1)
1 Some readers may argue that Bubble sort is the easiest sort algorithm. Bubble
sort isn't covered in this book as we don't think it's a valuable algorithm [1].
Figure 2.2: The left part is sorted data, continuously insert elements to sorted
part.
We can see a recursive concept in this definition; thus, it can be expressed as
the following.

sort(A) = ∅ : A = ∅
        = insert(sort({a2, a3, ...}), a1) : otherwise
(2.2)
2.2
Insertion
We haven't answered the question of how to realize insertion, however. It's a
puzzle how humans locate the proper position so quickly.
For a computer, an obvious option is to perform a scan. We can either scan from
left to right or vice versa. However, if the sequence is stored in a plain
array, it's necessary to scan from right to left.
function Sort(A)
    for i ← 2 to |A| do          ▷ insert A[i] into the sorted sequence A[1...i−1]
        x ← A[i]
        j ← i − 1
        while j > 0 ∧ x < A[j] do
            A[j + 1] ← A[j]
            j ← j − 1
        A[j + 1] ← x
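The pseudo code translates almost literally to Python; a sketch with 0-based indices, sorting in place:

```python
def isort(xs):
    for i in range(1, len(xs)):
        x = xs[i]              # insert xs[i] into the sorted prefix xs[0..i-1]
        j = i - 1
        while j >= 0 and x < xs[j]:
            xs[j + 1] = xs[j]  # shift the greater element one cell right
            j = j - 1
        xs[j + 1] = x
    return xs
```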
One may think scanning from left to right is natural. However, it isn't as
effective as the above algorithm for a plain array. The reason is that it's
expensive to insert an element at an arbitrary position in an array. As an array
stores elements continuously, if we want to insert a new element x at position
i, we must shift all elements after i, including i + 1, i + 2, ..., one position
to the right. After that, the cell at position i is empty, and we can put x in
it. This is illustrated in figure 2.3.
Figure 2.3: Insert x at position i: shift A[i], A[i+1], ..., A[n] one position
to the right, then put x into the emptied cell.
There are some other equivalent programs, for instance the following ANSI C
program. However, this version isn't as effective as the pseudo code.

void isort(Key* xs, int n) {
    int i, j;
    for (i = 1; i < n; ++i)
        for (j = i - 1; j >= 0 && xs[j+1] < xs[j]; --j)
            swap(xs, j, j+1);
}
This is because the swapping function, which exchanges two elements, typically
uses a temporary variable like the following:

void swap(Key* xs, int i, int j) {
    Key temp = xs[i];
    xs[i] = xs[j];
    xs[j] = temp;
}
Exercise 2.1

Provide an explicit insertion function, and call it from the general insertion
sort algorithm. Please realize it in both procedural and functional ways.
2.3
Improvement 1
Let's go back to the question of why human beings can find the proper position
for insertion so quickly. We have shown a solution based on scanning. Note the
fact that at any time, all cards at hand are well sorted; another possible
solution is to use binary search to find that location.
We'll explain search algorithms in a dedicated chapter; binary search is only
briefly introduced here for illustration purposes.
The algorithm will be changed to call a binary search procedure.
function Sort(A)
    for i ← 2 to |A| do
        x ← A[i]
        p ← Binary-Search(A[1...i − 1], x)
        for j ← i down to p do
            A[j] ← A[j − 1]
        A[p] ← x
Instead of scanning elements one by one, binary search utilizes the information
that all elements in the array slice {A1, ..., Ai−1} are sorted. Let's assume
the order is monotonically increasing. To find the position j that satisfies
Aj−1 ≤ x ≤ Aj, we first examine the middle element, for example A⌊i/2⌋. If x is
less than it, we recursively perform binary search in the first half of the
sequence; otherwise, we only need to search the second half.
Since we halve the elements to be examined every time, this search process runs
in O(lg n) time to locate the insertion position.
function Binary-Search(A, x)
    l ← 1
    u ← 1 + |A|
    while l < u do
        m ← ⌊(l + u)/2⌋
        if A[m] = x then
            return m
        else if A[m] < x then
            l ← m + 1
        else
            u ← m
    return l
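A Python sketch of this procedure with 0-based indices; when x is absent, the returned value is the position where it should be inserted (comparable to the standard library `bisect.bisect_left`):

```python
def binary_search(xs, x):
    l, u = 0, len(xs)
    while l < u:
        m = (l + u) // 2
        if xs[m] == x:
            return m
        elif xs[m] < x:
            l = m + 1      # x can only be in the second half
        else:
            u = m          # x can only be in the first half
    return l               # insertion position for the absent x
```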
Exercise 2.2

Write the binary search in a recursive manner. You needn't use a purely
functional programming language.
2.4 Improvement 2

Although we improved the total comparison time to O(n lg n) in the previous
section, the number of moves is still O(n²). Movement takes so long because the
sequence is stored in a plain array. The nature of an array is a continuous
data layout, so the insertion operation is expensive. This hints that we can use
a linked-list setting to represent the sequence. It improves the insertion
operation from O(n) to constant time O(1).
insert(A, x) = {x} : A = ∅
             = {x} ∪ A : x < a1
             = {a1} ∪ insert({a2, a3, ...}, x) : otherwise
(2.3)
And we can complete the two versions of the insertion sort program based on
the first two equations in this chapter.
isort [] = []
isort (x:xs) = insert (isort xs) x
Instead of a reference-based linked-list, we can use an index array to mimic it:
for any element A[i], Next[i] stores the index of the element that follows A[i],
and −1 marks the end of the list. The last cell of the Next table (accessed as
next[-1] in Python) serves as the head. The relative Python program for this
algorithm is given as the following.
def isort(xs):
    n = len(xs)
    next = [-1] * (n + 1)
    for i in range(n):
        insert(xs, next, i)
    return next

def insert(xs, next, i):
    j = -1
    while next[j] != -1 and xs[next[j]] < xs[i]:
        j = next[j]
    next[j], next[i] = i, next[j]
Although we changed the insertion operation to constant time by using a
linked-list, we still have to traverse the list to find the position, which
results in O(n²) comparisons. This is because a linked-list, unlike an array,
doesn't support random access. It means we can't use binary search in the
linked-list setting.
Exercise 2.3

Complete the insertion sort by using the linked-list insertion function in your
favorite imperative programming language.

The index-based linked-list returns the sequence of rearranged indices as the
result. Write a program to re-order the original array of elements from this
result.
2.5 Final improvement by binary search tree
It seems that we have driven into a corner. We must improve both the comparison
and the insertion at the same time, or we will end up with O(n²) performance.
We must use binary search; this is the only way to reduce the comparison time
to O(lg n). On the other hand, we must change the data structure, because we
can't achieve constant-time insertion at an arbitrary position with a plain
array.
This reminds us of our 'hello world' data structure, the binary search tree. It
naturally supports binary search by its definition. At the same time, we can
insert a new node into a binary search tree in O(1) constant time once we have
found the location.
So the algorithm changes to this.

function Sort(A)
    T ← ∅
    for each x ∈ A do
        T ← Insert-Tree(T, x)
    return To-List(T)
Where Insert-Tree() and To-List() are described in the previous chapter about
the binary search tree.
As we analyzed for the binary search tree, the performance of tree sort is
bound to O(n lg n), which is the lower limit of comparison-based sorting [3].
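To make the idea concrete, here is a self-contained Python sketch of tree sort; the node class and the function names are assumptions standing in for the Insert-Tree and To-List of the previous chapter:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert_tree(t, x):
    if t is None:
        return Node(x)
    if x < t.key:
        t.left = insert_tree(t.left, x)
    else:
        t.right = insert_tree(t.right, x)
    return t

def to_list(t):
    # in-order traversal yields the keys in sorted order
    if t is None:
        return []
    return to_list(t.left) + [t.key] + to_list(t.right)

def tree_sort(xs):
    t = None
    for x in xs:
        t = insert_tree(t, x)
    return to_list(t)
```

Note that the O(n lg n) bound assumes the tree stays reasonably balanced; feeding an already sorted sequence to this plain version degrades it to O(n²), which is exactly the problem the next chapter addresses.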
2.6
Short summary
In this chapter, we presented the evolution of insertion sort. Insertion sort
is well explained in most textbooks as the first sorting algorithm. It has a
simple and straightforward idea, but the performance is quadratic. Some
textbooks stop here, but we wanted to show that there exist ways to improve it
from different points of view. We first tried to save comparison time by using
binary search, and then tried to save the insertion operation by changing the
data structure to a linked-list. Finally, we combined these two ideas and
evolved insertion sort into tree sort.
Bibliography

[1] http://en.wikipedia.org/wiki/Bubble_sort
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
Stein. "Introduction to Algorithms, Second Edition". The MIT Press, 2001.
ISBN: 0262032937
[3] Donald E. Knuth. "The Art of Computer Programming, Volume 3: Sorting and
Searching (2nd Edition)". Addison-Wesley Professional, 1998.
ISBN-10: 0201896850, ISBN-13: 978-0201896855
Chapter 3

Red-black tree, not so complex as it was thought

3.1 Introduction
3.1.1 Exploit the binary search tree
We have shown the power of the binary search tree by using it to count the
occurrence of every word in a text. The idea is to use the binary search tree
as a dictionary for counting.
One may come up with the idea of feeding a yellow page book 1 to a binary
search tree, and using it to look up the phone number of a contact.
Modifying the word occurrence counting program a bit yields the following code.
int main(int, char**) {
    ifstream f("yp.txt");
    map<string, string> dict;
    string name, phone;
    while (f >> name && f >> phone)
        dict[name] = phone;
    for (;;) {
        cout << "\nname: ";
        cin >> name;
        if (dict.find(name) == dict.end())
            cout << "not found";
        else
            cout << "phone: " << dict[name];
    }
}
This program works well. However, if you replace the STL map with the binary
search tree as mentioned previously, the performance will be bad, especially
when you search for names such as Zara, Zed, or Zulu.
This is because the contents of a yellow page book are typically listed in
lexicographic order, which means the name list is in increasing order.
Inserting such an ordered sequence element by element builds an extremely
unbalanced binary search tree.
Exercise 3.1

For a very big yellow page list, one may want to speed up the dictionary
building process with two concurrent tasks (threads or processes). One task
reads the name-phone pairs from the head of the list, while the other one
reads from the tail. The building terminates when these two tasks meet at
the middle of the list. What will the binary search tree look like after
building? What if you split the list into more than two parts and use more
tasks?

Can you find any more cases that exploit a binary search tree? Please
consider the unbalanced trees shown in figure 3.2.
3.1.2 How to ensure the balance of the tree
In order to avoid such cases, we can shuffle the input sequence with a
randomized algorithm, such as the one described in Section 12.4 of [2].
However, this method doesn't always work; for example, when the input is fed
interactively by the user, the tree needs to be built and updated after each
input.
People have found many solutions to make the binary search tree balanced.
Many of them rely on the rotation operations on the binary search tree.
Rotation operations change the tree structure while maintaining the ordering of
the elements; thus they can be used to improve the balance property of the
binary search tree.
In this chapter, we'll first introduce the red-black tree, which is one of the
most popular and widely used self-adjusting balanced binary search trees.
Figure 3.2: Unbalanced trees: (a) a left-skewed chain n, n − 1, n − 2, ...;
(b) a right-skewed chain; (c) a zig-zag tree m, m − 1, m + 1, m − 2, m + 2, ...
3.1.3
Tree rotation
Figure 3.3: Tree rotation, rotate-left transforms the tree from left side to right
side, and rotate-right does the inverse transformation.
Tree rotation is a special operation that can transform the tree structure
without changing the in-order traverse result. It is based on the fact that
for a specified ordering, there are multiple binary search trees corresponding
to it. Figure 3.3 shows tree rotation: for the binary search tree on the left
side, left rotation transforms it to the tree on the right, and right rotation
does the inverse transformation.
Although tree rotation can be realized procedurally, there exists a quite
simple functional description using pattern matching. Denote the non-empty
tree as T = (Tl, k, Tr).
rotateL(T) = ((a, X, b), Y, c) : T = (a, X, (b, Y, c))
           = T : otherwise
(3.1)

rotateR(T) = (a, X, (b, Y, c)) : T = ((a, X, b), Y, c)
           = T : otherwise
(3.2)
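The pattern matching definitions can be mirrored in Python with nested tuples, a sketch where (l, k, r) stands for a non-empty tree and None for ∅:

```python
def rotate_l(t):
    # (a, X, (b, Y, c))  ==>  ((a, X, b), Y, c); otherwise unchanged
    if t is not None and t[2] is not None:
        a, x, (b, y, c) = t
        return ((a, x, b), y, c)
    return t

def rotate_r(t):
    # ((a, X, b), Y, c)  ==>  (a, X, (b, Y, c)); otherwise unchanged
    if t is not None and t[0] is not None:
        (a, x, b), y, c = t
        return (a, x, (b, y, c))
    return t
```

Because tuples are immutable, these functions are persistent just like formulas (3.1) and (3.2): the input tree is shared, never modified.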
However, the imperative pseudo code has to set all the fields accordingly.
function Left-Rotate(T, x)
    p ← Parent(x)
    y ← Right(x)              ▷ assume y ≠ NIL
    a ← Left(x)
    b ← Left(y)
    c ← Right(y)
    Replace(x, y)
    Set-Children(x, a, b)
    Set-Children(y, x, c)
    if p = NIL then
        T ← y
    return T

function Right-Rotate(T, y)
    p ← Parent(y)
    x ← Left(y)               ▷ assume x ≠ NIL
    a ← Left(x)
    b ← Right(x)
    c ← Right(y)
    Replace(y, x)
    Set-Children(y, b, c)
    Set-Children(x, a, y)
    if p = NIL then
        T ← x
    return T

function Set-Left(x, y)
    Left(x) ← y
    if y ≠ NIL then Parent(y) ← x

function Set-Right(x, y)
    Right(x) ← y
    if y ≠ NIL then Parent(y) ← x

function Set-Children(x, L, R)
    Set-Left(x, L)
    Set-Right(x, R)

function Replace(x, y)
    if Parent(x) = NIL then
        if y ≠ NIL then Parent(y) ← NIL
    else if Left(Parent(x)) = x then
        Set-Left(Parent(x), y)
    else
        Set-Right(Parent(x), y)
    Parent(x) ← NIL
Comparing these pseudo codes with the pattern matching functions, the latter
focus on the changing of structure states, while the former focus on the
rotating process. As the title of this chapter indicates, the red-black tree
needn't be so complex as it was thought. Most traditional algorithm textbooks
use the classic procedural way to teach the red-black tree: there are several
cases to deal with, and all need carefulness when manipulating the node fields.
However, by switching to a functional setting, things become intuitive and
simple, although there is some performance overhead.
Most of the content in this chapter is based on Chris Okasaki's work in [2].
3.2 Definition of red-black tree

A red-black tree is a binary search tree in which every node is colored either
red or black, satisfying the following five properties:

1. Every node is either red or black.
2. The root is black.
3. Every leaf (NIL) is black.
4. If a node is red, then both of its children are black.
5. Every path from a node to any of its descendant leaves contains the same
number of black nodes.
All read-only operations, such as search and min/max, are the same as in the
binary search tree; only insertion and deletion are special.
As we have shown in the word occurrence example, many implementations of set or
map containers are based on the red-black tree. One example is the C++ Standard
Template Library (STL) [6].

2 The red-black tree is one of the equivalent forms of the 2-3-4 tree (see the
chapter about B-trees for 2-3-4 trees). That is to say, for any 2-3-4 tree,
there is at least one red-black tree that has the same data order.
As mentioned previously, the only change in the data layout is the color
information augmented to the binary search tree. This can be represented as a
data field in imperative languages such as C++, like below.

enum Color { Red, Black };

template <class T>
struct node {
    Color color;
    T key;
    node* left;
    node* right;
    node* parent;
};
Exercise 3.2
Can you prove that a red-black tree with n nodes has height at most
2 lg(n + 1)?
3.3 Insertion

Inserting a new node as described for the binary search tree may cause the tree
to become unbalanced. The red-black properties have to be maintained, so we
need to do some fixing by transforming the tree after insertion.
When we insert a new key, one good practice is to always insert it as a red
node. As long as the newly inserted node isn't the root of the tree, we can
keep all properties except the 4th one: the insertion may bring two adjacent
red nodes.
Functional and procedural implementations have different fixing methods. One is
intuitive but has some overhead; the other is a bit complex but has higher
performance. Most textbooks about algorithms introduce the latter. In this
chapter, we focus on the former to show how easily a red-black tree insertion
algorithm can be realized. The traditional procedural method is given only for
comparison purposes.
As described by Chris Okasaki, there are in total 4 cases which violate
property 4. All of them have 2 adjacent red nodes. However, they have a uniform
form after fixing [2], as shown in figure 4.3.
Note that this transformation moves the redness one level up. During the
bottom-up recursive fixing, the last step may make the root node red. According
to property 2, the root is always black; thus we need a final fixing to revert
the root color to black.
Observing that the 4 cases and the fixed result have strong pattern features,
the fixing function can be defined by using a method similar to the one we
mentioned in
tree rotation. Denote the color of a node as C; it has two values: black B and
red R. Thus a non-empty tree can be represented as T = (C, Tl, k, Tr).
balance(T) = (R, (B, a, x, b), y, (B, c, z, d)) : match(T)
           = T : otherwise
(3.3)

where the function match() tests if a tree is one of the following 4 patterns,
each containing 2 adjacent red nodes:

(B, (R, (R, a, x, b), y, c), z, d)
(B, (R, a, x, (R, b, y, c)), z, d)
(B, a, x, (R, b, y, (R, c, z, d)))
(B, a, x, (R, (R, b, y, c), z, d))
(3.4)
where the function ins is defined as

ins(T, k) = (R, ∅, k, ∅) : T = ∅
          = balance(C, ins(Tl, k), k', Tr) : k < k'
          = balance(C, Tl, k', ins(Tr, k)) : otherwise
(3.5)

If the tree is empty, a new red node is created with k as the key; otherwise,
denote the color, the children and the key as C, Tl, Tr and k'. We compare k
and k' and recursively insert k into one of the children. Function balance is
called after that, and the root is finally forced to be black:
makeBlack(T ) = (B, Tl , k, Tr )
(3.6)
Summarizing the above functions using language-supported pattern matching, we come to the following Haskell program.
insert t x = makeBlack $ ins t where
    ins Empty = Node R Empty x Empty
    ins (Node color l k r)
        | x < k     = balance color (ins l) k r
        | otherwise = balance color l k (ins r)
    makeBlack (Node _ l k r) = Node B l k r

balance B (Node R (Node R a x b) y c) z d = Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) = Node R (Node B a x b) y (Node B c z d)
balance color l k r = Node color l k r
Note that the balance function is changed a bit from the original definition.
Instead of passing the tree, we pass the color, the left child, the key and the
right child to it. This can save a pair of boxing and un-boxing operations.
Exercise 3.3

Write a program in an imperative language, such as C, C++ or Python, to
realize the same algorithm in this section. Note that, because there is no
language-supported pattern matching, you need to test the 4 different cases
manually.
3.4 Deletion

Recall the deletion section of the binary search tree. Deletion is an
'imperative only' topic for the red-black tree as well. In typical practice,
one often builds the tree just once and performs lookups frequently afterwards.
Okasaki explained why he didn't provide red-black tree deletion in his work
[3]: one reason is that deletions are much messier than insertions.
The purpose of this section is to show that red-black tree deletion is
possible in purely functional settings, although it actually rebuilds the tree,
because trees are read-only in terms of purely functional data structures 3. In
the real world, it's up to the user (or actually the programmer) to adopt the
proper

3 Actually, the common part of the tree is reused. Most functional programming
environments support this persistence feature.
solution. One option is to mark the node to be deleted with a flag, and perform
a tree rebuild when the number of deleted nodes exceeds 50%.
Not only in functional settings; even in imperative settings, deletion is more
complex than insertion. We face more cases to fix. Deletion may also violate
the red-black tree properties, so we need to fix the tree after the normal
deletion as described for the binary search tree.
The deletion algorithm in this book is based on a handout [5].
The problem only happens when deleting a black node, because that violates the
last property of the red-black tree: the number of black nodes on the path
decreases, so the black-height is no longer uniform.
When deleting a black node, we can resume the last red-black property by
introducing the concept of doubly-black [2]. It means that although the node
is deleted, its blackness is kept by storing it in the parent node. If the
parent node is red, it turns black; however, if it is already black, it turns
doubly-black.
In order to express the doubly-black node, the definition needs some
modification accordingly.
data Color = R | B | BB -- BB: doubly black for deletion
data RBTree a = Empty | BBEmpty -- doubly black empty
| Node Color (RBTree a) a (RBTree a)
When deleting a node, we first perform the same deletion algorithm as for the
binary search tree in the previous chapter. After that, if the node spliced out
is black, we need to fix the tree to keep the red-black properties. The delete
function is defined as the following.

delete(T, k) = blackenRoot(del(T, k))
(3.7)
where

del(T, k) = ∅ : T = ∅
          = fixBlack²((C, del(Tl, k), k', Tr)) : k < k'
          = fixBlack²((C, Tl, k', del(Tr, k))) : k > k'
          = mkBlk(Tr) if C = B, otherwise Tr : k = k' ∧ Tl = ∅
          = mkBlk(Tl) if C = B, otherwise Tl : k = k' ∧ Tr = ∅
          = fixBlack²((C, Tl, k'', del(Tr, k''))) : otherwise, with k'' = min(Tr)
(3.8)

The real deletion happens inside the function del. For the trivial case that
the tree is empty, the result is ∅. If the key to be deleted is less than the
key of the current node, we recursively delete it from the left sub-tree; if it
is greater, we recursively delete it from the right sub-tree. Because these
recursive deletions may bring a doubly-black node, the fixing function
fixBlack² is applied.
If the key to be deleted is equal to the key of the current node, we need to
splice it out. If one of its children is empty, we replace the node by the
other one and reserve the blackness of this node; otherwise we cut and paste
the minimum element k'' = min(Tr) of the right sub-tree.
mkBlk(T) = Φ : T = ∅
         = (B, Tl, k, Tr) : C = R
         = (B², Tl, k, Tr) : C = B
         = T : otherwise
(3.10)

where B² denotes the doubly-black color and Φ the doubly-black empty node.
Summarizing the above functions yields the following Haskell program.

delete t x = blackenRoot (del t x) where
    del Empty _ = Empty
    del (Node color l k r) x
        | x < k = fixDB color (del l x) k r
        | x > k = fixDB color l k (del r x)
        -- x == k, delete this node
        | isEmpty l = if color == B then makeBlack r else r
        | isEmpty r = if color == B then makeBlack l else l
        | otherwise = fixDB color l k' (del r k') where k' = min r
    blackenRoot (Node _ l k r) = Node B l k r
    blackenRoot _ = Empty

makeBlack (Node B l k r) = Node BB l k r
makeBlack (Node R l k r) = Node B l k r
makeBlack Empty = BBEmpty
makeBlack t = t
The final attack on the red-black tree deletion algorithm is to realize the
fixBlack² function. Its purpose is to eliminate the doubly-black colored node
by rotation and color changing.
Let's solve the doubly-black empty node first. For any node, if one of its
children is doubly-black empty and the other child is non-empty, we can safely
replace the doubly-black empty with a normal empty node.
Like figure 3.7, if we are going to delete the node 4 from the tree (only part
of the tree is shown), the program will use a doubly-black empty node to
replace node 4. In the figure, the doubly-black node is shown as a black circle
with 2 edges. Node 5 thus has a doubly-black empty left child and a non-empty
right child (a leaf node with key 6). In such case we can safely change the
doubly-black empty to a normal empty node, which won't violate any red-black
properties.
On the other hand, if one child of a node is doubly-black empty and the other
child is also empty, we have to push the doubly-blackness up one level. For
example,
Figure 3.7: One child is doubly-black empty node, the other child is non-empty.
in figure 3.8, if we want to delete node 1 from the tree, the program will use
a doubly-black empty node to replace 1. Node 2 then has a doubly-black empty
left child and an empty right child. In such case we must mark node 2 as
doubly-black after changing its left child back to empty.
Based on the above analysis, in order to fix the doubly-black empty node, we
define the function partially like the following, where Φ denotes the
doubly-black empty node.

fixBlack²(T) = (B², ∅, k, ∅) : (Tl = Φ ∧ Tr = ∅) ∨ (Tl = ∅ ∧ Tr = Φ)
             = (C, ∅, k, Tr) : Tl = Φ ∧ Tr ≠ ∅
             = (C, Tl, k, ∅) : Tr = Φ ∧ Tl ≠ ∅
             = ... : ...
(3.11)
After dealing with the doubly-black empty node, we need to fix the case where
the sibling of the doubly-black node is black and has one red child. In this
situation, we can fix the doubly-blackness with one rotation. There are
actually 4 different sub-cases, and all of them can be transformed to one
uniform pattern. They are shown in figure 3.9.
The handling of these 4 sub-cases can be defined on top of formula (3.11).

fixBlack²(T) = ...
             = (C, (B, mkBlk(A), x, B'), y, (B, C', z, D)) : p1.1
             = (C, (B, A, x, B'), y, (B, C', z, mkBlk(D))) : p1.2
             = ... : ...
(3.12)

Here p1.1 denotes the two sub-cases where the doubly-black node A is the left
child and its black sibling has a red child; p1.2 denotes the two symmetric
sub-cases where the doubly-black node D is the right child.
Figure 3.8: One child is a doubly-black empty node, the other child is empty.
(b) After 1 is sliced off, it becomes doubly-black empty. (c) We must push the
doubly-blackness up to node 2.
fixBlack²(T) = ...
             = mkBlk((C, mkBlk(A), x, (R, B', y, C'))) : p2.1
             = mkBlk((C, (R, A, x, B'), y, mkBlk(C'))) : p2.2
             = ... : ...
(3.13)

Here p2.1 is the case where the doubly-black node A is the left child and its
sibling, together with the sibling's two children, are all black; p2.2 is
symmetric. In both cases, the blackness is propagated up by mkBlk.
Figure 3.9: Fix the doubly black by rotation, the sibling of the doubly-black
node is black, and it has one red child.
Figure 3.10: Propagate the doubly-blackness up. (a) The color of x can be
either black or red. (b) If x was red, it becomes black; otherwise, it becomes
doubly-black. (c) The color of y can be either black or red. (d) If y was red,
it becomes black; otherwise, it becomes doubly-black.
There is a final case left: the sibling of the doubly-black node is red. We can
do a rotation to change this case to pattern p1.1 or p1.2. Figure 3.11 shows
it. We can finish formula (3.13) with (3.14).
fixBlack²(T) = ...
             = fixBlack²((B, fixBlack²((R, A, x, B')), y, C')) : p3.1
             = fixBlack²((B, A, x, fixBlack²((R, B', y, C')))) : p3.2
             = T : otherwise
(3.14)

Here p3.1 is the case where the doubly-black node A is the left child and its
sibling is red; p3.2 is symmetric.
After fixing patterns p1.1 and p1.2, the doubly-black node is eliminated. The
other cases may continuously propagate the doubly-blackness from bottom to top,
till the root. Finally, the algorithm marks the root node black anyway, so the
doubly-blackness is removed.
Putting formulas (3.11), (3.12), (3.13), and (3.14) together, we can write the
final Haskell program.
fixDB color BBEmpty k Empty = Node BB Empty k Empty
fixDB color BBEmpty k r = Node color Empty k r
fixDB color Empty k BBEmpty = Node BB Empty k Empty
fixDB color l k BBEmpty = Node color l k Empty
-- the sibling is black, and it has one red child
fixDB color a@(Node BB _ _ _) x (Node B (Node R b y c) z d) =
Node color (Node B (makeBlack a) x b) y (Node B c z d)
fixDB color a@(Node BB _ _ _) x (Node B b y (Node R c z d)) =
Node color (Node B (makeBlack a) x b) y (Node B c z d)
fixDB color (Node B a x (Node R b y c)) z d@(Node BB _ _ _) =
Node color (Node B a x b) y (Node B c z (makeBlack d))
fixDB color (Node B (Node R a x b) y c) z d@(Node BB _ _ _) =
Node color (Node B a x b) y (Node B c z (makeBlack d))
-- the sibling and its 2 children are all black, propagate the blackness up
fixDB color a@(Node BB _ _ _) x (Node B b@(Node B _ _ _) y c@(Node B _ _ _))
= makeBlack (Node color (makeBlack a) x (Node R b y c))
fixDB color (Node B a@(Node B _ _ _) x b@(Node B _ _ _)) y c@(Node BB _ _ _)
= makeBlack (Node color (Node R a x b) y (makeBlack c))
-- the sibling is red
fixDB B a@(Node BB _ _ _) x (Node R b y c) = fixDB B (fixDB R a x b) y c
fixDB B (Node R a x b) y c@(Node BB _ _ _) = fixDB B a x (fixDB R b y c)
-- otherwise
fixDB color l k r = Node color l k r
The deletion algorithm takes O(lg n) time to delete a key from a red-black
tree with n nodes.
Exercise 3.4

- As we mentioned in this section, deletion can be implemented by just marking the node as deleted without actually removing it. Once the number of marked nodes exceeds 50%, a tree re-build is performed. Try to implement this method in your favorite programming language.

- Why needn't we enclose mkBlk with a call to fixBlack² explicitly in the definition of del(T, k)?
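As a sketch of the mark-and-rebuild idea from the first exercise, the Python below marks nodes instead of removing them and rebuilds a balanced tree from the live keys once more than half of the nodes are marked. All names here (Node, live_keys, from_sorted, the 50% trigger inside delete) are assumptions of this sketch, not the book's implementation:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.deleted = False          # the deletion mark

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    else:
        t.deleted = False             # re-inserting a marked key revives it
    return t

def live_keys(t):
    # in-order list of the keys not marked as deleted
    if t is None:
        return []
    mid = [] if t.deleted else [t.key]
    return live_keys(t.left) + mid + live_keys(t.right)

def size(t):
    return 0 if t is None else 1 + size(t.left) + size(t.right)

def from_sorted(ks):
    # rebuild a balanced tree from a sorted list of keys
    if not ks:
        return None
    m = len(ks) // 2
    t = Node(ks[m])
    t.left, t.right = from_sorted(ks[:m]), from_sorted(ks[m + 1:])
    return t

def delete(t, key):
    node = t
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    if node is not None:
        node.deleted = True
    live = live_keys(t)
    if 2 * len(live) < size(t):       # more than 50% marked: rebuild
        return from_sorted(live)
    return t
```

The rebuild amortizes its O(n) cost over the deletions that triggered it, which is why the constant-fraction threshold matters.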
3.5 Imperative red-black tree algorithm

We have almost finished all the content of this chapter. By inducting on the patterns, we can implement the red-black tree in a simple way compared to the imperative tree-rotation solution. However, we should show the imperative counterpart for completeness.
For insertion, the basic idea is to use an algorithm similar to the one described for the binary search tree, and then fix the balance problem by rotation and return the final result.
1: function Insert(T, k)
2:     root ← T
3:     x ← Create-Leaf(k)
4:     Color(x) ← RED
5:     p ← NIL
6:     while T ≠ NIL do
7:         p ← T
8:         if k < Key(T) then
9:             T ← Left(T)
10:        else
11:            T ← Right(T)
12:    Parent(x) ← p
13:    if p = NIL then                  ▷ tree T is empty
14:        root ← x
15:    else if k < Key(p) then
16:        Left(p) ← x
17:    else
18:        Right(p) ← x
19:    return Insert-Fix(root, x)
The only difference from the binary search tree insertion algorithm is that we set the color of the new node to red, and perform fixing before returning. It is easy to translate the pseudo code to a real imperative programming language, for instance Python (C and C++ source codes are available along with this book).

def rb_insert(t, key):
    root = t
    x = Node(key)
    parent = None
    while t:
        parent = t
        if key < t.key:
            t = t.left
        else:
            t = t.right
    if parent is None: # tree is empty
        root = x
    elif key < parent.key:
        parent.set_left(x)
    else:
        parent.set_right(x)
    return rb_insert_fix(root, x)
There are 3 base cases for fixing, and if we take left-right symmetry into consideration, there are 6 cases in total. Among them, two cases can be merged together: they both have the uncle node in red color, so we can toggle the parent color and uncle color to black and set the grandparent color to red. With this merging, the fixing algorithm can be realized as follows.
1: function Insert-Fix(T, x)
2:     while Parent(x) ≠ NIL ∧ Color(Parent(x)) = RED do
3:         if Color(Uncle(x)) = RED then              ▷ Case 1, x's uncle is red
4:             Color(Parent(x)) ← BLACK
5:             Color(Grand-Parent(x)) ← RED
6:             Color(Uncle(x)) ← BLACK
7:             x ← Grand-Parent(x)
8:         else                                       ▷ x's uncle is black
9:             if Parent(x) = Left(Grand-Parent(x)) then
10:                if x = Right(Parent(x)) then       ▷ Case 2, x is a right child
11:                    x ← Parent(x)
12:                    T ← Left-Rotate(T, x)
                                                      ▷ Case 3, x is a left child
13:                Color(Parent(x)) ← BLACK
14:                Color(Grand-Parent(x)) ← RED
15:                T ← Right-Rotate(T, Grand-Parent(x))
16:            else
17:                if x = Left(Parent(x)) then        ▷ Case 2, Symmetric
18:                    x ← Parent(x)
19:                    T ← Right-Rotate(T, x)
                                                      ▷ Case 3, Symmetric
20:                Color(Parent(x)) ← BLACK
21:                Color(Grand-Parent(x)) ← RED
22:                T ← Left-Rotate(T, Grand-Parent(x))
23:    Color(T) ← BLACK
24:    return T
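The fixing procedure above can also be translated to Python. The sketch below is self-contained, so the Node class, the standalone set_left/set_right helpers, and the rotation routines are written out here under assumed definitions; it mirrors, but is not verbatim, the book's accompanying source:

```python
RED, BLACK = "R", "B"

class Node:
    def __init__(self, key, color=RED):
        self.key, self.color = key, color
        self.parent = self.left = self.right = None

def set_left(p, x):
    p.left = x
    if x is not None:
        x.parent = p

def set_right(p, x):
    p.right = x
    if x is not None:
        x.parent = p

def transplant(root, x, y):
    # put y where x currently is; return the (possibly new) root
    p = x.parent
    if p is None:
        root = y
        if y is not None:
            y.parent = None
    elif p.left is x:
        set_left(p, y)
    else:
        set_right(p, y)
    return root

def left_rotate(root, x):
    y = x.right
    root = transplant(root, x, y)
    set_right(x, y.left)
    set_left(y, x)
    return root

def right_rotate(root, y):
    x = y.left
    root = transplant(root, y, x)
    set_left(y, x.right)
    set_right(x, y)
    return root

def uncle(x):
    g = x.parent.parent
    return g.right if x.parent is g.left else g.left

def rb_insert_fix(root, x):
    while x.parent is not None and x.parent.color == RED:
        u = uncle(x)
        if u is not None and u.color == RED:   # Case 1: uncle is red
            x.parent.color = BLACK
            u.color = BLACK
            x.parent.parent.color = RED
            x = x.parent.parent
        elif x.parent is x.parent.parent.left:
            if x is x.parent.right:            # Case 2: zig-zag, straighten it
                x = x.parent
                root = left_rotate(root, x)
            x.parent.color = BLACK             # Case 3: single rotation
            x.parent.parent.color = RED
            root = right_rotate(root, x.parent.parent)
        else:                                  # symmetric cases
            if x is x.parent.left:
                x = x.parent
                root = right_rotate(root, x)
            x.parent.color = BLACK
            x.parent.parent.color = RED
            root = left_rotate(root, x.parent.parent)
    root.color = BLACK
    return root

def rb_insert(t, key):
    root, x, parent = t, Node(key), None
    while t is not None:
        parent = t
        t = t.left if key < t.key else t.right
    if parent is None:
        root = x
    elif key < parent.key:
        set_left(parent, x)
    else:
        set_right(parent, x)
    return rb_insert_fix(root, x)
```

Note how the root may change during rotations, so every rotation returns the possibly new root.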
This program takes O(lg n) time to insert a new key into the red-black tree. Comparing this pseudo code with the balance function we defined in the previous section, we can see the difference. They differ not only in terms of simplicity, but also in logic. Even if we feed the same series of keys to the two algorithms, they may build different red-black trees. There is a bit of performance overhead in the pattern matching algorithm. Okasaki discussed the difference in detail in his paper [2].
Figure 3.12 shows the results of feeding the same series of keys to the above Python insertion program. Comparing them with figure 3.6, one can tell the difference clearly.
Exercise 3.5

Implement the red-black tree deletion algorithm in your favorite imperative programming language. You can refer to [2] for algorithm details.
3.6 More words
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. "Introduction to Algorithms, Second Edition". The MIT Press, 2001. ISBN: 0262032937
[2] Chris Okasaki. "FUNCTIONAL PEARLS: Red-Black Trees in a Functional Setting". J. Functional Programming, 1998
[3] Chris Okasaki. "Ten Years of Purely Functional Data Structures". http://okasaki.blogspot.com/2008/02/ten-years-of-purely-functional-data.html
[4] Wikipedia. "Red-black tree". http://en.wikipedia.org/wiki/Red-black_tree
[5] Lyn Turbak. "Red-Black Trees". cs.wellesley.edu/~cs231/fall01/red-black.pdf, Nov. 2, 2001
[6] SGI STL. http://www.sgi.com/tech/stl/
[7] Pattern matching. http://rosettacode.org/wiki/Pattern_matching
Chapter 4
AVL tree
4.1 Introduction

4.1.1 How to measure the balance of a tree?
Besides the red-black tree, are there any other intuitive solutions for self-balancing binary search trees? In order to measure how balanced a binary search tree is, one idea is to compare the heights of the right and left sub-trees. If they differ a lot, the tree isn't well balanced. Let's denote the height difference between the two children as below:

δ(T) = |R| − |L|    (4.1)

where |T| means the height of tree T, and L, R denote the left and right sub-trees.
If δ(T) = 0, the tree is definitely balanced. For example, a complete binary tree has n = 2^h − 1 nodes for height h; there are no empty branches except at the leaves. Another trivial case is the empty tree: δ(∅) = 0. The smaller the absolute value of δ(T), the more balanced the tree is.
We define δ(T) as the balance factor of a binary search tree.
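The two measures read almost literally off the definitions in Python (a minimal sketch; the Node class here is an assumption of the sketch, not the book's):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(t):
    # |T|: the height of tree T; the empty tree has height 0
    return 0 if t is None else 1 + max(height(t.left), height(t.right))

def delta(t):
    # delta(T) = |R| - |L|, the balance factor
    return 0 if t is None else height(t.right) - height(t.left)
```

Computing δ this way is O(n) per query; the chapter will instead store the balance factor inside each node.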
4.2 Definition of AVL tree

An AVL tree is a special binary search tree in which all sub-trees satisfy the following criterion:

|δ(T)| ≤ 1    (4.2)

The absolute value of the balance factor is less than or equal to 1, which means there are only three valid values: −1, 0 and 1. Figure 4.1 shows an example AVL tree.
Why can the AVL tree keep balanced? In other words, can this definition ensure that the height of the tree is O(lg n), where n is the number of nodes in the tree? Let's prove this fact.
For an AVL tree of height h, the number of nodes varies. It can have at most 2^h − 1 nodes, for a complete binary tree. We are interested in how
many nodes there are at least. Let's denote the minimum number of nodes for an AVL tree of height h as N(h). It's obvious for the trivial cases:

- For the empty tree, h = 0, N(0) = 0;
- For a singleton root, h = 1, N(1) = 1.

What's the situation in the common case N(h)? Figure 4.2 shows an AVL tree T of height h. It contains three parts: the root node, and two sub-trees L, R. We have the following fact:

h = max(|L|, |R|) + 1    (4.3)

We immediately know that there must be one child with height h − 1. According to the definition of the AVL tree, we have ||L| − |R|| ≤ 1. This leads to the fact that the height of the other sub-tree can't be lower than h − 2. The total number of nodes of T is the number of nodes in both children plus 1 (for the root node). We conclude that

N(h) = N(h − 1) + N(h − 2) + 1    (4.4)
Figure 4.2: An AVL tree with height h; one of the sub-trees has height h − 1, the other is no lower than h − 2.
This recurrence reminds us of the famous Fibonacci series. Actually we can transform it to the Fibonacci series by defining N'(h) = N(h) + 1, so that equation (4.4) changes to

N'(h) = N'(h − 1) + N'(h − 2)    (4.5)

Lemma 4.2.1. Let N(h) be the minimum number of nodes for an AVL tree with height h, and N'(h) = N(h) + 1, then

N'(h) ≥ φ^h    (4.6)

where φ = (√5 + 1)/2 is the golden ratio.

Proof. By induction:

N'(h + 1) = N'(h) + N'(h − 1)    {Fibonacci-like recurrence}
          ≥ φ^h + φ^(h−1)
          = φ^(h−1)(φ + 1)       {φ + 1 = φ² = (√5 + 3)/2}
          = φ^(h+1)              (4.7)
This tells us that the height of an AVL tree is bounded by O(lg n), which means the AVL tree is balanced.
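Equation (4.4) and the bound in Lemma 4.2.1 are easy to sanity-check numerically (this is just a check, not part of the proof):

```python
from functools import lru_cache

PHI = (5 ** 0.5 + 1) / 2  # the golden ratio

@lru_cache(maxsize=None)
def N(h):
    # minimum number of nodes of an AVL tree of height h, equation (4.4)
    if h == 0:
        return 0
    if h == 1:
        return 1
    return N(h - 1) + N(h - 2) + 1
```

N(h) + 1 gives the Fibonacci numbers 1, 2, 3, 5, 8, ..., and N(h) + 1 ≥ φ^h holds for every h, as the lemma states.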
During the basic mutable tree operations such as insertion and deletion, if the balance factor changes to any invalid value, some fixing has to be performed to bring |δ| back within 1. Most implementations utilize tree rotations. In this chapter, we'll show the pattern matching solution, which is inspired by Okasaki's red-black tree solution [2]. Because of this modify-then-fix approach, the AVL tree is also a kind of self-balancing binary search tree. For comparison purposes, we'll also show the procedural algorithms.
Of course we can compute δ recursively, but another option is to store the balance factor inside each node, and update it when we modify the tree. The latter avoids recomputing the same value every time.
Based on this idea, we can add one data field δ to the original binary search tree, as in the following C++ code example.
template <class T>
struct node {
    int delta;
    T key;
    node* left;
    node* right;
    node* parent;
};
The immutable operations, including looking up and finding the maximum and minimum elements, are all the same as for the binary search tree. We'll skip them and focus on the mutable operations.

4.3 Insertion
Inserting a new element into an AVL tree may violate the AVL property, so that the absolute value of δ exceeds 1. To restore it, one option is to do tree rotations according to the different insertion cases. Most implementations are based on this approach.
Another way is to use the pattern matching method mentioned by Okasaki in his red-black tree implementation [2]. Inspired by this idea, it is possible to provide a simple and intuitive solution.
When inserting a new key to the AVL tree, the balance factor of the root may change in the range [−1, 1]², and the height may increase by at most one. We need to recursively use this information to update the values in the upper-level nodes. We can define the result of the insertion algorithm as a pair (T', ΔH), where T' is the new tree and ΔH is the increment of height. Let function first(pair) return the first element in a pair. We can modify the binary search tree insertion algorithm as follows to handle the AVL tree:

insert(T, k) = first(ins(T, k))    (4.8)
where

ins(T, k) = { ((∅, k, ∅, 0), 1)              : T = ∅
              tree(ins(L, k), k', (R, 0), δ) : k < k'
              tree((L, 0), k', ins(R, k), δ) : otherwise }    (4.9)

L, R, k', δ represent the left child, right child, key, and balance factor of the tree:

L = left(T)
R = right(T)
k' = key(T)
δ = δ(T)
When we insert a new key k to an AVL tree T, if the tree is empty, we just need to create a leaf node with k, set the balance factor to 0, and the height is increased by one. This is the trivial case.

²Note that this doesn't mean δ is in range [−1, 1]; the change of δ is in this range.
If T isn't empty, we need to compare the key k with k'. If k is less than the key, we recursively insert it into the left child; otherwise we insert it into the right child.
As we defined above, the result of the recursive insertion is a pair like (L', ΔHl); we need to do the balancing adjustment as well as update the increment of height. Function tree() is defined to deal with this task. It takes 4 parameters: (L', ΔHl), k', (R', ΔHr), and δ. The result of this function is defined as (T', ΔH), where T' is the new tree after adjustment, and ΔH is the new increment of height, defined as

ΔH = |T'| − |T|    (4.10)

This can be broken down into 4 cases:

ΔH = |T'| − |T|
   = 1 + max(|R'|, |L'|) − (1 + max(|R|, |L|))
   = max(|R'|, |L'|) − max(|R|, |L|)
   = { ΔHr     : δ ≥ 0 ∧ δ' ≥ 0
       δ + ΔHr : δ ≤ 0 ∧ δ' ≥ 0
       ΔHl − δ : δ ≥ 0 ∧ δ' ≤ 0
       ΔHl     : otherwise }    (4.11)

To prove this equation, note the fact that the height can't increase in both the left and the right child with only one insertion.
These 4 cases can be explained from the balance factor definition: it equals the height of the right sub-tree minus the height of the left sub-tree.

- If δ ≥ 0 and δ' ≥ 0, the height of the right sub-tree isn't less than the height of the left sub-tree, both before and after insertion. In this case, the increment in height of the tree is only contributed by the right sub-tree, which is ΔHr.

- If δ ≤ 0, the height of the left sub-tree isn't less than the height of the right sub-tree before insertion; if it becomes δ' ≥ 0, the height of the right sub-tree increases due to the insertion while the left side stays the same (|L'| = |L|). So the increment in height is

  ΔH = max(|R'|, |L'|) − max(|R|, |L|)    {δ ≤ 0 ∧ δ' ≥ 0}
     = |R'| − |L|                         {|L| = |L'|}
     = |R| + ΔHr − |L|
     = δ + ΔHr

- For the case δ ≥ 0 and δ' ≤ 0, similarly to the second one, we get

  ΔH = |L'| − |R|    {δ ≥ 0 ∧ δ' ≤ 0}
     = |L| + ΔHl − |R|
     = ΔHl − δ

- For the last case, both δ and δ' are no bigger than zero, which means the height of the left sub-tree is always greater than or equal to that of the right sub-tree, so the increment in height is only contributed by the left sub-tree, which is ΔHl.
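Equation (4.11) translates directly into code; below is a Python sketch of this case analysis (d and d2 stand for δ and δ', dl and dr for ΔHl and ΔHr; the name delta_h is an assumption of this sketch):

```python
def delta_h(d, d2, dl, dr):
    # the height increment of the whole tree, equation (4.11)
    if d >= 0 and d2 >= 0:
        return dr
    if d <= 0 and d2 >= 0:
        return d + dr
    if d >= 0 and d2 <= 0:
        return dl - d
    return dl
```

This is the same logic as the deltaH helper used by the Haskell tree function later in this section.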
The next problem in front of us is how to determine the new balance factor δ' before performing the balancing adjustment. According to the definition of the AVL tree, the balance factor is the height of the right sub-tree minus the height of the left sub-tree. We have the following fact:

δ' = |R'| − |L'|
   = |R| + ΔHr − (|L| + ΔHl)
   = |R| − |L| + ΔHr − ΔHl
   = δ + ΔHr − ΔHl    (4.12)

With all these changes in height and balance factor made clear, it's possible to define the tree() function mentioned in (4.9):

tree((L', ΔHl), k', (R', ΔHr), δ) = balance(node(L', k', R', δ'), ΔH)    (4.13)
Before moving into the details of the balancing adjustment, let's translate the above equations to real programs in Haskell.
First is the insert function.

insert :: (Ord a) => AVLTree a -> a -> AVLTree a
insert t x = fst $ ins t where
    ins Empty = (Br Empty x Empty 0, 1)
    ins (Br l k r d)
        | x < k     = tree (ins l) k (r, 0) d
        | x == k    = (Br l k r d, 0)
        | otherwise = tree (l, 0) k (ins r) d

Here we also handle the case of inserting a duplicated key (which means the key already exists) by just overwriting.

tree :: (AVLTree a, Int) -> a -> (AVLTree a, Int) -> Int -> (AVLTree a, Int)
tree (l, dl) k (r, dr) d = balance (Br l k r d', delta) where
    d' = d + dr - dl
    delta = deltaH d d' dl dr
4.3.1 Balancing adjustment
Figure 4.3: The four cases that violate the AVL property after insertion: left-left lean (δ(z) = −2, δ(y) = −1), right-right lean (δ(x) = 2, δ(y) = 1), right-left lean (δ(x) = 2, δ(y) = −1), and left-right lean (δ(z) = −2, δ(y) = 1). All four are adjusted to a uniform structure rooted at y with δ(y) = 0.
factors before fixing as δ(x), δ(y), and δ(z); after fixing, they change to δ'(x), δ'(y), and δ'(z) respectively.
We'll next prove that, after fixing, we have δ'(y) = 0 for all four cases, and we'll provide the resulting values of δ'(x) and δ'(z).
Left-left lean case

As the structure of sub-tree x doesn't change due to fixing, we immediately get δ'(x) = δ(x).
Since δ(y) = −1 and δ(z) = −2, we have

δ(y) = |C| − |x| = −1 ⇒ |C| = |x| − 1
δ(z) = |D| − |y| = −2 ⇒ |D| = |y| − 2    (4.14)

After fixing:

δ'(z) = |D| − |C|              {From (4.14)}
      = (|y| − 2) − (|x| − 1)
      = |y| − |x| − 1          {x is a child of y ⇒ |y| − |x| = 1}
      = 0                      (4.15)

For δ'(y), note that |C| = |D| = |x| − 1 by (4.14), so the new right child z' = (C, z, D) has height 1 + max(|C|, |D|) = |x|, and

δ'(y) = |z'| − |x| = 0    (4.16)

Summarizing the above results, the left-left lean case adjusts the balance factors as follows:

δ'(x) = δ(x)    (4.17)
δ'(y) = 0       (4.18)
δ'(z) = 0       (4.19)
Right-left lean case

In this case δ(x) = 2 and δ(z) = −1. Expanding δ(x):

|z| − |A| = 2
2 + max(|B|, |C|) − |A| = 2    {|z| = 2 + max(|B|, |C|), since δ(z) = −1 ⇒ |D| = |y| − 1}
max(|B|, |C|) − |A| = 0        (4.21)

After fixing, the new left child of y is (A, x, B), so δ'(x) = |B| − |A|.
If δ(y) = 1, it means max(|B|, |C|) = |C|; taking this into (4.21) yields |C| = |A|, and since |B| = |C| − 1, we get δ'(x) = |B| − |A| = −1.
If δ(y) ≠ 1, it means max(|B|, |C|) = |B|; taking this into (4.21) yields |B| − |A| = 0, so δ'(x) = 0.    (4.24)

Summarizing these 2 cases, we get the relationship between δ'(x) and δ(y) as follows:

δ'(x) = { −1 : δ(y) = 1
           0 : otherwise }    (4.25)

For δ'(z), according to the definition, it is equal to

δ'(z) = |D| − |C|              {δ(z) = −1 = |D| − |y|}
      = |y| − |C| − 1          {|y| = 1 + max(|B|, |C|)}
      = max(|B|, |C|) − |C|    (4.26)

δ'(z) = { 1 : δ(y) = −1
          0 : otherwise }    (4.27)

Finally, for δ'(y), we deduce it as below:

δ'(y) = |z'| − |x'|
      = max(|C|, |D|) − max(|A|, |B|)    (4.28)

There are three cases.
- If δ(y) = 0, then |B| = |C|, and by (4.21) |A| = |B| = |C|; also |D| = |y| − 1 = |C|, thus {|C| = |D|}, and max(|C|, |D|) = max(|A|, |B|), so δ'(y) = 0.

- If δ(y) = 1, then {From (4.25): δ'(x) = −1 ⇒ |B| − |A| = −1} and {δ(y) = 1 ⇒ |C| − |B| = 1}, so |C| = |A|; with |D| = |y| − 1 = |C| we get max(|C|, |D|) = |C| = |A| = max(|A|, |B|), so δ'(y) = 0.

- If δ(y) = −1, then {|A| = |B|} and {From (4.27): |D| − |C| = 1} together with {δ(y) = −1 ⇒ |C| − |B| = −1} give max(|C|, |D|) = |D| = |B| = max(|A|, |B|), so δ'(y) = 0.

Summarizing, the right-left lean case adjusts the balance factors as follows:

δ'(x) = { −1 : δ(y) = 1
           0 : otherwise }

δ'(y) = 0

δ'(z) = { 1 : δ(y) = −1
          0 : otherwise }    (4.29)
Left-right lean case

The left-right lean case is symmetric to the right-left lean case. By similar deduction, we can find that the new balance factors are identical to the result in (4.29).
4.3.2 Pattern Matching

All the sub-problems have been solved, and it's time to define the final pattern matching fixing function.
balance(T, ΔH) = { (node(node(A, x, B, δ(x)), y, node(C, z, D, 0), 0), 0)      : Pll(T)
                   (node(node(A, x, B, 0), y, node(C, z, D, δ(z)), 0), 0)      : Prr(T)
                   (node(node(A, x, B, δ'(x)), y, node(C, z, D, δ'(z)), 0), 0) : Prl(T) ∨ Plr(T)
                   (T, ΔH)                                                     : otherwise }    (4.30)

where Pll(T) means the pattern of tree T is left-left lean, and so on. δ'(x) and δ'(z) are defined in (4.29). The four patterns are tested as below:

Pll(T) : T = (((A, x, B, δ(x)), y, C, −1), z, D, −2)
Prr(T) : T = (A, x, (B, y, (C, z, D, δ(z)), 1), 2)
Prl(T) : T = (A, x, ((B, y, C, δ(y)), z, D, −1), 2)
Plr(T) : T = ((A, x, (B, y, C, δ(y)), 1), z, D, −2)    (4.31)
Translating the above function definition to Haskell yields a simple and intuitive program.
balance (Br (Br (Br a x b dx) y c (-1)) z d (-2), _) =
    (Br (Br a x b dx) y (Br c z d 0) 0, 0)
balance (Br a x (Br b y (Br c z d dz) 1) 2, _) =
    (Br (Br a x b 0) y (Br c z d dz) 0, 0)
balance (Br (Br a x (Br b y c dy) 1) z d (-2), _) =
    (Br (Br a x b dx) y (Br c z d dz) 0, 0) where
        dx = if dy ==  1 then -1 else 0
        dz = if dy == -1 then  1 else 0
balance (Br a x (Br (Br b y c dy) z d (-1)) 2, _) =
    (Br (Br a x b dx) y (Br c z d dz) 0, 0) where
        dx = if dy ==  1 then -1 else 0
        dz = if dy == -1 then  1 else 0
balance (t, d) = (t, d)
The insertion algorithm takes time proportional to the height of the tree, and according to the result proved above, its performance is O(lg n), where n is the number of elements stored in the AVL tree.
Verification

One can easily create a function to verify that a tree is an AVL tree. Actually we need to verify two things: first, that it's a binary search tree; second, that it satisfies the AVL property.
We leave the first verification problem as an exercise to the reader.
In order to test whether a binary tree satisfies the AVL property, we can test the difference in height between its two children, and recursively test that both children conform to the AVL property, until we arrive at an empty leaf.
avl?(T) = { True                                  : T = ∅
            avl?(L) ∧ avl?(R) ∧ ||R| − |L|| ≤ 1   : otherwise }    (4.32)

And the height of an AVL tree can also be calculated from the definition:

|T| = { 0                   : T = ∅
        1 + max(|R|, |L|)   : otherwise }    (4.33)
The corresponding Haskell program is given as the following.

isAVL :: (AVLTree a) -> Bool
isAVL Empty = True
isAVL (Br l _ r d) = and [isAVL l, isAVL r, abs (height r - height l) <= 1]

height :: (AVLTree a) -> Int
height Empty = 0
height (Br l _ r _) = 1 + max (height l) (height r)
Exercise 4.1

Write a program to verify that a binary tree is a binary search tree in your favorite programming language. If you choose to use an imperative language, please consider realizing this program without recursion.
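As a hint for the imperative variant, one standard trick is an iterative in-order traversal with an explicit stack, checking that the visited keys come out in strictly increasing order (a sketch with an assumed minimal Node class):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def is_bst(t):
    stack, prev = [], None
    while stack or t is not None:
        while t is not None:          # descend to the leftmost unvisited node
            stack.append(t)
            t = t.left
        t = stack.pop()
        if prev is not None and t.key <= prev:
            return False              # in-order keys must strictly increase
        prev = t.key
        t = t.right
    return True
```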
4.4 Deletion

As we mentioned before, deletion doesn't make significant sense in purely functional settings. As the tree is read-only, it typically serves frequent lookups after being built.
Even if we implement deletion, it actually just re-builds the tree, as we presented in the chapter about red-black trees. We leave the deletion of the AVL tree as an exercise to the reader.
Exercise 4.2

Taking the red-black tree deletion algorithm as an example, write the AVL tree deletion program in a purely functional approach in your favorite programming language.
4.5 Imperative AVL tree algorithm

We have almost finished all the contents about the AVL tree in this chapter. However, it is necessary to show the traditional insert-and-rotate approach as a comparison to the pattern matching algorithm.
Similar to the imperative red-black tree algorithm, the strategy is to first do the insertion as for the binary search tree, then fix the balance problem by rotation, and return the final result.
1: function Insert(T, k)
2:     root ← T
3:     x ← Create-Leaf(k)
4:     δ(x) ← 0
5:     parent ← NIL
6:     while T ≠ NIL do
7:         parent ← T
8:         if k < Key(T) then
9:             T ← Left(T)
10:        else
11:            T ← Right(T)
12:    Parent(x) ← parent
13:    if parent = NIL then             ▷ tree T is empty
14:        return x
15:    else if k < Key(parent) then
16:        Left(parent) ← x
17:    else
18:        Right(parent) ← x
19:    return AVL-Insert-Fix(root, x)
Note that after insertion, the height of the tree may increase, so that the balance factor δ may also change. Inserting on the right side will increase δ by 1, while inserting on the left side will decrease it. By the end of this algorithm, we need to perform bottom-up fixing from node x towards the root.
This is a top-down algorithm. It searches the tree from the root down to the proper position and inserts the new key as a leaf. At the end, it calls the fixing procedure, passing the root and the newly inserted node.
Note that we reuse the same methods set_left() and set_right() as defined in the chapter about red-black trees.
In order to restore the AVL balance property by fixing, we first determine whether the new node was inserted on the left or the right hand side. If it is on the left, the balance factor δ decreases; otherwise it increases. If we denote the new value as δ', there are 3 cases for the relationship between δ and δ':

- If |δ| = 1 and |δ'| = 0, adding the new node makes the tree perfectly balanced; the height of the parent node doesn't change, and the algorithm can terminate.

- If |δ| = 0 and |δ'| = 1, either the height of the left sub-tree or that of the right sub-tree increases; we need to go on checking the upper level of the tree.

- If |δ| = 1 and |δ'| = 2, the AVL property is violated due to the new insertion. We need to perform rotation to fix it.
1: function AVL-Insert-Fix(T, x)
2:     while Parent(x) ≠ NIL do
3:         δ ← δ(Parent(x))
4:         if x = Left(Parent(x)) then
5:             δ' ← δ − 1
6:         else
7:             δ' ← δ + 1
8:         δ(Parent(x)) ← δ'
9:         P ← Parent(x)
10:        L ← Left(x)
11:        R ← Right(x)
12:        if |δ| = 1 and |δ'| = 0 then        ▷ Height doesn't change, terminate
13:            return T
14:        else if |δ| = 0 and |δ'| = 1 then   ▷ Height increases, go on bottom-up
15:            x ← P
16:        else if |δ| = 1 and |δ'| = 2 then   ▷ The AVL property is violated at P
17:            if δ' = 2 then                  ▷ x is the right child of P
18:                if δ(x) = 1 then            ▷ Right-right lean
19:                    δ(P) ← 0                ▷ By the symmetric case of (4.17)–(4.19)
20:                    δ(x) ← 0
21:                    T ← Left-Rotate(T, P)
22:                else                        ▷ Right-left lean, δ(x) = −1
23:                    δy ← δ(L)               ▷ y = L = Left(x); apply (4.29)
24:                    if δy = 1 then
25:                        δ(P) ← −1
26:                    else
27:                        δ(P) ← 0
28:                    δ(L) ← 0
29:                    if δy = −1 then
30:                        δ(x) ← 1
31:                    else
32:                        δ(x) ← 0
33:                    T ← Right-Rotate(T, x)
34:                    T ← Left-Rotate(T, P)
35:            else                            ▷ δ' = −2, x is the left child of P
36:                if δ(x) = −1 then           ▷ Left-left lean
37:                    δ(P) ← 0                ▷ By (4.17)–(4.19)
38:                    δ(x) ← 0
39:                    T ← Right-Rotate(T, P)
40:                else                        ▷ Left-right lean, δ(x) = 1
41:                    δy ← δ(R)               ▷ y = R = Right(x); apply (4.29)
42:                    if δy = 1 then
43:                        δ(x) ← −1
44:                    else
45:                        δ(x) ← 0
46:                    δ(R) ← 0
47:                    if δy = −1 then
48:                        δ(P) ← 1
49:                    else
50:                        δ(P) ← 0
51:                    T ← Left-Rotate(T, x)
52:                    T ← Right-Rotate(T, P)
53:            break
54:    return T
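The same fixing logic can be sketched compactly in Python. For brevity, this sketch recomputes heights on the fly instead of maintaining δ inside each node as the pseudo code does, so it is slower but easier to check; all names in it are assumptions of the sketch:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def height(t):
    return 0 if t is None else 1 + max(height(t.left), height(t.right))

def delta(t):
    # balance factor: height of right sub-tree minus height of left sub-tree
    return height(t.right) - height(t.left)

def rotate_left(t):
    r = t.right
    t.right, r.left = r.left, t
    return r

def rotate_right(t):
    l = t.left
    t.left, l.right = l.right, t
    return l

def fix(t):
    d = delta(t)
    if d < -1:
        if delta(t.left) > 0:          # left-right lean: straighten first
            t.left = rotate_left(t.left)
        return rotate_right(t)         # left-left lean
    if d > 1:
        if delta(t.right) < 0:         # right-left lean: straighten first
            t.right = rotate_right(t.right)
        return rotate_left(t)          # right-right lean
    return t

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    return fix(t)
```

A production implementation would store δ in each node and update it incrementally, as the pseudo code and the C++ node structure earlier in the chapter suggest.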
We skip the AVL tree deletion algorithm and leave it as an exercise to the reader.

Exercise 4.3

Write the deletion algorithm in an imperative approach in your favorite programming language.
4.6 Chapter note
The AVL tree was invented in 1962 by Adelson-Velskii and Landis [3], [4]. The name AVL comes from the two inventors' names. It's earlier than the red-black tree.
It's very common to compare the AVL tree and the red-black tree: both are self-balancing binary search trees, and all the major operations take O(lg n) time for both. From the result of (4.7), the AVL tree is more rigidly balanced, and hence faster than the red-black tree in lookup-intensive applications [3]. However, red-black trees could perform better in cases with frequent insertion and removal.
Many popular self-balancing binary search tree libraries are implemented on top of the red-black tree, such as STL, etc. However, the AVL tree provides an intuitive and effective solution to the balance problem as well.
After this chapter, we'll extend the tree data structure from storing data in nodes to storing information on edges, which leads to Trie and Patricia, etc. If we extend the number of children from two to more, we get the B-tree. These data structures will be introduced next.
Bibliography

[1] Data.Tree.AVL http://hackage.haskell.org/packages/archive/AvlTree/4.2/doc/html/Data-Tree-AVL.html
[2] Chris Okasaki. "FUNCTIONAL PEARLS: Red-Black Trees in a Functional Setting". J. Functional Programming, 1998
[3] Wikipedia. "AVL tree". http://en.wikipedia.org/wiki/AVL_tree
[4] Guy Cousineau, Michel Mauny. "The Functional Approach to Programming". Cambridge University Press; English Ed edition (October 29, 1998). ISBN-13: 978-0521576819
[5] Pavel Grafov. Implementation of an AVL tree in Python. http://github.com/pgrafov/python-avl-tree
Chapter 5
Radix tree, Trie and Patricia

5.1 Introduction
The binary trees introduced so far store information in nodes. It's also possible to store information in edges. Radix trees like Trie and Patricia are important data structures for information retrieval and manipulation. They were invented in the 1960s, and are widely used in compiler design [2] and in bio-informatics, such as DNA pattern matching [3].
Figure 5.1: A radix tree storing the keys 0, 1, 10, 011, 100 and 1011; the keys are represented by edges. The nodes marked with keys in the above figure are only for illustration purpose.
It is very natural to come to the idea: is it possible to represent the key as an integer instead of a string? Because an integer can be written in binary format, such an approach can save space. Another advantage is that it is fast, because we can use bit-wise manipulation in most programming environments.
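For example, extracting the bits of a key in little-endian order (lowest bit first), as the following sections will do, needs only masking and shifting:

```python
def bits(k):
    # little-endian bits of a non-negative integer, lowest bit first
    bs = []
    while k != 0:
        bs.append(k & 1)
        k >>= 1
    return bs
```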
5.2 Integer Trie

The data structure shown in figure 5.1 is often called a binary trie. The trie was invented by Edward Fredkin. The name comes from "retrieval", pronounced /tri:/ by the inventor, while it is pronounced /traɪ/ "try" by other authors [5]. Trie is also called prefix tree. A binary trie is a special binary tree in which the placement of each key is controlled by its bits: each 0 means "go left" and each 1 means "go right" [2].
Because integers can be represented in binary format, it is possible to store integer keys rather than 0, 1 strings. When inserting an integer as a new key to the trie, we change it to binary form, then examine the first bit: if it is 0, we recursively insert the rest of the bits to the left sub-tree; otherwise, if it is 1, we insert into the right sub-tree.
There is a problem when treating the key as an integer. Consider the binary trie shown in figure 5.2. If represented as 0, 1 strings, all three keys ("11", "011" and "0011") are different. But they are identical when turned into integers. Where should we insert decimal 3, for example, to the trie?
One solution is to store the bits in little-endian order, so that no leading zeros are needed. For example, if we insert the single key 1 into an empty trie, the result is a trie with a root and a right leaf; there is only 1 level. Decimal 2 is represented as (01)₂, and decimal 3 is (11)₂ in little-endian binary format. There is no need to add any prefix 0; the position in the trie is uniquely determined.
5.2.1 Definition

In order to define the little-endian binary trie, we can reuse the structure of the binary tree. A binary trie node is either empty or a branch node. The branch node contains a left child, a right child, and an optional value as satellite data. The left sub-tree is encoded as 0 and the right sub-tree is encoded as 1.
data IntTrie a = Empty
| Branch (IntTrie a) (Maybe a) (IntTrie a)
5.2.2 Insertion

Since the key is a little-endian integer, when inserting a key, we take its bits one by one from the lowest. If a bit is 0, we go to the left; otherwise we go to the right for 1. If the child is empty, we need to create a new node. We repeat this up to the last bit of the key.
1: function Insert(T, k, v)
2:     if T = NIL then
3:         T ← Empty-Node
4:     p ← T
5:     while k ≠ 0 do
6:         if Even?(k) then
7:             if Left(p) = NIL then
8:                 Left(p) ← Empty-Node
9:             p ← Left(p)
10:        else
11:            if Right(p) = NIL then
12:                Right(p) ← Empty-Node
13:            p ← Right(p)
14:        k ← ⌊k/2⌋
15:    Data(p) ← v
16:    return T
This algorithm takes 3 arguments: a trie T, a key k, and the satellite data v. The following Python example code implements the insertion algorithm. The satellite data is optional; it is empty by default.

def trie_insert(t, key, value = None):
    if t is None:
        t = IntTrie()
    p = t
    while key != 0:
        if key & 1 == 0:
            if p.left is None:
                p.left = IntTrie()
            p = p.left
        else:
            if p.right is None:
                p.right = IntTrie()
            p = p.right
        key = key >> 1
    p.value = value
    return t
Figure 5.2 shows a trie which is created by inserting the pairs of key and value {1 → a, 4 → b, 5 → c, 9 → d} into the empty trie.
insert(T, k, v) = { (Tl, v, Tr)                     : k = 0
                    (insert(Tl, ⌊k/2⌋, v), d, Tr)   : even(k)
                    (Tl, d, insert(Tr, ⌊k/2⌋, v))   : otherwise }    (5.1)
If the key to be inserted already exists, this algorithm just overwrites the previously stored data. This behavior can be replaced with other alternatives, such as storing the data in a linked list, etc.
The following Haskell example program implements the insertion algorithm.

insert t 0 x = Branch (left t) (Just x) (right t)
insert t k x | even k = Branch (insert (left t) (k `div` 2) x) (value t) (right t)
             | otherwise = Branch (left t) (value t) (insert (right t) (k `div` 2) x)
left (Branch l _ _) = l
left Empty = Empty
right (Branch _ _ r) = r
right Empty = Empty
value (Branch _ v _) = v
value Empty = Nothing
For a given integer k with m bits in binary, the insertion algorithm goes through m levels. The performance is bound to O(m) time.
5.2.3 Look up

To look up a key k in the little-endian integer binary trie, we take each bit of k from the lowest: we go left if the bit is 0; otherwise, we go right. The look up completes when all bits are consumed.
1: function Lookup(T, k)
2:     while k ≠ 0 ∧ T ≠ NIL do
3:         if Even?(k) then
4:             T ← Left(T)
5:         else
6:             T ← Right(T)
7:         k ← ⌊k/2⌋
8:     if T ≠ NIL then
9:         return Data(T)
10:    else
11:        return not found
The Python example code below uses bit-wise operations to implement the look up algorithm.
def lookup(t, key):
while key != 0 and (t is not None):
if key & 1 == 0:
t = t.left
else:
t = t.right
key = key>>1
if t is not None:
return t.value
else:
return None
Looking up can also be defined in a recursive manner. If the tree is empty, the look up fails; if k = 0, the satellite data is the result to be found; if the last bit is 0, we recursively look up the left child; otherwise, we look up the right child.

lookup(T, k) = { ∅                 : T = ∅
                 d                 : k = 0
                 lookup(Tl, ⌊k/2⌋) : even(k)
                 lookup(Tr, ⌊k/2⌋) : otherwise }    (5.2)
5.3 Integer Patricia

The trie has some drawbacks: it wastes a lot of space. Note that in figure 5.2, only the leaves store the real data. It's very common that an integer binary trie contains many nodes that have only one child. One improvement idea is to compress the chained nodes together. Patricia is such a data structure, invented by Donald R. Morrison in 1968. Patricia means "practical algorithm to retrieve information coded in alphanumeric" [3]. It is another kind of prefix tree.
Okasaki gives an implementation of integer Patricia in [2]. If we merge together the chained nodes which have only one child in figure 5.3, we get the Patricia shown in figure 5.4.
Figure 5.4: The Patricia obtained by compressing the chained nodes of the trie for {1 → a, 4 → b, 5 → c, 9 → d}.
MSB are omitted to save space. Okasaki lists some significant advantages of
big-endian Patricia[2].
5.3.1 Definition

In order to tell from which bit the left and right children differ, a mask is recorded in the branch node. Typically, a mask is a power of 2, i.e. 2^n for some non-negative integer n; all bits lower than n don't belong to the common prefix.
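With this convention, the two predicates used by the insertion equation later in this section can be sketched as follows (the names maskbit, match, and zero are modeled after Okasaki's integer Patricia and should be treated as assumptions of this sketch):

```python
def maskbit(x, mask):
    # clear all the bits lower than the mask
    return x & ~(mask - 1)

def match(key, prefix, mask):
    # does the key share the common prefix above the mask?
    return maskbit(key, mask) == prefix

def zero(key, mask):
    # test the branching bit: the highest bit below the mask
    return key & (mask >> 1) == 0
```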
The following Python example code defines Patricia as well as some auxiliary functions.

class IntTree:
    def __init__(self, key = None, value = None):
        self.key = key
        self.value = value
        self.prefix = self.mask = None
        self.left = self.right = None

    def set_children(self, l, r):
        self.left = l
        self.right = r

    def replace_child(self, x, y):
        if self.left == x:
            self.left = y
        else:
            self.right = y

    def is_leaf(self):
        return self.left is None and self.right is None

    def get_prefix(self):
        if self.prefix is None:
            return self.key
        else:
            return self.prefix
5.3.2 Insertion

When inserting a key, if the tree is empty, we can just create a leaf node with the given key and satellite data, as shown in figure 5.5.

Figure 5.5: Left: the empty tree; Right: After inserting key 12.
If the tree is just a singleton leaf node x, we can create a new leaf y and put the key and data into it. After that, we need to create a new branch node and set x and y as its two children. In order to determine whether y should be the left or the right child, we need to find the longest common prefix of x and y. For example, if key(x) is 12 ((1100)₂ in binary) and key(y) is 15 ((1111)₂ in binary), then the longest common prefix is (11oo)₂, where o denotes the bits we don't care about. We can use another integer to mask those bits. In this case, the mask number is 4 ((100)₂ in binary). The next bit after the longest common prefix represents 2¹. This bit is 0 in key(x), while it is 1 in key(y). So we set x as the left child and y as the right child. Figure 5.6 shows this example.

Figure 5.6: Left: A tree with a singleton leaf 12; Right: After inserting key 15 (prefix = (1100)₂, mask = (100)₂; 12 on the left, 15 on the right).
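The longest common prefix and its mask can be computed with bit-wise operations; the helper below is a sketch (the book defines an equivalent lcp as part of the full insertion program, which appears later):

```python
def lcp(k1, k2):
    # mask: the smallest power of 2 strictly greater than all differing bits
    mask = 1
    while mask <= (k1 ^ k2):
        mask <<= 1
    # the prefix keeps only the bits at or above the mask
    return (k1 & ~(mask - 1), mask)
```

For keys 12 and 15 this yields prefix 12 ((1100)₂) and mask 4 ((100)₂), matching the example above.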
In case the tree is neither empty nor a singleton leaf, we first need to check if
the key to be inserted matches the longest common prefix recorded in the root.
Then we recursively insert the key to the left or the right child according to the
next bit after the common prefix. For example, if we insert key 14 ((1110)2 in binary)
to the result tree in figure 5.6, since the common prefix is (11oo)2, and the next
bit (the bit of 2^1) is 1, we need to recursively insert to the right child.
If the key to be inserted doesn't match the longest common prefix stored in
the root, we need to branch a new leaf out. Figure 5.7 shows these two different
cases.
For a given key k and value v, denote by (k, v) the leaf node. For a branch
node, denote it in the form (p, m, Tl, Tr), where p is the longest common prefix,
m is the mask, and Tl and Tr are the left and right children. Summarizing the above
cases, the insertion algorithm can be defined as below.
Figure 5.7: (a) Insert key 14 ((1110)2). It matches the longest common prefix
(1100)2; we recursively insert it to the right child. (b) Insert key 5. It doesn't
match the longest common prefix (1100)2; a new leaf is branched out.
insert(T, k, v) =
    (k, v)                        : T = Φ, or T = (k, v')
    join(k, (k, v), k', T)        : T = (k', v'), k' ≠ k
    (p, m, insert(Tl, k, v), Tr)  : T = (p, m, Tl, Tr), match(k, p, m), zero(k, m)
    (p, m, Tl, insert(Tr, k, v))  : T = (p, m, Tl, Tr), match(k, p, m), ¬zero(k, m)
    join(k, (k, v), p, T)         : otherwise
                                                                        (5.3)
The first clause deals with the edge cases: either T is empty, or it is a
leaf node with the same key. The algorithm overwrites the previous value in
the latter case.
The second clause handles the case that T is a leaf node, but with a different
key. Here we need to branch out another new leaf. We need to extract the longest
common prefix, and determine which leaf should be set as the left child and which
as the right child. Function join(k1, T1, k2, T2) does this work.
We'll define it later.
The third clause deals with the case that T is a branch node, the longest
common prefix matches the key to be inserted, and the next bit after the common
prefix is zero. Here we need to recursively insert to the left child.
The fourth clause handles the similar case as the third clause, except that
the next bit after the common prefix is one, not zero. We need to recursively
insert to the right child.
The last clause is for the case that the key to be inserted doesn't match the
longest common prefix stored in the branch. We need to branch out a new leaf by
calling the join function.
We need to define function match(k, p, m) to test if the key k has the same
prefix p above the masked bits m. For example, suppose the prefix stored in a
branch node is (pn pn-1 ... pi ... p0)2 in binary, key k is (kn kn-1 ... ki ... k0)2 in binary,
and the mask is (100...0)2 = 2^i. They match if and only if pj = kj for all j
such that i ≤ j ≤ n.
One solution to realize match is to test if mask(k, m) = p is satisfied, where
mask(x, m) = ¬(m - 1) & x; that is, we perform bitwise-not of m - 1, then perform
bitwise-and with x.
Function zero(k, m) tests if the next bit after the common prefix is zero. With the
help of the mask m, we can shift m one bit to the right, then perform bitwise-and
with the key.
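These bit tricks can be checked against the running example, keys 12 ((1100)2) and 15 ((1111)2) with mask 4. The sketch below uses the equational form match(k, p, m); the Python helpers given later take a tree node instead:

```python
def maskbit(x, mask):
    # clear every bit below the mask: x & ~(mask - 1)
    return x & (~(mask - 1))

def match(key, prefix, mask):
    # the key matches if its bits above the mask equal the stored prefix
    return maskbit(key, mask) == prefix

def zero(key, mask):
    # test the first bit after the common prefix, i.e. the bit of mask >> 1
    return key & (mask >> 1) == 0

prefix, mask = maskbit(12, 4), 4       # the common prefix of 12 and 15 is (11oo)2
print(match(15, prefix, mask))         # True: 15 shares the prefix
print(zero(12, mask), zero(15, mask))  # True False: 12 goes left, 15 right
```

The masked prefix of both 12 and 15 is 12, and the bit of 2^1 distinguishes them, exactly as in figure 5.6.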
zero(k, m) = (k & (m >> 1)) = 0
                                                                        (5.4)
If the mask m = (100...0)2 = 2^i and k = (kn kn-1 ... ki 1 ... k0)2, because the bit
next to ki is 1, zero(k, m) returns false; if k = (kn kn-1 ... ki 0 ... k0)2, then
the result is true.
Function join(p1, T1, p2, T2) takes two different prefixes and trees. It extracts
the longest common prefix of p1 and p2, creates a new branch node, and sets T1
and T2 as the two children.
join(p1, T1, p2, T2) =
    (p, m, T1, T2) : zero(p1, m)
    (p, m, T2, T1) : otherwise
where (p, m) = lcp(p1, p2)
                                                                        (5.6)
The insertion can also be realized imperatively. We walk down the tree as long
as the branch prefixes match the key; when the walk stops, we either overwrite
a leaf with the same key, or branch out a new leaf and re-link it to the parent.

1: function Insert(T, k, v)
2:   if T = NIL then
3:     return Create-Leaf(k, v)
4:   y ← T, p ← NIL
5:   while y is not leaf, and Match(k, Prefix(y), Mask(y)) do
6:     p ← y
7:     if Zero?(k, Mask(y)) then
8:       y ← Left(y)
9:     else
10:      y ← Right(y)
11:  if y is leaf, and k = Key(y) then
12:    Data(y) ← v
13:  else
14:    z ← Branch(y, Create-Leaf(k, v))
15:    if p = NIL then
16:      return z
17:    Replace-Child(p, y, z)
18:  return T
The auxiliary functions match, branch, lcp, etc. are given below.
def maskbit(x, mask):
    return x & (~(mask - 1))

def match(key, tree):
    return (not tree.is_leaf()) and maskbit(key, tree.mask) == tree.prefix

def zero(x, mask):
    return x & (mask >> 1) == 0

def lcp(p1, p2):
    diff = (p1 ^ p2)
    mask = 1
    while diff != 0:
        diff >>= 1
        mask <<= 1
    return (maskbit(p1, mask), mask)

def branch(t1, t2):
    t = IntTree()
    (t.prefix, t.mask) = lcp(t1.get_prefix(), t2.get_prefix())
    if zero(t1.get_prefix(), t.mask):
        t.set_children(t1, t2)
    else:
        t.set_children(t2, t1)
    return t
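Combining the IntTree class and the helpers above, the top-level insert (and the lookup described in the next section) can be sketched as below. This is a sketch following the imperative description, not the book's own listing; the class and helper definitions are repeated so it runs standalone:

```python
class IntTree:
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prefix = self.mask = None
        self.left = self.right = None
    def set_children(self, l, r):
        self.left, self.right = l, r
    def replace_child(self, x, y):
        if self.left == x:
            self.left = y
        else:
            self.right = y
    def is_leaf(self):
        return self.left is None and self.right is None
    def get_prefix(self):
        return self.key if self.prefix is None else self.prefix

def maskbit(x, mask):
    return x & (~(mask - 1))

def match(key, tree):
    return (not tree.is_leaf()) and maskbit(key, tree.mask) == tree.prefix

def zero(x, mask):
    return x & (mask >> 1) == 0

def lcp(p1, p2):
    diff, mask = p1 ^ p2, 1
    while diff != 0:
        diff >>= 1
        mask <<= 1
    return (maskbit(p1, mask), mask)

def branch(t1, t2):
    t = IntTree()
    t.prefix, t.mask = lcp(t1.get_prefix(), t2.get_prefix())
    if zero(t1.get_prefix(), t.mask):
        t.set_children(t1, t2)
    else:
        t.set_children(t2, t1)
    return t

def insert(t, key, value=None):
    if t is None:
        return IntTree(key, value)
    node, parent = t, None
    # walk down while the branch prefixes match the key
    while (not node.is_leaf()) and match(key, node):
        parent = node
        node = node.left if zero(key, node.mask) else node.right
    if node.is_leaf() and key == node.key:
        node.value = value          # same key: overwrite
        return t
    new = branch(node, IntTree(key, value))
    if parent is None:
        return new                  # branched out above the root
    parent.replace_child(node, new)
    return t

def lookup(t, key):
    if t is None:
        return None
    while (not t.is_leaf()) and match(key, t):
        t = t.left if zero(key, t.mask) else t.right
    return t.value if t.is_leaf() and t.key == key else None

# reproduce figure 5.8: insert 1->x, 4->y, 5->z
t = None
for k, v in [(1, "x"), (4, "y"), (5, "z")]:
    t = insert(t, k, v)
```

After the three insertions, the root is a branch with prefix 0 and mask 8, matching figure 5.8.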
Figure 5.8 shows the example Patricia created with the insertion algorithm.
Figure 5.8: The example Patricia tree, with keys 1, 4 and 5 bound to values x,
y and z. The root is a branch with prefix 0 and mask 8; its left child is the
leaf 1:x, and its right child is a branch with prefix (100)2 and mask 2, holding
the leaves 4:y and 5:z.
5.3.3
Look up
Consider the property of the integer Patricia tree. When looking up a key, if it has a
common prefix with the root, then we check the next bit. If this bit is zero, we
recursively look up the left child; otherwise, if the bit is one, we look up
the right child.
When we reach a leaf node, we can directly check if the key of the leaf is equal
to what we are looking for. This algorithm can be described with the following
pseudo code.
1: function Look-Up(T, k)
2:   if T = NIL then
3:     return NIL                ▷ Not found
4:   while T is not leaf, and Match(k, Prefix(T), Mask(T)) do
5:     if Zero?(k, Mask(T)) then
6:       T ← Left(T)
7:     else
8:       T ← Right(T)
9:   if T is leaf, and Key(T) = k then
10:    return Data(T)
11:  else
12:    return NIL                ▷ Not found
Below Python example program implements the looking up algorithm.

def lookup(t, key):
    if t is None:
        return None
    while (not t.is_leaf()) and match(key, t):
        if zero(key, t.mask):
            t = t.left
        else:
            t = t.right
    if t.is_leaf() and t.key == key:
        return t.value
    else:
        return None
The looking up algorithm can also be defined recursively:

lookup(T, k) =
    Φ             : T = Φ, or (T = (k', v), k' ≠ k)
    v             : T = (k', v), k' = k
    lookup(Tl, k) : T = (p, m, Tl, Tr), match(k, p, m), zero(k, m)
    lookup(Tr, k) : T = (p, m, Tl, Tr), match(k, p, m), ¬zero(k, m)
    Φ             : otherwise
                                                                        (5.7)

The following Haskell example program implements this recursive looking
up algorithm.

search t k = case t of
    Empty -> Nothing
    Leaf k' x -> if k == k' then Just x else Nothing
    Branch p m l r
        | match k p m -> if zero k m then search l k
                         else search r k
        | otherwise -> Nothing
5.4
Alphabetic Trie
Integer-based trie and Patricia trees can be a good starting point. Such techniques
play an important role in compiler implementation. Okasaki pointed out that the
widely used Glasgow Haskell Compiler, GHC, utilized a similar implementation for several years before 1998 [2].
If we extend the key from integer to alphabetic value, trie and Patricia trees
can be very powerful in solving textual manipulation problems.
5.4.1
Definition
It's not enough to just use the left and right children to represent alphabetic
keys. Using English for example, there are 26 letters and each can be lower or
upper case. If we don't care about the case, one solution is to limit the number
of branches (children) to 26. Some simplified ANSI C implementations of trie
are defined by using an array of 26 letters. This can be illustrated as in figure
5.9.
Not all of the 26 branches contain data. For instance, in figure 5.9, the root
only has three non-empty branches representing letters 'a', 'b', and 'z'. Other
branches, such as the one for letter 'c', are all empty. We don't show empty branches in
the rest of this chapter.
When dealing with case-sensitive problems, or handling languages other than English,
there can be more than 26 letters. The problem of the dynamic number of sub-branches
can be solved by using a collection data structure, such as a hash table or a
map.
An alphabetic trie is either empty or a node. There are two types of node:
A leaf node doesn't have any sub-trees;
A branch node contains multiple sub-trees. Each sub-tree is bound to a
character.
Both leaf and branch may contain optional satellite data. The following
Haskell code shows the example definition.
data Trie a = Trie { value :: Maybe a
, children :: [(Char, Trie a)]}
empty = Trie Nothing []
Below ANSI C code defines the alphabetic trie. For illustration purpose
only, we limit the character set to the lower case English letters, from 'a' to 'z'.
Figure 5.9: A trie with 26 branches, containing keys a, an, another, bool,
boy and zoo.
struct Trie {
    struct Trie* children[26];
    void* data;
};
5.4.2
Insertion
When inserting a string as the key, starting from the root, we pick the characters one
by one from the string, and examine which child represents each character. If the
corresponding child is empty, a new empty node is created. After that, the next
character is used to select the proper grandchild.
We repeat this process for all the characters, and finally store the optional
satellite data in the node we arrive at.
Below pseudo code describes the insertion algorithm.
1: function Insert(T, k, v)
2:   if T = NIL then
3:     T ← Empty-Node
4:   p ← T
5:   for each c in k do
6:     if Children(p)[c] = NIL then
7:       Children(p)[c] ← Empty-Node
8:     p ← Children(p)[c]
9:   Data(p) ← v
10:  return T
The following example ANSI C program implements the insertion algorithm.

struct Trie* insert(struct Trie* t, const char* key, void* value) {
    int c;
    struct Trie* p;
    if (!t)
        t = create_node();
    for (p = t; *key; ++key, p = p->children[c]) {
        c = *key - 'a';
        if (!p->children[c])
            p->children[c] = create_node();
    }
    p->data = value;
    return t;
}
Where function create_node creates a new empty node, with all children initialized to empty.
struct Trie* create_node() {
    struct Trie* t = (struct Trie*) malloc(sizeof(struct Trie));
    int i;
    for (i = 0; i < 26; ++i)
        t->children[i] = NULL;
    t->data = NULL;
    return t;
}
The insertion can also be realized in a recursive way. Denote the key to be
inserted as K = k1 k2 ... kn, where ki is the i-th character, and K' is the rest of the
characters except k1. v' is the satellite data to be inserted. The trie is in the form
T = (v, C), where v is the satellite data and C = {(c1, T1), (c2, T2), ..., (cm, Tm)} is
the map of children, mapping from character ci to sub-tree Ti. If T is empty,
then C is also empty.

insert(T, K, v') =
    (v', C)                  : K = Φ
    (v, ins(C, k1, K', v'))  : otherwise
                                                                        (5.8)

If the key is empty, the satellite data is overwritten with v'. Otherwise,
function ins locates the child bound to the first character and recursively
inserts the rest of the key, creating a new child if none is bound to it. For
C = {(c1, T1)} ∪ C':

ins(C, c, K', v') =
    {(c, insert(Φ, K', v'))}          : C = Φ
    {(c1, insert(T1, K', v'))} ∪ C'   : c1 = c
    {(c1, T1)} ∪ ins(C', c, K', v')   : otherwise
                                                                        (5.9)
5.4.3
Look up
To look up a key, we also extract the characters from the key one by one. For each
character, we search among the children to see if there is a branch matching this
character. If there is no such child, the look up process terminates immediately
to indicate the "not found" error. When we reach the last character of the key,
the data stored in the current node is what we are looking for.
1: function Look-Up(T, key)
2:   if T = NIL then
3:     return not found
4:   for each c in key do
5:     if Children(T)[c] = NIL then
6:       return not found
7:     T ← Children(T)[c]
8:   return Data(T)
Below ANSI C program implements the look up algorithm. It returns NULL
to indicate the "not found" error.

void* lookup(struct Trie* t, const char* key) {
    if (!t)
        return NULL;
    for (; *key; ++key) {
        int c = *key - 'a';
        if (!t->children[c])
            return NULL;
        t = t->children[c];
    }
    return t->data;
}
The look up algorithm can also be realized in a recursive manner. When looking
up a key, we start from the first character. If it is bound to some child, we
then recursively search the rest of the characters in that child. Denote the trie as
T = (v, C), and the key being searched as K = k1 k2 ... kn if it isn't empty. The first
character in the key is k1, and the rest of the characters are denoted as K'.
lookup(T, K) =
    v              : K = Φ
    Φ              : find(C, k1) = Φ
    lookup(T', K') : find(C, k1) = T'
                                                                        (5.10)

Where function find examines the children one by one. For C = {(k1, T1)} ∪ C':

find(C, k) =
    Φ           : C = Φ
    T1          : k1 = k
    find(C', k) : otherwise
                                                                        (5.11)
The following Haskell example program implements the trie looking up algorithm. It uses the lookup function provided in the standard library.

find t [] = value t
find t (k:ks) = case lookup k (children t) of
    Nothing -> Nothing
    Just t' -> find t' ks
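For comparison with the Haskell and C versions, the alphabetic trie can also be sketched in Python, with a dict for the children (an illustrative sketch, not a listing from the text):

```python
class Trie:
    def __init__(self, value=None):
        self.value = value
        self.children = {}      # maps a character to a sub-trie

def insert(t, key, value=None):
    if t is None:
        t = Trie()
    p = t
    for c in key:               # walk down, creating empty nodes on demand
        if c not in p.children:
            p.children[c] = Trie()
        p = p.children[c]
    p.value = value
    return t

def lookup(t, key):
    if t is None:
        return None
    for c in key:
        if c not in t.children:
            return None         # not found
        t = t.children[c]
    return t.value

# the trie of figure 5.9
t = None
for w in ["a", "an", "another", "bool", "boy", "zoo"]:
    t = insert(t, w, w.upper())
```

Note that looking up "bo" returns nothing: the node exists on the path to "bool" and "boy", but carries no satellite data.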
Exercise 5.1
Develop an imperative trie by using a collection data structure to manage the multiple sub-trees in the alphabetic trie.
5.5
Alphabetic Patricia
Similar to the integer trie, the alphabetic trie is not memory efficient. We can use the
same method to compress the alphabetic trie to a Patricia tree.
5.5.1
Definition
An alphabetic Patricia tree is a special prefix tree where each node contains multiple
branches. All children of a node share the longest common prefix string. As a
result, there is no node with only one child, because that would conflict with the longest
common prefix property.
Figure 5.10: A Patricia prefix tree, with keys: a, an, another, bool, boy
and zoo.
If we turn the trie shown in figure 5.9 into a Patricia tree by compressing all nodes
which have only one child, we get the Patricia prefix tree shown in figure 5.10.
We can modify the definition of the alphabetic trie a bit to adapt it to Patricia.
The Patricia tree is either empty, or a node in the form T = (v, C), where v is the
optional satellite data, and C = {(s1, T1), (s2, T2), ..., (sn, Tn)} is a list of pairs.
Each pair contains a string si, which is bound to a sub-tree Ti.
The following Haskell example code defines Patricia accordingly.
type Key = String
data Patricia a = Patricia { value :: Maybe a
, children :: [(Key, Patricia a)]}
empty = Patricia Nothing []
Below Python code reuses the definition for trie to define Patricia.
class Patricia:
    def __init__(self, value = None):
        self.value = value
        self.children = {}
5.5.2
Insertion
When inserting a key s, if the Patricia tree is empty, we create a leaf node as shown
in figure 5.11 (a). Otherwise, we need to check the children. If there is some sub-tree
Ti bound to a string si such that si and s have a common prefix, we need
to branch out a new leaf Tj. The method is to create a new internal
branch node bound to the common prefix, then set Ti as one child of this
branch, and Tj as the other child. Ti and Tj share the common prefix. This is
shown in figure 5.11 (b). However, there are two special cases, because s may
be a prefix of si, as shown in figure 5.11 (c), and si may be a prefix of s, as
in figure 5.11 (d).
The insertion algorithm can be described as below.
Figure 5.11: (a) Insert key "boy" into the empty Patricia tree; the result is a leaf.
(b) A new leaf is branched out, sharing the common prefix with the existing
sub-tree. (c) The inserted key is a prefix of an existing branch. (d) Insert
"another" into the node with prefix "an"; we recursively insert key "other" to
the child.
1: function Insert(T, k, v)
2:   if T = NIL then
3:     T ← Empty-Node
4:   p ← T
5:   loop
6:     match ← FALSE
7:     for each (si, Ti) ∈ Children(p) do
8:       if k = si then
9:         Value(Ti) ← v
10:        return T
11:      c ← LCP(k, si)
12:      k1 ← k - c
13:      k2 ← si - c
14:      if c ≠ NIL then
15:        match ← TRUE
16:        if k2 = NIL then          ▷ si is prefix of k
17:          p ← Ti
18:          k ← k1
19:          break
20:        else                      ▷ Branch out a new leaf
21:          Children(p) ← Children(p) ∪ { (c, Branch(k1, v, k2, Ti)) }
22:          Delete(Children(p), (si, Ti))
23:          return T
24:    if ¬match then                ▷ Add a new leaf
25:      Children(p) ← Children(p) ∪ { (k, Create-Leaf(v)) }
26:      return T
In the above algorithm, the LCP function finds the longest common prefix of the
two given strings. For example, strings "bool" and "boy" have the longest common
prefix "bo". The subtraction symbol '-' for strings gives the different part of two
strings. For example, "bool" - "bo" = "ol". The Branch function creates a branch
node and updates keys accordingly.
The longest common prefix can be extracted by checking the characters in
the two strings one by one, until two characters don't match.
1: function LCP(A, B)
2:   i ← 1
3:   while i ≤ |A| ∧ i ≤ |B| ∧ A[i] = B[i] do
4:     i ← i + 1
5:   return A[1...i-1]
There are two cases when branching out a new leaf. Branch(s1, T1, s2, T2)
takes two different keys and two trees. If s1 is empty, we are dealing with a case
such as inserting key "an" into a child bound to string "another". We set T2 as a
child of T1. Otherwise, we create a new branch node and set T1 and T2 as its
two children.
1: function Branch(s1, T1, s2, T2)
2:   if s1 = Φ then
3:     Children(T1) ← Children(T1) ∪ {(s2, T2)}
4:     return T1
5:   T ← Empty-Node
6:   Children(T) ← {(s1, T1), (s2, T2)}
7:   return T
The following example Python program implements the Patricia insertion
algorithm.
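The listing might look as follows. This is a sketch that follows the pseudo code, with the Patricia class and the lcp/branch helpers repeated so it runs standalone; a fresh Patricia(value) plays the role of Create-Leaf:

```python
class Patricia:
    def __init__(self, value=None):
        self.value = value
        self.children = {}

def lcp(s1, s2):
    # returns (p, s1 - p, s2 - p), where p is the longest common prefix
    j = 0
    while j < len(s1) and j < len(s2) and s1[j] == s2[j]:
        j += 1
    return (s1[:j], s1[j:], s2[j:])

def branch(key1, tree1, key2, tree2):
    if key1 == "":
        # example: insert "an" into "another"
        tree1.children[key2] = tree2
        return tree1
    t = Patricia()
    t.children[key1] = tree1
    t.children[key2] = tree2
    return t

def insert(t, key, value=None):
    if t is None:
        t = Patricia()
    node = t
    while True:
        match = False
        for k, tr in node.children.items():
            if key == k:                  # duplicated key: overwrite
                tr.value = value
                return t
            prefix, k1, k2 = lcp(key, k)
            if prefix != "":
                match = True
                if k2 == "":              # k is a prefix of key: descend
                    node = tr
                    key = k1
                    break
                else:                     # branch out a new leaf
                    node.children[prefix] = branch(k1, Patricia(value), k2, tr)
                    del node.children[k]
                    return t
        if not match:                     # add a new leaf
            node.children[key] = Patricia(value)
            return t

t = None
for w in ["a", "an", "another", "bool", "boy", "zoo"]:
    t = insert(t, w, w)
```

After the six insertions, the root has the three children "a", "bo" and "zoo", matching the shape of figure 5.10.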
Where the functions to find the longest common prefix and to branch out are
implemented as below.

# returns (p, s1', s2'), where p is the lcp, s1' = s1 - p, s2' = s2 - p
def lcp(s1, s2):
    j = 0
    while j < len(s1) and j < len(s2) and s1[j] == s2[j]:
        j += 1
    return (s1[0:j], s1[j:], s2[j:])

def branch(key1, tree1, key2, tree2):
    if key1 == "":
        # example: insert "an" into "another"
        tree1.children[key2] = tree2
        return tree1
    t = Patricia()
    t.children[key1] = tree1
    t.children[key2] = tree2
    return t
The insertion can also be realized recursively. Starting from the root, the program checks all the children to find if there is a node matching the key. Matching
means they have a common prefix. For duplicated keys, the program overwrites the previous value. There are also alternative solutions to handle duplicated
keys, such as using a linked-list, etc. If no child matches the key, the
program creates a new leaf and adds it to the children.
For Patricia T = (v, C), function insert(T, k, v') inserts key k and value v'
to the tree.

insert(T, k, v') = (v, ins(C, k, v'))
                                                                        (5.12)

For children C = {(k1, T1)} ∪ C':

ins(C, k, v') =
    {(k, (v', Φ))}                  : C = Φ
    {(k, (v', C_T1))} ∪ C'          : k1 = k
    {branch(k, v', k1, T1)} ∪ C'    : match(k1, k)
    {(k1, T1)} ∪ ins(C', k, v')     : otherwise
                                                                        (5.13)
The first clause deals with the edge case of empty children: a leaf node
containing v', bound to k, is returned as the only child. The second
clause overwrites the previous value with v' if there is some child bound to the
same key. Note that C_T1 means the children of sub-tree T1. The third clause
branches out a new leaf if the first child matches the key k. The last clause goes
on checking the rest of the children.
We define two keys A and B as matching if they have a non-empty common
prefix:

match(A, B) = (A ≠ Φ) ∧ (B ≠ Φ) ∧ (a1 = b1)
                                                                        (5.14)

Where a1 and b1 are the first characters in A and B if they are not empty.
Function branch(k1, v, k2, T2) takes two keys, a value and a tree. Extract the
longest common prefix k = lcp(k1, k2), and denote the different parts as k1' = k1 - k,
k2' = k2 - k. The algorithm firstly handles the edge cases that either k1 is a
prefix of k2, or k2 is a prefix of k1. For the former case, it creates a new node
containing v, binds this node to k, and sets (k2', T2) as the only child; for the latter
case, it recursively inserts k1' and v to T2. Otherwise, the algorithm creates a
branch node, binds it to the longest common prefix k, and sets two children for
it: one child is (k2', T2), the other is a leaf node containing v, bound
to k1'.

branch(k1, v, k2, T2) =
    (k, (v, {(k2', T2)}))                      : k = k1
    (k, insert(T2, k1', v))                    : k = k2
    (k, (Φ, {(k1', (v, Φ)), (k2', T2)}))       : otherwise
                                                                        (5.15)
And function lcp(A, B) keeps taking the same characters from A and B one by
one. Denote a1 and b1 as the first characters in A and B if they are not empty,
and A', B' as the rest parts except for the first characters.

lcp(A, B) =
    Φ                   : A = Φ ∨ B = Φ ∨ a1 ≠ b1
    {a1} ∪ lcp(A', B')  : a1 = b1
                                                                        (5.16)
5.5.3
Look up
When looking up a key, we can't examine the characters one by one as in the trie. Starting
from the root, we need to search among the children to see if any one is bound
to a prefix of the key. If there is such a child, we update the key by removing
the prefix part, and recursively look up the updated key in this child. If no
child is bound to any prefix of the key, the looking up fails.
1: function Look-Up(T, k)
2:   if T = NIL then
3:     return not found
4:   repeat
5:     match ← FALSE
6:     for (ki, Ti) ∈ Children(T) do
7:       if k = ki then
8:         return Data(Ti)
9:       if ki is prefix of k then
10:        match ← TRUE
11:        k ← k - ki
12:        T ← Ti
13:        break
14:  until ¬match
15:  return not found
Below Python example program implements the looking up algorithm. It
reuses the lcp(s1, s2) function defined previously to test if a string is a
prefix of the other.
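A sketch of such a program is given below. Here Python's startswith stands in for the lcp-based prefix test, and the tree is hand-built to match a fragment of figure 5.10:

```python
class Patricia:
    def __init__(self, value=None, children=None):
        self.value = value
        self.children = children if children is not None else {}

def lookup(t, key):
    if t is None:
        return None
    while True:
        match = False
        for k, tr in t.children.items():
            if k == key:               # exact match: done
                return tr.value
            if key.startswith(k):      # k is a proper prefix of key
                match = True
                key = key[len(k):]     # strip the matched prefix and descend
                t = tr
                break
        if not match:
            return None

# a hand-built fragment of figure 5.10
t = Patricia(children={
        "bo": Patricia(children={"ol": Patricia("bool"),
                                 "y": Patricia("boy")}),
        "zoo": Patricia("zoo")})
```

Looking up "bo" returns nothing, since the internal "bo" node carries no satellite data.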
The looking up algorithm can also be defined recursively. For Patricia T = (v, C),
the work is delegated to the children:

lookup(T, k) = find(C, k)
                                                                        (5.17)

If C is empty, the looking up fails; otherwise, for C = {(k1, T1), (k2, T2), ..., (kn, Tn)},
we firstly examine k against k1, and if that doesn't yield a result, we recursively
check the rest of the pairs, denoted as C'.

find(C, k) =
    Φ                   : C = Φ
    vT1                 : k = k1
    lookup(T1, k - k1)  : k1 ⊏ k
    find(C', k)         : otherwise
                                                                        (5.18)

Where A ⊏ B means string A is a prefix of B, and vT1 is the value of sub-tree T1.
find mutually calls lookup if some child is bound to a string which is a prefix of
the key.
Below Haskell example program implements the looking up algorithm.

import Data.List

find t k = find' (children t) k where
    find' [] _ = Nothing
    find' (p:ps) k
        | (fst p) == k = value (snd p)
        | (fst p) `isPrefixOf` k = find (snd p) (drop (length (fst p)) k)
        | otherwise = find' ps k
5.6
Applications of trie or Patricia
Trie and Patricia can be used to solve some interesting problems. Integer-based
prefix trees are used in compiler implementation. Some daily used software
applications have many interesting features which can be realized with trie or
Patricia. In this section, some applications are given as examples, including
e-dictionary, word auto-completion, and T9 input method. The commercial implementations typically do not adopt trie or Patricia directly. The solutions we
demonstrate here are for illustration purpose only.
5.6.1
E-dictionary and word auto-completion
Figure 5.12: E-dictionary. All candidates starting with what the user input are
listed.
An e-dictionary typically contains hundreds of thousands of words. It's very expensive to perform a whole-word search. Commercial software adopts complex
approaches, including caching, indexing, etc., to speed up this process.
Similar to the e-dictionary, figure 5.13 shows a popular Internet search engine.
When the user inputs something, it provides a candidate list, with all items starting
with what the user has entered¹. These candidates are shown in the order
of popularity: the more people search for an item, the higher its position in the list.
¹ It's more complex than just matching the prefix, including spell checking and auto-correction, key word extraction and recommendation, etc.
Figure 5.13: A search engine. All candidates starting with what the user input are
listed.
In both cases, the software provides a kind of word auto-completion mechanism. In some modern IDEs, the editor can even help users to auto-complete
program code.
Let's see how to implement the e-dictionary with trie or Patricia. To
simplify the problem, we assume the dictionary only supports English-English
information.
A dictionary stores key-value pairs, the keys are English words or phrases,
the values are the meaning described in English sentences.
We can store all the words and their meanings in a trie, but it isn't space
efficient, especially when there is a huge amount of items. We'll use Patricia to
realize the e-dictionary.
When the user wants to look up the word "a", the dictionary does not only return
the meaning of "a", but also provides a list of candidate words which all start
with "a", including "abandon", "about", "accent", "adam", ... Of course, all these
words are stored in the Patricia tree.
If there are too many candidates, one solution is to display only the top 10
words, and let the user browse for more.
The following algorithm reuses the looking up defined for Patricia. When it
finds a node bound to a string which is a prefix of what we are looking for, it
expands all its children until getting n candidates.
1: function Look-Up(T, k, n)
2:   if T = NIL then
3:     return Φ
4:   prefix ← NIL
5:   repeat
6:     match ← FALSE
7:     for (ki, Ti) ∈ Children(T) do
8:       if k is prefix of ki then
9:         return Expand(Ti, prefix + ki, n)
10:      if ki is prefix of k then
11:        match ← TRUE
12:        k ← k - ki
13:        T ← Ti
14:        prefix ← prefix + ki
15:        break
16:  until ¬match
17:  return Φ
Where function Expand(T, prefix, n) picks n sub-trees which share the
same prefix in T. It is realized as a BFS (Breadth-First-Search) traversal. The chapter
about searching explains BFS in detail.
1: function Expand(T, prefix, n)
2:   R ← Φ
3:   Q ← {(prefix, T)}
4:   while |R| < n ∧ |Q| > 0 do
5:     (k, T) ← Pop(Q)
6:     if Data(T) ≠ NIL then
7:       R ← R ∪ {(k, Data(T))}
8:     for (ki, Ti) ∈ Children(T) do
9:       Push(Q, (k + ki, Ti))
10:  return R
The following example Python program implements the e-dictionary application. When testing if a string is a prefix of another one, it uses the find function provided by the string type.
def patricia_lookup(t, key, n):
    if t is None:
        return None
    prefix = ""
    while True:
        match = False
        for k, tr in t.children.items():
            if k.find(key) == 0: # key is prefix of k
                return expand(prefix + k, tr, n)
            if key.find(k) == 0: # k is prefix of key
                match = True
                key = key[len(k):]
                t = tr
                prefix += k
                break
        if not match:
            return None
def expand(prefix, t, n):
    res = []
    q = [(prefix, t)]
    while len(res) < n and len(q) > 0:
        (s, p) = q.pop(0)
        if p.value is not None:
            res.append((s, p.value))
        for k, tr in p.children.items():
            q.append((s + k, tr))
    return res
This algorithm can also be implemented recursively. If the string we are looking for is empty, we expand all children until getting n candidates; otherwise,
we recursively examine the children to find the one which has a prefix equal to this
string.
In programming environments supporting lazy evaluation, an intuitive solution is to expand all candidates, and take the first n on demand. Denote the
Patricia prefix tree in the form T = (v, C); the below function enumerates all items
starting with key k.
findAll(T, k) =
    enum(C)             : k = Φ, v = Φ
    {(Φ, v)} ∪ enum(C)  : k = Φ, v ≠ Φ
    find(C, k)          : k ≠ Φ
                                                                        (5.19)
The first two clauses deal with the edge cases where the key is empty: all the
children are enumerated, and the node's own value is included if it isn't empty. The last clause
finds the child matching k.
For non-empty children, C = {(k1, T1), (k2, T2), ..., (km, Tm)}, denote the
rest of the pairs except for the first one as C'. The enumeration algorithm can be
defined as below.

enum(C) =
    Φ                                        : C = Φ
    mapAppend(k1, findAll(T1, Φ)) ∪ enum(C') : otherwise
                                                                        (5.20)
Where mapAppend(k, L) = {(k + ki, vi) | (ki, vi) ∈ L}. It concatenates the
prefix k in front of the key of every key-value pair in list L.
Function find(C, k) is defined as follows. For empty children, the
result is empty as well; otherwise, it examines the first child T1, which is bound
to string k1. If k1 is equal to k, it calls mapAppend to add the prefix to the keys of
all the children under T1; if k1 is a prefix of k, the algorithm recursively finds all
children starting with k - k1; on the other hand, if k is a prefix of k1, all children
under T1 are valid results. Otherwise, the algorithm bypasses the first child and
goes on to check the rest of the children.
find(C, k) =
    Φ                                     : C = Φ
    mapAppend(k, findAll(T1, Φ))          : k1 = k
    mapAppend(k1, findAll(T1, k - k1))    : k1 ⊏ k
    mapAppend(k1, findAll(T1, Φ))         : k ⊏ k1
    find(C', k)                           : otherwise
                                                                        (5.21)
Below example Haskell program implements the e-dictionary application according to the above equations.

findAll :: Patricia a -> Key -> [(Key, a)]
findAll t [] =
    case value t of
In a lazy evaluation environment, the top n candidates can be obtained by
take(n, findAll(T, k)). Appendix A has the detailed definition of the take function.
5.6.2
T9 input method
Most mobile phones around the year 2000 were equipped with a key pad. Users had a
quite different experience from the PC when editing a short message or email. This
is because the mobile-phone key pad, or so-called ITU-T key pad, has much
fewer keys than a PC keyboard. Figure 5.14 shows one example.
Comparing these two methods, we can see the second one is much easier for
the end user. The only overhead is to store a dictionary of candidate words.
Method 2 is called the T9 input method, or predictive input method [6], [7].
The abbreviation T9 stands for "textonym": it starts with 'T', and has 9 characters.
T9 input can also be realized with trie or Patricia.
In order to provide candidate words to the user, a dictionary must be prepared in
advance. Trie or Patricia can be used to store the dictionary. The commercial T9
implementations typically use complex indexing dictionaries in both the file system
and cache. The realization shown here is for illustration purpose only.
Firstly, we need to define the T9 mapping, which maps from a digit to the candidate
characters.

M_T9 = { 2 → "abc", 3 → "def", 4 → "ghi",
         5 → "jkl", 6 → "mno", 7 → "pqrs",
         8 → "tuv", 9 → "wxyz" }
                                                                        (5.22)
With this mapping, M_T9[i] returns the corresponding characters for digit i.
Suppose the user inputs digits D = d1 d2 ... dn. If D isn't empty, denote the rest
of the digits except for d1 as D'. The below pseudo code shows how to realize T9 with a trie.
1: function Look-Up-T9(T, D)
2:   Q ← {(Φ, D, T)}
3:   R ← Φ
4:   while Q ≠ Φ do
5:     (prefix, D, T) ← Pop(Q)
6:     for each c in M_T9[d1] do
7:       if c ∈ Children(T) then
8:         if D' = Φ then
9:           R ← R ∪ {prefix + c}
10:        else
11:          Push(Q, (prefix + c, D', Children(T)[c]))
12:  return R
Where prefix + c means appending character c to the end of string prefix.
Again, this algorithm performs a BFS search with a queue Q. The queue is initialized with a tuple (prefix, D, T), containing an empty prefix, the digit sequence
to be searched, and the trie. It keeps picking tuples from the queue as long
as it isn't empty, then gets the candidate characters for the first digit to be
processed via the T9 map. For each character c, if there is a sub-tree bound to
it, we create a new tuple, update the prefix by appending c, use the rest of the
digits to update D, and use that sub-tree. This new tuple is pushed back to the
queue for further searching. If all the digits are processed, it means a candidate
word is found, and we put this word into the result list R.
The following example program in Python implements this T9 search with a
trie.

T9MAP = {'2':"abc", '3':"def", '4':"ghi", '5':"jkl",
         '6':"mno", '7':"pqrs", '8':"tuv", '9':"wxyz"}

def trie_lookup_t9(t, key):
    if t is None or key == "":
        return None
    q = [("", key, t)]
    res = []
    while len(q) > 0:
        (prefix, k, t) = q.pop(0)
        i = k[0]
        if not i in T9MAP:
            return None # invalid input
        for c in T9MAP[i]:
            if c in t.children:
                if k[1:] == "":
                    res.append((prefix + c, t.children[c].value))
                else:
                    q.append((prefix + c, k[1:], t.children[c]))
    return res
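To try the search, we need a trie of words; a minimal Trie class and insert (assumptions, mirroring the C definition given earlier) are enough. All four words below map to the digit sequence "4663":

```python
class Trie:
    def __init__(self, value=None):
        self.value = value
        self.children = {}

def insert(t, key, value=None):
    if t is None:
        t = Trie()
    p = t
    for c in key:
        if c not in p.children:
            p.children[c] = Trie()
        p = p.children[c]
    p.value = value
    return t

T9MAP = {'2': "abc", '3': "def", '4': "ghi", '5': "jkl",
         '6': "mno", '7': "pqrs", '8': "tuv", '9': "wxyz"}

def trie_lookup_t9(t, key):
    if t is None or key == "":
        return None
    q = [("", key, t)]
    res = []
    while len(q) > 0:
        (prefix, k, t) = q.pop(0)
        i = k[0]
        if not i in T9MAP:
            return None  # invalid input
        for c in T9MAP[i]:
            if c in t.children:
                if k[1:] == "":
                    res.append((prefix + c, t.children[c].value))
                else:
                    q.append((prefix + c, k[1:], t.children[c]))
    return res

t = None
for w in ["good", "gone", "home", "hood"]:
    t = insert(t, w, w)
```

The BFS visits children in the character order of the T9 map, so typing "4663" yields the candidates gone, good, home, hood in that order.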
Because the trie is not space efficient, we can modify the above algorithm into a
Patricia solution. As long as the queue isn't empty, the algorithm pops a tuple.
This time, we examine all the prefix/sub-tree pairs. For every pair (ki, Ti), we
convert the alphabetic prefix ki back to a digit sequence D' by looking up the T9
map. If D' exactly matches the digits the user input, we find a candidate
word; otherwise, if the digit sequence is a prefix of what the user inputs, the program
creates a new tuple, updates the prefix, the digits to be processed, and the
sub-tree, then puts the tuple back to the queue for further searching.
1: function Look-Up-T9(T, D)
2:   Q ← {(Φ, D, T)}
3:   R ← Φ
4:   while Q ≠ Φ do
5:     (prefix, D, T) ← Pop(Q)
6:     for each (ki, Ti) ∈ Children(T) do
7:       D' ← Convert-T9(ki)
8:       if D' ⊑ D then               ▷ D' is prefix of D
9:         if D' = D then
10:          R ← R ∪ {prefix + ki}
11:        else
12:          Push(Q, (prefix + ki, D - D', Ti))
13:  return R
Function Convert-T9(K) converts each character in K back to a digit.

1: function Convert-T9(K)
2:   D ← Φ
3:   for each c ∈ K do
4:     for each (d → S) ∈ M_T9 do
5:       if c ∈ S then
6:         D ← D ∪ {d}
7:         break
8:   return D
The following example Python program implements the T9 input method
with Patricia.
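A sketch of such a program is given below. The tree is hand-built from the same four words used earlier; convert_t9 and the prefix test follow the pseudo code above:

```python
class Patricia:
    def __init__(self, value=None, children=None):
        self.value = value
        self.children = children if children is not None else {}

T9MAP = {'2': "abc", '3': "def", '4': "ghi", '5': "jkl",
         '6': "mno", '7': "pqrs", '8': "tuv", '9': "wxyz"}
# reverse map: character -> digit
T9RMAP = dict((c, d) for d, s in T9MAP.items() for c in s)

def convert_t9(s):
    return ''.join(T9RMAP[c] for c in s)

def patricia_lookup_t9(t, digits):
    if t is None or digits == "":
        return None
    q = [("", digits, t)]
    res = []
    while len(q) > 0:
        (prefix, ds, t) = q.pop(0)
        for k, tr in t.children.items():
            ks = convert_t9(k)
            if ds.startswith(ks):      # ks is a prefix of ds
                if ds == ks:
                    res.append(prefix + k)
                else:
                    q.append((prefix + k, ds[len(ks):], tr))
    return res

# hand-built Patricia of "good", "gone", "home", "hood"
t = Patricia(children={
        "go": Patricia(children={"od": Patricia("good"),
                                 "ne": Patricia("gone")}),
        "ho": Patricia(children={"me": Patricia("home"),
                                 "od": Patricia("hood")})})
```

Note the candidate order differs from the trie realization; Exercise 5.2 below asks why.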
The T9 input method can also be realized recursively. Let's first define the trie
solution. The algorithm takes two arguments: a trie storing all the candidate
words, and a sequence of digits that is input by the user. If the sequence
is empty, the result contains only the empty string; otherwise, it looks up C to find those
children which are bound to the characters of the first digit d1 according to the T9 map.

findT9(T, D) =
    {Φ}                          : D = Φ
    fold(f, Φ, lookupT9(d1, C))  : otherwise
                                                                        (5.23)

Where function f accumulates the candidates: for each found pair (k, T'), it
prepends character k to every result of recursively searching T' with the rest of
the digits D'.

f(L, (k, T')) = mapAppend(k, findT9(T', D')) ∪ L
                                                                        (5.24)
Note this mapAppend function is a bit different from the previous one defined
in the e-dictionary application: the first argument is a character, not a string.
Function lookupT9(d, C) checks all the possible characters mapped to digit
d. If a character is bound to some child in C, it is recorded as one candidate.
lookupT9(d, C) = fold(g, Φ, M_T9[d])
                                                                        (5.25)

Where

g(L, k) =
    L             : find(C, k) = Φ
    {(k, T')} ∪ L : find(C, k) = T'
                                                                        (5.26)
There are a few modifications when changing the realization from trie to Patricia.
Firstly, the sub-tree is bound to a prefix string, not a single character.
findT9(T, D) =
    {Φ}                             : D = Φ
    fold(f, Φ, findPrefixT9(D, C))  : otherwise
                                                                        (5.27)
The list for folding is given by calling function findPrefixT9(D, C), and
f is also modified to reflect this change: it appends the candidate prefix D' in
front of every result output by the recursive search, and then accumulates the
words.

f(L, (D', T')) = mapAppend(D', findT9(T', D - D')) ∪ L
                                                                        (5.28)
Function findPrefixT9(D, C) examines all the children. For every pair
(ki, Ti), if converting ki back to digits yields a prefix of D, then this pair is
selected as a candidate.

findPrefixT9(D, C) = {(ki, Ti) | (ki, Ti) ∈ C, convertT9(ki) ⊑ D}
                                                                        (5.29)
Function convertT9(k) converts every alphabetic character in k back to digits according to the T9 map.

convertT9(K) = {d | c ∈ K, (d → S) ∈ M_T9, c ∈ S}
                                                                        (5.30)
Exercise 5.2
For the T9 input, the results of the algorithms realized with trie and with
Patricia come out in different sequences. Why does this happen? How can we
modify the algorithm so that they output the candidates in the same
sequence?
5.7
Summary
In this chapter, we start from the integer-based trie and Patricia. The map
data structure based on integer Patricia plays an important role in compiler
implementation. Alphabetic trie and Patricia are natural extensions, and can
be used to manipulate text information. As examples, a predictive e-dictionary
and the T9 input method are realized with trie or Patricia. Although these examples
are different from the real implementations in commercial software, they show
simple approaches to solve some problems. Other important data structures,
such as the suffix tree, have a close relationship with them. The suffix tree is introduced in
the next chapter.
Bibliography

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. Problem 12-1. ISBN: 0262032937. The MIT Press. 2001

[2] Chris Okasaki and Andrew Gill. Fast Mergeable Integer Maps. Workshop on ML, September 1998, pages 77-86, http://www.cse.ogi.edu/~andy/pub/finite.htm

[3] D.R. Morrison, PATRICIA - Practical Algorithm To Retrieve Information Coded In Alphanumeric, Journal of the ACM, 15(4), October 1968, pages 514-534.

[4] Suffix Tree, Wikipedia. http://en.wikipedia.org/wiki/Suffix_tree

[5] Trie, Wikipedia. http://en.wikipedia.org/wiki/Trie

[6] T9 (predictive text), Wikipedia. http://en.wikipedia.org/wiki/T9_(predictive_text)

[7] Predictive text, Wikipedia. http://en.wikipedia.org/wiki/Predictive_text
Chapter 6

Suffix Tree
6.1 Introduction
Figure 6.1: The suffix tree of "bananas".
(6.1)

Where function suffixes(S) gives all the suffixes of string S. If the string is empty, the result contains one empty string; otherwise, S itself is one suffix, and the others can be given by recursively calling suffixes(S'), where S' is obtained by dropping the first character of S.
suffixes(S) = { {Φ} : S = Φ
             { {S} ∪ suffixes(S') : otherwise
(6.2)
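Equation (6.2) translates directly into Python (a sketch; the list goes from the longest suffix down to the empty string):

```python
def suffixes(s):
    # S itself is a suffix; the rest come from dropping the first
    # character; the empty string ends the recursion
    return [s] + suffixes(s[1:]) if s else [""]
```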
This solution constructs the suffix tree in O(n²) time, for a string of length n. It inserts n suffixes to the tree in total, and each insertion takes time proportional to the length of the suffix. The efficiency isn't good enough.

In this chapter, we firstly explain a fast on-line suffix trie construction solution by using the suffix link concept. Because the trie isn't space efficient, we next introduce a linear-time on-line suffix tree construction algorithm found by Ukkonen, and show how to solve some interesting string manipulation problems with the suffix tree.
6.2 Suffix trie
Just like the relationship between trie and Patricia, the suffix trie has a much simpler structure than the suffix tree. Figure 6.3 shows the suffix trie for "banana".

Comparing with figure 6.1, we can find the difference between suffix tree and suffix trie. Instead of representing a word, every edge in the suffix trie only represents a character. Thus the suffix trie needs much more space. If we pack all nodes which have only one child, the suffix trie is turned into a suffix tree.

We can reuse the trie definition for the suffix trie. Each node is bound to a character, and contains multiple sub-trees as children. A child can be referred to by the bound character.
6.2.1 Node transfer and suffix link
For string S of length n, define Si = s1 s2 ... si. It is the prefix containing the first i characters.

In a suffix trie, each node represents a suffix string. For example in figure 6.4, node X represents suffix "a"; by adding character 'c', node X transfers to Y, which represents suffix "ac". We say node X transfers to Y with the edge of character 'c' [1].

Y ← Children(X)[c]
We also say that node X has a c-child Y. The below Python expression reflects this concept.

y = x.children[c]
If node A in a suffix trie represents suffix si si+1 ... sn, and node B represents suffix si+1 si+2 ... sn, we say node B represents the suffix of node A. We can create a link from A to B. This link is defined as the suffix link of node A [1]. Suffix links are drawn in dotted style. In figure 6.4, the suffix link of node A points to node B, and the suffix link of node B points to node C.

The suffix link is valid for all nodes except the root. We can add a suffix link field to the trie definition. The below Python example code shows this update.
class STrie:
    def __init__(self, suffix=None):
        self.children = {}
        self.suffix = suffix
Figure 6.4: Suffix trie for string "cacao". Node X → "a", node Y → "ac"; X transfers to Y with character 'c'.
Table 6.1: The suffixes of Si, from the longest to the shortest.

suffix string
s1 s2 s3 ... si
s2 s3 ... si
...
si−1 si
si
""
6.2.2 On-line construction
For string S, suppose we have constructed the suffix trie for the i-th prefix Si = s1 s2 ... si. Denote it as SuffixTrie(Si). Let's consider how to obtain SuffixTrie(Si+1) from SuffixTrie(Si).

If we list all suffixes corresponding to SuffixTrie(Si), from the longest (which is Si) to the shortest (which is empty), we get table 6.1. There are i + 1 suffixes in total.

One solution is to append the character si+1 to every suffix in this table, then add another empty string. This idea can be realized by adding a new child to every node in the trie, and binding all these new children with edges of character si+1.
Algorithm 1 Update SuffixTrie(Si) to SuffixTrie(Si+1), initial version.
1: for ∀T ∈ SuffixTrie(Si) do
2:     Children(T)[si+1] ← Create-Empty-Node
However, some nodes in SuffixTrie(Si) may have the si+1-child already. For example, in figure 6.5, nodes X and Y correspond to suffixes "cac" and "ac" respectively. They don't have the a-child. But node Z, which represents suffix "c", has the a-child already.
Figure 6.5: The suffix trie for "cac"; node Z, which represents suffix "c", already has an a-child.
links. And the order of such traversal is exactly what we want. Finally, there is a special suffix trie for the empty string, SuffixTrie(NIL); we define the top to be equal to the root in this case.
function Insert(top, c)
    if top = NIL then                            ▷ the trie is empty
        top ← Create-Empty-Node
    T ← top
    T' ← Create-Empty-Node                       ▷ dummy initial value
    while T ≠ NIL ∧ Children(T)[c] = NIL do
        Children(T)[c] ← Create-Empty-Node
        Suffix-Link(T') ← Children(T)[c]
        T' ← Children(T)[c]
        T ← Suffix-Link(T)
    if T ≠ NIL then
        Suffix-Link(T') ← Children(T)[c]
    return Children(top)[c]                      ▷ returns the new top
Function Insert updates SuffixTrie(Si) to SuffixTrie(Si+1). It takes two arguments: one is the top of SuffixTrie(Si), the other is the character si+1. If the top is NIL, it means the tree is empty, so there is no root; the algorithm creates a root node in this case. A sentinel empty node T' is created. It keeps tracking the previously created new node. In the main loop, the algorithm checks every node one by one along the suffix links. If the node doesn't have the si+1-child, it creates a new node, and binds the edge to character si+1. The algorithm repeatedly goes up along the suffix links until it either arrives at the root, or finds a node which has the si+1-child already. After the loop, if the node isn't NIL, it means we stopped at a node which has the si+1-child; the last suffix link then points to that child. Finally, the new top position is returned, so that we can further insert other characters to the suffix trie.

For a given string S, the suffix trie can be built by repeatedly calling the Insert function.
1: function Suffix-Trie(S)
2:     t ← NIL
3:     for i ← 1 to |S| do
4:         t ← Insert(t, si)
5:     return t
This algorithm returns the top of the suffix trie, rather than the root. In order to access the root, we can traverse along the suffix links.
1: function Root(T)
2:     while Suffix-Link(T) ≠ NIL do
3:         T ← Suffix-Link(T)
4:     return T
Figure 6.6 shows the steps when constructing the suffix trie for "cacao". Only the last layer of suffix links is shown.

For the Insert algorithm, the computation time is proportional to the size of the suffix trie. In the worst case, the suffix trie is built in O(n²) time, where n = |S|. One example is S = aⁿbⁿ: there are n characters of 'a' followed by n characters of 'b'.
Figure 6.6: Construct the suffix trie for "cacao" in 6 steps: (a) empty, (b) "c", (c) "ca", (d) "cac", (e) "caca", (f) "cacao". Only the last layer of suffix links is shown in dotted arrows.

The following example Python program implements the suffix trie construction algorithm.
def suffix_trie(str):
    t = None
    for c in str:
        t = insert(t, c)
    return root(t)

def insert(top, c):
    if top is None:
        top = STrie()
    node = top
    new_node = STrie()  # dummy init value
    while (node is not None) and (c not in node.children):
        new_node.suffix = node.children[c] = STrie(node)
        new_node = node.children[c]
        node = node.suffix
    if node is not None:
        new_node.suffix = node.children[c]
    return top.children[c]  # update top

def root(node):
    while node.suffix is not None:
        node = node.suffix
    return node
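To see the construction in action, the following self-contained sketch (repeating the STrie definition from above; the helper name contains is an assumption for illustration) builds the suffix trie for "cacao" and checks that every suffix is a path from the root:

```python
class STrie:
    def __init__(self, suffix=None):
        self.children = {}
        self.suffix = suffix

def insert(top, c):
    if top is None:
        top = STrie()
    node, new_node = top, STrie()       # new_node is a dummy initial value
    while node is not None and c not in node.children:
        new_node.suffix = node.children[c] = STrie(node)
        new_node = node.children[c]
        node = node.suffix
    if node is not None:
        new_node.suffix = node.children[c]
    return top.children[c]              # the new top

def suffix_trie(s):
    t = None
    for c in s:
        t = insert(t, c)
    while t.suffix is not None:         # walk the suffix links back to the root
        t = t.suffix
    return t

def contains(t, w):
    # every substring of s is a path starting at the root of its suffix trie
    for c in w:
        if c not in t.children:
            return False
        t = t.children[c]
    return True

t = suffix_trie("cacao")
assert all(contains(t, "cacao"[i:]) for i in range(5))
```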
6.3 Suffix Tree
The suffix trie isn't space efficient, and the construction time is quadratic. If we don't care about the speed, we can compress the suffix trie to a suffix tree [6]. Ukkonen found a linear-time on-line suffix tree construction algorithm in 1995.
6.3.1 On-line construction
that after the j-th node, they are not leaves any longer; we need to repeatedly branch out from this time point till the k-th node.

Ukkonen defines the first non-leaf node nj as the "active point" and the last node nk as the "end point". The end point can be the root.
Reference pair

Figure 6.7: The suffix tree of "bananas", with nodes X and Y.
The canonical reference pair is the one whose node is the closest to the position. Specially, in case the position is an explicit node, the canonical reference pair is (node, Φ); so (Y, Φ) is the canonical reference pair of node Y.

The below algorithm converts a reference pair (node, (l, r)) to the canonical reference pair (node', (l', r)). Note that since r doesn't change, the algorithm can return only (node', l') as the result.
Algorithm 3 Convert a reference pair to the canonical reference pair
1: function Canonize(node, (l, r))
2:     if node = NIL then
3:         if (l, r) = Φ then
4:             return (NIL, l)
5:         else
6:             return Canonize(root, (l + 1, r))
7:     while l ≤ r do                              ▷ (l, r) isn't empty
8:         ((l', r'), node') ← Children(node)[sl]
9:         if r − l ≥ r' − l' then
10:            l ← l + r' − l' + 1                 ▷ remove |(l', r')| chars from (l, r)
11:            node ← node'
12:        else
13:            break
14:    return (node, l)
If the passed-in node parameter is NIL, it indicates a very special case. The function is called like the following.

Canonize(Suffix-Link(root), (l, r))

Because the suffix link of the root points to NIL, the result should be (root, (l + 1, r)) if (l, r) is not Φ. Otherwise, (NIL, Φ) is returned to indicate a terminal position. We explain this special case in detail in later sections.
The algorithm

In 6.3.1, we mentioned that all updating to leaves is trivial, because we only need to append the newly coming character to the leaf. With reference pairs, it means that when updating SuffixTree(Si) to SuffixTree(Si+1), all reference pairs in the form (node, (l, i)) are leaves. They will change to (node, (l, i + 1)) next time. Ukkonen defines a leaf in the form (node, (l, ∞)), where ∞ means "open to grow". We can skip all leaves until the suffix tree is completely constructed. After that, we can change all ∞ to the length of the string.
So the main algorithm only cares about positions from the active point to the end point. However, how do we find the active point and the end point?

When the suffix tree construction starts, there is only a root node; there aren't any branches or leaves. The active point should be (root, Φ), or (root, (1, 0)) (the string index starts from 1).

For the end point, it is a position where we can finish updating SuffixTree(Si). According to the suffix trie algorithm, we know it should be a position which has the si+1-child already. Because a position in a suffix trie may not be an explicit node in the suffix tree, if (node, (l, r)) is the end point, there are two cases.
1. (l, r) = Φ. It means the node itself is the end point. This node has the si+1-child, which means Children(node)[si+1] ≠ NIL;

2. Otherwise, l ≤ r, and the end point is an implicit position. It must satisfy si+1 = s(l' + |(l, r)|), where Children(node)[sl] = ((l', r'), node'), and |(l, r)| means the length of the sub-string (l, r), which equals r − l + 1. This is illustrated in figure 6.8. We can also say that (node, (l, r)) has an si+1-child implicitly.
Figure 6.9: The end point in SuffixTree(Si) and the active point in SuffixTree(Si+1).
end point, the updating loop can be finished; otherwise, we turn the position into an explicit node, and return it for further branching.

We can finalize Ukkonen's algorithm as below.
1: function Suffix-Tree(S)
2:     root ← Create-Empty-Node
3:     node ← root, l ← 0
4:     for i ← 1 to |S| do
5:         (node, l) ← Update(node, (l, i))
6:         (node, l) ← Canonize(node, (l, i))
7:     return root
Figure 6.10 shows the steps when constructing the suffix tree for the string "cacao".

Figure 6.10: Construct the suffix tree for "cacao" in 6 steps: (a) empty, (b) "c", (c) "ca", (d) "cac", (e) "caca", (f) "cacao". Only the last layer of suffix links is shown in dotted arrows.
Note that we needn't set suffix links for leaf nodes; only branch nodes need suffix links.
The following example Python program implements Ukkonen's algorithm. First is the node definition.

class Node:
    def __init__(self, suffix=None):
        self.children = {}  # c : (word, Node), where word = (l, r)
        self.suffix = suffix
Because there is only one copy of the complete string, all sub-strings are represented as (left, right) pairs, and the leaves are open pairs of the form (left, ∞). The suffix tree is defined like below.

class STree:
    def __init__(self, s):
The infinity is defined as the length of the string plus a big number. Some auxiliary functions are defined.

def substr(str, str_ref):
    (l, r) = str_ref
    return str[l:r+1]

def length(str_ref):
    (l, r) = str_ref
    return r - l + 1
The edge function extracts a common prefix from a list of strings. The prefix returned by the edge function may not be the longest one; an empty string is also allowed. The exact behavior can be customized with different edge functions.

build(edge, X)

This defines a generic radix tree building function. It takes an edge function and a set of strings. X can be all suffixes of a string, so that we get a suffix trie or a suffix tree. We'll also explain later that X can be all prefixes, which leads to a normal prefix trie or Patricia.

Suppose all the strings are built from a character set Σ. When building the tree, if the string is empty, X only contains one empty string as well. The result is defined as below.
build(edge, X) = { leaf : X = {Φ}
                { branch({({c} ∪ p, build(edge, X')) | c ∈ Σ, G ∈ {group(X, c)}, (p, X') ∈ {edge(G)}}) : otherwise
(6.3)
The algorithm categorizes all suffixes into several groups by their first letter, and removes the first letter from each element in every group. For example, the suffixes {"acac", "cac", "ac", "c"} are categorized into groups {(a, ["cac", "c"]), (c, ["ac", ""])}.

group(X, c) = {C' | {c1} ∪ C' ∈ X, c1 = c}
(6.4)

Function group enumerates all suffixes in X. For each one, denote the first character as c1 and the rest characters as C'. If c1 is the same as the given character c, then C' is collected.
The below example Haskell program implements the generic radix tree building algorithm.

alpha = ['a'..'z'] ++ ['A'..'Z']

lazyTree :: EdgeFunc -> [String] -> Tr
lazyTree edge = build where
    build [[]] = Lf
    build ss = Br [(a:prefix, build ss') |
                   a <- alpha,
                   xs@(x:_) <- [[cs | c:cs <- ss, c == a]],
                   (prefix, ss') <- [edge xs]]
Different edge functions produce different radix trees. Since the edge function extracts a common prefix from a set of strings, the simplest one constantly uses the empty string as the common prefix. This edge function builds a trie.

edgeTrie(X) = (Φ, X)
(6.5)
We can also realize an edge function that extracts the longest common prefix. Such an edge function builds a Patricia. Denote the strings as X = {x1, x2, ..., xn}; for each string xi, let the initial character be ci, and the rest characters in xi be Wi. If there is only one string in X, the longest common prefix is definitely this string. If two strings start with different initial characters, the longest common prefix is empty. Otherwise, all the strings share the same initial character. This character definitely belongs to the longest common prefix; we can remove it from all strings, and recursively call the edge function.

edgeTree(X) = { (x1, {Φ}) : X = {x1}
             { (Φ, X) : ∃ xi, xj ∈ X, ci ≠ cj
             { ({c1} ∪ p, Y) : (p, Y) = edgeTree({Wi | xi ∈ X})
(6.6)
For any given string, we can build the suffix trie and the suffix tree by feeding the suffixes to these two edge functions.

suffixTrie(S) = build(edgeTrie, suffixes(S))
(6.7)

suffixTree(S) = build(edgeTree, suffixes(S))
(6.8)
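To cross-check equations (6.3)-(6.8), here is a minimal Python sketch of the generic builder and the two edge functions. The nested-dict tree representation and the names build, edge_trie, edge_tree are assumptions for illustration, not the book's code:

```python
def group(xs, c):
    # collect the tails of the strings in xs starting with c (eq. 6.4)
    return [s[1:] for s in xs if s[:1] == c]

def edge_trie(xs):
    # always take the empty common prefix: builds a trie (eq. 6.5)
    return ("", xs)

def edge_tree(xs):
    # take the longest common prefix: builds a Patricia (eq. 6.6)
    if len(xs) == 1:
        return (xs[0], [""])
    if len(set(s[:1] for s in xs)) > 1:
        return ("", xs)
    p, ys = edge_tree([s[1:] for s in xs])
    return (xs[0][0] + p, ys)

def build(edge, xs):
    # generic radix tree builder (eq. 6.3); a leaf is the empty dict
    if xs == [""] or xs == []:
        return {}
    t = {}
    for c in sorted(set(s[0] for s in xs if s)):
        p, ys = edge(group(xs, c))
        t[c + p] = build(edge, ys)
    return t

def suffixes(s):
    return [s[i:] for i in range(len(s) + 1)]

t = build(edge_tree, suffixes("banana"))
```

Feeding all suffixes of "banana" to edge_tree yields the root edges "a", "banana" and "na" of its suffix tree, while edge_trie yields one single-character edge per distinct first letter.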
6.4 Suffix tree applications
The suffix tree can help to solve many string and DNA pattern manipulation problems particularly fast.
6.4.1 String/Pattern searching
There are plenty of string searching algorithms, such as the famous KMP algorithm (Knuth-Morris-Pratt, introduced in the chapter of search). The suffix tree can perform at the same level as KMP [11]: the string searching is bound to O(m) time, where m is the length of the sub-string to be searched. However, O(n) time is required to build the suffix tree in advance, where n is the length of the text [12].

Not only sub-string searching, but also pattern matching, including regular expression matching, can be solved with the suffix tree. Ukkonen summarizes this kind of problems as sub-string motifs: for a string S, SuffixTree(S) gives the complete occurrence counts of all sub-string motifs of S in O(n) time, although S may have O(n²) sub-strings.
This algorithm can also be realized in a recursive way. For the non-leaf suffix tree T, denote the children as C = {(s1, T1), (s2, T2), ...}. We search the sub-string among the children.

lookup_pattern(T, s) = find(C, s)
(6.11)

If the children C is empty, it means the sub-string doesn't occur at all. Otherwise, we examine the first pair (s1, T1): if s is a prefix of s1, then the number of sub-trees in T1 is the result; if s1 is a prefix of s, we remove s1 from s, and
find(C, s) = { 0 : C = Φ
            { max(1, |C1|) : s ⊑ s1
            { lookup_pattern(T1, s − s1) : s1 ⊑ s
            { find(C', s) : otherwise
(6.12)
We always append a special terminator to the string (the '$' in the above program), so that no suffix becomes the prefix of another [3].

The suffix tree also supports searching patterns like "a**n"; we skip it here. Readers can refer to [13] and [14] for details.
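As a sketch of equations (6.11) and (6.12), the program below builds a naive O(n²) nested-dict suffix tree over a terminated string and counts the occurrences of a pattern by counting the leaves under the position where the pattern ends. All names (patricia, occurrences) and the '$' terminator convention are illustrative assumptions, not the book's code:

```python
def lcp(xs):
    # longest common prefix of a non-empty list of strings
    p = xs[0]
    for x in xs[1:]:
        while not x.startswith(p):
            p = p[:-1]
    return p

def patricia(xs):
    # group by first character, compress each group by its longest
    # common prefix, and recurse; a leaf is the empty dict
    t = {}
    for c in set(x[0] for x in xs if x):
        g = [x for x in xs if x and x[0] == c]
        p = lcp(g)
        t[p] = patricia([x[len(p):] for x in g if len(x) > len(p)])
    return t

def suffix_tree(s):
    # the terminator ensures no suffix is a prefix of another
    return patricia([s[i:] for i in range(len(s))])

def leaves(t):
    return 1 if t == {} else sum(leaves(c) for c in t.values())

def occurrences(t, s):
    # equation (6.12): walk down matching edge labels, then count
    # the leaves below the point where the (non-empty) pattern ends
    for s1, t1 in t.items():
        if s1.startswith(s):        # the pattern ends inside this edge
            return max(1, leaves(t1))
        if s.startswith(s1):        # consume the edge, search the child
            return occurrences(t1, s[len(s1):])
    return 0
```

For instance, over suffix_tree("banana$"), the pattern "ana" occurs twice and "nan" once.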
6.4.2 Find the longest repeated sub-string

Figure 6.11: The suffix tree of "mississippi$", with branch nodes A, B and C.
This example tells us that the "depth" of a branch node should be measured by the number of characters traversed from the root, not by the number of explicit branch nodes.

To find the longest repeated sub-string, we can perform BFS in the suffix tree.
1: function Longest-Repeated-Substring(T)
2:     Q ← (NIL, Root(T))
3:     R ← NIL
4:     while Q is not empty do
5:         (s, T) ← Pop(Q)
6:         for each ((l, r), T') ∈ Children(T) do
7:             if T' is not leaf then
8:                 s' ← Concatenate(s, (l, r))
9:                 Push(Q, (s', T'))
10:                R ← Update(R, s')
11:    return R
This algorithm initializes a queue with a pair of an empty string and the root. Then it repeatedly examines the candidates in the queue.

For each node, the algorithm examines the children one by one. If a child is a branch node, it is pushed back to the queue for further search, and the sub-string represented by this child is treated as a candidate for the longest repeated sub-string.

Function Update(R, s') updates the longest repeated sub-string candidates. If multiple candidates have the same length, they are all kept in a result list.
1: function Update(L, s)
2:     if L = NIL ∨ |l1| < |s| then
3:         return {s}
4:     if |l1| = |s| then
5:         return Append(L, s)
6:     return L
The above algorithm can be implemented in Python as the following example program.

def lrs(t):
    queue = [("", t.root)]
    res = []
    while len(queue) > 0:
        (s, node) = queue.pop(0)
        for _, (str_ref, tr) in node.children.items():
            if len(tr.children) > 0:
                s1 = s + t.substr(str_ref)
                queue.append((s1, tr))
                res = update_max(res, s1)
    return res

def update_max(lst, x):
    if lst == [] or len(lst[0]) < len(x):
        return [x]
    if len(lst[0]) == len(x):
        return lst + [x]
    return lst
Searching the deepest branch can also be realized recursively. If the tree is just a leaf node, the empty string is returned; otherwise the algorithm tries to find the longest repeated sub-string from the children.

LRS(T) = { Φ : leaf(T)
        { longest({si ∪ LRS(Ti) | (si, Ti) ∈ C, ¬leaf(Ti)}) : otherwise
(6.13)

The following Haskell example program implements the longest repeated sub-string algorithm.
isLeaf Lf = True
isLeaf _ = False

lrs Lf = ""
lrs (Br lst) = find $ filter (not . isLeaf . snd) lst where
    find [] = ""
    find ((s, t):xs) = maximumBy (compare `on` length) [s ++ lrs t, find xs]
6.4.3 Find the longest common sub-string

The longest common sub-string of two strings can also be quickly found with the suffix tree. The solution is to build a generalized suffix tree. If the two strings are denoted as txt1 and txt2, a generalized suffix tree is SuffixTree(txt1 $1 txt2 $2), where $1 is a special terminator character for txt1, and $2 ≠ $1 is another special terminator character for txt2.

The longest common sub-string is indicated by the deepest branch node with two forks corresponding to both "...$1..." and "...$2" (no $1). The definition of the deepest node is the same as the one for the longest repeated sub-string: it is measured by the number of characters traversed from the root.

If a node has "...$1..." under it, the node must represent a sub-string of txt1, as $1 is the terminator of txt1. On the other hand, since it also has "...$2" (without $1), this node must represent a sub-string of txt2 too. Because it's the deepest one satisfying this criterion, the node represents the longest common sub-string.
Again, we can use BFS (breadth-first search) to find the longest common sub-string.
1: function Longest-Common-Substring(T)
2:     Q ← (NIL, Root(T))
3:     R ← NIL
4:     while Q is not empty do
5:         (s, T) ← Pop(Q)
6:         if Match-Fork(T) then
7:             R ← Update(R, s)
8:         for each ((l, r), T') ∈ Children(T) do
9:             if T' is not leaf then
10:                s' ← Concatenate(s, (l, r))
11:                Push(Q, (s', T'))
12:    return R
Most parts are the same as in the longest repeated sub-string searching algorithm. The function Match-Fork checks if the children satisfy the common sub-string criteria.
1: function Match-Fork(T)
2:     if |Children(T)| = 2 then
3:         {(s1, T1), (s2, T2)} ← Children(T)
4:         return T1 is leaf ∧ T2 is leaf ∧ Xor($1 ∈ s1, $1 ∈ s2)
5:     return FALSE
This function checks whether the two children are both leaves, and whether one contains $1 while the other doesn't. This is because every leaf contains $2 according to the definition of the generalized suffix tree; only the leaves corresponding to suffixes of txt1 contain $1.

The following Python program implements the longest common sub-string algorithm.
def lcs(t):
    queue = [("", t.root)]
    res = []
    while len(queue) > 0:
        (s, node) = queue.pop(0)
        if match_fork(t, node):
            res = update_max(res, s)
        for _, (str_ref, tr) in node.children.items():
            if len(tr.children) > 0:
                s1 = s + t.substr(str_ref)
                queue.append((s1, tr))
    return res

def is_leaf(node):
    return node.children == {}

def match_fork(t, node):
    if len(node.children) == 2:
        [(_, (str_ref1, tr1)), (_, (str_ref2, tr2))] = list(node.children.items())
        return is_leaf(tr1) and is_leaf(tr2) and \
               (t.substr(str_ref1).find('#') != -1) != \
               (t.substr(str_ref2).find('#') != -1)
    return False
The longest common sub-string finding algorithm can also be realized recursively. If the suffix tree T is a leaf, the result is empty. Otherwise, we examine all children in T: for those satisfying the matching criteria, the sub-strings are collected as candidates; for those that don't match, we recursively search the common sub-string among their children. The longest candidate is selected as the final result.
LCS(T) = { Φ : leaf(T)
        { longest({si | (si, Ti) ∈ C, match(Ti)} ∪ {si ∪ LCS(Ti) | (si, Ti) ∈ C, ¬match(Ti)}) : otherwise
(6.14)
lcs Lf = []
lcs (Br lst) = find $ filter (not . isLeaf . snd) lst where
    find [] = []
    find ((s, t):xs) = maxBy (compare `on` length)
                             (if match t
                              then s : find xs
                              else map (s ++) (lcs t) ++ find xs)

match (Br [(s1, Lf), (s2, Lf)]) = ("#" `isInfixOf` s1) /= ("#" `isInfixOf` s2)
match _ = False
6.4.4 Find the longest palindrome
6.4.5 Others

The suffix tree can also be used for data compression, such as the Burrows-Wheeler transform, LZW compression (LZSS), etc. [3]
6.5 Notes and short summary
The suffix tree was first introduced by Weiner in 1973 [2]. In 1976, McCreight greatly simplified the construction algorithm; McCreight constructs the suffix tree from right to left. In 1995, Ukkonen gave the first on-line construction algorithm, working from left to right. All three algorithms are linear time (O(n)), and some research shows the relationship among them [7].
Bibliography

[1] Esko Ukkonen. On-line construction of suffix trees. Algorithmica 14 (3): 249-260. doi:10.1007/BF01206331. http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf

[2] Weiner, P. Linear pattern matching algorithms, 14th Annual IEEE Symposium on Switching and Automata Theory, pp. 1-11, doi:10.1109/SWAT.1973.13

[3] Suffix Tree, Wikipedia. http://en.wikipedia.org/wiki/Suffix_tree

[4] Esko Ukkonen. Suffix tree and suffix array techniques for pattern analysis in strings. http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt

[5] Trie, Wikipedia. http://en.wikipedia.org/wiki/Trie

[6] Suffix Tree (Java). http://en.literateprograms.org/Suffix_tree_(Java)

[7] Robert Giegerich and Stefan Kurtz. From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction. Science of Computer Programming 25(2-3):187-218, 1995. http://citeseer.ist.psu.edu/giegerich95comparison.html

[8] Robert Giegerich and Stefan Kurtz. A Comparison of Imperative and Purely Functional Suffix Tree Constructions. Algorithmica 19 (3): 331-353. doi:10.1007/PL00009177. www.zbh.uni-hamburg.de/pubs/pdf/GieKur1997.pdf

[9] Bryan O'Sullivan. suffixtree: Efficient, lazy suffix tree implementation. http://hackage.haskell.org/package/suffixtree

[10] Danny. http://hkn.eecs.berkeley.edu/~dyoo/plt/suffixtree/

[11] Zhang Shaojie. Lecture of Suffix Trees. http://www.cs.ucf.edu/~shzhang/Combio09/lec3.pdf
Chapter 7

B-Trees

7.1 Introduction
For a non-empty binary tree (L, k, R), where L, R and k are the left child, the right child, and the key, function Key(T) accesses the key of tree T. The constraint can be represented as the following.

∀x ∈ L, ∀y ∈ R ⇒ Key(x) ≤ k ≤ Key(y)
(7.1)
If we extend this definition to allow multiple keys and children, we get the B-tree definition.

A B-tree is either empty; or contains n keys and n + 1 children, each child also being a B-tree. We denote these keys and children as k1, k2, ..., kn and c1, c2, ..., cn, cn+1.

Figure 7.2 illustrates a B-tree node.
Figure 7.2: A B-tree node: C[1], K[1], C[2], K[2], ..., C[n], K[n], C[n+1].

∀xi ∈ ci ⇒ x1 ≤ k1 ≤ x2 ≤ k2 ≤ ... ≤ xn ≤ kn ≤ xn+1
(7.2)
Finally, after adding some constraints to make the tree balanced, we get the complete B-tree definition.

All leaves have the same depth. We define an integral number t as the minimum degree of the B-tree: each node can have at most 2t − 1 keys; each node can have at least t − 1 keys, except the root.
Consider a B-tree holding n keys, with minimum degree t ≥ 2 and height h. All nodes have at least t − 1 keys except the root, which contains at least 1 key. There are at least 2 nodes at depth 1, at least 2t nodes at depth 2, at least 2t² nodes at depth 3, ..., and finally at least 2t^(h−1) nodes at depth h. Multiplying all node counts except the root by t − 1, the total number of keys satisfies the following inequality:

n ≥ 1 + (t − 1)(2 + 2t + 2t² + ... + 2t^(h−1))
  = 1 + 2(t − 1) Σ(k=0..h−1) t^k
  = 1 + 2(t − 1) (t^h − 1)/(t − 1)
  = 2t^h − 1
(7.3)
Thus we have the inequality between the height and the number of keys.

h ≤ log_t((n + 1)/2)
(7.4)

This is the reason why the B-tree is balanced. The simplest B-tree is the so-called 2-3-4 tree, where t = 2: every node except the root has 2, 3, or 4 children. A red-black tree can essentially be mapped to a 2-3-4 tree.
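The derivation of (7.3) and (7.4) can be checked numerically; min_keys below is an illustrative helper counting the keys of a minimal B-tree of height h, not part of the book's programs:

```python
def min_keys(t, h):
    # 1 key in the root, plus t - 1 keys in each of the at least
    # 2 * t^(k-1) nodes at depth k = 1 .. h
    return 1 + sum(2 * (t - 1) * t ** (k - 1) for k in range(1, h + 1))

# the closed form of the sum is 2 * t^h - 1, matching inequality (7.3)
```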
The following Python code shows an example B-tree definition. It explicitly passes t when creating a node.

class BTree:
    def __init__(self, t):
        self.t = t
        self.keys = []
        self.children = []

B-tree nodes commonly have satellite data as well. We ignore satellite data for illustration purpose.
In this chapter, we will firstly introduce how to generate a B-tree by insertion. Two different methods will be explained: one is the classic method as in [2], where we split the node before insertion if it's full; the other is the modify-then-fix approach, which is quite similar to the red-black tree solution [3] [2]. We will next explain how to delete a key from a B-tree and how to look up a key.
7.2 Insertion
A B-tree can be created by inserting keys repeatedly. The basic idea is similar to the binary search tree. When inserting key x, from the tree root, we examine all the keys in the node to find a position where all the keys on the left are less than x, while all the keys on the right are greater than x. If the current node is a leaf node, and it is not full (there are fewer than 2t − 1 keys in this node), x will be inserted at this position. Otherwise, the position points to a child node; we need to recursively insert x into it.

Figure 7.3 shows one example. The B-tree illustrated is a 2-3-4 tree. When inserting key x = 22, because it's greater than the root key 20, the right child containing keys 26, 38, 45 is examined next; since 22 < 26, the first child containing keys 21 and 25 is examined. This is a leaf node, and it is not full, so key 22 is inserted into this node.

However, if there are 2t − 1 keys in the leaf, the new key x can't be inserted, because this node is full. Trying to insert key 18 into the above example B-tree meets this problem. There are 2 methods to solve it.
7.2.1 Splitting
Figure 7.3: Insert key 22 into the 2-3-4 tree. (a) 22 > 20, go to the right child; 22 < 26, go to the first child. (b) The leaf containing 21 and 25 is not full; 22 is inserted into it.
Figure 7.4: Split a full node. Before: keys K[1], ..., K[2t−1] with children C[1], ..., C[2t]. After: key K[t] is pushed up into the parent; keys K[1], ..., K[t−1] with children C[1], ..., C[t] stay in the original node, and keys K[t+1], ..., K[2t−1] with children C[t+1], ..., C[2t] move to the new node.
    x.keys = x.keys[:t-1]
    if not is_leaf(x):
        y.children = x.children[t:]
        x.children = x.children[:t]
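The fragment above is the tail of a split routine. For completeness, here is a self-contained sketch of the whole operation; the helper name split_child and the variables x, y are assumptions, not the book's code:

```python
class BTree:
    def __init__(self, t):
        self.t = t
        self.keys = []
        self.children = []

def is_leaf(x):
    return x.children == []

def split_child(node, i):
    # split the full child node.children[i] around its median key
    # K[t], which is pushed up into the parent at position i
    t = node.t
    x = node.children[i]
    y = BTree(t)
    node.keys.insert(i, x.keys[t - 1])
    node.children.insert(i + 1, y)
    y.keys = x.keys[t:]
    x.keys = x.keys[:t - 1]
    if not is_leaf(x):
        y.children = x.children[t:]
        x.children = x.children[:t]
```

For t = 2, splitting a full child holding [1, 2, 3] under an empty parent pushes 2 up and leaves [1] and [3] in the two halves.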
Figure 7.5: Splitting example: (b) t = 3.
For the array-based collection, appending at the tail is much more effective than inserting at another position, because the latter takes O(n) time if the length of the collection is n. The ordered-insert program firstly appends the new element at the end of the existing collection, then iterates from the last element to the first one, and checks if the current two adjacent elements are ordered. If not, these two elements are swapped.
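The ordered insert described above can be sketched as follows (an illustrative helper, not the book's program):

```python
def ordered_insert(lst, x):
    # append x at the tail, then bubble it leftwards until the
    # neighbouring pair is ordered
    lst.append(x)
    i = len(lst) - 1
    while i > 0 and lst[i] < lst[i - 1]:
        lst[i], lst[i - 1] = lst[i - 1], lst[i]
        i -= 1
    return lst
```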
Insert then fixing

In functional settings, B-tree insertion can be realized in a way similar to the red-black tree. When inserting a key into a red-black tree, it is firstly inserted as in the normal binary search tree; then recursive fixing is performed to restore the balance of the tree. A B-tree can be viewed as an extension of the binary search tree, where each node contains multiple keys and children. We can firstly insert the key without considering whether the node is full, then perform fixing to satisfy the minimum degree constraint.

insert(T, k) = fix(ins(T, k))
(7.5)
Function ins(T, k) traverses the B-tree T from the root to find a proper position where key k can be inserted. After that, function fix is applied to restore the B-tree properties. Denote the B-tree in the form T = (K, C, t), where K represents the keys, C represents the children, and t is the minimum degree.

Below is the Haskell definition of the B-tree.

data BTree a = Node { keys :: [a]
                    , children :: [BTree a]
                    , degree :: Int } deriving (Eq)

There are two cases when realizing the ins(T, k) function. If the tree T is a leaf, k is inserted into the keys; otherwise, if T is a branch node, we need to recursively insert k into the proper child.
Figure 7.6 shows the branch case. The algorithm first locates the position: for a certain key ki, if the new key k to be inserted satisfies ki−1 < k < ki, then we need to recursively insert k into child ci. This position divides the node into 3 parts: the left part, the child ci, and the right part.
Figure 7.6: Insert key k into a branch node, where K[i−1] < k < K[i]: k is recursively inserted into child C[i].
ins(T, k) = { (K' ∪ {k} ∪ K'', Φ, t) : C = Φ, (K', K'') = divide(K, k)
           { make((K', C1), ins(c, k), (K'', C2)) : (C1, C2) = split(|K'|, C)
(7.6)

The first clause deals with the leaf case. Function divide(K, k) divides the keys into two parts; all keys in the first part are not greater than k, and all the rest are not less than k.

K = K' ∪ K'', ∀k' ∈ K', k'' ∈ K'' ⇒ k' ≤ k ≤ k''
The second clause handles the branch case. Function split(n, C) splits the children into two parts, C1 and C2: C1 contains the first n children, and C2 contains the rest. Among C2, the first child is denoted as c, and the others are represented as C2'.

Here the key k needs to be recursively inserted into child c. Function make takes 3 parameters. The first and the third are pairs of keys and children; the second parameter is a child node. It examines whether a B-tree node made from these keys and children violates the minimum degree constraint, and performs fixing if necessary.
make((K', C'), c, (K'', C'')) = { fixFull((K', C'), c, (K'', C'')) : full(c)
                                { (K' ∪ K'', C' ∪ {c} ∪ C'', t) : otherwise
(7.7)

Where function full(c) tests whether the child c is full. Function fixFull splits
the child c, and forms a new B-tree node with the pushed-up key:

fixFull((K', C'), c, (K'', C'')) = (K' ∪ {k'} ∪ K'', C' ∪ {c1, c2} ∪ C'', t)
(7.8)

Where (c1, k', c2) = split(c) splits the full child into two halves around the
middle key k', which is pushed up.
fix(T) = { c : T = (∅, {c}, t)
         { ({k'}, {c1, c2}, t) : full(T), (c1, k', c2) = split(T)
         { T : otherwise
(7.9)
7.3 Deletion
Deleting a key from a B-tree may violate the balance properties. Except for the
root, a node shouldn't contain too few keys: it must keep at least t − 1, where t
is the minimum degree.
Similar to the approaches for insertion, we can either do some preparation
so that the node from which the key is deleted contains enough keys, or do
some fixing after the deletion if the node has too few keys.
7.3.1
We start from the easiest case. If the key k to be deleted is located in
node x, and x is a leaf node, we can directly remove k from x. If x is the root
(the only node of the tree), we needn't worry about there being too few keys after
the deletion. This case is named case 1 later.
In most cases, we start from the root and follow a path to locate the node
that contains k. If k is located in an internal node x, there are three sub-cases.
Case 2a: if the child y that precedes k contains enough keys (at least t), we
replace k in node x with k', the predecessor of k in child y, and
recursively remove k' from y.
The predecessor of k can be easily located as the last key of child y.
This is shown in figure 7.8.
to c_i, and move one key from the sibling up to x. We also need to move the
corresponding child from the sibling to c_i.
This operation makes c_i contain enough keys for deletion; we can then
try to delete k from c_i recursively.
Figure 7.11 illustrates this case.
21:       Break
22:     else
23:       i ← i + 1
24:   if T is leaf then
25:     return T   ▷ k doesn't exist in T
26:   if ¬ Can-Del(c_i(T)) then   ▷ case 3
27:     if i > 1 ∧ Can-Del(c_{i-1}(T)) then   ▷ case 3a: left sibling
28:       Insert(K(c_i(T)), k_{i-1}(T))
29:       k_{i-1}(T) ← Pop-Back(K(c_{i-1}(T)))
30:       if c_i(T) isn't a leaf then
31:         c ← Pop-Back(C(c_{i-1}(T)))
32:         Insert(C(c_i(T)), c)
33:     else if i < |C(T)| ∧ Can-Del(c_{i+1}(T)) then   ▷ case 3a: right sibling
34:       Append(K(c_i(T)), k_i(T))
35:       k_i(T) ← Pop-Front(K(c_{i+1}(T)))
36:       if c_i(T) isn't a leaf then
37:         c ← Pop-Front(C(c_{i+1}(T)))
38:         Append(C(c_i(T)), c)
39:     else   ▷ case 3b
40:       if i > 1 then
41:         Merge-Children(T, i − 1)
42:       else
43:         Merge-Children(T, i)
44:   Delete(c_i(T), k)   ▷ recursive delete
45:   if K(T) = NIL then   ▷ tree shrinks in height
46:     T ← c_1(T)
47:   return T
Figures 7.13, 7.14, and 7.15 show the deleting process step by step. The
modified nodes are shaded.
The following example Python program implements the B-tree deletion algorithm.
# helper functions on the B-tree node (used in function style below)
def can_remove(tr):
    return len(tr.keys) >= tr.t

def replace_key(tr, i, k):
    tr.keys[i] = k
    return k

def merge_children(tr, i):
    tr.children[i].keys += [tr.keys[i]] + tr.children[i+1].keys
    tr.children[i].children += tr.children[i+1].children
    tr.keys.pop(i)
    tr.children.pop(i+1)
def B_tree_delete(tr, key):
    i = len(tr.keys)
    while i > 0:
        if key == tr.keys[i-1]:
            if tr.leaf:  # case 1 in CLRS
                tr.keys.remove(key)
            else:  # case 2 in CLRS
                if can_remove(tr.children[i-1]):  # case 2a
                    key = replace_key(tr, i-1, tr.children[i-1].keys[-1])
                    B_tree_delete(tr.children[i-1], key)
                elif can_remove(tr.children[i]):  # case 2b
                    key = replace_key(tr, i-1, tr.children[i].keys[0])
                    B_tree_delete(tr.children[i], key)
                else:  # case 2c
                    merge_children(tr, i-1)
                    B_tree_delete(tr.children[i-1], key)
                    if tr.keys == []:  # tree shrinks in height
                        tr = tr.children[i-1]
            return tr
        elif key > tr.keys[i-1]:
            break
        else:
            i = i-1
    # case 3
    if tr.leaf:
        return tr  # key doesn't exist at all
    if not can_remove(tr.children[i]):
        if i > 0 and can_remove(tr.children[i-1]):  # case 3a: left sibling
            tr.children[i].keys.insert(0, tr.keys[i-1])
            tr.keys[i-1] = tr.children[i-1].keys.pop()
            if not tr.children[i].leaf:
                tr.children[i].children.insert(0, tr.children[i-1].children.pop())
        elif i + 1 < len(tr.children) and can_remove(tr.children[i+1]):  # case 3a: right sibling
            tr.children[i].keys.append(tr.keys[i])
            tr.keys[i] = tr.children[i+1].keys.pop(0)
            if not tr.children[i].leaf:
                tr.children[i].children.append(tr.children[i+1].children.pop(0))
        else:  # case 3b
            if i > 0:
                merge_children(tr, i-1)
            else:
                merge_children(tr, i)
    B_tree_delete(tr.children[i], key)
    if tr.keys == []:  # tree shrinks in height
        tr = tr.children[0]
    return tr
7.3.2
The merge-and-delete algorithm is a bit complex: there are several cases, and
in each case there are sub-cases to deal with.
Another approach to designing the deletion algorithm is to perform fixing after
the deletion. It is similar to the insert-then-fix strategy.

delete(T, k) = fix(del(T, k))
(7.10)
When deleting a key from a B-tree, we first locate the node that contains it.
We traverse from the root towards the leaves until we find the key in some node.
If this node is a leaf, we can remove the key, and then examine whether the deletion
makes the node contain too few keys to satisfy the B-tree balance properties.
If it is a branch node, removing the key breaks the node into two parts, which
need to be merged together. The merging is a recursive process, shown
in figure 7.16.
When merging, if the two nodes are not leaves, we merge the keys together, and
recursively merge the last child of the left part and the first child
of the right part into one new node. Otherwise, if they are leaves, we merely put
all the keys together.
So far, the deletion is performed in a straightforward way. However, deleting
decreases the number of keys of a node, and it may result in violating the B-tree
balance properties. The solution is to perform fixing along the path traversed
from the root.
During the recursive deletion, the branch node is broken into three parts. The
left part contains all keys less than k, namely k_1, k_2, ..., k_{i-1}, and children
c_1, c_2, ..., c_{i-1}; the right part contains all keys greater than k, say k_i, k_{i+1}, ..., k_n,
and children c_{i+1}, c_{i+2}, ..., c_{n+1}. Then key k is recursively deleted from child c_i.
Denote the result as c_i' after that. We need to make a new node from these
three parts, as shown in figure 7.17.
At this point, we need to examine whether c_i' contains enough keys. If there
are too few keys (fewer than t − 1, in contrast to t in the merge-and-delete
approach), we can borrow a key-child pair from the left or the right part,
and do the inverse operation of splitting. Figure 7.18 shows an example of borrowing
from the left part.
If both the left part and the right part are empty, we can simply push c_i' up.
Figure 7.16: Delete a key from a branch node. Removing k_i breaks the node
into two parts. Merging these two parts is a recursive process. When the two parts
are leaves, the merging terminates.
Figure 7.17: After deleting key k from child c_i, denote the result as c_i'. The fixing
makes a new node from the left part, c_i', and the right part.
Figure 7.18: Borrow a key-child pair from left part and un-split to a new child.
Denote the B-tree as T = (K, C, t), where K and C are the keys and children.
The del(T, k) function deletes key k from the tree.

del(T, k) = { (delete(K, k), ∅, t) : C = ∅
            { merge((K1, C1, t), (K2, C2, t)) : k_i = k ∈ K
            { make((K1, C1), del(c, k), (K2, C2)) : k ∉ K
(7.11)

If C = ∅, T is a leaf, and k is deleted from the keys directly. Otherwise, T is an
internal node. If k ∈ K, removing it separates the keys and children into two
parts (K1, C1) and (K2, C2), which are recursively merged.

K1 = {k_1, k_2, ..., k_{i-1}}
K2 = {k_{i+1}, k_{i+2}, ..., k_m}
C1 = {c_1, c_2, ..., c_i}
C2 = {c_{i+1}, c_{i+2}, ..., c_{m+1}}

If k ∉ K, we need to locate a child c, and further delete k from it.

(K1, K2) = ({k' | k' ∈ K, k' < k}, {k' | k' ∈ K, k < k'})
(C1, {c} ∪ C2) = splitAt(|K1|, C)
The recursive merge function is defined as follows. When merging two
trees T1 = (K1, C1, t) and T2 = (K2, C2, t), if both are leaves, we create a new
leaf by concatenating the keys. Otherwise, the last child in C1 and the first
child in C2 are recursively merged, and we call the make function to form the new
tree. When C1 and C2 are not empty, denote the last child of C1 as c_{1,m} and the
rest as C1'; the first child of C2 as c_{2,1} and the rest as C2'. The equation below
defines the merge function.
merge(T1, T2) = { (K1 ∪ K2, ∅, t) : C1 = C2 = ∅
                { make((K1, C1'), merge(c_{1,m}, c_{2,1}), (K2, C2')) : otherwise
(7.12)

The make function defined above only handles the case that a node contains
too many keys due to insertion. Deleting a key may cause a node to contain
too few keys; we need to test and fix this situation as well.

make((K', C'), c, (K'', C'')) = { fixLow((K', C'), c, (K'', C'')) : low(c)
                                { (K' ∪ K'', C' ∪ {c} ∪ C'', t) : otherwise
(7.13)
Where low(T) checks whether a node contains too few keys (fewer than t − 1).
Function fixLow(Pl, c, Pr) takes three arguments: the left pair of keys and children, a
child node, and the right pair of keys and children. If the left part isn't empty, we
borrow a key-child pair and do un-splitting to make the child contain enough
keys, then recursively call make; if the right part isn't empty, we borrow a pair
from the right; and if both sides are empty, we return the child node as the result.
In this last case, the height of the tree shrinks.
Denote the left part Pl = (Kl, Cl). If Kl isn't empty, the last key and child
are represented as k_{l,m} and c_{l,m} respectively, and the rest keys and children become
Kl' and Cl'. Similarly, the right part is denoted as Pr = (Kr, Cr). If Kr isn't
empty, the first key and child are represented as k_{r,1} and c_{r,1}, and the rest keys
and children are Kr' and Cr'. The equation below gives the definition of fixLow.
fixLow(Pl, c, Pr) = { make((Kl', Cl'), unsplit(c_{l,m}, k_{l,m}, c), Pr) : Kl ≠ ∅
                    { make(Pl, unsplit(c, k_{r,1}, c_{r,1}), (Kr', Cr')) : Kr ≠ ∅
                    { c : otherwise
(7.14)
Function unsplit(T1, k, T2) is the inverse operation of splitting. It forms a
new B-tree node from two small nodes and a key.

unsplit(T1, k, T2) = (K1 ∪ {k} ∪ K2, C1 ∪ C2, t)
(7.15)
The following example Haskell program implements the B-tree deletion algorithm.

import qualified Data.List as L

delete tr x = fixRoot $ del tr x

del :: (Ord a) => BTree a -> a -> BTree a
del (Node ks [] t) x = Node (L.delete x ks) [] t
del (Node ks cs t) x =
  case L.elemIndex x ks of
    Just i  -> merge (Node (take i ks) (take (i+1) cs) t)
                     (Node (drop (i+1) ks) (drop (i+1) cs) t)
    Nothing -> make (ks', cs') (del c x) (ks'', cs'')
  where
    (ks', ks'') = L.partition (< x) ks
    (cs', (c:cs'')) = L.splitAt (length ks') cs
When deleting the same keys from the B-tree, this approach may produce results
different from the merge-and-delete approach. However, both satisfy the B-tree
properties, so they are both valid.
7.4 Searching
The search algorithm can also be realized with recursion. When searching for key k
in B-tree T = (K, C, t), we partition the keys with k.

K1 = {k' | k' ∈ K, k' < k}
K2 = {k' | k' ∈ K, k ≤ k'}

Thus K1 contains all the keys less than k, and K2 holds the rest. If the first
element of K2 equals k, we have found the key. Otherwise, we recursively search for
the key in child c_{|K1|+1}.

search(T, k) = { (T, |K1| + 1) : k is the first element of K2
               { ∅ : C = ∅
               { search(c_{|K1|+1}, k) : otherwise
(7.16)
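The partition-based search can be sketched in Python. This is a minimal sketch with an assumed, simplified node class (not the book's code); it returns the node and a 0-based key index instead of the 1-based position used in the equations:

```python
class BTreeNode:
    # minimal node: sorted keys; children is empty for a leaf
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(tr, k):
    # count keys strictly less than k: this is |K1|
    i = len([x for x in tr.keys if x < k])
    if i < len(tr.keys) and tr.keys[i] == k:
        return (tr, i)          # found: the node and the key index
    if not tr.children:
        return None             # reached a leaf, key absent
    return search(tr.children[i], k)
```

For example, in a two-level tree with root key 10 and leaves {2, 5} and {12, 20}, searching for 5 descends into the left child and returns it with index 1.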
7.5
Exercise 7.1
When inserting a key, we need to find a position where all keys on the left are
less than it, while all the others on the right are greater than it. Modify
the algorithm so that the elements stored in the B-tree only need to support
the less-than and equality tests.
We assume the element being inserted doesn't exist in the tree. Modify
the algorithm so that duplicated elements can be stored in a linked list.
Eliminate the recursion in the imperative B-tree insertion algorithm.
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001.
ISBN: 0262032937.
[2] B-tree, Wikipedia. http://en.wikipedia.org/wiki/B-tree
[3] Chris Okasaki. FUNCTIONAL PEARLS Red-Black Trees in a Functional
Setting. J. Functional Programming. 1998
Part III
Heaps
Chapter 8
Binary Heaps

8.1 Introduction
Heaps are one of the most widely used data structures, used to solve practical
problems such as sorting, prioritized scheduling, and in implementing graph
algorithms, to name a few [2].
Most popular implementations of heaps use a kind of implicit binary heap
stored in an array, which is described in [2]. Examples include the C++/STL heap and
Python's heapq. The most efficient heap sort algorithm is also realized with a
binary heap, as proposed by R. W. Floyd [3] [5].
However, heaps can be general and realized with a variety of other data structures
besides the array. In this chapter, explicit binary trees are used. This leads to
Leftist heaps, Skew heaps, and Splay heaps, which are suitable for purely functional
implementation, as shown by Okasaki [6].
A heap is a data structure that satisfies the following heap property.
The top operation always returns the minimum (maximum) element;
The pop operation removes the top element from the heap while keeping the heap
property, so that the new top element is still the minimum (maximum) one;
Inserting a new element into the heap should keep the heap property, so that the
new top is still the minimum (maximum) element;
Other operations, including merge etc., should all keep the heap property.
This is a kind of recursive definition, and it doesn't restrict the underlying
data structure.
We call a heap with the minimum element on top a min-heap; if
the top keeps the maximum element, we call it a max-heap.
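These properties can be observed with Python's built-in heapq module, mentioned above, which maintains a min-heap on a plain list:

```python
import heapq

data = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
h = []
for x in data:
    heapq.heappush(h, x)      # insert keeps the heap property

top = h[0]                    # the top is always the minimum
out = [heapq.heappop(h) for _ in data]  # each pop restores the property
```

Since every pop returns the current minimum, draining the heap yields the elements in sorted order.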
8.2
return the root as the result. And for the pop operation, we can remove the root
and rebuild the tree from the children.
If a binary tree is used to implement the heap, we call it a binary heap. This
chapter explains three different realizations of the binary heap.
8.2.1 Definition
The first one is the implicit binary tree. Consider the problem of how to represent
a complete binary tree with an array. (For example, try to represent a complete
binary tree in a programming language that doesn't support structure or record
data types, where only arrays can be used.) One solution is to pack all elements from
the top level (root) down to the bottom level (leaves).
Figure 8.1 shows a complete binary tree and its corresponding array representation.
This mapping between the tree and the array can be defined with the following
equations (the array index starts from 1).
1: function Parent(i)
2:   return ⌊i/2⌋
3: function Left(i)
4:   return 2i
5: function Right(i)
6:   return 2i + 1
For a given tree node represented as the i-th element of the array,
since the tree is complete, we can easily find its parent node as the ⌊i/2⌋-th
element; its left child has index 2i and its right child 2i + 1. If the index of
a child exceeds the length of the array, the node does not have that child (it is
a leaf, for example).
In a real implementation, this mapping can be computed quickly with bit-wise
operations, as in the following example ANSI C code. Note that the array index
starts from zero in C-like languages.
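For comparison, the same 0-based mapping (as used by C-like languages) can be sketched in Python; the function names here are illustrative, not from the original listing:

```python
def parent(i):
    return (i - 1) >> 1   # floor((i - 1) / 2)

def left(i):
    return (i << 1) + 1   # 2i + 1

def right(i):
    return (i << 1) + 2   # 2i + 2
```

With 0-based indices the root sits at index 0, and `parent(left(i)) == parent(right(i)) == i` for every node i.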
8.2.2 Heapify
The most important thing for a heap algorithm is to maintain the heap property:
the top element should be the minimum (maximum) one.
For the implicit binary heap in an array, this means that for a given node,
represented by index i, we can develop a method to check that neither of its two
children is less than the parent. In case of violation, we swap
the parent and child recursively [2]. Note that here we assume both
sub-trees are already valid heaps.
The algorithm below gives the iterative solution that enforces the min-heap
property from a given index of the array.
1: function Heapify(A, i)
2:   n ← |A|
3:   loop
4:     l ← Left(i)
5:     r ← Right(i)
6:     smallest ← i
7:     if l ≤ n ∧ A[l] < A[i] then
8:       smallest ← l
9:     if r ≤ n ∧ A[r] < A[smallest] then
10:      smallest ← r
11:    if smallest ≠ i then
12:      Exchange A[i] ↔ A[smallest]
13:      i ← smallest
14:    else
15:      return
For array A and the given index i, neither of the children should be less than
A[i]. In case of violation, we pick the smallest element as A[i], and swap
the previous A[i] down to that child. The algorithm traverses the tree top-down to
fix the heap property until it either reaches a leaf or there is no more violation.
The Heapify algorithm takes O(lg n) time, where n is the number of elements,
because the number of loop iterations is proportional to the height of the complete
binary tree.
When implementing this algorithm, the comparison method can be passed as
a parameter, so that both min-heaps and max-heaps can be supported. The
following ANSI C example code uses this approach.
typedef int (*Less)(Key, Key);

int less(Key x, Key y) { return x < y; }
int notless(Key x, Key y) { return !less(x, y); }

void heapify(Key* a, int i, int n, Less lt) {
    int l, r, m;
    while (1) {
        l = LEFT(i);
        r = RIGHT(i);
        m = i;  /* pick the child that should be above the parent */
        if (l < n && lt(a[l], a[m])) m = l;
        if (r < n && lt(a[r], a[m])) m = r;
        if (m != i) {
            Key t = a[i]; a[i] = a[m]; a[m] = t;
            i = m;
        } else {
            break;
        }
    }
}
Figure 8.2 illustrates the steps when Heapify processes the array
{16, 4, 10, 14, 7, 9, 3, 2, 8, 1} from the second index. The array changes to
{16, 14, 10, 8, 7, 9, 3, 2, 4, 1}, a max-heap.
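This example can be replayed with a small Python sketch of Heapify (0-based indices; the comparison is passed as a parameter, as suggested above — the code is illustrative, not the book's listing):

```python
def heapify(a, i, above):
    # sift a[i] down; above(x, y) is true when x belongs nearer the top
    n = len(a)
    while True:
        l, r, m = 2 * i + 1, 2 * i + 2, i   # 0-based children
        if l < n and above(a[l], a[m]):
            m = l
        if r < n and above(a[r], a[m]):
            m = r
        if m == i:
            return
        a[i], a[m] = a[m], a[i]
        i = m

a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapify(a, 1, lambda x, y: x > y)   # max-heap, second node (0-based index 1)
```

After the call, `a` holds {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}, matching the figure.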
8.2.3 Build a heap
(a) Step 1: 14 is the biggest element among 4, 14, and 7; swap 4 with the left
child.
(b) Step 2: 8 is the biggest element among 2, 4, and 8; swap 4 with the right
child.
Figures 8.3, 8.4 and 8.5 show the steps when building a max-heap from the array
{4, 1, 3, 2, 16, 9, 10, 14, 8, 7}. The node in black is the one to which Heapify
is being applied; the nodes in gray are swapped in order to keep the heap
property.
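The build procedure itself is not reproduced in this excerpt; a plausible Python reconstruction of the standard bottom-up build (apply Heapify to every branch node, from the last one back to the root) reproduces the figures' result:

```python
def heapify(a, i, above):
    n = len(a)
    while True:
        l, r, m = 2 * i + 1, 2 * i + 2, i
        if l < n and above(a[l], a[m]): m = l
        if r < n and above(a[r], a[m]): m = r
        if m == i: return
        a[i], a[m] = a[m], a[i]
        i = m

def build_heap(a, above):
    # heapify every branch node, bottom-up; leaves are already valid heaps
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i, above)

a = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_heap(a, lambda x, y: x > y)    # build a max-heap
```

Running this yields {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}, the final state shown in figure 8.5.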
8.2.4
The generic definition of a heap (not necessarily the binary heap) demands that
we provide basic operations for accessing and modifying data.
The most important operations include accessing the top element (finding the
minimum or maximum one), popping the top element from the heap, finding
the top k elements, decreasing a key (for a min-heap; it is increasing a key for a
max-heap), and insertion.
For the binary tree realization, most of these operations are bound to O(lg n)
in the worst case; some of them, such as top, take O(1) constant time.
Access the top element
For the binary tree realization, it is the root that stores the minimum (maximum)
value. This is the first element of the array.
1: function Top(A)
2:   return A[1]
This operation is trivial and takes O(1) time. Here we skip the error handling
for the empty case; if the heap is empty, one option is to raise an error.
Heap Pop
The pop operation is more complex than accessing the top, because the heap
property has to be maintained after the top element is removed.
The solution is to apply the Heapify algorithm after the root is removed.
One simple but slow method based on this idea looks like the following.
1: function Pop-Slow(A)
2:   x ← Top(A)
3:   Remove(A, 1)
(b) Step 1: the array is mapped to a binary tree; the first branch node, which
is 16, is examined.
(c) Step 2: 16 is the largest element in the current sub-tree; next is to check the
node with value 2.
Figure 8.3: Build a heap from the arbitrary array. Gray nodes are changed in
each step; the black node will be processed in the next step.
(a) Step 3: 14 is the largest value in the sub-tree; swap 14 and 2; next is to
check the node with value 3.
(b) Step 4: 10 is the largest value in the sub-tree; swap 10 and 3; next is to
check the node with value 1.
Figure 8.4: Build a heap from the arbitrary array (continued).
(a) Step 5: 16 is the largest value in the current sub-tree; swap 16 and 1 first, then
similarly swap 1 and 7; next is to check the root node with value 4.
(b) Step 6: swap 4 and 16, then swap 4 and 14, and then swap 4 and 8; the
whole build process finishes.
Figure 8.5: Build a heap from the arbitrary array (continued).
Decrease key
A heap can be used to implement a priority queue, so it is important to support key
modification. One typical operation is to increase the priority of a task
so that it can be performed earlier.
Here we present the decrease-key operation for a min-heap; the corresponding
operation is increase-key for a max-heap. Figures 8.6 and 8.7 illustrate such a
case for a max-heap: the key of the 9-th node is increased from 4 to 15.
(b) The key is modified to 15, which is greater than its parent.
(a) Since 15 is greater than its parent 14, they are swapped. After that, because
15 is less than 16, the process terminates.
Insertion
Insertion can be implemented by using Decrease-Key [2]. A new node with
∞ as its key is created. According to the min-heap property, it should be the last
element in the underlying array. After that, the key is decreased to the value
to be inserted, and Decrease-Key is called to fix any violation of the heap
property.
Alternatively, we can reuse Heap-Fix to implement insertion. The new key
is directly appended at the end of the array, and Heap-Fix is applied to
this new node.
1: function Heap-Push(A, k)
2:   Append(A, k)
3:   Heap-Fix(A, |A|)
The following example Python program implements the heap insertion algorithm.
def heap_insert(x, key, less_p = MIN_HEAP):
i = len(x)
x.append(key)
heap_fix(x, i, less_p)
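The heap_fix routine used above (the sift-up that bubbles the appended key towards the root) is not shown in this excerpt; a plausible Python sketch consistent with its use is:

```python
def heap_fix(a, i, less):
    # sift a[i] up while it should be above its parent
    while i > 0 and less(a[i], a[(i - 1) // 2]):
        p = (i - 1) // 2
        a[i], a[p] = a[p], a[i]
        i = p

def heap_push(a, k, less):
    a.append(k)
    heap_fix(a, len(a) - 1, less)

h = []
for x in [3, 1, 4, 1, 5, 9, 2, 6]:
    heap_push(h, x, lambda x, y: x < y)   # min-heap
```

Each push runs in O(lg n), since the key climbs at most the height of the tree.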
8.2.5 Heap sort
top, it may violate the heap property. We can shrink the heap size by one and
perform Heapify to restore the heap property. This process is repeated until
there is only one element left in the heap.
1: function Heap-Sort(A)
2:   Build-Max-Heap(A)
3:   while |A| > 1 do
4:     Exchange A[1] ↔ A[|A|]
5:     |A| ← |A| − 1
6:     Heapify(A, 1)
This is an in-place algorithm; it doesn't need any extra space to hold the result.
The following ANSI C example code implements this algorithm.
void heap_sort(Key* a, int n) {
build_heap(a, n, notless);
while(n > 1) {
swap(a, 0, --n);
heapify(a, 0, n, notless);
}
}
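The same algorithm can be sketched in self-contained Python (an illustrative translation, not the book's listing):

```python
def heapify(a, i, n, above):
    # sift a[i] down within the first n elements
    while True:
        l, r, m = 2 * i + 1, 2 * i + 2, i
        if l < n and above(a[l], a[m]): m = l
        if r < n and above(a[r], a[m]): m = r
        if m == i: return
        a[i], a[m] = a[m], a[i]
        i = m

def heap_sort(a):
    gt = lambda x, y: x > y
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # build a max-heap
        heapify(a, i, n, gt)
    while n > 1:                          # move the top to the end, shrink, fix
        n -= 1
        a[0], a[n] = a[n], a[0]
        heapify(a, 0, n, gt)

a = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heap_sort(a)
```

Because each of the n − 1 extractions costs one O(lg n) Heapify, the whole sort runs in O(n lg n).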
Exercise 8.1
Somebody considers an alternative way to realize in-place heap sort. Take
sorting the array in ascending order as an example: the first step is to build the
array into a minimum heap A, not the maximum heap as in Floyd's
method. After that, the first element a1 is in the correct place. Next, treat
the rest {a2, a3, ..., an} as a new heap, and perform Heapify on them from
a2 for these n − 1 elements. Repeating this advance-and-Heapify step
from left to right would sort the array. The following example ANSI C
code illustrates this idea. Is this solution correct? If yes, prove it; if not,
why?
void heap_sort(Key* a, int n) {
    build_heap(a, n, less);
    while(--n)
        heapify(++a, 0, n, less);
}
For the same reason, can we perform Heapify from left to right k
times to realize an in-place top-k algorithm, as in the ANSI C code below?
int tops(int k, Key* a, int n, Less lt) {
    build_heap(a, n, lt);
    for (k = MIN(k, n) - 1; k; --k)
        heapify(++a, 0, --n, lt);
    return k;
}
8.3 Leftist heap and Skew heap, the explicit binary heaps
Figure 8.8: A binary tree; all elements in the children are not less than k.
If k is the top element, all elements in the left and right children are not less than
k in a min-heap. After k is popped, only the left and right children remain, and they
have to be merged into a new tree. Since the heap property should be maintained
after the merge, the new root is still the smallest element.
Because both left and right children are binary trees conforming to the heap
property, the two trivial cases can be defined immediately.

merge(H1, H2) = { H2 : H1 = ∅
                { H1 : H2 = ∅
                { ? : otherwise

Where ∅ means the empty heap.
If neither child is empty, because both satisfy the heap property, their
top elements are their respective minimums. We can compare these
two roots, and select the smaller as the new root of the merged heap.
For instance, let L = (A, x, B) and R = (A', y, B'), where A, A', B, and B'
are all sub-trees. If x < y, x will be the new root. We can either keep A and
recursively merge B and R, or keep B and merge A and R, so the new heap
can be one of the following.

(merge(A, R), x, B)
(A, x, merge(B, R))
Both are correct. One simplified solution is to always merge into the right sub-tree.
The Leftist tree provides a systematic approach based on this idea.
8.3.1 Definition
The heap implemented with a Leftist tree is called a Leftist heap. The Leftist tree
was first introduced by C. A. Crane in 1972 [6].
Rank (S-value)
In a Leftist tree, a rank value (or S-value) is defined for each node. The rank is the
distance to the nearest external node, where an external node is the NIL concept
extended from a leaf node.
For example, in figure 8.9, the rank of NIL is defined as 0. Consider the root
node 4: the nearest external node is the child of node 8, so the rank of the root
is 2. Because node 6 and node 8 both contain only NIL children, their rank
values are 1. Although node 5 has a non-NIL left child, its right
child is NIL, so its rank value, the minimum distance to NIL, is still 1.
Leftist property
With the rank defined, we can create a strategy for merging.
Every time when merging, we always merge into the right child; denote the
rank of the new right sub-tree as r_r;
Compare the ranks of the left and right children: if the rank of the left sub-tree
is r_l and r_l < r_r, we swap the left and the right children.
We call this the Leftist property. In general, a Leftist tree always has the
shortest path to some external node on the right.
The Leftist tree tends to be very unbalanced. However, it ensures an important
property, as specified in the following theorem.
Theorem 8.3.1. If a Leftist tree T contains n internal nodes, the path from the
root to the rightmost external node contains at most ⌊log(n + 1)⌋ nodes.
We skip the proof here; readers can refer to [7] and [1] for more information.
With this theorem, algorithms that operate along this path are all bound to O(lg n).
We can reuse the binary tree definition, augmented with a rank field, to
define the Leftist tree, for example in the form (r, k, L, R) for the non-empty case.
The Haskell code below defines the Leftist tree.
data LHeap a = E -- Empty
| Node Int a (LHeap a) (LHeap a) -- rank, element, left, right
For the empty tree, the rank is defined as zero. Otherwise, it is the value of the
augmented field. A rank(H) function can be given to cover both cases:

rank(H) = { 0 : H = ∅
          { r : otherwise, H = (r, k, L, R)
(8.3)

Here is the example Haskell rank function.
rank E = 0
rank (Node r _ _ _) = r
8.3.2 Merge
The merge function relies on an auxiliary mk function, which builds a node from a
key and two sub-trees while maintaining the Leftist property: the sub-tree with
the larger rank is placed on the left, and the rank of the new node is the rank of
the right sub-tree plus one.

mk(k, A, B) = { (rank(A) + 1, k, B, A) : rank(A) < rank(B)
              { (rank(B) + 1, k, A, B) : otherwise
(8.4)

With mk, the merge function is defined for H1 = (r1, k1, L1, R1) and
H2 = (r2, k2, L2, R2) when they are not empty:

merge(H1, H2) = { H2 : H1 = ∅
                { H1 : H2 = ∅
                { mk(k1, L1, merge(R1, H2)) : k1 < k2
                { mk(k2, L2, merge(R2, H1)) : otherwise
(8.5)

The merge function is always recursively called on the right side, and the
Leftist property is maintained. These facts ensure the performance is bound
to O(lg n).
The following Haskell example code implements the merge program.

merge E h = h
merge h E = h
merge h1@(Node _ x l r) h2@(Node _ y l' r') =
    if x < y then makeNode x l (merge r h2)
             else makeNode y l' (merge r' h1)
  where
    -- keep the sub-tree with the larger rank on the left
    makeNode k a b
      | rank a < rank b = Node (rank a + 1) k b a
      | otherwise       = Node (rank b + 1) k a b
8.3.3
Most of the basic heap operations can be implemented with the merge algorithm
defined above.
Top and pop
Because the smallest element is always held in the root, it's trivial to find the
minimum value; it is a constant O(1) operation. The equation below extracts the
root from a non-empty heap H = (r, k, L, R). Error handling for the empty case is
skipped here.

top(H) = k
(8.6)

For the pop operation, the top element is removed first, then the left and right
children are merged into a new heap.

pop(H) = merge(L, R)
(8.7)

Because it calls merge directly, the pop operation on a Leftist heap is bound
to O(lg n).
Insertion
To insert a new element, one solution is to create a single-node heap containing
the element, and then merge it into the existing Leftist tree.

insert(H, k) = merge(H, (1, k, ∅, ∅))
(8.8)

The heap can then be built from a list by folding insert over it, starting from
the empty heap:

build(L) = fold(insert, ∅, L)
(8.9)
Figure 8.10: A Leftist tree built from the list {9, 4, 16, 7, 10, 2, 14, 3, 8, 1}.
Figure 8.10 shows one example Leftist tree built in this way.
The following example Haskell code gives a reference implementation of the
Leftist tree operations.
insert h x = merge (Node 1 x E E) h
findMin (Node _ x _ _) = x
deleteMin (Node _ _ l r) = merge l r
fromList = foldl insert E
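The same operations can be sketched in Python using tuples (rank, key, left, right), with None as the empty heap. This is an illustrative translation of the equations above, not part of the original text:

```python
def rank(h):
    return h[0] if h else 0

def make(k, a, b):
    # keep the child with the larger rank on the left
    if rank(a) < rank(b):
        a, b = b, a
    return (rank(b) + 1, k, a, b)

def merge(h1, h2):
    if not h1: return h2
    if not h2: return h1
    (_, k1, l1, r1), (_, k2, l2, r2) = h1, h2
    if k1 < k2:
        return make(k1, l1, merge(r1, h2))
    return make(k2, l2, merge(r2, h1))

def insert(h, k):
    return merge((1, k, None, None), h)

h = None
for x in [9, 4, 16, 7, 10, 2, 14, 3, 8, 1]:
    h = insert(h, x)

out = []                      # pop everything: top, then merge the children
while h:
    out.append(h[1])
    h = merge(h[2], h[3])
```

Draining the heap this way yields the elements in ascending order, confirming the heap property.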
8.3.4
With all the basic operations defined, it's straightforward to implement heap
sort. We first turn the list into a Leftist heap, then continuously extract
the minimum element from it.
sort(L) = heapSort(build(L))
(8.10)

heapSort(H) = { ∅ : H = ∅
              { {top(H)} ∪ heapSort(pop(H)) : otherwise
(8.11)
8.3.5 Skew heaps
The Leftist heap sometimes leads to a quite unbalanced structure. Figure 8.11 shows
one example: the Leftist tree built by folding over the list {16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.
Figure 8.11: A very unbalanced Leftist tree built from the list
{16, 14, 10, 8, 7, 9, 3, 2, 4, 1}.
The Skew heap (or self-adjusting heap) simplifies the Leftist heap realization and
aims to solve the balance issue [9] [10].
When constructing the Leftist heap, we swap the left and right children during
merging if the rank on the left side is less than that on the right. This
compare-and-swap strategy doesn't work well when a sub-tree has only one child,
because in that case the rank of the sub-tree is always 1 no matter how big it is. A
brute-force approach is to swap the left and right children every time we
merge. This idea leads to the Skew heap.
Definition of Skew heap
A Skew heap is a heap realized with a Skew tree. A Skew tree is a special binary
tree: the minimum element is stored in the root, and every sub-tree is also a Skew tree.
It needn't keep the rank (or S-value) field. We can reuse the binary tree
definition for the Skew heap: the tree is either empty, or has the form
(k, L, R). The Haskell code below defines the Skew heap like this.
data SHeap a = E -- Empty
             | Node a (SHeap a) (SHeap a) -- element, left, right
Merge
The merge algorithm is quite simple. When merging two non-empty Skew
trees, we compare the roots and pick the smaller one as the new root; the
tree containing the bigger element is merged onto one sub-tree, and finally the
two children are swapped. Denote H1 = (k1, L1, R1) and H2 = (k2, L2, R2) when
they are not empty. If k1 < k2, for instance, we select k1 as the new root. We
can either merge H2 into L1, or merge H2 into R1. Without loss of generality,
let's merge into R1. After swapping the two children, the final result is
(k1, merge(R1, H2), L1). Taking the edge cases into account, the merge algorithm is
defined as the following.
defined as the following.
H1
H2
merge(H1 , H2 ) =
(k
,
merge(R
,
H
),
L
1
1
2
1)
(k2 , merge(H1 , R2 ), L2 )
:
:
:
:
H2 =
H1 =
k1 < k 2
otherwise
(8.12)
All the remaining operations, including insert, top and pop, are realized the same
way as in the Leftist heap by using merge, except that we no longer need the rank.
Translating the above algorithm into Haskell yields the following example
program.
merge E h = h
merge h E = h
merge h1@(Node x l r) h2@(Node y l' r') =
    if x < y then Node x (merge r h2) l
             else Node y (merge h1 r') l'

insert h x = merge (Node x E E) h
findMin (Node x _ _) = x
deleteMin (Node _ l r) = merge l r
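An equivalent Python sketch, using (key, left, right) tuples with None as the empty heap, illustrates equation (8.12); it is an illustrative translation, not the book's code:

```python
def merge(h1, h2):
    if not h1: return h2
    if not h2: return h1
    (k1, l1, r1), (k2, l2, r2) = h1, h2
    if k1 < k2:
        return (k1, merge(r1, h2), l1)   # merge to the right, then swap children
    return (k2, merge(h1, r2), l2)

def insert(h, k):
    return merge((k, None, None), h)

h = None
for x in [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]:
    h = insert(h, x)

out = []                      # repeatedly take the top and merge the children
while h:
    out.append(h[0])
    h = merge(h[1], h[2])
```

As with the Leftist heap, draining the heap yields the elements in ascending order.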
Unlike the Leftist heap, if we feed an ordered list to a Skew heap, it builds a
fairly balanced binary tree, as illustrated in figure 8.12.
Figure 8.12: The Skew tree is still balanced even when the input is the ordered list
{1, 2, ..., 10}.
8.4 Splay heap
The Leftist heap and Skew heap show that it's quite possible to realize the
heap data structure with an explicit binary tree. The Skew heap gives one method to
solve the tree balance problem; the Splay heap, on the other hand, uses another
method to keep the tree balanced.
The binary trees used in the Leftist heap and Skew heap are not binary search
trees (BST). If we turn the underlying data structure into a binary search tree,
the minimum (or maximum) element is no longer the root. It takes O(lg n) time
to find the minimum (or maximum) element.
A binary search tree becomes inefficient if it isn't well balanced; most operations
degrade to O(n) in the worst case. Although a red-black tree can be
used to realize a binary heap, it's overkill. The Splay tree provides a lightweight
implementation with an acceptable dynamic balancing result.
8.4.1 Definition
The Splay tree uses a cache-like approach. It keeps rotating the currently accessed node
close to the top, so that the node can be accessed fast next time. It defines
this kind of operation as splaying. For an unbalanced binary search tree, after
several splay operations, the tree tends to become more and more balanced. Most
basic operations of the Splay tree perform in amortized O(lg n) time. The Splay tree
was invented by Daniel Dominic Sleator and Robert Endre Tarjan in 1985 [11]
[12].
Splaying

There are two methods to do splaying. The first needs to deal with many
different cases, but can be implemented fairly easily with pattern matching. The
second has a uniform form, but the implementation is more complex.

Denote the node currently being accessed as X, its parent node as P, and
its grandparent node as G (if any). There are 3 steps for splaying. Each
step contains 2 symmetric cases. For illustration purposes, only one case is shown
for each step.
Zig-zig step. As shown in figure 8.13, in this case, X and P are children
on the same side of G, either both on the left or both on the right. By rotating twice,
X becomes the new root.

Zig-zag step. As shown in figure 8.14, in this case, X and P are children
on different sides. X is on the left and P is on the right, or X is on the right
and P is on the left. After rotation, X becomes the new root, and P and G become
siblings.

Zig step. As shown in figure 8.15, in this case, P is the root. We rotate the
tree so that X becomes the new root. This is the last step in the splay operation.
Although there are 6 different cases, they can be handled easily in environments
that support pattern matching. Denote the non-empty binary tree as T =
(L, k, R). When accessing key Y in tree T, the splay operation can be defined as
below.
splay(T, Y) =
    (a, X, (b, P, (c, G, d))) : T = (((a, X, b), P, c), G, d), X = Y
    (((a, G, b), P, c), X, d) : T = (a, G, (b, P, (c, X, d))), X = Y
    ((a, P, b), X, (c, G, d)) : T = ((a, P, (b, X, c)), G, d), X = Y
    ((a, G, b), X, (c, P, d)) : T = (a, G, ((b, X, c), P, d)), X = Y
    (a, X, (b, P, c)) : T = ((a, X, b), P, c), X = Y
    ((a, P, b), X, c) : T = (a, P, (b, X, c)), X = Y
    T : otherwise
(8.13)

The following Haskell program implements this pattern matching version of splay.

-- zig-zig cases
splay t@(Node (Node (Node a x b) p c) g d) y =
    if x == y then Node a x (Node b p (Node c g d)) else t
splay t@(Node a g (Node b p (Node c x d))) y =
    if x == y then Node (Node (Node a g b) p c) x d else t
-- zig-zag cases
splay t@(Node (Node a p (Node b x c)) g d) y =
    if x == y then Node (Node a p b) x (Node c g d) else t
splay t@(Node a g (Node (Node b x c) p d)) y =
    if x == y then Node (Node a g b) x (Node c p d) else t
-- zig cases
splay t@(Node (Node a x b) p c) y =
    if x == y then Node a x (Node b p c) else t
splay t@(Node a p (Node b x c)) y =
    if x == y then Node (Node a p b) x c else t
-- otherwise
splay t _ = t
With the splay operation defined, every time we insert a new key, we call
the splay function to adjust the tree. If the tree is empty, the result is a leaf;
otherwise we compare the key with the root: if it is less than the root, we
recursively insert it into the left child and perform splaying after that; otherwise the
key is inserted into the right child.
insert(T, x) =
    (∅, x, ∅) : T = ∅
    splay((insert(L, x), k, R), x) : T = (L, k, R), x < k
    splay((L, k, insert(R, x)), x) : otherwise
(8.14)
insert E y = Node E y E
insert (Node l x r) y
    | x > y     = splay (Node (insert l y) x r) y
    | otherwise = splay (Node l x (insert r y)) y
Figure 8.16 shows the result of using this function. It inserts the ordered
elements {1, 2, ..., 10} one by one into the empty tree. A normal binary search tree
would build a very poor result which degrades to a linked list. The splay method
creates a more balanced result.
Okasaki found a simple rule for splaying [6]: whenever we follow two left
branches, or two right branches continuously, we rotate the two nodes.

Based on this rule, splaying can be realized in the following way. When we access a
node for a key x (it can be during the process of inserting, looking up, or deleting a
node), if we traverse two left branches or two right branches,
we partition the tree into two parts L and R, where L contains all nodes smaller
than x, and R contains all the rest. We can then create a new tree (for instance
in insertion), with x as the root, L as the left child, and R as the right child.
The partition process is recursive, because it will splay its children as well.
partition(T, p) =
    (∅, ∅) : T = ∅
    (T, ∅) : T = (L, k, R), R = ∅, k < p
    (((L, k, L′), k′, A), B) : T = (L, k, (L′, k′, R′)), k < p, k′ < p, (A, B) = partition(R′, p)
    ((L, k, A), (B, k′, R′)) : T = (L, k, (L′, k′, R′)), k < p ≤ k′, (A, B) = partition(L′, p)
    (∅, T) : T = (L, k, R), L = ∅, p ≤ k
    (A, (B, k′, (R′, k, R))) : T = ((L′, k′, R′), k, R), p ≤ k′, p ≤ k, (A, B) = partition(L′, p)
    ((L′, k′, A), (B, k, R)) : T = ((L′, k′, R′), k, R), k′ < p ≤ k, (A, B) = partition(R′, p)
(8.15)
Function partition(T, p) takes a tree T and a pivot p as arguments. The
first clause is the edge case: the partition result for an empty tree is a pair of empty left
and right trees. Otherwise, denote the tree as (L, k, R). We compare the
pivot p with the root k. If k < p, there are two sub-cases. One is the trivial case that
R is empty: according to the property of the binary search tree, all elements are
less than p, so the result pair is (T, ∅). For the other case, R = (L′, k′, R′), we
further compare k′ with the pivot p. If k′ < p is also true, we recursively
partition R′ with the pivot; all the elements less than p in R′ are held in tree A,
and the rest are in tree B. The result pair can be composed of two trees: one is
((L, k, L′), k′, A), the other is B. If the key of the right sub-tree is not less than
the pivot, we recursively partition L′ with the pivot to give the intermediate
pair (A, B); the final pair of trees can be composed of (L, k, A) and (B, k′, R′).
There are symmetric cases for p ≤ k. They are handled in the last three clauses.
Translating the above algorithm into Haskell yields the following partition
program.
partition E _ = (E, E)
partition t@(Node l x r) y
    | x < y =
        case r of
          E -> (t, E)
          Node l' x' r' ->
              if x' < y then
                  let (small, big) = partition r' y in
                  (Node (Node l x l') x' small, big)
              else
                  let (small, big) = partition l' y in
                  (Node l x small, Node big x' r')
    | otherwise =
        case l of
          E -> (E, t)
          Node l' x' r' ->
              if y < x' then
                  let (small, big) = partition l' y in
                  (small, Node big x' (Node r' x r))
              else
                  let (small, big) = partition r' y in
                  (Node l' x' small, Node big x r)
pop(T) =
    R : T = (∅, k, R)
    (R′, k, R) : T = ((∅, k′, R′), k, R)
    (pop(L′), k′, (R′, k, R)) : T = ((L′, k′, R′), k, R), L′ ≠ ∅
(8.18)
Note that the third clause performs splaying without explicitly calling the
partition function. It utilizes the property of the binary search tree directly.

Both the top and pop algorithms are bound to O(lg n) time because the
splay tree is balanced.
The following Haskell example programs implement the top and pop operations.
findMin (Node E x _) = x
findMin (Node l x _) = findMin l
deleteMin (Node E x r) = r
deleteMin (Node (Node E x' r') x r) = Node r' x r
deleteMin (Node (Node l' x' r') x r) = Node (deleteMin l') x' (Node r' x r)
Merge

Merge is another basic operation for heaps, as it is widely used in graph algorithms. By using the partition algorithm, merge can be realized in O(lg n)
time.

When merging two splay trees, for the non-trivial case, we can take the root of
the first tree as the new root, then partition the second tree with this new root
as the pivot. After that we recursively merge the children of the first tree with
the partition result. This algorithm is defined as the following.
merge(T1, T2) =
    T2 : T1 = ∅
    (merge(L, A), k, merge(R, B)) : T1 = (L, k, R), (A, B) = partition(T2, k)
(8.19)
If the first heap is empty, the result is definitely the second heap. Otherwise,
denote the first splay heap as (L, k, R); we partition T2 with k as the pivot to
yield (A, B), where A contains all the elements in T2 which are less than k, and
B holds the rest. We next recursively merge A with L, and merge B with R, as
the new children of T1.
Translating the definition to Haskell gives the following example program.
merge E t = t
merge (Node l x r) t = Node (merge l l') x (merge r r')
    where (l', r') = partition t x
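As a cross-check of the partition-based operations, the following Python sketch (our own illustration, not from the original text) realizes the splay heap as nested tuples (left, key, right), with partition, insert, merge, find_min and delete_min following the definitions above.

```python
# A splay heap as nested tuples: None is the empty tree,
# otherwise a tree is (left, key, right), a binary search tree.

def partition(t, p):
    """Split t into (all elements < p, all elements >= p), splaying on the way."""
    if t is None:
        return (None, None)
    l, k, r = t
    if k < p:
        if r is None:
            return (t, None)
        rl, rk, rr = r
        if rk < p:
            small, big = partition(rr, p)
            return (((l, k, rl), rk, small), big)
        else:
            small, big = partition(rl, p)
            return ((l, k, small), (big, rk, rr))
    else:
        if l is None:
            return (None, t)
        ll, lk, lr = l
        if p <= lk:
            small, big = partition(ll, p)
            return (small, (big, lk, (lr, k, r)))
        else:
            small, big = partition(lr, p)
            return ((ll, lk, small), (big, k, r))

def insert(t, x):
    small, big = partition(t, x)
    return (small, x, big)

def merge(t1, t2):
    if t1 is None:
        return t2
    l, k, r = t1
    small, big = partition(t2, k)
    return (merge(l, small), k, merge(r, big))

def find_min(t):
    l, k, _ = t
    return k if l is None else find_min(l)

def delete_min(t):
    l, k, r = t
    if l is None:
        return r
    ll, lk, lr = l
    if ll is None:
        return (lr, k, r)
    return (delete_min(ll), lk, (lr, k, r))
```

Since the tree is a binary search tree, repeatedly taking find_min and delete_min yields the elements in ascending order.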
8.4.2 Heap sort

8.5 Notes and short summary
In this chapter, we define the binary heap more generally: as long as the heap
property is maintained, any binary representation of data structures can be used
to implement the binary heap.

This definition isn't limited to the popular array-based binary heap, but
also extends to explicit binary heaps, including the Leftist heap, Skew heap and
Splay heap. The array-based binary heap is particularly convenient for the
imperative implementation because it intensively uses random index access, which
can be mapped to a complete binary tree. It's hard to find a directly functional
counterpart in this way.

However, by using an explicit binary tree, a functional implementation can be
achieved. Most of them have O(lg n) worst case performance, and some of them
even reach O(1) amortized time. Okasaki in [6] shows a detailed analysis of these
data structures.

In this chapter, only the purely functional realizations of the Leftist heap, Skew heap,
and Splay heap are explained; they can all be realized in imperative approaches as well.
It's very natural to extend the concept from the binary tree to the k-ary (k-way)
tree, which leads to other useful heaps such as the Binomial heap, Fibonacci heap
and pairing heap. They are introduced in the following chapters.
Exercise 8.2
Realize the imperative Leftist heap, Skew heap, and Splay heap.
Bibliography
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937.
[2] Heap (data structure), Wikipedia. http://en.wikipedia.org/wiki/Heap_(data_structure)
[3] Heapsort, Wikipedia. http://en.wikipedia.org/wiki/Heapsort
[4] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999. ISBN-13: 978-0521663502.
[5] Sorting algorithms/Heapsort. Rosetta Code. http://rosettacode.org/wiki/Sorting_algorithms/Heapsort
Chapter 9
From grape to the world cup, the evolution of selection sort

9.1 Introduction
We have introduced the "hello world" sorting algorithm, insertion sort. In this
short chapter, we explain another straightforward sorting method, selection sort.
The basic version of selection sort doesn't perform as well as the divide and
conquer methods, e.g. quick sort and merge sort. We'll use the same approach as in the chapter about insertion sort to analyze why it's slow, and try to
improve it through a variety of attempts, until we reach the best bound of comparison-based
sorting, O(n lg n), by evolving to heap sort.
The idea of selection sort can be illustrated by a real life story. Consider
a kid eating a bunch of grapes. There are two types of children according to
my observation. One is the optimistic type: the kid always eats the biggest
grape he/she can ever find; the other is pessimistic: he/she always eats the
smallest one.

The first type of kids actually eat the grapes in an order whose size decreases
monotonically, while the others eat in increasing order. The kid in fact sorts the grapes
by size, and the method used here is selection sort.
Based on this idea, the algorithm of selection sort can be directly described
as the following.

In order to sort a series of elements:

The trivial case: if the series is empty, then we are done; the result is also
empty.

Otherwise, we find the smallest element, and append it to the tail of the
result.

Note that this algorithm sorts the elements in increasing order. It's easy to
sort in decreasing order by picking the biggest element instead. We'll introduce
passing a comparator as a parameter later on.
CHAPTER 9. FROM GRAPE TO THE WORLD CUP, THE EVOLUTION OF SELECTION SORT
sort(A) =
    ∅ : A = ∅
    {m} ∪ sort(A′) : otherwise
(9.1)

Where m is the minimum element of collection A, and A′ holds the rest of the
elements except m:

m = min(A)
A′ = A − {m}
We don't limit the data structure of the collection here. Typically, A is an
array in an imperative environment, and a list (a singly linked list in particular) in a
functional environment, and it can even be another data structure which will be
introduced later.

The algorithm can also be given in an imperative manner.
function Sort(A)
    X ← ∅
    while A ≠ ∅ do
        x ← Min(A)
        A ← Del(A, x)
        X ← Append(X, x)
    return X
Figure 9.2 depicts the process of this algorithm.
Figure 9.2: The left part is sorted data; continuously pick the minimum element
of the rest and append it to the result.
We just translated the very original idea of eating grapes line by line, without
considering any expense of time and space. This realization stores the result in
Figure 9.3: The left part is sorted data; continuously pick the minimum element
of the rest and put it to the right position.
9.2 Finding the minimum
We haven't completely realized the selection sort, because we took the operation
of finding the minimum (or the maximum) element as a black box. It's a puzzle
how a kid locates the biggest or the smallest grape. And this is an interesting
topic for computer algorithms.

The easiest but not so fast way to find the minimum in a collection is to
perform a scan. There are several ways to interpret this scan process. Consider
that we want to pick the biggest grape. We start from any grape, compare
it with another one, and pick the bigger one; then we take the next grape and
compare it with the one we have selected so far, pick the bigger one and go on with the
take-and-compare process, until there aren't any grapes we haven't compared.

It's easy to get lost in real practice if we don't mark which grape has been
compared. There are two ways to solve this problem, which are suitable for
different data structures respectively.
9.2.1 Labeling
Method 1 is to label each grape with a number: {1, 2, ..., n}, and we systematically perform the comparison in the order of this sequence of labels. We
first compare grape number 1 and grape number 2, and pick the bigger one; then we
take grape number 3, and do the comparison, ... We repeat this process until
we arrive at grape number n. This is quite suitable for elements stored in an array.
function Min(A)
    m ← A[1]
    for i ← 2 to |A| do
        if A[i] < m then
            m ← A[i]
    return m
With Min defined, we can complete the basic version of selection sort (or the
naive version without any optimization in terms of time and space).

However, this algorithm returns the value of the minimum element instead
of its location (or the label of the grape), which needs a bit of tweaking for the
in-place version. Some languages, such as ISO C++, support returning a
reference (realized here with pointers) as the result, so that the swap can be achieved directly, as below.
template<typename T>
T* min(T* from, T* to) {
    T* m;
    for (m = from++; from != to; ++from)
        if (*from < *m)
            m = from;
    return m;
}

template<typename T>
void ssort(T* xs, int n) {
    for (int i = 0; i < n; ++i)
        std::swap(xs[i], *min(xs + i, xs + n));
}
def ssort(xs):
    n = len(xs)
    for i in range(n):
        m = min_at(xs, i, n)
        (xs[i], xs[m]) = (xs[m], xs[i])
    return xs

def min_at(xs, i, n):
    m = i
    for j in range(i + 1, n):
        if xs[j] < xs[m]:
            m = j
    return m
9.2.2 Grouping
Another method is to group all grapes into two parts: the group we have examined,
and the rest we haven't. We denote these two groups as A and B, and all the
elements (grapes) as L. At the beginning, we haven't examined any grapes at
all, thus A is empty (∅), and B contains all grapes. We can select two arbitrary
grapes from B, compare them, and put the loser (the smaller one, for example) into
A. After that, we repeat this process by continuously picking arbitrary grapes
from B and comparing with the winner of the previous round until B becomes
empty. At this point, the final winner is the minimum element, and A
turns out to be L − {min(L)}, which can be used for the next round of minimum finding.

There is an invariant of this method: at any time, we have L = A ∪
{m} ∪ B, where m is the winner we hold so far.

This approach doesn't need the collection of grapes to be indexed (as being
labeled in method 1). It's suitable for any traversable data structure, including
linked lists etc. Suppose b1 is an arbitrary element in B if B isn't empty, and B′
is the rest of the elements with b1 removed. This method can be formalized with
the auxiliary function below.
extractMin(L) = min′(∅, l1, L′), where

min′(A, m, B) =
    (m, A) : B = ∅
    min′({b1} ∪ A, m, B′) : m < b1
    min′({m} ∪ A, b1, B′) : otherwise
(9.3)
Where L′ is all the elements in L except for the first one, l1. The algorithm
extractMin not only finds the minimum element, but also returns the
updated collection which doesn't contain this minimum. Combining this minimum extraction algorithm with the basic selection sort definition, we can create
a complete functional sorting program, for example as this Haskell code snippet.
sort [] = []
sort xs = x : sort xs' where
    (x, xs') = extractMin xs
extractMin (x:xs) = min [] x xs where
min ys m [] = (m, ys)
min ys m (x:xs) = if m < x then min (x:ys) m xs else min (m:ys) x xs
The first line handles the trivial edge case: the sorting result for an empty
list is obviously empty. The second clause ensures that there is at least one
element; that's why the extractMin function needn't any other pattern matching.
One may think the second clause of the min function should be written like
below:

min ys m (x:xs) = if m < x then min (ys ++ [x]) m xs
                  else min (ys ++ [m]) x xs
Otherwise it will produce the updated list in reverse order. Actually, it's necessary
to use cons instead of appending here. This is because appending is a linear
operation, proportional to the length of part A, while cons is a constant
O(1) time operation. In fact, we needn't keep the relative order of the list to
be sorted, as it will be re-arranged anyway during sorting.
It's quite possible to keep the relative order during sorting¹, while ensuring that
the performance of finding the minimum element doesn't degrade to quadratic. The
following equation defines a solution.
extractMin(L) =
    (l1, ∅) : |L| = 1
    (l1, L′) : l1 < m, (m, L″) = extractMin(L′)
    (m, {l1} ∪ L″) : otherwise
(9.4)
If L is a singleton, the minimum is the only element it contains. Otherwise,
denote l1 as the first element in L, and L′ as the rest of the elements except
l1, that is L′ = {l2, l3, ...}. The algorithm recursively finds the minimum element
in L′, which yields the intermediate result (m, L″), where m is the minimum
element in L′, and L″ contains all the rest of the elements except m. Comparing l1
with m, we can determine which of them is the final minimum result.
The following Haskell program implements this version of selection sort.
sort [] = []
sort xs = x : sort xs' where
    (x, xs') = extractMin xs

extractMin [x] = (x, [])
extractMin (x:xs) = if x < m then (x, xs) else (m, x:xs') where
    (m, xs') = extractMin xs
Note that only the cons operation is used; we needn't append at all, because
the algorithm actually examines the list from right to left. However, it's not
free, as this program needs to book-keep the context (typically via the call stack).
The relative order is ensured by the nature of recursion. Please refer to the
appendix about tail recursion for a detailed discussion.
9.2.3 Performance of the basic selection sorting
Both the labeling method and the grouping method need to examine all the elements to pick the minimum in every round; and in total we pick the minimum

¹ Known as stable sort.
Exercise 9.1

Implement the basic imperative selection sort algorithm (the non-in-place
version) in your favorite programming language. Compare it with the in-place version, and analyze the time and space effectiveness.
9.3 Minor Improvement

9.3.1 Parameterize the comparator
Before any improvement in terms of performance, let's make the selection sort
algorithm general enough to handle different sorting criteria.

We've seen two opposite examples so far: one may need to sort the elements
in ascending order or in descending order. For the former case, we repeatedly
find the minimum, while for the latter, we find the maximum instead.
They are just two special cases. In real world practice, one may want to sort
things by various criteria, e.g. in terms of size, weight, age, ...

One solution to handle them all is to pass the criterion as a compare
function to the basic selection sort algorithm. For example:
sort(c, L) =
    ∅ : L = ∅
    {m} ∪ sort(c, L′) : otherwise, (m, L′) = extract(c, L)
(9.5)
extract(c, L) =
    (l1, ∅) : |L| = 1
    (l1, L′) : c(l1, m), (m, L″) = extract(c, L′)
    (m, {l1} ∪ L″) : ¬c(l1, m)
(9.6)
Where c is a comparator function: it takes two elements, compares them and
returns which one precedes the other. Passing the "less than"
operator (<) turns this algorithm into the version we introduced in the previous
section.

Some environments require passing a total ordering comparator, which
returns one of "less than", "equal", or "greater than". We needn't such a
strong condition here; c only tests whether "less than" is satisfied. However, as the
minimum requirement, the comparator should meet the strict weak ordering conditions
as follows [16]:
Irreflexivity: for all x, it's not the case that x < x;

Asymmetry: for all x and y, if x < y, then it's not the case that y < x;

Transitivity: for all x, y, and z, if x < y and y < z, then x < z.
The following Scheme/Lisp program translates this generic selection sorting
algorithm. The reason why we choose Scheme/Lisp here is that the lexical
scope can simplify the need to pass the "less than" comparator for every function
call.
(define (sel-sort-by ltp? lst)
(define (ssort lst)
(if (null? lst)
lst
(let ((p (extract-min lst)))
(cons (car p) (ssort (cdr p))))))
(define (extract-min lst)
(if (null? (cdr lst))
lst
(let ((p (extract-min (cdr lst))))
(if (ltp? (car lst) (car p))
lst
(cons (car p) (cons (car lst) (cdr p)))))))
(ssort lst))
Note that, both ssort and extract-min are inner functions, so that the
less than comparator ltp? is available to them. Passing < to this function
yields the normal sorting in ascending order:
(sel-sort-by < '(3 1 2 4 5 10 9))
;Value 16: (1 2 3 4 5 9 10)
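The same parameterization can be sketched in Python, where the comparator is passed as a plain function; this is our own illustration (the names sel_sort_by and extract are not from the original text), not the book's program.

```python
def sel_sort_by(ltp, xs):
    """Selection sort parameterized by a 'precedes' predicate ltp(a, b)."""
    def extract(xs):
        # scan once, keeping the current winner m and collecting the losers
        m, rest = xs[0], []
        for x in xs[1:]:
            if ltp(x, m):
                rest.append(m)   # old winner loses, x is the new winner
                m = x
            else:
                rest.append(x)
        return (m, rest)

    res, xs = [], list(xs)
    while xs:
        m, xs = extract(xs)
        res.append(m)
    return res
```

Passing a "less than" predicate sorts ascending; passing "greater than" sorts descending, with no change to the algorithm itself.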
9.3.2 Trivial fine tune
The basic in-place imperative selection sorting algorithm iterates over all elements,
and picks the minimum by traversing as well. It can be written in a compact
way, by inlining the minimum finding part as an inner loop.
procedure Sort(A)
    for i ← 1 to |A| do
        m ← i
        for j ← i + 1 to |A| do
            if A[j] < A[m] then
                m ← j
        Exchange A[i] ↔ A[m]
Observe that, when we are sorting n elements, after the first n − 1 minimum
ones are selected, the only one left is definitely the n-th largest element, so that
we need NOT find the minimum if there is only one element in the list. This
indicates that the outer loop can iterate to n − 1 instead of n.

Another place we can fine-tune is that we needn't swap the elements if the
i-th minimum one is just A[i]. The algorithm can be modified accordingly as
below:
procedure Sort(A)
    for i ← 1 to |A| − 1 do
        m ← i
        for j ← i + 1 to |A| do
            if A[j] < A[m] then
                m ← j
        if m ≠ i then
            Exchange A[i] ↔ A[m]
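The tuned procedure above can be sketched in Python as follows; this is an illustrative translation (the name ssort_tuned is our own), with both fine-tunes applied.

```python
def ssort_tuned(xs):
    """In-place selection sort with two small tweaks:
    the outer loop stops at n - 1, and the swap is skipped
    when the minimum is already in place."""
    n = len(xs)
    for i in range(n - 1):              # the last element needs no scan
        m = i
        for j in range(i + 1, n):
            if xs[j] < xs[m]:
                m = j
        if m != i:                      # avoid a useless swap
            xs[i], xs[m] = xs[m], xs[i]
    return xs
```

As the text notes, neither tweak changes the big-O bound; they only shave constant work.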
Definitely, these modifications won't affect the performance in terms of big-O.
9.3.3 Cock-tail sort
Figure 9.4: Select the maximum every time and put it to the end.
This version reveals the fact that selecting the maximum element can sort
the elements in ascending order as well. What's more, we can find both the
minimum and the maximum elements in one pass of traversing, putting the
minimum at the first location, while putting the maximum at the last position.
This approach can speed up the sorting slightly (it halves the number of outer
loop iterations). This method is called cock-tail sort.
procedure Sort(A)
    for i ← 1 to ⌊|A|/2⌋ do
        min ← i
        max ← |A| + 1 − i
        if A[max] < A[min] then
            Exchange A[min] ↔ A[max]
        for j ← i + 1 to |A| − i do
            if A[j] < A[min] then
                min ← j
            if A[max] < A[j] then
                max ← j
        Exchange A[i] ↔ A[min]
        Exchange A[|A| + 1 − i] ↔ A[max]
This algorithm can be illustrated as in figure 9.5: at any time, the leftmost
and rightmost parts contain the elements sorted so far; the smaller sorted ones
are on the left, while the bigger sorted ones are on the right. The algorithm scans
the unsorted range, locates both the minimum and the maximum positions,
then puts them to the head and the tail of the unsorted range by
swapping.
Figure 9.5: Select both the minimum and maximum in one pass, and put them
to the proper positions.
Note that it's necessary to swap the leftmost and rightmost elements before
the inner loop if they are not in the correct order. This is because we scan the range
excluding these two elements. Another method is to initialize the first element of
the unsorted range as both the maximum and minimum before the inner loop.
However, since we need two swapping operations after the scan, it's possible
that the first swap moves the maximum or the minimum away from the position
we just found, which makes the second swap malfunction. How to solve
this problem is left as an exercise to the reader.
The following Python example program implements this cock-tail sort algorithm.
def cocktail_sort(xs):
    n = len(xs)
    for i in range(n // 2):
        (mi, ma) = (i, n - 1 - i)
        if xs[ma] < xs[mi]:
            (xs[mi], xs[ma]) = (xs[ma], xs[mi])
        for j in range(i + 1, n - 1 - i):
            if xs[j] < xs[mi]:
                mi = j
            if xs[ma] < xs[j]:
                ma = j
        (xs[i], xs[mi]) = (xs[mi], xs[i])
        (xs[n - 1 - i], xs[ma]) = (xs[ma], xs[n - 1 - i])
    return xs
We mentioned that the appending operation is expensive in this intuitive
version. It can be improved in two steps. The first step is
to convert the cock-tail sort into tail-recursive form. Denote the sorted small ones
as A, and the sorted big ones as B in figure 9.5. We use A and B as accumulators.
The new cock-tail sort is defined as the following.
sort′(A, L, B) =
    A ∪ L ∪ B : L = ∅ ∨ |L| = 1
    sort′(A ∪ {lmin}, L″, {lmax} ∪ B) : otherwise
(9.9)

Where lmin, lmax and L″ are defined as before. We start sorting
by passing empty A and B: sort(L) = sort′(∅, L, ∅).

Besides the edge case, observe that the appending operation only happens
on A ∪ {lmin}, while lmax is only linked to the head of B. This appending
occurs in every recursive call. To eliminate it, we can store A in reverse order,
as A′ = reverse(A), so that lmin can be linked to the head instead of appended to the tail:

append(L, x) = reverse(cons(x, reverse(L))) = reverse(cons(x, L′))
(9.10)

sort′(A, L, B) =
    reverse(A) ∪ B : L = ∅
    reverse({l1} ∪ A) ∪ B : |L| = 1
    sort′({lmin} ∪ A, L″, {lmax} ∪ B) : otherwise
(9.11)
Exercise 9.2

Realize the imperative basic selection sort algorithm, which can take a
comparator as a parameter. Please try both a dynamically typed language and
a statically typed language. How to annotate the type of the comparator as
generally as possible in a statically typed language?

Implement Knuth's version of selection sort in your favorite programming
language.

An alternative way to realize cock-tail sort is to assume the i-th element is both
the minimum and the maximum; after the inner loop, the minimum and
maximum are found, then we can swap the minimum to the i-th
position, and the maximum to position |A| + 1 − i. Implement this solution
in your favorite imperative language. Please note that there are several
special edge cases that should be handled correctly:
9.4 Major improvement
Although cock-tail sort halves the number of loops, the performance is still
bound to quadratic time. It means that the method we developed so far handles
big data poorly compared to other divide and conquer sorting solutions.

To improve selection based sort essentially, we must analyze where the
bottleneck is. In order to sort the elements by comparison, we must examine all
the elements for ordering. Thus the outer loop of selection sort is necessary.
However, must it scan all the elements every time to select the minimum? Note
that when we pick the smallest one the first time, we actually traverse the
whole collection, so we partially know which ones are relatively big, and which ones
are relatively small.

The problem is that, when we select the further minimum elements, instead
of re-using the ordering information we obtained previously, we drop it all,
and blindly start a new traverse.

So the key point to improving selection based sort is to re-use the previous
result. There are several approaches; we'll adopt an intuitive idea inspired by
football matches in this chapter.
9.4.1 Tournament knock out
The football world cup is held every four years. There are 32 teams from
different continents playing the final games. Before 1982, there were 16 teams
competing for the tournament finals [4].

For simplification purposes, let's go back to 1978 and imagine a way to determine the champion: in the first round, the teams are grouped into 8 pairs
to play games; after that, there will be 8 winners, and 8 teams will be out.
Then in the second round, these 8 teams are grouped into 4 pairs. This time
there will be 4 winners after the second round of games. Then the top 4 teams
are divided into 2 pairs, so that there will be only two teams left for the final
game.

The champion is determined after 4 rounds of games in total. And there
are actually 8 + 4 + 2 + 1 = 15 games. Now we have the world cup champion;
however, the world cup doesn't finish at this stage: we need to determine
which is the silver medal team.
Readers may argue: isn't the team beaten by the champion in the final game the second best? This is true according to the real world cup rule.
However, it isn't fair enough in some sense.

We often hear about the so-called "group of death". Let's suppose that the
Brazilian team is grouped with the German team at the very beginning. Although both
teams are quite strong, one of them must be knocked out. It's quite possible
that the team which loses that game could beat all the other teams except for the
champion. Figure 9.6 illustrates such a case.
(Figures 9.6 and 9.7, omitted here: the tournament trees built by knocking out among the elements, and the trees after each extraction, where the extracted maximum is replaced with -INF from the root down to the leaf, and the keys along the path are updated.)
We can reuse the binary tree definition given in the first chapter of this
book to represent the tournament tree. In order to back-track from a leaf to the root,
every node should hold a reference to its parent (the concept of a pointer in
environments such as ANSI C):
struct Node {
    Key key;
    struct Node *left, *right, *parent;
};
Function leaf(x) creates a leaf node, with value x as the key, and sets all its
fields, left, right and parent, to NIL. Function branch(key, left, right)
creates a branch node, and links the newly created node as the parent of its two
children if they are not empty. For the sake of brevity, we skip the details of
them. They are left as an exercise to the reader, and the complete program can be
downloaded along with this book.
Some programming environments, such as Python, provide tools to iterate over
every two elements at a time, for example:

for x, y in zip(*[iter(ts)] * 2):

We skip such language-specific features; readers can refer to the Python example program along with this book for details.
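Written out in full, the idiom works like this: both references in the list point to the same iterator, so zip consumes adjacent elements pairwise.

```python
ts = [1, 2, 3, 4, 5, 6]

# [iter(ts)] * 2 is a list holding the SAME iterator twice;
# zip alternates pulling from it, yielding adjacent pairs.
pairs = list(zip(*[iter(ts)] * 2))
```

With the input above, pairs is [(1, 2), (3, 4), (5, 6)]; a trailing odd element would simply be dropped by zip.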
When the maximum element is extracted from the tournament tree, we
replace it with −∞, and repeat this replacement from the root along the path to the
leaf. Next, we back-track to the root through the parent field, and determine the
new maximum element.
function Extract-Max(T)
    m ← Key(T)
    Key(T) ← −∞
    while ¬ Leaf?(T) do     ▷ The top-down pass
        if Key(Left(T)) = m then
            T ← Left(T)
        else
            T ← Right(T)
        Key(T) ← −∞
    while Parent(T) ≠ NIL do     ▷ The bottom-up pass
        T ← Parent(T)
        Key(T) ← Max(Key(Left(T)), Key(Right(T)))
    return m
This algorithm returns the extracted maximum element, and modifies the
tournament tree in place. Because we can't represent −∞ in a real program with
words of limited length, one approach is to define a relatively large negative number
which is less than all the elements in the tournament tree. For example, suppose
all the elements are greater than -65535; we can define negative infinity as below:

#define N_INF -65535
        t = t->left->key == x ? t->left : t->right;
        t->key = N_INF;
    }
    while (t->parent) {
        t = t->parent;
        t->key = max(t->left->key, t->right->key);
    }
    return x;
}
This algorithm first takes O(n) time to build the tournament tree, then
performs n pops to select the maximum of the elements left in the tree. Since
each pop operation is bound to O(lg n), the total performance of tournament knock out sorting is O(n lg n).
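The whole tournament knock out sort can be sketched in Python as follows. This is our own illustration (the names build, extract_max and tournament_sort are not from the original text): trees are tuples (key, left, right), and instead of parent pointers the extraction combines the top-down and bottom-up passes in one recursive function.

```python
NEG_INF = float("-inf")

def build(xs):
    """Build the tournament tree bottom-up; each round pairs the trees."""
    ts = [(x, None, None) for x in xs]          # wrap every element as a leaf
    while len(ts) > 1:
        nxt = [(max(t1[0], t2[0]), t1, t2)
               for t1, t2 in zip(ts[::2], ts[1::2])]
        if len(ts) % 2 == 1:
            nxt.append(ts[-1])                  # odd one out advances directly
        ts = nxt
    return ts[0]

def extract_max(t):
    """Return (maximum, tree with that maximum's leaf replaced by -inf)."""
    k, l, r = t
    if l is None and r is None:                 # reached the winner's leaf
        return k, (NEG_INF, None, None)
    if l[0] == k:                               # follow the winner's path down
        m, l = extract_max(l)
    else:
        m, r = extract_max(r)
    return m, (max(l[0], r[0]), l, r)           # recompute the key on the way up

def tournament_sort(xs):
    if not xs:
        return []
    t = build(xs)
    out = []
    for _ in range(len(xs)):
        m, t = extract_max(t)
        out.append(m)                           # elements come out descending
    return out[::-1]
```

Building takes O(n), and each extraction walks one root-to-leaf path, so the sketch matches the O(n lg n) bound stated above.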
Refine the tournament knock out

It's possible to design the tournament knock out algorithm in a purely functional
approach. And we'll see that the two passes (first top-down, replacing the champion with −∞, then bottom-up, determining the new champion) of the pop operation
can be combined in a recursive manner, so that we needn't the parent field any
more. We can re-use the functional binary tree definition, as in the following example Haskell code.
Thus a binary tree is either empty or a branch node contains a key, a left
sub tree and a right sub tree. Both children are again binary trees.
We've used a hard-coded big negative number to represent −∞. However, this solution is ad-hoc, and it forces all elements to be sorted to be greater than this pre-defined magic number. Some programming environments support algebraic data types, so that we can define negative infinity explicitly. For instance, the below Haskell program sets up the concept of infinity 2 .
data Infinite a = NegInf | Only a | Inf deriving (Eq, Ord)
From now on, we switch back to use the min() function to determine the
winner, so that the tournament selects the minimum instead of the maximum
as the champion.
Denote by key(T) the function that returns the key of the tree rooted at T. Function wrap(x) wraps the element x into a leaf node. Function tree(l, k, r) creates a branch node, with k as the key, and l and r as the two children respectively.
The knock out process can be represented as comparing two trees, picking the smaller key as the new key, and setting these two trees as children:
branch(T1, T2) = tree(T1, min(key(T1), key(T2)), T2)        (9.12)

build(L) = build'({wrap(x) | x ∈ L})        (9.13)
The build'(T) function terminates when there is only one tree left in T, which is the champion. This is the trivial edge case. Otherwise, it groups every two trees in a pair to determine the winners. When there is an odd number of trees, it just promotes the last tree as a winner to the next level of the tournament, and recursively repeats the building process.
build'(T) = T1 : |T| = 1
            build'(pair(T)) : otherwise
(9.14)
2 The order of the definition of NegInf, regular numbers, and Inf is significant if we want to derive the default, correct comparing behavior of Ord. It's possible to specify the detailed order by making the type an instance of Ord manually. However, this is a language-specific feature which is out of the scope of this book. Please refer to other textbooks about Haskell.
Note that this algorithm actually handles another special case: the list to be sorted is empty. The result is obviously empty.
Denote T = {T1, T2, ...} if there are at least two trees, and let T'' represent the rest of the trees after removing the first two. Function pair(T) is defined as the following.
pair(T) = {branch(T1, T2)} ∪ pair(T'') : |T| ≥ 2
          T : otherwise
(9.15)
When extracting the champion (the minimum) from the tournament tree, we examine whether the left or the right child sub-tree has the same key as the root, and recursively extract from that tree until we arrive at a leaf node. Denote the left sub-tree of T as L, the right sub-tree as R, and K as its key. We can define this popping algorithm as the following.

pop(T) = tree(∅, ∞, ∅) : L = R = ∅
         tree(L', min(key(L'), key(R)), R) : K = key(L), L' = pop(L)
         tree(L, min(key(L), key(R')), R') : K = key(R), R' = pop(R)
(9.16)

It's straightforward to translate this algorithm into example Haskell code.
Note that this algorithm only removes the current champion without returning it, so it's necessary to define a function to get the champion at the root node.
top(T ) = key(T )
(9.17)
With these functions defined, tournament knock out sorting can be formalized as follows.

sort(L) = sort'(build(L))        (9.18)

sort'(T) = ∅ : T = ∅ ∨ key(T) = ∞
           {top(T)} ∪ sort'(pop(T)) : otherwise
(9.19)
The rest of the Haskell code is given below to complete the implementation.
The auxiliary functions only, key, and wrap, augmented with explicit infinity support, are listed as the following.
only (Only x) = x
key (Br _ k _ ) = k
wrap x = Br Empty (Only x) Empty
Exercise 9.3
Implement the helper functions leaf(), branch(), max(), isleaf(), and release() to complete the imperative tournament tree program.
Implement the imperative tournament tree in a programming language that supports GC (garbage collection).
Why can our tournament tree knock out sort algorithm handle duplicated elements (elements with the same value)? We say a sorting algorithm is stable if it keeps the original order of elements with the same value. Is the tournament tree knock out sorting stable?
Design an imperative tournament tree knock out sort algorithm, which
satisfies the following:
Can handle arbitrary number of elements;
Without using hard coded negative infinity, so that it can take elements with any value.
Compare the tournament tree knock out sort algorithm and the binary tree sort algorithm; analyze their efficiency both in time and space.
Compare the heap sort algorithm and the binary tree sort algorithm, and do the same analysis for them.
9.4.2
The final sorting structure described in equation 9.19 can be unified into a more general form, if we treat a tree whose root holds infinity as its key as an empty tree:

sort'(T) = ∅ : T = ∅
           {top(T)} ∪ sort'(pop(T)) : otherwise
(9.20)
This is exactly the same as the heap sort definition we gave in the previous chapter. A heap always keeps the minimum (or the maximum) on top, and provides a fast pop operation. The binary heap by implicit array encodes the tree structure in array indices, so there isn't any extra space allocated except for the n array cells. The functional heaps, such as the leftist heap and the splay heap, allocate n nodes as well. We'll introduce more heaps in the next chapter which perform well in many aspects.
9.5
Short summary
In this chapter, we presented the evolution process of selection based sorting. Selection sort is easy, and is commonly used as an example to teach students about nested looping. It has a simple and straightforward structure, but the performance is quadratic. In this chapter, we saw that there exist ways to improve it, not only by fine tuning, but also by fundamentally changing the data structure, which leads to tournament knock out and heap sort.
Bibliography
[1] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition). Addison-Wesley Professional, May 4, 1998. ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937
[3] Wikipedia. Strict weak order. http://en.wikipedia.org/wiki/Strict_weak_order
[4] Wikipedia. FIFA world cup. http://en.wikipedia.org/wiki/FIFA_World_Cup
Chapter 10
Binomial heap, Fibonacci heap, and pairing heap
10.1 Introduction
In the previous chapter, we mentioned that heaps can be generalized and implemented with a variety of data structures. However, we have only focused on binary heaps so far, whether realized by explicit binary trees or implicit arrays.
It's quite natural to extend the binary tree to a K-ary [1] tree. In this chapter, we first show Binomial heaps, which actually consist of forests of K-ary trees. Binomial heaps keep the performance of all operations within O(lg n) time, as well as keeping the finding of the minimum element to O(1) time.
If we delay some operations in Binomial heaps by using a lazy strategy, they turn into Fibonacci heaps.
All binary heaps we have shown perform no less than O(lg n) time for merging; we'll show it's possible to improve this to O(1) with the Fibonacci heap, which is quite helpful for graph algorithms. Actually, the Fibonacci heap achieves a good amortized time bound of O(1) for almost all operations, leaving only the heap pop at O(lg n).
Finally, we'll introduce the pairing heap. It has the best performance in practice, although the proof of this is still a conjecture for the time being.
10.2
Binomial Heaps
10.2.1
Definition
The binomial heap is more complex than most binary heaps. However, it has excellent merge performance, bound to O(lg n) time. A binomial heap consists of a list of binomial trees.
Binomial tree
In order to explain why the name of the tree is binomial, let's review the famous Pascal's triangle (also known as Jia Xian's triangle, to memorize the Chinese mathematician Jia Xian (1010-1070)) [4].
(Figure: (a) a B0 tree; (b) a Bn tree, which links two trees of rank n−1 together.)
(Figure: example binomial trees B0, B1, B2, B3, and B4; and a binomial heap whose nodes hold keys such as 45, 37, 29, 10, 44, 30, 23, 22, 48, 31, 17, 32, 24, 50, 55.)
Data layout
There are two ways to define K-ary trees imperatively. One is the left-child, right-sibling approach [2]. It is compatible with the typical binary tree structure. Each node has two fields, a left field and a right field. We use the left field to point to the first child of this node, and the right field to point to the sibling of this node. All siblings are represented as a singly linked list. Figure 10.4 shows an example tree represented in this way.
Figure 10.4: A tree represented with the left-child, right-sibling approach. The children C1, C2, ..., Cn of the root R are linked one next to the other through their sibling fields, while each node's left field points to its own first child.
The other way is to use a library-defined collection container, such as an array or a list, to represent all children of a node.
Since the rank of a tree plays a very important role, we also define it as a field.
For the left-child, right-sibling method, we define the binomial tree as the following.1
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.child = None
        self.sibling = None
When initializing a tree with a key, we create a leaf node: its rank is set to zero, and all other fields are set to NIL.
It is quite natural to utilize a pre-defined list to represent multiple children, as below.
class BinomialTree:
    def __init__(self, x = None):
        self.rank = 0
        self.key = x
        self.parent = None
        self.children = []
A binomial heap is defined as a list of binomial trees (a forest) with ranks in monotonically increasing order. As another implicit constraint, no two binomial trees have the same rank.
type BiHeap a = [BiTree a]
10.2.2
Linking trees
Before diving into the basic heap operations such as pop and insert, we'll first see how to link two binomial trees of the same rank into a bigger one. According to the definition of binomial tree, and the heap property that the root always contains the minimum key, we first compare the two root values, select the smaller one as the new root, and insert the other tree as the first child in front of all other children. Suppose the functions Key(T), Children(T), and Rank(T) access the key, children and rank of a binomial tree respectively.
link(T1, T2) = node(r + 1, x, {T2} ∪ C1) : x < y
               node(r + 1, y, {T1} ∪ C2) : otherwise
(10.1)

Where
    x = Key(T1), y = Key(T2)
    r = Rank(T1) = Rank(T2)
    C1 = Children(T1), C2 = Children(T2)
It's possible to realize the link operation in an imperative way. If we use the left-child, right-sibling approach, we link the tree which has the bigger key as the left child of the other, and link that node's original children to its right side as siblings. Figure 10.6 shows the result of one case.
1: function Link(T1, T2)
2:     if Key(T2) < Key(T1) then
3:         Exchange T1 ↔ T2
4:     Sibling(T2) ← Child(T1)
5:     Child(T1) ← T2
6:     Parent(T2) ← T1
7:     Rank(T1) ← Rank(T1) + 1
8:     return T1
Figure 10.6: Suppose x < y; link y to the left side of x, and link the original children of x to the right side of y.
And if we use a container to manage all children of a node, the algorithm is
like below.
1: function Link(T1, T2)
2:     if Key(T2) < Key(T1) then
3:         Exchange T1 ↔ T2
4:     Parent(T2) ← T1
5:     Insert-Before(Children(T1), T2)
6:     Rank(T1) ← Rank(T1) + 1
7:     return T1
It's easy to translate both algorithms to real programs. Here we only show the Python program of Link for illustration purposes 2 .
def link(t1, t2):
    if t2.key < t1.key:
        (t1, t2) = (t2, t1)
    t2.parent = t1
    t1.children.insert(0, t2)
    t1.rank = t1.rank + 1
    return t1

2 The C and C++ programs are also available along with this book.
Exercise 10.1
Implement the tree-linking program in your favorite language with left-child,
right-sibling method.
Insert a new element to the heap (push)

insertT(H, T) = {T} : H = ∅
                {T} ∪ H : Rank(T) < Rank(T1)
                insertT(H', link(T, T1)) : otherwise
(10.2)

where
    H = {T1, T2, ..., Tn}, H' = {T2, T3, ..., Tn}

The idea is that for an empty heap, we set the new tree as the only element to create a singleton forest; otherwise, we compare the ranks of the new tree and the first tree in the forest. If they are the same, we link them together, and recursively insert the linked result (a tree with rank increased by one) into the rest of the forest. If they are not the same, since the pre-condition constrains the rank of the new tree to be the smallest, we put this new tree in front of all the other trees in the forest.
From the binomial properties mentioned above, there are at most O(lg n) binomial trees in the forest, where n is the total number of nodes. Thus the function insertT performs at most O(lg n) linkings, which are all constant time operations. So the performance of insertT is O(lg n). 3
The relative Haskell program is given as below.
insertTree [] t = [t]
insertTree ts@(t:ts) t = if rank t < rank t then t:ts
else insertTree ts (link t t)
3 There is an interesting observation when comparing this operation with adding two binary numbers, which leads to the topic of numeric representation[6].
With this auxiliary function, it's easy to realize the insertion. We wrap the new element to be inserted as the only leaf of a tree, then insert this tree into the binomial heap.

insert(H, x) = insertT(H, node(0, x, ∅))
(10.3)
Since wrapping an element as a singleton tree takes O(1) time, the real work
is done in insertT , the performance of binomial heap insertion is bound to
O(lg n).
The insertion algorithm can also be realized with imperative approach.
Algorithm 4 Insert a tree with the left-child, right-sibling method.
1: function Insert-Tree(H, T)
2:     while H ≠ ∅ ∧ Rank(Head(H)) = Rank(T) do
3:         (T1, H) ← Extract-Head(H)
4:         T ← Link(T, T1)
5:     Sibling(T) ← H
6:     return T
Algorithm 4 continuously links the first tree in the heap with the new tree to be inserted while they have the same rank. After that, it puts the linked list of the remaining trees as the sibling, and returns the new linked list.
If using a container to manage the children of a node, the algorithm can be
given in Algorithm 5.
Algorithm 5 Insert a tree with children managed by a container.
1: function Insert-Tree(H, T)
2:     while H ≠ ∅ ∧ Rank(H[0]) = Rank(T) do
3:         T1 ← Pop(H)
4:         T ← Link(T, T1)
5:     Head-Insert(H, T)
6:     return H
In this algorithm, the function Pop removes the first tree T1 = H[0] from the forest, and the function Head-Insert inserts a new tree before any other trees in the heap, so that it becomes the first element in the forest.
With either version of Insert-Tree defined, realizing the binomial heap insertion is trivial.
Algorithm 6 Imperative insert algorithm
1: function Insert(H, x)
2:     return Insert-Tree(H, Node(0, x, ∅))
Exercise 10.2
Write the insertion program in your favorite imperative programming language by using the left-child, right-sibling approach.
Merge two heaps
When merging two binomial heaps, we actually try to merge two forests of binomial trees. According to the definition, there can't be two trees with the same rank, and the ranks are in monotonically increasing order. Our strategy is very similar to merge sort: in every iteration, we take the first tree from each forest, compare their ranks, and pick the one with the smaller rank into the result heap; if the ranks are equal, we perform linking to get a new tree, and recursively insert this new tree into the result of merging the rest of the trees.
Figure 10.7 illustrates the idea of this algorithm. This method is different from the one given in [2].
We can formalize this idea with a function. For non-empty cases, we denote the two heaps as H1 = {T1, T2, ...} and H2 = {T1', T2', ...}. Let H1' = {T2, T3, ...} and H2' = {T2', T3', ...}.
merge(H1, H2) = H1 : H2 = ∅
                H2 : H1 = ∅
                {T1} ∪ merge(H1', H2) : Rank(T1) < Rank(T1')
                {T1'} ∪ merge(H1, H2') : Rank(T1') < Rank(T1)
                insertT(merge(H1', H2'), link(T1, T1')) : otherwise
(10.4)

To analyze the performance of merge, suppose there are m1 trees in H1, and m2 trees in H2. There are at most m1 + m2 trees in the merged result. If no two trees have the same rank, the merge operation is bound to O(m1 + m2) time. If linking is needed for trees with the same rank, insertT performs at most O(m1 + m2) time. Consider the fact that m1 = 1 + ⌊lg n1⌋ and m2 = 1 + ⌊lg n2⌋, where n1, n2 are the numbers of nodes in each heap, and ⌊lg n1⌋ + ⌊lg n2⌋ ≤ 2⌊lg n⌋, where n = n1 + n2 is the total number of nodes. The final performance of merging is O(lg n).
Translating this algorithm to Haskell yields the following program.

merge ts1 [] = ts1
merge [] ts2 = ts2
merge ts1@(t1:ts1') ts2@(t2:ts2')
    | rank t1 < rank t2 = t1 : (merge ts1' ts2)
    | rank t1 > rank t2 = t2 : (merge ts1 ts2')
    | otherwise = insertTree (merge ts1' ts2') (link t1 t2)
Figure 10.7: (a) If Rank(t1) < Rank(t2), pick the tree with the smaller rank into the result, and recursively merge the rest; (b) if two trees have the same rank, link them to a new tree, and recursively insert it to the merge result of the rest.
function Merge(H1, H2)
    if H1 = ∅ then
        return H2
    if H2 = ∅ then
        return H1
    H ← ∅
    while H1 ≠ ∅ ∧ H2 ≠ ∅ do
        T ← ∅
        if Rank(H1) < Rank(H2) then
            (T, H1) ← Extract-Head(H1)
        else if Rank(H2) < Rank(H1) then
            (T, H2) ← Extract-Head(H2)
        else    ▷ Equal rank
            (T1, H1) ← Extract-Head(H1)
            (T2, H2) ← Extract-Head(H2)
            T ← Link(T1, T2)
        Append-Tree(H, T)
    if H1 ≠ ∅ then
        Append-Trees(H, H1)
    if H2 ≠ ∅ then
        Append-Trees(H, H2)
    return H
Both heaps contain binomial trees with ranks in monotonically increasing order. In each iteration, we pick the tree with the smallest rank and append it to the result heap; if both trees have the same rank, we perform linking first. Consider the Append-Tree algorithm: the rank of the new tree to be appended can't be less than that of any other tree in the result heap according to our merge strategy; however, it might be equal to the rank of the last tree in the result heap. This can happen if the last tree appended is the result of linking, which increases the rank by one. In this case, we must link the new tree with the last tree. In the below algorithm, suppose the function Last(H) refers to the last tree in a heap, and Append(H, T) just appends a new tree at the end of a forest.
1: function Append-Tree(H, T)
2:     if H ≠ ∅ ∧ Rank(T) = Rank(Last(H)) then
3:         Last(H) ← Link(T, Last(H))
4:     else
5:         Append(H, T)
Function Append-Trees repeatedly calls this function, so that it can append all trees in one heap to the other.
1: function Append-Trees(H1, H2)
2:     for each T ∈ H2 do
3:         H1 ← Append-Tree(H1, T)
Exercise 10.3
The program given above uses a container to manage sub-trees. Implement
the merge algorithm in your favorite imperative programming language with
left-child, right-sibling approach.
Pop
Among the forest which forms the binomial heap, each binomial tree conforms to the heap property that its root contains the minimum element of that tree. However, the order relationship among these roots can be arbitrary. To find the minimum element in the heap, we can select the smallest root of these trees. Since there are O(lg n) binomial trees, this approach takes O(lg n) time.
However, after we locate the minimum element (which is also known as the top element of a heap), we need to remove it from the heap and keep the binomial property, to accomplish the heap-pop operation. Suppose the forest forming the binomial heap consists of trees Bi, Bj, ..., Bp, ..., Bm, where Bk is a binomial tree of rank k, and the minimum element is the root of Bp. If we delete it, there will be p children left, which are all binomial trees with ranks p−1, p−2, ..., 0.
One tool at hand is the O(lg n) merge function we have defined. A possible approach is to reverse the p children, so that their ranks change to monotonically increasing order, and form a binomial heap Hp. The rest of the trees still form a binomial heap.
extractMin(H) = (T, ∅) : H is a singleton {T}
                (T1, H') : Root(T1) < Root(T')
                (T', {T1} ∪ H'') : otherwise
(10.5)

where
    H = {T1, T2, ...}
    H' = {T2, T3, ...}
    (T', H'') = extractMin(H')
The result of this function is a tuple: the first part is the tree which has the minimum element at its root; the second part is the rest of the trees after removing the first part from the forest.
This function examines each of the trees in the forest, thus it is bound to O(lg n) time.
The corresponding Haskell program can be given as below.

extractMin [t] = (t, [])
extractMin (t:ts) = if root t < root t' then (t, ts)
                    else (t', t:ts')
    where (t', ts') = extractMin ts
Of course, it's possible to just traverse the forest and pick the minimum root without removing the tree. The below imperative algorithm describes this with the left-child, right-sibling approach.
1: function Find-Minimum(H)
2:     T ← Head(H)
3:     min ← ∞
4:     while T ≠ ∅ do
5:         if Key(T) < min then
6:             min ← Key(T)
7:         T ← Sibling(T)
8:     return min
If we manage the children with collection containers, the linked list traversal is abstracted as finding the minimum element in the list. The following Python program shows this situation.
def find_min(ts):
    min_t = min(ts, key=lambda t: t.key)
    return min_t.key
Next we define the function to delete the minimum element from the heap by using extractMin.

deleteMin(H) = merge(reverse(Children(T)), H')        (10.6)

where
    (T, H') = extractMin(H)
Translating the formula to a Haskell program is trivial, and we'll skip it.
Realizing the algorithm in a procedural way takes extra effort, including list reversing, etc. We leave these details as exercises to the reader. The following pseudo code illustrates the imperative pop algorithm.
1: function Extract-Min(H)
2:     (Tmin, H) ← Extract-Min-Tree(H)
3:     H ← Merge(H, Reverse(Children(Tmin)))
4:     return (Key(Tmin), H)
With the pop operation defined, we can realize heap sort by creating a binomial heap from a series of numbers, then keep popping the smallest number from the heap till it becomes empty.
sort(xs) = heapSort(fromList(xs))        (10.7)

heapSort(H) = ∅ : H = ∅
              {findMin(H)} ∪ heapSort(deleteMin(H)) : otherwise
(10.8)

The translation to Haskell is straightforward.
Function fromList can be defined by folding. Heap sort can also be expressed in a procedural way. Please refer to the previous chapter about binary heaps for details.
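Putting insert, merge, and pop together, the whole binomial heap sort can be sketched in Python as below; this is our own illustrative translation of the equations above (all names assumed, not taken from the book's programs):

```python
class BinomialTree:
    def __init__(self, x):
        self.rank = 0
        self.key = x
        self.children = []  # sub-trees kept in decreasing rank order

def link(t1, t2):
    # link two trees of equal rank; the smaller root wins
    if t2.key < t1.key:
        t1, t2 = t2, t1
    t1.children.insert(0, t2)
    t1.rank += 1
    return t1

def insert_tree(h, t):
    # insert tree t (rank <= every rank in h) into forest h
    while h and h[0].rank == t.rank:
        t = link(t, h.pop(0))
    return [t] + h

def merge(h1, h2):
    if not h1:
        return h2
    if not h2:
        return h1
    if h1[0].rank < h2[0].rank:
        return [h1[0]] + merge(h1[1:], h2)
    if h2[0].rank < h1[0].rank:
        return [h2[0]] + merge(h1, h2[1:])
    return insert_tree(merge(h1[1:], h2[1:]), link(h1[0], h2[0]))

def extract_min(h):
    # split off the tree whose root is the minimum
    i = min(range(len(h)), key=lambda j: h[j].key)
    return h[i], h[:i] + h[i + 1:]

def heap_sort(xs):
    h = []
    for x in xs:
        h = insert_tree(h, BinomialTree(x))
    out = []
    while h:
        t, rest = extract_min(h)
        out.append(t.key)
        # reversed children are in increasing rank order, a valid forest
        h = merge(list(reversed(t.children)), rest)
    return out
```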
Exercise 10.4
Write the program to return the minimum element from a binomial heap in your favorite imperative programming language with the left-child, right-sibling approach.
Realize the Extract-Min-Tree() algorithm.
For the left-child, right-sibling approach, reversing all children of a tree is actually reversing a singly linked list. Write a program to reverse such a linked list in your favorite imperative programming language.
10.3
Fibonacci Heaps
It is interesting why this data structure is named Fibonacci heap. In fact, there is no direct connection from the structure design to the Fibonacci series. The inventors of the Fibonacci heap, Michael L. Fredman and Robert E. Tarjan, utilized the property of the Fibonacci series to prove the performance time bound, so they decided to use Fibonacci to name this data structure. [2]
10.3.1
Definition
algorithm different from some popular textbooks [2]. Most of the ideas presented here are based on Okasaki's work [6].
Let's review and compare the performance of the binomial heap and the Fibonacci heap (more precisely, the performance goals of the Fibonacci heap).
operation    Binomial heap    Fibonacci heap
insertion    O(lg n)          O(1)
merge        O(lg n)          O(1)
top          O(lg n)          O(1)
pop          O(lg n)          amortized O(lg n)
Consider where the bottleneck of inserting a new element x into a binomial heap is. We actually wrap x as a singleton leaf and insert this tree into the heap, which is actually a forest.
During this operation, we insert the tree in monotonically increasing order of rank, and once the ranks are equal, recursive linking and inserting happen, which leads to the O(lg n) time.
With a lazy strategy, we can postpone the ordered-rank insertion and merging operations. Instead, we just put the singleton leaf into the forest. The problem is that when we try to find the minimum element, for example in the top operation, the performance will be bad, because we need to check all trees in the forest, and there are no longer only O(lg n) of them.
In order to locate the top element in constant time, we must remember which tree contains the minimum element as its root.
Based on this idea, we can reuse the definition of binomial tree and give the
definition of Fibonacci heap as the following Haskell program for example.
data BiTree a = Node { rank :: Int
, root :: a
, children :: [BiTree a]}
The Fibonacci heap is either empty or a forest of binomial trees with the
minimum element stored in a special one explicitly.
data FibHeap a = E | FH { size :: Int
, minTree :: BiTree a
, trees :: [BiTree a]}
For convenience, we also add a size field to record how many elements there are in the heap.
The data layout can also be defined in imperative way as the following ANSI
C code.
struct node{
Key key;
struct node next, prev, parent, children;
int degree; / As known as rank /
int mark;
};
struct FibHeap{
struct node roots;
struct node minTr;
int n; / number of nodes /
};
In this chapter, we use a circular doubly linked list in the imperative settings to realize the Fibonacci heap, as described in [2]. It makes many operations easy and fast. Note that there are two extra fields added: the degree, also known as rank, of a node is the number of children of that node; the flag mark is used only in the decreasing key operation, and will be explained in detail in a later section.
10.3.2
(10.9)
Exercise 10.5
Implement the insert algorithm in your favorite imperative programming
language completely. This is also an exercise to circular doubly linked list manipulation.
Merge two heaps
Different from the merging algorithm of the binomial heap, we postpone the linking operations. The idea is to just put all binomial trees from each heap together, and choose the special tree which records the minimum element for the result heap.
merge(H1, H2) = H1 : H2 = ∅
                H2 : H1 = ∅
                FibHeap(s1 + s2, T1min, {T2min} ∪ T1 ∪ T2) : root(T1min) < root(T2min)
                FibHeap(s1 + s2, T2min, {T1min} ∪ T1 ∪ T2) : otherwise
(10.10)
where s1 and s2 are the sizes of H1 and H2; T1min and T2min are the special trees with the minimum element as root in H1 and H2 respectively; T1 = {T11, T12, ...} is a forest containing all the other binomial trees in H1, while T2 has the same meaning for H2. Function root(T) returns the root element of a binomial tree.
Note that as long as the ∪ operation takes constant time, this merge algorithm is bound to O(1). The following Haskell program is the translation of this algorithm.
merge h E = h
merge E h = h
merge h1@(FH sz1 minTr1 ts1) h2@(FH sz2 minTr2 ts2)
    | root minTr1 < root minTr2 = FH (sz1+sz2) minTr1 (minTr2:ts2++ts1)
    | otherwise = FH (sz1+sz2) minTr2 (minTr1:ts1++ts2)
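The same O(1) merge can be cross-checked in Python with plain tuples; the (size, min_tree, other_trees) layout and the fh_merge name here are our own assumptions mirroring the Haskell FH constructor:

```python
# a heap is None or a tuple (size, min_tree, other_trees);
# a tree is a tuple (rank, root_key, children)
def fh_merge(h1, h2):
    if h1 is None:
        return h2
    if h2 is None:
        return h1
    s1, m1, ts1 = h1
    s2, m2, ts2 = h2
    # keep the tree with the smaller root as the special minimum tree
    if m1[1] < m2[1]:
        return (s1 + s2, m1, [m2] + ts2 + ts1)
    return (s1 + s2, m2, [m1] + ts1 + ts2)
```

Only constant-time list concatenations of the root forests are performed; no linking happens here.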
(10.11)
Exercise 10.6
Implement the circular doubly linked list concatenation function in your
favorite imperative programming language.
Extract the minimum element from the heap (pop)
The pop operation is the most complex one in the Fibonacci heap. Since we postponed the tree consolidation in the merge algorithm, we have to compensate for it somewhere. Pop is the only place left, as we have already defined insert, merge, and top.
There is an elegant procedural algorithm to do the tree consolidation by using an auxiliary array [2]. We'll show it later in the imperative approach section.
In order to realize a purely functional consolidation algorithm, let's first consider a similar number puzzle.
Given a list of numbers, such as {2, 1, 1, 4, 8, 1, 1, 2, 4}, we want to add any two values if they are the same, and repeat this procedure till all numbers are unique. The result of the example list should be {8, 16}.
One solution to this problem is as follows.
consolidate(L) = fold(meld, ∅, L)
(10.12)
Where the fold() function iterates over all elements of a list, applying a specified function to the intermediate result and each element; it is sometimes called reducing. Please refer to Appendix A and the chapter about binary search trees for details.
L = {x1, x2, ..., xn} denotes a list of numbers; we'll use L' = {x2, x3, ..., xn} to represent the rest of the list with the first element removed. Function meld() is defined as below.
meld(L, x) = {x} : L = ∅
             meld(L', x + x1) : x = x1
             {x} ∪ L : x < x1
             {x1} ∪ meld(L', x) : otherwise
(10.13)
number    intermediate result    result
2         2                      2
1         1, 2                   1, 2
1         (1+1), 2               4
4         (4+4)                  8
8         (8+8)                  16
1         1, 16                  1, 16
1         (1+1), 16              2, 16
2         (2+2), 16              4, 16
4         (4+4), 16              8, 16
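Equations 10.12 and 10.13 can be checked directly with a small Python sketch of the number puzzle (the function names are ours):

```python
def meld(lst, x):
    # lst is kept in increasing order with unique values
    if not lst:
        return [x]
    if x == lst[0]:
        return meld(lst[1:], x + lst[0])   # add equal values, retry
    if x < lst[0]:
        return [x] + lst
    return [lst[0]] + meld(lst[1:], x)

def consolidate(nums):
    result = []
    for x in nums:        # fold meld over the list
        result = meld(result, x)
    return result
```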
meld(L, x) = {x} : L = ∅
             meld(L', link(x, x1)) : rank(x) = rank(x1)
             {x} ∪ L : rank(x) < rank(x1)
             {x1} ∪ meld(L', x) : otherwise
(10.14)
deleteMin(H) = ∅ : T = ∅ ∧ children(Tmin) = ∅
               FibHeap(s − 1, T'min, T') : otherwise
(10.15)
where
    (T'min, T') = extractMin(consolidate(children(Tmin) ∪ T))
The main part of the imperative realization is similar. We cut all children of Tmin and append them to the root list, then perform consolidation to merge all trees of the same rank until all trees have unique ranks.
1: function Delete-Min(H)
2:     x ← Tmin(H)
3:     if x ≠ NIL then
4:         for each y ∈ Children(x) do
5:             append y to root list of H
6:             Parent(y) ← NIL
7:         remove x from root list of H
8:         n(H) ← n(H) − 1
9:         Consolidate(H)
10:    return x
The algorithm Consolidate utilizes an auxiliary array A to do the merge job. The array entry A[i] is defined to store the tree with rank (degree) i. During the traversal of the root list, if we meet another tree of rank i, we link them together to get a new tree of rank i + 1. Next we clean A[i], check whether A[i + 1] is empty, and perform further linking if necessary. After we finish traversing all roots, the array A stores all result trees, and we can re-construct the heap from it.
1: function Consolidate(H)
2:     D ← Max-Degree(n(H))
3:     for i ← 0 to D do
4:         A[i] ← NIL
5:     for each x ∈ root list of H do
6:         remove x from root list of H
7:         d ← Degree(x)
8:         while A[d] ≠ NIL do
9:             y ← A[d]
10:            x ← Link(x, y)
11:            A[d] ← NIL
12:            d ← d + 1
13:        A[d] ← x
14:    Tmin(H) ← NIL    ▷ root list is NIL at the time
15:    for i ← 0 to D do
16:        if A[i] ≠ NIL then
17:            append A[i] to root list of H
18:            if Tmin(H) = NIL ∨ Key(A[i]) < Key(Tmin(H)) then
19:                Tmin(H) ← A[i]
The only unclear sub-algorithm is Max-Degree, which determines the upper bound of the degree of any node in a Fibonacci heap. We'll delay its realization to the last sub-section.
Feeding the Fibonacci heap shown in Figure 10.9 to the above algorithm, Figures 10.11, 10.12, and 10.13 show the result trees stored in the auxiliary array A at every step.
(Figures 10.11, 10.12, and 10.13: the trees stored in the auxiliary array A[0] to A[4] after steps 1 and 2, step 4, step 5, and step 6 of the consolidation.)
Exercise 10.7
Implement the remove function for circular doubly linked list in your favorite
imperative programming language.
10.3.3
Φ(H) = t(H)        (10.16)
Where t(H) is the number of trees in the Fibonacci heap forest. We have t(H) = 1 + length(T) for any non-empty heap.
For an n-node Fibonacci heap, suppose there is an upper bound D(n) on the rank of all trees. After consolidation, it is ensured that the number of trees in the heap forest is at most D(n) + 1.
Before consolidation, we actually did another important thing which also contributes to the running time: we removed the root of the minimum tree and concatenated all the remaining children to the forest. So the consolidate operation processes at most D(n) + t(H) − 1 trees.
Summarizing all the above factors, we deduce the amortized cost as below.
Summarize all the above factors, we deduce the amortized cost as below.
T
= Tconsolidation + (H ) (H)
= O(D(n) + t(H) 1) + (D(n) + 1) t(H)
= O(D(n))
(10.17)
If only the insertion, merge, and pop functions are applied to the Fibonacci heap, we can ensure that all trees are binomial trees, and it is easy to estimate the upper bound D(n) as O(lg n) (consider the extreme case where all nodes are in a single binomial tree).
However, we'll show in the next sub-section that there is an operation which can violate the binomial tree assumption.
Exercise 10.8
Why is the tree consolidation time proportional to the number of trees it processes?
10.3.4
Decreasing key
There is one special heap operation left. It only makes sense in imperative settings: decreasing the key of a certain node. Decreasing key plays an important role in some graph algorithms, such as the minimum spanning tree algorithm and Dijkstra's algorithm [2]. In those cases we hope that decreasing key takes O(1) amortized time.

However, we can't define a function like Decrease(H, k, k′), which first locates a node with key k, then decreases k to k′ by replacement, and then resumes the heap properties. This is because the locating phase is bound to O(n) time, since we don't have a pointer to the target node.
In an imperative setting, we can define the algorithm as Decrease-Key(H, x, k). Here x is a node in heap H, whose key we want to decrease to k. We needn't perform a search, as we have x at hand. It's possible to give an amortized O(1) solution.

When we decrease the key of a node which is not a root, this operation may violate the heap property that the key of the parent is less than all keys of the children. So we compare the decreased key with the parent node, and if this case happens, we cut this node off and append it to the root list. (Recall the recursive swapping solution for the binary heap, which leads to O(lg n).)
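To make the cut-and-append idea concrete, here is a minimal Python sketch. The `FNode`/`FHeap` classes and function names are illustrative assumptions, not the book's code, and this sketch deliberately omits the cascading cut and mark flags that the full Fibonacci heap algorithm needs for its amortized bounds.

```python
class FNode:
    """Hypothetical minimal Fibonacci heap node (illustrative only)."""
    def __init__(self, key):
        self.key = key
        self.parent = None
        self.children = []

class FHeap:
    """Hypothetical heap handle: a root list plus a pointer to the minimum root."""
    def __init__(self):
        self.trees = []
        self.minimum = None

def cut(H, x):
    # detach the tree rooted at x from its parent and append it to the root list
    x.parent.children.remove(x)
    x.parent = None
    H.trees.append(x)

def decrease_key(H, x, k):
    x.key = k
    if x.parent is not None and k < x.parent.key:
        cut(H, x)  # heap property violated: move the whole tree rooted at x to the root list
    if k < H.minimum.key:
        H.minimum = x

def demo():
    # build a tiny heap by hand: root 1 with one child 5
    H = FHeap()
    root, child = FNode(1), FNode(5)
    child.parent = root
    root.children.append(child)
    H.trees = [root]
    H.minimum = root
    decrease_key(H, child, 0)  # 0 < 1, so the child is cut and becomes a root
    return (H.minimum.key, child in H.trees, child.parent is None)
```

Decreasing a key never searches the heap; it only touches the node, its parent, and the root list, which is why the amortized O(1) bound is achievable.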
Figure 10.15: x < y: cut tree x from its parent, and add x to the root list.
Figure 10.15 illustrates this situation. After decreasing the key of node x, it is less than y; we cut x off its parent y, and paste the whole tree rooted at x to the root list.

Although we recover the property that the parent is less than all its children, the tree isn't a binomial tree any longer after it loses some sub-tree. If a tree loses a child for the second time, we cut it from its parent as well and add it to the root list; this cascading cut keeps every tree close enough to a binomial tree to preserve the performance bounds.
Exercise 10.9
Prove that the Decrease-Key algorithm is amortized O(1) time.
10.3.5 The name of Fibonacci heap

It's time to reveal why the data structure is named the Fibonacci heap.

There is only one undefined algorithm so far, Max-Degree(n), which determines the upper bound of the degree for any node in an n-node Fibonacci heap. We'll give the proof by using the Fibonacci series and finally realize the Max-Degree algorithm.
Lemma 10.3.1. For any node x in a Fibonacci heap, denote k = degree(x) and |x| = size(x). Then

|x| ≥ F_{k+2}   (10.18)

Where F_k is the Fibonacci series defined as the following:

F_k = 0 : k = 0
      1 : k = 1
      F_{k−1} + F_{k−2} : k ≥ 2
Proof. Consider all k children of node x. Denote them as y1, y2, ..., yk in the order of the time when they were linked to x, where y1 is the oldest and yk is the youngest.

When we linked yi to x, the children y1, y2, ..., y_{i−1} had already been there, and the algorithm Link only links nodes with the same degree. This indicates that at that time we had

degree(yi) = degree(x) = i − 1

After that, yi can lose at most one child (otherwise it would have been cut off from x), so

degree(yi) ≥ i − 2

Let s_k denote the minimum possible size of a tree whose root has degree k; trivially s_0 = 1 and s_1 = 2. A degree-k node x contains at least the root itself, the oldest child y1, and a minimum-size tree for each of y2, ..., yk:

|x| ≥ s_k = 2 + Σ_{i=2}^{k} s_{degree(yi)} ≥ 2 + Σ_{i=2}^{k} s_{i−2}
We next show that s_k ≥ F_{k+2}. This can be proved by induction. For the trivial cases, we have s_0 = 1 ≥ F_2 = 1 and s_1 = 2 ≥ F_3 = 2. For the induction case k ≥ 2, we have

|x| ≥ s_k ≥ 2 + Σ_{i=2}^{k} s_{i−2} ≥ 2 + Σ_{i=2}^{k} F_i = 1 + Σ_{i=0}^{k} F_i   (10.19)

The last step uses the fact that F_0 + F_1 = 1. It remains to show that

F_{k+2} = 1 + Σ_{i=0}^{k} F_i   (10.20)

which also follows by induction: F_2 = 1 + F_0, and

F_{k+2} = F_{k+1} + F_k = (1 + Σ_{i=0}^{k−1} F_i) + F_k = 1 + Σ_{i=0}^{k} F_i
Recall the result from the AVL tree chapter that F_k ≥ φ^{k−2}, where φ = (1+√5)/2 is the golden ratio. For an n-node heap, n ≥ |x| ≥ F_{k+2} ≥ φ^k, so the degree k of any node is bounded by log_φ n. This also proves that the pop operation is an amortized O(lg n) algorithm. Based on this result, we can define the function MaxDegree as the following:

MaxDegree(n) = 1 + ⌊log_φ n⌋   (10.21)
The imperative Max-Degree algorithm can also be realized by iterating the Fibonacci sequence.

1: function Max-Degree(n)
2:   F0 ← 0
3:   F1 ← 1
4:   k ← 1
5:   repeat
6:     k ← k + 1
7:     Fk ← Fk−1 + Fk−2
8:   until Fk ≥ n
9:   return k − 2
Translating the algorithm to ANSI C gives the following program.
int max_degree(int n){
int k, F;
int F2 = 0;
int F1 = 1;
for(F=F1+F2, k=2; F<n; ++k){
F2 = F1;
F1 = F;
F = F1 + F2;
}
return k-2;
}
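As a sanity check, the same iteration can be sketched in Python and compared against the closed-form bound 1 + ⌊log_φ n⌋; the function names below are ours, for illustration only. The iterative answer is never larger than the golden-ratio bound, since F_{k+2} ≥ φ^k.

```python
import math

def max_degree(n):
    # iterate the Fibonacci sequence until F_k >= n, mirroring the C program above
    f2, f1 = 0, 1
    f, k = f1 + f2, 2
    while f < n:
        f2, f1 = f1, f
        f = f1 + f2
        k += 1
    return k - 2

def max_degree_formula(n):
    # the closed-form upper bound 1 + floor(log_phi(n))
    phi = (1 + math.sqrt(5)) / 2
    return 1 + int(math.log(n) / math.log(phi))
```

For example, max_degree(1000) needs the first Fibonacci number not below 1000 (which is F_17 = 1597), giving 15; the closed-form bound agrees.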
10.4
Pairing Heaps
Although Fibonacci heaps provide excellent performance in theory, they are complex to realize, and people found that the constant factor hidden behind the big-O is large. Actually, the Fibonacci heap is more significant in theory than in practice.
In this section, we'll introduce another solution, the pairing heap, which is one of the best-performing heaps known. Most operations, including insertion, finding the minimum element (top), and merging, are bound to O(1) time, while deleting the minimum element (pop) is conjectured to take amortized O(lg n) time [7] [6]. Note that this had remained a conjecture for 15 years by the time this chapter was written. Nobody has proven it, although much experimental data supports the O(lg n) amortized result.
Besides that, the pairing heap is simple. There exist both elegant imperative and functional implementations.
10.4.1
Definition
Both binomial heaps and Fibonacci heaps are realized with forests, while a pairing heap is essentially a K-ary tree. The minimum element is stored at the root; all other elements are stored in sub-trees.

The following Haskell program defines the pairing heap.

data PHeap a = E | Node a [PHeap a]

In the imperative implementation given later, each node additionally records its parent. The parent field only makes sense for the decreasing key operation, which will be explained later on; we can omit it for the time being.
10.4.2 Basic heap operations

In this section, we first give the merging operation for the pairing heap, which can be used to realize insertion. Merging, insertion, and finding the minimum element are relatively trivial compared to the extracting minimum element operation.
Merge, insert, and find the minimum element (top)
The idea of merging is similar to the linking algorithm we showed previously for the binomial heap. When we merge two pairing heaps, there are two cases.

Trivial case: one heap is empty; we simply return the other heap as the result.

Otherwise, we compare the root elements of the two heaps, and make the heap with the bigger root element a new child of the other.
Let H1 and H2 denote the two heaps, and x and y be the root elements of H1 and H2 respectively. The function Children() returns the children of a K-ary tree; the function Node() constructs a K-ary tree from a root element and a list of children.
merge(H1, H2) =
  H1 : H2 = ∅
  H2 : H1 = ∅
  Node(x, {H2} ∪ Children(H1)) : x < y
  Node(y, {H1} ∪ Children(H2)) : otherwise   (10.22)

Where

x = Root(H1)
y = Root(H2)

4 We can parameterize the key type with a C++ template, but this is beyond our scope; please refer to the example programs along with this book.
It's obvious that the merging algorithm is bound to O(1) time 5. The merge equation can be translated to the following Haskell program.
merge h E = h
merge E h = h
merge h1@(Node x hs1) h2@(Node y hs2) =
if x < y then Node x (h2:hs1) else Node y (h1:hs2)
Merge can also be realized imperatively. With the left-child, right-sibling approach, we can just link the heap (which is in fact a K-ary tree) with the larger key as the first new child of the other. This is a constant time operation, as described below.
1: function Merge(H1, H2)
2:   if H1 = NIL then
3:     return H2
4:   if H2 = NIL then
5:     return H1
6:   if Key(H2) < Key(H1) then
7:     Exchange(H1 ↔ H2)
8:   Insert H2 in front of Children(H1)
9:   Parent(H2) ← H1
10:  return H1
Note that we also update the parent field accordingly. The ANSI C example
program is given as the following.
struct node* merge(struct node* h1, struct node* h2) {
    if (h1 == NULL)
        return h2;
    if (h2 == NULL)
        return h1;
    if (h2->key < h1->key)
        swap(&h1, &h2);
    h2->next = h1->children;
    h1->children = h2;
    h2->parent = h1;
    h1->next = NULL; /* Break the previous link, if any */
    return h1;
}
Insertion can be realized as merging with a singleton tree, and the minimum element is always stored in the root:

insert(H, x) = merge(H, Node(x, ∅))   (10.23)

top(H) = Root(H)

Like merge, both operations are bound to O(1) time.
5 Assume that ∪ (inserting the new tree in front of the children list) is a constant time operation. This is true for linked-list settings, including cons-like operations in functional programming languages.
Exercise 10.10
Implement the program of removing a node from the children of its parent in your favorite imperative programming language. Consider how we can ensure that the overall performance of decreasing key is O(1) time. Is the left-child, right-sibling approach enough?
Delete the minimum element from the heap (pop)
Since the minimum element is always stored at the root, after deleting it during popping, everything left is a collection of sub-trees. These trees can be merged into one big tree.
pop(H) = mergePairs(Children(H))   (10.24)
The pairing heap uses a special approach: it merges every two sub-trees from left to right in pairs, then merges these paired results from right to left to form the final result tree. The name "pairing heap" comes from this characteristic pair-merging.
Figure 10.16 and 10.17 illustrate the procedure of pair-merging.
The recursive pair-merging solution is quite similar to bottom-up merge sort [6]. Denote the children of a pairing heap as A, which is a list of trees
(b) After root element 2 is removed, there are 9 sub-trees left.
(c) Merge every two trees in pairs; since there is an odd number of trees, the last one needn't be merged.

Figure 10.16: Remove the root element, and merge children in pairs.
{T1, T2, T3, ..., Tm} for example. The mergePairs() function can be given as below.

mergePairs(A) =
  ∅ : A = ∅
  T1 : A = {T1}
  merge(merge(T1, T2), mergePairs(A′)) : otherwise   (10.25)

Where

A′ = {T3, T4, ..., Tm}

is the rest of the children without the first two trees.
The corresponding Haskell program for popping is given as the following.
deleteMin (Node _ hs) = mergePairs hs where
mergePairs [] = E
mergePairs [h] = h
mergePairs (h1:h2:hs) = merge (merge h1 h2) (mergePairs hs)
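The same structure can be sketched in Python with plain lists of sub-trees; the class and function names here are ours for illustration, not the book's accompanying code.

```python
class PHeap:
    """A pairing heap node: the minimum at the root, other elements in sub-trees."""
    def __init__(self, key, subs=None):
        self.key = key
        self.subs = subs if subs is not None else []

def merge(h1, h2):
    # link the tree with the bigger root under the other; O(1)
    if h1 is None:
        return h2
    if h2 is None:
        return h1
    if h2.key < h1.key:
        h1, h2 = h2, h1
    h1.subs.insert(0, h2)
    return h1

def insert(h, x):
    return merge(h, PHeap(x))

def top(h):
    return h.key

def merge_pairs(ts):
    # merge every two trees in pairs left to right,
    # then merge the paired results from right to left
    if not ts:
        return None
    if len(ts) == 1:
        return ts[0]
    return merge(merge(ts[0], ts[1]), merge_pairs(ts[2:]))

def pop(h):
    return merge_pairs(h.subs)

def heap_sort(xs):
    # build a heap from xs, then drain it in ascending order
    h = None
    for x in xs:
        h = insert(h, x)
    out = []
    while h is not None:
        out.append(top(h))
        h = pop(h)
    return out
```

Draining the heap this way yields the elements in ascending order, which is a convenient way to exercise all the operations at once.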
Exercise 10.11
Write a program to insert a tree at the beginning of a linked-list in your
favorite imperative programming language.
Delete a node
We didn't mention delete in the binomial heap or the Fibonacci heap. Deletion can be realized by first decreasing the key to minus infinity (−∞) and then performing pop. In this section, we present another solution for deleting a node.

The algorithm is to define the function delete(H, x), where x is a node in a pairing heap H.

If x is the root, we can just perform a pop operation. Otherwise, we can cut x from H, perform a pop on x, and then merge the pop result back to H. This can be described as the following.
delete(H, x) =
  pop(H) : x is the root of H
  merge(cut(H, x), pop(x)) : otherwise   (10.26)
Exercise 10.12
Write procedural pseudo code for the delete algorithm.
Write the delete operation in your favorite imperative programming language.
Consider how to realize delete in a purely functional setting.
10.5 Notes and short summary
In this chapter, we extend the heap implementation from the binary tree to a more generic approach. The binomial heap and the Fibonacci heap use a forest of K-ary trees as the underlying data structure, while the pairing heap uses a single K-ary tree to represent the heap. It's a good point to postpone some expensive operations, so that the overall amortized performance is ensured. Although the Fibonacci heap gives good performance in theory, its implementation is a bit complex, and it has been removed from some recent textbooks. We also present the pairing heap, which is easy to realize and has good performance in practice.
The elementary tree-based data structures have all been introduced in this book. There are still many tree-based data structures which we can't cover, so we skip them here; we encourage the reader to refer to other textbooks about them. From the next chapter, we'll introduce generic sequence data structures: arrays and queues.
Bibliography
[1] K-ary tree, Wikipedia. http://en.wikipedia.org/wiki/K-ary tree
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937.
[3] Chris Okasaki. Purely Functional Data Structures. Cambridge university
press, (July 1, 1999), ISBN-13: 978-0521663502
[4] Wikipedia. Pascal's triangle. http://en.wikipedia.org/wiki/Pascals triangle
[5] Hackage. An alternate implementation of a priority queue based on
a Fibonacci heap., http://hackage.haskell.org/packages/archive/pqueuemtl/1.0.7/doc/html/src/Data-Queue-FibQueue.html
[6] Chris Okasaki. Fibonacci Heaps. http://darcs.haskell.org/nofib/gc/fibheaps/orig
[7] Michael L. Fredman, Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan. "The Pairing Heap: A New Form of Self-Adjusting Heap". Algorithmica (1986) 1: 111-129.
Part IV
Chapter 11

Queue, not so simple as it was thought

11.1 Introduction
It seems that queues are relatively simple: a queue provides FIFO (first-in, first-out) data manipulation support. There are many options to realize a queue, including singly linked-list, doubly linked-list, circular buffer, etc. However, we'll show that it's not so easy to realize a queue in a purely functional setting if it must satisfy the abstract queue properties.

In this chapter, we'll present several different approaches to implement a queue. A queue is a FIFO data structure satisfying the following performance constraints:
Elements can be added to the tail of the queue in O(1) constant time;
Elements can be removed from the head of the queue in O(1) constant time.

These two properties must be satisfied. And it's common to add some extra goals, such as dynamic memory allocation, etc.
Of course such an abstract queue interface can be implemented with a doubly-linked list trivially, but this is an overkill solution. We can even implement an imperative queue with a singly linked-list or a plain array. However, our main question here is how to realize a purely functional queue as well.

We'll first review the typical queue solutions realized with a singly linked-list and a circular buffer in the first section; then we give a simple and straightforward functional solution in the second section. While the performance is ensured in terms of amortized constant time, we need to find a real-time solution (or worst-case solution) for some special cases. Such solutions will be described in the third and the fourth sections. Finally, we'll show a very simple real-time queue which depends on lazy evaluation.

Most of the functional contents are based on Chris Okasaki's great work in [6]. There are more than 16 different types of purely functional queue given in that material.
11.2 Queue by linked-list and circular buffer

11.2.1 Singly linked-list solution
A queue can be implemented with a singly linked-list. It's easy to add and remove elements at the front end of a linked-list in O(1) time. However, in order to keep the FIFO order, if we execute one operation on the head, we must perform the inverse operation on the tail.

For a plain singly linked-list, we must traverse the whole list before adding or removing on the tail. Traversing is bound to O(n) time, where n is the length of the list. This doesn't match the abstract queue properties.

The solution is to use an extra record to store the tail of the linked-list. A sentinel is often used to simplify the boundary handling. The following ANSI C 1 code defines a queue realized by a singly linked-list.
typedef int Key;

struct Node {
    Key key;
    struct Node* next;
};

struct Queue {
    struct Node *head, *tail;
};
Figure 11.1 illustrates an empty list. Both head and tail point to the sentinel NIL node.

Figure 11.1: The empty queue; both head and tail point to the sentinel node.
We summarize the abstract queue interface as the following.
function Empty
Create an empty queue
function Empty?(Q)
Test if Q is empty
function Enqueue(Q, x)
Add a new element x to queue Q
function Dequeue(Q)
Remove element from queue Q
function Head(Q)
get the next element in queue Q in FIFO order
1 It's possible to parameterize the type of the key with a C++ template. ANSI C is used here for illustration purpose.
Note the difference between Dequeue and Head: Head only retrieves the next element in FIFO order without removing it, while Dequeue performs removal.
In some programming languages, such as Haskell, and in most object-oriented languages, the above abstract queue interface can be enforced by a definition. For example, the following Haskell code specifies the abstract queue.
class Queue q where
    empty :: q a
    isEmpty :: q a → Bool
    push :: q a → a → q a
    pop :: q a → q a
    front :: q a → a
To ensure constant time Enqueue and Dequeue, we add the new element to the tail and remove the element from the head.2
function Enqueue(Q, x)
  p ← Create-New-Node
  Key(p) ← x
  Next(p) ← NIL
  Next(Tail(Q)) ← p
  Tail(Q) ← p

Note that, as we use the sentinel node, there is at least one node, the sentinel, in the queue. That's why we needn't check the validity of the tail before we append the newly created node p to it.
function Dequeue(Q)
  x ← Head(Q)
  Next(Head(Q)) ← Next(x)
  if x = Tail(Q) then    ▷ Q becomes empty
    Tail(Q) ← Head(Q)
  return Key(x)

As we always put the sentinel node in front of all the other nodes, the function Head actually returns the node next to the sentinel.
Figure 11.2 illustrates the Enqueue and Dequeue processes with the sentinel node.
Translating the pseudo code to an ANSI C program yields the below code.
struct Queue* enqueue(struct Queue* q, Key x) {
    struct Node* p = (struct Node*)malloc(sizeof(struct Node));
    p->key = x;
    p->next = NULL;
    q->tail->next = p;
    q->tail = p;
    return q;
}
Key dequeue(struct Queue* q) {
    struct Node* p = head(q); /* gets the node next to the sentinel */
    Key x = key(p);
    q->head->next = p->next;
    if (q->tail == p)
2 It's possible to add the new element to the head and remove elements from the tail instead, but the operations are more complex than this approach.
        q->tail = q->head;
    free(p);
    return x;
}
This solution is simple and robust. It's easy to extend it even to concurrent environments (e.g. multicores): we can assign one lock to the head and another lock to the tail. The sentinel helps to keep us from dead-locking in the empty case [1] [2].
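As a cross-check of the pseudo code, here is a direct Python transliteration using a sentinel node; the class names are ours, not from the book's programs.

```python
class Node:
    def __init__(self, key=None):
        self.key = key
        self.next = None

class LinkedQueue:
    """Singly linked-list queue with a sentinel: enqueue at the tail, dequeue at the head."""
    def __init__(self):
        self.head = self.tail = Node()  # the sentinel node

    def is_empty(self):
        # only the sentinel is present when the queue is empty
        return self.head is self.tail

    def enqueue(self, x):
        p = Node(x)
        self.tail.next = p
        self.tail = p

    def dequeue(self):
        p = self.head.next  # the node next to the sentinel holds the front element
        self.head.next = p.next
        if self.tail is p:  # the queue becomes empty
            self.tail = self.head
        return p.key
```

Because the sentinel guarantees at least one node, neither operation needs a special empty-queue branch before touching the tail.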
Exercise 11.1
Realize the Empty? and Head algorithms for the linked-list queue.
Implement the singly linked-list queue in your favorite imperative programming language. Note that you need to provide functions to initialize and destroy the queue.
11.2.2 Circular buffer solution
Another typical solution to realize a queue is to use a plain array as a circular buffer (also known as a ring buffer). As opposed to the linked-list, an array supports appending to the tail in constant O(1) time if there is still space. Of course we need to re-allocate space if the array is fully occupied. However, the array performs poorly, in O(n) time, when removing an element from the head and packing the space, because we need to shift all the remaining elements one cell ahead. The idea of the circular buffer is to reuse the free cells before the first valid element after we remove elements from the head.
The idea of the circular buffer is described in figures 11.3 and 11.4.

If we set a maximum size for the buffer instead of dynamically allocating memory, the queue can be defined with the below ANSI C code.
struct Queue {
    Key* buf;
    int head, tail, size;
};
When initializing the queue, we are explicitly asked to provide the maximum size as an argument.

struct Queue* createQ(int max) {
    struct Queue* q = (struct Queue*)malloc(sizeof(struct Queue));
    q->buf = (Key*)malloc(sizeof(Key) * max);
    q->size = max;
    q->head = q->tail = 0;
    return q;
}
function Enqueue(Q, x)
  if ¬ Full?(Q) then
    Buffer(Q)[Tail(Q)] ← x
    Tail(Q) ← (Tail(Q) + 1) mod Size(Q)

function Head(Q)
  if ¬ Empty?(Q) then
    return Buffer(Q)[Head(Q)]

function Dequeue(Q)
  if ¬ Empty?(Q) then
    Head(Q) ← (Head(Q) + 1) mod Size(Q)
However, the modulo operation is expensive and slow in some settings, so one may replace it by an adjustment, as in the below ANSI C program.
void enQ(struct Queue* q, Key x) {
    if (!fullQ(q)) {
        q->buf[q->tail++] = x;
        q->tail -= q->tail < q->size ? 0 : q->size;
    }
}

Key headQ(struct Queue* q) {
    return q->buf[q->head]; /* Assume the queue isn't empty */
}

Key deQ(struct Queue* q) {
    Key x = headQ(q);
    q->head++;
    q->head -= q->head < q->size ? 0 : q->size;
    return x;
}
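The wrap-around logic can be sketched in Python. Note that this variant keeps an element counter to distinguish the full queue from the empty one; that counter is an assumption of this sketch, one possible answer among several, and not the book's approach.

```python
class CircularQueue:
    """Fixed-capacity queue over a plain list used as a ring buffer."""
    def __init__(self, max_size):
        self.buf = [None] * max_size
        self.size = max_size
        self.head = self.tail = 0
        self.cnt = 0  # number of stored elements; resolves the head == tail ambiguity

    def is_empty(self):
        return self.cnt == 0

    def is_full(self):
        return self.cnt == self.size

    def enqueue(self, x):
        if not self.is_full():
            self.buf[self.tail] = x
            self.tail = (self.tail + 1) % self.size  # wrap past the boundary
            self.cnt += 1

    def dequeue(self):
        x = self.buf[self.head]
        self.head = (self.head + 1) % self.size  # reuse the freed cell later
        self.cnt -= 1
        return x
```

After the buffer fills and elements are dequeued from the head, new elements wrap around to reuse the freed cells at the front of the array.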
Exercise 11.2
As the circular buffer is allocated with a maximum size parameter, please write a function to test if a queue is full, to avoid overflow. Note there are two cases: one is that the head is in front of the tail, the other is the contrary.
11.3 Purely functional solution

11.3.1 Paired-list queue
We can't just use a list to implement a queue, or we can't satisfy the abstract queue properties. This is because the singly linked-list, which is the back-end data structure in most functional settings, performs well on the head in constant O(1) time, while it performs in linear O(n) time on the tail, where n is the length of the list. Either dequeue or enqueue would perform proportionally to the number of elements stored in the list, as shown in figure 11.5.
Figure 11.5: DeQueue and EnQueue can't both perform in constant O(1) time with a list.
Nor can we add a pointer to record the tail position of the list, as we did in the imperative settings like in the ANSI C program, because of the purely functional nature.
Chris Okasaki mentioned a simple and straightforward functional solution in [6]. The idea is to maintain two linked-lists as one queue, concatenated in a tail-to-tail manner. The shape of the queue looks like a horseshoe magnet, as shown in figure 11.6.

With this setup, we push the new element to the head of the rear list, which is ensured to be O(1) constant time; on the other hand, we pop elements from the head of the front list, which is also O(1) constant time. So the abstract queue properties can be satisfied.
The definition of such paired-list queue can be expressed in the following
Haskell code.
type Queue a = ([a], [a])
empty = ([], [])
Suppose the functions front(Q) and rear(Q) return the front and rear lists in such a setup, and Queue(F, R) creates a paired-list queue from the two lists F and R.
Figure 11.6: A queue with front and rear lists shaped like a horseshoe magnet.
The EnQueue (push) and DeQueue (pop) operations can be easily realized based on this setup.

push(Q, x) = Queue(front(Q), {x} ∪ rear(Q))   (11.1)

pop(Q) = Queue(tail(front(Q)), rear(Q))   (11.2)

However, the front list may become empty while the rear list still holds elements, so we balance the queue after each operation:

balance(Q) =
  Queue(reverse(R), ∅) : F = ∅
  Q : otherwise   (11.3)
Thus if the front list isn't empty, we do nothing; when the front list becomes empty, we use the reversed rear list as the new front list, and the new rear list is empty.

The new enqueue and dequeue algorithms are updated as below.

push(Q, x) = balance(F, {x} ∪ R)   (11.4)

pop(Q) = balance(tail(F), R)   (11.5)
Summing up the above algorithms and translating them to Haskell yields the following program.
balance :: Queue a → Queue a
balance ([], r) = (reverse r, [])
balance q = q

push :: Queue a → a → Queue a
push (f, r) x = balance (f, x:r)

pop :: Queue a → Queue a
pop ([], _) = error "Empty"
pop (_:f, r) = balance (f, r)
Although we only touch the heads of the front and rear lists, the overall performance can't always be kept at O(1). Actually, the performance of this algorithm is amortized O(1). This is because the reverse operation takes time proportional to the length of the rear list; it's bound to O(n) time, where n = |R|. We leave the proof of the amortized performance as an exercise to the reader.
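A Python transliteration of the same paired-list idea may look as follows; the queue is a tuple of two lists, and the function names are ours for illustration.

```python
def balance(q):
    # when the front list is exhausted, the reversed rear list becomes the new front
    f, r = q
    return (list(reversed(r)), []) if not f else q

def push(q, x):
    # O(1) except when balance triggers the reversal
    f, r = q
    return balance((f, [x] + r))

def pop(q):
    # returns (popped element, rest of the queue)
    f, r = q
    return f[0], balance((f[1:], r))
```

Usage mirrors the Haskell version: elements pushed in order 1, 2, 3 come back out in the same FIFO order, with the rear list reversed lazily only when the front runs dry.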
11.3.2 Paired-array queue - a symmetric implementation
Figure 11.7: A queue with front and rear arrays shaped like a horseshoe magnet.
We can define such a paired-array queue with the following Python code 3.
class Queue:
def __init__(self):
self.front = []
self.rear = []
3 Legacy Basic code is not presented here. We actually use lists rather than arrays in Python to illustrate the idea. ANSI C and ISO C++ programs are provided along with this chapter; they show it in a purely array manner.
def is_empty(q):
return q.front == [] and q.rear == []
The corresponding Push() and Pop() algorithms only manipulate the tails of the arrays.
function Push(Q, x)
  Append(Rear(Q), x)

Here we assume that the Append() algorithm appends element x to the end of the array and handles the necessary memory allocation, etc. Actually, there are multiple memory handling approaches. For example, besides dynamic re-allocation, we can initialize the array with enough space and just report an error if it's full.
function Pop(Q)
  if Front(Q) = ∅ then
    Front(Q) ← Reverse(Rear(Q))
    Rear(Q) ← ∅
  n ← Length(Front(Q))
  x ← Front(Q)[n]
  Length(Front(Q)) ← n − 1
  return x
For simplification and pure illustration purposes, the array isn't shrunk explicitly after elements are removed, so testing whether the front array is empty (= ∅) can be realized as checking whether the length of the array is zero. We omit these details here.
The enqueue and dequeue algorithms can be translated to Python programs straightforwardly.
def push(q, x):
q.rear.append(x)
def pop(q):
if q.front == []:
q.rear.reverse()
(q.front, q.rear) = (q.rear, [])
return q.front.pop()
Exercise 11.3
Prove that the amortized performance of paired-list queue is O(1).
Prove that the amortized performance of paired-array queue is O(1).
11.4 A small improvement, Balanced Queue
Although the paired-list queue is amortized O(1) for popping and pushing, the solution we proposed in the previous section performs poorly in the worst case. For example, suppose there is one element in the front list, and we push n elements continuously to the queue, where n is a big number. Executing a pop operation after that will trigger the worst case.
According to the strategy we used so far, all the n elements are added to the rear list, and the front list becomes empty after one pop operation. So the algorithm starts to reverse the rear list. This reversing procedure is bound to O(n) time, which is proportional to the length of the rear list. Sometimes it can't be accepted for a very big n.
The reason this worst case happens is that the front and rear lists are extremely unbalanced. We can improve our paired-list queue design by making them more balanced. One option is to add a balancing constraint:
|R| ≤ |F|   (11.6)
In the rest of this section, we suppose that the length of a list L can be retrieved as |L| in constant time.

Push and pop are almost the same as before, except that we check the balance invariant by passing the length information and perform reversing accordingly.
push(Q, x) = balance(F, |F|, {x} ∪ R, |R| + 1)   (11.8)

pop(Q) = balance(tail(F), |F| − 1, R, |R|)   (11.9)

The balance() function performs the reversing when the invariant is violated:

balance(F, |F|, R, |R|) =
  Queue(F, |F|, R, |R|) : |R| ≤ |F|
  Queue(F ∪ reverse(R), |F| + |R|, ∅, 0) : otherwise   (11.10)

Note that the function Queue() takes four parameters: the front list along with its length (recorded), and the rear list along with its length, and forms a paired-list queue augmented with length fields.

We can easily translate the equations to a Haskell program, and we can enforce the abstract queue interface by making the implementation an instance of the Queue type class.
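The length-augmented balancing can be sketched in Python as follows; here the queue is a 4-tuple (F, |F|, R, |R|), and the names are ours for illustration.

```python
def balance(f, lenf, r, lenr):
    # restore the invariant |R| <= |F| by reversing the rear list when violated
    if lenr <= lenf:
        return (f, lenf, r, lenr)
    return (f + list(reversed(r)), lenf + lenr, [], 0)

def push(q, x):
    f, lenf, r, lenr = q
    return balance(f, lenf, [x] + r, lenr + 1)

def pop(q):
    # returns (popped element, rest of the queue)
    f, lenf, r, lenr = q
    return f[0], balance(f[1:], lenf - 1, r, lenr)
```

Because the rear list is reversed as soon as it grows one element longer than the front, the front list can never be empty while elements remain, which removes the extreme worst case of the unconstrained paired-list queue.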
Exercise 11.4
Write the symmetric balance-improved solution for the paired-array queue in your favorite imperative programming language.
11.5 One more step improvement, Real-time Queue
Although the extreme worst case can be avoided by improving the balancing as presented in the previous section, the performance of reversing the rear list is still bound to O(n), where n = |R|. So if the rear list is very long, the instant performance is still unacceptably poor, even if the amortized time is O(1). It is particularly important in some real-time systems to ensure the worst case performance.
As we have analyzed, the bottleneck is the computation of F ∪ reverse(R). This happens when |R| > |F|. Considering that |F| and |R| are both integers, this computation happens when

|R| = |F| + 1   (11.11)

Both F and the result of reverse(R) are singly linked-lists. It takes O(|F|) time to concatenate them together, and it takes extra O(|R|) time to reverse the rear list, so the total computation is bound to O(n), where n = |F| + |R|, which is proportional to the total number of elements in the queue.
In order to realize a real-time queue, we can't compute F ∪ reverse(R) monolithically. Our strategy is to distribute this expensive computation over every pop and push operation. Thus although each pop and push gets a bit slower, we avoid the extremely slow worst-case pop or push.
Incremental reverse
Let's examine how the functional reverse algorithm is typically implemented.

reverse(X) =
  ∅ : X = ∅
  reverse(X′) ∪ {x1} : otherwise   (11.12)

This can be turned into a tail-recursive version with an accumulator:

reverse(X) = reverse′(X, ∅)   (11.13)

Where

reverse′(X, A) =
  A : X = ∅
  reverse′(X′, {x1} ∪ A) : otherwise   (11.14)
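A direct Python rendering of reverse′, the tail-recursive accumulator version, may look like this (illustrative only):

```python
def reverse_(xs, acc):
    # tail-recursive reverse with an accumulator, as in equation (11.14):
    # move the head of xs to the front of acc until xs is exhausted
    if not xs:
        return acc
    return reverse_(xs[1:], [xs[0]] + acc)
```

Each recursive call performs one constant-time step, which is exactly what makes the function schedulable: we can stop after any single step and resume later.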
We can schedule (slow down) the above reverse′(X, A) function with two types of state: Sr means reversing is on-going, and Sf means reversing is finished.

step(S, X, A) =
  (Sf, A) : S = Sr ∧ X = ∅
  (Sr, X′, {x1} ∪ A) : S = Sr ∧ X ≠ ∅   (11.15)
In each step, we examine the state type first. If the current state is Sr (on-going) and the rest of the elements to be reversed in X is empty, we turn the algorithm to the finished state Sf; otherwise, we take the first element from X and put it in front of A just as before, but we do NOT perform recursion; instead, we just finish this step. We can store the current state as well as the resulting X and A; the reversing can be continued at any time in the future by calling the step function again with the stored state, X, and A passed in.
Here is an example of this step-by-step reverse algorithm.

step(Sr, "hello", ∅) = (Sr, "ello", "h")
step(Sr, "ello", "h") = (Sr, "llo", "eh")
...
step(Sr, "o", "lleh") = (Sr, ∅, "olleh")
step(Sr, ∅, "olleh") = (Sf, "olleh")
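The one-step reversing can be sketched in Python with an explicit state tag; the strings 'Sr'/'Sf' stand in for the states, and the names are ours for illustration.

```python
def step(s, x, a):
    # perform exactly one reversing step; the caller stores (s, x, a) between calls
    if s == 'Sr' and not x:
        return ('Sf', [], a)   # all elements moved: reversing finished
    if s == 'Sr':
        return ('Sr', x[1:], [x[0]] + a)   # move one element, then stop
    return (s, x, a)  # finished state: nothing left to do

def run_all(xs):
    # drive the steps to completion, as a sequence of pop/push operations would do
    s, x, a = 'Sr', list(xs), []
    while s != 'Sf':
        s, x, a = step(s, x, a)
    return a
```

In the real-time queue, run_all would not exist as a loop; instead, each pop or push operation would execute one call to step and stash the state inside the queue.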
Now we can distribute the reversing into steps in every pop and push operation. However, the problem is only half solved: we want to break down F ∪ reverse(R), and we have broken reverse(R) into steps; we next need to schedule (slow down) the list concatenation part F ∪ ..., which is bound to O(|F|), in an incremental manner, so that we can distribute it over pop and push operations.
Incremental concatenate
It's a bit more challenging to implement incremental list concatenation than list reversing. However, it's possible to re-use the result we gained from incremental reversing: denote X̄ = reverse(X); we can then take elements one by one from X̄ and put them in front of Y, just as what we have done in reverse′.

X ∪ Y ≡ reverse(reverse(X)) ∪ Y
      ≡ reverse′(reverse(X), ∅) ∪ Y
      ≡ reverse′(reverse(X), Y)
      ≡ reverse′(X̄, Y)   (11.16)
This fact indicates that we can use an extra state to instruct the step() function to concatenate incrementally after the reversing is done. The reversing state Sr holds the rest of the two lists together with the partial results (F, F̄, R, R̄); the concatenating state Sc holds the reversed front list X and the accumulated result A.

next(S) =
  (Sr, F′, {f1} ∪ F̄, R′, {r1} ∪ R̄) : S = Sr ∧ F ≠ ∅ ∧ R ≠ ∅
  (Sc, F̄, {r1} ∪ R̄) : S = Sr ∧ F = ∅ ∧ R = {r1}
  (Sf, A) : S = Sc ∧ X = ∅
  (Sc, X′, {x1} ∪ A) : S = Sc ∧ X ≠ ∅   (11.17)
All that is left is to distribute these incremental steps over every pop and push operation to implement a real-time O(1) purely functional queue.
Sum up
Before we dive into the final real-time queue implementation, let's analyze how many incremental steps are needed to achieve the result of F ∪ reverse(R). According to the balance invariant we used previously, |R| = |F| + 1. Let's denote m = |F|.

Once the queue gets unbalanced due to some push or pop operation, we start this incremental computation of F ∪ reverse(R). It needs m + 1 steps to reverse R, and within these steps we also finish reversing the list F. After that, we need m + 1 extra steps to execute the concatenation. So there are 2m + 2 steps in total.
It seems natural to distribute one step into each pop or push operation. However, a critical question must be answered: is it possible that before we finish these 2m + 2 steps, the queue gets unbalanced again due to a series of pushes and pops?
There are two facts about this question, one is good news and the other is
bad news.
Let's first show the good news: luckily, continuous pushing can't make the queue unbalanced again before we finish these 2m + 2 steps to achieve F ∪ reverse(R). This is because once we start re-balancing, we get a new front list F′ = F ∪ reverse(R) after 2m + 2 steps, while the next unbalance is triggered when

|R′| = |F′| + 1
     = |F| + |R| + 1
     = 2m + 2   (11.18)

This means that even if we push new elements as fast as possible, the 2m + 2
Table 11.1: The queue in the middle of the incremental computation:

front copy: {fi, fi+1, ..., fm} (the first i − 1 elements popped) | on-going computation: (Sr, F̄, ..., R̄, ...) | new rear: {...} (new elements pushed)
steps exactly finish at that time point, which means the new front list F′ has been calculated just in time. We can safely go on to compute F′ ∪ reverse(R′), thanks to the balance invariant designed in the previous section.
But the bad news is that a pop operation can happen at any time before these 2m + 2 steps finish. The situation is that when we want to extract an element from the front list, the new front list F′ = F ∪ reverse(R) isn't ready yet; we don't have a valid front list at hand.

One solution to this problem is to keep a copy of the original front list F during the time we are computing reverse(F), as described in phase 1 of our incremental computing strategy. Then we are still safe even if the user continuously performs the first m pop operations. So the queue looks like table 11.1 at some time after we start the incremental computation and before phase 1 (reversing F and R simultaneously) ends 4.
After these M pop operations, the copy of F is exhausted. And we just start
incremental concatenation phase at that time. What if user goes on popping?
The fact is that since F is exhausted (becomes ), we neednt do concate
data State a = Empty
             | Reverse Int [a] [a] [a] [a] -- n, f, acc_f, r, acc_r
             | Concat Int [a] [a]          -- n, rev_f, acc
             | Done [a]                    -- result: f ++ reverse r
4 One may wonder whether copying a list takes time linear to its length; if so,
the whole solution would make no sense. Actually, this linear-time copying won't happen
at all. Because of the purely functional nature, the front list won't be mutated, either
by popping or by reversing. However, if we try to realize a symmetric solution with paired
arrays and mutate the arrays in place, this issue must be addressed: we can perform a lazy
copying, so that the real copying work isn't executed immediately; instead, we copy one element
every step we do incremental reversing. The detailed implementation is left as an exercise.
The data structure is defined with three parts: the front list (augmented
with its length); the on-going state of computing F ∪ reverse(R); and the rear list
(augmented with its length).
Here is the Haskell definition of the real-time queue.
data RealtimeQueue a = RTQ [a] Int (State a) [a] Int
The empty queue is composed of an empty front and rear list together with the
idle state S0, as Queue(∅, 0, S0, ∅, 0). We can test if a queue is empty by
checking whether |F| = 0, according to the balance invariant defined before. Push and
pop are changed accordingly.
push(Q, x) = balance(F, |F|, S, {x} ∪ R, |R| + 1)        (11.19)
pop(Q) = balance(F', |F| − 1, abort(S), R, |R|)          (11.20)

Where F = {f1} ∪ F', so that the first element f1 is popped.
The major difference is the abort() function. Based on the above analysis, when
there is a pop, we need to decrease the counter, so that we concatenate
one element less. We call this aborting. The details will be given after the
balance() function.
The relevant Haskell code for push and pop is listed below.
push (RTQ f lenf s r lenr) x = balance f lenf s (x:r) (lenr + 1)
pop (RTQ (_:f) lenf s r lenr) = balance f (lenf - 1) (abort s) r lenr
The balance() function first checks the balance invariant; if it's violated, we
start re-balancing by computing F ∪ reverse(R) incrementally;
otherwise we just execute one step of the unfinished incremental computation.

balance(F, |F|, S, R, |R|) =
    step(F, |F|, S, R, |R|) : |R| ≤ |F|
    step(F, |F| + |R|, (Sr, 0, F, ∅, R, ∅), ∅, 0) : otherwise
(11.21)
The step() function transfers the state machine one state ahead, and
it turns the state to idle (S0) when the incremental computation finishes.

step(F, |F|, S, R, |R|) =
    Queue(F', |F'|, S0, R, |R|) : S' = (Sf, F')
    Queue(F, |F|, S', R, |R|) : otherwise
(11.22)

Where S' = next(S) is the state after one step of computation, and Sf denotes the finished state carrying the result F'.
The next() function performs one step of the on-going F ∪ reverse(R) computation:

next(S) =
    (Sr, n + 1, F', {f1} ∪ Fa, R', {r1} ∪ Ra) : S = (Sr, n, {f1} ∪ F', Fa, {r1} ∪ R', Ra)
    (Sc, n, Fa, {r1} ∪ Ra) : S = (Sr, n, ∅, Fa, {r1}, Ra)
    (Sf, A) : S = (Sc, 0, X, A)
    (Sc, n − 1, X', {x1} ∪ A) : S = (Sc, n, {x1} ∪ X', A), n ≠ 0
    S : otherwise
(11.23)

Where Fa and Ra are the accumulators of the reversing phase. The corresponding Haskell code is:

next (Reverse n (x:f) f' (y:r) r') = Reverse (n + 1) f (x:f') r (y:r')
next (Reverse n [] f' [y] r') = Concat n f' (y:r')
next (Concat 0 _ acc) = Done acc
next (Concat n (x:f') acc) = Concat (n - 1) f' (x:acc)
next s = s
The abort() function tells the state machine that we can concatenate one
element less, since it has been popped.

abort(S) =
    (Sf, A') : S = (Sc, 0, X, {x1} ∪ A')
    (Sc, n − 1, X, A) : S = (Sc, n, X, A), n ≠ 0
    (Sr, n − 1, F, Fa, R, Ra) : S = (Sr, n, F, Fa, R, Ra)
    S : otherwise
(11.24)

Note that in the first case the accumulated result {x1} ∪ A' contains one element that has just been popped, so x1 is rolled back.
It seems that we are done; however, there is still one tricky issue hidden
behind. If we push an element x to an empty queue, the resulting queue will
be:

Queue(∅, 1, (Sc, 0, ∅, {x}), ∅, 0)

If we perform pop immediately, we'll get an error! The front
list is empty, although the computation of F ∪ reverse(R) has actually
finished. This is because it takes one extra step to transfer from the state
(Sc, 0, ∅, A) to (Sf, A). It's necessary to refine the S' in the step() function a bit.

S' =
    next(next(S)) : F = ∅
    next(S) : otherwise
(11.25)
The modification is reflected in the Haskell code below:

step f lenf s r lenr =
    case s' of
      Done f' -> RTQ f' lenf Empty r lenr
      s'' -> RTQ f lenf s'' r lenr
  where s' = if null f then next $ next s else next s
Note that this algorithm differs from the one given by Chris Okasaki in
[6]. Okasaki's algorithm executes two steps per pop and push, while the one
presented in this chapter executes only one per pop and push, which leads to
more evenly distributed performance.
Exercise 11.5
Why do we need to roll back one element when n = 0 in the abort() function?
Realize the real-time queue with the symmetric paired-array queue solution
in your favorite imperative programming language.
In the footnote, we mentioned that when we start incremental reversing
with the in-place paired-array solution, copying the array can't be done monolithically,
or it leads to a linear-time operation. Implement the lazy copying
so that we copy one element per step along with the reversing.
11.6 Lazy real-time queue

rotate(X, Y, A) = X ∪ reverse(Y) ∪ A
(11.26)

Where we initialize X as the front list F, Y as the rear list R, and the
accumulator A as empty ∅.
The trigger of rotation is the same as before: when |F| + 1 = |R|. Let's
keep this constraint as an invariant during the whole rotation process, so that
|X| + 1 = |Y| always holds.
It's obvious to deduce the trivial case:

rotate(∅, {y1}, A) = {y1} ∪ A
(11.27)
For the recursive case, denote X = {x1} ∪ X' and Y = {y1} ∪ Y'. We have:

rotate(X, Y, A)
  = X ∪ reverse(Y) ∪ A                        Definition of (11.26)
  = {x1} ∪ (X' ∪ reverse(Y) ∪ A)              Associativity of ∪
  = {x1} ∪ (X' ∪ reverse(Y') ∪ ({y1} ∪ A))    Nature of reverse and associativity of ∪
  = {x1} ∪ rotate(X', Y', {y1} ∪ A)           Definition of (11.26)
(11.28)
rotate(X, Y, A) =
    {y1} ∪ A : X = ∅
    {x1} ∪ rotate(X', Y', {y1} ∪ A) : otherwise
(11.29)
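Equation (11.29) translates directly to Haskell. A sketch with ordinary (strict) lists follows; the point of this section is that the very same definition, evaluated lazily over streams, distributes the work:

```haskell
-- rotate xs ys acc == xs ++ reverse ys ++ acc,
-- under the invariant length ys == length xs + 1.
rotate :: [a] -> [a] -> [a] -> [a]
rotate [] [y] acc = y : acc                          -- trivial case (11.27)
rotate (x:xs) (y:ys) acc = x : rotate xs ys (y:acc)  -- recursive case (11.29)
rotate _ _ _ = error "invariant |Y| = |X| + 1 violated"
```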
If we execute ∪ lazily instead of strictly — that is, execute one step each time a pop or push
operation is performed — the computation of rotate can be distributed to push and
pop naturally.
Based on this idea, we modify the paired-list queue definition to change the
front list to a lazy list, and augment it with a computation stream [5]. When
some pop/push triggers the re-balance constraint |F| + 1 = |R|,
the algorithm creates a lazy rotation computation, then uses this lazy rotation as
the new front list F'; the new rear list becomes ∅, and a copy of F' is maintained
as a stream.
After that, every time we perform a push or pop, we consume the stream
by forcing one step. This advances us one step along the stream,
{x} ∪ F'', where F'' = tail(F'). We can discard x, and replace the stream F'
with F''.
Once all of the stream is exhausted, we can start another rotation.
In order to illustrate this idea clearly, we turn to the Scheme/Lisp programming
language for the example code, because it gives us explicit control of laziness.
In Scheme/Lisp, we have the following three tools to deal with lazy streams.

(define-syntax cons-stream
  (syntax-rules ()
    ((_ a b) (cons a (delay b))))) ; a macro, so that b is not evaluated eagerly
(define stream-car car)
(define (stream-cdr s) (force (cdr s)))
A queue consists of three parts: a front list, a rear list, and a stream which
represents the computation of F ∪ reverse(R). Creating an empty queue is trivial:
make all three parts null.

(define empty (make-queue '() '() '()))

Note that the front list is actually a lazy stream, so we need to use the stream-related
functions to manipulate it. For example, the following function tests if
the queue is empty by checking the front lazy-list stream.
The push function is almost the same as the one given in the previous section:
we put the new element in front of the rear list, then examine the
balance invariant and do the necessary balancing work.

push(Q, x) = balance(F, {x} ∪ R, Rs)
(11.30)

Where F represents the lazy stream of the front list, and Rs is the stream of the rotation
computation. The corresponding Scheme/Lisp code is given below.
(define (push q x)
(balance (front-lst q) (cons x (rear q)) (rots q)))
Pop is a bit different: because the front list is actually a lazy stream,
we need to force an evaluation. All the rest is the same as before.

pop(Q) = balance(F', R, Rs)
(11.31)

Where F' = stream-cdr(F). For illustration purposes, we skip the error handling (such as popping from an
empty queue, etc.) here.
One can access the top element in the queue by extracting it from the front
list stream.
(define (front q) (stream-car (front-lst q)))
11.7 Notes and short summary
As mentioned in the first chapter of this book, the queue
isn't as simple as it seems. We've tried to explain algorithms and data
structures in both the imperative and the functional approaches. It may give the
impression that the functional way is simpler and more expressive most of the time;
however, there are still plenty of areas where more study and work are needed
to give equivalent functional solutions. Queue is one such important topic:
it links to many fundamental purely functional data structures.
That's why Chris Okasaki studied it intensively and gave a great amount of
discussion in [6]. With the purely functional queue solved, we can implement the
deque (double-ended queue) with a similar approach, as revealed in this chapter. As we can handle
elements effectively at both head and tail, we can advance one step further to
realize sequence data structures, which support fast concatenation, and finally we
can realize random-access data structures to mimic arrays in imperative settings.
The details will be explained in later chapters.
Note that although we haven't mentioned the priority queue, it's quite possible
to realize it with heaps. We have covered the topic of heaps in several previous
chapters.
Exercise 11.6
Realize a deque (double-ended queue), which supports adding and removing elements on both
sides in constant O(1) time, in a purely functional way.
Realize a deque in a symmetric solution only with arrays in your favorite
imperative programming language.
Bibliography
[1] Maged M. Michael and Michael L. Scott. "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms". http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html
[2] Herb Sutter. "Writing a Generalized Concurrent Queue". Dr. Dobb's, Oct 29, 2008. http://drdobbs.com/cpp/211601363?pgno=1
[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. "Introduction to Algorithms, Second Edition". The MIT Press, 2001. ISBN: 0262032937.
[4] Chris Okasaki. "Purely Functional Data Structures". Cambridge University Press, July 1, 1999. ISBN-13: 978-0521663502.
[5] Wikipedia. "Tail call". http://en.wikipedia.org/wiki/Tail_call
[6] Wikipedia. "Recursion (computer science)". http://en.wikipedia.org/wiki/Recursion_(computer_science)#Tail-recursive_functions
[7] Harold Abelson, Gerald Jay Sussman, Julie Sussman. "Structure and Interpretation of Computer Programs, 2nd Edition". MIT Press, 1996. ISBN 0-262-51087-1.
Chapter 12
Sequences
12.1 Introduction
In the first chapter of this book, which introduced the binary search tree as the "hello
world" data structure, we mentioned that neither queue nor array is simple to
realize, not only in the imperative way, but also in the functional approach. In the previous
chapter, we explained the functional queue, which achieves similar performance
to its imperative counterpart. In this chapter, we'll dive into the topic of array-like
data structures.
We have introduced several data structures in this book so far, and it seems
that functional approaches typically bring more expressive and elegant solutions.
However, there are some areas where people haven't found competitive purely
functional solutions which can match the imperative ones: for instance, the
Ukkonen linear-time suffix tree construction algorithm. Another example is the
hash table. The array is also among them.
Arrays are trivial in imperative settings; they enable random access of any
element by index in constant O(1) time. However, this performance target
can't be achieved directly in purely functional settings, as essentially only the list
is available.
In this chapter, we are going to abstract the concept of array to sequences,
which support the following features:
Elements can be inserted to, or removed from, the head of the sequence
quickly, in O(1) time;
Elements can be inserted to, or removed from, the tail of the sequence quickly,
in O(1) time;
Support concatenating two sequences quickly (faster than linear time);
Support random access and update of any element quickly;
Support splitting at any position quickly.
We call these the abstract sequence properties. It's easy to see that
even the array (here meaning the plain array) in imperative settings can't meet
them all at the same time.
We'll provide three solutions in this chapter. First, we'll introduce a solution based on a binary tree forest and numeric representation; second, we'll
show a concatenate-able list solution; finally, we'll give the finger tree solution.
Most of the results are based on Chris Okasaki's work in [6].
12.2 Binary random access list
12.2.1
12.2.2
Figure 12.1: A sequence of 6 elements x1, x2, ..., x6 represented as a forest of two complete binary trees t1 and t2. (Figure omitted in this copy.)
The only difference from the typical binary tree is that we augment the tree
with size information. This enables us to get the size without calculating it
every time. For instance:

size (Leaf _) = 1
size (Node sz _ _) = sz
12.2.3
Figure 12.2: Inserting elements x1, x2, x3, x4 to the head, one by one: each new element becomes a leaf, and trees of equal size are linked, so the forest always holds trees of distinct sizes 1, 2, 4, .... (Figure omitted in this copy.)
Figure 12.3: The forests after inserting x5 and x6. (Figure omitted in this copy.)
insertTree(F, t) =
    {t} : F = ∅
    {t} ∪ F : size(t) < size(t1)
    insertTree(F', link(t, t1)) : otherwise
(12.4)

Where F = {t1, t2, ...} is the forest, F' is the rest of the trees without t1, and link builds a bigger tree from two trees of the same size.
Here we follow the Lisp tradition to name the function that inserts an element
before a list as cons.
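A hedged, self-contained Haskell sketch of the size-augmented tree, the forest insertion (12.4), and cons (the `Leaf`/`Node` constructor names follow the fragments shown earlier; the rest is a reconstruction):

```haskell
-- A complete binary tree, augmented with its size in every branch node.
data Tree a = Leaf a
            | Node Int (Tree a) (Tree a)  -- size, left, right

size :: Tree a -> Int
size (Leaf _) = 1
size (Node sz _ _) = sz

-- Link two trees of the same size into one bigger tree.
link :: Tree a -> Tree a -> Tree a
link t1 t2 = Node (size t1 + size t2) t1 t2

-- Equation (12.4): keep the forest sizes strictly increasing.
insertTree :: [Tree a] -> Tree a -> [Tree a]
insertTree [] t = [t]
insertTree (t1:ts) t
  | size t < size t1 = t : t1 : ts
  | otherwise        = insertTree ts (link t t1)

-- Insert a new element at the head of the sequence.
cons :: a -> [Tree a] -> [Tree a]
cons x ts = insertTree ts (Leaf x)
```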
Remove the element from the head of the sequence
It's not complex to realize the inverse operation of cons, which removes the
element from the head of the sequence:
If the first tree in the forest is a singleton leaf, remove this tree from the
forest;
otherwise, repeatedly unlink the first tree until a singleton leaf appears at the
front, then remove it.
Figure 12.4 illustrates the steps of removing elements from the head of the
sequence.
Figure 12.4: Removing elements from the head: the first tree is repeatedly unlinked until a singleton leaf can be removed. (Figure omitted in this copy.)
extractTree(F) =
    (t1, F') : t1 is a leaf
    extractTree({tl, tr} ∪ F') : otherwise
(12.5)

Where F = {t1, t2, ...}, F' = {t2, t3, ...}, and tl, tr are the left and right sub-trees of t1.
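The corresponding Haskell program for (12.5) isn't shown in this copy; a hedged sketch, assuming the size-augmented `Tree` from before:

```haskell
data Tree a = Leaf a | Node Int (Tree a) (Tree a)  -- size, left, right

-- Equation (12.5): repeatedly unlink the first tree until a leaf appears.
extractTree :: [Tree a] -> (Tree a, [Tree a])
extractTree (t@(Leaf _) : ts)   = (t, ts)
extractTree (Node _ tl tr : ts) = extractTree (tl : tr : ts)
extractTree []                  = error "empty sequence"
```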
With this function defined, it's convenient to give the head and tail functions;
the former returns the first element in the sequence, the latter returns the rest.

head(S) = key(first(extractTree(S)))
(12.6)
tail(S) = second(extractTree(S))
(12.7)
Where the function first returns the first element in a paired value (also
known as a tuple), and second returns the second element respectively. The function key is used
to access the element inside a leaf. Below are the Haskell programs corresponding
to these two functions.

head' ts = x where (Leaf x, _) = extractTree ts
tail' = snd . extractTree

Note that as the head and tail functions have already been defined in the Haskell
standard library, we give them apostrophes to make them distinct. (Another
option is to hide the standard ones by importing; we skip the details as they
are language specific.)
Random access the element in binary random access list
As the trees in the forest manage the elements in blocks, given an arbitrary
index, it's easy to locate which tree the element is stored in; after that, performing
a search in that tree yields the result. As all trees are binary (more accurately,
complete binary trees), the search is essentially a binary search, which is bound by
the logarithm of the tree size. This gives us faster random access
than linear search in the linked-list setting.
Given an index i and a sequence S, which is actually a forest of trees, the
algorithm is executed as follows1.
1. Compare i with the size of the first tree T1 in the forest; if i is less than
or equal to the size, the element exists in T1: perform a look-up in T1;
2. Otherwise, decrease i by the size of T1, and repeat the previous step on
the rest of the trees in the forest.
This algorithm can be represented as the equation below.

get(S, i) =
    lookupTree(T1, i) : i ≤ |T1|
    get(S', i − |T1|) : otherwise
(12.8)

Where |T| = size(T), and S' = {T2, T3, ...} is the rest of the trees, without the
first one, in the forest. Note that we don't handle the out-of-bound error case; this
is left as an exercise to the reader.
The function lookupTree is a binary search algorithm. If the index i is 1, we
just return the root of the tree; otherwise we halve the tree by unlinking: if i is
less than or equal to the size of the halved tree, we recursively look up the left
sub-tree, otherwise we look up the right sub-tree.

lookupTree(T, i) =
    root(T) : i = 1
    lookupTree(left(T), i) : i ≤ ⌊|T|/2⌋
    lookupTree(right(T), i − ⌊|T|/2⌋) : otherwise
(12.9)

Where the function left returns the left sub-tree Tl of T, while right returns Tr.
The corresponding Haskell program is given below.
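The promised Haskell program is missing from this copy; a hedged sketch of (12.8) and (12.9) follows (`getElem` is my name for the forest-level get, chosen to avoid clashing with other definitions):

```haskell
data Tree a = Leaf a | Node Int (Tree a) (Tree a)  -- size, left, right

size :: Tree a -> Int
size (Leaf _) = 1
size (Node sz _ _) = sz

-- Equation (12.9); 1-based index, as in the algorithm description.
lookupTree :: Tree a -> Int -> a
lookupTree (Leaf x) _ = x
lookupTree (Node sz tl tr) i
  | i <= sz `div` 2 = lookupTree tl i
  | otherwise       = lookupTree tr (i - sz `div` 2)

-- Equation (12.8): locate the tree, then search inside it.
getElem :: [Tree a] -> Int -> a
getElem (t:ts) i
  | i <= size t = lookupTree t i
  | otherwise   = getElem ts (i - size t)
getElem [] _ = error "index out of bound"
```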
1 We follow the tradition that the index i starts from 1 in the algorithm description, while it
starts from 0 in most programming languages.
Figure 12.5 illustrates the steps of looking up the 4th element in a sequence
of size 6. It first examines the first tree; since its size is 2, which is smaller than
4, it goes on to look up the second tree with the updated index i' = 4 − 2,
i.e., the 2nd element in the rest of the forest. As the size of the next tree is
4, which is greater than 2, the element to be searched must be located in
this tree. It then examines the left sub-tree, since the new index 2 is not greater
than half the size, 4/2 = 2; the process next visits the right grand-child, and the
final result is returned.
Figure 12.5: Steps of looking up the 4th element: skip t1, then search within t2 by descending to left(t2) and then its right child. (Figure omitted in this copy.)
The update algorithm is similar: locate the tree containing the element, then perform a recursive update in that tree.

set(S, i, x) =
    {setTree(T1, i, x)} ∪ S' : i ≤ |T1|
    {T1} ∪ set(S', i − |T1|, x) : otherwise
(12.10)

Where S' = {T2, T3, ...} is the rest of the trees in the forest without the first
one. The function setTree descends the tree just as lookupTree does:

setTree(T, i, x) =
    leaf(x) : i = 1 ∧ |T| = 1
    tree(|T|, setTree(Tl, i, x), Tr) : i ≤ ⌊|T|/2⌋
    tree(|T|, Tl, setTree(Tr, i − ⌊|T|/2⌋, x)) : otherwise
(12.11)
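A hedged Haskell sketch of (12.10) and (12.11), using the 1-based index convention (the name `setElem` is my choice for the forest-level update):

```haskell
data Tree a = Leaf a | Node Int (Tree a) (Tree a)  -- size, left, right

size :: Tree a -> Int
size (Leaf _) = 1
size (Node sz _ _) = sz

-- Equation (12.10): locate the tree that holds index i, then update inside it.
setElem :: [Tree a] -> Int -> a -> [Tree a]
setElem (t:ts) i x
  | i <= size t = setTree t i x : ts
  | otherwise   = t : setElem ts (i - size t) x
setElem [] _ _ = error "index out of bound"

-- Equation (12.11): rebuild the path from the root to the updated leaf.
setTree :: Tree a -> Int -> a -> Tree a
setTree (Leaf _) _ x = Leaf x
setTree (Node sz tl tr) i x
  | i <= sz `div` 2 = Node sz (setTree tl i x) tr
  | otherwise       = Node sz tl (setTree tr (i - sz `div` 2) x)
```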
Due to the nature of complete binary trees, for a sequence with n elements
represented by a binary random access list, the number of trees in the forest
is bound to O(lg n). Thus it takes O(lg n) time to locate the tree containing the element for an arbitrary
index i in the worst case. The subsequent tree search
is bound by the height of the tree, which is O(lg n) in the worst case as well.
So the total performance of random access is O(lg n).
Exercise 12.1
1. The random access algorithm given in this section doesn't handle errors
such as out-of-bound indexes at all. Modify the algorithm to handle this
case, and implement it in your favorite programming language.
2. It's quite possible to realize the binary random access list in imperative
settings, which benefits from fast operations on the head of the sequence.
The random access can be realized in two steps: first locate
the tree, then use the constant-time random access capability of arrays.
Write a program to implement it in your favorite imperative programming
language.
12.3 Numeric representation
Inserting a new element to the head of the sequence mimics increasing a binary number
by one, while removing an element from the head mimics decreasing the
corresponding binary number by one. This is known as numeric representation [6].
In order to represent the binary random access list as a binary number, we
can define two states for a bit: Zero means there is no tree whose
size corresponds to that bit, while One means such a tree exists in the
forest. We can attach the tree to the state if it is One.
For instance, the following Haskell program defines such states.
data Digit a = Zero
| One (Tree a)
type RAList a = [Digit a]
Here we reuse the definition of complete binary tree and attach it to the
state One. Note that we cache the size information in the tree as well.
With digits defined, the forest can be treated as a list of digits. Let's see how
inserting a new element can be realized as binary number increment. Suppose
the function one(t) creates a One state and attaches the tree t to it, and the function
getTree(s) gets the tree attached to a One state s. The sequence S
is a list of digits, S = {s1, s2, ...}, and S' is the rest of the digits with
the first one removed.

insertTree(S, t) =
    {one(t)} : S = ∅
    {one(t)} ∪ S' : s1 = Zero
    {Zero} ∪ insertTree(S', link(t, getTree(s1))) : otherwise
(12.12)
When we insert a new tree t to a forest S of binary digits: if the forest is
empty, we just create a One state, attach the tree to it, and make this state the
only digit of the binary number. This is just like 0 + 1 = 1.
Otherwise, if the forest isn't empty, we need to examine the first digit of the
binary number. If the first digit is Zero, we just create a One state, attach the
tree, and replace the Zero state with the newly created One state. This is just like
(...digits...0)2 + 1 = (...digits...1)2. For example 6 + 1 = (110)2 + 1 = (111)2 = 7.
The last case is that the first digit is One. Here we make the assumption that the
tree t to be inserted has the same size as the tree attached to this One state at
this stage. This can be ensured by calling this function when inserting a leaf, so
that the size of the tree to be inserted grows in a series of 1, 2, 4, ..., 2^i, .... In such
a case, we need to link the two trees (one is t, the other is the tree attached to the
One state), and recursively insert the linked result to the rest of the digits. Note
that the previous One state has to be replaced with a Zero state. This is just
like (...digits...1)2 + 1 = (...digits'...0)2, where (...digits'...)2 = (...digits...)2 + 1.
For example 7 + 1 = (111)2 + 1 = (1000)2 = 8.
Translating this algorithm to Haskell yields the following program.
insertTree [] t = [One t]
insertTree (Zero:ts) t = One t : ts
insertTree (One t':ts) t = Zero : insertTree ts (link t t')
All the other functions, including link(), cons(), etc., are the same as before.
Removing an element mimics decrement; borrowing a digit happens when the first digit is Zero:

extractTree(S) =
    (t, ∅) : S = {one(t)}
    (t, {Zero} ∪ S') : s1 = one(t)
    (tl, {one(tr)} ∪ S'') : otherwise
(12.13)

Where, in the last case, s1 = Zero, (t', S'') = extractTree(S'), and tl, tr are the left and right sub-trees of t'.
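Putting the numeric-representation pieces together, here is a hedged, self-contained Haskell sketch (a reconstruction; parts of the original program were lost in this copy):

```haskell
data Tree a = Leaf a | Node Int (Tree a) (Tree a)  -- size, left, right
data Digit a = Zero | One (Tree a)
type RAList a = [Digit a]

size :: Tree a -> Int
size (Leaf _) = 1
size (Node sz _ _) = sz

link :: Tree a -> Tree a -> Tree a
link t1 t2 = Node (size t1 + size t2) t1 t2

-- Insertion mimics incrementing a binary number by one (12.12).
insertTree :: RAList a -> Tree a -> RAList a
insertTree [] t = [One t]
insertTree (Zero:ds) t = One t : ds
insertTree (One t':ds) t = Zero : insertTree ds (link t t')

-- Extraction mimics decrementing; borrowing splits a tree into halves (12.13).
extractTree :: RAList a -> (Tree a, RAList a)
extractTree [One t] = (t, [])
extractTree (One t:ds) = (t, Zero:ds)
extractTree (Zero:ds) = (tl, One tr : ds')
  where (Node _ tl tr, ds') = extractTree ds
extractTree [] = error "empty sequence"

cons :: a -> RAList a -> RAList a
cons x ds = insertTree ds (Leaf x)
```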
12.3.1 Imperative binary random access list
It's trivial to implement the imperative binary random access list by using
binary trees, and the recursion can be eliminated by updating the focused tree
in loops. This is left as an exercise to the reader. In this section, we'll show
a different imperative implementation that uses the properties of numeric
representation.
Recall the chapter about the binary heap: a binary heap can be represented by
an implicit array. We can use a similar approach: use an array of 1 element to
represent a leaf, an array of 2 elements to represent a binary tree of height
1, and an array of 2^m elements to represent a complete binary tree of height m.
This gives us the capability of accessing any element by index directly,
instead of a divide-and-conquer tree search. However, the tree linking operation
has to be implemented as array copying, as the trade-off.
The following ANSI C code defines such a forest.

#define M (sizeof(int) * 8)

typedef int Key;

struct List {
    int n;
    Key* tree[M];
};
Where n is the number of elements stored in this forest. Of course we can
avoid limiting the maximum number of trees by using dynamic arrays, for example
as in the following ISO C++ code.

template<typename Key>
struct List {
    int n;
    vector<vector<Key> > tree;
};

However, the performance in theory isn't as good as before. This is because the linking operation downgrades from O(1) constant time to linear array
copying.
We can again calculate the average (amortized) performance by using aggregation analysis. When inserting n = 2^m elements to an empty list
represented by implicit binary trees in arrays, the numeric presentation of the
forest of arrays is the same as before, except for the cost of bit flipping.
i          forest (MSB ... LSB)
0          0, 0, ..., 0, 0
1          0, 0, ..., 0, 1
2          0, 0, ..., 1, 0
3          0, 0, ..., 1, 1
...        ...
2^m − 1    1, 1, ..., 1, 1
2^m        1, 0, 0, ..., 0, 0

bit-change cost (MSB ... LSB): 1 · 2^m, 1 · 2^(m−1), 2 · 2^(m−2), ..., 2^(m−2) · 2, 2^(m−1) · 1
The LSB of the forest changes every time a new element is
inserted; however, it creates a leaf tree and performs copying only when it changes from
0 to 1, so the cost at this bit is half of n units, which is 2^(m−1). The next bit flips half as
often as the LSB. Each time that bit flips to 1, it copies the first tree as well as
the new element to the second tree, so the cost of flipping this bit to 1
is 2 units, not 1. For the MSB, it only flips to 1 the last time, but the
cost of flipping it is copying all the previous trees to fill the array of size
2^m.
Summing all the costs and distributing them over the n insertions yields the amortized performance below.

O(T/n) = O((1 · 2^m + 1 · 2^(m−1) + 2 · 2^(m−2) + ... + 2^(m−2) · 2 + 2^(m−1) · 1) / 2^m) = O(m) = O(lg n)
(12.15)
i -= sz;
}
return a.tree[j][i];
}
The imperative removal and random mutating algorithms are left as exercises
to the reader.
Exercise 12.2
1. Please implement the random access algorithms, including looking up and
updating, for binary random access list with numeric representation in
your favorite programming language.
2. Prove that the amortized performance of deletion is O(1) constant time
by using aggregation analysis.
3. Design and implement the binary random access list by implicit array in
your favorite imperative programming language.
12.4 Imperative paired-array list
12.4.1 Definition
Figure 12.6: A paired-array list: two arrays linked head-to-head. The front array x[n], ..., x[2], x[1] grows to one side, and the rear array y[1], y[2], ..., y[m] grows to the other. (Figure omitted in this copy.)
Here we use the vector provided in the standard library to handle the dynamic memory
management issues, so that we can concentrate on the algorithm design.
12.4.2 Insertion and appending
Suppose the function Front(L) returns the front array, while Rear(L) returns the
rear array. For illustration purposes, we assume the arrays are dynamically allocated.
Inserting and appending can be realized as follows.

function Insert(L, x)
    F ← Front(L)
    Size(F) ← Size(F) + 1
    F[Size(F)] ← x

function Append(L, x)
    R ← Rear(L)
    Size(R) ← Size(R) + 1
    R[Size(R)] ← x
As all the above operations manipulate the front and rear arrays at their tails, they
are all constant O(1) time. The following are the corresponding ISO C++
programs.
template<typename Key>
void insert(List<Key>& xs, Key x) {
++xs.n;
xs.front.push_back(x);
}
template<typename Key>
void append(List<Key>& xs, Key x) {
++xs.m;
xs.rear.push_back(x);
}
12.4.3 Random access
As the inner data structure is an array (a dynamic array as vector), which supports
random access by nature, it's trivial to implement a constant-time indexing algorithm.
function Get(L, i)
    F ← Front(L)
    n ← Size(F)
    if i ≤ n then
        return F[n − i + 1]
    else
        return Rear(L)[i − n]
Here the index i ∈ [1, |L|]. If it is not greater than the size of the front array,
the element is stored in the front array. However, as the front and rear arrays are connected
head-to-head, the elements in the front array are in reverse order, so we locate
the element by subtracting i from the size of the front array (plus one). If the index i is greater
than the size of the front array, the element is stored in the rear array. Since elements
are stored in normal order in the rear array, we just subtract from the index i an offset
which is the size of the front array.
Here is the ISO C++ program implements this algorithm.
template<typename Key>
Key get(List<Key>& xs, int i) {
if( i < xs.n )
return xs.front[xs.n-i-1];
else
return xs.rear[i-xs.n];
}
12.4.4 Removing and balancing
Removing is more complex than insertion and appending: when the array on the
removed side becomes empty, we have to shift elements from the other array to resume
the balance. Half of the other array is reversed and moved over:

function Balance(L)
    F ← Front(L), R ← Rear(L)
    n ← Size(F), m ← Size(R)
    if F = ∅ then
        F ← Reverse(R[1 ... ⌊m/2⌋])
        R ← R[⌊m/2⌋ + 1 ... m]
    else if R = ∅ then
        R ← Reverse(F[1 ... ⌊n/2⌋])
        F ← F[⌊n/2⌋ + 1 ... n]

Actually, the operations are symmetric for the case that the front is empty and
the case that the rear is empty. Another approach is to swap the front and rear for
one symmetric case, recursively resume the balance, then swap the front and
rear back. For example, the ISO C++ program below uses this method.
template<typename Key>
void balance(List<Key>& xs) {
if(xs.n == 0) {
back_insert_iterator<vector<Key> > i(xs.front);
reverse_copy(xs.rear.begin(), xs.rear.begin() + xs.m/2, i);
xs.rear.erase(xs.rear.begin(), xs.rear.begin() +xs.m/2);
xs.n = xs.m/2;
xs.m -= xs.n;
} else if(xs.m == 0) {
swap(xs.front, xs.rear);
swap(xs.n, xs.m);
balance(xs);
swap(xs.front, xs.rear);
swap(xs.n, xs.m);
}
}
It's obvious that the worst-case performance is O(n), where n is the number of
elements stored in the paired-array list. This happens when balancing is triggered,
and both reversing and shifting are linear operations. However, the amortized
performance of removal is still O(1); the proof is left as an exercise to the reader.
Exercise 12.3
12.5 Concatenate-able list
By using the binary random access list, we realized a sequence data structure which
supports O(lg n) time insertion and removal at the head, as well as random access
of any element by index.
However, it's not easy to concatenate two lists. As both lists are forests of
complete binary trees, we can't merely merge them (since forests are essentially
lists of trees, and for any size there is at most one tree of that size; even
concatenating the forests directly is not fast). One solution is to push the elements
of the first sequence one by one onto a stack, then pop them and
insert them at the head of the second sequence with the cons function. Of course
the stack can be used implicitly through recursion, for instance:
concat(s1, s2) =
    s2 : s1 = ∅
    cons(head(s1), concat(tail(s1), s2)) : otherwise
(12.16)
Where the functions cons, head, and tail are defined in the previous section.
If the lengths of the two sequences are n and m, this method takes O(n lg n)
time to repeatedly push all the elements of the first sequence onto the stack, and then
takes O(n lg(n + m)) to insert the elements in front of the second sequence. Note
that O denotes the asymptotic upper limit; there is a detailed definition of it in [2].
We have already implemented the real-time queue in the previous chapter. It
supports O(1) time pop and push. If we can turn the sequence concatenation
into a kind of queue pushing operation, the performance can be improved to
O(1) as well. Okasaki gave such a realization in [6], which can concatenate lists
in constant time.
To represent a concatenate-able list, the data structure designed by Okasaki
is essentially a K-ary tree. The root of the tree stores the first element in the
list, so that we can access it in constant O(1) time. The sub-trees, or children,
are all smaller concatenate-able lists, which are managed by real-time queues.
Concatenating another list to the end is just adding it as the last child, which
is in turn a queue pushing operation. Appending a new element can be realized
as follows: first wrap the element into a singleton tree, which is a leaf with no
children; then concatenate this singleton to the end.
Figure 12.7 illustrates this data structure.
Such a recursively designed data structure can be defined in the following
Haskell code.
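The Haskell definition promised here was lost in this copy. A hedged, self-contained sketch follows, modelling the children queue with a plain list for brevity (the text assumes a real-time queue for the O(1) bounds; the names `concatL`, `headL`, and `tailL` are mine, to avoid clashes with the Prelude):

```haskell
-- A K-ary tree: the root holds the first element; the children are
-- themselves concatenate-able lists, kept in a queue (a plain list here).
data CList a = Empty | CList a [CList a]

-- Concatenation pushes the second list as the new last child.
concatL :: CList a -> CList a -> CList a
concatL s Empty = s
concatL Empty s = s
concatL (CList x q) s = CList x (q ++ [s])

cons :: a -> CList a -> CList a
cons x s = concatL (CList x []) s

append :: CList a -> a -> CList a
append s x = concatL s (CList x [])

headL :: CList a -> a
headL (CList x _) = x
headL Empty = error "empty list"

-- Removing the head concatenates all children back into one list.
tailL :: CList a -> CList a
tailL (CList _ q) = concatAll q
  where concatAll [] = Empty
        concatAll (t:ts) = concatL t (concatAll ts)
tailL Empty = error "empty list"
```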
Figure 12.7: A concatenate-able list. The root stores the first element x[1]; the children c[1], c[2], ..., c[n] are smaller concatenate-able lists holding x[2]...x[i], x[i+1]...x[j], ..., x[k]...x[n]. Concatenating another list y[1]...y[m] pushes it as the new last child c[n+1]. (Figure omitted in this copy.)
concat(s1, s2) =
    s1 : s2 = ∅
    s2 : s1 = ∅
    clist(x1, push(Q1, s2)) : otherwise
(12.17)

Where, in the last case, s1 = clist(x1, Q1): x1 is the first element stored in the root, Q1 is the queue of children, and pushing s2 makes it the new last child.
Besides the good performance of concatenation, this design also brings satisfying
performance for adding elements at both head and tail.

cons(x, s) = concat(clist(x, ∅), s)
(12.18)
append(s, x) = concat(s, clist(x, ∅))
(12.19)
Getting the first element is just returning the root of the K-ary tree.

head(s) = root(s)
(12.20)

It's a bit more complex to realize the algorithm that removes the first element
from a concatenate-able list. This is because after the root, which is the first
element in the sequence, is removed, we have to re-construct the remaining things — a
queue of sub-lists — into a K-ary tree.
After the root is removed, all the children of the K-ary tree are left. Note
that all of them are also concatenate-able lists, so one natural solution is to
concatenate them all together into a big list.
concatAll(Q) =
    ∅ : Q = ∅
    concat(front(Q), concatAll(pop(Q))) : otherwise
(12.21)

Where the function front just returns the first element of a queue without
removing it, while pop does the removal.
If the queue is empty, it means there are no children at all, so the result is
also an empty list. Otherwise, we pop the first child, which is a concatenate-able
list, from the queue, recursively concatenate all the rest of the children into a list,
and finally concatenate this list behind the popped first child.
With concatAll defined, we can implement the algorithm of removing
the first element from a list as below.
tail(s) = concatAll(queue(s))
(12.22)

Where queue(s) returns the children queue of s. The concatenation of all children can also be expressed as folding over the queue3:

foldQ(f, e, Q) =
    e : Q = ∅
    f(front(Q), foldQ(f, e, pop(Q))) : otherwise
(12.23)

3 Some functional programming languages, such as Haskell, define a type class for this
concept (monoid), so that it's easy to support folding on a customized data structure.
concatAll(Q) = foldQ(concat, ∅, Q)
(12.24)
Exercise 12.4
1. Can you figure out a solution to append an element to the end of a binary
random access list?
2. Prove that the amortized performance of the removal operation for the
concatenate-able list is O(1). Hint: use the banker's method.
3. Implement the concatenate-able list in your favorite imperative language.
12.6 Finger tree
We haven't yet been able to meet all the performance targets listed at the beginning
of this chapter.
The binary random access list enables us to insert and remove elements at the head of
the sequence, and to randomly access elements fast. However, it performs poorly when
concatenating lists, and there is no good way to append an element at the end of a binary
random access list.
The concatenate-able list is capable of concatenating multiple lists on the fly, and
it performs well for adding new elements at both head and tail. However, it
doesn't support random access of an element by index.
These two examples bring us some ideas:
In order to support fast manipulation at both the head and the tail of the sequence, there must be some way to easily access the head and tail positions;
A tree-like data structure helps to turn random access into divide-and-conquer
search; if the tree is well balanced, the search is ensured to take
logarithmic time.
12.6.1
Definition
Finger tree[6], which was first invented in 1977, helps to realize an efficient sequence. It is also well implemented in purely functional settings[5].
As we mentioned, the balance of the tree is critical to the search performance. One option is to use a balanced tree as the underlying data structure for the finger tree, for example the 2-3 tree, which is a special B-tree (readers can refer to the chapter about B-trees in this book).
A 2-3 tree contains either 2 or 3 children. It can be defined as below in Haskell.
data Node a = Br2 a a | Br3 a a a
In imperative settings, a node can be defined with a list of sub-nodes containing at most 3 children. For instance, the following ANSI C code defines the node.

union Node {
    Key* keys;
    union Node** children;
};
We can use a NIL pointer to represent an empty tree; a leaf tree contains only one element in its front finger, while both its rear finger and middle part are empty.
Figures 12.8 and 12.9 show some examples of finger trees.
[Figure: (a) examples of finger trees; (b) the tree resumes balancing: there are 2 elements in the front finger, and the middle part is a leaf which contains a 3-branch 2-3 tree.]
The first example is an empty finger tree; the second one shows the result of inserting one element into the empty tree, which becomes a leaf of one node; the third example shows a finger tree containing 2 elements, one in the front finger and the other in the rear.
If we continuously insert new elements into the tree, those elements are put into the front finger one by one, until the limit of the 2-3 tree is exceeded. The 4th example shows such a condition: there are 4 elements in the front finger, which isn't balanced any more.
The last example shows that the finger tree gets fixed so that it resumes balancing. There are two elements in the front finger. Note that the middle part is not empty any longer. It's a leaf of a 2-3 tree (why it's a leaf is explained later). The content of the leaf is a tree with 3 branches, each containing an element.
We can express these 5 examples as the following Haskell expressions.

Empty
Lf a
Tr [b] Empty [a]
Tr [e, d, c, b] Empty [a]
Tr [f, e] (Lf (Br3 d c b)) [a]
In the last example, why is the middle part inner tree a leaf? As we mentioned, the definition of the finger tree is recursive. The middle part, besides the front and rear fingers, is a deeper finger tree, which is defined as Tree(Node(a)). Every time we go deeper, the Node is embedded one more level. If the element type of the first level tree is a, the element type for the second level tree is Node(a), the third level is Node(Node(a)), ..., and the n-th level is Node(Node(Node(...(a))...)) = Node^n(a), where n indicates that Node is applied n times.
12.6.2 Insert element to the head
The examples listed above actually reveal the typical process of inserting elements one by one into a finger tree. It's possible to summarize these examples into the cases of the insert-to-head algorithm.
When we insert an element x into a finger tree T:
If the tree is empty, the result is a leaf which contains the singleton element x;
If the tree is a singleton leaf of element y, the result is a new finger tree: the front finger contains the new element x, the rear finger contains the previous element y, and the middle part is an empty finger tree;
If the number of elements stored in the front finger isn't bigger than the upper limit of the 2-3 tree, which is 3, the new element is just inserted at the head of the front finger;
Otherwise, the number of elements stored in the front finger exceeds the upper limit of the 2-3 tree. The last 3 elements in the front finger are wrapped in a 2-3 tree and recursively inserted into the middle part, and the new element x is inserted in front of the rest of the elements in the front finger.
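The four cases above can be sketched in Python. The tuple encoding used here (None for empty, ('Lf', x) for a leaf, ('Tr', front, mid, rear) for a tree, and ('Br', [...]) for a 2-3 node) is an assumption for illustration, not the chapter's definition:

```python
def cons(x, t):
    """Insert x at the head of finger tree t (the four cases above)."""
    if t is None:                        # empty tree -> singleton leaf
        return ('Lf', x)
    if t[0] == 'Lf':                     # leaf y -> front {x}, rear {y}
        return ('Tr', [x], None, [t[1]])
    _, f, m, r = t
    if len(f) >= 4:                      # front finger exceeds the 2-3 limit:
        # wrap the last 3 nodes and push them one level deeper
        return ('Tr', [x, f[0]], cons(('Br', f[1:4]), m), r)
    return ('Tr', [x] + f, m, r)         # trivial case: push onto the front

def to_list(t):
    """Flatten for inspection (assumes elements are not 'Br' tuples)."""
    def flat(n):
        if isinstance(n, tuple) and n and n[0] == 'Br':
            return [y for c in n[1] for y in flat(c)]
        return [n]
    if t is None:
        return []
    if t[0] == 'Lf':
        return flat(t[1])
    _, f, m, r = t
    return [y for n in f for y in flat(n)] + to_list(m) + \
           [y for n in r for y in flat(n)]
```

Consing the elements 9, 8, ..., 0 reproduces the shapes of the worked examples: overflowing front fingers are repeatedly packed into 2-3 nodes and pushed into the middle part.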
Suppose that function leaf(x) creates a leaf of element x, and function tree(F, T, R) creates a finger tree from three parts: F is the front finger, which is a list containing several elements; similarly, R is the rear finger, which is also a list; T is the middle part, which is a deeper finger tree. Function tr3(a, b, c) creates a 2-3 tree from 3 elements a, b, c, while tr2(a, b) creates a 2-3 tree from 2 elements a and b.
insertT(x, T) =
    leaf(x) : T = ∅
    tree({x}, ∅, {y}) : T = leaf(y)
    tree({x, x1}, insertT(tr3(x2, x3, x4), T′), R) : T = tree({x1, x2, x3, x4}, T′, R)
    tree({x} ∪ F, T′, R) : otherwise, T = tree(F, T′, R)
(12.25)
The performance of this algorithm is dominated by the recursive case. All the other cases take constant O(1) time. The recursion depth is proportional to the height of the tree, so the algorithm is bounded by O(h) time, where h is the height. As we use a 2-3 tree to ensure that the tree is well balanced, h = O(lg n), where n is the number of elements stored in the finger tree.
More analysis reveals that the amortized performance of insertT is O(1), because we can amortize the expensive recursive case over the other trivial cases. Please refer to [6] and [5] for the detailed proof.
Translating the algorithm yields the below Haskell program.

cons :: a -> Tree a -> Tree a
cons a Empty = Lf a
cons a (Lf b) = Tr [a] Empty [b]
cons a (Tr [b, c, d, e] m r) = Tr [a, b] (cons (Br3 c d e) m) r
cons a (Tr f m r) = Tr (a:f) m r
Here we use the LISP naming convention cons to illustrate inserting a new element at the head of a list.
The insertion algorithm can also be implemented in the imperative approach. Suppose function Tree() creates an empty tree, in which all fields, including the front and rear fingers, the middle part inner tree, and the parent, are empty. Function Node() creates an empty node.
function Prepend-Node(n, T)
    r ← Tree()
    p ← r
    Connect-Mid(p, T)
    while Full?(Front(T)) do
        F ← Front(T)    ▷ F = {n1, n2, n3, ...}
        Front(T) ← {n, F[1]}    ▷ F[1] = n1
        n ← Node()
        Children(n) ← F[2..]    ▷ F[2..] = {n2, n3, ...}
        p ← T
        T ← Mid(T)
    if T = NIL then
        T ← Tree()
        Front(T) ← {n}
    else if |Front(T)| = 1 ∧ Rear(T) = ∅ then
        Rear(T) ← Front(T)
        Front(T) ← {n}
    else
        Front(T) ← {n} ∪ Front(T)
    Connect-Mid(p, T)
    return Flat(r)
Where the notation L[i..] means the sub-list of L with the first i − 1 elements removed; that is, if L = {a1, a2, ..., an}, then L[i..] = {ai, ai+1, ..., an}.
Functions Front, Rear, Mid, and Parent are used to access the front finger, the rear finger, the middle part inner tree, and the parent tree respectively; function Children accesses the children of a node.
Function Connect-Mid(T1, T2) connects T2 as the middle part inner tree of T1, and sets the parent of T2 to T1 if T2 isn't empty.
In this algorithm, we perform a one-pass top-down traversal along the middle part inner tree while the front finger is so full that it can't afford to store any more. The criterion of being full for a 2-3 tree is that the finger already contains 3 elements. In such a case, we extract all the elements except the first one off the finger, wrap them into a new node (one level deeper), and continue inserting this new node into the middle inner tree. The first element is left in the front finger, and the element to be inserted is put in front of it, so that it becomes the new first one in the front finger.
After this traversal, the algorithm either reaches an empty tree, or the tree still has room to hold more elements in its front finger. We create a new leaf in the former case, and perform a trivial list insertion to the front finger in the latter.
During the traversal, we use p to record the parent of the current tree we are processing, so any newly created tree is connected as the middle part inner tree of p.
Finally, we return the root of the tree r. The last trick of this algorithm is the Flat function. In order to simplify the logic, we create an empty ground tree and set it as the parent of the root. We need to eliminate this extra ground level before returning the root. This flattening algorithm is realized as the following.
function Flat(T)
    while T ≠ NIL ∧ T is empty do
        T ← Mid(T)
    if T ≠ NIL then
        Parent(T) ← NIL
    return T

The while loop tests whether T is trivially empty: it is not NIL, but both its front and rear fingers are empty.
The Python code below implements the insertion algorithm for the finger tree.

def insert(x, t):
    return prepend_node(wrap(x), t)

def prepend_node(n, t):
    root = prev = Tree()
    prev.set_mid(t)
    while frontFull(t):
        f = t.front
        t.front = [n] + f[:1]
        n = wraps(f[1:])
        prev = t
        t = t.mid
    if t is None:
        t = leaf(n)
    elif len(t.front) == 1 and t.rear == []:
        t = Tree([n], None, t.front)
    else:
        t = Tree([n] + t.front, t.mid, t.rear)
    prev.set_mid(t)
    return flat(root)
def flat(t):
    while t is not None and t.empty():
        t = t.mid
    if t is not None:
        t.parent = None
    return t
12.6.3 Remove element from the head
It's easy to implement the reverse operation, removing the first element from the list, by reversing the insertT() algorithm line by line.
Let's denote F = {f1, f2, ...} as the front finger list, M as the middle part inner finger tree, and R = {r1, r2, ...} as the rear finger list of a finger tree; R′ = {r2, r3, ...} is the rest of R with the first element removed.
extractT(T) =
    (x, ∅) : T = leaf(x)
    (x, leaf(y)) : T = tree({x}, ∅, {y})
    (x, tree({r1}, ∅, R′)) : T = tree({x}, ∅, R)
    (x, tree(toList(F), M′, R)) : T = tree({x}, M, R), (F, M′) = extractT(M)
    (f1, tree({f2, f3, ...}, M, R)) : otherwise
(12.26)
Where function toList(T) converts a 2-3 tree to a plain list as the following.

toList(T) =
    {x, y} : T = tr2(x, y)
    {x, y, z} : T = tr3(x, y, z)
(12.27)
Here we skip the error handling, such as trying to remove an element from an empty tree. If the finger tree is a leaf, the result after removal is an empty tree. If the finger tree contains two elements, one in the front finger and the other in the rear, we return the element stored in the front finger as the first element, and the resulting tree after removal is a leaf. If there is only one element in the front finger, the middle part inner tree is empty, and the rear finger isn't empty, we return the only element in the front finger and borrow one element from the rear finger to the front. If there is only one element in the front finger but the middle part inner tree isn't empty, we can recursively remove a node from the inner tree, flatten that node to a plain list to replace the front finger, and remove the original only element of the front finger. The last case says that if the front finger contains more than one element, we can just remove the first element from the front finger and keep all the other parts unchanged.
Figure 12.10 shows the steps of removing two elements from the head of a sequence. There are 10 elements stored in the finger tree. When the first element is removed, there is still one element left in the front finger. However, when the next element is removed, the front finger becomes empty. So we borrow one tree node from the middle part inner tree. It is a 2-3 tree; it is converted to a list of 3 elements, and this list is used as the new front finger. The middle part inner tree changes from a three-part tree to a singleton leaf, which contains only one 2-3 tree node. There are three elements stored in this tree node.
Below is the corresponding Haskell program for uncons.

uncons :: Tree a -> (a, Tree a)
uncons (Lf a) = (a, Empty)
uncons (Tr [a] Empty [b]) = (a, Lf b)
uncons (Tr [a] Empty (r:rs)) = (a, Tr [r] Empty rs)
uncons (Tr [a] m r) = (a, Tr (nodeToList f) m' r) where (f, m') = uncons m
uncons (Tr f m r) = (head f, Tr (tail f) m r)
Similarly, we can define the head and tail functions via uncons.

head = fst ∘ uncons
tail = snd ∘ uncons
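A Python sketch of the removal, mirroring equation (12.26) under an illustrative tuple encoding (('Lf', x) for a leaf, ('Tr', front, mid, rear) for a tree, ('Br', [...]) for a 2-3 node; all of these are assumptions of this sketch, not the chapter's definitions):

```python
def uncons(t):
    """Remove the head element; mirrors the five cases of equation (12.26)."""
    if t[0] == 'Lf':                           # leaf -> empty tree
        return t[1], None
    _, f, m, r = t
    if len(f) > 1:                             # trivial case: pop the front
        return f[0], ('Tr', f[1:], m, r)
    if m is not None:                          # borrow a 2-3 node from mid
        n, m2 = uncons(m)
        return f[0], ('Tr', list(n[1]), m2, r)
    if len(r) == 1:                            # two elements left -> leaf
        return f[0], ('Lf', r[0])
    return f[0], ('Tr', [r[0]], None, r[1:])   # borrow from the rear finger
```

Repeatedly applying uncons to a tree yields its elements in left-to-right order, borrowing from the middle part exactly when the front finger runs out.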
12.6.4 Handling ill-formed trees
The strategy used so far to remove an element from a finger tree is a kind of removing and borrowing: if the front finger becomes empty after removal, we borrow more nodes from the middle part inner tree. However, there exist cases where the tree is ill-formed; for example, both the front finger of the tree and that of its middle part inner tree are empty. Such ill-formed trees can result from imperative splitting, which we'll introduce later.
Here we develop an imperative algorithm which can remove the first element from a finger tree even if it is ill-formed. The idea is to first perform a top-down traversal to find a sub-tree which either has a non-empty front finger, or has both its front finger and middle part inner tree empty. In the former case, we can safely extract the first node from the front finger; in the latter case, since only the rear finger isn't empty, we can swap it with the empty front finger and reduce it to the former case.
After that, we need to examine whether the node we extracted from the front finger is a leaf node (how to do that is left as an exercise to the reader). If not, we go on extracting the first sub-node from the children of this node, and leave the rest of the children as the new front finger of the parent of the current tree. We repeatedly go up along the parent field until the node we extracted is a leaf. At that point we arrive back at the root of the tree. Figure 12.12 illustrates this process.
[Figure 12.10: a finger tree of 10 elements x[1], x[2], ..., x[10], and the trees after removing the first and the second element.]
Figure 12.11: Example of an ill-formed tree. The front finger of the i-th level sub-tree isn't empty.
Figure 12.12: (a) Extract the first node n[i][1] and put its children into the front finger of the upper level tree. (b) Repeat this process i times, and finally x[1] is extracted.
Based on this idea, the following algorithm realizes the removal operation at the head. The algorithm assumes that the tree passed in isn't empty.
function Extract-Head(T)
    r ← Tree()
    Connect-Mid(r, T)
    while Front(T) = ∅ ∧ Mid(T) ≠ NIL do
        T ← Mid(T)
    if Front(T) = ∅ ∧ Rear(T) ≠ ∅ then
        Exchange Front(T) ↔ Rear(T)
    n ← Node()
    Children(n) ← Front(T)
    repeat
        L ← Children(n)    ▷ L = {n1, n2, n3, ...}
        n ← L[1]    ▷ n ← n1
        Front(T) ← L[2..]    ▷ L[2..] = {n2, n3, ...}
        T ← Parent(T)
        if Mid(T) becomes empty then
            Mid(T) ← NIL
    until n is a leaf
    return (Elem(n), Flat(r))
Note that function Elem(n) returns the only element stored inside leaf node n. As in the imperative insertion algorithm, a stub ground tree is used as the parent of the root, which simplifies the logic a bit; that's why we need to flatten the tree at the end.
The Python program below translates the algorithm.
def extract_head(t):
    root = Tree()
    root.set_mid(t)
    while t.front == [] and t.mid is not None:
        t = t.mid
    if t.front == [] and t.rear != []:
        (t.front, t.rear) = (t.rear, t.front)
    n = wraps(t.front)
    while True:  # a repeat-until loop
        ns = n.children
        n = ns[0]
        t.front = ns[1:]
        t = t.parent
        if t.mid.empty():
            t.mid.parent = None
            t.mid = None
        if n.leaf:
            break
    return (elem(n), flat(root))
The member function Tree.empty() returns true if both the front finger and the rear finger are empty. We use a flag Node.leaf to mark whether a node is a leaf or a compound node. The exercises of this section ask the reader to consider some alternatives.
As ill-formed trees are allowed, the algorithms to access the first and last elements of the finger tree must be modified, so that they don't blindly return the first or last child of a finger, since the finger can be empty if the tree is ill-formed.
The idea is quite similar to Extract-Head: in case a finger is empty while the middle part inner tree isn't, we traverse along the inner tree until either the finger becomes non-empty or all the nodes are stored in the other finger. For instance, the following algorithm returns the first leaf node even if the tree is ill-formed.
function First-Lf(T)
    while Front(T) = ∅ ∧ Mid(T) ≠ NIL do
        T ← Mid(T)
    if Front(T) = ∅ ∧ Rear(T) ≠ ∅ then
        n ← Rear(T)[1]
    else
        n ← Front(T)[1]
    while n is not a leaf do
        n ← Children(n)[1]
    return n
Note the second loop in this algorithm: it keeps traversing into the first sub-node while the current node isn't a leaf. So we always reach a leaf node, and it's trivial to get the element inside it.
function First(T)
    return Elem(First-Lf(T))
The following Python code translates the algorithm into a real program.

def first(t):
    return elem(first_leaf(t))

def first_leaf(t):
    while t.front == [] and t.mid is not None:
        t = t.mid
    if t.front == [] and t.rear != []:
        n = t.rear[0]
    else:
        n = t.front[0]
    while not n.leaf:
        n = n.children[0]
    return n
12.6.5 Append element to the tail
Because the finger tree is symmetric, we can realize appending an element to the tail by referring to the insertT algorithm.
appendT(T, x) =
    leaf(x) : T = ∅
    tree({y}, ∅, {x}) : T = leaf(y)
    tree(F, appendT(M, tr3(x1, x2, x3)), {x4, x}) : T = tree(F, M, {x1, x2, x3, x4})
    tree(F, M, R ∪ {x}) : otherwise
(12.28)
Generally speaking, if the rear finger is still a valid 2-3 tree, that is, the number of elements is not greater than 4, the new element is directly appended to the rear finger. Otherwise, we break the rear finger: the first 3 elements of the rear finger are wrapped into a new 2-3 tree and recursively appended to the middle part inner tree. If the finger tree is empty or a singleton leaf, it is handled by the first two cases.
Translating the equation to Haskell yields the below program.

snoc :: Tree a -> a -> Tree a
snoc Empty a = Lf a
snoc (Lf a) b = Tr [a] Empty [b]
snoc (Tr f m [a, b, c, d]) e = Tr f (snoc m (Br3 a b c)) [d, e]
snoc (Tr f m r) a = Tr f m (r ++ [a])
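A Python counterpart of snoc under an illustrative tuple encoding (('Lf', x), ('Tr', front, mid, rear), ('Br', [...]); the encoding is an assumption of this sketch) makes the symmetry with cons visible:

```python
def snoc(t, x):
    """Append x at the tail; mirrors the four cases of equation (12.28)."""
    if t is None:                          # empty tree -> singleton leaf
        return ('Lf', x)
    if t[0] == 'Lf':                       # leaf y -> front {y}, rear {x}
        return ('Tr', [t[1]], None, [x])
    _, f, m, r = t
    if len(r) >= 4:                        # rear finger exceeds the 2-3 limit:
        # wrap the first 3 nodes and push them one level deeper
        return ('Tr', f, snoc(m, ('Br', r[:3])), [r[3], x])
    return ('Tr', f, m, r + [x])           # trivial case: append to the rear
```

Appending 1 through 6 to an empty tree overflows the rear finger once, producing a middle part that is a leaf holding one 3-branch 2-3 node.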
The function name snoc is the mirror of cons, which indicates the symmetric relationship.
Appending a new element to the end imperatively is quite similar. The following algorithm realizes appending.
function Append-Node(T, n)
    r ← Tree()
    p ← r
    Connect-Mid(p, T)
    while Full?(Rear(T)) do
        R ← Rear(T)    ▷ R = {n1, n2, ..., nm−1, nm}
        Rear(T) ← {Last(R), n}    ▷ keep the last element nm
        n ← Node()
        Children(n) ← R[1...m−1]    ▷ {n1, n2, ..., nm−1}
        p ← T
        T ← Mid(T)
    if T = NIL then
        T ← Tree()
        Front(T) ← {n}
    else if |Rear(T)| = 1 ∧ Front(T) = ∅ then
        Front(T) ← Rear(T)
        Rear(T) ← {n}
    else
        Rear(T) ← Rear(T) ∪ {n}
    Connect-Mid(p, T)
    return Flat(r)
And the corresponding Python program is given below.
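A self-contained sketch of such a program, mirroring prepend_node. The simplified Tree class and the leaf and flat helpers below are stand-ins for the chapter's real ones, and 2-3 nodes are modelled as plain Python lists:

```python
class Tree:
    def __init__(self, front=None, mid=None, rear=None):
        self.front = front or []
        self.rear = rear or []
        self.mid = None
        self.parent = None
        self.set_mid(mid)

    def empty(self):
        return self.front == [] and self.rear == []

    def set_mid(self, t):
        self.mid = t
        if t is not None:
            t.parent = self

def leaf(n):
    return Tree([n])            # the only node lives in the front finger

def flat(t):                    # drop the sentinel ground tree(s)
    while t is not None and t.empty():
        t = t.mid
    if t is not None:
        t.parent = None
    return t

def append_node(t, n):
    root = prev = Tree()
    prev.set_mid(t)
    while t is not None and len(t.rear) >= 3:   # rear finger is full
        r = t.rear
        t.rear = [r[-1], n]     # keep the last element, then the new node
        n = r[:-1]              # wrap the rest into a deeper node (a list)
        prev = t
        t = t.mid
    if t is None:
        t = leaf(n)
    elif len(t.rear) == 1 and t.front == []:
        t = Tree(t.rear, None, [n])
    else:
        t = Tree(t.front, t.mid, t.rear + [n])
    prev.set_mid(t)
    return flat(root)
```

Note that the rear-full criterion of 3 nodes mirrors the front-finger criterion in Prepend-Node.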
12.6.6 Remove element from the tail
Similar to appendT, we can realize the algorithm that removes the last element from the finger tree in a manner symmetric to extractT.
We denote the non-empty, non-leaf finger tree as tree(F, M, R), where F is the front finger, M is the middle part inner tree, and R is the rear finger.
removeT(T) =
    (∅, x) : T = leaf(x)
    (leaf(y), x) : T = tree({y}, ∅, {x})
    (tree(init(F), ∅, {last(F)}), x) : T = tree(F, ∅, {x}) ∧ F ≠ ∅
    (tree(F, M′, toList(R′)), x) : T = tree(F, M, {x}), (M′, R′) = removeT(M)
    (tree(F, M, init(R)), last(R)) : otherwise
(12.29)

Function toList(T) is used to flatten a 2-3 tree to a plain list, as defined previously. Function init(L) returns all elements except the last one in list L; that is, if L = {a1, a2, ..., an−1, an}, then init(L) = {a1, a2, ..., an−1}. Function last(L) returns the last element, so that last(L) = an. Please refer to the appendix of this book for their implementations.

Algorithm removeT() can be translated to the following Haskell program. We name it unsnoc to indicate that it is the reverse function of snoc.

unsnoc :: Tree a -> (Tree a, a)
unsnoc (Lf a) = (Empty, a)
unsnoc (Tr [a] Empty [b]) = (Lf a, b)
unsnoc (Tr f@(_:_) Empty [a]) = (Tr (init f) Empty [last f], a)
unsnoc (Tr f m [a]) = (Tr f m' (nodeToList r), a) where (m', r) = unsnoc m
unsnoc (Tr f m r) = (Tr f m (init r), last r)

And we can define special last and init functions for the finger tree, similar to their counterparts for lists.

last = snd ∘ unsnoc
init = fst ∘ unsnoc
Imperatively removing the element from the end is almost the same as removing from the head. There seems to be one special case: since we always store the only element (or sub-node) in the front finger while the rear finger and middle part inner tree are empty (e.g. Tree({n}, NIL, ∅)), we might get nothing if we always try to fetch the last element from the rear finger.
This can be solved by swapping the front and the rear fingers if the rear is empty, as in the following algorithm.
function Extract-Tail(T)
    r ← Tree()
    Connect-Mid(r, T)
    while Rear(T) = ∅ ∧ Mid(T) ≠ NIL do
        T ← Mid(T)
    if Rear(T) = ∅ ∧ Front(T) ≠ ∅ then
        Exchange Front(T) ↔ Rear(T)
    n ← Node()
    Children(n) ← Rear(T)
    repeat
        L ← Children(n)    ▷ L = {n1, n2, ..., nm−1, nm}
        n ← Last(L)    ▷ n ← nm
        Rear(T) ← L[1...m−1]    ▷ {n1, n2, ..., nm−1}
        T ← Parent(T)
        if Mid(T) becomes empty then
            Mid(T) ← NIL
    until n is a leaf
    return (Elem(n), Flat(r))
Accessing the last element, as well as implementing this algorithm as a working program, are left as exercises.
12.6.7 Concatenate
Consider the non-trivial case of concatenating two finger trees T1 = tree(F1, M1, R1) and T2 = tree(F2, M2, R2). One natural idea is to use F1 as the new front finger of the concatenated result, and keep R2 as the new rear finger. The rest of the work is to merge M1, R1, F2, and M2 into a new middle part inner tree.
Note that both R1 and F2 are plain lists of nodes, so the sub-problem is to realize an algorithm like this:

merge(M1, R1 ∪ F2, M2) = ?
More observation reveals that both M1 and M2 are also finger trees, except that they are one level deeper than T1 and T2 in terms of Node(a), where a is the type of element stored in the tree. We can recursively apply the same strategy: keep the front finger of M1 and the rear finger of M2, then merge the middle part inner trees of M1 and M2, together with the rear finger of M1 and the front finger of M2.
If we denote function f ront(T ) returns the front finger, rear(T ) returns
the rear finger, mid(T ) returns the middle part inner tree. the above merge
357
(12.30)
(12.31)
And compare it with equation 12.30, its easy to note the fact that concatenating is essentially merging. So we have the final algorithm like this.
concat(T1 , T2 ) = merge(T1 , , T2 )
(12.32)
f oldR(insertT, T2 , S) : T1 =
f oldL(appendT, T1 , S) : T2 =
merge(, {x} S, T2 ) : T1 = leaf (x)
merge(T1 , S, T2 ) =
merge(T
e : L=
f oldL(f, f (e, a1 ), L ) : otherwise
(12.34)
e : L=
f (a1 , f oldR(f, e, L )) : otherwise
(12.35)
In the recursive case, the carried list of nodes is grouped into 2-3 trees by the function nodes: similar to the insertT algorithm, it takes out elements a few at a time, wraps them in a 2-3 tree, and recursively processes the rest. Here is the definition of nodes.
nodes(L) =
    {tr2(x1, x2)} : L = {x1, x2}
    {tr3(x1, x2, x3)} : L = {x1, x2, x3}
    {tr2(x1, x2), tr2(x3, x4)} : L = {x1, x2, x3, x4}
    {tr3(x1, x2, x3)} ∪ nodes({x4, x5, ...}) : otherwise
(12.36)
Function nodes follows the constraint of the 2-3 tree: if there are only 2 or 3 elements in the list, it just wraps them into a singleton list containing one 2-3 tree; if there are 4 elements in the list, it splits them into two trees, each consisting of 2 branches; otherwise, if there are more than 4 elements, it wraps the first three into one tree with 3 branches, and recursively calls nodes to process the rest.
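A direct Python transcription of this grouping, with a 2-3 node modelled as a ('Br', elements) tuple (an illustrative assumption). Lists shorter than 2 never arise during merging and are not handled, just as in the equation:

```python
def nodes(xs):
    """Group a list into 2-3 tree nodes, following equation (12.36)."""
    if len(xs) in (2, 3):                        # one node is enough
        return [('Br', xs)]
    if len(xs) == 4:                             # avoid leaving a singleton
        return [('Br', xs[:2]), ('Br', xs[2:])]
    return [('Br', xs[:3])] + nodes(xs[3:])      # peel off 3, recurse
```

The special 4-element case is what guarantees that the recursion never reaches a remainder of a single element.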
The performance of concatenation is determined by merging. Analyzing the recursive case of merging reveals that the depth of recursion is proportional to the smaller height of the two trees. As the trees are ensured to be balanced by using 2-3 trees, the height is bounded by O(lg n′), where n′ is the number of elements in the smaller tree. The edge cases of merging perform the same as insertion (insertT is called at most 8 times), which is amortized O(1) time, and O(lg m) in the worst case, where m is the difference in height of the two trees. So the overall performance is bounded by O(lg n), where n is the total number of elements contained in the two finger trees.
The following Haskell program implements the nodes function used by the concatenation algorithm.

nodes :: [a] -> [Node a]
nodes [a, b] = [Br2 a b]
nodes [a, b, c] = [Br3 a b c]
nodes [a, b, c, d] = [Br2 a b, Br2 c d]
nodes (a:b:c:xs) = Br3 a b c : nodes xs
For the imperative implementation, once either tree becomes empty, we stop traversing, repeatedly insert the 2-3 tree nodes in N into the other, non-empty tree, and set the result as the new middle part inner tree of the upper level.
The algorithm below describes this process in detail.
function Concat(T1, T2)
    return Merge(T1, ∅, T2)

function Merge(T1, N, T2)
    r ← Tree()
    p ← r
    while T1 ≠ NIL ∧ T2 ≠ NIL do
        T ← Tree()
        Front(T) ← Front(T1)
        Rear(T) ← Rear(T2)
        Connect-Mid(p, T)
        p ← T
        N ← Nodes(Rear(T1) ∪ N ∪ Front(T2))
        T1 ← Mid(T1)
        T2 ← Mid(T2)
    if T1 = NIL then
        T ← T2
        for each n ∈ Reverse(N) do
            T ← Prepend-Node(n, T)
    else if T2 = NIL then
        T ← T1
        for each n ∈ N do
            T ← Append-Node(T, n)
    Connect-Mid(p, T)
    return Flat(r)
Note that the for-each loops in the algorithm can also be replaced by folding from the left and the right respectively. Translating this algorithm to Python yields the code below.
from functools import reduce  # used as foldL

def concat(t1, t2):
    return merge(t1, [], t2)

def merge(t1, ns, t2):
    root = prev = Tree()  # sentinel dummy tree
    while t1 is not None and t2 is not None:
        t = Tree(t1.size + t2.size + sizeNs(ns), t1.front, None, t2.rear)
        prev.set_mid(t)
        prev = t
        ns = nodes(t1.rear + ns + t2.front)
        t1 = t1.mid
        t2 = t2.mid
    if t1 is None:
        prev.set_mid(foldR(prepend_node, ns, t2))
    elif t2 is None:
        prev.set_mid(reduce(append_node, ns, t1))
    return flat(root)
Exercise 12.5
12.6.8 Size augmentation
The strategy to provide fast random access is to turn the look-up into a tree search. In order to avoid calculating the size of the tree repeatedly, we augment the tree and the node with an extra size field. The definitions should be modified accordingly; for example, the following Haskell definition adds a size field to the Tr constructor.
data Tree a = Empty
| Lf a
| Tr Int [a] (Tree (Node a)) [a]
Suppose the function tree(s, F, M, R) creates a finger tree from size s, front finger F, rear finger R, and middle part inner tree M. When the size of the tree is needed, we can call a size(T) function. It would be something like this:
size(T) =
    0 : T = ∅
    ? : T = leaf(x)
    s : T = tree(s, F, M, R)
If the tree is empty, the size is definitely zero; and if it can be expressed as tree(s, F, M, R), the size is s. However, what if the tree is a singleton leaf? Is it 1? No: it can be 1 only if T = leaf(a) and a isn't a tree node but a raw element stored in the finger tree. In most cases the size is not 1, because a can again be a tree node. That's why we put a '?' in the above equation.
The correct way is to call some size function, say size′, on the tree node, as the following.

size(T) =
    0 : T = ∅
    size′(x) : T = leaf(x)
    s : T = tree(s, F, M, R)
(12.37)
Note that this isn't a recursive definition, since size ≠ size′: the argument to size′ is either a tree node, which is a 2-3 tree, or a plain element stored in the finger tree. To unify these two cases, we can always wrap a single plain element into a tree node of only one element, so that every situation is expressed as a tree node augmented with a size field. The following Haskell program modifies the definition of the tree node.
data Node a = Br Int [a]

With the size augmented, the size of a node n = tr(s, L) is simply the stored value:

size′(n) = s : n = tr(s, L)
(12.38)
As both the front finger and the rear finger are lists of tree nodes, in order to calculate the total size of a finger, we can provide a size′(L) function which sums up the sizes of all nodes stored in the list. Denote L = {a1, a2, ...} and L′ = {a2, a3, ...}.

size′(L) =
    0 : L = ∅
    size′(a1) + size′(L′) : otherwise
(12.39)
It's quite OK to define size′(L) using some higher-order functions, for example:

size′(L) = sum(map(size′, L))
(12.40)
And we can turn a list of tree nodes into one deeper 2-3 tree and vice versa.

wraps(L) = tr(size′(L), L)
(12.41)

unwraps(n) = L : n = tr(s, L)
(12.42)
A helper function tree′(F, M, R) constructs a sized finger tree from the three parts, handling the border cases where some of them are empty:

tree′(F, M, R) =
    fromL(F) : M = ∅ ∧ R = ∅
    fromL(R) : M = ∅ ∧ F = ∅
    tree′(unwraps(F′), M′, R) : F = ∅, (F′, M′) = extractT′(M)
    tree′(F, M′, unwraps(R′)) : R = ∅, (M′, R′) = removeT′(M)
    tree(size′(F) + size(M) + size′(R), F, M, R) : otherwise
(12.43)

Where fromL() helps to turn a list of nodes into a finger tree by repeatedly inserting all the elements one by one into an empty tree:

fromL(L) = foldR(insertT′, ∅, L)
Of course it can be implemented in a purely recursive manner without folding as well.
The last case is the most straightforward one: if none of F, M, and R is empty, it adds up the sizes of the three parts and constructs the tree with this size information by calling the tree(s, F, M, R) function. If both the middle part inner tree and one of the fingers are empty, the algorithm repeatedly inserts all elements stored in the other finger into an empty tree, so that the result is constructed from a list of tree nodes. If the middle part inner tree isn't empty and one of the fingers is empty, the algorithm borrows one tree node from the middle part, either extracting from the head if the front finger is empty, or removing from the tail if the rear finger is empty. Then the algorithm unwraps the borrowed tree node to a list and recursively calls tree′() to construct the result.
This algorithm can be translated to the following Haskell code for example.
tree
tree
tree
tree
tree
Function tree′() helps to minimize the modification. insertT′() can be realized with it like the following.

insertT′(x, T) =
    leaf(x) : T = ∅
    tree′({x}, ∅, {y}) : T = leaf(y)
    tree′({x, x1}, insertT′(wraps({x2, x3, x4}), M), R) : T = tree(s, {x1, x2, x3, x4}, M, R)
    tree′({x} ∪ F, M, R) : otherwise
(12.44)
cons a Empty = Lf a
cons a (Lf b) = tree [a] Empty [b]
cons a (Tr _ [b, c, d, e] m r) = tree [a, b] (cons (wraps [c, d, e]) m) r
cons a (Tr _ f m r) = tree (a:f) m r
Similar size-augmentation modifications should also be made to the imperative algorithms; for example, when a new node is prepended at the head of the finger tree, we should update the size while traversing the tree.
function Prepend-Node(n, T)
    r ← Tree()
    p ← r
    Connect-Mid(p, T)
    while Full?(Front(T)) do
        F ← Front(T)
        Front(T) ← {n, F[1]}
        Size(T) ← Size(T) + Size(n)    ▷ update size
        n ← Node()
        Children(n) ← F[2..]
        p ← T
        T ← Mid(T)
    if T = NIL then
        T ← Tree()
        Front(T) ← {n}
    else if |Front(T)| = 1 ∧ Rear(T) = ∅ then
        Rear(T) ← Front(T)
        Front(T) ← {n}
    else
        Front(T) ← {n} ∪ Front(T)
    Size(T) ← Size(T) + Size(n)    ▷ update size
    Connect-Mid(p, T)
    return Flat(r)
The corresponding Python code is modified accordingly as below.

def prepend_node(n, t):
    root = prev = Tree()
    prev.set_mid(t)
    while frontFull(t):
        f = t.front
        t.front = [n] + f[:1]
        t.size = t.size + n.size
        n = wraps(f[1:])
        prev = t
        t = t.mid
    if t is None:
        t = leaf(n)
    elif len(t.front) == 1 and t.rear == []:
        t = Tree(n.size + t.size, [n], None, t.front)
    else:
        t = Tree(n.size + t.size, [n] + t.front, t.mid, t.rear)
    prev.set_mid(t)
    return flat(root)
Note that the tree constructor is also modified to take a size argument as the first parameter. And the leaf helper function not only constructs the tree from a node, but also sets the size of the tree to the size of the node inside it.
For simplicity, we skip the detailed description of the modifications in the extractT′, appendT′, removeT′, and concat′ algorithms. They are left as exercises to the reader.
Split a finger tree at a given position
With the size information augmented, it's easy to locate a node at a given position by performing a tree search. What's more, as the finger tree is constructed from the three parts F, M, and R, and is recursive in nature, it's also possible to split it into three sub-parts at a given position i: the left part, the node at i, and the right part.
The idea is straightforward, since we have the size information for F, M, and R. Denote these three sizes as Sf, Sm, and Sr. If the given position i ≤ Sf, the node must be stored in F, and we go on seeking the node inside F; if Sf < i ≤ Sf + Sm, the node must be stored in M, and we recursively perform the search in M; otherwise, the node is in R, and we search inside R.
If we skip the error handling of trying to split an empty tree, there is only one edge case, as below.

splitAt(i, T) =
    (∅, x, ∅) : T = leaf(x)
    ... : otherwise

Splitting a leaf results in both the left and right parts being empty; the node stored in the leaf is the resulting node.
The recursive case handles the three sub-cases by comparing i with the sizes. Suppose function splitAtL(i, L) splits a list of nodes at given position i into three parts, (A, x, B) = splitAtL(i, L), where x is the i-th node in L, A is the sub-list of all nodes before position i, and B is the sub-list of all the rest nodes after i.
splitAt(i, T) =
    (∅, x, ∅) : T = leaf(x)
    (fromL(A), x, tree′(B, M, R)) : i ≤ Sf, (A, x, B) = splitAtL(i, F)
    (tree′(F, Ml, A), x, tree′(B, Mr, R)) : Sf < i ≤ Sf + Sm
    (tree′(F, M, A), x, fromL(B)) : otherwise, (A, x, B) = splitAtL(i − Sf − Sm, R)
(12.45)

Where Ml, Mr, A, x, B in the third case are calculated as the following.

(Ml, t, Mr) = splitAt(i − Sf, M)
(A, x, B) = splitAtL(i − Sf − size(Ml), unwraps(t))
And the function splitAtL is just a linear traversal. Since the length of the list is limited by the constraint of the 2-3 tree, its performance is still constant O(1) time. Denote L = {x1, x2, ...} and L′ = {x2, x3, ...}.

splitAtL(i, L) =
    (∅, x1, ∅) : i = 0 ∧ L = {x1}
    (∅, x1, L′) : i < size′(x1)
    ({x1} ∪ A, x, B) : otherwise
(12.46)
367
Where
(A, x, B) = splitAtL(i size (x1 ), L )
The solution of splitting is a typical divide and conquer strategy. The performance of this algorithm is determined by the recursive case, which searches in the middle part inner tree. The other cases are all constant time, as we've analyzed. The depth of recursion is proportional to the height of the tree h, so the algorithm is bound to O(h). Because the tree is well balanced (by using 2-3 trees, and all the insertion/removal algorithms keep the tree balanced), h = O(lg n), where n is the number of elements stored in the finger tree. The overall performance of splitting is O(lg n).
Let's first give the Haskell program for the splitAtL function.
splitNodesAt 0 [x] = ([], x, [])
splitNodesAt i (x:xs) | i < size x = ([], x, xs)
                      | otherwise = let (xs', y, ys) = splitNodesAt (i - size x) xs
                                    in (x:xs', y, ys)
Then the program for splitAt. As there is already a function defined with this name in the standard library, we slightly change the name by adding an apostrophe.

splitAt' _ (Lf x) = (Empty, x, Empty)
splitAt' i (Tr _ f m r)
    | i < szf = let (xs, y, ys) = splitNodesAt i f
                in ((foldr cons Empty xs), y, tree' ys m r)
    | i < szf + szm = let (m1, t, m2) = splitAt' (i - szf) m
                          (xs, y, ys) = splitNodesAt (i - szf - sizeT m1) (unwraps t)
                      in (tree' f m1 xs, y, tree' ys m2 r)
    | otherwise = let (xs, y, ys) = splitNodesAt (i - szf - szm) r
                  in (tree' f m xs, y, foldr cons Empty ys)
    where
        szf = sizeL f
        szm = sizeT m
Random access
With the help of splitting at an arbitrary position, it's trivial to realize random access in O(lg n) time. Denote function mid(x), which returns the second element of a tuple; left(x) and right(x) return the first and the third element of the tuple respectively.
getAt(S, i) = unwrap(mid(splitAt(i, S)))
(12.47)
It first splits the sequence at position i, then unwraps the node to get the element stored inside it. To mutate the i-th element of a sequence S represented by a finger tree, we first split it at i, then replace the middle with the value we want, and re-construct them into one finger tree by using concatenation.
setAt(S, i, x) = concat(L, insertT(x, R))    (12.48)

where (L, y, R) = splitAt(i, S)
What's more, we can also realize a removeAt(S, i) function, which removes the i-th element from sequence S. The idea is to first split at i, unwrap and return the element of the i-th node, then concatenate the left and right parts into a new finger tree.

removeAt(S, i) = (unwrap(y), concat(L, R))    (12.49)

where (L, y, R) = splitAt(i, S)
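To see the split-then-rebuild pattern in isolation, here is a hypothetical Python sketch where a plain list stands in for the finger tree and splitting is just slicing. The names are invented for illustration, and each operation is O(n) here instead of the O(lg n) that the finger tree achieves; only the structure of the three operations is the point.

```python
def split_at(s, i):
    # (left part, element at i, right part), mirroring splitAt(i, S)
    return (s[:i], s[i], s[i+1:])

def get_at(s, i):
    (_, x, _) = split_at(s, i)
    return x

def set_at(s, i, x):
    # concat(L, insertT(x, R)) in list form
    (l, _, r) = split_at(s, i)
    return l + [x] + r

def remove_at(s, i):
    # returns the removed element and the remaining sequence
    (l, x, r) = split_at(s, i)
    return (x, l + r)

s = [10, 20, 30, 40]
print(get_at(s, 2))       # → 30
print(set_at(s, 1, 99))   # → [10, 99, 30, 40]
print(remove_at(s, 3))    # → (40, [10, 20, 30])
```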
The algorithm returns the previous element, before f is applied, as the final result.
What hasn't been factored out yet is the algorithm Lookup-Nodes(L, i, f). It takes a list of nodes, a position index, and a function to be applied. The algorithm checks every node in the list. If the node is a leaf and the index is zero, we are at the right position to be looked up; the function can be applied on the element stored in this leaf, and the previous value is returned. Otherwise, we compare the size of this node with the index to determine if the position is inside this node, and search inside the children of the node if necessary.
function Lookup-Nodes(L, i, f)
    loop
        for n ∈ L do
            if n is leaf ∧ i = 0 then
                x ← Elem(n)
                Elem(n) ← f(x)
                return x
            if i < Size(n) then
                L ← Children(n)
                break
            i ← i − Size(n)
The following Python code implements these algorithms.

def applyAt(t, i, f):
    while t.size > 1:
        szf = sizeNs(t.front)
        szm = sizeT(t.mid)
        if i < szf:
            return lookupNs(t.front, i, f)
        elif i < szf + szm:
            t = t.mid
            i = i - szf
        else:
            return lookupNs(t.rear, i - szf - szm, f)
    n = first_leaf(t)
    x = elem(n)
    n.children[0] = f(x)
    return x

def lookupNs(ns, i, f):
    while True:
        for n in ns:
            if n.leaf and i == 0:
                x = elem(n)
                n.children[0] = f(x)
                return x
            if i < n.size:
                ns = n.children
                break
            i = i - n.size
With this auxiliary algorithm that can apply a function at a given position, it's trivial to implement Get-At and Set-At by passing special functions for applying.

function Get-At(T, i)
    return Apply-At(T, i, λx · x)

function Set-At(T, i, x)
    return Apply-At(T, i, λy · x)

That is, we pass the identity function to implement getting the element at a position, which doesn't change anything at all; and pass a constant function to implement setting, which sets the element to the new value, ignoring its previous value.
Imperative splitting
Its not enough to just realizing Apply-At algorithm in imperative settings,
this is because removing element at arbitrary position is also a typical case.
Almost all the imperative finger tree algorithms so far are kind of one-pass
top-down manner. Although we sometimes need to book keeping the root. It
means that we can even realize all of them without using the parent field.
Splitting operation, however, can be easily implemented by using parent
field. We can first perform a top-down traverse along with the middle part
inner tree as long as the splitting position doesnt located in front or rear finger.
After that, we need a bottom-up traverse along with the parent field of the two
split trees to fill out the necessary fields.
function Split-At(T, i)
    T1 ← Tree()
    T2 ← Tree()
    while Sf ≤ i < Sf + Sm do    ▷ Top-down pass
        T1′ ← Tree()
        T2′ ← Tree()
        Front(T1′) ← Front(T)
        Rear(T2′) ← Rear(T)
        Connect-Mid(T1, T1′)
        Connect-Mid(T2, T2′)
        T1 ← T1′
        T2 ← T2′
        i ← i − Sf
        T ← Mid(T)
    if i < Sf then
        (X, n, Y) ← Split-Nodes(Front(T), i)
        T1′ ← From-Nodes(X)
        T2′ ← T
        Size(T2′) ← Size(T) − Size-Nodes(X) − Size(n)
        Front(T2′) ← Y
    else if Sf + Sm ≤ i then
        (X, n, Y) ← Split-Nodes(Rear(T), i − Sf − Sm)
        T2′ ← From-Nodes(Y)
        T1′ ← T
        Size(T1′) ← Size(T) − Size-Nodes(Y) − Size(n)
        Rear(T1′) ← X
    Connect-Mid(T1, T1′)
    Connect-Mid(T2, T2′)
    ...    ▷ Bottom-up pass
The algorithm first creates two trees T1 and T2 to hold the split results. Note that they are created as ground trees, which are parents of the roots. The first pass is a top-down pass. Suppose Sf and Sm retrieve the size of the front finger and the size of the middle part inner tree respectively. If the position at which the tree is to be split is located in the middle part inner tree, we reuse the front finger of T for the newly created T1′, and reuse the rear finger of T for T2′. At this point, we can't fill the other fields of T1′ and T2′; they are left empty, and we'll finish filling them later. After that, we connect T1 and T1′ so the latter becomes the middle part inner tree of the former. A similar connection is done for T2 and T2′ as well. Finally, we update the position by reducing it by the size of the front finger, and go on traversing along the middle part inner tree.
When the first pass finishes, we are at a position where the splitting should be performed either in the front finger or in the rear finger. Splitting the nodes in a finger results in a tuple: the first and the third parts are the lists before and after the splitting point, while the second part is a node containing the element at the original position to be split. As both fingers hold at most 3 nodes (they are actually 2-3 trees), the node splitting algorithm can be performed by a linear search.
function Split-Nodes(L, i)
    for j ∈ [1, Length(L)] do
        if i < Size(L[j]) then
            return (L[1...j − 1], L[j], L[j + 1...Length(L)])
        i ← i − Size(L[j])
We next create two new result trees T1′ and T2′ from this tuple, and connect them as the final middle part inner trees of T1 and T2.
Next we need to perform a bottom-up traverse along the result trees to fill out all the empty information we skipped in the first pass.
We loop on the second part of the tuple, the node, till it becomes a leaf. In each iteration, we repeatedly split the children of the node with an updated position i. The first list of nodes returned from splitting is used to fill the rear finger of T1′, and the other list of nodes is used to fill the front finger of T2′. After that, since all three parts of a finger tree (the front and rear fingers, and the middle part inner tree) are filled, we can calculate the size of the tree by summing these three parts up.

function Sum-Sizes(T)
    return Size-Nodes(Front(T)) + Size-Tr(Mid(T)) + Size-Nodes(Rear(T))
Next, the iteration goes on along the parent fields of T1′ and T2′. The last black-box algorithm is From-Nodes(L), which creates a finger tree from a list of nodes. It can be easily realized by repeatedly performing insertion on an empty tree. The implementation is left as an exercise to the reader.
The example Python code for splitting is given below.
def splitAt(t, i):
    (t1, t2) = (Tree(), Tree())
    while szf(t) <= i and i < szf(t) + szm(t):
        fst = Tree(0, t.front, None, [])
        snd = Tree(0, [], None, t.rear)
        t1.set_mid(fst)
        t2.set_mid(snd)
        (t1, t2) = (fst, snd)
        i = i - szf(t)
        t = t.mid
    if i < szf(t):
        (xs, n, ys) = splitNs(t.front, i)
        sz = t.size - sizeNs(xs) - n.size
        (fst, snd) = (fromNodes(xs), Tree(sz, ys, t.mid, t.rear))
    elif szf(t) + szm(t) <= i:
        (xs, n, ys) = splitNs(t.rear, i - szf(t) - szm(t))
        sz = t.size - sizeNs(ys) - n.size
        (fst, snd) = (Tree(sz, t.front, t.mid, xs), fromNodes(ys))
    t1.set_mid(fst)
    t2.set_mid(snd)
    i = i - sizeT(fst)
    while not n.leaf:
        (xs, n, ys) = splitNs(n.children, i)
        i = i - sizeNs(xs)
        (t1.rear, t2.front) = (xs, ys)
        t1.size = sizeNs(t1.front) + sizeT(t1.mid) + sizeNs(t1.rear)
        t2.size = sizeNs(t2.front) + sizeT(t2.mid) + sizeNs(t2.rear)
        (t1, t2) = (t1.parent, t2.parent)
    return (flat(t1), elem(n), flat(t2))
The program to split a list of nodes at a given position is as follows.

def splitNs(ns, i):
    for j in range(len(ns)):
        if i < ns[j].size:
            return (ns[:j], ns[j], ns[j+1:])
        i = i - ns[j].size
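To see splitNs in isolation, here is a small self-contained sketch. The Node class below is a stand-in invented for illustration (only the size field matters here), not the book's actual node definition.

```python
class Node:
    # minimal stand-in: splitNs only inspects the size field
    def __init__(self, size):
        self.size = size

def splitNs(ns, i):
    # linear scan: locate the node covering position i
    for j in range(len(ns)):
        if i < ns[j].size:
            return (ns[:j], ns[j], ns[j+1:])
        i = i - ns[j].size

# positions 0-1 fall in the first node, 2-4 in the second, 5 in the third
ns = [Node(2), Node(3), Node(1)]
before, n, after = splitNs(ns, 4)
print(len(before), n.size, len(after))  # → 1 3 1
```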
Exercise 12.6
1. Another way to realize insertT is to force increasing the size field by one, so that we needn't write the function tree′. Try to realize the algorithm by using this idea.
2. Try to handle the augmented size information, as in the insertT algorithm, for the following algorithms (both functional and imperative): extractT′, appendT′, removeT′, and concat′. The head, tail, init and last functions should be kept unchanged. Don't refer to the downloadable programs along with this book before you take a try.
3. The imperative Apply-At algorithm tests if the size of the current tree is greater than one. Why don't we test if the current tree is a leaf? Tell the difference between these two approaches.
4. Implement From-Nodes(L) in your favorite imperative programming language. You can either use looping or create a folding-from-right sub-algorithm.
12.7
Bibliography
[1] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, (July 1, 1999), ISBN-13: 978-0521663502
[2] Chris Okasaki. Purely Functional Random-Access Lists. Functional Programming Languages and Computer Architecture, June 1995, pages 86-95.
[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937.
[4] Miran Lipovaca. Learn You a Haskell for Great Good! A Beginner's Guide. No Starch Press; 1st edition, April 2011, 400 pp. ISBN: 978-1-59327-283-8
[5] Ralf Hinze and Ross Paterson. Finger Trees: A Simple General-purpose Data Structure. Journal of Functional Programming 16:2 (2006), pages 197-217. http://www.soi.city.ac.uk/~ross/papers/FingerTree.html
[6] Guibas, L. J., McCreight, E. M., Plass, M. F., Roberts, J. R. (1977), A new representation for linear lists. Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, pp. 49-60.
[7] Generic finger-tree structure. http://hackage.haskell.org/packages/archive/fingertree/0.0/doc/html/Data-FingerTree.html
[8] Wikipedia. Move-to-front transform. http://en.wikipedia.org/wiki/Move-to-front_transform
Part V
Chapter 13

Divide and conquer, Quick sort vs. Merge sort

13.1 Introduction
It is proved that the best performance comparison based sorting can achieve is bound to O(n lg n) [1]. In this chapter, two divide and conquer sorting algorithms are introduced. Both of them perform in O(n lg n) time. One is quick sort, the most popular sorting algorithm. Quick sort has been well studied; many programming libraries provide sorting tools based on it.
In this chapter, we'll first introduce the idea of quick sort, which demonstrates the power of the divide and conquer strategy well. Several variants will be explained, and we'll see that quick sort performs poorly in some special cases, where the algorithm is not able to partition the sequence evenly.
In order to solve the unbalanced partition problem, we'll next introduce merge sort, which ensures the sequence is well partitioned in all cases. Some variants of merge sort, including natural merge sort and bottom-up merge sort, are shown as well.
Same as other chapters, all the algorithms will be realized in both imperative and functional approaches.
13.2
Quick sort
380CHAPTER 13. DIVIDE AND CONQUER, QUICK SORT VS. MERGE SORT
13.2.1
Basic version
Summarizing the above description leads to the recursive definition of quick sort. In order to sort a sequence of elements L:

- If L is empty, the result is obviously empty; this is the trivial edge case;
- Otherwise, select an arbitrary element in L as a pivot, recursively sort all elements not greater than the pivot and put the result on the left hand of the pivot, and recursively sort all elements which are greater than the pivot and put the result on the right hand of the pivot.
Note the emphasized word and; we don't use then here, which indicates it's quite OK that the recursive sorts on the left and right be done in parallel. We'll return to this parallelism topic soon.
Quick sort was first developed by C. A. R. Hoare in 1960 [1], [15]. What we describe here is a basic version. Note that it doesn't state how to select the pivot. We'll see soon that the pivot selection affects the performance of quick sort dramatically.
The simplest method to select the pivot is to always choose the first one, so that quick sort can be formalized as the following.
sort(L) =
    ∅ : L = ∅
    sort({x | x ∈ L′, x ≤ l1}) ∪ {l1} ∪ sort({x | x ∈ L′, l1 < x}) : otherwise
(13.1)
Where l1 is the first element of the non-empty list L, and L′ contains the rest elements {l2, l3, ...}. Note that we use a Zermelo-Fraenkel expression (ZF expression for short)¹, which is also known as list comprehension. A ZF expression {a | a ∈ S, p1(a), p2(a), ...} means taking all elements in set S that satisfy all the predicates p1, p2, .... ZF expressions were originally used for representing sets; we extend them to express lists for the sake of brevity. There can be duplicated elements, and different permutations represent different lists. Please refer to the appendix about lists in this book for details.
It's quite straightforward to translate this equation to real code if list comprehension is supported. The following Haskell code is given for example:
sort [] = []
sort (x:xs) = sort [y | y <- xs, y <= x] ++ [x] ++ sort [y | y <- xs, x < y]
This might be the shortest quick sort program in the world at the time when
this book is written. Even a verbose version is still very expressive:
sort [] = []
sort (x:xs) = as ++ [x] ++ bs where
    as = sort [a | a <- xs, a <= x]
    bs = sort [b | b <- xs, x < b]
There are some variants of this basic quick sort program, such as using
explicit filtering instead of list comprehension. The following Python program
demonstrates this for example:
def sort(xs):
    if xs == []:
        return []
    pivot = xs[0]
    smaller = sort(list(filter(lambda x: x <= pivot, xs[1:])))
    greater = sort(list(filter(lambda x: pivot < x, xs[1:])))
    return smaller + [pivot] + greater
13.2.2
We assume the elements are sorted in monotonic non-decreasing order so far. It's quite possible to customize the algorithm, so that it can sort the elements by other ordering criteria. This is necessary in practice because users may sort numbers, strings, or other complex objects (even lists of lists, for example).
The typical generic solution is to abstract the comparison as a parameter, as we mentioned in the chapters about insertion sort and selection sort. Although it needn't be a total ordering, the comparison must satisfy strict weak ordering at least [17] [16].
For the sake of brevity, we only consider sorting the elements by using less-than-or-equal (equivalent to not-greater-than) in the rest of the chapter.

¹Named after the two mathematicians who founded modern set theory.
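As a quick sketch of this abstraction (the function name and its default are invented for illustration, not taken from the book), a basic quick sort parameterized on a strict less-than predicate might look like:

```python
def sort_by(xs, less=lambda a, b: a < b):
    # basic quick sort with the comparison abstracted as a parameter;
    # "not less(pivot, x)" expresses "x is not greater than the pivot"
    if xs == []:
        return []
    pivot = xs[0]
    smaller = sort_by([x for x in xs[1:] if not less(pivot, x)], less)
    greater = sort_by([x for x in xs[1:] if less(pivot, x)], less)
    return smaller + [pivot] + greater

print(sort_by([3, 1, 4, 1, 5]))  # → [1, 1, 3, 4, 5]
print(sort_by(["bb", "a", "ccc"], lambda a, b: len(a) < len(b)))  # → ['a', 'bb', 'ccc']
```

Any predicate satisfying strict weak ordering can be passed, so the same program sorts numbers, strings, or compound objects.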
13.2.3
Partition
Observe that the basic version actually takes two passes: one to find all elements which are greater than the pivot, and another to find the others which are not. Such partition can be accomplished in only one pass. We explicitly define the partition as below.
partition(p, L) =
    (∅, ∅) : L = ∅
    ({l1} ∪ A, B) : p(l1), (A, B) = partition(p, L′)
    (A, {l1} ∪ B) : ¬p(l1)
(13.2)

Note that the operation {x} ∪ L is just a cons operation, which only takes constant time. The quick sort can be modified accordingly.

sort(L) =
    ∅ : L = ∅
    sort(A) ∪ {l1} ∪ sort(B) : otherwise, (A, B) = partition(λx · x ≤ l1, L′)
(13.3)
Translating this new algorithm into Haskell yields the below code.
sort [] = []
sort (x:xs) = sort as ++ [x] ++ sort bs where
    (as, bs) = partition (<= x) xs

partition _ [] = ([], [])
partition p (x:xs) = let (as, bs) = partition p xs in
    if p x then (x:as, bs) else (as, x:bs)
The concept of partition is very critical to quick sort. Partition is also very important to many other sorting algorithms. We'll explain how it generally affects the sorting methodology by the end of this chapter. Before further discussion about fine tuning of the quick sort specific partition, let's see how to realize it in-place imperatively.
There are many partition methods. The one given by Nico Lomuto [4] [2] will be used here, as it's easy to understand. We'll show other partition algorithms soon and see how partitioning affects the performance.
Figure 13.2 shows the idea of this one-pass partition method. The array is processed from left to right. At any time, the array consists of the following parts as shown in figure 13.2 (a):

- The left most cell contains the pivot; by the end of the partition process, the pivot will be moved to its final proper position;
Figure 13.2: Partition a range of array by using the left most element as pivot. (a) the invariant during partition; (b) the start state; (c) the finish state, with a final swap of the pivot and x[left].
- A segment contains all elements which are not greater than the pivot. The right boundary of this segment is marked as left;
- A segment contains all elements which are greater than the pivot. The right boundary of this segment is marked as right; it means that elements between the left and right marks are greater than the pivot;
- The rest of the elements after the right mark haven't been processed yet. They may be greater than the pivot or not.
At the beginning of partition, the left mark points to the pivot and the right mark points to the second element next to the pivot in the array, as in figure 13.2 (b); then the algorithm repeatedly advances the right mark one element after the other till it passes the end of the array.
In every iteration, the element pointed by the right mark is compared with the pivot. If it is greater than the pivot, it should be among the segment between the left and right marks, so the algorithm goes on to advance the right mark and examine the next element; otherwise, since the element pointed by the right mark is less than or equal to the pivot (not greater than), it should be put before the left mark. In order to achieve this, the left mark needs to be advanced by one; then exchange the elements pointed by the left and right marks.
Once the right mark passes the last element, all the elements have been processed. The elements which are greater than the pivot have been moved to the right hand of the left mark, while the others are to the left hand of this mark. Note that the pivot should move between the two segments. An extra exchange between the pivot and the element pointed by the left mark moves this final one to the correct location. This is shown by the swap bi-directional arrow in figure 13.2 (c).
The left mark (which points to the pivot finally) partitions the whole array into two parts; it is returned as the result. We typically increase the left mark by one, so that it points to the first element greater than the pivot for convenience. Note that the array is modified in-place.
The partition algorithm can be described as the following. It takes three arguments: the array A, and the lower and upper bounds to be partitioned².

1: function Partition(A, l, u)
2:     p ← A[l]    ▷ the pivot
3:     L ← l    ▷ the left mark
4:     for R ∈ [l + 1, u] do    ▷ iterate on the right mark
5:         if ¬(p < A[R]) then    ▷ negating < is enough for strict weak order
6:             L ← L + 1
7:             Exchange A[L] ↔ A[R]
8:     Exchange A[L] ↔ p
9:     return L + 1    ▷ the partition position
The table below shows the steps of partitioning the array {3, 2, 5, 4, 0, 1, 6, 7}.

(l)3   (r)2    5      4      0      1      6      7       initialize, pivot = 3, l = 1, r = 2
3      (l)(r)2 5      4      0      1      6      7       2 < 3, advance l, (r = l)
3      (l)2    (r)5   4      0      1      6      7       5 > 3, move on
3      (l)2    5      (r)4   0      1      6      7       4 > 3, move on
3      (l)2    5      4      (r)0   1      6      7       0 < 3
3      2      (l)0    4      (r)5   1      6      7       advance l, then swap with r
3      2      (l)0    4      5      (r)1   6      7       1 < 3
3      2      0      (l)1    5      (r)4   6      7       advance l, then swap with r
3      2      0      (l)1    5      4      (r)6   7       6 > 3, move on
3      2      0      (l)1    5      4      6      (r)7    7 > 3, move on
1      2      0      3      (l+1)5  4      6      7       r passes the end, swap pivot and l
This version of the partition algorithm can be implemented in ANSI C as the following.

int partition(Key* xs, int l, int u) {
    int pivot, r;
    for (pivot = l, r = l + 1; r < u; ++r)
        if (!(xs[pivot] < xs[r])) {
            ++l;
            swap(xs[l], xs[r]);
        }
    swap(xs[pivot], xs[l]);
    return l + 1;
}
With this partition method realized, the imperative in-place quick sort can be accomplished as the following.

1: procedure Quick-Sort(A, l, u)
2:     if l < u then
3:         m ← Partition(A, l, u)
4:         Quick-Sort(A, l, m − 1)
5:         Quick-Sort(A, m, u)

To sort an array, this procedure is called by passing the whole range as the lower and upper bounds: Quick-Sort(A, 1, |A|). Note that when l ≥ u, the array slice is either empty or contains only one element; both can be treated as ordered, so the algorithm does nothing in such cases.
Below ANSI C example program completes the basic in-place quick sort.

void quicksort(Key* xs, int l, int u) {
    int m;
    if (l < u) {
        m = partition(xs, l, u);
        quicksort(xs, l, m - 1);
        quicksort(xs, m, u);
    }
}
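For comparison, the same in-place algorithm can be sketched in Python. This is a direct translation of the pseudocode above, using 0-based, half-open ranges instead of the 1-based inclusive bounds of the pseudocode; the names are ours.

```python
def partition(xs, l, u):
    # Lomuto partition of xs[l..u-1]; the pivot is the left most element
    pivot = l
    for r in range(l + 1, u):
        if not (xs[pivot] < xs[r]):
            l += 1
            xs[l], xs[r] = xs[r], xs[l]
    xs[pivot], xs[l] = xs[l], xs[pivot]
    return l + 1  # the partition position

def quicksort(xs, l=0, u=None):
    if u is None:
        u = len(xs)
    if l < u:
        m = partition(xs, l, u)
        quicksort(xs, l, m - 1)  # xs[m-1] is the pivot, already in place
        quicksort(xs, m, u)

xs = [3, 2, 5, 4, 0, 1, 6, 7]
quicksort(xs)
print(xs)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```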
13.2.4
Before exploring how to improve the partition for the basic version of quick sort, it's obvious that the one presented so far can be defined by using folding. Please refer to appendix A of this book for the definition of folding.

partition(p, L) = fold(f(p), (∅, ∅), L)    (13.4)
Where function f compares the element to the pivot with predicate p (which is passed to f as a parameter, so that f is in curried form; see appendix A for details. Alternatively, f can be a lexical closure in the scope of partition, so that it can access the predicate in this scope), and updates the result pair accordingly.

f(p, x, (A, B)) =
    ({x} ∪ A, B) : p(x)
    (A, {x} ∪ B) : otherwise (¬p(x))
(13.5)
Note that we actually use a pattern-matching style definition. In an environment without pattern-matching support, the pair (A, B) should be represented by a variable, for example P, with access functions to extract its first and second parts.
The example Haskell program needs to be modified accordingly.
sort [] = []
sort (x:xs) = sort small ++ [x] ++ sort big where
    (small, big) = foldr f ([], []) xs
    f a (as, bs) = if a <= x then (a:as, bs) else (as, a:bs)
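The same folding definition can be sketched in Python, with `functools.reduce` playing the role of a right fold. This translation is ours, not the book's:

```python
from functools import reduce

def partition(p, xs):
    # f from equation (13.5): prepend x to A if p(x) holds, else to B
    def f(x, ab):
        (a, b) = ab
        return ([x] + a, b) if p(x) else (a, [x] + b)
    # fold from the right so the relative order of the elements is kept
    return reduce(lambda ab, x: f(x, ab), reversed(xs), ([], []))

print(partition(lambda x: x <= 3, [3, 2, 5, 4, 0]))  # → ([3, 2, 0], [5, 4])
```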
Accumulated partition
The partition algorithm by using folding actually accumulates the result pair of lists (A, B): if the element is not greater than the pivot, it's accumulated to A, otherwise to B. We can express this explicitly, which saves space and is friendly to tail-recursive call optimization (refer to appendix A of this book for details).
partition(p, L, A, B) =
    (A, B) : L = ∅
    partition(p, L′, {l1} ∪ A, B) : p(l1)
    partition(p, L′, A, {l1} ∪ B) : otherwise
(13.6)
sort′(L, S) =
    S : L = ∅
    sort′(A, {l1} ∪ sort′(B, S)) : otherwise
(13.8)
asort [] acc = acc
asort (x:xs) acc = asort as (x : asort bs acc) where
    (as, bs) = part xs [] []
    part [] as bs = (as, bs)
    part (y:ys) as bs | y <= x = part ys (y:as) bs
                      | otherwise = part ys as (y:bs)
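The accumulated definition translates to Python as follows. This is a sketch of ours mirroring sort′(L, S) in equation (13.8), with the partition done by comprehensions for brevity:

```python
def qsort_acc(xs, acc=None):
    # sort'(L, S): the sorted result is built directly onto the accumulator
    if acc is None:
        acc = []
    if xs == []:
        return acc
    pivot, rest = xs[0], xs[1:]
    small = [x for x in rest if x <= pivot]
    big = [x for x in rest if pivot < x]
    # sort big onto acc first, cons the pivot, then sort small onto that
    return qsort_acc(small, [pivot] + qsort_acc(big, acc))

print(qsort_acc([3, 1, 4, 1, 5]))  # → [1, 1, 3, 4, 5]
```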
Exercise 13.1
- Implement the recursive basic quick sort algorithm in your favorite imperative programming language.
- Same as the imperative algorithm, one minor improvement is that besides the empty case, we needn't sort the singleton list; implement this idea in the functional algorithm as well.
- The accumulated quick sort algorithm developed in this section uses intermediate variables A, B. They can be eliminated by defining the partition function to mutually recursively call the sort function. Implement this idea in your favorite functional programming language. Please don't refer to the downloadable example program along with this book before you try it.
13.3
Quick sort performs well in practice; however, it's not easy to give theoretical analysis. It needs the tool of probability to prove the average case performance.
Nevertheless, it's intuitive to calculate the best case and worst case performance. It's obvious that the best case happens when every partition divides the sequence into two slices with equal size. Thus it takes O(lg n) recursive calls, as shown in figure 13.3.
There are in total O(lg n) levels of recursion. In the first level, it executes one partition, which processes n elements; in the second level, it executes partition two times, each processing n/2 elements, so the total time in the second level is bounded to 2O(n/2) = O(n) as well; in the third level, it executes partition four times, each processing n/4 elements, so the total time in the third level is also bounded to O(n); ...; in the last level, there are n small slices each containing a single element, and the time is bounded to O(n). Summing up the time of all levels gives the total performance of quick sort in the best case: O(n lg n).
However, in the worst case, the partition process unluckily divides the sequence into two slices with unbalanced lengths most of the time: one slice with length O(1), the other with O(n). Thus the recursion depth degrades to O(n). If we draw a similar figure, unlike the best case, which forms a balanced binary tree, the worst case degrades into a very unbalanced tree, where every node has only one child while the other is empty. The binary tree turns into a linked
Figure 13.3: In the best case, quick sort divides the sequence into two slices with the same length.
list with O(n) length. And in every level all the elements are processed, so the total performance in the worst case is O(n²), which is as poor as insertion sort and selection sort.
Let's consider when the worst case will happen. One special case is that all the elements (or most of the elements) are the same. Nico Lomuto's partition method deals with such sequences poorly. We'll see how to solve this problem by introducing other partition algorithms in the next section.
The other two obvious cases which lead to the worst case happen when the sequence is already in ascending or descending order. Partitioning the ascending sequence makes an empty sub-list before the pivot, while the list after the pivot contains all the rest elements. Partitioning the descending sequence gives the opposite result.
There are other cases which make quick sort perform poorly. There is no completely satisfactory solution which can avoid the worst case. We'll see some engineering practices in the next section which make it very seldom to meet the worst case.
13.3.1
In the average case, quick sort performs well. There is a vivid example: even if every partition divides the list into two lists with lengths in a ratio of 1 to 9, the performance is still bound to O(n lg n), as shown in [2].
This subsection needs some mathematical background; readers can safely skip to the next part.
There are two methods to prove the average case performance. One uses an important fact that the performance is proportional to the total number of comparison operations during quick sort [2]. Different from selection sort, where every two elements are compared, quick sort avoids many unnecessary comparisons.
The probability that the i-th and j-th smallest elements are ever compared is:

P(i, j) = 2 / (j − i + 1)    (13.9)

The expected total number of comparisons is:

C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P(i, j)    (13.10)

     = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
     = Σ_{i=1}^{n−1} Σ_{k=1}^{n−i} 2/(k + 1)    (13.11)

Using the harmonic series, 1 + 1/2 + 1/3 + .... = ln n + γ + εn, the inner sum is O(lg n), so:

C(n) = Σ_{i=1}^{n−1} O(lg n) = O(n lg n)    (13.12)
The other method to prove the average performance uses the recursive fact that when sorting a list of length n, the partition splits the list into two sub-lists with lengths i and n − i − 1. The partition process itself takes cn time because it examines every element against the pivot. So we have the following equation:

T(n) = T(i) + T(n − i − 1) + cn    (13.13)

Where T(n) is the total time of quick sort on a list of length n. Since i is equally likely to be any of 0, 1, ..., n − 1, taking the math expectation of the equation gives:
T(n) = (2/n) Σ_{i=0}^{n−1} T(i) + cn    (13.14)

Multiplying both sides by n:

nT(n) = 2 Σ_{i=0}^{n−1} T(i) + cn²    (13.15)

Substituting n with n − 1:

(n − 1)T(n − 1) = 2 Σ_{i=0}^{n−2} T(i) + c(n − 1)²    (13.16)

Subtracting (13.16) from (13.15) eliminates all the T(i) for 0 ≤ i < n − 1:

nT(n) = (n + 1)T(n − 1) + 2cn − c    (13.17)

Dividing both sides by n(n + 1) gives:

T(n)/(n + 1) = T(n − 1)/n + (2cn − c)/(n(n + 1))    (13.18)

Expanding this recurrence yields a harmonic sum:

T(n)/(n + 1) ≈ T(1)/2 + 2c Σ_{k=3}^{n+1} 1/k    (13.19)
Using the harmonic series mentioned above, the final result is:

O(T(n)/(n + 1)) = O(T(1)/2 + 2c ln n + γ + εn) = O(lg n)    (13.20)

Thus

O(T(n)) = O(n lg n)    (13.21)
Exercise 13.2
Why does Lomuto's method perform poorly when there are many duplicated elements?
13.4
Engineering Improvement
Quick sort performs well in most cases, as mentioned in the previous section. However, there do exist worst cases which downgrade the performance to quadratic time. If the data is randomly prepared, such cases are rare; however, some particular sequences lead to the worst case, and these kinds of sequences are very common in practice.
In this section, some engineering practices are introduced which either help to avoid poor performance in handling special input data with improved partition algorithms, or try to make the possibilities uniform among the cases.
13.4.1
We can define the ternary quick sort as the following.

sort(L) =
    ∅ : L = ∅
    sort(S) ∪ sort(E) ∪ sort(G) : otherwise
(13.22)

Where S, E, G are sub-lists containing all elements which are less than, equal to, and greater than the pivot respectively.

S = {x | x ∈ L, x < l1}
E = {x | x ∈ L, x = l1}
G = {x | x ∈ L, l1 < x}
The basic ternary quick sort can be implemented in Haskell as the following example code.

sort [] = []
sort (x:xs) = sort [a | a <- xs, a < x] ++
              x : [b | b <- xs, b == x] ++ sort [c | c <- xs, c > x]
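A Python counterpart of the same idea (our translation, using list comprehensions for the three-way split):

```python
def sort3(xs):
    # ternary quick sort: S (less), E (equal), G (greater) as in (13.22)
    if xs == []:
        return []
    pivot = xs[0]
    small = sort3([x for x in xs if x < pivot])
    equal = [x for x in xs if x == pivot]
    great = sort3([x for x in xs if pivot < x])
    return small + equal + great

print(sort3([2, 1, 3, 1, 2, 2]))  # → [1, 1, 2, 2, 2, 3]
```

Because all duplicates of the pivot go into the equal part at once, a sequence of identical elements is handled in a single pass instead of degenerating to quadratic time.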
Note that the comparison between elements must support abstract less-than and equal-to operations. The basic version of ternary sort takes linear O(n) time to concatenate the three sub-lists. It can be improved by using the standard accumulator technique.
Suppose function sort′(L, A) is the accumulated ternary quick sort definition, where L is the sequence to be sorted and the accumulator A contains the intermediate sorted result so far. We initialize the sorting with an empty accumulator: sort(L) = sort′(L, ∅).
It's easy to give the trivial edge case like below.

sort′(L, A) =
    A : L = ∅
    ... : otherwise
For the recursive case, as the ternary partition splits into three sub-lists S, E, G, only S and G need recursive sorting; E contains all elements equal to the pivot, which is already in correct order and thus needn't be sorted any more. The idea is to sort G with accumulator A, concatenate it behind E, then use this result as the new accumulator and start to sort S:

sort′(L, A) =
    A : L = ∅
    sort′(S, E ∪ sort′(G, A)) : otherwise
(13.23)
partition(p, L, S, E, G) =
    (S, E, G) : L = ∅
    partition(p, L′, {l1} ∪ S, E, G) : l1 < p
    partition(p, L′, S, {l1} ∪ E, G) : l1 = p
    partition(p, L′, S, E, {l1} ∪ G) : p < l1
(13.24)
393
Where l₁ is the first element in L if L isn't empty, and L′ contains the rest
elements except for l₁. The below Haskell program implements this algorithm.
It starts the recursive sorting immediately in the edge case of partition.

sort xs = sort' xs []

sort' [] r = r
sort' (x:xs) r = part xs [] [x] [] r where
    part [] as bs cs r = sort' as (bs ++ sort' cs r)
    part (x':xs') as bs cs r | x' < x = part xs' (x':as) bs cs r
                             | x' == x = part xs' as (x':bs) cs r
                             | x' > x = part xs' as bs (x':cs) r
2-way partition
The cases with many duplicated elements can also be handled imperatively.
Robert Sedgewick presented a partition method [3], [4] which holds two pointers.
One moves from left to right, the other moves from right to left. The two pointers
are initialized as the left and right boundaries of the array.
When the partition starts, the left most element is selected as the pivot. Then
the left pointer i keeps advancing to the right until it meets an element which is
not less than the pivot; on the other hand³, the right pointer j repeatedly
scans to the left until it meets an element which is not greater than the pivot.
At this time, all elements before the left pointer i are strictly less than the
pivot, while all elements after the right pointer j are greater than the pivot; i
points to an element which is greater than or equal to the pivot, while j
points to an element which is less than or equal to the pivot. The situation
at this stage is illustrated in figure 13.4 (a).
In order to partition all elements less than or equal to the pivot to the left,
and the others to the right, we can exchange the two elements pointed to by i
and j. After that the scan can be resumed. We repeat this process until either i
meets j, or they overlap.
At any time point during partition, there is an invariant: all elements
before i (including the one pointed to by i) are not greater than the pivot, while
all elements after j (including the one pointed to by j) are not less than the pivot.
The elements between i and j haven't been examined yet. This invariant is
shown in figure 13.4 (b).
³We don't use 'then' because it's quite OK to perform the two scans in parallel.
Figure 13.4: Partition a range of the array by using the left most element as the
pivot.
After the left pointer i meets the right pointer j, or they overlap each other,
we need one extra exchange to move the pivot, located at the first position, to
the correct place pointed to by j. Next, the elements between the lower bound
and j, as well as the sub-slice between i and the upper bound of the array,
are recursively sorted.
This algorithm can be described as the following.

1: procedure Sort(A, l, u)    ▷ sort range [l, u)
2:   if u − l > 1 then    ▷ More than 1 element for the non-trivial case
3:     i ← l, j ← u
4:     pivot ← A[l]
5:     loop
6:       repeat
7:         i ← i + 1
8:       until A[i] ≥ pivot    ▷ Need to handle the error case i ≥ u in practice
9:       repeat
10:        j ← j − 1
11:      until A[j] ≤ pivot    ▷ Need to handle the error case j < l in practice
12:      if j < i then
13:        break
14:      Exchange A[i] ↔ A[j]
15:    Exchange A[l] ↔ A[j]    ▷ Move the pivot
16:    Sort(A, l, j)
17:    Sort(A, i, u)
Consider the extreme case where all elements are equal: this in-place quick sort
will partition the list into two equal-length sub-lists, although it takes n/2
unnecessary swaps. As the partition is balanced, the overall performance is
O(n lg n), which avoids degrading to quadratic. The following ANSI C example
program implements this algorithm.
void qsort(Key* xs, int l, int u) {
    int i, j, pivot;
    if (l < u - 1) {
        pivot = i = l; j = u;
        while (1) {
            while (i < u && xs[++i] < xs[pivot]);
            while (j >= l && xs[pivot] < xs[--j]);
            if (j < i) break;
            swap(xs[i], xs[j]);
        }
        swap(xs[pivot], xs[j]);
        qsort(xs, l, j);
        qsort(xs, i, u);
    }
}
Comparing this algorithm with the basic version based on N. Lomuto's partition
method, we can find that it swaps fewer elements, because it skips those which
are already on the proper side of the pivot.
3-way partition
It's obvious that we should avoid those unnecessary swaps for the duplicated
elements. What's more, the algorithm can be developed with the idea of ternary
sort (known as 3-way partition in some materials): all the elements which are
strictly less than the pivot are put to the left sub-slice, those greater than
the pivot are put to the right, and the middle part holds all the elements equal
to the pivot. With such a ternary partition, we need only recursively sort the
ones which differ from the pivot. Thus in the above extreme case, there aren't
any elements that need further sorting, so the overall performance is linear O(n).
The difficulty is how to do the 3-way partition. Jon Bentley and Douglas
McIlroy developed a solution which keeps those elements equal to the pivot at
the left most and right most sides, as shown in figure 13.5 (a) [5] [6].
to the pivot for j respectively. At this time, if i and j don't meet each other or
overlap, they are not only exchanged; the elements pointed to by them are also
examined to see if they are identical to the pivot. If so, the necessary exchanges
happen between i and p, as well as j and q.
By the end of the partition process, the elements equal to the pivot need
to be swapped to the middle part from the left and right ends. The number of
such extra exchange operations is proportional to the number of duplicated
elements; if all elements are unique, there is no such overhead at all. The final
partition result is shown in figure 13.5 (b). After that we only need to
recursively sort the less-than and greater-than sub-slices.
This algorithm can be given by modifying the 2-way partition as below.
1: procedure Sort(A, l, u)
2:   if u − l > 1 then
3:     i ← l, j ← u
4:     p ← l, q ← u    ▷ point to the boundaries for equal elements
5:     pivot ← A[l]
6:     loop
7:       repeat
8:         i ← i + 1
9:       until A[i] ≥ pivot    ▷ Skip the error handling for i ≥ u
10:      repeat
11:        j ← j − 1
12:      until A[j] ≤ pivot    ▷ Skip the error handling for j < l
13:      if j ≤ i then
14:        break    ▷ Note the difference from the above algorithm
15:      Exchange A[i] ↔ A[j]
16:      if A[i] = pivot then    ▷ Handle the equal elements
17:        p ← p + 1
18:        Exchange A[p] ↔ A[i]
19:      if A[j] = pivot then
20:        q ← q − 1
21:        Exchange A[q] ↔ A[j]
22:    if i = j ∧ A[i] = pivot then    ▷ A special case
23:      j ← j − 1, i ← i + 1
24:    for k from l to p do    ▷ Swap the equal elements to the middle part
25:      Exchange A[k] ↔ A[j]
26:      j ← j − 1
27:    for k from u − 1 down-to q do
28:      Exchange A[k] ↔ A[i]
29:      i ← i + 1
30:    Sort(A, l, j + 1)
31:    Sort(A, i, u)
This algorithm can be translated to the following ANSI C example program.

void qsort2(Key* xs, int l, int u) {
    int i, j, k, p, q; Key pivot;
    if (l < u - 1) {
        i = p = l; j = q = u; pivot = xs[l];
        while (1) {
            while (i < u && xs[++i] < pivot);
            while (j >= l && pivot < xs[--j]);
            if (j <= i) break;
            swap(xs[i], xs[j]);
            if (xs[i] == pivot) { ++p; swap(xs[p], xs[i]); }
            if (xs[j] == pivot) { --q; swap(xs[q], xs[j]); }
        }
        if (i == j && xs[i] == pivot) { --j; ++i; }
        for (k = l; k <= p; ++k) { swap(xs[k], xs[j]); --j; }
        for (k = u - 1; k >= q; --k) { swap(xs[k], xs[i]); ++i; }
        qsort2(xs, l, j + 1);
        qsort2(xs, i, u);
    }
}
It can be seen that the algorithm becomes a bit complex when it evolves to
3-way partition. There are some tricky edge cases that should be handled with
caution. Actually, we just need a ternary partition algorithm. This reminds us
of N. Lomuto's method, which is straightforward enough to be a starting point.
The idea is to change the invariant a bit. We still select the first element as
the pivot. As shown in figure 13.6, at any time, the left most section contains
elements which are strictly less than the pivot; the next section contains the
elements equal to the pivot; the right most section holds all the elements which
are strictly greater than the pivot. The boundaries of the three sections are
marked as i, k, and j respectively. The rest part, between k and j, holds the
elements that haven't been scanned yet.
At the beginning of this algorithm, the less-than section is empty and the
equal-to section contains only one element, which is the pivot; so i is
initialized to the lower bound of the array, and k points to the element next
to i. The greater-than section is also initialized as empty, thus j is set to the
upper bound.
1: procedure Sort(A, l, u)
2:   if u − l > 1 then
3:     i ← l, j ← u, pivot ← A[l]
4:     k ← l + 1
5:     while k < j do
6:       while pivot < A[k] do
7:         j ← j − 1
8:         Exchange A[k] ↔ A[j]
9:       if A[k] < pivot then
10:        Exchange A[k] ↔ A[i]
11:        i ← i + 1
12:      k ← k + 1
13:    Sort(A, l, i)
14:    Sort(A, j, u)
Comparing this one with the previous 3-way partition quick sort algorithm, it is
simpler, at the cost of more swap operations. The below ANSI C program
implements this algorithm.
void qsort(Key* xs, int l, int u) {
    int i, j, k; Key pivot;
    if (l < u - 1) {
        i = l; j = u; pivot = xs[l];
        for (k = l + 1; k < j; ++k) {
            while (pivot < xs[k]) { --j; swap(xs[j], xs[k]); }
            if (xs[k] < pivot) { swap(xs[i], xs[k]); ++i; }
        }
        qsort(xs, l, i);
        qsort(xs, j, u);
    }
}
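The same one-pass 3-way partition can be sketched in Python. We add an explicit `k < j` bound check to the inner loop (a defensive guard of ours, so the right scan never crosses the greater-than boundary); the function name is also ours.

```python
def qsort3way(xs, l, u):
    # Invariant: [l,i) < pivot, [i,k) == pivot, [k,j) unexamined, [j,u) > pivot
    if l < u - 1:
        i, j = l, u
        pivot = xs[l]
        k = l + 1
        while k < j:
            while k < j and pivot < xs[k]:   # move a greater element right
                j -= 1
                xs[j], xs[k] = xs[k], xs[j]
            if xs[k] < pivot:                # move a lesser element left
                xs[i], xs[k] = xs[k], xs[i]
                i += 1
            k += 1
        qsort3way(xs, l, i)                  # only the "differing" parts recurse
        qsort3way(xs, j, u)
```

On an all-equal input, i stays at l and j stays at u, so both recursive calls are trivial: this is the linear behavior described above.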
Exercise 13.3
All the quick sort imperative algorithms given in this section use the first
element as the pivot, another method is to choose the last one as the pivot.
Realize the quick sort algorithms, including the basic version, Sedgewick
version, and ternary (3-way partition) version by using this approach.
13.5 Engineering solution to the worst case
Although the ternary quick sort (3-way partition) solves the issue of duplicated
elements, it can't handle some typical worst cases. For example, if many of the
elements in the sequence are ordered, no matter whether in ascending or
descending order, the partition result will be two unbalanced sub-sequences:
one with few elements, the other containing all the rest.
Consider the two extreme cases, {x₁ < x₂ < ... < xₙ} and {y₁ > y₂ > ... > yₙ}.
The partition results are shown in figure 13.7.
It's easy to give some more worst cases, for example, {xₘ, xₘ₋₁, ..., x₂, x₁, xₘ₊₁, xₘ₊₂, ..., xₙ}
where {x₁ < x₂ < ... < xₙ}; another one is {xₙ, x₁, xₙ₋₁, x₂, ...}. Their
partition result trees are shown in figure 13.8.
Observing that the bad partition happens easily when blindly choosing the
first element as the pivot, there is a popular workaround suggested by Robert
Figure 13.7: (a) The partition tree for {x₁ < x₂ < ... < xₙ}: there aren't any
elements less than or equal to the pivot (the first element) in any partition.
(b) The partition tree for {y₁ > y₂ > ... > yₙ}: there aren't any elements
greater than or equal to the pivot (the first element) in any partition.
Figure 13.8: (a) Except for the first partition, all the others are unbalanced.
Sedgewick in [3]. Instead of selecting a fixed position in the sequence, a small
sampling helps to find a pivot which has a lower possibility of causing a bad
partition. One option is to examine the first element, the middle, and the last
one, then choose the median of these three elements. In the worst case, this
ensures that there is at least one element in the shorter partitioned sub-list.
Note that there is one tricky issue in real-world implementation. Since the
index is typically represented in limited-length machine words, it may cause
overflow when calculating the middle index by the naive expression (l + u) / 2.
In order to avoid this issue, it can be computed as l + (u - l) / 2. There are
two methods to find the median: one needs at most three comparisons [5]; the
other is to move the minimum value to the first location, the maximum value to
the last location, and the median value to the middle location by swapping.
After that we can select the middle as the pivot. The below algorithm
illustrates the second idea, performed before calling the partition procedure.
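The first method, the three-comparison median selection, can be sketched in Python (the helper name is ours; Python integers don't overflow, so the `l + (u - l) // 2` idiom is shown purely for illustration):

```python
def median_of_three(xs, l, u):
    # Median of the first, middle and last elements of xs[l:u).
    m = l + (u - l) // 2            # overflow-safe midpoint idiom
    a, b, c = xs[l], xs[m], xs[u - 1]
    if a <= b:
        # either b is the median, or b is the largest and the
        # median is the larger of a and c
        return b if b <= c else max(a, c)
    # a > b: either a is the median, or a is the largest
    return a if a <= c else max(b, c)
```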
1: procedure Sort(A, l, u)
2:   if u − l > 1 then
3:     m ← ⌊(l + u)/2⌋    ▷ Need to handle overflow error in practice
4:     if A[m] < A[l] then    ▷ Ensure A[l] ≤ A[m]
5:       Exchange A[l] ↔ A[m]
6:     if A[u − 1] < A[l] then    ▷ Ensure A[l] ≤ A[u − 1]
7:       Exchange A[l] ↔ A[u − 1]
8:     if A[u − 1] < A[m] then    ▷ Ensure A[m] ≤ A[u − 1]
9:       Exchange A[m] ↔ A[u − 1]
10:    Exchange A[l] ↔ A[m]
11:    (i, j) ← Partition(A, l, u)
12:    Sort(A, l, i)
13:    Sort(A, j, u)
It's obvious that this algorithm performs well in the 4 special worst cases
given above. The imperative implementation of median-of-three is left as an
exercise to the reader.
However, in purely functional settings, it's expensive to randomly access the
middle and the last element. We can't directly translate the imperative median
selection algorithm. The idea of taking a small sampling and then finding the
median element as the pivot can be realized alternatively by taking the first 3
elements, for example, as in the following Haskell program.
qsort [] = []
qsort [x] = [x]
qsort [x, y] = [min x y, max x y]
qsort (x:y:z:rest) = qsort (filter (< m) (s:rest)) ++ [m] ++
                     qsort (filter (>= m) (l:rest)) where
    xs = [x, y, z]
    [s, m, l] = [minimum xs, median xs, maximum xs]
Unfortunately, none of the above 4 worst cases can be well handled by this
program; this is because the sampling is not good. We need a telescope, not a
microscope, to profile the whole list to be partitioned. We'll see the functional
way to solve the partition problem later.
Besides median-of-three, there is another popular engineering practice to get
a good partition result: instead of always taking the first element or the last
one as the pivot, one alternative is to randomly select one, for example
as the following modification.
1: procedure Sort(A, l, u)
2:   if u − l > 1 then
3:     Exchange A[l] ↔ A[Random(l, u)]
4:     (i, j) ← Partition(A, l, u)
5:     Sort(A, l, i)
6:     Sort(A, j, u)
The function Random(l, u) returns a random integer i between l and u, such
that l ≤ i < u. The element at this position is exchanged with the first one, so
that it is selected as the pivot for the further partition. This algorithm is
called random quick sort [2].
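A minimal Python sketch of random quick sort; we pair the random pivot exchange with a Lomuto-style partition for brevity (`random.randrange(l, u)` returns an integer i with l ≤ i < u):

```python
import random

def random_qsort(xs, l, u):
    # Exchange a randomly selected element with the first one,
    # then partition as usual (Lomuto style here, for brevity).
    if u - l > 1:
        r = random.randrange(l, u)
        xs[l], xs[r] = xs[r], xs[l]
        pivot, i = xs[l], l
        for k in range(l + 1, u):
            if xs[k] < pivot:
                i += 1
                xs[i], xs[k] = xs[k], xs[i]
        xs[l], xs[i] = xs[i], xs[l]      # put the pivot between the parts
        random_qsort(xs, l, i)
        random_qsort(xs, i + 1, u)
```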
Theoretically, neither median-of-three nor random quick sort can avoid the
worst case completely. If the sequence to be sorted is randomly distributed,
choosing the first element as the pivot and choosing any other arbitrary one
are equally effective. Considering that the underlying data structure of the
sequence is a singly linked-list in the functional setting, it's expensive to
strictly apply the idea of random quick sort in the purely functional approach.
Even with this bad news, the engineering improvement still makes sense in
real world programming.
13.6 Other engineering practice
There is some other engineering practice which doesn't focus on solving the bad
partition issue. Robert Sedgewick observed that when the list to be sorted is
short, the overhead introduced by quick sort is relatively expensive; on the
other hand, insertion sort performs better in such cases [4], [5]. Sedgewick,
Bentley and McIlroy tried different thresholds, known as the cut-off: when
there are fewer than cut-off elements, the sort algorithm falls back to
insertion sort.
1: procedure Sort(A, l, u)
2:   if u − l > Cut-Off then
3:     Quick-Sort(A, l, u)
4:   else
5:     Insertion-Sort(A, l, u)
The implementation of this improvement is left as exercise to the reader.
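One possible realization of the fall-back in Python (the cut-off value 16 is an arbitrary placeholder of ours; the right threshold should be found by measurement):

```python
CUT_OFF = 16   # placeholder threshold; tune by measurement

def insertion_sort(xs, l, u):
    # Sort xs[l:u) by insertion.
    for i in range(l + 1, u):
        x, j = xs[i], i
        while j > l and x < xs[j - 1]:
            xs[j] = xs[j - 1]
            j -= 1
        xs[j] = x

def hybrid_sort(xs, l, u):
    # Quick sort for long ranges, insertion sort below the cut-off.
    if u - l > CUT_OFF:
        pivot, i = xs[l], l              # Lomuto partition, for brevity
        for k in range(l + 1, u):
            if xs[k] < pivot:
                i += 1
                xs[i], xs[k] = xs[k], xs[i]
        xs[l], xs[i] = xs[i], xs[l]
        hybrid_sort(xs, l, i)
        hybrid_sort(xs, i + 1, u)
    else:
        insertion_sort(xs, l, u)
```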
Exercise 13.4
Can you figure out more quick sort worst cases besides the four given in
this section?
Implement median-of-three method in your favorite imperative programming language.
Implement random quick sort in your favorite imperative programming
language.
Implement the algorithm which falls back to insertion sort when the length
of list is small in both imperative and functional approach.
13.7 Side words
It's sometimes called 'true' quick sort if the implementation is equipped with
most of the engineering practices we introduced, including the insertion sort
fall-back with cut-off, in-place exchanging, choosing the pivot by the
median-of-three method, and 3-way partition.
The purely functional one, which expresses the idea of quick sort perfectly,
can't adopt all of them. Thus some people think the functional quick sort is
essentially tree sort.
Actually, quick sort does have a close relationship with tree sort. Richard Bird
shows how to derive quick sort from binary tree sort by deforestation [7].
Consider a binary search tree creation algorithm called unfold, which turns a
list of elements into a binary search tree.

unfold(L) = ∅ : L = ∅
          = tree(Tl, l₁, Tr) : otherwise
(13.25)

Where

Tl = unfold({a | a ∈ L′, a ≤ l₁})
Tr = unfold({a | a ∈ L′, l₁ < a})
(13.26)
The interesting point is that this algorithm creates the tree in a different way
from what we introduced in the chapter about binary search trees. If the list to
be unfolded is empty, the result is obviously an empty tree. This is the trivial
edge case. Otherwise, the algorithm sets the first element l₁ in the list as the
key of the node, and recursively creates its left and right children, where the
elements used to form the left child are those which are less than or equal to
the key in L′, while the rest elements, which are greater than the key, are used
to form the right child.
Recall the algorithm which turns a binary search tree into a list by in-order
traversal:

toList(T) = ∅ : T = ∅
          = toList(left(T)) ⧺ {key(T)} ⧺ toList(right(T)) : otherwise
(13.27)

We can define a quick sort algorithm by composing these two functions:

sort(L) = toList(unfold(L))
(13.28)
The binary search tree built in the first step of applying unfold is an
intermediate result. This result is consumed by toList and dropped after the
second step. It's quite possible to eliminate this intermediate result, which
leads to the basic version of quick sort.
The elimination of the intermediate binary search tree is called deforestation.
This concept is based on Burstall and Darlington's work [9].
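The derivation can be mirrored in a small Python sketch (the names `unfold`, `to_list` and `tree_sort` are ours): building the intermediate tree and flattening it yields a sorted list, which is exactly the intermediate structure that deforestation eliminates.

```python
def unfold(xs):
    # Build a binary search tree, represented as (left, key, right) or None.
    if not xs:
        return None
    k, rest = xs[0], xs[1:]
    return (unfold([a for a in rest if a <= k]), k,
            unfold([a for a in rest if k < a]))

def to_list(t):
    # In-order traversal flattens the tree into a sorted list.
    return [] if t is None else to_list(t[0]) + [t[1]] + to_list(t[2])

def tree_sort(xs):
    # sort = toList . unfold; fusing the two removes the intermediate tree.
    return to_list(unfold(xs))
```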
13.8 Merge sort
Although quick sort performs perfectly in the average case, it can't avoid the
worst case no matter what engineering practice is applied. Merge sort, on the
other hand, ensures the performance is bound to O(n lg n) in all cases. It's
particularly useful in theoretical algorithm design and analysis. Another
feature is that merge sort is friendly to linked-list settings, which makes it
suitable for sorting non-consecutively stored sequences. Some functional
programming and dynamic programming language environments adopt merge sort as
the standard library sorting solution, such as Haskell, Python and Java (since
Java 7).
In this section, we'll first describe the intuitive idea of merge sort and
provide a basic version. After that, some variants of merge sort will be given,
including nature merge sort and bottom-up merge sort.
13.8.1 Basic version
Same as quick sort, the essential idea behind merge sort is also divide and
conquer. Different from quick sort, merge sort enforces the division to be
strictly balanced: it always splits the sequence to be sorted at the middle
point. After that, it recursively sorts the sub-sequences and merges the two
sorted sequences into the final result. The algorithm can be described as the
following. In order to sort a sequence L:
Trivial edge case: if the sequence to be sorted is empty, the result is
obviously empty;
Otherwise, split the sequence at the middle position, recursively sort the
two sub-sequences and merge the results.
The basic merge sort algorithm can be formalized with the following equation.

sort(L) = ∅ : L = ∅
        = merge(sort(L₁), sort(L₂)) : otherwise, (L₁, L₂) = splitAt(⌊|L|/2⌋, L)
(13.29)
Merge
There are two black-boxes in the above merge sort definition: one is the splitAt
function, which splits a list at a given position; the other is the merge
function, which merges two sorted lists into one.
As presented in the appendix of this book, it's trivial to realize splitAt in
imperative settings by using random access. However, in functional settings,
it's typically realized as a linear algorithm:
splitAt(n, L) = (∅, L) : n = 0
              = ({l₁} ⧺ A, B) : otherwise, (A, B) = splitAt(n − 1, L′)
(13.30)

Where l₁ is the first element of L, and L′ represents the rest elements except
for l₁ if L isn't empty.
The idea of merge can be illustrated as in figure 13.9. Consider two lines of
kids. The kids in each line have already stood in order of their heights: the
shortest one stands first, then a taller one, and the tallest one stands at the
end of the line.
Now let's ask the kids to pass a door one by one; at most one kid can pass the
door at a time. The kids must pass this door in the order of their height: a
kid can't pass the door before all the kids who are shorter than him/her.
Since the two lines of kids have already been sorted, the solution is to ask
the first two kids, one from each line, to compare their heights, and let the
shorter kid pass the door. They repeat this step until one line is empty; after
that, all the rest kids can pass the door one by one.
This idea can be formalized in the following equation.

merge(A, B) = A : B = ∅
            = B : A = ∅
            = {a₁} ⧺ merge(A′, B) : a₁ ≤ b₁
            = {b₁} ⧺ merge(A, B′) : otherwise
(13.31)
Where a₁ and b₁ are the first elements in lists A and B; A′ and B′ are the rest
elements except for the first ones respectively. The first two cases are trivial
edge cases: merging a sorted list with an empty list results in the same sorted
list. Otherwise, if both lists are non-empty, we take the first elements from
the two lists, compare them, use the minimum as the first element of the result,
and then recursively merge the rest.
With merge defined, the basic version of merge sort can be implemented like the
following Haskell example code.

msort [] = []
msort [x] = [x]
msort xs = merge (msort as) (msort bs) where
    (as, bs) = splitAt (length xs `div` 2) xs

merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) | x <= y = x : merge xs (y:ys)
                    | x > y  = y : merge (x:xs) ys
Note that the implementation differs from the algorithm definition in that it
also treats the singleton list as a trivial edge case.
Merge sort can also be realized imperatively. The basic version can be
developed as the below algorithm.

1: procedure Sort(A)
2:   if |A| > 1 then
3:     m ← ⌊|A|/2⌋
4:     X ← Copy-Array(A[1...m])
5:     Y ← Copy-Array(A[m + 1...|A|])
6:     Sort(X)
7:     Sort(Y)
8:     Merge(A, X, Y)
When the array to be sorted contains at least two elements, the non-trivial
sorting process starts. It first copies the first half to a newly created array
X, and the second half to a second new array Y; it recursively sorts them, and
finally merges the sorted results back to A.
This version uses the same amount of extra space as A. This is because the
Merge algorithm isn't in-place at the moment. We'll introduce the imperative
in-place merge sort in a later section.
The merge process does almost the same thing as the functional definition.
There is a verbose version and a simplified version that uses a sentinel.
The verbose merge algorithm continuously checks elements from the two input
arrays, picks the smaller one and puts it into the result array A, then
advances along that array, until either input array is exhausted. After that,
the algorithm appends the rest of the elements in the other input array to A.
1: procedure Merge(A, X, Y)
2:   i ← 1, j ← 1, k ← 1
3:   m ← |X|, n ← |Y|
4:   while i ≤ m ∧ j ≤ n do
5:     if X[i] < Y[j] then
6:       A[k] ← X[i]
7:       i ← i + 1
8:     else
9:       A[k] ← Y[j]
10:      j ← j + 1
11:    k ← k + 1
12:  while i ≤ m do
13:    A[k] ← X[i]
14:    k ← k + 1
15:    i ← i + 1
16:  while j ≤ n do
17:    A[k] ← Y[j]
18:    k ← k + 1
19:    j ← j + 1
Although this algorithm is a bit verbose, it can be short in programming
environments with enough tools to manipulate arrays. The following Python
program is an example.
def msort(xs):
    n = len(xs)
    if n > 1:
        ys = [x for x in xs[:n//2]]
        zs = [x for x in xs[n//2:]]
        ys = msort(ys)
        zs = msort(zs)
        xs = merge(xs, ys, zs)
    return xs

def merge(xs, ys, zs):
    i = 0
    while ys != [] and zs != []:
        xs[i] = ys.pop(0) if ys[0] < zs[0] else zs.pop(0)
        i = i + 1
    xs[i:] = ys if ys != [] else zs
    return xs
Performance
Before diving into the improvement of this basic version, let's analyze the
performance of merge sort. The algorithm contains two steps: the divide step
and the merge step. In the divide step, the sequence to be sorted is always
divided into two sub-sequences of the same length. If we draw a similar
partition tree as we did for quick sort, we find that this tree is a perfectly
balanced binary tree, as shown in figure 13.3. Thus the height of this tree is
O(lg n), meaning the recursion depth of merge sort is bound to O(lg n). Merge
happens at every level. It's intuitive to analyze the merge algorithm: it
compares elements from the two input sequences in pairs, and after one sequence
is fully examined, the rest is copied one by one to the result; thus it's a
linear algorithm, proportional to the length of the sequence. Based on these
facts, denoting T(n) the time for sorting a sequence of length n, we can write
the recursive time cost as below.
T(n) = T(n/2) + T(n/2) + cn
     = 2T(n/2) + cn
(13.32)
It states that the cost consists of three parts: merge sorting the first half
takes T(n/2), merge sorting the second half also takes T(n/2), and merging the
two results takes cn, where c is some constant. Solving this equation gives the
result O(n lg n).
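Unrolling the recurrence shows where the bound comes from:

T(n) = 2T(n/2) + cn
     = 4T(n/4) + 2cn
     = 8T(n/8) + 3cn
     = ...
     = 2ᵏT(n/2ᵏ) + kcn

Setting k = lg n makes the first term nT(1), so T(n) = nT(1) + cn lg n, which is O(n lg n).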
Note that this performance doesn't vary between cases, as merge sort always
divides the input uniformly.
Another significant performance indicator is space occupation. However, it
varies a lot among different merge sort implementations. The detailed space
bounds will be analyzed for each variant later.
For the basic imperative merge sort, observe that it demands the same amount of
space as the input array in every recursion, copies the original elements to it
for the recursive sort, and releases this space after that level of recursion.
So the peak space requirement happens when the recursion reaches the deepest
level, which is O(n lg n).
The functional merge sort consumes much less than this amount, because the
underlying data structure of the sequence is a linked-list; thus it needn't
extra space for merging⁴. The only space requirement is for book-keeping the
stack of recursive calls. This can be seen in the later explanation of the
even-odd split algorithm.
Minor improvement
We'll next improve the basic merge sort bit by bit for both the functional and
imperative realizations. The first observation is that the imperative merge
algorithm is a bit verbose. [2] presents an elegant simplification by using
positive ∞ as the sentinel: we append ∞ as the last element to both ordered
arrays for merging⁵. Thus we needn't test which array is not exhausted. Figure
13.10 illustrates this idea.
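In Python the same sentinel trick can be sketched with `float('inf')` standing in for ∞ (the function name is ours):

```python
def merge_with_sentinel(xs, ys):
    # Append an infinity sentinel to both sorted inputs, so the main
    # loop never has to test which input is exhausted.
    a = xs + [float('inf')]
    b = ys + [float('inf')]
    out, i, j = [], 0, 0
    for _ in range(len(xs) + len(ys)):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out
```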
procedure Merge(A, X, Y)
  Append(X, ∞)
  Append(Y, ∞)
  i ← 1, j ← 1
  for k from 1 to |A| do
    if X[i] < Y[j] then
      A[k] ← X[i]
      i ← i + 1
    else
      A[k] ← Y[j]
      j ← j + 1
The following ANSI C program implements this idea, with the merge embedded.
INF is defined as a big constant number with the same type as Key, where the
type can either be defined elsewhere, or we can abstract the type information
by passing a comparator as a parameter. We skip these implementation and
language details here.

⁴The complex effects caused by lazy evaluation are ignored here; please refer to [7] for detail.
⁵For sorting in monotonic non-increasing order, −∞ can be used instead.
void msort(Key* xs, int l, int u) {
    int i, j, m;
    Key *as, *bs;
    if (u - l > 1) {
        m = l + (u - l) / 2; /* avoid int overflow */
        msort(xs, l, m);
        msort(xs, m, u);
        as = (Key*) malloc(sizeof(Key) * (m - l + 1));
        bs = (Key*) malloc(sizeof(Key) * (u - m + 1));
        memcpy((void*)as, (void*)(xs + l), sizeof(Key) * (m - l));
        memcpy((void*)bs, (void*)(xs + m), sizeof(Key) * (u - m));
        as[m - l] = bs[u - m] = INF;
        for (i = j = 0; l < u; ++l)
            xs[l] = as[i] < bs[j] ? as[i++] : bs[j++];
        free(as);
        free(bs);
    }
}
Running this program takes much more time than the quick sort. Besides the
major reason we'll explain later, one problem is that this version frequently
allocates and releases memory for merging, while memory allocation is one of
the well-known bottlenecks in the real world, as mentioned by Bentley in [4].
One solution to address this issue is to allocate another array of the same
size as the original one as the working area. The recursive sorts of the first
and second halves needn't allocate any more extra space; they use the working
area when merging. Finally, the algorithm copies the merged result back.
This idea can be expressed as the following modified algorithm.
1: procedure Sort(A)
2:   B ← Create-Array(|A|)
3:   Sort(A, B, 1, |A|)

1: procedure Sort(A, B, l, u)
2:   if u − l > 0 then
3:     m ← ⌊(l + u)/2⌋
4:     Sort(A, B, l, m)
5:     Sort(A, B, m + 1, u)
6:     Merge(A, B, l, m, u)
This algorithm creates a second array and passes it, along with the original
array to be sorted, to the Sort algorithm. In a real implementation, this
working area should be released, either manually or by some automatic tool such
as GC (garbage collection). The modified Merge algorithm also accepts the
working area as a parameter.
1: procedure Merge(A, B, l, m, u)
2:   i ← l, j ← m + 1, k ← l
3:   while i ≤ m ∧ j ≤ u do
4:     if A[i] < A[j] then
5:       B[k] ← A[i]
6:       i ← i + 1
7:     else
8:       B[k] ← A[j]
9:       j ← j + 1
10:    k ← k + 1
11:  while i ≤ m do
12:    B[k] ← A[i]
13:    k ← k + 1
14:    i ← i + 1
15:  while j ≤ u do
16:    B[k] ← A[j]
17:    k ← k + 1
18:    j ← j + 1
19:  for i from l to u do    ▷ Copy back
20:    A[i] ← B[i]
This new version runs faster than the previous one. On my test machine, it
speeds up by about 20% to 25% when sorting 100,000 randomly generated numbers.
The basic functional merge sort can also be fine-tuned. Observe that it splits
the list at the middle point. However, as the underlying data structure
representing the list is a singly linked-list, random access at a given
position is a linear operation (refer to appendix A for detail). Alternatively,
one can split the list in an even-odd manner: all the elements in even
positions are collected in one sub-list, while all the elements in odd
positions are collected in another. For any list, either there are the same
number of elements in even and odd positions, or they differ by one. So this
divide strategy always leads to a balanced split, and the performance is
ensured to be O(n lg n) in all cases.
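A Python rendering of the even-odd split and the merge sort built on it (the names are ours; the Haskell realization in the text follows the same shape):

```python
def split_even_odd(xs):
    # One element goes to each sub-list alternately, so the two
    # halves never differ in length by more than one.
    if len(xs) < 2:
        return list(xs), []
    a, b = split_even_odd(xs[2:])
    return [xs[0]] + a, [xs[1]] + b

def merge(a, b):
    # Standard merge of two sorted lists.
    if not a:
        return b
    if not b:
        return a
    if a[0] <= b[0]:
        return [a[0]] + merge(a[1:], b)
    return [b[0]] + merge(a, b[1:])

def msort(xs):
    if len(xs) <= 1:
        return list(xs)
    a, b = split_even_odd(xs)
    return merge(msort(a), msort(b))
```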
The even-odd splitting algorithm can be defined as below.

split(L) = (∅, ∅) : L = ∅
         = ({l₁}, ∅) : |L| = 1
         = ({l₁} ⧺ A, {l₂} ⧺ B) : otherwise, (A, B) = split(L′′)
(13.33)

When the list is empty, the split results are two empty lists. If there is only
one element in the list, we put this single element, which is at position 1, to
the odd sub-list; the even sub-list is empty. Otherwise, there are at least two
elements in the list: we pick the first one l₁ for the odd sub-list, the second
one l₂ for the even sub-list, and recursively split the rest elements L′′.
All the other functions are kept the same; the modified Haskell program is
given as the following.

split [] = ([], [])
split [x] = ([x], [])
split (x:y:xs) = (x:as, y:bs) where (as, bs) = split xs

msort [] = []
msort [x] = [x]
msort xs = merge (msort as) (msort bs) where (as, bs) = split xs
13.9 In-place merge sort

One drawback of the imperative merge sort is that it requires extra space for
merging: the basic version without any optimization needs O(n lg n) at peak
time, and the one allocating a working area needs O(n).
It's natural for people to seek an in-place version of merge sort, which can
reuse the original array without allocating any extra space. In this section,
we'll introduce some solutions for imperative in-place merge sort.
13.9.1 Naive in-place merge
The first idea is straightforward. As illustrated in figure 13.11, sub-lists A
and B are sorted. When performing in-place merge, the invariant ensures that
all elements before i are merged, so that they are in non-decreasing order.
Every time we compare the i-th and the j-th elements: if the i-th is less than
the j-th, the marker i just advances one step. This is the easy case.
Otherwise, the j-th element is the next merge result, which should be put in
front of i. In order to achieve this, all elements between i and j, including
the i-th, are shifted toward the end by one cell. We repeat this process until
all the elements in A and B are put to the correct positions.
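The shifting merge can be sketched in Python, where xs[l:m] and xs[m:u] are the two sorted runs (function name ours; the slice assignment does the one-cell shift):

```python
def naive_inplace_merge(xs, l, m, u):
    # Merge sorted runs xs[l:m] and xs[m:u] in place by shifting.
    while l < m and m < u:
        if xs[l] <= xs[m]:
            l += 1                       # easy case: already in order
        else:
            x = xs[m]
            xs[l + 1:m + 1] = xs[l:m]    # shift the first run right by one
            xs[l] = x
            l += 1
            m += 1                       # the second run lost its head
```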
1: procedure Merge(A, l, m, u)
2:   while l < m ∧ m < u do
3:     if A[l] < A[m] then
4:       l ← l + 1
5:     else
6:       x ← A[m]
7:       for i ← m down-to l + 1 do    ▷ Shift
8:         A[i] ← A[i − 1]
9:       A[l] ← x
10:      l ← l + 1, m ← m + 1
13.9.2 In-place working area
In order to implement the in-place merge sort in O(n lg n) time, when sorting a
sub array, the rest part of the array must be reused as working area for merging.
As the elements stored in the working area, will be sorted later, they cant be
413
(Figure 13.12: compare A[i] with B[j], and exchange the smaller into the working area C[k], which is reused during merging.)
1. All exchanges must stay within the bounds of the array, without
causing any out-of-bound error;
2. The working area can overlap with either of the two sorted arrays;
however, we must ensure that no unmerged elements are overwritten.
This algorithm can be implemented in ANSI C as the following example.
void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}
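To make the exchange-based merge self-contained, here is the same routine with the pieces the book defines elsewhere (the `Key` typedef, the `swap` helper) filled in for illustration:

```c
typedef int Key;

/* exchange two cells of the array */
static void swap(Key* xs, int i, int j) {
    Key t = xs[i]; xs[i] = xs[j]; xs[j] = t;
}

/* Merge xs[i, m) and xs[j, n) into the working area starting at w.
   Elements are exchanged, never copied: afterwards the working area
   holds the merged result, and the merged-out cells hold the working
   area's former contents. */
void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}
```

For instance, merging the two sorted halves of {1, 3, 5, 2, 4, 6} into a working area starting at offset 6 of a 12-cell array leaves {1, 2, 3, 4, 5, 6} in cells 6..11, and the working area's old contents scattered over cells 0..5.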
With this merging algorithm defined, it's easy to imagine a solution which
can sort half of the array. The next question is: how to deal with the rest of the
unsorted part stored in the working area, as shown in figure 13.13?
(Figure 13.13: (a) the unsorted 1/4 part and the work area, next to the sorted part B; (b) after sorting the 1/4 and merging, the last 3/4 of the array is merged.)
8:   w ← ⌈(l + u)/2⌉   ▷ Ensure the working area is big enough
9:   Sort′(A, w, u′, l)   ▷ The first half holds the sorted elements
10:  Merge(A, [l, l + u′ − w], [u′, u], w)
11:  for i ← w down-to l do   ▷ Switch to insertion sort
12:    j ← i
13:    while j ≤ u ∧ A[j] < A[j − 1] do
14:      Exchange A[j] ↔ A[j − 1]
15:      j ← j + 1
Note that in order to satisfy the first constraint, we must ensure the working
area is big enough to hold all the elements exchanged in; that's why we round up
by taking the ceiling when sorting the second half of the working area. Note that
we actually pass the ranges including the end points to the algorithm Merge.
Next, we develop a Sort′ algorithm, which is mutually recursive with Sort: it
calls Sort and exchanges the result to the working area.
1: procedure Sort′(A, l, u, w)
2:   if u − l > 0 then
3:     m ← ⌊(l + u)/2⌋
4:     Sort(A, l, m)
5:     Sort(A, m + 1, u)
6:     Merge(A, [l, m], [m + 1, u], w)
7:   else   ▷ Exchange all elements to the working area
8:     while l ≤ u do
9:       Exchange A[l] ↔ A[w]
10:      l ← l + 1
11:      w ← w + 1
Different from the naive in-place sort, this algorithm doesn't shift the array
during merging. The main algorithm reduces the unsorted part in the sequence
n/2, n/4, n/8, ..., so it takes O(lg n) steps to complete sorting. In every step, it
recursively sorts half of the rest elements, and performs linear time merging.
Denote the time cost of sorting n elements as T(n); we have the following
equation:

T(n) = T(n/2) + c·n/2 + T(n/4) + c·3n/4 + T(n/8) + c·7n/8 + ⋯   (13.34)
Solving this equation with the telescoping method gives the result O(n lg n).
The detailed process is left as an exercise to the reader.
The following ANSI C code completes the implementation by using the wmerge program given above.
void imsort(Key* xs, int l, int u);

void wsort(Key* xs, int l, int u, int w) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        imsort(xs, l, m);
        imsort(xs, m, u);
        wmerge(xs, l, m, m, u, w);
    }
    else
        while (l < u)
            swap(xs, l++, w++);
}

void imsort(Key* xs, int l, int u) {
    int m, n, w;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        w = l + u - m;
        wsort(xs, l, m, w); /* the last half contains sorted elements */
        while (w - l > 2) {
            n = w;
            w = l + (n - l + 1) / 2; /* ceiling */
            wsort(xs, w, n, l); /* the first half contains sorted elements */
            wmerge(xs, l, l + n - w, n, u, w);
        }
        for (n = w; n > l; --n) /* switch to insertion sort */
            for (m = n; m < u && xs[m] < xs[m-1]; ++m)
                swap(xs, m, m - 1);
    }
}
However, this program doesn't run faster than the version we developed in the
previous section, which doubles the array in advance as the working area. On my
machine, it is about 60% slower when sorting 100,000 random numbers, due to
the many swap operations.
13.9.3 Linked-list merge sort
In-place merge sort is still an active research area. In order to save the
extra space for merging, some overhead has to be introduced, which increases
the complexity of the merge sort algorithm. However, if the underlying data
structure isn't an array but a linked list, merging can be achieved without any
extra space, as shown in the even-odd functional merge sort algorithm presented
in the previous section.
In order to make it clearer, we can develop a purely imperative linked-list
merge sort solution. The linked list can be defined as a record type, as shown in
appendix A, like below.
struct Node {
    Key key;
    struct Node* next;
};
We can define an auxiliary function for node linking. Assuming the list to be
linked isn't empty, it can be implemented as the following.
struct Node* link(struct Node* x, struct Node* ys) {
    x->next = ys;
    return x;
}
The following example ANSI C program implements merge sort with this
splitting algorithm embedded.
struct Node* msort(struct Node* xs) {
    struct Node *p, *as, *bs;
    if (!xs || !xs->next) return xs;
    as = bs = NULL;
    while (xs) {
        p = xs;
        xs = xs->next;
        as = link(p, as);
        swap(&as, &bs);  /* alternate between the two sub lists */
    }
    as = msort(as);
    bs = msort(bs);
    return merge(as, bs);
}
The only thing left is to develop the imperative merging algorithm for linked
lists. The idea is quite similar to the array merging version. As long as neither of
the sub lists is exhausted, we pick the smaller one and append it to the result list.
After that, we just need to link the non-empty one to the tail of the result, rather
than looping to copy. It needs some care to initialize the result list, as
its head node is the smaller one among the two sub lists. One simple method is to
use a dummy sentinel head, and drop it before returning. This implementation
can be given as the following.
struct Node* merge(struct Node* as, struct Node* bs) {
    struct Node s, *p;  /* s is the dummy sentinel head */
    p = &s;
    while (as && bs) {
        if (as->key < bs->key) {
            link(p, as);
            as = as->next;
        }
        else {
            link(p, bs);
            bs = bs->next;
        }
        p = p->next;
    }
    if (as)
        link(p, as);
    if (bs)
        link(p, bs);
    return s.next;
}
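To see these pieces working together, here is a compact, self-contained sketch (the underscored names and the `from_array` helper are ours, introduced only so the fragment stands alone):

```c
#include <stdlib.h>

struct Node { int key; struct Node* next; };

/* merge two sorted lists using a dummy sentinel head */
static struct Node* merge_(struct Node* as, struct Node* bs) {
    struct Node s, *p = &s;
    while (as && bs) {
        struct Node** m = (as->key < bs->key) ? &as : &bs;
        p->next = *m;
        p = *m;
        *m = (*m)->next;
    }
    p->next = as ? as : bs;   /* link the non-empty rest to the tail */
    return s.next;
}

/* even-odd split by pushing nodes onto two alternating lists */
static struct Node* msort_(struct Node* xs) {
    struct Node *p, *as = NULL, *bs = NULL, *t;
    if (!xs || !xs->next) return xs;
    while (xs) {
        p = xs;
        xs = xs->next;
        p->next = as;
        as = p;
        t = as; as = bs; bs = t;   /* alternate the receiving list */
    }
    return merge_(msort_(as), msort_(bs));
}

/* build a list from an array, preserving order */
static struct Node* from_array(const int* a, int n) {
    struct Node* xs = NULL;
    while (n--) {
        struct Node* p = malloc(sizeof *p);
        p->key = a[n];
        p->next = xs;
        xs = p;
    }
    return xs;
}
```

Sorting the list built from {3, 1, 4, 1, 5, 9, 2, 6} with `msort_` produces a non-decreasing chain of the same eight nodes; no extra cells are allocated during sorting.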
Exercise 13.5
Prove that the performance of in-place merge sort is bound to O(n lg n).
13.10 Nature merge sort
Knuth gives another way to interpret the idea of divide and conquer merge sort:
it is just like burning a candle from both ends [1]. This leads to the nature merge
sort algorithm.
(Figure: the leftmost ordered span, e.g. {8, 12, 14}, and the rightmost span, e.g. {15, 7} (ordered when read from the right, i.e. {7, 15}), are merged into the working area as {7, 8, 12, 14, 15}; the remaining cells of the working area are free.)
The merged result is put at the beginning of the working area. Next we repeat
this step, which goes on scanning toward the center of the original sequence.
This time we merge the two ordered sub sequences to the right end of the working
area, toward the left. Such a setup is easy for the next round of scanning. When
all the elements in the original sequence have been scanned and merged to the
target, we switch to use the elements stored in the working area for sorting, and
use the previous sequence as the new working area. Such switching happens
repeatedly in each round. Finally, we copy all elements from the working area to
the original array if necessary.
The only question left is when this algorithm stops. The answer is: when
we start a new round of scanning and find that the longest non-decreasing sub
list spans to the end, the whole list is ordered, and the sorting is done.
Because this kind of merge sort processes the sequence from both ends
and uses the natural ordering of sub sequences, it's named nature two-way merge
sort. In order to realize it, some care must be taken. Figure 13.18 shows
the invariant during the nature merge sort. At any time, all elements before
marker a and after marker d have already been scanned and merged. We try
to span the non-decreasing sub sequence [a, b) as long as possible; at the
same time, we span from right to left to make the sub sequence [c, d) as long as
possible as well. The invariant for the working area is shown in the second row:
all elements before f and after r have already been processed (note that they
may contain several ordered sub sequences). For the odd rounds (1, 3, 5, ...), we
merge [a, b) and [c, d) from f toward the right; for the even rounds (2, 4, 6, ...),
we merge the two sorted sub sequences from r toward the left.
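The run-spanning step on its own can be sketched in C as follows (the helper names are ours; indices are 0-based here):

```c
/* span the longest non-decreasing run [b, result) going rightward,
   never passing c */
int span_right(const int* xs, int b, int c) {
    do { ++b; } while (b < c && xs[b - 1] <= xs[b]);
    return b;
}

/* span the longest run [result, d) that is non-decreasing when read
   left to right, scanning leftward from d = c, never passing b */
int span_left(const int* xs, int b, int c) {
    do { --c; } while (b < c && xs[c - 1] <= xs[c]);
    return c;
}
```

On {1, 2, 3, 2, 5}, `span_right(xs, 0, 5)` stops at index 3 (the run {1, 2, 3}), and `span_left(xs, 0, 5)` stops at index 3 as well (the run {2, 5} read from the right end).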
(Figure 13.18: the invariant: the spans [a, b) and [c, d) in the original array, with the middle part yet to be scanned; f and r are the front and rear free positions of the working area.)
function Sort(A)
  if |A| > 1 then
    n ← |A|
    B ← Create-Array(n)   ▷ Create the working area
    loop
      [a, b) ← [1, 1)
      [c, d) ← [n + 1, n + 1)
      f ← 1, r ← n   ▷ front and rear pointers to the working area
      t ← False   ▷ merge to front or rear
      while b < c do   ▷ There are still elements to scan
        repeat   ▷ Span [a, b)
          b ← b + 1
        until b ≥ c ∨ A[b] < A[b − 1]
        repeat   ▷ Span [c, d)
          c ← c − 1
        until c ≤ b ∨ A[c] < A[c − 1]
        if c < b then   ▷ Avoid overlap
          c ← b
        if b − a ≥ n then   ▷ Done if [a, b) spans the whole array
          return A
        if t then   ▷ merge to front
          f ← Merge(A, [a, b), [c, d), B, f, 1)
        else   ▷ merge to rear
          r ← Merge(A, [a, b), [c, d), B, r, −1)
        a ← b, d ← c
        t ← ¬t   ▷ Switch the merge direction
      Exchange A ↔ B   ▷ Switch the working area
  return A
The merge algorithm is almost the same as before, except that we pass a
parameter Δ to indicate the direction for merging.
1: function Merge(A, [a, b), [c, d), B, w, Δ)
2:   while a < b ∧ c < d do
3:     if A[a] < A[d − 1] then
4:       B[w] ← A[a]
5:       a ← a + 1
6:     else
7:       B[w] ← A[d − 1]
8:       d ← d − 1
9:     w ← w + Δ
10:  while a < b do
11:    B[w] ← A[a]
12:    a ← a + 1
13:    w ← w + Δ
14:  while c < d do
15:    B[w] ← A[d − 1]
16:    d ← d − 1
17:    w ← w + Δ
18:  return w
The following ANSI C program implements this two-way nature merge sort
algorithm. Note that it doesn't release the allocated working area explicitly.
int merge(Key* xs, int a, int b, int c, int d, Key* ys, int k, int delta) {
    for (; a < b && c < d; k += delta)
        ys[k] = xs[a] < xs[d-1] ? xs[a++] : xs[--d];
    for (; a < b; k += delta)
        ys[k] = xs[a++];
    for (; c < d; k += delta)
        ys[k] = xs[--d];
    return k;
}

Key* sort(Key* xs, Key* ys, int n) {
    int a, b, c, d, f, r, t;
    if (n < 2)
        return xs;
    for (;;) {
        a = b = 0;
        c = d = n;
        f = 0;
        r = n-1;
        t = 1;
        while (b < c) {
            do {    /* span [a, b) as much as possible */
                ++b;
            } while (b < c && xs[b-1] <= xs[b]);
            do {    /* span [c, d) as much as possible */
                --c;
            } while (b < c && xs[c] >= xs[c-1]);
            if (c < b)
                c = b;  /* eliminate overlap if any */
            if (b - a >= n)
                return xs;  /* sorted */
            if (t)
                f = merge(xs, a, b, c, d, ys, f, 1);
            else
                r = merge(xs, a, b, c, d, ys, r, -1);
            a = b;
            d = c;
            t = !t;
        }
        swap(&xs, &ys);
    }
    return xs; /* can't be here */
}
than 2. This time, the working area will be filled with merged ordered sub arrays
of length 4, and so on. The length of the non-decreasing sub arrays doubles in
every round, so there are at most O(lg n) rounds, and in every round we scan all
the elements. The overall performance for this worst case is therefore bound to
O(n lg n). We'll come back to this interesting phenomenon in the next section
about bottom-up merge sort.
In purely functional settings, however, it's not sensible to scan the list from both
ends, since the underlying data structure is a singly linked list. Nature merge
sort can be realized in another approach.
Observe that the list to be sorted consists of several non-decreasing sub
lists, and we can pick every two of such sub lists and merge them into a bigger
one. We repeatedly pick and merge, so that the number of non-decreasing
sub lists keeps halving, and finally there is only one such list, which is the
sorted result. This idea can be formalized in the following equation.
sort(L) = sort′(group(L))   (13.35)
Where the function group(L) groups the elements of the list into non-decreasing
sub lists. This function can be described like below; the first two are trivial edge
cases.
If the list is empty, the result is a list containing an empty list;
If there is only one element in the list, the result is a list containing a
singleton list;
Otherwise, the first two elements are compared: if the first one is less
than or equal to the second, it is linked in front of the first sub list of the
recursive grouping result; otherwise, a singleton list containing the first element
is put as the first sub list before the recursive result.
group(L) =
  {L} : |L| ≤ 1
  {{l1} ∪ L1, L2, ...} : l1 ≤ l2, where {L1, L2, ...} = group(L′)
  {{l1}, L1, L2, ...} : otherwise   (13.36)
groupBy' _ [] = [[]]
groupBy' _ [x] = [[x]]
groupBy' f (x:xs@(x':_)) | f x x' = (x:ys):yss
                         | otherwise = [x]:r
  where
    r@(ys:yss) = groupBy' f xs
6 There is a groupBy function provided in the Haskell standard library Data.List. However, it doesn't fit here, because it accepts an equality testing function as its parameter, which must satisfy the properties of being reflexive, transitive, and symmetric; the less-than-or-equal-to operation we use here doesn't conform to symmetry. Refer to appendix A of this book for details.
Different from the sort function, which sorts a list of elements, the function sort′
accepts a list of sub lists, which is the result of grouping.
sort′(L) =
  ∅ : L = ∅
  L1 : L = {L1}
  sort′(mergePairs(L)) : otherwise   (13.37)
The first two are the trivial edge cases. If the list to be sorted is empty, the
result is obviously empty; if it contains only one sub list, then we are done, and
just need to extract this single sub list as the result. For the recursive case, we
call a function mergePairs to merge every two sub lists, then recursively call sort′.
The next undefined function is mergePairs; as the name indicates, it repeatedly
merges pairs of non-decreasing sub lists into bigger ones.

mergePairs(L) =
  L : |L| ≤ 1
  {merge(L1, L2)} ∪ mergePairs(L″) : otherwise   (13.38)

When there are fewer than two sub lists in the list, we are done; otherwise, we
merge the first two sub lists L1 and L2, and recursively merge the rest of the pairs
in L″. The type of the result of mergePairs is a list of lists; however, it is finally
flattened by the sort′ function.
The merge function is the same as before. The complete Haskell program
can be assembled from group, sort′, mergePairs and merge.
Alternatively, observe that we can first pick two sub lists and merge them into
an intermediate result, then repeatedly pick the next sub list and merge it into
the ordered result we've got so far, until all the remaining sub lists are merged.
This is a typical folding algorithm, as introduced in appendix A.

sort(L) = fold(merge, ∅, group(L))   (13.39)
Exercise 13.6
Is the nature merge sort realized by folding equivalent to the one using
mergePairs in terms of performance? If yes, prove it; if not, which one is faster?
13.11 Bottom-up merge sort
The worst case analysis for nature merge sort raises an interesting topic: instead
of realizing merge sort in a top-down manner, we can develop a bottom-up version.
The great advantage is that we needn't do any book keeping, so the
algorithm is quite friendly for purely iterative implementation.
The idea of bottom-up merge sort is to turn the sequence to be sorted into n
small sub sequences, each containing only one element. Then we merge every two
of such small sub sequences, so that we get ⌈n/2⌉ ordered sub sequences, each with
length 2; if n is odd, we leave the last singleton sequence untouched. We
repeatedly merge these pairs, and finally we get the sorted result. Knuth names
this variant straight two-way merge sort [1]. Bottom-up merge sort is
illustrated in figure 13.19.
(Figure 13.19: bottom-up merge sort: the singleton sub sequences are merged pairwise, round by round, until one sorted sequence is left.)
First, every element is wrapped into a singleton sub list:

wraps(L) =
  ∅ : L = ∅
  {{l1}} ∪ wraps(L′) : otherwise   (13.40)

Then sort′ is applied to the result:

sort(L) = sort′(wraps(L))   (13.41)

Since wraps simply wraps every element, it is a map:

sort(L) = sort′(map(λx · {x}, L))   (13.42)
We reuse the functions sort′ and mergePairs defined in the section about
nature merge sort. They repeatedly merge pairs of sub lists until there is only
one.
Implementing this version in Haskell gives the following example code.

sort = sort' . map (\x -> [x])

This is equivalent to grouping with a predicate that always evaluates to false:

sort(L) = sort′(groupBy(λx y · False, L))   (13.43)

Instead of spanning the non-decreasing sub lists as long as possible, the
constant-false predicate makes every sub list span only one element.
Similar to nature merge sort, bottom-up merge sort can also be defined
by folding. The detailed implementation is left as an exercise to the reader.
Observing that the bottom-up sort is in tail-recursive form, it's quite easy
to translate it into a purely iterative algorithm without any recursion.
1: function Sort(A)
2:   B ← ∅
3:   for each a ∈ A do
4:     B ← Append(B, {a})
5:   N ← |B|
6:   while N > 1 do
7:     for i ← 1 to ⌊N/2⌋ do
8:       B[i] ← Merge(B[2i − 1], B[2i])
9:     if Odd(N) then
10:      B[⌈N/2⌉] ← B[N]
11:    N ← ⌈N/2⌉
12:  if B = ∅ then
13:    return ∅
14:  return B[1]
The Python implementation combines multiple rounds of merging by consuming
the pair of lists at the head and appending the merged result to the tail. This
greatly simplifies the logic of handling the odd number of sub lists shown in the
above pseudo code.
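The same doubling-width scheme can be sketched iteratively in C with plain array indexing (the function names and the auxiliary-buffer choice are ours):

```c
#include <stdlib.h>
#include <string.h>

/* merge the sorted runs xs[l, m) and xs[m, u) into ys[l, u) */
static void merge_runs(const int* xs, int l, int m, int u, int* ys) {
    int i = l, j = m, k = l;
    while (i < m && j < u)
        ys[k++] = xs[i] <= xs[j] ? xs[i++] : xs[j++];
    while (i < m) ys[k++] = xs[i++];
    while (j < u) ys[k++] = xs[j++];
}

/* bottom-up merge sort: merge adjacent runs of width 1, 2, 4, ... */
void bottom_up_sort(int* xs, int n) {
    int w, l;
    int* ys = malloc(n * sizeof(int));
    for (w = 1; w < n; w *= 2) {
        for (l = 0; l < n; l += 2 * w) {
            int m = l + w < n ? l + w : n;      /* clip the run ends */
            int u = l + 2 * w < n ? l + 2 * w : n;
            merge_runs(xs, l, m, u, ys);
        }
        memcpy(xs, ys, n * sizeof(int));        /* next round reads xs */
    }
    free(ys);
}
```

Each pass doubles the run width, so there are O(lg n) passes of O(n) work, giving the expected O(n lg n) bound.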
Exercise 13.7
Implement the functional bottom-up merge sort by using folding.
Implement the iterative bottom-up merge sort only with array indexing.
Don't use any library tools, such as list, vector, etc.
13.12
Parallelism
We mentioned in the basic version of quick sort that the two sub sequences
can be sorted in parallel after the divide phase finishes. This strategy is also
applicable to merge sort. Actually, the parallel versions of quick sort and merge
sort do not only distribute the recursive sub sequence sorting into two parallel
processes, but divide the sequence into p sub sequences, where p is the number
of processors. Ideally, if we can achieve sorting in T′ time with parallelism,
satisfying O(n lg n) = pT′, we say it is a linear speed up, and the algorithm
is parallel optimal.
However, a straightforward parallel extension to the sequential quick sort
algorithm, which samples several pivots, divides the sequence into p sub sequences,
and independently sorts them in parallel, isn't optimal. The bottleneck exists in
the divide phase, for which we can only achieve O(n) time on average.
The straightforward parallel extension to merge sort, on the other hand,
blocks at the merge phase. Both parallel merge sort and parallel quick sort in
practice need good designs in order to achieve the optimal speed up. Actually,
the divide and conquer nature makes merge sort and quick sort relatively easy
to parallelize. Richard Cole found an O(lg n) parallel merge sort algorithm with
n processors in 1986 [13].
Parallelism is a big and complex topic which is out of the scope of this
elementary book. Readers can refer to [13] and [14] for details.
13.13
Short summary
In this chapter, two popular divide and conquer sorting methods, quick sort
and merge sort, are introduced. Both of them meet the upper performance
limit of comparison-based sorting, O(n lg n). Sedgewick said
that quick sort is the greatest algorithm invented in the 20th century. Almost
all programming environments adopt quick sort as the default sorting tool. As
time goes on, some environments, especially those manipulating abstract sequences
which are dynamic and not based on pure arrays, have switched to merge sort
as the general purpose sorting tool7.
The reason for this interesting phenomenon can be partly explained by the
treatment in this chapter: quick sort performs perfectly in most cases,
7 Actually, most of them are a kind of hybrid sort, balanced with insertion sort to achieve
good performance when the sequence is short.
as it needs less swapping than most other algorithms. However, the quick sort
algorithm is based on swapping, and in purely functional settings swapping isn't
efficient, because the underlying data structure is a singly linked list rather than
a vectorized array. Merge sort, on the other hand, is friendly in such an
environment: it costs constant extra space, and its performance is guaranteed
even in the worst case of quick sort, where the latter degrades to quadratic
time. However, merge sort doesn't perform as well as quick sort in purely
imperative settings with arrays. It either needs extra space for merging, which is
sometimes unreasonable (for example, in embedded systems with limited memory),
or causes much swapping overhead with the in-place workaround. In-place merging
is still an active research area.
Although the title of this chapter is quick sort vs. merge sort, it's not
the case that one algorithm has nothing to do with the other. Quick sort can
be viewed as an optimized version of tree sort, as explained in this chapter.
Similarly, merge sort can also be deduced from tree sort, as shown in [12].
There are many ways to categorize sorting algorithms, such as in [1]. One
way is from the point of view of easy/hard partition and easy/hard merge [7].
Quick sort, for example, is quite easy for merging, because all the elements
in the sub sequence before the pivot are no greater than any after the pivot.
The merging in quick sort is actually trivial sequence concatenation.
Merge sort, on the other hand, is more complex in merging than quick sort.
However, it's quite easy to divide, no matter what concrete divide method is
taken: simple division at the middle point, even-odd splitting, nature splitting,
or bottom-up straight splitting. Compared to merge sort, it's more difficult for
quick sort to achieve perfect division. We showed that, in theory, the worst
case can't be completely avoided, no matter what engineering practice is taken:
median-of-three, random quick sort, 3-way partition, etc.
We've shown some elementary sorting algorithms in this book up to this chapter,
including insertion sort, tree sort, selection sort, heap sort, quick sort and
merge sort. Sorting is still a hot research area in computer science. At the
time this chapter was written, people were challenged by the buzz word "big
data": traditional convenient methods can't handle more and more huge
data within reasonable time and resources. Sorting a sequence of hundreds of
gigabytes has become routine in some fields.
Exercise 13.8
Design an algorithm to create a binary search tree by using the merge sort
strategy.
Bibliography
[1] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition). Addison-Wesley Professional, May 4, 1998. ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937
[3] Robert Sedgewick. Implementing quick sort programs. Communications of the ACM, Volume 21, Number 10, 1978. pp. 847-857.
[4] Jon Bentley. Programming Pearls, Second Edition. Addison-Wesley Professional, 1999. ISBN-13: 978-0201657883
[5] Jon Bentley, Douglas McIlroy. Engineering a sort function. Software Practice and Experience, Vol. 23(11), 1249-1265, 1993.
[6] Robert Sedgewick, Jon Bentley. Quicksort is optimal. http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf
[7] Richard Bird. Pearls of Functional Algorithm Design. Cambridge University Press, 2010. ISBN: 9781139490603
[8] Fethi Rabhi, Guy Lapalme. Algorithms: a functional programming approach. Second edition. Addison-Wesley, 1999. ISBN: 0201-59604-0
[9] Simon Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall International, 1987. ISBN: 0-13-453333-X
[10] Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. Practical in-place mergesort. Nordic Journal of Computing, 1996.
[11] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999. ISBN-13: 978-0521663502
[12] José Bacelar Almeida and Jorge Sousa Pinto. Deriving Sorting Algorithms. Technical report, Data Structures and Algorithms, 2008.
[13] Richard Cole. Parallel merge sort. SIAM J. Comput. 17 (4): 770-785, August 1988. doi:10.1137/0217049
[14] David M. W. Powers. Parallelized Quicksort and Radixsort with Optimal Speedup. Proceedings of International Conference on Parallel Computing Technologies, Novosibirsk, 1991.
Searching
Chapter 14
Searching
14.1
Introduction
Searching is quite a big and important area. Computers make many hard searching
problems feasible that are almost impossible for human beings. A modern
industrial robot can even search for and pick the correct gadget from the pipeline for
assembly; a GPS car navigator can search the map for the best route
to a specific place. The modern mobile phone is not only equipped with such a
map navigator, but can also search for the best price when shopping on the Internet.
This chapter just scratches the surface of elementary searching. One good
thing that computers offer is brute-force scanning for a certain result in a
large sequence. The divide and conquer search strategy will be briefed with two
problems: one is to find the k-th big one among a list of unsorted elements; the
other is the popular binary search among a list of sorted elements. We'll also
introduce the extension of binary search to multi-dimensional data.
Text matching is also very important in our daily life. Two well-known searching
algorithms, the Knuth-Morris-Pratt (KMP) and Boyer-Moore algorithms, will be
introduced. They set good examples for another searching strategy: information
reusing.
Besides sequence search, some elementary methods for searching for solutions to
some interesting problems will be introduced. They were mostly well studied
in the early phase of AI (artificial intelligence), including basic DFS (depth-first
search) and BFS (breadth-first search).
Finally, dynamic programming will be briefed for searching for optimal solutions,
and we'll also introduce the greedy algorithm, which is applicable in
some special cases.
All algorithms will be realized in both imperative and functional approaches.
14.2
Sequence search
Although modern computers offer fast speed for brute-force searching, even
if Moore's law could be strictly followed, the growth of huge data is too fast
to be handled well in this way. We've seen a vivid example in the introduction
chapter of this book. This is why people study computer search algorithms.
14.2.1 Divide and conquer search
One solution is to use the divide and conquer approach: if we can repeatedly
scale down the search domain, the data being dropped needn't be examined at
all. This will definitely speed up the search.

k-selection problem
Consider the problem of finding the k-th smallest one among n elements. The
most straightforward idea is to find the minimum first, then drop it and find
the second minimum element among the rest. Repeating this minimum finding and
dropping for k steps gives the k-th smallest one. Finding the minimum among
n elements costs linear O(n) time, so this method performs in O(kn) time.
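As a sketch of this repeated-minimum approach (the function name is ours; k is 1-based):

```c
/* O(k n): repeatedly select the minimum of the unexamined suffix and
   swap it to the front, exactly k times; xs is permuted in place. */
int kth_smallest_naive(int* xs, int n, int k) {
    int i, j;
    for (i = 0; i < k; ++i) {
        int m = i, t;
        for (j = i + 1; j < n; ++j)    /* find the minimum of xs[i, n) */
            if (xs[j] < xs[m])
                m = j;
        t = xs[i]; xs[i] = xs[m]; xs[m] = t;   /* "drop" the minimum */
    }
    return xs[k - 1];
}
```

This is exactly the first k steps of selection sort; for {3, 1, 4, 1, 5}, the 2nd smallest is 1 and the 4th smallest is 4.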
Another method is to use the heap data structure we've introduced. No
matter what concrete heap is used, e.g. binary heap with implicit array,
Fibonacci heap, or others, accessing the top element followed by popping is
typically bound to O(lg n) time. Thus this method, as formalized in equations
(14.1) and (14.2), performs in O(k lg n) time, if k is much smaller than n.

top(k, L) = find(k, heapify(L))   (14.1)

find(k, H) =
  top(H) : k = 1
  find(k − 1, pop(H)) : otherwise   (14.2)
However, heap adds some complexity to the solution. Is there any simple,
fast method to find the k-th element?
The divide and conquer strategy can help us. If we can divide all the elements
into two sub lists A and B, and ensure that every element in A is not greater than
any element in B, we can scale down the problem by the following method1:
1. Compare the length of sub list A with k;
2. If k < |A|, the k-th smallest one must be contained in A; we can drop B
and further search in A;
3. If |A| < k, the k-th smallest one must be contained in B; we can drop A
and further search for the (k − |A|)-th smallest one in B.
Note that the italic font emphasizes the fact of recursion. The ideal case
always divides the list into two equally big sub lists A and B, so that we can
halve the problem each time. Such ideal case leads to a performance of O(n)
linear time.
Thus the key problem is how to realize dividing, which collects the first m
smallest elements in one sub list, and put the rest in another.
This reminds us the partition algorithm in quick sort, which moves all the
elements smaller than the pivot in front of it, and moves those greater than
the pivot behind it. Based on this idea, we can develop a divide and conquer
k-selection algorithm, which is called quick selection algorithm.
1 This actually demands a more accurate definition of the k-th smallest in L: it's equal to
the k-th element of L′, where L′ is a permutation of L in monotonic non-decreasing order.
top(k, L) =
  l1 : |A| = k − 1
  top(k − 1 − |A|, B) : |A| < k − 1
  top(k, A) : otherwise   (14.3)

Where L = {l1} ∪ L′, and (A, B) = partition(p, L′) with the predicate p(x) comparing
x against the pivot l1. The partition function is defined as:

partition(p, L) =
  (∅, ∅) : L = ∅
  ({l1} ∪ A, B) : p(l1), where (A, B) = partition(p, L′)
  (A, {l1} ∪ B) : ¬p(l1)   (14.4)

This algorithm can be implemented in Haskell directly following these equations.
The average case analysis needs the tool of mathematical expectation. It's quite
similar to the proof given in the previous chapter for quick sort, and is left as an
exercise to the reader.
Similar to quick sort, this divide and conquer selection algorithm performs
well most of the time in practice. We can take the same engineering practices,
such as median-of-three, or randomly selecting the pivot, as we did for quick sort.
Below is the imperative realization for example.
1: function Top(k, A, l, u)
2:   Exchange A[l] ↔ A[Random(l, u)]   ▷ Randomly select in [l, u]
3:   p ← Partition(A, l, u)
4:   if p − l + 1 = k then
5:     return A[p]
6:   if k < p − l + 1 then
7:     return Top(k, A, l, p − 1)
8:   return Top(k − p + l − 1, A, p + 1, u)
This algorithm searches for the k-th smallest element in the range [l, u] of array
A. The boundaries are included. It first randomly selects a position and swaps
it with the first one; this element is then chosen as the pivot for partitioning.
The partition algorithm moves elements in place and returns the position p where
the pivot ends up. If the pivot is just located at position k, then we are
done. If there are more than k − 1 elements not greater than the pivot, the
algorithm recursively searches for the k-th smallest one in the range [l, p − 1];
otherwise, k is reduced by the number of elements before the pivot, and we
recursively search the range after the pivot, [p + 1, u].
There are many methods to realize the partition algorithm; the one below is
based on N. Lomuto's method. Other realizations are left as exercises to the
reader.
1: function Partition(A, l, u)
2:   p ← A[l]   ▷ the pivot
3:   L ← l
4:   for R ← l + 1 to u do
5:     if ¬(p < A[R]) then
6:       L ← L + 1
7:       Exchange A[L] ↔ A[R]
8:   Exchange A[l] ↔ A[L]   ▷ move the pivot to its final position
9:   return L
The ANSI C example program below implements this algorithm. Note that the
selection function handles the special cases that the array is empty or k is out of
the boundaries of the array; it returns -1 to indicate search failure.
int partition(Key* xs, int l, int u) {
    int r, p = l;
    for (r = l + 1; r < u; ++r)
        if (!(xs[p] < xs[r]))
            swap(xs, ++l, r);
    swap(xs, p, l);
    return l;
}

/* The result is stored in xs[k]; returns k if u - l >= k, otherwise -1 */
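A selection function consistent with the Top pseudocode above can be sketched as follows (the names are ours; it reuses a Lomuto-style partition and returns the final index of the k-th smallest, with k 1-based):

```c
#include <stdlib.h>

/* Lomuto partition with the pivot at xs[l]; returns its final index */
static int partition_lomuto(int* xs, int l, int u) {
    int p = l, r, t;
    for (r = l + 1; r < u; ++r)
        if (!(xs[p] < xs[r])) {           /* xs[r] <= pivot */
            ++l;
            t = xs[l]; xs[l] = xs[r]; xs[r] = t;
        }
    t = xs[p]; xs[p] = xs[l]; xs[l] = t;  /* pivot to final place */
    return l;
}

/* k-th smallest (1-based) of xs[l, u); returns its index, or -1 when
   k is out of range */
int top(int k, int* xs, int l, int u) {
    while (l < u) {
        int r = l + rand() % (u - l), t, p;
        t = xs[l]; xs[l] = xs[r]; xs[r] = t;   /* random pivot */
        p = partition_lomuto(xs, l, u);
        if (p - l + 1 == k)
            return p;
        if (p - l + 1 > k) {
            u = p;                    /* search before the pivot */
        } else {
            k -= p - l + 1;           /* search after the pivot */
            l = p + 1;
        }
    }
    return -1;
}
```

The loop shrinks the range [l, u) each round, so an out-of-range k eventually empties the range and yields -1.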
T(n) = T(n/5) + T(7n/10) + c1·n + c2·n   (14.5)

Where c1 and c2 are constant factors for the median-of-medians and the partition
computation respectively. Solving this equation with the telescope method or the
master theorem in [2] gives the linear O(n) performance.
In case we just want to pick the top k smallest elements, but don't care
about their order, the algorithm can be adjusted a little bit to fit.

tops(k, L) =
  ∅ : k = 0 ∨ L = ∅
  A : |A| = k
  A ∪ {l1} ∪ tops(k − |A| − 1, B) : |A| < k
  tops(k, A) : otherwise   (14.6)
Binary search
Another popular divide and conquer algorithm is binary search. We've shown
it in the chapter about insertion sort. When I was in school, the teacher who
taught math played a magic trick on me: he asked me to think of a natural number
less than 1000. Then he asked me some questions, to which I only replied yes or no,
and finally he guessed my number. He typically asked questions like the following:
Is it an even number?
Is it a prime number?
Are all digits the same?
Can it be divided by 3?
...
Most of the time he guessed the number within 10 questions. My classmates
and I all thought it was unbelievable.
This game would not be so interesting if it degraded to a popular TV program,
where the price of a product is hidden, and you must figure out the exact
price in 30 seconds. The host of the program tells you whether your guess is higher
or lower than the actual price. If you win, the product is yours. The best strategy
is to use a similar divide and conquer approach to perform a binary search. So it's
common to hear such a conversation between the player and the host:
P: 1000;
H: High;
P: 500;
H: Low;
P: 750;
H: Low;
P: 890;
H: Low;
P: 990;
H: Bingo.
My math teacher told us that because the number we considered is within 1000, if he can halve the candidates every time by designing good questions, the number will be found within 10 questions, since 2^10 = 1024 > 1000. However, it would be boring to just ask "is it greater than 500?", "is it less than 250?", and so on. The question "is it even?" is actually very good, because it always halves the candidates2.
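The halving strategy can be written out as a tiny Python helper (a hypothetical sketch, not from the text), asking only "is it greater than m?" questions:

```python
def guess(secret, lo=1, hi=1000):
    """Find secret in [lo, hi] by repeatedly halving the candidate range."""
    questions = 0
    while lo < hi:
        m = (lo + hi) // 2
        questions += 1          # one "is it greater than m?" question
        if secret > m:
            lo = m + 1
        else:
            hi = m
    return lo, questions
```

Since 2^10 = 1024 > 1000, the loop never asks more than 10 questions for this range.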
2 When the author revised this chapter, Microsoft released a game on social networks: the user thinks of a person's name, an AI robot then asks up to 16 questions, which the user answers only with yes or no, and the robot tells who that person is. Can you figure out how the robot works?
bsearch(x, L) =
    Err : L = ∅
    b1 : x = b1, (A, B) = splitAt(⌊|L|/2⌋, L)
    bsearch(x, A) : B = ∅ ∨ x < b1
    bsearch(x, B') : otherwise
Where b1 is the first element of B if B isn't empty, and B' holds the rest except for b1. The splitAt function takes O(n) time to divide the list into the two sub-lists A and B (see appendix A, and the chapter about merge sort, for details). If B isn't empty and x is equal to b1, the search returns; otherwise, if x is less than b1, as the list is sorted, we recursively search in A; otherwise we search in B'. If the list is empty, we raise an error to indicate search failure.
As we always split the list at the middle point, the number of elements halves in each recursion. In every recursive call we take linear time for splitting; the splitting function only traverses the first half of the linked list. Thus the total time can be expressed as:

T(n) = c·n/2 + c·n/4 + c·n/8 + ...

This results in O(n) time, which is the same as the brute-force search from head to tail:

search(x, L) =
    Err : L = ∅
    l1 : x = l1
    search(x, L') : otherwise

Binary search can also be applied to solve equations over a monotonic function, for example finding the natural number x satisfying a^x = y for given a and y. A brute-force solution examines the candidates X = {0, 1, 2, ...} one by one:

search(y, X) =
    x1 : a^x1 = y
    search(y, X') : a^x1 < y
    Err : otherwise

This function examines the solution domain in monotonic increasing order. It takes the first candidate element x1 from X and compares a^x1 with y. If they are equal, then x1 is the solution and we are done; if it is less than y, then x1 is dropped, and we search among the rest of the elements, represented as X'; otherwise, since the values only increase further, no solution exists and an error is raised.
3 Some readers may argue that an array should be used instead of a linked list, for example in Haskell. This book only deals with purely functional sequences such as the finger tree; different from the Haskell array, it can't support constant-time random access.
bsearch(f, y, l, u) =
    Err : u < l
    m : f(m) = y, m = ⌊(l + u)/2⌋
    bsearch(f, y, l, m − 1) : f(m) > y
    bsearch(f, y, m + 1, u) : f(m) < y
(14.7)
As we halve the solution domain in every recursion, this method finds the solution within O(log y) evaluations of f. It is much faster than brute-force searching.
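Equation (14.7) can be sketched iteratively in Python (an illustrative version; f is assumed monotonically increasing over the integer range):

```python
def bsearch(f, y, l, u):
    """Find m in [l, u] with f(m) == y, or None when no such m exists."""
    while l <= u:
        m = (l + u) // 2
        v = f(m)
        if v == y:
            return m
        if v < y:
            l = m + 1   # the solution, if any, lies above m
        else:
            u = m - 1   # the solution, if any, lies below m
    return None
```

For example, solving 2^x = 1024 over [0, 1024] takes about log2(1024) probes instead of 1024.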
2 dimensions search

It's quite natural to think that the idea of binary search can be extended to 2 dimensions, or an even more general multiple-dimension domain. However, it is not so easy.

Consider the example of an m × n matrix M. The elements in each row and each column are in strict increasing order. Figure 14.1 illustrates such a matrix.
1   2   3   4   ...
2   4   5   6   ...
3   5   7   8   ...
4   6   8   9   ...
... ... ... ...

Figure 14.1: A matrix in strict increasing order for each row and column.
Given a value x, how do we quickly locate all elements equal to x in the matrix? We need to develop an algorithm which returns a list of locations (i, j) such that Mi,j = x.
4 One alternative is to reuse the result of a^n when computing a^(n+1) = a·a^n. Here we consider the general form of a monotonic function f(n).
Richard Bird in [1] mentioned that he used this problem to interview candidates for entry to Oxford. The interesting observation was that those who had some computer background at school tended to use binary search — but it's easy to get stuck.
The usual way to follow the binary search idea is to examine the element at M(m/2, n/2). If it is less than x, we can only drop the elements in the top-left area; if it is greater than x, only the bottom-right area can be dropped. Both cases are illustrated in figure 14.2; the gray areas indicate elements that can be dropped.
Figure 14.2: Left: the middle point element is smaller than x. All elements in
the gray area are less than x; Right: the middle point element is greater than
x. All elements in the gray area are greater than x.
The problem is that the solution domain changes from a rectangle to an L-shape in both cases, so we can't just apply the search recursively. In order to solve this problem systematically, we define the problem more generally, using brute-force search as a starting point, and keep improving it bit by bit.

Consider a function f(x, y) which is strictly increasing in both of its arguments, for instance f(x, y) = a^x + b^y, where a and b are natural numbers. Given a value z, which is a natural number too, we want to solve the equation f(x, y) = z by finding all non-negative integral candidate pairs (x, y).
With this definition, the matrix search problem can be specialized by the function below.

f(x, y) =
    Mx,y : 1 ≤ x ≤ m, 1 ≤ y ≤ n
    −1 : otherwise
Brute-force 2D search

Since all solutions to f(x, y) = z should be found, one can immediately give a brute-force solution with nested loops.

1: function Solve(f, z)
2:     A ← ∅
3:     for x ∈ {0, 1, 2, ..., z} do
4:         for y ∈ {0, 1, 2, ..., z} do
5:             if f(x, y) = z then
6:                 A ← A ∪ {(x, y)}
7:     return A

The same definition can be expressed as:

solve(f, z) = {(x, y) | x ∈ {0, 1, ..., z}, y ∈ {0, 1, ..., z}, f(x, y) = z}
(14.8)
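Equation (14.8) translates almost verbatim into Python; this is an illustrative sketch (the name `solve_brute` is an assumption):

```python
def solve_brute(f, z):
    """Enumerate every candidate pair in [0, z] x [0, z] (eq. 14.8)."""
    return [(x, y)
            for x in range(z + 1)
            for y in range(z + 1)
            if f(x, y) == z]
```

For f(x, y) = 3^x + 2^y and z = 17 it finds (0, 4) and (2, 3), at the cost of (z + 1)^2 evaluations of f.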
Saddleback search

We haven't utilized the fact that f(x, y) is strictly increasing yet. Dijkstra pointed out in [6] that, instead of searching from the bottom-left corner, starting from the top-left leads to an effective solution. As illustrated in figure 14.3, the search starts from (0, z); for every point (p, q), we compare f(p, q) with z:

- If f(p, q) < z, since f is strictly increasing, f(p, y) < z for all 0 ≤ y < q. We can drop all points in the vertical line section (in red color);
- If f(p, q) > z, then f(x, q) > z for all p < x ≤ z. We can drop all points in the horizontal line section (in blue color);
- Otherwise, if f(p, q) = z, we mark (p, q) as one solution; then both line sections can be dropped.

This is a systematic way to scale down the solution domain rectangle: we keep dropping a row, a column, or both.
This can be formalized as solve(f, z) = search(f, z, 0, z), where:

search(f, z, p, q) =
    ∅ : p > z ∨ q < 0
    search(f, z, p + 1, q) : f(p, q) < z
    search(f, z, p, q − 1) : f(p, q) > z
    {(p, q)} ∪ search(f, z, p + 1, q − 1) : otherwise
(14.9)

The first clause is the edge case: there is no solution if (p, q) isn't top-left of (z, 0).
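The scan of equation (14.9) can be sketched iteratively in Python (illustrative; f is assumed strictly increasing in both arguments):

```python
def saddleback(f, z):
    """Scan from the top-left corner (0, z), dropping a row, a column, or both."""
    res, p, q = [], 0, z
    while p <= z and q >= 0:
        v = f(p, q)
        if v < z:
            p += 1      # column p holds no solution at or below (p, q)
        elif v > z:
            q -= 1      # row q holds no solution at or right of (p, q)
        else:
            res.append((p, q))
            p, q = p + 1, q - 1
    return res
```
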
In the worst case, the algorithm takes 2(z + 1) steps of searching. There are three best cases. The first happens when, in every iteration, both p and q advance by one, so that only z + 1 steps are needed; the second keeps advancing horizontally to the right and ends when p exceeds z; the last is similar: it keeps moving down vertically until q becomes negative.

Figure 14.4 illustrates the best cases and the worst case. Figure 14.4 (a) is the case where every point (x, z − x) on the diagonal satisfies f(x, z − x) = z; it takes z + 1 steps to arrive at (z, 0). (b) is the case where every point (x, z) along the top horizontal line gives f(x, z) < z; the algorithm takes z + 1 steps to finish. (c) is the case where every point (0, x) along the left vertical line gives f(0, x) > z; the algorithm again takes z + 1 steps. (d) is the worst case: projecting all the horizontal sections along the search path onto the x axis, and all the vertical sections onto the y axis, gives a total of 2(z + 1) steps.
bsearch(f, y, l, u) =
    l : u ≤ l
    m : f(m) ≤ y < f(m + 1), m = ⌊(l + u)/2⌋
    bsearch(f, y, m + 1, u) : f(m) ≤ y
    bsearch(f, y, l, m − 1) : otherwise
(14.11)

The first clause handles the edge case of an empty range; the lower boundary is returned in that case. If the middle point produces a value less than or equal to the target, while the next point evaluates to a bigger value, then the middle point is what we are looking for. Otherwise, if the point next to the middle also evaluates to a value not greater than the target, the lower bound is set to the middle point plus one, and we recursively perform binary search. In the last case, the middle point evaluates to a value greater than the target, so the upper bound is updated to the point preceding the middle for further recursive searching.

The following Haskell example code implements this modified binary search.

bsearch f y (l, u) | u <= l = l
                   | f m <= y = if f (m + 1) <= y
                                then bsearch f y (m + 1, u) else m
                   | otherwise = bsearch f y (l, m - 1)
    where m = (l + u) `div` 2

With it, the boundaries m and n of the search domain can be determined:

m = bsearch(λy · f(0, y), z, (0, z))
n = bsearch(λx · f(x, 0), z, (0, z))
(14.12)
And the improved saddleback search shrinks to this new search domain, solve(f, z) = search(f, z, 0, m):

search(f, z, p, q) =
    ∅ : p > n ∨ q < 0
    search(f, z, p + 1, q) : f(p, q) < z
    search(f, z, p, q − 1) : f(p, q) > z
    {(p, q)} ∪ search(f, z, p + 1, q − 1) : otherwise
(14.13)

It's almost the same as the basic saddleback version, except that it stops when p exceeds n rather than z. In a real implementation, the result of f(p, q) can be calculated once and stored in a variable, as shown in the following Haskell example.
solve f z = search 0 m where
    search p q | p > n || q < 0 = []
               | z' < z = search (p + 1) q
               | z' > z = search p (q - 1)
               | otherwise = (p, q) : search (p + 1) (q - 1)
        where z' = f p q
    m = bsearch (f 0) z (0, z)
    n = bsearch (\x -> f x 0) z (0, z)
This improved saddleback search first performs two rounds of binary search to find the proper m and n, each round bound by O(lg z) evaluations of f. After that, it takes O(m + n) time in the worst case, and O(min(m, n)) time in the best case. The overall performance is given in the following table.

case       | times of evaluation of f
worst case | 2 log z + m + n
best case  | 2 log z + min(m, n)

For a function like f(x, y) = a^x + b^y with positive integers a and b, m and n will be relatively small, so that the performance is close to O(lg z).
This algorithm can also be realized in an imperative approach. Firstly, the binary search should be modified.

1: function Binary-Search(f, y, (l, u))
2:     while l < u do
3:         m ← ⌊(l + u)/2⌋
4:         if f(m) ≤ y then
5:             if y < f(m + 1) then
6:                 return m
7:             l ← m + 1
8:         else
9:             u ← m
10:    return l
Using this algorithm, the boundaries m and n can be found before performing the saddleback search.
1: function Solve(f, z)
2:     m ← Binary-Search(λy · f(0, y), z, (0, z))
3:     n ← Binary-Search(λx · f(x, 0), z, (0, z))
4:     p ← 0, q ← m
5:     S ← ∅
6:     while p ≤ n ∧ q ≥ 0 do
7:         z' ← f(p, q)
8:         if z' < z then
9:             p ← p + 1
10:        else if z' > z then
11:            q ← q − 1
12:        else
13:            S ← S ∪ {(p, q)}
14:            p ← p + 1, q ← q − 1
15:    return S
The implementation is left as an exercise to the reader.
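As one possible solution of that exercise, the pseudocode above can be written out in Python (a hedged sketch, not the book's reference solution; the helper names are assumptions):

```python
def binary_search_bound(f, y, l, u):
    """Find m in [l, u) with f(m) <= y < f(m + 1); return l for an empty range."""
    while l < u:
        m = (l + u) // 2
        if f(m) <= y:
            if y < f(m + 1):
                return m
            l = m + 1
        else:
            u = m
    return l

def solve(f, z):
    """Saddleback search restricted to the binary-searched bounds m and n."""
    m = binary_search_bound(lambda y: f(0, y), z, 0, z)
    n = binary_search_bound(lambda x: f(x, 0), z, 0, z)
    p, q, res = 0, m, []
    while p <= n and q >= 0:
        v = f(p, q)
        if v < z:
            p += 1
        elif v > z:
            q -= 1
        else:
            res.append((p, q))
            p, q = p + 1, q - 1
    return res
```
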
More improvement to saddleback search

In figure 14.2, two cases are shown for comparing the value of the middle point in a matrix with the given value: in one case the center value is smaller than the given value, in the other it is bigger. In both cases we can only throw away 1/4 of the candidates, and an L-shape is left for further searching.
Figure 14.9: Recursively search the gray areas; the bold line should be included if f(p, q) = z.

search(a,b),(c,d) =
    ∅ : c < a ∨ d < b
    csearch : c − a < b − d
    rsearch : otherwise
(14.14)
Figure 14.10: Edge cases when performing binary search in the center line.
csearch =
    search(p,q−1),(c,d) : z < f(p, q)
    search(a,b),(p−1,q+1) ∪ {(p, q)} ∪ search(p+1,q−1),(c,d) : f(p, q) = z
    search(a,b),(p,q+1) ∪ search(p+1,q−1),(c,d) : otherwise
(14.15)

Where

q = ⌊(b + d)/2⌋
p = bsearch(λx · f(x, q), z, (a, c))
Function rsearch is quite similar except that it searches in the center horizontal line.
rsearch =
    search(a,b),(p−1,q) : z < f(p, q)
    search(a,b),(p−1,q+1) ∪ {(p, q)} ∪ search(p+1,q−1),(c,d) : f(p, q) = z
    search(a,b),(p−1,q+1) ∪ search(p+1,q),(c,d) : otherwise
(14.16)

Where

p = ⌊(a + c)/2⌋
q = bsearch(λy · f(p, y), z, (d, b))
The following Haskell program implements this algorithm.

search f z (a, b) (c, d) | c < a || b < d = []
    | c - a < b - d = let q = (b + d) `div` 2 in
                      csearch (bsearch (\x -> f x q) z (a, c), q)
    | otherwise = let p = (a + c) `div` 2 in
                  rsearch (p, bsearch (f p) z (d, b))
    where
      csearch (p, q) | z < f p q = search f z (p, q - 1) (c, d)
                     | f p q == z = search f z (a, b) (p - 1, q + 1) ++
                                    (p, q) : search f z (p + 1, q - 1) (c, d)
                     | otherwise = search f z (a, b) (p, q + 1) ++
                                   search f z (p + 1, q - 1) (c, d)
      rsearch (p, q) | z < f p q = search f z (a, b) (p - 1, q)
                     | f p q == z = search f z (a, b) (p - 1, q + 1) ++
                                    (p, q) : search f z (p + 1, q - 1) (c, d)
                     | otherwise = search f z (a, b) (p - 1, q + 1) ++
                                   search f z (p + 1, q) (c, d)
And the main program calls this function after performing binary search on the X and Y axes.

solve f z = search f z (0, m) (n, 0) where
    m = bsearch (f 0) z (0, z)
    n = bsearch (\x -> f x 0) z (0, z)
Let m = 2^i and n = 2^j. The recurrence unrolls to:

T(2^i, 2^j) = j + 2T(2^(i−1), 2^(j−1))
            = Σ(k=0..i−1) 2^k (j − k)
            = O(2^i (j − i))
            = O(m log(n/m))
(14.18)
Richard Bird proved that this is asymptotically optimal, by a lower bound on searching for a given value in an m × n rectangle [1].

The imperative algorithm is almost the same as the functional version; we skip it for the sake of brevity.
Exercise 14.1

- Prove that the average case for the divide and conquer solution to the k-selection problem is O(n). Please refer to the previous chapter about quick sort.
- Implement the imperative k-selection problem with 2-way partition and median-of-three pivot selection.
- Implement the imperative k-selection problem so that it handles duplicated elements effectively.
- Realize the median-of-median k-selection algorithm and implement it in your favorite programming language.
- The tops(k, L) algorithm uses list concatenation like A ∪ {l1} ∪ tops(k − |A| − 1, B). This is a linear operation, proportional to the length of the list being concatenated. Modify the algorithm so that the sub-lists are concatenated in one pass.
- The author considered another divide and conquer solution for the k-selection problem. It finds the maximum of the first k elements and the minimum of the rest. Denote them as x and y. If x is smaller than y, it means that all of the first k elements are smaller than the rest, so they are exactly the top k smallest; otherwise, some elements in the first k should be swapped.
1: procedure Tops(k, A)
2:     l ← 1
3:     u ← |A|
4:     loop
5:         i ← Max-At(A[l..k])
6:         j ← Min-At(A[k + 1..u])
7:         if A[i] < A[j] then
8:             break
9:         Exchange A[l] ↔ A[j]
10:        Exchange A[k + 1] ↔ A[i]
11:        l ← Partition(A, l, k)
12:        u ← Partition(A, k + 1, u)
14.2.2
Information reuse
reuse information as little as possible. After that, two popular string matching algorithms, the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm, will be introduced.
Boyer-Moore majority number

Voting is critical to people. We use voting to choose a leader, make a decision, or reject a proposal. In the months when this chapter was written, three countries in the world voted for their presidents, and all three used computers to calculate the results.

Suppose a country on a small island wants a new president. According to the constitution, only a candidate who wins more than half of the votes can be selected as president. Given a series of votes, such as A, B, A, C, B, B, D, ..., can we develop a program that tells who the new president is, if there is one, or indicates that nobody wins more than half of the votes?
Of course this problem can be solved by brute force using a map, as we did in the chapter about binary search trees5.
template<typename T>
T majority(const T* xs, int n, T fail) {
    map<T, int> m;
    int i, max = 0;
    T r;
    for (i = 0; i < n; ++i)
        ++m[xs[i]];
    for (typename map<T, int>::iterator it = m.begin(); it != m.end(); ++it)
        if (it->second > max) {
            max = it->second;
            r = it->first;
        }
    return max * 2 > n ? r : fail;
}
This program first scans the votes and accumulates the number of votes for each individual with a map. After that, it traverses the map to find the one with the most votes. If that number is bigger than half, the winner is found; otherwise, it returns a special value to indicate failure.
The following pseudo code describes this algorithm.

1: function Majority(A)
2:     M ← empty map
3:     for a ∈ A do
4:         Put(M, a, 1 + Get(M, a))
5:     max ← 0, m ← NIL
6:     for (k, v) ∈ M do
7:         if max < v then
8:             max ← v, m ← k
9:     if max > 50% · |A| then
10:        return m
11:    else
12:        return fail

5 There is a probabilistic sub-linear space counting algorithm published in 2004, named the Count-min sketch [8].
For m individuals and n votes, this program first takes about O(n log m) time to build the map if the map is implemented with a self-balanced tree (a red-black tree, for instance), or about O(n) time if the map is hash-table based. However, the hash table needs more space. Next, the program takes O(m) time to traverse the map and find the majority vote. The following table lists the time and space performance for the different maps.

map                | time       | space
self-balanced tree | O(n log m) | O(m)
hashing            | O(n)       | O(m) at least
Boyer and Moore invented a clever algorithm in 1980, which can pick the majority element, if there is one, with only one scan. Their algorithm needs only O(1) space [7].

The idea is to record the first candidate as the winner so far, and mark him with 1 vote. During the scan process, if the currently selected winner gets another vote, we just increase the vote counter; otherwise, it means somebody votes against this candidate, so the vote counter is decreased by one. If the vote counter becomes zero, this candidate has been voted out; we select the next candidate as the new winner and repeat the scanning process.
Suppose there is a series of votes: A, B, C, B, B, C, A, B, A, B, B, D, B. The table below illustrates the steps of this processing; each row shows the vote being scanned, and the recorded winner and counter after that vote.

vote scanned | winner | count
A            | A      | 1
B            | A      | 0
C            | C      | 1
B            | C      | 0
B            | B      | 1
C            | B      | 0
A            | A      | 1
B            | A      | 0
A            | A      | 1
B            | A      | 0
B            | B      | 1
D            | B      | 0
B            | B      | 1
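The stepping above, followed by a verifying pass, can be reproduced with a short Python sketch (illustrative, not the book's code; votes is a list):

```python
def majority(votes, fail=None):
    """Boyer-Moore vote: one scan to pick a candidate, one scan to verify it."""
    c, x = 0, None
    for v in votes:
        if c == 0:
            x = v                # the previous candidate was voted out
        c += 1 if v == x else -1
    # Second pass: the candidate is only valid if it truly has > 50% of votes.
    return x if 2 * votes.count(x) > len(votes) else fail
```
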
The key point is that if there exists a majority greater than 50%, it can't be voted out by all the others. However, if no candidate wins more than half of the votes, the recorded winner is invalid. Thus it is necessary to perform a second round of scanning for verification.
1: function Majority(A)
2:
c0
3:
for i 1 to |A| do
4:
if c = 0 then
5:
x A[i]
6:
if A[i] = x then
7:
cc+1
8:
else
9:
cc1
12:
455
return x
If there is a majority element, this algorithm takes one pass to scan the votes. In every iteration, it either increases or decreases the counter, according to whether the vote supports or is against the current selection. If the counter becomes zero, the current selection is voted out, and a new one is selected as the updated candidate for further scanning.

The process takes linear O(n) time, and the space needed is just two variables: one records the selected candidate so far, the other is for vote counting.

Although this algorithm can find the majority element if there is one, it still picks an element even if there isn't. The following modified algorithm verifies the final result with another round of scanning.
1: function Majority(A)
2:     c ← 0
3:     for i ← 1 to |A| do
4:         if c = 0 then
5:             x ← A[i]
6:         if A[i] = x then
7:             c ← c + 1
8:         else
9:             c ← c − 1
10:    c ← 0
11:    for i ← 1 to |A| do
12:        if A[i] = x then
13:            c ← c + 1
14:    if c > 50% · |A| then
15:        return x
16:    else
17:        return fail
Even with this verification process, the algorithm is still bound to O(n) time, and the space needed is constant. The following ISO C++ program implements this algorithm.
template<typename T>
T majority(const T* xs, int n, T fail) {
    T m;
    int i, c;
    for (i = 0, c = 0; i < n; ++i) {
        if (!c)
            m = xs[i];
        c += xs[i] == m ? 1 : -1;
    }
    for (i = 0, c = 0; i < n; ++i)
        c += xs[i] == m;
    return c * 2 > n ? m : fail;
}
The functional version records the candidate c selected so far, and a counter n. For a non-empty list L, we initialize c as the first vote l1 and set the counter to 1 to start the algorithm: maj(l1, 1, L'), where L' holds the rest of the votes except for l1. Below is the definition of this function.

maj(c, n, L) =
    c : L = ∅
    maj(c, n + 1, L') : l1 = c
    maj(l1, 1, L') : n = 0 ∧ l1 ≠ c
    maj(c, n − 1, L') : otherwise
(14.19)
We also need to define a function to verify the result. The idea is that, if the list of votes is empty, the final result is a failure; otherwise, we run the Boyer-Moore algorithm to find a candidate c, then scan the list again to count the total votes c wins, and verify that this number is more than half.

majority(L) =
    fail : L = ∅
    c : c = maj(l1, 1, L'), |{x | x ∈ L, x = c}| > 50% · |L|
    fail : otherwise
(14.20)
Below Haskell example code implements this algorithm.

majority :: (Eq a) => [a] -> Maybe a
majority [] = Nothing
majority (x:xs) = let m = maj x 1 xs in verify m (x:xs)

maj c n [] = c
maj c n (x:xs) | c == x = maj c (n+1) xs
               | n == 0 = maj x 1 xs
               | otherwise = maj c (n-1) xs

verify m xs = if 2 * (length $ filter (==m) xs) > length xs
              then Just m else Nothing
7:         m ← Max(m, s)
8:     return m

The brute-force algorithm does not reuse any information from the previous search. Similar to the Boyer-Moore majority vote algorithm, we can record the maximum sum ending at the position being scanned; of course, we also need to record the biggest sum found so far. The invariant kept during the scan is: A is the best sum found so far, and B is the maximum sum of segments ending at the current position i.

maxsum(A, B, L) =
    A : L = ∅
    maxsum(A', B', L') : otherwise
(14.21)

Where

B' = max(l1 + B, 0)
A' = max(A, B')
KMP

String matching is another important type of searching. Almost all software editors are equipped with tools to find strings in text. In the chapters about Trie, Patricia, and suffix trees, we introduced some powerful data structures which can help to search strings. In this section, we introduce two other string matching algorithms, both based on information reuse.

Some programming environments provide built-in string search tools; however, most of them are brute-force solutions, including the strstr function in the ANSI C standard library, find in the C++ standard template library, indexOf in the Java Development Kit, etc. Figure 14.12 illustrates how such character-by-character comparison works.
(Figure 14.12: matching the pattern ananym against a text, character by character. (a) The offset s = 4; after matching q = 4 characters, the 5th mismatches. (b) Move s = 4 + 2 = 6 directly.)
we can increase s by more than one. This is because we already know that the first four characters anan have been matched, and the failure happens at the 5th position. Observe that the two-letter prefix an of the pattern string is also a suffix of anan, which we have matched so far. A more effective way is to shift s by two instead of one, as shown in figure 14.12 (b). By this means, we reuse the information that 4 characters have already been matched, which helps us to skip as many invalid positions as possible.

Knuth, Morris and Pratt presented this idea in [9] and developed a novel string matching algorithm. The algorithm is known as KMP, after the three authors' initials.
For the sake of brevity, we denote the first k characters of text T as Tk; that is, Tk is the k-character prefix of T.

The key point to shifting s effectively is to find a function of q, where q is the number of characters matched successfully. For instance, q is 4 in figure 14.12 (a), as the 5th character doesn't match.

Consider in what situation we can shift s by more than 1. As shown in figure 14.13, if we can shift the pattern P ahead, there must exist some k such that the first k characters are the same as the last k characters of Pq; in other words, the prefix Pk is a suffix of Pq.
(Figure 14.13: the k-character prefix Pk of the pattern is also a suffix of the matched part Pq.)

Pk is a suffix of Pq, k < q
(14.22)
The following table lists the steps of building the prefix function for the pattern string ananym. Note that the k in the table actually means the maximum k satisfying equation (14.22).
q | Pq     | k | Pk
1 | a      | 0 |
2 | an     | 0 |
3 | ana    | 1 | a
4 | anan   | 2 | an
5 | anany  | 0 |
6 | ananym | 0 |
Translating the KMP algorithm to Python gives the example code below.

def kmp_match(w, p):
    n, m = len(w), len(p)
    fallback = fprefix(p)
    k = 0  # number of pattern characters matched so far
    res = []
    for i in range(n):
        while k > 0 and p[k] != w[i]:
            k = fallback[k]  # fall back
        if p[k] == w[i]:
            k = k + 1
        if k == m:
            res.append(i + 1 - m)
            k = fallback[k]  # look for the next (possibly overlapping) match
    return res

def fprefix(p):
    m = len(p)
    t = [0] * (m + 1)  # fallback table; t[i] is the border length of p[:i]
    k = 0
    for i in range(2, m + 1):
        while k > 0 and p[i - 1] != p[k]:
            k = t[k]  # fall back
        if p[i - 1] == p[k]:
            k = k + 1
        t[i] = k
    return t
The KMP algorithm builds the prefix function for the pattern string as a kind of pre-processing before the search. Because of this, it can reuse as much information from the previous matching as possible.

The amortized performance of building the prefix function is O(m). This can be proved with the potential method, as in [2]. Using a similar method, it can be proved that the matching algorithm itself is also linear. Thus the total performance is O(m + n), at the expense of O(m) space to record the prefix function table.

It seems that varying the pattern string would affect the performance of KMP. Consider the case of finding the pattern string aaa...a of length m in a text aaa...a of length n. All the characters are the same; when the last character in the pattern is examined, we can only fall back by 1, and this one-character fallback repeats until it reaches zero. Even in this extreme case, the KMP algorithm still holds its linear performance (why?). Please try to consider more cases, such as P = aaaa...b, T = aaaa...a, and so on.
(Figure: the pattern P[1..m] aligned with the text T[1..n] at offset s.)
T = Tp ∪ Ts = reverse(reverse(Tp)) ∪ Ts
P = Pp ∪ Ps = reverse(reverse(Pp)) ∪ Ps
(14.23)
7 Again, we don't use a native array, even though it is supported in some functional programming environments like Haskell.
Denote T̄p = reverse(Tp) and P̄p = reverse(Pp). The idea is to use the pairs (T̄p, Ts) and (P̄p, Ps) instead. With this change, if t = p, we can update the prefix part fast, in constant time:

T̄p' = cons(t, T̄p)
P̄p' = cons(p, P̄p)
(14.24)
The KMP matching algorithm starts by initializing the successfully matched prefix parts to empty strings:

search(P, T) = kmp(π, (∅, P), (∅, T))
(14.25)

Where π is the prefix function we explained before. The core part of the KMP algorithm, except for the prefix function building, can be defined as follows.

kmp(π, (Pp, Ps), (Tp, Ts)) =
    {|Tp|} : Ps = ∅ ∧ Ts = ∅
    ∅ : Ps ≠ ∅ ∧ Ts = ∅
    {|Tp|} ∪ kmp(π, π(Pp, Ps), (Tp, Ts)) : Ps = ∅ ∧ Ts ≠ ∅
    kmp(π, (Pp ∪ {p}, Ps'), (Tp ∪ {t}, Ts')) : t = p
    kmp(π, (Pp, Ps), (Tp ∪ {t}, Ts')) : t ≠ p ∧ Pp = ∅
    kmp(π, π(Pp, Ps), (Tp, Ts)) : t ≠ p ∧ Pp ≠ ∅
(14.26)

Where t is the first character of Ts (Ts = cons(t, Ts')) and p is the first character of Ps (Ps = cons(p, Ps')).
The first clause states that if the scan successfully ends at both the pattern and the text strings, we get a solution, and the algorithm terminates. Note that we use the right position in the text string as the matching point; it's easy to get the left position by subtracting the length of the pattern string. For the sake of brevity, we use the right position in the functional solutions.

The second clause states that if the scan arrives at the end of the text string while there are still characters of the pattern that haven't been matched, there is no solution, and the algorithm terminates.

The third clause states that if all the characters in the pattern string have been successfully matched while there are still characters of the text that haven't been examined, we get a solution, and we fall back by calling the prefix function π to go on searching for other solutions.

The fourth clause deals with the case where the next characters in the pattern string and the text are the same. In such a case, the algorithm advances one character ahead and recursively performs searching.

If the next characters are not the same and this is the first character in the pattern string, we just need to advance to the next character in the text and try again. Otherwise, if this isn't the first character in the pattern, we call the prefix function π to fall back, and try again.
The brute-force way to build the prefix function is just to follow the definition in equation (14.22).

π(Pp, Ps) = (Pp', Ps')

Where

Pp' = longest({s | s ∈ prefixes(Pp), s is a suffix of Pp})
Ps' = P − Pp'
(14.27)
Every time the fallback position is calculated, the algorithm naively enumerates all prefixes of Pp, checks whether each is also a suffix of Pp, and then picks the longest one as the result. Note that we reuse the subtraction symbol here for the list difference operation.

There is a tricky case which should be avoided: any string is both a prefix and a suffix of itself, so we shouldn't enumerate Pp itself as a candidate prefix. One solution for such prefix enumeration can be realized as the following.
prefixes(L) =
    {∅} : L = ∅ ∨ |L| = 1
    cons(∅, map(s ↦ cons(l1, s), prefixes(L'))) : otherwise
(14.28)
This version not only performs poorly, but is also complex. We can simplify it a bit. Observing that KMP matching is a scan process from left to right over the text, it can be represented with folding (refer to Appendix A for details). Firstly, we can augment each character with an index for folding:

zip(T, {1, 2, ...})
(14.29)

Zipping the text string with the infinite natural numbers gives a list of pairs. For example, the text string "The quick brown fox jumps over the lazy dog" turns into (T, 1), (h, 2), (e, 3), ..., (o, 42), (g, 43).
The initial state for folding contains two parts. One is the pair of pattern (Pp, Ps), with the prefix starting from empty and the suffix being the whole pattern string: (∅, P). For illustration purposes only, we revert back to normal pairs rather than the (P̄p, Ps) notation; it can easily be replaced with the reversed form in the finalized version, which is left as an exercise to the reader. The other part is a
list of positions where successful matches are found; it starts from the empty list. After the folding finishes, this list contains all solutions, and what we need is to extract it from the final state. The core KMP search algorithm is simplified like this:

kmp(P, T) = snd(fold(search, ((∅, P), ∅), zip(T, {1, 2, ...})))
(14.30)
The only black box is the search function, which takes a state and a pair of character and index, and returns a new state as result. Denote the first character in Ps as p and the rest of the characters as Ps' (Ps = cons(p, Ps')); we have the following definition.

search(((Pp, Ps), L), (c, i)) =
    ((Pp ∪ {p}, Ps'), L ∪ {i}) : p = c ∧ Ps' = ∅
    ((Pp ∪ {p}, Ps'), L) : p = c ∧ Ps' ≠ ∅
    ((Pp, Ps), L) : Pp = ∅
    search((π(Pp, Ps), L), (c, i)) : otherwise
(14.31)
If the first character in Ps matches the current character c during the scan, we further check whether all the characters in the pattern have been examined; if so, we have successfully found a solution, and the position i is recorded in the list L. Otherwise, we advance one character ahead and go on. If p does not match c, we need to fall back for a retry. However, there is an edge case where we can't fall back any more: Pp is empty. In this case we do nothing but keep the current state.
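The fold-based search of equations (14.30) and (14.31) can be sketched in Python with functools.reduce. This is an illustrative version, not the book's code: the fallback recomputes the longest border naively, the pattern is assumed non-empty, and positions are reported as left, 0-based indices instead of the right positions used above.

```python
from functools import reduce

def kmp_positions(p, t):
    """Fold over the indexed text, threading the state ((matched, rest), positions)."""
    def fallback(pp):
        # Longest proper prefix of pp that is also a suffix of pp,
        # paired with the corresponding rest of the pattern.
        a = pp[:-1]
        while a and not pp.endswith(a):
            a = a[:-1]
        return (a, p[len(a):])
    def step(state, ci):
        (pp, ps), res = state
        c, i = ci
        if ps[0] == c:                        # characters match: advance
            pp, ps = pp + c, ps[1:]
            if not ps:                        # whole pattern matched
                return (fallback(pp), res + [i - len(p)])
            return ((pp, ps), res)
        if not pp:                            # cannot fall back any further
            return ((pp, ps), res)
        return step((fallback(pp), res), ci)  # fall back, retry same character
    init = (("", p), [])
    return reduce(step, [(c, i + 1) for i, c in enumerate(t)], init)[1]
```
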
The prefix function π developed so far can also be improved a bit. Since we want to find the longest prefix of Pp which is also its suffix, we can scan from right to left instead. For any non-empty list L, denote the first element as l1, and all the rest except for the first one as L'. Define a function init(L), which returns all the elements except for the last one:

init(L) =
    ∅ : |L| = 1
    cons(l1, init(L')) : otherwise
(14.32)
Note that this function cannot handle the empty list. The idea of scanning Pp from right to left is to first check whether init(Pp) is a suffix of Pp; if yes, we are done. Otherwise, we examine init(init(Pp)), and repeat this until the leftmost position. Based on this idea, the prefix function can be modified as the following.
π(Pp, Ps) =
    (Pp, Ps) : Pp = ∅
    fallback(init(Pp), cons(last(Pp), Ps)) : otherwise
(14.33)
Where

fallback(A, B) =
    (A, B) : A is a suffix of Pp
    (init(A), cons(last(A), B)) : otherwise
(14.34)
Note that fallback always terminates, because the empty string is a suffix of any string. The last(L) function returns the last element of a list; it is also a linear-time operation (refer to Appendix A for details).
The bottleneck is that we cannot use a native array to record the prefix function in purely functional settings. In fact, the prefix function can be understood as a state transform function: it transfers from one state to another according to whether the match succeeds or fails. We can abstract such state changing as a tree. In environments supporting algebraic data types, Haskell for example, such a state tree can be defined like below.
data State a = E | S a (State a) (State a)
A state is either empty, or contains three parts: the current state, the new state if the match fails, and the new state if the match succeeds. Such a definition is quite similar to the binary tree; we can call it a left-fail, right-success tree. The state we are using here is (Pp, Ps).

Similar to the imperative KMP algorithm, which builds the prefix function from the pattern string, the state transform tree can also be built from the pattern. The idea is to build the tree from the very beginning state (∅, P), with both its children empty. We replace the left child with a new state by calling the failure function defined above, and replace the right child by advancing one character ahead. There is an edge case: when the state transfers to (P, ∅), we cannot advance any more in the success case; such a node only contains the child for the failure case. The build function is defined as the following.
build((Pp, Ps), ∅, ∅) = | ((Pp, Ps), L, ∅) : Ps = ∅
                        | ((Pp, Ps), L, R) : otherwise     (14.35)

Where

L = build(failure((Pp, Ps)), ∅, ∅)
R = build((Pp ⌢ {p}, Ps'), ∅, ∅)
The meanings of p and Ps' are the same as before: p is the first character in Ps, and Ps' is the rest of the characters. The most interesting point is that the build function never stops; it endlessly builds an infinite tree. In a strict programming environment, calling this function will freeze. However, in environments supporting lazy evaluation, only the nodes that are actually used will be created. For example, both Haskell and Scheme/Lisp are capable of constructing such an infinite state tree. In imperative settings, it is typically realized by using pointers which link to an ancestor of a node.
[Figure: the infinite left-fail, right-success state tree built from the pattern "ananym". The root is the state (∅, ananym); each fail edge points back to the fallback state, and each match edge advances one character, through (a, nanym), (an, anym), (ana, nym), (anan, ym), (anany, m), to (ananym, ∅), whose match child is empty.]
search((((Pp, Ps), L, R), A), (c, i)) =
    | (R, A ∪ {i})           : p = c, match(R)
    | (R, A)                 : p = c, ¬match(R)
    | search((L, A), (c, i)) : Pp ≠ ∅
    | (((Pp, Ps), L, R), A)  : otherwise     (14.38)

Where match(R) tests whether the state in R is of the form (P, ∅), i.e. the whole pattern has been matched.
The following Haskell example program implements this algorithm.
build (S s@(xs, []) E E) = S s (build (S (failure s) E E)) E
build (S s@(xs, (y:ys)) E E) = S s l r where
    l = build (S (failure s) E E)
    r = build (S (xs ++ [y], ys) E E)
The bottleneck is that the state tree building function calls failure, and the current definition of failure isn't efficient enough, because it enumerates all candidates from right to left every time.
Since the state tree is infinite, we can adopt some common treatment for infinite structures. One good example is the Fibonacci series. The first two Fibonacci numbers are defined as 0 and 1; the rest of the Fibonacci numbers can be obtained by adding the previous two numbers.

F0 = 0
F1 = 1
Fn = Fn−1 + Fn−2     (14.39)
Thus the Fibonacci numbers can be listed one by one as the following:

F0 = 0
F1 = 1
F2 = F1 + F0
F3 = F2 + F1
...     (14.40)
We can collect the numbers on both sides, and define F = {F0, F1, F2, ...}. Thus we have the following equation.

F = {0, 1, F1 + F0, F2 + F1, ...}
  = {0, 1} ∪ {x + y | x ∈ {F0, F1, F2, ...}, y ∈ {F1, F2, F3, ...}}
  = {0, 1} ∪ {x + y | x ∈ F, y ∈ F'}     (14.41)
Where F' = tail(F) is all the Fibonacci numbers except for the first one. In environments supporting lazy evaluation, Haskell for instance, this definition can be expressed like below.
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
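Python is strict, but the same on-demand stream can be mimicked with a generator. This sketch is for comparison only and is not part of the original text:

```python
from itertools import islice

def fibs():
    # an infinite stream of Fibonacci numbers, produced on demand
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

print(list(islice(fibs(), 8)))  # → [0, 1, 1, 2, 3, 5, 8, 13]
```

Just like the Haskell version, only the elements that are actually consumed are ever computed.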
The recursive definition for the infinite Fibonacci series indicates an idea which can be used to get rid of the fallback function. Denote the state transfer tree
as T. We can define the transfer function for matching a character on this tree as the following.

trans(T, c) = | root        : T = ∅
              | R           : T = ((Pp, Ps), L, R), c = p
              | trans(L, c) : otherwise     (14.42)

With trans, the tree can be built without the expensive fallback:

build(T, (Pp, Ps)) = | ((Pp, Ps), T, ∅)                                       : Ps = ∅
                     | ((Pp, Ps), T, build(trans(T, p), (Pp ⌢ {p}, Ps')))     : otherwise     (14.43)

The last brick is to define the root of the infinite state transfer tree, which initializes the building.

root = build(∅, (∅, P))     (14.44)
And the new KMP matching algorithm is modified with this root.

kmp(P, T) = snd(fold(search, (root, ∅), zip(T, {1, 2, ...})))     (14.45)
Figure 14.17 shows the first 4 steps when searching "ananym" in the text "anal". Since the first 3 steps all succeed, the left sub-trees of these 3 states are not actually constructed; they are marked as '?'. In the fourth step, the match fails, thus the right sub-tree needn't be built. On the other hand, we must construct the left sub-tree, which is on top of the result of trans(right(right(right(T))), n), where function right(T) returns the right sub-tree of T. This can be further expanded according to the definitions of the building and state transforming functions until we get the concrete state ((a, nanym), L, R). The detailed deduction process is left as an exercise to the reader.
[Figure: the match path (∅, ananym) → (a, nanym) → (an, anym) → (ana, nym), whose fail edge leads to (a, nanym); the unevaluated sub-trees are marked as '?'.]

Figure 14.17: On demand construction of the state transform tree when searching "ananym" in the text "anal".
This algorithm depends critically on lazy evaluation. All the states to be transferred are built on demand, so that the building process is amortized O(m), and the total performance is amortized O(n + m). Readers can refer to [1] for a detailed proof.
It's worth comparing the final purely functional and the imperative algorithms. In many cases, we have an expressive functional realization; however, for the KMP matching algorithm, the imperative approach is much simpler and more intuitive. This is because we have to mimic the raw array with an infinite state transfer tree.
Boyer-Moore

The Boyer-Moore string matching algorithm is another effective solution, invented in 1977 [10]. The idea of the Boyer-Moore algorithm comes from the following observation.
The bad character heuristics
When attempting to match the pattern, even if several characters from the left are the same, the attempt fails if the last one does not match, as shown in figure 14.18. What's more, we wouldn't find a match even if we slide the pattern down by 1, or 2. Actually, the length of the pattern "ananym" is 6, and the last character is 'm'.
Figure 14.19: Slide the pattern if the unmatched character appears in the pattern.
It's quite possible that the unmatched character appears in the pattern in more than one position. Denote the length of the pattern as |P|, and suppose the character appears at positions p1, p2, ..., pi. In such a case, we take the right most one to avoid missing any matches.

s = |P| − pi     (14.46)
Note that the shifting length is 0 for the last position in the pattern according to the above equation, thus we can skip it in the realization. Another important point is that since the shifting length is calculated against the position aligned with the last character in the pattern string (we deduce it from |P|), no matter where the mismatch happens when we scan from right to left, we slide down the pattern string by looking up the bad character table with the character in the text aligned with the last character of the pattern. This is shown in figure 14.20.
Figure 14.20: Even though the mismatch happens in the middle, between characters 'i' and 'a', we look up the shifting value with character 'e', which is 6 (calculated from the first 'e'; the second 'e' is skipped to avoid zero shifting).
A good result in practice is that using only the bad-character rule leads to a simple and fast string matching algorithm, called the Boyer-Moore-Horspool algorithm [11].
1: procedure Boyer-Moore-Horspool(T, P)
2:     for c ∈ Σ do
3:         π[c] ← |P|
4:     for i ← 1 to |P| − 1 do              ▷ Skip the last position
5:         π[P[i]] ← |P| − i
6:     s ← 0
7:     while s + |P| ≤ |T| do
8:         i ← |P|
9:         while i ≥ 1 ∧ P[i] = T[s + i] do  ▷ scan from right
10:            i ← i − 1
11:        if i < 1 then
12:            found one solution at s
13:            s ← s + 1                     ▷ go on finding the next
14:        else
15:            s ← s + π[T[s + |P|]]
The character set is denoted as Σ. We first initialize all the values of the sliding table π as the length of the pattern string |P|. After that, we process the pattern from left to right, updating the sliding values. If a character appears multiple times in the pattern, the latter value, which is on the right hand side, overwrites the previous value. We start the matching scan process by aligning the pattern and the text string at the very left. However, for every alignment s, we scan from right to left until either there is an unmatched character or all the characters in the pattern have been examined. The latter case indicates that we've found a match; in the former case, we look up the table to slide the pattern down to the right.
The following example Python code implements this algorithm accordingly.
def bmh_match(w, p):
    n = len(w)
    m = len(p)
    tab = [m for _ in range(256)]  # table to hold the bad character rule.
    for i in range(m - 1):
        tab[ord(p[i])] = m - 1 - i
    res = []
    offset = 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset + i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + tab[ord(w[offset + m - 1])]
    return res
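As a quick sanity check, the listing can be exercised like this (it is repeated compactly so that the example runs standalone; the sample strings are arbitrary):

```python
def bmh_match(w, p):
    # Boyer-Moore-Horspool matching, as in the listing above
    n, m = len(w), len(p)
    tab = [m for _ in range(256)]
    for i in range(m - 1):
        tab[ord(p[i])] = m - 1 - i
    res, offset = [], 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset + i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + tab[ord(w[offset + m - 1])]
    return res

print(bmh_match("ababcabc", "abc"))  # → [2, 5]
```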
The algorithm firstly takes about O(|Σ| + |P|) time to build the sliding table. If the character set size is small, the performance is dominated by the pattern and the text. There is definitely a worst case in which all the characters in the pattern and the text are the same, e.g. searching "aa...a" (m of 'a', denoted as a^m) in the text "aa...a" (n of 'a', denoted as a^n). The performance in the worst case is O(mn). This algorithm performs well if the pattern is long and there are only a constant number of matches; the result is then bound to linear time. This is the same as the best case of the full Boyer-Moore algorithm, which will be explained next.
The good suffix heuristics

Consider searching the pattern "abbabab" in the text "bbbababbabab..." as in figure 14.21. By using the bad-character rule, the pattern will be slid by two.
Figure 14.22: As the prefix "ab" is also the suffix of what we've matched, we can slide down the pattern to a position so that "ab" are aligned.

However, in some cases we can't slide the pattern that far. This is because "bab" appears somewhere else, starting from the 3rd character of the pattern. In order not to miss any potential matches, we can only slide the pattern by two.
Figure 14.23: We've matched "bab", which appears somewhere else in the pattern (from the 3rd to the 5th character). We can only slide down the pattern by 2 to avoid missing any potential matches.
The above situation forms the two cases of the good-suffix rule, as shown in figure 14.24. Both cases of the good-suffix rule handle the situation that multiple characters have been matched from the right. We can slide the pattern to the right if any of the following happens.

Case 1 states that if a part of the matching suffix occurs as a prefix of the pattern, and the matching suffix doesn't appear anywhere else in the pattern, we can slide the pattern to the right to make this prefix aligned.

Case 2 states that if the matching suffix occurs somewhere else in the pattern, we can slide the pattern to make the right most occurrence aligned.

Note that in the scan process, we should apply case 2 first whenever possible, and then examine case 1 if the whole matched suffix does not appear in the pattern. Observe that both cases of the good-suffix rule only depend on the pattern string, so a table can be built by pre-processing the pattern for later lookup.
For the sake of brevity, we denote the suffix string starting from the i-th character of P as Pi. That is, Pi is the sub-string P[i]P[i + 1]...P[m].

For case 1, we can check every suffix of P, which includes Pm, Pm−1, Pm−2, ..., P2, to examine if it is a prefix of P. This can be achieved by a round of scanning from right to left.

For case 2, we can check every prefix of P, including P1, P2, ..., Pm−1, to examine if its longest suffix is also a suffix of P. This can be achieved by another round of scanning from left to right.
(a) Case 1: Only a part of the matching suffix occurs as a prefix of the pattern.

(b) Case 2: The matching suffix occurs somewhere else in the pattern.

Figure 14.24: The light gray section in the text represents the characters that have been matched; the dark gray parts indicate the same content in the pattern.
1: function Good-Suffix(P)
2:     m ← |P|
3:     πs ← {0, 0, ..., 0}            ▷ table of length m
4:     l ← 0
5:     for i ← m − 1 down-to 1 do     ▷ First loop for case 1
6:         if Pi is prefix of P then
7:             l ← i
8:         πs[i − 1] ← l
9:     for i ← 1 to m do              ▷ Second loop for case 2
10:        s ← Suffix-Length(Pi)
11:        if s ≠ 0 ∧ P[i − s] ≠ P[m − s] then
12:            πs[m − s] ← m − i
13:    return πs
This algorithm builds the good-suffix heuristics table πs. It first checks every suffix of P from the shortest to the longest. If the suffix Pi is also a prefix of P, we record this suffix, and use it for all the entries until we find another suffix Pj, j < i, which is also a prefix of P.

After that, the algorithm checks every prefix of P from the shortest to the longest. It calls the function Suffix-Length(Pi) to calculate the length of the longest suffix of Pi which is also a suffix of P. If this length s isn't zero, there exists a sub-string that appears as the suffix of the pattern, which indicates that case 2 happens. The algorithm overwrites the s-th entry from the right of the table πs. Note that to avoid finding the same occurrence of the matched suffix, we test whether P[i − s] and P[m − s] are the same.
Function Suffix-Length is designed as the following.
1: function Suffix-Length(Pi)
2:     m ← |P|
3:     j ← 0
4:     while P[m − j] = P[i − j] ∧ j < i do
5:         j ← j + 1
6:     return j
The following Python example program implements the good-suffix rule.
def good_suffix(p):
    m = len(p)
    tab = [0 for _ in range(m)]
    last = 0
    # first loop for case 1
    for i in range(m - 1, 0, -1):  # m-1, m-2, ..., 1
        if is_prefix(p, i):
            last = i
        tab[i - 1] = last
    # second loop for case 2
    for i in range(m):
        slen = suffix_len(p, i)
        if slen != 0 and p[i - slen] != p[m - 1 - slen]:
            tab[m - 1 - slen] = m - 1 - i
    return tab
# test if p[i..m-1] is prefix of p
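The helper functions is_prefix and suffix_len are referenced above but their listings are not shown in this section; a minimal Python sketch consistent with the pseudocode might be:

```python
def is_prefix(p, i):
    # test if p[i..m-1] is a prefix of p
    return p[i:] == p[:len(p) - i]

def suffix_len(p, i):
    # length of the longest suffix of p[0..i] that is also a suffix of p,
    # following the Suffix-Length pseudocode above (0-based indices)
    m, j = len(p), 0
    while j < i and p[m - 1 - j] == p[i - j]:
        j = j + 1
    return j
```

For example, with p = "abbabab", is_prefix(p, 5) holds because "ab" is both a suffix starting at index 5 and a prefix; suffix_len(p, 4) is 3, since "bab" ending at index 4 is also a suffix of p.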
It's quite possible that both the bad-character rule and the good-suffix rule apply when a mismatch happens. The Boyer-Moore algorithm compares the two and picks the bigger shift, so that it can find the solution as quickly as possible. The bad-character rule table can be explicitly built as below.
1: function Bad-Character(P)
2:     for c ∈ Σ do
3:         πb[c] ← |P|
4:     for i ← 1 to |P| − 1 do
5:         πb[P[i]] ← |P| − i
6:     return πb
The following Python program implements the bad-character rule accordingly.
def bad_char(p):
    m = len(p)
    tab = [m for _ in range(256)]
    for i in range(m - 1):
        tab[ord(p[i])] = m - 1 - i
    return tab
The final Boyer-Moore algorithm firstly builds the two rule tables from the pattern, then aligns the pattern to the beginning of the text and scans from right to left for every alignment. If any mismatch happens, it tries both rules and slides the pattern by the bigger shift.
1: function Boyer-Moore(T, P)
2:     n ← |T|, m ← |P|
3:     πb ← Bad-Character(P)
4:     πs ← Good-Suffix(P)
5:     s ← 0
6:     while s + m ≤ n do
7:         i ← m
8:         while i ≥ 1 ∧ P[i] = T[s + i] do
9:             i ← i − 1
10:        if i < 1 then
11:            found one solution at s
12:            s ← s + 1                     ▷ go on finding the next
13:        else
14:            s ← s + max(πb[T[s + m]], πs[i])
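Putting the pieces together, here is a self-contained Python sketch of the full matcher; the table builders are restated so that the example runs on its own, and bm_match is a transcription of the pseudocode above, not a tuned production implementation:

```python
def bad_char(p):
    m = len(p)
    tab = [m for _ in range(256)]
    for i in range(m - 1):
        tab[ord(p[i])] = m - 1 - i
    return tab

def is_prefix(p, i):
    return p[i:] == p[:len(p) - i]

def suffix_len(p, i):
    m, j = len(p), 0
    while j < i and p[m - 1 - j] == p[i - j]:
        j = j + 1
    return j

def good_suffix(p):
    m = len(p)
    tab = [0 for _ in range(m)]
    last = 0
    for i in range(m - 1, 0, -1):          # case 1
        if is_prefix(p, i):
            last = i
        tab[i - 1] = last
    for i in range(m):                      # case 2
        slen = suffix_len(p, i)
        if slen != 0 and p[i - slen] != p[m - 1 - slen]:
            tab[m - 1 - slen] = m - 1 - i
    return tab

def bm_match(w, p):
    # scan right to left; on mismatch take the bigger of the two shifts
    n, m = len(w), len(p)
    tab1, tab2 = bad_char(p), good_suffix(p)
    res, offset = [], 0
    while offset + m <= n:
        i = m - 1
        while i >= 0 and p[i] == w[offset + i]:
            i = i - 1
        if i < 0:
            res.append(offset)
            offset = offset + 1
        else:
            offset = offset + max(tab1[ord(w[offset + m - 1])], tab2[i])
    return res

print(bm_match("bbbababbabab", "abbabab"))  # → [5]
```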
Exercise 14.2

• Prove that the Boyer-Moore majority vote algorithm is correct.

• Given a list, find the element that occurs most often. Are there any divide and conquer solutions? Are there any divide and conquer data structures, such as a map, that can be used?

• How to find the elements that occur more than 1/3 of the time in a list? How to find the elements that occur more than 1/m of the time?

• If we reject the empty array as a valid sub-array, how to realize the maximum sum of sub-arrays puzzle?
Bentley presents a divide and conquer algorithm to find the maximum sum in O(n log n) time in [4]. The idea is to split the list at the middle point. We can recursively find the maximum sum in the first half and in the second half; however, we also need to find the maximum sum crossing the middle point. The method is to scan from the middle point to both ends as the following.
1: function Max-Sum(A)
2:     if A = ∅ then
3:         return 0
4:     else if |A| = 1 then
5:         return Max(0, A[1])
6:     else
7:         m ← ⌊|A| / 2⌋
8:         a ← Max-From(Reverse(A[1...m]))
9:         b ← Max-From(A[m + 1...|A|])
10:        c ← Max-Sum(A[1...m])
11:        d ← Max-Sum(A[m + 1...|A|])
12:        return Max(a + b, c, d)

1: function Max-From(A)
2:     sum ← 0, m ← 0
3:     for i ← 1 to |A| do
4:         sum ← sum + A[i]
5:         m ← Max(m, sum)
6:     return m
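The pseudocode can be transcribed into Python as a sketch (the function and variable names here are ours, not Bentley's):

```python
def max_from(xs):
    # best sum of any prefix of xs; the empty prefix counts as 0
    s, m = 0, 0
    for x in xs:
        s = s + x
        m = max(m, s)
    return m

def max_sum(xs):
    # maximum sum over all (possibly empty) sub-arrays, divide and conquer
    if xs == []:
        return 0
    if len(xs) == 1:
        return max(0, xs[0])
    m = len(xs) // 2
    a = max_from(list(reversed(xs[:m])))  # best suffix sum of the left half
    b = max_from(xs[m:])                  # best prefix sum of the right half
    c = max_sum(xs[:m])
    d = max_sum(xs[m:])
    return max(a + b, c, d)

print(max_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # → 6
```

The a + b term is the best sum crossing the middle point; the recursion handles sums that lie entirely within one half.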
14.3 Solution searching

One interesting thing that computer programming can offer is solving puzzles. In the early phase of classic artificial intelligence, people developed many methods to search for solutions. Different from sequence searching and string matching, the solution doesn't obviously exist among a set of candidates; it typically needs to be constructed while trying various attempts. Some problems are solvable, while others are not. Among the solvable problems, not all of them have just one unique solution. For example, a maze may have multiple ways out. People sometimes need to search for the best one.
14.3.1 DFS and BFS

DFS and BFS stand for depth-first search and breadth-first search. They are typically introduced as graph algorithms in textbooks. Graph is a comprehensive topic which is hard to cover in this elementary book. In this section, we'll show how to use DFS and BFS to solve some real puzzles without a formal introduction of the graph concept.
Maze

Maze is a classic and popular puzzle, amazing to both kids and adults. Figure 14.26 shows an example maze. There are also real maze gardens that can be found in parks for fun. In the late 1990s, maze-solving games were quite often held in robot mouse competitions all over the world.
The example maze can be represented as a matrix, where 0 means a passable cell and 1 means a blocked one:

0 1 1 1 1 0 1
0 1 1 1 1 0 1
0 1 1 1 1 0 1
0 1 1 1 1 0 1
0 0 0 0 0 0 0
Given a start point s = (i, j) and a goal e = (p, q), we need to find all solutions, that is, all the paths from s to e.

There is an obvious recursive exhaustive search method: in order to find all paths from s to e, we can check all the points connected to s; for every such point k, we recursively find all paths from k to e. This method can be illustrated as the following.

• Trivial case: if the start point s is the same as the target point e, we are done;

• Otherwise, for every point k connected to s, recursively find the paths from k to e; if e can be reached via k, put the section s-k in front of each path between k and e.
However, we have to leave bread crumbs to avoid repeating the same attempts. Otherwise, in the recursive case, we start from s, find a connected point k, then further try to find paths from k to e. Since s is connected to k as well, in the next recursion we'll try to find paths from s to e again. It turns into the very same original problem, and we are trapped in infinite recursion.

Our solution is to initialize an empty list and use it to record all the points we've visited so far. For every connected point, we look up the list to examine if it has already been visited. We skip all the visited candidates and only try the new ones. The corresponding algorithm can be defined like this.
solveMaze(m, s, e) = solve(s, {∅})     (14.47)
Where m is the matrix which defines the maze, s is the start point, and e is the end point. Function solve is defined in the context of solveMaze, so that the maze and the end point can be accessed. It can be realized recursively as described above⁸.
solve(s, P) = | {{s} ⌢ p | p ∈ P}                                                : s = e
              | concat({solve(s', {{s} ⌢ p | p ∈ P}) | s' ∈ adj(s), ¬visited(s')}) : otherwise     (14.48)
Note that P also serves as an accumulator. Every connected point is recorded in all the possible paths to the current position. But they are stored in reversed order; that is, the newly visited point is put at the head of the list, and the starting point is the last one. This is because appending is linear (O(n), where n is the number of elements stored in the list), while linking to the head takes only constant time. We can output the result in the correct order by reversing all possible solutions in equation (14.47)⁹:

solveMaze(m, s, e) = map(reverse, solve(s, {∅}))     (14.49)
We need to define the functions adj(p) and visited(p), which find all the points connected to p, and test if point p has been visited, respectively. Two points are connected if and only if they are neighbor cells horizontally or vertically in the maze matrix, and both have zero value.

adj((x, y)) = {(x', y') | (x', y') ∈ {(x − 1, y), (x + 1, y), (x, y − 1), (x, y + 1)}, 1 ≤ x' ≤ M, 1 ≤ y' ≤ N, m[x'][y'] = 0}     (14.50)

Where M and N are the width and height of the maze.

Function visited(p) examines if point p has been recorded in any list in P.

visited(p) = ∃ path ∈ P, p ∈ path     (14.51)
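Equations (14.47)-(14.51) can be transcribed directly into Python; this is a sketch using 0-based indices and a list of points for each path:

```python
def solve_maze(m, s, e):
    # recursive transcription of equations (14.47)-(14.51); 0 means passable
    M, N = len(m), len(m[0])

    def adj(p):
        (x, y) = p
        cs = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
        return [(i, j) for (i, j) in cs
                if 0 <= i < M and 0 <= j < N and m[i][j] == 0]

    def solve(s, paths):
        paths = [[s] + p for p in paths]         # prepend s to every partial path
        if s == e:
            return paths
        res = []
        for s2 in adj(s):
            if not any(s2 in p for p in paths):  # the visited test
                res = res + solve(s2, paths)
        return res

    # paths are accumulated in reverse; restore the order at the end
    return [list(reversed(p)) for p in solve(s, [[]])]

print(solve_maze([[0, 0], [1, 0]], (0, 0), (1, 1)))
# → [[(0, 0), (0, 1), (1, 1)]]
```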
For a maze defined as a matrix like the below example, all the solutions can be given by this program.

[[0, 0, 1, 0, 1, 1],
 [1, 0, 1, 0, 1, 1],
 [1, 0, 0, 0, 0, 0],
 [1, 1, 0, 1, 1, 1],
 [0, 0, 0, 0, 0, 0],
 [0, 0, 0, 1, 1, 0]]

8 Function concat can flatten a list of lists. For example, concat({{a, b, c}, {x, y, z}}) = {a, b, c, x, y, z}. Refer to Appendix A for detail.
9 The detailed definition of reverse can be found in Appendix A.
Figure 14.28: The stack is initialized with a singleton list of the starting point s. s is connected with points a and b. Paths {a, s} and {b, s} are pushed back. In some step, the path ending with point p is popped. p is connected with points i, j, and k. These 3 points are expanded as different options and pushed back to the stack. The candidate path ending with q won't be examined unless all the options above fail.
The stack can be realized with a list. The latest option is picked from the head, and the new candidates are also added to the head. The maze puzzle can then be solved by initializing the stack with the path containing only the starting point.

solve'({{s}})     (14.52)
As we are searching for the first solution, not all of them, map isn't used here. When the stack is empty, it means that we've tried all the options and failed to find a way out; there is no solution. Otherwise, the top option is popped, expanded with all the adjacent points which haven't been visited before, and pushed back to the stack. Denote the stack as S; if it isn't empty, the top element is s1, and the rest of the stack after popping is S'. s1 is a list of points representing a path P. Denote the first point in this path as p1, and the rest as P'. The solution can be formalized as the following.
solve'(S) = | ∅                                 : S = ∅
            | s1                                : p1 = e
            | solve'(S')                        : C = ∅
            | solve'({{p} ⌢ P | p ∈ C} ∪ S')    : otherwise     (14.53)

Where C = {c | c ∈ adj(p1), c ∉ P} is the set of candidate points to expand.
Where the adj function is defined above. This updated maze solution can be implemented with the below example Haskell program¹⁰.

It's quite easy to modify this algorithm to find all solutions. When we find a path in the second clause, instead of returning it immediately, we record it and go on checking the rest of the memorized options in the stack until the stack becomes empty. We leave it as an exercise to the reader.
The same idea can also be realized imperatively. We maintain a stack to store all possible paths from the starting point. In each iteration, the top option path is popped. If the farthest position is the end point, a solution is found; otherwise, all the adjacent, not yet visited points are appended as new paths and pushed back to the stack. This is repeated till all the candidate paths in the stack are checked.

We use the same notation to represent the stack S. But the paths will be stored as arrays instead of lists in imperative settings, as the former is more effective. Because of this, the starting point is the first element in the path array, while the farthest reached place is the right most element. We use pn to represent Last(P) for path P. The imperative algorithm can be given as below.
1: function Solve-Maze(m, s, e)
2:     S ← ∅
3:     Push(S, {s})
4:     L ← ∅                              ▷ the result list
5:     while S ≠ ∅ do
6:         P ← Pop(S)
7:         if e = pn then
8:             Add(L, P)
9:         else
10:            for p ∈ Adjacent(m, pn) do
11:                if p ∉ P then
12:                    Push(S, P ∪ {p})
13:    return L

The following example Python program implements this maze solving algorithm.
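A Python sketch following the Solve-Maze pseudocode above (0-based indices; adjacent plays the role of the Adjacent procedure named in line 10):

```python
def solve_maze(m, s, e):
    # stack-based search; each stack entry is a path stored as an array
    stack = [[s]]
    res = []
    while stack != []:
        path = stack.pop()
        if path[-1] == e:
            res.append(path)
        else:
            for p in adjacent(m, path[-1]):
                if p not in path:
                    stack.append(path + [p])
    return res

def adjacent(m, p):
    # the horizontally/vertically connected cells with value 0
    (x, y) = p
    cs = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for (i, j) in cs
            if 0 <= i < len(m) and 0 <= j < len(m[0]) and m[i][j] == 0]

print(solve_maze([[0, 0], [1, 0]], (0, 0), (1, 1)))
# → [[(0, 0), (0, 1), (1, 1)]]
```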
And the same maze example given above can be solved by this program like the following.

mz = [[0, 0, 1, 0, 1, 1],
      [1, 0, 1, 0, 1, 1],
      [1, 0, 0, 0, 0, 0],
      [1, 1, 0, 1, 1, 1],
      [0, 0, 0, 0, 0, 0],
      [0, 0, 0, 1, 1, 0]]
It seems that in the worst case there are 4 options (up, down, left, and right) at each step; each option is pushed to the stack and eventually examined during backtracking, so the complexity seems bound to O(4^n). The actual time won't be so large, because we filter out the places which have been visited before. In the worst case, all the reachable points are visited exactly once, so the time is proportional to the number of cells.
The eight queens puzzle

The stack is initialized with one empty assignment, and the result list starts empty:

solve({∅}, ∅)     (14.54)
solve(S, L) = | L                                                          : S = ∅
              | solve(S', {s1} ∪ L)                                        : |s1| = 8
              | solve({{i} ⌢ s1 | i ∈ [1, 8], i ∉ s1, safe(i, s1)} ∪ S', L) : otherwise     (14.55)
If the stack is empty, all the possible candidates have been examined and it's not possible to backtrack any more; L has accumulated all the found solutions and is returned as the result. Otherwise, if the length of the top element in the stack is 8, a valid solution is found; we add it to L, and go on finding other solutions. If the length is less than 8, we need to try to place the next queen. Among all the columns from 1 to 8, we pick those not already occupied by previous queens (through the i ∉ s1 clause), which must also not be attacked in a diagonal direction (through the safe predicate). The valid assignments are pushed to the stack for further searching.
Function safe(x, C) detects if the assignment of a queen in position x will be attacked by other queens in C in a diagonal direction. There are 2 possible cases: the 45° and 135° directions. Since the row of this new queen is y = 1 + |C|, where |C| is the length of C, the safe function can be defined as the following.

safe(x, C) = ∀(c, r) ∈ zip(reverse(C), {1, 2, ...}), |x − c| ≠ |y − r|     (14.56)
Where zip takes two lists and pairs their elements into a new list. Thus if C = {ci−1, ci−2, ..., c2, c1} represents the columns of the first i − 1 queens assigned so far, the above function checks that none of the pairs {(c1, 1), (c2, 2), ..., (ci−1, i − 1)} forms a diagonal line with position (x, y).

Translating this algorithm into Haskell gives the below example program.
solve = dfsSolve [[]] [] where
    dfsSolve [] s = s
    dfsSolve (c:cs) s
        | length c == 8 = dfsSolve cs (c:s)
        | otherwise = dfsSolve ([(x:c) | x <- [1..8] \\ c,
                                 not $ attack x c] ++ cs) s
    attack x cs = let y = 1 + length cs in
        any (\(c, r) -> abs (x - c) == abs (y - r)) $
        zip (reverse cs) [1..]
Observing that the algorithm is tail recursive, it's easy to transform it into an imperative realization. Instead of using a list, we use an array to represent the queens assignment. Denote the stack as S, and the result list of solutions as L. The imperative algorithm can be described as the following.
1: function Solve-Queens
2:     S ← {∅}
3:     L ← ∅                              ▷ The result list
4:     while S ≠ ∅ do
5:         A ← Pop(S)                     ▷ A is an intermediate assignment
6:         if |A| = 8 then
7:             Add(L, A)
8:         else
9:             for i ← 1 to 8 do
10:                if Valid(i, A) then
11:                    Push(S, A ∪ {i})
12:    return L
The stack is initialized with the empty assignment. The main process repeatedly pops the top candidate from the stack. If there are still queens left to place, the algorithm examines the possible columns in the next row, from 1 to 8. If a column is safe, i.e. it won't be attacked by any previous queen, it is appended to the assignment and pushed back to the stack. Different from the functional approach, since an array, not a list, is used, we needn't reverse the solution assignment any more.

Function Valid checks if column x is safe with the previous queens put in A. It filters out the columns that have already been occupied, and checks whether any diagonal lines are formed with existing queens.
1: function Valid(x, A)
2:     y ← 1 + |A|
3:     for i ← 1 to |A| do
4:         if x = A[i] ∨ |y − i| = |x − A[i]| then
5:             return False
6:     return True
The following Python example program implements this imperative algorithm.
def solve():
    stack = [[]]
    s = []
    while stack != []:
        a = stack.pop()
        if len(a) == 8:
            s.append(a)
        else:
            for i in range(1, 9):
                if valid(i, a):
                    stack.append(a + [i])
    return s

def valid(x, a):
    y = len(a) + 1
    for i in range(1, y):
        if x == a[i-1] or abs(y - i) == abs(x - a[i-1]):
            return False
    return True
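As a quick check, the program can be condensed and run; the eight queens puzzle is well known to have 92 distinct solutions:

```python
def valid(x, a):
    # x must not share a column or a diagonal with any queen already in a
    y = len(a) + 1
    for i in range(1, y):
        if x == a[i - 1] or abs(y - i) == abs(x - a[i - 1]):
            return False
    return True

def solve():
    stack, s = [[]], []
    while stack != []:
        a = stack.pop()
        if len(a) == 8:
            s.append(a)
        else:
            stack.extend(a + [i] for i in range(1, 9) if valid(i, a))
    return s

print(len(solve()))  # → 92
```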
Although there are 8 optional columns for each queen, not all of them are valid and thus further expanded; only those columns which haven't been occupied by previous queens are tried. The algorithm examines only 15720 possibilities, which is far fewer than 8^8 = 16777216 [13].
It's quite easy to extend the algorithm so that it can solve the n queens puzzle, where n ≥ 4. However, the time cost increases fast; the backtracking algorithm is only slightly better than one permuting the sequence of 1 to 8 (which is bound to O(n!)). Another extension to the algorithm is based on the fact that the chess board is square, and symmetric both vertically and horizontally; thus one solution can generate other solutions by rotating and flipping. These aspects are left as exercises to the reader.
Peg puzzle

I once received a puzzle of the leap frogs, said to be homework for 2nd grade students in China. As illustrated in figure 14.30, there are 6 frogs on 7 stones. Each frog can either hop to the next stone if it is not occupied, or leap over one frog to another empty stone. The frogs on the left side can only move to the right, while the ones on the right side can only move to the left. These rules are described in figure 14.31.
(a) Solitaire

Figure 14.32: Variants of the peg puzzles from http://home.comcast.net/~stegmann/jumping.htm
The frog on the 3rd stone can hop right to the empty stone; symmetrically, the frog on the 5th stone can hop left. Alternatively, the frog on the 2nd stone can leap right, while the frog on the 6th stone can leap left.

We can record the state and try one of these 4 options at every step. Of course not all of them are possible at any time. If we get stuck, we can backtrack and try other options.

As we restrict the left side frogs to only move right, and the right side frogs to only move left, the moves are not reversible. There won't be any repetition cases like those we had to deal with in the maze puzzle. However, we still need to record the steps in order to print them out at the end.
In order to encode these restrictions, let A, O, B in the representation AAAOBBB be −1, 0, and 1 respectively. A state L is a list of elements, each being one of these 3 values. It starts from {−1, −1, −1, 0, 1, 1, 1}. L[i] accesses the i-th element; its value indicates whether the i-th stone is empty, occupied by a frog from the left side, or occupied by a frog from the right side. Denote the position of the vacant stone as p. The 4 moving options can be stated as below.

• Leap left: p < 6 and L[p + 2] > 0, swap L[p] ↔ L[p + 2];
• Hop left: p < 7 and L[p + 1] > 0, swap L[p] ↔ L[p + 1];
• Leap right: p > 2 and L[p − 2] < 0, swap L[p − 2] ↔ L[p];
• Hop right: p > 1 and L[p − 1] < 0, swap L[p − 1] ↔ L[p].
Four functions leapl(L), hopl(L), leapr(L) and hopr(L) are defined accordingly. If the state L does not satisfy the move restriction, these functions return L unchanged; otherwise, the changed state L' is returned.
We can also explicitly maintain a stack S to hold the attempts as well as the historic movements. The stack is initialized with a singleton list of the starting state. The solutions are accumulated in a list M, which is empty at the beginning:

solve({{−1, −1, −1, 0, 1, 1, 1}}, ∅)     (14.57)
As long as the stack isn't empty, we pop one intermediate attempt. If the latest state equals {1, 1, 1, 0, −1, −1, −1}, a solution is found; we append the series of moves leading to this state to the result list M. Otherwise, we expand to the next possible states by trying all four moves, and push them back to the stack for further searching. Denote the top element in the stack S as s1, and the latest state in s1 as L. The algorithm can be defined as the following.

solve(S, M) = | M                              : S = ∅
              | solve(S', {reverse(s1)} ∪ M)   : L = {1, 1, 1, 0, −1, −1, −1}
              | solve(P ∪ S', M)               : otherwise     (14.58)

Where P are the possible new attempts expanded from the latest state L.
Running this program finds 2 symmetric solutions, each taking 15 steps. One solution is listed in the below table.

step  state
 0    -1 -1 -1  0  1  1  1
 1    -1 -1  0 -1  1  1  1
 2    -1 -1  1 -1  0  1  1
 3    -1 -1  1 -1  1  0  1
 4    -1 -1  1  0  1 -1  1
 5    -1  0  1 -1  1 -1  1
 6     0 -1  1 -1  1 -1  1
 7     1 -1  0 -1  1 -1  1
 8     1 -1  1 -1  0 -1  1
 9     1 -1  1 -1  1 -1  0
10     1 -1  1 -1  1  0 -1
11     1 -1  1  0  1 -1 -1
12     1  0  1 -1  1 -1 -1
13     1  1  0 -1  1 -1 -1
14     1  1  1 -1  0 -1 -1
15     1  1  1  0 -1 -1 -1
Observe that the algorithm is tail recursive, so it can also be realized
imperatively. The algorithm can be generalized further, so that it solves the puzzle
of n frogs on each side. We represent the start state {-1, -1, ..., -1, 0, 1, 1, ...,
1} as s, and the mirrored end state as e.
1: function Solve(s, e)
2:   S ← {{s}}
3:   M ← ∅
4:   while S ≠ ∅ do
5:     s1 ← Pop(S)
6:     if s1[1] = e then
7:       Add(M, Reverse(s1))
8:     else
9:       for m ∈ Moves(s1[1]) do
10:        Push(S, {m} ∪ s1)
11:  return M
The possible moves can also be generalized with a procedure Moves to handle an arbitrary number of frogs. The following Python program implements this
solution.
def solve(start, end):
    stack = [[start]]
    s = []
    while stack != []:
        c = stack.pop()
        if c[0] == end:
            s.append(list(reversed(c)))
        else:
            for m in moves(c[0]):
                stack.append([m] + c)
    return s
def moves(s):
    ms = []
    n = len(s)
    p = s.index(0)
    if p < n - 2 and s[p + 2] > 0:
        ms.append(swap(s, p, p + 2))
    if p < n - 1 and s[p + 1] > 0:
        ms.append(swap(s, p, p + 1))
    if p > 1 and s[p - 2] < 0:
        ms.append(swap(s, p, p - 2))
    if p > 0 and s[p - 1] < 0:
        ms.append(swap(s, p, p - 1))
    return ms
def swap(s, i, j):
    a = s[:]
    (a[i], a[j]) = (a[j], a[i])
    return a
For 3 frogs on each side, we know that it takes 15 steps to exchange them.
It's interesting to examine how many steps are needed as the number of frogs
on each side grows. Our program gives the following result.
number of frogs  1  2  3   4   5  ...
number of steps  3  8  15  24  35 ...
It seems that the numbers of steps are all square numbers minus one. It's
natural to guess that the number of steps for n frogs on one side is (n + 1)² - 1.
Actually we can prove it is true.

Comparing the final state and the start state, each frog moves ahead n + 1
stones in its direction, so in total the 2n frogs move 2n(n + 1) stones.
Another important fact is that each frog on the left has to meet every one on
the right exactly once, and a leap happens at each meeting. Since a leap moves a frog
two stones ahead, and there are n² meetings in total, all these
meetings account for 2n² stones of movement. The rest of the movement comes from hops.
The number of hops is 2n(n + 1) - 2n² = 2n. Summing up all n² leaps and 2n
hops, the total number of steps is n² + 2n = (n + 1)² - 1.
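The claim can also be checked mechanically. The following self-contained sketch (our own, condensing the program above into one function) searches all solutions with the same DFS idea and reports the minimum number of steps:

```python
def min_steps(n):
    # State: -1 frogs on the left move right, 1 frogs on the right move
    # left, 0 is the free stone. Moves follow the four rules above.
    start = tuple([-1] * n + [0] + [1] * n)
    end = tuple([1] * n + [0] + [-1] * n)

    def moves(s):
        p = s.index(0)
        res = []
        for d in (1, 2):  # hop and leap, in both directions
            if p + d < len(s) and s[p + d] > 0:
                t = list(s); t[p], t[p + d] = t[p + d], t[p]
                res.append(tuple(t))
            if p - d >= 0 and s[p - d] < 0:
                t = list(s); t[p], t[p - d] = t[p - d], t[p]
                res.append(tuple(t))
        return res

    best = None
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            if best is None or len(path) < len(best):
                best = path
        else:
            stack.extend(path + [m] for m in moves(path[-1]))
    return len(best) - 1  # number of moves = states visited - 1

print([min_steps(n) for n in (1, 2, 3)])  # → [3, 8, 15]
```

The search terminates without a visited set because every frog only ever moves in one direction.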
Summary of DFS

Observe the above three puzzles; although they vary in many aspects, their
solutions show quite similar common structures. They all have some starting
state. The maze starts from the entrance point; the 8 queens puzzle starts
from the empty board; the leap frogs start from the state of AAAOBBB. The
solution is a kind of searching; at each attempt, there are several possible ways.
For the maze puzzle, there are four different directions to try; for the 8 queens
puzzle, there are eight columns to choose; for the leap frogs puzzle, there are
four movements of leap or hop. We don't know how far we can go when making a
decision, although the final state is clear. For the maze, it's the exit point; for
the 8 queens puzzle, we are done when all the 8 queens are assigned on the
board; for the leap frogs puzzle, the final state is that all frogs have exchanged sides.

We use a common approach to solve them. We repeatedly select one possible
candidate to try, and record what we've achieved; if we get stuck, we backtrack
and try other options. We are sure that by using this strategy, we can either find a
solution, or tell that the problem is unsolvable.

Of course there can be some variation: we can stop when we find one
answer, or go on searching for all the solutions.
If we draw a tree rooted at the starting state, and expand it so that every branch
stands for a different attempt, our searching process goes
deeper and deeper. We won't consider any other options at the same
depth unless the search fails, so that we have to backtrack to an upper level of
the tree. Figure 14.33 illustrates the order in which we search a state tree. The arrows
indicate how we go down and backtrack up; the numbers on the nodes show
the order in which we visit them.

This kind of search strategy is called DFS (depth-first search). We widely
use it, often unintentionally. Some programming environments, Prolog for instance,
adopt DFS as the default evaluation model. A maze is given by a rule
base, such as:
c(a, b).    c(a, e).
c(b, c).    c(b, f).
c(e, d).    c(e, f).
c(f, c).
c(g, d).    c(g, h).
c(h, f).
go(X, X).
go(X, Y) :- c(X, Z), go(Z, Y).
This program says that a place is connected with itself. Given two different
places X and Y, if X is connected with Z, and Z is connected with Y, then X
is connected with Y. Note that there might be multiple choices for Z. Prolog
selects a candidate, and goes on searching further. It only tries other candidates
if the recursive search fails. In that case, Prolog backtracks and tries the
alternatives. This is exactly what DFS does.

DFS is quite straightforward when we only need a solution, but don't care
if the solution takes the fewest steps. For example, the solution it gives may
not be the shortest path for the maze. We'll see some more puzzles next. They
demand finding the solution with the minimum attempts.
Figure 14.36: A lucky-draw game. The i-th person leaves the queue, picks a
ball, then joins the queue at the tail if he fails to pick the black ball.
We can use quite the same idea to solve our puzzle. The two banks of the
river can be represented as two sets A and B. A contains the wolf, the goat,
the cabbage, and the farmer, while B is empty. We take an element along with
the farmer from one set to the other each time. Neither set may hold conflicting
things if the farmer is absent. The goal is to exchange the contents of A and B
in the fewest steps.

We initialize a queue with the state A = {w, g, c, p}, B = ∅ as the only element.
As long as the queue isn't empty, we pick the first element from the head, expand
it with all possible options, and put these new expanded candidates to the tail
of the queue. If the first element on the head is the final goal, that is A = ∅, B =
{w, g, c, p}, we are done. Figure 14.37 illustrates the idea of this search order.
Note that as all possibilities in the same level are examined, there is no need
for backtracking.
There is a simple way to represent the sets. A four-bit binary number can be
used; each bit stands for a thing: the wolf w = 1, the goat g = 2,
the cabbage c = 4, and the farmer p = 8. Then 0 stands for the empty set, and 15
stands for the full set. Value 3 means there are only a wolf and a goat on a
river bank; in this case, the wolf will eat the goat. Similarly, value 6 stands for
the other conflicting case. Every time, we move the highest bit (which is 8), either alone or
together with one of the other bits (4, 2, or 1), from one number to the other.
The possible moves can be defined as below.
mv(A, B) =
  | {(A - m, B + m) | m ∈ {8, 9, 10, 12}, A ∧ m = m} : B < 8
  | {(A + m, B - m) | m ∈ {8, 9, 10, 12}, B ∧ m = m} : otherwise
    (14.59)
Figure 14.37: Start from state 1, check all possible options 2, 3, and 4 for next
step; then all nodes in level 3, ...
Where ∧ is the bitwise-and operation.
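As a quick sanity check of this encoding, here is a tiny sketch (the names are ours, not from the book's program):

```python
# Bitmask encoding of a river bank: wolf = 1, goat = 2, cabbage = 4, farmer = 8.
WOLF, GOAT, CABBAGE, FARMER = 1, 2, 4, 8

def conflict(bank):
    # A bank is unsafe when the farmer is absent and it contains
    # wolf + goat (pattern 3) or goat + cabbage (pattern 6).
    return bank & FARMER == 0 and (bank & 3 == 3 or bank & 6 == 6)

print(conflict(WOLF | GOAT))           # → True, the wolf eats the goat
print(conflict(WOLF | GOAT | FARMER))  # → False, the farmer is watching
print(conflict(WOLF | CABBAGE))        # → False, wolf and cabbage are safe
```

Note that checking the bit patterns, rather than the exact values 3 and 6, also covers the bank {wolf, goat, cabbage} (value 7).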
The solution can be given by reusing the queue defined in the previous chapter.
Denote the queue as Q, which is initialized with a singleton list {(15, 0)}. If Q is
not empty, function DeQ(Q) extracts the head element M; the updated queue
becomes Q'. M is a list of pairs, standing for a series of movements between the
river banks. The first element m1 = (A', B') is the latest state. Function
EnQ'(Q, L) is a slightly different enqueue operation: it pushes all the possible
moving sequences in L to the tail of the queue one by one and returns the
updated queue. With these notations, the solution function is defined as below.
solve(Q) =
  | ∅ : Q = ∅
  | reverse(M) : A' = 0
  | solve(EnQ'(Q', {{m} ∪ M | m ∈ mv(m1), valid(m, M)})) : otherwise
    (14.60)
Where function valid(m, M) checks if the new moving candidate m = (A', B')
is valid: neither A' nor B' is 3 or 6, and m hasn't been tried before in
M, to avoid any repeated attempts.

valid(m, M) = A' ≠ 3, A' ≠ 6, B' ≠ 3, B' ≠ 6, m ∉ M    (14.61)
The following example Haskell program implements this solution. Note that
it uses a plain list to represent the queue for illustration purposes.

import Data.Bits

solve = bfsSolve [[(15, 0)]] where
  bfsSolve :: [[(Int, Int)]] -> [(Int, Int)]
  bfsSolve [] = [] -- no solution
  bfsSolve (c:cs) | (fst $ head c) == 0 = reverse c
                  | otherwise = bfsSolve (cs ++ map (:c)
                                  (filter (`valid` c) $ moves $ head c))
  valid (a, b) r = not $ or [a `elem` [3, 6], b `elem` [3, 6],
                             (a, b) `elem` r]
  moves (a, b) = if b < 8 then trans a b else map swap (trans b a) where
    trans a b = [(a - m, b + m) | m <- [8, 9, 10, 12], a .&. m == m]
    swap (a, b) = (b, a)
This algorithm can be easily modified to find all the possible solutions, rather
than stopping after finding the first one. This is left as an exercise to the reader.
The following shows the two best solutions to this puzzle.
Solution 1:

Left                          Right
wolf, goat, cabbage, farmer
wolf, cabbage                 goat, farmer
wolf, cabbage, farmer         goat
cabbage                       wolf, goat, farmer
goat, cabbage, farmer         wolf
goat                          wolf, cabbage, farmer
goat, farmer                  wolf, cabbage
                              wolf, goat, cabbage, farmer

Solution 2:

Left                          Right
wolf, goat, cabbage, farmer
wolf, cabbage                 goat, farmer
wolf, cabbage, farmer         goat
wolf                          goat, cabbage, farmer
wolf, goat, farmer            cabbage
goat                          wolf, cabbage, farmer
goat, farmer                  wolf, cabbage
                              wolf, goat, cabbage, farmer
This algorithm can also be realized imperatively. Observing that our solution
is tail recursive, we can translate it directly to a loop. We use a list
S to hold all the solutions that can be found. The singleton list {(15, 0)} is pushed
to the queue when initializing. As long as the queue isn't empty, we extract the
head C from the queue by calling the DeQ procedure. We examine if it reaches the
final goal; if not, we expand all the possible moves and push them to the tail of the
queue for further searching.
1: function Solve
2:   S ← ∅
3:   Q ← ∅
4:   EnQ(Q, {(15, 0)})
5:   while Q ≠ ∅ do
6:     C ← DeQ(Q)
7:     if c1 = (0, 15) then
8:       Add(S, Reverse(C))
9:     else
10:      for m ∈ Moves(C) do
11:        if Valid(m, C) then
12:          EnQ(Q, {m} ∪ C)
13:  return S
Where the Moves and Valid procedures are the same as before. The following
Python example program implements this imperative algorithm.

def solve():
    s = []
    queue = [[(0xf, 0)]]
    while queue != []:
        cur = queue.pop(0)
        if cur[0] == (0, 0xf):
            s.append(list(reversed(cur)))
        else:
            for m in moves(cur):
                queue.append([m] + cur)
    return s

def moves(s):
    (a, b) = s[0]
    return valid(s, trans(a, b) if b < 8 else swaps(trans(b, a)))

def valid(s, mv):
    return [(a, b) for (a, b) in mv
            if a not in [3, 6] and b not in [3, 6] and (a, b) not in s]

def trans(a, b):
    masks = [8 | (1 << i) for i in range(4)]
    return [(a ^ mask, b | mask) for mask in masks if a & mask == mask]

def swaps(s):
    return [(b, a) for (a, b) in s]
There is a minor difference between the program and the pseudo code: the
function that generates the candidate moving options filters out the invalid cases
inside it.
Every time, whether the farmer drives the boat forth or back, there are
m options for him to choose, where m is the number of objects on the river bank
the farmer drives from. m is always less than 4, so the algorithm won't try
more than 4ⁿ cases within n steps. This estimation is far more than the actual time,
because we avoid trying the invalid cases. Our solution examines all the possible
movings in the worst case. Because we check the recorded steps to avoid repeated
attempts, the algorithm takes about O(n²) time to search n possible steps.
Water jugs puzzle

This is a popular puzzle in classic AI, with a very long history. It
says that there are two jugs, one of 9 quarts, the other of 4 quarts. How can we use
them to bring up exactly 6 quarts of water from the river?

There are various versions of this puzzle, in which the volumes of the jugs and
the target volume of water differ. The solver is said to be the young Blaise Pascal,
the French mathematician and scientist, in one story, and
Siméon Denis Poisson in another. Later, in the popular Hollywood movie
Die Hard 3, actors Bruce Willis and Samuel L. Jackson were also confronted
with this puzzle.
Pólya gave a nice way to solve this problem backwards in [14].
Instead of thinking from the starting state as shown in figure 14.38, Pólya
pointed out that there will be 6 quarts of water in the bigger jug at the final
stage, which indicates the second last step: we can fill the 9 quarts jug, then

Figure 14.40: Fill the bigger jug, and pour to the smaller one twice.

In general, with jugs of volume a and b, the puzzle can bring up g quarts of water if and only if g is a multiple of the greatest common divisor of a and b:

gcd(a, b)|g    (14.62)
Where m|n means n can be divided by m. What's more, if a and b are relatively prime, which means gcd(a, b) = 1, it's possible to bring up any quantity
g of water.

Although gcd(a, b) enables us to determine if the puzzle is solvable, it doesn't
give us the detailed pouring sequence. If we can find some integers x and y such that
g = xa + yb, we can arrange a sequence of operations (even if it may not be the
best solution) to solve it. The idea is that, without loss of generality, supposing
x > 0, y < 0, we need to fill jug A x times, and empty jug B |y| times in total.

Let's take a = 3, b = 5, and g = 4 for example. Since 4 = 3 × 3 - 5, we can
arrange a sequence like the following.
A B  operation
0 0  start
3 0  fill A
0 3  pour A into B
3 3  fill A
1 5  pour A into B
1 0  empty B
0 1  pour A into B
3 1  fill A
0 4  pour A into B
In this sequence, we fill A 3 times, and empty B once. The procedure
can be described as the following:

Repeat x times:
1. Fill jug A;
2. Pour jug A into jug B; whenever B is full, empty it.
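This loop is easy to simulate. The sketch below (our own helper, assuming x > 0 and a solvable target g) reproduces the table above:

```python
def pour_sequence(a, b, x, g):
    # Fill A x times, always pouring A into B and emptying B when full,
    # until jug B holds exactly g quarts.
    seq = [(0, 0)]
    p, q = 0, 0
    while (p, q) != (0, g):
        if p == 0:
            p, x = a, x - 1        # fill A
        elif q == b:
            q = 0                  # empty B
        else:
            t = min(p, b - q)      # pour A into B
            p, q = p - t, q + t
        seq.append((p, q))
    return seq

print(pour_sequence(3, 5, 3, 4))
# → [(0, 0), (3, 0), (0, 3), (3, 3), (1, 5), (1, 0), (0, 1), (3, 1), (0, 4)]
```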
So the only problem left is to find x and y. There is a powerful tool
in number theory called the extended Euclidean algorithm, which can achieve this.
Compared to the classic Euclidean GCD algorithm, which only gives the greatest
common divisor, the extended Euclidean algorithm also gives a pair x, y,
so that:

(d, x, y) = gcdext(a, b)    (14.63)

d = gcd(a, b) = xa + yb    (14.64)

Let q = ⌊b/a⌋ and r = b mod a, so that b = aq + r.
Since d is the common divisor, it divides both a and b; thus d divides
r as well. Because r is less than a, we can scale down the problem by finding the
GCD of r and a:

(d, x', y') = gcdext(r, a)    (14.65)

Where d = x'r + y'a. Substituting r = b - aq gives:

d = x'(b - aq) + y'a
  = (y' - x'q)a + x'b    (14.66)

Comparing with d = xa + yb, we have:

x = y' - x'⌊b/a⌋
y = x'    (14.67)
Note that this is a typical recursive relationship. The edge case happens
when a = 0:

gcd(0, b) = b = 0a + 1b    (14.68)
Summarizing the above results, the extended Euclidean algorithm can be defined
as the following:

gcdext(a, b) =
  | (b, 0, 1) : a = 0
  | (d, y' - x'⌊b/a⌋, x') : otherwise
    (14.69)

Where d, x', y' are defined in equation (14.65).
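The definition translates to Python directly (a sketch):

```python
def ext_gcd(a, b):
    # Returns (d, x, y) with d = gcd(a, b) = x * a + y * b.
    if a == 0:
        return (b, 0, 1)
    d, x, y = ext_gcd(b % a, a)
    return (d, y - x * (b // a), x)

print(ext_gcd(4, 9))  # → (1, -2, 1)
```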
The 2 water jugs puzzle is almost solved, but there are still two details
to be tackled. First, the extended Euclidean algorithm gives the linear
combination for the greatest common divisor d, while the target volume of
water g isn't necessarily equal to d. This can be easily solved by multiplying x
and y by m, where m = g/gcd(a, b). Second, we assumed x > 0 to form a
procedure that fills jug A x times. However, the extended Euclidean algorithm
doesn't ensure x to be positive. For instance gcdext(4, 9) = (1, -2, 1). Whenever
we get a negative x, since d = xa + yb, we can repeatedly add b to x, and
decrease y by a, till x is greater than zero.
At this stage, we are able to give the complete solution to the 2 water jugs
puzzle. Below is an example Haskell program.

extGcd 0 b = (b, 0, 1)
extGcd a b = let (d, x, y) = extGcd (b `mod` a) a in
             (d, y - x * (b `div` a), x)

solve a b g | g `mod` d /= 0 = [] -- no solution
            | otherwise = solve' (x * g `div` d)
  where
    (d, x, y) = extGcd a b
    solve' x | x < 0 = solve' (x + b)
             | otherwise = pour x [(0, 0)]
    pour 0 ps = reverse ((0, g):ps)
    pour x ps@((a', b'):_)
        | a' == 0 = pour (x - 1) ((a, b'):ps)        -- fill a
        | b' == b = pour x ((a', 0):ps)              -- empty b
        | otherwise = pour x ((max 0 (a' + b' - b),
                               min (a' + b') b):ps)  -- pour a into b
Although we can solve the 2 water jugs puzzle with the extended Euclidean algorithm, the solution may not be the best. For instance, when we are going
to bring 4 gallons of water with the 3 and 5 gallon jugs, the extended Euclidean
algorithm produces the following sequence:

[(0,0),(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),
(0,4),(3,4),(2,5),(2,0),(0,2),(3,2),(0,5),(3,5),
(3,0),(0,3),(3,3),(1,5),(1,0),(0,1),(3,1),(0,4)]

It takes 23 steps to achieve the goal, while the best solution only needs 6
steps:

[(0,0),(0,5),(3,2),(0,2),(2,0),(2,5),(3,4)]

Observing the 23 steps, we can find that jug B already contains 4
gallons of water at the 8-th step. But the algorithm ignores this fact and goes
on executing the remaining 15 steps. The reason is that the linear combination x and y
found by the extended Euclidean algorithm is not the only pair satisfying
g = xa + yb. Among all such pairs, the smaller |x| + |y| is, the fewer steps are
needed. There is an exercise addressing this problem in this section.
The interesting problem is how to find the best solution. We have two
approaches: one is to find x and y that minimize |x| + |y|; the other is to adopt
quite the same idea as in the wolf-goat-cabbage puzzle. We focus on the latter
in this section. Since there are at most 6 possible options (fill A, fill B, pour
A into B, pour B into A, empty A, and empty B), we can try them in parallel,
and check which decision leads to the best solution. We need to record all the
states we've achieved to avoid any potential repetition. In order to realize this
parallel approach with reasonable resources, a queue can be used to arrange our
attempts. The elements stored in this queue are series of pairs (p, q), where p
and q represent the volumes of water contained in each jug. These pairs record
the sequence of our operations from the beginning to the latest. We initialize
the queue with the singleton list containing the starting state:

solve(a, b, g) = solve'({{(0, 0)}})    (14.70)
Every time, when the queue isn't empty, we pick a sequence from the head
of the queue. If this sequence ends with a pair containing the target volume g, we
have found a solution; we can print this sequence by reversing it. Otherwise, we expand
the latest pair by trying all the 6 possible options, remove any duplicated states,
and add the new sequences to the tail of the queue. Denote the queue as Q, the first sequence
stored on the head of the queue as S, the latest pair in S as (p, q), and the rest
of the pairs as S'. After popping the head element, the queue becomes Q'. This
algorithm can be defined like below:

solve'(Q) =
  | ∅ : Q = ∅
  | reverse(S) : p = g ∨ q = g
  | solve'(EnQ'(Q', {{s'} ∪ S | s' ∈ {fillA(p, q), fillB(p, q), emptyA(p, q),
                      emptyB(p, q), pourA(p, q), pourB(p, q)}, s' ∉ S})) : otherwise
    (14.72)
It's intuitive to define the 6 options. For the fill operations, the result is that the
filled jug is full; for the empty operations, the result volume is zero;
for the pour operations, we need to test if the receiving jug is big enough to hold all the water.

fillA(p, q) = (a, q)
fillB(p, q) = (p, b)
emptyA(p, q) = (0, q)
emptyB(p, q) = (p, 0)
pourA(p, q) = (max(0, p + q - b), min(p + q, b))
pourB(p, q) = (min(p + q, a), max(0, p + q - a))
    (14.73)
An example program implementing this method can be derived from the definition directly.
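For instance, here is a sketch in Python (our own; for brevity it keeps a global visited set, a slight simplification of removing duplicates per sequence):

```python
from collections import deque

def solve_jugs(a, b, g):
    # Breadth-first search over (p, q) states; returns a shortest
    # operation sequence from (0, 0) to a state containing g.
    start = (0, 0)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        p, q = path[-1]
        if p == g or q == g:
            return path
        candidates = [(a, q), (p, b), (0, q), (p, 0),      # fills and empties
                      (max(0, p + q - b), min(p + q, b)),  # pour A into B
                      (min(p + q, a), max(0, p + q - a))]  # pour B into A
        for s in candidates:
            if s not in visited:
                visited.add(s)
                queue.append(path + [s])
    return []

print(solve_jugs(3, 5, 4))
```

Since BFS examines all sequences of the same length before longer ones, the returned sequence has the minimum 6 steps (7 states) for a = 3, b = 5, g = 4.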
This method always returns the fastest solution. It can also be realized in an
imperative approach. Instead of storing the complete sequence of operations in
every element of the queue, we can store each unique state in a global history
list, and use links to track the operation sequence; this can save space.
Figure: the search tree of states rooted at (0, 0). Each node expands by the
operations, e.g. (0, 0) → (3, 0) by fill A and (0, 0) → (0, 5) by fill B; then
(3, 0) → (3, 5), (0, 0), (0, 3), and (0, 5) → (3, 5), (0, 0), (3, 2) by fill, empty,
and pour operations, and so on.

In ANSI C, a state with a link to its parent can be defined as below.

struct Step {
    int p, q;
    struct Step *parent;
};

struct Step* make_step(int p, int q, struct Step *parent) {
    struct Step *s = (struct Step*) malloc(sizeof(struct Step));
    s->p = p;
    s->q = q;
    s->parent = parent;
    return s;
}
Where p, q are the volumes of water in the 2 jugs. For any state s, the
functions p(s) and q(s) return these 2 values. The imperative algorithm can be
realized based on this idea as below.
1: function Solve(a, b, g)
2:   Q ← ∅
3:   Push-and-record(Q, (0, 0))
4:   while Q ≠ ∅ do
5:     s ← Pop(Q)
6:     if p(s) = g ∨ q(s) = g then
7:       return s
8:     else
9:       C ← Expand(s)
10:      for c ∈ C do
11:        if c ≠ s ∧ ¬ Visited(c) then
12:          Push-and-record(Q, c)
13:  return NIL
Where Push-and-record not only pushes an element to the queue, but
also records this element as visited, so that we can later check whether an element has been
visited. This can be implemented with a list. All push
operations append the new elements to the tail. For the pop operation, instead of
removing the element pointed by head, the head pointer only advances to the
next one. This list contains historic data which has to be reset explicitly. The
following ANSI C code illustrates this idea.
struct Step *steps[1000], **head, **tail = steps;

void push(struct Step *s) { *tail++ = s; }

struct Step* pop() { return *head++; }

int empty() { return head == tail; }

void reset() {
    struct Step **p;
    for (p = steps; p != tail; ++p)
        free(*p);
    head = tail = steps;
}
In order to test whether a state has been visited, we can traverse the list and compare
p and q.

int eq(struct Step *a, struct Step *b) {
    return a->p == b->p && a->q == b->q;
}
The resulting steps are backtracked in reverse order; they can be output with
a recursive function:

void print(struct Step *s) {
    if (s) {
        print(s->parent);
        printf("%d, %d\n", s->p, s->q);
    }
}
Kloski

Kloski is a block sliding puzzle that appears in many countries, with different
sizes and layouts. Figure 14.42 illustrates a traditional Kloski game in China.
M = 1 10 10 2
    1 10 10 2
    3  4  4 5
    3  7  8 5
    6  0  0 9

In this matrix, the cells of value i mean the i-th piece covers this cell.
Figure 14.44: Left: both the upper and the lower 1 are OK; Right: the upper 1
is OK, the lower 1 conflicts with 2.
The left example illustrates sliding the block labeled with 1 down. There are two
cells covered by this block. The upper 1 moves to the cell previously occupied
by this same block, which is also labeled with 1; the lower 1 moves to a free
cell, which is labeled with 0.

The right example, on the other hand, illustrates an invalid sliding. In this case,
the upper cell could move to the cell occupied by the same block. However, the
lower cell labeled with 1 can't move to the cell occupied by another block, which
is labeled with 2.
In order to test the validity of a move, we need to examine all the cells the block
will cover. If they are labeled with 0 or the same number as this block, the
move is valid; otherwise it conflicts with some other block. For a layout L,
the corresponding matrix is M; suppose we want to move the k-th block by
(Δx, Δy), where |Δx| ≤ 1, |Δy| ≤ 1. The following equation tells if the move
is valid:

valid(L, k, Δx, Δy) :
  ∀(i, j) ∈ L[k] ⇒ i' = i + Δy, j' = j + Δx,
  (1, 1) ≤ (i', j') ≤ (5, 4), M_{i'j'} ∈ {k, 0}
    (14.74)
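Equation (14.74) can be sketched directly; `valid_move` and its calling convention below are ours:

```python
def valid_move(m, cells, k, dy, dx):
    # cells: the 1-indexed (row, col) cells covered by block k.
    rows, cols = len(m), len(m[0])
    for (i, j) in cells:
        i2, j2 = i + dy, j + dx
        if not (1 <= i2 <= rows and 1 <= j2 <= cols):
            return False              # the block would leave the board
        if m[i2 - 1][j2 - 1] not in (k, 0):
            return False              # target cell held by another block
    return True

board = [[1, 10, 10, 2],
         [1, 10, 10, 2],
         [3,  4,  4, 5],
         [3,  7,  8, 5],
         [6,  0,  0, 9]]
print(valid_move(board, [(5, 4)], 9, 0, -1))  # → True, slides into a free cell
print(valid_move(board, [(5, 4)], 9, -1, 0))  # → False, blocked by piece 5
```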
Another important point in solving the Kloski puzzle is how to avoid repeated
attempts. The obvious case is that after a series of slides, we end up with a matrix
we have met before. However, it is not enough to only
avoid identical matrices. Consider the following two matrices. Although M1 ≠ M2,
we still need to drop one of them, because they are essentially the same layout.
M1 = 1 10 10 2      M2 = 2 10 10 1
     1 10 10 2           2 10 10 1
     3  4  4 5           3  4  4 5
     3  7  8 5           3  7  6 5
     6  0  0 9           8  0  0 9
This fact tells us that we should compare layouts, not merely matrices, to avoid
repetition. Denote the corresponding layouts as L1 and L2 respectively;
it's easy to verify that ||L1|| = ||L2||, where ||L|| is the normalized layout, which
is defined as below:

||L|| = sort({sort(li) | li ∈ L})    (14.75)

In other words, a normalized layout is ordered for all its elements, and every
element is also ordered. The ordering can be defined as (a, b) ≤ (c, d) ⟺
an + b ≤ cn + d, where n is the width of the matrix.
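The normalization (14.75) is a two-level sort; a minimal sketch:

```python
def normalize(layout):
    # Sort the cells of every piece, then sort the pieces themselves.
    return sorted(sorted(piece) for piece in layout)

l1 = [[(1, 1), (2, 1)], [(4, 3), (5, 3)]]   # two pieces, arbitrary order
l2 = [[(5, 3), (4, 3)], [(2, 1), (1, 1)]]   # same layout, scrambled
print(normalize(l1) == normalize(l2))  # → True
```

Tuples compare lexicographically in Python, which agrees with the (a, b) ≤ (c, d) ordering above.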
Observing that the Kloski board is symmetric, a layout can be mirrored
from another one. A mirrored layout is also a kind of repetition, which should be
avoided. The following M1 and M2 show such an example.

M1 = 10 10 1 2      M2 = 3 1 10 10
     10 10 1 2           3 1 10 10
      3  5 4 4           4 4  2  5
      3  5 8 9           7 6  2  5
      6  7 0 0           0 0  9  8
Note that the normalized layouts are symmetric to each other. It's easy to
get a mirrored layout like this:

mirror(L) = {{(i, n - j + 1) | (i, j) ∈ l} | l ∈ L}    (14.76)
The queue contains the starting layout when initialized. Whenever the
queue isn't empty, we pick the first one from the head, checking if the biggest
block is at the target position, that is L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)}. If yes, then we are
done; otherwise, we try to move every block in 4 directions: left, right, up, and
down, and store all the possible, unique new layouts to the tail of the queue.
During this search, we need to record all the normalized layouts we've ever
found to avoid any duplication.

Denote the queue as Q, the historic layouts as H, the first layout on the head
of the queue as L, its corresponding matrix as M, and the moving sequence to
this layout as S. The algorithm can be defined as the following.
solve(Q, H) =
  | ∅ : Q = ∅
  | reverse(S) : L[10] = {(4, 2), (4, 3), (5, 2), (5, 3)}
  | solve(Q', H') : otherwise
    (14.77)
The first clause says that if the queue is empty, we've tried all the possibilities
and can't find a solution. The second clause finds a solution; it returns the
moving sequence in reversed order. These are the two edge cases. Otherwise, the
algorithm expands the current layout, puts all the valid new layouts to the tail
of the queue to yield Q', and updates the normalized layouts to H'. Then it
performs the recursive search.

In order to expand a layout to the valid, unique new layouts, we can define a
function as below:

expand(L, H) = {(k, d) | k ∈ {1, 2, ..., 10},
                d ∈ {(0, -1), (0, 1), (-1, 0), (1, 0)},
                valid(L, k, d), unique(L, k, d, H)}
    (14.79)

Where valid is defined by (14.74), and unique checks that neither the new
layout nor its mirror is in H.
We'll next show some example Haskell Kloski programs. As arrays aren't
mutable in the purely functional settings, a tree based map is used to represent
the layout. Some type synonyms are defined as below:

import qualified Data.Map as M
import Data.Ix
import Data.List (sort)

type Point = (Integer, Integer)
type Layout = M.Map Integer [Point]
type Move = (Integer, Point)

data Ops = Op Layout [Move]
Where function layout gives the normalized form by sorting. move returns
the updated map by sliding the i-th block by (δy, δx).

layout = sort . map sort . M.elems

move x (i, d) = M.update (Just . map (flip shift d)) i x

shift (y, x) (dy, dx) = (y + dy, x + dx)
Function expand gives all the possible new options. It can be directly translated from expand(L, H).

expand :: Layout -> [[[Point]]] -> [Move]
expand x visit = [(i, d) | i <- [1..10],
                           d <- [(0, -1), (0, 1), (-1, 0), (1, 0)],
                           valid i d, unique i d] where
    valid i d = all (\p -> let p' = shift p d in
                        inRange (bounds board) p' &&
                        (M.keys $ M.filter (elem p') x) `elem` [[i], []])
                    (maybe [] id $ M.lookup i x)
    unique i d = let mv = move x (i, d) in
                 all (`notElem` visit) (map layout [mv, mirror mv])
Note that we also filter out the mirrored layouts. The mirror function is
given as the following.

mirror = M.map (map (\(y, x) -> (y, 5 - x)))
This program takes several minutes to produce the best solution, which takes
116 steps. The final 3 steps are shown below:

...

[5, 3, 2, 1]        [5, 3, 2, 1]        [5, 3, 2, 1]
[5, 3, 2, 1]        [5, 3, 2, 1]        [5, 3, 2, 1]
[7, 9, 4, 4]        [7, 9, 4, 4]        [7, 9, 4, 4]
[A, A, 6, 0]        [A, A, 0, 6]        [0, A, A, 6]
[A, A, 0, 8]        [A, A, 0, 8]        [0, A, A, 8]
Like most programming languages, Python indexes arrays from 0, not 1.
This has to be handled properly. The rest of the functions, including mirror,
matrix, and move, are implemented as the following.
def mirror(layout):
    return [[(y, 5 - x) for (y, x) in r] for r in layout]

def matrix(layout):
    m = [[0] * 4 for _ in range(5)]
    for (i, ps) in zip(range(1, 11), layout):
        for (y, x) in ps:
            m[y - 1][x - 1] = i
    return m

def move(layout, delta):
    (i, (dy, dx)) = delta
    m = dup(layout)
    m[i - 1] = [(y + dy, x + dx) for (y, x) in m[i - 1]]
    return m

def dup(layout):
    return [r[:] for r in layout]
It's possible to modify this Kloski algorithm so that it does not stop
at the first solution, but searches all the solutions. In such a case, the computation time is bound to the size of the space V, where V holds all the layouts that can
be transformed from the starting layout. If all these layouts are stored globally, with a parent field pointing to the predecessor, the space requirement of this
algorithm is also bound to O(V).
Summary of BFS

The above three puzzles, the wolf-goat-cabbage puzzle, the water jugs puzzle,
and the Kloski puzzle, show some common solution structure. Similar to the
DFS problems, they all have a starting state and an end state. The wolf-goat-cabbage puzzle starts with the wolf, the goat, the cabbage, and the farmer
all on one side, while the other side is empty; it ends up in a state where they
have all moved to the other side. The water jugs puzzle starts with two empty jugs,
and ends with either jug containing a certain volume of water. The Kloski puzzle
starts from a layout and ends at another layout, with the biggest block
slid to a given position.

All these problems specify a set of rules which can transfer one state to
another. Different from the DFS approach, we try all the possible options in
parallel. We won't search further until all the other alternatives in the same step
have been examined. This method ensures that the solution with the minimum
steps is found before those with more steps. Reviewing and comparing the two
figures we've drawn before shows the difference between these two approaches.
Because the latter expands the search horizontally, it is called
breadth-first search (BFS for short).
14.3.2 Search the optimal solution

Searching for the optimal solution is quite important in many aspects. People
need the best solution to save time, space, cost, or energy. However, it's not
easy to find the best solution with limited resources. Many
optimization problems can only be solved by brute-force. Nevertheless, we've found
that, for some of them, there exist special simplified ways to search for the optimal
solution.
Greedy algorithm

Huffman coding
char code    char code
A  00000     N  01101
B  00001     O  01110
C  00010     P  01111
D  00011     Q  10000
E  00100     R  10001
F  00101     S  10010
G  00110     T  10011
H  00111     U  10100
I  01000     V  10101
J  01001     W  10110
K  01010     X  10111
L  01011     Y  11000
M  01100     Z  11001
With this code table, the text INTERNATIONAL is encoded to 65 bits.

01000011011001100100100010110100000100110100001110011010000001011

Observe the above code table, which actually maps the letters A to Z to the
numbers 0 to 25, using 5 bits for every code. Code zero is written as 00000,
not 0, for example. This kind of coding method is called fixed-length
coding.
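A fixed-length encoder can be sketched in one line of Python:

```python
def encode_fixed(text):
    # Map 'A'..'Z' to the 5-bit codes 00000..11001.
    return ''.join(format(ord(c) - ord('A'), '05b') for c in text)

s = encode_fixed("INTERNATIONAL")
print(len(s))  # → 65
```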
Another coding method is variable-length coding, in which we can use just
one bit 0 for A, two bits 10 for C, and 5 bits 11001 for Z. Although
this approach can shorten the total code length for INTERNATIONAL
dramatically from 65 bits, it causes a problem when decoding. When processing
a sequence of bits like 1101, we don't know if it means 1 followed by 101,
which stands for BF; or 110 followed by 1, which is GB; or 1101, which is
N.

The famous Morse code is a variable-length coding system: the most
used letter E is encoded as a dot, while Z is encoded as two dashes and two
dots. Morse code uses a special pause separator to indicate the termination of
a code, so that the above problem won't happen. There is another solution to
avoid ambiguity. Consider the following code table.
char code    char code
A  110       E  1110
I  101       L  1111
N  01        O  000
R  001       T  100
Text INTERNATIONAL is encoded to 38 bits only:
10101100111000101110100101000011101111
If we decode the bits against the above code table, we won't meet any ambiguity.
This is because no symbol's code is the prefix of another's.
Such a code is called prefix-code. (You may wonder why it isn't called
non-prefix code.) By using prefix-code, we don't need separators at all, so
the length of the code can be shortened.
This is a very interesting problem: can we find a prefix-code table which
produces the shortest code for a given text? The very same problem was given
to David A. Huffman in 1951, who was then a student at MIT [15]. His professor
Robert M. Fano told the class that those who could solve this problem needn't
take the final exam. Huffman almost gave up and had started preparing for the final
exam when he found the most efficient answer.
The idea is to create the coding table according to the frequency of the
symbols appearing in the text. The more a symbol is used, the shorter its assigned
code.

It's not hard to process some text and calculate the occurrences of each
symbol, so that we have a symbol set in which each symbol is augmented with a weight.
The weight can be any number indicating the frequency with which the symbol occurs;
we can use the number of occurrences, or a probability, for example.
Huffman discovered that a binary tree can be used to generate prefix-code.
All symbols are stored in the leaf nodes. The codes are generated by traversing
the tree from the root: when going left, we add a zero; when going right, we add a
one.
Figure 14.47 illustrates such a binary tree. Taking symbol N for example, starting from the root, we first go left, then right, and arrive at N. Thus the code
for N is 01. For symbol A, we go right, right, then left, so A is
encoded as 110. Note that this approach ensures that no code is the prefix of
another.
(Figures 14.47 and 14.48: the Huffman tree of total weight 13 built from the
leaves O:1, N:3, R:1, T:2, I:2, A:2, E:1, L:1, and the seven steps that build it
by repeatedly merging the two lightest trees.)
struct Node {
    int w;
    char c;
    struct Node *left, *right;
};
Some constraints can be added to the definition, as the empty tree isn't allowed.
A Huffman tree is either a leaf, which contains a symbol and its weight; or a
branch, which only holds the total weight of all leaves. The following Haskell code,
for instance, explicitly specifies these two cases.
data HTr w a = Leaf w a | Branch w (HTr w a) (HTr w a)
When merging two Huffman trees T1 and T2 into a bigger one, these two trees
are set as its children. We can select either one as the left child and the other as
the right. The weight of the result tree T is the sum of its two children, so that
w = w1 + w2. Define T1 < T2 if w1 < w2. One possible Huffman tree building
algorithm can be realized as the following.

build(A) =
    T1                             : A = {T1}
    build({merge(Ta, Tb)} ∪ A')    : otherwise        (14.80)
A is a list of trees. It is initialized with leaves for all symbols and their weights.
If there is only one tree in this list, we are done; that tree is the final Huffman
tree. Otherwise, the two smallest trees Ta and Tb are extracted, and the rest of the
trees are held in list A'. Ta and Tb are merged into one bigger tree, which is put back
into the tree list for further recursive building.
(Ta, Tb, A') = extract(A)        (14.81)
We can scan the tree list to extract the two nodes with the smallest weight. The equation below shows that when the scan begins, the first two elements are compared
and initialized as the two minimum ones. An empty accumulator is passed as
the last argument.

extract(A) = extract'(min(T1, T2), max(T1, T2), {T3, T4, ...}, ∅)        (14.82)
For every tree, if its weight is less than the smallest two we've found so far,
we update the result to contain this tree. For any given tree list A, denote the
first tree in it as T1, and the rest of the trees except T1 as A'. The scan process can
be defined as the following.

extract'(Ta, Tb, A, B) =
    (Ta, Tb, B)                           : A = ∅
    extract'(Ta', Tb', A', {Tb} ∪ B)      : T1 < Tb
    extract'(Ta, Tb, A', {T1} ∪ B)        : otherwise        (14.83)

Where Ta' = min(T1, Ta), Tb' = max(T1, Ta) are the updated two trees with
the smallest weights.
The following Haskell example program implements this Huffman tree building algorithm.
build [x] = x
build xs = build ((merge x y) : xs') where
  (x, y, xs') = extract xs

extract (x:y:xs) = min2 (min x y) (max x y) xs [] where
  min2 x y [] xs = (x, y, xs)
  min2 x y (z:zs) xs | z < y = min2 (min z x) (max z x) zs (y:xs)
                     | otherwise = min2 x y zs (z:xs)
The algorithm merges all the leaves, and it needs to scan the list in each iteration.
Thus the performance is quadratic. This algorithm can be improved. Observe
that each time, only the two trees with the smallest weights are merged. This
reminds us of the heap data structure. A heap gives fast access to the smallest element.
We can put all the leaves in a heap; for a binary heap, this is typically a
linear operation. Then we extract the minimum element twice, merge them,
and put the bigger tree back into the heap. This is an O(lg n) operation if a binary
heap is used. So the total performance is O(n lg n), which is better than the
above algorithm. The next algorithm extracts the node from the heap, and
starts the Huffman tree building.
build(H) = reduce(top(H), pop(H))
(14.84)
This algorithm stops when the heap is empty; otherwise, it extracts another
node from the heap for merging.

reduce(T, H) =
    T                                               : H = ∅
    build(insert(merge(T, top(H)), pop(H)))         : otherwise        (14.85)

Functions build and reduce are mutually recursive. The following Haskell
example program implements this algorithm by using the heap defined in the previous
chapter (the names top, pop, insert, and isEmpty are assumed to follow that
chapter's heap interface).

build h = reduce (top h) (pop h)

reduce t h | isEmpty h = t
           | otherwise = build (insert (merge t (top h)) (pop h))
The heap solution can also be realized imperatively. The leaves are first
transformed into a heap, so that the one with the minimum weight is on
top. As long as there is more than one element in the heap, we extract the two
smallest, merge them into a bigger one, and put it back into the heap. The final tree
left in the heap is the resulting Huffman tree.
1: function Huffman(A)
2:     Build-Heap(A)
3:     while |A| > 1 do
4:         Ta ← Heap-Pop(A)
5:         Tb ← Heap-Pop(A)
6:         Heap-Push(A, Merge(Ta, Tb))
7:     return Heap-Pop(A)
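The pseudocode above maps directly onto Python's heapq module; as a hedged sketch, a tree here is a nested tuple, a leaf (w, c) or a branch (w, left, right), and a tie-breaking counter keeps the heap from ever comparing the tree payloads:

```python
import heapq

def huffman(symbols):
    # symbols: list of (weight, char) pairs.
    # Heap entries are (weight, tiebreak, tree) so that ordering is by weight.
    heap = [(w, i, (w, c)) for i, (w, c) in enumerate(symbols)]
    heapq.heapify(heap)              # linear-time Build-Heap
    count = len(heap)
    while len(heap) > 1:
        wa, _, ta = heapq.heappop(heap)   # two smallest trees
        wb, _, tb = heapq.heappop(heap)
        heapq.heappush(heap, (wa + wb, count, (wa + wb, ta, tb)))
        count += 1
    return heap[0][2]
```

For the weights used in this chapter (A:2, E:1, I:2, L:1, N:3, O:1, R:1, T:2), the root of the resulting tree carries the total weight 13.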
The following example C++ code implements this heap solution. The heap
used here is provided by the standard library. Because a max-heap, not a
min-heap, would be made by default, a greater predicate is explicitly passed
as argument.

bool greaterp(Node* a, Node* b) { return b->w < a->w; }

Node* pop(Nodes& h) {
    Node* m = h.front();
    pop_heap(h.begin(), h.end(), greaterp);
    h.pop_back();
    return m;
}
When the symbol-weight list has already been sorted, there exists a linear
time method to build the Huffman tree. Observe that during Huffman tree
building, the merged trees are produced with weights in ascending order.
We can use a queue to manage the merged trees. Every time, we pick the two
trees with the smallest weight from the queue and the list, merge them,
and push the result to the queue. All the trees in the list will be processed, and
there will be only one tree left in the queue. This tree is the resulting Huffman
tree. This process starts by passing an empty queue as below.
build'(A) = reduce'(extract'(∅, A))        (14.86)
Suppose A is in ascending order by weight. At any time, the tree with the
smallest weight is either the head of the queue, or the first element of the list.
Denote the head of the queue as Ta; after popping it, the queue is Q'. The first
element in A is Tb, and the rest of the elements are held in A'. Function extract' can be
defined like the following.

extract'(Q, A) =
    (Tb, (Q, A'))      : Q = ∅
    (Ta, (Q', A))      : A = ∅ ∨ Ta < Tb
    (Tb, (Q, A'))      : otherwise        (14.87)
Actually, the pair of queue and tree list can be viewed as a special heap.
The tree with the minimum weight is continuously extracted and merged.

reduce'(T, (Q, A)) =
    T                                                      : Q = ∅ ∧ A = ∅
    reduce'(extract'(push(Q', merge(T, T')), A'))          : otherwise        (14.88)

Where (T', (Q', A')) = extract'(Q, A), which means extracting another
tree. The following Haskell example program shows the implementation of this
method. Note that this program explicitly sorts the leaves, which isn't necessary
if the leaves are already ordered. Again, a list, not a real queue, is used here for
illustration purposes. Lists aren't good at pushing new elements; please refer to the
chapter about queues for details.
huffman :: (Num a, Ord a) => [(b, a)] -> HTr a b
huffman = reduce' . wrap . sort . map (\(c, w) -> Leaf w c) where
  ...
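The linear-time two-queue method can also be sketched in Python, with a deque standing in for the queue; this is a hedged translation of equations (14.86)–(14.88), assuming the input leaves are (weight, symbol) pairs already sorted ascending by weight:

```python
from collections import deque

def huffman_sorted(leaves):
    # leaves: (weight, char) pairs, sorted ascending by weight.
    # Merged trees enter the queue in ascending weight order, so the
    # minimum is always at the queue front or the list head.
    queue, i = deque(), 0

    def extract():
        nonlocal i
        if not queue or (i < len(leaves) and leaves[i][0] < queue[0][0]):
            i += 1
            return leaves[i - 1]     # list head is the smallest
        return queue.popleft()       # queue front is the smallest

    while True:
        a = extract()
        if not queue and i == len(leaves):
            return a                 # only one tree left: the Huffman tree
        b = extract()
        queue.append((a[0] + b[0], a, b))
```

Each tree is extracted and pushed exactly once, so the whole build is linear in the number of leaves.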
Note that the sorting isn't necessary if the trees have already been ordered.
It can be replaced by a linear time reversing in case the trees are in descending order by
weight.
Three different Huffman tree building methods have been explained.
Although they follow the same approach developed by Huffman, the resulting trees
vary. Figure 14.49 shows the three different Huffman trees built with these
methods.
Figure 14.49: Variation of Huffman trees for the same symbol list.
Although these three trees are not identical, they are all able to generate the
most efficient code. The formal proof is skipped here. The detailed information
can be found in [15] and Section 16.3 of [2].
Huffman tree building is the core idea of Huffman coding. Many things
can be easily achieved with the Huffman tree. For example, the code table can
be generated by traversing the tree. We start from the root with the empty
prefix p. For any branch, we append a zero to the prefix if we turn left, and
append a one if we turn right. When a leaf node is reached, the symbol represented
by this node and the prefix are put into the code table. Denote the symbol of a
leaf node as c, and the children of tree T as Tl and Tr respectively. The code table
association list can be built with code(T, ∅), which is defined as below.

code(T, p) =
    {(c, p)}                                      : leaf(T)
    code(Tl, p ∪ {0}) ∪ code(Tr, p ∪ {1})         : otherwise        (14.89)
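Equation (14.89) translates directly; a hedged sketch using the nested-tuple tree representation from the earlier examples (leaf = (weight, char), branch = (weight, left, right)):

```python
def code_table(tree, prefix=''):
    # Traverse from the root: append '0' going left, '1' going right;
    # emit (symbol, prefix) at each leaf.
    if len(tree) == 2:                  # leaf: (weight, char)
        return [(tree[1], prefix)]
    _, left, right = tree               # branch: (weight, left, right)
    return code_table(left, prefix + '0') + code_table(right, prefix + '1')
```

For a tiny tree with leaf A on the left and leaves B, C under the right branch, this yields the codes 0, 10, and 11 respectively.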
decode(T, B) =
    {c}                            : B = ∅ ∧ leaf(T)
    {c} ∪ decode(root(T), B)       : leaf(T)
    decode(Tl, B')                 : b1 = 0
    decode(Tr, B')                 : otherwise        (14.90)
Where root(T) returns the root of the Huffman tree, b1 is the first bit of B, and B' holds the remaining bits. The following Haskell
example code implements this algorithm.
decode tr cs = find tr cs where
  find (Leaf _ c) [] = [c]
  find (Leaf _ c) bs = c : find tr bs
  find (Branch _ l r) (b:bs) = find (if b == 0 then l else r) bs
Note that this is an on-line decoding algorithm with linear time performance.
It consumes one bit at a time. This can be clearly seen from the imperative realization below, where the index keeps increasing by one.
1: function Decode(T, B)
2:     W ← ∅
3:     n ← |B|, i ← 1
4:     while i < n do
5:         R ← T
6:         while ¬ Leaf(R) do
7:             if B[i] = 0 then
8:                 R ← Left(R)
9:             else
10:                R ← Right(R)
11:            i ← i + 1
12:        W ← W ∪ { Symbol(R) }
13:    return W
This imperative algorithm can be implemented as the following example
C++ program.
string decode(Node* root, const char* bits) {
    string w;
    while (*bits) {
        Node* t = root;
        while (!isleaf(t))
            t = '0' == *bits++ ? t->left : t->right;
        w += t->c;
    }
    return w;
}
change(X, C) =
    ∅                             : X = 0
    {cm} ∪ change(X − cm, C)      : otherwise, cm = max({c | c ∈ C, c ≤ X})        (14.91)

If C is in descending order, cm can be found as the first coin not greater
than X. If we want to change 1.42 dollars, this function produces a coin list of
{100, 25, 5, 5, 5, 1, 1}. The output coin list can easily be transformed to contain
pairs {(100, 1), (25, 1), (5, 3), (1, 2)}. That is, we need one dollar, a quarter, three
coins of 5 cents, and two coins of 1 cent to make the change. The following Haskell
example program outputs the result as such.
As mentioned above, this program assumes the coins are in descending order,
for instance like below.
solve 142 [100, 50, 25, 5, 1]
For a coin system like that of the USA, the greedy approach can find the optimal solution; the number of coins is minimal. Fortunately, our greedy method
works in most countries. But it is not always true. For example, suppose a
country has coins of value 1, 3, and 4 units. The best change for value 6 is
to use two coins of 3 units; however, the greedy method gives a result of three
coins: one coin of 4 and two coins of 1, which isn't the optimal result.
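The greedy rule is a few lines in Python (a hedged sketch; like the equation above, it assumes the coin values are given in descending order), and running it on the 1/3/4 system reproduces the suboptimal result just described:

```python
def greedy_change(x, coins):
    # coins must be in descending order; repeatedly take the
    # largest coin not greater than the remaining amount.
    result = []
    for c in coins:
        while x >= c:
            result.append(c)
            x -= c
    return result
```

For USA coins, greedy_change(142, [100, 50, 25, 5, 1]) gives the optimal seven coins, but greedy_change(6, [4, 3, 1]) gives three coins where two coins of 3 would do.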
Summary of greedy method
As shown in the change making problem, the greedy method doesn't always give the
best result. In order to find the optimal solution, we need dynamic programming,
which will be introduced in the next section.
However, the result is often good enough in practice. Let's take the word-wrap problem for example. In modern software editors and browsers, text spans
multiple lines if the content is too long to be held in one. With word-wrap supported, the user needn't insert hard line breaks. Although dynamic programming can wrap with the minimum number of lines, it's overkill. On the contrary,
a greedy algorithm can wrap with a number of lines close to the optimal result, with
a quite effective realization as below. Here it wraps text T, not exceeding line
width W, with space s between each word.
1: L ← W
2: for w ∈ T do
3:     if |w| + s > L then
4:         Insert line break
5:         L ← W − |w|
6:     else
7:         L ← L − |w| − s
For each word w in the text, it uses a greedy strategy to put as many words
in a line as possible, unless the line width would be exceeded. Many word processors use
a similar algorithm for word-wrapping.
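The pseudocode above can be sketched in Python; this hedged version fixes the separator width at s = 1 and returns the wrapped lines instead of printing break marks:

```python
def word_wrap(words, width):
    # Greedy: pack as many words as fit into each line of the given width.
    lines, line, remaining = [], [], width
    for w in words:
        if line and len(w) + 1 > remaining:   # +1 for the separating space
            lines.append(' '.join(line))      # "insert line break"
            line, remaining = [w], width - len(w)
        else:
            remaining -= len(w) + (1 if line else 0)
            line.append(w)
    if line:
        lines.append(' '.join(line))
    return lines
```

Wrapping "the quick brown fox jumps over the lazy dog" at width 10 produces five lines, each at most 10 characters wide.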
In many cases the strictly optimal result, rather than an approximation,
is required. Dynamic programming can help solve such problems.
Dynamic programming
In the change-making problem, we mentioned that the greedy method can't always
give the optimal solution. For an arbitrary coin system, is there any way to find the
best change?
Suppose we have found the best solution which makes X value of money.
The coins needed are contained in Cm. We can partition these coins into two
collections, C1 and C2. They make money of X1 and X2 respectively. We'll
prove that C1 is the optimal solution for X1, and C2 is the optimal solution for
X2.
Proof. For X1, suppose there exists another solution C1' which uses fewer coins
than C1. Then the solution C1' ∪ C2 uses fewer coins to make X than Cm.
This conflicts with the fact that Cm is the optimal solution to X. Similarly,
we can prove C2 is the optimal solution to X2.
Note that the reverse is not true. If we arbitrarily select a
value Y < X and divide the original problem into finding the optimal solutions for the sub-problems Y and X − Y, combining the two optimal solutions doesn't necessarily
yield an optimal solution for X. Consider this example. There are coins with values
1, 2, and 4. The optimal solution for making value 6 is to use two coins of values
2 and 4; however, if we divide 6 = 3 + 3, since each 3 can be made with the optimal
solution 3 = 1 + 2, the combined solution contains four coins (1 + 1 + 2 + 2).
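A brute-force check confirms this counterexample (a hypothetical sketch, enumerating coin multisets up to a small bound; not an efficient method, just a verifier):

```python
from itertools import combinations_with_replacement

def best_change(x, coins, bound=10):
    # Exhaustively search multisets of at most `bound` coins, smallest first,
    # so the first multiset found that sums to x uses the fewest coins.
    for n in range(0, bound + 1):
        for combo in combinations_with_replacement(coins, n):
            if sum(combo) == x:
                return list(combo)
    return None
```

With coins {1, 2, 4}, the optimal change for 6 is two coins, while the optimal change for 3 is two coins; gluing two copies of the latter gives four coins for 6.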
If the optimal solution of a problem can be divided into optimal solutions of several
sub-problems, we say it has optimal substructure. We see that the change-making problem has
optimal substructure. But the dividing has to be done based on the coins, not
on an arbitrary value.
The optimal substructure can be expressed recursively as the following.

change(X) =
    ∅                                                    : X = 0
    least({{c} ∪ change(X − c) | c ∈ C, c ≤ X})          : otherwise        (14.92)

For any coin system C, the change for zero is empty; otherwise, we
check every candidate coin c which is not greater than value X, and recursively
find the best solution for X − c; we pick the coin collection which contains the
fewest coins as the result.
The following Haskell example program implements this top-down recursive solution.
change _ 0 = []
change cs x = minimumBy (compare `on` length)
                [c : change cs (x - c) | c <- cs, c <= x]
Although this program outputs the correct answer [2, 4] when evaluating change
[1, 2, 4] 6, it performs very badly when changing 1.42 dollars with the USA coin
system.
Fn =
    1               : n = 1 ∨ n = 2
    Fn−1 + Fn−2     : otherwise        (14.93)

F8 = F7 + F6
   = F6 + F5 + F5 + F4
   = F5 + F4 + F4 + F3 + F4 + F3 + F3 + F2
   = ...
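The expansion shows the same sub-problems being recomputed over and over. Caching each value once computed, i.e. memoization, removes the duplication and makes the recursion linear; a minimal sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each F(n) is computed once; later calls are served from the cache.
    return 1 if n == 1 or n == 2 else fib(n - 1) + fib(n - 2)
```

Without the cache, fib(30) makes over a million recursive calls; with it, only 30 distinct values are ever computed.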
It happens that both options yield a solution of two coins; we can select
either of them as the best solution. Generally speaking, the candidate with the
fewest coins is selected as the solution and filled into the table.
At any iteration, when we are trying to change a value i (i ≤ X) of money,
we examine all the types of coin. For any coin c not greater than i, we look
up the solution table to fetch the sub-solution T[i − c]. The number of coins
in this sub-solution plus the one coin of c is the total number of coins needed in this
candidate solution. The candidate with the fewest coins is then selected and written to the
solution table.
The following algorithm realizes this bottom-up idea.
1: function Change(X)
2:     T ← {∅, ∅, ...}
3:     for i ← 1 to X do
4:         for c ∈ C, c ≤ i do
5:             if T[i] = ∅ ∨ 1 + |T[i − c]| < |T[i]| then
6:                 T[i] ← {c} ∪ T[i − c]
7:     return T[X]
This algorithm can be directly translated to an imperative program, in Python
for example.

def changemk(x, cs):
    s = [[] for _ in range(x+1)]
    for i in range(1, x+1):
        for c in cs:
            if c <= i and (s[i] == [] or 1 + len(s[i-c]) < len(s[i])):
                s[i] = [c] + s[i-c]
    return s[x]
Observing the solution table, it's easy to find that much duplicated
content is stored.

i      6        7           8              9                  10       ...
T[i]   {1, 5}   {1, 1, 5}   {1, 1, 1, 5}   {1, 1, 1, 1, 5}    {5, 5}   ...
This is because the optimal sub-solutions are completely copied and saved
in the parent solution. In order to use less space, we can record only the delta
part from the optimal sub-solution. In the change-making problem, this means
recording only the coin selected for value i.
1: function Change(X)
2:     T ← {0, ∞, ∞, ...}
3:     S ← {NIL, NIL, ...}
4:     for i ← 1 to X do
5:         for c ∈ C, c ≤ i do
6:             if 1 + T[i − c] < T[i] then
7:                 T[i] ← 1 + T[i − c]
8:                 S[i] ← c
9:     while X > 0 do
10:        Print(S[X])
11:        X ← X − S[X]
Instead of recording the complete solution list of coins, this new algorithm
uses two tables T and S. T holds the minimum number of coins needed for
changing values 0, 1, 2, ...; while S holds the first coin selected for the
optimal solution. To obtain the complete coin list for changing money X, the first coin
is S[X], and the sub-optimal solution is to change money X' = X − S[X]. We
can look up S[X'] for the next coin. The coins for the sub-optimal solutions
are repeatedly looked up like this until the beginning of the table. The Python
example program below implements this algorithm.
def chgmk(x, cs):
    cnt = [0] + [x+1] * x
    s = [0]
    for i in range(1, x+1):
        coin = 0
        for c in cs:
            if c <= i and 1 + cnt[i-c] < cnt[i]:
                cnt[i] = 1 + cnt[i-c]
                coin = c
        s.append(coin)
    r = []
    while x > 0:
        r.append(s[x])
        x = x - s[x]
    return r
In function change(T, i), all the coins not greater than i are examined to
select the one leading to the best result. The fewest number of coins and the
selected coin form a pair. This pair is inserted into the finger tree, so
that a new solution table is returned.

change(T, i) = insert(T, fold(sel, (∞, 0), {c | c ∈ C, c ≤ i}))        (14.95)
Again, folding is used to select the candidate with the minimum number of
coins. This folding starts with the initial value (∞, 0), over all valid coins. Function
sel((n, c), c') accepts two arguments: one is a pair of length and coin, which
12 Some purely functional programming environments, Haskell for instance, provide built-in
arrays; while other, almost pure ones, such as ML, provide mutable arrays.
is the best solution so far; the other is a candidate coin. It examines whether this
candidate can make a better solution.

sel((n, c), c') =
    (1 + n', c')    : 1 + n' < n, where (n', c'') = T[i − c']
    (n, c)          : otherwise        (14.96)
After the solution table is built, the coins needed can be generated from it.

make(X, T) =
    ∅                          : X = 0
    {c} ∪ make(X − c, T)       : otherwise, (n, c) = T[X]        (14.97)

The following example Haskell program uses Data.Sequence, the finger tree
library, to implement this change-making solution.
LCS(X, Y) =
    ∅                                    : X = ∅ ∨ Y = ∅
    {x1} ∪ LCS(X', Y')                   : x1 = y1
    longer(LCS(X, Y'), LCS(X', Y))       : otherwise
        a   n   t   e   n   n   a
        1   2   3   4   5   6   7
b  1
a  2
n  3
a  4
n  5
a  6
This table shows an example of finding the longest common subsequence for
the strings "antenna" and "banana". Their lengths are 7 and 6. The bottom-right
corner of this table is looked up first. Since it's empty, we need to compare the
7th element in "antenna" and the 6th in "banana"; they are both 'a', thus we
next recursively look up the cell at row 5, column 6. It's still empty, and
we repeat this until we either reach a trivial case where one substring becomes empty,
or some cell we are looking up has been filled before. Similar to the change-making problem, whenever the optimal solution for a sub-problem is found, it is
recorded in the cell for reuse. Note that this process is in the reverse
order compared to the recursive equation given above, in that we start from the
right-most element of each string.
Considering that the longest common subsequence for any empty string is
still empty, we can extend the solution table so that the first row and column
hold empty strings.
        ∅   a   n   t   e   n   n   a
   ∅    ∅   ∅   ∅   ∅   ∅   ∅   ∅   ∅
   b    ∅
   a    ∅
   n    ∅
   a    ∅
   n    ∅
   a    ∅
The algorithm below realizes the top-down recursive dynamic programming solution with such a table.

1: T ← NIL
2: function LCS(X, Y)
3:     m ← |X|, n ← |Y|
4:     m' ← m + 1, n' ← n + 1
5:     if T = NIL then
6:         T ← {{∅, ∅, ..., ∅}, {∅, NIL, NIL, ...}, ...}    ▷ m' × n'
7:     if X ≠ ∅ ∧ Y ≠ ∅ ∧ T[m'][n'] = NIL then
8:         if X[m] = Y[n] then
9:             T[m'][n'] ← Append(LCS(X[1..m−1], Y[1..n−1]), X[m])
10:        else
11:            T[m'][n'] ← Longer(LCS(X, Y[1..n−1]), LCS(X[1..m−1], Y))
12:    return T[m'][n']
The table is first initialized with the first row and column filled with empty
strings; the rest are all NIL values. Unless either string is empty, or the cell
content isn't NIL, the last two elements of the strings are compared, and the longest common subsequence is recursively computed on the substrings. The following
function Get(T, X, Y, i, j)
    if i = 0 ∨ j = 0 then
        return ∅
    else if X[i] = Y[j] then
        return Append(Get(T, X, Y, i − 1, j − 1), X[i])
    else if T[i − 1][j] > T[i][j − 1] then
        return Get(T, X, Y, i − 1, j)
    else
        return Get(T, X, Y, i, j − 1)
In the bottom-up approach, we start from the cell at the second row and
the second column. This cell corresponds to the first elements of both X
and Y. If they are the same, the length of the longest common subsequence so far
is 1. This is obtained by increasing by one the length of the empty sequence
stored in the top-left cell; otherwise, we pick the maximum value from
the upper cell and the left cell. The table is repeatedly filled in this manner.
After that, a back-track is performed to construct the longest common subsequence. This time we start from the bottom-right corner of the table. If the
last elements in X and Y are the same, we put this element as the last one of the
result, and go on looking up the cell along the diagonal line; otherwise, we
compare the values in the left cell and the upper cell, and go on looking up the
cell with the bigger value.
The following example Python program implements this algorithm.

def lcs(xs, ys):
    m = len(xs)
    n = len(ys)
    c = [[0] * (n+1) for _ in range(m+1)]
    for i in range(1, m+1):
        for j in range(1, n+1):
            if xs[i-1] == ys[j-1]:
                c[i][j] = c[i-1][j-1] + 1
            else:
                c[i][j] = max(c[i-1][j], c[i][j-1])
    return get(c, xs, ys, m, n)

def get(c, xs, ys, i, j):
    if i == 0 or j == 0:
        return []
    elif xs[i-1] == ys[j-1]:
        return get(c, xs, ys, i-1, j-1) + [xs[i-1]]
    elif c[i-1][j] > c[i][j-1]:
        return get(c, xs, ys, i-1, j)
    else:
        return get(c, xs, ys, i, j-1)
Note that, since the table needs to be looked up by index, X is zipped with the
natural numbers. Function f creates a new row of this table by folding on
sequence Y, and records the lengths of the longest common subsequences for all
possible cases so far.

f(T, (i, x)) = insert(T, fold(longest, {0}, zip({1, 2, ...}, Y)))        (14.100)
Function longest takes the intermediate filled row result R, and a pair of an index
and an element in Y. It compares whether this element is the same as the one in X, then
fills the new cell with the length of the longest common subsequence.

longest(R, (j, y)) =
    insert(R, 1 + T[i − 1][j − 1])                : x = y
    insert(R, max(T[i − 1][j], last(R)))          : otherwise        (14.101)
After the table is built, the longest common subsequence can be constructed recursively by looking up this table. We can pass the reversed sequences.
If the sequences are not empty, denote the first elements as x and y. The rest
of the elements are held in X' and Y' respectively. The function get' can be defined
as the following.

get'((X, Y), (i, j)) =
    ∅                                        : X = ∅ ∨ Y = ∅
    {x} ∪ get'((X', Y'), (i − 1, j − 1))     : x = y
    get'((X', Y), (i − 1, j))                : T[i − 1][j] > T[i][j − 1]
    get'((X, Y'), (i, j − 1))                : otherwise        (14.103)
The Haskell example program below implements this solution.
solve(X, s) =
    ∅                            : X = ∅
    {{x1}} ∪ solve(X', s)        : x1 = s
    ...

sl = Σ{x | x ∈ X, x < 0}        su = Σ{x | x ∈ X, x > 0}        (14.105)
2:     sl ← Σ{x | x ∈ X, x < 0}
3:     su ← Σ{x | x ∈ X, x > 0}
4:     n ← |X|
       (the solution table T has n rows and su − sl + 1 columns)
Note that the index into the columns of the table doesn't range from 1 to su −
sl + 1, but maps directly from sl to su. Because most programming environments
don't support negative indices, this can be dealt with as T[i][j − sl]. The following
example Python program utilizes Python's property of negative indexing.
def solve(xs, s):
    low = sum([x for x in xs if x < 0])
    up = sum([x for x in xs if x > 0])
    tab = [[False] * (up - low + 1) for _ in xs]
    for i in range(0, len(xs)):
        for j in range(low, up + 1):
            tab[i][j] = (xs[i] == j)
            j1 = j - xs[i]
            tab[i][j] = tab[i][j] or tab[i-1][j] or \
                        (low <= j1 and j1 <= up and tab[i-1][j1])
    return tab[-1][s]
Note that this program doesn't use different branches for i = 0 and i =
1, 2, ..., n − 1. This is because when i = 0, the row index i − 1 = −1 refers to
the last row in the table, which is all false. This simplifies the logic by one more
step.
With this table built, it's easy to construct all subsets summing to s. The method
is to look up the last row for the cell representing s. If the last element xn = s, then
{xn} definitely is a candidate. We next look up the previous row for s, and
recursively construct all the possible subsets summing to s with {x1, x2, x3, ..., xn−1}.
Finally, we look up the second-to-last row for the cell representing s − xn. For every
subset summing to this value, we add element xn to construct a new subset, which
sums to s.
1: function Get(X, s, T, n)
2:     S ← ∅
3:     if X[n] = s then
4:         S ← S ∪ {{X[n]}}
5:     if n > 1 then
6:         if T[n − 1][s] then
7:             S ← S ∪ Get(X, s, T, n − 1)
8:         if T[n − 1][s − X[n]] then
9:             S ← S ∪ {{X[n]} ∪ S' | S' ∈ Get(X, s − X[n], T, n − 1)}
10:    return S
The following Python example program translates this algorithm.
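The Python translation of this back-track did not survive in this copy of the text; the following is a hedged sketch of it, with the boolean-table builder repeated (under the hypothetical name subset_table) so the snippet is self-contained. The bounds check on j1, which the pseudocode omits, guards against Python's index wrap-around:

```python
def subset_table(xs):
    # tab[i][j] is True if some subset of xs[0..i] sums to j
    # (j may be negative; negative indices wrap around, as above).
    low = sum(x for x in xs if x < 0)
    up = sum(x for x in xs if x > 0)
    tab = [[False] * (up - low + 1) for _ in xs]
    for i in range(len(xs)):
        for j in range(low, up + 1):
            j1 = j - xs[i]
            tab[i][j] = (xs[i] == j) or tab[i - 1][j] or \
                        (low <= j1 <= up and tab[i - 1][j1])
    return tab

def get(xs, s, tab, n):
    # Collect every subset of xs[0..n-1] that sums to s.
    low = sum(x for x in xs if x < 0)
    up = sum(x for x in xs if x > 0)
    res = []
    if xs[n - 1] == s:
        res.append([xs[n - 1]])
    if n > 1:
        if tab[n - 2][s]:
            res += get(xs, s, tab, n - 1)
        j1 = s - xs[n - 1]
        if low <= j1 <= up and tab[n - 2][j1]:
            res += [[xs[n - 1]] + ys for ys in get(xs, j1, tab, n - 1)]
    return res
```

For xs = [1, 2, 3] and s = 3, the back-track yields the two subsets {3} and {1, 2}.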
2:     sl ← Σ{x | x ∈ X, x < 0}
3:     su ← Σ{x | x ∈ X, x > 0}
4:     T ← {∅, ∅, ...}    ▷ su − sl + 1 cells
5:     for x ∈ X do
6:         T' ← Duplicate(T)
7:         for j ← sl to su do
8:             j' ← j − x
9:             if x = j then
10:                T'[j] ← T'[j] ∪ {{x}}
11:            if sl ≤ j' ≤ su ∧ T[j'] ≠ ∅ then
12:                T'[j] ← T'[j] ∪ {{x} ∪ S | S ∈ T[j']}
13:        T ← T'
14:    return T[s]
The corresponding Python example program is given below. Note that a shallow
copy of the outer list is not enough here: each cell must be copied as well,
otherwise the in-place append would also modify the cell shared with the
previous table.

def subsetsum(xs, s):
    low = sum([x for x in xs if x < 0])
    up = sum([x for x in xs if x > 0])
    tab = [[] for _ in range(low, up + 1)]
    for x in xs:
        tab1 = [cell[:] for cell in tab]  # copy every cell
        for j in range(low, up + 1):
            if x == j:
                tab1[j].append([x])
            j1 = j - x
            if low <= j1 and j1 <= up and tab[j1] != []:
                tab1[j] = tab1[j] + [[x] + ys for ys in tab[j1]]
        tab = tab1
    return tab[s]
This imperative algorithm shows a clear structure: the solution table is
built by looping over every element. This can be realized in a purely functional way
by folding. A finger tree can be used to represent the vector spanning from sl to
su. It is initialized with all empty values, as in the following equation.
subsetsum(X, s) = fold(build, {∅, ∅, ..., ∅}, X)[s]        (14.106)

After folding, the solution table is built, and the answer is looked up at cell s13.
For every element x ∈ X, function build folds over the list {sl, sl + 1, ..., su}. For
every value j, it checks whether j equals x and appends the singleton set {x} to
the j-th cell. Note that here the cell is indexed from sl, not from 0. If the cell
corresponding to j − x is not empty, the candidate solutions stored in that cell
are duplicated, and element x is added to every such solution.
build(T, x) = fold(f, T, {sl, sl + 1, ..., su})        (14.107)

f(T, j) =
    T'[j] ∪ {{x} ∪ Y | Y ∈ T'[j']}    : sl ≤ j' ≤ su ∧ T'[j'] ≠ ∅, j' = j − x
    T'                                 : otherwise        (14.108)

Here the adjustment is applied to T', which is itself an adjustment to T, as
shown below.

T' =
    {x} ∪ T[j]    : x = j
    T             : otherwise        (14.109)

Note that the first clause in both equations (14.108) and (14.109) returns a
new table with a certain cell updated with the given value.
The following Haskell example program implements this algorithm.
subsetsum xs s = foldl build (fromList [[] | _ <- [l..u]]) xs `idx` s where
    l = sum $ filter (< 0) xs
    u = sum $ filter (> 0) xs
    idx t i = index t (i - l)
    build tab x = foldl (\t j -> let j' = j - x in
                      adjustIf (l <= j' && j' <= u && tab `idx` j' /= [])
                               (++ [x:ys | ys <- tab `idx` j']) j
                               (adjustIf (x == j) ([x]:) j t)) tab [l..u]
    adjustIf pred f i seq = if pred then adjust f (i - l) seq else seq
Some materials like [16] provide common structures to abstract dynamic programming, so that problems can be solved with a generic solution by customizing the precondition, the comparison of candidate solutions for the better choice,
and the merge method for sub-solutions. However, the variety of problems
makes things complex in practice. It's important to study the properties of the
problem carefully.
Exercise 14.3
Realize a maze solver by using the stack approach, which can find all the
possible paths.
13 Again, here we skip the error handling for the case that s < sl or s > su. There is no
solution if s is out of range.
One option to realize the bottom-up solution for the longest common
subsequence problem is to record the direction in the table. Thus, instead
of storing the length information, three values like "N" for north, "W"
for west, and "NW" for northwest are used to indicate how to construct
the final result. We start from the bottom-right corner of the table: if
the cell value is "NW", we go along the diagonal by moving to the cell
in the upper-left; if it's "N", we move vertically to the upper row; and
we move horizontally if it's "W". Implement this approach in your favorite
programming language.
Given a list of non-negative integers, find the maximum sum composed by
numbers that none of them are adjacent.
Levenshtein edit distance is defined as the cost of converting one
string s to another string t. It is widely used in spell-checking, OCR
correction, etc. There are three operations allowed in Levenshtein edit
distance: insert a character, delete a character, and substitute a character.
Each operation mutates one character at a time. The following example shows
how to convert the string "kitten" to "sitting". The Levenshtein edit distance
is 3 in this case.
1. kitten → sitten (substitution of "s" for "k");
2. sitten → sittin (substitution of "i" for "e");
3. sittin → sitting (insertion of "g" at the end).
Develop a program to calculate the Levenshtein edit distance of two strings
with dynamic programming.
14.4 Short summary
This chapter introduces elementary methods of searching. Some of them
instruct the computer to scan for interesting information among the data. They
often maintain some structure that can be updated during the scan. This can be
considered a special case of the information-reusing approach. The other
commonly used strategy is divide and conquer, where the scale of the search
domain keeps decreasing until some obvious result is reached. This chapter also explains
methods to search for solutions among domains. The solutions typically are not
the elements being searched. They can be a series of decisions or some arrangement
of operations. If there are multiple solutions, sometimes people want to find the
optimal one. For some special cases, there exist simplified approaches such as
the greedy methods. And dynamic programming can be used for a wider
range of problems when they show optimal substructure.
Bibliography
[1] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting
and Searching (2nd Edition). Addison-Wesley Professional; 2 edition (May
4, 1998) ISBN-10: 0201896850 ISBN-13: 978-0201896855
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford
Stein. Introduction to Algorithms, Second Edition. ISBN: 0262032937.
The MIT Press. 2001
[3] M. Blum, R.W. Floyd, V. Pratt, R. Rivest and R. Tarjan, Time bounds
for selection, J. Comput. System Sci. 7 (1973) 448-461.
[4] Jon Bentley. Programming pearls, Second Edition. Addison-Wesley Professional; 1999. ISBN-13: 978-0201657883
[5] Richard Bird. Pearls of functional algorithm design. Chapter 3. Cambridge University Press. 2010. ISBN, 1139490605, 9781139490603
[6] Edsger W. Dijkstra. The saddleback search. EWD-934.
http://www.cs.utexas.edu/users/EWD/index09xx.html. 1985.
[7] Robert Boyer, and Strother Moore. MJRTY - A Fast Majority Vote Algorithm. Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning Series, Kluwer Academic Publishers, Dordrecht, The
Netherlands, 1991, pp. 105-117.
[8] Cormode, Graham; S. Muthukrishnan (2004). An Improved Data Stream
Summary: The Count-Min Sketch and its Applications. J. Algorithms 55:
29-38.
[9] Knuth Donald, Morris James H., jr, Pratt Vaughan. Fast pattern matching
in strings. SIAM Journal on Computing 6 (2): 323-350. 1977.
[10] Robert Boyer, Strother Moore. A Fast String Searching Algorithm.
Comm. ACM (New York, NY, USA: Association for Computing Machinery) 20 (10): 762-772. 1977
[11] R. N. Horspool. Practical fast searching in strings. Software - Practice &
Experience 10 (6): 501-506. 1980.
[12] Wikipedia. Boyer-Moore string search algorithm.
http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm
[13] Wikipedia. Eight queens puzzle. http://en.wikipedia.org/wiki/Eight_queens_puzzle
[14] George Polya. How to Solve It: A New Aspect of Mathematical Method. Princeton University Press, 2004. ISBN-13: 978-0691119663
[15] Wikipedia. David A. Huffman. http://en.wikipedia.org/wiki/David_A._Huffman
[16] Fethi Rabhi and Guy Lapalme. Algorithms: A Functional Programming Approach. Second edition. Addison-Wesley.
Part VI
Appendix
Appendix A
Lists
A.1 Introduction
This book makes extensive use of recursive list manipulation in purely functional settings. Lists serve as the counterpart to arrays in imperative settings; they are the building bricks of many algorithms and data structures.
For readers who are not familiar with functional list manipulation, this appendix provides a quick reference. All operations listed in this appendix are not only described in equations, but also implemented in functional as well as imperative programming languages as examples.
Besides the elementary list operations, this appendix also explains some higher-order function concepts such as mapping, folding, etc.
A.2 List Definition
Like arrays in imperative settings, lists play a critical role in functional settings1. Lists have built-in support in some programming languages, such as the Lisp and ML families, so there is no need to define them explicitly in those environments.
A list, or more precisely a singly linked list, is a data structure that can be described as below.
A list is either empty;
Or it contains an element and a sub list.
Note that this definition is recursive. Figure A.1 illustrates a list with n nodes. Each node contains two parts: a key element and a sub list. The sub list contained in the last node is empty, which is denoted as NIL.
This data structure can be explicitly defined in programming languages that support the record (or compound type) concept. The following ISO C++ code defines the list2.
1 Some readers may argue that lambda calculus plays the most critical role. Lambda calculus is somewhat like the assembly language of the computation world, which is worth studying from the essence of the computation model to practical programs. However, we don't dive into that topic in this book. Readers can refer to [4] for details.
2 We only use templates to parameterize the type of the element in this chapter. Except for this point, all imperative source code is in ANSI C style to avoid language-specific features.
Figure A.1: A list with n nodes. Each node holds a key element key[i] and a next reference to the sub list; the sub list in the last node is NIL.
template<typename T>
struct List {
    T key;
    List* next;
};
A.2.1 Empty list

An empty list is a special case of a list. It is denoted as ∅ in the equations of this appendix, and as NIL (or NULL) in the imperative examples.

A.2.2 Access the element and the sub list
Given a list L, two functions can be defined to access the element stored in it and the sub list respectively. They are typically denoted as first(L) and rest(L), or head(L) and tail(L) with the same meaning. These two functions are named car and cdr in Lisp for historic reasons about the design of machine registers [5]. In languages that support pattern matching (e.g. the ML families, Prolog, Erlang, etc.), these two functions are commonly realized by matching the cons constructor, which we'll introduce later. For example, the following Haskell program:
head (x:xs) = x
tail (x:xs) = xs
If the list is defined in record syntax like what we did above, these two functions can be realized by accessing the record fields3.
template<typename T>
T first(List<T>* xs) { return xs->key; }

template<typename T>
List<T>* rest(List<T>* xs) { return xs->next; }
A.3 Basic list operations

A.3.1 Construction
The last C++ example actually shows the literal construction of a list. A list can be constructed from an element with a sub list, where the sub list can be empty. We denote the constructor function as cons(x, L). This name is used in most Lisp dialects. In the ML families, there is a cons operator defined as ::, (in Haskell it's :).
We can define cons to create a record as we defined above in ISO C++, for example4.
template<typename T>
List<T>* cons(T x, List<T>* xs) {
    List<T>* lst = new List<T>;
    lst->key = x;
    lst->next = xs;
    return lst;
}
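As a quick check, cons calls can be chained to build a small list from right to left. The snippet below is a usage sketch assuming the List and cons definitions above; the helper name build123 is ours.

```cpp
#include <cstddef>

template<typename T>
struct List {
    T key;
    List* next;
};

// Prepend x to the (possibly empty) list xs in O(1).
template<typename T>
List<T>* cons(T x, List<T>* xs) {
    List<T>* lst = new List<T>;
    lst->key = x;
    lst->next = xs;
    return lst;
}

// Build the list {1, 2, 3} by chaining cons from right to left.
List<int>* build123() {
    return cons(1, cons(2, cons<int>(3, NULL)));
}
```

Note that the innermost cons needs an explicit type argument, because NULL alone does not let the compiler deduce T.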
A.3.2 Empty testing and length calculating
It is trivial to test if a list is empty. If the environment contains the nil concept, the testing should also handle the nil case. Both Lisp dialects and the ML families provide null-testing functions. Empty testing can also be realized by pattern matching with the empty list if possible. The following Haskell program shows such an example.
null [] = True
null _ = False
The length of a list can also be calculated recursively.

length(L) = { 0 : L = ∅ ; 1 + length(L') : otherwise }    (A.1)

Here L' = rest(L) as mentioned above; it is {l2, l3, ..., ln} for a list containing n elements. Note that both L and L' can be empty. In this equation, we also use = ∅ to test if list L is empty. In order to know the length of a list, we need to traverse all the elements from the head to the end, so this algorithm is proportional to the number of elements stored in the list. It is a linear algorithm bound to O(n) time.
Below are two programs, in Haskell and in Scheme/Lisp, realizing this recursive algorithm.
length [] = 0
length (x:xs) = 1 + length xs
(define (length lst)
(if (null? lst) 0 (+ 1 (length (cdr lst)))))
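The same linear-time algorithm can also be written iteratively. The ISO C++ sketch below assumes the List and cons definitions given earlier in this appendix.

```cpp
#include <cstddef>

template<typename T>
struct List {
    T key;
    List* next;
};

template<typename T>
List<T>* cons(T x, List<T>* xs) {
    List<T>* lst = new List<T>;
    lst->key = x;
    lst->next = xs;
    return lst;
}

// Count the nodes by walking to the end of the list: O(n) time.
template<typename T>
int length(List<T>* xs) {
    int n = 0;
    for (; xs; xs = xs->next)
        ++n;
    return n;
}
```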
How to test if two lists are identical is left as an exercise to the reader.
A.3.3 Indexing
One big difference between arrays and lists (singly linked lists, accurately) is that arrays support random access. Many programming languages support using x[i] to access the i-th element stored in an array in constant O(1) time. The index typically starts from 0, but that's not always the case: some programming languages use 1 as the first index. In this appendix, we treat indices as starting from 0. For a list, however, we must traverse it with i steps to reach the target element. The traversal is quite similar to the length calculation. Thus it's commonly expressed as below in imperative settings.
1: function Get-At(L, i)
2:   while i ≠ 0 do
3:     L ← Next(L)
4:     i ← i - 1
5:   return First(L)
Note that this algorithm doesn't handle the error case where the index isn't within the bounds of the list. We assume that 0 ≤ i < |L|, where |L| = length(L). The error handling is left as an exercise to the reader. The following ISO C++ code is a line-by-line translation of this algorithm.
template<typename T>
T getAt(List<T>* lst, int n) {
    while (n--)
        lst = lst->next;
    return lst->key;
}
The indexing can also be defined recursively.

getAt(L, i) = { First(L) : i = 0 ; getAt(Rest(L), i - 1) : otherwise }    (A.2)

In order to get the i-th element, the algorithm does the following:
If i is 0, then we are done; the result is the first element in the list;
Otherwise, the result is to get the (i - 1)-th element from the sub list.
This algorithm can be translated to the following Haskell code.
getAt i (x:xs) = if i == 0 then x else getAt (i-1) xs
Note that we are using pattern matching to ensure the list isn't empty, which actually handles all out-of-bound cases with an un-matched pattern error. Thus if i > |L|, we finally arrive at an edge case where the index is i - |L| while the list is empty; on the other hand, if i < 0, decreasing it by one moves it even farther away from 0. We end at the same error: the index is negative while the list is empty.
The indexing algorithm takes time proportional to the value of the index, which is bound to O(i) linear time. This section only addresses the read semantics. How to mutate the element at a given position is explained in a later section.
A.3.4 Last element and initial sub list
Although accessing the first element and the rest sub list is trivial, the opposite operations, retrieving the last element and the initial sub list, need linear time without using a tail pointer. If the list isn't empty, we need to traverse it to the tail to get these two components. Below are their imperative descriptions.
1: function Last(L)
2:   x ← NIL
3:   while L ≠ NIL do
4:     x ← First(L)
5:     L ← Rest(L)
6:   return x

7: function Init(L)
8:   L' ← NIL
9:   while Rest(L) ≠ NIL do
10:    L' ← Append(L', First(L))
11:    L ← Rest(L)
12:  return L'
The algorithms assume that the input list isn't empty, so the error handling is skipped. Note that the Init() algorithm uses the appending algorithm, which will be defined later.
Below are the corresponding ISO C++ implementations. The optimized version utilizing a tail pointer is left as an exercise.
template<typename T>
T last(List<T>* xs) {
    T x; /* Can be set to a special value to indicate empty list error. */
    for (; xs; xs = xs->next)
        x = xs->key;
    return x;
}

template<typename T>
List<T>* init(List<T>* xs) {
    List<T>* ys = NULL;
    for (; xs->next; xs = xs->next)
        ys = append(ys, xs->key);
    return ys;
}
The last element can also be defined recursively.

last(L) = { First(L) : Rest(L) = ∅ ; last(Rest(L)) : otherwise }    (A.3)
A similar approach can be used to get a list containing all elements except for the last one.
The edge case: if the list contains only one element, the result is an empty list;
Otherwise, we can first get a list containing all elements except for the last one from the rest sub list, then construct the final result from the first element and this intermediate result.
init(L) = { ∅ : L' = ∅ ; cons(l1, init(L')) : otherwise }    (A.4)
Here we denote l1 as the first element of L, and L' as the rest sub list. This recursive algorithm needn't use appending; it actually constructs the final result list from right to left. We'll introduce a high-level concept of such kind of computation later in this appendix.
Below are Haskell programs implementing the last() and init() algorithms by using pattern matching.
last [x] = x
last (_:xs) = last xs
init [x] = []
init (x:xs) = x : init xs
Here [x] matches a singleton list containing only one element, while (_:xs) matches any non-empty list, and the underscore (_) indicates that we don't care about the element. For the details of pattern matching, readers can refer to Haskell tutorial materials, such as [8].
A.3.5 Reverse indexing
Reverse indexing is a general case of last(): finding the last i-th element in a singly linked list with minimized memory space is interesting, and this problem is often used in technical interviews at some companies. A naive implementation takes two rounds of traversal: the first round determines the length of the list n; then the left-hand index is calculated as n - i - 1; finally, a second round of traversal accesses the element at that left-hand index. This idea can be given as the following equation.

getAtR(L, i) = getAt(L, length(L) - i - 1)
There exists a better imperative solution. For illustration purposes, we omit the error cases such as the index being out of bound, etc. The idea is to keep two pointers p1, p2 with a distance of i between them, that rest^i(p2) = p1, where rest^i(p2) means repeatedly applying the rest() function i times. It says that advancing i steps from p2 gets p1. We can start p2 from the head of the list and advance the two pointers in parallel till one of them (p1) arrives at the end of the list. At that time, pointer p2 exactly arrives at the i-th element from right. Figure A.2 illustrates this idea.
It is straightforward to realize the imperative algorithm based on this double
pointers solution.
Figure A.2: The double pointers solution. (a) p2 starts from the head while p1 points to the (i+1)-th element; (b) when p1 reaches the end, p2 points to the i-th element from right.
1: function Get-At-R(L, i)
2:   p ← L
3:   while i ≠ 0 do
4:     L ← Rest(L)
5:     i ← i - 1
6:   while Rest(L) ≠ NIL do
7:     L ← Rest(L)
8:     p ← Rest(p)
9:   return First(p)
The following ISO C++ code implements the double pointers right indexing
algorithm.
template<typename T>
T getAtR(List<T>* xs, int i) {
    List<T>* p = xs;
    while (i--)
        xs = xs->next;
    for (; xs->next; xs = xs->next, p = p->next);
    return p->key;
}
The double pointers solution can also be expressed recursively: we examine the list L together with a list S obtained by dropping the first i elements of L. If S contains only one element, the first element of L is the answer; otherwise, we drop the first element from both L and S, and recursively examine L' and S'. This description can be formalized as the following equations.
getAtR(L, i) = examine(L, drop(i, L))    (A.5)

examine(L, S) = { first(L) : |S| = 1 ; examine(rest(L), rest(S)) : otherwise }    (A.6)
We'll explain the detail of the drop() function in a later section about list mutating operations. Here it can be implemented by repeatedly calling rest() a specified number of times.

drop(n, L) = { L : n = 0 ; drop(n - 1, rest(L)) : otherwise }
Translating the equations to Haskell yields this example program.
atR :: [a] -> Int -> a
atR xs i = get xs (drop i xs) where
  get (x:_) [_] = x
  get (_:xs) (_:ys) = get xs ys
  drop n as@(_:as') = if n == 0 then as else drop (n-1) as'
A.3.6 Mutating
Strictly speaking, we can't mutate the list at all in purely functional settings. Unlike in imperative settings, mutation is actually realized by creating a new list. Almost all functional environments support garbage collection; the original list may either be persisted for reuse, or released (dropped) at some time (Chapter 2 in [6]).
Appending
The cons function can be viewed as building a list by always inserting the element at the head. If we chain multiple cons operations, we can repeatedly construct a list from right to left. Appending, on the other hand, is an operation that adds an element to the tail. Compared to cons, which is a trivial constant time O(1) operation, we must traverse the whole list to locate the appending position. This means appending is bound to O(n), where n is the length of the list. To speed up appending, imperative implementations typically use a field (variable) to record the tail position of a list, so that the traversal can be avoided. However, in purely functional settings we can't use such a tail pointer; appending has to be realized in a recursive manner.
append(L, x) = { {x} : L = ∅ ; cons(first(L), append(rest(L), x)) : otherwise }    (A.7)
The algorithm handles two different appending cases:
If the list is empty, the result is a singleton list containing x, the element to be appended. The singleton list notion {x} = cons(x, ∅) is a simplified form of cons of the element with an empty list ∅;
Otherwise, for a non-empty list, the result can be achieved by first appending the element x to the rest sub list, then constructing the result with the first element of L and this recursive appending result.
For the non-trivial case, if we denote L = {l1, l2, ...} and L' = {l2, l3, ...}, the equation can be written as:
append(L, x) = { {x} : L = ∅ ; cons(l1, append(L', x)) : otherwise }    (A.8)
Even without the tail pointer, it's possible to traverse the list imperatively and append the element at the end.
1: function Append(L, x)
2:   if L = NIL then
3:     return Cons(x, NIL)
4:   H ← L
5:   while Rest(L) ≠ NIL do
6:     L ← Rest(L)
7:   Rest(L) ← Cons(x, NIL)
8:   return H
The following ISO C++ program implements this algorithm. How to utilize a tail field to speed up the appending is left as an exercise to the reader.
template<typename T>
List<T>* append(List<T>* xs, T x) {
    List<T> *tail, *head;
    for (head = tail = xs; xs; xs = xs->next)
        tail = xs;
    if (!head)
        head = cons<T>(x, NULL);
    else
        tail->next = cons<T>(x, NULL);
    return head;
}
If getAt is changed to return a reference to the element (T& getAt(List<T>* lst, int n)), we can use it to mutate the 2nd element as below.
List<int>* xs = cons(1, cons(2, cons<int>(3, NULL)));
getAt(xs, 1) = 4;
In an impure functional environment such as Scheme/Lisp, setting the i-th element to a given value can also be implemented by mutating the referenced cell directly.
(define (set-at! lst i x)
(if (= i 0)
(set-car! lst x)
(set-at! (cdr lst) (- i 1) x)))
This program first checks if the index i is zero; if so, it mutates the first element of the list to the given value x; otherwise, it decreases the index i by one, and tries to mutate the rest of the list at this new index with value x. This function doesn't return a meaningful value; it is used for its side effect. For instance, the following code mutates the 2nd element in a list.
(define lst (list 1 2 3 4 5))
(set-at! lst 1 4)
(display lst)
(1 4 3 4 5)
In a purely functional setting, instead of mutating in place, we build a new list with the i-th element set to the given value.

setAt(L, i, x) = { cons(x, L') : i = 0 ; cons(l1, setAt(L', i - 1, x)) : otherwise }    (A.9)
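Equation (A.9) can also be transcribed directly into C++, rebuilding the prefix of the list instead of mutating in place. This is a sketch assuming the List and cons definitions above; the function name setAt is ours, and, like the equation, it does not handle out-of-bound indices.

```cpp
#include <cstddef>

template<typename T>
struct List {
    T key;
    List* next;
};

template<typename T>
List<T>* cons(T x, List<T>* xs) {
    List<T>* lst = new List<T>;
    lst->key = x;
    lst->next = xs;
    return lst;
}

// Rebuild the first i nodes; the node at position i gets the new key,
// and the tail after it is shared with the original list.
template<typename T>
List<T>* setAt(List<T>* xs, int i, T x) {
    if (i == 0)
        return cons(x, xs->next);
    return cons(xs->key, setAt(xs->next, i - 1, x));
}
```

Unlike set-at!, the original list is left untouched; only the first i + 1 nodes are copied, so it still runs in O(i) time.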
Comparing the below Scheme/Lisp implementation to the previous one reveals the difference from imperative mutating.
(define (set-at lst i x)
(if (= i 0)
(cons x (cdr lst))
(cons (car lst) (set-at (cdr lst) (- i 1) x))))
Here we skip the error handling for out-of-bound errors, etc. Again, similar to the random access algorithm, the performance is bound to linear time, as traversal is needed to locate the position to set the value.
Insertion
There are two semantics of list insertion. One is to insert an element at a given position, which can be denoted as insert(L, i, x); the algorithm is close to setAt(L, i, x). The other is to insert an element into a sorted list, so that the result list is still sorted.
Let's first consider how to insert an element x at a given position i. The obvious thing is that we need to first traverse i elements to get to the position; the rest of the work is to construct a new sub list with x being its head. Finally, we construct the whole result by attaching this new sub list to the end of the first i elements.
The algorithm can be described accordingly. If we want to insert an element x into a list L at position i:
Edge case: if i is zero, the insertion turns into a trivial cons operation, cons(x, L);
Otherwise, we recursively insert x into the sub list L' at position i - 1, then construct the result with the first element and this recursive result.
Below equation formalizes the insertion algorithm.
insert(L, i, x) = { cons(x, L) : i = 0 ; cons(l1, insert(L', i - 1, x)) : otherwise }    (A.10)
This algorithm can also be realized imperatively.
1: function Insert(L, i, x)
2:   if i = 0 then
3:     return Cons(x, L)
4:   H ← L
5:   p ← L
6:   while i ≠ 0 do
7:     p ← L
8:     L ← Rest(L)
9:     i ← i - 1
10:  Rest(p) ← Cons(x, L)
11:  return H
And the ISO C++ example program is given by translating this algorithm.
template<typename T>
List<T>* insert(List<T>* xs, int i, T x) {
    List<T> *head, *prev;
    if (i == 0)
        return cons(x, xs);
    for (head = xs; i; --i, xs = xs->next)
        prev = xs;
    prev->next = cons(x, xs);
    return head;
}
For the ordered insertion, if the list is empty or the new element is less than the first one, we cons x directly; otherwise we recursively insert into the rest sub list.

insert(x, L) = { cons(x, ∅) : L = ∅ ; cons(x, L) : x < l1 ; cons(l1, insert(x, L')) : otherwise }    (A.11)

Since the algorithm needs to compare the elements one by one, it's also a linear time algorithm. Note that the Haskell implementation can use the as-notation for pattern matching. Readers can refer to [8] and [7] for details.
This ordered insertion algorithm can also be designed in an imperative manner, for example like the following pseudo code5.
1: function Insert(x, L)
2:   if L = ∅ ∨ x < First(L) then
3:     return Cons(x, L)
4:   H ← L
5:   while Rest(L) ≠ ∅ ∧ First(Rest(L)) < x do
6:     L ← Rest(L)
7:   Rest(L) ← Cons(x, Rest(L))
8:   return H

5 Readers can refer to the chapter "The evolution of insertion sort" in this book for a minor different one.

If either the list is empty, or the new element to be inserted is less than the first element in the list, we can just put this element as the new first one. Otherwise, we record the head, then traverse the list till a position where x is less than the rest of the sub list, and put x in that position. Compared to the insert-at algorithm shown previously, the variable p used to point to the previous position during traversal is omitted by examining the sub list instead of the current list. The following ISO C++ program implements this algorithm.
template<typename T>
List<T>* insert(T x, List<T>* xs) {
    List<T>* head;
    if (!xs || x < xs->key)
        return cons(x, xs);
    for (head = xs; xs->next && xs->next->key < x; xs = xs->next);
    xs->next = cons(x, xs->next);
    return head;
}
With this linear time ordered insertion defined, it's possible to implement a quadratic time insertion sort by repeatedly inserting elements into an empty list, as formalized in this equation.
sort(L) = { ∅ : L = ∅ ; insert(l1, sort(L')) : otherwise }    (A.12)
This equation says that if the list to be sorted is empty, the result is also empty; otherwise, we first recursively sort all elements except for the first one, then do an ordered insertion of the first element into this intermediate result. The corresponding Haskell program is given below.
isort [] = []
isort (x:xs) = insert x (isort xs)
The imperative linked-list based insertion sort is described in the following: we initialize the result list as empty, then take the elements one by one from the list to be sorted, and do ordered insertion into the result list.
1: function Sort(L)
2:   L' ← ∅
3:   while L ≠ ∅ do
4:     L' ← Insert(First(L), L')
5:     L ← Rest(L)
6:   return L'
Note that at any time during the loop, the result list is kept sorted. There is a major difference between the recursive algorithm (formalized by the equation) and the procedural one (described by the pseudo code): the former processes the list from the right, while the latter processes it from the left. We'll see in a later section about tail recursion how to eliminate this difference.
The ISO C++ version of linked-list insertion sort is listed like this.
template<typename T>
List<T>* isort(List<T>* xs) {
    List<T>* ys = NULL;
    for (; xs; xs = xs->next)
        ys = insert(xs->key, ys);
    return ys;
}
There is also a dedicated chapter discussing insertion sort in this book. Please refer to that chapter for more details, including performance analysis and fine-tuning.
Deletion
In purely functional settings, there is no deletion at all in terms of mutating; the data is persistent. What deletion means semantically is to create a new list with all the elements of the previous one except for the element being deleted.
Similar to insertion, there are also two deletion semantics. One is to delete the element at a given position; the other is to find and delete elements of a given value. The first can be expressed as delete(L, i), while the second is delete(L, x).
In order to design the delete(L, i) algorithm (delete-at), we can use an idea quite similar to random access and insertion: we first traverse the list to the specified position, then construct the result list from the elements we have traversed, together with all the others except for the next one, which we haven't traversed yet.
The strategy can be realized in a recursive manner. In order to delete the i-th element from list L:
If i is zero, we are deleting the first element of the list; the result is obviously the rest of the list;
If the list to be processed is empty, the result is empty anyway;
Otherwise, we can recursively delete the (i - 1)-th element from the sub list L', then construct the final result from the first element of L and this intermediate result.
Note there are two edge cases; the second one is mainly used for error handling. This algorithm can be formalized with the following equation.
delete(L, i) = { L' : i = 0 ; ∅ : L = ∅ ; cons(l1, delete(L', i - 1)) : otherwise }    (A.13)
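Equation (A.13) can be transcribed into a recursive C++ sketch in the same fashion as the other functional-style examples (assuming the List and cons definitions above; the name deleteAt is ours).

```cpp
#include <cstddef>

template<typename T>
struct List {
    T key;
    List* next;
};

template<typename T>
List<T>* cons(T x, List<T>* xs) {
    List<T>* lst = new List<T>;
    lst->key = x;
    lst->next = xs;
    return lst;
}

// Rebuild the nodes before position i and skip the i-th one;
// an empty list yields an empty result, as in the second edge case.
template<typename T>
List<T>* deleteAt(List<T>* xs, int i) {
    if (!xs)
        return NULL;
    if (i == 0)
        return xs->next;
    return cons(xs->key, deleteAt(xs->next, i - 1));
}
```

As in the equation, the nodes after position i are shared with the original list rather than copied.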
This is a linear time algorithm as well, and there are alternatives for implementation. For example, we can first split the list at position i - 1 to get two sub lists L1 and L2, then concatenate L1 and L2', where L2' is the rest sub list of L2.
The delete-at algorithm can also be realized imperatively: we traverse to the position by looping.
1: function Delete(L, i)
2:   if i = 0 then
3:     return Rest(L)
4:   H ← L
5:   p ← L
6:   while i ≠ 0 do
7:     i ← i - 1
8:     p ← L
9:     L ← Rest(L)
10:  Rest(p) ← Rest(L)
11:  return H
Different from the recursive approach, the error handling for out-of-bound access is skipped. Besides that, the algorithm also skips the handling of resource releasing, which is necessary in environments without GC (garbage collection). The ISO C++ code below, for example, explicitly releases the node to be deleted.
template<typename T>
List<T>* del(List<T>* xs, int i) {
    List<T> *head, *prev;
    if (i == 0)
        head = xs->next;
    else {
        for (head = xs; i; --i, xs = xs->next)
            prev = xs;
        prev->next = xs->next;
    }
    xs->next = NULL;
    delete xs;
    return head;
}
For the "find and delete" semantics, the recursive strategy is:
If the list is empty, the result is empty;
If the first element equals the given value, the result is the rest of the list;
Otherwise, we keep the first element, and recursively find and delete the element with the given value in the sub list. The final result is a list constructed with the kept first element and the recursive deletion result.
This algorithm can be formalized by the following equation.
delete(L, x) = { ∅ : L = ∅ ; L' : l1 = x ; cons(l1, delete(L', x)) : otherwise }    (A.14)
This algorithm is bound to linear time as it traverses the list to find and delete the element. Translating the equation to a Haskell program yields the code below. Note that the first edge case is handled by pattern matching the empty list, while the other two cases are further processed by an if-else expression.
del [] _ = []
del (x:xs) y = if x == y then xs else x : del xs y
Different from the above imperative algorithms, which skip the error handling in most cases, the imperative "find and delete" realization must deal with the problem that the given value doesn't exist.
1: function Delete(L, x)
2:   if L = ∅ then    ▷ Empty list
3:     return ∅
4:   if First(L) = x then
5:     H ← Rest(L)
6:   else
7:     H ← L
8:     while L ≠ ∅ ∧ First(L) ≠ x do    ▷ List isn't empty
9:       p ← L
10:      L ← Rest(L)
11:    if L ≠ ∅ then    ▷ Found
12:      Rest(p) ← Rest(L)
13:  return H
If the list is empty, the result is empty anyway; otherwise, the algorithm traverses the list till it either finds an element identical to the given value or reaches the end of the list. If the element is found, it is removed from the list. The following ISO C++ program implements the algorithm. Note that the code releases the memory explicitly.
template<typename T>
List<T>* del(List<T>* xs, T x) {
    List<T> *head, *prev;
    if (!xs)
        return xs;
    if (xs->key == x)
        head = xs->next;
    else {
        for (head = xs; xs && xs->key != x; xs = xs->next)
            prev = xs;
        if (xs)
            prev->next = xs->next;
    }
    if (xs) {
        xs->next = NULL;
        delete xs;
    }
    return head;
}
Concatenation
Concatenation can be considered a general case of appending: appending adds only one extra element to the end of the list, while concatenation adds multiple elements.
However, implementing concatenation naively by repeated appending leads to a quadratic algorithm, which performs poorly. Consider the following equation.
concat(L1, L2) = { L1 : L2 = ∅ ; concat(append(L1, first(L2)), rest(L2)) : otherwise }
Note that each appending needs to traverse to the end of the list, which is proportional to the length of L1, and we need to do this linear appending work |L2| times, so the total performance is O(|L1| + (|L1| + 1) + ... + (|L1| + |L2|)) = O(|L1||L2| + |L2|^2).
The key point is that the linking operation of a linked list is fast (constant O(1) time): we can traverse to the end of L1 only once, and link the second list to the tail of L1.
concat(L1, L2) = { L2 : L1 = ∅ ; cons(first(L1), concat(rest(L1), L2)) : otherwise }    (A.15)

This algorithm only traverses the first list one time to get the tail of L1, then links the second list with this tail. So the algorithm is bound to linear O(|L1|) time. It can be described as follows:
If the first list is empty, the concatenation result is the second list;
Otherwise, we concatenate the second list to the sub list of the first one, and construct the final result with the first element and this intermediate result.
Most functional languages provide built-in functions or operators for list concatenation; for example, ++ is used for this purpose in Haskell.
[] ++ ys = ys
xs ++ [] = xs
(x:xs) ++ ys = x : xs ++ ys
Note the additional edge case: if the second list is empty, we needn't traverse to the end of the first one and perform linking; the result is merely the first list.
In imperative settings, concatenation can be realized in constant O(1) time with an augmented tail record. We skip the detailed implementation of this method; readers can refer to the source code which can be downloaded along with this appendix.
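As a sketch of how the augmented tail record enables O(1) concatenation, the wrapper below keeps both head and tail pointers; this wrapper type and its field names are our assumption, not the downloadable source code.

```cpp
#include <cstddef>

template<typename T>
struct List {
    T key;
    List* next;
};

template<typename T>
List<T>* cons(T x, List<T>* xs) {
    List<T>* lst = new List<T>;
    lst->key = x;
    lst->next = xs;
    return lst;
}

// A list augmented with a tail pointer so the end is reachable in O(1).
template<typename T>
struct TList {
    List<T>* head;
    List<T>* tail;
};

// Link ys after xs without any traversal: constant time.
template<typename T>
TList<T> concatO1(TList<T> xs, TList<T> ys) {
    if (!xs.head) return ys;
    if (!ys.head) return xs;
    xs.tail->next = ys.head;
    xs.tail = ys.tail;
    return xs;
}
```

The trade-off is that every operation which changes the list must also maintain the tail field, which is why the purely functional equations in this appendix avoid it.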
The imperative algorithm without using the augmented tail record can be described as below.
1: function Concat(L1, L2)
2:   if L1 = ∅ then
3:     return L2
4:   if L2 = ∅ then
5:     return L1
6:   H ← L1
7:   while Rest(L1) ≠ ∅ do
8:     L1 ← Rest(L1)
9:   Rest(L1) ← L2
10:  return H
And the corresponding ISO C++ example code is given like this.
template<typename T>
List<T>* concat(List<T>* xs, List<T>* ys) {
    List<T>* head;
    if (!xs)
        return ys;
    if (!ys)
        return xs;
    for (head = xs; xs->next; xs = xs->next);
    xs->next = ys;
    return head;
}
A.3.7 Sum and product

It is common to calculate the sum or product of all elements in a list. Their algorithm structures are quite similar.

sum(L) = { 0 : L = ∅ ; l1 + sum(L') : otherwise }    (A.16)

product(L) = { 1 : L = ∅ ; l1 × product(L') : otherwise }    (A.17)
The following Haskell program implements sum and product.
sum [] = 0
sum (x:xs) = x + sum xs
product [] = 1
product (x:xs) = x * product xs
Both algorithms traverse the whole list during calculation, so they are bound
to O(n) linear time.
Tail call recursion
Note that both the sum and product algorithms actually compute the result from right to left. We can change them to the normal way: calculating the accumulated result from left to right. For example with sum, the result is accumulated from 0, adding elements one by one to this accumulated result till the whole list is consumed. Such an approach can be described as follows. When accumulating the sum of a list:
If the list is empty, we are done and return the accumulated result;
Otherwise, we take the first element from the list, accumulate it to the result by summing, and go on processing the rest of the list.
Formalizing this idea as an equation yields another version of the sum algorithm.

sum'(A, L) = { A : L = ∅ ; sum'(A + l1, L') : otherwise }    (A.18)

And sum can be implemented by calling this function, passing the start value 0 and the list as arguments.

sum(L) = sum'(0, L)    (A.19)
The interesting point of this approach is that, besides calculating the result in normal order from left to right, by observing the equation of sum'(A, L) we find it needn't remember any intermediate results or states when performing the recursion. All such states are either passed as arguments (A for example) or can be dropped (previous elements of the list for example). So in a practical implementation, such a recursive function can be optimized by eliminating the recursion entirely.
We call such a function tail recursive (a tail call), and the optimization of removing the recursion in this case tail recursion optimization [10], because the recursion happens as the final action in such a function. The advantage of tail recursion optimization is that performance can be greatly improved, so that we can avoid stack overflow in deeply recursive algorithms such as sum and product.
Changing the sum and product Haskell programs to the tail-recursive manner gives the following modified programs.
sum = sum' 0 where
  sum' acc [] = acc
  sum' acc (x:xs) = sum' (acc + x) xs
Tail recursion can also be applied to the insertion sort algorithm, accumulating the sorted result while consuming the input list.

sort'(A, L) = { A : L = ∅ ; sort'(insert(l1, A), L') : otherwise }    (A.20)

The sorting algorithm is just calling this function, passing the empty list as the accumulator argument.

sort(L) = sort'(∅, L)    (A.21)
Tail call optimization also helps with the exponentiation problem. A naive algorithm computing b^n repeats the multiplication n times; divide and conquer improves it by halving the exponent.

pow(b, n) = { 1 : n = 0 ; pow(b, n/2)^2 : 2|n ; b · pow(b, n - 1) : otherwise }    (A.22)

Squaring the intermediate result can be replaced by squaring the base instead.

pow(b, n) = { 1 : n = 0 ; pow(b^2, n/2) : 2|n ; b · pow(b, n - 1) : otherwise }    (A.23)
With this change, it's easy to get a tail-recursive version pow'(b, n, A), so that b^n = pow'(b, n, 1).

pow'(b, n, A) = { A : n = 0 ; pow'(b^2, n/2, A) : 2|n ; pow'(b, n - 1, A · b) : otherwise }    (A.24)

The accumulator A keeps the invariant that the final result equals A · b^n; when n drops to 0, the result is A.
This algorithm can be implemented in Haskell like the following.
pow b n = pow' b n 1 where
  pow' b n acc | n == 0 = acc
               | even n = pow' (b*b) (n `div` 2) acc
               | otherwise = pow' (b*b) (n `div` 2) (acc*b)
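The tail-recursive exponentiation of equation (A.24) can also be sketched in C++. The accumulator carries the partial product, so power(b, n, 1) computes b^n; the function name and integer types are our choice.

```cpp
// Tail-recursive fast exponentiation: the quantity acc * b^n stays
// constant across calls, so when n reaches 0 the answer is acc.
long long power(long long b, unsigned long long n, long long acc) {
    if (n == 0)
        return acc;
    if (n % 2 == 0)
        return power(b * b, n / 2, acc);      // even: square the base
    return power(b * b, n / 2, acc * b);      // odd: fold one b into acc
}
```

Since the recursive call is the final action, a compiler that performs tail call optimization can turn this into a loop, matching the O(lg n) behavior of the Haskell version.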
function Product(L)
  p ← 1
  while L ≠ ∅ do
    p ← p × First(L)
    L ← Rest(L)
  return p
The corresponding ISO C++ example programs are listed as follows.
template<typename T>
T sum(List<T>* xs) {
    T s;
    for (s = 0; xs; xs = xs->next)
        s += xs->key;
    return s;
}

template<typename T>
T product(List<T>* xs) {
    T p;
    for (p = 1; xs; xs = xs->next)
        p *= xs->key;
    return p;
}
A.3.8 Maximum and minimum

Another very useful case is to get the minimum or maximum element of a list. We'll see that their algorithm structures are quite similar again; we'll generalize this kind of feature and introduce higher-level abstractions in a later section. For both the maximum and minimum algorithms, we assume that the given list isn't empty.
In order to find the minimum element in a list:
If the list contains only one element (a singleton list), the minimum element is this one;
Otherwise, we can first find the minimum element of the rest list, then compare the first element with this intermediate result to determine the final minimum value.
This algorithm can be formalized by the following equation.

min(L) = { l1 : L = {l1} ; l1 : l1 ≤ min(L') ; min(L') : otherwise }    (A.26)
In order to get the maximum element instead of the minimum one, we can simply replace the comparison ≤ with ≥ in the above equation.

max(L) = { l1 : L = {l1} ; l1 : l1 ≥ max(L') ; max(L') : otherwise }    (A.27)
Note that both maximum and minimum actually process the list from right to left. This reminds us of tail recursion. We can modify them so that the list is processed from left to right. What's more, the tail-recursive version gives us an on-line algorithm: at any time, we hold the minimum or maximum result of the part of the list examined so far.

min'(L, a) = { a : L = ∅ ; min'(L', l1) : l1 < a ; min'(L', a) : otherwise }    (A.28)

max'(L, a) = { a : L = ∅ ; max'(L', l1) : a < l1 ; max'(L', a) : otherwise }    (A.29)
Different from the tail-recursive sum and product, we can't pass a constant starting value to min' or max' in practice. In theory we would have to pass infinity (min'(L, ∞)) or negative infinity (max'(L, −∞)), but in a real machine neither can be represented, since the machine word length is limited.

Actually, there is a workaround: we can instead pass the first element of the list, so that the algorithms become applicable.

min(L) = min'(L', l1)
max(L) = max'(L', l1)    (A.30)
The corresponding real programs are given as the following. We skip the non-tail-recursive programs, as they are intuitive enough. Readers can take them as interesting exercises.

min (x:xs) = min' xs x where
    min' [] a = a
    min' (x:xs) a = if x < a then min' xs x else min' xs a

max (x:xs) = max' xs x where
    max' [] a = a
    max' (x:xs) a = if a < x then max' xs x else max' xs a
The tail-call version can be easily translated to imperative min/max algorithms.

function Min(L)
    m ← First(L)
    L ← Rest(L)
    while L ≠ ∅ do
        if First(L) < m then
            m ← First(L)
        L ← Rest(L)
    return m

function Max(L)
    m ← First(L)
    L ← Rest(L)
    while L ≠ ∅ do
        if m < First(L) then
            m ← First(L)
        L ← Rest(L)
    return m
The corresponding ISO C++ programs are given below.
template<typename T>
T min(List<T>* xs) {
    T x;
    for (x = xs->key; xs; xs = xs->next)
        if (xs->key < x)
            x = xs->key;
    return x;
}

template<typename T>
T max(List<T>* xs) {
    T x;
    for (x = xs->key; xs; xs = xs->next)
        if (x < xs->key)
            x = xs->key;
    return x;
}
There are alternative definitions which compare the first two elements, and drop the smaller one (for maximum) or the bigger one (for minimum) in every recursion.

max(L) = { l1 : |L| = 1
           max(cons(l1, L'')) : l2 < l1
           max(L') : otherwise }    (A.31)

min(L) = { l1 : |L| = 1
           min(cons(l1, L'')) : l1 < l2
           min(L') : otherwise }    (A.32)

Where L'' denotes the rest of the list without the first two elements.
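This drop-the-loser idea can be sketched in Python over built-in lists (the names `alt_max` and `alt_min` are ours, not the book's):

```python
def alt_max(xs):
    # Compare the first two elements and drop the smaller one,
    # until a single candidate remains -- mirrors (A.31).
    xs = list(xs)                 # work on a copy
    while len(xs) > 1:
        if xs[1] < xs[0]:
            del xs[1]             # l2 < l1: drop l2
        else:
            del xs[0]             # otherwise drop l1
    return xs[0]

def alt_min(xs):
    # Symmetric version for the minimum -- mirrors (A.32).
    xs = list(xs)
    while len(xs) > 1:
        if xs[0] < xs[1]:
            del xs[1]             # l1 < l2: drop l2
        else:
            del xs[0]
    return xs[0]
```

Each step removes one element, so the loop runs exactly |L| − 1 times.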
Exercise A.1

- Given two lists L1 and L2, design an algorithm eq(L1, L2) to test if they are equal to each other. Here equality means the lengths are the same, and at the same time, the elements in both lists are identical.
- Consider various options to handle the out-of-bound error case when randomly accessing an element in a list. Realize them in both imperative and functional programming languages. Compare the solutions based on exceptions and on error codes.
- Augment the list with a tail field, so that the appending algorithm can be realized in constant O(1) time instead of linear O(n) time. Feel free to choose your favorite imperative programming language. Please don't refer to the example source code along with this book before you try it.
- With the tail field augmented to the list, which list operations must update this field? How does it affect the performance?
- Handle the out-of-bound case in the insertion algorithm by treating it as appending.
- Write the insertion sort algorithm by only using less than (<).
- Design and implement an algorithm that finds all occurrences of a given value and deletes them from the list.
- Re-implement the algorithm to calculate the length of a list in tail-recursive manner.
- Implement the insertion sort in tail-recursive manner.
- Implement the O(lg n) algorithm to calculate bⁿ in your favorite imperative programming language. Note that we only need to accumulate the intermediate result when the bit is not zero.
A.4 Transformation

In the previous section, we listed some basic operations for a linked list. In this section, we focus on the transformation algorithms for lists. Some of them are cornerstones of abstraction in functional programming. We'll show how to use list transformation to solve some interesting problems.

A.4.1 Mapping and for-each
Consider turning a list of numbers into a list of strings: we apply the function str(), which converts a number into a string, to every element.

toStr(L) = { ∅ : L = ∅
             cons(str(l1), toStr(L')) : otherwise }    (A.33)

Given a list of pairs, we can define max'(), which finds the pair with the maximum second component:

max'(L) = { l1 : |L| = 1
            l1 : snd(max'(L')) < snd(l1)
            max'(L') : otherwise }    (A.34)

This can be generalized by abstracting the comparison out as a parameter:

maxBy(cmp, L) = { l1 : |L| = 1
                  l1 : cmp(l1, maxBy(cmp, L'))
                  maxBy(cmp, L') : otherwise }    (A.36)
Then max'() is just a special case of maxBy() with a compare function comparing on the second value in a pair.

max'(L) = maxBy(less, L)    (A.37)

Here we write all the functions in a purely recursive way; they can be modified into tail-call manner. This is left as an exercise to the reader.

With the max'() function defined, it's possible to complete the solution by processing the whole list.

solve(L) = { ∅ : L = ∅
             cons(fst(max'(l1)), solve(L')) : otherwise }    (A.38)
Map
Comparing the solve() function in (A.38) and the toStr() function in (A.33) reveals a very similar algorithm structure, although they target very different problems, and one is trivial while the other is a bit more complex.

toStr() applies the function str(), which turns a number into a string, to every element in the list; while solve() applies to every element (which is actually a list of pairs) the max'() function followed by fst(). It is not hard to abstract such a common structure as the following equation, which is called mapping.
map(f, L) = { ∅ : L = ∅
              cons(f(l1), map(f, L')) : otherwise }    (A.39)
Because map takes a converter function f as an argument, it's a kind of higher-order function. In a functional programming environment such as Haskell, mapping can be implemented just like the above equation.

map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs
The two concrete cases we discussed above can both be represented as higher-order mapping.

toStr = map str
solve = map (fst ∘ max')

Where f ∘ g means function composition: we first apply g, then apply f. For instance, the function h(x) = f(g(x)) can be represented as h = f ∘ g, read as "function h is composed of f and g". Note that we use the curried form to omit the argument L for brevity. Informally speaking, if we feed a function which needs two arguments, for instance f(x, y) = z, with only one argument, the result turns out to be a function which needs one argument. For instance, if we feed f with only the argument x, it becomes a new function of one argument y, defined as g(y) = f(x, y), or g = f x. Note that x isn't a free variable any more, as it is bound to a value. Readers can refer to any book about functional programming for details about function composition and currying.
Mapping can also be understood from the domain theory point of view. Consider the function y = f(x): it actually defines a mapping from the domain of the variable x to the domain of the value y (x and y can have different types). If the domains can be represented as sets X and Y, we have the following relation.

Y = {f(x) | x ∈ X}    (A.40)

This type of set definition is called Zermelo–Fraenkel set abstraction (also known as ZF expression) [7]. The difference is that here the mapping is from a list to another list, so there can be duplicated elements. In languages supporting list comprehension, for example Haskell and Python (note that the Python list is a built-in type, not the linked list we discuss in this appendix), mapping can be implemented as a special case of list comprehension.
map f xs = [ f x | x <- xs ]
As an example of the ZF expression style, consider computing the permutations of picking r elements out of list L.

perm(L, r) = { {∅} : r = 0 ∨ |L| < r
               {{l} ∪ P | l ∈ L, P ∈ perm(L − {l}, r − 1)} : otherwise }    (A.41)

In this equation, {l} ∪ P means cons(l, P), and L − {l} denotes delete(L, l), which is defined in the previous section. If we take zero elements for permutation, or there are too few elements (fewer than r), the result is a list containing an empty list; otherwise, for the non-trivial case, the algorithm picks one element l from the list, and recursively permutes the remaining n − 1 elements by picking r − 1 of them; then it puts all the possible l in front of all the possible (r − 1)-permutations. Here is the Haskell implementation of this algorithm.
perm _ 0 = [[]]
perm xs r | length xs < r = [[]]
          | otherwise = [ x:ys | x <- xs, ys <- perm (delete x xs) (r-1)]
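The same algorithm can be sketched in Python over built-in lists (`perm` mirrors the equation; slicing out the chosen element models `delete`):

```python
def perm(xs, r):
    # All permutations of picking r elements from xs, in order.
    if r == 0 or len(xs) < r:
        return [[]]
    res = []
    for i, x in enumerate(xs):
        rest = xs[:i] + xs[i + 1:]      # delete(L, l)
        for p in perm(rest, r - 1):
            res.append([x] + p)         # put l in front of each sub-permutation
    return res
```

Like the equation, it collapses to a list containing one empty list when r = 0 or the list is too short.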
In some statically typed programming languages without the type inference feature, like C++, it is a bit complex to annotate the type of the passed-in function. See [11] for detail. In fact some C++ environments provide the very same mapping concept as std::transform. However, it needs the reader to know some language-specific features, which are out of the scope of this book.

For brevity, we switch to the Python programming language for the example code, so that compile-time type annotation can be avoided. The definition of a simple singly linked list in Python is given as the following.
class List:
    def __init__(self, x = None, xs = None):
        self.key = x
        self.next = xs

def cons(x, xs):
    return List(x, xs)
The mapping program takes a function and a linked list, and applies the function to every element as described in the above algorithm.
def mapL(f, xs):
    ys = prev = List()
    while xs is not None:
        prev.next = List(f(xs.key))
        prev = prev.next
        xs = xs.next
    return ys.next
Different from the pseudo code, this program uses a dummy node as the head of the resulting list, so it needn't test whether the variable storing the last appending position is NIL. This small trick makes the program compact. We only need to drop the dummy node before returning the result.
For each
For a trivial task such as printing a list of elements out, it's quite OK to just print each element without converting the whole list to a list of strings. We can actually simplify the program.
function Print(L)
    while L ≠ ∅ do
        print First(L)
        L ← Rest(L)
More generally, we can pass a procedure, such as printing, to this list traversal, so the procedure is performed for each element.

function For-Each(L, P)
    while L ≠ ∅ do
        P(First(L))
        L ← Rest(L)
The for-each algorithm can be formalized as the following equation.

foreach(L, p) = { u : L = ∅
                  do(p(l1), foreach(L', p)) : otherwise }    (A.42)
Here u means unit; it can be understood as doing nothing. The unit type is similar to the void concept in C or Java-like programming languages. The do() function evaluates all its arguments, discards all the results except for the last one, and returns the last result as the final value. It is in some sense equivalent to (begin ...) in the Lisp family, and the do block in Haskell. For the details about the unit type, please refer to [4].

Note that the for-each algorithm is just a simplified mapping; there are only two minor differences:

- It needn't form a result list; we care about the side effect rather than the returned value;
- For-each focuses more on traversing, while mapping focuses more on applying a function, thus the order of arguments is typically arranged as map(f, L) and foreach(L, p).

Some functional programming facilities provide options for both returning the result list and discarding it. For example, the Haskell Monad library provides both mapM, mapM_ and forM, forM_. Readers can refer to language-specific materials for detail.
Examples for mapping

We'll show how to use mapping by an example, which is a problem from ACM/ICPC [12]. For the sake of brevity, we modified the problem description a bit. Suppose there are n lights in a room, all of them off. We execute the following process n times:

1. We switch all the lights in the room, so that they are all on;
2. We switch the 2nd, 4th, 6th, ... lights; every other light is switched: if a light is on, it is turned off, and it is turned on if its previous state was off;
3. We switch every third light, that is, the 3rd, 6th, 9th, ... lights;
4. ...

At the last round, only the last light (the n-th light) is switched. The question is: how many lights are on at the end?

Before we show the best answer to this puzzle, let's first work out a naive brute-force solution. Suppose there are n lights, which can be represented as a list of 0/1 numbers, where 0 means the light is off, and 1 means on. The initial state is a list of n zeros: {0, 0, ..., 0}.

We can label the lights from 1 to n. A mapping can help us to turn the above list into a labeled list⁷.
⁷ Readers who are familiar with functional programming may use zipping to achieve this. We'll explain zipping in a later section.
Note that here we use the curried form of the switch() function, which is equivalent to

map((j, x) ↦ switch(i, (j, x)), L)
Here we need to define a function proc(), which performs the above mapping on L over and over, n times. One option is to realize it in a purely recursive way as the following, so that we can call it like proc({1, 2, ..., n}, L).

proc(I, L) = { L : I = ∅
               proc(I', map(switch(i1), L)) : otherwise }    (A.45)
Let's see what the answers are when there are 1, 2, ..., 100 lights.
[1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,
8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,10]
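The puzzle is small enough to check by brute force; below is a Python sketch (the helper name `lights` is ours, not from the book) that simulates the n rounds directly:

```python
def lights(n):
    # Simulate the switching process: in round i,
    # every i-th light (i, 2i, 3i, ...) is toggled.
    on = [False] * (n + 1)        # index 0 unused; lights are 1..n
    for i in range(1, n + 1):
        for j in range(i, n + 1, i):
            on[j] = not on[j]
    return sum(on)                # count of lights left on
```

Running `[lights(n) for n in range(1, 101)]` reproduces the sequence above.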
This result is interesting: the i-th light is on at the end if and only if it is switched an odd number of times, i.e., if i has an odd number of divisors, which happens exactly when i is a perfect square. The number of perfect squares not greater than n gives the answer:

solve(n) = ⌊√n⌋    (A.47)
The next Haskell command verifies that the answers for 1, 2, ..., 100 lights are the same as above.
map (floor.sqrt) [1..100]
[1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,
8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,10]
Mapping is a generic concept that is not limited to linked lists; it can also be applied to many complex data structures. The chapter about binary search trees in this book explains how to map on trees. As long as we can traverse a data structure in some order, and the empty data structure can be identified, we can use the same mapping idea. We'll return to this kind of higher-order concept in the section about folding later.
A.4.2 Reverse
How to reverse a singly linked list with minimum space is a popular technical interview problem in some companies. The pointer manipulation must be arranged carefully in imperative programming languages such as ANSI C. However, we'll show that there exists an easy way to write this program:

1. Firstly, write a purely recursive straightforward solution;
2. Then, transform the pure recursive solution to tail-call manner;
3. Finally, translate the tail-call solution into a purely imperative loop.
The straightforward recursive solution appends the reversed rest of the list with the first element:

reverse(L) = { ∅ : L = ∅
               append(reverse(L'), l1) : otherwise }    (A.48)

The tail-call version uses an accumulator A:

reverse'(L, A) = { A : L = ∅
                   reverse'(L', {l1} ∪ A) : otherwise }    (A.49)
Where {l1} ∪ A means cons(l1, A). Different from appending, it's a constant O(1) time operation. The core idea is that we repeatedly take the elements one by one from the head of the original list, and put them in front of the accumulated result. This is just like storing all the elements in a stack, then popping them out. This is a linear time algorithm.
The below Haskell program implements this tail-call version.

reverse' [] acc = acc
reverse' (x:xs) acc = reverse' xs (x:acc)
Since tail-recursive calls needn't book-keep any context (typically via a stack), most modern compilers are able to optimize them into a pure imperative loop, reusing the current context and stack. Let's manually do this optimization so that we can get an imperative algorithm.
function Reverse(L)
    A ← ∅
    while L ≠ ∅ do
        A ← Cons(First(L), A)
        L ← Rest(L)
    return A
However, because we translated it directly from a functional solution, this algorithm actually produces a new reversed list, but does not mutate the original one. It is not hard to change it into an in-place solution by reusing L. For example, the following ISO C++ program implements the in-place algorithm. It takes O(1) memory space, and reverses the list in O(n) time.
template<typename T>
List<T>* reverse(List<T>* xs) {
    List<T> *p, *ys = NULL;
    while (xs) {
        p = xs;
        xs = xs->next;
        p->next = ys;
        ys = p;
    }
    return ys;
}
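The same in-place pointer manipulation can be sketched in Python; a minimal `Node` class (mirroring the List class used elsewhere in this appendix) is defined here so the sketch stands alone:

```python
class Node:
    # Minimal singly linked node for this sketch.
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

def reverse(xs):
    # Detach the head node and push it onto ys, repeatedly.
    ys = None
    while xs is not None:
        # The right-hand side is evaluated first, so this rewires
        # xs.next to the reversed part and advances xs in one step.
        xs.next, ys, xs = ys, xs, xs.next
    return ys
```

Like the C++ version, it uses O(1) extra space and runs in O(n) time.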
Exercise A.2

- Implement the algorithm to find the maximum element in a list of pairs in tail-call approach in your favorite programming language.
A.5 Extract sub-lists

Different from arrays, which can slice a continuous segment quickly and easily, it needs more work to extract sub-lists from a singly linked list. Such operations are typically linear algorithms.

A.5.1 take, drop, and split-at
Taking the first n elements from a list is semantically similar to extracting the sub-list from the very left, like sublist(L, 1, n), where the second and the third arguments to sublist are the positions where the sub-list starts and ends. For the trivial edge case, that either n is zero or the list is empty, the sub-list is empty; otherwise, we can recursively take the first n − 1 elements from the rest of the list, and put the first element in front of the result.

take(n, L) = { ∅ : L = ∅ ∨ n = 0
               cons(l1, take(n − 1, L')) : otherwise }    (A.50)
Note that the edge cases actually handle the out-of-bound error. The following Haskell program implements this algorithm.
take _ [] = []
take 0 _ = []
take n (x:xs) = x : take (n-1) xs
Dropping, on the other hand, discards the first n elements and returns the rest as the result. It is equivalent to getting the sub-list from the right, like sublist(L, n + 1, |L|), where |L| is the length of the list. Dropping can be designed quite similarly to taking, by discarding the first element in the recursive case.

drop(n, L) = { ∅ : L = ∅
               L : n = 0
               drop(n − 1, L') : otherwise }    (A.51)
Translating the algorithm to Haskell gives the below example program.
586
APPENDIX A. LISTS
drop _ [] = []
drop 0 L = L
drop n (x:xs) = drop (n-1) xs
The imperative taking and dropping are quite straightforward, and they are left as exercises to the reader.

With taking and dropping defined, extracting the sub-list at an arbitrary position for an arbitrary length can be realized by calling them.

sublist(L, from, count) = take(count, drop(from − 1, L))    (A.52)

Or, in terms of the start and end positions:

sublist'(L, from, to) = take(to − from + 1, drop(from − 1, L))    (A.53)

Note that the elements in the range [from, to] are returned by this function, with both ends included. All the above algorithms perform in linear time.
take-while and drop-while

Compared with taking and dropping, there is another type of operation, where we keep taking or dropping elements as long as a certain condition is met. The taking and dropping algorithms can be viewed as special cases of take-while and drop-while.

Take-while examines elements one by one, keeping them as long as the condition is satisfied, and ignores all the rest of the elements, even if some of them satisfy the condition. This is the difference from filtering, which we'll explain in a later section: take-while stops once the condition test fails, while filtering traverses the whole list.
takeWhile(p, L) = { ∅ : L = ∅
                    cons(l1, takeWhile(p, L')) : p(l1)
                    ∅ : otherwise }    (A.54)

dropWhile(p, L) = { ∅ : L = ∅
                    dropWhile(p, L') : p(l1)
                    L : otherwise }    (A.55)
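Both operations can be sketched in Python over built-in lists (iterative, which is the more idiomatic form in Python):

```python
def take_while(p, xs):
    out = []
    for x in xs:
        if not p(x):
            break              # stop at the first failure, ignore the rest
        out.append(x)
    return out

def drop_while(p, xs):
    for i, x in enumerate(xs):
        if not p(x):
            return xs[i:]      # keep everything from the first failure on
    return []
```

Note that `take_while(p, xs) + drop_while(p, xs)` always reconstructs the original list.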
split-at

With taking and dropping defined, splitting at a position can be realized trivially by calling them.

splitAt(i, L) = (take(i, L), drop(i, L))    (A.56)

A.5.2 Breaking
Breaking can be considered as a general form of splitting. Instead of splitting at a given position, breaking examines every element against a certain predicate, and finds the longest prefix of the list for that condition. The result is a pair of sub-lists: one is that longest prefix, the other is the rest.

There are two different breaking semantics: one is to pick elements satisfying the predicate as long as possible; the other is to pick those that don't satisfy it. The former is typically defined as span, the latter as break.
Span can be described, for example, in such a recursive manner: in order to span a list L for predicate p:

- If the list is empty, the result for this trivial edge case is a pair of empty lists (∅, ∅);
- Otherwise, we test the predicate against the first element l1. If l1 satisfies the predicate, we denote the intermediate result of spanning the rest of the list as (A, B) = span(p, L'), and put l1 in front of A to get the pair ({l1} ∪ A, B); otherwise, we just return (∅, L) as the result.

For breaking, we just test the negation of the predicate, and all the rest is the same as spanning. Alternatively, one can define break by using span, as in the later example program.
span(p, L) = { (∅, ∅) : L = ∅
               ({l1} ∪ A, B) : p(l1), (A, B) = span(p, L')
               (∅, L) : otherwise }    (A.57)

break(p, L) = { (∅, ∅) : L = ∅
                ({l1} ∪ A, B) : ¬p(l1), (A, B) = break(p, L')
                (∅, L) : otherwise }    (A.58)
Note that both functions only find the longest prefix; they stop immediately when the condition fails, even if some elements in the rest of the list meet the predicate (or not). Translating them to Haskell gives the following example program.

span _ [] = ([], [])
span p xs@(x:xs') = if p x then let (as, bs) = span p xs' in (x:as, bs) else ([], xs)

break p = span (not . p)
The imperative version of span follows; break can then be defined in terms of span by negating the predicate.

function Span(p, L)
    A ← ∅
    while L ≠ ∅ and p(First(L)) do
        A ← Append(A, First(L))
        L ← Rest(L)
    return (A, L)

function Break(p, L)
    return Span(¬p, L)
This algorithm creates a new list to hold the longest prefix. Another option is to turn it into an in-place algorithm to reuse the space, as in the following Python example.
def span(p, xs):
    ys = xs
    last = None
    while xs is not None and p(xs.key):
        last = xs
        xs = xs.next
    if last is None:
        return (None, xs)
    last.next = None
    return (ys, xs)
Note that both span and break need to traverse the list to test the predicate, thus they are linear algorithms bound to O(n).
grouping

Grouping is a commonly used operation for problems where we need to divide a list into several small groups. For example, suppose we want to group the string "Mississippi", which is actually a list of characters {'M', 'i', 's', 's', 'i', 's', 's', 'i', 'p', 'p', 'i'}, into several small lists in sequence, each containing consecutive identical characters. The grouping operation is expected to be:

group("Mississippi") = {"M", "i", "ss", "i", "ss", "i", "pp", "i"}

Another example: we have a list of numbers:

L = {15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2}

We want to divide it into several small lists, such that each sub-list is ordered descending. The grouping operation is expected to be:

group(L) = {{15, 9, 0}, {12, 11, 7}, {10, 5}, {6}, {13, 1}, {4}, {8, 3}, {14, 2}}

Both cases play a very important role in real algorithms. The string grouping is used in creating the Trie/Patricia data structure, which is a powerful tool in the string searching area; the ordered sub-list grouping can be used in natural merge sort. There are dedicated chapters in this book explaining the details of these algorithms.
One way to realize grouping is to compare every two adjacent elements with a predicate p:

group(p, L) = { {∅} : L = ∅
                {{l1}} : |L| = 1
                {{l1} ∪ g1, g2, ...} : p(l1, l2), group(p, L') = {g1, g2, ...}
                {{l1}, g1, g2, ...} : otherwise }    (A.59)
Note that {l1} ∪ g1 actually means cons(l1, g1), which performs in constant time. This is a linear algorithm, performing in proportion to the length of the list; it traverses the list in one pass, bound to O(n). Translating this program to Haskell gives the below example code.
group _ [] = [[]]
group _ [x] = [[x]]
group p (x:xs@(x':_)) | p x x' = (x:ys):yss
                      | otherwise = [x]:r
    where
        r@(ys:yss) = group p xs
Note that an imperative implementation of grouping may degrade to quadratic time if the appending function isn't optimized by storing the tail position. The corresponding Python program is given below.
def group(p, xs):
    if xs is None:
        return List(None)
    (x, xs) = (xs.key, xs.next)
    g = List(x)
    G = List(g)
    while xs is not None:
        y = xs.key
        if p(x, y):
            g = append(g, y)
        else:
            g = List(y)
            G = append(G, g)
        x = y
        xs = xs.next
    return G
With the grouping function defined, the two example cases mentioned at the beginning of this section can be realized by passing different predicates.

group(=, {'M', 'i', 's', 's', 'i', 's', 's', 'i', 'p', 'p', 'i'}) = {{'M'}, {'i'}, {'ss'}, {'i'}, {'ss'}, {'i'}, {'pp'}, {'i'}}

group(≥, {15, 9, 0, 12, 11, 7, 10, 5, 6, 13, 1, 4, 8, 3, 14, 2}) = {{15, 9, 0}, {12, 11, 7}, {10, 5}, {6}, {13, 1}, {4}, {8, 3}, {14, 2}}
Alternatively, grouping can be defined by using span:

group(p, L) = { {∅} : L = ∅
                {{l1} ∪ A} ∪ group(p, B) : otherwise }    (A.60)

Where (A, B) = span(x ↦ p(l1, x), L') is the result of spanning on the rest sub-list of L.
Although this newly defined grouping function generates the correct result for the first case, as in the following Haskell code snippet:

groupBy (==) "Mississippi"
["M","i","ss","i","ss","i","pp","i"]

it seems that this algorithm can't group the list of numbers into ordered sub-lists.
The reason is that the first element 15 is used as the left parameter of the ≥ operator for the whole span; since 15 is the maximum value in this list, the span function ends up putting all elements into A, and B is left empty. This might seem a defect, but it is actually the correct behavior if the semantics is to group equal elements together.
Strictly speaking, an equality predicate must satisfy three properties: reflexive, transitive, and symmetric. They are specified as the following.

- Reflexive: x = x; any element is equal to itself.
- Transitive: x = y, y = z ⇒ x = z; if x equals y, and y equals z, then all three are equal.
- Symmetric: x = y ⇔ y = x; the order of comparing two equal elements doesn't affect the result.
When we group the character list "Mississippi", the equal (=) operator is used, which obviously conforms to these three properties, so it generates the correct grouping result. However, when passing (≥) as the equality predicate to group a list of numbers, the symmetric property is violated; that is the reason why we get the wrong grouping result.

This fact means that the second algorithm, designed using span, limits the semantics to strict equality, while the first one does not. It merely tests the condition for every two adjacent elements, which is much weaker than equality.
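The weaker adjacent-pair semantics can be sketched in Python over built-in lists; the same function handles both the equality case and the descending-runs case:

```python
def group(p, xs):
    # Test the predicate on each adjacent pair; start a new
    # group whenever the pair fails the test.
    if not xs:
        return [[]]
    groups = [[xs[0]]]
    for prev, cur in zip(xs, xs[1:]):
        if p(prev, cur):
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups
```

Passing `>=` as the predicate produces the descending runs that the span-based strict-equality grouping cannot.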
Exercise A.3

1. Implement the in-place imperative taking and dropping algorithms in your favorite programming language; note that the out-of-bound cases should be handled. Please try both languages with and without GC (Garbage Collection) support.

2. Implement take-while and drop-while in your favorite imperative programming language. Please try both a dynamically typed language and a statically typed language (with and without type inference). How can the type of the predicate function be specified as generically as possible in a static type system?

3. Consider the following definition of span.

   span(p, L) = { (∅, ∅) : L = ∅
                  ({l1} ∪ A, B) : p(l1), (A, B) = span(p, L')
                  (A, {l1} ∪ B) : otherwise }

   What's the difference between this algorithm and the one we've shown in this section?

4. Implement the grouping algorithm by using span in an imperative way in your favorite programming language.
A.6 Folding

We are ready to introduce one of the most critical concepts in higher-order programming: folding. It is such a powerful tool that almost all the algorithms shown so far in this appendix can be realized by folding. Folding is sometimes named reducing (the abstract concept is identical, in some sense, to the buzz term map-reduce in cloud computing). For example, both the C++ STL and Python provide a reduce function which realizes a partial form of folding.
A.6.1 Folding from right

Recall the sum and product definitions in the previous section; they are actually quite similar.

sum(L) = { 0 : L = ∅
           l1 + sum(L') : otherwise }

product(L) = { 1 : L = ∅
               l1 × product(L') : otherwise }

It is obvious that they have the same structure. What's more, if we list the insertion sort definition, we find that it also shares this structure.

sort(L) = { ∅ : L = ∅
            insert(l1, sort(L')) : otherwise }
This hints that we can abstract this essential common structure, so that we needn't repeat it again and again. Observing sum, product, and sort, there are two points of difference which we can parameterize:

- The result of the trivial edge case varies. It is zero for sum, 1 for product, and the empty list for sorting.
- The function applied to the first element and the intermediate result varies. It is plus for sum, multiply for product, and ordered insertion for sorting.

If we parameterize the result of the trivial edge case as the initial value z (standing for an abstract zero concept), and the function applied in the recursive case as f (which takes two parameters: the first element in the list, and the recursive result for the rest of the list), this common structure can be defined as the following.

proc(f, z, L) = { z : L = ∅
                  f(l1, proc(f, z, L')) : otherwise }
That's it, and we should give this common structure a better name than the meaningless proc. Let's first see its characteristics. For the list L = {x1, x2, ..., xn}, we can expand the computation as the following.

proc(f, z, L) = f(x1, proc(f, z, L'))
             = f(x1, f(x2, proc(f, z, L'')))
               ...
             = f(x1, f(x2, f(..., f(xn, proc(f, z, ∅))...)))
             = f(x1, f(x2, f(..., f(xn, z))...))
Since f takes two parameters, it's a binary function; thus we can write it in infix form. The infix form is defined as below.

x `f` y = f(x, y)    (A.61)

The above expanded result is equivalent to the following by using the infix notation.

proc(f, z, L) = x1 `f` (x2 `f` (...(xn `f` z))...)
Note that the parentheses are necessary, because the computation starts from the right-most (xn `f` z), and repeatedly folds towards the left, to x1. This is quite similar to folding a Chinese hand-fan, as illustrated in the following photos. A Chinese hand-fan is made of bamboo and paper. Multiple bamboo frames are stuck together with an axis at one end. The arc-shaped paper is fully expanded by these frames, as shown in Figure A.3 (a); the fan can be closed by folding the paper. Figure A.3 (b) shows a fan partly folded from the right. After the folding finishes, the fan becomes a stick, as shown in Figure A.3 (c).
We can consider each unit step of the closing as rotating a frame by a certain angle, so that it lies on top of the collapsed part. When we start closing the fan, the initially collapsed result is the first bamboo frame. The closing process folds from one end, and repeatedly applies the unit close step, till all the frames are rotated, and the result is a stick, the closed form.

Actually, the sum and product algorithms do exactly the same thing as closing the fan.
sum({1, 2, 3, 4, 5}) = 1 + (2 + (3 + (4 + 5)))
                     = 1 + (2 + (3 + 9))
                     = 1 + (2 + 12)
                     = 1 + 14
                     = 15

product({1, 2, 3, 4, 5}) = 1 × (2 × (3 × (4 × 5)))
                         = 1 × (2 × (3 × 20))
                         = 1 × (2 × 60)
                         = 1 × 120
                         = 120
In functional programming, we name this process folding; and particularly, since the computation starts from the innermost structure, which is the right-most one, this type of folding is named folding right.

foldr(f, z, L) = { z : L = ∅
                   f(l1, foldr(f, z, L')) : otherwise }    (A.62)
With foldr defined, sum, product, and insertion sort can be expressed as folding:

sum(L) = foldr(+, 0, L) = x1 + (x2 + (... + (xN + 0))...)    (A.63)

product(L) = foldr(×, 1, L) = x1 × (x2 × (... × (xN × 1))...)    (A.64)

sort(L) = foldr(insert, ∅, L)    (A.65)

A.6.2 Folding from left
As mentioned in the section about tail recursion, both the purely recursive sum and product compute from right to left, and they must book-keep all the intermediate results and contexts. As we abstracted fold-right from that very structure, folding from right does the book-keeping as well. This will be expensive if the list is very long.

Since we can change the realization of sum and product to tail-recursive manner, it's quite possible that we can provide another folding algorithm, which processes the list from left to right in normal order, and enables tail-call optimization by reusing the same context.
Instead of deriving it again from sum, product, and insertion sort, we can directly change the folding-right to tail-call form. Observe that the initial value z actually represents the intermediate result. We can use it as the accumulator.

foldl(f, z, L) = { z : L = ∅
                   foldl(f, f(z, l1), L') : otherwise }    (A.66)
Every time the list isn't empty, we take the first element, and apply the function f to the accumulator z and that element to get a new accumulator z' = f(z, l1). After that we can repeatedly fold with the very same function f, the updated accumulator z', and the rest of the list L'.

Let's verify that this tail-call algorithm actually folds from the left.

foldl(+, 0, {1, 2, 3, 4, 5}) = foldl(+, 0 + 1, {2, 3, 4, 5})
                             = foldl(+, (0 + 1) + 2, {3, 4, 5})
                             = foldl(+, ((0 + 1) + 2) + 3, {4, 5})
                             = foldl(+, (((0 + 1) + 2) + 3) + 4, {5})
                             = foldl(+, ((((0 + 1) + 2) + 3) + 4) + 5, ∅)
                             = 15
Note that we actually delayed the evaluation of f(z, l1) in every step. (This is the exact behavior in systems supporting lazy evaluation, for instance Haskell. However, in strict systems such as Standard ML, it's not the case.) In a strict setting, the accumulators are evaluated to the sequence {1, 3, 6, 10, 15}, one in each call.
Generally, folding-left can be expanded in the form

foldl(f, z, L) = f(f(...f(f(z, l1), l2)..., ln))    (A.67)

Or in infix manner as

foldl(f, z, L) = ((...((z `f` l1) `f` l2) `f` ...) `f` ln    (A.68)
With folding from left defined, sum, product, and insertion sort can be transparently implemented by calling foldl, as sum(L) = foldl(+, 0, L), product(L) = foldl(×, 1, L), and sort(L) = foldl(insert, ∅, L). Compared with the folding-right versions, they look almost the same at first glance; however, the internal implementations differ.
Imperative folding and generic folding concept

The tail-call nature of the folding-left algorithm is quite friendly to imperative settings: even if the compiler isn't equipped with tail-recursion optimization, we can implement the folding with a while-loop manually.

function Fold(f, z, L)
    while L ≠ ∅ do
        z ← f(z, First(L))
        L ← Rest(L)
    return z
Translating this algorithm to Python yields the following example program.

def fold(f, z, xs):
    while xs is not None:
        z = f(z, xs.key)
        xs = xs.next
    return z
Actually, Python provides the built-in function reduce which does the very
same thing. (In ISO C++, this is provided as the reduce algorithm in the STL.) Almost
no imperative environment provides a folding-right function, because it can cause
a stack overflow if the list is too long. However, there still exist cases
where the folding-from-right semantics is necessary. For example, one may define a
container which only provides an insertion function to the head of the container,
but no appending method, so that we want such a fromList tool.
fromList(L) = foldr(insertHead, empty, L)
Calling fromList with the insertion function as well as an empty initialized
container turns a list into the special container. Actually, the singly linked
list is such a container: it performs well on insertion to the head, but degrades
to linear time when appending to the tail. Folding from right is quite natural for
duplicating a linked list while keeping the element ordering, while folding from
left generates a reversed list.
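The fromList idea can be sketched in Python; here foldr is emulated by reversing first (to avoid deep recursion), and insert_head models the hypothetical head-only container with a plain list:

```python
def fold_right(f, z, xs):
    """Right fold f(x1, f(x2, ... f(xn, z))), emulated iteratively."""
    for x in reversed(xs):
        z = f(x, z)
    return z

def insert_head(x, container):
    """Model the head-only insertion of the container with a plain list."""
    return [x] + container

def from_list(xs):
    return fold_right(insert_head, [], xs)

print(from_list([1, 2, 3]))   # element order is preserved: [1, 2, 3]
```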
In such cases, there exists an alternative way to implement imperative folding
from the right: first reverse the list, then fold the reversed one from the left.
1: function Fold-Right(f, z, L)
2:     return Fold(f, z, Reverse(L))
Note that here we must use the tail-call version of reversing, or the stack
overflow issue still exists.
One may think that folding-left should be chosen in most cases over folding-right,
because it's friendly to tail-call optimization, suitable for both
functional and imperative settings, and it's an online algorithm. However,
folding-right plays a critical role when the input list is infinite and the binary
function f is lazy. For example, the Haskell program below wraps every element of
an infinite list in a singleton, and returns the first 10 results.
take 10 $ foldr (\x xs -> [x] : xs) [] [1..]
[[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
This can't be achieved by folding from the left, because the outermost evaluation
can't finish until the whole list is processed. The details are specific to the lazy
evaluation feature, which is out of the scope of this book. Readers can refer to
[13] for details.
Although the main topic of this appendix is algorithms for singly linked lists,
the folding concept itself is generic; it is not limited to lists,
but can also be applied to other data structures.
We can fold a tree, a queue, or even more complicated data structures as
long as we have the following:
- The empty data structure can be identified for the trivial edge case (e.g. an empty tree);
- We can traverse the data structure (e.g. traverse the tree in pre-order).
Some languages support this high-level concept; for example, Haskell
achieves it via monoids. Readers can refer to [8] for detail.
Many chapters in this book use this widened concept of folding.
A.6.3 Folding in practice
We have seen that sum, product, and insertion sort all can be realized in folding.
The brute-force solution for the puzzle shown in the mapping section can also be
designed by mixed use of mapping and folding.
Recall that we create a list of pairs; each pair contains the number of the
light and its on-off state. After that we process the operations from 1 to n, switching a light
whenever its number is divisible by the operation number. The whole process can be viewed as folding.
f old(step, {(1, 0), (2, 0), ..., (n, 0)}, {1, 2, ..., n})
The initial value is the very first state, where all the lights are off. The list to
be folded is the operations from 1 to n. Function step takes two arguments: one
is the light-state pair list, the other is the operation number i. It then maps over all
lights and performs the switching. We can then substitute the step with mapping.
fold(λ (L, i) → map(switch(i), L), {(1, 0), (2, 0), ..., (n, 0)}, {1, 2, ..., n})
We'll simplify the notation and directly write map(switch(i), L) for brevity.
The result of this folding is the final list of state pairs; we need to take the
second part of each pair via mapping, then calculate the summation.
sum(map(snd, fold(λ (L, i) → map(switch(i), L), {(1, 0), (2, 0), ..., (n, 0)}, {1, 2, ..., n})))
(A.69)
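The whole solution can be sketched in Python; the fold over the operations 1..n is written as a loop, and switch and solve are our own illustrative names:

```python
def switch(i, light):
    """Toggle the light if its number is divisible by i."""
    number, state = light
    return (number, 1 - state) if number % i == 0 else light

def solve(n):
    states = [(k, 0) for k in range(1, n + 1)]    # all lights start off
    for i in range(1, n + 1):                     # fold over the operations 1..n
        states = [switch(i, l) for l in states]   # map switch(i) over all lights
    return sum(state for _, state in states)      # sum(map(snd, ...))

print(solve(100))   # only lights at square positions stay on: 10
```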
There are materials providing plenty of good examples of using folding; in [1] especially, folding together with the fusion law is well explained.
Concatenate a list of lists
In the previous section A.3.6 about concatenation, we explained how to concatenate two lists. Actually, concatenation of lists can be considered equivalent to
summation of numbers. Thus we can design a general algorithm which
concatenates multiple lists into one big list.
What's more, we can realize this general concatenation by using folding.
As sum can be represented as sum(L) = foldr(+, 0, L), it's straightforward to
write the following equation.
concats(L) = foldr(concat, ∅, L)
(A.70)
Where L is a list of lists, for example {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, ...}. Function concat(L1, L2) is what we defined in section A.3.6.
In some environments which support lazy evaluation, such as Haskell, this
algorithm is capable of concatenating an infinite list of lists, as the binary function
++ is lazy.
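A strict-language sketch of concats in Python (no laziness, so infinite inputs are out of scope here; the right fold is emulated with a reversed traversal):

```python
def concats(lists):
    """Fold concatenation over a list of lists, from the right."""
    result = []
    for l in reversed(lists):   # emulate the right fold
        result = l + result     # concat(l, result)
    return result

print(concats([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # [1, 2, ..., 9]
```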
Exercise A.4
- What's the performance of the concats algorithm? Is it linear or quadratic?
- Design another linear time concats algorithm without using folding.
- Realize the mapping algorithm by using folding.
A.7 Searching and matching
Searching and matching are very important algorithms. They are not limited
to linked lists, but are applicable to a wide range of data structures. We
only scratch the surface of searching and matching in this appendix; there are
dedicated chapters explaining them in this book.
A.7.1 Existence testing
The simplest searching case is to test whether a given element exists in a list. A linear
time traverse can solve this problem. In order to determine whether element x exists in
list L:
- If the list is empty, it's obvious that the element doesn't exist in L;
- If the first element in the list equals x, we know that x exists;
- Otherwise, we need to recursively test whether x exists in the rest sub-list L'.
This simple description can be directly formalized into the equation below.

x ∈ L = False : L = ∅
        True : l1 = x
        x ∈ L' : otherwise
(A.71)
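Equation (A.71) translates directly to Python, with slicing standing in for the rest sub-list L':

```python
def elem(x, xs):
    """Linear existence test following equation (A.71)."""
    if not xs:               # L is empty: x cannot exist
        return False
    if xs[0] == x:           # first element equals x
        return True
    return elem(x, xs[1:])   # recurse on the rest sub-list

print(elem(3, [1, 2, 3]))   # True
print(elem(9, [1, 2, 3]))   # False
```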
A.7.2 Looking up
One step beyond existence testing is to find the interesting information stored
in the list. There are two typical methods to augment the element with extra data.
Since the linked list is a chain of nodes, we can store satellite data in the node,
then provide key(n) to access the key of the node, rest(n) for the rest sub-list,
and value(n) for the augmented data. The other method is to pair the key
and data, for example {(1, hello), (2, world), (3, foo), ...}. We'll introduce how
to form such a pairing list in a later section.
The algorithm is almost the same as existence testing: it traverses
the list, examining the keys one by one. Whenever it finds a node with the
same key as the one we are looking up, it stops and returns the augmented data.
This is obviously a linear strategy. If the satellite data is augmented to
the node directly, the algorithm can be defined as the following.
lookup(x, L) = ∅ : L = ∅
               value(l1) : key(l1) = x
               lookup(x, L') : otherwise
(A.72)
In this algorithm, L is a list of nodes augmented with satellite data.
Note that the first case actually means lookup failure, so the result is
empty. Some functional programming languages, such as Haskell, provide the Maybe
type to handle the possibility of failure. This algorithm can be slightly modified to
handle the key-value pair list as well.
lookup(x, L) = ∅ : L = ∅
               snd(l1) : fst(l1) = x
               lookup(x, L') : otherwise
(A.73)
Here L is a list of pairs; functions fst(p) and snd(p) access the first and
second part of the pair respectively.
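A Python sketch of the pair-list version (A.73); None stands in for the empty result, similar to Haskell's Nothing:

```python
def lookup(x, pairs):
    """Linear lookup in a key-value pair list."""
    if not pairs:
        return None              # lookup failure: the 'empty' result
    key, value = pairs[0]
    if key == x:
        return value
    return lookup(x, pairs[1:])  # recurse on the rest sub-list

table = [(1, "hello"), (2, "world"), (3, "foo")]
print(lookup(2, table))   # world
print(lookup(9, table))   # None
```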
Both algorithms are in tail-call manner, so they can easily be transformed into imperative loops. We leave this as an exercise to the reader.
A.7.3 Finding and filtering
Let's take one more step ahead. The looking-up algorithm performs a linear search by
testing whether the key of an element equals the given value. A more general
case is to find an element matching a certain predicate. We can abstract this
matching condition as a parameter of a generic linear finding algorithm.
find(p, L) = ∅ : L = ∅
             l1 : p(l1)
             find(p, L') : otherwise
(A.74)
The algorithm traverses the list, examining whether each element satisfies the
predicate p. It fails if the list becomes empty with nothing found; this is
handled by the first trivial edge case. If the first element in the list satisfies the
condition, the algorithm returns the whole element (node), and the user can further
handle it as they like (either extract the satellite data or whatever); otherwise,
the algorithm recursively performs finding on the rest sub-list. Below is
the corresponding Haskell example program.
find _ [] = Nothing
find p (x:xs) = if p x then Just x else find p xs
It is quite possible that multiple elements in the list satisfy
the predicate. The finding algorithm designed so far just picks the first one
it meets and stops immediately. It can be considered a special case of finding
all elements under a certain condition.
Another viewpoint of finding all elements with a given predicate is to treat
the finding algorithm as a black box: the input to this box is a list, while the
output is another list containing all elements satisfying the predicate. This can
be called filtering, as shown in figure A.4.
Figure A.4: The input is the original list {x1, x2, ..., xn}; the output is a list
{x'1, x'2, ..., x'm} such that predicate p(x'i) is satisfied for every x'i.
This figure can be formalized in ZF expression form; note, however, that we actually
enumerate over a list instead of a set.
filter(p, L) = { x | x ∈ L ∧ p(x) }
(A.75)
Some environments, such as Haskell (and Python for any iterable), support
this form as a list comprehension.
filter p xs = [ x | x <- xs, p x]
Note that the Python built-in list isn't a singly linked list as discussed in
this appendix.
In order to modify the finding algorithm to realize filtering, the found elements are accumulated into a result list, and instead of stopping the traverse, all
the rest of the elements are examined with the predicate.
filter(p, L) = ∅ : L = ∅
               cons(l1, filter(p, L')) : p(l1)
               filter(p, L') : otherwise
(A.76)
This algorithm returns an empty result for the trivial edge case of an empty list.
For a non-empty list, suppose the recursive result of filtering the rest sub-list is A;
the algorithm examines whether the first element satisfies the predicate, and if so,
puts it in front of A by a cons operation (O(1) time).
The corresponding Haskell program is given as below.
filter _ [] = []
filter p (x:xs) = if p x then x : filter p xs else filter p xs
Although we said that the next found element is appended to the result list,
this algorithm actually constructs the result list from right to
left, so that appending is avoided, which ensures the linear O(n) performance.
Comparing this algorithm with the following imperative quadratic realization reveals the difference.
1: function Filter(p, L)
2:     L' ← ∅
3:     while L ≠ ∅ do
4:         if p(First(L)) then
5:             L' ← Append(L', First(L))    ▷ Linear operation
6:         L ← Rest(L)
As the comment on the appending statement notes, appending typically takes time
proportional to the length of the result list if the tail position isn't memorized.
This fact indicates that directly transforming the recursive filter algorithm into
tail-call form downgrades the performance from O(n) to O(n²). As shown in the
equation below, filter(p, L) = filter'(p, L, ∅) performs as poorly as the imperative
one.
filter'(p, L, A) = A : L = ∅
                   filter'(p, L', A ∪ {l1}) : p(l1)
                   filter'(p, L', A) : otherwise
(A.77)
A.7.4 Matching
Matching generally means finding a given pattern among some data structure.
In this section, we limit the topic to lists. Even this limitation leads
to a very wide and deep topic; there are dedicated chapters in this book
introducing matching algorithms. So we only select the algorithm that tests whether a given
list exists in another (typically longer) list.
Before diving into the algorithm of finding the sub-list at any position, two
special edge cases are used for warm up: the algorithms that test whether a given
list is either a prefix or a suffix of another.
In the section about span, we have seen how to find a prefix under a certain
condition. Prefix matching can be considered a special case in some sense:
it compares the elements of the two lists one by one from the beginning,
until it meets any different elements or passes the end of one list. Define P ⊑ L if
P is a prefix of L.
P ⊑ L = True : P = ∅
        False : p1 ≠ l1
        P' ⊑ L' : otherwise
(A.81)
This is obviously a linear algorithm. However, we can't use the very same
approach to test whether a list is a suffix of another, because it isn't cheap to start from
the end of a linked list and iterate backwards. Arrays, on the other hand,
support random access and can easily be traversed backwards.
As we only need a yes-no result, one solution to realize a linear suffix
testing algorithm is to reverse both lists (which takes linear time), and use prefix
testing instead. Define L ⊒ P if P is a suffix of L.
L ⊒ P = reverse(P) ⊑ reverse(L)
(A.82)
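Both tests can be sketched in Python; is_suffix reuses is_prefix via the double reversal of (A.82):

```python
def is_prefix(p, l):
    """P is a prefix of L, as in equation (A.81)."""
    if not p:
        return True
    if not l or p[0] != l[0]:
        return False
    return is_prefix(p[1:], l[1:])

def is_suffix(p, l):
    """P is a suffix of L: reverse both lists and test the prefix instead."""
    return is_prefix(list(reversed(p)), list(reversed(l)))

print(is_prefix([1, 2], [1, 2, 3]))   # True
print(is_suffix([2, 3], [1, 2, 3]))   # True
```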
With prefix testing defined, we can test whether a pattern P occurs inside a target list L by scanning position by position:

function Is-Infix(P, L)
    while L ≠ ∅ do
        if P ⊑ L then
            return TRUE
        L ← Rest(L)
    return FALSE
Formalizing this algorithm as a recursive equation leads to the definition below.

infix?(P, L) = True : P ⊑ L
               False : L = ∅
               infix?(P, L') : otherwise
(A.83)
Note that there is a tricky implicit constraint in this equation. If the pattern
P is empty, it is definitely an infix of any target list. This case is actually
covered by the first condition in the above equation, because the empty list is also
a prefix of any list. In most programming languages supporting pattern matching,
we can't arrange the second clause as the first edge case, or it would return false for
infix?(∅, ∅). (One exception is Prolog, but this is a language-specific feature
which we won't cover in this book.)
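A Python sketch of the quadratic scan; note the prefix test is tried before the empty check, so the empty pattern matches even an empty target:

```python
def is_prefix(p, l):
    """P is a prefix of L."""
    if not p:
        return True
    if not l or p[0] != l[0]:
        return False
    return is_prefix(p[1:], l[1:])

def is_infix(p, l):
    """Try the prefix test at every position; O(nm) overall."""
    while True:
        if is_prefix(p, l):   # covers the empty-pattern case first
            return True
        if not l:
            return False
        l = l[1:]

print(is_infix([2, 3], [1, 2, 3, 4]))   # True
print(is_infix([], []))                 # True
```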
Since prefix testing is linear, and it is called while traversing the list, this
algorithm is quadratic, O(nm), where n and m are the lengths of the pattern
and target lists respectively. There is no trivial way to improve this position-by-position
scanning algorithm to linear time, even if the data structure changes from
linked list to randomly accessible array.
There are chapters in this book introducing several approaches to fast matching, including the suffix tree with Ukkonen's algorithm, the Knuth-Morris-Pratt algorithm, and the Boyer-Moore algorithm.
Alternatively, we can enumerate all suffixes of the target list, and check whether
the pattern is a prefix of any of these suffixes. This can be represented as the
following.
infix?(P, L) = ∃S ∈ suffixes(L), P ⊑ S
(A.84)
This can be represented as a list comprehension, for example in the Haskell
program below.
isInfixOf x y = (not . null) [ s | s <- tails y, x `isPrefixOf` s]
Where function isPrefixOf is the prefix testing function defined according to our previous design, and function tails generates all suffixes of a list. The
implementation of tails is left as an exercise to the reader.
Exercise A.5
- Implement the linear existence testing in both functional and imperative approaches in your favorite programming languages.
- Implement the looking-up algorithm in your favorite imperative programming language.
- Realize the linear time filtering algorithm by first building the result list in reverse order, and finally reversing it to recover the normal result. Implement this algorithm in both imperative looping and functional tail-recursive styles.
- Implement the imperative algorithm of prefix testing in your favorite programming language.
- Implement the algorithm to enumerate all suffixes of a list.
A.8 Zipping and unzipping
Function zip pairs up the corresponding elements of two lists:

zip(A, B) = ∅ : A = ∅ ∨ B = ∅
            cons((a1, b1), zip(A', B')) : otherwise
(A.85)
Note that this algorithm is capable of handling the case where the two lists
being zipped have different lengths: the result list of pairs aligns with the
shorter one. It's even possible to zip an infinite list with a finite one in
environments supporting lazy evaluation. For example, with this
auxiliary function defined, we can initialize the lights state as
zip({0, 0, ...}, {1, 2, ..., n})
In some languages supporting list enumeration, such as Haskell (Python provides a similar range function, but it manipulates the built-in list, which isn't a linked list), this can be expressed as zip (repeat 0) [1..n]. Given a list
of words, we can also index them with consecutive numbers as
zip({1, 2, ...}, {a, an, another, ...})
Note that the zipping algorithm is linear, as it uses a constant time cons operation in each recursive call. However, directly translating zip into imperative
manner would downgrade the performance to quadratic, unless the linked list
is optimized with a tail position cache or we modify one of the passed-in
lists in place.
1: function Zip(A, B)
2:     C ← ∅
3:     while A ≠ ∅ ∧ B ≠ ∅ do
4:         C ← Append(C, (First(A), First(B)))
5:         A ← Rest(A)
6:         B ← Rest(B)
7:     return C
Note that the appending operation is proportional to the length of the result
list C, so it gets slower and slower as the traverse proceeds. There are three
solutions to improve this algorithm to linear time. The first method is to use a
similar approach as in infix testing: construct the result list of
pairs in reverse order by always inserting the paired elements at the head, then perform
a linear reverse operation before returning the final result. The second method is
to modify one passed-in list, for example A, in place while traversing, turning
it from a list of elements into a list of pairs. The third method is to remember the
last appending position. Please try these solutions as an exercise.
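The first method can be sketched in Python (consing onto the head of a plain list, then one final reverse; the name zip_lists is our own, to avoid shadowing the built-in zip):

```python
def zip_lists(a, b):
    """Linear zip: build the pair list in reverse by head insertion, then reverse once."""
    c = []
    while a and b:
        c = [(a[0], b[0])] + c   # cons the pair onto the head
        a, b = a[1:], b[1:]
    return list(reversed(c))

print(zip_lists([0, 0, 0], [1, 2, 3, 4]))   # aligns with the shorter list
```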
The key point of linear time zipping is that the result list is actually built
from right to left, so it's quite possible to provide a folding-right realization. This
is left as an exercise to the reader.
It is natural to extend the zip algorithm so that multiple lists can be
zipped into one list of tuples. For example, the Haskell standard library
provides zip, zip3, zip4, ..., up to zip7. Another typical extension
is that sometimes we don't want a list of pairs (or tuples more generally);
instead, we want to apply some combinator function to each pair of elements.
For example, consider the case where we have a list of unit prices for every
fruit: apple, orange, banana, ..., as {1.00, 0.80, 10.05, ...}, all in
dollars; and the customer's cart holds a list of purchased quantities, for instance
{3, 1, 0, ...}, meaning the customer put 3 apples and an orange in the cart, and doesn't
take any banana, so the quantity of banana is zero. We want to generate a
list of costs, containing how much to pay for apple, orange,
banana, ... respectively.
The program can be written from scratch as below.

paylist(U, Q) = ∅ : U = ∅ ∨ Q = ∅
                cons(u1 × q1, paylist(U', Q')) : otherwise
Comparing this equation with the zip algorithm, it is easy to find the
common structure of the two; we can parameterize the combinator function
as f, so that the generic zipper algorithm can be defined as the following.
zipWith(f, A, B) = ∅ : A = ∅ ∨ B = ∅
                   cons(f(a1, b1), zipWith(f, A', B')) : otherwise
(A.86)

Here is an example that defines the inner product (or dot product) [14] by
using zipWith.

A · B = sum(zipWith(×, A, B))
(A.87)
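Equation (A.86) and the inner product (A.87) can be sketched in Python; zip_with and dot are our own names:

```python
def zip_with(f, a, b):
    """Generic zipper, as in equation (A.86)."""
    if not a or not b:
        return []
    return [f(a[0], b[0])] + zip_with(f, a[1:], b[1:])

def dot(a, b):
    """Inner product: sum of element-wise products, as in (A.87)."""
    return sum(zip_with(lambda x, y: x * y, a, b))

prices = [1.00, 0.80, 10.05]
quantities = [3, 1, 0]
print(zip_with(lambda u, q: u * q, prices, quantities))   # the pay list
print(dot([1, 2, 3], [4, 5, 6]))                          # 32
```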
Similarly, the cart can also be represented clearly in this manner, for example
Q = {(apple, 3), (orange, 1), (banana, 0), ...}.
Given such a product/unit-price list and a product/quantity list, how do
we calculate the total payment?
One straightforward idea derived from the previous solution is to extract the
unit price list and the purchased quantity list, then calculate their inner product.

pay = sum(zipWith(×, snd(unzip(P)), snd(unzip(Q))))
(A.88)
Function unzip, which splits a list of pairs into a pair of lists, can be realized
by folding from the right.

unzip(L) = foldr(λ (a, b) (A, B) → (cons(a, A), cons(b, B)), (∅, ∅), L)
(A.89)

The initial result is a pair of empty lists. During the folding process, the
head of the list, which is a pair of elements, as well as the intermediate result,
are passed to the combinator function. This combinator function is given as a
lambda expression: it extracts the paired elements and puts them in front
of the two intermediate lists respectively. Note that we use implicit pattern
matching to extract the elements from the pairs. Alternatively, this can be done by
using the fst and snd functions explicitly as

λ p, P → (cons(fst(p), fst(P)), cons(snd(p), snd(P)))
The following Haskell example code implements the unzip algorithm.

unzip = foldr (\(a, b) (as, bs) -> (a:as, b:bs)) ([], [])
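The same right fold reads naturally in Python (emulating foldr with a reversed traversal):

```python
def unzip(pairs):
    """Split a pair list into a pair of lists by folding from the right."""
    result = ([], [])               # initial result: a pair of empty lists
    for a, b in reversed(pairs):    # emulate the right fold
        xs, ys = result
        result = ([a] + xs, [b] + ys)
    return result

print(unzip([(1, 'a'), (2, 'b'), (3, 'c')]))   # ([1, 2, 3], ['a', 'b', 'c'])
```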
The zip and unzip concepts can be extended more generally, rather than being
limited to linked lists. It is quite useful to zip two lists to a tree, where the
data stored in the tree are paired elements from both lists. General zip and
unzip can also be used to track the traverse path of a collection, to mimic the
parent pointer in imperative implementations. Please refer to the last chapter
of [8] for a good treatment.
Exercise A.6
Design and implement the iota (I) algorithm, which can enumerate a list with
some given parameters. For example:
- iota(..., n) = {1, 2, 3, ..., n};
- iota(m, n) = {m, m + 1, m + 2, ..., n}, where m ≤ n;
- iota(m, m + a, ..., n) = {m, m + a, m + 2a, ..., n};
- iota(m, m, ...) = repeat(m) = {m, m, m, ...};
- iota(m, ...) = {m, m + 1, m + 2, ...}.
Note that the last two cases essentially demand generating an infinite list. Consider how to represent an infinite list; you may refer to streaming and
lazy evaluation materials such as [5] and [8].
A.9
Exercise A.7
Develop a program to remove the duplicated elements in a linked list.
In imperative settings, the duplicated elements should be removed in
place. In purely functional settings, construct a new list containing only the
unique elements. The order of the elements should be kept as in their original
appearance. What is the complexity of the program? Try to simplify the
solution if auxiliary data structures are allowed.
A decimal non-negative integer can be represented in a linked list. For
example, 1024 can be represented as 4 → 2 → 0 → 1. Generally,
n = dm...d2d1 can be represented as d1 → d2 → ... → dm. Given
two numbers a, b in linked-list form, realize basic arithmetic operations
such as plus and minus.
In imperative settings, a linked list may be corrupted so that it is circular:
some node points back to a previous one, as figure A.5 shows.
Normal iteration ends up in an infinite loop.
1. Write a program to detect whether a linked list is circular;
2. Write a program to find the node where the loop starts (the node
being pointed to by two predecessors).
Bibliography
[1] Richard Bird. Pearls of Functional Algorithm Design. Cambridge University Press; 1st edition (November 1, 2010). ISBN: 978-0521513388
[2] Simon L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall International Series in Computer Science. Prentice Hall (May 1987). ISBN: 978-0134533339
[3] Andrei Alexandrescu. Modern C++ Design: Generic Programming and Design Patterns Applied. Addison Wesley, February 1, 2001. ISBN: 0-201-70431-5
[4] Benjamin C. Pierce. Types and Programming Languages. The MIT Press, 2002. ISBN: 0262162091
[5] Harold Abelson, Gerald Jay Sussman, Julie Sussman. Structure and Interpretation of Computer Programs, 2nd Edition. MIT Press, 1996. ISBN: 0-262-51087-1
[6] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press (July 1, 1999). ISBN-13: 978-0521663502
[7] Fethi Rabhi, Guy Lapalme. Algorithms: A Functional Programming Approach. Second edition. Addison-Wesley, 1999. ISBN: 0201-59604-0
[8] Miran Lipovaca. Learn You a Haskell for Great Good! A Beginner's Guide. No Starch Press; 1st edition, April 2011, 400 pp. ISBN: 978-1-59327-283-8
[9] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf; 1st edition (July 18, 2007). ISBN-13: 978-1934356005
[10] Wikipedia. Tail call. https://en.wikipedia.org/wiki/Tail_call
[11] SGI. transform. http://www.sgi.com/tech/stl/transform.html
[12] ACM/ICPC. The drunk jailer. Peking University judge online for ACM/ICPC. http://poj.org/problem?id=1218
[13] Haskell wiki. Haskell programming tips. 4.4 Choose the appropriate fold. http://www.haskell.org/haskellwiki/Haskell_programming_tips
[14] Wikipedia. Dot product. http://en.wikipedia.org/wiki/Dot_product
Preamble
The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone
the effective freedom to copy and redistribute it, with or without modifying it,
either commercially or noncommercially. Secondarily, this License preserves for
the author and publisher a way to get credit for their work, while not being
considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the
document must themselves be free in the same sense. It complements the GNU
General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software,
because free software needs free documentation: a free program should come
with manuals providing the same freedoms that the software does. But this
License is not limited to software manuals; it can be used for any textual work,
regardless of subject matter or whether it is published as a printed book. We
recommend this License principally for works whose purpose is instruction or
reference.
tion when you modify the Document means that it remains a section "Entitled
XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document. These Warranty Disclaimers
are considered to be included by reference in this License, but only as regards
disclaiming warranties: any other implication that these Warranty Disclaimers
may have is void and has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and
the license notice saying this License applies to the Document are reproduced
in all copies, and that you add no other conditions whatsoever to those of this
License. You may not use technical measures to obstruct or control the reading
or further copying of the copies you make or distribute. However, you may
accept compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you
may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed
covers) of the Document, numbering more than 100, and the Document's license
notice requires Cover Texts, you must enclose the copies in covers that carry,
clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover,
and Back-Cover Texts on the back cover. Both covers must also clearly and
legibly identify you as the publisher of these copies. The front cover must
present the full title with all words of the title equally prominent and visible.
You may add other material on the covers in addition. Copying with changes
limited to the covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you
should put the first ones listed (as many as fit reasonably) on the actual cover,
and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more
than 100, you must either include a machine-readable Transparent copy along
with each Opaque copy, or state in or with each Opaque copy a computernetwork location from which the general network-using public has access to
download using public-standard network protocols a complete Transparent copy
of the Document, free of added material. If you use the latter option, you must
take reasonably prudent steps, when you begin distribution of Opaque copies
in quantity, to ensure that this Transparent copy will remain thus accessible at
the stated location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that edition to the
public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a
chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the
conditions of sections 2 and 3 above, provided that you release the Modified
Version under precisely this License, with the Modified Version filling the role
of the Document, thus licensing distribution and modification of the Modified
Version to whoever possesses a copy of it. In addition, you must do these things
in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that
of the Document, and from those of previous versions (which should, if
there were any, be listed in the History section of the Document). You
may use the same title as a previous version if the original publisher of
that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together
with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this
requirement.
C. State on the Title page the name of the publisher of the Modified Version,
as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to
the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving
the public permission to use the Modified Version under the terms of this
License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled History, Preserve its Title, and add to it
an item stating at least the title, year, new authors, and publisher of the
Modified Version as given on the Title Page. If there is no section Entitled
History in the Document, create one stating the title, year, authors, and
publisher of the Document as given on its Title Page, then add an item
describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public
access to a Transparent copy of the Document, and likewise the network
locations given in the Document for previous versions it was based on.
These may be placed in the History section. You may omit a network
location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives
permission.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all
of the original documents, unmodified, and list them all as Invariant Sections
of your combined work in its license notice, and that you preserve all their
Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple
identical Invariant Sections may be replaced with a single copy. If there are
multiple Invariant Sections with the same name but different contents, make
the title of each such section unique by adding at the end of it, in parentheses,
the name of the original author or publisher of that section if known, or else a
unique number. Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this License in
the various documents with a single copy that is included in the collection,
provided that you follow the rules of this License for verbatim copying of each
of the documents in all other respects.
You may extract a single document from such a collection, and distribute it
individually under this License, provided you insert a copy of this License into
the extracted document, and follow this License in all other respects regarding
verbatim copying of that document.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders,
but you may include translations of some or all Invariant Sections in addition to
the original versions of these Invariant Sections. You may include a translation
of this License, and all the license notices in the Document, and any Warranty
Disclaimers, provided that you also include the original English version of this
License and the original versions of those notices and disclaimers. In case of a
disagreement between the translation and the original version of this License or
a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as
expressly provided under this License. Any attempt otherwise to copy, modify,
sublicense, or distribute it is void, and will automatically terminate your rights
under this License.
However, if you cease all violation of this License, then your license from
a particular copyright holder is reinstated (a) provisionally, unless and until
the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some
reasonable means prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable
means, this is the first time you have received notice of violation of this License
(for any work) from that copyright holder, and you cure the violation prior to
30 days after your receipt of the notice.
Termination of your rights under this section does not terminate the licenses
of parties who have received copies or rights from you under this License. If
your rights have been terminated and not permanently reinstated, receipt of a
copy of some or all of the same material does not give you any rights to use it.
11. RELICENSING
"Massive Multiauthor Collaboration Site" (or "MMC Site") means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A "Massive Multiauthor Collaboration" (or "MMC") contained in the site means any set of copyrightable works thus published on the MMC site.
"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.
"Incorporate" means to publish or republish a Document, in whole or in part, as part of another Document.
An MMC is "eligible for relicensing" if it is licensed under this License, and
if all works that were first published under this License somewhere other than
this MMC, and subsequently incorporated in whole or in part into the MMC,
(1) had no cover texts or invariant sections, and (2) were thus incorporated prior
to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site
under CC-BY-SA on the same site at any time before August 1, 2009, provided
the MMC is eligible for relicensing.
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being
LIST.
If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software
license, such as the GNU General Public License, to permit their use in free
software.
Index
8 queens puzzle, 486
Auto completion, 125
AVL tree, 81
balancing, 86
definition, 81
deletion, 92
imperative insertion, 92
insertion, 84
verification, 91
B-tree, 163
delete, 172
insert, 165
search, 186
split, 165
BFS, 515
Binary heap, 193
build heap, 196
decrease key, 203
heap push, 205
Heapify, 195
insertion, 205
merge, 210
pop, 198
top, 198
top-k, 202
Binary Random Access List
Definition, 322
Insertion, 324
Random access, 327
Remove from head, 325
Binary search, 436
binary search tree, 29
data layout, 30
delete, 40
insertion, 33
looking up, 37
min/max, 38
randomly build, 44
search, 37
succ/pred, 38
traverse, 34
binary tree, 30
Binomial Heap
Linking, 254
Binomial heap, 249
definition, 252
insertion, 256
pop, 261
Binomial tree, 249
merge, 258
Boyer-Moore majority number, 453
Boyer-Moore algorithm, 470
Breadth-first search, 515
Change-making problem, 527
Cocktail sort, 233
Depth-first search, 480
DFS, 480
Dynamic programming, 529
Fibonacci Heap, 264
decrease key, 277
delete min, 268
insert, 266
merge, 267
pop, 268
Finger Tree
Imperative splitting, 370
Finger tree
Append to tail, 354
Concatenate, 356
Definition, 343
Ill-formed tree, 349
Imperative random access, 368
Insert to head, 345
Random access, 361, 367
Remove from head, 348
Remove from tail, 355
Size augmentation, 361
splitting, 366
folding, 592
Greedy algorithm, 516
Heap sort, 205
Huffman coding, 516
Implicit binary heap, 193
in-order traverse, 34
Insertion sort
binary search, 50
binary search tree, 53
linked-list setting, 51
insertion sort, 47
insertion, 48
Integer Patricia, 104
insert, 106
look up, 111
Integer prefix tree, 104
Integer trie, 100
insert, 101
look up, 103
Kloski puzzle, 507
KMP, 458
Knuth-Morris-Pratt algorithm, 458
LCS, 534
left child, right sibling, 253
Leftist heap, 208
heap sort, 211
insertion, 210
merge, 209
pop, 210
rank, 208
S-value, 208
top, 210
List
append, 559
break, 587
concat, 568
concats, 597
cons, 553
Construction, 553
definition, 551
delete, 565
delete at, 565
drop, 585
drop while, 586
elem, 598
empty, 552
empty testing, 554
existence testing, 598
Extract sub-list, 585
filter, 599
find, 599
fold from left, 594
fold from right, 592
foldl, 594
foldr, 592
for each, 580
get at, 555
group, 588
head, 552
index, 555
infix, 602
init, 556
insert, 562
insert at, 562
last, 556
length, 554
lookup, 599
map, 577, 578
matching, 602
maximum, 573
minimum, 573
mutate, 559
prefix, 602
product, 569
reverse, 583
Reverse index, 557
rindex, 557
set at, 561
span, 587
split at, 585, 587
suffix, 602
sum, 569
tail, 552
take, 585
take while, 586
Transformation, 576
unzip, 604
zip, 604
Longest common sub-string, 157
Longest common subsequence problem, 534
Longest palindrome, 159
Longest repeated sub-string, 155
Maximum sum problem, 456
Paired-array list
Definition, 335
Insertion and appending, 336
Random access, 336
Removing and balancing, 337
Pairing heap, 281
decrease key, 284
definition, 282
delete, 288
delete min, 284
find min, 282
insert, 282
pop, 284
top, 282
Parallel merge sort, 427
Parallel quick sort, 427
Patricia, 117
insert, 118
look up, 123
Peg puzzle, 489
post-order traverse, 34
pre-order traverse, 34
Prefix tree, 117
Radix tree, 99
range traverse, 40
red-black tree, 57, 62
deletion, 66
imperative insertion, 74
insertion, 63
red-black properties, 62
Queue
Balance Queue, 306
Circular buffer, 299
Incremental concatenate, 310
Incremental reverse, 308
Lazy real-time queue, 315
Paired-array queue, 305
Paired-list queue, 302
Real-time Queue, 308
Singly linked-list, 296
Quick Sort
Subset sum problem, 539
Suffix link, 139
Suffix tree, 137, 144
active point, 144
Canonical reference pair, 145
end point, 144
functional construction, 151
node transfer, 139
on-line construction, 146
reference pair, 145
string searching, 153
sub-string occurrence, 154
Suffix trie, 138
on-line construction, 140
T9, 129
Tail call, 570
Tail recursion, 570
Tail recursive call, 570
Textonym input method, 129
The wolf, goat, and cabbage puzzle,
495
Tournament knock out
explicit infinity, 242
tree reconstruction, 36
tree rotation, 60
Trie, 113
insert, 115
look up, 116
Tournament knock out, 237
Ukkonen's algorithm, 146
Water jugs puzzle, 499
word counter, 29
INDEX