Balancing Weight-Balanced Trees
Yoichi Hirai
The University of Tokyo
KAZUHIKO YAMAMOTO
IIJ Innovation Institute Inc.
(e-mail: [email protected])
Abstract
A weight-balanced tree (WBT) is a binary search tree whose balance is based on the sizes
of the subtrees in each node. Although purely functional implementations of a variant
WBT algorithm are widely used in functional programming languages, many existing
implementations do not maintain balance after deletion in some cases. The difficulty lies
in choosing a valid pair of rotation parameters: one for standard balance and the other for
choosing single or double rotation. This paper identifies the exact valid range of the rotation
parameters for insertion and deletion in the original WBT algorithm, where one and only
one integer solution exists. Soundness of the range is proved using the proof assistant Coq.
Completeness is proved using effective algorithms generating counterexample trees. For two
specific parameter pairs, we also proved in Coq that set operations maintain balance.
Since the difference between the original WBT and the variant WBT is small, it is easy to
change the existing buggy implementations based on the variant WBT to the certified original
WBT with a rational solution.
1 Introduction
Weight-balanced trees (WBTs) (Nievergelt & Reingold, 1972) are binary search
trees, which can be used to implement finite sets and finite maps (associative arrays).
Although other balanced binary search trees, such as AVL trees (Adel’son-Vel’skii &
Landis, 1962) and red–black trees (Guibas & Sedgewick, 1978), use the height of
subtrees for balancing, the balance of WBTs is based on the sizes (number of
elements) of the subtrees below each node. Its purely functional implementations
are widely used in functional programming languages. In fact, fundamental modules
Data.Set and Data.Map in Haskell (Marlow, 2010) and the wttree.scm library in
MIT/GNU Scheme and slib are based on a variant of the WBT algorithm (Adams,
1993).
In order to ensure performance, the algorithm keeps the height of a tree
logarithmic to its size by balancing the sizes of the subtrees in each node. In
2010, a bug report¹ confirmed that the Data.Map library broke the tree balance after
deletion.² We investigated the existing literature but failed to find a rigorous proof
that both insertion and deletion preserve the balance of WBTs. Instead, we found
that proving balance preservation requires checking several inequalities in 14 cases
of program behaviors for five different parameter zones. We used the proof assistant
Coq (Bertot & Casteran, 2004) in order to cope with this intensive case analysis.
1 http://hackage.haskell.org/trac/ghc/ticket/4242
288 Y. Hirai and K. Yamamoto
Fig. 1. A single left rotation and a double left rotation. a, b, and c are elements. x, y, y0, y1,
and z are the sizes of the subtrees; for the double rotation, y is split into y0 and y1. If y is
too large, a double rotation is chosen. Otherwise, a single rotation is used.
To keep the balance of WBTs, there are two important parameters Δ and Γ: Δ
decides whether any rotation is made at all and Γ chooses a single rotation or a
double rotation (Figure 1). These parameters must ensure that a newly created tree
is balanced after any insertion or deletion in a given balanced WBT. The original
paper of WBT suggests ⟨Δ, Γ⟩ = ⟨1 + √2, √2⟩, though the irrational parameters are
expensive to implement (Roura, 2001). We found valid rational parameters, which
can be implemented at low cost.
2 The Data.Map module of the containers library 0.3.0.0 or earlier has this bug.
Contributions of the paper are as follows:
— The original proposal ⟨Δ, Γ⟩ = ⟨1 + √2, √2⟩ in Nievergelt & Reingold
(1972) maintains the strictest balance condition available within the range,
— The smaller Δ is the better performer;
• (soundness) A proof in Coq that insertion and deletion preserve balance of
trees when the parameters lie within the range;
• (completeness) An implemented and tested procedure for producing
counterexamples for any parameter pair outside the range.
This paper is organized as follows. We define terminology in Section 2. In Section 3,
we describe the basics of WBT algorithms and explain the difference between the
original WBT algorithm and a variant WBT algorithm. We point out the drawback
of the variant WBT algorithm in Section 4 and explain that our goal is to find
the valid range of the original WBT in Section 5. The valid range is identified
by preliminary tests in Section 6. For soundness of the range, we describe a Coq
certified proof in Section 7. For completeness, we show the method of producing
counterexamples outside the valid range in Section 8. Section 9 explains that our
suggested parameter retains almost the same performance as the existing parameters.
We show related work and conclusions in Sections 10 and 11, respectively.
2 Terminology
The original paper (Nievergelt & Reingold, 1972) uses the name “binary search trees
of bounded balance.” However, following Knuth’s book (Knuth, 1998), we call the
same family of algorithms “weight-balanced trees.” We use “the original WBT” for
the WBT algorithm in the original paper (Nievergelt & Reingold, 1972) and “the
variant WBT” for the WBT algorithm in Adams’s technical report (Adams, 1992).
To check the balance of a WBT, weights of its left subtree and right subtree are
used. In the original WBT, the weight of a tree is the number of contained elements
plus one. In the variant WBT, the weight of a tree is simply the number of contained
elements. Both WBT algorithms use two parameters Δ and Γ mentioned in Section 1.
For the original WBT, we write the pair as ⟨Δ, Γ⟩. For the variant WBT, we write (Δ, Γ).
We use the Haskell syntax to describe algorithms.
3 WBT
In this section, we briefly explain the basics of WBT.
3 Our code for algorithm implementation, tests, benchmarks, and proofs is available on the author’s web
page http://www.mew.org/~kazu/proj/weight-balanced-tree/ and in the archive of the journal.
As shown above, the size of Tip is 0 and the size of a Bin-constructed tree is the
sum of the sizes of its two subtrees, plus 1.
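The definitions this paragraph refers to are not reproduced in this text. A minimal sketch consistent with the description, assuming a size-annotated tree in the style of Data.Set (the names Size, Tip, Bin, size, bin, and singleton are illustrative assumptions), might look as follows:

```haskell
-- A size-annotated binary search tree (illustrative names, not the
-- authors' exact definitions).
type Size = Int

data Set a
  = Tip                             -- the empty tree
  | Bin !Size a !(Set a) !(Set a)   -- cached size, element, left, right
  deriving Show

-- Size of a tree, read from the cached field in constant time.
size :: Set a -> Size
size Tip            = 0
size (Bin sz _ _ _) = sz

-- Smart constructor: the size of a Bin-constructed tree is the sum of
-- the sizes of its two subtrees, plus 1.
bin :: a -> Set a -> Set a -> Set a
bin k l r = Bin (size l + size r + 1) k l r

-- A one-element tree.
singleton :: a -> Set a
singleton k = bin k Tip Tip
```

Caching the size in each node is what lets the balance predicates compare subtree weights without traversing the subtrees.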
The balanced function uses the isBalanced predicate in order to check the first
condition of balancing. Each WBT algorithm has its own isBalanced predicate,
which is shown later. The balanced function then recursively calls itself on both
subtrees in order to check the second condition of balancing.
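As a hedged sketch of the check described above, the function below takes the isBalanced predicate as an argument, since each WBT algorithm supplies its own. The tree type and the parameter-passing style are assumptions made for illustration.

```haskell
-- Minimal tree type, repeated here so that the sketch is self-contained.
type Size = Int
data Set a = Tip | Bin Size a (Set a) (Set a)

size :: Set a -> Size
size Tip            = 0
size (Bin sz _ _ _) = sz

bin :: a -> Set a -> Set a -> Set a
bin k l r = Bin (size l + size r + 1) k l r

-- Whole-tree balance check. The predicate is a parameter (an assumption
-- of this sketch) because the original and the variant WBT differ in it.
balanced :: (Set a -> Set a -> Bool) -> Set a -> Bool
balanced _ Tip = True
balanced isBalanced (Bin _ _ l r) =
     isBalanced l r && isBalanced r l                 -- first condition: this node
  && balanced isBalanced l && balanced isBalanced r   -- second condition: both subtrees
```

The predicate is applied in both directions so that neither subtree is too heavy relative to the other.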
After an element is inserted, the balance between the left and right subtrees might
be broken. This is checked by isBalanced, which uses Δ. If the two subtrees are
still balanced, a new node is simply created; otherwise, a rotation is performed. The
exact condition depends on whether the original or the variant WBT is used. If an
element is inserted into the right subtree and the balance is broken, a left rotation
is performed.
balanceL :: a -> Set a -> Set a -> Set a
balanceL k l r
| isBalanced l r = bin k l r
| otherwise = rotateL k l r
For the rest of this paper, we only consider left rotations. Symmetric arguments
can be applied to right rotations. Note that a left rotation is also performed when
an element is deleted from the left subtree. There are two kinds of left rotations.
One is called a single rotation and the other is called a double rotation (Figure 1).
The isSingle predicate, which uses Γ, decides which rotation is used.
rotateL :: a -> Set a -> Set a -> Set a
rotateL k l r@(Bin _ _ rl rr)
| isSingle rl rr = singleL k l r
| otherwise = doubleL k l r
rotateL _ _ _ = error "rotateL"
Both single and double rotations move a part of the right subtree into the left
subtree. A single rotation moves the left subtree of the right subtree. A double rotation
moves a smaller part: the left subtree of the tree moved by a single rotation. A single
rotation can break the balance of the whole tree if the subtree being moved is too
large. To prevent that, the isSingle function chooses a double rotation when the
subtree is much larger than its right sibling. Each WBT algorithm has its own isSingle
function, which is shown later. The error cases in the three functions above do
not occur, since the insert operation adds one element to the right subtree.
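The two rotations can be sketched as follows. This is an illustration modeled on the usual Data.Set style rather than a verbatim copy of the authors' code; the tree type, bin, and toList are assumptions introduced to make the block self-contained.

```haskell
type Size = Int
data Set a = Tip | Bin Size a (Set a) (Set a)

size :: Set a -> Size
size Tip            = 0
size (Bin sz _ _ _) = sz

bin :: a -> Set a -> Set a -> Set a
bin k l r = Bin (size l + size r + 1) k l r

-- Single left rotation: the left subtree of the right child moves
-- under the old root, and the right child becomes the new root.
singleL :: a -> Set a -> Set a -> Set a
singleL k1 t1 (Bin _ k2 t2 t3) = bin k2 (bin k1 t1 t2) t3
singleL _ _ _ = error "singleL"

-- Double left rotation: only the left subtree of the right child's left
-- child moves under the old root; that grandchild becomes the new root.
doubleL :: a -> Set a -> Set a -> Set a
doubleL k1 t1 (Bin _ k2 (Bin _ k3 t2 t3) t4) =
  bin k3 (bin k1 t1 t2) (bin k2 t3 t4)
doubleL _ _ _ = error "doubleL"

-- In-order traversal, used to confirm that rotations keep the ordering.
toList :: Set a -> [a]
toList Tip           = []
toList (Bin _ k l r) = toList l ++ [k] ++ toList r
```

Both functions preserve the in-order sequence of elements; they differ only in which node ends up at the root and how much of the right subtree crosses to the left.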
When the balance is broken, the algorithm chooses a single rotation or a double
rotation by comparing the weights of two subtrees of the right subtree (node c in
Figure 1). A single rotation is chosen if the weight of the left subtree of node c is
less than the weight of the right subtree of node c multiplied by Γ. Otherwise, a
double rotation is chosen.
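The rule above can be sketched in code. The block below assumes the original WBT's notion of weight (size plus one) and the parameter pair ⟨3, 2⟩; the helper weight and the constants delta and gamma are names chosen for this illustration.

```haskell
type Size = Int
data Set a = Tip | Bin Size a (Set a) (Set a)

size :: Set a -> Size
size Tip            = 0
size (Bin sz _ _ _) = sz

bin :: a -> Set a -> Set a -> Set a
bin k l r = Bin (size l + size r + 1) k l r

-- Parameters assumed for this sketch: the pair <3, 2>.
delta, gamma :: Int
delta = 3
gamma = 2

-- Weight as in the original WBT: number of elements plus one.
weight :: Set a -> Int
weight t = size t + 1

-- The second tree is not too heavy compared with the first one.
isBalanced :: Set a -> Set a -> Bool
isBalanced a b = delta * weight a >= weight b

-- Applied to the left and right subtrees of the right child: True
-- selects a single rotation, False selects a double rotation.
isSingle :: Set a -> Set a -> Bool
isSingle a b = weight a < gamma * weight b
```

Because the weight of an empty tree is one rather than zero, neither predicate needs a special case for small trees.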
4 wttree.scm in slib 3b3 or earlier has this bug but our fix was merged in December 2010. wttree.scm
in MIT/GNU Scheme 9.0.1 or earlier has this bug but our fix was merged in January 2011.
5 Goal
We chose the original WBT as our target algorithm because it has two benefits:
• Since weights are nonzero natural numbers, we do not have to treat small
trees as special. This makes mathematical analysis easier because there are
fewer cases to consider;
• The mathematical analysis in Nievergelt & Reingold (1972) is credible. It
considers the balance preservation by both insertions and deletions.
Note that it is easy to convert an existing program based on the variant WBT to
the original WBT because the only difference is in the isBalanced and isSingle
functions. We will discuss the original WBT in the rest of this paper. We chose
to seek rational valid parameters in addition to ⟨1 + √2, √2⟩ suggested by the
original paper. To implement the originally suggested balance condition with integer
arithmetic, we have to compare the squares of the weights. Here is a straightforward
implementation:
isBalanced :: Set a -> Set a -> Bool
isBalanced a b = 2 * y * y <= z * z
where x = size a + 1
y = size b + 1
z = x + y
Since integers in typical computer languages have a fixed length, this calculation is prone
to overflow. To avoid this problem, we would have to deploy a more complicated
implementation. Rational parameters are therefore preferable.
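To make the overflow concern concrete, the following sketch evaluates the squared comparison at two integer widths. It works directly on weights instead of trees, and the sample weights (1 and 40000) and the names isBalancedSq, wrapped, and exact are chosen purely for illustration.

```haskell
import Data.Int (Int32)

-- The squared balance test from the text, stated on the weights x and y
-- and generic in the integer type so it can run at several widths.
isBalancedSq :: Integral n => n -> n -> Bool
isBalancedSq x y = 2 * y * y <= z * z
  where z = x + y

-- With 32-bit integers, 2 * 40000 * 40000 exceeds 2^31 - 1 and wraps to
-- a negative number, so the comparison gives the wrong answer.
wrapped :: Bool
wrapped = isBalancedSq (1 :: Int32) 40000

-- With arbitrary-precision integers the same test is exact.
exact :: Bool
exact = isBalancedSq (1 :: Integer) 40000
```

Here wrapped evaluates to True while exact evaluates to False, so the fixed-width version would accept a badly unbalanced pair of subtrees. A comparison that is linear in the weights, as rational parameters allow, stays within range for far larger trees.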
Fig. 3. Results of tests plotted along ⟨Δ, Γ⟩ (horizontal axis: Δ from 2 to 5; vertical axis:
Γ from 0.8 to 2.2). The dotted square symbols indicate that neither insertion nor deletion
broke the balance. The plus symbols + indicate discovery of concrete counterexamples.
6.1 Tests
To obtain a more precise parameter range around ⟨3, 2⟩, we tested not only with
integer parameters, but also with rational parameters. Figure 3 shows the results
with the range, where Δ = 2, 2.1, . . . , 5 and Γ = 1, 1.05, . . . , 2.2. The shape of the valid
range seemed more complex than we had expected.
Fig. 4. Results using the automated arithmetic solver Omega plotted along ⟨Δ, Γ⟩ (horizontal
axis: Δ from 2 to 5; vertical axis: Γ from 0.8 to 2.2). The square symbols indicate the
parameter pairs are valid. The plus symbols + indicate discovery of concrete
counterexamples. Blanks mean time-out.
Fig. 5. The boundaries of the valid parameter range for the original WBT are shown with
inequalities (horizontal axis: Δ from 2 to 5; vertical axis: Γ from 0.8 to 2.2). The labels in
the plot are Γ <= Δ - 1, Γ >= (Δ + 1) / Δ, Δ < 4.5, and the lines Γ = 4/2, Γ = 5/3, Γ = 3/2,
and Γ = 4/3. The points are some rational valid parameters.
Fig. 6. Two intermediate arithmetic lemmas (the accompanying picture shows a double rotation
and a single rotation). The variables x, y, z, and w informally stand for
the size of each subtree shown in the picture although the formal definition says nothing
about tree structures. good params denotes the restriction on parameters shown in Section 6.3.
The operator “<” denotes comparison that returns a Boolean. Note that in the final lemmas,
the domain is restricted to integers. We used this restriction for case analysis on particularly
small trees. An integer z can be converted into a rational number by writing (z#1).
track of those numerous cases and constraints by hand is troublesome and error-
prone.
In the 15,700 lines⁵ of Coq proof script, the first and second parts are about
insertion and deletion, while the third and fourth parts are about set operations. The
first part proves arithmetic statements involving Δ and Γ. The second part proves
that any insertion or deletion on any balanced tree yields a balanced tree if the
parameter pair ⟨Δ, Γ⟩ lies within the conjectured range.
The first arithmetic statements are made in terms of rational numbers. We just
considered the sizes of five subtrees involved in rotations. Since Δ and Γ are not
fixed, the problem did not fit in Presburger arithmetic, so we could not use the
omega tactic. While proving the lemmas shown in Figure 6, we intensively used the
Psatz interface to the CSDP solver (Borchers, 1999). For a proof goal consisting of
polynomial inequations, the Psatz interface tries to build a proof automatically with
the help of an external solver called CSDP. For some special small sizes, we had to
manually compute the ceil function when we stated that an integer less than 13.5
must be less than or equal to 13.
5 An anonymous referee pointed out that since the script is written in a sparse manner, it is probably
possible to prove the same results with about 3,000 lines in a more concise style.
Fig. 7. Two program theorems shown in the second half. balance rec means that the whole
tree is balanced. validsize rec means that every node in the tree has the correct size information
on it.
In the second part, we treated actual programs operating on actual tree structures
and proved lemmas shown in Figure 7. The second part does not rely on rational
numbers. Instead, we introduced integer variables called deltaU and deltaD to denote
the rational number deltaU/deltaD. In this part, we had to use induction on trees.
Moreover, we had to give special treatment to some particularly small trees because
the arithmetic lemmas required trees to have sufficiently many subtrees.
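The idea of representing a rational Δ by two integers can be illustrated outside Coq. The Haskell sketch below is not the authors' Coq development: it assumes deltaD is positive and that x and y are weights, and it compares the cross-multiplied integer test with the same test on exact rationals.

```haskell
import Data.Ratio ((%))

-- Balance test with delta = deltaU / deltaD, using integers only.
-- Multiplying both sides by deltaD (assumed positive) removes the division.
isBalancedInt :: Integer -> Integer -> Integer -> Integer -> Bool
isBalancedInt deltaU deltaD x y = deltaU * x >= deltaD * y

-- The same test stated with exact rational arithmetic, for comparison.
isBalancedRat :: Integer -> Integer -> Integer -> Integer -> Bool
isBalancedRat deltaU deltaD x y =
  (deltaU % deltaD) * fromInteger x >= fromInteger y
```

For example, with deltaU = 5 and deltaD = 2 (that is, Δ = 5/2), weights 2 and 5 pass the test because 10 >= 10, whereas weights 2 and 6 fail it.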
Set operations. We also verified that the set operations (union, difference, and
intersection) preserve the balance condition under parameter pairs ⟨3, 2⟩ and ⟨5/2, 3/2⟩,
in the third and fourth parts, respectively. (The reason for choosing the second pair
is the benchmark described in Section 9.) That is, if two WBTs are balanced, their
union, difference, and intersection are also balanced.⁶ For the set operations, we
used the efficient hedge-union algorithms used in the current version of the Data.Set
and Data.Map libraries, not the simple divide-and-conquer. The technical paper by
Adams (1992) describes the hedge approach as well as the divide-and-conquer
approach.
$$1 \le \Gamma \le \Delta. \tag{2}$$
If the first constraint is not satisfied, there are no balanced trees of size two. On the
other hand, if the second constraint is broken, only single or double rotations are
chosen so that it is impossible to maintain balance.
6 For intersection, we used an experimental Function command of Coq 8.2 because the intersection
function has a complicated recursion: when the intersection function calls itself, the new argument is
generated by another function with recursion so the ordinary Fixpoint command cannot guess the
decreasing argument.
Fig. 8. A counterexample outside the right boundary (the picture shows the original tree and
the results of a double rotation and a single rotation). x and y denote the size of each subtree.
The original tree on the left side is balanced. However, after deletion of the only element
in the left subtree, neither a single rotation nor a double rotation maintains the balance at
node a.
Consider the trees in Figure 8. The size of each tree is defined as follows:
$$x = \lfloor \Delta \rfloor, \qquad y = \lfloor 2\Delta \rfloor - \lfloor \Delta \rfloor - 4.$$
This original tree on the left side of the figure is balanced. To see that, we look
at each node. Let us consider the balance at node a first. Since the size of the right
subtree x + y + 3 is larger than that of the left subtree, we only have to confirm that
the right subtree is not too large. More specifically, Δ times the weight of the left
subtree must be greater than or equal to the weight of the right subtree. Since the
weight of a tree is the size plus one, we are seeking this inequality:
$$\Delta \times (1 + 1) \ge (3 + x + y) + 1.$$
It is possible to mathematically analyze that the left-hand side has a positive value
for Δ ≥ 4.5 but the analysis is tedious. Instead, we plot the graph of f(Δ) in Figure 9,
where f(Δ) denotes the expression above.
It is obvious that node b is balanced. Thus, the entire original tree is balanced. If
we delete the left subtree of node a, the balance is broken:
$$\Delta - (x + y + 4) = \Delta - \lfloor 2\Delta \rfloor < 0 \quad \text{(by Equation (1))}.$$
So, either a single or double rotation takes place. After a double rotation, the
balance at node a is broken because Δ − (⌊Δ⌋ + 1) < 0. Otherwise, after a single
rotation, the balance at node a is also broken since Δ − (⌊Δ⌋ + 3) < 0.
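The inequalities at node a can be checked numerically for a concrete Δ. The sketch below assumes the sizes are x = ⌊Δ⌋ and y = ⌊2Δ⌋ − ⌊Δ⌋ − 4 (the floor brackets are our reading of the definitions), uses the original WBT's weight (size plus one), and inspects node a only. All names are illustrative.

```haskell
import Data.Ratio ((%))

-- Subtree sizes of the counterexample for a given delta
-- (floor brackets assumed, see the lead-in).
sizes :: Rational -> (Integer, Integer)
sizes delta = (x, y)
  where x = floor delta
        y = floor (2 * delta) - floor delta - 4

-- delta * weight(l) >= weight(r), with weight = size + 1.
notTooLarge :: Rational -> Integer -> Integer -> Bool
notTooLarge delta l r =
  delta * fromInteger (l + 1) >= fromInteger (r + 1)

data Report = Report
  { balancedBefore, brokenAfterDelete, brokenAfterDouble, brokenAfterSingle :: Bool }
  deriving (Eq, Show)

-- Checks at node a only; the other nodes are not examined here.
report :: Rational -> Report
report delta = Report
  { balancedBefore    = notTooLarge delta 1 (x + y + 3)        -- left subtree has one element
  , brokenAfterDelete = not (notTooLarge delta 0 (x + y + 3))  -- left subtree now empty
  , brokenAfterDouble = not (notTooLarge delta 0 x)            -- right subtree of a has size x
  , brokenAfterSingle = not (notTooLarge delta 0 (x + 2))      -- right subtree of a has size x + 2
  }
  where (x, y) = sizes delta

-- Example on the boundary value delta = 9/2.
exampleReport :: Report
exampleReport = report (9 % 2)
```

For Δ = 9/2 the sizes are x = 4 and y = 1, and all four fields of the report are True: node a is balanced before the deletion and unbalanced after it, whichever rotation is applied.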
[Figure 9: plot of f(Δ) against Δ, for Δ from 0 to 8 and f(Δ) from −10 to 30.]
Fig. 10. A counterexample of the left boundary. The original tree on the left side is balanced.
However, after deletion of one element in the left subtree, a single rotation breaks the balance
at node c if y is large enough.
Consider the trees in Figure 10. Each of x, y, and z is the size of a subtree of the
original tree. x and z are defined using y:
$$z = \left\lfloor \frac{y+1}{\Gamma} \right\rfloor, \qquad x = \left\lceil \frac{y+z+2}{\Delta} \right\rceil - 1.$$
This original tree is balanced because of the following. At node c, the right subtree
of size z is not too much larger than the left subtree of size y for sufficiently large y:
$$\begin{aligned}
\Delta(y+1) - (z+1) &= \Delta(y+1) - \left(\left\lfloor \frac{y+1}{\Gamma} \right\rfloor + 1\right) \\
&\ge \Delta(y+1) - \left(\frac{y+1}{\Gamma} + 1\right) \\
&= \left(\Delta - \frac{1}{\Gamma}\right) y + C \\
&\ge 0
\end{aligned}$$
where C does not contain x, y, or z. The last inequality holds for large y because
the coefficient for y is Δ − 1/Γ, which is positive since Δ > 1 ≥ 1/Γ.
The left subtree of c is not too much larger than the right subtree, either:
$$\begin{aligned}
\Delta(z+1) - (y+1) &= \Delta\left(\left\lfloor \frac{y+1}{\Gamma} \right\rfloor + 1\right) - (y+1) \\
&> \Delta\,\frac{y+1}{\Gamma} - (y+1) \\
&= \left(\frac{\Delta}{\Gamma} - 1\right)(y+1) \\
&\ge 0 \quad \text{(by Equation (2))}.
\end{aligned}$$
At node a, the right subtree is not smaller: x ≤ y + z + 1. At the same time, the
right subtree is not too large:
$$\Delta(x+1) - (y+z+2) = \Delta\left\lceil \frac{y+z+2}{\Delta} \right\rceil - (y+z+2) \ge \Delta\,\frac{y+z+2}{\Delta} - (y+z+2) = 0.$$
If we delete one element from the left subtree of node a, the balance is broken as
follows:
$$\Delta x - (y+z+2) = \Delta\left(\left\lceil \frac{y+z+2}{\Delta} \right\rceil - 1\right) - (y+z+2) < \Delta\,\frac{y+z+2}{\Delta} - (y+z+2) = 0.$$
At this time, a single rotation is chosen because
$$\Gamma(z+1) - (y+1) = \Gamma\left(\left\lfloor \frac{y+1}{\Gamma} \right\rfloor + 1\right) - (y+1) > \Gamma\,\frac{y+1}{\Gamma} - (y+1) = 0.$$
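After the single rotation, the size of the left subtree of node c is x + y and that
of the right subtree is z. The balance is broken if the following expression has a
negative value:
$$\begin{aligned}
\Delta(z+1) - (x+y+1) &= \Delta\left(\left\lfloor \frac{y+1}{\Gamma} \right\rfloor + 1\right) - \left(\left\lceil \frac{y + \left\lfloor \frac{y+1}{\Gamma} \right\rfloor + 2}{\Delta} \right\rceil + y\right) \\
&\le \Delta\left(\frac{y+1}{\Gamma} + 1\right) - \left(\frac{y + \left\lfloor \frac{y+1}{\Gamma} \right\rfloor + 2}{\Delta} + y\right) \\
&< \Delta\left(\frac{y+1}{\Gamma} + 1\right) - \left(\frac{y + \frac{y+1}{\Gamma} + 1}{\Delta} + y\right) \\
&= \frac{(\Delta+1)(\Delta-\Gamma-1)}{\Delta\Gamma}\, y + C
\end{aligned}$$
where C does not contain x, y, or z. The coefficient for y is negative by Equation (3).
This implies that, for sufficiently large y, the rotated tree becomes unbalanced after a
delete operation.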
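A concrete instance may help in following this argument. The sketch below assumes z = ⌊(y + 1)/Γ⌋ and x = ⌈(y + z + 2)/Δ⌉ − 1 (the brackets are our reading of the definitions), uses weight = size + 1, and checks only the inequalities at nodes a and c. The sample pair ⟨5/2, 2⟩ and all function names are chosen for illustration.

```haskell
import Data.Ratio ((%))

-- Sizes (x, y, z) of the three subtrees, derived from y
-- (floor and ceiling brackets assumed, see the lead-in).
sizesFor :: Rational -> Rational -> Integer -> (Integer, Integer, Integer)
sizesFor delta gamma y = (x, y, z)
  where z = floor (fromInteger (y + 1) / gamma)
        x = ceiling (fromInteger (y + z + 2) / delta) - 1

-- delta * weight(l) >= weight(r), with weight = size + 1.
ok :: Rational -> Integer -> Integer -> Bool
ok delta l r = delta * fromInteger (l + 1) >= fromInteger (r + 1)

-- Neither side is too heavy.
balancedPair :: Rational -> Integer -> Integer -> Bool
balancedPair delta l r = ok delta l r && ok delta r l

-- The five facts used in the argument, in order.
checks :: Rational -> Rational -> Integer -> [Bool]
checks delta gamma y0 =
  [ balancedPair delta y z                             -- node c before deletion
  , balancedPair delta x (y + z + 1)                   -- node a before deletion
  , not (ok delta (x - 1) (y + z + 1))                 -- node a after deleting one element
  , fromInteger (y + 1) < gamma * fromInteger (z + 1)  -- a single rotation is chosen
  , not (ok delta z (x + y))                           -- node c after the single rotation
  ]
  where (x, y, z) = sizesFor delta gamma y0

-- Example outside the left boundary: gamma = 2 > delta - 1 = 3/2.
example :: [Bool]
example = checks (5 % 2) 2 99
```

With Δ = 5/2, Γ = 2, and y = 99, the sizes are x = 60 and z = 50, and every entry of example is True: the tree starts balanced, the deletion triggers a single rotation, and node c is unbalanced afterwards.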
Fig. 11. A counterexample for parameter pairs outside the lower boundary. x, y, z, and w
denote the size of each subtree. The original tree on the left side is balanced. However, after
deletion of one element in the left subtree, a double rotation breaks the balance at node c if
z is large enough.
For large values of x, the original tree on the left side of the figure is balanced.
To see that, let us look at each node. On node a, the right subtree is not too large:
$$\Delta(x+1) = \Delta\left\lceil \frac{r+1}{\Delta} \right\rceil \ge \Delta\,\frac{r+1}{\Delta} = r+1.$$
The left subtree of node a is not too large if r is large enough:
$$\Delta(r+1) \ge r+2 > r+1 \ge \left\lceil \frac{r+1}{\Delta} \right\rceil = x+1.$$
On node c, the right subtree is not too large:
$$\Delta(y+z+1+1) = \Delta\left(\lfloor \Delta(z+1) \rfloor + z + 1\right) \ge \lfloor \Delta(z+1) \rfloor + 1 = w+1.$$
On the other hand, the left subtree of node c is not too large for large values of z:
$$\Delta(w+1) = \Delta\left(\lfloor \Delta(z+1) \rfloor + 1\right) \ge \lfloor \Delta(z+1) \rfloor + z + 1 = y+z+2$$
where the inequality in the middle holds when z is large enough. This is because
both sides are almost linear on z, where the coefficient on the left side Δ² is larger
than the coefficient on the right side Δ + 1. The inequality between the coefficients
Δ² > Δ + 1 holds because Δ ≥ 2.
On node b, the right subtree is not too large:
$$\Delta(y+1) = \Delta w = \Delta \lfloor \Delta(z+1) \rfloor \ge \Delta(z+1) \ge z+1$$
where the inequalities come from Δ ≥ 2 and the fact that z is an integer. On the
other hand, the left subtree of node b is not too large:
$$\Delta(z+1) \ge \lfloor \Delta(z+1) \rfloor = w = y+1.$$
Although the original tree is balanced as we have seen, if we delete an element
from the left subtree of node a, the balance is broken:
$$\Delta x - (r+1) = \Delta\left(\left\lceil \frac{r+1}{\Delta} \right\rceil - 1\right) - (r+1) < \Delta\,\frac{r+1}{\Delta} - (r+1) = 0.$$
This implies either a single rotation or a double rotation takes place. Actually, a
double rotation takes place if z is large enough because the following expression has
a nonpositive value:
$$\begin{aligned}
\Gamma(w+1) - (y+z+2) &= \Gamma\left(\lfloor \Delta(z+1) \rfloor + 1\right) - \left(\lfloor \Delta(z+1) \rfloor + z + 1\right) \\
&< \Gamma\left(\Delta(z+1) + 1\right) - \left(\Delta(z+1) + z\right) \\
&= (\Gamma\Delta - \Delta - 1)\,z + C
\end{aligned}$$
where C does not contain x, y, or z. The coefficient of z is negative according to the
inequation (4). So, we can choose a large enough z that ensures a double rotation.
If a double rotation is chosen, the balance is broken at node c:
$$\Delta(z+1) - (w+1) = \Delta(z+1) - \left(\lfloor \Delta(z+1) \rfloor + 1\right) < \Delta(z+1) - \Delta(z+1) = 0.$$
Fig. 12. A counterexample for parameter pairs outside the upper boundary. x and y denote
the sizes of the subtrees. The original tree on the left side is balanced. However, deletion of
the single element in the left subtree breaks the balance. A double rotation maintains the
balance but a single rotation breaks the balance at node a. When the parameter pair is
outside the upper boundary, a single rotation is chosen.
Since we already have the other boundaries, we only have to consider 2.5 ≤ Δ <
4.5. Thus, in this last case, we only have to deal with four different small trees. It is
easy to check that these four trees are balanced and that their balance is broken if the
left subtree of node a is removed. If a double rotation is chosen, the resulting tree
is balanced. If a single rotation is chosen, the balance at node a is broken:
$$\Delta - (x+2) = \Delta - \lfloor \Delta \rfloor - 1 < 0.$$
In order to obtain a counterexample, it is enough to ensure a single rotation. For
this, satisfying the following inequality is enough:
$$\Gamma > \frac{x+2}{y+1}.$$
In the table below, we summarize the above result for the four different trees.
x y Γ = (x + 2)/(y + 1)
Fig. 13. Performance of the insert operation using different WBT algorithms (vertical axis: time).
9 Performance
The balance constraints are ultimately for performance. We benchmarked the
original WBT with ⟨3, 2⟩ to compare against the variant WBT with (3, 2) and
(4, 2), and Logarithmic BST described in Section 10. Their code is based on the
Haskell Data.Map implementation in the containers package version 0.3.0.0.⁷ We
used a Dell OptiPlex 960 with a 2.66 GHz Intel Core 2 Quad CPU and 2 GB of memory
running Linux 2.6.35. The Haskell compiler was the Glasgow Haskell Compiler
version 6.12.3 with the -O2 option. Benchmarking a language with lazy evaluation
is not straightforward. We used the criterion package version 0.5.0.5 and the
progression package version 0.4 as reliable benchmark tools. Data.Map is defined
as strict and we used a strict data type Int as key. So, we removed the toList
overhead used in criterion when reducing Data.Map to its normal form. We also
benchmarked the original WBT with several rational parameters.
Comparison between the original and variant WBTs. We evaluated the performance
of the insertion operation, the deletion operation, and the lookup operation. For all
operations, we prepared 1k, 10k, and 100k elements both in increasing order
and in random order. They are labeled as inc 10^3, inc 10^4, inc 10^5, rnd 10^3, rnd 10^4,
and rnd 10^5, respectively, in Figures 13–15. Some error bars are invisibly short.
For the insertion operation, we measured the entire time to construct a WBT
from all elements. The results are illustrated in Figure 13. For the delete operation,
we first constructed a WBT from all elements and then measured the entire time
to delete each element in the insertion order from the full tree. The results are
illustrated in Figure 14. For the lookup operation, we first constructed a WBT
from all elements and then measured the entire time to look up each element in the
tree. The results are illustrated in Figure 15. To show the results of three different
sizes in a graph, we divide each entire time by each size. We can say that the original
WBT with ⟨3, 2⟩ has at least the same performance as the variant WBT with (3, 2)
and (4, 2) and Logarithmic BST.
7 As of this writing, performance tuning is going on. The containers package version 0.3.0.0 does not
include such performance tuning.
Fig. 14. Performance of the delete operation using different WBT algorithms (vertical axis: time).
Fig. 15. Performance of the lookup operation using different WBT algorithms (vertical axis: time).
Comparison among different parameter choices for the original WBT. Likewise, we
compared the performance of eight different parameter pairs within the valid range
for insertion, deletion, and lookup (Figures 16–18). We found that the smaller
Δ, which enforces the stricter balance condition, performs better. For incremental
inputs, the largest time difference between the slowest and the fastest reached 43%
for insertion. For randomized inputs, the largest difference was 14% for lookup.
Fig. 16. Performance of the insert operation with different verified parameter pairs (vertical axis: time).
Fig. 17. Performance of the delete operation with different verified parameter pairs (vertical axis: time).
Fig. 18. Performance of the lookup operation with different verified parameter pairs (vertical axis: time).
10 Related work
Coq verification of balanced tree algorithms. Filliâtre and Letouzey (2004) proved
correctness of AVL tree and red–black tree implementations in Coq and extracted
OCaml code from the Coq implementation. At some stages during the
implementation, they were not able to prove a balancing condition in Coq. This
led to the discovery of an implementation bug relating to the balance of the AVL tree
implementation in the OCaml standard library at the time. In this paper, we pointed
out balancing bugs of the algorithm, not merely in an implementation.
Charguéraud (2010) verified many functional tree algorithms in Okasaki’s book
(Okasaki, 1998) with a new method of transforming a program into a proposition
transformer. However, neither Charguéraud’s verification nor the book contains
WBT algorithms. If we apply Charguéraud’s method to verifying WBT algorithms,
Other balanced tree algorithms. Logarithmic BST (Roura, 2001) is another variant of
WBT. To implement Logarithmic BST, isBalanced and isSingle use bit operations
and other code can be shared with the WBT family.
import Data.Bits ((.&.), shiftL)

(.<.) :: Size -> Size -> Bool
a .<. b
  | a >= b = False
  | otherwise = ((a .&. b) `shiftL` 1) < b
11 Conclusion
We identified the exact range of the valid rotation parameters of the original
weight-balanced tree and proved in Coq that it can maintain balance after any insertion
and deletion operations. Within the range, the only integer solution is ⟨3, 2⟩, which
allows simpler implementation of the original weight-balanced tree. Benchmarks
showed that the original weight-balanced tree with ⟨3, 2⟩ performs almost the same
as the variant with (3, 2) and (4, 2). We benchmarked other valid rational
parameters and found that a smaller Δ performs better. We proved in Coq
that set operations, such as union, intersection, and difference, can maintain balance
under ⟨3, 2⟩ and ⟨5/2, 3/2⟩. We also showed how to produce counterexamples outside
the boundaries of the valid range.
Acknowledgments
The authors would like to thank Taylor Campbell for his bug report that initiated
our research and Eijiro Sumii for discussion and instructive comments on our early
draft. The authors are grateful to anonymous referees for a number of presentation
improvements and a concise title.
References
Adams, S. (1992) Implementing sets efficiently in a functional language, Technical report CSTR
92-10. University of Southampton.
Adams, S. (1993) Efficient sets: A balancing act. J. Funct. Program., 3(4), 553–562.
Adel’son-Vel’skii, G. M. & Landis, E. M. (1962) An algorithm for the organization of
information. Dokl. Akad. Nauk SSSR, 146(2), 263–266.
Bertot, Y. & Casteran, P. (2004) Interactive Theorem Proving and Program Development.
Coq’Art: The Calculus of Inductive Constructions. Springer.
Borchers, B. (1999) CSDP, a C library for semidefinite programming. Optim. Methods Softw.,
11(1), 613–623.
Charguéraud, A. (2010) Program verification through characteristic formulae. In Proceedings
of the 15th International Conference on Functional Programming (ICFP). ACM.
Claessen, K. & Hughes, J. (2000) QuickCheck: A lightweight tool for random testing of Haskell
programs. In Proceedings of the Fifth International Conference on Functional Programming
(ICFP). ACM.
Filliâtre, J.-C. & Letouzey, P. (2004) Functors for proofs and programs. In Programming
Languages and Systems, Schmidt, D. (ed), Lecture Notes in Computer Science, vol. 2986.
Springer, pp. 370–384.
Guibas, L. J. & Sedgewick, R. (1978) A dichromatic framework for balanced trees. In
Proceedings of the 19th Annual Symposium on Foundations of Computer Science (SFCS ’78).
IEEE, pp. 8–21.
Knuth, D. E. (1998) The Art of Computer Programming: Sorting and Searching. 2nd ed., vol. 3.
Addison-Wesley.
Marlow, S., et al. (2010) Haskell 2010 Language Report, Marlow, S. (ed), Available online
http://www.haskell.org/ (May 2011).
Nievergelt, J. & Reingold, E. M. (1972) Binary search trees of bounded balance. In Proceedings
of the Fourth Annual ACM Symposium on Theory of Computing. ACM, pp. 137–142.
Okasaki, C. (1998) Purely Functional Data Structures. Cambridge University Press.
Pugh, W. (1991) The Omega test: A fast and practical integer programming algorithm for
dependence analysis. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing.
ACM.
Roura, S. (2001) A new method for balancing binary search trees. In Automata, Languages and
Programming, Orejas, F., Spirakis, P. & van Leeuwen, J. (eds), Lecture Notes in Computer
Science, vol. 2076. Springer, pp. 469–480.