Symmetric Binary B-Trees
© by Springer-Verlag 1972
Summary. A class of binary trees is described for maintaining ordered sets of data.
Random insertions, deletions, and retrievals of keys can be done in time proportional
to log N, where N is the cardinality of the data set. Symmetric B-trees are a modification
of B-trees described previously by Bayer and McCreight. This class of trees properly
contains the balanced trees.
This paper will describe a further solution to the following well-known problem
in information processing: Organize and maintain an index, i.e. an ordered set
of keys or virtual addresses, used to access the elements in a set of data, in such
a way that random and sequential insertions, deletions, and retrievals can be
performed efficiently.
Other solutions to this problem have been described for a one-level store in
[1, 3-5, 7] and for a two-level store with a pseudo-random access backup store
in [2]. All these techniques use trees to represent the data sets. The class of trees
to be described in this paper is a generalization of the trees described in [1, 3-5],
but it is not comparable with the BB-trees described in [7]. The following technique
is suitable for a one-level store.
Readers familiar with [2] and [3] may recognize the technique as a further
modification of B-trees introduced in [2]. In [3] binary B-trees were considered
as a special case and a modified representation of the B-trees of [2]. Binary
B-trees are derived in a straightforward way from B-trees; they do exhibit,
however, an asymmetry in the sense that the left arcs in a binary B-tree must
be δ-arcs (downward), whereas the right arcs can be either δ-arcs or ρ-arcs
(horizontal). Removing this asymmetry naturally leads to the symmetric binary
B-trees described here.
After this brief digression on the relationship of this paper to earlier work we
will now proceed with a self-contained presentation of symmetric binary B-trees.
Notation. We will use t, u, v, w, x, y, z to denote trees and p, q, r, s to denote
nodes of trees, usually the root nodes. We assume that "nodes", or to be precise
"the values stored at the nodes", are taken from some set K of data elements
or "keys" on which a total order, denoted by <, is defined. Except in very few
* This work was partially supported by an NSF grant while the author was at Purdue
University, Lafayette, Indiana, USA.
Symmetric Binary B-Trees: Data Structure and Maintenance Algorithms 291
cases it is not necessary to distinguish between the nodes of a tree and the keys
stored there; the meaning will be clear from context. "*" is a special symbol used
in describing B-trees. Its presence should convey the intuitive notion of horizontal
arcs (ρ-arcs) to the left and right of a node as opposed to vertical arcs
(δ-arcs).
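The triple notation can be made concrete with a small sketch. The following Python fragment is only an illustration, not the paper's implementation: the class `Node` and the function `show` are hypothetical names, the fields `lbit` and `rbit` are borrowed from the LBIT and RBIT arrays of the ALGOL procedures, and the empty tree is written as `e`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node record; lbit/rbit mark horizontal (rho) pointers."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    lbit: bool = False  # True: the left pointer is a rho-arc
    rbit: bool = False  # True: the right pointer is a rho-arc


def show(t: Optional[Node]) -> str:
    """Render a tree as a triple, with '*' on the side of each rho-arc."""
    if t is None:
        return "e"
    star_left = "*" if t.lbit else ""
    star_right = "*" if t.rbit else ""
    return f"({show(t.left)}, {star_left}{t.key}{star_right}, {show(t.right)})"


# A root 5 with a vertical arc to 3 and a horizontal arc to 8.
example = Node(5, left=Node(3), right=Node(8), rbit=True)
print(show(example))
```

The star printed to the right of the key 5 corresponds to the form (x, r*, w) used in the text.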
h ≤ k ≤ 2h.
The height k of a B-tree is, as we shall see, related to the amount of work
necessary in the worst cases to insert, retrieve, and delete keys in the tree. To
obtain bounds on k we need the following theorem which characterizes $T_{\min}(k)$,
the class of those B-trees of a given height k with the least number of nodes. We
state the theorem and sketch a proof using largely the terminology of the graphical
representation of B-trees.
Theorem 1. i) $T_{\min}(0) = \{\varepsilon\}$; $T_{\min}(1) = T_\delta(1)$.
ii) $t \in T_{\min}(k)$; $k > 1$ iff
a) there is exactly one longest path, say λ.
b) λ ends with a ρ-pointer; in λ δ-pointers and ρ-pointers alternate.
c) t contains no ρ-pointers except those in λ.
Now assume that properties a), b), and c) hold. Let $|t|$ be the number of
nodes in t.
Let $t_{bal}(h)$ be a completely balanced tree of height h, $t_{\min}(h)$ a minimal B-tree
of height h, $t_{abc}(h)$ a B-tree of height h with properties a), b), c). Then from those
properties it is clear that every tree $t_{abc}(h)$ satisfies the following recurrence
relation:
$$|t_{abc}(0)| = 0$$
$$|t_{abc}(1)| = 1.$$
Thus all B-trees with properties a), b), c) and a given height h have the same
number of nodes and therefore must be minimal, q.e.d.
We now solve the above recurrence relations with $|t_{abc}(h)| = |t_{\min}(h)|$. For
even h we get
$$|t_{\min}(h)| = 2^{h/2+1} - 2,$$
and for odd h we get
$$|t_{\min}(h)| = 3 \cdot 2^{(h-1)/2} - 2.$$
This bound is better than the bound obtained for even h. Let t be a B-tree of
height h. Using the worse bound obtained for even h, we obtain as bounds for $|t|$:
$$\log_2(|t| + 1) \le h \le 2\log_2(|t| + 2) - 2.$$
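The bounds can be checked numerically. The sketch below rests on assumptions that are reconstructed rather than quoted: the closed forms $2^{h/2+1} - 2$ (even h) and $3 \cdot 2^{(h-1)/2} - 2$ (odd h) for the minimal B-tree, $2^h - 1$ nodes for a completely balanced tree, and the height bound $\log_2(|t|+1) \le h \le 2\log_2(|t|+2) - 2$. The function names are hypothetical.

```python
import math


def min_nodes(h: int) -> int:
    """Assumed closed form for the node count of a minimal B-tree of height h."""
    if h % 2 == 0:
        return 2 ** (h // 2 + 1) - 2
    return 3 * 2 ** ((h - 1) // 2) - 2


def max_nodes(h: int) -> int:
    """Node count of a completely balanced binary tree of height h."""
    return 2 ** h - 1


def height_bounds(n: int):
    """Assumed lower and upper bound on the height of a B-tree with n nodes."""
    return math.log2(n + 1), 2 * math.log2(n + 2) - 2


# Every height from 1 to 20 must lie within the bounds at both extremes.
for h in range(1, 21):
    for n in (min_nodes(h), max_nodes(h)):
        low, high = height_bounds(n)
        assert low - 1e-9 <= h <= high + 1e-9
```

Under these assumptions the upper bound is attained exactly by the minimal trees of even height, which is why the even case is the one that determines the general bound.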
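$$\beta(t) = \begin{cases}
(\beta(x_{h-1}), {*}r{*}, \beta(y_{h-1})) & \text{if } t = (x_{h-1}, r, y_{h-1}) \\
(\beta(x_{h-2}), r{*}, \beta(y_{h-1})) & \text{if } t = (x_{h-2}, r, y_{h-1}) \\
(\beta(x_{h-1}), {*}r, \beta(y_{h-2})) & \text{if } t = (x_{h-1}, r, y_{h-2}).
\end{cases}$$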
The proof of properties (2) is straightforward by induction on h. q.e.d.
B-trees. On the other hand, these same bounds and the fact that balanced trees
are a proper subclass of B-trees also suggest that less work should be required to
update B-trees than to update balanced trees.
Maintenance Algorithms
We now consider the algorithms for maintaining B-trees if keys are inserted
and deleted randomly. The algorithm to retrieve keys is straightforward and will
not be described here. The following transformations will be needed both in the
insertion and the deletion processes whenever two successive ρ-pointers arise
and must be removed by "splitting".
Splitting Algorithm
The function σ will transform certain trees, which are no longer B-trees,
back into B-trees. σ will be applied only to trees of the form
$$\sigma((u, {*}r, z)) = \sigma(((u_1, {*}r_1, u_3), {*}r, z)) = (u_1, r_1, (u_3, r, z)).$$
Now $u_1 \in T_\delta(h)$; $(u_3, r, z) \in T_\delta(h)$ since $u_3 \in T_{\delta\rho}(h-1)$, $z \in T_{\delta\rho}(h-1)$. Thus
$(u_1, r_1, (u_3, r, z)) \in T_\delta(h+1)$. Furthermore postorder traversal of the resulting tree
yields the keys in the same order as postorder traversal of the old tree. For trees of
the general form $(u, {*}r{*}, y)$ define: $\sigma((u, {*}r{*}, y)) = (u_1, r_1, (u_3, r{*}, y))$. A similar
argument as before shows that this tree is in $T_\delta(h+1)$.
Case 2. u is of the form $(u_1, {*}r_1{*}, u_3)$ or $(u_1, r_1{*}, u_3)$ and $u_3$ is $(u_2, r_3, u_4)$. Then
define
$$\sigma((u, {*}r, z)) = \begin{cases}
((u_1, {*}r_1, u_2), r_3, (u_4, r, z)) & \text{if } u = (u_1, {*}r_1{*}, u_3) \\
((u_1, r_1, u_2), r_3, (u_4, r, z)) & \text{if } u = (u_1, r_1{*}, u_3)
\end{cases}$$
$$\sigma((u, {*}r{*}, y)) = \begin{cases}
((u_1, {*}r_1, u_2), r_3, (u_4, r{*}, y)) & \text{if } u = (u_1, {*}r_1{*}, u_3) \\
((u_1, r_1, u_2), r_3, (u_4, r{*}, y)) & \text{if } u = (u_1, r_1{*}, u_3).
\end{cases}$$
Similarly as in case 1 the resulting trees after applying σ are in $T_\delta(h+1)$ and yield
the keys on postorder traversal in the same order as before.
The definition of σ for (v, *r*, x) and for (w, r*, z) is left-right symmetric
with cases 1 and 2. The details are straightforward and are omitted. They can
also be found in our implementation in the procedures SPLITRR and SPLITRL.
Case 1 is implemented in SPLITLL, case 2 in SPLITLR.
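The two left-hand splitting cases can be sketched on a pointer representation. This Python fragment is an illustration under assumptions, not the bodies of SPLITLL and SPLITLR: the names `Node`, `split_ll`, `split_lr`, `show`, and `keys_in_order` are hypothetical, `lbit`/`rbit` mark ρ-pointers, and only the forms (u_1, *r_1, u_3) for case 1 and (u_1, r_1*, u_3) for case 2 are covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    lbit: bool = False  # True: left pointer is a rho-pointer
    rbit: bool = False  # True: right pointer is a rho-pointer


def split_ll(p: Node) -> Node:
    """Case 1: ((u1, *r1, u3), *r, z) becomes (u1, r1, (u3, r, z))."""
    q = p.left                       # q holds r1
    p.left, p.lbit = q.right, False  # u3 becomes the left subtree of r
    q.right, q.lbit = p, False       # r hangs below r1 by a delta-pointer
    return q


def split_lr(p: Node) -> Node:
    """Case 2: ((u1, r1*, (u2, r3, u4)), *r, z) becomes ((u1, r1, u2), r3, (u4, r, z))."""
    q = p.left                       # q holds r1
    s = q.right                      # s holds r3
    q.right, q.rbit = s.left, False  # u2 moves under r1
    p.left, p.lbit = s.right, False  # u4 moves under r
    s.left, s.right = q, p           # r3 becomes the new root
    return s


def show(t: Optional[Node]) -> str:
    if t is None:
        return "e"
    return f"({show(t.left)}, {'*' if t.lbit else ''}{t.key}{'*' if t.rbit else ''}, {show(t.right)})"


def keys_in_order(t: Optional[Node]) -> List[int]:
    """Symmetric-order traversal: left subtree, key, right subtree."""
    return [] if t is None else keys_in_order(t.left) + [t.key] + keys_in_order(t.right)


ll = Node(30, left=Node(20, left=Node(10), lbit=True), lbit=True)
lr = Node(30, left=Node(10, right=Node(20), rbit=True), lbit=True)
print(show(split_ll(ll)), show(split_lr(lr)))
```

In both cases the middle key rises one level and the order of the keys under symmetric traversal is unchanged; the right-hand cases would mirror these two functions.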
Insertion Algorithm
We will recursively define a function ι which will insert a node s into a B-tree t.
Starting with an empty tree, ι will build a B-tree by repeated insertion of nodes
in such a way that postorder traversal of the tree will yield the keys stored at
the nodes in increasing order. We will not give an explicit proof that ι will build
and maintain B-trees properly, but the following main observation in the definition
of ι will make the construction of an induction proof straightforward.
Denote by ι(q, z) the tree obtained by inserting node q into tree z. Then
$z \in T_\delta(k) \Rightarrow \iota(q, z) \in T_\delta(k) \cup T_\rho(k)$. Furthermore, if $\iota(q, z) \in T_\rho(k)$, then ι(q, z)
is (x, *r, y) or (x, r*, y) but not (x, *r*, y), i.e. the root of ι(q, z) (3)
has exactly one ρ-pointer.
$z \in T_\rho(k) \Rightarrow \iota(q, z) \in T_\rho(k) \cup T_\delta(k+1)$.
The double arrow means "implies that".
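A minimal sketch of the insertion scheme, modeled on the structure of the SYMSERT procedure, may help. Several points are assumptions: the bodies of the four split functions are not taken from the paper, the recursion returns a (node, bit) pair instead of using ALGOL call by name, the shortcut via the label DONE is omitted, a `KeyError` stands in for the label FOUNDX, and `check` is a hypothetical validity test (no two successive ρ-pointers, equal number of δ-pointers on every path).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    lbit: bool = False  # True: left pointer is a rho-pointer
    rbit: bool = False  # True: right pointer is a rho-pointer


def split_ll(p: Node) -> Node:
    q = p.left
    p.left, p.lbit = q.right, False
    q.right, q.lbit = p, False
    return q


def split_lr(p: Node) -> Node:
    q, s = p.left, p.left.right
    q.right, q.rbit = s.left, False
    p.left, p.lbit = s.right, False
    s.left, s.right = q, p
    return s


def split_rr(p: Node) -> Node:
    q = p.right
    p.right, p.rbit = q.left, False
    q.left, q.rbit = p, False
    return q


def split_rl(p: Node) -> Node:
    q, s = p.right, p.right.left
    q.left, q.lbit = s.right, False
    p.right, p.rbit = s.left, False
    s.left, s.right = p, q
    return s


def insert(p: Optional[Node], bit: bool, x: int) -> Tuple[Node, bool]:
    """Insert x below the pointer (p, bit) and return the updated pointer."""
    if p is None:
        return Node(x), True            # a new leaf hangs on a rho-pointer
    if x == p.key:
        raise KeyError(x)               # stands in for goto FOUNDX
    if x < p.key:
        p.left, p.lbit = insert(p.left, p.lbit, x)
        if p.lbit and p.left.lbit:      # two successive rho-pointers, left-left
            return split_ll(p), True
        if p.lbit and p.left.rbit:      # two successive rho-pointers, left-right
            return split_lr(p), True
    else:
        p.right, p.rbit = insert(p.right, p.rbit, x)
        if p.rbit and p.right.rbit:
            return split_rr(p), True
        if p.rbit and p.right.lbit:
            return split_rl(p), True
    return p, bit


def check(p: Optional[Node], parent_rho: bool = False) -> int:
    """Return the delta-height; assert the structural conditions on the way."""
    if p is None:
        return 0
    assert not (parent_rho and (p.lbit or p.rbit)), "two successive rho-pointers"
    hl = check(p.left, p.lbit) + (0 if p.lbit else 1)
    hr = check(p.right, p.rbit) + (0 if p.rbit else 1)
    assert hl == hr, "paths differ in their number of delta-pointers"
    return hl


def keys_in_order(p: Optional[Node]) -> List[int]:
    return [] if p is None else keys_in_order(p.left) + [p.key] + keys_in_order(p.right)


root: Optional[Node] = None
for k in [8, 9, 11, 15, 19, 20, 21, 7, 3, 2, 1, 5, 6, 4, 13, 14, 10, 12, 17, 16, 18]:
    root, _ = insert(root, False, k)
    check(root)
```

Returning the bit alongside the node plays the role of the name parameter BIT: a split or a new leaf reports `True`, so the caller records a ρ-pointer and may itself have to split. The traversal used in `keys_in_order` is symmetric order, which appears to be what the text calls postorder.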
Deletion Algorithm
In this algorithm we must distinguish between a node and the key stored at
a node. To delete a key s from a B-tree t, first locate the node, say n, containing
s in t. Then, if n has a nonempty left (right) subtree, replace s by the next smallest
(next largest) key, say q, in t. q is found easily proceeding from n one step along
the left (right) pointer and then along the right (left) pointer as long as possible.
Now s is no longer in the tree and q is stored at node n. The node m containing q
originally is at the lowest δ-level and will have at least one empty subtree. We
will then delete m and the copy of q stored at m from t. Thus we may assume
without loss of generality that s will always be deleted from the lowest δ-level
in the tree and that s will have at least one empty subtree. In our implementation
the replacement of s by q and the deletion of m from t are merged into a single
algorithm.
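The search for the replacement key q can be sketched separately from the rebalancing. The fragment below is a simplified illustration with hypothetical names (`Node`, `replacement_node`); it ignores the ρ-bits entirely and performs only the walk described above: one step left and then right as long as possible, or the mirror image when only the right subtree is nonempty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def replacement_node(n: Node) -> Optional[Node]:
    """Return the node m whose key replaces the key stored at n.

    Returns None when n has two empty subtrees, in which case n itself
    already has an empty subtree and no replacement is needed.
    """
    if n.left is not None:
        m = n.left                  # one step along the left pointer
        while m.right is not None:  # then right as long as possible
            m = m.right
        return m
    if n.right is not None:
        m = n.right                 # one step along the right pointer
        while m.left is not None:   # then left as long as possible
            m = m.left
        return m
    return None


tree = Node(5, left=Node(3, right=Node(4)), right=Node(8, left=Node(6)))
m = replacement_node(tree)
print(m.key)
```

The node found this way always has at least one empty subtree, which is the property the deletion function relies on.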
We now define recursively a function α which deletes a key s from a B-tree t,
under the assumption that s is stored at a node (at the lowest δ-level) with at
least one empty subtree, and results in a B-tree α(s, t).
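Observe that in the definition of α we will get the following transformations:
$$\alpha(s, t) = \begin{cases}
\varepsilon & \text{if } t \text{ is } (\varepsilon, s, \varepsilon) \\
x & \text{if } t \text{ is } (x, {*}s, \varepsilon) \text{ or } (\varepsilon, s{*}, x) \\
t & \text{if } t \text{ is } (\varepsilon, r, \varepsilon) \text{ and } s \ne r, \text{ or} \\
  & \text{if } t \text{ is } (x, {*}r, \varepsilon) \text{ and } s > r, \text{ or} \\
  & \text{if } t \text{ is } (\varepsilon, r{*}, x) \text{ and } s < r.
\end{cases}$$
In the last three cases the key s to be deleted is not even in the tree and
generally in an implementation some special action will be taken.
This is case A in the implementation.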
ii) Assume that t is of one of the forms (x, r, y), (v, *r, y), (x, r*, w), (v, *r*, w)
and s > r and x, y, v, w ≠ ε.
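Case 1. $y \in T_{\delta\rho}(h)$; $\alpha(s, y) \in T_{\delta\rho}(h)$;
$$\alpha(s, t) = \begin{cases}
(x, r, \alpha(s, y)) & \text{if } t = (x, r, y) \in T_\delta(h+1) \\
(v, {*}r, \alpha(s, y)) & \text{if } t = (v, {*}r, y) \in T_\rho(h+1).
\end{cases}$$
Thus $\alpha(s, t) \in T_{\delta\rho}(h+1)$.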
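Case 2. $w \in T_\delta(h+1)$; $\alpha(s, w) \in T_\delta(h+1)$;
$$\alpha(s, t) = \begin{cases}
(x, r{*}, \alpha(s, w)) & \text{if } t = (x, r{*}, w) \in T_\rho(h+1) \\
(v, {*}r{*}, \alpha(s, w)) & \text{if } t = (v, {*}r{*}, w) \in T_\rho(h+1).
\end{cases}$$
Thus $\alpha(s, t) \in T_\rho(h+1)$.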
In our implementation cases 1 and 2 are taken care of by shortcutting the
recursion of α (goto QUIT) as soon as no further modifications of the tree are
required.
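Case 3a. $y \in T_\delta(h)$; $\alpha(s, y) \in T_{\delta\rho}(h-1)$; $t = (x, r, y) \in T_\delta(h+1)$;
$$\alpha(s, t) = \begin{cases}
(x, {*}r, \alpha(s, y)) & \text{if } x \in T_\delta(h) \\
\sigma((x, {*}r, \alpha(s, y))) & \text{if } x \in T_\rho(h).
\end{cases}$$
Thus $\alpha(s, t) \in T_\delta(h+1) \cup T_\rho(h)$. This is case B in the implementation.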
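Case 3b. $y \in T_\delta(h)$; $\alpha(s, y) \in T_{\delta\rho}(h-1)$; $t = (v, {*}r, y) \in T_\rho(h+1)$ and
$v = (v_1, r_1, v_2) \in T_\delta(h+1)$;
$$\alpha(s, t) = \begin{cases}
(v_1, r_1, (v_2, {*}r, \alpha(s, y))) & \text{if } v_2 \in T_\delta(h) \\
(v_1, r_1{*}, \sigma((v_2, {*}r, \alpha(s, y)))) & \text{if } v_2 \in T_\rho(h).
\end{cases}$$
Thus $\alpha(s, t) \in T_{\delta\rho}(h+1)$. This is case C in the implementation.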
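Case 4. $w \in T_\delta(h+1)$; $\alpha(s, w) \in T_\rho(h)$;
$$\alpha(s, t) = \begin{cases}
(x, r, \alpha(s, w)) & \text{if } t = (x, r{*}, w) \in T_\rho(h+1) \\
(v, {*}r, \alpha(s, w)) & \text{if } t = (v, {*}r{*}, w) \in T_\rho(h+1).
\end{cases}$$
Thus $\alpha(s, t) \in T_{\delta\rho}(h+1)$. This is case F in the implementation.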
Main Result
The work that must be performed for random retrievals, insertions, and dele-
tions is even in the worst cases proportional to the height of the B-tree, i.e. to
$\log_2(|t|)$, where $|t|$ is the number of keys in the tree t.
Generalization
From the insertion and deletion algorithms discussed in this paper, it is quite
clear that the class of binary B-trees could be enlarged by allowing up to n
successive ρ-pointers for n = 2, 3, 4, ... before requiring any modification or
"rebalancing" of the tree. This would require less rebalancing, but performance
proportional in time to $\log(|t|)$ would still be guaranteed.
The insertion (deletion) algorithm has been written as two procedures, a non-recursive
outer procedure SYMINS (SYMDEL) and a recursive inner procedure
SYMSERT (SYMDELETE). The outer procedure SYMINS (SYMDEL) allows
shortcutting the full recursion of SYMSERT (SYMDELETE) via the label
DONE (QUIT). The inner procedure SYMSERT (SYMDELETE) performs
insertions (deletions) in a B-tree recursively.
It is assumed that the six procedures SPLITRR, SPLITRL, SPLITLL,
SPLITLR, SYMINS, and SYMDEL are all declared in the same block or in such
a way that SPLITRR, SPLITRL, SPLITLL, and SPLITLR can be used both
in SYMINS and in SYMDEL.
Note. The tree in Fig. 1 is a suitable tree for testing. Inserting the keys in the
order 8, 9, 11, 15, 19, 20, 21, 7, 3, 2, 1, 5, 6, 4, 13, 14, 10, 12, 17, 16, 18 will build
up the tree. Deleting the keys in the order 1, 6, 2, 21, 16, 20, 8, 14, 11, 9, 5, 10,
12, 13, 3, 4, 7, 15, 17, 18, 19 will exercise all the cases which can arise in any
deletion process.
procedure SYMINS (X, ROOT, ROOTBIT, FOUNDX, LP, RP, KEY, LBIT,
    RBIT, GETQ);
  value X; real X; integer ROOT; Boolean ROOTBIT;
  label FOUNDX; integer array LP, RP; array KEY;
  Boolean array LBIT, RBIT; procedure GETQ;
begin procedure SYMSERT (P, BIT);
  integer P; Boolean BIT;
  comment P AND BIT ARE CALLED BY NAME SO THAT ASSIGNMENTS REACH THE POINTER AND THE RHO-BIT IN THE PARENT NODE;
  if P = 0 then
  begin comment INSERT X AS NEW LEAF;
    GETQ (P); KEY [P] := X; BIT := true;
    LP [P] := RP [P] := 0; LBIT [P] := RBIT [P] := false
  end
  else if X = KEY [P] then goto FOUNDX
  else if X < KEY [P] then
  begin comment INSERT X IN LEFT SUBTREE;
    SYMSERT (LP [P], LBIT [P]);
    if LBIT [P] then begin
      comment SPLIT IF TWO SUCCESSIVE RHO-POINTERS HAVE ARISEN ON THE LEFT;
      if LBIT [LP [P]] then begin SPLITLL (P, LP, RP, LBIT);
        BIT := true end
      else if RBIT [LP [P]] then begin
        SPLITLR (P, LP, RP, LBIT, RBIT); BIT := true end end
    else goto DONE
  end
  else begin comment INSERT X IN RIGHT SUBTREE;
    SYMSERT (RP [P], RBIT [P]);
    if RBIT [P] then begin if RBIT [RP [P]] then
      begin SPLITRR (P, LP, RP, RBIT);
        BIT := true end
      else if LBIT [RP [P]] then begin
        SPLITRL (P, LP, RP, LBIT, RBIT); BIT := true end end
    else goto DONE
  end OF SYMSERT;
  SYMSERT (ROOT, ROOTBIT); DONE:
end OF SYMINS
  AUXX := 0;
  if ROOT = 0 then goto XNOTINTREE else
  SYMDELETE (ROOT);
  QUIT:
end OF SYMDEL
References
1. Adelson-Velskii, G. M., Landis, E. M.: An information organization algorithm.
DAN SSSR 146, 263-266 (1962).
2. Bayer, R., McCreight, E. M.: Organization and maintenance of large ordered
indexes. Acta Informatica 1, 173-189 (1972).
3. Bayer, R.: Binary B-trees for virtual memory. Proceedings of 1971 ACM SIGFIDET
Workshop on Data Description, Access and Control, edited by E. F. Codd and
A. L. Dean, pp. 219-235 (Nov. 11-12, 1971), San Diego.
4. Foster, C. C.: Information storage and retrieval using AVL-trees. Proc. ACM 20th
Nat'l. Conf., pp. 192-205, 1965.
5. Knott, G. D.: A balanced tree structure and retrieval algorithm. Proc. of Symposium
on Information Storage and Retrieval, Univ. of Maryland, April 1-2, 1971,
pp. 175-196.
6. Knuth, D. E.: The art of computer programming, vol. 1. Addison-Wesley, 1969.
7. Nievergelt, J., Reingold, E. M.: Binary search trees of bounded balance. To appear,
Proceedings of 4th ACM SIGACT Conference 1972.