Acta Informatica 1, 290-306 (1972)

© by Springer-Verlag 1972

Symmetric Binary B-Trees:
Data Structure and Maintenance Algorithms*

R. Bayer

Received January 24, 1972

Summary. A class of binary trees is described for maintaining ordered sets of data. Random insertions, deletions, and retrievals of keys can be done in time proportional to log N where N is the cardinality of the data set. Symmetric B-trees are a modification of B-trees described previously by Bayer and McCreight. This class of trees properly contains the balanced trees.

This paper will describe a further solution to the following well-known problem in information processing: Organize and maintain an index, i.e. an ordered set of keys or virtual addresses, used to access the elements in a set of data, in such a way that random and sequential insertions, deletions, and retrievals can be performed efficiently.

Other solutions to this problem have been described for a one-level store in [1, 3-5, 7] and for a two-level store with a pseudo-random access backup store in [2]. All these techniques use trees to represent the data sets. The class of trees to be described in this paper is a generalization of the trees described in [1, 3-5], but it is not comparable with the BB-trees described in [7]. The following technique is suitable for a one-level store.

Readers familiar with [2] and [3] may recognize the technique as a further modification of B-trees introduced in [2]. In [3] binary B-trees were considered as a special case and a modified representation of the B-trees of [2]. Binary B-trees are derived in a straightforward way from B-trees; they do exhibit, however, an asymmetry in the sense that the left arcs in a binary B-tree must be δ-arcs (downward), whereas the right arcs can be either δ-arcs or ρ-arcs (horizontal). Removing this asymmetry naturally leads to the symmetric binary B-trees described here.

After this brief digression on the relationship of this paper to earlier work we will now proceed with a self-contained presentation of symmetric binary B-trees.
Notation. We will use t, u, v, w, x, y, z to denote trees and p, q, r, s to denote nodes of trees, usually the root nodes. We assume that "nodes", or to be precise "the values stored at the nodes", are taken from some set K of data elements or "keys" on which a total order, denoted by <, is defined. Except in very few cases it is not necessary to distinguish between the nodes of a tree and the keys stored there; the meaning will be clear by context. "*" is a special symbol used in describing B-trees. Its presence should convey the intuitive notion of horizontal arcs (ρ-arcs) to the left and right of a node as opposed to vertical arcs (δ-arcs).

* This work was partially supported by an NSF grant while the author was at Purdue University, Lafayette, Indiana, USA.

Definition of Symmetric Binary B-Trees

Symmetric binary B-trees, henceforth simply called B-trees, are defined recursively as follows:
i) Let e be the empty tree and let $\emptyset$ be the empty set. Then define $T_\delta(0) = \emptyset$, $T_\rho(0) = \{e\}$, i.e. the set with the single member e.
ii) For all integers $h > 0$ define

$$T_\delta(h) = \{(x, r, y) \mid x, y \in T_{\delta\rho}(h-1),\ r \in K\}$$

$$T_\rho(h) = \{(x, *r, y) \mid x \in T_\delta(h),\ y \in T_{\delta\rho}(h-1),\ r \in K\}$$
$$\qquad \cup\ \{(x, r*, y) \mid x \in T_{\delta\rho}(h-1),\ y \in T_\delta(h),\ r \in K\}$$
$$\qquad \cup\ \{(x, *r*, y) \mid x, y \in T_\delta(h),\ r \in K\}$$

$$T_{\delta\rho}(h) = T_\delta(h) \cup T_\rho(h).$$

iii) A B-tree is a member of $T_{\delta\rho}(h)$ for some integer $h \ge 0$.

Note. We call x the left subtree, y the right subtree, r the root, and h the δ-level or δ-height of a B-tree in $T_{\delta\rho}(h)$. We also say that the node r is at the δ-level or δ-height h.
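
The recursive definition can be read as a membership test. The following Python fragment is a minimal sketch under assumptions that are not part of the paper: a tree is either None (the empty tree e) or a 5-tuple (left, lbit, key, rbit, right), where lbit and rbit are True exactly when the symbol * stands to the left or right of the key. The name classify is hypothetical.

```python
def classify(t):
    """Return ('delta', h) or ('rho', h) if t is a B-tree of delta-height h,
    or None if t violates the recursive definition."""
    if t is None:                       # e is the single member of T_rho(0)
        return ('rho', 0)
    left, lbit, key, rbit, right = t
    cl, cr = classify(left), classify(right)
    if cl is None or cr is None:
        return None
    (kind_l, h_l), (kind_r, h_r) = cl, cr
    # A rho-pointer must lead to a member of T_delta(h);
    # a delta-pointer may lead to any member of T_delta_rho(h - 1).
    if lbit and kind_l != 'delta':
        return None
    if rbit and kind_r != 'delta':
        return None
    h_from_left = h_l if lbit else h_l + 1
    h_from_right = h_r if rbit else h_r + 1
    if h_from_left != h_from_right:
        return None
    return ('rho' if (lbit or rbit) else 'delta', h_from_left)


leaf = (None, False, 1, False, None)        # (e, 1, e)
pair = (leaf, True, 2, False, None)         # ((e, 1, e), *2, e)
chain = (pair, True, 3, False, None)        # two successive rho-pointers
print(classify(leaf), classify(pair), classify(chain))
```

The sketch checks only the shape of the tree; the ordering of the keys is a separate condition.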
For the purpose of this paper we will use the following list data structure to represent B-trees: Pointers (or arcs) attached to the root of a tree will point to its subtrees, a left pointer to the left subtree and a right pointer to the right subtree. We consider a node and the two attached pointers as a group of physically adjacent data items, also called an "entry". The presence of the special symbol * to the left of r and of * to the right of r shall be represented by left and right ρ-pointers resp., their absence by δ-pointers. In graphical representations of B-trees ρ-pointers appear as horizontal, δ-pointers as downward arcs. In the computer implementation of B-trees, one bit is used to distinguish ρ-pointers (1 bit) from δ-pointers (0 bit). Empty trees and pointers to them are omitted.

A less formal, but intuitively more appealing definition of B-trees using the terminology of [6] and ignoring empty B-trees is the following:
B-trees are directed binary trees with two kinds of arcs (pointers), namely δ-arcs (downward or vertical pointers) and ρ-arcs (horizontal pointers) such that:
i) The paths from the root to every leaf all have the same number of δ-arcs.
ii) All nodes except those at the lowest δ-level have 2 sons.
iii) Some of the arcs may be ρ-arcs, but there may be no successive ρ-arcs.
In addition, the keys shall be stored at the nodes of a B-tree in such a way that postorder traversal [6] of the tree yields the keys in increasing order, where postorder traversal is defined recursively as follows:
1. If the tree is empty, do nothing.
2. Traverse left subtree.
3. Visit root.
4. Traverse right subtree.

Fig. 1 shows a graphical representation of a B-tree. Readers familiar with balanced trees [1, 4, 5] should observe that B-trees are not balanced trees as shown by the B-tree in Fig. 1.

Fig. 1. Example of a symmetric binary B-tree

Number of Nodes and Height of a B-Tree

Let the height k of a B-tree be the maximal number of nodes in any path from the root to a leaf. Note that the height k is larger than the δ-height h whenever the tree has ρ-pointers; in particular for a given tree we have:

$$h \le k \le 2h.$$

The height k of a B-tree is, as we shall see, related to the amount of work necessary in the worst cases to insert, retrieve, and delete keys in the tree. To obtain bounds on k we need the following theorem which characterizes $T_{min}(k)$, the class of those B-trees of a given height k with the least number of nodes. We state the theorem and sketch a proof using largely the terminology of the graphical representation of B-trees.

Theorem 1. i) $T_{min}(0) = \{e\}$; $T_{min}(1) = T_\delta(1)$.
ii) $t \in T_{min}(k)$; $k > 1$ iff
a) there is exactly one longest path, say λ.
b) λ ends with a ρ-pointer; in λ, δ-pointers and ρ-pointers alternate.
c) t contains no ρ-pointers except those in λ.
To prove the theorem we need the following lemmas:

Lemma 1. Every longest path in $t \in T_{min}(k)$, $k > 1$, ends with a ρ-pointer.

Proof. If there is a longest path λ without this property, then attach a new node via a ρ-pointer at the end of λ, delete from t the root node and the subtree of the root not containing λ. This results in a B-tree t' of the same height as t, but with fewer nodes, a contradiction.

Lemma 2. If $t \in T_{min}(k)$, $k > 1$, and t has a longest path λ, then every ρ-pointer in t is in λ.

Proof. Assume there is a ρ-pointer p not in λ. p points to some subtree $u \in T_\delta(m)$ for some m. Let u be of the form $(u_1, r_1, u_2)$. Replace u by $u_1$, and p by a δ-pointer to $u_1$. This results in a B-tree t' of height k, but with fewer nodes than t, a contradiction.

Lemma 3. If λ is a longest path in $t \in T_{min}(k)$, $k > 1$, then λ does not contain two or more successive δ-pointers.

Proof. Case 1. λ contains 3 successive δ-pointers, say $p_1, p_2, p_3$ pointing to $t_1 \in T_\delta(m)$, $t_2 \in T_\delta(m-1)$, $t_3 \in T_{\delta\rho}(m-2)$, respectively. Modify t as follows to obtain a B-tree t' of height k but with fewer nodes than t: Delete all nodes in t at δ-level 1 except those nodes in $t_2$. Change $p_2$ into a ρ-pointer.

Case 2. Case 1 does not apply, but λ has two successive δ-pointers $p_2, p_3$ and maybe a ρ-pointer $p_1$ preceding $p_2$. Several specific cases arise, but in all cases modify t as follows to obtain a B-tree t' of height k but with fewer nodes than t: Change $p_2$ into a ρ-pointer, $p_1$ into a δ-pointer. Delete one node and one of its subtrees. We illustrate just one case.

[Figure: the tree before and after the described modification]

Following [7] we indicate nodes by circles and trees by triangles.

Proof of Theorem. i) Obvious.
ii) Assume that $t \in T_{min}(k)$; $k > 1$. Then
a) follows from Lemmas 1 and 2,
b) follows from Lemmas 1 and 3,
c) follows from Lemma 2.
Now assume that properties a), b), and c) hold. Let $|t|$ be the number of nodes in t.

Let $t_{bal}(h)$ be a completely balanced tree of height h, $t_{min}(h)$ a minimal B-tree of height h, $t_{abc}(h)$ a B-tree of height h with properties a), b), c). Then from those properties it is clear that every tree $t_{abc}(h)$ satisfies the following recurrence relation:

$$|t_{abc}(h)| = 1 + |t_{bal}(h/2 - 1)| + |t_{abc}(h-1)|; \quad h \text{ even}$$

$$|t_{abc}(h)| = 1 + |t_{bal}((h-1)/2)| + |t_{abc}(h-1)|; \quad h \text{ odd}$$

$$|t_{abc}(0)| = 0$$

$$|t_{abc}(1)| = 1.$$

Thus all B-trees with properties a), b), c) and a given height h have the same number of nodes and therefore must be minimal, q.e.d.

We now solve the above recurrence relations with $|t_{abc}(h)| = |t_{min}(h)|$, using $|t_{bal}(j)| = 2^j - 1$. For even h we get

$$|t_{min}(h)| = 2 + 2 \cdot (2^{h/2-1} - 1) + |t_{min}(h-2)|$$
$$= 2^{h/2} + |t_{min}(h-2)|$$
$$= 2^{h/2} + 2^{h/2-1} + \dots + 2^1 + 0$$
$$= 2^{h/2+1} - 2.$$

Similarly one obtains for odd h:

$$|t_{min}(h)| = 3 \cdot 2^{(h-1)/2} - 2.$$

This bound is better than the bound obtained for even h. Let t be a B-tree of height h. Using the worse bound obtained for even h, we obtain as bounds for $|t|$:

$$2^{h/2+1} - 2 \le |t| \le 2^h - 1.$$

Taking logarithms we obtain:

$$\frac{h}{2} + 1 \le \log_2(|t| + 2)$$

$$\log_2(|t| + 1) \le h$$

and consequently as bounds for the height h of a B-tree t with $|t|$ nodes:

$$\log_2(|t| + 1) \le h \le 2 \log_2(|t| + 2) - 2. \qquad (1)$$
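
The recurrence and its closed-form solution can be cross-checked numerically. The following Python fragment is only a sketch; the function names bal, t_min, and closed_form are hypothetical and do not appear in the paper.

```python
import math


def bal(j):
    """Number of nodes in a completely balanced binary tree of height j."""
    return 2 ** j - 1


def t_min(h):
    """Node count of a minimal B-tree of height h, taken from the recurrence."""
    if h <= 1:
        return h
    if h % 2 == 0:
        return 1 + bal(h // 2 - 1) + t_min(h - 1)
    return 1 + bal((h - 1) // 2) + t_min(h - 1)


def closed_form(h):
    """Closed-form solution for even and odd h."""
    if h % 2 == 0:
        return 2 ** (h // 2 + 1) - 2
    return 3 * 2 ** ((h - 1) // 2) - 2


for h in range(0, 11):
    n = t_min(h)
    assert n == closed_form(h)
    # Upper height bound, checked at the sparsest tree of each height.
    assert h <= 2 * math.log2(n + 2) - 2 + 1e-9
    print(h, n)
```

For even h the sparsest tree meets the upper bound on the height with equality, which is why the even case is the worse of the two.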



B-Trees and Balanced Trees

We wish to compare the class of balanced trees with the class of B-trees. The use of the special symbol *, i.e. the distinction between horizontal and vertical pointers, in the definition of B-trees makes B-trees strictly speaking different objects than binary trees. Ignoring that distinction, however, we can consider a B-tree as a binary tree since each node has 0, 1, or 2 sons. Given a binary tree t and a B-tree u we want to consider t and u as essentially the same trees if u has the same structure as t except for the use of ρ-pointers and δ-pointers. The notion of "similarity" makes this idea precise.

Definition. Let t be a binary tree and let u be a B-tree. Following [6] we define the binary relation $\simeq$ of similarity between t and u recursively as follows:
$t \simeq u$ if $t = e$ and $u = e$, or
if $t = (x, r, y)$ and [u is $(v, s, w)$ or $(v, *s, w)$ or $(v, s*, w)$ or $(v, *s*, w)$] and $x \simeq v$ and $r = s$ and $y \simeq w$.
$t \not\simeq u$ otherwise.
The following theorem is a precise formulation of the statement that each balanced tree can also be considered as a B-tree. We will see that the converse does not hold.

Theorem. Let t be a balanced tree, then there is a B-tree u such that $t \simeq u$.

Proof. We recursively define a function β mapping balanced trees to B-trees such that $t \simeq \beta(t)$. From the definition of β it will be easy to see that:
i) if t is a balanced tree of odd height h, then $\beta(t) \in T_\delta\left(\frac{h+1}{2}\right)$,
ii) if t is of even height h, then $\beta(t) \in T_\rho\left(\frac{h}{2}\right)$. Thus $\beta(t) \in T_{\delta\rho}\left(\lceil h/2 \rceil\right)$ always,     (2)
iii) $t \simeq \beta(t)$.
The function β will not change the structure of t; β only decides which pointers of t should be ρ-pointers or δ-pointers in considering t as a B-tree.

Definition of β. Denote by $x_{h-1}$, $y_{h-2}$ etc. balanced trees of height h-1, h-2 etc. resp.

Case 1. $\beta(t) = e$ if $t = e$; remember that $e \in T_\rho(0)$.

Case 2. t is a balanced tree of odd height h.

$$\beta(t) = \begin{cases}
(\beta(x_{h-1}), r, \beta(y_{h-1})) & \text{if } t = (x_{h-1}, r, y_{h-1});\ h \ge 1 \\
(\beta(x_{h-2}), r, \beta(y_{h-1})) & \text{if } t = (x_{h-2}, r, y_{h-1});\ h \ge 3 \\
(\beta(x_{h-1}), r, \beta(y_{h-2})) & \text{if } t = (x_{h-1}, r, y_{h-2});\ h \ge 3.
\end{cases}$$

Case 3. t is a balanced tree of even height $h \ge 2$.

$$\beta(t) = \begin{cases}
(\beta(x_{h-1}), *r*, \beta(y_{h-1})) & \text{if } t = (x_{h-1}, r, y_{h-1}) \\
(\beta(x_{h-2}), r*, \beta(y_{h-1})) & \text{if } t = (x_{h-2}, r, y_{h-1}) \\
(\beta(x_{h-1}), *r, \beta(y_{h-2})) & \text{if } t = (x_{h-1}, r, y_{h-2}).
\end{cases}$$

The proof of properties (2) is straightforward by induction on h. q.e.d.
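
The mapping β can be sketched in a few lines of Python. The representation is an assumption made for illustration: a balanced tree is None or a triple (left, key, right), and a B-tree is None or a 5-tuple (left, lbit, key, rbit, right) whose bits mark ρ-pointers. The helpers fib_tree, forget_bits, and delta_height are hypothetical and serve only to exercise the sketch; recomputing the height at every node is inefficient but keeps the code close to the case analysis.

```python
def height(t):
    """Height of a binary tree given as None or (left, key, right)."""
    return 0 if t is None else 1 + max(height(t[0]), height(t[2]))


def beta(t):
    """Decide for every pointer of a balanced tree whether it is rho or delta."""
    if t is None:
        return None
    x, r, y = t
    h = height(t)
    if h % 2 == 1:                  # odd height: both pointers are delta
        lbit = rbit = False
    else:                           # even height: rho toward subtrees of height h-1
        lbit = height(x) == h - 1
        rbit = height(y) == h - 1
    return (beta(x), lbit, r, rbit, beta(y))


def forget_bits(u):
    """Drop the rho/delta distinction again (the similarity relation)."""
    return None if u is None else (forget_bits(u[0]), u[2], forget_bits(u[4]))


def delta_height(u):
    """delta-height of u; the assertions fail if u is not a B-tree."""
    if u is None:
        return 0
    left, lbit, _, rbit, right = u
    hl, hr = delta_height(left), delta_height(right)
    # a rho-pointer must not lead to e or to a node with a rho-pointer
    assert not (lbit and (left is None or left[1] or left[3]))
    assert not (rbit and (right is None or right[1] or right[3]))
    h_left = hl if lbit else hl + 1
    h_right = hr if rbit else hr + 1
    assert h_left == h_right
    return h_left


def fib_tree(h, counter=None):
    """Sparsest balanced tree of height h, keys numbered in increasing order."""
    counter = [0] if counter is None else counter
    if h <= 0:
        return None
    left = fib_tree(h - 1, counter)
    counter[0] += 1
    key = counter[0]
    return (left, key, fib_tree(h - 2, counter))


t = fib_tree(5)
u = beta(t)
print(height(t), delta_height(u))
```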
This theorem says that essentially, i.e. up to similarity, the class of balanced trees is a subclass of the class of B-trees. That it is a proper subclass can easily be seen from the fact that there is no balanced tree similar to the B-tree in Fig. 1. Fig. 3 shows a B-tree obtained from the balanced tree in Fig. 2 according to the function β.

The upper bound on the height of a B-tree obtained in (1) is approximately $2 \log_2(|t|)$ instead of $1.5 \log_2(|t|)$ for the height of a balanced tree [4]. This means that the upper bound for the retrieval time is better for balanced trees than for B-trees. On the other hand, these same bounds and the fact that balanced trees are a proper subclass of B-trees also suggest that less work should be required to update B-trees than to update balanced trees.

Fig. 2. A balanced tree of height 5

Fig. 3. The balanced tree of Fig. 2 considered as a B-tree of δ-height 3

Maintenance Algorithms

We now consider the algorithms for maintaining B-trees if keys are inserted and deleted randomly. The algorithm to retrieve keys is straightforward and will not be described here. The following transformations will be needed both in the insertion and the deletion processes whenever two successive ρ-pointers arise and must be removed by "splitting".

Splitting Algorithm

The function σ will transform certain trees, which are no longer B-trees, back into B-trees. σ will be applied only to trees of the form

$$(u, *r, z),\quad (u, *r*, y),\quad (v, *r*, x),\quad (w, r*, x)$$

where

$$u, x \in T_\rho(h)$$
$$v, y \in T_\delta(h)$$
$$w, z \in T_{\delta\rho}(h-1).$$

σ applied to any one of those trees will result in a tree in the class $T_\delta(h+1)$. Intuitively σ will transform trees to remove successive ρ-pointers, but it will raise the δ-level of the tree by one. Furthermore, if t yields increasing keys on postorder traversal, then σ(t) will.

Case 1. Let u be of the form $(u_1, *r_1, u_2)$. Then define:

$$\sigma((u, *r, z)) = \sigma(((u_1, *r_1, u_2), *r, z)) = (u_1, r_1, (u_2, r, z)).$$

Now $u_1 \in T_\delta(h)$; $(u_2, r, z) \in T_\delta(h)$ since $u_2 \in T_{\delta\rho}(h-1)$, $z \in T_{\delta\rho}(h-1)$. Thus $(u_1, r_1, (u_2, r, z)) \in T_\delta(h+1)$. Furthermore postorder traversal of the resulting tree yields the keys in the same order as postorder traversal of the old tree. For trees of the general form $(u, *r*, y)$ define: $\sigma((u, *r*, y)) = (u_1, r_1, (u_2, r*, y))$. A similar argument as before shows that this tree is in $T_\delta(h+1)$.

Case 2. u is of the form $(u_1, *r_1*, u_2)$ or $(u_1, r_1*, u_2)$ and $u_2$ is $(u_3, r_2, u_4)$. Then define

$$\sigma((u, *r, z)) = \begin{cases}
((u_1, *r_1, u_3), r_2, (u_4, r, z)) & \text{if } u = (u_1, *r_1*, u_2) \\
((u_1, r_1, u_3), r_2, (u_4, r, z)) & \text{if } u = (u_1, r_1*, u_2)
\end{cases}$$

$$\sigma((u, *r*, y)) = \begin{cases}
((u_1, *r_1, u_3), r_2, (u_4, r*, y)) & \text{if } u = (u_1, *r_1*, u_2) \\
((u_1, r_1, u_3), r_2, (u_4, r*, y)) & \text{if } u = (u_1, r_1*, u_2).
\end{cases}$$

Similarly as in case 1 the resulting trees after applying σ are in $T_\delta(h+1)$ and yield the keys on postorder traversal in the same order as before.

The definition of σ for $(v, *r*, x)$ and for $(w, r*, x)$ is left-right symmetric with cases 1 and 2. The details are straightforward and are omitted. They can also be found in our implementation in the procedures SPLITRR and SPLITRL. Case 1 is implemented in SPLITLL, case 2 in SPLITLR.
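
The four cases of σ can be written down directly on an immutable tree representation. The following Python fragment is a sketch under assumptions that are not part of the paper: a tree is None or a 5-tuple (left, lbit, key, rbit, right), with a bit set to True where the paper writes the symbol *. The function name sigma and the tuple layout are chosen for illustration only; the paper's own implementation works in place on arrays.

```python
def sigma(t):
    """Remove two successive rho-pointers at the root of t by a rotation.
    The key order of t is preserved; t is returned unchanged if there is
    nothing to split."""
    left, lbit, r, rbit, right = t
    if lbit and left is not None and (left[1] or left[3]):
        u1, l1, r1, rb1, u2 = left
        if rb1:                                  # case 2 (SPLITLR in the paper)
            u3, _, r2, _, u4 = u2
            return ((u1, l1, r1, False, u3), False, r2, False,
                    (u4, False, r, rbit, right))
        # case 1 (SPLITLL): u = (u1, *r1, u2)
        return (u1, False, r1, False, (u2, False, r, rbit, right))
    if rbit and right is not None and (right[1] or right[3]):
        x1, xl, r1, xr, x2 = right
        if xl:                                   # mirror of case 2 (SPLITRL)
            x3, _, r2, _, x4 = x1
            return ((left, lbit, r, False, x3), False, r2, False,
                    (x4, False, r1, xr, x2))
        # mirror of case 1 (SPLITRR): x = (x1, r1*, x2)
        return ((left, lbit, r, False, x1), False, r1, False, x2)
    return t


def leaf(k):
    return (None, False, k, False, None)


# Case 1: ((e, 1, e), *2, e) hangs on a left rho-pointer of the root 3.
print(sigma(((leaf(1), True, 2, False, None), True, 3, False, None)))
```

In all four smallest examples the result is the same three-node tree with root 2 and two δ-pointers, which illustrates how σ raises the δ-level by one.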

Insertion Algorithm

We will recursively define a function ι which will insert a node s into a B-tree t. Starting with an empty tree, ι will build a B-tree by repeated insertion of nodes in such a way that postorder traversal of the tree will yield the keys stored at the nodes in increasing order. We will not give an explicit proof that ι will build and maintain B-trees properly, but the following main observation in the definition of ι will make the construction of an induction proof straightforward.

Denote by $\iota(q, z)$ the tree obtained by inserting node q into tree z. Then

$z \in T_\delta(k) \Rightarrow \iota(q, z) \in T_\delta(k) \cup T_\rho(k)$. Furthermore, if $\iota(q, z) \in T_\rho(k)$, then $\iota(q, z)$ is $(x, *r, y)$ or $(x, r*, y)$ but not $(x, *r*, y)$, i.e. the root of $\iota(q, z)$ has exactly one ρ-pointer.     (3)

$z \in T_\rho(k) \Rightarrow \iota(q, z) \in T_\rho(k) \cup T_\delta(k+1)$.

The double arrow means "implies that".

Definition of ι. To insert a node s into the B-tree t:

i) if $t = e$ then $\iota(s, t) = (e, s, e)$. Observe that $t \in T_\rho(0)$, $\iota(s, t) \in T_\delta(1)$.

ii) t has one of the forms $(x, r, y)$, $(v, *r, y)$, $(x, r*, w)$, $(v, *r*, w)$ and s is equal to r. Let $\iota(s, t) = t$. This means that s is already in the tree. In an implementation of ι some special action may then be taken.

iii) t has one of the forms as in ii) and $s < r$. Assume $t \in T_{\delta\rho}(h)$, thus $x, y \in T_{\delta\rho}(h-1)$ and $v, w \in T_\delta(h)$.

Case 1. t is $(x, r, y)$ or $(x, r*, w)$. Then $x \in T_{\delta\rho}(h-1)$ by the definition of B-trees. Insert s into x, i.e. construct $\iota(s, x)$ and proceed according to one of the following two cases.

Case 1a. $\iota(s, x) \in T_{\delta\rho}(h-1)$. Then define

$$\iota(s, t) = \begin{cases}
(\iota(s, x), r, y) & \text{if } t \text{ is } (x, r, y), \\
(\iota(s, x), r*, w) & \text{if } t \text{ is } (x, r*, w).
\end{cases}$$

This means that we simply insert s into the left subtree x, but do not change t further. Note that $\iota(s, t) \in T_{\delta\rho}(h)$.

Case 1b. $\iota(s, x) \in T_\delta(h)$. Then define

$$\iota(s, t) = \begin{cases}
(\iota(s, x), *r, y) & \text{if } t \text{ is } (x, r, y), \\
(\iota(s, x), *r*, w) & \text{if } t \text{ is } (x, r*, w).
\end{cases}$$

This means that we insert s into the left subtree x, but since this increases the δ-height of the left subtree by 1, we also have to change the left pointer from r to become a ρ-pointer. Note that $\iota(s, t) \in T_\rho(h)$ and it has one of the forms described in (3).

Case 2. t is $(v, *r, y)$ or $(v, *r*, w)$, i.e. $t \in T_\rho(h)$. Thus $v \in T_\delta(h)$ by the definition of a B-tree. Insert s into v, i.e. construct $\iota(s, v)$ and proceed according to one of the following two cases:

Case 2a. $\iota(s, v) \in T_\delta(h)$. Then define

$$\iota(s, t) = \begin{cases}
(\iota(s, v), *r, y) & \text{if } t \text{ is } (v, *r, y), \\
(\iota(s, v), *r*, w) & \text{if } t \text{ is } (v, *r*, w).
\end{cases}$$

This means that we simply insert s into the left subtree, but we do not change t further. Note that $\iota(s, t) \in T_\rho(h)$.

Case 2b. $\iota(s, v) \in T_\rho(h)$. Now $(\iota(s, v), *r, y)$ or $(\iota(s, v), *r*, w)$ are no longer B-trees, but $\sigma((\iota(s, v), *r, y))$ and $\sigma((\iota(s, v), *r*, w))$ are B-trees in $T_\delta(h+1)$. We define

$$\iota(s, t) = \begin{cases}
\sigma((\iota(s, v), *r, y)) & \text{if } t = (v, *r, y), \\
\sigma((\iota(s, v), *r*, w)) & \text{if } t = (v, *r*, w).
\end{cases}$$

iv) t has one of the forms as in ii) and $r < s$. The definition of ι is left-right symmetric to case iii). The details are omitted.

It is crucial to observe that the depth of recursion of ι is limited by the height of the tree. Now one can represent B-trees in such a way, as e.g. in our implementation, that the work for the transformation performed by ι (and by σ) on a B-tree is at each level bounded by a constant. Then the total amount of work required for the insertion of a single node into a tree t is at worst proportional to the height of the tree, i.e. to $2 \log_2(|t| + 2) - 2$.
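
The insertion function can be sketched with mutable nodes. The following Python fragment is an illustration, not the paper's implementation: the class Node, the direction index d (0 for left, 1 for right), and all function names are assumptions. Where the ALGOL procedures use call-by-name parameters and a jump to a label, the sketch returns a triple (node, bit, done) instead.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.child = [None, None]    # [left, right]
        self.rho = [False, False]    # rho[d] is True if child[d] hangs on a rho-pointer


def split_outer(p, d):
    """Single rotation: case 1 of the splitting function (d = 0) or its mirror."""
    a = p.child[d]
    a.rho[d] = False
    p.child[d], p.rho[d] = a.child[1 - d], False
    a.child[1 - d] = p
    return a


def split_inner(p, d):
    """Double rotation: case 2 of the splitting function (d = 0) or its mirror."""
    c = p.child[d]
    a = c.child[1 - d]
    c.child[1 - d], c.rho[1 - d] = a.child[d], False
    a.child[d] = c
    p.child[d], p.rho[d] = a.child[1 - d], False
    a.child[1 - d] = p
    return a


def _insert(p, bit, key):
    """Insert key below the pointer (p, bit); return the updated (p, bit, done)."""
    if p is None:                            # new leaf, hung on a rho-pointer
        return Node(key), True, False
    if key == p.key:                         # key already present
        return p, bit, True
    d = 0 if key < p.key else 1
    p.child[d], p.rho[d], done = _insert(p.child[d], p.rho[d], key)
    if done or not p.rho[d]:                 # nothing changes further up
        return p, bit, True
    c = p.child[d]
    if c.rho[d]:                             # two successive rho-pointers
        return split_outer(p, d), True, False
    if c.rho[1 - d]:
        return split_inner(p, d), True, False
    return p, bit, False


def insert(root, key):
    root, _, _ = _insert(root, False, key)
    return root


def keys(p):
    """Keys in symmetric (left, root, right) order."""
    return [] if p is None else keys(p.child[0]) + [p.key] + keys(p.child[1])


def delta_height(p):
    """delta-height of p; the assertions fail if p is not a B-tree."""
    if p is None:
        return 0
    hs = []
    for d in (0, 1):
        c = p.child[d]
        h = delta_height(c)
        if p.rho[d]:
            assert c is not None and not any(c.rho), "successive rho-pointers"
            hs.append(h)
        else:
            hs.append(h + 1)
    assert hs[0] == hs[1], "unequal delta-heights"
    return hs[0]


root = None
for k in [8, 9, 11, 15, 19, 20, 21, 7, 3, 2, 1, 5, 6, 4]:
    root = insert(root, k)
print(keys(root), delta_height(root))
```

Returning done = True plays the role of leaving the recursion early once the δ-height of a subtree is known to be unchanged, so the work per level stays bounded by a constant.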

Deletion Algorithm

In this algorithm we must distinguish between a node and the key stored at a node. To delete a key s from a B-tree t, first locate the node, say n, containing s in t. Then, if n has a nonempty left (right) subtree, replace s by the next smallest (next largest) key, say q, in t. q is found easily by proceeding from n one step along the left (right) pointer and then along the right (left) pointer as long as possible. Now s is no longer in the tree and q is stored at node n. The node m containing q originally is at the lowest δ-level and will have at least one empty subtree. We will then delete m and the copy of q stored at m from t. Thus we may assume without loss of generality that s will always be deleted from the lowest δ-level in the tree and that s will have at least one empty subtree. In our implementation the replacement of s by q and the deletion of m from t are merged into a single algorithm.
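
The search for the replacement key q can be sketched as follows. This Python fragment is only an illustration: it assumes a tree given as None or a triple (left, key, right), leaves out the ρ/δ bits because they play no role in this step, and uses the hypothetical name replacement_key.

```python
def replacement_key(n):
    """Key q that takes the place of the key stored at node n, or None if
    n has two empty subtrees and can be removed directly."""
    left, _, right = n
    if left is not None:             # next smallest key: left once, then right
        m = left
        while m[2] is not None:
            m = m[2]
        return m[1]
    if right is not None:            # next largest key: right once, then left
        m = right
        while m[0] is not None:
            m = m[0]
        return m[1]
    return None


def leaf(k):
    return (None, k, None)


t = ((leaf(1), 2, leaf(3)), 4, (leaf(5), 6, None))
print(replacement_key(t))
```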

We now define recursively a function α which deletes a key s from a B-tree t, under the assumption that s is stored at a node (at the lowest δ-level) with at least one empty subtree, and results in a B-tree $\alpha(s, t)$.

Observe that in the definition of α we will get the following transformations:

$y \in T_\rho(k) \Rightarrow \alpha(s, y) \in T_\delta(k) \cup T_\rho(k)$

$y \in T_\delta(k) \Rightarrow \alpha(s, y) \in T_\delta(k)$ or $\alpha(s, y) \in T_\rho(k-1)$ and is of one of the forms $e$, $(y_1, *r_1, y_2)$, $(y_1, r_1*, y_2)$, but not $(y_1, *r_1*, y_2)$.     (4)
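
Definition of α. i) $t \in T_{\delta\rho}(1)$ and t has at least one empty subtree.

$$\alpha(s, t) = \begin{cases}
e & \text{if } t \text{ is } (e, s, e) \\
x & \text{if } t \text{ is } (x, *s, e) \text{ or } (e, s*, x) \\
t & \text{if } t \text{ is } (e, r, e) \text{ and } s \ne r, \text{ or} \\
  & \text{if } t \text{ is } (x, *r, e) \text{ and } s > r, \text{ or} \\
  & \text{if } t \text{ is } (e, r*, x) \text{ and } s < r.
\end{cases}$$

In the last three cases the key s to be deleted is not even in the tree and generally in an implementation some special action will be taken. This is case A in the implementation.

ii) Assume that t is of one of the forms $(x, r, y)$, $(v, *r, y)$, $(x, r*, w)$, $(v, *r*, w)$ and $s > r$ and $x, y, v, w \ne e$.

Case 1. $y \in T_{\delta\rho}(h)$; $\alpha(s, y) \in T_{\delta\rho}(h)$;

$$\alpha(s, t) = \begin{cases}
(x, r, \alpha(s, y)) & \text{if } t = (x, r, y) \in T_\delta(h+1) \\
(v, *r, \alpha(s, y)) & \text{if } t = (v, *r, y) \in T_\rho(h+1).
\end{cases}$$

Thus $\alpha(s, t) \in T_{\delta\rho}(h+1)$.

Case 2. $w \in T_\delta(h+1)$; $\alpha(s, w) \in T_\delta(h+1)$;

$$\alpha(s, t) = \begin{cases}
(x, r*, \alpha(s, w)) & \text{if } t = (x, r*, w) \in T_\rho(h+1) \\
(v, *r*, \alpha(s, w)) & \text{if } t = (v, *r*, w) \in T_\rho(h+1).
\end{cases}$$

Thus $\alpha(s, t) \in T_\rho(h+1)$.

In our implementation cases 1 and 2 are taken care of by shortcutting the recursion of α (goto QUIT) as soon as no further modifications of the tree are required.

Case 3a. $y \in T_\delta(h)$; $\alpha(s, y) \in T_\rho(h-1)$; $t = (x, r, y) \in T_\delta(h+1)$;

$$\alpha(s, t) = \begin{cases}
(x, *r, \alpha(s, y)) & \text{if } x \in T_\delta(h) \\
\sigma((x, *r, \alpha(s, y))) & \text{if } x \in T_\rho(h).
\end{cases}$$

Thus $\alpha(s, t) \in T_\delta(h+1) \cup T_\rho(h)$. This is case B in the implementation.

Case 3b. $y \in T_\delta(h)$; $\alpha(s, y) \in T_\rho(h-1)$; $t = (v, *r, y) \in T_\rho(h+1)$ and $v = (v_1, r_1, v_2) \in T_\delta(h+1)$;

$$\alpha(s, t) = \begin{cases}
(v_1, r_1, (v_2, *r, \alpha(s, y))) & \text{if } v_2 \in T_\delta(h) \\
(v_1, r_1*, \sigma((v_2, *r, \alpha(s, y)))) & \text{if } v_2 \in T_\rho(h).
\end{cases}$$

Thus $\alpha(s, t) \in T_{\delta\rho}(h+1)$. This is case C in the implementation.

Case 4. $w \in T_\delta(h+1)$; $\alpha(s, w) \in T_\rho(h)$;

$$\alpha(s, t) = \begin{cases}
(x, r, \alpha(s, w)) & \text{if } t = (x, r*, w) \in T_\rho(h+1) \\
(v, *r, \alpha(s, w)) & \text{if } t = (v, *r*, w) \in T_\rho(h+1).
\end{cases}$$

Thus $\alpha(s, t) \in T_{\delta\rho}(h+1)$. This is case F in the implementation.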

iii) If s is smaller than r then the definition of α is left-right symmetric with ii). The details are straightforward and are omitted.

Note that the depth of recursion necessary for α is limited by the height of the tree. Similarly as in the insertion process the total amount of work required for the deletion of a single key is at worst proportional to the height of the tree.

Main Result

The work that must be performed for random retrievals, insertions, and deletions is even in the worst cases proportional to the height of the B-tree, i.e. to $\log_2(|t|)$ where $|t|$ is the number of keys in the tree t.

Generalization

From the insertion and deletion algorithms discussed in this paper, it is quite clear that the class of binary B-trees could be enlarged by allowing up to n successive ρ-pointers for n = 2, 3, 4, ... before requiring any modification or "rebalancing" of the tree. This would require less rebalancing, but performance proportional in time to $\log(|t|)$ would still be guaranteed.

Implementation of Insertion and Deletion Algorithms for B-Trees

For the ALGOL 60 implementation to be considered here a node in a B-tree shall consist of five fields, namely:
LBIT: a Boolean variable to indicate that the left arc is a ρ-arc (true) or a δ-arc (false)
LP: the left pointer, downward or horizontal, an integer
KEY: the key in the node, a real
RP: the right pointer, downward or horizontal, also an integer
RBIT: a Boolean variable to indicate that the right pointer is a ρ-arc (true) or a δ-arc (false)

The absence of a pointer shall be represented by the value 0. Thus the insertion and deletion procedures have array parameters LBIT, LP, KEY, RP, RBIT to store the nodes of the tree. The parameter X is the key to be inserted into or deleted from the tree to whose root the parameter ROOT is pointing (ROOT = 0 for an empty tree). The Boolean ROOTBIT indicates ROOT as a ρ-arc or as a δ-arc. There are two procedure parameters to maintain a list of free nodes, namely ADDQ for the deletion procedure to enter a freed node into the free list, and GETQ for the insertion procedure to obtain a free node from the free list. Both ADDQ and GETQ have one integer parameter pointing to the node added to or obtained from the free list. If the key to be inserted is already in the tree, control will be transferred to the label parameter FOUNDX. If the key to be deleted is not in the tree, control will be transferred to the label parameter XNOTINTREE by the deletion procedure.

The parameter P in SYMSERT and SYMDELETE is the pointer to the root of the subtree in which the insertion or deletion must be performed. The parameter BIT in SYMSERT indicates whether P is a ρ-arc or a δ-arc.

The four procedures SPLITRR, SPLITRL, SPLITLL, and SPLITLR modify the B-tree in order to remove successive ρ-pointers. They are used both in the insertion procedure SYMINS and in the deletion procedure SYMDEL, and are the implementation of the function σ.

Other local quantities in the procedures are:
AUXP: an auxiliary integer variable used as a temporary store for pointers.
DONE: a label to which control is transferred after completing an insertion in order to shortcut the full recursion of SYMSERT.
AUXX: an auxiliary integer variable pointing to the key X after it has been found in the tree. AUXX = 0 otherwise.
QUIT: a label to which control is transferred after completing the deletion of the key in order to shortcut the full recursion of SYMDELETE.
AUXD: an auxiliary integer variable used as temporary store for pointers.
SL: a label from where deletion of the key from the left subtree (smaller) is continued.
GL: a label from where deletion of the key from the right subtree (greater) is continued.

The insertion (deletion) algorithm has been written as two procedures, a non-recursive outer procedure SYMINS (SYMDEL) and a recursive inner procedure SYMSERT (SYMDELETE). The outer procedure SYMINS (SYMDEL) allows shortcutting the full recursion of SYMSERT (SYMDELETE) via the label DONE (QUIT). The inner procedure SYMSERT (SYMDELETE) performs insertions (deletions) in a B-tree recursively.

It is assumed that the six procedures SPLITRR, SPLITRL, SPLITLL, SPLITLR, SYMINS, and SYMDEL are all declared in the same block or in such a way that SPLITRR, SPLITRL, SPLITLL, and SPLITLR can be used both in SYMINS and in SYMDEL.

Note. The tree in Fig. 1 is a suitable tree for testing. Inserting the keys in the order 8, 9, 11, 15, 19, 20, 21, 7, 3, 2, 1, 5, 6, 4, 13, 14, 10, 12, 17, 16, 18 will build up the tree. Deleting the keys in the order 1, 6, 2, 21, 16, 20, 8, 14, 11, 9, 5, 10, 12, 13, 3, 4, 7, 15, 17, 18, 19 will exercise all the cases which can arise in any deletion process.

procedure SPLITRR (P, LP, RP, RBIT);
integer P; integer array RP, LP;
Boolean array RBIT;
begin integer AUXP;
   AUXP := RP [P]; RBIT [AUXP] := false;
   RP [P] := LP [AUXP]; RBIT [P] := false;
   LP [AUXP] := P; P := AUXP
end OF SPLITRR

procedure SPLITRL (P, LP, RP, LBIT, RBIT);
integer P; integer array LP, RP;
Boolean array LBIT, RBIT;
begin integer AUXP;
   AUXP := LP [RP [P]];
   LBIT [RP [P]] := false; LP [RP [P]] := RP [AUXP];
   RP [AUXP] := RP [P]; RP [P] := LP [AUXP];
   RBIT [P] := false; LP [AUXP] := P; P := AUXP
end OF SPLITRL

procedure SPLITLL (P, LP, RP, LBIT);
integer P; integer array LP, RP;
Boolean array LBIT;
begin integer AUXP;
   AUXP := LP [P]; LBIT [AUXP] := false;
   LP [P] := RP [AUXP]; LBIT [P] := false;
   RP [AUXP] := P; P := AUXP
end OF SPLITLL

procedure SPLITLR (P, LP, RP, LBIT, RBIT);
integer P; integer array LP, RP;
Boolean array LBIT, RBIT;
begin integer AUXP;
   AUXP := RP [LP [P]];
   RP [LP [P]] := LP [AUXP]; RBIT [LP [P]] := false;
   LP [AUXP] := LP [P]; LP [P] := RP [AUXP];
   LBIT [P] := false; RP [AUXP] := P; P := AUXP
end OF SPLITLR

procedure SYMINS (X, ROOT, ROOTBIT, FOUNDX, LP, RP, KEY, LBIT,
                  RBIT, GETQ);
value X; real X; integer ROOT; Boolean ROOTBIT;
label FOUNDX; integer array LP, RP; array KEY;
Boolean array LBIT, RBIT; procedure GETQ;
begin procedure SYMSERT (P, BIT);
   integer P; Boolean BIT;
   if P = 0 then
   begin comment INSERT X AS NEW LEAF;
      GETQ (P); KEY [P] := X; BIT := true;
      LP [P] := RP [P] := 0; LBIT [P] := RBIT [P] := false
   end
   else if X = KEY [P] then goto FOUNDX
   else if X < KEY [P] then
   begin comment INSERT X IN LEFT SUBTREE;
      SYMSERT (LP [P], LBIT [P]);
      if LBIT [P] then begin
         if LBIT [LP [P]] then begin SPLITLL (P, LP, RP, LBIT);
            BIT := true end
         else if RBIT [LP [P]] then begin
            SPLITLR (P, LP, RP, LBIT, RBIT); BIT := true end end
      else goto DONE
   end
   else begin comment INSERT X IN RIGHT SUBTREE;
      SYMSERT (RP [P], RBIT [P]);
      if RBIT [P] then begin if RBIT [RP [P]] then
         begin SPLITRR (P, LP, RP, RBIT);
            BIT := true end
         else if LBIT [RP [P]] then begin
            SPLITRL (P, LP, RP, LBIT, RBIT); BIT := true end end
      else goto DONE
   end OF SYMSERT;
   SYMSERT (ROOT, ROOTBIT); DONE:
end OF SYMINS

procedure SYMDEL (X, ROOT, XNOTINTREE, LP, RP, KEY, LBIT,
                  RBIT, ADDQ);
value X; real X; integer ROOT;
label XNOTINTREE; integer array LP, RP; array KEY;
Boolean array LBIT, RBIT; procedure ADDQ;
begin integer AUXX, AUXD;
   comment RECURSIVE B-TREE DELETION ALGORITHM;
   procedure SYMDELETE (P); integer P;
   begin comment DID WE FIND THE KEY TO BE DELETED;
      if X = KEY [P] then AUXX := P;
      if X ≤ KEY [P] ∧ LP [P] ≠ 0 then
      SL: begin SYMDELETE (LP [P]);
         comment CASES D, E, G;
         if LBIT [P] then begin comment CASE G;
            LBIT [P] := false; goto QUIT end OF CASE G
         else begin comment CASES E, D;
            if RBIT [P] then begin comment CASE E;
               AUXD := RP [P]; RP [P] := LP [AUXD];
               LP [AUXD] := P; P := AUXD;
               if LBIT [RP [LP [P]]] then
               begin SPLITRL (LP [P], LP, RP, LBIT, RBIT);
                  LBIT [P] := true end
               else if RBIT [RP [LP [P]]] then
               begin SPLITRR (LP [P], LP, RP, RBIT);
                  LBIT [P] := true end;
               goto QUIT
            end OF CASE E
            else begin comment CASE D;
               RBIT [P] := true; if LBIT [RP [P]] then begin
                  SPLITRL (P, LP, RP, LBIT, RBIT); goto QUIT end
               else if RBIT [RP [P]] then begin
                  SPLITRR (P, LP, RP, RBIT); goto QUIT end
            end OF CASE D
         end OF CASES D, E
      end OF SL AND CASES D, E, G
      else if X ≥ KEY [P] ∧ RP [P] ≠ 0 then
      GL: begin SYMDELETE (RP [P]);
         comment CASES B, C, F;
         if RBIT [P] then begin comment CASE F;
            RBIT [P] := false; goto QUIT end OF CASE F
         else begin comment CASES B, C;
            if LBIT [P] then begin comment CASE C;
               AUXD := LP [P]; LP [P] := RP [AUXD];
               RP [AUXD] := P; P := AUXD;
               if RBIT [LP [RP [P]]] then
               begin SPLITLR (RP [P], LP, RP, LBIT, RBIT);
                  RBIT [P] := true end
               else if LBIT [LP [RP [P]]] then
               begin SPLITLL (RP [P], LP, RP, LBIT);
                  RBIT [P] := true end;
               goto QUIT
            end OF CASE C
            else begin comment CASE B;
               LBIT [P] := true;
               if RBIT [LP [P]] then begin
                  SPLITLR (P, LP, RP, LBIT, RBIT); goto QUIT end
               else if LBIT [LP [P]] then begin
                  SPLITLL (P, LP, RP, LBIT); goto QUIT end
            end OF CASE B
         end OF CASES B, C
      end OF GL AND CASES B, C, F
      else begin comment ARRIVED AT LEAF OR NEXT TO ONE, CASE A;
         if AUXX = 0 then goto XNOTINTREE;
         KEY [AUXX] := KEY [P];
         AUXD := if LBIT [P] then LP [P] else RP [P];
         ADDQ (P); P := AUXD; if P ≠ 0 then goto QUIT
      end
   end OF SYMDELETE;
   AUXX := 0;
   if ROOT = 0 then goto XNOTINTREE else
   SYMDELETE (ROOT);
   QUIT:
end OF SYMDEL

References
1. Adelson-Velskii, G. M., Landis, E. M.: An information organization algorithm. DAN SSSR 146, 263-266 (1962).
2. Bayer, R., McCreight, E. M.: Organization and maintenance of large ordered indexes. Acta Informatica 1, 173-189 (1972).
3. Bayer, R.: Binary B-trees for virtual memory. Proceedings of 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, edited by E. F. Codd and A. L. Dean, pp. 219-235 (Nov. 11-12, 1971), San Diego.
4. Foster, C. C.: Information storage and retrieval using AVL-trees. Proc. ACM 20th Nat'l. Conf., pp. 192-205, 1965.
5. Knott, G. D.: A balanced tree structure and retrieval algorithm. Proc. of Symposium on Information Storage and Retrieval, Univ. of Maryland, April 1-2, 1971, pp. 175-196.
6. Knuth, D. E.: The art of computer programming, vol. 1. Addison-Wesley, 1969.
7. Nievergelt, J., Reingold, E. M.: Binary search trees of bounded balance. To appear, Proceedings of 4th ACM SIGACT Conference 1972.

Prof. Dr. Rudolf Bayer
Mathematisches Institut
der Technischen Universität München
D-8000 München 2
Arcisstr. 21
Federal Republic of Germany
