CMSC 420: Lecture 8: B-Trees
[Figure: A node of a multiway search tree with keys a1 : a2 : a3 and subtrees T1, T2, T3, T4.]
B-trees are multiway search trees, in which we achieve balance by constraining the “width”
of each node. We have already introduced this concept in our discussion of 2-3 trees. In this
lecture, we will consider how to generalize this to nodes of arbitrary width.
B-trees were first introduced way back in 1970 by Rudolf Bayer and Edward McCreight. They
have proven to be very popular, but with popularity comes variety. Numerous modifications
and adaptations of B-trees have been developed over the years. We will present one (fairly
simple) formulation. Later in the lecture we will discuss a particularly popular variant, called
B+ trees.
For any integer m ≥ 3, a B-tree of order m is a multiway search tree that has the following
properties:

- Each node has at most m children (and hence at most m − 1 keys).
- Each node other than the root has at least ⌈m/2⌉ children (and hence at least ⌈m/2⌉ − 1 keys).
- The root has at least two children (unless the tree consists of a single leaf node).
- All leaves are at the same depth.
- As in any multiway search tree, a node with j children stores j − 1 keys in sorted order, and the keys of the i-th subtree lie strictly between the (i−1)-st and i-th keys of the node.
[Figure: An example of a B-tree of order 5. The root holds the keys 49 and 75, and its three children hold the keys {07, 20, 31, 40}, {56, 66, 71}, and {81, 89}, respectively.]
When the tree is stored on disk, the order m is typically chosen to be quite large, say m = 100,
in practice. A node in such a tree has between 50 and 100 children and holds between 49 and
99 keys. Of course, with such high fan-outs, the depth of the tree is quite small.
Height Analysis: The following theorem shows that as the fan-out of a B-tree grows, the height of
the tree decreases.
Theorem: A B-tree of order m containing n keys has height at most (lg n)/γ, where γ =
lg(m/2).
Proof: To avoid messy floor-ceiling arithmetic, let's just assume that m is even. Let N(h)
denote the number of nodes in the skinniest possible order-m B-tree of height h. The
root has at least two children, and each of these has at least m/2 children. Therefore, there
are at least two nodes at depth 1, 2(m/2) nodes at depth 2, 2(m/2)^2 nodes at depth 3,
2(m/2)^3 nodes at depth 4, and in general, there are at least 2(m/2)^{k−1} nodes at depth
k. Thus, the total number of nodes in an entire tree of height h is at least
N(h) = \sum_{i=1}^{h} 2 (m/2)^{i−1} = \sum_{i=0}^{h−1} 2 (m/2)^{i}.
This is a geometric series of the form \sum_i c^i, where c = m/2, and by standard formulas
we have N(h) = 2(c^h − 1)/(c − 1). Assuming that m is relatively large, we may ignore
the −1 in the numerator and denominator to yield N(h) ≈ 2c^h/c = 2c^{h−1} = 2(m/2)^{h−1}.
Each node contains at least m/2 − 1 keys. Again, assuming that m is large, we can
approximate this as m/2. So the number of keys is at least (m/2) · 2(m/2)^{h−1} = 2(m/2)^h.
By our hypothesis, the tree has n keys, and thus (recalling that "lg" means log base 2)
we infer that
n ≥ 2(m/2)^h  ⇔  (m/2)^h ≤ n/2
⇔  h lg(m/2) ≤ lg(n/2)
⇔  h ≤ lg(n/2) / lg(m/2)
⇒  h ≤ (lg n) / lg(m/2),

which is the bound claimed in the theorem.
In the case where m = 100, the above result implies that the height of the B-tree is no
greater than (lg n)/5.6, that is, roughly 5.6 times smaller than the height of a binary search
tree. For example, this means that you can store over 100 million keys in a search structure
of depth roughly 5.
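To sanity-check these numbers, here is a small Java helper (our own, not part of any B-tree implementation) that evaluates the bound h ≤ (lg n)/lg(m/2) from the theorem.

Computing the Height Bound (sketch)
public class BTreeHeightBound {
    // Evaluate the height bound (lg n) / lg(m/2) from the theorem above.
    static double heightBound(long n, int m) {
        return Math.log(n) / Math.log(m / 2.0);   // ratio of logarithms; the base cancels
    }

    public static void main(String[] args) {
        // For m = 100 and n = 100 million keys this prints about 4.7,
        // matching the "depth roughly 5" claim above.
        System.out.println(heightBound(100_000_000L, 100));
    }
}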
Node structure: B-tree nodes can hold a variable number of items, and this number changes
dynamically as keys are inserted and deleted. Therefore, every node is allocated
with the maximum possible size, but most nodes will not be fully utilized. (Experimental
studies show that B-tree nodes are on average about 2/3 utilized.)
The code block below shows a possible Java implementation of a B-tree node. Each element
is a key-value pair (of types Key and Value). We can place roughly twice as many elements in each
leaf node as in each internal node, since leaves do not need the storage for child pointers (a
possible leaf layout is sketched after the code block below).
B-Tree Node
final int M = ...                            // order of the B-tree

class BTreeNode {
    int nChildren;                           // number of children (from ceil(M/2) to M)
    BTreeNode[] child = new BTreeNode[M];    // child pointers
    Key[] key = new Key[M - 1];              // keys
    Value[] value = new Value[M - 1];        // associated values
}
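For concreteness, a leaf node under the layout suggested above might be declared as follows. This is just a sketch of one possible design; the class name BTreeLeafNode, the nKeys counter, and the doubled capacity 2(M − 1) are our own choices, reflecting the remark that leaves can reuse the child-pointer space for extra elements.

B-Tree Leaf Node (sketch)
class BTreeLeafNode {
    int nKeys;                               // number of keys currently stored in this leaf
    Key[] key = new Key[2 * (M - 1)];        // roughly twice the capacity of an internal node,
    Value[] value = new Value[2 * (M - 1)];  // since no space is needed for child pointers
}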
Note that the 2-3 trees and 2-3-4 trees discussed in earlier lectures are special cases (when M = 3
and M = 4, respectively).
Search: Searching a B-tree for a key x is a straightforward generalization of binary tree searching.
When you arrive at an internal node with keys a_1 < a_2 < · · · < a_{j−1}, search (either linearly or
by binary search) for x in this list. If x appears in the list, then the search is done. Otherwise,
determine the index i such that a_{i−1} < x < a_i. (Recall that a_0 = −∞ and a_j = +∞.) Then
recursively search the subtree T_i. When you arrive at a leaf, search all the keys in this node.
If x is not there, then x is not in the B-tree.
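The following sketch shows how this search might be coded in Java, under a few simplifying assumptions of our own: every node (internal or leaf) records nKeys, the number of keys it currently holds (for an internal node, nKeys = nChildren − 1); a leaf is recognized by child[0] == null; and Key implements Comparable<Key>.

B-Tree Search (sketch)
static Value find(BTreeNode p, Key x) {
    while (p != null) {
        int i = 0;                                    // linear scan; binary search also works
        while (i < p.nKeys && x.compareTo(p.key[i]) > 0) i++;
        if (i < p.nKeys && x.compareTo(p.key[i]) == 0)
            return p.value[i];                        // found x among this node's keys
        p = (p.child[0] == null) ? null : p.child[i]; // descend into T_i, or give up at a leaf
    }
    return null;                                      // x is not in the B-tree
}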
Restructuring: In an earlier lecture, we showed how to restructure 2-3 trees. We had three
mechanisms: splitting nodes, merging nodes, and subtree adoption. We will generalize each
of these operations to general B-trees.
Key Rotation (Adoption): Recall that a node in a B-tree can have from ⌈m/2⌉ up to m
children, and the number of keys is smaller by one. As a result of insertion or deletion,
a node may acquire one too many children (m + 1 children and hence m keys) or one too few
(⌈m/2⌉ − 1 children and hence ⌈m/2⌉ − 2 keys).
The easiest way to remedy the imbalance is to move a child into or from one
of the node's siblings, assuming that there is a sibling that can absorb this change. This is called
key rotation (or, as I call it, adoption). For example, in Fig. 3, the underfull node has too
few children. Since its left sibling can spare a child, we move that sibling's rightmost
child over, sliding the sibling's largest key up to the parent and sliding the parent's
key down into the underfull node.
This operation is not always possible, because it depends on the existence of a sibling
with a proper number of keys. Because allocating and deallocating nodes is a relatively
expensive operation, we prefer this operation whenever it is available.
[Fig. 3: Key rotation (m = 5). The underfull node (holding only 63) adopts its left sibling's rightmost child; the sibling's largest key, 44, slides up to the parent, and the parent's key, 57, slides down into the underfull node.]
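Here is a sketch of how such a rotation from a left sibling might be coded, operating on the BTreeNode layout above (for internal nodes, where the key count is nChildren − 1). The method name rotateFromLeft and the convention that parent.child[j] is the underfull node are our own.

Key Rotation (sketch)
static void rotateFromLeft(BTreeNode parent, int j) {
    BTreeNode left = parent.child[j - 1];        // donor sibling (has a child to spare)
    BTreeNode node = parent.child[j];            // underfull node
    int nodeKeys = node.nChildren - 1;           // an internal node with c children has c-1 keys

    // Shift the underfull node's keys and children right to open up slot 0.
    for (int i = nodeKeys; i > 0; i--) {
        node.key[i] = node.key[i - 1];
        node.value[i] = node.value[i - 1];
    }
    for (int i = node.nChildren; i > 0; i--) node.child[i] = node.child[i - 1];

    // The separating key in the parent slides down into the underfull node ...
    node.key[0] = parent.key[j - 1];
    node.value[0] = parent.value[j - 1];
    // ... the donor's largest key slides up to take its place in the parent ...
    parent.key[j - 1] = left.key[left.nChildren - 2];
    parent.value[j - 1] = left.value[left.nChildren - 2];
    // ... and the donor's rightmost child is adopted as the node's new first child.
    node.child[0] = left.child[left.nChildren - 1];

    left.nChildren--;
    node.nChildren++;
}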
Node Splitting: As the result of insertion, a node may acquire one too many children (m + 1
children and hence m keys). When this happens and key rotation is not available, we
split the node into two nodes, one having m′ = ⌈m/2⌉ children and the other having the
remaining m″ = m + 1 − ⌈m/2⌉ children. Clearly, the first node has an acceptable number
of children. The following lemma demonstrates that the other node has an acceptable
number of children as well.
Lemma: For all m ≥ 2, ⌈m/2⌉ ≤ m + 1 − ⌈m/2⌉ ≤ m.
Proof: If m is even, then ⌈m/2⌉ = m/2, and the middle expression in the inequality
reduces to m + 1 − m/2 = m/2 + 1. Thus, the claim is equivalent to
m/2 ≤ m/2 + 1 ≤ m,
which is clearly true for any m ≥ 2. On the other hand, if m is odd, then ⌈m/2⌉ =
(m + 1)/2, and the middle expression in the inequality reduces to m + 1 − (m + 1)/2 =
(m + 1)/2. Thus, the claim is equivalent to
(m + 1)/2 ≤ (m + 1)/2 ≤ m,
which is also clearly true for any m ≥ 1.
[Fig. 4: Node splitting (m = 5). The overfull node holding keys 16, 27, 44, 63, 76 (and six subtrees T1, ..., T6) is split into one node holding 16, 27 and another holding 63, 76; the middle key 44 is promoted to the parent.]
Returning to node splitting, we create two nodes and distribute the smallest m′ subtrees
to the first and the remaining m″ to the second node (see Fig. 4). Among the node's m
keys, the m′ − 1 smallest go with the first node and the m″ − 1 largest go with
the other node. Since (m′ − 1) + (m″ − 1) = m − 1, we have one extra key that does
not fit into either of these nodes. This key is promoted to the parent node. (As with
2-3 trees, if we do not have a parent, we create a new root node with this single key and
just two children. By the way, this is the reason that we allowed the root to have fewer
than ⌈m/2⌉ children.)
Since the parent acquires an extra key and extra child, the splitting process may prop-
agate to the parent node.
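Below is a sketch of the split operation on the BTreeNode layout above. To keep it short, we assume (our own convention) that the overfull node u temporarily holds m + 1 children and m keys, one beyond its declared capacity; in practice this can be arranged by allocating the arrays one slot larger or by using a scratch buffer. The promoted key and value are returned through one-element output arrays, and the caller is responsible for inserting them, together with the new sibling, into u's parent.

Node Splitting (sketch)
static BTreeNode split(BTreeNode u, Key[] promotedKey, Value[] promotedValue) {
    int mPrime = (M + 1) / 2;                    // m' = ceil(M/2) children stay with u
    BTreeNode v = new BTreeNode();               // new right sibling gets the rest

    promotedKey[0] = u.key[mPrime - 1];          // the one leftover key moves up to the parent
    promotedValue[0] = u.value[mPrime - 1];

    v.nChildren = (M + 1) - mPrime;              // m'' = m + 1 - ceil(m/2) children
    for (int i = 0; i < v.nChildren; i++) v.child[i] = u.child[mPrime + i];
    for (int i = 0; i < v.nChildren - 1; i++) {  // and the m'' - 1 largest keys
        v.key[i] = u.key[mPrime + i];
        v.value[i] = u.value[mPrime + i];
    }

    u.nChildren = mPrime;                        // u keeps the m' smallest subtrees
    return v;                                    // and, implicitly, the m' - 1 smallest keys
}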
Node Merging: As the result of deletion, a node may have one too few children (⌈m/2⌉ − 1
children and hence ⌈m/2⌉ − 2 keys). When this happens and key rotation is not available,
we may infer that its siblings have the minimum number of ⌈m/2⌉ children. We merge this
node with either of its siblings into a single node having a total of m′ = (⌈m/2⌉ − 1) +
⌈m/2⌉ = 2⌈m/2⌉ − 1 children. The following lemma demonstrates that the resulting
node has an acceptable number of children.
Lemma: For all m ≥ 2, ⌈m/2⌉ ≤ 2⌈m/2⌉ − 1 ≤ m.
Proof: If m is even, then ⌈m/2⌉ = m/2, and the middle expression in the inequality
reduces to 2(m/2) − 1 = m − 1. Thus, the claim is equivalent to
m/2 ≤ m − 1 ≤ m,
which is easily true for any m ≥ 2. On the other hand, if m is odd, then ⌈m/2⌉ =
(m + 1)/2, and the middle expression in the inequality reduces to 2(m + 1)/2 − 1 = m. Thus, the
claim is equivalent to
(m + 1)/2 ≤ m ≤ m,
which is easily true for any m ≥ 1.
Returning to node merging, we merge the two nodes into a single node having m′ children
(see Fig. 5). The number of keys from the two initial nodes is (⌈m/2⌉ − 2) + (⌈m/2⌉ − 1) =
2⌈m/2⌉ − 3 = m′ − 2, which is one too few. We demote the appropriate key from the
parent's node to yield the desired number of keys.
[Fig. 5: Node merging (m = 5). The underfull node and its sibling, together with their subtrees T1, ..., T5, are combined into a single node, and the separating key is demoted from the parent.]
Since the parent has lost a key and a child, the merging process may propagate to the
parent node.
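The following sketch merges an underfull node with its left sibling, again on the BTreeNode layout above (internal nodes, key count nChildren − 1); the method name mergeWithLeft and the indexing convention are ours. After the call, the caller removes the demoted key and the absorbed child pointer from the parent, which is what may leave the parent underfull.

Node Merging (sketch)
static void mergeWithLeft(BTreeNode parent, int j) {
    BTreeNode left = parent.child[j - 1];        // sibling with the minimum number of children
    BTreeNode node = parent.child[j];            // underfull node being absorbed
    int leftKeys = left.nChildren - 1;           // key counts follow from the child counts
    int nodeKeys = node.nChildren - 1;

    // Demote the separating key from the parent; it lands between the two key lists.
    left.key[leftKeys] = parent.key[j - 1];
    left.value[leftKeys] = parent.value[j - 1];

    // Append the underfull node's keys and children after it.
    for (int i = 0; i < nodeKeys; i++) {
        left.key[leftKeys + 1 + i] = node.key[i];
        left.value[leftKeys + 1 + i] = node.value[i];
    }
    for (int i = 0; i < node.nChildren; i++) left.child[left.nChildren + i] = node.child[i];
    left.nChildren += node.nChildren;            // the merged node has 2*ceil(m/2) - 1 children

    // The caller now deletes key j-1 and child j from the parent, which may in
    // turn leave the parent underfull, so the process can propagate upward.
}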
Given these operations, we can now describe how to perform the various dictionary operations.
Insertion: In the case of 2-3 trees, we would always split a node when it had too many keys. With
B-trees, creating nodes is a more expensive operation. So, whenever possible we will try to
employ key rotation to resolve nodes that are too full, and we will fall back on node splitting
only when necessary.
To insert a key into a B-tree of order m, we perform a search to find the appropriate leaf into
which to insert the key. If we find the key already present, then we signal a duplicate-key error.
Otherwise, if the leaf is not at full capacity (it has fewer than m − 1 keys), then we simply insert it and
are done. Note that this will involve sliding keys around within the leaf node to make room
for the new entry, but since m is assumed to be a constant (e.g., the size of one disk page),
we ignore this extra cost.
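As a concrete illustration of this easy case, here is a sketch of inserting into a leaf that still has room, under the same assumptions as the search sketch above (a per-node nKeys counter and Comparable keys); insertIntoLeaf is our own name.

Leaf Insertion (sketch)
static void insertIntoLeaf(BTreeNode leaf, Key x, Value v) {
    int i = leaf.nKeys;
    while (i > 0 && x.compareTo(leaf.key[i - 1]) < 0) {
        leaf.key[i] = leaf.key[i - 1];           // slide larger keys one slot to the right
        leaf.value[i] = leaf.value[i - 1];
        i--;
    }
    leaf.key[i] = x;                             // drop x into its sorted position
    leaf.value[i] = v;
    leaf.nKeys++;
    // This handles the easy case (the leaf had spare room). If the leaf was already
    // full, the insertion overflows it, and we rotate a key to a sibling or split
    // the node, as described next.
}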
[Fig. 6: insert(29) into a B-tree of order 5. The key 29 is added to the leaf holding 23, 25, 26, 30, which overflows. The leaf is split into {23, 25} and {29, 30}, and 26 is promoted to the parent. The parent, now holding 07, 20, 26, 31, 40, also overflows and is split, promoting 26 into the root, which becomes {26, 49}.]
Otherwise, the node overflows. To remedy the situation, we first check whether either
sibling is less than full. If so, we perform a rotation, moving the extra key and child into
this sibling. Otherwise, we perform a node split as described above (see Fig. 6). When this
happens, the parent acquires a new key and a new child, and thus the splitting process may
continue with the parent node.
Deletion: As in binary search tree deletion, we begin by finding the node containing the key to be
deleted. We need to find a suitable replacement for this key. This is done by finding the largest
key in the left subtree (or equivalently the smallest key in the right subtree), and moving this
key up to fill the hole. This replacement key always resides at the leaf level, so the move creates
a hole in a leaf node. If this leaf node still has sufficient capacity (at least ⌈m/2⌉ − 1 keys), then we are done.
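In code, locating this replacement is a short walk down the rightmost spine of the left subtree. The sketch below uses the same assumptions as the earlier sketches (an nKeys counter and child[0] == null marking a leaf), and the helper name rightmostLeaf is ours.

Finding the Replacement Key (sketch)
static BTreeNode rightmostLeaf(BTreeNode p) {
    while (p.child[0] != null)               // while p is internal,
        p = p.child[p.nChildren - 1];        // descend into its last (rightmost) subtree
    return p;                                // the replacement key is p.key[p.nKeys - 1]
}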
Otherwise, we have an underflow situation at this node. As with insertion, we first check
whether a key rotation is possible. If one of the two siblings has at least one key more than
the minimum, then we rotate the extra key into this node, and we are done (see Fig. 7).
If this is not possible, then any sibling of ours must have the minimum number of ⌈m/2⌉
children, and so we can apply a node merge (see Fig. 8).
[Fig. 7: Deletion from a B-tree of order 5 in which the affected leaf becomes underfull; the underflow is repaired by a key rotation with an adjacent sibling leaf.]
[Fig. 8: delete(30) (m = 5). The underfull leaf is merged with its sibling, demoting a key from the parent; the resulting underflow one level up is then repaired by a key rotation with that node's sibling.]
The removal of a key from the parent's node may cause it to underflow. Thus, the process
may need to be repeated recursively up to the root. If the root is left with only one child,
we make this single child the new root of the B-tree.
B+ trees: B-trees have been very successful, and a number of variants have been proposed. A
particularly popular one for disk storage is called a B+ tree. The key differences from the
standard B-tree are the following:

- The actual data items (key-value pairs) are stored only at the leaf level; the internal nodes store keys alone, which serve merely to guide the search.
- The leaf nodes are linked together, with each leaf storing a pointer to the next leaf in sorted order.
Storing only keys in the internal nodes saves space and allows for increased fan-out. This
means the tree height is lower, which reduces the number of disk accesses. Thus, the internal
nodes are merely an index for locating the actual data, which resides at the leaf level. (The
policy regarding which keys a subtree contains is changed slightly. Given an internal node with keys
⟨a_1, ..., a_{j−1}⟩, subtree T_i contains the keys x such that a_{i−1} < x ≤ a_i.)
The next-leaf links enable efficient range reporting queries. In such a query, we are asked to
list all the keys in a range [x_min, x_max]. We simply find the leaf node for x_min and then follow
next-leaf links until reaching x_max.
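A minimal sketch of such a range query is shown below. The leaf layout (BPlusLeaf) and the use of integer keys and List-based storage are simplifications of our own; a real B+ tree leaf would use fixed-size arrays like the B-tree node above, and the starting leaf would be located by an ordinary top-down search (not shown).

B+ Tree Range Query (sketch)
import java.util.ArrayList;
import java.util.List;

class BPlusLeaf {
    List<Integer> keys = new ArrayList<>();    // keys in sorted order
    List<String> values = new ArrayList<>();   // associated data records
    BPlusLeaf next;                            // next-leaf link
}

class RangeQuery {
    // Report all values whose keys lie in [xMin, xMax], starting from the leaf
    // that contains (or would contain) xMin.
    static List<String> range(BPlusLeaf start, int xMin, int xMax) {
        List<String> out = new ArrayList<>();
        for (BPlusLeaf p = start; p != null; p = p.next) {   // walk the chain of leaves
            for (int i = 0; i < p.keys.size(); i++) {
                int k = p.keys.get(i);
                if (k > xMax) return out;                    // we have passed the end of the range
                if (k >= xMin) out.add(p.values.get(i));     // key lies inside [xMin, xMax]
            }
        }
        return out;                                          // ran off the last leaf
    }
}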