Principles of Database Management Systems: 4.1: B-Trees
DBMS 2001 Notes 4.1: B-Trees 2
B-Trees
a commonly used index structure
nonsequential, balanced (access paths to
different records are of equal length)
adapts well to insertions & deletions
consists of blocks holding at most n keys
and n+1 pointers, and at least half of this
(We consider a variation actually called a
B+ tree)
B+Tree Example n=3
[Figure: B+ tree with three levels.
Root: 100
Non-leaf nodes: 30 | 120, 150, 180
Leaves, linked left to right: 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200]
Sample non-leaf
[Figure: non-leaf node with keys 120, 150, 180 and four pointers, left to right:
to keys k < 120
to keys 120 ≤ k < 150
to keys 150 ≤ k < 180
to keys k ≥ 180]
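The choice of pointer in a non-leaf node can be sketched as a binary search over the node's keys. This is a minimal illustration, not the notes' own code; the function name `child_index` and the list representation of a node are assumptions made for the example.

```python
from bisect import bisect_right

def child_index(keys, k):
    """Return the index of the pointer to follow for search key k.

    keys is the sorted key list of one non-leaf node (hypothetical
    representation). Pointer 0 covers k < keys[0]; pointer i covers
    keys[i-1] <= k < keys[i]; the last pointer covers k >= keys[-1].
    """
    return bisect_right(keys, k)

keys = [120, 150, 180]  # the sample non-leaf node
for k in (119, 120, 149, 150, 180, 500):
    print(k, "-> pointer", child_index(keys, k))
```

With the sample node, keys 119, 120, 150 and 180 select pointers 0, 1, 2 and 3, matching the four ranges in the figure.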
Sample leaf node:
[Figure: leaf reached from a non-leaf node, holding keys 120 and 130; the third key slot is unused.
One pointer leads to the record with key 120, one to the record with key 130,
and the last pointer leads to the next leaf in sequence.]
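The sequence pointer is what lets a scan visit keys in sorted order without returning to the upper levels. The sketch below assumes a hypothetical `Leaf` class with a `next` field, and assumes the starting leaf has already been found by a root-to-leaf lookup; the leaf contents are illustrative.

```python
class Leaf:
    """Hypothetical in-memory stand-in for a leaf block."""
    def __init__(self, keys, next_leaf=None):
        self.keys = keys        # sorted keys stored in this leaf
        self.next = next_leaf   # sequence pointer to the next leaf, or None

def range_scan(start_leaf, low, high):
    """Yield keys k with low < k < high by following sequence pointers."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k >= high:
                return          # keys are sorted, so nothing further qualifies
            if k > low:
                yield k
        leaf = leaf.next

third = Leaf([150, 156, 179])
second = Leaf([120, 130], third)
first = Leaf([100, 101, 110], second)
print(list(range_scan(first, 100, 160)))
# → [101, 110, 120, 130, 150, 156]
```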
Don't want nodes to be too empty
Number of pointers in use:
at internal nodes at least ⌈(n+1)/2⌉
(to child nodes)
at leaves at least ⌊(n+1)/2⌋
(to data records/blocks)
⌈x⌉ = min {n ∈ Z | n ≥ x}
⌊x⌋ = max {n ∈ Z | n ≤ x}
Full node vs. min. node (n=3)
Non-leaf: full node 120, 150, 180; min. node 30
Leaf: full node 3, 5, 11; min. node 30, 35
B+tree rules
(1) All leaves at the same lowest
level (balanced tree)
(2) Pointers in leaves point to
records except for sequence
pointer
(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          2 (*)           1

(*) 1, if only one record in the file
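The occupancy limits in the table can be checked for a concrete n with a few lines of arithmetic. This is only a sketch of the table's formulas; the function name `occupancy_bounds` and the tuple layout are assumptions for illustration.

```python
import math

def occupancy_bounds(n):
    """Return {node type: (max ptrs, max keys, min ptrs, min keys)}."""
    half_up = math.ceil((n + 1) / 2)   # ceiling of (n+1)/2
    half_down = (n + 1) // 2           # floor of (n+1)/2
    return {
        "non-leaf": (n + 1, n, half_up, half_up - 1),
        "leaf": (n + 1, n, half_down, half_down),
        "root": (n + 1, n, 2, 1),      # min ptrs is 1 if the file has one record
    }

for n in (3, 4):
    print(n, occupancy_bounds(n))
```

For n=3 a non-leaf node may hold as little as one key and a leaf two keys; for n=4 both minimums are two keys.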
Insert into B+tree
First lookup the proper leaf;
(a) simple case
leaf not full: just insert (key, pointer-to-record)
(b) leaf overflow
(c) non-leaf overflow
(d) new root
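(a) Insert key = 32
n=3
[Figure: parent node with keys 30, 100; leaves 3, 5, 11 and 30, 31.
The leaf 30, 31 has room, so 32 is simply added, giving 30, 31, 32.]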
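(b) Insert key = 7
n=3
[Figure: parent node with keys 30, 100; leaves 3, 5, 11 and 30, 31.
The leaf 3, 5, 11 is full, so it splits into 3, 5 and 7, 11,
and key 7 is added to the parent, which becomes 7, 30, 100.]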
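Cases (a) and (b) can be sketched on plain sorted lists. The helper below is an illustration under assumptions, not the notes' algorithm verbatim: the name `insert_into_leaf` is hypothetical, record pointers are omitted, and the left half is assumed to keep ⌈(n+1)/2⌉ of the n+1 keys.

```python
import bisect
import math

def insert_into_leaf(keys, key, n):
    """Insert key into a sorted leaf that may hold at most n keys.

    Returns (left, right, separator). When no split is needed, right and
    separator are None. On overflow the n+1 keys are divided between two
    leaves and the first key of the right leaf is copied up to the parent.
    """
    keys = list(keys)
    bisect.insort(keys, key)
    if len(keys) <= n:
        return keys, None, None          # case (a): leaf not full
    cut = math.ceil((n + 1) / 2)         # assumed split point
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]         # case (b): leaf overflow

print(insert_into_leaf([30, 31], 32, 3))   # → ([30, 31, 32], None, None)
print(insert_into_leaf([3, 5, 11], 7, 3))  # → ([3, 5], [7, 11], 7)
```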
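(c) Insert key = 160
n=3
[Figure: root 100; non-leaf node 120, 150, 180; leaves 150, 156, 179 and 180, 200.
The leaf 150, 156, 179 is full, so it splits into 150, 156 and 160, 179.
Key 160 must go into the non-leaf node, which is also full: it splits into 120, 150 and 180,
and 160 moves up to the root.]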
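Unlike a leaf split, a non-leaf split moves the middle key up instead of copying it. A minimal sketch, assuming the overfull node is given as a key list plus a child list (the name `split_nonleaf` and the string labels standing in for child pointers are hypothetical):

```python
def split_nonleaf(keys, children):
    """Split an overfull non-leaf node with n+1 keys and n+2 children.

    Returns (left_keys, left_children, up_key, right_keys, right_children).
    The middle key moves up to the parent and stays in neither half.
    """
    mid = len(keys) // 2
    up_key = keys[mid]
    return (keys[:mid], children[:mid + 1], up_key,
            keys[mid + 1:], children[mid + 1:])

# Non-leaf node 120, 150, 180 after key 160 arrives from the leaf split.
keys = [120, 150, 160, 180]
children = ["<120", "120..149", "150..159", "160..179", ">=180"]
print(split_nonleaf(keys, children))
```

For this input the left node keeps 120, 150, the right node keeps 180, and 160 is passed to the parent, as in the figure.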
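(d) New root, insert 45
n=3
[Figure: root 10, 20, 30; leaves 1, 2, 3 | 10, 12 | 20, 25 | 30, 32, 40.
The leaf 30, 32, 40 is full, so it splits into 30, 32 and 40, 45, and key 40 goes to the root.
The root is also full: it splits into 10, 20 and 40, and key 30 moves up into a new root.]
Height grows at root
=> balance maintained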
Deletion from B+tree
Again, first lookup the proper leaf;
(a): Simple case: no underflow; Otherwise ...
(b): Borrow keys from an adjacent sibling
(if it doesn't become too empty); Else ...
(c): Coalesce with a sibling node
=> (d): Cases (a), (b) or (c) at non-leaf
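(b) Borrow keys
Delete 50
n=4
[Figure: parent node with keys 10, 40, 100; leaves 10, 20, 30, 35 and 40, 50.
Deleting 50 leaves the second leaf with only key 40, so it borrows 35 from its left sibling.
The leaves become 10, 20, 30 and 35, 40, and the parent key 40 is replaced by 35.]
=> min # of keys
in a leaf = ⌊5/2⌋ = 2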
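Borrowing can be sketched on two sorted key lists. This is an illustrative helper under assumptions: the name `borrow_from_left` is hypothetical, only borrowing from the left sibling is shown, and record pointers are left out.

```python
def borrow_from_left(left, right, min_keys):
    """Move the largest key of the left sibling into the underfull right leaf.

    Returns (new_left, new_right, new_separator), where new_separator
    replaces the parent key between the two leaves. Returns None when the
    left sibling would itself drop below min_keys, so borrowing is not allowed.
    """
    if len(left) - 1 < min_keys:
        return None
    moved = left[-1]
    new_left = left[:-1]
    new_right = [moved] + right
    return new_left, new_right, new_right[0]

n = 4
min_keys = (n + 1) // 2   # floor of 5/2 = 2
print(borrow_from_left([10, 20, 30, 35], [40], min_keys))
# → ([10, 20, 30], [35, 40], 35)
print(borrow_from_left([20, 30], [40], min_keys))
# → None
```

The second call shows the situation in which borrowing is impossible and the leaves have to be coalesced instead.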
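(c) Coalesce with a sibling
Delete 50
n=4
[Figure: parent node with keys 20, 40, 100; leaves 20, 30 and 40, 50.
Deleting 50 leaves the second leaf with only key 40, and the sibling 20, 30 has no key to spare.
Key 40 moves into the sibling, giving 20, 30, 40, and the emptied leaf is dropped from the parent.]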
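(d) Non-leaf coalesce
Delete 37
n=4
[Figure: root 25; non-leaf nodes 10, 20 and 30, 40;
leaves 1, 3 | 10, 14 | 20, 22 | 25, 26 | 30, 37 | 40, 45.
Deleting 37 leaves a leaf with only key 30, which coalesces with 40, 45 into 30, 40, 45.
Its parent is left with the single key 30, below the minimum, so it coalesces with its sibling 10, 20
and pulls key 25 down from the old root. The merged node 10, 20, 25, 30 is the new root.]
=> min # of keys in a
non-leaf =
⌈(n+1)/2⌉ - 1 = 3 - 1 = 2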
B+tree deletions in practice
Often, coalescing is not implemented
Too hard and not worth it!
later insertions may return the node back to its
required minimum size
Compromise: Try redistributing keys with a sibling;
If not possible, leave it there
if all accesses to the records go through the B-tree,
can place a "tombstone" for the deleted record at
the leaf
Why Are B-trees Good?
B-tree adapts well to insertions and deletions,
maintaining balance
DBA does not need to care about reorganizing
split/merge operations rather rare
(How often would nodes with 200 keys be split?)
=> Access times dominated by key-lookup
(i.e., traversal from root to a leaf)
Efficiency of B-trees
For example, assume 4 KB blocks, 4 byte
keys and 8 byte pointers
How many keys and pointers fit in a node
(= index block)?
Max n s.t. (4*n + 8*(n+1)) B ≤ 4096 B ?
=> n=340; 340 keys and 341 pointers fit in a node
=> 171 ... 341 pointers in a non-leaf node
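The node-capacity arithmetic above can be reproduced directly. The function name `max_keys_per_block` is an assumption for illustration; the sizes are the ones given in the example (4 KB blocks, 4-byte keys, 8-byte pointers).

```python
def max_keys_per_block(block_bytes, key_bytes, ptr_bytes):
    """Largest n with key_bytes*n + ptr_bytes*(n+1) <= block_bytes."""
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)

n = max_keys_per_block(4096, 4, 8)
max_ptrs = n + 1
min_ptrs = -(-(n + 1) // 2)   # ceiling of (n+1)/2 using integer arithmetic
print(n, max_ptrs, min_ptrs)
# → 340 341 171
```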
Efficiency of B-trees (cont.)
Assume an average node has 255 pointers
=> a three-level B-tree has 255^2 = 65025 leaves with
a total of 255^3, or about 16.6 million, pointers to records
=> if root block kept in main memory, each
record can be accessed with 2+1 disk I/Os;
If all 256 internal nodes are in main memory, record
access requires 1+1 disk I/Os
(256 x 4 KB = 1 MB; quite feasible!)
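These figures follow from the assumed average fanout of 255 and can be recomputed as a quick check. The variable names are chosen for illustration only.

```python
fanout = 255                 # assumed average number of pointers per node
block_bytes = 4 * 1024       # 4 KB blocks

leaves = fanout ** 2                        # nodes on the third level
record_ptrs = fanout ** 3                   # pointers to records in all leaves
internal_nodes = 1 + fanout                 # root plus second-level nodes
cache_bytes = internal_nodes * block_bytes  # memory needed to hold them all

print(leaves)                      # → 65025
print(record_ptrs)                 # → 16581375
print(internal_nodes)              # → 256
print(cache_bytes / (1024 * 1024)) # → 1.0
```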
Outline/summary
B trees
popular index structures with graceful
growth properties
support range queries like
WHERE 100 < Key < 200
(=> Exercises)
Next: Hashing schemes