B+ Trees
Tree-Structured Indices
Tree-structured indexing techniques
support both range searches and equality
searches.
ISAM: static structure; B+ tree:
dynamic, adjusts gracefully under inserts
and deletes.
ISAM
Index entry: <P0, K1, P1, K2, P2, ..., Km, Pm>
Repeat sequential indexing until
sequential index fits on one page.
[Figure: ISAM file structure, with non-leaf pages above the leaf pages; the leaf level consists of primary pages and overflow pages.]
Leaf pages contain data entries.
Example ISAM Tree
Each node can hold 2 entries; no need
for 'next-leaf-page' pointers. (Why?)
Root:        [40]
Index pages: [20 33] [51 63]
Leaf pages:  [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]
Comments on ISAM
File creation: Leaf (data) pages allocated
sequentially, sorted by search key; then index
pages allocated, then space for overflow pages.
[Figure: file layout, in order: data pages, index pages, overflow pages.]
Index entries: <search key value, page id>; they
'direct' search for data entries, which are in leaf pages.
Search: Start at root; use key comparisons to go to leaf.
Cost: log_F N, where F = # entries/index pg, N = # leaf pgs.
Insert: Find the leaf the data entry belongs to, and put it there.
Delete: Find and remove from leaf; if an overflow
page becomes empty, de-allocate it.
Static tree structure: inserts/deletes affect only leaf pages.
After Inserting 23*, 48*, 41*, 42* ...
Root:               [40]
Index pages:        [20 33] [51 63]
Primary leaf pages: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]
Overflow pages:     [23*] chained to [20* 27*]; [48* 41*] chained to [40* 46*], followed by [42*]
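... Then Deleting 42*, 51*, 97*
Root:               [40]
Index pages:        [20 33] [51 63]
Primary leaf pages: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [55*] [63*]
Overflow pages:     [23*] chained to [20* 27*]; [48* 41*] chained to [40* 46*]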
Note that 51 appears in index levels, but not in leaf!
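The ISAM behaviour walked through above can be sketched in a few lines of Python. This is a minimal illustration, not a real storage layout: the class names `ISAM` and `ISAMLeaf` are hypothetical, pages are plain lists, and the two index levels of the example are flattened into one sorted list of separator keys that is never modified after creation.

```python
from bisect import bisect_right

class ISAMLeaf:
    """A primary leaf page plus its chain of overflow pages (hypothetical layout)."""
    def __init__(self, entries, capacity=2):
        self.capacity = capacity
        self.primary = list(entries)  # entries stored in the primary page
        self.overflow = []            # chain of overflow pages, each a list

    def insert(self, key):
        if len(self.primary) < self.capacity:
            self.primary.append(key)
            return
        # Primary page is full: use the last overflow page, or chain a new one.
        if not self.overflow or len(self.overflow[-1]) >= self.capacity:
            self.overflow.append([])
        self.overflow[-1].append(key)

    def delete(self, key):
        for page in [self.primary] + self.overflow:
            if key in page:
                page.remove(key)
                break
        # De-allocate overflow pages that became empty; primary pages stay.
        self.overflow = [p for p in self.overflow if p]

class ISAM:
    def __init__(self, leaves, separators):
        self.leaves = [ISAMLeaf(entries) for entries in leaves]
        self.separators = separators  # static index: never changes

    def _leaf(self, key):
        return self.leaves[bisect_right(self.separators, key)]

    def insert(self, key):
        self._leaf(key).insert(key)

    def delete(self, key):
        self._leaf(key).delete(key)

    def search(self, key):
        leaf = self._leaf(key)
        return key in leaf.primary or any(key in p for p in leaf.overflow)

# The example tree, followed by the inserts and deletes from the slides.
tree = ISAM([[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]],
            [20, 33, 40, 51, 63])
for k in (23, 48, 41, 42):
    tree.insert(k)
for k in (42, 51, 97):
    tree.delete(k)
```

After these operations 51 is still a separator key even though no leaf holds 51*, which is the point of the note above: the index levels are never touched.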
B+ Tree: The Most Widely-Used Index
Insert/delete at log_F N cost; keep tree height-balanced.
(F = fanout, N = # leaf pages)
Minimum 50% occupancy (except for root). Each
node contains d <= m <= 2d entries. The
parameter d is called the order of the tree.
Supports equality and range-searches efficiently.
[Figure: B+ tree structure. Index entries (direct search) sit above the data entries (the "sequence set").]
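The occupancy and balance rules can be stated as a small checker. The sketch below assumes a hypothetical in-memory `Node` with a list of keys and, for index nodes, a list of children; it verifies that every non-root node holds between d and 2d entries and that all leaves sit at the same depth.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf

def check(node, d, is_root=True):
    """Return the height of the subtree; raise ValueError on a violation."""
    m = len(node.keys)
    low = 1 if is_root else d  # the root is exempt from the 50% rule
    if not low <= m <= 2 * d:
        raise ValueError(f"node {node.keys} holds {m} entries")
    if node.children is None:
        return 1
    if len(node.children) != m + 1:
        raise ValueError(f"node {node.keys} has a wrong number of children")
    heights = {check(child, d, is_root=False) for child in node.children}
    if len(heights) != 1:
        raise ValueError("leaves are at different depths")
    return heights.pop() + 1

# A tree of order d = 2 with one index level above the leaves.
root = Node([13, 17, 24, 30], [
    Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
    Node([24, 27, 29]), Node([33, 34, 38, 39]),
])
height = check(root, d=2)
```

Here the height counts levels including the leaf level, which is a convention chosen for this sketch.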
Example B+ Tree
Search begins at root, and key
comparisons direct it to a leaf (as in
ISAM).
Search for 5*, 15*, all data entries >= 24* ...
Root:   [13 17 24 30]
Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
Based on the search for 15*, we know it is not in the tree!
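The three searches above can be traced with a short sketch. It assumes a hypothetical `Node` class whose leaves carry a `next` pointer to the following leaf, and it treats an index key K as "keys >= K go right", which matches the example tree.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf
        self.next = None          # next-leaf pointer (leaves only)

def find_leaf(node, key):
    """Follow key comparisons from the root down to a leaf."""
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    """Equality search: the key is in the tree only if its leaf holds it."""
    return key in find_leaf(root, key).keys

def range_from(root, low):
    """Range search: locate the first leaf, then walk the leaf chain."""
    leaf = find_leaf(root, low)
    out = []
    while leaf is not None:
        out.extend(k for k in leaf.keys if k >= low)
        leaf = leaf.next
    return out

# The example tree from the slide.
leaves = [Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
          Node([24, 27, 29]), Node([33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([13, 17, 24, 30], leaves)

found_5 = search(root, 5)
found_15 = search(root, 15)
from_24 = range_from(root, 24)
```

The search for 15 lands on the leaf holding 14* and 16*; because that is the only leaf where 15* could live, its absence there settles the question without looking anywhere else.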
B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133
Typical capacities:
– Height 4: 133^4 = 312,900,721 records
– Height 3: 133^3 = 2,352,637 records
Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages = 1 Mbyte
– Level 3 = 17,689 pages = about 138 MBytes
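The arithmetic behind these figures is easy to reproduce. The sketch below takes the 67% fill factor as two-thirds of the 2d = 200 entries a node can hold (an assumption that yields the fanout of 133) and uses the 8 Kbyte page size from the Level 1 line; the function names are illustrative only.

```python
ORDER = 100                  # d: a node holds between d and 2d entries
FANOUT = 2 * ORDER * 2 // 3  # 2d entries at two-thirds full -> 133
PAGE_KB = 8                  # page size in Kbytes

def records_at_height(h):
    """Data entries reachable through h levels of average fanout."""
    return FANOUT ** h

def level_size(level):
    """Pages and Kbytes occupied by one level, counting the root as level 1."""
    pages = FANOUT ** (level - 1)
    return pages, pages * PAGE_KB

for h in (3, 4):
    print(f"height {h}: {records_at_height(h):,} records")
for level in (1, 2, 3):
    pages, kb = level_size(level)
    print(f"level {level}: {pages:,} pages = {kb:,} Kbytes")
```

Whether the Kbyte totals are then divided by 1000 or 1024 changes the MByte figures slightly, so the level sizes are best read as approximations.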
Inserting a Data Entry into a B+ Tree
Find correct leaf L.
Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2):
  Redistribute entries evenly, copy up middle key.
  Insert index entry pointing to L2 into parent of L.
This can happen recursively.
– To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
Splits “grow” tree; root split increases height.
– Tree growth: gets wider or one level taller at top.
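The insertion steps above can be sketched as a recursive routine. This is a minimal in-memory illustration under stated assumptions: `Node`, `insert`, and `_insert` are hypothetical names, the order is fixed at d = 2, data entries are bare keys, and next-leaf pointers are left out to keep the split logic in view.

```python
from bisect import bisect_right, insort

D = 2  # order of the tree: non-root nodes hold between D and 2*D entries

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])  # root split: one level taller

def _insert(node, key):
    """Return None, or (separator, new_right_node) if node had to split."""
    if node.children is None:
        insort(node.keys, key)
        if len(node.keys) <= 2 * D:
            return None
        # Leaf split: the middle key is COPIED up and stays in the new leaf.
        right = Node(node.keys[D:])
        node.keys = node.keys[:D]
        return right.keys[0], right
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    separator, right = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * D:
        return None
    # Index split: the middle key is PUSHED up and kept in neither half.
    middle = node.keys[D]
    new_right = Node(node.keys[D + 1:], node.children[D + 1:])
    node.keys = node.keys[:D]
    node.children = node.children[:D + 1]
    return middle, new_right

# A full root over five leaves; inserting 8 splits a leaf, then the root.
root = Node([13, 17, 24, 30], [
    Node([2, 3, 5, 7]), Node([14, 16]), Node([19, 20, 22]),
    Node([24, 27, 29]), Node([33, 34, 38, 39]),
])
root = insert(root, 8)
```

The only difference between the two split branches is what happens to the middle key: a leaf must keep it because leaves hold the data entries, while an index node can give it away because its keys only direct the search.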
Inserting 8* into Example B+ Tree
Observe how minimum occupancy is guaranteed in
both leaf and index pg splits.
Note difference between copy-up and push-up; be sure you
understand the reasons for this.
Leaf split: [2* 3* 5* 7*] plus 8* becomes [2* 3*] and [5* 7* 8*].
Entry to be inserted in parent node: 5.
(Note that 5 is copied up and continues to appear in the leaf.)
Index split: [5 13 17 24 30] becomes [5 13] and [24 30].
Entry to be inserted in parent node: 17.
(Note that 17 is pushed up and only appears once in the index.
Contrast this with a leaf split.)
Example B+ Tree After Inserting 8*
Root:        [17]
Index pages: [5 13] [24 30]
Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
Notice that root was split, leading to increase in height.
In this example, we can avoid the split by
re-distributing entries; however, this is
usually not done in practice.
Deleting a Data Entry from a B+ Tree
Start at root, find leaf L where entry belongs.
Remove the entry.
– If L is at least half-full, done!
– If L has only d-1 entries,
  Try to re-distribute, borrowing from sibling (adjacent
  node with same parent as L).
  If re-distribution fails, merge L and sibling.
If merge occurred, must delete entry (pointing to L
or sibling) from parent of L.
Merge could propagate to root, decreasing height.
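The leaf-level part of deletion can be sketched as follows. This is a deliberately narrow illustration: the hypothetical function `delete_from_leaf` works on one index node and its leaf children, borrows a single entry at a time, tries the right sibling before the left (the slides do not say which comes first), and does not propagate underflow of the index node itself.

```python
D = 2  # order: a non-root node must keep at least D entries

def delete_from_leaf(keys, leaves, i, key):
    """Delete key from leaves[i], the i-th leaf under one index node.

    keys   -- separator keys of the parent index node
    leaves -- the parent's leaf children, each a sorted list of entries
    Returns "ok", "redistributed" or "merged".
    """
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= D:
        return "ok"
    # Underflow: try to borrow from a sibling that has an entry to spare.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > D:
        right = leaves[i + 1]
        leaf.append(right.pop(0))
        keys[i] = right[0]      # copy up the right sibling's new low key
        return "redistributed"
    if i > 0 and len(leaves[i - 1]) > D:
        left = leaves[i - 1]
        leaf.insert(0, left.pop())
        keys[i - 1] = leaf[0]   # copy up this leaf's new low key
        return "redistributed"
    # No sibling can lend: merge two adjacent leaves and toss their separator.
    j = i if i + 1 < len(leaves) else i - 1
    leaves[j].extend(leaves[j + 1])
    del leaves[j + 1]
    del keys[j]
    return "merged"

# An index node [24 30] over three leaves, with three deletions in turn.
keys = [24, 30]
leaves = [[19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
steps = [delete_from_leaf(keys, leaves, 0, k) for k in (19, 20, 24)]
```

After the merge the index node is left with a single key, which is below the minimum of D entries; a full implementation would then redistribute or merge at the index level, which is how a merge can propagate toward the root.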
Example Tree After (Inserting 8*, Then) Deleting 19* and 20* ...
Root:        [17]
Index pages: [5 13] [27 30]
Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
Deleting 19* is easy.
Deleting 20* is done with re-distribution.
Notice how middle key is copied up.
... And Then Deleting 24*
Must merge.
Observe 'toss' of index entry (first figure), and
'pull down' of index entry (second figure).
Index node after the leaf merge: [30], over leaves [22* 27* 29*] [33* 34* 38* 39*]
Root:   [5 13 17 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
Summary
Tree-structured indexes are ideal for range-searches,
also good for equality searches.
ISAM is a static structure.
– Performance can degrade over time.
B+ tree is a dynamic structure.
– Inserts/deletes leave tree height-balanced; log_F N
cost.
– High fanout (F) means depth rarely more than 3
or 4.
– Almost always better than maintaining a sorted
file.
Summary (Contd.)
– Typically, 67% occupancy on average.
– Usually preferable to ISAM, modulo locking
considerations; adjusts to growth gracefully.
Most widely used index in database
management systems because of its
versatility. One of the most optimized
components of a DBMS.