Tree-Structured Indexes: Comp 521 - Files and Databases Fall 2010 1
Indexes
Chapter 10
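[Figure: index file with entries k1, k2, ..., kN.]
[Figure: index page format <P0, K1, P1, K2, P2, ..., Km, Pm>: m keys and m+1 pointers.]
[Figure: ISAM structure: non-leaf pages above the leaf pages; the leaf level consists of primary pages plus overflow pages.]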
Leaf pages contain data entries (i.e., actual records or <key, rid> pairs).
Comments on ISAM
File creation: Leaf (data) pages are allocated sequentially, sorted by search key; then index pages are allocated, then space for overflow pages.
[Figure: file layout, in order: data pages, index pages, overflow pages.]
Index entries: <search key value, page id>; they 'direct' the search for data entries, which are in leaf pages.
Search: Start at root; use key comparisons to go to leaf. Cost is $\log_F N$, where F = # entries/index pg, N = # leaf pgs.
Insert: Find the leaf the data entry belongs to, put it there if space is available, else allocate an overflow page, put it there, and link it in.
Delete: Find and remove from leaf; if this empties an overflow page, de-allocate it.
Static tree structure: inserts/deletes affect only leaf pages.
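The search, insert, and delete rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the index is a single sorted list of separator keys rather than a multi-level tree, pages live in memory, and the names `Page`, `Isam`, `PAGE_CAPACITY`, and `separators` are hypothetical, not taken from the slides.

```python
from bisect import bisect_right

PAGE_CAPACITY = 2  # assumed: entries per page, as in a two-entry example


class Page:
    """A primary leaf page or an overflow page holding data entries."""

    def __init__(self, entries=None):
        self.entries = list(entries or [])
        self.overflow = None  # next overflow page in the chain, if any


class Isam:
    """Hypothetical one-level ISAM file: a static index over primary leaves."""

    def __init__(self, sorted_keys):
        # File creation: leaf pages are filled sequentially in key order.
        self.leaves = [Page(sorted_keys[i:i + PAGE_CAPACITY])
                       for i in range(0, len(sorted_keys), PAGE_CAPACITY)]
        # Static index: the lowest key of every leaf except the first.
        # It is built once and never modified afterwards.
        self.separators = [leaf.entries[0] for leaf in self.leaves[1:]]

    def _leaf_for(self, key):
        # Key comparisons direct the search to exactly one primary leaf.
        return self.leaves[bisect_right(self.separators, key)]

    def search(self, key):
        page = self._leaf_for(key)
        while page is not None:          # primary page, then overflow chain
            if key in page.entries:
                return True
            page = page.overflow
        return False

    def insert(self, key):
        page = self._leaf_for(key)
        while len(page.entries) >= PAGE_CAPACITY:
            if page.overflow is None:
                page.overflow = Page()   # allocate and link an overflow page
            page = page.overflow
        page.entries.append(key)

    def delete(self, key):
        prev, page = None, self._leaf_for(key)
        while page is not None and key not in page.entries:
            prev, page = page, page.overflow
        if page is None:
            return False
        page.entries.remove(key)
        # Only an emptied overflow page is de-allocated; primary pages stay.
        if prev is not None and not page.entries:
            prev.overflow = page.overflow
        return True
```

Note that `delete` never touches `separators`: a key can remain in the index after its data entry is gone, which is what "static tree structure" means in practice.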
Example ISAM Tree
Each node can hold 2 entries; no need for 'next-leaf-page' pointers. (Why?)
[Figure: example ISAM tree, with index pages above the primary leaf pages.]
Root: 40
Index pages: [20 33] [51 63]
Primary leaf pages: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]
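[Figure: the ISAM tree with 51* and 97* no longer in the leaf pages; 51 still appears in the index pages.]
Root: 40
Index pages: [20 33] [51 63]
Leaf entries: 10* 15* 20* 27* 33* 37* 40* 46* 55* 63*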
[Figure: index entries (direct search) above data entries (the "sequence set").]
Example B+ Tree
Search begins at root, and key comparisons
direct it to a leaf (as in ISAM).
Search for 5*, 15*, all data entries >= 24* ...
[Figure: example B+ tree.]
Root: 13 17 24 30
Leaf pages: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
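The three searches named on this slide (5*, 15*, and all data entries >= 24*) can be traced with a small sketch. The class and function names (`Leaf`, `Index`, `find_leaf`, `scan_from`) are hypothetical, and the grouping of entries into leaf pages is an assumption inferred from the root keys 13, 17, 24, 30.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # next-leaf pointer, used by range scans


class Index:
    def __init__(self, keys, children):
        self.keys = keys          # m separator keys
        self.children = children  # m + 1 child pointers


def find_leaf(node, key):
    """Descend from the root; key comparisons pick one child per level."""
    while isinstance(node, Index):
        # Keys equal to a separator go to the right of it.
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    """Equality search: is there a data entry with this key?"""
    return key in find_leaf(root, key).keys


def scan_from(root, low):
    """Yield every data entry >= low by walking the linked leaves."""
    leaf = find_leaf(root, low)
    while leaf is not None:
        for k in leaf.keys:
            if k >= low:
                yield k
        leaf = leaf.next


def example_tree():
    # Assumed leaf layout, consistent with the root keys of the example.
    leaves = [Leaf([2, 3, 5, 7]), Leaf([14, 16]), Leaf([19, 20, 22]),
              Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Index([13, 17, 24, 30], leaves)
```

With `root = example_tree()`, `search(root, 5)` finds the entry, `search(root, 15)` reaches the leaf holding 14* and 16* and reports that 15* is absent, and `scan_from(root, 24)` starts at the leaf containing 24* and follows the leaf chain to the end.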
... understand the reasons for this.
[Figure: the example tree with 8* added.]
Index keys: 5 13 24 30
Leaf entries: 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
[Figure: B+ tree.]
Root: 17
Index pages: [5 13] [27 30]
Leaf pages: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
Result:
Root: 5 13 17 30
[Figure: B+ tree.]
Root: 22
Index pages: [5 13 17 20] [30]
Leaf pages: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
[Figure: B+ tree.]
Root: 17
Index pages: [5 13] [20 22 30]
Leaf pages: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
[Figure: sorted data entries 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*; the right-hand pages are marked "Data entry pages not yet in B+ tree".]
Entries for leaf pages are always entered into the right-most index page just above the leaf level. When this fills up, it splits. (The split may go up the right-most path to the root.)
[Figure: index keys 6 12 23 35 above the leaf pages.]
[Figure: after a split: Root 20; next level 10 35; next level 6 12 23 38; remaining data entry pages not yet in B+ tree.]
Much faster than repeated inserts, especially if one considers locking!
Summary of Bulk Loading
Option 1: multiple inserts.
Slow.
Does not give sequential storage of leaves.
Option 2: Bulk Loading
Has advantages for concurrency control.
Fewer I/Os during build.
Leaves will be stored sequentially (and linked, of course).
Can control “fill factor” on pages.
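As a rough illustration of Option 2, the sketch below builds a tree bottom-up from sorted data entries and honors a fill factor. It is a simplification, not the procedure from the slides: it packs each level in one pass instead of inserting into the right-most index page and splitting, so the resulting index pages can differ from the figures. All names (`Node`, `bulk_load`, `leaf_entries`) and parameters (`leaf_capacity`, `fanout`, `fill_factor`) are hypothetical.

```python
import math


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf
        self.next = None          # leaf chain pointer


def bulk_load(sorted_entries, leaf_capacity=4, fanout=5, fill_factor=1.0):
    """Build a B+ tree bottom-up from already sorted data entries."""
    assert fanout >= 3, "sketch assumes at least 3 children per index page"
    if not sorted_entries:
        return Node([])
    per_leaf = max(1, math.floor(leaf_capacity * fill_factor))
    # Leaves are created one after another, so they end up sequential.
    level = [Node(sorted_entries[i:i + per_leaf])
             for i in range(0, len(sorted_entries), per_leaf)]
    for left, right in zip(level, level[1:]):
        left.next = right
    lows = [leaf.keys[0] for leaf in level]  # smallest key under each node

    while len(level) > 1:
        groups = [list(range(i, min(i + fanout, len(level))))
                  for i in range(0, len(level), fanout)]
        # Avoid a trailing index page with a single child and no key.
        if len(groups) > 1 and len(groups[-1]) == 1:
            groups[-1].insert(0, groups[-2].pop())
        parents, parent_lows = [], []
        for g in groups:
            # Separator i is the smallest key under child i + 1.
            parents.append(Node([lows[i] for i in g[1:]],
                                [level[i] for i in g]))
            parent_lows.append(lows[g[0]])
        level, lows = parents, parent_lows
    return level[0]


def leaf_entries(root):
    """Walk down to the left-most leaf, then follow the leaf chain."""
    node = root
    while node.children is not None:
        node = node.children[0]
    out = []
    while node is not None:
        out.extend(node.keys)
        node = node.next
    return out
```

Lowering `fill_factor` leaves free slots in every leaf, trading a larger file for fewer splits on later inserts.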