B+ Tree and Hashing in DBMS
Index record format: search-key | pointer
• Index files are typically much smaller than the original file
In a dense secondary (non-clustering) index, the index must store a list of pointers to all
records with the same search-key value.
2. Sparse Index Files
Sparse Index: contains index records for only some
search-key values.
– Applicable when records are sequentially ordered on search-key
• We can have a secondary index with an index record for each search-key
value
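A sparse index lookup can be sketched as follows. This is a minimal illustration, not taken from the slides: the block contents, the payload strings, and the names `blocks`, `sparse_keys` and `sparse_lookup` are all hypothetical. The idea is to find the last index entry whose key is less than or equal to the search key, then scan the file sequentially from the block it points to.

```python
from bisect import bisect_right

# Hypothetical file, sequentially ordered on the search key and stored in blocks.
blocks = [
    [(10, "a"), (15, "b")],
    [(20, "c"), (27, "d")],
    [(33, "e"), (37, "f")],
]

# Sparse index: one entry per block (the first search-key value in that block).
sparse_keys = [block[0][0] for block in blocks]


def sparse_lookup(key):
    """Return the payload stored under `key`, or None if it is absent."""
    # Last index entry whose key is <= the search key.
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    # Sequential scan inside the block the index entry points to.
    for k, payload in blocks[pos]:
        if k == key:
            return payload
    return None
```

The index holds three entries for six records, which is the space saving a sparse index buys at the cost of a short sequential scan.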
Secondary Indices Example
▪ Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
▪ Secondary indices have to be dense
Q: Does it make sense to create a Sparse index file for secondary index?
Multilevel Index
• If index does not fit in memory, access becomes
expensive.
• Solution: treat index kept on disk as a sequential
file and construct a sparse index on it.
– outer index – a sparse index of the basic index
– inner index – the basic index file
• If even outer index is too large to fit in main
memory, yet another level of index can be
created, and so on.
• Indices at all levels must be updated on
insertion or deletion from the file.
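The two-level lookup described above can be sketched like this. It is only an illustration under assumed values: the keys in `inner`, the `fanout` of 3, and the helper names are hypothetical. The outer index is a sparse index over the inner index, so a lookup touches one outer entry and then one block of the inner index.

```python
from bisect import bisect_right


def build_sparse(keys, fanout):
    """Sparse index over a sorted key list: first key of every fanout-sized block."""
    return [keys[i] for i in range(0, len(keys), fanout)]


def floor_pos(keys, key):
    """Position of the last entry <= key, clamped to 0."""
    return max(bisect_right(keys, key) - 1, 0)


def lookup(inner, outer, fanout, key):
    """Return the position of the inner-index entry that covers `key`."""
    block = floor_pos(outer, key)            # pick one block of the inner index
    start = block * fanout
    segment = inner[start:start + fanout]    # only this block is searched
    return start + floor_pos(segment, key)


inner = [10, 20, 33, 40, 51, 63, 70, 85]     # basic (inner) index, hypothetical keys
outer = build_sparse(inner, fanout=3)         # sparse (outer) index on the inner index
```

If the outer index were itself too large, `build_sparse` could be applied again to `outer` to obtain a third level.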
Multilevel Index (Cont.)
Index Update: Deletion
[Figure: ISAM structure. Non-leaf pages (e.g., 20 33 51 63) sit above the leaf pages (10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*). The leaf pages are the primary pages; overflow pages are attached to them.]
ISAM File Creation
▪ How to create an ISAM file?
▪ All leaf pages are allocated sequentially and
sorted on the search key value
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)
▪ Insert 23*
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
23*
▪ Insert 48*
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
23* 48*
▪ Insert 41*
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
▪ Insert 42*
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
▪ Delete 42*
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
42*
ISAM: Deleting Entries
▪ The appropriate page is determined as for a search, and the
entry is deleted (with ONLY overflow pages removed when
becoming empty)
▪ Delete 42*
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
▪ Delete 51*
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
Note that 51 still appears in an index entry, but not in the leaf!
▪ Delete 55*
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 55* 63* 97*
▪ If the data distribution and size are relatively static, ISAM might
be a good choice to pursue!
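The ISAM insert and delete behaviour walked through above can be sketched as follows. This is a simplified model under stated assumptions: each page holds 2 entries (as the diagrams suggest), the two-level index (40 over 20 33 and 51 63) is flattened into one sorted separator list, and the class name `IsamFile` and its fields are hypothetical.

```python
from bisect import bisect_right

PAGE_CAPACITY = 2  # assumed entries per page


class IsamFile:
    def __init__(self, separators, leaves):
        self.separators = separators                # static index keys, never updated
        self.primary = [list(p) for p in leaves]    # primary leaf pages, never deallocated
        self.overflow = [[] for _ in leaves]        # per-leaf chain of overflow pages

    def _leaf(self, key):
        # Same page selection as for a search.
        return bisect_right(self.separators, key)

    def insert(self, key):
        i = self._leaf(key)
        if len(self.primary[i]) < PAGE_CAPACITY:
            self.primary[i].append(key)
            self.primary[i].sort()
            return
        chain = self.overflow[i]
        if not chain or len(chain[-1]) == PAGE_CAPACITY:
            chain.append([])                        # add an overflow page if necessary
        chain[-1].append(key)

    def delete(self, key):
        i = self._leaf(key)
        if key in self.primary[i]:
            self.primary[i].remove(key)             # page is kept even if it becomes empty
            return
        for page in self.overflow[i]:
            if key in page:
                page.remove(key)
                break
        # ONLY overflow pages are removed when they become empty.
        self.overflow[i] = [p for p in self.overflow[i] if p]


isam = IsamFile([20, 33, 40, 51, 63],
                [[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]])
for k in (23, 48, 41, 42):
    isam.insert(k)
isam.delete(42)
isam.delete(51)
```

After these operations 51 is gone from its leaf but still present in `isam.separators`, which mirrors the note that 51 still appears in an index entry.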
Dynamic Trees
▪ ISAM indices are static
▪ Long overflow chains can develop as the file grows, leading to
poor performance
[Figure: a B+ tree non-leaf node with keys k1 k2 … kn and pointers p1 … pn+1. p1 points to a sub-tree in which all keys are less than k1; pn+1 points to a sub-tree in which all keys are greater than or equal to kn.]
Database System Concepts - 7th Edition 14.32 ©Silberschatz, Korth and Sudarshan
B+ Tree: Searching for Entries
▪ Search begins at root, and key comparisons direct it
to a leaf (as in ISAM)
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
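The search on the tree above can be sketched as follows. The `Node` class and `search` function are hypothetical names, and the grouping of the data entries into leaves is an assumption read off the diagram. At each non-leaf node, the key comparisons pick the child whose key range contains the search key; keys equal to a separator go to the right.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children   # None marks a leaf


def search(node, key):
    """Descend from the root to a leaf, then look for the data entry."""
    while node.children is not None:
        # Child i holds keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys


leaves = [
    Node([2, 3, 5, 7]),
    Node([14, 16]),
    Node([19, 20, 22]),
    Node([24, 27, 29]),
    Node([33, 34, 38, 39]),
]
root = Node([13, 17, 24, 30], leaves)
```

Searching for 13 reaches the leaf 14* 16* and fails: 13 is only a separator in the root, not a data entry.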
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
Root
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
2* 3* 5* 7* 8*
B+ Tree: Examples of Insertions
▪ Insert entry 8*
[Figure: the leaf 2* 3* 5* 7* is full, so inserting 8* splits it into 2* 3* and 5* 7* 8*, and 5 is copied up. The root 13 17 24 30 is also full, so it splits into 5 13 and 24 30, and 17 moves up into a new root.]
Root
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
FINAL TREE!
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13 17 24 30
Do it through the parent
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root
8
13 17 24 30
Do it through the parent
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
▪ Non-Leaf Nodes
▪ Checking whether redistribution is possible usually
increases I/O
▪ Splitting non-leaf nodes typically pays off!
B+ Insertions: Keep in Mind
▪ Every data entry must appear in a leaf node;
hence, “copy up” the middle key upon splitting
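The difference between splitting a leaf and splitting a non-leaf node can be sketched as below. The function names are hypothetical, and the non-leaf behaviour (the middle key moves up and is kept in neither half) is the standard B+ tree rule rather than something this slide states. The inputs are the overfull nodes from the insert-8* example.

```python
def split_leaf(entries):
    """Split an overfull leaf; the first key of the right half is COPIED up."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right[0], right          # the separator stays in the right leaf


def split_internal(keys):
    """Split an overfull non-leaf node; the middle key is PUSHED up."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]   # the separator leaves both halves


leaf_left, leaf_sep, leaf_right = split_leaf([2, 3, 5, 7, 8])
node_left, node_sep, node_right = split_internal([5, 13, 17, 24, 30])
```

Here 5 still appears in the right leaf after the split, while 17 appears only in the new root.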
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*
FINAL TREE!
B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*
The sibling is ‘rich’ (i.e., can lend an entry); hence, remove 20* and redistribute!
Root
17 Is it done?
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*
“Copy up” 27*, the lowest value in the leaf from which we borrowed 24*
Root
17
5 13 27 30
2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*
FINAL TREE!
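The redistribution step in the Delete 20* example can be sketched as follows. It assumes a minimum leaf occupancy of 2 entries, which matches the diagrams but is not stated outright; the function name is hypothetical and only the right sibling is considered.

```python
MIN_ENTRIES = 2  # assumed minimum number of entries per leaf


def delete_with_redistribution(leaf, right_sibling, key):
    """Remove key; if the leaf underflows and the sibling is 'rich', borrow one entry.

    Returns (leaf, right_sibling, new_separator); new_separator is None
    when no redistribution took place.
    """
    leaf = [k for k in leaf if k != key]
    right_sibling = list(right_sibling)
    new_separator = None
    if len(leaf) < MIN_ENTRIES and len(right_sibling) > MIN_ENTRIES:
        leaf.append(right_sibling.pop(0))   # borrow the sibling's lowest entry
        new_separator = right_sibling[0]    # copy up the sibling's new lowest key
    return leaf, right_sibling, new_separator


result = delete_with_redistribution([20, 22], [24, 27, 29], 20)
```

The returned separator 27 replaces 24 in the parent, which gives the index entries 27 30 shown in the final tree.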
B+ Tree: Examples of Deletions
▪ Delete 24*
Root
17
5 13 27 30
2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*
The affected leaf will contain only 1 entry and the sibling cannot lend
any entry (i.e., redistribution is not applicable); hence, merge!
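The merge case can be sketched in the same style. Again this assumes a minimum occupancy of 2 entries per leaf and uses hypothetical names; it models only the leaf-level merge and the removal of the separator from the parent, not any further underflow in the parent.

```python
MIN_ENTRIES = 2  # assumed minimum number of entries per leaf


def delete_with_merge(parent_keys, sep_index, leaf, right_sibling, key):
    """Remove key, merge the underfull leaf with its right sibling, drop the separator.

    Returns (merged_leaf, new_parent_keys).
    """
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= MIN_ENTRIES or len(right_sibling) > MIN_ENTRIES:
        raise ValueError("no merge needed: no underflow, or the sibling can lend")
    merged = leaf + list(right_sibling)
    # The separator pointed at a page that no longer exists, so it is removed.
    new_parent = parent_keys[:sep_index] + parent_keys[sep_index + 1:]
    return merged, new_parent


merged_leaf, parent_after = delete_with_merge([27, 30], 0, [22, 24], [27, 29], 24)
```

The parent is left with a single key, 30, so in a full implementation the underflow would have to be handled one level up as well.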
▪ Delete 24*
▪ “Toss” 27 because the page that it was pointing to does not exist anymore!
Root
17
5 13 27 30
2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*
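[Figure: after 27 is tossed, the index node that holds only 30 is merged with its sibling 5 13, and the key 17 from the old root moves down into the merged node, which becomes the new root.]
FINAL TREE!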
B+ Tree: Bulk Loading
▪ Assume a collection of data records with an existing B+ tree
index on it
▪ How to add a new record to it?
▪ Use the B+ tree insert() function
Root
Sorted pages of data entries; not yet in B+ tree
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Add one entry to the root page for each subsequent page of
the sorted data entries (i.e., <lowest key value on page,
pointer to the page>)
Root
Sorted pages of data entries; not yet in B+ tree
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Root
6
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Root
6 10
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Split the root and create a new root page
Root
6 10
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Root
10
‘push up’ the middle key
6 12
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Continue by inserting entries into the right-most index page
just above the leaf level; split it when it fills up
Root
10
6 12
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Root 10 20
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Root 20
3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
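Bulk loading over already sorted pages can be sketched as below. Note that this is a simplified bottom-up variant, not the exact procedure in the slides: instead of inserting into the right-most index page and splitting it when it fills up, it builds each index level in one pass. The grouping of entries into pages of 2, the fan-out of 3, and all names are assumptions. Each index entry is still a pair of lowest key value on a page and pointer to that page.

```python
from bisect import bisect_right


def bulk_load(pages, fanout):
    """Build a B+ tree bottom-up over sorted pages of data entries; return the root."""
    level = [{"low": p[0], "keys": list(p), "children": None} for p in pages]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append({
                "low": group[0]["low"],
                # one <lowest key, pointer> entry for each child after the first
                "keys": [child["low"] for child in group[1:]],
                "children": group,
            })
        level = parents
    return level[0]


def contains(node, key):
    """Standard root-to-leaf search, used here to check the result."""
    while node["children"] is not None:
        node = node["children"][bisect_right(node["keys"], key)]
    return key in node["keys"]


pages = [[3, 4], [6, 9], [10, 11], [12, 13], [20, 22],
         [23, 31], [35, 36], [38, 41], [44]]
root = bulk_load(pages, fanout=3)
```

Either way, the data entries are sorted once and each leaf page is written once, which is the point of bulk loading compared with calling insert() for every entry.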
Hashing
Hash-Based Indexing
▪ What indexing technique can we use to support range
searches (e.g., “Find s_name where gpa >= 3.0”)?
▪ Tree-Based Indexing
[Figure: Static hashing. A key is hashed and h(key) mod N selects one of the N primary bucket pages, numbered 0 to N-1. With Static Hashing, primary bucket pages are allocated sequentially and never de-allocated; overflow pages are allocated (as needed) when the corresponding buckets become full.]
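Static hashing with overflow pages can be sketched as follows. The class name, the page capacity of 2, and the choice of the identity function for h on integer keys are all assumptions made for illustration.

```python
class StaticHashFile:
    def __init__(self, n_buckets, page_capacity):
        self.n = n_buckets
        self.capacity = page_capacity
        # Each bucket is a chain of pages; chain[0] is the primary page,
        # allocated up front and never de-allocated.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def _bucket(self, key):
        # h(key) mod N, with h assumed to be the identity on integer keys.
        return key % self.n

    def insert(self, key):
        chain = self.buckets[self._bucket(key)]
        if len(chain[-1]) == self.capacity:
            chain.append([])          # allocate an overflow page as needed
        chain[-1].append(key)

    def search(self, key):
        # Every page in the chain may have to be read.
        return any(key in page for page in self.buckets[self._bucket(key)])


table = StaticHashFile(n_buckets=4, page_capacity=2)
for k in (4, 8, 12, 5):
    table.insert(k)
```

Because N is fixed, a growing file only lengthens the chains, and a search may have to read every page of a chain.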
Deficiencies of Static Hashing
Extendible Hashing
▪ Extendible Hashing uses a directory of pointers to buckets
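[Figure: a DIRECTORY with entries 00, 01, 10, 11 (GLOBAL DEPTH 2) points to the DATA PAGES: Bucket A = 4* 12* 32* 16*, Bucket B = 1* 5* 21*, Bucket C = 10*, Bucket D = 15* 7* 19*. Example: 5 = 101 in binary, so its last two bits 01 select Bucket B.]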
Extendible Hashing: Inserting Entries
▪ An entry can be inserted as follows:
▪ Find the appropriate bucket (as in search)
[Figure: 13 = 1101 ends in 01, so 13* is placed in Bucket B, which becomes 1* 5* 21* 13*. Bucket A (4* 12* 32* 16*), Bucket C (10*) and Bucket D (15* 7* 19*) are unchanged.]
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry
[Figure: 20 = 10100 ends in 00 and maps to Bucket A (4* 12* 32* 16*), which is already full. Bucket B = 1* 5* 21* 13*, Bucket C = 10*, Bucket D = 15* 7* 19*.]
[Figure: Bucket A is split into Bucket A (32* 16*) and Bucket A2 (4* 12* 20*), the `split image' of Bucket A. The DIRECTORY still has the four entries 00, 01, 10, 11. Is this enough?]
[Figure: Bucket A (32* 16*) and Bucket A2 (4* 12* 20*) after the split, with the four-entry DIRECTORY. Double the directory and increase the global depth.]
[Figure: after doubling, the DIRECTORY has eight entries (GLOBAL DEPTH 3). Bucket A holds 32* 16* and Bucket A2, the `split image' of Bucket A, holds 4* 12* 20*.]
▪ Example: insert 9*
[Figure: GLOBAL DEPTH 3, DIRECTORY entries 000 to 111. 9 = 1001 maps to Bucket B, which is split into Bucket B (1* 9*) and Bucket B2 (5* 21* 13*), the `split image' of B. Bucket A = 32* 16*, Bucket C = 10*, Bucket D = 15* 7* 19*, Bucket A2 = 4* 12* 20* (`split image' of A). Almost there…]
[Figure: the same state, with Bucket B (1* 9*) and Bucket B2 (5* 21* 13*, `split image' of B). There was no need to double the directory! When NOT to double the directory?]
▪ Each bucket has a LOCAL DEPTH; the directory has a GLOBAL DEPTH
▪ If a bucket whose local depth equals the global depth is split, the directory must be doubled
[Figure: GLOBAL DEPTH 3. Local depths: Bucket A (32* 16*) 3, Bucket B (1* 9*) 3, Bucket C (10*) 2, Bucket D (15* 7* 19*) 2, Bucket A2 (4* 12* 20*, `split image' of A) 3, Bucket B2 (5* 21* 13*, `split image' of B) 3.]
Extendible Hashing: Inserting Entries
▪ Example: insert 9*
Repeat…
[Figure: GLOBAL DEPTH 3. 9 = 1001 maps to Bucket B (1* 5* 21* 13*, local depth 2), which is FULL, hence, split! Other buckets: Bucket A (32* 16*, local depth 3), Bucket C (10*, local depth 2), Bucket D (15* 7* 19*, local depth 2), Bucket A2 (4* 12* 20*, `split image' of Bucket A, local depth 3).]
Because the local depth (i.e., 2) is less than the global depth (i.e., 3), there is NO need to double the directory
[Figure: Bucket B is split into Bucket B (1* 9*) and Bucket B2 (5* 21* 13*, `split image' of B), both with local depth 3. The DIRECTORY is unchanged (GLOBAL DEPTH 3); Bucket A (32* 16*) and Bucket A2 (4* 12* 20*) have local depth 3, Bucket C (10*) and Bucket D (15* 7* 19*) have local depth 2. FINAL STATE!]
Extendible Hashing: Inserting Entries
▪ Example: insert 20*
Repeat…
[Figure: GLOBAL DEPTH 2. 20 = 10100 maps to Bucket A (4* 12* 32* 16*, local depth 2), which is FULL, hence, split! Other buckets: Bucket B (1* 5* 21* 13*), Bucket C (10*) and Bucket D (15* 7* 19*), all with local depth 2.]
Because the local depth (i.e., 2) equals the global depth (i.e., 2), the directory must be doubled
[Figure: Bucket A is split into Bucket A (32* 16*) and Bucket A2 (4* 12* 20*, `split image' of Bucket A), but the DIRECTORY still has four entries. Is this enough?]
[Figure: the DIRECTORY is doubled to eight entries (GLOBAL DEPTH 3), while Bucket A (32* 16*) and Bucket A2 (4* 12* 20*) still show local depth 2. Is this enough?]
[Figure: the local depths of Bucket A and Bucket A2 are increased to 3. Bucket B (1* 5* 21* 13*), Bucket C (10*) and Bucket D (15* 7* 19*) keep local depth 2. FINAL STATE!]
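The insert procedure for extendible hashing can be sketched as follows. This is a minimal model under several assumptions: the hash of an integer key is the key itself, the directory is indexed by the last GLOBAL DEPTH bits, buckets hold 4 entries, and the class and method names are hypothetical. Duplicate keys beyond one bucket's capacity are not handled.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth      # local depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.global_depth = 2
        self.directory = [Bucket(2) for _ in range(4)]   # entries 00, 01, 10, 11

    def _index(self, key):
        return key & ((1 << self.global_depth) - 1)      # last global_depth bits

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        while len(bucket.keys) == self.capacity:          # FULL, hence, split
            if bucket.depth == self.global_depth:
                # Local depth equals global depth: double the directory.
                self.directory = self.directory + self.directory
                self.global_depth += 1
            self._split(bucket)
            bucket = self.directory[self._index(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        bit = 1 << bucket.depth       # the next bit tells the two halves apart
        bucket.depth += 1
        image = Bucket(bucket.depth)  # the split image
        keys, bucket.keys = bucket.keys, []
        for k in keys:
            (image if k & bit else bucket).keys.append(k)
        # Redirect the directory entries that now belong to the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image


eh = ExtendibleHash()
for k in (4, 12, 32, 16, 1, 5, 21, 10, 15, 7, 19, 13, 20, 9):
    eh.insert(k)
```

Inserting 20* doubles the directory because Bucket A has local depth 2 and the global depth is 2; inserting 9* afterwards splits Bucket B without doubling, because its local depth 2 is less than the global depth 3.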
Bitmap Indices
▪ Bitmap indices are a special type of index designed for efficient querying on
multiple keys
▪ Records in a relation are assumed to be numbered sequentially from, say, 0
• Given a number n it must be easy to retrieve record n
▪ Particularly easy if records are of fixed size
▪ Applicable on attributes that take on a relatively small number of distinct values
• E.g., gender, country, state, …
• E.g., income-level (income broken up into a small number of levels such as
0-9999, 10000-19999, 20000-50000, 50000- infinity)
▪ A bitmap is simply an array of bits
Bitmap Indices (Cont.)
▪ In its simplest form a bitmap index on an attribute has a bitmap for each value of
the attribute
• Bitmap has as many bits as records
• In a bitmap for value v, the bit for a record is 1 if the record has the value v
for the attribute, and is 0 otherwise
▪ Example
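A bitmap index can be sketched as follows. The relation, the attribute values and the function names are hypothetical, and a Python integer stands in for the array of bits. Answering a query on two attributes by intersecting (AND-ing) their bitmaps is the usual way such indices are combined.

```python
def build_bitmaps(values):
    """One bitmap per distinct value; bit i is 1 if record i has that value."""
    bitmaps = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    return bitmaps


def record_ids(bitmap):
    """Record numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical relation with five records, numbered sequentially from 0.
gender = ["m", "f", "f", "m", "f"]
income_level = ["L1", "L2", "L1", "L4", "L3"]

gender_index = build_bitmaps(gender)
income_index = build_bitmaps(income_level)

# Records with gender = f AND income_level = L1: intersect the two bitmaps.
matches = record_ids(gender_index["f"] & income_index["L1"])
```

Each bitmap has as many bits as there are records, so the approach stays compact only when the attribute has few distinct values.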
Bitmap Indices (Cont.)
Spatial and Temporal Indices
Spatial Data
▪ Databases can store data types such as lines, polygons, in addition to raster images
• allows relational databases to store and retrieve spatial information
• Queries can use spatial conditions (e.g. contains or overlaps).
• queries can mix spatial and nonspatial conditions
▪ Nearest neighbor queries, given a point or an object, find the nearest object that
satisfies given conditions.
▪ Range queries deal with spatial regions, e.g., ask for objects that lie partially or
fully inside a specified region.
▪ Queries that compute intersections or unions of regions.
▪ Spatial join of two spatial relations with the location playing the role of join
attribute.
Indexing of Spatial Data
Division of Space by Quadtrees
▪ Each node of a quadtree is associated with a rectangular region of space; the top
node is associated with the entire target space.
▪ Each non-leaf node divides its region into four equal sized quadrants
• correspondingly each such node has four child nodes corresponding to the
four quadrants and so on
▪ Leaf nodes have between zero and some fixed maximum number of points (set to
1 in example).
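The division of space by a quadtree can be sketched as follows. The class name, the coordinate ranges and the sample points are hypothetical; the maximum of 1 point per leaf follows the example mentioned above. The sketch assumes distinct points, since inserting the same point twice would split forever.

```python
class QuadTree:
    MAX_POINTS = 1   # maximum number of points in a leaf

    def __init__(self, x0, y0, x1, y1):
        self.box = (x0, y0, x1, y1)   # rectangular region of this node
        self.points = []
        self.children = None          # None marks a leaf

    def _child_for(self, p):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        # Four equal sized quadrants, one child node per quadrant.
        self.children = [QuadTree(x0, y0, mx, my), QuadTree(mx, y0, x1, my),
                         QuadTree(x0, my, mx, y1), QuadTree(mx, my, x1, y1)]
        points, self.points = self.points, []
        for p in points:
            self._child_for(p).insert(p)

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.MAX_POINTS:
            self._split()

    def contains(self, p):
        node = self
        while node.children is not None:
            node = node._child_for(p)
        return p in node.points


tree = QuadTree(0, 0, 8, 8)   # the top node covers the entire target space
for point in [(1, 1), (6, 6), (7, 7)]:
    tree.insert(point)
```

Points that lie close together, such as (6, 6) and (7, 7), force repeated subdivision of the same quadrant, so the tree is deeper where the data is denser.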
R-Trees
Example R-Tree
▪ A set of rectangles (solid line) and the bounding boxes (dashed line) of the nodes
of an R-tree for the rectangles.
▪ The R-tree is shown on the right.
Search in R-Trees
Indexing Temporal Data
▪ Temporal data refers to data that has an associated time period (interval)
• Example: a temporal version of the course relation
Indexing Temporal Data (Cont.)
End of Chapter 14
Example of Hash Index