B+ Tree and Hashing in Dbms

This document discusses indexing in database systems. It begins with basic concepts of indices, which are used to speed up access to desired data. Index files contain index entries with search keys and pointers. Ordered indices store keys in sorted order, while hash indices distribute keys uniformly using a hash function. The document then covers ordered index types like dense and sparse indices, secondary indices, multilevel indices, indexing on multiple keys, ISAM trees, and insertion/deletion in indices.

Chapter 14: Indexing

Database System Concepts, 7th Ed.


©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Outline
• Basic Concepts
• Ordered Indices
• B+-Tree Index Files
• B-Tree Index Files
• Hashing
• Write-optimized indices
• Spatio-Temporal Indexing
Basic Concepts
• Indexing mechanisms used to speed up access to desired data.
– E.g., author catalog in library

• An index file consists of records (called index entries) of the form

search-key pointer

– Search key: an attribute or set of attributes used to look up records in a file.

• Index files are typically much smaller than the original file

• Two basic kinds of indices:


– Ordered indices: search keys are stored in sorted order
– Hash indices: search keys are distributed uniformly across “buckets” using a
“hash function”.
Index Evaluation Metrics
• Access types supported efficiently. E.g.,
– Records with a specified value in the attribute
– Records with an attribute value falling in a
specified range of values.
• Access time
• Insertion time
• Deletion time
• Space overhead
Ordered Indices
• In an ordered index, index entries are stored sorted on the search key
value.

• Clustering index: in a sequentially ordered file, the index whose search
key specifies the sequential order of the file.
– Also called primary index
– The search key of a primary index is usually but not necessarily the primary key.

• Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called non-clustering index.

• Index-sequential access file (ISAM): sequential file ordered on a
search key, with a clustering index on the search key.
– Designed for applications that require both sequential and random access to
individual/set of records.
Types of Ordered Indices
1. Dense Index Files
Dense index — Index record appears for every search-key value in the file.
– E.g. index on ID attribute of instructor relation

Each pointer consists of the identifier of a disk block and an offset within the disk block.
Dense Index Files (Cont.)

• Dense index on dept_name, with instructor file sorted on dept_name

In a dense secondary (non-clustering) index, the index must store a list of pointers to all
records with the same search-key value.
2. Sparse Index Files
Sparse Index: contains index records for only some
search-key values.
– Applicable when records are sequentially ordered on search-key

• To locate a record with search-key value K we:


– Find index record with largest search-key value ≤ K
– Search file sequentially starting at the record to which the index record points

Compared to dense indices:


• Less space and less maintenance overhead for insertions and deletions.
• Generally slower than dense index for locating records.
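
The sparse-index lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the file is modeled as a list of blocks of sorted search keys with one index entry per block, and the names `build_sparse_index` and `lookup` are hypothetical, not from the slides.

```python
from bisect import bisect_right

def build_sparse_index(blocks):
    """One index entry per block: (first search key in the block, block number)."""
    return [(block[0], i) for i, block in enumerate(blocks)]

def lookup(blocks, index, key):
    """Return (block number, offset) of key, or None if it is absent."""
    index_keys = [k for k, _ in index]
    pos = bisect_right(index_keys, key) - 1   # entry with largest search key <= key
    if pos < 0:
        return None                           # key is smaller than every indexed key
    # Search the file sequentially, starting at the block the entry points to.
    for b in range(index[pos][1], len(blocks)):
        for offset, k in enumerate(blocks[b]):
            if k == key:
                return (b, offset)
            if k > key:
                return None                   # passed the position where key would be
    return None

# Assumed sample data: three blocks of instructor IDs, sorted on the search key.
blocks = [[10101, 12121], [15151, 22222], [32343, 33456]]
index = build_sparse_index(blocks)
print(lookup(blocks, index, 22222))
```

The index holds three entries for six records, which is where the space saving comes from; the price is the sequential scan inside the block.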
Secondary Indices
• Frequently, one wants to find all the records whose values in a certain field
(which is not the search-key of the primary index) satisfy some condition.
– Example 1: In the instructor relation stored sequentially by ID, we may want to
find all instructors in a particular department
– Example 2: as above, but where we want to find all instructors with a specified
salary or with salary in a specified range of values

• We can have a secondary index with an index record for each search-key
value
Secondary Indices Example

▪ Secondary index on salary field of instructor

▪ Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
▪ Secondary indices have to be dense
Q: Does it make sense to create a Sparse index file for secondary index?
Multilevel Index
• If index does not fit in memory, access becomes
expensive.
• Solution: treat index kept on disk as a sequential
file and construct a sparse index on it.
– outer index – a sparse index of the basic index
– inner index – the basic index file
• If even outer index is too large to fit in main
memory, yet another level of index can be
created, and so on.
• Indices at all levels must be updated on
insertion or deletion from the file.
Multilevel Index (Cont.)
Index Update: Deletion

▪ If the deleted record was the only record in the file with its
particular search-key value, the search key is deleted from the index also.

• Single-level index entry deletion:


– Dense indices – deletion of search-key is similar to file
record deletion.
– Sparse indices –
• if an entry for the search key exists in the index, it is deleted
by replacing the entry in the index with the next search-key
value in the file (in search-key order).
• If the next search-key value already has an index entry, the
entry is deleted instead of being replaced.
Index Update: Insertion
• Single-level index insertion:
– Perform a lookup using the search-key value of the record
to be inserted.
– Dense indices – if the search-key value does not appear in
the index, insert it
• Indices are maintained as sequential files
• Need to create space for new entry, overflow blocks may be
required
– Sparse indices – if index stores an entry for each block of
the file, no change needs to be made to the index unless a
new block is created.
• If a new block is created, the first search-key value appearing in
the new block is inserted into the index.
• Multilevel insertion and deletion: algorithms are
simple extensions of the single-level algorithms
Indices on Multiple Keys
• Composite search key
– E.g., index on instructor relation on attributes
(name, ID)
– Values are sorted lexicographically
• E.g. (John, 12121) < (John, 13514) and
(John, 13514) < (Peter, 11223)
– Can query on just name, or on (name, ID)
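
A small sketch of why a composite index on (name, ID) also answers queries on name alone: entries are ordered lexicographically, so all entries sharing a name are contiguous. Python tuples happen to compare lexicographically, which makes this easy to show; the helper `find_by_name` is a hypothetical name used only for illustration.

```python
from bisect import bisect_left

# Index entries sorted on the composite search key (name, ID).
entries = sorted([("Peter", 11223), ("John", 13514), ("John", 12121)])

def find_by_name(entries, name):
    """Return all entries whose first attribute equals name (a prefix query)."""
    # The 1-tuple (name,) sorts before every (name, some_id), so this finds
    # the first entry for that name.
    start = bisect_left(entries, (name,))
    result = []
    for entry in entries[start:]:
        if entry[0] != name:
            break
        result.append(entry)
    return result

print(find_by_name(entries, "John"))
```

A query on ID alone gets no such help, because entries with the same ID are scattered across the ordering.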
ISAM Trees
▪ Indexed Sequential Access Method (ISAM) trees
are static
Root
40

Non-Leaf
Pages
20 33 51 63

Leaf 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
Pages

E.g., 2 Entries Per Page


ISAM Trees: Page Overflows
▪ What if there are a lot of insertions after creating
the tree?

Non-leaf
Pages

Leaf
Pages
Overflow
page
Primary pages
ISAM File Creation
▪ How to create an ISAM file?
▪ All leaf pages are allocated sequentially and
sorted on the search key value

▪ Or the data records are created and sorted before allocating leaf pages

▪ The non-leaf pages are subsequently allocated


ISAM: Searching for Entries
▪ Search begins at root, and key comparisons direct it
to a leaf

▪ Search for 27*


Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)

▪ Insert 23*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23*
ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)

▪ Insert 48*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23* 48*
ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)

▪ Insert 41*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23* 48* 41*


ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)

▪ Insert 42*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23* 48* 41*

Chains of overflow pages can easily develop! 42*


ISAM: Deleting Entries
▪ The appropriate page is determined as for a search, and the
entry is deleted (with ONLY overflow pages removed when
becoming empty)

▪ Delete 42*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23* 48* 41*

42*
ISAM: Deleting Entries
▪ The appropriate page is determined as for a search, and the
entry is deleted (with ONLY overflow pages removed when
becoming empty)

▪ Delete 42*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23* 48* 41*


ISAM: Deleting Entries
▪ The appropriate page is determined as for a search, and the
entry is deleted (with ONLY overflow pages removed when
becoming empty)

▪ Delete 51*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23* 48* 41*

Note that 51 still appears in an index entry, but not in the leaf!
ISAM: Deleting Entries
▪ The appropriate page is determined as for a search, and the
entry is deleted (with ONLY overflow pages removed when
becoming empty)

▪ Delete 55*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 55* 63* 97*

23* 48* 41*

Primary pages are NOT removed, even if they become empty!


ISAM: Some Issues
▪ Once an ISAM file is created, insertions and deletions affect only
the contents of leaf pages (i.e., ISAM is a static structure!)

▪ Since index-level pages are never modified, there is no need to
lock them during insertions/deletions
▪ Critical for concurrency!

▪ Long overflow chains can develop easily


▪ The tree can be initially set so that ~20% of each page is free

▪ If the data distribution and size are relatively static, ISAM might
be a good choice to pursue!
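
The ISAM behaviour walked through above (static index, overflow pages on insertion, primary pages never removed) can be sketched as below. This is a simplified model, not an implementation from the slides: the two index levels are flattened into one static list of separators, each overflow chain is a single list, and the class name `IsamFile` is hypothetical.

```python
from bisect import bisect_right

class IsamFile:
    def __init__(self, sorted_keys, page_size=2):
        self.page_size = page_size
        # Primary leaf pages are allocated once, sequentially, from sorted data.
        self.primary = [sorted_keys[i:i + page_size]
                        for i in range(0, len(sorted_keys), page_size)]
        self.overflow = [[] for _ in self.primary]
        # Static index: first key of every page but the first. Never modified.
        self.separators = [page[0] for page in self.primary[1:]]

    def _page(self, key):
        return bisect_right(self.separators, key)

    def search(self, key):
        p = self._page(key)
        return key in self.primary[p] or key in self.overflow[p]

    def insert(self, key):
        p = self._page(key)
        if len(self.primary[p]) < self.page_size:
            self.primary[p].append(key)
        else:
            self.overflow[p].append(key)      # overflow chain grows

    def delete(self, key):
        p = self._page(key)
        if key in self.overflow[p]:
            self.overflow[p].remove(key)      # an empty overflow list models a freed page
        elif key in self.primary[p]:
            self.primary[p].remove(key)       # primary page stays even if empty

f = IsamFile([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
for k in (23, 48, 41, 42):
    f.insert(k)
print(f.overflow)
```

Deleting 51 afterwards leaves 51 in `separators`, mirroring the slide's remark that the key still appears in an index entry but not in the leaf.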
Dynamic Trees
▪ ISAM indices are static
▪ Long overflow chains can develop as the file grows, leading to
poor performance

▪ This calls for more flexible, dynamic indices that adjust
gracefully to insertions and deletions
▪ No need to allocate the leaf pages sequentially as in ISAM

▪ Among the most successful dynamic index schemes is the B+ tree
B+ Tree Properties
• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between ⌈n/2⌉ and n
children.
• A leaf node has between ⌈(n–1)/2⌉ and n–1 values
• Special cases:
– If the root is not a leaf, it has at least 2 children.
– If the root is a leaf (that is, there are no other nodes in the tree), it can
have between 0 and (n–1) values.

Structure of a non-leaf node with keys k1, k2, …, kn and pointers p1, …, pn+1:
– p1 points to a sub-tree in which all keys are less than k1
– The pointer between k1 and k2 points to a sub-tree in which all keys are greater than or equal to k1 and less than k2
– pn+1 points to a sub-tree in which all keys are greater than or equal to kn
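
The occupancy rules listed above are easy to tabulate for a given n. The helper below is a small sketch (the function name `bplus_bounds` and the dictionary keys are made up for illustration) that returns the (minimum, maximum) pairs for each kind of node.

```python
from math import ceil

def bplus_bounds(n):
    """Occupancy limits for a B+ tree whose nodes hold up to n pointers."""
    return {
        "internal_children": (ceil(n / 2), n),
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        "root_children_if_not_leaf": (2, n),
        "root_values_if_leaf": (0, n - 1),
    }

for name, bounds in bplus_bounds(6).items():
    print(name, bounds)
```

For n = 6, internal nodes have 3 to 6 children and leaves hold 3 to 5 values.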
Example of B+-Tree

B+ Tree: Searching for Entries
▪ Search begins at root, and key comparisons direct it
to a leaf (as in ISAM)

▪ Example 1: Search for entry 5*


Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*


B+ Tree: Searching for Entries
▪ Search begins at root, and key comparisons direct it
to a leaf (as in ISAM)

▪ Example 2: Search for entry 15*


Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

15* is not found!


B+ Trees: Inserting Entries
▪ Find correct leaf L

▪ Put data entry onto L


▪ If L has enough space, done!
▪ Else, split L into L and a new node L2
▪ Re-partition entries evenly, copying up the middle key

▪ Parent node may overflow


▪ Push up middle key (splits “grow” trees; a root split
increases the height of the tree)
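
The insertion steps above can be sketched in code. This is a minimal illustration, not the slides' implementation: it only splits (no redistribution), omits leaf sibling pointers, and assumes every node holds at most `max_keys` keys; `Node`, `BPlusTree`, and `leaf` are hypothetical names. The tree built by hand at the bottom is the one used in the slides' "Insert entry 8*" example.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, is_leaf):
        self.is_leaf = is_leaf
        self.keys = []       # data entries in a leaf, separator keys otherwise
        self.children = []   # child nodes (empty for a leaf)

class BPlusTree:
    def __init__(self, max_keys=4):
        self.max_keys = max_keys
        self.root = Node(is_leaf=True)

    def search(self, key):
        node = self.root
        while not node.is_leaf:
            node = node.children[bisect_right(node.keys, key)]
        return key in node.keys

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                        # root split: the tree grows by one level
            sep, right = split
            new_root = Node(is_leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.is_leaf:
            insort(node.keys, key)
        else:
            i = bisect_right(node.keys, key)
            split = self._insert(node.children[i], key)
            if split:                    # child was split: add separator and pointer
                sep, right = split
                node.keys.insert(i, sep)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.max_keys:
            return None
        mid = len(node.keys) // 2
        sep = node.keys[mid]
        right = Node(node.is_leaf)
        if node.is_leaf:
            right.keys = node.keys[mid:]          # copy up: sep stays in the leaf
        else:
            right.keys = node.keys[mid + 1:]      # push up: sep leaves this node
            right.children = node.children[mid + 1:]
            node.children = node.children[:mid + 1]
        node.keys = node.keys[:mid]
        return sep, right

def leaf(*keys):
    n = Node(is_leaf=True)
    n.keys = list(keys)
    return n

tree = BPlusTree(max_keys=4)
tree.root = Node(is_leaf=False)
tree.root.keys = [13, 17, 24, 30]
tree.root.children = [leaf(2, 3, 5, 7), leaf(14, 16), leaf(19, 20, 22),
                      leaf(24, 27, 29), leaf(33, 34, 38, 39)]
tree.insert(8)
print(tree.root.keys, [c.keys for c in tree.root.children])
```

Under these assumptions the leaf split copies up 5, the old root overflows, and the root split pushes up 17.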
B+ Tree: Examples of Insertions
▪ Insert entry 8*

Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Leaf is full; hence, split!


B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

The middle key (i.e., 5) is “copied up”


5 and continues to appear in the leaf

2* 3* 5* 7* 8*
B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

5 5 13 17 24 30

> (n-1) keys


2* 3* 5* 7* 8*
Parent is full; hence, split!
B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

The middle key (i.e., 17)


is “pushed up”
5 17

2* 3* 5* 7* 8* 5 13 24 30
B+ Tree: Examples of Insertions
▪ Insert entry 8*
New root and its children:
17

5 13 24 30

Old root:

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

2* 3* 5* 7* 8*
B+ Tree: Examples of Insertions
▪ Insert entry 8*
FINAL TREE!
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Splitting the root led to an increase of height by 1!

What about re-distributing entries instead of splitting nodes?


B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Leaf is full; hence, check the sibling ‘Poor Sibling’


B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30
Do it through the parent

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

8
13 17 24 30
Do it through the parent

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

“Copy up” the new low key value!

But, when to redistribute and when to split?


Splitting vs. Redistributing
▪ Leaf Nodes
▪ Previous and next-neighbor pointers must be updated
upon insertions (if splitting is to be pursued)
▪ Hence, checking whether redistribution is possible does
not increase I/O
▪ Therefore, if a sibling can spare an entry, re-distribute

▪ Non-Leaf Nodes
▪ Checking whether redistribution is possible usually
increases I/O
▪ Splitting non-leaf nodes typically pays off!
B+ Insertions: Keep in Mind
▪ Every data entry must appear in a leaf node;
hence, “copy up” the middle key upon splitting

▪ When splitting index entries, simply “push up” the middle key

▪ Apply splitting and/or redistribution on leaf nodes

▪ Apply only splitting on non-leaf nodes


B+ Trees: Deleting Entries
▪ Start at root, find leaf L where entry belongs
▪ Remove the entry
▪ If L is at least half-full, done!
▪ If L underflows
▪Try to re-distribute (i.e., borrow from a “rich
sibling” and “copy up” its lowest key)
▪If re-distribution fails, merge L and a “poor
sibling”
▪Update parent
▪ And possibly merge, recursively
B+ Tree: Examples of Deletions
▪ Delete 19*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Removing 19* does not cause an underflow


B+ Tree: Examples of Deletions
▪ Delete 19*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

FINAL TREE!
B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Deleting 20* causes an underflow; hence, check a sibling for redistribution


B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

The sibling is ‘rich’ (i.e., can lend an entry); hence, remove 20* and redistribute!
B+ Tree: Examples of Deletions
▪ Delete 20*
Is it done?
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

“Copy up” 27*, the lowest value in the leaf from which we borrowed 24*
B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

“Copy up” 27*, the lowest value in the leaf from which we borrowed 24*
B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

FINAL TREE!
B+ Tree: Examples of Deletions
▪ Delete 24*
Root
17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

The affected leaf will contain only 1 entry and the sibling cannot lend
any entry (i.e., redistribution is not applicable); hence, merge!
B+ Tree: Examples of Deletions
▪ Delete 24*
“Toss” 27 because the page that it was pointing to does not exist anymore!
Root
17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

30
Merge

22* 27* 29* 33* 34* 38* 39*


B+ Tree: Examples of Deletions
▪ Delete 24*
Is it done? No, but almost there…
Root
17

5 13 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*


B+ Tree: Examples of Deletions
▪ Delete 24*
This entails an underflow; hence,
we must either redistribute or merge!
Root
17

5 13 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*


B+ Tree: Examples of Deletions
▪ Delete 24*
The sibling is “poor” (i.e., redistribution is not applicable); hence, merge!

Root
17

5 13 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*


B+ Tree: Examples of Deletions
▪ Delete 24*
Root
17

5 13 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*

Root
5 13 30

Lacks a pointer for 30!


B+ Tree: Examples of Deletions
▪ Delete 24*
Root
17

5 13 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*

Root
5 13 30

Lacks a key value to create a complete index entry!


B+ Tree: Examples of Deletions
▪ Delete 24*
Root
17

5 13 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*

“Pull down” 17!
Root
5 13 17 30
B+ Tree: Examples of Deletions
▪ Delete 24*
Root
5 13 17 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*

FINAL TREE!
B+ Tree: Bulk Loading
▪ Assume a collection of data records with an existing B+ tree
index on it
▪ How to add a new record to it?
▪ Use the B+ tree insert() function

▪ What if we have a collection of data records for which we
want to create a B+ tree index? (i.e., we want to bulk load
the B+ tree)
▪ Starting with an empty tree and using the insert() function
for each data record, one at a time, is expensive!
▪ This is because for each entry we would require starting again
from the root and going down to the appropriate leaf page
B+ Tree: Bulk Loading
▪ What to do?
▪ Initialization: Sort all data entries, insert pointer to first (leaf)
page in a new (root) page

Root
Sorted pages of data entries; not yet in B+ tree

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Add one entry to the root page for each subsequent page of
the sorted data entries (i.e., <lowest key value on page,
pointer to the page>)

Root
Sorted pages of data entries; not yet in B+ tree

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Add one entry to the root page for each subsequent page of
the sorted data entries (i.e., <lowest key value on page,
pointer to the page>)

Root
6

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Add one entry to the root page for each subsequent page of
the sorted data entries (i.e., <lowest key value on page,
pointer to the page>)

Root
6 10

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Split the root and create a new root page

Root
6 10

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Split the root and create a new root page

Root
10
‘push up’ the middle key

6 12

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Continue by inserting entries into the right-most index page
just above the leaf page; split when it fills up
Root
10

6 12

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Continue by inserting entries into the right-most index page
just above the leaf page; split when it fills up

Root
10 20

6 12 23 35

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
(the remaining data entry pages are not yet in the B+ tree)
B+ Tree: Bulk Loading
▪ What to do?
▪ Continue by inserting entries into the right-most index page
just above the leaf page; split when it fills up

Root
20

10 35

6 12 23 38

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
(the remaining data entry pages are not yet in the B+ tree)
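
A bulk load can be sketched bottom-up: sort the entries, pack them into leaf pages, and build each index level from the lowest key of every child page. Note that this is a simplified level-by-level variant, not the slides' procedure of inserting into the right-most index page and splitting it; it produces a valid search structure but not necessarily the same tree shape. The names `bulk_load` and `search`, the dictionary layout, and the capacities are assumptions made for the example, and the last leaf may be underfull.

```python
from bisect import bisect_right

def bulk_load(keys, leaf_capacity=2, fanout=3):
    """Build a B+-tree-like index bottom-up; returns the root as nested dicts."""
    assert fanout >= 3, "the last-group fix below needs fanout >= 3"
    entries = sorted(keys)
    # Leaf level: pack the sorted entries into pages.
    level = [{"low": entries[i], "keys": entries[i:i + leaf_capacity], "children": None}
             for i in range(0, len(entries), leaf_capacity)]
    if not level:
        return {"low": None, "keys": [], "children": None}
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        # Avoid an index node with a single child: borrow one from the left group.
        if len(groups) > 1 and len(groups[-1]) == 1:
            groups[-1].insert(0, groups[-2].pop())
        # One entry <lowest key on page, pointer to page> per child after the first.
        level = [{"low": g[0]["low"],
                  "keys": [child["low"] for child in g[1:]],
                  "children": g} for g in groups]
    return level[0]

def search(node, key):
    while node["children"] is not None:
        node = node["children"][bisect_right(node["keys"], key)]
    return key in node["keys"]

data = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root = bulk_load(data)
print(root["keys"], [child["keys"] for child in root["children"]])
```

Each entry is touched once per level instead of triggering a root-to-leaf descent, which is the saving the slides point to.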
Hashing
Hash-Based Indexing
▪ What indexing technique can we use to support range
searches (e.g., “Find s_name where gpa >= 3.0”)?
▪ Tree-Based Indexing

▪ What about equality selections (e.g., “Find s_name
where sid = 102”)?
▪ Tree-Based Indexing
▪ Hash-Based Indexing (cannot support range searches!)

▪ Hash-based indexing, however, proves to be very useful
in implementing relational operators (e.g., joins)
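
The contrast between the two query types can be illustrated with in-memory stand-ins: a Python dict plays the role of a hash index, and a sorted list searched with `bisect` plays the role of a tree index. The student records and the helper name are invented for the example.

```python
from bisect import bisect_left

# Hypothetical student records: (sid, s_name, gpa)
students = [(101, "Ann", 3.4), (102, "Bob", 2.7), (103, "Cy", 3.0), (104, "Di", 3.9)]

# Hash-based index on sid: direct equality lookup, but keys are in no useful order.
by_sid = {sid: (sid, name, gpa) for sid, name, gpa in students}

# Ordered index on gpa (sorted list standing in for a tree): supports ranges.
by_gpa = sorted((gpa, name) for _, name, gpa in students)

def names_with_gpa_at_least(threshold):
    """Range search: jump to the first qualifying entry, then read in order."""
    start = bisect_left(by_gpa, (threshold,))
    return [name for _, name in by_gpa[start:]]

print(by_sid[102][1])                 # equality selection on sid
print(names_with_gpa_at_least(3.0))   # range selection on gpa
```

Answering the range query from `by_sid` alone would require examining every entry, since hashing scatters neighbouring key values.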
Static Hashing
• A bucket is a unit of storage containing one or more entries (a
bucket is typically a disk block).
– we obtain the bucket of an entry from its search-key value using a hash
function
▪ A hash function h is used to map keys into a range of bucket numbers
• In a hash index, buckets store entries with pointers to records
• In a hash file-organization buckets store records

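[Figure: key → h → h(key) mod N selects one of the primary bucket pages 0 … N-1; each primary bucket page may have a chain of overflow pages]
– With static hashing, primary bucket pages are allocated sequentially and never de-allocated
– With static hashing, overflow pages are allocated (as needed) when the corresponding buckets become full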
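
A minimal sketch of static hashing with overflow pages follows. It assumes integer keys, an identity hash function `h`, and tiny pages so that overflow is easy to trigger; `StaticHashFile` is a hypothetical name.

```python
def h(key):
    """Assumed hash function: identity on integer keys, for readability."""
    return key

class StaticHashFile:
    def __init__(self, num_buckets=4, page_capacity=2):
        self.n = num_buckets              # fixed for the lifetime of the file
        self.capacity = page_capacity
        # Each bucket is a chain of pages; the first page is the primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket(self, key):
        return h(key) % self.n

    def insert(self, key):
        chain = self.buckets[self._bucket(key)]
        if len(chain[-1]) == self.capacity:
            chain.append([])              # allocate an overflow page as needed
        chain[-1].append(key)

    def search(self, key):
        # May have to walk the whole overflow chain of one bucket.
        return any(key in page for page in self.buckets[self._bucket(key)])

f = StaticHashFile(num_buckets=4, page_capacity=2)
for k in (4, 8, 12, 16, 5):
    f.insert(k)
print(f.buckets)
```

Because the number of buckets never changes, a skewed or growing key set lengthens the chains and slows every search on those buckets.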
Deficiencies of Static Hashing

▪ In static hashing, function h maps search-key values to a fixed set B of bucket
addresses. Databases grow or shrink with time.
• If initial number of buckets is too small, and file grows, performance will
degrade due to too many overflows.
• If space is allocated for anticipated growth, a significant amount of space
will be wasted initially (and buckets will be underfull).
• If database shrinks, again space will be wasted.
▪ One solution: periodic re-organization of the file with a new hash function
• Expensive, disrupts normal operations
▪ Better solution: allow the number of buckets to be modified dynamically.

Extendible Hashing
▪ Extendible Hashing uses a directory of pointers to buckets
▪ The result of applying a hash function h is treated as a binary number and
the last d bits are interpreted as an offset into the directory

▪ d is referred to as the global depth of the hash file and is kept as part
of the header of the file

[Figure: directory with global depth 2 pointing to the data pages: 00 → Bucket A (4*, 12*, 32*, 16*), 01 → Bucket B (1*, 5*, 21*), 10 → Bucket C (10*), 11 → Bucket D (15*, 7*, 19*)]
Extendible Hashing: Searching for Entries
▪ To search for a data entry, apply a hash function h to the
key and take the last d bits of its binary representation to
get the bucket number

▪ Example: search for 5*


[Figure: global depth 2; 5 = 101 in binary, so the last two bits (01) select Bucket B (1*, 5*, 21*). Other data pages: Bucket A (4*, 12*, 32*, 16*), Bucket C (10*), Bucket D (15*, 7*, 19*)]
Extendible Hashing: Inserting Entries
▪ An entry can be inserted as follows:
▪ Find the appropriate bucket (as in search)

▪ Split the bucket if full and redistribute contents
(including the new entry to be inserted) across
the old bucket and its “split image”

▪ Double the directory if necessary

▪ Insert the given entry
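
The insertion procedure above can be sketched as follows. This is an illustrative model, not the slides' code: it assumes integer keys with an identity hash function `hash_fn`, buckets of four entries, and a directory that starts at global depth 2; the class and method names are hypothetical. Each bucket carries a local depth, and the directory is doubled only when the bucket being split has local depth equal to the global depth.

```python
def hash_fn(key):
    """Assumed hash function: identity on integer keys, as in the slides' examples."""
    return key

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, global_depth=2, capacity=4):
        self.global_depth = global_depth
        self.capacity = capacity
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def _slot(self, key):
        # The last global_depth bits of the hash value index the directory.
        return hash_fn(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        # Note: loops forever if more than `capacity` keys share one hash value.
        while True:
            bucket = self.directory[self._slot(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            if bucket.local_depth == self.global_depth:
                # Double the directory: slot i + 2^d points where slot i points.
                self.directory = self.directory + self.directory
                self.global_depth += 1
            self._split(bucket)

    def _split(self, bucket):
        bit = 1 << bucket.local_depth      # the extra bit that now distinguishes keys
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth) # the "split image"
        old_keys = bucket.keys
        bucket.keys = [k for k in old_keys if not hash_fn(k) & bit]
        image.keys = [k for k in old_keys if hash_fn(k) & bit]
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image

table = ExtendibleHash(global_depth=2, capacity=4)
for k in (4, 12, 32, 16, 1, 5, 21, 10, 15, 7, 19, 13, 20, 9):
    table.insert(k)
print(table.global_depth, [sorted(b.keys) for b in table.directory])
```

With this key sequence, inserting 20 doubles the directory (the full bucket has local depth 2 = global depth 2), while inserting 9 only splits a bucket (local depth 2 < global depth 3).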


Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 13*


[Figure: global depth 2; 13 = 1101 in binary, so the last two bits (01) select Bucket B, which has room and becomes 1*, 5*, 21*, 13*. Other data pages: Bucket A (4*, 12*, 32*, 16*), Bucket C (10*), Bucket D (15*, 7*, 19*)]
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 20*
[Figure: global depth 2; 20 = 10100 in binary, so the last two bits (00) select Bucket A (4*, 12*, 32*, 16*). Other data pages: Bucket B (1*, 5*, 21*, 13*), Bucket C (10*), Bucket D (15*, 7*, 19*)]
Bucket A is FULL; hence, split and redistribute!
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 20*
[Figure: global depth still 2; 20 = 10100. Bucket A has been split into Bucket A (32*, 16*) and Bucket A2 (4*, 12*, 20*; the split image of Bucket A). Other data pages: Bucket B (1*, 5*, 21*, 13*), Bucket C (10*), Bucket D (15*, 7*, 19*)]
Is this enough?
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 20*
[Figure: global depth still 2; 20 = 10100. Bucket A (32*, 16*), Bucket A2 (4*, 12*, 20*; the split image of Bucket A), Bucket B (1*, 5*, 21*, 13*), Bucket C (10*), Bucket D (15*, 7*, 19*)]
Double the directory and increase the global depth
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 20*
[Figure: global depth 3; directory entries 000 → Bucket A (32*, 16*), 001 → Bucket B (1*, 5*, 21*, 13*), 010 → Bucket C (10*), 011 → Bucket D (15*, 7*, 19*), 100 → Bucket A2 (4*, 12*, 20*; the split image of Bucket A), 101 → Bucket B, 110 → Bucket C, 111 → Bucket D]
▪ The last two bits (00) indicate a data entry that belongs to one of the two buckets A and A2
▪ The third bit distinguishes between these two buckets!

But, is it necessary always to double the directory?
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 9*
[Figure: global depth 3; 9 = 1001 in binary, so the last three bits (001) select Bucket B (1*, 5*, 21*, 13*). Other data pages: Bucket A (32*, 16*), Bucket C (10*), Bucket D (15*, 7*, 19*), Bucket A2 (4*, 12*, 20*; the split image of Bucket A)]
Bucket B is FULL; hence, split!
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 9*
[Figure: global depth 3; 9 = 1001. Bucket B has been split into Bucket B (1*, 9*) and Bucket B2 (5*, 21*, 13*; the split image of B). Other data pages: Bucket A (32*, 16*), Bucket C (10*), Bucket D (15*, 7*, 19*), Bucket A2 (4*, 12*, 20*; the split image of A)]
Almost there…
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 9*
[Figure: global depth 3; 9 = 1001. Bucket A (32*, 16*), Bucket B (1*, 9*), Bucket C (10*), Bucket D (15*, 7*, 19*), Bucket A2 (4*, 12*, 20*; the split image of A), Bucket B2 (5*, 21*, 13*; the split image of B)]
There was no need to double the directory!
When NOT to double the directory?
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 9*
[Figure: global depth 3; 9 = 1001. Local depths: Bucket A (32*, 16*) 3, Bucket B (1*, 9*) 3, Bucket C (10*) 2, Bucket D (15*, 7*, 19*) 2, Bucket A2 (4*, 12*, 20*; the split image of A) 3, Bucket B2 (5*, 21*, 13*; the split image of B) 3]
If a bucket whose local depth equals the global depth is split, the directory must be doubled
Extendible Hashing: Inserting Entries
▪ Example: insert 9*
Repeat…
[Figure: global depth 3; 9 = 1001 selects Bucket B. Local depths: Bucket A (32*, 16*) 3, Bucket B (1*, 5*, 21*, 13*) 2, Bucket C (10*) 2, Bucket D (15*, 7*, 19*) 2, Bucket A2 (4*, 12*, 20*; the split image of Bucket A) 3]
Bucket B is FULL; hence, split!
Because the local depth (i.e., 2) is less than the global depth (i.e., 3), NO need to double the directory
Extendible Hashing: Inserting Entries
▪ Example: insert 9*
Repeat…
[Figure: global depth 3; 9 = 1001. Local depths: Bucket A (32*, 16*) 3, Bucket B (1*, 9*) 3, Bucket C (10*) 2, Bucket D (15*, 7*, 19*) 2, Bucket A2 (4*, 12*, 20*; the split image of A) 3, Bucket B2 (5*, 21*, 13*; the split image of B) 3]
Extendible Hashing: Inserting Entries
▪ Example: insert 9*
Repeat…
[Figure: global depth 3; 9 = 1001. Local depths: Bucket A (32*, 16*) 3, Bucket B (1*, 9*) 3, Bucket C (10*) 2, Bucket D (15*, 7*, 19*) 2, Bucket A2 (4*, 12*, 20*; the split image of A) 3, Bucket B2 (5*, 21*, 13*; the split image of B) 3]
FINAL STATE!
Extendible Hashing: Inserting Entries
▪ Example: insert 20*
Repeat…
[Figure: global depth 2; 20 = 10100 selects Bucket A. Local depths: Bucket A (4*, 12*, 32*, 16*) 2, Bucket B (1*, 5*, 21*, 13*) 2, Bucket C (10*) 2, Bucket D (15*, 7*, 19*) 2]
Bucket A is FULL; hence, split!
Because the local depth and the global depth are both 2, we should double the directory!
Extendible Hashing: Inserting Entries
▪ Example: insert 20*
Repeat…
[Figure: global depth 2; 20 = 10100. Local depths as drawn: Bucket A (32*, 16*) 2, Bucket B (1*, 5*, 21*, 13*) 2, Bucket C (10*) 2, Bucket D (15*, 7*, 19*) 2, Bucket A2 (4*, 12*, 20*; the split image of Bucket A) 2]
Is this enough?
Extendible Hashing: Inserting Entries
▪ Example: insert 20*
Repeat…
[Figure: global depth 3 after doubling the directory. Local depths as drawn: Bucket A (32*, 16*) 2, Bucket B (1*, 5*, 21*, 13*) 2, Bucket C (10*) 2, Bucket D (15*, 7*, 19*) 2, Bucket A2 (4*, 12*, 20*; the split image of Bucket A) 2]
Is this enough?
Extendible Hashing: Inserting Entries
▪ Example: insert 20*
Repeat…
[Figure: global depth 3. Local depths: Bucket A (32*, 16*) 3, Bucket B (1*, 5*, 21*, 13*) 2, Bucket C (10*) 2, Bucket D (15*, 7*, 19*) 2, Bucket A2 (4*, 12*, 20*; the split image of Bucket A) 3]
FINAL STATE!
Bitmap Indices

▪ Bitmap indices are a special type of index designed for efficient querying on
multiple keys
▪ Records in a relation are assumed to be numbered sequentially from, say, 0
• Given a number n it must be easy to retrieve record n
▪ Particularly easy if records are of fixed size
▪ Applicable on attributes that take on a relatively small number of distinct values
• E.g., gender, country, state, …
• E.g., income-level (income broken up into a small number of levels such as
0-9999, 10000-19999, 20000-50000, 50000- infinity)
▪ A bitmap is simply an array of bits

Bitmap Indices (Cont.)
▪ In its simplest form a bitmap index on an attribute has a bitmap for each value of
the attribute
• Bitmap has as many bits as records
• In a bitmap for value v, the bit for a record is 1 if the record has the value v
for the attribute, and is 0 otherwise
▪ Example

Bitmap Indices (Cont.)

▪ Bitmap indices are useful for queries on multiple attributes
• not particularly useful for single attribute queries
▪ Queries are answered using bitmap operations
• Intersection (and)
• Union (or)
▪ Each operation takes two bitmaps of the same size and applies the operation on corresponding
bits to get the result bitmap
• E.g., 100110 AND 110011 = 100010
100110 OR 110011 = 110111
NOT 100110 = 011001
• Males with income level L1: 10010 AND 10100 = 10000
▪ Can then retrieve required tuples.
▪ Counting number of matching tuples is even faster

▪ Bitmap indices generally very small compared with relation size


• E.g., If number of distinct attribute values is 8, bitmap is only 1% of relation size
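The bitmap operations above can be tried out with ordinary integer arithmetic. This is a sketch, not how a database engine stores bitmaps: the helper names are hypothetical, bitmaps are written as bit strings with record 0 leftmost, and NOT needs the bitmap length so that only the valid bit positions are flipped.

```python
def bitmap_and(a, b):
    return format(int(a, 2) & int(b, 2), f"0{len(a)}b")


def bitmap_or(a, b):
    return format(int(a, 2) | int(b, 2), f"0{len(a)}b")


def bitmap_not(a):
    mask = (1 << len(a)) - 1  # keep only len(a) bits after complementing
    return format(~int(a, 2) & mask, f"0{len(a)}b")


def count_matches(bitmap):
    # Counting needs only the bitmap, not the tuples themselves.
    return bitmap.count("1")


def matching_records(bitmap):
    # Record numbers whose bit is 1 (record 0 is the leftmost bit).
    return [i for i, bit in enumerate(bitmap) if bit == "1"]


print(bitmap_and("100110", "110011"))  # 100010
print(bitmap_or("100110", "110011"))   # 110111
print(bitmap_not("100110"))            # 011001

males_l1 = bitmap_and("10010", "10100")  # gender = m AND income-level = L1
print(males_l1)                          # 10000
print(count_matches(males_l1))           # 1
print(matching_records(males_l1))        # [0]
```

Only the record numbers returned by `matching_records` have to be fetched from the relation, which is why combining several selective conditions through bitmaps can avoid most tuple accesses.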

Spatial and Temporal Indices

Spatial Data

▪ Databases can store spatial data types such as lines and polygons, in addition to raster images
• allows relational databases to store and retrieve spatial information
• queries can use spatial conditions (e.g., contains or overlaps)
• queries can mix spatial and nonspatial conditions
▪ Nearest neighbor queries: given a point or an object, find the nearest object that
satisfies given conditions.
▪ Range queries deal with spatial regions, e.g., ask for objects that lie partially or
fully inside a specified region.
▪ Queries that compute intersections or unions of regions.
▪ Spatial join of two spatial relations, with the location playing the role of join
attribute.

Indexing of Spatial Data

▪ k-d tree: early structure used for indexing in multiple dimensions.
▪ Each level of a k-d tree partitions the space into two.
• Choose one dimension for partitioning at the root level of the tree.
• Choose another dimension for partitioning in nodes at the next level
and so on, cycling through the dimensions.
▪ In each node, approximately half of the points stored in the sub-tree fall on one side
and half on the other.
▪ Partitioning stops when a node has fewer than a given number of points.
▪ The k-d-B tree extends the k-d tree to allow multiple child nodes
for each internal node; well-suited for secondary storage.
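The partitioning rule can be sketched as follows, assuming in-memory points given as tuples and a median split on the current dimension. The names `KDNode`, `build_kdtree`, `range_query` and the parameter `leaf_threshold` are hypothetical; this is the plain k-d tree, not the k-d-B tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KDNode:
    axis: Optional[int] = None       # dimension used for the split (internal nodes)
    split: Optional[float] = None    # coordinate value at which the space is cut
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None
    points: Optional[list] = None    # set only in leaf nodes


def build_kdtree(points, depth=0, leaf_threshold=2):
    # Stop partitioning when the node has fewer than leaf_threshold points.
    # The max(..., 2) guard keeps a single point from being split forever.
    if len(points) < max(leaf_threshold, 2):
        return KDNode(points=list(points))
    axis = depth % len(points[0])    # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2          # about half of the points on each side
    return KDNode(
        axis=axis,
        split=ordered[mid][axis],
        left=build_kdtree(ordered[:mid], depth + 1, leaf_threshold),
        right=build_kdtree(ordered[mid:], depth + 1, leaf_threshold),
    )


def range_query(node, lo, hi):
    """Return the points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if node.points is not None:
        return [p for p in node.points
                if all(l <= c <= h for l, c, h in zip(lo, p, hi))]
    found = []
    if lo[node.axis] <= node.split:  # query region reaches the left half
        found += range_query(node.left, lo, hi)
    if hi[node.axis] >= node.split:  # query region reaches the right half
        found += range_query(node.right, lo, hi)
    return found


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts, leaf_threshold=2)
print(sorted(range_query(tree, (3, 1), (8, 5))))  # [(5, 4), (7, 2), (8, 1)]
```

Sorting at every level keeps the sketch short at the cost of extra work during construction; a median-selection routine would avoid the repeated sorts.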

Division of Space by Quadtrees

▪ Each node of a quadtree is associated with a rectangular region of space; the top
node is associated with the entire target space.
▪ Each non-leaf node divides its region into four equal-sized quadrants
• correspondingly each such node has four child nodes corresponding to the
four quadrants and so on
▪ Leaf nodes have between zero and some fixed maximum number of points (set to
1 in example).
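A minimal point quadtree along these lines might look as follows. It is a sketch under assumptions: points are distinct (x, y) tuples inside the target space, a leaf splits as soon as it exceeds `max_points`, and a point on a dividing line goes to the upper or right quadrant. The class name `QuadTree` and its methods are hypothetical.

```python
class QuadTree:
    def __init__(self, x0, y0, x1, y1, max_points=1):
        self.bounds = (x0, y0, x1, y1)  # rectangular region of this node
        self.max_points = max_points
        self.points = []                # used only while the node is a leaf
        self.children = None            # four child nodes once split

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.max_points:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        # Four equal-sized quadrants: lower-left, lower-right, upper-left, upper-right.
        self.children = [
            QuadTree(x0, y0, mx, my, self.max_points),
            QuadTree(mx, y0, x1, my, self.max_points),
            QuadTree(x0, my, mx, y1, self.max_points),
            QuadTree(mx, my, x1, y1, self.max_points),
        ]
        old_points, self.points = self.points, []
        for p in old_points:
            self._child_for(p).insert(p)

    def _child_for(self, p):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def leaves(self):
        if self.children is None:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


qt = QuadTree(0, 0, 8, 8, max_points=1)
sample = [(1, 1), (6, 2), (5, 6), (7, 7)]
for p in sample:
    qt.insert(p)
print([leaf.points for leaf in qt.leaves()])
```

Inserting the same point twice would make this sketch split without end, so a real implementation needs a depth limit or a list of duplicates in the leaf.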

R-Trees

▪ R-trees are an N-dimensional extension of B+-trees, useful for indexing sets of
rectangles and other polygons.
▪ Supported in many modern database systems, along with variants like R+-trees
and R*-trees.
▪ Basic idea: generalize the notion of a one-dimensional interval associated with
each B+-tree node to an N-dimensional interval, that is, an N-dimensional rectangle.
▪ Will consider only the two-dimensional case (N = 2)
• generalization for N > 2 is straightforward, although R-trees work well only
for relatively small N
▪ The bounding box of a node is a minimum sized rectangle that contains all the
rectangles/polygons associated with the node
• Bounding boxes of children of a node are allowed to overlap
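For the two-dimensional case, a bounding box and the overlap test can be sketched in a few lines. The representation of a rectangle as a tuple (xmin, ymin, xmax, ymax) and the function names are assumptions made for illustration.

```python
def bounding_box(rects):
    """Smallest axis-aligned rectangle containing every rectangle in rects."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


def intersects(a, b):
    """True if rectangles a and b share at least one point (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


box1 = bounding_box([(0, 0, 2, 2), (1, 1, 3, 3)])
box2 = bounding_box([(2, 2, 5, 5), (6, 6, 8, 8)])
print(box1)                    # (0, 0, 3, 3)
print(box2)                    # (2, 2, 8, 8)
print(intersects(box1, box2))  # True: sibling bounding boxes may overlap
```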

Example R-Tree

▪ A set of rectangles (solid line) and the bounding boxes (dashed line) of the nodes
of an R-tree for the rectangles.
▪ The R-tree is shown on the right.

Search in R-Trees

▪ To find data items intersecting a given query point/region, do the following,
starting from the root node:
• If the node is a leaf node, output the data items whose keys intersect the given
query point/region.
• Else, for each child of the current node whose bounding box intersects the
query point/region, recursively search the child
▪ Can be very inefficient in worst case since multiple paths may need to be
searched, but works acceptably in practice.
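The search procedure can be sketched on a small hand-built tree. Insertion and node splitting are left out; the class `RNode`, the tuple representation (xmin, ymin, xmax, ymax), and the sample rectangles are hypothetical. A query point is passed as a degenerate rectangle.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class RNode:
    bbox: tuple                                   # bounding box of the whole node
    children: list = field(default_factory=list)  # child nodes (internal node)
    entries: list = field(default_factory=list)   # (rectangle, item) pairs (leaf)


def search(node, query):
    if not node.children:
        # Leaf node: output the data items whose keys intersect the query.
        return [item for rect, item in node.entries if intersects(rect, query)]
    results = []
    for child in node.children:
        # Descend into every child whose bounding box intersects the query.
        if intersects(child.bbox, query):
            results.extend(search(child, query))
    return results


leaf1 = RNode(bbox=(0, 0, 3, 3),
              entries=[((0, 0, 2, 2), "A"), ((1, 1, 3, 3), "B")])
leaf2 = RNode(bbox=(2, 2, 8, 8),
              entries=[((2, 2, 5, 5), "C"), ((6, 6, 8, 8), "D")])
root = RNode(bbox=(0, 0, 8, 8), children=[leaf1, leaf2])

point = (2.5, 2.5, 2.5, 2.5)       # query point as a degenerate rectangle
print(search(root, point))         # ['B', 'C']
print(search(root, (6, 6, 9, 9)))  # ['D']
```

The point query has to visit both leaves because their bounding boxes overlap, which is the multiple-path behaviour noted above.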

Indexing Temporal Data

▪ Temporal data refers to data that has an associated time period (interval)
• Example: a temporal version of the course relation

▪ Time interval has a start and end time
• End time set to infinity (or large date such as 9999-12-31) if a tuple is
currently valid and its validity end time is not currently known
▪ Query may ask for all tuples that are valid at a point in time or during a time
interval
• Index on valid time period speeds up this task

Indexing Temporal Data (Cont.)

▪ To create a temporal index on attribute a:
• Use spatial index, such as R-tree, with attribute a as one dimension, and time
as another dimension
▪ Valid time forms an interval in the time dimension
• Tuples that are currently valid cause problems, since their end time is infinite or very
large
▪ Solution: store all current tuples (with end time as infinity) in a separate
index, indexed on (a, start-time)
• To find tuples valid at a point in time t in the current tuple index,
search for tuples in the range (a, 0) to (a, t)
▪ Temporal index on primary key can help enforce temporal primary key constraint
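The lookup in the current tuple index can be sketched with a sorted list standing in for an ordered index on (a, start-time). Assumptions: start times are non-negative integers (years here), so 0 is the smallest possible start time, and every tuple in this index is still valid. The class name, sample values, and tuple identifiers are hypothetical.

```python
import bisect


class CurrentTupleIndex:
    """Ordered index on (a, start_time) holding only currently valid tuples."""

    def __init__(self, entries):
        # entries: iterable of (a, start_time, tuple_id)
        ordered = sorted(entries)
        self.keys = [(a, start) for a, start, _ in ordered]
        self.tuple_ids = [tid for _, _, tid in ordered]

    def valid_at(self, a, t):
        # A current tuple is valid at time t exactly when start_time <= t,
        # so scan the key range (a, 0) to (a, t).
        lo = bisect.bisect_left(self.keys, (a, 0))
        hi = bisect.bisect_right(self.keys, (a, t))
        return self.tuple_ids[lo:hi]


index = CurrentTupleIndex([
    ("CS-101", 2009, "r1"),
    ("CS-101", 2015, "r2"),
    ("BIO-101", 2010, "r3"),
])
print(index.valid_at("CS-101", 2012))   # ['r1']
print(index.valid_at("CS-101", 2015))   # ['r1', 'r2']
print(index.valid_at("BIO-101", 2005))  # []
```

Because the end time of every entry is infinite, no check on the end time is needed, which is what makes a one-dimensional ordered index sufficient here.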

End of Chapter 14

Example of Hash Index

hash index on instructor, on attribute ID

