B+ Tree and Hashing in Dbms

This document discusses indexing in database systems. It begins with basic concepts of indices, which are used to speed up access to desired data. Index files contain index entries with search keys and pointers. Ordered indices store keys in sorted order, while hash indices distribute keys uniformly using a hash function. The document then covers ordered index types like dense and sparse indices, secondary indices, multilevel indices, indexing on multiple keys, ISAM trees, and insertion/deletion in indices.

Uploaded by

Vishal kumar Maurya

Chapter 14: Indexing

Database System Concepts, 7th Ed.

©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Outline
• Basic Concepts
• Ordered Indices
• B+-Tree Index Files
• B-Tree Index Files
• Hashing
• Write-optimized indices
• Spatio-Temporal Indexing
Basic Concepts
• Indexing mechanisms used to speed up access to desired data.
– E.g., author catalog in library

• An index file consists of records (called index entries) of the form

search-key pointer

– Search Key - attribute or set of attributes used to look up records in a file.

• Index files are typically much smaller than the original file

• Two basic kinds of indices:

– Ordered indices: search keys are stored in sorted order
– Hash indices: search keys are distributed uniformly across “buckets” using a
“hash function”.
Index Evaluation Metrics
• Access types supported efficiently. E.g.,
– Records with a specified value in the attribute
– Records with an attribute value falling in a
specified range of values.
• Access time
• Insertion time
• Deletion time
• Space overhead
Ordered Indices
• In an ordered index, index entries are stored sorted on the search key
value.

• Clustering index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
– Also called primary index
– The search key of a primary index is usually but not necessarily the primary key.

• Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.

• Index-sequential access file (ISAM): sequential file ordered on a search key, with a clustering index on the search key.
– Designed for applications that require both sequential and random access to individual/set of records.
Types of Ordered Indices
1. Dense Index Files
Dense index — Index record appears for every search-key value in the file.
– E.g. index on ID attribute of instructor relation

Pointer: identifier of a disk block and an offset within the disk block
Dense Index Files (Cont.)

• Dense index on dept_name, with instructor file

sorted on dept_name

In a dense secondary (non-clustering) index, the index must store a list of pointers to all
records with the same search-key value.
2. Sparse Index Files
Sparse Index: contains index records for only some
search-key values.
– Applicable when records are sequentially ordered on search-key

• To locate a record with search-key value K we:

– Find index record with largest search-key value ≤ K
– Search file sequentially starting at the record to which the index record points

Compared to dense indices:

• Less space and less maintenance overhead for insertions and deletions.
• Generally slower than dense index for locating records.
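
The sparse-index lookup described above can be sketched as follows. This is a minimal illustration, not the textbook's implementation: the block contents, the one-index-entry-per-block layout, and the names `BLOCKS`, `INDEX` and `sparse_lookup` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical sequential file: three blocks of records sorted on the search key.
BLOCKS = [[10101, 12121, 15151], [22222, 32343, 33456], [45565, 58583, 76543]]
# Sparse index: one (search-key, block number) entry per block.
INDEX = [(10101, 0), (22222, 1), (45565, 2)]


def sparse_lookup(index, blocks, key):
    """Return (block number, key) if key is in the file, else None."""
    index_keys = [k for k, _ in index]
    # Index record with the largest search-key value <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    start_block = index[pos][1]
    # Search the file sequentially, starting where the index record points.
    for block_no in range(start_block, len(blocks)):
        for record_key in blocks[block_no]:
            if record_key == key:
                return (block_no, record_key)
            if record_key > key:
                return None  # passed the position where key would be
    return None
```

The binary search touches only the small index; the sequential scan then reads at most a few records beyond the starting block, which is the space/time trade-off compared to a dense index.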
Secondary Indices
• Frequently, one wants to find all the records whose values in a certain field
(which is not the search-key of the primary index) satisfy some condition.
– Example 1: In the instructor relation stored sequentially by ID, we may want to
find all instructors in a particular department
– Example 2: as above, but where we want to find all instructors with a specified
salary or with salary in a specified range of values

• We can have a secondary index with an index record for each search-key
value
Secondary Indices Example

▪ Secondary index on salary field of instructor

▪ Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
▪ Secondary indices have to be dense
Q: Does it make sense to create a Sparse index file for secondary index?
Multilevel Index
• If index does not fit in memory, access becomes
expensive.
• Solution: treat index kept on disk as a sequential
file and construct a sparse index on it.
– outer index – a sparse index of the basic index
– inner index – the basic index file
• If even outer index is too large to fit in main
memory, yet another level of index can be
created, and so on.
• Indices at all levels must be updated on
insertion or deletion from the file.
Multilevel Index (Cont.)
Index Update: Deletion

▪ If deleted record was the only

record in the file with its
particular search-key value,
the search-key is deleted
from the index also.

• Single-level index entry deletion:

– Dense indices – deletion of search-key is similar to file
record deletion.
– Sparse indices –
• if an entry for the search key exists in the index, it is deleted
by replacing the entry in the index with the next search-key
value in the file (in search-key order).
• If the next search-key value already has an index entry, the
entry is deleted instead of being replaced.
Index Update: Insertion
• Single-level index insertion:
– Perform a lookup using the search-key value of the record
to be inserted.
– Dense indices – if the search-key value does not appear in
the index, insert it
• Indices are maintained as sequential files
• Need to create space for new entry, overflow blocks may be
required
– Sparse indices – if index stores an entry for each block of
the file, no change needs to be made to the index unless a
new block is created.
• If a new block is created, the first search-key value appearing in
the new block is inserted into the index.
• Multilevel insertion and deletion: algorithms are
simple extensions of the single-level algorithms
Indices on Multiple Keys
• Composite search key
– E.g., index on instructor relation on attributes
(name, ID)
– Values are sorted lexicographically
• E.g. (John, 12121) < (John, 13514) and
(John, 13514) < (Peter, 11223)
– Can query on just name, or on (name, ID)
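
Lexicographic ordering on a composite key can be illustrated with Python tuples, which compare the same way. The sketch below is only an illustration: the sorted list stands in for an ordered index, and `find_by_name` is a hypothetical helper.

```python
from bisect import bisect_left

# Index entries on the composite search key (name, ID), kept in sorted order.
entries = sorted([("John", 13514), ("Peter", 11223), ("John", 12121)])


def find_by_name(sorted_entries, name):
    """Query on just the first attribute of the composite key (a prefix query)."""
    # (name,) sorts before every (name, ID), so this finds the first match.
    start = bisect_left(sorted_entries, (name,))
    matches = []
    for entry in sorted_entries[start:]:
        if entry[0] != name:
            break
        matches.append(entry)
    return matches
```

Because entries are sorted on name first, a query on name alone is a contiguous range scan; a query on ID alone would not be, which is why the index supports name or (name, ID) lookups.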
ISAM Trees
▪ Indexed Sequential Access Method (ISAM) trees
are static
Root
40

Non-Leaf
Pages
20 33 51 63

Leaf 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
Pages

E.g., 2 Entries Per Page

ISAM Trees: Page Overflows
▪ What if there are a lot of insertions after creating
the tree?

Non-leaf
Pages

Leaf
Pages
Overflow
page
Primary pages
ISAM File Creation
▪ How to create an ISAM file?
▪ All leaf pages are allocated sequentially and
sorted on the search key value

▪ Or the data records are created and sorted

before allocating leaf pages

▪ The non-leaf pages are subsequently allocated

ISAM: Searching for Entries
▪ Search begins at root, and key comparisons direct it
to a leaf

▪ Search for 27*

Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)

▪ Insert 23*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23*
ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)

▪ Insert 48*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23* 48*
ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)

▪ Insert 41*
Root
40

20 33 51 63

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

23* 48* 41*

ISAM: Inserting Entries
▪ The appropriate page is determined as for a search, and the
entry is inserted (with overflow pages added if necessary)

Primary pages are NOT removed, even if they become empty!
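
The ISAM behaviour shown in these slides can be sketched as below. This is a simplified sketch under stated assumptions: the two index levels of the example are flattened into one sorted list of separators, overflow pages are modelled as one unbounded list per primary page, and the class name `IsamFile` is made up for the illustration.

```python
from bisect import bisect_right

PAGE_CAPACITY = 2  # "2 entries per page", as in the example


class IsamFile:
    def __init__(self, sorted_keys):
        # Primary leaf pages are allocated sequentially from the sorted keys.
        self.pages = [sorted_keys[i:i + PAGE_CAPACITY]
                      for i in range(0, len(sorted_keys), PAGE_CAPACITY)]
        # One overflow chain per primary page (modelled as a plain list).
        self.overflow = [[] for _ in self.pages]
        # Static index: built once and never modified afterwards.
        self.separators = [page[0] for page in self.pages[1:]]

    def _page_for(self, key):
        return bisect_right(self.separators, key)

    def search(self, key):
        i = self._page_for(key)
        return key in self.pages[i] or key in self.overflow[i]

    def insert(self, key):
        i = self._page_for(key)
        if len(self.pages[i]) < PAGE_CAPACITY:
            self.pages[i].append(key)
            self.pages[i].sort()
        else:
            self.overflow[i].append(key)  # primary page full: use overflow
```

Replaying the slides' insertions of 23*, 48* and 41* on the twelve example keys leaves the separators untouched and puts 23* in the overflow chain of page 20* 27*, and 48* and 41* in the chain of page 40* 46*.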

ISAM: Some Issues
▪ Once an ISAM file is created, insertions and deletions affect only
the contents of leaf pages (i.e., ISAM is a static structure!)

▪ Since index-level pages are never modified, there is no need to

lock them during insertions/deletions
▪ Critical for concurrency!

▪ Long overflow chains can develop easily

▪ The tree can be initially set so that ~20% of each page is free

▪ If the data distribution and size are relatively static, ISAM might
be a good choice to pursue!
Dynamic Trees
▪ ISAM indices are static
▪ Long overflow chains can develop as the file grows, leading to
poor performance

▪ This calls for more flexible, dynamic indices that adjust

gracefully to insertions and deletions
▪ No need to allocate the leaf pages sequentially as in ISAM

▪ Among the most successful dynamic index schemes is

the B+ tree
B+ Tree Properties
• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between ⎡n/2⎤ and n
children.
• A leaf node has between ⎡(n–1)/2⎤ and n–1 values
• Special cases:
– If the root is not a leaf, it has at least 2 children.
– If the root is a leaf (that is, there are no other nodes in the tree), it can
have between 0 and (n–1) values.

Node layout: p1 k1 p2 k2 … kn pn+1
– p1 points to a sub-tree in which all keys are less than k1
– p2 points to a sub-tree in which all keys are greater than or equal to k1 and less than k2
– pn+1 points to a sub-tree in which all keys are greater than or equal to kn
Example of B+-Tree

B+ Tree: Searching for Entries
▪ Search begins at root, and key comparisons direct it
to a leaf (as in ISAM)

▪ Example 1: Search for entry 5*

Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

✔
B+ Tree: Searching for Entries
▪ Search begins at root, and key comparisons direct it
to a leaf (as in ISAM)

▪ Example 2: Search for entry 15*

Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

15* is not found!
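
The two searches above can be traced with a small sketch. It assumes a node is represented as a `(keys, children)` pair with `children` set to `None` for a leaf; this representation and the name `bplus_search` are choices made for the illustration, not part of the slides.

```python
from bisect import bisect_right


def bplus_search(node, key):
    """Follow key comparisons from the root down to a leaf."""
    keys, children = node
    while children is not None:
        # Child i holds keys >= keys[i-1] and < keys[i].
        keys, children = children[bisect_right(keys, key)]
    return key in keys


# The example tree: one root above five leaves.
leaves = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
example_root = ([13, 17, 24, 30], [(leaf, None) for leaf in leaves])
```

Searching for 5 descends to the first leaf and succeeds; searching for 15 descends to the leaf 14* 16* and fails there.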

B+ Trees: Inserting Entries
▪ Find correct leaf L

▪ Put data entry onto L

▪ If L has enough space, done!
▪ Else, split L into L and a new node L2
▪ Re-partition entries evenly, copying up the middle key

▪ Parent node may overflow

▪ Push up middle key (splits “grow” trees; a root split
increases the height of the tree)
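
The insertion steps above can be sketched as follows. This is a minimal sketch, assuming pages hold at most 4 keys and ignoring the sibling pointers between leaves; `Node`, `MAX_KEYS`, `_insert` and `insert` are names invented for the example.

```python
from bisect import bisect_right, insort

MAX_KEYS = 4  # assumed page capacity


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf


def _insert(node, key):
    """Insert below node; return (separator, new right node) if node split."""
    if node.children is None:                      # leaf
        insort(node.keys, key)
        if len(node.keys) <= MAX_KEYS:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right                # copy up: key stays in leaf
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    separator, right_child = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                            # push up: key leaves node
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])        # root split: height grows
```

Note the asymmetry: a leaf split copies the middle key up (it must still appear in a leaf), while an index-node split pushes it up.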
B+ Tree: Examples of Insertions
▪ Insert entry 8*

Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Leaf is full; hence, split!

B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

The middle key (i.e., 5) is “copied up” and continues to appear in the leaf

[Figure: the full leaf is split into 2* 3* and 5* 7* 8*; key 5 is copied up towards the parent]
B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

[Figure: with 5 copied up, the parent would hold 5 13 17 24 30, i.e., > (n-1) keys; the new leaves are 2* 3* and 5* 7* 8*]

Parent is full; hence, split!
B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

The middle key (i.e., 17) is “pushed up”

[Figure: the parent splits into 5 13 and 24 30, and 17 moves up; the new leaves are 2* 3* and 5* 7* 8*]
B+ Tree: Examples of Insertions
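▪ Insert entry 8*

[Figure: the old root 13 17 24 30 gives way to a new root 17 with children 5 13 and 24 30; the leaf 2* 3* 5* 7* has become 2* 3* and 5* 7* 8*, while the leaves 14* 16*, 19* 20* 22*, 24* 27* 29* and 33* 34* 38* 39* are unchanged]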
B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root
FINAL TREE! 17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Splitting the root leads to an increase of height by 1!

What about re-distributing entries instead of splitting nodes?

B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Leaf is full; hence, check the sibling ‘Poor Sibling’

B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

13 17 24 30
Do it through the parent

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
B+ Tree: Examples of Insertions
▪ Insert entry 8*
Root

8
13 17 24 30
Do it through the parent

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

“Copy up” the new low key value!

But, when to redistribute and when to split?

Splitting vs. Redistributing
▪ Leaf Nodes
▪ Previous and next-neighbor pointers must be updated
upon insertions (if splitting is to be pursued)
▪ Hence, checking whether redistribution is possible does
not increase I/O
▪ Therefore, if a sibling can spare an entry, re-distribute

▪ Non-Leaf Nodes
▪ Checking whether redistribution is possible usually
increases I/O
▪ Splitting non-leaf nodes typically pays off!
B+ Insertions: Keep in Mind
▪ Every data entry must appear in a leaf node;
hence, “copy up” the middle key upon splitting

▪ When splitting index entries, simply “push up” the

middle key

▪ Apply splitting and/or redistribution on leaf nodes

▪ Apply only splitting on non-leaf nodes

B+ Trees: Deleting Entries
▪ Start at root, find leaf L where entry belongs
▪ Remove the entry
▪ If L is at least half-full, done!
▪ If L underflows
▪Try to re-distribute (i.e., borrow from a “rich
sibling” and “copy up” its lowest key)
▪If re-distribution fails, merge L and a “poor
sibling”
▪Update parent
▪ And possibly merge, recursively
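
The leaf-level part of deletion (redistribute if a sibling is rich, otherwise merge) can be sketched as below. It is a simplified sketch: it works on one parent's separator keys and leaf list only, assumes a minimum occupancy of 2 entries, and leaves the recursive handling of an underflowing parent to the caller. `delete_from_leaf` and `MIN_KEYS` are hypothetical names.

```python
MIN_KEYS = 2  # assumed minimum occupancy (half of a 4-entry page)


def delete_from_leaf(separators, leaves, i, key):
    """Delete key from leaves[i]; separators[j] divides leaves[j] and leaves[j+1].

    Mutates both lists in place and returns the action that was taken.
    """
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= MIN_KEYS:
        return "done"
    # Underflow: try to borrow from a "rich" sibling.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
        leaf.append(leaves[i + 1].pop(0))
        separators[i] = leaves[i + 1][0]   # copy up the sibling's new lowest key
        return "redistributed"
    if i > 0 and len(leaves[i - 1]) > MIN_KEYS:
        leaf.insert(0, leaves[i - 1].pop())
        separators[i - 1] = leaf[0]
        return "redistributed"
    # Both siblings are "poor": merge and toss the separator between them.
    if i + 1 < len(leaves):
        leaf.extend(leaves.pop(i + 1))
        separators.pop(i)
    else:
        leaves[i - 1].extend(leaves.pop(i))
        separators.pop(i - 1)
    return "merged"
```

After a merge the parent has lost a separator and may itself underflow; the caller would then repeat redistribute-or-merge one level up.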
B+ Tree: Examples of Deletions
▪ Delete 19*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Removing 19* does not cause an underflow

B+ Tree: Examples of Deletions
▪ Delete 19*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

FINAL TREE!
B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

Deleting 20* causes an underflow; hence, check a sibling for redistribution

B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17

5 13 24 30

2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39*

The sibling is ‘rich’ (i.e., can lend an entry); hence, remove 20* and redistribute!
B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17 Is it done?
5 13 24 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

“Copy up” 27*, the lowest value in the leaf from which we borrowed 24*
B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

“Copy up” 27*, the lowest value in the leaf from which we borrowed 24*
B+ Tree: Examples of Deletions
▪ Delete 20*
Root
17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

FINAL TREE!
B+ Tree: Examples of Deletions
▪ Delete 24*
Root
17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

The affected leaf will contain only 1 entry and the sibling cannot lend
any entry (i.e., redistribution is not applicable); hence, merge!
B+ Tree: Examples of Deletions
▪ Delete 24*
Root
17

5 13 27 30

2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*

“Toss” 27 because the page that it was pointing to does not exist anymore!

[Figure: after the merge, the parent node holds only 30, above the leaves 22* 27* 29* and 33* 34* 38* 39*]

B+ Tree: Examples of Deletions
▪ Delete 24*
Is it done? No, but almost there…
Root
17

5 13 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*

B+ Tree: Examples of Deletions
▪ Delete 24*
This entails an underflow; hence,
we must either redistribute or merge!
Root
17

5 13 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*

B+ Tree: Examples of Deletions
▪ Delete 24*
The sibling is “poor” (i.e., redistribution is not applicable); hence, merge!

Root
17

“Pull down” 17!

[Figure: the merged node 5 13 17 30 becomes the new root]
B+ Tree: Examples of Deletions
▪ Delete 24*
Root
5 13 17 30

2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*

FINAL TREE!
B+ Tree: Bulk Loading
▪ Assume a collection of data records with an existing B+ tree
index on it
▪ How to add a new record to it?
▪ Use the B+ tree insert() function

▪ What if we have a collection of data records for which we

want to create a B+ tree index? (i.e., we want to bulk load
the B+ tree)
▪ Starting with an empty tree and using the insert() function
for each data record, one at a time, is expensive!
▪ This is because for each entry we would require starting again
from the root and going down to the appropriate leaf page
B+ Tree: Bulk Loading
▪ What to do?
▪ Initialization: Sort all data entries, insert pointer to first (leaf)
page in a new (root) page

Root
Sorted pages of data entries; not yet in B+ tree

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Add one entry to the root page for each subsequent page of
the sorted data entries (i.e., <lowest key value on page,
pointer to the page>)

Root
Sorted pages of data entries; not yet in B+ tree

Root
6

Root
6 10

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Split the root and create a new root page

Root
6 10

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Split the root and create a new root page

Root
10
‘push up’ the middle key

6 12

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
B+ Tree: Bulk Loading
▪ What to do?
▪ Continue by inserting entries into the right-most index page
just above the leaf page; split when fills up
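[Figure: first, Root 10 with children 6 and 12; after more entries are added to the right-most index page and it splits, Root 10 20 with children 6, 12 and 23 35. Data entry pages further to the right are not yet in the B+ tree.]

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*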
B+ Tree: Bulk Loading
▪ What to do?
▪ Continue by inserting entries into the right-most index page
just above the leaf page; split when fills up

[Figure: Root 20 with children 10 and 35; below them the index pages 6, 12, 23 and 38. The remaining data entry pages are not yet in the B+ tree.]

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
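
The bulk-loading procedure walked through above can be sketched as follows. This is a sketch under assumptions: index pages hold at most 2 keys (which matches the splits in the example), leaf pages are plain sorted lists, and `IndexNode` and `bulk_load` are names made up for the illustration.

```python
MAX_INDEX_KEYS = 2  # assumed index-page capacity


class IndexNode:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def bulk_load(pages):
    """Build the index over already-sorted leaf pages, left to right."""
    root = IndexNode([], [pages[0]])         # pointer to the first leaf page
    for page in pages[1:]:
        # Find the right-most path from the root down to the leaf level.
        path = [root]
        while isinstance(path[-1].children[-1], IndexNode):
            path.append(path[-1].children[-1])
        key, child = page[0], page           # <lowest key on page, pointer>
        for node in reversed(path):
            node.keys.append(key)
            node.children.append(child)
            if len(node.keys) <= MAX_INDEX_KEYS:
                break
            mid = len(node.keys) // 2        # page is full: split it
            key = node.keys[mid]             # push up the middle key
            child = IndexNode(node.keys[mid + 1:], node.children[mid + 1:])
            node.keys = node.keys[:mid]
            node.children = node.children[:mid + 1]
        else:
            root = IndexNode([key], [root, child])   # the root itself split
    return root
```

Each new entry only touches the right-most path, so no search from the root by key comparison is needed, unlike repeated calls to insert().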
Hashing
Hash-Based Indexing
▪ What indexing technique can we use to support range
searches (e.g., “Find s_name where gpa >= 3.0”)?
▪ Tree-Based Indexing

▪ What about equality selections (e.g., “Find s_name
where sid = 102”)?
▪ Tree-Based Indexing
▪ Hash-Based Indexing (cannot support range searches!)

▪ Hash-based indexing, however, proves to be very useful

in implementing relational operators (e.g., joins)
Static Hashing
• A bucket is a unit of storage containing one or more entries (a
bucket is typically a disk block).
– we obtain the bucket of an entry from its search-key value using a hash
function
▪ A hash function h is used to map keys into a range of bucket numbers
• In a hash index, buckets store entries with pointers to records
• In a hash file-organization buckets store records

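[Figure: a key is mapped by h to bucket number h(key) mod N, one of the primary bucket pages 0 … N-1; overflow pages are chained to primary bucket pages]

With Static Hashing, primary bucket pages are allocated sequentially and never de-allocated.
With Static Hashing, overflow pages are allocated (as needed) when corresponding buckets become full.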
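
A static hash file with overflow pages can be sketched as below. The identity hash function, the page capacity of 2 and the class name `StaticHashFile` are assumptions chosen to keep the example small.

```python
def h(key):
    return key  # assumed hash function (identity), for illustration only


class StaticHashFile:
    def __init__(self, n_buckets=4, page_capacity=2):
        self.n = n_buckets
        self.capacity = page_capacity
        # Each bucket is a chain of pages: the primary page, then overflow pages.
        # Primary pages are allocated up front and never de-allocated.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def insert(self, key):
        chain = self.buckets[h(key) % self.n]
        if len(chain[-1]) == self.capacity:
            chain.append([])          # allocate an overflow page as needed
        chain[-1].append(key)

    def search(self, key):
        # A lookup may have to scan the whole overflow chain of one bucket.
        return any(key in page for page in self.buckets[h(key) % self.n])
```

Since N is fixed, inserting many keys that hash to the same bucket only lengthens that bucket's chain, and lookups there degrade towards a linear scan.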
Deficiencies of Static Hashing

▪ In static hashing, function h maps search-key values to a fixed set B of bucket
addresses. Databases grow or shrink with time.
• If initial number of buckets is too small, and file grows, performance will
degrade due to too many overflows.
• If space is allocated for anticipated growth, a significant amount of space
will be wasted initially (and buckets will be underfull).
• If database shrinks, again space will be wasted.
▪ One solution: periodic re-organization of the file with a new hash function
• Expensive, disrupts normal operations
▪ Better solution: allow the number of buckets to be modified dynamically.

Extendible Hashing
▪ Extendible Hashing uses a directory of pointers to buckets
▪ The result of applying a hash function h is treated as a binary number and the last d bits are interpreted as an offset into the directory
▪ d is referred to as the global depth of the hash file and is kept as part of the header of the file

[Figure: DIRECTORY (GLOBAL DEPTH 2) pointing to DATA PAGES: 00 → Bucket A (4*, 12*, 32*, 16*); 01 → Bucket B (1*, 5*, 21*); 10 → Bucket C (10*); 11 → Bucket D (15*, 7*, 19*)]
Extendible Hashing: Searching for Entries
▪ To search for a data entry, apply a hash function h to the
key and take the last d bits of its binary representation to
get the bucket number

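▪ Example: search for 5* (5 = 101)

[Figure: DIRECTORY (global depth 2) and DATA PAGES: 00 → Bucket A (4*, 12*, 32*, 16*); 01 → Bucket B (1*, 5*, 21*); 10 → Bucket C (10*); 11 → Bucket D (15*, 7*, 19*). The last two bits of 101 are 01, which leads to Bucket B.]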
Extendible Hashing: Inserting Entries
▪ An entry can be inserted as follows:
▪ Find the appropriate bucket (as in search)

▪ Split the bucket if full and redistribute contents

(including the new entry to be inserted) across
the old bucket and its “split image”

▪ Double the directory if necessary

▪ Insert the given entry
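
The insertion steps above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: it assumes the hash function is the identity on integer keys, buckets hold 4 entries, and each bucket records a local depth that is compared with the global depth to decide whether the directory must double. `Bucket` and `ExtendibleHash` are hypothetical names.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=4, global_depth=2):
        self.capacity = capacity
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _slot(self, key):
        # Assumed hash function: h(key) = key; use the last d bits.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Bucket is full: double the directory only if local == global depth.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)          # the "split image"
        new_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        # Redistribute the old contents plus the new entry.
        pending, bucket.keys = bucket.keys + [key], []
        for k in pending:
            self.insert(k)
```

Doubling by concatenating the directory with itself works because slot i and slot i + 2^d share their last d bits, so both initially point to the same bucket.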

Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 13* (13 = 1101)

[Figure: DIRECTORY (global depth 2): 00 → Bucket A (4*, 12*, 32*, 16*); 01 → Bucket B (1*, 5*, 21*, 13*); 10 → Bucket C (10*); 11 → Bucket D (15*, 7*, 19*)]
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 20* (20 = 10100)

Bucket A is FULL; hence, split and redistribute!

[Figure: DIRECTORY (global depth 2): 00 → Bucket A (4*, 12*, 32*, 16*); 01 → Bucket B (1*, 5*, 21*, 13*); 10 → Bucket C (10*); 11 → Bucket D (15*, 7*, 19*)]
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 20* (20 = 10100)

[Figure: DIRECTORY (global depth 2): 00 → Bucket A (32*, 16*); 01 → Bucket B (1*, 5*, 21*, 13*); 10 → Bucket C (10*); 11 → Bucket D (15*, 7*, 19*); Bucket A2, the ‘split image’ of Bucket A, holds 4*, 12*, 20*]

Is this enough?
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 20* (20 = 10100)

[Figure: DIRECTORY (global depth 2): 00 → Bucket A (32*, 16*); 01 → Bucket B (1*, 5*, 21*, 13*); 10 → Bucket C (10*); 11 → Bucket D (15*, 7*, 19*); Bucket A2, the ‘split image’ of Bucket A, holds 4*, 12*, 20*]

Double the directory and increase the global depth
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 20*

[Figure: DIRECTORY (GLOBAL DEPTH 3): 000 → Bucket A (32*, 16*); 001 and 101 → Bucket B (1*, 5*, 21*, 13*); 010 and 110 → Bucket C (10*); 011 and 111 → Bucket D (15*, 7*, 19*); 100 → Bucket A2, the ‘split image’ of Bucket A (4*, 12*, 20*)]

The last two bits indicate a data entry that belongs to one of these two buckets (A or A2).
The third bit distinguishes between these two buckets!

But, is it necessary always to double the directory?
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 9* (9 = 1001)

Bucket B is FULL; hence, split!

[Figure: DIRECTORY (GLOBAL DEPTH 3): 000 → Bucket A (32*, 16*); 001 and 101 → Bucket B (1*, 5*, 21*, 13*); 010 and 110 → Bucket C (10*); 011 and 111 → Bucket D (15*, 7*, 19*); 100 → Bucket A2, the ‘split image’ of Bucket A (4*, 12*, 20*)]
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 9* (9 = 1001)

[Figure: DIRECTORY (GLOBAL DEPTH 3): 000 → Bucket A (32*, 16*); 001 → Bucket B (1*, 9*); 010 and 110 → Bucket C (10*); 011 and 111 → Bucket D (15*, 7*, 19*); 100 → Bucket A2, the ‘split image’ of A (4*, 12*, 20*); 101 → Bucket B2, the ‘split image’ of B (5*, 21*, 13*)]

Almost there…
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 9* (9 = 1001)

[Figure: DIRECTORY (GLOBAL DEPTH 3): 000 → Bucket A (32*, 16*); 001 → Bucket B (1*, 9*); 010 and 110 → Bucket C (10*); 011 and 111 → Bucket D (15*, 7*, 19*); 100 → Bucket A2, the ‘split image’ of A (4*, 12*, 20*); 101 → Bucket B2, the ‘split image’ of B (5*, 21*, 13*)]

There was no need to double the directory!

When NOT to double the directory?
Extendible Hashing: Inserting Entries
▪ Find the appropriate bucket (as in search), split the bucket
if full, double the directory if necessary and insert the
given entry

▪ Example: insert 9* (9 = 1001)

[Figure: DIRECTORY (GLOBAL DEPTH 3), with the LOCAL DEPTH of each bucket: 000 → Bucket A (32*, 16*; local depth 3); 001 → Bucket B (1*, 9*; local depth 3); 010 and 110 → Bucket C (10*; local depth 2); 011 and 111 → Bucket D (15*, 7*, 19*; local depth 2); 100 → Bucket A2, the ‘split image’ of A (4*, 12*, 20*; local depth 3); 101 → Bucket B2, the ‘split image’ of B (5*, 21*, 13*; local depth 3)]

If a bucket whose local depth equals the global depth is split, the directory must be doubled
Extendible Hashing: Inserting Entries
▪ Example: insert 9* (9 = 1001). Repeat…

Bucket B is FULL; hence, split!

[Figure: DIRECTORY (GLOBAL DEPTH 3), with LOCAL DEPTHs: 000 → Bucket A (32*, 16*; local depth 3); 001 and 101 → Bucket B (1*, 5*, 21*, 13*; local depth 2); 010 and 110 → Bucket C (10*; local depth 2); 011 and 111 → Bucket D (15*, 7*, 19*; local depth 2); 100 → Bucket A2, the ‘split image’ of Bucket A (4*, 12*, 20*; local depth 3)]

Because the local depth (i.e., 2) is less than the global depth (i.e., 3), NO need to double the directory
Extendible Hashing: Inserting Entries
▪ Example: insert 9* (9 = 1001). Repeat…

[Figure: DIRECTORY (GLOBAL DEPTH 3), with LOCAL DEPTHs: 000 → Bucket A (32*, 16*; local depth 3); 001 → Bucket B (1*, 9*; local depth 3); 010 and 110 → Bucket C (10*; local depth 2); 011 and 111 → Bucket D (15*, 7*, 19*; local depth 2); 100 → Bucket A2, the ‘split image’ of A (4*, 12*, 20*; local depth 3); 101 → Bucket B2, the ‘split image’ of B (5*, 21*, 13*; local depth 3)]
Extendible Hashing: Inserting Entries
▪ Example: insert 9* (9 = 1001). Repeat…

[Figure: DIRECTORY (GLOBAL DEPTH 3), with LOCAL DEPTHs: 000 → Bucket A (32*, 16*; local depth 3); 001 → Bucket B (1*, 9*; local depth 3); 010 and 110 → Bucket C (10*; local depth 2); 011 and 111 → Bucket D (15*, 7*, 19*; local depth 2); 100 → Bucket A2, the ‘split image’ of A (4*, 12*, 20*; local depth 3); 101 → Bucket B2, the ‘split image’ of B (5*, 21*, 13*; local depth 3)]

FINAL STATE!
Extendible Hashing: Inserting Entries
▪ Example: insert 20* (20 = 10100). Repeat…

Bucket A is FULL; hence, split!

[Figure: DIRECTORY (GLOBAL DEPTH 2) and DATA PAGES, with LOCAL DEPTHs: 00 → Bucket A (4*, 12*, 32*, 16*; local depth 2); 01 → Bucket B (1*, 5*, 21*, 13*; local depth 2); 10 → Bucket C (10*; local depth 2); 11 → Bucket D (15*, 7*, 19*; local depth 2)]

Because the local depth and the global depth are both 2, we should double the directory!
Extendible Hashing: Inserting Entries
▪ Example: insert 20* (20 = 10100). Repeat…

[Figure: DIRECTORY (GLOBAL DEPTH 2), with LOCAL DEPTHs: 00 → Bucket A (32*, 16*; local depth 2); 01 → Bucket B (1*, 5*, 21*, 13*; local depth 2); 10 → Bucket C (10*; local depth 2); 11 → Bucket D (15*, 7*, 19*; local depth 2); Bucket A2, the ‘split image’ of Bucket A (4*, 12*, 20*; local depth 2)]

Is this enough?
Extendible Hashing: Inserting Entries
▪ Example: insert 20*. Repeat…

[Figure: the DIRECTORY has been doubled to entries 000 … 111 (GLOBAL DEPTH 3); Bucket A (32*, 16*), Bucket B (1*, 5*, 21*, 13*), Bucket C (10*), Bucket D (15*, 7*, 19*) and Bucket A2, the ‘split image’ of Bucket A (4*, 12*, 20*), all still show local depth 2]

Is this enough?
Extendible Hashing: Inserting Entries
▪ Example: insert 20*. Repeat…

[Figure: DIRECTORY (GLOBAL DEPTH 3), with LOCAL DEPTHs: 000 → Bucket A (32*, 16*; local depth 3); 001 and 101 → Bucket B (1*, 5*, 21*, 13*; local depth 2); 010 and 110 → Bucket C (10*; local depth 2); 011 and 111 → Bucket D (15*, 7*, 19*; local depth 2); 100 → Bucket A2, the ‘split image’ of Bucket A (4*, 12*, 20*; local depth 3)]

FINAL STATE!
Bitmap Indices

▪ Bitmap indices are a special type of index designed for efficient querying on
multiple keys
▪ Records in a relation are assumed to be numbered sequentially from, say, 0
• Given a number n it must be easy to retrieve record n
▪ Particularly easy if records are of fixed size
▪ Applicable on attributes that take on a relatively small number of distinct values
• E.g., gender, country, state, …
• E.g., income-level (income broken up into a small number of levels such as
0-9999, 10000-19999, 20000-50000, 50000- infinity)
▪ A bitmap is simply an array of bits

Bitmap Indices (Cont.)
▪ In its simplest form a bitmap index on an attribute has a bitmap for each value of
the attribute
• Bitmap has as many bits as records
• In a bitmap for value v, the bit for a record is 1 if the record has the value v
for the attribute, and is 0 otherwise
▪ Example

▪ Bitmap indices are useful for queries on multiple attributes

• not particularly useful for single attribute queries
▪ Queries are answered using bitmap operations
• Intersection (and)
• Union (or)
▪ Each operation takes two bitmaps of the same size and applies the operation on corresponding
bits to get the result bitmap
• E.g., 100110 AND 110011 = 100010
100110 OR 110011 = 110111
NOT 100110 = 011001
• Males with income level L1: 10010 AND 10100 = 10000
▪ Can then retrieve required tuples.
▪ Counting number of matching tuples is even faster
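
The bitmap operations above can be reproduced with Python integers used as bit arrays. This is only a sketch: the helper names `bitmap`, `show` and `bitmap_not` are made up, and which bitmap stands for "male" and which for "income level L1" is taken from the order in the example.

```python
def bitmap(bits):
    """Parse a string such as '100110' into an integer bit array."""
    return int(bits, 2)


def show(value, n_records):
    """Format a bitmap back into a fixed-width string of n_records bits."""
    return format(value, "0{}b".format(n_records))


def bitmap_not(value, n_records):
    # Complement, masked so that only the n_records bits are kept.
    return ~value & ((1 << n_records) - 1)


a, b = bitmap("100110"), bitmap("110011")
intersection = show(a & b, 6)        # '100010'
union = show(a | b, 6)               # '110111'
complement = show(bitmap_not(a, 6), 6)   # '011001'

# Males with income level L1 (bitmaps from the example above).
males_l1 = bitmap("10010") & bitmap("10100")
match_count = bin(males_l1).count("1")   # count matches without fetching tuples
```

Counting set bits answers "how many tuples match" from the index alone, which is why counting is even faster than retrieval.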

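▪ Bitmap indices are generally very small compared with relation size

• E.g., if a record is 100 bytes, the space for a single bitmap is 1/800 of the space used by the relation; if the number of distinct attribute values is 8, the bitmap index is only 1% of the relation size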

Spatio-Temporal Indexing
▪ Databases can store data types such as lines, polygons, in addition to raster images
• allows relational databases to store and retrieve spatial information
• Queries can use spatial conditions (e.g. contains or overlaps).
• queries can mix spatial and nonspatial conditions
▪ Nearest neighbor queries, given a point or an object, find the nearest object that
satisfies given conditions.
▪ Range queries deal with spatial regions. e.g., ask for objects that lie partially or
fully inside a specified region.
▪ Queries that compute intersections or unions of regions.
▪ Spatial join of two spatial relations with the location playing the role of join
attribute.

▪ k-d tree - early structure used for indexing in multiple dimensions.
▪ Each level of a k-d tree partitions the space into two.
• Choose one dimension for partitioning at the root level of the tree.
• Choose another dimension for partitioning in nodes at the next level and so on, cycling through the dimensions.
▪ In each node, approximately half of the points stored in the sub-tree fall on one side and half on the other.
▪ Partitioning stops when a node has less than a given number of points.
▪ The k-d-B tree extends the k-d tree to allow multiple child nodes for each internal node; well-suited for secondary storage.
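
The k-d tree construction described above can be sketched as follows. It is a minimal in-memory sketch: nodes are plain dictionaries, splitting is at the median of the sorted points, and partitioning stops once a node holds at most `leaf_size` points. The function name and the node layout are assumptions for the example.

```python
def build_kdtree(points, depth=0, leaf_size=1):
    """Recursively partition points, cycling through the dimensions."""
    if len(points) <= leaf_size:
        return {"points": list(points)}          # leaf node
    axis = depth % len(points[0])                # dimension used at this level
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                      # about half on each side
    return {
        "axis": axis,
        "split": ordered[mid][axis],
        "left": build_kdtree(ordered[:mid], depth + 1, leaf_size),
        "right": build_kdtree(ordered[mid:], depth + 1, leaf_size),
    }


sample_points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
kd_root = build_kdtree(sample_points)
```

On the sample points the root splits on x at 7, and its left child splits on y at 4, showing the alternation of dimensions from level to level.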

▪ Each node of a quadtree is associated with a rectangular region of space; the top
node is associated with the entire target space.
▪ Each non-leaf node divides its region into four equal sized quadrants
• correspondingly each such node has four child nodes corresponding to the
four quadrants and so on
▪ Leaf nodes have between zero and some fixed maximum number of points (set to
1 in example).

▪ R-trees are an N-dimensional extension of B+-trees, useful for indexing sets of
rectangles and other polygons.
▪ Supported in many modern database systems, along with variants like R+-trees
and R*-trees.
▪ Basic idea: generalize the notion of a one-dimensional interval associated with
each B+-tree node to an N-dimensional interval, that is, an N-dimensional rectangle.
▪ Will consider only the two-dimensional case (N = 2)
• generalization for N > 2 is straightforward, although R-trees work well only
for relatively small N
▪ The bounding box of a node is a minimum sized rectangle that contains all the
rectangles/polygons associated with the node
• Bounding boxes of children of a node are allowed to overlap

▪ A set of rectangles (solid line) and the bounding boxes (dashed line) of the nodes
of an R-tree for the rectangles.
▪ The R-tree is shown on the right.

▪ To find data items intersecting a given query point/region, do the following,

starting from the root node:
• If the node is a leaf node, output the data items whose keys intersect the given
query point/region.
• Else, for each child of the current node whose bounding box intersects the
query point/region, recursively search the child
▪ Can be very inefficient in worst case since multiple paths may need to be
searched, but works acceptably in practice.
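
The recursive R-tree search above can be sketched for the two-dimensional case. This is an illustrative sketch: rectangles are `(xmin, ymin, xmax, ymax)` tuples, nodes are dictionaries, and the tiny tree and the names `intersects` and `rtree_search` are invented for the example.

```python
def intersects(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rtree_search(node, query):
    if node["leaf"]:
        # Output the data items whose keys intersect the query region.
        return [name for name, rect in node["entries"] if intersects(rect, query)]
    found = []
    for bounding_box, child in node["entries"]:
        if intersects(bounding_box, query):      # may follow several children
            found.extend(rtree_search(child, query))
    return found


# Hypothetical two-leaf tree whose bounding boxes overlap.
leaf1 = {"leaf": True, "entries": [("A", (0, 0, 2, 2)), ("B", (3, 1, 5, 3))]}
leaf2 = {"leaf": True, "entries": [("C", (6, 5, 8, 7)), ("D", (4, 2, 7, 6))]}
rtree_root = {"leaf": False,
              "entries": [((0, 0, 5, 3), leaf1), ((4, 2, 8, 7), leaf2)]}
```

A query point is passed as a degenerate rectangle; the point (4.5, 2.5) lies inside both bounding boxes, so both leaves are visited, which is the multiple-path case mentioned above.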

▪ Temporal data refers to data that has an associated time period (interval)
• Example: a temporal version of the course relation

▪ A time interval has a start time and an end time
• End time set to infinity (or a large date such as 9999-12-31) if a tuple is
currently valid and its validity end time is not currently known
▪ Query may ask for all tuples that are valid at a point in time or during a time
interval
• Index on valid time period speeds up this task
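
The two kinds of temporal query can be written as simple interval predicates. The sketch below evaluates them by scanning every tuple, which is exactly the work an index on the valid-time period is meant to avoid. The rows, the `CourseVersion` type, and the choice of intervals that include both endpoints are assumptions for illustration and are not taken from the course relation in the slides.

```python
from datetime import date
from typing import NamedTuple

FOREVER = date(9999, 12, 31)  # stands in for an unknown (infinite) end time


class CourseVersion(NamedTuple):
    course_id: str
    title: str
    start: date
    end: date  # assumed inclusive


def valid_at(t: CourseVersion, when: date) -> bool:
    """Tuple is valid at a single point in time."""
    return t.start <= when <= t.end


def valid_during(t: CourseVersion, lo: date, hi: date) -> bool:
    """Tuple's validity interval overlaps the query interval [lo, hi]."""
    return t.start <= hi and lo <= t.end


# Hypothetical data: two versions of the same course
rows = [
    CourseVersion("CS-101", "Old title", date(2009, 1, 1), date(2015, 12, 31)),
    CourseVersion("CS-101", "New title", date(2016, 1, 1), FOREVER),
]

at_2012 = [r.title for r in rows if valid_at(r, date(2012, 6, 1))]
around_change = [
    r.title for r in rows
    if valid_during(r, date(2015, 6, 1), date(2016, 6, 1))
]
```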

▪ To create a temporal index on attribute a:
• Use a spatial index, such as an R-tree, with attribute a as one dimension and
time as another dimension
– Valid time forms an interval in the time dimension
• Tuples that are currently valid cause problems, since their end time is infinite
or very large
– Solution: store all current tuples (with end time as infinity) in a separate
index, indexed on (a, start-time)
– To find tuples valid at a point in time t in the current tuple index,
search for tuples in the range (a, 0) to (a, t)
▪ Temporal index on primary key can help enforce temporal primary key constraint
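
The separate index for current tuples can be illustrated with a sorted list standing in for a B+-tree on (a, start-time). Every tuple in this index has an end time of infinity, so it is valid at time t exactly when its start time is at most t, which is why the range (a, 0) to (a, t) suffices. The class name, the use of non-negative integer timestamps, and the sorted-list representation are assumptions for this sketch.

```python
import bisect
from typing import List, Tuple


class CurrentTupleIndex:
    """Sorted-list stand-in for an ordered index on (a, start_time).

    Holds only tuples whose end time is infinity (currently valid tuples).
    Timestamps are assumed to be non-negative integers.
    """

    def __init__(self) -> None:
        self._keys: List[Tuple[str, int]] = []  # kept sorted
        self._rows: List[str] = []              # row ids, parallel to _keys

    def insert(self, a: str, start_time: int, row_id: str) -> None:
        key = (a, start_time)
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._rows.insert(i, row_id)

    def valid_at(self, a: str, t: int) -> List[str]:
        """Range scan from (a, 0) to (a, t), both ends inclusive."""
        lo = bisect.bisect_left(self._keys, (a, 0))
        hi = bisect.bisect_right(self._keys, (a, t))
        return self._rows[lo:hi]


# Hypothetical current tuples
idx = CurrentTupleIndex()
idx.insert("CS-101", 5, "r1")
idx.insert("CS-101", 20, "r2")
idx.insert("BIO-101", 3, "r3")

at_10 = idx.valid_at("CS-101", 10)  # only r1 had started by time 10
at_20 = idx.valid_at("CS-101", 20)  # both r1 and r2
```

Tuples with a known end time would still go into the spatial index on (a, time); a full lookup at time t consults both structures.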

