Tutorial 10 Indexing

The document discusses two main indexing approaches in a DBMS: B+-tree and extendable hashing, noting that these topics will not be included in the exam. It details the creation of indexes in MySQL, the structure and properties of B+-trees, and the process of insertion and deletion within these trees. Additionally, it covers the concept of extendable hashing, highlighting its dynamic nature and the use of a bucket address table for efficient data management.

Uploaded by

wzy190817

Indexing COMP3278 2024-2025

Overview
• Two approaches to indexing in a DBMS
• B+-tree
• Extendable hashing

This topic will not be included in the exam.

CREATE INDEX in MySQL
https://dev.mysql.com/doc/refman/5.7/en/create-index.html

• One can create an index in MySQL using the CREATE INDEX syntax.
• This speeds up the execution of queries if the index can be used.
• Note that all primary keys are indexed.

Index creation is DBMS-specific: different DBMSs (and storage engines) provide different indexing capabilities.
Knowing how to create proper indexes in a database for an application is an important skill for database administrators.

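As a rough illustration of creating an index and checking that a query can use it, here is a minimal sketch. It uses SQLite through Python's built-in sqlite3 module purely as a stand-in for MySQL, and the table name, column names and index name (student, name, gpa, idx_student_name) are hypothetical. Since index behaviour is DBMS-specific, the query plan output of MySQL will look different.

```python
import sqlite3

# SQLite (bundled with Python) is used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student (name, gpa) VALUES (?, ?)",
    [("Alice", 3.9), ("Bob", 3.1), ("Carol", 3.5)],
)

# Hypothetical index name; the statement has the general form
# CREATE INDEX <index name> ON <table> (<column list>).
conn.execute("CREATE INDEX idx_student_name ON student (name)")

# Ask the engine how it would run a query that filters on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM student WHERE name = ?", ("Bob",)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)  # the plan description should mention idx_student_name
```
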
Tree and hash
• Two major approaches are used to provide an index:
• A tree provides an ordered list of the indexed columns and supports comparison operators and ORDER BY queries.
• A hash provides an unordered list of the indexed columns and supports fast access by index.

[Figure: a B+-tree (left) and an extendable hash structure with its bucket address table (right)]

B+-tree - node
A node in a B+-tree consists of (search-key) values and pointers.

[Figure: a node holding the search-key values 14 and 20, with pointers around them]

The number of search-key values is configurable (but is fixed within the same tree).
In this example, the tree has order 4, i.e., there are 4 pointers (3 search-key values) in a node.

B+-tree – leaf and non-leaf node
Pointers in a non-leaf node point to nodes at a lower level.
The pointer to the left of a search-key value points to a node with values less than the search-key value.
In a leaf node, the pointer before a value points to its corresponding record.
Leaf nodes form a linked list using their last pointers.
Values in non-leaf nodes are part of the tree structure; only the values in the leaves are the indexed values.

[Figure: non-leaf node (14, 20) whose pointers cover the ranges < 14, ≥ 14 and < 20, and ≥ 20; leaf values 5, 7, 15, 20]

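The node layout and pointer rules above can be sketched in a few lines of Python. This is only an illustrative model, not a storage-engine implementation: the class name Node and the helper names are made up, records are omitted, and the example tree is modelled on the order-4 tree with root 18 that appears in the insertion slides, with leaf boundaries inferred.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted search-key values
        self.children = children    # child pointers; None marks a leaf
        self.next = None            # leaf-to-leaf link (the "last pointer")

def find_leaf(root, key):
    """Walk from the root to the leaf that may hold key; count nodes accessed."""
    node, accessed = root, 1
    while node.children is not None:
        # keys <= key go to the right of that key, smaller ones to the left
        node = node.children[bisect_right(node.keys, key)]
        accessed += 1
    return node, accessed

def search(root, key):
    leaf, accessed = find_leaf(root, key)
    return key in leaf.keys, accessed

def range_search(root, low, high):
    """Find the first leaf, then follow the leaf linked list."""
    leaf, _ = find_leaf(root, low)
    found = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return found
            if k >= low:
                found.append(k)
        leaf = leaf.next
    return found

# Hypothetical example tree (order 4)
leaves = [Node([5, 7]), Node([14, 15]), Node([16, 17]), Node([18, 19]), Node([20])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([18], [Node([14, 16], leaves[:3]), Node([20], leaves[3:])])

print(search(root, 16))            # (found?, nodes accessed)
print(range_search(root, 15, 18))  # values in [15, 18]
```

A point lookup touches one node per level, while a range query descends once and then only reads leaves, which is why the leaf linked list matters.
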
B+-tree – quick test
Consider this B+-tree:

[Figure: a three-level B+-tree; root (14, 18); second-level values 7, 9, 14, 20 (node boundaries lost in extraction); leaf values 1, 3, 4, 7, 8, 10, 13, 14, 15, 16, 17, 18, 19, 20, 21]

1. How many nodes are accessed if we want to find the value 16?
2. How many nodes are accessed if we want to find values between 13 and 17?
3. If no new node is added, how many values could be stored in this tree?

Why don't we just use a binary tree?

A B+-tree optimizes memory access: a node fits in a single block in memory. This minimizes the number of block retrievals.

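To make the binary-tree comparison concrete, the following sketch counts levels, which is roughly the number of blocks read per lookup if each node occupies one block. The function names are hypothetical, and the B+-tree figure assumes every node is completely full, so it is a lower bound rather than what a real tree would achieve.

```python
def binary_tree_levels(n_values):
    """Levels of a perfectly balanced binary search tree holding n_values."""
    return n_values.bit_length()          # equals ceil(log2(n_values + 1))

def bplus_min_levels(n_values, order):
    """Fewest levels of a B+-tree of the given order with n_values in its leaves."""
    nodes = -(-n_values // (order - 1))   # leaves needed when each holds order - 1 values
    levels = 1
    while nodes > 1:
        nodes = -(-nodes // order)        # parents needed when each has `order` pointers
        levels += 1
    return levels

print(binary_tree_levels(1_000_000))      # levels for a binary tree
print(bplus_min_levels(1_000_000, 100))   # levels for a B+-tree of order 100
```

With one million values, the balanced binary tree needs 20 levels, while a full B+-tree of order 100 needs only 4.
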
B+-tree properties
• Always balanced – the depth of all leaf nodes is the same.
• Except for the root node:
• A leaf node must be at least half-full (half of the values are not empty), i.e., at least $\lceil (m-1)/2 \rceil$ values for a tree of order $m$.
• A non-leaf node must be at least half-full (half of the pointers are pointing to a child), i.e., at least $\lceil m/2 \rceil$ pointers for a tree of order $m$.

• In this discussion, we assume that the value to be indexed is unique.
• How non-unique values could be stored is implementation specific.

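The half-full rule can be turned into two small helper functions. This sketch assumes the usual textbook reading in which "half" is rounded up; the function names are hypothetical.

```python
def min_leaf_values(order):
    """Minimum values in a non-root leaf: ceil((order - 1) / 2)."""
    return order // 2            # integer form of ceil((order - 1) / 2)

def min_nonleaf_pointers(order):
    """Minimum child pointers in a non-root, non-leaf node: ceil(order / 2)."""
    return (order + 1) // 2      # integer form of ceil(order / 2)

for m in (4, 5):
    print(m, min_leaf_values(m), min_nonleaf_pointers(m))
```

For the order-4 tree used in these slides, a leaf therefore needs at least 2 values and a non-leaf node at least 2 pointers.
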
B+-tree – insertion (1)
Simply search and fill if the leaf node is not full.
Insert: 17

[Figure: root (14, 20); leaf values 5, 7, 15, 20]

B+-tree – insertion (1)
Result
Insert: 17

[Figure: root (14, 20); leaf values 5, 7, 15, 17, 20]

B+-tree – insertion (1)
Split the node if inserting into a full node.
Insert: 18

[Figure: root (14, 20); leaf values 5, 7, 15, 17, 19, 20]

B+-tree – insertion (2)
Splitting - 1
Insert: 18

If adding a value to a node exceeds its capacity, the node should be split and the values redistributed.
Pick a mid-point to split. If the number of values to be redistributed is odd, the choice is implementation dependent.
Once split, a new parent value is needed. We pick the first value in the right child as the parent value, and we need to insert this value into the parent node.

[Figure: the overfull leaf (15, 17, 18, 19) is split into (15, 17) and (18, 19); 18 is passed up to the parent]

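The leaf split described above can be sketched as follows. Leaves are modelled as plain sorted Python lists, the function name is hypothetical, and splitting at len // 2 is just one possible choice for the implementation-dependent mid-point.

```python
from bisect import insort

def insert_into_leaf(leaf, value, order):
    """Insert value into a leaf (a sorted list) of a B+-tree of the given order.

    Returns (left, right, separator). When no split is needed, right and
    separator are None. Otherwise separator is the first value of the right
    node, which must then be inserted into the parent.
    """
    values = list(leaf)
    insort(values, value)                  # keep the leaf sorted
    if len(values) <= order - 1:           # a leaf holds at most order - 1 values
        return values, None, None
    mid = len(values) // 2                 # assumed mid-point choice
    left, right = values[:mid], values[mid:]
    return left, right, right[0]

print(insert_into_leaf([15], 17, 4))          # fits, no split
print(insert_into_leaf([15, 17, 19], 18, 4))  # overflows, split around 18
```
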
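B+-tree – insertion (2)
Update parent: insert into the parent recursively.
Insert: 18

[Figure: root (14, 20) receiving the value 18 from the split leaves (15, 17) and (18, 19); other leaf values 5, 7, 20]
The parent can keep it, so it is done.

B+-tree – insertion (2)
Result
Insert: 18

[Figure: root (14, 18, 20); leaf values 5, 7, 15, 17, 18, 19, 20]
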
B+-tree – insertion (3)
Node splitting may need to be done recursively.
Insert: 14

[Figure: root (14, 18, 20); leaf values 5, 7, 15, 16, 17, 18, 19, 20]

B+-tree – insertion (3)
First split
Insert: 14

[Figure: the leaf is split into (14, 15) and (16, 17), and 16 is passed up to the root (14, 18, 20); other leaf values 5, 7, 18, 19, 20]
The parent is also full, so the parent needs to be split as well.

B+-tree – insertion (3)
Splitting parent

For non-leaf nodes, we split according to the pointers, not the values. Here we have 5 pointers; the way to split them is implementation specific.
To do this, we pull one value up as the new parent of the split.

[Figure: the overfull non-leaf node (14, 16, 18, 20) is split into (14, 16) and (20); 18 is pulled up as the new parent]

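A non-leaf split differs from a leaf split in that the middle value moves up instead of being copied. The sketch below shows one way to do it; the function name is hypothetical, the child pointers are represented by placeholder strings, and giving the left node the extra pointer is only one of the implementation-specific choices.

```python
def split_nonleaf(keys, children):
    """Split an overfull non-leaf node.

    keys has one more value than the node can hold and children has
    len(keys) + 1 pointers. The middle key is pulled up and is kept in
    neither half.
    """
    mid = len(keys) // 2                       # assumed split point
    pulled_up = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, pulled_up, right

# Placeholder names standing in for the five child pointers.
kids = ["c0", "c1", "c2", "c3", "c4"]
left, up, right = split_nonleaf([14, 16, 18, 20], kids)
print(left, up, right)
```

For the node (14, 16, 18, 20), this pulls up 18 and leaves (14, 16) with three pointers and (20) with two.
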
B+-tree – insertion (3)
Result
Insert: 14

[Figure: root (18); non-leaf nodes (14, 16) and (20); leaf values 5, 7, 14, 15, 16, 17, 18, 19, 20]

B+-tree – deletion (1)
When a value is deleted from a node, the node may have fewer values than the minimum.
Delete: 18

[Figure: root (18); non-leaf nodes (14, 16) and (20); leaf values 5, 7, 14, 15, 16, 17, 18, 19, 20]
Once 18 is removed, this node has only 1 value. Since a node must be at least half-full, this node should be removed.

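B+-tree – deletion (1)
Merging nodes
If there is a sibling node that can take the value, we merge the two nodes. The choice of left/right sibling is implementation specific.
Delete: 18

[Figure: root (18); non-leaf nodes (14, 16) and (20); leaf values 5, 7, 14, 15, 16, 17, 18, 19, 20; the underfull leaf is marked "Merge" with its sibling. Callout on a neighbouring leaf: "Note that this is NOT a sibling of the target node!"]
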
B+-tree – deletion (1)
Parent node
Delete: 18

Once leaf nodes are merged, the corresponding parent node has to be updated.
The parent node has only one pointer now, which is less than half full, so this node has to be removed as well.
This time we can also merge it with its sibling.

[Figure: root (18); non-leaf node (14, 16); leaf values 5, 7, 14, 15, 16, 17, 19, 20, with the merged leaf marked "Merged"]

B+-tree – deletion (1)
Parent node (2)
Delete: 18

What should this be?
• Could be 19 – because the first value in the children is 19.
• Could be 18 – because the previous value in the parent is 18.

[Figure: root (18); non-leaf node (14, 16) with an undecided additional value; leaf values 5, 7, 14, 15, 16, 17, 19, 20]

B+-tree – deletion (1)
Result
Delete: 18

The original root node can be removed, as it has only one child.

[Figure: root (14, 16, 18); leaf values 5, 7, 14, 15, 16, 17, 19, 20]

B+-tree – deletion (2)
If merging fails, we need to redistribute the values between the node and its sibling.
Delete: 18

[Figure: root (18); non-leaf nodes (14, 16) and (20); leaf values 5, 7, 14, 15, 16, 17, 18, 19, 20, 21, 25]
Once 18 is removed, the sibling node cannot take the remaining value, so a redistribution should be done.

B+-tree – deletion (2)
Update parent
Delete: 18

The value in the parent node has to be updated.

[Figure: root (18); non-leaf nodes (14, 16) and (21); leaf values 5, 7, 14, 15, 16, 17, 19, 20, 21, 25, with the affected leaves marked "Redistributed"]

B+-tree – Summary
• Insertion:
• Split the node and generate a new parent
• Recursively insert the new parent into the parent node
• Deletion:
• Merge the node with a sibling if the node is underfull
• Recursively update parent nodes, delete a node if it is empty
• Redistribute values if merging is not possible
• Update parent nodes

In practice, the parent node is sometimes not updated after a deletion, even when there is an underfill. Why?

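The merge-or-redistribute decision for an underfull leaf can be sketched as below. Leaves are plain sorted lists, only the right sibling is considered, the function name is hypothetical, and an even split is assumed for redistribution. Updating the parent afterwards is left out.

```python
def fix_underfull_leaf(leaf, right_sibling, order):
    """Repair a leaf after a deletion, using its right sibling.

    Returns (action, left, right, separator). separator is the first value
    of the right node after the repair, i.e. the value the parent should
    use, or None when the two nodes were merged into one.
    """
    capacity = order - 1
    minimum = order // 2                          # ceil((order - 1) / 2)
    if len(leaf) >= minimum:
        return "ok", leaf, right_sibling, right_sibling[0]
    combined = leaf + right_sibling
    if len(combined) <= capacity:                 # sibling can take the values
        return "merge", combined, None, None
    mid = len(combined) // 2                      # assumed even redistribution
    return "redistribute", combined[:mid], combined[mid:], combined[mid]

# Deletion (1): leaf (18, 19) loses 18, and its sibling (20) has room.
print(fix_underfull_leaf([19], [20], 4))
# Deletion (2): the sibling (20, 21, 25) is full, so values are redistributed.
print(fix_underfull_leaf([19], [20, 21, 25], 4))
```
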
Extendable hashing
• Traditional hashing is static in nature
• Need to decide on the initial hash table size
• Does not scale well when the database grows
• Dynamic hashing expands the hash table when it is needed
• Could be optimized by using buckets whose size equals one block of memory
• Extendable hashing is one example

Static hashing
Here is a typical implementation of static hashing for a database.

Configuration: the hash function ℎ(𝑥) gives a hash value of 0 − 3. The bucket size is 2.

Value 𝑥 is placed in bucket 𝑚 if ℎ(𝑥) = 𝑚.
Each entry points to a record in the database.
If a bucket is full, an overflow bucket is added through chaining.
The number of initial buckets equals the number of possible hash values; these buckets are created even if they are empty.

[Figure: Bucket 0 holds 4; Bucket 1 holds 5 and 21, with an overflow bucket holding 13; no entries are shown for Bucket 2 and Bucket 3]

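A minimal sketch of the static scheme follows. It assumes ℎ(𝑥) = 𝑥 mod 4, which is consistent with the figure (4 in Bucket 0; 5, 21 and 13 in Bucket 1) but is not stated on the slide. The class name is hypothetical, and each bucket is modelled as a chain of fixed-size blocks.

```python
class StaticHash:
    def __init__(self, n_buckets=4, bucket_size=2):
        self.n_buckets = n_buckets
        self.bucket_size = bucket_size
        # Every bucket exists from the start, even if it stays empty.
        # A bucket is a chain of blocks: the primary block plus overflow blocks.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def h(self, x):
        return x % self.n_buckets              # assumed hash function

    def add(self, x):
        chain = self.buckets[self.h(x)]
        if len(chain[-1]) == self.bucket_size:
            chain.append([])                   # chain a new overflow block
        chain[-1].append(x)

    def contains(self, x):
        return any(x in block for block in self.buckets[self.h(x)])

table = StaticHash()
for value in (4, 5, 21, 13):
    table.add(value)
print(table.buckets)
```

The number of buckets never changes, so a skewed or growing data set simply produces longer overflow chains.
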
Extendable hashing
Extendable hashing adds two features:
1. A bucket address table
2. Hash prefix size

Configuration: the hash function ℎ(𝑥) gives a hash value of 𝑛 bits. The bucket size is 2.

The bucket address table lists all hash prefixes of size 𝑛 bits, where 𝑛 is indicated in the hash prefix size record.
Multiple entries in the bucket address table can point to the same bucket.
All entries in the same bucket have the same hash prefix, of that bucket's hash prefix size.
If there are many entries with the same hash value, there will be overflow buckets.

[Figure: a bucket address table with hash prefix size 3 (entries 000 to 111) pointing to four buckets: hash prefix size 1 holding 1417; hash prefix size 2 holding 709 and 1867, with 1346 in an overflow bucket; hash prefix size 3 holding 1640; hash prefix size 3 holding 1394 and 1653]

Extendable hashing – construction example
Suppose we are adding these entries in a configuration of extendable hashing with a bucket size of 2.

𝒙    𝒉(𝒙)
3    3 (11)
2    2 (10)
5    1 (01)
7    3 (11)
11   3 (11)
13   1 (01)
0    0 (00)
1    1 (01)

Extendable hashing
Initial state, add 3, 2

Initial state
• Initially, the hash prefix size is 0.
• There is one entry in the bucket address table (2^0 = 1).
• There is one corresponding empty bucket.

Add 3
• ℎ(3) = 3, and the current hash prefix size is 0.
• The first 0 bits are ∅ (nothing).
• The only bucket is not full, so 3 can be added.

Add 2
• Same when adding 2.

[Figure: bucket address table with hash prefix size 0 and a single bucket, shown empty, then holding 3, then holding 3 and 2]

Extendable hashing
add 5

Add 5
• ℎ(5) = 1, and the first 0 bits are ∅ (nothing).
• The only bucket is full, so we need to split the bucket.
• The hash prefix size for the new buckets is increased by 1.
• The bucket address table has a smaller hash prefix size, so it needs to be expanded.

[Figure: Split bucket: the bucket holding 3 and 2 is split into a bucket for prefix 0 and a bucket for prefix 1, each with hash prefix size 1; 3 and 2 go to prefix 1. Expand bucket address table: the table grows to hash prefix size 1 with entries 0 and 1. Final state: the bucket for prefix 0 holds 5 and the bucket for prefix 1 holds 3 and 2]

Extendable hashing
add 7

Add 7
• ℎ(7) = 3, and the first 1 bit is 1.
• The bucket for "1" is full, so we need to split the bucket.
• The hash prefix size for the new buckets is increased by 1.
• The bucket address table has a smaller hash prefix size, so it needs to be expanded.

[Figure: Split bucket: the bucket holding 3 and 2 is split into a bucket for prefix 10 holding 2 and a bucket for prefix 11 holding 3, each with hash prefix size 2. Expand bucket address table: the table grows to hash prefix size 2 with entries 00, 01, 10 and 11. Final state: the bucket with hash prefix size 1 holds 5, the bucket for prefix 10 holds 2, and the bucket for prefix 11 holds 3 and 7]

Extendable hashing
add 11

Add 11
• ℎ(11) = 3, and the first 2 bits are 11.
• The bucket for "11" is full. We consider splitting the bucket to prefix size 3; however, all entries have the same hash prefix of size 3, so splitting cannot resolve the collision.
• An overflow bucket is added instead.

[Figure: bucket address table with hash prefix size 2 (entries 00, 01, 10, 11); the bucket for prefix 11 holds 3 and 7 and has an overflow bucket holding 11; the other buckets hold 5 and 2]

Extendable hashing
add 13, 0, 1

[Figure: bucket address table with hash prefix size 2 (entries 00, 01, 10, 11); bucket contents shown include 5 and 13, 2, and 3 and 7 with an overflow bucket holding 11]

You can apply the same procedure to add the remaining 3.
This is left as an exercise.

Extendable hashing – Summary
• Initialize the bucket address table with prefix size 0.
• Associate one empty bucket of prefix size 0 with the only entry in the bucket address table.
• To add a value, check the first 𝑛 bits of the hash value, where 𝑛 is the current prefix size. Find the corresponding bucket.
• If the bucket is not full, add the entry to the bucket.
• If the bucket is full, try to split the bucket and redistribute the existing entries.
• If the bucket is still full, use an overflow bucket instead.
• Add the new entry to the newly split bucket. If the prefix size of the bucket is larger than the current prefix size in the bucket address table, expand the bucket address table.
• Then update the bucket address table.
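
The procedure summarized above can be sketched in Python as follows. This is an illustrative model under several assumptions: the hash function is taken to be ℎ(𝑥) = 𝑥 mod 4 (a 2-bit value, which matches the 𝒉(𝒙) column of the construction example but is not stated in the slides), an overflow bucket is modelled by letting a bucket's list grow past the bucket size, and a full bucket keeps being split while unused hash bits remain. The class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth        # hash prefix size of this bucket
        self.items = []           # entries beyond bucket_size model an overflow bucket

class ExtendableHash:
    def __init__(self, hash_bits=2, bucket_size=2, hash_fn=lambda x: x % 4):
        self.hash_bits = hash_bits
        self.bucket_size = bucket_size
        self.hash_fn = hash_fn    # assumed hash function, hash_bits wide
        self.depth = 0            # hash prefix size of the bucket address table
        self.table = [Bucket(0)]  # bucket address table with 2**depth entries

    def _prefix(self, x, n):
        """First n bits of the hash value (0 when n is 0)."""
        return self.hash_fn(x) >> (self.hash_bits - n)

    def add(self, x):
        while True:
            bucket = self.table[self._prefix(x, self.depth)]
            if len(bucket.items) < self.bucket_size:
                bucket.items.append(x)
                return
            if bucket.depth == self.hash_bits:
                bucket.items.append(x)     # no bits left to split on: overflow
                return
            self._split(bucket)

    def _split(self, bucket):
        new_depth = bucket.depth + 1
        if new_depth > self.depth:
            # Expand the table: every entry is duplicated, adding one prefix bit.
            self.table = [b for b in self.table for _ in (0, 1)]
            self.depth += 1
        low, high = Bucket(new_depth), Bucket(new_depth)
        for item in bucket.items:          # redistribute on the newly used bit
            target = high if self._prefix(item, new_depth) & 1 else low
            target.items.append(item)
        for i, b in enumerate(self.table):  # update the bucket address table
            if b is bucket:
                bit = (i >> (self.depth - new_depth)) & 1
                self.table[i] = high if bit else low

index = ExtendableHash()
for value in (3, 2, 5, 7, 11, 13, 0, 1):
    index.add(value)
print([b.items for b in index.table])      # buckets for prefixes 00, 01, 10, 11
```

Running the loop only up to 7 gives the state shown on the "add 7" slide: entries 00 and 01 share the bucket holding 5, entry 10 points to 2, and entry 11 points to 3 and 7.
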
