Lec20Indexing v1
Spring 2024
Dr. Shamsad Parvin
Chapter 14, Silberschatz; Chapter 14, Ullman
Database Storage
The DBMS assumes that the primary storage location of the database is on a
non-volatile disk.
The DBMS's components manage the movement of data between non-volatile
and volatile storage.
[Figure: storage hierarchy. Volatile storage (top) is faster and more expensive; non-volatile storage (bottom) is cheaper and slower.]
Access Time
[Figure: access times of storage devices. Source: https://gist.github.com/hellerbarde/2843375]
Basic Concepts
• Example :
Indexes (or Indices)
Search key: an attribute or set of attributes used to look up
records in a file.
An index file consists of records (called index entries) of the
form
search-key | pointer
where the search key here is the primary key and the pointer holds the address of the disk block containing the record.
Index files are typically much smaller than the original file
Two basic kinds of indices:
• Ordered indices: search keys are stored in sorted order
• Hash indices: search keys are distributed uniformly
across “buckets” using a “hash function”.
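As a minimal sketch of the two kinds of index, assuming Python lists and dicts stand in for index blocks, a record pointer is a hypothetical (block, slot) pair, and the bucket count NUM_BUCKETS is an arbitrary choice:

```python
from bisect import bisect_left
from collections import defaultdict

# Index entries are (search-key, pointer) pairs; a pointer is a hypothetical
# (block number, slot) address of the record on disk.
entries = [(30, (2, 0)), (10, (0, 0)), (40, (3, 0)), (20, (1, 0))]

# Ordered index: entries kept sorted by search key, searched by binary search.
ordered = sorted(entries)
ordered_keys = [k for k, _ in ordered]

def ordered_lookup(key):
    i = bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return ordered[i][1]
    return None

# Hash index: a hash function spreads the keys across a fixed number of buckets.
NUM_BUCKETS = 4  # hypothetical bucket count
buckets = defaultdict(list)
for key, ptr in entries:
    buckets[hash(key) % NUM_BUCKETS].append((key, ptr))

def hash_lookup(key):
    # Only the one bucket the key hashes to is searched.
    for k, ptr in buckets[hash(key) % NUM_BUCKETS]:
        if k == key:
            return ptr
    return None
```

The ordered index keeps keys in sequence, so neighbouring keys are adjacent; the hash index gives up that order in exchange for going straight to one bucket.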
Index Evaluation Metrics
Physical design Advisor
[Figure: physical design advisor, with inputs database statistics, the query optimizer, and the query/update workload, producing recommended indexes.]
SQL Syntax
Index Classification
• Primary vs. secondary
– Primary: the index is over attributes that include the primary key.
– Secondary: otherwise.
– A primary index is classified as dense or sparse.
• Clustered vs. unclustered
– Clustered: records that are close in the index are close in the data file.
– Unclustered: records that are close in the index may be far apart in the data file.
– A file can be clustered on at most one search key.
[Figure: an ordered file can have a primary index or a clustered index; an unordered file can have secondary indexes on key or non-key attributes.]
Sequential Files
A sequential file is created by
sorting the tuples of a relation by
their primary key. The tuples are
then distributed among blocks, in
this order.
The index file needs far fewer
blocks than the data file, so it is
easier to fit in memory.
It is common to leave some free
space in each block; otherwise,
insertions of new tuples must be
handled with overflow blocks.
Dense Index
A dense index is a sequence of blocks holding only the keys of the records and pointers
to the records themselves.
Index blocks of the dense index maintain these keys in the same sorted order as in the
main data file itself.
If the index (but not the data file) fits in main memory, we can then find any record given
its search key with only one disk I/O per lookup.
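A minimal sketch of a dense index over a sequential file, assuming a hypothetical file of six records packed two per block, Python lists standing in for disk blocks, and (block, slot) pairs standing in for record pointers:

```python
from bisect import bisect_left

# Hypothetical sequential file: records sorted by key, two records per block.
data_file = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")], [(50, "e"), (60, "f")]]

# Dense index: one (key, pointer) entry for EVERY record, in the same sorted order.
# A pointer is modeled as a (block number, slot) pair.
dense_index = [(rec[0], (b, s))
               for b, block in enumerate(data_file)
               for s, rec in enumerate(block)]
index_keys = [k for k, _ in dense_index]

def dense_lookup(key):
    # Binary search the index, which we assume is already in main memory.
    i = bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return None          # key absent: decided from the index alone, no data I/O
    b, s = dense_index[i][1]
    return data_file[b][s]   # the one disk I/O: fetch block b
```

A side benefit of density: a search for a missing key is answered from the index without touching the data file.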
Dense Index Files : Example
Sparse Index
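A sparse index keeps one key-pointer pair per data block (the first key in that block), so it only works on a file sorted by the search key. A minimal sketch, assuming a hypothetical sorted file with two records per block and Python lists standing in for blocks:

```python
from bisect import bisect_right

# Hypothetical sequential file: records sorted by key, two records per block.
data_file = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")], [(50, "e"), (60, "f")]]

# Sparse index: one entry per data BLOCK, holding the first key in that block.
sparse_index = [(block[0][0], b) for b, block in enumerate(data_file)]
first_keys = [k for k, _ in sparse_index]

def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key.
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None                       # smaller than every key in the file
    block = data_file[sparse_index[i][1]]  # read that one block ...
    for k, value in block:                 # ... and search inside it
        if k == key:
            return (k, value)
    return None
```

The index has one entry per block instead of one per record, at the cost of searching inside the fetched block.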
Multilevel Index
Multilevel Index (Cont.)
Secondary Index
Applications of Secondary Index
Suppose there are relations R and S, with a many-one relationship from the tuples of R
to tuples of S. It may make sense to store each tuple of R with the tuple of S to which it is
related, rather than according to the primary key of R.
Example:
Movie (title, year, length, genre, studioName, producerC#)
Studio(name, address, presC#)
select title, year
from Movie, Studio
where presC# = zzz AND Movie.studioName = Studio.name;
Searching by the primary key (title, year) does not help here, because the condition is on presC# of Studio.
Example
We can create a clustered file structure for both relations in which each Studio tuple
is stored together with the Movie tuples for the movies of that studio.
If we then create an index for Studio with search key presC#, whatever the value of
zzz is, we can quickly find the tuple for the proper studio, and the matching Movie
tuples follow it consecutively in the file.
Indirection in Secondary Index
Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
Secondary indices have to be dense
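A minimal sketch of a secondary index with indirection, assuming a hypothetical four-record Movie file keyed by (block, slot) pointers; the titles M1 to M4 are placeholders:

```python
from collections import defaultdict

# Hypothetical data file: record pointer (block, slot) -> (title, year, studioName).
movies = {
    (0, 0): ("M1", 2005, "Fox"),
    (0, 1): ("M2", 2004, "Disney"),
    (1, 0): ("M3", 2005, "Disney"),
    (1, 1): ("M4", 2006, "Fox"),
}

def build_secondary_index(records, attr):
    # Each distinct search-key value appears once in the index and points to a
    # bucket listing pointers to ALL records with that value (indirection).
    buckets = defaultdict(list)
    for ptr, rec in records.items():
        buckets[rec[attr]].append(ptr)
    # Entries are kept sorted by search key; every value gets an entry (dense).
    return dict(sorted(buckets.items()))

studio_index = build_secondary_index(movies, attr=2)
year_index = build_secondary_index(movies, attr=1)
```

Because the data file is not sorted on studioName or year, the index must list every value, which is why a secondary index has to be dense.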
Indirection in Secondary Indexes
Suppose a query has several conditions, and each condition has a secondary index to help it:
select title
from Movie
where year = 2005 AND
Movie.studioName = 'Disney';
Find the bucket pointers that satisfy all the conditions by intersecting the sets of
pointers in memory, and retrieve only the records pointed to by the surviving
pointers.
This saves the I/O cost of retrieving records that satisfy some, but not all, of the
conditions.
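A minimal sketch of the pointer intersection for the query above, assuming hypothetical year and studio buckets and a three-block data file (titles M1 to M6 are placeholders):

```python
# Hypothetical secondary indexes: value -> bucket of record pointers (block, slot).
year_index = {2003: [(2, 0)], 2004: [(0, 1)], 2005: [(0, 0), (1, 0), (2, 1)], 2006: [(1, 1)]}
studio_index = {"Disney": [(0, 1), (1, 0), (2, 1)], "Fox": [(0, 0), (1, 1), (2, 0)]}
data_blocks = {0: [("M1", 2005, "Fox"), ("M2", 2004, "Disney")],
               1: [("M3", 2005, "Disney"), ("M4", 2006, "Fox")],
               2: [("M5", 2003, "Fox"), ("M6", 2005, "Disney")]}

def select_titles(year, studio):
    # Intersect the two buckets in memory; no data blocks have been read yet.
    survivors = sorted(set(year_index.get(year, [])) & set(studio_index.get(studio, [])))
    # Fetch only the blocks that hold surviving pointers.
    blocks_read = sorted({b for b, _ in survivors})
    titles = [data_blocks[b][s][0] for b, s in survivors]
    return titles, blocks_read
```

For year 2005 and 'Disney', only blocks 1 and 2 are read; using the year index alone would also have fetched block 0 for a Fox movie.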
Index Update: Deletion
If the deleted record was the only record in the file with its particular search-key
value, the search key is also deleted from the index.
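A minimal sketch of this rule on a dense secondary index with buckets, assuming a hypothetical index that maps each search-key value to a list of (block, slot) record pointers:

```python
# Hypothetical dense secondary index: search-key value -> bucket of record pointers.
index = {"Disney": [(0, 1), (1, 0)], "Fox": [(0, 0)]}

def delete_record(index, key, ptr):
    bucket = index[key]
    bucket.remove(ptr)       # drop the pointer to the deleted record
    if not bucket:           # it was the only record with this search-key value,
        del index[key]       # so the search key is deleted from the index as well

delete_record(index, "Disney", (0, 1))   # "Disney" still has one record
delete_record(index, "Fox", (0, 0))      # last "Fox" record gone: key removed
```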
Index Update: Insertion
B Tree Index Files
B Trees
A more general structure that is commonly used in
commercial systems
The particular variant that is most often used is
known as a B+ tree
• B-trees automatically maintain as many levels of
index as is appropriate for the size of the file
being indexed.
• B-trees manage the space on the blocks they
use so that every block is between half used and
completely full
• Balanced: all paths from the root to a leaf have the same length.
• Disk-based: one node per block; each block has space for $n$ search-key values
and $n + 1$ pointers.
$P_1\ K_1\ P_2\ K_2 \cdots P_n\ K_n\ P_{n+1}$
Example of B+-Tree
B+-Tree Node Structure
Typical node
B+-Tree Index Files (Cont.)
All paths from the root to a leaf have the same length: the tree is balanced.
Each node that is neither the root nor a leaf has between $\lceil n/2 \rceil$ and $n$ children.
A leaf node has between $\lceil (n-1)/2 \rceil$ and $n-1$ key values.
Special cases:
• If the root is not a leaf, it has at least 2 children.
• If the root is a leaf (that is, there are no other nodes in the
tree), it can have between 0 and $n-1$ key values.
Example: Leaf Nodes in B+-Trees
Class Activity
Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let pointers
be 8 bytes. If there is no header information kept on the blocks, we want to find the
largest integer value of $n$ such that an entire node ($n$ keys and $n + 1$ pointers) fits in one block.
$4n + 8(n + 1) \le 4096 \;\Rightarrow\; 12n \le 4088 \;\Rightarrow\; n \le 340.67$
$n = 340$
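The same arithmetic as a small helper, assuming (as the activity does) no block header and a node layout of $n$ keys plus $n + 1$ pointers; the function name is hypothetical:

```python
def max_keys_per_node(block_size, key_size, pointer_size):
    # A node holds n keys and n + 1 pointers:
    #   key_size * n + pointer_size * (n + 1) <= block_size
    # so n <= (block_size - pointer_size) / (key_size + pointer_size).
    return (block_size - pointer_size) // (key_size + pointer_size)

n = max_keys_per_node(4096, 4, 8)
```

Integer division takes the floor, which is exactly "the largest integer n" that still fits: 340 keys use 4088 bytes, and one more key would need 4100.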
Rules for the Blocks of a B-tree
The keys in leaf nodes are copies of keys from the
data file. These keys are distributed among the leaves
in sorted order, from left to right.
Example: A Complete B-Tree
Notice that at the leaves, all the keys appear exactly once, in order, as we look across the leaves from left to right.
Class Activity
With $n = 6$:
• Leaf nodes must have between 3 and 5 values ($\lceil (n-1)/2 \rceil$ and $n-1$).
What is the minimum number of search keys at a leaf node?
• Non-leaf nodes other than the root must have between 3 and 6 children ($\lceil n/2 \rceil$ and $n$).
What is the minimum number of children of an internal node?
• The root must have at least 2 children.
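The bounds can be checked with a small helper, assuming the convention used on these slides where $n$ is the maximum number of pointers per node; the function and dictionary key names are hypothetical:

```python
import math

def bplus_bounds(n):
    # n = maximum number of pointers per node (so at most n - 1 keys per node).
    return {
        "leaf_keys": (math.ceil((n - 1) / 2), n - 1),
        "interior_children": (math.ceil(n / 2), n),
        "root_children_min": 2,  # applies when the root is not a leaf
    }

bounds = bplus_bounds(6)
```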
B-tree Vs B+ Tree
We will discuss the B+ tree, which has several advantages over the standard B-tree:
• Although some keys are duplicated (interior nodes repeat keys that also appear in
the leaves), a B+ tree keeps data pointers only in the leaf nodes, which makes search and updates more efficient.
• Leaf nodes are linked together as a list, so records can be scanned in key order.
Balancing Constraints
Lookups in B-tree
Suppose we have a B-tree index and we want to find a record with search-key value $K$.
Basis: If we are at a leaf, look among the keys there. If the $i$th key
is $K$, then the $i$th pointer will take us to the desired record.
Induction: If we are at an interior node with keys $K_1, K_2, \ldots, K_n$, decide which
child to examine next. There is only one child that could lead to a leaf with key $K$:
if $K < K_1$, it is the first child; if $K_1 \le K < K_2$, it is the second
child; and so on. Recursively apply the search procedure at this child.
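The basis and induction steps translate into a short recursive function. A minimal sketch, assuming a hypothetical two-level tree built by hand, with strings such as "rec11" standing in for record pointers:

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted search keys
        self.records = records    # records[i] stands in for the i-th record pointer

class Interior:
    def __init__(self, keys, children):
        self.keys = keys          # K1 < K2 < ... < Km
        self.children = children  # m + 1 children

def lookup(node, k):
    if isinstance(node, Interior):
        # Induction: k < K1 -> child 0; K1 <= k < K2 -> child 1; and so on.
        i = bisect_right(node.keys, k)
        return lookup(node.children[i], k)
    # Basis: if the i-th key of the leaf is k, the i-th pointer gives the record.
    i = bisect_left(node.keys, k)
    if i < len(node.keys) and node.keys[i] == k:
        return node.records[i]
    return None

def make_leaf(keys):
    return Leaf(keys, [f"rec{k}" for k in keys])

root = Interior([7, 13], [make_leaf([2, 3, 5]), make_leaf([7, 11]), make_leaf([13, 17, 19])])
```

`bisect_right` returns the number of keys that are less than or equal to k, which is exactly the index of the child selected by the rule above.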
Lookups
Range Queries on B+-Trees (Cont.)
Range queries find all records with search-key values in a given
range.
We can find all records with search-key values in a specified
range $[lb, ub]$.
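One common way to answer a range query on a B+ tree is to descend as if looking up $lb$, then walk the linked leaves to the right until a key exceeds $ub$. A minimal sketch, assuming a hypothetical hand-built tree whose leaves are chained through a `next` field:

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf     # right sibling in the leaf chain

class Interior:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def range_query(root, lb, ub):
    node = root
    while isinstance(node, Interior):     # descend as for a lookup of lb
        node = node.children[bisect_right(node.keys, lb)]
    result = []
    i = bisect_left(node.keys, lb)        # first key >= lb in this leaf
    while node is not None:
        while i < len(node.keys):
            if node.keys[i] > ub:
                return result             # passed the upper bound: stop
            result.append(node.keys[i])
            i += 1
        node, i = node.next, 0            # continue in the next leaf
    return result

leaf3 = Leaf([13, 17, 19])
leaf2 = Leaf([7, 11], leaf3)
leaf1 = Leaf([2, 3, 5], leaf2)
root = Interior([7, 13], [leaf1, leaf2, leaf3])
```

Only one root-to-leaf path is traversed; after that the work is proportional to the number of leaves that hold keys in the range.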
Updates on B+-Trees: Insertion
Updates on B+-Trees: Insertion (Cont.)
Splitting a leaf node:
• Take the n (search-key value, pointer) pairs (including the
one being inserted) in sorted order. Place the first $\lceil n/2 \rceil$ in
the original node, and the rest in a new node.
• Let the new node be p, and let k be the least key value in p.
Insert (k, p) in the parent of the node being split.
• If the parent is full, split it and propagate the split further
up.
Splitting of nodes proceeds upwards until a node that is not full is
found.
• In the worst case the root node may be split, increasing the
height of the tree by 1.
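A minimal sketch of insertion with leaf splits propagating upward. It assumes $n$ is the maximum number of pointers per node (so at most $n - 1$ keys), unique keys, and omits record pointers. When an interior node overflows, the sketch keeps the first $\lceil (n+1)/2 \rceil$ pointers and moves the middle key up to the parent. All class and method names are hypothetical.

```python
import math
from bisect import bisect_right, insort

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []       # sorted search keys
        self.children = []   # child nodes (interior nodes only)
        self.next = None     # right sibling (leaf nodes only)

class BPlusTree:
    def __init__(self, n):
        self.n = n           # max pointers per node
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:              # the root itself was split:
            sep, right = split             # the tree grows by one level
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.n - 1:
                return None                # room in the leaf: done
            cut = math.ceil(self.n / 2)    # first ceil(n/2) keys stay here
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[cut:], node.keys[:cut]
            right.next, node.next = node.next, right
            return right.keys[0], right    # (k, p): least key of the new node
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)           # insert (k, p) into this parent
        node.children.insert(i + 1, right)
        if len(node.children) <= self.n:
            return None                    # parent not full: splitting stops
        cut = math.ceil((self.n + 1) / 2)  # pointers kept in the left half
        new = Node(leaf=False)
        up = node.keys[cut - 1]            # middle key moves up to the parent
        new.keys, new.children = node.keys[cut:], node.children[cut:]
        node.keys, node.children = node.keys[:cut - 1], node.children[:cut]
        return up, new                     # propagate the split upward

    def leaves(self):
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:            # follow the leaf chain
            out.append(node.keys)
            node = node.next
        return out

    def height(self):
        h, node = 1, self.root
        while not node.leaf:
            h, node = h + 1, node.children[0]
        return h

tree = BPlusTree(n=4)
for k in range(1, 8):
    tree.insert(k)
# tree.root.keys -> [3, 5]; tree.leaves() -> [[1, 2], [3, 4], [5, 6, 7]]
```

With 4 pointers per node, inserting 1 through 7 causes two leaf splits and one root split. Continuing up to 10 also splits the interior root, and the height becomes 3.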
Example: Insertion
[Figure: the target leaf has room, so the new key is inserted right there.]
Another Insertion Example
[Figure: the target leaf is full, so it must be split to make room.]
Another Insertion Example
Construct a B-tree for the following set of key values:
1, 2, 3, 4, 5, 6, 7
where the number of pointers that will fit in one node is 4.
Deletion
Delete a record with search key value 130.