DBMS Ch14 Indexing
Basic Concepts
Ordered Indices
B+-Tree Index Files
B-Tree Index Files
Hashing
Write-optimized indices
Spatio-Temporal Indexing
Database System Concepts - 7th Edition 14.2 ©Silberschatz, Korth and Sudarshan
Basic Concepts
Index files are typically much smaller than the original file
No. of Index Blocks (M) << No. of database file blocks (N)
Index Evaluation Metrics
Types of Indices
Ordered Indices
In an ordered index, index entries are stored sorted on the search key
value. Two main types of ordered indexing methods are:
Primary index: in a sequentially ordered file, the index whose search
key specifies the sequential order of the file.
• Also called clustering index
• The search key of a primary index is usually but not necessarily the
primary key.
Secondary index: an index whose search key specifies an order
different from the sequential order of the file.
• Also called nonclustering index.
Dense Index Files
Dense index — Index entry appears for every search-key value in the file.
E.g. index on ID attribute of instructor relation
Dense Index Files (Cont.)
The index record contains the search-key value and a pointer to the first
data record with that search-key value.
The other records with the same search-key value would be sorted
sequentially after the first record.
Dense index on dept_name, with instructor file sorted on dept_name
Sparse Index Files
Sparse Index: Index entry appears for only some of the search-key
values in the file.
• Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
• Find index record with largest search-key value ≤ K
• Search file sequentially starting at the record to which the index
record points
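The two lookup steps above can be sketched in Python. This is a minimal illustration under assumptions that are not in the slides: the index is a sorted list of (first key in block, block number) pairs, each block is a sorted list of (key, record) pairs, and the names sparse_lookup, example_index and example_blocks are hypothetical.

```python
from bisect import bisect_right

def sparse_lookup(index, blocks, key):
    """Look up `key` using a sparse index (assumed layout, see lead-in).

    index  : sorted list of (search_key, block_no), one entry per block
    blocks : list of blocks, each a sorted list of (search_key, record)
    """
    index_keys = [k for k, _ in index]
    # Step 1: index record with the largest search-key value <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    # Step 2: scan the file sequentially from the block the entry points to.
    for block_no in range(index[pos][1], len(blocks)):
        for k, record in blocks[block_no]:
            if k == key:
                return record
            if k > key:
                return None  # passed the place where key would be
    return None

# Hypothetical data: three blocks, one index entry per block.
example_blocks = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")], [(50, "e")]]
example_index = [(10, 0), (30, 1), (50, 2)]
```

Calling sparse_lookup(example_index, example_blocks, 40) scans only the second block, because the entry for key 30 is the largest one not exceeding 40.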
Sparse Index Files (Cont.)
• For unclustered index: sparse index on top of dense index (multilevel index)
Secondary Indices Example
Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
Secondary indices have to be dense
Secondary Indices (Cont.)
Multilevel Index
Multilevel Index (Cont.)
Index Update: Deletion
Dense indices
• If deleted record was the only record in the file with its particular
search-key value, the search-key is deleted from the index.
• If the deleted record was the first record with the search-key value, the
system updates the index entry to point to the next record.
• Otherwise, no update is required in the index
Index Update: Deletion
Sparse indices
• If an entry for the search key exists in the index, it is deleted by
replacing the entry in the index with the next search-key value in the
file (in search-key order).
• If the next search-key value already has an index entry, the entry is
deleted instead of being replaced.
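The sparse-index deletion rule can be sketched as follows. This is only an illustration under assumptions not stated in the slides: search keys are unique, the index is modelled as a sorted list of keys (pointers omitted), and file_keys holds the keys remaining in the file after the record was removed. The function name sparse_index_delete is hypothetical.

```python
from bisect import bisect_right, insort

def sparse_index_delete(index_keys, file_keys, deleted_key):
    """Return the sparse index (as a sorted key list) after a deletion.

    index_keys  : sorted keys that currently have an index entry
    file_keys   : sorted keys still in the file AFTER the deletion
    deleted_key : search key of the deleted record (assumed unique)
    """
    if deleted_key not in index_keys:
        return list(index_keys)  # no index entry: nothing to update
    remaining = [k for k in index_keys if k != deleted_key]
    # Next search-key value in the file, in search-key order.
    pos = bisect_right(file_keys, deleted_key)
    if pos < len(file_keys) and file_keys[pos] not in remaining:
        # Replace the deleted entry with the next search-key value.
        insort(remaining, file_keys[pos])
    # Otherwise the next key already has an entry (or there is no next
    # key), so the entry is simply deleted.
    return remaining
```

For example, with index keys 10, 30, 50 and remaining file keys 10, 20, 40, 50, deleting 30 replaces its entry with 40; if the remaining file keys were 10, 20, 50, the entry would just be dropped.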
Index Update: Insertion
Dense indices: if the search-key value does not appear in the index,
insert it
• Indices are maintained as sequential files
• Need to create space for new entry, overflow blocks may be
required
Index Update: Insertion
Sparse indices: we assume that the index stores an entry for each
block. No change needs to be made to the index unless a new block
is created.
• If a new block is created, the first search-key value appearing in
the new block is inserted into the index.
Indices on Multiple Keys
B+-Tree Index Files
Example of B+-Tree
B+-Tree Index Files (Cont.)
B+-Tree Node Structure
Typical node
Leaf Nodes in B+-Trees
Non-Leaf Nodes in B+-Trees
Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a
non-leaf node with m pointers:
• All the search-keys in the subtree to which P1 points are less than K1
• For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which Pi points
have values greater than or equal to Ki–1 and less than Ki
• All the search-keys in the subtree to which Pm points have values
greater than or equal to Km–1
• General structure
Example of B+-tree
Observations about B+-trees
Queries on B+-Trees
function find(v)
1. C = root
2. while (C is not a leaf node)
     1. Let i be the least number s.t. v ≤ C.Ki
     2. if there is no such number i then
     3.     set C = last non-null pointer in C
     4. else if (v = C.Ki) set C = C.Pi+1
     5. else set C = C.Pi
3. if for some i, C.Ki = v then return C.Pi
4. else return null /* no record with search-key value v exists */
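The find procedure can be written as runnable Python. This is a sketch under assumptions that are not part of the slides: the Node class, its field names, and the tiny example tree are hypothetical, search keys are unique, and leaf nodes store one record pointer per key (the next-leaf pointer is left out).

```python
from bisect import bisect_left

class Node:
    """Hypothetical B+-tree node: keys K1..K(m-1) plus pointers."""
    def __init__(self, keys, pointers, leaf):
        self.keys = keys          # sorted search-key values
        self.pointers = pointers  # children (non-leaf) or record pointers (leaf)
        self.leaf = leaf

def find(root, v):
    c = root
    while not c.leaf:
        # Least i (0-based here) such that v <= c.keys[i].
        i = bisect_left(c.keys, v)
        if i == len(c.keys):
            c = c.pointers[-1]      # no such i: follow the last pointer
        elif v == c.keys[i]:
            c = c.pointers[i + 1]   # equal keys live in the right subtree
        else:
            c = c.pointers[i]
    i = bisect_left(c.keys, v)
    if i < len(c.keys) and c.keys[i] == v:
        return c.pointers[i]
    return None  # no record with search-key value v exists

# Hypothetical two-level tree over the keys 10..60.
leaf1 = Node([10, 20], ["r10", "r20"], leaf=True)
leaf2 = Node([30, 40], ["r30", "r40"], leaf=True)
leaf3 = Node([50, 60], ["r50", "r60"], leaf=True)
example_root = Node([30, 50], [leaf1, leaf2, leaf3], leaf=True and False)
```

Because the keys inside a node are sorted, the "least i" step is a binary search; bisect_left returns exactly that position, or the number of keys when no such i exists.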
Queries on B+-Trees (Cont.)
Range queries find all records with search key values in a given range
• See book for details of function findRange(lb, ub) which returns set
of all such records
• Real implementations usually provide an iterator interface to fetch
matching records one at a time, using a next() function
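One way to picture the iterator interface is a Python generator that walks the chain of leaf nodes. This is a sketch, not the book's findRange: the Leaf class and its next_leaf field are hypothetical, and the scan assumes the first leaf that may contain lb has already been located by a root-to-leaf descent.

```python
class Leaf:
    """Hypothetical leaf node with a pointer to its right sibling."""
    def __init__(self, keys, records, next_leaf=None):
        self.keys = keys
        self.records = records
        self.next_leaf = next_leaf

def find_range(start_leaf, lb, ub):
    """Yield records with lb <= key <= ub, one at a time.

    start_leaf is assumed to be the leaf where a lookup for lb ends.
    """
    leaf = start_leaf
    while leaf is not None:
        for key, record in zip(leaf.keys, leaf.records):
            if key > ub:
                return           # past the upper bound: stop scanning
            if key >= lb:
                yield record
        leaf = leaf.next_leaf    # follow the leaf chain to the right

# Hypothetical chain of three leaves.
third = Leaf([50, 60], ["r50", "r60"])
second = Leaf([30, 40], ["r30", "r40"], third)
first = Leaf([10, 20], ["r10", "r20"], second)
```

A caller can fetch matches lazily with it = find_range(first, 20, 50) followed by repeated next(it), which mirrors the next() style mentioned above.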
Queries on B+-Trees (Cont.)
If there are K search-key values in the file, the height of the tree is no
more than ⌈log_⌈n/2⌉(K)⌉.
A node is generally the same size as a disk block, typically 4 kilobytes
• and n is typically around 100 (40 bytes per index entry).
With 1 million search key values and n = 100
• at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup
traversal from root to leaf.
Contrast this with a balanced binary tree with 1 million search key values
— around 20 nodes are accessed in a lookup
• above difference is significant since every node access may need a
disk I/O, costing around 20 milliseconds
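The arithmetic behind this comparison can be checked with a few lines of Python. The helper name levels_needed is hypothetical; it uses integer multiplication rather than floating-point logarithms to avoid rounding surprises, and the 20 ms per access figure is the one quoted above.

```python
def levels_needed(num_keys, fanout):
    """Smallest h with fanout**h >= num_keys, i.e. ceil(log_fanout(num_keys))."""
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels

K = 1_000_000
n = 100
min_fanout = (n + 1) // 2                 # ceil(n/2) = 50 pointers per node
btree_levels = levels_needed(K, min_fanout)
binary_levels = levels_needed(K, 2)       # balanced binary tree

IO_MS = 20                                # assumed cost of one node access
print(btree_levels, btree_levels * IO_MS)     # 4 levels, 80 ms
print(binary_levels, binary_levels * IO_MS)   # 20 levels, 400 ms
```

Under these assumptions a worst-case lookup costs about 80 ms in the B+-tree against about 400 ms in the binary tree.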
Non-Unique Keys
Updates on B+-Trees: Insertion
Updates on B+-Trees: Insertion (Cont.)
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Next step: insert entry with (Califieri, pointer-to-new-node) into parent
B+-Tree Insertion
Affected nodes
Insertion in B+-Trees (Cont.)
Splitting a non-leaf node: when inserting (k,p) into an already full internal
node N
• Copy N to an in-memory area M with space for n+1 pointers and n
keys
• Insert (k,p) into M
• Copy P1, K1, …, K⌈n/2⌉–1, P⌈n/2⌉ from M back into node N
• Copy P⌈n/2⌉+1, K⌈n/2⌉+1, …, Kn, Pn+1 from M into newly allocated node N'
• Insert (K⌈n/2⌉, N') into the parent of N
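The split of a full internal node can be sketched with Python lists. This is an illustration under assumptions not in the slides: a node is modelled as a pair of lists (keys, pointers), the function name split_internal is hypothetical, and the new pointer p is taken to belong immediately to the right of the new key k.

```python
from bisect import bisect_right

def split_internal(keys, pointers, k, p, n):
    """Insert (k, p) into a full internal node and split it.

    keys     : the n-1 sorted keys of the full node N
    pointers : the n child pointers of N
    Returns (left_node, key_for_parent, right_node), where each node is
    a (keys, pointers) pair.
    """
    assert len(keys) == n - 1 and len(pointers) == n
    # M: in-memory copy with room for n keys and n+1 pointers.
    m_keys, m_ptrs = list(keys), list(pointers)
    pos = bisect_right(m_keys, k)
    m_keys.insert(pos, k)
    m_ptrs.insert(pos + 1, p)
    half = (n + 1) // 2                           # ceil(n/2)
    left = (m_keys[:half - 1], m_ptrs[:half])     # P1..P_half, K1..K_(half-1)
    up_key = m_keys[half - 1]                     # K_half moves up to the parent
    right = (m_keys[half:], m_ptrs[half:])        # remaining keys and pointers
    return left, up_key, right
```

For n = 4, a node with keys 10, 20, 40 that receives (30, x) ends up as a left node with key 10, a right node with keys 30 and 40, and key 20 handed to the parent. Unlike a leaf split, the middle key is moved up rather than copied.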
Example
Examples of B+-Tree Deletion
Affected nodes
Examples of B+-Tree Deletion (Cont.)
Affected nodes
Node with Gold and Katz became underfull, and was merged with its sibling
Parent node becomes underfull, and is merged with its sibling
• Value separating two nodes (at the parent) is pulled down when merging
Root node then has only one child, and is deleted
Updates on B+-Trees: Deletion
Assume record already deleted from file. Let V be the search key value of the
record, and Pr be the pointer to the record.
Remove (Pr, V) from the leaf node
If the node has too few entries due to the removal, and the entries in the
node and a sibling fit into a single node, then merge siblings:
• Insert all the search-key values in the two nodes into a single node
(the one on the left), and delete the other node.
• Delete the pair (Ki–1, Pi), where Pi is the pointer to the deleted node,
from its parent, recursively using the above procedure.
Updates on B+-Trees: Deletion
Otherwise, if the node has too few entries due to the removal, but the
entries in the node and a sibling do not fit into a single node, then
redistribute pointers:
• Redistribute the pointers between the node and a sibling such that
both have at least the minimum number of entries.
• Update the corresponding search-key value in the parent of the node.
The node deletions may cascade upwards till a node which has ⌈n/2⌉ or
more pointers is found.
If the root node has only one pointer after deletion, it is deleted and the
sole child becomes the root.
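The choice between merging and redistributing can be sketched as a small decision function. This is an illustration only: the function names are hypothetical, node contents are reduced to entry counts, and the leaf bounds used in leaf_bounds (between ⌈(n–1)/2⌉ and n–1 values) come from the usual B+-tree definition rather than from the text above.

```python
import math

def leaf_bounds(n):
    """(min, max) number of values in a non-root leaf for fanout n (assumed)."""
    return math.ceil((n - 1) / 2), n - 1

def rebalance_action(node_entries, sibling_entries, min_entries, max_entries):
    """Decide what to do with a node after one entry was removed from it."""
    if node_entries >= min_entries:
        return "none"            # node is not underfull
    if node_entries + sibling_entries <= max_entries:
        return "merge"           # everything fits into a single node
    return "redistribute"        # borrow entries from the sibling

def redistribute(node_entries, sibling_entries):
    """Split the combined entries as evenly as possible between both nodes."""
    total = node_entries + sibling_entries
    return total // 2, total - total // 2
```

With n = 4 a leaf holds between 2 and 3 values, so a leaf left with 1 value merges with a sibling holding 2 values but redistributes with a sibling holding 3.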
Complexity of Updates
Non-Unique Search Keys
B+-Tree File Organization
B+-Tree File Organization (Cont.)
Good space utilization important since records use more space than
pointers.
To improve space utilization, involve more sibling nodes in redistribution
during splits and merges
• Involving 2 siblings in redistribution (to avoid split / merge where
possible) results in each node having at least ⌊2n/3⌋ entries
Other Issues in Indexing
Indexing Strings
Bulk Loading and Bottom-Up Build
B-Tree Index Files
Similar to B+-tree, but B-tree allows search-key values to appear only once;
eliminates redundant storage of search keys.
Search keys in nonleaf nodes appear nowhere else in the B-tree; an
additional pointer field for each search key in a nonleaf node must be
included.
Generalized B-tree leaf node
B-Tree Index Files (Cont.)
B-Tree Index File Example
Indexing on Flash
Indexing in Main Memory
Hashing
Static Hashing
Handling of Bucket Overflows
Handling of Bucket Overflows (Cont.)
Closed hashing: when a bucket is full, a new overflow bucket is allocated
for the same hash value and linked after the previous one. This
mechanism is known as overflow chaining.
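Overflow chaining can be sketched as a linked list of fixed-capacity buckets. This is a minimal in-memory illustration, not a disk-based implementation: the class names Bucket and StaticHashFile are hypothetical, keys are assumed to be integers, and the hash function is simply the key modulo the number of buckets.

```python
class Bucket:
    """Fixed-capacity bucket with a link to its overflow bucket."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []       # (key, record) pairs
        self.overflow = None    # next bucket in the chain, if any

class StaticHashFile:
    def __init__(self, num_buckets, capacity):
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def _hash(self, key):
        return key % len(self.buckets)   # assumed hash for integer keys

    def insert(self, key, record):
        bucket = self.buckets[self._hash(key)]
        # Walk the chain; allocate and link a new bucket when all are full.
        while len(bucket.entries) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.entries.append((key, record))

    def lookup(self, key):
        """Return all records with this key, searching the whole chain."""
        found = []
        bucket = self.buckets[self._hash(key)]
        while bucket is not None:
            found.extend(r for k, r in bucket.entries if k == key)
            bucket = bucket.overflow
        return found

    def chain_length(self, bucket_no):
        length, bucket = 0, self.buckets[bucket_no]
        while bucket is not None:
            length += 1
            bucket = bucket.overflow
        return length
```

With 2 buckets of capacity 2, inserting the keys 0, 2, 4 and 6 fills bucket 0 and then one overflow bucket, so a lookup for key 4 has to follow the chain.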
Handling of Bucket Overflows (Cont.)
Example of Hash Index
Example of Hash File Organization
Deficiencies of Static Hashing
Dynamic Hashing
Comparison of Ordered Indexing and Hashing
End of Chapter 14