Indexing
B.Ramamurthy
Chapter 11
01/29/25 B.Ramamurthy 1
Representing Data
Attributes are represented in fixed or variable
length collections called “fields”
Fields in turn are put into fixed or variable length
collections called records.
Records are stored in physical blocks.
A collection of records that forms a relation is
stored as a collection of blocks called a file.
This file is different from an OS file. How?
The organization is different.
The access is different.
Basic Concepts (indexing)
Indexing works the same way as the
catalog for books in a library.
Indexing needs to be efficient to
allow fast access to records.
Two types of indices:
ordered indices and
hash indices
Techniques and Evaluation
Access types : types of accesses that are
supported efficiently. Search by specific value
or by range.
Access time: Time it takes to find a particular
data item or a set of items.
Insertion time: Time it takes to insert a new
item.
Deletion time: Time it takes to delete an item.
Space overhead : Additional space occupied by
the index structure.
Ordered Indices
To gain fast access to records in a file
we can use an index structure.
If the file containing the records is
sequentially ordered, the index whose
search key specifies the sequential
order of the file is the primary
index.
Primary indices are also called
clustering indices.
Primary Index
Assume that all files are ordered
sequentially on some search key.
Such files, with a primary index on the
search key, are called index-sequential
files.
These files accommodate both
sequential and random access to
individual records.
Dense and Sparse Index
Dense index:
An index record appears for every search
key value in the file.
The index record contains the search key
and a pointer to the first data record with
that search-key value.
Sparse index:
An index record is created for only some of
the search-key values.
Each index record contains a search-key value
and a pointer to the first data record with
that value.
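The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a storage-engine implementation: the records are the account rows used in the deck's figures, list positions stand in for disk pointers, and the function names (build_dense, build_sparse, lookup_dense, lookup_sparse) are made up for this example.

```python
from bisect import bisect_right

# Records sorted on the search key (branch name), as in an index-sequential file.
records = [
    ("Brighton", "A-217", 750),
    ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600),
    ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700),
    ("Redwood", "A-222", 700),
    ("Round Hill", "A-305", 350),
]

def build_dense(recs):
    """One entry per distinct key: key -> position of the first record with that key."""
    index = {}
    for pos, (key, _, _) in enumerate(recs):
        index.setdefault(key, pos)
    return index

def build_sparse(recs, keys):
    """Entries only for the chosen keys, kept sorted as (key, position) pairs."""
    dense = build_dense(recs)
    return sorted((k, dense[k]) for k in keys)

def scan_from(recs, start, key):
    """Read sequentially from `start`, collecting the records that match `key`."""
    out = []
    for rec in recs[start:]:
        if rec[0] > key:
            break
        if rec[0] == key:
            out.append(rec)
    return out

def lookup_dense(recs, index, key):
    return scan_from(recs, index[key], key) if key in index else []

def lookup_sparse(recs, index, key):
    # Find the last index entry whose key is <= the search key, then scan forward.
    i = bisect_right([k for k, _ in index], key) - 1
    return scan_from(recs, index[i][1], key) if i >= 0 else []

dense = build_dense(records)
sparse = build_sparse(records, ["Brighton", "Mianus", "Redwood"])
```

The sparse lookup for "Perryridge" starts at the "Mianus" entry and scans forward, so it reads more records than the dense lookup but needs only three index entries instead of six.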
Dense Index
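Index (search key -> first record with that key):
  Brighton   -> A-217
  Downtown   -> A-101
  Mianus     -> A-215
  Perryridge -> A-102
  Redwood    -> A-222
  Round Hill -> A-305
Data file (branch name, account number, balance):
  Brighton    A-217  750
  Downtown    A-101  500
  Downtown    A-110  600
  Mianus      A-215  700
  Perryridge  A-102  400
  Perryridge  A-201  900
  Perryridge  A-218  700
  Redwood     A-222  700
  Round Hill  A-305  350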
Sparse Index
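Index (search key -> first record with that key):
  Brighton -> A-217
  Mianus   -> A-215
  Redwood  -> A-222
Data file (branch name, account number, balance):
  Brighton    A-217  750
  Downtown    A-101  500
  Downtown    A-110  600
  Mianus      A-215  700
  Perryridge  A-102  400
  Perryridge  A-201  900
  Perryridge  A-218  700
  Redwood     A-222  700
  Round Hill  A-305  350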
Multi-level Indices
Indices themselves may become too large for
efficient processing.
Example:
Consider a file with 100,000 records and 10 records
per block, i.e. 10,000 data blocks.
With a sparse index and one index entry per block we
have about 10,000 index entries.
Assuming 100 index entries fit into a block, the index
needs about 100 blocks.
It is desirable to keep the index file in main
memory.
Problem: Searching a large index file becomes
expensive.
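The arithmetic in this example can be checked with a short script. The figures come from the slide; the binary-search estimate at the end (about log2 of the number of index blocks) is an added assumption about how the single-level index would be searched, not something the slide states.

```python
import math

records = 100_000
records_per_block = 10
entries_per_index_block = 100

# Number of data blocks in the file.
data_blocks = math.ceil(records / records_per_block)

# Sparse index with one entry per data block.
index_entries = data_blocks

# Number of blocks needed to store the index itself.
index_blocks = math.ceil(index_entries / entries_per_index_block)

# Assumption: a binary search over the index blocks reads roughly
# ceil(log2(index_blocks)) blocks before the data block is reached.
binary_search_reads = math.ceil(math.log2(index_blocks))

print(data_blocks, index_entries, index_blocks, binary_search_reads)
```

If the index does not fit in memory, each of those reads is a disk access, which is the cost the next slide tries to reduce.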
Multi-level Index
Solution: Index the index file. We
treat the index as we would treat any
other sequential file and construct a
sparse index on the primary index.
We binary-search the outer level
index to find the largest search key
less than or equal to the one we
desire.
See Figure 11.4 (two-level sparse index).
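A rough sketch of the two-level lookup, with toy sizes so the structure is easy to trace. The block sizes, the integer keys, and the helper names (chunk, build_sparse_level, find_block, lookup) are assumptions made for illustration; Python lists stand in for disk blocks.

```python
from bisect import bisect_right

KEYS_PER_DATA_BLOCK = 4      # toy size, assumption
ENTRIES_PER_INDEX_BLOCK = 3  # toy size, assumption

def chunk(items, size):
    """Split a sorted list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_sparse_level(blocks):
    """Sparse index over a list of blocks: the first key of each block."""
    return [block[0] for block in blocks]

def find_block(first_keys, key):
    """Position of the last block whose first key is <= key (0 if there is none)."""
    return max(bisect_right(first_keys, key) - 1, 0)

keys = list(range(0, 96, 2))                            # 48 sorted search keys
data_blocks = chunk(keys, KEYS_PER_DATA_BLOCK)          # 12 data blocks
inner_entries = build_sparse_level(data_blocks)         # one entry per data block
inner_blocks = chunk(inner_entries, ENTRIES_PER_INDEX_BLOCK)  # 4 inner index blocks
outer = build_sparse_level(inner_blocks)                # one entry per inner index block

def lookup(key):
    i = find_block(outer, key)        # binary search of the outer index
    j = find_block(inner_blocks[i], key)  # binary search within one inner block
    block = data_blocks[i * ENTRIES_PER_INDEX_BLOCK + j]  # read a single data block
    return key in block
```

Each lookup touches the outer index, one inner index block, and one data block, regardless of how many inner index blocks exist.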
Secondary Index
A secondary index is built on a search key
that does not determine the sequential
order of the file.
A secondary index must be dense. If its
search key is not a candidate key, it must
also point to every record with a given
value, not just the first.
We can use an extra level of indirection
with buckets at the second level.
See Fig. 11.5
Secondary Index
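Index on balance (search key -> bucket of record pointers):
  350 -> A-305
  400 -> A-102
  500 -> A-101
  600 -> A-110
  700 -> A-215, A-218, A-222
  750 -> A-217
  900 -> A-201
Data file (branch name, account number, balance):
  Brighton    A-217  750
  Downtown    A-101  500
  Downtown    A-110  600
  Mianus      A-215  700
  Perryridge  A-102  400
  Perryridge  A-201  900
  Perryridge  A-218  700
  Redwood     A-222  700
  Round Hill  A-305  350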
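The bucket-based secondary index on balance can be sketched as a mapping from each balance to a list of record positions. This is only an illustration: list positions stand in for record pointers, and build_secondary and lookup are hypothetical names.

```python
from collections import defaultdict

# The file is ordered on branch name, not on balance.
records = [
    ("Brighton", "A-217", 750),
    ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600),
    ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700),
    ("Redwood", "A-222", 700),
    ("Round Hill", "A-305", 350),
]

def build_secondary(recs, field):
    """Dense index: every value of `field` maps to a bucket of record positions."""
    buckets = defaultdict(list)
    for pos, rec in enumerate(recs):
        buckets[rec[field]].append(pos)
    # Keep the index entries in search-key order.
    return dict(sorted(buckets.items()))

def lookup(recs, index, value):
    """Follow every pointer in the bucket for `value`."""
    return [recs[pos] for pos in index.get(value, [])]

by_balance = build_secondary(records, field=2)
```

Because the matching records are scattered through the file, a lookup for balance 700 follows three separate pointers rather than scanning one contiguous run.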
B+ Tree Index Files
The main disadvantage of the index-sequential
file organization is that performance
degrades as the file grows, both for index
lookups and for sequential scans.
The B+ tree index structure is the most
widely used of several index structures that
maintain their efficiency despite
insertion and deletion of data.
B+ Tree Index files
A B+ tree index is a balanced tree
in which every path from root to
leaf has the same length and each
non-leaf node (other than the root) has
between ceiling(n/2) and n children,
where n is fixed for the tree.
Typical node of a B+ tree:
n-1 search keys K1, K2,… Kn-1
n pointers P1, P2, …Pn
01/29/25 B.Ramamurthy 15
B+ Tree Node
P1 K1 P2 K2 …… Pn-1 Kn-1 Pn
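A lookup over nodes of this shape might look like the sketch below. It is a simplified illustration, assuming unique keys and a small hand-built tree with n = 3; insertion and deletion are left out, and the class and function names (Leaf, Internal, find_leaf, search, range_scan) are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list
    values: list                   # values[i] stands in for the record pointer of keys[i]
    next: Optional["Leaf"] = None  # last pointer: link to the next leaf in key order

@dataclass
class Internal:
    keys: list      # K1 .. K(m-1)
    children: list  # P1 .. Pm, where m = len(keys) + 1

def find_leaf(node, key):
    """Walk from the root to the leaf that would contain `key`."""
    while isinstance(node, Internal):
        # Child i covers keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    leaf = find_leaf(root, key)
    for k, v in zip(leaf.keys, leaf.values):
        if k == key:
            return v
    return None

def range_scan(root, lo, hi):
    """Collect values for lo <= key <= hi by following the leaf chain."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out

# Hand-built tree with n = 3: one root and three chained leaves.
l1 = Leaf(["Brighton", "Downtown"], ["A-217", "A-101"])
l2 = Leaf(["Mianus", "Perryridge"], ["A-215", "A-102"])
l3 = Leaf(["Redwood", "Round Hill"], ["A-222", "A-305"])
l1.next, l2.next = l2, l3
root = Internal(["Mianus", "Redwood"], [l1, l2, l3])
```

The chained leaves are what make range queries cheap: after one root-to-leaf descent, the scan continues along the leaf level without revisiting the upper nodes.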
B+ Tree (contd.)
Structure of a B+ tree
Queries on B+ trees
Updates on B+ trees (insertion ,
deletion)
B+ file organization
B-tree: a variation of the B+ tree that
avoids redundant storage of search keys
Hashing
Can we avoid the I/O operations that
result from accessing the index file?
Hashing offers a way.
It also provides a way of constructing
indices (which need not be sequential).
We will study static and dynamic
hashing.
Hash File Organization
Address of the disk block containing a
desired record is computed using a
function (hash function) and the search
key.
Let K denote the set of all search-key values
and B the set of all bucket addresses. A hash
function h is a function that maps K to
B.
A bucket is typically a disk block.
Operations
To insert a record with key Ki, compute
h(Ki), which gives the address of the bucket
for the record. If there is space in the
bucket, the record is stored in that bucket;
otherwise the overflow must be handled, for
example by chaining.
To look up a record with key Ki, compute
h(Ki). Check every record in the
bucket to find the one with that key.
To delete a record, follow a similar
procedure: hash, find, and delete.
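The three operations can be sketched as a small in-memory class. This is an illustration under stated assumptions, not a disk-based implementation: each bucket is modeled as a chain of fixed-capacity blocks (overflow chaining), the hash function is a simple character-sum chosen for the example, and the class name HashFile and its parameters are hypothetical.

```python
class HashFile:
    def __init__(self, num_buckets=4, bucket_capacity=2):
        self.capacity = bucket_capacity
        # Each bucket is a chain of blocks; blocks after the first model overflow blocks.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _h(self, key):
        # Illustrative hash: sum of character codes modulo the number of buckets.
        return sum(ord(c) for c in key) % len(self.buckets)

    def insert(self, key, record):
        chain = self.buckets[self._h(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append((key, record))
                return
        # Every block in the chain is full: add an overflow block.
        chain.append([(key, record)])

    def lookup(self, key):
        # Hash to the bucket, then check every record in its chain.
        chain = self.buckets[self._h(key)]
        return [r for block in chain for k, r in block if k == key]

    def delete(self, key):
        # Hash, find, and delete; returns the number of records removed.
        removed = 0
        for block in self.buckets[self._h(key)]:
            kept = [(k, r) for k, r in block if k != key]
            removed += len(block) - len(kept)
            block[:] = kept
        return removed

f = HashFile()
for acct in ("A-102", "A-201", "A-218"):
    f.insert("Perryridge", acct)
f.insert("Brighton", "A-217")
```

With a capacity of two records per block, the third "Perryridge" record forces an overflow block onto that bucket's chain.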
Hash Functions
Hash function should be chosen so that
The distribution of records is uniform.
The distribution is random.
Handling bucket overflows:
May occur due to insufficient number of
buckets.
Due to bucket skew.
Solution: Overflow buckets, chaining, double
hashing, linear probing, quadratic probing
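To make the uniformity requirement concrete, the sketch below compares two candidate hash functions on the branch names used earlier in the deck. Both functions are illustrative assumptions, not recommendations: hashing on the first letter is simple but prone to skew, since names that share an initial always land in the same bucket.

```python
from collections import Counter

branches = ["Brighton", "Downtown", "Mianus", "Perryridge", "Redwood", "Round Hill"]

def h_first_letter(key, num_buckets):
    """Bucket chosen by the first letter only (prone to skew)."""
    return (ord(key[0].upper()) - ord("A")) % num_buckets

def h_char_sum(key, num_buckets):
    """Bucket chosen by the sum of all character codes."""
    return sum(ord(c) for c in key) % num_buckets

def occupancy(keys, h, num_buckets):
    """Number of keys that fall into each bucket, in bucket order."""
    counts = Counter(h(k, num_buckets) for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]

for h in (h_first_letter, h_char_sum):
    print(h.__name__, occupancy(branches, h, 26))
```

Comparing the printed occupancy lists for a realistic key set is a quick way to spot bucket skew before settling on a hash function.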
Hash Indices
Hashing can be used for organizing
indices. A hash index organizes
search keys with their associated
pointers.
See Fig.11.22
Typically only secondary indices
need to be organized using
hashing.
Dynamic Hashing
Many of today’s databases grow very large
in (a short) time.
If we use a static hash function, we have
three options:
Choose the hash function based on current size.
Choose the hash function based on anticipated
size.
Periodically restructure the hash file in
response to growth.
Another solution: dynamic hashing.
Dynamic Hash Techniques
Dynamic hash techniques allow the
hash function to be modified
dynamically to accommodate the
growth and shrinkage of the database.
One such technique is extendable hashing.
Extendable hashing copes with the
growth in the database size by splitting
and coalescing buckets as the database
grows and shrinks.
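A compact sketch of bucket splitting in extendable hashing follows. It is a simplified in-memory illustration under several assumptions: the directory is indexed by the low-order bits of Python's built-in hash (textbook presentations often use a prefix of the hash value instead), coalescing on deletion is omitted, and keys are assumed to have distinct hash values, since more colliding keys than a bucket can hold would keep splitting forever. The names Bucket and ExtendableHash are hypothetical.

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth        # local depth: number of hash bits this bucket depends on
        self.capacity = capacity
        self.items = {}

class ExtendableHash:
    def __init__(self, capacity=2):
        self.global_depth = 0
        self.capacity = capacity
        self.directory = [Bucket(0, capacity)]

    def _slot(self, key):
        # Directory slot: the low-order `global_depth` bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)   # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; slot i and slot i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth, self.capacity)
        bit = 1 << (bucket.depth - 1)
        # Slots whose newly significant bit is set now point to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling
        # Redistribute the old entries between the two buckets.
        old, bucket.items = bucket.items, {}
        for k, v in old.items():
            self.directory[self._slot(k)].items[k] = v

table = ExtendableHash(capacity=2)
for i in range(50):
    table.put(i, i * i)
```

Only the overflowing bucket is split on each step; the rest of the file is untouched, which is what lets the structure grow incrementally instead of being rebuilt.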