
Indexing Lecture Nov 2023 Detailed

The document discusses different types of database indexes including primary indexes, clustering indexes, and secondary indexes. It explains that primary indexes index the primary key of a table, clustering indexes physically organize data based on the index value, and secondary indexes index non-key fields. The document also covers performance characteristics and implementation details of each index type.

Advanced Databases: Indexing
Dr David Hamill
Overview

Introduction

Single-Level Ordered Indexes


• Primary Index
• Secondary Index (Non-Clustered)
• Clustering Index

Multi-Level Indexes

B-Trees and B+-Trees


Introduction
• We need more efficient ways of using indexes to retrieve data.
• So far, we have seen access structures concerned with how records
are organized in files and what access methods can be used based on
that structure:
• Heap Files
• No specific structure
• Use of sequential (linear) access to records
• Hash Files
• Uses a hash function on one or more hash fields
• Allows direct access to records if the values of the hash fields are known
• An Index is a data structure that allows a DBMS to locate particular
records in a file in less time.
• This results in faster responses to user queries.
• It can speed up retrieval of records if certain requirements on the
searching conditions are met.
• A database index is similar to an index in a book or a catalogue in a library:
• An author index
• A title index
• Each index offers an access path to records:
• No need to scan sequentially through a file!
• An index is ordered, and each index entry contains the item required or one or more locations where the item can be found.
• An Index access structure is associated with a particular search key
and contains records consisting of a key value and the address of the
logical record in the file containing the key value.
• Values in the index are usually sorted/ordered according to the
indexing field (often based on a single attribute).
• When an index is ordered we can perform efficient binary search on
the index.
• It is possible to have more than one index field.
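Because an index is ordered, a lookup can use binary search instead of a sequential scan. Below is a minimal sketch that assumes the index is held in memory as a sorted list of (key value, address) pairs. The `index` contents, the `lookup` helper and the (block, slot) address format are all hypothetical and are used only for illustration.

```python
from bisect import bisect_left

# Hypothetical index: (key value, address) pairs kept sorted by key.
# Addresses are made-up (block, slot) tuples standing in for disk locations.
index = [
    (105, (0, 0)),
    (112, (0, 1)),
    (130, (1, 0)),
    (147, (1, 1)),
    (163, (2, 0)),
]
keys = [k for k, _ in index]

def lookup(key):
    """Binary-search the sorted index; return the record address or None."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None

print(lookup(130))  # (1, 0)
print(lookup(131))  # None
```

Each probe halves the remaining range, so about log2(n) index entries are examined rather than all n records.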
Data & Index Files
• There are 2 types of files:
• Data files – files containing the logical records.
• Index files – files containing the index records.

• We will look at the following types of indexes:


• Primary indexes
• Clustering indexes
• Secondary indexes
• Multilevel indexes
• B-tree and B+-tree structures
• Each indexing type has its own advantages and disadvantages.
• The following characteristics are taken into account:
• Access Types: the access methods that can be supported efficiently (value-based search, range-based search)
• Access Time: time needed to locate the result set
• Insertion/deletion efficiency: how fast can we complete insertions and
deletions.
• Storage overhead: the additional storage requirements in an index structure.
Index types

Single-Level Ordered Indexes


• Primary Index
• Secondary Index (Non-clustered)
• Clustered Index

Multi-Level Indexes

B-Trees and B+-Trees


Single-Level Ordered Indexes
• Primary Index: if the data file is sequentially ordered on a key field (guaranteed to be unique) and the index is built on that field, we call it a primary index.
• Clustering Index: if the data file is sequentially ordered on a non-key field and the index is built on that field, the index is a clustering index.
Single-Level Ordered Indexes
• Secondary Index: an index that is defined on a non-ordering field of the data file.
• A file can have at most one physical ordering field.
• A file can have at most one primary index or one clustering index, but not both.
• A file can have several secondary indexes.
• Secondary indexes do not affect the physical organization of records.
Single-Level Ordered Indexes
• An index can be sparse or dense:
• A sparse index has an index record for some of the search key
values in the file.
• A dense index has an index record for every search key value in
the file.
• A primary index is built for a data file sorted on its key field.
• The index file is a sorted file whose records are of fixed length and consist of two fields:
• The first field is of the same data type as the ordering key field of the data file.
• The second field is a pointer to a disk block.
• The ordering key field is called the primary key of the data file.
• There is one entry for each block in the data file. Its key value is that of the first record in the block, known as the block anchor.
Primary Indexes
[Figure: a 1:1 mapping from index file to data file is intuitive but wasteful.]


Primary Index - Example
[Figure: a sorted index file whose entries (distinct key values with block pointers) point into a sorted data file.]
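The sketch below shows a sparse primary index built from block anchors. It makes several assumptions: the data file is modelled as an in-memory list of blocks, each block holds records sorted on a unique ID, and `data_blocks`, `primary_index` and `find` are hypothetical names. A lookup binary-searches the anchors for the last anchor that is less than or equal to the key, then scans only that one block.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (ID, name) records sorted on the unique ID.
data_blocks = [
    [(101, "Ann"), (104, "Bob"), (107, "Cy")],
    [(110, "Di"), (113, "Ed"), (118, "Flo")],
    [(121, "Gus"), (125, "Hal")],
]

# Sparse primary index: one entry per block, holding the block anchor
# (key of the block's first record) and the block number.
primary_index = [(block[0][0], b) for b, block in enumerate(data_blocks)]
anchors = [anchor for anchor, _ in primary_index]

def find(key):
    # The last anchor <= key identifies the only block that could hold the key.
    i = bisect_right(anchors, key) - 1
    if i < 0:
        return None                          # key is smaller than every anchor
    block_no = primary_index[i][1]
    for k, value in data_blocks[block_no]:   # scan a single data block
        if k == key:
            return value
    return None

print(primary_index)  # [(101, 0), (110, 1), (121, 2)]
print(find(113))      # Ed
```

This index has three entries for eight records, which is why it is called sparse.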
Primary Index - Performance
• The index file requires significantly fewer blocks than the data file, because:
• It is a sparse index (one entry per data block).
• An index record is typically smaller than a data file record.
• A binary search on the index file therefore requires fewer block accesses than a binary search on the data file.
• Insertion and deletion of records is problematic:
• Not only do we have to move records in the data file, we also have to change some index entries, because moving records can change the block anchors.

• Storage Overhead is not a serious problem
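The block-access saving can be worked through with a small calculation. All of the figures below are assumptions chosen for illustration: 30,000 records, 1,024-byte blocks, 100-byte records, a 9-byte key and a 6-byte block pointer. The calculation counts roughly ceil(log2(b)) block reads for a binary search over b blocks. With a primary index, it adds one further read to fetch the data block.

```python
import math

def binary_search_accesses(blocks):
    # Binary search over `blocks` sorted blocks needs about ceil(log2(blocks)) reads.
    return math.ceil(math.log2(blocks))

# Assumed example figures (hypothetical, for illustration only).
r = 30_000        # records in the data file
B = 1_024         # block size in bytes
R = 100           # record size in bytes
V, P = 9, 6       # key field size and block pointer size in bytes

bfr = B // R                      # records per block (blocking factor)
b = math.ceil(r / bfr)            # blocks in the data file
data_search = binary_search_accesses(b)

ri = b                            # primary index: one entry per data block
bfr_i = B // (V + P)              # index entries per block
bi = math.ceil(ri / bfr_i)        # blocks in the index file
index_search = binary_search_accesses(bi) + 1   # +1 to read the data block

print(b, data_search)    # 3000 12
print(bi, index_search)  # 45 7
```

Under these assumptions, searching through the index costs about 7 block accesses, compared with 12 for a binary search on the data file itself.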


Clustering Indexes

• Clustered indexing is a database indexing technique that physically arranges the rows of a table according to the values of the clustered index key. This means that the rows are stored on disk in the same order as the clustered index key.
• The leaf nodes of a clustered index contain the data pages.
• A clustered index is generally faster and requires less memory for operations than a non-clustered index, which must follow an extra pointer from the index entry to the row.
• A clustered index is most useful for columns that appear in range predicates, because it allows sequential access to the data in the table. Rows with similar values are stored on the same data page, so fewer pages are fetched.
Clustering Indexes

• A clustering index is built for a data file sorted on a non-key field.
• The index file is another sorted file whose records are of fixed length and consist of two fields:
• The first field is of the same data type as the clustering field of the data file.
• The second field is a pointer to a disk block.
• There is one entry in the clustering index for each distinct value of the
clustering field containing the value and a pointer to the first block in
the data file that holds at least one record with the value of the
clustering field.
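Clustering Indexes
[Figure: a clustering index with one entry per distinct value of the clustering field; the data file holds multiple records for each value.]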
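Here is a minimal sketch of a clustering index. It assumes a data file sorted on a non-key field (a department number, repeated across records). The index holds one sorted (value, first block) entry per distinct value. To look up a value, binary-search the index, jump to that block and read forward while the value still matches. The names `data_blocks`, `clustering_index` and `records_with` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical data file sorted on a non-key field (department number).
data_blocks = [
    [(1, "Ann"), (1, "Bob"), (2, "Cy")],
    [(2, "Di"), (2, "Ed"), (3, "Flo")],
    [(3, "Gus"), (5, "Hal"), (5, "Ivy")],
]

# Clustering index: one sorted entry per distinct value, pointing to the
# first block that holds at least one record with that value.
clustering_index = []
for b, block in enumerate(data_blocks):
    for dept, _ in block:
        if not clustering_index or clustering_index[-1][0] != dept:
            clustering_index.append((dept, b))
values = [v for v, _ in clustering_index]

def records_with(dept):
    i = bisect_left(values, dept)
    if i == len(values) or values[i] != dept:
        return []
    result = []
    # Start at the indexed block and read forward through the sorted file.
    for block in data_blocks[clustering_index[i][1]:]:
        for value, name in block:
            if value == dept:
                result.append(name)
            elif value > dept:
                return result        # sorted file: no later matches possible
    return result

print(clustering_index)  # [(1, 0), (2, 0), (3, 1), (5, 2)]
print(records_with(2))   # ['Cy', 'Di', 'Ed']
```

The records for value 2 span two blocks. This is one reason the slides mention reserving a whole block for each distinct value.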
Clustering Indexes Performance
• Index file requires significantly fewer blocks than the data file.
• Sparse index
• Index file record typically smaller than data file record.
• A binary search on the index file requires fewer block accesses than a
binary search on the data file.
• Insertion and deletion of records is problematic:
• We have to move records in the data file and we have to change some index
entries.
• It is common to reserve a whole block for each distinct value of the clustering field and to place all records with that value in that block.
• Storage overhead is not typically a serious problem.
Secondary (Non-Clustered) Indexes

• A secondary index is built for a non-ordering field of a data file.
• The index file is itself a sorted file whose records are of fixed or variable length and consist of two fields:
• The first field is of the same data type as the indexing field.
• The second field is a pointer to either the disk block containing the record or the record itself.
• We can consider two types of secondary indexes:
• Case 1: a dense secondary index that has an entry for every record in the data file.
• Case 2: a secondary index that has one entry for each distinct key value, whose pointers can be multivalued or point to a bucket of record pointers.
Secondary Indexes (Case 1)
• The indexing field is not necessarily an ordering field of the data file.
• When the indexing field is not an ordering field, we can construct a secondary index on it. The indexing field is then also called a secondary key.
• There is one entry in the index file for each record in the data file.
• Each entry contains the value of the secondary key for the record and a pointer to either the block where the record is stored or the record itself.
• There may be duplicate values in the index field.
• A secondary index of this kind is a dense index, since there is one entry for every record in the data file.
• A binary search can be performed on the index.
• A secondary index usually needs more storage space and longer search times than a primary index because of its large number of entries.
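The following is a minimal sketch of a Case 1 dense secondary index. It assumes the data file is ordered on an ID, while the index is built on `city`, which is not the ordering field. There is one sorted (city, record pointer) entry per record, so duplicate city values appear as adjacent entries. Record pointers are modelled as hypothetical (block, slot) pairs, and the names are illustrative.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file ordered on ID, not on city (the secondary key).
data_blocks = [
    [(101, "Leeds"), (104, "York")],
    [(110, "Bath"), (113, "Leeds")],
    [(121, "York"), (125, "Leeds")],
]

# Dense secondary index: one (city, record pointer) entry per record, sorted by city.
secondary_index = sorted(
    (city, (b, s))
    for b, block in enumerate(data_blocks)
    for s, (_, city) in enumerate(block)
)
cities = [c for c, _ in secondary_index]

def pointers_for(city):
    # Binary search finds the run of duplicate entries for this city.
    lo, hi = bisect_left(cities, city), bisect_right(cities, city)
    return [ptr for _, ptr in secondary_index[lo:hi]]

print(len(secondary_index))    # 6 (one entry per record)
print(pointers_for("Leeds"))   # [(0, 0), (1, 1), (2, 1)]
```

The matching records are spread across different blocks. The index entries are ordered, but the data file is not ordered on this field.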
Secondary Indexes (Case 2)
• Many records in the data file have the same value for the indexing field.
• Several options are available for implementing such an index:
1. Use variable-length records to hold an array of block pointers associated with each indexing field value.
2. Use a single entry for each indexing field value and create an extra level of indirection to handle the multiple pointers.
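A sketch of option 2 follows, under some illustrative assumptions. The index has exactly one sorted entry per distinct city. Each entry points to a separate "bucket" (the extra level of indirection) that lists the record pointers for that value. The list `buckets` stands in for disk blocks of record pointers, and all names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical data file ordered on ID; the index is on city (non-ordering field).
data_blocks = [
    [(101, "Leeds"), (104, "York")],
    [(110, "Bath"), (113, "Leeds")],
    [(121, "York"), (125, "Leeds")],
]

# Collect the (block, slot) record pointers for each distinct city.
by_city = {}
for b, block in enumerate(data_blocks):
    for s, (_, city) in enumerate(block):
        by_city.setdefault(city, []).append((b, s))

# One index entry per distinct value -> bucket number (extra level of indirection).
buckets = []   # stands in for blocks of record pointers
index = []     # sorted (city, bucket number) entries
for city in sorted(by_city):
    index.append((city, len(buckets)))
    buckets.append(by_city[city])
keys = [c for c, _ in index]

def pointers_for(city):
    i = bisect_left(keys, city)
    if i == len(keys) or keys[i] != city:
        return []
    return buckets[index[i][1]]   # follow the indirection to the bucket

print(index)                  # [('Bath', 0), ('Leeds', 1), ('York', 2)]
print(pointers_for("York"))   # [(0, 1), (2, 0)]
```

The index now has three entries (one per distinct value) instead of six. This makes it sparse, at the cost of one extra read to fetch the bucket.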
Secondary Indexes - Summary
Index Type | Number of Index Entries | Dense / Sparse | Uses Block Anchor
---------- | ----------------------- | -------------- | -----------------
Primary | Equal to the number of blocks in the data file | Sparse | Yes
Clustering | Equal to the number of distinct indexing field values | Sparse | Yes if separate blocks are used for records with different indexing field values; no otherwise
Secondary | Case 1: equal to the number of records. Case 2: equal to the number of distinct indexing field values | Dense for Case 1; sparse for Case 2 | No
Clustered v Non-Clustered

1. Difference 1: Only one clustered index is allowed per table, whereas you can create multiple non-clustered indexes on a single table.
2. Difference 2: A clustered index determines the physical sort order of the table itself, so it consumes little extra storage. Non-clustered indexes are stored separately from the table and therefore take up additional storage space.
3. Difference 3: Clustered indexes are generally faster than non-clustered indexes, since they do not involve an extra lookup step from the index entry to the row.
Multi-Level Indexes
• When an index file becomes large and extends over many pages, the time needed to search it for the required index entry increases.
• A multi-level index attempts to overcome this problem by reducing the search range:
• Treat the index like any other file
• Split the index into a number of smaller indexes
• Maintain an index to the indexes
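The "index to the indexes" idea can be quantified by counting how many levels are needed. This sketch makes the following assumptions: the first-level index has 3,000 entries, and each index block holds 68 entries (the fan-out). Each higher level indexes the blocks of the level below it. A new level is added until a single top-level block remains. The function name `multilevel_levels` is hypothetical.

```python
import math

def multilevel_levels(entries, fan_out):
    """Return the number of blocks at each index level, bottom level first."""
    blocks = math.ceil(entries / fan_out)
    levels = [blocks]
    while blocks > 1:
        # The next level has one entry per block of the level below.
        blocks = math.ceil(blocks / fan_out)
        levels.append(blocks)
    return levels

levels = multilevel_levels(3_000, 68)
print(levels)           # [45, 1]
print(len(levels) + 1)  # 3 block accesses: one per level plus the data block
```

Under these assumptions, a search reads one block per index level and then the data block, for a total of 3 accesses. A binary search over all 45 first-level index blocks would need about 6 reads before reaching the data.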
Multi-Level Indexes
• Multilevel indexes refer to a hierarchical structure of indexes.
• Here, each level of the index provides a more detailed reference to the data.
• This allows faster data retrieval, reduces disk accesses, and improves query performance.
Multi-Level Indexes - Performance
• Search performance improves when searching for a record based on a specific indexing field value.
• Problems with insertions and deletions are still present.
• To retain the benefits of multi-level indexing while reducing insertion and deletion problems, one approach leaves some space in each block for inserting new entries.
• This is called a dynamic multi-level index and is often implemented using balanced tree data structures (B-trees and B+-trees).
Tree Data Structure
[Figure: a tree with root node A at level 0, child nodes B and C at level 1, and nodes D, E and F at level 2.]
Tree Data Structure
• The depth of a tree is the maximum number of levels between the root node and a leaf node in the tree.
• If the path from the root node has the same length to every leaf node, the tree is balanced. A B-tree is kept balanced in this way.
• The degree (or order) of a tree is the maximum number of children allowed per parent.
• This is one more than the maximum number of key values per node.
• The access time of a tree depends on its depth rather than its breadth. For this reason, it is better for the tree to be bushy and shallow.
• When a node overflows, its median key is promoted to the parent node, and the remaining keys are split into left and right nodes on either side of the median.
Search Trees
• A search tree is a special type of tree used to guide the search for a record.
• Multi-level indexes can be considered a variation of search trees.
• Each block of entries is called a node.
• A node can have a certain number of pointers and a certain number of key values.
• The index field values in each node guide us to the next node until we reach the data block containing the required record.
• Following a pointer restricts the search at each level to one sub-tree of the search tree, so all nodes outside that sub-tree can be ignored.
Tree Data Structure

• See the example of constructing a B-tree in class:

Construct a B-tree of order 5 containing the following keys:

1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45
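The following minimal in-memory sketch can be used to check the class exercise. Order 5 is taken to mean at most 5 children per node, and therefore at most 4 keys. Each key is inserted into a leaf, and a node that overflows to 5 keys is split. Its median key is promoted to the parent, following the split rule described earlier. The `Node`/`BTree` classes and the `levels` helper are hypothetical illustrations, not a DBMS implementation. Some textbooks instead split full nodes pre-emptively on the way down, which can produce a different tree.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list means this node is a leaf

    def is_leaf(self):
        return not self.children

class BTree:
    def __init__(self, order=5):
        self.max_keys = order - 1        # order = maximum children per node
        self.root = Node()

    def search(self, key, node=None):
        if node is None:
            node = self.root
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.is_leaf():
            return False
        return self.search(key, node.children[i])   # descend into one sub-tree

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                        # root overflowed: tree grows by one level
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node, key):
        """Insert below node; return (median, new right node) if node split."""
        if node.is_leaf():
            insort(node.keys, key)
        else:
            i = bisect_left(node.keys, key)
            split = self._insert(node.children[i], key)
            if split:                    # child split: absorb its promoted median
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) > self.max_keys:
            return self._split(node)
        return None

    def _split(self, node):
        mid = len(node.keys) // 2
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right

    def levels(self):
        """Keys of every node, listed level by level from the root."""
        out, frontier = [], [self.root]
        while frontier:
            out.append([n.keys for n in frontier])
            frontier = [c for n in frontier for c in n.children]
        return out

tree = BTree(order=5)
for k in [1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]:
    tree.insert(k)
for level in tree.levels():
    print(level)
# [[17]]
# [[3, 8], [28, 48]]
# [[1, 2], [6, 7], [12, 14, 16], [25, 26], [29, 45], [52, 53, 55, 68]]
```

Every leaf ends up at the same depth. This is the balance property that keeps the access time proportional to the depth of the tree.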
Tree Data Structure
• The main difference between a B-tree and a B+-tree is that a B+-tree stores record (data) pointers only in its leaf nodes. Its internal nodes hold only keys and pointers to child nodes.
• Internal nodes in a B-tree can contain keys with record pointers as well as pointers to child nodes.
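This deliberately simplified sketch illustrates the leaf-only storage of record pointers in a B+-tree. It assumes a static, two-level tree bulk-loaded from sorted entries, and it does not handle insertion or deletion. The root holds only separator keys, which are copies of leaf keys (the redundant search keys). The leaves hold the (key, record pointer) pairs and are linked to one another, so a range scan can walk along them. `Leaf`, `bulk_load`, `find_leaf` and `range_scan` are hypothetical names.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, entries):
        self.entries = entries   # sorted (key, record pointer) pairs
        self.next = None         # link to the next leaf, used by range scans

def bulk_load(entries, leaf_size):
    """Build a two-level B+-tree sketch from sorted, non-empty (key, pointer) entries."""
    leaves = [Leaf(entries[i:i + leaf_size]) for i in range(0, len(entries), leaf_size)]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    # Root (internal node): keys only, each a copy of the first key of leaves 1..n-1.
    separators = [leaf.entries[0][0] for leaf in leaves[1:]]
    return separators, leaves

def find_leaf(root, key):
    separators, leaves = root
    return leaves[bisect_right(separators, key)]   # every search ends at a leaf

def range_scan(root, lo, hi):
    leaf, out = find_leaf(root, lo), []
    while leaf:
        for k, ptr in leaf.entries:
            if k > hi:
                return out
            if k >= lo:
                out.append(ptr)
        leaf = leaf.next         # follow the leaf chain instead of re-descending
    return out

root = bulk_load([(k, f"r{k}") for k in [3, 8, 12, 17, 25, 28, 48, 52]], leaf_size=3)
print(root[0])                   # [17, 48]
print(range_scan(root, 10, 30))  # ['r12', 'r17', 'r25', 'r28']
```

The keys 17 and 48 appear both in the root and in the leaves, which illustrates the redundant search keys of a B+-tree.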
B-Tree vs B+Tree Performance
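• A B-tree search can stop at an internal node when the key is found there, whereas a B+-tree search always descends to a leaf. However, B+-tree internal nodes hold only keys, so they have a higher fan-out and the tree is usually shallower.
• A B+-tree can maintain duplicates.
• Insertion into a B-tree takes more work. In a B+-tree, every insertion is made at a leaf, and all leaves are at the same depth, so insertion always follows a path of the same length.
• Deletion from a B-tree is complex because the key may be in an internal node. Deletion from a B+-tree is simpler because all record pointers are at the leaves.
• A B-tree has no redundant search keys, but a B+-tree may have redundant search keys.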
Disadvantages of Indexing
• An index-organized table requires a primary key with unique values.
• In some DBMSs, no other indexes can be created on index-organized data.
• In some DBMSs, an index-organized table cannot be partitioned.
• Indexes decrease the performance of INSERT, DELETE and UPDATE statements, because every index on the table must also be updated.
Summary

• Single-Level Ordered Indexes: Primary Index, Secondary Index, Clustering Index
• Multi-Level Indexes
• B-Trees and B+-Trees
