Indexing
Indexing is a data structure technique that lets you quickly retrieve records from a database
file. An index is defined on one or more attributes of the file, and lookups on those attributes
are the ones it speeds up.
An index
An index is a small table with only two columns. The first column holds a copy of the primary
or candidate key of a table. The second column holds a set of pointers, each giving the address
of the disk block where that specific key value is stored.
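The two-column layout described above can be sketched in a few lines of Python. This is only an illustration: the block addresses, record contents, and the names `data_blocks`, `index`, and `lookup` are all hypothetical and not part of any real DBMS.

```python
# Hypothetical data file: block address -> records stored in that block.
data_blocks = {
    0: [(101, "Asha"), (102, "Ben")],
    1: [(103, "Chen"), (104, "Dina")],
}

# The index: one (key, block address) pair per record, sorted by key.
index = sorted(
    (key, block_addr)
    for block_addr, records in data_blocks.items()
    for key, _name in records
)

def lookup(key):
    """Return the record for `key`, or None, reading only one data block."""
    for k, block_addr in index:                  # scan the small index table
        if k == key:
            for rec in data_blocks[block_addr]:  # read the one block it points to
                if rec[0] == key:
                    return rec
    return None
```

The point of the sketch is that the index is much smaller than the data file, so it is scanned first and only one data block is read afterwards.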
Types of Indexing
Database Indexing is defined based on its indexing attributes. Two main types of indexing
methods are:
a. Primary Indexing
b. Secondary Indexing
a. Primary Indexing
A primary index is an ordered file of fixed-length entries with two fields. The first field is the
same as the primary key, and the second field points to the data block that holds that key. In a
primary index, there is always a one-to-one relationship between the entries in the index table.
The primary index is further divided into two types:
Dense Index
Sparse Index
i. Dense Index
In a dense index, an index record is created for every search-key value in the database. This
makes searching faster but needs more space to store the index records. Each index record
contains a search-key value and a pointer to the actual record on disk.
ii. Sparse Index
A sparse index stores index records for only some search-key values. It needs less space and
less maintenance overhead for insertions and deletions, but it is slower than a dense index at
locating records.
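The trade-off between the two can be sketched as follows. This is a minimal illustration, assuming a sorted data file split into blocks of three records; the data and the names `blocks`, `dense`, `sparse_keys`, `dense_lookup`, and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical sorted data file, split into blocks of three records each.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Dense index: one entry per search-key value -> (block number, slot).
dense = {key: (b, s)
         for b, blk in enumerate(blocks)
         for s, (key, _) in enumerate(blk)}

# Sparse index: one entry per block, holding that block's first key.
sparse_keys = [blk[0][0] for blk in blocks]

def dense_lookup(key):
    pos = dense.get(key)
    if pos is None:
        return None
    b, s = pos
    return blocks[b][s]                      # jump straight to the record

def sparse_lookup(key):
    b = bisect_right(sparse_keys, key) - 1   # last block whose first key <= key
    if b < 0:
        return None
    for rec in blocks[b]:                    # sequential scan inside the block
        if rec[0] == key:
            return rec
    return None
```

Here the dense index holds nine entries and the sparse index only three, but the sparse lookup has to scan inside a block after finding it.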
b. Secondary Indexing
A secondary index can be built on a field that has a unique value for each record, which should
be a candidate key. It is also known as a non-clustering index.
This two-level database indexing technique is used to reduce the mapping size of the first level.
A large range of numbers is selected for the first level so that its mapping size always remains
small.
Example:
In a bank account database, data is stored sequentially by acc_no, but you may want to find all
accounts in a specific branch of ABC bank.
Here, you can have a secondary index with an entry for every search-key value. Each index record
points to a bucket that contains pointers to all the records with that search-key value.
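The bucket idea from the bank example can be sketched like this. The account rows, branch names, and the names `accounts`, `branch_index`, and `accounts_in_branch` are hypothetical; list positions stand in for disk pointers.

```python
from collections import defaultdict

# Hypothetical account file, stored sequentially by acc_no.
accounts = [
    (1001, "Downtown", 500),
    (1002, "Uptown", 1200),
    (1003, "Downtown", 75),
    (1004, "Airport", 300),
]

# Secondary index on branch: each entry points to a bucket of record positions.
branch_index = defaultdict(list)
for position, (_acc_no, branch, _balance) in enumerate(accounts):
    branch_index[branch].append(position)

def accounts_in_branch(branch):
    """Follow the bucket's pointers instead of scanning the whole file."""
    return [accounts[pos] for pos in branch_index.get(branch, [])]
```

Because the file is ordered by acc_no and not by branch, the records of one branch are scattered, which is why the index needs a bucket of pointers per branch.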
Clustering Index
In a clustered index, the records themselves are stored in the index, not pointers to them.
Sometimes the index is created on non-primary-key columns, which might not be unique for each
record. In such a situation, you can group two or more columns to get unique values and create
an index on them, which is called a clustered index. This also helps you identify records faster.
Example:
Let's assume that a company has recruited many employees in various departments. In this case,
a clustering index should be created so that all employees who belong to the same department
are grouped together.
Each department is treated as a single cluster, and the index entries point to the cluster as a
whole. Here, Department_no is a non-unique key.
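The department example can be sketched as below, assuming the employee file is physically ordered by department number so that each cluster is contiguous. The data and the names `employees`, `cluster_index`, and `employees_in_dept` are hypothetical.

```python
# Hypothetical employee file, physically ordered by dept_no (the clustering field).
employees = [
    (10, "Asha"), (10, "Ben"),
    (20, "Chen"),
    (30, "Dina"), (30, "Eli"), (30, "Fay"),
]

# Clustering index: one entry per distinct dept_no, pointing at the cluster start.
cluster_index = {}
for position, (dept_no, _name) in enumerate(employees):
    cluster_index.setdefault(dept_no, position)

def employees_in_dept(dept_no):
    """Jump to the start of the cluster, then read until the dept_no changes."""
    start = cluster_index.get(dept_no)
    if start is None:
        return []
    result = []
    for rec in employees[start:]:
        if rec[0] != dept_no:
            break
        result.append(rec)
    return result
```

The index has one entry per department, not one per employee, which is what "pointing to the cluster as a whole" amounts to in this sketch.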
Multilevel Index
A multilevel index is created when a primary index does not fit in memory. In this method, the
primary index is kept on disk as a sequential file and a sparse index is built on that file, which
reduces the number of disk accesses needed to locate any record.
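A two-level version of this idea can be sketched as follows. The keys, block labels, block size, and the names `first_level`, `index_blocks`, `second_level`, and `find_data_block` are assumptions made for illustration only.

```python
from bisect import bisect_right

# Hypothetical first-level (primary) index: sorted (key, data block) pairs,
# imagined as too large for memory and stored on disk in blocks of 3 entries.
first_level = [(k, f"block-{k // 10}") for k in range(10, 100, 10)]
ENTRIES_PER_BLOCK = 3
index_blocks = [first_level[i:i + ENTRIES_PER_BLOCK]
                for i in range(0, len(first_level), ENTRIES_PER_BLOCK)]

# Second level: sparse index holding the first key of each index block.
second_level = [blk[0][0] for blk in index_blocks]

def find_data_block(key):
    """Search the small second level, then read a single first-level block."""
    i = bisect_right(second_level, key) - 1
    if i < 0:
        return None
    for k, data_block in index_blocks[i]:
        if k == key:
            return data_block
    return None
```

Only the second level needs to stay in memory; each lookup then touches one block of the first-level index before going to the data.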
B-Tree Index
The B-tree index is a widely used data structure for indexing. It is a multilevel index format
built as a balanced search tree. All leaf nodes of the B-tree hold the actual data pointers.
Moreover, all leaf nodes are interlinked with a linked list, which allows a B-tree to support both
random and sequential access.
Every path from the root to a leaf has the same length.
Every node which is not a root or a leaf has between ⌈n/2⌉ and n children.
For example, with n = 5:
Leaf nodes must have between 2 and 4 values.
Non-leaf nodes apart from the root node have between 3 and 5 children.
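The numeric limits above are consistent with a tree of order n = 5, under the common convention that a node has at most n children and n - 1 key values. The small helper below is a sketch under that assumption; `btree_bounds` is a hypothetical name.

```python
from math import ceil

def btree_bounds(n):
    """Occupancy limits for a tree of order n, assuming a node has
    at most n children and at most n - 1 key values."""
    return {
        "min_children": ceil(n / 2),          # non-root, non-leaf nodes
        "max_children": n,
        "min_leaf_values": ceil((n - 1) / 2),
        "max_leaf_values": n - 1,
    }
```

With n = 5 this gives 3 to 5 children for non-leaf nodes and 2 to 4 values for leaf nodes.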
Advantages of Indexing
It helps you reduce the total number of I/O operations needed to retrieve data, because the
index leads you to the right disk block without scanning the whole file.
It offers faster search and retrieval of data to users.
Indexing can also help reduce tablespace: where the index does not need to link to a row in a
table, there is no need to store the ROWID in the index.
You can't sort data in the leaf nodes, as the value of the primary key classifies it.
Disadvantages of Indexing
To index a table this way, the database management system needs a primary key on the table
with unique values.
You can't create any other indexes on the indexed data.
B+ Tree
o The B+ tree is a balanced search tree. It follows a multi-level index format.
o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf
nodes remain at the same height.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support
random access as well as sequential access.
Structure of B+ Tree
o In the B+ tree, every leaf node is at an equal distance from the root node. The B+ tree is of
order n, where n is fixed for every B+ tree.
o It contains internal nodes and leaf nodes.
Internal node
o An internal node of the B+ tree, other than the root node, contains at least n/2 pointers.
o At most, an internal node of the tree contains n pointers.
Leaf node
o A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.
B+ Tree Search
Suppose we have to search for 55 in the B+ tree structure below. First, we fetch the
intermediary node, which directs us to the leaf node that can contain a record for 55.
In the intermediary node, we find the branch between the keys 50 and 75, which leads us to the
third leaf node. There, the DBMS performs a sequential search to find 55.
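The search for 55 can be sketched on a small two-level tree. Only the separators 50 and 75 and the third leaf's values (taken from the insertion example) come from the text. The separator 25, the other leaves, and the names `separators`, `leaves`, and `search` are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical two-level B+ tree. Only 50, 75 and the third leaf's values
# come from the text; the separator 25 and the other leaves are made up.
separators = [25, 50, 75]
leaves = [
    [10, 15, 20],
    [25, 30, 35],
    [50, 55, 65, 70],
    [75, 80, 85],
]

def search(key):
    """Return (leaf number counted from 1, whether the key was found)."""
    i = bisect_right(separators, key)   # pick the branch whose key range covers `key`
    found = False
    for value in leaves[i]:             # sequential search inside the leaf
        if value == key:
            found = True
            break
    return i + 1, found
```

In this sketch, `search(55)` takes the branch between 50 and 75 and ends in the third leaf, matching the walk-through above.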
B+ Tree Insertion
Suppose we want to insert a record 60 in the structure below. It will go to the 3rd leaf node,
after 55. The tree is balanced and this leaf node is already full, so we cannot insert 60 there
directly.
In this case, we have to split the leaf node so that the record can be inserted into the tree
without affecting the fill factor, balance and order.
With 60 added, the 3rd leaf node has the values (50, 55, 60, 65, 70), and its current key in the
intermediate node is 50. We split the leaf node in the middle so that the balance of the tree is
not altered. So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes.
If these two have to be leaf nodes, the intermediate node cannot branch from 50 alone. It should
have 60 added to it, and then we can have a pointer to the new leaf node.
This is how we can insert an entry when there is overflow. In a normal scenario, it is very easy
to find the node where the entry fits and then place it in that leaf node.
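The leaf split described above can be sketched as a single function. It assumes a leaf holds at most four values, which is why the leaf in the example counts as full. The name `insert_into_leaf` and the `max_keys` parameter are hypothetical, and updating the intermediate node is left to the caller.

```python
from bisect import insort

def insert_into_leaf(leaf, key, max_keys):
    """Insert `key` into a sorted leaf.

    Returns (leaf, None, None) if it fits, or (left, right, separator)
    if the leaf overflowed and had to be split in the middle.
    """
    leaf = list(leaf)            # work on a copy
    insort(leaf, key)            # keep the leaf sorted
    if len(leaf) <= max_keys:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    # The first key of the new right leaf is the one added to the parent.
    return left, right, right[0]
```

For the leaf in the text, `insert_into_leaf([50, 55, 65, 70], 60, max_keys=4)` yields the two leaves (50, 55) and (60, 65, 70), with 60 as the key to add to the intermediate node.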
B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from
the intermediate node as well as from the 4th leaf node. If we simply remove it from the
intermediate node, the tree will no longer satisfy the rules of the B+ tree, so we need to modify
it to keep the tree balanced.
After deleting 60 from the above B+ tree and re-arranging the nodes, it will show as follows:
| B Tree Index Files | B+ Tree Index Files |
|---|---|
| This is a binary tree structure similar to B+ tree. But here each node will have only two branches and each node will have some records. Hence here no need to traverse till leaf node to get the data. | This is a balanced tree with intermediary nodes and leaf nodes. Intermediary nodes contain only pointers/address to the leaf nodes. All leaf nodes will have records and all are at same distance from the root. |
| It has more height compared to width. | Most width is more compared to height. |
| Number of nodes at any intermediary level 'l' is $2^l$. Each of the intermediary nodes will have only 2 sub nodes. | Each intermediary node can have n/2 to n children. Only root node will have 2 children. |
| Even a leaf node level will have $2^l$ nodes. Hence total nodes in the B Tree are $2^{l+1} - 1$. | Leaf node stores (n-1)/2 to n-1 values. |
| It might have fewer nodes compared to B+ tree as each node will have data. | Automatically adjusts the nodes to fit the new record. Similarly it re-organizes the nodes in the case of delete, if required. Hence it does not alter the definition of B+ tree. |
| Since each node has record, there might not be required to traverse till leaf node. | Reorganization of the nodes does not affect the performance of the file. This is because, even after the rearrangement all the records are still found in leaf nodes and are all at equidistance. There is no change in distance of records from neither root nor the time to traverse till leaf node. |
| If the tree is very big, then we have to traverse through most of the nodes to get the records. Only few records can be fetched at the intermediary nodes or near to the root. Hence this method might be slower. | If there is any rearrangement of nodes while insertion or deletion, then it would be an overhead. It takes little effort, time and space. But this disadvantage can be ignored compared to the speed of traversal. |
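The total node count quoted for the B tree column follows from adding up the levels, assuming (as that column does) that every node has exactly two sub nodes and that levels are numbered from 0 at the root down to the leaf level l:

```latex
\begin{align*}
\text{total nodes} &= \sum_{i=0}^{l} 2^{i} \\
                   &= 2^{l+1} - 1
\end{align*}
```

For instance, with l = 3 the levels hold 1, 2, 4 and 8 nodes, which sum to 15, the same as 2^4 - 1.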