Indexing

Indexing is a database optimization technique that minimizes disk access by using data structures to quickly locate data, with primary and secondary indexing as the main methods. Primary indexing can be dense or sparse, while secondary indexing introduces multiple levels to reduce mapping size. Clustering indexes group non-unique columns for faster identification, and B+ trees are a structure used for efficient data storage and retrieval, supporting operations like insertion and deletion while maintaining balance.

Uploaded by

Tejas

Indexing

Indexing is a way to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed. An index is a data structure that is used to
quickly locate and access the data in a database.
An index is built from a small number of table columns and has two fields:
• The first field is the Search key, which contains a copy of the primary key or candidate key of
the table. These values are stored in sorted order so that the corresponding data can be
accessed quickly.
Note: The data itself may or may not be stored in sorted order.
• The second field is the Data Reference or Pointer, which holds the address of the disk block
where that particular key value can be found.

From <https://www.geeksforgeeks.org/indexing-in-databases-set-1/>

Types of Indexing

From <https://www.guru99.com/indexing-in-database.html>

Indexing in a database is classified by the attributes the index is built on. The two main types of
indexing methods are:
• Primary Indexing
• Secondary Indexing
Primary Index
A primary index is an ordered file of fixed-length entries with two fields. The first field is
the same as the primary key, and the second field points to the data block that holds that key. In
a primary index, there is always a one-to-one relationship between the entries in the index
table.
Primary indexing in a DBMS is further divided into two types:
• Dense Index
• Sparse Index

From <https://www.guru99.com/indexing-in-database.html>

• Dense Index:
• For every search key value in the data file, there is an index record.
• This record contains the search key and also a reference to the first data record with that
search key value.

RDBMS Page 1

Although a dense index makes searching on any search key fast, the space used to store the index entries and their
addresses becomes a storage overhead. The (index, address) mapping becomes almost as large as the (table record,
address) mapping itself, so the index consumes more and more space as the table grows.
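
A minimal sketch of a dense index lookup may make the trade-off concrete. The block layout, the record values and the name `dense_lookup` are all hypothetical and chosen only for illustration; the point is that the index holds one entry per search key, so it grows in step with the data file.

```python
from bisect import bisect_left

# Hypothetical data file: each record is (search_key, payload), two records per block.
blocks = [
    [(10, "A"), (20, "B")],
    [(30, "C"), (40, "D")],
    [(50, "E"), (60, "F")],
]

# Dense index: one (search_key, block_number) entry for every search key, kept sorted.
dense_index = sorted(
    (key, block_no)
    for block_no, block in enumerate(blocks)
    for key, _ in block
)
index_keys = [key for key, _ in dense_index]


def dense_lookup(key):
    """Binary-search the index, then read only the block it points to."""
    pos = bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        return None  # every key is indexed, so a miss here means the key is absent
    block_no = dense_index[pos][1]
    for record_key, payload in blocks[block_no]:
        if record_key == key:
            return payload
    return None
```

Here `len(dense_index)` equals the number of records in the file, which is the space overhead described above.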

• Sparse Index:
• The index record appears only for a few items in the data file. Each item points to a block as
shown.
• To locate a record, we find the index record with the largest search key value less than or equal
to the search key value we are looking for.
• We start at that record pointed to by the index record, and proceed along with the pointers in
the file (that is, sequentially) until we find the desired record.

From <https://www.geeksforgeeks.org/indexing-in-databases-set-1/>

But if the table is very large, using a very wide range between index entries will not work, because each lookup would
have to scan too many records sequentially. We have to make the ranges considerably shorter, and in that situation the
(index, address) mapping file grows large again, as we saw with dense indexing.
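
The sparse lookup rule (find the largest index key less than or equal to the target, then scan sequentially) can be sketched as follows. This is an illustrative sketch only: the data, the one-entry-per-block layout and the name `sparse_lookup` are assumptions, not part of the original notes.

```python
from bisect import bisect_right

# Hypothetical data file, sorted on the search key and split into blocks.
blocks = [
    [(10, "A"), (20, "B"), (30, "C")],
    [(40, "D"), (50, "E"), (60, "F")],
    [(70, "G"), (80, "H"), (90, "I")],
]

# Sparse index: one entry per block, holding the first search key in that block.
sparse_keys = [block[0][0] for block in blocks]


def sparse_lookup(key):
    """Find the last index key <= key, then scan that block sequentially."""
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    for record_key, payload in blocks[pos]:
        if record_key == key:
            return payload
        if record_key > key:
            break  # the file is sorted, so the key cannot appear later
    return None
```

The index has only three entries for nine records, at the cost of a short sequential scan inside the block.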

Secondary Index
In this method, another level of indexing is introduced to reduce the size of the (index, address) mapping. Initially,
very wide ranges of the column values are selected so that the first-level mapping is small. Each range is then further
divided into smaller ranges. The first level of the mapping is stored in primary memory so that address lookups are
faster. The second level of the mapping and the actual data are stored in secondary memory (the hard disk).

In this example, the column values are first divided into groups of 100. These groups are stored in primary
memory. In secondary memory, these groups are further divided into sub-groups. The actual data
records are stored in data blocks, also in secondary memory. Each address in the first-level index
points to the first address of a range in the second level, and each second-level index address points
to the first address in a data block. To search for a value that lies between the indexed values, the
search looks up the corresponding address in the first level and then in the second level. It then goes
to that address in the data blocks and performs a linear search to find the record.
For example, to search for 111 in the above diagram, the search finds the largest first-level entry that
is less than or equal to 111, which is 100. In the second-level index, it again finds the largest entry
less than or equal to 111, which is 110. It then goes to the data block at address 110 and scans
each record until it reaches 111. This is how a search is done in this method. Inserts, deletes and
updates locate their target in the same manner.
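
The search for 111 described above can be traced with a small sketch. The layout is an assumption modelled on the example (first-level groups of 100, second-level sub-groups of 10, ten keys per data block), and the names `floor_key` and `two_level_search` are hypothetical.

```python
from bisect import bisect_right

# Assumed layout: each data block holds 10 consecutive keys and is addressed
# by its first key; the second level lists every block address.
data_blocks = {start: list(range(start, start + 10)) for start in range(100, 300, 10)}
second_level = sorted(data_blocks)   # 100, 110, ..., 290 (kept on disk)
first_level = [100, 200]             # groups of 100 (kept in primary memory)


def floor_key(sorted_keys, key):
    """Return the largest entry <= key, or None if there is none."""
    pos = bisect_right(sorted_keys, key) - 1
    return sorted_keys[pos] if pos >= 0 else None


def two_level_search(key):
    group = floor_key(first_level, key)
    if group is None:
        return None
    # Only the second-level entries belonging to that group are consulted.
    sub_groups = [k for k in second_level if group <= k < group + 100]
    block_start = floor_key(sub_groups, key)
    # Linear search inside the data block.
    for record in data_blocks[block_start]:
        if record == key:
            return group, block_start, record
    return None
```

Calling `two_level_search(111)` visits entry 100 in the first level, entry 110 in the second level, and then scans the block at address 110, which mirrors the walk-through in the text.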

Multilevel Indexing
With the two-level method above, the growth of the index mapping is reduced considerably. But the
same problem can reappear as the table size increases. To overcome this, we can
introduce multiple index levels between primary memory and secondary memory. This method is
known as multilevel indexing. In this method, the number of secondary-level indexes is two or more.
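
As a rough illustration of why only a few levels are usually needed, the sketch below counts index levels under the assumption that each index block holds a fixed number of entries and that levels are added until the top level fits in a single block. The function name and the figures used are hypothetical.

```python
import math


def index_levels(num_entries, entries_per_block):
    """Count index levels needed until the top level fits in one block."""
    if entries_per_block < 2:
        raise ValueError("each index block must hold at least two entries")
    levels = 1
    blocks = math.ceil(num_entries / entries_per_block)
    while blocks > 1:
        # Each extra level indexes the blocks of the level beneath it.
        blocks = math.ceil(blocks / entries_per_block)
        levels += 1
    return levels
```

Under these assumptions, one million index entries with 100 entries per block need three levels, since each level shrinks the mapping by a factor of 100.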

Clustering Index
In some cases, the index is created on non-primary-key columns, which may not be unique for each
record. In such cases, to identify the records faster, we group two or more columns
together to get unique values and create an index on them. This method is known as a clustering
index. Essentially, records with similar characteristics are grouped together and indexes are created for
these groups.
For example, students studying in each semester are grouped together, i.e. 1st semester students,
2nd semester students, 3rd semester students and so on.

In the above diagram, an index entry is created for each semester in the index file. In the data
blocks, the students of each semester are grouped together to form a cluster. The address in the
index file points to the beginning of each cluster. Within the data blocks, the requested student ID is then
searched for sequentially.
New records are inserted into the clusters based on their group. In the above case, if a new student joins
the 3rd semester, the record is inserted into the semester 3 cluster in secondary memory. Updates
and deletes are handled the same way.
If a cluster runs short of space, new data blocks are added to that cluster.
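
The semester example can be sketched as below. The student IDs are made up, and modelling the data file as one sorted list with the index storing the starting position of each cluster is a simplifying assumption, as is the name `find_student`.

```python
# Hypothetical student records: (student_id, semester).
students = [(101, 1), (102, 2), (103, 1), (104, 3), (105, 2), (106, 3)]

# The data file keeps each semester's students together (one cluster per semester).
data_file = sorted(students, key=lambda record: (record[1], record[0]))

# Clustering index: semester -> position where that cluster begins.
cluster_index = {}
for position, (_, semester) in enumerate(data_file):
    cluster_index.setdefault(semester, position)


def find_student(student_id, semester):
    """Jump to the start of the cluster, then search it sequentially."""
    start = cluster_index.get(semester)
    if start is None:
        return None
    for record in data_file[start:]:
        if record[1] != semester:
            break  # reached the end of this cluster
        if record[0] == student_id:
            return record
    return None
```

The index has one entry per semester rather than one per student, and the sequential scan is confined to a single cluster.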

This method of file organization compares well with other methods because it provides a clean distribution
of records, which makes searching easier and faster. But each cluster may have unused
space left over, so it can take more storage than other methods.

---------------------------------------------------------------------------------------------------------------------------------------
Introduction to B+ Trees

A B+ tree has one root, any number of intermediary levels (usually one in small examples) and a level of leaf
nodes. All the leaf nodes hold the actual records. Intermediary nodes hold only pointers to the leaf
nodes; they do not hold any data. In the simplified tree described here each node branches two ways; in
general, a B+ tree node can have many children, up to the order of the tree. This is the basic structure of any B+ tree.
Consider the STUDENT table below. It can be stored in a B+ tree structure as shown below. We can
observe that the tree divides the records in two and splits them into a left node and a right node. The left node
has all the values less than or equal to the root node's value, and the right node has the values greater than
it. The intermediary nodes at level 2 hold only pointers to the leaf nodes. The values
shown in the intermediary nodes serve only as pointers to the next level. All the leaf nodes hold the
actual records in sorted order.
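
A search through such a tree can be sketched as follows, using the convention from the paragraph above (values less than or equal to a node's key go left, greater values go right). The `Node` class, the keys and the row labels are hypothetical stand-ins for the STUDENT table.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # routing values (non-leaf nodes only)
    children: list = field(default_factory=list)  # child nodes (non-leaf nodes only)
    records: list = field(default_factory=list)   # (key, row) pairs (leaf nodes only)


def search(root, key):
    """Follow pointers down to a leaf, then look for the record there."""
    node = root
    while node.children:
        # bisect_left routes key <= routing value to the left child, larger keys right.
        node = node.children[bisect_left(node.keys, key)]
    for record_key, row in node.records:
        if record_key == key:
            return row
    return None


# Hypothetical three-level tree: root, one intermediary level, four leaves.
leaves = [
    Node(records=[(10, "row-10"), (20, "row-20")]),
    Node(records=[(30, "row-30"), (40, "row-40")]),
    Node(records=[(50, "row-50"), (60, "row-60")]),
    Node(records=[(70, "row-70"), (80, "row-80")]),
]
root = Node(
    keys=[40],
    children=[
        Node(keys=[20], children=leaves[0:2]),
        Node(keys=[60], children=leaves[2:4]),
    ],
)
```

Only the leaves carry rows; the root and intermediary nodes carry routing values and pointers, so every lookup touches exactly one node per level.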

Insertion in B+ tree
Suppose we have to insert record 60 into the structure below. It belongs in the 3rd leaf node, after 55. Since this is
a balanced tree and that leaf node is already full, we cannot simply add the record there. But it must be
inserted there without affecting the fill factor, balance and order. So the only option is to split the
leaf node. But how do we split the nodes?

The 3rd leaf node would then hold the values (50, 55, 60, 65, 70), and the key in its parent node is 50. We split
the leaf node in the middle so that the balance is not altered, grouping the values into two leaf nodes, (50, 55) and
(60, 65, 70). For these two to be leaf nodes, the intermediary node cannot branch only from 50.
It must have 60 added to it, and then it can hold a pointer to the new leaf node.

This is how we insert a new entry when there is an overflow. In the normal case, we simply find the
leaf node where the entry fits and place it there.
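
The split in this example can be sketched in a few lines. The leaf capacity of 4, the function name and the contents of `parent_keys` (only the key 50 mentioned in the text) are assumptions made for illustration; a full B+ tree would also have to handle the parent itself overflowing.

```python
from bisect import insort

LEAF_CAPACITY = 4  # assumed: the example leaf is full with four values


def insert_into_leaf(leaf_keys, key, capacity=LEAF_CAPACITY):
    """Insert key into a leaf; split in the middle if the leaf overflows.

    Returns (left, right, separator); right and separator are None when
    no split was needed.
    """
    keys = sorted(leaf_keys)
    insort(keys, key)
    if len(keys) <= capacity:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The first key of the new right leaf is copied up into the parent.
    return left, right, right[0]


left, right, separator = insert_into_leaf([50, 55, 65, 70], 60)

parent_keys = [50]  # hypothetical parent holding only the key named in the text
if separator is not None:
    insort(parent_keys, separator)
```

With these inputs the leaf splits into (50, 55) and (60, 65, 70), and 60 is added to the parent next to 50, matching the description above.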

Delete in B+ tree
Suppose we have to delete 60 from the above example. What happens in this case? We have to
remove 60 from the 4th leaf node as well as from the intermediary node. If we simply remove it from the
intermediary node, the tree will no longer satisfy the B+ tree rules, so we need to re-arrange the nodes to keep
the tree balanced. After deleting 60 from the above B+ tree and re-arranging the nodes, it will appear as below.

Suppose we have to delete 15 from the above tree. We traverse to the 1st leaf node and simply delete
15 from that node. There is no need for any re-arrangement, as the tree is still balanced and 15 does not
appear in the intermediary node.
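
The difference between the two deletions can be expressed as a simple check. This sketch only decides whether re-arrangement is needed; it does not perform the re-arrangement. The leaf contents, the minimum of two keys per leaf and the function name are assumptions.

```python
def delete_from_leaf(leaf_keys, parent_keys, key, min_keys):
    """Remove key from a leaf and report whether the tree needs re-arranging.

    Re-arrangement is flagged when the key also appears in the intermediary
    node, or when the leaf is left with fewer than min_keys entries.
    """
    if key not in leaf_keys:
        raise KeyError(key)
    remaining = [k for k in leaf_keys if k != key]
    needs_rearranging = key in parent_keys or len(remaining) < min_keys
    return remaining, needs_rearranging
```

For example, `delete_from_leaf([60, 65, 70], [50, 60], 60, 2)` flags re-arrangement because 60 also sits in the intermediary node, while deleting 15 from a hypothetical first leaf `[10, 15, 20]` does not.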
