Unit -5 - part 2


Course Code: CS302

Database Management Systems

UNIT – V

Storage and Indexing: Data Storage and Indexes - file organizations, primary,
secondary index structures, various index structures - hash-based, dynamic
hashing techniques, multi-level indexes, B+ trees.

Slide 1- 1
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe
Indexing
1. Objective: To create a data structure that improves the speed of data retrieval
operations.
2. Methodologies: Includes clustered, non-clustered, primary, secondary,
composite, bitmap, and hash indexes, among others.
3. Implications:
    Clustered indexes are excellent for range-based queries but slow down
   insert/update operations.
    Non-clustered indexes improve data retrieval speed but can take up
   additional storage.
    Bitmap indexes are useful for low-cardinality columns.
4. Real-world Examples: Search engines, e-commerce websites, any application
that requires fast data retrieval.


Indexing in DBMS
 Indexing is used to optimize the performance of a database by minimizing
the number of disk accesses required when a query is processed.
 The index is a type of data structure. It is used to locate and access the data
in a database table quickly.

Index structure:
 Indexes can be created using some database columns.

 The first column of the index is the search key, which contains a copy of
the primary key or a candidate key of the table.
 The search-key values are stored in sorted order in the index so that the
corresponding data can be located quickly.
 Note that the data file itself does not need to be sorted.
 The second column of the index is the data reference, or pointer.
 It holds the address of the disk block where the record with that key
value can be found.
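
The two-column structure described above can be sketched in a few lines of Python. This is only an illustration: the key values, the block names, and the function name lookup are made up, and a plain sorted list stands in for whatever structure a real DBMS uses.

```python
from bisect import bisect_left

# Hypothetical index: (search_key, block_address) pairs, sorted by search key.
# The block addresses are deliberately out of order to show that the data
# file itself does not have to be sorted.
index = [(101, "block-0"), (205, "block-3"), (310, "block-1"), (412, "block-2")]
keys = [k for k, _ in index]  # first column: search keys


def lookup(search_key):
    """Return the block address stored for search_key, or None if absent."""
    i = bisect_left(keys, search_key)  # binary search on the sorted keys
    if i < len(keys) and keys[i] == search_key:
        return index[i][1]  # second column: data reference
    return None
```

For example, lookup(310) follows the pointer to "block-1", while a key that is not in the index returns None.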


Attributes of Indexing
 Access Types: This refers to the type of access such as value-based search,
range access, etc.
 Access Time: It refers to the time needed to find a particular data element or set
of elements.
 Insertion Time: It refers to the time taken to find the appropriate space and
insert new data.
 Deletion Time: Time taken to find an item and delete it as well as update the
index structure.
 Space Overhead: It refers to the additional space required by the index.


Indexing Methods


Ordered indices
 Also known as Sequential File Organization
 The indices are usually sorted to make searching faster.

 The indices which are sorted are known as ordered indices.

Example: Suppose we have an employee table with thousands of records, each
10 bytes long, with IDs 1, 2, 3, and so on, and we have to find the employee
with ID 543.
 Without an index, the DBMS has to scan the disk blocks from the start
until it reaches ID 543. It finds the record after reading
543*10 = 5430 bytes.
 With an index, the DBMS searches the index entries instead and reaches the
record after reading 542*2 = 1084 bytes (taking each index entry to be
2 bytes), which is far less than in the previous case.
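
The arithmetic in this example can be checked with a short Python sketch. The 10-byte record size comes from the example; the 2-byte index entry size is an assumption read off the 542*2 figure, and the function names are made up for illustration.

```python
RECORD_SIZE = 10       # bytes per data record (from the example)
INDEX_ENTRY_SIZE = 2   # bytes per index entry (assumed from the 542*2 figure)


def bytes_without_index(target_id):
    # Sequential scan: read every record up to and including the target.
    return target_id * RECORD_SIZE


def bytes_with_index(target_id):
    # Scan the index entries that come before the target's entry.
    return (target_id - 1) * INDEX_ENTRY_SIZE


print(bytes_without_index(543), bytes_with_index(543))
```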


Primary Index
 If the index is created on the basis of the primary key of the table, then it is
known as primary indexing.
 Primary key values are unique, so each key value corresponds to exactly one
record (a 1:1 relation).
 Because the primary keys are stored in sorted order, searching is
efficient.
 The primary index can be classified into two types:
 Dense index
 Sparse index.


Dense index
 The dense index contains an index record for every search key value in the
data file.
 This makes searching faster.
 The number of records in the index table is the same as the number of
records in the main table.
 It needs more space to store the index records themselves.
 Each index record holds the search key and a pointer to the actual record
on the disk.
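
A minimal Python sketch of a dense index follows. The records, the field names, and the function name dense_lookup are hypothetical, and a list position stands in for a disk address.

```python
from bisect import bisect_left

# Hypothetical data file, sorted by primary key; the list position stands in
# for the disk address of each record.
data_file = [
    {"id": 10, "name": "Asha"},
    {"id": 20, "name": "Ben"},
    {"id": 30, "name": "Chen"},
]

# Dense index: one (search_key, address) entry for every record.
dense_index = [(rec["id"], addr) for addr, rec in enumerate(data_file)]
dense_keys = [k for k, _ in dense_index]


def dense_lookup(key):
    """Return the record with the given key, or None if it does not exist."""
    i = bisect_left(dense_keys, key)
    if i < len(dense_keys) and dense_keys[i] == key:
        return data_file[dense_index[i][1]]  # follow the pointer directly
    return None
```

Because every key has its own entry, a miss in the index (for example key 25) already proves the record does not exist, without touching the data file.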


Sparse index
 An index record appears for only a few items in the data file.
 Each index record points to a block.
 Instead of pointing to each record in the main table, the sparse index
stores index records for only some search-key values.
 To locate a record, we find the index record with the largest search-key
value that is less than or equal to the search-key value we are looking
for.
 We start at the record pointed to by that index record and follow the
pointers in the file (that is, sequentially) until we find the desired
record.
 Number of accesses required = log₂(n) + 1, where n is the number of blocks
occupied by the index file: log₂(n) for the binary search of the index plus
1 for the data block.
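
The lookup procedure and the access count above can be sketched in Python as follows. The block contents and function names are hypothetical, and rounding log₂(n) up when n is not a power of two is an assumption added here, not something the slide states.

```python
from bisect import bisect_right
from math import ceil, log2

# Hypothetical sorted data file, split into blocks of three records each.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [blk[0] for blk in blocks]


def sparse_lookup(key):
    """Return (block_no, offset) of key, or None if it is not present."""
    i = bisect_right(sparse_keys, key) - 1  # largest index key <= key
    if i < 0:
        return None
    for offset, k in enumerate(blocks[i]):  # sequential scan inside the block
        if k == key:
            return (i, offset)
    return None


def accesses(n_index_blocks):
    """log2(n) + 1: binary search over the index plus one data block access."""
    return ceil(log2(n_index_blocks)) + 1
```

With an index file of 8 blocks, accesses(8) gives 3 + 1 = 4 block accesses.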


Clustering Index
 A clustering index is defined on an ordered data file. Sometimes the index
is created on non-primary-key columns, which may not be unique for each
record.
 In this case, to identify records faster, we group two or more columns to
get a unique value and create the index from them. This method is called a
clustering index.
 Records that have similar characteristics are grouped, and index entries
are created for these groups.
 Example: Suppose a company has several employees in each department.
 With a clustering index, all employees who belong to the same Dept_ID are
treated as a single cluster, and the index pointers point to the cluster as
a whole.
 Here Dept_ID is a non-unique key.
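
The Dept_ID example can be sketched in Python as below. The employee rows and the function name employees_in are invented for illustration, and a list position stands in for the disk address of the start of each cluster.

```python
# Hypothetical employee file, physically ordered by the non-unique Dept_ID.
employees = [
    (1, "Asha"), (1, "Ben"),
    (2, "Chen"), (2, "Dara"), (2, "Eli"),
    (3, "Farid"),
]

# Clustering index: one entry per distinct Dept_ID, pointing at the position
# of the first record of that cluster.
clustering_index = {}
for pos, (dept_id, _) in enumerate(employees):
    clustering_index.setdefault(dept_id, pos)


def employees_in(dept_id):
    """Return the names of all employees in the given department."""
    start = clustering_index.get(dept_id)
    if start is None:
        return []
    names = []
    for d, name in employees[start:]:  # read the cluster sequentially
        if d != dept_id:
            break                      # the next cluster has started
        names.append(name)
    return names
```

The index holds only three entries for six records, one per department rather than one per employee.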

 The previous scheme is a little confusing because one disk block is shared
by records that belong to different clusters.
 Using a separate disk block for each cluster is a better technique.


Secondary Index in DBMS
 In sparse indexing, as the size of the table grows, the size of the mapping
also grows.
 These mappings are usually kept in primary memory so that address fetches
are faster.
 The actual data is then fetched from secondary memory using the address
obtained from the mapping.
 If the mapping grows too large, fetching the address itself becomes slower,
and the sparse index is no longer efficient.
 To overcome this problem, secondary indexing is introduced.

 In secondary indexing, another level of indexing is introduced to reduce
the size of the mapping.
 This two-level indexing technique keeps the mapping of the first level
small.
 Large ranges of column values are chosen for the first level so that its
mapping stays small.
 Each range is then divided into smaller ranges at the second level.
 The mapping of the first level is stored in primary memory so that address
fetches are faster.
 The mapping of the second level and the actual data are stored in secondary
memory (hard disk).
 A secondary index can be built on a field that has a unique value for each
record, that is, a candidate key. It is also known as a non-clustering
index.

For example:
 To find the record with roll number 111 in the diagram, the search looks in
the first-level index for the largest entry that is less than or equal to
111. It gets 100 at this level.
 In the second-level index it again looks for the largest entry that is less
than or equal to 111 and gets 110. Using the address stored for 110, it goes
to the data block and scans each record until it reaches 111.
 This is how a search is performed in this method. Inserting, updating, and
deleting are done in the same manner.
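
The search for roll number 111 can be traced with the Python sketch below. Only the values 100, 110, and 111 come from the example; every other entry, and the names floor_entry and search, are assumptions made so that the sketch is runnable.

```python
from bisect import bisect_right


def floor_entry(sorted_keys, key):
    """Return the largest entry <= key, or None if there is none."""
    i = bisect_right(sorted_keys, key) - 1
    return sorted_keys[i] if i >= 0 else None


# Hypothetical two-level index (only 100, 110 and 111 are from the example).
first_level = [1, 100, 200]                        # kept in primary memory
second_level = {1: [1, 50], 100: [100, 110, 120], 200: [200, 250]}
data_blocks = {110: [110, 111, 112, 113]}          # other blocks omitted


def search(roll):
    """Return the (first-level, second-level, record) path, or None."""
    top = floor_entry(first_level, roll)           # e.g. 100 for roll 111
    if top is None:
        return None
    mid = floor_entry(second_level[top], roll)     # e.g. 110 for roll 111
    if mid is None:
        return None
    for record in data_blocks.get(mid, []):        # scan the data block
        if record == roll:
            return (top, mid, record)
    return None
```

Calling search(111) visits 100 in the first level, 110 in the second level, and then scans the data block until it reaches 111.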


Multilevel Indexing:
 Index records comprise search-key values and data pointers.
 As the database grows, the indices also grow.
 The index is ideally kept in main memory. With a single-level index, a
large index cannot be kept in memory, which leads to multiple disk accesses.
 Multilevel indexing splits the index into several smaller blocks so that
the outermost level can be stored in a single block.
 The outer blocks point to inner blocks, which in turn point to the data
blocks.
 The outer level can then be stored in main memory with little overhead.
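
One way to picture the outer and inner levels is to build them bottom-up, as in the Python sketch below. It is a simplified illustration, not the textbook algorithm: the function name build_levels and the fanout parameter (the number of index entries that fit in one block) are assumptions.

```python
def build_levels(keys, fanout):
    """Build index levels bottom-up until the top level fits in one block.

    Returns a list of levels, outermost (smallest) level first. Each level
    holds the first key of every block of the level below it.
    """
    level = sorted(keys)
    levels = [level]
    while len(level) > fanout:       # level still spans more than one block
        level = level[::fanout]      # keep the first key of each block
        levels.append(level)
    return levels[::-1]              # outermost level first


levels = build_levels(range(1000), fanout=10)
print([len(level) for level in levels])
```

With 1000 keys and 10 entries per block this gives levels of 10, 100, and 1000 entries, so only the 10-entry outer level has to stay in main memory.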
