Unit-6 Storage Strategies

Indexing is a technique used to optimize database performance by minimizing disk accesses during queries. Database indexes are data structures that improve retrieval speed by allowing searches to quickly locate and access data. There are different types of indexes including primary indexes on primary keys, secondary indexes on other columns, and clustered indexes that group related records together physically. B-trees and hashing are common indexing methods, with B-trees organizing data in sorted order through nodes and hashing using a mathematical function to directly map data to storage locations.

Uploaded by

Shiv Patel

Index (Indexing)

What is a database index?
 Indexes are special lookup tables that the database
search engine can use to speed up data retrieval.
 A database index is a data structure that improves the
speed of data retrieval operations on a database table.
 An index in a database is very similar to an index in
the back of a book.
 Indexes are used to retrieve data from the database
quickly. Users cannot see the indexes; they are
only used to speed up searches and queries.
 Updating a table with indexes takes more time than
updating a table without them (because the indexes also
need to be updated).
Indexing
 Indexing is used to optimize the performance of a database by
minimizing the number of disk accesses required when a query is
processed.
 An index is a type of data structure. It is used to locate and access
the data in a database table quickly.
 Indexes can be created using one or more columns of a database table.

 The first column of the index is the search key, which contains a
copy of the primary key or candidate key of the table. The search key
values are stored in sorted order so that the corresponding
data can be accessed easily.
 The second column of the index is the data reference. It contains
a set of pointers holding the address of the disk block where the
value of the particular key can be found.
Indexing methods
1) Ordered Index
 The indices are usually sorted to make searching faster.
Indices that are sorted are known as ordered
indices.
 Suppose we have an employee table with thousands of
records, each of which is 10 bytes long. Suppose the IDs
start with 1, 2, 3, ... and so on, and we have to search for the
employee with ID 543.
 In the case of a database with no index, we have to
scan the disk blocks from the start until we reach 543.
 In the case of an index, we search the sorted index instead of the data blocks.
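The contrast in the example can be sketched roughly as follows. This is only an illustration: the table of IDs 1 to 1000 and both function names are assumptions, and a binary search over a sorted list stands in for the ordered index.

```python
from bisect import bisect_left

# Hypothetical table: employee IDs 1..1000 kept in sorted order.
employee_ids = list(range(1, 1001))

def scan_without_index(ids, target):
    """Read records from the start until the target is found; return records read."""
    for records_read, emp_id in enumerate(ids, start=1):
        if emp_id == target:
            return records_read
    return None

def search_with_ordered_index(ids, target):
    """Binary-search the sorted keys; return the position of the target or None."""
    pos = bisect_left(ids, target)
    if pos < len(ids) and ids[pos] == target:
        return pos
    return None

print(scan_without_index(employee_ids, 543))         # 543 records read
print(search_with_ordered_index(employee_ids, 543))  # found at position 542
```

The scan touches every record up to the target, while the sorted search needs only about log2(1000), roughly 10, comparisons.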
2) Primary Index
 If the index is created on the basis of the primary key of the
table, then it is known as primary indexing. Primary
keys are unique to each record, so there is a 1:1 relation
between keys and records.
 As primary keys are stored in sorted order, the
search operation is quite efficient.
 The primary index can be classified into two types:
1) Dense index
2) Sparse index.
1) Dense index
 The dense index contains an index record for every search key
value in the data file. It makes searching faster.
 In this, the number of records in the index table is same as the
number of records in the main table.
 It needs more space to store the index records themselves. Each index record
holds the search key value and a pointer to the actual record on the disk.
2) Sparse index.
 In a sparse index, index records are not created for every search
key. An index record appears only for a few items in the data file,
and each index record points to a block.
 It requires less space and less maintenance
overhead for insertions and deletions,
but it is slower than the
dense index for locating records, because the block must then be searched sequentially.
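A minimal sketch of the two kinds of primary index, side by side. The data file, block size, and function names are all hypothetical; the point is only that the dense index has one entry per key while the sparse index has one entry per block and finishes with a scan inside the block.

```python
from bisect import bisect_right

# Hypothetical data file: sorted records split into blocks of 3 records each.
blocks = [
    [(10, "A"), (20, "B"), (30, "C")],
    [(40, "D"), (50, "E"), (60, "F")],
    [(70, "G"), (80, "H"), (90, "I")],
]

# Dense index: one entry per search key -> (block number, slot in block).
dense_index = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    loc = dense_index.get(key)
    if loc is None:
        return None
    b, s = loc
    return blocks[b][s][1]

def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key,
    # then scan that block sequentially.
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None

print(len(dense_index), len(sparse_keys))     # 9 index entries versus 3
print(dense_lookup(50), sparse_lookup(50))    # both find record "E"
```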
3) Clustered Index
 A clustered index can be defined as an ordered data file.
Sometimes the index is created on non-primary key columns,
which may not be unique for each record.
 In this case, to identify the records faster, we group two or
more columns to get a unique value and create an index out of
them. This method is called a clustering index.
 Records that have similar
characteristics are grouped, and indexes
are created for these groups.
Example: suppose a company has
several employees in each department.
Suppose we use a clustering index,
where all employees who belong to
the same Dept_ID are considered
to be within a single cluster, and the index
pointers point to the cluster as a
whole. Here Dept_ID is a non-unique
key. We use a separate disk block for
each cluster.
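The Dept_ID example can be sketched as below. The employee records are made up for illustration, and a Python list per department stands in for the separate disk block that holds each cluster.

```python
from collections import defaultdict

# Hypothetical employee records: (emp_id, name, dept_id). Dept_ID is non-unique.
employees = [
    (1, "Asha", 10), (2, "Ravi", 20), (3, "Meena", 10),
    (4, "John", 30), (5, "Sara", 20),
]

# Each cluster stands in for a separate disk block holding one department.
clusters = defaultdict(list)
for emp in employees:
    clusters[emp[2]].append(emp)

# The clustering index maps each Dept_ID to its cluster as a whole.
clustering_index = {dept_id: clusters[dept_id] for dept_id in sorted(clusters)}

def employees_in_dept(dept_id):
    """Follow the index pointer to the cluster and list the names stored there."""
    return [name for _, name, _ in clustering_index.get(dept_id, [])]

print(employees_in_dept(10))  # ['Asha', 'Meena']
```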
4) Secondary Index
 In sparse indexing, as the size of the table grows, the size of
the mapping also grows.
 These mappings are usually kept in primary memory so
that address fetches are faster.
 The actual data is then read from secondary memory using
the address obtained from the mapping. If the mapping grows too large,
fetching the address itself becomes slower. In this case, the
sparse index is no longer efficient. To overcome this problem,
secondary indexing is introduced.
 In secondary indexing, to reduce the size of the mapping, another
level of indexing is introduced. (Many textbooks call this two-level arrangement a multilevel index.)
 In this method, large ranges of the column values are selected
initially so that the mapping size of the first level stays
small.
 Then each range is further divided into smaller ranges.
 The mapping of the first level is stored in primary memory,
so that address fetches are faster. The mapping of the second level
and the actual data are stored in secondary memory (hard disk).
Example
 To find the record with roll number 111, the search
first looks for the highest entry that is smaller than or equal to
111 in the first-level index.
 It gets 100 at this level. Then, in the second-level index, it again
looks for the highest entry that is smaller than or equal to 111 and gets 110.
 Now, using the address for 110, it goes to the data block and
scans each record until it reaches 111.
 This is how a search is performed
in this method.
 Inserting, updating, or deleting is
also done in the same manner.
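The roll 111 lookup can be traced with a small sketch. Only the values 100, 110 and 111 come from the example; every other key, the block contents, and the function names are assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical data blocks, keyed by the first roll number each block holds.
data_blocks = {
    100: [100, 101, 105, 109],
    110: [110, 111, 115, 119],
    120: [120, 124, 128],
    200: [200, 203, 207],
    210: [210, 215],
}

# Second-level index (on disk): for each first-level range, the block start keys.
second_level = {
    100: [100, 110, 120],
    200: [200, 210],
}

# First-level index (in primary memory): the start of each large range.
first_level = [100, 200]

def floor_entry(sorted_keys, key):
    """Largest entry that is <= key, or None if every entry is larger."""
    pos = bisect_right(sorted_keys, key) - 1
    return sorted_keys[pos] if pos >= 0 else None

def find_roll(roll):
    top = floor_entry(first_level, roll)
    if top is None:
        return None
    block_key = floor_entry(second_level[top], roll)
    if block_key is None:
        return None
    # Sequential scan inside the data block.
    for record in data_blocks[block_key]:
        if record == roll:
            return (top, block_key, record)
    return None

print(find_roll(111))  # (100, 110, 111)
```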
B-tree

 A B-tree is a data structure that stores data in its nodes in sorted
order.

 A B-tree stores data in such a way that each node contains keys in
ascending order.
 Each of these keys has two references to two child nodes.
 The keys in the left child node are less than the current key, and the
keys in the right child node are greater than the current key.
Searching a record in B-tree

 Suppose we want to search for 18 in a B-tree.

 First, we fetch the intermediary node, which
directs us to the leaf node that may contain a record for 18.
 In the intermediary node, we find the branch between
the keys 16 and 20.
 At the end, we are redirected to the fifth leaf node.
Here the DBMS performs a sequential search to find 18.
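The search steps can be sketched on a small hypothetical tree. Only the separator keys 16 and 20 and the target 18 come from the text; the remaining separators and all leaf contents are assumptions chosen so that the branch between 16 and 20 is the fifth leaf.

```python
from bisect import bisect_right

# Hypothetical intermediary node: separator keys, with one leaf per key range.
separators = [4, 8, 12, 16, 20]
leaves = [
    [1, 2, 3],       # keys < 4
    [4, 5, 7],       # 4 <= keys < 8
    [8, 9, 11],      # 8 <= keys < 12
    [12, 13, 15],    # 12 <= keys < 16
    [16, 17, 18],    # 16 <= keys < 20
    [20, 22, 25],    # keys >= 20
]

def search(key):
    """Return (leaf_number, found), with leaf_number counted from 1."""
    child = bisect_right(separators, key)   # pick the branch for this key
    found = False
    for k in leaves[child]:                 # sequential search within the leaf
        if k == key:
            found = True
            break
    return child + 1, found

print(search(18))  # (5, True)
```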
Insert operation
A B-tree of order m has the following properties.
 Property #1 - All leaf nodes must be at the same level.
 Property #2 - All nodes except the root must have at least ⌈m/2⌉ - 1
keys and a maximum of m - 1 keys.
 Property #3 - All non-leaf nodes except the root (i.e. all internal
nodes) must have at least ⌈m/2⌉ children.
 Property #4 - If the root node is a non-leaf node, then it must
have at least 2 children.
 Property #5 - A non-leaf node with n - 1 keys must
have n children.
 Property #6 - All the key values in a node must be
in ascending order.
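The numeric limits in these properties can be computed for any order m. This is a small sketch that assumes m/2 is rounded up (the ceiling), which is the usual reading of the bracket notation; the function name and dictionary keys are illustrative.

```python
import math

def btree_limits(m):
    """Key and child limits for a B-tree of order m."""
    min_children = math.ceil(m / 2)
    return {
        "max_keys": m - 1,                       # Property #2
        "min_keys_non_root": min_children - 1,   # Property #2
        "max_children": m,                       # Property #5 with n = m
        "min_children_internal": min_children,   # Property #3
        "min_children_root": 2,                  # Property #4
    }

for order in (3, 4, 5):
    print(order, btree_limits(order))
```

For order 3, for example, every non-root node holds 1 or 2 keys and every internal node has 2 or 3 children.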
Example:-
Construct a B-Tree of Order 3 by inserting numbers from 1 to 10.
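One way to work through this construction is the sketch below. It is a minimal illustration, not a production B-tree: it supports insertion only, the class and method names are assumptions, and an overflowing node (one that reaches m keys) is split around its median key, which moves up to the parent.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # an empty list means this node is a leaf

    @property
    def is_leaf(self):
        return not self.children

class BTree:
    def __init__(self, order):
        self.order = order               # a node may hold at most order - 1 keys
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # the root overflowed: the tree grows upward
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node, key):
        if node.is_leaf:
            insort(node.keys, key)
        else:
            i = bisect_left(node.keys, key)
            split = self._insert(node.children[i], key)
            if split is not None:        # child overflowed: absorb its median key
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.order:
            return None
        # Overflow: split around the median and hand the median to the parent.
        mid = len(node.keys) // 2
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right

    def levels(self):
        """Keys of every node, grouped level by level from the root down."""
        result, level = [], [self.root]
        while level:
            result.append([n.keys for n in level])
            level = [c for n in level for c in n.children]
        return result

tree = BTree(order=3)
for k in range(1, 11):
    tree.insert(k)
for depth, nodes in enumerate(tree.levels()):
    print(depth, nodes)
```

With this split rule the final tree has root [4], internal nodes [2] and [6, 8], and leaves [1], [3], [5], [7], [9, 10], all at the same level.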
Hashing

 In a huge database structure, it is very inefficient to search all the
index values to reach the desired data. The hashing technique is used to
calculate the direct location of a data record on the disk without using an
index structure.
 In this technique, data is stored in data blocks whose addresses are
generated by a hash function. The memory locations where
these records are stored are known as data buckets or data blocks.
 A hash function can use any column value to
generate the address.
 Most of the time, the hash function uses the primary key to generate
the address of the data block.
 A hash function can be anything from a simple mathematical function to a
complex one.
 We can even use the primary key itself as the address of the data
block.
 Hashing method is used to index and retrieve items in a
database as it is faster to search that specific item using the
shorter hashed key instead of using its original value.
 In the simplest case, the data block address is the same as the primary key
value.
 The hash function can also be a simple mathematical function such as
exponential, mod, cos, sin, etc.
 Suppose we have a mod (5) hash function to determine the address of the
data block.
 In this case, it applies the mod (5) hash function to the primary keys and
generates 3, 3, 1, 4 and 2 respectively, and the records are stored at those data
block addresses.
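A quick sketch of the mod (5) hash function. The primary key values below are hypothetical; they were picked only because they reproduce the addresses 3, 3, 1, 4 and 2 mentioned in the text.

```python
# Hypothetical primary keys, chosen so that key mod 5 gives 3, 3, 1, 4 and 2.
primary_keys = [98, 103, 101, 104, 102]

def hash_address(key, buckets=5):
    """Map a key to a data block address with a simple mod hash function."""
    return key % buckets

addresses = [hash_address(k) for k in primary_keys]
print(addresses)  # [3, 3, 1, 4, 2]
```

Note that the first two keys land on the same address, 3, which is the kind of collision that overflow handling has to deal with.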
Why do we need hashing?
Here are the situations in a DBMS where the
hashing method is useful:
 For a huge database structure, it is costly to search the index
values through all their levels and then reach the
destination data block to get the desired data.
 Hashing is an ideal method to calculate the direct location of a
data record on the disk without using an index structure.
 It is also a helpful technique for implementing dictionaries.
Important terminology used in hashing
 Data bucket: Data buckets are the memory locations where the
records are stored. A data bucket is also known as a unit of storage.
 Key: A DBMS key is an attribute or set of attributes that helps
you identify a row (tuple) in a relation (table). It also lets you
find the relationship between two tables.
 Hash function: A hash function is a mapping function that maps
the set of search keys to the addresses where the actual records are
placed.
 Linear probing: Linear probing uses a fixed interval between probes.
In this method, the next available data block is used to store the
new record, instead of overwriting the older record.
 Quadratic probing: It helps you determine the new bucket
address. The interval between probes is obtained by adding the
successive outputs of a quadratic polynomial to the starting value given
by the original hash computation.
 Hash index: It is the address of the data block. A hash
function can be anything from a simple mathematical function to
a complex one.
 Double hashing: Double hashing is a
method used in hash tables to resolve
hash collisions; a second hash function determines the interval between probes.
 Bucket overflow: The condition of bucket overflow is
called a collision. This is a fatal stage for any static hash
function.
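The three probing strategies named above differ only in how the next slot is chosen after a collision. The sketch below is an illustration under assumptions: a table of 7 slots, `key % size` as the primary hash, and `1 + key % (size - 1)` as the second hash for double hashing are all choices made here, not part of the text.

```python
def linear_probe(key, attempt, size):
    # Fixed interval of 1 between probes.
    return (key % size + attempt) % size

def quadratic_probe(key, attempt, size):
    # The offset is a quadratic polynomial of the attempt number.
    return (key % size + attempt * attempt) % size

def double_hash_probe(key, attempt, size):
    # A second hash function sets the interval; adding 1 keeps it non-zero.
    step = 1 + key % (size - 1)
    return (key % size + attempt * step) % size

def insert(table, key, probe):
    """Place key in the first free slot along its probe sequence."""
    size = len(table)
    for attempt in range(size):
        slot = probe(key, attempt, size)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found along the probe sequence")

def slots_used(probe, keys=(10, 17, 24), size=7):
    """Insert keys that all hash to slot 3 and report where each one lands."""
    table = [None] * size
    return [insert(table, k, probe) for k in keys]

print(slots_used(linear_probe))       # [3, 4, 5]
print(slots_used(quadratic_probe))    # [3, 4, 0]
print(slots_used(double_hash_probe))  # [3, 2, 4]
```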
Types of hashing:
1) Static hashing
 In static hashing, the resultant data bucket address always
remains the same.
 That means that if we generate an address for EMP_ID = 103 using the
hash function mod (5), it will always result in the same bucket
address, 3. There will be no change in the bucket address.
 Therefore, in static hashing, the number of data buckets
in memory always remains constant.
Operations of static hashing
1) Searching a record
When a record needs to be searched, the same hash function retrieves the
address of the bucket where the data is stored.
2) Insert a record
When a new record is inserted into the table, we generate an address
for the new record based on the hash key, and the record is stored in that location.
3) Delete a record
To delete a record, we first fetch the record that is supposed to be
deleted. Then we delete the record at that address in memory.
4) Update a record
 To update a record, we first search for it using the hash function, and then the
data record is updated.
 If we want to insert a new record into the file but the address of the data
bucket generated by the hash function is not empty, that is, data already exists at
that address, the situation is known in static hashing as bucket
overflow. This is a critical situation in this method.
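The four operations and the overflow condition can be sketched together. The class name, the default of 5 buckets with mod as the hash function, and a bucket capacity of one record are assumptions made to mirror the description; real systems store several records per bucket.

```python
class StaticHashFile:
    """Fixed number of buckets; the bucket address for a key never changes."""

    def __init__(self, num_buckets=5, bucket_capacity=1):
        self.num_buckets = num_buckets
        self.bucket_capacity = bucket_capacity
        self.buckets = [dict() for _ in range(num_buckets)]

    def address(self, key):
        return key % self.num_buckets

    def search(self, key):
        return self.buckets[self.address(key)].get(key)

    def insert(self, key, record):
        bucket = self.buckets[self.address(key)]
        if key not in bucket and len(bucket) >= self.bucket_capacity:
            # The target bucket already holds data: bucket overflow.
            raise OverflowError(f"bucket overflow at address {self.address(key)}")
        bucket[key] = record

    def delete(self, key):
        return self.buckets[self.address(key)].pop(key, None)

    def update(self, key, record):
        bucket = self.buckets[self.address(key)]
        if key not in bucket:
            raise KeyError(key)
        bucket[key] = record

emp_file = StaticHashFile()
emp_file.insert(103, "record for EMP_ID 103")
print(emp_file.address(103))      # 3
try:
    emp_file.insert(98, "record for EMP_ID 98")   # 98 mod 5 is also 3
except OverflowError as err:
    print(err)
```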
Dynamic hashing
 The dynamic hashing method is used to overcome the problems of
static hashing, such as bucket overflow.
 In this method, data buckets grow or shrink as the number of records increases
or decreases. This method is also known as the extendible hashing
method.
 This method makes hashing dynamic, i.e., it allows insertion or
deletion without resulting in poor performance.
How to search a key
 First, calculate the hash address of the key.
 Check how many bits are used in the directory; this number of bits is
called i.
 Take the least significant i bits of the hash address. This gives an index
into the directory.
 Now, using the index, go to the directory and find the bucket address
where the record might be.
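Taking the least significant i bits is a single bit-mask operation. A short sketch, with a hypothetical function name and a 5-bit hash address used as the sample input:

```python
def directory_index(hash_address, i):
    """Keep only the least significant i bits of the hash address."""
    return hash_address & ((1 << i) - 1)

# Hash address 10001 (binary) with a directory that uses i = 2 bits, then i = 3 bits.
print(format(directory_index(0b10001, 2), "02b"))  # 01
print(format(directory_index(0b10001, 3), "03b"))  # 001
```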
How to insert a new record
 First, follow the same procedure as for retrieval, ending
up in some bucket.
 If there is still space in that bucket, place the record in it.
 If the bucket is full, split the bucket and redistribute the
records.
Example
Consider the following grouping of keys into buckets, depending on the
last bits of their hash address:
 The last two bits of 2 and 4 are 00, so they go into bucket B0.
 The last two bits of 5 and 6 are 01, so they go into bucket B1.
 The last two bits of 1 and 3 are 10, so they go into bucket B2.
 The last two bits of 7 are 11, so it goes into bucket B3.
Insert key 9 with hash address 10001 into the above
structure:
 Since key 9 has hash address 10001, whose last two bits are 01, it must go into
bucket B1. But bucket B1 is full, so it gets split.
 The split separates 5 and 9 from 6: the last three bits of 5 and
9 are 001, so they go into bucket B1, and the last three bits of
6 are 101, so it goes into bucket B5.
 Keys 2 and 4 are still in B0. The records in B0 are pointed to by the
000 and 100 entries because the last two bits of both entries are
00.
 Keys 1 and 3 are still in B2. The records in B2 are pointed to by the
010 and 110 entries because the last two bits of both entries are
10.
 Key 7 is still in B3. The record in B3 is pointed to by the 111 and
011 entries because the last two bits of both entries are 11.
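The split can be replayed with a small sketch of a directory indexed by the least significant bits. Only the hash address of key 9 (10001) is given in the text; the other addresses are hypothetical values chosen to end in the bits the example states, and a bucket capacity of 2 is assumed. The sketch does not guard against more keys with identical hash addresses than one bucket can hold.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # number of trailing bits this bucket is keyed on
        self.keys = []

class ExtendibleHash:
    def __init__(self, hash_of, global_depth=2, capacity=2):
        self.hash_of = hash_of           # dict: key -> hash address (assumed given)
        self.global_depth = global_depth
        self.capacity = capacity
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def index(self, key):
        return self.hash_of[key] & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self.index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        # Full bucket that already uses every directory bit: double the directory.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split on one more bit and repoint the entries that have that bit set.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the old keys together with the new one.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys + [key]:
            self.insert(k)

# Hypothetical hash addresses, except for key 9 (10001), which is from the example.
hash_of = {1: 0b11010, 2: 0b00000, 3: 0b11110, 4: 0b00000,
           5: 0b01001, 6: 0b10101, 7: 0b10111, 9: 0b10001}

table = ExtendibleHash(hash_of)
for k in (1, 2, 3, 4, 5, 6, 7):
    table.insert(k)
table.insert(9)                          # bucket 01 is full, so it splits

for i, b in enumerate(table.directory):
    print(format(i, "03b"), b.keys)
```

After the insert, entry 001 holds keys 5 and 9, entry 101 holds key 6, and the pairs 000/100, 010/110 and 011/111 each still share one unsplit bucket.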
