Dbms Notes - Unit 5

The document provides a comprehensive overview of Database Management Systems (DBMS), covering topics such as database design, the relational model, SQL, transaction concepts, and various file organization methods. It details different types of file organization including sequential, heap, hash, B+, and indexed sequential access methods, along with their advantages and disadvantages. Additionally, it discusses indexing techniques to optimize database performance and the structure of indices used for efficient data retrieval.

Database Management System

B.Tech. II Year II Semester

UNIT - I
Database System Applications: A Historical Perspective, File Systems versus a DBMS, the Data
Model, Levels of Abstraction in a DBMS, Data Independence, Structure of a DBMS
Introduction to Database Design: Database Design and ER Diagrams, Entities, Attributes, and
Entity Sets, Relationships and Relationship Sets, Additional Features of the ER Model, Conceptual
Design with the ER Model
UNIT - II
Introduction to the Relational Model: Integrity constraints over relations, enforcing integrity
constraints, querying relational data, logical database design, introduction to views,
destroying/altering tables and views.
Relational Algebra, Tuple relational Calculus, Domain relational calculus.
UNIT - III
SQL: QUERIES, CONSTRAINTS, TRIGGERS: form of basic SQL query, UNION, INTERSECT, and
EXCEPT, Nested Queries, aggregation operators, NULL values, complex integrity constraints in
SQL, triggers and active databases.
Schema Refinement: Problems caused by redundancy, decompositions, problems related to
decomposition, reasoning about functional dependencies, FIRST, SECOND, THIRD normal forms,
BCNF, lossless join decomposition, multi-valued dependencies, FOURTH normal form, FIFTH
normal form.
UNIT - IV
Transaction Concept, Transaction State, Implementation of Atomicity and Durability, Concurrent
Executions, Serializability, Recoverability, Implementation of Isolation, Testing for serializability,
Lock Based Protocols, Timestamp Based Protocols, Validation- Based Protocols, Multiple
Granularity, Recovery and Atomicity, Log–Based Recovery, Recovery with Concurrent
Transactions.
UNIT - V
Data on External Storage, File Organization and Indexing, Cluster Indexes, Primary and Secondary
Indexes, Index Data Structures, Hash Based Indexing, Tree Based Indexing, Comparison of File
Organizations, Indexes and Performance Tuning, Intuitions for Tree Indexes, Indexed Sequential
Access Methods (ISAM), B+ Trees: A Dynamic Index Structure.

Prepared by Dr. P. Sammulal, Professor, Department of CSE, JNTUH-HYD


File Organization in DBMS
1. A file is a collection of records. Using the primary key, we can access the records. The type and
frequency of access that a file supports are determined by the type of file organization used for a
given set of records.
2. File organization is a logical relationship among various records.
3. This method defines how file records are mapped onto disk blocks.
4. File organization describes the way in which the records are stored in terms of blocks, and how
the blocks are placed on the storage medium.
5. The first approach to mapping the database to files is to use several files and store records of
only one fixed length in any given file.
6. An alternative approach is to structure our files so that they can accommodate records of
multiple lengths.
7. Files of fixed-length records are easier to implement than files of variable-length records.

Types of file organization:
There are various methods of file organization. Each method has pros and cons depending on how the
records are accessed or selected.
Types of file organization are as follows:
1. Sequential File Organization
2. Heap File Organization
3. Hash File Organization
4. B+ File Organization
5. Indexed Sequential Access Method (ISAM)
6. Cluster File Organization

Sequential File Organization

This is the easiest method of file organization. In this method, records are stored sequentially. This
method can be implemented in two ways:

Pile File Method:

1. It is a quite simple method. In this method, we store the records in a sequence, i.e., one after
another. Records are stored in the order in which they are inserted into the table.
2. To update or delete a record, the record is first searched for in the memory blocks.
When it is found, it is marked for deletion and, in the case of an update, the new record is inserted.

Insertion of the new record:
Suppose we have four records R1, R3 and so on up to R7 in a sequence. Here, each record is simply a
row in the table. Suppose we want to insert a new record R2 in the sequence; it will be placed at
the end of the file.
Sorted File Method:
In this method, the new record is always inserted at the file's end, and then the sequence is sorted in
ascending or descending order. Records are sorted on the primary key or on any other key.
When a record is modified, the record is updated and the file is sorted again, so that the
updated record ends up in the right place.
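
The two insertion strategies can be sketched in Python as follows. This is only an illustration,
assuming records are identified by integer keys held in an in-memory list and that the existing
records are R1, R3, R5 and R7; the function names pile_insert and sorted_insert are hypothetical.

```python
def pile_insert(records, key):
    """Pile file: append the new record at the end, keeping arrival order."""
    records.append(key)
    return records


def sorted_insert(records, key):
    """Sorted file: append at the end, then re-sort the whole sequence on the key."""
    records.append(key)
    records.sort()
    return records


pile = pile_insert([1, 3, 5, 7], 2)       # R2 stays at the end of the file
ordered = sorted_insert([1, 3, 5, 7], 2)  # R2 moves to its sorted position
print(pile)
print(ordered)
```

The extra sort after every insertion is the time cost that the sorted file method pays.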

Pros of Sequential File Organization


1. It is a fast and efficient method for processing huge amounts of data.
2. In this method, files can be easily stored on cheaper storage mechanisms like magnetic tapes.
3. It is simple in design. It does not require much effort to store the data.
4. This method is used for report generation or statistical calculations.
Cons of Sequential File Organization
1. It wastes time because we cannot jump directly to a particular record that is required; we have to
move through the file sequentially.
2. The sorted file method takes more time and space for sorting the records.

Heap File Organization


1. It is the simplest and most basic type of organization. It works with data blocks. In heap file
organization, the records are inserted at the file's end. When the records are inserted, it doesn't
require the sorting and ordering of records.

2. When the data block is full, the new record is stored in some other block. This new data block
need not be the very next data block; the DBMS can select any data block in the memory to store
new records.
3. The heap file is also known as an unordered file.
4. In the file, every record has a unique id, and every page in a file is of the same size.
5. It is the DBMS's responsibility to store and manage the new records.

Insertion of a new record


Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose we want to insert a new
record R2 in the heap. If data block 3 is full, then it will be inserted in any data block selected by
the DBMS, let's say data block 1.
If we want to search, update or delete the data in heap file organization, then we need to traverse the
data from the start of the file till we get the requested record.
If the database is very large then searching, updating or deleting a record will be time-consuming
because there is no sorting or ordering of records. In the heap file organization, we need to check all the
data until we get the requested record.
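
A minimal sketch of this behaviour, assuming each data block holds at most two records and that the
DBMS simply picks the first block with free space (both are assumptions made for illustration, as is
the initial placement of the records):

```python
BLOCK_CAPACITY = 2  # assumed number of records per data block


def heap_insert(blocks, record):
    """Place the record in the first block with free space; no ordering is kept."""
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:
            block.append(record)
            return
    blocks.append([record])  # every block is full: allocate a new one


def heap_search(blocks, record):
    """Scan from the start of the file; return (block index, records examined)."""
    examined = 0
    for block_no, block in enumerate(blocks):
        for stored in block:
            examined += 1
            if stored == record:
                return block_no, examined
    return None, examined


# Index 0 is "data block 1", index 2 is "data block 3" (which is full).
blocks = [["R1"], ["R3", "R6"], ["R4", "R5"]]
heap_insert(blocks, "R2")
found_in, examined = heap_search(blocks, "R5")
print(blocks, found_in, examined)
```

Finding R5 means examining every record stored before it, which is why searching grows slow as the
file grows.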

Pros of Heap File Organization
1. It is a very good method of file organization for bulk insertion.
2. If there is a large amount of data which needs to be loaded into the database at one time, then this
method is best suited.
3. In the case of a small database, fetching and retrieving records is faster than with sequential
file organization.
Cons of Heap File Organization
1. This method is inefficient for large databases because it takes time to search for or modify a
record.

Hash File Organization

1. Hash file organization computes a hash function on some fields of the records.
2. The hash function's output determines the location of the disk block where the records are to be
placed.
3. When a record has to be retrieved using the hash key columns, the address is generated,
and the whole record is retrieved using that address.
4. In the same way, when a new record has to be inserted, the address is generated using the
hash key and the record is directly inserted. The same process is applied in the case of delete and
update.
5. In this method, there is no need to search or sort the entire file. Records are not kept in any
key order; each record is placed wherever the hash function sends it.

B+ File Organization
1. B+ tree file organization is an advanced form of the indexed sequential access method.
2. It uses a tree-like structure to store records in a file.
3. It uses the same concept of key-index where the primary key is used to sort the records. For each
primary key, the value of the index is generated and mapped with the record.
4. The B+ tree is similar to a binary search tree (BST), but a node can have more than two children.
5. In this method, all the records are stored only at the leaf nodes.
6. Intermediate nodes act as pointers to the leaf nodes. They do not contain any records.
The B+ tree shows that:
1. There is one root node of the tree, i.e., 25.
2. There is an intermediary layer with nodes. They do not store the actual records. They have only
pointers to the leaf nodes.
3. The node to the left of the root holds values smaller than the root and the node to the right
holds values larger than the root, i.e., 15 and 30 respectively.
4. There is only one leaf level, whose nodes hold only values, i.e., 10, 12, 17, 20, 24, 27 and 29.
5. Searching for any record is easier as all the leaf nodes are at the same depth.
6. In this method, any record can be reached by traversing a single path from the root.

Pros of B+ tree file organization
1. In this method, searching becomes very easy as all the records are stored only in the leaf nodes
and kept sorted in a sequential linked list.
2. Traversing through the tree structure is easier and faster.
3. The size of the B+ tree has no restrictions, so the number of records can increase or decrease
and the B+ tree structure can also grow or shrink.
4. It is a balanced tree structure, and inserts, updates and deletes do not degrade the performance of
the tree.
Cons of B+ tree file organization
1. This method is inefficient for static tables.
Indexed sequential access method (ISAM)
1. ISAM method is an advanced sequential file organization.
2. In this method, records are stored in the file using the primary key.
3. An index value is generated for each primary key and mapped with the record.
4. This index contains the address of the record in the file.
5. If any record has to be retrieved based on its index value, then the address of the data block is
fetched and the record is retrieved from the memory.

Pros of ISAM:
1. In this method, since each index entry holds the address of its record's data block, searching for
a record in a huge database is quick and easy.
2. This method supports range retrieval and partial retrieval of records. Since the index is based on
the primary key values, we can retrieve the data for the given range of value.
3. In the same way, the partial value can also be easily searched, i.e., the student name starting with
'JA' can be easily searched.
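
Partial retrieval on a sorted index can be sketched as below. The index contents, the block
addresses and the function name prefix_search are all made up for illustration; the point is only
that a sorted index lets the search jump to the first candidate and stop at the first non-match.

```python
from bisect import bisect_left

# Hypothetical index: (student name, data block address) pairs sorted by name.
index = [("ADAM", 10), ("JACK", 11), ("JANE", 12), ("JOHN", 13), ("MARY", 14)]
keys = [name for name, _ in index]


def prefix_search(prefix):
    """Return the index entries whose key starts with the given prefix."""
    start = bisect_left(keys, prefix)  # first key >= prefix
    matches = []
    for name, address in index[start:]:
        if not name.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append((name, address))
    return matches


ja_entries = prefix_search("JA")
print(ja_entries)
```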
Cons of ISAM
1. This method requires extra space in the disk to store the index value.
2. When the new records are inserted, then these files have to be reconstructed to maintain the
sequence.
3. When the record is deleted, then the space used by it needs to be released. Otherwise, the
performance of the database will slow down.

Cluster file organization


1. When records of two or more tables are stored in the same file, it is known as a cluster.
2. These files will have two or more tables in the same data block, and the key attributes which are used
to map these tables together are stored only once.
3. This method reduces the cost of searching for various records in different files.
4. The cluster file organization is used when there is a frequent need for joining the tables with the
same condition. These joins will give only a few records from both tables.
5. In this method, we can directly insert, update or delete any record. Data is sorted based on the
key with which searching is done. The cluster key is the key on which the tables are joined.
6. In the given example, we are retrieving the records for only particular departments. This method
can't be used to retrieve the records for the entire department.

Pros of Cluster file organization
1. The cluster file organization is used when there is a frequent request for joining the tables with
same joining condition.
2. It provides efficient results when there is a 1:M mapping between the tables.
Cons of Cluster file organization
1. This method has low performance for very large databases.
2. If there is any change in the joining condition, then this method cannot be used. If we change the
joining condition, then traversing the file takes a lot of time.
3. This method is not suitable for tables with a 1:1 mapping.

Indexing in DBMS
1. Indexing is used to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed.
2. The index is a type of data structure. It is used to locate and access the data in a database table
quickly.
Index structure:
Indexes can be created using some database columns. An index has two columns:
1. The first column of the index is the search key, which contains a copy of the primary key or
candidate key of the table. The values of the search key are stored in sorted order so that the
corresponding data can be accessed easily.
2. The second column of the index is the data reference. It contains a set of pointers holding the
address of the disk block where the value of the particular key can be found.

Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are known as
ordered indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes
long. Suppose their IDs start with 1, 2, 3, and so on, and we have to search for the employee with ID 543.
In the case of a database with no index, we have to search the disk block from the start till we reach 543.
The DBMS will read the record after reading 543*10 = 5430 bytes. In the case of an index, assuming each
index entry is 2 bytes long, the DBMS will read the record after reading 542*2 = 1084 bytes, which is far
less than in the previous case.
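
The arithmetic of this example can be checked with a few lines of Python. The 2-byte index entry
size is an assumption implied by the figures in the example, not something the notes state.

```python
RECORD_SIZE = 10      # bytes per record, from the example
INDEX_ENTRY_SIZE = 2  # bytes per index entry (assumed)
TARGET_ID = 543

# Without an index: read records 1..543 in full.
bytes_without_index = TARGET_ID * RECORD_SIZE
# With an index: skip over the 542 index entries that precede the target.
bytes_with_index = (TARGET_ID - 1) * INDEX_ENTRY_SIZE

print(bytes_without_index, bytes_with_index)
```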
Primary Index
If the index is created on the basis of the primary key of the table, then it is known as primary indexing.
Primary keys are unique to each record, so there is a 1:1 relation between index entries and records. As
primary keys are stored in sorted order, the searching operation is quite efficient. The
primary index can be classified into two types: dense index and sparse index.
Dense Index
The dense index contains an index record for every search key value in the data file. It makes searching
faster. Here, the number of records in the index table is the same as the number of records in the main
table. It needs more space to store the index records themselves. Each index record has the search key and a
pointer to the actual record on the disk.

Sparse index
In a sparse index, index records appear only for a few items in the data file. Each item points to a block.
Instead of pointing to each record in the main table, the index points to records in the main table at intervals.
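
A sparse index lookup can be sketched as follows, assuming a data file of three sorted blocks and one
index entry per block (the block's first key). The data and the name sparse_lookup are hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of record keys, sorted across the whole file.
data_blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Sparse index: one entry per block, holding the first key stored in that block.
sparse_index = [block[0] for block in data_blocks]


def sparse_lookup(key):
    """Find the candidate block through the index, then scan inside that block."""
    pos = bisect_right(sparse_index, key) - 1  # last index entry <= key
    if pos < 0:
        return None
    return pos if key in data_blocks[pos] else None


block_of_5 = sparse_lookup(5)
print(block_of_5)
```

A dense index would instead hold one entry for each of the nine keys, trading space for a lookup
that needs no scan inside the block.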

Clustering Index
A clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary
key columns which may not be unique for each record.
In this case, to identify the record faster, we group two or more columns to get a unique value and
create an index out of them. This method is called a clustering index.
The records which have similar characteristics are grouped, and indexes are created for these groups.

Secondary Index
In sparse indexing, as the size of the table grows, the size of the mapping also grows. These mappings are
usually kept in the primary memory so that the address fetch is faster. The actual data is then looked up
in the secondary memory based on the address obtained from the mapping. If the mapping size grows, then
fetching the address itself becomes slower. In this case, the sparse index will not be efficient. To
overcome this problem, secondary indexing is introduced.
In secondary indexing, to reduce the size of the mapping, another level of indexing is introduced. In this
method, a huge range for the columns is selected initially so that the mapping size of the first level
becomes small. Then each range is further divided into smaller ranges. The mapping of the first level is
stored in the primary memory, so that the address fetch is faster. The mapping of the second level and
the actual data are stored in the secondary memory (hard disk).
For example:
If you want to find the record of roll 111 in the diagram, then it will search for the highest entry which is
smaller than or equal to 111 in the first level index. It will get 100 at this level.
Then in the second index level, it again looks for the highest entry smaller than or equal to 111 and gets
110. Now using the address 110, it goes to the data block and starts searching each record till it gets 111.
This is how a search is performed in this method. Inserting, updating or deleting is also done in the same
manner.
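
The lookup for roll 111 can be sketched as below. The index and data block contents are invented to
match the numbers in the example (100 at the first level, 110 at the second), and plain Python
dictionaries stand in for the structures kept in memory and on disk.

```python
from bisect import bisect_right


def floor_entry(sorted_keys, key):
    """Return the largest entry that is <= key, or None if there is none."""
    pos = bisect_right(sorted_keys, key) - 1
    return sorted_keys[pos] if pos >= 0 else None


# Hypothetical layout: first level in primary memory, the rest on disk.
first_level = [100, 200, 300]
second_level = {100: [100, 110, 120], 200: [200, 210], 300: [300, 310]}
data_blocks = {100: [100, 101], 110: [110, 111, 112], 120: [120, 121]}


def find_roll(roll):
    """Return the (first-level entry, block address) path to the record, or None."""
    top = floor_entry(first_level, roll)
    if top is None:
        return None
    block_addr = floor_entry(second_level[top], roll)
    records = data_blocks.get(block_addr, [])
    return (top, block_addr) if roll in records else None  # scan inside the block


path = find_roll(111)
print(path)
```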

B+ Tree
1. The B+ tree is a balanced multiway search tree. It follows a multi-level index format.
2. In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf nodes remain
at the same height.
3. In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support random
access as well as sequential access.
Structure of B+ Tree
1. In the B+ tree, every leaf node is at equal distance from the root node.
2. The B+ tree is of the order n where n is fixed for every B+ tree.
3. It contains internal nodes and leaf nodes.

Internal node
An internal node of the B+ tree, other than the root node, contains at least n/2 pointers to child nodes.
At most, an internal node of the tree contains n pointers.
Leaf node
The leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values. At most, a leaf
node contains n record pointers and n key values. Every leaf node of the B+ tree contains one block
pointer P to point to the next leaf node.

Searching a record in B+ Tree
Suppose we have to search for 55 in the B+ tree structure below. First, we fetch the intermediary
node, which directs us to the leaf node that can contain a record for 55. So, in the intermediary node, we
find the branch between the 50 and 75 entries. At the end, we are redirected to the third leaf
node. Here the DBMS performs a sequential search to find 55.
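
The search can be sketched on a small two-level tree. The separator keys 25, 50 and 75 and the leaf
contents are assumptions chosen to agree with the description (55 lies between 50 and 75 and is found
in the third leaf); they are not taken from the original figure.

```python
from bisect import bisect_right

# Assumed tree: one intermediary node and four leaves.
separators = [25, 50, 75]
leaves = [
    [10, 15, 20],
    [25, 30, 35, 45],
    [50, 55, 65, 70],
    [75, 80, 85, 90],
]


def bplus_search(key):
    """Choose the branch from the separators, then scan that leaf sequentially."""
    leaf_no = bisect_right(separators, key)  # number of separators <= key
    for stored in leaves[leaf_no]:
        if stored == key:
            return leaf_no
    return None


leaf_of_55 = bplus_search(55)
print(leaf_of_55)  # index 2, i.e. the third leaf
```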

B+ Tree Insertion
Suppose we want to insert a record 60 in the structure below. It will go to the 3rd leaf
node, after 55. It is a balanced tree, and this leaf node is already full, so we cannot insert 60
there. In this case, we have to split the leaf node, so that 60 can be inserted into the tree without affecting
the fill factor, balance and order.
With 60 included, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and its current key in the
intermediate node is 50. We split the leaf node in the middle so that its balance is not altered. So we can
group (50, 55) and (60, 65, 70) into 2 leaf nodes.
If these two have to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60
added to it, and then we can have a pointer to the new leaf node.
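
The leaf split can be sketched as a small function. It covers only the split of one overfull leaf and
the key that is copied up to the intermediate node; the function name split_leaf is hypothetical, and
a full B+ tree insertion would also have to handle an intermediate node that is itself full.

```python
def split_leaf(keys):
    """Split an overfull leaf in the middle; return (left, right, separator)."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The first key of the new right-hand leaf is copied up into the parent.
    return left, right, right[0]


overfull = sorted([50, 55, 65, 70] + [60])  # the full 3rd leaf plus the new key
left, right, separator = split_leaf(overfull)
print(left, right, separator)
```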

B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from the
intermediate node as well as from the 4th leaf node. If we simply remove it from the intermediate node,
the tree will no longer satisfy the rules of the B+ tree. So we need to rearrange it to keep the tree balanced.
After deleting 60 from the above B+ tree and re-arranging the nodes, it will show as follows:

Hashing
In a huge database structure, it is very inefficient to search all the index values and reach the desired
data. The hashing technique is used to calculate the direct location of a data record on the disk without
using an index structure.
In this technique, data is stored in the data blocks whose address is generated by using the hashing
function. The memory location where these records are stored is known as a data bucket or data block.
A hash function can use any of the column values to generate the address. Most of the time,
the hash function uses the primary key to generate the address of the data block. A hash function can be
anything from a simple mathematical function to a complex one. We can even consider the primary
key itself as the address of the data block. That means each row is stored in the data block whose
address is the same as its primary key.

Suppose we have mod (5) hash function to determine the address of the data block. In this case, it
applies mod (5) hash function on the primary keys and generates 3, 3, 1, 4 and 2 respectively, and
records are stored in those data block addresses.

Types of Hashing:
Static Hashing - In static hashing, the resultant data bucket address will always be the same. That means
if we generate an address for EMP_ID =103 using the hash function mod (5) then it will always result in
same bucket address 3. Here, there will be no change in the bucket address. Hence in this static hashing,
the number of data buckets in memory remains constant throughout. In this example, we will have five
data buckets in the memory used to store the data.
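
A minimal sketch of static hashing with the mod (5) function from the example. EMP_ID 103 comes from
the text; the other two keys are illustrative only.

```python
NUM_BUCKETS = 5  # fixed for the lifetime of the file in static hashing


def bucket_address(emp_id):
    """Hash function from the example: key mod 5."""
    return emp_id % NUM_BUCKETS


buckets = {address: [] for address in range(NUM_BUCKETS)}
for emp_id in (103, 104, 105):  # 104 and 105 are made-up keys
    buckets[bucket_address(emp_id)].append(emp_id)

print(bucket_address(103))  # the same key always maps to the same bucket
print(buckets)
```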

Operations of Static Hashing
1. Searching a record - When a record needs to be searched, then the same hash function retrieves
the address of the bucket where the data is stored.
2. Insert a Record - When a new record is inserted into the table, then we will generate an address
for a new record based on the hash key and record is stored in that location.
3. Delete a Record - To delete a record, we will first fetch the record which is supposed to be
deleted. Then we will delete the records for that address in memory.
4. Update a Record - To update a record, we will first search it using a hash function, and then the
data record is updated.
Suppose we want to insert a new record into the file, but the data bucket address generated by the
hash function is not empty, i.e., data already exists at that address. This situation in static hashing is
known as bucket overflow. This is a critical situation in this method.
To overcome this situation, there are various methods. Some commonly used methods are as follows:
1. Open Hashing - When a hash function generates an address at which data is already stored, then the
next bucket will be allocated to it. This mechanism is called Linear Probing.
For example: suppose R3 is a new record which needs to be inserted, and the hash function generates
address 112 for R3. But the bucket at the generated address is already full. So the system searches for the
next available data bucket, 113, and assigns R3 to it.

2. Close Hashing - When buckets are full, then a new data bucket is allocated for the same hash result and
is linked after the previous one. This mechanism is known as Overflow Chaining.

For example: Suppose R3 is a new record which needs to be inserted into the table, and the hash function
generates address 110 for it. But this bucket is too full to store the new data. In this case, a new bucket is
inserted at the end of bucket 110 and is linked to it.
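
Both ways of handling bucket overflow can be sketched as follows. The bucket addresses 110 to 114, the
one-record bucket capacity and the records already stored are assumptions chosen to reproduce the two
examples; the function names are hypothetical.

```python
ADDRESSES = [110, 111, 112, 113, 114]  # assumed fixed set of bucket addresses


def insert_linear_probing(table, address, record):
    """Open hashing: if the bucket is taken, try the following buckets in turn."""
    start = ADDRESSES.index(address)
    for step in range(len(ADDRESSES)):
        candidate = ADDRESSES[(start + step) % len(ADDRESSES)]
        if table.get(candidate) is None:
            table[candidate] = record
            return candidate
    raise RuntimeError("every bucket is full")


def insert_overflow_chain(chains, address, record, capacity=1):
    """Close hashing: link a new overflow bucket after a full one."""
    chain = chains.setdefault(address, [[]])
    if len(chain[-1]) >= capacity:
        chain.append([])  # new bucket linked at the end of the chain
    chain[-1].append(record)
    return len(chain)


table = {110: "R1", 112: "R2"}
placed_at = insert_linear_probing(table, 112, "R3")  # 112 is taken, so 113

chains = {110: [["R1"]]}
chain_length = insert_overflow_chain(chains, 110, "R3")
print(placed_at, chains)
```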

Dynamic Hashing
1. The dynamic hashing method is used to overcome the problems of static hashing like bucket
overflow.
2. In this method, data buckets grow or shrink as the number of records increases or decreases. This
method is also known as the extendible hashing method.
3. This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in poor
performance.
How to search a key
1. First, calculate the hash address of the key.
2. Check how many bits are used in the directory; call this number i.
3. Take the least significant i bits of the hash address.
4. This gives an index of the directory. Now using the index, go to the directory and find the bucket
address where the record might be.
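
These search steps can be sketched as below, assuming a directory that uses i = 2 bits and therefore
has four entries. The directory contents and the sample hash addresses are made up for illustration.

```python
def directory_index(hash_address, i):
    """Keep only the i least significant bits of the hash address."""
    return hash_address & ((1 << i) - 1)


# Hypothetical directory with i = 2 bits: four entries pointing at four buckets.
directory = {0b00: "B0", 0b01: "B1", 0b10: "B2", 0b11: "B3"}


def find_bucket(hash_address, i=2):
    """Follow the directory entry selected by the low-order bits."""
    return directory[directory_index(hash_address, i)]


bucket = find_bucket(0b10101)  # low-order bits are 01
print(bucket)
```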
How to insert a new record

First, follow the same procedure as for retrieval, ending up in some bucket. If there is still
space in that bucket, then place the record in it. If the bucket is full, then we split the bucket and
redistribute the records.

For example: Consider the following grouping of keys into buckets, depending on the last bits of their hash
address.

The last two bits of the hash addresses of 2 and 4 are 00, so they go into bucket B0. The last two bits for
5 and 6 are 01, so they go into bucket B1. The last two bits for 1 and 3 are 10, so they go into bucket B2.
The last two bits for 7 are 11, so it goes into B3.

Advantages of dynamic hashing


1. In this method, the performance does not decrease as the data grows in the system. It simply
increases the size of memory to accommodate the data.
2. In this method, memory is well utilized as it grows and shrinks with the data. No memory is left
lying unused.
3. This method is good for a dynamic database where data grows and shrinks frequently.
Disadvantages of dynamic hashing
1. In this method, if the data size increases then the number of buckets also increases. The addresses
of the data are maintained in the bucket address table.
2. This table has to be updated constantly, because the data addresses keep changing as buckets grow
and shrink.
3. If there is a huge increase in data, maintaining the bucket address table becomes tedious.

4. In this case, the bucket overflow situation will also occur, but it takes longer to reach
this situation than in static hashing.
