Dbms Notes - Unit 5
UNIT - I
Database System Applications: A Historical Perspective, File Systems versus a DBMS, the Data
Model, Levels of Abstraction in a DBMS, Data Independence, Structure of a DBMS
Introduction to Database Design: Database Design and ER Diagrams, Entities, Attributes, and
Entity Sets, Relationships and Relationship Sets, Additional Features of the ER Model, Conceptual
Design with the ER Model
UNIT - II
Introduction to the Relational Model: Integrity constraints over relations, enforcing integrity
constraints, querying relational data, logical database design, introduction to views,
destroying/altering tables and views.
Relational Algebra, Tuple relational Calculus, Domain relational calculus.
UNIT - III
SQL: QUERIES, CONSTRAINTS, TRIGGERS: form of basic SQL query, UNION, INTERSECT, and
EXCEPT, Nested Queries, aggregation operators, NULL values, complex integrity constraints in
SQL, triggers and active databases.
Schema Refinement: Problems caused by redundancy, decompositions, problems related to
decomposition, reasoning about functional dependencies, FIRST, SECOND, THIRD normal forms,
BCNF, lossless join decomposition, multi-valued dependencies, FOURTH normal form, FIFTH
normal form.
UNIT - IV
Transaction Concept, Transaction State, Implementation of Atomicity and Durability, Concurrent
Executions, Serializability, Recoverability, Implementation of Isolation, Testing for serializability,
Lock Based Protocols, Timestamp Based Protocols, Validation-Based Protocols, Multiple
Granularity, Recovery and Atomicity, Log–Based Recovery, Recovery with Concurrent
Transactions.
UNIT - V
Data on External Storage, File Organization and Indexing, Cluster Indexes, Primary and Secondary
Indexes, Index Data Structures, Hash-Based Indexing, Tree-Based Indexing, Comparison of File
Organizations, Indexes and Performance Tuning, Intuitions for Tree Indexes, Indexed Sequential
Access Methods (ISAM), B+ Trees: A Dynamic Index Structure.
Sequential File Organization
Sequential file organization is the simplest method of file organization: records are stored one after another in the file. This method can be implemented in two ways. In the simpler of the two, records are kept in arrival order:
1. Records are stored in sequence, one after another. A new record is placed at the end of the file, so records appear in the order in which they were inserted into the table.
2. To update or delete a record, the memory blocks are searched for it. When it is found, it is marked for deletion, and for an update the new version of the record is then inserted.
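The insert, delete, and update behaviour described above can be sketched in Python as follows. This is only a minimal illustration, not a real storage engine: the `PileFile` class, its list-backed storage, and the per-record deleted flag are assumptions introduced for the example.

```python
class PileFile:
    """Toy sequential file: records are kept in arrival order."""

    def __init__(self):
        # Each entry is [key, value, deleted_flag].
        self.records = []

    def insert(self, key, value):
        # A new record always goes to the end of the file.
        self.records.append([key, value, False])

    def find(self, key):
        # Linear scan, because the file is not ordered on the key.
        for pos, (k, _, deleted) in enumerate(self.records):
            if k == key and not deleted:
                return pos
        return None

    def delete(self, key):
        pos = self.find(key)
        if pos is None:
            return False
        self.records[pos][2] = True  # mark for deletion, do not move data
        return True

    def update(self, key, value):
        # Mark the old version deleted, then append the new version.
        if self.delete(key):
            self.insert(key, value)
            return True
        return False
```

Note that every lookup scans the file from the start, which is the cost the later indexing and hashing sections try to avoid.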
1. Hash File Organization computes a hash function on some fields of the records.
2. The output of the hash function determines the disk block in which the record is placed.
3. When a record has to be retrieved using the hash key columns, the address is generated from them, and the whole record is retrieved using that address.
4. In the same way, when a new record has to be inserted, the address is generated using the hash key and the record is inserted directly. The same process is applied for delete and update.
5. This method avoids searching or sorting the entire file. Records are not kept in any key order: each record is placed in the block selected by its hash value, so records appear scattered across the storage blocks.
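The points above can be sketched as a small Python example. It is a sketch under stated assumptions: the number of blocks, the modulo hash function, and the record layout (a dict with an `id` field) are all hypothetical choices made for illustration.

```python
NUM_BLOCKS = 8  # assumed number of disk blocks

# Each "disk block" is modelled as a list of records.
blocks = [[] for _ in range(NUM_BLOCKS)]


def block_address(key):
    # Assumed hash function: simple modulo on an integer key.
    return key % NUM_BLOCKS


def insert_record(record):
    # The address is computed from the hash key; no search or sort is needed.
    blocks[block_address(record["id"])].append(record)


def lookup_record(key):
    # Only the one block chosen by the hash function is examined.
    for record in blocks[block_address(key)]:
        if record["id"] == key:
            return record
    return None


def delete_record(key):
    block = blocks[block_address(key)]
    for i, record in enumerate(block):
        if record["id"] == key:
            del block[i]
            return True
    return False
```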
Indexing in DBMS
1. Indexing is used to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed.
2. The index is a type of data structure. It is used to locate and access the data in a database table
quickly.
Index structure:
An index is built on one or more columns of a database table and has two columns of its own.
1. The first column of the index is the search key. It contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed easily.
2. The second column of the index is the data reference. It contains a set of pointers holding the address of the disk block where the value of the particular key can be found.
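Because the search keys are kept sorted, a lookup can use binary search instead of scanning. The sketch below illustrates the two-column structure described above; the key values and block addresses are hypothetical, and `bisect_left` is from Python's standard `bisect` module.

```python
from bisect import bisect_left

# Hypothetical index: (search key, disk block address) pairs, sorted by key.
index = [(101, 0), (105, 0), (110, 1), (120, 2)]
search_keys = [key for key, _ in index]


def find_block(search_key):
    """Return the block address for search_key, or None if it is not indexed."""
    pos = bisect_left(search_keys, search_key)
    if pos < len(search_keys) and search_keys[pos] == search_key:
        return index[pos][1]
    return None
```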
Clustering Index
A clustering index is defined on an ordered data file. Sometimes the index is created on non-primary-key columns, which may not be unique for each record.
In this case, to identify records faster, two or more columns are grouped to obtain a unique value, and the index is created on them. This method is called a clustering index.
Records that have similar characteristics are grouped together, and index entries are created for these groups.
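The idea of grouping records and indexing each group can be sketched as follows. This is an illustrative assumption rather than a definitive design: the sample records, the `dept_id` column, and the choice of one index entry per distinct value (pointing at the first record of its group) are all hypothetical.

```python
# Hypothetical records: (dept_id, emp_name); dept_id is not unique.
records = [(20, "Ravi"), (10, "Asha"), (20, "Meena"), (10, "John"), (30, "Sara")]

# The data file is kept ordered on the clustering column.
data_file = sorted(records, key=lambda r: r[0])

# One index entry per distinct value, pointing at the first record of its group.
cluster_index = {}
for pos, (dept_id, _) in enumerate(data_file):
    cluster_index.setdefault(dept_id, pos)


def records_for(dept_id):
    """Return all records in the group for dept_id by reading from its start."""
    start = cluster_index.get(dept_id)
    if start is None:
        return []
    group = []
    for record in data_file[start:]:
        if record[0] != dept_id:
            break  # the group is contiguous, so stop at the first other value
        group.append(record)
    return group
```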
Internal node
An internal node of the B+ tree, other than the root, contains at least n/2 (rounded up) pointers to child nodes. At most, an internal node contains n pointers.
Leaf node
A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values. At most, a leaf node contains n record pointers and n key values. Every leaf node also contains one block pointer P that points to the next leaf node.
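The next-leaf pointer P lets a range query walk along the leaf level without going back up the tree. A minimal sketch, assuming a hypothetical `Leaf` class that holds only sorted keys and the pointer to the next leaf:

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted key values stored in this leaf
        self.next = None   # block pointer P to the next leaf node


def range_scan(first_leaf, low, high):
    """Collect all keys in [low, high] by following next-leaf pointers."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are sorted, so nothing further can match
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result
```

In a full B+ tree the scan would start at the leaf found by descending from the root for `low`; starting from the first leaf here just keeps the example short.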
B+ Tree Insertion
Suppose we want to insert a record with key 60 into the example structure. It belongs in the 3rd leaf node, after 55. The tree is balanced and this leaf node is already full, so 60 cannot simply be added there. Instead, the leaf node must be split so that the key can be inserted without affecting the fill factor, balance, and order of the tree.
With 60 included, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and the key in the node above it is currently 50. We split the leaf node in the middle so that the balance of the tree is not altered, grouping the values into two leaf nodes: (50, 55) and (60, 65, 70).
If these two are to be leaf nodes, the intermediate node cannot branch from 50 alone. The key 60 must be added to it, together with a pointer to the new leaf node.
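The split step in this example can be checked with a few lines of Python. This is a sketch of the leaf split only: the function name `split_leaf` is hypothetical, and updating the parent node (adding the separator key and the new pointer) is left out.

```python
def split_leaf(keys):
    """Split a sorted, overfull leaf in the middle.

    Returns (left, right, separator), where separator is the first key of
    the right leaf and is the key to be added to the parent node.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


left, right, separator = split_leaf([50, 55, 60, 65, 70])
# left is [50, 55], right is [60, 65, 70], separator is 60
```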
Hashing
In a huge database structure, searching through all the index values to reach the desired data is very inefficient. Hashing is a technique that calculates the location of a data record on disk directly, without using an index structure.
In this technique, data is stored in the data blocks whose addresses are generated by a hash function. The location where these records are stored is known as a data bucket or data block.
A hash function can use any column value to generate the address. Most of the time, it uses the primary key to generate the address of the data block. The hash function can be anything from a simple mathematical function to a complex one. The primary key itself can even be used as the address of the data block, in which case each row is stored in the data block whose address equals its primary key.
Types of Hashing:
Static Hashing - In static hashing, the resultant data bucket address for a given key is always the same. For example, if we generate the address for EMP_ID = 103 using the hash function mod 5, the result is always bucket address 3 (103 mod 5 = 3). The bucket address does not change. Hence, in static hashing, the number of data buckets in memory remains constant throughout. In this example, there are five data buckets in memory to store the data.
2. Close Hashing
When a bucket is full, a new data bucket is allocated for the same hash result and is linked after the previous one. This mechanism is known as overflow chaining.
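Static hashing with overflow chaining can be sketched as below. The five buckets and the mod 5 hash function follow the EMP_ID example; the bucket capacity of 2 and the names `ChainedBucket`, `chained_insert`, and `chained_contains` are assumptions made for the illustration.

```python
NUM_BUCKETS = 5      # fixed number of buckets, as in the mod 5 example
BUCKET_CAPACITY = 2  # assumed capacity of each bucket


class ChainedBucket:
    def __init__(self):
        self.keys = []
        self.overflow = None  # link to the next overflow bucket, if any


buckets = [ChainedBucket() for _ in range(NUM_BUCKETS)]


def bucket_address(key):
    return key % NUM_BUCKETS


def chained_insert(key):
    bucket = buckets[bucket_address(key)]
    # Walk the chain until a bucket with free space is found.
    while len(bucket.keys) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = ChainedBucket()  # allocate and link a new bucket
        bucket = bucket.overflow
    bucket.keys.append(key)


def chained_contains(key):
    bucket = buckets[bucket_address(key)]
    while bucket is not None:
        if key in bucket.keys:
            return True
        bucket = bucket.overflow
    return False
```

Long chains make lookups slower, which is the bucket overflow problem that dynamic hashing addresses.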
Dynamic Hashing
1. The dynamic hashing method is used to overcome the problems of static hashing, such as bucket overflow.
2. In this method, data buckets grow or shrink as the number of records increases or decreases. This method is also known as the extendable hashing method.
3. This method makes hashing dynamic, i.e., it allows insertion and deletion without resulting in poor performance.
How to search a key
1. First, calculate the hash address of the key.
2. Check how many bits are used in the directory. Call this number i.
3. Take the least significant i bits of the hash address.
4. This gives an index into the directory. Using this index, go to the directory and find the address of the bucket where the record might be.
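The search steps above amount to a bit mask followed by a directory lookup. A minimal sketch, assuming a hypothetical directory with i = 2 and bucket names B0 to B3:

```python
def directory_index(hash_address, i):
    # Keep only the least significant i bits of the hash address.
    return hash_address & ((1 << i) - 1)


# Hypothetical directory for i = 2: one entry per 2-bit pattern.
directory = {0b00: "B0", 0b01: "B1", 0b10: "B2", 0b11: "B3"}


def find_bucket(hash_address, i=2):
    return directory[directory_index(hash_address, i)]
```

For instance, a hash address of `0b10111` ends in 11, so `find_bucket(0b10111)` selects B3.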
How to insert a new record
First, follow the same procedure as for retrieval, which ends at some bucket. If there is still space in that bucket, place the record in it. If the bucket is full, split the bucket and redistribute the records.
In the example, the directory uses two bits, so each key is placed according to the last two bits of its hash address. The hash addresses of keys 2 and 4 end in 00, so they go into bucket B0. Those of keys 5 and 6 end in 01, so they go into bucket B1. Those of keys 1 and 3 end in 10, so they go into bucket B2. The hash address of key 7 ends in 11, so it goes into bucket B3.
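The insertion procedure, including bucket splitting and redistribution, can be sketched in Python as follows. This is one possible sketch, not a definitive implementation: the class names, the bucket capacity of 2, the `MAX_DEPTH` safety limit, and above all the hash addresses are assumptions. The hash addresses are hypothetical values chosen only so that their last two bits agree with the bucket assignment described above.

```python
MAX_DEPTH = 16  # assumed safety limit so identical hash addresses cannot split forever


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of bits this bucket distinguishes
        self.items = []                 # (key, hash_address) pairs


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity        # assumed bucket capacity
        self.global_depth = 1           # i: number of bits used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, hash_address):
        # Least significant i bits of the hash address.
        return hash_address & ((1 << self.global_depth) - 1)

    def insert(self, key, hash_address):
        bucket = self.directory[self._index(hash_address)]
        if len(bucket.items) < self.capacity:
            bucket.items.append((key, hash_address))
            return
        self._split(bucket)
        self.insert(key, hash_address)  # retry after the split

    def _split(self, bucket):
        if bucket.local_depth >= MAX_DEPTH:
            raise OverflowError("too many records share one hash address")
        if bucket.local_depth == self.global_depth:
            # Double the directory: entry j + 2**i starts as a copy of entry j.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Entries for the old bucket whose newly used bit is 1 now point to the sibling.
        for idx, b in enumerate(self.directory):
            if b is bucket and idx & new_bit:
                self.directory[idx] = sibling
        # Redistribute the records of the split bucket.
        old_items, bucket.items = bucket.items, []
        for key, hash_address in old_items:
            self.directory[self._index(hash_address)].items.append((key, hash_address))

    def keys_by_directory_entry(self):
        return [[key for key, _ in b.items] for b in self.directory]


# Hypothetical hash addresses, chosen only to agree with the text above.
hash_addresses = {1: 0b11010, 2: 0b00000, 3: 0b11110, 4: 0b00000,
                  5: 0b01001, 6: 0b10101, 7: 0b10111}

table = ExtendibleHash(capacity=2)
for key, address in hash_addresses.items():
    table.insert(key, address)
```

With these assumed addresses the directory ends up using two bits, and its four entries hold keys (2, 4), (5, 6), (1, 3), and (7), matching buckets B0 to B3.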