Unit III DBMS
Database Design- Dependencies and Normal forms, Functional Dependencies, 1NF, 2NF, 3NF, and BCNF. Higher Normal Forms-4NF and 5NF. Transaction
Management: ACID properties, Serializability, Concurrency Control, Database recovery management. Data Storage and Indexes, Hashing Techniques.
Functional Dependencies
A functional dependency X -> Y means that the value of attribute set X uniquely determines the value of attribute set Y.
Example: In a CAR relation with attributes car_model, maf_year and color, maf_year and color are independent of each other but dependent on car_model. In this example, these two columns are said to be functionally dependent on the car_model attribute.
So, X -> Y is a trivial functional dependency if Y is a subset of X. Let's understand this with a trivial functional dependency example.
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Consider this table with two columns, Emp_id and Emp_name.
{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency because Emp_id is a subset of {Emp_id, Emp_name}.
Non-trivial Functional Dependency in DBMS
A functional dependency, also known as a non-trivial dependency, occurs when A -> B holds true and B is not a subset of A.
Example:
{Company} -> {CEO} (if we know the Company, we know the CEO's name)
But CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
Transitive Dependency in DBMS
A transitive dependency is a type of functional dependency in which one attribute is determined indirectly, through two other functional dependencies.
Example:
Company CEO Age
Alibaba Jack Ma 54
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, by the rule of transitive dependency, {Company} -> {Age} should hold. That makes sense, because if we know the company name, we can know its CEO's age.
Note: You need to remember that transitive dependency can only occur in a relation of three or more attributes.
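These definitions can be checked mechanically: X -> Y holds in a relation exactly when no two rows agree on X but disagree on Y. A minimal Python sketch of such a check (the holds_fd helper and the sample data are illustrative, not part of any DBMS library):

# Sketch: verify whether the functional dependency X -> Y holds in a relation.
# The relation is a list of dictionaries; holds_fd is an illustrative helper name.
def holds_fd(rows, lhs, rhs):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False          # same X value maps to two different Y values
        seen[x] = y
    return True

company = [
    {"Company": "Alibaba", "CEO": "Jack Ma", "Age": 54},
]

print(holds_fd(company, ["Company"], ["CEO"]))           # non-trivial FD: Company -> CEO
print(holds_fd(company, ["Company", "CEO"], ["CEO"]))    # trivial FD: RHS is a subset of LHS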
What is Normalization?
Normalization is a method of organizing the data in the database which helps you to avoid data redundancy and insertion, update and deletion anomalies. It is a process of analyzing relation schemas based on their functional dependencies and primary keys.
Normalization is inherent to relational database theory. It may have the effect of duplicating the same data within the database, which may result in the creation of additional tables.
Advantages of Functional Dependency
Functional dependency avoids data redundancy, so the same data does not repeat at multiple locations in the database
It helps you to maintain the quality of data in the database
It helps you to define meanings and constraints of databases
It helps you to identify bad designs
It helps you to find the facts regarding the database design
Normalization
A large database defined as a single relation may result in data duplication. This repetition of data may result in:
o It isn't easy to maintain and update data, as it would involve searching many records in the relation.
So to handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, and well-structured relations that satisfy desirable properties. Normalization is the process of decomposing relations into relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like insertion, update, and deletion anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o The normal form is used to reduce redundancy from the database table.
The main reason for normalizing the relations is removing these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that help to guide you in creating a good database structure.
o Insertion Anomaly: Insertion anomaly refers to when one cannot insert a new tuple into a relation due to lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data results in the unintended loss of some
o Update Anomaly: The update anomaly is when an update of a single data value requires multiple rows of data to be updated.
Types of Normal Forms:
Normalization works through a series of stages called normal forms. The normal forms apply to individual relations. A relation is said to be in a particular normal form if it satisfies the constraints of that form.
1NF A relation will be in 1NF if it contains only atomic (single) values.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF A relation will be in 3NF if it is in 2NF and no transitive dependency exists for non-prime attributes.
BCNF A relation will be in BCNF if it is in 3NF and, for every functional dependency X → Y, X is a super key.
4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining should be lossless.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
o Careless decomposition may lead to a bad database design, leading to serious problems.
First Normal Form (1NF)
o It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attributes.
o First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238, 8589830302 UP
Here the EMP_PHONE column holds multiple values, so the table is not in 1NF.
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
14 John 8589830302 UP
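As a small illustration of the same decomposition done programmatically, here is a hedged Python sketch that flattens the multi-valued EMP_PHONE attribute into one row per phone number, which is exactly what 1NF requires (the list-of-dictionaries representation is just for illustration):

# Sketch: decompose a multi-valued EMP_PHONE attribute into 1NF (one value per row).
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": ["7272826385", "9064738238", "8589830302"], "EMP_STATE": "UP"},
]

employee_1nf = [
    {"EMP_ID": r["EMP_ID"], "EMP_NAME": r["EMP_NAME"],
     "EMP_PHONE": phone, "EMP_STATE": r["EMP_STATE"]}
    for r in employee
    for phone in r["EMP_PHONE"]
]

for row in employee_1nf:
    print(row)   # each row now holds a single phone number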
Second Normal Form (2NF)
o In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
Example: Let's assume a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Third Normal Form (3NF)
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to the new EMPLOYEE_ZIP table, with EMP_ZIP as a primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
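The decomposition can also be expressed as two projections: keep (EMP_ID, EMP_ZIP) in EMPLOYEE and move (EMP_ZIP, EMP_STATE, EMP_CITY) into EMPLOYEE_ZIP. A rough Python sketch follows; the EMP_ID values are placeholders, and only zip rows from the table above are used:

# Sketch: remove the transitive dependency EMP_ID -> EMP_ZIP -> (EMP_STATE, EMP_CITY)
# by projecting the ZIP-dependent attributes into their own relation.
employee_detail = [
    {"EMP_ID": 1, "EMP_ZIP": "201010", "EMP_STATE": "UP", "EMP_CITY": "Noida"},   # EMP_ID values are placeholders
    {"EMP_ID": 2, "EMP_ZIP": "02228",  "EMP_STATE": "US", "EMP_CITY": "Boston"},
]

employee = [{"EMP_ID": r["EMP_ID"], "EMP_ZIP": r["EMP_ZIP"]} for r in employee_detail]

employee_zip = {}
for r in employee_detail:                       # keyed by EMP_ZIP, so duplicates collapse
    employee_zip[r["EMP_ZIP"]] = {"EMP_STATE": r["EMP_STATE"], "EMP_CITY": r["EMP_CITY"]}

print(employee)
print(employee_zip)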
Boyce Codd normal form (BCNF)
o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
... D394 283
... D394 300
... D283 232
... D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both functional dependencies is a key.
Fourth Normal Form (4NF)
o For a dependency A → B, if for a single value of A, multiple values of B exist, then the relation has a multi-valued dependency.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
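The 4NF decomposition shown above is simply two projections of the original STUDENT relation, as this small Python sketch illustrates (data taken from the STUDENT table above):

# Sketch: decompose STUDENT(STU_ID, COURSE, HOBBY) into two relations,
# removing the multi-valued dependency on STU_ID.
student = [
    (21, "Computer", "Dancing"), (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"), (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby  = sorted({(sid, hobby)  for sid, _, hobby  in student})

print(student_course)   # [(21, 'Computer'), (21, 'Math'), (34, 'Chemistry'), ...]
print(student_hobby)    # [(21, 'Dancing'), (21, 'Singing'), (34, 'Dancing'), ...]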
Fifth Normal Form (5NF)
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both Computer and Math classes for Semester 1 but does not take the Math class for Semester 2. In this case, a combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking it, so we leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Transaction
o A transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions. It is performed by a single user to perform operations for accessing the contents of
the database.
Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account. This small transaction contains several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the buffer.
Let's take an example of a debit transaction from an account, which consists of the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain 3500.
o The third operation will write the buffer's value to the database. So X's final value will be 3500.
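A tiny Python sketch of the same R(X)/W(X) model: the "database" is a dictionary, the buffer is an in-memory copy, and only write_item makes the change visible in the database (the function names mirror the operations above but are otherwise illustrative):

# Sketch of the R(X) / W(X) model: changes happen in a buffer first,
# and only write_item pushes them back to the "database".
database = {"X": 4000}
buffer = {}

def read_item(name):            # R(X)
    buffer[name] = database[name]

def write_item(name):           # W(X)
    database[name] = buffer[name]

read_item("X")                  # 1. R(X)  -> buffer holds 4000
buffer["X"] -= 500              # 2. X = X - 500 -> buffer holds 3500
write_item("X")                 # 3. W(X)  -> database now holds 3500
print(database["X"])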
But it may be possible that, because of a hardware, software or power failure, the transaction fails before it finishes all its operations.
For example: If in the above transaction the debit transaction fails after executing operation 2, then X's value will remain 4000 in the database, which is not acceptable by the bank.
DBMS is the management of data that should remain integrated when any changes are done in it. It is because if the integrity of
the data is affected, whole data will get disturbed and corrupted. Therefore, to maintain the integrity of the data, there are four
properties described in the database management system, which are known as the ACID properties. The ACID properties are
meant for the transaction that goes through a different group of tasks, and there we come to see the role of the ACID properties.
In this section, we will learn and understand the ACID properties. We will learn what these properties stand for and what each property is used for. We will also understand the ACID properties with the help of some examples.
ACID Properties
1) Atomicity
The term atomicity defines that the data remains atomic. It means if any operation is performed on the data, either it should be
performed or executed completely or should not be executed at all. It further means that the operation should not break in
between or execute partially. In the case of executing operations on the transaction, the operation should be completely executed and not partially.
Example: If Remo has account A having $30 in his account from which he wishes to send $10 to Sheero's account, which is B. In
account B, a sum of $ 100 is already present. When $10 will be transferred to account B, the sum will become $110. Now, there
will be two operations that will take place. One is the amount of $10 that Remo wants to transfer will be debited from his ac -
count A, and the same amount will get credited to account B, i.e., into Sheero's account. Now, what happens - the first operation
of debit executes successfully, but the credit operation fails. Thus, in Remo's account A, the value becomes $20, while account B still holds $100. Such a partial execution is not allowed: either both operations happen or neither does, so the failure of the credit leads to the failure of the whole transaction.
The below image shows that both debit and credit operations are done successfully. Thus the transaction is atomic.
Thus, when the amount loses atomicity, this becomes a huge issue in bank systems, and so atomicity is the main focus in bank systems.
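A rough Python sketch of the atomicity requirement, using the Remo/Sheero figures from the example: either both the debit and the credit are applied, or the snapshot taken at the start is restored. The transfer function and rollback logic are illustrative, not any particular DBMS API:

# Sketch: an all-or-nothing transfer. If any step fails, the snapshot is restored,
# so the data never ends up in the half-done state described above.
accounts = {"A": 30, "B": 100}   # Remo's account A, Sheero's account B

def transfer(src, dst, amount):
    snapshot = dict(accounts)            # remember the state before the transaction
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount          # debit
        accounts[dst] += amount          # credit (if this fails, we roll back)
    except Exception:
        accounts.clear()
        accounts.update(snapshot)        # rollback: undo the partial effect
        raise

transfer("A", "B", 10)
print(accounts)                          # {'A': 20, 'B': 110}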
2) Consistency
The word consistency means that the value should remain preserved always. In DBMS, the integrity of the data should be
maintained, which means if a change in the database is made, it should remain preserved always. In the case of transactions,
the integrity of the data is very essential so that the database remains consistent before and after the transaction. The data should always be correct.
Example:
In the above figure, there are three accounts, A, B, and C, where A is making a transaction T one by one to both B & C. There are
two operations that take place, i.e., Debit and Credit. Account A firstly debits $50 to account B, and the amount in account A is
read $300 by B before the transaction. After the successful transaction T, the available amount in B becomes $150. Now, A deb -
its $20 to account C, and that time, the value read by C is $250 (that is correct as a debit of $50 has been successfully done to
B). The debit and credit operation from account A to C has been done successfully. We can see that the transaction is done suc -
cessfully, and the value is also read correctly. Thus, the data is consistent. In case the value read by B and C is $300, which
means that data is inconsistent because when the debit operation executes, it will not be consistent.
3) Isolation
The term 'isolation' means separation. In DBMS, Isolation is the property of a database where no data should affect the other
one and may occur concurrently. In short, the operation on one database should begin when the operation on the first database
gets complete. It means if two operations are being performed on two different databases, they may not affect the value of one
another. In the case of transactions, when two or more transactions occur simultaneously, the consistency should remain main-
tained. Any changes that occur in any particular transaction will not be seen by other transactions until the change is committed.
Example: If two operations are concurrently running on two different accounts, then the value of both accounts should not get
affected. The value should remain persistent. As you can see in the below diagram, account A is making T1 and T2 transactions
to account B and C, but both are executing independently without affecting each other. It is known as Isolation.
4) Durability
Durability ensures the permanency of something. In DBMS, the term durability ensures that the data after the successful execu -
tion of the operation becomes permanent in the database. The durability of the data should be so perfect that even if the sys-
tem fails or leads to a crash, the database still survives. However, if the data gets lost, it becomes the responsibility of the recovery manager to ensure the durability of the database. For committing the values, the COMMIT command must be used every time we
make changes.
Therefore, the ACID property of DBMS plays a vital role in maintaining the consistency and availability of data in the database.
States of Transaction
Active state
o The active state is the first state of every transaction. In this state, the transaction is being executed.
o For example: Insertion or deletion or updating a record is done here. But all the records are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the data is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is executed in this state.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all the effects are now
Failed state
o If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the transaction will fail
to execute.
Aborted
o If any of the checks fail and the transaction has reached a failed state then the database recovery system will make sure that the
database is in its previous consistent state. If not then it will abort or roll back the transaction to bring the database into a consistent state.
o If the transaction fails in the middle of execution, then all the operations already executed are rolled back, returning the database to its state before the transaction started.
o After aborting the transaction, the database recovery module will select one of the two operations: re-start the transaction or kill the transaction.
Schedule
A series of operations from one transaction to another transaction is known as a schedule. It is used to preserve the order of the operations in each of the individual transactions.
1. Serial Schedule
The serial schedule is a type of schedule where one transaction is executed completely before starting another transaction. In
the serial schedule, when the first transaction completes its cycle, then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If there is no interleaving of operations, then there are two possible outcomes:
1. Execute all the operations of T1 followed by all the operations of T2.
2. Execute all the operations of T2 followed by all the operations of T1.
o In the given (a) figure, Schedule A shows the serial schedule where T1 followed by T2.
o In the given (b) figure, Schedule B shows the serial schedule where T2 followed by T1.
2. Non-serial Schedule
o If interleaving of operations is allowed, then there will be non-serial schedule.
o It contains many possible orders in which the system can execute the individual operations of the transactions.
o In the given figure (c) and (d), Schedule C and Schedule D are the non-serial schedules. It has interleaving of operations.
3. Serializable schedule
o The serializability of schedules is used to find non-serial schedules that allow the transactions to execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction have interleaving of their operations.
o A non-serial schedule will be serializable if its result is equal to the result of its transactions executed serially.
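A standard way to test whether a schedule is conflict-serializable is to build a precedence graph between transactions and check it for cycles. A minimal Python sketch, assuming the schedule is given as (transaction, operation, item) tuples:

# Sketch: conflict-serializability test via a precedence graph.
# Two operations conflict if they touch the same item, come from different
# transactions, and at least one of them is a write.
def is_conflict_serializable(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))                  # ti must come before tj

    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def has_cycle(node, visiting, done):             # depth-first cycle detection
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt, visiting, done)):
                return True
        visiting.remove(node)
        done.add(node)
        return False

    return not any(has_cycle(n, set(), set()) for n in graph)

# Interleaved schedule: T1 and T2 both read and then write X.
schedule = [("T1", "R", "X"), ("T2", "R", "X"), ("T1", "W", "X"), ("T2", "W", "X")]
print(is_conflict_serializable(schedule))   # False: T1->T2 and T2->T1 form a cycle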
Concurrency Control
Executing a single transaction at a time will increase the waiting time of the other transactions, which may result in delay in the overall execution. Hence, for increasing the overall throughput and efficiency of the system, several transactions are executed simultaneously in the database.
There are several problems that arise when numerous transactions are executed simultaneously in a random man -
ner. The database transaction consists of two major operations, "Read" and "Write". It is very important to manage
these operations in the concurrent execution of the transactions in order to maintain the consistency of the data.
The dirty read problem occurs when one transaction updates an item but, due to some unconditional events, that transaction fails; before the transaction performs a rollback, some other transaction reads the updated value. This creates an inconsistency in the database. The dirty read problem comes under the scenario of a Write-Read conflict between two transactions.
The dirty read problem can be illustrated with the below scenario between two transactions T1 and T2:
1. Transaction T1 modifies a database record without committing the changes.
2. T2 reads the uncommitted data changed by T1.
3. T1 performs a rollback.
4. T2 has already read the uncommitted data of T1, which is no longer valid, thus creating inconsistency in the database.
The lost update problem occurs when two or more transactions modify the same data, and the update made by one transaction is overwritten or lost because of the update made by another transaction, leaving inconsistent data in the database.
Concurrency control protocols are the set of rules which are maintained in order to solve the concurrency control
problems in the database. It ensures that the concurrent transactions can execute properly while maintaining the
database consistency. The concurrent execution of transactions is provided with atomicity, consistency, isolation, and durability through the concurrency control protocols.
Lock-Based Protocol: In a lock-based protocol, each transaction needs to acquire locks before it starts accessing or modifying the data items.
Shared Lock: A shared lock is also known as a read lock. It allows multiple transactions to read the same data item simultaneously. A transaction which is holding a shared lock can only read the data item; it cannot modify the data item.
Exclusive Lock: An exclusive lock is also known as a write lock. An exclusive lock allows a transaction to update a data item. Only one transaction can hold the exclusive lock on a data item at a time. While a transaction is holding an exclusive lock on a data item, no other transaction is allowed to acquire a shared or exclusive lock on that data item.
There are two kinds of lock-based protocols mostly used in databases:
Two-Phase Locking Protocol: Two-phase locking is a widely used technique which ensures strict ordering of lock acquisition and release. The two-phase locking protocol works in two phases.
Growing Phase: In this phase, the transaction starts acquiring locks before performing any modification on the data items. Once a transaction acquires a lock, that lock cannot be released until the transaction reaches the shrinking phase.
Shrinking Phase: In this phase, the transaction releases the acquired locks once it has performed all the modifications on the data items. Once the transaction starts releasing the locks, it cannot acquire any locks further.
Strict Two-Phase Locking Protocol: It is almost similar to the two-phase locking protocol; the only difference is that in two-phase locking a transaction can release its locks before it commits, but in strict two-phase locking the transactions are allowed to release their locks only when they commit.
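The growing/shrinking discipline can be sketched in a few lines of Python: a transaction refuses to acquire new locks once it has released any lock. The TwoPhaseTransaction class below is illustrative and ignores lock conflicts between transactions:

# Sketch: two-phase locking discipline. Once a transaction releases a lock
# (shrinking phase begins), it may not acquire any further locks.
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        # growing phase: acquiring locks is only allowed before any release
        if self.shrinking:
            raise RuntimeError(self.name + ": cannot lock " + item + " after releasing a lock")
        self.locks.add(item)

    def unlock(self, item):
        # the first release starts the shrinking phase
        self.shrinking = True
        self.locks.discard(item)

t1 = TwoPhaseTransaction("T1")
t1.lock("X")
t1.lock("Y")
t1.unlock("X")
try:
    t1.lock("Z")          # violates two-phase locking: growing after shrinking
except RuntimeError as err:
    print(err)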
Timestamp-Based Protocol: In a timestamp-based protocol, the order of execution of the conflicting operations of the transactions is decided based on the timestamp values of the transactions, therefore guaranteeing that the transactions take place in the correct order.
Advantages of Concurrency
In general, concurrency means that more than one transaction can work on a system. The advantages of a concurrent system are:
Waiting Time: If a process is in a ready state but still does not get the system to execute, that time is called waiting time. Concurrency leads to less waiting time.
Response Time: The time taken to get the response from the CPU for the first time is called response time. Concurrency leads to less response time.
Resource Utilization: The amount of resource utilization in which more than one transaction can use the same resources is called resource utilization. Multiple transactions can run in parallel in a system, so concurrency leads to more resource utilization.
Efficiency: The amount of output produced in comparison to the given input is called efficiency. Concurrency leads to more efficiency.
Disadvantages of Concurrency
Overhead: Implementing concurrency control requires additional overhead, such as acquiring and releasing locks on database objects. This overhead can lead to slower performance and increased resource consumption, particularly in systems with high levels of concurrency.
Deadlocks: Concurrency control can lead to deadlocks, where two or more transactions are waiting for each other to release resources, causing a circular dependency that can prevent any of the transactions from completing. Deadlocks can be difficult to detect and resolve, and can result in reduced throughput and increased latency.
Reduced concurrency: Concurrency control can limit the number of users or applications that can access
the database simultaneously. This can lead to reduced concurrency and slower performance in systems with high
levels of concurrency.
Complexity: Implementing concurrency control can be complex, particularly in distributed systems or in
systems with complex transactional logic. This complexity can lead to increased development and maintenance
costs.
Inconsistency: In some cases, concurrency control can lead to inconsistencies in the database. For ex-
ample, a transaction that is rolled back may leave the database in an inconsistent state, or a long-running transac -
tion may cause other transactions to wait for extended periods, leading to data staleness and reduced accuracy.
Database recovery techniques are used in database management systems (DBMS) to restore a database to a con -
sistent state after a failure or error has occurred. The main goal of recovery techniques is to ensure data integrity
and consistency and prevent data loss. There are mainly two types of recovery techniques used in DBMS:
Rollback/Undo Recovery Technique: The rollback/undo recovery technique is based on the principle of backing
out or undoing the effects of a transaction that has not completed successfully due to a system failure or error. This
technique is accomplished by undoing the changes made by the transaction using the log records stored in the
transaction log. The transaction log contains a record of all the transactions that have been performed on the data -
base. The system uses the log records to undo the changes made by the failed transaction and restore the database to the state it was in before the transaction started.
Commit/Redo Recovery Technique: The commit/redo recovery technique is based on the principle of reapplying
the changes made by a transaction that has been completed successfully to the database. This technique is accom -
plished by using the log records stored in the transaction log to redo the changes made by the transaction that was
in progress at the time of the failure or error. The system uses the log records to reapply the changes made by the
transaction and restore the database to its most recent consistent state.
In addition to these two techniques, there is also a third technique called checkpoint recovery. Checkpoint recovery
is a technique used to reduce the recovery time by periodically saving the state of the database in a checkpoint file.
In the event of a failure, the system can use the checkpoint file to restore the database to the most recent consist -
ent state before the failure occurred, rather than going through the entire log to recover the database.
Overall, recovery techniques are essential to ensure data consistency and availability in DBMS, and each technique
has its own advantages and limitations that must be considered in the design of a recovery system
Database systems, like any other computer system, are subject to failures but the data stored in them must be
available as and when required. When a database fails it must possess the facilities for fast recovery. It must also
have atomicity i.e. either transaction are completed successfully and committed (the effect is recorded permanently
in the database) or the transaction should have no effect on the database. There are both automatic and non-auto -
matic ways for both, backing up of data and recovery from any failure situations. The techniques used to recover
the lost data due to system crashes, transaction errors, viruses, catastrophic failure, incorrect commands execution,
etc. are database recovery techniques. So to prevent data loss recovery techniques based on deferred update and
immediate update or backing up data can be used. Recovery techniques are heavily dependent upon the existence
of a special file known as a system log. It contains information about the start and end of each transaction and any
updates which occur during the transaction. The log keeps track of all transaction operations that affect the values of database items. This information may be needed to recover from transaction failure. The following log entries are typically recorded:
start_transaction(T): This log entry records that transaction T starts the execution.
read_item(T, X): This log entry records that transaction T reads the value of database item X.
write_item(T, X, old_value, new_value): This log entry records that transaction T changes the value of database item X from old_value to new_value. The old value is sometimes known as the before-image of X, and the new value as the after-image.
commit(T): This log entry records that transaction T has completed all accesses to the database successfully and its effect can be committed (recorded permanently) to the database.
abort(T): This records that transaction T has been aborted.
checkpoint: A checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently in a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.
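A sketch of how these log records support recovery: every write_item record keeps the old value (the before-image), so an uncommitted transaction can be rolled back by replaying its records in reverse. The record names follow the list above; the rest of the Python code is illustrative:

# Sketch: a tiny in-memory log with undo based on old_value (the before-image).
database = {"X": 100}
log = []

def start_transaction(t):
    log.append(("start_transaction", t))

def write_item(t, item, new_value):
    log.append(("write_item", t, item, database[item], new_value))  # keep old_value
    database[item] = new_value

def commit(t):
    log.append(("commit", t))

def rollback(t):
    for record in reversed(log):                 # undo T's writes, newest first
        if record[0] == "write_item" and record[1] == t:
            _, _, item, old_value, _ = record
            database[item] = old_value
    log.append(("abort", t))

start_transaction("T1")
write_item("T1", "X", 40)
rollback("T1")                                   # T1 never committed, undo its effect
print(database["X"])                             # 100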
A transaction T reaches its commit point when all its operations that access the database have been executed suc-
cessfully i.e. the transaction has reached the point at which it will not abort (terminate without completing). Once
committed, the transaction is permanently recorded in the database. Commitment always involves writing a commit
entry to the log and writing the log to disk. At the time of a system crash, the log is searched backwards for all
transactions T that have written a start_transaction(T) entry into the log but have not written a commit(T) entry yet;
these transactions may have to be rolled back to undo their effect on the database during the recovery process.
Undoing – If a transaction crashes, then the recovery manager may undo transactions i.e. reverse
the operations of a transaction. This involves examining a transaction for the log entry write_item(T,
x, old_value, new_value) and setting the value of item x in the database to old_value. There are two major
techniques for recovery from non-catastrophic transaction failures: deferred updates and immediate
updates.
Deferred update – This technique does not physically update the database on disk until a transac-
tion has reached its commit point. Before reaching commit, all transaction updates are recorded in
the local transaction workspace. If a transaction fails before reaching its commit point, it will not have
changed the database in any way so UNDO is not needed. It may be necessary to REDO the effect of
the operations that are recorded in the local transaction workspace, because their effect may not yet
have been written in the database. Hence, a deferred update is also known as the No-undo/redo al-
gorithm
Immediate update – In the immediate update, the database may be updated by some operations of
a transaction before the transaction reaches its commit point. However, these operations are recor -
ded in a log on disk before they are applied to the database, making recovery still possible. If a trans -
action fails to reach its commit point, the effect of its operation must be undone i.e. the transaction
must be rolled back hence we require both undo and redo. This technique is known as undo/redo al-
gorithm.
Caching/Buffering – In this one or more disk pages that include data items to be updated are
cached into main memory buffers and then updated in memory before being written back to disk. A
collection of in-memory buffers called the DBMS cache is kept under the control of DBMS for holding
these buffers. A directory is used to keep track of which database items are in the buffer. A dirty bit is
associated with each buffer, which is 0 if the buffer is not modified else 1 if modified.
Shadow paging – It provides atomicity and durability. A directory with n entries is constructed, where the ith entry points to the ith database page on the disk. When a transaction begins executing, the current directory is copied into a shadow directory. When a page is to be modified, a shadow page is allocated in which the changes are made, and when it is ready to become durable, all pages that referred to the original page are updated to refer to the new replacement page.
Backward recovery (undo) – When a backup of the data is not available and previous modifications need to be undone, this technique can be helpful. With the backward recovery method, unused modifications are removed and the database is returned to its prior condition. All adjustments made during the previous transaction are reversed during backward recovery; in other words, it reprocesses valid transactions and undoes the erroneous database updates.
Forward recovery (redo) – When a database needs to be updated with all changes verified, this forward recovery technique is helpful. The failed transactions in this database are applied to the database to roll those modifications forward; in other words, the database is restored using preserved data and valid transactions.
Full database backup – In this, the full database, including the data and the database meta-information needed to restore the whole database (including full-text catalogs), is backed up in a predefined time series.
Differential backup – It stores only the data changes that have occurred since the last full database
backup. When some data has changed many times since last full database backup, a differential
backup stores the most recent version of the changed data. For this first, we need to restore a full
database backup.
Transaction log backup – In this, all events that have occurred in the database, like a record of every
single statement executed is backed up. It is the backup of transaction log entries and contains all transactions that
had happened to the database. Through this, the database can be recovered to a specific point in time. It is even
possible to perform a backup from a transaction log if the data files are destroyed and not even a single committed
transaction is lost.
In DBMS, hashing is a technique to directly search the location of desired data on the disk without using index structure.
Hashing method is used to index and retrieve items in a database as it is faster to search that specific item using the shorter
hashed key instead of using its original value. Data is stored in the form of data blocks whose address is generated by apply-
ing a hash function. The memory location where these records are stored is known as a data block or data bucket.
Here are the situations in the DBMS where you need to apply the hashing method:
For a huge database structure, it's tough to search all the index values through all its levels and then reach the destination data block to retrieve the desired data. Hashing lets you calculate the direct location of a data record on the disk without using an index structure.
It is also a helpful technique for implementing dictionaries.
Data bucket – Data buckets are memory locations where the records are stored. It is also known as Unit Of Stor-
age.
Key: A DBMS key is an attribute or set of attributes which helps you to identify a row (tuple) in a relation (table).
Linear probing – Linear probing is when the next available data block is used to enter the new record, instead of overwriting the older record.
Quadratic probing – It helps you to determine the new bucket address. It adds an interval between probes by adding the consecutive outputs of a quadratic polynomial to the starting value given by the original hash computation.
Hash index – It is an address of the data block. A hash function could be a simple mathematical function or even a complex mathematical function.
Double hashing – Double hashing is a method in which a second hash function is applied to the key when the first hash function results in a collision.
Bucket overflow: The condition of bucket overflow is called collision. This is a fatal state for any static hash function.
1. Static Hashing
2. Dynamic Hashing
Static Hashing
In the static hashing, the resultant data bucket address will always remain the same.
Therefore, if you generate an address for say Student_ID = 10 using hashing function mod(3), the resultant bucket address
will always be 1. So, you will not see any change in the bucket address.
Therefore, in this static hashing method, the number of data buckets in memory always remains constant.
Inserting a record: When a new record needs to be inserted into the table, you generate an address for the new record using its hash key. When the address is generated, the record is automatically stored in that location.
Searching: When you need to retrieve the record, the same hash function is used to retrieve the address of the bucket where the data is stored.
When a bucket is full (bucket overflow), it is handled by one of two techniques:
1. Open hashing
2. Close hashing
Open Hashing
In the open hashing method, instead of overwriting the older record, the next available data block is used to enter the new record. This method is also known as linear probing.
For example, A2 is a new record which you want to insert. The hash function generates address 222, but it is already occupied by some other value. That's why the system looks for the next data bucket, 501, and assigns A2 to it.
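A minimal Python sketch of static hashing with open hashing (linear probing): the bucket address is the key mod a fixed number of buckets, and on a collision the record goes to the next free bucket. The bucket count and record values are illustrative, not the ones in the example above:

# Sketch: static hashing with a fixed number of buckets and linear probing.
NUM_BUCKETS = 1000
buckets = [None] * NUM_BUCKETS

def bucket_address(key):
    return key % NUM_BUCKETS          # static hash function: the address never changes

def insert(key, record):
    addr = bucket_address(key)
    while buckets[addr] is not None:  # open hashing: probe the next data bucket
        addr = (addr + 1) % NUM_BUCKETS
    buckets[addr] = (key, record)
    return addr

print(insert(222, "A1"))   # stored at bucket 222
print(insert(1222, "A2"))  # also hashes to 222, so it is placed in bucket 223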
Close Hashing
In the close hashing method, when buckets are full, a new bucket is allocated for the same hash result and linked after the previous one.
Dynamic Hashing
Dynamic hashing offers a mechanism in which data buckets are added and removed dynamically and on demand. In this
hashing, the hash function helps you to create a large number of values.
Ordered Indexing vs. Hashing
Storing of address: In ordered indexing, addresses in the memory are sorted according to a key value called the primary key. In hashing, addresses are always generated using a hash function on the key value.
Use for: Ordered indexing is preferred for range retrieval of data, which means whenever data needs to be retrieved for a particular range, this method is an ideal option. Hashing is ideal when you want to retrieve a particular record based on the search key; however, it will only perform well when the hash function is on the search key.
Memory management: In ordered indexing, there will be many unused data blocks because of delete/update operations, and these data blocks can't be released for re-use. In static and dynamic hashing methods, memory is always managed; bucket overflow is also handled.
What is Collision?
A hash collision is a state where the resultant hashes from two or more data items in the data set wrongly map to the same place in the hash table.
There are two techniques which you can use to avoid a hash collision:
1. Rehashing: This method invokes a secondary hash function, which is applied repeatedly until an empty slot is found.
2. Chaining: In the chaining method, each bucket stores a list of the records whose keys hash to the same value.
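For comparison, a short Python sketch of collision handling by chaining, where each bucket simply holds a list of all the entries that hash to it (the put helper is illustrative):

# Sketch: collision handling by chaining. Each bucket holds a list of entries,
# so records whose keys hash to the same bucket coexist in that bucket.
NUM_BUCKETS = 8
table = [[] for _ in range(NUM_BUCKETS)]

def put(key, value):
    chain = table[key % NUM_BUCKETS]
    for i, (k, _) in enumerate(chain):
        if k == key:
            chain[i] = (key, value)       # update an existing key
            return
    chain.append((key, value))            # collisions just extend the chain

put(3, "A1")
put(11, "A2")                             # 3 and 11 both map to bucket 3, so they share a chain
print(table[3])                           # [(3, 'A1'), (11, 'A2')]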