
UNIT III:

Database Design: Dependencies and Normal Forms, Functional Dependencies, 1NF, 2NF, 3NF, and BCNF. Higher Normal Forms: 4NF and 5NF. Transaction Management: ACID properties, Serializability, Concurrency Control, Database recovery management. Data Storage and Indexes, Hashing Techniques.


Multivalued Dependency in DBMS

A multivalued dependency occurs when two attributes in a table are independent of each other, yet both depend on a third attribute.

Example:

Car_model Maf_year Color

H001 2017 Metallic

H001 2017 Green

H005 2018 Metallic

H005 2018 Blue

H010 2015 Metallic

H033 2012 Gray

In this example, maf_year and color are independent of each other but dependent on car_model. These two columns are therefore said to be multivalued dependent on car_model.

This dependence can be represented like this:

car_model -> maf_year

car_model -> colour

Trivial Functional Dependency in DBMS

A dependency is called trivial if the set of attributes on the right-hand side is included in the set of attributes on the left-hand side. So, X -> Y is a trivial functional dependency if Y is a subset of X. Let's understand with a Trivial Functional Dependency example.

For example:

Emp_id Emp_name

AS555 Harry

AS811 George

AS999 Kevin

Consider this table with two columns, Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is a subset of {Emp_id, Emp_name}.
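Whether a given functional dependency actually holds in a relation can be checked mechanically: X -> Y holds if no two tuples agree on X but differ on Y. Below is a minimal Python sketch of such a check (the relation is modelled as a list of dictionaries and check_fd is a hypothetical helper written for this illustration, not part of any DBMS API):

def check_fd(rows, lhs, rhs):
    # Return True if the functional dependency lhs -> rhs holds in rows.
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)   # value of the determinant X
        y = tuple(row[a] for a in rhs)   # value of the dependent Y
        if x in seen and seen[x] != y:
            return False                 # the same X maps to two different Y values
        seen[x] = y
    return True

employees = [
    {"Emp_id": "AS555", "Emp_name": "Harry"},
    {"Emp_id": "AS811", "Emp_name": "George"},
    {"Emp_id": "AS999", "Emp_name": "Kevin"},
]
print(check_fd(employees, ["Emp_id", "Emp_name"], ["Emp_id"]))  # True, and trivial: Emp_id is a subset of {Emp_id, Emp_name}
print(check_fd(employees, ["Emp_id"], ["Emp_name"]))            # True, and non-trivial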

Non Trivial Functional Dependency in DBMS

A functional dependency is known as a non-trivial dependency when A -> B holds true and B is not a subset of A. In a relation, if attribute B is not a subset of attribute A, then it is considered a non-trivial dependency.

Company CEO Age

Microsoft Satya Nadella 51

Google Sundar Pichai 46

Apple Tim Cook 57

Example:

{Company} -> {CEO} (if we know the Company, we know the CEO's name)

But CEO is not a subset of Company, and hence it is a non-trivial functional dependency.
Transitive Dependency in DBMS

A transitive dependency is a type of functional dependency which occurs when an attribute is determined indirectly through two other functional dependencies. Let's understand with the following Transitive Dependency example.

Example:

Company CEO Age

Microsoft Satya Nadella 51

Google Sundar Pichai 46

Alibaba Jack Ma 54

{Company} -> {CEO} (if we know the company, we know its CEO's name)

{CEO} -> {Age} (if we know the CEO, we know the Age)

Therefore, according to the rule of transitive dependency:

{Company} -> {Age} should hold, which makes sense because if we know the company name, we can find out its CEO's age.

Note: You need to remember that transitive dependency can only occur in a relation of three or more attributes.

What is Normalization?

Normalization is a method of organizing the data in the database which helps you to avoid data redundancy and insertion, update, and deletion anomalies. It is a process of analyzing the relation schemas based on their different functional dependencies and primary keys.

Normalization is inherent to relational database theory. It may have the effect of duplicating the same data (for example, key attributes) across tables, since it may result in the creation of additional tables.

Advantages of Functional Dependency

 Functional Dependency avoids data redundancy, so the same data does not repeat at multiple locations in the database.
 It helps you to maintain the quality of data in the database.
 It helps you to define the meanings and constraints of databases.
 It helps you to identify bad designs.
 It helps you to find the facts regarding the database design.

Normalization

A large database defined as a single relation may result in data duplication. This repetition of data may result in:

o Making relations very large.

o It isn't easy to maintain and update data as it would involve searching many records in relation.

o Wastage and poor utilization of disk space and resources.

o The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, and well-structured relations that satisfy desirable properties. Normalization is a process of decomposing the relations into relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.

o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable char-

acteristics like Insertion, Update, and Deletion Anomalies.

o Normalization divides the larger table into smaller tables and links them using relationships.

o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?

The main reason for normalizing the relations is removing these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that helps to guide you in creating a good database structure.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly refers to when one cannot insert a new tuple into a relation due to lack of data.

o Deletion Anomaly: The deletion anomaly refers to the situation where the deletion of data results in the unintended loss of some other important data.

o Update Anomaly: The update anomaly occurs when an update of a single data value requires multiple rows of data to be updated.
Types of Normal Forms:

Normalization works through a series of stages called normal forms. The normal forms apply to individual relations. A relation is said to be in a particular normal form if it satisfies the constraints of that form.

Following are the various types of Normal forms:

Normal Form Description

1NF A relation is in 1NF if it contains only atomic values.

2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

3NF A relation will be in 3NF if it is in 2NF and no transitive dependency exists.

BCNF A stronger definition of 3NF is known as Boyce Codd normal form.

4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.

5NF A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.

Advantages of Normalization
o Normalization helps to minimize data redundancy.

o Greater overall database organization.

o Data consistency within the database.

o Much more flexible database design.


o Enforces the concept of relational integrity.

Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.

o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.

o It is very time-consuming and difficult to normalize relations of a higher degree.

o Careless decomposition may lead to a bad database design, leading to serious problems.

First Normal Form (1NF)


o A relation is in 1NF if it contains only atomic values.

o It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attributes.

o First normal form disallows the multi-valued attribute, composite attribute, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, UP

9064738238

20 Harry 8574783832 Bihar

12 Sam 7390372389, Punjab

8589830302

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar


12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab
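The decomposition into 1NF above simply repeats the single-valued attributes once per phone number. A minimal Python sketch of this flattening (the data layout and the to_1nf helper are assumptions made for illustration, not a DBMS feature):

def to_1nf(rows):
    # Expand the multi-valued EMP_PHONE attribute into one tuple per phone number.
    result = []
    for row in rows:
        for phone in row["EMP_PHONE"]:
            result.append({"EMP_ID": row["EMP_ID"], "EMP_NAME": row["EMP_NAME"],
                           "EMP_PHONE": phone, "EMP_STATE": row["EMP_STATE"]})
    return result

employee = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": ["7272826385", "9064738238"], "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": ["8574783832"], "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam", "EMP_PHONE": ["7390372389", "8589830302"], "EMP_STATE": "Punjab"},
]
for tup in to_1nf(employee):
    print(tup)   # one row per (employee, phone) pair, as in the 1NF table above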

Second Normal Form (2NF)


o For 2NF, the relation must be in 1NF.

o In the second normal form, all non-key attributes must be fully functionally dependent on the primary key.

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach

more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key.

That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38
TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer

Third Normal Form (3NF)


o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.

o 3NF is used to reduce data duplication. It is also used to achieve data integrity.

o If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.

A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y:

1. X is a super key.

2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal


Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new <EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal
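The 3NF decomposition above is just a projection of the original relation onto two attribute sets. A rough Python sketch of this idea (the project helper and the sample data are illustrative assumptions):

def project(rows, attrs):
    # Project a relation (list of dicts) onto the given attributes, removing duplicate tuples.
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append({a: row[a] for a in attrs})
    return result

employee_detail = [
    {"EMP_ID": 222, "EMP_NAME": "Harry", "EMP_ZIP": "201010", "EMP_STATE": "UP", "EMP_CITY": "Noida"},
    {"EMP_ID": 333, "EMP_NAME": "Stephan", "EMP_ZIP": "02228", "EMP_STATE": "US", "EMP_CITY": "Boston"},
]
employee = project(employee_detail, ["EMP_ID", "EMP_NAME", "EMP_ZIP"])
employee_zip = project(employee_detail, ["EMP_ZIP", "EMP_STATE", "EMP_CITY"])
print(employee)       # EMPLOYEE table
print(employee_zip)   # EMPLOYEE_ZIP table

The decomposition is lossless because the shared attribute EMP_ZIP is a key of the EMPLOYEE_ZIP table, so joining the two projections on EMP_ZIP reproduces the original relation.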
Boyce Codd normal form (BCNF)
o BCNF is the advanced version of 3NF. It is stricter than 3NF.

o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.

o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY

2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY

2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID

For the second table: EMP_DEPT

For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left side of both functional dependencies is a key.

Fourth normal form (4NF)


o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.

o For a dependency A → B, if for a single value of A multiple values of B exist, then the dependency is a multi-valued dependency.

Example

STUDENT
STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 contains two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing
74 Cricket

59 Hockey

Fifth normal form (5NF)


o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.

o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.

o 5NF is also known as Project-Join Normal Form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1 but does not take the Math class for Semester 2. In this case, a combination of all these fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it, so we leave Lecturer and Subject as NULL. But all three columns together act as a primary key, so we cannot leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math
Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen

Transaction
o A transaction is a set of logically related operations. It contains a group of tasks.

o A transaction is an action or series of actions performed by a single user to access the contents of the database.

Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account. This small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)

2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800

4. X.balance = New_Balance

5. Close_Account(X)

Y's Account

1. Open_Account(Y)

2. Old_Balance = Y.balance

3. New_Balance = Old_Balance + 800

4. Y.balance = New_Balance

5. Close_Account(Y)

Operations of Transaction:

Following are the main operations of transaction:

Read(X): Read operation is used to read the value of X from the database and stores it in a buffer in main memory.

Write(X): Write operation is used to write the value back to the database from the buffer.

Let's take an example to debit transaction from an account which consists of following operations:

1. R(X);

2. X = X - 500;

3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.

o The second operation will decrease the value of X by 500. So buffer will contain 3500.

o The third operation will write the buffer's value to the database. So X's final value will be 3500.

But it may be possible that, because of a hardware, software, or power failure, the transaction fails before finishing all the operations in the set.

For example: If in the above transaction, the debit transaction fails after executing operation 2 then X's value will remain 4000

in the database which is not acceptable by the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
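As a rough illustration of commit and rollback (a simplified in-memory sketch with assumed balances, not how a real DBMS implements these operations), the debit/credit transfer above could be written as:

accounts = {"X": 4000, "Y": 1000}        # assumed starting balances

def transfer(src, dst, amount):
    old = {src: accounts[src], dst: accounts[dst]}   # remember old values so the work can be undone
    try:
        accounts[src] -= amount          # debit
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount          # credit
        print("committed:", accounts)    # commit: the work is kept permanently
    except Exception:
        accounts.update(old)             # rollback: undo the work, the transaction has no effect
        print("rolled back:", accounts)

transfer("X", "Y", 500)      # both operations succeed -> commit
transfer("X", "Y", 99999)    # the debit would fail -> rollback, balances unchanged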


ACID Properties in DBMS

A DBMS must keep data consistent and integrated when any changes are made to it, because if the integrity of the data is affected, the whole data set can become disturbed and corrupted. Therefore, to maintain the integrity of the data, four properties are described in the database management system, which are known as the ACID properties. The ACID properties are meant for the transaction that goes through a different group of tasks, and there we come to see the role of the ACID properties.

In this section, we will learn and understand the ACID properties. We will learn what these properties stand for and what each property is used for. We will also understand the ACID properties with the help of some examples.

ACID Properties

The term ACID stands for Atomicity, Consistency, Isolation, and Durability:

1) Atomicity

The term atomicity defines that the data remains atomic. It means that if any operation is performed on the data, it should either be executed completely or not be executed at all. The operation should not break in between or execute partially. When executing operations of a transaction, the operation should be completely executed, not partially.

Example: Remo has account A with $30 in it, from which he wishes to send $10 to Sheero's account, which is B. Account B already holds $100. When $10 is transferred to account B, the sum becomes $110. Now, two operations take place: the $10 that Remo wants to transfer is debited from his account A, and the same amount is credited to account B, i.e., into Sheero's account. Now suppose the first operation, the debit, executes successfully, but the credit operation fails. Then Remo's account A has a value of $20, while Sheero's account remains at $100 as it was previously.


In the above diagram, it can be seen that after crediting $10, the amount is still $100 in account B. So, it is not an atomic transaction.

The below image shows that both debit and credit operations are done successfully. Thus the transaction is atomic.

Thus, when the amount loses atomicity, it becomes a huge issue in bank systems, and so atomicity is the main focus in bank systems.

2) Consistency

The word consistency means that the value should remain preserved always. In DBMS, the integrity of the data should be

maintained, which means if a change in the database is made, it should remain preserved always. In the case of transactions,
the integrity of the data is very essential so that the database remains consistent before and after the transaction. The data

should always be correct.

Example:

In the above figure, there are three accounts, A, B, and C, where A is making a transaction T one by one to both B and C. There are two operations that take place, i.e., Debit and Credit. Account A first debits $50 to account B, and the amount in account A is read as $300 by B before the transaction. After the successful transaction T, the available amount in B becomes $150. Now, A debits $20 to account C, and at that time the value read by C is $250 (which is correct, as a debit of $50 has already been successfully made to B). The debit and credit operations from account A to C have been done successfully. We can see that the transaction is done successfully and the value is also read correctly. Thus, the data is consistent. If the value read by B and C had been $300, the data would be inconsistent, because once the debit operation executes, that value is no longer correct.

3) Isolation

The term 'isolation' means separation. In DBMS, isolation is the property that data being used by one transaction should not be affected by another transaction running concurrently. If two operations are being performed on two different data items, they should not affect each other's values, and an operation on one item should appear to begin only after the other has completed. In the case of transactions, when two or more transactions occur simultaneously, consistency should remain maintained. Any changes that occur in any particular transaction will not be seen by other transactions until the change is committed in the memory.

Example: If two operations are concurrently running on two different accounts, then the value of both accounts should not get

affected. The value should remain persistent. As you can see in the below diagram, account A is making T1 and T2 transactions

to account B and C, but both are executing independently without affecting each other. It is known as Isolation.
4) Durability

Durability ensures permanency. In DBMS, the term durability ensures that once an operation executes successfully, its data becomes permanent in the database. The durability of the data should be such that even if the system fails or crashes, the database still survives. However, if data is lost, it becomes the responsibility of the recovery manager to ensure the durability of the database. For committing the values, the COMMIT command must be used every time we make changes.

Therefore, the ACID property of DBMS plays a vital role in maintaining the consistency and availability of data in the database.

States of Transaction

In a database, the transaction can be in one of the following states -


Active state

o The active state is the first state of every transaction. In this state, the transaction is being executed.

o For example: Insertion or deletion or updating a record is done here. But all the records are still not saved to the database.

Partially committed

o In the partially committed state, a transaction executes its final operation, but the data is still not saved to the database.

o In the total mark calculation example, a final display of the total marks step is executed in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all the effects are now

permanently saved on the database system.

Failed state

o If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed state.

o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the transaction will fail

to execute.

Aborted

o If any of the checks fail and the transaction has reached a failed state then the database recovery system will make sure that the

database is in its previous consistent state. If not then it will abort or roll back the transaction to bring the database into a consistent state.

o If the transaction fails in the middle of its execution, then all the operations it has already executed are rolled back to bring the database to its consistent state.

o After aborting the transaction, the database recovery module will select one of the two operations:

1. Re-start the transaction

2. Kill the transaction

Schedule

A series of operations from one transaction to another is known as a schedule. It is used to preserve the order of the operations in each of the individual transactions.


1. Serial Schedule

The serial schedule is a type of schedule where one transaction is executed completely before starting another transaction. In

the serial schedule, when the first transaction completes its cycle, then the next transaction is executed.

For example: Suppose there are two transactions T1 and T2 which have some operations. If it has no interleaving of opera -

tions, then there are the following two possible outcomes:

1. Execute all the operations of T1 followed by all the operations of T2.

2. Execute all the operations of T2 followed by all the operations of T1.

o In the given (a) figure, Schedule A shows the serial schedule where T1 followed by T2.

o In the given (b) figure, Schedule B shows the serial schedule where T2 followed by T1.

2. Non-serial Schedule
o If interleaving of operations is allowed, then there will be non-serial schedule.

o It contains many possible orders in which the system can execute the individual operations of the transactions.

o In the given figure (c) and (d), Schedule C and Schedule D are the non-serial schedules. It has interleaving of operations.

3. Serializable schedule
o The serializability of schedules is used to find non-serial schedules that allow transactions to execute concurrently without interfering with one another.

o It identifies which schedules are correct when the executions of the transactions have interleaving of their operations.

o A non-serial schedule will be serializable if its result is equal to the result of its transactions executed serially.
Here, Schedule A and Schedule B are serial schedules, while Schedule C and Schedule D are non-serial schedules.

Concurrency Control in DBMS

 Executing a single transaction at a time will increase the waiting time of the other transactions, which may result in delay in the overall execution. Hence, for increasing the overall throughput and efficiency of the system, several transactions are executed concurrently.

 Concurrency control is a very important concept of DBMS which ensures the simultaneous execution or manipulation of data by several processes or users without resulting in data inconsistency.


 Concurrency control provides a procedure that is able to control concurrent execution of the operations in

the database.

Concurrency Control Problems

There are several problems that arise when numerous transactions are executed simultaneously in a random manner. A database transaction consists of two major operations, "Read" and "Write". It is very important to manage these operations in the concurrent execution of the transactions in order to maintain the consistency of the data.

Dirty Read Problem (Write-Read conflict)

The dirty read problem occurs when one transaction updates an item but then fails due to some unexpected event, and before the transaction performs a rollback, some other transaction reads the updated value. This creates an inconsistency in the database. The dirty read problem comes under the scenario of a Write-Read conflict between the transactions in the database.

The dirty read problem can be illustrated with the below scenario between two transactions T1 and T2:

1. Transaction T1 modifies a database record without committing the changes.
2. T2 reads the uncommitted data changed by T1.
3. T1 performs a rollback.
4. T2 has already read the uncommitted data of T1, which is no longer valid, thus creating an inconsistency in the database.

Lost Update Problem

Lost update problem occurs when two or more transactions modify the same data, resulting in the update being

overwritten or lost by another transaction. The lost update problem can be illustrated with the below scenario

between two transactions T1 and T2.

1. T1 reads the value of an item from the database.


2. T2 starts and reads the same database item.
3. T1 updates the value of that data and performs a commit.
4. T2 updates the same data item based on its initial read and performs commit.
5. This results in T1's modification being overwritten and lost by T2's write, which causes the lost update problem in the database.

Concurrency Control Protocols

Concurrency control protocols are the set of rules which are maintained in order to solve the concurrency control problems in the database. They ensure that concurrent transactions can execute properly while maintaining database consistency. The concurrent execution of transactions is provided with atomicity, consistency, isolation, durability, and serializability via the concurrency control protocols.

 Lock-based concurrency control protocol

 Timestamp-based concurrency control protocol

Lock-based Protocol

In a lock-based protocol, each transaction needs to acquire locks before it starts accessing or modifying the data items. There are two types of locks used in databases.


 Shared Lock : Shared lock is also known as read lock which allows multiple transactions to read the data

simultaneously. The transaction which is holding a shared lock can only read the data item but it can not modify the

data item.
 Exclusive Lock: An exclusive lock is also known as the write lock. An exclusive lock allows a transaction to update a data item. Only one transaction can hold the exclusive lock on a data item at a time. While a transaction is holding an exclusive lock on a data item, no other transaction is allowed to acquire a shared/exclusive lock on the same data item.

There are two kind of lock based protocol mostly used in database:

 Two Phase Locking Protocol: Two-phase locking is a widely used technique which ensures strict ordering of lock acquisition and release. The two-phase locking protocol works in two phases (a minimal sketch of such a lock table follows this list).
 Growing Phase: In this phase, the transaction starts acquiring locks before performing any modification on the data items. Once a transaction acquires a lock, that lock cannot be released until the transaction reaches the end of its execution.

 Shrinking Phase: In this phase, the transaction releases all the acquired locks once it has performed all the modifications on the data items. Once the transaction starts releasing locks, it cannot acquire any more locks.
 Strict Two Phase Locking Protocol: It is almost similar to the two-phase locking protocol; the only difference is that in two-phase locking the transaction can release its locks before it commits, but in strict two-phase locking the transactions are only allowed to release their locks when they commit.
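The following is a minimal, illustrative Python sketch of such a lock table supporting shared (S) and exclusive (X) locks (the LockManager class and its method names are assumptions made for this example, not an actual DBMS API). A transaction following two-phase locking would call acquire() only during its growing phase and release_all() once, at the start of its shrinking phase:

class LockManager:
    def __init__(self):
        # item -> (mode, set of holding transactions); mode is "S" (shared) or "X" (exclusive)
        self.locks = {}

    def acquire(self, txn, item, mode):
        # Return True if the lock is granted, False if the caller must wait.
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)              # several readers may share an S lock
            return True
        if holders == {txn}:              # lock upgrade by the same transaction
            self.locks[item] = (mode, {txn})
            return True
        return False                      # conflicting lock held by another transaction

    def release_all(self, txn):
        # Shrinking phase: release every lock held by txn.
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "A", "S"))   # True: T1 reads item A
print(lm.acquire("T2", "A", "S"))   # True: shared locks are compatible
print(lm.acquire("T2", "A", "X"))   # False: T1 still holds a shared lock on A
lm.release_all("T1")
print(lm.acquire("T2", "A", "X"))   # True: now T2 may write A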

Timestamp based Protocol


 In this protocol each transaction has a timestamp attached to it. The timestamp is the time at which the transaction enters the system.

 The conflicting pairs of operations can be resolved by the timestamp ordering protocol through the use of the timestamp values of the transactions, thereby guaranteeing that the transactions take place in the correct order.

Advantages of Concurrency

In general, concurrency means, that more than one transaction can work on a system. The advantages of a concur -

rent system are:

 Waiting Time: The time a process spends in the ready state before it gets the system to execute it is called waiting time. Concurrency leads to less waiting time.
 Response Time: The time taken to get the first response from the CPU is called response time. Concurrency leads to less response time.

 Resource Utilization: The amount of resource utilization in a particular system is called resource utilization. Multiple transactions can run in parallel in a system, so concurrency leads to more resource utilization.
 Efficiency: The amount of output produced in comparison to the given input is called efficiency. Concurrency leads to more efficiency.

Disadvantages of Concurrency

 Overhead: Implementing concurrency control requires additional overhead, such as acquiring and releasing locks on database objects. This overhead can lead to slower performance and increased resource consumption, particularly in systems with high levels of concurrency.

 Deadlocks: Deadlocks can occur when two or more transactions are waiting for each other to release resources, causing a circular dependency that can prevent any of the transactions from completing. Deadlocks can be difficult to detect and resolve, and can result in reduced throughput and increased latency.
 Reduced concurrency: Concurrency control can limit the number of users or applications that can access the database simultaneously. This can lead to reduced concurrency and slower performance in systems with high levels of concurrency.
 Complexity: Implementing concurrency control can be complex, particularly in distributed systems or in systems with complex transactional logic. This complexity can lead to increased development and maintenance costs.
 Inconsistency: In some cases, concurrency control can lead to inconsistencies in the database. For example, a transaction that is rolled back may leave the database in an inconsistent state, or a long-running transaction may cause other transactions to wait for extended periods, leading to data staleness and reduced accuracy.

Database Recovery Techniques in DBMS

Database recovery techniques are used in database management systems (DBMS) to restore a database to a consistent state after a failure or error has occurred. The main goal of recovery techniques is to ensure data integrity and consistency and prevent data loss. There are mainly two types of recovery techniques used in DBMS:

Rollback/Undo Recovery Technique: The rollback/undo recovery technique is based on the principle of backing out or undoing the effects of a transaction that has not completed successfully due to a system failure or error. This technique is accomplished by undoing the changes made by the transaction using the log records stored in the transaction log. The transaction log contains a record of all the transactions that have been performed on the database. The system uses the log records to undo the changes made by the failed transaction and restore the database to its previous state.

Commit/Redo Recovery Technique: The commit/redo recovery technique is based on the principle of reapplying the changes made by a transaction that has completed successfully to the database. This technique is accomplished by using the log records stored in the transaction log to redo the changes made by the transaction that was in progress at the time of the failure or error. The system uses the log records to reapply the changes made by the transaction and restore the database to its most recent consistent state.

In addition to these two techniques, there is also a third technique called checkpoint recovery. Checkpoint recovery is a technique used to reduce the recovery time by periodically saving the state of the database in a checkpoint file. In the event of a failure, the system can use the checkpoint file to restore the database to the most recent consistent state before the failure occurred, rather than going through the entire log to recover the database.

Overall, recovery techniques are essential to ensure data consistency and availability in DBMS, and each technique has its own advantages and limitations that must be considered in the design of a recovery system.

Database systems, like any other computer system, are subject to failures, but the data stored in them must be available as and when required. When a database fails, it must possess the facilities for fast recovery. It must also have atomicity, i.e., either the transaction is completed successfully and committed (the effect is recorded permanently in the database) or the transaction has no effect on the database. There are both automatic and non-automatic ways of backing up data and recovering from failure situations. The techniques used to recover data lost due to system crashes, transaction errors, viruses, catastrophic failure, incorrect command execution, etc. are database recovery techniques. So to prevent data loss, recovery techniques based on deferred update and immediate update, or backing up data, can be used. Recovery techniques are heavily dependent upon the existence of a special file known as a system log. It contains information about the start and end of each transaction and any updates which occur during the transaction. The log keeps track of all transaction operations that affect the values of database items. This information is needed to recover from transaction failure.


 The log is kept on disk.
 start_transaction(T): This log entry records that transaction T starts the execution.
 read_item(T, X): This log entry records that transaction T reads the value of database item X.
 write_item(T, X, old_value, new_value): This log entry records that transaction T changes the value of database item X from old_value to new_value. The old value is sometimes known as the before-image of X, and the new value is known as the after-image of X.

 commit(T): This log entry records that transaction T has completed all accesses to the database successfully and its effect can be committed (recorded permanently) to the database.
 abort(T): This records that transaction T has been aborted.
 checkpoint: A checkpoint is a mechanism whereby all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.

A transaction T reaches its commit point when all its operations that access the database have been executed successfully, i.e., the transaction has reached the point at which it will not abort (terminate without completing). Once committed, the transaction is permanently recorded in the database. Commitment always involves writing a commit entry to the log and writing the log to disk. At the time of a system crash, the log is searched backwards for all transactions T that have written a start_transaction(T) entry into the log but have not yet written a commit(T) entry; these transactions may have to be rolled back to undo their effect on the database during the recovery process.
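As a rough illustration of how undoing works from such a log (a simplified sketch with an assumed in-memory log format, not the actual recovery algorithm of any particular DBMS):

db = {"X": 4000, "Y": 1000}

# Each entry mirrors the log record types described above.
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 4000, 3500),   # (type, txn, item, old_value, new_value)
    ("write_item", "T1", "Y", 1000, 1500),
    # no commit(T1) entry: the system crashed before T1 committed
]
db["X"], db["Y"] = 3500, 1500                # the uncommitted changes had reached the database

def undo_uncommitted(db, log):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Scan the log backwards and restore the old value of every write by an uncommitted transaction.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] not in committed:
            _, txn, item, old_value, new_value = rec
            db[item] = old_value             # new_value would only be needed for redo
    return db

print(undo_uncommitted(db, log))   # {'X': 4000, 'Y': 1000}: T1's effects have been undone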
 Undoing – If a transaction crashes, then the recovery manager may undo the transaction, i.e., reverse its operations. This involves examining the log for every write_item(T, x, old_value, new_value) entry of the transaction and setting the value of item x in the database back to old_value. There are two major techniques for recovery from non-catastrophic transaction failures: deferred updates and immediate updates.
 Deferred update – This technique does not physically update the database on disk until a transaction has reached its commit point. Before reaching commit, all transaction updates are recorded in the local transaction workspace. If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed. It may be necessary to REDO the effect of the operations that are recorded in the local transaction workspace, because their effect may not yet have been written to the database. Hence, a deferred update is also known as the No-undo/redo algorithm.
 Immediate update – In the immediate update, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations are recorded in a log on disk before they are applied to the database, making recovery still possible. If a transaction fails to reach its commit point, the effect of its operations must be undone, i.e., the transaction must be rolled back; hence we require both undo and redo. This technique is known as the undo/redo algorithm.
 Caching/Buffering – In this, one or more disk pages that include data items to be updated are cached into main memory buffers and then updated in memory before being written back to disk. A collection of in-memory buffers called the DBMS cache is kept under the control of the DBMS for holding these buffers. A directory is used to keep track of which database items are in the buffers. A dirty bit is associated with each buffer, which is 0 if the buffer is not modified and 1 if it is modified.
 Shadow paging – It provides atomicity and durability. A directory with n entries is constructed, where the ith entry points to the ith database page on disk. When a transaction begins executing, the current directory is copied into a shadow directory. When a page is to be modified, a shadow page is allocated in which the changes are made, and when it is ready to become durable, all pages that refer to the original are updated to refer to the new replacement page.


 Backward Recovery – The terms "Rollback" and "UNDO" can also refer to backward recovery. When a backup of the data is not available and previous modifications need to be undone, this technique can be helpful. With the backward recovery method, unwanted modifications are removed and the database is returned to its prior condition. All adjustments made during the previous transaction are reversed during backward recovery. In other words, it reprocesses valid transactions and undoes the erroneous database updates.


 Forward Recovery – "Roll forward" and "REDO" refer to forward recovery. When a database needs to be updated with all verified changes, this forward recovery technique is helpful. The modifications of committed transactions that were lost are reapplied to the database to roll those changes forward. In other words, the database is restored using preserved data and valid transactions counted from their past saves.

Some of the backup techniques are as follows :

 Full database backup – In this, the full database, including the data and the database meta information needed to restore the whole database (including full-text catalogs), is backed up on a predefined schedule.
 Differential backup – It stores only the data changes that have occurred since the last full database backup. When some data has changed many times since the last full database backup, a differential backup stores the most recent version of the changed data. To restore from it, we first need to restore the full database backup.
 Transaction log backup – In this, all events that have occurred in the database, like a record of every single statement executed, are backed up. It is the backup of the transaction log entries and contains all transactions that have happened to the database. Through this, the database can be recovered to a specific point in time. It is even possible to perform a backup from a transaction log if the data files are destroyed, and not even a single committed transaction is lost.

What is Hashing in DBMS?

In DBMS, hashing is a technique to directly search the location of desired data on the disk without using an index structure. The hashing method is used to index and retrieve items in a database, as it is faster to search a specific item using the shorter hashed key instead of its original value. Data is stored in the form of data blocks whose address is generated by applying a hash function; the memory location where these records are stored is known as a data block or data bucket.

Why do we need Hashing?

Here are the situations in a DBMS where you need to apply the hashing method:

 For a huge database structure, it is tough to search all the index values through all the levels and then reach the destination data block to retrieve the desired data.

 The hashing method is used to index and retrieve items in a database, as it is faster to search a specific item using the shorter hashed key instead of its original value.

 Hashing is an ideal method to calculate the direct location of a data record on the disk without using an index structure.
 It is also a helpful technique for implementing dictionaries.

Important Terminologies in Hashing

Here are the important terminologies which are used in hashing:

 Data bucket – Data buckets are the memory locations where the records are stored. They are also known as units of storage.
 Key: A DBMS key is an attribute or set of attributes which helps you to identify a row (tuple) in a relation (table). This allows you to find the relationship between two tables.

 Hash function: A hash function is a mapping function which maps the set of search keys to the addresses where the actual records are placed.

 Linear Probing – Linear probing uses a fixed interval between probes. In this method, the next available data block is used to enter the new record, instead of overwriting the older record.
 Quadratic probing – It helps you to determine the new bucket address by adding an interval between probes, obtained from the consecutive outputs of a quadratic polynomial added to the starting value given by the original computation.
 Hash index – It is the address of the data block. A hash function could be anything from a simple mathematical function to a complex mathematical function.

 Double Hashing – Double hashing is a computer programming method used in hash tables to resolve hash collisions.
 Bucket Overflow: The condition of bucket overflow is called collision. This is a fatal state for any static hash function.
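To make the probing terms concrete, here is a small illustrative Python sketch of the three probe sequences (the table size of 10 and the secondary hash 7 - (key % 7) are assumptions chosen for this example, not fixed DBMS values):

TABLE_SIZE = 10

def linear_probe(h, i):
    return (h + i) % TABLE_SIZE            # fixed interval of 1 between probes

def quadratic_probe(h, i):
    return (h + i * i) % TABLE_SIZE        # interval grows as a quadratic polynomial

def double_hash_probe(h, i, key):
    h2 = 7 - (key % 7)                     # assumed secondary hash function
    return (h + i * h2) % TABLE_SIZE

key = 49
h = key % TABLE_SIZE
print([linear_probe(h, i) for i in range(4)])            # [9, 0, 1, 2]
print([quadratic_probe(h, i) for i in range(4)])         # [9, 0, 3, 8]
print([double_hash_probe(h, i, key) for i in range(4)])  # [9, 6, 3, 0]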

Types of Hashing Techniques

There are mainly two types of SQL hashing methods/techniques:

1. Static Hashing
2. Dynamic Hashing

Static Hashing

In static hashing, the resultant data bucket address always remains the same.

Therefore, if you generate an address for, say, Student_ID = 10 using the hash function mod(3), the resultant bucket address will always be 1. So, you will not see any change in the bucket address.

Therefore, in this static hashing method, the number of data buckets in memory always remains constant.

Static Hash Functions

 Inserting a record: When a new record needs to be inserted into the table, you can generate an address for the new record using its hash key. When the address is generated, the record is automatically stored in that location.
 Searching: When you need to retrieve the record, the same hash function is used to retrieve the address of the bucket where the data is stored.

 Deleting a record: Using the hash function, you can first fetch the record which you want to delete. Then you can remove the record at that address in memory.
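A minimal illustrative sketch of these three operations in Python, with a fixed number of buckets (the bucket count of 3 mirrors the mod(3) example above; collisions are ignored here, since they are handled by the open and close hashing variants described next):

NUM_BUCKETS = 3                       # fixed, as in static hashing
buckets = [dict() for _ in range(NUM_BUCKETS)]

def address(key):
    return key % NUM_BUCKETS          # the static hash function, e.g. mod(3)

def insert(key, record):
    buckets[address(key)][key] = record

def search(key):
    return buckets[address(key)].get(key)

def delete(key):
    buckets[address(key)].pop(key, None)

insert(10, "student #10")             # Student_ID = 10 always maps to bucket 1
print(address(10), search(10))        # 1 student #10
delete(10)
print(search(10))                     # None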

Static hashing is further divided into

1. Open hashing
2. Close hashing.

Open Hashing

In the open hashing method, instead of overwriting the older one, the next available data block is used to enter the new record. This method is also known as linear probing.

For example, A2 is a new record which you want to insert. The hash function generates the address 222, but it is already occupied by some other value. That's why the system looks for the next data bucket, 501, and assigns A2 to it.

How Open Hash Works
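A rough Python sketch of open hashing (linear probing); the table size and keys here are illustrative assumptions:

TABLE_SIZE = 10
table = [None] * TABLE_SIZE           # one slot per data bucket

def insert_open(key, record):
    i = key % TABLE_SIZE              # home bucket given by the hash function
    for step in range(TABLE_SIZE):
        slot = (i + step) % TABLE_SIZE
        if table[slot] is None:       # next available data block
            table[slot] = (key, record)
            return slot
    raise OverflowError("hash table is full")

print(insert_open(22, "A1"))          # stored in its home bucket 2
print(insert_open(32, "A2"))          # bucket 2 is occupied, so A2 goes to the next bucket, 3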


Close Hashing

In the close hashing method, when a bucket is full, a new bucket is allocated for the same hash result and is linked after the previous one.

Dynamic Hashing

Dynamic hashing offers a mechanism in which data buckets are added and removed dynamically and on demand. In this

hashing, the hash function helps you to create a large number of values.

Difference between Ordered Indexing and Hashing

Below are the key differences between Indexing and Hashing

Storing of address: In ordered indexing, addresses in the memory are sorted according to a key value called the primary key. In hashing, addresses are always generated using a hash function on the key value.

Performance: Ordered indexing performance can decrease when the data in the file increases, because the data is stored in sorted form and every insert/delete/update operation has to maintain that order, which decreases performance. Hashing performance is best when there is constant addition and deletion of data; however, when the database is huge, hash file organization and its maintenance become costlier.

Use for: Ordered indexing is preferred for range retrieval of data, i.e., whenever data has to be retrieved for a particular range, this method is an ideal option. Hashing is an ideal method when you want to retrieve a particular record based on the search key; however, it will only perform well when the hash function is based on the search key.

Memory management: With ordered indexing there will be many unused data blocks because of delete/update operations, and these data blocks cannot be released for re-use, so regular maintenance of the memory is required. In static and dynamic hashing methods, memory is always managed, and bucket overflow is also handled, for example by extending static hashing.

What is Collision?

A hash collision is a state in which the resultant hashes of two or more data items in the data set map to the same place in the hash table.


How to deal with Hashing Collision?

There are two techniques which you can use to handle a hash collision:

1. Rehashing: This method invokes a secondary hash function, which is applied repeatedly until an empty slot is found where the record can be placed.

2. Chaining: The chaining method builds a linked list of items whose keys hash to the same value. This method requires an extra link field in each table position.
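A minimal illustrative Python sketch of chaining (each bucket holds a list acting as its linked chain; the table size is an assumption made for the example):

TABLE_SIZE = 5
chains = [[] for _ in range(TABLE_SIZE)]   # each bucket is a chain of (key, record) pairs

def insert_chained(key, record):
    chains[key % TABLE_SIZE].append((key, record))

def search_chained(key):
    for k, record in chains[key % TABLE_SIZE]:   # walk the chain for this bucket
        if k == key:
            return record
    return None

insert_chained(12, "A1")
insert_chained(22, "A2")                   # 12 and 22 collide in bucket 2 and share a chain
print(search_chained(22))                  # A2
print(chains[2])                           # [(12, 'A1'), (22, 'A2')]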
