DBMS Notes 4 - TutorialsDuniya.COM
DataBase Management System Notes
Contributor: Abhishek Sharma [Founder at TutorialsDuniya.com]
UNIT-4
SCHEMA REFINEMENT and NORMAL FORMS
Syllabus: Problems caused by redundancy, Decompositions, problems related to decomposition, reasoning about FDs, FIRST, SECOND, THIRD Normal forms, BCNF, Lossless join Decomposition, Dependency preserving Decomposition, Schema refinement in Database Design, Multivalued Dependencies, FOURTH Normal Form.
Schema refinement is an approach based on decompositions. Redundant storage of information (i.e., duplication of data) is the main cause of problems; this redundancy is eliminated by decomposing the relation.
Redundancy means storing the same information repeatedly. Storing the same data in more than one place within a database can lead to several problems, such as:
1) Redundant Storage: Some tuples or pieces of information are stored repeatedly.
2) Update Anomalies: If updating one row (or record) requires the DBMS to update more than one similar row, an update anomaly can occur.
For example, if we update the department name for those who are getting a salary of 40000, more than one row of the employee table must be updated; otherwise an update anomaly results.
3) Insertion Anomalies: An insertion anomaly occurs when a record that already exists can be inserted again, duplicating the stored information.
4) Deletion Anomalies: A deletion anomaly occurs when deleting one record removes (or requires removing) more records than the specified one.
Decomposition: A relation schema R can be replaced by two or more relation schemas that each contain a subset of the attributes of R and together include all the attributes of R.
(or)
In simple words, "The process of breaking larger relations into smaller relations is known as decomposition".
Consider the Hourly_emps relation above, which can be decomposed into two relations.
Hourly_emps (eno, ename, salary, rating, hourly_wages, hours_worked)
This is decomposed into
Hourly_emps2 (eno, ename, salary, rating, hours_worked) and
Wages (rating, hourly_wages)
Hourly_emps2:
Eno | ename  | salary | rating | hours_worked
111 | suresh | 25000  | 8      | 40
222 | eswar  | 30000  | 8      | 30

Wages:
rating | hourly_wages
8      | 10
5      | 7
The answer to the second question is that a number of normal forms exist. Every relation schema is in one of these normal forms, and these normal forms help us decide whether to decompose a relation schema further or not.
The disadvantage of decomposition is that it forces us to join the decomposed relations in order to answer queries over the original relation.
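To make that trade-off concrete, here is a minimal SQL sketch of the decomposition above (the column data types are assumptions, not part of the notes); answering a query about an employee's wages now requires joining the two tables on rating:

CREATE TABLE Hourly_emps2 (
    eno          INT PRIMARY KEY,
    ename        VARCHAR(30),
    salary       DECIMAL(10,2),
    rating       INT,
    hours_worked INT
);

CREATE TABLE Wages (
    rating       INT PRIMARY KEY,
    hourly_wages DECIMAL(6,2)
);

-- Recovering the original Hourly_emps information needs a join:
SELECT e.eno, e.ename, e.salary, e.rating, w.hourly_wages, e.hours_worked
FROM   Hourly_emps2 e
JOIN   Wages w ON w.rating = e.rating;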
A relation is a named, two-dimensional table of data with named columns and unnamed rows. For example, a relation named Employee contains the attributes emp-id, ename, dept name and salary:

Emp-id | ename   | dept name    | salary
100    | Simpson | Marketing    | 48000
140    | Smith   | Accounting   | 52000
110    | Lucero  | Info-systems | 43000
190    | Davis   | Finance      | 55000
150    | Martin  | Marketing    | 42000
What are the Properties of relations?:
The properties of relations are defined on two-dimensional tables. They are:
Each relation (or table) in a database has a unique name.
An entry at the intersection of each row and column is atomic (single-valued); there can be no multivalued attributes in a relation.
Each row is unique; no two rows in a relation are identical.
Each attribute or column within a table has a unique name.
The sequence of columns (left to right) is insignificant; the columns of a relation can be interchanged without changing the meaning or use of the relation.
The sequence of rows (top to bottom) is insignificant. As with columns, the rows of a relation may be interchanged or stored in any sequence.
Q. Removing multi-valued attributes from tables: The "second property of relations" above is applied to this table: there can be no multivalued attributes in a relation. This rule is applied to the table or relation to eliminate the one or more multivalued attributes. Consider the following example; the employee table contains 6 records. In it, the course title has multiple values. Employee 100 has taken two courses, vc++ and ms-office. Record 150 did not take any course, so it is null.

Emp-id | name    | dept-name | salary | course_title
100    | Krishna | cse       | 20000  | vc++, msoffice
Now, these multi-valued attributes are eliminated, as shown in the following employee2 table.

Emp-id | name       | dept-name | salary | course_title
100    | Krishna    | cs        | 20000  | vc++
100    | Krishna    | cs        | 20000  | MSoffice
140    | Rajasekhar | cs        | 18000  | C++
140    | Rajasekhar | cs        | 18000  | DBMS
140    | Rajasekhar | cs        | 18000  | DS
Partial functional dependency: A partial functional dependency is a functional dependency in which one or more non-key attributes are functionally dependent on part (but not all) of the primary key. Consider the following graphical representation, in which some of the attributes depend only partially on the primary key.
In this example, Ename, Dept_name and Salary are fully functionally dependent on the primary key Emp_id, but Course_title and Date_completed are only partially functionally dependent on it. This partial functional dependency creates redundancy in the relation.
Q. What is Normal Form? What are the steps in Normalization?
NORMALIZATION: Normalization is the process of decomposing relations to produce smaller, well-structured relations.
To produce smaller and well-structured relations, the user needs to follow the six normal forms.
Steps in Normalization:
A normal form is a state of a relation that results from applying simple rules regarding functional dependencies (relationships between attributes) to that relation.
1) First Normal Form: Any multi-valued attributes (also called repeating groups) have been removed.
2) Second Normal Form: Any partial functional dependencies have been removed.
Differences between normalized and un-normalized relations:
1) A normalized relation (table) does not contain repeating groups, whereas an un-normalized relation (table) contains one or more repeating groups.
2) A normalized relation has a primary key; there is no primary key present in an un-normalized relation.
3) Normalization removes the repeating groups which occur many times in a table.
4) With the help of the normalization process, we can transform an un-normalized table into First Normal Form (1NF) by removing repeating groups from the un-normalized table.
5) Normalized relations (tables) give simpler results, whereas un-normalized relations give more complicated results.
6) Normalized relations improve storage efficiency, data integrity and scalability, whereas un-normalized relations do not improve storage efficiency and data integrity.
7) Normalization results in database consistency and flexible data access.
Q. FIRST NORMAL FORM (1NF): A relation is in first normal form (1NF) if it contains no multi-valued attributes. Consider the example employee relation, which contains multi-valued attributes that are removed and converted into single-valued attributes.

Multi-valued attributes in course_title:
Emp-id | name    | dept-name | salary | course_title
100    | Krishna | cse       | 20000  | vc++, msoffice
140    | Raja    | it        | 18000  | C++, DBMS, DS
Removing the multi-valued attributes and converting them into single-valued attributes puts the relation into First Normal Form.
SECOND NORMAL FORM (2NF): A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key. In a functional dependency X → Y, the left-hand side X is the primary key of the relation and the right-hand side Y is the non-key attributes. In some situations, some non-key attributes are only partially functionally dependent on the primary key. Consider the following example of a partial functional dependency.
In this example, Ename, Dept_name and Salary are fully functionally dependent on the primary key Emp_id, but Course_title and Date_completed are only partially functionally dependent on it. This partial functional dependency creates redundancy in the relation.
To avoid this, we convert the relation into Second Normal Form. The 2NF conversion decomposes the relation into two relations, shown in the graphical representation.
EMPLOYEE
Emp_id | Ename | Dept_name | Salary
COURSE
Course_title | Date_Completed | Emp_id

In the above graphical representation, both the EMPLOYEE relation and the COURSE relation are in Second Normal Form, because the partial functional dependency has been removed by decomposing the original relation into two relations.
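A minimal SQL sketch of this 2NF decomposition (the data types, the composite key of COURSE and the foreign key are assumptions added for illustration):

CREATE TABLE EMPLOYEE (
    Emp_id    INT PRIMARY KEY,
    Ename     VARCHAR(30),
    Dept_name VARCHAR(30),
    Salary    DECIMAL(10,2)
);

CREATE TABLE COURSE (
    Course_title   VARCHAR(40),
    Date_Completed DATE,
    Emp_id         INT REFERENCES EMPLOYEE(Emp_id),
    PRIMARY KEY (Emp_id, Course_title)
);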
THIRD NORMAL FORM (3NF): A relation is in 3NF if it is in Second Normal Form and has no transitive dependencies.
Transitive dependency: A transitive dependency is a functional dependency between two non-key attributes. For example, consider the relation SALES with attributes Cust_id, Cname, Sales_person and Region, shown in the following graphical representation.

Cust_id | Cname | Sales_person | Region
c) Modification Anomaly: If salesperson Smith is reassigned to the East region, several rows must be changed to reflect that fact. This causes an update anomaly.
To avoid these anomaly problems, the transitive dependency can be removed by decomposing SALES into two relations in 3NF.
Consider the following example, which removes the anomalies by decomposing SALES into two relations.
Sales_person | Region
Cust_id | Cname | Sales_person
Q. BOYCE/CODD NORMAL FORM (BCNF): A relation is in BCNF if it is in 3NF and every determinant is a candidate key.
Formally, a relation schema R is in BCNF if, for every FD X → A in F+ (where X ⊆ R and A ∈ R), X is a superkey of R.
Boyce-Codd normal form removes the remaining anomalies in 3NF that result from functional dependencies; by decomposing further, we obtain relations in BCNF.
For example, the STUDENT-ADVISOR relation below is in 3NF.
STUDENT-ADVISOR
Student-id | major-subject | faculty-advisor
1 | MATHS      | B
2 | MATHS      | B
3 | MATHS      | B
4 | STATISTICS | A
5 | STATISTICS | A

In the above relation the primary key is (student-id, major-subject). Here, part of the primary key (major-subject) is dependent upon a non-key attribute (faculty-advisor), so the relation is not in BCNF.
To convert a relation to BCNF, the first step is to modify the original relation so that the determinant (here the non-key attribute faculty-advisor) becomes a component of the primary key of the new relation. The attribute that is dependent on the determinant becomes a non-key attribute.
STUDENT-ADVISOR
The second step in the conversion process is to decompose the relation to eliminate the partial functional dependency. This results in two relations. These relations are in 3NF and BCNF, since each has only one candidate key, which is its determinant.
The two relations are in BCNF.
ADVISOR    STUDENT
In these two relations, the STUDENT relation has a composite key that contains the attributes student-id and faculty-advisor. Here faculty-advisor is a foreign key which references the primary key of the ADVISOR relation.
The two relations in BCNF, with sample data:

ADVISOR
Faculty_Advisor | Major_subject
B | MATHS
A | PHYSICS

STUDENT
Student_id | Faculty_Advisor
1 | B
2 | B
3 | A
4 | A
5 | A
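The same result expressed as a minimal SQL sketch (the types and constraints are assumptions; note the composite primary key on STUDENT and the foreign key to ADVISOR):

CREATE TABLE ADVISOR (
    Faculty_Advisor CHAR(1) PRIMARY KEY,
    Major_subject   VARCHAR(30)
);

CREATE TABLE STUDENT (
    Student_id      INT,
    Faculty_Advisor CHAR(1) REFERENCES ADVISOR(Faculty_Advisor),
    PRIMARY KEY (Student_id, Faculty_Advisor)
);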
Example: Consider a relation schema ABCD and suppose that the FD A → BCD and the MVD B →→ C are given, as shown in the table below.

B | C  | A  | D  | tuples
b | c1 | a1 | d1 | tuple t1
b | c2 | a2 | d2 | tuple t2
b | c1 | a2 | d2 | tuple t3

The table shows three tuples from relation ABCD that satisfy the given MVD B →→ C. From the definition of an MVD, given tuples t1 and t2, it follows that tuple t3 must also be included in the relation. Now consider tuples t2 and t3. From the given FD A → BCD and the fact that these tuples have the same A-value, we can conclude that c1 = c2. Therefore, we see that the FD B → C must hold over ABCD whenever the FD A → BCD and the MVD B →→ C hold. If B → C holds, the relation is not in BCNF but the relation is in 4NF.
The Fourth Normal Form is useful because it overcomes the problems of the various approaches that represent multi-valued attributes in a single relation.
A relation schema R is said to be in Fifth Normal Form (5NF) if, for every join dependency *(R1, ..., Rn) that holds over R, one of the following statements is true:
* Ri = R for some i, or
* the JD is implied by the set of those FDs over R in which the left side is a key for R.
5NF deals with the property of lossless joins.
Q. LOSSLESS-JOIN DECOMPOSITION:
Let R be a relation schema and let F be a set of FDs (Functional Dependencies) over R. A decomposition of R into two schemas with attribute sets X and Y is said to be a lossless-join decomposition with respect to F if, for every instance r of R that satisfies the dependencies in F:
πX(r) ⋈ πY(r) = r
In simple words, we can recover the original relation from the decomposed relations.
In general, if we take projections of a relation and recombine them using a natural join, we may obtain some additional tuples that were not in the original relation.
r (S P D):        SP = πSP(r):    PD = πPD(r):
S  | P  | D       S  | P          P  | D
s1 | p1 | d1      s1 | p1         p1 | d1
s2 | p2 | d2      s2 | p2         p2 | d2
s2 | p1 | d3      s2 | p1         p1 | d3

SP ⋈ PD:
S  | P  | D
s1 | p1 | d1
s1 | p1 | d3
s2 | p2 | d2
s2 | p1 | d1
s2 | p1 | d3
The decomposition of the relation schema r, i.e. SPD, into SP (πSP(r)) and PD (πPD(r)) is therefore not a lossless-join decomposition: the join returns all the original tuples of relation 'r' but also some additional tuples that were not in the original relation 'r', so r cannot be recovered exactly.
Q. Dependency-Preserving Decomposition:
The dependency-preserving decomposition property allows us to check integrity constraints efficiently. In simple words, a dependency-preserving decomposition allows us to enforce all FDs by examining a single relation on each insertion or modification of a tuple.
Let R be a relation schema that is decomposed into two schemas with attribute sets X and Y, and let F be a set of FDs over R. The projection of F on X is the set of FDs in the closure F+ that involve only attributes in X. We denote the projection of F on attributes X as FX. Note that a dependency U → V in F+ is in FX only if all the attributes in U and V are in X. The decomposition of relation schema R with FDs F into schemas with attribute sets X and Y is dependency-preserving if (FX ∪ FY)+ = F+.
That is, if we take the dependencies in FX and FY and compute the closure of their union, we get back all dependencies in the closure of F. To enforce FX, we need to examine only relation X (on inserts into that relation); to enforce FY, we need to examine only relation Y.
Example: Suppose that a relation R with attributes ABC is decomposed into relations with attributes AB and BC. The set F of FDs over R includes A → B, B → C and C → A. Here, A → B is in FAB and B → C is in FBC, so these two dependencies are preserved directly. At first glance, C → A does not appear to be implied by the dependencies of FAB and FBC, suggesting that it is not preserved.
However, FAB also contains B → A as well as A → B, and FBC contains C → B as well as B → C. Therefore FAB ∪ FBC contains A → B, B → C, B → A and C → B. Now the closure of the dependencies in FAB and FBC includes C → A (because, from C → B, B → A and the transitivity rule, we derive C → A), so the decomposition is dependency-preserving.
Transaction
A transaction can be defined as a group of tasks. A single task is the minimum processing unit, which cannot be divided further.
Let's take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This very simple and small transaction involves several low-level tasks.
A's Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B's Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
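In SQL the same transfer is wrapped in a single transaction; a minimal sketch (the accounts table, its columns and the account identifiers are assumptions used only for illustration):

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 500 WHERE account_no = 'A';
UPDATE accounts SET balance = balance + 500 WHERE account_no = 'B';

COMMIT;   -- or ROLLBACK; if any step fails, so that neither update takes effect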
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation and Durability, commonly known as the ACID properties.
Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in the database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after its execution, abortion or failure.
Consistency − The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each is the only transaction in the system. No transaction will affect the existence of any other transaction.
Transaction Log
In the field of databases in computer science, a transaction log (also transaction journal, database log, binary log or audit trail) is a history of actions executed by a database management system, used to guarantee ACID properties over crashes or hardware failures. Physically, a log is a file listing changes to the database, stored in a stable storage format.
If, after a start, the database is found in an inconsistent state or has not been shut down properly, the database management system reviews the database logs for uncommitted transactions and rolls back the changes made by these transactions. Additionally, all transactions that are already committed but whose changes were not yet materialized in the database are re-applied. Both are done to ensure atomicity and durability of transactions.
A database log record is made up of:
Log Sequence Number (LSN): A unique ID for a log record. With LSNs, logs can be recovered in constant time. Most LSNs are assigned in monotonically increasing order, which is useful in recovery algorithms like ARIES.
Prev LSN: A link to the transaction's previous log record. This implies database logs are constructed in linked-list form.
Transaction ID number: A reference to the database transaction generating the log record.
Type: Describes the type of database log record.
Information about the actual changes that triggered the log record to be written.
All log records include the general log attributes above, and also other attributes depending on their type (which is recorded in the Type attribute, as above).
Update Log Record notes an update (change) to the database. It includes this extra information:
Before and After Images: Includes the value of the bytes of page before and after
the page change. Some databases may have logs which include one or both images.
Compensation Log Record notes the rollback of a particular change to the database.
Each corresponds with exactly one other Update Log Record (although the corresponding update log record is not typically stored in the Compensation Log Record). It includes this extra information:
undoNextLSN: This field contains the LSN of the next log record that is to be undone for the transaction that wrote the last Update Log.
Commit Record notes a decision to commit a transaction.
Abort Record notes a decision to abort and hence roll back a transaction.
Checkpoint Record notes that a checkpoint has been made. These are used to speed
up recovery. They record information that eliminates the need to read a long way into
the log's past. This varies according to checkpoint algorithm. If all dirty pages are
flushed while creating the checkpoint (as in PostgreSQL), it might contain:
redoLSN: This is a reference to the first log record that corresponds to a dirty page.
i.e. the first update that wasn't flushed at checkpoint time. This is where redo must
begin on recovery.
undoLSN: This is a reference to the oldest log record of the oldest in-progress
transaction. This is the oldest log record needed to undo all in-progress transactions.
Completion Record notes that all work has been done for this particular transaction.
(It has been fully committed or aborted)
Transaction Control
The following commands are used to control transactions.
COMMIT − to save the changes.
ROLLBACK − to roll back the changes.
SAVEPOINT − creates points within groups of transactions to which you can later ROLLBACK.
SET TRANSACTION − places a name on a transaction.
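For example, a transaction can be named before its DML begins; a small sketch in Oracle-style syntax (the transaction name and table are assumptions used only for illustration):

SET TRANSACTION NAME 'transfer_500';
UPDATE accounts SET balance = balance - 500 WHERE account_no = 'A';
COMMIT;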
Transactional Control Commands
Transactional control commands are only used with the DML commands, such as INSERT, UPDATE and DELETE. They cannot be used while creating tables or dropping them because those operations are automatically committed in the database.
The COMMIT Command
The COMMIT command is the transactional command used to save the changes invoked by a transaction to the database. It saves all the transactions to the database since the last COMMIT or ROLLBACK command.
The syntax for the COMMIT command is as follows.
COMMIT;
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would delete those records from the table which have AGE = 25 and then COMMIT the changes in the database.
SQL> DELETE FROM CUSTOMERS
     WHERE AGE = 25;
SQL> COMMIT;
Thus, two rows from the table would be deleted and the SELECT statement would produce the following result.
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
The ROLLBACK Command
The ROLLBACK command is the transactional command used to undo transactions that have not already been saved to the database. It can only undo transactions since the last COMMIT or ROLLBACK command was issued.
The syntax for the ROLLBACK command is as follows.
ROLLBACK;
Example
Consider the CUSTOMERS table having the same records as above.
Following is an example which would delete those records from the table which have AGE = 25 and then ROLLBACK the changes in the database.
SQL> DELETE FROM CUSTOMERS
     WHERE AGE = 25;
SQL> ROLLBACK;
Thus, the delete operation would not impact the table and the SELECT statement would produce the following result.
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
The SAVEPOINT Command
A SAVEPOINT is a point in a transaction to which you can roll the transaction back without rolling back the entire transaction.
The syntax for a SAVEPOINT command is as follows.
SAVEPOINT SAVEPOINT_NAME;
This command serves only in the creation of a SAVEPOINT among the transactional statements. The ROLLBACK command is used to undo a group of transactions up to a SAVEPOINT.
The syntax for rolling back to a SAVEPOINT is as shown below.
ROLLBACK TO SAVEPOINT_NAME;
Following is an example where you plan to delete the three different records from the
CUSTOMERS table. You want to create a SAVEPOINT before each delete, so that you can
ROLLBACK to any SAVEPOINT at any time to return the appropriate data to its original
state.
Example
Consider the CUSTOMERS table having the same records as above.
The following code block contains the series of operations.
SQL> SAVEPOINT SP1;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID = 1;
1 row deleted.
SQL> SAVEPOINT SP2;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID = 2;
1 row deleted.
SQL> SAVEPOINT SP3;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID = 3;
1 row deleted.
Now that the three deletions have taken place, let us assume that you have changed your mind and decided to ROLLBACK to the SAVEPOINT that you identified as SP2. Because SP2 was created after the first deletion, the last two deletions are undone −
SQL> ROLLBACK TO SP2;
Rollback complete.
Notice that only the first deletion took place since you rolled back to SP2.
SQL> SELECT * FROM CUSTOMERS;
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
6 rows selected.
The RELEASE SAVEPOINT command is used to remove a SAVEPOINT that you have created.
The syntax for a RELEASE SAVEPOINT command is as follows.
RELEASE SAVEPOINT SAVEPOINT_NAME;
Concurrency Control
In concurrency control, multiple transactions can be executed simultaneously, which may affect the result of a transaction. It is therefore highly important to maintain the order of execution of those transactions so that they run in a controlled manner. Following are the three problems in concurrency control.
1. Lost updates
2. Dirty read
3. Unrepeatable read
1. Lost update problem
o When two transactions that access the same database items interleave their operations in a way that makes the value of some database item incorrect, the lost update problem occurs.
o If two transactions T1 and T2 read a record and then update it, the effect of the first update will be overwritten by the second update.
Example: (The timeline table for this example is not reproduced in these notes.)
2. Dirty Read Problem
o A transaction T1 updates a record which is read by T2. If T1 aborts, then T2 now has values which have never formed part of the stable database.
Example: (The timeline table for this example is not reproduced in these notes.)
o At time t2, transaction-Y writes A's value.
o At time t3, transaction-X reads A's value.
o At time t4, transaction-Y rolls back. So, it changes A's value back to what it was prior to t1.
o So, transaction-X now contains a value which has never become part of the stable database.
o Such a problem is known as the Dirty Read Problem, as one transaction reads a dirty value which has not been committed.
3. Inconsistent Retrievals Problem
o The Inconsistent Retrievals Problem is also known as unrepeatable read. When a transaction calculates some summary function over a set of data while other transactions are updating that data, the Inconsistent Retrievals Problem occurs.
o A transaction T1 reads a record and then does some other processing during which the transaction T2 updates the record. When transaction T1 reads the record again, the new value will be inconsistent with the previous value.
(The timeline table for this example is not reproduced in these notes.)
o Here, transaction-X produces the result of 550, which is incorrect. If we write this produced result to the database, the database will be in an inconsistent state because the actual sum is 600.
o Here, transaction-X has seen an inconsistent state of the database.
Concurrency Control Protocol
Concurrency control protocols ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be divided into three categories:
1. Lock based protocols
2. Timestamp based protocols
3. Validation based protocols
Concurrency Control Problems
The coordination of the simultaneous execution of transactions in a multiuser database system is known as concurrency control. The objective of concurrency control is to ensure the serializability of transactions in a multiuser database environment. Concurrency control is important because the simultaneous execution of transactions over a shared database can create several data integrity and consistency problems. The three main problems are lost updates, uncommitted data, and inconsistent retrievals.
1. Lost Updates:
The lost update problem occurs when two concurrent transactions, T1 and T2, are updating the same data element and one of the updates is lost (overwritten by the other transaction). Consider the following PRODUCT table example.
Transaction              Operation
T1: Purchase 100 units   PROD_QOH = PROD_QOH + 100
T2: Sell 30 units        PROD_QOH = PROD_QOH - 30
But suppose that a transaction is able to read a product's PROD_QOH value from the table before a previous transaction (using the same product) has been committed.
The sequence depicted in the following table shows how the lost update problem can arise. Note that the first transaction (T1) has not yet been committed when the second transaction (T2) is executed. Therefore, T2 still operates on the value 35, and its subtraction yields 5 in memory. In the meantime, T1 writes the value 135 to disk, which is promptly overwritten by T2. In short, the addition of 100 units is "lost" during the process.
(The interleaved timeline table is not reproduced in these notes.)
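As a rough sketch of that interleaving, written as commented SQL (the PRODUCT table, the PROD_CODE value and the starting quantity of 35 follow the narrative above; everything else is an assumption for illustration):

-- Time t1 (T1): SELECT PROD_QOH FROM PRODUCT WHERE PROD_CODE = '1558-QW1';      -- reads 35
-- Time t2 (T2): SELECT PROD_QOH FROM PRODUCT WHERE PROD_CODE = '1558-QW1';      -- also reads 35 (T1 not yet committed)
-- Time t3 (T1): UPDATE PRODUCT SET PROD_QOH = 135 WHERE PROD_CODE = '1558-QW1'; COMMIT;
-- Time t4 (T2): UPDATE PRODUCT SET PROD_QOH = 5   WHERE PROD_CODE = '1558-QW1'; COMMIT;
-- Final PROD_QOH = 5 instead of the correct 105: the +100 written by T1 has been lost.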
2. Uncommitted Data:
The phenomenon of uncommitted data occurs when two transactions, T1 and T2, are executed concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has already accessed the uncommitted data, thus violating the isolation property of transactions.
To illustrate that possibility, let's use the same transactions described in the lost updates discussion. T1 has two atomic parts, one of which is the update of the inventory, the other possibly being the update of the invoice total (not shown). T1 is forced to roll back due to an error during the updating of the invoice's total; hence, it rolls back all the way, undoing the inventory update as well. This time, the T1 transaction is rolled back to eliminate the addition of the 100 units. Because T2 subtracts 30 from the original 35 units, the correct answer should be 5.
Transaction              Operation
T1: Purchase 100 units   PROD_QOH = PROD_QOH + 100 (Rolled back)
T2: Sell 30 units        PROD_QOH = PROD_QOH - 30
The following table shows how, under normal circumstances, the serial execution of those transactions yields the correct answer. (The table is not reproduced in these notes.)
The following table shows how the uncommitted data problem can arise when the ROLLBACK is completed after T2 has begun its execution. (The table is not reproduced in these notes.)
3. Inconsistent Retrievals:
Inconsistent retrievals occur when a transaction accesses data before and after another transaction(s) finish working with such data. For example, an inconsistent retrieval would occur if transaction T1 calculated some summary (aggregate) function over a set of data while another transaction (T2) was updating the same data. The problem is that the transaction might read some data before they are changed and other data after they are changed, yielding inconsistent results. To illustrate the problem, assume the following:
1. T1 calculates the total quantity on hand of the products stored in the PRODUCT table.
2. At the same time, T2 updates the quantity on hand (PROD_QOH) for two of the PRODUCT table's products.
The two transactions are shown in the following table. (The table is not reproduced in these notes.)
While T1 calculates the total quantity on hand (PROD_QOH) for all items, T2 represents the correction of a typing error: the user added 10 units to product 1558-QW1's PROD_QOH but meant to add the 10 units to product 1546-QQ2's PROD_QOH. To correct the problem, the user adds 10 to product 1546-QQ2's PROD_QOH and subtracts 10 from product 1558-QW1's PROD_QOH. The initial and final PROD_QOH values are reflected in the following table. (The table is not reproduced in these notes.)
The following table demonstrates that inconsistent retrievals are possible during the transaction execution, making the result of T1's execution incorrect. The "After" summation reflects the fact that the value of 25 for product 1546-QQ2 was read after the WRITE statement was completed; therefore, the "After" total is 40 + 25 = 65. The "Before" total reflects the fact that the value of 23 for product 1558-QW1 was read before the next WRITE statement was completed to reflect the corrected update of 13; therefore, the "Before" total is 65 + 23 = 88. (The table is not reproduced in these notes.)
The computed answer of 102 is obviously wrong because you know from the previous table that the correct answer is 92. Unless the DBMS exercises concurrency control, a multiuser database environment can create havoc within the information system.
The Scheduler
You already know that a transaction is a series of operations that take the database from one consistent state to another. Finally, you know that database consistency can be ensured only before and after the execution of transactions.
co
A database always moves through an unavoidable temporary state of inconsistency
during a transaction’s execution if such transaction updates multiple tables/rows. (If
the transaction contains only one update, then there is no temporary inconsistency.)
a.
That temporary inconsistency exists because a computer executes the operations
serially, one after another. During this serial process, the isolation property of
iy
transactions prevents them from accessing the data not yet released by other
transactions.
The scheduler establishes the order in which the operations within concurrent transactions are executed. The scheduler interleaves the execution of database operations to ensure serializability. To determine the appropriate order, the scheduler bases its actions on concurrency control algorithms, such as locking or time-stamping methods. The scheduler also makes sure that the computer's CPU is used efficiently.
The DBMS determines which transactions are serializable and proceeds to interleave the execution of the transactions' operations. Generally, transactions that are not serializable are executed on a first-come, first-served basis by the DBMS. The scheduler's main job is to create a serializable schedule of the transactions' operations.
A serializable schedule is a schedule of the transactions' operations in which the interleaved execution of the transactions (T1, T2, T3, etc.) yields the same results as if they were executed in serial order, one after another.
The scheduler facilitates data isolation to ensure that two transactions do not update the same data element at the same time. Database operations might require READ and/or WRITE actions that produce conflicts. For example, the following table shows the possible conflict scenarios when two transactions, T1 and T2, are executed concurrently over the same data. (The table is not reproduced in these notes.) Note that two operations are in conflict when they access the same data and at least one of them is a WRITE operation.
CONCURRENCY CONTROL WITH LOCKING METHODS
A lock guarantees exclusive use of a data item to a current transaction. In other words, transaction T2 does not have access to a data item that is currently being used by transaction T1. A transaction acquires a lock prior to data access; the lock is released (unlocked) when the transaction is completed so that another transaction can lock the data item for its exclusive use.
Most multiuser DBMSs automatically initiate and enforce locking procedures. All lock information is managed by a lock manager.
Lock Granularity
Lock granularity indicates the level of lock use. Locking can take place at the following levels: database, table, page, row, or even field (attribute).
LOCK TYPES
Regardless of the level of locking, the DBMS may use different lock types:
1. Binary Locks
2. Shared/Exclusive Locks
An exclusive lock exists when access is reserved specifically for the transaction that locked the object. The exclusive lock must be used when the potential for conflict exists. A shared lock exists when concurrent transactions are granted read access on the basis of a common lock. A shared lock produces no conflict as long as all the concurrent transactions are read-only.
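As a small SQL illustration (behaviour and syntax vary by DBMS; the table name reuses PRODUCT from the earlier examples): reads normally take shared locks implicitly, while a transaction can also request stronger locks explicitly.

-- Explicitly request an exclusive table-level lock (Oracle / PostgreSQL style):
LOCK TABLE PRODUCT IN EXCLUSIVE MODE;

-- Lock just the rows being read, with the intent to update them later:
SELECT PROD_QOH FROM PRODUCT WHERE PROD_CODE = '1558-QW1' FOR UPDATE;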
DEADLOCKS
A deadlock occurs when two transactions wait indefinitely for each other to unlock data.
Deadlock prevention: A transaction requesting a new lock is aborted when there is the possibility that a deadlock can occur. If the transaction is aborted, all changes made by this transaction are rolled back and all locks obtained by the transaction are released. The transaction is then rescheduled for execution.
Deadlock detection: The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of the transactions is aborted (rolled back and restarted) and the other transaction continues.
Deadlock avoidance: The transaction must obtain all of the locks it needs before it can be executed. This technique avoids the rollback of conflicting transactions by requiring that locks be obtained in succession.
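A rough sketch of how such a deadlock arises, written as a commented timeline of two SQL sessions (the accounts table and row identifiers are the same illustrative assumptions used earlier):

-- t1 (T1): UPDATE accounts SET balance = balance - 500 WHERE account_no = 'A';   -- T1 locks row A
-- t2 (T2): UPDATE accounts SET balance = balance - 200 WHERE account_no = 'B';   -- T2 locks row B
-- t3 (T1): UPDATE accounts SET balance = balance + 500 WHERE account_no = 'B';   -- T1 waits for T2's lock
-- t4 (T2): UPDATE accounts SET balance = balance + 200 WHERE account_no = 'A';   -- T2 waits for T1's lock
-- Neither transaction can proceed; the DBMS must detect the deadlock and roll one of them back.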
Two-Phase Locking (2PL) is a concurrency control method which divides the execution phase of a transaction into three parts.
It ensures conflict-serializable schedules.
If the first unlock operation appears only after the transaction has performed all of its read and write (lock) operations, the transaction is said to follow the Two-Phase Locking Protocol.
This protocol can be divided into two phases:
1. In the Growing Phase, a transaction obtains locks, but may not release any lock.
2. In the Shrinking Phase, a transaction may release locks, but may not obtain any lock.
1. Strict Two-Phase Locking
Strict Two-Phase Locking not only requires two-phase locking but also requires that all exclusive locks be held until the transaction commits or aborts.
It is not deadlock free.
It ensures that if data is being modified by one transaction, then other transactions cannot read it until the first transaction commits.
Most database systems implement the rigorous two-phase locking protocol.
2. Rigorous Two-Phase Locking
The Rigorous Two-Phase Locking Protocol avoids cascading rollbacks.
This protocol requires that all shared and exclusive locks be held until the transaction commits.
3. Conservative Two-Phase Locking Protocol
The Conservative Two-Phase Locking Protocol is also called the Static Two-Phase Locking Protocol.
This protocol is almost free from deadlocks, as all required items are listed in advance.
It requires locking of all the data items the transaction needs before the transaction starts.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all other transactions that come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger, and priority is given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system know when the last read and write operations were performed on the data item.
Concurrency control with timestamp ordering
The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. It is the responsibility of the protocol system that the conflicting pair of operations is executed according to the timestamp values of the transactions.
The timestamp of transaction Ti is denoted as TS(Ti); the read-timestamp and write-timestamp of a data item X are denoted R-timestamp(X) and W-timestamp(X).
If a transaction Ti issues a read(X) operation:
o If TS(Ti) < W-timestamp(X): Operation rejected.
o Otherwise, operation executed.
If a transaction Ti issues a write(X) operation:
o If TS(Ti) < R-timestamp(X): Operation rejected.
o If TS(Ti) < W-timestamp(X): Operation rejected and Ti rolled back.
o Otherwise, operation executed.
Deadlocks
In a system where transactions wait for each other's locks, a state can be reached where no transaction can proceed because each is waiting for a lock held by another. This situation is known as a deadlock.
Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions involved in the deadlock are either rolled back or restarted.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations where transactions are about to execute. The DBMS inspects the operations and analyzes whether they can create a deadlock situation. If it finds that a deadlock situation might occur, then that transaction is never allowed to be executed.
There are deadlock prevention schemes that use the timestamp ordering mechanism of transactions in order to predetermine a deadlock situation.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting lock by another transaction, then one of two possibilities may occur −
If TS(Ti) < TS(Tj), that is, Ti, which is requesting a conflicting lock, is older than Tj, then Ti is allowed to wait until the data item is available.
If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies. Ti is restarted later with a random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting lock by another transaction, one of two possibilities may occur −
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back, that is, Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
In both cases, the transaction that enters the system at a later stage is aborted.
Deadlock Avoidance
Aborting a transaction is not always a practical approach. Instead, deadlock avoidance mechanisms can be used to detect any deadlock situation in advance. Methods like the "wait-for graph" are available, but they are suitable only for systems where transactions are lightweight and have fewer instances of resources. In a bulky system, deadlock prevention techniques may work well.
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every
second. The durability and robustness of a DBMS depends on its complex architecture and
its underlying hardware and system software. If it fails or crashes amid transactions, it is
expected that the system would follow some sort of algorithm or techniques to recover lost
data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it can't go any further. This is called transaction failure, where only a few transactions or processes are hurt.
Reasons for a transaction failure could be −
Logical errors − Where a transaction cannot complete because it has some code error or any internal error condition.
System errors − Where the database system itself terminates an active transaction because the DBMS is not able to execute it, or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
System Crash
un
There are problems − external to the system − that may cause the system to stop abruptly
and cause the system to crash. For example, interruptions in power supply may cause the
failure of underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
sD
In early days of technology evolution, it was a common problem where hard-disk drives or
storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or
any other failure, which destroys all or a part of disk storage.
al
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into two categories −
Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; normally they are embedded on the chipset itself. For example, main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.
Non-volatile storage − These memories are made to survive system crashes. They are huge in data storage capacity but slower in accessibility. Examples may include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened for them to modify the data items. Transactions are made of various operations which are atomic in nature. But according to the ACID properties of a DBMS, atomicity of transactions as a whole must be maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
It should check the states of all the transactions which were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs to be rolled back.
No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques which can help a DBMS in recovering as well as maintaining the atomicity of a transaction −
Maintaining the logs of each transaction and writing them onto some stable storage before actually modifying the database.
Maintaining shadow paging, where the changes are done on volatile memory, and later the actual database is updated.
Log-based Recovery
A log is a sequence of records which maintains a record of the actions performed by a transaction. It is important that the logs are written prior to the actual modification and stored on a stable storage medium, which is failsafe.
Log-based recovery works as follows −
The log file is kept on a stable storage medium.
When a transaction enters the system and starts execution, it writes a log about it:
<Tn, Start>
When the transaction modifies an item X, it writes a log as follows:
<Tn, X, V1, V2>
It reads: Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs:
<Tn, commit>
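For instance, the Rs 500 transfer from A's account to B's account seen earlier would produce a log like the following in this notation (the starting balances 1000 and 200 are assumed values, used only to show the record format):

<T1, Start>
<T1, A.balance, 1000, 500>
<T1, B.balance, 200, 700>
<T1, commit>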
The database can be modified using two approaches −
Deferred database modification − All logs are written on to the stable storage and the database is updated only when a transaction commits.
Immediate database modification − Each log record follows an actual database modification, that is, the database is modified immediately after every operation.
When more than one transaction is executed in parallel, the logs are interleaved; at the time of recovery, it becomes hard for the recovery system to backtrack all the logs and then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the
memory space available in the system. As time passes, the log file may grow too big to be
handled at all. Checkpoint is a mechanism where all the previous logs are removed from
the system and stored permanently in a storage disk. Checkpoint declares a point before
which the DBMS was in consistent state, and all the transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner −
The recovery system reads the logs backwards from the end to the last checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log is found, it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list and their previous logs are removed and then redone before saving their logs.
Different Kinds of Memory in a Computer System (or) Memory Hierarchy: Memory in a computer system is arranged in a hierarchy, as follows.

CPU
Primary Storage:   CACHE MEMORY, MAIN MEMORY, FLASH MEMORY
Secondary Storage: MAGNETIC DISK, OPTICAL DISK
Tertiary Storage:  TAPE
At the top, there is primary storage, which has cache, flash and main memory to provide very fast access to the data.
The secondary storage devices are magnetic disks, which are slower but permanent devices.
Tertiary storage is a permanent and the slowest storage when compared with magnetic disk.
Cache memory: The cache is the fastest but costliest memory available. It is not a major concern for databases.
Main Memory: The processor requires the data to be stored in main memory. Although main memory contains gigabytes of storage capacity, it is not sufficient for databases.
Flash Memory: Flash memory stores data even if the power fails. Data can be retrieved as fast as from main memory; however, writing data to flash memory is a complex task and overwriting data cannot be done directly. It is used in small computers.
Magnetic Disk Storage: Magnetic disk is the permanent data storage medium. It enables random access of data and is therefore called "direct-access" storage. Data from disk is transferred into main memory for processing. After modification, the data is written back onto the disk.
Optical Disk: Optical disks are Compact Disks (CDs) and Digital Video Disks (DVDs). These are commonly used for permanent data storage. CDs are used for providing electronically published information and for distributing software such as multimedia data. They have a capacity of up to 640 MB and are relatively cheap. To store larger volumes of data, CDs are replaced with DVDs, which come in various capacities based on manufacturing.
Advantages:
1. Optical disks are less expensive.
2. A large amount of data can be stored.
3. CDs and DVDs have longer durability than magnetic disk drives.
4. They provide nonvolatile storage of data.
5. They can store any type of data, such as text, music, video etc.
Tape Storage (or) Tertiary Storage media: Tape (or) tertiary storage provides only sequential access to the data, and access to the data is much slower. It provides high-capacity removable tapes, which can have capacities of about 20 GB to 40 GB. These devices are also called "tertiary storage" or "off-line storage". In a larger database system, tape (tertiary) storage devices are used for backup storage of data.
Magnetic tapes are divided into vertical columns referred to as frames and horizontal rows referred to as tracks. The data is organized in the form of a column string with one byte of data per frame. Frames are in turn divided into rows or tracks. One frame can store one byte of data and an individual track can store a single bit. The remaining track is treated as a parity track.
Advantages:
1. Magnetic tapes are very inexpensive and durable compared with optical disks.
2. They are reliable, and a good tape drive system performs read/write operations successfully.
3. They are a very good choice for archival storage, and data can be erased and reused any number of times.
Disadvantages:
The major disadvantage of tapes is that they are sequential-access devices.
They work very slowly when compared to magnetic disks and optical disks.
2. The unit of data transfer between disk and main memory is a block; if a single item on a block is needed, the entire block is transferred. Reading or writing a disk block is called an I/O (input/output) operation.
3. The time to read or write a block varies, depending on the location of the data:
Access time = seek time + rotational delay + transfer time
4. The time for moving blocks to or from disk usually dominates the time taken for database operations. To minimize this time, it is necessary to locate data records strategically on disk, because of the geometry and mechanics of disks.
Buffer Manager: The buffer manager is the software layer that is responsible for bringing pages from physical disk to main memory as needed. The buffer manager manages the available main memory by dividing it into a collection of pages, which we call the buffer pool. The main memory pages in the buffer pool are called frames.
The goal of the buffer manager is to ensure that the data requests made by programs are satisfied by copying data from secondary storage devices into the buffer. If a program performs an input statement, it calls the buffer manager for an input operation, which satisfies the request by reading from existing buffers. Similarly, if a program performs an output statement, it calls the buffer manager for an output operation, which satisfies the request by writing to the buffers. Therefore, we can say that input and output operations occur between the program and the buffer area only.
In addition to the buffer pool itself, the buffer manager maintains two variables for each frame in the pool: 'pin-count' and 'dirty'. Each time a page in a frame is requested, the pin-count variable for that frame is incremented; each time a requestor releases the page, the pin-count for that frame is decremented. Thus, if a page is requested the pin-count is incremented, and when the request has been satisfied the pin-count is decremented.
In addition to this, if the page has been modified, the Boolean variable 'dirty' is set 'on'; otherwise it is set 'off'.
(Figure: the buffer pool in main memory, exchanging pages with the database on disk.)
Buffer Manager writing a page to disk: When a page is requested, the buffer manager does the following.
1. It checks the buffer pool to see whether some frame contains the requested page and, if so, increments the pin-count of that frame. If the page is not in the pool, the buffer manager brings it into main memory from disk and sets its pin-count value to 1.
2. If the 'dirty' variable of the frame chosen for replacement is set to 'on', that page has been modified and is first written back to disk before it is replaced, and the pin-count variable of the frame holding the newly requested page is set to 1. So, pin-count = 1 is called pinning the requested page in its frame. When the request of the requestor is fulfilled, the pin-count variable of that frame is decremented to 0. Thus, the buffer manager will not replace the page in a frame until its pin-count becomes 0 (zero).
Allocation of Records to Blocks: The buffer manager uses blocks of storage; when a current record is deleted, its space can be reused by the next record allocation. The buffer manager also provides a concurrency control system to execute more than one process. In this case, the records are mapped onto disk blocks.
Types of File Organizations: Data is organized on secondary storage in terms of files. Each file has several records.
Enormous amounts of data cannot be stored in main memory, so the data is stored on magnetic disk. During processing, the required data is brought into main memory from disk. The unit of information transferred between main memory and disk is called a page.
Tapes are also used to store the data in the database, but they can be accessed only sequentially, so most of the time is wasted transferring each page.
The buffer manager is software used for reading data into memory and writing data onto magnetic disks. Each record in a file is identified by a record id, or rid. Whenever a page needs to be processed, the buffer manager retrieves the page from the disk based on its record id.
The disk space manager is software that allocates space for records on the disk. When the DBMS requires additional space, it calls the disk space manager to allocate the space. The DBMS also informs the disk space manager when it is no longer going to use the space.
The most widely used file organizations are 1. Heap (unordered) files, 2. Sequential (ordered) files, 3. Hash files.
1. Heap file: This is the simplest file organization; it stores records in the order they arrive. It is also called an unordered file.
Inserting a record: Records are inserted in the same order as they arrive.
Deleting a record: To delete a record, first access that record and then mark it as deleted.
Accessing a record: A linear search is performed on the file, starting from the first record, until the desired record is found.
2. Ordered file: Records are arranged in sequential order. The main advantage of this file organization is that we can now use binary search, as the file is sorted.
Insertion of a record: This is a difficult task, because first we need to identify the place where the record must be inserted, since the file is arranged in order. If space is available there, the record can be inserted directly; if space is not sufficient, then the record is moved to the next page.
Deletion of a record: This task is also difficult. First find the record to be deleted and then remove the empty space left by the deleted record.
Accessing a record: We can use binary search on the file.
3. Hash files: Using this file organization, records are not organized sequentially; instead, they are arranged randomly. The address of the page where a record is to be stored is calculated using a 'hash function'.
Index: An index is a data structure which organizes data records on disk to optimize certain kinds of retrieval operations. Using an index, we can easily retrieve the records that satisfy search conditions on the search key fields of the index. The term 'data entry' is used to refer to the records stored in an index file. We can search an index efficiently to find the desired data entries and use them to obtain the data records. There are three alternatives for what to store as a data entry in an index:
1) A data entry k* is the actual data record with search key value k.
2) A data entry is a <k, rid> pair (here rid is the row id or record id and k is the key value).
3) A data entry is a <k, rid-list> pair (rid-list is a list of record ids of the data records with search key value k).
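A minimal sketch of alternative (2), where the index stores <k, rid> pairs sorted on the search key k (the student data and the lookup function are illustrative assumptions):

# Data entries as <k, rid> pairs; the rid is simply the record's position in the data file here.
import bisect

data_file = [(22, "sita"), (15, "ravi"), (19, "arun")]                    # rid = index in this list
entries = sorted((rno, rid) for rid, (rno, _) in enumerate(data_file))    # <k, rid> data entries

def lookup(rno):
    i = bisect.bisect_left(entries, (rno, -1))
    if i < len(entries) and entries[i][0] == rno:
        rid = entries[i][1]
        return data_file[rid]                 # follow the rid to the data record
    return None

print(lookup(19))   # -> (19, 'arun')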
Types of Index: There are two index techniques to organize the file: 1. Clustered. 2. Un-clustered.
Clustered: A file organization in which the data records are ordered in the same way as the data entries in the index is called clustered.
Un-clustered: A file organization in which the data records are ordered in a different way from the data entries in the index is called un-clustered.
Indexed Sequential Files: An indexed sequential file overcomes the disadvantage of a sequential file, in which it is not possible to access a particular record directly. In an indexed sequential file organization it is possible to access the records both sequentially and randomly. As in a sequential file, the records in an indexed sequential file are organized in sequence based on primary key values. In addition, an indexed sequential file has the following two features that distinguish it from a sequential file:
i) Index: It is used to support random access. It provides a lookup capability to reach the desired record quickly.
ii) Overflow file: The overflow file is similar to the log file used in the sequential file. An indexed sequential file greatly reduces the time required to access a single record without sacrificing the sequential nature of the file. In order to process the file sequentially, the records of the main file are processed in sequence until a pointer to the overflow file is found; accessing then continues in the overflow file until a null pointer is encountered.
Hash File Organization: Hash file organization helps us locate records very fast, given a search key value; for example, "find the sailor record for SAM" if the file is hashed on the name field. In hashed files, the pages are grouped into buckets. Every bucket has a bucket number, which allows us to find the primary page for that bucket. The bucket to which a record belongs is determined by applying a hash function to its search key field(s); the record is inserted into the appropriate bucket, and overflow pages for a bucket are maintained in a linked list. For searching a record with a given search key value, apply the hash function to identify the bucket to which such records belong and look there. This organization is called a static hashed file.
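A minimal sketch of a static hashed file with N buckets (the bucket count, hash function h(key) = key % N and the sample records are assumptions for illustration):

# Static hashed file sketch: N primary buckets, records placed by h(key) = key % N.
# Overflow is modelled simply by letting each bucket list grow.
N = 4
buckets = [[] for _ in range(N)]

def h(key):
    return key % N                        # the hash function picks the bucket

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    # Only the one bucket identified by h(key) has to be examined.
    return [rec for k, rec in buckets[h(key)] if k == key]

for rno, name in [(15, "SAM"), (19, "RAVI"), (23, "DEVI")]:
    insert(rno, name)
print(search(19))     # -> ['RAVI']
print(search(99))     # -> []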
Records in a file may have multiple lengths; however, files of fixed-length records are easier to implement than files of variable-length records.
Fixed-length Records: As an example, consider a file of account records for our bank database, with fields Acc.no, branch name and balance. If each character occupies 1 byte and a real occupies 8 bytes, each record is 38 bytes long. The file might contain:
Acc.no     branch name     balance
Record 1   101   ongole        2000
Record 2   (the record deleted in the figures below)
Record 3   301   kandukur      4300
Record 4   102   ongole        2200
Record 5   222   chimakurthy   1200
Record 6   333   kandukur      2600
Record 7   343   kandukur      2200
There is a problem when we delete a record from this structure: the space occupied by the record to be deleted must be filled with some other record of the file, or we must have a way of marking deleted records so that they can be ignored. This is shown in the figures below.
[Figure: the account file of fixed-length records as stored on disk, before the deletion.]
Acc.no     branch name     balance
Record 7   343   kandukur      2200
Record 3   301   kandukur      4300
Record 4   102   ongole        2200
Record 5   222   chimakurthy   1200
Record 6   333   kandukur      2600
(The file after a record has been deleted; the freed space has been filled by another record of the file.)
An alternative is to keep a file header that points to the first deleted record; each deleted record stores a pointer to the next deleted record, so the deleted slots form a free list. On insertion of a new record, we use the record pointed to by the header and change the header pointer to point to the next available (deleted) record. If no such space is available, we add the new record to the end of the file.
Thus, insertion and deletion for files of fixed-length records are simple to implement, because the space made available by a deleted record is exactly the space needed to insert a record. (A free-list sketch follows.)
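A minimal sketch of the free-list idea for fixed-length records, assuming in-memory slots and the account values from the table above (the class and method names are illustrative):

# Fixed-length record file with a free list: the header stores the slot number of
# the first deleted record, and each deleted slot stores the next free slot.
class FixedLengthFile:
    def __init__(self):
        self.slots = []          # each slot holds a record tuple or a free-list link
        self.free_head = None    # header: first deleted slot, or None

    def delete(self, slot):
        self.slots[slot] = ("FREE", self.free_head)   # chain the freed slot
        self.free_head = slot                         # header now points to it

    def insert(self, record):
        if self.free_head is not None:                # reuse a deleted slot
            slot = self.free_head
            self.free_head = self.slots[slot][1]      # advance header to next free slot
            self.slots[slot] = record
        else:                                         # no free slot: append at the end
            slot = len(self.slots)
            self.slots.append(record)
        return slot

f = FixedLengthFile()
for rec in [(101, "ongole", 2000), (102, "ongole", 2200), (301, "kandukur", 4300)]:
    f.insert(rec)
f.delete(1)                                  # the freed slot joins the free list
print(f.insert((343, "kandukur", 2200)))     # -> 1 (the freed slot is reused)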
Variable-Length Records (slotted page structure): There is a header at the beginning of each block, containing the following information:
1) The number of record entries in the header.
2) The end of free space in the block.
3) An array whose entries contain the location and size of each record.
The actual records are allocated contiguously in the block, starting from the end of the block. The free space in the block is contiguous, b/w the final entry in the header array and the first record. If a record is inserted, space is allocated for it at the end of the free space, and an entry containing its size and location is added to the header.
If a record is deleted, the space that it occupies is freed and its header entry is marked as deleted. The records in the block before the deleted record can then be moved so that the free space again remains contiguous, with the header entries updated accordingly. (A small sketch follows.)
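A minimal sketch of the slotted-page layout just described, assuming a toy block size; compaction after deletion is omitted for brevity, and the SlottedPage name is an illustrative assumption:

# Slotted-page sketch: header holds the entry count, the end of free space and a
# (location, size) array; records grow from the end of the block.
class SlottedPage:
    def __init__(self, size=64):
        self.block = bytearray(size)
        self.free_end = size          # records are placed just before this offset
        self.slots = []               # header array of (location, size); size -1 = deleted

    def insert(self, data: bytes):
        loc = self.free_end - len(data)
        if loc < 0:
            raise ValueError("block full")
        self.block[loc:self.free_end] = data      # record goes at the end of free space
        self.free_end = loc
        self.slots.append((loc, len(data)))       # new header entry
        return len(self.slots) - 1                # the slot number acts as the record's id

    def delete(self, slot):
        loc, _ = self.slots[slot]
        self.slots[slot] = (loc, -1)              # mark the header entry as deleted

    def read(self, slot):
        loc, size = self.slots[slot]
        return None if size < 0 else bytes(self.block[loc:loc + size])

p = SlottedPage()
a = p.insert(b"101 ongole 2000")
b = p.insert(b"301 kandukur 4300")
p.delete(a)
print(p.read(b))     # -> b'301 kandukur 4300'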
Fixed-Length Representation: The fixed-length representation is another way to implement variable-length records efficiently in a file system: use one or more fixed-length records to represent one variable-length record.
There are two ways to do this: 1) Reserved space 2) List representation.
1) Reserved Space: If there is a maximum record length that is never exceeded, we can use fixed-length records of that length. Unused space is filled with a special null, or end-of-record, symbol. This is shown in figure (1).
2) List Representation: We can represent variable-length records by lists of fixed-length records, chained together by pointers. This is shown in fig (2).
The reserved-space method is useful when most records have a length close to the maximum.
The disadvantage of the list representation is that we waste space in all records except the first in a chain: only the first record needs the branch_name field, but the subsequent records in the chain still reserve space for it. To reduce this waste, an overflow block can be used.
Overflow block: a block which contains records other than those that are the first records of a chain. Thus, all records within a block have the same length, even though not all records in the file have the same length.
Types of Indexing: Indexes can improve the performance of the DBMS; using an index, the desired record can be located directly without scanning every record in the file.
An index can be defined as a data structure that allows faster retrieval of data. Each index is based on a certain attribute of the file, called its 'search key'.
An index can refer to the data based on several search keys. The term data entry refers to the records stored in an index file. A data entry can be:
1) the search key with the actual record, 2) the search key with a record id, 3) the search key with a list of record ids.
A file organization in which the records are stored and referred to in the same way as the data entries in the index is called "clustered".
A file organization in which the records are stored and referred to in a different way from the data entries in the index is called "un-clustered".
Differences b/w clustered and un-clustered:
In a clustered file organization the data records are stored in the same order as the data entries in the index, whereas in an un-clustered organization the data records are stored in a different order from the data entries in the index.
A clustered index is an index which uses alternative (1), whereas an un-clustered index uses alternative (2) or (3).
A clustered index needs to refer to only a few pages when retrieving the required records, whereas an un-clustered index, whose search key specifies an order different from the sequential order of the file, may need to read a separate page for each qualifying record.
Primary Index: An index on a set of fields that includes the primary key is called a primary index. A primary index is an ordered file whose records are of fixed length with two fields: the first field is the same as the primary key of the data file, and the second field is a pointer to a disk block.
There is one index entry in the index file for each block in the data file. Each index entry has the value of the primary key field of the first record in a block and a pointer to that block as its two field values. The two field values of index entry i are referred to as key[i] and pointer[i].
Primary indexes are further divided into dense index and sparse index.
1) Dense Index: An index record appears for every search key value in the file. The index record
contains the search-key value and a pointer to the first record with that search-key.
2) Sparse Index: An index record is created for only some of the search-key values. As in dense indexes, each index record contains a search-key value and a pointer to the first data record with that search-key value. To locate a record, we find the index entry with the largest search-key value that is less than or equal to the search-key value for which we are looking. We start at the record pointed to by that index entry and follow the pointers in the file until we find the desired record.
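A minimal sketch of a sparse-index lookup, assuming a toy data file of account blocks sorted on the account number (the block contents and the find function are illustrative):

# Sparse primary index sketch: one index entry per block, holding the search-key
# value of the block's first record and a pointer to (here, the number of) the block.
import bisect

blocks = [                                   # data file, sorted on account number
    [(101, 2000), (102, 2200)],
    [(222, 1200), (301, 4300)],
    [(333, 2600), (343, 2200)],
]
sparse_index = [(blk[0][0], i) for i, blk in enumerate(blocks)]   # (first key, block no)

def find(key):
    # Locate the last index entry whose key is <= the key we are looking for ...
    first_keys = [k for k, _ in sparse_index]
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None
    # ... then scan forward inside that block.
    for k, balance in blocks[sparse_index[i][1]]:
        if k == key:
            return balance
    return None

print(find(301))   # -> 4300
print(find(200))   # -> None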
Secondary Index: An index that is not a primary index is called a secondary index; that is, an index on a set of fields that does not include the primary key is called a secondary index. A secondary index on a candidate key looks just like a dense primary index, except that the records pointed to by successive values in the index are not stored sequentially. In general, secondary indices are different from primary indices. If the search key of a primary index is not a candidate key, it suffices for the index to point to the first record with a particular value for the search key, since the other records can be fetched by a sequential scan of the file.
A secondary index must contain pointers to all the records, because if the search key of a secondary index is not a candidate key, it is not enough to point to just the first record with each search-key value; the records are ordered by the search key of the primary index, so records with the same secondary search-key value could be anywhere in the file.
[Figure: a secondary index on the balance field of the account file, using an extra level of indirection; buckets of record pointers group the accounts with the same balance (e.g. 4000, 3400, 2240, 4500, 4050, 4200).]
The above fig. shows the structure of a secondary index that uses an extra level of indirection on the account file, on the search key balance.
A sequential scan in primary index order is efficient because the records in the file are stored physically in the same order as the index order. We cannot store a file physically ordered both by the search key of the primary index and by the search key of a secondary index. Because secondary-key order and physical-key order differ, if we attempt to scan the file sequentially in secondary-key order, the reading of each record is likely to require the reading of a new block from disk. If a secondary index were to store only some of the search-key values, records with intermediate search-key values might be anywhere in the file and, in general, we could not find them without searching the entire file. Secondary indices must therefore be dense, with an index entry for every search-key value and a pointer to every record in the file; they cannot be sparse.
Secondary indices improve the performance of queries that use keys other than the search key of the primary index. However, they also impose a significant overhead on modification of the database. The designer of a database decides which secondary indices are desirable on the basis of an estimate of the relative frequency of queries and modifications.
Index Data Structures: The data entries of an index can be organized in two ways:
1) Hash-based indexing. 2) Tree-based indexing.
1) Hash-Based Indexing: This type of indexing is used to find records quickly, given a search key value.
In this technique, the file records are grouped into buckets of pages. Each bucket consists of a primary page and, possibly, additional pages chained together. In order to determine the bucket for a record, a special function called a hash function is applied to its search key. Given a bucket number, we can obtain the primary page in one or more disk I/O operations.
Record Insertion into the Bucket: Records are inserted into the bucket identified by the hash function, allocating "overflow" pages as needed.
Record Searching: The hash function is used to find, first, the bucket containing the records; then, by scanning the pages in that bucket, the record with a given search key can be found.
If the search condition does not specify the search key value, all the pages in the file need to be scanned.
Record Retrieval: By applying the hash function to the record's search key, the page containing the needed record can be identified and retrieved in one disk I/O.
Consider a student file hashed on the key rno. Applying the hash function to an rno gives the page that contains the needed record. The hash function 'h' uses the last two digits of the binary value of the rno as the bucket identifier. A search key index on marks obtained, i.e. mrks, contains <mrks, rid> pairs as data entries in an auxiliary index file, as shown in the fig. The rid (record id) points to the record whose search key value is mrks.
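Following this example, a minimal sketch of a hash function that uses the last two binary digits of rno as the bucket identifier (the student records are made-up values for illustration):

# Bucket id = last two bits of the binary value of rno, giving 4 buckets.
buckets = {0: [], 1: [], 2: [], 3: []}

def h(rno):
    return rno & 0b11            # last two binary digits of rno

def insert(rno, name, mrks):
    buckets[h(rno)].append((rno, name, mrks))

def find(rno):
    return [rec for rec in buckets[h(rno)] if rec[0] == rno]

for rno, name, mrks in [(15, "A", 90), (18, "B", 72), (22, "C", 65)]:
    insert(rno, name, mrks)
print(h(15), find(15))    # -> 3 [(15, 'A', 90)]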
2) Tree-based Indexing: In tree-based indexing the data entries are arranged in a tree-like structure. The data entries are sorted according to the search key values, and they are arranged in a hierarchy that directs the search to the correct page of data entries.
Examples:
1) Consider student records with search key rno arranged in a tree-structured index. To retrieve a desired record, the nodes on the path from the root (for example A'1, B'1 and a leaf such as L'11, L'12 or L'13) have to be read, and each node read needs one disk I/O.
The lowest (leaf) level contains the data records. Additional records with rno < 19 are added to the left of leaf node L'11, and records with rno > 42 to the right of leaf node L'13.
The root node is where a search starts; the search is then directed to the correct leaf page by the non-leaf pages, which contain node pointers separated by search key values. The data entries with key values smaller than a key value ki are found in the subtree reached through the node pointer to the left of ki, as shown in fig.
2) In order to find the students whose roll numbers lie b/w 19 and 24, the direction of the search is shown in the fig.
Suppose we want to find all the students with roll numbers lying b/w 17 and 40. We first direct the search to the node A'1 and, after analyzing its contents, forward the search to B'1, followed by the leaf node L'11, which actually contains qualifying data entries. The other leaf nodes L'12 and L'13 also contain data entries that fulfil our search criteria. For this, all the leaf pages must be maintained as a doubly linked list.
Thus, L'12 can be fetched using the next pointer on L'11, and L'13 can be obtained using the next pointer on L'12.
Number of disk I/Os = length of the path from the root to a leaf (incurred in the search) + the number of leaf pages containing qualifying data entries. (A simplified sketch of this leaf-level range scan follows.)
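A simplified sketch of the range search over chained leaf pages; the leaf contents, separator keys and names here are illustrative assumptions, not the exact nodes of the figure:

# Simplified tree-index range search: non-leaf level holds separator keys, leaf
# pages are chained by 'next' pointers so a range scan walks sibling leaves.
import bisect

class Leaf:
    def __init__(self, entries):
        self.entries = entries     # sorted (rno, rid) data entries
        self.next = None           # pointer to the right sibling leaf

leaves = [Leaf([(10, 'r10'), (17, 'r17')]),
          Leaf([(19, 'r19'), (24, 'r24')]),
          Leaf([(33, 'r33'), (40, 'r40'), (42, 'r42')])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
separators = [19, 33]              # smallest key of each leaf after the first

def range_search(low, high):
    leaf = leaves[bisect.bisect_right(separators, low)]   # descend to the first leaf
    out = []
    while leaf is not None:                               # then follow next pointers
        for key, rid in leaf.entries:
            if key > high:
                return out
            if key >= low:
                out.append((key, rid))
        leaf = leaf.next
    return out

print(range_search(17, 40))
# -> [(17, 'r17'), (19, 'r19'), (24, 'r24'), (33, 'r33'), (40, 'r40')]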
Closed and Open Hashing: File organization based on the technique of hashing allows us to avoid accessing an index structure; at the same time, hashing also provides a way of constructing indexes. There are two types of hashing techniques:
1) Static / open hashing. 2) Dynamic / closed hashing.
In a hash file organization, we obtain the address of the disk block containing a desired record directly by computing a function on the search key value of the record. In our description of hashing, we shall use the term bucket to denote a unit of storage that can store one or more records. A bucket is typically a disk block, but could be chosen to be smaller or larger than a disk block.
Static Hashing: Static hashing obtains the address of the disk block containing a desired record directly by computing a hash function on the search key value of the record. In static hashing, the number of buckets is static (fixed). The static hashing scheme is illustrated in the fig.
The pages containing the index data can be viewed as a collection of buckets, with one primary page and possibly additional overflow pages per bucket. A file consists of buckets 0 through N - 1, for N buckets. Buckets contain data entries, which can be any of the three choices: k*, <k, rid> pair, <k, rid-list> pair.
To search for a data entry, we apply a hash function 'h' to identify the bucket to which it belongs and then search this bucket.
To insert a data entry, we use the hash function to identify the correct bucket and then put the data entry there. If there is no space for this data entry, we allocate a new overflow page, put the data entry in it, and add the page to the overflow chain of the bucket.
To delete a data entry, we use the hash function to identify the correct bucket, locate the data entry by searching the bucket and then remove it. If this data entry is the last in an overflow page, the overflow page is removed from the overflow chain and added to a list of free pages.
Thus, the number of buckets in a static hashing file is known when the file is created, and the pages can be stored as successive disk pages.
The main problem with static hashing is that the number of buckets is fixed.
If a file shrinks greatly, a lot of space is wasted.
If a file grows a lot, long overflow chains develop, resulting in poor performance.
Dynamic Hashing: The dynamic hashing techniques allow the hash function to be modified dynamically to accommodate the growth or shrinkage of the database, because most databases grow larger over time and static hashing techniques present serious problems in dealing with them.
Thus, if we are using static hashing on such growing databases, we have three options:
1) Choose a hash function based on the current file size. This option will result in performance
degradation as the database grows.
2) Choose a hash function based on predicted size of the file for future. This option will result in the
wastage of space.
3) Periodically reorganize the hash structure in response to file growth.
Thus, using a dynamic hashing technique is the best solution. There are two types:
1. Extensible Hashing Scheme: Uses a directory to support inserts and deletes efficiently, with no overflow pages.
2. Linear Hashing Scheme: Uses a clever policy for creating new buckets and supports inserts and deletes efficiently without the use of a directory.
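A rough sketch of the extensible hashing idea: a directory of bucket pointers is doubled when a full bucket with maximal local depth must split. The class names, the tiny bucket capacity and the use of Python's built-in hash are illustrative assumptions, not the scheme's exact on-disk layout:

# Extensible hashing sketch: directory indexed by the low global_depth bits of h(key).
BUCKET_CAPACITY = 2

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        b0, b1 = Bucket(1), Bucket(1)
        self.directory = [b0, b1]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_CAPACITY:
            bucket.keys.append(key)
            return
        # Bucket overflows: double the directory if needed, then split the bucket.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory * 2
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)                   # retry after the split

    def _split(self, bucket):
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        old_keys, bucket.keys = bucket.keys, []
        # Re-point directory entries whose new distinguishing bit is 1 to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (bucket.local_depth - 1)) & 1:
                self.directory[i] = sibling
        for k in old_keys:                 # redistribute the old keys
            self.directory[self._index(k)].keys.append(k)

h = ExtendibleHash()
for k in [4, 12, 20, 5, 13, 9, 7]:
    h.insert(k)
print(h.global_depth, h.search(13), h.search(99))   # e.g. -> 4 True False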
Comparison of File Organizations: To compare file organizations, we consider the following operations that can be performed on a record:
1) Record Insertion: For inserting a record we need to identify and fetch the relevant page from the disk. The record is then added and the (modified) page is written back to the disk.
2) Record Deletion: It follows the same procedure as record insertion, except that after identifying and fetching the page, the record with the given rid is deleted and the changed page is written back to the disk.
3) Record Scanning: In this, all the file pages must be fetched from the disk and stored in a pool of buffers; the required records are then retrieved.
4) Record Searching Based on Equality Selection: In this, all the records that satisfy a given equality selection criterion are fetched from the disk.
Example: fetch the student record whose roll number (rno) is 15 and whose marks (mrks) are 90.
5) Record Searching Based on Range Selection: In this, all the records that satisfy a given range selection are fetched.
Example: find all the records of the students whose secured marks are greater than 50.
We estimate these costs in terms of the following parameters:
B = the number of pages in the file when the records are grouped into pages with no wasted space.
R = the number of records per page.
D = the average time needed to read or write (R/W) a disk page.
C = the average time needed to process a record.
H = the time needed to apply the hash function to a record.
For calculating I/O costs (which are the basis for the costs of the database operations) we take D = 15 ms and C = H = 100 ns.
Heap Files:
1) Cost of Scanning: The cost of scanning a heap file is B(D + RC). Scanning the R records of each of the B pages at time C per record takes BRC, and reading the B pages at time D per page takes BD; therefore the total cost of scanning is BD + BRC = B(D + RC).
2) Cost of Insertion: The cost to insert a record into a heap file is 2D + C. To insert a record, first we fetch the last page of the file, which takes time D; then we add the record, which takes time C; finally the page is written back to disk from main memory, which takes time D. So, the total cost is D + D + C = 2D + C.
3) Cost of Deletion: The cost to delete a record from a heap file is D + C + D = 2D + C. To delete a record, first the page containing it is fetched (time D), then the record is removed from the page (time C), and finally the modified page is written back to disk (time D).
4) Record Searching Based on an Equality Criterion: If exactly one record matches the equality selection, then on average half the file must be scanned before the record is found.
This takes time = 1/2 x scanning cost = 1/2 x B(D + RC).
If multiple records may match, the entire file needs to be scanned.
5) Record Searching with a Range Selection: The cost is the same as the cost of scanning, because it is not known in advance how many records satisfy the particular range. Thus, we need to scan the entire file, which takes B(D + RC). (A quick numerical illustration of these heap-file formulas follows.)
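Plugging the stated parameters (D = 15 ms, C = H = 100 ns) into the heap-file formulas above; the values of B and R are made-up figures for illustration:

# Heap-file cost formulas evaluated for an illustrative file of
# B = 100 pages with R = 50 records per page.
D = 15e-3        # average time to read/write a disk page (seconds)
C = 100e-9       # average time to process a record (seconds)
B, R = 100, 50   # made-up file size

scan      = B * (D + R * C)          # B(D + RC)
insert    = 2 * D + C                # 2D + C
delete    = 2 * D + C                # D + C + D
equality  = 0.5 * B * (D + R * C)    # on average half the file is scanned
range_sel = B * (D + R * C)          # the whole file must be scanned

print("scan:", scan, "insert:", insert, "equality:", equality)
# roughly: scan ~ 1.5 s, insert ~ 0.03 s, equality ~ 0.75 s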
Sorted Files:
1) Cost of Scanning: The cost of scanning a sorted file is B(D + RC), because all the pages need to be scanned in order to retrieve all the records; i.e. the cost of scanning a sorted file = the cost of scanning a heap file.
2) Cost of Insertion: The cost of insertion in a sorted file is given by search cost + B(D + RC).
It includes finding the correct position of the record, adding the record, and fetching and rewriting the pages that follow it.
3) Cost of Deletion: The cost of deletion in a sorted file is given by search cost + B(D + RC).
It includes searching for the record, removing the record and rewriting the modified pages.
Note: The record to be deleted is specified by an equality condition.
4) Cost of Searching with an Equality Selection Criterion: For a sorted file this cost is D log2 B, the time required to perform a binary search for the page that contains the record.
If many records qualify, the cost is D log2 B + C log2 R + the cost of sequentially reading all the qualifying records.
5) Cost of Searching with a Range Selection: It is the same as an equality search with several qualifying records: the first matching record is located by binary search, and the remaining qualifying records are then sequentially fetched.
Clustered Files:
1) Cost of Scanning: The cost of scanning a clustered file is computed the same way as for a sorted file, except that the file occupies a larger number of pages (about 1.5B, since the pages are typically left about two-thirds full). Scanning the pages at time D per page and the R records of each page at time C per record gives a total cost of 1.5B(D + RC).
2) Cost of Insertion: The cost of insertion in a clustered file is search + write = (D logF 1.5B + C log2 R) + D.
3) Cost of Deletion: It is the same as the cost of insertion and includes the cost of searching for the record, removing the record and rewriting the modified page, i.e. D logF 1.5B + C log2 R + D.
4) Equality Selection Search:
i) For a single qualifying record: The cost of finding a single qualifying record in a clustered file is the sum of the cost of finding the first page, D logF 1.5B, and the cost of finding the first matching record within the page, C log2 R, i.e. D logF 1.5B + C log2 R.
ii) For several qualifying records: If more than one record satisfies the selection criterion, the qualifying records are assumed to be located consecutively.
The cost required to find them is D logF 1.5B + C log2 R + the cost involved in sequentially reading all the qualifying records.
5) Range Selection Search: This cost is the same as that of an equality search with several matching records.
Heap File with Un-clustered Tree Index:
1) Scanning: For scanning a student file,
i) scan the index's leaf level,
ii) get the relevant record from the file for each data entry,
iii) obtain the data records sorted according to <rno, mrks>.
The cost of reading all the data entries is 0.15B(D + 6.7RC) I/Os; in addition, for each data entry a record has to be fetched from the file in one I/O.
2) Insertion: The record is first inserted in the students heap file at cost 2D + C, together with the associated entry in the index: the correct leaf page can be found in D logF 0.15B + C log2 6.7R, followed by the addition of the new entry and rewriting of the page in D.
5) Range Selection Search: This is the same as a range selection search in clustered files, except that the index pages contain data entries rather than data records, so the corresponding record must be fetched for each qualifying data entry.
Heap File with Un-clustered Hash Index:
1) Scan: The total cost is the sum of the cost of retrieving all data entries and one I/O cost for each data record. It is given as 0.125B(D + BRC) + BR(D + C).
2) Insertion: It involves the cost of inserting a record, i.e. 2D + C, in the heap file, plus the cost of finding the page, adding a new entry and rewriting the page; it is expressed as 2D + C + (H + 2D + C).
3) Deletion: It involves the cost of finding the data record and the data entry at H + 2D + 4RC and writing back the changed pages to the index and the file at 2D. The total cost is (H + 2D + 4RC) + 2D.
4) Equality Selection Search: The total cost of the search accounts for:
i) The page containing the qualifying entries is identified at cost H.
ii) Retrieval of that page, assuming it is the only page in the bucket, occurs at D.
iii) The cost of finding an entry after scanning half the records on the page is 4RC.
iv) Fetching the record from the file is D.
The total cost is H + D + 4RC + D = (H + 2D + 4RC).
In case of many matched records the cost is H + D + 4RC + one I/O for each record that qualifies.
5) Range Selection Search: The cost of this is B(D + RC).
File Organization                       Advantages                                        Disadvantages
1) Heap file                            Good storage efficiency; rapid scanning;          Slow searches; slow deletion.
                                        insertion is fast.
2) Sorted file                          Good storage efficiency; search is faster          Insertion is slow; slow deletion.
                                        than in a heap file.
3) Clustered file                       Good storage efficiency; fast searches;            Space overhead.
                                        efficient insertion and deletion.
4) Heap file with un-clustered          Fast insertion, deletion and searching.            Scanning and range searches are slow.
   tree index
5) Heap file with un-clustered          Fast insertion, deletion and searching.            Doesn't support range searches.
   hash index
Dangling Pointer: A dangling pointer is a pointer that does not point to a valid object of the appropriate type. Dangling pointers arise when an object is deleted or de-allocated without modifying the value of the pointer, so that the pointer still points to the memory location of the de-allocated object.
In an object-oriented database, a dangling pointer occurs if we move or delete a record to which another record contains a pointer; that pointer no longer points to the desired record.
Detecting Dangling Pointers in Object-Oriented Databases: Mapping objects to files is similar to mapping tuples to files in a relational system; object data can be stored using file structures. Objects are identified by an object identifier (OID), and the storage system needs a mechanism to locate an object given its OID.
Logical identifiers do not directly specify an object's physical location; the system must maintain an index that maps an OID to the object's actual location.
Physical identifiers encode the location of the object, so the object can be found directly. Physical OIDs have the following parts:
1) A volume or file identifier.
2) A page identifier within the volume or file.
3) An offset within the page.
Work Load Impact: Data entries that qualify under a particular selection criterion can be retrieved effectively by means of indexes. Two selection types are:
1) Equality. 2) Range selection.
1) Equality: An equality query on a composite search key is one in which each field of the search key is bound to a constant.
For example, data entries in a student file where rno = 15 and mrks = 90 can be retrieved by using an equality query.
2) Range Selection: A range query retrieves all data entries whose search key values fall within a given range, for example all students whose mrks are greater than 50.
Thus, tree-based indexing supports both selection criteria (equality and range), as well as inserts, deletes and updates, whereas hash-based indexing supports only equality selection, apart from insertion, deletion and updation.
Disadvantages: When the file pages are stored in accordance with the disk's order, sequential retrieval of those pages is quick; with tree-structured indexes the pages need not lie in sequential order on disk, so such fast sequential retrieval is not always possible.