4th Module DBMS Notes
Denormalization results in a lower normal form; for example, a relation in 3NF may be
converted to 2NF through denormalization.
However, the price we pay for increased performance through denormalization is
greater data redundancy.
Data should be consistent throughout the database, i.e. it should not suffer from the
following anomalies.
UPDATE ANOMALIES:
If one copy of such repeated data is updated, an inconsistency is created unless all
copies are similarly updated.
INSERTION ANOMALIES:
It may not be possible to store certain information unless some other, unrelated,
information is stored as well.
DELETION ANOMALIES:
It may not be possible to delete certain information without losing some other,
unrelated, information as well.
The key for Hourly_Emps is ssn. In addition, suppose that the hourly_wages attribute
is determined by the rating attribute. That is, for a given rating value, there is only
one permissible hourly_wages value. This IC is an example of a functional
dependency. It leads to possible redundancy in the relation Hourly_Emps, as
illustrated in Figure 19.1.
• Redundant Storage: The rating value 8 corresponds to the hourly wage 10, and this
association is repeated three times.
• Update Anomalies: The hourly wages in the first tuple could be updated without making a
similar change in the second tuple.
• Insertion Anomalies: We cannot insert a tuple for an employee unless we know the hourly
wage for the employee's rating value.
• Deletion Anomalies: If we delete all tuples with a given rating value (e.g., we delete the
tuples for Smethurst and Guldu), we lose the association between that rating value and its
hourly_wages value.
The Normalization process: We use normalization to produce a set of normalized tables to
store the data that will be used to generate the required information.
The objective of normalization is to ensure that each table conforms to the concept of well-
formed relations, that is, tables that have the following characteristics:
Each table represents a single subject. For example, a course table will contain only
data that directly pertains to courses. Similarly, a student table will contain only
student data.
No data item will be unnecessarily stored in more than one table (in short, tables have
minimum controlled redundancy). The reason for this requirement is to ensure that the
data are updated in only one place.
All nonprime attributes in a table are dependent on the primary key—the entire
primary key and nothing but the primary key. The reason for this requirement is to
ensure that the data are uniquely identifiable by a primary key value.
To accomplish this objective, the normalization process takes you through steps that
lead to successively higher normal forms. The most common normal forms and their
basic characteristics are listed in Table.
The designer must take care to place the right attributes in the right tables by using
normalization principles.
Both current and historical data must be maintained properly and correctly.
Normalization is the process of removing redundant data from your tables in order to
improve storage efficiency, data integrity and scalability. This improvement is balanced against
an increase in complexity and potential performance losses from the joining of the normalized
tables at query-time.
There are two goals of the normalization process:
Eliminating redundant data (for example, storing the same data in more than one table)
Ensuring data dependencies make sense (only storing related data in a table).
Both of these are worthy goals as they reduce the amount of space a database consumes and
ensure that data is logically stored.
Normalization is the aim of a well-designed Relational Database Management System
(RDBMS). It is a step-by-step set of rules by which data is put into its simplest form.
We normalize the relational database management system because of the following reasons:
1. To minimize data redundancy, i.e. no unnecessary duplication of data.
2. To make database structure flexible i.e. it should be possible to add new data values and
rows without reorganizing the database structure.
Data should be consistent throughout the database i.e. it should not suffer from following
anomalies.
Insert Anomaly - Occurs when data cannot be inserted because other data is missing;
in particular, null values in key attributes must be avoided. This kind of anomaly
can seriously damage a database.
Update Anomaly - Caused by data redundancy, i.e. multiple occurrences of the same value
in a column; updating one occurrence but not the others leaves the data inconsistent.
Deletion Anomaly - Deleting a row causes loss of data for rows that are not stored
elsewhere. It could result in loss of vital data.
Complex queries required by the user should be easy to handle.
When a relation is decomposed into smaller relations with fewer attributes during
normalization, the resulting relations, whenever joined, must reproduce the original
relation without any extra rows; the join operations can be performed in any order. This is
known as Lossless Join decomposition.
The resulting relations (tables) obtained on normalization should possess properties
such as: each row is identified by a unique key, no repeating groups, homogeneous
columns, each column is assigned a unique name, etc.
Functional Dependency
Functional dependency is a relationship that exists when one attribute uniquely determines
another attribute.
If R is a relation with attributes X and Y, a functional dependency between the attributes is
represented as X->Y, which specifies Y is functionally dependent on X. Here X is a determinant
set and Y is a dependent attribute. Each value of X is associated with precisely one Y value.
Functional dependency in a database serves as a constraint between two sets of attributes.
Defining functional dependencies is an important part of relational database design and
underpins normalization.
For example:
If X->Y and t1.X = t2.X, then t1.Y should be equal to t2.Y, i.e. t1.Y = t2.Y, where t1 and
t2 are tuples of the relation.
The table given below shows an instance where AB->C.

A  B  C  D
a1 b1 c1 d1
a1 b1 c1 d2
a1 b2 c2 d1
a2 b1 c3 d1

Full Functional Dependency: In a relation, there exists a Full Functional Dependency
between two attribute sets X and Y when Y is functionally dependent on X but is not
functionally dependent on any proper subset of X.
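The condition t1.X = t2.X ⇒ t1.Y = t2.Y can be checked mechanically on an instance. A minimal Python sketch, applied to the AB->C table above (the `holds` helper is illustrative, not from the notes):

```python
# Relation instance from the table above, as a list of dicts.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d1"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d1"},
]

def holds(rows, X, Y):
    """True iff the FD X -> Y holds in this instance: whenever two
    tuples agree on all attributes of X, they also agree on Y."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

print(holds(rows, ["A", "B"], ["C"]))  # True:  AB -> C holds
print(holds(rows, ["A", "B"], ["D"]))  # False: AB does not determine D
```

Note that an instance can only refute an FD; holding in one instance does not prove the FD holds for the schema.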
Partial Functional Dependency: In a relation, there exists Partial Dependency, when a non
prime attribute (the attributes which are not a part of any candidate key ) is functionally
dependent on a proper subset of Candidate Key.
For example: Let there be a relation R(Course, Sid, Sname, Fid, Schedule, Room, Marks)
Full Functional Dependencies: {Course, Sid} -> Sname, {Course, Sid} -> Marks, etc.
Partial Functional Dependencies: Course -> Schedule, Course -> Room
Closure set of Attributes:
The set of all those attributes which can be functionally determined from an attribute set is
called as a closure of that attribute set.
Closure of attribute set {X} is denoted as {X}+.
Inference rules:
These are the rules to find additional functional dependencies:
1. Reflexive property: If X⊇Y (X is a superset of Y), then X->Y.
DB+= {DB}
EB+= {EB}
FB+= {FBE}
GB+= {GBACD}
As AB, CB, DB, EB, FB, GB cannot determine all the attributes, we must extend these
sets and compute the closures of larger attribute sets.
ACB+= {ACBDG}
CFB+= {CFBGEAD}
CGB+= {CGBAD}
CDB+= {CDBGA}
CEB+= {CEBGADF}
AFB+= {AFBCDEG}
AGB+= {AGBCD}
AEB+= {AEBCDFG}
ADB+= {ADBCG}
Out of these, AFB, AEB, CFB and CEB are candidate keys.
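Closures like the ones above can be computed mechanically. A minimal sketch of the closure algorithm; since the FD set used for R(A..G) is not reproduced in these notes, it is shown here with the FD set F = {A->C, AC->D, E->AD, E->H} from the Equivalence section:

```python
def closure(attrs, fds):
    """Compute the attribute closure {X}+ under a list of FDs,
    where each FD is a pair (lhs, rhs) of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply any FD whose left side is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FD set F from the Equivalence section (for illustration).
F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
print(sorted(closure({"E"}, F)))  # ['A', 'C', 'D', 'E', 'H'], i.e. E+ = {EAHCD}
```

An attribute set X is a candidate key exactly when {X}+ contains every attribute of the relation and no proper subset of X does.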
Refer to First Normal Form before Second Normal Form.
2NF
To Avoid Partial Dependencies
A relation schema R is in 2NF if every non-prime attribute is fully
functionally dependent on every candidate key, i.e. no non-prime attribute
depends on a proper subset of any candidate key.
A functional dependency X->Y is a partial dependency if some attribute A ∈
X can be removed from X and the dependency still holds.
An attribute that is not part of any candidate key is known as a non-prime
attribute.
RULES FOR 2NF
Table should be in 1NF
All non-key attributes are fully functionally dependent on the primary key
(a table with a single-column primary key satisfies this automatically)
Example: Suppose a school wants to store the data of teachers and the subjects they teach.
They create a table that looks like this (since a teacher can teach more than one subject,
the table can have multiple rows for the same teacher):

Teacher_id Subject   Teacher_age
111        Maths     38
111        Physics   38
222        Biology   38
333        Physics   40
333        Chemistry 40

The candidate key is {Teacher_id, Subject}, but Teacher_age depends on Teacher_id alone,
a partial dependency. To bring the table into 2NF we split it into two tables.

teacher_details table:
Teacher_id Teacher_age
111        38
222        38
333        40
teacher_subject table:
Teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
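The partial dependency that forces this split can be detected with attribute closures. A hypothetical sketch (attribute names follow the tables above; the FD list is an assumption encoding "teacher_age is determined by teacher_id alone"):

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure {X}+ under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_deps(key, nonprime, fds):
    """Return (proper subset of key, non-prime attribute) pairs where the
    subset already determines the attribute -- i.e. partial dependencies."""
    found = []
    for r in range(1, len(key)):
        for subset in combinations(sorted(key), r):
            for a in closure(set(subset), fds) & nonprime:
                found.append((set(subset), a))
    return found

# Assumed FDs: the whole key determines everything, and teacher_id
# alone determines teacher_age (the partial dependency).
fds = [({"teacher_id", "subject"}, {"teacher_age"}),
       ({"teacher_id"}, {"teacher_age"})]
print(partial_deps({"teacher_id", "subject"}, {"teacher_age"}, fds))
# [({'teacher_id'}, 'teacher_age')] -- a partial dependency, so not in 2NF
```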
3NF
Table must be in 2NF
Transitive functional dependencies of non-prime attributes on any super key should be
removed
When an indirect relationship causes a functional dependency, it is called a Transitive
Dependency.
OR
RULES
A table is in 3NF if it is in 2NF and, for each functional dependency X->Y, at least one of
the following conditions holds:
X is a super key of the table
Y is a prime attribute of the table
Example: Suppose a company wants to store the complete address of each employee, they
create a table named employee_details that looks like this:
employee_PIN table:
BCNF
A relation schema R is in BCNF if, whenever a non-trivial functional dependency X->A
holds in R, X is a super key of R; i.e. the determinant of every functional
dependency must be a super key.
A table complies with BCNF if it is in 3NF and, for every functional dependency X->Y, X
is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:
The table is not in BCNF, as neither emp_id nor emp_dept alone is a super key.
To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:
Emp_id Emp_nationality
1001   Indian
1002   American
emp_dept table:
dept_mapping table:
Emp_id Emp_dept
1001 Production
1001 Stores
1002 Designing
1002 Purchase
Functional dependencies:
emp_id->emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as in both functional dependencies the determinant (left side) is a
key.
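The BCNF test ("every determinant is a super key") can be automated with attribute closures. A sketch over the example above; `dept_no_of_emp` is an assumed spelling for the garbled column name in the notes:

```python
def closure(attrs, fds):
    """Attribute closure {X}+ under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(R, fds):
    """R is in BCNF iff the determinant of every non-trivial FD is a super key."""
    return all(rhs <= lhs or closure(lhs, fds) == set(R) for lhs, rhs in fds)

# The original (pre-decomposition) table and its FDs.
R = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
fds = [({"emp_id"}, {"emp_nationality"}),
       ({"emp_dept"}, {"dept_type", "dept_no_of_emp"})]
print(is_bcnf(R, fds))  # False: neither determinant is a super key

# The decomposed emp_nationality table, by contrast, is in BCNF:
print(is_bcnf({"emp_id", "emp_nationality"},
              [({"emp_id"}, {"emp_nationality"})]))  # True
```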
4th Normal Form
Rules for 4th Normal Form
For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. For a relation R(A, B, C), if there is a multi-valued dependency between A and B, and
between A and C, then B and C should be independent of each other.
Example 2: consider a table with Subject, the Lecturer who teaches each subject, and the
recommended Books for each subject.
If we observe the data in the table above, it satisfies 3NF.
After decomposition of the table into 4NF we get two tables: one with (SUBJECT,
LECTURER) and one with (SUBJECT, BOOKS).
EQUIVALENCE
Two sets of functional dependencies F and G are said to be equivalent if F covers G and G covers F.
Example 1:
F:{A->C,AC->D,E->AD,E->H}
G:{A->CD,E->AH}
In F:
A+= {ACD}
AC+= {ACD}
E+= {EAHCD}
F covers G.
In G:
A+= {ACD}
E+= {EADHC}
G covers F.
Hence F and G are equivalent.
Example 2:
F:{A->B,B->C,C->D}
G:{A->BC,C->D}
In F:
A+= {ABCD}
C+= {CD}
F covers G.
In G:
A+= {ABCD}
B+= {B}
G does not cover F (B->C cannot be inferred from G).
Hence F and G are not equivalent.
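The cover test reduces to closures: F covers G iff, for every FD X->Y in G, Y is contained in {X}+ computed under F. A sketch, applied to Example 2 (the `covers` helper is illustrative):

```python
def closure(attrs, fds):
    """Attribute closure {X}+ under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(F, G):
    """F covers G iff every FD in G is inferable from F (rhs ⊆ lhs+ under F)."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in G)

F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
G = [({"A"}, {"B", "C"}), ({"C"}, {"D"})]
print(covers(F, G))  # True:  under F, A+ = {A,B,C,D} ⊇ {B,C}
print(covers(G, F))  # False: under G, B+ = {B}, so B -> C is not inferable
print(covers(F, G) and covers(G, F))  # False: F and G are not equivalent
```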
MINIMAL COVER:
We say that a set of functional dependencies F covers another set of functional dependencies
G, if every functional dependency in G can be inferred from F. More formally, F covers G if
G+ ⊆ F+. F is a minimal cover of G if F is the smallest set of functional dependencies that
cover G. We won’t prove it, but every set of functional dependencies has a minimal cover.
Also, note that there may be more than one minimal cover.
STEPS TO FIND A MINIMAL COVER:
1. Put the FDs in a standard form: split the FDs using the decomposition rule so that
each RHS contains a single attribute.
2. Find the redundant FDs and delete them from the set.
3. Find the redundant attributes on the LHS and delete them.
EXAMPLE 1
Let us apply these steps to F = {A->B, C->B, D->ABC, AC->D}.
STEP 1: The right-hand side (RHS) should contain a single attribute, so decompose
D->ABC as follows:
D->A, D->B, D->C
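The whole procedure can be combined into a small minimal-cover routine built on closures. A sketch, applied to the example F above (it performs LHS reduction before dropping redundant FDs, the order that is safe in general):

```python
def closure(attrs, fds):
    """Attribute closure {X}+ under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Standard form: a single attribute on every RHS.
    F = [(set(l), {a}) for l, r in fds for a in r]
    # LHS reduction: drop an LHS attribute if the rest still determines the RHS.
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(F):
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, F):
                    F[i] = (lhs - {a}, rhs)
                    changed = True
                    break
    # Remove redundant FDs: drop an FD if it follows from the remaining ones.
    i = 0
    while i < len(F):
        lhs, rhs = F[i]
        rest = F[:i] + F[i + 1:]
        if rhs <= closure(lhs, rest):
            F = rest
        else:
            i += 1
    return F

fds = [({"A"}, {"B"}), ({"C"}, {"B"}), ({"D"}, {"A", "B", "C"}), ({"A", "C"}, {"D"})]
for lhs, rhs in minimal_cover(fds):
    print(sorted(lhs), "->", sorted(rhs))
# Minimal cover: {A->B, C->B, D->A, D->C, AC->D}; D->B is redundant
# because D->A and A->B already imply it.
```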
● A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining
should be lossless.
● 5NF is satisfied when all the tables are broken into as many tables as possible in order
to avoid redundancy. (till each table contains two columns).
● 5NF is also known as Project-join normal form (PJ/NF)
● If we can decompose table further to eliminate redundancy and anomaly, and when
we re-join the decomposed tables by means of candidate keys, we should not be
losing the original data or any new record set should not arise. In simple words,
joining two or more decomposed table should not lose records nor create new records.
● A relation R is in 5NF if and only if every join dependency in R is implied by the
candidate keys of R. A relation decomposed into two relations must have loss-less join
Property, which ensures that no spurious or extra tuples are generated, when relations
are reunited through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies following conditions:
1. R should be already in 4NF.
2. It cannot be further non loss decomposed (join dependency)
● If you divide a table in such a way that it has only two columns, then it does not have
any redundancy; and if joining the decomposed tables re-forms the original table, then
(and only then) we have a lossless decomposition.
Example:
A B C
1 2 1
2 2 2
3 3 2
Fig(a)
A B
1 2
2 2
3 3
Fig (b)
B C
2 1
2 2
3 2
Fig(c)
In fig(a), A determines B and C, so A is a candidate key. We decompose the original table
into fig(b) and fig(c). In fig(b), A is the candidate key; in fig(c), B is the candidate
key, and B is the common attribute of the two tables. If we perform a natural join on the
two tables we should get the original table back, but in this example we find invalid
tuples/unnecessary data not present in the original table, which leads to a lossy
decomposition. (In a lossy decomposition, after performing the natural join we get invalid
tuples without losing actual tuples.)
Natural join for fig(b) and (c):
A B C
1 2 1
1 2 2
2 2 1
2 2 2
3 3 2
A B
1 2
2 2
3 3
fig(e)
A C
1 1
2 2
3 2
fig(f)
If we perform natural join on fig(e) & fig(f) then we get the original table fig(a).
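The lossy and lossless joins above can be reproduced with a tiny natural-join routine; fig(e) has the same contents as fig(b), so it is reused here:

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts:
    pair up tuples that agree on all common attributes."""
    common = set(r1[0]) & set(r2[0])
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]

fig_a = [{"A": 1, "B": 2, "C": 1},
         {"A": 2, "B": 2, "C": 2},
         {"A": 3, "B": 3, "C": 2}]
fig_b = [{k: t[k] for k in ("A", "B")} for t in fig_a]  # projection on (A, B)
fig_c = [{k: t[k] for k in ("B", "C")} for t in fig_a]  # projection on (B, C)
fig_f = [{k: t[k] for k in ("A", "C")} for t in fig_a]  # projection on (A, C)

# Joining fig(b) and fig(c) on B yields 5 tuples -- two spurious, so lossy:
print(len(natural_join(fig_b, fig_c)))  # 5
# Joining fig(e)=(A,B) and fig(f)=(A,C) on the key A recovers fig(a) exactly:
print(natural_join(fig_b, fig_f) == fig_a)  # True
```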
Rules to be followed for decomposing:
Let R be the table and is decomposed into R1 and R2
1. R1 U R2 → R: when R1 U R2 is performed we should get R, i.e. all attributes should be
covered.
2. R1 Ո R2 ≠ ɸ: when R1 Ո R2 is computed it should not be empty, i.e. there should exist a
common attribute.
3. R1 Ո R2 → R1 (or) R1 Ո R2 → R2, i.e. the common attributes obtained from the
intersection should form a candidate key of at least one of the decomposed relations.
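Rule 3 can be checked with a closure computation: a binary decomposition is lossless iff the common attributes functionally determine all of R1 or all of R2. A sketch, applied to the fig(a) example where A -> BC:

```python
def closure(attrs, fds):
    """Attribute closure {X}+ under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless(R1, R2, fds):
    """Binary decomposition test: R1 ∩ R2 must determine all of R1
    or all of R2 (i.e. be a key of one of the two relations)."""
    common = R1 & R2
    if not common:
        return False  # rule 2: a common attribute must exist
    c_plus = closure(common, fds)
    return R1 <= c_plus or R2 <= c_plus

fds = [({"A"}, {"B", "C"})]  # in fig(a), A is the candidate key
print(lossless({"A", "B"}, {"B", "C"}, fds))  # False: B+ = {B}, lossy split
print(lossless({"A", "B"}, {"A", "C"}, fds))  # True:  A+ = {A,B,C}, lossless
```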
If the same value appears in the rating column of two tuples, the IC tells us that the same
value must appear in the hourlywages column as well. This redundancy has the same
negative consequences as before:
Redundant Storage: The rating value 8 corresponds to the hourly wage 1000, and this
association is repeated three times.
Update anomalies: The hourlywages in the first tuple could be updated without
making a similar change in the second tuple.
• Insertion anomalies: We cannot insert a tuple for an employee unless we know the
hourlywage for the employee's rating value.
• Deletion anomalies: If we delete all tuples with a given rating value (e.g., we delete the
tuples for c and d), we lose the association between that rating value and its hourlywage
value.
CONSTRAINTS ON RELATIONSHIP SET:
Suppose that we have entity sets Parts, Suppliers and Departments, as well as a
relationship set Contracts that involves all of them. A contract with contract id cid
specifies that a supplier S will supply some quantity Q of a part P to a department D;
we refer to this schema as CQPSD.
Consider a policy that a department purchases at most one part from any given supplier.
Therefore, if there are several contracts between the same supplier and department, we
know that the same part must be involved in all of them. This constraint is an FD :DS ->P
Again we have redundancy and its associated problems. We can address this situation by
decomposing Contracts into two relations with attributes CQSD and SDP. Intuitively, the
relation SDP records the part supplied to a department by a supplier, and the relation
CQSD records additional information about a contract.
IDENTIFYING ATTRIBUTES OF ENTITIES:
The ER diagram in Figure 19.11 shows a relationship set called Works_In with an
additional key constraint indicating that an employee can work in at most one department.
(Observe the arrow connecting Employees to Works_In.)
Using the key constraint, we can translate this ER diagram into two relations:
Workers(ssn, name, lot, did, since)
Departments(did, dname, budget)
The entity set Employees and the relationship set Works_In are mapped to a single
relation, Workers.
Suppose that employees are assigned parking lots based on their department, and that all
employees in a given department are assigned the same lot. This constraint is not
expressible in the ER diagram; it is an example of the FD did -> lot.
The redundancy in this design can be eliminated by decomposing the Workers relation
into two relations:
Workers(ssn,name,did,since)
Dept_lots(did,lot)
Now the two relations Departments and Dept_lots have the same key (did) and describe the
same entity, so they can be combined:
Workers(ssn,name,did,since)
Departments(did,dname,budget,lot)
Reserves(sid,bid,day)
In addition, let us add an attribute C denoting the credit card to which the reservation
is charged.
Suppose every sailor uses a unique credit card for reservations; this constraint is
expressed by the FD sid -> creditcard. We store the credit card number of a sailor as
often as we have reservations for that sailor; this leads to redundancy and update
anomalies. A solution is to decompose Reserves into two relations with attributes
(sid, bid, day) and (sid, creditcard): one holds information about reservations, and the
other holds information about credit cards.
Denormalization:
● Denormalization is a strategy that database managers use to increase the performance
of a database infrastructure. It involves adding redundant data to a normalized
database to reduce certain types of problems with database queries that combine data
from various tables.
● In a traditional normalized database we store data in separate logical tables and
attempt to minimize redundant data; we may strive to have only one copy of each
piece of data in the database.
For example, in a normalized database we might have a Courses table and a Teachers
table. Each entry in Courses would store the teacherID for a course but not the
teacherName. When we need to retrieve a list of all courses with the teacher name, we
would do a join between these two tables. In some ways this is great: if a teacher
changes his/her name, we only have to update the name in one place. The drawback is that
if the tables are large, we may spend an unnecessarily long time doing joins on tables.
Pros of denormalization:
1. Retrieving data is faster since we do fewer joins.
2. Queries to retrieve data can be simpler (and therefore less likely to have bugs), since
we need to look at fewer tables.
Cons of denormalization:
1. Updates and inserts are more expensive.
2. Denormalization can make update and insert code harder to write.
3. Data may be inconsistent; identifying which is the "correct" value for a piece of data
is difficult.
4. Data redundancy necessitates more storage.