DBMS (R20) Unit - 4
Syllabus:
Schema Refinement (Normalization): Purpose of normalization or schema refinement,
concept of functional dependency, normal forms based on functional dependency (1NF, 2NF
and 3NF), concept of surrogate key, Boyce-Codd normal form (BCNF), lossless join and
dependency preserving decomposition, Fourth Normal Form (4NF), Fifth Normal Form (5NF).
Objectives:
After studying this unit, you will be able to:
Discuss the different types of anomalies in a database
State what a functional dependency is
List the different forms of normalization
Differentiate among different types of normalization
DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION
Schema refinement means refining a schema by applying some technique to it. The main
technique of schema refinement is decomposition.
Normalization means "splitting tables into smaller tables with fewer attributes, in such a way
that the table design does not suffer from insertion, deletion or update anomalies and
avoids redundancy".
Normalization or Schema Refinement is a technique of organizing the data in the database.
It is a systematic approach of decomposing tables to eliminate data redundancy and
undesirable characteristics like Insertion, Update and Deletion Anomalies.
Redundancy: Repetition of the same data, i.e., duplicate copies of the same data stored in
different locations.
Anomalies are the problems that occur in poorly planned, unnormalized databases
where all the data is stored in one table, which is sometimes called a flat-file
database. Let us consider such a schema.
Here all the data is stored in a single table, which causes redundancy of data and hence
anomalies, because SID and Sname are repeated for the same CID. Let us discuss the anomalies
one by one.
Redundant data can cause the following problems:
1. Insertion anomalies: It may not be possible to store some information unless some other
information is stored as well.
2. Redundant storage: Some information is stored repeatedly.
3. Update anomalies: If one copy of redundant data is updated, inconsistency is created
unless all redundant copies of the data are updated.
4. Deletion anomalies: It may not be possible to delete some information without losing some
other information as well.
Update anomaly: If the fee changes from 5000 to 7000, the FEE column must be updated in
every row that stores that fee; otherwise the data becomes inconsistent.
Insertion and deletion anomalies: These anomalies exist only because of redundancy;
otherwise they do not occur.
Insertion anomaly: A new course C4 is introduced, but no student has taken C4 yet.
To insert the course data, we are forced to insert some other, dummy student data.
Deletion anomaly:
Deleting student S3 also deletes the course details stored with S3. Deleting some data forces us
to delete some other useful data.
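The update and deletion anomalies can be reproduced on a small flat table. This is a minimal Python sketch; the rows (SID, Sname, CID, FEE) are hypothetical, since the original table is not reproduced here, and the helper names `rows_to_update` and `courses_lost_on_delete` are illustrative only.

```python
# Hypothetical flat table (SID, Sname, CID, FEE): the course fee is repeated
# in every row of a student taking that course.
rows = [
    ("S1", "Ravi", "C1", 5000),
    ("S2", "Sita", "C1", 5000),
    ("S1", "Ravi", "C2", 6000),
    ("S3", "Kiran", "C3", 7000),
]


def rows_to_update(rows, cid):
    """Update anomaly: count the rows that must change when one course fee changes."""
    return sum(1 for sid, sname, course, fee in rows if course == cid)


def courses_lost_on_delete(rows, sid):
    """Deletion anomaly: courses that vanish when one student's rows are deleted."""
    before = {course for _, _, course, _ in rows}
    after = {course for s, _, course, _ in rows if s != sid}
    return before - after


print(rows_to_update(rows, "C1"))          # 2
print(courses_lost_on_delete(rows, "S3"))  # {'C3'}
```

The insertion anomaly shows up the same way: a row for the new course C4 would need some SID value, so without a student we would have to invent a dummy one.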
Purpose of Normalization:
Advantages of Normalization:
1. Greater overall database organization is gained.
2. The amount of unnecessary redundant data is reduced.
3. Data integrity is easily maintained within the database.
4. The database and application design processes are much more flexible.
5. Security is easier to maintain or manage.
Disadvantages of Normalization:
1. Normalization produces many tables with a relatively small number of columns. These
tables then have to be joined using their primary key/foreign key relationships.
2. This has two disadvantages:
Performance: all the joins required to merge data slow down processing and place
additional stress on the hardware.
Complex queries: developers have to write complex queries in order to merge
data from different tables.
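Case 1: A → B
Here A1 is associated with both B1 and B2, so a value of A does not determine a unique value
of B. Hence the FD A → B does not hold.
Case 2: A → C
Here A1 → C1, and A2 and A3 → C2, so every value of A determines exactly one value of C.
Hence the FD A → C holds.
Note: In general, X → Y holds if any two tuples that agree on X also agree on Y. Check the
remaining possibilities in the same way, i.e., A → D, B → C, B → D and C → D.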
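The check used in the two cases can be written as a short Python sketch. The relation instance below is hypothetical (chosen to match the two cases described above), and `holds_fd` is an illustrative helper. Keep in mind that one instance can only disprove an FD; an FD that holds in a sample instance is not thereby guaranteed for the schema.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance.

    rows: list of dicts mapping attribute name -> value
    lhs, rhs: tuples of attribute names
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


# Hypothetical instance consistent with Case 1 and Case 2.
rows = [
    {"A": "A1", "B": "B1", "C": "C1", "D": "D1"},
    {"A": "A1", "B": "B2", "C": "C1", "D": "D1"},
    {"A": "A2", "B": "B2", "C": "C2", "D": "D2"},
    {"A": "A3", "B": "B3", "C": "C2", "D": "D2"},
]
print(holds_fd(rows, ("A",), ("B",)))  # False: A1 maps to both B1 and B2
print(holds_fd(rows, ("A",), ("C",)))  # True
```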
Armstrong Axioms (Inference Rules): The term Armstrong axioms refers to the sound
and complete set of inference rules, introduced by William W. Armstrong, that is
used to test logical implication of functional dependencies.
Armstrong axioms define the rules for reasoning about functional dependencies, and they can
be used to infer all the functional dependencies implied by a given set of FDs on a relational
database.
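For reference, the three axioms and the rules commonly derived from them can be summarised as follows (standard textbook statements; X, Y, Z, W are attribute sets and XZ denotes X ∪ Z). The union rule is the one used later when building a minimal cover.

```latex
\begin{align*}
&\text{Reflexivity:}        && Y \subseteq X \implies X \to Y\\
&\text{Augmentation:}       && X \to Y \implies XZ \to YZ\\
&\text{Transitivity:}       && X \to Y,\; Y \to Z \implies X \to Z\\
&\text{Union (derived):}    && X \to Y,\; X \to Z \implies X \to YZ\\
&\text{Decomposition (derived):} && X \to YZ \implies X \to Y,\; X \to Z\\
&\text{Pseudotransitivity (derived):} && X \to Y,\; WY \to Z \implies WX \to Z
\end{align*}
```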
Attribute closure: The closure X+ of an attribute set X is the set of all attributes that are
functionally determined by X.
Closure of a set of FDs: The set of all FDs logically implied by a set F is called the closure of F
and is written F+. If F is a set of FDs on a relation R, F+ is obtained by repeatedly applying the
inference axioms to F until no new FD can be derived.
Example: R(A, B, C, D) has the functional dependencies A→B, B→D, C→B. Find the
closures of A, B, C and D.
Solution:
A+ = {A, B, D}, since A→B and B→D exist; C is not functionally determined by A, so it is excluded.
B+ = {B, D}, since B→D exists; A and C are not functionally determined by B, so they are excluded.
C+ = {C, B, D}, since C→B and B→D exist; A is not functionally determined by C, so it is excluded.
D+ = {D}, since no FD has D on its left-hand side.
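The attribute closure X+ of a set X of attributes is computed by starting with X and repeatedly
adding the RHS of every FD whose LHS is already contained in the result, until no more
attributes can be added.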
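The attribute-closure procedure can be sketched in Python as follows. FDs are written as `(lhs, rhs)` pairs of strings, assuming single-letter attribute names as in the examples of this unit; the function name `closure` is our own choice.

```python
def closure(attrs, fds):
    """Return the attribute closure attrs+ under the FDs in `fds`.

    attrs: iterable of attribute names, e.g. "AB"
    fds:   list of (lhs, rhs) pairs, e.g. [("A", "B"), ("B", "D")]
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already in the closure, add the RHS.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The example above: R(A, B, C, D) with A->B, B->D, C->B.
F = [("A", "B"), ("B", "D"), ("C", "B")]
for x in "ABCD":
    print(x + "+ =", "".join(sorted(closure(x, F))))
# A+ = ABD, B+ = BD, C+ = BCD, D+ = D
```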
Candidate Key:
A candidate key is a minimal set of attributes of a relation that can be used to identify a tuple
uniquely.
Consider the table student(sno, sname, sphone, age).
We can take sno as a candidate key. A table can have more than one candidate key.
Types of candidate keys:
1. Simple (having only one attribute)
2. Composite (having multiple attributes)
Super Key:
A super key is a set of attributes of a relation that can be used to identify a tuple uniquely.
Adding zero or more attributes to a candidate key generates a super key.
Every candidate key is a super key, but not every super key is a candidate key.
Consider the table student(sno, sname, sphone, age).
Both sno and (sno, sname) are super keys.
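Both definitions can be tested with attribute closure: a set is a super key if its closure is the whole schema, and a candidate key if, in addition, no single attribute can be dropped. This Python sketch assumes the FD sno → {sname, sphone, age} for the student table (not stated explicitly above); the helper names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    # A super key functionally determines every attribute of the schema.
    return closure(attrs, fds) >= set(schema)


def is_candidate_key(attrs, schema, fds):
    # A candidate key is a super key from which no attribute can be removed.
    # Checking single removals suffices: supersets of super keys are super keys.
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and all(
        not is_superkey(attrs - {a}, schema, fds) for a in attrs
    )


schema = {"sno", "sname", "sphone", "age"}
fds = [({"sno"}, {"sname", "sphone", "age"})]  # assumed FD for illustration
print(is_candidate_key({"sno"}, schema, fds))           # True
print(is_superkey({"sno", "sname"}, schema, fds))       # True
print(is_candidate_key({"sno", "sname"}, schema, fds))  # False
```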
Examples:
1. In a schema with attributes A, B, C, D and E, the following set of FDs is given:
A→B, A→C, CD→E, B→D, E→A. Find whether CD→AC can be derived from the given FDs or
not.
2. Check whether D→A can be derived from the following FDs or not: AB→C, BC→AD, D→E,
CF→B.
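An FD X → Y follows from a set F exactly when Y ⊆ X+ (closure computed under F). A minimal Python sketch applying this test to the two examples, assuming the FD lists are read as A→B, A→C, CD→E, B→D, E→A and AB→C, BC→AD, D→E, CF→B; `implies` is an illustrative name.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    # lhs -> rhs can be derived from fds iff rhs is contained in lhs+.
    return set(rhs) <= closure(lhs, fds)


F1 = [("A", "B"), ("A", "C"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(implies(F1, "CD", "AC"))  # True: CD+ = ABCDE

F2 = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(implies(F2, "D", "A"))    # False: D+ = DE
```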
(i) Primary key: An attribute with unique values in a table, used to enforce entity integrity and
to identify rows in the table uniquely.
(ii) Composite primary key: Sometimes a single attribute is not sufficient to identify
the rows in the table uniquely, so we combine two or more attributes to identify the
rows uniquely.
(iii) Candidate keys: Sometimes two or more independent attributes or attribute sets can be
used to identify the rows uniquely, e.g., (vehicle no, vehicle engine no, purchase date): either
vehicle no or vehicle engine no can be used as a key attribute, so they are called
candidate keys. One of the candidate keys is selected as the primary key.
Example 1: Find the candidate keys of the relation R(ABCD) with FDs AB→CD,
C→A, D→A.
Sol: From the given FDs, B must be part of every key because it does not appear on the RHS of
any functional dependency.
AD+ = AD
AB+ = ABCD (∵ AB→CD)
BC+ = ABC (∵ C→A) = ABCD (∵ AB→CD)
BD+ = ABD (∵ D→A) = ABCD (∵ AB→CD)
From the above, AB, BC and BD each determine all attributes.
AB, BC and BD are candidate keys.
Example 2: Find the candidate keys of the relation R(ABCDE) with FDs A→BC,
CD→E, B→D, E→A.
Sol: Every attribute appears on the RHS of some FD, so no attribute is forced into every key.
So check the closures of the LHS attributes and their combinations.
A+ = ABC (∵ A→BC)
= ABCD (∵ B→D)
= ABCDE (∵ CD→E)
B+ = BD (∵ B→D)
E+ = EA (∵ E→A)
= EABC (∵ A→BC)
= EABCD (∵ B→D)
C+ = C
D+ = D
CD+ = CDE (∵ CD→E)
= CDEA (∵ E→A)
= CDEAB (∵ A→BC)
BC+ = BCD (∵ B→D)
= BCDE (∵ CD→E)
= BCDEA (∵ E→A)
From the above, A, E, CD and BC each determine all attributes.
A, E, CD and BC are candidate keys.
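For small schemas, all candidate keys can be found by brute force: test attribute sets in order of size and keep each set whose closure is the whole schema, skipping supersets of keys already found. This Python sketch assumes the FDs of Example 1 are AB→CD, C→A, D→A and those of Example 2 are A→BC, CD→E, B→D, E→A; it is exponential in the number of attributes, so it is only meant for textbook-sized relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Enumerate all candidate keys, smallest first (single-letter attributes)."""
    schema = "".join(sorted(schema))
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            attrs = set(combo)
            # A superset of a key found earlier is a super key but not minimal.
            if any(k <= attrs for k in keys):
                continue
            if closure(attrs, fds) >= set(schema):
                keys.append(attrs)
    return ["".join(sorted(k)) for k in keys]


F_ex1 = [("AB", "CD"), ("C", "A"), ("D", "A")]
print(candidate_keys("ABCD", F_ex1))   # ['AB', 'BC', 'BD']

F_ex2 = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(candidate_keys("ABCDE", F_ex2))  # ['A', 'E', 'BC', 'CD']
```

Under this reading of Example 1, BD+ = ABCD as well, which is why BD appears alongside AB and BC.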
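Equivalence of sets of FDs: Different database designers may define different sets of FDs from
the same requirements. Two sets F and G are equivalent if every FD in G can be derived from F
and every FD in F can be derived from G.
Example: Check whether F = {A→C, AC→D, E→AD, E→H} and G = {A→CD, E→AH} are equivalent.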
Sol:
Step 1: Using F, check that every FD in G can be derived.
A→CD:
A+ from F
= A
= AC (∵ A→C)
= ACD (∵ AC→D)
A→CD can be derived from F.
E→AH:
E+ from F
= E
= EAD (∵ E→AD)
= EADH (∵ E→H)
E→AH can be derived from F.
Step 2: Using G, check that every FD in F can be derived.
A→C and AC→D:
A+ from G
= A
= ACD (∵ A→CD)
A→C can be derived from G, and since AC+ contains A+ = ACD, AC→D can also be derived.
E→AD and E→H:
E+ from G
= E
= EAH (∵ E→AH)
= EAHCD (∵ A→CD)
E→AD and E→H can be derived from G.
Hence G and F are equivalent.
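The two steps translate directly into code: F covers G when every FD X → Y in G satisfies Y ⊆ X+ under F, and F and G are equivalent when each covers the other. A minimal Python sketch, using the F and G of the example above; `covers` and `equivalent` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    # f covers g if every FD lhs -> rhs of g is derivable from f.
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    # Step 1 checks covers(f, g); Step 2 checks covers(g, f).
    return covers(f, g) and covers(g, f)


F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]
print(equivalent(F, G))  # True
```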
(4) To identify the irreducible form of FDs / canonical form (minimal cover):
We try to minimize the set of functional dependencies. The minimized set of FDs must be
equivalent to the original set.
Procedure to find a minimal set:
Step 1: Rewrite every FD so that it has a single attribute on the RHS.
Step 2: Evaluate the necessity of the LHS attributes in the FDs obtained from step 1. An attribute
B in X is unnecessary in X→A if A is in (X − B)+; remove such attributes from the FD.
Step 3: Evaluate all FDs obtained from step 2 for their necessity. An FD is unnecessary if it can
be derived from the remaining FDs; remove such FDs from the list.
Step 4: Apply the union rule to FDs with a common LHS in the result of step 3. Then we get the
irreducible set.
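The four steps can be sketched in Python as below. Extraneous LHS attributes are removed before redundant FDs (the order used in the procedure above and in Elmasri and Navathe's algorithm), which guarantees that no redundant FD survives. Single-letter attribute names are assumed, and the example FD set {A→BC, B→C, A→B, AB→C} is our own illustration, not taken from the notes.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Minimal cover of `fds`; attributes are assumed to be single letters."""
    # Step 1: one attribute on the RHS of every FD (decomposition rule).
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    cover = list(dict.fromkeys(cover))  # drop exact duplicates, keep order

    # Step 2: drop extraneous LHS attributes. B is extraneous in X -> a when
    # a is already in (X - {B})+; the step-1 cover is equivalent to F, so it
    # can be used for every closure here.
    reduced = []
    for lhs, a in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs.discard(b)
        reduced.append((frozenset(lhs), a))
    cover = list(dict.fromkeys(reduced))

    # Step 3: drop every FD that is implied by the remaining ones.
    i = 0
    while i < len(cover):
        lhs, a = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if a in closure(lhs, rest):
            cover = rest
        else:
            i += 1

    # Step 4: union rule, merging FDs that share the same LHS.
    merged = {}
    for lhs, a in cover:
        merged.setdefault(lhs, set()).add(a)
    return [("".join(sorted(x)), "".join(sorted(y))) for x, y in merged.items()]


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(minimal_cover(F))  # [('A', 'B'), ('B', 'C')]
```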
Normal forms based on functional dependency (1NF, 2NF and 3 NF, Boyce-
Codd normal form (BCNF), 4NF)
The sequence of normal forms (the steps of normalization) is listed below:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF).
Points to Remember
1NF is mandatory; the remaining normal forms are optional.
If tables are constructed from a well-designed E-R diagram, 4NF and 5NF usually need not be
applied to them.
In practice, normalization is applied up to 3NF, and we very rarely go beyond that.
2NF deals with partial dependencies and 3NF deals with transitive
dependencies.
First Normal Form (1NF): A relation is said to be in 1NF if it satisfies the following
conditions:
1. Each attribute name must be unique.
2. Each attribute value must be single or atomic, i.e., single-valued attributes.
3. Each row/record must be unique.
4. There are no repeating groups.
Example: How do we bring an un-normalized table into first normal form? Consider the
following relation:
Solution: This table is not in first normal form because the [Color] column can contain
multiple values. For example, the first row includes values "red" and "green." To bring this
table to first normal form, we split the table into two tables and now we have the resulting
tables:
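The split into two tables can be sketched in Python: the multi-valued attribute moves to its own table with one (key, value) row per value. The product rows, the `price` field and the function name `split_multivalued` are hypothetical, since the original table is not reproduced here; only the "red" and "green" values of the first row come from the text.

```python
# Hypothetical un-normalized rows: "colors" holds more than one value.
products = [
    {"product_id": 1, "price": 100, "colors": ["red", "green"]},
    {"product_id": 2, "price": 250, "colors": ["yellow"]},
]


def split_multivalued(rows, key, multi_attr, new_attr):
    """Move a multi-valued attribute into a separate table, one value per row."""
    base = [{k: v for k, v in r.items() if k != multi_attr} for r in rows]
    detail = [
        {key: r[key], new_attr: value}
        for r in rows
        for value in r[multi_attr]  # one row per (key, value) pair
    ]
    return base, detail


product_table, color_table = split_multivalued(products, "product_id", "colors", "color")
print(product_table)  # every value is now atomic
print(color_table)
```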
Second Normal Form (2NF): A relation is said to be in 2NF if it is already in 1NF and it
has no partial dependency, i.e., no non-prime attribute depends on only a part of a
candidate key.
(OR)
A relation is in second normal form if it satisfies the following conditions:
• It is in first normal form.
• All non-key attributes are fully functionally dependent on the primary key.
➔ This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is
[Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is
only part of the primary key. Therefore, this table does not satisfy second normal form.
➔ To bring this table to second normal form, we break the table into two tables, and now we
have the following:
R1(ABC), R2(BD)
Q2 Consider the relation R = ABCDEF with the set of FDs A→FC, C→D, B→E. Find the
key and normalize R into 2NF.
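A partial dependency can be detected by computing the closure of every proper subset of the key and keeping the non-prime attributes it determines. This Python sketch assumes the FDs of Q2 are read as A→FC, C→D, B→E and that AB is the only candidate key (A and B never appear on a RHS, so every key must contain both); `partial_dependencies` is an illustrative name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, schema, fds):
    """List (part_of_key, non_prime_attrs) pairs that violate 2NF.

    Assumes `key` is the only candidate key, so the prime attributes are
    exactly the attributes of `key`.
    """
    key = set(key)
    non_prime = set(schema) - key
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for part in combinations(sorted(key), size):
            dependent = closure(part, fds) & non_prime
            if dependent:
                found.append(("".join(part), "".join(sorted(dependent))))
    return found


F = [("A", "FC"), ("C", "D"), ("B", "E")]
print(closure("AB", F) == set("ABCDEF"))        # True: AB is the key
print(partial_dependencies("AB", "ABCDEF", F))  # [('A', 'CDF'), ('B', 'E')]
```

Under these assumptions, one possible 2NF decomposition is ACDF, BE and AB; ACDF still contains the transitive dependency C→D, which is a 3NF concern.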
Third Normal Form (3NF): A relation is in third normal form if it satisfies the following
conditions:
• It is in 2NF.
• There is no transitive functional dependency.
By a transitive functional dependency we mean the following relationships in
the table: B is functionally dependent on A (A → B), and C is functionally dependent on B
(B → C), where B is not a key. In this case, C is transitively dependent on A via B; that is,
a non-key attribute depends on another non-key attribute.
➔ In this table, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID]; we have a transitive
functional dependency, and this structure does not satisfy third normal form.
➔ To bring this table to third normal form, we split the table into two as follows:
Q1 Given relation R(ABCDE) and F: {AB→C, B→D, D→E}. Decompose it into 3NF.
Sol: From the given FDs, determine the primary key. A and B must be included in the key
because they do not appear on the RHS of any FD.
Find the closure of AB:
AB+ = ABC (∵ AB→C)
= ABCD (∵ B→D)
= ABCDE (∵ D→E)
AB is the primary key.
The FD B→D is a partial dependency on AB (D is a non-prime attribute determined by a part of
the key), so decompose the table.
B+ = BDE
R is decomposed into ABC (with AB→C) and BDE = B+ (with B→D, D→E).
These tables are in 2NF, but BDE is not in 3NF because D→E is a transitive dependency
(no non-key attribute should determine another non-key attribute).
D+ = DE
BDE is decomposed into BD (with B→D) and DE = D+ (with D→E).
The tables are now in 3NF.
The relations after decomposing into 3NF.
R1: ABC
R2: BD
R3: DE
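The same result can be reached with the 3NF synthesis algorithm (as described by Bernstein and in Elmasri and Navathe), which is a different technique from the step-by-step decomposition above: compute a minimal cover, create one relation per group of FDs with the same LHS, add a relation holding a key if none already contains one, and drop relations contained in others. This Python sketch assumes single-letter attributes; all function names are our own.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Minimal cover as (lhs frozenset, rhs set) groups, one per distinct LHS."""
    cover = list(dict.fromkeys((frozenset(x), a) for x, y in fds for a in y))
    reduced = []
    for lhs, a in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs.discard(b)  # b is extraneous on the LHS
        reduced.append((frozenset(lhs), a))
    cover = list(dict.fromkeys(reduced))
    i = 0
    while i < len(cover):
        rest = cover[:i] + cover[i + 1:]
        if cover[i][1] in closure(cover[i][0], rest):
            cover = rest  # FD implied by the others: drop it
        else:
            i += 1
    merged = {}
    for lhs, a in cover:
        merged.setdefault(lhs, set()).add(a)
    return list(merged.items())


def find_key(schema, fds):
    """Shrink the full schema to one candidate key."""
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= set(schema):
            key.discard(a)
    return key


def synthesize_3nf(schema, fds):
    # One relation per LHS group: X together with everything X determines.
    relations = [set(lhs) | rhs for lhs, rhs in minimal_cover(fds)]
    # Some relation must contain a key of R (this keeps the join lossless).
    if not any(closure(r, fds) >= set(schema) for r in relations):
        relations.append(find_key(schema, fds))
    # Drop duplicates and relations strictly contained in another relation.
    unique = []
    for r in relations:
        if r not in unique:
            unique.append(r)
    return ["".join(sorted(r)) for r in unique if not any(r < s for s in unique)]


F = [("AB", "C"), ("B", "D"), ("D", "E")]
print(synthesize_3nf("ABCDE", F))  # ['ABC', 'BD', 'DE']
```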
Q2 Given relation R = ABCDEFGHIJ and the set of FDs AB→C, A→DE, B→F, F→GH,
D→IJ, decompose R into 3NF.
Q3 (a) Give a set of FDs for the relation schema R(ABCD) with primary key AB under
which R is in 1NF but not in 2NF.
(b) Give a set of FDs under which R is in 2NF but not in 3NF.
Sol: R = ABCD
Key = AB
(a) 1NF only requires atomic values, while 2NF does not allow partial dependencies.
The following FDs can be used:
B→C, A→C, B→D, A→D
(each of these is a partial dependency, since a non-key attribute depends on a part of the key).
(b) Here partial dependencies are not allowed but a transitive
dependency is allowed. The following FDs can be used:
C→D, D→C
Boyce-Codd normal form (BCNF): A relation is said to be in BCNF if and only if every
determinant is a candidate key.
✓ BCNF is an advanced version of 3NF. It is stricter than 3NF.
✓ A table is in 3NF if, for every non-trivial functional dependency X → Y, either X is a super key
of the table or every attribute of Y that is not in X is a prime attribute.
✓ For BCNF, the table should be in 3NF and, for every non-trivial FD X → Y, the LHS X must be a
super key.
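The BCNF condition can be checked with attribute closure: a non-trivial FD X → Y violates BCNF when X+ is not the whole schema. For a relation R and its given FD set F, checking only the FDs in F is sufficient (sub-relations produced by decomposition need the full F+). This Python sketch uses the hypothetical relation R(A, B, C) with AB→C and C→B, which is in 3NF (its candidate keys are AB and AC, so B is prime) but not in BCNF.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose LHS is not a super key of `schema`."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD: never a violation
        if not closure(lhs, fds) >= set(schema):
            violations.append((lhs, rhs))
    return violations


F = [("AB", "C"), ("C", "B")]
print(bcnf_violations("ABC", F))  # [('C', 'B')]: C is not a super key
```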
Example: Let's assume there is a company where employees work in more than one
department. EMPLOYEE table:
Fourth Normal Form (4NF): A relation is said to be in 4NF if it is in Boyce-Codd normal
form and has no non-trivial multi-valued dependency.
✓ For a dependency A → B, if for a single value of A multiple values of B exist, then the
relation has a multi-valued dependency, written A →→ B.
✓ Note: Multi-valued dependency: A table is said to have a multi-valued dependency if the
following conditions are true:
1. For a dependency A → B, if for a single value of A multiple values of B exist, then the
table may have a multi-valued dependency.
2. A table should have at least 3 columns for it to have a multi-valued dependency.
3. For a relation R(A, B, C), if there is a multi-valued dependency between A and
B, then B and C should be independent of each other.
◼ If all these conditions are true for a relation (table), it is said to have a multi-valued
dependency.
Example
The STUDENT table is in 3NF, but COURSE and HOBBY are two independent
attributes; there is no relationship between COURSE and HOBBY. In the STUDENT
relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there are multi-valued dependencies STU_ID →→ COURSE and
STU_ID →→ HOBBY, which lead to unnecessary repetition of data.
To bring the above table into 4NF, we decompose it into two tables:
STUDENT_COURSE
STUDENT_HOBBY
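A minimal Python sketch of this decomposition, assuming (for illustration) that the STUDENT table holds every course/hobby pairing for student 21, as an independent multi-valued dependency requires. Projecting onto STUDENT_COURSE and STUDENT_HOBBY and joining them back on STU_ID returns exactly the original rows, so the decomposition is lossless.

```python
from itertools import product

# Hypothetical STUDENT rows (STU_ID, COURSE, HOBBY): because COURSE and HOBBY
# are independent, every course of student 21 is paired with every hobby.
student = {
    (21, course, hobby)
    for course, hobby in product(["Computer", "Math"], ["Dancing", "Singing"])
}

# 4NF decomposition: one table per independent multi-valued fact.
student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Natural join on STU_ID rebuilds the original table.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

print(len(student), len(student_course), len(student_hobby))  # 4 2 2
print(rejoined == student)  # True
```

Adding a third hobby would need two new rows in STUDENT (one per course) but only one new row in STUDENT_HOBBY.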
Review Questions
13. Give a set of FDs for the relation schema R(A,B,C,D) with primary key AB under which
R is in 1NF but not in 2NF.
14. Why is a relation that is in 3NF generally considered good? Explain.
15. Discuss 4NF with a suitable example.
16. What are the problems caused by redundantly storing information? Explain
17. Given Relation, R=(A,B,C,D,E,F,G) and Functional Dependencies
F={ {A,B}→{C}, { A,C}→{B}, {A,D}→{E}, {B}→{D}, { B,C}→{A}, {E}→{F}}.
Check whether the following decomposition of R into R1=(A,B,C), R2=(A,C,D,E) and
R3=(A,D,F) is satisfying the lossless Decomposition property.
18. What is dependency preservation property for decomposition? Explain why it is important.
19. Given a Relation R=(X,Y,Z) and Functional Dependencies are F={ {X,Y}→{Z}, {Z}→{X} }
Determine all Candidate keys of R and the normal form of R with proper explanation.
20. Define functional dependency. How can you compute the minimal cover for a set of
functional dependencies? Explain it with an example.
21. Consider the schema R = (A, B, C, G, H, I) and the set F of functional dependencies {A→B, A→C,
CG→H, CG→I, B→H}. Compute the candidate keys of the schema. Compute the closure of
F.
22. Explain 3NF & BCNF. What is the difference between them?
23. What is functional dependency? Explain its usage in database design.
24. What is a surrogate key? How can it be used for schema refinement?
25. How to compute closure of set of functional dependency? Explain with a suitable example schema.
26. What is a multi-valued dependency? State and explain fourth normal form based on this concept.
27. The relation schema R(A,B,C,D) has primary key AB, and in turn each of the following FDs holds:
D→C, C→D, AC→D, AD→C, BC→D, BD→C. In which normal form is R in each case?
28. Discuss the problems caused by redundancy and the purpose of normalization.
29. Give relation schemas for the following normal forms
i) 2NF but not in 3NF ii) 3NF but not in BCNF