
Unit 3 Normalization

The document discusses normalization in database design, emphasizing its importance in minimizing data redundancy and preventing anomalies such as insertion, deletion, and updation issues. It outlines the different normal forms (1NF, 2NF, 3NF, BCNF, 4NF, and 5NF) and the conditions required for a table to meet each form, including the concepts of functional, partial, and transitive dependencies. Additionally, it explains the process of decomposition to achieve lossless join and dependency preservation in relational databases.

Uploaded by

hnpatil2821969

Sushma Vankhede

Computer Science and Engineering


Navrachana University
• Normalization is a technique of organizing data into multiple related tables to minimize data redundancy.
• Insertion anomaly
  ◦ Having to enter the same data again for every new row (every new student, in our case) is an insertion anomaly.
• Reason for data repetition
  ◦ Two different but related sets of data are stored in the same table.
• Deletion anomaly
  ◦ Related data is lost when some other data is deleted.
• Updation anomaly
  ◦ When Mr. X leaves and Mr. Y joins as the new HOD, every row that stores the HOD has to be updated.
• Repetition of data needs extra space.
• It leads to insertion, deletion and updation issues.
• Normalization breaks one table into two (or more) tables.
• It divides the data into separate, independent logical entities and relates them through a common key.
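As a rough illustration of this split, the sketch below takes a flat student table that repeats branch data in every row and divides it into a student table and a branch table linked by a common key. The column names (`rollno`, `name`, `branch`, `hod`) and the sample rows are assumptions made up for this example; they are not taken from the slides.

```python
# Hypothetical flat table: the branch's HOD is repeated for every student.
flat_rows = [
    {"rollno": 1, "name": "Asha", "branch": "CSE", "hod": "Mr. X"},
    {"rollno": 2, "name": "Ravi", "branch": "CSE", "hod": "Mr. X"},
    {"rollno": 3, "name": "Meera", "branch": "ME", "hod": "Mr. Z"},
]


def split_by_branch(rows):
    """Split flat rows into (students, branches) related by the 'branch' key."""
    students = [
        {"rollno": r["rollno"], "name": r["name"], "branch": r["branch"]}
        for r in rows
    ]
    branches = {}
    for r in rows:
        # Each branch is stored once, however many students belong to it.
        branches[r["branch"]] = {"hod": r["hod"]}
    return students, branches


students, branches = split_by_branch(flat_rows)

# Changing the HOD is now a single update instead of one update per student row.
branches["CSE"]["hod"] = "Mr. Y"
```

After the split, a new student row only carries the branch key, and deleting the last student of a branch no longer deletes the branch data.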
• Normalization can be achieved in several stages, called normal forms.
• The three basic normal forms are:
  ◦ 1NF
  ◦ 2NF
  ◦ 3NF
• The advanced normal forms are:
  ◦ BCNF (a stricter version of 3NF)
  ◦ 4NF
  ◦ 5NF
• For a table to be in the First Normal Form (1NF), it must follow these four rules:
1. Every attribute/column should hold only single (atomic) values.
2. All values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. The order in which data is stored does not matter.
• Every table in your database should be at least in 1NF; otherwise the database is considered badly designed.
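A minimal sketch of rule 1, assuming a hypothetical table in which a `subjects` column holds several comma-separated values in one cell. The function name, column names and rows are invented for illustration.

```python
# Hypothetical rows: 'subjects' stores several values in one cell, breaking rule 1.
rows = [
    {"rollno": 101, "name": "Asha", "subjects": "OS, CN"},
    {"rollno": 102, "name": "Ravi", "subjects": "Java"},
]


def to_first_normal_form(rows, column):
    """Return new rows that hold exactly one value of `column` per row."""
    result = []
    for row in rows:
        for value in row[column].split(","):
            new_row = dict(row)          # copy the other columns unchanged
            new_row[column] = value.strip()
            result.append(new_row)
    return result


atomic_rows = to_first_normal_form(rows, "subjects")
```

The first student now appears in two rows, one per subject, so every cell is atomic.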
• For a table to be in the Second Normal Form (2NF), it must satisfy two conditions:
1. The table should be in the First Normal Form.
2. There should be no partial dependency.
• Functional Dependency
  ◦ An attribute B has a functional dependency on another attribute A if any two records that have the same value for A also have the same value for B. We write this as:
    A → B
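The definition can be checked directly on a set of records. The sketch below is one possible way to do it; the function name `fd_holds` and the sample rows are assumptions for illustration. Note that a sample of rows can only disprove a dependency; it cannot prove that the dependency holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # Two records with the same lhs value must agree on the rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical records over attributes A, B and C.
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

holds_a_b = fd_holds(rows, ["A"], ["B"])  # both records with A = 1 have B = "x"
holds_a_c = fd_holds(rows, ["A"], ["C"])  # records with A = 1 disagree on C
```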

• Partial Dependency
  ◦ A partial dependency exists when, for a composite primary key, an attribute in the table depends on only a part of the primary key and not on the complete primary key.
  ◦ To remove a partial dependency, divide the table: remove the attribute that causes the partial dependency and move it to another table where it fits well.
• Transitive Dependency
  ◦ Consider attributes A, B and C, where A → B and B → C.
  ◦ Functional dependencies are transitive, which means that we also have the functional dependency A → C.
  ◦ We say that C is transitively dependent on A through B.
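Transitivity is what the standard attribute-closure computation applies repeatedly. The sketch below assumes FDs are given as pairs of Python sets; the function name `closure` is a choice made for this example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, so is the right.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_closure = closure({"A"}, fds)
```

Here `a_closure` contains C, which confirms A → C even though it is not listed explicitly in `fds`.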
• How to remove partial dependency
  ◦ In the example, we should remove the teacher column from the Score table to remove the partial dependency.
  ◦ Or we can create a new Teacher table and store the teacher's information there.
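A partial dependency can be found mechanically by checking whether any proper subset of the composite key determines a non-key attribute. The sketch below assumes a Score table with key (`student_id`, `subject_id`) and the FDs shown in the code; the slides only name the teacher column and the Score table, so the other column names and the FDs are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """List (key_subset, attribute) pairs where a non-key attribute
    depends on only a proper subset of the composite key."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found


# Assumed schema and FDs for the Score table (hypothetical).
attributes = {"student_id", "subject_id", "marks", "teacher"}
key = {"student_id", "subject_id"}
fds = [
    ({"student_id", "subject_id"}, {"marks"}),
    ({"subject_id"}, {"teacher"}),
]
found = partial_dependencies(key, attributes, fds)
```

Under these assumptions the only pair reported is `teacher` depending on `subject_id` alone, which is why the teacher column is the one to move out.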
• For a table to be in the Third Normal Form (3NF):
  ◦ It should be in 2NF.
  ◦ And it should not have any transitive dependency.
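• The Score table is in 2NF.
• Now we also want to save the columns Exam_Name and Total_Marks in the Score table. If Total_Marks depends on Exam_Name rather than on the key of the Score table, this creates a transitive dependency.
• Solution: move Exam_Name and Total_Marks out of the Score table into a separate table and link the two with a common key, so that no transitive dependency remains.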
• BCNF is the advanced version of 3NF. It is stricter than 3NF.
• A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
• This means that, for a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
• For BCNF, the table should be in 3NF and, for every FD, the LHS must be a super key.
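• Example: Let's assume there is a company where employees work in more than one department.
• In the given table, the functional dependencies are as follows:
  ◦ EMP_ID → EMP_COUNTRY
  ◦ EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
• Since an employee works in more than one department, neither EMP_ID nor EMP_DEPT alone is a super key, so these dependencies violate BCNF.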
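The BCNF condition can be tested by computing the closure of each left-hand side. The sketch below assumes the table has exactly the five attributes that appear in the two FDs above; the table itself is not reproduced here, so that attribute set is an assumption.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]


# Assumed attribute set (only the attributes named in the FDs).
attributes = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
violations = bcnf_violations(attributes, fds)
```

Both FDs are reported as violations, while the closure of {EMP_ID, EMP_DEPT} covers every attribute, so under these assumptions that pair acts as the key.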
• A table is said to be in the Fourth Normal Form (4NF) when:
  ◦ It is in the Boyce-Codd Normal Form.
  ◦ And it doesn't have any multi-valued dependency.
• A relation is in 5NF if it is in 4NF, does not contain any non-trivial join dependency, and any joining is lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
• 5NF is also known as Project-Join Normal Form (PJ/NF).
• In the example table (Subject, Lecturer, Semester), John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math class for Semester 2. In this case, a combination of all these fields is required to identify valid data.
• Suppose we add a new semester, Semester 3, but do not know the subject or who will be teaching it, so we leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.
• So, to bring the table into 5NF, we can decompose it into three relations P1, P2 and P3.
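One way to picture the three relations is as the three pairwise projections of the table. The sketch below assumes that reading; which projection is called P1, P2 or P3 is a guess, and only John's rows come from the text, the other rows are invented.

```python
# Rows are (semester, subject, lecturer). Only John's rows follow the text;
# the other two rows are hypothetical.
rows = {
    ("Semester 1", "Computer", "John"),
    ("Semester 1", "Math", "John"),
    ("Semester 1", "Computer", "Asha"),
    ("Semester 2", "Math", "Ravi"),
}

# Three pairwise projections, assumed here to play the role of P1, P2 and P3.
p1 = {(sem, sub) for sem, sub, lec in rows}
p2 = {(sub, lec) for sem, sub, lec in rows}
p3 = {(sem, lec) for sem, sub, lec in rows}

# Joining only two projections can invent rows that were never stored.
two_way = {(sem, sub, lec) for sem, sub in p1 for sub2, lec in p2 if sub == sub2}

# Joining all three filters those spurious rows out again.
rejoined = {row for row in two_way if (row[0], row[2]) in p3}
```

For this sample the three-way join gives back exactly the original rows, whereas the two-way join also contains John teaching Math in Semester 2, which the table never stated.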
Properties of a good decomposition
• Lossless join
• Dependency preservation
• A relation in the relational model is decomposed when it is not in the appropriate normal form.
• A relation R should be decomposed into two or more relations only if the decomposition is both lossless-join and dependency preserving.
• If we decompose a relation R into relations R1 and R2:
  ◦ The decomposition is lossy if R1 ⋈ R2 ⊃ R.
  ◦ The decomposition is lossless if R1 ⋈ R2 = R.
• To check for a lossless join decomposition using the FD set, the following conditions must hold:
  ◦ The union of the attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R must be in R1 or in R2.
    Att(R1) ∪ Att(R2) = Att(R)
  ◦ The intersection of the attributes of R1 and R2 must not be empty.
    Att(R1) ∩ Att(R2) ≠ Φ
  ◦ The common attributes must be a key for at least one relation (R1 or R2).
    Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)
• A relation R(A, B, C, D) with FD set {A → BC} is decomposed into R1(ABC) and R2(AD), which is a lossless join decomposition because:
  ◦ The first condition holds, as Att(R1) ∪ Att(R2) = (ABC) ∪ (AD) = (ABCD) = Att(R).
  ◦ The second condition holds, as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = (A) ≠ Φ.
  ◦ The third condition holds, as Att(R1) ∩ Att(R2) = A is a key of R1(ABC), because A → BC is given.
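The three conditions translate almost line by line into code. The sketch below is one possible encoding for a decomposition into two relations, with attributes written as single letters and FDs as pairs of sets; the function names are choices made for this example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, r, fds):
    """Apply the three lossless-join conditions to a two-way decomposition."""
    r1, r2, r = set(r1), set(r2), set(r)
    if r1 | r2 != r:            # condition 1: the union covers R
        return False
    common = r1 & r2
    if not common:              # condition 2: the intersection is not empty
        return False
    common_closure = closure(common, fds)
    # condition 3: the common attributes determine all of R1 or all of R2
    return r1 <= common_closure or r2 <= common_closure


fds = [({"A"}, {"B", "C"})]
result = is_lossless("ABC", "AD", "ABCD", fds)
```

For R1(ABC) and R2(AD) the check succeeds because the closure of A contains every attribute of R1.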
• If we decompose a relation R into relations R1 and R2, every dependency of R must either be a part of R1 or R2, or be derivable from the combination of the FDs of R1 and R2.
• For example, a relation R(A, B, C, D) with FD set {A → BC} is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A → BC is a part of R1(ABC).
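Dependency preservation can also be checked without listing the projected FDs explicitly. The sketch below uses the usual textbook procedure of growing the left-hand side through each decomposed relation in turn; it is an illustration under that assumption, and the function names are invented here.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """Return True if every FD can be enforced using the decomposed relations."""
    parts = [set(p) for p in decomposition]
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes derivable while looking at this relation only.
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


preserved = preserves_dependencies(["ABC", "AD"], [({"A"}, {"B", "C"})])
```

As a contrast, decomposing R(A, B, C) with A → B and B → C into (AB) and (AC) fails this check, because B → C can no longer be verified inside either relation.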
• Consider a schema R(A, B, C, D) and functional dependencies A → B and C → D. Then the decomposition of R into R1(AB) and R2(CD) is ____.
A. dependency preserving and lossless join
B. lossless join but not dependency preserving
C. dependency preserving but not lossless join
D. not dependency preserving and not lossless join
Answer:
For a lossless join decomposition, these conditions must hold:
Att(R1) ∪ Att(R2) = ABCD = Att(R)
Att(R1) ∩ Att(R2) ≠ Φ
Here Att(R1) ∩ Att(R2) = Φ, which violates the second condition. Hence the decomposition is not lossless.
For dependency preservation, A → B can be enforced in R1(AB) and C → D can be enforced in R2(CD). Hence the decomposition is dependency preserving.
So, the correct option is C.
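The lossy join in this answer can be seen on actual data. The sketch below uses a small hypothetical instance of R that satisfies A → B and C → D; the values are invented for illustration.

```python
# Hypothetical instance of R(A, B, C, D) satisfying A -> B and C -> D.
r = {
    (1, "b1", 10, "d1"),
    (2, "b2", 20, "d2"),
}

# Project onto R1(AB) and R2(CD).
r1 = {(a, b) for a, b, c, d in r}
r2 = {(c, d) for a, b, c, d in r}

# With no common attribute, the natural join becomes a cross product.
joined = {(a, b, c, d) for a, b in r1 for c, d in r2}

# Rows that appear after the join but were never in R.
spurious = joined - r
```

The join returns four rows, two of which were never in R, so R1 ⋈ R2 ⊃ R and the decomposition is lossy, in line with option C.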
