Unit V: Normalization: Relational Database Design Pitfalls, Denormalized Data, Decomposition
• Some relational databases impose limits on field lengths that cannot be exceeded.
• Relational databases can become complex as the amount of data
grows and the relations between pieces of data become more complicated.
• Complex relational database systems may lead to isolated databases whose
information cannot be shared from one system to another.
DECOMPOSITION
Decompose the above table into two tables, <EmpDetails> and <DeptDetails> −

<EmpDetails>
Emp_ID   Emp_Name   Emp_Age   Emp_Location
E001     Jacob      29        Alabama

<DeptDetails>
Dept_ID   Dept_Name
Dpt1      Operations

Now, you won’t be able to join the above tables, since Emp_ID isn’t part of the
DeptDetails relation.
• Decomposition must be lossless: no information may be lost from the relation
that is decomposed.
• A lossless decomposition guarantees that joining the decomposed relations
yields the original relation again.
Example:
• Let 'E' be a relational schema with instance 'e', decomposed into E1, E2, E3,
. . . , En with instances e1, e2, e3, . . . , en. If e1 ⋈ e2 ⋈ e3 ⋈ . . . ⋈ en = e,
then the decomposition is called a 'Lossless Join Decomposition'.
• In other words, if the natural join of all the decomposed relations gives back
the original relation, then it is said to be a lossless join decomposition.
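The lossless-join condition can be tried out on a small instance. The following is a minimal sketch, not a general schema-level test: it only checks one concrete instance, and the helper names (`natural_join`, `project`, `is_lossless`) and the sample rows beyond E001 are assumptions made for illustration.

```python
def natural_join(r, s):
    """Natural join of two relations, each a list of dicts (attribute -> value)."""
    joined = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            # With no common attribute the condition is vacuously true,
            # so the join degenerates into a Cartesian product.
            if all(t[a] == u[a] for a in common):
                joined.append({**t, **u})
    return joined


def project(rel, attrs):
    """Projection onto attrs, with duplicate tuples removed."""
    unique = {tuple((a, t[a]) for a in attrs) for t in rel}
    return [dict(pairs) for pairs in unique]


def as_set(rel):
    """Order-independent form of a relation, used for comparison."""
    return {frozenset(t.items()) for t in rel}


def is_lossless(rel, *attr_sets):
    """True if joining the projections of rel gives back exactly rel."""
    parts = [project(rel, attrs) for attrs in attr_sets]
    joined = parts[0]
    for part in parts[1:]:
        joined = natural_join(joined, part)
    return as_set(joined) == as_set(rel)


# Hypothetical sample instance (only E001 / Dpt1 appear in the notes).
employees = [
    {"Emp_ID": "E001", "Emp_Name": "Jacob", "Dept_ID": "Dpt1", "Dept_Name": "Operations"},
    {"Emp_ID": "E002", "Emp_Name": "Henry", "Dept_ID": "Dpt2", "Dept_Name": "HR"},
    {"Emp_ID": "E003", "Emp_Name": "Tom", "Dept_ID": "Dpt1", "Dept_Name": "Operations"},
]

# No shared attribute between the two parts: the join invents extra tuples.
no_shared_attr = is_lossless(employees, ["Emp_ID", "Emp_Name"], ["Dept_ID", "Dept_Name"])
# Dept_ID kept in both parts: the join reproduces the original instance.
shared_dept_id = is_lossless(
    employees, ["Emp_ID", "Emp_Name", "Dept_ID"], ["Dept_ID", "Dept_Name"]
)
print(no_shared_attr, shared_dept_id)  # False True
```

Keeping Dept_ID in both parts is what lets the join match each employee to the right department.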
LOSSLESS DECOMPOSITION
• Decompose the above relation into two relations to check whether the decomposition is
lossless or lossy.
• Now, we have decomposed the relation into two relations, Employee and Department.
Employee ⋈ Department
Eid Ename Age City Salary Deptid DeptName
Example:
Let a relation R(A, B, C, D) and a set of FDs F = {A -> B, A -> C, C -> D} be given.
The relation R is decomposed into -
R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and
R2 = (C, D) with FDs F2 = {C -> D}.
• F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}
• So, F' = F.
• And so, F'+ = F+, which means the decomposition is dependency preserving.
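The comparison of F' and F can be checked mechanically with attribute closures. This is a minimal sketch that takes F1 and F2 as given, as the example does. A fully general test would first project F+ onto each decomposed relation. The function names `closure` and `is_preserved` are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(original, decomposed):
    """True if every FD X -> Y in original follows from the decomposed FDs."""
    return all(rhs <= closure(lhs, decomposed) for lhs, rhs in original)


F = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C"}, {"D"})]
F1 = [({"A"}, {"B"}), ({"A"}, {"C"})]  # FDs on R1 = (A, B, C)
F2 = [({"C"}, {"D"})]                  # FDs on R2 = (C, D)

preserved = is_preserved(F, F1 + F2)   # F' = F1 ∪ F2 covers F
lost = is_preserved(F, F1)             # without F2, C -> D cannot be derived
print(preserved, lost)  # True False
```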
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a
database with anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not linked to each other properly, it can lead to strange situations. For
example, when we try to update a data item whose copies are scattered over several places, a few instances get updated properly
while a few others are left with old values. Such instances leave the database in an inconsistent state.
• Deletion anomalies − We try to delete a record, but parts of it are left undeleted because, without our being aware of it, the data is also saved
somewhere else.
• Insert anomalies − We try to insert data into a record that does not exist at all.
• Normalization is a method to remove all these anomalies and bring the database to a consistent state.
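An update anomaly is easy to reproduce on a small scale. The sketch below uses hypothetical rows and attribute names (`emp`, `dept_id`, `dept_name`). It stores the department name once per employee, updates only one copy, and then shows the same update on a normalized layout.

```python
# Denormalized: the department name is repeated in every employee row.
rows = [
    {"emp": "E001", "dept_id": "Dpt1", "dept_name": "Operations"},
    {"emp": "E002", "dept_id": "Dpt1", "dept_name": "Operations"},
]

# Only one of the copies is updated.
rows[0]["dept_name"] = "Ops"

# The same department now has two different names: an inconsistent state.
names_for_dpt1 = {r["dept_name"] for r in rows if r["dept_id"] == "Dpt1"}
inconsistent = len(names_for_dpt1) > 1

# Normalized: the name is stored exactly once, so one update is enough.
employees = [{"emp": "E001", "dept_id": "Dpt1"}, {"emp": "E002", "dept_id": "Dpt1"}]
departments = {"Dpt1": "Operations"}
departments["Dpt1"] = "Ops"
names_seen = {departments[e["dept_id"]] for e in employees}

print(inconsistent, names_seen)  # True {'Ops'}
```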
FIRST NORMAL FORM
• First Normal Form is built into the definition of relations (tables) itself. This
rule requires that all the attributes in a relation have atomic domains. The
values in an atomic domain are indivisible units.
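To illustrate atomic values, the sketch below turns rows that hold a list in one attribute into rows with a single value per attribute. The function name `to_atomic_rows` and the course data are assumptions made for this example.

```python
def to_atomic_rows(rows, attr):
    """Emit one row per value of attr, so attr holds a single value in each row."""
    atomic = []
    for row in rows:
        value = row[attr]
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            atomic.append({**row, attr: v})
    return atomic


# Hypothetical table that is not in 1NF: Content holds several values per row.
courses = [
    {"Course": "Programming", "Content": ["Java", "C++"]},
    {"Course": "Web", "Content": ["HTML", "PHP", "ASP"]},
]

flat = to_atomic_rows(courses, "Content")
for row in flat:
    print(row["Course"], row["Content"])
```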
Before we learn about the second normal form, we need to understand the following −
• Prime attribute − An attribute that is part of a candidate key is known as a prime attribute.
• Non-prime attribute − An attribute that is not part of any candidate key is said to be a non-prime
attribute.
• In second normal form, every non-prime attribute must be fully functionally
dependent on the candidate key. That is, if X → A holds, then there should not be any proper subset
Y of X for which Y → A also holds.
SECOND NORMAL FORM
• We see here in the Student_Project relation that the key attributes are Stu_ID and Proj_ID.
According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must depend on
both together and not on either key attribute individually. But we find that Stu_Name can be
identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is called partial
dependency, which is not allowed in Second Normal Form.
• We broke the relation in two as depicted in the above picture, so no partial dependency remains.
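Partial dependencies can be found by computing the closure of every proper subset of the key. This is a minimal sketch assuming a single candidate key, so the non-prime attributes are simply the attributes outside that key. The helper names and the one-attribute-key relation at the end are assumptions for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Non-prime attributes determined by a proper subset of the key."""
    non_prime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((subset, attr))
    return found


attributes = {"Stu_ID", "Proj_ID", "Stu_Name", "Proj_Name"}
fds = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]

violations = partial_dependencies({"Stu_ID", "Proj_ID"}, attributes, fds)
print(violations)
# [(('Proj_ID',), 'Proj_Name'), (('Stu_ID',), 'Stu_Name')]

# A relation whose key is a single attribute has no proper non-empty subset
# of the key, so it cannot contain a partial dependency.
single_key = partial_dependencies({"Stu_ID"}, {"Stu_ID", "Stu_Name"}, fds)
print(single_key)  # []
```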
THIRD NORMAL FORM
We find that in the above Student_detail relation, Stu_ID is the key and the only prime attribute. We find that City
can be identified by Stu_ID as well as by Zip itself. Zip is not a superkey, and City is not a prime attribute.
Additionally, Stu_ID → Zip → City, so a transitive dependency exists.
To bring this relation into third normal form, we break the relation into two relations as follows −
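A minimal sketch of the third normal form test on this example, assuming a single candidate key and using only the attributes named in the text (Stu_ID, Zip, City). The split into (Stu_ID, Zip) and (Zip, City) is the usual way to remove the transitive dependency and is shown here as an assumption, since the original figure is not reproduced.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attributes, key, fds):
    """FDs X -> A where X is not a superkey and A is not a prime attribute."""
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for attr in sorted(rhs - lhs):
            if not is_superkey and attr not in key:
                found.append((sorted(lhs), attr))
    return found


# Student_detail restricted to the attributes mentioned in the text.
before = violations_3nf(
    {"Stu_ID", "Zip", "City"},
    {"Stu_ID"},
    [({"Stu_ID"}, {"Zip"}), ({"Zip"}, {"City"})],
)

# After the split, each relation has only its key on the left-hand side.
student = violations_3nf({"Stu_ID", "Zip"}, {"Stu_ID"}, [({"Stu_ID"}, {"Zip"})])
zip_codes = violations_3nf({"Zip", "City"}, {"Zip"}, [({"Zip"}, {"City"})])

print(before, student, zip_codes)  # [(['Zip'], 'City')] [] []
```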
BOYCE-CODD NORMAL FORM
• Boyce-Codd Normal Form (BCNF) is a stricter extension of Third Normal Form. BCNF states that
for any non-trivial functional dependency X → A, X must be a super-key.
4TH NORMAL FORM
• For a table to satisfy the Fourth Normal Form, it should satisfy the following
two conditions: it should be in Boyce-Codd Normal Form, and it should not have any
multi-valued dependency.
• For a dependency A → B, if multiple values of B exist for a single value of A, then the table may
have a multi-valued dependency.
• Also, a table should have at least 3 columns for it to have a multi-valued dependency.
• And, for a relation R(A, B, C), if there is a multi-valued dependency between A and B, then B and
C should be independent of each other.
• If all these conditions are true for any relation (table), it is said to have a multi-valued dependency.
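The idea that B and C are independent of each other for a given A can be made precise with the textbook definition of a multi-valued dependency: whenever two rows agree on A, the row that combines the B value of the first with the C value of the second must also be present. The sketch below checks that definition on sample data. The function name `mvd_holds` and the student/course/hobby rows are assumptions for illustration.

```python
def mvd_holds(rows, a, b, c):
    """Check A ->> B on a three-column relation R(A, B, C) given as a list of dicts."""
    present = {(r[a], r[b], r[c]) for r in rows}
    for t1 in rows:
        for t2 in rows:
            # Rows that agree on A must allow B and C values to be mixed freely.
            if t1[a] == t2[a] and (t1[a], t1[b], t2[c]) not in present:
                return False
    return True


# Hypothetical data: every course of student 1 is paired with every hobby.
independent = [
    {"s_id": 1, "course": "Science", "hobby": "Cricket"},
    {"s_id": 1, "course": "Science", "hobby": "Hockey"},
    {"s_id": 1, "course": "Maths", "hobby": "Cricket"},
    {"s_id": 1, "course": "Maths", "hobby": "Hockey"},
]

# Here a course is tied to a specific hobby, so the two are not independent.
tied = [
    {"s_id": 1, "course": "Science", "hobby": "Cricket"},
    {"s_id": 1, "course": "Maths", "hobby": "Hockey"},
]

print(mvd_holds(independent, "s_id", "course", "hobby"))  # True
print(mvd_holds(tied, "s_id", "course", "hobby"))         # False
```

When the dependency holds, storing (s_id, course) and (s_id, hobby) in separate tables avoids repeating every combination.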
• Example – Consider the above schema, with the rule “if a company makes a
product and an agent is an agent for that company, then he always sells that
product for the company”. Under these circumstances, the ACP table is as shown
in Table – ACP.
• The relation ACP is again decomposed into 3 relations, shown in Table – R1,
Table – R2 and Table – R3. Now, the natural join of all three relations is
obtained as follows:
The natural join of R1 and R3 over ‘Company’, followed by the natural join of the
result R13 and R2 over ‘Agent’ and ‘Product’, gives back the table ACP.
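The two-step join can be traced on a small instance. The tables themselves are not reproduced in these notes, so the rows below are hypothetical values chosen to satisfy the stated rule. The column layout R1(Agent, Company), R2(Agent, Product), R3(Company, Product) is inferred from the join attributes named in the text.

```python
# Hypothetical ACP instance: (Agent, Company, Product).
acp = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# Project ACP onto the three pairs of attributes.
r1 = {(a, c) for a, c, p in acp}  # (Agent, Company)
r2 = {(a, p) for a, c, p in acp}  # (Agent, Product)
r3 = {(c, p) for a, c, p in acp}  # (Company, Product)

# Step 1: natural join of R1 and R3 over Company.
r13 = {(a, c, p) for a, c in r1 for c2, p in r3 if c == c2}

# Step 2: natural join of R13 and R2 over Agent and Product.
rejoined = {(a, c, p) for a, c, p in r13 if (a, p) in r2}

print(len(r13), len(rejoined), rejoined == acp)  # 6 5 True
```

Joining only R1 and R3 produces one spurious row, (A2, PQR, Bolt), which the second join with R2 removes. All three relations are needed to recover ACP.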