Unit V: Normalization: Relational Database Design Pitfalls, Denormalized Data, Decomposition

The document discusses normalization of relational databases. It defines normalization as removing anomalies from database design like update, deletion, and insertion anomalies. It covers various normal forms including first, second, third, BCNF, fourth, and fifth normal forms. It also discusses relational database design concepts like relations, attributes, primary keys, relationships, decomposition, and dependency preservation.

Uploaded by

Arnav Chowdhury
UNIT V: NORMALIZATION

Normalization: Relational Database Design Pitfalls, Denormalized Data, Decomposition,
Normalization Using Functional Dependencies, First Normal Form, Second Normal Form,
Third Normal Form, BCNF, Fourth Normal Form and Fifth Normal Form.
RELATIONAL DATABASE DESIGN

• The relational data model was introduced by E. F. Codd in 1970. Currently, it
is the most widely used data model. The relational data model describes the
world as “a collection of inter-related relations (or tables).”
• A relational data model organizes data into tables (relations) that group
related elements. Each table has a primary key, or identifier. Other tables use
that identifier to provide "relational" links to its rows.
RELATIONAL DATABASE DESIGN

Table 1

Product_code  Description  Price
A416          Colour Pen   ₹ 25.00
C923          Pencil box   ₹ 45.00

Table 2

Invoice_code  Invoice_line  Product_code  Quantity
3804          1             A416          15
3804          2             C923          24
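
The link between the two tables can be sketched in Python. This is only an illustration, assuming the tables are held as in-memory dictionaries; the function name line_totals is hypothetical and not part of the original slides. Product_code in Table 2 acts as a foreign key that points at the primary key of Table 1.

```python
# In-memory copies of Table 1 and Table 2 (the storage layout is an assumption).
products = {
    "A416": {"Description": "Colour Pen", "Price": 25.00},
    "C923": {"Description": "Pencil box", "Price": 45.00},
}
invoice_lines = [
    {"Invoice_code": 3804, "Invoice_line": 1, "Product_code": "A416", "Quantity": 15},
    {"Invoice_code": 3804, "Invoice_line": 2, "Product_code": "C923", "Quantity": 24},
]


def line_totals(lines, products):
    """Follow each Product_code foreign key to its product row and price the line."""
    totals = []
    for line in lines:
        product = products[line["Product_code"]]  # lookup by primary key
        amount = line["Quantity"] * product["Price"]
        totals.append((line["Invoice_line"], product["Description"], amount))
    return totals


totals = line_totals(invoice_lines, products)
```

The description and price are stored once in Table 1, so each invoice line only needs to carry the product code.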


RELATIONAL DATABASE DESIGN

Designing a relational data model (RDM) involves four stages, which are as follows −

• Relations and attributes − The tables and the attributes of each table are identified. The tables represent entities, and the
attributes represent the properties of the respective entities.
• Primary keys − The attribute or set of attributes that uniquely identifies a record is identified and assigned as the primary key.
• Relationships − The relationships between the tables are established with the help of foreign keys. A foreign key is an attribute
(or set of attributes) in one table that refers to the primary key of another table. The types of relationships that can exist between
the relations (tables) are one to one, one to many, and many to many.
• Normalization − This is the process of organizing the database structure to avoid redundancy and the anomalies it causes.
The different normal forms are as follows:
• First normal form
• Second normal form
• Third normal form
• Boyce-Codd normal form
• Fourth normal form
• Fifth normal form
DISADVANTAGES OF USING RELATIONAL MODEL

• Some relational databases have limits on field lengths that cannot be exceeded.
• Relational databases can become complex as the amount of data grows and the
relations between pieces of data become more complicated.
• Complex relational database systems may lead to isolated databases, where
information cannot be shared from one system to another.
DECOMPOSITION

• Decomposition is the process of breaking something down into parts or elements.
• It replaces a relation with a collection of smaller relations.
• It breaks the table into multiple tables in a database.
• It should always be lossless, so that the information in the original relation
can be accurately reconstructed from the decomposed relations.
• If a relation is not decomposed properly, it may lead to problems such as
loss of information.
PROPERTIES OF DECOMPOSITION

Following are the properties of Decomposition,


• 1. Lossless Decomposition
• 2. Dependency Preservation
LOSSY DECOMPOSITION

• As the name suggests, a decomposition is lossy when a relation is decomposed
into two or more relational schemas and the original relation cannot be
recovered exactly by joining them again: information is lost.

<EmpInfo>

Emp_ID  Emp_Name  Emp_Age  Emp_Location  Dept_ID  Dept_Name
E001    Jacob     29       Alabama       Dpt1     Operations
E002    Henry     32       Alabama       Dpt2     HR
E003    Tom       22       Texas         Dpt3     Finance


LOSSY DECOMPOSITION

Decompose the above table into two tables −

<EmpDetails>

Emp_ID  Emp_Name  Emp_Age  Emp_Location
E001    Jacob     29       Alabama
E002    Henry     32       Alabama
E003    Tom       22       Texas

<DeptDetails>

Dept_ID  Dept_Name
Dpt1     Operations
Dpt2     HR
Dpt3     Finance

Now, you won’t be able to join the above tables to recover EmpInfo, since they share no attribute: Emp_ID isn’t part of the
DeptDetails relation.

Therefore, the above relation has lossy decomposition.
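
A short Python sketch can make the loss visible. It assumes each relation is a set of tuples; the variable names are illustrative. Because EmpDetails and DeptDetails have no attribute in common, a natural join degenerates into a Cartesian product, which pairs every employee with every department.

```python
from itertools import product

emp_info = {
    ("E001", "Jacob", 29, "Alabama", "Dpt1", "Operations"),
    ("E002", "Henry", 32, "Alabama", "Dpt2", "HR"),
    ("E003", "Tom", 22, "Texas", "Dpt3", "Finance"),
}

# Project EmpInfo onto the two decomposed schemas.
emp_details = {row[:4] for row in emp_info}
dept_details = {row[4:] for row in emp_info}

# With no shared attribute, the natural join is just the Cartesian product.
rejoined = {e + d for e, d in product(emp_details, dept_details)}

# Tuples that the join produces but that were never in EmpInfo.
spurious = rejoined - emp_info
```

The rejoined relation contains every original row plus extra rows such as Jacob in HR, so it is no longer possible to tell which employee belongs to which department.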


LOSSLESS DECOMPOSITION

• Decomposition must be lossless. It means that no information should be lost
from the relation that is decomposed.
• It guarantees that joining the decomposed relations yields the same relation
that was decomposed.

Example:
• Let 'E' be a relational schema with instance 'e', decomposed into E1,
E2, E3, . . . . En with instances e1, e2, e3, . . . . en. If e1 ⋈ e2 ⋈ e3 . . . . ⋈ en = e,
then it is called a 'Lossless Join Decomposition'.

• In other words, if the natural join of all the decomposed relations gives
back the original relation, then it is said to be a lossless join decomposition.
LOSSLESS DECOMPOSITION

Example: <Employee_Department> Table

Eid   Ename  Age  City       Salary  Deptid  DeptName
E001  ABC    29   Pune       20000   D001    Finance
E002  PQR    30   Pune       30000   D002    Production
E003  LMN    25   Mumbai     5000    D003    Sales
E004  XYZ    24   Mumbai     4000    D004    Marketing
E005  STU    32   Bangalore  25000   D005    Human Resource

• Decompose the above relation into two relations to check whether the decomposition is
lossless or lossy.
• Here, the relation is decomposed into two relations, Employee and Department.
LOSSLESS DECOMPOSITION

Relation 1 : <Employee> Table

Eid   Ename  Age  City       Salary
E001  ABC    29   Pune       20000
E002  PQR    30   Pune       30000
E003  LMN    25   Mumbai     5000
E004  XYZ    24   Mumbai     4000
E005  STU    32   Bangalore  25000

Relation 2 : <Department> Table

Deptid  Eid   DeptName
D001    E001  Finance
D002    E002  Production
D003    E003  Sales
D004    E004  Marketing
D005    E005  Human Resource
LOSSLESS DECOMPOSITION

Employee ⋈ Department

Eid   Ename  Age  City       Salary  Deptid  DeptName
E001  ABC    29   Pune       20000   D001    Finance
E002  PQR    30   Pune       30000   D002    Production
E003  LMN    25   Mumbai     5000    D003    Sales
E004  XYZ    24   Mumbai     4000    D004    Marketing
E005  STU    32   Bangalore  25000   D005    Human Resource
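
The check can be reproduced with a small Python sketch. It assumes relations are lists of dictionaries; natural_join, project and as_set are hypothetical helper names introduced here for illustration. The two projections share the attribute Eid, so the natural join matches rows on it.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[c] == b[c] for c in common)
    ]


def project(rel, attrs):
    """Project a relation onto attrs, dropping duplicate rows."""
    seen = {tuple(row[a] for a in attrs) for row in rel}
    return [dict(zip(attrs, t)) for t in seen]


def as_set(rel):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in rel}


columns = ["Eid", "Ename", "Age", "City", "Salary", "Deptid", "DeptName"]
rows = [
    ("E001", "ABC", 29, "Pune", 20000, "D001", "Finance"),
    ("E002", "PQR", 30, "Pune", 30000, "D002", "Production"),
    ("E003", "LMN", 25, "Mumbai", 5000, "D003", "Sales"),
    ("E004", "XYZ", 24, "Mumbai", 4000, "D004", "Marketing"),
    ("E005", "STU", 32, "Bangalore", 25000, "D005", "Human Resource"),
]
employee_department = [dict(zip(columns, r)) for r in rows]

employee = project(employee_department, ["Eid", "Ename", "Age", "City", "Salary"])
department = project(employee_department, ["Deptid", "Eid", "DeptName"])

rejoined = natural_join(employee, department)
is_lossless = as_set(rejoined) == as_set(employee_department)
```

The join returns exactly the five original rows because Eid identifies one row in each of the two relations.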
DEPENDENCY PRESERVATION
DECOMPOSITION

• Dependency preservation is another desirable property of a decomposition.
A decomposition D = {R1, R2, . . . . Rn} of a relation R with functional
dependencies F is dependency preserving if each functional dependency X -> Y
in F either appears directly in one of the relation schemas Ri of D or can be
inferred from the dependencies that hold in the individual Ri.
DEPENDENCY PRESERVATION
DECOMPOSITION

Example:
Let a relation R(A,B,C,D) and a set of FDs F = { A -> B , A -> C , C -> D} be given.
The relation R is decomposed into -
R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and
R2 = (C, D) with FDs F2 = {C -> D}.
• F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}
• so, F' = F.
• And so, F'+ = F+.

• Thus, the decomposition is dependency preserving decomposition.
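
When F' is not literally equal to F, the same test can be done with attribute closures: every FD in F must follow from F1 ∪ F2. The following Python sketch shows one way to do this, assuming FDs are stored as pairs of frozensets; closure and fd are hypothetical helper names, not taken from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


F = [fd("A", "B"), fd("A", "C"), fd("C", "D")]
F1 = [fd("A", "B"), fd("A", "C")]  # FDs that hold on R1(A, B, C)
F2 = [fd("C", "D")]                # FDs that hold on R2(C, D)

union = F1 + F2
# X -> Y is preserved when Y lies inside the closure of X under F1 ∪ F2.
preserved = all(rhs <= closure(lhs, union) for lhs, rhs in F)
```

If F2 were left out, the dependency C -> D could no longer be derived, and the same check would report that the decomposition does not preserve dependencies.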


NORMALIZATION

If a database design is not sound, it may contain anomalies, which are like a bad dream for any database administrator. Managing a
database with anomalies is next to impossible.

• Update anomalies − If copies of the same data item are scattered over several places and are not linked to each other properly,
an update may change some copies while leaving others with old values. Such updates leave the database in an inconsistent state.

• Deletion anomalies − We tried to delete a record, but parts of it were left undeleted because, unknown to us, the same data is also
saved somewhere else.

• Insert anomalies − We tried to insert data in a record that does not exist at all.

• Normalization is a method to remove all these anomalies and bring the database to a consistent state.
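
An update anomaly can be illustrated with a few lines of Python. The rows below are hypothetical sample data, not taken from the slides: the department name is repeated in every employee row, and a rename that touches only one copy leaves two different names for the same department.

```python
# Hypothetical denormalized rows: the department name is repeated per employee.
staff = [
    {"Emp_ID": "E001", "Dept_ID": "Dpt1", "Dept_Name": "Operations"},
    {"Emp_ID": "E004", "Dept_ID": "Dpt1", "Dept_Name": "Operations"},
]

# Rename the department through one row only; the other copy is missed.
staff[0]["Dept_Name"] = "Ops"
names_denormalized = {row["Dept_Name"] for row in staff if row["Dept_ID"] == "Dpt1"}

# Normalized layout: the name is stored once and employees refer to it by Dept_ID.
departments = {"Dpt1": "Operations"}
employees = [
    {"Emp_ID": "E001", "Dept_ID": "Dpt1"},
    {"Emp_ID": "E004", "Dept_ID": "Dpt1"},
]
departments["Dpt1"] = "Ops"
names_normalized = {departments[e["Dept_ID"]] for e in employees}
```

In the normalized layout there is only one copy of the name to change, so the inconsistency cannot arise.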
FIRST NORMAL FORM

• First Normal Form is defined in the definition of relations (tables) itself. This
rule requires that all the attributes in a relation have atomic domains. The
values in an atomic domain are indivisible units.

• Each attribute must contain only a single value from its pre-defined domain.

• We re-arrange the relation (table) as below, to convert it to First Normal Form.
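
A minimal Python sketch of such a re-arrangement follows. The Course/Content rows are hypothetical sample data and to_first_normal_form is an illustrative name; neither comes from the original slides. Each multi-valued cell is split so that every row holds one atomic value per attribute.

```python
# Hypothetical un-normalized rows: 'Content' holds several values per cell.
unnormalized = [
    {"Course": "Programming", "Content": ["Java", "C++"]},
    {"Course": "Web", "Content": ["HTML", "PHP", "ASP"]},
]


def to_first_normal_form(rows, multi_valued):
    """Emit one row per value so that every attribute holds a single atomic value."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            flat.append({**row, multi_valued: value})
    return flat


first_nf = to_first_normal_form(unnormalized, "Content")
```

The result repeats the Course value on each row, which is the redundancy that the higher normal forms then deal with.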
SECOND NORMAL FORM

Before we learn about the second normal form, we need to understand the following −

• Prime attribute − An attribute that is part of some candidate key is known as a prime attribute.

• Non-prime attribute − An attribute that is not part of any candidate key is said to be a non-prime
attribute.

• In second normal form, every non-prime attribute must be fully functionally dependent on every
candidate key. That is, if X → A holds for a candidate key X and a non-prime attribute A, then there
should not be any proper subset Y of X for which Y → A also holds.
SECOND NORMAL FORM

• In the Student_Project relation, the candidate key consists of Stu_ID and Proj_ID together.
According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must depend on
both and not on either key attribute individually. But we find that Stu_Name can be
identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is called partial
dependency, which is not allowed in Second Normal Form.

• We broke the relation in two as depicted in the above picture, so that no partial dependency exists.
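
Partial dependencies can be found mechanically by testing every proper subset of the key. The Python sketch below assumes the key is (Stu_ID, Proj_ID) and uses only the two dependencies described in the text; closure and partial_dependencies are hypothetical helper names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """Return (subset, attribute) pairs where a proper subset of key determines attribute."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_prime:
                if attr in closure(subset, fds):
                    found.append((subset, attr))
    return found


# FDs as described in the text; the composite key is an assumption stated above.
fds = [(("Stu_ID",), ("Stu_Name",)), (("Proj_ID",), ("Proj_Name",))]
key = ("Stu_ID", "Proj_ID")
violations = partial_dependencies(key, ["Stu_Name", "Proj_Name"], fds)
```

Both non-key attributes are reported, each determined by one half of the key, which is why Student_Project is not in Second Normal Form.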
THIRD NORMAL FORM

• For a relation to be in Third Normal Form, it must be in Second Normal Form
and the following must hold −

• No non-prime attribute is transitively dependent on a candidate key.

• For any non-trivial functional dependency X → A, either −
• X is a super key or,
• A is a prime attribute.
THIRD NORMAL FORM

We find that in the Student_detail relation, Stu_ID is the key and the only prime attribute. We find that City
can be identified by Stu_ID as well as by Zip itself. Zip is not a superkey, and City is not a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.

To bring this relation into third normal form, we break the relation into two relations, Student_Detail (Stu_ID, Stu_Name, Zip)
and ZipCodes (Zip, City).
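
The 3NF condition can be checked directly from the functional dependencies. The sketch below is a simplified illustration in Python: it assumes the relation has the attributes Stu_ID, Stu_Name, Zip and City, takes the set of prime attributes as an input rather than computing candidate keys, and uses the hypothetical helper names closure and third_nf_violations.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, prime):
    """Return FDs X -> A where X is not a superkey and A is not a prime attribute."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for a in rhs:
            if a in lhs:
                continue  # trivial dependency, always allowed
            if not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad


attributes = ["Stu_ID", "Stu_Name", "Zip", "City"]
fds = [(("Stu_ID",), ("Stu_Name", "Zip")), (("Zip",), ("City",))]
violations = third_nf_violations(attributes, fds, prime={"Stu_ID"})
```

Only Zip → City is reported, which matches the transitive dependency identified in the text.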
BOYCE-CODD NORMAL FORM

• Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form. BCNF states that −

• For any non-trivial functional dependency, X → A, X must be a super-key.

• In the above example, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip

and

Zip → City

which confirms that both the relations are in BCNF.
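
The BCNF rule is simple enough to test in a few lines of Python. This sketch assumes each relation is described by its attribute list and its FDs as tuples; closure and is_bcnf are hypothetical helper names. It also runs the test on the relation before it was split, for comparison.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attributes, fds):
    """True if the left side of every non-trivial FD is a superkey."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency
        if not closure(lhs, fds) >= set(attributes):
            return False
    return True


student_fd = (("Stu_ID",), ("Stu_Name", "Zip"))
zip_fd = (("Zip",), ("City",))

student_detail_ok = is_bcnf(["Stu_ID", "Stu_Name", "Zip"], [student_fd])
zip_codes_ok = is_bcnf(["Zip", "City"], [zip_fd])
# The relation before the split, with both FDs, fails because Zip is not a superkey.
undecomposed_ok = is_bcnf(["Stu_ID", "Stu_Name", "Zip", "City"], [student_fd, zip_fd])
```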


4TH NORMAL FORM

• For a table to satisfy the Fourth Normal Form, it should satisfy the following
two conditions:

• It should be in the Boyce-Codd Normal Form.

• And, the table should not have any non-trivial Multi-valued Dependency
(other than one whose left side is a super-key).
4TH NORMAL FORM

• What is Multi-valued Dependency?

• A table is said to have a multi-valued dependency if the following conditions are true:

• For a dependency A →→ B, a single value of A is associated with multiple values of B.
• The table has at least 3 columns.
• For a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B and
C are independent of each other.
• If all these conditions are true for a relation (table), it is said to have a multi-valued dependency.
4TH NORMAL FORM

s_id  course   hobby
1     Science  Cricket
1     Maths    Hockey
2     C#       Cricket
2     Php      Hockey

s_id  course   hobby
1     Science  Cricket
1     Maths    Hockey
1     Science  Hockey
1     Maths    Cricket
4TH NORMAL FORM

CourseOpted Table

s_id  course
1     Science
1     Maths
2     C#
2     Php

Hobbies Table

s_id  hobby
1     Cricket
1     Hockey
2     Cricket
2     Hockey
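
Under the formal definition, s_id →→ course holds in a table only when, for each s_id, every course appears with every hobby. The Python sketch below tests that condition on the two three-column tables and then joins CourseOpted and Hobbies back together. The helper names mvd_holds and rows_of are hypothetical, and reading the first table as an incomplete listing is an interpretation rather than something the slides state.

```python
def mvd_holds(rows, a, b, c):
    """Check a ->> b in R(a, b, c): per a-value, the (b, c) pairs form a full cross product."""
    groups = {}
    for row in rows:
        groups.setdefault(row[a], set()).add((row[b], row[c]))
    for pairs in groups.values():
        bs = {p[0] for p in pairs}
        cs = {p[1] for p in pairs}
        if pairs != {(x, y) for x in bs for y in cs}:
            return False
    return True


def rows_of(tuples):
    """Turn (s_id, course, hobby) tuples into row dictionaries."""
    return [dict(zip(("s_id", "course", "hobby"), t)) for t in tuples]


first = rows_of([(1, "Science", "Cricket"), (1, "Maths", "Hockey"),
                 (2, "C#", "Cricket"), (2, "Php", "Hockey")])
second = rows_of([(1, "Science", "Cricket"), (1, "Maths", "Hockey"),
                  (1, "Science", "Hockey"), (1, "Maths", "Cricket")])

first_has_all_combinations = mvd_holds(first, "s_id", "course", "hobby")
second_has_all_combinations = mvd_holds(second, "s_id", "course", "hobby")

# Decompose the first table and join the two parts again on s_id.
course_opted = {(r["s_id"], r["course"]) for r in first}
hobbies = {(r["s_id"], r["hobby"]) for r in first}
rejoined = {(s, c, h) for s, c in course_opted for s2, h in hobbies if s == s2}
```

The join yields every course and hobby combination per student, eight rows in total, which is the information the two smaller tables carry without repeating it.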
FIFTH NORMAL FORM / PROJECT-JOIN
NORMAL FORM (5NF):

• A relation R is in 5NF if and only if every non-trivial join dependency in R is
implied by the candidate keys of R. A relation decomposed into two or more
relations must have the lossless join property, which ensures that no spurious or
extra tuples are generated when the relations are reunited through a natural join.

• Properties – A relation R is in 5NF if and only if it satisfies the following conditions:

• R should already be in 4NF.

• It cannot be decomposed any further without loss (no remaining join dependency
that is not implied by the candidate keys).
FIFTH NORMAL FORM / PROJECT-JOIN
NORMAL FORM (5NF):

• Example – Consider a schema ACP (AGENT, COMPANY, PRODUCT), with a case as “if a
company makes a product and an agent is an agent for that company, then he always
sells that product for the company”. Under these circumstances, the ACP table is
shown as:

Table – ACP

AGENT  COMPANY  PRODUCT
A1     PQR      Nut
A1     PQR      Bolt
A1     XYZ      Nut
A1     XYZ      Bolt
A2     PQR      Nut
A2 PQR Nut
FIFTH NORMAL FORM / PROJECT-JOIN
NORMAL FORM (5NF):

• The relation ACP is decomposed into three relations, R1, R2 and R3:

Table – R1

AGENT  COMPANY
A1     PQR
A1     XYZ
A2     PQR

Table – R2

AGENT  PRODUCT
A1     Nut
A1     Bolt
A2     Nut

Table – R3

COMPANY  PRODUCT
PQR      Nut
PQR      Bolt
XYZ      Nut
XYZ      Bolt

The natural join of R1 and R3 over ‘Company’, followed by the natural join of the result (R13)
and R2 over ‘Agent’ and ‘Product’, gives back the table ACP.
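
The two joins can be traced in Python, assuming each relation is a set of tuples with the column order given in the comments. The intermediate result R13 contains one extra tuple, which the second join removes.

```python
r1 = {("A1", "PQR"), ("A1", "XYZ"), ("A2", "PQR")}                      # (agent, company)
r2 = {("A1", "Nut"), ("A1", "Bolt"), ("A2", "Nut")}                     # (agent, product)
r3 = {("PQR", "Nut"), ("PQR", "Bolt"), ("XYZ", "Nut"), ("XYZ", "Bolt")}  # (company, product)

acp = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# Step 1: natural join of R1 and R3 over company.
r13 = {(a, c, p) for a, c in r1 for c2, p in r3 if c == c2}

# Step 2: natural join of R13 and R2 over agent and product.
rejoined = {(a, c, p) for a, c, p in r13 if (a, p) in r2}

# Tuples present after step 1 that step 2 filters out.
removed_by_r2 = r13 - rejoined
```

Joining only two of the three relations is not enough here: R13 alone would claim that agent A2 sells Bolt for PQR, and it takes R2 to rule that out.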
