SQL Normalization

Normalization is the process of organizing data in a database to minimize redundancy and eliminate anomalies. It involves dividing larger tables into smaller ones linked by relationships, with various normal forms (1NF to 5NF) addressing different types of dependencies. While normalization offers advantages like data consistency and flexibility, it can also lead to performance issues and requires thorough understanding of user needs.

Uploaded by

Ravish
Copyright
© All Rights Reserved

Normalization

Organizing the data in SQL database

Ashish Zope
Data Engineer at Bajaj Finserv
What is Normalization?
➢ Normalization is the process of organizing the data in the database.

➢ Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics such as insertion, update, and deletion anomalies.

➢ Normalization divides a larger table into smaller tables and links them using relationships.

➢ Normal forms are used to reduce redundancy in the database tables.
Types of Normal Forms

• 1NF: Eliminate Repeating Groups
• 2NF: Eliminate Partial Functional Dependency
• 3NF: Eliminate Transitive Dependency
• 4NF: Eliminate Multi-Valued Dependency
• 5NF: Eliminate Join Dependency
Following are the various types of normal forms:

NORMAL FORM  DESCRIPTION
1NF          A relation is in 1NF if every attribute contains only atomic values.
2NF          A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF          A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF         Boyce-Codd normal form is a stronger definition of 3NF.
4NF          A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued dependency.
5NF          A relation is in 5NF if it is in 4NF and does not contain any non-trivial join dependency (other than those implied by its candidate keys); joining its projections should be lossless.
Advantages of Normalization

➢ Normalization helps to minimize data redundancy.


➢ Greater overall database organization.
➢ Data consistency within the database.
➢ Much more flexible database design.
➢ Enforces the concept of relational integrity.
Disadvantages of Normalization

➢ You cannot start building the database before knowing what the
user needs.
➢ The performance degrades when normalizing the relations to
higher normal forms, i.e., 4NF, 5NF.
➢ It is very time-consuming and difficult to normalize relations of a
higher degree.
➢ Careless decomposition may lead to a bad database design,
leading to serious problems.
First Normal Form (1NF)
If a relation contains a composite or multi-valued attribute, it violates the first
normal form. Conversely, a relation is in the first normal form if it does not contain
any composite or multi-valued attribute.

A relation is in first normal form if every attribute in that relation is a single-valued
attribute.

A table is in 1NF if:

➢ There are only Single Valued Attributes.

➢ Attribute Domain does not change.

➢ There is a unique name for every Attribute/Column.

➢ The order in which data is stored does not matter.


First Normal Form (1NF)
Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute
EmpMobNo.

EMPLOYEE table:
EmpID  EmpName  EmpMobNo                State
1473   Ashish   7744046830, 7666556747  MH
1323   Akshay   9767568976              MP
4545   Rahul    3444046830, 2666556747  GJ

The decomposition of the EMPLOYEE table into 1NF is shown below:

EmpID  EmpName  EmpMobNo    State
1473   Ashish   7744046830  MH
1473   Ashish   7666556747  MH
1323   Akshay   9767568976  MP
4545   Rahul    3444046830  GJ
4545   Rahul    2666556747  GJ
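
The conversion above can be sketched in Python with the standard sqlite3 module. This is a minimal illustration, not a prescribed method: storing the multi-valued cell as a comma-separated string and using (EmpID, EmpMobNo) as the primary key are assumptions of this sketch.

```python
import sqlite3

# Unnormalized rows from the EMPLOYEE example: one cell holds several numbers.
# Representing that cell as a comma-separated string is an assumption.
unnormalized = [
    (1473, "Ashish", "7744046830,7666556747", "MH"),
    (1323, "Akshay", "9767568976", "MP"),
    (4545, "Rahul", "3444046830,2666556747", "GJ"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " EmpID INTEGER, EmpName TEXT, EmpMobNo TEXT, State TEXT,"
    " PRIMARY KEY (EmpID, EmpMobNo))"
)

# Emit one row per (employee, phone number) so that every cell is atomic.
for emp_id, name, phones, state in unnormalized:
    for phone in phones.split(","):
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?, ?)",
            (emp_id, name, phone.strip(), state),
        )

rows_1nf = conn.execute(
    "SELECT EmpID, EmpName, EmpMobNo, State FROM employee ORDER BY rowid"
).fetchall()
for row in rows_1nf:
    print(row)
```

The three original rows become five, one per phone number, matching the 1NF table above.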
Second Normal Form (2NF)
Second Normal Form (2NF) is based on the concept of full functional
dependency. It applies to relations with composite keys, that is,
relations with a primary key composed of two or more attributes.

A relation is in Second Normal Form (2NF) if it is in First Normal Form
and every non-primary-key attribute is fully functionally dependent on
the primary key.

A table is in 2NF if:

➢ The relation is in 1NF.

➢ All non-key attributes are fully functionally dependent on the
primary key.
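
Whether a set of sample rows contradicts a functional dependency can be checked mechanically. The sketch below is a minimal Python illustration; the helper name `fd_holds` is hypothetical, and the rows are taken from the EMPLOYEE example that follows, assuming the key (EmpID, EmpMobNo). A passing check only shows that the sample data is consistent with the dependency, not that the dependency holds by design.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


employee = [
    {"EmpID": 1473, "EmpName": "Ashish", "EmpMobNo": "7744046830"},
    {"EmpID": 1473, "EmpName": "Ashish", "EmpMobNo": "7666556747"},
    {"EmpID": 1323, "EmpName": "Akshay", "EmpMobNo": "9767568976"},
    {"EmpID": 4545, "EmpName": "Rahul", "EmpMobNo": "3444046830"},
    {"EmpID": 4545, "EmpName": "Rahul", "EmpMobNo": "2666556747"},
]

# EmpName is determined by EmpID alone, i.e. by only part of the key.
name_depends_on_id = fd_holds(employee, ["EmpID"], ["EmpName"])
# EmpMobNo is not determined by EmpID: one employee has several numbers.
mobile_depends_on_id = fd_holds(employee, ["EmpID"], ["EmpMobNo"])
print(name_depends_on_id, mobile_depends_on_id)
```

Because EmpName is determined by EmpID alone, it is only partially dependent on the composite key, which is the situation 2NF rules out.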
Second Normal Form (2NF)
Example: Relation EMPLOYEE, with the composite key (EmpID, EmpMobNo), is not in 2NF
because EmpName depends only on EmpID, which is a partial dependency.

EMPLOYEE table:
EmpID  EmpName  EmpMobNo
1473   Ashish   7744046830
1473   Ashish   7666556747
1323   Akshay   9767568976
4545   Rahul    3444046830
4545   Rahul    2666556747

To convert the given table into 2NF, we decompose it into two tables:

EmpId Table
EmpID  EmpName
1473   Ashish
1323   Akshay
4545   Rahul

EmpDetails Table
EmpID  EmpMobNo
1473   7744046830
1473   7666556747
1323   9767568976
4545   3444046830
4545   2666556747
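
The same decomposition can be sketched with sqlite3 in Python. The table names `emp_name` and `emp_mobile` and the key choices are assumptions made for this illustration. The final check confirms that joining the two tables gives back exactly the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    EmpID INTEGER, EmpName TEXT, EmpMobNo TEXT,
    PRIMARY KEY (EmpID, EmpMobNo));
INSERT INTO employee VALUES
    (1473, 'Ashish', '7744046830'),
    (1473, 'Ashish', '7666556747'),
    (1323, 'Akshay', '9767568976'),
    (4545, 'Rahul',  '3444046830'),
    (4545, 'Rahul',  '2666556747');

-- EmpName depends on EmpID alone, so it moves to its own table.
CREATE TABLE emp_name (EmpID INTEGER PRIMARY KEY, EmpName TEXT);
INSERT INTO emp_name SELECT DISTINCT EmpID, EmpName FROM employee;

-- The phone numbers keep the composite key.
CREATE TABLE emp_mobile (
    EmpID INTEGER REFERENCES emp_name(EmpID),
    EmpMobNo TEXT,
    PRIMARY KEY (EmpID, EmpMobNo));
INSERT INTO emp_mobile SELECT EmpID, EmpMobNo FROM employee;
""")

original = set(conn.execute("SELECT EmpID, EmpName, EmpMobNo FROM employee"))
rejoined = set(conn.execute(
    "SELECT n.EmpID, n.EmpName, m.EmpMobNo "
    "FROM emp_name AS n JOIN emp_mobile AS m ON m.EmpID = n.EmpID"
))
name_rows = conn.execute("SELECT COUNT(*) FROM emp_name").fetchone()[0]
print(rejoined == original, name_rows)
```

Each employee name is now stored once (three rows instead of five), and no information is lost by the split.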
Third Normal Form (3NF)
A relation is in the third normal form if it is in the second normal form
and there is no transitive dependency for non-prime attributes.
Equivalently, a relation is in 3NF if at least one of the following
conditions holds for every non-trivial functional dependency X → Y:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).

A table is in 3NF if:

➢ It is in 2NF and does not contain any transitive dependency.

➢ 3NF is used to reduce data duplication. It is also used to achieve
data integrity.

➢ If a relation is in 2NF and there is no transitive dependency for
non-prime attributes, then the relation is in third normal form.
Third Normal Form (3NF)
Example: Relation EMPLOYEE is not in 3NF because of the transitive dependency
EmpID → StateCode → EmpState: the non-prime attribute EmpState depends on the
key EmpID only through StateCode.

EmployeeDetails Table
EmpID  EmpName  EmpState  StateCode
1473   Ashish   MH        1
1323   Akshay   MP        2
4545   Rahul    GJ        3
5431   Ashwini  MH        1

To convert the given table into 3NF, we decompose it into two tables:

EmployeeDetails Table
EmpID  EmpName  StateCode
1473   Ashish   1
1323   Akshay   2
4545   Rahul    3
5431   Ashwini  1

StateDetails Table
StateCode  EmpState
1          MH
2          MP
3          GJ
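
A minimal sqlite3 sketch of the two 3NF tables follows. The snake_case table names, the NOT NULL constraints, and the foreign key are assumptions of this illustration; the inserted row with StateCode 9 is hypothetical and exists only to show the constraint rejecting an unknown state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE state_details (
    StateCode INTEGER PRIMARY KEY,
    EmpState  TEXT NOT NULL);
CREATE TABLE employee_details (
    EmpID     INTEGER PRIMARY KEY,
    EmpName   TEXT NOT NULL,
    StateCode INTEGER NOT NULL REFERENCES state_details(StateCode));
INSERT INTO state_details VALUES (1, 'MH'), (2, 'MP'), (3, 'GJ');
INSERT INTO employee_details VALUES
    (1473, 'Ashish', 1), (1323, 'Akshay', 2),
    (4545, 'Rahul', 3), (5431, 'Ashwini', 1);
""")

# A join rebuilds the original view of the data.
rows = conn.execute(
    "SELECT e.EmpID, e.EmpName, s.EmpState "
    "FROM employee_details AS e "
    "JOIN state_details AS s ON s.StateCode = e.StateCode "
    "ORDER BY e.EmpID"
).fetchall()

# Hypothetical row: StateCode 9 does not exist in state_details.
try:
    conn.execute("INSERT INTO employee_details VALUES (9999, 'Hypothetical', 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

Each state name is stored once in `state_details`, so correcting it requires changing a single row.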
Boyce Codd normal form (BCNF)
Although 3NF is an adequate normal form for relational databases, it may
not remove all redundancy: a functional dependency X → Y can still hold in
which X is not a super key of the relation (3NF tolerates it when Y is a
prime attribute). This can be solved by Boyce-Codd Normal Form (BCNF).

A table is in BCNF if:

➢ BCNF is the advanced version of 3NF. It is stricter than 3NF.

➢ A table is in BCNF if, for every non-trivial functional dependency
X → Y, X is a super key of the table.

➢ For BCNF, the table should be in 3NF, and for every non-trivial FD,
the LHS is a super key.
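
The super key condition can be tested with the standard attribute-closure computation. The Python sketch below is illustrative only: the function names are hypothetical, and the functional dependencies (EmpID → EmpName, StateCode and StateCode → EmpState) are assumptions read off the EmployeeDetails example, not something the slides state explicitly.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


attributes = ["EmpID", "EmpName", "EmpState", "StateCode"]
fds = [
    (("EmpID",), ("EmpName", "StateCode")),  # assumed dependency
    (("StateCode",), ("EmpState",)),         # assumed dependency
]

violations = bcnf_violations(attributes, fds)
print(violations)
```

Under these assumed dependencies, only StateCode → EmpState is reported, because StateCode on its own does not determine EmpID or EmpName.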
Boyce Codd normal form (BCNF)
Example: Relation EMPLOYEE is not in BCNF because of the functional dependency
StateCode → EmpState, whose left-hand side StateCode is not a super key. We can split
the table into an Employee table, a State table, and a table mapping employees to states.

EmployeeDetails Table
EmpID  EmpName  EmpState  StateCode
1473   Ashish   MH        1
1323   Akshay   MP        2
4545   Rahul    GJ        3
5431   Ashwini  MH        1

To convert the given table into BCNF, we decompose it into three tables:

EmployeeDetails Table
EmpID  EmpName
1473   Ashish
1323   Akshay
4545   Rahul
5431   Ashwini

StateDetails Table
StateCode  EmpState
1          MH
2          MP
3          GJ

MappingEmployeeState Table
EmpID  StateCode
1473   1
1323   2
4545   3
5431   1
Fourth normal form (4NF)
Although BCNF is a stricter normal form than 3NF and removes redundancy
caused by functional dependencies, it may not address all types of
redundancy, particularly those caused by multivalued dependencies. This
can be solved by Fourth Normal Form (4NF).

A table is in 4NF if:

➢ 4NF is an advanced version of BCNF. It is stricter than BCNF.

➢ A table is in 4NF if it is in BCNF and does not contain any non-trivial
multivalued dependencies.

➢ For 4NF, the table should be in BCNF, and for every non-trivial multivalued
dependency, the left-hand side (LHS) must be a superkey.
Fourth normal form (4NF)
Example: We have a table named EmployeeDetails that contains information about
employees, their skills, and hobbies:

EmployeeDetails Table
EmpID  EmpName  Skills   Hobbies
1      Ashish   SQL      Photography
1      Ashish   PySpark  Photography
2      Akshay   PowerBI  Biking
2      Akshay   PowerBI  Painting
3      Rahul    Python   Reading
3      Rahul    JAVA     Reading

By splitting the original table into the three tables below, we eliminate the multi-valued
dependencies, and each table now represents a single entity.

Employees Table
EmpID  EmpName
1      Ashish
2      Akshay
3      Rahul

EmployeeSkills Table
EmpID  Skill
1      SQL
1      PySpark
2      PowerBI
3      Python
3      JAVA

EmployeeHobbies Table
EmpID  Hobby
1      Photography
2      Biking
2      Painting
3      Reading
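
The split can be reproduced with sqlite3 in Python, as a rough sketch. The snake_case table names and the column names Skill and Hobby are assumptions of this illustration. The check at the end confirms that, for this data, joining the three tables returns exactly the original rows, so skills and hobbies vary independently per employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_details (EmpID INTEGER, EmpName TEXT, Skill TEXT, Hobby TEXT);
INSERT INTO employee_details VALUES
    (1, 'Ashish', 'SQL',     'Photography'),
    (1, 'Ashish', 'PySpark', 'Photography'),
    (2, 'Akshay', 'PowerBI', 'Biking'),
    (2, 'Akshay', 'PowerBI', 'Painting'),
    (3, 'Rahul',  'Python',  'Reading'),
    (3, 'Rahul',  'JAVA',    'Reading');

-- One table per independent fact about an employee.
CREATE TABLE employees        AS SELECT DISTINCT EmpID, EmpName FROM employee_details;
CREATE TABLE employee_skills  AS SELECT DISTINCT EmpID, Skill   FROM employee_details;
CREATE TABLE employee_hobbies AS SELECT DISTINCT EmpID, Hobby   FROM employee_details;
""")

original = set(conn.execute(
    "SELECT EmpID, EmpName, Skill, Hobby FROM employee_details"
))
rejoined = set(conn.execute(
    "SELECT e.EmpID, e.EmpName, s.Skill, h.Hobby "
    "FROM employees AS e "
    "JOIN employee_skills  AS s ON s.EmpID = e.EmpID "
    "JOIN employee_hobbies AS h ON h.EmpID = e.EmpID"
))
counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("employees", "employee_skills", "employee_hobbies")
}
print(rejoined == original, counts)
```

The row counts (3, 5, and 4) match the three tables shown above.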
Fifth normal form (5NF)
Although 4NF is a stricter normal form than BCNF and removes redundancy
caused by multivalued dependencies, it may not address all types of
redundancy, particularly those caused by join dependencies. This can be
solved by Fifth Normal Form (5NF).

A table is in 5NF if:

➢ 5NF is an advanced version of 4NF. It is stricter than 4NF.

➢ A table is in 5NF if it is in 4NF and does not contain any non-trivial join
dependencies.

➢ For 5NF, the table should be in 4NF, and every non-trivial join dependency
must be implied by the candidate keys of the table.
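
A join dependency can be tested on sample rows by projecting the table onto each component and joining the projections back together. The Python sketch below is a minimal illustration; the function names and the Agent/Company/Product rows are hypothetical and do not come from the slides.

```python
from functools import reduce


def project(rows, attrs):
    """Project rows onto attrs and drop duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Combine rows that agree on every attribute they share."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in l.keys() & r.keys())
    ]


def join_dependency_holds(rows, components):
    """True if joining the projections onto components rebuilds rows exactly."""
    parts = [project(rows, attrs) for attrs in components]
    rejoined = reduce(natural_join, parts)
    as_set = lambda rel: {frozenset(row.items()) for row in rel}
    return as_set(rejoined) == as_set(rows)


# Hypothetical sample data.
rows = [
    {"Agent": "a1", "Company": "c1", "Product": "p1"},
    {"Agent": "a1", "Company": "c1", "Product": "p2"},
    {"Agent": "a1", "Company": "c2", "Product": "p1"},
    {"Agent": "a2", "Company": "c1", "Product": "p1"},
]
three_way = [("Agent", "Company"), ("Company", "Product"), ("Agent", "Product")]
two_way = [("Agent", "Company"), ("Company", "Product")]

print(join_dependency_holds(rows, three_way))
print(join_dependency_holds(rows, two_way))
```

For this sample, the three-way decomposition rebuilds the table exactly, while the two-way decomposition produces a spurious extra row. That is the kind of redundancy 5NF addresses and 4NF cannot.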
THANK YOU
Ashish Zope
Data Engineer at Bajaj Finserv

If you find this helpful, Repost and follow for more content.
Submit corrections in comments, if any.
