Normalization
SQL DATABASES
CONTENTS:
1. What is Normalization
2. Anomalies and their types
3. Types of Normalization
4. 1NF
5. 2NF
6. 3NF
7. BCNF
8. 4NF
9. 5NF
Normalization
• Database normalization is a schema design technique in which an
existing schema is modified to minimize data redundancy and
undesirable dependencies between attributes.
• Normalization splits a large table into smaller tables and defines
relationships between them, so that the data is organized more clearly.
Why is normalization needed?
• To avoid redundancy.
• To avoid/minimize anomalies and other issues.
Anomaly
• Anomalies are problems that occur in poorly planned, unnormalized
databases where unrelated data is stored in one table.
Types of Anomalies
• Insert anomaly
• Delete anomaly
• Update anomaly
Insert anomaly
• Insert anomaly occurs when data of certain attributes cannot be inserted into the
database without the presence of other attributes.
Delete anomaly
• Delete anomaly occurs when data for certain attributes is lost because other
attributes stored in the same row are deleted.
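The insert and delete anomalies described above can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the enrollment table, its columns and the student name are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical unnormalized table: student facts and course facts share a row.
conn.execute("""
    CREATE TABLE enrollment (
        stud_no     INTEGER NOT NULL,
        stud_name   TEXT    NOT NULL,
        course_name TEXT    NOT NULL,
        duration    TEXT    NOT NULL,
        PRIMARY KEY (stud_no, course_name)
    )
""")
conn.execute("INSERT INTO enrollment VALUES (1, 'Asha', 'Python', '3 months')")

# Insert anomaly: a course cannot be recorded until some student enrolls,
# because the student columns are mandatory in the same row.
try:
    conn.execute(
        "INSERT INTO enrollment (course_name, duration) "
        "VALUES ('Java', '5 months')"
    )
    insert_blocked = False
except sqlite3.IntegrityError:
    insert_blocked = True

# Delete anomaly: removing the only enrolled student also erases
# the fact that the Python course lasts 3 months.
conn.execute("DELETE FROM enrollment WHERE stud_no = 1")
remaining = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE course_name = 'Python'"
).fetchone()[0]

print(insert_blocked, remaining)
```

Keeping course facts in a table of their own would let a course exist without any student and survive the deletion of a student.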
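Update anomaly
• Update anomaly occurs when the same data is stored in several rows and a
change applied to only some of those rows leaves the table inconsistent.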
Course Table:
Course Name    Course Duration
C++            6 months
Java           5 months
Python         3 months
s001 P1
s001 P2
s002 P1
s003 P1
s003 P2
s004 P1
s005 P1
SECOND NORMAL FORM
[ 2NF ]
A table is said to be in 2NF, if it satisfies the following two conditions.
1. The table should be in 1NF.
2. The table should not contain any partial dependencies.
(or)
Every non-prime attribute should depend on the entire primary key (more
generally, on the whole of every candidate key), not on just a part of it.
(or)
No non-prime attribute is dependent on any proper subset of any candidate
key of the table.
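The 2NF condition above can be checked mechanically once the functional dependencies are written down. The sketch below is illustrative only: the function name partial_dependencies is hypothetical, it assumes the table has a single candidate key, and it inspects only the dependencies passed in (it does not derive implied ones). The attribute names are taken from the STUDENT example in this section.

```python
def partial_dependencies(candidate_key, fds):
    """Return the dependencies that break 2NF.

    candidate_key: iterable of attribute names forming the key.
    fds: list of (lhs, rhs) pairs, each an iterable of attribute names.
    A dependency is partial when its left side is a proper subset of the
    key and its right side contains at least one non-prime attribute.
    """
    key = set(candidate_key)
    violations = []
    for lhs, rhs in fds:
        lhs_set = set(lhs)
        non_prime = set(rhs) - key
        if lhs_set < key and non_prime:  # "<" means proper subset
            violations.append((sorted(lhs_set), sorted(non_prime)))
    return violations


key = ("stud_no", "course_no")

# course_fee depends on course_no alone: a partial dependency.
partial = partial_dependencies(key, [(("course_no",), ("course_fee",))])

# A dependency on the whole key is not reported.
full = partial_dependencies(
    key, [(("stud_no", "course_no"), ("course_fee",))]
)

print(partial, full)
```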
Partial Dependency
• The attribute eid (employee id) is the primary key of the EMPLOYEE table.
• All the non-prime attributes, ename (employee name), dob (date of birth) and salary,
are fully dependent on the primary key eid: the eid value alone uniquely
identifies the values of all other attributes in a row.
• Because the key is a single attribute, no attribute can depend on only a part of
it, so the table has no partial dependency.
Example 2:
stud_no    course_no    course_fee
101        C1           1500
101        C3           1500
102        C1           1500
103        C1           1500
104        C2           2000
104        C3           1500
105        C4           3000
• In the STUDENT table, stud_no and course_no together form the primary key,
because any student can take any number of courses.
• The non-prime attribute course_fee is partially dependent on the key,
because its value can be determined from course_no alone.
• Hence the table is not in 2NF.
• To convert it to 2NF, we split it into two tables, STUDENT and COURSE, each of
which is in 2NF.
• (stud_no, course_no) is the primary key of STUDENT, and course_no is the
primary key of COURSE.
STUDENT
stud_no    course_no
101        C1
101        C3
102        C1
103        C1
104        C2
104        C3
105        C4
COURSE
course_no    course_fee
C1           1500
C2           2000
C3           1500
C4           3000
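To see that this split loses no information, the two tables can be joined back together. The sketch below uses Python's built-in sqlite3 module with the rows from the example; the lowercase table names, the column types and the foreign key clause are assumptions made for illustration.

```python
import sqlite3

# Rows of the original table: (stud_no, course_no, course_fee).
rows = [
    (101, "C1", 1500), (101, "C3", 1500), (102, "C1", 1500),
    (103, "C1", 1500), (104, "C2", 2000), (104, "C3", 1500),
    (105, "C4", 3000),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (
        course_no  TEXT PRIMARY KEY,
        course_fee INTEGER NOT NULL
    );
    CREATE TABLE student (
        stud_no   INTEGER NOT NULL,
        course_no TEXT    NOT NULL REFERENCES course(course_no),
        PRIMARY KEY (stud_no, course_no)
    );
""")

# Each fee is stored once per course; INSERT OR IGNORE skips the repeats.
conn.executemany(
    "INSERT OR IGNORE INTO course VALUES (?, ?)",
    [(course, fee) for _, course, fee in rows],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(stud, course) for stud, course, _ in rows],
)

course_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]

# Joining the two tables reconstructs the original rows.
rejoined = conn.execute("""
    SELECT s.stud_no, s.course_no, c.course_fee
    FROM student AS s
    JOIN course AS c ON c.course_no = s.course_no
    ORDER BY s.stud_no, s.course_no
""").fetchall()

print(course_count, rejoined == sorted(rows))
```

After the split, changing the fee of a course means updating a single row in course rather than every enrollment row for that course.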
THIRD NORMAL FORM
[ 3NF ]
A table is said to be in 3NF or third normal form, if the following requirements are
satisfied.
• All 2NF requirements are fulfilled.
• There is no transitive dependency, i.e. no non-key column depends on
another non-key column.
(OR)
A relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X -> Y
• X is a super key.
• Y is a prime attribute (an attribute that belongs to some candidate key).
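The two conditions above translate directly into a small check. This is a sketch under stated assumptions: the function names closure and violations_3nf are hypothetical, the caller supplies the set of prime attributes, and only the listed dependencies are tested. The attribute names come from the BOOK_DETAILS example used in this section.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(all_attrs, prime_attrs, fds):
    """Return the listed FDs X -> Y that satisfy neither 3NF condition."""
    found = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # the non-trivial part of Y
        if not extra:
            continue
        x_is_superkey = closure(lhs, fds) >= set(all_attrs)
        y_is_prime = extra <= set(prime_attrs)
        if not x_is_superkey and not y_is_prime:
            found.append((tuple(lhs), tuple(sorted(extra))))
    return found


attrs = {"Book_ID", "Category_ID", "Category_type", "Price"}
fds = [
    (("Book_ID",), ("Category_ID", "Price")),
    (("Category_ID",), ("Category_type",)),
]

bad = violations_3nf(attrs, {"Book_ID"}, fds)
print(bad)
```

Here Book_ID is a super key, so the first dependency passes, while Category_ID is neither a super key nor does it determine a prime attribute.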
Conversion to 3NF
Remove the transitively dependent attribute from the relation and place it in a
new relation along with the determinant.
Example: BOOK_DETAILS
Book_ID    Category_ID    Category_type    Price
101        1              Competitive      600
102        2              Sports           350
103        2              Sports           275
104        1              Competitive      750
105        3              Novel            500
106        4              Poetry           350
• In the relation BOOK_DETAILS, Book_ID is the primary key, because the values of
Category_ID, Category_type and Price in every record can be uniquely
identified from Book_ID.
• The non-prime attributes Price, Category_ID and Category_type all depend on the
primary key.
• But Category_type also depends on the non-prime attribute Category_ID.
• So Category_type is transitively dependent on the primary key: Book_ID
determines Category_ID, and Category_ID determines Category_type.
To convert the relation to 3NF, it is decomposed into the two relations given below.
BOOK_DETAILS
Book_ID    Category_ID    Price
101        1              600
102        2              350
103        2              275
104        1              750
105        3              500
106        4              350
CATEGORY_DETAILS
Category_ID    Category_type
1              Competitive
2              Sports
3              Novel
4              Poetry
5              Puzzle
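One practical effect of this decomposition is that a category name is stored in exactly one place. The sketch below loads the two relations into an in-memory SQLite database with Python's sqlite3 module; the column types, the foreign key clause and the replacement name 'Athletics' are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category_details (
        category_id   INTEGER PRIMARY KEY,
        category_type TEXT NOT NULL
    );
    CREATE TABLE book_details (
        book_id     INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL
                    REFERENCES category_details(category_id),
        price       INTEGER NOT NULL
    );
    INSERT INTO category_details VALUES
        (1, 'Competitive'), (2, 'Sports'), (3, 'Novel'),
        (4, 'Poetry'), (5, 'Puzzle');
    INSERT INTO book_details VALUES
        (101, 1, 600), (102, 2, 350), (103, 2, 275),
        (104, 1, 750), (105, 3, 500), (106, 4, 350);
""")

# Renaming a category touches one row, however many books use it.
cursor = conn.execute(
    "UPDATE category_details SET category_type = 'Athletics' "
    "WHERE category_id = 2"
)
rows_updated = cursor.rowcount

# Both books in category 2 see the new name through the join.
renamed_books = conn.execute("""
    SELECT b.book_id, c.category_type
    FROM book_details AS b
    JOIN category_details AS c ON c.category_id = b.category_id
    WHERE b.category_id = 2
    ORDER BY b.book_id
""").fetchall()

print(rows_updated, renamed_books)
```

In the original single table the same rename would have to be applied to every book row in that category, which is where an update anomaly could creep in.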
BOYCE CODD NORMAL FORM
[ BCNF ]
• It is also called 3.5NF.
A table is said to be in BCNF if it satisfies the following conditions.
1. It is in 3NF.
2. Every determinant is a candidate key.
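The determinant condition can be tested with the attribute closure of each left-hand side. The sketch below uses the common formal phrasing of BCNF (the left side of every non-trivial dependency must be a super key) rather than testing for minimal candidate keys. The function names and the student/course/teacher relation are hypothetical and not part of the examples above.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    return [
        (tuple(lhs), tuple(rhs))
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)
        and not closure(lhs, fds) >= set(all_attrs)
    ]


# Hypothetical relation: each teacher teaches exactly one course, and a
# student has one teacher per course.
attrs = {"student", "course", "teacher"}
fds = [
    (("student", "course"), ("teacher",)),
    (("teacher",), ("course",)),
]

violations = bcnf_violations(attrs, fds)
print(violations)
```

In this relation course is a prime attribute, so teacher -> course is allowed by 3NF but not by BCNF, which is the gap between the two forms.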