CS331 - Chapter5 Normalization
CHAPTER 5
NORMALIZATION
CSE JUNIOR – SE JUNIOR – CS BACHELOR
WINTER 2021
Objectives
➢ What normalization is and what role it plays in the database
design process
➢ About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
➢ How normal forms can be transformed from lower normal forms
to higher normal forms
➢ That normalization and ER modeling are used to produce a
good database design
Motivation
➢ Table = basic building block of database design
➢ Table structure is of great interest.
What is normalization?
➢ A database design technique for organizing tables so as to reduce data redundancy and
undesirable dependencies.
➢ Divides larger tables into smaller tables and links them using relationships.
Objectives
➔ eliminate redundant (duplicated) data
➔ ensure data is stored logically.
Database Normal Forms
1st Normal Form → 2nd Normal Form → 3rd Normal Form → Boyce-Codd Normal Form → 4th Normal Form
1NF Rules
1. Each table cell should contain a single value.
✓ It should only have single or atomic valued attributes
✓ Values stored in a column should be of the same domain
✓ All the columns of this table should have unique names.
✓ The order in which data is stored does not matter.
Example: multi-valued attribute
roll_no  name           subject
101      John Snow      OS, CN
103      Samwell Tarly  Java
102      Alisha Robert  C, C++
➔ Break the values into atomic values.
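A minimal Python sketch of the "break into atomic values" step, using the rows of the example table. The function name to_1nf is hypothetical, and the sketch assumes the multiple subjects are separated by commas, as they appear on the slide.

```python
# Rows copied from the example table; `subject` may hold several comma-separated values.
rows = [
    (101, "John Snow", "OS, CN"),
    (103, "Samwell Tarly", "Java"),
    (102, "Alisha Robert", "C, C++"),
]

def to_1nf(rows):
    """Return one row per (roll_no, name, single subject)."""
    atomic = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            atomic.append((roll_no, name, subject.strip()))
    return atomic

atomic_rows = to_1nf(rows)
for row in atomic_rows:
    print(row)
```

After the split, every cell holds a single value, at the cost of repeating roll_no and name once per subject.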
2NF Rules
1. Be in 1NF.
2. Have no partial dependency.
What is dependency?
student_id  Name           Registration number (reg_no)  branch  address
10          John Snow      07-WY                         CSE     CA
11          Samwell Tarly  08-WY                         SE      LA
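What is partial dependency?
Subject
subject_id  subject_name
1           Java
2           C++
3           Php
Student
student_id  Name           Registration number (reg_no)  branch  address
10          John Snow      07-WY                         CSE     CA
11          Samwell Tarly  08-WY                         SE      LA
Score
student_id  subject_id  marks  teacher
10          1           70     Java Teacher
10          2           75     C++ Teacher
11          1           80     Java Teacher
➢ The key of Score is (student_id, subject_id), but teacher depends on subject_id alone, i.e. on only
part of the key: this is a partial dependency.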
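The partial dependency in the Score rows can be checked mechanically. The sketch below is illustrative only: the helper name holds and the column names student_id, subject_id, marks and teacher are assumptions chosen to match the rows above. Sample rows can refute a functional dependency but cannot prove that it holds in general.

```python
def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs is not violated by `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a different one later violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]

print(holds(score, ["subject_id"], ["teacher"]))              # True: depends on part of the key
print(holds(score, ["subject_id"], ["marks"]))                # False: marks needs the whole key
print(holds(score, ["student_id", "subject_id"], ["marks"]))  # True
```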
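How to remove partial dependency?
➢ Move the teacher column out of Score and into Subject, where it depends on the whole key.
Subject
subject_id  subject_name  Teacher
Score
student_id  subject_id  marks
10          1           70
10          2           75
11          1           80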
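One way to try the decomposition out is an in-memory SQLite database via Python's standard sqlite3 module. This is a sketch under assumptions: the lowercase table and column names are chosen for illustration, and the Subject rows (including the Php teacher) are taken from the Subject tables shown elsewhere in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subject_id   INTEGER PRIMARY KEY,
    subject_name TEXT,
    teacher      TEXT              -- now depends on the whole key of subject
);
CREATE TABLE score (
    student_id INTEGER,
    subject_id INTEGER REFERENCES subject(subject_id),
    marks      INTEGER,
    PRIMARY KEY (student_id, subject_id)
);
""")
conn.executemany(
    "INSERT INTO subject VALUES (?, ?, ?)",
    [(1, "Java", "Java Teacher"), (2, "C++", "C++ Teacher"), (3, "Php", "Php Teacher")],
)
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [(10, 1, 70), (10, 2, 75), (11, 1, 80)],
)

# A join recovers the original four-column view without storing teacher per score row.
joined = conn.execute("""
    SELECT sc.student_id, sc.subject_id, sc.marks, su.teacher
    FROM score AS sc
    JOIN subject AS su ON su.subject_id = sc.subject_id
    ORDER BY sc.student_id, sc.subject_id
""").fetchall()
print(joined)
```

Each teacher name is now stored once per subject, so renaming a teacher touches a single row.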
2NF Recap
3NF Rules
1. Be in 2NF
2. No transitive functional dependency.
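What is transitive dependency?
Student
student_id  Name           Registration number (reg_no)  branch  address
10          John Snow      07-WY                         CSE     CA
11          Samwell Tarly  08-WY                         SE      LA
12          Alisha Robert  09-WY                         IT      FL
Subject
subject_id  subject_name  Teacher
1           Java          Java Teacher
2           C++           C++ teacher
3           Php           Php teacher
Score
student_id  subject_id  marks  exam_name    total_marks
10          1           70     Workshop     200
10          2           75     Practicals   70
11          1           80     Theoretical  30
➢ total_marks depends on exam_name, which is not part of the key (student_id, subject_id): a non-key
attribute depends on another non-key attribute, which is a transitive dependency.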
How to remove transitive dependency?
Student
student_id  Name           reg_no  branch  address
10          John Snow      07-WY   CSE     CA
11          Samwell Tarly  08-WY   SE      LA
12          Alisha Robert  09-WY   IT      FL
Subject
Subject_id  Subject_Name  Teacher
1           Java          Java Teacher
2           C++           C++ teacher
3           Php           Php teacher
Score
student_id  subject_id  marks  Exam_Id
10          1           70     1
10          2           75     2
11          1           80     3
Exam
Exam_id  Exam_name   Total_marks
1        Workshop    200
2        Mains       70
3        Practicals  30
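The move from a single Score table to Score plus Exam can be sketched in plain Python. The function name split_exam and the dictionary keys are hypothetical, and the input rows are the pre-decomposition Score rows (Workshop, Practicals, Theoretical), so the generated exam names differ from the Exam table above.

```python
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "exam_name": "Workshop", "total_marks": 200},
    {"student_id": 10, "subject_id": 2, "marks": 75, "exam_name": "Practicals", "total_marks": 70},
    {"student_id": 11, "subject_id": 1, "marks": 80, "exam_name": "Theoretical", "total_marks": 30},
]

def split_exam(score_rows):
    """Move (exam_name, total_marks) into an Exam table keyed by a generated exam_id."""
    exam_ids = {}   # (exam_name, total_marks) -> exam_id
    new_score = []
    for row in score_rows:
        exam = (row["exam_name"], row["total_marks"])
        # Reuse the id if this exam was seen before, otherwise assign the next one.
        exam_id = exam_ids.setdefault(exam, len(exam_ids) + 1)
        new_score.append({
            "student_id": row["student_id"],
            "subject_id": row["subject_id"],
            "marks": row["marks"],
            "exam_id": exam_id,
        })
    exam_table = [
        {"exam_id": exam_id, "exam_name": name, "total_marks": total}
        for (name, total), exam_id in exam_ids.items()
    ]
    return new_score, exam_table

new_score, exam_table = split_exam(score)
print(new_score)
print(exam_table)
```

In the result, total_marks is stored once per exam and Score refers to it only through exam_id.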
Super Key
➢ a set of one or more attributes that uniquely identifies rows in a table.
➢ may include additional attributes that are not needed for unique identification.
StudID is a PK.
Alternate Key
➢ column or group of columns in a table that uniquely identify every row in that table.
➢ A table can have multiple choices for a primary key but only one can be set as the primary key. All the
keys which are not PK are called Alternate Keys.
In this table, StudID, Roll No, Email are qualified to become a primary key.
But since StudID is the primary key, Roll No and Email become the alternate keys.
Candidate Key
➢ set of attributes that uniquely identify tuples in a table.
➢ A candidate key is a super key with no redundant attributes (a minimal super key).
➢ The primary key should be selected from the candidate keys.
➢ Every table must have at least a single candidate key.
➢ A table can have multiple candidate keys but only a single primary key.
StudID  Roll No  First Name  LastName  Email
1       11       Tom         Price     [email protected]
2       12       Nick        Wright    [email protected]
3       13       Dana        Natan     [email protected]
Properties of a candidate key:
• It must contain unique values
• It may have multiple attributes
• It must not contain null values
• It should contain the minimum fields needed to ensure uniqueness
• It uniquely identifies each record in a table
StudID, Roll No, and Email are candidate keys which help us to uniquely identify the student record in
the table.
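The "minimal super key" idea can be illustrated with a brute-force search over attribute subsets. This is a sketch, not a design tool: the helper names are hypothetical, the email addresses are placeholders (the slide's addresses are masked), and the fourth row is an invented second "Tom Price" added so that the name columns are visibly not unique. Sample data can only suggest keys; real keys come from the business rules.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows share the same values for `attrs`."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, attributes):
    """Return the minimal super keys of `rows`, smallest attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(key) <= set(attrs) for key in keys):
                continue  # contains a smaller key, so it is a super key but not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

students = [
    {"StudID": 1, "RollNo": 11, "FirstName": "Tom", "LastName": "Price", "Email": "tom@example.com"},
    {"StudID": 2, "RollNo": 12, "FirstName": "Nick", "LastName": "Wright", "Email": "nick@example.com"},
    {"StudID": 3, "RollNo": 13, "FirstName": "Dana", "LastName": "Natan", "Email": "dana@example.com"},
    # Hypothetical extra row: a second student with the same first and last name.
    {"StudID": 4, "RollNo": 14, "FirstName": "Tom", "LastName": "Price", "Email": "tom2@example.com"},
]

keys = candidate_keys(students, ["StudID", "RollNo", "FirstName", "LastName", "Email"])
print(keys)  # [('StudID',), ('RollNo',), ('Email',)]
```

Any set that adds further columns to one of these keys, such as (StudID, FirstName), is still a super key but is skipped because it is not minimal.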
Foreign Key
➢ a column that creates a relationship between two tables.
➢ The purpose of Foreign keys is to maintain data integrity and allow navigation between two different
instances of an entity.
➢ It acts as a cross-reference between two tables as it references the primary key of another table.
Teacher
Teacher ID  Fname  Lname
B002        David  Warner
B017        Sara   Joseph
B009        Mike   Brunton
Department
DeptCode  DeptName
001       Science
002       English
005       Computer
With these two tables alone, we cannot see which teacher works in which department.
→ To create a relationship between the two tables, we can add DeptCode to the Teacher table as a FK
➔ referential integrity.
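Referential integrity can be observed with Python's standard sqlite3 module. In this sketch the lowercase table and column names are assumptions, and the department assigned to each teacher is invented for illustration, since the slide does not say who works where. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE department (
    dept_code TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE teacher (
    teacher_id TEXT PRIMARY KEY,
    fname      TEXT,
    lname      TEXT,
    dept_code  TEXT REFERENCES department(dept_code)  -- the foreign key
);
""")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("001", "Science"), ("002", "English"), ("005", "Computer")],
)
# Department assignments below are hypothetical.
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?, ?)",
    [("B002", "David", "Warner", "001"),
     ("B017", "Sara", "Joseph", "002"),
     ("B009", "Mike", "Brunton", "005")],
)

# A teacher pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO teacher VALUES ('B999', 'No', 'Dept', '999')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)

pairs = conn.execute("""
    SELECT t.fname, d.dept_name
    FROM teacher AS t
    JOIN department AS d ON d.dept_code = t.dept_code
    ORDER BY t.teacher_id
""").fetchall()
print(pairs)
```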
BCNF Rules
1. Be in 3NF
2. For any dependency A → B, A should be a super key
▪ For a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
▪ When a table has more than one candidate key, anomalies may result even though
the relation is in 3NF.
▪ BCNF is a special case of 3NF: every relation in BCNF is also in 3NF, but not the reverse.
▪ A relation is in BCNF if, and only if, every determinant is a candidate key.
Example not satisfying BCNF
School enrollment
student_id  subject  professor
101         Java     Donna Anderson
101         C++      Emma Klein
102         Java     John Louis
103         C#       Daniel Robert
104         Java     Donna Anderson
➢ One student can enroll for multiple subjects.
• For example, the student with student_id 101 has opted for subjects Java & C++.
➢ For each subject, a professor is assigned to the student.
➢ There can be multiple professors teaching one subject, as we have for Java.
❖ student_id & subject together form the PK, because using them, we can find all the columns of the table.
❖ One professor teaches only one subject, but one subject may have two different professors ➔ there is a
dependency between subject and professor here, where subject depends on the professor name.
✓ This table satisfies the 1NF because all the values are atomic, column names are unique and all the values
stored in a particular column are of same domain.
✓ This table also satisfies the 2NF as there is no Partial Dependency.
✓ Since there is no Transitive Dependency, the table also satisfies the 3NF
Why does it not satisfy BCNF, and how can it be brought into BCNF?
➢ student_id & subject form the primary key, which means the subject column is a prime attribute.
➢ But there is one more dependency: professor → subject.
➢ Since subject is a prime attribute and professor is a non-prime attribute, this dependency is not allowed by BCNF.
➢ To make this relation/table satisfy BCNF, it should be decomposed into two tables:
student table & professor table.
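The slide does not spell out the columns of the two tables. A common choice, assumed here, is professor(professor, subject) and student(student_id, professor). The sketch below builds both from the enrollment rows and confirms that joining them on professor gives back the original table.

```python
enrollment = [
    (101, "Java", "Donna Anderson"),
    (101, "C++", "Emma Klein"),
    (102, "Java", "John Louis"),
    (103, "C#", "Daniel Robert"),
    (104, "Java", "Donna Anderson"),
]

# professor -> subject holds, so each professor appears exactly once here.
professor = sorted({(prof, subj) for _, subj, prof in enrollment})
# Which professor each student is enrolled with.
student = sorted({(sid, prof) for sid, _, prof in enrollment})

# Joining the two tables on professor reconstructs the enrollment rows.
subject_of = dict(professor)
rejoined = sorted((sid, subject_of[prof], prof) for sid, prof in student)

print(professor)
print(student)
print(rejoined == sorted(enrollment))  # True
```

In the professor table the determinant professor is the key, which is what BCNF requires.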
Generic explanation
When a table has more than one candidate key, anomalies may
result even though the relation is in 3NF. BCNF is a special case of
3NF.
A relation is in BCNF if, and only if, every determinant
is a candidate key.
4th Normal Form
A relation will be in 4NF if
• it is in Boyce Codd normal form
• has no multi-valued dependency.
• First Normal Form (1NF): only single values are permitted at the intersection of each
row and column so there are no repeating groups
• Second Normal Form (2NF): the relation must be in 1NF and every non-key attribute must
depend on the whole PK (no partial dependency in the case of a composite PK)
• Third Normal Form (3NF): the relation must be in 2NF and all transitive dependencies
must be removed; a non-key attribute may not be functionally dependent on
another non-key attribute
▪ Use an ERD to provide the big picture, or macro view, of an organization’s data
requirements and operations.
➢ This is created through an iterative process that involves identifying relevant entities, their
attributes and their relationships.