CS331 - Chapter5 Normalization

1) Normalization is a database design technique used to organize data into tables to reduce redundancy and improve data integrity. 2) It involves dividing larger tables into smaller tables and linking them using relationships. This is done by applying normal forms like 1NF, 2NF, 3NF etc. 3) The goals of normalization are to eliminate redundant data, ensure data is stored logically and minimize data modification issues.

Uploaded by

Malek Msadek

CS331 DATABASE MANAGEMENT SYSTEMS

CHAPTER 5
NORMALIZATION
CSE JUNIOR – SE JUNIOR – CS BACHELOR
WINTER 2021
Objectives
➢ What normalization is and what role it plays in the database design process
➢ About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
➢ How tables can be transformed from lower normal forms to higher normal forms
➢ That normalization and ER modeling are used together to produce a good database design

Motivation
➢ Table = basic building block of database design
➢ Table structure is of great interest.
➢ Poor table structures can be created even in a good database design.
➢ Normalization
• Recognizing poor table structures
• Producing good tables
➢ Normalization is a process for evaluating and correcting table structures to minimize data redundancies, which reduces the likelihood of data anomalies.

What is normalization?
➢ Database design technique → organizing tables → reducing redundancy and dependency of data.
➢ Divides larger tables into smaller tables and links them using relationships.

Objectives
➔ eliminate redundant (duplicated) data
➔ ensure data is stored logically.

Normal forms and their originators
• First normal form (1NF), second normal form (2NF), third normal form (3NF): Edgar Codd
• Boyce-Codd normal form (BCNF): Raymond F. Boyce and Edgar Codd

Database Normal Forms
1st Normal Form → 2nd Normal Form → 3rd Normal Form → Boyce-Codd Normal Form → 4th Normal Form → 5th Normal Form → 6th Normal Form

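Problems without normalization
❖ Data redundancy
❖ Wasted storage space
❖ Database handling problems when querying and updating: insert, update, and delete anomalies
Example – Student table

rollno  name            branch  hod               office_tel
401     John Snow       CSE     M. Michael Louis  53337
402     Samwell Tarly   CSE     M. Michael Louis  53337
403     Alisha Robert   CSE     M. Michael Louis  53337
404     Donna Anderson  CSE     M. Michael Louis  53337

The table holds 4 Computer Science students: enrollment number (rollno), name, branch, head of department (hod), and office phone number (office_tel).
Branch, hod, and office_tel are repeated for every student in the same branch of the college ➔ Data Redundancy.
This redundancy leads to three anomalies:
• Insert anomaly: a branch and its head of department cannot be recorded until at least one student is enrolled in that branch.
• Update anomaly: if the head of department changes, every student row of the branch must be updated; missing one row leaves the data inconsistent.
• Delete anomaly: deleting the last student of a branch also deletes the branch, hod, and office_tel information.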
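
The update and delete anomalies can be reproduced with a few lines of code. The sketch below is only an illustration: it loads the Student table above into an in-memory SQLite database using Python's standard sqlite3 module, and the replacement head of department "New HoD" is a made-up value, not part of the slides.

```python
import sqlite3

# In-memory database holding the unnormalized Student table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "rollno INTEGER PRIMARY KEY, name TEXT, branch TEXT, hod TEXT, office_tel TEXT)"
)
rows = [
    (401, "John Snow", "CSE", "M. Michael Louis", "53337"),
    (402, "Samwell Tarly", "CSE", "M. Michael Louis", "53337"),
    (403, "Alisha Robert", "CSE", "M. Michael Louis", "53337"),
    (404, "Donna Anderson", "CSE", "M. Michael Louis", "53337"),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", rows)

# Update anomaly: the head of department changes ("New HoD" is hypothetical),
# but the update reaches only one of the four redundant copies.
conn.execute("UPDATE student SET hod = 'New HoD' WHERE rollno = 401")
hods_for_cse = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT hod FROM student WHERE branch = 'CSE' ORDER BY hod"
    )
]
print(hods_for_cse)  # the same branch now has two different heads on record

# Delete anomaly: once the last student row is gone, nothing in the database
# remembers the branch, its head of department, or the office phone number.
conn.execute("DELETE FROM student")
rows_left = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rows_left)
```

Storing branch, hod, and office_tel once in a separate branch table would remove both problems, which is what the normal forms in the rest of the chapter formalize.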
Reasons for Database Normalization

1. Minimize duplicate data,
2. Minimize or avoid data modification issues,
3. Simplify queries.

1NF Rules
1. Each table cell should contain a single value.
✓ Every attribute holds only single (atomic) values.
✓ All values stored in a column belong to the same domain.
✓ Every column in the table has a unique name.
✓ The order in which the rows are stored does not matter.

2. Each record needs to be unique.

Example: multi-valued attribute

roll_no  name           subject
101      John Snow      OS, CN
103      Samwell Tarly  Java
102      Alisha Robert  C, C++

Break the multi-valued entries into atomic values:

roll_no  name           subject
101      John Snow      OS
101      John Snow      CN
103      Samwell Tarly  Java
102      Alisha Robert  C
102      Alisha Robert  C++
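
The same transformation can be sketched in code. The function name to_first_normal_form is an illustrative choice, and the sketch assumes that multi-valued subjects are stored as a comma-separated string, as in the table above.

```python
# Rows of the original table; the subject column may hold several values.
raw_rows = [
    (101, "John Snow", "OS, CN"),
    (103, "Samwell Tarly", "Java"),
    (102, "Alisha Robert", "C, C++"),
]


def to_first_normal_form(rows):
    """Return one (roll_no, name, subject) row per atomic subject value."""
    atomic_rows = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            atomic_rows.append((roll_no, name, subject.strip()))
    return atomic_rows


atomic = to_first_normal_form(raw_rows)
for row in atomic:
    print(row)
```

After the split, roll_no alone no longer identifies a row; the pair (roll_no, subject) does, which keeps every record unique as rule 2 requires.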
2NF Rules

1. Be in 1NF.
2. No partial dependency.

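What is dependency?

Student
student_id  name           reg_no  branch  address
10          John Snow      07-WY   CSE     CA
11          Samwell Tarly  08-WY   SE      LA

reg_no = registration number.
student_id is the primary key: knowing a student_id is enough to find that student's name, reg_no, branch, and address, so every other column is functionally dependent on student_id.
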
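What is partial dependency?

Subject
subject_id  subject_name
1           Java
2           C++
3           Php

Student
student_id  name           reg_no  branch  address
10          John Snow      07-WY   CSE     CA
11          Samwell Tarly  08-WY   SE      LA

Score
student_id  subject_id  marks  teacher
10          1           70     Java Teacher
10          2           75     C++ Teacher
11          1           80     Java Teacher

In Score, the primary key is the pair (student_id, subject_id). marks depends on the whole key, but teacher depends only on subject_id, which is just a part of the key ➔ partial dependency.
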
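How to remove partial dependency?
Move the teacher column out of Score and into Subject, where it depends on the whole key (subject_id).

Subject
subject_id  subject_name  teacher
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher

Score
student_id  subject_id  marks
10          1           70
10          2           75
11          1           80
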
2NF Recap

1. 2NF = 1NF without partial dependency.
2. A partial dependency exists when, for a composite primary key, an attribute in the table depends on only a part of the primary key and not on the complete primary key.
3. To remove a partial dependency, divide the table: remove the attribute that causes the partial dependency and move it to another table where it fits well.
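
A partial dependency can be looked for mechanically by testing whether a part of the key already determines an attribute. The helper below, called holds here for illustration, checks a functional dependency against the sample rows of the Score table from the earlier slide. Sample rows can only refute a dependency, not prove it, so treat this as a rough check rather than a design rule.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the given rows.

    rows is a list of dicts; determinant and dependent are tuples of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # The dependency fails if the same key maps to two different values.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Score table before the teacher column was moved to Subject.
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]

# teacher is determined by subject_id alone, a part of the composite key.
teacher_is_partial = holds(score, ("subject_id",), ("teacher",))

# marks needs the whole key: neither part of the key determines it alone.
marks_needs_whole_key = (
    holds(score, ("student_id", "subject_id"), ("marks",))
    and not holds(score, ("subject_id",), ("marks",))
    and not holds(score, ("student_id",), ("marks",))
)

print(teacher_is_partial, marks_needs_whole_key)
```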

3NF Rules

1. Be in 2NF
2. No transitive functional dependency.

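What is transitive dependency?

Subject
subject_id  subject_name  teacher
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher

Student
student_id  name           reg_no  branch  address
10          John Snow      07-WY   CSE     CA
11          Samwell Tarly  08-WY   SE      LA
12          Alisha Robert  09-WY   IT      FL

Score
student_id  subject_id  marks  exam_name    total_marks
10          1           70     Workshop     200
10          2           75     Practicals   70
11          1           80     Theoretical  30

In Score, the primary key is (student_id, subject_id). exam_name depends on that key, but total_marks depends on exam_name, which is a non-key attribute. total_marks therefore depends on the key only through exam_name ➔ transitive dependency.
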
How to remove Transitive Dependency?
Move exam_name and total_marks into a separate Exam table and keep only exam_id in Score.

Student
student_id  name           reg_no  branch  address
10          John Snow      07-WY   CSE     CA
11          Samwell Tarly  08-WY   SE      LA
12          Alisha Robert  09-WY   IT      FL

Subject
subject_id  subject_name  teacher
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher

Score
student_id  subject_id  marks  exam_id
10          1           70     1
10          2           75     2
11          1           80     3

Exam
exam_id  exam_name   total_marks
1        Workshop    200
2        Mains       70
3        Practicals  30
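
The decomposition into Score and Exam can be sketched as a small function. The name split_exam and the way exam_id values are assigned (1, 2, 3 in order of first appearance) are assumptions made for this illustration; the rows are those of the Score and Exam tables above, joined back together.

```python
# Score table that still carries exam_name and total_marks (not yet in 3NF).
score_with_exam = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "exam_name": "Workshop", "total_marks": 200},
    {"student_id": 10, "subject_id": 2, "marks": 75, "exam_name": "Mains", "total_marks": 70},
    {"student_id": 11, "subject_id": 1, "marks": 80, "exam_name": "Practicals", "total_marks": 30},
]


def split_exam(rows):
    """Split rows into (score, exam) so that total_marks is stored once per exam."""
    exam_ids = {}  # (exam_name, total_marks) -> exam_id
    exam, score = [], []
    for row in rows:
        key = (row["exam_name"], row["total_marks"])
        if key not in exam_ids:
            exam_ids[key] = len(exam_ids) + 1
            exam.append(
                {"exam_id": exam_ids[key], "exam_name": key[0], "total_marks": key[1]}
            )
        score.append(
            {
                "student_id": row["student_id"],
                "subject_id": row["subject_id"],
                "marks": row["marks"],
                "exam_id": exam_ids[key],
            }
        )
    return score, exam


score, exam = split_exam(score_with_exam)

# The decomposition is lossless: joining Score and Exam on exam_id
# reproduces the original rows.
exam_by_id = {e["exam_id"]: e for e in exam}
rejoined = [
    {
        "student_id": s["student_id"],
        "subject_id": s["subject_id"],
        "marks": s["marks"],
        "exam_name": exam_by_id[s["exam_id"]]["exam_name"],
        "total_marks": exam_by_id[s["exam_id"]]["total_marks"],
    }
    for s in score
]
print(rejoined == score_with_exam)
```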
Super Key
➢ A set of one or more attributes that uniquely identifies rows in a table.
➢ It may contain additional attributes that are not needed for unique identification.

EmpSSN      EmpNum  EmpName
9812345098  AB05    Shown
9876512345  AB06    Roslyn
199937890   AB07    James

EmpSSN and EmpNum are each super keys, and so is any set of attributes that contains one of them, such as {EmpNum, EmpName}.


Primary Key
➢ A column or group of columns that uniquely identifies every row in a table.
➢ Its values cannot be duplicated, i.e. the same value cannot appear more than once in the table.
➢ A table cannot have more than one primary key.
Rules for defining a primary key (PK):
• Two rows cannot have the same PK value.
• Every row must have a PK value.
• The PK field cannot be null.
• The value in a PK column should not be modified or updated while any foreign key refers to that PK.

StudID  Roll No  First Name  LastName  Email
1       11       Tom         Price     [email protected]
2       12       Nick        Wright    [email protected]
3       13       Dana        Natan     [email protected]

StudID is the PK.
Alternate Key
➢ A column or group of columns that uniquely identifies every row in a table but was not chosen as the primary key.
➢ A table can have multiple choices for a primary key, but only one can be set as the primary key. All the remaining keys are called alternate keys.

StudID  Roll No  First Name  LastName  Email
1       11       Tom         Price     [email protected]
2       12       Nick        Wright    [email protected]
3       13       Dana        Natan     [email protected]

In this table, StudID, Roll No, and Email all qualify to become the primary key.
Since StudID is the primary key, Roll No and Email become the alternate keys.
Candidate Key
➢ A set of attributes that uniquely identifies tuples in a table.
➢ A candidate key is a super key with no redundant attributes (a minimal super key).
➢ The primary key should be selected from the candidate keys.
➢ Every table must have at least one candidate key.
➢ A table can have multiple candidate keys but only a single primary key.
Properties of a candidate key:
• It must contain unique values
• It may have multiple attributes
• It must not contain null values
• It should contain the minimum number of fields needed to ensure uniqueness
• It uniquely identifies each record in a table

StudID  Roll No  First Name  LastName  Email
1       11       Tom         Price     [email protected]
2       12       Nick        Wright    [email protected]
3       13       Dana        Natan     [email protected]

StudID, Roll No, and Email are candidate keys: each of them uniquely identifies a student record in the table.
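
The difference between super keys and candidate keys can be made concrete with the attribute-closure technique: a set of attributes is a super key if its closure under the functional dependencies covers every attribute, and a candidate key is a super key with no smaller super key inside it. The sketch below assumes that StudID, RollNo, and Email each determine all other columns of the student table, as the slide states; the function names are illustrative.

```python
from itertools import combinations

ATTRS = frozenset({"StudID", "RollNo", "FirstName", "LastName", "Email"})

# Assumed functional dependencies: each identifier determines the whole row.
FDS = [
    (frozenset({"StudID"}), ATTRS),
    (frozenset({"RollNo"}), ATTRS),
    (frozenset({"Email"}), ATTRS),
]


def closure(attrs, fds):
    """Return every attribute that is determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_superkey(attrs, fds=FDS, all_attrs=ATTRS):
    """A super key determines every attribute of the table."""
    return closure(attrs, fds) == all_attrs


def candidate_keys(fds=FDS, all_attrs=ATTRS):
    """Return the minimal super keys, found by trying the smallest sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            attrs = frozenset(combo)
            # Skip any super key that contains an already found, smaller key.
            if is_superkey(attrs, fds, all_attrs) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


keys = candidate_keys()
print(sorted(sorted(k) for k in keys))
print(is_superkey({"StudID", "FirstName"}))  # super key, but not minimal
```

The brute-force search is exponential in the number of attributes, which is acceptable for a five-column teaching example but not for wide tables.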
Foreign Key
➢ A column (or group of columns) that creates a relationship between two tables.
➢ The purpose of foreign keys is to maintain data integrity and allow navigation between two different instances of an entity.
➢ It acts as a cross-reference between two tables because it references the primary key of another table.

Teacher
Teacher ID  Fname  Lname
B002        David  Warner
B017        Sara   Joseph
B009        Mike   Brunton

Department
DeptCode  DeptName
001       Science
002       English
005       Computer

With these two tables alone, we cannot see which teacher works in which department.
→ To create a relationship between the two tables, we can add DeptCode to the Teacher table as a foreign key (FK) ➔ Referential integrity

Teacher
Teacher ID  DeptCode  Fname  Lname
B002        002       David  Warner
B017        002       Sara   Joseph
B009        001       Mike   Brunton
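
Referential integrity can be observed directly in a database engine. The sketch below uses Python's standard sqlite3 module with the Teacher and Department rows above; the lower-case column names and the rejected teacher row ('B099' in department '999') are made up for the demonstration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

conn.execute(
    "CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE teacher ("
    "teacher_id TEXT PRIMARY KEY, "
    "dept_code TEXT NOT NULL REFERENCES department(dept_code), "
    "fname TEXT, lname TEXT)"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("001", "Science"), ("002", "English"), ("005", "Computer")],
)
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?, ?)",
    [
        ("B002", "002", "David", "Warner"),
        ("B017", "002", "Sara", "Joseph"),
        ("B009", "001", "Mike", "Brunton"),
    ],
)

# A teacher pointing at a department that does not exist is rejected.
# 'B099' and department '999' are hypothetical values.
try:
    conn.execute("INSERT INTO teacher VALUES ('B099', '999', 'Test', 'Teacher')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The foreign key also lets us navigate from a teacher to the department.
staff = conn.execute(
    "SELECT t.fname, d.dept_name FROM teacher t "
    "JOIN department d ON d.dept_code = t.dept_code ORDER BY t.teacher_id"
).fetchall()
print(rejected, staff)
```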
BCNF Rules

1. Be in 3NF
2. For every non-trivial dependency A → B, A should be a super key
▪ When a table has more than one candidate key, anomalies may result even though the relation is in 3NF.
▪ For a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
▪ BCNF is a special case of 3NF: every relation in BCNF is also in 3NF, but not every relation in 3NF is in BCNF.
▪ A relation is in BCNF if, and only if, every determinant is a candidate key.

Example not satisfying BCNF
School enrollment

student_id  subject  professor
101         Java     Donna Anderson
101         C++      Emma Klein
102         Java     John Louis
103         C#       Daniel Robert
104         Java     Donna Anderson

➢ One student can enroll for multiple subjects.
• For example, the student with student_id 101 has opted for the subjects Java and C++.
➢ For each subject, a professor is assigned to the student.
➢ There can be multiple professors teaching one subject, as is the case for Java.

❖ student_id and subject together form the PK, because together they determine all the columns of the table.
❖ One professor teaches only one subject, but one subject may have several professors ➔ there is a dependency professor → subject: the subject depends on the professor name.

✓ This table satisfies 1NF because all the values are atomic, the column names are unique, and all the values stored in a particular column are of the same domain.
✓ This table also satisfies 2NF because there is no partial dependency: professor depends on the whole key (student_id, subject).
✓ The table also satisfies 3NF because no non-prime attribute depends transitively on the key; the only extra dependency, professor → subject, has a prime attribute on its right-hand side.

Why does it not satisfy BCNF, and how can it be brought into BCNF?
➢ student_id and subject form the primary key, which means the subject column is a prime attribute.
➢ But there is one more dependency: professor → subject.
➢ Since subject is a prime attribute and professor is a non-prime attribute, this dependency is not allowed by BCNF.

➢ To make this relation satisfy BCNF, the table should be decomposed into two tables: a Professor table and a Student table.

Professor table
prof_id  prof_name       subject
1        Donna Anderson  Java
2        Emma Klein      C++
3        Daniel Robert   C#
4        John Louis      Java

Student table
student_id  prof_id
101         1
101         2
102         4
103         3
104         1
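
The BCNF violation and its repair can be checked on the enrollment rows. This is a sketch under two assumptions: dependencies and super keys are tested only against the five sample rows, and the professor name is used as the key of the Professor table instead of the surrogate prof_id shown above. The helper names are illustrative.

```python
COLUMNS = ("student_id", "subject", "professor")

enrollment = [
    (101, "Java", "Donna Anderson"),
    (101, "C++", "Emma Klein"),
    (102, "Java", "John Louis"),
    (103, "C#", "Daniel Robert"),
    (104, "Java", "Donna Anderson"),
]


def project(row, cols):
    """Return the values of the named columns of one row."""
    return tuple(row[COLUMNS.index(c)] for c in cols)


def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by the sample rows."""
    seen = {}
    for row in rows:
        if seen.setdefault(project(row, lhs), project(row, rhs)) != project(row, rhs):
            return False
    return True


def is_superkey(rows, cols):
    """In the sample, cols is a super key if no two rows agree on cols."""
    return len({project(r, cols) for r in rows}) == len(rows)


# professor -> subject holds, but professor is not a super key:
# Donna Anderson appears in two rows. That combination violates BCNF.
fd_holds = holds(enrollment, ("professor",), ("subject",))
violates_bcnf = fd_holds and not is_superkey(enrollment, ("professor",))

# Decompose: Professor(professor, subject) and Student(student_id, professor).
professor_table = {prof: subj for _, subj, prof in enrollment}
student_table = [(sid, prof) for sid, _, prof in enrollment]

# Joining the two tables on professor gives back the original rows.
rejoined = [(sid, professor_table[prof], prof) for sid, prof in student_table]
print(violates_bcnf, rejoined == enrollment)
```

In each of the two new tables the only determinant is the key, so every determinant is a candidate key, as BCNF requires.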

Generic explanation
When a table has more than one candidate key, anomalies may result even though the relation is in 3NF. BCNF is a special case of 3NF.
A relation is in BCNF if, and only if, every determinant is a candidate key.
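4th Normal Form
A relation is in 4NF if
• it is in Boyce-Codd normal form, and
• it has no non-trivial multi-valued dependency, except one whose determinant is a super key.

5th Normal Form
A relation is in 5NF if
• it is in 4NF, and
• it contains no join dependency other than those implied by its candidate keys; any decomposition of it must be lossless, i.e. joining the parts returns exactly the original rows.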
Key terms & abbreviations
• Normalization: process of evaluating and correcting table structures to minimize data redundancy

• First Normal Form (1NF): only single values are permitted at the intersection of each row and column, so there are no repeating groups

• Second Normal Form (2NF): the relation must be in 1NF and every non-key attribute must depend on the whole PK (no partial dependency in the case of a composite PK)

• Third Normal Form (3NF): the relation must be in 2NF and all transitive dependencies must be removed; a non-key attribute may not be functionally dependent on another non-key attribute

• Boyce-Codd normal form (BCNF): a special case of 3NF in which every determinant is a candidate key


Conclusion
▪ Normalization should be part of the database design process.
➢ However, it is difficult to separate the normalization process from the ER modelling process, so the two techniques should be used concurrently.

▪ Use an ERD to provide the big picture, or macro view, of an organization’s data requirements and operations.
➢ This is created through an iterative process that involves identifying relevant entities, their attributes and their relationships.

▪ The normalization procedure focuses on the characteristics of specific entities and represents the micro view of entities within the ERD.
