Lecture Note 5 (Normalization)
To handle these problems, we analyze relations that contain redundant data and
decompose them into smaller, simpler, well-structured relations that satisfy
desirable properties. Normalization is the process of decomposing relations
into relations with fewer attributes.
What is Normalization?
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Types of Anomalies in Database
The following types of anomalies cause inconsistency, loss of integrity, and
redundant data in a table.
4. Deletion Anomalies: A deletion anomaly occurs when deleting some data
causes the unintended loss of other important data.
For example, if we remove Trent Bolt from the Student table, his address,
course, and other details are removed from the Student table as well.
In other words, deleting one record can remove other facts that were stored
only in that record.
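The deletion anomaly can be sketched in a few lines of Python. This is only an illustration: the Student table is not reproduced in these notes, so every value below except the name Trent Bolt is a hypothetical placeholder.

```python
# Hypothetical single-table design: student facts and course facts share a row.
# All values except the student's name are invented for illustration.
student = [
    {"name": "Trent Bolt", "address": "Address A", "course": "Course X"},
    {"name": "Another Student", "address": "Address B", "course": "Course Y"},
]

def courses(rows):
    """Return the set of courses the table still knows about."""
    return {row["course"] for row in rows}

before = courses(student)

# Delete the row for Trent Bolt.
student = [row for row in student if row["name"] != "Trent Bolt"]

after = courses(student)
lost = before - after  # course facts that disappeared as a side effect
print(lost)
```

Because the course was recorded only in the deleted row, the fact that the course exists is lost together with the student.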
Types of Normal Forms
Normalization works through a series of stages called normal forms. The
normal forms apply to individual relations. A relation is said to be in a
particular normal form if it satisfies the constraints of that form. The
various normal forms are:
Example of 1NF
EMPLOYEE Table
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF has been shown below.
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
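The conversion to 1NF shown above can be sketched as follows. The function name `to_1nf` and the dictionary layout are assumptions made for this sketch; the data is the EMPLOYEE row from the example.

```python
# Unnormalized row from the EMPLOYEE example: EMP_PHONE holds two values.
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
]

def to_1nf(rows, column, sep=","):
    """Emit one row per atomic value found in `column` (hypothetical helper)."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            atomic = dict(row)              # copy the other attributes unchanged
            atomic[column] = value.strip()  # keep exactly one value per cell
            result.append(atomic)
    return result

employee_1nf = to_1nf(employee, "EMP_PHONE")
for row in employee_1nf:
    print(row)
```

Each output row now holds a single phone number, at the cost of repeating EMP_ID, EMP_NAME, and EMP_STATE.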
Non-prime Attribute: Attributes which are not part of any candidate key.
Example of 2NF
Consider the following table.
• Note that many courses have the same course fee.
• Here, COURSE_FEE cannot alone decide the value of COURSE_NO
or STUD_NO. That means, COURSE_FEE alone cannot be a Key.
• COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO. That means, {COURSE_FEE, STUD_NO} cannot be
a candidate key.
• COURSE_FEE together with COURSE_NO cannot decide the value
of STUD_NO. That means, {COURSE_FEE, COURSE_NO} cannot
be a candidate key.
• Hence, COURSE_FEE is a non-prime attribute, as it does not
belong to the only candidate key {STUD_NO, COURSE_NO}.
• But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is
dependent on COURSE_NO, where COURSE_NO is a proper subset
of the candidate key.
• That means, here, non-prime attribute COURSE_FEE is dependent on
a proper subset of the candidate key, which is a partial dependency
and so this relation is not in 2NF.
• To convert the above relation to 2NF, we need to split the table into
two tables: Table 1 (STUD_NO, COURSE_NO) and Table 2
(COURSE_NO, COURSE_FEE).
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
NOTE: 2NF reduces the amount of redundant data that is stored.
For instance, if 100 students take course C1, we do not need to store its
fee of 1000 in all 100 records; instead, we store once, in the second table,
that the course fee for C1 is 1000.
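The partial dependency can be checked mechanically. The sketch below rejoins Table 1 and Table 2 on COURSE_NO to rebuild the single-table form and then tests two functional dependencies on that data. The helper `fd_holds` is a hypothetical name, and a check like this only shows that the listed rows do not violate a dependency; it does not prove the dependency for every possible table state.

```python
# Table 1 and Table 2 as listed in the notes.
table1 = [(1, "C1"), (2, "C2"), (1, "C4"), (4, "C3"), (4, "C1"), (2, "C5")]
table2 = {"C1": 1000, "C2": 1500, "C3": 1000, "C4": 2000, "C5": 2000}

# Rejoin on COURSE_NO to obtain the single-table (pre-2NF) form.
joined = [(stud, course, table2[course]) for stud, course in table1]

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on positions `lhs` also agree on positions `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        val = tuple(row[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Column positions: 0 = STUD_NO, 1 = COURSE_NO, 2 = COURSE_FEE
partial_dependency = fd_holds(joined, lhs=[1], rhs=[2])  # COURSE_NO -> COURSE_FEE
fee_decides_course = fd_holds(joined, lhs=[2], rhs=[1])  # COURSE_FEE -> COURSE_NO
print(partial_dependency, fee_decides_course)
```

COURSE_NO alone determines COURSE_FEE in this data, while COURSE_FEE does not determine COURSE_NO because C1 and C3 share the fee 1000.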
Third Normal Form (3NF)
• A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency.
• Equivalently, a relation in 2NF is in 3NF when no non-prime attribute is
transitively dependent on a candidate key.
• A relation is in 3NF if at least one of the following conditions holds
for every non-trivial functional dependency X → Y.
i. X is a super key.
ii. Y is a prime attribute, i.e., each element of Y is part of some
candidate key.
Example of 3NF
EMPLOYEE_DETAIL Table
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
In the above table:
• Super Key: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID,
EMP_NAME, EMP_ZIP}, and so on
• Candidate Key: {EMP_ID}
• Non-prime Attributes: In the given table, all attributes except EMP_ID
are non-prime.
EMPLOYEE Table
EMP_ID EMP_NAME EMP_ZIP
EMPLOYEE_ZIP Table
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
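The two 3NF conditions (X is a super key, or Y consists of prime attributes) can be tested with an attribute-closure routine. This is a minimal sketch: the functional dependencies EMP_ID → {EMP_NAME, EMP_ZIP} and EMP_ZIP → {EMP_STATE, EMP_CITY} are assumptions inferred from the decomposition above, and the helper names `fd`, `closure`, and `violates_3nf` are hypothetical.

```python
def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_3nf(relation, fds, prime):
    """Return the FDs X -> Y where X is not a super key and Y is not all prime."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = rhs - lhs
        if not nontrivial:
            continue  # trivial dependency, nothing to check
        is_super_key = closure(lhs, fds) >= set(relation)
        if not is_super_key and not nontrivial <= set(prime):
            bad.append((lhs, rhs))
    return bad

# Assumed dependencies for EMPLOYEE_DETAIL (not stated explicitly in the notes).
by_id = fd("EMP_ID", "EMP_NAME EMP_ZIP")
by_zip = fd("EMP_ZIP", "EMP_STATE EMP_CITY")

detail = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
before = violates_3nf(detail, [by_id, by_zip], prime={"EMP_ID"})

# After the split, each table keeps only the dependency over its own columns.
employee = violates_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, [by_id], prime={"EMP_ID"})
employee_zip = violates_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, [by_zip], prime={"EMP_ZIP"})
print(before, employee, employee_zip)
```

Under these assumed dependencies, only EMP_ZIP → {EMP_STATE, EMP_CITY} is reported for the original table, and neither decomposed table reports a violation.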
Boyce-Codd Normal Form (BCNF)
• BCNF is an advanced version of 3NF. It is stricter than 3NF.
• A table is in BCNF if, for every non-trivial functional dependency X → Y,
X is a super key of the table.
• In other words, the table must be in 3NF, and the left-hand side (LHS) of
every FD must be a super key.
Example of BCNF
Let's assume there is a company where employees work in more than one
department.
EMPLOYEE Table
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
EMP_COUNTRY Table
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT Table
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
EMP_DEPT_MAPPING Table
EMP_ID EMP_DEPT
264 283
264 300
364 232
364 549
The decomposed tables are now in BCNF because the left-hand side of each of
the two functional dependencies is a super key of its table.
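The BCNF condition can be checked the same way as 3NF, but without the exception for prime attributes. In the sketch below, the two dependencies EMP_ID → EMP_COUNTRY and EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO} are assumptions read off the decomposed tables, and `closure` and `is_bcnf` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(relation, fds):
    """True if every non-trivial FD in `fds` has a super key on its left side."""
    return all(
        rhs <= lhs or closure(lhs, fds) >= set(relation)
        for lhs, rhs in fds
    )

# Assumed dependencies (inferred from the decomposition, not stated in the notes).
id_to_country = (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"}))
dept_to_info = (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"}))

employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
original_ok = is_bcnf(employee, [id_to_country, dept_to_info])

emp_country_ok = is_bcnf({"EMP_ID", "EMP_COUNTRY"}, [id_to_country])
emp_dept_ok = is_bcnf({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [dept_to_info])
mapping_ok = is_bcnf({"EMP_ID", "EMP_DEPT"}, [])  # no non-trivial FD to violate
print(original_ok, emp_country_ok, emp_dept_ok, mapping_ok)
```

In the original EMPLOYEE table neither EMP_ID nor EMP_DEPT alone is a super key, so the check fails; each decomposed table passes.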
Fourth Normal Form (4NF)
A relation R is in 4NF if and only if the following conditions are satisfied:
Multi-valued Dependency
Example of 4NF
The STUDENT table below is in 3NF, but COURSE and HOBBY are two independent
entities; there is no relationship between COURSE and HOBBY.
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
In the STUDENT relation, the student with STU_ID 21 has two courses,
Computer and Math, and two hobbies, Dancing and Singing. So, there
is a multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.
To bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
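The 4NF decomposition amounts to two projections of the STUDENT table. The sketch below uses the rows listed above and also rejoins the two projections on STU_ID; the variable names are assumptions made for this sketch.

```python
# STUDENT rows as listed in the notes: (STU_ID, COURSE, HOBBY).
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY); sets drop duplicates.
student_course = sorted({(stu, course) for stu, course, _ in student})
student_hobby = sorted({(stu, hobby) for stu, _, hobby in student})

# Rejoining on STU_ID pairs every course of a student with every hobby.
rejoined = sorted(
    (stu, course, hobby)
    for stu, course in student_course
    for stu2, hobby in student_hobby
    if stu == stu2
)
combos_for_21 = [row for row in rejoined if row[0] == 21]
print(student_course)
print(student_hobby)
print(combos_for_21)
```

The rejoin pairs both courses of student 21 with both hobbies. That is what independence between COURSE and HOBBY means, and it is why a single table would have to repeat data to represent it.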
Fifth Normal Form (5NF)
• A relation is in 5NF if it is in 4NF and does not contain any join
dependency that is not implied by its candidate keys; any decomposition of
it must be lossless.
• 5NF is also known as Project-join normal form (PJ/NF).
Example of 5NF
• In the above table, John takes both the Computer and Math classes for
Semester 1, but he does not take the Math class for Semester 2. In this case,
the combination of all these fields is required to identify valid data. That
means the primary key of the table is the combination of all the fields
(columns).
• Suppose we add a new semester, Semester 3, but do not yet know the
subject or who will be taking it, so we would leave Lecturer and
Subject as NULL. But all three columns together act as the primary key,
so we cannot leave the other two columns blank.
• To bring the above table into 5NF, we can decompose it into three
relations P1, P2, and P3.
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
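Joining the three projections shows why all three are needed. The sketch below takes P1, P2, and P3 as listed (as sets, so a repeated row counts once) and compares the join of only P1 and P2 with the join of all three; the variable names are assumptions made for this sketch.

```python
# P1, P2, and P3 as listed in the notes.
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}        # (SEMESTER, SUBJECT)
p2 = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
      ("Math", "Akash"), ("Chemistry", "Praveen")}                # (SUBJECT, LECTURER)
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}         # (SEMESTER, LECTURER)

# Join of P1 and P2 only, on SUBJECT.
pairwise = {
    (semester, subject, lecturer)
    for semester, subject in p1
    for subject2, lecturer in p2
    if subject == subject2
}

# Join of all three: keep a row only if P3 also allows (SEMESTER, LECTURER).
three_way = {row for row in pairwise if (row[0], row[2]) in p3}

spurious = pairwise - three_way
print(sorted(three_way))
print(sorted(spurious))
```

The two-way join contains rows such as John teaching Math in Semester 2, which the notes say does not happen; filtering through P3 removes them.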
Relational Decomposition
• When a relation in the relational model is not in an appropriate normal
form, it needs to be decomposed.
• In a database, decomposition breaks a table into multiple tables.
• If a relation is not decomposed properly, problems such as loss of
information may result.
• Decomposition is used to eliminate some of the problems of bad design,
such as anomalies, inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
• If no information is lost from the relation that is decomposed, then
the decomposition is lossless.
• A lossless decomposition guarantees that joining the decomposed relations
yields the same relation that was decomposed.
• A decomposition is said to be lossless if the natural join of all the
decomposed relations gives the original relation.
Example
EMPLOYEE_DEPARTMENT
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
EMPLOYEE
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID",
the result is the same as the original relation, so the decomposition is
lossless. The joined relation looks like:
Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
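The join can be sketched in Python using the EMPLOYEE and DEPARTMENT rows listed above. The sketch assumes each EMP_ID appears at most once in DEPARTMENT (true for the listed rows), and the variable names are hypothetical.

```python
# EMPLOYEE: (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY)
employee = [
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]
# DEPARTMENT: (DEPT_ID, EMP_ID, DEPT_NAME)
department = [
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
]

# Index DEPARTMENT by the common column EMP_ID (assumes one row per EMP_ID).
dept_by_emp = {emp_id: (dept_id, name) for dept_id, emp_id, name in department}

# Natural join on EMP_ID:
# (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY, DEPT_ID, DEPT_NAME)
joined = [emp + dept_by_emp[emp[0]] for emp in employee if emp[0] in dept_by_emp]

# Projecting the join back onto each schema recovers both relations.
employee_again = sorted(row[:4] for row in joined)
department_again = sorted((row[4], row[0], row[5]) for row in joined)
for row in joined:
    print(row)
```

No row is lost and no extra row appears, which is the behavior a lossless decomposition is expected to show.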
Dependency Preserving