Normalization of Database-Ass-2
Normalization of Database
Database Normalization is a technique of organizing the data in the database.
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics like Insertion, Update and
Deletion anomalies. It is a multi-step process that puts data into tabular form,
removing duplicated data from the relation tables.
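Normalization is used mainly for two purposes: eliminating redundant data, and
ensuring that data dependencies make sense, i.e. that data is stored logically.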
Consider a Student table that holds the data of 4 Computer Science students. The
data for the fields branch, hod (Head of Department) and office_tel is repeated for
every student who is in the same branch of the college; this is Data Redundancy.
Insertion Anomaly
Suppose a new student is admitted. Until the student opts for a branch, the data of
the student cannot be inserted, or else we will have to set the branch information
to NULL.
Also, if we have to insert data of 100 students of the same branch, then the branch
information will be repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
Update Anomaly
What if Mr. X leaves the college, or is no longer the HOD of the computer science
department? In that case all the student records will have to be updated, and if by
mistake we miss any record, it will lead to data inconsistency. This is an Update
anomaly.
Deletion Anomaly
In our Student table, two different kinds of information are kept together: Student
information and Branch information. Hence, at the end of the academic year, if
student records are deleted, we will also lose the branch information. This is a
Deletion anomaly.
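The Update and Deletion anomalies described above can be reproduced on a tiny
in-memory version of the Student table. This is only an illustrative sketch: the
column names branch, hod and office_tel and the name Mr. X come from the text, while
the roll numbers, student names, telephone number and the replacement HOD Mr. Y are
made-up values.

```python
# Hypothetical, denormalized Student table: branch details repeat in every row.
students = [
    {"rollno": 1, "name": "A", "branch": "CSE", "hod": "Mr. X", "office_tel": "0001"},
    {"rollno": 2, "name": "B", "branch": "CSE", "hod": "Mr. X", "office_tel": "0001"},
    {"rollno": 3, "name": "C", "branch": "CSE", "hod": "Mr. X", "office_tel": "0001"},
]


def hods_recorded(rows, branch):
    """Return the set of HOD names stored for one branch."""
    return {row["hod"] for row in rows if row["branch"] == branch}


# Update anomaly: the HOD changes, but one of the three rows is missed by mistake.
for row in students[:2]:
    row["hod"] = "Mr. Y"
after_partial_update = hods_recorded(students, "CSE")  # two different HODs now

# Deletion anomaly: deleting every student also deletes all we knew about the branch.
students.clear()
after_delete = hods_recorded(students, "CSE")  # nothing left

# Normalized alternative: branch details are stored exactly once.
branches = {"CSE": {"hod": "Mr. X", "office_tel": "0001"}}
students_normalized = [
    {"rollno": 1, "name": "A", "branch": "CSE"},
    {"rollno": 2, "name": "B", "branch": "CSE"},
]
branches["CSE"]["hod"] = "Mr. Y"   # one update, no row can be missed
students_normalized.clear()        # student rows go, branch information stays
branch_survives = "CSE" in branches
```

In the normalized layout the HOD is changed in one place, and removing student rows
leaves the branch record untouched.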
Normalization Rule
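Normalization rules are divided into the following normal forms:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)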
UNIT-2 NORMALIZATION IN DBMS
So far, we have seen how data redundancy or repetition can lead to several
issues like Insertion, Deletion and Update anomalies, and
how Normalization can reduce data redundancy and make the data more
meaningful.
Next, we will learn about the 1st Normal Form, which is Step 1 of the
Normalization process. The 1st Normal Form expects you to design your table in
such a way that it can easily be extended and that data can easily be retrieved
from it whenever required.
If the tables in a database are not even in the 1st Normal Form, it is considered
bad database design.
One of the rules of the 1st Normal Form expects that each column in a table has a
unique name. This is to avoid confusion at the time of retrieving data or
performing any other operation on the stored data.
If two or more columns have the same name, the DBMS cannot tell which column a
query refers to.
Our table already satisfies 3 of the 4 rules: all our column names are unique, we
have stored data in the order we wanted to, and we have not mixed different types
of data in any column.
But out of the 3 students in our table, 2 have opted for more than 1 subject, and
we have stored the subject names in a single column. As per the 1st Normal Form,
each column must contain atomic values.
101 Akon OS
101 Akon CN
102 Bkon C
By doing so, although a few values are repeated, the values in
the subject column are now atomic for each record/row.
With the First Normal Form, data redundancy increases, as the same data will
appear in some columns across multiple rows, but each row as a whole will be unique.
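The step of making the subject column atomic can be sketched as a small function.
This is a minimal illustration, not the only way to reach the 1st Normal Form: it
assumes the non-atomic table stored several subjects as one comma-separated string,
and the column names roll_no, name and subject are labels chosen here for the three
columns.

```python
def to_first_normal_form(rows, column, sep=","):
    """Split a multi-valued column so that every row holds one atomic value."""
    atomic_rows = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)              # copy the other columns unchanged
            new_row[column] = value.strip()  # keep exactly one value per row
            atomic_rows.append(new_row)
    return atomic_rows


# Assumed shape of the table before normalization: subjects packed into one cell.
students = [
    {"roll_no": 101, "name": "Akon", "subject": "OS, CN"},
    {"roll_no": 102, "name": "Bkon", "subject": "C"},
]

atomic = to_first_normal_form(students, "subject")
for row in atomic:
    print(row["roll_no"], row["name"], row["subject"])
```

The roll number and name are repeated for the student with two subjects, which is
the increase in redundancy mentioned above, while every subject cell now holds a
single value.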
For a table to be in the Second Normal Form, it must satisfy two conditions: it
should be in the First Normal Form, and it should not have Partial Dependency.
What is Partial Dependency? Before answering that, let's first understand what
Dependency in a table is.
What is Dependency?
Let's take an example of a Student table with the
columns student_id, name, reg_no (registration
number), branch and address (student's home address).
In this table, student_id is the primary key and will be unique for every row, hence
we can use student_id to fetch any row of data from this table.
Even when two students have the same name, if we know the student_id we can
easily fetch the correct record.
Hence we can say a Primary Key for a table is the column or group of
columns (composite key) which can uniquely identify each record in the table.
I can ask for the branch name of the student with student_id 10, and I can get it.
Similarly, if I ask for the name of the student with student_id 10 or 11, I will get
it. So all I need is student_id, and every other column depends on it, or can be
fetched using it.
This is Dependency and we also call it Functional Dependency.
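A functional dependency can be tested against sample rows: X → Y is contradicted as
soon as two rows agree on X but differ on Y. The sketch below assumes rows are
stored as dictionaries; the function name holds and the sample names and branches
are made up for illustration. Note that sample data can only disprove a dependency.
Whether it really holds is a matter of the rules of the data, not of a few rows.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows: two students share a name but not a student_id.
students = [
    {"student_id": 10, "name": "Akon", "branch": "CSE"},
    {"student_id": 11, "name": "Akon", "branch": "IT"},
]

id_determines_rest = holds(students, ["student_id"], ["name", "branch"])
name_determines_branch = holds(students, ["name"], ["branch"])
```

Here id_determines_rest is True, while name_determines_branch is False because the
two students named Akon are in different branches.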
subject_id subject_name
1 Java
2 C++
3 Php
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the score table we are saving the student_id to know which student's marks
these are, and the subject_id to know which subject the marks are for.
Together, student_id + subject_id form a Candidate Key for this table, which can
be the Primary Key.
Confused about how this combination can be a primary key?
See, if I ask you to get me marks of student with student_id 10, can you get it from
this table? No, because you don't know for which subject. And if I give
you subject_id, you would not know for which student. Hence we need student_id
+ subject_id to uniquely identify any row.
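The same argument can be checked mechanically: a set of columns can serve as a key
only if no two rows share the same values in those columns. The sketch below uses
the three score rows shown earlier, read as (student_id, subject_id, marks); that
reading of the columns and the helper name is_unique are assumptions made for this
example.

```python
# Score rows read as (student_id, subject_id, marks).
score = [
    (10, 1, 70),
    (10, 2, 75),
    (11, 1, 80),
]


def is_unique(rows, positions):
    """Return True if the chosen column positions identify every row uniquely."""
    keys = [tuple(row[i] for i in positions) for row in rows]
    return len(keys) == len(set(keys))


student_only = is_unique(score, [0])       # student_id 10 appears twice
subject_only = is_unique(score, [1])       # subject_id 1 appears twice
student_subject = is_unique(score, [0, 1]) # every pair appears once
```

Only the combination of both columns identifies a row, which is why student_id +
subject_id is used as the key.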
And our Score table is now in the second normal form, with no partial dependency.
1 10 1 70
2 10 2 75
3 11 1 80
Quick Recap
1. For a table to be in the Second Normal form, it should be in the First Normal
form and it should not have Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute
in the table depends only on a part of the primary key and not on the complete
primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute
which is causing partial dependency, and move it to some other table where it
fits in well.
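The recap can be sketched in code using the Score rows shown earlier. This is a
rough illustration under stated assumptions: the column names marks and teacher are
labels chosen here, the dependency test is run on sample rows only, and only single
columns of the key are tried (a key with more than two columns would need every
proper subset checked).

```python
score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]


def determines(rows, key_part, attr):
    """True if rows with equal key_part never disagree on attr."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[key_part], row[attr]) != row[attr]:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_part, attribute) pairs where one key column alone decides attribute."""
    return [
        (part, attr)
        for attr in non_key
        for part in key
        if determines(rows, part, attr)
    ]


key = ["student_id", "subject_id"]
partials = partial_dependencies(score, key, ["marks", "teacher"])

# Remove the partial dependency: teacher moves to a table keyed by subject_id.
subject_teacher = {row["subject_id"]: row["teacher"] for row in score}
score_2nf = [{k: v for k, v in row.items() if k != "teacher"} for row in score]
```

In these rows partials contains only ("subject_id", "teacher"), so teacher is the
attribute that moves out of the Score table, while marks stays because it needs
both parts of the key.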
Subject Table
Score Table
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the exam
name and total marks, so let's add 2 more columns to the Score table.
1 Workshop 200
2 Mains 70
3 Practicals 30
The second point sounds a bit tricky, right? In simple words, it means that for a
dependency A → B, A cannot be a non-prime attribute if B is a prime
attribute.
103 C# P.Chash
As you can see, we have also added some sample data to the
table. In the table above:
One student can enrol for multiple subjects. For example, student
with student_id 101, has opted for subjects - Java & C++
For each subject, a professor is assigned to the student.
And, there can be multiple professors teaching one subject like we have for
Java.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd
Normal Form.
But this table is not in Boyce-Codd Normal Form.
student_id p_id
101 1
101 2
and so on...
1 P.Java Java
2 P.Cpp C++
and so on...
And now, this relation satisfies Boyce-Codd Normal Form. Next, we
will learn about the Fourth Normal Form.
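A common way to test for Boyce-Codd Normal Form is to compute the attribute closure
of the left side of each dependency and check whether it covers the whole relation,
i.e. whether the left side is a super key. The sketch below applies that test to the
student/subject/professor table. The two dependencies listed for it, and the
dependency given for the professor table, are assumptions read from the example
(each professor is taken to teach a single subject); the function names are chosen
here for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial dependencies whose left side is not a super key of relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(relation)
    ]


# Original table, with the dependencies assumed for this example.
enrolment = {"student_id", "subject", "professor"}
enrolment_fds = [
    (("student_id", "subject"), ("professor",)),
    (("professor",), ("subject",)),
]
violations = bcnf_violations(enrolment, enrolment_fds)

# Professor table after the split; p_id is assumed to determine the other columns.
professor = {"p_id", "professor", "subject"}
professor_fds = [(("p_id",), ("professor", "subject"))]
violations_after = bcnf_violations(professor, professor_fds)
```

The only violation reported for the original table is professor → subject, because
professor on its own does not determine student_id. The professor table produced by
the split reports none.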
For a table to satisfy the Fourth Normal Form, it should satisfy the following two
conditions: it should be in the Boyce-Codd Normal Form, and it should not have any
multi-valued dependency.
If all these conditions are true for any relation (table), it is said to have a
multi-valued dependency.
s_id course hobby
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
As you can see in the table above, the student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
You must be thinking what problem this can lead to, right?
Well, the two records for the student with s_id 1 will give rise to two more
records, as shown below, because for one student two hobbies exist, hence along
with both the courses, these hobbies should be specified.
s_id course hobby
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
s_id course
1 Science
1 Maths
2 C#
2 Php
s_id hobby
1 Cricket
1 Hockey
2 Cricket
2 Hockey
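The decomposition into the two tables above amounts to projecting the original table
onto (s_id, course) and onto (s_id, hobby). A minimal sketch, using the four rows
from the example; the helper name project and the variable names are chosen here for
illustration:

```python
def project(rows, columns):
    """Project rows onto the given columns, dropping duplicates but keeping order."""
    result = []
    for row in rows:
        item = tuple(row[c] for c in columns)
        if item not in result:
            result.append(item)
    return result


enrolment = [
    {"s_id": 1, "course": "Science", "hobby": "Cricket"},
    {"s_id": 1, "course": "Maths", "hobby": "Hockey"},
    {"s_id": 2, "course": "C#", "hobby": "Cricket"},
    {"s_id": 2, "course": "Php", "hobby": "Hockey"},
]

courses = project(enrolment, ["s_id", "course"])
hobbies = project(enrolment, ["s_id", "hobby"])
```

Each fact, such as "student 1 has the hobby Hockey", is now stored once and no longer
has to be paired with every course the student has opted for.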
Fourth normal form (4NF) is a level of database normalization where there are no
non-trivial multivalued dependencies other than a candidate key. It builds on the
first three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form
(BCNF). It states that, in addition to a database meeting the requirements of
BCNF, it must not contain more than one multivalued dependency.
Example – Consider the database of a class which has two relations: R1
contains student ID (SID) and student name (SNAME), and R2 contains course
ID (CID) and course name (CNAME).
Table – R1
SID SNAME
S1 A
S2 B
Table – R2
CID CNAME
C1 C
C2 D
Table – R1 X R2
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
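The cross product table can be reproduced with a few lines of code. This is a small
sketch that stores each relation as a list of tuples, which is a representation
chosen here for illustration:

```python
from itertools import product

r1 = [("S1", "A"), ("S2", "B")]   # (SID, SNAME)
r2 = [("C1", "C"), ("C2", "D")]   # (CID, CNAME)

# Cross product: every row of R1 is paired with every row of R2.
cross = [student + course for student, course in product(r1, r2)]

for row in cross:
    print(*row)
```

Because every student is paired with every course, the courses listed for a student
are independent of that student's name, which is the kind of repetition a
multivalued dependency describes.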
Example –
Table – R1
COMPANY PRODUCT
C1 pendrive
C1 mic
C2 speaker
C1 speaker
Company->->Product
Table – R2
AGENT COMPANY
Aman C1
Aman C2
Mohan C1
Agent->->Company
Table – R3
AGENT PRODUCT
Aman pendrive
Aman mic
Aman speaker
Mohan speaker
Agent->->Product
Table – R1⋈R2⋈R3
COMPANY PRODUCT AGENT
C1 pendrive Aman
C1 mic Aman
C2 speaker Aman
C1 speaker Aman
Agent->->Product
Fifth Normal Form / Project-Join Normal Form (5NF):
Table – ACP
AGENT COMPANY PRODUCT
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is decomposed into the following 3 relations, whose natural join
is then taken:
Table – R1
AGENT COMPANY
A1 PQR
A1 XYZ
A2 PQR
Table – R2
AGENT PRODUCT
A1 Nut
A1 Bolt
A2 Nut
Table – R3
COMPANY PRODUCT
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The result of the natural join of R1 and R3 over 'Company', followed by the natural
join of that result, R13, with R2 over 'Agent' and 'Product', is the table ACP.
Hence, in this example, all the redundancies are eliminated, and the
decomposition of ACP is a lossless join decomposition. Therefore, the
relation is in 5NF, as it does not violate the property of lossless join.
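The lossless join claim can be verified directly on the example data. The sketch
below projects ACP onto the three column pairs, joins R1 with R3 and then with R2,
and compares the result with ACP. The helper functions project, natural_join and
as_set are written for this illustration and are not part of any library.

```python
def project(rows, columns):
    """Project rows onto columns and drop duplicate rows."""
    unique = {tuple(row[c] for c in columns) for row in rows}
    return [dict(zip(columns, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Join two lists of dicts on every column name they have in common."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[c] == r_row[c] for c in common):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows, columns):
    """Order-independent view of a table, for comparing two tables."""
    return {tuple(row[c] for c in columns) for row in rows}


columns = ("agent", "company", "product")
acp = [dict(zip(columns, values)) for values in [
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
]]

r1 = project(acp, ("agent", "company"))
r2 = project(acp, ("agent", "product"))
r3 = project(acp, ("company", "product"))

r13 = natural_join(r1, r3)        # joined over 'company'
rejoined = natural_join(r13, r2)  # joined over 'agent' and 'product'
lossless = as_set(rejoined, columns) == as_set(acp, columns)
```

The intermediate table r13 holds six rows, one more than ACP: the extra row
(A2, PQR, Bolt) is removed when R2 is joined in, since A2 does not sell Bolt. After
that final join, lossless is True.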