Aman's Copy of DBMS 2 - Anomalies + Normalization


DBMS: Anomalies + Normalization

If a database design is poor, it may contain anomalies: problems that surface when
data is inserted, updated, or deleted. They are like a bad dream for any database
administrator, because a database with anomalies is very hard to keep consistent.

The Problem of redundancy in Database

Redundancy means having multiple copies of the same data in the database. This
problem arises when a database is not normalized. Suppose a college keeps a table of
student details with the attributes
student ID, student name, contact number, college name, course opted, and college rank.

The values of college name, college rank, and course are repeated for every student,
which can lead to problems. The problems caused by redundancy are the
insertion anomaly, the deletion anomaly, and the update anomaly.
1. Insertion Anomaly –
a. If a student takes admission but his/her course is not decided yet, the
complete record cannot be inserted until the course is decided for the
student.
b. We do have the option of inserting incomplete data, but that can
cause problems if the data is used before it is completed.
c. E.g., if we count the number of students in each course, this student
will not be counted in any course, although ideally the sum of
students across all courses should equal the total number of students in
the college.
d. In general, this anomaly happens when a data record cannot be inserted
without adding some extra, unrelated data to the record, e.g., putting
NULL in the course column.
2. Deletion Anomaly –
a. If the current batch graduates and we delete all the data of the students,
then the details of the college (e.g., that GEU college has rank 1) are deleted
too, which ideally should not happen.
b. This anomaly happens when deletion of a data record results in losing
some unrelated information that was stored as part of the record that was
deleted from a table.

3. Update Anomaly –
a. If the rank of the college changes, the change has to be made in every
row that stores it, which is time-consuming and computationally
costly.
b. If the update is not applied in all of those places, the database will be in an
inconsistent state.
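
The three anomalies can be reproduced on a small in-memory table. The sketch below uses Python's built-in sqlite3 module. The table name, the column names, and the three sample students are assumptions made up for illustration; only "GEU has rank 1" comes from the notes above.

```python
import sqlite3

# Hypothetical un-normalized student table (names and rows are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT,
        contact      TEXT,
        college_name TEXT,
        course       TEXT,
        college_rank INTEGER
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Asha",  "9000000001", "GEU", "BTech", 1),
        (2, "Ravi",  "9000000002", "GEU", "BTech", 1),
        (3, "Meera", "9000000003", "GEU", None,    1),  # course not decided yet
    ],
)

# Insertion anomaly: the student with a NULL course is missing from course counts.
counted_in_courses = conn.execute(
    "SELECT COUNT(*) FROM student WHERE course IS NOT NULL"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# Update anomaly: the rank is changed in one row only, so GEU now has two ranks.
conn.execute("UPDATE student SET college_rank = 2 WHERE student_id = 1")
ranks = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT college_rank FROM student "
        "WHERE college_name = 'GEU' ORDER BY college_rank"
    )
]

# Deletion anomaly: deleting every student also deletes what we knew about GEU.
conn.execute("DELETE FROM student")
colleges_left = conn.execute(
    "SELECT COUNT(DISTINCT college_name) FROM student"
).fetchone()[0]

print(counted_in_courses, total, ranks, colleges_left)
```
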
Normalization

● Normalization is the process of organizing the data in the database.
● Normalization is used to minimize the redundancy in a relation or set of
relations. It is a method to remove the anomalies described above and keep the
database in a consistent state.
● Normalization divides a larger table into smaller tables and links them using
relationships.
● Normalization is a stepwise process and it goes from First Normal Form →
Second Normal Form → Third Normal form → BC Normal Form.

First Normal form (1NF):


Each attribute should hold an atomic (single) value; multi-valued attributes are not allowed.

If one user has multiple contact numbers, then storing them in one column looks like this:

User_Id   User_Name   User_Contact
1234      Utkarsh     9876598765, 9898765765
2468      Karan       9876543210, 9898989898, 9897969594

To convert this data to 1NF we have two options:

1. Have multiple columns for contact numbers, like
User_Contact_1, User_Contact_2, and User_Contact_3.

User_Id   User_Name   User_Contact_1   User_Contact_2   User_Contact_3
1234      Utkarsh     9876598765       9898765765       NULL
2468      Karan       9876543210       9898989898       9897969594

Que: What are the problems that can happen due to this?
Ans: This approach will:
a. Waste space: if many users have fewer than 3 contact numbers, then most of
the extra cells will be marked NULL, using space that could be
saved.
b. Restrict users who have more than 3 contact numbers, as adding a new
column for only a single user is not ideal because that column will be
NULL for all other users.

2. Have multiple rows to store different contact numbers.

User_Id   User_Name   User_Contact
1234      Utkarsh     9876598765
1234      Utkarsh     9898765765
2468      Karan       9876543210
2468      Karan       9898989898
2468      Karan       9897969594

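Super keys (assuming a contact number belongs to only one user):
{User_Id, User_Contact}, {User_Id, User_Contact, User_Name}, {User_Contact},
{User_Contact, User_Name}
Candidate key: {User_Contact}. In general, a table can have several candidate keys,
and they can be composite, e.g. {A, B} or {C, D}.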

Que: What are the problems that can happen due to this?
Ans: This increases the redundancy and wastes some space, as we are storing User_Id
and User_Name multiple times. But this can be taken care of as we go to 2NF.
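
Option 2 can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the contact numbers arrive as one comma-separated string per user; the function name `to_1nf` is made up for this example.

```python
# Rows as they look before 1NF: one comma-separated contact string per user.
users = [
    (1234, "Utkarsh", "9876598765, 9898765765"),
    (2468, "Karan", "9876543210, 9898989898, 9897969594"),
]


def to_1nf(rows):
    """Return one (User_Id, User_Name, User_Contact) row per contact number."""
    result = []
    for user_id, user_name, contacts in rows:
        for contact in contacts.split(","):
            result.append((user_id, user_name, contact.strip()))
    return result


rows_1nf = to_1nf(users)
for row in rows_1nf:
    print(row)
```

Every cell now holds a single value, at the cost of repeating User_Id and User_Name on each row.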

Second Normal Form (2NF):


There should be no partial dependency, i.e., no non-prime attribute is dependent on a
proper subset of any candidate key of the table.
That is, if X is a candidate key and A is a non-prime attribute (so X → A holds), then
there should not be any proper subset Y of X for which Y → A also holds.

E.g., if AB is a candidate key and C is a non-prime attribute, then AB → C holds,
but neither A → C nor B → C should hold.

Prime attribute: an attribute that is part of at least one candidate key.

Example: Suppose a school wants to store the data of teachers and the subjects they
teach. They create a table with the columns teacher_id, subject, and teacher_age. Since a
teacher can teach more than one subject, the table can have multiple rows for the same teacher.

Candidate key: {teacher_id, subject}
Prime attributes: {teacher_id, subject}
Non-prime attribute: teacher_age
{teacher_id, subject} → teacher_age


The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a
proper subset of the candidate key. This violates the rule for 2NF, which says “no
non-prime attribute is dependent on a proper subset of any candidate key of the
table”.

To make the table comply with 2NF, we can break it into two tables:
teacher_details table: (teacher_id, teacher_age)
teacher_subject table: (teacher_id, subject)
Now the tables comply with Second Normal Form (2NF).

Problem Statement: Let’s say we have data of some students who opted
for multiple courses in a university.

User_Id   User_Name   User_Courses
1234      Utkarsh     CS, Maths
2468      Karan       IT, Maths, Physics

Que: Is the above data in 1NF?
Ans: No, because User_Courses is a multi-valued attribute.

To convert this to 1NF, let’s have different rows for different courses.

User_Id   User_Name   User_Course
1234      Utkarsh     CS
1234      Utkarsh     Maths
2468      Karan       IT
2468      Karan       Maths
2468      Karan       Physics


Candidate key: {User_Id, User_Course}
Prime attributes: {User_Id, User_Course}
Non-prime attribute: {User_Name}

Que: Is the given table in 2NF?


Ans: No, since in the above table, non-prime attribute User_Name is dependent on
User_Id which is a proper subset of a Candidate Key = {User_Id, User_Course}.
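
A quick way to see this partial dependency is to test it against the rows themselves. The helper below is a minimal sketch (the name `fd_holds` is made up here). Note that it only checks whether the sample data contradicts a dependency; real functional dependencies come from the meaning of the data, not from one snapshot of it.

```python
# The 1NF course table from above, one dict per row.
table = [
    {"User_Id": 1234, "User_Name": "Utkarsh", "User_Course": "CS"},
    {"User_Id": 1234, "User_Name": "Utkarsh", "User_Course": "Maths"},
    {"User_Id": 2468, "User_Name": "Karan", "User_Course": "IT"},
    {"User_Id": 2468, "User_Name": "Karan", "User_Course": "Maths"},
    {"User_Id": 2468, "User_Name": "Karan", "User_Course": "Physics"},
]


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # Remember the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# User_Id alone already determines User_Name: a partial dependency.
print(fd_holds(table, ["User_Id"], ["User_Name"]))
# User_Id alone does not determine User_Course, so the key needs both attributes.
print(fd_holds(table, ["User_Id"], ["User_Course"]))
```
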

To convert this to 2NF, let’s have separate tables to store user data and course data.

User Table

User_Id   User_Name
1234      Utkarsh
2468      Karan

User_Course Table

User_Id   User_Course
1234      CS
1234      Maths
2468      IT
2468      Maths
2468      Physics
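
The split loses no information: joining the two tables on User_Id gives back the original rows. A minimal sketch with Python's built-in sqlite3 module follows; the table names `users` and `user_course` and the constraints are assumptions chosen for this example.

```python
import sqlite3

# The 1NF rows we started from.
original = [
    (1234, "Utkarsh", "CS"),
    (1234, "Utkarsh", "Maths"),
    (2468, "Karan", "IT"),
    (2468, "Karan", "Maths"),
    (2468, "Karan", "Physics"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        User_Id   INTEGER PRIMARY KEY,
        User_Name TEXT NOT NULL
    );
    CREATE TABLE user_course (
        User_Id     INTEGER NOT NULL REFERENCES users(User_Id),
        User_Course TEXT NOT NULL,
        PRIMARY KEY (User_Id, User_Course)
    );
""")

# Each user name is stored once; each (user, course) pair is stored once.
conn.executemany(
    "INSERT OR IGNORE INTO users VALUES (?, ?)",
    [(uid, name) for uid, name, _ in original],
)
conn.executemany(
    "INSERT INTO user_course VALUES (?, ?)",
    [(uid, course) for uid, _, course in original],
)

user_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Joining the two tables reconstructs the original 1NF table.
joined = conn.execute("""
    SELECT u.User_Id, u.User_Name, c.User_Course
    FROM users AS u
    JOIN user_course AS c ON c.User_Id = u.User_Id
    ORDER BY u.User_Id, c.User_Course
""").fetchall()

print(user_rows, joined == original)
```
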

Third Normal Form (3NF):


No non-prime attribute is transitively dependent on a candidate key.
Equivalently, for every non-trivial functional dependency X → Y, either X is a super key
or Y is a prime attribute.
In simple words, consider a User table where User_Id is the primary key and which
contains a transitive dependency, like:

User_Id User_Zip User_City

Candidate key: {User_Id}
Prime attribute: {User_Id}
Non-prime attributes: {User_Zip, User_City}
User_Zip → User_City

Here, User_Id → User_Zip and User_Zip → User_City; therefore there is a transitive
dependency User_Id → User_Zip → User_City.

In this case, User_Zip is not a super key (it cannot identify a row uniquely, since
multiple users can live in the same zip location) and User_City is not a prime
attribute (it is not part of any candidate key). Hence, User_Zip → User_City is a
transitive dependency that violates 3NF.
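
Whether an attribute set is a super key can be checked mechanically by computing its closure, i.e., everything it determines under the given functional dependencies. The function below is a small sketch of the standard attribute-closure loop; the name `closure` and the representation of dependencies as pairs of sets are choices made for this example.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute of lhs, we also know rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


all_attrs = {"User_Id", "User_Zip", "User_City"}
fds = [
    ({"User_Id"}, {"User_Zip"}),
    ({"User_Zip"}, {"User_City"}),
]

# User_Id reaches User_City through User_Zip, so it determines the whole row.
print(closure({"User_Id"}, fds) == all_attrs)
# User_Zip does not determine User_Id, so it is not a super key.
print(closure({"User_Zip"}, fds) == all_attrs)
```
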

To convert this into 3NF keep all the user data in the User table except User_City, and
have a separate table of [Zip, City] where Zip is a primary key.

User Table

User_Id User_Zip

City Table

Zip City

In short, for every non-trivial X → Y:
3NF: either X is a super key or Y is a prime attribute.
BCNF: X is a super key.

Boyce-Codd Normal form (BCNF):


It is a stricter version of 3NF which states that for any non-trivial functional dependency
X → Y, X must be a super key.
E.g., in the decomposed tables of the above example, Zip → City holds in the City table,
where Zip is a super key, and User_Id → Zip holds in the User table, where User_Id is a
super key; therefore both tables are in BCNF as well.

Consider this table:

User_Id   User_Name   Country_Code   Country_Name
1234      Utkarsh     +91            India
5678      Abhishek    +91            India
2468      Karan       +1             US

In this table we have a functional dependency Country_Code → Country_Name in
which Country_Code is not a super key (it cannot identify a row uniquely); hence this
table is not in BCNF.

To convert the above data to BCNF, keep the User data without Country_Name in one
table, and have a separate table for Country data.

User_Id   User_Name   Country_Code
1234      Utkarsh     +91
5678      Abhishek    +91
2468      Karan       +1

Country_Code   Country_Name
+91            India → Bharat
+1             US
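
One practical payoff of this split is that a country name now lives in exactly one row. The sketch below uses Python's built-in sqlite3 module and reads the "India → Bharat" note in the Country table as a rename, which is an assumption; the table names `users` and `country` are also made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (
        Country_Code TEXT PRIMARY KEY,
        Country_Name TEXT NOT NULL
    );
    CREATE TABLE users (
        User_Id      INTEGER PRIMARY KEY,
        User_Name    TEXT NOT NULL,
        Country_Code TEXT NOT NULL REFERENCES country(Country_Code)
    );
""")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("+91", "India"), ("+1", "US")],
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1234, "Utkarsh", "+91"), (5678, "Abhishek", "+91"), (2468, "Karan", "+1")],
)

# Rename the country once; no user row has to be touched.
cursor = conn.execute(
    "UPDATE country SET Country_Name = 'Bharat' WHERE Country_Code = '+91'"
)
rows_changed = cursor.rowcount

# Every user with code +91 sees the new name through the join.
view = conn.execute("""
    SELECT u.User_Name, c.Country_Name
    FROM users AS u
    JOIN country AS c ON c.Country_Code = u.Country_Code
    ORDER BY u.User_Id
""").fetchall()

print(rows_changed, view)
```

In the original single table the same rename would have to change one row per user with that country code, which is exactly the update anomaly described earlier.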

Summary:
1NF = No multi-valued attributes allowed.
For every non-trivial functional dependency X → Y:
2NF = If Y is a non-prime attribute, X is not a proper subset of a candidate key
(no partial dependency).
3NF = Either X is a super key or Y is a prime attribute.
BCNF = X is a super key.
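
The rules in this summary can be turned into a small checker. This is a rough sketch, not a production tool: it assumes the table is already in 1NF, it finds candidate keys by brute force (fine for a handful of attributes only), and it tests 3NF and BCNF against the listed dependencies only. All function names are made up for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes determined by attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force the minimal attribute sets whose closure is the whole table."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            is_super_key = closure(candidate, fds) == attrs
            # Smaller keys are found first, so skip supersets of known keys.
            if is_super_key and not any(key <= candidate for key in keys):
                keys.append(candidate)
    return keys


def normal_form(attrs, fds):
    """Return the highest of '1NF', '2NF', '3NF', 'BCNF' that the schema meets."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    non_prime = attrs - prime

    # 2NF: no non-prime attribute depends on a proper subset of a candidate key.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(set(part), fds) & non_prime:
                    return "1NF"

    form = "BCNF"
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) == attrs:
            continue  # trivial dependency, or X is a super key
        if (rhs - lhs) <= prime:
            form = "3NF"  # Y is prime: allowed by 3NF but not by BCNF
        else:
            return "2NF"
    return form


# The 1NF course table: User_Name depends on part of the key.
courses = normal_form(
    {"User_Id", "User_Name", "User_Course"},
    [({"User_Id"}, {"User_Name"})],
)
# The zip example: a transitive dependency through User_Zip.
zips = normal_form(
    {"User_Id", "User_Zip", "User_City"},
    [({"User_Id"}, {"User_Zip"}), ({"User_Zip"}, {"User_City"})],
)
# The decomposed Country table.
country = normal_form(
    {"Country_Code", "Country_Name"},
    [({"Country_Code"}, {"Country_Name"})],
)
# Hypothetical schema (not from these notes) that is in 3NF but not in BCNF.
abc = normal_form(
    {"A", "B", "C"},
    [({"A", "B"}, {"C"}), ({"C"}, {"B"})],
)
print(courses, zips, country, abc)
```
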

Que: If there is a functional dependency X → Y where X is a non-prime attribute
but Y is a prime attribute, then does it satisfy
1. 3NF → Yes, because Y is a prime attribute.
2. BCNF → No, because a non-prime attribute cannot be a super key.

From a purist point of view, you want to normalize your data as much as possible,
but from a practical point of view you will find that you need to "back out" of some
of your normalizations for performance reasons. This is called "denormalization".
