Aman's Copy of DBMS 2 - Anomalies + Normalization
If a database design is flawed, it may contain anomalies (problems), which are like
a bad dream for any database administrator. A database with anomalies is hard to keep
consistent, because every insert, update, and delete can corrupt or lose data.
Redundancy means having multiple copies of the same data in the database. This
problem arises when a database is not normalized. Suppose a college keeps a table of
student details with the attributes
student Id, student name, contact number, college name, course opted & college rank.
The values of college name, college rank, and course are repeated for every student,
which can lead to problems. Problems caused by redundancy are the
insertion anomaly, deletion anomaly, and update anomaly.
1. Insertion Anomaly –
a. If a student takes admission and their details have to be inserted before
their course has been decided, the complete record cannot be inserted
until the course is decided.
b. We do have the option of inserting incomplete data, but that can
cause problems if the data is used before it is completed.
c. E.g., if we count the number of students in each course, this student will
not be counted in any course, but ideally the sum of students across all
courses should equal the total number of students in
the college.
d. This problem happens when a data record cannot be inserted
without adding some additional unrelated data to the record, e.g., adding
NULL in the course column.
2. Deletion Anomaly –
a. If the current batch graduates and we delete all the data of the students,
then the details of the college (GEU college has rank 1) will also be deleted,
which ideally should not happen.
b. This anomaly happens when deletion of a data record results in losing
some unrelated information that was stored as part of the record that was
deleted from a table.
3. Update Anomaly –
a. If the rank of the college changes, the change has to be made in every
row that stores it, which is time-consuming and computationally
costly.
b. If the update is not applied in all places, the database will be in an
inconsistent state.
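The three anomalies above can be reproduced on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module; the table name, column names, and student rows are assumptions made up for illustration (only "GEU" with rank 1 comes from the notes).

```python
import sqlite3

# Hypothetical single-table design; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE student (
           student_id   INTEGER PRIMARY KEY,
           student_name TEXT,
           contact      TEXT,
           college_name TEXT,
           college_rank INTEGER,
           course       TEXT
       )"""
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Asha", "111", "GEU", 1, "BTech"),
        (2, "Ravi", "222", "GEU", 1, "BTech"),
        (3, "Meena", "333", "GEU", 1, None),  # course not decided yet
    ],
)

# Insertion anomaly: the student with a NULL course is in no per-course count.
total_students = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
per_course = conn.execute(
    "SELECT course, COUNT(*) FROM student "
    "WHERE course IS NOT NULL GROUP BY course"
).fetchall()
counted_in_courses = sum(n for _, n in per_course)

# Update anomaly: changing the rank in only one row leaves two ranks for GEU.
conn.execute("UPDATE student SET college_rank = 2 WHERE student_id = 1")
ranks_for_geu = conn.execute(
    "SELECT COUNT(DISTINCT college_rank) FROM student "
    "WHERE college_name = 'GEU'"
).fetchone()[0]

# Deletion anomaly: deleting every student also removes the college's details.
conn.execute("DELETE FROM student")
colleges_left = conn.execute(
    "SELECT COUNT(DISTINCT college_name) FROM student"
).fetchone()[0]

print(total_students, counted_in_courses, ranks_for_geu, colleges_left)
```

The per-course total falls short of the student total, the college ends up with two different ranks, and no college information survives the delete.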
Normalization
If one user has multiple contact numbers, storing all of them in one column puts several values in a single cell.
Que: What are the problems that can happen due to this?
One option is to keep a fixed number of contact columns per user, say three. But this will:
a. Waste space: if many users have fewer than 3 contact numbers, most of
the cells will be NULL.
b. Restrict any user who has more than 3 contact numbers, as adding a new
column for only a single user is not ideal because that column will be
NULL for all other users.
Que: What are the problems that can happen due to this?
Storing one row per contact number removes the multi-valued column, but it increases
redundancy and wastes some space because User_Id and User_Name are stored multiple
times. This is taken care of when we move to 2NF.
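The move from a multi-valued contact column to one row per contact number can be sketched in a few lines of Python. The sample rows and the helper name `to_first_normal_form` are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical rows: the contact column packs several numbers into one cell.
unnormalized = [
    (1, "Asha", "111, 222"),
    (2, "Ravi", "333"),
    (3, "Meena", "444, 555, 666, 777"),
]


def to_first_normal_form(rows):
    """Return one (user_id, user_name, contact) row per contact number."""
    result = []
    for user_id, user_name, contacts in rows:
        for number in contacts.split(","):
            result.append((user_id, user_name, number.strip()))
    return result


rows_1nf = to_first_normal_form(unnormalized)
for row in rows_1nf:
    print(row)
```

Every cell now holds a single value and a user may have any number of contacts, at the cost of repeating User_Id and User_Name in each row.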
E.g., if AB is a candidate key, C is a non-prime attribute, and AB->C,
then A->C or B->C (a partial dependency) shouldn't be there.
Prime attribute: an attribute that is part of at least one candidate key.
Example: Suppose a school wants to store the data of teachers and the subjects they
teach. They create a table that looks like this: Since a teacher can teach more than one
subject, the table can have multiple rows for the same teacher.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_subject table:
Now the tables comply with Second normal form (2NF).
Problem Statement: Let’s say we have data of some students who opted
for multiple courses in a university.
To convert this into 1NF, let’s have a separate row for each course.
1234 Utkarsh CS
2468 Karan IT
To convert this into 2NF, let’s have different tables to store user data and course data.
User Table
User_Id User_Name
1234 Utkarsh
2468 Karan
User_Course Table
User_Id User_Course
1234 CS
1234 Maths
2468 IT
2468 Maths
2468 Physics
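As a rough sketch, the same split can be done in Python by projecting the 1NF rows onto two sets of columns. The 1NF rows below are rebuilt by combining the User and User_Course tables above; the variable names are illustrative.

```python
# 1NF rows (User_Id, User_Name, User_Course), rebuilt from the two tables above.
rows_1nf = [
    (1234, "Utkarsh", "CS"),
    (1234, "Utkarsh", "Maths"),
    (2468, "Karan", "IT"),
    (2468, "Karan", "Maths"),
    (2468, "Karan", "Physics"),
]

# User_Name depends on User_Id alone, which is only part of the key
# (User_Id, User_Course), so it moves to its own table.
user_table = sorted({(user_id, name) for user_id, name, _ in rows_1nf})
user_course_table = sorted({(user_id, course) for user_id, _, course in rows_1nf})

# Joining the two tables on User_Id gives back the original rows (lossless).
names = dict(user_table)
rejoined = sorted((uid, names[uid], course) for uid, course in user_course_table)

print(user_table)
print(user_course_table)
print(rejoined == sorted(rows_1nf))
```

Each user name is now stored once, and the join shows that no information was lost by the decomposition.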
User_Zip->User_City
In this case, User_Zip is not a super key (it cannot identify any row uniquely, since
there can be multiple users living in the same zip location) and User_City is not a prime
attribute (it is not part of any candidate key). Hence, there is a transitive dependency
(User_Id->User_Zip->User_City) that violates 3NF.
To convert this into 3NF, keep all the user data in the User table except User_City, and
have a separate table of [Zip, City] where Zip is the primary key.
User Table
User_Id User_Zip
City Table
Zip City
A table is in 3NF if, for every non-trivial functional dependency X->Y,
either
X is a super key
or
Y is a prime attribute
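The 3NF rule can be checked mechanically once the functional dependencies and candidate keys are written down. Below is a minimal sketch, not a full normalization tool: the helper names `closure` and `violates_3nf` are hypothetical, and the schema assumes User_Id is the only candidate key and that User_Name is stored alongside the zip and city.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(all_attrs, fds, candidate_keys):
    """Return the dependencies X -> Y that break the 3NF rule."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        is_super_key = closure(lhs, fds) == set(all_attrs)
        rhs_is_prime = (set(rhs) - set(lhs)) <= prime
        if not is_super_key and not rhs_is_prime:
            bad.append((lhs, rhs))
    return bad


# Assumed schema for the user example: User_Id is the only candidate key.
attrs = {"User_Id", "User_Name", "User_Zip", "User_City"}
fds = [
    (("User_Id",), ("User_Name", "User_Zip")),
    (("User_Zip",), ("User_City",)),
]
keys = [{"User_Id"}]

violations = violates_3nf(attrs, fds, keys)
print(violations)
```

Under these assumptions the only dependency reported is User_Zip->User_City, which is the one the [Zip, City] table removes.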
2468 Karan +1 US
To convert the above data into BCNF, keep the User data without Country_Name in one
table, and store the Country data in a separate table.
2468 Karan +1
Country_Code Country_Name
+1 US
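A small sqlite3 sketch of this decomposition follows. The table names `users` and `countries` and the lowercase column names are assumptions; the single row is the one shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (
        country_code TEXT PRIMARY KEY,
        country_name TEXT NOT NULL
    );
    CREATE TABLE users (
        user_id      INTEGER PRIMARY KEY,
        user_name    TEXT NOT NULL,
        country_code TEXT REFERENCES countries(country_code)
    );
    INSERT INTO countries VALUES ('+1', 'US');
    INSERT INTO users VALUES (2468, 'Karan', '+1');
    """
)

# The join rebuilds the original wide row without storing Country_Name per user.
wide_rows = conn.execute(
    """SELECT u.user_id, u.user_name, u.country_code, c.country_name
       FROM users AS u
       JOIN countries AS c ON u.country_code = c.country_code
       ORDER BY u.user_id"""
).fetchall()
print(wide_rows)
```

Country_Name is stored once per Country_Code, so renaming a country touches a single row in the country table.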
Summary:
1NF = No multi-valued attribute allowed.
In every non-trivial functional dependency, X → Y:
2NF = If Y is a non-prime attribute, X is not a proper subset of a candidate key (no partial dependency: part of a key -> non-prime attribute).
3NF = Either X is a super key or Y is a prime attribute.
BCNF = X is a super key.
From a purist point of view, you want to normalize your data as much as possible,
but from a practical point of view you will find that you need to "back out" of some
of your normalizations for performance reasons. This is called "denormalization".