Mod3 Chap2 - Normalization
Anomalies in DBMS
What is Anomaly?
An anomaly is a deviation from the expected, consistent state. In a Database
Management System (DBMS), an anomaly is an inconsistency that arises in a relational
table when insert, update, or delete operations are performed on it.
For example, if there is a lot of redundant data present in our database then DBMS
anomalies can occur.
If a table is constructed in a very poor manner, then there is a chance of database anomaly.
Due to database anomalies, the integrity of the database suffers.
The other reason for the database anomalies is that all the data is stored in a single table.
There are three types of anomaly in a database:
Updation / Update Anomaly
An update (updation) anomaly occurs when updating some rows of a table leaves the table
in an inconsistent state.
In the worker table, if we want to update the address of Ramesh, we have to update every
row in which Ramesh appears. If we miss even a single row during the update, Ramesh will
have two addresses, which makes the database inconsistent and wrong.
Worker_id Worker_name Worker_dept Worker_address
Stu_id    Stu_name  Stu_branch        Stu_club
2018nk01  Shivani   Computer science  literature
2018nk01  Shivani   Computer science  dancing
2018nk02  Ayush     Electronics       Videography
2018nk03  Mansi     Electrical        dancing
2018nk03  Mansi     Electrical        singing
2018nk04  Gopal     Mechanical        Photography
In the above table, if Shivani changes her branch from Computer Science to Electronics,
then we will have to update all the rows. If we miss any row, then Shivani will have more
than one branch, which will create the update anomaly in the table.
Insertion Anomaly
If inserting a new row into the table is impossible, or makes the table inconsistent, because
some unrelated data is missing, it is called the insertion anomaly.
For example, if we create a new row for a worker who is not yet allocated to any
department, we cannot insert it into the table, so this creates an insertion anomaly.
Stu_id Stu_name Stu_branch Stu_club
If we add a new row for student Ankit who is not a part of any club, we cannot insert the
row into the table as we cannot insert null in the column of stu_club. This is called
insertion anomaly.
Deletion Anomaly
If we delete some rows from the table and if any other information or data which is required is
also deleted from the database, this is called the deletion anomaly in the database.
For example, in the below table, if we want to delete the department number ECT669 then
the details of Rajesh will also be deleted since Rajesh's details are dependent on the row of
ECT669. So, there will be deletion anomalies in the table.
Stu_id Stu_name Stu_branch Stu_club
If we remove the photography club from the college, then we will have to delete its row from
the table.
But this will also delete the row of Gopal, and with it all of his details. This is called the
deletion anomaly, and it makes the database inconsistent.
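The update and deletion anomalies described above can be reproduced with a small Python sketch. It uses the rows of the student table; the helper name `branches_of` and the list-of-tuples representation are assumptions made for illustration only.

```python
# Rows of the student table: (Stu_id, Stu_name, Stu_branch, Stu_club)
students = [
    ("2018nk01", "Shivani", "Computer science", "literature"),
    ("2018nk01", "Shivani", "Computer science", "dancing"),
    ("2018nk02", "Ayush", "Electronics", "Videography"),
    ("2018nk03", "Mansi", "Electrical", "dancing"),
    ("2018nk03", "Mansi", "Electrical", "singing"),
    ("2018nk04", "Gopal", "Mechanical", "Photography"),
]


def branches_of(rows, stu_id):
    """Return the set of branches recorded for one student (hypothetical helper)."""
    return {branch for sid, _, branch, _ in rows if sid == stu_id}


# Update anomaly: Shivani's branch is changed in only one of her two rows.
partially_updated = list(students)
partially_updated[0] = ("2018nk01", "Shivani", "Electronics", "literature")
shivani_branches = branches_of(partially_updated, "2018nk01")

# Deletion anomaly: removing the Photography club also removes Gopal entirely.
after_delete = [row for row in students if row[3] != "Photography"]
gopal_still_present = any(row[0] == "2018nk04" for row in after_delete)

print(sorted(shivani_branches))  # ['Computer science', 'Electronics']
print(gopal_still_present)       # False
```

One student ends up with two branches, and a student disappears because a club was removed. Both effects come from storing student facts and club facts in the same table.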
Functional dependency
In a given relation R, X and Y are attributes. Attribute Y is
functionally dependent on attribute X if each value of X
determines EXACTLY ONE value of Y, which is represented as
X -> Y .
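A minimal Python sketch of this definition follows. The function name `holds` and the sample rows are hypothetical and invented for illustration. Note that checking one set of rows can only show that a dependency is violated; it cannot prove that the dependency holds for every possible state of the relation.

```python
def holds(rows, lhs, rhs):
    """Return True if every value of lhs maps to exactly one value of rhs in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows, not taken from the slides.
report = [
    {"student": "S1", "course": "C1", "course_name": "DBMS", "marks": 80},
    {"student": "S1", "course": "C2", "course_name": "OS", "marks": 65},
    {"student": "S2", "course": "C1", "course_name": "DBMS", "marks": 65},
]

print(holds(report, ["course"], ["course_name"]))  # True
print(holds(report, ["student"], ["marks"]))       # False
```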
Functional Dependencies
Consider the following Relation
Functional Dependencies - From the previous example
For each value of (Student#, Course#), Marks will have exactly one value. So we observe
the following functional dependency:
{Student#, Course#} → Marks
For each value of Course#, the name of the course will have exactly one value. So we observe the
following functional dependency:
Course# → CourseName
For each value of Marks, the grade will have exactly one value. So we observe the following
functional dependency:
Marks → Grade
Functional dependency
Types of functional dependencies:
Full dependencies
X and Y are attributes.
Y is fully functionally dependent on X if X functionally determines Y
and no proper subset of X functionally determines Y.
Partial dependencies
X and Y are attributes.
Attribute Y is partially dependent on the attribute X if it is dependent on a proper subset of
attribute X.
Example: Course# → CourseName, where Course# is only a part of the key (Student#, Course#).
Transitive dependencies
X, Y and Z are three attributes.
If X → Y and Y → Z, then X → Z.
Here Z is transitively dependent on X through Y.
What is Normalization?
•Normalization divides a larger table into smaller tables and links them using relationships.
•Normal forms are used to reduce redundancy, inconsistency and uncertainty in the
database tables.
The main reason for normalizing relations is to remove the insertion, update and deletion
anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data
integrity and other problems as the database grows.
Types of Normal Forms
Advantages of Normalization
Disadvantages of Normalization
•You cannot start building the database before knowing what the user
needs.
•The performance degrades when normalizing the relations to higher
normal forms, i.e., 4NF, 5NF.
•It is very time-consuming and difficult to normalize relations of a
higher degree.
•Careless decomposition may lead to a bad database design, leading to
serious problems.
First Normal Form (1NF)
•It states that an attribute of a table cannot hold multiple values. Each attribute must hold
only a single (atomic) value.
•First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
The decomposition of the EMPLOYEE table into 1NF has been shown as
14 John 7272826385 UP
14 John 9064738238 UP
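The two rows above are what remains after a multi-valued phone attribute is split. A minimal Python sketch of that step follows, assuming an unnormalized row in which John has both phone numbers in one field. The column names (`emp_id`, `emp_name`, `emp_phone`, `emp_state`) and the function `to_1nf` are hypothetical, since the table header is not shown here.

```python
# Assumed unnormalized form: one row, two phone numbers in a single attribute.
unnormalized = [
    {"emp_id": 14, "emp_name": "John",
     "emp_phone": ["7272826385", "9064738238"], "emp_state": "UP"},
]


def to_1nf(rows, multi_valued_column):
    """Emit one output row per value of the multi-valued column."""
    flat = []
    for row in rows:
        for value in row[multi_valued_column]:
            new_row = dict(row)            # copy the single-valued attributes
            new_row[multi_valued_column] = value
            flat.append(new_row)
    return flat


employee_1nf = to_1nf(unnormalized, "emp_phone")
for row in employee_1nf:
    print(row)
```

Every attribute in `employee_1nf` now holds a single value, at the cost of repeating the employee's name and state in each row.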
Online Retail Application Tables – 1NF Normalized
Observation on Unnormalized Retail Application Table
CustomerId CustomerName Accountno ItemId ItemName UnitPrice Class QtyPurchased NetAmt
Second Normal Form (2NF)
•To be in 2NF, a relation must be in 1NF.
•In the second normal form, all non-key attributes are fully functionally dependent on the
primary key
or
No partial dependency exists between non-key attributes and key attributes.
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In
a school, a teacher can teach more than one subject.
TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38
Non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset
of a candidate key. That's why it violates the rule for 2NF.
Note: Remember that we are dealing with non-key attributes
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
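A minimal Python sketch of this decomposition, using the teacher rows from the example. Only TEACHER_SUBJECT is listed above; the name `teacher_detail` for the second table (TEACHER_ID, TEACHER_AGE) is an assumption.

```python
# Original table: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Second table (name assumed): the attribute that depends on TEACHER_ID alone.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})

# TEACHER_SUBJECT keeps the full key {TEACHER_ID, SUBJECT}.
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Joining the two tables on TEACHER_ID rebuilds the original rows.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_detail)       # [(25, 30), (47, 35), (83, 38)]
print(rejoined == teacher)  # True
```

Each teacher's age is now stored once, so changing it can no longer leave two rows disagreeing.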
Third Normal Form (3NF)
•A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.
•3NF is used to reduce data duplication. It is also used to achieve data integrity.
•If a relation in 2NF has no transitive dependency for non-prime attributes, then it is in
third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every
non-trivial functional dependency X → Y.
1.X is a super key.
2.Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Third Normal Form: 3 NF
A relation R is said to be in the Third Normal Form (3NF) if and only if
− It is in 2NF and
− No transitive dependency exists between non-key attributes and
key attributes through another non key attribute.
A → B → C (B should be a non-key attribute)
EMPLOYEE_DETAIL table:
EMP_ID  EMP_NAME  EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry     201010   UP         Noida
333     Stephan   02228    US         Boston
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID, so
EMP_STATE and EMP_CITY are transitively dependent on EMP_ID. This violates 3NF.
We need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with
EMP_ZIP as a Primary key.
EMPLOYEE table: EMP_ID EMP_NAME EMP_ZIP
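A minimal Python sketch of this 3NF decomposition, using the two EMPLOYEE_DETAIL rows shown earlier. The variable names and the `lookup` helper are illustrative assumptions; ZIP codes are kept as strings so that the leading zero in 02228 survives.

```python
# EMPLOYEE_DETAIL: (EMP_ID, EMP_NAME, EMP_ZIP, EMP_STATE, EMP_CITY)
employee_detail = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
]

# EMPLOYEE keeps what depends directly on EMP_ID.
employee = [(eid, name, zip_code) for eid, name, zip_code, _, _ in employee_detail]

# EMPLOYEE_ZIP is keyed by EMP_ZIP and holds the state and city.
employee_zip = {zip_code: (state, city)
                for _, _, zip_code, state, city in employee_detail}


def lookup(emp_id):
    """Rebuild one full row by following EMP_ID -> EMP_ZIP -> (state, city)."""
    for eid, name, zip_code in employee:
        if eid == emp_id:
            state, city = employee_zip[zip_code]
            return (eid, name, zip_code, state, city)
    return None


print(lookup(222))  # (222, 'Harry', '201010', 'UP', 'Noida')
```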
Boyce-Codd Normal Form (BCNF) : Example
Let's assume there is a company where employees work in more than one department.
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key, yet each of
them alone determines other attributes.
To convert the given table into BCNF, decompose it into three tables:
EMP_DEPT table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, the tables are in BCNF because the left side of both functional dependencies is a key.
Boyce-Codd Normal Form (BCNF)
Example - Address (Not in BCNF)
Scheme → {City, Street, ZipCode }
1. Key1 → {City, Street }
2. Key2 → {ZipCode, Street}
3. No non-key attribute hence 3NF
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. Dependency between attributes belonging to a key
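The claims in this list can be checked with a short attribute-closure sketch in Python. The function names `closure` and `is_superkey` are illustrative assumptions; the scheme and the two functional dependencies are the ones listed above.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs of frozensets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


scheme = {"City", "Street", "ZipCode"}
fds = [
    (frozenset({"City", "Street"}), frozenset({"ZipCode"})),
    (frozenset({"ZipCode"}), frozenset({"City"})),
]


def is_superkey(attrs):
    """A set of attributes is a superkey if its closure is the whole scheme."""
    return closure(attrs, fds) == scheme


# BCNF requires the left side of every non-trivial dependency to be a superkey.
violations = [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds if not is_superkey(lhs)]

print(is_superkey({"City", "Street"}))     # True
print(is_superkey({"ZipCode", "Street"}))  # True
print(violations)                          # [(['ZipCode'], ['City'])]
```

Both keys are confirmed, and the only violating dependency is ZipCode → City, whose left side is part of a key but not a key by itself.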
BCNF - Decomposition
1. Place the two candidate primary keys in separate
entities
2. Place each of the remaining data items in one of the
resulting entities according to its dependency on the
primary key.
Example 1 (Convert to BCNF)
Old Scheme → {City, Street, ZipCode }
New Scheme1 → {ZipCode, Street}
New Scheme2 → {City, Street}
Fourth normal form (4NF)
•A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
•For a dependency A → B, if multiple values of B exist for a single value of A, then the
dependency is a multi-valued dependency (written A →→ B).
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities.
Hence, there is no relationship between COURSE and HOBBY.
So, to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
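A minimal Python sketch of this decomposition, using the rows of the STUDENT table. The variable names are illustrative assumptions. Joining the two projections on STU_ID pairs every course of a student with every hobby of that student, which is what treating COURSE and HOBBY as independent implies.

```python
# STUDENT: (STU_ID, COURSE, HOBBY)
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby = sorted({(sid, hobby) for sid, _, hobby in student})

# Natural join of the two projections on STU_ID.
joined = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for hid, hobby in student_hobby
    if sid == hid
)
rows_for_21 = [row for row in joined if row[0] == 21]

print(len(rows_for_21))  # 4
```

Student 21 has two courses and two hobbies, so the join lists four combinations, whereas the single STUDENT table recorded only two of them. In the decomposed design, a course or a hobby can be added for a student without touching the other table.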
4NF - Decomposition
1. Move the two multi-valued relations to separate tables
2. Identify a primary key for each of the new entities.
Fifth normal form (5NF)
•A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.
•5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid
redundancy.
•5NF is also known as Project-join normal form (PJ/NF).
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't
take the Math class for Semester 2. In this case, the combination of all these fields is required
to identify valid data.
Suppose we add a new semester, Semester 3, but do not know the subject or who
will be taking that subject, so we would leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math
P2
SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen
P3
SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
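A minimal Python sketch that checks this decomposition on the five rows of the original table. Sets are used, so the repeated (Semester 1, John) pair in P3 collapses to one entry; the variable names are illustrative assumptions.

```python
# Original relation: (SUBJECT, LECTURER, SEMESTER)
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3.
p1 = {(sem, subj) for subj, _, sem in original}   # (SEMESTER, SUBJECT)
p2 = {(subj, lect) for subj, lect, _ in original}  # (SUBJECT, LECTURER)
p3 = {(sem, lect) for _, lect, sem in original}    # (SEMESTER, LECTURER)

# Joining only P1 and P2 on SUBJECT produces spurious rows.
two_way = {(subj, lect, sem)
           for sem, subj in p1
           for subj2, lect in p2
           if subj == subj2}

# Filtering by P3 completes the three-way join.
three_way = {row for row in two_way if (row[2], row[1]) in p3}

print(len(two_way))           # 7
print(three_way == original)  # True
```

Joining two of the projections yields seven rows, two of which were never in the table, for example John teaching Math in Semester 2. Only the join of all three projections returns exactly the original five rows.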
In simple words, a Supplier (“Ali”) produces a product (“ABC”) and a Customer (“Nauman”) can use
it, but Ali and Nauman are not directly connected.
Normal Form  Description
1NF          A relation is in 1NF if every attribute contains only atomic values.
2NF          A relation is in 2NF if it is in 1NF and all non-key attributes are fully
             functionally dependent on the primary key.
3NF          A relation is in 3NF if it is in 2NF and does not contain any transitive
             dependency.
BCNF         A relation is in BCNF if, for every non-trivial functional dependency X → Y,
             X is a super key.
4NF          A relation is in 4NF if it is in Boyce-Codd normal form and has no
             multi-valued dependency.
5NF          A relation is in 5NF if it is in 4NF, does not contain any join
             dependency, and joining is lossless.