
Mod3 Chap2 - Normalization

Normalization is the process of organizing data in a database to minimize redundancy and dependency. It involves dividing large tables into smaller, normalized tables and linking them together. This reduces anomalies like insertion, updation, and deletion anomalies that could violate data integrity. There are several normal forms like 1NF, 2NF, 3NF, BCNF that structure data to eliminate anomalies by removing redundant or dependent data. Normalization improves data consistency, flexibility, and integrity but can impact performance.

Uploaded by

Suhas Reddy

Normalization

Anomalies in DBMS

What is Anomaly?

An anomaly is a deviation from the expected, consistent state. In a Database
Management System (DBMS), an anomaly is an inconsistency that arises in a relational
table when insert, update, or delete operations are performed on it.

There can be various reasons for anomalies to occur in the database.

For example, if there is a lot of redundant data present in our database then DBMS
anomalies can occur.

If a table is constructed in a very poor manner, then there is a chance of database anomaly.
Due to database anomalies, the integrity of the database suffers.

The other reason for the database anomalies is that all the data is stored in a single table.

So, to remove the anomalies of the database, normalization is used.

There are three types of anomalies in a database: update, insertion, and deletion.

Updation / Update Anomaly

An update (updation) anomaly occurs when updating some rows of a table leaves the table in
an inconsistent state.

In the table below, to update Ramesh's address we must update every row in which Ramesh
appears. If we miss even one row, Ramesh will have two addresses, and the database
becomes inconsistent and wrong.

Worker_id  Worker_name  Worker_dept  Worker_address
65         Ramesh       ECT001       Jaipur
65         Ramesh       ECT002       Jaipur
73         Amit         ECT002       Delhi
76         Vikas        ECT501       Pune
76         Vikas        ECT502       Pune
79         Rajesh       ECT669       Mumbai

Stu_id    Stu_name  Stu_branch        Stu_club
2018nk01  Shivani   Computer science  literature
2018nk01  Shivani   Computer science  dancing
2018nk02  Ayush     Electronics       Videography
2018nk03  Mansi     Electrical        dancing
2018nk03  Mansi     Electrical        singing
2018nk04  Gopal     Mechanical        Photography

In the above table, if Shivani changes her branch from Computer Science to Electronics,
then we will have to update all the rows. If we miss any row, then Shivani will have more
than one branch, which will create the update anomaly in the table.
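
The update anomaly described above can be sketched in a few lines of Python. The rows are copied from the student table; `update_first_match` and `branches_of` are hypothetical helpers written only for this illustration, and the faulty update deliberately stops after the first matching row.

```python
# Rows copied from the student table: (Stu_id, Stu_name, Stu_branch, Stu_club).
students = [
    ("2018nk01", "Shivani", "Computer science", "literature"),
    ("2018nk01", "Shivani", "Computer science", "dancing"),
    ("2018nk02", "Ayush", "Electronics", "Videography"),
    ("2018nk03", "Mansi", "Electrical", "dancing"),
    ("2018nk03", "Mansi", "Electrical", "singing"),
    ("2018nk04", "Gopal", "Mechanical", "Photography"),
]


def update_first_match(rows, stu_id, new_branch):
    """Faulty update (hypothetical): changes only the first matching row."""
    out = list(rows)
    for i, (sid, name, _branch, club) in enumerate(out):
        if sid == stu_id:
            out[i] = (sid, name, new_branch, club)
            break  # the remaining rows for this student are missed
    return out


def branches_of(rows, stu_id):
    """Collect every branch recorded for one student."""
    return {branch for sid, _, branch, _ in rows if sid == stu_id}


after = update_first_match(students, "2018nk01", "Electronics")
inconsistent = branches_of(after, "2018nk01")
print(sorted(inconsistent))  # ['Computer science', 'Electronics']
```

Because the branch is stored once per club membership, a single missed row is enough to leave Shivani with two branches.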

Insertion Anomaly

An insertion anomaly occurs when a new row cannot be inserted without data that is not yet
available, or when inserting it makes the table inconsistent.

For example, if a new worker has not yet been allocated to any department, we cannot insert
a row for that worker into the table below. This is an insertion anomaly.

Worker_id  Worker_name  Worker_dept  Worker_address
65         Ramesh       ECT001       Jaipur
65         Ramesh       ECT002       Jaipur
73         Amit         ECT002       Delhi
76         Vikas        ECT501       Pune
76         Vikas        ECT502       Pune
79         Rajesh       ECT669       Mumbai

Stu_id    Stu_name  Stu_branch        Stu_club
2018nk01  Shivani   Computer science  literature
2018nk01  Shivani   Computer science  dancing
2018nk02  Ayush     Electronics       Videography
2018nk03  Mansi     Electrical        dancing
2018nk03  Mansi     Electrical        singing
2018nk04  Gopal     Mechanical        Photography

If we add a new row for student Ankit who is not a part of any club, we cannot insert the
row into the table as we cannot insert null in the column of stu_club. This is called
insertion anomaly.

Deletion Anomaly

A deletion anomaly occurs when deleting some rows from a table also deletes other
information that is still required.

For example, in the table below, if we delete department ECT669, Rajesh's details are
deleted too, because they are stored only in the ECT669 row.

Worker_id  Worker_name  Worker_dept  Worker_address
65         Ramesh       ECT001       Jaipur
65         Ramesh       ECT002       Jaipur
73         Amit         ECT002       Delhi
76         Vikas        ECT501       Pune
76         Vikas        ECT502       Pune
79         Rajesh       ECT669       Mumbai

Stu_id    Stu_name  Stu_branch        Stu_club
2018nk01  Shivani   Computer science  literature
2018nk01  Shivani   Computer science  dancing
2018nk02  Ayush     Electronics       Videography
2018nk03  Mansi     Electrical        dancing
2018nk03  Mansi     Electrical        singing
2018nk04  Gopal     Mechanical        Photography

If the college removes the Photography club, we have to delete its row from the table.

But that row is the only one that holds Gopal's details, so they are deleted as well. This
is a deletion anomaly: information we still need is lost, and the database becomes
inconsistent.
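
As a rough illustration of the deletion anomaly, the sketch below removes the Photography club from the student rows and then checks whether Gopal is still known to the database. `delete_club` is a hypothetical helper, not part of any library.

```python
# Rows copied from the student table: (Stu_id, Stu_name, Stu_branch, Stu_club).
students = [
    ("2018nk01", "Shivani", "Computer science", "literature"),
    ("2018nk01", "Shivani", "Computer science", "dancing"),
    ("2018nk02", "Ayush", "Electronics", "Videography"),
    ("2018nk03", "Mansi", "Electrical", "dancing"),
    ("2018nk03", "Mansi", "Electrical", "singing"),
    ("2018nk04", "Gopal", "Mechanical", "Photography"),
]


def delete_club(rows, club):
    """Hypothetical helper: drop every row that mentions the given club."""
    return [row for row in rows if row[3] != club]


remaining = delete_club(students, "Photography")
known_ids = {row[0] for row in remaining}

# Gopal's only row was the Photography row, so his details are gone too.
print("2018nk04" in known_ids)  # False
```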

Functional dependency
In a given relation R, let X and Y be attributes. Attribute Y is
functionally dependent on attribute X if each value of X
determines EXACTLY ONE value of Y. This is written as
X → Y.

We say “X determines Y” or “Y is functionally dependent on X”.

X → Y does not imply Y → X.

If the value of the attribute “Marks” is known, then the value of
the attribute “Grade” is determined, since Marks → Grade.
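
A minimal sketch of how a functional dependency can be checked against sample data. The function name `holds` and the Marks/Grade rows are assumptions made up for this illustration; the source does not give grade boundaries.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Each X value may be paired with exactly one Y value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows (the grade boundaries are assumed).
report = [
    {"Marks": 92, "Grade": "A"},
    {"Marks": 92, "Grade": "A"},
    {"Marks": 85, "Grade": "A"},
    {"Marks": 61, "Grade": "C"},
]

marks_to_grade = holds(report, ["Marks"], ["Grade"])
grade_to_marks = holds(report, ["Grade"], ["Marks"])
print(marks_to_grade)  # True
print(grade_to_marks)  # False
```

The second check fails because grade A corresponds to two different marks, which is the sense in which X → Y does not imply Y → X. A check on sample rows can only refute a dependency; it cannot prove that the dependency holds for all possible data.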

Functional Dependencies
Consider the following relation:

REPORT (STUDENT#, COURSE#, StudentName, CourseName, Marks, Grade)

Description of the attributes:

STUDENT# - Student Number
COURSE# - Course Number
StudentName - Student Name
CourseName - Course Name
Marks - Marks scored in course COURSE# by student STUDENT#
Grade - Grade obtained by student STUDENT# in course COURSE#

Functional Dependencies- From the previous example
For each value of (Student# ,Course#), Marks obtained will be exactly one value. So we observe
the following Functional dependency

STUDENT# COURSE# → Marks

For each value of Course# the name of the course will be exactly one value. So, we observe the
following Functional dependency

COURSE# → CourseName

For each value of Marks the grade will be exactly one value. So we observe the following
functional dependency

Marks → Grade

Functional dependency
Types of functional dependencies:

◦ Full functional dependency
◦ Partial functional dependency
◦ Transitive dependency

Full dependencies
X and Y are attributes.
Y is fully functionally dependent on X if X functionally determines Y
and no proper subset of X functionally determines Y.

Partial dependencies
X and Y are attributes.
Attribute Y is partially dependent on attribute X if it is also dependent on a proper
subset of X.

Both of the following functional dependencies are valid in our example:

STUDENT# COURSE# → CourseName

COURSE# → CourseName

So we can say that CourseName is partially dependent on STUDENT# COURSE#.
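
The partial dependency above can be found mechanically. The sketch below is a simplification that only inspects the functional dependencies as they are listed (it does not derive implied ones); `partial_dependencies` is a hypothetical helper name.

```python
def partial_dependencies(key, fds):
    """List (lhs, attribute) pairs where a non-key attribute depends on
    a proper subset of the key. Only the listed FDs are examined."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if lhs < key:  # proper subset of the key
            for attr in sorted(set(rhs) - key):
                found.append((tuple(sorted(lhs)), attr))
    return found


# Key and functional dependencies of the REPORT relation.
key = {"STUDENT#", "COURSE#"}
fds = [
    ({"STUDENT#", "COURSE#"}, {"Marks"}),
    ({"COURSE#"}, {"CourseName"}),
    ({"Marks"}, {"Grade"}),
]

result = partial_dependencies(key, fds)
print(result)  # [(('COURSE#',), 'CourseName')]
```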

Transitive dependencies
X, Y, and Z are three attributes.
If X determines Y and Y determines Z, then Z is transitively dependent on X:
X → Y
Y → Z
=> X → Z

What is Normalization?

•Normalization is the process of organizing the data in the database.

•Normalization is used to minimize the redundancy from a relation or set of relations. It is
also used to eliminate undesirable characteristics like Insertion, Update, and Deletion
Anomalies.

•Normalization divides the larger table into smaller tables and links them using relationships.

•The normal form is used to reduce redundancy, inconsistency, and uncertainty in the
database table.

•This refinement process is called Normalization.

Why do we need Normalization?

The main reason for normalizing the relations is removing these anomalies. Failure to
eliminate anomalies leads to data redundancy and can cause data integrity and other
problems as the database grows.

Types of Normal Forms

Advantages of Normalization

•Normalization helps to minimize data redundancy.
•Greater overall database organization.
•Data consistency within the database.
•Much more flexible database design.
•Enforces the concept of relational integrity.

Disadvantages of Normalization
•You cannot start building the database before knowing what the user
needs.
•The performance degrades when normalizing the relations to higher
normal forms, i.e., 4NF, 5NF.
•It is very time-consuming and difficult to normalize relations of a
higher degree.
•Careless decomposition may lead to a bad database design, leading to
serious problems.
First Normal Form (1NF)

•A relation is in 1NF if every attribute holds only atomic (indivisible) values.

•An attribute of a table cannot hold multiple values. Each attribute must hold a single
value per row.

•First normal form disallows multi-valued attributes, composite attributes, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
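
The 1NF conversion of the EMPLOYEE table can be sketched as follows. Modeling the multi-valued EMP_PHONE attribute as a Python list is an assumption of this example, and `to_1nf` is a hypothetical helper.

```python
# EMPLOYEE rows before 1NF: the phone attribute holds a list of values.
employees = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Emit one row per phone number so that every value is atomic."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]


flat = to_1nf(employees)
for row in flat:
    print(row)
```

The result has five rows, one per phone number, matching the 1NF table above.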

Online Retail Application Tables – 1NF Normalized
Observation on the unnormalized retail application table:

CustomerDetails        ItemDetails        PurchaseDetails
1001 John 1500012351   STN001 Pen 10 A    5 50
1002 Tom 1200354611    BAK003 Bread 10 A  1 10
1003 Maria 2134724532  GRO001 Potato 20 B 1 20

Each of the three columns above packs several values together, which violates the 1NF definition.
To bring the table to 1NF, we need to make the columns atomic:

CustomerId  CustomerName  Accountno   ItemId  ItemName  UnitPrice  Class  QtyPurchased  NetAmt
1001        John          1500012351  STN001  Pen       10         A      5             50
1002        Tom           1200354611  BAK003  Bread     10         A      1             10
1003        Maria         2134724532  GRO001  Potato    20         B      1             20

Second Normal Form (2NF)
•For a relation to be in 2NF, it must first be in 1NF.
•In the second normal form, all non-key attributes are fully functionally dependent on the
primary key,
or, equivalently,
no partial dependency exists between non-key attributes and key attributes.

Example: Suppose a school stores data about its teachers and the subjects they teach. In
this school, a teacher can teach more than one subject.

TEACHER table:

TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38

The non-prime attribute TEACHER_AGE depends on TEACHER_ID, which is a proper subset
of the candidate key {TEACHER_ID, SUBJECT}. That is why the table violates the rule for 2NF.
Note: Remember that we are dealing with non-key attributes.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID  TEACHER_AGE
25          30
47          35
83          38

TEACHER_SUBJECT table:

TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
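
A short sketch of the 2NF decomposition of the TEACHER table, treating each new table as a duplicate-free projection. The helper `project` and the simple join are written for this illustration only.

```python
# TEACHER rows: (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]


def project(rows, indexes):
    """Duplicate-free projection that keeps first-seen order."""
    result = []
    for row in rows:
        picked = tuple(row[i] for i in indexes)
        if picked not in result:
            result.append(picked)
    return result


teacher_detail = project(teacher, [0, 2])   # TEACHER_ID, TEACHER_AGE
teacher_subject = project(teacher, [0, 1])  # TEACHER_ID, SUBJECT

# Joining the two tables on TEACHER_ID gives back the original rows.
rejoined = [
    (tid, subject, age)
    for tid, subject in teacher_subject
    for detail_id, age in teacher_detail
    if tid == detail_id
]
print(teacher_detail)  # [(25, 30), (47, 35), (83, 38)]
print(sorted(rejoined) == sorted(teacher))  # True
```

Each teacher's age is now stored once, and no information is lost by the split.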

Second Normal Form (2NF) Contd…

Example (Not 2NF)

Scheme → {Title, PubId, AuId, Price, AuAddress}
1. Key → {Title, PubId, AuId}
2. {Title, PubId, AuId} → {Price}
3. {AuId} → {AuAddress}
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuId, which is a subset of a key

2NF - Decomposition

Example (Convert to 2NF)

Old Scheme → {Title, PubId, AuId, Price, AuAddress}
New Scheme → {Title, PubId, AuId, Price}
New Scheme → {AuId, AuAddress}

Third Normal Form (3NF)

•A relation is in 3NF if it is in 2NF and contains no transitive dependency of a non-prime
attribute on a key.

•3NF is used to reduce data duplication. It also helps to achieve data integrity.

•If a relation is in 2NF and has no transitive dependency for non-prime attributes, then it
is in third normal form.

A relation is in third normal form if at least one of the following conditions holds for every
non-trivial functional dependency X → Y:

1. X is a super key.

2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Third Normal Form: 3NF
A relation R is said to be in the Third Normal Form (3NF) if and only if
− It is in 2NF and
− No transitive dependency exists between non-key attributes and
key attributes through another non-key attribute.

A → B → C   (B should be a non-key attribute)

To make a table 3NF compliant, we have to remove all such transitive dependencies.

EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal


Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID.

The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the
super key (EMP_ID). This violates the rule of third normal form.

We need to move EMP_CITY and EMP_STATE to a new <EMPLOYEE_ZIP> table, with
EMP_ZIP as its primary key.

EMPLOYEE table:

EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_ZIP table:

EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
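
The 3NF split can be sketched with two dictionaries, one per table. Storing ZIP codes as strings (to keep leading zeros) and the helper `lookup` are assumptions of this example.

```python
# EMPLOYEE_DETAIL rows: (EMP_ID, EMP_NAME, EMP_ZIP, EMP_STATE, EMP_CITY).
employee_detail = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]

# EMPLOYEE: EMP_ID -> (EMP_NAME, EMP_ZIP)
employee = {
    emp_id: (name, zip_code)
    for emp_id, name, zip_code, _, _ in employee_detail
}

# EMPLOYEE_ZIP: EMP_ZIP -> (EMP_STATE, EMP_CITY)
employee_zip = {
    zip_code: (state, city)
    for _, _, zip_code, state, city in employee_detail
}


def lookup(emp_id):
    """Rebuild one full row by following EMP_ZIP as a foreign key."""
    name, zip_code = employee[emp_id]
    state, city = employee_zip[zip_code]
    return (emp_id, name, zip_code, state, city)


rebuilt = [lookup(emp_id) for emp_id in employee]
print(rebuilt == employee_detail)  # True
```

State and city are now recorded once per ZIP code, so a correction to a ZIP code's city is made in exactly one place.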

Third Normal Form : Example (Cont..)

Example (Not in 3NF)

Scheme → {Title, PubID, PageCount, Price}
1. Key → {Title, PubID}
2. {Title, PubID} → {PageCount}
3. {PageCount} → {Price}
4. Both Price and PageCount depend on a key, hence 2NF
5. Transitively {Title, PubID} → {Price}, hence not in 3NF

3NF - Decomposition
1. Move all items involved in transitive dependencies to a new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign key on the
original entity.

Example (Convert to 3NF)

Old Scheme → {Title, PubID, PageCount, Price}
New Scheme → {PageCount, Price}
New Scheme → {Title, PubID, PageCount}

Boyce Codd normal form (BCNF)

•BCNF is an advanced version of 3NF. It is stricter than 3NF.
•A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
•For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.

Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549

Functional dependencies are:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key, yet each
is the left-hand side of a functional dependency.

To convert the given table into BCNF, decompose it into three tables:

EMP_COUNTRY table:

EMP_ID  EMP_COUNTRY
264     India
364     UK

EMP_DEPT table:

EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549

EMP_DEPT_MAPPING table:

EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
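
The BCNF condition (the left-hand side of every functional dependency is a super key) can be tested with an attribute-closure helper. This is a simplified sketch: it examines only the two listed dependencies rather than every implied one, and the names `closure` and `bcnf_violations` are hypothetical.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Left-hand sides of listed FDs that apply to schema but are not super keys."""
    violations = []
    for lhs, rhs in fds:
        # The FD matters for this table only if its LHS and some RHS attribute are in it.
        applies = lhs <= schema and bool((rhs & schema) - lhs)
        if applies and not schema <= closure(lhs, fds):
            violations.append(sorted(lhs))
    return violations


fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
original = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
decomposed = [
    {"EMP_ID", "EMP_COUNTRY"},
    {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"},
    {"EMP_ID", "EMP_DEPT"},
]

before = bcnf_violations(original, fds)
after = [bcnf_violations(table, fds) for table in decomposed]
print(before)  # [['EMP_ID'], ['EMP_DEPT']]
print(after)   # [[], [], []]
```

Both dependencies violate BCNF in the original EMPLOYEE table, and none of the three decomposed tables has a violation.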

Boyce-Codd Normal Form (BCNF)
Example - Address (Not in BCNF)
Scheme → {City, Street, ZipCode}
1. Key1 → {City, Street}
2. Key2 → {ZipCode, Street}
3. No non-key attribute, hence 3NF
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. Dependency between attributes belonging to a key: {ZipCode} is not a super key, hence not in BCNF

BCNF - Decomposition
1. Place the two candidate primary keys in separate
entities.
2. Place each of the remaining data items in one of the
resulting entities according to its dependency on the
primary key.

Example 1 (Convert to BCNF)
Old Scheme → {City, Street, ZipCode}
New Scheme1 → {ZipCode, Street}
New Scheme2 → {ZipCode, City}
(The two new schemes share ZipCode, and {ZipCode} → {City}, so joining them is lossless.)

Fourth normal form (4NF)
•A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
•A multi-valued dependency A →→ B exists when a single value of A is associated with
multiple values of B, independently of the other attributes of the relation.

STUDENT table:

STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities.
Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math,
and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID,
which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE table:

STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics

STUDENT_HOBBY table:

STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
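
The 4NF decomposition amounts to two projections of the STUDENT table. The sketch below computes them and, as an additional observation not stated in the source, joins them back to show what independence of COURSE and HOBBY means in terms of rows.

```python
# STUDENT rows: (STU_ID, COURSE, HOBBY).
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Two independent facts about a student, one table each.
student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby = sorted({(sid, hobby) for sid, _, hobby in student})

# Joining on STU_ID pairs every course of a student with every hobby.
joined = {
    (sid, course, hobby)
    for sid, course in student_course
    for hobby_sid, hobby in student_hobby
    if sid == hobby_sid
}

print(len(student_course), len(student_hobby))  # 5 5
print(len(joined))  # 7
```

For student 21 the join contains all four course and hobby pairings, while the single STUDENT table lists only two of them. Keeping the two facts in separate tables avoids having to store, or accidentally omit, those pairings.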

4NF - Decomposition
1. Move the two multi-valued relations to separate tables.
2. Identify a primary key for each of the new entities.

Example (Convert to 4NF)

Old Scheme → {MovieName, ScreeningCity, Genre}
New Scheme → {MovieName, ScreeningCity}
New Scheme → {MovieName, Genre}

Movie             Genre
Hard Code         Comedy
Bill Durham       Drama
The Code Warrier  Horror

Movie             ScreeningCity
Hard Code         Los Angeles
Hard Code         New York
Bill Durham       Santa Cruz
Bill Durham       Durham
The Code Warrier  New York


Fifth normal form (5NF)

•A relation is in 5NF if it is in 4NF and cannot be decomposed into smaller relations without
losing information, i.e., it has no join dependency that is not implied by its candidate keys.
Any decomposition that is performed must be lossless.
•5NF is satisfied when the tables have been broken into as many tables as possible in order to
avoid redundancy.
•5NF is also known as Project-join normal form (PJ/NF).

SUBJECT    LECTURER  SEMESTER
Computer   Anshika   Semester 1
Computer   John      Semester 1
Math       John      Semester 1
Math       Akash     Semester 2
Chemistry  Praveen   Semester 1

In the above table, John teaches both Computer and Math in Semester 1, but he does not
teach Math in Semester 2. In this case, the combination of all three fields is required to
identify a valid row.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will
teach it, so we would leave LECTURER and SUBJECT as NULL. But all three columns
together act as the primary key, so we cannot leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math

P2

SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen

P3

SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
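
As a rough check of the lossless-join requirement, the sketch below builds P1, P2 and P3 as projections of the original rows and joins them back. The variable names are chosen for this example only.

```python
# Original rows: (SUBJECT, LECTURER, SEMESTER).
rows = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections; sets remove duplicate rows automatically.
p1 = {(sem, subj) for subj, _, sem in rows}   # SEMESTER, SUBJECT
p2 = {(subj, lect) for subj, lect, _ in rows}  # SUBJECT, LECTURER
p3 = {(sem, lect) for _, lect, sem in rows}    # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT produces spurious rows.
pairwise = {
    (subj, lect, sem)
    for sem, subj in p1
    for p2_subj, lect in p2
    if subj == p2_subj
}

# Adding P3 as a third filter removes them again.
rejoined = {row for row in pairwise if (row[2], row[1]) in p3}

print(len(pairwise))     # 7
print(rejoined == rows)  # True
```

Joining just two of the projections yields rows such as John teaching Math in Semester 2, which the original table does not contain. Only the join of all three projections reproduces the original five rows.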

In simple words, Supplier (“Ali”) produces (“ABC”) and Customer (“Nauman”) can use
it. But Ali and Nauman are not directly connected.

In table (SC), Supplier and Customer are directly connected.

Normal Form  Description

1NF   A relation is in 1NF if every attribute contains only atomic values.

2NF   A relation is in 2NF if it is in 1NF and all non-key attributes are fully
      functionally dependent on the primary key.

3NF   A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF  A stronger definition of 3NF is known as Boyce-Codd normal form.

4NF   A relation is in 4NF if it is in Boyce-Codd normal form and has no
      multi-valued dependency.

5NF   A relation is in 5NF if it is in 4NF and does not contain any join
      dependency; joining should be lossless.
