0% found this document useful (0 votes)
0 views

Lecture Note 5 (Normalization)

Normalization is a process used in database design to minimize data redundancy and eliminate anomalies such as insertion, update, and deletion anomalies. It involves decomposing larger tables into smaller, well-structured tables that adhere to specific normal forms, including 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF. Each normal form has its own set of rules to ensure data integrity and efficient storage.

Uploaded by

tasriftanim2002
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
0 views

Lecture Note 5 (Normalization)

Normalization is a process used in database design to minimize data redundancy and eliminate anomalies such as insertion, update, and deletion anomalies. It involves decomposing larger tables into smaller, well-structured tables that adhere to specific normal forms, including 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF. Each normal form has its own set of rules to ensure data integrity and efficient storage.

Uploaded by

tasriftanim2002
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

Normalization

A large database defined as a single relation may result in data duplication.


This repetition of data may result in:

• Making relations very large.


• It isn't easy to maintain and update data as it would involve searching
many records in relation.
• Wastage and poor utilization of disk space and resources.
• The likelihood of errors and inconsistencies increases.

So, to handle these problems, we should analyze and decompose the relations
with redundant data into smaller, simpler, and well-structured relations that
are satisfy desirable properties. Normalization is a process of decomposing the
relations into relations with fewer attributes.

What is Normalization?

• Normalization is the process of organizing the data in the database.


• Normalization is used to minimize the redundancy from a relation or set
of relations.
• It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
• Normalization divides the larger table into smaller and links them using
relationships.

Why do we Need Normalization?

• The main reason for normalizing the relations is removing certain


anomalies. These are: Data Redundancy, Insertion Anomaly, Deletion
Anomaly, Updation Anomaly.
• Failure to eliminate anomalies leads to data redundancy and can cause
data integrity and other problems as the database grows.
• Normalization consists of a series of guidelines that helps to guide you
in creating a good database structure.

1|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Types of Anomalies in Database
Following are the types of anomalies that make the table inconsistency, loss of
integrity, and redundant data.

1. Data Redundancy: Data redundancy occurs in a relational database when


two or more rows or columns have the same value or repetitive value leading
to unnecessary utilization of the memory. For example: there are two students in
the above table, 'James' and 'Ritchie Rich', whose records are repetitive when
we enter a new CourseID. Hence it repeats the studRegistration, StudName and
address attributes.
Student Table
StudRegistration CourseID StudName Address Course

205 6204 James Los Angeles Economics

205 6247 James Los Angeles Economics

230 6204 Ritchie Rich Egypt Computer

230 6208 Ritchie Rich Egypt Accounts

2. Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a


new tuple into a relationship due to lack of data. For example, In the Student
table, if we want to insert a new courseID, we need to wait until the student
enrolled in a course. In this way, it is difficult to insert new record in the table.
Hence, it is called insertion anomalies.

3. Updatation Anomalies: The update anomaly is when an update of a single


data value requires multiple rows of data to be updated. For example,
suppose there is a student 'James' who belongs to Student table. If we want to
update the course in the Student, we need to update the same in the Course
table; otherwise, the data can be inconsistent. And it reflects the changes in a
table with updated values where some of them will not.

4. Deletion Anomalies: The delete anomaly refers to the situation where the
deletion of data results in the unintended loss of some other important data.
For example, if we want to remove Trent Bolt from the Student table, it also
removes his address, course and other details from the Student table.
Therefore, we can say that deleting some attributes can remove other attributes
of the database table.

2|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Types of Normal Forms
Normalization works through a series of stages called Normal forms. The
normal forms apply to individual relations. The relation is said to be in
particular normal form if it satisfies constraints. Following are the various types
of Normal forms. These are:

• First Normal Form (1NF)


• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce and Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)

First Normal Form (1NF)


• A relation will be 1NF if it contains an atomic value.
• It states that an attribute of a table cannot hold multiple values. It must
hold only single-valued attribute.
• First normal form disallows the multi-valued attribute, composite
attribute, and their combinations.

Example of 1NF

Relation EMPLOYEE is not in 1NF because of multi-valued attribute


EMP_PHONE.

EMPLOYEE Table
EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, UP
9064738238

20 Harry 8574783832 Bihar

12 Sam 7390372389, Punjab


8589830302

The decomposition of the EMPLOYEE table into 1NF has been shown below.

3|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

Second Normal Form (2NF)


• To be in second normal form, a relation must be in 1NF and relation
must not contain any partial dependency.
• A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime
attribute (attributes which are not part of any candidate key) is
dependent on any proper subset of any candidate key of the table.

Non-prime Attribute: Attributes which are not part of any candidate key.

Partial Dependency: If the proper subset of candidate key determines non-


prime attribute, it is called partial dependency.

Example of 2NF
Consider the following table.

STUD_NO COURSE_NO COURSE_FEE


1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000

• Note that, there are many courses having the same course fee.
• Here, COURSE_FEE cannot alone decide the value of COURSE_NO
or STUD_NO. That means, COURSE_FEE alone cannot be a Key.

4|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
• COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO. That means, {COURSE_FEE, STUD_NO} cannot be
a candidate key.
• COURSE_FEE together with COURSE_NO cannot decide the value
of STUD_NO. That means, {COURSE_FEE, COURSE_NO} cannot
be a candidate key.
• Hence, COURSE_FEE would be a non-prime attribute, as it does not
belong to the one only candidate key {STUD_NO, COURSE_NO}.
• But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is
dependent on COURSE_NO, where COURSE_NO is a proper subset
of the candidate key.
• That means, here, non-prime attribute COURSE_FEE is dependent on
a proper subset of the candidate key, which is a partial dependency
and so this relation is not in 2NF.
• To convert the above relation to 2NF, we need to split the table into
two tables such as: Table 1: STUD_NO, COURSE_NO Table 2:
COURSE_NO, COURSE_FEE.

Table 1 Table 2
STUD_NO COURSE_NO COURSE_NO COURSE_FEE
1 C1 C1 1000
2 C2 C2 1500
1 C4 C3 1000
4 C3 C4 2000
4 C1 C5 2000
2 C5

NOTE: 2NF tries to reduce the redundant data getting stored in memory.
For instance, if there are 100 students taking C1 course, we don’t need to
store its Fee as 1000 for all the 100 records, instead, once we can store it in
the second table as the course fee for C1 is 1000.

5|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Third Normal Form (3NF)
• A relation will be in 3NF if it is in 2NF and not contain any transitive
partial dependency.
• If there is no transitive dependency for non-prime attributes, then the
relation must be in 3NF.
• A relation is in 3NF if it holds at least one of the following conditions
for every non-trivial function dependency X → Y.
i. X is a super key.
ii. Y is a prime attribute, i.e., each element of Y is part of some
candidate key.

Transitive Functional Dependency: If A->B and B->C are two functional


dependencies (FDs), then A->C is called transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of
transitive dependencies. If a transitive dependency exists, we remove the
transitively dependent attribute(s) from the relation by placing the attribute(s)
in a new relation along with a copy of the determinant.

Non-trivial Functional Dependency: A → B has a non-trivial functional


dependency if B is not a subset of A. When A intersection B is NULL, then A
→ B is called as complete non-trivial. Example:
ID → Name,
Name → DOB

Example of 3NF

Consider the following Employee_Details Table.

EMPLOYEE_DETAIL Table
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

6|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
In the above table:
• Super Key: {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID,
EMP_NAME, EMP_ZIP}, …… and so on
• Candidate Key: {EMP_ID}
• Non-prime Attributes: In the given table, all attributes except EMP_ID
are non-prime.

• Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and


EMP_ZIP dependent on EMP_ID. The non-prime attributes
(EMP_STATE, EMP_CITY) transitively dependent on super key
(EMP_ID). It violates the rule of third normal form.
• To solve this problem, we need to move the EMP_CITY and
EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP
as a Primary key.

EMPLOYEE Table
EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP Table
EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

7|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Boyce Codd normal form (BCNF)
• BCNF is the advance version of 3NF. It is stricter than 3NF.
• A table is in BCNF if every functional dependency X → Y, X is the super
key of the table.
• For BCNF, the table should be in 3NF, and for every FD, LHS is super
key.

Example of BCNF
Let's assume there is a company where employees work in more than one
department.

EMPLOYEE Table
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional Dependencies are as follows:


• EMP_ID → EMP_COUNTRY
• EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate Key: {EMP-ID, EMP-DEPT}


• The table is not in BCNF because neither EMP_DEPT nor EMP_ID
alone are keys. i.e., left side part of both the functional dependencies
are not super key.
• To convert the given table into BCNF, we decompose it into three
tables.

EMP_COUNTRY Table
EMP_ID EMP_COUNTRY

264 India

364 UK

8|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
EMP_DEPT Table
EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING Table
EMP_ID EMP_DEPT

264 283

264 300

364 232

364 549

Functional Dependencies of the above tables are:


• EMP_ID → EMP_COUNTRY
• EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Super Key / Candidate keys of the above tables are:


• For the first table: EMP_ID
• For the second table: EMP_DEPT
• For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional
dependencies is a super key.

9|Page
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Fourth Normal Form (4NF)
A relation R is in 4NF if and only if the following conditions are satisfied:

• It should be in the Boyce-Codd Normal Form (BCNF).


• The table should not have any Multi-valued Dependency.

A table with a multivalued dependency violates the normalization standard of


the Fourth Normal Form (4NF) because it creates unnecessary redundancies
and can contribute to inconsistent data. To bring this up to 4NF, it is necessary
to break this information into two tables.

Multi-valued Dependency

• For a dependency A → B, if for a single value of A, multiple values of


B exist, then the relation will be a multi-valued dependency.
• Put another way, two attributes (or columns) in a table are independent
of one another, but both depend on a third attribute.
• A multivalued dependency always requires at least three attributes
because it consists of at least two attributes that are dependent on a
third.

Example of 4NF
The given below STUDENT table is in 3NF, but the COURSE and HOBBY
are two independent entities. Hence, there is no relationship between COURSE
and HOBBY.

STUDENT
STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

10 | P a g e
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So, there
is a multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.

So, to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE
STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY
STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey

11 | P a g e
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Fifth Normal Form (5NF)
• A relation is in 5NF if it is in 4NF and not contains any join dependency
and joining should be lossless.
• 5NF is also known as Project-join normal form (PJ/NF).

Example of 5NF

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

• In the above table, John takes both Computer and Math class for
Semester 1 but he doesn't take Math class for Semester 2. In this case,
combination of all these fields required to identify a valid data. That means,
primary key for the table is combination of all the fields (columns).
• Suppose we add a new Semester as Semester 3 but do not know about
the subject and who will be taking that subject so we leave Lecturer and
Subject as NULL. But all three columns together act as a primary key,
so we can't leave other two columns blank.
• So, to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3.

P1
SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

12 | P a g e
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
P2
SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3
SEMSTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 1 John

Semester 2 Akash

Semester 1 Praveen

Relational Decomposition
• When a relation in the relational model is not in appropriate normal
form then the decomposition of a relation is required.
• In a database, it breaks the table into multiple tables.
• If the relation has no proper decomposition, then it may lead to
problems like loss of information.
• Decomposition is used to eliminate some of the problems of bad design
like anomalies, inconsistencies, and redundancy.

13 | P a g e
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Types of Decomposition

Lossless Decomposition

• If the information is not lost from the relation that is decomposed, then
the decomposition will be lossless.
• The lossless decomposition guarantees that the join of relations will
result in the same relation as it was decomposed.
• The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.

Example

EMPLOYEE_DEPARTMENT
EMP_ EMP_ EMP_ EMP_ DEPT_ DEPT_
ID NAME AGE CITY ID NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above EMPLOYEE_DEPARTMENT relation is decomposed into two


relations EMPLOYEE and DEPARTMENT.

14 | P a g e
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
EMPLOYEE
EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT
DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID",
then the resultant relation will be lossless and will look like:

Employee ⋈ Department
EMP_ EMP_ EMP_ EMP_ DEPT_ DEPT_
ID NAME AGE CITY ID NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

15 | P a g e
Ishtiaq Ahammad (Lecturer, Dept. of CSE)
Dependency Preserving

• It is an important constraint of the database.


• In the dependency preservation, at least one decomposed table must
satisfy every dependency.
• If a relation R is decomposed into relation R1 and R2, then the
dependencies of R either must be a part of R1 or R2 or must be
derivable from the combination of functional dependencies of R1 and
R2.
• For example, suppose there is a relation R (A, B, C, D) with functional
dependency set (A->BC). The relational R is decomposed into
R1(ABC) and R2(AD) which is dependency preserving because FD A-
>BC is a part of relation R1(ABC).

16 | P a g e
Ishtiaq Ahammad (Lecturer, Dept. of CSE)

You might also like