Lecture17 - Database Normalization
Database Systems
What is meant by Normalization?
▪ The main goal of normalization is to minimize data redundancy and dependency by organizing the
fields and tables of a database so that a change to any piece of data needs to be made in only
one place.
▪ This helps to prevent anomalies, such as insertion, update, and deletion anomalies, which can
occur when data is not properly organized.
▪ Normalization typically involves dividing large tables into smaller, more manageable tables
and defining relationships between them.
▪ This process is carried out through a series of normal forms, each addressing different aspects
of data redundancy and dependency.
Purpose of Normalization?
▪ Minimize data redundancy: Normalization helps reduce the duplication of data within a database,
ensuring that each piece of information is stored in only one place.
▪ Improve data integrity: By organizing data into separate tables and eliminating anomalies,
normalization helps maintain consistency and accuracy of data.
▪ Facilitate efficient database management: Normalized databases make it easier to insert, update,
and delete data without risking inconsistencies or errors.
▪ Enhance query performance: Well-structured, normalized tables are smaller and cheaper to update,
which can help some queries, although queries that span many tables may need additional joins.
What is Data Redundancy?
▪ Data redundancy occurs when the same piece of data is stored in multiple places.
▪ It can lead to:
• Wasted storage space, inconsistency, update anomalies, insertion and deletion anomalies,
and complexity.
Data Redundancy by Example-1
▪ Let's consider a hypothetical database table for storing information about employees, including
their ID, name, department, and manager. Here's an example of a table with data redundancy:
Note:
• Similarly, both Sarah Johnson and Bob White are in the Marketing department, and "Bob White" is
repeated for both the employee and the manager.
• This redundancy can lead to various issues such as wasted storage space, inconsistencies if the
manager's name changes, and complexities in managing the data.
Data Redundancy by Example-2
▪ Let's consider another example, this time focusing on customer orders in an e-commerce
database. Here's a table with some data redundancy:
In this table:
• John Smith has made two orders, but his name is repeated for each order.
• The product "Laptop" with ID 101 appears twice, and its name is repeated for each order.
• The information about the customer (CustomerID and CustomerName) is repeated for each order placed by
the same customer.
Resolving Data Redundancy Issue in Example-2
▪ To resolve the data redundancy issue in the previous
example, we can normalize the data by breaking it down
into separate tables. Here's how we can do it:
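One possible decomposition, sketched in Python with the standard-library sqlite3 module. Only John Smith's two orders and the product "Laptop" with ID 101 come from the example; the table names, column names, the second customer, the second product, and all other IDs are assumptions made for illustration.

```python
import sqlite3

# Flat order rows: (OrderID, CustomerID, CustomerName, ProductID, ProductName).
# Only "John Smith" (two orders) and "Laptop" (ID 101) come from the slide;
# every other value is hypothetical.
flat_orders = [
    (1, 1, "John Smith", 101, "Laptop"),
    (2, 1, "John Smith", 102, "Mouse"),
    (3, 2, "Jane Doe", 101, "Laptop"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL);
    CREATE TABLE Products  (ProductID  INTEGER PRIMARY KEY, ProductName  TEXT NOT NULL);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        ProductID  INTEGER NOT NULL REFERENCES Products(ProductID)
    );
""")

for order_id, cust_id, cust_name, prod_id, prod_name in flat_orders:
    # INSERT OR IGNORE stores each customer and product once, however often it repeats.
    conn.execute("INSERT OR IGNORE INTO Customers VALUES (?, ?)", (cust_id, cust_name))
    conn.execute("INSERT OR IGNORE INTO Products VALUES (?, ?)", (prod_id, prod_name))
    conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", (order_id, cust_id, prod_id))

customer_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
product_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]

# A join rebuilds the original flat rows, so no information was lost.
rebuilt = conn.execute("""
    SELECT o.OrderID, c.CustomerID, c.CustomerName, p.ProductID, p.ProductName
    FROM Orders o
    JOIN Customers c ON c.CustomerID = o.CustomerID
    JOIN Products  p ON p.ProductID  = o.ProductID
    ORDER BY o.OrderID
""").fetchall()
print(customer_count, product_count, rebuilt == flat_orders)
```

Each customer name and product name is now stored once, and the Orders table keeps only the IDs that link them.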
What is Data Dependency?
▪ A data dependency describes how changes in one data element may affect other data
elements.
Types of Dependencies
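▪ Functional Dependency:
• A functional dependency occurs when the value of one attribute uniquely determines the value of another attribute.
• For example, in a table of employees, the employee's social security number (SSN) may uniquely determine their
name.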
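The SSN example can be checked mechanically. The sketch below is a minimal Python helper (the function name `holds` and the sample rows are hypothetical) that tests whether one set of attributes determines another in a given set of rows.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample data; the SSNs and names are made up.
employees = [
    {"SSN": "111-11-1111", "Name": "John Smith"},
    {"SSN": "222-22-2222", "Name": "Alice Brown"},
    {"SSN": "333-33-3333", "Name": "John Smith"},
]

ssn_determines_name = holds(employees, ["SSN"], ["Name"])   # each SSN has one name
name_determines_ssn = holds(employees, ["Name"], ["SSN"])   # "John Smith" has two SSNs
print(ssn_determines_name, name_determines_ssn)
```

Note that sample rows can only disprove a dependency. Whether SSN really determines Name is a rule about the data, not something a sample can prove.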
Contd…
▪ Transitive Dependency:
• A transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.
• For example, in a table where the primary key is employee ID, the employee's department may depend on the
employee's manager, which in turn depends on the employee's ID.
Here's the transitive dependency: EmployeeID → Manager → Department
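A transitive dependency can be found by following functional dependencies step by step. The sketch below computes an attribute closure in Python; the attribute names EmployeeID, Manager, and Department follow the example, while the function name `closure` and the way the dependencies are encoded are assumptions.

```python
def closure(attributes, fds):
    """Return every attribute functionally determined by `attributes`.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Dependencies from the example: EmployeeID -> Manager, Manager -> Department.
fds = [
    ({"EmployeeID"}, {"Manager"}),
    ({"Manager"}, {"Department"}),
]
employee_closure = closure({"EmployeeID"}, fds)
manager_closure = closure({"Manager"}, fds)
print(sorted(employee_closure))
print(sorted(manager_closure))
```

Department appears in the closure of EmployeeID only because of the step through Manager, which is exactly what makes the dependency transitive.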
Contd…
▪ Multivalued Dependency:
• Multivalued dependency occurs when the presence of one or more rows in a table implies the presence of other rows.
• This is often encountered in tables where one attribute may have multiple values associated with it.
• For example, in a table of courses and textbooks, if a course requires multiple textbooks, the presence of a course implies the
presence of multiple textbook entries.
EmployeeID  EmployeeName   Skill
001         John Smith     Programming
001         John Smith     Database
002         Alice Brown    Design
003         Sarah Johnson  Database
003         Sarah Johnson  Marketing
004         Bob White      Marketing
• This means that for each EmployeeID, there can be multiple values (skills) associated with it.
For example:
• EmployeeID 001 (John Smith) has both Programming and Database skills.
• EmployeeID 003 (Sarah Johnson) has Database and Marketing skills.
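A minimal Python sketch of separating the multi-valued Skill attribute from the single-valued employee facts. The rows are taken from the table and bullets of this slide; the relation names Employee and EmployeeSkill are assumptions.

```python
from collections import defaultdict

# Rows from this slide: (EmployeeID, EmployeeName, Skill).
rows = [
    ("001", "John Smith", "Programming"),
    ("001", "John Smith", "Database"),
    ("002", "Alice Brown", "Design"),
    ("003", "Sarah Johnson", "Database"),
    ("003", "Sarah Johnson", "Marketing"),
    ("004", "Bob White", "Marketing"),
]

# Employee(EmployeeID, EmployeeName): one row per employee.
employee = {emp_id: name for emp_id, name, _ in rows}

# EmployeeSkill(EmployeeID, Skill): one row per (employee, skill) pair.
employee_skill = sorted({(emp_id, skill) for emp_id, _, skill in rows})

skills_by_employee = defaultdict(list)
for emp_id, skill in employee_skill:
    skills_by_employee[emp_id].append(skill)

for emp_id, skills in skills_by_employee.items():
    print(emp_id, employee[emp_id], skills)
```

After the split, each employee name is stored once, and adding a skill means adding a single (EmployeeID, Skill) row.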
What are Data Modification Anomalies?
➢ Insertion Anomaly:
• An insertion anomaly occurs when a tuple cannot be inserted into a relation because some required data is missing.
➢ Deletion Anomaly:
• A deletion anomaly occurs when deleting data results in the unintended loss of other important data.
➢ Update Anomaly:
• An update anomaly occurs when changing a single data value requires multiple rows to be updated.
For Example:
[Figure: Insert Anomaly, Update Anomaly, Delete Anomaly]
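The three anomalies can be illustrated with a minimal Python sketch. The flat table below, including every name and column in it, is hypothetical and is not the table from the slide.

```python
# Hypothetical flat table: one row per employee, with the department's
# manager repeated on every row. All names are made up for illustration.
staff = [
    {"emp_id": 1, "emp_name": "Ann",  "dept": "Sales", "dept_manager": "Omar"},
    {"emp_id": 2, "emp_name": "Ben",  "dept": "Sales", "dept_manager": "Omar"},
    {"emp_id": 3, "emp_name": "Cara", "dept": "IT",    "dept_manager": "Lina"},
]

# Update anomaly: the Sales manager changes, but only one row is updated.
staff[0]["dept_manager"] = "Zoe"
sales_managers = {r["dept_manager"] for r in staff if r["dept"] == "Sales"}
update_inconsistent = len(sales_managers) > 1   # two different managers for Sales

# Deletion anomaly: removing the only IT employee also removes the fact
# that Lina manages IT.
staff = [r for r in staff if r["emp_id"] != 3]
it_manager_known = any(r["dept"] == "IT" for r in staff)

# Insertion anomaly: a new department with no employees yet cannot be
# recorded without leaving the employee columns empty.
new_dept_row = {"emp_id": None, "emp_name": None, "dept": "HR", "dept_manager": "Raj"}
insert_needs_nulls = new_dept_row["emp_id"] is None

print(update_inconsistent, it_manager_known, insert_needs_nulls)
```

Storing departments and their managers in a separate table would avoid all three problems, since each manager would then be recorded in exactly one row.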
The Process of Normalization
▪ Normalization is the process of organizing a relational database in order to reduce data
redundancy and improve data integrity.
▪ It is carried out as a series of steps, each testing a relation against a requirement. When a
requirement is not met, the relation violating it must be decomposed into relations that
individually meet the requirements of normalization.
▪ Each step corresponds to a specific normal form that has known properties.
▪ The normal forms are 0NF (unnormalized), 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.
Unnormalized Database (0NF or UNF)
➢ An unnormalized table has multiple values within a single field, as well as redundant information in the worst case.
➢ It is a table that contains one or more repeating groups.
Example-1 (to be converted to 1NF)
managerID  managerName  area   employeeID  employeeName  sectorID  sectorName
1          Adam A.      East   1           David D.      4         Finance
                               2           Eugene E.     3         IT
2          Betty B.     West   3           George G.     2         Security
                               4           Henry H.      1         Administration
                               5           Ingrid I.     4         Finance
3          Carl C.      North  6           James J.      1         Administration
                               7           Katy K.       4         Finance
Example-2
F123  Hamza  Johar Town  Database Systems, Operating Systems, Numerical Analysis
Step 1: First Normal Form 1NF
➢ To bring the table into 1NF, the value in every field must be atomic. Each repeating group in the
table is split into new rows or columns.
Example-1
Sr.No.  managerID  managerName  area   employeeID  employeeName  sectorID  sectorName
1       1          Adam A.      East   1           David D.      4         Finance
2       1          Adam A.      East   2           Eugene E.     3         IT
3       2          Betty B.     West   3           George G.     2         Security
4       2          Betty B.     West   4           Henry H.      1         Administration
5       2          Betty B.     West   5           Ingrid I.     4         Finance
6       3          Carl C.      North  6           James J.      1         Administration
7       3          Carl C.      North  7           Katy K.       4         Finance
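The conversion of Example-1 can be sketched in Python. The data comes from the slides; the nested-list representation of the repeating group, the "employees" key, and the function name `to_1nf` are assumptions made for illustration.

```python
# Unnormalized Example-1: each manager row carries a repeating group of employees.
unf = [
    {"managerID": 1, "managerName": "Adam A.", "area": "East", "employees": [
        (1, "David D.", 4, "Finance"),
        (2, "Eugene E.", 3, "IT"),
    ]},
    {"managerID": 2, "managerName": "Betty B.", "area": "West", "employees": [
        (3, "George G.", 2, "Security"),
        (4, "Henry H.", 1, "Administration"),
        (5, "Ingrid I.", 4, "Finance"),
    ]},
    {"managerID": 3, "managerName": "Carl C.", "area": "North", "employees": [
        (6, "James J.", 1, "Administration"),
        (7, "Katy K.", 4, "Finance"),
    ]},
]

def to_1nf(table):
    """Emit one flat row per (manager, employee) pair so every field is atomic."""
    rows = []
    for manager in table:
        for emp_id, emp_name, sector_id, sector_name in manager["employees"]:
            rows.append((manager["managerID"], manager["managerName"], manager["area"],
                         emp_id, emp_name, sector_id, sector_name))
    return rows

rows_1nf = to_1nf(unf)
for row in rows_1nf:
    print(row)
```

The manager's details are now repeated on every employee row, which is the redundancy the later normal forms remove.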
Transitive Dependency
Step 3: Third Normal Form 3NF
The database is now in third normal form, with three relations in total.
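The slide's three relations are not reproduced here, so the sketch below assumes one plausible split of the Example-1 data: a manager relation, a sector relation, and an employee relation that references both. It also assumes that each employee has exactly one manager and one sector, as in the sample rows.

```python
# 1NF rows from Example-1:
# (managerID, managerName, area, employeeID, employeeName, sectorID, sectorName)
rows_1nf = [
    (1, "Adam A.", "East", 1, "David D.", 4, "Finance"),
    (1, "Adam A.", "East", 2, "Eugene E.", 3, "IT"),
    (2, "Betty B.", "West", 3, "George G.", 2, "Security"),
    (2, "Betty B.", "West", 4, "Henry H.", 1, "Administration"),
    (2, "Betty B.", "West", 5, "Ingrid I.", 4, "Finance"),
    (3, "Carl C.", "North", 6, "James J.", 1, "Administration"),
    (3, "Carl C.", "North", 7, "Katy K.", 4, "Finance"),
]

# Assumed decomposition (the slide's own three relations are not shown):
manager = {}   # managerID  -> (managerName, area)
sector = {}    # sectorID   -> sectorName
employee = {}  # employeeID -> (employeeName, managerID, sectorID)

for m_id, m_name, area, e_id, e_name, s_id, s_name in rows_1nf:
    manager[m_id] = (m_name, area)
    sector[s_id] = s_name
    employee[e_id] = (e_name, m_id, s_id)

# Joining the three relations back together reproduces the 1NF rows.
rebuilt = [
    (m_id, *manager[m_id], e_id, e_name, s_id, sector[s_id])
    for e_id, (e_name, m_id, s_id) in sorted(employee.items())
]
print(len(manager), len(sector), len(employee), rebuilt == rows_1nf)
```

Manager names, areas, and sector names are each stored once, and the join shows that the decomposition loses no information.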
Summary
▪ Unnormalized Form (UNF) is a table that contains one or more repeating groups.
▪ First Normal Form (1NF) is a relation in which the intersection of each row and column contains
one and only one value.
▪ Second Normal Form (2NF) is a relation that is in first normal form and every non-primary-key
attribute is fully functionally dependent on the primary key.
▪ Third Normal Form (3NF) is a relation that is in first and second normal form in which no
non-primary-key attribute is transitively dependent on the primary key.
• Transitive dependency is a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Thank you
Any Queries?