Lecture17 - Database Normalization

This document provides an overview of normalization in database systems, detailing its purpose, benefits, and the process involved in organizing data to minimize redundancy and improve integrity. It explains the concepts of data redundancy, functional dependency, and the various normal forms (1NF, 2NF, 3NF) that guide the normalization process. Additionally, it addresses common data modification anomalies and illustrates the normalization steps with examples.

Uploaded by

f2023266730

CC-2141

Database Systems

Department of Computer Science


School of Systems & Technology - SST
Chapter Objectives

In this chapter you will learn:


▪ The purpose of normalization.
▪ How normalization can be used when designing a relational database.
▪ The potential problems associated with redundant data in base relations.
▪ The concept of functional dependency, which describes the relationship between attributes.
▪ How to identify functional dependencies for a given relation.
▪ How to undertake the process of normalization.
▪ How normalization uses functional dependencies to group attributes into relations that are in a known normal form.
▪ How to identify the most commonly used normal forms: First Normal Form (1NF), Second Normal Form (2NF), and Third
Normal Form (3NF).
▪ The problems associated with relations that break the rules of 1NF, 2NF, or 3NF.
▪ How to represent attributes shown on a form as 3NF relations using normalization.

2
What is meant by Normalization?

▪ Normalization is a database design technique used to organize data in a relational database.

▪ Its main goal is to minimize data redundancy and undesirable dependencies by organizing the fields and
tables of a database so that each fact is stored once and a change to it needs to be made in only
one place.

▪ This helps to prevent anomalies, such as insertion, update, and deletion anomalies, which can
occur when data is not properly organized.

▪ Normalization typically involves dividing large tables into smaller, more manageable tables
and defining relationships between them.

▪ This process is carried out through a series of normal forms, each addressing different aspects
of data redundancy and dependency.

3
Purpose of Normalization?

▪ Minimize data redundancy: Normalization helps reduce the duplication of data within a database,
ensuring that each piece of information is stored in only one place.

▪ Improve data integrity: By organizing data into separate tables and eliminating anomalies,
normalization helps maintain consistency and accuracy of data.

▪ Facilitate efficient database management: In a normalized database, data can be inserted, updated, and
deleted without risking inconsistencies or errors.

▪ Simplify database design: Normalization provides a systematic approach to structuring databases,
making it easier to understand and maintain the database schema.

▪ Enhance query performance: Normalized tables are smaller and updates touch fewer rows, which can
help performance. Queries that combine data from several tables need joins, so the gain depends on the workload.

4
What is Data Redundancy?

▪ Data redundancy refers to the repetition of data within a database.

▪ In other words, it occurs when the same piece of data is stored in multiple places.

▪ Redundancy can lead to several issues in a database, including:

• Wasted storage space, inconsistency, update anomalies, Insertion and deletion anomalies,
and complexity.

▪ Overall, data redundancy is undesirable in a database because it can lead to inefficiency,
inconsistency, and data integrity problems.

▪ Normalization is one technique used to address data redundancy by organizing data in a
way that minimizes duplication and dependency.

5
Data Redundancy by Example-1
▪ Let's consider a hypothetical database table for storing information about employees, including
their ID, name, department, and manager. Here's an example of a table with data redundancy:

In this example:

• Both John Smith and Alice Brown are in the Sales department, but the
name "Alice Brown" is repeated for both the employee and the
manager.

• Similarly, both Sarah Johnson and Bob White are in the Marketing
department, and "Bob White" is repeated for both the employee and the
manager.

Note:
• This redundancy can lead to various issues
such as wasted storage space,
inconsistencies if the manager's name
changes, and complexities in managing the
data.

• Normalization would involve breaking down
this table into smaller, more efficient tables to
eliminate redundancy and improve data
integrity.

6
Data Redundancy by Example-2
▪ Let's consider another example, this time focusing on customer orders in an e-commerce
database. Here's a table with some data redundancy:

In this table:

• John Smith has made two orders, but his name is repeated for each order.

• The product "Laptop" with ID 101 appears twice, and its name is repeated for each order.

• The information about the customer (CustomerID and CustomerName) is repeated for each order placed by
the same customer.
7
Resolving Data Redundancy Issue in Example-2
▪ To resolve the data redundancy issue in the previous
example, we can normalize the data by breaking it down
into separate tables. Here's how we can do it:

• With this normalization:

• Customer information (CustomerID and
CustomerName) is stored only once in the
Customers table.

• Product information (ProductID and ProductName) is
stored only once in the Products table.

• The Orders table contains references (CustomerID
and ProductID) to the Customers and Products
tables, eliminating redundancy.

• This normalization reduces data redundancy and
improves data integrity by ensuring that each piece of
data is stored in only one place.
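
The three-table design described above can be sketched with Python's built-in sqlite3 module. The slide's tables are not reproduced in this text, so the OrderID column, the column types, and every sample row other than John Smith and the Laptop with ID 101 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL
);
CREATE TABLE Products (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL
);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,   -- assumed column, not named in the slides
    CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
    ProductID  INTEGER NOT NULL REFERENCES Products(ProductID)
);
""")

# Each customer and each product is stored exactly once.
conn.execute("INSERT INTO Customers VALUES (1, 'John Smith')")
conn.execute("INSERT INTO Products VALUES (101, 'Laptop')")
conn.execute("INSERT INTO Products VALUES (102, 'Mouse')")  # hypothetical product

# Orders hold only references (IDs), never names.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(1, 1, 101), (2, 1, 102)])

# A join rebuilds the original wide view whenever it is needed.
order_view = conn.execute("""
    SELECT o.OrderID, c.CustomerName, p.ProductName
    FROM Orders o
    JOIN Customers c ON c.CustomerID = o.CustomerID
    JOIN Products  p ON p.ProductID  = o.ProductID
    ORDER BY o.OrderID
""").fetchall()

# Renaming the customer changes one row, however many orders exist.
renamed_rows = conn.execute(
    "UPDATE Customers SET CustomerName = 'John A. Smith' WHERE CustomerID = 1"
).rowcount
```

Because the name lives only in Customers, the rename touches a single row and both orders pick it up through the join.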

8
What is Data Dependency?

▪ Data dependency in databases refers to the relationship between data
elements within a database.

▪ It describes how changes in one data element may affect other data
elements.

▪ Understanding data dependency is crucial for database design and management
because it helps ensure data integrity and consistency.

9
Types of Dependencies

There are 3 types of data dependencies: Functional, Transitive, Multivalued


▪ Functional Dependency:
• In a functional dependency, the value of one attribute uniquely determines the value of another attribute in the
same table.

• For example, in a table of employees, the employee's social security number (SSN) may uniquely determine their
name.

EmployeeID  EmployeeName   Department  ManagerID
001         John Smith     Sales       002
002         Alice Brown    Sales       002
003         Sarah Johnson  Marketing   004
004         Bob White      Marketing   004

• In this table:

• EmployeeID uniquely identifies each employee.

• EmployeeName depends on EmployeeID, as each ID corresponds to exactly one name
(functional dependency).
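
A functional dependency can be tested against sample data with a few lines of Python. The sketch below uses the employee rows shown in the slides; the helper name `holds` is an assumption introduced here. Note that sample rows can only disprove a dependency: a dependency is a rule about all valid data, so passing the check on four rows does not prove it.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"EmployeeID": "001", "EmployeeName": "John Smith", "Department": "Sales", "ManagerID": "002"},
    {"EmployeeID": "002", "EmployeeName": "Alice Brown", "Department": "Sales", "ManagerID": "002"},
    {"EmployeeID": "003", "EmployeeName": "Sarah Johnson", "Department": "Marketing", "ManagerID": "004"},
    {"EmployeeID": "004", "EmployeeName": "Bob White", "Department": "Marketing", "ManagerID": "004"},
]

id_determines_name = holds(employees, ["EmployeeID"], ["EmployeeName"])  # True
dept_determines_id = holds(employees, ["Department"], ["EmployeeID"])    # False
```

The second check fails because the Sales department appears with two different employee IDs, so Department does not determine EmployeeID.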

10
Contd…

▪ Transitive Dependency:
• A transitive dependency occurs when a non-key attribute depends on the primary key only indirectly, through another non-key attribute.
• For example, in a table where the primary key is employee ID, the employee's department may depend on the
employee's manager, which in turn depends on the employee's ID.
Here's the transitive dependency:

• EmployeeID -> Department -> DepartmentLocation

EmployeeID  EmployeeName   Department  DepartmentLocation
001         John Smith     Sales       New York
002         Alice Brown    Sales       New York
003         Sarah Johnson  Marketing   Chicago
004         Bob White      Marketing   Chicago

• This means that indirectly, given an EmployeeID,
we can determine the location of the department
they belong to.
For example:
• EmployeeID 001 belongs to the Sales department.
• The Sales department is located in New York.
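
One way to see why the chain EmployeeID -> Department -> DepartmentLocation matters is to split the table along it and confirm that nothing is lost. The following is a minimal sketch using the four rows from the slide; the variable names are illustrative assumptions.

```python
rows = [
    ("001", "John Smith", "Sales", "New York"),
    ("002", "Alice Brown", "Sales", "New York"),
    ("003", "Sarah Johnson", "Marketing", "Chicago"),
    ("004", "Bob White", "Marketing", "Chicago"),
]

# Projection 1: facts that depend directly on EmployeeID.
employee = sorted({(eid, name, dept) for eid, name, dept, _ in rows})

# Projection 2: facts that depend on Department; duplicates collapse in the set.
department = dict({(dept, loc) for _, _, dept, loc in rows})

# Joining the two projections on Department rebuilds the original rows.
rejoined = sorted((eid, name, dept, department[dept]) for eid, name, dept in employee)
```

After the split, each department location is stored once (two entries instead of four), and the join still answers "where does employee 001 work?".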

11
Contd…

▪ Multivalued Dependency:
• Multivalued dependency occurs when the presence of one or more rows in a table implies the presence of other rows.
• This is often encountered in tables where one attribute may have multiple values associated with it.
• For example, in a table of courses and textbooks, if a course requires multiple textbooks, the presence of a course implies the
presence of multiple textbook entries.

Here's the multivalued dependency:

EmployeeID  EmployeeName   Skill
001         John Smith     Programming
001         John Smith     Database
002         Alice Brown    Design
003         Sarah Johnson  Database
003         Sarah Johnson  Marketing
004         Bob White      Marketing

• EmployeeID ->> Skill

• This means that for each EmployeeID, there can be multiple
values (skills) associated with it.
For example:
• EmployeeID 001 (John Smith) has both Programming and
Database skills.
• EmployeeID 003 (Sarah Johnson) has Database and
Marketing skills.
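
The "one EmployeeID, many skills" relationship can be made visible by grouping the rows. This is only an illustrative sketch over the slide's data; the variable names are assumptions.

```python
from collections import defaultdict

rows = [
    ("001", "John Smith", "Programming"),
    ("001", "John Smith", "Database"),
    ("002", "Alice Brown", "Design"),
    ("003", "Sarah Johnson", "Database"),
    ("003", "Sarah Johnson", "Marketing"),
    ("004", "Bob White", "Marketing"),
]

# Single-valued fact: each EmployeeID has exactly one name.
names = {eid: name for eid, name, _ in rows}

# Multi-valued fact: each EmployeeID maps to a list of skills.
skills = defaultdict(list)
for eid, _, skill in rows:
    skills[eid].append(skill)
```

In the flat table the employee's name is repeated once per skill; grouping shows that the name is a single value per employee while the skills form a set.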

12
What are Data Modification Anomalies?

There are 3 types of Data Modification Anomalies:

➢ Insertion Anomaly:

• An insertion anomaly occurs when a tuple cannot be inserted into a relation because some of the required data is missing.

➢ Deletion Anomaly:

• A deletion anomaly occurs when deleting data results in the unintended loss of other important data.

➢ Update Anomaly:

• An update anomaly occurs when changing a single data value requires multiple rows to be
updated.

13
For example:
[Figures: Delete Anomaly, Insert Anomaly, Update Anomaly]
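
The three anomalies can be reproduced on a small unnormalized table with Python's built-in sqlite3 module. This sketch reuses the employee and department-location rows from the transitive dependency slide; the table name EmployeeDept, the HR department, and the Boston location are hypothetical values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE EmployeeDept (
    EmployeeID         TEXT PRIMARY KEY NOT NULL,
    EmployeeName       TEXT NOT NULL,
    Department         TEXT NOT NULL,
    DepartmentLocation TEXT NOT NULL
)
""")
conn.executemany(
    "INSERT INTO EmployeeDept VALUES (?, ?, ?, ?)",
    [
        ("001", "John Smith", "Sales", "New York"),
        ("002", "Alice Brown", "Sales", "New York"),
        ("003", "Sarah Johnson", "Marketing", "Chicago"),
        ("004", "Bob White", "Marketing", "Chicago"),
    ],
)

# Insertion anomaly: a new department cannot be recorded without an employee.
try:
    conn.execute("INSERT INTO EmployeeDept VALUES (NULL, NULL, 'HR', 'Boston')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Update anomaly: moving one department means changing every employee row in it.
rows_touched = conn.execute(
    "UPDATE EmployeeDept SET DepartmentLocation = 'Boston' WHERE Department = 'Sales'"
).rowcount

# Deletion anomaly: removing the last Marketing employees also erases
# the only record of where Marketing is located.
conn.execute("DELETE FROM EmployeeDept WHERE EmployeeID IN ('003', '004')")
marketing_rows = conn.execute(
    "SELECT COUNT(*) FROM EmployeeDept WHERE Department = 'Marketing'"
).fetchone()[0]
```

With a separate department table, the insert would succeed, the update would touch one row, and the delete would leave the Marketing location intact.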

14
The Process of Normalization

▪ Normalization is the process of organizing a relational database in order to reduce data redundancy and
improve data integrity.

▪ The technique involves a series of rules.

▪ When a requirement is not met, the relation violating the requirement must be decomposed
into relations that individually meet the requirements of normalization.

▪ Normalization is often executed as a series of steps.

▪ Each step corresponds to a specific normal form that has known properties.

▪ The normal forms are categorized as (0NF), (1NF), (2NF), (3NF), (BCNF), (4NF), (5NF).

▪ Only First Normal Form (1NF) is critical in creating relations.

▪ All subsequent normal forms are optional.

15
16
[Figure: The process of normalization]

17
[Figure: Steps]

18
Unnormalized Database (0NF or UNF)

➢ An unnormalized table has multiple values within a single field, as well as redundant information in the worst case.
➢ It is a table that contains one or more repeating groups.

Example-1 (to convert to 1NF)

managerID  managerName  area   employeeID  employeeName  sectorID  sectorName
1          Adam A.      East   1           David D.      4         Finance
                               2           Eugene E.     3         IT
2          Betty B.     West   3           George G.     2         Security
                               4           Henry H.      1         Administration
                               5           Ingrid I.     4         Finance
3          Carl C.      North  6           James J.      1         Administration
                               7           Katy K.       4         Finance

Example-2

StudentID  StudentName  Address     Course1            Course2            Course3
F123       Hamza        Johar Town  Database Systems   Operating systems  Numerical Analysis
S224       Ali          Samanabad   Assembly language  Software Eng.      Numerical Analysis
S334       Zainab       Islamabad   OO Programming     Islamiyat          English 2

19
Step 1: First Normal Form 1NF

➢ To rework the database table into the 1NF, values within a single field must be atomic. Each repeating group in the
table is split into separate rows or columns.

Example-1

Sr.No.  managerID  managerName  area   employeeID  employeeName  sectorID  sectorName
1       1          Adam A.      East   1           David D.      4         Finance
2       1          Adam A.      East   2           Eugene E.     3         IT
3       2          Betty B.     West   3           George G.     2         Security
4       2          Betty B.     West   4           Henry H.      1         Administration
5       2          Betty B.     West   5           Ingrid I.     4         Finance
6       3          Carl C.      North  6           James J.      1         Administration
7       3          Carl C.      North  7           Katy K.       4         Finance

Example-2

StudentID  StudentName  Address     Course
F123       Hamza        Johar Town  Database Systems
F123       Hamza        Johar Town  Operating systems
F123       Hamza        Johar Town  Numerical Analysis
S224       Ali          Samanabad   Assembly language
S224       Ali          Samanabad   Software Eng.
S224       Ali          Samanabad   Numerical Analysis
S334       Zainab       Islamabad   OO Programming
S334       Zainab       Islamabad   Islamiyat
S334       Zainab       Islamabad   English 2
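
The conversion of Example-2 from the unnormalized form to 1NF amounts to turning the Course1, Course2, and Course3 columns into one row per course. A minimal sketch, using the three student records from the unnormalized table (the function name `to_1nf` is an assumption):

```python
COURSE_COLUMNS = ("Course1", "Course2", "Course3")

unnormalized = [
    {"StudentID": "F123", "StudentName": "Hamza", "Address": "Johar Town",
     "Course1": "Database Systems", "Course2": "Operating systems", "Course3": "Numerical Analysis"},
    {"StudentID": "S224", "StudentName": "Ali", "Address": "Samanabad",
     "Course1": "Assembly language", "Course2": "Software Eng.", "Course3": "Numerical Analysis"},
    {"StudentID": "S334", "StudentName": "Zainab", "Address": "Islamabad",
     "Course1": "OO Programming", "Course2": "Islamiyat", "Course3": "English 2"},
]


def to_1nf(rows):
    """Emit one row per (student, course) so every field holds a single value."""
    result = []
    for row in rows:
        for column in COURSE_COLUMNS:
            course = row.get(column)
            if course:  # skip empty course slots
                result.append({
                    "StudentID": row["StudentID"],
                    "StudentName": row["StudentName"],
                    "Address": row["Address"],
                    "Course": course,
                })
    return result


first_normal_form = to_1nf(unnormalized)
```

Three students with three courses each give nine rows. The student's name and address now repeat on every row, which is the redundancy that later normal forms address.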
20
Step 2: Second Normal Form 2NF
▪ The second normal form in database normalization states that every non-key attribute must depend on the
whole primary key, not on just part of it. Here managerName and area depend only on managerID, so the table
splits into two tables to satisfy the normal form:

•Manager (managerID, managerName, area)

managerID  managerName  area
1          Adam A.      East
2          Betty B.     West
3          Carl C.      North

•Employee (employeeID, employeeName, managerID, sectorID, sectorName)

employeeID  employeeName  managerID  sectorID  sectorName
1           David D.      1          4         Finance
2           Eugene E.     1          3         IT
3           George G.     2          2         Security
4           Henry H.      2          1         Administration
5           Ingrid I.     2          4         Finance
6           James J.      3          1         Administration
7           Katy K.       3          4         Finance

Note: The resulting database in the second
normal form is currently two tables with no
partial dependencies.
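
The split into Manager and Employee can be sketched as two projections of the 1NF rows from Example-1. The tuple layout and variable names below are assumptions for illustration; the data is taken from the slides.

```python
# (managerID, managerName, area, employeeID, employeeName, sectorID, sectorName)
rows_1nf = [
    (1, "Adam A.", "East", 1, "David D.", 4, "Finance"),
    (1, "Adam A.", "East", 2, "Eugene E.", 3, "IT"),
    (2, "Betty B.", "West", 3, "George G.", 2, "Security"),
    (2, "Betty B.", "West", 4, "Henry H.", 1, "Administration"),
    (2, "Betty B.", "West", 5, "Ingrid I.", 4, "Finance"),
    (3, "Carl C.", "North", 6, "James J.", 1, "Administration"),
    (3, "Carl C.", "North", 7, "Katy K.", 4, "Finance"),
]

# Manager: attributes that depend only on managerID, with duplicates removed.
manager = sorted({(m_id, m_name, area) for m_id, m_name, area, *_ in rows_1nf})

# Employee: one row per employee, keeping managerID as a reference to Manager.
employee = [
    (e_id, e_name, m_id, s_id, s_name)
    for m_id, _, _, e_id, e_name, s_id, s_name in rows_1nf
]
```

The seven repeated manager entries collapse to three rows, while Employee keeps all seven employees and still carries sectorName next to sectorID.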

21
Transitive Dependency

Consider a table describing employees and their assigned departments:

EmployeeID  EmployeeName  Department  DepartmentHead
1           John          Sales       Jane
2           Alice         Marketing   Bob
3           Mary          Sales       Jane

In this table:
➢ EmployeeID is the primary key.
➢ EmployeeName, Department, and DepartmentHead are non-prime attributes (not part
of the primary key).
➢ There's a transitive dependency: EmployeeID determines Department, and Department determines DepartmentHead.
➢ DepartmentHead depends on Department, but not directly on EmployeeID (the primary
key).
➢ This means that if you know the Department, you can determine the DepartmentHead, so EmployeeID
determines DepartmentHead only indirectly, through Department.
22
Step 3: Third Normal Form 3NF

➢ The third normal form removes transitive functional dependencies by decomposition.

➢ Currently, the table Employee has a transitive dependency (employeeID -> sectorID -> sectorName), so it decomposes into two new tables:

•Employee (employeeID, employeeName, managerID, sectorID)

employeeID  employeeName  managerID  sectorID
1           David D.      1          4
2           Eugene E.     1          3
3           George G.     2          2
4           Henry H.      2          1
5           Ingrid I.     2          4
6           James J.      3          1
7           Katy K.       3          4

•Sector (sectorID, sectorName)

sectorID  sectorName
1         Administration
2         Security
3         IT
4         Finance

23
The database is currently in third normal form with three relations in total.
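
The three relations (Manager, Employee, Sector) can be written out as a schema and queried with Python's built-in sqlite3 module. The column types and foreign key clauses are assumptions; the slides give only the attribute names and the data. The sector rename to 'Accounting' is a hypothetical change used to show the effect of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manager (
    managerID   INTEGER PRIMARY KEY,
    managerName TEXT NOT NULL,
    area        TEXT NOT NULL
);
CREATE TABLE Sector (
    sectorID   INTEGER PRIMARY KEY,
    sectorName TEXT NOT NULL
);
CREATE TABLE Employee (
    employeeID   INTEGER PRIMARY KEY,
    employeeName TEXT NOT NULL,
    managerID    INTEGER NOT NULL REFERENCES Manager(managerID),
    sectorID     INTEGER NOT NULL REFERENCES Sector(sectorID)
);
""")
conn.executemany("INSERT INTO Manager VALUES (?, ?, ?)",
                 [(1, "Adam A.", "East"), (2, "Betty B.", "West"), (3, "Carl C.", "North")])
conn.executemany("INSERT INTO Sector VALUES (?, ?)",
                 [(1, "Administration"), (2, "Security"), (3, "IT"), (4, "Finance")])
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                 [(1, "David D.", 1, 4), (2, "Eugene E.", 1, 3), (3, "George G.", 2, 2),
                  (4, "Henry H.", 2, 1), (5, "Ingrid I.", 2, 4), (6, "James J.", 3, 1),
                  (7, "Katy K.", 3, 4)])

# Joins recover the full 1NF row for any employee.
ingrid = conn.execute("""
    SELECT e.employeeName, m.managerName, m.area, s.sectorName
    FROM Employee e
    JOIN Manager m ON m.managerID = e.managerID
    JOIN Sector  s ON s.sectorID  = e.sectorID
    WHERE e.employeeID = 5
""").fetchone()

# Renaming a sector is a one-row change, although three employees work in it.
sector_rows_changed = conn.execute(
    "UPDATE Sector SET sectorName = 'Accounting' WHERE sectorID = 4"
).rowcount
employees_in_sector = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE sectorID = 4"
).fetchone()[0]
```

Each manager, sector, and employee fact is stored once, and the original wide table remains available through the join.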
24
Summary

▪ Unnormalized Form (UNF) is a table that contains one or more repeating groups.

▪ First Normal Form (1NF) is a relation in which the intersection of each row and column contains
one and only one value.

▪ Second Normal Form (2NF) is a relation that is in first normal form and every non-primary-key
attribute is fully functionally dependent on the primary key.

▪ Third Normal Form (3NF) is a relation that is in first and second normal form in which no
non-primary-key attribute is transitively dependent on the primary key.

• Transitive dependency is a condition where A, B, and C are attributes of a relation such that if A -> B and B -> C, then C is
transitively dependent on A via B (provided that A is not functionally dependent on B or C).
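
The rule "if A -> B and B -> C, then A -> C" can be applied mechanically by computing an attribute closure, a standard technique that the slides do not name. The sketch below is an illustration under stated assumptions: the function name `closure` is introduced here, and the two dependencies are read off the 2NF Employee table from the worked example.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under `dependencies`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know all of lhs, we can derive all of rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependencies assumed from the 2NF Employee table.
dependencies = [
    (("employeeID",), ("employeeName", "managerID", "sectorID")),
    (("sectorID",), ("sectorName",)),
]

from_employee = closure({"employeeID"}, dependencies)
from_sector = closure({"sectorID"}, dependencies)
```

The closure of employeeID contains sectorName even though no listed dependency connects them directly: the link goes through sectorID, which is the transitive dependency that 3NF removes.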

25
Thank you
Any Queries?

26
