Lecture 9 - Database Normalization PDF
Lecture 9 - Database Normalization PDF
Lecture 9 - Database Normalization PDF
Database Systems
Unit 9 :
Database
Normalization
Topic 9
Objective :
At the end of this unit, you should be able to:
Understand why and when to normalize tables.
Understand why redundant data can cause
update anomalies.
Define functional dependency and apply it in
normalization.
Perform normalization to place tables in third
normal form (3NF).
IT1768 Database Systems
Topic 9
Introduction to Normalization
Normalization is a design process whereby a
primitive data table (relation) is decomposed into
more simpler tables that more precisely fit the
relational model.
Normalization aims to remove redundant data in
order to eliminate update anomalies.
Topic 9
Introduction to Normalization
Objective of Normalization
To avoid redundant data from tables (in order to
increase the integrity of the database design and to
maximize flexibility of data storage).
Topic 9
Introduction to Normalization
When to use Normalization technique ?
It is used :
when a set of tables is already available but
there has been no time to build an ER model.
to validate the design of existing tables when
user is unsure if the ER model is correct.
If the application of database is simple.
Used as a bottom-up approach to design the
database for the application.
Topic 9
Redundant Data
Figure 1- Table with redundant data
Topic 9
After deploying
Normalization
technique, it yields
multiple tables with
redundant data
removed.
Topic 9
Database Anomalies
- Insertion
- Update
- Deletion
Topic 9
Database Anomalies
An un-normalized table :
Topic 9
Database Anomalies
Some important information may be lost if a relation
is updated that contains database anomalies.
It is important to remove these anomalies in order
to perform different processing on the relations
without difficulties.
3 Types of database anomalies
o Insertion anomaly
o Deletion anomaly
o Update anomaly
10
Topic 9
11
Topic 9
null
null
20
Bay Shore
120
null
Bill MCan
Administrator
60.75
18.5
null
null
null
null
12
Topic 9
Insertion Anomalies
An insertion anomaly is a failure to place
information about a new database entry into all the
places in the database where information about
that new entry needs to be stored.
The insertion anomaly occurs when a new record is
inserted in the relation. In this anomaly, the user
cannot insert ALL the fact about an entity until he
has an additional fact about other entity.
In a properly normalized database, information
about a new entry needs to be inserted into only
one place in the database.
IT1768 Database Systems
13
Topic 9
Insertion Anomalies
Base on the relation below, insertion anomalies occur when
inserting a new staff member will force user to create branch data.
e.g. Add a new staff member SG99
Inserting a new branch (is not allowed) if there is no staff member for
the branch yet.
e.g. Add a new branch B008
SG99
Michael
Designer
30000 null
null
null
null
null
null
33 Clarence Road
B008
14
Topic 9
Fee
M100
Activity
Skiing
M150
Swimming
50
M175
Squash
100
M200
Swimming
50
M200
Squash
100
null
Golf
500
200
15
Topic 9
Mem_ID Activity
Activity
Fee
M100
Skiing
Skiing
5200
M150
Swimming
Swimming
50
M175
Squash
Squash
100
M200
Swimming
M200
Squash
Table : Fee
Table : Member
IT1768 Database Systems
16
Topic 9
Deletion Anomalies
A deletion anomaly is a failure to remove
information about an existing database entry
when it is time to remove that entry.
The deletion anomaly occurs when a record is
deleted from the relation. In this anomaly, the
deletion of fact about an entity automatically
deleted the fact of another entity.
In a properly normalized database, information
about an old, to-be-gotten-rid-of entry needs to
be deleted from only one place in the database.
IT1768 Database Systems
17
Topic 9
Deletion Anomalies
Deleting rows may cause a loss of data that would
be needed for other future rows.
e.g. Deletion of staff member SA9 will result in branch
details of B007 being lost.
18
Topic 9
19
Topic 9
Contact
E-55
64453638
E-56
12 Clementi Ave 1
66138899
Old record
E-56
64563782
Updated record
20
Topic 9
21
Topic 9
Normalization
Normalization vs. Normalized Data
All anomalies are highly undesirable, their
occurrence constitutes corruption of the
database. Normalized data do not contain data
redundancies and therefore, are less
susceptible to data corruption.
Normalization is the activity, the process that
leads to a normalized data structure to achieve
normalized data.
22
Topic 9
Normalization
Normal Forms
Normalization consists of a series of rules that must
be applied in steps to reach a higher level of
normalization. These levels are called normal forms.
23
Topic 9
Student
ProjLocation
EmpNo
PS-025
Sentosa
EE-03
073248K
Obama
165
PS-025
Sentosa
ED-15
074565L
Bush
172
PS-025
Sentosa
EP-21
076386E
Raymond
148
PC-344
Clementi
ES-26
075473P
Bryan
182
PC-344
Clementi
EE-32
075412M
Vanessa
163
AdminNum StudName
Height
24
Topic 9
Repeating Group
The repeating group for Supplier in the table below
failed to distinguish PartNo and PDesc in the
database.
Part_Inventory
PartNo
PDesc
SuppNo
SName
SAddr
Cost
P1
Printer
S1
S2
Bright
SuperGood
Jurong
200
Senkang 220
P2
Speaker
S1
S2
Bright
SuperGood
Jurong
300
Senkang 350
25
Topic 9
(option1)
PDesc
SuppNo
SName
SAddr
Cost
P1
Printer
S1
Bright
Jurong
200
P1
Printer
S2
SuperGood
Senkang
220
P2
Speaker
S1
Bright
Jurong
300
P2
Speaker
S2
SuperGood
Senkang
350
26
Topic 9
(option 2)
PDesc
SuppNo Cost
P1
Printer
S1
200
P1
Printer
S2
220
P2
Speaker
S1
300
P2
Speaker
S2
350
SName
SAddr
S1
Bright
Jurong
S2
SuperGood
Senkang
27
Topic 9
Functional Dependency
Functional dependency describes the relationship
between attributes (columns) in a single relation.
An attribute is functionally dependant on another if
we can use the value of one attribute to determine
the value of another.
For example:
STUDENT( StudNum, StudName, StudAddress )
o StudNum determines ( ) StudName and StudAddress.
o StudName, StudAddress are functionally dependent on
StudNum.
In normalization, 2NF accounts for functional dependency
in the relation.
28
Topic 9
Functional Dependency
Example for further illustration:
StudNum
StudName
StudAddress
023344C
Kenny Ang
28 Woodland Rd
023817F
Lawrence Teo
028893E
Jenny Huang
29
Topic 9
SuppNo
Cost
P1
S1
200
P1
S2
220
P2
S1
300
P2
S2
350
In schema,
Part_Supplier( PartNo, SuppNo , Cost )
30
Topic 9
b) Partial dependency
There is Non-key column depend on some part of
the primary key.
Example:
Part_Supplier
SAddr
Cost
P1
S1
Bright
Jurong
200
P1
S2
SuperGood
Senkang
220
P2
S1
Bright
Jurong
300
P2
S2
SuperGood
Senkang
350
In schema,
Supplier( SuppNo, Sname, SAddr)
31
Topic 9
Transitive dependency
There is Non-key column depend on another
Non-key column.
Example :
A Non-key attribute
Part
PartNo
PartDesc
Storeman
SContact
P1
Printer
Ken
98553344
P2
Speaker
Ken
98553344
P3
Ribbon
Alvin
97334400
P4
Cartridge
Alvin
97334400
In schema,
Storeman --> SContact Store-Person( Storeman, Scontact)
32
Topic 9
Normalization Rules
First Normal Form Rule, (1st Normal Form, 1NF)
Description
The table must express a set of unordered, twodimensional (2-D) tables.
The table cannot contain repeating groups (No two rows in
a table can be identical)
PartNo
Cost
P1
200
P1
P1
220
P1
Repeating groups
PartNo SuppNo
SName
SAddr
Cost
S1
Bright
Jurong
200
S2
33
Topic 9
Normalization Rules
Second Normal Form Rule, (2nd Normal Form, 2NF)
Description
The table must be in 1NF. Every non-key attribute must be
functionally dependant on all of the key attribute(s).
PartNo
SuppNo
Cost
P1
S1
200
P1
S2
220
Fully dependant on the whole key attributes
34
Topic 9
Normalization Rules
Second Normal Form Rule:
(Partially functional Dependent)
PartNo Supplier_No Supplier_Name
Supplier_Addr
P1
S1
Ken
98553344
P2
S2
Ken
98553344
Partially dependant on the key attribute
35
Topic 9
Normalization Rules
Third Normal Form Rule, (3rd Normal Form, 3NF)
Description
The table must be in 2NF and it contains no transitive
dependency. i.e. No non-key column transitively dependant
on another non-key column.
Inventory
PartNo
PartDesc
Storeman SContact
P1
Printer
Ken
98553344
P2
Speaker
Ken
98553344
36
Topic 9
Normalization Rules
Normalization Process : Summary of Schemas
Part
No
PDesc
Supp
No
SName
SAddr
P1
Printer
S1
Bright
Jurong
P1
Printer
S2
P2
Speaker
Storeman
Scontact
200
Ken
98553344
SuperGood Senkang
220
Ken
98553344
S1
Bright
300
Alvin
97334400
P2
Speaker
S2
SuperGood Senkang
350
Alvin
97334400
Jurong
Cost
37
Topic 9
Normalization Process
Table with
repeating groups
Remove repeating
groups
1NF
Handle functional
dependencies
.
2NF
3NF
Handle transitive
dependencies
38
Topic 9
Relation Schema :
PROJECT ( Proj_Num, Proj_Name, Emp_Num, Emp_Name,
JOB_CLASS, CHG_HOUR, HOURS)
Its intended to use PROJ_NUM and EMP_NUM as the whole key attributes
IT1768 Database Systems
39
Topic 9
Normalization : 1NF
1NF Summary:
Define all key attributes by underlining them.
No repeating groups in table
All attributes dependent on primary key
PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS,
CHG_HOUR, HOURS) OR
40
Topic 9
Normalization : 2NF
2NF Summary:
In 1NF
Includes no partial dependencies
- No attribute dependent on a
portion of primary key
Still possible to exhibit transitive
dependency
- Attributes may be functionally
dependent on nonkey attributes
41
Topic 9
Normalization : 3NF
Create separate table(s) to eliminate
transitive functional dependencies
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
42
Topic 9
Normalized :
Illustration Example 2
Relational Schema:
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
Item_total_price and Order_total_price are derived values, they can
be derived as follows:
o Item_total_price = ItemPrice*ItemQty and
o Order_total_price = Item_total_price1 + Item_total_price2 + .
IT1768 Database Systems
43
Topic 9
Relational Schema:
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
The database schema now satisfies the two
requirements of 1st Normal Form atomicity and
uniqueness. Thus it fulfills the most basic of
creation of a relational database.
44
Topic 9
45
Topic 9
46
Topic 9
47
Topic 9
48
Topic 9
49
Topic 9
50
Topic 9
Design Trade-offs
De-Normalization
Sometimes, the relations could have a very low update rate
or very high query rate and the performance is
unsatisfactory. It is not worth the cost of normalizing it.
One example is as follows:
CUSTOMER (CustNumber, CustName, City, State, Zip)
Key : CustNumber
51
Topic 9
Summary
Explain data redundancy and database
anomalies
Introduce normalization process
1NF to remove repeating groups in relations
2NF with Functional dependency among key
and non-key attributes
3NF with transitive dependency among nonkey attributes
52