Lecture 9 - Database Normalization PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 52

Topic 9

Database Systems

Unit 9 :
Database
Normalization

IT1768 Database Systems

Topic 9

Unit 9 : Database Normalization

Objective :
At the end of this unit, you should be able to:
Understand why and when to normalize tables.
Understand why redundant data can cause
update anomalies.
Define functional dependency and apply it in
normalization.
Perform normalization to place tables in third
normal form (3NF).
IT1768 Database Systems

Topic 9

Introduction to Normalization
Normalization is a design process whereby a
primitive data table (relation) is decomposed into
more simpler tables that more precisely fit the
relational model.
Normalization aims to remove redundant data in
order to eliminate update anomalies.

IT1768 Database Systems

Topic 9

Introduction to Normalization
Objective of Normalization
To avoid redundant data from tables (in order to
increase the integrity of the database design and to
maximize flexibility of data storage).

To help eliminate data anomalies (Insert, update,


Delete)

To produce controlled redundancies to link


tables.

IT1768 Database Systems

Topic 9

Introduction to Normalization
When to use Normalization technique ?
It is used :
when a set of tables is already available but
there has been no time to build an ER model.
to validate the design of existing tables when
user is unsure if the ER model is correct.
If the application of database is simple.
Used as a bottom-up approach to design the
database for the application.

IT1768 Database Systems

Topic 9

Redundant Data
Figure 1- Table with redundant data

It is unnecessary to store an employee's home address in


more than one record or one table. With duplicate data, it
occupies extra storage space and when all values (say
address for the same staff) are not updated completely, .
The value (in this case, the address) for that staff in one
table does not match the corresponding value (address)
of the same staff in another table.

IT1768 Database Systems

Topic 9

Redundant Data removed


Figure 2 Tables without redundant data

IT1768 Database Systems

After deploying
Normalization
technique, it yields
multiple tables with
redundant data
removed.

Topic 9

Database Anomalies
- Insertion
- Update
- Deletion

IT1768 Database Systems

Topic 9

Database Anomalies
An un-normalized table :

Database anomalies are the problems in relations


that occur due to redundancy of information in the
relations.
These anomalies affect the process of inserting,
deleting and modifying of data in the relations.
IT1768 Database Systems

Topic 9

Database Anomalies
Some important information may be lost if a relation
is updated that contains database anomalies.
It is important to remove these anomalies in order
to perform different processing on the relations
without difficulties.
3 Types of database anomalies
o Insertion anomaly
o Deletion anomaly
o Update anomaly

IT1768 Database Systems

10

Topic 9

Anomalies in un-Normalized table

Observation: PRO_NUM intended to be primary key

IT1768 Database Systems

Table entries invite data inconsistencies


Table displays data anomalies:
Update anomaly occurs when
Modifying JOB_CLASS
Insertion anomaly occurs when
Adding a new project: Need to assign an employee.
Deletion anomaly occurs if
Employees (103, 114) deleted, other vital data lost

11

Topic 9

Anomalies in Un-Normalized Table

null

null

20

Bay Shore

120
null

Bill MCan

Administrator

60.75

18.5

null

null

null

null

Observation: PRO_NUM intended to be primary key

IT1768 Database Systems

Table entries invite data inconsistencies


Table displays data anomalies:
Update anomaly occurs when
Modifying JOB_CLASS
Insertion anomaly occurs when
Adding a new project: Need to assign an employee.
Deletion anomaly occurs if
Employees (103, 114) deleted, other vital data lost

12

Topic 9

Insertion Anomalies
An insertion anomaly is a failure to place
information about a new database entry into all the
places in the database where information about
that new entry needs to be stored.
The insertion anomaly occurs when a new record is
inserted in the relation. In this anomaly, the user
cannot insert ALL the fact about an entity until he
has an additional fact about other entity.
In a properly normalized database, information
about a new entry needs to be inserted into only
one place in the database.
IT1768 Database Systems

13

Topic 9

Insertion Anomalies
Base on the relation below, insertion anomalies occur when

inserting a new staff member will force user to create branch data.
e.g. Add a new staff member SG99
Inserting a new branch (is not allowed) if there is no staff member for
the branch yet.
e.g. Add a new branch B008

SG99

Michael

Designer

30000 null

null

null

null

null

null

33 Clarence Road

IT1768 Database Systems

B008

14

Topic 9

Insertion Anomalies - Example

In the following table, if a new activity Golf of fee $500


is to be added, it has to wait until some new member
registers for Golf.

e.g. Add a new staff member M250


Mem_ID

IT1768 Database Systems

Fee

M100

Activity
Skiing

M150

Swimming

50

M175

Squash

100

M200

Swimming

50

M200

Squash

100

null

Golf

500

200

15

Topic 9

Insertion Anomalies - Example


To resolve insertion anomalies, it is to break-up the
relation into two separate relations; such as :
- Member (Key : Mem_ID)
- Fee (Key : Activity)

Mem_ID Activity

Activity

Fee

M100

Skiing

Skiing

5200

M150

Swimming

Swimming

50

M175

Squash

Squash

100

M200

Swimming

M200

Squash

Table : Fee

Table : Member
IT1768 Database Systems

16

Topic 9

Deletion Anomalies
A deletion anomaly is a failure to remove
information about an existing database entry
when it is time to remove that entry.
The deletion anomaly occurs when a record is
deleted from the relation. In this anomaly, the
deletion of fact about an entity automatically
deleted the fact of another entity.
In a properly normalized database, information
about an old, to-be-gotten-rid-of entry needs to
be deleted from only one place in the database.
IT1768 Database Systems

17

Topic 9

Deletion Anomalies
Deleting rows may cause a loss of data that would
be needed for other future rows.
e.g. Deletion of staff member SA9 will result in branch
details of B007 being lost.

IT1768 Database Systems

18

Topic 9

Modification (Update) Anomalies


The modification anomaly occurs when the record
is updated in the relation. In this anomaly, the
modification in the value of specific attribute
requires modification in all records in which that
value occurs.

IT1768 Database Systems

19

Topic 9

Modification (Update) Anomalies


A same record can be expressed in multiple records;
therefore updates to the table may result in logical
inconsistencies. For example; there could be two
different addresses in the Employee table when
updating the address of an employee who had
shifted his residence.
Employee
EmpID EmpAddress

Contact

E-55

Blk 16 Bedok South

64453638

E-56

12 Clementi Ave 1

66138899

Old record

E-56

180 AMK Ave 5

64563782

Updated record

IT1768 Database Systems

20

Topic 9

Update Anomaly Example 2


As shown in the example below, changing data in a
row forces changes of data in the duplicated rows.
e.g. Update address of B003
Need to update
all 3 records.
Failing to update
all affected
records will lead
to inconsistency
in the database.

Normalization and Update Anomalies

Normalization aims to remove redundant data, which


helps to eliminate update anomalies.

IT1768 Database Systems

21

Topic 9

Normalization
Normalization vs. Normalized Data
All anomalies are highly undesirable, their
occurrence constitutes corruption of the
database. Normalized data do not contain data
redundancies and therefore, are less
susceptible to data corruption.
Normalization is the activity, the process that
leads to a normalized data structure to achieve
normalized data.

IT1768 Database Systems

22

Topic 9

Normalization
Normal Forms
Normalization consists of a series of rules that must
be applied in steps to reach a higher level of
normalization. These levels are called normal forms.

There are many normal forms:


1NF -> 2NF -> 3NF -> BCNF -> 4NF -> 5NF

In general, the IT industry considers


normalization to the 3NF an acceptable level
to remove redundancy.
Normal forms higher than the 3NF deal with more
subtle anomalies.
IT1768 Database Systems

23

Topic 9

Repeating Group in Tables


A repeating group is an attribute, or group of
attributes, that occurs with multiple values for a
single occurrence of another attribute(s).
Project
ProjNum

Student
ProjLocation

EmpNo

PS-025

Sentosa

EE-03

073248K

Obama

165

PS-025

Sentosa

ED-15

074565L

Bush

172

PS-025

Sentosa

EP-21

076386E

Raymond

148

PC-344

Clementi

ES-26

075473P

Bryan

182

PC-344

Clementi

EE-32

075412M

Vanessa

163

Project table with repeating group


(ProjNum, ProjLocation)

AdminNum StudName

Height

Student table with no repeating groups

In normalization, 1NF eliminates repeating groups


IT1768 Database Systems

24

Topic 9

Repeating Group
The repeating group for Supplier in the table below
failed to distinguish PartNo and PDesc in the
database.
Part_Inventory
PartNo

PDesc

SuppNo

SName

SAddr

Cost

P1

Printer

S1
S2

Bright
SuperGood

Jurong
200
Senkang 220

P2

Speaker

S1
S2

Bright
SuperGood

Jurong
300
Senkang 350

Multiple occurrences of vendors for each single occurrence of part.

Repeating group for (SuppNo, sName, sAddr, cost)

IT1768 Database Systems

25

Topic 9

Repeating Group Remedy

(option1)

In order to create a table that is in first normal form we must


extract the repeating groups and place them in a separate
tables.
Part-Inventory
PartNo

PDesc

SuppNo

SName

SAddr

Cost

P1

Printer

S1

Bright

Jurong

200

P1

Printer

S2

SuperGood

Senkang

220

P2

Speaker

S1

Bright

Jurong

300

P2

Speaker

S2

SuperGood

Senkang

350

Part-Inventory(PartNo, Pdesc, SuppNo, SName, SAddr, Cost)


Splitting into two relations, (PartNo, Pdesc, SuppNo, Cost) eliminates repeating
groups and (SuppNo, SName, Saddr) contains no repeating group also.
IT1768 Database Systems

26

Topic 9

Repeating Group - Remedy

(option 2)

In order to create a table that is in first normal form we must


extract the repeating groups and place them in a separate
tables.
Part-Line (PartNo, PDesc, SuppNo, Cost )
PartNo

PDesc

SuppNo Cost

P1

Printer

S1

200

P1

Printer

S2

220

P2

Speaker

S1

300

P2

Speaker

S2

350

Splitting of the relation


into two tables
eliminates repeating
groups

Supplier (SuppNo, SName, SAddr )


SuppNo

IT1768 Database Systems

SName

SAddr

S1

Bright

Jurong

S2

SuperGood

Senkang
27

Topic 9

Functional Dependency
Functional dependency describes the relationship
between attributes (columns) in a single relation.
An attribute is functionally dependant on another if
we can use the value of one attribute to determine
the value of another.
For example:
STUDENT( StudNum, StudName, StudAddress )
o StudNum determines ( ) StudName and StudAddress.
o StudName, StudAddress are functionally dependent on
StudNum.
In normalization, 2NF accounts for functional dependency
in the relation.

IT1768 Database Systems

28

Topic 9

Functional Dependency
Example for further illustration:
StudNum

StudName

StudAddress

023344C

Kenny Ang

28 Woodland Rd

023817F

Lawrence Teo

211 Tampines Ave 7

028893E

Jenny Huang

11 Ang Mo Kio Ave 5

StudNum --> StudName, StudAddress


In schema, STUDENT ( StudNum, StudName, StudAddress )
StudNum is a key field. It functionally determines StudName
and StudAddress.
In other words, StudName and StudAddress are functionally
dependent on StudNum which is the key field.
Two types of functional dependency: Full and Partial
IT1768 Database Systems

29

Topic 9

Types of Functional Dependency

a) Full dependency (Full functional dependency)


Every Non-key column is functionally dependent
on all parts of the primary key (Whole key).
Example:
Part_Supplier
PartNo

SuppNo

Cost

P1

S1

200

P1

S2

220

P2

S1

300

P2

S2

350

IT1768 Database Systems

PartNo, SuppNo --> Cost

In schema,
Part_Supplier( PartNo, SuppNo , Cost )

30

Topic 9

Types of Functional Dependency

b) Partial dependency
There is Non-key column depend on some part of
the primary key.
Example:
Part_Supplier

Whole key : PartNo, SupplierNo

PartNo SuppNo SName

SAddr

Cost

P1

S1

Bright

Jurong

200

P1

S2

SuperGood

Senkang

220

P2

S1

Bright

Jurong

300

P2

S2

SuperGood

Senkang

350

SuppNo --> Sname, SAddr


IT1768 Database Systems

In schema,
Supplier( SuppNo, Sname, SAddr)
31

Topic 9

Types of Functional Dependency

Transitive dependency
There is Non-key column depend on another
Non-key column.
Example :
A Non-key attribute
Part
PartNo

PartDesc

Storeman

SContact

P1

Printer

Ken

98553344

P2

Speaker

Ken

98553344

P3

Ribbon

Alvin

97334400

P4

Cartridge

Alvin

97334400

In schema,
Storeman --> SContact Store-Person( Storeman, Scontact)

In normalization, 3NF accounts for transitive dependency in the relation.

IT1768 Database Systems

32

Topic 9

Normalization Rules
First Normal Form Rule, (1st Normal Form, 1NF)
Description
The table must express a set of unordered, twodimensional (2-D) tables.
The table cannot contain repeating groups (No two rows in
a table can be identical)
PartNo

Cost

P1

200

P1

P1

220

P1

Repeating groups

PartNo SuppNo

SName

SAddr

Cost

S1

Bright

Jurong

200

S2

SuperGood Senkang 220


No repeating groups

Part-Supplier( PartNo, SupNo, SName, SAddr, Cost)


IT1768 Database Systems

33

Topic 9

Normalization Rules
Second Normal Form Rule, (2nd Normal Form, 2NF)
Description
The table must be in 1NF. Every non-key attribute must be
functionally dependant on all of the key attribute(s).
PartNo

SuppNo

Cost

P1

S1

200

P1

S2

220
Fully dependant on the whole key attributes

2NF : Full Functional dependant :


PartNo, SuppNo --> Cost

Part-Supp( PartNo, SuppNo, Cost)


IT1768 Database Systems

34

Topic 9

Normalization Rules
Second Normal Form Rule:
(Partially functional Dependent)
PartNo Supplier_No Supplier_Name

Supplier_Addr

P1

S1

Ken

98553344

P2

S2

Ken

98553344
Partially dependant on the key attribute

2NF : Partial Functional dependant :


Supplier_No --> Supplier_Name, Supplier_Addr

Supplier( Supplier_No, Supplier_Name, Supplier_Addr)

IT1768 Database Systems

35

Topic 9

Normalization Rules
Third Normal Form Rule, (3rd Normal Form, 3NF)
Description
The table must be in 2NF and it contains no transitive
dependency. i.e. No non-key column transitively dependant
on another non-key column.
Inventory

PartNo

PartDesc

Storeman SContact

P1

Printer

Ken

98553344

P2

Speaker

Ken

98553344

3NF : Transitive dependant : Storeman --> SContact

Store-person( storeman, sContact)


IT1768 Database Systems

36

Topic 9

Normalization Rules
Normalization Process : Summary of Schemas
Part
No

PDesc

Supp
No

SName

SAddr

P1

Printer

S1

Bright

Jurong

P1

Printer

S2

P2

Speaker

Storeman

Scontact

200

Ken

98553344

SuperGood Senkang

220

Ken

98553344

S1

Bright

300

Alvin

97334400

P2

Speaker

S2

SuperGood Senkang

350

Alvin

97334400

Jurong

Cost

Inventory( PartNo, PDescr, SuppNo, SName, SAddr, Cost, Storeman, Scontact)


Part( PartNo, PDesc, Storeman)
Part-Supp( PartNo, SuppNo, Cost)

Supplier( SuppNo, SName, SAddr)


Store-person( Storeman, Scontact)
IT1768 Database Systems

37

Topic 9

Normalization Process
Table with
repeating groups

Remove repeating
groups

1NF
Handle functional
dependencies

.
2NF

3NF

IT1768 Database Systems

Handle transitive
dependencies

38

Topic 9

Normalization : Illustration Example 1

Relation Schema :
PROJECT ( Proj_Num, Proj_Name, Emp_Num, Emp_Name,
JOB_CLASS, CHG_HOUR, HOURS)

Its intended to use PROJ_NUM and EMP_NUM as the whole key attributes
IT1768 Database Systems

39

Topic 9

Normalization : 1NF

1NF Summary:
Define all key attributes by underlining them.
No repeating groups in table
All attributes dependent on primary key
PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS,
CHG_HOUR, HOURS) OR

PROJECT (PROJ_NUM, PROJ_NAME)


EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
IT1768 Database Systems

40

Topic 9

Normalization : 2NF
2NF Summary:
In 1NF
Includes no partial dependencies
- No attribute dependent on a
portion of primary key
Still possible to exhibit transitive
dependency
- Attributes may be functionally
dependent on nonkey attributes

PROJECT (PROJ_NUM, PROJ_NAME)


EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
IT1768 Database Systems

41

Topic 9

Normalization : 3NF
Create separate table(s) to eliminate
transitive functional dependencies
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)


JOB (JOB_CLASS, CHG_HOUR)
3NF Summary:
In 2NF
Contain no transitive dependencies
IT1768 Database Systems

42

Topic 9

Normalized :

Illustration Example 2

Example of Un-normalized table:

Relational Schema:
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
Item_total_price and Order_total_price are derived values, they can
be derived as follows:
o Item_total_price = ItemPrice*ItemQty and
o Order_total_price = Item_total_price1 + Item_total_price2 + .
IT1768 Database Systems

43

Topic 9

Removing the repeating groups

Relational Schema:
ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
The database schema now satisfies the two
requirements of 1st Normal Form atomicity and
uniqueness. Thus it fulfills the most basic of
creation of a relational database.

IT1768 Database Systems

44

Topic 9

Functional dependency with key attributes


ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
ORDERS
(OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState, ItemID, ItemDescription, ItemQty, ItemPrice)

Next, we test the table for functional dependency (FD); this


consists of full functional dependency and partial functional
dependency.
Full FD are attribute(s) dependents on all key attributes (in
this case, OrderID and ItemID together.
Partial FD refers attributes that are dependents on any of
the key attributes alone.
IT1768 Database Systems

45

Topic 9

Functional dependency with key attributes


ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)
ORDERS
(OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState, ItemID, ItemDescription, ItemQty, ItemPrice)

ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState)


ORDERS_ITEMS (OrderID, ItemID, ItemQty)
ITEMS

(ItemID, ItemDescription, ItemPrice)

IT1768 Database Systems

46

Topic 9

Functional dependency with key attributes


ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState)

ORDERS_ITEMS (OrderID, ItemID, ItemQty)


ITEMS

(ItemID, ItemDescription, ItemPrice)

IT1768 Database Systems

47

Topic 9

Transitive Functional Dependency


ORDERS
(OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState, ItemID, ItemDescription, ItemQty, ItemPrice)

Next, we test the table for transitive functional dependency ;


i.e. any non-key attributes dependent on other non-key
attribute.
In this case, CustName, CustAddr, CustCity, CustState are
dependent on CustID. All are non-key attributes.

CUSTOMER (CustID, CustName, CustAddr, CustCity, CustState)

IT1768 Database Systems

48

Topic 9

Un-normalized to Normalized tables


ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState,
ItemID, ItemDescription, ItemQty, ItemPrice)

ORDERS (OrderID, OrderDate, CustID, CustName, CustAddr, CustCity, CustState)


ORDERS_ITEMS (OrderID, ItemID, ItemQty)
ITEMS

(ItemID, ItemDescription, ItemPrice)

CUSTOMER (CustID, CustName, CustAddr, CustCity, CustState)

IT1768 Database Systems

49

Topic 9

Finally - Normalized tables

IT1768 Database Systems

50

Topic 9

Design Trade-offs
De-Normalization
Sometimes, the relations could have a very low update rate
or very high query rate and the performance is
unsatisfactory. It is not worth the cost of normalizing it.
One example is as follows:
CUSTOMER (CustNumber, CustName, City, State, Zip)

Key : CustNumber

The Normalized relations are as follows:


CUSTOMER (CustNumber, CustName, Zip)
CODES ( Zip, City, State )
The two tables above are in normal form, but they most likely do not represent a
better design. The un-normalized table is probably better because it is easier to
process and the disadvantages of duplicating the City and State data are not very
important.
IT1768 Database Systems

51

Topic 9

Summary
Explain data redundancy and database
anomalies
Introduce normalization process
1NF to remove repeating groups in relations
2NF with Functional dependency among key
and non-key attributes
3NF with transitive dependency among nonkey attributes

IT1768 Database Systems

52

You might also like