Chapter Four
Normalization
Objectives
Recall relational concepts
Understand different anomalies and functional dependency concepts
Use normalization to convert anomalous tables to well-structured relations
Why normalise
What is normalisation
Identify three problems solved by normalisation
Example of how to normalise
Relation
Definition: A relation is a named, two-dimensional table of data
A table consists of rows (records) and columns (attributes or fields)
Requirements for a table to qualify as a relation:
It must have a unique name
Every attribute value must be atomic (not multivalued, not composite)
Every row must be unique (can’t have two rows with exactly the same values for all their fields)
Attributes (columns) in tables must have unique names
The order of the columns and rows must be irrelevant
Relation …
A relational database is merely a collection of data, organized in a particular manner.
As the father of the relational database approach, E. F. Codd created a series of rules called normal forms that help define that organization.
Recall that
One of the best ways to determine what information should be stored in a database is to clarify what questions will be asked of it and what data would be included in the answers.
Why normalize
Data design aims to identify data stored in a system
Such data is almost certainly stored in a relational database
Normalization is intended to:
Eliminate redundancy
Organize data efficiently
Reduce the potential for anomalies
What is normalization
Decompose a relation into a set of smaller relations
That achieve the goals stated previously
A relation is in a specific normal form (NF) if it
Satisfies requirements of all previous NFs
Satisfies requirements of the current NF
We concentrate on the first three NFs
Data/database normalization is a series of steps followed to obtain a database design that allows for consistent storage and efficient access of data in a relational database.
These steps reduce data redundancy and the risk of data becoming inconsistent.
Cont…
Formal definition
NORMALIZATION is the process of identifying the logical associations between data items and designing a database that represents such associations without suffering from the following update anomalies:
Insertion Anomalies
Deletion Anomalies
Modification Anomalies
Cont…
Denormalisation
It doesn't always make sense for data to be normalised
Some applications work better with denormalised data
Usually those that rely on many read-only operations
Anomalies in this Table
Insertion–can’t enter a new employee without having the employee take a class
Deletion–if we remove employee 140, we lose information about the existence of a Tax Acc class
Modification–giving a salary increase to employee 100 forces us to update multiple records
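The three anomalies above can be reproduced on a small in-memory table. The original table for this slide is not reproduced here, so this Python sketch uses hypothetical rows: employees 100 and 140 and the Tax Acc class come from the text, while the salaries, the other course names and the assumed key (emp_id, course) are illustrative assumptions.

```python
# Hypothetical rows of a denormalized EMPLOYEE_CLASS table (values are made up
# for illustration). Assumed primary key: (emp_id, course).
rows = [
    {"emp_id": 100, "salary": 48000, "course": "SPSS"},
    {"emp_id": 100, "salary": 48000, "course": "Surveys"},
    {"emp_id": 140, "salary": 52000, "course": "Tax Acc"},
]


def insert_row(table, row):
    """Insertion: course is part of the assumed key, so it cannot be missing."""
    if row.get("course") is None:
        raise ValueError("cannot insert an employee who has not taken a class")
    table.append(row)


def raise_salary_first_match(table, emp_id, new_salary):
    """Modification done carelessly: only the first matching row is updated."""
    for row in table:
        if row["emp_id"] == emp_id:
            row["salary"] = new_salary
            return


def salaries_of(table, emp_id):
    """Distinct salary values recorded for one employee."""
    return {row["salary"] for row in table if row["emp_id"] == emp_id}


def courses_remaining(table):
    """Course titles that still appear anywhere in the table."""
    return {row["course"] for row in table}


# Insertion anomaly: a new employee with no class is rejected.
try:
    insert_row(rows, {"emp_id": 200, "salary": 45000, "course": None})
except ValueError as err:
    print("insertion anomaly:", err)

# Modification anomaly: employee 100 now has two conflicting salaries.
raise_salary_first_match(rows, 100, 50000)
print(sorted(salaries_of(rows, 100)))  # [48000, 50000]

# Deletion anomaly: removing employee 140 also removes the only "Tax Acc" row.
after_delete = [row for row in rows if row["emp_id"] != 140]
print("Tax Acc" in courses_remaining(after_delete))  # False
```

Because the salary is repeated on every class row, any update that misses a row leaves the table contradicting itself; normalization removes that repetition.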
Data Dependency
The logical associations between data items that point the database designer in the direction of a good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain values of data item B always appear with certain values of data item A.
If data item A is the determinant and B the dependent data item, then the direction of the association is from A to B and not vice versa.
Functional dependencies (FDs) are derived from the real-world constraints on the attributes
Data Dependency
The essence of this idea is that if the existence of something,
call it A, implies that B must exist and have a certain value,
then we say that "B is functionally dependent on A."
We also often express this idea by saying that "A determines
B," or that "B is a function of A," or that "A functionally
governs B."
Data Dependency
Often, the notions of functionality and functional
dependency are expressed briefly by the statement, "If A,
then B."
It is important to note that the value of B must be unique for a given value of A, i.e., any given value of A must imply one and only one value of B, in order for the relationship to qualify for the name "function."
(However, this does not necessarily prevent different values
of A from implying the same value of B.)
Data Dependency
X → Y holds if, whenever two tuples have the same value for X, they also have the same value for Y
The notation is A → B, which is read as: B is functionally dependent on A
In general, a functional dependency is a relationship among
attributes.
In relational databases, we can have a determinant that
governs one other attribute or several other attributes.
Who will tell us this FD? How do we know?
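The definition "X → Y holds if, whenever two tuples agree on X, they agree on Y" can be tested directly against sample rows. This Python sketch uses a hypothetical helper `fd_holds` and hypothetical student rows; it is an illustration of the definition, not a way to discover FDs.

```python
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check X -> Y on a sample: no two rows may agree on lhs but differ on rhs."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value
        if seen.setdefault(x, y) != y:
            return False  # same X value paired with a different Y value
    return True


# Hypothetical sample rows (StudID, Year, Dormitory).
students = [
    {"StudID": "S1", "Year": 1, "Dormitory": "D1"},
    {"StudID": "S2", "Year": 1, "Dormitory": "D1"},
    {"StudID": "S3", "Year": 2, "Dormitory": "D2"},
]
print(fd_holds(students, ["StudID"], ["Year"]))  # True
print(fd_holds(students, ["Year"], ["StudID"]))  # False: year 1 maps to S1 and S2
```

A sample of data can only refute a dependency; it cannot prove one. Whether an FD really holds comes from the real-world constraints on the attributes, which is why the designer has to confirm FDs with the users of the system.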
Data Dependency…
Partial Dependency
If an attribute that is not a member of the primary key depends on only some part of the primary key (when the primary key is composite), then that attribute is partially functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} → C and B → C
Then C is partially functionally dependent on {A, B}
Data Dependency…
Full Dependency
If an attribute that is not a member of the primary key depends not on some part of the primary key but on the whole key (when the primary key is composite), then that attribute is fully functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} → C, but neither A → C nor B → C holds
Then C is fully functionally dependent on {A, B}
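The partial/full distinction can be checked mechanically from a list of declared FDs: compute which attributes each proper subset of the key determines (its attribute closure) and see whether the non-key attribute is among them. This is a minimal Python sketch; the helper names `closure` and `dependency_kind` and the extra attribute D are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependency_kind(key, attr, fds):
    """'partial' if some proper subset of the key determines attr,
    'full' if only the whole key does, 'none' otherwise."""
    if attr not in closure(key, fds):
        return "none"
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            if attr in closure(subset, fds):
                return "partial"
    return "full"


# {A, B} is the primary key; C and D are non-key attributes.
fds = [({"A", "B"}, {"C", "D"}), ({"B"}, {"C"})]
key = {"A", "B"}
print(dependency_kind(key, "C", fds))  # partial, because B -> C
print(dependency_kind(key, "D", fds))  # full: neither A nor B alone determines D
```

Using the closure rather than only the listed FDs means a partial dependency is still found when it follows indirectly from several declared FDs.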
Data Dependency…
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship
of the following form: "If A implies B, and if also B implies C, then
A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must be
an Animal.
A generalized way of describing transitive dependency is:
If A functionally governs B, AND B functionally governs C,
THEN A functionally governs C,
provided that neither C nor B determines A, i.e., B ↛ A and C ↛ A.
In the normal notation:
{(A → B) AND (B → C)} ⟹ A → C, provided that B ↛ A and C ↛ A
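The rule {(A → B) AND (B → C)} ⟹ A → C, with its proviso, can be sketched in Python using attribute closure over a list of declared FDs. The helper names `closure` and `is_transitive` are hypothetical; A, B and C are the generic attributes from the rule.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_transitive(a, b, c, fds):
    """True if a -> b and b -> c hold (so a -> c follows) while neither
    b nor c determines a, which is the proviso of the rule."""
    return (b in closure({a}, fds)
            and c in closure({b}, fds)
            and a not in closure({b}, fds)
            and a not in closure({c}, fds))


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print("C" in closure({"A"}, fds))         # True: A -> C is derived
print(is_transitive("A", "B", "C", fds))  # True

# If B also determined A, the proviso fails and the chain is not transitive.
fds_equiv = fds + [({"B"}, {"A"})]
print(is_transitive("A", "B", "C", fds_equiv))  # False
```

When B → A also holds, A and B determine each other, so B behaves like an alternative key and C is not considered transitively dependent through it.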
Steps of Normalization
We have various levels or steps in normalization called
Normal Forms.
The level of complexity, the strength of the rule and the degree of decomposition increase as we move from a lower normal form to a higher one.
A table in a relational database is said to be in a certain normal form if it satisfies certain constraints.
Each subsequent normal form represents a stronger condition than the previous one
Steps in normalization: pictorial representation
First Normal Form (1NF)
Requires that all column values in a table are atomic (e.g., a number is an atomic value, while a list or a set is not).
Solution
Move each value of a repeating group to a new row, repeating the common attributes. Then find a key that identifies all the data in each row
Thus
No multivalued attributes
Every attribute value is atomic
For example, Fig. 5-25 is not in 1st Normal Form (it has multivalued attributes), so it is not a relation
While Fig. 5-26 is in 1st Normal Form
Remark
All relations are in 1st Normal Form
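The 1NF solution (move each repeating-group value to its own row and repeat the common attributes) can be sketched in Python. The records, the helper `to_1nf` and its parameters are hypothetical and only illustrate the flattening step.

```python
# Hypothetical unnormalized records: "courses" is a repeating group (a list),
# so this structure is not in 1NF.
unnormalized = [
    {"emp_id": 100, "name": "Employee A", "courses": ["SPSS", "Surveys"]},
    {"emp_id": 140, "name": "Employee B", "courses": ["Tax Acc"]},
]


def to_1nf(records, group, column):
    """Move each value of the repeating group `group` into its own row
    (stored under `column`), repeating the common attributes on each row."""
    rows = []
    for rec in records:
        common = {k: v for k, v in rec.items() if k != group}
        for value in rec[group]:
            rows.append({**common, column: value})
    return rows


rows = to_1nf(unnormalized, "courses", "course")
for row in rows:
    print(row)
# {'emp_id': 100, 'name': 'Employee A', 'course': 'SPSS'}
# {'emp_id': 100, 'name': 'Employee A', 'course': 'Surveys'}
# {'emp_id': 140, 'name': 'Employee B', 'course': 'Tax Acc'}

# The combination (emp_id, course) now identifies every row uniquely.
keys = [(r["emp_id"], r["course"]) for r in rows]
print(len(keys) == len(set(keys)))  # True
```

Note that an employee with an empty course list produces no row at all, which is exactly the insertion anomaly that the higher normal forms go on to remove.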
Cont…
Formal Definition: a table (relation) is in 1NF
If
There are no duplicated rows in the table (each row has a unique identifier).
Each cell is single-valued (i.e., there are no repeating groups).
Entries in a column (attribute, field) are of the same kind.
Example for First Normal Form (1NF)
Example 2: Table with multivalued attributes, not in 1st normal form
Table with no multivalued attributes and unique rows, in 1st normal form
Second Normal Form (2NF)
Formal Definition: a table (relation) is in 2NF
If
It is in 1NF and
All non-key attributes are dependent on the entire primary key, i.e., there is no partial dependency.
Example 1: for 2NF
EMP_PROJ
EmpID EmpName ProjNo ProjName ProjLoc ProjFund ProjMangID Incentive
EMP_PROJ rearranged
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive
Cont…
To convert it to 2NF, we need to remove all partial dependencies of non-key attributes on part of the primary key.
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive
Cont…
As we can see, some non-key attributes are partially dependent on part of the primary key.
This can be seen from the first two functional dependencies (FD1 and FD2).
Thus, each functional dependency, together with its dependent attributes, should be moved to a new relation in which the determinant becomes the primary key.
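Moving each FD into its own relation amounts to projecting EMP_PROJ onto the attributes of FD1, FD2 and FD3. This Python sketch uses hypothetical EMP_PROJ rows and a hypothetical `project` helper, then joins the three relations back on EmpID and ProjNo to check that no information was lost for this sample.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


# Hypothetical EMP_PROJ rows; {EmpID, ProjNo} is the primary key.
emp_proj = [
    {"EmpID": "E1", "EmpName": "Ann", "ProjNo": "P1", "ProjName": "Payroll",
     "ProjLoc": "HQ", "ProjFund": 1000, "ProjMangID": "M1", "Incentive": 50},
    {"EmpID": "E1", "EmpName": "Ann", "ProjNo": "P2", "ProjName": "Website",
     "ProjLoc": "Branch", "ProjFund": 2000, "ProjMangID": "M2", "Incentive": 70},
    {"EmpID": "E2", "EmpName": "Bob", "ProjNo": "P1", "ProjName": "Payroll",
     "ProjLoc": "HQ", "ProjFund": 1000, "ProjMangID": "M1", "Incentive": 40},
]

# One relation per FD, with the determinant as its primary key.
employee = project(emp_proj, ["EmpID", "EmpName"])                    # FD1
projects = project(emp_proj, ["ProjNo", "ProjName", "ProjLoc",
                              "ProjFund", "ProjMangID"])             # FD2
emp_proj_2nf = project(emp_proj, ["EmpID", "ProjNo", "Incentive"])    # FD3
print(len(employee), len(projects), len(emp_proj_2nf))  # 2 2 3

# Joining the three relations back on EmpID and ProjNo reproduces the original rows.
emp_by_id = {e["EmpID"]: e for e in employee}
proj_by_no = {p["ProjNo"]: p for p in projects}
rebuilt = [{**emp_by_id[r["EmpID"]], **proj_by_no[r["ProjNo"]], **r}
           for r in emp_proj_2nf]
print(len(rebuilt) == len(emp_proj) and all(r in emp_proj for r in rebuilt))  # True
```

After the split, each employee name and each project's details are stored once, so the repetition that caused the update anomalies is gone.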
Example 2: Functional dependency diagram for INVOICE, getting it into Second Normal Form
Third Normal Form (3NF)
2NF PLUS no transitive dependencies (functional dependencies on non-primary-key attributes)
Note: This is called transitive because the primary key is a determinant for another attribute, which in turn is a determinant for a third.
Solution:
A non-key determinant and the attributes that depend on it go into a new table; the non-key determinant becomes the primary key in the new table and stays as a foreign key in the old table
Removing transitive dependencies: getting it into Third Normal Form
Example 2 for 3NF
Assumption: students of the same batch (same year) live in one building or dormitory
Student
This schema is in 2NF since its primary key is a single attribute.
Cont…
StudID → Year AND Year → Dormitory
And
Year cannot determine StudID, and Dormitory cannot determine StudID
Then, transitively, StudID → Dormitory
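Splitting Student on the non-key determinant Year can be sketched in Python with dictionaries keyed by each relation's primary key. The rows, the attribute set (StudID, Year, Dormitory) and the relation names STUDENT and YEAR_DORM are hypothetical assumptions made for illustration.

```python
# Hypothetical Student rows; the attribute set (StudID, Year, Dormitory) is assumed.
student = [
    {"StudID": "S1", "Year": 1, "Dormitory": "D1"},
    {"StudID": "S2", "Year": 1, "Dormitory": "D1"},
    {"StudID": "S3", "Year": 2, "Dormitory": "D2"},
]

# STUDENT(StudID, Year): Year stays behind as a foreign key.
student_3nf = {r["StudID"]: r["Year"] for r in student}
# YEAR_DORM(Year, Dormitory): the non-key determinant Year becomes the primary key.
year_dorm = {r["Year"]: r["Dormitory"] for r in student}

# The split is only safe because Year -> Dormitory holds in the data.
assert all(year_dorm[r["Year"]] == r["Dormitory"] for r in student)


def dormitory_of(stud_id):
    """Recover StudID -> Dormitory by following StudID -> Year -> Dormitory."""
    return year_dorm[student_3nf[stud_id]]


print(len(student), len(year_dorm))  # 3 2: each year's dormitory is stored once

# Moving all year-1 students to a new building is now a single update.
year_dorm[1] = "D3"
print(dormitory_of("S1"), dormitory_of("S2"))  # D3 D3
```

In the original table the same change would have to be repeated on every year-1 student row, which is the modification anomaly that 3NF removes.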
Cont…
Generally, even though there are four additional levels of normalization, a table is said to be normalized if it reaches 3NF.
A database with all tables in 3NF is said to be a normalized database.
Reading Assignment
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain-Key Normal Form (DKNF)
Summary of the Process of Normalization
To understand normalisation, you need to know these problems:
1NF: Repeating groups
2NF: Partial Dependency
3NF: Transitive Dependency and Derived Attributes
Normalisation is the process of decomposing relations
Cont…
Some important indicators
No Repeating or Redundancy: no repeating fields in the table.
The Fields Depend Upon the Key: every field in the table should depend on the key.
The Whole Key: no partial key dependency.
And Nothing But The Key: no dependency between non-key attributes (no transitive dependency).
Cont…
Pitfalls of Normalization
Requires data to see the problems
May reduce performance of the system
Is time-consuming
Is difficult to design and apply
Is prone to human error
End of Chapter Four