Chapter 5
Data Normalisation
1. Introduction
[Diagram: from the conceptual/organisational level (CDM) to the logical/physical level: raw LDM, valued LDM, optimised LDM, then PDM (under technical constraints) and database schema]
Figure 1: Transition from the conceptual data model to the logical data model
One of the main challenges of database design is to avoid redundancies and
inconsistencies. A first approach to building a set of good tables that limits the risk of
inconsistencies (by avoiding redundancies and null values) is to start from a universal table whose
schema consists of all the attributes, and to apply a normalization algorithm to it; this is the
subject of normalization theory.
Normalization requires more semantic information about the data. This additional
semantics is expressed through functional dependencies (FD) between attributes.
Normalization is then a process that decomposes the starting table into several
tables by projections chosen according to these functional dependencies between
attributes. This normalization process can be illustrated as follows
(see Figure 2):
[Figure 2: the attributes (attribute 1 ... attribute n) and the semantics on data feed the decomposition (normalisation) algorithm, which structures them into the final relations R1 ... Rn; the starting table can be reconstructed with joins]
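As a minimal sketch of what a functional dependency means in practice, the following Python fragment checks whether a dependency is contradicted by a set of rows. The function name fd_holds and the sample rows are hypothetical and only serve the illustration; they are not part of the course material.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for a CUSTOMER-like table.
customers = [
    {"customer_number": 1, "name": "Durand", "address": "Paris"},
    {"customer_number": 2, "name": "Martin", "address": "Lyon"},
    {"customer_number": 3, "name": "Durand", "address": "Lille"},
]

print(fd_holds(customers, ("customer_number",), ("name", "address")))  # True
print(fd_holds(customers, ("name",), ("address",)))  # False
```

A functional dependency is a statement about the meaning of the data, so sample rows can refute it but never prove it; a check of this kind is only a sanity test.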
[Figure: the set of relations (tables) contains the relations in first normal form (1NF), which contain the relations in second normal form (2NF), which contain the relations in third normal form (3NF)]
ORDER_LINE table (order_num, item_num, designation, qty), with compound primary key (order_num, item_num)
Consider the CUSTOMER table (customer_number, name, address); we have the following
functional dependencies:
4. Second Normal Form (2NF)
This normal form requires that the table is already in first normal form. It concerns
only tables with a compound primary key (composed of several attributes).
The rule requires that every non-key attribute depend on the entire primary key. Any
attribute that depends on only part of the primary key must be moved out of the table.
The process is as follows:
• Group the attributes that depend on the entire key in one table, and keep this key for this
table;
• Group in another table the attributes that depend on a part of the key, and make this part
of the key the primary key of the new table.
Note that this second normal form is similar to the normalization rule for relationships used in
the entity-relationship formalism.
In our example, the ITEM_ORDER table is not in second normal form because the non-key
attribute “designation” depends only on item_num, which is just part of the compound primary key:
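Before: ITEM_ORDER table (order_num, item_num, designation, qty), with compound primary key (order_num, item_num)
After: ORDER_LIGNE table (order_num, item_num, qty), with compound primary key (order_num, item_num)
After: ITEM table (item_num, designation), with primary key item_num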
The transition to second normal form leads us to replace the ITEM_ORDER table with the
tables ORDER_LIGNE and ITEM, both in second normal form, as shown above.
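The decomposition of ITEM_ORDER can be sketched in Python as two projections followed by a join that rebuilds the original rows. The helper functions project, natural_join and as_set, as well as the sample rows, are assumptions made for this illustration.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result


def natural_join(left, right):
    """Join two lists of dicts on the attributes they have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Order-insensitive view of a list of rows, for comparison."""
    return {frozenset(r.items()) for r in rows}


# Hypothetical rows of ITEM_ORDER (order_num, item_num, designation, qty).
item_order = [
    {"order_num": 1, "item_num": 10, "designation": "bolt", "qty": 5},
    {"order_num": 1, "item_num": 20, "designation": "nut", "qty": 8},
    {"order_num": 2, "item_num": 10, "designation": "bolt", "qty": 3},
]

order_ligne = project(item_order, ("order_num", "item_num", "qty"))
item = project(item_order, ("item_num", "designation"))

print(len(item))  # 2
print(as_set(natural_join(order_ligne, item)) == as_set(item_order))  # True
```

In this sample the designation "bolt" is stored once in ITEM instead of once per order line, and joining the two tables on item_num gives back exactly the starting rows.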
• Keep in the initial table the attributes that depend directly on the key.
• Group in a new table the attributes that depend on the key transitively; the attribute through
which they depend is kept in the initial table as well, and becomes the primary key of the new table.
Note that Codd and many other specialists have shown that a data model in third normal
form is a "canonical form" of a dataset, and that it minimizes the redundancy
of the future database.
In our example, the previous ORDER table is not in third normal form because the
non-key attribute name depends on the key transitively, through customer_num:
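Before: ORDER table (order_num, date, customer_num, name), with primary key order_num
After: ORDER table (order_num, date, customer_num), with primary key order_num
After: CUSTOMER table (customer_num, name), with primary key customer_num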
The transition to third normal form leads us to replace the ORDER table with the tables ORDER
and CUSTOMER, as illustrated above.
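The effect of this decomposition can be sketched with Python's standard sqlite3 module. The column types, the sample rows and the NOT NULL and REFERENCES constraints are assumptions made for the illustration; only the table and column names come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (
        customer_num INTEGER PRIMARY KEY,
        name         TEXT NOT NULL
    );
    -- ORDER is an SQL keyword, hence the quoted identifier.
    CREATE TABLE "ORDER" (
        order_num    INTEGER PRIMARY KEY,
        date         TEXT NOT NULL,
        customer_num INTEGER NOT NULL REFERENCES CUSTOMER (customer_num)
    );
    INSERT INTO CUSTOMER VALUES (1, 'Durand'), (2, 'Martin');
    INSERT INTO "ORDER" VALUES
        (100, '2024-01-05', 1),
        (101, '2024-01-09', 1),
        (102, '2024-02-11', 2);
""")

# The name is stored once per customer, so a change touches a single row.
cur = conn.execute("UPDATE CUSTOMER SET name = 'Dupont' WHERE customer_num = 1")
updated_rows = cur.rowcount

# The four columns of the starting table are recovered by a join.
rows = conn.execute("""
    SELECT o.order_num, o.date, o.customer_num, c.name
    FROM "ORDER" AS o
    JOIN CUSTOMER AS c ON c.customer_num = o.customer_num
    ORDER BY o.order_num
""").fetchall()

print(updated_rows)  # 1
for row in rows:
    print(row)
```

In the table that was not in third normal form, the same name change would have to be repeated on every order of that customer, which is the kind of redundancy the decomposition removes.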
ASSIGN table (employee_number, project_number, nb_hours, unit_fab), with primary key (employee_number, project_number)
The transition to Boyce-Codd normal form leads us to replace this table with the
following tables:
ASSIGN table (employee_number, project_number, nb_hours), with primary key (employee_number, project_number)
UNIT_FAB table (unit_fab, project_number), with primary key unit_fab
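One way to see why the original ASSIGN table had to be split is to check, for each functional dependency, whether its left-hand side determines every attribute of the table. The sketch below does this with an attribute closure. The two dependencies are assumptions inferred from the keys of the tables above (they are not stated explicitly in this section), and the function names closure and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not set(attributes) <= closure(lhs, fds)
    ]


# Assumed dependencies, inferred from the keys of the tables above.
assign = {"employee_number", "project_number", "nb_hours", "unit_fab"}
fds = [
    ({"employee_number", "project_number"}, {"nb_hours", "unit_fab"}),
    ({"unit_fab"}, {"project_number"}),
]

violations = bcnf_violations(assign, fds)
print(len(violations))  # 1

# The UNIT_FAB table on its own has unit_fab as key, so nothing is reported.
unit_fab_table = {"unit_fab", "project_number"}
print(bcnf_violations(unit_fab_table, [fds[1]]))  # []
```

Under these assumptions the only dependency reported for the original table is unit_fab → project_number, whose left-hand side is not a key of ASSIGN.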