DB Lecture W09a Normalization
Databases Systems 1
Normalization
• The main objective is to deal with anomalies
• Normalization is not mandatory, but it is highly recommended
– A database designed without normalization will still work
– Skipping normalization during design, however, creates many
problems later
• The conclusion: go for normalization
– Normalization is the process of organizing data to minimize
redundancy.
• The goal of database normalization is
– to decompose relations that have anomalies
– to produce smaller, well-structured relations.
Normalization – Cont…
• Normalization usually involves
– dividing large tables into smaller (and less redundant) tables
– and defining relationships between them.
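As a rough illustration of "smaller tables plus relationships", the sketch below uses Python's built-in sqlite3 module. The table names (employee, assignment), column names, and sample rows are assumptions made up for this example; they do not come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Two smaller tables instead of one wide table (hypothetical schema).
conn.executescript("""
CREATE TABLE employee (
    eno   TEXT PRIMARY KEY,
    ename TEXT,
    title TEXT
);
CREATE TABLE assignment (
    eno TEXT REFERENCES employee(eno),  -- the relationship between the tables
    pno TEXT,
    PRIMARY KEY (eno, pno)
);
""")

# Employee facts are stored once, however many projects there are.
conn.execute("INSERT INTO employee VALUES ('E1', 'Ali', 'Engineer')")
conn.executemany("INSERT INTO assignment VALUES (?, ?)",
                 [("E1", "P1"), ("E1", "P2")])

# A join brings the employee and project data back together.
joined = conn.execute(
    "SELECT e.ename, a.pno FROM employee e "
    "JOIN assignment a ON a.eno = e.eno ORDER BY a.pno"
).fetchall()

# The relationship also guards consistency: an assignment for an
# unknown employee is rejected.
try:
    conn.execute("INSERT INTO assignment VALUES ('E9', 'P1')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here the foreign key is what "defines the relationship": it ties every assignment row to exactly one employee row.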
Normalization – Cont…
– Edgar F. Codd introduced the First Normal Form (1NF) in 1970.
– Codd went on to define the Second Normal Form (2NF) and
Third Normal Form (3NF) in 1971,
– Codd and Raymond F. Boyce defined the Boyce-Codd Normal
Form (BCNF) in 1974; BCNF is a stricter version of 3NF, not
the Fourth Normal Form.
– Higher normal forms (4NF, 5NF) were defined by other
theorists in subsequent years.
• Informally, a relational database table (the computerized
representation of a relation)
– is often described as "normalized" if it is in the Third Normal Form.
– Most 3NF tables are free of insertion, update, and deletion
anomalies.
Anomalies
• Anomaly (flaw)
– A situation that can make a database incorrect or
inconsistent.
• Anomalies occur easily if the DB is not designed
carefully
– They can be controlled if the designer is aware of them
• Types of Anomalies
1. Repetition Anomaly
2. Update Anomaly
3. Insertion Anomaly
4. Deletion Anomaly
Repetition Anomaly
• The ENAME, TITLE, SAL attribute values are repeated
for each project that the employee is involved in.
– Waste of space
– Complicates updates
[Figure: EMP relation]
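The repetition can be made concrete with a few made-up rows. In this minimal sketch only ENAME, TITLE, and SAL are named in the slide; the ENO and PNO columns and all values are assumptions for illustration.

```python
from collections import Counter

# Hypothetical EMP rows: (ENO, ENAME, TITLE, SAL, PNO)
emp = [
    ("E1", "J. Doe",   "Elect. Eng.", 40000, "P1"),
    ("E2", "M. Smith", "Analyst",     34000, "P1"),
    ("E2", "M. Smith", "Analyst",     34000, "P2"),
    ("E3", "A. Lee",   "Mech. Eng.",  27000, "P3"),
    ("E3", "A. Lee",   "Mech. Eng.",  27000, "P4"),
]

# Count how often each employee's ENAME/TITLE/SAL combination is stored.
copies = Counter((eno, ename, title, sal) for eno, ename, title, sal, _ in emp)

# Every copy beyond the first is redundant storage.
redundant = sum(n - 1 for n in copies.values())
print(f"{redundant} redundant copies of employee data")
```

Each extra project an employee joins adds another full copy of the same employee facts.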
Update Anomaly
• If an attribute in the Employees' Skills relation (say, the
address of an employee) is updated,
– multiple tuples have to be updated to reflect the change.
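A small sketch of what goes wrong when only some of those tuples are changed. The rows, column names, and addresses below are invented for illustration.

```python
# Hypothetical Employees' Skills rows: one row per (employee, skill),
# so the address is stored once per skill.
rows = [
    {"emp_id": 1, "address": "12 Old Rd", "skill": "Typing"},
    {"emp_id": 1, "address": "12 Old Rd", "skill": "Welding"},
    {"emp_id": 2, "address": "7 Hill St", "skill": "Typing"},
]

# A careless update that changes only the first matching tuple.
for row in rows:
    if row["emp_id"] == 1:
        row["address"] = "99 New Ave"
        break

def addresses_of(emp_id):
    """All distinct addresses currently stored for one employee."""
    return {r["address"] for r in rows if r["emp_id"] == emp_id}

# More than one address for the same employee means the data is inconsistent.
inconsistent = len(addresses_of(1)) > 1
```

After the partial update, employee 1 has two different addresses in the relation, and nothing in the data says which one is correct.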
Insertion Anomaly
• Until the new faculty member, Dr. Newsome, is assigned
to teach at least one course, his details cannot be
recorded
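One way to see this in practice is a combined faculty/course table whose key includes the course. The sketch below uses Python's sqlite3 module; the table and column names and the faculty id 407 are assumptions for illustration, not taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical combined table: the key includes course_code, so a
# faculty member cannot be stored without a course.
conn.execute("""
CREATE TABLE faculty_course (
    faculty_id   INTEGER NOT NULL,
    faculty_name TEXT,
    course_code  TEXT NOT NULL,
    PRIMARY KEY (faculty_id, course_code)
)
""")

try:
    conn.execute("INSERT INTO faculty_course VALUES (?, ?, ?)",
                 (407, "Dr. Newsome", None))  # no course assigned yet
    recorded = True
except sqlite3.IntegrityError:
    recorded = False

# With faculty kept in its own table, the same details can be stored
# before any course is assigned.
conn.execute(
    "CREATE TABLE faculty (faculty_id INTEGER PRIMARY KEY, faculty_name TEXT)"
)
conn.execute("INSERT INTO faculty VALUES (?, ?)", (407, "Dr. Newsome"))
stored = conn.execute("SELECT faculty_name FROM faculty").fetchall()
```

The first insert fails only because of how the table is shaped; splitting faculty from course assignments removes the obstacle.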
Deletion Anomaly
• If an engineer who is the only employee on a project leaves the
company, his personal information cannot be deleted without
also losing the information about that project.
– Many tuples may have to be deleted.
[Figure: EMP relation]
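The loss can be reproduced with a handful of made-up rows. The column layout (ENO, ENAME, PNO, PNAME) and the values are assumptions for this sketch.

```python
# Hypothetical EMP rows: (ENO, ENAME, PNO, PNAME)
emp = [
    ("E1", "J. Doe",   "P1", "Instrumentation"),
    ("E2", "M. Smith", "P2", "Database Develop."),
    ("E3", "A. Lee",   "P2", "Database Develop."),
]

projects_before = {row[3] for row in emp}

# E1 leaves the company, so every tuple about E1 is deleted.
emp = [row for row in emp if row[0] != "E1"]

projects_after = {row[3] for row in emp}

# Projects that vanished together with the employee.
lost = projects_before - projects_after
print("Lost project information:", lost)
```

Because E1 was the only employee on P1, deleting the employee also erased the only record that the project exists.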
What to do?
• Take each relation individually and improve it in terms of
the desired characteristics
• The major activity of normalization is decomposition
– The process of breaking a relation into smaller relations
– A popular approach is the Universal Relation Approach
• Start from a single (universal) relation and move towards more
relations, until no anomalies exist.
1. Consider the whole database as a single relation
2. Analyze and remove the anomalies:
the single relation is divided (decomposed) into two relations
3. Repeat step 2 for each resulting relation
4. Continue until no anomalies remain
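The four steps can be sketched as a loop. The lecture does not say how an anomaly is detected, so this sketch swaps in one common choice: a BCNF-style split driven by functional dependencies. It is simplified, because it tests only the left-hand sides of the given dependencies. The attribute names and dependencies in the example are assumptions.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """Return (lhs, extra) where lhs determines extra in rel but is not a key of rel."""
    for lhs, _ in fds:
        if lhs <= rel:
            determined = closure(lhs, fds) & rel
            if determined != rel and determined - lhs:
                return lhs, determined - lhs
    return None

def decompose(universal, fds):
    """Step 1: start with one relation. Steps 2 to 4: split until nothing is flagged."""
    todo, done = [frozenset(universal)], []
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            lhs, extra = violation
            todo.append(frozenset(lhs | extra))  # lhs plus what it determines
            todo.append(frozenset(rel - extra))  # the rest; lhs stays as the link
    return done

# Hypothetical universal relation and dependencies.
fds = [
    (frozenset({"ENO"}), frozenset({"ENAME", "TITLE"})),
    (frozenset({"TITLE"}), frozenset({"SAL"})),
    (frozenset({"ENO", "PNO"}), frozenset({"RESP"})),
]
result = decompose({"ENO", "ENAME", "TITLE", "SAL", "PNO", "RESP"}, fds)
```

Each pass divides one relation into two, as in step 2, and the loop stops once no relation is flagged, as in step 4.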
Normalization Issues
1. How do we decompose a schema into a desirable
normal form?
2. What criteria should the decomposed schemas follow in
order to preserve the semantics of the original schema?
A. Lossless Decomposition: when a relation is decomposed into
two relations, no information should be lost.
– i.e., it must be possible to obtain the original relation in terms of
i. Scheme of the relation
ii. Data of the relation
Normalization Issues – Cont…
– For example, if a relation originally has 15 attributes
• Rejoining the decomposed relations must yield the same 15 attributes
– If the relation originally has 100 records
• Rejoining the decomposed relations must yield the same 100 records
– Note: fewer or extra attributes / records
• count as loss of information
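A minimal way to test this criterion is to project a relation onto the two smaller schemes, join the pieces back, and compare with the original. The helper names and the sample rows below are assumptions for illustration.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed."""
    seen = {tuple((a, r[a]) for a in attrs) for r in rows}
    return [dict(t) for t in seen]

def natural_join(left, right):
    """Join rows that agree on all attributes the two sides share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {frozenset(r.items()) for r in rows}

# Hypothetical original relation.
original = [
    {"eno": "E1", "title": "Engineer", "pno": "P1"},
    {"eno": "E2", "title": "Engineer", "pno": "P2"},
]

# Splitting on eno: the join gives back exactly the original rows.
good = natural_join(project(original, ["eno", "title"]),
                    project(original, ["eno", "pno"]))

# Splitting on title: the join produces extra (spurious) rows.
bad = natural_join(project(original, ["eno", "title"]),
                   project(original, ["title", "pno"]))

print(len(original), len(good), len(bad))
```

The second split returns more records than the original held. Those extra records still count as lost information, since the true pairing of employee and project can no longer be recovered.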
Dependency Structure
• The normalization process is applied to DB / table schemes
– It is based on dependencies, i.e., starting from a universal relation
and decomposing it into 2, 4, 5, 6, etc. relations
• The question is
– On what basis does such decomposition take place?
Dependency
• A relation R defined on attributes A (A1, A2, A3 ..., An), if X
Form of Dependencies
• Functional Dependency (FD)
– The first three normal forms (1NF, 2NF, 3NF) and BCNF of
the normalization process depend on FDs
• Multi-valued Dependency (MVD)
– The Fourth Normal Form (4NF) is based on MVDs
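To make the idea of a functional dependency concrete, the sketch below uses the usual textbook reading: X determines Y if no two tuples agree on X but differ on Y. The function name and sample rows are assumptions. Data can only refute a dependency; whether it really holds is a property of the schema, not of one sample.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for r in rows:
        x = tuple(r[a] for a in lhs)
        y = tuple(r[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical rows.
rows = [
    {"eno": "E1", "ename": "J. Doe",   "title": "Engineer"},
    {"eno": "E2", "ename": "M. Smith", "title": "Engineer"},
    {"eno": "E1", "ename": "J. Doe",   "title": "Engineer"},
]

eno_determines_ename = fd_holds(rows, ["eno"], ["ename"])
title_determines_eno = fd_holds(rows, ["title"], ["eno"])
```

In this sample the employee number fixes the name, while the title does not fix the employee number, since two different employees share the title.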