Normalization
Normalization is a technique for producing a set of tables with desirable properties that support the
requirements of a user or company. It is useful in database design because it
checks the structure of tables created from an ER model. Normalization is often
performed as a series of tests on a table to determine whether it satisfies or violates the rules
for a given normal form.
There are several normal forms, although the most commonly used ones are called first
normal form (1NF), second normal form (2NF), and third normal form (3NF). All these
normal forms are based on rules about relationships among the columns of a table. Badly
structured tables that contain redundant data can potentially suffer from problems called
update anomalies. Badly structured tables may occur due to errors in the original ER model or
in the process of translating the ER model into tables.
A major aim of relational database design is to group columns into tables to minimize data
redundancy and reduce the file storage space required by the implemented base tables. Tables
that have redundant data may have problems called update anomalies, which are classified as
insertion, deletion, or modification anomalies.
Insertion anomalies
You may wish to insert some data but the structure of the relation may not allow it, or may
cause an inconsistency in the data. There are two main types of insertion anomalies:
To insert the details of a new member of staff located at a given branch into the StaffBranch
table, we must also enter the correct details for that branch. For example, to insert the details
of a new member of staff at branch B002, we must enter the correct details of branch B002 so
that the branch details are consistent with values for branch B002 in other records of the
StaffBranch table.
To insert details of a new branch that currently has no members of staff into the StaffBranch
table, it's necessary to enter nulls into the staff-related columns, such as staffNo. However, as
staffNo is the primary key for the StaffBranch table, attempting to enter nulls for staffNo
violates entity integrity, and is not allowed.
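Both insertion anomalies can be reproduced with a small in-memory SQLite table. This is only a sketch: the column list for StaffBranch, the staff numbers, names, addresses, and telephone numbers are all invented for illustration, and B005 is a hypothetical new branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike the SQL standard,
# accepts NULL in a non-INTEGER PRIMARY KEY column unless told otherwise.
conn.execute(
    """
    CREATE TABLE StaffBranch (
        staffNo       TEXT PRIMARY KEY NOT NULL,
        name          TEXT,
        branchNo      TEXT,
        branchAddress TEXT,
        telNo         TEXT
    )
    """
)
# Invented sample record for an existing member of staff at branch B002.
conn.execute(
    "INSERT INTO StaffBranch VALUES "
    "('S0001', 'Staff A', 'B002', '2 Second Street', '555-0202')"
)

# Type 1: a new member of staff at B002 is entered with a mistyped address.
# The table accepts it, so B002 now has two different addresses.
conn.execute(
    "INSERT INTO StaffBranch VALUES "
    "('S0002', 'Staff B', 'B002', '2 Secnod Street', '555-0202')"
)
b002_addresses = {
    row[0]
    for row in conn.execute(
        "SELECT branchAddress FROM StaffBranch WHERE branchNo = 'B002'"
    )
}

# Type 2: a new branch with no staff has no staffNo to supply,
# so the insert is rejected on entity-integrity grounds.
try:
    conn.execute(
        "INSERT INTO StaffBranch VALUES "
        "(NULL, NULL, 'B005', '5 Fifth Street', '555-0505')"
    )
    branch_inserted = True
except sqlite3.IntegrityError:
    branch_inserted = False
```

After running this, b002_addresses holds two distinct values for the same branch, and branch_inserted is False because the branch could not be recorded without a member of staff.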
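Deletion anomalies
If we delete the record of the last member of staff located at a branch from the StaffBranch
table, the details of that branch are also lost from the database.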
Modification anomalies
If we want to change the value of one of the columns of a particular branch in the
StaffBranch table, for example the telephone number for branch B001, we must update the records
of all staff located at that branch. If this modification is not carried out on all the appropriate
records of the StaffBranch table, the database will become inconsistent.
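The effect of an incomplete modification can be sketched with SQLite. The reduced column list, the staff numbers, and the telephone numbers below are made up for illustration, and inconsistent_branches is a hypothetical helper, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StaffBranch ("
    "staffNo TEXT PRIMARY KEY NOT NULL, branchNo TEXT, telNo TEXT)"
)
# Invented sample data: two members of staff at B001, one at B002.
conn.executemany(
    "INSERT INTO StaffBranch VALUES (?, ?, ?)",
    [
        ("S0001", "B001", "555-0101"),
        ("S0002", "B001", "555-0101"),
        ("S0003", "B002", "555-0202"),
    ],
)


def inconsistent_branches(conn):
    """Return the branchNo values that carry more than one telephone number."""
    rows = conn.execute(
        "SELECT branchNo FROM StaffBranch "
        "GROUP BY branchNo HAVING COUNT(DISTINCT telNo) > 1 "
        "ORDER BY branchNo"
    )
    return [row[0] for row in rows]


# Incomplete modification: only one of the two B001 records is changed.
conn.execute("UPDATE StaffBranch SET telNo = '555-0199' WHERE staffNo = 'S0001'")
broken = inconsistent_branches(conn)

# Complete modification: every record for the branch is changed.
conn.execute("UPDATE StaffBranch SET telNo = '555-0199' WHERE branchNo = 'B001'")
fixed = inconsistent_branches(conn)
```

Here broken lists B001 as inconsistent after the partial update, and fixed is empty once every B001 record has been updated.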
Only first normal form (1NF) is critical in creating appropriate tables for relational databases.
All the subsequent normal forms are optional. However, to avoid the update anomalies, it's
normally recommended that you proceed to third normal form (3NF).
First normal form (1NF)
A table in which the intersection of every column and record contains only one value.
Converting to 1NF
To convert this version of the Branch table to 1NF, we remove the telNos column from the
Branch table and place its values, along with a copy of the primary key of the Branch table
(branchNo), in a separate table called BranchTelephone, with one telephone number per record.
The primary key for the new BranchTelephone table is the new telNo column.
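The conversion can be sketched in Python with SQLite. The sample rows, the branchAddress column, and the comma-separated layout of telNos are assumptions made for illustration; only the table names Branch and BranchTelephone and the columns branchNo, telNos, and telNo come from the text.

```python
import sqlite3

# Hypothetical unnormalized rows: telNos packs several numbers into one cell.
unnormalized_branch = [
    ("B001", "1 First Street", "555-0101, 555-0102, 555-0103"),
    ("B002", "2 Second Street", "555-0201, 555-0202"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Branch ("
    "branchNo TEXT PRIMARY KEY NOT NULL, branchAddress TEXT)"
)
conn.execute(
    "CREATE TABLE BranchTelephone ("
    "telNo TEXT PRIMARY KEY NOT NULL, "
    "branchNo TEXT NOT NULL REFERENCES Branch(branchNo))"
)

for branch_no, address, tel_nos in unnormalized_branch:
    # Branch keeps its single-valued columns.
    conn.execute("INSERT INTO Branch VALUES (?, ?)", (branch_no, address))
    # Each telephone number becomes its own record, tagged with branchNo.
    for tel_no in tel_nos.split(","):
        conn.execute(
            "INSERT INTO BranchTelephone VALUES (?, ?)",
            (tel_no.strip(), branch_no),
        )

b001_numbers = [
    row[0]
    for row in conn.execute(
        "SELECT telNo FROM BranchTelephone WHERE branchNo = 'B001' ORDER BY telNo"
    )
]
total_numbers = conn.execute("SELECT COUNT(*) FROM BranchTelephone").fetchone()[0]
```

Every cell in both tables now holds a single value, and the numbers for a branch are recovered by filtering or joining on branchNo.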
Second normal form (2NF)
A relation is in 2NF if and only if it is in 1NF and every non-key attribute is fully dependent
on the primary key.
Equivalently, a table is in 2NF if it is in first normal form and every non-primary-key column
is fully functionally dependent on the primary key. Full functional dependency indicates that if
A and B are columns of a table, B is fully functionally dependent on A if B is functionally
dependent on A (A → B) but not on any proper subset of A. If B is dependent on a proper subset
of A, this is referred to as a partial dependency. If a partial dependency exists on the primary
key, the table is not in 2NF. The partial dependency must be removed for a table to achieve 2NF.
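A rough way to look for partial dependencies in sample data is sketched below. The functions fd_holds and partial_dependencies are hypothetical helpers, and the rows (a table with the composite key staffNo, branchNo) are invented. Note that sample rows can only refute a dependency; whether one truly holds is a matter of what the data means, so a result here is a hint, not a proof.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if equal values in the lhs columns always go with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # setdefault stores the first rhs value seen for this key and
        # returns the stored one on later rows, so a mismatch refutes the FD.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, primary_key, non_key_columns):
    """List (subset, column) pairs where column depends on a proper subset of the key."""
    found = []
    for size in range(1, len(primary_key)):
        for subset in combinations(primary_key, size):
            for col in non_key_columns:
                if fd_holds(rows, subset, col):
                    found.append((subset, col))
    return found


# Invented sample rows; the primary key is (staffNo, branchNo).
rows = [
    {"staffNo": "S0001", "branchNo": "B002", "name": "Staff A", "hoursPerWeek": 16},
    {"staffNo": "S0001", "branchNo": "B004", "name": "Staff A", "hoursPerWeek": 9},
    {"staffNo": "S0002", "branchNo": "B002", "name": "Staff B", "hoursPerWeek": 14},
    {"staffNo": "S0002", "branchNo": "B004", "name": "Staff B", "hoursPerWeek": 10},
]

partials = partial_dependencies(
    rows, ("staffNo", "branchNo"), ["name", "hoursPerWeek"]
)
```

In this sample, name depends on staffNo alone, so the table would not be in 2NF; moving name to a table keyed by staffNo would remove the partial dependency, while hoursPerWeek stays with the full key.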
Third normal form (3NF)
A relation is in 3NF if and only if it is in 2NF and the non-key attributes are mutually
independent, i.e., no non-primary-key attribute is transitively dependent on the primary key.
Transitive dependency is a type of functional dependency: if A, B, and C are columns of a table
such that A → B and B → C, then C is transitively dependent on A via B (provided that A is not
functionally dependent on B or C).
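As an illustration, suppose a StaffBranch table in which staffNo determines branchNo and branchNo determines branchAddress, so branchAddress depends on the key only through branchNo. The sketch below removes that dependency by splitting the table in two. The column list and all data values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StaffBranch (
        staffNo       TEXT PRIMARY KEY NOT NULL,
        name          TEXT,
        branchNo      TEXT,
        branchAddress TEXT
    );
    -- Invented sample data: the B001 address is stored twice.
    INSERT INTO StaffBranch VALUES
        ('S0001', 'Staff A', 'B001', '1 First Street'),
        ('S0002', 'Staff B', 'B001', '1 First Street'),
        ('S0003', 'Staff C', 'B002', '2 Second Street');

    -- Columns that depend on branchNo move to a table keyed by branchNo.
    CREATE TABLE Branch (
        branchNo      TEXT PRIMARY KEY NOT NULL,
        branchAddress TEXT
    );
    INSERT INTO Branch
        SELECT DISTINCT branchNo, branchAddress FROM StaffBranch;

    -- Staff keeps branchNo as a foreign key to Branch.
    CREATE TABLE Staff (
        staffNo  TEXT PRIMARY KEY NOT NULL,
        name     TEXT,
        branchNo TEXT REFERENCES Branch(branchNo)
    );
    INSERT INTO Staff
        SELECT staffNo, name, branchNo FROM StaffBranch;
    """
)

original = conn.execute(
    "SELECT staffNo, name, branchNo, branchAddress "
    "FROM StaffBranch ORDER BY staffNo"
).fetchall()
rejoined = conn.execute(
    "SELECT s.staffNo, s.name, s.branchNo, b.branchAddress "
    "FROM Staff s JOIN Branch b ON b.branchNo = s.branchNo "
    "ORDER BY s.staffNo"
).fetchall()
branch_count = conn.execute("SELECT COUNT(*) FROM Branch").fetchone()[0]
```

Joining Staff and Branch on branchNo reproduces the original records, while each branch address is now stored once, so a change to it is made in a single record.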