Topic 6- Normalization
Database normalization is the process of organizing the fields and tables of a relational database
to minimize redundancy. Normalization usually involves dividing large tables into smaller (and
less redundant) tables and defining relationships between them.
Normalization of data can be defined as a process during which the existing tables of a database
are tested to find certain data dependencies between the columns and the rows. Normalization can
also be described as a formal technique for turning preliminary data structures into efficient,
easy-to-maintain data structures. Whenever normalization detects a dependency in a table, the
table is restructured into multiple tables (usually two) that eliminate that column dependency.
If a data dependency is still exhibited, the process is repeated until all such dependencies are
eliminated. The process of eliminating data redundancy is based upon a theory called functional
dependency.
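A functional dependency X → Y holds in a table when any two rows that agree on the X columns also agree on the Y columns. The short Python sketch below tests that rule on a table represented as a list of dictionaries; the function name `holds_fd` and the sample rows are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def holds_fd(rows: List[Row], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in `rows`."""
    lhs_cols, rhs_cols = tuple(lhs), tuple(rhs)
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs_cols)
        value = tuple(row[c] for c in rhs_cols)
        # Remember the first rhs value seen for this lhs value; any later
        # row with the same lhs but a different rhs violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows, for illustration only.
rows = [
    {"Emp-Id": "E01", "Bank-Id": "B01", "Bank-Name": "SBI"},
    {"Emp-Id": "E02", "Bank-Id": "B02", "Bank-Name": "UTI"},
    {"Emp-Id": "E03", "Bank-Id": "B01", "Bank-Name": "SBI"},
]
print(holds_fd(rows, ["Bank-Id"], ["Bank-Name"]))  # True
print(holds_fd(rows, ["Bank-Id"], ["Emp-Id"]))     # False: B01 maps to E01 and E03
```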
Normalization is the process of applying a number of rules to the tables that have been
identified, in order to simplify them. The aim is to highlight dependencies between the various
data items so that these dependencies can be reduced.
Importance of normalization
It highlights constraints and dependencies in the data and hence aids understanding of the nature of the data.
Normalization controls data redundancy, which reduces storage requirements and eases routine maintenance.
Normalization provides unique identification for records in a database.
Each stage of the normalization process eliminates a particular type of undesirable dependency.
Normalization permits simple data retrieval in response to reports and queries.
The third normal form produces a well-designed database that provides a higher degree of independence.
Normalization helps define efficient data structures.
Performing normalization
While designing a database from an entity–relationship model, the main problem in that "raw"
database is redundancy. Redundancy is storing the same data item in more than one place.
Redundancy creates several problems, such as the following:
1. Extra storage space: storing the same data in many places takes a large amount of disk space.
2. The same data must be entered more than once during data insertion.
3. Data must be deleted from more than one place during deletion.
4. Data must be modified in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification, etc. are not done
properly. This creates inconsistency and unreliability in the database.
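The modification problems in items 4 and 5 can be reproduced with a few lines of Python. This is a sketch under assumptions: the table, its values, the new bank name, and the helper names (`rename_bank_first_row_only`, `names_per_bank`) are all hypothetical. Because the bank name is stored on every row, an update that misses one copy leaves the table contradicting itself.

```python
from typing import Dict, List, Set

Row = Dict[str, str]

# Hypothetical redundant table: the bank name is repeated on every row.
sales: List[Row] = [
    {"Emp-Id": "E01", "Month": "Jan", "Bank-Id": "B01", "Bank-Name": "SBI"},
    {"Emp-Id": "E01", "Month": "Feb", "Bank-Id": "B01", "Bank-Name": "SBI"},
    {"Emp-Id": "E02", "Month": "Jan", "Bank-Id": "B01", "Bank-Name": "SBI"},
]


def rename_bank_first_row_only(rows: List[Row], bank_id: str, new_name: str) -> None:
    """Simulate a careless update that changes only the first matching row."""
    for row in rows:
        if row["Bank-Id"] == bank_id:
            row["Bank-Name"] = new_name
            return


def names_per_bank(rows: List[Row]) -> Dict[str, Set[str]]:
    """Collect every distinct Bank-Name recorded for each Bank-Id."""
    names: Dict[str, Set[str]] = {}
    for row in rows:
        names.setdefault(row["Bank-Id"], set()).add(row["Bank-Name"])
    return names


rename_bank_first_row_only(sales, "B01", "State Bank")
# One bank now has two conflicting names: a modification anomaly.
print({bank: sorted(names) for bank, names in names_per_bank(sales).items()})
# {'B01': ['SBI', 'State Bank']}
```

In a normalized design the bank name would be stored once, in its own table, so there would be only one copy to update.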
To solve this problem, the "raw" database needs to be normalized. This is a step-by-step process
of removing different kinds of redundancy and anomaly at each step. At each step a specific rule
is followed to remove a specific kind of impurity in order to give the database a slim and clean
look.
In the sample table above, there are multiple rows under each Emp-Id key. Although it is
considered to be the primary key, Emp-Id cannot uniquely identify any single row. Further, each
primary key value points to a variable-length record (3 rows for E01, 2 for E02 and 4 for E03).
First Normal Form (1NF)
A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a
unique combination of values. The above table in UNF can be processed to create the following
table in 1NF.
As you can see now, each row contains a unique combination of values. Unlike in UNF, this
relation contains only atomic values, i.e. the values cannot be further decomposed, so the relation
is now in 1NF.
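The UNF-to-1NF step can be sketched in Python by flattening a repeating group into one row per value. The records, employee names and sales figures below are hypothetical (the notes' sample table is not reproduced here), and `to_1nf` is an illustrative helper, not a standard function.

```python
from typing import Dict, List, Union

Value = Union[str, int]

# Hypothetical UNF records: one record per employee with a repeating group
# of (month, sales) pairs, so records have different lengths.
unf: List[Dict] = [
    {"Emp-Id": "E01", "Emp-Name": "Emp One",
     "Sales": [("Jan", 1000), ("Feb", 1200), ("Mar", 850)]},
    {"Emp-Id": "E02", "Emp-Name": "Emp Two",
     "Sales": [("Jan", 2200), ("Feb", 2500)]},
]


def to_1nf(records: List[Dict]) -> List[Dict[str, Value]]:
    """Flatten the repeating 'Sales' group into one row per (employee, month)."""
    rows = []
    for rec in records:
        for month, amount in rec["Sales"]:
            rows.append({"Emp-Id": rec["Emp-Id"], "Emp-Name": rec["Emp-Name"],
                         "Month": month, "Sales": amount})
    return rows


rows_1nf = to_1nf(unf)
combos = {tuple(row.values()) for row in rows_1nf}
print(len(rows_1nf), len(combos))            # 5 5: every row is a distinct combination
print(len({r["Emp-Id"] for r in rows_1nf}))  # 2: Emp-Id alone still repeats
```

Note that Emp-Id still repeats across rows in the flattened table, which is why it cannot identify a single row on its own.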
Second Normal Form (2NF)
A relation is said to be in 2NF if it is already in 1NF and every non-key attribute fully depends
on the primary key of the relation. Speaking inversely, if a table has some attributes that are not
fully dependent on the primary key of that table, then it is not in 2NF.
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and
Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is not
the primary key, so Bank-Id and Bank-Name are moved into a separate relation:
Bank-Id Bank-Name
B01 SBI
B02 UTI
After moving this portion into another relation, we store a smaller amount of data in the two
relations without any loss of information. There is also a significant reduction in redundancy.
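The split described above amounts to two projections, and joining them on Bank-Id should give back the original rows (a lossless decomposition). The sketch below assumes a reduced, hypothetical version of the table (Emp-Name is omitted and the sales figures are invented); only the Bank-Id/Bank-Name pairs B01/SBI and B02/UTI come from the notes. `project`, `natural_join` and `canonical` are illustrative helpers.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

# Hypothetical 1NF rows; Bank-Name depends on Bank-Id rather than on Emp-Id.
emp_sales: List[Row] = [
    {"Emp-Id": "E01", "Month": "Jan", "Sales": 1000, "Bank-Id": "B01", "Bank-Name": "SBI"},
    {"Emp-Id": "E01", "Month": "Feb", "Sales": 1200, "Bank-Id": "B01", "Bank-Name": "SBI"},
    {"Emp-Id": "E02", "Month": "Jan", "Sales": 2200, "Bank-Id": "B02", "Bank-Name": "UTI"},
]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Keep only `attrs` and drop duplicate rows (relational projection)."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Combine every pair of rows that agree on the shared attribute `on`."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[on] == rrow[on]]


def canonical(rows: List[Row]) -> List[Tuple]:
    """Order-independent form of a relation, for comparing two relations."""
    return sorted(tuple(sorted(row.items())) for row in rows)


employee = project(emp_sales, ["Emp-Id", "Month", "Sales", "Bank-Id"])
bank = project(emp_sales, ["Bank-Id", "Bank-Name"])
print(bank)  # [{'Bank-Id': 'B01', 'Bank-Name': 'SBI'}, {'Bank-Id': 'B02', 'Bank-Name': 'UTI'}]

# Lossless: joining the two relations rebuilds exactly the original rows.
print(canonical(natural_join(employee, bank, "Bank-Id")) == canonical(emp_sales))  # True
```

The bank name now appears once per bank instead of once per sales row, which is the reduction in redundancy the notes describe.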
Third Normal Form (3NF)
A relation is said to be in 3NF if it is already in 2NF and no non-key attribute is transitively
dependent on the primary key. Speaking inversely, if a table contains a transitive dependency, then
it is not in 3NF, and the table must be split to bring it into 3NF.
What is a transitive dependency? Within a relation, if we see
A → B [B depends on A]
and
B → C [C depends on B]
then we may derive
A → C [C depends on A]
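Deriving A → C from A → B and B → C can be done mechanically with the standard attribute-closure algorithm: a dependency X → Y follows from a set of dependencies exactly when every attribute of Y appears in the closure of X. The Python sketch below is a minimal version of that algorithm; the helper names `fd` and `closure` are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build a dependency from space-separated names, e.g. fd("A", "B") for A -> B."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If all of lhs is already determined, rhs is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [fd("A", "B"), fd("B", "C")]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']: so A -> C follows
```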
Such derived dependencies always hold; this is the transitivity rule for functional dependencies. For example, if we have
Roll → Marks
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information
that Rao is the Head of Department of Chemistry.
The relation is normalized by creating a new relation for Dept. and Head of Dept. and deleting
Head of Dept. from the given relation. The normalized relations are shown in the following.
Physics Ghosh
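The deletion problem described above, and how the split avoids it, can be simulated in Python. This is a sketch with assumed data: only the department/head pairs Chemistry/Rao and Physics/Ghosh come from the notes; the professor assignments and the reduced column set (Professor, Dept, Head) are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical rows (Professor, Dept, Head). Only the pairs Chemistry/Rao
# and Physics/Ghosh come from the notes; the assignments are assumed.
assignments: List[Row] = [
    {"Professor": "P1", "Dept": "Physics", "Head": "Ghosh"},
    {"Professor": "P2", "Dept": "Chemistry", "Head": "Rao"},
    {"Professor": "P2", "Dept": "Physics", "Head": "Ghosh"},
]


def heads(rows: List[Row]) -> Dict[str, str]:
    """Map each department to its head, as far as `rows` records it."""
    return {row["Dept"]: row["Head"] for row in rows}


# Single table: deleting P2's rows also deletes the fact that Rao heads Chemistry.
remaining = [row for row in assignments if row["Professor"] != "P2"]
print(heads(remaining))  # {'Physics': 'Ghosh'}

# Split design: Dept -> Head is stored once, in its own relation.
dept = heads(assignments)
prof_dept = [{"Professor": r["Professor"], "Dept": r["Dept"]} for r in assignments]
prof_dept = [r for r in prof_dept if r["Professor"] != "P2"]  # P2 resigns
print(dept)  # {'Physics': 'Ghosh', 'Chemistry': 'Rao'}: Rao is still recorded
```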
A multivalued dependency exists here because all the attributes depend upon one another, and yet
none of them is a primary key having a unique value.
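The notes describe a multivalued dependency informally. Formally, X ↠ Y holds when, for any two rows that agree on X, the row that takes its X and Y values from the first and its remaining values from the second is also in the table. The Python sketch below checks that condition directly; `holds_mvd` and the Course/Teacher/Book sample rows are hypothetical illustrations, not taken from the notes.

```python
from itertools import product
from typing import Dict, List, Sequence

Row = Dict[str, str]


def holds_mvd(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """Return True if the multivalued dependency x ->> y holds in `rows`."""
    if not rows:
        return True
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]  # all other attributes
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            # Take x and y values from t1 and the remaining values from t2.
            t3 = {a: t1[a] for a in list(x) + list(y)}
            t3.update({a: t2[a] for a in z})
            if tuple(t3[a] for a in attrs) not in present:
                return False
    return True


# Hypothetical rows: each course's teachers and books are independent of each other.
course_rows: List[Row] = [
    {"Course": "DB", "Teacher": "T1", "Book": "B1"},
    {"Course": "DB", "Teacher": "T1", "Book": "B2"},
    {"Course": "DB", "Teacher": "T2", "Book": "B1"},
    {"Course": "DB", "Teacher": "T2", "Book": "B2"},
]
print(holds_mvd(course_rows, ["Course"], ["Teacher"]))      # True
print(holds_mvd(course_rows[:3], ["Course"], ["Teacher"]))  # False: (DB, T2, B2) missing
```

As in the notes' description, no single attribute in this sample identifies a row on its own.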
Revision quiz:
1. The table below is the unnormalized table used by a movie vendor to manage their movie
distribution. Normalize the table up to the third normal form (7 marks) (2019 July)
NAME       PHYSICAL ADDRESS    MOVIE TITLE                                            TITLE
Jean Zani  Mashujaa Road       The life of a politician, bits and bytes of computers  Ms.
Sam Lemi   Nairobi East Ways   The valley of life, Cutting the Roses                  Mr.
2. With the aid of an example, explain the term functional dependency as used in database
normalization (3 marks)
3. The following are two relations in a database named persons and orders. Use the
information to answer the questions that follow.
Person table
P_Id  LastName  FirstName  Address        City
1     James     Katute     15 Streets     Nairobi
2     Smith     Nekesa     10 Avenue      Nairobi
3     Kristen   Oliya      Makuba street  Kiambu
Orders table
O_Id OrderNo P_Id
1 77895 2
3 22456 2
4 24562 1
i) With the aid of an example from the table, explain the foreign key constraints in
the database (3 marks) (2019 July)
Normalize the table above to the second normal form (7 marks) (2018 Nov)
8. The table below is an employee table represented in 1NF. Use it to answer the questions
that follow
Employee ID  Contract No  Hours  Employee Name  Company ID  Company Location
616681B      SC1025       72     P.White        SC115       Nairobi
674315A      SC1025       48     R.Press        SC115       Nairobi
323113B      SC1026       24     P.Smith        SC23        Nairobi
616687B      SC1026       24     P.White        SC23        Nairobi
Represent the table in its 3NF (8 marks) (2017 Nov)
12. The table below shows a student's results slip. Use it to answer the question that follows.
Normalize the table to 3NF.
13.