Database Normalization
Database normalization is a technique for organizing the data in a database. It is a
systematic approach of decomposing tables to eliminate data redundancy (repetition) and
undesirable characteristics such as insertion, update, and deletion anomalies. It is a multi-step
process that puts data into tabular form and removes duplicated data from the relation tables.
Normalization serves two main purposes:
- Eliminating redundant (duplicated) data.
- Ensuring data dependencies make sense, i.e. data is logically stored.
Why Normalize a Database?
An anomaly is an unintended inconsistency in the data that arises when records are inserted,
updated, or deleted. Anomalies can occur if a database is not normalized.
We’ll be using a student database as an example, which records student, class, and teacher
information.
Student ID | Student Name | Fees Paid | Course Name | Class 1 | Class 2 | Class 3
This table is not normalized, and it has a few issues.
Insert Anomaly
An insert anomaly happens when we try to insert a record into this table without knowing all
of the data the record requires.
For example, suppose we want to add a new student but do not know their course name.
The new record would look like this:
Student ID | Student Name  | Fees Paid | Course Name      | Class 1     | Class 2        | Class 3
1          | John Smith    | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Griffin | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2   |                |
4          | Matt Long     | 850       | Dentistry        |             |                |
5          | Jared Oldham  | 0         | ?                |             |                |
We would be adding incomplete data to our table, which can cause issues when
trying to analyse this data.
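The insert anomaly above can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The table name student_flat, the column names, and the function name are assumptions made for illustration; the source only gives the column headings.

```python
import sqlite3


def insert_student_without_course():
    """Insert Jared Oldham into the flat table without a course name."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical flat table mirroring the example's column headings.
    conn.execute(
        """
        CREATE TABLE student_flat (
            student_id   INTEGER PRIMARY KEY,
            student_name TEXT,
            fees_paid    INTEGER,
            course_name  TEXT,
            class_1      TEXT,
            class_2      TEXT,
            class_3      TEXT
        )
        """
    )
    # The course and classes are unknown, so they are simply left out.
    conn.execute(
        "INSERT INTO student_flat (student_id, student_name, fees_paid) "
        "VALUES (?, ?, ?)",
        (5, "Jared Oldham", 0),
    )
    row = conn.execute(
        "SELECT student_name, course_name, class_1 "
        "FROM student_flat WHERE student_id = 5"
    ).fetchone()
    conn.close()
    return row


print(insert_student_without_course())
```

The returned row holds None for the course and class columns: the insert succeeds, but the record is incomplete, which is the situation described above.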
Update Anomaly
An update anomaly happens when we want to update a value that is stored in several places,
and we update some of the copies but not others.
For example, suppose the class Biology 1 was renamed to “Intro to Biology”. We would have to
check every column that could hold a class name (Class 1, Class 2, and Class 3) and rename
each occurrence that was found.
Student ID | Student Name  | Fees Paid | Course Name      | Class 1          | Class 2          | Class 3
1          | John Smith    | 200       | Economics        | Economics 1      | Intro to Biology |
2          | Maria Griffin | 500       | Computer Science | Intro to Biology | Business Intro   | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2        |                  |
4          | Matt Long     | 850       | Dentistry        |                  |                  |
There is a risk that we miss a value, which would leave the table with two different names for
the same class.
Ideally, we would only update the value once, in one location.
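As a rough sketch of the update anomaly, the following Python code (standard-library sqlite3) renames a class in the flat table. The table and function names are hypothetical; the rows are taken from the example table. Renaming in only one class column leaves a stale copy behind, and a full rename needs one UPDATE per column.

```python
import sqlite3

# Every column that may hold a class name in the flat table.
CLASS_COLUMNS = ("class_1", "class_2", "class_3")


def build_flat_table():
    """Create the example table in memory and load the four students."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student_flat ("
        "student_id INTEGER PRIMARY KEY, student_name TEXT, fees_paid INTEGER, "
        "course_name TEXT, class_1 TEXT, class_2 TEXT, class_3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO student_flat VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "John Smith", 200, "Economics", "Economics 1", "Biology 1", None),
            (2, "Maria Griffin", 500, "Computer Science",
             "Biology 1", "Business Intro", "Programming 2"),
            (3, "Susan Johnson", 400, "Medicine", "Biology 2", None, None),
            (4, "Matt Long", 850, "Dentistry", None, None, None),
        ],
    )
    return conn


def rename_class(conn, old_name, new_name, columns=CLASS_COLUMNS):
    """Rename a class in the given columns; return how many cells changed."""
    changed = 0
    for column in columns:
        # Column names come from the fixed tuple above, never from user input.
        cursor = conn.execute(
            f"UPDATE student_flat SET {column} = ? WHERE {column} = ?",
            (new_name, old_name),
        )
        changed += cursor.rowcount
    return changed


def count_class(conn, name):
    """Count rows that still mention the class name in any class column."""
    where = " OR ".join(f"{column} = ?" for column in CLASS_COLUMNS)
    query = f"SELECT COUNT(*) FROM student_flat WHERE {where}"
    return conn.execute(query, (name,) * len(CLASS_COLUMNS)).fetchone()[0]
```

Calling rename_class(conn, "Biology 1", "Intro to Biology", columns=("class_1",)) updates Maria Griffin's row but not John Smith's, so count_class(conn, "Biology 1") still reports one row with the old name.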
Delete Anomaly
A delete anomaly occurs when we want to delete data from the table, but we end up deleting
more than what we intended.
For example, let’s say Susan Johnson quits and her record needs to be deleted from the system.
We could delete her row (Student ID 3).
However, if we delete this row, we also lose the record of the Biology 2 class, because it is not
stored anywhere else. The same is true of the Medicine course.
We should be able to delete one type of data or one record without affecting other
records we do not want to delete.
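The delete anomaly can be sketched in Python with sqlite3 by comparing the flat table against a split layout. The split layout (student, class, and student_class tables) is an assumption made for illustration, not a schema given in the source, and all table and function names are hypothetical.

```python
import sqlite3


def delete_from_flat_table():
    """Delete Susan Johnson from the flat table; count surviving Biology 2 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student_flat ("
        "student_id INTEGER PRIMARY KEY, student_name TEXT, "
        "course_name TEXT, class_1 TEXT)"
    )
    conn.executemany(
        "INSERT INTO student_flat VALUES (?, ?, ?, ?)",
        [
            (1, "John Smith", "Economics", "Economics 1"),
            (3, "Susan Johnson", "Medicine", "Biology 2"),
        ],
    )
    conn.execute("DELETE FROM student_flat WHERE student_id = 3")
    remaining = conn.execute(
        "SELECT COUNT(*) FROM student_flat WHERE class_1 = 'Biology 2'"
    ).fetchone()[0]
    conn.close()
    return remaining


def delete_from_split_tables():
    """Delete Susan Johnson when classes are stored in their own table."""
    conn = sqlite3.connect(":memory:")
    # Assumed layout: one table per kind of data, plus a linking table.
    conn.executescript(
        """
        CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT);
        CREATE TABLE class (class_id INTEGER PRIMARY KEY, class_name TEXT);
        CREATE TABLE student_class (student_id INTEGER, class_id INTEGER);
        INSERT INTO student VALUES (3, 'Susan Johnson');
        INSERT INTO class VALUES (1, 'Biology 2');
        INSERT INTO student_class VALUES (3, 1);
        """
    )
    # Remove the student and her enrolment; the class row is left alone.
    conn.execute("DELETE FROM student_class WHERE student_id = 3")
    conn.execute("DELETE FROM student WHERE student_id = 3")
    remaining = conn.execute(
        "SELECT COUNT(*) FROM class WHERE class_name = 'Biology 2'"
    ).fetchone()[0]
    conn.close()
    return remaining
```

In the flat table the Biology 2 class disappears together with the student's row, while in the split layout the class row remains after the student is removed.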
Without any normalization, all information is stored in one table, like the student table above.
1NF Example
2NF (Second Normal Form) Rules
- Rule 1: Be in 1NF
- Rule 2: Have a single-column primary key, so that no non-key column can depend on only part of the key
Table 2
3NF (Third Normal Form) Rules
- Be in 2NF
Table 2
SUMMARY
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics such as insertion, update, and deletion
anomalies.
It is a multi-step process that puts data into tabular form and removes duplicated data from the
relation tables.