Database Normalization Tutorial
Database Normalization Tutorial
Conclusion
What Is Database Normalization?
Database normalization or SQL normalization helps us to group related data in
one single table. Any attributive data or indirectly related data are put in
Page 1 of 14
different tables and these tables are connected with a logical relationship
between parent and child tables.
In 1970, Edgar F. Codd came up with the concept of normalization. He shared a
paper named “A Relational Model of Data for Large Shared Banks” in which he
proposed “First Normal Form (1NF)”.
Advantages Of DBMS Normalization
Database Normalization provides the following basic advantages:
1. Normalization increases data consistency as it avoids the duplicity of data
by storing the data in one place only.
2. Normalization helps in grouping like or related data under the same schema,
thereby resulting in the better grouping of data.
3. Normalization improves searching faster as indexes can be created faster.
Hence, the normalized database or table is used for OLTP (Online
Transaction Processing).
Disadvantages Of Database Normalization
DBMS Normalization has the following disadvantages:
Page 2 of 14
1. We cannot find the associated data for, say a product or employee in one
place and we have to join more than one table. This causes a delay in
retrieving the data.
2. Thus, Normalization is not a good option in OLAP transactions (Online
Analytical Processing).
Before we proceed further, let’s understand the following terms:
Entity: Entity is a real-life object, where the data associated with such an
object is stored in the table. The example of such objects are employees,
departments, students, etc.
Attributes: Attributes are the characteristics of the entity, that give some
information about the Entity. For Example, if tables are entities, then the
columns are their attributes.
Types Of Normal Forms
#1) 1NF (First Normal Form)
By definition, an entity that does not have any repeating columns or data groups
can be termed as the First Normal Form. In the First Normal Form, every
column is unique.
Page 3 of 14
Following is how our Employees and Department table would have looked
if in first normal form (1NF):
empNum lastName firstName deptName deptCity deptCountry
1001 Andrews Jack Accounts New York United States
1002 Schwatz Mike Technology New York United States
1009 Beker Harry HR Berlin Germany
1007 Harvey Parker Admin London United Kingdom
1007 Harvey Parker HR London United Kingdom
Here, all the columns of both Employees and Department tables have been
clubbed into one and there is no need of connecting columns, like deptNum, as
all data is available in one place.
But a table like this with all the required columns in it, would not only be
difficult to manage but also difficult to perform operations on and also
inefficient from the storage point of view.
Page 4 of 14
#2) 2NF (Second Normal Form)
By definition, an entity that is 1NF and one of its attributes is defined as the
primary key and the remaining attributes are dependent on the primary key.
Following is an example of how the employees and department table would
look like:
Employees Table:
empNum lastName firstName
1001 Andrews Jack
1002 Schwatz Mike
1009 Beker Harry
1007 Harvey Parker
1007 Harvey Parker
Departments Table:
deptNum deptName deptCity deptCountry
1 Accounts New York United States
2 Technology New York United States
Page 5 of 14
deptNum deptName deptCity deptCountry
3 HR Berlin Germany
4 Admin London United Kingdom
EmpDept Table:
empDeptID empNum deptNum
1 1001 1
2 1002 2
3 1009 3
4 1007 4
5 1007 3
Here, we can observe that we have split the table in 1NF form into three
different tables. the Employees table is an entity about all the employees of a
company and its attributes describe the properties of each employee. The
primary key for this table is empNum.
Similarly, the Departments table is an entity about all the departments in a
company and its attributes describe the properties of each department. The
primary key for this table is the deptNum.
Page 6 of 14
In the third table, we have combined the primary keys of both the tables. The
primary keys of the Employees and Departments tables are referred to as
Foreign keys in this third table.
If the user wants an output similar to the one, we had in 1NF, then the user has
to join all the three tables, using the primary keys.
A sample query would look as shown below:
SELECT empNum, lastName, firstName, deptNum, deptName, deptCity,
deptCountry
FROM Employees A, Departments B, EmpDept C
WHERE A.empNum = C.empNum
AND B.deptNum = C.deptNum
WITH UR;
#3) 3NF (Third Normal Form)
By definition, a table is considered in third normal if the table/entity is already
in the second normal form and the columns of the table/entity are non-
transitively dependent on the primary key.
Page 7 of 14
Let’s understand non-transitive dependency, with the help of the following
example.
Say a table named, Customer has the below columns:
CustomerID – Primary Key identifying a unique customer
CustomerZIP – ZIP Code of the locality customer resides in
CustomerCity – City the customer resides in
In the above case, the CustomerCity column is dependent on the CustomerZIP
column and the CustomerZIP column is dependent on CustomerID.
The above scenario is called transitive dependency of the CustomerCity column
on the CustomerID i.e. the primary key. After understanding transitive
dependency, now let’s discuss the problem with this dependency.
There could be a possible scenario where an unwanted update is made to the
table for updating the CustomerZIP to a zipcode of a different city without
updating the CustomerCity, thereby leaving the database in an inconsistent state.
Page 8 of 14
In order to fix this issue, we need to remove the transitive dependency that could
be done by creating another table, say, CustZIP table that holds two columns i.e.
CustomerZIP (as Primary Key) and CustomerCity.
The CustomerZIP column in the Customer table is a foreign key to the
CustomerZIP in the CustZIP table. This relationship ensures that there is no
anomaly in the updates wherein a CustomerZIP is updated without making
changes to the CustomerCity.
#4) Boyce-Codd Normal Form (3.5 Normal Form)
By definition, the table is considered Boyce-Codd Normal Form, if it’s already
in the Third Normal Form and for every functional dependency between A and
B, A should be a super key.
This definition sounds a bit complicated. Let’s try to break it to understand it
better.
Functional Dependency: The attributes or columns of a table are said to be
functionally dependent when an attribute or column of a table uniquely
identifies another attribute(s) or column(s) of the same table.
For Example, the empNum or Employee Number column uniquely
Page 9 of 14
identifies the other columns like Employee Name, Employee Salary, etc. in
the Employee table.
Super Key: A single key or group of multiple keys that could uniquely
identify a single row in a table can be termed as Super Key. In general
terms, we know such keys as Composite Keys.
Let’s consider the following scenario to understand when there is a problem
with Third Normal Form and how does Boyce-Codd Normal Form comes to
rescue.
empNum firstName empCity deptName deptHead
1001 Jack New York Accounts Raymond
1001 Jack New York Technology Donald
1002 Harry Berlin Accounts Samara
1007 Parker London HR Elizabeth
1007 Parker London Infrastructure Tom
In the above example, employees with empNum 1001 and 1007 work in two
different departments. Each department has a department head. There can be
multiple department heads for each department. Like for the Accounts
department, Raymond and Samara are the two heads of departments.
Page 10 of 14
In this case, empNum and deptName are super keys, which implies that
deptName is a prime attribute. Based on these two columns, we can identify
every single row uniquely.
Also, the deptName depends on deptHead, which implies that deptHead is a
non-prime attribute. This criterion disqualifies the table from being part of
BCNF.
To solve this we will break the table into three different tables as mentioned
below:
Employees Table:
empNum firstName empCity deptNum
1001 Jack New York D1
1001 Jack New York D2
1002 Harry Berlin D1
1007 Parker London D3
1007 Parker London D4
Department Table:
Page 11 of 14
deptNum deptName deptHead
D1 Accounts Raymond
D2 Technology Donald
D1 Accounts Samara
D3 HR Elizabeth
D4 Infrastructure Tom
#5) Fourth Normal Form (4 Normal Form)
By definition, a table is in Fourth Normal Form, if it does not have two or more,
independent data describing the relevant entity.
#6) Fifth Normal Form (5 Normal Form)
A table can be considered in Fifth Normal Form only if it satisfies the conditions
for Fourth Normal Form and can be broken down into multiple tables without
loss of any data.
Frequently Asked Questions And Answers
Q #1) What is Normalization in a Database?
Page 12 of 14
Answer: Database Normalization is a design technique. Using this we can
design or re-design schemas in the database to reduce redundant data and the
dependency of data by breaking the data into smaller and more relevant tables.
Q #2) What are the different types of Normalization?
Answer: Following are the different types of normalization techniques that can
be employed to design database schemas:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (3.5NF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Q #3) What is the Purpose of Normalization?
Answer: The primary purpose of the normalization is to reduce the data
redundancy i.e. the data should only be stored once. This is to avoid any data
anomalies that could arise when we attempt to store the same data in two
different tables, but changes are applied only to one and not to the other.
Page 13 of 14
Q #4) What is Denormalization?
Answer: Denormalization is a technique to increase the performance of the
database. This technique adds redundant data to the database, contrary to the
normalized database that removes the redundancy of the data.
This is done in huge databases where executing a JOIN to get data from multiple
tables is an expensive affair. Thus, redundant data are stored in multiple tables
to avoid JOIN operations.
Conclusion
So far, we have all gone through three database normalization forms.
Theoretically, there are higher forms of database normalizations like Boyce-
Codd Normal Form, 4NF, 5NF. However, 3NF is the widely used normalization
form in the production databases.
Page 14 of 14