Normalization 1
Four problems are of concern in a poorly designed database: redundancy, and the insertion, deletion and
update anomalies. Normalization is not compulsory, but it is strongly recommended, because a
normalized design makes the database much easier to maintain. Normalization should be applied to
each table of the database. It is performed after the logical database design, although the same
process is also followed informally during conceptual database design.
Normalization Process
There are different forms or levels of normalization, called the first normal form, the second normal
form and so on. Each normal form has certain requirements or conditions that must be fulfilled. If a
table or relation fulfills the conditions of a particular form, it is said to be in that normal form. The
process is applied to each relation of the database. The lowest normal form that all the tables satisfy is
called the normal form of the entire database. The main objective of normalization is to place the
database in the highest normal form.
Functional Dependency
Normalization is based on the concept of functional dependency. A functional dependency is a type of
relationship between attributes.
Definition of Functional Dependency: If A and B are attributes or sets of attributes of relation R, we say
that B is functionally dependent on A if each value of A in R has associated with it exactly one value of B
in R.
We write this as A → B, read as “A functionally determines B” or “A determines B”. This does not mean
that A causes B or that the value of B can be calculated from the value of A by a formula, although
sometimes that is the case. It simply means that if we know the value of A and we examine the table of
relation R, we will find only one value of B in all the rows that have the given value of A at any one time.
Thus when two rows have the same A value, they must also have the same B value. However, for a
given B value, there may be several different A values. When a functional dependency exists, the
attribute or set of attributes on the left side of the arrow is called the determinant, and the attribute or
set of attributes on the right side is called the dependent. For instance, a relation R with attributes
(a,b,c,d,e) may have the following functional dependencies:
a → b,c,d
d → e
For example, there is a relation of student with the following attributes. We will establish
the functional dependencies among its attributes:
STD (stId,stName,stAdr,prName,credits)
stId → stName,stAdr,prName,credits
prName → credits
Now in this example, if we know the stId we can tell the complete information about that student.
Similarly, if we know the prName, we can tell the credit hours for that particular program.
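The definition can be checked mechanically against the rows of a table. The following is a minimal Python sketch, not part of the original notes: the function name fd_holds and the sample rows are invented for illustration. Note that a dependency holding in sample data does not prove it holds in general, since a functional dependency is a rule about all valid states of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to exactly one
    combination of rhs values in the given rows."""
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in lhs)
        b = tuple(row[attr] for attr in rhs)
        # setdefault stores b the first time a is seen, then returns the stored value
        if seen.setdefault(a, b) != b:
            return False
    return True


# Invented sample rows for STD(stId, stName, stAdr, prName, credits)
std_rows = [
    {"stId": "S1", "stName": "Ali", "stAdr": "Lahore", "prName": "BCS", "credits": 130},
    {"stId": "S2", "stName": "Sara", "stAdr": "Karachi", "prName": "BCS", "credits": 130},
    {"stId": "S3", "stName": "Ali", "stAdr": "Multan", "prName": "MCS", "credits": 64},
]

print(fd_holds(std_rows, ["stId"], ["stName", "stAdr", "prName", "credits"]))  # True
print(fd_holds(std_rows, ["prName"], ["credits"]))  # True
print(fd_holds(std_rows, ["stName"], ["stAdr"]))    # False: two students named Ali
```

The last call shows the asymmetry described above: stId determines stName, but stName does not determine the other attributes.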
Functional Dependencies and Keys: We can determine the keys of a relation by examining its functional
dependencies. A determinant that determines all the attributes of the table is a super key. A super key
is an attribute or a set of attributes that identifies an entity uniquely; in a table, it is any column or set
of columns whose values can be used to distinguish one row from another. A minimal super key is a
candidate key. So if the determinant of a functional dependency determines all attributes of the
relation, it is definitely a super key, and if no proper subset of that determinant is itself a super key,
then it is a candidate key. In this way functional dependencies help to identify keys. An example follows:
EMP (eId,eName,eAdr,eDept,prId,prSal)
eId → (eName,eAdr,eDept)
eId,prId → prSal
In this example, eId alone uniquely determines the employee name, address and department, while the
employee ID and project ID together determine the project salary. Since eId and prId together determine
every other attribute, the composite (eId, prId) is the key of the EMP relation. So FDs help in finding
the keys and the relationships among attributes.
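One common way to find keys from functional dependencies is to compute the closure of a set of attributes, that is, everything it determines directly or indirectly. The sketch below assumes the two EMP dependencies given above; the function names closure and candidate_keys are illustrative, and the brute-force search is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Return every minimal attribute set whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is a super key but not minimal
            if closure(combo, fds) == set(all_attrs):
                keys.append(combo)
    return keys


emp_attrs = {"eId", "eName", "eAdr", "eDept", "prId", "prSal"}
emp_fds = [
    (("eId",), ("eName", "eAdr", "eDept")),
    (("eId", "prId"), ("prSal",)),
]

print(sorted(closure({"eId"}, emp_fds)))   # ['eAdr', 'eDept', 'eId', 'eName']
print(candidate_keys(emp_attrs, emp_fds))  # [('eId', 'prId')]
```

The closure of eId alone does not reach prId or prSal, so eId is not a super key of EMP; only the pair (eId, prId) determines every attribute.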
Normal Forms
Normalization is basically a process of efficiently organizing data in a database. There are two goals of
the normalization process: eliminate redundant data (for example, storing the same data in more than
one table) and ensure data dependencies make sense (only storing related data in a table). Both of
these are worthy goals as they reduce the amount of space a database consumes and ensure that data
is logically stored. We will now study the first normal form.
First Normal Form: A relation is in first normal form if and only if every attribute is single valued for
each tuple. This means that each attribute in each row, or each cell of the table, contains only one
value. No repeating fields or groups are allowed. An alternative way of describing first normal form is
to say that the domains of the attributes of a relation are atomic, that is, they consist of single units that
cannot be broken down further. There is no multivalued attribute (repeating group) in the relation,
because multiple values create problems in performing operations like select or join. For example,
consider a relation of Student.
Now this table is in first normal form, because every attribute of every tuple holds a single value.
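Since the Student table itself is not reproduced here, the following Python sketch uses invented rows to illustrate the idea. It assumes a hypothetical multivalued attribute called courses and splits each row so that every cell holds one value; the function name to_first_normal_form is likewise only illustrative.

```python
def to_first_normal_form(rows, multivalued_attr):
    """Split each row so that multivalued_attr holds a single value per row."""
    flat = []
    for row in rows:
        for value in row[multivalued_attr]:
            new_row = dict(row)            # copy the single-valued attributes
            new_row[multivalued_attr] = value
            flat.append(new_row)
    return flat


# Invented sample data: "courses" is a repeating group, so this is not in 1NF.
unnormalized = [
    {"stId": "S1", "stName": "Ali", "courses": ["DB", "OS"]},
    {"stId": "S2", "stName": "Sara", "courses": ["DB"]},
]

for row in to_first_normal_form(unnormalized, "courses"):
    print(row)
```

After the split, stId alone no longer identifies a row; in this sketch the key becomes the combination of stId and courses.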
Second Normal Form: A relation is in second normal form if and only if it is in first normal form and all
non-key attributes are fully functionally dependent on the key. Clearly, if a relation is in 1NF and the
key consists of a single attribute, the relation is automatically in 2NF. The only time we have to be
concerned about 2NF is when the key is composite. A relation that is not in 2NF exhibits the update,
insertion and deletion anomalies. We will now see this with an example. Consider the following relation.
Now in this relation the key is course ID and student ID. The requirement of 2NF is that all non-key
attributes should be fully dependent on the key; there should be no partial dependency of the
attributes. But in this relation student name depends on student ID alone, and similarly faculty ID and
room depend on course ID alone. These are partial dependencies, so the relation is not in second
normal form. At this level of normalization, each column in a table that is not a determinant of the
contents of another column must itself be a function of the other columns in the table. For example, in
a table with three columns containing customer ID, product sold, and price of the product when sold,
the price would be a function of the customer ID (entitled to a discount) and the specific product. If a
relation is not in 2NF, it suffers from the following anomalies:
• Redundancy
• Insertion Anomaly
• Deletion Anomaly
• Update Anomaly
To bring a relation into second normal form:
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign
keys.
• Identify any determinants other than the composite key, and the columns they determine.
• Create and name a new table for each determinant and the unique columns it determines.
• Move the determined columns from the original table to the new table. The determinant becomes
the primary key of the new table.
• Delete the columns you just moved from the original table, except for the determinant, which will
serve as a foreign key.
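The steps above can be sketched at the schema level as follows. This is a minimal illustration under stated assumptions: the attribute names crId, stId, stName, fId and room stand for the course ID, student ID, student name, faculty ID and room mentioned in the text, grade is an invented attribute that depends on the whole key, and decompose_2nf is a hypothetical helper name.

```python
def decompose_2nf(attrs, key, fds):
    """Split a relation so that no non-key attribute depends on part of the key.

    fds is a list of (determinant, dependents) pairs of attribute tuples.
    Returns a dict mapping each table's primary key to its list of columns.
    """
    key = tuple(key)
    tables = {}
    moved = set()
    for lhs, rhs in fds:
        # Partial dependency: the determinant is a proper subset of the key.
        if set(lhs) < set(key):
            cols = tables.setdefault(tuple(lhs), list(lhs))
            for attr in rhs:
                if attr not in cols:
                    cols.append(attr)
                moved.add(attr)
    # The original table keeps the key (its parts now act as foreign keys)
    # plus every attribute that was not moved out.
    tables[key] = [a for a in attrs if a not in moved]
    return tables


attrs = ["crId", "stId", "stName", "fId", "room", "grade"]
key = ("crId", "stId")
fds = [
    (("stId",), ("stName",)),
    (("crId",), ("fId", "room")),
    (("crId", "stId"), ("grade",)),  # invented full dependency
]

for pk, cols in decompose_2nf(attrs, key, fds).items():
    print(pk, cols)
# ('stId',) ['stId', 'stName']
# ('crId',) ['crId', 'fId', 'room']
# ('crId', 'stId') ['crId', 'stId', 'grade']
```

Each partial determinant becomes the primary key of its own table, and the original table retains only the attributes that depend on the full composite key.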
Transitive Dependency
A transitive dependency is one that is carried through another attribute. It occurs when one non-key
attribute determines another non-key attribute. For third normal form we concentrate on relations
with one candidate key, and we eliminate transitive dependencies. Transitive dependencies cause
insertion, deletion, and update anomalies. We will now see this with an example:
Now here the table is in second normal form, as there is no partial dependency of any attribute. The
key is student ID. The problem is a transitive dependency, in which a non-key attribute is determined
by another non-key attribute. Here the program credits are determined by the program name, so the
table is not in 3NF. The transitive dependency causes the same four anomalies. For example:
All four anomalies exist in this table. So we have to remove them by decomposing the table in a way
that eliminates the transitive dependency, as follows:
• Identify any determinants, other than the primary key, and the columns they determine.
• Create and name a new table for each determinant and the unique columns it determines.
• Move the determined columns from the original table to the new table. The determinant becomes
the primary key of the new table.
• Delete the columns you just moved from the original table, except for the determinant, which will
serve as a foreign key.
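The steps above can be sketched on actual rows. The following Python example assumes the STD relation and the dependency prName → credits from the functional dependency section; the sample rows and the helper names project and decompose_3nf are invented for illustration.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate rows while keeping order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result


def decompose_3nf(rows, all_columns, determinant, determined):
    """Move the determined columns into a new table keyed by the determinant."""
    remaining = [c for c in all_columns if c not in determined]
    original = project(rows, remaining)  # determinant stays as a foreign key
    new_table = project(rows, list(determinant) + list(determined))
    return original, new_table


std_columns = ["stId", "stName", "stAdr", "prName", "credits"]
std_rows = [  # invented sample data
    {"stId": "S1", "stName": "Ali", "stAdr": "Lahore", "prName": "BCS", "credits": 130},
    {"stId": "S2", "stName": "Sara", "stAdr": "Karachi", "prName": "BCS", "credits": 130},
    {"stId": "S3", "stName": "Ali", "stAdr": "Multan", "prName": "MCS", "credits": 64},
]

student, program = decompose_3nf(std_rows, std_columns, ["prName"], ["credits"])
print(program)  # each program's credits are now stored once

# Joining the two tables on prName gives back the original rows.
credits_of = {p["prName"]: p["credits"] for p in program}
rejoined = [dict(s, credits=credits_of[s["prName"]]) for s in student]
print(rejoined == std_rows)  # True
```

Because the credits for a program now appear in a single row of the program table, changing them requires one update, and a program can be recorded before any student enrolls in it.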
We have now decomposed the relation into two relations, student and program. Both relations are
in third normal form and are free of all the anomalies.