Normalization
Introduction to Normalization
• Normalization: Process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations
• Normalization is the process of removing redundant data from your tables
in order to improve storage efficiency, data integrity and scalability.
• This improvement is balanced against an increase in complexity and
potential performance losses from the joining of the normalized tables at
query-time.
• There are two goals of the normalization process:
– Eliminating redundant data (for example, storing the same data in more than one table).
– Ensuring data dependencies make sense (only storing related data in a table).
Both of these are worthy goals as they reduce the amount of space a database consumes and
ensure that data is logically stored.
Normal form: Condition using keys and FDs of a relation to certify whether a relation schema
is in a particular normal form
– 2NF, 3NF, BCNF based on keys and FDs of a relation schema
– 4NF based on keys, multi-valued dependencies
There is a sequence to normal forms:
– 1NF is considered the weakest,
– 2NF is stronger than 1NF,
– 3NF is stronger than 2NF, and
– BCNF is stronger than 3NF (4NF, based on multi-valued dependencies, is stronger still)
Also,
– any relation that is in BCNF, is in 3NF;
– any relation in 3NF is in 2NF; and
– any relation in 2NF is in 1NF.
WHY DO WE NEED NORMALIZATION?
• Update Anomaly: To update the address of a student who appears in two or more rows
of a table, we have to update the S_Address column in all of those rows; otherwise the
data becomes inconsistent.
• Insertion Anomaly: Suppose that for a new admission we have the student id (S_id), name
and address of a student. If the student has not opted for any subjects yet, we have
to insert NULL for the subject, leading to an insertion anomaly.
• Deletion Anomaly: If student (S_id) 401 has only one subject and temporarily drops it,
deleting that row deletes the entire student record along with it.
Example: Anomalies in DBMS
Idea: In the Student Info table below, we have tried to store all the data
about students.
Result: The entire branch data must be repeated for every student of the branch.
Insertion Anomalies: When certain data (attributes) cannot be inserted into the database
without the presence of other data. For example, in the above table, if no student is enrolled
in a branch, then we cannot add the branch name.
Deletion Anomalies: If we delete some (unwanted) data, it causes the deletion of some other
(wanted) data because the two are stored in the same rows.
Update Anomalies: When we want to update a single piece of data, but the update must be applied
to all of its copies (due to multiple entries).
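The update and deletion anomalies can be reproduced on a small in-memory table. This is only a sketch: the rows, the `hod` column and the names are hypothetical and stand in for the repeated branch data described above.

```python
# Hypothetical denormalized Student Info rows: branch data repeats per student.
student_info = [
    {"s_id": 401, "name": "Asha", "branch": "CSE", "hod": "Dr. Rao"},
    {"s_id": 402, "name": "Bilal", "branch": "CSE", "hod": "Dr. Rao"},
    {"s_id": 403, "name": "Chen", "branch": "ECE", "hod": "Dr. Iyer"},
]


def hods_per_branch(rows):
    """Map each branch to the set of HOD names recorded for it."""
    result = {}
    for row in rows:
        result.setdefault(row["branch"], set()).add(row["hod"])
    return result


# Update anomaly: changing the HOD in only one of the CSE rows
# leaves two conflicting values for the same branch.
student_info[0]["hod"] = "Dr. Sen"
inconsistent = {b for b, h in hods_per_branch(student_info).items() if len(h) > 1}

# Deletion anomaly: removing the only ECE student also removes
# every trace of the ECE branch.
remaining = [r for r in student_info if r["s_id"] != 403]
branches_left = {r["branch"] for r in remaining}
```

After the partial update, `inconsistent` contains the branch with conflicting copies, and after the delete, `branches_left` no longer mentions the branch whose last student was removed.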
Normalization
Systematic process of reducing the redundancy and avoiding the existing anomalies in
the relation.
Objectives of Normalization
• An outer join between Employee and Employee Degree will produce the
information we saw before
Implication of 1NF
Each column must hold a single value per row; a NULL value may be
present.
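A minimal sketch of bringing a table into 1NF, assuming a hypothetical Employee table whose `degrees` column holds several values at once. The column names and rows are illustrative only.

```python
# Hypothetical non-1NF rows: "degrees" holds several values in one column.
employees_raw = [
    {"emp_id": 1, "name": "Jones", "degrees": ["BSc", "MSc"]},
    {"emp_id": 2, "name": "Smith", "degrees": []},
]


def to_1nf(rows):
    """Split rows into an Employee table and an EmployeeDegree table."""
    employee = [{"emp_id": r["emp_id"], "name": r["name"]} for r in rows]
    employee_degree = [
        {"emp_id": r["emp_id"], "degree": d} for r in rows for d in r["degrees"]
    ]
    return employee, employee_degree


employee, employee_degree = to_1nf(employees_raw)
```

Every column now holds a single value. An employee without a degree simply has no row in `employee_degree`, which is why an outer join is needed to list all employees together with their degrees.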
• 2NF and 3NF both involve the concepts of key and non-key attributes.
• A key attribute is any attribute that is part of a key; any attribute that is
not a key attribute is a non-key attribute
Second Normal Form
• {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS
nor PNUMBER -> HOURS hold
• {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial
dependency) since SSN -> ENAME also holds
• R can be decomposed into 2NF relations via the process of 2NF
normalization
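The difference between a full and a partial dependency can be checked against sample data. The sketch below uses hypothetical rows and helper names (`fd_holds`, `is_full_fd`). Note that an FD is a constraint on the schema, so sample rows can only show that a dependency is violated, never prove that it holds in general.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """lhs -> rhs is full if it holds and no proper subset of lhs suffices."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


# Hypothetical rows for a works-on relation.
works_on = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 20, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 15, "ENAME": "Wong"},
]

hours_is_full = is_full_fd(works_on, ["SSN", "PNUMBER"], ["HOURS"])
ename_is_full = is_full_fd(works_on, ["SSN", "PNUMBER"], ["ENAME"])
```

On these rows `hours_is_full` is true and `ename_is_full` is false, because `SSN` alone already determines `ENAME`.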
Example: 2nd Normal Form
Example: R(ABCD)
• Definition
– Transitive functional dependency: an FD X -> Y is transitive if there is a set
of attributes Z that is neither a candidate key nor a subset of any
key, and both X -> Z and Z -> Y hold.
3rd Normal Form
Now all non-key attributes are fully functionally dependent only on the primary key. In
[TABLE_BOOK], both [Genre ID] and [Price] depend only on [Book ID]. In
[TABLE_GENRE], [Genre Type] depends only on [Genre ID].
EXAMPLE
Consider this table:
FD set:
{STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE ->
STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key:
{STUD_NO}
For this relation, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY both hold, so STUD_COUNTRY is transitively dependent on STUD_NO. This
violates the third normal form. To convert it to third normal form, we decompose the
relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE) as:
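One way to carry out such a decomposition is to split on the transitive dependency STUD_STATE -> STUD_COUNTRY. The sketch below does this with plain projections; the sample rows, the `project` helper and the relation name `state_country` are assumptions made for illustration.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    seen = set()
    out = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


# Hypothetical rows of the original STUDENT relation.
student = [
    {"STUD_NO": 101, "STUD_NAME": "Ram", "STUD_PHONE": "555-0101",
     "STUD_STATE": "Haryana", "STUD_COUNTRY": "India", "STUD_AGE": 20},
    {"STUD_NO": 102, "STUD_NAME": "Ramesh", "STUD_PHONE": "555-0102",
     "STUD_STATE": "Punjab", "STUD_COUNTRY": "India", "STUD_AGE": 19},
    {"STUD_NO": 103, "STUD_NAME": "Suresh", "STUD_PHONE": "555-0103",
     "STUD_STATE": "Punjab", "STUD_COUNTRY": "India", "STUD_AGE": 21},
]

# Keep everything that depends directly on the key STUD_NO ...
student_3nf = project(
    student, ["STUD_NO", "STUD_NAME", "STUD_PHONE", "STUD_STATE", "STUD_AGE"]
)
# ... and move the transitively dependent attribute to its own relation.
state_country = project(student, ["STUD_STATE", "STUD_COUNTRY"])
```

Each state now records its country exactly once, and STUD_STATE in `student_3nf` acts as the reference into `state_country`.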
Note –
• Pno -> Pname
• Pno -> Color
• Pno -> Wt
Dependency Diagrams - Example
<ProjectCost> <EmployeeProject>
The above relations state:
Whereas the subset {EmpID, ProjectID} can easily determine the {Days} spent on
the project by the employee.
This summarizes and gives our fully functional dependency:
❑ The 2nd Normal Form (2NF) eliminates partial dependencies. Let us see an example:
❑ <StudentProject>
❑ The StudentName can be determined by StudentID alone, which is a partial
dependency.
❑ The ProjectName can be determined by ProjectID alone, which is also a partial
dependency.
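A possible 2NF decomposition of <StudentProject> can be sketched as follows, assuming the key is {StudentID, ProjectID}. The rows and the names of the resulting relations are hypothetical.

```python
# Hypothetical StudentProject rows; the key is assumed to be
# the pair (StudentID, ProjectID).
student_project = [
    {"StudentID": "S01", "ProjectID": "P09",
     "StudentName": "Katie", "ProjectName": "Cloud"},
    {"StudentID": "S01", "ProjectID": "P07",
     "StudentName": "Katie", "ProjectName": "IoT"},
    {"StudentID": "S02", "ProjectID": "P09",
     "StudentName": "Ollie", "ProjectName": "Cloud"},
]

# StudentName depends only on StudentID, so it moves to its own relation.
student_info = sorted({(r["StudentID"], r["StudentName"]) for r in student_project})

# ProjectName depends only on ProjectID, so it moves to its own relation.
project_info = sorted({(r["ProjectID"], r["ProjectName"]) for r in student_project})

# What remains is the pairing of students and projects.
assignment = sorted({(r["StudentID"], r["ProjectID"]) for r in student_project})
```

Each student name and each project name is now stored once, while `assignment` keeps only the key pairs.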
Transitive Dependency
The transitivity rule is perhaps the most important one. It states that
if X functionally determines Y and Y functionally
determines Z, then X functionally determines Z.
X🡪 Y
Y🡪 Z
X🡪 Z
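Repeated use of the transitivity rule is what the standard attribute-closure computation does. The sketch below assumes FDs are given as pairs of attribute sets; the function name `closure` is chosen here for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined,
            # the right-hand side is determined as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]
x_closure = closure({"X"}, fds)
```

Since Z ends up in the closure of X, the dependency X -> Z follows from X -> Y and Y -> Z.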
Multi-Value Dependency
• A Multi-Value Dependency (MVD) occurs when two or more
independent multivalued facts about the same attribute occur within the
same table.
• Multivalued dependencies occur when the presence of one or
more rows in a table implies the presence of one or more other rows in
that same table.
• Example: Imagine a car company that manufactures
many models of car, but always makes both red and blue colors of each
model. If you have a table that contains the model name, color and year
of each car the company manufactures, there is a multivalued
dependency in that table.
• If there is a row for a certain model name and year in blue, there must
also be a similar row corresponding to the red version of that same car.
Multivalued dependency
Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table. A multivalued
dependency is a complete constraint between two sets of attributes in a
relation. It requires that certain tuples be present in a relation.
Example:
In this example, maf_year and color are independent of each other but
dependent on car_model. These two columns are said to
be multivalue dependent on car_model.
This dependence can be represented like this:
car_model ->> maf_year
car_model ->> color
Car_model   Maf_year   Color
H001        2017       Metallic
H001        2017       Green
H005        2018       Metallic
H005        2018       Blue
H010        2015       Metallic
H033        2012       Gray
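The requirement that certain tuples be present can be tested directly. The sketch below applies the usual definition of X ->> Y (for any two rows that agree on X, the row that takes Y from the first and the remaining attributes from the second must also exist) to the table above. The helper name `mvd_holds` and the extra row in `cars_bad` are assumptions for illustration.

```python
def mvd_holds(rows, x, y):
    """Check the multivalued dependency X ->> Y on a list of dict rows."""
    attrs = list(rows[0].keys())
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Required row: Y values from t1, everything else from t2.
                required = {**t2, **{a: t1[a] for a in y}}
                if tuple(required[a] for a in attrs) not in present:
                    return False
    return True


cars = [
    {"car_model": "H001", "maf_year": 2017, "color": "Metallic"},
    {"car_model": "H001", "maf_year": 2017, "color": "Green"},
    {"car_model": "H005", "maf_year": 2018, "color": "Metallic"},
    {"car_model": "H005", "maf_year": 2018, "color": "Blue"},
    {"car_model": "H010", "maf_year": 2015, "color": "Metallic"},
    {"car_model": "H033", "maf_year": 2012, "color": "Gray"},
]

ok = mvd_holds(cars, ["car_model"], ["maf_year"])

# Hypothetical extra row: H001 gains a 2018 model in Metallic only,
# so the matching (H001, 2018, Green) row is missing.
cars_bad = cars + [{"car_model": "H001", "maf_year": 2018, "color": "Metallic"}]
broken = mvd_holds(cars_bad, ["car_model"], ["maf_year"])
```

The table as given satisfies the dependency, while the extra row breaks it until the matching Green row is added.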
Join Dependency
• A Join Dependency exists if a relation R is equal to the join of
its projections X, ..., Z.
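A join dependency can be checked by projecting a relation onto each component and joining the projections back together. The following is a minimal sketch with hypothetical helper names and sample rows, not a general-purpose implementation.

```python
def project(rows, attrs):
    """Project onto attrs, returning a duplicate-free list of dicts."""
    unique = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in unique]


def natural_join(left, right):
    """Natural join of two lists of dicts on their shared attribute names."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Order-independent representation of a list of dict rows."""
    return {tuple(sorted(r.items())) for r in rows}


def join_dependency_holds(rows, components):
    """True if rows equals the join of its projections onto components."""
    joined = project(rows, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(rows, attrs))
    return as_set(joined) == as_set(rows)


# Hypothetical rows: every year of the model comes in every color.
rows = [
    {"model": "H001", "year": 2017, "color": "Metallic"},
    {"model": "H001", "year": 2017, "color": "Green"},
    {"model": "H001", "year": 2018, "color": "Metallic"},
    {"model": "H001", "year": 2018, "color": "Green"},
]
components = [["model", "year"], ["model", "color"]]

holds = join_dependency_holds(rows, components)
# Dropping one row makes the join produce a tuple that is not in the relation.
fails = join_dependency_holds(rows[:3], components)
```

When the check fails, the join has produced spurious tuples, which means the decomposition into those projections would not be lossless.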