12 Normalization
Yasir Ahmad
[email protected]
Normalization of Database
Goals:
Database normalization is a technique for organizing the data in a
database.
Normalization is a systematic approach to decomposing tables in order to
eliminate data redundancy (repetition) and undesirable characteristics such
as insertion, update, and deletion anomalies.
It is a multi-step process that puts data into tabular form and removes
duplicated data from the relation tables.
To remove the partial dependency and the resulting violation of 2NF,
decompose the tables.
NOTE !!!
A partial dependency exists when, for a composite primary key (a primary
key made up of more than one attribute), any non-key attribute in the
table depends on only part of the primary key and not on the complete
primary key.
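The definition above can be sketched as a small check. This is only an illustration: the attribute names (student_id, subject_id, marks, teacher) and the listed dependencies are hypothetical, and the function inspects only the dependencies it is given rather than everything they imply.

```python
def partial_dependencies(key, fds):
    """Return (determinant, attribute) pairs where a proper subset of the
    composite key determines a non-key attribute."""
    key_set = set(key)
    found = []
    for lhs, rhs in fds:
        # A proper subset of the key on the left-hand side is the warning sign.
        if set(lhs) < key_set:
            for attr in rhs:
                if attr not in key_set:
                    found.append((tuple(lhs), attr))
    return found


# Hypothetical table with the composite key (student_id, subject_id).
key = ("student_id", "subject_id")
fds = [
    (("student_id", "subject_id"), ("marks",)),  # depends on the whole key
    (("subject_id",), ("teacher",)),             # depends on part of the key
]

violations = partial_dependencies(key, fds)
print(violations)  # [(('subject_id',), 'teacher')]
```

Here teacher depends on subject_id alone, so moving (subject_id, teacher) into its own table would remove the partial dependency.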
The term data integrity refers to the accuracy and consistency of data. ... A good database will
enforce data integrity whenever possible. For example, a user could accidentally try to enter a
phone number into a date field. If the system enforces data integrity, it will prevent the user from
making these mistakes.
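As a minimal sketch of how a database can enforce this, the example below uses Python's built-in sqlite3 module with a CHECK constraint on a date column. The table and column names (person, birth_date) and the sample values are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint accepts only text that SQLite's date() function
# returns unchanged, i.e. a well-formed YYYY-MM-DD date.
conn.execute("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        birth_date TEXT NOT NULL
            CHECK (birth_date IS date(birth_date))
    )
""")

# A valid date is accepted.
conn.execute("INSERT INTO person (birth_date) VALUES (?)", ("2001-05-17",))

# A phone number entered into the date field is rejected.
try:
    conn.execute("INSERT INTO person (birth_date) VALUES (?)", ("555-0100",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(rejected, row_count)
```

Only the valid row is stored; the mistaken entry never reaches the table.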
Third Normal Form
Relation with transitive dependency
If P -> Q and Q -> R both hold, then P -> R is a transitive dependency.
CustID -> Name
CustID -> Salesperson
CustID -> Region
All this is OK (2nd NF)
BUT
CustID -> Salesperson -> Region
Transitive dependency (not 3rd NF)
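The dependency chain in the SALES relation can be tested against sample rows. The rows below are hypothetical, invented only for illustration, and a check on data can refute a functional dependency but cannot prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical SALES rows: (CustID, Name, Salesperson, Region).
sales = [
    {"CustID": 1, "Name": "Anderson", "Salesperson": "Smith", "Region": "South"},
    {"CustID": 2, "Name": "Bancroft", "Salesperson": "Hicks", "Region": "West"},
    {"CustID": 3, "Name": "Hobbs", "Salesperson": "Smith", "Region": "South"},
    {"CustID": 4, "Name": "Tucker", "Salesperson": "Hernandez", "Region": "East"},
]

p_to_q = fd_holds(sales, ["CustID"], ["Salesperson"])
q_to_r = fd_holds(sales, ["Salesperson"], ["Region"])
p_to_r = fd_holds(sales, ["CustID"], ["Region"])

# CustID -> Region holds only by way of Salesperson, which is not a key.
print(p_to_q, q_to_r, p_to_r)  # True True True
```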
Database Systems
Removing a transitive dependency
Decomposing the SALES relation
Relations in 3NF
Salesperson -> Region
CustID -> Name
CustID -> Salesperson
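One possible way to express this decomposition is shown below with Python's built-in sqlite3 module. The table names (customer, salesperson), column types, and sample rows are assumptions for illustration; the slides give only the attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Region now depends only on the key of its own table.
conn.executescript("""
    CREATE TABLE salesperson (
        salesperson TEXT PRIMARY KEY,
        region      TEXT NOT NULL
    );
    CREATE TABLE customer (
        cust_id     INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        salesperson TEXT NOT NULL REFERENCES salesperson(salesperson)
    );
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO salesperson VALUES (?, ?)",
                 [("Smith", "South"), ("Hicks", "West")])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Anderson", "Smith"), (2, "Bancroft", "Hicks"),
                  (3, "Hobbs", "Smith")])

# A region change is a single-row update, so no update anomaly can occur.
conn.execute("UPDATE salesperson SET region = 'North' WHERE salesperson = 'Smith'")

# Joining the two relations rebuilds the original SALES view.
rows = conn.execute("""
    SELECT c.cust_id, c.name, c.salesperson, s.region
    FROM customer AS c
    JOIN salesperson AS s ON s.salesperson = c.salesperson
    ORDER BY c.cust_id
""").fetchall()
print(rows)
```

Both of Smith's customers pick up the new region from the one updated row.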
Boyce-Codd Normal Form (BCNF) is a stricter version of the Third Normal Form. This form
deals with a certain type of anomaly that is not handled by 3NF. A 3NF table that does
not have multiple overlapping candidate keys is guaranteed to be in BCNF. For a table R to be
in BCNF, the following conditions must be satisfied:
R must be in Third Normal Form,
and, for each non-trivial functional dependency ( X → Y ), X must be a super key.
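The super key condition can be checked mechanically with attribute closure. The sketch below is illustrative only: the relation (student, subject, professor) and its dependencies are hypothetical, and the code tests just the super key condition for the dependencies it is given.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose determinant is not a super key."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation with overlapping candidate keys.
attrs = ("student", "subject", "professor")
fds = [
    (("student", "subject"), ("professor",)),
    (("professor",), ("subject",)),
]

violations = bcnf_violations(attrs, fds)
print(violations)  # [(('professor',), ('subject',))]
```

Here professor determines subject but does not determine student, so professor is not a super key and the relation is not in BCNF.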
A CANDIDATE KEY is a set of attributes that uniquely identifies tuples in a table. A candidate key is a
super key with no redundant attributes: removing any attribute from it would break uniqueness. The primary
key should be selected from the candidate keys. Every table must have at least one candidate key.
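To make the definition concrete, the sketch below searches sample rows for the smallest attribute sets that are unique. The employee table and its values are hypothetical, and uniqueness in one sample only suggests a candidate key; the real keys come from the rules of the data, not from a single snapshot.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def candidate_keys(rows, attributes):
    """Return the minimal attribute sets that are unique in the given rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of a key already found is a super key, but not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical employee rows.
employees = [
    {"emp_id": 1, "email": "a@example.com", "dept": "Sales"},
    {"emp_id": 2, "email": "b@example.com", "dept": "Sales"},
    {"emp_id": 3, "email": "c@example.com", "dept": "HR"},
]

keys = candidate_keys(employees, ["emp_id", "email", "dept"])
print(keys)  # [('emp_id',), ('email',)]
```

Either emp_id or email could be chosen as the primary key; (emp_id, dept) is a super key but not a candidate key because dept is redundant.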
For a table to satisfy the Fourth Normal Form, it should satisfy the
following two conditions:
1. It should be in Boyce-Codd Normal Form.
2. The table should not have any multi-valued dependency.
As you can see in the table above, the student with s_id 1 has opted for two courses, Science and Maths, and has
two hobbies, Cricket and Hockey.
The two records for the student with s_id 1 give rise to two more records, as shown below, because the
student has two hobbies, and each hobby must be recorded alongside each of the courses.
In the table above, there is no relationship between the columns course and hobby. They are independent
of each other.
So there is a multi-valued dependency, which leads to unnecessary repetition of data and to other anomalies as well.
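The repetition described above, and the usual fix of splitting the table in two, can be sketched with plain Python sets. The values for s_id 1 come from the example in the text; the variable and function names are illustrative.

```python
# Two independent facts about student 1, stored as separate relations.
student_course = {(1, "Science"), (1, "Maths")}
student_hobby = {(1, "Cricket"), (1, "Hockey")}


def join(courses, hobbies):
    """Natural join of the two relations on s_id."""
    return {
        (s_id, course, hobby)
        for (s_id, course) in courses
        for (other_id, hobby) in hobbies
        if s_id == other_id
    }


# A single combined table must hold every course/hobby pairing.
combined = join(student_course, student_hobby)
for row in sorted(combined):
    print(row)

# Projecting the combined table back gives the two smaller relations,
# and joining those again loses nothing.
course_projection = {(s_id, course) for (s_id, course, _hobby) in combined}
hobby_projection = {(s_id, hobby) for (s_id, _course, hobby) in combined}
lossless = join(course_projection, hobby_projection) == combined
print(len(combined), lossless)  # 4 True
```

Two courses and two hobbies need four rows in the combined table but only two rows in each of the separate relations, which is the saving that 4NF aims for.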