Unit 3
Normal forms (figure axes: Number of Tables, Redundancy, Complexity)
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain Key Normal Form (DKNF)
Most databases should be in 3NF or BCNF in order to avoid database inconsistency.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF
Each higher level is a subset of the lower level.
Types of Normalization
First Normal Form (1NF)
Each field contains the smallest meaningful value
Solutions
Create a separate field/table for each set of related data.
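As a minimal sketch of this rule, the Python snippet below splits fields that hold several values into one atomic value per field. The record layout, the sample names, and the phone numbers are hypothetical and are not taken from the slides.

```python
# Hypothetical un-normalized rows: "name" and "phones" each hold several values.
raw_rows = [
    {"id": 1, "name": "Asha Rao", "phones": "555-0101, 555-0102"},
    {"id": 2, "name": "Ravi Kumar", "phones": "555-0199"},
]


def to_1nf(rows):
    """Return (people, phones) tables in which every field is atomic."""
    people, phones = [], []
    for row in rows:
        # Split the full name into its smallest meaningful parts.
        first, last = row["name"].split(" ", 1)
        people.append({"id": row["id"], "first_name": first, "last_name": last})
        # One phone number per row, kept in a separate related table.
        for number in row["phones"].split(","):
            phones.append({"person_id": row["id"], "phone": number.strip()})
    return people, phones


people, phones = to_1nf(raw_rows)
```

Each phone number now sits in its own row of a separate table, linked back to the person by `person_id`.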
Careful integration of the data from multiple sources may help reduce/avoid
redundancies and inconsistencies and improve mining speed and quality
Correlation Analysis
Correlation analysis involves various methods and techniques used
for studying and measuring the strength of the relationship between
two variables. Two variables are said to be correlated if a change
in one variable results in a corresponding change in the other
variable.
Karl Pearson Coefficient of Correlation
Karl Pearson's measure, known as the Pearsonian correlation coefficient between two
variables (series) X and Y, is usually denoted by r(X, Y), rxy, or simply r. It is a numerical
measure of the linear relationship between them and is defined as the ratio of the
covariance between X and Y, written Cov(X, Y), to the product of the standard
deviations of X and Y:
r = Cov(X, Y) / (σX · σY)
Interpretation of r
Case Study: The following data relate to the ages of 10 machine operators and the number of
days on which they reported sick in a month:
Age (X):        28 32 38 42 46 52 54 57 58 63
Sick Days (Y):   0  1  3  4  2  5  4  6  7  8
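The coefficient for this case study can be computed directly from the definition (covariance divided by the product of the standard deviations). The sketch below is one way to do it in plain Python; the function name `pearson_r` is only illustrative.

```python
from math import sqrt

# Case-study data: operator age and sick days reported in a month.
age = [28, 32, 38, 42, 46, 52, 54, 57, 58, 63]
sick_days = [0, 1, 3, 4, 2, 5, 4, 6, 7, 8]


def pearson_r(x, y):
    """Cov(x, y) divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


r = pearson_r(age, sick_days)
print(round(r, 2))  # 0.93
```

A value this close to +1 indicates a strong positive linear relationship between age and sick days in this sample.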
Second Normal Form (2NF)
Second normal form is based on the concept of full functional
dependency. Let us first consider functional dependency.
Example 1
2 Snoopy 232-234-1234
3 Grumpy 665-235-6532
4 Jones 123-333-3333
5 Smith 654-223-3455
6 Joyce 666-666-6666
7 Roman 444-444-4444
Second Normal Form (2NF)
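For a table to be in 2NF, there are two requirements:
The table must be in 1NF
Every non-key attribute must be fully functionally dependent on the whole primary key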
This table is in 1NF but not in 2NF, because Unit_Price is not fully functionally dependent on the
composite primary key (Order_No, Title).
Find and remove the attributes that are functionally dependent on only a part of the key and not on
the whole key, and place them in a different table. Then group the remaining attributes.
In the above example, Unit_Price is not functionally dependent on the whole of the key
Order_No + Title, so we may place Unit_Price along with Title in a separate table called Book_Master.
Order_Master
Order_No   Title               Qty
1          Computer Networks   1
1          Graphics            1
1          DBMS                2
2          Multimedia          1
2          Data Structure      1
3          DBMS                1
3          Multimedia          2
3          Computer Networks   5

Book_Master
Title               Unit_Price
Computer Networks   250
Graphics            275
DBMS                295
Multimedia          300
Data Structure      190
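The decomposition above can be sketched in Python. The combined `orders` table below is an assumption: it is rebuilt from the Order_Master and Book_Master rows, since the original single table is not reproduced here.

```python
# Assumed original 1NF table: (Order_No, Title, Qty, Unit_Price).
orders = [
    (1, "Computer Networks", 1, 250),
    (1, "Graphics", 1, 275),
    (1, "DBMS", 2, 295),
    (2, "Multimedia", 1, 300),
    (2, "Data Structure", 1, 190),
    (3, "DBMS", 1, 295),
    (3, "Multimedia", 2, 300),
    (3, "Computer Networks", 5, 250),
]

# Unit_Price depends on Title alone, so it moves to Book_Master.
order_master = [(no, title, qty) for no, title, qty, _ in orders]
book_master = {title: price for _, title, _, price in orders}

# Joining the two tables on Title gives back the original rows,
# so no information is lost by the decomposition.
rejoined = [(no, title, qty, book_master[title]) for no, title, qty in order_master]
assert rejoined == orders
```

Each price is now stored once per title instead of once per order line.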
Second Normal Form (2NF)
AKTU- 2016-2017
Ex. 1: Consider the universal relation schema R(A,B,C,D,E,F,G,H,I,J) and the following set of FDs:
F = {AB → C, A → DE, B → F, F → GH, D → IJ}. Determine the keys for R and decompose R into 2NF.
Sol.: (AB)+ = ABC          because AB → C
            = ABCDE        because A → DE
            = ABCDEF       because B → F
            = ABCDEFGH     because F → GH
            = ABCDEFGHIJ   because D → IJ
Hence AB is the key of R. The relation R has the composite primary key {A, B}, and the
non-prime attributes are {C,D,E,F,G,H,I,J}.
In this case, the FDs A → DE and B → F have determinants that are only part of the primary key.
Therefore this relation does not satisfy 2NF. To bring it into 2NF, we break the
table into three relations: R1(A,B,C), R2(A,D,E,I,J) and R3(B,F,G,H).
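The closure computation used in this solution can be sketched in a few lines of Python. The function name `closure` and the encoding of each FD as a (determinant, dependent) pair of strings are illustrative choices, not part of the exercise.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its determinant is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of the exercise: AB -> C, A -> DE, B -> F, F -> GH, D -> IJ.
fds = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

ab_plus = closure("AB", fds)
print("".join(sorted(ab_plus)))  # ABCDEFGHIJ
```

With the same FDs, `closure("A", fds)` yields ADEIJ and `closure("B", fds)` yields BFGH, which are the attribute sets of R2 and R3.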
Third Normal Form (3NF)
In the above relation, Room_Capacity is functionally dependent on Room_No, and Room_No is in turn
functionally dependent on Course_Name.
To convert the above table into 3NF, we must remove the column Room_Capacity, since it is not
directly functionally dependent on the primary key Course_Name, and place it in another table called
Room along with the attribute Room_No that it is functionally dependent on.
Course Table
Course_Name (Primary Key)   Head_Dept     Room_No
X1                          X2            X3
B.Tech (CS)                 Prof. Gupta   102
B.Tech (IT)                 Prof. Smith   107
B.Tech (EC)                 Dr. Sharma    105
B.Tech (AI)                 Mr. Sharma    103

Room Table
Room_No   Room_Capacity
X3        X4
102       60
107       50
105       60
103       100
111       40
Example 2: Consider the non-3NF table
EMPLOYEE_DEPARTMENT TABLE
EMPLOYEE TABLE
DEPARTMENT TABLE
E11 Operations
Ex. 1: Consider the universal relation schema R(A,B,C,D,E,F,G,H,I,J) and the
following set of FDs: F = {AB → C, A → D, B → F, F → GH, D → IJ}. Determine the keys
for R and decompose R into 2NF and 3NF.
Sol.: (ABE)+ = ABEC         because AB → C
             = ABECD        because A → D
             = ABECDF       because B → F
             = ABECDFGH     because F → GH
             = ABECDFGHIJ   because D → IJ
Hence ABE is the key of R. The relation R has the composite primary key {A, B, E},
and the non-prime attributes are {C,D,F,G,H,I,J}.
In this case, the FDs AB → C, A → D and B → F have determinants that are only part of the primary key.
Therefore this relation does not satisfy 2NF. To bring it into 2NF, we break the
table into three relations: R1(A,B,C,E), R2(A,D,E,I,J) and R3(B,F,E,G,H).
Now in R3, B → F and F → GH, so B → GH holds by transitivity. Therefore R3
is not in 3NF. To bring it into 3NF, we break R3(B,F,E,G,H) into two relations:
R4(B,E,F) and R5(F,G,H).
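The claim that ABE is the key can be checked mechanically. The sketch below is a brute-force search, practical only for small schemas like this one; the function names and the string encoding of the FDs are illustrative assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, given as (determinant, dependent) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Find all minimal attribute sets whose closure covers every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


# FDs of the exercise: AB -> C, A -> D, B -> F, F -> GH, D -> IJ.
fds = [("AB", "C"), ("A", "D"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
keys = candidate_keys("ABCDEFGHIJ", fds)
print(["".join(sorted(k)) for k in keys])  # ['ABE']
```

A, B and E never appear on the right-hand side of any FD, so every key must contain all three, which is why the search returns a single key.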
Boyce-Codd Normal Form (BCNF)
BCNF is an extended form of 3NF.
BCNF extends the 3NF requirement to all the candidate keys of the relation. This matters when
the candidate keys overlap, that is, when two or more of them share a common attribute.
Third normal form and BCNF differ only if all of the following conditions are true:
The table has two or more candidate keys
At least two of the candidate keys are composed of more than one attribute
The keys are not disjoint, i.e., the composite candidate keys share some attributes
Example 1: Consider the non-BCNF Table
Table : Student
In this relation, two candidate keys exist, (S_Name, Subject) and (Stud_ID, Subject). Both are composite
keys, and they share the common attribute Subject.
This relation is in 3NF; however, it contains a lot of repeated data in the S_Name and Stud_ID fields.
Find and remove the overlapping candidate keys. Place the part of the candidate key and the attribute it is
functionally dependent on in a different table.
Faculty_Course
Faculty      Subject
Dr. Sharma   DBMS
Dr. Sharma   OS
Dr. Sharma   Data Mining

Faculty_Committee
Faculty      Committee
Dr. Sharma   Placement
Dr. Sharma   Discipline
Fifth Normal Form (5NF)
Fifth normal form is satisfied when all tables are broken into as many tables
as possible in order to avoid redundancy. Once it is in fifth normal form it
cannot be broken into smaller relations without changing the facts or the
meaning.
In the table, Mr. X is listed twice as a supplier for Godrej, and Mr. Y is listed twice
as a supplier for H.Lever. But if we decompose the table, we lose
information, as the following two tables show:
Company_Product (R1)
Company   Product
Godrej    Soap
Godrej    Shampoo
H.Lever   Soap
H.Lever   Shampoo

Company_Suppliers (R2)
Company   Supplier
Godrej    Mr. X
Godrej    Mr. Y
Godrej    Mr. Z
H.Lever   Mr. X
H.Lever   Mr. Y
Example 1: Consider the non-5NF table
If we want to display the products and their suppliers, we have to join the two tables
on the Company attribute.
The result will contain some spurious records. For Mr. Z, it will display both products, soap
and shampoo, because the company that Mr. Z supplies (Godrej) produces both soap and
shampoo, which is correct.
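The join described above can be sketched in Python using the rows of Company_Product (R1) and Company_Suppliers (R2). The original three-column table is not reproduced here, so the sketch does not say which joined rows are spurious; it only shows that the join pairs every supplier of a company with every product of that company.

```python
company_product = [
    ("Godrej", "Soap"), ("Godrej", "Shampoo"),
    ("H.Lever", "Soap"), ("H.Lever", "Shampoo"),
]
company_supplier = [
    ("Godrej", "Mr. X"), ("Godrej", "Mr. Y"), ("Godrej", "Mr. Z"),
    ("H.Lever", "Mr. X"), ("H.Lever", "Mr. Y"),
]

# Natural join of R1 and R2 on the shared Company attribute.
joined = [
    (company, product, supplier)
    for company, product in company_product
    for other_company, supplier in company_supplier
    if company == other_company
]

print(len(joined))  # 10
z_products = sorted(p for _, p, s in joined if s == "Mr. Z")
print(z_products)  # ['Shampoo', 'Soap']
```

Any (company, product, supplier) combination in `joined` that was absent from the original table is a spurious record introduced by the two-way decomposition.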
Now suppose that the original table were decomposed into three parts, as shown.
Use your own judgment when decomposing schemas.