Unit 3

The document discusses database normalization. Normalization involves organizing data into multiple tables to reduce redundancy and improve performance. It describes the goals of normalization as reducing redundancy by storing information only once and discusses benefits like saving storage space and faster sorting. The document outlines several forms of normalization from 1st normal form to 5th normal form and domain key normal form. It provides examples to illustrate 1st normal form and 2nd normal form.
Database Design and Normalization


Definition
• Normalizing a database means simplifying its tables so that they are easier to implement and maintain in a large database.
• Normalizing a logical database design involves organizing the data into more than one table.
• Normalization improves performance by reducing redundancy in database tables.
• The basic objective of normalization is to reduce redundancy, which means that each piece of information is stored only once in a relation.
• The benefits of normalization are:
    • It saves storage space and makes it easier to insert, delete and update data.
    • Sorting and index creation are faster.
    • The structure of each table is simpler.
• A properly normalized database should have the following characteristics:
    • Scalar values in each field.
    • Absence of redundancy.
    • Minimum use of null values.
    • Minimum loss of information.
Levels of Normalization
• The levels of normalization are based on the amount of redundancy in the database.
• The various levels of normalization are:
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
    • Boyce-Codd Normal Form (BCNF)
    • Fourth Normal Form (4NF)
    • Fifth Normal Form (5NF)
    • Domain Key Normal Form (DKNF)

[Figure: arrows labeled "Number of Tables", "Redundancy" and "Complexity" drawn beside the list of normal forms]

Most databases should be in 3NF or BCNF in order to avoid database inconsistency.

Levels of Normalization
[Figure: nested diagram of 1NF, 2NF, 3NF, 4NF, 5NF and DKNF]

Each higher level is a subset of the lower level.
Types of Normalization
First Normal Form (1NF)
• Each field contains the smallest meaningful value.
• The table does not contain repeating groups of fields or repeating data within the same field.

Solutions
• Create a separate field/table for each set of related data.
• Identify each set of related data with a primary key.

Example (not in 1NF)

PART    WAREHOUSE                               QUANTITY
P0010   Warehouse A, Warehouse B, Warehouse C   400, 543, 329
P0020   Warehouse B, Warehouse C                200, 278

This is a really bad set-up: each field holds several values.

Better, but still with a disadvantage regarding null values:

PART (Primary Key)   WAREHOUSE A Qty   WAREHOUSE B Qty   WAREHOUSE C Qty
P0010                400               543               329
P0020                Null              200               278

1NF - Decomposition
Example (1NF)

PART    WAREHOUSE     QUANTITY
P0010   Warehouse A   400
P0010   Warehouse B   543
P0010   Warehouse C   329
P0020   Warehouse B   200
P0020   Warehouse C   278
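
The 1NF decomposition can be sketched in a few lines of Python. This is only an illustration, assuming the non-1NF rows are held as comma-separated strings exactly as in the first table above; the function name to_first_normal_form is hypothetical and does not come from the slides.

```python
# Rows as they appear in the "not in 1NF" table: one field holds several values.
unnormalized = [
    ("P0010", "Warehouse A, Warehouse B, Warehouse C", "400, 543, 329"),
    ("P0020", "Warehouse B, Warehouse C", "200, 278"),
]


def to_first_normal_form(rows):
    """Split each multi-valued row into one row per (part, warehouse) pair."""
    result = []
    for part, warehouses, quantities in rows:
        names = [w.strip() for w in warehouses.split(",")]
        qtys = [int(q) for q in quantities.split(",")]
        if len(names) != len(qtys):
            raise ValueError(f"warehouse/quantity mismatch for part {part}")
        for name, qty in zip(names, qtys):
            result.append((part, name, qty))
    return result


rows_1nf = to_first_normal_form(unnormalized)
for row in rows_1nf:
    print(row)
```

Each output row holds one scalar value per field, and the pair (PART, WAREHOUSE) can serve as the primary key.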


Data Dependency
• Redundant (unnecessary) data often occur when multiple databases are integrated.
    • The same attribute or object may have different names in different databases.
• Redundant attributes may be detected by correlation analysis.
• Careful integration of the data from multiple sources may help reduce or avoid redundancies and inconsistencies, and improve mining speed and quality.

Correlation Analysis
Correlation analysis involves various methods and techniques used for studying and measuring the level of the relationship between two variables. Two variables are said to be correlated if a change in one variable results in a corresponding change in the other variable.

Karl Pearson Coefficient of Correlation
Karl Pearson's measure, known as the Pearsonian correlation coefficient between two variables (series) X and Y, usually denoted by r(X, Y), r_xy or simply r, is a numerical measure of the linear relationship between them. It is defined as the ratio of the covariance between X and Y, written Cov(X, Y), to the product of the standard deviations of X and Y:

r = Cov(X, Y) / (σ_X · σ_Y)

r > 0: X and Y are positively correlated.
r < 0: X and Y are negatively correlated.
r = 0: X and Y are not correlated.

Interpretation of r
The value of r always lies between -1 and +1. r = +1 indicates perfect positive correlation and r = -1 signifies perfect negative correlation. When r is near zero, there is little or no linear correlation between X and Y.

Case Study: The following data relate to the ages of 10 machine operators and the number of days on which they reported sick in a month:

Age (X):         28  32  38  42  46  52  54  57  58  63
Sick Days (Y):    0   1   3   4   2   5   4   6   7   8

Calculate Karl Pearson's coefficient of correlation and interpret the value of r.

Mean(X) = 470/10 = 47; Mean(Y) = 40/10 = 4
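
The rest of the calculation can be checked with a short Python sketch. It applies the definition given earlier (covariance divided by the product of the standard deviations); the function name pearson_r is an illustrative choice, and the common 1/n factors are cancelled rather than computed.

```python
import math

ages = [28, 32, 38, 42, 46, 52, 54, 57, 58, 63]   # X
sick_days = [0, 1, 3, 4, 2, 5, 4, 6, 7, 8]        # Y


def pearson_r(xs, ys):
    """Pearson correlation: Cov(X, Y) / (sd(X) * sd(Y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means; the 1/n factors cancel.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


r = pearson_r(ages, sick_days)
print(f"r = {r:.4f}")
```

With these data the deviation sums are 254, 1244 and 60, so r = 254 / sqrt(1244 × 60), which is about 0.93: a strong positive correlation between age and the number of sick days.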

Second Normal Form (2NF)
• Second normal form is based on the concept of full functional dependency. Let us first consider functional dependency.
• A functional dependency (FD) is a kind of integrity constraint that generalizes the concept of a key.
• Let X and Y be two attributes of a relation. If, given the value of X, there is only one value of Y corresponding to it, then Y is said to be functionally dependent on X.
• This is indicated by the notation X → Y.
• Y is fully functionally dependent on X (full FD) if it depends on the whole of X and not on any proper subset of X.


Functional Dependencies
If one set of attributes in a table determines another set of attributes in the table, then the second set of attributes is said to be functionally dependent on the first set of attributes.

Example 1
Table Scheme: {ISBN, Title, Price}
Functional Dependencies: {ISBN} → {Title}
                         {ISBN} → {Price}
                         {Title, Price} → {ISBN}

ISBN            Title          Price
0-321-32132-1   Balloon        $34.00
0-55-123456-9   Main Street    $22.95
0-123-45678-0   Ulysses        $35.00
1-22-233700-0   Visual Basic   $25.00

Functional Dependencies
Example 2
Table Scheme: {AuID, AuName, AuPhone}
Functional Dependencies: {AuID} → {AuPhone}
                         {AuID} → {AuName}
                         {AuName, AuPhone} → {AuID}

AuID   AuName   AuPhone
1      Sleepy   321-321-1111
2      Snoopy   232-234-1234
3      Grumpy   665-235-6532
4      Jones    123-333-3333
5      Smith    654-223-3455
6      Joyce    666-666-6666
7      Roman    444-444-4444
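
A functional dependency can be tested against sample rows with a small Python sketch. The helper fd_holds is a hypothetical name, and the rows are those of Example 2. Note the limitation: sample data can only refute a dependency, because an FD is a rule about every legal state of the table, not just the rows at hand.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on the lhs attributes also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


authors = [
    {"AuID": 1, "AuName": "Sleepy", "AuPhone": "321-321-1111"},
    {"AuID": 2, "AuName": "Snoopy", "AuPhone": "232-234-1234"},
    {"AuID": 3, "AuName": "Grumpy", "AuPhone": "665-235-6532"},
    {"AuID": 4, "AuName": "Jones", "AuPhone": "123-333-3333"},
    {"AuID": 5, "AuName": "Smith", "AuPhone": "654-223-3455"},
    {"AuID": 6, "AuName": "Joyce", "AuPhone": "666-666-6666"},
    {"AuID": 7, "AuName": "Roman", "AuPhone": "444-444-4444"},
]

print(fd_holds(authors, ["AuID"], ["AuName"]))              # {AuID} -> {AuName}
print(fd_holds(authors, ["AuID"], ["AuPhone"]))             # {AuID} -> {AuPhone}
print(fd_holds(authors, ["AuName", "AuPhone"], ["AuID"]))   # {AuName, AuPhone} -> {AuID}
```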
Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements:
• The table is in first normal form.
• All non-key attributes in the table must be fully functionally dependent on the whole key (this matters when the table has a composite primary key).
Second Normal Form (2NF)
Example 1: Consider the non-normalized table book_order
Table: book_order

Order_No   Title               Qty   Unit_Price
1          Computer Networks   1     250
1          Graphics            1     275
1          DBMS                2     295
2          Multimedia          1     300
2          Data Structure      1     190
3          DBMS                1     295
3          Multimedia          2     300
3          Computer Networks   5     250

• The combination of Order_No and Title is the composite primary key, since no combination of Order_No and Title repeats in the table.
• This table is in 1NF but not in 2NF, because Unit_Price is not functionally dependent on the whole composite primary key (Order_No, Title); it depends on Title alone.
• Qty, on the other hand, is functionally dependent on the whole composite primary key.

Second Normal Form (2NF) Contd..
• To convert this relation to 2NF, the following two steps are performed:
    • Find and remove the attributes that are functionally dependent on only a part of the key and not on the whole key, and place them in a different table.
    • Group the remaining attributes.
• In the above example, Unit_Price is not functionally dependent on the whole key Order_No + Title, so we move Unit_Price along with Title into a new table called Book_Master. The remaining attributes form Order_Master.

Projecting book_order onto these two attribute sets gives:

Order_Master
Order_No   Title               Qty
1          Computer Networks   1
1          Graphics            1
1          DBMS                2
2          Multimedia          1
2          Data Structure      1
3          DBMS                1
3          Multimedia          2
3          Computer Networks   5

Book_Master
Title               Unit_Price
Computer Networks   250
Graphics            275
DBMS                295
Multimedia          300
Data Structure      190
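
The split into Order_Master and Book_Master can be reproduced with a short Python sketch. It is only an illustration using the book_order rows above held as tuples; the variable names are assumptions, and the final join is included to show that no information is lost.

```python
# (Order_No, Title, Qty, Unit_Price) rows of book_order.
book_order = [
    (1, "Computer Networks", 1, 250),
    (1, "Graphics", 1, 275),
    (1, "DBMS", 2, 295),
    (2, "Multimedia", 1, 300),
    (2, "Data Structure", 1, 190),
    (3, "DBMS", 1, 295),
    (3, "Multimedia", 2, 300),
    (3, "Computer Networks", 5, 250),
]

# Order_Master keeps the attributes that depend on the whole key (Order_No, Title).
order_master = [(order_no, title, qty) for order_no, title, qty, _ in book_order]

# Book_Master keeps Unit_Price with the part of the key it depends on (Title).
book_master = {}
for _, title, _, price in book_order:
    if book_master.setdefault(title, price) != price:
        raise ValueError(f"Title -> Unit_Price is violated for {title}")

# Joining the two tables on Title rebuilds the original rows.
rejoined = [(o, t, q, book_master[t]) for o, t, q in order_master]

print(len(order_master), "order rows,", len(book_master), "book rows")
```

Each price is now stored once per title, so changing the price of a book means updating a single row of Book_Master.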
Second Normal Form (2NF)
AKTU 2016-2017

Ex. 1: Consider the universal relation schema R(A, B, C, D, E, F, G, H, I, J) and the following set of FDs: F = {AB → C, A → DE, B → F, F → GH, D → IJ}. Determine the keys for R and decompose R into 2NF.

Sol.: (AB)+ = ABC          because AB → C
            = ABCDE        because A → DE
            = ABCDEF       because B → F
            = ABCDEFGH     because F → GH
            = ABCDEFGHIJ   because D → IJ

Hence AB is the key of R. The relation R has the composite primary key {A, B}, and the non-prime attributes are {C, D, E, F, G, H, I, J}.

In this case, the FDs A → DE and B → F have determinants that are only part of the primary key, so they are partial dependencies (AB → C depends on the whole key).

Therefore this relation does not satisfy 2NF. To bring it into 2NF, we break the table into three relations: R1(A, B, C), R2(A, D, E, I, J) and R3(B, F, G, H).
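
The closure computation used in this solution can be sketched in Python. This is a minimal version of the standard attribute-closure loop, assuming single-letter attribute names so that each side of an FD can be written as a string; the function name closure is an illustrative choice.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever all of lhs is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

print("(AB)+ =", "".join(sorted(closure("AB", fds))))
print("(A)+  =", "".join(sorted(closure("A", fds))))
print("(B)+  =", "".join(sorted(closure("B", fds))))
```

The closures of A and B alone give the attribute sets of R2 and R3, which is why those two relations are split out of R.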
Third Normal Form (3NF)

For a table to be in 3NF, there are two requirements:
• The table should be in second normal form.
• No non-key attribute is transitively dependent on the primary key.

A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold. (Transitive FD)
Example 1: Consider the non-3NF table
Table: Course_Room

Course_Name (X1)   Head_Dept (X2)   Room_No (X3)   Room_Capacity (X4)
B.Tech (CS)        Prof. Gupta      102            60
B.Tech (IT)        Prof. Smith      107            50
B.Tech (EC)        Dr. Sharma       105            60
B.Tech (AI)        Mr. Sharma       103            100
MCA                Mr. Jindal       111            40

In the above relation, Room_Capacity is functionally dependent on Room_No, and Room_No is in turn functionally dependent on Course_Name.

So a transitive functional dependency exists here, i.e., X1 → X3 and X3 → X4.

Room_Capacity (X4) is therefore transitively dependent on Course_Name (X1).

Hence the table Course_Room is not in 3NF.

To convert the above table into 3NF, we must remove the column Room_Capacity, since it depends on the primary key Course_Name only transitively, and place it in another table called Room along with the attribute Room_No, on which it is functionally dependent.

Course table
Course_Name (X1) (Primary Key)   Head_Dept (X2)   Room_No (X3)
B.Tech (CS)                      Prof. Gupta      102
B.Tech (IT)                      Prof. Smith      107
B.Tech (EC)                      Dr. Sharma       105
B.Tech (AI)                      Mr. Sharma       103
MCA                              Mr. Jindal       111

Room table
Room_No (X3)   Room_Capacity (X4)
102            60
107            50
105            60
103            100
111            40
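
The check for a transitive dependency can be sketched in Python. This is a simplified illustration and not a general 3NF test: it assumes Course_Name is the only candidate key, and it assumes the FD Course_Name → Head_Dept, Room_No, which follows from Course_Name being the primary key. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, fds, key):
    """List FDs whose determinant is not a superkey and that determine non-key attributes.

    Assumes `key` is the only candidate key of the relation.
    """
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        non_prime = (rhs - lhs) - key
        if not is_superkey and non_prime:
            violations.append((lhs, non_prime))
    return violations


attributes = {"Course_Name", "Head_Dept", "Room_No", "Room_Capacity"}
key = {"Course_Name"}
fds = [
    ({"Course_Name"}, {"Head_Dept", "Room_No"}),   # assumed from the primary key
    ({"Room_No"}, {"Room_Capacity"}),
]

violations = third_nf_violations(attributes, fds, key)
print(violations)
```

The only FD reported is Room_No → Room_Capacity, which is the dependency moved into the Room table.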
Example 2: Consider the non-3NF table

EMPLOYEE_DEPARTMENT TABLE

EMPNO    FIRSTNAME   LASTNAME   DEPT ID   DEPTNAME
000290   John        Parker     E11       Operations
000320   Ramlal      Mehta      E21       Software Support
000310   Maude       Setright   E11       Operations

DEPTNAME depends on DEPT ID, which in turn depends on EMPNO, so DEPTNAME is transitively dependent on the primary key EMPNO.

Example 2 (contd.): the tables after decomposition into 3NF

EMPLOYEE TABLE

EMPNO (Primary Key)   FIRSTNAME   LASTNAME   DEPT ID
000290                John        Parker     E11
000320                Ramlal      Mehta      E21
000310                Maude       Setright   E11

DEPARTMENT TABLE

DEPT ID (Primary Key)   DEPTNAME
E11                     Operations
E21                     Software Support


Third Normal Form (3NF)
AKTU 2013-2014

Ex. 1: Consider the universal relation schema R(A, B, C, D, E, F, G, H, I, J) and the following set of FDs: F = {AB → C, A → D, B → F, F → GH, D → IJ}. Determine the keys for R and decompose R into 2NF and 3NF.

Sol.: (ABE)+ = ABEC         because AB → C
             = ABECD        because A → D
             = ABECDF       because B → F
             = ABECDFGH     because F → GH
             = ABECDFGHIJ   because D → IJ

Hence ABE is the key of R. The relation R has the composite primary key {A, B, E}, and the non-prime attributes are {C, D, F, G, H, I, J}.

In this case, the determinants of AB → C, A → D and B → F are each only part of the primary key, so these are partial dependencies. Therefore this relation does not satisfy 2NF.

To bring it into 2NF, we move each partial dependency, together with the attributes that depend on it, into its own relation and keep the key in a separate relation: R1(A, B, C), R2(A, D, I, J), R3(B, F, G, H) and R4(A, B, E).

Now in R3, B → F and F → GH, so B → GH holds only by transitivity; likewise in R2, A → D and D → IJ. Therefore R2 and R3 are not in 3NF. To bring them into 3NF, we split R3(B, F, G, H) into R31(B, F) and R32(F, G, H), and R2(A, D, I, J) into R21(A, D) and R22(D, I, J).
Boyce-Codd Normal Form (BCNF)
• BCNF is an extended (stricter) form of 3NF.
• If a relation is in BCNF, then it is also in 3NF.
• In BCNF, we extend the concept to all the candidate keys of the relation, in particular where two or more of the candidate keys overlap by sharing a common attribute.
• In BCNF, every determinant in a table must be a candidate key.
• 3NF and BCNF are not the same if the following conditions are true:
    • The table has two or more candidate keys.
    • At least two of the candidate keys are composed of more than one attribute.
    • The keys are not disjoint, i.e., the composite candidate keys share some attributes.

Example 1: Consider the non-BCNF table
Table: Student

Stud_ID         S_Name             Subject   Grade
1908020109005   Vikas Kr. Mishra   DBMS      A
1908020109005   Vikas Kr. Mishra   DAA       B
1908020109005   Vikas Kr. Mishra   CD        B
1880210019      Rishabh Dube       DBMS      A
1880210019      Rishabh Dube       DAA       A
1880210019      Rishabh Dube       CD        B

In this relation the following FDs exist:
(S_Name, Subject → Grade), (Stud_ID, Subject → Grade), (S_Name → Stud_ID) and (Stud_ID → S_Name)

In this relation two candidate keys exist, (S_Name, Subject) and (Stud_ID, Subject), which are composite keys and share the common attribute Subject.

This relation is in 3NF; however, there is a lot of data repetition in the fields S_Name and Stud_ID.

To convert this relation to BCNF, the following two steps are performed:
• Find and remove the overlapping candidate keys. Place the part of the candidate key, together with the attribute that is functionally dependent on it, in a different table.
• Group the remaining items into a table.

Grade table
Stud_ID         Subject   Grade
1908020109005   DBMS      A
1908020109005   DAA       B
1908020109005   CD        B
1880210019      DBMS      A
1880210019      DAA       A
1880210019      CD        B

Student_ID table
Stud_ID         Name
1908020109005   Vikas Kr. Mishra
1880210019      Rishabh Dube
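
The rule that every determinant must be a candidate key can be checked mechanically. The Python sketch below is an illustration under the FDs listed for the Student table: it reports each non-trivial FD whose determinant does not determine the whole relation. The function names closure and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]


student = {"Stud_ID", "S_Name", "Subject", "Grade"}
student_fds = [
    ({"S_Name", "Subject"}, {"Grade"}),
    ({"Stud_ID", "Subject"}, {"Grade"}),
    ({"S_Name"}, {"Stud_ID"}),
    ({"Stud_ID"}, {"S_Name"}),
]

print(bcnf_violations(student, student_fds))

# The Student_ID table after decomposition, with the FDs that apply to it.
student_id = {"Stud_ID", "S_Name"}
student_id_fds = [({"S_Name"}, {"Stud_ID"}), ({"Stud_ID"}, {"S_Name"})]
print(bcnf_violations(student_id, student_id_fds))
```

For the original Student table the two FDs between S_Name and Stud_ID are reported, because neither attribute alone is a key. For the decomposed Student_ID table nothing is reported.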
Multi-Valued Dependency
1. An MVD occurs when two or more independent multi-valued facts about the same attribute occur within the same relation. It is generally denoted by X →→ Y, i.e., there is a multi-valued dependency of Y on X.

2. Let R be a relation schema and let X and Y be subsets of the attributes of R.

Ex. Relation with MVD

Faculty      Subject       Committee
Dr. Sharma   DBMS          Placement
Dr. Sharma   OS            Placement
Dr. Sharma   Data Mining   Placement
Dr. Sharma   DBMS          Discipline
Dr. Sharma   OS            Discipline
Dr. Sharma   Data Mining   Discipline
Fourth Normal Form (4NF)
• Fourth normal form eliminates independent many-to-one relationships between columns.
• To be in fourth normal form:
    • A relation must first be in BCNF.
    • A given relation may not contain more than one multi-valued attribute.

To convert the relation with the MVD to 4NF, the following two steps are performed:
1. Move the two multi-valued relations to separate tables.
2. Identify a primary key for each of the new entities.

Faculty_Course
Faculty      Subject
Dr. Sharma   DBMS
Dr. Sharma   OS
Dr. Sharma   Data Mining

Faculty_Committee
Faculty      Committee
Dr. Sharma   Placement
Dr. Sharma   Discipline
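
That this decomposition loses nothing can be seen by joining the two tables back together. The Python sketch below is only an illustration with the rows shown above held as tuples; the variable names are assumptions.

```python
faculty_course = [
    ("Dr. Sharma", "DBMS"),
    ("Dr. Sharma", "OS"),
    ("Dr. Sharma", "Data Mining"),
]
faculty_committee = [
    ("Dr. Sharma", "Placement"),
    ("Dr. Sharma", "Discipline"),
]

# Natural join on Faculty: pair every subject with every committee of the same faculty.
joined = {
    (faculty, subject, committee)
    for faculty, subject in faculty_course
    for other_faculty, committee in faculty_committee
    if faculty == other_faculty
}

for row in sorted(joined):
    print(row)
```

The join returns the six rows of the relation with the MVD, while the two smaller tables store only five rows in total and let a subject or a committee be added with a single insert.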
Fifth Normal Form (5NF)
• Fifth normal form is satisfied when all tables are broken into as many tables as possible in order to avoid redundancy. Once a relation is in fifth normal form, it cannot be broken into smaller relations without changing the facts or the meaning.
• In fifth normal form, we use the concept of join dependency, which is a generalized form of multi-valued dependency.
• A join dependency (JD), denoted by JD(R1, R2, …, Rn), specified on a relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a lossless join decomposition into R1, R2, …, Rn.
• An MVD is a special case of a JD where n = 2, i.e., a JD denoted as JD(R1, R2).


Example 1: Consider the non-5NF table

Company   Product   Supplier
Godrej    Soap      Mr. X
Godrej    Shampoo   Mr. X
Godrej    Shampoo   Mr. Y
Godrej    Shampoo   Mr. Z
H.Lever   Soap      Mr. X
H.Lever   Soap      Mr. Y
H.Lever   Shampoo   Mr. Y

In the table above, Mr. X appears twice as a supplier for Godrej, and Mr. Y appears twice for H.Lever. But if we decompose the table into two tables, we lose information, as shown below:

Company_Product (R1)
Company   Product
Godrej    Soap
Godrej    Shampoo
H.Lever   Soap
H.Lever   Shampoo

Company_Suppliers (R2)
Company   Supplier
Godrej    Mr. X
Godrej    Mr. Y
Godrej    Mr. Z
H.Lever   Mr. X
H.Lever   Mr. Y

If we want to display the products and their suppliers, we have to join the two tables on the Company attribute.

The result will contain some spurious records. For Mr. Z, it will display both products, soap and shampoo, because the company for which Mr. Z is the supplier (Godrej) produces both soap and shampoo. This is not correct: in the original table Mr. Z supplies only shampoo.

Now suppose that the original table were decomposed into three parts, as shown:

Company_Product (R1)
Company   Product
Godrej    Soap
Godrej    Shampoo
H.Lever   Soap
H.Lever   Shampoo

Company_Suppliers (R2)
Company   Supplier
Godrej    Mr. X
Godrej    Mr. Y
Godrej    Mr. Z
H.Lever   Mr. X
H.Lever   Mr. Y

Product_Suppliers (R3)
Product   Supplier
Soap      Mr. X
Soap      Mr. Y
Shampoo   Mr. X
Shampoo   Mr. Y
Shampoo   Mr. Z
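
The spurious records can be reproduced with a short Python sketch. It is only an illustration: the original rows are held as a set of tuples, the three tables are obtained by projection, and the variable names are assumptions. Working the joins through for this sample data, the two-table join adds three rows that are not in the original, and the three-table join removes the spurious row for Mr. Z but still leaves two, so the sample rows as listed do not satisfy the join dependency exactly.

```python
original = {
    ("Godrej", "Soap", "Mr. X"),
    ("Godrej", "Shampoo", "Mr. X"),
    ("Godrej", "Shampoo", "Mr. Y"),
    ("Godrej", "Shampoo", "Mr. Z"),
    ("H.Lever", "Soap", "Mr. X"),
    ("H.Lever", "Soap", "Mr. Y"),
    ("H.Lever", "Shampoo", "Mr. Y"),
}

# Projections R1, R2 and R3 of the original table.
company_product = {(c, p) for c, p, s in original}
company_supplier = {(c, s) for c, p, s in original}
product_supplier = {(p, s) for c, p, s in original}

# Join of R1 and R2 on Company.
two_way = {
    (c, p, s)
    for c, p in company_product
    for other_c, s in company_supplier
    if c == other_c
}

# Join of R1, R2 and R3: keep only rows whose (Product, Supplier) pair is in R3.
three_way = {row for row in two_way if (row[1], row[2]) in product_supplier}

print("spurious after two-table join:  ", sorted(two_way - original))
print("spurious after three-table join:", sorted(three_way - original))
```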
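Domain Key Normal Form (DKNF)
• A relation is in DKNF when there can be no insertion or deletion anomalies in the database.
• This holds when every constraint on the relation is a logical consequence of its domain constraints and key constraints.
• A key uniquely identifies each row in a table.
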

Decomposition – Loss of Information
1. If decomposition does not cause any loss of information it is called a
lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost it is
called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a
collection of smaller schemas that are in BCNF. However,
dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into 3rd normal form
in a way that also preserves the dependencies.
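
For a decomposition into two tables, losslessness can be tested from the functional dependencies alone. The Python sketch below uses the standard test, which is stated here as background rather than taken from the slides: the decomposition is lossless if the attributes common to both tables functionally determine all attributes of at least one of them. The function names are hypothetical, the FDs are those of the book_order example, and the second decomposition is a made-up lossy case for contrast.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if the common attributes of r1 and r2 determine all of r1 or all of r2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


fds = [
    ({"Order_No", "Title"}, {"Qty"}),
    ({"Title"}, {"Unit_Price"}),
]

# Order_Master and Book_Master share Title, and Title determines Unit_Price.
order_master = {"Order_No", "Title", "Qty"}
book_master = {"Title", "Unit_Price"}
print(is_lossless_binary(order_master, book_master, fds))

# Hypothetical split that shares only Qty, which determines nothing else.
lossy_left = {"Order_No", "Qty"}
lossy_right = {"Title", "Qty", "Unit_Price"}
print(is_lossless_binary(lossy_left, lossy_right, fds))
```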

3NF may be better than BCNF in some cases.

Use your own judgment when decomposing schemas.
