E-Content - Dbms - Unit - 4

The document discusses database normalization and schema refinement. It defines several normal forms including 1NF, 2NF, 3NF and BCNF. 1NF requires that relations do not have multivalued attributes. 2NF eliminates partial dependencies by removing non-key attributes to new relations. 3NF removes transitive dependencies. The goal of normalization is to minimize data anomalies like insertion, deletion and update anomalies by removing redundant data through decomposition of relations.

Unit – 4

Schema Refinement and Normal forms


Schema:
A schema is a complete description of a database. The specification of the database
schema is provided during database design, and the schema is not expected to change frequently.

4.1 Schema Refinement:


Schema refinement is the process of refining a schema so as to solve the problems caused by
storing information redundantly.
Redundancy means duplication of data. Redundancy is at the root of several problems
associated with relational schemas. Some of them are:
• Redundant storage
• Insert/delete/update anomalies

Figure 4.1
4.2 Problems caused by Redundancy:
4.2.1. Data Inconsistency
4.2.2. Memory Fragmentation
4.2.3. Anomalies:
There are three types of anomalies:
i. update,
ii. deletion and
iii. insertion anomalies.
Let us consider an example :
Each employee in a company has a department associated with them as well as the student
group they participate in.
Employee_ID   Name      Department   Student_Group
123           James     Accounting   Pst club
234           B. Rech   Marketing    Marketing Club
234           B. Rech   Marketing    Management Club
456           Anand     CIS          Technology Org.
456           Anand     CIS          Pst club

4.2.3.1 Update Anomaly:

An update anomaly is a data inconsistency that results from data redundancy combined with a partial
update.
For example, if Anand's department is recorded incorrectly, it must be corrected in both of his rows,
or the database will hold inconsistent data. If the user performing the update does not realize that the
data is stored redundantly, the update will not be done properly.
4.2.3.2 Deletion Anomaly:
A deletion anomaly is the unintended loss of data due to the deletion of other data.
For example, if the student group Pst club is disbanded and its rows are deleted from the table above,
James and the Accounting department would no longer appear in the database. This unintended loss
is an example of how combining information that does not really belong together into one table can
cause problems.
4.2.3.3 Insertion Anomaly:
An insertion anomaly is the inability to add data to the database due to the absence of other data.
For example, assume Student_Group is defined so that null values are not allowed. If a new
employee is hired but not immediately assigned to a Student_Group, then this employee cannot be
entered into the database. The database is then incomplete because of the omission.
Update, deletion, and insertion anomalies are very undesirable in any database. Anomalies are
avoided by the process of normalization.
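The update and deletion anomalies described above can be reproduced with a small sketch. It stores the example table as a list of Python dictionaries; the helper name `departments_of` and the replacement department "Finance" are assumptions made only for illustration.

```python
# Rows of the employee table from the example (one row per employee/group pair).
rows = [
    {"Employee_ID": 123, "Name": "James", "Department": "Accounting", "Student_Group": "Pst club"},
    {"Employee_ID": 234, "Name": "B. Rech", "Department": "Marketing", "Student_Group": "Marketing Club"},
    {"Employee_ID": 234, "Name": "B. Rech", "Department": "Marketing", "Student_Group": "Management Club"},
    {"Employee_ID": 456, "Name": "Anand", "Department": "CIS", "Student_Group": "Technology Org."},
    {"Employee_ID": 456, "Name": "Anand", "Department": "CIS", "Student_Group": "Pst club"},
]

def departments_of(table, employee_id):
    """Return the set of departments recorded for one employee."""
    return {r["Department"] for r in table if r["Employee_ID"] == employee_id}

# Update anomaly: only the first of Anand's two rows is changed.
partial = [dict(r) for r in rows]
for r in partial:
    if r["Employee_ID"] == 456:
        r["Department"] = "Finance"  # hypothetical corrected value
        break
inconsistent = departments_of(partial, 456)  # now holds two different departments

# Deletion anomaly: removing every "Pst club" row also removes James entirely.
after_delete = [r for r in rows if r["Student_Group"] != "Pst club"]
remaining_ids = {r["Employee_ID"] for r in after_delete}
```

After the partial update, Anand is recorded in two departments at once; after the deletion, employee 123 no longer appears anywhere, although only a club was meant to be removed.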
Ways to avoid Data Anomalies:
There are two ways to avoid data anomalies. They are :
1. Normalization
2. Decomposition
4.4 Functional Dependency:
A functional dependency is a constraint between two sets of attributes of a single relation.
A functional dependency (FD) has the form X → Y (read as "X functionally determines Y"),
where X and Y are sets of attributes in a relation R. The value of X determines the value of Y, so
Y is said to be functionally dependent on X.
Example: A student can have only one birth year: S → B
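As a rough illustration, the definition can be checked against a concrete set of rows: X → Y is violated as soon as two rows agree on X but differ on Y. The function name `fd_holds` and the sample rows are assumptions for this sketch. A relation instance can only refute an FD; whether the FD really holds is a property of the schema, decided by the designer.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample data: each student has one birth year but several courses.
students = [
    {"student": "S1", "birth_year": 2001, "course": "C"},
    {"student": "S1", "birth_year": 2001, "course": "C++"},
    {"student": "S2", "birth_year": 2002, "course": "C++"},
]

student_determines_year = fd_holds(students, ["student"], ["birth_year"])
student_determines_course = fd_holds(students, ["student"], ["course"])
```

Here `student_determines_year` is true, matching S → B, while `student_determines_course` is false because S1 appears with two different courses.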
Functional Dependencies in entity sets :
An entity set can have the following functional dependencies :
4.4.1. Fully Functional Dependency:
If x and y are attribute sets of an entity set such that y is functionally dependent
on x, but not on any proper subset of x, then the dependency is called a Fully
Functional Dependency.
Eg: RollNo, SubName → Marks.
4.4.2. Partial Functional Dependency:
If x and y are attribute sets of an entity set such that y is functionally dependent
on x and removing some attributes from x does not affect the dependency, then the
dependency is called a Partial Functional Dependency.
Eg: emp_id, emp_name → salary (salary is already determined by emp_id alone).
4.4.3. Transitive Functional Dependency:
If x, y, z are attribute sets of an entity set such that y is functionally dependent on x (x → y)
and z is functionally dependent on y (y → z), then z is transitively dependent on x through y (x → z).
Eg: Students → Teachers
Teachers → Management
Therefore: Students → Management

4.5 Reasoning about FD’s:


Armstrong’s Axioms:
William W. Armstrong established a set of rules that can be used to infer the functional
dependencies in a relational database (source: umbc.edu, Database Design). Here "A implies B"
means A → B:
• Reflexivity rule:
If A is a set of attributes, and B is a set of attributes completely contained in A,
then A implies B.
• Augmentation rule:
If A implies B, and C is a set of attributes, then AC implies BC.
• Transitivity rule:
If A implies B and B implies C, then A implies C.
Inference can be simplified if we also use the following derived rules:
• Union rule: If A implies B and A implies C, then A implies BC.
• Decomposition rule: If A implies BC, then A implies B and A implies C.
• Pseudo transitivity rule: If A implies B and CB implies D, then AC implies D.
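In practice the axioms are rarely applied one by one. A common shortcut is the attribute closure X+, the set of all attributes that X determines under a given set of FDs; then X → Y can be inferred exactly when Y is contained in X+. The following is a minimal sketch of that computation. The function names and the sample FD set are assumptions for illustration, and each attribute is written as a single letter.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from `fds`."""
    return set(rhs) <= closure(lhs, fds)

fds = [("A", "B"), ("B", "C")]  # hypothetical FD set: A -> B, B -> C

a_plus = closure("A", fds)
transitive = implies(fds, "A", "C")     # transitivity: A -> C
augmented = implies(fds, "AC", "BC")    # augmentation of A -> B with C
not_implied = implies(fds, "C", "A")    # nothing determines A from C
```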

4.6 Normalization:
Normalization is a relational database design process that structures the database so as to
minimize data redundancy and data anomalies.
The process of normalization proceeds through a series of stages known as normal forms. The
normalization rules are divided into the following normal forms:
1. First Normal Form (1 NF)
2. Second Normal Form (2 NF)
3. Third Normal Form (3 NF)
4. Boyce-Codd Normal Form (BCNF)
4.6.1 First Normal Form (1 NF) :
A relation is in First Normal Form if and only if all underlying domains contain atomic values only.
In other words, the relation has no multivalued attributes.
For example:
Consider a STUDENT (Sid, Sname, Cname) relation.

Student :

SID   Sname   Cname
S1    A       C, C++
S2    B       C++, DB
S3    A       DB

SID : Primary Key

Due to the occurrence of multivalued attributes (MVA), the above relation is not in 1NF.


Solution :

Removal of MVA by inserting more rows

Student :

SID   Sname   Cname
S1    A       C
S1    A       C++
S2    B       C++
S2    B       DB
S3    A       DB

{SID, Cname} : Primary Key

⇐ The relation is in 1NF
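The row-splitting step can be sketched in a few lines. The sketch assumes the multivalued Cname is stored as a comma-separated string, which is only one possible representation; the function name `to_1nf` is hypothetical.

```python
# Student relation with a multivalued Cname, stored here as a comma-separated string.
student = [
    ("S1", "A", "C,C++"),
    ("S2", "B", "C++,DB"),
    ("S3", "A", "DB"),
]

def to_1nf(rows):
    """Emit one row per (SID, Sname, single course) so every value is atomic."""
    flat = []
    for sid, sname, cnames in rows:
        for cname in cnames.split(","):
            flat.append((sid, sname, cname.strip()))
    return flat

student_1nf = to_1nf(student)
```

The result has five rows, one per student and course, which is why SID alone no longer identifies a row.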

4.6.2 2NF – Second Normal Form :


Relation R is in Second Normal Form (2NF) if and only if:
1. R is in 1NF, and
2. R does not contain any partial dependency.

Partial Dependency :

Let R be a relational schema and let X, Y, A be attribute sets over R, where

X: any candidate key
Y: a proper subset of a candidate key
A: a non-prime attribute (one that belongs to no candidate key)
If Y → A holds in R, then R is not in 2NF.
(Y → A) is a partial dependency only if
• Y is a proper subset of a candidate key
• A is a non-prime attribute
Removal of Partial Dependency
If there is any partial dependency, remove the partially dependent attributes from the original table
and place them in a separate table along with a copy of their determinant.
Example 1 : Consider the relation
Student(SID, Sname, Cname), which is in 1NF (no multivalued attributes) :

Student :

SID   Sname   Cname
S1    A       C
S1    A       C++
S2    B       C++
S2    B       DB
S3    A       DB

{SID, Cname} : Primary Key

Functional Dependencies:
{SID, Cname} → Sname
SID → Sname

Partial Dependencies :
SID → Sname {as SID is a proper subset of the candidate key {SID, Cname}}

Solution : Removal of Partial Dependency by creating separate table

R1 :

SID   Sname
S1    A
S2    B
S3    A

SID : Primary Key

R2 :

SID   Cname
S1    C
S1    C++
S2    C++
S2    DB
S3    DB

{SID, Cname} : Primary Key


The decomposition into R1 and R2 is lossless-join and dependency preserving, and both relations are in 2NF.
* There is less redundancy in 2NF than in 1NF, but 2NF is not free from redundancy.
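The 2NF condition can be checked mechanically: take every proper subset of the key and see whether its closure reaches a non-prime attribute. The sketch below assumes the relation has a single candidate key, so that the prime attributes are exactly the key attributes; the function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) tuples of names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(relation, key, fds):
    """List (subset, dependents) where a proper subset of `key` determines non-prime attributes.

    Assumes `key` is the only candidate key of `relation`.
    """
    non_prime = set(relation) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            dependents = closure(subset, fds) & non_prime
            if dependents:
                found.append((subset, sorted(dependents)))
    return found

fds = [(("SID",), ("Sname",))]  # SID -> Sname, from the example

in_student = partial_dependencies(("SID", "Sname", "Cname"), ("SID", "Cname"), fds)
in_r1 = partial_dependencies(("SID", "Sname"), ("SID",), fds)
in_r2 = partial_dependencies(("SID", "Cname"), ("SID", "Cname"), [])
```

For Student the check reports that SID alone determines Sname; for R1 and R2 it reports nothing, which agrees with both being in 2NF.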
4.6.3 3NF – Third Normal Form:
Let R be a relational schema. R is in 3NF if :
• R is in 2NF.
• R does not contain transitive dependencies.

Transitive Dependency

Let R be a relational schema and let X, Y, Z be attribute sets over R.

If Y is functionally dependent on X (X → Y)
and Z is functionally dependent on Y (Y → Z)
then Z is transitively dependent on X (X → Z)

Removal of Transitive Dependency


If there is any transitive dependency in the relation, then
• Create a separate relation containing the dependent attribute along with a copy of its
determinant, and remove the dependent attribute from the original table.
• Mark the determinant as a foreign key in the original relation and as the primary key
in the separate relation.

Example of 3NF :
Let us consider the relation Sup_City(SID, Status, City) :

Sup_City :

SID Status City

S1 30 Delhi

S2 10 Karnal

S3 40 Rohtak

S4 30 Delhi

SID : Primary Key


Transitive Dependency :
SID → Status {as SID → City and City → Status}

Solution :
Removal of Transitive Dependency by creating separate table

SC :

SID   City
S1    Delhi
S2    Karnal
S3    Rohtak
S4    Delhi

SID : Primary Key

CS :

City     Status
Delhi    30
Karnal   10
Rohtak   40

City : Primary Key

The relations SC and CS are in 3NF as they do not contain any transitive dependencies.
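A small sketch of the same decomposition, using Python dictionaries keyed by each relation's primary key (an illustrative representation, not the only one). It shows that the status of a city is stored once and that the original rows can be rebuilt by following City as a foreign key. The helper `status_of` is hypothetical.

```python
# Sup_City(SID, Status, City) from the example.
sup_city = [
    ("S1", 30, "Delhi"),
    ("S2", 10, "Karnal"),
    ("S3", 40, "Rohtak"),
    ("S4", 30, "Delhi"),
]

# SC(SID, City): SID is the primary key, City acts as a foreign key into CS.
sc = {sid: city for sid, _, city in sup_city}
# CS(City, Status): City is the primary key, so each city's status is stored once.
cs = {city: status for _, status, city in sup_city}

def status_of(sid):
    """Look up a supplier's status through its city (SID -> City -> Status)."""
    return cs[sc[sid]]

# Rebuilding the original rows from SC and CS loses nothing.
rebuilt = [(sid, status_of(sid), city) for sid, city in sc.items()]
```

Delhi's status now lives in a single entry of `cs`, so changing it cannot leave S1 and S4 with different values.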
4.6.4 BCNF – Boyce Codd Normal Form:
Let R be a relational schema. R is in BCNF if :
• R is in 3NF.
• Every non-trivial functional dependency has a superkey on its LHS, i.e. all determinants are
superkeys.
Example :
Consider the following relation R(ABCD) having the following functional dependencies:

F = {A → BCD, BC → AD, D → B}

Candidate Keys are :


(A)+ = {ABCD}
(BC)+ = {BCAD}
(DC)+ = {DCBA}
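These candidate keys can be found by brute force: try attribute sets in order of increasing size and keep those whose closure covers R and that contain no smaller key. This is a sketch suitable only for small schemas, since it enumerates all subsets; the function names are assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) >= set(relation):
                keys.append(combo)
    return keys

F = [("A", "BCD"), ("BC", "AD"), ("D", "B")]
keys = candidate_keys("ABCD", F)
```

For this F the result contains A, BC and CD, matching the three closures listed above.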

Functional Dependency   Is FD in BCNF or not ?   Reason ?

A → BCD                 Yes                      A is a super key
BC → AD                 Yes                      BC is also a super key
D → B                   No                       D is not a super key; it is only part of a key


Solution :
Decomposition in BCNF:
The relation R(ABCD) is decomposed into two relations R1 and R2 such that :
R1(A,D,C) and R2(D,B).
The decomposition into R1 and R2 is :
1. Lossless join
2. A BCNF decomposition
3. But not dependency preserving
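The BCNF test used in this example can be sketched as follows: a non-trivial FD violates BCNF when the closure of its left side does not cover the whole relation. The function names are assumptions, and the FD sets given for R1 and R2 are the projections of F onto those relations, derived by hand for this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left side is not a superkey of `relation`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not set(relation) <= closure(lhs, fds)
    ]

F = [("A", "BCD"), ("BC", "AD"), ("D", "B")]
violations = bcnf_violations("ABCD", F)

# Hand-derived projections of F onto the decomposed relations (assumption of this sketch).
r1_violations = bcnf_violations("ADC", [("A", "CD"), ("CD", "A")])
r2_violations = bcnf_violations("DB", [("D", "B")])
```

Only D → B is reported for R(ABCD), and neither R1(A,D,C) nor R2(D,B) reports a violation. Note that the function checks only the FDs it is given, so a full check of a decomposed relation depends on projecting F correctly first.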
Redundancy in BCNF
In BCNF there is no redundancy that arises from (single-valued) functional dependencies.
Redundancy may still exist because of multivalued dependencies.
Multivalued Dependency
Consider a relation Faculty (FID, Course, Book) which consists of two multivalued attributes
(Course and Book). The two multivalued attributes are independent of each other.

FID Course Book

1 C1/C2 B1/B2

2 C1 B1

Differences between 3NF and BCNF :

S.NO   3NF                                          BCNF

1.     It concentrates on Primary Key               It concentrates on Candidate Key.
2.     Redundancy is high as compared to BCNF       0% redundancy
3.     It may preserve all the dependencies         It may not preserve the dependencies.
4.     A dependency X → Y is allowed in 3NF if X    A dependency X → Y is allowed if X is a
       is a super key or Y is a part of some key.   super key

Decomposition:
Decomposition is the process of dividing larger relations into smaller relations. This division is
based on functional and other dependencies, which are specified by the database designer.
The decomposition of a relation schema R consists of replacing the relation schema by two or more
relation schemas that each contain a subset of the attributes of R and together include all attributes in
R.
Problems related to Decomposition:
Decomposing a relation schema can create more problems than it solves unless it is handled carefully.
• Two important questions must be asked repeatedly. They are:
1. Do we need to decompose a relation ?
2. What problems does a given decomposition cause ?
• To answer the first question, several normal forms have been proposed for relations. If a
relation schema is in one of these normal forms, we know that certain kinds of problems
cannot arise.
• To answer the second question, two properties of decomposition are of particular
interest :
1. Loss-less join decomposition
2. Dependency Preserving Decomposition

4.7 Lossless join Decomposition:


Lossy Decomposition :
"The decomposition of relation R into R1 and R2 is lossy when the join of R1 and R2 does not
yield the same relation as in R."
One of the risks of decomposing a relation into two or more relational schemas (or tables) is that
some information may be lost when the original relation or table is reconstructed.
Consider a table STUDENT with three attributes: roll_no, sname and department.

STUDENT:

Roll_no Sname Dept

111 parimal COMPUTER

222 parimal ELECTRICAL

This relation is decomposed into two relations, No_name and Name_dept :

No_name :

Roll_no   Sname
111       parimal
222       parimal

Name_dept :

Sname     Dept
parimal   COMPUTER
parimal   ELECTRICAL

In a lossy decomposition, spurious tuples are generated when a natural join is applied to the relations in
the decomposition.
stu_joined :

Roll_no Sname Dept

111 parimal COMPUTER

111 parimal ELECTRICAL

222 parimal COMPUTER

222 parimal ELECTRICAL

The above decomposition is a bad decomposition or Lossy decomposition.


Lossless Join Decomposition :
"The decomposition of relation R into R1 and R2 is lossless when the join of R1 and R2 yield the
same relation as in R."
A relational table is decomposed (or factored) into two or more smaller tables in such a way that
the designer can recover the precise content of the original table by joining the decomposed parts.
This is called a lossless-join (or non-additive join) decomposition.
A lossless-join decomposition is always defined with respect to a specific set F of dependencies.
Example:
Consider a table STUDENT with three attributes: roll_no, sname and department.
STUDENT :

Roll_no Sname Dept

111 parimal COMPUTER

222 parimal ELECTRICAL

This relation is decomposed into two relations, Stu_name and Stu_dept :

Stu_name: Stu_dept :

Roll_no Sname Roll_no Dept

111 parimal 111 COMPUTER

222 parimal 222 ELECTRICAL

Now, when these two relations are joined on the common column 'roll_no', the resultant relation will
look like stu_joined.

stu_joined :

Roll_no Sname Dept

111 parimal COMPUTER

222 parimal ELECTRICAL

In a lossless decomposition, no spurious tuples are generated when a natural join is applied to the
relations in the decomposition.
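Both STUDENT decompositions can be replayed with a small projection and natural-join sketch. Rows are represented as frozensets of (attribute, value) pairs so that relations behave like sets; this representation and the helper names `row`, `project` and `natural_join` are assumptions made for illustration.

```python
def row(**values):
    """Build one tuple of a relation as a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())

def project(rows, attrs):
    """Project a relation onto `attrs`; duplicates collapse because the result is a set."""
    return {frozenset((a, v) for a, v in r if a in attrs) for r in rows}

def natural_join(left, right):
    """Join two relations on all attributes they have in common."""
    out = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # Rows match when every shared attribute carries the same value.
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(dict(r))
                out.add(frozenset(merged.items()))
    return out

student = {
    row(Roll_no=111, Sname="parimal", Dept="COMPUTER"),
    row(Roll_no=222, Sname="parimal", Dept="ELECTRICAL"),
}

# Lossy: the common attribute Sname does not identify a student.
lossy = natural_join(project(student, {"Roll_no", "Sname"}),
                     project(student, {"Sname", "Dept"}))
spurious = lossy - student

# Lossless: the common attribute Roll_no is a key of both parts.
lossless = natural_join(project(student, {"Roll_no", "Sname"}),
                        project(student, {"Roll_no", "Dept"}))
```

The first join produces four tuples, two of them spurious; the second reproduces STUDENT exactly.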

Tests for lossless join decomposition:


A decomposition of R into R1 and R2 is lossless if it satisfies the following conditions :
1. R1 ∪ R2 = R
2. R1 ∩ R2 ≠ Φ and
3. R1 ∩ R2 → R1 or R1 ∩ R2 → R2
Example 1 :
Let R(ABC), F = {A → B, A → C}, be decomposed into D = {R1(AB), R2(BC)}. Find whether D is
lossless or lossy.
Solution :
Given D = {AB, BC}
Step 1: AB ∪ BC = ABC
Step 2: AB ∩ BC = B //Intersection
Step 3: B+ = {B} //Not a super key of R1 or R2
Therefore, the given decomposition is lossy.

Example 2 :
Let R(ABCDEF), F = {A → B, B → C, C → D, E → F}, be decomposed into D = {R1(AB), R2(BCD), R3(DEF)}.
Find whether D is lossless or lossy.
Solution :
Step 1:
AB ∪ BCD ∪ DEF = ABCDEF = R // Condition 1 is satisfied
Step 2:
AB ∩ BCD = B
B+ = {BCD} // superkey of R2
⇒ R12(ABCD)

ABCD ∩ DEF = D
D+ = {D} // Not a superkey of R12 or R3
Therefore, the given decomposition is lossy.
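The three conditions of the test translate directly into code. The sketch below checks one pair of relations at a time, in the same way the two examples proceed; the function names are assumptions and each attribute is a single letter.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r, r1, r2, fds):
    """Apply the three conditions to a decomposition of `r` into `r1` and `r2`."""
    r, r1, r2 = set(r), set(r1), set(r2)
    common = r1 & r2
    if r1 | r2 != r or not common:      # conditions 1 and 2
        return False
    reach = closure(common, fds)        # condition 3: common attributes
    return r1 <= reach or r2 <= reach   # must determine all of R1 or all of R2

# Example 1: R(ABC) split into AB and BC.
example1 = lossless_binary("ABC", "AB", "BC", [("A", "B"), ("A", "C")])

# Example 2: first combine AB with BCD, then the result ABCD with DEF.
F2 = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
step_one = lossless_binary("ABCD", "AB", "BCD", F2)
step_two = lossless_binary("ABCDEF", "ABCD", "DEF", F2)

# For comparison, splitting Example 1 into AB and AC keeps the key A in common.
alternative = lossless_binary("ABC", "AB", "AC", [("A", "B"), ("A", "C")])
```

Example 1 fails at condition 3, and Example 2 passes the first step but fails the second, which matches the conclusions above. The AB/AC split of Example 1 passes, because A determines both parts.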

4.8 Dependency Preserving Decomposition:


Another property of decomposition is dependency preservation.
If the original table is decomposed into multiple fragments, we should still be able to obtain
all the original FDs from these fragments. In other words, every dependency of the original table must be
preserved, i.e. every dependency must be enforceable using the decomposed tables alone.
Let R be the original relational schema with FD set F. Let R1 and R2, with FD sets F1 and F2
respectively, be the decomposed sub-relations of R. Then there can be three cases:
• F1 U F2 is equivalent to F (it implies every FD in F) -----> Decomposition is dependency preserving.
• F1 U F2 implies only some of the FDs in F -----> Not dependency preserving.
• F1 U F2 implies more than F -----> This case is not possible.
Example:
Consider a relation R(A, B, C, D) with functional dependencies F = {AB --> C, C --> D, D --> A}. Relation
R is decomposed into R1(A, B, C) and R2(C, D). Check whether the decomposition is dependency
preserving or not.
Solution:
Given R1(A, B, C) and R2(C, D)

Let us find closure of F1 and F2:


To find F1, consider every proper non-empty subset of the attributes of R1, i.e., find the closure of
A, B, C, AB, BC and AC under F, and keep only the attributes of R1. Note that ABC is not considered,
as it can only yield trivial dependencies within R1.

closure(A) = { A } // Trivial
closure(B) = { B } // Trivial
closure(C) = {C, A, D}, but D cannot be kept as D is not present in R1.
= {C, A}
C --> A // Removing C from the right side as it is a trivial attribute

closure(AB) = {A, B, C, D} = {A, B, C} // D is dropped as it is not in R1
AB --> C // Removing AB from the right side as these are trivial attributes

closure(BC) = {B, C, D, A} = {A, B, C} // D is dropped as it is not in R1
BC --> A // Removing BC from the right side as these are trivial attributes

closure(AC) = {A, C, D} = {A, C} // D is dropped as it is not in R1
No non-trivial dependency is obtained from AC.

F1 = {C --> A, AB --> C, BC --> A}.

Similarly, F2 = {C --> D}.

The original relation has the dependencies F = {AB --> C, C --> D, D --> A}.

AB --> C is present in F1.
C --> D is present in F2.
D --> A is not preserved.
F1 U F2 does not imply every dependency in F.
Therefore, the given decomposition is not dependency preserving.
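The same check can be done without writing out F1 and F2. For each FD X → Y in F, grow a set starting from X by repeatedly taking, for each decomposed relation Ri, the closure under F of the attributes already collected that lie in Ri, restricted to Ri. The FD is preserved if Y ends up in the set. This is a sketch of that standard procedure; the function names are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(lhs, rhs, parts, fds):
    """True if lhs -> rhs follows from the projections of `fds` onto `parts`."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # What this part alone can derive from the attributes known so far.
            gained = closure(result & set(part), fds) & set(part)
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

def lost_dependencies(parts, fds):
    """Return the FDs of `fds` that the decomposition `parts` does not preserve."""
    return [(l, r) for l, r in fds if not is_preserved(l, r, parts, fds)]

F = [("AB", "C"), ("C", "D"), ("D", "A")]
lost = lost_dependencies(["ABC", "CD"], F)
```

For R1(A, B, C) and R2(C, D) the only lost dependency is D → A, in agreement with the result obtained by hand.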

TEXT BOOK:
1. Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw-Hill.
2. Web pages
