Lec 04
Normal Forms
CS3402 1
Functional Dependency
Functional dependency is a constraint between two sets of
attributes from the database
Formal definition:
Let R be a relation schema, and X ⊆ R, Y ⊆ R (i.e., X and Y are
sets of R’s attributes). We say X -> Y (X functionally determines Y):
If in any relation instance r(R), for all pairs of tuples t1 and t2 in
r, we have:
(t1[X] = t2[X]) ⇒ (t1[Y] = t2[Y])
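The definition can be sketched in Python as a check on one relation instance. The function name fd_holds and the dict-based row layout are assumptions made for illustration; the sample rows are the supplier tuples used later in this lecture. Note that a single instance can only refute an FD: an FD is a constraint on every legal instance of R.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by this relation instance."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True

rows = [
    {"S#": "S3", "Country": "USA", "City": "N.Y."},
    {"S#": "S5", "Country": "USA", "City": "L.A."},
]
print(fd_holds(rows, ["S#"], ["City"]))       # True
print(fd_holds(rows, ["Country"], ["City"]))  # False: USA appears with two cities
```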
Inference Rules for FDs
Closure of a set of attributes X with respect to F is the set X+ of all
attributes that are functionally determined by X
Note both X and X+ are a set of attributes
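One common way to compute X+ is a fixed-point loop: keep adding the right-hand side of every FD whose left-hand side is already covered. The sketch below assumes FDs are encoded as (lhs, rhs) pairs of Python sets; the name closure is illustrative. The FDs are those of the supplier example in this lecture.

```python
def closure(attrs, fds):
    """Compute the closure X+ of the attribute set attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already covers lhs, everything in rhs is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [
    ({"S#"}, {"City"}),
    ({"City"}, {"Country"}),
    ({"S#", "P#"}, {"Qty"}),
]
print(sorted(closure({"S#"}, F)))        # ['City', 'Country', 'S#']
print(sorted(closure({"S#", "P#"}, F)))  # ['City', 'Country', 'P#', 'Qty', 'S#']
```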
Inference Rules for FDs
Closure of a set F of FDs is the set F+ of all FDs that can be inferred
from F.
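F+ is usually too large to enumerate, so membership is typically tested through attribute closure instead: X -> Y is in F+ exactly when Y ⊆ X+. A minimal sketch, assuming FDs are (lhs, rhs) pairs of sets; the names closure and implies are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (fixed-point iteration)."""
    result = set(attrs)
    while True:
        added = set()
        for lhs, rhs in fds:
            if lhs <= result:
                added |= rhs - result
        if not added:
            return result
        result |= added

def implies(fds, lhs, rhs):
    """True if lhs -> rhs belongs to F+, i.e. rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)

F = [({"S#"}, {"City"}), ({"City"}, {"Country"})]
print(implies(F, {"S#"}, {"Country"}))   # True (by transitivity)
print(implies(F, {"Country"}, {"City"})) # False
```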
Relational Database Design
Logical / Conceptual DB Design
Schema
What relations (tables) are needed?
What should their attributes be?
Relational Database Design
Introduction (cont’d)
Normalization theory
- based on functional dependencies
* universe of relations
* 1st Normal Form (1NF)
* 2NF ⊂ 1NF
* 3NF ⊂ 2NF
* BCNF ⊂ 3NF
* 4NF ⊂ BCNF
* 5NF ⊂ 4NF
* ...
Normalization
Proposed by Codd (1972)
Take a relation schema through a series of tests based on FDs
and primary keys to certify whether it satisfies a certain normal
form
Decompose a relation into several related relations to achieve:
Minimizing redundancy
Minimizing the insertion, deletion and update anomalies
Normalization
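Normalization requires two properties
Lossless (nonadditive) join: joining the decomposed relations gives back
the original relation, with no spurious tuples
Dependency preservation: each FD can still be enforced on the
decomposed relations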
Lossless Decomposition
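Original relation:
S#   Country  City
S3   USA      N.Y.
S5   USA      L.A.

Decomposition 1 (reversible): (S#, Country) and (S#, City)
S#   Country      S#   City
S3   USA          S3   N.Y.
S5   USA          S5   L.A.
Natural join on S# gives back the original relation.

Decomposition 2 (not reversible): (S#, Country) and (Country, City)
S#   Country      Country  City
S3   USA          USA      N.Y.
S5   USA          USA      L.A.
Natural join on Country:
Country  City  S#
USA      N.Y.  S3
USA      N.Y.  S5
USA      L.A.  S3
USA      L.A.  S5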
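The two decompositions can be replayed in a few lines of Python. This is a sketch only: relations are modelled as sets of frozensets of (attribute, value) pairs, and the helper names rel, project and natural_join are assumptions made for illustration.

```python
def rel(*rows):
    """Build a relation (a set of tuples) from dict rows."""
    return {frozenset(row.items()) for row in rows}

def project(relation, attrs):
    """Projection onto attrs; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r1, r2):
    """Join tuples that agree on every attribute the two relations share."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

r = rel(
    {"S#": "S3", "Country": "USA", "City": "N.Y."},
    {"S#": "S5", "Country": "USA", "City": "L.A."},
)

# Decomposition 1: both parts keep S#.
reversible = natural_join(project(r, {"S#", "Country"}), project(r, {"S#", "City"}))
# Decomposition 2: the parts share only Country.
lossy = natural_join(project(r, {"S#", "Country"}), project(r, {"Country", "City"}))

print(reversible == r)  # True: the join gives back the original relation
print(len(lossy))       # 4: two of the four tuples are spurious
```

The second join is called lossy even though it returns more tuples: the information about which supplier is in which city has been lost.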
First Normal Form with Primary Key
First normal form (1NF)
Disallow multivalued attributes, composite attributes, and their
combinations
The domain of an attribute must be atomic (simple and
indivisible) values
Figure 14.10 Normalizing nested relations into 1NF.
(a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS.
(b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple.
(c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.
Problems with 1NF
Example:
FIRST(S#, Country, City, P#, Qty)
and its functional dependency diagram:
(S#, P#) -> Qty
S# -> Country
S# -> City
What’s the primary key? (S#, P#)
Problems with 1NF
Insert Anomalies
Inability to represent certain information:
E.g., “Supplier and City” information cannot be entered until
the supplier supplies at least one product
Delete Anomalies
Deleting the “only tuple” for a supplier will destroy all
the information about that supplier
Update Anomalies
“S# and City” could be redundantly represented for
each P#, which may cause potential inconsistency
when updating a tuple
Solution: 1NF->2NF
Possible Solution:
Replace the original table by two sub-tables
SECOND(S#, City, Country)
SP (S#, P#, Qty)
and now, their FD diagrams are:
SECOND: S# -> Country, S# -> City
SP: (S#, P#) -> Qty
Second Normal Form with Primary Key
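X->Y is a full functional dependency if removing any attribute from X
means that the dependency no longer holds; otherwise X->Y is a partial
dependency
A prime attribute is a member of some candidate key; all other
attributes are nonprime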
Second Normal Form with Primary Key
A relation R is in 2NF if every nonprime attribute A in R is fully
functionally dependent on the primary key of R
Second Normal Form with Primary Key
In FIRST, the nonprime attribute Country violates 2NF because of the
partial dependency S#->Country
Similarly, City violates 2NF because of S#->City
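The 2NF test with a primary key can be sketched as a search for partial dependencies: take every proper subset of the key and see which nonprime attributes fall in its closure. The function names and the (lhs, rhs) set encoding of FDs are assumptions made for illustration; the schemas are FIRST, SECOND and SP from this lecture.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (fixed-point iteration)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """List (key_subset, attribute) pairs where a nonprime attribute
    depends on a proper subset of the primary key."""
    nonprime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            for a in sorted(closure(subset, fds) & nonprime):
                found.append((subset, a))
    return found

F = [({"S#"}, {"Country"}), ({"S#"}, {"City"}), ({"S#", "P#"}, {"Qty"})]
FIRST = {"S#", "Country", "City", "P#", "Qty"}

print(partial_dependencies({"S#", "P#"}, FIRST, F))
# [(('S#',), 'City'), (('S#',), 'Country')]
print(partial_dependencies({"S#"}, {"S#", "City", "Country"}, F))  # [] (SECOND)
print(partial_dependencies({"S#", "P#"}, {"S#", "P#", "Qty"}, F))  # [] (SP)
```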
Problem with 2NF
FD diagrams:
SP: (S#, P#) -> Qty
SECOND: S# -> City, S# -> Country, City -> Country
Third Normal Form with Primary Key
3NF is based on the concept of transitive dependency
(S#, P#) -> Qty
S# -> City
City -> Country
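A transitive dependency shows up as an FD X -> A where X is not a superkey and A is a nonprime attribute. The sketch below applies that test to SECOND(S#, City, Country) with S# -> City and City -> Country, then to the two relations obtained by splitting it. The function names and the set encoding of FDs are assumptions made for illustration; only the primary key is considered here, as on this slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (fixed-point iteration)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violations_3nf_pk(attributes, key, fds):
    """FDs X -> A where X is not a superkey and A is outside the primary key."""
    attributes, key = set(attributes), set(key)
    out = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # lhs is a superkey, so this FD is harmless
        for a in sorted(rhs - lhs - key):
            out.append((sorted(lhs), a))
    return out

SECOND = {"S#", "City", "Country"}
F = [({"S#"}, {"City"}), ({"City"}, {"Country"})]
print(violations_3nf_pk(SECOND, {"S#"}, F))  # [(['City'], 'Country')]

# After splitting SECOND into (S#, City) and (City, Country):
print(violations_3nf_pk({"S#", "City"}, {"S#"}, [({"S#"}, {"City"})]))            # []
print(violations_3nf_pk({"City", "Country"}, {"City"}, [({"City"}, {"Country"})]))  # []
```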
Normal Forms Defined Informally
1st normal form
All attributes are simple (no composite or multivalued attributes)
2nd normal form
All non-prime attributes depend on the whole key (no partial
dependency)
3rd normal form
All non-prime attributes only depend on the key (no transitive
dependency)
Test and Remedy for Normalization
Consider all candidate keys of a relation (not just a defined
primary key)
General Definition of Second Normal Form
General Definition of Third Normal Form
LOTS2 is in 3NF
FD4 (Area -> Price) in LOTS1 violates 3NF because Area is not a superkey and
Price is not a prime attribute in LOTS1
LOTS1 violates 3NF because Price is transitively dependent on
each of the candidate keys of LOTS1 via the nonprime attribute Area
To normalize LOTS1 into 3NF, decompose it into LOTS1A and LOTS1B{Area, Price}
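The general 3NF test can be sketched as follows: find all candidate keys, collect the prime attributes, and flag every FD X -> A where X is not a superkey and A is not prime. The LOTS1 schema and the FDs labelled FD1 and FD2 below are not spelled out on these slides; they are assumed from the textbook's LOTS example. The function names and the brute-force key search are also illustrative assumptions (fine for a handful of attributes, exponential in general).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (fixed-point iteration)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute-force search for all minimal attribute sets that determine everything."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(s, fds) >= attributes:
                keys.append(s)
    return keys

def violations_3nf(attributes, fds):
    """FDs X -> A where X is not a superkey and A is not a prime attribute."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((sorted(lhs), a))
    return bad

# Assumed schema and FDs (textbook LOTS example, not stated on the slides).
LOTS1 = {"Property_id#", "County_name", "Lot#", "Area", "Price"}
F = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price"}),  # FD1 (assumed)
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price"}),  # FD2 (assumed)
    ({"Area"}, {"Price"}),                                         # FD4
]
print(violations_3nf(LOTS1, F))  # [(['Area'], 'Price')]
print(violations_3nf({"Area", "Price"}, [({"Area"}, {"Price"})]))  # [] (LOTS1B)
```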
Boyce-Codd Normal Form
BCNF was proposed as a simpler form of 3NF, but it was found to
be stricter than 3NF
Every relation in BCNF is also in 3NF
But a relation in 3NF is not necessarily in BCNF
Difference:
The 3NF condition that allows A to be prime (for an FD X -> A) is absent from BCNF
Boyce-Codd Normal Form
Suppose we have thousands of lots in the relation but the lots are
from only two counties: A and B
Lot sizes in A are only 0.5, 0.6, 0.7, 0.8, 0.9 acres whereas lots in B
are 1.1, 1.2, …, 1.9
Then FD5: Area -> County_name holds
FD5 satisfies 3NF in LOTS1A because County_name is a prime
attribute
FD5 violates BCNF in LOTS1A because Area is not a superkey of
LOTS1A
Create new relation R(Area, County_name). The area of lots can be
represented in R(Area, County_name) to reduce redundancy
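The BCNF test is simpler than the 3NF test because no prime-attribute exception applies: every nontrivial FD must have a superkey on its left-hand side. A minimal sketch follows. The attribute list of LOTS1A and the FDs labelled FD1 and FD2 are assumed from the textbook's LOTS example, since the slides only state FD5; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (fixed-point iteration)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Nontrivial FDs whose left-hand side is not a superkey."""
    attributes = set(attributes)
    return [
        (sorted(lhs), sorted(rhs - lhs))
        for lhs, rhs in fds
        if rhs - lhs and not closure(lhs, fds) >= attributes
    ]

# Assumed schema and FDs (textbook LOTS example); only FD5 is given on the slide.
LOTS1A = {"Property_id#", "County_name", "Lot#", "Area"}
F = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area"}),  # FD1 (assumed)
    ({"County_name", "Lot#"}, {"Property_id#", "Area"}),  # FD2 (assumed)
    ({"Area"}, {"County_name"}),                          # FD5
]
print(bcnf_violations(LOTS1A, F))  # [(['Area'], ['County_name'])]

# The new relation R(Area, County_name) has Area as its key, so FD5 is fine there.
print(bcnf_violations({"Area", "County_name"}, [({"Area"}, {"County_name"})]))  # []
```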
County_name is a prime attribute
Boyce-Codd Normal Form
Some notes on BCNF:
BCNF is stronger than 3NF
BCNF is useful when a relation has:
multiple candidate keys, and
these candidate keys are composite ones, and
they overlap on some common attributes
BCNF reduces to 3NF if the above conditions do not apply
References
6e
Ch. 14, p. 487-515
Ch. 15, p. 529-533