Part4 - Ch9 - Functional Dependencies and Normalization
Outline:
7.3 Normalization
Definitions.
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Guideline1:
Map each entity set to a relation & each relationship set to a relation.
o Don’t mix attributes of different entities and relationships in one relation.
o Use only foreign keys to refer to other entities, rather than
duplicating their attributes in other relations.
EMP_DEPT
Is it a good design?
o The semantics of the attributes of the EMP_DEPT relation are not clear.
o It is a poor design because it violates Guideline 1 by mixing attributes
from distinct real-world entities.
o Solution: decompose the “EMP_DEPT” relation into two relations, EMP and
DEPT, linked by a foreign key.
[Figure: relations EMP and DEPT; relation EMP_DEPT]
….
Redundancy
o Such a large table leads to several problems:
B. Deletion Anomalies:
Deleting departments will delete their employees.
Guideline2:
Minimize the amount of data redundancy in the database.
o If some attributes do not apply to many tuples, the data in those columns
is sparse and storage space is wasted.
o Queries on tables with many null values may have difficulty dealing
with them, notably in joins and aggregates.
Guideline3:
o Try to minimize the occurrences of NULL values in your design.
o Attributes that are frequently NULL could be placed in separate
relations (together with the primary key).
4. Spurious Tuples
Employee
EID EName TelNo SDate
12345 Kim 555-1234 01/23/2004
54321 Kim 555-5432 05/03/2001
Emp1
EID EName
12345 Kim
54321 Kim
Emp2
EName TelNo SDate
Kim 555-1234 01/23/2004
Kim 555-5432 05/03/2001
Guideline4:
o The relations should be designed to satisfy the lossless join
condition.
o No spurious tuples should be generated by a natural join of
any relations.
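A minimal Python sketch of the spurious-tuple problem, using the Employee, Emp1 and Emp2 tuples above: it joins Emp1 and Emp2 on their only common attribute, EName, and compares the result with Employee. Because EName is not a key of either projection, each Kim in Emp1 pairs with both Kim rows in Emp2. The helper name `join_on_ename` is hypothetical.

```python
# Tuples copied from the Employee, Emp1 and Emp2 tables above.
employee = {
    ("12345", "Kim", "555-1234", "01/23/2004"),
    ("54321", "Kim", "555-5432", "05/03/2001"),
}
emp1 = {("12345", "Kim"), ("54321", "Kim")}        # (EID, EName)
emp2 = {                                           # (EName, TelNo, SDate)
    ("Kim", "555-1234", "01/23/2004"),
    ("Kim", "555-5432", "05/03/2001"),
}


def join_on_ename(r1, r2):
    """Natural join of Emp1 and Emp2 on EName; returns (EID, EName, TelNo, SDate)."""
    return {
        (eid, ename, tel, sdate)
        for eid, ename in r1
        for ename2, tel, sdate in r2
        if ename == ename2
    }


joined = join_on_ename(emp1, emp2)
spurious = joined - employee
print(len(joined), len(spurious))  # 4 2
```

Both original tuples come back, but so do two tuples that never existed (for example, employee 12345 with Kim's other telephone number), which is what the lossless join condition rules out.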
Definitions
A B
1 4
2 5
1 7
On this instance, A→B does not hold (A = 1 appears with both B = 4 and B = 7), but B→A does hold (each B value appears only once).
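The check above can be automated for any instance. The sketch below (the function name `fd_holds` is hypothetical) scans the tuples once and reports whether two tuples agree on the left-hand side but differ on the right-hand side. An instance can only refute an FD: a True result means this instance does not violate it, not that the FD holds in every legal state.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two tuples in `rows` agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:     # same X value, different Y value
            return False
    return True


r = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 1, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 7
print(fd_holds(r, ["B"], ["A"]))  # True: every B value appears only once
```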
A1 A2 A3 A4
1 2 3 4
1 2 3 5
6 7 8 2
2 1 3 4
X Y Z
X1 Y1 Z1
X2 Y1 Z1
X3 Y2 Z3
X4 Y3 Z2
Example: Consider the relation EMP (SSN, Ename, Deptno). Suppose EMP
is used to represent a many-to-one relationship from EMP to DEPT, where SSN
is a key in the EMP relation and Deptno is a key in the DEPT relation.
o Do the following FDs hold in the EMP relation?
1. {SSN}→{Deptno}
2. {Deptno}→{SSN}
o What about a many-to-many relationship, or a one-to-one relationship?
Relation extensions r(R) that satisfy the FD constraints are called legal
extensions (or legal relation states) of R.
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs
in F hold.
The set of all FDs that can be inferred from F is called the closure of F and is denoted by F+.
A. Armstrong’s Axioms
We can find all of F+ by applying Armstrong’s axioms.
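For reference, here are Armstrong's three axioms, which the derivations that follow label "reflexivity", "Aug" (augmentation) and "Transitive" (transitivity). X, Y and Z are sets of attributes, and XZ denotes X ∪ Z:

```latex
\begin{aligned}
&\text{Reflexivity:}  && Y \subseteq X \;\Rightarrow\; X \to Y \\
&\text{Augmentation:} && X \to Y \;\Rightarrow\; XZ \to YZ \\
&\text{Transitivity:} && X \to Y,\; Y \to Z \;\Rightarrow\; X \to Z
\end{aligned}
```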
1. A→H
o From A→B and B→H, transitivity gives A→H.
2. AG→I
o Augmenting A→C with G gives AG→CG; transitivity with CG→I then gives AG→I.
3. CG→HI
o Augmenting CG→I with CG gives CG→CGI.
o Augmenting CG→H with I gives CGI→HI.
o Transitivity on CG→CGI and CGI→HI gives CG→HI.
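To make each step mechanical, here is a small Python sketch that encodes the three axioms as functions on (left-hand side, right-hand side) pairs of attribute sets and replays the derivations above. The function names are illustrative, not from any library. `transitivity` deliberately requires the middle attribute sets to match exactly, which is why the augmentations must be applied first. The last step also shows the decomposition rule (reflexivity followed by transitivity).

```python
def fd(lhs, rhs):
    """An FD as a pair of frozensets of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def reflexivity(x, y):
    """If Y is a subset of X, then X -> Y."""
    assert set(y) <= set(x), "reflexivity needs Y to be a subset of X"
    return fd(x, y)


def augmentation(f, z):
    """From X -> Y infer XZ -> YZ."""
    lhs, rhs = f
    return (lhs | frozenset(z), rhs | frozenset(z))


def transitivity(f, g):
    """From X -> Y and Y -> Z infer X -> Z (the two Y sets must be identical)."""
    assert f[1] == g[0], "right side of f must equal left side of g"
    return (f[0], g[1])


def show(f):
    return "".join(sorted(f[0])) + "->" + "".join(sorted(f[1]))


# FDs used in the derivations above.
A_B, B_H, A_C = fd("A", "B"), fd("B", "H"), fd("A", "C")
CG_H, CG_I = fd("CG", "H"), fd("CG", "I")

a_h = transitivity(A_B, B_H)                        # 1. A->H
ag_i = transitivity(augmentation(A_C, "G"), CG_I)   # 2. AG->CG, then AG->I
cg_hi = transitivity(augmentation(CG_I, "CG"),      # CG->CGI
                     augmentation(CG_H, "I"))       # CGI->HI, so 3. CG->HI
cg_h = transitivity(cg_hi, reflexivity("HI", "H"))  # decomposition: CG->H

print([show(f) for f in (a_h, ag_i, cg_hi, cg_h)])
# ['A->H', 'AG->I', 'CG->HI', 'CG->H']
```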
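Decomposition rule: if X→YZ, then X→Y and X→Z.
o YZ→Y by reflexivity; with X→YZ, transitivity gives X→Y.
o YZ→Z by reflexivity; with X→YZ, transitivity gives X→Z.
Union rule: if X→Y and X→Z, then X→YZ.
o Augmenting X→Y with X gives X→XY.
o Augmenting X→Z with Y gives XY→YZ.
o Transitivity on X→XY and XY→YZ gives X→YZ.
If X→Y and W→Z, then XW→YZ.
o Augmenting X→Y with W gives XW→YW.
o Augmenting W→Z with Y gives WY→YZ.
o Transitivity on XW→YW and WY→YZ gives XW→YZ.
Pseudotransitivity rule: if X→Y and WY→Z, then WX→Z.
o Augmenting X→Y with W gives XW→YW.
o Transitivity on XW→YW and WY→Z gives WX→Z.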
F+ can grow quite large, as we keep applying rules to find more FDs.
Sometimes we want to find all of F+, and other times we just want to find part
of it.
We are often interested in finding the part that tells us whether or not some
subset of attributes X is a superkey for R.
If you can uniquely determine all attributes in R by some subset of attributes
X, then X is a superkey for R.
The closure of X under F, denoted X+, is the subset of attributes that are
uniquely determined by X under F.
Definition:
o Given a schema R, a set X of attributes in R, and a set F of FDs that
hold for R, the set of all attributes of R that are functionally
dependent on X is called the closure of X under F (X+).
X+ := X;
Repeat
    oldX+ := X+;
    for each functional dependency Y→Z in F do
        if Y ⊆ X+ then X+ := X+ ∪ Z;
Until (X+ = oldX+);
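A runnable Python version of the closure algorithm above, with comments mapping each line to the pseudocode. The example assumes R = (A, B, C, G, H, I) and F = {A→B, A→C, CG→H, CG→I, B→H}, the FDs used in the Armstrong's-axioms derivations. F is not restated next to the closure examples, so treat it as an assumption. Under it, the sketch reproduces {AG}+ = {A B C G H I}.

```python
def closure(attrs, fds):
    """X+ under F, following the Repeat ... Until loop above."""
    result = set(attrs)                    # X+ := X
    while True:
        old = set(result)                  # oldX+ := X+
        for lhs, rhs in fds:               # for each Y -> Z in F
            if set(lhs) <= result:         # if Y is a subset of X+
                result |= set(rhs)         #     X+ := X+ union Z
        if result == old:                  # until X+ = oldX+
            return result


# Assumed example: R = (A, B, C, G, H, I) with F = {A->B, A->C, CG->H, CG->I, B->H}.
R = set("ABCGHI")
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_plus = closure("AG", F)
print("".join(sorted(ag_plus)), ag_plus == R)  # ABCGHI True
print("".join(sorted(closure("A", F))))        # ABCH
```

X is a superkey exactly when `closure(X, F)` equals R, which is the test applied in the two worked examples.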
Values of X+ for X = AB:
AB, ABC, ABCF, ABCEF, ABCEF (no change, so the loop stops)
{AB}+ = {A B C E F}
{AB} is not a superkey because {AB}+ does not contain every attribute of R.
Values of X+ for X = AG:
AG, ABG, ABCG, ABCGH, ABCGH, ABCGHI, ABCGHI (no change, so the loop stops)
{AG}+ = {A B C G H I}
{AG} is a superkey because {AG}+ contains every attribute of R.
7.3 Normalization
Definitions:
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes
S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation
state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any
attribute from K will cause K not to be a superkey any more.
If a relation schema has more than one key, each is called a candidate key.
o One of the candidate keys is arbitrarily designated to be the primary
key, and the others are called alternate keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of
any candidate key.
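Given a closure routine, the candidate keys, and hence the prime and nonprime attributes, can be found by brute force: test attribute sets in increasing size and keep each superkey that contains no smaller key already found. This is exponential in the number of attributes, so it is only a teaching sketch. It reuses the assumed R = (A, B, C, G, H, I) and F = {A→B, A→C, CG→H, CG→I, B→H} from the {AG}+ example; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys of schema, found by testing subsets smallest-first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys


# Assumed example: R = (A, B, C, G, H, I), F = {A->B, A->C, CG->H, CG->I, B->H}.
R = set("ABCGHI")
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

keys = candidate_keys(R, F)
prime = set().union(*keys)
nonprime = R - prime
print([sorted(k) for k in keys], sorted(prime), sorted(nonprime))
# [['A', 'G']] ['A', 'G'] ['B', 'C', 'H', 'I']
```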
There are three main techniques to achieve first normal form for such a
relation:
I. First Solution: Decompose the non-1NF relation into two 1NF
relations by removing the attribute Dlocation, which violates 1NF,
and placing it in a separate relation Dept_Location. The primary
key of this relation is the combination (Dnumber, Dlocation).
Dnumber Dlocation
5 LocX
5 LocY
5 LocZ
4 LocW
1 LocZ
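The first solution amounts to flattening a multivalued attribute into one (Dnumber, Dlocation) row per value. A minimal Python sketch, assuming the non-1NF DEPARTMENT is held as a mapping from Dnumber to its list of locations (this representation and the variable names are hypothetical; the values are those of the Dept_Location table above):

```python
# Hypothetical non-1NF DEPARTMENT: each Dnumber maps to a set-valued Dlocation.
department_locations = {
    5: ["LocX", "LocY", "LocZ"],
    4: ["LocW"],
    1: ["LocZ"],
}

# One Dept_Location row per (department, location) pair.
dept_location = [
    (dnumber, location)
    for dnumber, locations in department_locations.items()
    for location in locations
]

# (Dnumber, Dlocation) is the key: no pair may repeat ...
assert len(dept_location) == len(set(dept_location))
# ... while Dnumber alone is no longer unique (department 5 has three rows).
assert len({d for d, _ in dept_location}) < len(dept_location)
print(dept_location)
# [(5, 'LocX'), (5, 'LocY'), (5, 'LocZ'), (4, 'LocW'), (1, 'LocZ')]
```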
II. Second solution: Expand the key so that there will be a separate
tuple in the original DEPARTMENT relation for each location
of a department.
The primary key is (Dnumber, Dlocation)
The EMPLOYEE relation is not in 1NF, because the Name attribute is not
atomic.
SSN is the primary key of the EMP_PROJ relation and Pnumber is the
partial key of the nested relation; that is, within each tuple, the nested
relation must have unique values of Pnumber.
To normalize this relation into 1NF, we move the attributes of the nested relation
into a new relation and propagate the primary key into it; the primary key of
the new relation combines the partial key with the primary key of the
original relation.
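A Python sketch of that unnesting step. The nested EMP_PROJ tuples, their values, the nested attribute Hours, and the names EMP_PROJ1 and EMP_PROJ2 are hypothetical. The only structure taken from the text is that SSN is the key, Pnumber is the partial key of the nested relation, and the new relation's key is (SSN, Pnumber).

```python
# Hypothetical nested EMP_PROJ tuples: SSN, Ename and a nested PROJS relation
# (Pnumber, Hours); all values are invented for illustration.
emp_proj_nested = [
    {"SSN": "111", "Ename": "Smith",
     "PROJS": [{"Pnumber": 1, "Hours": 32.5}, {"Pnumber": 2, "Hours": 7.5}]},
    {"SSN": "222", "Ename": "Wong",
     "PROJS": [{"Pnumber": 2, "Hours": 10.0}]},
]


def unnest(rows):
    """Split nested tuples into EMP_PROJ1(SSN, Ename) and
    EMP_PROJ2(SSN, Pnumber, Hours); SSN is propagated into EMP_PROJ2."""
    emp_proj1, emp_proj2 = [], []
    for row in rows:
        emp_proj1.append((row["SSN"], row["Ename"]))
        for proj in row["PROJS"]:
            emp_proj2.append((row["SSN"], proj["Pnumber"], proj["Hours"]))
    return emp_proj1, emp_proj2


emp_proj1, emp_proj2 = unnest(emp_proj_nested)
# The new key (SSN, Pnumber) combines the outer key with the partial key.
assert len({(ssn, pno) for ssn, pno, _ in emp_proj2}) == len(emp_proj2)
print(emp_proj1)  # [('111', 'Smith'), ('222', 'Wong')]
print(emp_proj2)  # [('111', 1, 32.5), ('111', 2, 7.5), ('222', 2, 10.0)]
```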
SSN Ename
EMP_PROJ
SSN Pnumber Hours Ename Pname Plocation
Dependencies: FD1, FD2, FD3
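The arrows in the diagram did not survive, so this sketch assumes the usual reading of the three dependencies for EMP_PROJ: FD1 {SSN, Pnumber}→Hours, FD2 SSN→Ename, FD3 Pnumber→{Pname, Plocation}, with {SSN, Pnumber} as the only candidate key. It flags each listed FD whose left-hand side is a proper subset of a candidate key and whose right-hand side contains a nonprime attribute, that is, a partial dependency that violates 2NF. It checks only the FDs it is given, which is enough for this example. The function name is hypothetical.

```python
def partial_dependencies(keys, fds):
    """Listed FDs whose LHS is a proper subset of some candidate key and whose
    RHS contains a nonprime attribute (each one violates 2NF)."""
    prime = set().union(*keys)
    found = []
    for name, lhs, rhs in fds:
        nonprime_rhs = rhs - prime
        if nonprime_rhs and any(lhs < key for key in keys):
            found.append((name, sorted(nonprime_rhs)))
    return found


# Assumed FDs for EMP_PROJ; {SSN, Pnumber} assumed to be the only candidate key.
keys = [{"SSN", "Pnumber"}]
fds = [
    ("FD1", {"SSN", "Pnumber"}, {"Hours"}),
    ("FD2", {"SSN"}, {"Ename"}),
    ("FD3", {"Pnumber"}, {"Pname", "Plocation"}),
]
print(partial_dependencies(keys, fds))
# [('FD2', ['Ename']), ('FD3', ['Plocation', 'Pname'])]
```

FD1 is not flagged because its left-hand side is the whole key; FD2 and FD3 are the dependencies that the 2NF decomposition moves into separate relations.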
o After decomposition:
No redundant values
No spurious tuples through join
Solution:
(A, B, C, D, E, F), with dependencies FD1 and FD2, is decomposed into
(A, B, C, D), with FD1, and (D, E, F), with FD2.
EMP_DEPT(Ename, SSN, BDate, Address, Dnumber, Dname, DMGRSSN), with dependencies FD1 and FD2
ED2(Dnumber, Dname, DMGRSSN), with FD2
FD1: {Item}→{Category}
FD2: {Item}→{Discount}
FD3: {Category}→{Discount}
The relation is in 1NF and 2NF, but it is not in 3NF because of FD3: Discount depends on Item only transitively, through Category.
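The 3NF test can be written directly from its definition: every nontrivial X→A must have X a superkey or A prime. The sketch below (function names hypothetical) assumes the relation's attributes are exactly Item, Category and Discount, with {Item} as its only candidate key, and flags FD3.

```python
def closure(attrs, fds):
    """Attribute closure under fds given as (name, lhs_set, rhs_set) triples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for _, lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(schema, fds, keys):
    """Names of FDs X -> Y where X is not a superkey and Y - X has a nonprime attribute."""
    prime = set().union(*keys)
    violations = []
    for name, lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue                      # X is a superkey: allowed by 3NF
        if (rhs - lhs) - prime:           # some nontrivial, nonprime attribute
            violations.append(name)
    return violations


schema = {"Item", "Category", "Discount"}
fds = [
    ("FD1", {"Item"}, {"Category"}),
    ("FD2", {"Item"}, {"Discount"}),
    ("FD3", {"Category"}, {"Discount"}),
]
print(third_nf_violations(schema, fds, keys=[{"Item"}]))  # ['FD3']
```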
Solution:
LOTS(Property_ID#, County_Name, Lot#, Area, Price, Tax_Rate), with dependencies FD1, FD2, FD3, FD4
Decomposed into:
LOTS1(Property_ID#, County_Name, Lot#, Area, Price), with FD1, FD2, FD4
LOTS2(County_Name, Tax_Rate), with FD3
LOTS1 is in 2NF but not in 3NF (because of FD4).
LOTS1A(Property_ID#, County_Name, Lot#, Area), with FD1 and FD2
LOTS1B(Area, Price), with FD4
BOOK_AUTH
Book Author Book List Publisher Author
Publisher
Title Name Type Price Date Affil
FD1
FD2
FD3
The BOOK_AUTH relation is in 1NF but not in 2NF (because of FD1 and FD3).
FD1
FD2
B_A2_1(Book Title, Publisher, Book Type), with FD1
B_A2_2(Book Type, List Price), with FD2
Relation (A, B, C) with dependencies FD1 and FD2:
A B C
a1 b1 c1
a1 b2 c3
a2 b2 c4
a3 b2 c3
a3 b3 c5
Decomposition into (A, C) and (C, B)
Solution:
Of the three decompositions above, only the third will not generate
spurious tuples after the join.
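A Python sketch that projects the five tuples above onto two attribute sets, joins the projections back, and counts tuples that were not in the original. (A, C) and (C, B) is the decomposition shown in the text; (A, B) and (B, C) is a lossy split chosen only for comparison. Assuming FD2 is C→B (the arrows were lost from the diagram, but this instance satisfies C→B), C is a key of (C, B), which is why that join is lossless. Function names are hypothetical.

```python
ATTRS = ("A", "B", "C")
ROWS = {("a1", "b1", "c1"), ("a1", "b2", "c3"), ("a2", "b2", "c4"),
        ("a3", "b2", "c3"), ("a3", "b3", "c5")}


def project(rows, attrs):
    """Projection onto attrs, as tuples in the order given by attrs."""
    idx = [ATTRS.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two projections, returned as (A, B, C) tuples."""
    common = [a for a in attrs1 if a in attrs2]
    result = set()
    for t1 in r1:
        for t2 in r2:
            row = dict(zip(attrs1, t1))
            other = dict(zip(attrs2, t2))
            if all(row[a] == other[a] for a in common):
                row.update(other)
                result.add(tuple(row[a] for a in ATTRS))
    return result


def spurious_tuples(attrs1, attrs2):
    joined = natural_join(project(ROWS, attrs1), attrs1,
                          project(ROWS, attrs2), attrs2)
    assert ROWS <= joined  # joining projections never loses original tuples
    return joined - ROWS


print(len(spurious_tuples(("A", "C"), ("C", "B"))))  # 0
print(len(spurious_tuples(("A", "B"), ("B", "C"))))  # 3
```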
CustID Name Order# OrderDate ProductCode ProdDesc Price Qty
C004 Adams, Anne Ord001 03/08/2000 P100 TopDeck 3.25 150
C004 Adams, Anne Ord001 03/08/2000 P300 KitKat 3.10 100
C003 Jones, Carol Ord002 05/08/2000 P200 BarOne 2.95 240
C002 Black, Roger Ord003 05/08/2000 P500 MilkyBar 3.20 370
C002 Black, Roger Ord003 05/08/2000 P400 Flake 3.40 120
C001 Smith, John Ord004 09/08/2000 P300 KitKat 3.10 280
C005 Rhodes, Sean Ord005 12/08/2000 P400 Flake 3.40 150
C002 Black, Roger Ord006 13/08/2000 P100 TopDeck 3.25 320
C001 Smith, John Ord007 15/08/2000 P500 MilkyBar 3.20 240
How would you split the previous relation to minimize data redundancy?
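One possible answer, sketched in Python under assumptions read off the sample rows rather than stated in the text: CustID→Name, Order#→{CustID, OrderDate}, ProductCode→{ProdDesc, Price}, and {Order#, ProductCode}→Qty. The resulting relations (the variables `customer`, `order`, `product` and `order_line` below) are hypothetical names. The code checks that the assumed FDs are not violated by this instance, splits the table, and joins the pieces back to confirm that nothing is lost or added.

```python
# Rows of the order table above:
# (CustID, Name, Order#, OrderDate, ProductCode, ProdDesc, Price, Qty)
ROWS = [
    ("C004", "Adams, Anne", "Ord001", "03/08/2000", "P100", "TopDeck", 3.25, 150),
    ("C004", "Adams, Anne", "Ord001", "03/08/2000", "P300", "KitKat", 3.10, 100),
    ("C003", "Jones, Carol", "Ord002", "05/08/2000", "P200", "BarOne", 2.95, 240),
    ("C002", "Black, Roger", "Ord003", "05/08/2000", "P500", "MilkyBar", 3.20, 370),
    ("C002", "Black, Roger", "Ord003", "05/08/2000", "P400", "Flake", 3.40, 120),
    ("C001", "Smith, John", "Ord004", "09/08/2000", "P300", "KitKat", 3.10, 280),
    ("C005", "Rhodes, Sean", "Ord005", "12/08/2000", "P400", "Flake", 3.40, 150),
    ("C002", "Black, Roger", "Ord006", "13/08/2000", "P100", "TopDeck", 3.25, 320),
    ("C001", "Smith, John", "Ord007", "15/08/2000", "P500", "MilkyBar", 3.20, 240),
]

customer, order, product, order_line = set(), set(), set(), set()
for cust, name, ordno, date, code, desc, price, qty in ROWS:
    customer.add((cust, name))          # assumed FD: CustID -> Name
    order.add((ordno, cust, date))      # assumed FD: Order# -> CustID, OrderDate
    product.add((code, desc, price))    # assumed FD: ProductCode -> ProdDesc, Price
    order_line.add((ordno, code, qty))  # assumed FD: {Order#, ProductCode} -> Qty

# If an assumed FD failed on this instance, a key would appear in two rows.
for relation, key_len in ((customer, 1), (order, 1), (product, 1), (order_line, 2)):
    keys = [row[:key_len] for row in relation]
    assert len(keys) == len(set(keys)), "assumed FD violated"

# Join the four relations back together on their shared keys.
cust_name = dict(customer)
order_info = {o: (c, d) for o, c, d in order}
product_info = {p: (desc, price) for p, desc, price in product}
rebuilt = {
    (order_info[o][0], cust_name[order_info[o][0]], o, order_info[o][1],
     p, *product_info[p], q)
    for o, p, q in order_line
}
assert rebuilt == set(ROWS)  # lossless on this instance
print(len(customer), len(order), len(product), len(order_line))  # 5 7 5 9
```

Each customer name, order date and product price is now stored once instead of once per order line. Because the FDs were inferred from nine sample rows, the split is only as good as that assumption.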