DB Design - Normal Forms
Normal Forms
Readings:
3rd edition: Chapter 19, sections 19.1-19.6 (except 19.5.2), or
2nd edition: Chapter 15, sections 15.1-15.7
In Databases so far
What's great about databases?
How to create a conceptual design using ER diagrams
How to create a logical design by turning the ER
diagrams into a relational schema including minimizing
the data and relations created
Now showing
Are we done (with the logical design)?
How to refine that schema to reduce duplication of
information
Learning Goals
Discuss pros and cons of redundancy in a database.
Provide examples of update, insertion, and deletion anomalies.
Given a set of tables and a set of functional dependencies over
them, determine all the keys for the tables.
Show that a table is/isn't in 3NF or BCNF.
Prove/disprove that a given table decomposition is a lossless
join decomposition. Justify why lossless join decompositions
are preferred decompositions.
Decompose a table into a set of tables that are in 3NF, or
BCNF.
Explain FD-preserving decompositions and why they are
desirable.
Name            Department         Mailing Location
Ed Knorr        Computer Science
Raymond Ng      Computer Science
                Computer Science
Meghan Allan    Computer Science
Joel Friedman   Computer Science
Joel Friedman   Math               121-1984 Mathematics Rd
Brian Marcus    Math               121-1984 Mathematics Rd
Problems?
1. Space
2. Typos
3. Changes (e.g., departments move or change names)
Another example
Address(House#, Street, City, Province, PostalCode).
PostalCode determines City and Province, but is NOT a key either.
That is, PostalCode → {City, Province}.
A functional dependency X → Y holds provided:
for each possible t1, t2,
if t1.X = t2.X then t1.Y = t2.Y
Example:
PostalCode → City, Province holds provided:
for each possible t1, t2,
if t1.PostalCode = t2.PostalCode then
(t1.{City,Province} = t2.{City,Province})
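The pairwise condition above can be tested directly on a stored instance. The following is a minimal sketch, assuming rows are represented as Python dicts; the helper name `fd_holds` is hypothetical, and the rows are taken from the Address instance used in these slides.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if no pair of rows agrees on lhs but differs on rhs."""
    for t1, t2 in combinations(rows, 2):
        same_lhs = all(t1[a] == t2[a] for a in lhs)
        same_rhs = all(t1[a] == t2[a] for a in rhs)
        if same_lhs and not same_rhs:
            return False
    return True

COLS = ("House#", "Street", "City", "Province", "PostalCode")
address = [dict(zip(COLS, row)) for row in [
    (101, "Main Street", "Vancouver", "BC", "V6A 2S5"),
    (103, "Main Street", "Vancouver", "BC", "V6A 2S5"),
    (101, "Cambie Street", "Vancouver", "BC", "V6B 4R3"),
    (101, "Main Street", "Delta", "BC", "V4C 2N1"),
]]

print(fd_holds(address, ["PostalCode"], ["City", "Province"]))  # True
print(fd_holds(address, ["City"], ["PostalCode"]))              # False
```

Note that a single instance can only refute an FD. A result of True means this instance does not violate the FD; whether the FD holds for every possible instance is a design decision about the data, not something the check can establish.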
mailingLocation(N, D, M), mailingLocation(N', D, M') → M = M'.

address(H, S, C, P, PC), address(H', S', C', P', PC) → C = C'.
address(H, S, C, P, PC), address(H', S', C', P', PC) → P = P'.

address(_, _, C, _, PC), address(_, _, C', _, PC) → C = C'.
address(_, _, _, P, PC), address(_, _, _, P', PC) → P = P'.
House #  Street         City       Province  Postal Code
101      Main Street    Vancouver  BC        V6A 2S5
103      Main Street    Vancouver  BC        V6A 2S5
101      Cambie Street  Vancouver  BC        V6B 4R3
103      Cambie Street  Vancouver  BC        V6B 4R3
101      Main Street    Delta      BC        V4C 2N1
103      Main Street    Delta      BC        V4C 2N1
PostalCode → Street? Department → MailingLocation?
PostalCode → City, Province
The Address instance above, with one more row whose House # and Street are NULL:
House #  Street  City       Province  Postal Code
NULL     NULL    Vancouver  BC        V6T 1Z4
Decomposition on City:

House #  Street         City
101      Main Street    Vancouver
103      Main Street    Vancouver
101      Cambie Street  Vancouver
103      Cambie Street  Vancouver
101      Main Street    Delta
103      Main Street    Delta

City       Province  Postal Code
Vancouver  BC        V6A 2S5
Vancouver  BC        V6B 4R3
Delta      BC        V4C 2N1
Decomposition on Postal Code:

House #  Street         Postal Code
101      Main Street    V6A 2S5
103      Main Street    V6A 2S5
101      Cambie Street  V6B 4R3
103      Cambie Street  V6B 4R3
101      Main Street    V4C 2N1
103      Main Street    V4C 2N1

City       Province  Postal Code
Vancouver  BC        V6A 2S5
Vancouver  BC        V6B 4R3
Delta      BC        V4C 2N1
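The difference between the two decompositions shows up when the pieces are joined back together. The following is a rough sketch, assuming rows are Python dicts; the helpers `project`, `natural_join`, `rejoin`, and `is_lossless_on` are hypothetical names, and the data is the six-row Address instance from these slides.

```python
COLS = ("House#", "Street", "City", "Province", "PostalCode")
address = [dict(zip(COLS, row)) for row in [
    (101, "Main Street", "Vancouver", "BC", "V6A 2S5"),
    (103, "Main Street", "Vancouver", "BC", "V6A 2S5"),
    (101, "Cambie Street", "Vancouver", "BC", "V6B 4R3"),
    (103, "Cambie Street", "Vancouver", "BC", "V6B 4R3"),
    (101, "Main Street", "Delta", "BC", "V4C 2N1"),
    (103, "Main Street", "Delta", "BC", "V4C 2N1"),
]]

def project(rows, attrs):
    """Project rows onto attrs; the set removes duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Join two projections on whatever attributes they share."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined

def rejoin(rows, attrs_x, attrs_y):
    return natural_join(project(rows, attrs_x), project(rows, attrs_y))

def is_lossless_on(rows, attrs_x, attrs_y):
    """True if joining the two projections gives back exactly the rows."""
    return rejoin(rows, attrs_x, attrs_y) == {frozenset(r.items()) for r in rows}

city_side = ["City", "Province", "PostalCode"]
print(len(rejoin(address, ["House#", "Street", "City"], city_side)))           # 10
print(is_lossless_on(address, ["House#", "Street", "City"], city_side))        # False
print(is_lossless_on(address, ["House#", "Street", "PostalCode"], city_side))  # True
```

Joining on City produces 10 rows from an instance of 6, because each Vancouver address pairs with both Vancouver postal codes. Joining on Postal Code returns exactly the original rows. This checks one instance only; the formal definition requires the property for every instance that satisfies the FDs.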
Suppose a schema R(A,B,C,D) is not known to satisfy any FDs.
Can we split R in a lossless way?
<in-class exercise.>
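Armstrong's Axioms
William W. Armstrong.
Canadian, eh?
Reflexivity: if Y ⊆ X, then X → Y.
Augmentation: if X → Y, then XZ → YZ for any Z.
Transitivity: if X → Y and Y → Z, then X → Z.
Derived rules: union (if X → Y and X → Z, then X → YZ) and decomposition (if X → YZ, then X → Y and X → Z).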
Example: Supplier-Part DB
Suppliers supply parts to projects.
supplier attributes: sname, city, status
part attributes: p#, pname
supplier-part attributes: qty
SupplierPart(sname, city, status, p#, pname, qty)
Functional dependencies:
fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: sname, p# → qty
fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: sname, p# → qty
Derivation steps (justifications):
1. reflex
2. fd1
3. 2, fd2, trans
4. 2, aug
5. 3, aug
6. 1, 5, union
7. 4, 6, union
8. 7, fd4, union
9. fd3, aug
10. 8, 9, union
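The FDs derived at each step are not shown alongside the justifications. One derivation that is consistent with those justifications, assuming the goal is to show that {sname, p#} determines every attribute of SupplierPart, could read as follows. The individual statements are a reconstruction, not necessarily the ones on the original slide.

```latex
\begin{align*}
1.\;& \{sname, p\#\} \to \{sname, p\#\} && \text{reflex} \\
2.\;& \{sname\} \to \{city\} && \text{fd1} \\
3.\;& \{sname\} \to \{status\} && \text{2, fd2, trans} \\
4.\;& \{sname, p\#\} \to \{city, p\#\} && \text{2, aug} \\
5.\;& \{sname, p\#\} \to \{status, p\#\} && \text{3, aug} \\
6.\;& \{sname, p\#\} \to \{sname, p\#, status\} && \text{1, 5, union} \\
7.\;& \{sname, p\#\} \to \{sname, p\#, status, city\} && \text{4, 6, union} \\
8.\;& \{sname, p\#\} \to \{sname, p\#, status, city, qty\} && \text{7, fd4, union} \\
9.\;& \{sname, p\#\} \to \{sname, pname\} && \text{fd3, aug} \\
10.\;& \{sname, p\#\} \to \{sname, p\#, status, city, qty, pname\} && \text{8, 9, union}
\end{align*}
```

Step 10 has all six attributes on the right-hand side, so {sname, p#} is a superkey of SupplierPart.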
Algorithm for computing the closure of an attribute set X:
Closure = X;
repeat {
  if (A1 A2 ... Ak → B is an FD) and ({A1, A2, ..., Ak} ⊆ Closure)
    add B to Closure }
until Closure does not change.
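The closure loop above translates almost line for line into code. This is a minimal sketch, assuming each FD is stored as a pair of attribute sets (left-hand side, right-hand side); the function name `closure` and the `FDS` list are illustrative choices, with the FDs taken from the Supplier-Part example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat ... until Closure does not change
        changed = False
        for lhs, rhs in fds:
            # The FD applies once its whole left-hand side is in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

FDS = [
    ({"sname"}, {"city"}),          # fd1
    ({"city"}, {"status"}),         # fd2
    ({"p#"}, {"pname"}),            # fd3
    ({"sname", "p#"}, {"qty"}),     # fd4
]

print(sorted(closure({"city"}, FDS)))  # ['city', 'status']
```

An attribute set X is a superkey exactly when its closure contains every attribute of the relation, so the same function can be reused to test candidate keys.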
fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: sname, p# → qty
{sname}+ =
{p#}+ =
Revisit previous supplier example.
1NF
Each attribute has only one value
E.g., for postal code you can't have both V6T 1Z4 and V6S 1W6 in the same tuple!
Codd's original vision of the relational model allowed multi-valued attributes.
3NF
A relation R is in 3NF if:
If X → A is a non-trivial dependency in R, then either X is a superkey for R, or A is part of a key.
i.e., whenever X determines a non-key attribute, X had better be a superkey.
Note: Being part of a superkey doesn't count! Why? A superkey could contain junk.
In English:
Only (super)keys should determine other attributes.
Ex: Address(House#, Street, City, Province, PostalCode)
FD: PostalCode → City
Is it in BCNF? Why (not)?
Recall applicable FDs for Address: PostalCode → City, PostalCode → Province.
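The 3NF and BCNF conditions can be checked mechanically once keys are known. The sketch below is one way to do it, assuming FDs are stored as (lhs, rhs) pairs of attribute sets; the names `closure`, `candidate_keys`, and `violations` are hypothetical, the key search is brute force, and only the listed FDs are examined.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    while True:
        new = {b for lhs, rhs in fds if set(lhs) <= result for b in rhs} - result
        if not new:
            return result
        result |= new

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure covers attrs (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if closure(s, fds) >= set(attrs) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def violations(attrs, fds, allow_prime_rhs):
    """Listed FDs that break BCNF, or 3NF when allow_prime_rhs is True."""
    prime = set().union(*candidate_keys(attrs, fds)) if allow_prime_rhs else set()
    bad = []
    for lhs, rhs in fds:
        for a in set(rhs) - set(lhs):                  # non-trivial part only
            lhs_is_superkey = closure(lhs, fds) >= set(attrs)
            if not lhs_is_superkey and a not in prime:
                bad.append((set(lhs), a))
    return bad

ADDRESS = {"House#", "Street", "City", "Province", "PostalCode"}
ADDRESS_FDS = [({"PostalCode"}, {"City"}), ({"PostalCode"}, {"Province"})]

print(candidate_keys(ADDRESS, ADDRESS_FDS))
print(violations(ADDRESS, ADDRESS_FDS, allow_prime_rhs=False))  # BCNF violations
print(violations(ADDRESS, ADDRESS_FDS, allow_prime_rhs=True))   # 3NF violations
```

For Address the only key is {House#, Street, PostalCode}. Both listed FDs have PostalCode alone on the left, which is not a superkey, and neither City nor Province is part of a key, so the relation fails BCNF and 3NF alike.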
What do we want?
Guaranteed freedom from redundancy!
How do we get there?
A relation may be in BCNF already!
Interesting fact: all two-attribute relations are in BCNF!
Hint: What are the only possible non-trivial FDs in a 2-attribute relation schema?
If not, decomposition is the answer!
Decomposing a Relation
A decomposition of R replaces R by two
or more relations s.t.:
Each new relation contains a subset of
the attributes of R (and no attributes
not appearing in R), and
Every attribute of R appears in at least one new relation.
Intuitively, decomposing R means storing instances of the
relations produced by the decomposition, instead of
instances of R.
E.g., Address(House#,Street,City,Province,Postal Code)
How can we decompose without losing information?
Address(House#, Street, PostalCode)
Lossless-Join Decompositions:
Definition
Informally: If we break a relation, r, into pieces, when we put the
pieces back together, we should get exactly r back again
Formally: Decomposition of R into X and Y is lossless-join w.r.t. a
set of FDs F if, for every instance r that satisfies F:
If we JOIN the X-part of r with the Y-part of r the result is
exactly r
REMARKS:
1. It is always true that r is a subset of the JOIN of its X-part
and Y-part.
2. In general, the other direction does not hold! If it does, the
decomposition is a lossless-join.
All decompositions used to resolve redundancy must be lossless!

Original instance r:
A  B  C
1  2  3
4  5  6
7  2  8

decompose into R1(A, B) and R2(B, C):

R1:
A  B
1  2
4  5
7  2

R2:
B  C
2  3
5  6
2  8

(join) R1 ⋈ R2:
A  B  C
1  2  3
4  5  6
7  2  8
1  2  8   (others)
7  2  3   (others)
BCNF Example
Recall def. of BCNF: For all non-trivial FDs X → A, X must be a superkey.
E.g.: Relation: R(ABCD); FDs: B → C, D → A
Keys?
A+ = A; B+ = BC; C+ = C; D+ = AD;
(BD)+ = BDCA; BD is the only key
Process R(ABCD).
Look at FD B → C. Is B a superkey?
No. Decompose R into R1(B,C), R2(A,B,D)
B → C is the only FD that applies to R1.
R1 is in BCNF.
Process R2(ABD).
Look at FD D → A. Is D a superkey of R2?
No. Decompose R2 into (A,D) and (B,D).
Decomposition tree: ABCD splits into BC and ABD; ABD splits into AD and BD.
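The "process each relation" steps of the BCNF example can be sketched as a short recursive function. This is an illustrative variant under stated assumptions, not necessarily the exact procedure from the course: it searches attribute subsets X by brute force and splits on X → X+ rather than on a single listed FD, and the names `closure` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    while True:
        new = {b for lhs, rhs in fds if set(lhs) <= result for b in rhs} - result
        if not new:
            return result
        result |= new

def bcnf_decompose(rel, fds):
    """Split rel (a set of attributes) until no BCNF violation remains."""
    rel = frozenset(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel   # what X determines inside rel
            if x < x_plus < rel:             # non-trivial, and X is not a superkey of rel
                r1 = frozenset(x_plus)
                r2 = frozenset(rel - (x_plus - x))
                return bcnf_decompose(r1, fds) | bcnf_decompose(r2, fds)
    return {rel}                             # no violation found: rel is in BCNF

FDS = [({"B"}, {"C"}), ({"D"}, {"A"})]
for piece in sorted(bcnf_decompose("ABCD", FDS), key=sorted):
    print("".join(sorted(piece)))
```

For R(ABCD) with B → C and D → A this yields the relations AD, BC, and BD, matching the decomposition worked through above. Because the search enumerates every subset of attributes, it is only practical for small schemas.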
Attributes: Company, Product, Unit
FDs:
Unit → Company
Company, Product → Unit
Key(s)? {Company, Product} and {Unit, Product}
Decompose on Unit → Company:
(Unit, Company): Unit → Company
(Unit, Product): BCNF: No non-trivial FDs
We lose the FD: Company, Product → Unit !!
Unit → Company

Unit        Company
SKYWill     UBC
Team Meat   UBC

Unit        Product
SKYWill     Databases
Team Meat   Databases

No problem so far. All local FDs are satisfied.
Let's put all the data back into a single table again:

Unit        Company   Product
SKYWill     UBC       Databases
Team Meat   UBC       Databases

Violates the FD: Company, Product → Unit
How could the DBMS check if an update would violate the FD Company, Product → Unit?
Example:
A → B, ABCD → E, EF → GH, ACDF → EG
FDs: ACDE,
CEA
BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF