
DB Design - Normal Forms

This document discusses normalizing a database schema to reduce redundancy. It begins by providing an example of an address table with redundancy from a functional dependency between postal code and city/province. This redundancy can cause update, insertion and deletion anomalies. The document then suggests splitting the table into two tables to remove redundancy and avoid these anomalies. The new schema separates address information from the mapping of postal codes to locations.

Uploaded by

John Gong
Copyright
© All Rights Reserved

Unit 4: Schema Refinement and Normal Forms

Readings :
3rd edition: Chapter 19, sections
19.1-19.6 (except 19.5.2), or
2nd edition: Chapter 15 sections 15.1-15.7

In Databases so far
What's great about databases?
How to create a conceptual design using ER diagrams
How to create a logical design by turning the ER diagrams into a relational schema, including minimizing the data and relations created
Now showing
Are we done (with the logical design)?
How to refine that schema to reduce duplication of information

Learning Goals
Discuss pros and cons of redundancy in a database.
Provide examples of update, insertion, and deletion anomalies.
Given a set of tables and a set of functional dependencies over them, determine all the keys for the tables.
Show that a table is/isn't in 3NF or BCNF.
Prove/disprove that a given table decomposition is a lossless join decomposition. Justify why lossless join decompositions are preferred decompositions.
Decompose a table into a set of tables that are in 3NF, or BCNF.
Explain FD-preserving decompositions and why they are desirable.

Don't forget about your project milestones & deadlines!
http://www.cs.ubc.ca/~laks/cpsc304/project.html
You should have handed in your proposal (milestone #1) and are hopefully working on turning your specification into an ER diagram and then into tables (milestone #2).
Initially, you should be able to identify primary keys and foreign keys.
See if you can identify additional constraints: FDs and (candidate) keys from those FDs.
Normalize the tables into 3NF or BCNF.
Formal spec of your application (milestone #3).
Completed project (milestone #4).
Sign up for your demo once your TA notifies you about demo scheduling.

Consider the following entity set for mailing addresses at UBC:
[ER diagram: entity set "Mailing address" with attributes Name, Department, Address]
Meets all the criteria that we have for an entity
There is nothing wrong with this entity

What would an instance look like?

Name                  Department        Mailing Location
Ed Knorr              Computer Science  201-2366 Main Mall
Raymond Ng            Computer Science  201-2366 Main Mall
Laks V.S. Lakshmanan  Computer Science  201-2366 Main Mall
Meghan Allan          Computer Science  201-2366 Main Mall
Joel Friedman         Computer Science  201-2366 Main Mall
Joel Friedman         Math              121-1984 Mathematics Rd
Brian Marcus          Math              121-1984 Mathematics Rd

Problems?
1. space
2. typos
3. changes (e.g., departments move, or change names)

Okay, that's bad. But how do I know if a department has just one address?
Databases allow you to say that one attribute determines another through a functional dependency (FD).
So if Department determines MailingLocation but not Name, we say that there's a functional dependency from Department to MailingLocation. But Department is NOT a key.
We write Department → MailingLocation to say each dept has at most one mailing location.
Such statements are integrity constraints (ICs) called functional dependencies (FDs).

Another example
Address(House#, Street, City, Province, PostalCode).
PostalCode determines City and Province, but is NOT a key either.
That is, PostalCode → {City, Province}.
Notational Note: We normally omit the set brackets and write instead PostalCode → City, Province.
[ER diagram: entity set Address with attributes House #, Street, City, Province, Postal code]

Functional Dependencies (FDs), technically speaking
A functional dependency X → Y (where X & Y are sets of attributes) holds if for every legal instance, for all tuples t1, t2:
if t1.X = t2.X then t1.Y = t2.Y

Example:
PostalCode → City, Province holds provided:
for each possible t1, t2,
if t1.PostalCode = t2.PostalCode then
(t1.{City,Province} = t2.{City,Province})

What are Functional Dependencies saying exactly?
An FD X → Y holds for a relation r provided that, given any two tuples in r, if the X values agree, then the Y values also agree.
Also can be read as "X determines Y", "X arrow Y", "X implies Y", or "X controls Y". Pick a version that sounds intuitive to you.

FDs made precise
You've already seen a special case of FDs: Key constraints!
The FD Department → MailingLocation is supposed to hold for the relation mailingAddress(Name, Department, MailingLocation).
In Datalog-like notation, this means

mailingAddress(N, D, M),
mailingAddress(N′, D, M′)
→ M = M′.

Better yet, in simplified Datalog-like notation

mailingAddress(_, D, M),
mailingAddress(_, D, M′)
→ M = M′.

IMPORTANT: Read the two "don't cares" (i.e., _) as "whatever"; they need not be equal; we don't constrain them or even think about them!

One more example
The FD PostalCode → Province, City, which is supposed to hold for address(House#, Street, City, Province, PostalCode), says:

address(H, S, C, P, PC),
address(H′, S′, C′, P′, PC)
→ C = C′.

address(H, S, C, P, PC),
address(H′, S′, C′, P′, PC)
→ P = P′.

And its simplified version
address(House#, Street, City, Province, PostalCode)

address(_, _, C, _, PC),
address(_, _, C′, _, PC)
→ C = C′.

address(_, _, _, P, PC),
address(_, _, _, P′, PC)
→ P = P′.

Let's see some more instances

House #  Street         City       Province  Postal Code
101      Main Street    Vancouver  BC        V6A 2S5
103      Main Street    Vancouver  BC        V6A 2S5
101      Cambie Street  Vancouver  BC        V6B 4R3
103      Cambie Street  Vancouver  BC        V6B 4R3
101      Main Street    Delta      BC        V4C 2N1
103      Main Street    Delta      BC        V4C 2N1

Note: House#, Street, PostalCode is a key.
FD: It looks like maybe City → Province, but there's a Victoria in BC, Newfoundland, and Ontario & a Delta in Ontario.
Moral: can't tell from instances alone whether an FD necessarily holds.

Which functional dependencies, again?
An FD is a statement about all allowable instances.
Must be identified by application semantics and at design time.
Postal code → street? Department → mailingLocation?
Given an instance r of R, we can check if r violates some FD f, but we cannot tell if f holds over R, i.e., whether f holds for all allowable instances of R!
Note: r denotes an instance and R denotes the schema.
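The instance check described above can be sketched in a few lines of Python. This is only an illustration: the function name find_fd_violation, the dictionary-per-tuple representation, and the extra "Delta, ON" tuple with its made-up postal code are assumptions, not part of the course material.

```python
def find_fd_violation(rows, lhs, rhs):
    """Return two rows that agree on lhs but differ on rhs, or None."""
    first_seen = {}  # lhs values -> first row seen with those values
    for row in rows:
        x = tuple(row[a] for a in lhs)
        earlier = first_seen.setdefault(x, row)
        if any(earlier[a] != row[a] for a in rhs):
            return earlier, row
    return None

COLUMNS = ("House#", "Street", "City", "Province", "PostalCode")
address = [dict(zip(COLUMNS, t)) for t in [
    (101, "Main Street", "Vancouver", "BC", "V6A 2S5"),
    (103, "Main Street", "Vancouver", "BC", "V6A 2S5"),
    (101, "Cambie Street", "Vancouver", "BC", "V6B 4R3"),
    (103, "Cambie Street", "Vancouver", "BC", "V6B 4R3"),
    (101, "Main Street", "Delta", "BC", "V4C 2N1"),
    (103, "Main Street", "Delta", "BC", "V4C 2N1"),
]]
# Hypothetical extra tuple (made-up house number and postal code)
# standing in for the Delta in Ontario.
delta_on = dict(zip(COLUMNS, (5, "Main Street", "Delta", "ON", "X0X 0X0")))

# None: this particular instance does not violate City -> Province ...
print(find_fd_violation(address, ["City"], ["Province"]))
# ... but a legal instance with one more tuple does, so the FD does not hold over R.
print(find_fd_violation(address + [delta_on], ["City"], ["Province"]))
```

A result of None only says that this instance does not violate the FD; it never proves that the FD holds for the schema.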

Which functional dependencies, again?
We'll concentrate on cases where there's a single attribute on the RHS (e.g., PostalCode → Province).
There are boring, trivial cases:
e.g., PostalCode, House# → PostalCode
more generally, X → Y where Y is a subset of X
Our focus: the non-boring ones, since boring FDs are trivial and don't tell us anything worthwhile.

Naming the Evils of Redundancy
Let's consider Postal Code → City, Province

House #  Street         City       Province  Postal Code
101      Main Street    Vancouver  BC        V6A 2S5
103      Main Street    Vancouver  BC        V6A 2S5
101      Cambie Street  Vancouver  BC        V6B 4R3
103      Cambie Street  Vancouver  BC        V6B 4R3
101      Main Street    Delta      BC        V4C 2N1
103      Main Street    Delta      BC        V4C 2N1

Redundancy: city, province info. for each postal code is repeated once for each house!

Naming the Evils of Redundancy
Let's consider Postal Code → City, Province

House #  Street         City       Province  Postal Code
101      Main Street    Vancouver  BC        V6A 2S5
103      Main Street    Vancouver  BC        V6A 2S5
101      Cambie Street  Vancouver  BC        V6B 4R3
103      Cambie Street  Vancouver  BC        V6B 4R3
101      Main Street    Delta      BC        V4C 2N1
103      Main Street    Delta      BC        V4C 2N1

Update anomaly: How easily can we change Delta's province?
Must change once for each house in Delta!

Naming the Evils of Redundancy
Let's consider Postal Code → City, Province

House #  Street         City       Province  Postal Code
101      Main Street    Vancouver  BC        V6A 2S5
103      Main Street    Vancouver  BC        V6A 2S5
101      Cambie Street  Vancouver  BC        V6B 4R3
103      Cambie Street  Vancouver  BC        V6B 4R3
101      Main Street    Delta      BC        V4C 2N1
103      Main Street    Delta      BC        V4C 2N1

Insertion anomaly: What if we want to insert that V6T 1Z4 is in Vancouver, BC?
Can't do now without a street and a house#!

Naming the Evils of Redundancy
Let's consider Postal Code → City, Province

House #  Street         City       Province  Postal Code
101      Main Street    Vancouver  BC        V6A 2S5
103      Main Street    Vancouver  BC        V6A 2S5
101      Cambie Street  Vancouver  BC        V6B 4R3
103      Cambie Street  Vancouver  BC        V6B 4R3
101      Main Street    Delta      BC        V4C 2N1
103      Main Street    Delta      BC        V4C 2N1
NULL     NULL           Vancouver  BC        V6T 1Z4

We can force-fit using NULL values.
But null values are problematic. We will discuss those problems when discussing SQL. We want to minimize the occurrence of nulls in a database.

Naming the Evils of Redundancy
Let's consider Postal Code → City, Province

House #  Street         City       Province  Postal Code
101      Main Street    Vancouver  BC        V6A 2S5
103      Main Street    Vancouver  BC        V6A 2S5
101      Cambie Street  Vancouver  BC        V6B 4R3
103      Cambie Street  Vancouver  BC        V6B 4R3
101      Main Street    Delta      BC        V4C 2N1
103      Main Street    Delta      BC        V4C 2N1

Deletion anomaly: If we delete all addresses with postal code V6A 2S5, we lose the info. that V6A 2S5 is in Vancouver!
Can we do better?
E.g., what about splitting the relation?

Maybe we should split up our relation?

House #  Street         City
101      Main Street    Vancouver
103      Main Street    Vancouver
101      Cambie Street  Vancouver
103      Cambie Street  Vancouver
101      Main Street    Delta
103      Main Street    Delta

City       Province  Postal Code
Vancouver  BC        V6A 2S5
Vancouver  BC        V6B 4R3
Delta      BC        V4C 2N1

Is this DB equivalent to what we started with?
How can you tell?
Did we gain / lose info.?
That didn't work so well!

One more try
What if we tried

House #  Street         Postal Code
101      Main Street    V6A 2S5
103      Main Street    V6A 2S5
101      Cambie Street  V6B 4R3
103      Cambie Street  V6B 4R3
101      Main Street    V4C 2N1
103      Main Street    V4C 2N1

City       Province  Postal Code
Vancouver  BC        V6A 2S5
Vancouver  BC        V6B 4R3
Delta      BC        V4C 2N1

Did we lose anything?
Are our problems fixed?
Okay, that worked pretty well.
Would be nice to understand why it worked!
Would be even better to understand when it would work.

What do we need to know to split apart addresses without losing information?
FDs tell us when we're storing redundant information
Reducing redundancy helps eliminate anomalies and save storage space
We'd like to split apart tables without losing information
Suppose a schema R(A,B,C,D) is not known to satisfy any FDs. Can we split R in a lossless way?
<in-class exercise.>

What do we need to know to split apart addresses without losing information?
FDs tell us when we're storing redundant information
Reducing redundancy helps eliminate anomalies and save storage space
We'd like to split apart tables without losing information
Suppose a schema R(A,B,C,D) does satisfy some FDs. Will any split of R be a lossless split?
<in-class exercise.>

What do we need to know to split apart addresses without losing information?
FDs tell us when we're storing redundant information
Reducing redundancy helps eliminate anomalies and save storage space
We'd like to split apart tables without losing information
But first, we need to know:
what FDs are explicit (given) and
what FDs are implicit (can be derived)
Among other things, this can help us derive additional keys from the given FDs (spare keys are handy in databases, just as in real life; we'll see why shortly)

What happened so far?
Redundancy is bad and leads to several problems.
To minimize redundancy, split a table.
Some splits may cause us to lose info.!
We're hoping FDs will guide us to good splits w/o losing info.
Before we can do that, we need to know how to derive new FDs from given FDs.

<Clicker Break 1>

The Key's the key!
As a reminder, a key is a minimal set of attributes that uniquely identify tuples in a relation
i.e., a key is a minimal set of attributes that functionally determines all the attributes
e.g., House#, Street, PostalCode is a key

The Key's the key!
As a reminder, a key is a minimal set of attributes that uniquely identify tuples in a relation
i.e., a key is a minimal set of attributes that functionally determines all the attributes
e.g., House#, Street, PostalCode is a key
A superkey for a relation uniquely identifies each tuple in the relation, but does not have to be minimal
i.e., every key is a superkey
E.g.:
House#, Street, PostalCode is a key and a superkey
House#, Street, PostalCode, Province is a superkey, but not a key

Notational (Review) Clinic
We write sets of attributes {A, B, C} as ABC.
In case of real attributes we write House#, Street instead of {House#, Street}.
Instead of X ∪ Y, we simply write XY, where X and Y are sets of attributes.

Deriving Additional FDs: the basics
William W. Armstrong. Canadian, eh?
Given some FDs, we can often infer additional FDs:
e.g., {sid → phone, phone → acode} implies sid → acode.
An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
(Consequence) closure of F: the set of all FDs implied by F.

Deriving Additional FDs: the basics
William W. Armstrong. Canadian, eh?
Given some FDs, we can often infer additional FDs:
sid → phone, phone → acode implies sid → acode.
An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
(Consequence) closure of F: the set of all FDs implied by F.
Armstrong's Axioms (X, Y, Z are sets of attributes):
Reflexivity: If Y ⊆ X, then X → Y
e.g., city, major → city

Deriving Additional FDs: the basics
William W. Armstrong. Canadian, eh?
Given some FDs, we can often infer additional FDs:
sid → phone, phone → acode implies sid → acode.
An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
(Consequence) closure of F: the set of all FDs implied by F.
Armstrong's Axioms (X, Y, Z are sets of attributes):
Augmentation: If X → Y, then XZ → YZ for any Z
e.g., if sid → city, then sid, major → city, major

Deriving Additional FDs: the basics
William W. Armstrong. Canadian, eh?
Given some FDs, we can often infer additional FDs:
sid → phone, phone → acode implies sid → acode.
An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
(Consequence) closure of F: the set of all FDs implied by F.
Armstrong's Axioms (X, Y, Z are sets of attributes):
Transitivity: If X → Y and Y → Z, then X → Z
e.g., sid → phone, phone → acode implies sid → acode
These three are sound and complete inference rules for FDs.

Why do we care? Greatly simplifies analysis!

Deriving Additional FDs
Couple of additional rules (that follow from axioms):
Union: If X → Y and X → Z, then X → YZ
e.g., if sid → acode and sid → city, then sid → acode, city
Decomposition: If X → YZ, then X → Y and X → Z
e.g., if sid → acode, city then sid → acode, and sid → city

Deriving Additional FDs
Examples:
Derive the Union rule from the axioms (Augmentation and Transitivity)
Derive the Decomposition rule from Reflexivity and Transitivity
Corollary: Given any set of FDs F, we can convert F into an equivalent set of FDs F′ s.t. every FD in F′ is of the form X → A, where X is a set of attributes and A is a single attribute.
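One possible way to work the two derivations asked for above is sketched here; it is a suggested solution, not necessarily the one intended in class, and the step numbering is ours.

```latex
\begin{align*}
\textbf{Union.}\quad & \text{Given } X \to Y \text{ and } X \to Z: \\
1.\;& X \to XY   && \text{augment } X \to Y \text{ with } X \text{ (note } XX = X\text{)} \\
2.\;& XY \to YZ  && \text{augment } X \to Z \text{ with } Y \\
3.\;& X \to YZ   && \text{transitivity on 1 and 2} \\[6pt]
\textbf{Decomposition.}\quad & \text{Given } X \to YZ: \\
1.\;& YZ \to Y   && \text{reflexivity, since } Y \subseteq YZ \\
2.\;& X \to Y    && \text{transitivity on the given FD and 1} \\
3.\;& X \to Z    && \text{same argument with } YZ \to Z
\end{align*}
```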

Example: Supplier-Part DB
Suppliers supply parts to projects.
supplier attributes: sname, city, status
part attributes: p#, pname
supplier-part attributes: qty
SupplierPart(sname,city,status,p#,pname,qty)
Functional dependencies:
fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: sname, p# → qty

Supplier-Part Key: Part 1: Determining all attributes
Exercise: Show that (sname, p#) is a key of SupplierPart(sname,city,status,p#,pname,qty)
fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: sname, p# → qty

Supplier-Part Key: Part 1: Determining all attributes
Exercise: Show that (sname, p#) is a key of SupplierPart(sname,city,status,p#,pname,qty)
fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: sname, p# → qty
Proof has two parts:
a. Show: sname, p# is a (super)key
1. sname, p# → sname, p#   (reflex)
2. sname → city   (fd1)
3. sname → status   (2, fd2, trans)
4. sname, p# → city, p#   (2, aug)
5. sname, p# → status, p#   (3, aug)
6. sname, p# → sname, p#, status   (1, 5, union)
7. sname, p# → sname, p#, status, city   (4, 6, union)
8. sname, p# → sname, p#, status, city, qty   (7, fd4, union)
9. sname, p# → sname, pname   (fd3, aug)
10. sname, p# → sname, p#, status, city, qty, pname   (8, 9, union)

Supplier-Part Key: Part 2: Minimality
b. Show: (sname, p#) is a minimal superkey of SupplierPart(sname,city,status,p#,pname,qty)
fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: sname, p# → qty
1. p# does not appear on the RHS of any FD
2. therefore, except for p# itself, nothing determines p#
3. specifically, sname → p# does not hold
4. therefore, sname is not a key
5. similarly, p# is not a key

Do you, by any chance, have anything less painful?
Scared you're going to mess up? There is a closure method for checking FDs that is intuitive and easy to use.
We denote the closure of a set of attributes X as X+.
Fact: Attribute A belongs to X+ iff X → A holds. In particular, X+ = R iff X is a super key of R.
Algorithm for computing the closure of an attribute set X:

Closure = X;
repeat {
  if (A1 A2 ... Ak → B is an FD) & ({A1, A2, ..., Ak} ⊆ Closure)
    add B to Closure }
until Closure does not change.
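The closure algorithm above translates almost line for line into Python. The sketch below is illustrative only: the function name closure and the representation of an FD as a (left-hand side, right-hand side) pair of attribute sets are assumptions made here.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # "until Closure does not change"
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already in the closure, add the RHS.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The SupplierPart FDs fd1..fd4 from the slides.
FDS = [
    ({"sname"}, {"city"}),
    ({"city"}, {"status"}),
    ({"p#"}, {"pname"}),
    ({"sname", "p#"}, {"qty"}),
]

print(sorted(closure({"sname", "p#"}, FDS)))
print(sorted(closure({"sname"}, FDS)))
print(sorted(closure({"p#"}, FDS)))
```

Because {sname, p#}+ contains every attribute of SupplierPart while {sname}+ and {p#}+ do not, this reproduces the superkey and minimality arguments without applying the inference rules by hand.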

Example of closure computation
SupplierPart(sname,city,status,p#,pname,qty)
fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: sname, p# → qty
Let us compute the following closures:
{sname, p#}+ =
{sname}+ =
{p#}+ =
Note on notation: when convenient, we write A+ instead of {A}+ and (AB)+ instead of {A, B}+.

Here's a painless method
Revisit previous supplier example.

Here's what happened in the last little while
Deriving new FDs from given FDs: painful method, directly use inference rules.
Painless method: use the closure method.
Find (candidate) keys: use the closure method to find super keys and some additional reasoning to find minimal super keys (aka candidate keys).

<Clicker Break 2>
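As a rough sketch of the key-finding recipe summarized above, the code below tries attribute sets from smallest to largest and keeps those whose closure is the whole schema and that contain no smaller key. The names closure and candidate_keys are illustrative assumptions, and the brute-force search is exponential in the number of attributes, so it is only suitable for classroom-sized schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal superkeys, found by trying subsets in order of size."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue                      # contains a smaller key: not minimal
            if closure(cand, fds) == set(attrs):
                keys.append(cand)             # superkey with no smaller key inside
    return keys

SUPPLIER_PART = {"sname", "city", "status", "p#", "pname", "qty"}
FDS = [
    ({"sname"}, {"city"}),
    ({"city"}, {"status"}),
    ({"p#"}, {"pname"}),
    ({"sname", "p#"}, {"qty"}),
]
print(candidate_keys(SUPPLIER_PART, FDS))     # the single key {sname, p#}
```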

Flash back: our original question was
Is this a good design?

Name                  Department        Mailing Location
Ed Knorr              Computer Science  201-2366 Main Mall
Raymond Ng            Computer Science  201-2366 Main Mall
Laks V.S. Lakshmanan  Computer Science  201-2366 Main Mall
Meghan Allan          Computer Science  201-2366 Main Mall
Joel Friedman         Computer Science  201-2366 Main Mall
Joel Friedman         Math              121-1984 Mathematics Rd
Brian Marcus          Math              121-1984 Mathematics Rd

Is there a rule that says whether the amount of redundancy that we have is good?
If this design isn't good, how do we split the table in a good way?

Functional dependencies & keys
In a functional dependency, a set of attributes determines other attributes, e.g., AB → C means A and B together determine C
A trivial FD determines what you already have, e.g., AB → B
A key is a minimal set of attributes determining the rest of the attributes of a relation, e.g., R(Name, Department, MailingLocation).
A super key is a set of attributes determining the rest of the attributes in the relation, but does not HAVE to be minimal (e.g., the key {sname, p#} of relation supplierPart, or adding in other attributes like city, status, …)

Functional dependencies & keys
Given a set of (explicit) functional dependencies, we can derive others. We covered how to do so using Armstrong's axioms.
Theorem: Let R satisfy FDs F and be decomposed into R1 and R2. The decomposition is lossless join (LLJ) iff one of these FDs is implied by F:
R1 ∩ R2 → R1, OR
R1 ∩ R2 → R2.
Note the Key connection! :-)
<Clicker Break 3>
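The theorem above gives a mechanical test for a two-way split: take the closure of the shared attributes and see whether it covers one side. A minimal sketch, assuming FDs are (lhs, rhs) pairs of attribute sets; the helper names closure and is_lossless_binary are made up for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is implied by fds."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

FDS = [({"PostalCode"}, {"City", "Province"})]

# First attempt from the slides: split on City.
bad = is_lossless_binary({"House#", "Street", "City"},
                         {"City", "Province", "PostalCode"}, FDS)
# Second attempt: split on PostalCode.
good = is_lossless_binary({"House#", "Street", "PostalCode"},
                          {"City", "Province", "PostalCode"}, FDS)
print(bad, good)   # False True
```

In the second split the shared attribute PostalCode is a key of the (City, Province, PostalCode) table, which is the "key connection" the slide points at.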

Time we achieved some normalcy!
Role of FDs in detecting redundancy:
Consider a relation schema R with 3 attributes, A B C.
No FDs hold: There is no redundancy here.
Given A → B: Several tuples could have the same A value, and if so, they'll all have the same B value!
Normalization: the process of removing redundancy from data

Normal Forms: Why have one rule when you can have four?
Provide guidance for table refinement/reducing redundancy.
Four important normal forms:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd Normal Form (BCNF)
If a relation is in a certain normal form, certain problems (aka anomalies!) are avoided/minimized.
Normal forms can help decide whether decomposition (i.e., splitting tables) will help.

1NF
Each attribute has only one value
E.g., for "postal code" you can't have both V6T 1Z4 and V6S 1W6 in the same tuple!
Codd's original vision of the relational model allowed multi-valued attributes.

Recall trivial FDs
An FD X → A is trivial if A belongs to X.
More generally, an FD X → Y (where X and Y are sets of attributes) is trivial if Y is a subset of X.
e.g., City, Province → City is a trivial FD.
We say an FD is non-trivial if it is not trivial.

3NF
A relation R is in 3NF if:
If X → A is a non-trivial dependency in R, then either X is a superkey for R or A is part of a key.
(i.e., whenever X determines a non-key attr, X better be a super key.)
Note: Being part of a super key doesn't count! Why? A super key could contain junk.
Example: address(Street, City, PostalCode), abbreviated to: address(S,C,P).
FDs: SC → P. P → C.
Keys: SC, SP.
Does it satisfy 3NF? What about 2NF?
We will return to 3NF a little later.

Boyce-Codd Normal Form (BCNF)
Raymond Boyce & Ted Codd
A relation R is in BCNF if:
If X → A is a non-trivial FD in R, then X is a superkey for R
(Must be true for every such FD)
In English:
Only (super)keys should determine other attributes.
Ex: Address(House#, Street, City, Province, PostalCode)
FD: PostalCode → City
Is it in BCNF? Why (not)?
Recall applicable FDs for Address: PostalCode → City, PostalCode → Province.
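A hedged sketch of the BCNF test for the Address example follows. It checks only the FDs that are given, which is enough when they are stated over the whole relation being tested; for a fragment produced by a decomposition one would first need the FDs that apply to that fragment. The function names and the (lhs, rhs) set-pair representation are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(attributes, fds):
    """True iff every non-trivial given FD has a superkey on its left-hand side."""
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        lhs_is_superkey = set(attributes) <= closure(lhs, fds)
        if not trivial and not lhs_is_superkey:
            return False
    return True

ADDRESS = {"House#", "Street", "City", "Province", "PostalCode"}
FDS = [({"PostalCode"}, {"City"}), ({"PostalCode"}, {"Province"})]

print(is_bcnf(ADDRESS, FDS))                               # False: PostalCode is not a superkey
print(is_bcnf({"City", "Province", "PostalCode"}, FDS))    # True: PostalCode determines everything here
```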

What do we want?
Guaranteed freedom from redundancy!
How do we get there?
A relation may be in BCNF already!
Interesting fact: all two-attribute relations are in BCNF!
Hint: What are the only possible non-trivial FDs in a 2-attribute relation schema?
If not, decomposition is the answer!

Decomposing a Relation
A decomposition of R replaces R by two
or more relations s.t.:
Each new relation contains a subset of
the attributes of R (and no attributes
not appearing in R), and
Every attribute of R appears in at least one new relation.
Intuitively, decomposing R means storing instances of the
relations produced by the decomposition, instead of
instances of R.
E.g., Address(House#,Street,City,Province,Postal Code)
How can we decompose without losing information?

How can we decompose a relation w/o losing information?
Address(House#, Street, City, Province, PostalCode) is split into:
Address(House#, Street, PostalCode)
PC(City, Province, PostalCode)
Does the above decomposition lose information?
What does it mean to lose information?
How can we tell if we lose?
We need to know how the JOIN operation in Relational Algebra works, for this purpose.

Lossless-Join Decompositions: Definition
Informally: If we break a relation, r, into pieces, when we put the pieces back together, we should get exactly r back again
Formally: Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
If we JOIN the X-part of r with the Y-part of r, the result is exactly r
REMARKS:
1. It is always true that r is a subset of the JOIN of its X-part and Y-part.
2. In general, the other direction does not hold! If it does, the decomposition is a lossless-join.
All decompositions used to resolve redundancy must be lossless!

Example Lossy-Join Decomposition

Original:
A  B  C
1  2  3
4  5  6
7  2  8

decompose:
A  B        B  C
1  2        2  3
4  5        5  6
7  2        2  8

(join):
A  B  C
1  2  3
4  5  6
7  2  8
1  2  8
7  2  3

So what did we lose?
Note: tuples (1 2 8), (7 2 3) not present in original.
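The example above can be replayed in a few lines of Python, using set comprehensions as stand-ins for projection and natural join. This is only an illustration; the variable names are ours.

```python
# The original instance r over attributes (A, B, C).
r = {(1, 2, 3), (4, 5, 6), (7, 2, 8)}

# Project onto AB and onto BC.
ab = {(a, b) for (a, b, c) in r}
bc = {(b, c) for (a, b, c) in r}

# Natural join of the two projections on the shared attribute B.
joined = {(a, b, c) for (a, b) in ab for (b2, c) in bc if b == b2}

spurious = joined - r
print(sorted(joined))     # five tuples: the original three plus two extra
print(sorted(spurious))   # [(1, 2, 8), (7, 2, 3)]
print(r <= joined)        # True: the join never drops original tuples
```

What is "lost" is not tuples but certainty: after the join we can no longer tell which of the five tuples were really in r.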

How do we decompose into BCNF losslessly?
Let r be a relation with attributes R, and F be a set of FDs on R s.t. all FDs have a single attribute on the RHS.
Pick any FD f of the form X → A that violates BCNF
Decompose R into two relations: R1(R − A) & R2(XA)
Recurse on R1 and R2 using FDs
Pictorially: [diagram of R split into R1 and R2; labels: R1, R2, Others]
Note: answer may vary depending on order you choose. That's okay: all final answers are guaranteed to be in BCNF.
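The recursive procedure above might be sketched as follows. To decide which FDs apply to a fragment, this sketch uses closures: it looks for a set X inside the fragment whose closure adds some attribute A of the fragment without covering the whole fragment. That brute-force search, the choice of the alphabetically first violation, and the function names are all assumptions of this illustration, and the search is exponential in the fragment size.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, A) where X -> A is non-trivial within rel and X is not a superkey of rel."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            determined = closure(x, fds) & rel    # what X determines inside rel
            extra = determined - x
            if extra and determined != rel:       # non-trivial, and X is not a superkey
                return x, min(extra)
    return None

def bcnf_decompose(rel, fds):
    """Split rel into R1 = rel - A and R2 = XA until no violation remains."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, a = violation
    return bcnf_decompose(rel - {a}, fds) + bcnf_decompose(x | {a}, fds)

# Single-letter attribute names, so a string doubles as a set of attributes.
result = bcnf_decompose("ABCD", [("B", "C"), ("D", "A")])
print([sorted(r) for r in result])
```

A different choice of violating FD at each step can produce a different final set of relations, which is the order-dependence the slide mentions.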

BCNF Example
Recall def. of BCNF: For all non-trivial FDs X → A, X must be a superkey.
E.g.: Relation: R(ABCD) FD: B → C, D → A
Keys?
A+ = A; B+ = BC; C+ = C; D+ = AD;
(BD)+ = BDCA; BD is the only key
Process R(ABCD).
Look at FD B → C. Is B a superkey?
No. Decompose R into R1(B,C), R2(A,B,D)
B → C is the only FD that applies to R1.
R1 is in BCNF.
Process R2(ABD).
[Decomposition tree so far: ABCD splits into ADB and BC]
====>

BCNF Example (contd.)
This is how far we got: ABCD was split into ADB and BC.
We know all is well with BC, i.e., it is in BCNF.
Now, look at FD D → A. Is D a superkey for R2?
No. Decompose R2 into R3(D,A), R4(D,B).
[Decomposition tree: ABCD splits into ADB and BC; ADB splits into DA and DB]
Final answer: R1(B,C), R3(D,A), R4(D,B)
{R1, R3, R4} is a LLJ decomposition of R.
R1, R3, R4 are each in BCNF.

Another BCNF Example


R(ABCDE)
FD: ABC, DE.
Generate the BCNF (lossless-join) decomposition of R.
IOW, split up R into smaller relation schemas s.t. each of
them is in BCNF and together they are LLJ.


After you decompose, how do you know which FDs apply?

Yes. Closure

Yet Another BCNF Example:
R(A,B,C,D,E,F)
FD = A → B, DE → F, B → C
Is it in BCNF? If so, why? If not, decompose into BCNF

This BCNF stuff is great and easy!
Guaranteed that there will be no redundancy of data
Easy to understand (just look for superkeys)
Easy to do.
So why are there more normal forms?
For one thing, BCNF may not preserve all dependencies
What does that mean?

An illustrative BCNF example
R(Unit, Company, Product)
FDs: Unit → Company
Company, Product → Unit
Key(s)? {Company, Product}, {Unit, Product}
BCNF decomposition:
R1(Unit, Company): Unit → Company
R2(Unit, Product): No non-trivial FDs
We lose the FD: Company, Product → Unit!!

So What's the Problem?

Unit       Company        Unit       Product
SKYWill    UBC            SKYWill    Databases
Team Meat  UBC            Team Meat  Databases

Unit → Company
No problem so far. All local FDs are satisfied.
Let's put all the data back into a single table again:

Unit       Company  Product
SKYWill    UBC      Databases
Team Meat  UBC      Databases

Violates the FD: Company, Product → Unit
How could the dbms check if an update would violate the FD Company, Product → Unit?
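To make the problem concrete, the sketch below checks the local FD on the decomposed table and then the lost FD on the joined table. The helper name satisfies and the tuple-with-column-index representation are assumptions of this illustration, not how a real DBMS enforces constraints.

```python
def satisfies(rows, lhs, rhs):
    """True iff rows agreeing on the lhs column indexes also agree on the rhs ones."""
    seen = {}
    for row in rows:
        x = tuple(row[i] for i in lhs)
        y = tuple(row[i] for i in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

unit_company = {("SKYWill", "UBC"), ("Team Meat", "UBC")}
unit_product = {("SKYWill", "Databases"), ("Team Meat", "Databases")}

# Local check: Unit -> Company holds in the (Unit, Company) table.
local_ok = satisfies(unit_company, lhs=(0,), rhs=(1,))

# The lost FD Company, Product -> Unit can only be checked after a join.
joined = {(u, c, p) for (u, c) in unit_company
          for (u2, p) in unit_product if u == u2}
global_ok = satisfies(joined, lhs=(1, 2), rhs=(0,))

print(local_ok, global_ok)   # True False
```

Each table looks fine on its own, so catching the violation at update time would require joining the two tables, which is exactly the cost that dependency preservation avoids.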

3NF to the rescue!
Recall: A relation R is in 3NF if:
If X → A is a non-trivial FD in R, then either X is a superkey for R or A is part of a key.
(must be true for every such FD)
Note: A must be part of a key, not just a superkey (if a key exists, all attributes are part of a superkey!)
Example: R(Unit, Company, Product)
FDs: Unit → Company (BCNF: no. Company is part of a key, so 3NF)
Company, Product → Unit (Company, Product = superkey)
Keys: {Company, Product}, {Unit, Product}
Is it in BCNF? 3NF?
To decompose into 3NF we rely on the minimal cover
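A sketch of the 3NF test for this example: find the candidate keys, collect the attributes that are part of some key, and check each given FD. As before, the function names, the (lhs, rhs) set-pair representation, and the brute-force key search are assumptions made for illustration, and only the FDs given for the whole relation are checked.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal superkeys, found by trying subsets in order of size."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == set(attrs):
                keys.append(cand)
    return keys

def is_3nf(attributes, fds):
    """Each non-trivial X -> A needs X to be a superkey or A to be part of a key."""
    prime = set().union(*candidate_keys(attributes, fds))
    for lhs, rhs in fds:
        lhs_is_superkey = set(attributes) <= closure(lhs, fds)
        for a in set(rhs) - set(lhs):             # skip the trivial part of the FD
            if not lhs_is_superkey and a not in prime:
                return False
    return True

UCP = {"Unit", "Company", "Product"}
UCP_FDS = [({"Unit"}, {"Company"}), ({"Company", "Product"}, {"Unit"})]
ADDRESS = {"House#", "Street", "City", "Province", "PostalCode"}
ADDRESS_FDS = [({"PostalCode"}, {"City", "Province"})]

print(is_3nf(UCP, UCP_FDS))          # True: Company is part of the key {Company, Product}
print(is_3nf(ADDRESS, ADDRESS_FDS))  # False: City is not part of any key
```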

Minimal Cover for a Set of FDs
Goal: Transform FDs to be as compact as possible
Minimal cover G for a set of FDs F:
Closure of F = closure of G (i.e., imply the same FDs)
RHS of each FD in G is a single attribute
If we delete an FD in G or delete attributes from an FD in G, the closure changes
Intuitively, every FD in G is needed, and is as slim as possible in order to get the same closure as F
e.g., A → B, ABCD → E, EF → GH, ACDF → EG has the following minimal cover:
A → B, ACD → E, EF → G and EF → H
We'll see how to derive this on the next slide

Finding minimal covers of FDs
1. Put FDs in standard form (have only one attribute on RHS)
2. Minimize LHS of each FD
3. Delete Redundant FDs

Example:
A → B, ABCD → E, EF → GH, ACDF → EG
1. Need ACDF → E, ACDF → G?
2. ABCD → E goes to ACD → E (closure)
3. Redundant: ACDF → E, ACDF → G (take closure of ACDF w/o rule ACDF → E)
In the end: A → B, ACD → E, EF → G, EF → H
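The three steps above can be sketched directly in Python. This is one straightforward way to code them, with names chosen here for illustration; attribute names are single letters so that a string such as "ABCD" can stand for a set of attributes. In general a set of FDs can have more than one minimal cover, and the one returned depends on the order in which attributes and FDs are tried.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: standard form, one attribute on each right-hand side.
    g = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop a LHS attribute if the rest of the LHS still determines the RHS.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            smaller = g[i][0] - {b}
            if smaller and rhs <= closure(smaller, g):
                g[i] = (smaller, rhs)
    # Step 3: drop an FD if the remaining FDs still imply it.
    i = 0
    while i < len(g):
        lhs, rhs = g[i]
        rest = g[:i] + g[i + 1:]
        if rhs <= closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g

F = [("A", "B"), ("ABCD", "E"), ("EF", "GH"), ("ACDF", "EG")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

On the slide's example this yields the four FDs A → B, ACD → E, EF → G and EF → H, in some order.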

Another minimal cover example
Consider the relation R(CSJDPQV) with FDs
C → SJDPQV, JP → C, SD → P, J → S
Find a minimal cover

Decomposition into 3NF using Minimal Cover

Synthesis of 3NF from scratch

Another 3NF example


R(ABCDE)

FDs: ACDE,

CEA


Yet another checkpoint
3NF and BCNF are the most popular normal forms, i.e., designs free from anomalies.
BCNF is stronger than 3NF.
BCNF decomposition guarantees lossless join (LLJ) decomposition, i.e., good splits.
3NF decomposition also guarantees LLJ.
It can additionally be made to preserve FDs, which BCNF is not always guaranteed to do.
There is a simple synthesis algorithm for obtaining a 3NF design too; to use it, you need to be able to find a minimal cover for the given set of FDs.

Comparing BCNF & 3NF
BCNF guarantees removal of all anomalies
3NF has some anomalies, but preserves all dependencies
If a relation R is in BCNF it is in 3NF.
A 3NF relation R may not be in BCNF if all 3 of the following conditions are true:
a. R has multiple keys
b. Keys are composite (i.e. not single-attributed)
c. These keys overlap
BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF

On the one hand: Normalization and Design
Most organizations go to 3NF or better
If a relation has only 2 attributes, it is automatically in 3NF and BCNF
Our goal is to use lossless-join for all decompositions and preserve dependencies
BCNF decomposition is always lossless, but may not preserve dependencies
Good heuristic:
Try to ensure that all relations are in at least 3NF
Check for dependency preservation

On the other hand: Denormalization
Process of intentionally violating a normal form to gain performance improvements
Performance improvements:
Fewer joins
Reduces number of foreign keys
Since foreign keys are often indexed, the number of indexes may be reduced
Useful if certain queries often require (joined) results, and the queries are frequent enough

Learning Goals Revisited
Debate the pros and cons of redundancy in a database.
Provide examples of update, insertion, and deletion anomalies.
Given a set of tables and a set of functional dependencies over them, determine all the keys for the tables.
Show that a table is/isn't in 3NF or BCNF.
Justify why lossless join decompositions are preferred decompositions.
Decompose a table into a set of tables that are in 3NF, or BCNF.
Additionally

Learning Goals Revisited
Given a set of FDs, find all keys of a relation scheme and prove that we have found them all.
Find a minimal cover for a set of FDs.
Test if a decomp. is LLJ.
Test if a decomp. is dependency preserving, i.e., preserves all FDs.
