0% found this document useful (0 votes)
50 views39 pages

C20.0046: Database Management Systems Lecture #5: M.P. Johnson Stern School of Business, NYU Spring, 2008

This document summarizes a lecture on database management systems given by M.P. Johnson at NYU Stern in Spring 2008. The lecture covers functional dependencies, anomalies that can occur in databases, and techniques for normalization including BCNF and third normal form. Examples are provided to illustrate closure algorithms, different types of anomalies, and how to decompose a relation using projection to eliminate anomalies.
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
50 views39 pages

C20.0046: Database Management Systems Lecture #5: M.P. Johnson Stern School of Business, NYU Spring, 2008

This document summarizes a lecture on database management systems given by M.P. Johnson at NYU Stern in Spring 2008. The lecture covers functional dependencies, anomalies that can occur in databases, and techniques for normalization including BCNF and third normal form. Examples are provided to illustrate closure algorithms, different types of anomalies, and how to decompose a relation using projection to eliminate anomalies.
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 39

C20.

0046: Database
Management Systems
Lecture #5
M.P. Johnson
Stern School of Business, NYU
Spring, 2008

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 1


Agenda
 Last time: FDs
 This time:
1. Anomalies
2. Normalization: BCNF & 3NF
 Next time: SQL

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 2


Review: FDs
 Definition:
If two tuples agree on the attributes
AA11,,AA22,,…,
…,AAnn
then they must also agree on the attributes
BB11,,BB22,,…,
…,BBmm

AA1,,AA2,,…,
…, AAn BB1,,BB2,,…,
…, BBm
 Notation: 1 2 n 1 2 m

 Read as: Ai functionally determines Bj

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 3


Review: Combining FDs
If some FDs are satisfied, then
others are satisfied too

name
name  color
color
If all these FDs are true: category
category  department
department
color,
color, category price
category  price

Then this FD also holds: name,


name, category  price
category  price

Why?

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 4


Problem: find all FDs
 Given a relation instance and set of given FDs
 Find all FD’s satisfied by that instance

 Useful if we don’t get enough information from our


users: need to reverse engineer a data instance

 Q: How long does this take?


 A: Some time for each subset of atts
 Q: How many subsets?
 powerset
  exponential time in worst-case
 But can often be smarter…
M.P. Johnson, DBMS, Stern/NYU, Spring 2008 5
Closure Algorithm

Start
Startwith
withX={A
X={A11,,…,
…,AAnn}.}. Example:

name
name  color
color
Repeat:
Repeat: category
category  department
department
color,
color, category price
category  price
ifif BB11,,…,
…,BBnn 
CC isisaaFD
FD and
and
BB1,,…, B are all in X
1 …, Bnn are all in X {name, category}+ =
then
then add
add CCto toX.
X. {name, category, color,
department, price}
until
until XXdidn’t
didn’t change
change

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 6


Closure alg e.g.

Example: A,
A, BB 
 CC
A,
A, DD 
 BB
BB  DD

Compute X+, for every set X (AB is shorthand for {A,B}):

AA++ == A,
A, BB+ ==BD,
+
BD, CC+ == C,
+
C, DD+ == DD
+

AB
AB+ ==ABCD,
+
ABCD, AC AC+== AC,
+
AC, ADAD+ == ABCD,
+
ABCD, BC+ BC+==BC,
BC,BD+
BD+==BD,
BD,CD+
CD+==CD
CD
ABC
ABC+ ==ABD
+
ABD+ == ACD
+
ACD+==ABCD
+
ABCD((nononeed
needto
tocompute–why?)
compute–why?)
BCD
BCD = BCD, ABCD == ABCD
++
= BCD, ABCD ++
ABCD
What are the keys?
M.P. Johnson, DBMS, Stern/NYU, Spring 2008 7
Closure alg e.g.

In class:

R(A,B,C,D,E,F) A,
A, BB 
 CC
A,
A, DD 
 EE
BB  DD

A,
A, FF 
 BB

Compute {A,B}+ X = {A, B, }

Compute {A, F}+ X = {A, F, }

What are the keys?

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 8


Closure alg e.g.
 Product(name, price, category, color)
name, category  price
category  color

FDs are:
Keys are: {name, category}

 Enrollment(student, address, course, room, time)


student  address
room, time  course
student, course  room, time

FDs are:
Keys are:

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 9


Next topic: Anomalies
 Identify anomalies in existing schemata

 Decomposition by projection

 BCNF

 Lossy v. lossless

 Third Normal Form


M.P. Johnson, DBMS, Stern/NYU, Spring 2008 10
Types of anomalies
 Redundancy
 Repeat info unnecessarily in several tuples

 Update anomalies:
 Change info in one tuple but not in another

 Deletion anomalies:
 Delete some values & lose other values too

 Insert anomalies:
 Inserting row means having to insert other, separate info /
null-ing it out

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 11


Examples of anomalies
Name SSN Mailing-address Phone
Michael 123 NY 212-111-1111
Michael 123 NY 917-111-1111
Hilary 456 DC 202-222-2222
Hilary 456 DC 914-222-2222
Bill 789 Chappaqua 914-222-2222
Bill 789 Chappaqua 212-333-3333

SSN
SSN Name,
Name,Mailing-address
Mailing-address SSN
SSN Phone
Phone
 Redundancy: name, maddress
 Update anomaly: Bill moves
 Delete anom.: Bill doesn’t pay bills, lose phones  lose Bill!
 Insert anom: can’t insert someone without a (non-null) phone
 Underlying cause: SSN-phone is many-many
 Effect: partial dependency ssn  name, maddress,
 Whereas key = {ssn,phone}
M.P. Johnson, DBMS, Stern/NYU, Spring 2008 12
Decomposition by projection
 Soln: replace anomalous R with projections of R onto
two subsets of attributes
 Projection: an operation in Relational Algebra
 Corresponds to SELECT command in SQL

 Projecting R onto attributes (A1,…,An) means


removing all other attributes
 Result of projection is another relation
 Yields tuples whose fields are A1,…,An
 Resulting duplicates ignored

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 13


Projection for decomposition
R(A
R(A11,, ...,
...,AAnn,, BB11,, ...,
...,BBmm,,CC11,,...,
...,CCpp))

RR11(A
(A11,,...,
...,AAnn,,BB11,,...,
...,BBmm)) RR22(A
(A11,, ...,
...,AAnn,, CC11,,...,
...,CCpp))

R1 = projection of R on A1, ..., An, B1, ..., Bm


R2 = projection of R on A1, ..., An, C1, ..., Cp
A1, ..., An  B1, ..., Bm  C1, ..., Cp = all attributes
R1 and R2 may (/not) be reassembled to produce original R

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 14


Decomposition example
Name SSN Mailing-address Phone
Break Michael 123 NY 212-111-1111
the Michael 123 NY 917-111-1111
relation Hilary 456 DC 202-222-2222

into Hilary 456 DC 914-222-2222


Bill 789 Chappaqua 914-222-2222
two:
Bill 789 Chappaqua 212-333-3333

Name SSN Mailing-address SSN Phone


Michael 123 NY 123 212-111-1111
Hilary 456 DC 123 917-111-1111
Bill 789 Chappaqua 456 202-222-2222

 The anomalies are gone 456 914-222-2222

 No more redundant data 789 914-222-2222


789 212-333-3333
 Easy to for Bill to move
 Okay for Bill to lose all phones
M.P. Johnson, DBMS, Stern/NYU, Spring 2008 15
Thus: high-level strategy
name

E/R Model: Product buys Person

price name ssn

Relational Model:
plus FD’s

Normalization:
Eliminates anomalies

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 16


Using FDs to produce good schemas
1. Start with set of relations
2. Define FDs (and keys) for them based on real
world
3. Transform your relations to “normal form”
(normalize them)
 Do this using “decomposition”

 Intuitively, good design means


 No anomalies
 Can reconstruct all (and only the) original information

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 17


Decomposition terminology
 Projection: eliminating certain atts from relation
 Decomposition: separating a relation into two by
projection
 Join: (re)assembling two relations
 Whenever a row from R1 and a row from R2 have the same
value for some atts A, join together to form a row of R3

 If exactly the original rows are reproduced by joining


the relations, then the decomposition was lossless
 We join on the attributes R1 and R2 have in common (As)
 If it can’t, the decomposition was lossy

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 18


Lossless Decompositions
A decomposition is lossless if we can recover:
R(A,B,C)
Decompose

R1(A,B) R2(A,C)
Recover

R’(A,B,C) should be the same as


R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
M.P. Johnson, DBMS, Stern/NYU, Spring 2008 19
Lossless decomposition
 Sometimes the same set of data is reproduced:
Name Price Category
Word 100 WP
Oracle 1000 DB
Access 100 DB

Name Price Name Category


Word 100 Word WP
Oracle 1000 Oracle DB
Access 100 Access DB

 (Word, 100) + (Word, WP)  (Word, 100, WP)


 (Oracle, 1000) + (Oracle, DB)  (Oracle, 1000, DB)
 (Access, 100) + (Access, DB)  (Access, 100, DB)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 20


Lossy decomposition
 Sometimes it’s not:
Name Price Category
Word 100 WP
What’s
Oracle 1000 DB
wrong?
Access 100 DB

Category Name Category Price

WP Word WP 100

DB Oracle DB 1000

DB Access DB 100

 (Word, WP) + (100, WP)  (Word, 100, WP)


 (Oracle, DB) + (1000, DB)  (Oracle, 1000, DB)
 (Oracle, DB) + (100, DB)  (Oracle, 100, DB)
 (Access, DB) + (1000, DB)  (Access, 1000, DB)
 (Access, DB) + (100, DB)  (Access, 100, DB)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 21


Ensuring lossless decomposition
R(A
R(A11,, ...,
...,AAnn,, BB11,, ...,
...,BBmm,,CC11,,...,
...,CCpp))

RR1(A , ..., A , B , ..., B ) RR22(A


(A11,, ...,
...,AAnn,, CC11,,...,
...,CCpp))
1(A11, ..., Ann, B11, ..., Bmm)

If A1, ..., An  B1, ..., Bm or A1, ..., An  C1, ..., Cp


Then the decomposition is lossless
Note: don’t need both
 Examples:
 name  price, so first decomposition was lossless
 category  name and category  price, and so second
decomposition was lossy

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 22


Quick lossless/lossy example

X Y Z
1 2 3
4 2 5

 At a glance: can we decompose into R1(Y,X), R2(Y,Z)?


 At a glance: can we decompose into R1(X,Y), R2(X,Z)?

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 23


Next topic: Normal Forms
 First Normal Form = all attributes are atomic
 As opposed to set-valued
 Assumed all along

 Second Normal Form (2NF)

 Third Normal Form (3NF)

 Boyce Codd Normal Form (BCNF)

 Fourth Normal Form (4NF)

 Fifth Normal Form (5NF)


M.P. Johnson, DBMS, Stern/NYU, Spring 2008 24
BCNF definition
 A simple condition for removing anomalies from
relations:
AArelation
relationRRisisin
inBCNF
BCNFif:
if:
IfIf As
AsBs
Bs isisaanon-trivial
non-trivialdependency
dependency
in
in RR,, then
thenAs
Asisisaasuperkey
superkeyfor
for RR

 I.e.: The left side must always contain a key


 I.e: If a set of attributes determines other attributes,
it must determine all the attributes

 Slogan: “In every FD, the left side is a superkey.”


M.P. Johnson, DBMS, Stern/NYU, Spring 2008 25
BCNF decomposition algorithm
Repeat
Repeat
choose
chooseAA11,,…,
…,AAmm
BB11,,…,
…,BBnnthat
thatviolates
violatesthe
theBNCF
BNCFcondition
condition
//Heuristic: choose Bs as large as possible
split
splitRRinto
intoRR11(A
(A11,,…,
…,AAmm,,BB11,,…,
…,BBnn))and
andRR22(A
(A11,,…,
…,AAmm,,[others])
[others])
continue
continuewith
withboth
bothRR11and
andRR22
Until
Untilno
nomore
moreviolations
violations

B’s A’s Others

R1 R2
M.P. Johnson, DBMS, Stern/NYU, Spring 2008 26
Boyce-Codd Normal Form
 Name/phone example is not BCNF:
Name SSN Mailing-address Phone
Michael 123 NY 212-111-1111
Michael 123 NY 917-111-1111

 {ssn,phone} is key
 FD: ssn  name,mailing-address holds
 Violates BCNF: ssn is not a superkey
 Its decomposition is BCNF
 Only superkeys  anything else
Name SSN Mailing-address SSN PhoneNumber
Michael 123 NY 123 212-111-1111
123 917-111-1111

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 27


Design/BCNF example
 Consider situation:
 Entities: Emp(ssn,name,lot), Dept(id,name,budg)
 Relship: Works(E,D,since)
 Draw E/R

 New rule: in each dept, everyone parks in


same lot
 Translate to FD
 Normalize

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 28


BCNF Decomposition
 Larger example: multiple decompositions
 {Title, Year, Studio, President, Pres-Address}
 FDs:
 Title Year  Studio
 Studio  President
 President  Pres-Address
  Studio  President, Pres-Address
 No many-many this time
 Problem cause: transitive FDs:
 Title,year  studio  president

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 29


BCNF Decomposition
 Illegal: As  Bs, where As not a superkey
 Decompose: Studio  President, Pres-Address
 As = {studio}
 Bs = {president, pres-address}
 Cs = {title, year}
 Result:
1. Studios(studio, president, pres-address)
2. Movies(studio, title, year)
 Is (2) in BCNF? Is in (1) BCNF?
 Key: Studio
 FD: President  Pres-Address
 Q: Does president  studio? If so, president is a key
 But if not, it violates BCNF

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 30


BCNF Decomposition
 Studios(studio, president, pres-address)
 Illegal: As  Bs, where As not a superkey
  Decompose: President  Pres-Address
 As = {president}
 Bs = {pres-address}
 Cs = {studio}
 {Studio, President, Pres-Address} becomes
 {President, Pres-Address}
 {Studio, President}

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 31


Decomposition algorithm example
 R(N,O,R,P) F = {N  O, O  R, R  N}
Name Office Residence Phone
George Pres. WH 202-…
George Pres. WH 486-…
Dick VP NO 202-…
 Key: N,P Dick VP NO 307-…
 Violations of BCNF: N  O, OR, N OR

 Pick N  OR (on board)


 Can we rejoin? (on board)
 What happens if we pick N  O instead?
 Can we rejoin? (on board)
M.P. Johnson, DBMS, Stern/NYU, Spring 2008 32
An issue with BCNF
 We could lose FDs

 Relation: R(Title, Theater, Neighboorhood)


 FDs:
 Title,N’hood  Theater

 Assume a movie shouldn’t play twice in same n’hood


 Theater  N’hood
 Keys: Title Theater N’hood
 {Title, N’hood}
Aviator Angelica Village
 {Theater, Title} Life Aquatic Angelica Village

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 33


Losing FDs
 BCNF violation: Theater  N’hood
 Decompose:
 {Theater, N’Hood}
 {Theater, Title}

 Resulting relations:
R1 R2
Theater N’hood Theater Title
Angelica Village Angelica Aviator
Angelica Life Aquatic

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 34


Losing FDs
 Suppose we add new rows to R1 and R2:

R1 Theater N’hood R2 Theater Title


Angelica Village Angelica Life Aquatic
Film Forum Village Angelica Aviator
Film Forum Life Aquatic
R’
Theater N’hood Title
Angelica Village Life Aquatic
Angelica Village Aviator
Film Forum Village Life Aquatic
 Neither R1 nor R2 enforces FD Title,N’hood  Theater
M.P. Johnson, DBMS, Stern/NYU, Spring 2008 35
Third normal form: motivation
 Sometimes
 BCNF is not dependency-preserving, and
 Efficient checking for FD violation on updates is important

 In these cases BCNF is too severe a req.


 “over-normalization”

 Solution: define a weaker normal form, 3NF


 FDs can be checked on individual relations without
performing a join (no inter-relational FDs)
 relations can be converted, preserving both data and FDs

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 36


BCNF lossiness
 NB: BCNF decomp. is not data-lossy
 Results can be rejoined to obtain the exact original

 But: it can lose dependencies


 After decomp, now legal to add rows whose corresponding
rows would be illegal in (rejoined) original

 Data-lossy v. FD-lossy

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 37


Third Normal Form
 Now define the (weaker) Third Normal Form
 Turns out: this example was already in 3NF

AArelation
relationRRisisin
in3rd
3rdnormal
normalform
form ifif::

For
Forevery
everynontrivial
nontrivialdependency ...,AAnn 
dependencyAA11,,AA22,,..., BB
for
for R,
R, {A
{A11,, AA22,, ...,
..., AAnn }}isisaasuper-key
super-keyfor
forR,
R,
or
or BBisispart
partof ofaakey,key, i.e.,
i.e., BBisis prime
prime

Tradeoff:
BCNF = no FD anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 38


Next week
 For Monday: read ch.5.1-2 (SQL)

 Proj1 due next Monday

M.P. Johnson, DBMS, Stern/NYU, Spring 2008 39

You might also like