
Database Design Theory: Introduction To Databases CSCC43 Winter 2011 Ryan Johnson

1) Database design theory guides the systematic improvement of database schemas through expressing constraints on data and using those constraints to decompose relations into normal forms. 2) Functional dependencies express relationships between attributes in a relation and are used to decompose relations and achieve normal forms that guarantee desirable properties like eliminating redundancy and anomalies. 3) Identifying the functional dependencies that accurately represent constraints in a domain is an important part of database design and requires domain knowledge that a database system alone cannot determine from the data.


Functional dependencies, decompositions, normal forms

Introduction to databases
CSCC43 Winter 2011
Ryan Johnson

Thanks to Arnold Rosenbloom and Renee Miller for material in these slides


Database Design Theory

• Guides systematic improvements to database schemas
• General idea:
  – Express constraints on the data
  – Use these to decompose the relations
• Ultimately, get a schema that is in a “normal form” that guarantees certain desirable properties
• “Normal” in the sense of conforming to a standard
• The process of converting a schema to a normal form is called normalization

Goal #1: redundancy, redundancy

• Consider this schema

  Student Name  Student Email  Course  Instructor
  Xiao          xiao@gmail     CSCC43  Johnson
  Xiao          xiao@gmail     CSCD08  Bretscher
  Jaspreet      jaspreet@utsc  CSCC43  Johnson

• What if…
  – Xiao changes email addresses? (update anomaly)
  – Xiao drops CSCD08? (deletion anomaly)
  – UTSC creates a new course, CSCC44? (insertion anomaly)

Multiple relations => exponentially worse


Goal #2: expressing constraints

• Consider the following sets of schemas:
  Students(utorid, name, email)
  vs.
  Students(utorid, name)
  Emails(utorid, address)
• Consider also:
  House(street, city, value, owner, propertyTax)
  vs.
  House(street, city, value, owner)
  TaxRates(city, value, propertyTax)

Dependencies, constraints are domain-dependent

Part I: Functional dependencies


Functional dependencies

• Let X, Y be sets of attributes from relation R
• X -> Y is an assertion about tuples in R
  – Any tuples which agree in all attributes of X must also agree in all attributes of Y
• “X functionally determines Y”
  – Or, “The values of attributes Y are a function of those in X”
  – Not necessarily an easy function to compute, mind you
    => Consider X -> h, where h is the hash of attributes in X
• Notational conventions
  – “a”, “b”, “c” – specific attributes
  – “A”, “B”, “C” – sets of (unnamed) attributes
  – abc -> def – same as {a,b,c} -> {d,e,f}

Most common to see singletons (X -> y or abc -> d)
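
The definition of X -> Y can be turned into a direct check on a relation instance. The following is a minimal Python sketch and not part of the original slides: the helper name `fd_holds` and the dictionary-per-row layout are assumptions made for illustration. The rows are those of the enrolment table from the redundancy slide. A passing check only shows that this particular sample does not violate the FD; it does not prove that the FD holds in the domain.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on all of `lhs` but differ on `rhs`."""
    seen = {}  # maps each observed LHS value to the RHS value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X values, different Y values: FD violated
    return True


rows = [
    {"name": "Xiao", "email": "xiao@gmail", "course": "CSCC43", "instructor": "Johnson"},
    {"name": "Xiao", "email": "xiao@gmail", "course": "CSCD08", "instructor": "Bretscher"},
    {"name": "Jaspreet", "email": "jaspreet@utsc", "course": "CSCC43", "instructor": "Johnson"},
]

print(fd_holds(rows, ["email"], ["name"]))         # True
print(fd_holds(rows, ["course"], ["instructor"]))  # True
print(fd_holds(rows, ["name"], ["course"]))        # False: Xiao takes two courses
```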

Splitting FDs

• Attributes on right independent of each other
  – Consider a,b,c -> d,e,f
  – “Attributes a, b, and c functionally determine d, e, and f”
    => No mention of d relating to e or f directly
• Useful to split up right side of FD
  – abc -> def becomes abc -> d, abc -> e and abc -> f
• No safe way to split left side
  – abc -> def is NOT the same as ab -> def and c -> def!


Splitting FDs – example

• Consider the relation
  – EmailAddress(user, domain, firstName, lastName)
  – user,domain -> firstName, lastName
• The following hold
  – user,domain -> firstName
  – user,domain -> lastName
• The following do NOT hold!
  – user -> firstName,lastName
  – domain -> firstName,lastName

Gotcha: “doesn’t hold” = “not all tuples” != “all tuples not”

Trivial FDs

• Not all functional dependencies are useful
  – A -> A always holds
  – abc -> a also always holds
• FD with an attribute on both sides is “trivial”
  – Simplify by removing L ∩ R from R
    abc -> ad becomes abc -> d
  – Or, in singleton form, delete trivial FDs
    abc -> a and abc -> d becomes just abc -> d


Identifying functional dependencies

• FDs are domain knowledge
  – Intrinsic features of the data you’re dealing with
  – Something you know (or assume) about the data
• Database engine cannot identify FDs for you
  – Designer must specify them as part of schema
  – DBMS can only enforce FDs when told to
• DBMS cannot safely “optimize” FDs either
  – It has only a finite sample of the data
  – An FD constrains the entire domain

Coincidence or FD?

  ID    Email            City      Country  Surname
  1983  [email protected]  Toronto   Canada   Fairgrieve
  8624  [email protected]  London    Canada   Samways
  9141  [email protected]  Winnipeg  Canada   Samways
  1204  [email protected]  Aachen    Germany  Lakemeyer

• What if we try to infer FDs from the data?
  – ID -> email, city, country, surname
  – email -> city, country, surname
  – city -> country
  – surname -> country

Domain knowledge required to validate FDs


Keys and FDs

• Consider relation R with attributes A
• Superkey
  – Any S ⊆ A s.t. S -> A
    => Any subset of A which determines all remaining attributes in A
• Candidate key
  – C ⊆ A s.t. C -> A and X -> A does not hold for any X ⊂ C
    => A superkey which contains no other superkeys
    => Remove any attribute and you no longer have a key
• Primary key
  – The candidate key we use to identify tuples in the relation
    => Always exists, only one allowed, doesn’t matter which C we use
• Prime attribute
  – Attribute x is prime if ∃ candidate key C s.t. x ∈ C
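
The “Coincidence or FD?” table can be mined mechanically to see why inferring FDs from data is unsafe. This is a hedged sketch, not something from the slides: `fd_holds` is a hypothetical helper, only single-attribute FDs are enumerated, and the Email column is left out because the addresses are not legible in the source. Every FD it prints merely happens to hold in these four rows.

```python
from itertools import permutations


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


columns = ["ID", "City", "Country", "Surname"]
sample = [
    (1983, "Toronto", "Canada", "Fairgrieve"),
    (8624, "London", "Canada", "Samways"),
    (9141, "Winnipeg", "Canada", "Samways"),
    (1204, "Aachen", "Germany", "Lakemeyer"),
]
rows = [dict(zip(columns, r)) for r in sample]

# Every ordered pair of distinct columns (x, y) for which x -> y is not violated.
found = [(x, y) for x, y in permutations(columns, 2) if fd_holds(rows, [x], [y])]
for x, y in found:
    print(f"{x} -> {y}")
```

The output includes Surname -> Country and City -> Country. Nothing in the data distinguishes these from ID -> Surname, which is why the designer has to decide which ones are real constraints.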

Candidate keys vs. superkeys

• Consider these relations
  Students(ID, surname, name, email, address, major)
  Houses(street, city, value, owner, tax)
• What are the candidate keys?
  – Students: ID, what else?
  – Houses: ?
• What other superkeys exist?
  – Students: ID,surname ID,name ID,name,surname …
  – Houses: ?
• Prime attributes?
  – Students: ?
  – Houses: ?


FD: relaxes the concept of a “key”

• Functional dependency: X -> Y
• Superkey: X -> R
• A superkey must include all remaining attributes of the relation on the RHS
• An FD can involve just a subset of them
• Example:
  – Houses(street, city, value, owner, tax)
  – street,city -> value,owner,tax (both FD and key)
  – city,value -> tax (FD only)
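
The difference between “FD and key” and “FD only” in the Houses example can be checked with attribute closure. The sketch below is an illustration under assumptions: the function names `closure`, `is_superkey` and `is_candidate_key` are invented here, and the FD set contains only the two dependencies listed on the slide.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) >= set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A superkey from which no single attribute can be removed."""
    attrs = set(attrs)
    return is_superkey(attrs, all_attrs, fds) and not any(
        is_superkey(attrs - {a}, all_attrs, fds) for a in attrs
    )


HOUSES = {"street", "city", "value", "owner", "tax"}
FDS = [
    ({"street", "city"}, {"value", "owner", "tax"}),
    ({"city", "value"}, {"tax"}),
]

print(is_superkey({"street", "city"}, HOUSES, FDS))                # True
print(is_superkey({"city", "value"}, HOUSES, FDS))                 # False
print(is_candidate_key({"street", "city"}, HOUSES, FDS))           # True
print(is_candidate_key({"street", "city", "owner"}, HOUSES, FDS))  # False
```

Dropping one attribute at a time is enough for the minimality test, because any subset of a non-superkey is also a non-superkey.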

Cyclic functional dependencies?

• Attributes on right side of one FD may appear on left side of another!
  – Simplest example: A -> B, B -> A
  – What does this say about A and B?
• Example
  – street,city -> value    city,value -> tax
  – studentID -> email    email -> studentID


Geometric view of FDs

• Let D be the domain of tuples in R
  – Every possible tuple is a point in D
• FD X on R restricts tuples in R to a subset of D
  – Points in D which violate X cannot be in R
• Example: D(x,y,z)
  – xy -> z
    => z = abs(x) + abs(y)
  – z -> x,y
    => x=y=abs(z)/2

[Figure: sample points in D, including (-1, -1, 2), (1, 1, 0), (0, 0, 1), (1, 1, 2), (1, 1, -2), (2, 2, -4), (2, 2, 4), (0, 0, 0), (1, -1, -2), (3, 2, 1), (1, 2, 3)]

Inferring functional dependencies

• Problem
  – Given FDs X1 -> a1, X2 -> a2, etc.
  – Does some FD Y -> B (not given) also hold?
• Consider the dependencies
  A -> B    B -> C
  Intuitively, A -> C also holds
  The given FDs entail (imply) it

How to prove it in the general case?


Closure test for FDs

• Given attribute set A and FD set F
  – Denote A_F+ as the closure of A relative to F
    => A_F+ = set of all attributes functionally determined by A under F
• Computing the [transitive] closure of A
  – Start: A_F+ = A, F’ = F
  – While ∃X ∈ F’ s.t. LHS(X) ⊆ A_F+ :
      A_F+ = A_F+ ∪ RHS(X)
      F’ = F’ - {X}
  – At end: A -> B ∀B ∈ A_F+
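
The closure loop translates almost line for line into code. This is a minimal sketch rather than a reference implementation; it assumes single-letter attribute names (so that a string such as "ab" stands for the set {a, b}), and the name `closure` is chosen here. The FD set used below is the one from the closure test example.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs.

    Each side is an iterable of single-letter attribute names.
    """
    result = set(attrs)      # Start: A+ = A
    remaining = list(fds)    # F' = F
    progress = True
    while progress:
        progress = False
        for fd in remaining:
            lhs, rhs = fd
            if set(lhs) <= result:    # LHS(X) is contained in A+
                result |= set(rhs)    # A+ = A+ U RHS(X)
                remaining.remove(fd)  # F' = F' - {X}
                progress = True
                break                 # restart the scan over the shorter F'
    return result


FDS = [("ab", "c"), ("ac", "d"), ("c", "e"), ("ade", "f")]
print(sorted(closure("ab", FDS)))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(sorted(closure("c", FDS)))   # ['c', 'e']
```

To test whether Y -> B follows from F, compute the closure of Y and check that every attribute of B is in it.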

Closure test – example

• Consider R(a,b,c,d,e,f)
  with FDs ab -> c, ac -> d, c -> e, ade -> f
• Find A+ if A = ab
  – Start with {a,b}; ab -> c adds c; ac -> d adds d; c -> e adds e; ade -> f adds f

[Figure: four snapshots of the attribute row a b c d e f as the closure grows]

ab -> cdef, so ab is a candidate key!


Discarding redundant FDs

• Minimal basis: opposite extreme from closure
• Given a set of FDs F, want to minimize F’ s.t.
  – F’ ⊆ F
  – F’ entails X ∀X∈F
• Properties of a minimal basis
  – RHS is always singleton
  – Removing any FD from F’ loses information
  – Removing any attribute from the LHS of any X∈F’ loses information
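
One way to arrive at a basis with these three properties is to split right-hand sides, drop FDs that the others already entail, and shrink left-hand sides, repeating until nothing changes. The sketch below is one possible rendering under assumptions: attribute names are single letters, the function names are invented here, and entailment is tested with attribute closure instead of comparing full FD closures. The result can depend on the order in which FDs are examined, since a minimal basis is not unique in general.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (frozenset lhs, single attr)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, attr in fds:
            if lhs <= result and attr not in result:
                result.add(attr)
                changed = True
    return result


def minimal_basis(fds):
    # Split every RHS into singletons, dropping trivial FDs and duplicates.
    basis = list(dict.fromkeys(
        (frozenset(lhs), attr) for lhs, rhs in fds for attr in rhs if attr not in lhs
    ))
    changed = True
    while changed:
        changed = False
        # Drop an FD if the remaining FDs still entail it.
        for fd in list(basis):
            rest = [g for g in basis if g != fd]
            if fd[1] in closure(fd[0], rest):
                basis = rest
                changed = True
        # Drop an LHS attribute if the shorter FD is already entailed.
        for i in range(len(basis)):
            lhs, attr = basis[i]
            for x in sorted(lhs):
                smaller = lhs - {x}
                if attr in closure(smaller, basis):
                    lhs = smaller
                    basis[i] = (lhs, attr)
                    changed = True
        basis = list(dict.fromkeys(basis))  # shrinking may create duplicates
    return set(basis)


print(minimal_basis([("A", "B"), ("B", "C"), ("A", "C")]))
print(minimal_basis([("ab", "c"), ("a", "b")]))
```

In the first call A -> C is discarded because A -> B and B -> C entail it; in the second, ab -> c shrinks to a -> c because a already determines b.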

Constructing a minimal basis

• Straightforward but time-consuming
  1. Split all RHS into singletons
  2. ∀X ∈ F’, test whether (F’-X)+ is still equivalent to F+
     => Might make F’ too small
  3. ∀i ∈ LHS(X) ∀X ∈ F’, let LHS(X’)=LHS(X)-i
     Test whether (F’-X+X’)+ is still equivalent to F+
     => Might make F’ too big
  4. Repeat (2) and (3) until neither makes progress


Part II: Schema decomposition

FDs and redundancy

• Given relation R and FDs F
  – R often exhibits anomalies due to redundancy
  – F identifies many (not all) of the underlying problems
• Idea
  – Use F to identify “good” ways to split relations
  – Split R into 2+ smaller relations having less redundancy
  – Split up F into subsets which apply to the new relations


Schema decomposition

• Given relation R and FDs F
  – Split R into Ri s.t. ∀i Ri ⊂ R (no new attributes)
  – Split F into Fi s.t. ∀i F entails Fi (no new FDs)
  – Fi involves only attributes in Ri
• Caveat: entirely possible to lose information
  – F+ may entail FD X which is not in (∪i Fi)+
    => Decomposition lost some FDs
  – Possible to have R ⊂ ⋈i Ri (the join of the Ri contains extra tuples)
    => Decomposition lost some relationships
• Goal: minimize anomalies without losing info

We’ll revisit information loss in a moment

Splitting relations – example

• Consider the following relation:

  Student Name  Student Email  Course  Instructor
  Xiao          xiao@gmail     CSCC43  Johnson
  Xiao          xiao@gmail     CSCD08  Bretscher
  Jaspreet      jaspreet@utsc  CSCC43  Johnson

• One possible decomposition
  – Students(email, name)
    Courses(name, instructor)
    Taking(studentEmail, courseName)


Gotcha: lossy join decomposition

• Consider a relation with one more tuple

  Student Name  Student Email  Course  Instructor
  Xiao          xiao@gmail     CSCC43  Johnson
  Xiao          xiao@gmail     CSCD08  Bretscher
  Jaspreet      jaspreet@utsc  CSCC43  Johnson
  Mary          mary@utsc      CSCD08  Rosenburg

• Students ⋈ Taking ⋈ Courses has bogus tuples!
  – Mary is not taking Bretscher’s section of D08
  – Xiao is not in Rosenburg’s section of D08

Why did this happen? How to prevent it?
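
The bogus tuples can be reproduced by projecting the four-row relation onto the three smaller schemas and joining them back. This is an illustrative sketch only: the relations are modelled as Python sets of tuples, and the join is written out by hand as nested comprehensions instead of using a database engine.

```python
original = {
    ("Xiao", "xiao@gmail", "CSCC43", "Johnson"),
    ("Xiao", "xiao@gmail", "CSCD08", "Bretscher"),
    ("Jaspreet", "jaspreet@utsc", "CSCC43", "Johnson"),
    ("Mary", "mary@utsc", "CSCD08", "Rosenburg"),
}

# Project onto Students(email, name), Courses(name, instructor),
# and Taking(studentEmail, courseName).
students = {(email, name) for name, email, _, _ in original}
courses = {(course, instructor) for _, _, course, instructor in original}
taking = {(email, course) for _, email, course, _ in original}

# Natural join: Students with Taking on email, then with Courses on course name.
joined = {
    (name, email, course, instructor)
    for email, name in students
    for t_email, course in taking if t_email == email
    for c_name, instructor in courses if c_name == course
}

bogus = joined - original
for row in sorted(bogus):
    print(row)
```

The join yields six tuples instead of four. The two extra ones pair Mary with Bretscher and Xiao with Rosenburg, because the course name alone no longer identifies the section.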

Ensuring lossless joins

• If we decompose R into S and T
• Either (S ∩ T) -> S or (S ∩ T) -> T must be in F+
• In our example:
  – ({email,course} ∩ {course,instructor}) = {course}
  – course -/-> instructor (one-many relationship)
    => course does not determine email either, so neither condition holds and the join is lossy


Projecting FDs

• Once we’ve split a relation we have to refactor our FDs to match
  – Each FD must only mention attributes from one relation
• Similar to geometric projection
  – Many possible projections (depends on how we slice it)
  – Keep only the ones we need (minimal basis)
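
The lossless-join condition for a two-way split, (S ∩ T) -> S or (S ∩ T) -> T, reduces to one closure computation. The sketch below assumes FDs are given as (lhs, rhs) pairs of attribute sets; `closure` and `is_lossless` are names chosen for this illustration. With no FD from course to instructor the split of the enrolment example is lossy; if each course had exactly one instructor, it would be lossless.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(s, t, fds):
    """True if the shared attributes of S and T determine all of S or all of T."""
    common = set(s) & set(t)
    determined = closure(common, fds)
    return set(s) <= determined or set(t) <= determined


S = {"email", "course"}
T = {"course", "instructor"}

print(is_lossless(S, T, []))                               # False
print(is_lossless(S, T, [({"course"}, {"instructor"})]))   # True
```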

FD projection algorithm

• Start with Fi = Ø
• For each subset X of Ri
  – Compute X+
  – For each attribute a in X+
    • If a is in Ri
      – add X -> a to Fi
• Compute the minimal basis of Fi
• Projection is expensive
  – Suppose R1 has n attributes
  – How many subsets of R1 are there?
  – How many times do we consider each attribute?


Making projection more efficient

• Ignore trivial dependencies
  – No need to add X -> A if A is in X itself
• Ignore trivial subsets
  – The empty set or the set of all attributes (both are subsets of Ri)
• Ignore supersets of X if X+ = R
  – They can only give us “weaker” FDs (with more on the LHS)
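
The projection loop, together with the first two shortcuts (skipping trivial FDs and trivial subsets), can be sketched as follows. This is an assumption-laden illustration: attribute names are single letters, `project_fds` is a made-up name, the superset pruning and the final minimal-basis step are left out, and the cost is exponential in the number of attributes of Ri, as the slide warns.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute iterables."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(sub, fds):
    """Non-trivial FDs X -> a with X and a inside `sub`, entailed by `fds`."""
    sub = sorted(set(sub))
    projected = set()
    # Sizes 1 .. len(sub) - 1 skip the empty set and the full attribute set.
    for size in range(1, len(sub)):
        for x in combinations(sub, size):
            for a in closure(x, fds):
                if a in sub and a not in x:  # keep only non-trivial FDs inside Ri
                    projected.add((frozenset(x), a))
    return projected


FDS = [("A", "B"), ("B", "C")]
print(project_fds("AC", FDS))  # {(frozenset({'A'}), 'C')}
print(project_fds("BC", FDS))  # {(frozenset({'B'}), 'C')}
```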

Example: Projecting FD’s

• ABC with FD’s A -> B and B -> C. Project onto AC.
  – A+ = ABC; yields A -> B, A -> C.
    • We do not need to compute AB+ or AC+.
  – B+ = BC; yields B -> C.
  – C+ = C; yields nothing.
  – BC+ = BC; yields nothing.


Example (continued)

• Resulting FD’s: A -> B, A -> C, and B -> C.
• Projection onto AC: A -> C.
  – Only FD that involves a subset of {A, C}.
• Projection onto BC: B -> C
  – Only FD that involves a subset of {B, C}.

Part III: Normal forms


Motivation for normal forms

• Identify a “good” schema
  – For some definition of “good”
  – Avoid anomalies, redundancy, etc.
• Several known normal forms
  – 1st
  – 2nd
  – 3rd
  – Boyce-Codd
  – ... and several more we won’t discuss…

BCNF ⊆ 3NF ⊆ 2NF ⊆ 1NF (focus on 3NF/BCNF)

1st normal form (1NF)

• No multi-valued attributes allowed
  – Imagine storing a list/set of things in an attribute
    => Not really even expressible in RA
• Counterexample
  – Course(name, instructor, [student,email]*)
  – Redundancy in non-list attributes

  Name    Instructor  Student Name  Student Email
  CSCC43  Johnson     Xiao          xiao@gmail
                      Jaspreet      jaspreet@utsc
                      Mary          mary@utsc
  CSCD08  Rosenburg   Jaspreet      jaspreet@utsc


2nd normal form (2NF)

• Non-prime attributes depend on whole candidate keys (not on part of one)
  – Consider non-prime attribute ‘a’
  – Then ∃FD X s.t. X -> a and X is a candidate key
• Counterexample
  – Movies(title, year, star, studioName, studioAddress, salary)
  – FD: title, year -> studioName, studioAddress
    => title, year is only part of a key here: the table needs star as well to identify a row

  Title          Year  Star    StudioName  StudioAddr   Salary
  Star Wars      1977  Hamill  Lucasfilm   1 Lucas Way  $100,000
  Star Wars      1977  Ford    Lucasfilm   1 Lucas Way  $100,000
  Star Wars      1977  Fisher  Lucasfilm   1 Lucas Way  $100,000
  Patriot Games  1992  Ford    Paramount   Cloud 9      $2,000,000
  Last Crusade   1989  Ford    Lucasfilm   1 Lucas Way  $1,000,000

3rd normal form (3NF)

• Non-prime attr. depend only on candidate keys
  – Consider FD X -> a
  – Either a ∈ X OR X is a superkey OR a is prime
    => No transitive dependencies allowed
• Counterexample:
  – studioName -> studioAddr

  Title          Year  StudioName  StudioAddr
  Star Wars      1977  Lucasfilm   1 Lucas Way
  Patriot Games  1992  Paramount   Cloud 9
  Last Crusade   1989  Lucasfilm   1 Lucas Way


3NF, dependencies, and join loss

• Theorem: always possible to convert a schema to join-lossless, dependency-preserving 3NF
• Caveat: still possible to create schemas in 3NF for which these properties do not hold
• Lost dependencies
  – MovieInfo(title, year, studioName)
  – StudioAddress(title, year, studioAddress)
    => Unable to enforce studioName -> studioAddress
• Lossy joins
  – Movies(title, year, star)
  – StarSalary(star, salary)
    => Movies ⋈ StarSalary yields bogus tuples

Boyce-Codd normal form (BCNF)

• One additional restriction over 3NF
  – All non-trivial FDs have a superkey LHS
• Counterexample
  – CanadianAddress(street, city, province, postalCode)
  – Candidate keys: {street, postalCode}, {street, city, province}
  – FD: postalCode -> city, province
  – Satisfies 3NF: city, province both prime (members of the key {street, city, province})
  – Violates BCNF: postalCode is not a superkey
    => Possible anomalies involving postalCode

Do we care? How often do postal codes change?


Limits of decomposition

• Pick two…
  – Lossless join
  – Dependency preservation
  – Anomaly-free
• 3NF
  – Always allows join lossless and dependency preserving
  – May allow some anomalies
• BCNF
  – Always excludes anomalies
  – May give up one of join lossless or dependency preserving

Use domain knowledge to choose 3NF vs. BCNF
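
The 3NF and BCNF conditions can be checked mechanically for the CanadianAddress example. The sketch below rests on assumptions: the FD street, city, province -> postalCode is not written on the slide but is implied by the listed candidate key, the function names are invented here, and only the given FDs are examined, which is the usual shortcut when testing a whole relation against its own FD set.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Enumerate minimal superkeys by trying subsets in order of size."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(s, fds) == set(attrs):
                keys.append(s)
    return keys


def normal_forms(attrs, fds):
    """Return (is_bcnf, is_3nf) for the relation `attrs` under `fds`."""
    prime = set().union(*candidate_keys(attrs, fds))
    bcnf = third = True
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attrs):
            continue  # LHS is a superkey: fine for both forms
        for a in set(rhs) - set(lhs):  # non-trivial part of the FD
            bcnf = False
            if a not in prime:
                third = False
    return bcnf, third


ATTRS = {"street", "city", "province", "postalCode"}
FDS = [
    ({"street", "city", "province"}, {"postalCode"}),  # assumed, implied by the key
    ({"postalCode"}, {"city", "province"}),
]

print(candidate_keys(ATTRS, FDS))
print(normal_forms(ATTRS, FDS))  # (False, True): in 3NF but not in BCNF
```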
