0% found this document useful (0 votes)
41 views32 pages

Lecture03 Normalization

normalisation

Uploaded by

Miranda
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views32 pages

Lecture03 Normalization

normalisation

Uploaded by

Miranda
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 32

CSE 544

Principles of Database
Management Systems

Fall 2016
Lecture 3 –Schema Normalization

CSE 544 - Fall 2016 1


Projects

• We have 27 teams

• Impossible to discuss projects at office hours tomorrow

• Instead, sign up on doodle for a 10’ slot on Monday.

CSE 544 - Fall 2016 2


Database Design

• The relational model is great, but how do I design my


database schema?

CSE 544 - Fall 2016 3


Outline

• Conceptual db design: entity-relationship model

• Problematic database designs

• Functional dependencies

• Normal forms and schema normalization

CSE 544 - Fall 2016 4


Database Design Process

Conceptual SQL
Refinement Files
Modeling Tables

Conceptual Schema

ER diagrams Physical Schema


Relations

CSE 544 - Fall 2016 5


Conceptual Schema Design

name

Conceptual Model: Patient patien_of Doctor

zip name dno

Relational Model:
plus FD’s
(FD = functional dependency)

Normalization:
Eliminates anomalies
CSE 544 - Fall 2016 6
Entity-Relationship Diagram
name
since
dno
pno

Patient patient_of Doctor

zip specialty name

Attributes Entity sets Relationship sets


name Patient patient_of 7
Entity-Relationship Model
• Typically, each entity has a key
Many One
• ER relationships can include multiplicity
– One-to-one, one-to-many, etc.
– Indicated with arrows

• Can model multi-way relationships

• Can model subclasses

• And more...
CSE 544 - Fall 2016 8
Subclasses to Product
Relations Name Price Category

name category Gizmo 99 gadget


Camera 49 photo
price
Toy 39 gadget
Product

Sw.Product Name platforms


isa isa
Gizmo unix

Software Product Educational Product Ed.Product


Age
platforms Age Group Name
Group
Gizmo toddler
Other ways to convert are possible Toy retired
CSE 544 - Fall 2016 9
General approach to Translating
Diagram into Relations
Normally translate as follows:
• Each entity set becomes a relation
• Each relationship set becomes a relation
– Except many-one relationships. Can combine them with entity set.

One bad way to translate our diagram into relations


• PatientOf (pno, name, zip, dno, since)
• Doctor (dno, dname, specialty)

CSE 544 - Fall 2016 10


Outline

• Conceptual db design: entity-relationship model

• Problematic database designs

• Functional dependencies

• Normal forms and schema normalization

CSE 544 - Fall 2016 11


Problematic Designs

• Some db designs lead to redundancy


– Same information stored multiple times

• Problems
– Redundant storage
– Update anomalies
– Insertion anomalies
– Deletion anomalies

CSE 544 - Fall 2016 12


Problem Examples
PatientOf
pno name zip dno since Redundant
1 p1 98125 2 2000 If we update
1 p1 98125 3 2003 to 98119, we
get inconsistency
2 p2 98112 1 2002
3 p1 98143 1 1985

What if we want to insert a patient without a doctor?


What if we want to delete the last doctor for a patient?
Illegal as (pno,dno) is the primary key, cannot have nulls
CSE 544 - Fall 2016 13
Solution: Decomposition
Patient PatientOf
pno name zip pno dno since
1 p1 98125 1 2 2000
2 p2 98112 1 3 2003
3 p1 98143 2 1 2002
3 1 1985

Decomposition solves the problem,


but need to be careful…

CSE 544 - Fall 2016 14


Lossy Decomposition
Patient PatientOf
pno name zip name dno since
1 p1 98125 p1 2 2000
2 p2 98112 p1 3 2003
3 p1 98143 p2 1 2002
p1 1 1985

Decomposition can cause us to lose information!

CSE 544 - Fall 2016 15


Schema Refinement Challenges

• How do we know that we should decompose a relation?


– Functional dependencies
– Normal forms

• How do we make sure decomposition does not lose info?


– Lossless-join decompositions
– Dependency-preserving decompositions

CSE 544 - Fall 2016 16


Outline

• Conceptual db design: entity-relationship model

• Problematic database designs

• Functional dependencies

• Normal forms and schema normalization

CSE 544 - Fall 2016 17


Functional Dependency

• A functional dependency (FD) is an integrity constraint


that generalizes the concept of a key

• An instance of relation R satisfies the FD: X ® Y


– if for every pair of tuples t1 and t2
– if t1.X = t2.X then t1.Y = t2.Y
– where X, Y are two nonempty sets of attributes in R
• We say that X determines Y

• FDs come from domain knowledge

CSE 544 - Fall 2016 18


FD Example

An FD holds, or does not hold on an instance:

EmpID Name Phone Position


E0045 Smith 1234 Clerk
E3542 Mike 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234 Lawyer

EmpID à Name, Phone, Position


Position à Phone
but not Phone à Position
CSE 544 - Fall 2016 19
FD Terminology

• FD’s are constraints


– On some instances they hold
– On others they do not

• If every instance of R will be one in which a given FD will


hold, then we say that R satisfies the FD
– If we say that R satisfies an FD F, we are stating a constraint on R

• FDs come from domain knowledge

CSE 544 - Fall 2016 20


Decomposition Problems

• FDs will help us identify possible redundancy


– Identify redundancy and split relations to avoid it.

• Can we get the data back correctly ?


– Lossless-join decomposition

• Can we recover the FD’s on the ‘big’ table from the FD’s
on the small tables?
– Dependency-preserving decomposition
– So that we can enforce all FDs without performing joins

CSE 544 - Fall 2016 21


Outline

• Conceptual db design: entity-relationship model

• Problematic database designs

• Functional dependencies

• Normal forms and schema normalization

CSE 544 - Fall 2016 22


Normal Forms

• Based on Functional Dependencies


– 2nd Normal Form (obsolete)
– 3rd Normal Form We only discuss
– Boyce Codd Normal Form (BCNF) these two

• Based on Multivalued Dependencies


– 4th Normal Form
• Based on Join Dependencies
– 5th Normal Form

CSE 544 - Fall 2016 23


BCNF
A simple condition for removing anomalies from relations:

A relation R is in BCNF if:


If A1, ..., An ® B is a non-trivial dependency in R ,
then {A1, ..., An} is a superkey for R

BCNF ensures that no redundancy can be detected


using FD information alone

CSE 544 - Fall 2016 24


Our Example
PatientOf
pno name zip dno since
1 p1 98125 2 2000
1 p1 98125 3 2003
2 p2 98112 1 2002
3 p1 98143 1 1985

pno,dno is a key, but pno ® name, zip


BCNF violation so we decompose

CSE 544 - Fall 2016 25


Decomposition in General

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

R1 = projection of R on A1, ..., An, B1, ..., Bm


R2 = projection of R on A1, ..., An, C1, ..., Cp

Theorem If A1, ..., An à B1, ..., Bm


Then the decomposition is lossless
Note: don’t necessarily need A1, ..., An à C1, ..., Cp
CSE 544 - Fall 2016 26
BCNF Decomposition Algorithm

Repeat
choose A1, …, Am ® B1, …, Bn that violates BCNF condition
split R into
R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [rest])
continue with both R1 and R2
Until no more violations

Lossless-join decomposition: Attributes common to R1 and


R2 must contain a key for either R1 or R2

CSE 544 - Fall 2016 27


BCNF and Dependencies

Unit Company Product

FD’s: Unit ® Company; Company, Product ® Unit


So, there is a BCNF violation, and we decompose.

CSE 544 - Fall 2016 28


BCNF and Dependencies

Unit Company Product

FD’s: Unit ® Company; Company, Product ® Unit


So, there is a BCNF violation, and we decompose.
Unit Company
Unit ® Company

Unit Product
No FDs

In BCNF we lose the FD: Company, Product ® Unit


CSE 544 - Fall 2016 29
3NF

A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if :

Whenever there is a nontrivial dep. A1, A2, ..., An ® B for R,


then {A1, A2, ..., An } is a super-key for R,
or B is part of a key.

CSE 544 - Fall 2016 30


3NF Discussion

• 3NF decomposition v.s. BCNF decomposition:


– Complex: see book

• Tradeoffs
– BCNF = no anomalies, but may lose some FDs
– 3NF = keeps all FDs, but may have some anomalies

CSE 544 - Fall 2016 31


Summary

• Database design is not trivial


– Use ER models
– Translate ER models into relations
– Normalize to eliminate anomalies

• Normalization tradeoffs
– BCNF: no anomalies, but may lose some FDs
– 3NF: keeps all FDs, but may have anomalies
– Too many small tables affect performance

CSE 544 - Fall 2016 32

You might also like