Week 3 Lecture
Introduction to Databases
Databases
Functional Dependencies and Normalization
Chapter 19
Review: Database Design Process
• Requirements Analysis: Interact with users and domain experts to specify what the database should do.
• Conceptual Design: Translate the requirements into a high-level conceptual description (often using the E/R Model).
• Logical Design: Convert the E/R Model into the DBMS logical data model (in this class, the relational model with schemas and constraints).
• Example: CREATE TABLE T (cin INTEGER, pname VARCHAR(50), address FLOAT, PRIMARY KEY (cin));
• Schema Refinement: Refine the logical model for consistency (but keeping in mind performance!).
• Physical Design: Develop the physical schema: how are schemas stored physically on disk? How are they partitioned in a distributed setting? Which indexes should be built?
• Security Design: Set up measures to secure and protect the database.
• We are here: Schema Refinement (refining the logical model for consistency, keeping performance in mind).
The Evils of Redundancy
• Redundancy means that some values in the database are replicated.
• Redundancy is at the root of several problems associated with relational schemas: redundant storage and insert/delete/update anomalies.
• Integrity constraints, in particular functional dependencies, can be used
to identify schemas with such problems and to suggest refinements.
• Main refinement technique: decomposition (replacing ABCD with, say,
AB and BCD, or ACD and ABD).
• Decomposition should be used judiciously:
• Is there reason to decompose a relation?
• What problems (if any) does the decomposition cause?
Functional Dependencies (FDs)
• A functional dependency X → Y holds over relation R if, for every allowable instance r of R:
• ∀ t1 ∈ r, ∀ t2 ∈ r: π_X(t1) = π_X(t2) implies π_Y(t1) = π_Y(t2)
• i.e., given two tuples in r, if the X values agree, then the Y values must also
agree. (X and Y are sets of attributes.)
• An FD is a statement about all allowable relations.
• Must be identified based on semantics of application.
• Given some allowable instance r1 of R, we can check if it violates some FD f, but
we cannot tell if f holds over R!
• If K is a candidate key for R, then K → R.
• However, K → R alone does not require K to be minimal! It only says that K is a superkey.
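The point that an instance can refute an FD but never prove it can be illustrated with a small check. This is a minimal sketch, assuming tuples are represented as Python dicts; the function name `violates_fd` and the three-tuple instance are hypothetical, chosen only for illustration.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if the instance `rows` violates the FD lhs -> rhs."""
    seen = {}  # maps each X-value to the first Y-value seen with it
    for row in rows:
        x_val = tuple(row[a] for a in lhs)
        y_val = tuple(row[a] for a in rhs)
        if seen.setdefault(x_val, y_val) != y_val:
            return True  # two tuples agree on X but differ on Y
    return False


# Hypothetical instance with rating (R), wage (W) and hours (H).
rows = [
    {"R": 8, "W": 10, "H": 40},
    {"R": 8, "W": 10, "H": 30},
    {"R": 5, "W": 7, "H": 30},
]

print(violates_fd(rows, ["R"], ["W"]))  # False: this instance does not violate R -> W
print(violates_fd(rows, ["R"], ["H"]))  # True: rating 8 appears with H = 40 and H = 30
```

A `False` result only means that this particular instance is consistent with the FD; whether the FD holds over R still has to come from the semantics of the application.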
FDs vs Keys
• FDs are a generalization of keys
• A superkey is a set of attributes that determines all the other attributes in a table: KEY → {all table attributes}.
• A candidate key is a special case of a superkey: a minimal set of attributes that determines all the other attributes.
• A primary key is one particular key chosen from the set of candidate keys.
• Note that the notion of keys here is different from index/sort keys
Example: Constraints on Entity Set
• Consider relation obtained from Hourly_Emps:
• Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
• Notation: We will denote this relation schema by listing the
attributes: SNLRWH
• This is really the set of attributes {S,N,L,R,W,H}.
• Sometimes, we will refer to all attributes of a relation by using the relation
name. (e.g., Hourly_Emps for SNLRWH)
• Some FDs on Hourly_Emps:
• ssn is the primary key: S → SNLRWH
• rating determines hrly_wages: R → W
• lot determines lot: L → L (a trivial FD, which always holds)
Example (Contd.)
• Problems due to R → W:
• Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
• Insertion anomaly: What if we want to insert an employee and don't know the hourly wage for their rating?
• Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!

Hourly_Emps
S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40
How do we fix this?
• Decomposing relations:
• Dividing a relation into smaller relations (with overlap!)
• FDs guide this process
• Example:
• R → W is problematic, so we decompose SNLRWH into SNLRH and RW

Hourly_Emps2
S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40

Wages
R  W
8  10
5  7
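The decomposition above can be reproduced in a few lines. This is only a sketch, assuming relations are modelled as Python sets of tuples; the variable names are illustrative. It projects Hourly_Emps onto SNLRH and RW and then joins the two pieces back on R.

```python
# The Hourly_Emps instance from the slide, as (S, N, L, R, W, H) tuples.
hourly_emps = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]

# Projections onto SNLRH and RW; using sets drops duplicate tuples.
hourly_emps2 = {(s, n, lot, r, h) for (s, n, lot, r, w, h) in hourly_emps}
wages = {(r, w) for (s, n, lot, r, w, h) in hourly_emps}

# Natural join of the two projections on the shared attribute R.
rejoined = {
    (s, n, lot, r, w, h)
    for (s, n, lot, r, h) in hourly_emps2
    for (r2, w) in wages
    if r == r2
}

print(sorted(wages))                 # [(5, 7), (8, 10)]
print(rejoined == set(hourly_emps))  # True
```

Each rating's wage is now stored once in Wages, so changing it touches a single tuple, and the join still returns exactly the original instance.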
Reasoning About FDs
• Given some FDs, we can usually infer additional FDs:
• ssn → did, did → lot implies ssn → lot
• An FD f is implied by a set of FDs F if f holds whenever all FDs in F
hold.
• F+ = closure of F: the set of all FDs that are implied by F.
• Armstrong's Axioms (X, Y, Z are sets of attributes):
• Reflexivity: If X ⊆ Y, then Y → X
• Augmentation: If X → Y, then XZ → YZ for any Z
• Transitivity: If X → Y and Y → Z, then X → Z
• These are sound and complete inference rules for FDs!
• Using Armstrong's Axioms (AA), we can derive only FDs in F+ (soundness), and we can derive all of them (completeness).
Reasoning About FDs (Contd.)
• A couple of additional rules (that follow from AA):
• Union: If X → Y and X → Z, then X → YZ
• Decomposition: If X → YZ, then X → Y and X → Z
• Example: Contracts(cid,sid,jid,did,pid,qty,value), and:
• C is the key: C → CSJDPQV
• A job (J) purchases each part (P) using a single contract (C): JP → C
• A dept (D) purchases at most one part (P) from a supplier (S): SD → P
• Problem: Prove that SDJ is a key for Contracts
• JP → C, C → CSJDPQV imply JP → CSJDPQV (by transitivity, so JP is a key)
• SD → P implies SDJ → JP (by augmentation)
• SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV (by transitivity, so SDJ is a key)
Reasoning About FDs (Contd.)
• How do we check whether X → Y is in F+?
• Compute the attribute closure X+ with respect to F:
• X+ := X
• Repeat until no change (fixpoint):
    For each U → V ∈ F:
        If U ⊆ X+, then add V to X+
• Check whether Y ⊆ X+
• The approach can also be used to check for keys of a relation:
• If X+ = R, then X is a superkey for R (R is the set of all attributes)
• How do we check that X is a candidate key, i.e., a minimal key?
• For each attribute A in X, check whether (X - A)+ = R; X is minimal only if this never holds
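The fixpoint computation and the two key checks above can be sketched as follows. The representation is an assumption made for brevity: attribute sets are strings of single-letter names and each FD is a `(lhs, rhs)` pair of such strings. The function names are illustrative. The Contracts FDs from the earlier example serve as test data.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until no change (fixpoint)
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """X is a superkey if X+ contains every attribute of the relation."""
    return closure(attrs, fds) == set(relation)


def is_candidate_key(attrs, relation, fds):
    """X is a candidate key if it is a superkey and no (X - A) is a superkey."""
    if not is_superkey(attrs, relation, fds):
        return False
    return all(not is_superkey(set(attrs) - {a}, relation, fds) for a in attrs)


CONTRACTS = "CSJDPQV"
F = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]

print(sorted(closure("SD", F)))                # ['D', 'P', 'S']
print(is_superkey("SDJ", CONTRACTS, F))        # True
print(is_candidate_key("SDJ", CONTRACTS, F))   # True
print(is_candidate_key("SDJP", CONTRACTS, F))  # False: SDJ alone already suffices
```

Dropping one attribute at a time is enough for the minimality test, because any subset of a non-superkey is also a non-superkey.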
Example
• R = {A, B, C, D, E}
• F = {B → CD, D → E, B → A, E → C, AD → B}
• Is B → E in F+?
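One way to answer the question is to compute B+ with the attribute-closure procedure. The trace below is a worked sketch; the order in which the FDs are applied is one possible choice, and any order reaches the same fixpoint.

```latex
\begin{align*}
B^+ &:= \{B\} \\
B \to CD &\;\Rightarrow\; B^+ = \{B, C, D\} \\
D \to E  &\;\Rightarrow\; B^+ = \{B, C, D, E\} \\
B \to A  &\;\Rightarrow\; B^+ = \{A, B, C, D, E\}
\end{align*}
```

Since E ∈ B+, the FD B → E is in F+. In addition, B+ = R, so B is a superkey (and, being a single attribute, a candidate key) for R.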
Normal Forms
• Returning to the issue of schema refinement, the first question to
ask is whether any refinement is needed!
• If a relation is in a certain normal form (BCNF, 3NF etc.), it is
known that certain kinds of problems are avoided/minimized. This
can be used to help us decide whether decomposing the relation
will help.
• Role of FDs in detecting redundancy:
• Consider a relation R with 3 attributes, ABC.
• No FDs hold: There is no redundancy here.
• Given A → B: Several tuples could have the same A value, and if so, they’ll all
have the same B value!
Boyce-Codd Normal Form (BCNF)
• BCNF is a stricter level of normalization that refines third normal form (3NF).
• A relation R with FDs F is in BCNF if, for every non-trivial FD X → A in F+:
• X contains a key for R.
• In simpler terms, BCNF ensures that there are no non-trivial functional dependencies where the determinant (X) is not a superkey.
Third Normal Form (3NF)
• Reln R with FDs F is in 3NF if, for all X → A in F+:
• A ∈ X (called a trivial FD; e.g., AB → A is trivial), or
• X contains a key for R, or
• A is part of some key for R.
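The two definitions can be compared side by side with a small checker. This is a sketch under stated assumptions: attributes are single letters, FDs are `(lhs, rhs)` string pairs, and candidate keys are found by brute force, which is only practical for small schemas. It tests only the FDs given in F rather than all of F+; that shortcut is a standard result for a whole relation, and it should not be relied on for projected sub-relations. All function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute-force search for all minimal superkeys, smallest first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            is_super = closure(attrs, fds) == set(relation)
            if is_super and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


def is_bcnf(relation, fds):
    """Every given FD is trivial or has a superkey on its left-hand side."""
    return all(
        set(rhs) <= set(lhs) or closure(lhs, fds) == set(relation)
        for lhs, rhs in fds
    )


def is_3nf(relation, fds):
    """As BCNF, but a right-hand attribute that is part of some key is allowed."""
    prime = set().union(*candidate_keys(relation, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(relation):
            continue  # left-hand side contains a key
        if not (set(rhs) - set(lhs)) <= prime:
            return False
    return True


hourly = [("S", "SNLRWH"), ("R", "W")]
csz = [("CS", "Z"), ("Z", "C")]

print(is_bcnf("SNLRWH", hourly), is_3nf("SNLRWH", hourly))  # False False
print(is_bcnf("CSZ", csz), is_3nf("CSZ", csz))              # False True
```

Hourly_Emps fails both tests because of R → W, while CSZ with CS → Z and Z → C is in 3NF but not in BCNF, since C is part of the key CS.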
What Does 3NF Achieve?
Decomposition of a Relation Scheme
Example Decomposition
Problems with Decompositions
Lossless Join Decompositions
• Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
• π_X(r) ⋈ π_Y(r) = r
• It is always true that r ⊆ π_X(r) ⋈ π_Y(r)
• In general, the other direction does not hold! If it does, the decomposition
is lossless-join.
• Definition extended to decomposition into 3 or more relations in a
straightforward way.
• It is essential that all decompositions used to deal with redundancy
be lossless! (Avoids Problem (2).)
More on Lossless Join
• The decomposition of R into X and Y is lossless-join wrt F if and only if the closure of F contains:
• X ∩ Y → X, or
• X ∩ Y → Y
• In particular, the decomposition of R into UV and R - V is lossless-join if U → V holds over R.

Example instance:
A  B  C
1  2  3
4  5  6
7  2  8

Projection onto AB:
A  B
1  2
4  5
7  2

Projection onto BC:
B  C
2  3
5  6
2  8

Join of the two projections (the last two tuples are not in the original instance):
A  B  C
1  2  3
4  5  6
7  2  8
1  2  8
7  2  3
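The lossy example in the tables above can be reproduced directly. This is a minimal sketch, assuming rows are Python dicts; the helper names `project` and `natural_join` are hypothetical, not part of any library.

```python
def project(rows, attrs):
    """Projection onto `attrs`; duplicate tuples are removed."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(distinct)]


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    joined = []
    for lrow in left:
        for rrow in right:
            shared = lrow.keys() & rrow.keys()
            if all(lrow[a] == rrow[a] for a in shared):
                joined.append({**lrow, **rrow})
    return joined


r = [dict(zip("ABC", values)) for values in [(1, 2, 3), (4, 5, 6), (7, 2, 8)]]

ab = project(r, "AB")
bc = project(r, "BC")
rejoined = natural_join(ab, bc)

as_tuples = sorted(tuple(row[a] for a in "ABC") for row in rejoined)
print(as_tuples)
# [(1, 2, 3), (1, 2, 8), (4, 5, 6), (7, 2, 3), (7, 2, 8)]
```

The join contains r plus two spurious tuples. That is consistent with the test above: the shared attribute is B, and in this instance B determines neither A nor C, so neither B → AB nor B → BC can hold.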
Dependency Preserving Decomposition
Dependency Preserving Decompositions (Contd.)
Decomposition into BCNF
BCNF and Dependency Preservation
• In general, there may not be a dependency preserving
decomposition into BCNF.
• e.g., CSZ, CS → Z, Z → C
• Can’t decompose while preserving 1st FD; not in BCNF.
• Similarly, decomposition of CSJDPQV into SDP, JS and CJDQV is not dependency preserving (w.r.t. the FDs JP → C, SD → P and J → S).
• However, it is a lossless join decomposition.
• In this case, adding JPC to the collection of relations gives us a
dependency preserving decomposition.
• JPC tuples stored only for checking FD! (Redundancy!)
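The claim about the Contracts decomposition can be checked mechanically. The sketch below uses a standard textbook test for whether one FD is preserved: grow a set Z from the left-hand side using only what each sub-schema can enforce locally. It is not taken from these slides. The names are illustrative, attributes are single letters, and the key FD C → CSJDPQV from the earlier Contracts example is assumed to be part of F.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, decomposition, fds):
    """Check whether lhs -> rhs follows from the FDs projected onto the sub-schemas."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # Attributes of this sub-schema determined by the part of Z inside it.
            gained = closure(z & set(schema), fds) & set(schema)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


F = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P"), ("J", "S")]
decomp = ["SDP", "JS", "CJDQV"]

print(is_preserved("JP", "C", decomp, F))            # False
print(is_preserved("SD", "P", decomp, F))            # True
print(is_preserved("JP", "C", decomp + ["JPC"], F))  # True
```

JP → C is lost because no sub-schema contains both J and P together with C; adding JPC restores it, at the cost of the redundancy noted above.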
Decomposition into 3NF
• Obviously, the algorithm for lossless join decomp into BCNF can
be used to obtain a lossless join decomp into 3NF (typically, can
stop earlier).
• To ensure dependency preservation, one idea:
• If X → Y is not preserved, add relation XY.
• The problem is that XY may violate 3NF! E.g., consider the addition of CJP to 'preserve' JP → C. What if we also have J → C?
• Refinement: Instead of the given set of FDs F, use a minimal
cover for F.
Minimal Cover for a Set of FDs
• Minimal cover G for a set of FDs F:
• Closure of F = closure of G.
• Right hand side of each FD in G is a single attribute.
• If we modify G by deleting an FD or by deleting attributes from an FD in G, the
closure changes.
• Intuitively, every FD in G is needed, and is "as small as possible" in order to get the same closure as F.
• E.g., A → B, ABCD → E, EF → GH, ACDF → EG has the following minimal cover:
• A → B, ACD → E, EF → G and EF → H
• A minimal cover can be used to obtain a lossless-join, dependency-preserving decomposition (see the textbook).
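The three conditions suggest a three-step procedure: split right-hand sides, shrink left-hand sides, then drop redundant FDs. The sketch below follows that outline under the assumption of single-letter attributes and FDs given as `(lhs, rhs)` string pairs; the function names are illustrative. A set of FDs can have several minimal covers, and this code returns just one of them.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: put each FD in the form X -> A with a single attribute on the right.
    g = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]

    # Step 2: remove left-hand attributes that are not needed to derive A.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and rhs <= closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, rhs)

    # Step 3: drop duplicates, then FDs that are implied by the remaining ones.
    g = list(dict.fromkeys(g))
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    return g


F = [("A", "B"), ("ABCD", "E"), ("EF", "GH"), ("ACDF", "EG")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# ACD -> E
# EF -> G
# EF -> H
```

On the example above this yields the cover listed on the slide: ABCD → E shrinks to ACD → E because A → B, and ACDF → G is dropped because it follows from ACD → E and EF → G.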
Refining an ER Diagram
• 1st diagram translated:
  Workers(S,N,L,D,S)
  Departments(D,M,B)
• Lots associated with workers.
• Suppose all workers in a dept are assigned the same lot: D → L
• Redundancy; fixed by:
  Workers2(S,N,D,S)
  Dept_Lots(D,L)
• Can fine-tune this:
  Workers2(S,N,D,S)
  Departments(D,M,B,L)
[Figure, before: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
[Figure, after: Employees (ssn, name), Works_In (since), Departments (did, dname, budget, lot)]
Summary of Schema Refinement
• If a relation is in BCNF, it is free of redundancies that can be
detected using FDs. Thus, trying to ensure that all relations are in
BCNF is a good heuristic.
• If a relation is not in BCNF, we can try to decompose it into a
collection of BCNF relations.
• Must consider whether all FDs are preserved. If a lossless-join,
dependency preserving decomposition into BCNF is not possible (or
unsuitable, given typical queries), should consider decomposition into
3NF.
• Decompositions should be carried out and/or re-examined while keeping
performance requirements in mind.