CP 224 LECT - 2 - Relational Database Design
Combine Schemas?
❖ Suppose we combine instructor and department into inst_dept
Update anomalies:
Modifying the budget in one tuple but not all tuples leads
to inconsistency.
A Lossy Decomposition
Example of Lossless-Join Decomposition
Functional Dependencies
Goals of Normalization
Functional-Dependency Theory
❖ R = (A, B, C, G, H, I)
F={ A→B
A→C
CG → H
CG → I
B → H}
❖ Some members of F⁺
• A→H
✓ by transitivity from A → B and B → H
• AG → I
✓ by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
• CG → HI
✓ by augmenting CG → I with CG to infer CG → CGI,
and augmenting CG → H with I to infer CGI → HI,
and then transitivity
Procedure for Computing F⁺
❖ Additional rules:
• If α → β holds and α → γ holds, then α → β γ holds (union)
• If α → β γ holds, then α → β holds and α → γ holds
(decomposition)
• If α → β holds and γ β → δ holds, then α γ → δ holds
(pseudotransitivity)
The above rules can be inferred from Armstrong’s axioms.
❖ Is AG a candidate key?
1. Is AG a superkey?
   1. Does AG → R? == Is (AG)⁺ ⊇ R
2. Is any subset of AG a superkey?
   1. Does A → R? == Is (A)⁺ ⊇ R
   2. Does G → R? == Is (G)⁺ ⊇ R
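Each check above reduces to an attribute-closure computation. A minimal Python sketch, assuming FDs are stored as (left side, right side) pairs of attribute sets and using R and F from the Functional-Dependency Theory slide; the helper name closure is illustrative and not part of the slides:

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds by adding right sides until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD fires when its whole left side is already in the result.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = set("ABCGHI")
F = [(set("A"), set("B")), (set("A"), set("C")),
     (set("CG"), set("H")), (set("CG"), set("I")), (set("B"), set("H"))]

ag_is_superkey = closure("AG", F) >= R
a_is_superkey = closure("A", F) >= R
g_is_superkey = closure("G", F) >= R
# AG is a candidate key if it is a superkey and no proper subset is one.
ag_is_candidate_key = ag_is_superkey and not (a_is_superkey or g_is_superkey)
```

With this F, the closure of A stops at {A, B, C, H} and the closure of G is just {G}, so neither subset is a superkey on its own.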
Uses of Attribute Closure
Extraneous Attributes
Canonical Cover
❖ A canonical cover for F is a set of dependencies Fc such that
• F logically implies all dependencies in Fc, and
• Fc logically implies all dependencies in F, and
• No functional dependency in Fc contains an extraneous attribute, and
• Each left side of a functional dependency in Fc is unique.
❖ Note: Union rule may become applicable after some extraneous attributes
have been deleted, so it has to be re-applied
Computing a Canonical Cover
❖ R = (A, B, C)
F = {A → BC
B → C
A → B
AB → C}
❖ Combine A → BC and A → B into A → BC
• Set is now {A → BC, B → C, AB → C}
❖ A is extraneous in AB → C
• Check if the result of deleting A from AB → C is implied by the other
dependencies
✓ Yes: in fact, B → C is already present!
• Set is now {A → BC, B → C}
❖ C is extraneous in A → BC
• Check if A → C is logically implied by A → B and the other dependencies
✓ Yes: using transitivity on A → B and B → C.
– Can use attribute closure of A in more complex cases
❖ The canonical cover is: {A → B, B → C}
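The steps of this example can be sketched in Python. This is one possible arrangement, assuming FDs are (left side, right side) pairs; the function names are illustrative. It may remove extraneous attributes in a different order than the slide does, but it reaches the same cover for this F:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def merge_same_lhs(fds):
    """Union rule: combine all FDs that share a left side."""
    merged = {}
    for lhs, rhs in fds:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return list(merged.items())

def remove_one_extraneous(fds):
    """Drop a single extraneous attribute if one exists; report whether anything changed."""
    for i, (lhs, rhs) in enumerate(fds):
        if len(lhs) > 1:
            for a in sorted(lhs):
                # a is extraneous on the left if (lhs - a) already determines rhs under F.
                if rhs <= closure(lhs - {a}, fds):
                    return fds[:i] + [(lhs - {a}, rhs)] + fds[i + 1:], True
        for a in sorted(rhs):
            # a is extraneous on the right if lhs still determines a once a is dropped.
            reduced = fds[:i] + [(lhs, rhs - {a})] + fds[i + 1:]
            if a in closure(lhs, reduced):
                return [(l, r) for l, r in reduced if r], True
    return fds, False

def canonical_cover(fds):
    fds = [(frozenset(l), frozenset(r)) for l, r in fds]
    changed = True
    while changed:
        fds = merge_same_lhs(fds)   # re-apply the union rule after every deletion
        fds, changed = remove_one_extraneous(fds)
    return set(fds)

cover = canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
```

Re-running merge_same_lhs inside the loop reflects the earlier note that the union rule may become applicable again after an extraneous attribute has been deleted.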
Lossless-join Decomposition
❖ For the case of R = (R1, R2), we require that for all possible relations r on schema R
r = ∏R1(r) ⋈ ∏R2(r)
❖ A decomposition of R into R1 and R2 is lossless join if at least one of the following
dependencies is in F⁺:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
❖ The above functional dependencies are a sufficient condition for lossless join
decomposition; the dependencies are a necessary condition only if all constraints are
functional dependencies
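The test above can be sketched with an attribute closure: R1 ∩ R2 → R1 is in F⁺ exactly when the closure of R1 ∩ R2 contains R1. A minimal Python sketch, assuming FDs are (left side, right side) pairs of attribute sets; the schema and the function name is_lossless are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """True if the common attributes functionally determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    plus = closure(common, fds)
    return plus >= set(r1) or plus >= set(r2)

# Illustrative schema R = (A, B, C) with F = {A -> B, B -> C}.
F = [(set("A"), set("B")), (set("B"), set("C"))]
split_on_b = is_lossless("AB", "BC", F)   # common attribute B determines (B, C)
split_on_c = is_lossless("AC", "BC", F)   # common attribute C determines nothing else
```

Because the condition is sufficient in general, a False result only proves the decomposition lossy when all constraints are functional dependencies.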
Example
Dependency Preservation
❖ Let Fi be the set of dependencies in F⁺ that include only attributes in Ri.
✓ A decomposition is dependency preserving if
(F1 ∪ F2 ∪ … ∪ Fn)⁺ = F⁺
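Computing F⁺ directly is expensive, so the definition is usually checked one dependency at a time. The sketch below uses the standard textbook test, which is not shown on these slides: for each α → β in F, grow a result set from α using only what each Ri can see, then check that β was reached. The function name preserves is illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fds, decomposition):
    """True if every FD in fds can be enforced using the decomposed schemas alone."""
    parts = [set(p) for p in decomposition]
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in parts:
                # Attributes of Ri derivable from the part of result that lies in Ri.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True

F = [(set("A"), set("B")), (set("B"), set("C"))]
keeps_all = preserves(F, ["AB", "BC"])
loses_b_to_c = preserves(F, ["AB", "AC"])
```

In the second decomposition no single schema contains both B and C, so B → C cannot be checked without a join.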
Example
❖ R = (A, B, C )
F = {A → B
B → C}
Key = {A}
❖ R is not in BCNF
❖ Decomposition R1 = (A, B), R2 = (B, C)
• R1 and R2 in BCNF
• Lossless-join decomposition
• Dependency preserving
Testing for BCNF
❖ To check if a non-trivial dependency α →β causes a violation of BCNF
1. compute α⁺ (the attribute closure of α), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
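The two-step test above translates directly into code. A minimal Python sketch, assuming FDs are (left side, right side) pairs of attribute sets and using the schema R = (A, B, C) with F = {A → B, B → C} as input; the function name violates_bcnf is illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_bcnf(lhs, rhs, schema, fds):
    """True if the non-trivial FD lhs -> rhs has a left side that is not a superkey."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return False  # trivial dependencies never violate BCNF
    return not closure(lhs, fds) >= set(schema)

F = [(set("A"), set("B")), (set("B"), set("C"))]
a_to_b = violates_bcnf("A", "B", "ABC", F)
b_to_c = violates_bcnf("B", "C", "ABC", F)
```

Here A → B is fine because the closure of A is all of R, while B → C is a violation because the closure of B is only {B, C}.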
❖ R = (A, B, C )
F = {A → B
B → C}
Key = {A}
❖ Functional dependencies:
• course_id → title, dept_name, credits
• building, room_number → capacity
• course_id, sec_id, semester, year → building, room_number,
time_slot_id
❖ BCNF Decomposition:
❖ course is in BCNF
• How do we know this?
❖ building, room_number → capacity holds on class-1
• but {building, room_number} is not a superkey for class-1.
• We replace class-1 by:
✓ classroom (building, room_number, capacity)
✓ section (course_id, sec_id, semester, year, building,
room_number, time_slot_id)
❖ classroom and section are in BCNF.
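The decomposition steps above can be sketched as a loop that keeps splitting any schema on which some dependency violates BCNF. Two assumptions are made here: the original class schema consists of exactly the attributes that appear in the three dependencies, and only the dependencies listed in F are tried as violators (the general algorithm must consider F⁺ restricted to each schema). Function names are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(schema, fds):
    """Return (lhs, determined attributes) for an FD violating BCNF on schema, else None."""
    for lhs, _ in fds:
        if lhs <= schema:
            plus = closure(lhs, fds)
            gained = (plus & schema) - lhs
            if gained and not schema <= plus:
                return lhs, gained
    return None

def bcnf_decompose(schema, fds):
    todo, done = [frozenset(schema)], set()
    while todo:
        current = todo.pop()
        violation = find_violation(current, fds)
        if violation is None:
            done.add(current)
        else:
            lhs, gained = violation
            # Split into (lhs plus what it determines) and (the rest, keeping lhs).
            todo += [lhs | gained, current - gained]
    return done

fds = [
    (frozenset({"course_id"}), frozenset({"title", "dept_name", "credits"})),
    (frozenset({"building", "room_number"}), frozenset({"capacity"})),
    (frozenset({"course_id", "sec_id", "semester", "year"}),
     frozenset({"building", "room_number", "time_slot_id"})),
]
# Assumption: class contains exactly the attributes mentioned in the FDs.
class_schema = frozenset().union(*(lhs | rhs for lhs, rhs in fds))
schemas = bcnf_decompose(class_schema, fds)
```

For this input the loop stops with three schemas that correspond to course, classroom, and section.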
❖ Relation dept_advisor:
• dept_advisor (s_ID, i_ID, dept_name)
F = {s_ID, dept_name → i_ID, i_ID → dept_name}
• Two candidate keys: {s_ID, dept_name} and {i_ID, s_ID}
• R is in 3NF
✓ s_ID, dept_name → i_ID
– {s_ID, dept_name} is a superkey
✓ i_ID → dept_name
– dept_name is contained in a candidate key
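The reasoning for dept_advisor can be checked mechanically. A minimal Python sketch, assuming a brute-force search for candidate keys (fine for a three-attribute schema, exponential in general) and checking only the dependencies in F, which is sufficient for a 3NF test; function names are illustrative:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the schema."""
    schema = frozenset(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = frozenset(combo)
            if any(k <= attrs for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys

def in_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial
        if closure(lhs, fds) >= set(schema):
            continue  # left side is a superkey
        if (rhs - lhs) <= prime:
            continue  # every new attribute lies in some candidate key
        return False
    return True

R = {"s_ID", "i_ID", "dept_name"}
F = [(frozenset({"s_ID", "dept_name"}), frozenset({"i_ID"})),
     (frozenset({"i_ID"}), frozenset({"dept_name"}))]
keys = candidate_keys(R, F)
ok = in_3nf(R, F)
```

The dependency i_ID → dept_name passes only through the third alternative, which is why dept_advisor is in 3NF but not in BCNF.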
Redundancy in 3NF
❖ Relation schema:
cust_banker_branch = (customer_id, employee_id, branch_name, type )
❖ The functional dependencies for this relation schema are:
1. customer_id, employee_id → branch_name, type
2. employee_id → branch_name
3. customer_id, branch_name → employee_id
❖ We first compute a canonical cover
1. branch_name is extraneous in the r.h.s. of the 1st dependency
2. No other attribute is extraneous, so we get Fc =
customer_id, employee_id → type
employee_id → branch_name
customer_id, branch_name → employee_id
• Result will not depend on the order in which FDs are considered
Design Goals
❖ Goal for a relational database design is:
• BCNF.
• Lossless join.
• Dependency preservation.
❖ If we cannot achieve this, we accept one of
• Lack of dependency preservation
• Redundancy due to use of 3NF
❖ Interestingly, SQL does not provide a direct way of specifying functional
dependencies other than superkeys.
FDs can be specified using assertions, but they are expensive to test (and
currently not supported by any of the widely used databases).
• inst_child(ID, child_name)
• inst_phone(ID, phone_number)
MVD (Cont.)
❖ Tabular representation of α →→ β
Example
then
< y1, z1, w2 > ∈ r and < y1, z2, w1 > ∈ r
❖ Note that since the behavior of Z and W is identical, it follows that
Y →→ Z if Y →→ W
Example (Cont.)
❖ In our example:
ID →→ child_name
ID →→ phone_number
❖ The above formal definition is supposed to formalize the notion that given
a particular value of Y (ID) it has associated with it a set of values of Z
(child_name) and a set of values of W (phone_number), and these two sets
are in some sense independent of each other.
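The independence described above can be tested on a concrete relation instance: whenever two tuples agree on Y, the tuple that takes Z from one and W from the other must also be present. A minimal Python sketch, assuming a non-empty list of rows stored as dicts; the rows are made-up illustrative data and the function name satisfies_mvd is hypothetical:

```python
def satisfies_mvd(rows, y, z):
    """Check Y ->> Z on a non-empty list of dict rows; W is every remaining attribute."""
    w = sorted(set(rows[0]) - set(y) - set(z))

    def project(t, cols):
        return tuple(t[c] for c in cols)

    present = {(project(t, y), project(t, z), project(t, w)) for t in rows}
    for y1, z1, _w1 in present:
        for y2, _z2, w2 in present:
            # Same Y value: the swapped combination must exist too.
            if y1 == y2 and (y1, z1, w2) not in present:
                return False
    return True

# Made-up data: one instructor with two children and two phone numbers.
full = [
    {"ID": "99999", "child_name": c, "phone_number": p}
    for c in ("David", "William")
    for p in ("512-555-1234", "512-555-4321")
]
partial = full[:3]  # one combination missing

holds = satisfies_mvd(full, ["ID"], ["child_name"])
broken = satisfies_mvd(partial, ["ID"], ["child_name"])
```

Looping over ordered pairs covers both tuples required by the definition, since each pair is visited in both orders.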
❖ Note:
• If Y → Z then Y →→ Z
• Indeed we have (in above notation) z1 = z2. The claim follows.
Use of Multivalued Dependencies
e) R5 = (A, I) (R5 is in 4NF)
f) R6 = (A, C, G) (R6 is in 4NF)
❖ Temporal data have an associated time interval during which the data
are valid.
❖ A snapshot is the value of the data at a particular point in time
❖ Several proposals to extend ER model by adding valid time to
• attributes, e.g., address of an instructor at different points in time
• entities, e.g., time duration when a student entity exists
• relationships, e.g., time during which an instructor was associated
with a student as an advisor.
❖ But no accepted standard
❖ Adding a temporal component means that functional dependencies like
ID → street, city
no longer hold, because the address varies over time
❖ A temporal functional dependency X → Y holds on schema R if the
functional dependency X → Y holds on all snapshots for all legal
instances r (R).
Modeling Temporal Data (Cont.)
❖ In practice, database designers may add start and end time attributes to
relations
• E.g., course(course_id, course_title) is replaced by
course(course_id, course_title, start, end)
✓ Constraint: no two tuples can have overlapping valid times
– Hard to enforce efficiently
❖ Foreign key references may be to current version of data, or to data at a
point in time
• E.g., student transcript should refer to course information at the
time the course was taken
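The constraint that no two tuples have overlapping valid times can be checked in application code when the database cannot enforce it. A minimal Python sketch, assuming half-open intervals [start, end) with comparable start and end values and rows shaped like course(course_id, course_title, start, end); the sample rows are made up and the function name has_overlap is hypothetical:

```python
from collections import defaultdict

def has_overlap(rows):
    """True if two rows with the same course_id have overlapping [start, end) intervals."""
    by_course = defaultdict(list)
    for course_id, _title, start, end in rows:
        by_course[course_id].append((start, end))
    for intervals in by_course.values():
        intervals.sort()
        # After sorting by start, any overlap shows up between neighbours.
        for (_s1, e1), (s2, _e2) in zip(intervals, intervals[1:]):
            if s2 < e1:
                return True
    return False

# Made-up data: years used as start and end values.
valid = [
    ("CS-101", "Intro to Computing", 2005, 2010),
    ("CS-101", "Intro to Computer Science", 2010, 2020),
    ("BIO-301", "Genetics", 2008, 2015),
]
invalid = valid + [("CS-101", "Computing Basics", 2009, 2011)]

valid_ok = not has_overlap(valid)
invalid_caught = has_overlap(invalid)
```

Sorting keeps the check at O(n log n) per course, but it still has to be re-run on every insert or update, which illustrates why the constraint is hard to enforce efficiently.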
End of Chapter
❖ Case 1: If B in β:
❖ Case 2: B is in α.
• Since α is a candidate key, the third alternative in the definition of
3NF is trivially satisfied.
• In fact, we cannot show that γ is a superkey.
• This shows exactly why the third alternative is present in the
definition of 3NF.
Q.E.D.
Figure 8.02
Figure 8.03
Figure 8.04
Figure 8.05
Figure 8.06
Figure 8.14
Figure 8.15
Figure 8.17
Fatimah AL-Shaikh