Advanced Relational Database Design: Appendix C
The bibliographical notes provide references to proofs that the preceding rules are
sound and complete. The following examples provide insight into how the formal
proofs proceed.
Let R = (A, B, C, G, H, I) be a relation schema. Suppose that A →→ BC holds. The
definition of multivalued dependencies implies that, if t1[A] = t2[A], then there exist
tuples t3 and t4 such that
    t1[A] = t2[A] = t3[A] = t4[A]
    t3[BC] = t1[BC]
    t3[GHI] = t2[GHI]
    t4[BC] = t2[BC]
    t4[GHI] = t1[GHI]
• Difference rule. If α →→ β holds, and α →→ γ holds, then α →→ β − γ holds
and α →→ γ − β holds.
Let us apply our rules to the following example. Let R = (A, B, C, G, H, I) with the
following set of dependencies D given:
A →→ B
B →→ HI
CG → H
• A →→ CGHI: Since A →→ B, the complementation rule (rule 4) implies that
A →→ R − B − A. R − B − A = CGHI, so A →→ CGHI.
• A →→ HI: Since A →→ B and B →→ HI, the multivalued transitivity rule (rule
6) implies that A →→ HI − B. Since HI − B = HI, A →→ HI.
• B → H: To show this fact, we need to apply the coalescence rule (rule 8).
B →→ HI holds. Since H ⊆ HI and CG → H and CG ∩ HI = ∅, we satisfy the
statement of the coalescence rule, with α being B, β being HI, δ being CG, and
γ being H. We conclude that B → H.
• A →→ CG: We already know that A →→ CGHI and A →→ HI. By the difference
rule, A →→ CGHI − HI. Since CGHI − HI = CG, A →→ CG.
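The set arithmetic behind these four derivations can be checked mechanically. The
following is a minimal Python sketch, not an inference engine: attribute sets are
modeled as frozensets of one-letter names, and the helper names (attrs,
complementation, mv_transitivity, difference, coalescence_applies, show) are
illustrative assumptions that do not appear in the text.

```python
def attrs(names):
    """Turn a string such as "CGHI" into a set of one-letter attribute names."""
    return frozenset(names)

R = attrs("ABCGHI")

def complementation(alpha, beta, schema=R):
    # Rule 4: alpha ->> beta yields alpha ->> schema - beta - alpha.
    return schema - beta - alpha

def mv_transitivity(beta, gamma):
    # Rule 6: alpha ->> beta and beta ->> gamma yield alpha ->> gamma - beta.
    return gamma - beta

def difference(beta, gamma):
    # Difference rule: alpha ->> beta and alpha ->> gamma yield alpha ->> beta - gamma.
    return beta - gamma

def coalescence_applies(beta, gamma, delta):
    # Rule 8 side conditions: gamma is a subset of beta, and delta is disjoint from beta.
    return gamma <= beta and not (delta & beta)

def show(s):
    """Render an attribute set as a sorted string, for example "CGHI"."""
    return "".join(sorted(s))

a_cghi = complementation(attrs("A"), attrs("B"))    # right side of A ->> CGHI
a_hi = mv_transitivity(attrs("B"), attrs("HI"))     # right side of A ->> HI
b_to_h = coalescence_applies(attrs("HI"), attrs("H"), attrs("CG"))  # B -> H allowed?
a_cg = difference(a_cghi, a_hi)                     # right side of A ->> CG

print(show(a_cghi), show(a_hi), b_to_h, show(a_cg))  # CGHI HI True CG
```

The sketch only computes the right-hand sides and side conditions; it assumes the
premises of each rule (such as A →→ B) have already been established.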
r1:  A   B
     a1  b1
     a2  b1

r2:  C   G   H
     c1  g1  h1
     c2  g2  h2

r3:  A   I
     a1  i1
     a2  i2

r4:  A   C   G
     a1  c1  g1
     a2  c2  g2

Figure C.1 The relations r1, r2, r3, and r4.
A   B   C   G   H   I
a1  b1  c1  g1  h1  i1
a2  b1  c2  g2  h2  i2

Figure C.2 The relation r = r1 ⋈ r2 ⋈ r3 ⋈ r4.
Schema (C, G, H) is in 4NF, but schema (A, C, G, I) is not. To see that (A, C, G, I)
is not in 4NF, we note that since A →→ HI is in D+, A →→ I is in the restriction of D
to (A, C, G, I). Thus, in a third iteration of the while loop, we replace (A, C, G, I) by
two schemas (A, I) and (A, C, G). The algorithm then terminates and the resulting
4NF decomposition is {(A, B), (C, G, H), (A, I), (A, C, G)}.
This 4NF decomposition is not dependency preserving, since it fails to preserve
the multivalued dependency B →→ HI. Consider Figure C.1, which shows the four
relations that may result from the projection of a relation on (A, B, C, G, H, I) onto
the four schemas of our decomposition. The restriction of D to (A, B) is A →→ B and
some trivial dependencies. It is easy to see that r1 satisfies A →→ B, because there is
no pair of tuples with the same A value. Observe that r2 satisfies all functional and
multivalued dependencies, since no two tuples in r2 have the same value on any
attribute. A similar statement can be made for r3 and r4. Therefore, the decomposed
version of our database satisfies all the dependencies in the restriction of D.
However, there is no relation r on (A, B, C, G, H, I) that satisfies D and decomposes
into r1, r2, r3, and r4. Figure C.2 shows the relation r = r1 ⋈ r2 ⋈ r3 ⋈ r4. Relation r
does not satisfy B →→ HI. Any relation s containing r and satisfying B →→ HI
must include the tuple (a2, b1, c2, g2, h1, i1). However, ΠCGH(s) includes a tuple
(c2, g2, h1) that is not in r2. Thus, our decomposition fails to detect a violation of
B →→ HI.
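The argument above can be replayed on the four relations of Figure C.1. The
following Python sketch is an illustration under stated assumptions: relations are
lists of dictionaries, and the helpers natural_join and mvd_missing are hypothetical
names introduced here, not part of the text. It joins r1 through r4, lists the tuples
that B →→ HI would require, and shows that adding one of them produces a (C, G, H)
projection that r2 does not contain.

```python
from itertools import product

def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    out = []
    for t, u in product(r, s):
        shared = set(t) & set(u)
        if all(t[a] == u[a] for a in shared):
            merged = {**t, **u}
            if merged not in out:
                out.append(merged)
    return out

def mvd_missing(r, alpha, beta):
    """Tuples that alpha ->> beta requires to be in r but that r lacks."""
    schema = list(r[0])
    present = {tuple(t[a] for a in schema) for t in r}
    missing = set()
    for t1, t2 in product(r, r):
        if all(t1[a] == t2[a] for a in alpha):
            # The required tuple takes beta from t1 and all other attributes from t2.
            t3 = tuple(t1[a] if a in beta else t2[a] for a in schema)
            if t3 not in present:
                missing.add(t3)
    return missing

r1 = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"}]
r2 = [{"C": "c1", "G": "g1", "H": "h1"}, {"C": "c2", "G": "g2", "H": "h2"}]
r3 = [{"A": "a1", "I": "i1"}, {"A": "a2", "I": "i2"}]
r4 = [{"A": "a1", "C": "c1", "G": "g1"}, {"A": "a2", "C": "c2", "G": "g2"}]

# Join order chosen so that consecutive relations share attributes.
r = natural_join(natural_join(natural_join(r1, r4), r2), r3)
missing = mvd_missing(r, {"B"}, {"H", "I"})

# Add one required tuple and compare the (C, G, H) projection with r2.
extra = dict(zip("ABCGHI", ("a2", "b1", "c2", "g2", "h1", "i1")))
cgh_of_s = {(t["C"], t["G"], t["H"]) for t in r + [extra]}
cgh_of_r2 = {(t["C"], t["G"], t["H"]) for t in r2}

print(len(r), sorted(missing))
print(cgh_of_s - cgh_of_r2)  # {('c2', 'g2', 'h1')}
```

The two-tuple join matches Figure C.2, and both tuples reported by mvd_missing are
absent from it, which is why r violates B →→ HI even though each of r1 through r4
satisfies its restricted dependencies.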
C.2 Join Dependencies
We have seen that, if we are given a set of multivalued and functional dependencies,
it is advantageous to find a database design that meets the three criteria of
1. 4NF
2. Dependency preservation
3. Lossless join
If all we have are functional dependencies, then the first criterion is just BCNF.
We have seen also that it is not always possible to meet all three of these criteria.
We succeeded in finding such a decomposition for the bank example, but failed for
the example of schema R = (A, B, C, G, H, I).
When we cannot achieve our three goals, we have to compromise on one of 4NF
or dependency preservation.
            R1 − R2        R1 ∩ R2
ΠR1(t1)     a1 ... ai      ai+1 ... aj
ΠR1(t2)     b1 ... bi      ai+1 ... aj

            R1 ∩ R2        R2 − R1
ΠR2(t1)     ai+1 ... aj    aj+1 ... an
ΠR2(t2)     ai+1 ... aj    bj+1 ... bn

Figure C.3 ΠR1(r) and ΠR2(r).
Thus, t1[R1 ∩ R2] = t2[R1 ∩ R2], but t1 and t2 have different values on all other
attributes. Let us compute ΠR1(r) ⋈ ΠR2(r). Figure C.3 shows ΠR1(r) and ΠR2(r).
When we compute the join, we get two tuples in addition to t1 and t2, shown by t3
and t4 in Figure C.4.
If *(R1, R2) holds, then, whenever we have tuples t1 and t2, we must also have
t3 and t4. Thus, Figure C.4 shows a tabular representation of the join dependency
*(R1, R2). Compare Figure C.4 with Figure 7.14, in which we gave a tabular
representation of α →→ β. If we let α = R1 ∩ R2 and β = R1, then we can see that
the two tabular representations in these figures are the same. Indeed, *(R1, R2) is
just another way of stating R1 ∩ R2 →→ R1. Using the complementation and
augmentation rules for multivalued dependencies, we can show that R1 ∩ R2 →→ R1
implies R1 ∩ R2 →→ R2. Thus, *(R1, R2) is equivalent to R1 ∩ R2 →→ R2. This
observation is not surprising in light of the fact we noted earlier that R1 and R2 form
a lossless-join decomposition of R if and only if R1 ∩ R2 →→ R2 or R1 ∩ R2 →→ R1.
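The equivalence between *(R1, R2) and R1 ∩ R2 →→ R1 can be observed on a small
instance. The Python sketch below is illustrative only: the schema (A, B, C), the
choice R1 = (A, B) and R2 = (A, C), the sample tuples, and the helper names
(project, join_two, satisfies_jd, satisfies_mvd) are all assumptions made for the
example. It checks the join dependency by projecting and rejoining, checks the
multivalued dependency directly from its definition, and confirms that the two
tests agree before and after the tuples t3 and t4 are added.

```python
SCHEMA = ("A", "B", "C")

def project(r, attrs):
    """Projection of r (a set of tuples over SCHEMA) onto attrs."""
    idx = [SCHEMA.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in r}

def join_two(r, r1_attrs, r2_attrs):
    """Natural join of the projections of r onto R1 and R2 (R1 and R2 must cover SCHEMA)."""
    common = [a for a in r1_attrs if a in r2_attrs]
    out = set()
    for u in project(r, r1_attrs):
        for v in project(r, r2_attrs):
            du, dv = dict(zip(r1_attrs, u)), dict(zip(r2_attrs, v))
            if all(du[a] == dv[a] for a in common):
                merged = {**du, **dv}
                out.add(tuple(merged[a] for a in SCHEMA))
    return out

def satisfies_jd(r, r1_attrs, r2_attrs):
    """*(R1, R2) holds on r when rejoining the two projections gives back exactly r."""
    return join_two(r, r1_attrs, r2_attrs) == r

def satisfies_mvd(r, alpha, beta):
    """Check alpha ->> beta on r directly from the definition."""
    for t1 in r:
        for t2 in r:
            if all(t1[SCHEMA.index(a)] == t2[SCHEMA.index(a)] for a in alpha):
                # Required tuple: beta values from t1, remaining values from t2.
                t3 = tuple(t1[i] if a in beta else t2[i] for i, a in enumerate(SCHEMA))
                if t3 not in r:
                    return False
    return True

R1, R2 = ("A", "B"), ("A", "C")
r = {("a1", "b1", "c1"), ("a1", "b2", "c2")}   # plays the role of t1 and t2
joined = join_two(r, R1, R2)                   # adds the analogues of t3 and t4

print(sorted(joined - r))
print(satisfies_jd(r, R1, R2), satisfies_mvd(r, ("A",), R1))
print(satisfies_jd(joined, R1, R2), satisfies_mvd(joined, ("A",), R1))
```

On this instance both tests fail for r and both succeed once the two extra tuples
are present, which is the behavior the equivalence predicts.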
Every join dependency of the form *(R1, R2) is therefore equivalent to a multivalued
dependency. However, there are join dependencies that are not equivalent to any
multivalued dependency. The simplest example of such a dependency is on schema
R = (A, B, C). The join dependency
      R1 − R2        R1 ∩ R2        R2 − R1
t1    a1 ... ai      ai+1 ... aj    aj+1 ... an
t2    b1 ... bi      ai+1 ... aj    bj+1 ... bn
t3    a1 ... ai      ai+1 ... aj    bj+1 ... bn
t4    b1 ... bi      ai+1 ... aj    aj+1 ... an

Figure C.4 Tabular representation of *(R1, R2).
A B C
a1 b1 c2
a2 b1 c1
a1 b2 c1
a1 b1 c1
Figure C.5 Tabular representation of *((A, B), (B, C), (A, C)).
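The tabular representation in Figure C.5 can be read as a rule: whenever the first
three tuples are present, the fourth must be present too. The Python sketch below
checks this reading. The helper names (project, join_projections, satisfies_mvd)
are hypothetical, introduced only for this illustration. It joins the projections
of the first three tuples onto (A, B), (B, C), and (A, C), confirms that the join
produces the fourth tuple, and then tests a few multivalued dependencies on the
full four-tuple relation.

```python
from itertools import product

SCHEMA = ("A", "B", "C")

def project(r, attrs):
    """Projection of r (a set of tuples over SCHEMA) onto attrs."""
    idx = [SCHEMA.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in r}

def join_projections(r, components):
    """Natural join of the projections of r onto each component (components cover SCHEMA)."""
    out = set()
    for combo in product(*(project(r, c) for c in components)):
        merged, consistent = {}, True
        for comp, tup in zip(components, combo):
            for a, v in zip(comp, tup):
                if merged.setdefault(a, v) != v:
                    consistent = False
        if consistent:
            out.add(tuple(merged[a] for a in SCHEMA))
    return out

def satisfies_mvd(r, alpha, beta):
    """Check alpha ->> beta on r directly from the definition."""
    for t1, t2 in product(r, r):
        if all(t1[SCHEMA.index(a)] == t2[SCHEMA.index(a)] for a in alpha):
            t3 = tuple(t1[i] if a in beta else t2[i] for i, a in enumerate(SCHEMA))
            if t3 not in r:
                return False
    return True

JD = [("A", "B"), ("B", "C"), ("A", "C")]
first_three = {("a1", "b1", "c2"), ("a2", "b1", "c1"), ("a1", "b2", "c1")}
fourth = ("a1", "b1", "c1")

forced = join_projections(first_three, JD) - first_three
r = first_three | {fourth}

print(forced)                                   # {('a1', 'b1', 'c1')}
print(join_projections(r, JD) == r)             # True
print([satisfies_mvd(r, x, y) for x, y in (("A", "B"), ("B", "C"), ("C", "A"))])
```

On the four-tuple relation the join dependency holds, while A →→ B, B →→ C, and
C →→ A all fail. This is consistent with the statement that a join dependency of
this shape need not be equivalent to a multivalued dependency; the check covers
only this one instance and is not a proof.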
Loan_info_schema = (branch_name, customer_name, loan_number, amount)
from our banking example. We can define a relation loan_info (Loan_info_schema) as the
set of all tuples on Loan_info_schema such that
• The loan represented by loan_number is made by the branch named
branch_name.
• The loan represented by loan_number is made to the customer named
customer_name.
• The loan represented by loan_number is in the amount given by amount.
The preceding definition of the loan_info relation is a conjunction of three predicates:
one on loan_number and branch_name, one on loan_number and customer_name, and one
on loan_number and amount. Surprisingly, it can be shown that the preceding intuitive
definition of loan_info logically implies the join dependency *((loan_number,
branch_name), (loan_number, customer_name), (loan_number, amount)).
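A small instance illustrates why a relation defined as a conjunction of three
predicates satisfies this join dependency. The Python sketch below rests on
assumptions: the sample loans, branches, customers, and amounts are hypothetical
data, the attribute names are written with underscores, and the helper names
(project, join_projections) are introduced only for the example. It builds
loan_info from the three binary predicates and then checks that joining its three
projections returns exactly the same relation.

```python
from itertools import product

SCHEMA = ("branch_name", "customer_name", "loan_number", "amount")

# Hypothetical sample data, one binary relation per predicate in the definition.
made_by = {("L-17", "Downtown"), ("L-23", "Redwood")}                # loan, branch
made_to = {("L-17", "Jones"), ("L-17", "Smith"), ("L-23", "Smith")}  # loan, customer
amount_of = {("L-17", 1000), ("L-23", 2000)}                         # loan, amount

# loan_info holds exactly the tuples for which all three predicates are true.
loan_info = {
    (branch, customer, loan, amount)
    for loan, branch in made_by
    for loan2, customer in made_to
    for loan3, amount in amount_of
    if loan == loan2 == loan3
}

COMPONENTS = [
    ("loan_number", "branch_name"),
    ("loan_number", "customer_name"),
    ("loan_number", "amount"),
]

def project(r, attrs):
    """Projection of r (a set of tuples over SCHEMA) onto attrs."""
    idx = [SCHEMA.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in r}

def join_projections(r, components):
    """Natural join of the projections of r onto each component (components cover SCHEMA)."""
    out = set()
    for combo in product(*(project(r, c) for c in components)):
        merged, consistent = {}, True
        for comp, tup in zip(components, combo):
            for a, v in zip(comp, tup):
                if merged.setdefault(a, v) != v:
                    consistent = False
        if consistent:
            out.add(tuple(merged[a] for a in SCHEMA))
    return out

rejoined = join_projections(loan_info, COMPONENTS)
print(len(loan_info), rejoined == loan_info)  # 3 True
```

One instance does not establish the implication in general, but it shows the
mechanism: every tuple of the rejoined relation satisfies all three predicates, so
it was already in loan_info.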
A B C
a1 b1 c2
a2 b1 c1
a1 b2 c1
a1 b1 c1
Thus, join dependencies have an intuitive appeal and correspond to one of our
three criteria for a good database design.
For functional and multivalued dependencies, we were able to give a system of
inference rules that are sound and complete. Unfortunately, no such set of rules is
known for join dependencies. It appears that we must consider more general classes
of dependencies than join dependencies to construct a sound and complete set of
inference rules. The bibliographical notes contain references to research in this area.
A database design is in PJNF if each member of the set of relation schemas that
constitutes the design is in PJNF. PJNF is called fifth normal form (5NF) in some of the
literature on database normalization.
Consider again our banking example. Given the join dependency *((loan_number,
branch_name), (loan_number, customer_name), (loan_number, amount)), Loan_info_schema is
not in PJNF. To put Loan_info_schema into PJNF, we must decompose it into the three
schemas specified by the join dependency: (loan_number, branch_name), (loan_number,
customer_name), and (loan_number, amount).
Because every multivalued dependency is also a join dependency, it is easy to see
that every PJNF schema is also in 4NF. Thus, in general, we may not be able to find a
dependency-preserving decomposition into PJNF for a given schema.
1. Domain declaration. Let A be an attribute, and let dom be a set of values. The
domain declaration A ⊆ dom requires that the A value of all tuples be values
in dom.
2. Key declaration. Let R be a relation schema with K ⊆ R. The key declaration
key (K) requires that K be a superkey for schema R—that is, K → R. Note
that all key declarations are functional dependencies but not all functional
dependencies are key declarations.
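Both kinds of declaration can be tested against a relation instance. The following
Python sketch is a minimal illustration, not a definition from the text: relations
are lists of dictionaries, a domain is modeled as a membership predicate so that
infinite domains can be expressed, and the function names, attribute names, and
sample accounts are all hypothetical.

```python
def satisfies_domain(rows, attr, in_dom):
    """Domain declaration attr ⊆ dom: every value of attr passes the membership test."""
    return all(in_dom(t[attr]) for t in rows)

def satisfies_key(rows, key):
    """Key declaration key(K): no two distinct tuples agree on every attribute in K."""
    seen = {}
    for t in rows:
        k = tuple(t[a] for a in key)
        if k in seen and seen[k] != t:
            return False
        seen[k] = t
    return True

# Hypothetical sample data.
accounts = [
    {"account_number": "1001", "balance": 500},
    {"account_number": "2002", "balance": 750},
]

def does_not_begin_with_9(number):
    return not number.startswith("9")

with_bad_domain = accounts + [{"account_number": "9003", "balance": 3000}]
with_bad_key = accounts + [{"account_number": "1001", "balance": 900}]

print(satisfies_domain(accounts, "account_number", does_not_begin_with_9))         # True
print(satisfies_domain(with_bad_domain, "account_number", does_not_begin_with_9))  # False
print(satisfies_key(accounts, ["account_number"]))                                 # True
print(satisfies_key(with_bad_key, ["account_number"]))                             # False
```

Checking an instance shows only that the instance is consistent with a declaration;
the declarations themselves are constraints on every legal instance of the schema.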
C.3 Domain-Key Normal Form
We retain all the dependencies that we had on Account_schema as general constraints.
The domain constraints for Special_acct_schema require that, for each account,
The domain constraints for Regular_acct_schema require that the account number does
not begin with 9. The resulting design is in DKNF, although the proof of this fact is
beyond the scope of this text.
Let us compare DKNF to the other normal forms that we have studied. Under the
other normal forms, we did not take into consideration domain constraints. We
assumed (implicitly) that the domain of each attribute was some infinite domain, such
as the set of all integers or the set of all character strings. We allowed key constraints
(indeed, we allowed functional dependencies). For each normal form, we allowed
a restricted form of general constraint (a set of functional, multivalued, or join
dependencies). Thus, we can rewrite the definitions of PJNF, 4NF, BCNF, and 3NF in a
manner that shows them to be special cases of DKNF.
We now present a DKNF-inspired rephrasing of our definition of PJNF. Let R =
(A1, A2, ..., An) be a relation schema. Let dom(Ai) denote the domain of attribute
Ai, and let all these domains be infinite. Then all domain constraints D are of the form
Ai ⊆ dom(Ai). Let the general constraints be a set G of functional, multivalued, or
join dependencies. If F is the set of functional dependencies in G, let the set K of key
C.4 Summary
In this chapter we presented the theory of multivalued dependencies, including a set
of sound and complete inference rules for multivalued dependencies.
We then presented two more normal forms based on more general classes of
constraints. Join dependencies are a generalization of multivalued dependencies, and
lead to the definition of PJNF. DKNF is an idealized normal form that may be difficult
to achieve in practice. Yet DKNF has desirable properties that should be included to
the extent possible in a good database design.
Exercises
C.1 List all the nontrivial multivalued dependencies satisfied by the relation in
Figure C.7.
C.2 Use the definition of multivalued dependency (Section 7.6.1) to argue that each
of the following axioms is sound:
a. The complementation rule
b. The multivalued augmentation rule
c. The multivalued transitivity rule
C.3 Use the definitions of functional and multivalued dependencies (Sections 7.4
and 7.6.1) to show the soundness of the replication rule.
C.4 Show that the coalescence rule is sound. (Hint: Apply the definition of α →→ β
to a pair of tuples t1 and t2 such that t1[α] = t2[α]. Observe that since δ ∩ β = ∅,
if two tuples have the same value on R − β, then they have the same value
on δ.)
A   B   C
a1  b1  c1
a1  b1  c2
a2  b1  c1
a2  b1  c3

Figure C.7 Relation of Exercise C.1.
C.5 Use the axioms for functional and multivalued dependencies to show that each
of the following rules is sound:
a. The multivalued union rule
b. The intersection rule
c. The difference rule
C.6 Let R = (A, B, C, D, E), and let M be the following set of multivalued
dependencies
A →→ BC
B →→ CD
E →→ AD
Bibliographical Notes
The notions of 4NF, PJNF, and DKNF are from Fagin [1977], Fagin [1979], and Fagin
[1981], respectively. The synthesis approach to database design is discussed in
Bernstein [1976].
Join dependencies were introduced by Rissanen [1979]. Sciore [1982] gives a set
of axioms for a class of dependencies that properly includes the join dependencies.
In addition to their use in PJNF, join dependencies are central to the definition of
universal relation databases. Fagin et al. [1982] introduces the relationship between
join dependencies and the definition of a relation as a conjunction of predicates (see
Section C.2.1). This use of join dependencies has led to a large amount of research into
acyclic database schemas. Intuitively, a schema is acyclic if every pair of attributes is
related in a unique way. Formal treatment of acyclic schemas appears in Fagin [1983]
and in Beeri et al. [1983].
Additional dependencies are discussed in detail in Maier [1983]. Inclusion
dependencies are discussed by Casanova et al. [1984] and Cosmadakis et al. [1990].
Template dependencies are covered by Sadri and Ullman [1982]. Mutual dependencies
are examined by Furtado [1978] and by Mendelzon and Maier [1979].