Relational Database Design_ Domain and data dependency
Redundancy:
Data for branch-name, branch-city, assets are repeated for each loan that a branch
makes
Wastes space
Complicates updating, introducing possibility of inconsistency of assets value
Decomposition
Null values
Cannot store information about a branch if no loans exist
Can use null values, but they are difficult to handle.
Decomposition of R = (A, B)
R1 = (A) R2 = (B)
r = {(α, 1), (α, 2), (β, 1)}
ΠA(r) = {(α), (β)}
ΠB(r) = {(1), (2)}
ΠA(r) ⋈ ΠB(r) = {(α, 1), (α, 2), (β, 1), (β, 2)}
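A decomposition like the one above can be checked mechanically by projecting an instance onto R1 and R2 and joining the projections back together. The sketch below is a minimal illustration in Python; the instance values (`a1`, `a2`) and the helper names `project` and `natural_join` are assumptions made for this example, not part of the original material.

```python
from itertools import product

def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, removing duplicates."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two relations given as sets of (attr, value) tuples."""
    out = set()
    for l, r in product(left, right):
        ld, rd = dict(l), dict(r)
        common = ld.keys() & rd.keys()
        # Tuples join when they agree on every shared attribute.
        if all(ld[a] == rd[a] for a in common):
            out.add(tuple(sorted({**ld, **rd}.items())))
    return out

# Hypothetical instance of R = (A, B) with three tuples.
r = [{"A": "a1", "B": 1}, {"A": "a1", "B": 2}, {"A": "a2", "B": 1}]
original = {tuple(sorted(row.items())) for row in r}

# Decompose into R1 = (A) and R2 = (B), then join the projections.
rejoined = natural_join(project(r, ["A"]), project(r, ["B"]))
spurious = rejoined - original
print(len(original), len(rejoined), sorted(spurious))
```

Because the rejoined relation contains a tuple that was not in r, this particular decomposition is not a lossless-join decomposition.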
Goal — Devise a Theory for the
Following
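If CG → H and CG → I, then CG → HI, by the “union rule”.
If CG → HI, then CG → H and CG → I, by the decomposition rule.
Note that decomposition splits only the right-hand side: CG → H does not imply C → H or G → H.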
We can find all of F+ by applying Armstrong’s Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γα → γβ (augmentation)
if α → β and β → γ, then α → γ (transitivity)
R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
Some members of F+:
A → H
    by transitivity from A → B and B → H
AG → I
    by augmenting A → C with G, to get AG → CG, and then transitivity with CG → I
CG → HI
    from CG → H and CG → I: the “union rule” can be inferred from the definition of functional dependencies, or by
    augmentation of CG → I to infer CG → CGI, augmentation of CG → H to infer CGI → HI, and then transitivity
Procedure for Computing F+
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
NOTE: We will see an alternative procedure for this task later
Closure of Functional Dependencies
(Cont.)
We can further simplify manual computation
of F+ by using the following additional rules.
If α → β holds and α → γ holds, then α → βγ holds (union)
If α → βγ holds, then α → β holds and α → γ holds (decomposition)
If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)
The above rules can be inferred from Armstrong’s
axioms.
Closure of Attribute Sets
Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F:
α → β is in F+ ⟺ β ⊆ α+
Algorithm to compute α+, the closure of α under F
result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
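The attribute-closure algorithm translates almost line for line into code. The following is a minimal Python sketch, assuming each functional dependency is stored as a pair of frozensets and attributes are single letters; the names `closure` and `fd` are illustrative, not taken from the source.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build a dependency from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))

# Sample input: R = (A, B, C, G, H, I) with five dependencies.
F = [fd("A", "B"), fd("A", "C"), fd("CG", "H"), fd("CG", "I"), fd("B", "H")]

ag_closure = closure("AG", F)
print("".join(sorted(ag_closure)))  # ABCGHI
```

The loop stops as soon as a full pass over F adds nothing new, which mirrors the "changes to result" condition of the pseudocode.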
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Is AG a candidate key?
1. Is AG a super key?
   1. Does AG → R? == Is (AG)+ ⊇ R
2. Is any subset of AG a superkey?
   1. Does A → R? == Is (A)+ ⊇ R
   2. Does G → R? == Is (G)+ ⊇ R
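The two questions above (is the set a superkey, and is any smaller subset also a superkey) can be answered with attribute closures alone. The sketch below assumes dependencies are stored as pairs of frozensets over single-letter attributes; `is_superkey` and `is_candidate_key` are hypothetical helper names. It only removes one attribute at a time, which is enough because every superset of a superkey is itself a superkey.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """attrs is a superkey when its closure covers the whole schema."""
    return closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    """A candidate key is a superkey with no superkey as a proper subset."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    # Dropping any single attribute must destroy the superkey property.
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)

R = "ABCGHI"
F = [(frozenset(l), frozenset(r))
     for l, r in [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]]

print(is_superkey("AG", R, F))        # True
print(is_superkey("A", R, F))         # False
print(is_superkey("G", R, F))         # False
print(is_candidate_key("AG", R, F))   # True
```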
Uses of Attribute Closure
Computing closure of F
For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
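As a rough illustration of this use of attribute closure, the sketch below enumerates F+ for a small schema. It is exponential in the number of attributes, so it is only practical for toy inputs. Two assumptions are made for the example: dependencies are pairs of frozensets over single-letter attributes, and dependencies with an empty left or right side are skipped. The names `nonempty_subsets` and `fd_closure` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def nonempty_subsets(attrs):
    """Yield every non-empty subset of attrs as a frozenset."""
    items = sorted(attrs)
    for n in range(1, len(items) + 1):
        for combo in combinations(items, n):
            yield frozenset(combo)

def fd_closure(schema, fds):
    """Enumerate F+ as a set of (lhs, rhs) pairs with non-empty sides."""
    result = set()
    for gamma in nonempty_subsets(schema):
        # Every subset S of gamma+ gives a dependency gamma -> S.
        for s in nonempty_subsets(closure(gamma, fds)):
            result.add((gamma, s))
    return result

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
f_plus = fd_closure("ABC", F)
print(len(f_plus))
print((frozenset("A"), frozenset("C")) in f_plus)  # True, by transitivity
```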
Canonical Cover
Sets of functional dependencies may have
redundant dependencies that can be inferred from
the others
E.g.: A → C is redundant in: {A → B, B → C, A → C}
Parts of a functional dependency may be redundant
E.g. on RHS: {A → B, B → C, A → CD} can be simplified to
{A → B, B → C, A → D}
E.g. on LHS: {A → B, B → C, AC → D} can be simplified to
{A → B, B → C, A → D}
Intuitively, a canonical cover of F is a
“minimal” set of functional dependencies
equivalent to F, having no redundant
dependencies or redundant parts of
dependencies
Extraneous Attributes
Consider a set F of functional dependencies and the functional dependency α → β in F.
Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – A)} logically implies F.
Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one
Example: Given F = {A → C, AB → C}
B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e. the result of dropping B from AB → C).
Example: Given F = {A → C, AB → CD}
C is extraneous in AB → CD since AB → C can be inferred even after deleting C
Testing if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional dependency α → β in F.
To test if attribute A ∈ α is extraneous in α
1. compute ({α} – A)+ using the dependencies in F
2. check that ({α} – A)+ contains β; if it does, A is extraneous
To test if attribute A ∈ β is extraneous in β
1. compute α+ using only the dependencies in F’ = (F – {α → β}) ∪ {α → (β – A)},
2. check that α+ contains A; if it does, A is extraneous
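Both tests reduce to one attribute-closure computation each. The sketch below is a minimal Python version under stated assumptions: dependencies are pairs of frozensets over single-letter attributes, and the helper names `extraneous_in_lhs` and `extraneous_in_rhs` are hypothetical. The left-hand-side test checks that the reduced left side still determines the whole right side under F; the right-hand-side test checks that the dropped attribute is still determined under the modified set F'.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def extraneous_in_lhs(a, lhs, rhs, fds):
    """a is extraneous in lhs if (lhs - a)+ under F contains all of rhs."""
    reduced = frozenset(lhs) - {a}
    return frozenset(rhs) <= closure(reduced, fds)

def extraneous_in_rhs(a, lhs, rhs, fds):
    """a is extraneous in rhs if lhs+ under F' still contains a."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    # F' replaces lhs -> rhs with lhs -> (rhs - a).
    f_prime = [f for f in fds if f != (lhs, rhs)] + [(lhs, rhs - {a})]
    return a in closure(lhs, f_prime)

F1 = [(frozenset("A"), frozenset("C")), (frozenset("AB"), frozenset("C"))]
F2 = [(frozenset("A"), frozenset("C")), (frozenset("AB"), frozenset("CD"))]

print(extraneous_in_lhs("B", "AB", "C", F1))   # True
print(extraneous_in_lhs("A", "AB", "C", F1))   # False
print(extraneous_in_rhs("C", "AB", "CD", F2))  # True
print(extraneous_in_rhs("D", "AB", "CD", F2))  # False
```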
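Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for every functional dependency α → β in F+, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R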
Example
R = (A, B, C)
F = { A → B
      B → C }
Key = {A}
R is not in BCNF
Decomposition R1 = (A, B), R2 = (B, C)
R1 and R2 in BCNF
Lossless-join decomposition
Dependency preserving
Testing for BCNF
To check if a non-trivial dependency α → β causes a violation of BCNF
1. compute α+ (the attribute closure of α), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
Simplified test: To check if a relation schema R is in
BCNF, it suffices to check only the dependencies in
the given set F for violation of BCNF, rather than
checking all dependencies in F+.
If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
However, using only F is incorrect when testing a relation in a decomposition of R
E.g. Consider R (A, B, C, D), with F = {A → B, B → C}
Decompose R into R1(A, B) and R2(A, C, D)
Neither of the dependencies in F contains only attributes from (A, C, D), so we might be misled into thinking R2 satisfies BCNF.
In fact, dependency A → C in F+ shows R2 is not in BCNF.
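One way to avoid this pitfall without materializing F+ is to examine every subset α of a decomposed relation Ri: compute α+ under the original F, and flag a violation when α+ contains some attribute of Ri – α but not all of Ri. The sketch below follows that idea; it assumes single-letter attributes and dependencies stored as pairs of frozensets, and `bcnf_violation` is a hypothetical helper name.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(ri, fds):
    """Return a violating (alpha, beta) pair on ri, or None if ri is in BCNF."""
    ri = frozenset(ri)
    items = sorted(ri)
    for n in range(1, len(items)):          # proper, non-empty subsets only
        for combo in combinations(items, n):
            alpha = frozenset(combo)
            alpha_plus = closure(alpha, fds)
            # Attributes of ri that alpha determines non-trivially.
            beta = (alpha_plus & ri) - alpha
            if beta and not ri <= alpha_plus:
                return alpha, frozenset(beta)
    return None

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]

print(bcnf_violation("AB", F))    # None: R1 is in BCNF
print(bcnf_violation("ACD", F))   # A -> C violates BCNF on R2
```

Because closures are taken with respect to the original F, the dependency A → C is found on R2 even though it is not listed in F.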
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
        let α → β be a nontrivial functional dependency that holds on Ri
        such that α → Ri is not in F+, and α ∩ β = ∅;
        result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;
Note: each Ri is in BCNF, and decomposition is lossless-join.
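The decomposition loop can be sketched in Python as follows. This is an illustrative version rather than a definitive implementation: instead of computing F+ up front, it finds violations through attribute closures under the original F, and it picks the first violation found. The attribute names in the sample run are those of a lending schema with branch and loan attributes; the helper names `bcnf_violation` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(ri, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on ri, else None."""
    items = sorted(ri)
    for n in range(1, len(items)):
        for combo in combinations(items, n):
            alpha = frozenset(combo)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus & ri) - alpha    # guarantees alpha and beta are disjoint
            if beta and not ri <= alpha_plus:
                return alpha, frozenset(beta)
    return None

def bcnf_decompose(schema, fds):
    """Split schemas until no relation in the result violates BCNF."""
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = bcnf_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result.extend([ri - beta, alpha | beta])
                done = False
                break                           # restart the scan on the new result
    return result

R = {"branch-name", "branch-city", "assets",
     "customer-name", "loan-number", "amount"}
F = [(frozenset({"branch-name"}), frozenset({"assets", "branch-city"})),
     (frozenset({"loan-number"}), frozenset({"amount", "branch-name"}))]

decomposition = bcnf_decompose(R, F)
for ri in decomposition:
    print(sorted(ri))
```

Different choices of the violating dependency can lead to different, equally valid BCNF decompositions.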
Example of BCNF Decomposition
R = (branch-name, branch-city, assets,
customer-name, loan-number, amount)
F = {branch-name → assets branch-city
loan-number → amount branch-name}
Key = {loan-number, customer-name}
Decomposition
R1 = (branch-name, branch-city, assets)
R2 = (branch-name, customer-name, loan-number,
amount)
R3 = (branch-name, loan-number, amount)
R4 = (customer-name, loan-number)
Final decomposition: R1, R3, R4
Testing Decomposition for BCNF
R = (J, K, L)
F = { JK → L
      L → K }
Two candidate keys = JK and JL
R is not in BCNF
Any decomposition of R will fail to preserve
JK → L
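Whether a given decomposition preserves a dependency α → β can be tested without computing a join: start from α, and repeatedly add whatever each Ri lets the current set determine, using closures under F restricted to Ri. The sketch below assumes single-letter attributes, dependencies as pairs of frozensets, and a hypothetical helper name `preserves`. It checks the three ways of splitting (J, K, L) into two-attribute relations.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(lhs, rhs, decomposition, fds):
    """True if lhs -> rhs can be checked on the decomposed relations alone."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            # What the part of result inside ri determines, restricted to ri.
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

F = [(frozenset("JK"), frozenset("L")), (frozenset("L"), frozenset("K"))]
candidates = [["JL", "KL"], ["JK", "JL"], ["JK", "KL"]]

for d in candidates:
    print(d, preserves("JK", "L", d, F))    # False for all three
print(preserves("L", "K", ["JL", "KL"], F))  # True
```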
Third Normal Form: Motivation
There are some situations where
BCNF is not dependency preserving, and
efficient checking for FD violation on updates is
important
Solution: define a weaker normal form, called Third
Normal Form.
Allows some redundancy (with resultant problems; we
will see examples later)
But FDs can be checked on individual relations without
computing a join.
There is always a lossless-join, dependency-preserving
decomposition into 3NF.
Third Normal Form
A relation schema R is in third normal form (3NF) if for all:
α → β in F+
at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Each attribute A in β – α is contained in a candidate key for R.
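The 3NF test can be sketched with attribute closures and a brute-force search for candidate keys. A dependency is acceptable when it is trivial, when its left side is a superkey, or when every attribute it adds on the right belongs to some candidate key. The code below is illustrative only: it assumes single-letter attributes, checks only the dependencies listed in F, and uses hypothetical helper names `candidate_keys` and `is_3nf`.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying subsets in order of size."""
    schema = frozenset(schema)
    keys = []
    for n in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), n):
            k = frozenset(combo)
            # Keep k only if it is a superkey and contains no smaller key.
            if closure(k, fds) >= schema and not any(c <= k for c in keys):
                keys.append(k)
    return keys

def is_3nf(schema, fds):
    """Check each dependency in fds against the three 3NF conditions."""
    schema = frozenset(schema)
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue                        # lhs is a superkey
        if any(a not in prime for a in rhs - lhs):
            return False                    # a non-prime attribute is determined
    return True

F1 = [(frozenset("JK"), frozenset("L")), (frozenset("L"), frozenset("K"))]
F2 = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]

print(sorted("".join(sorted(k)) for k in candidate_keys("JKL", F1)))  # ['JK', 'JL']
print(is_3nf("JKL", F1))   # True: K is part of the candidate key JK
print(is_3nf("ABC", F2))   # False: B -> C, B is not a superkey, C is not prime
```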
J     L   K
j2    l1  k1
j3    l1  k1
null  l2  k2
Lossless join.
Dependency preservation.
classes(course, teacher, book)
There are no non-trivial functional dependencies, and therefore the relation is in BCNF
Insertion anomalies: if Sara is a new teacher who can teach database, two tuples need to be inserted
(database, Sara, DB Concepts)
(database, Sara, Ullman)
Multivalued Dependencies
(Cont.)
Therefore, it is better to decompose
classes into:
teaches
course              teacher
database            Avi
database            Hank
database            Sudarshan
operating systems   Avi
operating systems   Jim

text
course              book
database            DB Concepts
database            Ullman
operating systems   OS Concepts
operating systems   Shaw
We shall see that these two relations are in Fourth Normal Form (4NF)
Multivalued Dependencies (MVDs)
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency
α →→ β
holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
MVD (Cont.)
Tabular representation of α →→ β (figure not reproduced)
Example
Let R be a relation schema with a set of attributes
that are partitioned into 3 nonempty subsets.
Y, Z, W
We say that Y →→ Z (Y multidetermines Z) if and only if for all possible relations r(R)
if < y1, z1, w1 > ∈ r and < y1, z2, w2 > ∈ r
then
< y1, z1, w2 > ∈ r and < y1, z2, w1 > ∈ r
Note that since the behavior of Z and W is identical, it follows that Y →→ Z if Y →→ W
Example (Cont.)
In our example:
course →→ teacher
course →→ book
The above formal definition is supposed to
formalize the notion that given a particular
value of Y (course) it has associated with it
a set of values of Z (teacher) and a set of
values of W (book), and these two sets are
in some sense independent of each other.
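The independence of teachers and books for a course can be tested on a concrete relation instance. The sketch below checks a multivalued dependency by verifying that, for every pair of tuples agreeing on α, the tuple that takes its β values from the first and all other values from the second is also present. The function name `satisfies_mvd` is hypothetical, and the sample instance is assumed to be every combination of the database teachers and books.

```python
def satisfies_mvd(rows, attrs, alpha, beta):
    """Check alpha ->> beta on a relation instance given as a set of tuples."""
    idx = {a: i for i, a in enumerate(attrs)}
    rows = set(rows)
    for t1 in rows:
        for t2 in rows:
            if any(t1[idx[a]] != t2[idx[a]] for a in alpha):
                continue
            # t3 agrees with t1 on beta and with t2 on every other attribute.
            t3 = tuple(t1[idx[a]] if a in beta else t2[idx[a]] for a in attrs)
            if t3 not in rows:
                return False
    # Looping over ordered pairs also covers t4 (the roles of t1 and t2 swap).
    return True

attrs = ("course", "teacher", "book")
teachers = ["Avi", "Hank", "Sudarshan"]
books = ["DB Concepts", "Ullman"]
classes = {("database", t, b) for t in teachers for b in books}

ok_before = satisfies_mvd(classes, attrs, ["course"], ["teacher"])

# Adding Sara with only one book breaks the dependency ...
partial = classes | {("database", "Sara", "DB Concepts")}
ok_partial = satisfies_mvd(partial, attrs, ["course"], ["teacher"])

# ... and adding the second tuple restores it.
full = partial | {("database", "Sara", "Ullman")}
ok_full = satisfies_mvd(full, attrs, ["course"], ["teacher"])

print(ok_before, ok_partial, ok_full)  # True False True
```

This is the insertion anomaly in executable form: a new teacher has to be recorded once per book for the dependency to keep holding.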
Use of Multivalued Dependencies
4NF decomposition algorithm:
result := {R};
done := false;
compute D+;
Let Di denote the restriction of D+ to Ri
while (not done)
    if (there is a schema Ri in result that is not in 4NF) then
    begin
        let α →→ β be a nontrivial multivalued dependency that holds
        on Ri such that α → Ri is not in Di, and α ∩ β = ∅;
        result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;
Note: each Ri is in 4NF, and decomposition is lossless-join
Example
R =(A, B, C, G, H, I)
F = { A →→ B
      B →→ HI
      CG →→ H }
R is not in 4NF since A →→ B and A is not a superkey for R
Decomposition
a) R1 = (A, B) (R1 is in 4NF)
b) R2 = (A, C, G, H, I) (R2 is not in 4NF)
c) R3 = (C, G, H) (R3 is in 4NF)
d) R4 = (A, C, G, I) (R4 is not in 4NF)
Since A →→ B and B →→ HI, A →→ HI, A →→ I
e) R5 = (A, I) (R5 is in 4NF)
f) R6 = (A, C, G) (R6 is in 4NF)
Further Normal Forms
Join dependencies generalize multivalued
dependencies
lead to project-join normal form (PJNF) (also called
fifth normal form)
A class of even more general constraints, leads to
a normal form called domain-key normal form.
Problem with these generalized constraints: they are
hard to reason with, and no sound and
complete set of inference rules exists.