
FD Slide2 09

A candidate key of a relation is a minimal set of attributes that uniquely identifies each tuple. It has two properties: 1) all attributes are functionally dependent on the candidate key attributes, and 2) no proper subset of the candidate key attributes has property 1. Computing all candidate keys involves partitioning attributes into necessary, useless, and middle-ground, then generating subsets of middle-ground attributes combined with necessary attributes. The goal of normalization is to decompose relations into smaller, well-designed relations while preserving functional dependencies and ensuring the decomposition is lossless.

CANDIDATE KEYS

A candidate key of a relation schema R is a subset X of the attributes of R with the following two properties:

1. Every attribute is functionally dependent on X, i.e., X+ = all attributes of R (also denoted as X+ = R).

2. No proper subset of X has property (1), i.e., X is minimal with respect to property (1).

A sub-key of R is a subset of a candidate key; a super-key is a set of attributes containing a candidate key.

We also use the abbreviation CK to denote "candidate key".

Let R(ABCDE) be a relation schema and consider the following functional dependencies F = {AB → E, AD → B, B → C, C → D}. Since

(AC)+ = ABCDE,
A+ = A, and
C+ = CD,

we know that AC is a candidate key, both A and C are sub-keys, and ABC is a super-key. The only other candidate keys are AB and AD. Note that since nothing determines A, A is in every candidate key.
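The closures in this example can be checked mechanically. The following is a minimal sketch, assuming each fd is written as a pair of attribute strings with one letter per attribute; the function name `closure` is an illustrative choice, not something defined in these slides.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an fd once its whole left-hand side is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("AB", "E"), ("AD", "B"), ("B", "C"), ("C", "D")]
print("".join(sorted(closure("AC", F))))  # ABCDE
print("".join(sorted(closure("A", F))))   # A
print("".join(sorted(closure("C", F))))   # CD
```

Since (AC)+ covers all of R while A+ and C+ do not, AC satisfies both properties of a candidate key.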

Computing All Candidate Keys of a Relation R.

Given a relation schema R(A1, A2, ..., An) and a set of functional dependencies F which hold true on R, how can we compute all candidate keys of R?

Since each candidate key must be a minimal subset Z of {A1, ..., An} such that Z+ = R, we have the following straightforward (and brute-force) algorithm:

(1) Construct a list L consisting of all non-empty subsets of {A1, ..., An} (there are 2^n − 1 of them). These subsets are arranged in L in ascending order of the size of the subset: we get L = 〈Z1, Z2, ..., Zk〉 with k = 2^n − 1, such that |Zi| ≤ |Z(i+1)|. Here |Zi| denotes the number of elements in the set Zi.

(2) Initialize the set K = {} (K will contain all CK’s of R).
While L is not empty, remove the first element Zi from L, and compute Zi+.
If Zi+ = R, then
(a) Add Zi to K
(b) Remove any element Zj from L if Zi ⊂ Zj (because Zj is too big, it cannot be a CK).

(3) Output K as the final result.
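The three steps above can be sketched as follows. This is an illustrative sketch, not code from the slides: fd's are assumed to be pairs of single-letter attribute strings, and instead of materialising the list L and deleting supersets from it, the loop skips any subset that contains an already-found key, which has the same effect as step (2b).

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys_brute_force(R, fds):
    """Test subsets of R in ascending size, skipping supersets of found keys."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            Z = set(combo)
            # Same effect as step (2b): a superset of a known key is too big.
            if any(k <= Z for k in keys):
                continue
            if closure(Z, fds) == set(R):
                keys.append(Z)
    return ["".join(sorted(k)) for k in keys]

F = [("AB", "E"), ("AD", "B"), ("B", "C"), ("C", "D")]
print(candidate_keys_brute_force("ABCDE", F))  # ['AB', 'AC', 'AD']
```

On the earlier example R(ABCDE) this returns the three candidate keys AB, AC and AD named in the text.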



ANALYSIS: The brute-force method above is correct but not efficient: the list L is too big (exponential in the number of attributes in R).

IDEA:
• Focus on "necessary" attributes that will definitely appear in ANY candidate key of R.
• Ignore "useless" attributes that will NEVER be part of a candidate key.

Necessary attributes:
An attribute A is said to be a necessary attribute if
(a) A occurs only in the L.H.S. (left hand side) of the fd’s in F; or
(b) A is an attribute in relation R, but A does not occur in either L.H.S. or R.H.S. of any fd in F.
In other words, necessary attributes NEVER occur in the R.H.S. of any fd in F.

Useless attributes:
An attribute A is a useless attribute if A occurs ONLY in the R.H.S. of fd’s in F.

Middle-ground attributes:
An attribute A in relation R is a middle-ground attribute if A is neither necessary nor useless.

Example.
Consider the relation R(ABCDEG) with set of fd’s F = {AB → C,
C → D, AD → E}

Necessary    Useless    Middle-ground
A, B, G      E          C, D
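The three-way partition follows directly from which side of the fd's each attribute appears on. Below is a small sketch under the assumption that fd's are pairs of single-letter attribute strings; `classify` is a hypothetical helper name.

```python
def classify(R, fds):
    """Split the attributes of R into necessary, useless and middle-ground sets."""
    lhs = set().union(*(set(l) for l, _ in fds))
    rhs = set().union(*(set(r) for _, r in fds))
    necessary = set(R) - rhs   # never on a right-hand side
    useless = rhs - lhs        # only on right-hand sides
    middle = set(R) - necessary - useless
    return necessary, useless, middle

F = [("AB", "C"), ("C", "D"), ("AD", "E")]
necessary, useless, middle = classify("ABCDEG", F)
print(sorted(necessary), sorted(useless), sorted(middle))
# ['A', 'B', 'G'] ['E'] ['C', 'D']
```

The output matches the table for R(ABCDEG): G is necessary because it occurs in no fd at all.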

An important observation about necessary attributes is: a necessary attribute will appear in every CK of R, and thus ALL necessary attributes must appear in every CK of R.

If X is the collection of ALL necessary attributes and X+ = all attributes of R, then X must be the ONLY candidate key of R. (Think: Why is that?)

Thus we should first check the necessary attribute closure X+, and terminate the CK computing algorithm when X+ = R. On the other hand, we also notice that useless attributes can never be part of any CK. Therefore, in case X+ ≠ R, and we have to enumerate subsets Zi of R to find CK’s, the Zi’s should NOT contain any useless attributes, and each Zi should contain all the necessary attributes. In fact, the list L of subsets to test should be constructed from all non-empty subsets of the middle-ground attributes, with each subset expanded to include all necessary attributes.

The algorithm for computing all candidate keys of R.

Input: A relation R = {A1, A2, ..., An}, and F, a set of functional dependencies.
Output: K = {K1, ..., Kt}, the set of all candidate keys of R.

Step 1.
Set F′ to a minimal cover of F (this is needed because otherwise we may not detect all useless attributes).

Step 2.
Partition all attributes in R into necessary, useless and middle-ground attribute sets according to F′. Let X = {C1, ..., Cl} be the necessary attribute set, Y = {B1, ..., Bk} be the useless attribute set, and M = {A1, ..., An} − (X ∪ Y) be the middle-ground attribute set.
If X = {}, then go to Step 4.

Step 3.
Compute X+. If X+ = R, then set K = {X}, terminate.

Step 4.
Let L = 〈Z1, Z2, ..., Zm〉 be the list of all non-empty subsets of M (the middle-ground attributes) such that L is arranged in ascending order of the size of Zi. Add all attributes in X (necessary attributes) to each Zi in L.

Set K = {}.
i ← 0.
WHILE L ≠ empty DO
BEGIN
    i ← i + 1.
    Remove the first element Z from L.
    Compute Z+.
    IF Z+ = R THEN
    BEGIN
        K ← K ∪ {Z};
        for any Zj ∈ L, if Z ⊂ Zj then L ← L − {Zj}.
    END
END
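Steps 2 to 4 can be sketched in code as follows. This is an illustrative sketch that assumes the caller has already performed Step 1 and passes in a minimal cover F′; fd's are pairs of single-letter attribute strings, and the names `candidate_keys` and `min_cover` are hypothetical. As in the brute-force version, supersets of found keys are skipped rather than deleted from an explicit list.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, min_cover):
    """All candidate keys of R, given fds that already form a minimal cover."""
    R = set(R)
    lhs = set().union(*(set(l) for l, _ in min_cover))
    rhs = set().union(*(set(r) for _, r in min_cover))
    X = R - rhs                  # Step 2: necessary attributes
    M = R - X - (rhs - lhs)      # Step 2: middle-ground attributes
    if closure(X, min_cover) == R:
        return ["".join(sorted(X))]          # Step 3: X is the only key
    keys = []
    for size in range(1, len(M) + 1):        # Step 4: ascending subset size
        for combo in combinations(sorted(M), size):
            Z = X | set(combo)
            if any(k <= Z for k in keys):    # superset of a key: too big
                continue
            if closure(Z, min_cover) == R:
                keys.append(Z)
    return ["".join(sorted(k)) for k in keys]

F_prime = [("A", "B"), ("A", "D"), ("B", "C"), ("C", "E"), ("BD", "A")]
print(candidate_keys("ABCDEG", F_prime))  # ['AG', 'BDG']
```

The usage line runs the sketch on R(ABCDEG) with the minimal cover {A → B, A → D, B → C, C → E, BD → A} and finds the keys AG and BDG.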


Example. (Computing all candidate keys of R.)

Let R = R(ABCDEG) and F = {AB → CD, A → B, B → C, C → E, BD → A}. The process to compute all candidate keys of R is as follows:

(1) The minimal cover of F is {A → B, A → D, B → C, C → E, BD → A}.

(2) Since attribute G never appears in any fd in the set of functional dependencies, G must be included in every candidate key of R. The attribute E appears only in the right hand side of fd’s and hence E is not in any key of R. No attribute of R appears only in the left hand side of the set of fd’s. Therefore X = {G} at the end of Step 2.

(3) Compute G+ = G ≠ R, so G alone is not a candidate key.

(4) The following table shows Z, Z+, L and K after each iteration of the WHILE statement (row 0 is the initial state before the first iteration).

i   Z     Z+            L                                                K
0   −     −             〈AG, BG, CG, DG, ABG, ACG, ADG, BCG, BDG, CDG,   {}
                         ABCG, ABDG, ACDG, BCDG, ABCDG〉
1   AG    ABCDEG = R    〈BG, CG, DG, BCG, BDG, CDG, BCDG〉                {AG}
2   BG    BCEG ≠ R      〈CG, DG, BCG, BDG, CDG, BCDG〉                    {AG}
3   CG    CEG ≠ R       〈DG, BCG, BDG, CDG, BCDG〉                        {AG}
4   DG    DG ≠ R        〈BCG, BDG, CDG, BCDG〉                            {AG}
5   BCG   BCEG ≠ R      〈BDG, CDG, BCDG〉                                 {AG}
6   BDG   ABCDEG = R    〈CDG〉                                            {AG, BDG}
7   CDG   CDEG ≠ R      〈〉                                               {AG, BDG}
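Step (1) of this example, the minimal cover, can also be checked mechanically. The sketch below uses the usual three-phase procedure (split right-hand sides, drop extraneous left-hand-side attributes, drop redundant fd's); it is an illustration under the assumption of single-letter attributes, and `minimal_cover` is a hypothetical name. A minimal cover is not unique in general, so a different processing order could return a different but equivalent set.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, an iterable of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Phase 1: one attribute on each right-hand side.
    split = [(frozenset(l), a) for l, r in fds for a in r]
    # Phase 2: drop left-hand-side attributes that are not needed.
    reduced = []
    for lhs, a in split:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, split):
                lhs.remove(b)
        fd = (frozenset(lhs), a)
        if fd not in reduced:
            reduced.append(fd)
    # Phase 3: drop fd's that follow from the remaining ones.
    cover = list(reduced)
    for fd in reduced:
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return {("".join(sorted(l)), a) for l, a in cover}

F = [("AB", "CD"), ("A", "B"), ("B", "C"), ("C", "E"), ("BD", "A")]
print(sorted(minimal_cover(F)))
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('BD', 'A'), ('C', 'E')]
```

The result agrees with the minimal cover stated in step (1) of the example.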



GOALS OF NORMALIZATION

• Let R(A1, ..., An) be a relation schema with a set F of functional dependencies.

• Decide whether a relation schema R is in "good" form.

• In the case that a relation schema R is not in "good" form, decompose it into a set of smaller relation schemas {R1, R2, ..., Rm} such that each relation schema Rj is in "good" form (such as 3NF or BCNF).

• Moreover, the decomposition must be a lossless-join (LLJ) decomposition.

• Preferably, the decomposition should be functional dependency preserving.

LOSS-LESS-JOIN DECOMPOSITION

The decomposition

SP′(S#, P#, QTY, STATUS) = SP(S#, P#, QTY) ⊗ S′(S#, STATUS)

is called a loss-less-join (LLJ) decomposition, or we say that the decomposition is loss-less, because the join of the component relations SP(S#, P#, QTY) and S′(S#, STATUS) gives back the original relation SP′(S#, P#, QTY, STATUS).

In general, a decomposition of relation schema R into R1, ..., Rm is an LLJ decomposition if the natural join of R1, ..., Rm equals the original relation R:

R = R1 ⊗ R2 ⊗ ... ⊗ Rm

We must be very careful in performing the decomposition; otherwise the loss-less property may not be maintained.

Intuitively, the decomposition SP′ = SP ⊗ S′ is loss-less because the attribute "S#", which is common to SP and S′, uniquely determines the attribute "STATUS".

Theoretically, we have the following theorem, which gives a sufficient condition for a decomposition to be loss-less.

Heath Theorem. For a relation R with attributes A, B, C and functional dependency A → B, the decomposition

R(A, B, C) = R1(A, B) ⊗ R2(A, C)

is always loss-less.
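The effect of the theorem can be seen on a small relation instance. The sketch below uses made-up sample rows for SP′ (the values are illustrative only and are not data from these slides) in which S# → STATUS holds; `make_row`, `project` and `natural_join` are hypothetical helpers written for this illustration.

```python
def make_row(mapping):
    """Represent a tuple as (attribute, value) pairs sorted by attribute name."""
    return tuple(sorted(mapping.items()))

def project(rows, attrs):
    """Projection of a set of rows onto attrs (duplicates collapse)."""
    return {tuple((a, v) for a, v in r if a in attrs) for r in rows}

def natural_join(r1, r2):
    """Natural join: combine rows that agree on all shared attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(make_row({**d1, **d2}))
    return joined

# Made-up sample data in which S# -> STATUS holds.
sp_prime = {
    make_row({"S#": "s1", "P#": "p1", "QTY": 300, "STATUS": 20}),
    make_row({"S#": "s1", "P#": "p2", "QTY": 200, "STATUS": 20}),
    make_row({"S#": "s2", "P#": "p1", "QTY": 100, "STATUS": 20}),
}

# Split on S# -> STATUS, as Heath's theorem suggests: the join is loss-less.
sp = project(sp_prime, {"S#", "P#", "QTY"})
s = project(sp_prime, {"S#", "STATUS"})
print(natural_join(sp, s) == sp_prime)   # True

# Split on STATUS, which determines nothing: the join adds spurious rows.
t1 = project(sp_prime, {"S#", "STATUS"})
t2 = project(sp_prime, {"P#", "QTY", "STATUS"})
print(len(natural_join(t1, t2)))         # 6
```

The second decomposition joins back to six rows instead of the original three, which is what "lossy" means here: information about which rows were really present is lost.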

Functional Dependency-Preserving Decomposition

The two most desirable properties of any decomposition of a relation R are:

(1) loss-less, and
(2) fd-preserving.

• Let R be a relation schema with functional dependencies F.

R decomposed into R1, R2, ..., Rm

F projected into F1, F2, ..., Fm

• The decomposition is LLJ:

R = R1 ⊗ R2 ⊗ ... ⊗ Rm

• The decomposition is fd-preserving:

F1 ∪ F2 ∪ ... ∪ Fm ≡ F.

Here Fj is the projection of F onto Rj, i.e., Fj = {X → Y | X → Y ∈ F+, and X ⊆ Rj, Y ⊆ Rj}.

Fj is the set of fd’s from the closure of F such that the fd’s contain only attributes from Rj.
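Because Fj is defined through F+, it can contain fd's that are not listed in F itself. The following brute-force sketch computes a projection by taking the closure of every subset of Rj; it is an illustration only (exponential in the size of Rj), fd's are assumed to be pairs of single-letter attribute strings, and the schema R(ABC) with F = {A → B, B → C} is a made-up example, not one from these slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(Rj, fds):
    """Nontrivial fd's X -> Y of F+ with X and Y inside Rj, as {X: Y}."""
    Rj = set(Rj)
    projected = {}
    for size in range(1, len(Rj) + 1):
        for combo in combinations(sorted(Rj), size):
            X = set(combo)
            Y = (closure(X, fds) & Rj) - X   # everything X determines inside Rj
            if Y:
                projected["".join(combo)] = "".join(sorted(Y))
    return projected

F = [("A", "B"), ("B", "C")]
print(project_fds("AC", F))  # {'A': 'C'}
print(project_fds("AB", F))  # {'A': 'B'}
```

Projecting onto (AC) yields A → C, which is in F+ by transitivity even though it does not appear in F.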

FIRST, SECOND NORMAL FORM

First normal form. A relation R is said to be in first normal form if all the underlying domains of R contain only atomic values. We write 1NF as a shorthand for first normal form.

Non-key attribute. An attribute A in relation R is said to be a non-key attribute if it is not a subkey, i.e., A is not a component of any candidate key K of R.

Second normal form. A relation R is said to be in second normal form if R is in 1NF and every non-key attribute is fully dependent on each candidate key K of R. In Example 1, the relation SP′ has only one candidate key (S#, P#). The attribute STATUS is a non-key attribute of SP′ and it is not fully dependent on (S#, P#), therefore SP′ is not in 2NF.

Although relations in 2NF have better properties than those in 1NF, there are still problems with them.

These problems motivate the search for a more "ideal" form of relations, and hence the 3NF and BCNF normal forms are identified.

THIRD NORMAL FORM AND 3NF DECOMPOSITION

A relation schema R with a set F of fd’s is said to be in third normal form (3NF) if for each fd X → A ∈ F (here A is a single attribute), we have

either X is a superkey

or A is a prime.

A single attribute A is called a prime if A is a subkey, i.e., A is a component of a key.

Example. The relation (ADB) with F = {AD → B, B → D} is in 3NF, because AD and AB are its candidate keys, so every attribute is a prime. Thus the fd B → D does not cause a violation of the 3NF condition: although B is not a superkey, D is a prime.
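The 3NF condition can be tested directly from this definition. The sketch below is an illustration that finds the prime attributes by brute-force key enumeration, so it only suits small schemas; fd's are assumed to be pairs of single-letter attribute strings, `is_3nf` is a hypothetical name, and the second schema R(ABC) is a made-up counter-example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, fds):
    """All candidate keys of R by brute force (ascending subset size)."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            Z = set(combo)
            if not any(k <= Z for k in keys) and closure(Z, fds) == set(R):
                keys.append(Z)
    return keys

def is_3nf(R, fds):
    prime = set().union(*candidate_keys(R, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(R):
            continue                          # X is a superkey
        if not (set(rhs) - set(lhs)) <= prime:
            return False                      # some non-prime A with X -> A
    return True

print(is_3nf("ADB", [("AD", "B"), ("B", "D")]))  # True
print(is_3nf("ABC", [("A", "B"), ("B", "C")]))   # False
```

In the second call the only candidate key is A, so B → C has neither a superkey on the left nor a prime on the right.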

Bernstein’s Theorem. Every relation R has a loss-less, fd-preserving 3NF decomposition.

BERNSTEIN’S ALGORITHM

Input: A relation schema R with given attributes and a set F of functional dependencies that hold on R such that F is already a minimal cover of itself.

Output: A set of 3NF relations that form a loss-less and fd-preserving decomposition of R.

Algorithm:

1. Group together all fd’s which have the same L.H.S. If X → Y1, X → Y2, ..., X → Yk are all the fd’s with the SAME L.H.S. X, then replace all of them by the single fd X → Y1 Y2 ... Yk.

2. For each fd X → Y in F, form the relation (XY) in the decomposition.

3. If a relation (X′Y′) is contained in another relation (XY) of the decomposition, i.e., X′Y′ ⊂ XY, then remove the relation (X′Y′) from the decomposition.

4. If none of the relations obtained after step (3) contains a candidate key of the original relation R, then find a candidate key K of R and add the relation (K) to the decomposition.

Example. This example shows the need for steps 1, 3 and 4.

Given R(ABCDE) and F = {A → B, A → C, C → A, BD → E}.

Step 1. {A → BC, C → A, BD → E}.

Step 2. R1(ABC), R2(CA), R3(BDE)

Step 3. R1(ABC), R3(BDE)

Step 4. R1(ABC), R3(BDE) and R4(AD)

So the final decomposition is:

R(ABCDE) = R1(ABC) ⊗ R3(BDE) ⊗ R4(AD)

Note that we add the relation R4(AD) to the final decomposition in step 4 of the algorithm because the relation R has two CK’s, AD and CD, and neither is contained in R1(ABC) or R3(BDE). So we need to add either (AD) or (CD) (not both, though) to the final list of relations.
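The four steps can be sketched in code as follows. This is an illustrative sketch that assumes the input fd's already form a minimal cover and are given as pairs of single-letter attribute strings; `bernstein` and `one_candidate_key` are hypothetical names. Step 4 here finds a key by greedily dropping attributes from R, which on this example produces CD rather than AD; the text above allows either.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def one_candidate_key(R, fds):
    """Shrink R one attribute at a time while it still determines all of R."""
    key = set(R)
    for a in sorted(R):
        if closure(key - {a}, fds) == set(R):
            key.discard(a)
    return key

def bernstein(R, min_cover):
    # Step 1: group fd's by left-hand side.
    grouped = {}
    for lhs, rhs in min_cover:
        grouped.setdefault(frozenset(lhs), set()).update(rhs)
    # Step 2: one schema XY per grouped fd.
    schemas = [set(lhs) | rhs for lhs, rhs in grouped.items()]
    # Step 3: drop schemas contained in another schema (and exact duplicates).
    kept = []
    for s in schemas:
        if any(s < t for t in schemas) or s in kept:
            continue
        kept.append(s)
    # Step 4: a schema contains a candidate key iff its closure is all of R.
    if not any(closure(s, min_cover) == set(R) for s in kept):
        kept.append(one_candidate_key(R, min_cover))
    return ["".join(sorted(s)) for s in kept]

F = [("A", "B"), ("A", "C"), ("C", "A"), ("BD", "E")]
print(bernstein("ABCDE", F))  # ['ABC', 'BDE', 'CD']
```

The first two schemas match R1(ABC) and R3(BDE) from the example; the third is the alternative key relation (CD).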

BOYCE-CODD NORMAL FORM (BCNF)

• A relation schema R is in BCNF if, for every nontrivial functional dependency X → Y that holds on R, the attribute set X is a superkey. An fd X → Y is trivial if Y is a subset of X, i.e., Y ⊆ X.

• Given a relation schema R and a set F of functional dependencies that hold on R, how can we tell whether R is in BCNF or not?
We only need to check, for each fd X → Y in F, whether the L.H.S. attribute set X is a superkey, i.e., whether X+ = R.

Consider the relation schema R(ABCDE) and the fd’s F = {AB → E, AD → B, B → C, C → D}. Relation R has 3 candidate keys AB, AC, AD. Obviously R is not in BCNF, because we have the fd’s B → C and C → D in R, but B is not a superkey of R, and neither is C. By using the Heath Theorem, we can obtain an LLJ decomposition of R into three BCNF relations as follows:

R(ABCDE)
├── R1(CD)          (split off by C → D)
└── R′(ABCE)
    ├── R2(BC)      (split off by B → C)
    └── R3(ABE)

The decomposition is given by R(ABCDE) = R1(CD) ⊗ R2(BC) ⊗ R3(ABE).
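The BCNF test for R itself (check X+ = R for the left-hand side of every nontrivial fd in F) is short enough to sketch directly. The code assumes fd's are pairs of single-letter attribute strings; `bcnf_violations` is a hypothetical name.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(R, fds):
    """Nontrivial fd's in F whose left-hand side is not a superkey of R."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(R)]

F = [("AB", "E"), ("AD", "B"), ("B", "C"), ("C", "D")]
print(bcnf_violations("ABCDE", F))  # [('B', 'C'), ('C', 'D')]
```

An empty list would mean R is in BCNF; here the two reported fd's are exactly the ones used for the splits in the decomposition above.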

Algorithm to decompose a relation R into a set of BCNF relations

Input: Relation R and set of functional dependencies F (F is minimal)
Output: Result = {R1, R2, ..., Rm} such that
(A) Each Ri is in BCNF
(B) The decomposition is a loss-less-join decomposition

1. Group together all fd’s which have the same L.H.S. If X → Y1, X → Y2, ..., X → Yk are in F, then replace all of them by a single fd X → Y1 Y2 ... Yk.
   result := {R};
   done := FALSE;
   Compute F+;
2. WHILE (NOT done) DO
       IF (there is a schema Ri in result that is not in BCNF)
       THEN BEGIN
           Let X → Y be a nontrivial functional dependency in F+ that holds on Ri such that X → Ri is not in F+;
           result := (result − {Ri}) ∪ {(Ri − Y), (XY)};
           (this means to break Ri into two relations Ri − Y and XY)
       END
       ELSE done := TRUE;

Example. (LLJ decomposition of R into a set of BCNF relations.) Let R = R(ABCDEGH) and F = {B → E, B → H, E → A, E → D, AH → C}. The decomposition according to the given algorithm is as follows:

(1) After grouping the R.H.S. of the fd’s together, we get

{B → EH, E → AD, AH → C}.

(2) The only candidate key of R is BG. We have the following LLJ decomposition:

R(ABCDEGH)
├── R1(AHC)             (split off by AH → C)
└── R′(ABDEGH)
    ├── R2(EAD)         (split off by E → AD)
    └── R′′(BEGH)
        ├── R3(BEH)     (split off by B → EH)
        └── R4(BG)

It is easy to verify that the four resulting relations are all in BCNF, and

R(ABCDEGH) = (AHC) ⊗ (EAD) ⊗ (BEH) ⊗ (BG).

Also we can see that all the functional dependencies in F are preserved.
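The decomposition in this example can be reproduced with a short recursive sketch. It is an illustration under several assumptions: fd's are pairs of single-letter attribute strings; left-hand sides are tried in the order the fd's are passed in, and then every subset of the schema is tried so that violations caused by implied fd's from F+ are also caught; and each split uses Y = (X+ ∩ Ri) − X rather than the Y written in F. The name `bcnf_decompose` is hypothetical. Different orders can give different, equally valid BCNF decompositions, so the fd's are passed in the order the example applies them.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_decompose(R, fds):
    """Split R until every schema is in BCNF; returns a list of frozensets."""
    R = frozenset(R)
    # Left-hand sides from F first (in the given order), then all proper subsets.
    candidates = [frozenset(l) for l, _ in fds]
    candidates += [frozenset(c) for n in range(1, len(R))
                   for c in combinations(sorted(R), n)]
    for X in candidates:
        if not X <= R:
            continue
        X_plus = closure(X, fds)
        Y = (X_plus & R) - X
        if Y and not R <= X_plus:   # nontrivial X -> Y, X not a superkey of R
            return bcnf_decompose(X | Y, fds) + bcnf_decompose(R - Y, fds)
    return [R]

F = [("AH", "C"), ("E", "AD"), ("B", "EH")]
result = bcnf_decompose("ABCDEGH", F)
print(["".join(sorted(s)) for s in result])  # ['ACH', 'ADE', 'BEH', 'BG']
```

The four schemas are R1(AHC), R2(EAD), R3(BEH) and R4(BG) from the example, printed with their attributes in alphabetical order.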

PROBLEM WITH BCNF DECOMPOSITION

A problem with BCNF is that we may not be able to find a BCNF decomposition of a relation R such that the decomposition has the properties of being both loss-less and fd-preserving.

We know that we can always find an LLJ decomposition of a relation R into a set of BCNF relations. However, we may not find a BCNF decomposition of a relation R such that the decomposition is both LLJ AND fd-preserving.

For example, for relation schema R(ABD) with functional dependencies F = {AD → B, B → D}, R is NOT in BCNF, because in the fd B → D, attribute B is NOT a superkey of R. So we use the Heath Theorem to get the decomposition R(ADB) = R1(BD) ⊗ R2(BA). However, the fd AD → B is not preserved by this decomposition. One cannot verify AD → B by looking at the tuples of either R1(BD) or R2(BA) alone. This means that we must reconstruct the original relation R(ADB) to verify AD → B, which is undesirable.
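Whether a decomposition preserves the fd's can be tested without writing out the projections F1, ..., Fm explicitly. The sketch below uses the standard closure-based test: for each fd X → Y in F, grow X using only what each schema can see, and report the fd as lost if Y is never reached. It assumes fd's are pairs of single-letter attribute strings; `lost_fds` is a hypothetical name.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lost_fds(fds, schemas):
    """fd's of F that are not implied by the union of the projections."""
    lost = []
    for lhs, rhs in fds:
        Z = set(lhs)
        changed = True
        while changed:
            changed = False
            for Ri in schemas:
                # What Z determines when only attributes of Ri are visible.
                gained = closure(Z & set(Ri), fds) & set(Ri)
                if not gained <= Z:
                    Z |= gained
                    changed = True
        if not set(rhs) <= Z:
            lost.append((lhs, rhs))
    return lost

F = [("AD", "B"), ("B", "D")]
print(lost_fds(F, ["BD", "AB"]))  # [('AD', 'B')]
```

For the decomposition R1(BD), R2(BA) the test reports AD → B as lost, in line with the discussion above, while B → D survives inside R1.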
