Unit 3
1. FUNCTIONAL DEPENDENCY
Let R be a relation schema, and let α ⊆ R and β ⊆ R.
The functional dependency, denoted by α → β, between two sets of attributes α and β that are subsets of R specifies a
constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r
that have t1[α] = t2[α], they must also have t1[β] = t2[β].
Alternatively, the values of the α component of a tuple uniquely (or functionally) determine the values of the β component. We
also say that there is a functional dependency from α to β, or that β is functionally dependent on α. The abbreviation for
functional dependency is FD. The set of attributes α is called the left-hand side of the FD, and β is called the right-hand side.
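The definition can be checked mechanically on a relation instance. The following is a minimal sketch, assuming a relation instance is given as a list of dictionaries; the function name fd_holds and the sample instance are hypothetical and made up for illustration only.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in the given relation instance.

    rows: list of dicts mapping attribute name to value (one dict per tuple).
    lhs, rhs: lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical instance of r(A, B), made up for illustration only.
r = [
    {"A": 1, "B": 4},
    {"A": 1, "B": 5},
    {"A": 3, "B": 7},
]

print(fd_holds(r, ["A"], ["B"]))  # False: two tuples share A = 1 but differ on B
print(fd_holds(r, ["B"], ["A"]))  # True: no two tuples share a B value
```

Note that such a check can only show that an FD is violated by one particular instance. Whether an FD holds on the schema is a statement about all legal instances and comes from the meaning of the attributes.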
Example:
Consider r(A,B ) with the following instance of r.
Example:
Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the
schema:
bor_loan (customer_id, loan_number, amount ).
We expect this functional dependency to hold:
loan_number → amount
but would not expect the following to hold:
amount → customer_id
Example:
EMP_PROJ (eno, pnumber, hours, ename, pname, plocation)
In the EMP_PROJ relation, the following functional dependencies should hold:
{eno, pnumber} → hours
eno → ename
pnumber → {pname, plocation}
2. CLOSURE OF FD
Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.
The set of all functional dependencies that include F as well as all dependencies that can be inferred from F is the closure of F.
We denote the closure of F by F+. F+ is a superset of F.
Example,
Suppose that we specify the following set F of obvious functional dependencies on the relation schema
EMP_DEPT(ENAME, SSN, DOB, ADDRESS, DNUMBER, DNAME, DMGRSSN):
F = {SSN → {ENAME, DOB, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}
Some of the additional functional dependencies that we can infer from F are the following:
SSN → {DNAME, DMGRSSN}
SSN → SSN
DNUMBER → DNAME
We can use the following three rules to find logically implied functional dependencies. By applying these rules repeatedly, we
can find all of F+, given F. This collection of rules is called Armstrong’s axioms, in honor of the person who first proposed them.
Reflexivity rule. If α is a set of attributes and β ⊆ α, then α → β holds.
Augmentation rule. If α → β holds and γ is a set of attributes, then γα → γβ holds.
Transitivity rule. If α → β holds and β → γ holds, then α → γ holds.
Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because,
for a given set F of functional dependencies, they allow us to generate all of F+.
Although Armstrong’s axioms are complete, it is tiresome to use them directly for the computation of F+. To simplify matters
further, there are some additional rules.
Union rule. If α → β holds and α → γ holds, then α → βγ holds.
Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.
Pseudotransitivity rule. If α → β holds and γβ → δ holds, then αγ → δ holds.
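The additional rules add no new power: each of them can be derived from Armstrong's axioms. As an illustration, the following worked derivation sketches how the union rule follows from augmentation and transitivity alone, using the same symbols as the rules above.

```latex
\begin{align*}
&\alpha \to \beta && \text{given} \\
&\alpha \to \alpha\beta && \text{augmentation of } \alpha \to \beta \text{ with } \alpha \text{ (note } \alpha\alpha = \alpha\text{)} \\
&\alpha \to \gamma && \text{given} \\
&\alpha\beta \to \beta\gamma && \text{augmentation of } \alpha \to \gamma \text{ with } \beta \\
&\alpha \to \beta\gamma && \text{transitivity of } \alpha \to \alpha\beta \text{ and } \alpha\beta \to \beta\gamma
\end{align*}
```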
Example
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
Some members of F+:
A → H
by transitivity from A → B and B → H.
AG → I
by augmenting A → C with G to get AG → CG, and then transitivity with CG → I.
The functional dependency AG → I can also be inferred by using the pseudotransitivity rule on A → C and CG → I.
CG → HI
by augmenting CG → I with CG to infer CG → CGI, augmenting CG → H with I to infer CGI → HI, and then applying
transitivity to CG → CGI and CGI → HI.
This functional dependency can also be inferred by using the union rule on CG → H and CG → I.
3. CLOSURE OF ATTRIBUTE
Let α be a set of attributes. The closure of α under F (denoted by α+) is the set of attributes that are functionally determined by α
under F. The following algorithm computes α+:
result := α;
while (changes to result) do
    for each β → γ in F do
        begin
            if β ⊆ result then result := result ∪ γ
        end
Example:
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
(AG)+ = ABCGHI
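The attribute closure algorithm translates almost line for line into code. The following is a minimal sketch, assuming each FD is represented as a (left side, right side) pair and each attribute is a single letter, so that a string such as "CG" stands for the attribute set {C, G}. The function name closure is our own choice.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a set of FDs.

    attrs: iterable of attribute names.
    fds: list of (lhs, rhs) pairs, each side an iterable of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The example above: R = (A, B, C, G, H, I).
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI
```

Because (AG)+ contains every attribute of R, AG is a superkey of R. The same function also tests whether α → β is in F+: it is exactly when β ⊆ α+.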
Example:
F = {SSN → ENAME,
PNUMBER → {PNAME, PLOCATION},
{SSN, PNUMBER} → HOURS}
Using the attribute closure algorithm, we calculate the following closure sets with respect to F:
{SSN }+ = {SSN, ENAME}
{PNUMBER }+ = {PNUMBER, PNAME, PLOCATION}
{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}
Sets of functional dependencies may have redundant dependencies that can be inferred from the others.
For example: A → C is redundant in {A → B, B → C, A → C}.
Parts of a functional dependency may be redundant.
E.g.: on RHS: {A → B, B → C, A → CD} can be simplified to
{A → B, B → C, A → D}
E.g.: on LHS: {A → B, B → C, AC → D} can be simplified to
{A → B, B → C, A → D}
Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant
dependencies or redundant parts of dependencies.
Extraneous Attributes
Consider a set F of functional dependencies and the functional dependency α → β in F.
Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
Attribute A is extraneous in β if A ∈ β and the set of functional dependencies
(F – {α → β}) ∪ {α → (β – A)} logically implies F.
Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always
implies a weaker one.
Example: Given F = {A → C, AB → C}
B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e., the result of dropping B from AB → C).
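Both definitions reduce to an attribute closure computation. The following is a minimal sketch under that observation; the helper names extraneous_in_lhs and extraneous_in_rhs are hypothetical, and attributes are assumed to be single letters so that "AB" stands for {A, B}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(a, fd, fds):
    """True if attribute a can be dropped from the left side of fd."""
    lhs, rhs = fd
    reduced = set(lhs) - {a}
    # F implies (lhs - a) -> rhs exactly when rhs lies inside (lhs - a)+ under F.
    return set(rhs) <= closure(reduced, fds)

def extraneous_in_rhs(a, fd, fds):
    """True if attribute a can be dropped from the right side of fd."""
    lhs, rhs = fd
    weakened = [f for f in fds if f != fd] + [(lhs, set(rhs) - {a})]
    # The weakened set implies F exactly when it still derives lhs -> a.
    return a in closure(lhs, weakened)

F = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs("B", ("AB", "C"), F))  # True
print(extraneous_in_lhs("A", ("AB", "C"), F))  # False

F2 = [("A", "B"), ("B", "C"), ("A", "CD")]
print(extraneous_in_rhs("C", ("A", "CD"), F2))  # True
print(extraneous_in_rhs("D", ("A", "CD"), F2))  # False
```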
Canonical Cover
A canonical cover for F is a set of dependencies Fc such that
F logically implies all dependencies in Fc, and
Fc logically implies all dependencies in F, and
No functional dependency in Fc contains an extraneous attribute, and
Each left side of a functional dependency in Fc is unique.
Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied
Example:
Computing a Canonical Cover
R = (A, B, C)
F = {A → BC, B → C, A → B, AB → C}
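One possible way to compute a canonical cover for this F is sketched below. It is an illustration built from the definitions above (merge equal left sides with the union rule, then repeatedly delete one extraneous attribute), not necessarily the exact procedure these notes intend. The function names merge and canonical_cover are hypothetical, and attributes are assumed to be single letters.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def merge(fds):
    """Union rule: combine FDs with equal left sides; drop empty right sides."""
    merged = {}
    for lhs, rhs in fds:
        merged.setdefault(frozenset(lhs), set()).update(rhs)
    return [(set(lhs), rhs) for lhs, rhs in merged.items() if rhs]

def canonical_cover(fds):
    cover = merge(fds)
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(cover):
            others = cover[:i] + cover[i + 1:]
            # Try to drop one extraneous attribute from the right side.
            for a in sorted(rhs):
                if a in closure(lhs, others + [(lhs, rhs - {a})]):
                    cover[i] = (lhs, rhs - {a})
                    changed = True
                    break
            if changed:
                break
            # Try to drop one extraneous attribute from the left side.
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                    cover[i] = (lhs - {a}, rhs)
                    changed = True
                    break
            if changed:
                break
        # Re-apply the union rule after every deletion.
        cover = merge(cover)
    return cover

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```

In this run, A → BC and A → B are first merged into A → BC, then C is found extraneous in A → BC, and finally AB → C becomes empty on the right and is dropped. A canonical cover is not unique in general, so a different deletion order may produce a different but equivalent result for other inputs.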
Figure 1. Normalization into 1NF. (a) A relation schema that is not in 1NF. (b) Example state of relation DEPARTMENT.
(c) 1NF version of same relation with redundancy.
There are three main techniques to achieve first normal form for such a relation:
1. Remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation DEPT_LOCATIONS along
with the primary key DNUMBER of DEPARTMENT. The primary key of this relation is the combination
{DNUMBER, DLOCATION}. A distinct tuple in DEPT_LOCATIONS exists for each location of a department. This
decomposes the non-1NF relation into two 1NF relations.
2. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location of a
DEPARTMENT, as shown in Figure 1(c). In this case, the primary key becomes the combination {DNUMBER,
DLOCATION}. This solution has the disadvantage of introducing redundancy in the relation.
3. If a maximum number of values is known for the attribute (for example, if it is known that at most three locations can
exist for a department), replace the DLOCATIONS attribute by three atomic attributes: DLOCATION1,
DLOCATION2, and DLOCATION3. This solution has the disadvantage of introducing null values if most departments
have fewer than three locations. It further introduces a spurious semantics about the ordering among the location values
that is not originally intended. Querying on this attribute becomes more difficult; for example, consider how you would
write the query: "List the departments that have "Bellaire" as one of their locations" in this design.
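The first technique, and the query difficulty of the third, can be sketched in a few lines. This is a minimal illustration, assuming relations are held as lists of dictionaries; the sample tuples and the function name to_1nf are hypothetical and made up for illustration only.

```python
# Hypothetical non-1NF DEPARTMENT tuples; the values are made up for illustration.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]

def to_1nf(rows):
    """Technique 1: move the multivalued attribute into its own relation."""
    dept = [{"DNUMBER": r["DNUMBER"], "DNAME": r["DNAME"]} for r in rows]
    dept_locations = [
        {"DNUMBER": r["DNUMBER"], "DLOCATION": loc}
        for r in rows
        for loc in r["DLOCATIONS"]
    ]
    return dept, dept_locations

dept, dept_locations = to_1nf(department)

# "List the departments that have Bellaire as one of their locations."
bellaire = [t["DNUMBER"] for t in dept_locations if t["DLOCATION"] == "Bellaire"]
print(bellaire)  # [5]

# With technique 3 the same query has to inspect every DLOCATIONn column.
wide = {"DNUMBER": 5, "DLOCATION1": "Bellaire", "DLOCATION2": "Sugarland", "DLOCATION3": "Houston"}
print("Bellaire" in (wide["DLOCATION1"], wide["DLOCATION2"], wide["DLOCATION3"]))  # True
```

In DEPT_LOCATIONS each tuple holds exactly one atomic location, so the query is a single comparison on one attribute.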
First normal form also disallows multivalued attributes that are themselves composite. These are called nested relations
because each tuple can have a relation within it. Figure 2 shows how the EMP_PROJ relation could appear if nesting is
allowed. Each tuple represents an employee entity, and a relation PROJS(PNUMBER, HOURS) within each tuple
represents the employee's projects and the hours per week that employee works on each project. The schema of this
EMP_PROJ relation can be represented as follows:
Figure 2. Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a "nested relation" attribute
PROJS. (b) Example extension of the EMP_PROJ relation showing nested relations within each tuple. (c)
Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.
Example
The EMP_PROJ relation in Figure 3 is in 1NF but is not in 2NF. The nonprime attribute ENAME violates 2NF
because of FD2, as do the nonprime attributes PNAME and PLOCATION because of FD3. The functional
dependencies FD2 and FD3 make ENAME, PNAME, and PLOCATION partially dependent on the primary key
{SSN, PNUMBER} of EMP_PROJ, thus violating the 2NF test. The functional dependencies FD1, FD2, and FD3 in
Figure 3 hence lead to the decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in
Figure 3, each of which is in 2NF.
Example:
The relation schema EMP_DEPT in Figure 4 is in 2NF, since no partial dependencies on a key exist. However,
EMP_DEPT is not in 3NF because of the transitive dependency of DMGRSSN (and also DNAME) on SSN via
DNUMBER. We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2
shown in Figure 4.
Figure 4. Normalizing EMP_DEPT into 3NF relations.
Example:
Consider the relation schema LOTS shown in Figure 5(a), which describes parcels of land for sale in various counties of a state.
Suppose that there are two candidate keys: PROPERTY_ID# and {COUNTY_NAME, LOT#}; that is, lot numbers are unique
only within each county, but PROPERTY_ID numbers are unique across counties for the entire state. Based on the two
candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}, we know that the functional dependencies FD1 and FD2 of
Figure 5(a) hold.
Suppose that the following two additional functional dependencies hold in LOTS:
FD3: COUNTY_NAME → TAX_RATE
FD4: AREA → PRICE
Figure 5(a). The LOTS relation with its functional dependencies FD1 through FD4.
The LOTS relation schema violates the general definition of 2NF because TAX_RATE is partially dependent on the candidate
key {COUNTY_NAME, LOT#}, due to FD3. To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and
LOTS2, shown in Figure 5(b). We construct LOTS1 by removing the attribute TAX_RATE that violates 2NF from LOTS and
placing it with COUNTY_NAME (the left-hand side of FD3 that causes the partial dependency) into another relation LOTS2.
Both LOTS1 and LOTS2 are in 2NF. Notice that FD4 does not violate 2NF and is carried over to LOTS1.
Figure 5(b) Decomposing into the 2NF relations LOTS1 and LOTS2. (c) Decomposing LOTS1 into the 3NF relations
LOTS1A and LOTS1B. (d) Summary of the progressive normalization of LOTS.
According to this definition, LOTS2 (Figure 5(b)) is in 3NF. However, FD4 in LOTS1 violates 3NF because AREA is not a
superkey and PRICE is not a prime attribute in LOTS1. To normalize LOTS1 into 3NF, we decompose it into the relation
schemas LOTS1A and LOTS1B shown in Figure 5(c). We construct LOTS1A by removing the attribute PRICE that violates
3NF from LOTS1 and placing it with AREA (the left-hand side of FD4 that causes the transitive dependency) into another
relation LOTS1B. Both LOTS1A and LOTS1B are in 3NF.
5.4 Boyce-Codd normal form
Definition
A relation schema R is in BCNF if whenever a nontrivial functional dependency X → A holds in R, then X is a superkey of
R.
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. That is,
every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. The only difference between
the definitions of BCNF and 3NF is that condition (b) of 3NF, which allows A to be prime, is absent from BCNF.
Example:
In Figure 6(a), FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A. Note that FD5 satisfies 3NF in
LOTS1A because COUNTY_NAME is a prime attribute (condition b), but this condition does not exist in the definition of
BCNF. We can decompose LOTS1A into two BCNF relations LOTS1AX and LOTS1AY, shown in Figure 6(a). This
decomposition loses the functional dependency FD2 because its attributes no longer coexist in the same relation after
decomposition.
Figure 6(a) BCNF normalization of LOTS1A with the functional dependency FD2 being lost in the decomposition. (b) A
schematic relation with FDs; it is in 3NF, but not in BCNF.
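The BCNF test can be sketched with attribute closure: a left side X is a superkey exactly when X+ contains every attribute of the relation. The sketch below makes several assumptions reconstructed from the text, since the figure itself is not reproduced here: LOTS1A is taken to have the attributes PROPERTY_ID, COUNTY_NAME, LOT and AREA (the # suffixes are dropped), FD1 and FD2 are the dependencies from the two candidate keys, and FD5 is AREA → COUNTY_NAME. The function name bcnf_violations is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the nontrivial FDs in fds whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and not set(schema) <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad

# Assumed attribute set and FDs of LOTS1A (reconstructed, see the text above).
LOTS1A = {"PROPERTY_ID", "COUNTY_NAME", "LOT", "AREA"}
FDS = [
    ({"PROPERTY_ID"}, {"COUNTY_NAME", "LOT", "AREA"}),  # FD1
    ({"COUNTY_NAME", "LOT"}, {"PROPERTY_ID", "AREA"}),  # FD2
    ({"AREA"}, {"COUNTY_NAME"}),                        # FD5
]
print(bcnf_violations(LOTS1A, FDS))  # [({'AREA'}, {'COUNTY_NAME'})]
```

Only FD5 is reported, which matches the discussion above: AREA+ is {AREA, COUNTY_NAME}, so AREA is not a superkey of LOTS1A.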
6. DESIRABLE PROPERTIES OF DECOMPOSITION
A decomposition should have two desirable properties: it should be a lossless-join decomposition, and it should be dependency preserving.
1. Lossless-Join Decomposition
When we decompose a relation into a number of smaller relations, it is crucial that the decomposition be lossless. Let R be a
relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This
decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless-join decomposition.
We illustrate our concepts with the Lending-schema schema
This step results in a lossless-join decomposition, since loan-number is a common attribute and
loan-number → amount branch-name.
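The test for a two-way decomposition can be sketched with attribute closure: compute (R1 ∩ R2)+ and check whether it covers R1 or R2. In the sketch below, the two schemas are an assumption, since the notes do not list them at this point; only the common attribute loan-number and the dependency loan-number → amount branch-name are taken from the text. The function name is_lossless is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """True if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 is in F+."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

F = [({"loan-number"}, {"amount", "branch-name"})]

# Assumed schemas that share the attribute loan-number.
loan = {"loan-number", "branch-name", "amount"}
borrower = {"customer-name", "loan-number"}
print(is_lossless(loan, borrower, F))  # True

# A hypothetical split on branch-name instead: the test is not satisfied.
left = {"loan-number", "branch-name"}
right = {"branch-name", "amount", "customer-name"}
print(is_lossless(left, right, F))  # False
```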
2. Dependency Preservation
There is another goal in relational-database design: dependency preservation. Let F be a set of functional dependencies on a
schema R, and let R1, R2, ..., Rn be a decomposition of R. Let Fi be the set of all functional dependencies in F+ that include
only attributes of Ri, for i = 1, 2, ..., n.
Let F’ = F1 ∪ F2 ∪ · · · ∪ Fn. F’ is a set of functional dependencies on schema R, but, in general, F’ ≠ F. However, even if
F’ ≠ F, it may be that F’+ = F+. If the latter is true, then every dependency in F is logically implied by F’. We say that a
decomposition having the property F’+ = F+ is a dependency-preserving decomposition.
There is an algorithm for testing dependency preservation. The idea is to test each functional dependency α → β in F by
using a modified form of attribute closure to see if it is preserved by the decomposition. We apply the following procedure
to each α → β in F.
result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t
The attribute closure is with respect to the functional dependencies in F. If result contains all attributes in β, then the
functional dependency α → β is preserved.
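The procedure above can be sketched directly in code. The schema R = (A, B, C), the set F = {A → B, B → C} and the two decompositions are hypothetical examples chosen for illustration; the function name is_preserved is our own, and attributes are assumed to be single letters so that "AB" stands for {A, B}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, decomposition, fds):
    """Test whether fd = (lhs, rhs) is preserved by the decomposition."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # t = (result ∩ Ri)+ ∩ Ri, with the closure taken under F.
            t = closure(result & set(ri), fds) & set(ri)
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

# Hypothetical example: R = (A, B, C), F = {A -> B, B -> C}.
F = [("A", "B"), ("B", "C")]

print(all(is_preserved(fd, ["AB", "BC"], F) for fd in F))  # True
print(is_preserved(("B", "C"), ["AB", "AC"], F))            # False
```

In the second decomposition B and C never appear together in one relation, and no chain of dependencies inside the individual relations leads from B to C, so B → C is lost.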