
Unit 3

The document discusses functional dependencies (FDs) in relational databases, defining their role in constraining tuples and explaining concepts such as closure of FDs and attribute closure. It also covers canonical covers to eliminate redundant dependencies and outlines the normalization process, including the criteria for achieving various normal forms like First Normal Form (1NF). The document emphasizes minimizing redundancy and anomalies in relational schemas through normalization.

Uploaded by

AMAN RAJ
Copyright © All Rights Reserved

1. FUNCTIONAL DEPENDENCY
Let R be a relation schema, and let α ⊆ R and β ⊆ R.
The functional dependency, denoted by α → β,
between two sets of attributes α and β that are subsets of R specifies a constraint on the possible tuples that can form a relation
state r of R. The constraint is that, for any two tuples t1 and t2 in r,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
that is, any two tuples that agree on α must also agree on β.

In other words, the values of the α component of a tuple uniquely (or functionally) determine the values of the β component. We
also say that there is a functional dependency from α to β, or that β is functionally dependent on α. The abbreviation for
functional dependency is FD. The set of attributes α is called the left-hand side of the FD, and β is called the right-hand side.

Example:
Consider r(A, B) with the following instance of r.

On this instance, A → B does NOT hold, but B → A does hold.
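The condition above can be tested directly on a relation instance. Below is a minimal Python sketch. The helper name `fd_holds` and the three-tuple instance are hypothetical. The instance was chosen only so that, as in the example, A → B fails while B → A holds. Keep in mind that an instance can only refute an FD. It cannot prove that the FD holds for every legal state of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds on one relation instance.

    rows: list of dicts mapping attribute name -> value.
    The FD holds if no two tuples agree on lhs but differ on rhs.
    """
    seen = {}  # lhs values -> rhs values of the first tuple with those lhs values
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # t1[lhs] = t2[lhs] but t1[rhs] != t2[rhs]
        seen.setdefault(key, val)
    return True

# Hypothetical instance of r(A, B): two tuples share A = 1 but differ on B.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False
print(fd_holds(r, ["B"], ["A"]))  # True
```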

Example:
Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the
schema:
bor_loan (customer_id, loan_number, amount).
We expect this functional dependency to hold:
loan_number → amount
but would not expect the following to hold:
amount → customer_id
Example:
EMP_PROJ (eno, pnumber, hours, ename, pname, plocation)
In the EMP_PROJ relation, the following functional dependencies should hold:
{eno, pnumber} → hours
eno → ename
pnumber → {pname, plocation}

• K is a superkey for relation schema R if and only if K → R.

• K is a candidate key for R if and only if
K → R, and
for no α ⊂ K, α → R

Trivial functional dependency:

A functional dependency is trivial if it is satisfied by all instances of a relation.
Example:
• {customer_name, loan_number} → customer_name
• customer_name → customer_name
In general, α → β is trivial if β ⊆ α.

2. CLOSURE OF FD

Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.

For example, if A → B and B → C, then we can infer that A → C.

The set of all functional dependencies that include F as well as all dependencies that can be inferred from F is the closure of F.
We denote the closure of F by F+. F+ is a superset of F.
Example:
Suppose that we specify the following set F of obvious functional dependencies on the relation schema
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER},
DNUMBER → {DNAME, DMGRSSN}}

Some of the additional functional dependencies that we can infer from F are the following:
SSN → {DNAME, DMGRSSN}
SSN → SSN
DNUMBER → DNAME

We can use the following three rules to find logically implied functional dependencies. By applying these rules repeatedly, we
can find all of F+, given F. This collection of rules is called Armstrong’s axioms in honor of the person who first proposed it.
• Reflexivity rule. If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule. If α → β holds and γ is a set of attributes, then γα → γβ holds.
• Transitivity rule. If α → β holds and β → γ holds, then α → γ holds.

Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because,
for a given set F of functional dependencies, they allow us to generate all of F+.

Although Armstrong’s axioms are complete, it is tiresome to use them directly for the computation of F+. To simplify matters
further, we can use some additional rules, which can themselves be derived from Armstrong’s axioms.
• Union rule. If α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule. If α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule. If α → β holds and γβ → δ holds, then αγ → δ holds.

Example:
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}

Some members of F+:
• A → H
– by transitivity from A → B and B → H
• AG → I
– by augmenting A → C with G to get AG → CG, and then applying transitivity with CG → I
– AG → I can also be inferred by using the pseudotransitivity rule on A → C and CG → I.
• CG → HI
– by augmenting CG → I to infer CG → CGI, augmenting CG → H to infer CGI → HI, and then applying transitivity
to CG → CGI and CGI → HI
– This functional dependency can also be inferred by using the union rule on CG → H and CG → I.

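The derivations above can be made mechanical. The small Python sketch below uses its own encoding, not one taken from these notes: each FD is a pair of frozensets, and single-letter attribute names let a string such as "CG" stand for {C, G}. The three axioms and the union rule are functions, and the sketch replays the derivations of A → H, AG → I and CG → HI. Each function only checks the precondition of its rule; it does not search for derivations.

```python
def fd(lhs, rhs):
    """Represent an FD as a pair of frozensets; "CG" means {C, G}."""
    return (frozenset(lhs), frozenset(rhs))

def reflexivity(alpha, beta):
    """If beta is a subset of alpha, then alpha -> beta."""
    if not set(beta) <= set(alpha):
        raise ValueError("reflexivity needs beta to be a subset of alpha")
    return fd(alpha, beta)

def augmentation(f, gamma):
    """If alpha -> beta, then gamma alpha -> gamma beta."""
    alpha, beta = f
    g = frozenset(gamma)
    return (alpha | g, beta | g)

def transitivity(f, g):
    """If alpha -> beta and beta -> gamma, then alpha -> gamma."""
    (alpha, beta), (beta2, gamma) = f, g
    if beta != beta2:
        raise ValueError("right side of f must equal left side of g")
    return (alpha, gamma)

def union(f, g):
    """Derived rule: alpha -> beta and alpha -> gamma give alpha -> beta gamma."""
    (alpha, beta), (alpha2, gamma) = f, g
    if alpha != alpha2:
        raise ValueError("left sides must match")
    return (alpha, beta | gamma)

# Replay the derivations from the example.
A_H = transitivity(fd("A", "B"), fd("B", "H"))                       # A -> H
AG_I = transitivity(augmentation(fd("A", "C"), "G"), fd("CG", "I"))   # AG -> I
CG_CGI = augmentation(fd("CG", "I"), "CG")                            # CG -> CGI
CGI_HI = augmentation(fd("CG", "H"), "I")                             # CGI -> HI
CG_HI = transitivity(CG_CGI, CGI_HI)                                  # CG -> HI
print(CG_HI == union(fd("CG", "H"), fd("CG", "I")))  # True
```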
3. CLOSURE OF ATTRIBUTE

Let α be a set of attributes. The closure of α under F (denoted by α+) is the set of attributes that are functionally determined by α
under F.

Algorithm to compute α+, the closure of α under F:

result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
Example:
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}

We can calculate (AG)+ by using the algorithm:

1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)

(AG)+ = ABCGHI

Example:
F = {SSN → ENAME,
PNUMBER → {PNAME, PLOCATION},
{SSN, PNUMBER} → HOURS}
Using the algorithm, we calculate the following closure sets with respect to F:
{SSN}+ = {SSN, ENAME}
{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}
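The closure algorithm translates almost line for line into Python. The sketch below reproduces both examples above. The function name `attribute_closure` is ours. FDs are assumed to be (lhs, rhs) pairs of attribute collections, so a string like "CG" is read as the attributes C and G. The loop records a change only when a dependency actually adds new attributes, which is how "changes to result" is detected here.

```python
def attribute_closure(alpha, fds):
    """Compute alpha+ under fds.

    alpha: iterable of attribute names; fds: list of (lhs, rhs) pairs,
    each side an iterable of attribute names.
    """
    result = set(alpha)                  # result := alpha
    changed = True
    while changed:                       # while (changes to result)
        changed = False
        for lhs, rhs in fds:             # for each beta -> gamma in F
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)       # result := result ∪ gamma
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(attribute_closure("AG", F))))  # ABCGHI

F2 = [(["SSN"], ["ENAME"]),
      (["PNUMBER"], ["PNAME", "PLOCATION"]),
      (["SSN", "PNUMBER"], ["HOURS"])]
print(sorted(attribute_closure(["SSN"], F2)))  # ['ENAME', 'SSN']
```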

Uses of Attribute Closure


There are several uses of the attribute closure algorithm:
• Testing for superkey:
– To test if α is a superkey, we compute α+ and check whether α+ contains all attributes of R.
• Testing functional dependencies:
– To check if a functional dependency α → β holds (in other words, is in F+), just check if β ⊆ α+.
– That is, we compute α+ by using attribute closure, and then check if it contains β.
– This is a simple and cheap test, and very useful.
• Computing closure of F:
– For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.

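The three uses can be sketched in Python as follows. This is an illustration under our own conventions: FDs are (lhs, rhs) pairs, and single-letter attribute names let a string stand for a set. The helper names are hypothetical. `fd_closure` enumerates F+ by brute force exactly as described. It skips the empty set and is exponential in the number of attributes, so it is only practical for very small schemas.

```python
from itertools import combinations

def closure(alpha, fds):
    """alpha+ under fds (fds: list of (lhs, rhs); "CG" means {C, G})."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(alpha, R, fds):
    # alpha is a superkey iff alpha+ contains every attribute of R.
    return closure(alpha, fds) >= set(R)

def implies(fds, lhs, rhs):
    # lhs -> rhs is in F+ iff rhs is a subset of lhs+.
    return set(rhs) <= closure(lhs, fds)

def fd_closure(R, fds):
    """Enumerate F+ as (gamma, S) pairs, for nonempty gamma ⊆ R and nonempty S ⊆ gamma+."""
    out = set()
    for k in range(1, len(R) + 1):
        for gamma in combinations(sorted(R), k):
            plus = sorted(closure(gamma, fds))
            for j in range(1, len(plus) + 1):
                for S in combinations(plus, j):
                    out.add((frozenset(gamma), frozenset(S)))
    return out

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_superkey("AG", "ABCGHI", F))  # True
print(is_superkey("A", "ABCGHI", F))   # False
print(implies(F, "A", "H"))            # True
```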
4. CANONICAL COVER (MINIMAL SET OF FUNCTIONAL DEPENDENCIES)

Sets of functional dependencies may have redundant dependencies that can be inferred from the others.
• For example, A → C is redundant in {A → B, B → C, A → C}.
• Parts of a functional dependency may be redundant:
– on the RHS: {A → B, B → C, A → CD} can be simplified to
{A → B, B → C, A → D}
– on the LHS: {A → B, B → C, AC → D} can be simplified to
{A → B, B → C, A → D}
Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant
dependencies or redundant parts of dependencies.

Extraneous Attributes
Consider a set F of functional dependencies and the functional dependency α → β in F.
• Attribute A is extraneous in α if A ∈ α and F logically implies (F − {α → β}) ∪ {(α − A) → β}.
• Attribute A is extraneous in β if A ∈ β and the set of functional dependencies
(F − {α → β}) ∪ {α → (β − A)} logically implies F.
Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always
implies a weaker one.

Example: Given F = {A → C, AB → C}
• B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e., the result of dropping B from AB → C).

Example: Given F = {A → C, AB → CD}
• C is extraneous in AB → CD since AB → C can be inferred even after deleting C.

Testing if an Attribute is Extraneous
• Consider a set F of functional dependencies and the functional dependency α → β in F.
• To test if attribute A ∈ α is extraneous in α:
1. compute (α − A)+ using the dependencies in F;
2. check that (α − A)+ contains β; if it does, A is extraneous in α.
• To test if attribute A ∈ β is extraneous in β:
1. compute α+ using only the dependencies in
F’ = (F − {α → β}) ∪ {α → (β − A)};
2. check that α+ contains A; if it does, A is extraneous in β.
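Both tests reduce to one attribute-closure computation each. The Python sketch below applies them to the two examples above. The function names are our own. FDs are (lhs, rhs) string pairs with single-letter attributes, and the FD under test is identified by value, so it is assumed to appear in F exactly as given.

```python
def closure(alpha, fds):
    """alpha+ under fds (fds: list of (lhs, rhs); "AB" means {A, B})."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(fds, f, A):
    """A in alpha is extraneous if beta is contained in (alpha - A)+ computed under F."""
    alpha, beta = f
    return set(beta) <= closure(set(alpha) - {A}, fds)

def extraneous_in_rhs(fds, f, A):
    """A in beta is extraneous if alpha+ under F' = (F - {f}) ∪ {alpha -> (beta - A)} contains A."""
    alpha, beta = f
    f_prime = [g for g in fds if g != f] + [(alpha, set(beta) - {A})]
    return A in closure(alpha, f_prime)

F1 = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs(F1, ("AB", "C"), "B"))   # True
F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs(F2, ("AB", "CD"), "C"))  # True
print(extraneous_in_rhs(F2, ("AB", "CD"), "D"))  # False
```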

Canonical Cover
A canonical cover for F is a set of dependencies Fc such that
• F logically implies all dependencies in Fc,
• Fc logically implies all dependencies in F,
• no functional dependency in Fc contains an extraneous attribute, and
• each left side of a functional dependency in Fc is unique.

To compute a canonical cover for F:

repeat
    Use the union rule to replace any dependencies in F of the form
        α1 → β1 and α1 → β2 with α1 → β1β2
    Find a functional dependency α → β with an
        extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
until F does not change

Note: The union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied.

Example: Computing a Canonical Cover
R = (A, B, C)
F = {A → BC, B → C, A → B, AB → C}

• Combine A → BC and A → B into A → BC.
– Set is now {A → BC, B → C, AB → C}.
• A is extraneous in AB → C.
– Check if the result of deleting A from AB → C is implied by the other dependencies.
– Yes: in fact, B → C is already present!
– Set is now {A → BC, B → C}.
• C is extraneous in A → BC.
– Check if A → C is logically implied by A → B and the other dependencies.
– Yes: using transitivity on A → B and B → C.
– The attribute closure of A can be used in more complex cases.

The canonical cover is:
A → B
B → C
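The repeat-until procedure and the worked example above can be sketched in Python. This is a minimal illustration under our own conventions (FDs as pairs of frozensets, single-letter attribute names). It removes one extraneous attribute per pass and then re-applies the union rule, as the note above requires. The order in which extraneous attributes are found can differ from the hand computation; here C is removed from A → BC first. In general, different orders can yield different, equally valid canonical covers.

```python
def closure(alpha, fds):
    """alpha+ under fds, where fds is a list of (lhs, rhs) frozenset pairs."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def canonical_cover(fds):
    F = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        # Union rule: merge FDs with the same left side (drop empty right sides).
        merged = {}
        for l, r in F:
            if r:
                merged[l] = merged.get(l, frozenset()) | r
        F = list(merged.items())
        changed = False
        for i, (l, r) in enumerate(F):
            # Extraneous attribute in the left side: beta ⊆ (alpha - A)+ under F.
            for A in sorted(l):
                if len(l) > 1 and r <= closure(l - {A}, F):
                    F[i] = (l - {A}, r)
                    changed = True
                    break
            if changed:
                break
            # Extraneous attribute in the right side: A ∈ alpha+ under F'.
            for A in sorted(r):
                rest = F[:i] + F[i + 1:] + [(l, r - {A})]
                if A in closure(l, rest):
                    F[i] = (l, r - {A})
                    changed = True
                    break
            if changed:
                break
        if not changed:  # "until F does not change"
            return F

Fc = canonical_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
for l, r in Fc:
    print("".join(sorted(l)), "->", "".join(sorted(r)))
# A -> B
# B -> C
```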

5. NORMAL FORMS BASED ON PRIMARY KEYS


Normalization of Relations
The normalization process, as first proposed by Codd (1972a), takes a relation schema through a series of tests to "certify"
whether it satisfies a certain normal form. The process, which proceeds in a top-down fashion by evaluating each relation against
the criteria for normal forms and decomposing relations as necessary, can thus be considered as relational design by analysis.
Initially, Codd proposed three normal forms, which he called first, second, and third normal form. A stronger definition of 3NF,
called Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd. All these normal forms are based on the
functional dependencies among the attributes of a relation. Later, a fourth normal form (4NF) and a fifth normal form (5NF) were
proposed, based on the concepts of multivalued dependencies and join dependencies, respectively.
Normalization of data can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary
keys to achieve the desirable properties of
(1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies.

Prime and Non-prime attribute


An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. An attribute is
called nonprime if it is not a prime attribute, that is, if it is not a member of any candidate key.
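Finding prime attributes requires the candidate keys. The Python sketch below finds them by brute force: it tests attribute subsets in order of size using attribute closure. It is applied here to the EMP_PROJ dependencies listed in Section 1. The search is exponential in the number of attributes, so it only suits small textbook schemas. The function names are our own.

```python
from itertools import combinations

def closure(alpha, fds):
    """alpha+ under fds (fds: list of (lhs, rhs) attribute lists)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for k in range(1, len(R) + 1):
        for combo in combinations(sorted(R), k):
            K = set(combo)
            if any(key <= K for key in keys):
                continue  # contains a smaller key, so K is not minimal
            if closure(K, fds) >= set(R):
                keys.append(K)
    return keys

R = ["eno", "pnumber", "hours", "ename", "pname", "plocation"]
F = [(["eno", "pnumber"], ["hours"]),
     (["eno"], ["ename"]),
     (["pnumber"], ["pname", "plocation"])]
keys = candidate_keys(R, F)
prime = set().union(*keys)
print(sorted(set(R) - prime))  # ['ename', 'hours', 'plocation', 'pname']
```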

5.1 First Normal Form


First normal form (1NF) states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any
attribute in a tuple must be a single value from the domain of that attribute. Hence 1NF disallows multivalued attributes,
composite attributes, and their combinations.
Example
Consider the DEPARTMENT relation schema shown in Figure 1, whose primary key is DNUMBER. We assume that each
department can have a number of locations.

Figure 1. Normalization into 1NF. (a) A relation schema that is not in 1NF. (b) Example state of relation DEPARTMENT.
(c) 1NF version of same relation with redundancy.

There are three main techniques to achieve first normal form for such a relation:
1. Remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation DEPT_LOCATIONS along
with the primary key DNUMBER of DEPARTMENT. The primary key of this relation is the combination
{DNUMBER, DLOCATION}. A distinct tuple in DEPT_LOCATIONS exists for each location of a department. This
decomposes the non-1NF relation into two 1NF relations.
2. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location of a
department, as shown in Figure 1(c). In this case, the primary key becomes the combination {DNUMBER,
DLOCATION}. This solution has the disadvantage of introducing redundancy in the relation.
3. If a maximum number of values is known for the attribute (for example, if it is known that at most three locations can
exist for a department), replace the DLOCATIONS attribute by three atomic attributes: DLOCATION1,
DLOCATION2, and DLOCATION3. This solution has the disadvantage of introducing null values if most departments
have fewer than three locations. It further introduces spurious semantics about the ordering among the location values
that is not originally intended. Querying on this attribute becomes more difficult; for example, consider how you would
write the query "List the departments that have 'Bellaire' as one of their locations" in this design.
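Technique 1 can be sketched in Python over an in-memory relation state. The rows below are hypothetical, since the data in Figure 1 is not reproduced here. `split_multivalued` is our own helper name. The sketch only shows the tuple-level effect of moving DLOCATIONS into DEPT_LOCATIONS, which turns the "Bellaire" query into a simple selection.

```python
# Hypothetical DEPARTMENT state that is not in 1NF: DLOCATIONS holds a set.
department = [
    {"DNAME": "Research", "DNUMBER": 5,
     "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DLOCATIONS": {"Houston"}},
]

def split_multivalued(rows, key, multi_attr, single_attr):
    """Move a multivalued attribute into its own relation.

    Returns (base, split): base drops multi_attr; split has one
    (key, value) tuple per value, so its key is {key, single_attr}.
    """
    base = [{a: v for a, v in row.items() if a != multi_attr} for row in rows]
    split = [{key: row[key], single_attr: value}
             for row in rows for value in sorted(row[multi_attr])]
    return base, split

dept, dept_locations = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION")

# The "Bellaire" query becomes a simple selection on DEPT_LOCATIONS.
print([t["DNUMBER"] for t in dept_locations if t["DLOCATION"] == "Bellaire"])  # [5]
```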

First normal form also disallows multivalued attributes that are themselves composite. These are called nested relations
because each tuple can have a relation within it. Figure 2 shows how the EMP_PROJ relation could appear if nesting is
allowed. Each tuple represents an employee entity, and a relation PROJS(PNUMBER, HOURS) within each tuple
represents the employee's projects and the hours per week that the employee works on each project. The schema of this
EMP_PROJ relation can be represented as follows:

EMP_PROJ (SSN, ENAME, {PROJS(PNUMBER, HOURS)}).

Figure 2. Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a "nested relation" attribute
PROJS. (b) Example extension of the EMP_PROJ relation showing nested relations within each tuple. (c)
Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.

5.2 Second Normal Form


Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X → Y is a full
functional dependency if removal of any attribute A from X means that the dependency does not hold any more. A functional
dependency X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds; that is,
for some A ∈ X, (X − {A}) → Y.
Definition
A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
Or
A relation schema R is in second normal form (2NF) if no nonprime attribute A in R is partially dependent on any key of
R.

Example
The EMP_PROJ relation in Figure 3 is in 1NF but is not in 2NF. The nonprime attribute ENAME violates 2NF
because of FD2, as do the nonprime attributes PNAME and PLOCATION because of FD3. The functional
dependencies FD2 and FD3 make ENAME, PNAME, and PLOCATION partially dependent on the primary key
{SSN, PNUMBER} of EMP_PROJ, thus violating the 2NF test. The functional dependencies FD1, FD2, and FD3 in
Figure 3 hence lead to the decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in
Figure 3, each of which is in 2NF.

Figure 3. Normalizing EMP_PROJ into 2NF.
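The 2NF test can be sketched in Python. It computes the closure of every proper subset of the primary key and reports any nonprime attribute that such a subset already determines. The EMP_PROJ dependencies come from the text; FD1 is assumed to be {SSN, PNUMBER} → HOURS. The sketch also assumes the primary key is the only candidate key, so the nonprime attributes are simply the rest. The function name is our own.

```python
from itertools import combinations

def closure(alpha, fds):
    """alpha+ under fds (fds: list of (lhs, rhs) attribute lists)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(R, fds, key):
    """Map each nonprime attribute to a proper subset of key that determines it.

    Assumes key is the only candidate key, so nonprime = R - key.
    """
    key = sorted(key)
    nonprime = set(R) - set(key)
    found = {}
    for k in range(1, len(key)):              # proper, nonempty subsets only
        for X in combinations(key, k):
            for attr in closure(X, fds) & nonprime:
                found.setdefault(attr, set(X))
    return found

R = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F = [(["SSN", "PNUMBER"], ["HOURS"]),          # FD1
     (["SSN"], ["ENAME"]),                     # FD2
     (["PNUMBER"], ["PNAME", "PLOCATION"])]    # FD3
print(partial_dependencies(R, F, ["SSN", "PNUMBER"]))
```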

5.3 Third Normal Form


Third normal form (3NF) is based on the concept of transitive dependency. A functional dependency X → Y in a relation
schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of
R, and both X → Z and Z → Y hold.
Definition
According to Codd's original definition, a relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is
transitively dependent on the primary key.
Or
A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency X → A holds in R,
either (a) X is a superkey of R, or (b) A is a prime attribute of R.

Example:
The relation schema EMP_DEPT in Figure 4 is in 2NF, since no partial dependencies on a key exist. However,
EMP_DEPT is not in 3NF because of the transitive dependency of DMGRSSN (and also DNAME) on SSN via
DNUMBER. We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2
shown in Figure 4.
Figure 4. Normalizing EMP_DEPT into 3NF relations.
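Using the general definition, a 3NF check can be sketched in Python. It splits each right-hand side into single attributes and flags every nontrivial X → A where X is not a superkey and A is not prime. The sketch rests on stated assumptions:

* It checks only the given dependencies. That suffices when testing a schema against its own F, but not when testing a relation produced by a decomposition, where dependencies in F+ matter.
* It finds prime attributes by a brute-force candidate-key search.
* It uses the dependency set F for EMP_DEPT given in Section 2, over the attributes (ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN).

```python
from itertools import combinations

def closure(alpha, fds):
    """alpha+ under fds (fds: list of (lhs, rhs) attribute lists)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for k in range(1, len(R) + 1):
        for combo in combinations(sorted(R), k):
            K = set(combo)
            if not any(key <= K for key in keys) and closure(K, fds) >= set(R):
                keys.append(K)
    return keys

def violations_3nf(R, fds):
    """Nontrivial X -> A in fds where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(R, fds))
    bad = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= set(R)
        for A in sorted(set(rhs) - set(lhs)):   # one attribute at a time; skip trivial parts
            if not superkey and A not in prime:
                bad.append((tuple(lhs), A))
    return bad

R = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F = [(["SSN"], ["ENAME", "BDATE", "ADDRESS", "DNUMBER"]),
     (["DNUMBER"], ["DNAME", "DMGRSSN"])]
print(violations_3nf(R, F))  # [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]
```

After the decomposition, ED1 (with only the SSN dependency) and ED2 (with only the DNUMBER dependency) report no violations.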

Example:
Consider the relation schema LOTS shown in Figure 5(a), which describes parcels of land for sale in various counties of a state.
Suppose that there are two candidate keys: PROPERTY_ID# and {COUNTY_NAME, LOT#}; that is, lot numbers are unique
only within each county, but PROPERTY_ID# numbers are unique across counties for the entire state. Based on the two
candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}, we know that the functional dependencies FD1 and FD2 of
Figure 5(a) hold.
Suppose that the following two additional functional dependencies hold in LOTS:
FD3: COUNTY_NAME → TAX_RATE
FD4: AREA → PRICE

Figure 5(a). The LOTS relation with its functional dependencies FD1 through FD4.

The LOTS relation schema violates the general definition of 2NF because TAX_RATE is partially dependent on the candidate
key {COUNTY_NAME, LOT#}, due to FD3. To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and
LOTS2, shown in Figure 5(b). We construct LOTS1 by removing the attribute TAX_RATE that violates 2NF from LOTS and
placing it with COUNTY_NAME (the left-hand side of FD3 that causes the partial dependency) into another relation LOTS2.
Both LOTS1 and LOTS2 are in 2NF. Notice that FD4 does not violate 2NF and is carried over to LOTS1.
Figure 5. (b) Decomposing into the 2NF relations LOTS1 and LOTS2. (c) Decomposing LOTS1 into the 3NF relations
LOTS1A and LOTS1B. (d) Summary of the progressive normalization of LOTS.

According to the general definition of 3NF, LOTS2 (Figure 5(b)) is in 3NF. However, FD4 in LOTS1 violates 3NF because AREA is not a
superkey and PRICE is not a prime attribute in LOTS1. To normalize LOTS1 into 3NF, we decompose it into the relation
schemas LOTS1A and LOTS1B shown in Figure 5(c). We construct LOTS1A by removing the attribute PRICE that violates
3NF from LOTS1 and placing it with AREA (the left-hand side of FD4 that causes the transitive dependency) into another
relation LOTS1B. Both LOTS1A and LOTS1B are in 3NF.
5.4 Boyce-Codd Normal Form
Definition
A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of
R.
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. That is,
every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. The only difference between
the definitions of BCNF and 3NF is that condition (b) of 3NF, which allows A to be prime, is absent from BCNF.
Example:
In Figure 6(a), FD5 (AREA → COUNTY_NAME) violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A. Note that FD5 satisfies 3NF in
LOTS1A because COUNTY_NAME is a prime attribute (condition b), but this condition does not exist in the definition of
BCNF. We can decompose LOTS1A into two BCNF relations LOTS1AX and LOTS1AY, shown in Figure 6(a). This
decomposition loses the functional dependency FD2 because its attributes no longer coexist in the same relation after
decomposition.

Figure 6. (a) BCNF normalization of LOTS1A with the functional dependency FD2 being lost in the decomposition. (b) A
schematic relation with FDs; it is in 3NF, but not in BCNF.
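A BCNF check needs only a superkey test per dependency. The Python sketch below applies it to LOTS1A. FD1 and FD2 are assumed to be restricted to LOTS1A's attributes (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA), and FD5 is taken to be AREA → COUNTY_NAME. Checking only the given dependencies is enough for testing R against its own F. It is not enough for a relation produced by a decomposition. The function name is our own.

```python
def closure(alpha, fds):
    """alpha+ under fds (fds: list of (lhs, rhs) attribute lists)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(R, fds):
    """Nontrivial lhs -> rhs in fds whose lhs is not a superkey of R."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        if nontrivial and not closure(lhs, fds) >= set(R):
            bad.append((lhs, rhs))
    return bad

# LOTS1A with FD1 and FD2 restricted to its attributes (assumed), plus FD5.
R = ["PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"]
F = [(["PROPERTY_ID#"], ["COUNTY_NAME", "LOT#", "AREA"]),   # FD1
     (["COUNTY_NAME", "LOT#"], ["PROPERTY_ID#", "AREA"]),   # FD2
     (["AREA"], ["COUNTY_NAME"])]                           # FD5
print(bcnf_violations(R, F))  # [(['AREA'], ['COUNTY_NAME'])]
```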
6. DESIRABLE PROPERTIES OF DECOMPOSITION
Two desirable properties of a decomposition are the lossless-join property and dependency preservation.
1. Lossless-Join Decomposition
When we decompose a relation into a number of smaller relations, it is crucial that the decomposition be lossless. Let R be a
relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This
decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless-join decomposition.
We illustrate these concepts with the Lending-schema:

Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

We begin by decomposing Lending-schema into two schemas:

Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)

Since branch-name → {branch-name, branch-city, assets} and Branch-schema ∩ Loan-info-schema = {branch-name}, it follows
that our initial decomposition is a lossless-join decomposition.
Next, we decompose Loan-info-schema into

Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)

This step results in a lossless-join decomposition, since loan-number is the common attribute and
loan-number → {amount, branch-name}, so loan-number is a superkey of Loan-schema.
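The binary test translates directly into one closure computation. In this Python sketch, F contains only the two Lending-schema dependencies used above: branch-name → {branch-city, assets} and loan-number → {amount, branch-name}. The function name and the third, lossy split are our own illustrations. A False result means the FD-based test does not certify the decomposition as lossless.

```python
def closure(alpha, fds):
    """alpha+ under fds (fds: list of (lhs, rhs) attribute lists)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(R1, R2, fds):
    """True if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 is in F+."""
    common_plus = closure(set(R1) & set(R2), fds)
    return set(R1) <= common_plus or set(R2) <= common_plus

F = [(["branch-name"], ["branch-city", "assets"]),
     (["loan-number"], ["amount", "branch-name"])]
branch = ["branch-name", "branch-city", "assets"]
loan_info = ["branch-name", "customer-name", "loan-number", "amount"]
loan = ["loan-number", "branch-name", "amount"]
borrower = ["customer-name", "loan-number"]

print(lossless_binary(branch, loan_info, F))  # True
print(lossless_binary(loan, borrower, F))     # True
# A hypothetical split that shares only customer-name:
print(lossless_binary(["branch-name", "customer-name"],
                      ["customer-name", "loan-number", "amount"], F))  # False
```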

2. Dependency Preservation
There is another goal in relational-database design: dependency preservation. Let F be a set of functional dependencies on a
schema R, let R1, R2, ..., Rn be a decomposition of R, and let F1, F2, ..., Fn be the restrictions of F to R1, R2, ..., Rn
(Fi is the set of all dependencies in F+ that include only attributes of Ri).
Let F’ = F1 ∪ F2 ∪ · · · ∪ Fn. F’ is a set of functional dependencies on schema R, but, in general, F’ ≠ F. However, even if
F’ ≠ F, it may be that F’+ = F+. If the latter is true, then every dependency in F is logically implied by F’. We say that a
decomposition having the property F’+ = F+ is a dependency-preserving decomposition.
There is an algorithm for testing dependency preservation. The idea is to test each functional dependency α → β in F by
using a modified form of attribute closure to see if it is preserved by the decomposition. We apply the following procedure
to each α → β in F.

result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t

The attribute closure is with respect to the functional dependencies in F. If result contains all attributes in β, then the
functional dependency α → β is preserved.
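The procedure can be sketched in Python. As an illustration, it is applied to the BCNF decomposition of LOTS1A discussed earlier. The attribute sets are assumed from the textbook figure, which is not reproduced here: LOTS1AX = (PROPERTY_ID#, AREA, LOT#) and LOTS1AY = (AREA, COUNTY_NAME). FD1 and FD2 are restricted to LOTS1A's attributes, and FD5 is AREA → COUNTY_NAME. The result agrees with the statement that FD2 is lost. The function name is our own.

```python
def closure(alpha, fds):
    """alpha+ under fds (fds: list of (lhs, rhs) attribute lists)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fd, decomposition, fds):
    """Check whether alpha -> beta is preserved by the decomposition without computing F+."""
    alpha, beta = fd
    result = set(alpha)                          # result = alpha
    changed = True
    while changed:                               # while (changes to result)
        changed = False
        for Ri in map(set, decomposition):       # for each Ri
            t = closure(result & Ri, fds) & Ri   # t = (result ∩ Ri)+ ∩ Ri
            if not t <= result:
                result |= t                      # result = result ∪ t
                changed = True
    return set(beta) <= result

# Assumed LOTS1A dependencies and decomposition (see lead-in).
FD1 = (["PROPERTY_ID#"], ["COUNTY_NAME", "LOT#", "AREA"])
FD2 = (["COUNTY_NAME", "LOT#"], ["PROPERTY_ID#", "AREA"])
FD5 = (["AREA"], ["COUNTY_NAME"])
F = [FD1, FD2, FD5]
LOTS1AX = ["PROPERTY_ID#", "AREA", "LOT#"]
LOTS1AY = ["AREA", "COUNTY_NAME"]

for name, f in [("FD1", FD1), ("FD2", FD2), ("FD5", FD5)]:
    print(name, preserves(f, [LOTS1AX, LOTS1AY], F))
# FD1 True
# FD2 False
# FD5 True
```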
