Unit-4 DBMS AIDS R20
Unit-4 Syllabus
Functional Dependency:
Let R be a relation, and let X and Y be two sets of attributes of R. If, for every pair of
tuples t1 and t2 in every legal instance r of R, t1[X] = t2[X] => t1[Y] = t2[Y], then the
functional dependency X→Y holds on R, i.e., for every pair of tuples t1 and t2 with the same
value on X, the values in Y must be the same.
Ex:
A B C D
a1 b1 c1 d1
a1 b1 c1 d2
a1 b2 c2 d1
a2 b1 c3 d1
Note:
By looking at an instance, it is meaningful to say that an FD does not hold. But it is not
meaningful to say that an FD holds on a relation by looking at one or even several instances,
because an FD is a statement about all legal instances of the relation.
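The note above can be illustrated with a short Python sketch run on the example instance. The helper name `violates_fd` and the list-of-dicts layout are assumptions made only for this illustration. A result of False means only that this one instance does not violate the FD, not that the FD holds on R.

```python
from itertools import combinations

# The example instance of R(A, B, C, D) shown above, one dict per tuple.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d1"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d1"},
]

def violates_fd(rows, lhs, rhs):
    """Return True if some pair of tuples agrees on lhs but differs on rhs."""
    for t1, t2 in combinations(rows, 2):
        same_lhs = all(t1[a] == t2[a] for a in lhs)
        same_rhs = all(t1[a] == t2[a] for a in rhs)
        if same_lhs and not same_rhs:
            return True
    return False

print(violates_fd(rows, "A", "B"))   # True: tuples 1 and 3 agree on A but not on B
print(violates_fd(rows, "AB", "C"))  # False: this instance does not violate AB -> C
```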
Ex:
Let R (A,B,C) F={A→B,B→C}. Then F+, the closure of F (the set of all FDs implied by F), contains A→B, B→C and A→C, along with trivial FDs such as AB→A.
In a FD X→Y, X is called the determinant and Y is called dependent.
Armstrong’s Axioms:
Examples:
1) Consider R (A,B,C,G,H,I) and F= {A→B,A→C,CG→H,CG→I,B→H},
Check whether AG→I can be implied and also determine F+.
Sol: Consider A→C & CG→I
By pseudo transitivity, AG→I (Replace C with its determinant)
Consider A→C, CG→H =>AG→H
A→B, B→H =>A→H
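A common way to check whether an FD such as AG→I is implied is to compute the attribute closure of its left side. The following is a minimal sketch; the function name `closure` and the representation of each FD as a pair of strings are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_plus = closure("AG", F)
print(sorted(ag_plus))   # ['A', 'B', 'C', 'G', 'H', 'I']
print("I" in ag_plus)    # True, so AG -> I is implied by F
```

X→Y is in F+ exactly when Y is contained in the closure of X, so F+ never has to be listed in full.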
2) Consider R(A,B,C,D,E)
F={A→BC , CD→E , B→D ,E→A} F+=?
Sol: A→BC =>A→B,A→C
B→D , CD→E =>BC→E
A→B , B→D =>A→D
A→C , CD→E =>AD→E
E→A , A→BC => E→BC
F+ includes F U {AD→E , A→D, E→BC, BC→E} (along with other derived and trivial FDs)
***If the closure of a set of attributes X computed under F includes all attributes of the
relation R and X is minimal, i.e., no proper subset of X has a closure that includes all
attributes of R, then X is a Candidate Key of R. Some relations may have multiple candidate
keys. In such a case, one of them is designated as the Primary Key of the table. A candidate
key is simply called a Key. All supersets of a candidate key are called Super Keys. The
attributes that appear in at least one candidate key are called Key Attributes or Prime
Attributes. The attributes that are not members of any key are called Non-Key Attributes. ***
4) Given R(A,B,C,D,E)
F1= {A→B, AB→C , D→AC ,D→E}
F2= {A→BC, D→AE}
Check whether F1 and F2 are equivalent.
Sol: Consider F1:
D→AC =>D→A, D→C
A→B, AB→ C => A→C
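F1 = {A→B, A→C, D→A, D→C, D→E}
Consider F2:
A→BC =>A→B, A→C and D→AE =>D→A, D→E
F2 = {A→B, A→C, D→A, D→E}
D→A, A→C =>D→C. Hence, F2 = {A→B, A→C, D→A, D→C, D→E} = F1.
(Or)
In F1, D→C is redundant because it follows from D→A and A→C. Hence,
F1 = {A→B, A→C, D→A, D→E}
= {A→BC, D→AE} = F2. Hence, F1 and F2 are equivalent.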
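The equivalence can also be checked mechanically: F1 and F2 are equivalent when each set implies every FD of the other. The sketch below uses attribute closures for this; the names `closure` and `covers` are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g is implied by f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

F1 = [("A", "B"), ("AB", "C"), ("D", "AC"), ("D", "E")]
F2 = [("A", "BC"), ("D", "AE")]
print(covers(F1, F2) and covers(F2, F1))   # True: F1 and F2 are equivalent
```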
An attribute “A” is extraneous in a FD X→Y (either on left hand side or on right hand side)
if we can safely remove “A” without changing the closure F+ of F.
Ex: 1) if AB→C and A→C, then B is extraneous.
2) If AB→CD and A→C, then C is extraneous in ‘CD’.
A canonical cover Fc for F is a set of FDs such that Fc logically implies all FDs in F+ and F
logically implies all FD’s in Fc+. There shouldn’t be any extraneous attribute in Fc .
Each determinant in Fc must be unique.
Sol:
i) Rewrite all FDs to contain single attribute on RHS.
A→B (1), A→C (2), B→C (3) , A→B(4) , AB→C(5)
ii) 1 and 4 are same. Hence, remove (4)
B in 5 is extraneous because A→B. Hence (5) can be written as A→C which is same as
Sol:
i) A→B (1), A→C (2), B→A (3), B→C (4), C→A (5), C→B (6)
ii) A→B , B→C =>A→C Hence, remove (2)
C→A , A→B =>C→B Hence, remove (6)
B→C , C→A =>B→A Hence, remove (3)
Therefore, Fc = {A→B, B→C, C→A}.
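The removals in this solution can be replayed with a small sketch: an FD is redundant when the remaining FDs already imply it. The helper names `closure` and `is_redundant` are assumptions for illustration. A different removal order can produce a different but equivalent cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_redundant(fd, fds):
    """True if fd is implied by the other FDs in fds."""
    lhs, rhs = fd
    rest = [g for g in fds if g != fd]
    return set(rhs) <= closure(lhs, rest)

F = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "B")]
# Same removal order as in the solution: (2), then (6), then (3).
for fd in [("A", "C"), ("C", "B"), ("B", "A")]:
    if is_redundant(fd, F):
        F.remove(fd)

print(F)                                      # [('A', 'B'), ('B', 'C'), ('C', 'A')]
print(any(is_redundant(fd, F) for fd in F))   # False: nothing more can be removed
```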
Sol:
i) AB→C (1), C→A (2), BC→D (3), ACD→B (4), BE→C (5), EC→F (6), EC→A (7), CF→B (8),
CF→D (9), D→E (10).
In (7),’E’ is extraneous since C→A. Hence, remove E from (7) and then remove (7)
because it is equivalent to (2).
In (4),’A’ is extraneous since C→A. Hence, (4) becomes CD→B.
(8) is redundant because CF→D (9) and CD→B (revised 4) give CF→B. Hence, remove (8).
No more reductions or removals are possible. Therefore,
Fc ={AB→C , C→A , BC→D, CD→B ,BE→C ,EC→F ,CF→D ,D→E}.
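The two left-side reductions in this solution follow a standard test: an attribute is extraneous on the left side of X→Y if Y is still contained in the closure of X without that attribute, computed under F. The sketch below applies that test; `closure` and `lhs_extraneous` are illustrative names, not part of any library.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lhs_extraneous(attr, fd, fds):
    """True if attr can be dropped from the left side of fd without changing F+."""
    lhs, rhs = fd
    reduced = set(lhs) - {attr}
    return set(rhs) <= closure(reduced, fds)

F = [("AB", "C"), ("C", "A"), ("BC", "D"), ("ACD", "B"), ("BE", "C"),
     ("EC", "F"), ("EC", "A"), ("CF", "B"), ("CF", "D"), ("D", "E")]

print(lhs_extraneous("E", ("EC", "A"), F))    # True: C -> A already holds
print(lhs_extraneous("A", ("ACD", "B"), F))   # True: CD determines A through C -> A
print(lhs_extraneous("C", ("EC", "F"), F))    # False: E alone does not determine F
```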
----As ACD→E, in FD (5) we can remove F to make ACD→E. Now, FD (5) is same as
FD (2). Hence, remove FD (5).
---- From revised FD (2) and FD (3), we get FD (6). (Pseudo Transitivity Rule) Hence,
remove FD (6). No more reductions are possible.
The above relation has redundancy: the same <Rating, Hrly_Wage> pairs are stored in
multiple places. Due to this redundancy, the following anomalies occur:
1) Update Anomaly: The problem of updating one copy of redundant data without having a
similar update on other copies. ( We may update Hrly_Wage 200 as 300 for one record
leaving other two records with 200)
2) Insertion Anomaly: The inability to insert useful data without inserting unwanted data as
well. (Ex: to insert <10,500> as <Rating, Hrly_Wage> pair, we must have a sailor with rating
10)
3) Deletion Anomaly: The problem of losing useful data while deleting unwanted data. (If
the sailor 25 leaves the club, we miss the Hrly_Wage value for rating 7 as there is only one
sailor with rating 7).
These anomalies can be solved by decomposing the given table into two tables as follows:
Normal Form: A normal form defines the state of a relation with respect to the functional
dependencies defined on that relation. By knowing the normal form of a relation, we are sure
that certain kinds of problems do not occur, whereas certain other kinds of problems may
still occur. To remove those problems as well, the relation must be refined to a higher
normal form.
Based on FDs, the following normal forms are defined:
-First Normal Form (1 NF)
-Second Normal Form (2 NF)
-Third Normal Form (3 NF)
-Boyce Codd Normal Form (BCNF)
These normal forms have increasingly restrictive requirements i.e., a relation which is in a
higher level NF will be in all lower level NFs.
Well Structured Relation: A relation which is free from redundancy and upon which we can
safely perform DML operations is called a well structured relation.
Normalization: The process of decomposing a relation with anomalies to form smaller and
well structured relations is called normalization.
An instance of relation SPD:
S   P   D
s1  p1  d1
s1  p2  d2
s2  p1  d2

Instance of SP:
S   P
s1  p1
s1  p2
s2  p1

Instance of PD:
P   D
p1  d1
p2  d2
p1  d2

SP ⋈ PD:
S   P   D
s1  p1  d1
s1  p1  d2
s1  p2  d2
s2  p1  d1
s2  p1  d2
In the above decomposition, the original instance is not recovered by joining the smaller
instances: the join contains the extra tuples (s1, p1, d2) and (s2, p1, d1). Such a
decomposition is called a lossy decomposition.
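The lossy join can be reproduced in a few lines of Python. This is a sketch using sets of tuples; the variable names are assumptions chosen to mirror the relation names above.

```python
# The instance of SPD shown above.
spd = {("s1", "p1", "d1"), ("s1", "p2", "d2"), ("s2", "p1", "d2")}

# Projections onto (S, P) and (P, D).
sp = {(s, p) for s, p, d in spd}
pd_rel = {(p, d) for s, p, d in spd}

# Natural join of SP and PD on the common attribute P.
joined = {(s, p, d) for s, p in sp for p2, d in pd_rel if p == p2}

print(joined == spd)          # False: the decomposition is lossy
print(sorted(joined - spd))   # [('s1', 'p1', 'd2'), ('s2', 'p1', 'd1')]
```

The join always contains every original tuple. "Lossy" refers to the spurious tuples that make it impossible to tell which tuples were in the original instance.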
Partial FD: An FD in which a non-key attribute is determined by part of the key rather than
the full key is called a Partial FD. (Ex: If {A, B} is the key, then the FD A→C is a
partial FD, where C is a non-key attribute.)
Transitive FD: An FD that exists among non-key attributes is called a Transitive FD. (Ex: If
{A, B} is the key and C and D are non-key attributes of a relation R, then the FD C→D is a
Transitive FD.)
Key: <empno,course>
Second Normal Form (2NF): A relation R is in 2NF if it is in 1NF and does not contain any
partial functional dependencies. A functional dependency in which a non-key attribute
depends on only part of the key is called “Partial FD”.
Due to the presence of partial FDs, above relation is not in 2NF. It can be converted into a
collection of 2NF relations as follows:
Store( Store, Mgr)
Stock
Third Normal Form (3NF): A relation ‘R’ is in 3NF if it is in 2NF and does not contain any
transitive functional dependencies. An FD in which a non-key attribute determines another
non-key attribute is called a “Transitive FD”.
Ex1:
Consider R(A,B,C,D) F = { A →B, B→C, A→D }
{A}+ = {A,B,C,D}
Hence, {A} is the key
Hence B,C and D are called non-key attributes
Based on F, R is in 2NF but not in 3NF, because B→C is a transitive FD.
R is decomposed into a collection of 3NF relations as follows: R1(B, C), R2(A, B, D)
Ex 1:
Consider R(Teacher#, Student#, Course#, Grade) and following FDs:
FD1: {Teacher#}→{Course#}
FD2: {Teacher#, Student#}→{Course#, Grade}
FD3: {Student#, Course#}→{Teacher#, Grade}
In FD1, the determinant is not a key. Hence, R is not in BCNF. It can be decomposed into a
collection of BCNF relations as follows:
R1( Teacher#, Course#) R2( Student#, Teacher#, Grade)
Ex 2:
Consider R(A,B,C,D,E) and F ={A→B, BC→D, D→E, E→A}
Find all candidate keys of R. Find the best normal form that R satisfies. Decompose R into a
collection of BCNF relations.
Sol:
{AC}+ = {A,B,C,D,E} {DC}+ = {A,B,C,D,E}
{EC}+ = {A,B,C,D,E} {BC}+ = {A,B,C,D,E}
Therefore, the candidate keys are {AC}, {DC}, {EC} and {BC}
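Hence, all attributes are key attributes or prime attributes. Hence, R is in 3NF. But, except
the FD BC→D, the remaining FDs violate BCNF because their determinants are not super keys.
Hence, decompose R on E→A into:
R1(E,A), R2( B, C, D, E)
R2 is still not in BCNF because D→E holds on it and D is not a super key of R2.
Now, decompose R2 into R21(D,E) and R22(B,C,D). Note that D→B is in F+ (from D→E, E→A and
A→B) and holds on R22, where D is not a super key. Splitting R22 into R221(D,B) and
R222(C,D) gives {R1, R21, R221, R222}, a BCNF collection of R.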
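The candidate keys in this example can be confirmed by brute force: try attribute sets in increasing size and keep those whose closure is the whole relation and which contain no smaller key. This is only a sketch (it is exponential in the number of attributes), and the names `closure` and `candidate_keys` are assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure covers attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # x contains a smaller key, so it is not minimal
            if closure(x, fds) == set(attrs):
                keys.append(x)
    return keys

R = "ABCDE"
F = [("A", "B"), ("BC", "D"), ("D", "E"), ("E", "A")]
print(["".join(sorted(k)) for k in candidate_keys(R, F)])   # ['AC', 'BC', 'CD', 'CE']
```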
X Y A
x y1 a
x y2 ?
Let R be in BCNF, and let the FD X→A hold on R. Hence, the value of A in the second tuple
should be ‘a’. It appears that the (X,A) pair <x, a> is stored redundantly.
But when R is in BCNF, X must be a key and hence it must determine Y also. Hence, the
value of Y should be the same (either y1 or y2) in both tuples, i.e., the 2 tuples represent a
single tuple. Hence, there is no redundancy.
Lossless join: Let R be a relation which is decomposed into R1 and R2 with sets of attributes X
and Y. Let r be an instance of R.
If πX(r) ⋈πY(r) = r for every legal instance r, then the decomposition of R is a lossless-join
decomposition.
Test: Let R be a relation and F be the set of FDs on R.
Let R be decomposed into R1 and R2.
If either the FD R1∩R2 →R1 or R1∩R2 →R2 is in F+, then the decomposition is
lossless.
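The test above reduces to one closure computation on the common attributes. The sketch below applies it to the relation R(A,B,C,D) with F = {A→B, B→C, A→D} and its decomposition into R1(B,C) and R2(A,B,D) from the 3NF example. The function names are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) must determine R1 or R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus

F = [("A", "B"), ("B", "C"), ("A", "D")]
print(is_lossless("BC", "ABD", F))   # True: the common attribute B determines R1(B, C)
print(is_lossless("AB", "CD", F))    # False: the two parts share no attribute
```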
Dependency Preservation:
If we are able to enforce each of the original FDs on smaller relations without performing a
join, such a decomposition is said to be dependency preserving.
Test:
Let a relation R with a set of FDs ‘F’ be decomposed into 2 relations with sets of attributes X
and Y.
--- Let Fx be the set of FDs from F+ that contain only attributes in X.
--- Let Fy be the set of FDs from F+ that contain only attributes in Y.
If (Fx U Fy)+ = F+, the decomposition is dependency preserving.
Ex: Let R(A,B,C)
F = { A→ B, B→C, C→A} is split into R1(A,B) and R2(B,C)
Is this decomposition dependency preserving?
Sol:
F+ = {A→B, B→C, C→A, C→B, A→C, B→A}
X = {A,B} Y = {B,C}
Fx = {A → B, B→A}
Fy = {B→C,C→B}
Fx U Fy = {A→B, B→C, C→B, B→A}
(Fx U Fy)+ = {A→B, B→C,C→B, B→A,A→C,C→A}
= F+
Hence, the decomposition is dependency preserving.
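The same check can be sketched in Python. The projection of F onto a set of attributes is computed by taking the closure of every subset, which is exponential and only suitable for small examples. It is enough to verify that every FD of F follows from Fx U Fy. The names `project` and `preserves` are assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project(fds, attrs):
    """Non-trivial FDs of F+ that mention only the attributes in attrs."""
    attrs = set(attrs)
    projected = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            rhs = (closure(combo, fds) & attrs) - set(combo)
            if rhs:
                projected.append((frozenset(combo), frozenset(rhs)))
    return projected

def preserves(fds, parts):
    """True if every FD in fds is implied by the union of the projections."""
    union = [fd for part in parts for fd in project(fds, part)]
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in fds)

F = [("A", "B"), ("B", "C"), ("C", "A")]
print(preserves(F, ["AB", "BC"]))   # True: C -> A follows from C -> B and B -> A
```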
Note:
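Database workloads may be select intensive (query heavy) or storage intensive (update heavy).
Storage Intensive - prefer higher level normalization, which reduces redundancy.
Select Intensive - prefer lower level normalization, which avoids joins.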
Consider the following relation R(Course, Teacher, Book) I.e., R(C,T,B). The meaning of a
tuple is teacher T teaches course C and B is a recommended book for C.
X Y Z
a b1 c1 ----tuple t1
a b2 c2 ----tuple t2
a b1 c2 ----tuple t3
a b2 c1 ----tuple t4
In the above instance, t1.x = t2.x, t1.xy = t3.xy and t2.z = t3.z
A relation R is in 4NF if, for every MVD X→→ Y that holds on R, one of the following is true:
- Y⊆X
- XY = R
- X is a super key.
The relation R(Course, Teacher, Book) is not in 4NF due to the presence of the MVDs
C→→T and C→→B.
- R can be decomposed into R1(Course, Teacher) and R2(Course, Book) which are in 4NF.
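An MVD X→→Y can be checked on a given instance directly from its definition: for every two tuples that agree on X, the tuple that takes its X and Y values from the first and its remaining values from the second must also be present. The sketch below runs this check on the X, Y, Z instance shown above. The function name `satisfies_mvd` is an assumption.

```python
def satisfies_mvd(rows, attrs, x, y):
    """Check X ->-> Y on an instance; each row is a tuple ordered as attrs."""
    x, y = set(x), set(y)
    idx = {a: i for i, a in enumerate(attrs)}
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[idx[a]] == t2[idx[a]] for a in x):
                # X and Y values come from t1, all other values from t2.
                mixed = tuple(
                    t1[idx[a]] if a in x or a in y else t2[idx[a]] for a in attrs
                )
                if mixed not in present:
                    return False
    return True

attrs = ("X", "Y", "Z")
r = [("a", "b1", "c1"), ("a", "b2", "c2"), ("a", "b1", "c2"), ("a", "b2", "c1")]

print(satisfies_mvd(r, attrs, "X", "Y"))       # True for the full instance
print(satisfies_mvd(r[:2], attrs, "X", "Y"))   # False: t3 and t4 are missing
```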
In a temporal database that stores data relating to time instances, it is necessary to distinguish
between the surrogate key and the business key. Every row would have both a business key
and a surrogate key. The surrogate key identifies one unique row in the database and the
business key identifies one unique entity of the modelled world. One table row represents a
slice of time holding all the entity's attributes for a defined time span. For example, a
table Employee_Contracts may hold temporal information to keep track of contracted
working hours. The business key for one contract will be identical (non-unique) in both rows
however the surrogate key for each row is unique.
Surrogate Key | Business Key | Employee Name | Working Hours Per Week | Row Valid From | Row Valid To
Join Dependencies and 5th Normal Form (Projection Join Normal Form):
Join Dependency:
If the join of R1 and R2 over Q is equal to relation R then we can say that a join
dependency exists, where R1 and R2 are the decomposition R1 (P, Q) and R2 (Q, S) of a
given relation R (P, Q, S). R1 and R2 are a lossless decomposition of R.
Fifth normal form (5NF) is also known as Project-Join Normal Form (PJNF). It is a
level of database normalization designed to reduce redundancy in relational databases. A
relation is said to be in 5NF if and only if it is in 4NF and every non-trivial join
dependency on it is implied by its candidate keys. A relation is said to have a join
dependency if it can be recreated by joining multiple sub relations, each of which has a
subset of the attributes of the original relation.
Example:
Consider the relation R below having the schema R(Supplier, Product, Consumer). The
primary key is a combination of all three attributes of the relation.
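Table 1: R(Supplier, Product, Consumer)
Supplier  Product  Consumer
S1        P1       C2
S1        P2       C1
S2        P1       C1
S1        P1       C1

Table 2: projection of Table 1 on (Supplier, Product)
Supplier  Product
S1        P1
S1        P2
S2        P1

Table 3: projection of Table 1 on (Consumer, Product)
Consumer  Product
C2        P1
C1        P2
C1        P1

Table 4: projection of Table 1 on (Supplier, Consumer)
Supplier  Consumer
S1        C2
S1        C1
S2        C1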
Table 1 has no non-trivial FDs and no non-trivial MVDs. The key for the table is {Supplier,
Product, Consumer}. Hence, the table is in 4NF. Still, there is redundancy in the table: the
<S1,P1> pair and the <P1,C1> pair are each stored more than once. This redundancy is due to a
join dependency. ⋈{Table 2,Table 3,Table 4} gives the original instance of the table
(Table 1). Hence, a join dependency exists in Table 1. Therefore, Table 1 is not in 5NF or
PJNF. However, Table 2, Table 3 and Table 4 satisfy 5NF as they have no multi valued
dependency and cannot be decomposed further. This is not true in all cases: when joining the
decomposed tables does not give back the original table, no such join dependency holds, and
the original table is said to be in 5NF provided it is already in 4NF. In practice, 5NF is
rarely applied and remains largely a theoretical concept.
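The join dependency in this example can be checked with sets of tuples. The sketch below projects Table 1 onto its three attribute pairs and joins them back. The variable names are assumptions. Joining only two of the projections produces a spurious tuple, while joining all three recovers Table 1.

```python
# Table 1: (Supplier, Product, Consumer)
table1 = {
    ("S1", "P1", "C2"),
    ("S1", "P2", "C1"),
    ("S2", "P1", "C1"),
    ("S1", "P1", "C1"),
}

# The three binary projections.
sp = {(s, p) for s, p, c in table1}
pc = {(p, c) for s, p, c in table1}
sc = {(s, c) for s, p, c in table1}

# Join (Supplier, Product) with (Product, Consumer) on Product.
two_way = {(s, p, c) for s, p in sp for p2, c in pc if p == p2}
# Joining with (Supplier, Consumer) keeps only tuples whose pair occurs there.
three_way = {t for t in two_way if (t[0], t[2]) in sc}

print(two_way == table1)          # False
print(sorted(two_way - table1))   # [('S2', 'P1', 'C2')]
print(three_way == table1)        # True: the join dependency holds on this instance
```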
But, this decomposition is not dependency preserving because to check BC→D, we need to
join R1 and R3.
***The following decomposition of R(A,B,C,D,E) under same set of FDs is dependency
preserving. R1(B,C,D) R2(A,B) R3(A,C,E)***.
Ex 2:
Identify all candidate keys for R. Identify the best normal form that R satisfies. Decompose
R into a set of BCNF relations. Decompose R into a set of 3NF relations.
Sol: R =(A, B, C, D). F = {C→D, C→A, B→C}. The only candidate key is {B}
R is in 2NF but not in 3NF because FDs C→D and C→A are transitive. Now, decompose R
into R1(C,D) and R2(A,B,C). R2 is still not in 3NF. Decompose R2 into R3(C,A) and
R4(B,C) . Now, R1, R3 and R4 are in 3NF as well as in BCNF. The decomposition is both
lossless and dependency preserving.
Candidate Keys of R are: {AB}, {BC}, {CD} and {AD}. Hence, all attributes are key
attributes. As there are no partial or transitive FDs, R is in 3NF. But, C →A and D → B cause
violation of BCNF. Hence, decompose R into R1(D,B) and R2(D,A,C). Still, R2 is not in
BCNF. Hence, decompose R2 into R3(C,A) and R4(D,C).
Ex 4:
Find the best normal form satisfied by the relation R(A, B, C, D, E) with FD set
F={ BC→D, AC→BE, B→E }
The only candidate key of R is {AC} because {AC}+ = {A,B,C,D,E}. Also, neither A nor
C is determined by any other attribute, so every key must contain both. Hence, no other
candidate key exists.
Prime attributes are those attributes which are part of a candidate key, i.e., A and C in
this example; B, D and E are non-prime attributes.
The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or
composite attributes.
The relation is also in 2nd normal form because in BC→D, BC is not a proper subset of the
candidate key {AC}. In AC→BE, AC is the candidate key. In B→E, B is not a proper subset of
the candidate key {AC}.
The relation is not in 3rd normal form because in BC→D, BC is not a super key and D is not
a prime attribute. In B→E, B is not a super key and E is not a prime attribute. Hence, both
FDs violate the condition of 3rd normal form. So the highest normal form of the relation is
2nd normal form.
To get a collection of 3NF relations, create R1 and R2 as follows.
R1(B,E) FR1= {B→E} R2(A,C,B,D) FR2= {BC→D, AC→B}. Now, R1 is in 3NF but R2 is
not in 3NF because in the FD BC→D, BC is not a super key and D is not a prime attribute.
Now, decompose R2 as follows. R21(B,C,D) R22(A,C,B).
Finally, {R1, R21,R22} is a 3NF collection of R. This collection is in BCNF also.
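The reasoning in Ex 4 can be sketched as a small checker that finds the candidate keys, derives the prime attributes, and classifies each given FD. It tests only the FDs listed in F, and all function names are assumptions for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure covers attrs (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) == set(attrs):
                keys.append(x)
    return keys

def classify(attrs, fds):
    """Label each FD: BCNF-safe, allowed by 3NF only, or a 3NF violation."""
    prime = set().union(*candidate_keys(attrs, fds))
    report = {}
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attrs):
            report[(lhs, rhs)] = "satisfies BCNF"
        elif set(rhs) - set(lhs) <= prime:
            report[(lhs, rhs)] = "satisfies 3NF only"
        else:
            report[(lhs, rhs)] = "violates 3NF"
    return report

R = "ABCDE"
F = [("BC", "D"), ("AC", "BE"), ("B", "E")]
print(["".join(sorted(k)) for k in candidate_keys(R, F)])   # ['AC']
for (lhs, rhs), verdict in classify(R, F).items():
    print(lhs, "->", rhs, ":", verdict)
```

On this input, BC→D and B→E are reported as 3NF violations and AC→BE as BCNF-safe, which agrees with the argument above.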
-----XXXXX-----