DE Module3
The goal of relational database design is to generate a set of relation schemas that allow us to store
information without unnecessary redundancy, yet allow us to retrieve information easily. The approach is
to decompose relations into normal forms using functional dependencies.
Functional Dependency
A functional dependency (FD) occurs when one attribute in a relation uniquely determines another
attribute.
If X and Y are two attributes of a relation and, for each value of X, there is only one corresponding
value of Y, then Y is functionally dependent on X, or X determines Y. It is denoted as X → Y.
A functional dependency may also be based on a composite attribute, i.e. XY → Z, where Z is
functionally dependent on the composite attribute XY.
X → Y does not imply Y → X.
Functional dependency is a constraint and cannot be determined by inspection of an instance of the
relation, unless we know that it is true for all possible legal states of the relation.
In the diagrammatic notation of an FD, the left hand side (LHS) attributes are connected by vertical lines
to the horizontal FD line, while the right hand side (RHS) attributes are connected by arrows pointing
to the attributes.
Example:
In Student(Rollno, Name, Address)
Rollno → Name
Rollno → Address
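An instance of a relation can refute an FD but can never prove it. The following is a minimal Python sketch of that check; the function name violates_fd and the sample rows are hypothetical, chosen only to illustrate the Student example.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if the given rows contain a counterexample to lhs -> rhs."""
    seen = {}  # maps each LHS value tuple to the RHS value tuple first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return True  # same LHS values paired with different RHS values
    return False


# Hypothetical sample rows for Student(Rollno, Name, Address)
students = [
    {"Rollno": 1, "Name": "Asha", "Address": "Pune"},
    {"Rollno": 2, "Name": "Ravi", "Address": "Pune"},
    {"Rollno": 3, "Name": "Asha", "Address": "Delhi"},
]

print(violates_fd(students, ["Rollno"], ["Name"]))   # False
print(violates_fd(students, ["Name"], ["Address"]))  # True
```

A result of False only means that this particular instance does not contradict the FD; whether the FD really holds depends on all legal states of the relation.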
iv. Decomposition Rule (Projective Rule)
If A → BC then A → B and A → C
Ex: Rollno → FirstName, LastName
Then Rollno → FirstName and Rollno → LastName
Proof:
A → BC holds
Then BC → B (by Reflexive Rule), BC → C (by Reflexive Rule)
Thus A → B (by Transitive Rule) and A → C (by Transitive Rule)
Thus Union and Decomposition rules together give us some choices as to how to
choose a set of functional dependencies.
Two sets of FDs X and Y are equivalent only if X implies Y and Y implies X, i.e. every FD in each set can be derived from the other set.
If X1X2…Xn → B1B2…Bn then the dependency is
- Trivial if the Bs are a subset of the Xs.
- Non-trivial if at least one of the Bs is not among the Xs.
- Completely non-trivial if none of the Bs are among the Xs.
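The three cases can be told apart with plain set operations. This is a small sketch, assuming attributes are single letters so that a string such as "AB" stands for the set {A, B}; the function name classify_fd is made up for illustration, and it returns the most specific label that applies.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs by comparing its two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # every RHS attribute is already on the LHS
    if rhs & lhs:
        return "non-trivial"             # some, but not all, RHS attributes are on the LHS
    return "completely non-trivial"      # LHS and RHS share no attribute


print(classify_fd("AB", "A"))   # trivial
print(classify_fd("AB", "BC"))  # non-trivial
print(classify_fd("AB", "CD"))  # completely non-trivial
```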
Given
Let F be the set of FDs, A be a set of attributes, B be the set of attributes determined by A, and
C be the set of new attributes determined by B (i.e. indirectly from A); then we compute A+
(the closure of A).
Let Result be a variable which accumulates the attributes determined by A.
Algorithm
Step 1: Result = A
Step 2: Repeat Steps 3 and 4 until Result is unchanged
Step 3: For each FD B → C in F
Step 4: If B is a subset of Result then Result = Result ∪ C
Step 5: End
Example: Given set F
A → BC, D → E, DF → GAH, G → DF and D → I
Computing A+
Initially A determines A, then on further iterations, it adds B and C
Thus A+ = ABC
Computing D+
Initially D determines D, then on further iterations, it adds E and I
Thus D+ = DEI
Computing DF+
Initially DF determines D and F, then on further iterations, it adds E, G, A, H and I. On still
further iterations it adds B and C
Thus DF+ = DFEGAHIBC
Computing G+
Initially G determines G, then on further iterations, it adds D and F. On further iterations it
adds E, A, H and I. On still further iterations it adds B and C
Thus G+ = GDFEAHIBC
So the closures computed under F are
A+ = ABC
D+ = DEI
DF+ = DFEGAHIBC
G+ = GDFEAHIBC
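The closure algorithm and the worked example above can be sketched in Python as follows. This is one possible rendering, assuming single-letter attribute names and FDs written as (LHS, RHS) string pairs; the function name closure is our own choice.

```python
def closure(attrs, fds):
    """Compute the closure of the attribute set attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until Result is unchanged
        changed = False
        for lhs, rhs in fds:            # for each FD B -> C in F
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # Result = Result U C
                changed = True
    return result


# The set F from the example
F = [("A", "BC"), ("D", "E"), ("DF", "GAH"), ("G", "DF"), ("D", "I")]

print("".join(sorted(closure("A", F))))   # ABC
print("".join(sorted(closure("D", F))))   # DEI
print("".join(sorted(closure("DF", F))))  # ABCDEFGHI
print("".join(sorted(closure("G", F))))   # ABCDEFGHI
```

The output is sorted alphabetically, so DF+ and G+ print the same nine attributes that the worked example lists in order of discovery.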
C+ = ABCD, B+ = ABCD, A+ = ABCD, D+ = ABCD, AB+ = ABCD, BC+ = ABCD,
ABC+ = ABCD
Thus any of A, B, C or D can be the key.
Algorithm
Step 1: Repeat until F is unchanged
Step 2: Use Union rule to replace any FDs of the form A → B and A → C with A → BC
Step 3: Find any FD A → B with extraneous attributes in either A or in B
Step 4: If extraneous attributes are found then delete them from A → B
Step 5: Goto step 1
Step 6: End
Example:
Given set F
A → BC, B → C, A → B, AB → C
Computing Fc
A → BC, A → B
Thus on combining, A → BC (since same LHS attribute)
A is extraneous in AB → C since already B → C
Thus delete A and it becomes B → C
Now from A → BC and B → C, C is extraneous in the first FD since already B → C
Thus delete C and it becomes A → B.
Thus Fc = {A → B, B → C}.
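The canonical cover procedure can be sketched in Python as shown here. This is an illustrative sketch, not a definitive implementation: it assumes single-letter attributes, FDs given as (LHS, RHS) string pairs, and the usual closure-based test for extraneous attributes. The names closure, drop_one_extraneous and canonical_cover are our own.

```python
def closure(attrs, fds):
    """Closure of attrs under fds (pairs of attribute collections)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def drop_one_extraneous(cover):
    """Delete a single extraneous attribute in place; return True if one was found."""
    for i, (lhs, rhs) in enumerate(cover):
        for a in lhs:
            # a is extraneous on the left if (lhs - a) still determines rhs
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                cover[i] = (lhs - {a}, rhs)
                return True
        for b in rhs:
            # b is extraneous on the right if lhs still determines b
            # once b has been removed from this FD
            reduced = cover[:i] + [(lhs, rhs - {b})] + cover[i + 1:]
            if b in closure(lhs, reduced):
                cover[i] = (lhs, rhs - {b})
                return True
    return False


def canonical_cover(fds):
    cover = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        merged = {}
        for lhs, rhs in cover:          # Union rule: combine FDs with the same LHS
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        cover = [(l, r) for l, r in merged.items() if r]
        if not drop_one_extraneous(cover):
            return cover                # F is unchanged, so stop


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in sorted(canonical_cover(F), key=lambda fd: sorted(fd[0])):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```

In general a set of FDs can have more than one canonical cover, so on other inputs the result may depend on the order in which attributes are tested.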
ANOMALIES IN DESIGNING DB
Data redundancy means repetition of information in the relation(or table). The aim of the
database system is to reduce redundancy, because redundancy leads to the wastage of storage space and an
increase in the size of stored data. Redundancy also gives rise to inconsistency problems which are known as
Data Anomalies.
These problems are remedied by decomposition of the relations (or tables). Decomposition is the
replacement of a relation schema R = (A1, A2, …, An) by a set of relation schemas {R1, R2, …, Rm} such
that Ri is a subset of R (for 1 <= i <= m) and R1 ∪ R2 ∪ … ∪ Rm = R
Some keywords
1. Functional dependency : In a given table, an attribute Y is said to have a functional dependency
on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y
value.
For example, among the attributes "Employee ID" and "Employee Date of Birth", the functional
dependency {Employee ID} → {Employee Date of Birth} would hold.
3. Full functional dependency : An attribute is fully functionally dependent on a set of attributes X if
it is: functionally dependent on X, and not functionally dependent on any proper subset of X.
{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional
dependency, because it is also dependent on {Employee ID}.
8. Candidate key : Candidate keys are the special subset of superkeys that do not have any
extraneous attributes in them: a candidate key is a minimal superkey.
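The idea of a minimal superkey can be illustrated with a brute-force search over attribute subsets. This sketch assumes single-letter attributes, and the relation R(A, B, C, D) with FDs A → BCD and BC → A is a hypothetical example, not one taken from these notes. The search is exponential in the number of attributes, so it is only meant for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return every minimal attribute set whose closure is the whole schema."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):      # smallest subsets first
        for combo in combinations(sorted(attributes), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue                            # a superkey, but not minimal
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys


# Hypothetical schema and FDs, for illustration only
fds = [("A", "BCD"), ("BC", "A")]
print([sorted(k) for k in candidate_keys("ABCD", fds)])  # [['A'], ['B', 'C']]
```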
NORMALIZATION
It is the decomposition of the relation (or table) into smaller relations based on the concept of functional
dependencies, to overcome undesirable anomalies. It groups the data over a number of tables which are
independent and contain no duplicate data. It eliminates redundancy and promotes integrity.
There are different types of normal forms and each normal form is usually built upon the previous normal
form.
Ex: table not in 1NF
FriendInfo
Id Name FavouriteArtist
10 Smith Lata, Kishore
20 Hary Kishore, Rafi
Tables in 1NF
FriendList
Id    Name
10    Smith
20    Hary
ArtistList
Id    FavouriteArtist
10    Lata
10    Kishore
20    Kishore
20    Rafi
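The move to 1NF shown above can be sketched in Python. The rows mirror the FriendInfo table; the variable names are our own, and splitting on a comma is an assumption about how the multi-valued column is stored.

```python
# The FriendInfo table, with a multi-valued FavouriteArtist column
friend_info = [
    {"Id": 10, "Name": "Smith", "FavouriteArtist": "Lata, Kishore"},
    {"Id": 20, "Name": "Hary", "FavouriteArtist": "Kishore, Rafi"},
]

# FriendList keeps one row per friend
friend_list = [{"Id": r["Id"], "Name": r["Name"]} for r in friend_info]

# ArtistList holds one atomic artist value per row
artist_list = [
    {"Id": r["Id"], "FavouriteArtist": artist.strip()}
    for r in friend_info
    for artist in r["FavouriteArtist"].split(",")
]

for row in artist_list:
    print(row["Id"], row["FavouriteArtist"])
# 10 Lata
# 10 Kishore
# 20 Kishore
# 20 Rafi
```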
Empno and Projno together form the composite primary key, but Ename, Pname and Ploc do not depend on the
whole of the primary key, so the table is not in 2NF and must be decomposed.
Table in 2NF
Empproj1
Empno Projno Totalhrs
Empproj2
Empno Ename
Empproj3
Empid is the primary key, but Dname depends on Deptno (i.e. a non-key attribute), so the transitive
dependency must be removed and the table decomposed.
Tables in 3NF
Empinfo1
Empinfo2
Deptno Dname
It is usually encountered when there are interdependencies ie one attribute depends on another and
the other attribute depends on the first.
A relation is said to be in BCNF if it is already in 3NF and the left hand side of every
non-trivial dependency is a candidate key (or, more generally, a superkey).
A 3NF relation may fail to be in BCNF when the following conditions hold:
1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. The candidate keys have some attributes in common.
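A BCNF test can be sketched with attribute closures: an FD violates BCNF if it is non-trivial and its left hand side does not determine every attribute. Note that this sketch tests for a superkey rather than strictly a candidate key. The relation R(A, B, C) with AB → C and C → B is a hypothetical example with two overlapping composite candidate keys (AB and AC), and the function names are our own.

```python
def closure(attrs, fds):
    """Closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left hand side is not a superkey."""
    attributes = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)               # ignore trivial FDs
        and closure(lhs, fds) != attributes       # LHS does not determine everything
    ]


# Hypothetical relation R(A, B, C) with candidate keys AB and AC
fds = [("AB", "C"), ("C", "B")]
print(bcnf_violations("ABC", fds))  # [('C', 'B')]
```

Here C → B is the offending dependency: C on its own is not a key, even though B is part of a candidate key, which is why the relation is in 3NF but not in BCNF.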
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated.
Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that Rao is the Head of
Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting
Head of Dept. from the given relation. The normalized relations are shown in the following.
ProfessorCode Department PercentTime
P1 Physics 50
P1 Mathematics 50
P2 Chemistry 25
P2 Physics 75
P3 Mathematics 100
Department HeadOfDepartment
Physics Ghosh
Mathematics Krishnan
Chemistry Rao
If a relation contains multivalued dependencies, i.e. the values of a column are linked with multiple values
of other columns independently, then the relation is not in 4NF. If all possible combinations of the values of
the columns can exist then it is not in 4NF, and must be decomposed. Thus 4NF is based on multivalued
dependency.
Example: In the example there are two many-to-many relationships, i.e. one between employees and
skills, and one between employees and languages.
Example: Table not in 4NF
Employee
Employee  Skill            Language
Smith     Type             English
Smith     Public Speaking  Hindi
Jack      Type             English
Jack      ShortHand        Hindi
The above relation should be divided into following relations in order to satisfy 4NF.
Tables in 4NF
EmpSkill
Employee  Skill
Smith     Type
Smith     Public Speaking
Jack      Type
Jack      ShortHand
EmpLang
Employee  Language
Smith     English
Smith     Hindi
Jack      English
Jack      Hindi
Multivalue dependency is denoted as ↠
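The 4NF decomposition above amounts to two projections of the Employee table. The Python sketch below builds them from the rows of the example and then joins them back on Employee; the variable names are our own. Because skills and languages are independent of each other, the join pairs every skill of an employee with every language of that employee.

```python
# Rows of the Employee table from the example: (Employee, Skill, Language)
employee = {
    ("Smith", "Type", "English"),
    ("Smith", "Public Speaking", "Hindi"),
    ("Jack", "Type", "English"),
    ("Jack", "ShortHand", "Hindi"),
}

# Project onto (Employee, Skill) and (Employee, Language)
emp_skill = {(e, s) for e, s, _ in employee}
emp_lang = {(e, l) for e, _, l in employee}

# Join the two projections back on Employee
rejoined = {
    (e, s, l)
    for e, s in emp_skill
    for e2, l in emp_lang
    if e == e2
}

print(len(emp_skill), len(emp_lang))  # 4 4
print(employee <= rejoined)           # True
print(len(rejoined))                  # 8
```

The join contains the four original rows plus the remaining skill and language combinations, which is exactly the set of rows the single three-column table would need to hold to represent the two independent facts.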
If all possible combinations of the values of the columns cannot exist then the relation is not in 5NF and must
be decomposed, along with an extra join restriction table. Thus 5NF is based on join dependency. It is also
known as Project Join Normal Form (PJNF).
Example :
If an agent sells a certain product, and he represents a company making that product, then he sells that
product for that company.
Table not in 5NF
Tab1
Agent Company Product
Smith Ford Car
Smith Ford Truck
Smith GM Car
Smith GM Truck
Jones Ford Car
Tables in 5NF
Note:
If a relation has a join dependency then it can be divided into smaller relations such that if we
combine the smaller relations then we can get the original table.
If the join dependency doesn’t exist then either data is lost or new entries are created.
Join dependency is a further generalization of multivalued dependency.
If the join of R1 and R2 over C is equal to relation R then we can say that a join dependency (JD)
exists, where R1(A, B, C) and R2(C, D) are the decomposition of a given relation R(A,
B, C, D).
Alternatively, R1 and R2 are a lossless decomposition of R.
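Whether a decomposition is lossless for a particular instance can be checked by projecting the rows and joining the projections back. The sketch below does this for R(A, B, C, D) split into R1(A, B, C) and R2(C, D); the helper names and the sample rows are hypothetical. It tests one instance only, so it can expose a lossy decomposition but cannot prove that the join dependency holds for every legal state.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, dropping duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two projections produced by project()."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # rows match when they agree on every shared attribute
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


# Hypothetical instances of R(A, B, C, D)
good = [{"A": 1, "B": 1, "C": 1, "D": 1}, {"A": 2, "B": 2, "C": 2, "D": 2}]
bad = [{"A": 1, "B": 1, "C": 1, "D": 1}, {"A": 2, "B": 2, "C": 1, "D": 2}]

print(is_lossless(good, "ABC", "CD"))  # True
print(is_lossless(bad, "ABC", "CD"))   # False
```

In the second instance the shared attribute C has the same value in both rows but different D values, so the join produces extra rows.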
information may occur. The decomposition of one relation into subrelations which do not
give back the exact original records on joining the decomposed tables is known as lossy
decomposition (i.e. the join gives either more or fewer rows). All normal forms satisfy
this property.
Now if the decomposition is R1(ABC) and R2(AD) then it is dependency preserving because FD A → BC is a part
of relation R1(ABC).
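The reasoning used here, that an FD is preserved when all of its attributes sit inside one of the decomposed relations, can be sketched in Python. This is only a sufficient test: an FD that fails it may still be implied by the FDs that do fit inside the smaller relations. The function name directly_preserved is our own, and the second FD, C → D, is a hypothetical addition included only to show a failing case.

```python
def directly_preserved(fds, schemas):
    """Return the FDs whose attributes all fall inside a single schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if any(set(lhs) | set(rhs) <= set(schema) for schema in schemas)
    ]


# A -> BC is the FD from the text; C -> D is a hypothetical extra FD
fds = [("A", "BC"), ("C", "D")]
schemas = ["ABC", "AD"]  # R1(ABC) and R2(AD)

print(directly_preserved(fds, schemas))  # [('A', 'BC')]
```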