Unit II
Syllabus
• Relational algebra,
• Tuple and domain relational calculus,
• SQL3, DDL and DML constructs,
• Open source and commercial DBMSs: MySQL, Oracle,
DB2, SQL Server.
• Relational database design:
– Domain and data dependency,
– Armstrong's axioms,
– Normal forms,
– Dependency preservation,
– Lossless design.
Relational algebra
Relational Query Languages
• Query languages: Allow manipulation and retrieval of data from
a database.
• Structured Query Language (SQL)
– Application-level query language
– Declarative
• Relational Algebra
– Intermediate language used within DBMS
– Procedural
Relational Algebra in a DBMS
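(Figure: inside the DBMS, the SQL query goes to the parser, which produces a relational algebra expression. The query optimizer turns this into an optimized relational algebra expression (query execution plan), and the code generator turns the plan into executable code.)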
Relational Algebra
• Relational algebra is a language for expressing relational
database queries.
• Relational algebra is a procedural query language.
• A query language is a language in which a user requests
information from the database.
• The basic set of operations for the relational model is known
as the relational algebra.
Unary Relational Operations
These operations are called unary operations, because they
operate on one relation.
– select (σ)
– project (π)
– rename (ρ)
SELECT Operation
• SELECT operation is used to select a subset of the tuples from a
relation that satisfy a selection condition.
Syntax:
σ<selection condition>(R)
Loan Relation
Write the Relational Algebra expression
1. Select those tuples of the loan relation where the branch is
“Perryridge”
2. Find tuples in which the amount lent is more than $1200.
3. Find those tuples pertaining to loans of more than $1200
made by the Perryridge branch.
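The three selection queries above can be tried with a small Python sketch. It assumes each relation is a list of dicts, with one dict per tuple. It reuses the three loan tuples listed in the outer-join example later in this unit. The helper name `select` is an illustrative choice, and each comment shows the algebra expression that the call mirrors.

```python
# Sketch: a relation is a list of dicts (one dict per tuple).
# Sample tuples: the loan relation from the outer-join example in this unit.
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]


def select(condition, relation):
    """sigma_condition(relation): keep only the tuples that satisfy condition."""
    return [t for t in relation if condition(t)]


# 1. sigma branch_name = "Perryridge" (loan)
q1 = select(lambda t: t["branch_name"] == "Perryridge", loan)

# 2. sigma amount > 1200 (loan)
q2 = select(lambda t: t["amount"] > 1200, loan)

# 3. sigma branch_name = "Perryridge" AND amount > 1200 (loan)
q3 = select(lambda t: t["branch_name"] == "Perryridge" and t["amount"] > 1200, loan)

if __name__ == "__main__":
    for name, result in (("q1", q1), ("q2", q2), ("q3", q3)):
        print(name, [t["loan_number"] for t in result])
```

With only these three tuples, every loan exceeds $1200, so query 2 returns all of them. Query 3 could equally be written as one selection applied to the result of another.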
PROJECT Operation
This operation selects certain columns from the table and discards
the other columns.
Syntax:
π<attribute list>(R)
Write the Relational Algebra expression
1. List all loan numbers and the amount of the Loan.
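Below is a minimal sketch of π for this query. It uses the same list-of-dicts representation and the loan tuples from the outer-join example. The helper name `project` is hypothetical. The sketch removes duplicate rows because a relation is a set of tuples.

```python
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]


def project(attributes, relation):
    """pi_attributes(relation): keep only the listed attributes and drop
    duplicate rows, since a relation is a set of tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:  # duplicate elimination
            result.append(row)
    return result


# pi loan_number, amount (loan)
numbers_and_amounts = project(["loan_number", "amount"], loan)
print(numbers_and_amounts)
```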
Customer relation
Give the Relational Algebra expression
1. Find those customers who live in Harrison.
Rename Operation
• It is used to rename relations as well as attributes.
• The rename operation is represented by the lowercase Greek letter ρ (rho).
• Renaming is especially useful with union and join operations.
• It improves readability and understandability.
• ρS(B1, B2, …, Bn)(R) is a relation S, based on R, whose attributes are renamed B1, B2, …, Bn.
Rename Operation
• ρEMP1(EMP)
In this example we only rename the relation from EMP to EMP1.
• ρEMP1(Name1, dept1)(EMP)
In this example we rename EMP as EMP1 and the attributes Name and dept
as Name1 and dept1.
• ρ(Name1, dept1)(EMP)
In this example we rename the attributes only.
Set Theory operations
• Union Operation (∪)
• Intersection Operation (∩)
• Set Difference (or MINUS) Operation (−)
UNION Operation
The result of this operation, denoted by R ∪ S, is a relation that
includes all tuples that are either in R or in S or in both R and S.
Duplicate tuples are eliminated.
Example
The depositor relation The borrower relation
Example
Find the names of all customers who have a loan, an account, or
both, from the bank.
INTERSECTION Operation
The result of this operation, denoted by R ∩ S, is a relation that
includes all tuples that are in both R and S.
Example
To find all customers who have both a loan and an account at the
bank.
Set Difference (or MINUS) Operation
• The result of this operation, denoted by R - S, is a relation that
includes all tuples that are in R but not in S.
Set Difference (or MINUS) Operation
Find all customers of the bank who have an account but not a
loan.
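Once each relation is reduced to its customer_name attribute (after Π customer_name), the three set operations map directly onto Python's set operators. This sketch takes the borrower names from the outer-join example in this unit. The depositor names are hypothetical, because the depositor table is not reproduced here.

```python
# customer_name values of the borrower relation from the outer-join example.
borrower_names = {"Jones", "Smith", "Hayes"}
# Hypothetical customer_name values for depositor.
depositor_names = {"Johnson", "Smith", "Hayes", "Turner"}

loan_or_account = depositor_names | borrower_names   # union: loan, account, or both
loan_and_account = depositor_names & borrower_names  # intersection: both
account_no_loan = depositor_names - borrower_names   # difference: account but no loan

print(sorted(loan_or_account))
print(sorted(loan_and_account))
print(sorted(account_no_loan))
```

Both operands have the single attribute customer_name, so they are union-compatible. Python sets also discard duplicates, as relational set operations do.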
Additional Operations (Binary Operations)
• Cartesian product (or cross product) or cross join (×)
• JOIN Operation
– Natural join
– Equi join
– Theta join
– Outer join
• Division
• Assignment
Cartesian product (cross product)
• The Cartesian-product operation, denoted by a cross (×), allows us
to combine information from any two relations.
• We write the Cartesian product of relations r1 and r2 as r1 × r2; it pairs every tuple of r1 with every tuple of r2.
Example
Find the names of all customers who have a loan at the Perryridge
branch.
Result of borrower × loan.
σ branch-name = “Perryridge” (borrower × loan)
Π customer-name (σ borrower.loan-number = loan.loan-number
(σ branch-name = “Perryridge” (borrower × loan)))
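The three steps of this query (product, the two selections, then the projection) are traced in the Python sketch below. The loan and borrower tuples come from the outer-join example in this unit. One borrower tuple, ("Adams", "L-260"), is hypothetical and is added only so that the Perryridge query has an answer.

```python
from itertools import product

# loan(loan_number, branch_name, amount); borrower(customer_name, loan_number)
loan = [("L-170", "Downtown", 3000), ("L-230", "Redwood", 4000), ("L-260", "Perryridge", 1700)]
borrower = [
    ("Jones", "L-170"),
    ("Smith", "L-230"),
    ("Hayes", "L-155"),
    ("Adams", "L-260"),  # hypothetical tuple, not in the original data
]

# borrower x loan: every borrower tuple concatenated with every loan tuple.
# Columns: customer_name, borrower.loan_number, loan.loan_number, branch_name, amount
cross = [b + l for b, l in product(borrower, loan)]

# sigma branch_name = "Perryridge" (borrower x loan)
perryridge = [t for t in cross if t[3] == "Perryridge"]

# sigma borrower.loan_number = loan.loan_number (...)
matching = [t for t in perryridge if t[1] == t[2]]

# pi customer_name (...)
names = {t[0] for t in matching}
print(len(cross), len(perryridge), names)  # 12 4 {'Adams'}
```

Without the second selection, every borrower would be paired with the Perryridge loan. That is why the condition on loan numbers is needed.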
Natural join
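• Natural join (⋈) is a binary operator that is written as R ⋈ S,
where R and S are relations. The result combines each pair of tuples from R and S that agree on all the attribute names the two relations share, and each shared attribute appears only once.
Example
• As an example, consider the tables Employee and Dept and their
natural join:
Employee    Dept
Employee ⋈ Dept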
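The Employee and Dept tables are not reproduced here, so this sketch uses hypothetical sample rows. It assumes Employee(Name, EmpId, DeptName) and Dept(DeptName, Manager). The `natural_join` helper matches tuples on whichever attribute names the two relations share.

```python
def natural_join(r, s):
    """r ⋈ s for relations given as lists of dicts: pair tuples that agree on
    every attribute name the two relations share; each shared attribute
    appears once in the merged tuple."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result


# Hypothetical sample data.
employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]
dept = [
    {"DeptName": "Finance", "Manager": "George"},
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]

joined = natural_join(employee, dept)
for row in joined:
    print(row)
```

The Production department has no employee, so it does not appear in the result. If the two relations shared no attribute, `common` would be empty and the result would be the Cartesian product.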
Equi join
Theta join or condition join
• This is the same as an equi join, but it also allows other comparison
operators such as <, > and >=.
• The θ-join is a binary operator that is written as R ⋈(a θ b) S, where a
and b are attribute names, θ is a binary relation in the set {<,
≤, =, >, ≥}, and R and S are relations.
Example
• Consider tables Car and Boat which list models of cars and
boats and their respective prices.
• Suppose a customer wants to buy a car and a boat, but she
doesn't want to spend more money for the boat than for the
car.
• The θ-join on the relation CarPrice ≥ BoatPrice produces a
table with all the possible options.
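The Car and Boat tables are not reproduced here, so the prices in this sketch are hypothetical. The `theta_join` helper takes θ as a function from Python's `operator` module. Passing `operator.eq` instead of `operator.ge` turns it into an equi join.

```python
import operator


def theta_join(r, s, a, theta, b):
    """r ⋈(a θ b) s: keep each combined tuple for which theta(t[a], u[b]) holds.
    With theta = operator.eq this is an equi join."""
    return [{**t, **u} for t in r for u in s if theta(t[a], u[b])]


# Hypothetical prices.
car = [
    {"CarModel": "CarA", "CarPrice": 20000},
    {"CarModel": "CarB", "CarPrice": 30000},
    {"CarModel": "CarC", "CarPrice": 50000},
]
boat = [
    {"BoatModel": "Boat1", "BoatPrice": 10000},
    {"BoatModel": "Boat2", "BoatPrice": 40000},
    {"BoatModel": "Boat3", "BoatPrice": 60000},
]

# Car ⋈(CarPrice >= BoatPrice) Boat
options = theta_join(car, boat, "CarPrice", operator.ge, "BoatPrice")
for o in options:
    print(o["CarModel"], o["BoatModel"])
```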
Example
Outer Join
• An extension of the join operation that avoids loss of
information.
• Computes the join and then adds to the result the tuples from one
relation that do not match any tuple in the other relation.
• Uses null values:
– null signifies that the value is unknown or does not exist
– All comparisons involving null are (roughly speaking) false by definition.
Outer Join – Example
• Relation loan
loan_number branch_name amount
L-170 Downtown 3000
L-230 Redwood 4000
L-260 Perryridge 1700
Relation borrower
customer_name loan_number
Jones L-170
Smith L-230
Hayes L-155
loan ⋈ borrower
loan_number branch_name amount customer_name
L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
Outer Join – Example
Left Outer Join
loan ⟕ borrower
loan_number branch_name amount customer_name
L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
L-260 Perryridge 1700 null
Outer Join – Example
Right Outer Join
loan ⟖ borrower
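Below is a sketch of both outer joins on the loan and borrower tuples listed above. It uses Python's `None` to stand in for null. The sketch never compares nulls, so the lack of three-valued logic in Python does not matter here. The right outer join is obtained by swapping the operands, which changes only the column order.

```python
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]


def left_outer_join(r, s):
    """r ⟕ s: the natural join, plus every r tuple with no match in s,
    padded with None (standing in for null) on the attributes of s."""
    s_attrs = set().union(*(u.keys() for u in s))
    result = []
    for t in r:
        matches = [u for u in s if all(t[a] == u[a] for a in t.keys() & u.keys())]
        if matches:
            result.extend({**t, **u} for u in matches)
        else:
            result.append({**t, **{a: None for a in s_attrs - set(t)}})
    return result


def right_outer_join(r, s):
    """r ⟖ s: keeps the unmatched tuples of s instead."""
    return left_outer_join(s, r)


left = left_outer_join(loan, borrower)
right = right_outer_join(loan, borrower)
for row in left + right:
    print(row)
```

The left result reproduces the table above, including L-260 with a null customer_name. The right result keeps Hayes (L-155) with null branch_name and amount.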
Division
• The division is a binary operation that is written as R ÷ S.
• The result consists of the restrictions of tuples in R to the attribute names that
are in the header of R but not in the header of S, for which all their combinations with
tuples in S are present in R. It is suited to queries that include
the phrase “for all”.
• QUESTION: List the names of students who have completed all the
tasks of the database project.
Completed (R)
DBProject (S)
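The Completed and DBProject tables are not reproduced here, so this sketch uses hypothetical contents. It assumes Completed(Student, Task) and DBProject(Task). Division is computed in two ways: directly from the “for all” reading, and from projection, product and difference only, using the identity R ÷ S = πStudent(R) − πStudent((πStudent(R) × S) − R).

```python
# Hypothetical contents for Completed(Student, Task) and DBProject(Task).
completed = {
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"),
    ("Sarah", "Database1"), ("Sarah", "Database2"),
}
db_project = {"Database1", "Database2"}


def divide(r, s):
    """R ÷ S: the students that are paired in R with every task in S."""
    students = {student for student, _ in r}
    return {st for st in students if all((st, task) in r for task in s)}


def divide_with_basic_ops(r, s):
    """Same result using only projection, product and difference."""
    students = {student for student, _ in r}                    # pi_Student(R)
    every_pair = {(st, task) for st in students for task in s}  # pi_Student(R) x S
    missing = {st for st, _ in every_pair - r}                  # students lacking a task
    return students - missing


print(sorted(divide(completed, db_project)))  # ['Fred', 'Sarah']
```

Eugene is excluded because he has no Database2 tuple in Completed.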
Assignment Operation
• The assignment operation (←) provides a convenient
way to express complex queries.
• The assignment operation works like assignment in a
programming language.
• Assignment must always be made to a temporary
relation variable.
• Example:
– Temp1 ← r × s
– ΠA,B (Temp1)
– The result to the right of the ← is assigned to the
relation variable on the left of the ←.
– The variable may be used in subsequent expressions.
Aggregate Functions
• An aggregate function takes a collection of values and returns
a single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Sum function
works relation
Find the total of the salaries of all the employees in the bank.
The relational algebra expression is:
𝒢 sum(salary) (works)
Count function
• Find the number of branches appearing in the works relation.
𝒢 count-distinct(branch-name) (works)
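The works table is not reproduced here, so the four tuples in this sketch are hypothetical. It assumes works(person_name, branch_name, salary). The sketch shows the difference between count-distinct, which removes duplicates before counting, and a plain count.

```python
# Hypothetical tuples for works(person_name, branch_name, salary).
works = [
    ("Adams", "Perryridge", 1500),
    ("Brown", "Perryridge", 1300),
    ("Johnson", "Downtown", 1500),
    ("Rao", "Austin", 1600),
]

# G sum(salary) (works): a single value for the whole relation
total_salary = sum(salary for _, _, salary in works)

# G count-distinct(branch_name) (works): duplicates removed before counting
branch_count = len({branch for _, branch, _ in works})

# A plain count(branch_name) would count Perryridge twice
branch_rows = len([branch for _, branch, _ in works])

print(total_salary, branch_count, branch_rows)  # 5900 3 4
```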
Banking Example
Example Queries
• Find all loans of over $1200.
σ amount > 1200 (loan)
• Find the loan number for each loan of an amount greater than
$1200.
Π loan_number (σ amount > 1200 (loan))
Continued…………
• Find the names of all customers who have a loan, an account,
or both, from the bank.
Π customer_name (borrower) ∪ Π customer_name (depositor)
Continued………
• Find all customers of the bank who have an account
but not a loan.
Π customer_name (depositor) − Π customer_name (borrower)
Continued…………
• Find the names of all customers who have a loan at the
Perryridge branch.
Π customer_name (σ branch_name = “Perryridge”
(σ borrower.loan_number = loan.loan_number (borrower × loan)))
Continued…………
• Find the names of all customers who have a loan at the
bank along with the loan number and the loan amount.
Solve
• employee (person-name, street, city)
• works (person-name, company-name, salary)
• company (company-name, city)
• manages (person-name, manager-name)
Consider the above relational database. Give an expression in
the relational algebra to express each of the following queries:
Questions
1. Find the names of all employees who work for BSP.
2. Find the names and cities of residence of all employees who
work for BSP.
3. Find the names, street address, and cities of residence of all
employees who work for First Bank Corporation and earn
more than $10,000 per annum.
4. Find the names of all employees in this database who live in
the same city as the company for which they work.
5. Find the names of all employees who live in the same city
and on the same street as “Mr. Ravi”.
6. Find the names of all employees in this database who do not
work for BSP.
Relational calculus
(Tuple and domain relational calculus)
Relational calculus
• It is a nonprocedural query language.
• It describes the desired information without giving a specific
procedure for obtaining that information.
• Two types of relational calculus
– Tuple relational calculus
– Domain relational calculus
Tuple Relational Calculus
• A nonprocedural query language, where each query is of the
form
{t | P (t ) }
• It is the set of all tuples t such that predicate or condition P is
true for t.
Logical connectors
There are some logical connectors that are used in relational
calculus.
∈ belongs to
⇒ implication (implies)
∧ and
∨ or
¬ not (negation)
∃ existential quantifier (there exists)
∀ universal quantifier (for all)
Calculus Formula
1. Set of attributes and constants
2. Set of comparison operators (e.g., <, ≤, =, ≠, >, ≥)
3. Set of connectives: and (∧), or (∨), not (¬)
4. Set of quantifiers:
∃ t ∈ r (Q(t)): “there exists” a tuple t in relation
r such that predicate Q(t) is true
∀ t ∈ r (Q(t)): Q is true “for all” tuples t in relation r
Banking Example
Example Queries
1. Find the loan_number, branch_name, and amount for loans of
over $1200.
{t | t ∈ loan ∧ t[amount] > 1200}
Example Queries
3. Find the names of all customers having a loan at the Perryridge
branch.
Π customer_name (σ branch_name = “Perryridge”
(σ borrower.loan_number = loan.loan_number (borrower × loan)))
The set of all tuples t such that there exists a tuple s in relation loan for
which the values of t and s for the loan_number attribute are equal.
Example Queries
4. Find the names of all customers having a loan, an account, or both at
the bank.
5. Find the names of all customers who have a loan and an account
at the bank.
{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name])
∧ ∃ u ∈ depositor (t[customer_name] = u[customer_name])}
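A tuple relational calculus query {t | P(t)} reads much like a Python comprehension, with `any()` standing in for ∃. The sketch below mirrors queries 1 and 5. The loan and borrower tuples come from the outer-join example in this unit; the depositor tuples are hypothetical. For query 5, t ranges over the finite set of names that appear in either relation, in the spirit of a safe expression.

```python
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]
# Hypothetical depositor(customer_name, account_number) tuples.
depositor = [
    {"customer_name": "Smith", "account_number": "A-101"},
    {"customer_name": "Turner", "account_number": "A-305"},
]

# Query 1: {t | t ∈ loan ∧ t[amount] > 1200}
q1 = [t for t in loan if t["amount"] > 1200]

# Query 5: {t | ∃ s ∈ borrower (t[customer_name] = s[customer_name])
#              ∧ ∃ u ∈ depositor (t[customer_name] = u[customer_name])}
candidates = {s["customer_name"] for s in borrower} | {u["customer_name"] for u in depositor}
q5 = {
    name
    for name in candidates
    if any(s["customer_name"] == name for s in borrower)      # ∃ s ∈ borrower
    and any(u["customer_name"] == name for u in depositor)    # ∃ u ∈ depositor
}
print(len(q1), sorted(q5))  # 3 ['Smith']
```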
Domain Relational Calculus
Example Queries
1. Find the loan_number, branch_name, and amount for loans of
over $1200.
Example Queries
2. Find all loan numbers for loans with an amount greater than
$1200.
SQL3, DDL and DML constructs
Relational database design:
Domain and data dependency,
Armstrong's axioms,
Normal forms,
Dependency preservation,
Lossless design.
Informal design guidelines for relational
schemas
• Semantics of attributes.
• Reducing the redundant values in tuples.
• Reducing the null values in tuples.
• Disallowing the possibility of generating spurious tuples.
Semantics of attributes
• Semantics – specifies how to interpret the attribute values stored
in a tuple of the relation.
• Name of the attribute must have some meaning.
• Relationship among the relations must be clear.
Guideline 1:
Design a relation schema so that it is easy to explain its meaning.
Do not combine attributes from multiple entity types into a single
relation.
Reducing the redundant values in tuples
• One goal of schema design is to minimize the storage space that the
base relations occupy.
• Grouping attributes into relation schemas has a significant effect on
storage space.
• Mixing attributes of multiple entities may cause problems
• Information is stored redundantly wasting storage
• Problems with update anomalies
– Insertion anomalies
– Deletion anomalies
– Modification anomalies
Emp_Com
Insertion Anomalies
Functional Dependencies
• Functional dependencies (FDs) are used to specify formal
measures of the "goodness" of relational designs.
• A functional dependency defines a property of the database
schema.
• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes.
• A functional dependency is a constraint between two sets of
attributes in a relation from a database.
• An FD is a property of the attributes in the schema R.
• The constraint must hold on every relation instance r(R).
Functional Dependencies
• A functional dependency is a constraint that specifies the
relationship between two sets of attributes, where one set can
accurately determine the value of the other set.
• It is denoted as X → Y, where X is a set of attributes that is
capable of determining the value of Y.
• The attribute set on the left side of the arrow, X, is
called the determinant, while the set on the right side, Y, is called
the dependent.
• The constraint is: for any two tuples t1 and t2 in r, if t1[X] =
t2[X], then t1[Y] = t2[Y].
• This means the value of the X component of a tuple uniquely
determines the value of the Y component.
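The pairwise condition above can be checked directly on a relation instance, as in this sketch. It assumes tuples are dicts. The sample rows are the loan tuples from the outer-join example plus a small hypothetical relation r.

```python
from itertools import combinations


def fd_holds(relation, x, y):
    """True if X -> Y is satisfied by this instance: every pair of tuples
    that agrees on all attributes in x also agrees on all attributes in y."""
    for t1, t2 in combinations(relation, 2):
        agree_on_x = all(t1[a] == t2[a] for a in x)
        agree_on_y = all(t1[a] == t2[a] for a in y)
        if agree_on_x and not agree_on_y:
            return False
    return True


loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
# Hypothetical instance in which A -> B is violated.
r = [{"A": 1, "B": "x"}, {"A": 1, "B": "y"}, {"A": 2, "B": "x"}]

print(fd_holds(loan, ["loan_number"], ["branch_name", "amount"]))  # True
print(fd_holds(r, ["A"], ["B"]))                                   # False
```

An instance can only refute an FD. As noted above, an FD is a property of the schema, so `True` here only means that this particular instance does not contradict it.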
Example
• Consider the relation schema EMP_PROJ in the figure below:
Types of Functional Dependency
1. Full functional dependency
2. Partial dependency
3. Transitive dependency
4. Trivial and Non-trivial dependency
Full functional dependency
• In a relation schema R, an FD X → Y is a full functional
dependency if removing any attribute A from X means that the dependency
no longer holds.
• That is, for every attribute A in X, (X − {A}) → Y is no longer true.
Partial dependency
• An FD X → Y is a partial dependency if there is some attribute A in
X that can be removed from X and yet the dependency still
holds, i.e., (X − {A}) → Y.
Transitive dependency
Prime & non-prime attributes
An attribute that is part of some candidate key is called a prime
attribute; an attribute that is not part of any candidate key is called a non-prime attribute.
For example:
A B C D
Inference Rules for FDs (Armstrong's Axioms)
Armstrong's inference rules
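The questions that follow cite six rules, A1 to A6. They are commonly stated as below, where X, Y, Z and W are sets of attributes and XY means X ∪ Y. A1 to A3 are Armstrong's axioms proper; A4 to A6 can be derived from them.

```latex
\begin{aligned}
&\text{A1 (Reflexivity):} && Y \subseteq X \;\Rightarrow\; X \to Y\\
&\text{A2 (Augmentation):} && X \to Y \;\Rightarrow\; XZ \to YZ\\
&\text{A3 (Transitivity):} && X \to Y,\ Y \to Z \;\Rightarrow\; X \to Z\\
&\text{A4 (Union):} && X \to Y,\ X \to Z \;\Rightarrow\; X \to YZ\\
&\text{A5 (Decomposition):} && X \to YZ \;\Rightarrow\; X \to Y,\ X \to Z\\
&\text{A6 (Pseudotransitivity):} && X \to Y,\ WY \to Z \;\Rightarrow\; WX \to Z
\end{aligned}
```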
Questions on inference rules for FD
1. Let the following set of FDs be given:
F = {A->B, C->X, BX->Z}
Prove or disprove that AC->Z.
Solution:
C->X and BX->Z, so CB->Z
By rule A6: Pseudotransitivity
A->B (given) and CB->Z, so AC->Z
By rule A6: Pseudotransitivity
Questions on inference rules for FD
2. Let the following set of FDs be given:
F = {A->B, C->D}, where C is a subset of B
Prove or disprove that A->C.
Solution:
C is a subset of B, so B->C
By rule A1: Reflexivity
A->B (given) and B->C, so A->C
By rule A3: Transitivity
Questions on inference rules for FD
3. Let the following set of FDs be given:
F = {A->B, BC->D}
Prove or disprove that AC->D.
Solution:
A->B and BC->D, so AC->D
By rule A6: Pseudotransitivity
Questions on inference rules for FD
4. Let the following set of FDs be given:
F = {A->B, BC->D}
Prove or disprove that AD->B.
Solution:
A->B, so AD->BD
By rule A2: Augmentation
AD->BD, so AD->B and AD->D
By rule A5: Decomposition
Questions on inference rules for FD
5. Let the following set of FDs be given:
F = {A->B, A->C, BC->D}
Prove that A->D.
Solution:
A->B and A->C, so A->BC
By rule A4: Union
A->BC and BC->D (given), so A->D
By rule A3: Transitivity
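The derivations above can be cross-checked with the closure test: F implies X → Y exactly when Y ⊆ X+. This test is equivalent to derivability with A1 to A3 because those rules are sound and complete. The sketch assumes single-letter attribute names, so "BC" means {B, C}. It skips question 2, whose subset condition between attribute sets this encoding does not model.

```python
def closure(attrs, fds):
    """X+: all attributes determined by attrs under fds (single-letter names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F implies lhs -> rhs exactly when rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


print(implies([("A", "B"), ("C", "X"), ("BX", "Z")], "AC", "Z"))  # question 1: True
print(implies([("A", "B"), ("BC", "D")], "AC", "D"))              # question 3: True
print(implies([("A", "B"), ("BC", "D")], "AD", "B"))              # question 4: True
print(implies([("A", "B"), ("A", "C"), ("BC", "D")], "A", "D"))   # question 5: True
print(implies([("A", "B"), ("BC", "D")], "A", "D"))               # not implied: False
```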
Un-normalized Relation
A table is said to be un-normalized if some rows contain
multiple sets of values for some of the columns; these multiple
values in a single row are also called non-atomic values.
Normal form
• The normal form of a relation indicates how much
redundancy it contains.
• Types of normal form
– 1NF: keys; no repeating groups
– 2NF: no partial dependencies
– 3NF: no transitive dependencies
– BCNF: determinants are candidate keys
– 4NF: no multivalued dependencies
– 5NF (PJNF)
– DKNF (Domain Key Normal Form)
Introduction to Normalization
• Normalization takes a relation schema through a series of tests to “certify” whether it
satisfies a certain normal form.
• The process proceeds in a top-down fashion, evaluating each relation against
the criteria for normal forms and decomposing relations as necessary.
• Normalization is the process of decomposing unsatisfactory "bad" relations by breaking
up their attributes into smaller relations while ensuring data integrity, eliminating data
redundancy and minimizing the insertion, deletion and update anomalies.
Need for Normalization
Benefits of Normalization
• Improve storage efficiency
• Quicker updates
• Less data inconsistency
• Clearer data relationships
• Easier to add data
• Flexible Structure
First Normal Form
• It states that the value of any attribute in any relation must be a
single atomic value.
• In other words, only one value is associated with each attribute,
and the value is not a set of values or a list of values (no
composite attributes).
Second Normal Form
• A relation schema R is in second normal form (2NF) if it is in
1NF and every non-prime attribute of R is fully
functionally dependent on the primary key.
• R can be decomposed into 2NF relations via the process of
2NF normalization.
Sales order relation
Example
• The above relation is in 1NF because the values in the
domain of each attribute of the relation are atomic.
• Next, we check the relation for 2NF.
• First, find the attributes that make up the primary key.
• No single attribute forms a primary key for the sales
order relation.
• So the combination (order, product) is used as the key of the relation.
Continued…….
• Possible FDs are:
Order -> Customer
Order -> Address
Customer -> Address
Product -> Unit price
Order, Product -> Quantity
Second Normal Form
• 2NF means no partial dependencies on the key. But we have
• {order} -> {customer}
• {order} -> {address}
• {product} -> {unit price}
So the relation is in 1NF but not in 2NF, and we convert the sales
order relation to 2NF.
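The partial dependencies listed above can be found mechanically. Take each proper, non-empty subset of the key and collect the non-prime attributes in its closure. This sketch encodes the sales order FDs with attribute sets. The name UnitPrice is an assumed spelling of “Unit price”.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs_set, rhs_set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"Order"}, {"Customer"}),
    ({"Order"}, {"Address"}),
    ({"Customer"}, {"Address"}),
    ({"Product"}, {"UnitPrice"}),
    ({"Order", "Product"}, {"Quantity"}),
]
key = {"Order", "Product"}
non_prime = {"Customer", "Address", "UnitPrice", "Quantity"}


def partial_dependencies(key, non_prime, fds):
    """Map each proper, non-empty subset of the key to the non-prime
    attributes it already determines (a partial dependency)."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) & non_prime
            if dependents:
                found[part] = dependents
    return found


print(partial_dependencies(key, non_prime, fds))
```

Splitting along these partial dependencies suggests one possible 2NF design: (Order, Customer, Address), (Product, UnitPrice) and (Order, Product, Quantity). The first relation still contains the transitive dependency Order -> Customer -> Address, which is addressed in 3NF.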
Conversion of 1NF to 2NF
Relation in 2NF
Relation not in 2NF
Definition: Third normal form
Relation R is in 3NF if and only if every dependency A->B
satisfied by R meets at least ONE of the following criteria:
1. A is a superkey (for example, a candidate key)
2. Every attribute of B is part of some candidate key
Or
Relation R is in 3NF if and only if every dependency A->B
satisfied by R meets at least ONE of the following criteria:
1. The LHS is a superkey (for example, a candidate key)
2. The RHS consists of prime attributes
(Trivial dependencies, where B is a subset of A, are always allowed.)
Steps to convert 2NF to 3NF
1. First find all candidate keys.
2. Determine the prime and non-prime attributes separately.
3. Take the first candidate key as the primary key. Find the attributes that
show a transitive dependency on it; these can be found
from the FD diagram.
4. Remove each transitive dependency by decomposing the relation.
Functional dependencies in sales order table
Relations in 3NF
Problems with 3NF
• There are some circumstances in which 3NF has certain
inadequacies:
• It does not deal satisfactorily with relations that
– Have multiple candidate keys, where
– These candidate keys are composite, and
– The candidate keys overlap.
Problems
• Consider a relation schema R = {A,B,C,D} and the set of FDs AB->CD
and D->A. Check whether the schema is in 3NF.
Solution:
First find the candidate keys of the relation.
AB+ = {A,B,C,D}
So AB is a candidate key. (BD is also a candidate key, since BD+ = {A,B,C,D}.)
In the FDs,
AB->CD: the LHS is a candidate key.
D->A: the RHS is a prime attribute.
So the relation is in 3NF.
Problems
• Consider a relation schema R = {A,B,C,D} and the set of FDs AB->C
and C->D. Check whether the schema is in 3NF.
Solution:
First find the candidate key of the relation.
AB+ = {A,B,C,D}
So AB is the candidate key.
In the FDs,
AB->C: the LHS is a candidate key.
C->D: the LHS is not a superkey, and the RHS is not a prime attribute.
So the relation is not in 3NF.
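Both 3NF problems can be checked with this brute-force Python sketch. It finds the candidate keys, collects the prime attributes, and tests each given FD against the two criteria. The sketch assumes single-letter attribute names. It checks only the FDs in the given set, which is sufficient when testing the schema itself. The key search is exponential, which is acceptable only for small textbook schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if closure(combo, fds) == set(schema) and not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys


def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)
        if not extra:                          # trivial FD
            continue
        if closure(lhs, fds) == set(schema):   # LHS is a superkey
            continue
        if extra <= prime:                     # RHS is prime
            continue
        return False
    return True


print(candidate_keys("ABCD", [("AB", "CD"), ("D", "A")]))  # keys AB and BD
print(is_3nf("ABCD", [("AB", "CD"), ("D", "A")]))          # problem 1: True
print(is_3nf("ABCD", [("AB", "C"), ("C", "D")]))           # problem 2: False
```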
Boyce-Codd Normal Form (BCNF)
• A relation is in BCNF, if and only if,
– It is in 3NF and
– Every determinant in that table is a candidate key.
If a table contains only one candidate key, 3NF and BCNF
are equivalent.
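The BCNF condition can be tested in a similar way: every non-trivial FD must have a superkey on its left-hand side. This sketch applies the test to the first 3NF problem of this unit. That relation is in 3NF but fails BCNF because D -> A has a non-superkey determinant, which illustrates the overlapping-key case. The second schema, R(A,B,C) with A -> BC, is hypothetical. Single-letter attributes are assumed.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(schema, fds):
    """BCNF test for the schema itself: the LHS of every non-trivial FD
    in fds must be a superkey."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):               # trivial FD, always allowed
            continue
        if closure(lhs, fds) != set(schema):   # LHS is not a superkey
            return False
    return True


print(is_bcnf("ABCD", [("AB", "CD"), ("D", "A")]))  # False: D is not a superkey
print(is_bcnf("ABC", [("A", "BC")]))                # True
```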
Example
• Consider the following relation and determinants.
R(a,b,c,d)
a,c -> b,d
a,d -> b
• Here, the first determinant suggests that the primary key of R
could be changed from a,b to a,c.
• If this change were made, all of the non-key attributes present in
R could still be determined, and therefore this change is legal.
• However, the second determinant indicates that a,d determines
b, but a,d could not be the key of R because a,d does not determine
all of the non-key attributes of R (it does not determine c). We
would say that the first determinant is a candidate key, but the
second determinant is not a candidate key, and thus this
relation is not in BCNF.
Example
Student relation
Continued…….
II. We will discover that the new relation obtained using rule 1 has a
partial functional dependency, because subject is
functionally dependent on trainer, which is just one
component of the candidate key. So the new relation is in
1NF but not in 2NF, and we decompose it to
eliminate the partial functional dependency.
3NF vs BCNF
Questions on Normalization
Multivalued Dependencies
• A multivalued dependency (MVD) exists between two fields
X and Y when a single value of X is directly associated with
two or more values of Y.
• An MVD is represented as X ->> Y and can be read as:
– “the value of X determines multiple values of Y”
Or
– “multiple values of Y are functionally dependent on the
value of X.”
Teaching database
Example
1. {course} ->> {book}
2. {course} ->> {lecturer}
Types of MVD
• A multivalued dependency can be further defined as being
trivial or nontrivial.
– An MVD A ->> B in relation R is defined as being
trivial if (a) B is a subset of A or (b) A ∪ B = R.
– An MVD is defined as being nontrivial if neither (a)
nor (b) is satisfied.
Definition MVD
• A multivalued dependency X ->> Y specified on a relation
schema R, where X and Y are both subsets of R, specifies the
following constraint on any relation state r of R.
• If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two
tuples t3 and t4 should also exist in r with the following
properties, where Z = R − (X ∪ Y):
– t3[X] = t4[X] = t1[X] = t2[X]
– t3[Y] = t1[Y] and t4[Y] = t2[Y]
– t3[Z] = t2[Z] and t4[Z] = t1[Z]
Example: Name ->> Email-id
Student (Name, Phone, Email-id, Age) with MVD Name ->> Email-id. If the student relation has
the two tuples:
t1
t2
it must also have the same tuples with the Email-id components swapped:
t4
t3
Fourth Normal Form (4NF)
A relation schema R is in 4NF with respect to a set of
dependencies F if
• It is in BCNF and
• For every nontrivial multivalued dependency X ->> Y, X is a
superkey, that is, X is either a candidate key or a superset
thereof.
Pizza Delivery Permutations
Restaurant | Pizza Variety | Delivery Area
Vincenzo's Pizza | Thick Crust | Springfield
Vincenzo's Pizza | Thick Crust | Shelbyville
Vincenzo's Pizza | Thin Crust | Springfield
Vincenzo's Pizza | Thin Crust | Shelbyville
Elite Pizza | Thin Crust | Capital City
Elite Pizza | Stuffed Crust | Capital City
A1 Pizza | Thick Crust | Springfield
A1 Pizza | Thick Crust | Shelbyville
A1 Pizza | Thick Crust | Capital City
A1 Pizza | Stuffed Crust | Springfield
A1 Pizza | Stuffed Crust | Shelbyville
A1 Pizza | Stuffed Crust | Capital City
To satisfy 4NF, we must place the facts about varieties offered into a
different table from the facts about delivery areas:
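This sketch checks on the pizza table that Restaurant ->> Variety holds. It then performs the 4NF split and confirms that the natural join of the two projections rebuilds the table exactly. The MVD check is specialised to this three-column table (Restaurant, Variety, Area). Because it loops over all ordered pairs of tuples, it covers both t3 and t4 from the MVD definition.

```python
rows = {
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
}


def restaurant_mvd_holds(rel):
    """Check Restaurant ->> Variety: for tuples t1, t2 with the same
    restaurant, the tuple taking the variety from t1 and the delivery
    area from t2 must also be present."""
    for r1, v1, _ in rel:
        for r2, _, a2 in rel:
            if r1 == r2 and (r1, v1, a2) not in rel:
                return False
    return True


# 4NF decomposition: varieties offered and delivery areas in separate tables.
varieties = {(r, v) for r, v, _ in rows}
areas = {(r, a) for r, _, a in rows}

# Natural join on Restaurant rebuilds exactly the original table.
rejoined = {(r, v, a) for r, v in varieties for r2, a in areas if r == r2}

print(restaurant_mvd_holds(rows), len(varieties), len(areas), rejoined == rows)  # True 6 6 True
```

The two smaller tables hold 6 rows each. When a restaurant adds a delivery area, only one row changes instead of one row per variety.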
Join dependency
Types of Join dependency
• If some Ri of the join dependency *(R1, R2, ..., Rn) is equal to R, then it
is called a trivial join dependency; otherwise it is non-trivial, i.e.,
Ri = R or r(Ri) = r(R).
5NF or Project Join Normal Form(PJNF)
A relation R is in 5NF, if
• it is in 4NF and
• for every non-trivial JD *(R1, R2, ..., Rn) of R, every Ri is a superkey
of R.
Example
• Consider a relation schema
• R(A,B,C,D,E,F) with FDs A->BC and F->DE, and the JD
*[ABC, ADF, DEF]. Determine whether R is in 5NF.
Solution:
The candidate key is AF (AF+ = {A,B,C,D,E,F}).
So R is not in PJNF, because in the
JD *[ABC, ADF, DEF], neither ABC nor DEF is a superkey of R.
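The superkey condition in the 5NF definition can be checked with attribute closures, as in this sketch. It assumes single-letter attributes. It also assumes that the given JD holds and is non-trivial, and it only reports which components fail to be superkeys.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def non_superkey_components(schema, fds, jd):
    """Return the components Ri of *(R1, ..., Rn) that are not superkeys."""
    return [ri for ri in jd if closure(ri, fds) != set(schema)]


fds = [("A", "BC"), ("F", "DE")]
print(non_superkey_components("ABCDEF", fds, ["ABC", "ADF", "DEF"]))  # ['ABC', 'DEF']
```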
Attribute closure
• Given a relation R, a set X of attributes of R, and a set F of
functional dependencies, we need a way to find all of the
attributes of R that are functionally determined by X.
• This set of attributes is called the closure of X under F and is
denoted X+.
Ex:
a. If A -> B, then A+ = {A,B}
b. If A -> BC, then A+ = {A,B,C}
c. If A -> BC and C -> F, then A+ = {A,B,C,F}
Algorithm for finding the attribute closure
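Below is a sketch of the usual iterative closure algorithm. It starts from X and repeatedly adds the right-hand side of any FD whose left-hand side is already contained in the result, stopping when a full pass adds nothing. Attributes are assumed to be single letters, and the three FD sets are those of Questions 1 to 3 below.

```python
def attribute_closure(attrs, fds):
    """Compute X+ under F: start with X; whenever the LHS of an FD lies
    inside the result, add its RHS; stop when a full pass adds nothing."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F1 = [("A", "B"), ("BC", "DE"), ("AEF", "G"), ("B", "F")]
F2 = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]
F3 = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

for label, x, fds in [("A+", "A", F1), ("BC+", "BC", F1), ("AEF+", "AEF", F1),
                      ("AB+", "AB", F2), ("AG+", "AG", F3)]:
    print(label, "=", sorted(attribute_closure(x, fds)))
```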
Question 1
R = {A,B,C,D,E,F,G}
Let F = {A->B, BC->DE, AEF->G, B->F}
Find A+, BC+, and AEF+.
Solution:
A+ = {A,B,F}
AEF+ = {A,B,E,F,G}
BC+ = {B,C,D,E,F}
Question 2
Consider a relation R with attributes
R = {A,B,C,D,E,F} and
FD = {A->BC, E->CF, B->E, CD->EF}
Compute AB+.
Solution:
AB+ = {A,B,C,E,F}
Question 3
A relation schema
R = {A, B, C, D, E, F, G, H, I} &
the set of FDs F = {A->B, A->C, CG->H, CG->I, B->H} holds
on R. Compute AG+.
Solution:
AG+ = {A,G,B,C,H,I}
Closure of set of functional dependencies
• F+ (the closure of F) is the set of all FDs that are implied by the FDs in F.
Algorithm
• Start with the set of existing functional dependencies.
• Apply the rules of inference to determine new dependencies.
• Iterate until the set does not change.
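Repeatedly applying the inference rules is hard to do exhaustively by hand. An equivalent characterization is that X -> Y is in F+ exactly when Y ⊆ X+. The sketch below uses that characterization instead of rule application and summarizes F+ by recording, for each non-empty X, the largest non-trivial right-hand side X+ − X. It assumes single-letter attributes. It is exponential in the number of attributes, so it suits only small schemas such as Question 2 below, where it confirms A->BC, A->D and AB->D.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def f_plus(schema, fds):
    """Summarise F+: map each non-empty X to X+ - X, skipping empty ones.
    Any Y that is a subset of X+ then gives an FD X -> Y in F+."""
    summary = {}
    for size in range(1, len(schema) + 1):
        for x in combinations(sorted(schema), size):
            extra = closure(x, fds) - set(x)
            if extra:
                summary["".join(x)] = "".join(sorted(extra))
    return summary


summary = f_plus("ABCD", [("A", "B"), ("A", "C"), ("BC", "D")])
for lhs, rhs in summary.items():
    print(lhs, "->", rhs)
```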
Question 1
• Compute F+ for relation schema R(A,B,C,D,E,F) and F = {A->B,
A->C, CD->E, CD->F, B->F}
Solution:
A->B and A->C
By rule A4: union rule, A->BC
A->B and B->F
By rule A3: transitive rule, A->F
CD->E and CD->F
By rule A4: union rule, CD->EF
Question 2
• Compute F+ for relation schema R(A,B,C,D) and F = {A->B,
A->C, BC->D}
Solution:
A->B and A->C
By rule A4: union rule, A->BC
A->BC and BC->D
By rule A3: transitive rule, A->D
A->C and BC->D
By rule A6: pseudotransitive rule, AB->D
Question 3
• Compute F+ for relation schema R(A,B,C,D,E) and F = {AB->C,
A->D, D->E}
Solution:
A->D
By rule A2: augmentation rule, AB->BD
AB->C and AB->BD
By rule A4: union rule, AB->BCD
Equivalence between sets of Functional Dependencies
Take the attributes from the LHS of the FDs in F and compute the
attribute closure of each using the FDs in G:
A+ using G = ABCD; A -> A; A -> B; A -> C; A -> D;
B+ using G = BC; B -> B; B -> C;
AC+ using G = ABCD; AC -> A; AC -> B; AC -> C; AC -> D;
Notice that all FDs in F can be inferred using the FDs
in G.
To see whether all FDs in G are inferred by F, compute the attribute closure
of the attributes on the LHS of the FDs in G using the FDs in F:
A+ using F = ABCD; A -> A; A -> B; A -> C; A -> D;
B+ using F = BC; B -> B; B -> C;
Since all FDs in F can be obtained from G and vice versa, we
conclude that F and G are equivalent.
Example 1
A counterexample:
F = {A -> B, A -> C}
G = {A -> B, B -> C}
F and G are not equivalent, because B -> C in G is not inferred
from the FDs in F.
A+ using G = ABC;
whereas
A+ using F = ABC;
B+ using F = B, indicating that B -> C in G is not inferred using the
FDs from F.
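The procedure above amounts to two cover checks, one in each direction. This sketch assumes single-letter attributes and applies the checks to the counterexample. The second, equivalent pair used in the test is hypothetical.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be inferred from f (g is a subset of F+)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("A", "C")]
G = [("A", "B"), ("B", "C")]
print(covers(G, F), covers(F, G), equivalent(F, G))  # True False False
```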
Example 2
• Let the relation R = {A,B,C,D,E,F,G} satisfy the following FDs:
F = {A->B, BC->DE, AEF->G}.
Find AC+. Is the FD ACF -> DG implied by this set?
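One way to check an answer to this exercise is to compute the two closures, as in this sketch (single-letter attributes assumed). A -> B and then BC -> DE give AC+ = {A,B,C,D,E}. Adding F makes AEF -> G applicable, so ACF+ contains both D and G, and ACF -> DG is therefore implied.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("BC", "DE"), ("AEF", "G")]
ac_plus = closure("AC", F)
acf_plus = closure("ACF", F)
print(sorted(ac_plus))        # ['A', 'B', 'C', 'D', 'E']
print(set("DG") <= acf_plus)  # True: ACF -> DG is implied
```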
Decomposition and its desirable
properties
• Decomposition refers to breaking down one table into
multiple tables such that every attribute appears in at least one
of the new relations.
• The process of normalization depends on being able to factor
or decompose a table into two or more smaller tables in such
a way that we can recapture the precise content of the original
table by joining the decomposed parts.
Desirable properties of decomposition
• Attribute preservation: This is a simple and obvious
requirement that involves preserving all the attributes that
were in the relation being decomposed.
• Avoiding lossy decomposition: There should be no loss of information
due to decomposition.
• Lossless join decomposition: Recombination using the relational
join produces exactly the same relation as before decomposition.
• Dependency preservation: Dependency preservation is
another important requirement, since a dependency is a
constraint on the database. If X -> Y holds, then we know
that the two attributes are closely related, and it would be
useful if both attributes appeared in the same relation so that
the dependency can be checked easily.
• Lack of redundancy: Redundancy or repetition should be
avoided as much as possible.
Example of lossy decomposition
Spurious tuples
Example Lossless-Join condition
Condition for lossless join:
1) All attributes of the original schema (R) must appear in the
decomposition (R1, R2):
R = R1 ∪ R2
2) At least one of the following functional dependencies
holds (in F+):
• R1 ∩ R2 -> R1
• R1 ∩ R2 -> R2
Example 1
• Suppose we decompose the schema R = {A,B,C,D,E} into
(A,B,C) and (A,D,E).
• Show that this decomposition is a lossless-join decomposition
if the following FDs hold:
• A->BC
• CD->E
• B->D
• E->A
Solution:
• A decomposition {R1, R2} is a lossless-join decomposition if
R1 ∩ R2 → R1 or R1 ∩ R2 → R2.
• Let R1 = (A, B, C), R2 = (A, D, E), and
• R1 ∩ R2 = A.
• Since A is a candidate key (A+ = {A,B,C,D,E}), R1 ∩ R2 → R1, so the decomposition is lossless.
Example 2
• Suppose we decompose the schema R = {A,B,C,D,E} into
(A,B,C) and (C,D,E).
• Determine whether this decomposition is a lossless-join decomposition
if the following FDs hold:
• A->BC
• CD->E
• B->D
• E->A
Solution:
• A decomposition {R1, R2} is a lossless-join decomposition if
R1 ∩ R2 → R1 or R1 ∩ R2 → R2.
• Let R1 = (A, B, C), R2 = (C, D, E), and
• R1 ∩ R2 = C.
• Since C+ = {C}, C is not a candidate key and determines neither R1 nor R2; therefore the decomposition is lossy (not lossless).
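The binary lossless-join test from the condition above takes only a few lines with attribute closure. This sketch assumes single-letter attributes and that R1 ∪ R2 = R, i.e., attribute preservation is already satisfied. It reproduces both examples.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2; assumes R1 ∪ R2 = R."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(is_lossless("ABC", "ADE", F))  # example 1: True
print(is_lossless("ABC", "CDE", F))  # example 2: False
```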
Example: Dependency preservation
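Dependency preservation
Condition
• If R, with a set of FDs F, is decomposed into X and Y, then the decomposition is dependency
preserving if (FX ∪ FY)+ = F+, where FX and FY are the dependencies in F+ that contain
only attributes of X and only attributes of Y, respectively.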
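A common way to test dependency preservation without computing F+ is shown in the sketch below. For each X -> Y in F, grow X using only what can be derived inside a single component Ri, then check whether Y is reached. The sketch assumes single-letter attributes. The schema R(A,B,C) with F = {A->B, B->C} is hypothetical and is chosen to show one preserving and one non-preserving decomposition.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """For each X -> Y in F: result = X; repeat
    result |= closure(result ∩ Ri) ∩ Ri over every component Ri
    until nothing changes. The FD is preserved if Y lies inside result."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                gained = closure(result & set(ri), fds) & set(ri)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


F = [("A", "B"), ("B", "C")]
print(preserves_dependencies(["AB", "BC"], F))  # True
print(preserves_dependencies(["AB", "AC"], F))  # False: B -> C cannot be checked
```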
Example 1
Example 2