Unit Ii

The document covers relational query languages, including relational algebra, tuple and domain relational calculus, and SQL3. It discusses various operations such as select, project, and join, as well as aggregate functions and relational database design principles. Additionally, it provides examples and exercises related to querying databases using these languages and operations.

UNIT II

Relational Query Languages

Syllabus
• Relational algebra,
• Tuple and domain relational calculus,
• SQL3, DDL and DML constructs,
• Open source and commercial DBMS: MySQL, Oracle,
DB2, SQL Server.
• Relational database design:
– Domain and data dependency,
– Armstrong's axioms,
– Normal forms,
– Dependency preservation,
– Lossless design.

Relational algebra

Relational Query Languages
• Query languages: Allow manipulation and retrieval of data from
a database.
• Structured Query Language (SQL)
– Application-level query language
– Declarative
• Relational Algebra
– Intermediate language used within DBMS
– Procedural
Relational Algebra in a DBMS

[Figure: query processing inside the DBMS. SQL query → parser → relational algebra expression → query optimizer → optimized relational algebra expression → query execution plan → code generator → executable code.]

Relational Algebra
• Relational algebra is a language for expressing relational
database queries.
• Relational algebra is a procedural query language.
• A query language is a language in which a user requests
information from the database.
• The basic set of operations for the relational model is known
as the relational algebra.

Unary Relational Operations
These operations are called unary operations, because they
operate on one relation.
– select: σ
– project: π
– rename: ρ

SELECT Operation
• SELECT operation is used to select a subset of the tuples from a
relation that satisfy a selection condition.

Syntax:
σ <selection condition> (R)

where the symbol σ (sigma) is used to denote the select operator.
The selection condition appears as a subscript to σ.
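The SELECT operation can be sketched in Python as a filter over rows. This is only an illustration: it assumes a relation is modelled as a list of dictionaries, and the helper name `select` and the sample loan rows are made up for the example.

```python
# Hypothetical sample rows for the loan relation (illustrative values only).
loan = [
    {"loan_number": "L-11", "branch_name": "Round Hill", "amount": 900},
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1300},
    {"loan_number": "L-17", "branch_name": "Downtown", "amount": 1000},
]

def select(relation, condition):
    """sigma_condition(relation): keep only the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]

# Three selections with different conditions on the same relation.
perryridge = select(loan, lambda t: t["branch_name"] == "Perryridge")
over_1200 = select(loan, lambda t: t["amount"] > 1200)
both = select(loan, lambda t: t["branch_name"] == "Perryridge" and t["amount"] > 1200)
```

The condition is passed as a function, which mirrors the way the selection condition is written as a subscript of σ.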

Loan Relation

Write the Relational Algebra expression
1. Select those tuples of the loan relation where the branch is
“Perryridge”
2. Find tuples in which the amount lent is more than $1200.
3. Find those tuples pertaining to loans of more than $1200
made by the Perryridge branch.

Write the Relational Algebra expression
1. Select those tuples of the loan relation where the branch is “Perryridge”.
σ branch_name = “Perryridge” (loan)

2. Find tuples in which the amount lent is more than $1200.
σ amount > 1200 (loan)

3. Find those tuples pertaining to loans of more than $1200 made by the Perryridge branch.
σ branch_name = “Perryridge” ∧ amount > 1200 (loan)

PROJECT Operation
This operation selects certain columns from the table and discards
the other columns.

Syntax:
π <attribute list> (R)

where π (pi) is the symbol used to represent the project operation
and <attribute list> is the desired list of attributes from the
attributes of relation R.
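A minimal Python sketch of the PROJECT operation, again assuming a relation is a list of dictionaries. The helper name `project` and the customer rows are illustrative assumptions. Because a relation is a set, the sketch removes duplicate rows from the result.

```python
# Hypothetical rows for the customer relation (illustrative values only).
customer = [
    {"customer_name": "Hayes", "customer_street": "Main", "customer_city": "Harrison"},
    {"customer_name": "Jones", "customer_street": "Main", "customer_city": "Harrison"},
    {"customer_name": "Smith", "customer_street": "North", "customer_city": "Rye"},
]

def project(relation, attributes):
    """pi_attributes(relation): keep the listed columns and drop duplicate rows."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # duplicates are eliminated, as in a set
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

cities = project(customer, ["customer_city"])
names_and_cities = project(customer, ["customer_name", "customer_city"])
```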

Write the Relational Algebra expression
1. List all loan numbers and the amount of the loan.
π loan_number, amount (loan)

Customer relation

Give the Relational Algebra expression
1. Find those customers who live in Harrison.
π customer_name (σ customer_city = “Harrison” (customer))

Rename Operation
• It is used to rename relations as well as attributes.
• The rename operation is represented by the lowercase Greek letter ρ (rho).
• Renaming is very useful in union and join operations.
• It improves readability and understandability.

Syntax for rename:

- ρ S(B1, B2, …, Bn) (R) is a renamed relation S based on R with column names B1, B2, …, Bn.

- ρ S (R) is a renamed relation S based on R.

- ρ (B1, B2, …, Bn) (R) is relation R with its columns renamed to B1, B2, …, Bn.

Rename Operation
• ρ EMP1 (EMP)
In this example we only rename the relation from EMP to EMP1.
• ρ EMP1(Name1, dept1) (EMP)
In this example we rename EMP as EMP1 and the attributes Name and dept
as Name1 and dept1.
• ρ (Name1, dept1) (EMP)
In this example we rename the attributes only.

Set Theory operations
• Union Operation (∪)
• Intersection Operation (∩)
• Set Difference (or MINUS) Operation (−)

UNION Operation
The result of this operation, denoted by R ∪ S, is a relation that
includes all tuples that are either in R or in S or in both R and S.
Duplicate tuples are eliminated.

Example
The depositor relation The borrower relation

Example
Find the names of all customers who have a loan, an account, or
both, from the bank.

INTERSECTION Operation
The result of this operation, denoted by R ∩ S, is a relation that
includes all tuples that are in both R and S.

Example

To find all customers who have both a loan and an account at the
bank.

Set Difference (or MINUS) Operation
• The result of this operation, denoted by R - S, is a relation that
includes all tuples that are in R but not in S.
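The three set operations map directly onto Python's built-in set type. The sketch below assumes both operands have already been projected onto customer_name, so they are union compatible. The customer names are made-up sample data.

```python
# Hypothetical projections onto customer_name (illustrative values only).
depositor_names = {"Hayes", "Johnson", "Jones", "Smith"}   # customers with an account
borrower_names = {"Hayes", "Jones", "Smith", "Williams"}   # customers with a loan

loan_or_account = borrower_names | depositor_names       # union
loan_and_account = borrower_names & depositor_names      # intersection
account_but_no_loan = depositor_names - borrower_names   # set difference
```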

Set Difference (or MINUS) Operation

Find all customers of the bank who have an account but not a
loan.

Additional Operations (Binary Operations)
• Cartesian product (or cross product) or cross join (×)
• JOIN Operation
– Natural join
– Equi join
– Theta join
– Outer join
• Division
• Assignment

Cartesian product (cross product)
• The Cartesian-product operation, denoted by a cross (×), allows us
to combine information from any two relations.
• We write the Cartesian product of relations r1 and r2 as r1 × r2.

Example
Find the names of all customers who have a loan at the Perryridge
branch.

Loan relation borrower relation

Result of borrower × loan.

σbranch-name =“Perryridge” (borrower × loan).

Πcustomer-name (σborrower .loan-number =loan.loan-number
(σbranch-name =“Perryridge” (borrower × loan)))
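The three steps of this query (cross product, two selections, projection) can be traced in Python with `itertools.product`. The tuples below are hypothetical sample data, and positions in each tuple stand in for attribute names.

```python
from itertools import product

# Hypothetical sample data: borrower(customer_name, loan_number)
# and loan(loan_number, branch_name, amount).
borrower = [("Adams", "L-16"), ("Hayes", "L-15"), ("Jones", "L-17")]
loan = [
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
]

# borrower x loan: every borrower tuple paired with every loan tuple.
cross = list(product(borrower, loan))

# Selection on branch_name = "Perryridge".
at_perryridge = [(b, l) for b, l in cross if l[1] == "Perryridge"]

# Selection on borrower.loan_number = loan.loan_number.
matched = [(b, l) for b, l in at_perryridge if b[1] == l[0]]

# Projection on customer_name.
names = {b[0] for b, l in matched}
```

The second selection is what discards the meaningless pairings that the cross product creates.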

Natural join
• Natural join (⋈) is a binary operator that is written as (R ⋈ S),
where R and S are relations.

• The result of the natural join is the set of all combinations of
tuples in R and S that are equal on their common attribute
names.
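A small Python sketch of the natural join, assuming relations are lists of dictionaries that all share the same keys within one relation. The helper name `natural_join` and the Employee and Dept rows are illustrative assumptions.

```python
# Hypothetical sample data (illustrative values only).
employee = [
    {"name": "Harry", "emp_id": 3415, "dept_name": "Finance"},
    {"name": "Sally", "emp_id": 2241, "dept_name": "Sales"},
    {"name": "George", "emp_id": 3401, "dept_name": "Finance"},
    {"name": "Mary", "emp_id": 2202, "dept_name": "Production"},
]
dept = [
    {"dept_name": "Finance", "manager": "George"},
    {"dept_name": "Sales", "manager": "Harriet"},
]

def natural_join(r, s):
    """Combine tuples of r and s that agree on all common attribute names."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in common)
    ]

joined = natural_join(employee, dept)
```

Mary does not appear in the result because no Dept tuple has dept_name "Production".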

Example
• For example, consider the tables Employee and Dept and their
natural join, Employee ⋈ Dept.

Equi join

• EQUI JOIN performs a join based on equality of the matching
column values of the associated tables.

Theta join or condition join
• This is the same as the equi join, but it also allows the other
comparison operators, such as <, >, and >=.
• The θ-join is a binary operator that is written as R ⋈(a θ b) S, where a
and b are attribute names, θ is a binary relation in the set {<,
≤, =, >, ≥}, and R and S are relations.

Example
• Consider tables Car and Boat which list models of cars and
boats and their respective prices.
• Suppose a customer wants to buy a car and a boat, but she
doesn't want to spend more money for the boat than for the
car.
• The θ-join on the relation CarPrice ≥ BoatPrice produces a
table with all the possible options.
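The Car and Boat example can be sketched in Python as a filtered cross product. The models and prices below are hypothetical, and `theta_join` is an illustrative helper name.

```python
# Hypothetical sample data: (model, price) pairs.
car = [("CarA", 20000), ("CarB", 30000), ("CarC", 50000)]
boat = [("Boat1", 10000), ("Boat2", 40000), ("Boat3", 60000)]

def theta_join(r, s, theta):
    """Pair every tuple of r with every tuple of s and keep pairs satisfying theta."""
    return [t + u for t in r for u in s if theta(t, u)]

# CarPrice >= BoatPrice
options = theta_join(car, boat, lambda c, b: c[1] >= b[1])
```

Replacing the lambda with an equality test turns the same helper into an equi join.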

Example

Outer Join
• An extension of the join operation that avoids loss of
information.
• Computes the join and then adds to the result the tuples from
one relation that do not match any tuple in the other relation.
• Uses null values:
– null signifies that the value is unknown or does not exist
– All comparisons involving null are (roughly speaking) false by definition.

Outer Join – Example
• Relation loan

loan_number  branch_name  amount
L-170        Downtown     3000
L-230        Redwood      4000
L-260        Perryridge   1700

• Relation borrower

customer_name  loan_number
Jones          L-170
Smith          L-230
Hayes          L-155

• Inner join: loan ⋈ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith

Outer Join – Example
• Left outer join: loan ⟕ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null

Outer Join – Example
• Right outer join: loan ⟖ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-155        null         null    Hayes

• Full outer join: loan ⟗ borrower

loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
L-155        null         null    Hayes
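The outer join tables above can be reproduced with a short Python sketch that uses the same loan and borrower rows. It assumes relations are lists of dictionaries and represents null as `None`. The helper name `left_outer_join` is illustrative.

```python
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]

def left_outer_join(r, s, key):
    """Join r and s on `key`; unmatched tuples of r are padded with None (null)."""
    s_only = [a for a in s[0] if a != key]
    result = []
    for t in r:
        matches = [u for u in s if u[key] == t[key]]
        if matches:
            result.extend({**t, **u} for u in matches)
        else:
            result.append({**t, **dict.fromkeys(s_only)})
    return result

left = left_outer_join(loan, borrower, "loan_number")
# A right outer join is a left outer join with the operands swapped.
right = left_outer_join(borrower, loan, "loan_number")
# The full outer join keeps the rows of both, without repeating the matched ones.
full = left + [row for row in right if row not in left]
```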

Division
• Division is a binary operation that is written as R ÷ S.
• The result consists of the restrictions of tuples in R to the
attributes that are in the header of R but not in the header of S,
for which it holds that all their combinations with
tuples in S are present in R. It is suited to queries that include
the phrase “for all”.
• QUESTION: List the names of students who work on all the
database projects.
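A minimal Python sketch of division for this question. It assumes Completed has the attributes (student, task) and DBProject has the single attribute (task). The rows are hypothetical sample data.

```python
# Hypothetical sample data (illustrative values only).
completed = {
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"),
    ("Sarah", "Database1"), ("Sarah", "Database2"),
}
db_project = {"Database1", "Database2"}

# Completed ÷ DBProject: students paired in Completed with every task in DBProject.
students = {student for student, _ in completed}
result = {
    student
    for student in students
    if all((student, task) in completed for task in db_project)
}
```

Eugene is excluded because the pair (Eugene, Database2) is missing from Completed.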

Completed (R)
DBProject (S)

Completed (R)÷ DBProject (S)

Assignment Operation
• The assignment operation (←) provides a convenient
way to express complex queries.
• The assignment operation works like assignment in a
programming language.
• Assignment must always be made to a temporary
relation variable.
• Example:
– Temp1 ← r × s
– π A,B (Temp1)
– The result to the right of the ← is assigned to the
relation variable on the left of the ←.
– The variable may be used in subsequent expressions.

Aggregate Functions
• Aggregate function takes a collection of values and returns
a single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
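These five aggregate functions correspond to Python built-ins applied to one column of a relation. The works rows below are hypothetical sample data used only to make the sketch runnable.

```python
# Hypothetical rows for the works relation (illustrative values only).
works = [
    {"employee_name": "Adams", "branch_name": "Perryridge", "salary": 1500},
    {"employee_name": "Brown", "branch_name": "Perryridge", "salary": 1300},
    {"employee_name": "Gopal", "branch_name": "Perryridge", "salary": 5300},
    {"employee_name": "Johnson", "branch_name": "Downtown", "salary": 1500},
    {"employee_name": "Loreena", "branch_name": "Downtown", "salary": 1300},
]

salaries = [t["salary"] for t in works]

total = sum(salaries)                    # sum
minimum = min(salaries)                  # min
maximum = max(salaries)                  # max
average = total / len(salaries)          # avg
number_of_values = len(salaries)         # count
# count-distinct: duplicates are removed before counting.
branch_count = len({t["branch_name"] for t in works})
```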

Sum function
Relation: works

Find the total sum of the salaries of all the employees in the bank.
The relational algebra expression is:
𝒢 sum(salary) (works)

Count function
• Find the number of branches appearing in the works relation.
𝒢 count-distinct(branch_name) (works)

• Find the minimum, maximum, and average salary of the employees.
𝒢 min(salary) (works)
𝒢 max(salary) (works)
𝒢 avg(salary) (works)

Banking Example

branch (branch_name, branch_city, assets)

customer (customer_name, customer_street, customer_city)

account (account_number, branch_name, balance)

loan (loan_number, branch_name, amount)

depositor (customer_name, account_number)

borrower (customer_name, loan_number)

Example Queries
• Find all loans of over $1200

σ amount > 1200 (loan)

• Find the loan number for each loan of an amount greater than
$1200
π loan_number (σ amount > 1200 (loan))

Continued…………
• Find the names of all customers who have a loan, an account,
or both, from the bank
π customer_name (borrower) ∪ π customer_name (depositor)

• Find the names of all customers who have a loan and an
account at the bank.

π customer_name (borrower) ∩ π customer_name (depositor)

Continued………
• We can find all customers of the bank who have an account
but not a loan.
π customer_name (depositor) − π customer_name (borrower)

Continued…………
• Find the names of all customers who have a loan at the
Perryridge branch.
π customer_name (σ branch_name = “Perryridge”
(σ borrower.loan_number = loan.loan_number (borrower × loan)))

• Find the names of all customers who have a loan at the
Perryridge branch but do not have an account at any branch
of the bank.

π customer_name (σ branch_name = “Perryridge”
(σ borrower.loan_number = loan.loan_number (borrower × loan))) −
π customer_name (depositor)

Continued…………
• Find the names of all customers who have a loan at the
bank, along with the loan number and the loan amount.

π customer_name, loan_number, amount (borrower ⋈ loan)

• Find all customers who have an account at all branches located
in Brooklyn city.

π customer_name, branch_name (depositor ⋈ account)
÷ π branch_name (σ branch_city = “Brooklyn” (branch))

Solve
• employee (person-name, street, city)
• works (person-name, company-name, salary)
• company (company-name, city)
• manages (person-name, manager-name)
Consider the above relational database. Give an expression in
the relational algebra to express each of the following queries:

Questions
1. Find the names of all employees who work for BSP.
2. Find the names and cities of residence of all employees who
work for BSP.
3. Find the names, street address, and cities of residence of all
employees who work for First Bank Corporation and earn
more than $10,000 per annum.
4. Find the names of all employees in this database who live in
the same city as the company for which they work.
5. Find the names of all employees who live in the same city
and on the same street as “Mr. Ravi”.
6. Find the names of all employees in this database who do not
work for BSP.

Relational calculus
(Tuple and domain relational calculus)

Relational calculus
• It is a nonprocedural query language.
• It describes the desired information without giving a specific
procedure for obtaining that information.
• Two types of relational calculus
– Tuple relational calculus
– Domain relational calculus

Tuple Relational Calculus
• A nonprocedural query language, where each query is of the
form
{t | P (t ) }
• It is the set of all tuples t such that predicate or condition P is
true for t.

Logical connectors
The following logical connectors are used in relational
calculus.
∈ belongs to
⇒ implication (implies)
∧ and
∨ or
¬ not (negation)
∃ existential quantifier (there exists)
∀ universal quantifier (for all)

Calculus Formula
1. Set of attributes and constants
2. Set of comparison operators (e.g., <, ≤, =, ≠, >, ≥)
3. Set of connectives: and (∧), or (∨), not (¬)
4. Set of quantifiers:
∃ t ∈ r (Q(t)): “there exists” a tuple t in relation
r such that predicate Q(t) is true
∀ t ∈ r (Q(t)): Q is true “for all” tuples t in relation r

Banking Example

• branch (branch_name, branch_city, assets )


• customer (customer_name, customer_street, customer_city )
• account (account_number, branch_name, balance )
• loan (loan_number, branch_name, amount )
• depositor (customer_name, account_number )
• borrower (customer_name, loan_number )

Example Queries
1. Find the loan_number, branch_name, and amount for loans of
over $1200.
{t | t ∈ loan ∧ t[amount] > 1200}

2. Find the loan number for each loan of an amount greater
than $1200.

{t[loan_number] | t ∈ loan ∧ t[amount] > 1200}
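Tuple relational calculus queries read much like Python set comprehensions: the part before `for` plays the role of the result tuple and the `if` clause plays the role of the predicate P(t). The sketch below assumes loan tuples are modelled as named tuples. The rows are hypothetical.

```python
from collections import namedtuple

Loan = namedtuple("Loan", ["loan_number", "branch_name", "amount"])

# Hypothetical sample data (illustrative values only).
loan = {
    Loan("L-11", "Round Hill", 900),
    Loan("L-15", "Perryridge", 1500),
    Loan("L-16", "Perryridge", 1300),
}

# Query 1: whole tuples of loan with amount > 1200.
query1 = {t for t in loan if t.amount > 1200}

# Query 2: only the loan_number component of those tuples.
query2 = {t.loan_number for t in loan if t.amount > 1200}
```

As in the calculus, the comprehension states which tuples are wanted and leaves the evaluation order to the language.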

Example Queries
3. Find the names of all customers having a loan at the Perryridge
branch.

Relational algebra:
π customer_name (σ branch_name = “Perryridge”
(σ borrower.loan_number = loan.loan_number (borrower × loan)))

Tuple relational calculus:
{t[customer_name] | t ∈ borrower ∧
∃ s ∈ loan (s[loan_number] = t[loan_number] ∧ s[branch_name] = “Perryridge”)}

The customer names of all borrower tuples t such that there exists a tuple s in relation loan
for which the values of t and s for the loan_number attribute are equal and
the branch_name of s is “Perryridge”.

Example Queries
4. Find the names of all customers having a loan, an account, or both at
the bank.

{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name])
∨ ∃ u ∈ depositor (t[customer_name] = u[customer_name])}

5. Find the names of all customers who have a loan and an account
at the bank.
{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name])
∧ ∃ u ∈ depositor (t[customer_name] = u[customer_name])}

Domain Relational Calculus

• A form of relational calculus which uses domain variables that take
on values from an attribute's domain, rather than values for an entire
tuple.
Definition:
An expression in domain calculus is of the form

{< x1, x2, … , xn > | P(x1, x2, … , xn) }

where
x1, x2, … , xn represent domain variables
P represents a formula composed of atoms

An atom has one of the forms

• < x1, x2, … , xn > ∈ r, where r is a relation on n attributes and x1,
x2, … , xn are domain variables or domain constants.
• x Θ y, where x and y are domain variables and Θ is a
comparison operator (<, >, ≤, ≥, =, ≠). It is required that x and y
have domains that can be compared by Θ.

Banking Example

• branch (branch_name, branch_city, assets )


• customer (customer_name, customer_street, customer_city )
• account (account_number, branch_name, balance )
• loan (loan_number, branch_name, amount )
• depositor (customer_name, account_number )
• borrower (customer_name, loan_number )

Example Queries
1. Find the loan_number, branch_name, and amount for loans of
over $1200.

{< l, b, a > | < l, b, a > ∈ loan ∧ a > 1200}

Example Queries
2. Find all loan numbers for loans with an amount greater than
$1200.

{< l > | ∃ b, a (< l, b, a > ∈ loan ∧ a > 1200)}

3. Find the names of all customers who have a loan of over
$1200.
{< c > | ∃ l, b, a (< c, l > ∈ borrower ∧ < l, b, a > ∈ loan ∧ a > 1200)}

SQL3, DDL and DML constructs

Relational database design:
Domain and data dependency,
Armstrong's axioms,
Normal forms,
Dependency preservation,
Lossless design.

Informal design guidelines for relational
schemas

• Semantics of attributes.
• Reducing the redundant values in tuples.
• Reducing the null values in tuples.
• Disallowing the possibility of generating spurious tuples.

Semantics of attributes
• Semantics – specifies how to interpret the attribute values stored
in a tuple of the relation.
• Name of the attribute must have some meaning.
• Relationship among the relations must be clear.

Employee (Emp_id, Emp_name, Street, City)
Company (Company_id, Company_name, C_City)
Works (Emp_id, Company_id, Salary)
Manages (Manager_id, Manager_name, Emp_id)

GUIDELINE 1:
• Design a relation schema so that it is easy to explain its meaning.
• Do not combine attributes from multiple entity types into a single
relation.

Reducing the redundant values in tuples
• One goal of schema design is to minimize the storage space that the
base relations occupy.
• Grouping attributes into relation schemas has a significant effect on
storage space.
• Mixing attributes of multiple entities may cause problems
• Information is stored redundantly wasting storage
• Problems with update anomalies
– Insertion anomalies
– Deletion anomalies
– Modification anomalies

Emp_Com

Emp_Com (Emp_id, Emp_name, Street, City, Company_id, Company_name, C_City)

The two relations Employee and Company are combined into one
relation, Emp_Com.

Insertion Anomalies

Insertion anomalies can be differentiated into two types:

• To insert a new employee tuple into Emp_Com, we must either include the attribute
values for the company that the employee works for, or use nulls if the
employee does not work for a company yet.
• To insert a new company that has no employees yet into
Emp_Com, we must place null values in the attributes for the
employee.
Deletion & Modification Anomalies
Deletion anomalies
• If we delete from Emp_Com an employee tuple that represents
the last employee working for a particular company, the
information concerning that company is lost from the database.
Modification Anomalies
• In Emp_Com, if the value of an attribute of a particular
company is to be changed, the update must be applied to every tuple
of Emp_Com for an employee who works at that company; otherwise the
database will become inconsistent.
GUIDELINE 2: Design a schema that does not suffer from
insertion, deletion, and update anomalies. If any are
present, then ensure that applications that update the database
will operate correctly.

Null Values in Tuples
If many of the attributes do not apply to all tuples in the relation,
we end up with many nulls in those tuples. Nulls can have
multiple interpretations, such as:
– Attribute not applicable or invalid
– Attribute value unknown (may exist)
– Value known to exist, but unavailable

For example, if only 10 percent of employees have individual
offices, there is little justification for including an attribute
office_number in the employee relation.

GUIDELINE 3: Relations should be designed such that their
tuples will have as few NULL values as possible.

Generation of Spurious Tuples
• When two tables are joined, the result may contain extra tuples
that do not correspond to any valid information. These unwanted
tuples are called spurious tuples.

GUIDELINE 4: Design relation schemas so that they can be joined
with equality conditions on attributes that are either primary keys
or foreign keys, in a way that guarantees that no spurious tuples
are generated.

Functional Dependencies
• Functional dependencies (FDs) are used to specify formal
measures of the "goodness" of relational designs.
• A functional dependency defines a property of the database
schema.
• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes.
• Functional dependency is a constraint between two sets of
attributes in a relation from a database.
• An FD is a property of the attributes in the schema R.
• The constraint must hold on every relation instance r(R).

Functional Dependencies
• A functional dependency is a constraint that specifies the
relationship between two sets of attributes, where one set
determines the value of the other set.
• It is denoted as X → Y, where X is a set of attributes that is
capable of determining the value of Y.
• The attribute set on the left side of the arrow, X, is
called the determinant, while the set on the right side, Y, is called
the dependent.
• The constraint is that for any two tuples t1 and t2 in r, if t1[X] =
t2[X], then t1[Y] = t2[Y] must also hold.
• This means the value of the X component of a tuple uniquely
determines the value of component Y.
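The definition above can be checked against one relation instance with a short Python sketch. The helper name `holds` and the EMP_PROJ rows are illustrative assumptions. Note that a single instance can only refute a dependency: an FD is a constraint on every legal instance of the schema.

```python
# Hypothetical EMP_PROJ rows (illustrative values only).
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "Wong"},
]

def holds(relation, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for t in relation:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # Remember the first Y value seen for each X value and compare later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True

ssn_determines_name = holds(emp_proj, ["SSN"], ["ENAME"])
ssn_determines_hours = holds(emp_proj, ["SSN"], ["HOURS"])
key_determines_hours = holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"])
```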

Example

Example
• Consider the relation schema EMP_PROJ in the figure below:

Types of Functional Dependency
1. Full functional dependency
2. Partial dependency
3. Transitive dependency
4. Trivial and Non-trivial dependency

Full functional dependency
• For a relation schema R, an FD X → Y is a full functional
dependency if removing any attribute A from X means that the dependency
no longer holds.
• That is, (X − {A}) → Y is not true for any A in X.

{SSN, PNUMBER} → HOURS is a full FD since neither
SSN → HOURS nor PNUMBER → HOURS holds.

Partial dependency
• X → Y is a partial dependency if there is some attribute in
X that can be removed from X and the dependency still
holds.

{SSN, PNUMBER} → ENAME is a partial dependency
since SSN → ENAME also holds.

Transitive dependency

• Given a relation R with the functional dependencies F, if there
is a set of attributes Y that is neither a candidate key nor a subset of any key,
and both X → Y and Y → Z hold, then Z is transitively
dependent on X.
• X → Z is then a transitive dependency.

SSN → DNUMBER and DNUMBER → DMGRSSN hold,
so SSN → DMGRSSN is a transitive FD.

SSN → ENAME is non-transitive since there is no set of
attributes X where SSN → X and X → ENAME.

Trivial and non-trivial functional
dependency
• If X -> Y holds and X is a superset of Y, then this is called a
trivial functional dependency. All FDs which are not trivial are
called non-trivial FDs.
Example:
• {EmployeeID, EmployeeAddress} → {EmployeeAddress} is
trivial.

Prime & non-prime attributes
An attribute that is a member of some candidate key
is called a prime attribute.
For example, consider R(A, B, C, D):

‘A’ is a candidate key.
The combination of C and D is a candidate key.
Hence A, C, and D are prime attributes and B is a non-prime
attribute.

Inference Rules for FDs(Armstrong Axioms)

• Axioms, or rules of inference, provide a simpler technique for
reasoning about functional dependencies.
• Armstrong's axioms are a set of axioms (inference rules) used
to infer all the functional dependencies on a relational database.
• They are named after William W. Armstrong, who first gave this set of
inference rules.

Armstrong's inference rules

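1. IR1 (Reflexivity): If Y is a subset of X, then X -> Y
2. IR2 (Augmentation): If X -> Y, then XZ -> YZ
3. IR3 (Transitivity): If X -> Y and Y -> Z, then X -> Z
4. IR4 (Union): If X -> Y and X -> Z, then X -> YZ
5. IR5 (Decomposition): If X -> YZ, then X -> Y and X -> Z
6. IR6 (Pseudotransitivity): If X -> Y and WY -> Z, then WX -> Z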

Questions on inference rules for FD
1. Let the following set of FDs be given:
F = {A->B, C->X, BX->Z}
Prove or disprove that AC->Z.
Solution:
C->X and BX->Z, so BC->Z
By rule IR6: Pseudotransitivity
A->B (given) and BC->Z, so AC->Z
By rule IR6: Pseudotransitivity

Questions on inference rules for FD
2. Let the following set of FDs be given:
F = {A->B, C->D}, where C is a subset of B
Prove or disprove that A->C.
Solution:
C is a subset of B, so B->C
By rule IR1: Reflexivity
A->B (given) and B->C, so A->C
By rule IR3: Transitivity

Questions on inference rules for FD
3. Let the following set of FDs be given:
F = {A->B, BC->D}
Prove or disprove that AC->D.
Solution:
A->B and BC->D, so AC->D
By rule IR6: Pseudotransitivity

Questions on inference rules for FD
4. Let the following set of FDs be given:
F = {A->B, BC->D}
Prove or disprove that AD->B.
Solution:
A->B, so AD->BD
By rule IR2: Augmentation
AD->BD, so AD->B and AD->D
By rule IR5: Decomposition

Questions on inference rules for FD
5. Let the following set of FDs be given:
F = {A->B, A->C, BC->D}
Prove that A->D.
Solution:
A->B and A->C, so A->BC
By rule IR4: Union
A->BC and BC->D (given), so A->D
By rule IR3: Transitivity
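The proofs above can be cross-checked mechanically with the attribute closure X+, the set of all attributes determined by X under F: X -> Y follows from F exactly when Y is a subset of X+. The sketch below is an illustrative Python version. The function names `closure` and `implies` are assumptions, and each attribute is assumed to be a single character so that a string such as "BX" stands for the set {B, X}.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)

q1 = implies([("A", "B"), ("C", "X"), ("BX", "Z")], "AC", "Z")
q3 = implies([("A", "B"), ("BC", "D")], "AC", "D")
q4 = implies([("A", "B"), ("BC", "D")], "AD", "B")
q5 = implies([("A", "B"), ("A", "C"), ("BC", "D")], "A", "D")
```

The closure test only says whether a dependency follows from F; the rule-by-rule derivations above are still needed when a proof is asked for.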

Un-normalized Relation
A table is said to be un-normalized if a row contains
multiple values for some of its columns. These multiple
values in a single row are also called non-atomic values.

Normal form
• The normal form of a relation indicates how much
redundancy its data may contain.
• Types of normal form
– 1NF: keys; no repeating groups
– 2NF: no partial dependencies
– 3NF: no transitive dependencies
– BCNF: determinants are candidate keys
– 4NF: no multi-valued dependencies
– 5NF (PJNF)
– DKNF (Domain Key Normal Form)

Introduction to Normalization
• Normalization takes a relation schema through a series of tests to “certify” whether it
satisfies a certain normal form.
• The process proceeds in a top-down fashion by evaluating each relation against
the criteria for normal forms and decomposing relations as necessary.
• Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking
up their attributes into smaller relations while ensuring data integrity, eliminating data
redundancy, and minimizing insertion, deletion, and update anomalies.

Need of Normalization

Benefits of Normalization
• Improve storage efficiency
• Quicker updates
• Less data inconsistency
• Clearer data relationships
• Easier to add data
• Flexible Structure

First Normal Form
• It states that the value of any attribute in any tuple must be a
single atomic value.
• In other words, only one value is associated with each attribute,
and the value is not a set of values or a list of values (no
multivalued or composite attributes).

Second Normal Form
• A relation schema R is in second normal form (2NF) if it is in
1NF and every non-prime attribute of R is fully
functionally dependent on the primary key.
• R can be decomposed into 2NF relations via the process of
2NF normalization.

Steps to convert 1NF to 2NF

• First find all candidate keys.
• Determine all the prime and non-prime attributes separately.
• Check that no non-prime attribute is partially dependent on any
key.

Sales order relation

Example
• The above relation is in 1NF because the values in the
domain of each attribute of the relation are atomic.
• Next we check the relation for 2NF.
• First, find the attributes that make up the primary key.
• No single attribute alone forms a primary key for the sales
order relation.
• So we take (order, product) combined as the key of the relation.

Continued…….
• Possible FDs are:
Order -> Customer
Order -> Address
Customer -> Address
Product -> Unit price
Order, product -> Quantity

Second Normal Form
• 2NF means no partial dependencies on the key. But we have
• {order} -> {customer}
• {order} -> {address}
• {product} -> {unit price}
So the relation is in 1NF but not in 2NF, and we convert the sales
order relation into 2NF.

Conversion of 1NF to 2NF

R1 is now in 2NF, but there is still a partial FD in R2:
Product -> Unit price

Relation in 2NF

Relation not in 2NF

This table has a composite primary key [Customer ID, Store
ID]. The non-key attribute is [Purchase Location]. In this case,
[Purchase Location] depends only on [Store ID], which is only
part of the primary key. Therefore, this table does not satisfy
second normal form.

Continued…..

What we have done is to remove the partial functional
dependency that we initially had. Now, in the table
[TABLE_STORE], the column [Purchase Location] is fully
dependent on the primary key of that table, which is [Store
ID].

Third Normal Form
A relation schema R is in third normal form (3NF) if
• It is in 2NF and
• Every non-prime attribute A in R is non-transitively dependent
on the primary key.

Definition: Third normal form
Relation R is in 3NF if and only if every non-trivial dependency A->B
satisfied by R meets at least ONE of the following criteria:
1. A is a superkey (a candidate key or a superset of one)
2. B is a subset of a candidate key

Or, stated in terms of the two sides of each FD:
Relation R is in 3NF if and only if every non-trivial dependency A->B
satisfied by R meets at least ONE of the following criteria:
1. The LHS of the FD is a superkey
2. The RHS of the FD consists of prime attributes

Steps to convert 2NF to 3NF
1. First find all candidate keys.
2. Determine all the prime and nonprime attributes separately.
3. Take the first candidate key as the primary key. Look for attributes that show
a transitive dependency on it. These can be found
from the FD diagram.
4. Remove the transitive dependency by decomposing the relation.

Functional dependencies in sales order
table

Order, product-> Quantity


Continued……..
• So the relation R1 is not in 3NF.
• R2 and R3 are in both 2NF and 3NF.
• So we convert only R1 into 3NF.
Conversion to 3NF

Relations in 3NF

Problems with 3NF
• There are some circumstances where 3NF suffers from certain
inadequacies.
• It does not deal satisfactorily with relations which
– Have multiple candidate keys, where
– These candidate keys are composite, and
– The candidate keys overlap.

Problems
• Consider a relation schema R = {A, B, C, D} with the set of FDs AB->CD
and D->A. Check whether this schema is in 3NF.
Solution:
First find the candidate keys of the relation.
AB+ = {A, B, C, D}, so AB is a candidate key.
BD+ = {A, B, C, D} as well (because D->A), so BD is also a candidate key.
For each FD:
AB->CD: the LHS is a candidate key.
D->A: the RHS is a prime attribute.
So the relation is in 3NF.

Problems
• Consider a relation schema R = {A, B, C, D} with the set of FDs AB->C
and C->D. Check whether this schema is in 3NF.
Solution:
First find the candidate key of the relation.
AB+ = {A, B, C, D}
So AB is the candidate key.
For each FD:
AB->C: the LHS is a candidate key.
C->D: the LHS is not a superkey and the RHS is not a prime attribute.
So the relation is not in 3NF.
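The two 3NF problems above can be checked with a short Python sketch that finds the candidate keys by brute force and then applies the two criteria to each given FD. The function names are illustrative assumptions, attributes are assumed to be single characters, and the brute-force key search is only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under FDs given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys

def is_3nf(attrs, fds):
    """Each FD needs a superkey on the left or only prime attributes on the right."""
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(attrs)
        if not is_superkey and (set(rhs) - set(lhs)) - prime:
            return False
    return True

first = is_3nf("ABCD", [("AB", "CD"), ("D", "A")])
second = is_3nf("ABCD", [("AB", "C"), ("C", "D")])
```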

Boyce-Codd Normal Form (BCNF)
• A relation is in BCNF, if and only if,
– It is in 3NF and
– Every determinant in that table is a candidate key.
If a table contains only one candidate key, 3NF and BCNF
are equivalent.

Example
• Consider the following relation and determinants.
R(a,b,c,d)
a,c -> b,d
a,d -> b
• Here, the first determinant a,c determines all of the other
attributes of R, so a,c is a candidate key.
• However, the second determinant indicates that a,d determines
b, but a,d could not be a key of R, as a,d does not determine
all of the other attributes of R (it does not determine c). The
first determinant is a candidate key, but the
second determinant is not a candidate key, and thus this
relation is not in BCNF.

Example

Above relation is in BCNF.

Student relation

Assumptions for the student relation

1. A student may have more than one subject.
2. For each subject, a student has exactly one trainer and one
percentage.
3. A trainer trains in exactly one subject.

Student relation

• primary key={sid, subject}


FDs are
{sid, subject}->Trainer
{sid, subject}-> percentage
Trainer-> subject
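This relation is in 2NF and 3NF: Trainer-> subject does not
violate 3NF because subject is a prime attribute. It is not in
BCNF, because Trainer is a determinant that is not a candidate key.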
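
The BCNF condition (every determinant is a candidate key, or more generally a superkey) can be checked for the student relation with a short Python sketch. The function names are hypothetical and the attribute names are written in lower case for convenience.

```python
def closure(attrs, fds):
    """Closure of a set of attribute names under a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]


schema = {"sid", "subject", "trainer", "percentage"}
fds = [
    ({"sid", "subject"}, {"trainer"}),
    ({"sid", "subject"}, {"percentage"}),
    ({"trainer"}, {"subject"}),
]

for lhs, rhs in bcnf_violations(schema, fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['trainer'] -> ['subject']

# {sid, trainer} is a second candidate key, which makes subject prime.
print(closure({"sid", "trainer"}, fds) == schema)  # True
```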
147
Converting a relation to BCNF
• A relation that is in 3NF but not in BCNF can be converted to
relations in BCNF using a simple two-step process.
I. In the first step, the relation is modified so that the
determinant in the relation that is not a candidate key
becomes a component of the primary key of the revised
relation.
The attribute that is functionally dependent on that
determinant becomes a non-key attribute. This is a legitimate
restructuring of the original relation because of the
functional dependency.

148
Continued…….
II. The new relation produced by step I has a partial
functional dependency, because subject is functionally
dependent on trainer, which is just one component of the
candidate key. So the new relation is in 1NF, but not in
2NF, and we decompose it to eliminate the partial
functional dependency.

149
3NF vs BCNF

150
Questions on Normalization

1. Consider the relation r(x,y,z,w) and a set F={y->w, xy->z}.
What are the candidate keys of this relation? What is the highest
normal form of this relation?
Solution:
• xy is the candidate key.
• 1NF, because y->w is a partial dependency on the key.

2. What is the highest normal form of the relation
R1(A,B,C) with A->B, A->C and C->B
Solution:
• A is the candidate key.
• 2NF, because the key is a single attribute (no partial
dependency), but C->B is a transitive dependency between
non-prime attributes.
151
Questions on Normalization
3. What is the highest normal form of the relation
R2(A,B,C,D) with A->BC, CD->B
Solution:
• AD is the candidate key.
• 1NF, because A->BC is a partial dependency on the key.

4. What is the highest normal form of the relation
R3(A,B,C,D,E) with A->BC, CD->B and E->A
Solution:
• DE is the candidate key.
• 1NF, because E->A is a partial dependency on the key.
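
The four answers above can be cross-checked with a small Python sketch that finds the candidate keys by brute force and then tests the 2NF, 3NF and BCNF conditions in turn. The function names are hypothetical, attributes are assumed to be single letters, and the relation is assumed to be in 1NF already.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(key <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(schema):
                keys.append(frozenset(combo))
    return keys


def highest_normal_form(schema, fds):
    keys = candidate_keys(schema, fds)
    nonprime = set(schema) - set().union(*keys)
    # 2NF: no non-prime attribute depends on a proper part of a key.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) & nonprime:
                    return "1NF"
    # 3NF: a non-superkey LHS may only determine prime attributes.
    for lhs, rhs in fds:
        if closure(lhs, fds) != set(schema) and (set(rhs) - set(lhs)) & nonprime:
            return "2NF"
    # BCNF: every non-trivial FD has a superkey on the left.
    for lhs, rhs in fds:
        if closure(lhs, fds) != set(schema) and not set(rhs) <= set(lhs):
            return "3NF"
    return "BCNF"


print(highest_normal_form("XYZW", [("Y", "W"), ("XY", "Z")]))               # 1NF
print(highest_normal_form("ABC", [("A", "B"), ("A", "C"), ("C", "B")]))     # 2NF
print(highest_normal_form("ABCD", [("A", "BC"), ("CD", "B")]))              # 1NF
print(highest_normal_form("ABCDE", [("A", "BC"), ("CD", "B"), ("E", "A")])) # 1NF
```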
152
Multivalued Dependencies

153
Multivalued Dependencies
• A multivalued dependency (MVD) exists between two fields
X and Y, when a single value of X is directly associated with
two or more values of Y.
• An MVD is represented as X ->> Y and can be read as:
– “the value of X determines multiple values of Y”
Or
– “Y is multidependent on X”.

154
Teaching database

155
Example

1. {course} ->> {book}
2. {course} ->> {lecturer}

156
Types of MVD
• A multi-valued dependency can be further defined as being
trivial or nontrivial.
– An MVD A ->> B in relation R is defined as being
trivial if (a) B is a subset of A or (b) A ∪ B = R.
– An MVD is defined as being nontrivial if neither (a)
nor (b) is satisfied.

157
Definition MVD
• A multi-valued dependency X->> Y specified on a relational
schema R, where X and Y are both subsets of R, specifies the
following constraint on any relation state r of R.
• If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two
tuples t3 and t4 should also exist in r with the following
properties, where Z = R − (X ∪ Y):
• t1[X] = t2[X] = t3[X] = t4[X]
• t3[Y] = t1[Y] and t4[Y]= t2[Y]
• t4[Z] = t1[Z] and t3[Z] =t2[Z]
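
The definition above can be tested on a concrete relation state. The following Python sketch is only an illustration: the function name satisfies_mvd and the sample values (C1, B1, L1, and so on) are hypothetical, and a relation state is assumed to be a list of dictionaries with the same keys.

```python
def satisfies_mvd(rows, x, y):
    """Check X ->> Y on a relation state given as a list of dicts."""
    if not rows:
        return True
    # Z is every attribute that is in neither X nor Y.
    z = [a for a in rows[0] if a not in x and a not in y]

    def proj(t, attrs):
        return tuple(t[a] for a in attrs)

    present = {(proj(t, x), proj(t, y), proj(t, z)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if proj(t1, x) == proj(t2, x):
                # t3 takes its Y values from t1 and its Z values from t2.
                # t4 is covered when the loop visits the pair (t2, t1).
                if (proj(t1, x), proj(t1, y), proj(t2, z)) not in present:
                    return False
    return True


teaching = [
    {"course": "C1", "book": "B1", "lecturer": "L1"},
    {"course": "C1", "book": "B2", "lecturer": "L1"},
    {"course": "C1", "book": "B1", "lecturer": "L2"},
    {"course": "C1", "book": "B2", "lecturer": "L2"},
]

print(satisfies_mvd(teaching, ["course"], ["book"]))      # True
print(satisfies_mvd(teaching[:3], ["course"], ["book"]))  # False
```

Dropping the last tuple breaks the MVD because the combination (C1, B2, L2) required by the definition is missing.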

158
Example: Name ->> Email-id
Student (Name, Phone, Email-id, Age) with MVD Name ->> Email-id.
If Student has two tuples t1 and t2 with the same Name, it must
also have the tuples t3 and t4, which are the same tuples with
the Email-id components swapped.
159
Fourth Normal Form (4NF)
A relation schema R is in 4NF with respect to a set of
dependencies F if,
• It is in BCNF and
• For every nontrivial multivalued dependency X —>> Y , X is a
superkey —that is, X is either a candidate key or a superset
thereof.

160
Pizza Delivery Permutations
Restaurant          Pizza Variety    Delivery Area
Vincenzo's Pizza    Thick Crust      Springfield
Vincenzo's Pizza    Thick Crust      Shelbyville
Vincenzo's Pizza    Thin Crust       Springfield
Vincenzo's Pizza    Thin Crust       Shelbyville
Elite Pizza         Thin Crust       Capital City
Elite Pizza         Stuffed Crust    Capital City
A1 Pizza            Thick Crust      Springfield
A1 Pizza            Thick Crust      Shelbyville
A1 Pizza            Thick Crust      Capital City
A1 Pizza            Stuffed Crust    Springfield
A1 Pizza            Stuffed Crust    Shelbyville
A1 Pizza            Stuffed Crust    Capital City
161
Pizza Delivery Permutations

• The table has no non-key attributes because its only key is
{Restaurant, Pizza Variety, Delivery Area}.
• The problem is that the table features two non-trivial
multivalued dependencies on the {Restaurant} attribute
(which is not a superkey).
• The dependencies are:
{Restaurant} →→ {Pizza Variety}
{Restaurant} →→ {Delivery Area}

162
To satisfy 4NF, we must place the facts about varieties offered into a
different table from the facts about delivery areas:

Varieties By Restaurant

Restaurant          Pizza Variety
Vincenzo's Pizza    Thick Crust
Vincenzo's Pizza    Thin Crust
Elite Pizza         Thin Crust
Elite Pizza         Stuffed Crust
A1 Pizza            Thick Crust
A1 Pizza            Stuffed Crust

Delivery Areas By Restaurant

Restaurant          Delivery Area
Vincenzo's Pizza    Springfield
Vincenzo's Pizza    Shelbyville
Elite Pizza         Capital City
A1 Pizza            Springfield
A1 Pizza            Shelbyville
A1 Pizza            Capital City
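
That this 4NF decomposition loses no information can be checked by projecting the original table onto the two smaller tables and joining them back on Restaurant. The Python sketch below is an illustration only; it stores each table as a set of tuples and the variable names are hypothetical.

```python
original = {
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
}

# Project onto (Restaurant, Pizza Variety) and (Restaurant, Delivery Area).
varieties = {(rest, variety) for rest, variety, _ in original}
areas = {(rest, area) for rest, _, area in original}

# Natural join of the two projections on Restaurant.
rejoined = {
    (rest, variety, area)
    for rest, variety in varieties
    for rest2, area in areas
    if rest == rest2
}

print(len(varieties), len(areas))  # 6 6
print(rejoined == original)        # True
```

The 12 original rows shrink to 6 + 6 rows, and the join returns exactly the original table, which is what the two multivalued dependencies on Restaurant guarantee.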
163
Join dependency

A relation R satisfies the join dependency (JD) denoted by
* (R1,R2,...Rn) if and only if R is equal to the join of its
projections on R1, R2, ..., Rn, where each Ri is a subset of the
set of attributes of R.

164
Join dependency

165
Join dependency

166
Types of Join dependency
• A join dependency * (R1,R2,...Rn) is called trivial if any Ri
is equal to R, i.e. Ri =R or r(Ri )= r(R); otherwise it is
non-trivial.

167
5NF or Project Join Normal Form(PJNF)
A relation R is in 5NF, if
• it is in 4NF and
• for every non-trivial JD * (R1,R2,...Rn) of R, every Ri is a super key
of R.

168
Example
• Consider a relation schema
R(A,B,C,D,E,F) with FDs A->BC and F->DE, and JD
*[ABC,ADF,DEF]. Determine whether R is in 5NF.
Solution:
The candidate key is AF.
So, R is not in PJNF because, in the
JD *[ABC,ADF,DEF], neither ABC nor DEF is a super key of R.

169
Attribute closure
• Given a relation R, a set of attributes X of R and a set of
functional dependencies F, we need a way to find all of the
attributes of R that are functionally determined by X.
• This set of attributes is called the closure of X under F and is
denoted X+ .
Ex:
a. If A -> B, then A+ = {A,B}
b. If A -> BC, then A+ = {A,B,C}
c. If A -> BC and C-> F, then A+ = {A, B,C,F}

170
Algorithm for finding the attribute closure

An algorithm for computing X+:

Step 1: start with result := X
Step 2: loop through all FDs Y -> Z in F
Step 3: if Y ⊆ result then result := result ∪ Z
Step 4: once an FD is used, it does not need to be considered again
Step 5: iterate until the set result does not enlarge
end
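
The steps above can be sketched in Python as follows. This is a minimal version under the assumption that attributes are single letters and each FD is a pair of strings; the function name closure is hypothetical. Instead of removing used FDs (Step 4), it simply skips any FD that can no longer add attributes.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, e.g. ("BC", "DE") for BC -> DE.
    """
    result = set(attrs)                 # Step 1: result := X
    changed = True
    while changed:                      # Step 5: repeat until no growth
        changed = False
        for lhs, rhs in fds:            # Step 2: scan every FD Y -> Z
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # Step 3: Y ⊆ result, so add Z
                changed = True
    return result


fds = [("A", "B"), ("BC", "DE"), ("AEF", "G"), ("B", "F")]
print(sorted(closure("A", fds)))    # ['A', 'B', 'F']
print(sorted(closure("BC", fds)))   # ['B', 'C', 'D', 'E', 'F']
print(sorted(closure("AEF", fds)))  # ['A', 'B', 'E', 'F', 'G']
```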

171
Question 1
R={A,B,C,D,E,F,G}
Let F={A-> B, BC->DE,AEF->G,B->F}
FIND A+, BC+, AEF+

Solution:
A+ ={A,B,F}
AEF+ ={A,B,E,F,G}
BC+ ={B,C,D,E,F}

172
Question 2
Consider a relation R with attributes
R={A,B,C,D,E,F} AND
FD={A->BC, E->CF,B->E,CD->EF}
COMPUTE AB+ .
Solution:
AB+ ={A,B,C,E,F}

173
Question 3
A relation schema
R = {A, B, C, D, E, F, G, H, I} &
the set of FDs F = {A->B, A->C, CG -> H, CG->I, B->H} holds
on R. compute AG +.
Solution:

AG + ={A,G,B,C,H,I}

174
Closure of set of functional dependencies
• F+ is the set of all FDs that are implied by the FDs in F.
Algorithm
• Start with set of existing functional dependencies
• Apply rules of inference to determine new dependencies
• Iterate until set does not change
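
Listing F+ in full is rarely practical, because it also contains every trivial FD. A common shortcut, sketched below in Python, relies on the fact that X -> Y is in F+ exactly when Y is a subset of X+. The sketch therefore prints X -> (X+ − X) for every attribute set X that determines something new. The function names are hypothetical, attributes are assumed to be single letters, and the sample data is R(A,B,C,D) with F = {A->B, A->C, BC->D}.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nontrivial_closure_fds(schema, fds):
    """Map each attribute set X to X+ - X, skipping empty results."""
    out = {}
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            extra = closure(combo, fds) - set(combo)
            if extra:
                out["".join(combo)] = "".join(sorted(extra))
    return out


fds = [("A", "B"), ("A", "C"), ("BC", "D")]
for lhs, rhs in nontrivial_closure_fds("ABCD", fds).items():
    print(f"{lhs} -> {rhs}")
```

Every other member of F+ follows from these lines by dropping attributes on the right-hand side or by adding trivial ones.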

175
Question 1
• Compute F+ for relation schema R(A,B,C,D,E,F) and F={A->
B, A-> C,CD-> E,CD-> F, B-> F}
Solution:
A-> B and A-> C
By rule A4:union rule A->BC
A-> B and B-> F
By rule A3: transitive rule A->F
CD-> E and CD-> F
By rule A4:union rule CD->EF

176
Question 2
• Compute F+ for relation schema R(A,B,C,D) and F={A-> B,
A-> C,BC-> D}
Solution:
A-> B and A-> C
By rule A4:union rule A->BC
A-> BC and BC-> D
By rule A3: transitive rule A->D
A-> C and BC-> D
By rule A6:pseudotransitive rule AB->D

177
Question 3
• Compute F+ for relation schema R(A,B,C,D,E) and
F={AB->C, A->D,D->E}
Solution:
A->D
By rule A2: augmentation rule AB->BD
AB->C and AB->BD
By rule A4: union rule AB->BCD

178
Equivalence between sets of Functional Dependencies

Consider two sets of FDs, F and G,
F = {A -> B, B -> C, AC -> D} and
G = {A -> B, B -> C, A -> D}
Are F and G equivalent?
• We can conclude that F and G are equivalent, if we can prove
that all FDs in F can be inferred from the set of FDs in G and
vice versa.

179
Equivalence between sets of Functional Dependencies
Take the attributes from the LHS of FDs in F and compute
attribute closure for each using FDs in G:
A+ using G = ABCD; A -> A; A -> B; A -> C; A -> D;
B+ using G = BC; B -> B; B -> C;
AC + using G = ABCD; AC -> A; AC -> B; AC -> C; AC -> D;
Notice that all FDs in F (A -> B, B -> C and AC -> D) can be
inferred using FDs in G.
To see if all FDs in G are inferred by F, compute attribute closure
for attributes on the LHS of FDs in G using FDs in F:
A+ using F = ABCD; A -> A; A -> B; A -> C; A-> D;
B+ using F = BC; B -> B; B -> C;
Since all FDs in F can be obtained from G and vice versa, we
conclude that F and G are equivalent.
180
Example 1
A Counter Example:
F = {A -> B, A ->C}
G = {A -> B, B -> C}
F and G are not equivalent, because B -> C in G is not inferred
from the FDs in F.
A+ using G = ABC;
Whereas,
A+ using F = ABC;
B+ using F = B, indicating B -> C in G is not inferred using the
FDs from F.
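
The closure-based equivalence test used in these examples can be sketched in Python. The function names (closure, covers, equivalent) are hypothetical and attributes are assumed to be single letters.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("B", "C"), ("AC", "D")]
G = [("A", "B"), ("B", "C"), ("A", "D")]
print(equivalent(F, G))    # True

F2 = [("A", "B"), ("A", "C")]
G2 = [("A", "B"), ("B", "C")]
print(covers(G2, F2))      # True: G2 implies A -> C
print(covers(F2, G2))      # False: B -> C does not follow from F2
print(equivalent(F2, G2))  # False
```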

181
Example 2
• Let the relation R={A,B,C,D,E,F,G} satisfy the following FDs:
F={A-> B, BC->DE,AEF->G}.
Find AC+. Is the FD ACF ->DG implied by this set?

Solution: AC+ = ABCDE

To prove ACF->DG,
find
ACF+ = {ABCDEFG}
Now ACF+ contains D and G.
So, ACF->DG is implied by F.

182
Decomposition and its desirable
properties
• Decomposition refers to breaking down of one table into
multiple tables such that every attribute appears in at least one
of the new relations .
• The process of normalization depends on being able to factor
or decompose a table into two or more smaller tables, in such
a way that we can recapture the precise content of the original
table by joining the decomposed parts.

183
Desirable properties of decomposition
• Attribute preservation: This is a simple and obvious
requirement: every attribute of the relation being decomposed
must appear in at least one of the new relations.
• No loss of information: The decomposition must not be lossy;
no information may be lost due to decomposition.
• Lossless join decomposition: Recombining the decomposed
relations with a relational join produces exactly the original
relation.
• Dependency preservation: Dependency preservation is
another important requirement, since a dependency is a
constraint on the database. If X-> Y holds, then we know
that the two attributes are closely related, and it is
useful if both attributes appear in the same relation so that
the dependency can be checked easily.
• Lack of redundancy: Redundancy or repetition should be
avoided as much as possible.
184
Example of lossy decomposition

Spurious tuples

185
Example Lossless-Join condition
Condition for lossless join.
1) All attributes of the original schema R must appear in the
decomposition (R1, R2):
R = R1 ∪ R2
2) At least one of the following functional dependencies
holds:
• R1 ∩ R2 -> R1
• R1 ∩ R2 -> R2
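
The condition above can be sketched in Python for a decomposition into two relations. The function names are hypothetical and attributes are assumed to be single letters; the sample data is a schema R(A,B,C,D,E) with FDs A->BC, CD->E, B->D and E->A.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: R1 ∩ R2 must determine R1 or R2."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(is_lossless("ABC", "ADE", fds))  # True:  A+ = {A,B,C,D,E}
print(is_lossless("ABC", "CDE", fds))  # False: C+ = {C}
```

The test only covers decompositions into two relations; a decomposition into more relations needs a more general method.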

186
Example 1
• Suppose we decompose the schema R={A,B,C,D,E} into
(A,B,C) and (A,D,E).
• Show that this decomposition is a lossless-join decomposition
if the following FDs hold
• A->BC
• CD->E
• B->D
• E->A
Solution:
• A decomposition {R1, R2} is a lossless-join decomposition if
R1 ∩ R2 → R1 or R1 ∩ R2 → R2.
• Let R1 = (A, B, C), R2 = (A, D, E), and
• R1 ∩ R2 = A.
• Since A is a candidate key (A+ = {A,B,C,D,E}), R1 ∩ R2 → R1
holds, and the decomposition is lossless.
187
Example 2
• Suppose we decompose the schema R={A,B,C,D,E} into
(A,B,C) and (C,D,E).
• Determine whether this decomposition is a lossless-join
decomposition if the following FDs hold
• A->BC
• CD->E
• B->D
• E->A
Solution:
• A decomposition {R1, R2} is a lossless-join decomposition if
R1 ∩ R2 → R1 or R1 ∩ R2 → R2.
• Let R1 = (A, B, C), R2 = (C, D, E), and
• R1 ∩ R2 = C.
• Since C is not a candidate key (C+ = {C}), C determines neither
R1 nor R2. Therefore the decomposition is not a lossless-join
decomposition.
188
Example Dependency preservation

• Let us consider a relation R(A,B,C,D) that has the
dependencies F that include the following
• A->B and A->C
• If we decompose the above relation into
R1 (A,B) and R2 (B,C,D), the dependency A->C cannot be
preserved by looking at only one relation.
• It is desirable that the decomposition be such that each
dependency in F may be checked by looking at only one
relation and that no join needs to be computed for checking
dependencies.
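
Whether a given FD survives a decomposition can be tested without computing F+. The Python sketch below uses a well-known iterative method: it grows the left-hand side using only what each decomposed relation can see. The function names are hypothetical, attributes are assumed to be single letters, and the data is the example above.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """True if `fd` can be checked using the decomposed relations alone."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Attributes of Ri that are determined by the part of
            # `result` that lies inside Ri.
            gained = closure(result & set(ri), fds) & set(ri)
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


fds = [("A", "B"), ("A", "C")]
decomposition = ["AB", "BCD"]
print(is_preserved(("A", "B"), decomposition, fds))  # True
print(is_preserved(("A", "C"), decomposition, fds))  # False
```

A decomposition is dependency preserving when this test succeeds for every FD in F.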

189
Dependency preservation
Condition
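• If R is decomposed into X and Y, then the dependencies are
preserved if (FX ∪ FY)+ = F+, where FX and FY are the
projections of F onto X and Y.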

190
Example 1

191
Example 2

192
