Database Systems
Mohamed Zahran (aka Z)
http://www.mzahran.com
CSCI-GA.2433-001
Lecture 4: Relational Algebra and Calculus
Query languages (e.g. SQL) are specialized languages for asking questions.
Relational Algebra and Calculus
Procedural: Algebra
Declarative: Calculus
[Figure: a query takes instances of relations as input and returns an instance of a relation]
Relational Algebra
Queries are composed using a collection
of operators.
Every operator:
Accepts one or two relation instances
Returns a relation instance.
Operators can be composed to form relational algebra expressions.
Each query describes a step-by-step
procedure for computing the desired
answer.
Relational Algebra
Five basic operators
Selection
Projection
Union
Cross-product
Difference
Selection
$\sigma_{\text{selection criteria}}(\text{input})$
The input is a relation instance.
The selection operator specifies the tuples to retain through the selection criteria.
The criteria are a boolean combination (i.e. using $\lor$ and $\land$) of terms.
Each term has the form attribute op constant or attribute1 op attribute2,
where op is one of <, <=, =, ≠, >=, or >.
Manipulates data in a single relation
Selection
S2:
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
$\sigma_{rating>8}(S2)$:
sid sname rating age
28 yuppy 9 35.0
58 rusty 10 35.0
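The selection above can be sketched in a few lines of Python. This is only an illustration, assuming a relation instance is modelled as a set of tuples; the helper name `select` is made up for this sketch and is not part of any DBMS API.

```python
# Relation instance S2 from the slide: a set of (sid, sname, rating, age) tuples.
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate (hypothetical helper)."""
    return {t for t in relation if predicate(t)}

# sigma_{rating > 8}(S2); rating is the third field (index 2).
high_rated = select(S2, lambda t: t[2] > 8)
print(sorted(high_rated))
```

The result has the same schema as the input; only the set of rows changes.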
Projection
$\pi_{\text{fields}}(\text{input})$
Allows us to extract columns from a relation
Duplicate tuples are eliminated, since the result is a set
S2:
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
$\pi_{age}(S2)$:
age
35.0
55.5
Example: $\pi_{sname,rating}(\sigma_{rating>8}(S2))$
S2:
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
Result:
sname rating
yuppy 9
rusty 10
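A minimal Python sketch of projection and of composing it with selection, assuming relations are sets of tuples with a separate schema tuple. The names `SCHEMA`, `select`, and `project` are illustrative choices, not something defined in the lecture.

```python
# S2 from the slides, with its schema kept alongside the tuples.
SCHEMA = ("sid", "sname", "rating", "age")
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, schema, fields):
    """Projection: keep only the named fields; the set drops duplicate rows."""
    positions = [schema.index(f) for f in fields]
    return {tuple(t[i] for i in positions) for t in relation}

# pi_age(S2): three sailors share age 35.0, so only two rows remain.
ages = project(S2, SCHEMA, ["age"])

# pi_{sname,rating}(sigma_{rating>8}(S2)): the output of one operator feeds the next.
names_ratings = project(select(S2, lambda t: t[2] > 8), SCHEMA, ["sname", "rating"])
```

Because every operator returns a relation, the result of `select` can be passed straight into `project`.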
Set Operations
Take as input two relation instances
Four standard operations
Union
Intersection
Set-difference
Cross-product
Union, intersection, and difference require
the two input relations to be union compatible:
They have the same number of fields
Corresponding fields, taken in order from left
to right, have the same domains
Set Operation: Union
$R \cup S$ returns a relation instance
containing all tuples that occur in either
relation instance R or S, or both.
R and S must be union compatible.
Schema of the result is defined to be
that of R.
Set Operation: Union
S1:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S2:
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
$S1 \cup S2$:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
44 guppy 5 35.0
28 yuppy 9 35.0
Set Operation: Intersection
$R \cap S$: returns a relation instance
containing all tuples that occur in both R
and S.
R and S must be union compatible.
Schema of the result is that of R.
Set Operation: Intersection
S1:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S2:
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
$S1 \cap S2$:
sid sname rating age
31 lubber 8 55.5
58 rusty 10 35.0
Set Operation: Set-Difference
$R - S$: returns a relation instance
containing all tuples that occur in R but
not in S.
R and S must be union-compatible.
Schema of the result is the schema of R.
Set Operation: Set-Difference
S1:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S2:
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
$S1 - S2$:
sid sname rating age
22 dustin 7 45.0
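With relations modelled as Python sets of tuples, union, intersection, and set-difference map directly onto the built-in set operators. The sketch below is illustrative only; `union_compatible` is a made-up helper that uses Python types as a rough stand-in for field domains.

```python
# S1 and S2 from the slides, as sets of (sid, sname, rating, age) tuples.
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

def union_compatible(r, s):
    """Assumption: same arity and same Python type in each position."""
    signatures = {tuple(type(v) for v in t) for t in r | s}
    return len(signatures) <= 1

assert union_compatible(S1, S2)

union = S1 | S2          # tuples in S1, in S2, or in both
intersection = S1 & S2   # tuples in both S1 and S2
difference = S1 - S2     # tuples in S1 but not in S2
```

Here `union` holds five tuples, `intersection` holds lubber and rusty, and `difference` holds only dustin, matching the three example slides.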
Set Operation: Cross-Product
$R \times S$: returns a relation instance whose
schema contains:
All the fields of R (in the same order as they
appear in R)
All the fields of S (in the same order as they
appear in S)
The result contains one tuple <r,s> for each
pair with $r \in R$ and $s \in S$
Basically, it is the Cartesian product.
Fields with the same name in R and S are left unnamed in the result.
Set Operation: Cross-Product
S1:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
R1:
sid bid day
22 101 10/10/96
58 103 11/12/96
$S1 \times R1$:
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
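The cross-product can be sketched with `itertools.product`, again assuming relations are sets of tuples. The function name `cross` is only an illustrative choice.

```python
from itertools import product

# S1 (sailors) and R1 (reservations) from the slides.
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def cross(r, s):
    """Cross-product: one concatenated tuple for every pair (a, b), a in r, b in s."""
    return {a + b for a, b in product(r, s)}

S1xR1 = cross(S1, R1)
print(len(S1xR1))  # 3 sailors x 2 reservations = 6 tuples
```

Each result tuple lists the fields of S1 first and the fields of R1 second, in their original order.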
Renaming
Name conflicts can arise in some
situations
It is convenient to be able to give names
to the fields of a relation instance
defined by a relational algebra
expression.
$\rho(R(\overline{F}), E)$
Takes an arbitrary relational algebra expression E
Returns an instance of a new relation R
R is the result of E except that some fields are renamed
The renaming list $\overline{F}$ has terms of the form oldname $\to$ newname or position $\to$ newname
Renaming
$\rho(C(1 \to sid1, 5 \to sid2), S1 \times R1)$
$S1 \times R1$:
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
C:
sid1 sname rating age sid2 bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
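Renaming only touches the schema, not the tuples. A small sketch, assuming the schema is kept as a tuple of field names; `rename` is a hypothetical helper that accepts either a 1-based position or an old field name as the key.

```python
def rename(schema, renaming):
    """Return a new schema with some fields renamed.

    Keys of `renaming` are 1-based positions (int) or old field names (str).
    """
    new_schema = list(schema)
    for key, new_name in renaming.items():
        position = key - 1 if isinstance(key, int) else schema.index(key)
        new_schema[position] = new_name
    return tuple(new_schema)

# Schema of S1 x R1: the two sid fields clash, so rename them by position.
cross_schema = ("sid", "sname", "rating", "age", "sid", "bid", "day")
C_schema = rename(cross_schema, {1: "sid1", 5: "sid2"})
print(C_schema)
```

Renaming by position is what makes it possible to tell the two `sid` fields apart, since their names alone are ambiguous.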
Question: Can you define $R \cap S$
using other operators?
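One well-known identity expresses intersection with set-difference alone: $R \cap S = R - (R - S)$. The short Python check below tries it on S1 and S2 from the earlier slides; it checks one instance and is not a proof.

```python
# S1 and S2 from the earlier slides.
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

# R - S removes from R everything shared with S;
# removing that remainder from R leaves exactly the shared tuples.
via_difference = S1 - (S1 - S2)

print(via_difference == (S1 & S2))
```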
Other Operators?
The additional operations below can all be defined
using the operators that we have seen.
They appear very frequently,
so they deserve to have their own
operators.
Join
Division
Join
Can be defined as cross-product
followed by selection and projection.
We have several variants of join.
Condition joins
Equijoin
Natural join
Condition Join
$R \bowtie_c S = \sigma_c(R \times S)$
Example: $S1 \bowtie_{S1.sid < R1.sid} R1$
$S1 \times R1$:
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
Result:
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 58 103 11/12/96
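A minimal sketch of the condition join as a selection over a cross-product, assuming relations are sets of tuples. `condition_join` is an illustrative name; the condition is passed as a two-argument function over one tuple from each input.

```python
from itertools import product

# S1 (sailors) and R1 (reservations) from the slides.
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def condition_join(r, s, condition):
    """sigma_c(r x s): concatenate every pair of tuples that satisfies the condition."""
    return {a + b for a, b in product(r, s) if condition(a, b)}

# S1.sid < R1.sid; sid is the first field (index 0) in both relations.
result = condition_join(S1, R1, lambda s, r: s[0] < r[0])
print(sorted(result))
```

Both sid fields are kept in the result, just as in the cross-product.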
Equijoin
$R \bowtie_c S$
Condition c consists only of equalities connected by $\land$
Retaining both attributes in the result is redundant
So, an additional projection is applied to remove
the second attribute.
Equijoin
Example: $S1 \bowtie_{R.sid = S.sid} R1$
$S1 \times R1$:
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
Result:
sid sname rating age bid day
22 dustin 7 45.0 101 10/10/96
58 rusty 10 35.0 103 11/12/96
Natural Join
It is an equijoin in which equalities are
specified on all fields having the same
name in R and S
We can then omit the join condition.
Result is guaranteed not to have two
fields with the same name.
If R and S have no fields in common, then natural join
is simply the cross product.
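The natural join can be sketched by finding the shared field names from the two schemas and equating them. This is only an illustration; `natural_join` and the schema tuples are names made up for the sketch.

```python
def natural_join(r, r_schema, s, s_schema):
    """Equijoin on all fields with the same name; each shared field is kept once."""
    common = [f for f in r_schema if f in s_schema]
    s_rest = [f for f in s_schema if f not in common]
    out_schema = tuple(r_schema) + tuple(s_rest)
    out = set()
    for a in r:
        for b in s:
            # With no common fields this test is vacuously true: a cross-product.
            if all(a[r_schema.index(f)] == b[s_schema.index(f)] for f in common):
                out.add(a + tuple(b[s_schema.index(f)] for f in s_rest))
    return out, out_schema

S_SCHEMA = ("sid", "sname", "rating", "age")
R_SCHEMA = ("sid", "bid", "day")
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

joined, joined_schema = natural_join(S1, S_SCHEMA, R1, R_SCHEMA)
```

On S1 and R1 the only shared field is `sid`, so the result matches the equijoin on sid with the duplicate column removed.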
Division
Suppose A has two groups of fields <x,y>
The y fields are the same fields, in terms of
domain, as the fields of B
A/B is the set of all x values such that for every y value in a
tuple of B there is a tuple <x,y> in A.
Division
A:
sno pno
s1 p1
s1 p2
s1 p3
s1 p4
s2 p1
s2 p2
s3 p2
s4 p2
s4 p4
B1:
pno
p2
B2:
pno
p2
p4
B3:
pno
p1
p2
p4
A/B1:
sno
s1
s2
s3
s4
A/B2:
sno
s1
s4
A/B3:
sno
s1
Question: Can we define A/B
using the other basic operators?
Disqualified x values: $\pi_x((\pi_x(A) \times B) - A)$
A/B: $\pi_x(A) - \text{all disqualified tuples}$
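The construction above can be sketched in Python for the two-column case, using A, B1, B2, and B3 from the division example. `divide` is a hypothetical helper that assumes A holds (x, y) pairs and B holds one-field tuples.

```python
# A(sno, pno) and the three divisors from the division example.
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}

def divide(a, b):
    """A/B using only projection, cross-product, and set-difference."""
    xs = {(x,) for x, _ in a}                          # pi_x(A)
    all_pairs = {x + y for x in xs for y in b}         # pi_x(A) x B
    disqualified = {(x,) for x, _ in all_pairs - a}    # x values missing some y
    return xs - disqualified

print(sorted(divide(A, B2)))
```

An x value is disqualified as soon as one of its required (x, y) pairs is absent from A, which is why subtracting A from the full set of pairs finds exactly the failures.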
Examples
[Figure: instances of Sailors (sid, sname, rating, age), Reserves, and Boats]
Q1. Find the names of sailors who have reserved boat 103
Solution 1: $\pi_{sname}((\sigma_{bid=103} \text{Reserves}) \bowtie \text{Sailors})$
Solution 2: $\pi_{sname}(\sigma_{bid=103}(\text{Reserves} \bowtie \text{Sailors}))$
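Both Q1 solutions can be tried in Python to see that selecting before or after the join gives the same answer. As an assumption for this sketch, the instances S1 and R1 from the earlier slides stand in for Sailors and Reserves, and `join_on_sid` is a made-up helper for the natural join on sid.

```python
# Stand-in instances (assumption): S1 as Sailors, R1 as Reserves.
Sailors = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
Reserves = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def join_on_sid(reserves, sailors):
    """Natural join on sid; result fields: sid, sname, rating, age, bid, day."""
    return {s + r[1:] for r in reserves for s in sailors if r[0] == s[0]}

# Solution 1: select bid = 103 first, then join, then project sname.
reserved_103 = {r for r in Reserves if r[1] == 103}
solution1 = {(t[1],) for t in join_on_sid(reserved_103, Sailors)}

# Solution 2: join first, then select bid = 103 (index 4), then project sname.
solution2 = {(t[1],) for t in join_on_sid(Reserves, Sailors) if t[4] == 103}

print(solution1 == solution2)
```

Solution 1 joins a smaller intermediate relation, which is the kind of difference a DBMS can exploit when choosing among equivalent expressions.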
Examples
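[Figure: instances of Sailors (sid, sname, rating, age), Reserves, and Boats]
Q2: Find the names of sailors who have reserved a red boat.
Sol1: $\pi_{sname}((\sigma_{color=\text{'red'}} \text{Boats}) \bowtie \text{Reserves} \bowtie \text{Sailors})$
Sol2: $\pi_{sname}(\pi_{sid}((\pi_{bid} \sigma_{color=\text{'red'}} \text{Boats}) \bowtie \text{Reserves}) \bowtie \text{Sailors})$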
Examples
[Figure: instances of Sailors (sid, sname, rating, age), Reserves, and Boats]
Q3: Find the colors of boats reserved by Lubber.
$\pi_{color}((\sigma_{sname=\text{'Lubber'}} \text{Sailors}) \bowtie \text{Reserves} \bowtie \text{Boats})$
Examples
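[Figure: instances of Sailors (sid, sname, rating, age), Reserves, and Boats]
Q5. Find the names of sailors who reserved a red or a green boat.
$\rho(\text{Tempboats}, (\sigma_{color=\text{'red'} \lor color=\text{'green'}} \text{Boats}))$
$\pi_{sname}(\text{Tempboats} \bowtie \text{Reserves} \bowtie \text{Sailors})$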
Relational Calculus
An alternative to relational algebra.
Declarative: describes the set of answers
without being explicit about how they should be
computed
One variant is called: tuple relational calculus
(TRC).
Another variant: domain relational calculus
(DRC)
Calculus has variables, constants, comparison
ops, logical connectives and quantifiers.
Tuple Relational Calculus
A TRC query has the form {T | p(T)}
T is a tuple variable
p(T) is a formula that describes T
Result: the set of all tuples t for which p(T)
evaluates to true when T = t
Example:
Tuple Relational Calculus
Q: Find the names and ages of sailors with a rating above 7
Q: Find the sailor name, boat id, and reservation date for each reservation.
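Python set comprehensions have the same shape as a TRC query {T | p(T)}: they state which tuples belong to the answer rather than a sequence of operators. The sketch below is an analogy, not TRC itself. As an assumption, it reuses the S1 and R1 instances from the algebra slides as Sailors and Reserves, and the TRC formulas in the comments are one possible way to write these two queries.

```python
from collections import namedtuple

Sailor = namedtuple("Sailor", "sid sname rating age")
Reserve = namedtuple("Reserve", "sid bid day")

# Stand-in instances (assumption): S1 as Sailors, R1 as Reserves.
Sailors = {
    Sailor(22, "dustin", 7, 45.0),
    Sailor(31, "lubber", 8, 55.5),
    Sailor(58, "rusty", 10, 35.0),
}
Reserves = {Reserve(22, 101, "10/10/96"), Reserve(58, 103, "11/12/96")}

# {P | exists S in Sailors (S.rating > 7 and P.name = S.sname and P.age = S.age)}
names_ages = {(S.sname, S.age) for S in Sailors if S.rating > 7}

# {P | exists R in Reserves, exists S in Sailors
#      (R.sid = S.sid and P.bid = R.bid and P.day = R.day and P.sname = S.sname)}
reservations = {
    (S.sname, R.bid, R.day)
    for R in Reserves
    for S in Sailors
    if S.sid == R.sid
}
```

The `for` clauses play the role of the existential quantifiers, and the `if` clause plays the role of the formula p(T).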
Domain Relational Calculus
Query has the form:
$\{\langle x_1, x_2, \ldots, x_n \rangle \mid p(\langle x_1, x_2, \ldots, x_n \rangle)\}$
Answer includes all tuples $\langle x_1, x_2, \ldots, x_n \rangle$ that
make the formula $p(\langle x_1, x_2, \ldots, x_n \rangle)$ true.
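Example: Find all sailors with a rating above 7
$\{\langle I, N, T, A \rangle \mid \langle I, N, T, A \rangle \in \text{Sailors} \land T > 7\}$
$\langle I, N, T, A \rangle$: giving each attribute a
variable name
$\langle I, N, T, A \rangle \in \text{Sailors}$: ensures that I, N, T, and A
are restricted to be fields
of the same tuple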
Algebra Vs Calculus
Every query that can be expressed in
relational algebra can also be expressed
in relational calculus.
The other way around is a bit tricky.
Think, for example, about:
Conclusions
Relational algebra and calculus are the
foundation of query languages like SQL.
Queries are expressed by languages like
SQL, and the DBMS translates the query
into relational algebra.
DBMS tries to look for the cheapest relational
expression.
Section 4.2.6 is very useful; pay close
attention to it.
For the calculus part, we will use slides
only.