Database Management Systems-4

The document provides an overview of relational algebra and calculus. It discusses how relational algebra uses operators like selection, projection, union, intersection, difference, and cross product to manipulate relational data and define queries. Examples are provided to illustrate each operator, showing sample relations and the results of applying the operators. Relational calculus is also introduced as the other formal query language associated with the relational model.

Uploaded by

Arun Sasidharan

UNIT-2

Relational Algebra and Calculus


PRELIMINARIES

In defining relational algebra and calculus, the alternative of referring to fields by
position is more convenient than referring to fields by name: Queries often involve the
computation of intermediate results, which are themselves relation instances, and if we use
field names to refer to fields, the definition of query language constructs must specify the
names of fields for all intermediate relation instances. This can be tedious and is really a
secondary issue because we can refer to fields by position anyway. On the other hand, field
names make queries more readable.

Due to these considerations, we use the positional notation to formally define relational
algebra and calculus. We also introduce simple conventions that allow intermediate
relations to ‘inherit’ field names, for convenience.

We present a number of sample queries using the following schema:

   Sailors(sid: integer, sname: string, rating: integer, age: real)
   Boats(bid: integer, bname: string, color: string)
   Reserves(sid: integer, bid: integer, day: date)

The domain of each field is listed after the field name. The key for Sailors is sid, the
key for Boats is bid, and all three fields together form the key for Reserves. Fields in
an instance of one of these relations will be referred to by name, or positionally, using
the order in which they are listed above. In several examples illustrating the relational
algebra operators, we will use the instances S1 and S2 (of Sailors) and R1 (of Reserves)
shown in Figures 4.1, 4.2, and 4.3, respectively.

Dept of CSE, Unit-2 Page 1


RELATIONAL ALGEBRA
Relational algebra is one of the two formal query languages associated with the
relational model. Queries in algebra are composed using a collection of operators. A
fundamental property is that every operator in the algebra accepts (one or two)
relation instances as arguments and returns a relation instance as the result. This
property makes it easy to compose operators to form a complex query: a relational
algebra expression is recursively defined to be a relation, a unary algebra operator
applied to a single expression, or a binary algebra operator applied to two expressions.
In the following sections we describe the basic operators of the algebra (selection,
projection, union, cross-product, and difference), as well as some additional operators
that can be defined in terms of the basic operators but arise frequently enough to
warrant special attention.

Each relational query describes a step-by-step procedure for computing the desired
answer, based on the order in which operators are applied in the query. The procedural
nature of the algebra allows us to think of an algebra expression as a recipe, or a
plan, for evaluating a query, and relational systems in fact use algebra expressions to
represent query evaluation plans.

Selection and Projection

Relational algebra includes operators to select rows from a relation (σ) and to project
columns (π). These operations allow us to manipulate data in a single relation.
Consider the instance of the Sailors relation shown in Figure 4.2, denoted as S2. We can
retrieve rows corresponding to expert sailors by using the σ operator. The expression

   σ_{rating>8}(S2)

evaluates to the relation shown in Figure 4.4. The subscript rating>8 specifies the
selection criterion to be applied while retrieving tuples.

sid  sname   rating  age
28   Yuppy   9       35.0
58   Rusty   10      35.0
Figure 4.4 σ_{rating>8}(S2)

sname   rating
Yuppy   9
Lubber  8
Guppy   5
Rusty   10
Figure 4.5 π_{sname,rating}(S2)

The selection operator σ specifies the tuples to retain through a selection condition. In
general, the selection condition is a boolean combination (i.e., an expression using the
logical connectives ∧ and ∨) of terms that have the form attribute op constant or
attribute1 op attribute2, where op is one of the comparison operators <, <=, =, ≠, >=,
or >. The reference to an attribute can be by position (of the form .i or i) or by name
(of the form .name or name). The schema of the result of a selection is the schema of
the input relation instance.

The projection operator π allows us to extract columns from a relation; for example, we
can find out all sailor names and ratings by using π. The expression

   π_{sname,rating}(S2)

evaluates to the relation shown in Figure 4.5.

Suppose that we wanted to find out only the ages of sailors. The expression

   π_{age}(S2)

evaluates to the relation shown in Figure 4.6. Although three sailors in S2 are aged 35.0,
a single tuple with age=35.0 appears in the result of the projection. This follows from
the definition of a relation as a set of tuples. In practice, real systems often omit
the expensive step of eliminating duplicate tuples, leading to relations that are
multisets. However, our discussion of relational algebra and calculus assumes that
duplicate elimination is always done so that relations are always sets of tuples.

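We can compute the names and ratings of highly rated sailors by combining two of the
preceding queries. The expression

   π_{sname,rating}(σ_{rating>8}(S2))

produces the result shown in Figure 4.7. It is obtained by applying the selection to S2
(to get the relation shown in Figure 4.4) and then applying the projection.

age
35.0
55.5
Figure 4.6 π_{age}(S2)

sname   rating
Yuppy   9
Rusty   10
Figure 4.7 π_{sname,rating}(σ_{rating>8}(S2))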
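
The selection and projection operators described above can be sketched in a few lines of
Python. This is only an illustration, not a definitive implementation: a relation instance
is modeled as a set of tuples, fields are addressed by position, and the helper names
select and project are our own. The rows of S2 are reconstructed from the figures of this
unit.

```python
# Relation instance S2 of Sailors as a set of (sid, sname, rating, age) tuples.
# The rows are reconstructed from the figures of this unit.
S2 = {
    (28, "Yuppy", 9, 35.0),
    (31, "Lubber", 8, 55.5),
    (44, "Guppy", 5, 35.0),
    (58, "Rusty", 10, 35.0),
}

# Positions of the Sailors fields (0-based here; the text counts from 1).
SID, SNAME, RATING, AGE = range(4)


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, positions):
    """Projection: keep the listed columns; using a set drops duplicates."""
    return {tuple(t[i] for i in positions) for t in relation}


experts = select(S2, lambda t: t[RATING] > 8)        # Figure 4.4
ages = project(S2, [AGE])                            # Figure 4.6
expert_names = project(experts, [SNAME, RATING])     # Figure 4.7
```

Because the result of project is a Python set, the three sailors aged 35.0 collapse into a
single tuple, which mirrors the duplicate elimination assumed in the text.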

Set Operations

The following standard operations on sets are also available in relational algebra: union (∪),
intersection (∩), set-difference (−), and cross-product (×).

• Union: R ∪ S returns a relation instance containing all tuples that occur in either
  relation instance R or relation instance S (or both). R and S must be
  union-compatible (that is, they must have the same number of fields, and
  corresponding fields, taken in order from left to right, must have the same
  domains), and the schema of the result is defined to be identical to the schema
  of R.

• Intersection: R ∩ S returns a relation instance containing all tuples that occur in
  both R and S. The relations R and S must be union-compatible, and the schema of
  the result is defined to be identical to the schema of R.

• Set-difference: R − S returns a relation instance containing all tuples that occur
  in R but not in S. The relations R and S must be union-compatible, and the
  schema of the result is defined to be identical to the schema of R.

• Cross-product: R × S returns a relation instance whose schema contains all the
  fields of R (in the same order as they appear in R) followed by all the fields of S
  (in the same order as they appear in S). The result of R × S contains one tuple
  〈r, s〉 (the concatenation of tuples r and s) for each pair of tuples r ∈ R, s ∈ S.
  The cross-product operation is sometimes called Cartesian product.


We now illustrate these definitions through several examples. The union of S1 and
S2 is shown in Figure 4.8. Fields are listed in order; field names are also inherited
from S1. S2 has the same field names, of course, since it is also an instance of
Sailors. In general, fields of S2 may have different names; recall that we require only
domains to match. Note that the result is a set of tuples. Tuples that appear in both
S1 and S2 appear only once in S1 ∪ S2. Also, S1 ∪ R1 is not a valid operation
because the two relations are not union-compatible. The intersection of S1 and S2 is
shown in Figure 4.9, and the set-difference S1 − S2 is shown in Figure 4.10.

sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
28   Yuppy   9       35.0
44   Guppy   5       35.0
Figure 4.8 S1 ∪ S2

sid  sname   rating  age
31   Lubber  8       55.5
58   Rusty   10      35.0
Figure 4.9 S1 ∩ S2

sid  sname   rating  age
22   Dustin  7       45.0
Figure 4.10 S1 − S2
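
As a rough sketch of the three union-compatible operations, the instances S1 and S2 can
be modeled as Python sets of tuples, for which the built-in set operators behave like ∪, ∩,
and −. The instances are reconstructed from the figures of this unit; the function
union_compatible and the use of Python types to stand in for domains (with str standing in
for date) are assumptions made only for this illustration.

```python
# Instances reconstructed from the figures of this unit.
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
S2 = {(28, "Yuppy", 9, 35.0), (31, "Lubber", 8, 55.5),
      (44, "Guppy", 5, 35.0), (58, "Rusty", 10, 35.0)}

# Field domains, listed left to right. Python types stand in for domains;
# str stands in for the date domain (an assumption of this sketch).
SAILORS_DOMAINS = (int, str, int, float)
RESERVES_DOMAINS = (int, int, str)


def union_compatible(domains_r, domains_s):
    """Same number of fields and matching domains, position by position."""
    return domains_r == domains_s


union = S1 | S2          # Figure 4.8: shared tuples appear only once
intersection = S1 & S2   # Figure 4.9
difference = S1 - S2     # Figure 4.10
```

Checking union_compatible(SAILORS_DOMAINS, RESERVES_DOMAINS) returns False, which is why
S1 ∪ R1 is not a valid operation.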

The result of the cross-product S1 × R1 is shown in Figure 4.11. Because R1 and S1 both
have a field named sid, the corresponding two fields in S1 × R1 are unnamed and are
referred to solely by position. The fields in S1 × R1 have the same domains as the
corresponding fields in R1 and S1. In Figure 4.11, sid is listed in parentheses to
emphasize that it is not an inherited field name; only the corresponding domain is
inherited.

(sid)  sname   rating  age   (sid)  bid  day
22     Dustin  7       45.0  22     101  10/10/96
22     Dustin  7       45.0  58     103  11/12/96
31     Lubber  8       55.5  22     101  10/10/96
31     Lubber  8       55.5  58     103  11/12/96
58     Rusty   10      35.0  22     101  10/10/96
58     Rusty   10      35.0  58     103  11/12/96
Figure 4.11 S1 × R1
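
The cross-product can be sketched as follows, assuming again that relation instances are
sets of tuples. The function name cross_product is our own, and dates are kept as plain
strings for simplicity.

```python
from itertools import product

# Instances reconstructed from Figure 4.11.
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}


def cross_product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return {tr + ts for tr, ts in product(r, s)}


S1_x_R1 = cross_product(S1, R1)
```

The result has len(S1) * len(R1) = 6 tuples of 7 fields each, with the fields of S1 first
and the fields of R1 after them, as in Figure 4.11.
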
Renaming
Name conflicts can arise in some cases, for example in S1 × R1, where two fields would
both be called sid. It is therefore convenient to be able to give names explicitly to the
fields of a relation instance that is defined by a relational algebra expression. We
introduce a renaming operator ρ for this purpose. The expression ρ(R(F), E) takes an
arbitrary relational algebra expression E and returns an instance of a (new) relation
called R. R contains the same tuples as the result of E, and has the same schema as E,
but some fields are renamed. The field names in relation R are the same as in E, except
for fields renamed in the renaming list F, whose terms have the form
oldname → newname or position → newname.

For example, the expression ρ(C(1 → sid1, 5 → sid2), S1 × R1) returns a relation
that contains the tuples shown in Figure 4.11 and has the following schema: C(sid1:
integer, sname: string, rating: integer, age: real, sid2: integer, bid: integer,
day: date).
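
Since renaming changes only the schema and never the tuples, it can be illustrated on a
list of field names alone. The sketch below is one possible encoding, not a standard one:
the function rename is hypothetical, unnamed fields are represented by None, and the
renaming list is a dictionary keyed by 1-based position or by old field name.

```python
def rename(new_name, renaming, schema):
    """Return (new_name, new_schema) after applying the renaming list.

    Keys of `renaming` are 1-based positions (int) or old field names (str).
    """
    new_schema = list(schema)
    for old, new in renaming.items():
        index = old - 1 if isinstance(old, int) else new_schema.index(old)
        new_schema[index] = new
    return new_name, new_schema


# Schema of S1 x R1: the two sid fields are unnamed (None) after the product.
cross_schema = [None, "sname", "rating", "age", None, "bid", "day"]

name, schema = rename("C", {1: "sid1", 5: "sid2"}, cross_schema)
```

After the call, schema lists sid1 in position 1 and sid2 in position 5, matching the
schema of C given above, while cross_schema itself is left unchanged.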

It is customary to include some additional operators in the algebra, but they can all be
defined in terms of the operators that we have defined thus far. (In fact, the renaming
operator is only needed for syntactic convenience, and even the ∩ operator is redundant; R
∩ S can be defined as R − (R − S).) We will consider these additional operators, and
their definition in terms of the basic operators, in the next two subsections.
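
The claim that ∩ is redundant can be checked quickly on the sample instances. This is a
spot check on one pair of relations, not a proof; S1 and S2 are reconstructed from the
figures of this unit.

```python
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
S2 = {(28, "Yuppy", 9, 35.0), (31, "Lubber", 8, 55.5),
      (44, "Guppy", 5, 35.0), (58, "Rusty", 10, 35.0)}

only_in_s1 = S1 - S2                  # R − S: tuples of S1 missing from S2
via_difference = S1 - only_in_s1      # R − (R − S)
direct = S1 & S2                      # R ∩ S
```

Removing from S1 exactly those tuples that are not in S2 leaves the tuples that are in
both, so via_difference and direct hold the same two tuples (Lubber and Rusty).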

Joins

The join operation is one of the most useful operations in relational algebra and is
the most commonly used way to combine information from two or more relations.
Although a join can be defined as a cross-product followed by selections and projections,
joins arise much more frequently in practice than plain cross-products. For this reason,
joins have received a lot of attention, and there are several variants of the join
operation.

Condition Joins

The most general version of the join operation accepts a join condition c and a pair of
relation instances as arguments, and returns a relation instance. The join condition is
identical to a selection condition in form. The operation is defined as follows:

   R ⋈_c S = σ_c(R × S)

Thus ⋈ is defined to be a cross-product followed by a selection. Note that the condition
c can (and typically does) refer to attributes of both R and S. The reference to an
attribute of a relation, say R, can be by position (of the form R.i) or by name (of the
form R.name).

As an example, the result of S1 ⋈_{S1.sid<R1.sid} R1 is shown in Figure 4.12.
Because sid appears in both S1 and R1, the corresponding fields in the result of the
cross-product S1 × R1 (and therefore in the result of S1 ⋈_{S1.sid<R1.sid} R1) are
unnamed. Domains are inherited from the corresponding fields of S1 and R1.

(sid)  sname   rating  age   (sid)  bid  day
22     Dustin  7       45.0  58     103  11/12/96
31     Lubber  8       55.5  58     103  11/12/96
Figure 4.12 S1 ⋈_{S1.sid<R1.sid} R1
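
The definition of a condition join as a selection over a cross-product translates almost
directly into code. In this sketch the function condition_join is our own name, and the
join condition is passed as a Python function of one tuple from each relation.

```python
from itertools import product

# Instances reconstructed from Figure 4.11.
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}


def condition_join(r, s, condition):
    """Pair every tuple of r with every tuple of s, keeping pairs that satisfy c."""
    return {tr + ts for tr, ts in product(r, s) if condition(tr, ts)}


# S1.sid < R1.sid: sid is the first field (index 0) of both relations.
result = condition_join(S1, R1, lambda s, r: s[0] < r[0])
```

Of the six tuples in S1 × R1, only the two in which the sailor's sid is smaller than the
reservation's sid survive, which is the content of Figure 4.12.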

Equijoin

A common special case of the join operation R ⋈ S is when the join condition
consists solely of equalities (connected by ∧) of the form R.name1 = S.name2, that is,
equalities between two fields in R and S. In this case, obviously, there is some
redundancy in retaining both attributes in the result. For join conditions that contain
only such equalities, the join operation is refined by doing an additional projection in
which S.name2 is dropped. The join operation with this refinement is called equijoin.

The schema of the result of an equijoin contains the fields of R (with the same names
and domains as in R) followed by the fields of S that do not appear in the join
conditions. If this set of fields in the result relation includes two fields that inherit the
same name from R and S, they are unnamed in the result relation.

We illustrate S1 ⋈_{S1.sid=R1.sid} R1 in Figure 4.13. Notice that only one field called sid
appears in the result.

sid  sname   rating  age   bid  day
22   Dustin  7       45.0  101  10/10/96
58   Rusty   10      35.0  103  11/12/96
Figure 4.13 S1 ⋈_{S1.sid=R1.sid} R1

Natural Join

A further special case of the join operation R ⋈ S is an equijoin in which equalities
are specified on all fields having the same name in R and S. In this case, we can
simply omit the join condition; the default is that the join condition is a collection of
equalities on all common fields. We call this special case a natural join, and it has the
nice property that the result is guaranteed not to have two fields with the same name.

The equijoin expression S1 ⋈_{S1.sid=R1.sid} R1 is actually a natural join and can simply
be denoted as S1 ⋈ R1, since the only common field is sid. If the two relations have no
attributes in common, S1 ⋈ R1 is simply the cross-product.
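
A natural join needs field names, so the sketch below carries a list of names next to each
set of tuples. This pairing and the function natural_join are assumptions of the example,
not notation from the text. The common fields are compared for equality and the copies
from the second relation are dropped, as in an equijoin.

```python
def natural_join(schema_r, r, schema_s, s):
    """Equijoin on all fields with the same name; returns (schema, tuples)."""
    common = [f for f in schema_r if f in schema_s]
    rest = [f for f in schema_s if f not in common]
    result = set()
    for tr in r:
        for ts in s:
            # With no common fields, all() is True and we get the cross-product.
            if all(tr[schema_r.index(f)] == ts[schema_s.index(f)] for f in common):
                result.add(tr + tuple(ts[schema_s.index(f)] for f in rest))
    return list(schema_r) + rest, result


SAILORS = ["sid", "sname", "rating", "age"]
RESERVES = ["sid", "bid", "day"]
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

schema, joined = natural_join(SAILORS, S1, RESERVES, R1)
```

On S1 and R1 this yields the two tuples of Figure 4.13 with a single sid field. Calling it
on two relations whose schemas share no name returns every pairing, in line with the
remark that the natural join then reduces to the cross-product.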

Division

The division operator is useful for expressing certain kinds of queries, for example:
“Find the names of sailors who have reserved all boats.” Understanding how to use
the basic operators of the algebra to define division is a useful exercise. However,
the division operator does not have the same importance as the other operators: it
is not needed as often, and database systems do not try to exploit the semantics of
division by implementing it as a distinct operator (as, for example, is done with the
join operator).

We discuss division through an example. Consider two relation instances A and B in
which A has (exactly) two fields x and y and B has just one field y, with the same
domain as in A. We define the division operation A/B as the set of all x values (in
the form of unary tuples) such that for every y value in (a tuple of) B, there is a tuple
〈x,y〉 in A.

Another way to understand division is as follows. For each x value in (the first column
of) A, consider the set of y values that appear in (the second field of) tuples of A with
that x value. If this set contains (all y values in) B, the x value is in the result of A/B.

An analogy with integer division may also help to understand division. For integers A
and B, A/B is the largest integer Q such that Q ∗ B ≤ A. For relation instances A
and B, A/B is the largest relation instance Q such that Q × B ⊆ A.

Division is illustrated in Figure 4.14. It helps to think of A as a relation listing the
parts supplied by suppliers, and of the B relations as listing parts. A/Bi computes
suppliers who supply all parts listed in relation instance Bi.

Expressing A/B in terms of the basic algebra operators is an interesting exercise, and
the reader should try to do this before reading further. The basic idea is to compute
all x values in A that are not disqualified. An x value is disqualified if by attaching a
y value from B, we obtain a tuple 〈x,y〉 that is not in A. We can compute disqualified
tuples using the algebra expression

   π_x((π_x(A) × B) − A)

Thus we can define A/B as

   π_x(A) − π_x((π_x(A) × B) − A)

To understand the division operation in full generality, we have to consider the case
when both x and y are replaced by a set of attributes.
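
The expression π_x(A) − π_x((π_x(A) × B) − A) can be followed step by step in code. The
supplier and part values below are hypothetical sample data chosen for this sketch (they
are not taken from Figure 4.14), and the function divide is our own name.

```python
def divide(a, b):
    """A/B for A with fields (x, y) and B with the single field y."""
    xs = {(x,) for x, _ in a}                           # π_x(A)
    candidates = {x + y for x in xs for y in b}         # π_x(A) × B
    disqualified = {(x,) for x, _ in candidates - a}    # π_x((π_x(A) × B) − A)
    return xs - disqualified


# Hypothetical data: A lists (supplier, part) pairs, B lists parts.
A = {("s1", "p1"), ("s1", "p2"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"),
     ("s3", "p2"),
     ("s4", "p2"), ("s4", "p4")}
B = {("p2",), ("p4",)}

quotient = divide(A, B)
```

Here s2 and s3 are disqualified because pairing them with p4 gives tuples that are not in
A, so the quotient holds only s1 and s4. Pairing every tuple of the quotient with every
tuple of B stays inside A, as the integer-division analogy suggests.
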
More Examples of Relational Algebra Queries
We illustrate queries using the instances S3 of Sailors, R2 of Reserves, and B1 of Boats,
shown in Figures 4.15, 4.16, and 4.17, respectively.

sid  sname    rating  age
22   Dustin   7       45.0
29   Brutus   1       33.0
31   Lubber   8       55.5
32   Andy     8       25.5
58   Rusty    10      35.0
64   Horatio  7       35.0
71   Zorba    10      16.0
74   Horatio  9       35.0
85   Art      3       25.5
95   Bob      3       63.5
Figure 4.15 An Instance S3 of Sailors

sid  bid  day
22   101  10/10/98
22   102  10/10/98
22   103  10/8/98
22   104  10/7/98
31   102  11/10/98
31   103  11/6/98
31   104  11/12/98
64   101  9/5/98
64   102  9/8/98
74   103  9/8/98
Figure 4.16 An Instance R2 of Reserves

bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red
Figure 4.17 An Instance B1 of Boats

(Q1) Find the names of sailors who have reserved boat 103.

This query can be written as follows:


   π_{sname}((σ_{bid=103} Reserves) ⋈ Sailors)

We first compute the set of tuples in Reserves with bid = 103 and then take the
natural join of this set with Sailors. This expression can be evaluated on instances
of Reserves and Sailors. Evaluated on the instances R2 and S3, it yields a relation
that contains just one field, called sname, and three tuples 〈Dustin〉, 〈Horatio〉, and
〈Lubber〉. (Observe that there are two sailors called Horatio, and only one of them has
reserved boat 103.)

We can break this query into smaller pieces using the renaming operator ρ:

   ρ(Temp1, σ_{bid=103} Reserves)
   ρ(Temp2, Temp1 ⋈ Sailors)
   π_{sname}(Temp2)

Temp1 denotes an intermediate relation that identifies reservations of boat 103.
Temp2 is another intermediate relation, and it denotes sailors who have made a
reservation in the set Temp1. The instances of these relations when evaluating this
query on the instances R2 and S3 are illustrated in Figures 4.18 and 4.19. Finally, we
extract the sname column from Temp2.

sid  bid  day
22   103  10/8/98
31   103  11/6/98
74   103  9/8/98
Figure 4.18 Instance of Temp1

sid  sname    rating  age   bid  day
22   Dustin   7       45.0  103  10/8/98
31   Lubber   8       55.5  103  11/6/98
74   Horatio  9       35.0  103  9/8/98
Figure 4.19 Instance of Temp2

Here is another way to write this query:

   π_{sname}(σ_{bid=103}(Reserves ⋈ Sailors))

In this version we first compute the natural join of Reserves and Sailors and then apply
the selection and the projection.

The DBMS translates an SQL query into (an extended form of) relational algebra, and
then looks for other algebra expressions that will produce the same answers but are
cheaper to evaluate. If the user’s query is first translated into the expression

   π_{sname}(σ_{bid=103}(Reserves ⋈ Sailors))

a good query optimizer will find the equivalent expression

   π_{sname}((σ_{bid=103} Reserves) ⋈ Sailors)
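
To make the equivalence of the two expressions for Q1 concrete, the sketch below evaluates
both on the instances R2 and S3 from Figures 4.15 and 4.16. The helper join_on_sid is our
own simplification of the natural join for these two relations, where sid is the only
common field.

```python
# Instance S3 of Sailors: (sid, sname, rating, age), from Figure 4.15.
S3 = {(22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0), (31, "Lubber", 8, 55.5),
      (32, "Andy", 8, 25.5), (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
      (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0), (85, "Art", 3, 25.5),
      (95, "Bob", 3, 63.5)}

# Instance R2 of Reserves: (sid, bid, day), from Figure 4.16.
R2 = {(22, 101, "10/10/98"), (22, 102, "10/10/98"), (22, 103, "10/8/98"),
      (22, 104, "10/7/98"), (31, 102, "11/10/98"), (31, 103, "11/6/98"),
      (31, 104, "11/12/98"), (64, 101, "9/5/98"), (64, 102, "9/8/98"),
      (74, 103, "9/8/98")}


def join_on_sid(reserves, sailors):
    """Natural join of Reserves and Sailors: (sid, bid, day, sname, rating, age)."""
    return {r + s[1:] for r in reserves for s in sailors if r[0] == s[0]}


# Selection first: only the reservations of boat 103 take part in the join.
boat_103 = {r for r in R2 if r[1] == 103}
selection_first = {(t[3],) for t in join_on_sid(boat_103, S3)}

# Join first: all of R2 is joined, and the selection is applied afterwards.
all_joined = join_on_sid(R2, S3)
join_first = {(t[3],) for t in all_joined if t[1] == 103}
```

Both orders give the same three names, but the first one joins 3 reservation tuples with
Sailors while the second joins all 10, which hints at why an optimizer would prefer to
apply the selection early.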

(Q2) Find the names of sailors who have reserved a red boat.

   π_{sname}((σ_{color='red'} Boats) ⋈ Reserves ⋈ Sailors)


This query involves a series of two joins. First we choose (tuples describing) red boats.
Then we join this set with Reserves (natural join, with equality specified on the bid
column) to identify reservations of red boats. Next we join the resulting intermediate
