Database Management Systems-4
Due to these considerations, we use the positional notation to formally define relational
algebra and calculus. We also introduce simple conventions that allow intermediate
relations to ‘inherit’ field names, for convenience.
The key fields are underlined, and the domain of each field is listed after the
field name. Thus sid is the key for Sailors, bid is the key for Boats, and all three
fields together form the key for Reserves. Fields in an instance of one of these relations
will be referred to by name, or positionally, using the order in which they are listed above.
In several examples illustrating the relational algebra operators, we will use the
instances S1 and S2 (of Sailors) and R1 (of Reserves) shown in Figures 4.1, 4.2, and 4.3,
respectively.
Relational algebra includes operators to select rows from a relation (σ) and to project
columns (π). These operations allow us to manipulate data in a single relation. Consider
the instance of the Sailors relation shown in Figure 4.2, denoted as S2. We can
retrieve rows corresponding to expert sailors by using the σ operator. The expression
σrating>8(S2)
evaluates to the relation shown in Figure 4.4. The subscript rating>8 specifies the
selection criterion to be applied while retrieving tuples.
sid sname rating age
28 Yuppy 9 35.0
58 Rusty 10 35.0
Figure 4.4 σrating>8(S2)
sname rating
Yuppy 9
Lubber 8
Guppy 5
Rusty 10
Figure 4.5 πsname,rating(S2)
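Suppose that we wanted to find out only the ages of sailors. The expression
πage(S2)
evaluates to a relation with the single field age. Although more than one sailor in S2
has age 35.0 (see Figure 4.4),
a single tuple with age=35.0 appears in the result of the projection. This follows from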
the definition of a relation as a set of tuples. In
practice, real systems often omit the expensive step of eliminating duplicate tuples,
leading to relations that are multisets. However, our discussion of relational algebra
and calculus assumes that duplicate elimination is always done so that relations are
always sets of tuples.
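We can compute the names and ratings of highly rated sailors by combining two of the
preceding queries. The expression
πsname,rating(σrating>8(S2))
first selects the tuples of S2 with rating > 8 and then projects the sname and rating
fields, which yields the tuples 〈Yuppy, 9〉 and 〈Rusty, 10〉.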
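The selection, projection, and their composition can be sketched in Python, with a relation modeled as a set of tuples so that duplicates are eliminated automatically. This is a minimal illustration, not how a DBMS evaluates these operators. The helper names `select` and `project` are hypothetical, and the sid and age values for Lubber and Guppy are assumed for illustration because this section shows only their names and ratings.

```python
from typing import NamedTuple


class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float


# Instance S2. The rows for sids 28 and 58 appear in Figure 4.4; the sid and
# age values for Lubber and Guppy are assumptions made for this sketch.
S2 = {
    Sailor(28, "Yuppy", 9, 35.0),
    Sailor(31, "Lubber", 8, 55.5),
    Sailor(44, "Guppy", 5, 35.0),
    Sailor(58, "Rusty", 10, 35.0),
}


def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, *fields):
    """Projection (pi): keep the named fields; the set drops duplicates."""
    return {tuple(getattr(t, f) for f in fields) for t in relation}


experts = select(S2, lambda t: t.rating > 8)             # sigma rating>8 (S2)
ages = project(S2, "age")                                # pi age (S2)
names_and_ratings = project(experts, "sname", "rating")  # pi(sigma(S2))
```

Because `project` builds a set, the repeated age 35.0 collapses into a single tuple, which mirrors the set semantics assumed in the text.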
Set Operations
The following standard operations on sets are also available in relational algebra: union (∪),
intersection (∩), set-difference (−), and cross-product (×).
Figure 4.8 S1 ∪ S2
For example, the expression ρ(C(1 → sid1, 5 → sid2), S1 × R1) returns a relation
that contains the tuples shown in Figure 4.11 and has the following schema: C(sid1:
integer, sname: string, rating: integer, age: real, sid2: integer, bid:
integer, day: dates).
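The effect of the cross-product followed by positional renaming can be sketched as below. The helper names are hypothetical. The sample tuples are the ones that can be read off Figure 4.13; the actual instances S1 and R1 may contain further rows.

```python
from itertools import product

S1_FIELDS = ["sid", "sname", "rating", "age"]
R1_FIELDS = ["sid", "bid", "day"]

# Partial sample instances (assumption: only the rows recoverable from
# Figure 4.13 are used here).
S1 = {(22, "Dustin", 7, 45.0), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}


def cross_product(r, s):
    """Pair every tuple of r with every tuple of s."""
    return {a + b for a, b in product(r, s)}


def rename_positions(fields, renames):
    """Rename fields by 1-based position, as in rho(C(1 -> sid1, 5 -> sid2), ...)."""
    out = list(fields)
    for position, new_name in renames.items():
        out[position - 1] = new_name
    return out


C = cross_product(S1, R1)
C_FIELDS = rename_positions(S1_FIELDS + R1_FIELDS, {1: "sid1", 5: "sid2"})
```

Positions 1 and 5 are the two sid fields, so after renaming every field of C has a distinct name.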
It is customary to include some additional operators in the algebra, but they can all be
defined in terms of the operators that we have defined thus far. (In fact, the renaming
operator is only needed for syntactic convenience, and even the ∩ operator is redundant; R
∩ S can be defined as R − (R − S).) We will consider these additional operators, and
their definition in terms of the basic operators, in the next two subsections.
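The claim that ∩ is redundant can be checked on a small example. The two unary relations below are hypothetical and chosen only for illustration.

```python
# Two union-compatible relations with a single field (hypothetical values).
R = {(22,), (31,), (58,)}
S = {(28,), (31,), (58,)}

only_in_r = R - S                        # tuples of R that are not in S
intersection_via_difference = R - only_in_r

print(intersection_via_difference == (R & S))
```

Removing from R everything that is not in S leaves exactly the tuples common to both relations, so set-difference alone suffices.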
The join operation is one of the most useful operations in relational algebra and is
the most commonly used way to combine information from two or more relations.
Although a join can be defined as a cross-product followed by selections and projections,
joins arise much more frequently in practice than plain cross-products.
For this reason, joins have received a lot of attention, and there are several variants of
the join operation.
Condition Joins
The most general version of the join operation accepts a join condition c and a pair of
relation instances as arguments, and returns a relation instance. The join condition is
identical to a selection condition in form. The operation is defined as follows:
R ⊲⊳c S = σc(R × S)
The condition c can refer to attributes of both R and S. A reference to an
attribute of a relation, say R, can be by position (of the form R.i) or by name (of the
form R.name). As an example, the result of S1 ⊲⊳S1.sid<R1.sid R1 is shown in Figure 4.12.
Because sid appears in both S1 and R1, the corresponding fields in the result of the
cross-product S1 × R1 (and therefore in the result of S1 ⊲⊳S1.sid<R1.sid R1) are
unnamed. Domains are inherited from the corresponding fields of S1 and R1.
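A condition join can be sketched directly from its definition as a selection over the cross-product. The helper name `condition_join` is hypothetical, fields are referenced by position, and the sample tuples are the ones recoverable from Figure 4.13, so the output need not match Figure 4.12 row for row.

```python
from itertools import product

# Partial sample instances (assumption: the real S1 and R1 may hold more rows).
S1 = {(22, "Dustin", 7, 45.0), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}


def condition_join(r, s, condition):
    """Return sigma_c(r x s): concatenated pairs that satisfy the condition."""
    return {a + b for a, b in product(r, s) if condition(a, b)}


# S1 join R1 on S1.sid < R1.sid; sid is field 1 of both relations (index 0).
result = condition_join(S1, R1, lambda s, r: s[0] < r[0])
```

The sketch tests the condition while enumerating pairs rather than building the whole cross-product first; the result is the same as σc(R × S).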
Equijoin
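An equijoin is a join whose condition consists solely of equalities between fields of
the two relations; because the equated fields carry the same values, only one copy of
each is kept in the result.
We illustrate S1 ⊲⊳R.sid=S.sid R1 in Figure 4.13. Notice that only one field called sid
appears in the result.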
sid sname rating age bid day
22 Dustin 7 45.0 101 10/10/96
58 Rusty 10 35.0 103 11/12/96
Figure 4.13 S1 ⊲⊳ R.sid=S.sid R1
Natural Join
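A natural join is an equijoin in which equalities are specified on all fields that have
the same name in the two relations, so the join condition can be omitted.
The equijoin expression S1 ⊲⊳R.sid=S.sid R1 is actually a natural join and can simply be
denoted as S1 ⊲⊳ R1, since the only common field is sid. If the two relations have no
attributes in common, S1 ⊲⊳ R1 is simply the cross-product.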
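A natural join can be sketched by matching on whichever field names the two schemas share. The helper name `natural_join` and the representation of a relation as a field list plus a set of tuples are assumptions of this sketch; the sample rows are those recoverable from Figure 4.13.

```python
def natural_join(r_fields, r, s_fields, s):
    """Join on all common field names, keeping one copy of each common field."""
    common = [f for f in r_fields if f in s_fields]
    s_extra = [f for f in s_fields if f not in common]
    out_fields = list(r_fields) + s_extra
    rows = set()
    for a in r:
        ra = dict(zip(r_fields, a))
        for b in s:
            sb = dict(zip(s_fields, b))
            # With no common fields, all([]) is True: every pair qualifies.
            if all(ra[f] == sb[f] for f in common):
                rows.add(a + tuple(sb[f] for f in s_extra))
    return out_fields, rows


S1_FIELDS = ["sid", "sname", "rating", "age"]
R1_FIELDS = ["sid", "bid", "day"]
S1 = {(22, "Dustin", 7, 45.0), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

fields, rows = natural_join(S1_FIELDS, S1, R1_FIELDS, R1)
```

When the schemas share no field name, the list of common fields is empty and the function degenerates to the cross-product, as stated in the text.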
Division
The division operator is useful for expressing certain kinds of queries, for example:
“Find the names of sailors who have reserved all boats.” Understanding how to use
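the basic operators of the algebra to define division is a useful exercise.
Consider two relation instances A and B in which A has two fields x and y and B has a
single field y, with the same domain as in A. The division A/B is the set of all x
values such that, for every y value in B, there is a tuple
〈x,y〉 in A.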
Another way to understand division is as follows. For each x value in (the first column
of) A, consider the set of y values that appear in (the second field of) tuples of A with
that x value. If this set contains (all y values in) B, the x value is in the result of A/B.
An analogy with integer division may also help to understand division. For integers A
and B, A/B is the largest integer Q such that Q ∗ B ≤ A. For relation instances A
and B, A/B is the largest relation instance Q such that Q × B ⊆ A.
Expressing A/B in terms of the basic algebra operators is an interesting exercise, and
the reader should try to do this before reading further. The basic idea is to compute
all x values in A that are not disqualified. An x value is disqualified if by attaching a
y value from B, we obtain a tuple 〈x,y〉 that is not in A. We can compute disqualified tuples
using the algebra expression
πx((πx(A) × B) − A)
Thus we can define A/B as
πx(A) − πx((πx(A) × B) − A)
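The definition of A/B in terms of the basic operators translates almost line by line into set operations. The sketch below assumes A is a set of 〈x,y〉 pairs and B a set of unary 〈y〉 tuples; the function name `divide` and the sample data are hypothetical.

```python
def divide(a, b):
    """Compute A/B as pi_x(A) - pi_x((pi_x(A) x B) - A)."""
    xs = {(x,) for x, _ in a}                        # pi_x(A)
    candidates = {x + y for x in xs for y in b}      # pi_x(A) x B
    disqualified = {(x,) for x, _ in candidates - a} # pi_x(candidates - A)
    return xs - disqualified


# Hypothetical data: which supplier (x) supplies which part (y).
A = {
    ("s1", "p1"), ("s1", "p2"),
    ("s2", "p1"),
    ("s3", "p2"),
    ("s4", "p1"), ("s4", "p2"), ("s4", "p3"),
}
B = {("p1",), ("p2",)}

quotient = divide(A, B)
```

Here s2 and s3 are disqualified because pairing them with some y value from B produces a tuple that is missing from A, while s1 and s4 are paired with every y value in B.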
To understand the division operation in full generality, we have to consider the case
when both x and y are replaced by a set of attributes.
More Examples of Relational Algebra Queries
We illustrate queries using the instances S3 of Sailors, R2 of Reserves, and B1 of Boats,
shown in Figures 4.15, 4.16, and 4.17, respectively.
(Q1) Find the names of sailors who have reserved boat 103.
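This query can be written as πsname((σbid=103Reserves) ⊲⊳ Sailors). Evaluated on the
instances R2 and S3, it yields a relation
that contains just one field, called sname, and three tuples, including 〈Dustin〉 and 〈Horatio〉.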
We can break this query into smaller pieces using the renaming operator ρ:
ρ(Temp1, σbid=103Reserves)
ρ(Temp2, Temp1 ⊲⊳ Sailors)
πsname(Temp2)
Temp1 denotes an intermediate relation that identifies reservations of boat 103.
Temp2 is another intermediate relation, and it denotes sailors who have made a
reservation in the set Temp1. The instances of these relations when evaluating this
query on the instances R2 and S3 are illustrated in Figures 4.18 and 4.19. Finally, we
extract the sname column from Temp2.
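The same query can also be written by first computing the natural join of Reserves and
Sailors and then applying the selection and the projection:
πsname(σbid=103(Reserves ⊲⊳ Sailors))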
The DBMS translates an SQL query into (an extended form of) relational algebra, and
then looks for other algebra expressions that will produce the same answers but are
cheaper to evaluate. If the user’s query is first translated into the expression
πsname(σbid=103(Reserves⊲⊳Sailors))
a good query optimizer will find the equivalent expression
πsname((σbid=103Reserves) ⊲⊳ Sailors)
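The equivalence of the two expressions, and the reason the second is cheaper, can be seen on a toy example. The instances below are hypothetical and are not the instances R2 and S3 of Figures 4.15 and 4.16. The function names are also assumptions of this sketch, and the size of the join result is used only as a rough stand-in for evaluation cost, not as a model of how a real optimizer estimates cost.

```python
# Hypothetical instances: Sailors(sid, sname, rating, age), Reserves(sid, bid, day).
SAILORS = {
    (22, "Dustin", 7, 45.0),
    (31, "Lubber", 8, 55.5),
    (58, "Rusty", 10, 35.0),
}
RESERVES = {
    (22, 101, "10/10/96"),
    (22, 103, "10/8/96"),
    (31, 102, "11/10/96"),
    (58, 103, "11/12/96"),
}


def join_on_sid(reserves, sailors):
    """Natural join on sid; result fields: sid, bid, day, sname, rating, age."""
    return {r + s[1:] for r in reserves for s in sailors if r[0] == s[0]}


def plan_join_first(reserves, sailors):
    """pi_sname(sigma_bid=103(Reserves join Sailors))."""
    joined = join_on_sid(reserves, sailors)
    selected = {t for t in joined if t[1] == 103}
    return {(t[3],) for t in selected}, len(joined)


def plan_select_first(reserves, sailors):
    """pi_sname((sigma_bid=103 Reserves) join Sailors)."""
    selected = {r for r in reserves if r[1] == 103}
    joined = join_on_sid(selected, sailors)
    return {(t[3],) for t in joined}, len(joined)


names_a, join_size_a = plan_join_first(RESERVES, SAILORS)
names_b, join_size_b = plan_select_first(RESERVES, SAILORS)
```

Both plans return the same names, but applying the selection first means the join only handles the reservations for boat 103, so its result is smaller.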
(Q2) Find the names of sailors who have reserved a red boat.