Chapter - 8-dbms
Chapter 3 Relational Algebra and Calculus
• the union of two relations r1(X) and r2(X), defined on the same set of attributes X, is expressed as
$r_1 \cup r_2$ and is also a relation on X containing the tuples that belong to r1 or to r2, or to both;
• the difference of r1(X) and r2(X) is expressed as $r_1 - r_2$ and is a relation on X containing the
tuples that belong to r1 and not to r2;
• the intersection of r1(X) and r2(X) is expressed as $r_1 \cap r_2$ and is a relation on X containing the
tuples that belong to both r1 and r2.

PATERNITY                 MATERNITY
Father    Child           Mother   Child
Adam      Cain            Eve      Cain
Adam      Abel            Eve      Seth
Abraham   Isaac           Sarah    Isaac
Abraham   Ishmael         Hagar    Ishmael

$\text{PATERNITY} \cup \text{MATERNITY}$ ??
Figure 3.2 A meaningful but incorrect union

3.1.2 Renaming
The limitations we have had to impose on the standard set operators, although justified, seem
particularly restrictive. For instance, consider the two relations in Figure 3.2. It would be meaningful to
execute a sort of union on them in order to obtain all the 'parent-child' pairs held in the database, but
that is not possible, because the attribute that we have instinctively called Parent is in fact called
Father in one relation and Mother in the other.

PATERNITY                 $\rho_{\text{Parent} \leftarrow \text{Father}}(\text{PATERNITY})$
Father    Child           Parent    Child
Adam      Cain            Adam      Cain
Adam      Abel            Adam      Abel
Abraham   Isaac           Abraham   Isaac
Isaac     Jacob           Isaac     Jacob
Figure 3.3 A renaming.
To resolve the problem, we introduce a specific operator, whose sole purpose is to adapt attribute
names, as necessary, to facilitate the application of set operators. The operator is called renaming,
because it actually changes the names of the attributes, leaving the contents of the relations unchanged.
An example of renaming is shown in Figure 3.3; the operator changes the name of the attribute Father
to Parent, as indicated by the notation Parent ← Father given in subscript of the symbol ρ, which
denotes the renaming; looking at the table it is easy to see how only the heading changes, leaving the
main body unaltered.
Figure 3.4 shows the application of the union to the results of two different renamings of the relations in
Figure 3.2.
Let us define the renaming operator in general terms. Let r be a relation defined on the set of
attributes X and let Y be another set of attributes with the same cardinality. Furthermore, let
A1 A2 ... Ak and B1 B2 ... Bk be respectively an ordering of the attributes in X and an ordering of
those in Y. Then the renaming
$$\rho_{B_1 B_2 \ldots B_k \leftarrow A_1 A_2 \ldots A_k}(r)$$
contains a tuple t' for each tuple t in r, defined as follows: t' is a tuple on Y and t'(Bi) = t(Ai) for
i = 1, ..., k. The definition confirms that the changes that occur are changes to the names of the
attributes, while the values remain unaltered and are associated with new attributes. In practice, in the
two lists A1 A2 ... Ak and B1 B2 ... Bk we indicate only those attributes that are renamed (that is, those
for which $A_i \ne B_i$). This is the reason why in Figure 3.3 we have written
$$\rho_{\text{Parent} \leftarrow \text{Father}}(\text{PATERNITY})$$
and not
$$\rho_{\text{Parent},\text{Child} \leftarrow \text{Father},\text{Child}}(\text{PATERNITY})$$

$\rho_{\text{Parent} \leftarrow \text{Father}}(\text{PATERNITY}) \cup \rho_{\text{Parent} \leftarrow \text{Mother}}(\text{MATERNITY})$
Parent    Child
Adam      Cain
Adam      Abel
Abraham   Isaac
Abraham   Ishmael
Eve       Cain
Eve       Seth
Sarah     Isaac
Hagar     Ishmael
Figure 3.4 A union preceded by two renamings
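The renaming followed by a union can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a relation is modelled as a set of tuples, each tuple being a frozenset of (attribute, value) pairs, and the helper names `row` and `rename` are hypothetical, not part of any library.

```python
def row(**attrs):
    """Build one tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def rename(rel, mapping):
    """Renaming: mapping is {old_name: new_name}; the values are left untouched."""
    return {frozenset((mapping.get(a, a), v) for a, v in t) for t in rel}

paternity = {
    row(Father="Adam", Child="Cain"),
    row(Father="Adam", Child="Abel"),
    row(Father="Abraham", Child="Isaac"),
    row(Father="Abraham", Child="Ishmael"),
}
maternity = {
    row(Mother="Eve", Child="Cain"),
    row(Mother="Eve", Child="Seth"),
    row(Mother="Sarah", Child="Isaac"),
    row(Mother="Hagar", Child="Ishmael"),
}

# The union is meaningful only once both operands are defined on {Parent, Child}.
parents = rename(paternity, {"Father": "Parent"}) | rename(maternity, {"Mother": "Parent"})
```

Only the attributes that actually change appear in the mapping, mirroring the convention of listing only the renamed attributes in the subscript.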
Figure 3.5 shows another example of union preceded by renaming. In this case, in each relation there
are two attributes that are renamed and therefore the ordering of the pairs (Branch, Salary and so on)
is significant.

EMPLOYEES                        STAFF
Surname    Branch   Salary       Surname   Factory   Wages
Patterson  Rome     45           Cooke     Chicago   33
Trumble    London   53           Bush      Monza     32

$\rho_{\text{Location},\text{Pay} \leftarrow \text{Branch},\text{Salary}}(\text{EMPLOYEES}) \cup \rho_{\text{Location},\text{Pay} \leftarrow \text{Factory},\text{Wages}}(\text{STAFF})$
Surname    Location  Pay
Patterson  Rome      45
Trumble    London    53
Cooke      Chicago   33
Bush       Monza     32
Figure 3.5 Another union preceded by renaming

3.1.3 Selection
We now turn our attention to the specific operators of relational algebra that allow the manipulation
of relations. There are three operators, selection, projection and join (the last having several variants).
Before going into detail, note that selection and projection carry out functions that could be defined as
complementary (or orthogonal). They are both unary (that is, they have one relation as argument) and
produce as result a portion of that relation. More precisely, a selection produces a subset of tuples on all
the attributes, while a projection gives a result to which all the tuples contribute, but on a subset of
attributes. As illustrated in Figure 3.6, we can say that selection generates 'horizontal
decompositions' and projection generates 'vertical decompositions'.

Figure 3.6 Selection and projection are orthogonal operators

Figure 3.7 and Figure 3.8 show two examples of selection, which illustrate the fundamental
characteristics of the operator, denoted by the symbol σ, with the appropriate 'selection condition'
indicated as subscript. The result contains the tuples of the operand that satisfy the condition. As
shown in the examples, the selection conditions can allow both for comparisons between attributes
and for comparisons between attributes and constants, and can be complex, being obtained by
combining simple conditions with the logical connectives ∨ (or), ∧ (and) and ¬ (not).
More precisely, given a relation r(X), a propositional formula F on X is a formula obtained by
combining atomic conditions of the type $A \vartheta B$ or $A \vartheta c$ with the connectives
$\lor$, $\land$ and $\lnot$, where:
• $\vartheta$ is a comparison operator ($=$, $\ne$, $>$, $<$, $\ge$, $\le$);
• A and B are attributes in X that are compatible (that is, the comparison $\vartheta$ is meaningful on the
values of their domains);

EMPLOYEES
Surname  FirstName  Age  Salary
Smith    Mary       25   2000
Black    Lucy       40   3000
Verdi    Nico       36   4500
Smith    Mark       40   3900

$\sigma_{\text{Age}<30 \,\lor\, \text{Salary}>4000}(\text{EMPLOYEES})$
Surname  FirstName  Age  Salary
Smith    Mary       25   2000
Verdi    Nico       36   4500
Figure 3.7 A selection.

CITIZENS        $\sigma_{\text{PlaceOfBirth}=\text{Residence}}(\text{CITIZENS})$

$F_1 \lor F_2$, $F_1 \land F_2$ and $\lnot F_1$ have the usual meaning.
At this point we can complete the definition:
the selection $\sigma_F(r)$ produces a relation on the same attributes as r that contains the tuples of r for
which F is true.
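As an illustration, selection can be sketched in Python by modelling the formula F as a predicate over one tuple. The representation (a relation as a set of frozensets of attribute/value pairs) and the names `row` and `select` are assumptions made for this sketch. The condition used, Age < 30 or Salary > 4000, is likewise an assumption, chosen because it reproduces the result table of Figure 3.7.

```python
def row(**attrs):
    """Build one tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def select(rel, formula):
    """Selection: keep the tuples of rel on which the formula evaluates to True."""
    return {t for t in rel if formula(dict(t))}

employees = {
    row(Surname="Smith", FirstName="Mary", Age=25, Salary=2000),
    row(Surname="Black", FirstName="Lucy", Age=40, Salary=3000),
    row(Surname="Verdi", FirstName="Nico", Age=36, Salary=4500),
    row(Surname="Smith", FirstName="Mark", Age=40, Salary=3900),
}

# Two atomic comparisons with constants, combined by the connective 'or'.
result = select(employees, lambda t: t["Age"] < 30 or t["Salary"] > 4000)
```

The result is defined on the same attributes as the operand; only the number of tuples can shrink.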
3.1.4 Projection
The definition of the projection operator is also simple: given a relation r(X) and a subset Y of X, the
projection of r on Y (indicated by $\pi_Y(r)$) is the set of tuples on Y obtained from the tuples of r
considering only the values on Y:
$$\pi_Y(r) = \{\, t(Y) \mid t \in r \,\}$$

EMPLOYEES
Surname  FirstName  Department  Head
Smith    Mary       Sales       De Rossi
Black    Lucy       Sales       De Rossi
Verdi    Mary       Personnel   Fox
Smith    Mark       Personnel   Fox

$\pi_{\text{Surname},\text{FirstName}}(\text{EMPLOYEES})$
Surname  FirstName
Smith    Mary
Black    Lucy
Verdi    Mary
Smith    Mark
Figure 3.9 A projection
Figure 3.9 shows a first example of projection, which clearly illustrates the concept mentioned above.
The projection allows the vertical decomposition of relations: the result of the projection contains in
this case as many tuples as its operand, defined however only on some of the attributes.
Figure 3.10 shows another projection, in which we note a different situation. The result contains fewer
tuples than the operand, because all the tuples in the operand that have equal values on all the attributes
of the projection give the same contribution to the projection itself. As relations are defined as sets,
they are not allowed to have tuples with the same values: equal contributions 'collapse' into a single
tuple.
In general, we can say that the result of a projection contains at most as many tuples as the operand,
but can contain fewer, as shown in Figure 3.10. Note also that there exists a link between the key
constraints and the projections: $\pi_Y(r)$ contains the same number of tuples as r if and only if Y is a
superkey for r. In fact:
• if Y is a superkey, then r does not contain pairs of tuples that are equal on Y and thus each tuple
makes a different contribution to the projection;
• if the projection has as many tuples as the operand, then each tuple of r contributes to the
projection with different values, and thus r does not contain pairs of tuples equal on Y, but this
is exactly the definition of a superkey.

EMPLOYEES
Surname  FirstName  Department  Head
Smith    Mary       Sales       De Rossi
Black    Lucy       Sales       De Rossi
Verdi    Mary       Personnel   Fox
Smith    Mark       Personnel   Fox

$\pi_{\text{Department},\text{Head}}(\text{EMPLOYEES})$
Department  Head
Sales       De Rossi
Personnel   Fox
Figure 3.10 A projection with fewer tuples than operands
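The link between projections and superkeys can be checked mechanically. The following Python sketch assumes the same set-of-tuples representation as the figures suggest; `row`, `project` and `is_superkey_for` are hypothetical helper names introduced for illustration.

```python
def row(**attrs):
    """Build one tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def project(rel, attrs):
    """Projection: restrict every tuple to attrs; equal contributions collapse in the set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def is_superkey_for(rel, attrs):
    """attrs is a superkey for this particular relation iff the projection loses no tuples."""
    return len(project(rel, attrs)) == len(rel)

employees = {
    row(Surname="Smith", FirstName="Mary", Department="Sales", Head="De Rossi"),
    row(Surname="Black", FirstName="Lucy", Department="Sales", Head="De Rossi"),
    row(Surname="Verdi", FirstName="Mary", Department="Personnel", Head="Fox"),
    row(Surname="Smith", FirstName="Mark", Department="Personnel", Head="Fox"),
}

names = project(employees, {"Surname", "FirstName"})        # as in Figure 3.9
departments = project(employees, {"Department", "Head"})    # as in Figure 3.10
```

Note that `is_superkey_for` tests one relation instance only; it says nothing about whether the attributes are declared as a superkey of the schema.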
For the relation EMPLOYEES in Figure 3.9 and Figure 3.10, the attributes Surname and FirstName
form a key (and thus a superkey), while Department and Head do not form a superkey. Incidentally,
note that a projection can produce a number of tuples equal to those of the operand even if the attributes
involved are not defined as superkeys (of the schema) but happen to be a superkey for the specific
relation. For example, if we reconsider the relations discussed in Chapter 2 on the schema
STUDENTS(RegNum, Surname, FirstName, BirthDate, DegreeProg)
we can say that for all the relations, the projection on RegNum and that on Surname, FirstName and
BirthDate have the same number of tuples as the operand. Conversely, a projection on Surname and
DegreeProg can have fewer tuples; however in the particular case (as in the example in Figure 2.16) in
which there are no students with the same surname enrolled on the same degree program, then the
projection on Surname and DegreeProg also has the same number of tuples as the operand.
3.1.5 Join
Let us now examine the join operator, which is the most important one in relational algebra. The join
allows us to establish connections among data contained in different relations, comparing the values
contained in them and thus using the fundamental characteristic of the model, that of being
value-based. There are two main versions of the operator, which are, however, obtainable one from the
other. The first is useful for an introduction and the second is perhaps more relevant from a practical
point of view.
Natural join The natural join, denoted by the symbol ⋈, is an operator that correlates data in
different relations, on the basis of equal values of attributes with the same name. (The join is defined
here with two operands, but can be generalized.) Figure 3.11 shows an example. The result of the join is
a relation on the union of the sets of attributes of the operands: in the figure, the result is defined on
Employee, Department, Head, that is on the union of {Employee, Department} and {Department, Head}.
The tuples in the join are obtained by combining the tuples of the operands with equal values on the
common attributes, in the example the attribute Department: for instance, the first tuple of the join is
derived from the combination of the first tuple of the relation r1 and the second tuple of r2: in fact they
both have sales as the value for Department.

r1                           r2
Employee  Department         Department  Head
Smith     sales              production  Mori
Black     production         sales       Brown
Bianchi   production

$r_1 \bowtie r_2$
Employee  Department  Head
Smith     sales       Brown
Black     production  Mori
Bianchi   production  Mori
Figure 3.11 A natural join

In general, we say that the natural join $r_1 \bowtie r_2$ of r1(X1) and r2(X2) is a relation defined on X1X2
(that is, on the union of the sets X1 and X2), as follows:
$$r_1 \bowtie r_2 = \{\, t \text{ on } X_1X_2 \mid \text{there exist } t_1 \in r_1 \text{ and } t_2 \in r_2 \text{ with } t(X_1) = t_1 \text{ and } t(X_2) = t_2 \,\}$$
More concisely, we could have written:
$$r_1 \bowtie r_2 = \{\, t \text{ on } X_1X_2 \mid t(X_1) \in r_1 \text{ and } t(X_2) \in r_2 \,\}$$
$\text{OFFENCES} \bowtie \text{CARS}$
Code    Date      Officer  Department  Registration  Owner            Address
143256  25/10/92  567      75          5694 FR       Latour Hortense  Avenue Foch
967554  26/10/92  456      75          5694 FR       Latour Hortense  Avenue Foch
987557  26/10/92  456      75          6544 XY       Cordon Edouard   Rue du Pont
630876  15/10/92  456      47          6544 XY       Mimault Bernard  Avenue FDR
539856  12/10/92  567      47          6544 XY       Mimault Bernard  Avenue FDR
Figure 3.12 The relations OFFENCES and CARS (from Figure 2.19) and their join
Registration form a key for CARS; (ii) at least one because of the referential constraint between
Department and Registration in OFFENCES and the (primary) key of CARS. The join, therefore, has
exactly as many tuples as the relation OFFENCES.
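The natural join itself can be sketched as a nested loop that combines tuples agreeing on the common attributes. The Python below is a minimal sketch rather than an efficient implementation; it uses the small relations r1 and r2 of Figure 3.11, and the helper names `row` and `natural_join` are assumptions of this example.

```python
def row(**attrs):
    """Build one tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all attributes they have in common."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

r1 = {
    row(Employee="Smith", Department="sales"),
    row(Employee="Black", Department="production"),
    row(Employee="Bianchi", Department="production"),
}
r2 = {
    row(Department="production", Head="Mori"),
    row(Department="sales", Head="Brown"),
}

joined = natural_join(r1, r2)
```

Here Department is a key for r2 and every Department value of r1 appears in r2, so the join has exactly as many tuples as r1, the same situation described for OFFENCES and CARS.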
Figure 3.13 shows another example of join, using the same relations as we have already used (Figure
3.4) to demonstrate a union preceded by renamings. Here, the data of the two relations is combined
according to the value of the child, returning the parents for each person for whom both are indicated in the
database.
The two examples, taken together, show how the various relational algebra operators allow different
ways of combining and correlating the data contained in a database, according to the various
requirements.

PATERNITY                 MATERNITY
Father    Child           Mother   Child
Adam      Cain            Eve      Cain
Adam      Abel            Eve      Seth
Abraham   Isaac           Sarah    Isaac
Abraham   Ishmael         Hagar    Ishmael

$\text{PATERNITY} \bowtie \text{MATERNITY}$
Father    Child     Mother
Adam      Cain      Eve
Abraham   Isaac     Sarah
Abraham   Ishmael   Hagar
Figure 3.13 Offspring with both parents.

Complete and incomplete joins Let us look at some different examples of join, in order to highlight
some important points. In the example in Figure 3.11, we can say that each tuple of each of the
operands contributes to at least one tuple of the result. In this case, the join is said to be complete. For
each tuple t1 of r1, there is a tuple t in $r_1 \bowtie r_2$ such that t(X1) = t1 (and similarly for r2). This
property does not hold in general, because it requires a correspondence between the tuples of the two
relations. Figure 3.14 shows a join in which some tuples in the operands (in particular, the first of r1
and the second of r2) do not contribute to the result. This is because these tuples have no counterpart
(that is, a tuple with the same value on the common attribute Department) in the other relation. These
tuples are referred to as dangling tuples.
There is even the possibility, as an extreme case, that none of the tuples of the operands can be
combined, and this gives rise to an empty result (see the example in Figure 3.15).
In the extreme opposite situation, each tuple of each operand can be combined with all the tuples of
the other, as shown in Figure 3.16. In this case, the result contains a number of tuples equal to the
product of the cardinalities of the operands and thus $|r_1| \times |r_2|$ tuples (where |r| indicates the
cardinality of the relation r).

r1                           r2
Employee  Department         Department  Head
Smith     sales              production  Mori
Black     production         purchasing  Brown
White     production

$r_1 \bowtie r_2$
Employee  Department  Head
Black     production  Mori
White     production  Mori
Figure 3.14 A join with 'dangling' tuples

r1                           r2
Employee  Department         Department  Head
Smith     sales              marketing   Mori
Black     production         purchasing  Brown
White     production

$r_1 \bowtie r_2$
Employee  Department  Head
Figure 3.15 An empty join.

To summarize, we can say that the join of r1 and r2 contains a number of tuples between zero and
$|r_1| \times |r_2|$. Furthermore:
• if the join of r1 and r2 is complete, then it contains a number of tuples at least equal to the
maximum of |r1| and |r2|;
• if $X_1 \cap X_2$ contains a key for r2, then the join of r1(X1) and r2(X2) contains at most |r1| tuples;
• if $X_1 \cap X_2$ is the primary key for r2 and there is a referential constraint between $X_1 \cap X_2$ in r1
and such a key of r2, then the join of r1(X1) and r2(X2) contains exactly |r1| tuples.

r1                     r2
Employee  Project      Project  Head
Smith     A            A        Mori
Black     A            A        Brown
White     A

$r_1 \bowtie r_2$
Employee  Project  Head
Smith     A        Mori
Black     A        Mori
White     A        Mori
Smith     A        Brown
Black     A        Brown
White     A        Brown
Figure 3.16 A join with $|r_1| \times |r_2|$ tuples.

Outer joins. The fact that the join operator 'leaves out' the tuples of a relation that have no
counterpart in the other operand is useful in some cases but inconvenient in others, given the
possibility of omitting important information. Take, for example, the join in Figure 3.14. Suppose we
are interested in all the employees, along with their respective heads, if known. The natural join would
not help in producing this result. For this purpose, a variant of the operator called outer join was
proposed (and adopted in the last version of SQL, as discussed in Chapter 4). This allows for the
possibility that all the tuples contribute to the result, extended with null values where there is no
counterpart. There are three variants of this operator: the left outer join, which extends only the tuples
of the first operand, the right outer join, which extends those of the second operand and the full outer
join, which extends all tuples. In Figure 3.17 we demonstrate examples of outer joins on the relations
already seen in Figure 3.14. The syntax is self-explanatory.
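The three outer join variants differ only in which dangling tuples are kept and padded with nulls. The sketch below illustrates this in Python on the relations of Figure 3.14. It assumes that the attribute sets of the operands are passed explicitly and that NULL is modelled by Python's `None`; the function and parameter names are hypothetical.

```python
def row(**attrs):
    """Build one tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def outer_join(r1, r2, attrs1, attrs2, keep_left=False, keep_right=False):
    """Natural join that can also keep dangling tuples, padded with None (NULL)."""
    common = attrs1 & attrs2
    result, matched1, matched2 = set(), set(), set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in common):
                result.add(frozenset({**d1, **d2}.items()))
                matched1.add(t1)
                matched2.add(t2)
    if keep_left:   # dangling tuples of r1, extended with NULLs on r2-only attributes
        for t1 in r1 - matched1:
            result.add(frozenset({**dict.fromkeys(attrs2 - attrs1), **dict(t1)}.items()))
    if keep_right:  # dangling tuples of r2, extended with NULLs on r1-only attributes
        for t2 in r2 - matched2:
            result.add(frozenset({**dict.fromkeys(attrs1 - attrs2), **dict(t2)}.items()))
    return result

r1 = {
    row(Employee="Smith", Department="sales"),
    row(Employee="Black", Department="production"),
    row(Employee="White", Department="production"),
}
r2 = {
    row(Department="production", Head="Mori"),
    row(Department="purchasing", Head="Brown"),
}
x1, x2 = {"Employee", "Department"}, {"Department", "Head"}

left = outer_join(r1, r2, x1, x2, keep_left=True)
right = outer_join(r1, r2, x1, x2, keep_right=True)
full = outer_join(r1, r2, x1, x2, keep_left=True, keep_right=True)
```

With both flags left at `False` the function reduces to the ordinary natural join, which drops the dangling tuples.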
r1                           r2
Employee  Department         Department  Head
Smith     sales              production  Mori
Black     production         purchasing  Brown
White     production

$r_1 \bowtie_{\text{LEFT}} r_2$
Employee  Department  Head
Smith     sales       NULL
Black     production  Mori
White     production  Mori

$r_1 \bowtie_{\text{RIGHT}} r_2$
Employee  Department  Head
Black     production  Mori
White     production  Mori
NULL      purchasing  Brown

$r_1 \bowtie_{\text{FULL}} r_2$
Employee  Department  Head
Smith     sales       NULL
Black     production  Mori
White     production  Mori
NULL      purchasing  Brown
Figure 3.17 Some outer joins

N-ary join, intersection and cartesian product. Let us look at some of the properties of the natural
join. (We refer here to natural join rather than to outer join, for which some of the properties discussed
here do not hold.) First let us observe that it is commutative, that is, $r_1 \bowtie r_2$ is always equal to
$r_2 \bowtie r_1$, and associative: $r_1 \bowtie (r_2 \bowtie r_3)$ is equal to $(r_1 \bowtie r_2) \bowtie r_3$. Thus, we can
write, where necessary, join sequences without brackets:
$$r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n \quad \text{or} \quad \bowtie_{i=1}^{n} r_i$$
Note also that we have stated no specific hypothesis about the sets of attributes X1 and X2 on which
the operands are defined. Therefore, the two sets could even be equal or be disjoint. Let us examine
these extreme cases; the general definition given above is still meaningful, but certain points should be
noted. If X1 = X2, then the join coincides with the intersection
$$r_1(X_1) \bowtie r_2(X_1) = r_1(X_1) \cap r_2(X_1)$$
since, by definition, the result is a relation on the union of the two sets of attributes, and must contain
the tuples t such that $t(X_1) \in r_1$ and $t(X_2) \in r_2$. If X1 = X2, the union of X1 and X2 is also equal to X1,
and thus t is defined on X1: the definition thus requires that $t \in r_1$ and $t \in r_2$, and therefore coincides
with the definition of intersection.
The case where the two sets of attributes are disjoint requires even more attention. The result is
always defined on the union X1X2, and each tuple is always derived from two tuples, one for each of
the operands. However, since such tuples have no attributes in common, there is no requirement to be
satisfied in order for them to participate in the join. The condition that the tuples must have the same
values on the common attributes is always verified. So the result of this join contains the tuples
obtained by combining the tuples of the operands in all possible ways. In this case, we often say that
the join becomes a cartesian product. This could be described as an operator defined (using the same
definition given above for natural join) on relations that have no attributes in common. The use of the
term is slightly misleading, as it is not really the same as a cartesian product between sets. The
cartesian product of two sets is a set of pairs (with the first element from the first set and the second
from the second). In the case here we have tuples, each obtained by juxtaposing a tuple of the first
relation and a tuple of the second. Figure 3.18 shows an example of the cartesian product,
demonstrating how the result contains a number of tuples equal to the product of the cardinalities of
the operands.

EMPLOYEES              PROJECTS
Employee  Project      Code  Name
Smith     A            A     Venus
Black     A            B     Mars
Black     B

$\text{EMPLOYEES} \bowtie \text{PROJECTS}$
Employee  Project  Code  Name
Smith     A        A     Venus
Black     A        A     Venus
Black     B        A     Venus
Smith     A        B     Mars
Black     A        B     Mars
Black     B        B     Mars
Figure 3.18 A cartesian product
Theta-join and equi-join. If we examine Figure 3.18, it is obvious that a cartesian product is, in
general, of very little use, because it combines tuples in a way that is not necessarily significant. In fact,
however, the Cartesian product is often followed by a selection, which preserves only the combined
tuples that satisfy the given requirements. For example, it makes sense to define a Cartesian product on
the relations EMPLOYEES and PROJECTS, if it is followed by the selection that retains only the
tuples with equal values on the attributes Project and Code (see Figure 3.19).
For this reason, another operator is often introduced, the theta-join. It is a derived operator, in the sense
that it is defined by means of other operators. Indeed, it is a cartesian product followed by a selection,
as follows:
$$r_1 \bowtie_F r_2 = \sigma_F(r_1 \bowtie r_2)$$
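A theta-join defined as a cartesian product followed by a selection can be sketched as follows in Python, using the relations EMPLOYEES and PROJECTS of Figure 3.18 and the condition Project = Code. The helper names are assumptions of this example, and the product is only valid because the two relations share no attribute names.

```python
def row(**attrs):
    """Build one tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def cartesian_product(r1, r2):
    """Juxtapose every tuple of r1 with every tuple of r2 (attribute names must be disjoint)."""
    return {t1 | t2 for t1 in r1 for t2 in r2}

def theta_join(r1, r2, condition):
    """Derived operator: a cartesian product followed by a selection on the condition."""
    return {t for t in cartesian_product(r1, r2) if condition(dict(t))}

employees = {
    row(Employee="Smith", Project="A"),
    row(Employee="Black", Project="A"),
    row(Employee="Black", Project="B"),
}
projects = {
    row(Code="A", Name="Venus"),
    row(Code="B", Name="Mars"),
}

product = cartesian_product(employees, projects)
equi = theta_join(employees, projects, lambda t: t["Project"] == t["Code"])
```

Because the condition here is a plain equality between attributes, this particular theta-join is an equi-join.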
3.1.6 Queries in relational algebra
In general, a query can be defined as a function that, when applied to database instances, produces
relations. More precisely, given a schema R of a database, a query is a function that, for every instance
r of R, produces a relation on a given set of attributes X. The expressions in the various query
languages (such as relational algebra) 'represent' or 'implement' queries: each expression defines a
function. We indicate by means of E(r) the result of the application of the expression E to the database
r.
In relational algebra, the queries on a database schema R are formulated by means of expressions
whose atoms are (names of) relations in R (the 'variables'). We conclude the presentation of relational
algebra by showing the formulation of some queries of increasing complexity, which refer to the
schema containing the two relations:
$$\pi_{\text{Head}}(\text{SUPERVISION} \bowtie_{\text{Employee}=\text{Number}} (\sigma_{\text{Salary}>40}(\text{EMPLOYEES}))) \qquad (3.2)$$
The result is shown in Figure 3.22, referring again to the database in Figure 3.20.

Number  Name         Age
104     Luigi Neri   38
210     Marco Celli  49
231     Siro Bisi    50
252     Nico Bini    44
301     Steve Smith  34
375     Mary Smith   50

Head
210
301
375
Figure 3.22 The result of the application of Expression 3.2 to the database in Figure 3.20.

Let us move on to some more complex examples. We begin by slightly changing the above query: find
the names and salaries of the supervisors of the employees earning more than 40 thousand. Here, we
can obviously use the preceding expression, but we must then produce, for each tuple of the result, the
information requested on the supervisor, which must be
extracted from the relation EMPLOYEES. Each tuple of the
result is constructed on the basis of three tuples, the first from EMPLOYEES (about an employee
earning more than 40 thousand), the second from SUPERVISION (giving the number of the supervisor
of the employee in question), and the third again from EMPLOYEES (with the information concerning
the supervisor). The solution intuitively requires the join of the relation EMPLOYEES with the result
of the preceding expression, but a warning is needed. In general, the supervisor and the employee are
not the same, and thus the two tuples of EMPLOYEES that contribute to a tuple of the join are
different. The join must therefore be preceded by a suitable renaming. The following is an example:
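$$\pi_{\text{NameH},\text{SalaryH}}(\rho_{\text{NumberH},\text{NameH},\text{SalaryH},\text{AgeH} \leftarrow \text{Number},\text{Name},\text{Salary},\text{Age}}(\text{EMPLOYEES}) \bowtie_{\text{NumberH}=\text{Head}} (\text{SUPERVISION} \bowtie_{\text{Employee}=\text{Number}} (\sigma_{\text{Salary}>40}(\text{EMPLOYEES})))) \qquad (3.3)$$
$$\pi_{\text{Number},\text{Name},\text{Salary},\text{NumberH},\text{NameH},\text{SalaryH}}(\sigma_{\text{Salary}>\text{SalaryH}}(\rho_{\text{NumberH},\text{NameH},\text{SalaryH},\text{AgeH} \leftarrow \text{Number},\text{Name},\text{Salary},\text{Age}}(\text{EMPLOYEES}) \bowtie_{\text{NumberH}=\text{Head}} (\text{SUPERVISION} \bowtie_{\text{Employee}=\text{Number}} (\text{EMPLOYEES})))) \qquad (3.4)$$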
The last example requires even more care: find the registration numbers and names of the supervisors
whose employees all earn more than 40 thousand. The query includes a sort of universal
quantification, but relational algebra does not contain any constructs directly suited to this purpose.
We can, however, proceed with a double negation, finding the supervisors none of whose employees
earns 40 thousand or less. This query is possible in relational algebra, using the difference operator.
We select all the supervisors except those who have an employee who earns 40 thousand or less. The
expression is as follows:
$$\pi_{\text{Number},\text{Name}}(\text{EMPLOYEES} \bowtie_{\text{Number}=\text{Head}} (\pi_{\text{Head}}(\text{SUPERVISION}) - \pi_{\text{Head}}(\text{SUPERVISION} \bowtie_{\text{Employee}=\text{Number}} (\sigma_{\text{Salary} \le 40}(\text{EMPLOYEES}))))) \qquad (3.5)$$
The result of this expression on the database in Figure 3.20 is shown in Figure 3.25.

Number  Name
301     Steve Smith
375     Mary Smith
Figure 3.25 The result of Expression 3.5 on the database shown in Figure 3.20
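The double negation behind Expression 3.5 can be traced step by step in Python. The data below is hypothetical and deliberately tiny (it is not the database of Figure 3.20), and tuples are written positionally as (Number, Name, Salary) and (Head, Employee) to keep the sketch short.

```python
# Hypothetical instance: EMPLOYEES(Number, Name, Salary) and SUPERVISION(Head, Employee).
employees = {
    (101, "Ann", 35),
    (102, "Bob", 45),
    (103, "Eva", 50),
    (201, "Joe", 60),
    (202, "Kim", 70),
}
supervision = {(201, 101), (201, 102), (202, 103)}

# Projection of SUPERVISION on Head: all supervisors.
heads = {head for head, _ in supervision}

# Selection Salary <= 40 on EMPLOYEES, keeping only the employee numbers.
low_earners = {number for number, _, salary in employees if salary <= 40}

# Supervisors with at least one employee earning 40 or less (join, then projection on Head).
heads_with_low_earner = {head for head, emp in supervision if emp in low_earners}

# Difference: supervisors none of whose employees earns 40 or less.
qualifying_heads = heads - heads_with_low_earner

# Final join with EMPLOYEES on Number = Head, projected on Number and Name.
result = {(number, name) for number, name, _ in employees if number in qualifying_heads}
```

In this instance Joe (201) supervises Ann, who earns 35, so only Kim (202) survives the difference.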
$$x \times (y + z) = x \times y + x \times z$$
For each value substituted for the three variables, the two expressions give the same result. In relational
algebra, we can give a similar definition. A first notion of equivalence refers to the database schema:
$E_1 \equiv_R E_2$ if $E_1(r) = E_2(r)$, for every instance r of R.
Absolute equivalence is a stronger property and is defined as follows:
$E_1 \equiv E_2$ if $E_1 \equiv_R E_2$, for every schema R.
The distinction between the two cases is due to the fact that the attributes of the operands are not
specified in the expressions (particularly in the natural join operations). An example of absolute
equivalence is the following:
$$\sigma_{F_1 \land F_2}(E) \equiv \sigma_{F_1}(\sigma_{F_2}(E))$$
where Ε is any expression. This transformation allows the application of subsequent transformations
that operate on selections with atomic conditions.
2. Cascading projections: a projection can be transformed into a cascade of projections that 'eliminate'
attributes in various phases:
$$\pi_X(E) \equiv \pi_X(\pi_{XY}(E))$$
if E is defined on a set of attributes that contain Y (and X). This too is a preliminary transformation that
will be followed by others.
3. Anticipation of the selection with respect to the join (often described as 'pushing selections down'):
$$\sigma_F(E_1 \bowtie E_2) \equiv E_1 \bowtie \sigma_F(E_2)$$
if the condition F refers only to attributes in the sub-expression E2.
4. Anticipation of the projection with respect to the join ('pushing projections down'); let E1 and E2 be
defined on X1 and X2 respectively; if $Y_2 \subseteq X_2$ and $Y_2 \supseteq X_1 \cap X_2$ (so the attributes in
$X_2 - Y_2$ are not involved in the join), then the following holds:
$$\pi_{X_1 Y_2}(E_1 \bowtie E_2) \equiv E_1 \bowtie \pi_{Y_2}(E_2)$$
$$Y_1 = (X_1 \cap Y) \cup J_1$$
$$Y_2 = (X_2 \cap Y) \cup J_2$$
On the basis of the equivalences above, we can eliminate from each relation all the attributes that do
not appear in the final result and are not involved in the join.
5. Combination of a selection and a cartesian product to form a theta-join:
$$\sigma_F(E_1 \bowtie E_2) \equiv E_1 \bowtie_F E_2$$
6. Distribution of the selection with respect to the union:
$$\sigma_F(E_1 \cup E_2) \equiv \sigma_F(E_1) \cup \sigma_F(E_2)$$
$$\sigma_{\text{Age} > 30 \,\lor\, \text{Age} \le 30 \,\lor\, \text{Age IS NULL}}(\text{PEOPLE})$$
This approach, as we explain in Chapter 4, is used in the present version of SQL, which supports a
three-valued logic, and is usable in earlier versions, which adopted a two-valued logic.
3.1.9 Views
In Chapter 1, we saw how it can be useful to make different representations of the same data available
to users. In the relational model, this is achieved by means of derived relations, that is, relations whose
content is defined in terms of the contents of other relations. Thus, in a relational database there can
exist base relations, whose content is autonomous and actually stored in the database, and derived
relations, whose content is derived from the content of other relations. It is possible that a derived
relation is defined in terms of other derived relations, on condition that an ordering exists among the
derived relations, so that all derived relations can be expressed in terms of base relations. 1 There are
basically two types of derived relations:
• materialized views: derived relations that are actually stored in the database;
• virtual relations (also called views, without further qualification): relations defined by means of
functions (expressions in the query language), not stored in the database, but usable in the queries as if
they were.
Materialized views have the advantage of being immediately available for queries. Frequently,
however, it is a heavy task to keep their contents consistent with those of the relations from which
they are derived, as any change to the base relations on which they depend has to be propagated to
them. On the other hand, virtual relations must be recalculated for each query but produce no
consistency problems. Roughly, we can say that materialized views are convenient when there are
fewer updates than queries and the calculation of the view is complex. 2 It is difficult, however, to give
general techniques for maintaining consistency between base relations and materialized views. For this
reason, most commercial systems provide mechanisms for organizing only virtual relations, which
from here on, with no risk of ambiguity, we call simply views.
Views are defined in relational systems by means of query language expressions. Then queries on
views are resolved by substituting the definition of the view for the view itself, that is, by composing
the original query with the view query. For example, consider a database on the relations:
$$R = \sigma_{A>D}(R_1 \bowtie R_2)$$
On this schema, the query
$$\sigma_{B=G}(R \bowtie R_3)$$
is executed by replacing R with its definition:
$$\sigma_{B=G}(\sigma_{A>D}(R_1 \bowtie R_2) \bowtie R_3)$$
1 This condition is relaxed in the recent proposals for deductive databases, which allow the definition of
recursive views. We discuss this issue briefly in Section 3.3.
2 We return to this subject in Chapter 12, in which we discuss active databases, and in Chapter 13, in
which we discuss data warehouses.
• A user interested in only a portion of the database can avoid dealing with the irrelevant
components. For example, in a database with two relations on the schemas
EMPLOYEES(Employee, Department); MANAGERS(Department, Supervisor)
a user interested only in the employees and their respective supervisors could find his task
facilitated by a view defined as follows:
$$\pi_{\text{Employee},\text{Supervisor}}(\text{EMPLOYEES} \bowtie \text{MANAGERS})$$
• Very complex expressions can be defined using views, with particular advantages in the case of
repeated sub-expressions.
• By means of access authorizations associated with views, we can introduce mechanisms for the
protection of privacy; for instance, a user could be granted restricted access to the database
through a specifically designed view; this application of views is discussed in Chapter 4.
• In the event of restructuring of a database, it can be convenient to define views corresponding to
relations that are no longer present after the restructuring. In this way, applications written
with reference to the earlier version of the schema can be used on the new one without the
need for modifications. For example, if a schema R(ABC) is replaced by two schemas R1(AB),
R2(BC), we can define a view $R = R_1 \bowtie R_2$ and leave intact the applications that refer to R. The
results, as we show in Chapter 8, confirm that, if B is a key for R2, then the presence of the view
is completely transparent.
As far as queries are concerned, views can be treated as if they were base relations. However, the same
cannot be said for update operations. In fact, it is often not even possible to define semantics for
updating views. Given an update on a view, we would like to have exactly one set of updates to the
base relations, such that the view, if computed after these changes to the base relations, appears as if the
given update had been performed on it. Unfortunately, this is not generally possible. For example, let
us look again at the view
This section (on calculus) and the following one (on Datalog) can be omitted without impairing the
understanding of the rest of the book.
It is not necessary to be acquainted with first order predicate calculus in order to read this section. We
give now some comments that enable anyone with prior knowledge to grasp the relationship with first
order predicate calculus; these comments may be omitted without compromising the understanding of
subsequent concepts.
There are some simplifications and modifications in relational calculus, with respect to first order
predicate calculus. First, in predicate calculus, we generally have predicate symbols (interpreted in the
same way as relations) and function symbols (interpreted as functions). In relational calculus, the
predicate symbols correspond to relations in the database (apart from other standard predicates such as
equality and inequality) and there are no function symbols. (They are not necessary given the flat
structure of the relations.)
Then, in predicate calculus both open formulas (those with free variables), and closed formulas (those
whose variables are all bound and none free), are of interest. The second type have a truth value that,
with respect to an interpretation, is fixed, while the first have a value that depends on the values
substituted for the free variables. In relational calculus, only the open formulas are of interest. A query
is defined by means of an open calculus formula and the result consists of tuples of values that satisfy
the formula when substituted for free variables.
{A1:x1, ..., Ak:xk | f}
where:
• A1, ..., Ak are distinct attributes (which do not necessarily have to appear in the schema of the
database on which the query is formulated);
• x1, ..., xk are variables (which we will take to be distinct for the sake of convenience, even if this is
not strictly necessary);
• f is a formula, according to the following rules:
  • There are two types of atomic formula:
    • R(A1:x1, ..., Ap:xp), where R(A1, ..., Ap) is a relational schema and x1, ..., xp are variables;
    • x ϑ y or x ϑ c, with x and y variables, c a constant and ϑ a comparison operator (=, ≠, ≤, ≥, >, <).
  • If f1 and f2 are formulas, then f1 ∧ f2, f1 ∨ f2 and ¬f1 are formulas (∧, ∨, ¬ are the logical
  connectives); where necessary, in order to ensure that the precedence is unambiguous, brackets
  can be used;
  • If f is a formula and x a variable (which usually appears in f, even if not strictly necessary) then
  ∃x(f) and ∀x(f) are formulas (∃ and ∀ are the existential quantifier and universal quantifier,
  respectively).
The list of pairs A1:x1 ,...., Ak: xk is called the target list because it defines the structure of the result,
which is made up of the relation on A1,...,Ak that contains the tuples whose values when substituted for
x1,...,xk render the formula true. The formal definition of the truth value of a formula goes beyond the
scope of this book and, at the same time, its meaning can be explained informally. Let us briefly follow
the syntactic structure of formulas (the term 'value' here means 'an element of the domain', where we
assume, for the sake of simplicity, that all attributes have the same domain):
• an atomic formula R(A1:x1, ..., Ap:xp) is true for values of x1, ..., xp that form a tuple of R;
• an atomic formula x ϑ y is true for values of x and y such that the value of x stands in relation ϑ
with the value of y; similarly for x ϑ c;
• the meaning of connectives is the usual one;
• for the formulas built with quantifiers:
  • ∃x(f) is true if there exists at least one value for x that makes f true;
  • ∀x(f) is true if f is true for all possible values for x.
Let us now illustrate relational calculus by showing how it can be used to express the queries that we
formulated in relational algebra in Section 3.1.6, over the schema:
π_{Number,Name,Age}(σ_{Salary>40}(EMPLOYEES))
This query in calculus can be formulated in various ways. The most direct, if not the simplest, is based
on the observation that what interests us are the values of Number, Name and Age, which form part of
the tuples for which Salary is greater than 40. That is, for which there exists a value of Salary, greater
than 40, which allows the completion of a tuple of the relation EMPLOYEES. We can thus use an
existential quantifier:
π_{Head}(SUPERVISION ⋈_{Employee=Number} (σ_{Salary>40}(EMPLOYEES)))
can be formulated in calculus by:
π_{NameH,SalaryH}(ρ_{NumberH,NameH,SalaryH,AgeH←Number,Name,Salary,Age}(EMPLOYEES)
⋈_{NumberH=Head}
(SUPERVISION ⋈_{Employee=Number} (σ_{Salary>40}(EMPLOYEES))))
This query is formulated in calculus by requiring, for each tuple of the result, the existence of three
tuples: one relating to an employee earning more than 40 thousand, a second that indicates who is his
supervisor, and the last (again in the EMPLOYEES relation) that gives detailed information on the
supervisor:
{NameH:nh, SalaryH:sh |
EMPLOYEES(Number:m, Name:n, Age:a, Salary:s) ∧ s > 40 ∧
SUPERVISION(Employee:m, Head:h) ∧
EMPLOYEES(Number:h, Name:nh, Age:ah, Salary:sh)}
(3.11)
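Expression 3.11 can be read as a nested search for three tuples that agree on the shared variables. The following Python sketch makes that reading concrete; the two instances are hypothetical data chosen for illustration, and SUPERVISION tuples are assumed to be stored in the order (Employee, Head).

```python
# Hypothetical instances; EMPLOYEES tuples are (Number, Name, Age, Salary).
EMPLOYEES = [
    (101, "Mary Smith", 34, 40),
    (103, "Mary Bianchi", 23, 35),
    (104, "Luigi Neri", 38, 61),
    (210, "Marco Celli", 49, 60),
    (301, "Steve Smith", 34, 70),
]
# SUPERVISION tuples are (Employee, Head).
SUPERVISION = [(101, 210), (103, 210), (104, 210), (210, 301)]

# One loop per atomic formula; reusing a variable name in the formula
# (m, then h) becomes an equality test between tuples.
result = {
    (nh, sh)                              # target list NameH:nh, SalaryH:sh
    for (m, n, a, s) in EMPLOYEES         # EMPLOYEES(Number:m, ..., Salary:s)
    if s > 40                             # s > 40
    for (m2, h) in SUPERVISION            # SUPERVISION(Employee:m, Head:h)
    if m2 == m
    for (h2, nh, ah, sh) in EMPLOYEES     # EMPLOYEES(Number:h, Name:nh, ...)
    if h2 == h
}
print(result)
```

Collecting the answers in a set mirrors the fact that the result of a calculus expression is a relation, so duplicates disappear.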
Consider next the query: find the employees earning more than their respective supervisors, showing
registration number, name and salary of the employees and supervisors (Expression 3.4 in algebra).
This differs from the preceding one only in the necessity of comparing values of the same attribute
originating from different tuples, which causes no particular problems:
π_{Number,Name}(EMPLOYEES ⋈_{Number=Head}
(π_{Head}(SUPERVISION) −
π_{Head}(SUPERVISION ⋈_{Employee=Number} (σ_{Salary≤40}(EMPLOYEES)))))
In calculus, we must use a quantifier. By taking the same steps as for algebra, we can use a negated
existential quantifier. We use many of these, one for each variable involved.
{Number:h, Name:n | EMPLOYEES(Number:h, Name:n, Age:a, Salary:s) ∧
SUPERVISION(Employee:m, Head:h) ∧
¬∃m'(∃n'(∃a'(∃s'(EMPLOYEES(Number:m', Name:n', Age:a', Salary:s') ∧
SUPERVISION(Employee:m', Head:h) ∧ s' ≤ 40))))}
(3.13)
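A negated existential quantifier corresponds naturally to `not any(...)` in Python. The sketch below evaluates Expression 3.13 over hypothetical instances (the data and the helper `supervised_salaries` are assumptions made for illustration) and also computes the same answer with `all(...)`, which plays the role of a universal quantifier.

```python
# Hypothetical instances; EMPLOYEES tuples are (Number, Name, Age, Salary),
# SUPERVISION tuples are (Employee, Head).
EMPLOYEES = [
    (101, "Mary Smith", 34, 40),
    (103, "Mary Bianchi", 23, 35),
    (104, "Luigi Neri", 38, 61),
    (210, "Marco Celli", 49, 60),
    (301, "Steve Smith", 34, 70),
]
SUPERVISION = [(101, 210), (103, 210), (104, 210), (210, 301)]

def supervised_salaries(h):
    """Salaries s' of the employees m' with SUPERVISION(Employee:m', Head:h)."""
    return [s2 for (m2, _, _, s2) in EMPLOYEES
            for (emp, head) in SUPERVISION if emp == m2 and head == h]

# SUPERVISION(Employee:m, Head:h) must hold for some m: h is actually a head.
heads = {head for (_, head) in SUPERVISION}

# No supervised employee earns 40 or less (negated existential form).
negated_existential = {
    (h, n) for (h, n, _, _) in EMPLOYEES
    if h in heads and not any(s2 <= 40 for s2 in supervised_salaries(h))
}

# Every supervised employee earns more than 40 (universal form).
universal = {
    (h, n) for (h, n, _, _) in EMPLOYEES
    if h in heads and all(s2 > 40 for s2 in supervised_salaries(h))
}
print(negated_existential, universal)
```

Both sets coincide because "there is no employee with salary at most 40" and "every employee has salary above 40" are the same condition.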
As an alternative, we can use universal quantifiers:
¬(f ∧ g) = ¬f ∨ ¬g
¬(f ∨ g) = ¬f ∧ ¬g
are also valid for quantifiers:
{A1:x1 | ¬R(A1:x1)}
the result of which contains the values of the domain not appearing in R.
It is useful to introduce the following concept here: an expression of a query language is domain
independent if its result, on each instance of the database, does not vary if we change the domain on the
basis of which the expression is evaluated. A language is domain independent if all its expressions are
domain independent. The requirement of domain independence is clearly fundamental for real
languages, because domain dependent expressions have no practical use and can produce extensive
results.
Based on the expressions seen above, we can say that relational calculus is not domain independent. At
the same time, it is easy to see that relational algebra is domain independent, because it constructs the
results from the relations in the database, without ever referring to the domains of the attributes. So the
values of the results all come from the instance to which the expression is applied.
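Domain dependence is easy to observe by evaluating the same expression over two different domains. In this sketch the instance of R, the two domains and the function names are all hypothetical; the first function evaluates the negated expression discussed above, the second a query whose values must come from R.

```python
R = {1, 2}  # hypothetical instance of a relation R(A1)

def complement_of_R(domain):
    """Evaluate {A1:x1 | not R(A1:x1)} with x1 ranging over the given domain."""
    return {x1 for x1 in domain if x1 not in R}

def selection_on_R(domain):
    """Evaluate {A1:x1 | R(A1:x1) and x1 > 1} over the given domain."""
    return {x1 for x1 in domain if x1 in R and x1 > 1}

small_domain = {1, 2, 3}
large_domain = {1, 2, 3, 4, 5}

# Domain dependent: the answer grows with the domain.
print(complement_of_R(small_domain), complement_of_R(large_domain))
# Domain independent: the answer is fixed by the instance of R.
print(selection_on_R(small_domain), selection_on_R(large_domain))
```

The second query behaves like a relational algebra expression: every value in its result already appears in the database instance.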
If we say that two query languages are equivalent when for each expression in one there exists an
equivalent expression in the other and vice versa, we can state that algebra and calculus are not
equivalent. This is because calculus, unlike algebra, allows expressions that are domain dependent.
However, if we limit our attention to the subset of relational calculus made up solely of expressions
that are domain independent, then we get a language that is indeed equivalent to relational algebra. In
fact:
for every expression of relational calculus that is domain independent there exists an expression
of relational algebra equivalent to it;
for every expression of relational algebra there is an expression of relational calculus equivalent
to it (and thus domain independent).
The proof of equivalence goes beyond the scope of this text, but we can mention its basic principles.
There is a correspondence between selections and simple conditions, between projection and existential
quantification, between join and conjunction, between union and disjunction and between difference
and conjunction associated with negation. The universal quantifiers can be ignored in that they can be
changed to existential quantifiers using de Morgan's laws.
In addition to the problem of domain dependence, relational calculus has another disadvantage, that of
requiring numerous variables, often one for each attribute of each relation involved. Then, when
quantifications are necessary the quantifiers are also multiplied. The only practical languages based at
least in part on domain calculus, known as Query-by-Example (QBE), use a graphic interface that frees
the user from the need to specify tedious details. Appendix A, which deals with the Microsoft Access
system, presents a version of QBE.
In order to overcome the limitations of domain calculus, a variant of relational calculus has been
proposed, in which the variables denote tuples instead of single values. In this way, the number of
variables is often significantly reduced, in that there is only a variable for each relation involved. This
tuple relational calculus would however be equivalent to domain calculus, and thus also have the
limitation of domain dependence. Therefore, we prefer to omit the presentation of this language.
Instead we will move directly to a language that has the characteristics of tuple calculus, and at the
same time overcomes the defect of domain dependence, by using the direct association of variables
with relations of the database. The following section deals with this language.
{T|L|f}
where:
• L is the range list, enumerating the free variables of the formula f with the respective ranges of
variability; in fact, L is a list of elements of type x(R), with x a variable and R a relation name; if
x(R) is in the range list, then, when the expression is evaluated, the possible values for x are
just the tuples in the relation R;
• T is the target list, composed of elements of type Y:x.Z (or simply x.Z, an abbreviation for Z:x.Z),
with x a variable and Y and Z sequences of attributes (of equal length); the attributes in Z must
appear in the schema of the relation that makes up the range of x. We can also write x.* as an
abbreviation for X:x.X, where the range of the variable x is a relation on attributes X;
• f is a formula with:
  • atoms of type x.A ϑ c or x1.A1 ϑ x2.A2, which compare, respectively, the value of x on the attribute A
  with the constant c and the value of x1 on A1 with that of x2 on A2;
  • connectives as for domain calculus;
  • quantifiers, which also associate ranges with the respective variables:
    ∃x(R)(f)    ∀x(R)(f)
  where ∃x(R)(f) means 'there is a tuple x in the relation R that satisfies the formula f' and ∀x(R)(f)
  means 'every tuple x in R satisfies f'.
Range declarations in the range list and in the quantifications have an important role: while introducing
a variable x, a range declaration x(R) specifies that x can assume as values only the tuples of the
relation R with which it is associated. Therefore this language has no need of atomic conditions such as
those seen in domain calculus, which specify that a tuple belongs to a relation.
We show next how the various queries that we have already expressed in algebra and domain calculus
can be formulated in this language.
The first query, which requests registration numbers, names, ages and salaries of the employees earning
more than 40 thousand, becomes very concise and clear (compare with Expression 3.7):
{e.* | e(EMPLOYEES) | e.Salary > 40}
(3.15)
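The three parts of Expression 3.15 map closely onto a Python comprehension: the target list is the output expression, the range list is the `for` clause and the formula is the `if` clause. The `Employee` type and the sample tuples below are assumptions introduced only for this sketch.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    Number: int
    Name: str
    Age: int
    Salary: int

# Hypothetical instance of EMPLOYEES.
EMPLOYEES = [
    Employee(101, "Mary Smith", 34, 40),
    Employee(104, "Luigi Neri", 38, 61),
    Employee(210, "Marco Celli", 49, 60),
]

# {e.* | e(EMPLOYEES) | e.Salary > 40}
#   target list e.*        -> the whole tuple e
#   range list e(EMPLOYEES) -> for e in EMPLOYEES
#   formula e.Salary > 40   -> if e.Salary > 40
result = [e for e in EMPLOYEES if e.Salary > 40]
print(result)
```

Because the variable ranges over tuples of a stored relation, no separate membership atom is needed, unlike the domain calculus version.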
In order to produce only some of the attributes, registration numbers, names and ages of the employees
earning more than 40 thousand (Expression 3.1 in algebra and Expression 3.9 in domain calculus), it is
sufficient to modify the target list:
{NameH, SalaryH: e'.(Name, Salary) |
e'(EMPLOYEES), s(SUPERVISION), e(EMPLOYEES) |
e'.Number = s.Head ∧ s.Employee = e.Number ∧
e.Salary > 40} (3.18)
Similarly, we can find the employees who earn more than their respective supervisors, showing
registration number, name and salary of the employees and supervisors (Expression 3.4 in algebra and
Expression 3.12 in domain calculus):
{e.(Name, Number, Salary),
NameH, NumberH, SalaryH: e'.(Name, Number, Salary) |
e(EMPLOYEES), s(SUPERVISION), e'(EMPLOYEES) |
e.Number = s.Employee ∧ s.Head = e'.Number ∧ e.Salary > e'.Salary}
(3.19)
Queries with quantifiers are much more concise and practical here than in domain calculus. The query
that requests the registration numbers and names of the supervisors whose employees all earn more
than 40 thousand (Expression 3.5 in algebra and Expression 3.13 or Expression 3.14 in domain
calculus) can be expressed with far fewer quantifiers and variables. Again, there are various options,
based on the use of the two quantifiers and of negation. With universal quantifiers:
π_{BC}(R1) ∪ π_{BC}(R2)
could not be expressed even with this extension, because the two relations have different schemas, and
thus a single variable cannot be associated with both.
We must stress that, while the union operator cannot be expressed in this version of calculus, the
intersection and difference operators are expressible.
• Intersection requires the tuples of the result to belong to both the operands and thus the result
can be constructed with reference to just one relation, with the additional condition that
requires the existence of an equal tuple in the other relation; for example, the intersection:
π_{BC}(R1) ∩ π_{BC}(R2)
can be expressed by:
{x1.BC | x1(R1) | ∃x2(R2)(x1.B = x2.B ∧ x1.C = x2.C)}
• Similarly, the difference, which produces the tuples of an operand not contained in the other,
can be specified by requesting precisely those tuples of the first argument that do not appear in
the second. For example,
π_{BC}(R1) − π_{BC}(R2)
can be expressed by:
{x1.BC | x1(R1) | ¬∃x2(R2)(x1.B = x2.B ∧ x1.C = x2.C)}
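The two expressions differ only in the negation in front of the quantifier, which the following Python sketch makes visible. The schemas R1(A, B, C) and R2(B, C, D), the sample tuples and the helper `same_bc` are assumptions for illustration.

```python
# Hypothetical instances of R1(A, B, C) and R2(B, C, D).
R1 = [{"A": 1, "B": "x", "C": 10}, {"A": 2, "B": "y", "C": 20}]
R2 = [{"B": "x", "C": 10, "D": True}, {"B": "z", "C": 30, "D": False}]

def same_bc(x1, x2):
    """The formula x1.B = x2.B and x1.C = x2.C."""
    return x1["B"] == x2["B"] and x1["C"] == x2["C"]

# {x1.BC | x1(R1) | exists x2(R2)(...)}: tuples of R1 with a match in R2.
intersection = {(x1["B"], x1["C"]) for x1 in R1
                if any(same_bc(x1, x2) for x2 in R2)}

# {x1.BC | x1(R1) | not exists x2(R2)(...)}: tuples of R1 with no match in R2.
difference = {(x1["B"], x1["C"]) for x1 in R1
              if not any(same_bc(x1, x2) for x2 in R2)}

print(intersection, difference)
```

In both cases the single free variable x1 ranges over R1 only, which is why these operators fit the language while a union over two different schemas does not.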
3.3 Datalog
We conclude this chapter with a brief discussion of another database query language that has generated
considerable interest in the scientific community since the mid-eighties. The basic concept on which
the Datalog language is based is that of adapting the logic programming language Prolog for use with
databases. We can illustrate neither Datalog nor Prolog in detail here, but we can mention the most
interesting aspects, particularly from the point of view of a comparison with the other languages seen in
this chapter.
In its basic form, Datalog is a simplified version of Prolog,³ a language based on first order predicate
calculus, but with a different approach from the relational calculus discussed above. There are two
types of predicate in Datalog:
extensional predicates, which correspond to relations in the database;
intensional predicates, which essentially correspond to views (virtual relations), specified by
means of logical rules.
Datalog rules have the form:
head ← body
where
• the head is an atomic formula of the form R(A1:a1, ..., Ap:ap), similar to those used in domain
relational calculus,⁴ where each ai, however, can be a constant or a variable;
• the body is a list of atomic formulas, of both forms allowed in domain
calculus, that is, the form R(...) and the comparison between variables or between a variable
and a constant.
Rules define the 'content' of intensional predicates, as the tuples whose values satisfy the body. The
following conditions are imposed:
• extensional predicates can appear only in the body of rules;
• if a variable appears in the head of a rule, then it must also appear in the body of the same rule;
• if a variable appears in a comparison atom, then it must also appear in an atom of the form R(...)
in the same body.
³ For those acquainted with Prolog, note that function symbols are not used in Datalog.
⁴ For the sake of continuity with previous sections, we use a non-positional notation for atomic formulas, while Datalog and
Prolog usually have a positional notation. The substance of the language is, however, the same.
The first condition ensures that there will be no attempt to redefine the relations stored in the database.
The other two ensure a property similar (in this context) to domain independence as discussed with
regard to relational calculus.
A basic characteristic of Datalog, which distinguishes it from the other languages we have seen up to
now, is its use of recursion. It is possible for an intensional predicate to be defined in terms of itself
(directly or indirectly). We will return to this aspect shortly.
Datalog queries are specified simply by means of atoms R(A1 : a1, ..., Ap: ap), usually preceded by a
question mark '?', to underline precisely the fact that they are queries; however other syntactic
conventions may be used. Queries produce as results the tuples of the relation R that can be obtained by
suitable substitutions for the variables. For example, the query:
?EMPLOYEES(Number:m, Name:n, Age:30, Salary:s)
returns the employees who are thirty years old. To formulate more complex queries, we must use rules.
For example, in order to find the registration numbers of the supervisors of the employees who earn
more than 40 thousand (formulated in algebra by Expression 3.2 and in domain calculus by Expression
3.10), we define an intensional predicate SUPEROfRICH, with the rule:
SUPEROfRICH(Head:h) ←
EMPLOYEES(Number:m, Name:n, Age:a, Salary:s),
SUPERVISION(Employee:m, Head:h), s > 40 (3.22)
In order to evaluate a query of this nature, we must define the semantics of the rules. The basic concept
is that the body of a rule is considered as the conjunction of the atoms that appear in it, and thus the rule
can be evaluated in the same way as an expression of domain calculus. The body of the expression,
substituting the commas with 'and', becomes the formula, and the head of the expression, apart from the
name of the intensional predicate, becomes the target list. Expression 3.22 defines the intensional
relation SUPEROfRICH as made up of the same tuples that appear in the result of Expression 3.10 of
calculus, which has precisely the structure described above:
{Head:h | EMPLOYEES(Number:m, Name:n, Age:a, Salary:s) ∧
SUPERVISION(Employee:m, Head:h) ∧ s > 40}
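The reading of a rule body as a conjunction can be sketched with a very small evaluator. It is only an illustration under stated assumptions: atoms are positional (as is usual in Datalog) rather than in the non-positional notation used here, the function `eval_rule` and the sample data are hypothetical, and comparison atoms are passed as a Python predicate over the variable bindings.

```python
# Hypothetical instances; EMPLOYEES tuples are (Number, Name, Age, Salary),
# SUPERVISION tuples are (Employee, Head).
EMPLOYEES = {
    (101, "Mary Smith", 34, 40),
    (104, "Luigi Neri", 38, 61),
    (210, "Marco Celli", 49, 60),
    (301, "Steve Smith", 34, 70),
}
SUPERVISION = {(101, 210), (104, 210), (210, 301)}

def eval_rule(head_vars, body_atoms, condition):
    """Evaluate one non-recursive rule.

    body_atoms is a list of (relation, variable names) pairs; a variable
    that occurs in several atoms must take the same value in all of them.
    """
    bindings = [{}]
    for relation, variables in body_atoms:
        extended = []
        for binding in bindings:
            for fact in relation:
                candidate = dict(binding)
                # setdefault binds a new variable or returns its earlier value.
                if all(candidate.setdefault(v, x) == x
                       for v, x in zip(variables, fact)):
                    extended.append(candidate)
        bindings = extended
    # The head plays the role of the target list.
    return {tuple(b[v] for v in head_vars) for b in bindings if condition(b)}

# SUPEROfRICH(Head:h) <- EMPLOYEES(m, n, a, s), SUPERVISION(m, h), s > 40
super_of_rich = eval_rule(
    ("h",),
    [(EMPLOYEES, ("m", "n", "a", "s")), (SUPERVISION, ("m", "h"))],
    lambda b: b["s"] > 40,
)
print(super_of_rich)
```

Each atom of the body extends the partial bindings produced by the previous ones, which is the conjunction described in the text.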
Similarly, we can write rules (with auxiliary intensional predicates) for many of the queries we have
looked at in preceding sections. In the absence of recursive definitions, the semantics of Datalog is
therefore very simple, in the sense that the various intensional predicates can be calculated by means of
expressions similar to calculus. However, using the definition given so far for Datalog it is not possible
to formulate all the queries that could be expressed in calculus (and in algebra). This is because there is
no construct available corresponding to the universal quantifier (or to negation in the full sense of the
term). It can be proven that non-recursive Datalog is equivalent to the domain independent subset of
calculus without negations or universal quantifiers.
To furnish Datalog with the same expressive power as calculus, we must add to the basic structure the
possibility of including in the body, not only atomic conditions, but also negations of atomic conditions
(which we indicate by the symbol NOT).
Only in this way can we formulate the query that requests the registration numbers and names of
the supervisors whose employees all earn more than 40 thousand (Expression 3.13):
{Number:h, Name:n | EMPLOYEES(Number:h, Name:n, Age:a, Salary:s) ∧
SUPERVISION(Employee:m, Head:h) ∧
¬∃m'(∃n'(∃a'(∃s'(EMPLOYEES(Number:m', Name:n', Age:a', Salary:s') ∧
SUPERVISION(Employee:m', Head:h) ∧ s' ≤ 40))))}
Let us proceed by defining a predicate for the supervisors who do not satisfy the condition:
SUPEROFSOMENOTRICH(Head:h) ←
SUPERVISION(Employee:m, Head:h), EMPLOYEES(Number:m, Name:n, Age:a, Salary:s), s ≤ 40
We can use this predicate in the negated form:
SUPEROFRICH(Number:h, Name:n) ←
EMPLOYEES(Number:h, Name:n, Age:a, Salary:s), SUPERVISION(Employee:m, Head:h),
NOT SUPEROFSOMENOTRICH(Head:h)
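One way to read the two rules is as a two-stage computation: the auxiliary predicate is computed completely first, and only then is its negation tested. The Python sketch below follows that order over hypothetical instances; the variable names are illustrative assumptions.

```python
# Hypothetical instances; EMPLOYEES tuples are (Number, Name, Age, Salary),
# SUPERVISION tuples are (Employee, Head).
EMPLOYEES = {
    (101, "Mary Smith", 34, 40),
    (103, "Mary Bianchi", 23, 35),
    (104, "Luigi Neri", 38, 61),
    (210, "Marco Celli", 49, 60),
    (301, "Steve Smith", 34, 70),
}
SUPERVISION = {(101, 210), (103, 210), (104, 210), (210, 301)}

# Stage 1: SUPEROFSOMENOTRICH(Head:h), heads with an employee earning <= 40.
super_of_some_not_rich = {
    h
    for (m, h) in SUPERVISION
    for (m2, n, a, s) in EMPLOYEES
    if m2 == m and s <= 40
}

# Stage 2: SUPEROFRICH(Number:h, Name:n), using the finished stage 1 result
# for the negated atom.
heads = {h for (_, h) in SUPERVISION}
super_of_rich = {
    (h, n)
    for (h, n, a, s) in EMPLOYEES
    if h in heads and h not in super_of_some_not_rich
}
print(super_of_rich)
```

The negated atom is a plain membership test here because the relation it refers to is already fully known when the second rule is evaluated.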
We could prove that non-recursive Datalog with negation is equivalent to the domain-independent
subset of calculus.
Greater expressive power is obtained by using recursive rules. For example, referring again to the
database with the relations EMPLOYEES and SUPERVISION, we can define the intensional predicate
SUPERIORS, which gives, for each employee, the supervisor, the supervisor's supervisor and so on,
with no limits. For this we need two rules:
SUPERIORS(Employee:e, SuperHead:h) ← SUPERVISION(Employee:e, Head:h)
SUPERIORS(Employee:e, SuperHead:h) ← SUPERVISION(Employee:e, Head:h'),
SUPERIORS(Employee:h', SuperHead:h)
The second rule is recursive, in that it defines the SUPERIORS relation in terms of itself. To evaluate
this rule, we cannot proceed as we have done up to now, because a single evaluation of the body would
not be sufficient to calculate the recursive predicate. There are various techniques for formally defining
the semantics in this case, but they are well beyond the scope of this text. We will touch upon the
simplest method, based on the technique known as fixpoint: the rules that define the intensional
recursive predicate are evaluated many times until an iteration does not generate new results. In our
case, the first iteration would generate a relation SUPERIORS equal to the extensional relation
SUPERVISION, that is, containing the supervisors of the employees. The second step would add the
supervisors of the supervisors, and so on. Obviously, queries of this nature cannot be formulated in
relational algebra (or in calculus) because we would have no way of knowing how many times the join
of the relation SUPERVISION with itself had to be specified.
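The fixpoint technique can be sketched as a loop that reapplies both rules until nothing new is derived. This is a naive illustration rather than an efficient evaluation strategy; the function name `superiors`, the iteration counter and the sample SUPERVISION instance (pairs in the order Employee, Head) are assumptions.

```python
def superiors(supervision):
    """Naive fixpoint evaluation of the two SUPERIORS rules."""
    result = set()
    iterations = 0
    while True:
        iterations += 1
        # First rule: every SUPERVISION pair is a SUPERIORS pair.
        derived = set(supervision)
        # Second rule: join SUPERVISION with the SUPERIORS found so far.
        derived |= {
            (e, h)
            for (e, h1) in supervision
            for (h2, h) in result
            if h1 == h2
        }
        if derived <= result:       # no new tuples: the fixpoint is reached
            return result, iterations
        result |= derived

# Hypothetical chain of supervisors: 101 and 104 -> 210 -> 301 -> 400.
SUPERVISION = {(101, 210), (104, 210), (210, 301), (301, 400)}
SUPERIORS, rounds = superiors(SUPERVISION)
print(sorted(SUPERIORS), rounds)
```

The number of rounds depends on the length of the longest supervision chain in the data, which is exactly what a fixed algebra expression cannot anticipate.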
As a final issue before concluding, we simply state the fact that certain recursive rules with negation
are difficult to evaluate, because the fixpoint cannot be reached. This is why limits are imposed on the
presence of negations in recursive rules. The reader should be aware that it is possible to identify a
perfectly workable subset of recursive Datalog with negation that is much more expressive than
calculus and relational algebra in that:
for every expression of algebra there is an equivalent expression of Datalog with negation;
there are recursive Datalog expressions for which there are no equivalent expressions in algebra
and calculus.
3.4 Bibliography
Relational algebra was proposed by Codd [26] as an essential component of the model. Relational
calculus and the close correspondence of the two families of languages were also proposed by Codd
[28]. Deeper and more formal treatment of relational languages can be found in the books devoted to
database theory: Ullman [88], Maier [58], Paredaens et al. [67], Atzeni and De Antonellis [3],
Abiteboul, Hull and Vianu [1]. Datalog is discussed in depth by Ceri, Gottlob and Tanca [17], Ullman
[88], Abiteboul, Hull and Vianu [1].
3.5 Exercises
Exercise 3.1 Study the database schema containing the relations:
FILMS(FilmNumber, Title, Director, Year, ProductionCost)
ARTISTS(ActorNumber, Surname, FirstName, Sex, BirthDate, Nationality)
ROLES( FilmNumber, ActorNumber, Character)
Produce a database on this schema for which the joins between the various relations are all complete.
Assuming two referential constraints between the relation ROLES and the other two, discuss possible
cases of incomplete join.
Show a cartesian product that involves relations in this database.
Show a database for which one (or more) of the joins is (are) empty.
Exercise 3.2 With reference to the schema in Exercise 3.1, express the following queries in relational
algebra, in domain calculus, in tuple calculus and in Datalog:
the titles of the films starring Henry Fonda;
the titles of the films in which the director is also an actor;
the actors who have played two characters in the same film; show the titles of the films, first
name and surname of the actor and the two characters;
the titles of the films in which the actors are all of the same sex.
Exercise 3.3 Consider the database containing the following relations:
REPRESENTATIVE(Number, Surname, FirstName, Committee, County, Constituency)
CONSTITUENCIES(County, Number, Name)
COUNTIES(Code, Name, Region)
REGIONS(Code, Name)
COMMITTEES(Number, Name, President)
Formulate the following queries in relational algebra, in domain calculus and in tuple calculus:
find the name and surname of the presidents of the committees in which there is at least one
representative from the county of Borsetshire;
find the name and surname of the members of the finance committee;
find the name, surname and constituency of the members of the finance committee;
find the name, surname, county and region of election of the delegates of the finance committee;
find the regions in which representatives having the same surname have been elected.
Exercise 3.4 Show how the formulation of the queries in Exercise 3.3 could be facilitated by the
definition of views.
Exercise 3.5 Consider the database schema on the relations
COURSES(Number, Faculty, CourseTitle, Tutor)
STUDENTS( Number, Surname, FirstName, Faculty)
TUTORS(Number, Surname, FirstName)
EXAMS(Student, Course, Grade, Date)
STUDYPLAN( Student, Course, Year)
Formulate, in relational algebra, in domain calculus, in tuple calculus, and in Datalog, the queries that
produce:
the students who have gained an 'A' in at least one exam, showing, for each of them, the first
name, surname and the date of the first of such occasions;
for every course in the engineering faculty, the students who passed the exam during the last
session;
the students who passed all the exams required by their respective study plans;
for every course in the literature faculty, the student (or students) who passed the exam with the
highest grades;
the students whose study plans require them to attend lectures only in their own faculties;
first name and surname of the students who have taken an exam with a tutor having the same
surname as the student.
Exercise 3.6 With reference to the following database schema:
CITIES(Name, Region, Population)
CROSSINGS(City, River)
RIVERS(River, Length)
formulate the following queries in relational algebra, domain calculus, tuple calculus and Datalog:
find the names, regions and populations for the cities that (i) have more than 50 thousand
inhabitants and (ii) are crossed by the Thames or the Mersey;
find the cities that are crossed by (at least) two rivers, giving the name of the city and that of the
longest of the rivers.
Exercise 3.7 With reference to the following database schema:
TRIBUTARIES(Tributary, River)
RIVERS( River, Length)
formulate in Datalog, the query that finds all the tributaries, direct and indirect, of the Mississippi.
Exercise 3.8 Consider the relational schema consisting of the following relations:
TUTORS( Number, Surname, FirstName)
COURSES( Number, CourseName, Tutor)
STUDENTS(Number, Surname, FirstName)
EXAMS(Student, Course, Date, Grade)
With reference to this schema, formulate the expressions of algebra, tuple relational calculus and
Datalog that produce:
the exams passed by the student named Detrouvelan-Delaney (supposing him to be the only one
with such a surname), indicating, for each exam, the name of the course, the grade achieved
and the name of the tutor;
the tutors who teach two courses (and not more than two), indicating the surname and first name
of the tutor and the names of the two courses.
Exercise 3.9 Consider a relational schema containing the relations:
R1(AB), R2(CDE), R3(FGH)
and transform it with the goal of reducing the size of the intermediate results.