TRANSLATION AND OPTIMIZATION OF LOGIC QUERIES:

THE ALGEBRAIC APPROACH

S. Ceri (*), G. Gottlob (**), L. Lavazza (***)

(*) Dipartimento di Elettronica, Politecnico di Milano, Italy
(**) Istituto per la Matematica Applicata del CNR, Genova, Italy
(***) TXT-Techint Software e Telematica, Milano, Italy

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the Twelfth International Conference on Very Large Data Bases, Kyoto, August 1986

ABSTRACT

This paper presents an algebraic approach to the translation and optimization of logic queries. We first develop a syntax-directed translation from rules of function-free logic programs to algebraic equations; then we show solution methods for independent equations and for systems of interdependent equations. Such solutions define the operational and fixpoint semantics of function-free logic programs and queries. We also present algebraic optimization methods for "top-down" and "bottom-up" strategies; the former are useful if no initial binding is provided with the query, while the latter are useful if some arguments of the query are bound to constant values.

1. INTRODUCTION

In recent times, the combination of relational databases and logic programming (LP) has become a popular research topic. The application of LP as the query language of a relational database entails a relevant enrichment of the expressive power of traditional query languages; hence the database community looks at LP as a promising approach for posing complex (e.g. recursive or deductive) queries. At the same time, databases provide the technology for processing large collections of data efficiently, hence solving many of the problems posed by LP applications when they manage large amounts of information.

The efficient implementation of logic queries has been discussed in many recent papers, including [Bancilhon86, Chandra82, Ceri86, Henschen84, Marque-Pucheu84, Saccà86, Ullman85]. Major research directions are:

a. Designing "pure" LP languages, i.e. languages which do not incorporate procedural features, such as the dependency of the computation on the order of clauses and the use of special predicates. This trend has marked the adoption of "pure" Horn clauses as the "standard" LP language, and consequently a certain resistance to Prolog.
b. Determining how function-free Horn clauses can be efficiently executed. New formal models, such as "rule-goal" graphs, describe binding propagation among clauses. Several rules for "capturing" nodes of graphs have been defined; capturing a node is equivalent to evaluating the information associated to the corresponding clause or rule, deduced from the database.
c. Determining how LP programs can be made more efficient, by operating transformations from LP to LP. Techniques such as the "magic set", the "counting" and the "Eager" methods have been developed for this purpose.

We have focused our attention on the use of relational algebra at work on the same problems. While the idea of using relational algebra for executing logic queries is not new (see, among others, [Aho79] and [Marque-Pucheu84]), the major contribution of this paper is to give a systematic overview of how traditional algebra, extended by a closure operator, can be applied to solve logic queries.

2. MODELS FOR LOGIC PROGRAMMING AND RELATIONAL DATABASES

In this section, we give our definition and interpretation of function-free logic programs and queries; then, we introduce positive algebra extended with the closure operator.

2.1 Function-free logic programs

A function-free logic program (FFLP) is a set of definite, function-free Horn clauses (i.e. clauses that contain exactly one positive literal); we use a Prolog-like syntax for clauses, which have their positive literal on the LHS and zero or more (negative) literals on the RHS, in conjunctive form. For instance, the following is a FFLP:

S(a,b).
R(c,b).
P(X):- P(Y), R(X,Y).
P(X):- S(Y,X).
Q(X,Y):- P(X), R(X,Z), S(Y,Z).

We can better interpret a FFLP within the framework of databases if we consider terms appearing in the LHS of clauses as either database relations or computed relations. Ground clauses of the former are stored in an extensional database EDB; ground clauses of the latter are evaluated by executing the FFLP; thus, each computed relation appears as the LHS of one or more clauses. In the above example, R and S are database relations, while P and Q are computed relations; the first two clauses are equivalent to assigning an instance to relations R and S, and in general will not be present in FFLPs but will be stored in the EDB.

We generalize the class of database relations to include all those terms for which we have a function available for evaluating arithmetic predicates. For instance, the relation PLUS(X,Y,Z)
links all representable integers X, Y and Z such that X+Y=Z. Further, we can extend Horn clauses to include special terms such as equality between variables (e.g., X=Y): since we assume that an equality test is available, we can represent it through a special relation EQ(X,Y). We assume that the domain and co-domain of functions are finite.

Each variable X of a clause of a FFLP program is associated to a finite range Rg(X). Ranges of variables of database relations are limited to the values existing in the instance of that relation or to the domain and co-domain of functions; all variables of computed relations are limited to a unique finite universe U which includes all individuals possibly occurring in the DB.

An input goal to a FFLP is a clause consisting of a single literal, for instance:

?-Q(a,X).

The solution of an input goal P(t1,...,tn) is the set of all ground instances of P(t1,...,tn) which are logical consequences of the FFLP; for example:

$$\mathrm{Sol}(FFLP, Q(a,X)) = \{\, Q(a,c) \mid (c \in U) \wedge (FFLP \cup EDB \models Q(a,c)) \,\}$$

If the input goal is a literal with all places bound to constant values, then the solution space is reduced to the answers "yes" or "no":

$$\mathrm{Sol}(FFLP, P(c)) = \begin{cases} \text{"yes"} & \text{if } (c \in U) \wedge (FFLP \cup EDB \models P(c)) \\ \text{"no"} & \text{otherwise} \end{cases}$$

2.2 Positive algebra with closure

We assume that the reader is familiar with the relational model and algebra as in [Ullman82]. Positive relational algebra (RA+) includes as primitive operators selection ($\sigma$), projection ($\pi$), Cartesian product ($\times$), join ($\bowtie$), semi-join ($\ltimes$), and union ($\cup$); notably, it does not include the difference operator. We extend projection to include $\pi_\emptyset$ (projection on the empty set) as an operator which can be applied to an algebraic expression E; $\pi_\emptyset E$ returns "yes" if $E \neq \emptyset$ and "no" if $E = \emptyset$, where $\emptyset$ denotes the empty relation of suitable degree.

The closure operator ($\Omega$) is applied to an expression E(X) of a (variable) relation X, provided that the schema of the result of E(X) is the same as the schema of X. The language obtained by extending RA+ with the closure operator is denoted as ERA+. The operational definition of the closure operator is given by the following program AU, which computes $\Omega_X E(X)$:

AU(E(X)):
  S <- ∅
  REPEAT
    Y <- S
    S <- S ∪ E(S)
  UNTIL Y = S
  RETURN S

The fixpoint semantics of the closure operator is also well defined. Let $\mathcal{P}$ denote the set of all possible relations having the same domain as X:

$$\mathcal{P} = \mathrm{powerset}(Dm(X))$$

Then $\mathcal{P}$ is a complete lattice under the union and intersection operations, having as minimal element the empty relation and as maximal element Dm(X). Expressions of RA+ are monotone (i.e. $X \subseteq Y \Rightarrow E(X) \subseteq E(Y)$); since $\mathcal{P}$ is finite, algorithm AU always terminates after a finite number of iterations, and $\Omega_X E(X)$ is the minimal solution (i.e. the least fixpoint) of the equation X = E(X):

$$\Omega_X E(X) = \min\{\, X \in \mathcal{P} \mid X = E(X) \,\}$$

3. SYNTAX DIRECTED TRANSLATION FROM FUNCTION-FREE LOGIC PROGRAMS TO EXTENDED RELATIONAL ALGEBRA

A syntax-directed translation algorithm maps each clause of a FFLP into a corresponding disequation of RA+. Disequations are subsequently interpreted as equations under the Closed World Assumption (CWA) and solved using the closure operation.

3.1 Translation of individual clauses

Let a generic Horn clause of a FFLP have the following structure:

$$R:\; P(\alpha_1,\ldots,\alpha_m) \;\text{:-}\; Q_1(\beta_1,\ldots,\beta_r),\; \ldots,\; Q_h(\beta_s,\ldots,\beta_t)$$

Then the translation associates to R an algebraic disequation:

$$\mathrm{Expr}(Q_1,\ldots,Q_h) \subseteq P$$

where P is the computed relation corresponding to predicate P, and similarly the $Q_i$ are either computed or database relations corresponding to predicates $Q_i$. The specification of the algorithm for the syntax-directed translation requires defining some useful notation and two rewriting functions:

- occurs($\alpha_i$, RHS), of sort boolean, is "true" if the term $\alpha_i$ belongs to the RHS, "false" otherwise.
- corr(i) denotes the function returning, for each $\alpha_i$ of the LHS, the index j of the leftmost variable $\beta_j$ of the RHS such that $\beta_j = \alpha_i$, if such a variable exists.
- const($\alpha_i$), of sort boolean, is true if $\alpha_i$ is a constant, false otherwise; const($\beta_j$) is similarly defined.
- var($\alpha_i$), of sort boolean, is defined as "not const($\alpha_i$)".
- newvar is a procedure returning a new variable name at each invocation.
- Let E denote a string, and x and y denote symbols. Then E<x,y> denotes a new string in which the first occurrence of x is substituted by y.

The syntax-directed translation of R into an algebraic disequation of RA+ is defined through two recursive rules, which apply to the LHS and RHS of R respectively.

The first rule deals with three special cases:

a. bindings to constant values in the LHS;
b. multiple occurrences of the same variable in the LHS;
c. existence of a variable of the LHS that does not occur in the RHS.
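The helper notation just defined can be tried out with a minimal Python sketch. It assumes (our choice, not the paper's) that an LHS is a list of argument names and an RHS is a list of (predicate, arguments) pairs whose argument places are numbered 1..n from left to right; the function names mirror the paper's notation but are illustrative only. Applying case (b) of the first rule to the clause P(X,X,Z):- S(X,Y), R(Y,a,Z) should give the projection list corr(1), corr(2), corr(3) = 6, 1, 5 that appears in the worked example of this section.

```python
import itertools

_fresh = itertools.count(1)


def newvar():
    """newvar: return a new variable name (N1, N2, ...) at each invocation."""
    return f"N{next(_fresh)}"


def subst_first(args, x, y):
    """E<x,y>: copy of args in which only the first occurrence of x is replaced by y."""
    out = list(args)
    out[out.index(x)] = y
    return out


def positions(rhs):
    """Concatenate the argument lists of the RHS literals (places 1..n)."""
    return [arg for _pred, args in rhs for arg in args]


def occurs(term, rhs):
    """occurs(alpha_i, RHS): true iff the term appears somewhere in the RHS."""
    return term in positions(rhs)


def corr(lhs_args, rhs):
    """corr(i): 1-based index of the leftmost RHS place equal to alpha_i (None if absent)."""
    places = positions(rhs)
    return [places.index(a) + 1 if a in places else None for a in lhs_args]


# Clause P(X,X,Z) :- S(X,Y), R(Y,a,Z); case (b): X occurs twice in the LHS.
lhs = ["X", "X", "Z"]
rhs = [("S", ["X", "Y"]), ("R", ["Y", "a", "Z"])]
x = newvar()                          # "N1"
lhs_b = subst_first(lhs, "X", x)      # LHS<X,N1>
rhs_b = rhs + [("EQ", [x, "X"])]      # RHS, EQ(N1,X)
print(lhs_b, corr(lhs_b, rhs_b))      # ['N1', 'X', 'Z'] [6, 1, 5]
```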
Each such special case is reduced, by suitable string transformations, to an equivalent case in which all positions of the LHS are bound to distinct variables. Then the second rule is applied to the RHS; it simply generates all selection conditions due to constant bindings or replicated variables within the RHS, and then builds a Cartesian product of all database or computed relations corresponding to terms of the RHS.

After applying the second rule, equivalence-preserving transformations can be applied to the resulting RA+ expression to transform Cartesian products into joins and to propagate selections to their operands, in the conventional way (see [Ullman82]). The notation $\times_{i=1..k} Q_i$ indicates the Cartesian product of relations $Q_1, \ldots, Q_k$.

Rule 1.
T(R) = if ∃ i : const(α_i)                               /* a */
         then newvar(x);
              return T(LHS<α_i,x> :- RHS, EQ(x,α_i))
       elsif ∃ i,j : α_i = α_j, i < j                     /* b */
         then newvar(x);
              return T(LHS<α_i,x> :- RHS, EQ(x,α_j))
       elsif ∃ i : var(α_i) ∧ not occurs(α_i, RHS)        /* c */
         then return T(LHS :- RHS, Rg(α_i))
       else return π_{corr(1),...,corr(m)} T'(RHS)
       end if;

Rule 2.
T'(RHS) = if ∃ i : const(β_i)                             /* a */
            then newvar(x);
                 return σ_{i=β_i} T'(RHS<β_i,x>)
          elsif ∃ i,j : β_i = β_j, i < j                  /* b */
            then newvar(x);
                 return σ_{i=j} T'(RHS<β_i,x>)
          else return ×_{i=1..k} Q_i
          end if;

Example. The following is a systematic application of the translation algorithm to the clause

P(X,X,Z):- S(X,Y), R(Y,a,Z).

1. By T, case (b):
   T( P(N1,X,Z):- S(X,Y), R(Y,a,Z), EQ(N1,X) )
2. By T, last recursive call:
   $\pi_{6,1,5}\, T'(\,S(X,Y), R(Y,a,Z), EQ(N1,X)\,)$
3. By T', cases (a) and (b):
   $\pi_{6,1,5}\, \sigma_{4=a \wedge 2=3 \wedge 1=7}\, T'(\,S(N2,N3), R(Y,N4,Z), EQ(N1,X)\,)$
4. By T', last recursive call:
   $\pi_{6,1,5}\, \sigma_{4=a \wedge 2=3 \wedge 1=7}\,(S \times R \times EQ)$
5. After pushing selection and join conditions:
   $\pi_{6,1,5}((S \bowtie_{2=1} (\sigma_{2=a} R)) \bowtie_{1=2} EQ)$
6. Final disequation:
   $\pi_{6,1,5}((S \bowtie_{2=1} (\sigma_{2=a} R)) \bowtie_{1=2} EQ) \subseteq P$

3.2 Transformations from disequations to equations in RA+

By effect of the translation rules explained above, we can turn each FFLP into a set of algebraic disequations. For instance, the last three clauses of the FFLP in Section 2.1 generate:

$$\pi_2(P \bowtie_{1=2} R) \subseteq P$$
$$\pi_2 S \subseteq P$$
$$\pi_{1,4}((P \bowtie_{1=1} R) \bowtie_{3=2} S) \subseteq Q$$

The Closed World Assumption (CWA, [Reiter78]) enables us to turn these disequations into equations. By the CWA, all facts which cannot be deduced by the application of the FFLP to the database are false; hence, no fact about a generic term P can be proved other than through existing disequations, and the union of the RHSs of all disequations for P gives the algebraic equation required to compute P. The above example generates the system of equations:

$$P = \pi_2(P \bowtie_{1=2} R) \cup \pi_2 S$$
$$Q = \pi_{1,4}((P \bowtie_{1=1} R) \bowtie_{3=2} S)$$

Marque-Pucheu et al. [Marque-Pucheu84] show a transformation from logic programs into equations which does not directly use relational algebra.

3.3 Solution of independent equations

Solving a system of equations is easier if each equation is independent from the others (or can be made independent by suitable substitutions); an equation is independent when it does not contain computed relations in the RHS other than the one in the LHS. For the solution of independent equations, two cases are given:

a. The RHS contains only database relations; in this case, the solution of the equation is simply given by the evaluation of the expression in RA+. This happens when the rules for the computed relation in the FFLP are nonrecursive.
b. The RHS contains one or more occurrences of the computed relation CR; in this case, the definition of the closure operator as the fixpoint of algebraic equations enables us to build the solution as follows:

$$\mathrm{sol}(CR) = \Omega_{CR}\, \mathrm{RHS}(CR)$$

In the above example (Sect. 3.2), the first equation is independent, and can be solved as:

$$\mathrm{sol}(P) = \Omega_P(\pi_2(P \bowtie_{1=2} R) \cup \pi_2 S)$$

The second equation, for Q, depends on P; however, we can suspend its evaluation until we have solved the equation for P. Then we consider P as a fixed (i.e. database) relation, and the second equation falls in case (a) above:

$$\mathrm{sol}(Q) = \pi_{1,4}((\mathrm{sol}(P) \bowtie_{1=1} R) \bowtie_{3=2} S)$$

We can now interpret input goals as suitable expressions on the algebraic solutions; for instance:

?-P(X).    <=>  $\mathrm{sol}(P)$
?-P(a).    <=>  $\pi_\emptyset\, \sigma_{1=a}\, \mathrm{sol}(P)$
?-Q(a,X).  <=>  $\sigma_{1=a}\, \mathrm{sol}(Q)$
?-Q(a,b).  <=>  $\pi_\emptyset\, \sigma_{1=a \wedge 2=b}\, \mathrm{sol}(Q)$
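As a sanity check, the two solutions above can be evaluated on the EDB of Section 2.1 (the facts S(a,b) and R(c,b)). The Python sketch below is only illustrative: it represents relations as frozensets of tuples, numbers columns from 1 as in the paper, and computes the closure with the naive iteration of program AU; the helper names project, join, select and nonempty are our own, not the paper's.

```python
def project(r, *cols):
    """pi_{cols} r, with 1-based column numbers as in the paper."""
    return frozenset(tuple(t[c - 1] for c in cols) for t in r)


def join(r, s, i, j):
    """r join_{i=j} s: concatenate tuples of r and s whose i-th and j-th components agree."""
    return frozenset(a + b for a in r for b in s if a[i - 1] == b[j - 1])


def select(r, i, value):
    """sigma_{i=value} r."""
    return frozenset(t for t in r if t[i - 1] == value)


def nonempty(r):
    """pi_emptyset r: "yes" iff r is not the empty relation."""
    return "yes" if r else "no"


def closure(e):
    """Omega_X E(X) as in program AU: S <- S U E(S) until S no longer changes."""
    s = frozenset()
    while True:
        prev, s = s, s | e(s)
        if s == prev:
            return s


# EDB of Section 2.1: S(a,b). R(c,b).
S = frozenset({("a", "b")})
R = frozenset({("c", "b")})

# sol(P) = Omega_P(pi_2(P join_{1=2} R) U pi_2 S)
sol_P = closure(lambda P: project(join(P, R, 1, 2), 2) | project(S, 2))
# sol(Q) = pi_{1,4}((sol(P) join_{1=1} R) join_{3=2} S)
sol_Q = project(join(join(sol_P, R, 1, 1), S, 3, 2), 1, 4)

print(sorted(sol_P))                     # [('b',), ('c',)]
print(sorted(sol_Q))                     # [('c', 'a')]
print(nonempty(select(sol_P, 1, "a")))   # ?-P(a).   -> no
print(sorted(select(sol_Q, 1, "c")))     # ?-Q(c,X). -> [('c', 'a')]
```

The result matches a direct reading of the clauses: S(a,b) gives P(b), then P(b) and R(c,b) give P(c), and P(c), R(c,b), S(a,b) give Q(c,a).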
3.4 Reduction by substitutions

We have seen that by suitable substitutions it is sometimes possible to reduce a system of mutually interdependent equations to a system of equations which are independently resolvable, when a certain order of evaluation is observed. We call these systems of equations reducible by substitutions. Let us first define the problem more formally. We start with a given set S of equations of the following form:

$$\begin{aligned} S:\; R_1 &= E_1(R_1,\ldots,R_n)\\ R_2 &= E_2(R_1,\ldots,R_n)\\ &\;\;\vdots\\ R_n &= E_n(R_1,\ldots,R_n) \end{aligned}$$

where each $R_i$ is a distinct relational variable and each $E_i$ is an expression of RA+ which involves some (not necessarily all) of the variables $R_1,\ldots,R_n$; we call $E_i$ the defining part of $R_i$.

A substitution consists in the replacement of some variable $R_i$ by its defining part $E_i$ in the RHSs of all equations of S. A resolution of $R_j$ consists of a series of substitutions which generate an equation $R_j = E'$ that does not contain any variable different from $R_j$. Clearly, if $R_j$ has a resolution, then its value can be computed either by evaluation of an expression of constant relations (if $R_j$ does not appear in $E'$) or by the application of the closure operator to $E'$, yielding $\Omega_{R_j} E'(R_j)$.

After a successful resolution, $R_j$ can be marked as a known constant relation and we can eliminate the defining equation for $R_j$ from the original system of equations S. This step is called the elimination of $R_j$ from S. A set of equations is reducible by substitutions iff it can be transformed to the empty set by successive resolutions and eliminations. Note that it is useless to substitute a variable which occurs in its own defining part; we shall forbid such substitutions.

It is clear that the particular form of each expression $E_i$ in S, as well as the constant relations that appear in $E_i$, are not relevant for determining whether S is reducible by substitutions; the only relevant information is the mutual interdependency of relational variables in S. This information is most appropriately represented in the dependency graph G = <N,E> defined as follows:

$$N(G) = \{R_1,\ldots,R_n\}, \qquad E(G) = \{\langle R_i,R_j\rangle \mid R_j \text{ occurs in } E_i\}$$

Note that loops of the form $\langle R_i,R_i\rangle$ may occur in E(G); in this case we call $R_i$ a looping node.

It is easy to see that substituting a variable $R_i$ in S exactly corresponds to dropping the node $R_i$ from G and linking all the predecessors of $R_i$ to all successors of $R_i$ by new edges. By analogy, we call such a process the substitution of node $R_i$ in G. Since we have forbidden to substitute variables $R_i$ that occur in their own defining part, we forbid to substitute looping nodes in a dependency graph.

If, after a series of node substitutions, the node $R_j$ has no outgoing edges to any node different from $R_j$, we say that we have resolved $R_j$. After a successful resolution of $R_j$ we can eliminate node $R_j$ and all its incident and outgoing edges from the original graph G, yielding a new (and simpler) graph $G' = G - \{R_j\}$. If, by successive resolution and elimination of nodes, it is possible to get an empty graph, we say that G is reducible by substitutions.

Lemma 3.1. Any acyclic connection graph G is reducible by substitutions.

Theorem 3.1. A strongly connected graph G is reducible by substitutions iff it contains a node K such that $G - \{K\}$ is acyclic.

Theorem 3.2. A graph G is reducible by substitutions iff its strongly connected components are each reducible by substitutions.

Proofs of Lemma 3.1 and of Theorems 3.1 and 3.2 are presented in [Camerini86]. Based on Theorems 3.1 and 3.2, we build algorithm REDUCE, which determines whether a graph G is reducible by substitutions; in the positive case, the algorithm outputs a series of substitutions and resolutions (in correct order) to reduce G.

The algorithm requires introducing a Reduction Graph $\hat{G}(N,E)$, built from the dependency graph G by collapsing each strongly connected component of G into a single node. $N(\hat{G})$ are the strongly connected components of G; $E(\hat{G})$ are the edges between components of G, defined in the obvious way. Clearly $\hat{G}$ is acyclic; we call bottom nodes of $\hat{G}$ all nodes which have no outgoing edges.

ALGORITHM "REDUCE"
INPUT: Dependency graph G
OUTPUT: Sequence of substitutions or resolutions

1. identify the strongly connected components G1,...,Gk of G.
2. for each component Gi find a node Ki in Gi such that Gi - {Ki} is acyclic; if Ki cannot be found then stop with output "irreducible".
3. build the reduction graph Ĝ.
4. if Ĝ is empty then stop.
5. for any bottom node Gi of Ĝ do:
     for each node Rj in Gi - {Ki}
       output "substitute Rj";
     output "resolve Ki";
     apply REDUCE to Gi - {Ki};
     remove Gi from Ĝ.
6. goto 4.

It is easy to see that the REDUCE algorithm is polynomial in the size of its input (i.e. the dependency graph G): the identification of the strongly connected components (step 1) as well as the acyclicity test (step 2) are well-known polynomial problems; furthermore, it is not hard to see that steps 4-6 are repeated at most |N(G)| times.

Example. Consider the following system S of equations in RA+:

$$\begin{aligned}
R_1 &= C_1 \cup \pi_{L_1}(R_1 \bowtie_{p_1} R_6) \cup R_5\\
R_2 &= R_5 \cup R_4\\
R_3 &= \pi_{L_2}(R_4 \bowtie_{p_2} R_3) \cup C_2\\
R_4 &= R_3 \bowtie_{p_3} C_4\\
R_5 &= \pi_{L_3}(R_1 \bowtie_{p_4} R_6) \cup C_3\\
R_6 &= (R_1 \bowtie_{p_5} C_5) \cup C_6
\end{aligned}$$
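Algorithm REDUCE can be sketched in Python as follows, assuming the dependency graph is given as a dictionary mapping each $R_i$ to the set of variables occurring in $E_i$. Step 1 uses Tarjan's strongly-connected-components algorithm, a standard choice that the paper does not prescribe. Tarjan's algorithm emits components in reverse topological order, so each component is a bottom node of the reduction graph when it is reached, which stands in for steps 3-6. Bottom components that do not depend on each other may be handled in either order, so for the example system S the sketch may interleave them differently from the REDUCE output listed in the text.

```python
def sccs(nodes, succ):
    """Tarjan's algorithm on the subgraph of `succ` induced by `nodes`.
    Components come out sinks-first (reverse topological order)."""
    index, low, stack, on_stack, comps = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(succ.get(v, set()) & nodes):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return comps


def acyclic(nodes, succ):
    """Acyclic iff every component is a single node without a loop <R,R>."""
    return all(len(c) == 1 and not (c & succ.get(next(iter(c)), set()))
               for c in sccs(nodes, succ))


def reduce_system(nodes, succ, out=None):
    """Algorithm REDUCE: list of substitutions/resolutions, or ValueError."""
    out = [] if out is None else out
    comps = sccs(frozenset(nodes), succ)                      # step 1
    keys = {}
    for comp in comps:                                        # step 2
        keys[comp] = next((k for k in sorted(comp)
                           if acyclic(comp - {k}, succ)), None)
        if keys[comp] is None:
            raise ValueError("irreducible")
    for comp in comps:     # steps 3-6: each comp is a bottom node when reached
        key = keys[comp]
        for r in sorted(comp - {key}):
            out.append(f"SUBSTITUTE {r}")
        out.append(f"RESOLVE {key}")
        reduce_system(comp - {key}, succ, out)
    return out


# Dependency graph of the example system S: Ri -> variables occurring in Ei
succ = {
    "R1": {"R1", "R6", "R5"},
    "R2": {"R5", "R4"},
    "R3": {"R4", "R3"},
    "R4": {"R3"},
    "R5": {"R1", "R6"},
    "R6": {"R1"},
}
print("; ".join(reduce_system(succ.keys(), succ)))
# SUBSTITUTE R5; SUBSTITUTE R6; RESOLVE R1; RESOLVE R6; RESOLVE R5;
# SUBSTITUTE R4; RESOLVE R3; RESOLVE R4; RESOLVE R2
```

The looping nodes R1 and R3 necessarily become the key nodes $K_i$ of their components, since removing any other node leaves a loop behind, as Theorem 3.1 requires.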
Figure 1 shows the reduction graph for S.

Fig. 1. Dependency graph and reduction graph for a system of equations

The output produced by the REDUCE algorithm is as follows:

SUBSTITUTE R4; RESOLVE R3; RESOLVE R4;
SUBSTITUTE R5; SUBSTITUTE R6; RESOLVE R1;
RESOLVE R6; RESOLVE R5; RESOLVE R2.

The sequence of solutions is produced as follows:

$$\begin{aligned}
R_3 &= \pi_{L_2}((R_3 \bowtie_{p_3} C_4) \bowtie_{p_2} R_3) \cup C_2\\
R_4 &= R_3 \bowtie_{p_3} C_4\\
R_1 &= \pi_{L_1}(R_1 \bowtie_{p_1} ((R_1 \bowtie_{p_5} C_5) \cup C_6)) \cup \pi_{L_3}(R_1 \bowtie_{p_4} ((R_1 \bowtie_{p_5} C_5) \cup C_6)) \cup C_1 \cup C_3\\
R_6 &= (R_1 \bowtie_{p_5} C_5) \cup C_6\\
R_5 &= \pi_{L_3}(R_1 \bowtie_{p_4} R_6) \cup C_3\\
R_2 &= R_5 \cup R_4
\end{aligned}$$

3.5 Resolution of mutually dependent equations

Some systems of equations cannot be reduced by substitutions; in such cases we have to use other solution methods. There are two approaches for solving nonreducible systems. In the first approach, the single relation variables $R_i$ are combined into one super-variable R, for instance by use of Cartesian products. Then the original system of equations can be rewritten as R = E(R) and solved by applying the closure operator. Two different variants of this method are described in [Chandra82] and [Ceri86].

The second approach computes the solution directly from the original system of equations by initially setting $R_i = \emptyset$ for i = 1,...,n and then successively computing $R_i := E_i(R_1,\ldots,R_n)$, until the value of each $R_i$ remains unchanged:

ALGORITHM A
FOR i:=1 TO n DO Ri:=∅;
REPEAT
  cond:=true;
  FOR i:=1 TO n DO Si:=Ri;
  FOR i:=1 TO n DO
  BEGIN
    Ri:=Ei(S1,...,Sn);
    IF Ri <> Si THEN cond:=false;
  END;
UNTIL cond;
FOR i:=1 TO n DO OUTPUT(Ri);

We can optimize Algorithm A by using at each step the recently computed values for R1,...,Ri in the computation of the new value of Ri+1, instead of using the old values, i.e., S1,...,Si:

ALGORITHM B
FOR i:=1 TO n DO Ri:=∅;
REPEAT
  cond:=true;
  FOR i:=1 TO n DO
  BEGIN
    S:=Ri;
    Ri:=Ei(R1,...,Rn);
    IF Ri <> S THEN cond:=false;
  END;
UNTIL cond;
FOR i:=1 TO n DO OUTPUT(Ri);

Algorithms A and B have well-known correspondents in the field of numerical analysis: Algorithm A corresponds to the Jacobi algorithm for the iterative solution of systems of equations, while Algorithm B corresponds to the Gauss-Seidel algorithm.

3.6 Conclusion of Section 3

In this section we have given rules for transforming FFLPs into systems of equations in RA+, and we have shown how these systems can be solved by use of the closure operator. By this process we have defined both the fixpoint semantics and the operational semantics for FFLPs and queries operating on an extensional database EDB.

Though outside the scope of this paper, it can be seen that our semantics correspond exactly to the semantics for logic programs defined by Van Emden and Kowalski [VanEmden76]; due to the limitedness of the ranges of individual variables and to the absence of function symbols, the Herbrand universe of any FFLP is finite.

4. ALGEBRAIC APPROACH TO THE OPTIMIZATION OF LOGIC PROGRAMS

We turn now to the optimization of expressions in ERA+. The execution strategies presented in the previous section suffer from two major disadvantages:

a. The algorithm which computes the closure operation is not very efficient.
b. In the computation of $\sigma_F\, \Omega_X E(X)$, the conditions of the logical query are not used, because the selection condition is not pushed inside $\Omega_X E(X)$.

Proofs of theorems in this section are quite simple and can be found in [Ceri86].

4.1 Efficient computation of the closure operator

Algorithm AU presented in Section 2.2 is not very efficient, because several partial unions are needed and because E(S) has to be evaluated several times for the same tuples; consider now the more efficient program A1:

A1(E(X)):
  S <- ∅
  REPEAT
    Y <- S
    S <- E(S)
  UNTIL Y = S
  RETURN S

Theorem 4.1. If E(X) is an expression in RA+, then AU(E(X)) and A1(E(X)) are equivalent programs.

Note that Theorem 4.1 does not hold if E(X) is not in RA+; for instance, if $E(X) = K - X$ with $K \neq \emptyset$, then A1 never halts, while AU terminates returning K as result.

Program A1 eliminates the first source of inefficiency, i.e. the computation of unions within the program AU, but it does not eliminate the second source, i.e. the computation of E(S) several times for the same tuple. We now develop two algorithms which do not have this burdensome property.

Definition 4.1. An expression E(X) over RA+ is linear if it holds:

$$\forall X, Y \subseteq Rg(E):\; E(X \cup Y) = E(X) \cup E(Y)$$

We can now present algorithm A2:

A2(E(X)):
  S <- ∅; D <- ∅
  REPEAT
    D <- E(D) - S
    S <- S ∪ D
  UNTIL D = ∅
  RETURN S

Theorem 4.2. If E(X) is linear, then A2(E(X)) is equivalent to AU(E(X)).

The advantage of algorithm A2 compared to algorithm A1 is that the size of D (the "difference term" produced at each iteration) is smaller than the size of S (the "accumulation" term).

Theorem 4.3. Each expression E(X) can be rewritten in canonical form:

$$E(X) = \bigcup_{i=1,\ldots,k} \pi_{L_i}\, \sigma_{F_i}(X^{j_i} \times C_i) \;\cup\; C_0$$

where $X^{j_i}$ represents the Cartesian product of $j_i$ terms X and the $C_i$ are constant (database) relations.

Let can denote the transformation in RA+ from an expression E(X) to its canonical form, and let degree(E(X)) denote the maximal $j_i$ in can(E(X)). Efficient algorithms can be developed depending on the degree of an expression; we show, in particular, algorithms developed for degree(E(X)) = 2. Note that if degree(E(X)) = 1 then the expression is linear, hence A2 can be applied.

Let E'(X,Y) be the expression obtained from E(X) by replacing each Cartesian product $X \times X$ with $X \times Y$. Then we can build algorithm A3:

A3(E(X)):
  S <- ∅; D <- ∅
  REPEAT
    T1 <- E'(S,D)
    T2 <- E'(D,S)
    T3 <- E(D)
    D <- (T1 ∪ T2 ∪ T3) - S
    S <- S ∪ D
  UNTIL D = ∅
  RETURN S

Theorem 4.4. If the degree of E(X) is 2, then A3 is equivalent to A1.

The advantage of algorithm A3 compared to algorithm A1 is that we never compute the term E'(S,S), which might be large. We further notice that the program can be simplified by omitting the evaluation of T2 if the expression E'(X,Y) is commutative. Algorithm A3 can be generalized for a generic degree i.

Example: Nonlinear ancestor program

FFLP:          anc(x,y):- par(x,y).
               anc(x,y):- anc(x,z), anc(z,y).
DISEQUATIONS:  $PAR \subseteq ANC$
               $\pi_{1,4}(ANC \bowtie_{2=1} ANC) \subseteq ANC$
EQUATION:      $ANC = PAR \cup \pi_{1,4}(ANC \bowtie_{2=1} ANC)$
SOLUTION:      $\mathrm{sol}(ANC) = \Omega_{ANC}(PAR \cup \pi_{1,4}(ANC \bowtie_{2=1} ANC))$

The degree of E(ANC) is 2. Assuming acyclicity of the PAR relation, algorithm A3 produces at each iteration i the pairs of ancestors between the $2^i$-th and the $2^{i+1}$-th generations; term T2 need not be evaluated, as the expression is clearly commutative. Acyclicity of the PAR relation is not required by algorithm A3 itself.

4.2 Pushing selection conditions into linear expressions

Aho and Ullman [Aho79] indicate a method for optimizing expressions of the type $\sigma_F\, \Omega_X(E(X))$. We briefly outline their method by one example. The linear expression for the set of all ancestors of an individual "a" is:

$$ANC = \sigma_{1=a}\, \Omega_X(\pi_{1,4}(X \bowtie_{2=1} PAR) \cup PAR)$$

It holds:

$$\sigma_{1=a} X = \sigma_{1=a}(\pi_{1,4}(X \bowtie_{2=1} PAR) \cup PAR)$$

By applying associativity and distributivity:

$$\sigma_{1=a} X = \pi_{1,4}((\sigma_{1=a} X) \bowtie_{2=1} PAR) \cup \sigma_{1=a} PAR$$

By introducing the variable Y for $\sigma_{1=a} X$ we get:

$$Y = \pi_{1,4}(Y \bowtie_{2=1} PAR) \cup \sigma_{1=a} PAR$$

Thus we have:

$$ANC = \Omega_Y(\pi_{1,4}(Y \bowtie_{2=1} PAR) \cup \sigma_{1=a} PAR)$$

This formula can now be evaluated using algorithm A2. Unfortunately, this method applies only when the selection can be pushed directly to the variable X in E(X). In the rest of this section we show, on the ground of examples, how other optimizations are possible. Consider the terms:

(4.1)  $D_0 = E(\emptyset), \qquad D_{n+1} = E(D_n) - E(\emptyset)$

Considering algorithm A2, it is easy to verify that, for linear expressions E(X), the following equation holds:

$$\Omega_X E(X) = \bigcup_{i=0,\ldots,\infty} D_i$$

This formulation for $\Omega_X E(X)$ is attractive because we can compute the terms so that $\sigma$ is pushed into each $D_i$, and never compute the full terms $D_i$:

(4.2)  $\sigma_F\, \Omega_X E(X) = \bigcup_{i=0,\ldots,\infty} \sigma_F D_i$
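Programs A1, A2 and A3, and the selection-pushing rewrite of the ancestor example, can be compared on a toy instance with the small Python sketch below. It assumes that binary relations are Python sets of pairs, passes E (and E' for A3) as ordinary Python functions, and uses an invented PAR instance; compose(r, s) computes $\pi_{1,4}(r \bowtie_{2=1} s)$. All three programs should return the same ancestor relation, and solving the rewritten equation for Y with A2 should return exactly the tuples of that relation whose first component is "a".

```python
def compose(r, s):
    """pi_{1,4}(R join_{2=1} S) for binary relations R and S."""
    return {(x, w) for (x, y) in r for (z, w) in s if y == z}


def a1(e):
    """Program A1: S <- E(S) from the empty relation until S is stable."""
    s = set()
    while True:
        prev, s = s, e(s)
        if s == prev:
            return s


def a2(e):
    """Program A2 (E linear): E is applied only to the difference term D."""
    s, d = set(), set()
    while True:
        d = e(d) - s
        s = s | d
        if not d:
            return s


def a3(e, e2):
    """Program A3 (degree 2): e2(X, Y) is E with each X x X replaced by X x Y."""
    s, d = set(), set()
    while True:
        d = (e2(s, d) | e2(d, s) | e(d)) - s      # (T1 U T2 U T3) - S
        s = s | d
        if not d:
            return s


PAR = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")}   # invented instance


def e_linear(x):        # pi_{1,4}(X join_{2=1} PAR) U PAR
    return compose(x, PAR) | PAR


def e_nonlinear(x):     # PAR U pi_{1,4}(X join_{2=1} X), degree 2
    return PAR | compose(x, x)


def e_prime(x, y):      # E'(X, Y) for e_nonlinear
    return PAR | compose(x, y)


def e_pushed(y):        # pi_{1,4}(Y join_{2=1} PAR) U sigma_{1=a} PAR
    return compose(y, PAR) | {t for t in PAR if t[0] == "a"}


anc1, anc2, anc3 = a1(e_linear), a2(e_linear), a3(e_nonlinear, e_prime)
anc_a = a2(e_pushed)

print(anc1 == anc2 == anc3)                          # True
print(sorted(anc_a))                                 # [('a', 'b'), ('a', 'c'), ('a', 'd')]
print(anc_a == {t for t in anc1 if t[0] == "a"})     # True
```

With the pushed selection, A2 never materializes the ancestors of "b", "c" or "x", which is the saving the rewrite is meant to deliver.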
Starting from this general form, we can see how PAR = +
algebraic manipulations produce the same effect as i=o"
techniques such as the "magic set" and "magic while Ri is not empty do
counting". We use a well-known example: the search begin
of same generation cousins. Let us produce the PARN = PAR,, U Ri
transformation from FFLP to ERA+ for this example:
i =i+l
FFLP: sgCX,X). end
sg(X,Y):- parCX,Xl),sgCXl,Yl),parCY,Yl). M =-rr2 PAR
M
DISEQUATIONS: EQ C SG Consider now the semi-join reduction of PAR:

5
5CCPARW2=,SG)W4=2PAR) C SG
, PAR' = PAR lX2,, M.
EQUATION: SG = EQ UT-$,5CCPARW2=,SG)W4=2PAR)
It is easy to see that only tuples of this relation
SOLUTION:
give a contribution to terms Di: we can then write:
sol(SG) = OsGCEQ U-l-T 1 ,#CPARW2=,SG)W4,2PAR))
=o- l=a PAR Ii o EQ o RAPi
Notice that the expreision computing SG is linear. Di
We introduce the "composition" operation, as in
By using equation (4.21, we deduce that:
CAho 791, to denote the following expression (where
R and S are binary relations): @,,a 0 sG (EQ u (PAR 0 SG 0 RAP)) =
RoS=i-i- R w2=l s U c- ,=a PAR' . o EQ o RAP' =
I,4 i=O,...,oD
Then, denoting as RAP the relation obtained by ,i o EQ o RAP' =
'i=lJ ()iza PAR
exchanging the order of attributes in PAR CRAP = R 'SG'a'
?=a O CEQ U (PAR' o SG o RAP)).
PAR), we have:
‘Tr2,1 We can then apply algorithm A2 to solve this
soi = OSGEQ U (PAR o CSG o RAP))
simplified problem. Note that M is itself obtained
Note that the composition operation is associative, as the application of the closure to a simple
hence: expression, as follows:
(X 0 Y) 0 2 = x 0 (Y 0 Z) = x 0 Y 0 z N ='n, Ox CCX o PAR) U G;,a PAR)

Further, ue indicate as X' a chain of i-l


Assuming acyclic data, we can easily show the
applications of the composition to a binary
relation X:
algebraic equivalent of the "magic counting"
method. Let us first simplify each term D. by
X' = x, 0 x2 0 . . . xi eliminating the EQ relation and propagdting
equality conditions:
Terms D. defined by the system (4.1) are:
= 6(+, = EQ D. = PAR' o RAP'
E" = ECDi) - EC+) = PAR' o EQ o RAP' Witft+'acyclic data, there cannot be replicated
i+l
Consider the query in FFLP:
tuples in the union of terms Ri; hence, there is an
i such that R.=$for j>i. But if R.=b, then also
?- sgCa,X). D.=+. Hence d can evaluate SC usihg the LHS of
eduation (4.2) by the following algorithm:
corresponding to the expression in ERA+:
c- ,=a sol(SG)
SG = +
By propagating selections to the terms Di in the
right side, we obtain:
while R #c$ do
D =qa PAR' * o EQ o RAP'
i begin
Let us compare the terms Di and Di+,: we denote as R = R o PAR
SG = SG U CR o RAP')
reducing common subexpression Ri the largest
i=j+l
common subexpression of Di, Di+, which includes end
selection condition(s); in this case, Algebraic transformations of this section are
R = qza PARi
easily generalized for a query with two bindings:
i
?-sg(a,b).
It is possible to pre-determine a subset of the
relation PAR which contains all relevant tuples for corresponding to the expression:
the computation of SG; this is done by evaluating
the "magic" set M of all elements that can appear
in the second column of terms R.; this is the set We omit details of derivations, and show the final
1
of all ancestors of "all. results:
Cl) Using magic sets:
n- Gc - -p
CEQ U (PAR' o SG o RAP'))
+ l=ah2=b
with RAP' being the semi-join reduction of RAP by
the magic set produced by the selection qzb RAP.

(2) Using magic counting, we produce the program:

    answer = no
    R1 = σ_{1=a} EQ
    R2 = σ_{2=b} EQ
    if R1 ∘ R2 ≠ ∅ then answer = yes
    while ((R1 ≠ ∅) and (R2 ≠ ∅) and (answer = no)) do
    begin
        R1 = R1 ∘ PAR
        R2 = RAP ∘ R2
        if R1 ∘ R2 ≠ ∅ then answer = yes
    end
    output answer

Notice that with this program we compute iteratively two terms, each obtained from one binding condition: after $i$ iterations, $R_1 = \sigma_{1=a}\, PAR^i$ and $R_2 = \sigma_{2=b}\, RAP^i$. The computation halts as soon as either of the two terms is empty or $R_1 \circ R_2$ produces at least one tuple.

5. CONCLUSIONS

This paper has presented a systematic approach to the algebraic treatment of logic queries; we have shown a syntax-directed translation from FFLP to algebraic equations, and then shown how equations or systems of equations can be solved and how individual equations can be optimized.

Several problems considered in this paper need further improvement:

a. The proposed solution method for systems of equations could be improved by propagating bindings from one equation to another.

b. The efficient algorithms presented for expressions of degree 1 and 2 can be generalized to expressions of any degree.

c. Further investigation is needed to fully understand how the algebraic approach compares with the "magic set" and "magic counting" methods.

d. Another notable direction of research is the treatment of Horn clauses including function symbols. The necessary counterpart on the database side is the extension of the relational model and languages to model complex objects (e.g., non-1NF relations).

Acknowledgement

This research was supported by the Esprit project n. 432 Meteor (An integrated formal approach to industrial software development). We would like to thank Stefano Crespi-Reghizzi, who stimulated our work and provided useful comments on a first draft of the paper; Letizia Tanca, who suggested an improvement for the solution of systems of equations; and Paolo Camerini, who helped prove the theorems in Section 3.4.

REFERENCES

[Aho79] A.V. Aho, J.D. Ullman, "Universality of data retrieval languages", Sixth ACM Symp. on Principles of Programming Languages, San Antonio, Jan. 1979.

[Apt82] K.R. Apt, M.H. Van Emden, "Contributions to the theory of logic programming", ACM Journal, 29:3, pp. 841-862, 1982.

[Bancilhon86] F. Bancilhon, D. Maier, Y. Sagiv, J.D. Ullman, "Magic sets and other strange ways to implement logic programs", Proc. ACM-PODS, Cambridge (MA), March 24-26, 1986.

[Camerini86] P. Camerini, S. Ceri, G. Gottlob, L. Lavazza, "A note on the solution of mutually dependent equations by variable substitution", Dipartimento di Elettronica, Internal Report, 1986.

[Ceri85] S. Ceri, S. Crespi-Reghizzi, L. Lavazza, "Extended Relational Algebra (ERA): Data structures and operations", Rep. n. Meteor/TZ/TXT/1, June 1985.

[Ceri86] S. Ceri, G. Gottlob, L. Lavazza, "Translation and optimization of logic queries: The algebraic approach", Dipartimento di Elettronica, Politecnico di Milano, Rep. n. 86-004.

[Chandra82] A.K. Chandra, D. Harel, "Horn clauses and the fixpoint hierarchy", Proc. ACM-PODS, pp. 158-162, 1982.

[Henshen84] L.J. Henschen, S.A. Naqvi, "On compiling queries in recursive first-order databases", ACM Journal, 31:1, pp. 47-85, 1984.

[MarquePucheu84] G. Marque-Pucheu, J. Martin-Gallausiaux, G. Jomier, "Interfacing Prolog and relational DBMS", New Applications of Databases, Academic Press, 1984.

[Reiter78] R. Reiter, "On closed world databases", in Logic and Databases, Plenum Press, New York, 1978.

[Sacca'86] D. Saccà, C. Zaniolo, "On the implementation of a simple class of logic queries for databases", Proc. ACM-PODS, Cambridge (MA), March 24-26, 1986.

[Ullman82] J. Ullman, "Principles of database systems", Computer Science Press, Rockville, Md., 1982.

[Ullman85] J. Ullman, "Implementation of logical query languages for databases", ACM-TODS, 10:3, pp. 289-321, 1985.

[VanEmden76] M.H. Van Emden, R. Kowalski, "The semantics of predicate logic as a programming language", ACM Journal, 23:4, pp. 733-742, 1976.
