Expressive Power of SQL
www.elsevier.com/locate/tcs
Abstract
It is a folk result in database theory that SQL cannot express recursive queries such as
reachability; in fact, a new construct was added to SQL3 to overcome this limitation. However,
the evidence for this claim is usually given in the form of a reference to a proof that relational
algebra cannot express such queries. SQL, on the other hand, in all its implementations has three
features that fundamentally distinguish it from relational algebra: namely, grouping, arithmetic
operations, and aggregation.
In the past few years, most questions about the additional power provided by these features
have been answered. This paper surveys those results, and presents new simple and self-contained
proofs of the main results on the expressive power of SQL. Somewhat surprisingly, tiny differences
in the language definition affect the results in a dramatic way: under some very natural
assumptions, it can be proved that SQL cannot define recursive queries, no matter what aggregate
functions and arithmetic operations are allowed. But relaxing these assumptions just a tiny bit
makes the problem of proving expressivity bounds for SQL as hard as some long-standing open
problems in complexity theory.
© 2002 Elsevier Science B.V. All rights reserved.
1. Introduction
What queries can one express in SQL? Perhaps more importantly, one would like to
know what queries cannot be expressed in SQL—after all, it is the inability to express
certain properties that motivates language designers to add new features (at least one
hopes that this is the case).
This seems to be a rather basic question that database theoreticians should have
produced an answer to by the beginning of the 3rd millennium. After all, we have
been studying the expressive power of query languages for some 20 years now (and
0304-3975/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0304-3975(02)00736-3
380 L. Libkin / Theoretical Computer Science 296 (2003) 379 – 404
in fact more than that, if you count earlier papers by logicians on the expressiveness
of first-order logic), and SQL is the de facto standard of the commercial database
world—so there surely must be an answer somewhere in the literature.
When one thinks of the limitations of SQL, its inability to express reachability
queries comes to mind, as it is well documented in the literature (in fact, in many
database books written for very different audiences, e.g. [1,5,7,26]). Let us consider
a simple example: suppose that R(Src,Dest) is a relation with flight information:
Src stands for source, and Dest for destination. To find pairs of cities (A, B) such
that it is possible to fly from A to B with one stop, one would use a self-join as
follows.
SELECT R1.Src, R2.Dest
FROM R AS R1, R AS R2
WHERE R1.Dest = R2.Src
What if we want pairs of cities such that one makes two stops on the way? Then we
do a more complicated self-join shown below.
SELECT R1.Src, R3.Dest
FROM R AS R1, R AS R2, R AS R3
WHERE R1.Dest = R2.Src AND R2.Dest = R3.Src
Taking the union of these two and the relation R itself we would get the pairs of
cities such that one can fly from A to B with at most two stops. But often one needs
a general reachability query in which no a priori bound on the number of stops is
known; that is, whether it is possible to get to B from A at all.
Graph-theoretically, this means computing the transitive closure of R. It is well known
that the transitive closure of a graph is not expressible in relational algebra or calcu-
lus; in particular, expressions similar to those above (which happen to be unions of
conjunctive queries) cannot possibly express it. This appears to be a folk result in
the database community; while many papers do refer to [2] or some other source on
the expressive power of first-order logic, many texts just state that relational algebra,
calculus and SQL cannot express recursive queries such as reachability.
With this limitation in mind, the SQL3 standard introduced recursion explicitly into
the language [7,12]. One would write the reachability query as follows.
trcl(x, y) :- r(x, y)
trcl(x, y) :- trcl(x, z), r(z, y).
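The contrast between bounded self-joins and recursion can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module; the sample flight data and the variable names `bounded` and `trcl` are illustrative assumptions, and the recursive common table expression is SQLite's rendering of SQL3-style recursion rather than syntax quoted from the standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (Src TEXT, Dest TEXT)")
# Hypothetical flight data: a chain A -> B -> C -> D -> E.
conn.executemany("INSERT INTO R VALUES (?, ?)",
                 [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])

# Union of R, the one-stop self-join, and the two-stop self-join.
bounded = set(conn.execute("""
    SELECT Src, Dest FROM R
    UNION
    SELECT R1.Src, R2.Dest FROM R AS R1, R AS R2
    WHERE R1.Dest = R2.Src
    UNION
    SELECT R1.Src, R3.Dest FROM R AS R1, R AS R2, R AS R3
    WHERE R1.Dest = R2.Src AND R2.Dest = R3.Src
"""))

# Recursive formulation mirroring the two Datalog rules for trcl.
trcl = set(conn.execute("""
    WITH RECURSIVE trcl(x, y) AS (
        SELECT Src, Dest FROM R
        UNION
        SELECT trcl.x, R.Dest FROM trcl, R WHERE trcl.y = R.Src
    )
    SELECT x, y FROM trcl
"""))

# A to E needs three stops, so only the recursive query finds it.
print(("A", "E") in bounded, ("A", "E") in trcl)
```

On a longer chain one would need ever more self-joins, whereas the recursive query is unchanged.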
When a new construct is added to a language, a good reason must exist for it, especially
if the language is a declarative query language, with a small number of constructs,
and with programmers relying heavily on its optimizer. The reason for introducing
recursion in the next SQL standard is precisely this folk result stating that it cannot
be expressed in the language. But when one looks at what evidence is provided to
support this claim, one notices that all the references point to papers in which it is
proved that relational algebra and calculus cannot express recursive queries. Why is
this not sufficient? Consider the following query:
SELECT 1
FROM R1
WHERE (SELECT COUNT(*) FROM R1) >
(SELECT COUNT(*) FROM R2)
This query tests if |R1| > |R2|: in that case, it returns 1; otherwise it returns the empty
set. However, logicians proved long ago that first-order logic, and thus relational
calculus, cannot compare cardinalities of relations (cf. [1]), and yet we have a very
simple SQL query doing precisely that.
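As a quick check of this behaviour, here is a small sketch with Python's sqlite3 module. The table contents are made up for illustration, and DISTINCT is an addition: real SQL engines use bag semantics, so without it the query would return one copy of 1 per tuple of R1 rather than the set {1}.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R1 (a INTEGER)")
conn.execute("CREATE TABLE R2 (a INTEGER)")
conn.executemany("INSERT INTO R1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO R2 VALUES (?)", [(10,), (20,)])

# The cardinality comparison from the text, with DISTINCT added (assumption).
QUERY = """
    SELECT DISTINCT 1
    FROM R1
    WHERE (SELECT COUNT(*) FROM R1) > (SELECT COUNT(*) FROM R2)
"""

bigger = conn.execute(QUERY).fetchall()      # |R1| = 3, |R2| = 2

conn.executemany("INSERT INTO R2 VALUES (?)", [(30,), (40,)])
not_bigger = conn.execute(QUERY).fetchall()  # |R1| = 3, |R2| = 4

print(bigger, not_bigger)
```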
The conclusion, of course, is that SQL has more power than relational algebra, and
the main source of this additional power is its aggregation and grouping constructs, to-
gether with arithmetic operations on numerical attributes. But then one cannot say that
the transitive closure query is not expressible in SQL simply because it is inexpressible
in relational algebra. Thus, it might appear that the folk theorem about recursion and
SQL is an unproven statement.
Fortunately, this is not the case: the statement was (partially) proved in the past
few years; in fact, a series of papers proved progressively stronger results, finally
establishing good bounds on the expressiveness of SQL.
The main goal of the paper is twofold:
(a) We give an overview of these recent results on the expressiveness of SQL. We
shall see that some tiny differences in the language definition affect the results
in a dramatic way: under some assumptions, it can be shown that reachability
and many other recursive queries are not expressible in SQL. However, under a
slightly different set of assumptions, the problem of proving expressivity bounds
for SQL is as hard as separating some complexity classes.
(b) Due to a variety of reasons, even the simplest proofs of expressivity results for
SQL are not easy to follow; partly this is due to the fact that most papers used
the setting of their predecessors that had unnecessary complications in the form of
nested relations, somewhat unusual (for mainstream database people) languages
and infinitary logics. Here we get rid of those complications, and present a simple
and self-contained proof of expressivity bounds for SQL.
Organization. In the next section, we discuss the main features that distinguish
SQL from relational algebra, in particular, aggregate functions. We then give a brief
overview of the literature on the expressive power of SQL.
Starting with Section 3, we present those results in more detail. We introduce
relational algebra with grouping and aggregates, ALGaggr , that essentially captures basic
SQL statements. Section 4 states the main result on the expressive power of SQL,
namely that queries it can express are local. If one thinks of queries on graphs, it
means that the decision whether a tuple t̃ belongs to the output is determined by a
small neighborhood of t̃ in the input graph; the reachability query does not have this
property.
Section 5 defines an aggregate logic Laggr and shows a simple translation of the
algebra with aggregates ALGaggr into this logic. Then, in Section 6, we present a self-
contained proof of locality of Laggr (and thus of ALGaggr ).
In previous papers on the expressive power of SQL [24,25,22,18], we used languages
of a rather different flavor, based on structural recursion [4] and comprehensions [30].
In Section 7, we show that those languages are at most as expressive as ALGaggr .
In Section 8, we consider an extension ALG^<_aggr of ALGaggr in which non-numerical
order comparisons are allowed, and show that it is more powerful than the unordered
version. Furthermore, no non-trivial bounds on the expressiveness of this language can
be proved without answering some deep open problems in complexity theory.
Section 9 gives a summary and concluding remarks.
What exactly is SQL? There is, of course, a very long standard that lists numerous
features, most of which have very little to do with the expressiveness of queries. As far
as expressiveness is concerned, the main features that distinguish SQL from relational
algebra are the following:
• Aggregate functions: one can compute, for example, the average value in a column.
The standard aggregates in SQL are COUNT, SUM, AVG, MIN, MAX.
• Grouping: not only can one compute aggregates, one can also group them by values
of different attributes. For example, it is possible to compute the average salary for
each department.
• Arithmetic: SQL allows one to apply arithmetic operations to numerical values.
For example, for relations S1(Empl,Dept) and S2(Empl,Salary), the following query
(assuming that Empl is a key for both relations) computes the average salary for each
department which pays total salary at least 100,000:
(*)  SELECT S1.Dept, AVG(S2.Salary)
     FROM S1, S2
     WHERE S1.Empl = S2.Empl
     GROUP BY S1.Dept
     HAVING SUM(S2.Salary) >= 100000
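To see grouping, aggregation and the arithmetic comparison working together, the query can be run on a toy instance. This is a sketch using Python's sqlite3 module; the employees, departments and salaries are invented for illustration, and the threshold is written as `>=` to match "at least 100,000".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Empl is declared a key in both relations, as the text assumes.
conn.execute("CREATE TABLE S1 (Empl TEXT PRIMARY KEY, Dept TEXT)")
conn.execute("CREATE TABLE S2 (Empl TEXT PRIMARY KEY, Salary INTEGER)")
conn.executemany("INSERT INTO S1 VALUES (?, ?)",
                 [("e1", "Sales"), ("e2", "Sales"), ("e3", "IT")])
conn.executemany("INSERT INTO S2 VALUES (?, ?)",
                 [("e1", 60000), ("e2", 50000), ("e3", 90000)])

# Average salary per department whose total salary is at least 100,000.
rows = conn.execute("""
    SELECT S1.Dept, AVG(S2.Salary)
    FROM S1, S2
    WHERE S1.Empl = S2.Empl
    GROUP BY S1.Dept
    HAVING SUM(S2.Salary) >= 100000
""").fetchall()

print(rows)
```

Only the Sales department (total 110,000) passes the HAVING filter in this instance; IT (total 90,000) is dropped.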
Next, we address the following question: what is an aggregate function? The first paper
to look into this was probably [20]: it defined aggregate functions as f : R → Num,
where R is the set of all relations, and Num is a numerical domain. A problem with
this approach is that it requires a different aggregate function for each relation and each
numerical attribute in it; that is, we do not have just one aggregate AVG, but infinitely
many of those. This complication arises from dealing with duplicates in a column.
However, duplicates can be incorporated in a much more elegant way, as suggested
$$F = \{f_0, f_1, f_2, \ldots, f_\omega\},$$
where $f_k$ is a function that takes a k-element multiset (bag) of elements of Num and
produces an element of Num. For technical reasons, we also add a constant $f_\omega \in$ Num
whose intended meaning is the value of F on infinite multisets. For example, if Num is
N, or Q, or R, we define the aggregate $\Sigma = \{s_0, s_1, \ldots\}$ by
$s_k(\{|x_1, \ldots, x_k|\}) = \sum_{i=1}^{k} x_i$;
furthermore, $s_0 = s_\omega = 0$ (we use the {| |} brackets for multisets). This corresponds
to SQL’s SUM. For COUNT, one defines $C = \{c_0, c_1, \ldots\}$ with $c_k$ returning k (we may
again assume $c_\omega = 0$). The aggregate AVG is defined as $A = \{a_0, a_1, \ldots\}$ with
$a_k(X) = s_k(X)/c_k(X)$, $a_0 = a_\omega = 0$. For MAX, we define the aggregate $\{\max_0, \max_1, \ldots\}$ with
$\max_k(\{|x_1, \ldots, x_k|\}) = \max_{i \le k} x_i$, $\max_0 = \max_\omega = 0$, and likewise for MIN.
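The view of an aggregate as a single function on bags, rather than one function per relation and column, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: bags are modelled as Python lists, Num is taken to be the rationals, the function names are invented here, and the constant for infinite bags is not modelled because the inputs are always finite.

```python
from fractions import Fraction


def agg_sum(bag):
    """SUM: s_k adds up a k-element bag; the empty bag gives s_0 = 0."""
    return sum(bag, Fraction(0))


def agg_count(bag):
    """COUNT: c_k returns k, the number of elements counted with duplicates."""
    return len(bag)


def agg_avg(bag):
    """AVG: a_k = s_k / c_k, with the convention a_0 = 0."""
    return agg_sum(bag) / agg_count(bag) if bag else Fraction(0)


def agg_max(bag):
    """MAX: largest element of the bag, with the convention max_0 = 0."""
    return max(bag) if bag else Fraction(0)


# A column with duplicates is just a bag: the same functions apply to
# any column of any relation.
salaries = [Fraction(1), Fraction(1), Fraction(4)]
print(agg_sum(salaries), agg_count(salaries), agg_avg(salaries), agg_max(salaries))
```

Because duplicates are kept in the list, the average of the bag {|1, 1, 4|} is 2, not the 5/2 one would get from the set {1, 4}.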
It is very hard to prove formal statements about a language like SQL: to put it mildly,
its syntax is not very easy to reason about. The research community has come up with
several proposals of languages that capture the expressiveness of SQL. The earliest one
is perhaps Klug’s extension of relational algebra by grouping and aggregation [20]: if
e is an expression producing a relation with m attributes, Ã is a set of attributes, and f
is an aggregate function, then e⟨Ã, f⟩ is a new expression that produces a relation with
m + 1 attributes. Assuming f applies to attribute A′, and B̃ is the list of all attributes
of the output of e, the semantics is best explained by SQL:
SELECT B̃, f(A′)
FROM e
GROUP BY Ã
Klug’s paper did not analyze the expressive power of this algebra, nor did it show how
to incorporate arithmetic operations. The main contribution of [20] is an equivalence
result between the algebra and an extension of relational calculus. However, the main
focus of that extension is its safety, and the resulting logic is extremely hard to deal
with, due to many syntactic restrictions.
To the best of my knowledge, the first paper that directly addressed the problem of
the expressive power of SQL was the paper by Consens and Mendelzon in ICDT’90
[6]. They have a datalog-like language, whose non-recursive fragment is exactly as
expressive as Klug’s algebra. Then they show that this language cannot express the
transitive closure query under the assumption that DLOGSPACE is properly included
in NLOGSPACE. The reason is simple: Klug’s algebra (with some simple aggre-
gates) can be evaluated in DLOGSPACE, while transitive closure is complete for
NLOGSPACE.
That result can be viewed as strong evidence that SQL is indeed incapable of
expressing reachability queries. However, it is not completely satisfactory for three
reasons. First, nobody knows how to separate complexity classes. Second, what if one
adds more complex aggregates that increase the complexity of query evaluation? And
third, what if the input graph has a very simple structure (for example, no node has
outdegree more than 1)? In this case reachability is in DLOGSPACE, and the argument
of [6] does not work.
In the early 1990s, many people were looking into languages for collection types. Functional
statically typechecked query languages became quite fashionable, and they were
produced in all kinds of flavors, depending on particular collection types they had to
support. It turned out that a set language capturing essentially the expressive power
of a language for bags, could also model all the essential features of SQL [24]. The
problem was that the language dealt with nested relations, or complex objects. But
then [24], extending [28,31], proved a conservativity result, stating that nested rela-
tions are not really needed if the input and output do not have them. That made it
possible to use a non-nested fragment of languages inspired by structural recursion [4]
and comprehensions [30] as a “theoretical reconstruction of SQL.”
Several papers dealt with this language, and proved a number of expressivity bounds.
The first one, appearing in PODS’94 [24], showed that the language could not express
reachability queries. The proof, however, was very far from ideal. It only proved
inexpressibility of transitive closure in a way that was very unlikely to extend to other
queries. It relied on a complicated syntactic rewriting that would not work even for a
slightly different language. And the proof would not work if one added more aggregate
functions.
The first limitation was addressed in [8] where a certain general property of queries
expressible in SQL was established. However, the other two problems not only re-
mained, but were exacerbated: the rewriting of queries became particularly unpleasant.
In an attempt to remedy this, [22] gave an indirect encoding of a fragment of SQL
into first-order logic with counting, FO(C) (it will be formally defined later). The
restriction was to natural numbers, thus excluding aggregates such as AVG. The encoding
is bound to be indirect, since SQL is capable of expressing queries that FO(C) cannot
express. The encoding showed that for any query Q in SQL, there exists an FO(C)
query Q′ that shares some nice properties with Q. Then [22] established some
properties of FO(C) queries and transferred them to that fragment of SQL. The proof was
much cleaner than the proofs of [24,8], at the expense of a less expressive language.
After that, [25] showed that the coding technique can be extended to SQL with ratio-
nal numbers and the usual arithmetic operations. The price to pay was the readability
of the proof—the encoding part became very unpleasant.
That was a good time to pause and see what must be done differently. How do we
prove expressivity bounds for relational algebra? We do it by proving bounds on the
expressiveness of first-order logic (FO) over finite structures, since relational algebra
has the same power as FO. So perhaps if we could put aggregates and arithmetic
directly into logic, we would be able to prove expressivity bounds in a nice and
simple way?
That program was carried out in [18], and I shall survey the results below. One
problem with [18] is that it inherited too much unnecessary machinery from its pre-
decessors [8,22–25]: one had to deal with languages for complex objects and apply
conservativity results to get down to SQL; logics were infinitary to start with, although
infinitary connectives were not necessary to translate SQL; and expressivity proofs went
via a special kind of games invented elsewhere [16].
Here we show that all these complications are completely unnecessary: there is
indeed a very simple proof that reachability is not expressible in SQL, and this proof
will be presented below. Our language is a slight extension of Klug’s algebra (no
nesting). We translate it into an aggregate logic (with no infinitary connectives) and
prove that it has nice locality properties (without using games).
To deal with aggregation, we must distinguish numerical columns (to which aggre-
gates can be applied) from non-numerical ones. We do it by typing: a type of a relation
is simply a list of types of its attributes.
We assume that there are two base types: a non-numerical type b with domain Dom,
and a numerical type n, whose domain is denoted by Num (it could be N, Z, Q, R, for
example).
A type of a relation is a string over the alphabet {b, n}. A relation R of type $a_1 \ldots a_m$
has m columns, the ith one containing entries of type $a_i$. In other words, such a relation
is a finite subset of
$$\prod_{i=1}^{m} \mathrm{dom}(a_i),$$
where dom(b) = Dom and dom(n) = Num. For example, the type of S2(Empl,Salary)
is bn. For a type t, t.i denotes the ith position in the string. The length of t is denoted
by |t|.
A database schema SC is a collection of relation names Ri and their types ti ; we
write Ri : ti if the type of Ri is ti .
Next, we define expressions of relational algebra with aggregates, ALGaggr(Ω, Θ),
parameterized by a collection Ω of functions and predicates on Num, and a collection
Θ of aggregates, over a given schema SC. Expressions are divided into three groups:
the standard relational algebra, arithmetic, and aggregation/grouping. In what follows,
m stands for |t|, and $i_1, \ldots, i_k$ for a sequence $1 \le i_1 < \cdots < i_k \le m$.
3.2. Arithmetic
Semantics. For the relational algebra operations, this is standard. The permutation
operation, for a permutation θ of the positions, replaces each tuple $(a_1, \ldots, a_m)$ by
$(a_{\theta(1)}, \ldots, a_{\theta(m)})$. The condition i = j
in the selection predicate means equality of the ith and the jth attribute: $(a_1, \ldots, a_m)$
is selected if $a_i = a_j$. Note that using Boolean operations we can model arbitrary
combinations of equalities and disequalities among attributes.
For numerical selection, $\sigma[P]_{i_1,\ldots,i_k}$ selects $(a_1, \ldots, a_m)$ iff $P(a_{i_1}, \ldots, a_{i_k})$ holds.
Function application replaces each $(a_1, \ldots, a_m)$ with $(a_1, \ldots, a_m, f(a_{i_1}, \ldots, a_{i_k}))$. Apply[c]#
produces the relation {c}.
The aggregate operation is SQL SELECT Ã, F(A_i) FROM e, where Ã = (A_1, ..., A_m) is
the list of attributes. More precisely, if e evaluates to $\tilde a_1, \ldots, \tilde a_p$ where
$\tilde a_j = (a_j^1, \ldots, a_j^m)$,
then Aggr[i : F](e) replaces each $\tilde a_j$ with $(a_j^1, \ldots, a_j^m, f)$ where $f = F(\{|a_1^i, \ldots, a_p^i|\})$.
Finally, $\mathrm{Group}_l[\lambda S.e](e')$ groups the tuples by the values of their first l attributes
and applies e to the sets formed by this grouping. For example:

a1 b1               a1 | b1                   a1 | d1               a1 d1
a1 b2       →          | b2      --λS.e-->       | d2       →       a1 d2
a2 c1               a2 | c1                   a2 | g1               a2 g1
a2 c2                  | c2

assuming that e returns {d1, d2} when S = {b1, b2}, and e returns {g1} for S = {c1, c2}.
Formally, let e′ evaluate to $\{\tilde a_1, \ldots, \tilde a_p\}$. We split each tuple $\tilde a_j = (a_j^1, \ldots, a_j^m)$ into
$\tilde a'_j = (a_j^1, \ldots, a_j^l)$ that contains the first l attributes, and
$\tilde a''_j = (a_j^{l+1}, \ldots, a_j^m)$ that contains the remaining ones.
This defines, for each $\tilde a'_j$, a set $S_j = \{\tilde a''_r \mid \tilde a'_r = \tilde a'_j\}$. Let
$T_j = \{\tilde b_j^1, \ldots, \tilde b_j^{m_j}\}$ be the result of applying e with S interpreted as $S_j$. Then
$\mathrm{Group}_l[\lambda S.e](e')$ returns the set of tuples of the form $(\tilde a'_j, \tilde b_j^i)$, $1 \le j \le p$, $1 \le i \le m_j$.
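The grouping semantics just described can be mirrored almost literally in code. The sketch below is an illustration, not part of the formal development: relations are Python sets of tuples, the function name `group` and the helper `inner` are invented here, and `inner` plays the role of the expression applied to each group S.

```python
def group(l, inner, rel):
    """Group tuples of rel by their first l attributes and apply inner per group.

    For every tuple, S collects the suffixes of all tuples sharing its
    l-attribute prefix; each tuple returned by inner(S) is then appended
    to that prefix.
    """
    out = set()
    for t in rel:
        prefix = t[:l]
        s = frozenset(u[l:] for u in rel if u[:l] == prefix)
        for b in inner(s):
            out.add(prefix + b)
    return out


def inner(s):
    """Hypothetical stand-in for the grouped expression in the example above."""
    if s == {("b1",), ("b2",)}:
        return {("d1",), ("d2",)}
    if s == {("c1",), ("c2",)}:
        return {("g1",)}
    return set()


rel = {("a1", "b1"), ("a1", "b2"), ("a2", "c1"), ("a2", "c2")}
result = group(1, inner, rel)
print(sorted(result))
```

Running it on the four-tuple relation from the example yields the three tuples of the rightmost table.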
Klug’s algebra. This algebra is one of the most popular theoretical languages for
aggregate functions. It does not split grouping and aggregation, and combines them in
the same operation as follows:
Example. The query (*) from Section 2 is de2ned by the following expression (which
uses the operator combining grouping with aggregation):
$$\pi_{1,4}(\sigma[\ge 100000]_5(\mathrm{Aggr}_1[3 : A,\ 3 : \Sigma](\pi_{2,3,4}(\sigma_{1=3}(S_1 \times S_2))))),$$
where A is the aggregate AVG, Σ is SUM, and ≥100000 is a unary predicate on N
which holds of numbers n ≥ 100000.
Example. The only aggregate that can be applied to non-numerical attributes in SQL
is COUNT, which returns the cardinality of a column. It can be easily expressed in ALGaggr
as long as the summation aggregate Σ and the constant 1 are present. We show how to
define $\mathrm{Count}_m(e)$:
SELECT #1, ..., #(m−1), COUNT(#m)
FROM E
GROUP BY #1, ..., #(m−1)
First, we add a new column, whose elements are all 1s: e1 = e × Apply[1]#. Then define
an expression e′ = Aggr[2 : Σ](S), and use it to produce
$$e_2 = \mathrm{Group}_{m-1}[\lambda S.e'](e_1).$$
This is almost the answer: there are two extra attributes, the mth attribute of e and those
extra 1s. So finally we have
What kind of general statement can one provide that would give us strong evidence
that SQL cannot express recursive queries? For that purpose, we shall use the locality
of queries. Locality was the basis of a number of tools for proving expressivity bounds
of first-order logic [15,13,11], and it was recently studied on its own and applied to
more expressive logics [17,23].
The general idea of this notion is that a query can only look at a small portion
of its input. If the input is a graph, “small” means a neighborhood of a fixed
radius. For example, Fig. 1 shows that reachability is not local: just take a graph like
the one shown in the picture so that there are two points a and b whose distance from
the endpoints and from each other is more than 2r, where r is the fixed radius. Then the
locality of the query says that (a, b) and (b, a) are indistinguishable, as the query can
only look at the r-neighborhoods of a and b. Transitive closure, on the other hand,
does distinguish between (a, b) and (b, a), since b is reachable from a but not vice
versa.
We now define locality formally. We say that a schema SC is purely relational
if there are no occurrences of the numerical type n in it. Let us first restrict our
attention to graph queries. Suppose we have a purely relational schema R : bb; that is,
the relation R contains edges of a directed graph. Suppose e is an expression of the
same type bb; that is, it returns a directed graph. Given a pair of nodes a, b in R, and
a number r ≥ 0, the r-neighborhood of a, b in R, $N_r^R(a, b)$, is the subgraph on the set
of nodes in R whose distance from either a or b is at most r. The distance is measured
in the undirected graph corresponding to R, that is, $R \cup R^{-1}$.
We write $(a, b) \approx_r^R (c, d)$ when the two neighborhoods, $N_r^R(a, b)$ and $N_r^R(c, d)$, are
isomorphic; that is, when there exists a (graph) isomorphism h between them such that
h(a) = c, h(b) = d. Finally, we say that e is local if there is a number r, depending on
e only, such that
$(a, b) \approx_r^R (c, d)$ implies that $(a, b) \in e(R)$ iff $(c, d) \in e(R)$.
We have seen that reachability is not local. Another example of a non-local query is
a typical example of recursive query called same-generation:
sg(x, x) :-
sg(x, y) :- R(x′, x), R(y′, y), sg(x′, y′).
This query is not local either: consider, for example, a graph consisting of two chains:
$(a, b_1), (b_1, b_2), \ldots, (b_{m-1}, b_m)$ and $(a, c_1), (c_1, c_2), \ldots, (c_{m-1}, c_m)$. Assume that
same-generation is local, and r ≥ 0 witnesses that. Take m > 2r + 3, and note that the
r-neighborhoods of $(b_{r+1}, c_{r+1})$ and $(b_{r+1}, c_{r+2})$ are isomorphic. By locality, this
would imply that these pairs agree on the same-generation query, but in fact we have
$(b_{r+1}, c_{r+1}) \in sg(R)$ and $(b_{r+1}, c_{r+2}) \notin sg(R)$.
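The two membership claims at the end of this argument are easy to confirm by evaluating the rules bottom-up. The following sketch uses a naive fixpoint iteration; the function name, the node labels and the choice r = 1, m = 6 are assumptions made for the illustration.

```python
def same_generation(edges):
    """Naive bottom-up evaluation of the two same-generation rules."""
    nodes = {u for e in edges for u in e}
    sg = {(x, x) for x in nodes}              # sg(x, x) :-
    changed = True
    while changed:
        changed = False
        for xp, x in edges:                   # R(x', x)
            for yp, y in edges:               # R(y', y)
                if (xp, yp) in sg and (x, y) not in sg:
                    sg.add((x, y))            # sg(x, y) :- ..., sg(x', y')
                    changed = True
    return sg


r, m = 1, 6                                   # m > 2r + 3
b = ["a"] + [f"b{i}" for i in range(1, m + 1)]
c = ["a"] + [f"c{i}" for i in range(1, m + 1)]
edges = list(zip(b, b[1:])) + list(zip(c, c[1:]))

sg = same_generation(edges)
print((b[r + 1], c[r + 1]) in sg, (b[r + 1], c[r + 2]) in sg)
```

Nodes at equal depth below a end up in the relation, while nodes at different depths do not, even though their small neighborhoods look the same.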
We now state our main result on locality of queries, that applies to the language in
which no limit is placed on the available arithmetic and aggregate functions—all are
available. We denote this language by ALGaggr(All, All).
Theorem 1 (Locality of SQL). Let e be a purely relational graph query in ALGaggr(All,
All), that is, an expression of type bb over the schema of one symbol R : bb. Then e
is local.
That is, neither reachability, nor same-generation, is expressible in SQL over the
base type b, no matter what aggregate functions and arithmetic operations are available.
Inexpressibility of many other queries can be derived from this, for example, tests for
graph connectivity and acyclicity.
Our next goal is to give an elementary, self-contained proof of this result. The
restriction to graph queries used in the theorem is not necessary; the result can be
stated in greater generality, but the restriction to graphs makes the definition of locality
very easy to understand. The proof will consist of three steps:
(1) We introduce an aggregate logic Laggr, as an extension of first-order logic, and
show how ALGaggr queries are translated into it. We do it because it is easier to
prove expressivity bounds for a logic than for an algebra.
(2) We show that we can replace aggregate terms of Laggr by counting quantifiers,
thereby translating Laggr into a simpler logic LC. The price to pay is that LC has
infinitary connectives.
(3) We note that any use of an infinitary connective resulting from translation of Laggr
into LC applies to a rather uniform family of formulae, and use this fact to give
a simple inductive proof of locality of LC formulae.
Our goal here is to introduce a logic Laggr into which we translate ALGaggr expres-
sions. The structures for this logic are precisely relational databases over two base types
with domains Dom and Num; that is, vocabularies are just schemas. This makes the
logic two-sorted; we shall also refer to Dom as first-sort and to Num as second-sort.
We now define formulae and terms of Laggr(Ω, Θ); as before, Ω is a set of predicates
and functions on Num, and Θ is a set of aggregates. The logic is just a slight extension
of the two-sorted first-order logic.
An SC-structure D is a tuple $\langle A, R_1^D, \ldots, R_k^D \rangle$, where A is a finite subset of Dom, and
$R_i^D$ of type $t_i$ is a finite subset of
$$\prod_{j=1}^{|t_i|} \mathrm{dom}_j(D),$$
Theorem 2. Let e : t be an expression of ALGaggr(Ω, Θ). Then there is a formula $\varphi_e(\tilde x)$
of Laggr(Ω, Θ), with $\tilde x$ of type t, such that for any SC-database D,
Proof. For the usual relational algebra operators, this is the same as the standard
textbook translation of algebra expressions into calculus expressions. So we only show
how to translate arithmetic operations, aggregation, and grouping.
• Numerical selection: Let $e' = \sigma[P]_{i_1,\ldots,i_k}(e)$, where P is a k-ary predicate in Ω. Then
$\varphi_{e'}(\tilde x)$ is defined as $\varphi_e(\tilde x) \wedge P(x_{i_1}, \ldots, x_{i_k})$.
• Function application: Let $e' = \mathrm{Apply}[f]_{i_1,\ldots,i_k}(e)$, where $f : \mathrm{Num}^k \to \mathrm{Num}$ is in Ω.
Then $\varphi_{e'}(\tilde x, q) \equiv \varphi_e(\tilde x) \wedge (q = f(x_{i_1}, \ldots, x_{i_k}))$.
• Aggregation: Let $e' = \mathrm{Aggr}[i : F](e)$. Then
$\varphi_{e'}(\tilde x, q) \equiv \varphi_e(\tilde x) \wedge (q = \mathrm{Aggr}_F\, \tilde y.\, (\varphi_e(\tilde y), y_i))$.
• Grouping: Let $e = \mathrm{Group}_m[\lambda S.e_1](e_2)$, where $e_1 : u$ is an expression over SC ∪ {S : s},
and $e_2$ over SC is of type t · s. Let $\tilde x, \tilde y, \tilde z$ be of types t, s, u, respectively.
Then
$$\varphi_e(\tilde x, \tilde z) \equiv \exists \tilde y\, \varphi_{e_2}(\tilde x, \tilde y) \wedge \varphi_{e_1}(\tilde z)[\varphi_{e_2}(\tilde x, \tilde v)/S(\tilde v)],$$
where the second conjunct is $\varphi_{e_1}(\tilde z)$ in which every occurrence of $S(\tilde v)$ is replaced
by $\varphi_{e_2}(\tilde x, \tilde v)$.
The converse does not hold: formulae of Laggr need not define safe queries, while
all ALGaggr queries are safe. It is possible, however, to prove a partial converse result;
see [18] for more details.
We start by stating our main result in greater generality, without restriction to graph
queries.
Let SC be purely relational (no occurrences of type n), and D an instance of SC.
The active domain of D, adom(D), is the set of all elements of Dom that occur in
relations of D. The Gaifman graph of D is the undirected graph G(D) on adom(D)
with (a, b) ∈ G(D) iff a, b belong to the same tuple of some relation in D. The r-sphere
of a ∈ adom(D), $S_r^D(a)$, is the set of all b such that d(a, b) ≤ r, where the distance
d(·,·) is taken in G(D). The r-sphere of $\tilde a = (a_1, \ldots, a_k)$ is $S_r^D(\tilde a) = \bigcup_{i \le k} S_r^D(a_i)$. The
r-neighborhood of $\tilde a$, $N_r^D(\tilde a)$, is a new database, whose active domain is $S_r^D(\tilde a)$, and
whose SC-relations are simply restrictions of those relations in D. We write $\tilde a \approx_r^D \tilde b$
when there is an isomorphism of relational structures $h : N_r^D(\tilde a) \to N_r^D(\tilde b)$ such that in
addition $h(\tilde a) = \tilde b$. Finally, we say that a query e of type b...b is local if there exists a
number r ≥ 0 such that, for any database D, $\tilde a \approx_r^D \tilde b$ implies that $\tilde a \in e(D)$ iff $\tilde b \in e(D)$.
The minimum such r is called the locality rank of e and denoted by lr(e).
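These definitions translate directly into a few helper functions. The sketch below is illustrative only: a database is modelled as a dictionary from relation names to sets of tuples, and the names `gaifman_graph`, `sphere` and `neighborhood` are chosen here rather than taken from the paper.

```python
from itertools import combinations


def gaifman_graph(db):
    """Adjacency map: two elements are adjacent iff they share a tuple."""
    adj = {}
    for tuples in db.values():
        for t in tuples:
            for a in t:
                adj.setdefault(a, set())
            for a, b in combinations(set(t), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj


def sphere(db, points, r):
    """Union of the r-spheres of the given points in the Gaifman graph."""
    adj = gaifman_graph(db)
    seen = set(points)
    frontier = set(points)
    for _ in range(r):
        frontier = {w for u in frontier for w in adj.get(u, ())} - seen
        seen |= frontier
    return seen


def neighborhood(db, points, r):
    """Restriction of every relation of db to the r-sphere of the points."""
    s = sphere(db, points, r)
    return {name: {t for t in tuples if set(t) <= s}
            for name, tuples in db.items()}


# Hypothetical instance with a binary and a ternary relation.
db = {"R": {(1, 2), (2, 3), (3, 4), (4, 5)}, "T": {(1, 5, 9)}}
print(sphere(db, (1,), 1), neighborhood(db, (3,), 1))
```

Note that the ternary tuple (1, 5, 9) makes 1, 5 and 9 pairwise adjacent, so distances in the Gaifman graph can be shorter than in any single binary relation.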
Theorem 3. Let e be a purely relational query in ALGaggr(All, All), that is, an
expression of type b...b over a purely relational schema. Then e is local.
Since ALGaggr(All, All) can be translated into Laggr(All, All), it suffices to prove that
the latter is local. The proof of this is in two steps: we first introduce a simpler counting
logic, LC, and show how to translate Laggr into it. We then give a simple proof of
locality of LC.
The logic LC is simpler than Laggr in that it does not have aggregate terms. There is
a price to pay for this: LC has infinitary conjunctions and disjunctions. However, the
translation ensures that for each infinite conjunction or disjunction, there is a uniform
bound on the rank of formulae in it (to be defined a bit later), and this property
suffices to establish locality.
6.1. Logic LC
The structures for LC are the same as the structures for Laggr. The only terms are
variables (of either sort); in addition, every constant c ∈ Num is a term of the second sort.
Atomic formulae are $R(\tilde x)$, where R ∈ SC, and $\tilde x$ is a tuple of terms (that is, variables
and perhaps constants from Num) of the appropriate sort, and x = y, where x, y are
terms of the same sort.
Formulae are closed under the Boolean connectives and infinitary connectives:
if $\varphi_i$, i ∈ I, is a collection of formulae, then $\bigvee_{i \in I} \varphi_i$ and $\bigwedge_{i \in I} \varphi_i$ are LC formulae.
Furthermore, they are closed under both first-sort and second-sort quantification.
Finally, for every i ∈ N, there is a quantifier ∃i that binds one first-sort variable: that
is, if $\varphi(x, \tilde y)$ is a formula, then $\exists i x\, \varphi(x, \tilde y)$ is a formula whose free variables are $\tilde y$.
The semantics is as follows: $D \models \exists i x\, \varphi(x, \tilde a)$ if there are i distinct elements $b_1, \ldots, b_i \in A$
such that $D \models \varphi(b_j, \tilde a)$, 1 ≤ j ≤ i. That is, the existential quantifier is witnessed by at
least i elements. Note that the first-sort quantification is superfluous as ∃xφ is equivalent
to ∃1x φ.
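The semantics of the counting quantifier is simple enough to state as a one-line evaluator. In this sketch the formula is represented by a Python predicate `phi`, the domain `A` is a finite set, and the function name `exists_at_least` is invented for the illustration.

```python
def exists_at_least(i, domain, phi, *params):
    """Evaluate the counting quantifier: at least i elements b satisfy phi(b, params)."""
    witnesses = sum(1 for b in domain if phi(b, *params))
    return witnesses >= i


# Hypothetical structure: a small directed graph over the domain A.
A = {1, 2, 3, 4}
E = {(1, 2), (1, 3), (2, 3)}


def has_edge_from(x, a):
    """phi(x, a): there is an edge from a to x."""
    return (a, x) in E


print(exists_at_least(2, A, has_edge_from, 1),   # node 1 has two successors
      exists_at_least(2, A, has_edge_from, 2),   # node 2 has only one
      exists_at_least(1, A, has_edge_from, 2))   # ordinary existential
```

With i = 1 the evaluator coincides with ordinary first-sort existential quantification, matching the remark that the latter is superfluous.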
We now introduce the notion of a rank of a formula, rk(φ), for both LC and Laggr.
For LC, this is the quantifier rank, but the second-sort quantification does not count:
• For each atomic φ, rk(φ) = 0.
• For φ = ⋁_i φ_i, rk(φ) = sup_i rk(φ_i), and likewise for ⋀.
• rk(¬φ) = rk(φ).
• rk(∃ix φ) = rk(φ) + 1 for x first-sort; rk(∃k φ) = rk(φ) for k second-sort.
For Laggr, the definition differs slightly.
• For a variable or a constant term, the rank is 0.
• The rank of an atomic formula is the maximum rank of a term in it.
• rk(φ1 ∗ φ2) = max(rk(φ1), rk(φ2)), for ∗ ∈ {∨, ∧}; rk(¬φ) = rk(φ).
• rk(f(τ1, ..., τn)) = max_{1≤i≤n} rk(τi).
• rk(∃x φ) = rk(φ) + 1 if x is first-sort; rk(∃k φ) = rk(φ) if k is second-sort.
• rk(Aggr_F ỹ. (φ, τ)) = max(rk(φ), rk(τ)) + m, where m is the number of first-sort
variables in ỹ.
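The rank definition for LC can be sketched as a recursive function over a formula tree. This is only an illustration under simplifying assumptions: the class names are hypothetical, and infinitary connectives are represented by finite tuples of subformulae, so the supremum becomes a maximum.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Not:
    sub: "Formula"

@dataclass(frozen=True)
class Or:                      # finite stand-in for an infinitary disjunction
    subs: Tuple["Formula", ...]

@dataclass(frozen=True)
class And:                     # finite stand-in for an infinitary conjunction
    subs: Tuple["Formula", ...]

@dataclass(frozen=True)
class Count:                   # counting quantifier ∃i x φ, x of the first sort
    i: int
    var: str
    sub: "Formula"

@dataclass(frozen=True)
class ExistsNum:               # ∃k φ, k of the second sort
    var: str
    sub: "Formula"

Formula = Union[Atom, Not, Or, And, Count, ExistsNum]

def rk(phi: Formula) -> int:
    """Quantifier rank in which second-sort quantification does not count."""
    if isinstance(phi, Atom):
        return 0
    if isinstance(phi, Not):
        return rk(phi.sub)
    if isinstance(phi, (Or, And)):
        return max((rk(s) for s in phi.subs), default=0)
    if isinstance(phi, Count):
        return rk(phi.sub) + 1
    if isinstance(phi, ExistsNum):
        return rk(phi.sub)
    raise TypeError(f"unknown formula node: {phi!r}")
```

For example, a first-sort counting quantifier over a second-sort quantifier over a disjunction that contains one further counting quantifier has rank 2, since only the two first-sort quantifiers contribute.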
This is the longest step in the proof, but although it is somewhat tedious, conceptually
it is quite straightforward.
Proposition 1. For every formula φ(x̃) of Laggr(All, All), there exists an equivalent
formula φ°(x̃) of LC such that rk(φ°) ≤ rk(φ).
Proof. We start by showing that one can define a formula ∃i x̃ φ in LC, whose meaning
is that there exist at least i tuples x̃ such that φ holds. Moreover, its rank equals rk(φ)
plus the number of first-sort variables in x̃. The proof is by induction on the length of
x̃. If x̃ is a single first-sort variable, then the counting quantifier is already in LC. If
k is a second-sort variable, then ∃ik φ(k, ·) is equivalent to ⋁_C ⋀_{c∈C} φ(c, ·), where C
ranges over i-element subsets of Num; this does not increase the rank. Suppose we
can define it for x̃ being of length n. We now show how to define ∃i(y, x̃)φ for y of
the first sort, and ∃i(k, x̃)φ for k of the second sort.
(1) Let ψ(z̃) ≡ ∃i(y, x̃) φ(y, x̃, z̃). It is the case that there are at least i tuples (bj, ãj)
satisfying φ(y, x̃, ·) iff one can find an l-tuple of pairs ((n1, m1), ..., (nl, ml)) with
all mj's distinct, such that
• there are at least nj tuples ã for which the number of elements b satisfying
φ(b, ã, ·) is precisely mj, and
• Σ_{j=1}^{l} nj · mj ≥ i.
Thus, ψ(z̃) is equivalent to
\[ \bigvee \bigwedge_{j=1}^{l} \exists n_j \tilde{x}\, \bigl( \exists! m_j y\; \varphi(y, \tilde{x}, \tilde{z}) \bigr), \]
where the disjunction is taken over all the tuples satisfying nj, mj > 0, mj's distinct,
and Σ_{j=1}^{l} nj · mj ≥ i (it is easy to see that a finite disjunction would suffice), and
∃!nu φ abbreviates ∃nu φ ∧ ¬∃(n + 1)u φ.
The rank of this formula equals rk(∃!mj y φ) = rk(φ) + 1, plus the number of first-sort
variables in x̃ (by the induction hypothesis); that is, rk(φ) plus the number
of first-sort variables in (y, x̃).
(2) Let ψ(z̃) ≡ ∃i(k, x̃) φ(k, x̃, z̃). The proof is identical to the proof above up to the
point of writing down the quantifier ∃!mj k φ(k, ·); it is replaced by the formula
⋁_C (⋀_{c∈C} φ(c, ·) ∧ ⋀_{c∉C} ¬φ(c, ·)), where C ranges over mj-element subsets of
Num. As the rank of this equals rk(φ), we conclude that the rank of the formula
equivalent to ψ(z̃) equals rk(φ) plus the number of first-sort variables in x̃.
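The counting argument in case (1) can be checked on small examples. The sketch below is an illustration only: it represents the satisfying pairs (b, ã) as a Python set, groups the tuples ã by their exact number of witnesses b, and uses the canonical choice of nj (the exact number of tuples with mj witnesses). The names histogram and at_least_by_decomposition are hypothetical.

```python
from collections import Counter
from typing import Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (b, a): element b and tuple a with φ(b, a) true

def histogram(pairs: Set[Pair]) -> Counter:
    """Map each m > 0 to the number of tuples a with exactly m witnesses b."""
    witnesses_per_a = Counter(a for _, a in pairs)
    return Counter(witnesses_per_a.values())

def at_least_by_decomposition(i: int, pairs: Set[Pair]) -> bool:
    """Decide whether at least i pairs exist, using only the pairs (n_j, m_j).

    The keys m of the histogram are distinct by construction, and n is the
    number of tuples with exactly m witnesses, so sum(n * m) counts all pairs.
    """
    hist = histogram(pairs)
    return sum(n * m for m, n in hist.items()) >= i
```

Any other admissible choice of the nj is bounded by this canonical one, which is why testing the canonical decomposition suffices in the sketch.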
This concludes the proof that counting over tuples is definable in LC. With this, we
prove the proposition by induction on the formulae and terms. We also produce, for
each second-sort term τ(x̃) of Laggr, a formula ψ_τ(x̃, z) of LC, with z of the second
sort, such that D |= ψ_τ(ã, q) iff the value of τ(ã) on D is q.
We may assume, without loss of generality, that parameters of atomic Laggr formulae
R(·) and P(·) are tuples of variables: indeed, if a second-sort term occurs in R(· τi ·),
it can be replaced by ∃k (k = τi) ∧ R(· k ·) without increasing the rank. We now define
the translation as follows:
• For a second-sort term t which is a variable q, ψ_t(q, z) ≡ (z = q). If t is a constant
c, then ψ_t(z) ≡ (z = c).
• For an atomic φ of the form x = y, where x, y are first-sort, φ° = φ.
• For an atomic φ of the form P(τ1(x̃), ..., τn(x̃)), φ°(x̃) is
⋁_{(c1,...,cn)∈P} ⋀_{i=1}^{n} ψ_{τi}(x̃, ci).
Note that rk(φ°) = max_i rk(ψ_{τi}) ≤ max_i rk(τi) = rk(φ).
• (φ1 ∨ φ2)° = φ1° ∨ φ2°, (φ1 ∧ φ2)° = φ1° ∧ φ2°, (¬φ)° = ¬φ°, (∃x φ)° = ∃x φ° for x of
either sort. Clearly, this does not increase the rank.
• For a term τ(x̃) = f(τ1(x̃), ..., τn(x̃)), we have
\[ \psi_{\tau}(\tilde{x}, z) = \bigvee_{(c, c_1, \ldots, c_n)\,:\, c = f(\tilde{c})} \Bigl( (z = c) \wedge \bigwedge_{j=1}^{n} \psi_{\tau_j}(\tilde{x}, c_j) \Bigr). \]
where φ°_∞(x̃) tests if the number of ỹ satisfying φ(x̃, ỹ) is infinite, and
produces
the value of the term in the case the number of such ỹ is finite.
Indeed, this formula asserts that either φ(x̃, ·) does not hold and then z = f0, or that
c1, ..., cl are exactly the values of the term τ(x̃, ỹ) when φ(x̃, ỹ) holds, and that ni's
are the multiplicities of the ci's.
A straightforward analysis of the produced formulae shows that rk(ψ_τ′) ≤ max(rk(φ°),
rk(ψ_τ)) plus the number of first-sort variables in ỹ; that is, rk(ψ_τ′) ≤ rk(τ′). This
completes the proof of the proposition.
6.3. LC is local
Formulae of Laggr have finite rank; hence they are translated into LC formulae of
finite rank. We now show by a simple induction argument that those formulae are
local. More precisely, we show that for every finite-rank LC formula φ(x̃, ı̃) (x̃ of
first sort, ı̃ of second sort) over purely relational SC, there exists a number r ≥ 0 such
that ã ≈^D_r b̃ implies D |= φ(ã, ı̃0) ↔ φ(b̃, ı̃0) for any ı̃0. The smallest such r will be
denoted by lr(φ). The proof is based on:
Proof. Fix an isomorphism h : N^D_{3r+1}(ã) → N^D_{3r+1}(b̃) with h(ã) = b̃. For any
c ∈ S^D_{2r+1}(ã), h(c) ∈ S^D_{2r+1}(b̃) has the same isomorphism type of its r-neighborhood.
Thus, for any isomorphism type T of an r-neighborhood of a single element, there are
equally many elements in A − S^D_{2r+1}(ã) and in A − S^D_{2r+1}(b̃) that realize T. Thus, we
have a bijection g : A − S^D_{2r+1}(ã) → A − S^D_{2r+1}(b̃) such that c ≈^D_r g(c). Then ρ can be
defined as h on S^D_{2r+1}(ã), and as g on A − S^D_{2r+1}(ã).
Based on the lemma, we show that every LC formula φ of finite rank is local, with
lr(φ) ≤ (3^{rk(φ)} − 1)/2. Note that for the sequence r_0 = 0, ..., r_{i+1} = 3r_i + 1, ..., we have
r_k = (3^k − 1)/2; we show lr(φ) ≤ r_{rk(φ)}.
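The closed form for the radius sequence follows by induction: if r_k = (3^k − 1)/2, then 3r_k + 1 = (3^{k+1} − 3)/2 + 1 = (3^{k+1} − 1)/2. A short Python check of the recurrence against the closed form (the function name locality_bound is an illustrative choice):

```python
def locality_bound(k: int) -> int:
    """Return r_k for the sequence r_0 = 0, r_{i+1} = 3 * r_i + 1."""
    r = 0
    for _ in range(k):
        r = 3 * r + 1
    return r

# The first few values are 0, 1, 4, 13, 40, matching (3**k - 1) // 2.
first_values = [locality_bound(k) for k in range(5)]
```

The bound therefore grows exponentially in the rank, because each first-sort quantifier triples the radius and adds one.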
The proof of this is by induction on the formulae, and it is absolutely straightforward
for all cases except counting quantifiers. For example, if φ(x̃, ı̃) = ⋁_j φ_j(x̃, ı̃), and
m = rk(φ), then by the hypothesis, lr(φ_j) ≤ r_m, as rk(φ_j) ≤ rk(φ). So fix ı̃0, and let
ã ≈^D_{r_m} b̃. Then D |= φ_j(ã, ı̃0) ↔ φ_j(b̃, ı̃0) for all j by the induction hypothesis, and thus
D |= φ(ã, ı̃0) ↔ φ(b̃, ı̃0).
Now consider the case of the counting quantifier ψ(x̃, ı̃) ≡ ∃iz φ(x̃, z, ı̃). Let rk(φ) = m,
then rk(ψ) = m + 1 and r_{m+1} = 3r_m + 1. Fix ı̃0, and let ã ≈^D_{r_{m+1}} b̃. By the Permutation
Lemma, we get a permutation ρ : A → A such that ãc ≈^D_{r_m} b̃ρ(c). By the hypothesis,
lr(φ) ≤ r_m, and thus D |= φ(ã, c, ı̃0) ↔ φ(b̃, ρ(c), ı̃0). Hence, the number of elements
of A satisfying φ(ã, ·, ı̃0) is exactly the same as the number of elements satisfying
φ(b̃, ·, ı̃0), which implies D |= ψ(ã, ı̃0) ↔ ψ(b̃, ı̃0). This concludes the proof of locality
of LC.
Putting everything together, let e be a purely relational expression of ALGaggr(All, All).
By Theorem 2, it is expressible in Laggr(All, All), and by Proposition 1, by an LC
formula of finite rank. Hence, it is local.
As was mentioned already, previous papers on the expressive power of SQL dealt
with a theoretical language of distinctly different flavor: that is, a functional, typed
language obtained as a restriction of a nested relational algebra with aggregates. In
this section we briefly review that language, and present a translation from it to
ALGaggr(All, All), thereby showing that the results of this paper are at least as strong as
those in [18].
Following [18], we assume that the numerical domain is Q. We define a relational
query language RLaggr(Ω, Θ), parameterized by a collection Ω of allowed arithmetic
functions and predicates and a collection Θ of allowed aggregates. We assume that
the usual arithmetic operations (+, −, ∗, ÷) and the order < on Q are always in Ω
and the summation aggregate (Σ) is always in Θ.
There are three categories of types in RLaggr :
(1) Base types, which are b and Q; we denote them by b, possibly subscripted;
(2) Record types of the form b1 × · · · × bn , where b1 ; : : : ; bn are base types; we denote
them by rt;
(3) Relational types {rt}.
Expressions of the language (over a fixed schema σ) are shown in Fig. 2. We adopt
the convention of omitting the explicit type superscripts in these expressions whenever
they can be inferred from the context.
Fig. 2. Expressions of RLaggr (typing rules, written as premises ⟹ conclusion):
• 0, 1 : Q
• R ∈ SC ⟹ R : type(R)
• e : Q, e1 : t, e2 : t ⟹ if e then e1 else e2 : t
• e : Q × · · · × Q (n times) ⟹ f(e) : Q and P(e) : Q, for f : Q^n → Q and P ⊆ Q^n from Ω
• e1 : b1, ..., en : bn ⟹ (e1, ..., en) : b1 × · · · × bn
• i ≤ n, e : b1 × · · · × bn ⟹ π_{i,n} e : bi
• e1 : b, e2 : b ⟹ =(e1, e2) : Q
• x^rt : rt
• e : rt ⟹ {e} : {rt}
• e1 : {rt}, e2 : {rt} ⟹ e1 ∪ e2 : {rt}
• ∅^rt : {rt}
• F ∈ Θ, e1 : Q, e2 : {rt} ⟹ Aggr_F{e1 | x^rt ∈ e2} : Q
the value of e is 0, then it is the value of e2. The value of (e1, ..., en) is the n-ary tuple
having the values of e1, ..., en at positions 1, ..., n respectively. The value of π_{i,n} e is
the value at the ith position of the n-ary tuple denoted by e. The value of =(e1, e2)
is 1 if e1 and e2 have the same value; otherwise, it is 0. The value of the variable x
is the corresponding a assigned to x in the given substitution. The value of {e} is the
singleton set containing the value of e. The value of e1 ∪ e2 is the union of the two
sets denoted by e1 and e2. The value of ∅ is the empty set.
To define the semantics of ⋃, Σ and Aggr_F, assume that the value of e2 is the set
{b1, ..., bm}. Then the value of ⋃{e1 | x ∈ e2}[x1 := a1, ..., xn := an](D) is defined to be
\[ \bigcup_{i=1}^{m} e_1[x_1 := a_1, \ldots, x_n := a_n, x := b_i](D). \]
The value of Σ{e1 | x ∈ e2}[x1 := a1, ..., xn := an](D) is
\[ \sum_{i=1}^{m} e_1[x_1 := a_1, \ldots, x_n := a_n, x := b_i](D). \]
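The two displayed definitions range over the elements b1, ..., bm of the set denoted by e2 and combine the values of e1 under the substitution x := bi. Reading the first as a union and the second as a sum (an interpretation, since the operator symbols are assumed here), they can be sketched in Python as follows. The function names big_union and big_sum are hypothetical, and e1 is modelled as a Python function of the bound variable x.

```python
from fractions import Fraction
from typing import Callable, FrozenSet, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def big_union(e1: Callable[[T], FrozenSet[U]], e2: Iterable[T]) -> FrozenSet[U]:
    """Union of the sets e1[x := b] for every element b of the set e2."""
    result: FrozenSet[U] = frozenset()
    for b in set(e2):
        result = result | e1(b)
    return result

def big_sum(e1: Callable[[T], Fraction], e2: Iterable[T]) -> Fraction:
    """Sum of the rationals e1[x := b], one summand per element b of the set e2."""
    return sum((e1(b) for b in set(e2)), Fraction(0))

# Hypothetical binary relation used only for illustration.
R = frozenset({("a", "b"), ("b", "c"), ("a", "c")})
sources = big_union(lambda x: frozenset({x[0]}), R)   # first components of R
cardinality = big_sum(lambda x: Fraction(1), R)       # one summand per tuple
```

Because the sum has one summand per element of e2 rather than per distinct value of e1, summing the constant 1 yields the cardinality of the set.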
Previous bounds on the expressive power of aggregation were obtained in the context
of RLaggr or similar (and weaker) languages. We now show that nothing is lost by
going to a more natural (at least for a database person) language ALGaggr . A type of
the form {b × · · · × b} is called relational. A relational query in RLaggr then, just as
a relational query in ALGaggr , is an expression of a relational type over a database in
which every relation is of a relational type. In other words, numbers are not allowed
in the input and output.
Proof. To be able to give an inductive proof, we have to account for non-set types,
numerical types, and free variables in RLaggr expressions.
Define the transformation (·)^set on RLaggr types and values as follows. If t is a base
type or a record type, then t^set = {t}; otherwise t^set = t. We extend this to tuples of
Note that there is a natural correspondence between types of the form (·)^set and ALGaggr
types, and we shall use this correspondence (implicitly) in the proof.
For values, we define x^set = {x} for any x of base or record type, and x^set = x otherwise.
The extension to tuples of values of record types is (x1, ..., xm)^set = x1^set × · · · ×
xm^set. Note that if xi is of type rti, then (x1, ..., xm)^set is of type (rt1, ..., rtm)^set.
We now show the following by induction on the expressions of RLaggr(All, All).
Claim 1. Let e(x1, ..., xm) be an RLaggr(All, All) expression over schema SC, where
each xi is of type rti. Then there exists an ALGaggr(All, All) expression e° over SC
extended with one relation X of type (rt1, ..., rtm)^set such that, for any database D
and any tuple a1, ..., am of values of types rt1, ..., rtm,
The theorem is a special case of this claim for expressions of relational types without
free variables.
We now present the main cases of the translation. If e is a constant c, the translation
is Apply[c]# . Predicates and functions are straightforwardly translated into numerical
selections and function application.
Consider if e1 then e2 else e3 . Since e1 produces 0 or 1, e1◦ produces {0} or {1}.
Thus,
produces the same result as e2 with an all-one column added if e1 is true (1), or the
same result as e3 with an all-zero column added if e1 is false (0). Hence, eliminating
the last column (by projection), gives the translation of if e1 then e2 else e3 .
The translations of product and projection become cartesian product and relational
projections, by the (·)^set translation. For equality of e1, e2 of base types, note that
So far the only non-numerical selection we have seen was of the form σ_{i=j}, testing
equality of two attributes. We now extend the language to ALG^<_aggr by allowing
selections of the form σ_{i<j}(e), where both i and j are of the type b, and < is some fixed
linear ordering on the domain Dom.
This small addition changes the situation dramatically, and furthermore in this case
we cannot make blanket statements like “queries are local”: a lot will depend on
the numerical domain Num and available arithmetic operations. Note that even in the
case of relational calculus without aggregates, it is known that the addition of order
makes it more powerful, even with respect to queries that do not mention the order at all
(cf. [1]).
Let Num = N. We consider a version of ALGaggr that has the most usual set of
arithmetic and aggregate operators: namely, +, ·, < and constants for arithmetic, and the
aggregate Σ. This suffices to express aggregates MIN, MAX, COUNT, SUM, but certainly
not AVG, which produces rational numbers.
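As an informal illustration of why summation together with the order is enough for the standard aggregates over N, the Python sketch below derives COUNT, MAX and MIN from a sum and from comparisons only. It is not the algebraic construction itself; the function names are hypothetical and a column is modelled as a list of natural numbers.

```python
from fractions import Fraction
from typing import List, Optional

def agg_sum(column: List[int]) -> int:
    """SUM is the summation aggregate itself."""
    return sum(column)

def agg_count(column: List[int]) -> int:
    """COUNT sums the constant 1 over every entry of the column."""
    return sum(1 for _ in column)

def agg_max(column: List[int]) -> Optional[int]:
    """MAX selects the value that no other value exceeds, using only <."""
    values = set(column)
    maxima = [v for v in values if not any(v < w for w in values)]
    return maxima[0] if maxima else None

def agg_min(column: List[int]) -> Optional[int]:
    """MIN selects the value that exceeds no other value, using only <."""
    values = set(column)
    minima = [v for v in values if not any(w < v for w in values)]
    return minima[0] if minima else None

column = [3, 1, 3]
average = Fraction(agg_sum(column), agg_count(column))  # 7/3, not a natural number
```

The last line shows the obstacle for AVG: the quotient of SUM and COUNT generally falls outside N.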
We shall use the notations:
• SQL_N for ALGaggr({+, ·, <, 0, 1}, {Σ}), and
• SQL^<_N for ALG^<_aggr({+, ·, <, 0, 1}, {Σ}).
It is sufficient to have constants just for 0 and 1, as all other numbers are definable
with +.
We show how a well-known counting logic FO(C) [3] can be embedded into SQL^<_N.
The importance of this lies in the fact that FO(C) over ordered structures captures a
complexity class, called TC^0 [3,27], for which no nontrivial general lower bounds are
known. In fact, although TC^0 is contained in DLOGSPACE, the containment is not
known to be proper, and to this day we don’t even know if TC^0 = NP. Moreover, there
are indications that proving such a separation result, at least by traditional methods, is
either impossible, or would have some very unexpected cryptographic consequences [29].
Proof. With order and aggregate SUM, one can define the set I = {1, ..., m} where
m = |adom(D)| (by counting the number of elements not greater than each element
in the active domain). Using Apply, one defines the operations + and · (as ternary
relations) and the linear ordering < on I. Then the translation of FO(C) into SQL^<_N
proceeds exactly as the standard translation of relational calculus into relational algebra
(with extra relations for + and ·). The only exception is the counting quantifier case:
ψ(i, ỹ) ≡ ∃ix φ(i, ỹ, x), where ỹ is of length p. Assume that φ is translated into an
expression e that returns a relation with p + 2 attributes. To translate ψ, we use Σ to
count x's, and compare their number with i's, that is,
\[ \pi_{1,\ldots,p+1}\bigl(\sigma_{1 \le p+3}\bigl(\mathrm{Aggr}_{p+1}[\,p+3 : \Sigma\,]\,(e \times \mathrm{Apply}[1]\#)\bigr)\bigr). \]
(Note that we count the number of x's, and thus we first take product with the constant
relation {1}.)
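The translation of the counting quantifier can be traced step by step on a small relation. The Python sketch below is an illustration under stated assumptions: relations are sets of integer tuples, attribute positions are 1-based, and the aggregation operator is read as grouping on the first p + 1 attributes and replacing attribute p + 3 by the sum over the group. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[int, ...]  # assumed encoding: every attribute is an integer

def product_with_one(e: Set[Row]) -> Set[Row]:
    """Product with the constant relation {1}: append a column of ones."""
    return {row + (1,) for row in e}

def aggr_sum(rel: Set[Row], m: int, col: int) -> Set[Row]:
    """Group on the first m attributes; replace attribute col by the group sum."""
    totals: Dict[Row, int] = defaultdict(int)
    for row in rel:
        totals[row[:m]] += row[col - 1]
    return {row[:col - 1] + (totals[row[:m]],) + row[col:] for row in rel}

def select_leq(rel: Set[Row], i: int, j: int) -> Set[Row]:
    """Numerical selection: keep rows whose attribute i is at most attribute j."""
    return {row for row in rel if row[i - 1] <= row[j - 1]}

def project(rel: Set[Row], cols: Iterable[int]) -> Set[Row]:
    """Projection on the listed 1-based attribute positions."""
    cols = list(cols)
    return {tuple(row[c - 1] for c in cols) for row in rel}

def counting_quantifier(e: Set[Row], p: int) -> Set[Row]:
    """Rows (i, y) such that at least i values x satisfy (i, y, x) in e."""
    counted = aggr_sum(product_with_one(e), p + 1, p + 3)
    return project(select_leq(counted, 1, p + 3), range(1, p + 2))

# Example with p = 1: rows are (i, y, x).
e = {(1, 10, 1), (2, 10, 1), (2, 20, 1), (2, 20, 2), (3, 20, 1), (3, 20, 2)}
result = counting_quantifier(e, 1)
```

In the example, y = 10 has one witness and y = 20 has two, so only the rows with i = 1 for y = 10 and i = 2 for y = 20 survive the selection.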
Corollary 1. Assume that reachability is not expressible in SQL^<_N. Then uniform TC^0
is properly contained in NLOGSPACE.
Proposition 2. Every Boolean query in SQL^<_N is contained in P-uniform TC^0.
domain of size n, the largest integer that is contained in the result of any subexpression
of e, does not exceed p_e(n). Given D whose active domain is of size n, let D′ be
D expanded with the relation {1, ..., p_e(n)}. We can then translate SQL^<_N expressions
into circuits just as FO(C) formulae are translated into them, since no subexpression
of e produces an integer that is not contained in the active domain of D′. Clearly,
the function that takes {1, ..., n} and produces {1, ..., p_e(n)} is PTIME, and thus the
circuit for evaluating an expression e on inputs of size n can be produced in PTIME.
Hence, SQL^<_N is contained in P-uniform TC^0.
Notice that the reachability query, even over ordered domains of nodes, is
order-independent; that is, the result does not depend on a particular ordering on the nodes,
just on the graph structure. Could it be that order-independent queries in SQL_N and
SQL^<_N are the same? Of course, such a result would imply that TC^0 is properly
contained in DLOGSPACE, and several papers suggested this approach towards separating
complexity classes. Unfortunately, it does not work, as shown in [17]:
Proof. It was shown in [17] that, on the graph of an n-element successor relation
with an extra predicate P interpreted as the first log2 n elements, one can define the
reachability query restricted to the elements of P in FO(C). Hence it can be done
in SQL^<_N.
Counting abilities of SQLN are essential for this result, as its analog for relational
calculus does not hold [9].
The language SQL^<_N falls short of the class of queries real SQL can define, as it only
uses natural numbers. To deal with rational arithmetic (and thus to permit aggregates
such as AVG), we extend the numerical domain Num to that of rational numbers Q,
and introduce the language
SQL^<_Q as ALG^<_aggr({+, −, ·, ÷, <, 0, 1}, {Σ}).
number, and
D1 = D2 ⇒ eSC(D1) = eSC(D2).
Thus, with the addition of some arithmetic operations, SQL^<_Q can express many
queries; in particular, SQL^<_Q extended with all computable numerical functions expresses
all computable queries over purely relational schemas! In fact, to express all computable
Boolean queries over such schemas, it suffices to add all computable functions from
Q to {0, 1}. In contrast, one can show that adding all computable functions from N
to {0, 1} to SQL^<_N does not give us the same power, as the resulting queries can be
coded by non-uniform TC^0 circuits. Still, the coding is just of theoretical interest; even
for graphs with 20 nodes it can produce codes of the form p/q with p, q relatively
prime, and q > 10^1000; for q > 10^10000 one needs only 60 nodes.
9. Conclusion
Did SQL3 designers really have to introduce recursion, or is it expressible with what
is already there? Our results show that they clearly had a good reason for adding a
Acknowledgements
Although the presentation here is new, it is based entirely on previous results obtained
jointly with other people. Special thanks to Limsoon Wong, with whom many
of those papers were coauthored, and who in fact suggested back in 1993 that we look
at the expressiveness of aggregation. The aggregate logic was developed jointly with
Limsoon, Lauri Hella, and Juha Nurmonen, who also collaborated with me on various
aspects of locality of logics. Simple proofs of locality of logics were discovered in an
attempt to answer some questions posed by Moshe Vardi. For their comments on the
paper I thank Limsoon, Lauri, Juha, Martin Grohe, Thomas Schwentick, Luc Segoufin,
and anonymous referees. Part of this work was done while I was visiting the Verso
group at INRIA-Rocquencourt.
References
[1] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, Reading, MA, 1995.
[2] A.V. Aho, J.D. Ullman, Universality of data retrieval languages, in: Principles of Programming
Languages, ACM Press, New York, 1979, pp. 110–120.
[3] D.M. Barrington, N. Immerman, H. Straubing, On uniformity within NC^1, J. Comput. System Sci.
41 (1990) 274–306.
[4] P. Buneman, S. Naqvi, V. Tannen, L. Wong, Principles of programming with complex objects and
collection types, Theoret. Comput. Sci. 149 (1995) 3–48.
[5] J. Celko, SQL for Smarties: Advanced SQL Programming, Morgan Kaufmann, Los Altos, CA, 2000.
[6] M. Consens, A. Mendelzon, Low complexity aggregation in GraphLog and Datalog, Theoret. Comput.
Sci. 116 (1993) 95–116.
[7] C.J. Date, H. Darwen, A Guide to the SQL Standard, Addison-Wesley, Reading, MA, 1997.
[8] G. Dong, L. Libkin, L. Wong, Local properties of query languages, Theoret. Comput. Sci. 239 (2000)
277–308.
[9] M. Grohe, T. Schwentick, Locality of order-invariant first-order formulas, ACM Trans. Comput. Logic
1 (2000) 112–130.
[10] K. Etessami, Counting quantifiers, successor relations, and logarithmic space, J. Comput. System Sci.
54 (1997) 400–411.
[11] R. Fagin, L. Stockmeyer, M. Vardi, On monadic NP vs monadic co-NP, Inform. Comput. 120 (1995)
78–92.
[12] S. Finkelstein, N. Mattos, I.S. Mumick, H. Pirahesh, Expressing recursive queries in SQL, ANSI
Document X3H2-96-075r1, 1996.
[13] H. Gaifman, On local and non-local properties, Proc. Herbrand Symp., Logic Colloquium ’81,
North-Holland, Amsterdam, 1982.
[14] E. Grädel, Y. Gurevich, Metafinite model theory, Inform. Comput. 140 (1998) 26–81.
[15] W. Hanf, Model-theoretic methods in the study of elementary logic, in: J.W. Addison, et al. (Eds.),
The Theory of Models, North-Holland, Amsterdam, 1965, pp. 132–145.
[16] L. Hella, Logical hierarchies in PTIME, Inform. Comput. 129 (1996) 1–19.
[17] L. Hella, L. Libkin, J. Nurmonen, Notions of locality and their logical characterizations over finite
models, J. Symbolic Logic 64 (1999) 1751–1773.
[18] L. Hella, L. Libkin, J. Nurmonen, L. Wong, Logics with aggregate operators, J. ACM 48 (2001)
880–907.
[19] N. Immerman, Descriptive Complexity, Springer, Berlin, 1998.
[20] A. Klug, Equivalence of relational algebra and relational calculus query languages having aggregate
functions, J. ACM 29 (1982) 699–717.
[21] K.S. Larsen, On grouping in relational algebra, Internat. J. Found. Comput. Sci. 10 (1999) 301–311.
[22] L. Libkin, On the forms of locality over finite models, in: IEEE Symp. on Logic in Computer Science,
IEEE Press, New York, 1997, pp. 204–215.
[23] L. Libkin, Logics with counting and local properties, ACM Trans. Comput. Logic 1 (2000) 33–59.
[24] L. Libkin, L. Wong, Query languages for bags and aggregate functions, J. Comput. System Sci.
55 (1997) 241–272.
[25] L. Libkin, L. Wong, On the power of aggregation in relational query languages, in: Proceedings of
Database Programming Languages, Lecture Notes in Computer Science, Vol. 1369, Springer, Berlin,
1997, pp. 260–280.
[26] P. O’Neil, Database: Principles, Programming, Performance, Morgan Kaufmann, Los Altos, CA, 1994.
[27] I. Parberry, G. Schnitger, Parallel computation and threshold functions, J. Comput. System Sci.
36 (1988) 278–302.
[28] J. Paredaens, D. Van Gucht, Converting nested algebra expressions into flat algebra expressions, ACM
Trans. Database Systems 17 (1992) 65–93.
[29] A. Razborov, S. Rudich, Natural proofs, J. Comput. System Sci. 55 (1997) 24–35.
[30] P. Wadler, Comprehending monads, Math. Struct. Comput. Sci. 2 (1992) 461–493.
[31] L. Wong, Normal forms and conservative extension properties for query languages over collection types,
J. Comput. System Sci. 52 (1996) 495–505.