Query Compiler
(Under Construction)
[expected time to completion: 5 years]
Guido Moerkotte
March 5, 2019
Contents
I Basics
1 Introduction
1.1 General Remarks
1.2 DBMS Architecture
1.3 Interpretation versus Compilation
1.4 Requirements for a Query Compiler
1.5 Search Space
1.6 Generation versus Transformation
1.7 Focus
1.8 Organization of the Book
3 Join Ordering
3.1 Queries Considered
3.1.1 Query Graph
3.1.2 Join Tree
3.1.3 Simple Cost Functions
3.1.4 Classification of Join Ordering Problems
3.1.5 Search Space Sizes
3.1.6 Problem Complexity
3.2 Deterministic Algorithms
3.2.1 Heuristics
3.2.2 Determining the Optimal Join Order in Polynomial Time
3.2.3 The Maximum-Value-Precedence Algorithm
3.2.4 Dynamic Programming
3.2.5 Memoization
3.2.6 Join Ordering by Generating Permutations
3.2.7 A Dynamic Programming based Heuristics for Chain Queries
3.2.8 Transformation-Based Approaches
II Foundations
V Implementation
36 Outlook
Bibliography
Index
E ToDo
Goals
Primary Goals:
• book covers many query languages (at least SQL, OQL, XQuery (XPath))
Secondary Goals:
• book is thin
Acknowledgements
Introduction to query optimization: Günther von Bültzingsloewen, Peter Lockemann
First paper coauthor: Stefan Karl
Coworkers: Alfons Kemper, Klaus Peithner, Michael Steinbrunn, Donald Kossmann, Carsten Gerlhof, Jens Claussen, Sophie Cluet, Vassilis Christophides, Georg Gottlob, V.S. Subramanian, Sven Helmer, Birgitta König-Ries, Wolfgang Scheufele, Carl-Christian Kanne, Thomas Neumann, Norman May, Matthias Brantner, Robin Aly
Discussions: Umesh Dayal, Dave Maier, Gail Mitchell, Stan Zdonik, Tamer Özsu, Arne Rosenthal, Don Chamberlin, Bruce Lindsay, Guy Lohman, Mike Carey, Bennet Vance, Laura Haas, Mohan, CM Park, Yannis Ioannidis, Götz Graefe, Serge Abiteboul, Claude Delobel, Patrick Valduriez, Dana Florescu, Jerome Simeon, Mary Fernandez, Christoph Koch, Adam Bosworth, Joe Hellerstein, Paul Larson, Hennie Steenhagen, Harald Schöning, Bernhard Seeger
Encouragement: Anand Deshpande
Manuscript: Simone Seeger
and many others to be inserted.
Part I
Basics
Chapter 1
Introduction
[Figure: interpretation versus compilation. Under interpretation, a calculus representation of the query is evaluated directly to produce the result. Under compilation, the compile-time system (CTS) rewrites the query and produces an execution plan, which the run-time system (RTS) then executes to produce the result.]
A user can create or drop a relation, add or drop a view, update database items (e.g. tuples, relations, objects), change authorizations, and state a query. Within the book, we will only be concerned with the tiny last item.
interprete(SQLBlock x) {
  s = x.select();          // projection function from the select clause
  f = x.from();            // list of relations in the from clause
  w = x.where();           // predicate from the where clause
  R = ∅;                   // result
  eval(s, f, w, [], R);    // start with the empty tuple
  return R;
}

eval(s, f, w, t, R) {
  if (f.empty()) {
    if (w(t))              // t is complete: check the where predicate
      R += s(t);           // and apply the select clause
  } else {
    foreach (t′ ∈ first(f))                // scan the first relation
      eval(s, tail(f), w, t ◦ t′, R);      // extend t and recurse
  }
}
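For concreteness, here is a minimal runnable transcription of this interpreter in Python. It is only a sketch: relations are modeled as lists of dictionaries, the select clause as a function producing the output value, and the where clause as a predicate over the (partially) assembled tuple; all names are illustrative.

def eval_block(s, f, w, t, R):
    if not f:                    # all from-clause entries consumed
        if w(t):                 # evaluate the where clause
            R.append(s(t))       # apply the select clause
    else:
        for t1 in f[0]:          # first(f): scan the first relation
            eval_block(s, f[1:], w, {**t, **t1}, R)   # t ◦ t1, tail(f)

def interpret(select, from_, where):
    R = []
    eval_block(select, from_, where, {}, R)
    return R

# hypothetical mini instance
Student = [{"sno": 1, "name": "Anton"}, {"sno": 2, "name": "Berta"}]
Attend  = [{"asno": 1, "alno": 7}]
print(interpret(lambda t: t["name"],
                [Student, Attend],
                lambda t: t["sno"] == t["asno"]))   # -> ['Anton']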
The phases of query compilation are summarized in Figure 1.4. First, the query is rewritten. Again, unnesting nested queries is a main technique for performance gains. Other rewrites will be discussed in Part ??. After the rewrite, plan generation takes place. Here, an optimal plan is constructed. Whereas rewrite typically takes place on a calculus-based representation of the query, plan generation constructs an algebraic expression containing well-known operators like selection and join. Sometimes, after plan generation, the generated plan is refined: some polishing takes place. Then, code is generated that can be interpreted by the runtime system. More specifically, the query execution engine—a part of the runtime system—interprets the query execution plan. Let us illustrate this by means of an example query (essentially TPC-H Query 1: it aggregates the Lineitem relation, grouped by return flag and line status, for all line items shipped before a certain date).
[Figure 1.5: query compiler architecture. The query is parsed (CTS), translated by NFST into an internal representation, and then passed through rewrite I, plan generation, and rewrite II (together forming the query optimizer), and finally through code generation, which emits the execution plan.]
The CTS translates this query into a query execution plan. Part of the plan
is shown in Fig. 1.6. One rarely sees a query execution plan. This is the reason
why I included one. But note that the form of query execution plans differs
from DBMS to DBMS since it is (unfortunately) not standardized the way SQL
is. Most DBMSs can give the user abstract representations of query plans. It
is worth the time to look at the plans generated by some commercial DBMSs.
I do not expect the reader to understand the plan in all details. Some of
these details will become clear later. Anyway, this plan is given to the RTS
which then interprets it. Part of the result of the interpretation might look like
this:
RETURNFLAG  LINESTATUS  SUM_QTY  SUM_EXTPR       ...
A           F           3773034  5319329289.68   ...
N           F           100245   141459686.10    ...
N           O           7464940  10518546073.98  ...
R           F           3779140  5328886172.99   ...
This should look familiar to you.
The above query plan is very simple. It contains only a few algebraic operators. Usually, more algebraic operators are present, and the plan is given in a more abstract form that cannot be directly executed by the runtime system. Fig. 2.10 gives an example of a more abstract and more complex operator tree. We will work with representations closer to this one.
A typical query compiler architecture is shown in Figure 1.5. The first component is the parser. It produces an abstract syntax tree. This is not always the case, but this intermediate representation very much simplifies the task of the following component. The NFST component performs several tasks. The first step is normalization. This mainly deals with introducing new variables for subexpressions. Factorization and semantic analysis are also performed during NFST. Last, the abstract syntax tree is translated into the internal representation. All these steps can typically be performed during a single pass through the query representation. Semantic analysis requires looking up schema definitions. This can be expensive and, hence, the number of lookups should be minimized. After NFST, the core optimization steps rewrite I and plan generation take place. Rewrite II does some polishing before code generation. These modules directly correspond to the phases in Figure 1.4. They are typically further divided into submodules handling subphases. The most prominent example is the preparation phase just before the actual plan generation. In our figures, we think of preparation as being part of plan generation.
(group
(tbscan
{segment ’lineitem.C4Kseg’ 0 4096}
{nalslottedpage 4096}
{ctuple ’lineitem.cschema’}
[ 20
LOAD_PTR 1
LOAD_SC1_C 8 1 2 // L_RETURNFLAG
LOAD_SC1_C 9 1 3 // L_LINESTATUS
LOAD_DAT_C 10 1 4 // L_SHIPDATE
LEQ_DAT_ZC_C 4 ’1998-02-09’ 1
] 2 1 // number of help-registers and selection-register
) 10 22 // hash table size, number of registers
[ // init
MV_UI4_C_C 1 0 // COUNT(*) = 0
LOAD_SF8_C 4 1 6 // L_QUANTITY
LOAD_SF8_C 5 1 7 // L_EXTENDEDPRICE
LOAD_SF8_C 6 1 8 // L_DISCOUNT
LOAD_SF8_C 7 1 9 // L_TAX
MV_SF8_Z_C 6 10 // SUM/AVG(L_QUANTITY)
MV_SF8_Z_C 7 11 // SUM/AVG(L_EXTENDEDPRICE)
MV_SF8_Z_C 8 12 // AVG(L_DISCOUNT)
SUB_SF8_CZ_C 1.0 8 13 // 1 - L_DISCOUNT
ADD_SF8_CZ_C 1.0 9 14 // 1 + L_TAX
MUL_SF8_ZZ_C 7 13 15 // SUM(L_EXTDPRICE * (1 - L_DISC))
MUL_SF8_ZZ_C 15 14 16 // SUM((...) * (1 + L_TAX))
] [ // advance
INC_UI4 0 // inc COUNT(*)
MV_PTR_Y 1 1
LOAD_SF8_C 4 1 6 // L_QUANTITY
LOAD_SF8_C 5 1 7 // L_EXTENDEDPRICE
LOAD_SF8_C 6 1 8 // L_DISCOUNT
LOAD_SF8_C 7 1 9 // L_TAX
MV_SF8_Z_A 6 10 // SUM/AVG(L_QUANTITY)
MV_SF8_Z_A 7 11 // SUM/AVG(L_EXTENDEDPRICE)
MV_SF8_Z_A 8 12 // AVG(L_DISCOUNT)
SUB_SF8_CZ_C 1.0 8 13 // 1 - L_DISCOUNT
ADD_SF8_CZ_C 1.0 9 14 // 1 + L_TAX
MUL_SF8_ZZ_B 7 13 17 15 // SUM(L_EXTDPRICE * (1 - L_DISC))
MUL_SF8_ZZ_A 17 14 16 // SUM((...) * (1 + L_TAX))
] [ // finalize
UIFC_C 0 18
DIV_SF8_ZZ_C 10 18 19 // AVG(L_QUANTITY)
DIV_SF8_ZZ_C 11 18 20 // AVG(L_EXTENDEDPRICE)
DIV_SF8_ZZ_C 12 18 21 // AVG(L_DISCOUNT)
] [ // hash program
HASH_SC1 2 HASH_SC1 3
] [ // compare program
CMPA_SC1_ZY_C 2 2 0
EXIT_NEQ 0
CMPA_SC1_ZY_C 3 3 0
])
Figure 1.6: Execution plan
1.4 Requirements for a Query Compiler
The requirements for a query compiler can be summarized in the following list:
1. Correctness
2. Completeness
3. Generation of optimal plans
4. Efficiency
5. Graceful degradation
6. Robustness
First of all, the query compiler must produce correct query evaluation plans.
That is, the result of the query evaluation plan must be the result of the query
as given by the specification of the query language. It must also cover the
complete query language. The next issue is that an optimal query plan must
(should) be generated. However, this is not always that easy. That is why some
database researchers say that one must avoid the worst plan. Talking about
the quality of a plan requires us to fix the optimization goal. Several goals are
reasonable: We can optimize throughput, minimize response time, minimize
resource consumption (both, memory and CPU), and so on. A good query
compiler supports two optimization goals: minimize resource consumption and
minimize the time to produce the first tuple. Obviously, both goals cannot be
achieved at the same time. Hence, the query compiler must be instructed about
the optimization goal.
Irrespective of the optimization goal, the query compiler should produce the
query evaluation plan fast. It does not make sense to take 10 seconds to optimize
a query whose execution time is below a second. This sounds reasonable but
is not trivial to achieve. As we will see, the number of query execution plans
that are equivalent to a given query, i.e. produce the same result as the query,
can be very large. Sometimes, very large even means that not all plans can
be considered. Taking the wrong approach to plan generation will result in no
plan at all. This is the opposite of graceful degradation. Expressed positively, graceful degradation means that in case of limited resources, a plan is generated that may not be optimal, but is not too far from the optimal plan.
Last, typical software quality criteria should be met. We only mentioned robustness in our list, but others like maintainability must also be met.
[Figure: the set of plans equivalent to a query forms the potential search space; the actual search space explored by a plan generator is a subset of it.]
1.7 Focus
In this book, we consider only the compilation of queries. We leave out many special aspects like query optimization for multimedia database systems or multidatabase systems. These are just two omissions. We further do not consider the translation of update statements, which — especially in the presence of triggers — can become quite complex. Furthermore, we assume the reader to be familiar with the fundamentals of database systems [260, 476, 637, 696, 805] and their implementation [397, 312]. Especially, knowledge of query execution engines is required [341].
Last, the book presents a very personal view on query optimization. To see other views on the same topic, I strongly recommend reading the literature cited in this book and the references found therein. A good start are overview articles, PhD theses, and books, e.g. [889, 318, 439, 440, 460, 534, 599, 602, 649, 819, 839, 873, 874].
1.8 Organization of the Book
Chapter 3 discusses join ordering. There are three reasons for treating it this early and in this depth. The first reason is that the material of this chapter forms the core of every plan generator. The second reason is that this problem allows us to discuss issues like search space sizes and problem complexities. The third reason is that we do not have to delve into details: we can stick to very simple (you might call them unrealistic) cost functions and do not have to concern ourselves with details of the runtime system and the like. Expressed positively, we can concentrate on some algorithmic aspects of the problem. In Chapter 4 we do the opposite. The reader will not find any advanced algorithms there, but plenty of details on disks and cost functions. The goal of the rest of the book is then to bring these issues together, broaden the scope of the chapters, and treat problems not even touched by them. The main issue not touched is query rewrite.
Chapter 2
Textbook Query Optimization
Those attributes belonging to the key of a relation have been underlined. For this chapter, it suffices to know the relations Student(SNo, SName, …), Attend(ASNo, ALNo, …), Lecture(LNo, LPNo, LTitle, …), and Professor(PNo, PName, …); only the attributes used here are listed. With the following query, we ask for all students attending a lecture given by a professor called “Larson”:

select distinct s.SName
from Student s, Attend a, Lecture l, Professor p
where s.SNo = a.ASNo and a.ALNo = l.LNo
  and l.LPNo = p.PNo and p.PName = ’Larson’
2.2 Algebra
Let us briefly recall the standard definition of the most important algebra-
ic operators. Their inputs are relations, that is sets of tuples. Sets do not
contain duplicates. The attributes of the tuples are assumed to be simple (non-
decomposable) values. The most common algebraic operators are defined in
Fig. 2.1. Although the common set operations union (∪), intersection (∩), and set difference (\) belong to the relational algebra, we did not list them. Remember that ∪ and ∩ are both commutative and associative, while \ is neither. Further, for ∪ and ∩, two distributivity laws hold. However, since these operations are not used in this section, we refer to Figure 7.1 in Section 7.1.1.
Before we can understand Figure 2.1, we must clarify some terms and notations. For us, a tuple is a mapping from a set of attribute names (or attributes for short) to their corresponding values. These values are taken from certain domains. An actual tuple is written in brackets enclosing a comma-separated list of items of the form attribute name, colon, attribute value, as in [name: ‘‘Anton’’, age: 2]. If we have two tuples with different attribute names, they can be concatenated, i.e. we can take the union of their attributes. Tuple concatenation is denoted by ‘◦’. For example, [name: ‘‘Anton’’, age: 2] ◦ [toy: ‘‘digger’’] results in [name: ‘‘Anton’’, age: 2, toy: ‘‘digger’’]. Let A and A′ be two sets of attributes where A′ ⊆ A holds, and let t be a tuple with schema A. Then, we can project t on the attributes in A′ (written as t.A′). The resulting tuple contains only the attributes in A′; all others are discarded. For example, if t is the tuple [name: ‘‘Anton’’, age: 2, toy: ‘‘digger’’] and A′ = {name, age}, then t.A′ is the tuple [name: ‘‘Anton’’, age: 2].
A relation is a set of tuples with the same attributes. The schema of a
relation is the set of attributes. For a relation R this is sometimes denoted by
sch(R), the schema of R. We denote it by A(R) and extend this notation to any algebraic expression producing a set of tuples. That is, A(e) is the set of attributes of the relation that an algebraic expression e produces. Consider the predicate
age = 2 where age is an attribute name. Then, age behaves like a free variable
that must be bound to some value before the predicate can be evaluated. This
motivates us to often use the terms attribute and variable synonymously. In the
above predicate, we would call age a free variable. The set of free variables of
an expression e is denoted by F(e).
Sometimes it is useful to work with sequences of attributes in comparison predicates. Let A = ⟨a1, …, ak⟩ and B = ⟨b1, …, bk⟩ be two attribute sequences. Then for any comparison operator θ ∈ {=, ≤, <, ≥, >, ≠}, the expression A θ B abbreviates a1 θ b1 ∧ a2 θ b2 ∧ … ∧ ak θ bk.
Often, a natural join is defined. Consider two relations R1 and R2. Define Ai := A(Ri) for i ∈ {1, 2}, and A := A1 ∩ A2 = {a1, …, an}. If A is non-empty, the natural join is defined as

R1 ⋈ R2 := ΠA1∪A2(R1 ⋈p ρA:A′(R2))

where ρA:A′ renames the attributes ai in A to a′i in A′, and the predicate p has the form A = A′, i.e. a1 = a′1 ∧ … ∧ an = a′n.
For our algebraic operators, several equivalences hold. They are given in
Figure 2.2. For them to hold, we typically require that the relations involved
have disjoint attribute sets. That is, we assume—even for the rest of the book—
that attribute names are unique. This is often achieved by using the notation
R.a for a relation R or v.a for a variable ranging over tuples with an attribute
a. Another possibility is to use the renaming operator ρ.
Some equivalences are not always valid. Their validity depends on whether
some condition(s) are satisfied or not. For example, Eqv. 2.4 requires F(p) ⊆ A.
That is, all attribute names occurring in p must be contained in the attribute set
A the projection retains: otherwise, we could not evaluate p after the projection
has been applied. Although all conditions in Fig. 2.2 are of this flavor, we will
see throughout the course of the book that more complex conditions exist.
select distinct a1 , a2 , . . . , am
from R1 c1 , R2 c2 , . . . , Rn cn
where p
Here, the Ri are relation names and the ci are correlation names. The ai in the select clause are attribute names (or expressions of the form ci.ai) taken from the relations in the from clause. The predicate p is assumed to be a conjunction of comparisons between attributes or attributes and constants.
The translation process then follows the procedure described in Figure 2.3:

1. Construct an expression that produces the cross product of the entries R1, …, Rn in the from clause:
   ((… ((R1 × R2) × R3) …) × Rn)
2. Put the where predicate p on top as a selection:
   W := σp((… ((R1 × R2) × R3) …) × Rn)
3. Let s be the content of the select distinct clause. For the canonical translation, it must be either ’*’ or a list a1, …, an of attribute names. Construct the expression
   S := W             if s = ’*’
   S := Πa1,…,an(W)   if s = a1, …, an
4. Return S.
For our example query, this canonical translation yields

Πs.SName(σp(((Student[s] × Attend[a]) × Lecture[l]) × Professor[p]))

where p equals s.SNo = a.ASNo and a.ALNo = l.LNo and l.LPNo = p.PNo and p.PName = ‘Larson’.
Note that we used the notation R[r] to say that a relation named R has the
correlation name r. During the course of the book we will be more precise
about the semantics of this notation and it will deviate from the one suggested
here. We will take r as a variable successively bound to the elements (tuples)
in R. However, for the purpose of this chapter it is sufficient to think of it
as associating a correlation name with a relation. The query is represented
graphically in Figure 2.7 (top).
Breaking up the conjunctive selection predicate (Step 1) turns the single selection into a cascade of selections:

Πs.SName(
  σs.SNo=a.ASNo(
    σa.ALNo=l.LNo(
      σl.LPNo=p.PNo(
        σp.PName=‘Larson’(
          ((Student[s] × Attend[a]) × Lecture[l]) × Professor[p])))))

Pushing the selections down (Step 2) then yields

Πs.SName(
  σl.LPNo=p.PNo(
    σa.ALNo=l.LNo(
      σs.SNo=a.ASNo(Student[s] × Attend[a])
      × Lecture[l])
    × σp.PName=‘Larson’(Professor[p])))
As a small excursion, consider a slightly different query that additionally restricts the lecture title. After translation and Steps 1 and 2, its algebraic expression looks like

Πs.SName(
  σs.SNo=a.ASNo(
    σa.ALNo=l.LNo(
      (Student[s] × σl.LTitle=‘Databases I’(Lecture[l])) × Attend[a]))).
Neither σs.SNo=a.ASNo nor σa.ALNo=l.LNo can be pushed down further. Only after reordering the cross products, as in

Πs.SName(
  σs.SNo=a.ASNo(
    σa.ALNo=l.LNo(
      (Student[s] × Attend[a]) × σl.LTitle=‘Databases I’(Lecture[l]))))

can the remaining selections be pushed down, giving

Πs.SName(
  σa.ALNo=l.LNo(
    σs.SNo=a.ASNo(Student[s] × Attend[a])
    × σl.LTitle=‘Databases I’(Lecture[l])))
This is the reason why some textbooks reorder cross products before selections are pushed down [260]. In this approach, the reordering of cross products takes into account the selection predicates that can possibly be pushed down to the leaves or to just above a cross product. In any case, Steps 2 and 4 are highly interdependent, and there is no simple solution. ✷
After this small excursion let us resume rewriting our main example query.
The next step to be applied is converting cross products to join operations (Step
3). The motivation behind this step is that the evaluation of cross products
is very expensive and results in huge intermediate results. For every tuple in the left input, an output tuple must be produced for every tuple in the right input. A join operation can be implemented much more efficiently. Applying Equivalence 2.15 from left to right to our example query results in

Πs.SName(
  ((Student[s] ⋈s.SNo=a.ASNo Attend[a])
    ⋈a.ALNo=l.LNo Lecture[l])
  ⋈l.LPNo=p.PNo σp.PName=‘Larson’(Professor[p]))
All students and their attendances to some lecture are considered. The result
and hence the input to the next join will be very big. On the other hand, if there
is only one professor named Larson, the output of σp.P N ame=‘Larson0 (Prof essor[p])
is a single tuple. Joining this single tuple with the relation Lecture results in
an output containing one tuple for every lecture taught by Larson. For a large
university, this will be a small subset of all lectures. Continuing this line, we
get the following algebraic expression:
Πs.SName(
  ((σp.PName=‘Larson’(Professor[p])
    ⋈p.PNo=l.LPNo Lecture[l])
   ⋈l.LNo=a.ALNo Attend[a])
  ⋈a.ASNo=s.SNo Student[s])
Finally, pushing the projections down, so that every operator produces only the attributes actually needed further up, yields

Πs.SName(
  Πa.ASNo(
    Πl.LNo(
      Πp.PNo(σp.PName=‘Larson’(Professor[p]))
      ⋈p.PNo=l.LPNo
      Πl.LPNo,l.LNo(Lecture[l]))
    ⋈l.LNo=a.ALNo
    Πa.ALNo,a.ASNo(Attend[a]))
  ⋈a.ASNo=s.SNo
  Πs.SNo,s.SName(Student[s]))
is thus an enforcer since it makes sure that the required property holds. As we
will see, properties and enforcers play a crucial role during plan generation.
If common subexpressions are detected at the algebraic level, it might be beneficial to compute them only once and store the result. To do so, a tmp operator must be introduced. Later on, we will see more of these operators that materialize (partial) intermediate results in order to avoid the same computation being performed more than once. An alternative is to allow QEPs that are DAGs and not merely trees (see Section ??).
Physical query optimization is concerned with all the issues mentioned
above. The outline of it is given in Figure 2.9. Let us demonstrate this for
our small example query. Let us assume that there exists an index on the name
of the professors. Then, instead of scanning the whole professor relation, it
is beneficial to use the index to retrieve only those professors named Larson.
Further, since a sort merge join is very robust and not the slowest alternative,
we choose it as an implementation for all our join operations. This requires that
we sort the inputs to the join operator on the join attributes. Since sorting is
a pipeline breaker, we introduce it between the projections and the joins. The
resulting plan is
Πs.SName(
  Sorta.ASNo(Πa.ASNo(
    Sortl.LNo(Πl.LNo(
      Sortp.PNo(Πp.PNo(IdxScanp.PName=‘Larson’(Professor[p])))
      ⋈smj p.PNo=l.LPNo
      Sortl.LPNo(Πl.LPNo,l.LNo(Lecture[l])))
    ⋈smj l.LNo=a.ALNo
    Sorta.ALNo(Πa.ALNo,a.ASNo(Attend[a]))))
  ⋈smj a.ASNo=s.SNo
  Sorts.SNo(Πs.SNo,s.SName(Student[s])))
where we annotated the joins with smj to indicate that they are sort-merge joins. The sort operator has the attributes on which to sort as a subscript. We cheated a little bit with the notation of the index scan. The index is a physical entity stored in the database. An index scan typically allows us to retrieve the TIDs of the tuples satisfying the predicate. If this is the case, another access to the relation itself is necessary to fetch the relevant attributes (p.PNo in our case) from the qualifying tuples of the relation. This issue is rectified in Chapter 4. The plan is shown as an operator graph in Figure 2.10.
2.6 Discussion
This chapter left open many interesting issues. We took it for granted that the representation of a query is an algebraic expression or operator tree. Is this really true? We have been very vague about ordering joins and cross products. We
only considered queries of the form select distinct. How can we assure correct
duplicate treatment for select all? We separated query optimization into two
distinct phases: logical and physical query optimization. Any separation into different phases results in the danger of not producing an optimal plan. Logical
query optimization turned out to be a little difficult: pushing selections down
and reordering joins are mutually interdependent. How can we integrate these
steps into a single one and thereby avoid the problem mentioned? Further, our
logical query optimization was not cost based and cannot be: too much infor-
mation is still missing from the plan to associate precise costs with a logical
algebraic expression. How can we integrate the phases? How can we determine
the costs of a plan? We covered only a small fraction of SQL. We did not discuss
disjunction, negation, union, intersection, except, aggregate functions, group-
by, order-by, quantifiers, outer joins, and nested queries. Furthermore, how
about other query languages like OQL, XPath, XQuery? Further, enhance-
ments like materialized views exist nowadays in many commercial systems.
How can we exploit them beneficially? Can we exploit semantic information?
Is our exploitation of index structures complete? What happens if we encounter
NULL-values? Many questions and open issues remain. The rest of the book
is about filling these gaps.
[Figure 2.6: the join trees for the example query, an enumeration of the possible operator trees over the relations s, a, l, and p built from joins (⋈) and cross products (×).]
[Figure 2.7: operator trees for the example query before and after the logical query optimization steps discussed above.]
[Figure 2.10: Plan for example query after physical query optimization]
Chapter 3
Join Ordering
The problem of join ordering is a very restricted and — at the same time — a very complex one. We have touched on this issue while discussing logical query optimization in Chapter 2. Join ordering is performed in Step 4 of Figure 2.5.
In this chapter, we simplify the problem of join ordering by not considering du-
plicates, disjunctions, quantifiers, grouping, aggregation, or nested queries. Ex-
pressed positively, we concentrate on conjunctive queries with simple and cheap
join predicates. What this exactly means will become clear in the next section.
Subsequent sections discuss different algorithms for solving the join ordering
problem. Finally, we take a look at the structure of the search space. This is
important if different join ordering algorithms are compared via benchmarks.
If the wrong parameters are chosen, benchmark results can be misleading.
The algorithms of this chapter form the core of every plan generator.
[Figure: the query graph of the example query. Its edges carry the join predicates s.SNo = a.ASNo (Student–Attend), a.ALNo = l.LNo (Attend–Lecture), and l.LPNo = p.PNo (Lecture–Professor); the selection p.PName = ’Larson’ is attached to Professor.]
select *
from R1, R2, R3, R4
where f(R1.a, R2.a,R3.a) = g(R2.b,R3.b,R4.b)
((((R2 ⋈ R3) ⋈ R1) ⋈ R4) ⋈ R5)
For two relations Ri and Rj and a join predicate pi,j between them, the selectivity of pi,j is defined as

fi,j = |Ri ⋈pi,j Rj| / (|Ri| · |Rj|)

This is the number of tuples in the join’s result divided by the number of tuples in the Cartesian product between Ri and Rj. If fi,j is 0.1, then only 10% of all tuples in the Cartesian product survive the predicate pi,j. Note that the selectivity is always a number between 0 and 1 and that fi,j = fj,i. We use an f and not an s, since the selectivity of a predicate is often called filter factor.
Besides the relations’ cardinalities, the selectivities of the join predicates pi,j are assumed to be given as input to the join ordering algorithm. Therefore, we can compute the output cardinality of a join Ri ⋈pi,j Rj as

|Ri ⋈pi,j Rj| = fi,j · |Ri| · |Rj|

From this it becomes clear that if there is no join predicate between two relations Ri and Rj, we can assume a join predicate true and associate a selectivity of 1 with it. The output cardinality is then the cardinality of the cross product.
Note that this formula assumes that the selectivities are independent of each
other. Assuming independence is common but may be very misleading. More
on this issue can be found in Chapter ??. Nevertheless, we assume independence
and stick to the above formula.
For sequences of joins, we can give a simple cardinality definition. Let s = R1, …, Rn be a sequence of relations. Then

|s| = ∏k=1..n ((∏i=1..k−1 fi,k) · |Rk|)
Given the above, a query graph alone is not really sufficient for the speci-
fication of a join ordering problem: cardinalities and selectivities are missing.
On the other hand, from a complete list of cardinalities and selectivities we can
derive the query graph. Obviously, the following defines a chain query with query graph R1 – R2 – R3:
|R1 | = 10
|R2 | = 100
|R3 | = 1000
f1,2 = 0.1
f2,3 = 0.2
In all examples, we assume for all i and j for which fi,j is not given that there
is no join predicate and hence fi,j = 1.
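For example, under the independence assumption these numbers determine all intermediate result sizes:

|R1 ⋈ R2| = f1,2 · |R1| · |R2| = 0.1 · 10 · 100 = 100
|R2 ⋈ R3| = f2,3 · |R2| · |R3| = 0.2 · 100 · 1000 = 20000
|R1 ⋈ R2 ⋈ R3| = f1,2 · f2,3 · |R1| · |R2| · |R3| = 20000

These values reappear in the cost table below.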
We now come to cost functions. The first cost function we consider is called Cout. For a join tree T, Cout(T) is the sum of the output cardinalities of all joins in T. Recursively, we can define Cout as

Cout(T) = 0                           if T is a single relation
Cout(T) = |T| + Cout(T1) + Cout(T2)   if T = T1 ⋈ T2
Since the cost functions of real systems are too complex for this section, we stick to simple cost functions proposed by Krishnamurthy, Boral, and Zaniolo [512]. They argue that these cost functions are appropriate for main memory database systems. For the three join implementations nested-loop join (nlj), hash join (hj), and sort-merge join (smj), they give the following cost functions:

Cnlj(e1 ⋈p e2) = |e1| · |e2|
Chj(e1 ⋈p e2) = h · |e1|
Csmj(e1 ⋈p e2) = |e1| log(|e1|) + |e2| log(|e2|)

where the ei are join trees and h is the average length of the collision chain in the hash table. We will assume h = 1.2. All these cost functions are defined for a
single join operator. The cost of a join tree is defined as the sum of the costs of
all joins it contains. We use the symbols Cx to also denote the costs of not only
a single join but the costs of the whole tree. Hence, for sequences s of relations,
we have
Cnlj(s) = Σi=2..n |s1, …, si−1| · |si|
Chj(s) = Σi=2..n 1.2 · |s1, …, si−1|
Csmj(s) = Σi=2..n |s1, …, si−1| · log(|s1, …, si−1|) + Σi=2..n |si| · log(|si|)
Some notes on the cost functions are in order. First, note that these cost functions are a little incomplete even for main memory. For example, constant factors are missing. Second, the cost functions are mainly devised for left-deep trees. This becomes apparent when looking at the costs of hash joins: it is assumed that the right input is already stored in an appropriate hash table. Obviously, this can only hold for base relations, giving rise to left-deep trees. Third, Chj and Csmj do not work for cross products. However, we can extend these cost functions by defining the cost of a cross product to be equal to its output cardinality, which happens to be its cost under Cnlj. Fourth, in reality, more complex cost functions are used, and other parameters like the width of the tuples—i.e. the number of bytes needed to store them—also play an important role. Fifth, the above cost functions assume that the same join algorithm is chosen throughout the whole plan. In practice, this will not be true.
For the above chain query, we compute the costs of different join trees. The
last join tree contains a cross product.
                  Cout    Cnlj     Chj    Csmj
R1 ⋈ R2           100     1000     12     697.61
R2 ⋈ R3           20000   100000   120    10630.26
R1 × R3           10000   10000    10000  10000.00
(R1 ⋈ R2) ⋈ R3    20100   101000   132    11327.86
(R2 ⋈ R3) ⋈ R1    40000   300000   24120  32595.00
(R1 × R3) ⋈ R2    30000   1010000  22000  143542.00

For the calculation of Cout, note that |R1 ⋈ R2 ⋈ R3| = 20000 is included in all of the last three lines of its column. For the nested-loop cost function, the costs of the complete join trees are calculated as follows:

Cnlj((R1 ⋈ R2) ⋈ R3) = |R1| · |R2| + |R1 ⋈ R2| · |R3| = 1000 + 100 · 1000 = 101000
Cnlj((R2 ⋈ R3) ⋈ R1) = |R2| · |R3| + |R2 ⋈ R3| · |R1| = 100000 + 20000 · 10 = 300000
Cnlj((R1 × R3) ⋈ R2) = |R1| · |R3| + |R1 × R3| · |R2| = 10000 + 10000 · 100 = 1010000

From the table, we observe the following:
• The costs of the same join tree differ under the different cost functions.
• The cheapest join tree is (R1 B R2 ) B R3 under all four cost functions.
• The join order matters even for join trees without cross products.
We would like to emphasize that the join order is also relevant under other cost
functions.
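As a quick sanity check, here is a small Python sketch (an illustration only, not part of the original development) that recomputes the Cout, Cnlj, and Chj columns of the table from the query specification; Csmj is omitted since it additionally depends on the chosen logarithm base. The convention that a cross product costs its output cardinality is built in.

card = {"R1": 10, "R2": 100, "R3": 1000}
sel  = {frozenset(("R1", "R2")): 0.1, frozenset(("R2", "R3")): 0.2}

def f(a, b):                        # selectivity; 1 if no join predicate
    return sel.get(frozenset((a, b)), 1.0)

def size(rels):                     # |s| under the independence assumption
    n = 1.0
    for i, r in enumerate(rels):
        n *= card[r]
        for q in rels[:i]:
            n *= f(q, r)
    return n

def costs(seq):
    c_out = c_nlj = c_hj = 0.0
    for i in range(1, len(seq)):
        pre, nxt = seq[:i], seq[i]
        out = size(seq[:i + 1])     # output cardinality of this join
        cross = all(f(q, nxt) == 1.0 for q in pre)   # no predicate applies
        c_out += out
        c_nlj += size(pre) * card[nxt]
        c_hj  += out if cross else 1.2 * size(pre)
    return c_out, c_nlj, c_hj

for s in (("R1", "R2", "R3"), ("R2", "R3", "R1"), ("R1", "R3", "R2")):
    print(s, costs(s))
# ('R1','R2','R3') -> (20100.0, 101000.0, 132.0)
# ('R2','R3','R1') -> (40000.0, 300000.0, 24120.0)
# ('R1','R3','R2') -> (30000.0, 1010000.0, 22000.0)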
Avoiding cross products is not always beneficial, as the following query specification shows:
|R1 | = 1000
|R2 | = 2
|R3 | = 2
f1,2 = 0.1
f1,3 = 0.1
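For instance, under Cout we obtain (using the independence assumption)

|R1 ⋈ R2| = 0.1 · 1000 · 2 = 200
|R2 × R3| = 2 · 2 = 4
|R1 ⋈ R2 ⋈ R3| = 0.1 · 0.1 · 1000 · 2 · 2 = 40

and hence

Cout((R2 × R3) ⋈ R1) = 4 + 40 = 44
Cout((R1 ⋈ R2) ⋈ R3) = 200 + 40 = 240 = Cout((R1 ⋈ R3) ⋈ R2)

The join tree starting with the cross product of the two tiny relations is more than five times cheaper than the best join tree avoiding the cross product.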
Note that although the absolute numbers are quite small, the ratio of the best
and the second best join tree is quite large. The reader is advised to find more
examples and to apply other cost functions.
The following example illustrates that a bushy tree can be superior to any
linear tree. Let us use the following query specification:
|R1 | = 10
|R2 | = 20
|R3 | = 20
|R4 | = 10
f1,2 = 0.01
f2,3 = 0.5
f3,4 = 0.01
If we do not consider cross products, we have for the symmetric (see below)
cost function Cout the following join trees and costs:
Join Tree Cout
R1 B R2 2
R2 B R3 200
R3 B R4 2
((R1 B R2 ) B R3 ) B R4 24
((R2 B R3 ) B R1 ) B R4 222
(R1 B R2 ) B (R3 B R4 ) 6
Note that all other linear join trees fall into one of these classes, due to the
symmetry of the cost function and the join ordering problem. Again, the reader
is advised to find more examples and to apply other cost functions.
If we want to annotate a join operator by its implementation—which is necessary for the correct computation of costs—we write ⋈impl for an implementation impl. For example, ⋈smj is a sort-merge join, and the corresponding cost function Csmj is used to compute its costs.
Two properties of cost functions have some impact on the join ordering
problem. The first is symmetry. A cost function Cimpl is called symmetric if Cimpl(R1 ⋈impl R2) = Cimpl(R2 ⋈impl R1) for all relations R1 and R2. For symmetric cost functions, it does not make sense to consider commutativity. Hence, it suffices to consider left-deep trees only if we want to restrict ourselves to linear join trees. Note that Cout, Cnlj, and Csmj are symmetric, while Chj is not.
The other property is the adjacent sequence interchange (ASI) property.
Informally, the ASI property states that there exists a rank function such that
the order of two subsequences is optimal if they are ordered according to the
rank function. The ASI property is formally defined in Section 3.2.2. Only for
tree queries and cost functions with the ASI property, a polynomial algorithm
to find an optimal join order is known. Our cost functions Cout and Chj have the
ASI property, Csmj does not. Summarizing the properties of our cost functions,
we see that the classification is orthogonal:
             ASI          ¬ ASI
symmetric    Cout, Cnlj   Csmj
¬ symmetric  Chj          (see text)
For the missing case of a non-symmetric cost function not having the ASI
property, we can use the cost function of the hybrid hash join [234, 664].
We turn to another not really well-researched topic. The goal is to cut
down the number of cost functions which have to be considered for optimization
and to possibly allow for simpler cost functions, which saves time during plan
generation. Unfortunately, we have to restrict ourselves to left-deep join trees.
Let s denote a sequence or permutation of a given set of joins. We define an equivalence relation on cost functions: two cost functions C and C′ are equivalent if and only if they order all join sequences in the same way, i.e. C(s) ≤ C(s′) exactly when C′(s) ≤ C′(s′) for all sequences s and s′. ΣIR then denotes the set of all cost functions that are equivalent to Cout, the sum of the intermediate result sizes.
Let us consider a very simple example. The last element of the sum in Cout is the size of the final join (all relations are joined). This is not the case for the following cost function:

C′out(s) := Σi=2..n−1 |s1, …, si|

Obviously, C′out is ΣIR. The next observation shows that we can construct quite complex ΣIR cost functions:
Observation 3.1.3 Let C1 and C2 be two ΣIR cost functions. For non-decreasing functions f1 : R → R and f2 : R × R → R and constants c ∈ R and d ∈ R+, the cost functions

C1 + c
C1 · d
f1 ◦ C1
f2 ◦ (C1, C2)

are ΣIR. Here, ◦ denotes function composition and (·, ·) function pairing.
There are, of course, many more possibilities for constructing ΣIR functions. For the cost functions Chj, Csmj, and Cnlj, we now investigate which of them are ΣIR. Using

Chj(s) = Σi=2..n 1.2 · |s1, …, si−1| = 1.2 · |s1| + 1.2 · C′out(s)

and Observation 3.1.3, we conclude that Chj is ΣIR for a fixed relation to be joined first. If we can optimize Cout in polynomial time, then we can optimize Cout for a fixed starting relation. Indeed, by trying each relation as a starting relation, we can find the optimal join tree in polynomial time. An algorithm that computes the optimal solution for an arbitrary relation to be joined first can be found in Section 3.2.2.
Now, consider Csmj. Since

Σi=2..n |s1, …, si−1| · log(|s1, …, si−1|)

weights each intermediate result by its logarithm, the argument used for Chj does not carry over. For Cnlj, consider a query whose two-relation join results have the sizes

|R1 R2| = 90, |R1 R3| = 100, |R2 R3| = 100.

We see that R1 R2 R3 has the smallest sum of intermediate result sizes but produces the highest cost. Hence, Cnlj is not ΣIR.
3.1.4 Classification of Join Ordering Problems
The query graph classes considered are chain, star, tree, and cyclic. For the join tree classes, we distinguish between the different join tree shapes, i.e. whether
they are left-deep, zig-zag, or bushy trees. We left out the right-deep trees, since
they do not differ in their behavior from left-deep trees. We also have to take
into account whether cross products are considered or not. For cost functions,
we use a simple classification: we only distinguish between those that have the
ASI property and those that do not. This leaves us with 4∗3∗2∗2 = 48 different
join ordering problems. For these, we will first review search space sizes and
complexity. Then, we discuss several algorithms for join ordering. Last, we give
some insight into cost distributions over the search space and how this might
influence the benchmarking of different join ordering algorithms.
3.1.5 Search Space Sizes
Join Trees with Cross Products We consider the number of join trees for
a query graph with n relations. When cross products are allowed, the number
of left-deep and right-deep join trees is n!. By allowing cross products, the
query graph does not restrict the search space in any way. Hence, any of the n!
permutations of the n relations corresponds to a valid left-deep join tree. This
is true independent of the query graph.
Similarly, the number of zig-zag trees can be estimated independently of
the query graph. First note that for joining n relations, we need n − 1 join
operators. From any left-deep tree, we derive zig-zag trees by using the join’s
commutativity and exchange the left and right inputs. Hence, from any left-
deep tree for n relations, we can derive 2n−2 zig-zag trees. We have to subtract
another one, since the bottommost joins’ arguments are exchanged in different
left-deep trees. Thus, there exists a total of 2n−2 n! zig-zag trees. Again, this
number is independent of the query graph.
The number of bushy trees can be estimated as follows. First, we need the number of binary trees. For n leaf nodes, the number of binary trees is given by C(n − 1), where C(n) is defined by the recurrence

C(n) = Σk=0..n−1 C(k) · C(n − k − 1)

with C(0) = 1. The numbers C(n) are called the Catalan numbers (see [205]). They can also be computed by the closed formula

C(n) = (1 / (n + 1)) · (2n choose n)
Now that we know the number of binary trees with n leaves, we have to attach the n relations to the leaves in all possible ways. For a given binary tree, this can be done in n! ways. Hence, the total number of bushy trees is n! · C(n − 1). This can be simplified as follows (see also [303, 524, 854]):

n! · C(n − 1) = n! · (1/n) · (2(n−1) choose (n−1))
             = n! · (1/n) · (2n − 2)! / ((n − 1)! · ((2n − 2) − (n − 1))!)
             = (2n − 2)! / (n − 1)!
For chain queries without cross products, the number of left-deep join trees satisfies f(n) = 2^(n−1). The induction step for n > 1, provided by Thomas Neumann, goes as follows:

f(n) = n + Σk=2..n−1 f(k − 1) · (n − k)
     = n + Σk=0..n−3 f(k + 1) · (n − k − 2)
     = n + Σk=0..n−3 2^k · (n − k − 2)            (by the induction hypothesis)
     = n + Σk=1..n−2 k · 2^(n−k−2)
     = n + Σk=1..n−2 2^(n−k−2) + Σk=2..n−2 (k − 1) · 2^(n−k−2)
     = n + Σi=1..n−2 Σj=i..n−2 2^(n−j−2)
     = n + Σi=1..n−2 Σj=0..n−i−2 2^j
     = n + Σi=1..n−2 (2^(n−i−1) − 1)
     = n + Σi=1..n−2 2^i − (n − 2)
     = n + (2^(n−1) − 2) − (n − 2)
     = 2^(n−1)
as the left or right argument of the join. Hence, we can compute f(n) as

f(n) = 2 · Σk=1..n−1 f(k) · f(n − k)

This is equal to

f(n) = 2^(n−1) · C(n − 1)
Star Queries, No Cartesian Product The first join has to connect the
center relation R0 with any of the other relations. The other relations can
follow in any order. Since R0 can be the left or the right input of the first
join, there are 2 ∗ (n − 1)! possible left-deep join trees for Star Queries with no
Cartesian Product.
The number of zig-zag join trees is derived by exchanging the arguments
of all but the first join in any left-deep join tree. We cannot consider the first
join since we did so in counting left-deep join trees. Hence, the total number of
zig-zag join trees is 2 ∗ (n − 1)! ∗ 2n−2 = 2n−1 ∗ (n − 1)!.
From a star query, no bushy join trees without Cartesian products other than the zig-zag join trees can be constructed.
Remarks The numbers for star queries are also upper bounds for tree queries.
For clique queries, no join tree containing a cross product is possible. Hence,
all join trees are valid join trees and the search space size is the same as the
corresponding search space for join trees with cross products.
To give the reader a feeling for the numbers, the potential search space sizes for some values of n can be computed directly from the formulas derived above.
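A small Python sketch evaluating these closed formulas (left-deep, zig-zag, and bushy trees, with and without cross products, for chain and star queries) tabulates them:

from math import comb, factorial

def catalan(n):
    return comb(2 * n, n) // (n + 1)

for n in (4, 6, 8, 10):
    print("n =", n)
    print("  any graph, with cross products:")
    print("    left-deep:", factorial(n))
    print("    zig-zag:  ", 2 ** (n - 2) * factorial(n))
    print("    bushy:    ", factorial(n) * catalan(n - 1))
    print("  chain query, no cross products:")
    print("    left-deep:", 2 ** (n - 1))
    print("    bushy:    ", 2 ** (n - 1) * catalan(n - 1))
    print("  star query, no cross products:")
    print("    left-deep:", 2 * factorial(n - 1))
    print("    zig-zag:  ", 2 ** (n - 1) * factorial(n - 1))

For n = 4, for example, this yields 2^3 · C(3) = 40 bushy trees without cross products for a chain query.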
Note that in Figure 2.6 only 32 join trees are listed, whereas the number of
bushy trees for chain queries with four relations is 40. The missing eight cases
are those zig-zag trees which are symmetric (i.e. derived by applying commu-
tativity to all occurring joins) to the ones contained in the second column.
From these numbers, it becomes immediately clear why, historically, the search space of query optimizers was restricted to left-deep trees and why cross products for connected query graphs were not considered.
3.1.6 Problem Complexity
[Table: complexity of the join ordering problems, classified by query graph class, join tree class, cross products, and cost function.]
For the first entry, the clique problem was used for the reduction [316]. Cout was also used in the other proofs of NP-hardness results. The next line goes back to the same paper.
Ibaraki and Kameda also described an algorithm to solve the join ordering
problem for tree queries producing optimal left-deep trees for a special cost
function for a nested-loop n-way join algorithm. Their algorithm was based on
the observation that their cost function has the ASI property. For this case,
they could derive an algorithm from an algorithm for a sequencing problem for
job scheduling designed by Monma and Sidney [616]. They, in turn, used an
earlier result by Lawler [529]. The algorithm of Ibaraki and Kameda was slightly
generalized by Krishnamurthy, Boral, and Zaniolo, who were also able to sketch
a more efficient algorithm. It improves the time bounds from O(n2 log n) to
O(n2 ). The disadvantage of both approaches is that with every relation, a fixed
(i.e. join-tree independent) join implementation must be associated before the
optimization starts. Hence, it only produces optimal trees if there is only one
join implementation available or one is able to guess the optimal join method
before hand. This might not be the case. The polynomial algorithm which we
term IKKBZ is described in Section 3.2.2.
For star queries, Ganguly investigated the problem of generating optimal
left-deep trees if no cross products but two different cost functions (one for
nested loop join, the other for sort merge join) are allowed. It turned out that
this problem is NP-hard [308].
The next line is due to Cluet and Moerkotte [190]. They showed by reduction from 3DM that taking into account cross products results in an NP-hard problem even for star queries. Remember that star queries are special cases of tree queries, which in turn are special cases of general query graphs.
The problem for general bushy trees follows from a result by Scheufele and
Moerkotte [756]. They showed that building optimal bushy trees for cross
products only (i.e. all selectivities equal one) is already NP-hard. This result
also explains the last two lines.
By noting that for star queries, all bushy trees that do not contain a cross
product are left-deep trees, the problem can be solved by the IKKBZ algorithm
for left-deep trees. Ono and Lohman showed that for chain queries dynamic
programming considers only a polynomial number of bushy trees if no cross
products are considered [640]. This is discussed in Section 3.2.4.
The table is rather incomplete. Many open problems exist. For example,
if we have chain queries and consider cross products: is the problem NP-hard
or in P? Some results for this problem have been presented [756], but it is
still an open problem (see Section 3.2.7). Open is also the case where we
produce optimal bushy trees with no cross products for tree queries. Yet another example of an open problem is whether we can drop the ASI property and still derive a polynomial algorithm for tree queries. This is especially important, since the cost function for a sort-merge algorithm does not have the ASI property.
Good summaries of complexity results for different join ordering problems
can be found in the theses of Scheufele [754] and Hamalainen [388].
Given that join ordering is an inherently complex problem with no polynomial algorithm in sight, one might wonder whether there exist good polynomial approximation algorithms. Chances are that even this is not the case: Chatterji, Evani, Ganguly, and Yemmanuru showed that three different optimization problems — all asking for linear join trees — are not approximable [142].
3.2 Deterministic Algorithms
3.2.1 Heuristics

GreedyJoinOrdering-1({R1, …, Rn}, (*weight)(Relation))
Input: a set of relations to be joined and a weight function
Output: a join order
S = ε; // initialize S to the empty sequence
R = {R1, …, Rn}; // let R be the set of all relations
while (!empty(R)) {
  Let k be such that: weight(Rk) = min_{Ri ∈ R}(weight(Ri));
  R \= Rk; // eliminate Rk from R
  S ◦= Rk; // append Rk to S
}
return S
This algorithm takes cross products into account. If we are only interested in left-deep join trees with no cross products, we have to require that Rk is connected to one of the relations contained in S whenever S ≠ ε. Note that a more efficient implementation would sort the relations according to their weight.
Not all heuristics can be implemented with a greedy algorithm as simple as
above. An often-used heuristics is to take the relation next that produces the
smallest (next) intermediate result. This cannot be determined by the relation
alone. One must take into account the sequence S already processed, since only then are the selectivities of all predicates connecting relations in S and the new relation deducible. And we must take the product of these selectivities and
the cardinality of the new relation in order to get an estimate of the intermedi-
ate result’s cardinality. As a consequence, the weights become relative to S. In
other words, the weight function now has two parameters: a sequence of rela-
tions already joined and the relation whose relative weight is to be computed.
Here is the next algorithm:
GreedyJoinOrdering-2({R1, …, Rn}, (*weight)(Sequence of Relations, Relation))
Input: a set of relations to be joined and a weight function
Output: a join order
S = ε; // initialize S to the empty sequence
R = {R1, …, Rn}; // let R be the set of all relations
while (!empty(R)) {
  Let k be such that: weight(S, Rk) = min_{Ri ∈ R}(weight(S, Ri));
  R \= Rk; // eliminate Rk from R
  S ◦= Rk; // append Rk to S
}
return S
GOO({R1, …, Rn})
Input: a set of relations to be joined
Output: a join tree
Trees := {R1, …, Rn};
while (|Trees| != 1) {
  find Ti, Tj ∈ Trees such that i ≠ j and |Ti ⋈ Tj| is minimal
    among all pairs of trees in Trees;
  Trees −= Ti;
  Trees −= Tj;
  Trees += Ti ⋈ Tj;
}
return the tree contained in Trees;
Our GOO variant differs slightly from the one proposed by Fegaras. He uses arrays, explicitly handles the forming of the join predicates, and materializes intermediate result sizes. Hence, his algorithm is a little more elaborate, but we assume that the reader is able to fill in the gaps. An executable sketch of our variant is given below.
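The following is a minimal executable sketch of GOO in Python (an illustration only: cardinalities and selectivities are passed as dictionaries, join trees are nested tuples, independence of selectivities is assumed, and ties are broken arbitrarily):

def goo(card, sel):
    # each entry: (tree, set of base relations in the tree, result cardinality)
    trees = [(r, {r}, n) for r, n in card.items()]
    while len(trees) > 1:
        best = None
        for i in range(len(trees)):
            for j in range(i + 1, len(trees)):
                (t1, s1, n1), (t2, s2, n2) = trees[i], trees[j]
                n = n1 * n2
                for a in s1:               # apply all predicates connecting
                    for b in s2:           # the two subtrees
                        n *= sel.get(frozenset((a, b)), 1.0)
                if best is None or n < best[0]:
                    best = (n, i, j)
        n, i, j = best
        (t1, s1, _), (t2, s2, _) = trees[i], trees[j]
        trees = [t for k, t in enumerate(trees) if k not in (i, j)]
        trees.append(((t1, t2), s1 | s2, n))
    return trees[0][0]

card = {"R1": 10, "R2": 20, "R3": 20, "R4": 10}
sel = {frozenset(("R1", "R2")): 0.01,
       frozenset(("R2", "R3")): 0.5,
       frozenset(("R3", "R4")): 0.01}
print(goo(card, sel))   # (('R1', 'R2'), ('R3', 'R4')): the bushy tree

On the bushy-tree example from Section 3.1.3, GOO thus finds the bushy join tree that no linear tree can match under Cout.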
None of our algorithms so far considers different join implementations. An explicit consideration of commutativity for non-symmetric cost functions could also help to produce better join trees. The reader is asked to work out the details of these extensions. In general, the heuristics do not produce the optimal plan. The reader is advised to find examples where the heuristics are far off the best possible plan.
3.2.2 Determining the Optimal Join Order in Polynomial Time
The IKKBZ algorithm considers only join operations that have a cost function of the form

cost(Ri ⋈ Rj) = |Ri| · hj(|Rj|)

where each Rj can have its own cost function hj. We denote the set of the hj by H and parameterize cost functions with it. Example instantiations are

• hj ≡ 1.2 for main-memory hash-based joins
• hj ≡ id for nested-loop joins

where id is the identity function. Let us denote by ni the cardinality of the relation Ri (ni := |Ri|). Then, hi(ni) represents the cost per input tuple to be joined with Ri.
The algorithm works as follows. For every relation Rk it computes the
optimal join order under the assumption that Rk is the first relation in the join
sequence. The resulting subproblems then resemble a job-scheduling problem
that can be solved by the Monma-Sidney-Algorithm [616].
In order to present this algorithm, we need the notion of a precedence graph.
A precedence graph is formed by taking a node in the (undirected) query graph
and making this node a root node of a (directed) tree where the edges point
away from the selected root node. Hence, for acyclic, connected query graphs—
those we consider in this section—a precedence graph is a tree. We construct
the precedence graph of a query graph G = (V, E) as follows:
• Make some relation Rk ∈ V the root node of the precedence graph.
• As long as not all relations are included in the precedence graph: choose a relation Ri ∈ V such that (Rj, Ri) ∈ E is an edge in the query graph, Rj is already contained in the (partial) precedence graph constructed so far, and Ri is not. Add Ri and the edge Rj → Ri to the precedence graph.
A sequence S = v1, …, vk of nodes conforms to a precedence graph G = (V, E) if the following conditions are satisfied:
1. for all i (2 ≤ i ≤ k) there exists a j (1 ≤ j < i) with (vj, vi) ∈ E, and
2. there is no edge (vi, vj) ∈ E for i > j.
For non-empty sequences U and V in a precedence graph, we write U → V if,
according to the precedence graph, U must occur before V . This requires U
and V to be disjoint. More precisely, there can only be paths from nodes in U
to nodes in V and at least one such path exists.
Consider the following query graph, in which R3 is connected to R1, R2, and R4, while R4 is additionally connected to R5 and R6:

R1         R5
   \      /
    R3---R4
   /      \
R2         R6
For this query graph, we can derive a precedence graph for every relation chosen as root. Rooted at R1, for example, the precedence graph has the edges R1 → R3, R3 → R2, R3 → R4, R4 → R5, and R4 → R6. [Figure: the six precedence graphs derived from the query graph, together with a left-deep join tree corresponding to one of them.]
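Constructing a precedence graph is a simple traversal. A small Python sketch (the adjacency sets encode the example query graph as read above; edge direction is established by a breadth-first traversal from the chosen root):

from collections import deque

def precedence_graph(query_graph, root):
    edges, seen, todo = [], {root}, deque([root])
    while todo:
        u = todo.popleft()
        for v in sorted(query_graph[u]):
            if v not in seen:
                seen.add(v)
                edges.append((u, v))     # u must precede v
                todo.append(v)
    return edges

G = {"R1": {"R3"}, "R2": {"R3"}, "R3": {"R1", "R2", "R4"},
     "R4": {"R3", "R5", "R6"}, "R5": {"R4"}, "R6": {"R4"}}
print(precedence_graph(G, "R1"))
# [('R1','R3'), ('R3','R2'), ('R3','R4'), ('R4','R5'), ('R4','R6')]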
Define

R1,2,…,k := R1 ⋈ R2 ⋈ ⋯ ⋈ Rk
n1,2,…,k := |R1,2,…,k|

For a given precedence graph, let Ri be a relation and R̄i the set of relations from which there exists a path to Ri. Then, in any join tree adhering to the precedence graph, all relations in R̄i, and only those, are joined before Ri. Hence, we can define the selectivity si of Ri as si = ∏Rj∈R̄i fj,i, with si = 1 if R̄i = ∅ (i.e. for the root). With this, n1,2 = s2 · n2 · n1 and, in general,

n1,2,…,k = ∏i=1..k (si · ni)

Definition 3.2.1 For a fixed precedence graph, define the cost function CH for sequences of relations recursively as

CH(ε) = 0
CH(Rj) = 0                             if Rj is the root
CH(Rj) = hj(nj)                        else
CH(S1 S2) = CH(S1) + T(S1) · CH(S2)

where

T(ε) = 1
T(S) = ∏Ri∈S (si · ni)
Definition 3.2.2 Let A and B be two sequences and V and U two non-empty sequences. We say that a cost function C has the adjacent sequence interchange property (ASI property) if and only if there exist a function T and a rank function defined for sequences S as

rank(S) = (T(S) − 1) / C(S)

such that C(AUVB) ≤ C(AVUB) if and only if rank(U) ≤ rank(V), whenever AUVB and AVUB satisfy the constraints of a given precedence graph.
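For instance, for a single relation Ri that is not the root, we have T(Ri) = si · ni and CH(Ri) = hi(ni), so that

rank(Ri) = (si · ni − 1) / hi(ni)

With hi = id (nested-loop joins), this becomes (si · ni − 1) / ni. In particular, relations that shrink the intermediate result (si · ni < 1) receive a negative rank and are therefore pulled toward the front of the sequence.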
Lemma 3.2.3 The cost function CH defined in Definition 3.2.1 has the ASI
property.
Proof: We have

CH(AUVB) = CH(A) + T(A)·CH(U) + T(A)·T(U)·CH(V) + T(A)·T(U)·T(V)·CH(B)

and, hence,

CH(AUVB) − CH(AVUB) = T(A) · (CH(V)·(T(U) − 1) − CH(U)·(T(V) − 1))

which is ≤ 0 if and only if rank(U) ≤ rank(V). ✷

Definition 3.2.4 A set of sequences {A1, …, An} of nodes of a precedence graph is called a module if for every sequence B disjoint from all Ai, one of the following conditions holds:

• B → Ai, ∀ 1 ≤ i ≤ n
• Ai → B, ∀ 1 ≤ i ≤ n
• neither B → Ai nor Ai → B, ∀ 1 ≤ i ≤ n
Lemma 3.2.5 Let C be any cost function with the ASI property and {A, B} a module. If A → B and additionally rank(B) ≤ rank(A), then we find an optimal sequence among those in which B directly follows A.

Proof: Every optimal permutation must have the form (U, A, V, B, W), since A → B. Assume V ≠ ε. If rank(V) ≤ rank(A), then we can exchange V and A without increasing the costs. If rank(A) ≤ rank(V), we have rank(B) ≤ rank(V) due to the transitivity of ≤. Hence, we can exchange B and V without increasing the costs. Both exchanges produce legal sequences obeying the precedence graph, since {A, B} is a module. ✷
If the precedence graph demands A → B but rank(B) ≤ rank(A), we speak of contradictory sequences A and B. Since the lemma shows that no non-empty subsequence can occur between A and B in an optimal sequence, we combine A and B into a new single node replacing both. This node represents a compound relation comprising all relations in A and B. Its cardinality is computed by multiplying the cardinalities of all relations occurring in A and B, and its selectivity s is the product of all the selectivities si of the relations Ri contained in A and B. Repeating this step until no contradictory sequences remain is called normalization. The opposite step, replacing a compound node by the sequence of relations it was derived from, is called denormalization.
We can now present the algorithm IKKBZ.

IKKBZ(G)
Input: an acyclic query graph G for relations R1, …, Rn
Output: the best left-deep tree
R = ∅;
for (i = 1; i ≤ n; ++i) {
  Let Gi be the precedence graph derived from G and rooted at Ri;
  T = IKKBZ-Sub(Gi);
  R += T;
}
return best of R;

IKKBZ-Sub(Gi)
Input: a precedence graph Gi for relations R1, …, Rn rooted at some Ri
Output: the optimal left-deep tree under Gi
while (Gi is not a chain) {
  let r be the root of a subtree in Gi whose subtrees are chains;
  IKKBZ-Normalize(r);
  merge the chains under r according to the rank function
    in ascending order;
}
IKKBZ-Denormalize(Gi);
return Gi;
IKKBZ-Normalize(r)
Input: the root r of a subtree T of a precedence graph G = (V, E)
Output: a normalized subchain
while (∃ r′, c ∈ V, r →∗ r′, (r′, c) ∈ E: rank(r′) > rank(c)) {
  replace r′ by a compound relation r″ that represents r′c;
}
[Figure: an example run of IKKBZ, panels A) to F): a query graph annotated with cardinalities and join selectivities, the precedence graph rooted at R1 annotated with ranks, and the normalization steps merging first R6 and R7 and then R4, R6, R7 into compound relations.]
3.2.3 The Maximum-Value-Precedence Algorithm

[Figure 3.4: A query graph, its directed join graph, some spanning trees and join trees]
Definition 3.2.6 The directed join graph of a conjunctive query with join predicates P is a triple G = (V, Ep, Ev), where V is the set of nodes and Ep and Ev are sets of directed edges defined as follows. For any two nodes u, v ∈ V, if R(u) ∩ R(v) ≠ ∅ then (u, v) ∈ Ep and (v, u) ∈ Ep. If R(u) ∩ R(v) = ∅, then (u, v) ∈ Ev and (v, u) ∈ Ev. The edges in Ep are called physical edges, those in Ev virtual edges. Note that in G, for every two nodes u, v, there is an edge (u, v) that is either physical or virtual. Hence, G is a clique.
Let us see how we can derive a join tree from a spanning tree of a directed join graph. Figure 3.4 I) gives a simple query graph Q corresponding to a chain, and Part II) presents Q’s directed join graph. Physical edges are drawn as solid arrows, virtual edges as dotted arrows. Let us first consider the spanning tree shown in Part III a). It says that we first execute R1 ⋈p1,2 R2. The next join predicate to evaluate is p2,3. Obviously, it does not make much sense to execute R2 ⋈p2,3 R3, since R1 and R2 have already been joined. Hence, we replace R2 in the second join by the result of the first join. This results in the join tree (R1 ⋈p1,2 R2) ⋈p2,3 R3. For the same reason, we proceed by joining this result with R4. The final join tree is shown in Part III b). Part IV a) shows another spanning tree. The two joins R1 ⋈p1,2 R2 and R3 ⋈p3,4 R4 can be executed independently and do not influence each other. Next, we have to consider p2,3. Both R2 and R3 have already been joined. Hence, the last join processes both intermediate results. The final join tree is shown in Part IV b). The spanning tree shown in Part V a) results in the same join tree shown in Part V b). Hence, two different spanning trees can result in the same join tree. However, the spanning tree in Part IV a) is more specific in that it demands R1 ⋈p1,2 R2 to be executed before R3 ⋈p3,4 R4.
Next, take a look at Figure 3.5. Parts I), II), and III a) show a query graph,
its directed join graph, and a spanning tree. To build a join tree from the spanning
tree, we proceed as follows. We have to execute R2 ⋈p2,3 R3 and R3 ⋈p3,4 R4 first.
In which way we do so is not really fixed by the spanning tree. So let us do
both in parallel. Next is p1,2. The only dependency the spanning tree gives
us is that it should be executed after p3,4. Since there is no common relation
between those two, we perform R1 ⋈p1,2 R2. Last is p4,5. Since we find p3,4
below it, we use the intermediate result produced by it as a replacement for R4.
The result is shown in Part III b). It has three loose ends. Additional joins are
required to tie the partial results together. Obviously, this is not what we want.
A spanning tree that avoids this problem of additional joins is called effective.
It can be shown that a spanning tree T = (V, E) is effective if it satisfies the
following conditions [530]:
1. T is a binary tree,
2. for all inner nodes v and nodes u with (u, v) ∈ E it holds that R*(T(u)) ∩
R(v) ≠ ∅, and
[Figure 3.5: A query graph (a chain R1 – R2 – R3 – R4 – R5), its directed join graph, a spanning tree and its problem]
w(u,v) = |⋈u| / |u ∩ v|,
where ⋈u denotes the result of the join associated with node u and u ∩ v denotes the (join of the) relations common to u and v.
(Lee, Shih, and Chen actually attach two weights to each edge: one additional
weight for the size of the tuples (in bytes) [530].)
The weights of physical edges are equal to the si of the dependency graph
used in the IKKBZ-Algorithm (Section 3.2.2). To see this, assume R(u) =
{R1 , R2 }, R(v) = {R2 , R3 }. Then
w(u,v) = |⋈u| / |u ∩ v|
       = |R1 ⋈u R2| / |R2|
       = f1,2 |R1| |R2| / |R2|
       = f1,2 |R1|
Hence, if the join R1 Bu R2 is executed before the join R2 Bv R3 , the input size
to the latter join changes by a factor wu,v . This way, the influence of a join
on another join is captured by the weights. Since those nodes connected by a
virtual edge do not influence each other, a weight of 1 is appropriate.
Additionally, we assign weights to the nodes of the directed join graph.
The weight of a node reflects the change in cardinality to be expected when
certain other joins have been executed before. They are specified by a (partial)
spanning tree S. Given S, we denote by ⋈S_pi,j the result of the join ⋈pi,j if all
joins preceding pi,j in S have been executed. Then the weight attached to node
pi,j is defined as
w(pi,j, S) = |⋈S_pi,j| / |Ri ⋈pi,j Rj|.
For the empty sequence ε, we define w(pi,j, ε) = |Ri ⋈pi,j Rj|. Similarly, we define
the cost of a node pi,j depending on other joins preceding it in some given
spanning tree S. We denote this by cost(pi,j , S). The actual cost function can
be one of those we have introduced so far or any other one. In fact, if we have a choice
of several join implementations, we can take the minimum over all their cost
functions. This then chooses the cheapest join implementation.
The maximum value precedence algorithm works in two phases. In the first
phase, it searches for edges with a weight smaller than one. Among these, the
one with the biggest impact is chosen. This one is then added to the spanning
tree. In other words, in this phase, the costs of expensive joins are minimized by
making sure that (size) decreasing joins are executed first. The second phase
adds edges such that the intermediate result sizes increase as little as possible.
MVP(G)
Input: a weighted directed join graph G = (V, Ep , Ev )
Output: an effective spanning tree
Q1 .insert(V ); /* priority queue with largest node weights w(·) first */
Q2 = ∅; /* priority queue with smallest node weights w(·) first */
G0 = (V 0 , E 0 ) with V 0 = V and E 0 = Ep ; /* working graph */
MvpUpdate((u, v))
Input: an edge to be added to S
Output: side-effects on S and G′
ES ∪= {(u, v)};
E′ \= {(u, v), (v, u)};
E′ \= {(u, w) | (u, w) ∈ E′}; /* (1) */
E′ ∪= {(v, w) | (u, w) ∈ Ep, (v, w) ∈ Ev}; /* (3) */
if (v has two inflowing edges in S) { /* (2) */
  E′ \= {(w, v) | (w, v) ∈ E′};
}
if (v has one outflowing edge in S) { /* (1) in paper but not needed */
  E′ \= {(v, w) | (v, w) ∈ E′};
}
Note that in order to test for the effectiveness of a spanning tree in the
algorithm, we just have to check the conditions for the node the selected edge
leads to.
MvpUpdate first adds the selected edge to the spanning tree. It then elim-
inates edges that need not to be considered for building an effective spanning
tree. Since (u, v) has been added, both (u, v) and (v, u) do not have to be
considered any longer. Also, since effective spanning trees are binary trees, (1)
every node must have only one parent node and (2) at most two child nodes.
The edges leading to a violation are eliminated by MvpUpdate in the lines com-
mented with the corresponding numbers. For the line commented (3), we have
the situation that u → v ⇢ w and u → w in G. This means that u and w have
common relations, but v and w do not. Hence, the result of performing v on
the result of u will have a common relation with w. Thus, we add a (physical)
edge v → w.
3.2.4 Dynamic Programming
Consider the two join trees
(((R1 ⋈ R2) ⋈ R3) ⋈ R4) ⋈ R5
and
(((R3 ⋈ R1) ⋈ R2) ⋈ R4) ⋈ R5.
If we know that ((R1 ⋈ R2) ⋈ R3) is cheaper than ((R3 ⋈ R1) ⋈ R2), we know that
the first join tree is cheaper than the second. Hence, we could avoid generating
the second alternative and still won’t miss the optimal join tree. The general
principle behind this is the optimality principle (see [204]). For the join ordering
problem, it can be stated as follows:¹ every subtree of an optimal join tree must
itself be an optimal join tree for the relations it contains.
To see why this holds, assume that the optimal join tree T for relations R1, . . . , Rn
contains a subtree S which is not optimal. That is, there exists another join
tree S 0 for the relations contained in S with strictly lower costs. Denote by
T 0 the join tree derived by replacing S in T by S 0 . Since S 0 contains the same
relations as S, T 0 is a join tree for the relations R1 , . . . , Rn . The costs of the join
operators in T and T 0 that are not contained in S and S 0 are the same. Then,
since the total cost of a join tree is the sum of the costs of the join operators
and S 0 has lower costs than S, T 0 has lower costs than T . This contradicts the
optimality of T .
The idea of dynamic programming applied to the generation of optimal join
trees now is to generate optimal join trees for subsets of R1 , . . . , Rn in a bottom-
up fashion. First, optimal join trees for subsets of size one, i.e. single relations,
are generated. From these, optimal join trees of size two, three and so on until
n are generated.
Let us first consider generating optimal left-deep trees. There, join trees for
subsets of size k are generated from subsets of size k − 1 by adding a new join
operator whose left argument is a join tree for k − 1 relations and whose right
argument is a single relation. Exchanging left and right gives us the procedure
for generating right-deep trees. If we want to generate zig-zag trees since our
¹ The optimality principle does not hold in the presence of properties.
CreateJoinTree(T1, T2)
Input: two (optimal) join trees T1 and T2;
       for linear trees, we assume that T2 is a single relation
Output: an (optimal) join tree for joining T1 and T2
BestTree = NULL;
for all implementations impl do {
  if (!RightDeepOnly) {
    Tree = T1 ⋈impl T2;
    if (BestTree == NULL || cost(BestTree) > cost(Tree)) {
      BestTree = Tree;
    }
  }
  if (!LeftDeepOnly) {
    Tree = T2 ⋈impl T1;
    if (BestTree == NULL || cost(BestTree) > cost(Tree)) {
      BestTree = Tree;
    }
  }
}
return BestTree;
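To make this concrete, the following C++ sketch mirrors CreateJoinTree. The Tree structure, the implementation set {"nl", "hash"}, and the cost formulas are illustrative assumptions, not the book's cost model; only the control flow (try every implementation and, where allowed, both argument orders) is taken from the pseudocode above.

#include <memory>
#include <string>

// Hypothetical join tree node; card and cost are maintained by the toy cost model.
struct Tree {
    std::shared_ptr<Tree> left, right;
    std::string impl;     // join implementation at the root ("" for a base relation)
    double card = 0.0;    // output cardinality
    double cost = 0.0;    // accumulated cost
};

// Toy per-join cost (an assumption, not the book's model): hash joins cost the
// sum of the input cardinalities, nested-loop joins their product.
double implCost(const std::string& impl, const Tree& l, const Tree& r) {
    return impl == "hash" ? l.card + r.card : l.card * r.card;
}

// Mirrors CreateJoinTree: try every implementation and both argument orders.
std::shared_ptr<Tree> createJoinTree(const std::shared_ptr<Tree>& t1,
                                     const std::shared_ptr<Tree>& t2,
                                     double selectivity,
                                     bool leftDeepOnly, bool rightDeepOnly) {
    const double outCard = selectivity * t1->card * t2->card;
    std::shared_ptr<Tree> best;                  // BestTree = NULL
    for (std::string impl : {"nl", "hash"}) {
        if (!rightDeepOnly) {
            double c = t1->cost + t2->cost + implCost(impl, *t1, *t2);
            if (!best || best->cost > c)
                best = std::make_shared<Tree>(Tree{t1, t2, impl, outCard, c});
        }
        if (!leftDeepOnly) {
            double c = t1->cost + t2->cost + implCost(impl, *t2, *t1);
            if (!best || best->cost > c)
                best = std::make_shared<Tree>(Tree{t2, t1, impl, outCard, c});
        }
    }
    return best;
}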
DP-Linear-1({R1 , . . . , Rn })
Input: a set of relations to be joined
Output: an optimal left-deep (right-deep, zig-zag) join tree
[Figure: the subsets of {R1, R2, R3, R4}, arranged by size from the single relations up to {R1, R2, R3, R4}, as considered during bottom-up plan generation.]
Note that this formulation is general enough to also capture the generation of
bushy trees. It is, however, a little vague due to its reference to "relevance".
For the different join tree classes, this term can be given a precise semantics. EX
Let us take a look at an alternative order to join tree generation. Assume
that sets of relations are represented as bitvectors. A bitvector is nothing more
than a base two integer. Successive increments of an integer/bitvector lead to
different subsets. Further, the above condition is satisfied. We illustrate this by
a small example. Assume that we have three relations R1 , R2 , R3 . The i-th bit
from the right in a three-bit integer indicates the presence of Ri for 1 ≤ i ≤ 3.
3.2. DETERMINISTIC ALGORITHMS 65
000 {}
001 {R1 }
010 {R2 }
011 {R1 , R2 }
100 {R3 }
101 {R1 , R3 }
110 {R2 , R3 }
111 {R1 , R2 , R3 }
DP-Linear-2({R1 , . . . , Rn })
Input: a set of relations to be joined
Output: an optimal left-deep (right-deep, zig-zag) join tree
for (i = 1; i <= n; ++i) {
  BestTree(1 << (i − 1)) = Ri;
}
for (S = 1; S < 2^n; ++S) {
  if (BestTree(S) != NULL) continue;
  for all Ri ∈ S do {
    S′ = S \ {Ri};
    CurrTree = CreateJoinTree(BestTree(S′), Ri);
    if (BestTree(S) == NULL || cost(BestTree(S)) > cost(CurrTree)) {
      BestTree(S) = CurrTree;
    }
  }
}
return BestTree(2^n − 1);
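For illustration, here is a compact, compilable C++ version of DP-Linear-2 over bitvector-indexed tables. The cost function (the sum of intermediate result sizes, computed from base cardinalities and a symmetric selectivity matrix) is an assumption made for the example; any other cost function could be plugged in.

#include <cstdint>
#include <limits>
#include <vector>

// One table entry: best left-deep plan for a relation set, kept as a sequence.
struct Plan {
    double card = 0.0;
    double cost = std::numeric_limits<double>::infinity();
    std::vector<int> order;                        // left-deep join order
};

// DP-Linear-2: left-deep trees, cross products included.
// card[i] is |Ri|; sel[i][j] is the selectivity between Ri and Rj (1.0 if none).
std::vector<int> dpLinear2(const std::vector<double>& card,
                           const std::vector<std::vector<double>>& sel) {
    const int n = (int)card.size();
    std::vector<Plan> best(std::size_t(1) << n);
    for (int i = 0; i < n; ++i)
        best[std::size_t(1) << i] = Plan{card[i], 0.0, {i}};
    for (std::uint32_t S = 1; S < (1u << n); ++S) {
        if (best[S].order.size() == 1) continue;   // singleton, already done
        for (int i = 0; i < n; ++i) {
            if (!(S & (1u << i))) continue;
            const Plan& p = best[S & ~(1u << i)];  // best plan for S \ {Ri}
            double outCard = p.card * card[i];
            for (int j : p.order) outCard *= sel[j][i];
            double cost = p.cost + outCard;        // sum of result sizes (assumed)
            if (cost < best[S].cost) {
                Plan np{outCard, cost, p.order};
                np.order.push_back(i);
                best[S] = np;                      // register the cheaper plan
            }
        }
    }
    return best[(std::size_t(1) << n) - 1].order;
}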
DP-Linear-2 differs from DP-Linear-1 not only in the order in which join trees
are generated. Another difference is that it takes cross products into account.
From DP-Linear-2, it is easy to derive an algorithm that explores the space
of bushy trees.
DP-Bushy({R1, . . . , Rn})
Input: a set of relations to be joined
Output: an optimal bushy join tree
for (i = 1; i <= n; ++i) {
  BestTree(1 << (i − 1)) = Ri;
}
for (S = 1; S < 2^n; ++S) {
  if (BestTree(S) != NULL) continue;
  for all S1 ⊂ S, S1 ≠ ∅ do {
    S2 = S \ S1;
    CurrTree = CreateJoinTree(BestTree(S1), BestTree(S2));
    if (BestTree(S) == NULL || cost(BestTree(S)) > cost(CurrTree)) {
      BestTree(S) = CurrTree;
    }
  }
}
return BestTree(2^n − 1);
This algorithm also takes cross products into account. The critical part is the
generation of all subsets of S. Fortunately, Vance and Maier [885] provide a
code fragment with which subset bitvector representations can be generated
very efficiently. In C, this fragment looks as follows:
S1 = S & -S;
do {
  /* do something with subset S1 */
  S1 = S & (S1 - S);
} while (S1 != S);
S represents the input set. S1 iterates through all subsets of S, where S itself and
the empty set are not considered. Analogously, all supersets can be generated
as follows. Here, S1 runs through the non-empty subsets of the complement of S;
the supersets of S are then obtained as S ∪ S1:
S1 = ~S & -~S;
while (S1) {
  /* do something with superset S | S1 */
  S1 = ~S & (S1 - ~S);
}
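The following self-contained C++ snippet exercises both fragments on a small example. Note that a practical implementation bounds the complement by an explicit universe mask, which the fragment above leaves implicit:

#include <cstdint>
#include <cstdio>

int main() {
    std::uint32_t S = 0b1101;                     // the set {R0, R2, R3}
    // all proper non-empty subsets of S, in ascending numeric order
    for (std::uint32_t S1 = S & -S; S1 != S; S1 = S & (S1 - S))
        std::printf("subset   0x%x\n", (unsigned)S1);
    // all strict supersets of S within a universe of n = 5 relations;
    // S1 runs through the non-empty subsets of the complement of S
    std::uint32_t U = (1u << 5) - 1, C = U & ~S;
    for (std::uint32_t S1 = C & -C; S1; S1 = C & (S1 - C))
        std::printf("superset 0x%x\n", (unsigned)(S | S1));
}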
exist. Chains require far fewer entries than cliques. It would be helpful to
have a small routine solving the following problem: given a query graph, how
many connected subgraphs are there? Unfortunately, this problem is #P-hard,
as Sutner, Satyanarayana, and Suffel showed [843]. They build on results by
Valiant [883] and Lichtenstein [546]. (For a definition of #P-hard, see the book
by Lewis and Papadimitriou [544] or the original paper by Valiant [882].)
However, for specific cases, these numbers can be given. If cross products
are considered, the number of join trees stored in the dynamic programming
table is
2^n − 1,
which is one for each non-empty subset of relations.
If we do not consider cross products, the number of entries in the dynamic
programming table corresponds to the number of connected subgraphs of the
query graph. For connected query graphs, we denote this by #csg. For chains,
cycles, stars, and cliques with n nodes, we have
#csg_chain(n)  = n(n + 1)/2          (3.2)
#csg_cycle(n)  = n^2 − n + 1         (3.3)
#csg_star(n)   = 2^(n−1) + n − 1     (3.4)
#csg_clique(n) = 2^n − 1             (3.5)
These equations can be derived from the following by summing over 1 ≤ k ≤ n,
where k gives the size of the connected subset:
#csg_chain(n, k)  = n − k + 1
#csg_cycle(n, k)  = 1 if n = k, and n otherwise
#csg_star(n, k)   = n if k = 1, and binom(n − 1, k − 1) if k > 1
#csg_clique(n, k) = binom(n, k)
Join Trees With Cartesian Product For the analysis of dynamic programming variants that do consider cross products, the notion of join-pair is
helpful. Let S1 and S2 be subsets of the nodes (relations) of the query graph.
We say (S1, S2) is a join-pair, if and only if S1 and S2 are non-empty and
S1 ∩ S2 = ∅. Then the join trees for S1 and S2 can be combined by a join,
possibly a cross product. For n relations, the number of such (unordered)
join-pairs is
(3^n − 2^(n+1) + 1) / 2.
This is equal to the number of non-symmetric join-pairs.
Join Trees without Cross Products In this paragraph, we assume that the
query graph is connected. For the analysis of dynamic programming variants
that do not consider cross products, it is helpful to have the notion of a csg-cmp-pair. Let S1 and S2 be subsets of the nodes (relations) of the query graph.
We say (S1, S2) is a csg-cmp-pair, if and only if
1. S1 induces a connected subgraph of the query graph,
2. S2 induces a connected subgraph of the query graph,
3. S1 ∩ S2 = ∅, and
4. there exist nodes u ∈ S1 and v ∈ S2 with (u, v) ∈ E.
Counting each unordered pair once, the number of csg-cmp-pairs (#ccp) is
(n^3 − n)/6 for chains, n(n − 1)^2/2 for cycles, (n − 1) · 2^(n−2) for stars, and
(3^n − 2^(n+1) + 1)/2 for cliques.
The following table presents some results for the above formulas.
Compare this table with the actual sizes of the search spaces in Section 3.1.5.
The dynamic programming algorithms can be implemented very efficiently
and often form the core of commercial plan generators. However, they have
the disadvantage that no plan is generated if they run out of time or space
since the search space they have to explore is too big. One possible remedy
goes as follows. Assume that a dynamic programming algorithm is stopped
in the middle of its way through its actual search space. Further assume that
the largest plans generated so far involve k relations. Then the cheapest of the
plans with k relations is completed by applying any heuristics (e.g. MinSel). The
completed plan is then returned. In Section 3.4.5, we will see two alternative
solutions. Another solution is presented in [480].
DPsize
Input: a connected query graph with relations R = {R0, . . . , Rn−1}
Output: an optimal bushy join tree without cross products
for all Ri ∈ R {
  BestPlan({Ri}) = Ri;
}
for all 1 < s ≤ n ascending { // size of plan
  for all 1 ≤ s1 ≤ s/2 { // size of left/right subplan
    s2 = s − s1; // size of right/left subplan
    for all S1 ⊂ R in BestPlan with |S1| = s1,
            S2 ⊂ R in BestPlan with |S2| = s2 {
      ++InnerCounter;
      if (S1 ∩ S2 ≠ ∅) continue;
      if not (S1 connected to S2) continue;
      ++CsgCmpPairCounter;
      p1 = BestPlan(S1);
      p2 = BestPlan(S2);
      CurrPlan = CreateJoinTree(p1, p2);
      if (cost(BestPlan(S1 ∪ S2)) > cost(CurrPlan)) {
        BestPlan(S1 ∪ S2) = CurrPlan;
      }
    }
  }
}
OnoLohmanCounter = CsgCmpPairCounter / 2;
return BestPlan({R0, . . . , Rn−1});
The pseudocode above (see Fig. 3.7) spells out algorithm DPsize. A table BestPlan associates with each
set of relations the best plan found so far. The algorithm starts by initializing
this table with plans of size one, i.e. single relations. After that, it constructs
plans of increasing size (loop over s). Thereby, the first size considered is two,
since plans of size one have already been constructed. Every plan joining n
relations can be constructed by joining a plan containing s1 relations with a
plan containing s2 relations. Thereby, si > 0 and s1 + s2 = n must hold. Thus,
the pseudocode loops over s1 and sets s2 accordingly. Since for every possible
size there exist many plans, two more loops are necessary in order to loop over
the plans of sizes s1 and s2 . (This is best implemented by keeping list heads for
every possible plan size pointing to a first plan of this size and chaining plans
of equal size via some next-pointer.) Then, conditions (1) and (2) from above
are tested. Only if their outcome is positive, we consider joining the plans p1
and p2 . The result is a plan CurrPlan. Let S be the relations contained in
CurrPlan. If BestPlan does not contain a plan for the relations in S or the
one it contains is more expensive than CurrPlan, we register CurrPlan with
BestPlan.
The algorithm DPsize can be made more efficient in case of s1 = s2 . The
algorithm as stated cycles through all plans p1 joining s1 relations. For each
such plan, all plans p2 of size s2 are tested. Assume that plans of equal size are
represented as a linked list. If s1 = s2 , then it is possible to iterate through the
list for retrieving all plans p1 . For p2 we consider the plans succeeding p1 in
the list. Thus, the complexity can be decreased from P (s1 ) ∗ P (s2 ) to P (s1 ) ∗
P (s2 )/2, where P (si ) denotes the number of plans of size si . The following
formulas are valid only for the variant of DPsize where this optimization has
been incorporated (see [605] for details).
If the counter InnerCounter is initialized with zero at the beginning of the
algorithm DPsize, then we are able to derive analytically its value after DPsize
terminates. Since this value of the inner counter depends on the query graph,
we have to distinguish several cases. For chain, cycle, star, and clique queries,
we denote by I_DPsize^chain, I_DPsize^cycle, I_DPsize^star, and I_DPsize^clique the value of InnerCounter
after termination of algorithm DPsize.
For chain queries, we then have:
I_DPsize^chain(n) = (1/48)(5n^4 + 6n^3 − 14n^2 − 12n)       if n is even
I_DPsize^chain(n) = (1/48)(5n^4 + 6n^3 − 14n^2 − 6n + 11)   if n is odd
For cycle queries, we have:
I_DPsize^cycle(n) = (1/4)(n^4 − n^3 − n^2)       if n is even
I_DPsize^cycle(n) = (1/4)(n^4 − n^3 − n^2 + n)   if n is odd
For star queries, we have:
I_DPsize^star(n) = 2^(2n−4) − (1/4) binom(2(n−1), n−1) + q(n)                                  if n is even
I_DPsize^star(n) = 2^(2n−4) − (1/4) binom(2(n−1), n−1) + (1/4) binom(n−1, (n−1)/2) + q(n)      if n is odd
DPsub
Input: a connected query graph with relations R = {R0, . . . , Rn−1}
Output: an optimal bushy join tree
for all Ri ∈ R {
  BestPlan({Ri}) = Ri;
}
for 1 ≤ i ≤ 2^n − 1 ascending {
  S = {Rj ∈ R | (⌊i/2^j⌋ mod 2) = 1};
  if not (connected S) continue; // ∗
  for all S1 ⊂ S, S1 ≠ ∅ do {
    ++InnerCounter;
    S2 = S \ S1;
    if (S2 == ∅) continue;
    if not (connected S1) continue;
    if not (connected S2) continue;
    if not (S1 connected to S2) continue;
    ++CsgCmpPairCounter;
    p1 = BestPlan(S1);
    p2 = BestPlan(S2);
    CurrPlan = CreateJoinTree(p1, p2);
    if (cost(BestPlan(S)) > cost(CurrPlan)) {
      BestPlan(S) = CurrPlan;
    }
  }
}
OnoLohmanCounter = CsgCmpPairCounter / 2;
return BestPlan({R0, . . . , Rn−1});
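A direct C++ transcription of DPsub's enumeration skeleton looks as follows. The query graph is represented by neighbour bitmasks, the subset loop uses the Vance-Maier trick from above, and costOfJoin is a toy stand-in for a real cost model; cost[] must be initialized to 0 for singletons and ∞ otherwise, as in the main function.

#include <cstdint>
#include <cstdio>
#include <limits>
#include <vector>

// Connectivity test on a bitmask-encoded query graph: adj[i] is the
// neighbour mask of relation Ri.
bool connected(std::uint32_t S, const std::vector<std::uint32_t>& adj) {
    if (S == 0) return false;
    std::uint32_t reach = S & -S, frontier = reach;
    while (frontier) {
        std::uint32_t next = 0;
        for (std::uint32_t f = frontier; f; f &= f - 1)
            next |= adj[__builtin_ctz(f)];
        frontier = next & S & ~reach;
        reach |= frontier;
    }
    return reach == S;
}

// Toy stand-in for a real cost model.
double costOfJoin(std::uint32_t S1, std::uint32_t S2) {
    return (double)__builtin_popcount(S1 | S2);
}

// DPsub skeleton: cost[S] holds the best cost found for relation set S.
void dpSub(int n, const std::vector<std::uint32_t>& adj, std::vector<double>& cost) {
    for (std::uint32_t S = 1; S <= (1u << n) - 1; ++S) {
        if (!connected(S, adj)) continue;
        for (std::uint32_t S1 = S & -S; S1 != S; S1 = S & (S1 - S)) {
            std::uint32_t S2 = S & ~S1;
            if (!connected(S1, adj) || !connected(S2, adj)) continue;
            bool linked = false;                    // is S1 connected to S2?
            for (std::uint32_t f = S1; f; f &= f - 1)
                linked = linked || (adj[__builtin_ctz(f)] & S2) != 0;
            if (!linked) continue;
            double c = cost[S1] + cost[S2] + costOfJoin(S1, S2);
            if (c < cost[S]) cost[S] = c;
        }
    }
}

int main() {
    std::vector<std::uint32_t> adj = {0b010, 0b101, 0b010};   // chain R0 - R1 - R2
    std::vector<double> cost(1u << 3, std::numeric_limits<double>::infinity());
    for (int i = 0; i < 3; ++i) cost[1u << i] = 0.0;          // single relations
    dpSub(3, adj, cost);
    std::printf("best cost for {R0, R1, R2}: %g\n", cost[0b111]);
}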
with q(n) = n·2^(n−1) − 5·2^(n−3) + (1/2)(n^2 − 5n + 4). For clique queries, we have:
I_DPsize^clique(n) = 2^(2n−2) − 5·2^(n−2) + (1/4) binom(2n, n) − (1/4) binom(n, n/2) + 1   if n is even
I_DPsize^clique(n) = 2^(2n−2) − 5·2^(n−2) + (1/4) binom(2n, n) + 1                          if n is odd
Note that binom(2n, n) is in the order of Θ(4^n/√n).
Proofs of the above formulas as well as implementation details for the algo-
rithm DPsize can be found in [605].
The number of failures for the additional check can easily be calculated as
2^n − #csg(n) − 1.
Sample numbers Fig. 3.9 contains tables with values produced by our formulas for input query graph sizes between 2 and 20. For different kinds of query
graphs, it shows the number of csg-cmp-pairs (#ccp) and the values of the
inner counter after termination of DPsize and DPsub applied to the different
query graphs.
Looking at these numbers, we observe the following:
• For chain and cycle queries, DPsize soon becomes much faster than
DPsub.
Chain Cycle
n #ccp/2 DPsub DPsize #ccp/2 DPsub DPsize
2 1 2 1 1 2 1
5 20 84 73 40 140 120
10 165 3962 1135 405 11062 2225
15 560 130798 5628 1470 523836 11760
20 1330 4193840 17545 3610 22019294 37900
Star Clique
n #ccp/2 DPsub DPsize #ccp/2 DPsub DPsize
2 1 2 1 1 2 1
5 32 130 110 90 180 280
10 2304 38342 57888 28501 57002 306991
15 114688 9533170 57305929 7141686 14283372 307173877
20 4980736 2323474358 59892991338 1742343625 3484687250 309338182241
Figure 3.9: Size of the search space for different graph structures
• For star and clique queries, DPsub soon becomes much faster than
DPsize.
From the latter observation we can conclude that in almost all cases the tests
performed by both algorithms in their innermost loop fail. Both algorithms
are far away from the theoretical lower bound given by #ccp. This conclusion
motivates us to derive a new algorithm whose InnerCounter value is equal to
the number of csg-cmp-pairs.
DPccp
Input: a connected query graph with relations R = {R0, . . . , Rn−1}
Output: an optimal bushy join tree
for all Ri ∈ R {
  BestPlan({Ri}) = Ri;
}
for all csg-cmp-pairs (S1, S2), S = S1 ∪ S2 {
  ++InnerCounter;
  ++OnoLohmanCounter;
  p1 = BestPlan(S1);
  p2 = BestPlan(S2);
  CurrPlan = CreateJoinTree(p1, p2);
  if (cost(BestPlan(S)) > cost(CurrPlan)) {
    BestPlan(S) = CurrPlan;
  }
  CurrPlan = CreateJoinTree(p2, p1);
  if (cost(BestPlan(S)) > cost(CurrPlan)) {
    BestPlan(S) = CurrPlan;
  }
}
CsgCmpPairCounter = 2 * OnoLohmanCounter;
return BestPlan({R0, . . . , Rn−1});
[Figure: seven small example graphs on nodes numbered 0–3, illustrating the node numbering scheme used by the enumeration.]
EnumerateCsgRec(G, S, X)
N = 𝒩(S) \ X; // 𝒩(S) denotes the neighbourhood of S
for all S′ ⊆ N, S′ ≠ ∅, enumerate subsets first {
  emit (S ∪ S′);
}
for all S′ ⊆ N, S′ ≠ ∅, enumerate subsets first {
  EnumerateCsgRec(G, (S ∪ S′), (X ∪ N));
}
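In C++, with relation sets represented as bitmasks and 𝒩(S) computed from neighbour masks, the enumeration reads as below. The driver enumerateCsg, which seeds the recursion once per relation and prohibits all relations with a smaller index, is included for context; its form is inferred from the trace table that follows. The adjacency masks in main encode the five-relation example graph.

#include <cstdint>
#include <cstdio>
#include <vector>

using Set = std::uint32_t;

// Neighbourhood N(S): all nodes adjacent to some node in S, minus S itself.
Set neighbourhood(Set S, const std::vector<Set>& adj) {
    Set N = 0;
    for (Set s = S; s; s &= s - 1) N |= adj[__builtin_ctz(s)];
    return N & ~S;
}

void enumerateCsgRec(Set S, Set X, const std::vector<Set>& adj) {
    Set N = neighbourhood(S, adj) & ~X;
    if (N == 0) return;
    // emit S ∪ S' for every non-empty subset S' of N, smallest subsets first
    for (Set S1 = N & -N; ; S1 = N & (S1 - N)) {
        std::printf("emit {0x%x}\n", (unsigned)(S | S1));
        if (S1 == N) break;
    }
    for (Set S1 = N & -N; ; S1 = N & (S1 - N)) {
        enumerateCsgRec(S | S1, X | N, adj);
        if (S1 == N) break;
    }
}

// Driver: start from each node Ri; nodes with smaller index are prohibited.
void enumerateCsg(int n, const std::vector<Set>& adj) {
    for (int i = n - 1; i >= 0; --i) {
        std::printf("emit {0x%x}\n", (unsigned)(1u << i));
        enumerateCsgRec(1u << i, (1u << (i + 1)) - 1, adj);
    }
}

int main() {
    // the example graph: R0 adjacent to R1, R2, R3; each of those adjacent to R4
    std::vector<Set> adj = {0b01110, 0b10001, 0b10001, 0b10001, 0b01110};
    enumerateCsg(5, adj);
}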
[Figure: the example query graph; R0 is connected to R1, R2, and R3, each of which is connected to R4.]
EnumerateCsgRec
S X N emit/S
{4} {0, 1, 2, 3, 4} ∅
{3} {0, 1, 2, 3} {4}
{3, 4}
{2} {0, 1, 2} {3, 4}
{2, 3}
{2, 4}
{2, 3, 4}
{1} {0, 1} {4}
{1, 4}
→ {1, 4} {0, 1, 4} {2, 3}
{1, 2, 4}
{1, 3, 4}
{1, 2, 3, 4}
{0} {0} {1, 2, 3}
{0, 1}
{0, 2}
{0, 3}
{0, 1, 2}
{0, 1, 3}
{0, 2, 3}
{0, 1, 2, 3}
→ {0, 1} {0, 1, 2, 3} {4}
{0, 1, 4}
→ {0, 2} {0, 1, 2, 3} {4}
{0, 2, 4}
are contained in the table in Figure 3.13. In this table, S and X are the
arguments of EnumerateCsgRec. N is the local variable after its initialization.
The column emit/S contains the connected subset emitted, which then becomes
the argument of the recursive call to EnumerateCsgRec (labelled by →). Since
listing all calls is too lengthy, only a subset of the calls is listed.
Generating the connected subsets is an important first step but clearly not
Hence, {R4 } is emitted and together with {R1 }, it forms the csg-cmp-pair
({R1 }, {R4 }). Then, the recursive call to EnumerateCsgRec follows with ar-
guments G, {R4 }, and {R0 , R1 , R4 }. Subsequent EnumerateCsgRec generates
the connected sets {R2 , R4 }, {R3 , R4 }, and {R2 , R3 , R4 }, giving three more
csg-cmp-pairs.
3.2.5 Memoization
Whereas dynamic programming constructs the join trees iteratively from small
trees to larger trees, i.e. works bottom up, memoization works recursively. For a
given set of relations S, it produces the best join tree for S by recursively calling
itself for every subset S1 of S and considering all join trees between S1 and its
complement S2. The best alternative is memoized (hence the name). The reason is that two (even different) (sub-)sets of all relations may very well have
common subsets. For example, {R1, R2, R3, R4, R5} and {R2, R3, R4, R5, R6}
have the common subset {R2, R3, R4, R5}. In order to avoid duplicate work,
memoization is essential.
In the following variant of memoization, we explore the search space of all
bushy trees and consider cross products. We split the functionality across two
functions. EX The first one initializes the BestTree data structure with single
relation join trees for Ri and then calls the second one. The second one is the
core memoization procedure which calls itself recursively.
MemoizationJoinOrdering(R)
Input: a set of relations R
Output: an optimal join tree for R
for (i = 1; i <= n; ++i) {
BestTree({Ri }) = Ri ;
}
return MemoizationJoinOrderingSub(R);
MemoizationJoinOrderingSub(S)
Input: a (sub-) set of relations S
Output: an optimal join tree for S
if(NULL == BestTree(S)) {
for all S1 ⊂ S do {
S2 = S \ S1 ;
CurrTree = CreateJoinTree(MemoizationJoinOrderingSub(S1), MemoizationJoinOrderingSub(S2));
if (BestTree(S) == NULL || cost(BestTree(S)) > cost(CurrTree)) {
BestTree(S) = CurrTree;
}
}
}
return BestTree(S);
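A C++ rendering of this memoization variant is sketched below; a hash map plays the role of BestTree, and only the best cost (not the tree itself) is memoized for brevity. The result-size computation from cardinalities and pairwise selectivities is an assumption made for the example.

#include <cstdint>
#include <cstdio>
#include <limits>
#include <unordered_map>
#include <vector>

// Memoization sketch: bestCost(S) is computed on demand and cached.
// It explores all bushy trees and cross products, as in the pseudocode.
struct Memoizer {
    std::vector<double> card;                    // base cardinalities
    std::vector<std::vector<double>> sel;        // symmetric selectivities (1.0 = none)
    std::unordered_map<std::uint32_t, double> memo;

    // |result(S)|: product of the cardinalities and selectivities within S.
    double resultSize(std::uint32_t S) const {
        double sz = 1.0;
        int n = (int)card.size();
        for (int i = 0; i < n; ++i) {
            if (!(S & (1u << i))) continue;
            sz *= card[i];
            for (int j = i + 1; j < n; ++j)
                if (S & (1u << j)) sz *= sel[i][j];
        }
        return sz;
    }

    double bestCost(std::uint32_t S) {
        if ((S & (S - 1)) == 0) return 0.0;      // single relation: no join needed
        auto it = memo.find(S);
        if (it != memo.end()) return it->second; // memoized (hence the name)
        double best = std::numeric_limits<double>::infinity();
        // every proper non-empty subset S1 of S; S2 is its complement within S
        for (std::uint32_t S1 = (S - 1) & S; S1; S1 = (S1 - 1) & S) {
            double c = bestCost(S1) + bestCost(S ^ S1) + resultSize(S);
            if (c < best) best = c;
        }
        memo[S] = best;
        return best;
    }
};

int main() {
    Memoizer m;
    m.card = {100.0, 200.0, 50.0};
    m.sel = {{1.0, 0.1, 1.0}, {0.1, 1.0, 0.05}, {1.0, 0.05, 1.0}};
    std::printf("best cost: %g\n", m.bestCost(0b111));
}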
Again, pruning techniques can help to speed up plan generation [787]. ToDo?
3.2.6 Join Ordering by Generating Permutations
ConstructPermutations(Query Specification)
Input: query specification for relations {R1 , . . . , Rn }
Output: optimal left-deep tree
BestPermutation = NULL;
Prefix = ;
Rest = {R1 , . . . , Rn };
ConstructPermutationsSub(Prefix, Rest);
return BestPermutation
ConstructPermutationsSub(Prefix, Rest)
Input: a prefix of a permutation and the relations to be added (Rest)
Output: none, side-effect on BestPermutation
if (Rest == ∅) {
if (BestPermutation == NULL || cost(Prefix) < cost(BestPermutation)) {
BestPermutation = Prefix;
}
return
}
foreach (Ri , Rj ∈ Rest) {
if (cost(Prefix ◦ hRi , Rj i) ≤ cost(Prefix ◦ hRj , Ri i)) {
ConstructPermutationsSub(Prefix ◦ hRi i, Rest \ {Ri });
}
if (cost(Prefix ◦ hRj , Ri i) ≤ cost(Prefix ◦ hRi , Rj i)) {
ConstructPermutationsSub(Prefix ◦ hRj i, Rest \ {Rj });
}
}
return
The algorithm can be made more efficient, if the foreach loop considers only
a single relation and performs the swap test with this relation and the last
relation occurring in Prefix.
The algorithm has two main advantages over dynamic programming and
memoization. The first advantage is that it needs only linear space opposed
to exponential space for the two mentioned alternatives. The other main
advantage over dynamic programming is that it generates join trees early,
whereas with dynamic programming we only generate a plan after the whole
search space has been explored. Thus, if the query contains too many joins—
that is, the search space cannot be fully explored in reasonable time and
space—dynamic programming will not generate any plan at all. If stopped,
ConstructPermutations will not necessarily compute the best plan, but still
some plans have been investigated. This allows us to stop it after some time
limit has been exceeded. The time limit itself can be fixed, like 100 ms, or variable,
like 5% of the execution time of the best plan found so far.
The predicates in the if statement can be made more efficient if a (local)
ranking function is available. Further speed-up of the algorithm can be achieved
if additionally the idea of memoization is applied (of course, this jeopardizes
the small memory footprint).
The following variant might be interesting if one is willing to go from linear
space consumption to quadratic space consumption. The original algorithm
is then started n times, once for each relation as a starting relation. The n
different instantiations then have to run interleaved. This variant reduces the
dependency on the starting relation.
Worst Case Analysis ToDo/EX
Pruning/memoization/propagation ToDo/EX
3.2.7 A Dynamic Programming based Heuristics for Chain Queries
We now consider chain queries
R1 — R2 — · · · — Rn.
For every edge (Ri, Ri+1), there is an associated selectivity fi,i+1 = |Ri ⋈ Ri+1| / |Ri × Ri+1|. We define all other selectivities fi,j = 1 for |i − j| ≠ 1.
They correspond to cross products.
In this section we consider only left-deep processing trees. However, we
allow them to contain cross products. Hence, any permutation is a valid join
tree. There is a unique correspondence not only between left-deep join trees
and permutations but also between consecutive parts of a permutation and
segments of a left-deep tree.
products, it uniquely corresponds to a consecutive part of the chain in the query
graph. In this case, we also speak of (sub)chains or connected (sub)sequences.
We say that two relations Ri and Rj are connected if they are adjacent in G;
more generally, two sequences s and t are connected if there exist relations Ri
C_u(ε) := 0
C_u(Ri) := 0 if u = ε
C_u(Ri) := (∏_{Rj <_u Ri} f_{j,i}) · n_i if u ≠ ε
with
T_u(ε) := 1
T_u(s) := ∏_{Ri ∈ s} ((∏_{Rj <_{us} Ri} f_{j,i}) · n_i)
and only the second version will apply to the original cost function C. As we
will see, C′ differs from C in exactly the problematic case, in which it is defined
as C′_u(Ri) := |Ri|. Now, C′(s) = 0 holds if and only if s = ε holds. Within
subsequent definitions and lemmata, C can also be replaced by C′ without
changing their validity. Last, we abbreviate C_ε by C for convenience.
Hence, within the optimal sequence, the relation with the smallest rank (here
R3, since rank_{R1}(R3) < rank_{R1}(R2)) is preferred. As the next lemma will
show, this is no accident.
□
Using the rank function, the following lemma can be proved.
Lemma 3.2.10 Let u, x and y be three subchains where x and y are not inter-
connected. Then we have:
A special case occurs when x and y are single relations. Then the above condi-
tion simplifies to
rank_{ux}(y) < rank_u(x) ≤ rank_u(y)
and
C(R1 R2 R3 ) = 15
C(R1 R3 R2 ) = 20
Hence,
and (R2, R3) is a contradictory pair within R1 R2 R3. Now the use of the term
contradictory becomes clear: the costs do not behave as could be expected from
the ranks. □
The next (obvious) lemma states that contradictory chains are necessarily
connected.
Lemma 3.2.12 If there is no connection between two subchains x and y, then
they cannot build a contradictory pair (x, y).
Now we present the fact that between a contradictory pair of relations, there
cannot be any other relation not connected to them without increasing cost.
Lemma 3.2.13 Let S = usvtw be a sequence. If there is no connection between
relations in s and v and relations in v and t, and ranku (s) ≥ rankus (t), then
there exists a sequence S 0 not having higher costs, where s immediately precedes
t.
C(R1 R2 R3 R5 R4 ) = 4 + 8 + 16 + 8 = 36
C(R1 R2 R5 R3 R4 ) = 4 + 8 + 16 + 8 = 36
C(R1 R5 R2 R3 R4 ) = 2 + 8 + 16 + 8 = 34
□
The next lemma shows that, if there exist two sequences of single rank-sorted
relations, then their costs as well as their ranks are necessarily equal.
Lemma 3.2.14 Let S = x1 · · · xn and S′ = y1 · · · yn be two different rank-sorted chains containing exactly the relations R1, . . . , Rn, i.e.
rank_{x1···xi−1}(xi) ≤ rank_{x1···xi}(xi+1) for all 1 ≤ i < n,
rank_{y1···yi−1}(yi) ≤ rank_{y1···yi}(yi+1) for all 1 ≤ i < n,
then S and S′ have equal costs and, furthermore,
rank_{x1···xi−1}(xi) = rank_{y1···yi−1}(yi) for all 1 < i ≤ n.
One could conjecture that the following generalization of Lemma 3.2.14 is
true, although no one has proved it so far.
Conjecture 3.2.1 Let S = x1 · · · xn and S′ = y1 · · · ym be two different rank-sorted chains for the relations R1, . . . , Rn, where the xi's and yi's are subsequences
such that
rank_{x1···xi−1}(xi) ≤ rank_{x1···xi}(xi+1) for all 1 ≤ i < n,
rank_{y1···yi−1}(yi) ≤ rank_{y1···yi}(yi+1) for all 1 ≤ i < m,
and the subsequences xi and yj are all optimal (with respect to the fixed prefixes
x1 . . . xi−1 and y1 . . . yj−1 ), then S and S 0 have equal costs.
Consider the problem of merging two optimal unconnected chains. If we
knew that the ranks of relations in an optimal chain are always sorted in as-
cending order, we could use the classical merge procedure to combine the two
chains. The resulting chain would also be rank-sorted in ascending order and,
according to Lemma 3.2.14, it would be optimal. Unfortunately, this does not
work, since there are optimal chains whose ranks are not sorted in ascending
order: those containing sequences with contradictory ranks.
Now, as shown in Lemma 3.2.13, between contradictory pairs of relations
there cannot be any other relation not connected to them. Hence, in the merging
process, we have to take care that we do not merge a contradictory pair of
relations with a relation not connected to the pair. In order to achieve this,
we apply the same trick as in the IKKBZ algorithm: we tie the relations of a
contradictory subchain together by building a compound relation. Assume that
we tie together the relations r1, . . . , rn to a new relation r1,...,n. Then we define
the size of r1,...,n as |r1,...,n| = |r1 ⋈ · · · ⋈ rn|. Further, if some ri (1 ≤ i ≤ n)
has a connection to some rk ∉ {r1, . . . , rn}, then we define the selectivity
factor f_{r1,...,n,rk} between rk and r1,...,n as f_{r1,...,n,rk} = f_{i,k}.
If we tie together contradictory pairs, the resulting chain of compound re-
lations still does not have to be rank-sorted with respect to the compound
relations. To overcome this, we iterate the process of tying contradictory pairs
of compound relations together until the sequence of compound relations is
rank-sorted, which will eventually be the case. That is, we apply the normal-
ization as used in the IKKBZ algorithm. However, we have to reformulate it
for relativized costs and ranks:
Normalize(p, s)
while (there exist subsequences u, v (u ≠ ε) and
       compound relations x, y such that s = uxyv
       and C_pu(xy) ≤ C_pu(yx)
       and rank_pu(x) > rank_pux(y)) {
  replace xy by a compound relation (x, y);
}
return (p, s);
The compound relations in the result of the procedure Normalize are called
contradictory chains. A maximal contradictory subchain is a contradictory sub-
chain that cannot be made longer by further tying steps. Resolving the tyings
introduced in the procedure normalize is called de-normalization. It works the
same way as in the IKKBZ algorithm. The cost, size and rank functions can
now be extended to sequences containing compound relations in a straightfor-
ward way. We define the cost of a sequence containing compound relations to
be identical with the cost of the corresponding de-normalized sequence. The
size and rank functions are defined analogously.
The following simple observation is central to the algorithms: every chain
can be decomposed into a sequence of adjacent maximal contradictory sub-
chains. For convenience, we often speak of chains instead of subchains and of
contradictory chains instead of maximal contradictory subchains. The mean-
ing should be clear from the context. Further, we note that the decomposi-
tion into adjacent maximal contradictory subchains is not unique. For exam-
ple, consider an optimal subchain r1 r2 r3 and a sequence u of preceding rela-
tions. If ranku (r1 ) > rankur1 (r2 ) > rankur1 r2 (r3 ) one can easily show that
both (r1, (r2, r3)) and ((r1, r2), r3) are contradictory subchains. Nevertheless,
this ambiguity is not important since in the following we are only interested
in contradictory subchains which are optimal. In this case, the condition
C_u(xy) ≤ C_u(yx) is certainly true and can therefore be neglected. One can
show that for the case of optimal subchains the nondeterministically defined
normalization process is well-defined, that is, if S is optimal, Normalize(P, S) will
always terminate with a unique “flat” decomposition of S into maximal contra-
dictory subchains (flat means that we remove all but the outermost parenthesis,
e.g. (R1 R2 )(((R5 R4 )R3 )R6 ) becomes (R1 R2 )(R5 R4 R3 R6 )).
The next two lemmata and the conjecture show a possible way to overcome
the problem that if we consider cross products, we have an unconstrained or-
dering problem and the idea of Monma and Sidney as exploited in the IKKBZ
algorithm is no longer applicable. The next lemma is a direct consequence of
the normalization procedure.
The next result shows how to build an optimal sequence from two optimal
non-interconnected sequences.
Lemma 3.2.16 Let x and y be two optimal sequences of relations where x and
y are not interconnected. Then the sequence obtained by merging the maximal
contradictory subchains in x and y (as obtained by normalize) according to
their ascending rank is optimal.
Conjecture 3.2.2 Consider two sequences S and T containing exactly the re-
lations R1 ,. . . ,Rn . Let S = s1 . . . sk and T = t1 . . . tl be such that each of the
maximal contradictory subchains si , i = 1, . . . , k and tj , j = 1, . . . , l are optimal
recursively decomposable. Then S and T have equal costs.
Definition 3.2.17 (neighbourhood) We call the set of relations that are di-
rectly connected to a subchain (with respect to the query graph G) the complete
neighbourhood of that subchain. A neighbourhood is a subset of the complete
neighbourhood. The complement of a neighbourhood u of a subchain s is defined
as v \ u, where v denotes the complete neighbourhood of s.
s = R2 R4 R3 R6 R5 R1 .
M ⊆ {Ri , . . . , Rn }.
(c) For all (l1 , l2 ) ∈ L1 × L2 , perform the following steps:
i. Let L be the result of merging l1 and l2 according to their ranks.
ii. Use Ri L to update the current-best join ordering.
Suppose that Conjecture 3.2.2 is true, and we can replace the backtracking part
by a search for the first solution. Then the complexity of step 1 is O(n^4),
whereas the complexity of step 2 amounts to
∑_{i=1}^{n} (O(i^2) + O((n − i)^2) + O(n)) = O(n^3).
Hence, the total complexity would be O(n^4) in the worst case. Of
course, if our conjecture is false, the necessary backtracking step might lead to
an exponential worst case complexity.
Step 1 is identical to step 2 of our first algorithm. Note that Lemma 3.2.15
cannot be applied to the sequence in Step 2, since an optimal recursive de-
composable chain is not necessarily an optimal chain. Therefore, the question
arises whether Step 3 really makes sense. One can show that the partial order
defined by the precedence relation among the contradictory subchains has the
property that all elements along paths in the partial order are sorted by rank.
By computing a greedy topological ordering (greedy with respect to the ranks),
we obtain a sequence as requested in step 3.
Let us briefly analyze the worst case time complexity of the second algo-
rithm. The first step requires time O(n4 ), whereas the second step requires time
O(n2 ). The third step has complexity O(n log n). Hence, the total complexity
is O(n4 ).
Algorithm II’ is based on the cost function C 0 . We can now modify the
algorithm for the original cost function C as follows.
Algorithm CHAIN-II:
(a) Let L1 be the result of applying the steps 2 and 3 of Algorithm II’ to
all optimal recursive decomposable subchains whose extent (N, M )
satisfies Ri ∈ N and M ⊆ {R1 , . . . , Ri }.
(b) Let L2 be the result of applying the steps 2 and 3 of Algorithm II' to
all optimal recursive decomposable subchains whose extent (N, M)
satisfies Ri ∈ N and M ⊆ {Ri, . . . , Rn}.
(c) Let L be the result of merging L1 and L2 according to their ranks.
(d) De-normalize L.
(e) Use Ri L to update the current-best join ordering.
Whereas the run time of the second algorithm is mainly determined by the
number of relations in the query, the run time of the first also heavily depends
on the number of existing optimal contradictory subchains. In the worst case,
the first algorithm is slightly inferior to the second. Additionally, Hamalainen
reports on an independent implementation of the second algorithm [388]. He
could not find an example where the second algorithm did not produce the
optimal result either. We encourage the reader to prove that it produces the
optimal result. EX
3.2.8 Transformation-Based Approaches
Whenever an equivalence is applied, it is difficult to see whether the resulting join tree has already been produced or not (see also Figure 2.6). Thus, this procedure is highly
inefficient. Hence, it does not play any role in practice. Nevertheless, we give
the pseudo-code for it, since it forms the basis for several of the following algo-
rithms. We split the exhaustive transformation approach into two algorithms.
One that applies all equivalences to a given join tree (ApplyTransformations)
and another that does the loop (ExhaustiveTransformation). A transforma-
tion is applied in a directed way. Thus, we reformulate commutativity and
associativity as rewrite rules, using ⇝ to indicate the direction.
The following table summarizes all rules commonly used in transformation-
based and randomized join ordering algorithms. The first three are directly
derived from the commutativity and associativity laws for the join. The other
rules are shortcuts used under special circumstances. For example, left associa-
tivity may turn a left-deep tree into a bushy tree. When only left-deep trees are
to be considered, we need a replacement for left associativity. This replacement
is called left join exchange.
R1 ⋈ R2 ⇝ R2 ⋈ R1                        Commutativity
(R1 ⋈ R2) ⋈ R3 ⇝ R1 ⋈ (R2 ⋈ R3)          Right Associativity
R1 ⋈ (R2 ⋈ R3) ⇝ (R1 ⋈ R2) ⋈ R3          Left Associativity
(R1 ⋈ R2) ⋈ R3 ⇝ (R1 ⋈ R3) ⋈ R2          Left Join Exchange
R1 ⋈ (R2 ⋈ R3) ⇝ R2 ⋈ (R1 ⋈ R3)          Right Join Exchange
Two more rules are often used to transform left-deep trees. The first opera-
tion (swap) exchanges two arbitrary relations in a left-deep tree. The second
operation (3Cycle) performs a cyclic rotation of three arbitrary relations in a
left-deep tree. To account for different join methods, a rule called join method
exchange is introduced.
The first rule set (RS-0) we are using contains the commutativity rule and
both associativity rules. Applying associativity can lead to cross products.
If we do not want to consider cross products, we only apply any of the two
associativity rules if the resulting expression does not contain a cross product.
It is easy to extend ApplyTransformations to cover this by extending the if
conditions with
and (ConsiderCrossProducts || connected(·))
where the argument of connected is the result of applying a transformation.
ExhaustiveTransformation({R1 , . . . , Rn })
Input: a set of relations
Output: an optimal join tree
Let T be an arbitrary join tree for all relations
Done = ∅; // contains all trees processed
ToDo = {T }; // contains all trees to be processed
while (!empty(ToDo)) {
Let T be an arbitrary tree in ToDo
ToDo \ = T ;
Done ∪ = T ;
Trees = ApplyTransformations(T );
for all T ∈ Trees do {
if (T 6∈ ToDo ∪ Done) {
ToDo + = T ;
}
}
}
return cheapest tree found in Done;
ApplyTransformations(T )
Input: join tree
Output: all trees derivable by associativity and commutativity
Trees = ∅;
Subtrees = all subtrees of T rooted at inner nodes
for all S ∈ Subtrees do {
if (S is of the form S1 ⋈ S2) {
  Trees += S2 ⋈ S1;
}
if (S is of the form (S1 ⋈ S2) ⋈ S3) {
  Trees += S1 ⋈ (S2 ⋈ S3);
}
if (S is of the form S1 ⋈ (S2 ⋈ S3)) {
  Trees += (S1 ⋈ S2) ⋈ S3;
}
}
return Trees;
Besides the problems mentioned above, this algorithm also has the problem
that the sharing of subtrees is a non-trivial task. In fact, we assume that
ApplyTransformations produces modified copies of T . To see how ExhaustiveTransformation
works, consider again Figure 2.6. Assume that the top-left join tree is the initial
join tree. Then, from this join tree ApplyTransformations produces all trees
reachable by some edge. All of these are then added to ToDo. The next call
to ApplyTransformations with any to the produced join trees will have the
initial join tree contained in Trees. The complete set of visited join trees after
this step is determined from the initial join tree by following at most two edges.
Let us reformulate the algorithm such that it uses a data structure similar
to dynamic programming or memoization in order to avoid duplicate work. For
any subset of relations, dynamic programming remembers the best join tree.
This does not quite suffice for the transformation-based approach. Instead, we
have to keep all join trees generated so far including those differing in the order
of the arguments or a join operator. However, subtrees can be shared. This
is done by keeping pointers into the data structure (see below). So, the dif-
ference between dynamic programming and the transformation-based approach
becomes smaller. The main remaining difference is that dynamic programming
only considers these join trees while with the transformation-based approach
we have to keep the considered join trees since other join trees (more beneficial)
might be generatable from them.
The data structure used for remembering trees is often called the MEMO
structure. For every subset of relations to be joined (except the empty set), a
class exists in the MEMO structure. Each class contains all the join trees that
join exactly the relations describing the class. Here is an example for join trees
containing three relations.
ExhaustiveTransformation2(Query Graph G)
Input: a query specification for relations {R1 , . . . , Rn }.
Output: an optimal join tree
initialize MEMO structure
ExploreClass({R1 , . . . , Rn })
return best of class {R1 , . . . , Rn }
ExploreClass(C)
Input: a class C ⊆ {R1 , . . . , Rn }
Output: none, but has side-effect on MEMO-structure
while (not all join trees in C have been explored) {
choose an unexplored join tree T in C
ApplyTransformations2(T)
mark T as explored
}
return
ApplyTransformations2(T )
Input: a join tree of a class C
Output: none, but has side-effect on MEMO-structure
ExploreClass(left-child(T ));
ExploreClass(right-child(T ));
foreach transformation t and class member of child classes {
  foreach T′ resulting from applying t to T {
    if (T′ not in MEMO structure) {
      add T′ to class C of MEMO structure
    }
  }
}
return
T1: Commutativity C1 ⋈0 C2 ⇝ C2 ⋈1 C1
Disable all transformations T1, T2, and T3 for ⋈1.
Commutativity T1 gives us {R3, R4} ⋈000 {R1, R2} (Step 3). For right associativity, we have two elements in class {R1, R2}. Substituting them and applying
T2 gives
The latter contains a cross product. This leaves us with the former as the result
of Step 4. The right argument of the topmost join is R2 ⋈111 {R3, R4}. Since
we do not find it in class {R2, R3, R4}, we add it (4).
T3 is next.
The latter contains a cross product. This leaves us with the former as the result
of Step 5. We also add {R1, R2} ⋈111 R3. Now that {R1, R2} ⋈111 {R3, R4} is
completely explored, we turn to {R3, R4} ⋈000 {R1, R2}, but all transformations
are disabled here.
R1 ⋈100 {R2, R3, R4} is next. First, {R2, R3, R4} has to be explored. The
only entry is R2 ⋈111 {R3, R4}. Remember that {R3, R4} is already explored.
T2 is not applicable. The other two transformations give us
T1 {R3, R4} ⋈000 R2
Those join trees not exhibiting a cross product are added to the MEMO structure under 6. Applying commutativity to {R2, R4} ⋈100 R3 gives 7. Commutativity is the only rule enabled for R1 ⋈100 {R2, R3, R4}. Its application results
in 8.
{R1, R2, R3} ⋈100 R4 is next. It is simple to explore the class {R1, R2, R3}
with its only entry {R1, R2} ⋈111 R3:
T1 R3 ⋈000 {R1, R2}
Commutativity can still be applied to R1 ⋈100 (R2 ⋈111 R3). All the new entries
are numbered 9. Commutativity is the only rule enabled for {R1, R2, R3} ⋈100 R4.
Its application results in 10.
□
The next two sets of transformations were originally intended for generating
all bushy/left-deep trees for a clique query [671]. They can, however, also be
used to generate all bushy trees when cross products are considered. The rule
set RS-2 for bushy trees is
T1: Commutativity C1 ⋈0 C2 ⇝ C2 ⋈1 C1
Disable all transformations T1, T2, T3, and T4 for ⋈1.
If we initialize the MEMO structure with left-deep trees, we can strip down
the above rule set to Commutativity and Left Associativity. The reason is an
observation made by Shapiro et al.: from a left-deep join tree we can generate
all bushy trees with only these two rules [787].
If we want to consider only left-deep trees, the following rule set RS-3 is
appropriate:
T1 Commutativity R1 ⋈0 R2 ⇝ R2 ⋈1 R1
Here, the Ri are restricted to classes with exactly one relation. T1 is
disabled for ⋈1.
We start with generating random left-deep join trees for n relations. This
problem is identical to generating random permutations. That is, we look for
a fast unranking algorithm that maps the non-negative integers in [0, n![ to
permutations. Let us consider permutations of the numbers {0, . . . , n − 1}.
A mapping between these numbers and relations is established easily, e.g. via
an array. The traditional approach to ranking/unranking of permutations is
to first define an ordering on the permutations and then find a ranking and
unranking algorithm relative to that ordering. For the lexicographic order, al-
gorithms require O(n2 ) time [547, 712]. More sophisticated algorithms separate
the ranking/unranking algorithms into two phases. For ranking, first the in-
version vector of the permutation is established. Then, ranking takes place for
the inversion vector. Unranking works in the opposite direction. The inver-
sion vector of a permutation π = π0 , . . . , πn−1 is defined to be the sequence
v = v0 , . . . , vn−1 , where vi is equal to the number of entries πj with πj > πi
and j < i. Inversion vectors uniquely determine a permutation [863]. However,
naive algorithms of this approach again require O(n2 ) time. Better algorithms
require O(n log n). Using an elaborated data structure, Dietz’ algorithm re-
quires O((n log n)/(log log n)) [238]. Other orders like the Steinhaus-Johnson-
Trotter order have been exploited for ranking/unranking but do not yield any
run-time advantage over the above mentioned algorithms (see [511, 712]).
Since it is not important for our problem that any order constraints are sat-
isfied for the ranking/unranking functions, we use the fastest possible algorithm
established by Myrvold and Ruskey [625]. It runs in O(n) which is also easily
seen to be a lower bound.
The algorithm is based on the standard algorithm to generate random permutations [220, 247, 619]. An array π is initialized such that π[i] = i for
0 ≤ i ≤ n − 1. Then, the loop
for (i = n − 1; i > 0; −−i) swap(π[i], π[random(i + 1)]);
produces a random permutation, where random(k) returns a uniformly distributed number in [0, k[.
Unrank(n, r) {
  Input: the number n of elements to be permuted
         and the rank r of the permutation to be constructed
  Output: a permutation π
  for (i = 0; i < n; ++i) π[i] = i;
  Unrank-Sub(n, r, π);
  return π;
}

Unrank-Sub(n, r, π) {
  for (i = n; i > 0; --i) {
    swap(π[i − 1], π[r mod i]);
    r = ⌊r/i⌋;
  }
}
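Both functions translate directly into compilable C++; the following version unranks all 3! = 6 permutations of three elements (the output order is the one induced by the Myrvold-Ruskey ranking):

#include <cstdio>
#include <utility>
#include <vector>

void unrankSub(int n, unsigned r, std::vector<int>& pi) {
    for (int i = n; i > 0; --i) {
        std::swap(pi[i - 1], pi[r % i]);
        r /= i;
    }
}

std::vector<int> unrank(int n, unsigned r) {
    std::vector<int> pi(n);
    for (int i = 0; i < n; ++i) pi[i] = i;
    unrankSub(n, r, pi);
    return pi;
}

int main() {
    for (unsigned r = 0; r < 6; ++r) {            // the 3! permutations
        std::vector<int> pi = unrank(3, r);
        std::printf("%u: %d %d %d\n", r, pi[0], pi[1], pi[2]);
    }
}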
algorithm is shown in the third line below each tree: we remember the places
(index in the bit-string) where we find a 0.
[Figures: the binary trees with four inner nodes, listed in rank order together with their bit-string encodings, and (Figure 3.16) the triangle of the path counts p(i, j), its edges annotated with the rank intervals [0, 0], [1, 4[, [4, 9[, and [9, 14[.]
These numbers are called the Ballot numbers [129]. The number of paths from
(i, j) to (2n, 0) can thus be computed as (see [548, 549]):
q(i, j) = p(2n − i, j)
Note the special case q(0, 0) = p(2n, 0) = C(n). In Figure 3.16, we annotated
nodes (i, j) by p(i, j). These numbers can be used to assign (sub-) intervals to
paths (Dyck words, trees). For example, if we are at (4, 4), there exists only
a single path to (2n, 0). Hence, the path that travels the edge (4, 4) → (5, 3)
has rank 0. From (3, 3) there are four paths to (2n, 0), one of which we already
considered. This leaves us with three paths that travel the edge (3, 3) → (4, 2).
The paths in this part as assigned ranks in the interval [1, 4[. Figure 3.16 shows
the intervals near the edges. For unranking, we can now proceed as follows.
Assume we have a rank r. We keep opening parentheses (going from (i, j) to
(i + 1, j + 1)) as long as the number of paths from the point reached this way
still exceeds our rank r. If it does not, we close a parenthesis instead (go from
(i, j) to (i + 1, j − 1)). Assume that we went upwards to (i, j) and then have to
go down to (i + 1, j − 1). We subtract the number of paths from (i + 1, j + 1)
from our rank r and proceed iteratively from (i + 1, j − 1) by going up as long
as possible and going down again. Remembering the number of parentheses
opened and closed along our way results in the required encoding. The following
algorithm finalizes these ideas.
UnrankTree(n, r)
Input: a number of inner nodes n and a rank r ∈ [0, C(n − 1)]
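A possible realization of this walk in C++ is sketched below (an illustration derived from the description above, not the original code). It computes the path counts p(m, j) by the obvious recurrence and outputs the Dyck word as a bit string, with 1 for an opening and 0 for a closing parenthesis; extracting the positions of the zeros then yields the tree encoding consumed by TreeEncoding2Tree.

#include <cstdio>
#include <vector>

// p(m, j): number of paths of length m from (0, 0) that end at height j
// and never drop below height 0. Then p(2n - i, j) counts the paths from
// (i, j) down to (2n, 0), i.e. q(i, j) in the text.
long long p(int m, int j) {
    if (j < 0 || j > m) return 0;
    std::vector<std::vector<long long>> t(m + 1, std::vector<long long>(m + 2, 0));
    t[0][0] = 1;
    for (int s = 1; s <= m; ++s)
        for (int h = 0; h <= s; ++h)
            t[s][h] = (h > 0 ? t[s - 1][h - 1] : 0) + t[s - 1][h + 1];
    return t[m][j];
}

// Walk the triangle: open while enough completions remain, close otherwise.
// Returns the Dyck word of rank r (1 = '(' and 0 = ')').
std::vector<int> unrankDyck(int n, long long r) {
    std::vector<int> word;
    int j = 0;
    for (int i = 0; i < 2 * n; ++i) {
        long long up = p(2 * n - (i + 1), j + 1);   // completions after opening
        if (r < up) { word.push_back(1); ++j; }
        else        { r -= up; word.push_back(0); --j; }
    }
    return word;
}

int main() {
    for (long long r = 0; r < 14; ++r) {            // the C(4) = 14 trees
        std::vector<int> w = unrankDyck(4, r);
        for (int b : w) std::printf("%d", b);
        std::printf("\n");
    }
}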
Given an array with the encoding of a tree, it is easy to construct the tree
from it. The following procedure does that.
TreeEncoding2Tree(n, aEncoding) {
  Input: the number n of internal nodes and the tree encoding aEncoding
  Output: root node of the result tree
  root = new Node; /* root of the result tree */
  curr = root; /* current internal node whose subtrees are to be created */
  i = 1; /* pointer to entry in encoding */
  child = 0; /* 0 = left, 1 = right: next child whose subtree is to be created */
  while (i < n) {
    lDiff = aEncoding[i] - aEncoding[i − 1];
    for (k = 1; k < lDiff; ++k) {
      if (child == 0) {
        curr->addLeftLeaf();
        child = 1;
      } else {
        curr->addRightLeaf();
        while (curr->right() != 0) {
          curr = curr->parent();
        }
        child = 1;
      }
    }
    if (child == 0) {
      curr->left(new Node(curr)); // curr becomes parent of the new node
      curr = curr->left();
      ++i;
      child = 0;
    } else {
      curr->right(new Node(curr));
      curr = curr->right();
      ++i;
      child = 0;
    }
  }
  while (curr != 0) {
    curr->addLeftLeaf(); // addLeftLeaf adds a leaf if no left child exists
    curr->addRightLeaf(); // analogous
    curr = curr->parent();
  }
  return root;
}
[Figure: example trees over the leaves R1, R2, S1, S2, and v, illustrating the two operations introduced below.]
with which we can construct new join trees: leaf-insertion introduces a new
leaf node into a given tree and tree-merging merges two join trees. Since we
do not want to generate cross products in this section, we have to apply these
operations carefully. Therefore, we need a description of how to generate all
valid join trees for a given query graph. The central data structure for this
purpose is the standard decomposition graph (SDG). Hence, in the second step,
we define SDGs and introduce an algorithm that derives an SDG from a given
query graph. In the third step, we start counting. The fourth and final step
consists of the unranking algorithm. We do not discuss the ranking algorithm.
It can be found in [302].
We use the Prolog notation | to separate the first element of a list from its
tail. For example, the list ⟨a|t⟩ has a as its first element and t as its tail. Assume
that P is a property of elements. A list L′ is the projection of a list L on P, if
L′ contains all elements of L satisfying the property P. Thereby, the order is
retained. A list L is a merge of two disjoint lists L1 and L2 if L contains all
elements from L1 and L2 and both are projections of L.
A merge of a list L1 with a list L2 whose respective lengths are l1 and l2
can be described by an array α = [α0 , . . . , αl2 ] of non-negative integers whose
sum is equal to l1 . The non-negative integer αi−1 gives the number of elements
of L1 which precede the i-th element of L2 in the merged list. We obtain the
merged list L by first taking α0 elements from L1 . Then, an element from L2
follows. Then α1 elements from L1 and the next element of L2 follow and so
on. Finally follow the last αl2 elements of L1 . Figure 3.17 illustrates possible
merges.
Compare list merges to the problem of non-negative (weak) integer composition [?]. There, we ask for the number of compositions of a non-negative
integer n into k non-negative integers αi with ∑_{i=1}^{k} αi = n. The answer is
binom(n + k − 1, k − 1) [818]. Since we have to decompose l1 into l2 + 1 non-negative integers, the number of possible merges is M(l1, l2) = binom(l1 + l2, l2). The observation
within Sk .
Accordingly, we partition the set of all possible merges into subsets. Each
subset is determined by α0 . For example, the set of possible merges of two
lists L1 and L2 with length l1 = l2 = 4 is partitioned into subsets with α0 = j
for 0 ≤ j ≤ 4. In each partition, we have M(j, l2 − 1) elements. To unrank
a number r ∈ [1, M(l1, l2)], we first determine the partition by computing
k = min_j { j | r ≤ ∑_{i=0}^{j} M(i, l2 − 1) }. Then, α0 = l1 − k. With the new rank
r′ = r − ∑_{i=0}^{k−1} M(i, l2 − 1), we start iterating all over. The following table gives
the numbers for our example and can be used to understand the unranking
algorithm. The algorithm itself can be found in Figure 3.18.
UnrankDecomposition(r, l1, l2)
Input: a rank r, two list sizes l1 and l2
Output: a merge specification α
for (i = 0; i ≤ l2; ++i) {
  alpha[i] = 0;
}
i = k = 0;
while (l1 > 0 && l2 > 0) {
  m = M(k, l2 − 1);
  if (r ≤ m) {
    alpha[i++] = l1 − k;
    l1 = k;
    k = 0;
    --l2;
  } else {
    r -= m;
    ++k;
  }
}
alpha[i] = l1;
return alpha;
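As a usage example, the following compilable C++ transcription unranks all M(2, 2) = 6 merges of two lists of length two into their α-vectors; M is the binomial coefficient from above, and ranks start at 1 as in the text.

#include <cstdio>
#include <vector>

// M(l1, l2) = binom(l1 + l2, l2): the number of merges of two lists.
long long M(int l1, int l2) {
    long long r = 1;
    for (int i = 1; i <= l2; ++i) r = r * (l1 + i) / i;
    return r;
}

// Direct transcription of UnrankDecomposition (Figure 3.18).
std::vector<int> unrankDecomposition(long long r, int l1, int l2) {
    std::vector<int> alpha(l2 + 1, 0);
    int i = 0, k = 0;
    while (l1 > 0 && l2 > 0) {
        long long m = M(k, l2 - 1);
        if (r <= m) {
            alpha[i++] = l1 - k;
            l1 = k;
            k = 0;
            --l2;
        } else {
            r -= m;
            ++k;
        }
    }
    alpha[i] = l1;
    return alpha;
}

int main() {
    for (long long r = 1; r <= M(2, 2); ++r) {
        std::vector<int> a = unrankDecomposition(r, 2, 2);
        std::printf("r=%lld: alpha = [%d, %d, %d]\n", r, a[0], a[1], a[2]);
    }
}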
[Figure 3.19: leaf-insertion — a new leaf v is inserted into a tree T1; tree-merging — two trees T1 and T2 with the common leaf w are merged.]
Observe that if T = (L, v) ∈ T_G, then T ∈ T_G^{v(k)} ⇔ |L| = k.
The operation leaf-insertion is illustrated in Figure 3.19. A new leaf v is
inserted into the tree at level k. Formally, it is defined as follows.
[Figure 3.20: A query graph over the relations a, b, c, d, e, its tree rooted at e, and its standard decomposition graph; the SDG nodes are annotated with their count arrays, the root node +e carrying [0, 5, 5, 5, 3].]
• Ti = (Li , w).
Let α be the integer composition such that L is the result of merging L1 and L2
on α. Then we call (T1 , T2 , α) a merge triplet. We say that T is decomposed
into (constructed from) (T1 , T2 , α) on V1 and V2 .
to construct all possible unordered join trees. For each of our two operations it
has one kind of inner nodes. A unary node labeled +v stands for leaf-insertion
of v. A binary node labeled ∗w stands for tree-merging its subtrees whose only
common leaf is w.
The standard decomposition graph of a query graph G = (V, E) is con-
structed in three steps:
1. pick an arbitrary node r ∈ V as its root node;
2. transform G into a tree G′ by directing all edges away from r;
3. call QG2SDG(G′, r)
with
QG2SDG(G′, v)
Input: a query tree G′ = (V, E) and a node v (initially the root r)
Output: a standard query decomposition tree of G′
Let {w1, . . . , wn} be the children of v;
switch (n) {
  case 0: label v with "v";
  case 1:
    label v as "+v";
    QG2SDG(G′, w1);
  otherwise:
    label v as "∗v";
    create new nodes l, r with label +v;
    E \= {(v, wi) | 1 ≤ i ≤ n};
    E ∪= {(v, l), (v, r), (l, w1)} ∪ {(r, wi) | 2 ≤ i ≤ n};
    QG2SDG(G′, l);
    QG2SDG(G′, r);
}
return G′;
Note that QG2SDG transforms the original graph G′ into its SDG by side-effects.
Thereby, the n-ary tree is transformed into a binary tree similar to the procedure
described by Knuth [496, Chap. 2.3.2]. Figure 3.20 shows a query graph G, its
tree G′ rooted at e, and its standard decomposition tree.
For an efficient access to the number of join trees in some partition T_G^{v(k)}
in the unranking algorithm, we materialize these numbers. This is done in the
count array. The semantics of a count array [c0, c1, . . . , cn] of a node u with
label ◦v (◦ ∈ {+, ∗}) of the SDG is that u can construct ci different trees in
which leaf v is at level i. Then, the total number of trees for a query can be
computed by summing up all the ci in the count array of the root node of the
decomposition tree.
To compute the count and an additional summand adornment of a node
labeled +v , we use the following lemma.
Lemma 3.3.4 Let G = (V, E) be a query graph with n nodes, v ∈ V such that
G′ = G|_{V \{v}} is connected, (v, w) ∈ E, and 1 ≤ k < n. Then

|T_G^{v(k)}| = Σ_{i ≥ k−1} |T_{G′}^{w(i)}|
This lemma follows from the observation made after the definition of the leaf-
insertion operation.
The sets T_{G′}^{w(i)} used in the summands of Lemma 3.3.4 directly correspond
to subsets T_G^{v(k),i} (k − 1 ≤ i ≤ n − 2) defined such that T ∈ T_G^{v(k),i} if

1. T ∈ T_G^{v(k)},
2. the insertion pair on v of T is (T′, k), and
3. T′ ∈ T_{G′}^{w(i)}.

Further, |T_G^{v(k),i}| = |T_{G′}^{w(i)}|. For efficiency, we materialize the summands in an
array of arrays summands.
To compute the count and summand adornment of a node labeled ∗v, we use
the following lemma.

Lemma 3.3.5 Let G = (V, E) be a query graph, w ∈ V, T = (L, w) a join
tree of G, V1, V2 ⊆ V such that G1 = G|V1 and G2 = G|V2 are connected,
V1 ∪ V2 = V, and V1 ∩ V2 = {v}. Then

|T_G^{v(k)}| = Σ_i \binom{k}{i} |T_{G1}^{v(i)}| |T_{G2}^{v(k−i)}|

This lemma follows from the observation made after the definition of the tree-
merge operation.
The sets T_{G1}^{v(i)} used in the summands of Lemma 3.3.5 directly correspond
to subsets T_G^{v(k),i} (0 ≤ i ≤ k) defined such that T ∈ T_G^{v(k),i} if

1. T ∈ T_G^{v(k)},
2. the merge triplet on V1 and V2 of T is (T1, T2, α), and
3. T1 ∈ T_{G1}^{v(i)}.

Further, |T_G^{v(k),i}| = \binom{k}{i} |T_{G1}^{v(i)}| |T_{G2}^{v(k−i)}|.
Before we come to the algorithm for computing the adornments count and
summands, let us make one observation that follows directly from the above
two lemmata. Assume a node v whose count array is [c0, . . . , cm] and whose
summands is s = [s^0, . . . , s^m] with s^k = [s^k_0, . . . , s^k_n]; then c_k = Σ_{j=0}^{n} s^k_j holds.
Figure 3.21 contains the algorithm to adorn SDG’s nodes with count and
summands. It has worst-case complexity O(n3 ). Figure 3.20 shows the count
adornment for the SDG. Looking at the count array of the root node, we see
that the total number of join trees for our example query graph is 18.
The algorithm UnrankLocalTreeNoCross called by UnrankTreeNoCross adorns
the standard decomposition graph with insert-at and merge-using annota-
tions. These can then be used to extract the join tree.
Adorn(v)
Input: a node v of the SDG
Output: v and nodes below are adorned by count and summands
Let {w1, . . . , wn} be the children of v;
switch (n) {
  case 0: count(v) := [1]; // no summands for v
  case 1:
    Adorn(w1);
    assume count(w1) = [c^1_0, . . . , c^1_{m1}];
    count(v) = [0, c_1, . . . , c_{m1+1}] where c_k = Σ_{i=k−1}^{m1} c^1_i;
    summands(v) = [s^0, . . . , s^{m1+1}] where s^k = [s^k_0, . . . , s^k_{m1}] and
      s^k_i = c^1_i if 0 < k and k − 1 ≤ i, and s^k_i = 0 else;
  case 2:
    Adorn(w1);
    Adorn(w2);
    assume count(w1) = [c^1_0, . . . , c^1_{m1}];
    assume count(w2) = [c^2_0, . . . , c^2_{m2}];
    count(v) = [c_0, . . . , c_{m1+m2}] where
      c_k = Σ_{i=0}^{m1} \binom{k}{i} c^1_i c^2_{k−i}; // c^2_i = 0 for i ∉ {0, . . . , m2}
    summands(v) = [s^0, . . . , s^{m1+m2}] where s^k = [s^k_0, . . . , s^k_{m1}] and
      s^k_i = \binom{k}{i} c^1_i c^2_{k−i} if 0 ≤ k − i ≤ m2, and s^k_i = 0 else;
}
UnrankTreeNoCross(r, v)
Input: a rank r and the root v of the SDG
Output: adorned SDG
let count(v) = [x0, . . . , xm];
k := min_j such that r ≤ Σ_{i=0}^{j} xi; // efficiency: binary search on materialized sums
r′ := r − Σ_{i=0}^{k−1} xi;
UnrankLocalTreeNoCross(v, r′, k);
The following table shows the intervals associated with the partitions T_G^{e(k)} for
the standard decomposition graph in Figure 3.20:

Partition      Interval
T_G^{e(1)}     [1, 5]
T_G^{e(2)}     [6, 10]
T_G^{e(3)}     [11, 15]
T_G^{e(4)}     [16, 18]
Within the algorithm below, UnrankTriplet(r, X, Y, Z) unranks a rank r ∈ [1, X · Y · Z] into a triplet (x, y, z) ∈ {(x, y, z) | 1 ≤ x ≤ X, 1 ≤ y ≤ Y, 1 ≤ z ≤ Z}.
UnrankLocalTreeNoCross(v, r, k)
Input: an SDG node v, a rank r, a number k identifying a partition
Output: adornments of the SDG as a side-effect
Let {w1, . . . , wn} be the children of v;
switch (n) {
  case 0:
    assert(r = 1 && k = 0);
    // no additional adornment for v
  case 1:
    let count(v) = [c0, . . . , cn];
    let summands(v) = [s^0, . . . , s^n];
    assert(k ≤ n && r ≤ ck);
    k1 = min_j such that r ≤ Σ_{i=0}^{j} s^k_i;
    r1 = r − Σ_{i=0}^{k1−1} s^k_i;
    insert-at(v) = k;
    UnrankLocalTreeNoCross(w1, r1, k1);
  case 2:
    let count(v) = [c0, . . . , cn];
    let summands(v) = [s^0, . . . , s^n];
    let count(w1) = [c^1_0, . . . , c^1_{n1}];
    let count(w2) = [c^2_0, . . . , c^2_{n2}];
    assert(k ≤ n && r ≤ ck);
    k1 = min_j such that r ≤ Σ_{i=0}^{j} s^k_i;
    q = r − Σ_{i=0}^{k1−1} s^k_i;
    k2 = k − k1;
    (r1, r2, a) = UnrankTriplet(q, c^1_{k1}, c^2_{k2}, \binom{k}{k1});
    α = UnrankDecomposition(a);
    merge-using(v) = α;
    UnrankLocalTreeNoCross(w1, r1, k1);
    UnrankLocalTreeNoCross(w2, r2, k2);
}
QuickPick(Query Graph G)
Input: a query graph G = ({R1 , . . . , Rn }, E)
Output: a bushy join tree
BestTreeFound = any join tree
while stopping criterion not fulfilled {
  E′ = E;
  Trees = {R1, . . . , Rn};
  while (|Trees| > 1) {
    choose e ∈ E′ randomly;
    E′ −= e;
    if (e connects two relations in different subtrees T1, T2 ∈ Trees) {
      Trees −= T1;
      Trees −= T2;
      Trees += CreateJoinTree(T1, T2);
    }
  }
Tree = single tree contained in Trees;
if (cost(Tree) < cost(BestTreeFound)) {
BestTreeFound = Tree;
}
}
return BestTreeFound
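For concreteness, here is a hedged Python sketch of QuickPick (not from the original text). The cost function and the stopping criterion (a fixed number of trials) are assumptions supplied by the caller; a union-find structure tracks which relations already share a subtree:

import random

def quick_pick(relations, edges, cost, trials=100):
    best = None
    for _ in range(trials):                      # stopping criterion
        parent = {r: r for r in relations}
        tree = {r: r for r in relations}         # root -> join tree built so far
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]    # path halving
                x = parent[x]
            return x
        pool = list(edges)
        random.shuffle(pool)                     # choose edges in random order
        for u, v in pool:
            ru, rv = find(u), find(v)
            if ru != rv:                         # connects two different subtrees
                tree[ru] = ('join', tree[ru], tree[rv])   # CreateJoinTree
                parent[rv] = ru
        candidate = tree[find(relations[0])]
        if best is None or cost(candidate) < cost(best):
            best = candidate
    return best

The sketch assumes a connected query graph, so all relations end up in a single tree.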
IterativeImprovementBase(Query Graph G)
Input: a query graph G = ({R1 , . . . , Rn }, E)
Output: a join tree
do {
JoinTree = random tree
JoinTree = IterativeImprovement(JoinTree)
IterativeImprovement(JoinTree)
Input: a join tree
Output: improved join tree
do {
JoinTree’ = randomly apply a transformation to JoinTree;
if (cost(JoinTree’) < cost(JoinTree)) {
JoinTree = JoinTree’;
}
} while (local minimum not reached)
return JoinTree
SimulatedAnnealing(Query Graph G)
Input: a query graph G = ({R1 , . . . , Rn }, E)
Output: a join tree
BestTreeSoFar = random tree;
Tree = BestTreeSoFar;
do {
do {
Tree’ = apply random transformation to Tree;
if (cost(Tree’) < cost(Tree)) {
Tree = Tree’;
} else {
with probability e^{−(cost(Tree’)−cost(Tree))/temperature}
Tree = Tree’;
}
if (cost(Tree) < cost(BestTreeSoFar)) {
BestTreeSoFar = Tree;
}
} while (equilibrium not reached)
reduce temperature;
} while (not frozen)
return BestTreeSoFar
Besides the rule set used, the initial temperature, the temperature reduction,
and the definitions of equilibrium and frozen determine the algorithm's
behavior. For each of them, several alternatives have been proposed in the
literature. The starting temperature can be calculated as follows: determine the
standard deviation σ of costs by sampling and multiply it with a constant value
(20 in [847]). An alternative is to set the starting temperature to twice the
cost of the first randomly selected join tree [445], or to determine the starting
temperature such that at least 40% of all possible transformations are accepted
[823].
For temperature reduction, we can apply the formula temp ∗= 0.975 [445]
or temp ∗= max(0.5, e^{−λt/σ}) [847].
The equilibrium is defined to be reached if, for example, the cost distribution
of the generated solutions is sufficiently stable [847], the number of iterations is
sixteen times the number of relations in the query [445], or the number of iterations
is the same as the number of relations in the query [823].
We can establish frozenness if the difference between the maximum and
minimum costs among all accepted join trees at the current temperature equals
the maximum change in cost in any accepted move at the current temperature
[847], if the current solution could not be improved in four outer loop iterations
and the temperature has fallen below one [445], or if the current solution
could not be improved in five outer loop iterations and less than two percent of
the generated moves were accepted [823].
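The following small Python helpers sketch these choices (the constants follow the heuristics cited above; everything else, including the function names, is illustrative):

import math, random, statistics

def start_temperature(sampled_costs, factor=20):
    # a constant (20) times the standard deviation of sampled costs [847]
    return factor * statistics.pstdev(sampled_costs)

def reduce_temperature(temp, rate=0.975):
    # geometric cooling schedule [445]
    return rate * temp

def accept(cost_new, cost_old, temp):
    # acceptance criterion used in the inner loop above
    if cost_new < cost_old:
        return True
    return random.random() < math.exp(-(cost_new - cost_old) / temp)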
Considering that databases are used in mission-critical applications: would you
bet your business on these numbers?
In each step of tabu search, the neighborhood of the current join tree is explored, and
the cheapest neighbor is considered even if its cost is higher than the cost of the current
join tree. In order to avoid running into cycles, a tabu set is maintained. It
contains the last join trees generated, and the algorithm is not allowed to visit
them again. This way, it can escape local minima, since eventually all nodes in
the valley of a local minimum will be in the tabu set. The stopping condition
could be that there was no improvement over the current best solution found
during the last given number of iterations, or that the set of neighbors minus the tabu
set is empty (in line (*)).
Tabu Search looks as follows:
TabuSearch(Query Graph)
Input: a query graph G = ({R1 , . . . , Rn }, E)
Output: a join tree
Tree = random join tree;
BestTreeSoFar = Tree;
TabuSet = ∅;
do {
Neighbors = all trees generated by applying a transformation to Tree;
Tree = cheapest in Neighbors \ TabuSet; (*)
if (cost(Tree) < cost(BestTreeSoFar)) {
BestTreeSoFar = Tree;
}
if(|TabuSet| > limit) remove oldest tree from TabuSet;
TabuSet += Tree;
} while (not stopping condition satisfied);
return BestTreeSoFar;
• Chromosome ←→ string
• Gene ←→ character
(Figure: join trees over R1, . . . , R5 together with the ordinal number encoding 1243.)
The subsequence exchange for the ordered list encoding works as follows.
Assume two individuals with chromosomes u1 v1 w1 and u2 v2 w2 . From these we
generate u1 v10 w1 and u2 v20 w2 , where vi0 is a permutation of the relations in vi
such that the order of their appearance is the same as in u3−i v3−i w3−i . In order
to adapt the subsequence exchange operator to the ordinal number encoding,
we have to require that the vi are of equal length (|v1 | = |v2 |) and occur at the
same offset (|u1 | = |u2 |). We then simply swap the vi . That is, we generate
u1 v2 w1 and u2 v1 w2 .
The subset exchange is defined only for the ordered list encoding. Within
the two chromosomes, we find two subsequences of equal length comprising the
same set of relations. These sequences are then simply exchanged.
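As an illustration, the following Python sketch implements the subsequence exchange for the ordered list encoding, assuming for simplicity that both segments sit at the same offset and have the same length (the general case only changes the bookkeeping):

def subsequence_exchange(parent1, parent2, lo, hi):
    def reorder(segment, other):
        # permute 'segment' so its relations appear in their order in 'other'
        position = {rel: pos for pos, rel in enumerate(other)}
        return sorted(segment, key=lambda rel: position[rel])
    child1 = parent1[:lo] + reorder(parent1[lo:hi], parent2) + parent1[hi:]
    child2 = parent2[:lo] + reorder(parent2[lo:hi], parent1) + parent2[hi:]
    return child1, child2

# subsequence_exchange([1, 2, 3, 4], [4, 3, 2, 1], 1, 3)
# yields ([1, 3, 2, 4], [4, 2, 3, 1])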
The genetic algorithm maintains a population of a given size. We stop after we have not seen an improvement within the
population for a fixed number of iterations (say 30).
3.4.2 AB-Algorithm
The AB-Algorithm was developed by Swami and Iyer [848, 849]. It builds on
the IKKBZ-Algorithm by resolving its limitations. First, if the query graph
is cyclic, a spanning tree is selected. Second, two different cost functions for
joins (join methods) are supported by the AB-Algorithm: nested loop join and
sort merge join. In order to make the sort merge join’s cost model fit the ASI
property, it is simplified. Third, join methods are assigned randomly before
IKKBZ is called. Afterwards, an iterative improvement phase follows. The
algorithm can be formulated as follows:
AB(Query Graph G)
Input: a query graph G = ({R1, . . . , Rn}, E)
Output: a left-deep join tree
while (number of iterations ≤ n²) {
  if G is cyclic, take a spanning tree of G
  randomly attach a join method to each relation
  JoinTree = result of IKKBZ
  while (number of iterations ≤ n²) {
    apply Iterative Improvement to JoinTree
  }
}
return best tree found
3.4.3 Toured Simulated Annealing
In distributed database systems, the search space is even larger than in centralized systems [524]. The basic idea of toured simulated annealing is that simulated an-
nealing is called n times with different initial join trees, if n is the number of
relations to be joined. Each join sequence in the set Solutions produced by
GreedyJoinOrdering-3 is used to start an independent run of simulated an-
nealing. As a result, the starting temperature can be decreased to 0.1 times
the cost of the initial plan.
3.4.4 GOO-II
GOO-II appends an Iterative Improvement step to the GOO-Algorithm.
IDP-1({R1, . . . , Rn}, k)
Input: a set of relations to be joined, maximum block size k
Output: a join tree
for (i = 1; i ≤ n; ++i) {
  BestTree({Ri}) = Ri;
}
ToDo = {R1, . . . , Rn};
while (|ToDo| > 1) {
  k = min(k, |ToDo|);
  for (i = 2; i ≤ k; ++i) {
    for all S ⊆ ToDo, |S| = i do {
      for all O ⊂ S, O ≠ ∅ do {
        CurrTree = CreateJoinTree(BestTree(S \ O), BestTree(O));
        if (BestTree(S) is undefined || cost(CurrTree) < cost(BestTree(S))) {
          BestTree(S) = CurrTree;
        }
      }
    }
  }
  find V ⊆ ToDo, |V| = k
  with cost(BestTree(V)) = min{cost(BestTree(W)) | W ⊆ ToDo, |W| = k};
  generate new symbol T;
  BestTree({T}) = BestTree(V);
  ToDo = (ToDo \ V) ∪ {T};
  for all O ⊂ V do delete(BestTree(O));
}
return BestTree({R1, . . . , Rn});
IDP-2({R1 , . . . , Rn }, k)
Input: a set of relations to be joined, maximum block size k
Output: a join tree
for (i = 1; i <= n; ++i) {
BestTree({Ri }) = Ri ;
}
ToDo = {R1 , . . . , Rn };
while (|ToDo| > 1) {
// apply greedy algorithm to select a good building block
B = ∅;
for all v ∈ ToDo, do {
B += BestTree({v});
}
do {
find L, R ∈ B
with cost(CreateJoinTree(L,R))
= min{cost(CreateJoinTree(L0 ,R0 )) | L0 , R0 ∈ B};
P = CreateJoinTree(L, R);
B = (B \ {L, R}) ∪ {P };
} while (P involves no more than k relations and |B| > 1);
// reoptimize the bigger of L and R,
// selected in the last iteration of the greedy loop
if (L involves more tables than R) {
ReOpRels = relations involved in L;
} else {
ReOpRels = relations involved in R;
}
P = DP-Bushy(ReOpRels);
generate new symbol T ;
BestTree({T }) = P ;
ToDo = (ToDo \ ReOpRels) ∪ {T };
for all O ⊂ ReOpRels do delete(BestTree(O));
}
return BestTree({R1 , . . . , Rn });
We use α(e) to denote the first element of a sequence. We identify single-element
sequences with elements. The function τ retrieves the tail of a sequence, and ⊕
concatenates two sequences. We denote the empty sequence by ε.
We define the algebraic operators recursively on their input sequences. The
order-preserving join operator is defined as the concatenation of an order-
preserving selection and an order-preserving cross product. For unary oper-
ators, if the input sequence is empty, the output sequence is also empty. For
binary operators, the output sequence is empty whenever the left operand rep-
resents an empty sequence.
The order-preserving join operator is based on the definition of an order-
preserving cross product operator defined as

e1 ×̂ e2 := (α(e1) Â e2) ⊕ (τ(e1) ×̂ e2)

where

e1 Â e2 := ε                               if e2 = ε
e1 Â e2 := (e1 ◦ α(e2)) ⊕ (e1 Â τ(e2))     else

We are now prepared to define the join operation on ordered sequences:

e1 B̂p e2 := σ̂p(e1 ×̂ e2)
Before introducing the algorithm, let us have a look at the size of the search
space. Since the order-preserving join is associative but not commutative, the
input to the algorithm must be a sequence of join operators or, likewise, a
sequence of relations to be joined. The output is then a fully parenthesized
expression. Given a sequence of n binary associative but not commutative
operators, the number of fully parenthesized expressions is (see [204])
P(n) = 1                            if n = 1
P(n) = Σ_{k=1}^{n−1} P(k) P(n−k)    if n > 1
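A quick Python check of this recurrence (the values are the Catalan numbers C_{n−1}):

from functools import lru_cache

@lru_cache(maxsize=None)
def P(n):
    # number of fully parenthesized expressions over n operands
    if n == 1:
        return 1
    return sum(P(k) * P(n - k) for k in range(1, n))

# [P(n) for n in range(1, 7)] yields [1, 1, 2, 5, 14, 42]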
applicable-predicates(R, P)
01 B=∅
02 foreach p ∈ P
03 IF (F(p) ⊆ A(R))
04 B += p
05 return B
construct-bushy-tree(R, P)
01 n = |R|
02 for i = 1 to n
03 B =applicable-predicates(Ri , P)
04 P =P \B
05 p[i, i] = B
06 s[i, i] = S0 (Ri , B)
07 c[i, i] = C0 (Ri , B)
08 for l = 2 to n
09 for i = 1 to n − l + 1
10 j =i+l−1
11 B = applicable-predicates(Ri...j , P)
12 P =P \B
13 p[i, j] = B
14 s[i, j] = S1 (s[i, j − 1], s[j, j], B)
15 c[i, j] = ∞
16 for k = i to j − 1
17 q = c[i, k] + c[k + 1, j] + C1 (s[i, k], s[k + 1, j], B)
18 IF (q < c[i,j])
19 c[i, j] = q
20 t[i, j] = k
extract-plan(R, t, p)
01 return extract-subplan(R, t, p, 1, |R|)
extract-subplan(R, t, p, i, j)
01 IF (j > i)
02 X = extract-subplan(R, t, p, i, t[i, j])
03 Y = extract-subplan(R, t, p, t[i, j] + 1, j)
04 return X B̂p[i,j] Y
05 else
06 return σ̂p[i,i] (Ri )
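To make the three nested loops concrete, here is a compact Python sketch of construct-bushy-tree and extract-plan, specialized to the cost function Cout used below (s stores cardinalities, C0 = 0, C1 = S1). The representation of selectivities as a dictionary indexed by relation-position pairs is an assumption made for illustration:

from math import inf

def construct_bushy_tree(cards, sel):
    n = len(cards)
    s = [[0.0] * n for _ in range(n)]   # result cardinalities
    c = [[0.0] * n for _ in range(n)]   # minimal costs
    t = [[0] * n for _ in range(n)]     # best split points
    for i in range(n):
        s[i][i] = cards[i]              # C0 = 0: access costs equal in all plans
    for l in range(2, n + 1):           # lines 08-09: sequence lengths
        for i in range(n - l + 1):
            j = i + l - 1
            f = 1.0                     # selectivities newly applicable at [i, j]
            for a in range(i, j):
                f *= sel.get((a, j), sel.get((j, a), 1.0))
            s[i][j] = s[i][j - 1] * s[j][j] * f
            c[i][j] = inf
            for k in range(i, j):       # lines 16-20: try every split point
                q = c[i][k] + c[k + 1][j] + s[i][j]   # C1 = S1, i.e. Cout
                if q < c[i][j]:
                    c[i][j], t[i][j] = q, k
    return c, t

def extract_plan(t, i, j):
    if j > i:
        k = t[i][j]
        return (extract_plan(t, i, k), extract_plan(t, k + 1, j))
    return i                            # leaf: position of a base relation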
For every base relation, the routine construct-bushy-tree first computes the
statistics of the result of applying the predicates to the base relation and the
costs for computing these intermediate results, i.e. for retrieving the relevant
part of the base relation and applying the predicates (lines 02-07). Note that
this is not really trivial if there are several index structures that can be applied.
Then computing C0 involves considering different access paths. Since this is an
issue orthogonal to join ordering, we do not go into detail here.
After we have the costs and statistics for sequences of length one, we com-
pute the same information for sequences of length two, three, and so on until
n (loop starting at line 08). For every length, we iterate over all subsequences
of that length (loop starting at line 09). We compute the applicable predicates
and the statistics. In order to determine the minimal costs, we have to consider
every possible split point. This is done by iterating the split point k from i to
j − 1 (line 16). For every k, we compute the cost and remember the k that
resulted in the lowest costs (lines 17-20).
The last subroutine takes the relations, the split points (t), and the applica-
ble predicates (p) as its input and extracts the plan. The whole plan is extracted
by calling extract-plan. This is done by instructing extract-subplan to re-
trieve the plan for all relations. This subroutine first determines whether the
plan for a base relation or that of an intermediate result is to be constructed.
In both cases, we did a little cheating here to keep things simple. The plan we
construct for base relations does not take the above-mentioned index structures
into account but simply applies a selection to a base relation instead. Obvi-
ously, this can easily be corrected. We also give the join operator the whole
set of predicates that can be applied. That is, we do not distinguish between
join predicates and other predicates that are better suited for a selection sub-
sequently applied to a join. Again, this can easily be corrected.
Let us have a quick look at the complexity of the algorithm. Given n rela-
tions with m attributes in total and p predicates, we can implement applicable-predicates
in O(pm) by using a bit vector representation for attributes and free variables
and computing the attributes for each sequence Ri, . . . , Rj once upfront. The
latter takes O(n²m).
The complexity of the routine construct-bushy-tree is determined by the
three nested loops. We assume that S1 and C1 can be computed in O(p), which
is quite reasonable. Then, we have O(n³p) for the innermost loop, O(n²) calls to
applicable-predicates, which amounts to O(n²pm), and O(n²p) for the calls of
S1. Extracting the plan is linear in n. Hence, the total runtime of the algorithm
is O(n²(n + m)p).
In order to illustrate the algorithm, we need to fix the functions S0 , S1 , C0
and C1 . We use the simple cost function Cout . As a consequence, the array s
simply stores cardinalities, and S0 has to extract the cardinality of a given base
relation and multiply it by the selectivities of the applicable predicates. S1 mul-
tiplies the input cardinalities with the selectivities of the applicable predicates.
We set C0 to zero and C1 to S1 . The former is justified by the fact that every
relation must be accessed exactly once and hence, the access costs are equal in
After initialization, the diagonal of the array s contains the base relation
cardinalities (200, 1, 1, and 20 in our example), the array c has 0 everywhere in
its diagonal, and the array p contains empty sets.
For l = 2, 3, 4, the algorithm then fills in the corresponding entries of s, c, and t,
where for c[1, 4] the value of q for each split point k (denoted by qk) is determined
as follows:
q1 = c[1, 1] + c[2, 4] + 40 = 0 + 3 + 40 = 43
q2 = c[1, 2] + c[3, 4] + 40 = 100 + 2 + 40 = 142
q3 = c[1, 3] + c[4, 4] + 40 = 101 + 0 + 40 = 141
Collecting all the above t[i, j] values leaves us with the following array as
input for extract-plan:

i\j   1   2   3   4
 1        1   1   1
 2            2   3
 3                3
 4
The calls of extract-plan and extract-subplan produce the following trace:
000 extract-plan(. . ., 1, 4)
100 extract-plan(. . ., 1, 1)
200 extract-plan(. . ., 2, 4)
210 extract-plan(. . ., 2, 3)
211 extract-plan(. . ., 2, 2)
212 extract-plan(. . ., 3, 3)
210 return (R2 B̂true R3 )
220 extract-plan(. . ., 4, 4)
200 return ((R2 B̂true R3 )B̂p3,4 R4 )
000 return (R1 B̂p1,2 ∧p1,4 ((R2 B̂true R3 )B̂p3,4 R4 ))
A pointer into the literature might be useful: Lanzelotte and Valduriez provide an
object-oriented design for search strategies [522]. This allows easy modification
and even the exchange of the plan generator's search strategy.
critical size is reached earlier. Since the number of selectivities involved in the
first few joins is small regardless of the connectivity, there is a lower limit to the
number of joined relations required to arrive at the critical intermediate result
size. If the connectivity is larger, this point is reached earlier, but the lower limit
persists, because the number of selectivities involved in the joins remains small
for the first couple of relations, independent of their connectivity. These lines
of argument explain subsequent findings, too.
The reader should be aware of the fact that the number of relations joined is
quite small (10) in our experiments. Further, as observed by several researchers,
if the number of joins increases, the number of “good” plans decreases [298, 845].
That is, increasing the number of relations makes the join ordering problem
more difficult.
Heuristics
For analyzing the influence of the parameters on the performance of heuristics,
we give the figures for four different heuristics. The first two are very simple.
The minSel heuristic first selects those relations whose incident join edges
exhibit the minimal selectivity. The recMinRel heuristic first chooses those relations
which result in the smallest intermediate relation.
We also analyzed the two advanced heuristics IKKBZ and RDC . The IKKBZ
heuristic [512] is based on an optimal join ordering procedure [432, 512] which
is applied to the minimal spanning tree of the join graph where the edges are
labeled by the selectivities. The family of RDC heuristics is based on the rela-
tional difference calculus as developed in [412]. Since our goal is not to bench-
mark different heuristics in order to determine the best one, we have chosen
the simplest variant of the family of RDC based heuristics. Here, the relations
are ordered according to a certain weight whose actual computation is—for
the purpose of this section—of no interest. The results of the experiments are
presented in Figure 3.30.
At first glance, these figures look less regular than those presented so far.
This might be due to the unstable behavior of the heuristics. Nevertheless,
we can extract the following observations. Many curves exhibit a peak at a
certain connectivity; there, the heuristics perform worst. The peak connectivity
depends on the selectivity size, but is not as regular as in the previous curves.
Further, higher selectivities flatten the curves, that is, heuristics perform better
at higher selectivities.
Of course, if all local minima are of about the same cost, we do not have to
worry, otherwise we do. It would be very interesting to know the percentage of
local minima that are close to the global minima.
Concerning the second property, we first have to define the connection cost.
Let a and b be two nodes and P be the set of all paths from a to b. The
connection cost of a and b is then defined as min_{p∈P} max_{s∈p} {cost(s) | s ≠ a, s ≠ b}.
Now, if the connection costs are high, we know that if we have to travel
from one local minimum to another, there is at least one node we have to pass
which has high costs. Obviously, this is bad for our probabilistic procedures.
Ioannidis and Kang [446] call a search graph that is favorable with respect to
the two properties a well. Unfortunately, investigating these two properties
of real search spaces is rather difficult. However, Ioannidis and Kang, later
supported by Zhang, succeeded in characterizing cost wells in random graphs
[446, 447]. They also conclude that the search space comprising bushy trees is
better w.r.t. our two properties than the one for left-deep trees.
3.7 Discussion
Choose one of dynamic programming, memoization, or permutations as the core
of your plan generation algorithm and extend it with the rest of the book. ToDo
3.8 Bibliography
ToDo: Oezsu, Meechan [650, 651]
Chapter 4
Database Items, Building Blocks, and Access Paths
In this chapter we go down to the storage layer and discuss leaf nodes of query
execution plans and plan fragments. We briefly recap some notions, but reading
a book on database implementation might be helpful [397, 312]. Although
alternative storage technologies exist and are being developed [752], databases
are mostly stored on disks. Thus, we start out by introducing a simple disk
model to capture I/O costs. Then, we say some words about database buffers,
physical data organization, slotted pages and tuple identifiers (TIDs), physical
record layout, physical algebra, and the iterator concept. These are the basic
notions in order to start with the main purpose of this section: giving an
overview of the possibilities available to structure the low-level parts of a
physical query evaluation plan. In order to calculate the I/O costs of these plan
fragments, a more sophisticated cost model for several kinds of disk accesses is
introduced.
(Figure: the internals of a disk drive: platters on a spindle, tracks and sectors, cylinders, heads, the arm, the arm assembly, and the pivot.)
inner sectors. The highest density (e.g. in bits per centimeter) at which bits
can be separated is fixed for a given disk. For storing 512 B, this results in a
minimum sector length which is used for the tracks of the innermost cylinder.
Thus, since sectors on outer tracks are longer, storage capacity is wasted there.
To overcome this problem, disks have a varying number of sectors per track.
(This is where the picture lies.) Therefore, the cylinders are organized into
zones. Every zone contains a fixed number of consecutive cylinders, each having
a fixed number of sectors per track. Between zones, the number of sectors per
track varies. Outer zones have more sectors per track than inner zones. Since
the platters rotate with a fixed angular speed, sectors of outer cylinders can be
read faster than sectors of inner cylinders. As a consequence, the throughput
for reading and writing outer cylinders is higher than for inner cylinders.
Assume that we sequentially read all the sectors of all tracks of some con-
secutive cylinders. After reading all sectors of some track, we must proceed to
the next track. If it is contained in the same cylinder, then we must (simply)
use another head: a head switch occurs. Due to calibration, this takes some
time. Thus, if all sectors start at the same angular position, we come too late
to read the first sector of the next track and have to wait. To avoid this, the
angular start positions of the sectors of tracks in the same cylinder are skewed
such that this track skew compensates for the head switch time. If the next
track is contained in another cylinder, the heads have to switch to the next
cylinder. Again, this takes time and we miss the first sector if all sectors of a
surface start at the same angular positions. Cylinder skew is used such that
the time needed for this switch does not cause us to miss the start of the next
sector. In general, skewing works in only one direction.
A sector can be addressed by a triple containing its cylinder, head (surface),
and sector number. This triple is called the physical address of a sector. How-
ever, disks are accessed using logical addresses. These are called logical block
numbers (LBN) and are consecutive numbers starting with zero. The disk in-
ternally maps LBNs to physical addresses. This mapping is captured in the
following table:
(Figure: several disks attached to a SCSI bus, and the decomposition of a disk access into seek time, rotational latency, and data transfer time.)
1. The host sends the command to the disk.
2. The disk controller decodes the command and calculates the physical
address.
3. During the seek, the disk drive's arm is positioned such that the
corresponding head is correctly placed over the cylinder where the requested block
resides. This step consists of several phases.
4. The disk has to wait until the sector where the requested block resides
comes under the head (rotation latency).
5. The disk reads the sector and transfers data to the host.
Seek time and rotational latency form by far the major part. Let us call the result latency
time. Then, we assume an average latency time. This, of course, may result in large
errors for a single request. However, on average, the error can be as “low” as 35% [737]. The next
parameter is the sustained read rate. The disk is assumed to be able to deliver
a certain amount of bytes per second while reading data stored consecutively.
Of course, considering multi-zone disks, we know that this is oversimplified,
but we are still in our simplistic model. Analogously, we have a sustained write
rate. For simplicity, we will assume that this is the same as the sustained read
rate. Last, the capacity is of some interest. A hypothetical disk (inspired by
disks available in 2004) then has the following parameters:
Model 2004
Parameter Value Abbreviated Name
capacity 180 GB Dcap
average latency time 5 ms Dlat
sustained read rate 100 MB/s Dsrr
sustained write rate 100 MB/s Dswr
The time a disk needs to read and transfer n bytes is then approximated by
Dlat + n/Dsrr . Again, this is overly simplistic: (1) due to head switches and
cylinder switches, long reads have lower throughput than short reads and (2)
multiple zones are not modelled correctly. However, let us use this very sim-
plistic model to get some feeling for disk costs.
Database management system developers distinguish between sequential
I/O and random I/O. For sequential I/O, there is only one positioning at the
beginning and then, we can assume that data is read with the sustained read
rate. For random I/O, one positioning for every unit of transfer—typically a
page of say 8 KB—is assumed. Let us illustrate the effect of positioning by a
small example. Assume that we want to read 100 MB of data stored consecu-
tively on a disk. Sequential read takes 5 ms plus 1 s. If we read in blocks of
8 KB where each block requires positioning then reading 100 MB takes 65 s.
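The arithmetic behind these numbers is easily checked; a minimal Python sketch of the model (with the Model 2004 parameters) follows:

D_LAT = 0.005            # average latency time: 5 ms
D_SRR = 100 * 2**20      # sustained read rate: 100 MB/s
MB, KB = 2**20, 2**10

def read_time(nbytes, chunk=None):
    # one positioning for the whole read, or one positioning per chunk
    if chunk is None:
        return D_LAT + nbytes / D_SRR
    return (nbytes / chunk) * (D_LAT + chunk / D_SRR)

# read_time(100 * MB)          yields about 1.005 s (sequential)
# read_time(100 * MB, 8 * KB)  yields about 65 s    (random, 8 KB blocks)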
Assume that we have a relation of about 100 MB in size, stored on a disk,
and we want to read it. Does it take 1 s or 65 s? If the blocks on which it is
stored are randomly scattered on disk and we access them in a random order,
65 s is a good approximation. So let us assume that it is stored on consecutive
blocks. Assume that we read in chunks of 8 KB. Then, between any two of our
requests, other applications
could move the head away from our reading position. (Congestion on the SCSI
bus may also be a problem.) Again, we could be left with 65 s. Reading the
whole relation with one read request is a possibility but may pose problems
to the buffer manager. Fortunately, we can read in chunks much smaller than
100 MB. Consider Figure 4.3. If we read in chunks of 100 8 KB blocks we are
already pretty close to one second (within a factor of two).
Figure 4.3: Time to read 100 MB from disk (depending on the number of 8 KB
blocks read at once)
Note that the interleaving of actions does not necessarily mean a negative
impact. This depends on the point of view, i.e. what we want to optimize. If we
want to optimize response time for a single query, then obviously the impact of
concurrent actions is negative. If, however, we want to optimize resource (here:
disk) usage, concurrent actions might help. ToDo?
There are two important things to learn here. First, sequential read is much
faster than random read. Second, the runtime system should secure sequential
read. The latter point can be generalized: the runtime system of a database
management system has, as far as query execution is concerned, two equally
important tasks:
• (asynchronous) prefetching,
• piggy-back scans,
Let us take yet another look at it. 100 MB can be stored on 12800 8 KB
pages. Figure 4.4 shows the time to read n random pages. In our simplistic cost
model, reading 200 pages randomly costs about the same as reading 100 MB
sequentially. That is, reading 1/64th of 100 MB randomly takes as long as
reading the 100 MB sequentially. Let us denote by a the positioning time, s
the sustained read rate, p the page size, and d some amount of consecutively
stored bytes. Let us calculate the break-even point:

n · (a + p/s) = a + d/s
n = (a + d/s)/(a + p/s) = (as + d)/(as + p)
a and s are disk parameters and, hence, fixed. For a fixed d, the break-even
point depends on the page size. This is illustrated in Figure 4.5. The x-axis is
the page size p in multiples of 1 K and the y-axis is (d/p)/n for d = 100 MB.
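A one-line Python check of the break-even formula with the Model 2004 parameters reproduces the factor observed above:

def break_even_pages(a, s, p, d):
    # n random page reads cost as much as one sequential read of d bytes
    return (a * s + d) / (a * s + p)

a, s = 0.005, 100 * 2**20        # positioning time, sustained read rate
d, p = 100 * 2**20, 8 * 2**10    # 100 MB stored consecutively, 8 KB pages
# break_even_pages(a, s, p, d) yields about 198, i.e. roughly 200 of the
# 12800 pages, which is 1/64th of the relation.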
For sequential reads, the page size does not matter. (Be aware that our
simplistic model heavily underestimates sequential reads.) For random reads,
however, it does, as Figure 4.5 shows.
(Figure 4.4: Time to read n random pages.)
Figure 4.5: Break-even point in fraction of total pages depending on page size
Theorem 4.1.1 (Qyang) If the disk arm has to travel over a region of C
cylinders, is positioned on the first of the C cylinders, and has to stop at s − 1
of them, then s · Dseek(C/s) is an upper bound for the seek time.
Given the page identifier, the buffer frame is found by a hashtable lookup.
Accesses to the hash table and the buffer frame need to be synchronized. Before
accessing a page in the buffer, it must be fixed. These points account for the
fact that the costs of accessing a page in the buffer are greater than zero.
(Figure 4.6: physical organization: a Partition contains Segments; a Relation is fragmented into Fragments; Segments are mapped to Fragments; a Segment consists of Pages; a Page stores Records; a Record represents a Tuple.)
This query is valid only if the database item (relation) Student exists. It could
2 This might not be true. Alternatively, the pages of a partition can be consecutively
numbered.
3 Extents are not shown in Fig. 4.6. They can be included between Partitions and Segments.
be cheaper than the first solution, there is still a non-negligible cost associated
with an attribute access.
The third physical record layout can be used to represent compressed at-
tribute values and even compressed length information for parts of varying size.
Note that if fixed size fields are compressed, their length becomes varying. Ac-
cess to an attribute now means decompressing length/offset information and
decompressing the value itself. The former is quite cheap: it boils down to an
indirect memory access with some offset taken from an array [908]. The cost
of the latter depends on the compression scheme used. It should be clear that
accessing an attribute value now is even more expensive. To make the costs of
an attribute access explicit was the sole purpose of this small section.
Remark Westmann et al. discuss an efficient implementation of compres-
sion and evaluate its performance [908]. Yiannis and Zobel report on experi-
ments with several compression techniques used to speed up the sort operator.
For some of them, the CPU usage is twice as large [946].
The map operator χ is defined as

χ_{a1:e1,...,an:en}(e) := {t ◦ [a1 : e1(t), . . . , an : en(t)] | t ∈ e}

where ◦ denotes tuple concatenation and the ai must not be in A(e). (Remember
that A(e) is the set of attributes produced by e.) Every input tuple t is
extended by new attributes ai, whose values are computed by evaluating the
expression ei, in which free variables (attributes) are bound to the attributes
(variables) provided by t.
The above problem can now be solved by
select name
from Student
where age > 30
The plan
Πn (χn:s.name (σa>30 (χa:s.age (Student[s]))))
does not. In the first plan the name attribute is only accessed for those students
with age over 30. Hence, it should be cheaper to evaluate. If the database
management system does not support this selective access mechanism, we often
find the scan enhanced by a list of attributes that is projected and included in
the resulting tuple stream.
In order to avoid copying attributes from their storage representation to
some main memory representation, some database management systems apply
another mechanism. They support the evaluation of some predicates directly
on the storage representation. These are boolean expressions consisting of sim-
ple predicates of the form Aθc for attributes A, comparison operators θ, and
constants c. Instead of a constant, c could also be the value of some attribute
or expression thereof given that it can be evaluated before the access to A.
Predicates evaluable on the disk representation are called SARGable where
SARG is an acronym for search argument. Note that SARGable predicates
may also be good for index lookups. Then they are called index SARGable.
In case they cannot be evaluated by an index, they are called data SARGable
[772, 850, 318].
Since relation or segment scans can evaluate predicates, we have to extend
our notation for scans. Let I be a database item like a relation or segment.
Then, I[v; p] scans I, binds each item in I successively to v and returns only
those items for which p holds. I[v; p] is equivalent to σp (I[v]), but cheaper to
evaluate. If p is a conjunction of predicates, the conjuncts should be ordered
such that the attribute access cost reductions described above are reflected
(for details see Chapter ??). Syntactically, we express this by separating the
predicates by a comma as in Student[s; age > 30, name like ‘%m%’]. If we want
to make a distinction between SARGable and non-SARGable predicates, we
write I[v; ps ; pr ], with ps being the SARGable predicate and pr a non-SARGable
predicate. Additional extensions like a projection list are also possible.
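A toy Python sketch of splitting a conjunctive predicate into a SARGable part ps and a residual part pr; the tuple-based predicate representation and the set of "simple" comparison operators are assumptions for illustration:

SARG_OPS = {'=', '<', '<=', '>', '>=', '<>'}

def split_sargable(conjuncts):
    # conjuncts: (attribute, operator, constant) triples or other objects
    sargable, residual = [], []
    for pred in conjuncts:
        if (isinstance(pred, tuple) and len(pred) == 3
                and pred[1] in SARG_OPS):
            sargable.append(pred)       # simple predicate of the form A θ c
        else:
            residual.append(pred)
    return sargable, residual

# split_sargable([('age', '>', 30), ('name', 'like', '%m%')])
# yields ([('age', '>', 30)], [('name', 'like', '%m%')])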
4 The page on which the physical record resides must be fixed until all attributes are loaded.
Hence, an earlier point in time might be preferable.
It can be evaluated by
Dept[d] Bnl_{e.dno=d.dno} σ_{e.age>30 ∧ e.age<40}(Emp[e])

Since the inner (right) argument of the nested-loop join is evaluated several
times (once for each department), materialization may pay off. The plan then
looks like

Dept[d] Bnl_{e.dno=d.dno} Tmp(σ_{e.age>30 ∧ e.age<40}(Emp[e]))

which can be evaluated in two steps:
1. Rtmp = σ_{e.age>30 ∧ e.age<40}(Emp[e])
2. Dept[d] Bnl_{e.dno=d.dno} Rtmp[e]
The disk costs of writing and reading temporary relations can be calculated
using the considerations of Section 4.1.
select *
from TABLE(Primes(1,100)) as p
returns all primes between 1 and 100. The attribute names of the resulting
relation are specified in the declaration of the table function. Let us assume
that for Primes a single attribute prime is specified. Note that table func-
tions may take parameters. This does not pose any problems, as long as we
know that Primes is a table function and we translate the above query into
Primes(1, 100)[p]. Although this looks exactly like a table scan, the implemen-
tation and cost calculations are different.
Consider the following query where we extract the years in which we expect
a special celebration of Anton’s birthday.
select *
from Friends f,
TABLE(Primes(
CURRENT YEAR, EXTRACT(YEAR FROM f.birthday) + 100)) as p
where f.name = ‘Anton’
The result of the table function depends on our friend Anton. Hence, a join
is no solution. Instead, we have to introduce a new kind of join, the d-join
where the d stands for dependent. It is defined as
χ_{b:EXTRACT_YEAR(f.birthday)+100}(σ_{f.name=‘Anton’}(Friends[f])) < Primes(c, b)[p] >
where we assume that some global entity c holds the value of CURRENT YEAR.
Let us do the above query for all friends. We just have to drop the where
clause. Obviously, this results in many redundant computations of primes. At
the SQL level, using the birthday of the youngest friend is beneficial:
select *
from Friends f,
TABLE(Primes(
CURRENT YEAR, (select max(birthday) from Friends) + 100)) as p
where p.prime ≥ f.birthday
At the algebraic level, this kind of optimizations will be considered in Section ??.
Things can get even more involved if table functions can consume and produce
relations, i.e. arguments and results can be relations. ToDo?
Little can be said about the disk costs of table functions. They can be zero
if the function is implemented such that it does not access any disks (files stored
there), but it can also be very expensive if large files are scanned each time it is
called. One possibility is to let the database administrator specify the numbers
the query optimizer needs. However, since parameters are involved, this is
not really an easy task. Another possibility is to measure the table function’s
behavior whenever it is executed, and learn about its resource consumption.
4.11 Indexes
There exists a plethora of different index structures. In the context of relational
database management systems, the most versatile and robust index is the B-tree
or variants/improvements thereof (e.g. []). It is implemented in almost every
system. As an example of its use, consider the query

select name
from Emp
where eno = 1077
If there exists a unique index on the key attribute eno, we can first access the
index to retrieve the TID of the employee tuple satisfying eno = 1077. Another
page access yields the tuple itself which constitutes the result of the query. Let
Empeno be the index on eno, then we can descend the B-tree, using 1077 as the
search key. A predicate that can be used to descend the B-tree or, in general,
governing search within an index structure, is called an index sargable predicate.
For the example query, the index scan, denoted as Empeno [x; eno = 1077],
retrieves a single leaf node entry with attributes eno and TID. Similar to the
regular scan, we assume x to be a variable holding a pointer to this index
entry. We use the notations x.eno and x.TID to access these attributes. To
dereference the TID, we use the map (χ) operator and a dereference function
deref (or ∗ for short). It turns a TID into a pointer in the buffer area. This of
course requires the page to be loaded, if it is not in the buffer yet. The complete
plan for the query is
Πname (χe:∗(x.TID),name:e.name (Empeno [x; eno = 1077]))
where we computed several new attributes with one χ operator. Note that
they are dependent on previously computed attributes and, hence, the order of
evaluation does matter.
We can make the dependency of the map operator more explicit by applying
a d-join. Denote by □ an operator that returns a single empty tuple. Then

Πname(Empeno[x; eno = 1077] < χ_{e:∗(x.TID), name:e.name}(□) >)
is equivalent to the former plan. Joins and indexes will be discussed in Sec-
tion 4.14.
A range query like
select name
from Emp
where age ≥ 25 and age ≤ 35
5 Of course, any degree of clusteredness may occur and has to be taken into account in cost
calculations.
This alternative might turn out to be more efficient since sorting on an attribute
with a dense domain can be implemented efficiently. (We admit that in the
above example this is not worth considering.) There is another important
application of this technique: XQuery often demands output in document order.
If this order is destroyed during processing, it must at the latest be restored
when the output is produced [581]. Depending on the implementation (i.e. the
representation of document nodes or their identifiers), this might turn out to
be a very expensive operation.
The fact that index scans on B-trees return their result ordered on the
indexed attributes is also very useful if a merge-join on the same attributes (or
a prefix thereof, see Chapter 23 for further details) occurs. An example follows
later on.
Some predicates are not index SARGable, but can still be evaluated with
the index as in the following query

select name
from Emp
where age ≥ 25 and age ≤ 35 and age ≠ 30

Here, age ≠ 30 is evaluated as a residual predicate during the index scan.
Some index scan implementations allow exclusive bounds for start and stop
conditions. With them, the query

select name
from Emp
where age > 25 and age < 35

can be evaluated directly. If only inclusive bounds are supported, we can
evaluate it as

Πname(χ_{t:x.TID, e:∗t, name:e.name}(Empage[x; 25 ≤ age; age ≤ 35; age ≠ 25, age ≠ 35]))
select name
from Emp
where age ≤ 20
we descend the B-tree to the “leftmost” page, i.e. the page containing the
smallest key value, and then proceed scanning leaf pages until we encounter the
key 20.
Having neither a start nor stop condition is also quite useful. The query
select count(*)
from Emp
can be evaluated by counting the entries in the leaf pages of a B-tree. Since
a B-tree typically occupies far fewer pages than the original relation, we have
a viable alternative to a relation scan. The same applies to the aggregate
functions sum and avg. The other aggregate functions min and max can be
evaluated much more efficiently by descending to the leftmost or rightmost leaf
page of a B-tree. This can be used to answer queries like
select min/max(salary)
from Emp
select name
from Emp
where salary = (select max(salary)
from Emp)
It can be evaluated by first computing the maximum salary and then retrieving
the employees earning this salary. This requires two descents into the B-tree,
while obviously one is sufficient. Depending on the implementation of the
index (scan), we might be able to perform this optimization.
Further, the result of an index scan, whether it uses start and/or stop con-
ditions or not, is always sorted on the key. This property can be useful for
queries with no predicates. If we have neither a start nor a stop condition, the
resulting scan is called full index scan. As an example consider the query
select salary
from Emp
order by salary
Empsalary
In general, an index scan is specified by
1. the name of the index,
2. a variable bound to the index entries,
3. a start condition,
4. a stop condition, and
5. a projection list.
A projection list has entries of the form a : x.b for attribute names a and b, with
x being the name of the variable for the index entry. a : x.a is also allowed and
often abbreviated as a. We also often summarize start and stop conditions into
a single expression like 25 ≤ age ≤ 35.
For a full index specification, we list all items in the subscript of the index
name separated by a semicolon. Still, we need some extensions to express the
queries with aggregation. Let a and b be attribute names, then we allow entries
of the form b : aggr(a) in the projection list and start/stop conditions of the
form min/max(a). The latter tells us to minimize/maximize the value of the
indexed attribute a. Only a complete enumeration gives us the full details. On
the other hand, extracting start and stop conditions and residual predicates
from a given boolean expression is rather simple. Hence, we often summarize
these three under a single predicate. This is especially useful when talking
about index scans in general. If we have a full index scan, we leave out the
predicate. We use a star ‘*’ as an abbreviated projection list that projects all
attributes of the index. (So far, these are the key attribute and the TID.) If
the projection list is empty, we assume that only the variable/attribute holding
the pointer to the index entry is projected.
Using this notation, we can express some plan fragments. These fragments
are complete plans for the above queries, except that the final projection is not
present. As an example, consider the following fragment:
All the plan fragments seen so far are examples of access paths. An access
path is a plan fragment with building blocks concerning a single database item.
Hence, every building block is an access path. The above plans touch two
database items: a relation and an index on some attribute of that relation.
If we say that an index concerns the relation it indexes, such a fragment is an
access path. For relational systems, the most general case of an access path uses
several indexes to retrieve the tuples of a single relation. We will see examples
of these more complex access paths in the following section. An access to the
original relation is not always necessary. A query that can be answered solely
by accessing indexes is called an index only query.
A query with in like
select name
from Emp
where age in {28, 29, 31, 32}

can be evaluated by taking the minimum and the maximum found in the right-hand
side of in as the start and stop conditions. We further need to construct
a residual predicate. The residual predicate can be represented either as age =
28 ∨ age = 29 ∨ age = 31 ∨ age = 32 or as age ≠ 30.
An alternative is to use a d-join. Consider the example query
select name
from Emp
where salary in {1111, 11111, 111111}
Here, the numbers are far apart and separate index accesses might make sense.
Therefore, let us create a temporary relation Sal equal to {[s : 1111], [s :
11111], [s : 111111]}. When using it, the access path becomes
Some B-tree implementations allow efficient searches for multiple ranges and
implement gap skipping [33, 34, 166, 318, 319, 468, 536]. Gap skipping, some-
times also called zig-zag skipping, continues the search for keys in a new key
range from the latest position visited. The implementation details vary but
the main idea of it is that after one range has been completely scanned, the
current (leaf) page is checked for its highest key. If it is not smaller than the
lower bound of the next range, the search continues in the current page. If it
is smaller than the lower bound of the next range, alternative implementations
are described in the literature. The simplest is to start a new search from the
root for the lower bound. Another alternative uses parent pointers to go up a
page as long as the highest key of the current page is smaller than the lower
bound of the next range. If this is no longer the case, the search continues
downwards again.
Gap skipping gives even more opportunities for index scans and allows effi-
cient implementations of various index nested loop join strategies.
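The following Python sketch mimics gap skipping over a sorted key sequence (an in-memory stand-in for the B-tree leaf level; the "restart from the root" is played by a binary search):

from bisect import bisect_left

def gap_skip_scan(keys, ranges):
    # keys: sorted key sequence; ranges: sorted, disjoint (lo, hi) intervals
    out, pos = [], 0
    for lo, hi in ranges:
        if pos >= len(keys) or keys[pos] < lo:
            pos = bisect_left(keys, lo, pos)   # skip the gap / restart descent
        while pos < len(keys) and keys[pos] <= hi:
            out.append(keys[pos])              # key qualifies for current range
            pos += 1
    return out

# gap_skip_scan([1, 3, 5, 7, 9, 11], [(2, 5), (8, 10)]) yields [3, 5, 9]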
Queries referencing only attributes of the index can be answered with an
index scan on such an index. Having more attributes in the index makes it
more probable that queries are index-only.
Besides a full index scan, the index can be descended to directly search for
the desired tuple(s). Let us take a closer look at this possibility.
If the search predicate is of the form
k1 = c1 ∧ k2 = c2 ∧ . . . ∧ kj = cj
for some constants ci and some j ≤ n, we can generate the start and stop
condition
k1 = c1 ∧ . . . ∧ kj = cj .
This simple approach is only possible if the search predicates define values for
all search key attributes, starting from the first search key and then for all
keys up to the j-th search key with no key attribute unspecified in between.
Predicates concerning the other key attributes after the first non-specified key
attribute and the additional data attributes only allow for residual predicates.
This condition is often not necessary for multi-dimensional index structures,
whose discussion is beyond the book.
With ranges, things become more complex and highly dependent on the
implementation of the facilities of the B-tree. Consider a query predicate
restricting key values as follows:

k1 = c1 ∧ k2 ≥ c2 ∧ k3 = c3

Here, only k1 = c1 ∧ k2 ≥ c2 can enter the start condition, k1 = c1 the stop
condition, and k3 = c3 has to become a residual predicate. More generally, if
the query predicate is of the form

a1 ≤ k1 ≤ b1 ∧ . . . ∧ aj ≤ kj ≤ bj

we can use a1 ≤ k1 ∧ . . . ∧ aj ≤ kj as the start condition, k1 ≤ b1 as the stop
condition, and

a2 ≤ k2 ≤ b2 ∧ . . . ∧ aj ≤ kj ≤ bj

as the residual predicate. If for some search key attribute kj the lower bound aj
is not specified, the start condition cannot contain kj or any kj+i. If for some
search key attribute kj the upper bound bj is not specified, the stop condition
cannot contain kj or any kj+i.
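These rules lend themselves to a small Python sketch that derives start condition, stop condition, and residual predicate from per-attribute bounds on a composite key (the dictionary-based representation is an assumption for illustration):

def split_composite(bounds, n):
    # bounds: {position: (lo, hi)} with either bound possibly None
    start, stop, residual = [], [], []
    start_blocked = stop_blocked = False
    for i in range(n):
        lo, hi = bounds.get(i, (None, None))
        if lo is not None:
            (residual if start_blocked else start).append((i, '>=', lo))
        else:
            start_blocked = True   # no k_{i+j} may enter the start condition
        if hi is not None:
            (residual if stop_blocked else stop).append((i, '<=', hi))
        else:
            stop_blocked = True    # no k_{i+j} may enter the stop condition
    return start, stop, residual

# split_composite({0: (1, 1), 2: (5, 9)}, 3) puts the bounds on the third
# key attribute into the residual predicate, since the second is unspecified.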
Two further enhancements of the B-tree functionality possibly allow for
alternative start/stop conditions:
So far, we are only able to exploit query predicates which specify value
ranges for a prefix of all key attributes. Consider querying a person on his/her
height and his/her hair color: haircolor = ’blond’ and height between
180 and 190. If we have an index on sex, haircolor, height, this index
cannot be used by means of the techniques described so far. However, since
there are only the two values male and female available for sex, we can rewrite
the query predicate to (sex = ’m’ and haircolor = ’blond’ and height
between 180 and 190) or (sex = ’f’ and haircolor = ’blond’ and height
between 180 and 190) and use two accesses to the index. This approach works
fine for attributes with a small domain and is described by Antoshenkov [34].
(See also the previous section for gap skipping.) Since the possible values for
key attributes may not be known to the query optimizer, Antoshenkov goes
one step further and shifts the construction of search ranges to index scan time.
Therefore, the index can be provided with a complex boolean expression which
is then refined (rewritten) as soon as search key values become known. Search
ranges are then generated dynamically, and gap skipping is applied to skip the
intervals between the qualifying ranges during the index scan.
select *
from Camera
where megapixel > 5 and distortion < 0.05
and noise < 0.01
and zoomMin < 35 and zoomMax > 105
We assume that on every attribute used in the where clause there exists an
index. Since the predicates are conjunctively connected, we can use a technique
called index and-ing. Every index scan returns a set (list) of tuple identifiers.
These sets/lists are then intersected. This operation is also called And merge
[553]. Using index and-ing, a possible plan is
((((
Cameramegapixel [c; megapixel > 5; TID]
∩
Cameradistortion [c; distortion < 0.05; TID])
∩
Cameranoise [c; noise < 0.01; TID])
∩
CamerazoomMin [c; zoomMin < 35; TID])
∩
CamerazoomMax [c; zoomMax > 105; TID])
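The And merge itself is a straightforward intersection of sorted TID lists; a minimal Python sketch (the representation of TIDs as integers is illustrative):

def and_merge(*tid_lists):
    # intersect any number of sorted TID lists pairwise
    def intersect(a, b):
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                out.append(a[i]); i += 1; j += 1
            elif a[i] < b[j]:
                i += 1
            else:
                j += 1
        return out
    result = tid_lists[0]
    for l in tid_lists[1:]:
        result = intersect(result, l)
    return result

# and_merge([1, 3, 5, 8], [3, 4, 5, 9]) yields [3, 5]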
Consider the query

select name
from Emp
where age ≥ 65 and yearsOfEmployment ≠ 10

This query can be evaluated by scanning the index on age and then eliminating
all employees with yearsOfEmployment = 10:

Empage[c; age ≥ 65; TID] \ EmpyearsOfEmployment[c; yearsOfEmployment = 10; TID]

Let us call the application of set difference on index scan results index
differencing.
Some predicates might not be very restrictive in the sense that more than
half the index has to be scanned. By negating these predicates and using
index differencing, we can make sure that at most half of the index needs to be
scanned. As an example consider the query
select *
from Emp
where yearsOfEmployment ≤ 5
and age ≤ 65
EmpyearsOfEmployment [c; yearsOfEmployment ≤ 5; TID] \ Empage [c; age > 65; TID]
becomes
where □ returns a single empty tuple. Assume that every tuple contains an
attribute TID holding its TID. This attribute does not have to be stored
explicitly but can be derived. Then, we have the following alternative access
path for the join (ignoring projections):
For the join operator, the pointer-based join implementation developed in the
context of object-oriented databases may be the most efficient way to evaluate
the access path [793]. Obviously, sorting the result of the index scan on the
tuple identifiers can speed up processing since it turns random into sequential
I/O. However, this destroys the order on the key which might itself be useful
later on during query processing or required by the query7 . Sorting the tuple
ToDo identifiers was proposed by, e.g., Yao [944], Makinouchi, Tezuka, Kitakami, and
Adachi in the context of RDB/V1 [568]. The different variants (whether or not
and where to sort, join order) can now be transparently determined by the plan
generator: no special treatment is necessary. Further, the join predicates can
be not only on the tuple identifiers but also on key attributes. This often allows
joining with other than the indexed relations (or their indexes) before accessing
the relation.
Rosenthal and Reiner proposed to use joins to represent access paths with
indexes [726]. This approach is very elegant since no special treatment for index
processing is required. However, if there are many relations and indexes, the
search space might become very large, as every index increases the number
of joins to be performed. This is why Mohan, Haderle, Wang, and Cheng
abandoned this approach and sketched a heuristic which determines an access
path in case multiple indexes on a single table exist [614].
The query

select name, age
from Person
where name like ’R%’ and age between 40 and 50

is an index only query (assuming indexes on name and age) and can be
translated to

Πname,age(
  Personage[a; 40 ≤ age ≤ 50; TIDa, age]
  B_{TIDa=TIDn}
  Personname[n; name ≥ ’R’; name < ’S’; TIDn, name])
Let us now discuss the former of the two issues mentioned in the section’s
introduction. The query
select *
from Emp e, Dept d
where e.name = ′Maier′ and e.dno = d.dno

can profit from available indexes. If there are indexes on Emp.name and Dept.dno, we can replace σ_{e.name=′Maier′}(Emp[e]) by an index scan as we have seen previously.
Here, A(Emp) : t.∗ abbreviates access to all Emp attributes. This especially
includes dno:t.dno. (Strictly speaking, we do not have to access the name
attribute, since its value is already known.)
As we have also seen, an alternative is to use a d-join instead:
Let us abbreviate Emp_name[x; name = ′Maier′] by Ei and χ_{t:∗(x.TID), A(Emp):t.∗}(□) by Ea.
Now, for any e.dno, we can use the index on Dept.dno to access the ac-
cording department tuple:
Note that the inner expression Deptdno [y; y.dno = dno] contains the free variable
dno, which is bound by Ea . Dereferencing the TID of the department results
in the following algebraic modelling which models a complete index nested loop
join:
Ei <Ea> <Dept_dno[y; y.dno = dno; dTID : y.TID]> <χ_{u:∗(dTID), A(Dept):u.∗}(□)>

Let us abbreviate Dept_dno[y; y.dno = dno; dTID : y.TID] by Di and χ_{u:∗(dTID), A(Dept):u.∗}(□) by Da. Fully abbreviated, the expression then becomes Ei <Ea> <Di> <Da>. Several optimizations can now be applied:
• we can sort the result of the expression Ei < Ea > on dno for two reasons:
– If there are duplicates for dno, i.e. there are many employees named
“Maier” in each department, then this guarantees that no index page
(of the index Dept.dno) has to be read more than once.
– If additionally Dept.dno is a clustered index or Dept is an index-only
table contained in Dept.dno, then large parts of the random I/O can
be turned into sequential I/O.
– If the result of the inner is materialized (see below), then only one
result needs to be stored. Note that sorting is not necessary, but
grouping would suffice to avoid duplicate work.
• We can sort the result of the expression Ei < Ea >< Di > on dTID for
the same reasons as mentioned above for sorting the result of Ei on TID.
The reader is advised to explicitly write down the alternatives. Another exercise
is to give plan alternatives for the different cases of DB2's Hybrid Join [318],
which can now be decomposed into primitives like relation scan, index scan,
d-join, sorting, TID dereferencing, and access to a unique index (see below).
Let us take a closer look at materializing the result of the inner of the d-join.
IBM's DB2 for MVS considers temping the inner (i.e., creating a temporary
relation for it) if it is an index access [318]. Graefe provides a general discussion
on the subject [345]. Let us start with the above example. Typically, many
employees will work in a single department and possibly several of them are
called “Maier”. For everyone of them, we can be sure that there exists at most
one department. Let us assume that referential integrity has been specified.
Then, there exists exactly one department for every employee. We have to find
a way to rewrite the expression
such that the mapping dno → dTID is explicitly materialized (or, as one could
also say, cached). For this purpose, Hellerstein and Naughton introduced a
modified version of the map operator that materializes its result [408]. Let us
denote this operator by χ^mat. The advantage of using this operator is that it is
quite general and can be used for different purposes (see e.g. [100], Chap. ??,
Chap. ??). Since the map operator extends a given input tuple by some attribute
values, which must be computed by an expression, we need an expression for
the access to a unique index. For our example, we write

IdxAcc^{Dept}_{dno}[y; y.dno = dno]
If we further assume that the outer (Ei <Ea>) is sorted on dno, then
it suffices to remember only the TID for the latest dno. We define the map
operator χ^{mat,1} to do exactly this, which yields a more efficient plan.
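The following Python sketch (all names are ours) illustrates the behavior of χ^{mat,1}: since the outer is sorted on dno, a single remembered pair suffices.

def chi_mat_1(outer, idx_access):
    # outer: tuples sorted on 'dno'; idx_access: dno -> dTID via a unique index
    last_dno, last_dtid = None, None
    for t in outer:
        if t['dno'] != last_dno:              # cache miss: a new dno value
            last_dno = t['dno']
            last_dtid = idx_access(last_dno)  # one unique-index access
        yield {**t, 'dTID': last_dtid}        # cache hit: reuse the last mapping

dept_idx = {1: 'tid_a', 2: 'tid_b'}
outer = [{'name': 'Maier', 'dno': 1}, {'name': 'Maier', 'dno': 1},
         {'name': 'Maier', 'dno': 2}]
print(list(chi_mat_1(outer, dept_idx.get)))  # 3 output tuples, 2 index accesses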
Let us now look at a case where the inner produces many tuples per binding. Consider the query

select *
from Emp e, Wine w
where e.yearOfBirth = w.year
If we have no indexes, we can answer this query by a simple join where we only
have to decide the join method and which of the relations becomes the outer and
which the inner. Assume we have only wines from a few years. (Alternatively,
some selection could have been applied.) Then it might make sense to consider
the following alternative: a d-join whose inner scans Emp and selects the employees
born in the current wine's year. However, then the relation Emp is scanned once
for each Wine tuple. Hence, it might
make sense to materialize the result of the inner for every year value of Wine
if we have only a few year values. In other words, if we have many duplicates
for the year attribute of Wine, materialization may pay off since then we have
to scan Emp only once for each year value of Wine. To achieve caching of the
inner, in case every binding of its free variables possibly results in many tuples,
requires a new operator. Let us call this operator memox and denote it by M
[345, 100]. For the free variables of its only argument, it remembers the set
of result tuples produced by its argument expression and does not evaluate it
again if it is already cached. Using memox, the above plan becomes one in which M is wrapped around the inner of the d-join.
It should be clear that for more complex inners, the memox operator can be
applied at all branches, giving rise to numerous caching strategies. Analogously
to the materializing map operator, we are able to restrict the materialization
to the results for a single binding for the free variables if the outer is sorted (or
grouped) on the free variables:
Sort_{w.yearOfBirth}(Wine[w])
<M_1(Emp_yearOfBirth[x; x.yearOfBirth = w.year] <χ_{e:∗(x.TID), A(Emp):e.∗}(□)>)>
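A Python sketch of the memox operator's behavior (names ours; the memo table is keyed by the binding of the free variable):

class Memox:
    # remembers, per binding of the free variables, the bag produced by the
    # argument expression; re-evaluates only on a cache miss
    def __init__(self, expr):
        self.expr, self.memo = expr, {}
    def __call__(self, binding):
        if binding not in self.memo:
            self.memo[binding] = self.expr(binding)  # evaluate the inner once
        return self.memo[binding]

emp = [{'name': 'Maier', 'yearOfBirth': 1970},
       {'name': 'Schmidt', 'yearOfBirth': 1973}]
inner = Memox(lambda year: [e for e in emp if e['yearOfBirth'] == year])
wine = [{'year': 1970}, {'year': 1973}, {'year': 1970}]
for w in wine:
    partners = inner(w['year'])
print(len(inner.memo))  # 2: Emp was scanned once per distinct year only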
If both Emp.yearOfBirth and Wine.year are indexed, the index scans produce their output ordered on the join attributes, and a merge join applies:

Emp_yearOfBirth[x] ⋈^{merge}_{x.yearOfBirth=y.year} Wine_year[y]
This example makes clear that the order provided by an index scan can be used
to speed up join processing. After evaluating this plan fragment, we have to
access the actual Emp and Wine tuples. We can consider zero, one, or two sorts
on their respective tuple identifiers. If the join is sufficiently selective, one of
these alternatives may prove more efficient than the ones we have considered
so far.
Let R be the relation for which we have to retrieve the tuples. Then we use the
following abbreviations: N denotes the number of tuples of R, m the number
of pages on which its tuples are stored, and k the number of tuples we have to
retrieve. The question is how many pages have to be accessed to retrieve the k
tuples. We assume that the tuples are uniformly distributed among the m pages.
Then, each page stores B = N/m tuples. B is called the blocking factor.

Let us consider some borderline cases. If k > N − N/m or m = 1, then all
pages are accessed. If k = 1, then exactly one page is accessed. The answer to
the general question will be expressed in terms of buckets (pages in the above
case) and items contained therein (tuples in the above case). Later on, we will
also use extents, cylinders, or tracks as buckets and tracks or sectors/blocks as
items.
We assume that the items are distributed over a set of buckets. The total number
of items will be N and the number of requested items will be k. The above
question can then be reformulated to: how many buckets contain at least one of
the k requested items, i.e., how many qualifying buckets exist? We start out by investigating
the case where the items are uniformly distributed among the buckets. Two
subcases will be distinguished:
We then discuss the case where the items are non-uniformly distributed.
In any case, the underlying access model is random access. For example,
given a tuple identifier, we can directly access the page storing the tuple. Other
access models are possible. The one we will subsequently investigate is sequen-
tial access where the buckets have to be scanned sequentially in order to find
the requested items. After that, we are prepared to develop a model for disk
access costs.
Throughout this section, we will further assume that the probability that
we request a set with k items is 1/\binom{N}{k} for each of the \binom{N}{k}
possibilities to select a k-set, i.e., a set with cardinality k. We often make use
of established equalities for binomial coefficients. For convenience, the most
frequently used equalities are listed in Appendix D.
The second expression (4.4) is due to Yao, the third (4.5) is due to Waters.
Palvia and March proved both formulas to be equal [656] (see also [38]). The
fraction m = N/n may not be an integer. For these cases, it is advisable to
have a Gamma-function based implementation of binomial coefficients at hand
(see [689] for details).
Depending on k and n, either the expression of Yao or the one of Waters is
faster to compute. After the proof of the above formulas and the discussion of
some special cases, we will give several approximations for p.
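Such a Gamma-function based implementation might look as follows (a Python sketch; the function name is ours):

import math

def binomial(n, k):
    # Gamma-based binomial coefficient; also defined for non-integer n
    if k < 0 or k > n:
        return 0.0
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1))

print(binomial(10, 3))    # approx. 120.0
print(binomial(10.5, 3))  # defined via the Gamma function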
Proof. The total number of possibilities to pick the k items from all N items is
\binom{N}{k}. The number of possibilities to pick k items from all items not contained in
a fixed single bucket is \binom{N−n}{k}. Hence, the probability p that a bucket does not
qualify is p = \binom{N−n}{k} / \binom{N}{k}. Using this result, we can do the following calculation:

p = \binom{N−n}{k} / \binom{N}{k}
  = ((N−n)! k! (N−k)!) / (k! ((N−n)−k)! N!)
  = \prod_{i=0}^{k−1} (N−n−i)/(N−i)    ∎
Let us list some special cases of the probability Y_n^N(k) that a bucket qualifies:

n = 1 :  k/N
n = N :  1
k = 0 :  0
k = 1 :  n/N = (N/m)/N = 1/m
k = N :  1
We examine a slight generalization of the first case in more detail. Let N items
be distributed over N buckets such that every bucket contains exactly one item.
Further, let us be interested in a subset of m buckets (1 ≤ m ≤ N). If we pick
k items, then the number of buckets within the subset of size m that qualify is

m Y_1^N(k) = m (k/N)    (4.6)
In order to see that the two sides are equal, we perform the following calculation:

Y_1^N(k) = 1 − \binom{N−1}{k} / \binom{N}{k}
         = 1 − ((N−1)! / (k!((N−1)−k)!)) · (k!(N−k)! / N!)
         = 1 − (N−k)/N
         = (N − (N−k))/N
         = k/N
Since the computation of YnN (k) can be quite expensive, several approxima-
tions have been developed. The first one was given by Waters [899, 900]:
p ≈ (1 − k/N)^n
This approximation (also described elsewhere [313, 656]) turns out to be pretty
good. However, below we will see even better approximations.
For Y_n^{N,m}(k), Whang, Wiederhold, and Sagalowicz gave the following
approximation for faster calculation [912]:

m · [ (1 − (1 − 1/m)^k)
    + (1/(m²n)) · (k(k−1)/2) · (1 − 1/m)^{k−1}
    + (1.5/(m³n⁴)) · (k(k−1)(2k−1)/6) · (1 − 1/m)^{k−1} ]
For n = N/m, the following lower and upper bounds for p can be used:

p_lower = (1 − k/(N − (n−1)/2))^n

p_upper = ((1 − k/N) · (1 − k/(N − n + 1)))^{n/2}

Dihr and Saharia claim that the maximal difference resulting
from the use of the lower and the upper bound to compute the number of page
accesses is 0.224, far less than a single page access.
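To illustrate, the following Python sketch (all names ours) evaluates the exact p via the product form of Eq. 4.5 next to Waters's approximation and the two bounds:

def p_exact(N, n, k):
    # probability that a bucket holding n of the N items does not qualify
    p = 1.0
    for i in range(k):
        p *= (N - n - i) / (N - i)
    return max(p, 0.0)

def p_waters(N, n, k):
    return (1 - k / N) ** n

def p_lower(N, n, k):
    return (1 - k / (N - (n - 1) / 2)) ** n

def p_upper(N, n, k):
    return ((1 - k / N) * (1 - k / (N - n + 1))) ** (n / 2)

N, m, k = 1000, 100, 50
n = N // m
print(p_exact(N, n, k), p_waters(N, n, k),
      p_lower(N, n, k), p_upper(N, n, k))  # all close to 0.6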
Lemma 4.16.2 Let S be a set with |S| = N elements. Then, the number of
multisets with cardinality k containing only elements from S is

\binom{N+k−1}{k}

For a proof we just note that there is a bijection between the k-multisets and the
k-subsets of an (N+k−1)-set. We can go from a multiset to a set by f with
f({x1 ≤ . . . ≤ xk}) = {x1 + 0 < x2 + 1 < . . . < xk + (k − 1)} and from a set to a
multiset via g with g({x1 < . . . < xk}) = {x1 − 0 ≤ x2 − 1 ≤ . . . ≤ xk − (k − 1)}.
For k requested items that may contain duplicates (i.e., for a k-multiset of
requests), the expected number of qualifying buckets is

Cheung_n^{N,m}(k) = m · Cheung_n^N(k)    (4.7)

where

Cheung_n^N(k) = 1 − p̃    (4.8)

with

p̃ = \binom{N−n+k−1}{k} / \binom{N+k−1}{k}    (4.9)
  = \prod_{i=0}^{k−1} (N−n+i)/(N+i)    (4.10)
  = \prod_{i=0}^{n−1} (N−1−i)/(N−1+k−i)    (4.11)
Eq. 4.9 follows from the observation that the probability that some bucket
does not contain any of the k possibly duplicate items is
\binom{N−n+k−1}{k} / \binom{N+k−1}{k}. Eq. 4.10 follows from

p̃ = \binom{N−n+k−1}{k} / \binom{N+k−1}{k}
  = ((N−n+k−1)! / (k!((N−n+k−1)−k)!)) · (k!((N+k−1)−k)! / (N+k−1)!)
  = ((N−n−1+k)! / (N−n−1)!) · ((N−1)! / (N−1+k)!)
  = \prod_{i=0}^{k−1} (N−n+i)/(N+i)    ∎
Cardenas discovered a formula that can be used to approximate p̃ [123]:

p̃ ≈ (1 − n/N)^k
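Both Eq. 4.10 and Cardenas's approximation are cheap to evaluate, as the following sketch (names ours) shows:

def p_tilde(N, n, k):
    # probability that a bucket is missed by all k draws with replacement
    p = 1.0
    for i in range(k):
        p *= (N - n + i) / (N + i)
    return p

def p_cardenas(N, n, k):
    return (1 - n / N) ** k

N, n, k = 1000, 10, 50
print(p_tilde(N, n, k), p_cardenas(N, n, k))  # nearly identical values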
As Cheung pointed out, we can use the theorem to derive the number of
distinct items contained in (and hence accessed for) a k-multiset:

D(N, k) = Nk / (N + k − 1)    (4.12)

if the elements in T occur with the same probability in S.
We apply the theorem for the special case where every bucket contains exactly
one item (n = 1). In this case, \prod_{i=0}^{0} (N−1−i)/(N−1+k−i) = (N−1)/(N−1+k),
and the number of qualifying buckets is

N (1 − (N−1)/(N−1+k)) = N ((N−1+k−N+1)/(N−1+k)) = Nk/(N+k−1).    ∎
Another way to achieve this formula is the following. There are \binom{N}{l}
possibilities to pick l different elements out of the N elements in T. In order to
build a k-multiset containing exactly these l distinct elements, we must additionally
choose k − l further (duplicate) elements from the l elements; by Lemma 4.16.2,
there are \binom{l+(k−l)−1}{k−l} = \binom{k−1}{k−l} possibilities for this choice.
Thus, we have \binom{N}{l} \binom{k−1}{k−l} possibilities to build a k-multiset with
exactly l distinct elements. The total number of k-multisets is \binom{N+k−1}{k}.
Thus we may conclude that

D(N, k) = \sum_{l=1}^{min(N,k)} l \binom{N}{l} \binom{k−1}{k−l} / \binom{N+k−1}{k}

which can be simplified to the above.
A small difference remains even when computing Y with Eq. 4.5. Nonetheless, for
n ≥ 5, the error is less than two percent. One of the problems when calculating
the result of the left-hand side is that the number of distinct items is not
necessarily an integer. To solve this problem, we can implement all our formulas
using the Gamma-function. But even then a small difference remains.
The approximation given in Theorem 4.16.3 is not too accurate. A better
approximation can be calculated from the probability distribution. Denote by
p(D(N,k) = j) the probability that the number of distinct values equals j if we
randomly select k items with replacement from N given items. Then

p(D(N,k) = j) = \binom{N}{j} \sum_{l=0}^{j} (−1)^l \binom{j}{l} ((j−l)/N)^k

and thus

D(N,k) = \sum_{j=1}^{min(N,k)} j \binom{N}{j} \sum_{l=0}^{j} (−1)^l \binom{j}{l} ((j−l)/N)^k
This formula is quite expensive to calculate. We can derive a very good approx-
imation by the following reasoning. We draw k elements from the set T with
|T| = N elements. Every element from T can be drawn at most k times. We
produce N buckets, one for each element of T. In each bucket, we insert k
copies of the according element from T. Then, a sequence of draws from T
with duplicates can be represented by a sequence of draws without duplicates
by mapping the i-th occurrence of an element to the i-th copy in its bucket.
Thus, we can apply the formula by Waters and Yao to calculate the number of
buckets (and hence elements of T) hit:

D(N,k) ≈ Ȳ_k^{Nk,N}(k) = N (1 − \prod_{i=0}^{k−1} (Nk−k−i)/(Nk−i)),

that is, Nk items distributed over N buckets containing n = k items each, from
which k items are drawn without replacement. Since the approximation is quite
accurate and we already know how to efficiently calculate this formula, this is
our method of choice.
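The following small experiment (ours) compares the estimate of Eq. 4.12 and the Waters/Yao-based approximation against the exact expectation, which, by linearity of expectation, is N(1 − (1 − 1/N)^k):

def d_simple(N, k):
    return N * k / (N + k - 1)

def d_yao(N, k):
    # Nk items in N buckets of k copies each, k draws without replacement
    p = 1.0
    for i in range(k):
        p *= (N * k - k - i) / (N * k - i)
    return N * (1 - p)

def d_exact(N, k):
    # P(a fixed value is drawn at least once) = 1 - (1 - 1/N)^k
    return N * (1 - (1 - 1 / N) ** k)

N, k = 100, 50
print(d_simple(N, k), d_yao(N, k), d_exact(N, k))
# approx. 33.6 vs. 39.4 vs. 39.5: the Yao-based value is far closer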
We now turn to relax the first assumption. Christodoulakis models the distri-
bution by m numbers ni (for 1 ≤ i ≤ m) if there are m buckets. Each ni equals
the number of records in some bucket i [173]. Luk proposes a Zipfian record
distribution [561]. However, Ijbema and Blanken say that Waters's and Yao's
formula is still better, as Luk's formula results in values that are too low [434].
They all come up with the same general formula presented below. Vander Zanden,
Taylor, and Bitton [955] discuss the problem of correlated attributes, which results
in some clusteredness. Zahorjan, Bell, and Sevcik discuss the problem where
every item is assigned its own access probability [954]. That is, they relax the
second assumption. We will come back to these issues in Section ??.
We still assume that every item is accessed with the same probability.
However, we relax the first assumption. The following formula derived by
Christodoulakis [173], Luk [561], and Ijbema and Blanken [434] is a simple
application of Waters’s and Yao’s formula to a more general case.
Note that the product formulation in Eq. 4.5 of Theorem 4.16.1 results in a
more efficient computation. We make a note of this in the following corollary.
with

p_j = \prod_{i=0}^{n_j−1} (N−k−i)/(N−i)   if k ≤ N − n_j
p_j = 0                                    if N − n_j < k ≤ N    (4.16)

If we compute the p_j after we have sorted the n_j in ascending order, we can use
the fact that

p_{j+1} = p_j · \prod_{i=n_j}^{n_{j+1}−1} (N−k−i)/(N−i).
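A sketch of this incremental computation (names ours); it returns the expected number of qualifying buckets, i.e., \sum_j (1 − p_j):

def qualifying_buckets(ns, N, k):
    # ns: bucket sizes n_j; the product is extended incrementally over
    # the ascendingly sorted sizes
    total, p, i = 0.0, 1.0, 0
    for n in sorted(ns):
        while i < n:
            p = p * (N - k - i) / (N - i) if N - k - i > 0 else 0.0
            i += 1
        total += 1 - p
    return total

print(qualifying_buckets([1, 2, 2, 5], N=10, k=3))  # approx. 2.28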
We can also use the theorem to calculate the number of qualifying buckets
in case the distribution is given by a histogram with L entries, where entry i
covers l_i buckets containing n_i items each:

W^{N,m}(k) = \sum_{i=1}^{L} l_i Y_{n_i}^N(k)    (4.17)
Last in this section, let us calculate the probability distribution for the
number of qualifying items within a bucket. The probability that x ≤ n_j items
in a bucket j qualify can be calculated as follows. The number of possibilities
to select x items in bucket j is \binom{n_j}{x}. The number of possibilities to draw the
remaining k − x items from the other buckets is \binom{N−n_j}{k−x}. The total number
of possibilities to distribute k items over the buckets is \binom{N}{k}. This shows the
following:

X_{n_j}^N(k, x) = \binom{n_j}{x} \binom{N−n_j}{k−x} / \binom{N}{k}    (4.18)

X̄_{n_j}^{N,m}(k) = \sum_{x=0}^{min(k,n_j)} x X_{n_j}^N(k, x)    (4.19)

In standard statistics books the probability distribution X_{n_j}^N(k, x) is called the
hypergeometric distribution.
Let us consider the case where all n_j are equal to n. Then we can calculate
the average number of qualifying items in a bucket. With y := min(k, n) we
have

X̄_n^{N,m}(k) = \sum_{x=0}^{min(k,n)} x X_n^N(k, x)
            = \sum_{x=1}^{min(k,n)} x X_n^N(k, x)
            = (1/\binom{N}{k}) \sum_{x=1}^{y} x \binom{n}{x} \binom{N−n}{k−x}
            = (1/\binom{N}{k}) \sum_{x=1}^{y} n \binom{n−1}{x−1} \binom{N−n}{k−x}
            = (n/\binom{N}{k}) \sum_{x=0}^{y−1} \binom{n−1}{x} \binom{N−n}{(k−1)−x}
            = (n/\binom{N}{k}) \binom{(n−1)+(N−n)}{k−1}
            = (n/\binom{N}{k}) \binom{N−1}{k−1}
            = n (k/N)
            = k/m

where we used x \binom{n}{x} = n \binom{n−1}{x−1} and the Vandermonde identity.
Let us consider the even more special case where every bucket contains a
single item. That is, N = m and n_i = 1. The probability that a bucket contains
a qualifying item reduces to

X_1^N(k, 1) = \binom{1}{1} \binom{N−1}{k−1} / \binom{N}{k}
            = \binom{N−1}{k−1} / \binom{N}{k}
            = k/N (= k/m)

Since x can then only be zero or one, the average number of qualifying items a
bucket contains is also k/N.
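The expected value k/m is easily double-checked numerically (a sketch, names ours):

from math import comb

def avg_qualifying(N, n, k):
    # hypergeometric mean: sum_x x * C(n,x) C(N-n,k-x) / C(N,k)
    return sum(x * comb(n, x) * comb(N - n, k - x)
               for x in range(0, min(k, n) + 1)) / comb(N, k)

N, m, k = 20, 4, 6
n = N // m
print(avg_qualifying(N, n, k), k / m)  # both 1.5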
The formulas presented in this section can be used to estimate the number
of block/page accesses in case of random direct accesses. As we will see next,
other kinds of accesses occur and need different estimates.
Consider a bitvector of length B containing exactly b ones, with all such
bitvectors being equally likely (Theorem 4.16.9). Then the probability that
exactly j zeros occur (1) between two consecutive ones, (2) before the first one,
or (3) after the last one is given by

B_b^B(j) = \binom{B−j−1}{b−1} / \binom{B}{b}    (4.20)
A more general theorem (see Theorem 4.16.13) was first presented by Yao [942].
The above formulation is due to Christodoulakis [176].
To see why the formula holds, consider the total number of bitvectors having
a one in position i followed by j zeros followed by a one. This number is
\binom{B−j−2}{b−2}. We can choose B − j − 1 positions for i. The total number of
bitvectors is \binom{B}{b}, and each bitvector has b − 1 sequences of the form that a
one is followed by a sequence of zeros followed by a one. Hence,

B_b^B(j) = ((B−j−1) \binom{B−j−2}{b−2}) / ((b−1) \binom{B}{b})
         = \binom{B−j−1}{b−1} / \binom{B}{b}

Part 1 of the theorem follows. To prove part 2, we count the number of
bitvectors that start with j zeros before the first one. There are B − j − 1
positions left for the remaining b − 1 ones. Hence, the number of these bitvectors
is \binom{B−j−1}{b−1} and part 2 follows. Part 3 follows by symmetry.
We can derive a less expensive way to evaluate the formula for B_b^B(j) as
follows. For j = 0, we have B_b^B(0) = b/B. For j > 0,

B_b^B(j) = \binom{B−j−1}{b−1} / \binom{B}{b}
         = ((B−j−1)! / ((b−1)!((B−j−1)−(b−1))!)) · (b!(B−b)! / B!)
         = b ((B−j−1)! / (B−j−b)!) ((B−b)! / B!)
         = (b/(B−j)) ((B−j)! / (B−b−j)!) ((B−b)! / B!)
         = (b/(B−j)) \prod_{i=0}^{j−1} (1 − b/(B−i))

This formula is useful when B_b^B(j) occurs in sums over j, because we can compute
the product incrementally.
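The following sketch (names ours) tabulates B_b^B(j) with the incremental product and double-checks Corollary 4.16.10 below:

def Bprob(B, b):
    # returns the distribution [B_b^B(0), ..., B_b^B(B-b)]
    probs = [b / B]
    prod = 1.0
    for j in range(1, B - b + 1):
        prod *= 1 - b / (B - j + 1)   # extend the product by the factor i = j-1
        probs.append(b / (B - j) * prod)
    return probs

B, b = 20, 4
dist = Bprob(B, b)
print(sum(dist))                      # approx. 1.0: it is a distribution
print(sum(j * p for j, p in enumerate(dist)), (B - b) / (b + 1))  # both approx. 3.2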
Corollary 4.16.10 Using the terminology of Theorem 4.16.9, the expected value
for the number of zeros between two consecutive ones is

B̄_b^B = \sum_{j=0}^{B−b} j B_b^B(j) = (B−b)/(b+1)    (4.21)

Let us calculate:

\sum_{j=0}^{B−b} j \binom{B−j−1}{b−1}
  = \sum_{j=0}^{B−b} (B − (B−j)) \binom{B−j−1}{b−1}
  = B \sum_{j=0}^{B−b} \binom{B−j−1}{b−1} − \sum_{j=0}^{B−b} (B−j) \binom{B−j−1}{b−1}
  = B \sum_{j=0}^{B−b} \binom{b−1+j}{b−1} − b \sum_{j=0}^{B−b} \binom{B−j}{b}
  = B \sum_{j=0}^{B−b} \binom{b−1+j}{j} − b \sum_{j=0}^{B−b} \binom{b+j}{b}
  = B \binom{(b−1)+(B−b)+1}{(b−1)+1} − b \binom{b+(B−b)+1}{b+1}
  = B \binom{B}{b} − b \binom{B+1}{b+1}

Dividing by \binom{B}{b} and using

B − b \binom{B+1}{b+1} / \binom{B}{b} = B − b(B+1)/(b+1)
                                      = (B(b+1) − (Bb+b)) / (b+1)
                                      = (B−b)/(b+1),

the claim follows.
Corollary 4.16.11 Using the terminology of Theorem 4.16.9, the expected total
number of bits from the first bit to the last one, both included, is

B_tot(B, b) = (Bb + b)/(b + 1)    (4.22)

To see this, we subtract from B the expected number of zeros between the
last one and the last bit:

B − (B−b)/(b+1) = (B(b+1) − (B−b))/(b+1)
                = (Bb + B − B + b)/(b+1)
                = (Bb + b)/(b+1)
Corollary 4.16.12 Using the terminology of Theorem 4.16.9, the expected
number of bits from the first one to the last one, both included, is

B_{1-span}(B, b) = (Bb − B + 2b)/(b + 1)    (4.23)

We have two possibilities to argue here. The first subtracts from B the expected
number of zeros at the beginning and at the end:

B_{1-span}(B, b) = B − 2 (B−b)/(b+1)
                 = (Bb + B − 2B + 2b)/(b+1)
                 = (Bb − B + 2b)/(b+1)

The other possibility is to add the expected number of zeros between the first and
the last one to the number of ones:

B_{1-span}(B, b) = (b−1) B̄_b^B + b
                 = (b−1)(B−b)/(b+1) + b(b+1)/(b+1)
                 = (Bb − b² − B + b + b² + b)/(b+1)
                 = (Bb − B + 2b)/(b+1)
Let us have a look at some possible applications of these formulas. If we
look up one record in an array of B records and we search sequentially, how
many array entries do we have to examine on average if the search is successful?
In [574] we find these formulas used for the following scenario. Let a file
consist of B consecutive cylinders. We search for k different keys, all of which
occur in the file. These k keys are distributed over b different cylinders. Of
course, we can stop as soon as we have found the last key. What is the expected
total distance the disk head has to travel if it is placed on the first cylinder of
the file at the beginning of the search?
Another interpretation of these formulas can be found in [423, 575]. Assume
we have an array consisting of B different entries. We sequentially go through
all entries of the array until we have found all the records for b different keys.
We assume that the B entries in the array and the b keys are sorted. Further,
all b keys occur in the array. On the average, how many comparisons do we
need to find all keys?
Vector of Buckets
A more general scenario is as follows. Consider a sequence of m buckets con-
taining ni items each. Yao [942] developed the following theorem.
Applications of this formula can be found in [173, 176, 574, 576, 867]. Manolopoulos
and Kollias describe the analogue for the replacement model [574].
Lang, Driscoll, and Jou discovered a general theorem which allows us to
estimate the expected number of block accesses for sequential search, where Y
is a random variable for the last item in the sequence that occurs among the
k items searched. With the help of this theorem, it is quite easy to derive the
expected number of sequential accesses under many different models.
                  non-replacement                               replacement
random            [170, 173, 561, 674, 912, 941]                [123, 173, 656, 674]
sequential        [62, 173, 519, 576, 656, 655, 799, 942]       [173, 519, 576, 799]
tree-structured   [519, 518, 576, 655, 679]                     [519, 518, 576, 799]
This is a good approximation as long as Q_s(i) is not too small (e.g. > 4).

C_cmd = N · D_cmd

(Figure: the qualifying cylinders of a segment, grouped into extents of S_cpe
cylinders each.) The upper path illustrates C_seekgap; the lower braces indicate
those parts for which C_seekext is responsible.
The average seek cost for reaching the first qualifying cylinder is D_avgseek.
How far are we now within the first extent? We use Corollary 4.16.10 to derive
that the number of non-qualifying cylinders preceding the first qualifying one
in some extent i is

B̄_{Q_c(i)}^{S_cpe(i)} = (S_cpe(i) − Q_c(i)) / (Q_c(i) + 1).

The same holds for the number of non-qualifying cylinders following the
last qualifying cylinder. Hence, for every gap between the last and the first
qualifying cylinder of two extents i and i+1, the disk arm has to travel the
distance

Δ_gap(i) := B̄_{Q_c(i)}^{S_cpe(i)} + S_first(i+1) − S_last(i) − 1 + B̄_{Q_c(i+1)}^{S_cpe(i+1)}
Let us turn to C_seekext(i). We first need the number of cylinders between
the first and the last qualifying cylinder, both included, in extent i. It can be
calculated using Corollary 4.16.12:

Ξ(i) = B_{1-span}(S_cpe(i), Q_c(i))

Hence, Ξ(i) is the minimal span of an extent that contains all qualifying cylinders.
Using Ξ(i) and Theorem 4.1.1, we can derive an upper bound for C_seekext(i):

C_seekext(i) ≤ (Q_c(i) − 1) D_seek(Ξ(i) / (Q_c(i) − 1))    (4.32)
There are many more estimates for seek times. Older ones rely on a linear
disk model but also consider different disk scan policies. A good entry point is
the work by Teorey and Pinkerton [858, 859].
C_rot(i) = Q_t(i) D_Zscan(S_zone(i)) B_tot(D_Zspt(S_zone(i)), Q_spt(i))    (4.36)

where Q_spt(i) = W̄_1^{S_sec, D_Zspt(S_zone(i))}(N) = D_Zspt(S_zone(i)) N/S_sec is the expected
number of qualifying sectors per track in extent i. In case Q_spt(i) < 1, we set
Q_spt(i) := 1.
A more precise model is derived as follows. We sum up for all j the product
of (1) the probability that j sectors in a track qualify and (2) the average number
of sectors that have to be read if j sectors qualify. This gives us the number of
sectors that have to pass the head in order to read all qualifying sectors. We
only need to multiply this number by the time to scan a single sector and the
number of qualifying tracks. We can estimate (1) using Theorem 4.16.8. For
(2) we again use Corollary 4.16.11.
Using again Theorem 4.16.8, we can estimate the expected rotational delay
for the non-qualifying sectors.
We have to sum up this number for all extents and then add the time needed
to scan the N qualifying sectors. Hence,

C_rot = \sum_{i=1}^{S_ext} (C_rotpass(i) + C_rotread(i))

where the total transfer cost for the qualifying sectors of an extent can be
estimated as

C_rotread(i) = Q_s(i) D_Zscan(S_zone(i))
C_headswitch = \sum_{i=1}^{S_ext} (Q_t(i) − Q_c(i)) D_hdswitch    (4.40)
4.17.7 Discussion
The disk drive cost model derived depends on many parameters. The first set
of parameters concerns the disk drive itself. These parameters can (and must)
be extracted from disk drives by using (micro-)benchmarking techniques [307,
853, 594, 759]. The second set of parameters concerns the layout of a segment
on disk. The database system is responsible for providing these parameters.
The closer the runtime system is to the disk, the easier these parameters are to
extract. Building
a runtime system atop the operating system’s file system is obviously a bad
idea from the cost model perspective. If instead the storage manager of the
runtime system implements cylinder aligned extents (or at least track aligned
extents) using a raw I/O interface, the cost model will be easier to develop and
much more precise. Again, providing reliable cost models is one of the most
important tasks of the runtime system.
We have neglected many problems in our disk access model: partially filled
cylinders, pages larger than a block, disk drive’s cache, remapping of bad blocks,
non-uniformly distributed accesses, clusteredness, and so on. Whereas the first
two items are easy to fix, the rest is not so easy. In general, database systems
ignore the disk drive cache. The justifying argument is that the database buffer
is much larger than the disk drive’s cache and, hence, it is very unlikely that we
read a page that is not in the database buffer but in the disk cache. However,
this argument falls short for non-random accesses. Nevertheless, we will ignore
the issue in this book. The interested reader is referred to Shriver’s thesis for
disk cache modeling [800].
Remapping of bad sectors to other sectors really prevents the development
of a precise cost model for disk accesses. Modeling disk drives already becomes
a nightmare, since a nice partitioning of the disk into zones is no longer possible:
some sectors, tracks, and even cylinders are reserved for the remapping. So
even if no remapping takes place (which is very unlikely), having homogeneous
zones of hundreds of cylinders is a dream that will never come true. The
result is that we do not have dozens of homogeneous zones but hundreds (if
not thousands) of zones of medium homogeneity. These should be reduced to
a model of dozens of homogeneous zones such that the error does not become
too large. The remaining issues will be discussed later in the book.
There is even more to say about our cost model. A very practical issue
arises if the number of qualifying cylinders is small. Then for some extent i,
the expected number of qualifying cylinders could be Qc (i) = 0.38. For some
of our formulas this is a big problem. As a consequence, special cases for small
N, small Qc, and small Qt have to be developed and implemented.
Another issue is the performance of the cost model itself. The query com-
piler might evaluate the cost model’s formulas thousands or millions of times.
Hence, they should be fast to evaluate.
So far, we can adequately model the costs of N disk accesses. Some questions
remain. For example, how do we derive the number N of pages we have to
access? Do we really need to fetch all N pages from disk or will we find some
of them in the buffer? If yes, how many? Further, CPU costs are also an
important issue. Deriving a cost model for CPU costs is even more tedious
than modelling disk drive costs. The only choice available is to benchmark all
parts of a system and then derive a cost model using the extracted parameters.
To give examples of parameters to be extracted: we need the CPU costs for
accessing a page present in the buffer, for accessing a page absent in the buffer,
for a next call of an algebraic operator, for executing an integer addition, and
so on. Again, this cannot be done without tools [46, 240, 406, 456, 673].
The bottom line is that a cost model does not have to be accurate, but must
lead to correct decisions. In that sense, it must be accurate at the break even
points between plan alternatives. Let us illustrate this point by means of our
motivating example. If we know that the index returns a single tuple, it is quite
likely that the sequential scan is much more expensive. The same might be true
for 2, 3, 4, and 5 tuples. Hence, an accurate model for small N is not really
necessary. However, as we come close to the costs of a sequential scan, both the
cost model for the sequential scan and the one for the index-based access must
be correct since the product of their errors is the factor a bad choice is off the
best choice. This is a crucial point, since it is easy to underestimate sequential
access costs by a factor of 2-3 and overestimate random access cost by a factor
of 2-5.
4.19 Bibliography
ToDo:
• CPU Costs for B-tree search within inner and leaf pages [515]
• K accesses to unique index: how many page faults if buffer has size b?
[739]
• set oriented disk access to large complex objects [905, 904], assembly
operator: [475],
Part II

Foundations

Chapter 5

Logic, Null, and Boolean Expressions
¬true =⇒ false
¬false =⇒ true
p∨p =⇒ p p∧p =⇒ p
p ∨ ¬p =⇒ true p ∧ ¬p =⇒ false
p1 ∨ (p1 ∧ p2 ) =⇒ p1 p1 ∧ (p1 ∨ p2 ) =⇒ p1
p ∨ false =⇒ p p ∧ true =⇒ p
p ∨ true =⇒ true p ∧ false =⇒ false
Some operations interpret NULL as a distinct value (for example when eliminating
duplicates) and use the NULL-aware equality ≐ (right-hand column of Figure 5.4).
In general, three-valued expressions are usually converted into two-valued
expressions by considering ⊥ as true-interpreted or false-interpreted (Figure 5.6).
An example of false-interpreted ⊥ values are where clauses, which must evaluate
to true to produce a tuple, while check conditions are true-interpreted.
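To make the semantics concrete, here is a Python sketch of three-valued evaluation, with None standing for ⊥ (names ours; only and, or, and not are treated):

def and3(x, y):
    # Kleene and: false dominates, unknown (None) is contagious otherwise
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True

def or3(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def not3(x):
    return None if x is None else not x

def false_interpreted(x):   # where clause: ⊥ behaves like false
    return x is True

def true_interpreted(x):    # check condition: ⊥ behaves like true
    return x is not False

print(and3(None, False), or3(None, False), not3(None))  # False None None
print(false_interpreted(None), true_interpreted(None))  # False True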
5.5 Bibliography
NULL-values: [734, 537, 735, 538]
Commutativity
p1 ∨ p2 ⇐⇒ p2 ∨ p1 p1 ∧ p2 ⇐⇒ p2 ∧ p1
∃e1 ∃e2 p ⇐⇒ ∃e2 ∃e1 p ∀e1 ∀e2 p ⇐⇒ ∀e2 ∀e1 p
Associativity
(p1 ∨ p2 ) ∨ p3 ⇐⇒ p1 ∨ (p2 ∨ p3 ) (p1 ∧ p2 ) ∧ p3 ⇐⇒ p1 ∧ (p2 ∧ p3 )
Distributivity
p1 ∨ (p2 ∧ p3 ) ⇐⇒ (p1 ∨ p2 ) ∧ (p1 ∨ p3 ) p1 ∧ (p2 ∨ p3 ) ⇐⇒ (p1 ∧ p2 ) ∨ (p1 ∧ p3 )
∃e (p1 ∨ p2 ) ⇐⇒ (∃e p1 ) ∨ (∃e p2 ) ∀e (p1 ∧ p2 ) ⇐⇒ (∀e p1 ) ∧ (∀e p2 )
Idempotency
p∨p ⇐⇒ p p∧p ⇐⇒ p
p ∨ ¬p ⇐⇒ true p ∧ ¬p ⇐⇒ false
p1 ∨ (p1 ∧ p2 ) ⇐⇒ p1 p1 ∧ (p1 ∨ p2 ) ⇐⇒ p1
p ∨ false ⇐⇒ p p ∧ true ⇐⇒ p
p ∨ true ⇐⇒ true p ∧ false ⇐⇒ false
De Morgan
¬(p1 ∨ p2 ) ⇐⇒ ¬(p1 ) ∧ ¬(p2 ) ¬(p1 ∧ p2 ) ⇐⇒ ¬(p1 ) ∨ ¬(p2 )
Negation of Quantifiers
Elimination of Negation
x       ⌈x⌉_null   ⌊x⌋_null
true    true       true
⊥       true       false
false   false      false
Chapter 6

Functional Dependencies
In many query results attribute values are not independent of each other but
have certain dependencies. Keeping track of these dependencies is very useful
for many optimizations. For example, in the query

select c.id, n.name
from customers c, nations n
where c.nid = n.id
order by c.id, n.name

the order by clause can be simplified to c.id without affecting the result:
c.id is the key of customers, and thus determines c.nid. c.nid is joined with
n.id, which is the key of nations and determines n.name; thus, transitively, c.id
determines n.name.
These functional dependencies between attributes have been studied primar-
ily in the context of database design, but many optimization steps like order
optimization (Chapter 23) and query unnesting (Chapter 14) profit greatly from
known functional dependencies. In the following we first study functional de-
pendencies when all attributes are not NULL, then extend this to attributes
with NULL values, and finally discuss how functional dependencies are affected
by relational operators.
A functional dependency A1 → A2 between attribute sets A1 and A2 holds in
a relation if all tuples that agree on A1 also agree on A2.

For base relations, functional dependencies can be derived from the schema,
in particular from key constraints and check conditions [667]. For intermediate
results, they can be derived by inference, starting from the dependencies of the
inputs and applying the Armstrong axioms:
1. A2 ⊆ A1 ⇒ A1 → A2
2. A1 → A2 ⇒ (A1 ∪ A3 ) → (A2 ∪ A3 )
3. A1 → A2 ∧ A2 → A3 ⇒ A1 → A3
The Armstrong axioms are sound and complete, i.e., it is possible to derive
all valid functional dependencies by applying these three axioms. For practical
purposes it is often convenient to include three additional rules which can be
derived from the original axioms:
4. A1 → A2 ∧ A1 → A3 ⇒ A1 → (A2 ∪ A3 )
5. A1 → (A2 ∪ A3 ) ⇒ A1 → A2 ∧ A1 → A3
6. A1 → A2 ∧ (A2 ∪ A4 ) → A3 ⇒ (A1 ∪ A4 ) → A3
Given a set of functional dependencies F, we denote with F + the closure
of F, i.e., the set of all functional dependencies that can be derived from F by
using the inference rules shown above.
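In practice, one does not materialize F+ but computes the closure of an attribute set under F; testing A1 → A2 ∈ F+ then amounts to checking A2 ⊆ closure(A1). A standard sketch in Python (names ours), applied to the customers/nations example above:

def attribute_closure(attrs, fds):
    # fds: list of (lhs, rhs) pairs of frozensets; returns all attributes
    # determined by attrs under the inference rules
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

fds = [(frozenset({'c.id'}), frozenset({'c.nid'})),
       (frozenset({'c.nid'}), frozenset({'n.id'})),   # induced by the join predicate
       (frozenset({'n.id'}), frozenset({'n.name'}))]
print(attribute_closure({'c.id'}, fds))  # contains n.name: order by can drop it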
Closely related to the concept of functional dependencies is the concept of
keys: Given a relation R and an attribute set A ⊆ A(R), A is a super key of
R if A → A(R) holds in R. Further, A is a key of R if additionally the following
minimality condition holds:

∀A′ (A′ ⊂ A ⇒ ¬(A′ → A(R))).
6.4 Bibliography
Chapter 7

An Algebra for Sets, Bags, and Sequences

Two characteristics of the algebra defined in this chapter are worth noting up front:

• All operators are polymorphic and can deal with (almost) any kind of
complex arguments.

• The algebra is redundant, since some special cases of the operators can
be implemented more efficiently.
X ∪s ∅ = X
X ∪s X = X (idempotency)
X ∪s Y = Y ∪s X (commutativity)
(X ∪s Y ) ∪s Z = X ∪s (Y ∪s Z) (associativity)
X ∩s ∅ = ∅
X ∩s X = X (idempotency)
X ∩s Y = Y ∩s X (commutativity)
(X ∩s Y ) ∩s Z = X ∩s (Y ∩s Z) (associativity)
X \s ∅ = X
∅ \s X = ∅
X \s X = ∅
X \s Y ≠ Y \s X (wrong)
(X \s Y ) \s Z ≠ X \s (Y \s Z) (wrong)
X ∩s Y = X \s (X \s Y )
X ∪s (Y ∩s Z) = (X ∪s Y ) ∩s (X ∪s Z) (distributivity)
X ∩s (Y ∪s Z) = (X ∩s Y ) ∪s (X ∩s Z) (distributivity)
(X ∪s Y ) \s Z = (X \s Z) ∪s (Y \s Z) (distributivity)
(X ∩s Y ) \s Z = (X \s Z) ∩s (Y \s Z) (distributivity)
X \s (Y ∪s Z) = (X \s Y ) ∩s (X \s Z)
X \s (Y ∩s Z) = (X \s Y ) ∪s (X \s Z)
Expressions containing the empty set can be simplified. Last but not least, some
distributivity laws hold. These and other laws for set operations (see Fig. 7.1)
should be well-known.
A set of elements from a domain D can be seen as a function from D to
{0, 1}. For a given set S, this function is called the characteristic function of
S. It can be defined as

χ_S(s) = 0 if s ∉ S, and χ_S(s) = 1 if s ∈ S.

Obviously, there is a bijection between characteristic functions and sets. That is,
sets can be characterized by their characteristic functions, and the set operations
can be expressed in terms of operations on characteristic functions.
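For a finite domain, this correspondence is easy to make concrete (a Python sketch; all names ours):

def chi(S):
    return lambda x: 1 if x in S else 0

def union_chi(cA, cB):
    return lambda x: max(cA(x), cB(x))    # characteristic function of A ∪s B

def intersect_chi(cA, cB):
    return lambda x: min(cA(x), cB(x))    # characteristic function of A ∩s B

def diff_chi(cA, cB):
    return lambda x: cA(x) * (1 - cB(x))  # characteristic function of A \s B

D = range(5)
A, B = {1, 2}, {2, 3}
cu = union_chi(chi(A), chi(B))
print({x for x in D if cu(x)})            # {1, 2, 3}
print(sum(chi(A)(x) for x in D))          # cardinality |A| = 2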
In the presence of null values, we have to be a little careful when evaluating an
expression like x ∈ S. Assume x is null and S contains some element y which
is also null. Then, we would like to have that x ∈ S and x is equal to y. Thus,
we must use ≐. Set equality can be expressed as equality of characteristic
functions. The subset relationship A ⊆ B can be expressed as χ_A(x) ≤ χ_B(x)
for all x. The cardinality |S| for a set S is defined as |S| = \sum_x χ_S(x). Because we
deal with finite sets only, cardinality is well-defined. A singleton set is a set
containing only one element, i.e., a set whose cardinality equals 1.
As we have seen in Chapter 2, algebraic equivalences that reorder algebraic
operators form the fundamental basis for query optimization. One could discuss
the reorderability of each pair of operators, resulting in n² investigations if the
number of operators in the algebra is n. In order to simplify this tedious task,
we introduce a general argument covering most of the cases. The observation
is that most operators are linear in the following sense.
A unary mapping f from sets to sets is called set-linear, if and only if for all
sets X and Y the following conditions hold:

f(∅) = ∅,
f(X ∪s Y) = f(X) ∪s f(Y).

An n-ary mapping from sets to a set is called set-linear in its i-th argument, if
and only if the analogous conditions hold in that argument for all sets
X1, . . . , Xn and Xi′ while the remaining arguments are held constant. For the
set operations, we note:

(∅ ∪s X) ≠ ∅,
(∅ ∩s X) = ∅,
(∅ \s X) = ∅,
(X \s ∅) ≠ ∅,
(X ∪s Y) ∪s Z = (X ∪s Z) ∪s (Y ∪s Z),
(X ∪s Y) ∩s Z = (X ∩s Z) ∪s (Y ∩s Z),
(X ∪s Y) \s Z = (X \s Z) ∪s (Y \s Z),
X \s (Y ∪s Z) ≠ (X \s Y) ∪s (X \s Z).
We can conclude that set union is neither left nor right set-linear, set intersection
is set-linear, and set difference is left set-linear but not right set-linear.
X ∪b ∅b = X
X ∪b X ≠ X (wrong)
X ∪b Y = Y ∪b X (commutativity)
(X ∪b Y ) ∪b Z = X ∪b (Y ∪b Z) (associativity)
X ∩b ∅b = ∅b
X ∩b X = X (idempotency)
X ∩b Y = Y ∩b X (commutativity)
(X ∩b Y ) ∩b Z = X ∩b (Y ∩b Z) (associativity)
X \b ∅b = X
∅b \b X = ∅b
X \b X = ∅b
X \b Y ≠ Y \b X (wrong)
(X \b Y ) \b Z ≠ X \b (Y \b Z) (wrong)
X ∩b Y = X \b (X \b Y )
X ∪b (Y ∩b Z) = (X ∪b Y ) ∩b (X ∪b Z) (distributivity)
X ∩b (Y ∪b Z) ≠ (X ∩b Y ) ∪b (X ∩b Z) (wrong)
(X ∪b Y ) \b Z ≠ (X \b Z) ∪b (Y \b Z) (wrong)
(X ∩b Y ) \b Z = (X \b Z) ∩b (Y \b Z) (distributivity)
X \b (Y ∪b Z) ≠ (X \b Y ) ∩b (X \b Z) (wrong)
X \b (Y ∩b Z) ≠ (X \b Y ) ∪b (X \b Z) (wrong)
The laws for sets do not necessarily hold for bags (see Figure 7.2). We have
that bag union and bag intersection are both commutative and associative. Bag
difference is neither. Let us take a closer look at the different distributivity laws.
Therefore, denote by LHS the left-hand side of an equivalence and by RHS its
right-hand side. Let us first prove

X ∪b (Y ∩b Z) = (X ∪b Y ) ∩b (X ∪b Z).

In terms of multiplicities, χ_LHS(x) = χ_X(x) + min(χ_Y(x), χ_Z(x)) =
min(χ_X(x) + χ_Y(x), χ_X(x) + χ_Z(x)) = χ_RHS(x), so the law holds. The
distributivity of ∩b over ∪b fails, however: for X = {1⁵}b and Y = Z = {1³}b,
we calculate

X ∩b (Y ∪b Z) = {1⁵}b ∩b {1⁶}b = {1⁵}b,

but

(X ∩b Y ) ∪b (X ∩b Z) = {1³}b ∪b {1³}b = {1⁶}b.

For the bags X = {1⁵}b, Y = {1³}b, and Z = {1²}b, we calculate

(X ∪b Y ) \b Z = {1⁸}b \b {1²}b = {1⁶}b,

but

(X \b Z) ∪b (Y \b Z) = {1³}b ∪b {1¹}b = {1⁴}b.
Consider

(X ∩b Y ) \b Z = (X \b Z) ∩b (Y \b Z).

This holds, since for the multiplicities max(0, min(x, y) − z) =
min(max(0, x − z), max(0, y − z)) for all non-negative integers x, y, z. Finally,
let X = {1²}b and Y = Z = {1¹}b. Then

X \b (Y ∪b Z) = {1²}b \b {1²}b = ∅b,

but

(X \b Y ) ∩b (X \b Z) = {1¹}b ∩b {1¹}b = {1¹}b,

and

X \b (Y ∩b Z) = {1²}b \b {1¹}b = {1¹}b,

but

(X \b Y ) ∪b (X \b Z) = {1¹}b ∪b {1¹}b = {1²}b.
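Bags can be modelled by multiplicity-valued characteristic functions; Python's Counter implements exactly the operations used above, so the laws of Figure 7.2 and the counterexamples can be checked mechanically (a sketch, names ours):

from collections import Counter

def b_union(X, Y):      # ∪b: multiplicities add (the semantics of union all)
    return X + Y

def b_intersect(X, Y):  # ∩b: multiplicities are min'ed
    return X & Y

def b_diff(X, Y):       # \b: multiplicities subtract, clipped at 0
    return X - Y

def b_union_max(X, Y):  # ∪max: the standard set-theoretic bag union
    return X | Y

X, Y, Z = Counter({1: 5}), Counter({1: 3}), Counter({1: 2})
print(b_diff(b_union(X, Y), Z) == b_union(b_diff(X, Z), b_diff(Y, Z)))  # False
print(b_union_max(X, Y) == b_union(b_diff(X, Y), Y))                    # True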
Remark. Our definition of bag union is not the usual definition. The
standard set theoretic definition of the bag union operator ∪max is defined such
that
χX∪max Y (x) = max(χX (x), χY (x))
holds [21, 217]. With this definition, the laws for sets carry over to bags. We
decided to use the non-standard definition, since this is the semantics of bag
union in SQL and other query languages. Dayal, Goodman, and Katz [217] and
Albert [21] also investigate the non-standard bag union in their papers, although
under a different name. For example, Albert calls it bag concatenation. As a side
remark, it is interesting to note that Albert showed that bag concatenation cannot
be expressed using ∪max, ∩b, \b [21]. Thus, any query language featuring ∪b
is strictly more expressive, since ∪max can be expressed using \b and ∪b because
the equivalence
X ∪max Y ≡ (X \b Y ) ∪b Y
holds. Two other laws involving ∪max are
X ∪max Y ≡ (X ∪b Y ) \b (X ∩b Y ),
X ∩b Y ≡ (X ∪b Y ) \b (X ∪max Y ).
We call a unary function f on bags set-faithful if and only if

Ī⁻¹(f(Ī(X))) = f(X)

holds for all sets X. Analogously, we call a binary function g set-faithful if and
only if

Ī⁻¹(g(Ī(X), Ī(Y))) = g(X, Y)

holds for all sets X and Y.
\b and ∩b are set-faithful. Hence, we can (and often will) simply use \ and
∩ to denote bag difference and intersection. If the arguments happen to be sets,
the resulting bag will not contain any duplicates, i.e., it is a set.
Note that ∪b is not set-faithful. One possibility is to carefully distinguish
between ∪b and ∪s . However, this does not solve our problem for query process-
ing. A relation can be a set (e.g. if a primary key is defined) or a bag. Assume
we have two relations (or intermediate results) R1 , which is a set, and R2 , which
is a bag. Obviously, R1 ∪s R2 is not valid since R2 is a bag. By treating sets
as special bags, R1 ∪b R2 is valid. However, we cannot control duplicates in the
result as demanded by SQL, where there is a fundamental difference between
union all and union distinct. We could thus use two different union oper-
ators. Both take bags as input but one preserves duplicates, as does the bag
union, and the other eliminates duplicates. Let us denote the former by ∪ and
the latter by ∪d .
To go from a bag to a set, we have to eliminate duplicates. Let us denote
by ΠD the duplicate elimination operation. For a given bag B, we then have
χΠD (B) (z) = min(1, χB (z)). Using ΠD , we can define ∪d as
R1 ∪d R2 := ΠD (R1 ∪ R2 ).
However, the right-hand side is our preferred way to take care of duplicate
handling: we will always use the bag operator, denoted by ∪ and then, if
necessary, eliminate duplicates explicitly.
Summarizing, instead of working with sets and bags, we can work with bags
only by identifying every set S with the bag Ī(S). To keep track of (possible)
duplicates, we can annotate all bags with a property indicating whether it
duplicates, we can annotate all bags with a property indicating whether it
contains duplicates or not. If at some place a set is required and we cannot
infer that the bag in that place is duplicate free, we can use ΠD as an enforcer
of the set property. Note that for every set S we have ΠD (S) = S. Hence, ΠD
does not do any harm except for the resources it takes. The reasoning whether
a given expression produces duplicates or not is very important. Below, we will
indicate on the fly how reasoning about duplicates can be performed.
The first element of a sequence S can be accessed via its characteristic function:
α(S) = χ_S(0) denotes the head of the sequence. For our example sequence,
α(⟨a, b, b, c, b⟩) = a. The rest or tail of a sequence S of length n is denoted by
τ(S) and contains all but the first element of the sequence. That is,
χ_{τ(S)}(i) = χ_S(i + 1). For our example sequence, τ(⟨a, b, b, c, b⟩) = ⟨b, b, c, b⟩.

Concatenation of two sequences is denoted by ⊕. The characteristic function
of the concatenation of two sequences S and T is

χ_{S⊕T}(i) = χ_S(i) if i < |S|, and χ_{S⊕T}(i) = χ_T(i − |S|) if i ≥ |S|.
Analogously to set-linearity, a unary mapping f from sequences to sequences is
called sequence-linear, if and only if the following conditions hold:

f(⟨⟩) = ⟨⟩,
f(X ⊕ Y) = f(X) ⊕ f(Y).
In SQL, sum(a) is equivalent to sum(all a). From this it follows that aggregation
functions can be applied to sets or bags. Other query languages (OQL and XQuery)
also allow lists as arguments to aggregation functions. Additionally, OQL allows
arrays. Hence, aggregation functions should be defined for any bulk type.
Most query languages provide a special null value. In SQL it is called NULL.
Initially, OQL did not have a special null value. Fortunately, it was introduced
in version 3.0. There, the null value is called UNKNOWN. So far, XQuery has
no null value. Instead, the inventors of XQuery tried hard to let the empty
sequence play a dual role: that of an empty sequence and that of a null value.
Of course, this leads to awkward complications. We will use ’-’, ⊥, or NULL to
represent a null value. From this variance, the reader can already imagine its
importance.
Typically, aggregation functions can safely ignore null values. The only ex-
ception is count(*), where all input elements are counted. If for some attribute
a, we want to count only values of a with a 6= ⊥, then we often use countNN (a)
to emphasize this fact. The corresponding SQL function is count(a).
Let x be a single value and {x} a bag containing x only once. Since

min({x}) = x,   max({x}) = x,   sum({x}) = x,   avg({x}) = x,

a scalar can be treated as a singleton bag, and we obtain

min(min(X)) = min(X),   max(max(X)) = max(X),
sum(sum(X)) = sum(X),   avg(avg(X)) = avg(X).
Consider a scalar aggregation function agg : {τ}b → N. We call agg decomposable
if there exist aggregation functions agg¹ : {τ}b → N′ and agg² : {N′}b → N
with

agg(Z) = agg²({agg¹(X), agg¹(Y)}b)

for all X and Y (not empty) with Z = X ∪ Y. This condition assures that
agg(Z) can be computed on arbitrary subsets (-lists, -bags) of Z independently
and the (partial) results can be aggregated to yield the correct total result. If
the condition holds, we say that agg is decomposable with inner agg¹ and outer
agg².
A decomposable scalar aggregation function agg : {τ}b → N with inner agg^I
and outer agg^O, where agg(X) = γ(agg^I(X)) for a finalizing function
γ : N′ → N, is called reversible if for agg^O there exists a function
(agg^O)⁻¹ : N′, N′ → N′ with

agg(X) = γ((agg^O)⁻¹(agg^I(Z), agg^I(Y)))

for all X and Y with Z = X ∪ Y. For avg, for example, we can choose

agg^I(X) = [sum : sum(X.a), count : |X|],
agg^O([sum : s1, count : c1], [sum : s2, count : c2]) = [sum : s1 + s2, count : c1 + c2],
(agg^O)⁻¹([sum : s1, count : c1], [sum : s2, count : c2]) = [sum : s1 − s2, count : c1 − c2],
γ([sum : s, count : c]) = s/c.

Here, sum(X.a) denotes the sum of all values of attribute a of the tuples in X,
and |X| denotes the cardinality of X. Note that agg^I(∅) = [sum : 0, count : 0],
and γ([sum : 0, count : 0]) is undefined as is avg(∅).
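A Python sketch of the avg example (all names ours):

def agg_inner(xs):
    return (sum(xs), len(xs))

def agg_outer(a, b):                 # combines two partial results
    return (a[0] + b[0], a[1] + b[1])

def agg_outer_inv(a, b):             # 'subtracts' a partial result again
    return (a[0] - b[0], a[1] - b[1])

def gamma(a):                        # finalizes: sum / count
    return a[0] / a[1]

X, Y = [1, 2, 3], [4, 5]
Z = X + Y
# decomposable: avg(Z) from the partial results of X and Y
assert gamma(agg_outer(agg_inner(X), agg_inner(Y))) == sum(Z) / len(Z)
# reversible: avg(X) from the results of Z and Y
assert gamma(agg_outer_inv(agg_inner(Z), agg_inner(Y))) == sum(X) / len(X)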
In statistics, the variance of a bag of numbers is often calculated. For a bag
B with |B| = n, it is defined as

s² = (1/(n−1)) \sum_{x∈B} (x − x̄)²,

where x̄ is the average of the values in B, i.e., x̄ = (1/n) \sum_{x∈B} x. As an
exercise, the reader should show that variance is decomposable and reversible.
Not all aggregation functions are decomposable and reversible. For instance,
min and max are decomposable but not reversible. If an aggregation function
is applied to a bag that has to be converted to a set, then decomposabili-
ty is jeopardized for sum and count. That is, in SQL sum(distinct) and
count(distinct) are not decomposable.
F^{1,1} = F¹,
F^{1,2} = F²,
F^{2,1} = F²,
F^{2,2} = F².
Yan and Larson used the term Class C aggregation function for duplicate sensi-
tive aggregation functions and Class D for duplicate agnostic aggregation func-
tions [933].
Finally, note that for all aggregate functions except count(∗), we have
agg({a}) = a for arbitrary elements a. Thus, if we are sure that we deal
with only one tuple, we can apply the following rewrite. Let ai and bi be at-
tributes. Then, if F = (b1 : agg1 (a1 ), . . . , bm : aggm (am )), we define F̂ = (b1 :
a1 , . . . , bm : am ).
7.3 Operators
The bag operators as well as other typical operators like selection and join are
well-known. As we will see, the only difference in the definitions used here
is that they are extended to express nested queries. In order to enable this,
we allow the subscripts (predicates, expressions) of these operators to contain
algebraic expressions.
In this section, we define all our operators on bags. Besides duplicate elim-
ination, only projection will have explicit control over duplicates.
Sometimes, the left outerjoin needs some additional tuning. The standard
definition of the left outerjoin demands that if some tuple from its left argument
does not have a join partner in its right argument, the attributes from the right
argument are given null values. We extend the left outerjoin such that values
other than null can be given to attributes of the right hand side. Similarily,
the full outerjoin will be extended to carry two superscripts for this kind of
defaults.
The d-join operation is used for performing a join between two bag valued
items, where the second one is dependent on the first one. One use is to express
queries with table functions (see Sec. 4.10). Another is to access index structures
(see Sec. 4.14). The d-join can also be used to unnest nested queries. It is often
equivalent to a join between two bags with a membership predicate [792]. In
some cases, it corresponds to an unnest operation.
The map operator χ ([479]) is well-known from the functional programming
language context. A special case of it, where it adds derived information in
form of an added attribute with an according value (e.g. by object-base lookup
or by method calls) to each tuple of a bag has been proposed in [478, 479].
Later, this special case was given the name materialization operator [91].
The unnest operator is known from NF2 [753, 733]. It will come in two dif-
ferent flavors allowing us to perform unnesting not only on nested relations but
also on attributes whose value is a bag of elements which are not tuples. The
reverse operator is the nest operator, which can be generalized to a grouping
operator. In our algebra, there exist two grouping operators: one unary group-
ing operator and one binary grouping operator (called groupjoin). The unary
grouping operator groups one bag of tuples according to a grouping condition.
Further, it can apply an arbitrary expression to the newly formed group. The
groupjoin adds a group to each element in the first argument bag. This group
is formed from the second argument. The groupjoin will exploit the fact that
in the object-oriented context objects can have bag-valued attributes. As we
will see, this is useful for both unnesting nested queries and producing nested
results. We will even use nesting as a useful tool for processing SQL queries.
7.3.1 Preliminaries
As already mentioned, our algebraic operators not only deal with standard
relations but are polymorphic in the general sense. In order to fix the domain
of the operators, we need some technical abbreviations and notations. Let us
introduce these first.
Since our operators are polymorphic, we need variables for types. We use τ
possibly with a subscript to denote types. To express that a certain expression
e is of type τ, we write e :: τ. Starting from concrete names for types and
type variables, we can build type expressions the standard way by using type
constructors to build tuple types ([·]), set types {·}s , bag types {·}b and sequence
types < · >. Having two type expressions t1 and t2 , we denote by t1 ≤ t2 that
t1 is a subtype of t2 . It is important to note that this subtype relationship is not
based on the sub-/superclass hierarchy found in most object-oriented models.
Instead, it simply denotes substitutability. That is, type t1 provides at least all
• For an expression e with only one free variable x, we define e(t) = e[x ← t].
The mechanism is very much like the standard binding for the relational algebra.
Consider for example a select operation σa=3 (R). Then we assume that a, the
free variable of the subscript expression a = 3, is bound to the value of the
attribute a of the tuples of the relation R. To express this binding explicitly,
we would write (a = 3)(t) for a tuple t ∈ R. Since a is an attribute of R and
hence of t, by our convention a is replaced by t.a, the value of attribute a of
tuple t. Since we want to avoid name conflicts right away, we assume that all
variable/attribute names used in a query are distinct. This can be achieved in
a renaming step. Typically, renaming takes place during the NFST phase.
Application of a function f to arguments ei is denoted by either regular
(e.g., f (e1 , . . . , en )) or dot (e.g., e1 .f (e2 , . . . , en )) notation. The dot notation is
used for type-associated methods occurring in the object-oriented context.
Last, we introduce the heavily overloaded symbol ◦. It denotes function
concatenation and (as a special case) tuple concatenation as well as the con-
catenation of tuple types to yield a tuple type containing the union of the
attributes of the two argument tuple types.
² e[v1 ← e1, . . . , vn ← en] denotes a substitution of the variables vi by the expressions ei
within an expression e.
Very often, we are given some database item which is a bag of other items.
Binding these to variables or, equivalently, embedding the items into a tuple,
we use the notation e[x] for an expression e and a variable/attribute name x.
For bag-valued expressions e, e[x] is defined as e[x] = {[x : y] | y ∈ e}b. For
sequence-valued expressions e, we define e[a] = ⟨⟩ if e is empty and
e[a] = ⟨[a : α(e)]⟩ ⊕ τ(e)[a] otherwise.
7.3.2 Signatures
We are now ready to define the signatures of the operators of our algebra.
Their semantics is defined in a subsequent step. Remember that we consider all
operators as being polymorphic. Hence, their signatures are polymorphic and
contain type variables, denoted by τ , often with an index. As mentioned before,
we define all operators on bags. Let us start by typing our bag operators
∪ : {τ}b, {τ}b → {τ}b,
∩ : {τ}b, {τ}b → {τ}b,
\ : {τ}b, {τ}b → {τ}b,
Π^D : {τ}b → {τ}b.
The unary operators we use have the following signatures, where B denotes
the type of Boolean values:
Π_A : {τ}b → {τ′}b
    if τ ≤ τ′ = [a1 : τ1, . . . , an : τn], A = {a1, . . . , an},
Π^D_A : {τ}b → {τ′}b
    if τ ≤ τ′ = [a1 : τ1, . . . , an : τn], A = {a1, . . . , an},
σ_p : {τ}b → {τ}b
    if p : τ → B,
χ_f : {τ1}b → {τ2}b
    if f : τ1 → τ2,
χ_{a:f} : {τ1}b → {τ1 ◦ [a : τ2]}b
    if f : τ1 → τ2,
Γ_{θG;g:f} : {τ1 ◦ τ2}b → {τ1 ◦ [g : τ′]}b
    if τi ≤ [], f : {τ2}b → τ′, G = A(τ1),
ν_{G;g} : {τ1 ◦ τ2}b → {τ1 ◦ [g : {τ2}b]}b
    if τi ≤ [], G = A(τ1),
µ_g : {τ}b → {τ′}b
    if τ = [a1 : τ1, . . . , an : τn, g : {τ0}b], τ0 ≤ [],
    τ′ = [a1 : τ1, . . . , an : τn] ◦ τ0,
µ_{g;c} : {τ}b → {τ′}b
    if τ = [a1 : τ1, . . . , an : τn, g : {τ0}b],
    τ′ = [a1 : τ1, . . . , an : τn] ◦ [c : τ0],
flatten : {{τ}b}b → {τ}b.
max_{g;m;f} : {τ}b → [m : τa, g : τf]
    if τ ≤ [a : τa], f : {τa}b → τf.
7.3.3 Projection
Let A = {a1, . . . , an} be a set of attributes. We define the two projection operators
e1 := R1        e2 := R2
a1              a2  b2
1               1   2
2               1   3
3               2   4
                2   5

e3 := Γ_{a2;g:id}(e2)
a2  g
1   {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3]}b
2   {[a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b

e4 := χ_{g:σ_{a1=a2}(e2)}(e1)
a1  g
1   {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3]}b
2   {[a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b
3   ∅b

e5 := e1 Z_{a1=a2;g:id} e2
a1  g
1   {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3]}b
2   {[a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b
3   ∅b

e6 := e1 ⟕_{a1=a2} e3
a1  a2  g
1   1   {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3]}b
2   2   {[a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b
3   -   -
7.3.4 Selection
Note that in the following definition there is no restriction on the selection
predicate. It may contain path expressions, method calls, nested algebraic
operators, etc.
7.3.5 Map
The map operator is of fundamental importance to the algebra. It comes in two
flavors. The first one extends a given input tuple by an attribute and assigns
a value to this new attribute. This variant is also called materialize operator
[478, 91]. The second one produces for each input element an output element
by applying a function to it. This corresponds to the standard map as defined
in, e.g., [479]. The latter is able to express the former. The two variants of the
map operator are defined as follows:
We can generalize the last variant to calculate values for many attributes. Given
an attribute assignment vector a1 : e1 , . . . , ak : ek , we define
χa1 :e1 ,...,ak :ek (e) := χak :ek (. . . χa1 :e1 (e) . . .).
If we demand that ai 6∈ A(e), then the ai are new attributes. Then, the ma-
terialize operator and its special single-attribute case χa:e are called extending,
because it extends a given input tuple with new attributes while it does not
modify the values of the input attributes. Many equivalences only hold for this
specialization of the map operator, which, at the same time, is the predominant
variant used. In fact, it is sufficient for SQL. An example of an extending map
operator can be found in Fig. 7.4.
Note that the map operator for the object-oriented and object-relational
context obviates the need for a relational projection. Sometimes the map oper-
ator is equivalent to a renaming. In this case, we will use ρ instead of χ. Let
A = {a1 , . . . , an } and B = {b1 , . . . , bn } be two sets with n attributes each. We
then define
Γ_{θG;g:f}(e) := {y ◦ [g : x] | y ∈ Π^D_G(e), x = f({z | z ∈ e, z.G θ y.G}b)}s
We also introduce two variants of the grouping operator, which can be used
to abbreviate small expressions. Let F = (b1 : e1, . . . , bk : ek) with F(ei) = {g}
for all i = 1, . . . , k. Then we define

F_g = (b1 : agg_1(g.a1), . . . , bk : agg_k(g.ak))
If the bag-valued attribute is not stored explicitly but derived by the evaluation
of an expression, we use the unnest map operator to unnest it:
The motivation for the unnest map operator is that it saves the explicit mate-
rialization of the result of the evaluation of the expression e2 .
The results of µg (e) and µa:g are duplicate-free, if and only if the following
two conditions hold.
1. The input e is duplicate-free.
The flatten operator’s result is duplicate-free if and only if the bags it contains
are duplicate-free and they have a pairwise empty intersection. Thus, explicit
duplicate control is very much in order.
e1 A e2 := {y ◦ x|y ∈ e1 , x ∈ e2 }b ,
e1 Bp e2 := {y ◦ x|y ∈ e1 , x ∈ e2 , p(y, x)}b ,
e1 Np e2 := {y|y ∈ e1 , ∃x ∈ e2 , p(y, x)}b ,
e1 Tp e2 := {y|y ∈ e1 , ¬∃x ∈ e2 p(y, x)}b ,
e1 Ep e2 := (e1 Bp e2 ) ∪ ((e1 Tp e2 ) A {⊥A(e2 ) }),
e1 Kp e2 := (e1 Bp e2 )
∪((e1 Tp e2 ) A {⊥A(e2 ) })
∪({⊥A(e1 ) } A (e2 Tp e1 )).
An example for the left outerjoin can be found in Fig. 7.4. More examples for
join, left outerjoin, and full outerjoin can be found in Fig. 7.6 for the predicate
0 := (b =.
qij := (bi = bj ) and in Fig. 7.7 for the predicate qij i bj ) .
Regular joins were already present in Codd’s original proposal of a relational
algebra [192]. Outerjoins were invented by Lacroix and Pirotte [516].
The next join operator to come is called dependency join, or d-join, and is
denoted by C. It is a join between two bags, where the evaluation of the second
bag may depend on the first bag. The filled triangle thus shows the direction
into which information has to flow in order to evaluate the d-join. It is used to
translate from clauses containing table functions with parameters (see Sec. 4.10
for an example) and lateral derived tables into the algebra. Whenever possible,
d-joins will be rewritten into standard joins. The definition of the d-join is
e1 C e2 := {y ◦ x|y ∈ e1 , x ∈ e2 (y)}b .
join partner. Let Di = di1 : ci1 , . . . , dik : cik (i = 1, 2) be two vectors assigning
constants cij to attributes dij . We then define
2
e1 ED
p e2 := (e1 Bp e2 )
∪((e1 Tp e2 ) A {⊥A(e2 )\A(D2 ) ◦ [D2 ]},
1 ;D 2
e1 KD
p e2 := (e1 Bp e2 )
∪((e1 Tp e2 ) A {⊥A(e2 )\A(D2 ) ◦ [D2 ]}),
∪((e2 Tp e1 ) A {⊥A(e1 )\A(D1 ) ◦ [D1 ]}),
e1 Bq e2 ≡ σq (e1 A e2 ).
e1 Nq e2 ≡ σp (e1 ).
e1 Tq e2 ≡ σq (e1 ).
The outerjoins were already defined using these three operators, which in turn
can be expressed using only selection and cross product.
We observe that:
• The results of cross product, (regular) join, left outerjoin and full outerjoin
are duplicate-free if and only if both of their inputs are duplicate-free.
• The results of a semi- and an antijoin are duplicate-free if and only if their
left-input is duplicate-free.
7.3.10 Groupjoin
The second grouping operator — called groupjoin or binary grouping — is de-
fined on two input bags. It is more than 20 years old, but there is still no
common name for it. It was first introduced by von Bültzingsloewen [888, 889]
under the name of outer aggregation. Nakano calls the same operator gen-
eral aggregate formation [626], since unary grouping is called aggregate for-
mation by Klug [494]. Steenhagen, Apers, Blanken, and de By call a vari-
ant of the groupjoin nest-join [820]. The groupjoin is quite versatile and we
strongly believe that no DBMS can do without it. For example, it has been
7.3. OPERATORS 227
Similar to unary grouping, we will use Zq;g;F to abbreviate Πg (χF (e1 Zq;g:id
e2 )), and Zq;F to abbreviate ZA;g;F . In both cases, F must be an aggregation
vector with F(F ) = {g}. An SQL notation variant of the groupjoin is defined
as e1 Zq;F e2 := e1 Zq;Fg e2 , where the requirements for F and Fg are the same
as for unary grouping.
Since the reader is most likely not familiar with groupjoin, let us give some
remarks and pointers on its implementation. Obviously, implementation tech-
niques for the equijoin and the nest operator can be used if θ stands for equality.
For the other cases, implementations based on sorting seem promising. One
could also consider implementation techniques for non-equi joins, e.g., those
developed for the band-width join [235]. An alternative is to use θ-tables,
which were developed for efficient aggregate processing [188]. Implementation
techniques for groupjoin have also been discussed in [143, 587].
Note that the groupjoin produces a duplicate-free result if and only if its
left input is duplicate-free. It is thus set-faithful.
The max operator successively performs three tasks. First, it calculates the
maximum (m) of all elements contained in e.a for some attribute a ∈ A(e).
Second, it uses this maximum (m) to select exactly those elements t from e such
that t.a = m, i.e., their a value is maximal. Third, these maximizing elements
t from e are collected into a bag and the result of applying the function f to it
is stored as the value for the attribute g. In a real implementation, at least the
first two phases will be merged. Thus, max requires only a single scan over e.
228 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
where J = F(e2 ) ∩ A(e1 ) and q̂ = ((J = J 0 ) ∧ q). The results of the d-semijoin
and d-antijoin are duplicate-free if and only if their left argument is.
We define the left outer d-join analogously to the left outerjoin:
Let us expand this definition. With E⊥2 = {⊥A(e2 ) }, J = F(e2 ) ∩ A(e1 ), and
q̂ = ((J = J 0 ) ∧ q), we then have
The result of a left outer d-join is duplicate-free if its left input and eb2 are.
Defining a full outer d-join does not make much sense. The third part of
the expression
is not even evaluable, since e2 can only be evaluated in the context of bindings
derived from e1 . One might be tempted to use eb2 such that the problematic
part becomes
(eb2 Tq e1 ) A {⊥A(e1 ) }).
However, we abandon this possibility.
The situation is less complicated for the dependent groupjoin. We can define
it as
e1 [q;g:f e2 := e1 Zqb;g:f eb2 . (7.4)
We leave it as an exercise to the reader to show that
g:f (∅)
e1 [q;g:f e2 ≡ e1 EJ=J 0 Γq;g:f (eb2 ), (7.5)
f (∅b ) = ∅b ,
f (X ∪b Y ) = f (X) ∪b f (Y ).
230 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
An n-ary mapping from bags to a bag is called strongly linear in its i-th ar-
gument if and only if for all bags X1 , . . . , Xn and Xi0 the following conditions
hold:
(∅b ∪b X) 6= ∅b ,
(∅b ∩b X) = ∅b ,
(∅b \b X) = ∅b ,
(X \b ∅b ) 6= ∅b
and
(X ∪b Y ) ∪b Z 6= (X ∪b Z) ∪b (Y ∪b Z),
(X ∪b Y ) ∩b Z 6= (X ∩b Z) ∪b (Y ∩b Z),
(X ∪b Y ) \b Z 6= (X \b Z) ∪b (Y \b Z),
X \b (Y ∪b Z) 6= (X \b Y ) ∪b (X \b Z),
we can conclude that bag union is neither strongly left nor strongly right linear,
bag intersection is neither strongly left nor strongly right bag-linear, and bag
difference is neither strongly left nor strongly right linear.
We can relax the definition of strongly linear by the additional assumption
that the intersection of the two unioned bags is empty. A unary function f from
bags to bags is called weakly linear if and only if the following two conditions
hold for all bags X and Y with X ∩b Y = ∅b :
f (∅b ) = ∅b ,
f (X ∪b Y ) = f (X) ∪b f (Y ).
An n-ary mapping from bags to a bag is called weakly linear in its i-th argument
if and only if for all bags X1 , . . . , Xn and Xi0 with Xi ∩b Xi0 = ∅b the following
conditions hold:
It is called weakly linear , if it is weakly linear in all its arguments. For a binary
function or operator where we can distinguish between the left and the right
7.4. LINEARITY OF ALGEBRAIC OPERATORS 231
unary binary
operator linear operator left lin. right lin.
ΠD ◦ ∪ - -
ΠA + ∩ ◦ ◦
ΠDA ◦ \ ◦ -
σp + A + +
χa:e + Bp + +
χf + Np + -
ΓθG;F - Tp + -
νG;g - Ep + -
µg + Kp - -
µa:g + Zp;F + -
Υf + C + does not apply
Υa:f +
flatten +
argument, we call it weakly left (right) linear if it is weakly linear in its first
(second) argument.
Using the commutativity of bag union and bag intersection as well as the
observations that in general
(∅b ∪b X) 6= ∅b ,
(∅b ∩b X) = ∅b ,
(∅b \b X) = ∅b ,
(X \b ∅b ) 6= ∅b
and
(X ∪b Y ) ∪b Z 6= (X ∪b Z) ∪b (Y ∪b Z),
(X ∪b Y ) ∩b Z = (X ∩b Z) ∪b (Y ∩b Z),
(X ∪b Y ) \b Z = (X \b Z) ∪b (Y \b Z),
Z \b (X ∪b Y ) 6= (Z \b X) ∪b (Z \b Y )
for X∩b Y = ∅b , we can conclude that bag union is neither weakly left nor weakly
right linear, bag intersection is weakly linear, and bag difference is weakly left
but not weakly right linear.
For the whole algebra, Table 7.1 summarizes the linearity properties for all
of our algebraic operators. Thereby, a ’+’ denotes strong linearity, ’◦’ denotes
weak linearity, and ’-’ denotes neither of them.
Let us take a closer look at the gap between weak and strong linearity. For
some bag B, define the unary function f on bags such that
3 if x ∈ B,
χf (B) (x) =
0 if x 6∈ B
232 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
holds. Then, f is weakly linear but not strongly linear. The problem is that
f manipulates the multiplicity of the elements. We can make the difference
between weakly and strongly linear explicit. Therefore, we remember that
the only difference in the definition was the disjointness we required for weak
linearity. Consequently, we consider now the special case of bags containing
a single element multiple times. We say that a unary function f is duplicate
faithful if and only if for all x
f ({xm }b ) = ∪m
i=1 f ({x}b )
holds. Then, a unary function is strongly bag linear if and only if it is weakly
bag linear and duplicate faithful. The same holds for n-ary functions if we
extend the property duplicate faithful to multiple arguments.
To see that the left semijoin is not even weakly right linear, consider the
following example:
{[a : 1]}b = {[a : 1]}b Na=b {[b : 1, c : 1], [b : 1, c : 2]}b
= {[a : 1]}b Na=b ({[b : 1, c : 1]}b ∪ {[b : 1, c : 2]}b )
6= ({[a : 1]}b Na=b {[b : 1, c : 1]}b ) ∪ ({[a : 1]}b Na=b {[b : 1, c : 2]}b )
= {[a : 1]2 }b .
This is the reason why some equivalences valid for sets do not hold for bags
anymore. For example, ΠA(e1 ) (e1 Bq12 e2 ) ≡ e1 Nq12 e2 holds for sets but not for
bags. If we eliminate duplicates explicitly, we still have
ΠD D
A(e1 ) (e1 Bq12 e2 ) ≡ ΠA(e1 ) (e1 Nq12 e2 ). (7.6)
Similiarily, we have
ΠD D
A(e1 ) (e1 Eq12 e2 ) ≡ ΠA(e1 ) (e1 ), (7.7)
ΠD D
A(e1 ) (e1 Kq12 e2 ) ≡ ΠA(e1 ) (e1 ). (7.8)
Let us now present some sample proofs of linearity. All proofs are by induc-
tion on the number of distinct elements contained in the argument bags.
χf is strongly linear.
χf (∅b ) = ∅b
χf ({xm }b ) = ∪m
i=1 f ({x}b )
χf (e1 ∪ e2 ) = {f (x)|x ∈ e1 ∪ e2 }b
= {f (x)|x ∈ e1 }b ∪ {f (x)|x ∈ e2 }b
= χf (e1 ) ∪ χf (e2 )
∅b C e2 = ∅b
{x }b C e2 = ∪m
m
i=1 ({x}b C e2 )
(e01 ∪ e001 ) C e2 = {y ◦ x|y ∈ e01 ∪ e001 , x ∈ e2 (y)}b
= {y ◦ x|y ∈ e01 , x ∈ e2 (y)}b ∪ {y ◦ x|y ∈ e001 , x ∈ e2 (y)}b
= (e01 C e2 ) ∪ (e001 C e2 )
Note that the notion of linearity cannot be applied to the second (inner)
argument of the d-join, since, in general, it cannot be evaluated indepen-
dently of the first argument.
µg is strongly linear.
µg (∅b ) = ∅b
µg ({xm }b ) = ∪m
i=1 (µg ({x}b )
µg (e1 ∪ e2 ) = {x.[g] ◦ y|x ∈ e1 ∪ e2 , y ∈ x.g}b
= {x.[g] ◦ y|x ∈ e1 , y ∈ x.g}b ∪ {x.[g] ◦ y|x ∈ e2 , y ∈ x.g}b
= µg (e1 ) ∪ µg (e2 )
flatten(∅b ) = ∅b
flatten({xm }b ) = ∪m
i=1 (flatten(x))
flatten(e1 ∪ e2 ) = {x|y ∈ e1 ∪ e2 , x ∈ y}b
= {x|y ∈ e1 , x ∈ y}b ∪ {x|y ∈ e2 , x ∈ y}b
= flatten(e1 ) ∪ flatten(e2 )
Note that the notion of linearity does not apply to the max operator, since it
does not return a bag.
234 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
unary binary
operator produced deleted operator produced deleted
ΠD ∅ ∅ ∪ ∅ ∅
ΠA ∅ A ∩ ∅ ∅
ΠDA ∅ A \ ∅ A(e2 )
σp ∅ ∅ A ∅ ∅
χa:e {a} ∅ Bq ∅ ∅
ΓθG;F A(F ) G Nq ∅ A(e2 )
νG;g {g} G Tq ∅ A(e2 )
µg A(g) {g} Eq A(e2 ) ∅
µa:g {a} {g} Kq A(e1 ) ∪ A(e2 ) ∅
Υa:f {a} ∅ Zq;g:F {g} ∅
7.5 Representations
7.5.1 Three Different Representations
In this section, we discuss different representations for sets, bags, and sequences.
Let us start with bags. Fig. 7.5 shows three different representations for bags.
236 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
R R
R
A B A B i
A B m
1 1 1 1 1
1 1 2
1 1 1 1 2
2 2 1
2 2 2 2 3
the order. One way to remedy this situation is to keep not only the multiplicity
of an element but its positions. This results in
or in
h[a : 1, p : {1, 3}s ], [a : 2, p : {2}s ]i
if we represent the set of positions at which a tuple occurs in an extra attribute
p. We call this a position-based representation. It is duplicate free and in case
of multiple duplicates in the original sequence, it saves some memory.
Doing the same exercise with the regular join operator results in the so-called
counting join [931].
Luckily, it is not necessary to introduce a special counting cross product, as
can be seen from
The result is
where we assumed that the implicit order in which the pregrouping operator
sees the tuples is from left to right. Calculating Γa;g;m:sum(m) gives with
holds. If the grouping operator is pushed into a join or any other binary op-
erator and still some outer grouping is present (see Sec. 7.11), then the inner
grouping can be replaced by a pregrouping. General partial pregrouping or
preaggregation is discussed in several papers [419, 526]. They also discuss the
expected resulting number of tuples of partial pregrouping.
bindings for all variables in E1 and E2 , if E10 and E20 , which result from applying
these bindings to them, are well-typed, then the evaluation of E10 and E20 yields
the same result.
Equivalence of relational expressions is discussed, e.g., by Aho, Saviv, and
Ullman [19, 18]. In fact, they discuss weak and strong equivalence. We use
strong equivalence. Weak equivalence is defined on universal relations and is
not sufficent for our purpose.
ΠD ΠA ΠDA σ χ Γ ν µ Υ
ΠD + - - + + - - (-) (-)
ΠA - - - + + - - + +
ΠDA - - - + + - - (-) (-)
σ + + + + + ◦ + + +
χ + + + + + - - + +
Γ - - - ◦ - - - - -
ν - - - + - - - - -
µ (-) + (-) + + - - + +
Υ (-) + (-) + + - - + +
and
χa:e2 (ΓG;F (e1 )) = ΓG∪{a};F (χa:e2 (e1 )) (7.19)
if F(e2 ) ⊆ G. Whereas the expression e2 is evaluated once per group on the
left-hand side, it is evaluated once per item in e1 on the right-hand side. This
242 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
∪ ∩ \ A B N T C E K Z
ΠD ◦/◦ +/+ +/◦ ◦/◦ ◦/◦ +/- +/- ◦/- ◦/- ◦/◦ +/-
ΠA -/- -/- -/- -/- -/- +/- +/- -/- -/- -/- -/-
ΠDA -/- -/- -/- -/- -/- +/- +/- -/- -/- -/- -/-
σ -/- +/+ +/- +/+ +/+ +/- +/- +/+ +/- -/- +/-
χ -/- -/- -/- +/+ +/+ +/- +/- +/+ +/◦ ◦/◦ +/-
Γ -/- -/- -/- -/- -/- +/- +/- -/- -/- -/- -/-
ν -/- -/- -/- -/- -/- +/- +/- -/- -/- -/- -/-
µ -/- -/- -/- +/+ +/+ +/- +/- +/+ +/- -/- +/-
Υ -/- -/- -/- +/+ +/◦ +/- +/- +/+ +/- -/- +/-
if F(e2 ) ⊆ G.
f (e1 ◦ e2 ) ≡ f (e1 ) ◦ e2 .
ΠD ΠA ΠDA σ χ Γ ν µ Υ
∪ - + - + + - - + +
∩ + - - + + - - + +
\ + - - + + - - + +
We now turn our attention to the grouping operator. Let e1 and e2 be two
expressions with A(e1 ) = A(e2 ). Further, let G ⊆ A(e1 ) be a set of grouping
attributes and F an aggregation vector. If (ΠG (e1 ) ∩ ΠG (e2 )) = ∅, then
for not necessarily distinct operators ◦a and ◦b . The subscripts in this equiv-
alence have the following meaning. For operators not carrying a predicate or
other expressions, it is immaterial and can be ignored. If an operator has an
expression e as a subscript, then ij (for 1 ≤ i, j ≤ 3, i 6= j) indicates that
F(e) ∩ ek = ∅ for 1 ≤ k ≤ 3 and k 6∈ {i, j}. This ensures that the equivalence
is correctly typed on both sides of the equivalence sign. If for two operators ◦a
and ◦b the above equivalence holds, then we denote this by assoc(◦a , ◦b ). As
we will see, assoc is not symmetric. Thus, we have to be very careful about the
order of the operators, which is tight to the syntactic pattern of the equivalence
above. In order not to make a mistake, one has to remember two things. First,
the operators appear in assoc in the same order as on the left-hand side of the
equivalence. Second, the equivalence has left associatiation on its left-hand side
and, consequently, right association on its right-hand side.
If both operators are commutative, then the assoc property is symmetric,
i.e.,
{t1 }b ◦a12 ({t2 }b ◦b23 {t3 }b ) = ({t1 }b ◦a12 {t2 }b ) ◦b23 {t3 }b
holds, where the subscript ij in ◦ij indicates that any subscript in ◦ij does
not access attributes from ek if k 6= i and k 6= j. Then, we can easily prove
246 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
∪ ∩ \ A B N T C E K Z
∪ + - - - - - - - - - -
∩ - + - - - + + - - - -
\ - - - - - - - - - - -
A - - - + + + + + + - +
B - - - + + + + + + - +
N - - - - - - - - - - -
T - - - - - - - - - - -
C - - - + + + + + + - +
E - - - - - - - - ◦ - -
K - - - - - - - - ◦ ◦ -
Z - - - - - - - - - - -
Obviously, the right-hand side is ill-typed. However, we could rewrite the pat-
tern to
(e1 ◦a12 e2 ) ◦b13 e3 ≡ (e1 ◦b13 e3 ) ◦a12 e2
because then both sides are well-typed. Let us call instances of this pattern
left asscom property and denote by l-asscom(◦a , ◦b ) the fact that the accord-
7.7. SIMPLE REORDERABILITY 247
ing equivalence holds. Analogously, we can define a right asscom property (r-
asscom):
e1 ◦a13 (e2 ◦b23 e3 ) ≡ e2 ◦b23 (e1 ◦a13 e3 ).
First note that l-asscom and r-asscom are symmetric properties, i.e.,
l-asscom(◦a , ◦b ) ≺ l-asscom(◦b , ◦a ),
r-asscom(◦a , ◦b ) ≺ r-asscom(◦b , ◦a ).
implies that
Thus,
({t1 }b ◦a12 {t2 }b ) ◦b13 {t3 }b ≡ ({t1 }b ◦b13 {t3 }b ) ◦a12 {t2 }b
for all ti and that ◦a and ◦b are strongly left linear. Then l-asscom(◦a , ◦b )
holds. The proof is by induction on the number of elements contained in e1 .
First observe that if e1 is empty, then (e1 ◦a12 e2 ) ◦b13 e3 and (e1 ◦b13 e3 ) ◦a12 e2 are
248 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
∪ ∩ \ A B N T C E K Z
∪ +/+ -/- -/- -/- -/- -/- -/- -/- -/- -/- -/-
∩ -/- +/+ -/- -/- -/- +/- +/- -/- -/- -/- -/-
\ -/- -/- -/- -/- -/- +/- +/- -/- -/- -/- -/-
A -/- -/- -/- +/+ +/+ +/- +/- +/+ +/- -/- +/-
B -/- -/- -/- +/+ +/+ +/- +/- +/+ +/- -/- +/-
N -/- +/- +/- +/- +/- +/- +/- +/- +/- -/- +/-
T -/- +/- +/- +/- +/- +/- +/- +/- +/- -/- +/-
C -/- -/- -/- +/+ +/+ +/- +/- +/+ +/- -/- +/-
E -/- -/- -/- +/- +/- +/- +/- +/- +/- ◦/- +/-
K -/- -/- -/- -/- -/- -/- -/- -/- ◦/- ◦/◦ -/-
Z -/- -/- -/- +/- +/- +/- +/- +/- +/- -/- +/-
∪ ∩ \ A B N T C E K Z
∪ -/- -/- -/- +/+ +/+ -/+ -/+ +/+ -/+ -/- -/+
∩ +/+ +/+ -/+ +/+ +/+ -/+ -/+ +/+ -/+ -/- -/+
\ -/- -/- -/- +/+ +/+ -/+ -/+ +/+ -/+ -/- -/+
also empty. Let e01 and e001 be two bags such that e1 = e01 ∪ e001 . The induction
step looks like this:
Table 7.7 summarizes the l-/r-asscom properties for all pairs of operators.
Most of the entries follow from the abovementioned. Some equivalences for
the d-join and the groupjoin, especially in conjunction with outerjoins, need
dedicated proofs. This is a good sign, since, thanks to l-/r-asscom, reorderings
become possible which were not possible with commutativity and associativity
alone.
Distributivity laws play a minor role in query compilers, but are very useful
to prove equivalences. We consider right and left distributivity (l/r-dist):
holds if e2 6= ∅.
Assume that the whole predicate of a binary operator references only at-
tributes from its left or its right argument. Then, some simplifications/rewrites
250 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
are possible:
where we left out symmetric cases, which are possible due to commutativity.
Let us consider the semijoin. If F(p1 ) ∩ A(e2 ) = ∅, then
σp1 (e1 ) if e2 6= ∅,
e1 Np1 e2 =
∅ if e2 = ∅,
which can be summarized to σe2 6=∅ (σp1 (e1 )). If F(p2 ) ∩ A(e1 ) = ∅, then
e1 if σp2 (e2 ) 6= ∅,
e1 Np2 e2 =
∅ if σp2 (e2 ) = ∅,
To consider the different cases for the left and full outerjoin, it is convenient
to define E⊥i = {⊥A(ei ) } for i = 1, 2 and given expressions ei . If F(p1 )∩A(e2 ) =
∅, we can reason as follows for the left outerjoin:
e1 Ep1 e2 ≡ (e1 Bp1 e2 ) ∪ ((e1 Tp1 e2 ) A E⊥2 )
≡ (σp1 (e1 ) A e2 ) ∪ ((σ(e2 =∅)∨(¬p1 ) (e1 )) A E⊥2 ).
e1 C e2 = e1 C (e02 ∪ e002 )
= e1 BJ=J 0 (eb0 ∪ eb00 )
2 2
= (e1 BJ=J 0 eb02 ) ∪ (e1 BJ=J eb002 )
= (e1 C e02 ) ∪ (e1 C e002 )
e1 C e2 ≡ e1 × e2 (7.54)
if F(e2 ) ∩ A(e1 ) = ∅,
e1 C σq (e2 ) ≡ e1 Bq e2 (7.55)
if F(e2 ) ∩ A(e1 ) = ∅.
Denote by e ↓ the fact that some expression e is defined, i.e., returns a valid
result. Then, we call a function f extending, if and only if
e1 := R1 e2 := R2 e3 := R3
a1 b1 a2 b2 a3 b3
1 1 1 2 1 3
2 4 2 4 2 5
3 5 3 6 3 6
4 - 4 - 4 -
j j j
e12 := e1 Bq12 e2 e13 := e1 Bq13 e3 e23 := e2 Bq23 e3
a1 b1 a2 b2 a1 b1 a3 b3 a2 b2 a3 b3
2 4 2 4 3 5 2 5 3 6 3 6
elo
12 := e 1 E q e
12 2 elo
13 := e1 E q e
13 3 elo
23 := e2 E q e
23 3
a1 b1 a2 b2 a1 b1 a3 b3 a2 b2 a3 b3
1 1 - - 1 1 - - 1 2 - -
2 4 2 4 2 4 - - 2 4 - -
3 5 - - 3 5 2 5 3 6 3 6
4 - - - 4 - - - 4 - - -
ef12o := e1 Kq12 e2 fo
e13 := e1 Kq13 e3 fo
e23 := e2 Kq23 e3
a1 b1 a2 b2 a1 b1 a3 b3 a2 b2 a3 b3
1 1 - - 1 1 - - 1 2 - -
2 4 2 4 2 4 - - 2 4 - -
3 5 - - 3 5 2 5 3 6 3 6
4 - - - 4 - - - 4 - - -
- - 1 2 - - 1 3 - - 1 3
- - 3 6 - - 3 6 - - 2 5
- - 4 - - - 4 - - - 4 -
e1 C p12 (e2 (e1 ) Bp23 e3 (e1 )) ≡ (e1 C p12 e2 (e1 )) C p23 e3 (e1 ) (7.60)
e1 C p12 (e2 Bp23 e3 (e1 )) ≡ (e1 Bp12 e2 ) C p23 e3 (e1 ) (7.61)
In the first equivalence, the join between e2 and e3 on the left-hand side must be
turned into a dependent join on the right-hand side. In the second equivalence,
the first dependent join between e1 and e2 becomes a regular join between e1
and e2 on the right-hand side and the regular join between e2 and e3 on the
left-hand side becomes a dependent join on the right-hand side.
e1 := R1 e2 := R2 e3 := R3
a1 b1 a2 b2 a3 b3
1 1 1 2 1 3
2 4 2 4 2 5
3 5 3 6 3 6
4 - 4 - 4 -
j0 j0 j0
e12 := e1 Bq12 0 e2 e13 := e1 Bq13 0 e3 e23 := e2 Bq23 0 e3
a1 b1 a2 b2 a1 b1 a3 b3 a2 b2 a3 b3
2 4 2 4 3 5 2 5 3 6 3 6
4 - 4 - 4 - 4 - 4 - 4 -
lo 0 lo 0 lo 0
e12 := e1 Eq12 0 e2 e13 := e1 Eq13 0 e3 e23 := e2 Eq23 0 e3
a1 b1 a2 b2 a1 b1 a3 b3 a2 b2 a3 b3
1 1 - - 1 1 - - 1 2 - -
2 4 2 4 2 4 - - 2 4 - -
3 5 - - 3 5 2 5 3 6 3 6
4 - 4 - 4 - 4 - 4 - 4 -
f o0 f o0 f o0
e12 := e1 Kq12 0 e2 e13 := e1 Kq13 0 e3 e23 := e2 Kq23 0 e3
a1 b1 a2 b2 a1 b1 a3 b3 a2 b2 a3 b3
1 1 - - 1 1 - - 1 2 - -
2 4 2 4 2 4 - - 2 4 - -
3 5 - - 3 5 2 5 3 6 3 6
4 - 4 - 4 - 4 - 4 - 4 -
- - 1 2 - - 1 3 - - 1 3
- - 3 6 - - 3 6 - - 2 5
outerjoin can have several reasons. First, outerjoins are part of the SQL 2
specification. Second, outerjoins can be introduced during query rewrite. For
example, unnesting nested queries or hierarchical views may result in outerjoins.
Sometimes, it is also possible to rewrite universal quantifiers to outerjoins [873,
215].
Before reading any further, the reader should get acquainted to outerjoins by
checking whether there is a mistake in Figs. 7.6, 7.7, or 7.8. There, we calculated
7.10. EQUIVALENCES FOR OUTERJOINS 255
e1 Eq12 ej23
a1 b1 a2 b2 a3 b3
elo
12 Bq23 e3 1 1 - - - -
a1 b1 a2 b2 a3 b3 2 4 - - - -
3 5 - - - -
4 - - - - -
j
e1 Kq12 e23
a1 b1 a2 b2 a3 b3
ef12oBq23 e3 1 1 - - - -
a1 b1 a2 b2 a3 b3 2 4 - - - -
- - 3 6 3 6 3 5 - - - -
4 - - - - -
- - 3 6 3 6
ej12 Kq23 e3
a1 b1 a2 b2 a3 b3
2 4 2 4 - - e1 Bq12 ef23o
- - - - 1 3 a1 b1 a2 b2 a3 b3
- - - - 2 5 2 4 2 4 - -
- - - - 3 6
- - - - 4 -
elo
12 Kq23 e 3
a1 b1 a2 b2 a3 b3
1 1 - - - - e1 Eq12 ef23o
2 4 2 4 - - a1 b1 a2 b2 a3 b3
3 5 - - - - 1 1 - - - -
4 - - - - - 2 4 2 4 - -
- - - - 1 3 3 5 - - - -
- - - - 2 5 4 - - - - -
- - - - 3 6
- - - - 4 -
for three relations Ri their joins, left outerjoins, and full outerjoins for three
different sets of predicates. The first set of predicates does not apply any special
comparisons with respect to null values. All predicates in this set are denoted
by qij (1 ≤ i, j ≤ 3) and defined as qij := (bi = bj ). The second set of predicates
.
uses the special comparison ‘=’. Remember that this dotted equality returns
true in the additional case that both arguments are null. The predicates of the
0 and defined as q 0 := (b = .
second set are denoted by qij ij i bj ). The third set of
. .
predicates consists of q120 := b1 = b2 ∨ b2 = null and q20 3 := b2 = b3 ∨ b2 = null.
Note that in Fig. 7.8 there is no difference between e2 Eq20 ,3 e3 and e2 Kq20 ,3 e3 .
Why?
The main purpose of this section is to derive equivalences among expressions
containing outerjoins. Let us start with the observation that the full outerjoin
is commutative, but the left outerjoin is not. Less simple is the next item on
256 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
0
e100 := elo 0 e3
12 Eq23 e100 := e1 Eq12 elo
23
a1 b1 a2 b2 a3 b3 a1 b1 a2 b2 a3 b3
1 1 - - 4 - 1 1 - - - -
2 4 2 4 - - 2 4 2 4 - -
3 5 - - 4 - 3 5 - - - -
4 - - - 4 - 4 - - - - -
0
e100 := ef12o Eq23
0 e3 e100 := e1 Kq12 elo
23
a1 b1 a2 b2 a3 b3 a1 b1 a2 b2 a3 b3
1 1 - - 4 - 1 1 - - - -
2 4 2 4 - - 2 4 2 4 - -
3 5 - - 4 - 3 5 - - - -
4 - - - 4 - 4 - - - - -
- - 1 2 - - - - 1 2 - -
- - 3 6 3 6 - - 3 6 3 6
- - 4 - 4 - - - 4 - 4 -
fo 0
e100 := e12 Kq23 0 e3 e100 := e1 Kq12 ef23o
a1 b1 a2 b2 a3 b3 a1 b1 a2 b2 a3 b3
1 1 - - 4 - 1 1 - - - -
2 4 2 4 - - 2 4 2 4 - -
3 5 - - 4 - 3 5 - - - -
4 - - - 4 - 4 - - - - -
- - 1 2 - - - - 1 2 - -
- - 3 6 3 6 - - 3 6 3 6
- - 4 - 4 - - - 4 - 4 -
- - - - 1 3 - - - - 1 3
- - - - 2 5 - - - - 2 5
If we let e2 and e3 be empty bags, then the right-hand side evaluates to the
empty bag but the left-hand side simplifies to e1 A {⊥A(e2 )∪A(e3 ) }. Thus,
¬assoc(E, B). By taking a look at
with e2 and e3 yielding the empty bag, we see that ¬assoc(K, B). Imagine e1
and e2 yield empty bags. The left-hand side of
then evaluates to {⊥A(e1 )∪A(e2 ) } A e3 . Since the right-hand side gives the empty
bag, we have ¬assoc(B, K). Last in this sequence, we consider
Assume again, that e1 and e2 evaluate to the empty bag. Then, the right-
hand side does the same, whereas the left-hand side results in the familiar
{⊥A(e1 )∪A(e2 ) } A e3 . Consequently, ¬assoc(E, K). Summarizing, we have
¬assoc(E, B), ¬assoc(K, B), ¬assoc(B, K), and ¬assoc(E, K). These neg-
ative results are also comfirmed by our example (see Fig. 7.9). This leaves us
to check each of assoc(E, E), assoc(B, E), assoc(K, E), and assoc(K, K),
apart from the already known assoc(B, B). Fig. 7.9 shows that for this partic-
ular example all four properties hold.
Let us start with assoc(E, E). To illustrate one problem which occurs in
the context of associativity, consider the following three relations:
e1 := R1 e2 := R2 e3 := R3
a b c d
a b - d
e1 Ea=b e2 .
e2 Ec=d∨c=null e3
a b c b c d
a – – b – c
.
(e1 Ea=b e2 ) Ec=d∨c=null e3 .
e1 Ea=b (e2 Ec=d∨c=null e3 )
a b c d a b c d
a – – c a – – –
Hence, in general (e1 Eq12 e2 ) Eq23 e3 6= e1 Eq12 (e2 Eq23 e3 ). The problem is
that the predicate q23 does not reject null values, where a predicate rejects null
values for a set of attributes A if it evaluates to false or undefined on every
tuple in which all attributes in A are null. That is, q rejects null values if and
only if q(⊥A ) 6= true. We also say that a predicate is strict or strong if it rejects
null values. For our example predicates, the following holds. All qij reject null
values on any A(ei ). The predicates q120 and q20 3 do not reject null values on
A(e2 ) but on A(e1 ) or A(e3 ), respectively. The predicates qij 0 neither reject null
that F(q12 ) ∩ A(e3 ) = ∅ and F(q23 ) ∩ A(e1 ) = ∅. For the left-hand side of
associativity, we have
The right part of the cross product on the right-hand side of the union, (E⊥2 Eq23
e3 ), does look suspicious. Note that if q23 rejects nulls on A(e2 ), this part sim-
plifies to E⊥23 . To confirm our suspicion, we take a look at the other side of
associativity:
The last step is true, since e2 Eq23 e3 preserves e2 and F(q12 )∩A(e3 ) = ∅. Thus,
the left outerjoin is associative if and only if
But this holds if q23 rejects nulls on A(e2 ). Thus, without any effort we have
just proven the second of the following equivalences:
cover all combinations of B, E, and K except the well-known case for regu-
lar joins. Since comm(B) and assoc(B, E) hold without restrictions, Eqv. 7.66
holds without restrictions. Since E is strongly left linear and the consumer/producer
relationship is not disturbed because q12 does not access attributes from e3 and
q13 does not access attributes from e2 , Eqv. 7.67 holds without restrictions.
From Eqv. 7.64 and the fact that the full outerjoin is commutative, Eqv. 7.68
follows. Some care is just needed to see how the necessary restriction for asso-
ciativity carries over to the l-asscom equivalence. Similarily, Eqv. 7.69 follows
from Eqv.7.65. In all cases, the necessity of the restrictions is due to the fact
that commutativity and l-asscom imply associativity.
The r-asscom property is handled quickly. The only valid equivalence we
have is
e1 Kq13 (e2 Kq23 e3 ) ≡ e2 Kq23 (e1 Kq13 e3 ), (7.70)
which follows directly from comm(K) and assoc(K, K) if q13 and q23 are both
strict on A(e3 ).
These equivalences hold under the condition that p rejects nulls on A(e3 ). They
can be prove to use the semijoin reducer equivalences 7.197-7.201. Similarly,
260 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
the equivalences
We can extend the last two equivalences to outerjoins with default values:
σp (e1 ED
q e2 ) ≡ σp (e1 Bq e2 )
2
if ¬p(D2 ), (7.86)
σp (e1 KD
q
1 ,D2
e2 ) ≡ σp (e1 Eq e2 ) if ¬p(D1 ), (7.87)
σp (e1 KD
q
1 ,D2
e2 ) ≡ σp (e1 Hq e2 ) if ¬p(D2 ). (7.88)
Given this definition of the outer union operator, we can define the outerjoin
operations as follows:
+
e1 Eq e2 := e1 Bq e2 ∪ (e1 \ ΠA1 (e1 Bq e2 )), (7.90)
+ +
e1 Kq e2 := e1 Bq e2 ∪ (e1 \ ΠA1 (e1 Bq e2 )) ∪ (e2 \ ΠA2 (e1 Bq e2 )).(7.91)
The expression e1 Eq12 (e2 Bq23 e3 ) cannot be reordered given the equivalences so
far. In order to allow reorderability of this expression, the generalized outerjoin
was introduced by Dayal [216]. Here, we follow Rosenthal and Galindo-Legaria
7.10. EQUIVALENCES FOR OUTERJOINS 261
[723]. The generalized left outerjoin preserves attributes for a subset A ⊆ A(e1 )
only. It is defined as
+
e1 EA
q e2 := (e1 Bq e2 ) ∪ (ΠA (e1 ) \ ΠA (e1 Bq e2 )). (7.92)
e1 EA
q e2 := (e1 Bq e2 ) ∪ (ΠA (e1 Tq e2 ) A {⊥A∪A(e2 ) }b ). (7.93)
then yields
{[a1 : 1, b1 : 1, a2 : −, b2 : −, a3 : −, b3 : −]}b .
Evaluating
(R1 Eb1 =b2 R2 )
yields
{[a1 : 1, b1 : 1, a2 : 1, b2 : 1], [a1 : 1, b1 : 1, a2 : 2, b2 : 1]}b .
Thus,
(R1 Eq12 R2 ) EA(R
q23
1)
R3
evaluates to
{[a1 : 1, b1 : 1, a2 : −, b2 : −, a3 : −, b3 : −]2 }b .
7.11.2 Join
Let us now come to some more complex cases concerning the reorganization
of expressions containing grouping and join. Traditionally, the grouping oper-
ator, if specified in some SQL query, is performed after the evaluation of all
join operations. However, pushing down grouping can substantially reduce the
input size of the joins and, thus, can be highly beneficial. Before we give some
equivalences, let us look at some example relations, their joins and the result of
applying some grouping operators. Fig 7.12 presents two relations R1 and R2
and the result of their join (e3 ) in the top row. The next row shows the result
of applying a grouping operator to each of these items (e4 to e6 ). The last row
contains the results of joining a grouped result with one original relation and
7.11. EQUIVALENCES FOR UNARY GROUPING 263
the result of joining the two grouped results given in e4 and e5 . Let us assume
that our original expression
e3 := R1 1j1 =j2 R2
e1 := R1 e2 := R2
g1 j1 a1 g2 j2 a2
g1 j1 a1 g2 j2 a2
1 1 2 1 1 2
1 1 2 1 1 2
1 1 2 1 1 4
1 2 4 1 1 4
1 2 4 1 2 8
1 2 8 1 2 8
1 2 8 1 2 8
e4 := Γg1 ,j1 ;F1 (e1 ) e5 := Γg2 ,j2 ;F2 (e2 )
0 e6 := Γg1 ,g2 ;F (e3 )
g1 j1 c1 b1 g2 j2 c2 b02
g1 c b1 b2
1 1 1 2 1 1 2 6
1 4 16 22
1 2 2 12 1 2 1 8
e7 := e4 1j1 =j2 e2 e8 := e1 1j1 =j2 e5
g1 j1 c1 b01 g2 j2 a2 g1 j1 a1 g2 j2 c2 b02
1 1 1 2 1 1 2 1 1 2 1 1 2 6
1 1 1 2 1 1 4 1 2 4 1 2 1 8
1 2 2 12 1 2 8 1 2 8 1 2 1 8
e9 := e4 1j1 =j2 e5
g1 j1 c1 b01 g2 j2 c2 b02
1 1 1 2 1 1 2 6
1 2 2 12 1 2 1 8
where
e1 Bq e2 ≡ µg2 (µg1 (ΓJ1 ;g1 :ΠA(e1 )\J1 (e1 ) Bq ΓJ2 ;g2 :ΠA(e2 )\J2 (e2 ))). (7.98)
e1 Bq e2 ≡ µg2 (µg1 (ΓG+ ;g1 :ΠA(e (e1 ) Bq ΓG+ ;g2 :ΠA(e (e2 ))) (7.99)
1 1 )\J1 2 2 )\J2
holds. Unnesting two nested attributes g1 and g2 in a row, as done in the above
equivalences, is like generating the cross product of the items contained in g1
and g2 . Under the above assumptions, we can thus state the following two
7.11. EQUIVALENCES FOR UNARY GROUPING 265
e3 := e1 1j1 =j2 e2
e2 := R2 g1 j1 a1 g2 j2 a2
e1 := R1
g2 j2 a2 1 1 2 1 1 2
g1 j1 a1
1 1 2 1 1 2 1 1 4
1 1 2
1 1 4 1 2 4 1 2 8
1 2 4
1 2 8 1 2 4 1 2 16 ∗
1 2 8
1 2 16 ∗ 1 2 8 1 2 8
1 2 8 1 2 16 ∗
e4 := Γg1 ,j1 ;F1 (e1 ) e5 := Γg2 ,j2 ;F2 (e2 )
0 e6 := Γg1 ,g2 ;F (e3 )
g1 j1 c1 b1 g2 j2 c2 b02
g1 c b1 b2
1 1 1 2 1 1 2 6
1 6 28 54 ∗
1 2 2 12 1 2 2 24 ∗
e7 := e4 1j1 =j2 e2
e8 := e1 1j1 =j2 e5
g1 j1 c1 b01 g2 j2 a2
g1 j1 a1 g2 j2 c2 b02
1 1 1 2 1 1 2
1 1 2 1 1 2 6
1 1 1 2 1 1 4
1 2 4 1 2 2 24 ∗
1 2 2 12 1 2 8
1 2 8 1 2 2 24 ∗
1 2 2 12 1 2 16 ∗
e9 := e4 1j1 =j2 e5
g1 j1 c1 b01 g2 j2 c2 b02
1 1 1 2 1 1 2 6
1 2 2 12 1 2 2 24 ∗
where
equivalences
In the next step, we want to get rid of g. Assume we apply some aggregation
function agg(g.ai ) to g of the latter equivalence, where ai is an attribute of g1 ,
i.e., ai ∈ A(g1 ) or, by definition, ai ∈ A(e1 ) \ G+
1 . It should be clear that
sum(g1 .ai ) ∗ |g2 | if agg = sum,
count(g1 .ai ) ∗ |g2 | if agg = count,
agg(g.ai ) =
min(g1 .ai ) if agg = min,
max(g1 .ai ) if agg = max,
266 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
and |g| = |g1 | ∗ |g2 |. Analogously, we can exchange 1 and 2 in case ai ∈ A(g2 ).
Now, we are prepared to add an additional grouping operator ΓG+ ;g;F to
both sides of Eqv. 7.101. Therefore, we assume that J1 ⊆ G+ and J2 ⊆ G+ .
Further, we define G+ +
i as G ∩ A(ei ) for i = 1, 2. This results in
ΓG+ ;g;F (e1 Bq e2 ) ≡ ΓG+ ;g;F (Πg1 ,g2 (µg (χg:g1 Ag2 (
ΓG+ ;g1 :Π (e1 ) Bq ΓG+ ;g2 :Π (e2 ))))).
1 A(e1 )\G+
1
2 A(e2 )\G+
2
We know that g, g1 , and g2 cannot be part of the left-hand side. This means
that they cannot occur in G or A(F ). Thus, we can eliminate the projection,
which gives us
ΓG+ ;g;F (e1 Bq e2 ) ≡ ΓG+ ;g;F (µg (χg:g1 Ag2 (ΓG+ ;g1 :Π (e1 )Bq ΓG+ ;g2 :Π (e2 ))))
1 A(e1 )\G+
1
2 A(e2 )\G+
2
Now, note that the outer grouping on the right-hand side undoes the unnesting
which immediately proceeds it. We could be tempted to rewrite the right-hand
side to something like
We must make sure that E produces only a single tuple for each group construct-
ed by ΓG+ ;g;F . From the definition of Γ, we see that neither ΠG+ (ΓG+ ;g1 :Π +
(e1 ))
1 1 A(e1 )\G1
Next, we should start moving the different map operators inwards. The only
problem occurs since F1 ⊗ c2 and F2 ⊗ c1 need elements of both parts of the
join. Let F1 be decomposable into F11 and F12 and F2 be decomposable into F21
7.11. EQUIVALENCES FOR UNARY GROUPING 267
and it holds if F is splittable into F1 and F2 such that F(Fi ) ⊆ A(ei ) and
F = F1 ◦ F2 , and Fi is splittable and decomposable into Fi1 and Fi2 .
Consider the expression ΓG;g;F (e1 Bq e2 ). We denote the set of join attributes
of q from ei as Ji = F(q) ∩ A(ei ) for i = 1, 2, and the set of all join attributes
by J = J1 ∪ J2 . If J ⊆ G, we have the above case. Assume J 6⊆ G. Define
G+ = G ∪ J, Gi = G ∩ A(ei ) and G+ i = Gi ∪ JI for i = 1, 2. Let F be
an aggregation vector splittable into F1 and F2 such that F(Fi ) ⊆ A(ei ) and
F = F1 ◦ F2 . Further, let Fi be decomposable into Fi1 and Fi2 . Then Eqvs. 7.13
and 7.103, together with the properties of aggregation functions and vectors
discussed in Sec. 7.2, give us the following
ΓG;F (e1 Bq e2 ) ≡ ΓG;F12 ,F22 (ΓG+ ;F11 ,F21 (e1 Bq e2 )),
≡ ΓG;F12 ,F22 (χF 1,2 ⊗c2 ,F 1,2 ⊗c1 (
1 2
ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Bq ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ))),
1 1 2 2
Eager/Lazy Groupby-Count
The following equivalence corresponds to the main theorem of Yan and Larson
[934]. It states that
ΓG;F (e1 Bq e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Bq e2 ) (7.104)
1 1
holds if F is splittable and F1 is decomposable into F11 and F12 . The proof of it
can be found in [934].
From Eqv. 7.104 several other equivalences can be derived easily. First,
since the join is commutative,
ΓG;F (e1 Bq e2 ) ≡ ΓG;(F1 ⊗c2 )◦F22 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) (7.105)
2 2
Eager/Lazy Group-by
If F2 is empty, that is F2 = (), then Eqv. 7.104 simplifies to
Eager/Lazy Count
If F1 = (), then Eqv. 7.104 simplifies to
ΓG;F (e1 Bq e2 ) ≡ ΓG;(F1 ⊗c2 ) (e1 Bq ΓG+ ;c2 :count(∗) (e2 )). (7.109)
2
7.11. EQUIVALENCES FOR UNARY GROUPING 269
Double Eager/Lazy
For the next equivalence, assume F2 = (). Then
≡Eqv.7.109 ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Bq ΓG+ ;c2 :count(∗) (e2 )).
1 1 2
Thus,
ΓG;F (e1 Bq e2 ) ≡ ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Bq ΓG+ ;c2 :count(∗) (e2 )) (7.110)
1 1 2
ΓG;F (e1 Bq e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;c1 :count(∗) (e1 ) Bq ΓG+ ;F 1 (e2 )) (7.111)
1 2 2
Eager/Lazy Split
Applying Eqv. 7.104 and then Eqv. 7.105 results in the equivalence
FD 1 (G1 , G2 ) → G+
1 and
FD 2 (G+
1 , G2 ) → TID(e2 ).
270 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
R1 R2ugly
R2good R2bad
g1 j1 a1 g2 j2 k2
g2 j2 g2 j2
1 1 2 1 1 1
1 1 1 1
1 2 4 2 1 2
2 2 1 2
1 2 8 2 1 3
eg12 := R1 1j1 =j2 R2good
E1g := Γg1 ,g2 ;F (eg12 )
g1 j1 a1 g2 j2
g1 g2 c1 b1
1 1 2 1 1
1 1 1 2
1 2 4 2 2
1 2 2 12
1 2 8 2 2
e12 := R1 1j1 =j2 R2
b bad
not do so since we wanted to state them in the same way as Yan and Larson
did. As an exercise, the reader should perform the simplification.
The purpose of the functional dependencies can be sketched as follows. FD 1
ensures that each group on the left-hand side corresponds to one group on the
right-hand side. That is, the grouping by G+ 1 is not finer grained than the
grouping by G. FD 2 ensures that each row in the left argument of the join
on the right-hand side contributes at most one row to the overall result of the
right-hand side. This is illustrated by the following examples.
Fig 7.14 contains a relation R1 , which we use for expression e1 , and three
relations R2good , R2bad , and R2ugly , which we use for expression e2 . All of them are
depicted in the top row of Fig. 7.14. The next three rows contain the evaluations
of the left-hand side of Eqv. 7.113, divided into two steps. The first step (left
column) calculates the join between R1 and each of the possibilities for e2 . The
second step groups the result of the join (right column). The last three columns
evaluate the right-hand side of Eqv. 7.113. Again, the calculation is separated
into two steps. The first step does the grouping, the second step the join. We
leave the execution of the final projection to the reader.
For this example, the functional dependencies read as follows:
FD 1 (g1 , g2 ) → g1 , j1 and
FD 2 (g1 , j1 , g2 ) → tid(e2 ).
ΠD D
C (ΓG;F (e1 Bq e2 )) ≡ ΠC (ΓG+ ;F (e1 ) Bq e2 ), (7.115)
1
equivalences:
ΓG;F (e1 Bq e2 ) ≡ ΠC (χ(F\ c2 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))), (7.117)
⊗c )◦F 1 2 2 2 2
ΓG;F (e1 Bq e2 ) ≡ ΠC (χ \
2 (ΓG+ ;F 1 (e1 ) Bq ΓG+ ;c2 :count(∗) (e2 ))), (7.122)
F1 ⊗c2 1 1 2
ΓG;F (e1 Bq e2 ) ≡ ΠC (χ \
2 (ΓG+ ;c1 :count(∗) (e1 ) Bq ΓG+ ;F 1 (e2 ))), (7.123)
F2 ⊗c1 1 2 2
ΓG;F (e1 Bq e2 ) ≡ ΠC (χ \
2 2 (
F1 ⊗c2 ◦F\
2 ⊗c1
ΓG;F (e1 Bq e2 )
≡7.105 ΓG;(F1 ⊗c2 )◦F22 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))
2 2
≡7.16 ΠC (ΓG1 ,G+ ;(F1 ⊗c2 )◦F 2 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )))
2 2 2 2
≡7.97 ΠC (χ \ 2(
(F1 ⊗c2 )◦F2
Let us now come to the conditions attached to the equivalences. For our dis-
cussion, we denote by I the join with its (grouped) arguments, i.e.,
ΓG;F (e1 Bq e2 )
≡7.117 ΠC (χ \ 2 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )))
(F1 ⊗c 2 )◦F2 2 2
By symmetry, Eqvs. 7.116 and 7.118 hold. Since Eqvs. 7.120 and 7.121 are
also simplifications of Eqvs. 7.116 and 7.117, they can be proven similarily.
Let us turn to Eqv. 7.124. Since
ΠG+ (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ))
1 1 1
and
ΠG+ (ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))
2 2 2
are duplicate-free, Eqv. 7.124 holds if G → G+ . Eqvs. 7.122 and 7.123 follow
by simplifications from Eqv.7.124, if F1 or F2 is empty.
Applying Eqv. 7.104 and then Eqv. 7.23 to the right branch of the union gives
us:
ΓG;F11 ,F21 ((e1 Tq e2 ) A E⊥ )
≡ ΠC (ΓG;(F 1 ⊗c1 )◦F 1,2 (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 Tq e2 ) A E⊥ ))
2 1 1 1
≡ ΠC (ΓG;(F 1 ⊗c1 )◦F 1,2 ((ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Tq e2 ) A E⊥ ))
2 1 1 1
274 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
where in the first step we could omit the ΠC due to the subsequent grouping.
The second step pulls the two ΓG;(F 1 ⊗c1 )◦F 1,2 operators out of the two union
2 1
branches and merges them with the outer ΓG;F12 ,F22 . This is possible due to the
properties of the aggregation vectors involved and the fact that both group on
the same set G of grouping attributes.
Eager/Lazy Groupby-Count
Summarizing, we have the equivalence
ΓG;F (e1 Eq e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq e2 ), (7.125)
1 1
where E⊥ = {⊥A(e2 ) }. Applying Eqv. 7.105 to the left argument of the union
results in
ΓG;F11 ,F21 (e1 Bq e2 ) ≡ ΓG;(F11 ⊗c2 )◦F22 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )).
2 2
Eager/Lazy Group-by
If F2 = (), Eqv. 7.125 simplifies to
F 1 (∅)
ΓG;F (e1 Eq e2 ) ≡ ΓG;F22 (e1 Eq 2 ΓG+ ;F 1 (e2 )), (7.128)
2 2
Eager/Lazy Count
If F1 = (), Eqv. 7.125 simplifies to
ΓG;F (e1 Eq e2 ) ≡ ΓG;(F1 ⊗c2 ) (e1 Ecq2 :1 ΓG+ ;c2 :count(∗) (e2 )). (7.130)
2
Double Eager/Lazy
For the next equivalence assume F2 = (). We would like to derive an equivalence
similar to Eqv. 7.110. Here it is:
ΓG;F (e1 Eq e2 ) ≡ ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Ecq2 :1 ΓG+ ;c2 :count(∗) (e2 )), (7.131)
1 1 2
F 1 (∅)
ΓG;F (e1 Eq e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;c1 :count(∗) (e1 ) Eq 2 ΓG+ ;F 1 (e2 )) (7.132)
1 2 2
holds.
Eager/Lazy Split
The companion of Eqv. 7.112 for the left outerjoin is
which holds if F1 is decomposable into F11 and F12 , and F2 is decomposable into
F21 and F22 .
276 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
F 1 (∅),c2 :1
ΓG;F (e1 Eq e2 ) ≡ ΠC (χ(F\ c2 (e1 Eq
⊗c )◦F
2
ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))),
(7.135)
1 2 2 2 2
ΓG;F (e1 Eq e2 ) ≡ ΠC (χ \
2 (ΓG+ ;F 1 (e1 ) Ecq2 :1 ΓG+ ;c2 :count(∗) (e2 ))), (7.140)
F1 ⊗c2 1 1 2
F 1 (∅)
ΓG;F (e1 Eq e2 ) ≡ ΠC (χ \
2 (ΓG+ ;c1 :count(∗) (e1 ) Eq 2 ΓG+ ;F 1 (e2 )), (7.141)
F2 ⊗c1 1 2 2
ΓG;F (e1 Eq e2 ) ≡ ΠC (χ 2 2 (
G;F\ \
1 ⊗c2 ◦F2 ⊗c1
F 1 (∅),c2 :1
ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq 2 ΓG+ ;F 1 ◦(c2 :count(∗))(7.142)
(e2 ))).
1 1 2 2
D = d1 : c1 , . . . dl : cl ,
ΓG;F (e1 ED
q e2 ).
If we take a close look at the proof of Eqv. 7.125 and think of E⊥ as being
defined as
E⊥ := (⊥A(e2 )\A(D) A {D}),
7.11. EQUIVALENCES FOR UNARY GROUPING 277
we see that the proof remains valid. Thus, we have the following equivalences:
ΓG;F (e1 ED
q e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq e2 ), (7.143)
1 1
D,F21 (∅),c2 :1
ΓG;F (e1 ED
q e2 ) ≡ ΓG;(F1 ⊗c2 )◦F22 (e1 Eq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )), (7.144)
2 2
ΓG;F (e1 ED
q e2 ) ≡ ΓG;F12 (ΓG+ ;F 1 (e1 ED
q e2 )) (7.145)
1 1
if F2 is empty,
D,F21 (∅)
ΓG;F (e1 ED
q e2 ) ≡ ΓG;F22 (e1 Eq ΓG+ ;F 1 (e2 )) (7.146)
2 2
if F1 is empty,
ΓG;F (e1 ED
q e2 ) ≡ ΓG;(F2 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) ED
q e2 ) (7.147)
1
if F1 is empty,
ΓG;F (e1 ED
q e2 ) ≡ ΓG;(F1 ⊗c2 ) (e1 ED,c
q
2 :1
ΓG+ ;c2 :count(∗) (e2 )) (7.148)
2
if F2 is empty,
ΓG;F (e1 ED
q e2 ) ≡ ΓG;(F1 ⊗c2 ) (ΓG+ ;F 1 (e1 ) ED,c
q
2 :1
ΓG+ ;c2 :count(∗) (e2 )) (7.149)
1 1 2
if F2 is empty,
D,F21 (∅)
ΓG;F (e1 ED
q e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;c1 :count(∗) (e1 ) Eq ΓG+ ;F 1 (e2 )) (7.150)
1 2 2
if F1 is empty,
ΓG;F (e1 ED
q e2 ) ≡ ΓG;(F1 ⊗c2 )◦(F2 ⊗c1 ) ( (7.151)
D,F21 (∅),c2 :1
ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )).
1 1 2 2
These equivalences hold under the same conditions as their corresponding equiv-
alences for the outerjoin with no default.
ΓG;F (e1 Kq e2 ).
In order to deal with this expression, we will need the full outerjoin with defaults
for both sides. Define E1⊥ = {⊥A(e1 ) } and let us start by observing
ΓG;F 1 (e1 Eq e2 ) ≡ ΓG;(F 1 ⊗c1 )◦F 1,2 (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Eq e2 )
2 1 1 1
F 1,1 (∅),c2 :1
≡ ΓG;(F 1 ⊗c2 )◦F 1,2 (e1 Eq 2 ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 )).
1 2 2 2
278 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
Applying Eqvs. 7.104 and 7.23 to the right-hand side of the union yields
ΓG;F 1 ((e2 Tq e1 ) A E1⊥ ) ≡ ΓG;(F 1 ⊗c2 )◦F 1,2 (ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 Tq e1 ) A E1⊥ )
1 2 2 2
≡ ΓG;(F 1 ⊗c2 )◦F 1,2 ((ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ) Tq e1 ) A E1⊥ ).
1 2 2 2
ΓG;F (e1 Kq e2 )
≡ ΓG;F 2 (
F 1,1 (∅),c2 :1
ΓG;(F 1 ⊗c2 )◦F 1,2 (e1 Eq 2 ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ))
1 2 2 2
∪
(ΓG;(F 1 ⊗c2 )◦F 1,2 ((ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ) Tq e1 ) A E1⊥ )))
1 2 2 2
Eager/Lazy Groupby-Count
Due to the commutativity of the full outerjoin, we thus have
F 1 (∅),c1 :1;−
ΓG;F (e1 Kq e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 )Kq 1 e2 ) (7.152)
1 1
holds.
Eager/Lazy Group-by
If F2 is empty, then Eqv. 7.152 simplifies to
F 1 (∅);−
ΓG;F (e1 Kq e2 ) ≡ ΓG;F12 (ΓG+ ;F 1 (e1 ) Kq 1 e2 ). (7.154)
1 1
Eager/Lazy Count
If F1 is empty, then Eqv. 7.152 simplifies to
ΓG;F (e1 Kq e2 ) ≡ ΓG;(F2 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) Kcq1 :1;− e2 ). (7.156)
1
Double Eager/Lazy
If F2 is empty, the equivalence
F 1 (∅);c2 :1
ΓG;F (e1 Kq e2 ) ≡ ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Kq 1
ΓG+ ;(c2 :count(∗)) (e2 ))
1 1 2
(7.158)
1 2
holds if F1 is decomposable into F1 and F1 . If F1 is empty, the equivalence
c :1;F21 (∅)
ΓG;F (e1 Kq e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) Kq1 ΓG+ ;F 1 (e2 ))
1 2 2
(7.159)
holds if F2 is decomposable into F21 and F22 .
Proof: If F2 is empty, then
F 1 (∅);−
ΓG;F (e1 Kq e2 ) ≡Eqv. 7.154 ΓG;F12 (ΓG+ ;F 1 (e1 ) Kq 1 e2 )
1 1
F 1 (∅);c2 :1
≡Eqv. 7.157 ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Kq 1 ΓG+ ;(c2 :count(∗)) (e2 )).
1 1 2
If F1 is empty, then
−;F21 (∅)
ΓG;F (e1 Kq e2 ) ≡Eqv. 7.155 ΓG;F22 (e1 Kq ΓG+ ;F 1 (e2 ))
2 2
c :1;F21 (∅)
≡Eqv. 7.156 ΓG;(F22 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) Kq1 ΓG+ ;F 1 (e2 )).
1 2 2
Eager/Lazy Split
If F is splittable and decomposable, then
Proof:
ΓG;F (e1 Kq e2 )
F 1 (∅),c1 :1;−
≡Eqv. 7.152 ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Kq 1 e2 ))
1 1
F 1 (∅),c1 :1;F21 (∅),c2 :1
≡Eqv. 7.153 ΓG;(F12 ⊗c2 )◦(F22 ⊗c1 ) (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Kq 1 ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ))
1 1 2 2
2
280 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
−;F21 (∅),c2 :1
ΓG;F (e1 Kq e2 ) ≡ ΠC (χ(F\ c2 (e1 Kq
⊗c )◦F
ΓG+ ;F 1 ◦(c2 :count(∗)) (e(7.162)
2 ))),
1 2 2 2 2
c :1;F21 (∅)
ΓG;F (e1 Kq e2 ) ≡ ΠC (χ 2 (ΓG+ ;(c1 :count(∗)) (e1 ) Kq1 ΓG+ ;F 1(7.168)
(e2 ))),
G;F\
2 ⊗c1 1 2 2
ΓG;F (e1 Kq e2 ) ≡ ΠC (χ 2 2 (
G;F\ \
1 ⊗c2 ◦F2 ⊗c1
7.11.6 D-Join
Next, let us turn to the d-join. The outline of this subsection mirrors the one
for regular joins. Indeed, all equivalences that hold for regular joins will also
hold for d-joins.
Eager/Lazy Groupby-Count
The equivalence
ΓG;F (e1 Cq e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Cq e2 ) (7.170)
1 1
ΓG;F (e1 Cq e2 ) ≡ ΓG;(F1 ⊗c2 )◦F22 (e1 Cq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) (7.171)
2 2
Eager/Lazy Group-by
If F2 is empty, that is F2 = (), Eqv. 7.170 simplifies to
This equivalence holds if F1 is splittable and decomposable into F11 and F12 .
If F1 is empty, Eqv. 7.171 simplifies to
This equivalence holds if F2 is splittable and decomposable into F21 and F22 .
Eager/Lazy Count
ΓG;F (e1 Cq e2 ) ≡ ΓG;(F1 ⊗c2 ) (e1 Cq ΓG+ ;c2 :count(∗) (e2 )). (7.175)
2
Double Eager/Lazy
If F2 is empty
ΓG;F (e1 Cq e2 ) ≡ ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Cq ΓG+ ;c2 :count(∗) (e2 )), (7.176)
1 1 2
ΓG;F (e1 Cq e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;c1 :count(∗) (e1 ) Cq ΓG+ ;F 1 (e2 )) (7.177)
1 2 2
Eager/Lazy Split
Applying Eqv. 7.170 and then Eqv. 7.171 results in the equivalence
The top grouping can be eliminated under the conditions for the regular joins.
282 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
7.11.7 Groupjoin
Simple Facts about the Groupjoin
Last in this section, we consider the groupjoin and thus the expressions of the
form
ΓG;F (e1 Zq;F̂ e2 ).
Before we start, we discuss some equivalences for the groupjoin. Since σ and χ
are linear and Z is linear in its left argument, it is easy to show that
Then, we note that unary grouping can be expressed with the help of the
groupjoin.
Apart from this detail, these equivalences follow directly from the definition of
the groupjoin.
7.11. EQUIVALENCES FOR UNARY GROUPING 283
For the regular join, we can apply a selection to get rid of tuples not finding
a join partner by counting the number of join partners. This leads to the
following equivalences:
ΠC (e1 BG1 =G2 ΓθG2 ;g;F (e2 )) ≡ σc2 >0 (e1 ZG1 θG2 ;g;F ◦(c2 :|g|) e2 ),
ΠC (e1 BG1 =G2 ΓθG2 ;F (e2 )) ≡ σc2 >0 (e1 ZG1 θG2 ;F ◦(c2 :count(∗)) e2 ),
ΠC (e1 BG1 =G2 ΓG2 ;g;F (e2 )) ≡ σc2 >0 (e1 ZG1 =G2 ;g;F ◦(c2 :|g|) e2 ),
ΠC (e1 BG1 =G2 ΓG2 ;F (e2 )) ≡ σc2 >0 (e1 ZG1 =G2 ;F ◦(c2 :count(∗)) e2 ).
≡7.183 ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) ZJ1 θJ2 ;F e2 ))
1 1
The first equivalence additionally needs that F2 is empty, the second that F1 is
empty.
2. J2 → G+
2 holds in e2 ,
R2 R3 S
R1
a a b c d e
a
1 1 1 1 8 1
1
1 1 2 1 9 2
m2 : R2 Ea=c S
m1 : R1 Ea=c S a c d e m3 : R3 Eb=e S
a c d e 1 1 8 1 a b c d e
1 1 8 1 1 1 9 2 1 1 1 8 1
1 1 9 2 1 1 8 1 1 2 1 9 2
1 1 9 2
Consider just the one where all these operators have a hash-based implementa-
tion in a main-memory setting. Then, the left-hand side requires to build two
hash tables, whereas the right-hand side requires to build only one. Further,
no intermediate result tuples for the outerjoin have to be built.
The second equivalence replaces a sequence of a join and a grouping by a
groupjoin. Given the notations of the previous subsection, the equivalence
ΓG;F (e1 BJ1 =J2 e2 ) ≡ ΠC (σc2 >0 (e1 ZJ1 =J2 ;F ◦(c2 :count(∗)) e2 )) (7.194)
1. G → G+ +
2 and G1 , G2 → TID(e1 ) hold in e1 BJ1 =J2 e2
2. J2 → G+
2 holds in e2 , and
3. F(F ) ⊆ A(e2 ).
The intuition behind these conditions is the same as for the previous equiva-
lence. The fourth condition could be omitted, since empty groups are eliminated
by the selection σc2 >0 . Eqv. 7.194 is beneficial under similar circumstances as
Eqv. 7.193.
Before we come to the proofs, let us have a look at some examples. Fig. 7.15
contains some relations. The results of some outerjoins (Ri Eq S) with two
different join predicates are given in Fig. 7.16. Since all tuples in some Ri
always find a join partner, the results of the outerjoins are the same as the
corresponding join results. We are now interested in the functional dependencies
occurring in the conditions of our main equivalences. Therefore, we discuss
four example instances of Eqv. 7.194, where at most one of the functional
dependencies is violated:
286 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
G → G+
2 G1 , G+
2 → TID(e1 ) J 2 → G+
2
1 + + +
2 + + -
3 + - +
4 - + +
The according instances of the left-hand and right-hand side of Eqv. 7.194 are:
LHS RHS
1 Γa;sum(d) (R1 Ea=c S) R1 Za=c;sum(d) S
2 Γa,e;sum(d) (R1 Ea=c S) R1 Za=c;sum(d) S
3 Γa;sum(d) (R2 Ea=c S) R2 Za=c;sum(d) S
3 Γa;sum(d) (R3 Eb=e S) R3 Zb=e;sum(d) S
G G1 G2 J2 G+2
1 {a} {a} ∅ {c} {c}
2 {a, e} {a} {e} {c} {c, e}
3 {a} {a} ∅ {c} {c}
4 {a} {a} ∅ {e} {e}
Taking a look at Fig. 7.17, we see that both sides of the equivalence give the
same result only if none of the functional dependencies is violated.
7.12. ELIMINATING REDUNDANT JOINS 287
Proof of Eqv. 7.193 We now give the proof of Eqv. 7.193. We start with
the right-hand side and transform it until we get the left-hand side:
The preconditions follow from collecting the preconditions of the different equiv-
alences applied. 2
Proof of Eqv. 7.194 Eqv. 7.194 follows directly from Eqv. 7.193. An alter-
native is to modify the above proof by using Eqv. 7.187 instead of Eqv. 7.183
and Eqv. 7.117 instead of Eqv. 7.137.
Remark Often, we encounter expression of the form ΓG;F (e1 ) ZJ1 =J2 e2 . If
G = J1 , the hash table for the grouping can be reused by the groupjoin. Simi-
larily, if G ⊇ J1 , any sorting produced to perform a sort-based grouping can be
reused for a a sort-based groupjoin.
where Ei is defined as
ΓA(ei );ci :count(∗) (ei )
for i = 1, 2.
ΠD
A(e1 ) (e1 BA1 =A2 e2 ) ≡ e1
holds whenever the second condition is fulfilled. Outerjoins are also easier. The
equivalence
e1 Bq e2 ≡ e1 Bq (e2 Nq e1 ) (7.197)
e1 Nq e2 ≡ e1 Nq (e2 Nq e1 ) (7.198)
e1 Tq e2 ≡ e1 Tq (e2 Nq e1 ) (7.199)
e1 Eq e2 ≡ e1 Eq (e2 Nq e1 ) (7.200)
e1 Zq;g:e e2 ≡ e1 Zq;g:e (e2 Nq e1 ) (7.201)
F(pb ) ∩ A(e2 ) = ∅ e1 e2 e1 e3
F(pb ) ∩ A(e1 ) = ∅ e2 e3 e1 e3
assoc (oa , ob )
comm (ob )
(e2 oa12 e1 ) ob13 e3 e3 ob13 (e2 oa12 e1 )
comm (ob )
(e1 oa12 e2 ) ob13 e3 e3 ob13 (e1 oa12 e2 )
comm (oa )
(e1 ob13 e3 ) oa12 e2 e2 oa12 (e1 ob13 e3 )
comm (oa )
(e3 ob13 e1 ) oa12 e2 e2 oa12 (e3 ob13 e1 )
assoc (ob , oa )
applied for left nesting but not both, and either associativity or r-asscom can
be applied for right-nesting but not both.
Fig. 7.19 shows an example of the seach space for an expression (e1 ◦a12 e2 )◦b13
e3 , where the subscripts of the operators indicate which arguments are refer-
enced in their predicate. We observe that any expression in this search space
can be reached by a sequence of at most two applications of commutativity, at
most one application of associativity, l-asscom, or r-asscom, finally followed by
at most two applications of commutativity. The total number of applications
of commutativity can be restricted to 2. The case (e1 ◦a12 e2 ) ◦b23 e3 is left to the
reader.
The last observation only holds if there are no degenerate predictates and
no cross products in the original plan. Fig. 7.20 shows all possible plans for
two binary operators ◦a and ◦b . One can think of them as cross products. The
plans are generated by applying assoc, l-asscom, r-asscom, and commutativity
rewrites. Assume that the initial plan is the one in row 1 and column 3. The
other plans in the first row are generated by using all rewrites but commuta-
tivity. The second row shows the plans derived from the plan above them by
7.15. CORRECT AND COMPLETE EXPLORATION OF THE CORE SEARCH SPACE291
◦b ◦a ◦b ◦a ◦b ◦a
e2 ◦a e1 ◦b ◦a e3 ◦b e2 e1 ◦a e3 ◦b
e3 e1 e3 e2 e2 e1 e3 e1 e2 e3 e2 e1
◦b ◦a ◦b ◦a ◦b ◦a
◦a e2 ◦b e1 e3 ◦a e2 ◦b ◦a e1 ◦b e3
e1 e3 e2 e3 e1 e2 e1 e3 e3 e2 e1 e2
◦b ◦a ◦b ◦a ◦b ◦a
◦a e2 ◦b e1 e3 ◦a e2 ◦b ◦a e1 ◦b e3
e3 e1 e3 e2 e2 e1 e3 e1 e2 e3 e2 e1
applying commutativity to the lower operator. The third row applies commu-
tativity to the top operator of the plan above it in the first row. The fourth row
applies commutativity to both operators. Thus, all plans in a column below
a plan in the first row can be generated by at most two applications of com-
mutativity. Of course, there are more possibilities to transform one plan into
another. In order to indicate them, let us denote the matrix of plans by P . The
application of transformations other than commutativity gives:
P [2, i] ←→ P [3, i + 1]
P [3, i] ←→ P [2, i + 1]
P [4, i] ←→ P [4, i + 1]
P [1, 1] ←→ P [4, 6]
P [2, 1] ←→ P [3, 6]
P [3, 1] ←→ P [3, 6]
It is easy to see, that we need more than one of assoc, l-asscom, or r-asscom to
get from P [1, 3] to, e.g., P [1, 1].
7.15.2 Exploration
How does the plan generator explore this search space? Remember the join or-
dering algorithms from Chapter 3, especially DPsub, DPsize, and DPccp, which
are all based on dynamic programming. We extend the simple algorithm DPsub
to one called DPsube. The resulting code is shown in Fig. ??. As input it takes
the set of n relations R = {R0 , . . . , Rn−1 } and the set of operators O containing
292 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES
Preliminaries
In order to open our approach for new algebraic operators, we use a table driven
approach. We use four tables which contain the properties of the algebraic
operators. These contain the information of Tables 7.6 and 7.7 together with
the information about the commutativity of the operators. Thus, extending
our approach only requires to extend these tables.
We develop our final approach in three steps. At each step, we present a
complete bundle consisting of three components:
1. a representation for conflicts
Algorithm DPsube
for all Ri ∈ R
BestPlan({Ri }) = Ri ;
for 1 ≤ i < 2n − 1 ascending
S = {Rj ∈ R|(bi/2j c mod 2) = 1}
if (|S| = 1) continue
for all S1 ⊂ S, S1 6= ∅ do
S2 = S \ S1 ;
for all ◦ ∈ O do
if (applicable(◦, S1 , S2 ))
build and handle the plans BestPlan(S1 ) ◦ BestPlan(S2 )
if (◦ is commutative)
build and handle the plans BestPlan(S2 ) ◦ BestPlan(S1 )
return BestPlan(R);
set of relations that must be present before the operator can be applied. Some-
times, SES is called NEL.
For every operator ◦, SES(◦) is thus a set of relations. Then, a plan of the
form plan(S1 )◦plan(S2 ) is only considered if the test SES(◦) ⊆ S1 ∪S2 succeeds.
Hence, SES checks for a consumer/producer relationships.
Some operators like the groupjoin or map operator introduce new attributes.
These are treated as if they belong to a new artificial relation. This new relation
is present in the set of accessible relations after the groupjoin or map operator
has been applied.
We assume that an initial operator tree is given and refer to it as the operator
tree. We need some notation. For a set of attributes A, we denote by REL(A)
the set of relations to which these attributes belong. We abbreviate REL(F(e))
by FT (e). Let ◦ be an operator in the initial operator tree. We denote by left(◦)
(right(◦)) its left (right) descendants. STO(◦) denotes the operators contained in
the operator subtree rooted at ◦. REL(◦) denotes the set of relations contained
in the subtree rooted at ◦.
The syntactic eligibility set (SES) is used to express the syntactic con-
straints: all referenced attributes/relations must be present before an expression
can be evaluated. First of all, it contains the relations referenced by a predicate.
Further, as we also deal with table functions and dependent join operators as
well as groupjoins, we need the following extensions. Let R be a relation, T a table-valued function call, and ◦p any of our binary or unary operators except a groupjoin. Then we define:
SES(R) = {R}
SES(T) = {T}
SES(◦p) = (⋃_{R ∈ FT(p)} SES(R)) ∩ REL(◦p)
SES(gj_{p;a1:e1,...,an:en}) = (⋃_{R ∈ FT(p) ∪ FT(ei)} SES(R)) ∩ REL(gj)
Approach CD-A
Let us first consider a simple operator tree with only two operators. Take a look at the upper half of Fig. 7.22, which illustrates the application of associativity and l-asscom to some plan. In the case that associativity does not hold, we add REL(e1) to TES(◦b). This prevents the plan on the right-hand side of the arrow marked with assoc. It does not, however, prevent the plan on the right-hand side of the arrow marked with l-asscom. Similarly, adding REL(e2) to TES(◦b) does prevent the plan resulting from l-asscom but not the plan resulting from applying associativity. The lower part of Fig. 7.22 shows the actions needed if an operator is nested in the right argument. Again, we can precisely prevent the invalid plans.
The only problem we still have to solve is that a conflicting operator may occur deeper down the tree. This is possible since in general the ei are trees themselves. Some reordering could possibly move a conflicting operator up to the top of an argument subtree. We thus have to calculate the total eligibility sets bottom-up. In a first step, for every operator ◦ in a given operator tree, SES(◦) is calculated. Then, TES(◦) is initialized to SES(◦). After that, the following procedure is applied bottom-up to every operator ◦b (with predicate pb) in the operator tree:
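The pseudocode for this bottom-up pass, here called CD-A, follows directly from the updates shown in Fig. 7.22 and mirrors the structure of CD-B given below (the formulation is a sketch derived from these entries):

CD-A(◦b with predicate pb)
for all ◦a ∈ STO(left(◦b))
  if ¬assoc(◦a, ◦b) then TES(◦b) ∪= REL(left(◦a))
  if ¬l-asscom(◦a, ◦b) then TES(◦b) ∪= REL(right(◦a))
for all ◦a ∈ STO(right(◦b))
  if ¬assoc(◦b, ◦a) then TES(◦b) ∪= REL(right(◦a))
  if ¬r-asscom(◦a, ◦b) then TES(◦b) ∪= REL(left(◦a))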
For plans of the form (e1 ◦a e2) ◦b e3:
  (e1 ◦a e2) ◦b e3  →assoc  e1 ◦a (e2 ◦b e3):      if ¬assoc(◦a, ◦b) then TES(◦b) ∪= REL(e1)
  (e1 ◦a e2) ◦b e3  →l-asscom  (e1 ◦b e3) ◦a e2:   if ¬l-asscom(◦a, ◦b) then TES(◦b) ∪= REL(e2)
For plans of the form e3 ◦b (e1 ◦a e2):
  e3 ◦b (e1 ◦a e2)  →assoc  (e3 ◦b e1) ◦a e2:      if ¬assoc(◦b, ◦a) then TES(◦b) ∪= REL(e2)
  e3 ◦b (e1 ◦a e2)  →r-asscom  e1 ◦a (e3 ◦b e2):   if ¬r-asscom(◦a, ◦b) then TES(◦b) ∪= REL(e1)
Figure 7.22: Calculating TES for simple operator trees
If we do not have degenerate predicates and cross products among the operators
in the initial operator tree, we can safely use TES instead of REL.
The conflict representation comprises the TES for every operator. The definition of applicable is
applicable(◦, S1, S2) :⇔ tesl(◦) ⊆ S1 ∧ tesr(◦) ⊆ S2,
where tesl(◦) := TES(◦) ∩ REL(left(◦)) and tesr(◦) := TES(◦) ∩ REL(right(◦)).
Let us now see why applicable is correct. We have to show that it prevents the generation of bad plans. Take the ¬assoc case with nesting on the left.
Let the original operator tree contain (e1 ◦a12 e2 ) ◦b23 e3 . Define the set of tables
R2 := FT (◦b23 ) ∩ REL(left(◦b23 )) and R3 := FT (◦b23 ) ∩ REL(right(◦b23 )). Then
SES(◦b23) = R2 ∪ R3. Further, since ¬assoc(◦a12, ◦b23), we have TES(◦b23) ⊇ SES(◦b23) ∪ REL(e1).
[Figure 7.23: an initial plan over R0, . . . , R3 containing N0,1 and E2,3, together with the two valid plans Plan 1 and Plan 3 that are not generated.]
Note that we used ⊇ and not equality since, due to other conflicts, TES(◦b) could be larger. Next, we observe that applicable demands
tesl(◦b23) ⊆ S1 and tesr(◦b23) ⊆ S2,
and thus fails if S1 ⊉ REL(e1). Thus, neither e2 ◦b23 e3 nor e3 ◦b23 e2 will be generated and, hence, e1 ◦a12 (e2 ◦b23 e3) will not be generated. Similarly, if ¬l-asscom(◦a, ◦b), tesl(◦b) will contain REL(e2) and the test prevents the generation of e1 ◦b e3. The remaining two cases can be checked analogously.
From this discussion, it follows that DPsube generates only valid plans. How-
ever, it does not generate all valid plans. It is thus incomplete, as we can see
from the example shown in Fig. 7.23. Since ¬assoc(N, E), TES(E) contains R1. Thus, neither Plan 1 nor Plan 3 nor any of the plans derived from them by applying join commutativity will be generated.
Approach CD-B
In order to avoid this problem, we need the more flexible mechanism of conflict
rules. A conflict rule is simply a pair of sets of tables denoted by T1 → T2 .
With every operator node ◦ in the operator tree, we associate a set of conflict
rules. Thus, our conflict representation now associates with every operator a
TES and a set of conflict rules.
Before we introduce their construction, let us illustrate their role in applicable(S1, S2). A conflict rule T1 → T2 is obeyed for S1 and S2 if, with S = S1 ∪ S2, the following condition holds:
T1 ∩ S ≠ ∅ ⟹ T2 ⊆ S.
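With relation sets encoded as bitmasks, the obeyed test and its use within applicable can be sketched in C++ as follows; the type and member names are illustrative only:

#include <cstdint>
#include <vector>

using BitSet = std::uint64_t;

struct ConflictRule { BitSet t1, t2; };  // the rule T1 -> T2

// T1 ∩ S ≠ ∅ implies T2 ⊆ S
inline bool obeyed(const ConflictRule& r, BitSet S) {
  return (r.t1 & S) == 0 || (r.t2 & ~S) == 0;
}

// conflict representation of one operator
struct Conflicts { BitSet tesl, tesr; std::vector<ConflictRule> rules; };

bool applicable(const Conflicts& c, BitSet S1, BitSet S2) {
  if ((c.tesl & ~S1) != 0 || (c.tesr & ~S2) != 0) return false;
  const BitSet S = S1 | S2;
  for (const ConflictRule& r : c.rules)
    if (!obeyed(r, S)) return false;
  return true;
}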
For plans of the form (e1 ◦a e2) ◦b e3:
  →assoc  e1 ◦a (e2 ◦b e3):      if ¬assoc(◦a, ◦b) then CR(◦b) += REL(e2) → REL(e1)
  →l-asscom  (e1 ◦b e3) ◦a e2:   if ¬l-asscom(◦a, ◦b) then CR(◦b) += REL(e1) → REL(e2)
For plans of the form e3 ◦b (e1 ◦a e2):
  →assoc  (e3 ◦b e1) ◦a e2:      if ¬assoc(◦b, ◦a) then CR(◦b) += REL(e1) → REL(e2)
  →r-asscom  e1 ◦a (e3 ◦b e2):   if ¬r-asscom(◦a, ◦b) then CR(◦b) += REL(e2) → REL(e1)
Figure 7.24: Calculating conflict rules for simple operator trees
Thus, if S contains a relation from T1, then S must contain all relations in T2. Keeping this in mind, it is easy to see that the invalid plans are indeed prevented by the rules shown in Fig. 7.24 if these rules are obeyed.
As before, we just need to generalize it to arbitrary trees:
CD-B(◦b_pb) // operator ◦b and its predicate pb
for all ◦a ∈ STO(left(◦b))
  if ¬assoc(◦a, ◦b) then CR(◦b) += REL(right(◦a)) → REL(left(◦a))
  if ¬l-asscom(◦a, ◦b) then CR(◦b) += REL(left(◦a)) → REL(right(◦a))
for all ◦a ∈ STO(right(◦b))
  if ¬assoc(◦b, ◦a) then CR(◦b) += REL(left(◦a)) → REL(right(◦a))
  if ¬r-asscom(◦a, ◦b) then CR(◦b) += REL(right(◦a)) → REL(left(◦a))
[Figure 7.25: the initial plan R0 B0,1 ((R1 B1,2 R2) N1,3 R3) and the valid plan (R0 B0,1 (R1 N1,3 R3)) B1,2 R2, which is prevented.]
The latter rule prevents the plan on the right-hand side of Fig. 7.25. Note that it is overly careful since R2 ∉ FT(N1,3). In fact, r-asscom would never be applied in this example, since B0,1 accesses table R1 and applying r-asscom would thus destroy the consumer/producer relationship already checked by SES(B0,1).
Approach CD-C
The approach CD-C differs from CD-B only by the calculation of the conflict
rules. The conflict representation and the procedure for applicable remain the
same. The idea is now to learn from the above example and include only those relations under operator ◦a that occur in the predicate. However, we have to be careful to include special cases for degenerate predicates and cross products.
if ¬assoc(◦b, ◦a) then
  if REL(right(◦a)) ∩ FT(◦a) ≠ ∅ then
    CR(◦b) += REL(left(◦a)) → REL(right(◦a)) ∩ FT(◦a)
  else
    CR(◦b) += REL(left(◦a)) → REL(right(◦a))
if ¬r-asscom(◦a, ◦b) then
  if REL(left(◦a)) ∩ FT(◦a) ≠ ∅ then
    CR(◦b) += REL(right(◦a)) → REL(left(◦a)) ∩ FT(◦a)
  else
    CR(◦b) += REL(right(◦a)) → REL(left(◦a))
Rule Simplification
Large TES make the search space to be explored by the plan generator smaller and thus lead to more efficiency, at least if an advanced plan generator like DPhyp is used. Further, reducing the number of rules slightly decreases plan generation time. Thus, laws like
R1 → R2, R1 → R3 ≡ R1 → R2 ∪ R3
R1 → R2, R3 → R2 ≡ R1 ∪ R3 → R2
can be used to rearrange the rule set for efficient evaluation. However, we are much more interested in eliminating rules altogether by adding their right-hand side to the TES. For some operator ◦, consider a conflict rule R1 → R2. If R1 ∩ TES(◦) ≠ ∅, then we can add R2 to TES(◦) due to the existential quantifier on the left-hand side of a rule in the definition of obeyed. Further, if R2 ⊆ TES(◦), we can safely eliminate the rule. Applying these rearrangements is often possible since both REL(left(◦a)) ∩ FT(◦) and REL(right(◦a)) ∩ FT(◦) will be non-empty.
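Reusing the bitset types from the sketch above, the two simplification steps can be coded as a small fixpoint loop (again only a sketch):

#include <cstddef>
#include <vector>

void simplify(BitSet& tes, std::vector<ConflictRule>& rules) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < rules.size(); ) {
      if ((rules[i].t1 & tes) != 0) {            // R1 ∩ TES ≠ ∅: absorb R2
        tes |= rules[i].t2;
        rules.erase(rules.begin() + i);
        changed = true;
      } else if ((rules[i].t2 & ~tes) == 0) {    // R2 ⊆ TES: rule redundant
        rules.erase(rules.begin() + i);
      } else {
        ++i;
      }
    }
  }
}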
CD-C
for each unary operator ◦b
  for each unary operator ◦a ∈ STO(◦b)
    if ¬reorderable(◦a, ◦b)
      TES(◦b) += AREL(◦a)
[Figure 7.26: (a) a binary operator ◦a below an operator ◦b, which may be a selection σ that can be turned into a join (the two dotted lines are explained below); (b), (c) a unary operator ◦a below a binary operator ◦b.]
Let us first consider the case where a binary operator ◦a can be found somewhere below a unary operator ◦b. This is illustrated in Fig. 7.26 a. (Do not be confused by the two dotted lines; they will be used later on. Just imagine a single line connecting ◦b with ◦a.) If ◦b is left- and right-pushable into ◦a, we do not have any conflict. If ◦b is neither left- nor right-pushable into ◦a, any valid plan must contain ◦b above ◦a. This is achieved by extending the TES of ◦b by all relations below ◦a. Consider the case where ◦b is not right-pushable. Then, we must prevent any plan where ◦b occurs in the right subtree of ◦a. To this end, we add a conflict rule to ◦b which says that if any relation from ◦a's right subtree occurs in the current plan to which we want to add ◦b, then the plan must also contain all relations from ◦a's left subtree. The other case is symmetric. We summarize these ideas in the following extension to CD-C:
CD-C
for all unary operators ◦b in the original operator tree
  for all binary operators ◦a ∈ STO(◦b)
    if ¬left-pushable(◦b, ◦a) ∧ right-pushable(◦b, ◦a)
      CR(◦b) += REL(left(◦a)) → REL(right(◦a))
    if left-pushable(◦b, ◦a) ∧ ¬right-pushable(◦b, ◦a)
      CR(◦b) += REL(right(◦a)) → REL(left(◦a))
    if ¬left-pushable(◦b, ◦a) ∧ ¬right-pushable(◦b, ◦a)
      TES(◦b) += REL(◦a)
Now, we consider the case where a unary operator ◦a can be found some-
where below a binary operator ◦b (see Fig. 7.26 b,c). In this case, if it cannot
be pulled up, we prevent this by adding the artificial relation AREL of ◦b to
the TES of ◦a :
CD-C
for all binary operators ◦b in the original operator tree
A selection operator can be changed into a join if its predicate references two
or more relations. In this case, a conflict between the resulting join and some
other binary operator might occur. We can handle these potential conflicts as
follows. Consider Fig. 7.26 a. By ◦b /σ/B we denote our selection that can be
turned into a join. By ◦a /E we denote a binary operator below our selection.
The case that it might be a left outerjoin is used in a subsequent example.
The Figure shows the trick we perform. We assume that a selection that can be turned into a join has two arguments, a left and a right subtree, both of which point to the (only) child node of the selection. Thus, the left outerjoin is once the left child of the selection/join and once the right one. Then, the usual CD-C procedure can be run in addition to the above conflict handling. Let us do so for the example. In case we treat the left outerjoin as the left child of the selection/join, we derive from ¬assoc(E, B) the conflict rule
CR(B) += REL(right(E)) → REL(left(E)),
possibly with ∩ FT(◦a) on the right-hand side. In the other case, we get, due to the fact that ¬r-asscom(B, E), the conflict rule
CR(B) += REL(right(E)) → REL(left(E)),
possibly with ∩ FT(◦a) on the right-hand side. In any case, both conflicts result
in the same conflict rule. Further, both are subsumed by the above conflict
handling for the unary/binary operator mix. The reader should validate that
this is the case for all the operators in our algebra. However, since we want to
be purely table driven, we simply add these (redundant) conflict rules and rely
on rule simplification.
Note that in order to prevent this plan, we would have to detect conflicts on the "other side" of the plan. In our example, we need to consider conflicts between operators in the left and the right subtree of B1,3. Since cross products and degenerate predicates should be rare in real queries, it suffices to produce correct plans; we have no ambition to explore the complete search space. Thus, we just want to make sure that in these abnormal cases the plan generator still produces a correct plan. In order to do so, we proceed as follows. We extend
the conflict representation by two bitvectors representing the left and the right
relations of an operator. Let us call them relLeft and relRight. Then, we extend
the applicable test and check that at least one relation from relLeft occurs in the
left subplan, and at least one relation from relRight occurs in the right subplan.
That is, in the test for applicable(◦, S1, S2), we conjunctively check that relLeft ∩ S1 ≠ ∅ and relRight ∩ S2 ≠ ∅.
This results in a correct test but, as experiments have shown, about a third of the valid search space will not be explored if cross products are present in the initial operator tree. However, note that if the initial plan contains neither cross products nor degenerate predicates, this test will always succeed, so that in this case still the whole core search space is explored. Further, still a larger portion of the core search space is explored when comparing this approach to the one by Rao et al. [703, 704]. There, two separate runs of the plan generator for the arguments of a cross product hinder any reordering of operators with cross products.
There is a second issue concerning cross products. In some rare cases, it might be beneficial to introduce them, even if the initial plan does not demand them. In this case, we can proceed as proposed by Rao et al. [703, 704]. For each relation R, a companion set is calculated which contains all relations that are connected to R only by inner join predicates. Within a companion set, all join orders and introductions of cross products are valid.
It is rather simple to incorporate our test into algorithms other than DPsub (e.g., DPsize, DPccp, TDxxx). However, the result is not necessarily efficient. An efficient approach is discussed in Chapter ??, where we generalize DPccp to cover hypergraphs. To see why this is appropriate, observe that (tesl(◦), tesr(◦)) is a hyperedge.
At the beginning, we talked about the core search space. Why core? Because there are more equivalences which we would like to be considered by the plan generator. First of all, early grouping can significantly improve performance. Then, some equivalences with operator conversions (e.g., Eqvs. ??, 7.94, 7.95, 7.193, and 7.194) are also important. These cases require some special treatment, which is discussed in Chapter ??.
7.16.1 Introduction
The algebra (NAL) we use here extends the SAL-Algebra [69] developed by
Beeri and Tzaban. SAL is the order-preserving counterpart of the algebra used
in [185, 187] and in this book.
SAL and NAL work on sequences of sets of variable bindings, i.e., sequences
of unordered tuples where every attribute corresponds to a variable. We allow
nested tuples, i.e. the value of an attribute may be a sequence of tuples. Single
tuples are constructed by using the standard [·] brackets. The concatenation
of tuples and functions is denoted by ◦. The set of attributes defined for an
expression e is defined as A(e). The set of free variables of an expression e is
defined as F(e).
The projection of a tuple on a set of attributes A is denoted by |A . For an
expression e1 possibly containing free variables, and a tuple e2 , we denote by
e1 (e2 ) the result of evaluating e1 where bindings of free variables are taken from
variable bindings provided by e2 . Of course this requires F(e1 ) ⊆ A(e2 ). For a
set of attributes A, we define the tuple constructor ⊥A such that it returns a
tuple with attributes in A initialized to NULL.
For sequences e we use α(e) to denote the first element of a sequence. We
identify single element sequences and elements. The function τ retrieves the
tail of a sequence, and ⊕ concatenates two sequences. We denote the empty
sequence by ε. As a first application, we construct from a sequence of non-tuple values e a sequence of tuples denoted by e[a]. It is empty if e is empty.
Otherwise, e[a] = [a : α(e)] ⊕ τ (e)[a].
By id we denote the identity function. In order to avoid special cases during
the translation of XQuery into the algebra, we use the special algebraic operator
(2̂) that returns a singleton sequence consisting of the empty tuple, i.e., a tuple
with no attributes.
We will only define order-preserving algebraic operators. For the unordered
counterparts see [187]. Typically, when translating a more complex XQuery into
our algebra, a mixture of order-preserving and not order-preserving operators
will occur. In order to keep the section readable, we only employ the order-
preserving operators and use the same notation for them that has been used in
[185, 187], SAL [69], and this book.
Again, our algebra will allow nesting of algebraic expressions. For example,
within a selection predicate of a select operator we allow the occurrence of
further nested algebraic expressions. Hence, a join within a selection predicate is
possible. This simplifies the translation procedure of nested XQuery expressions
into the algebra. However, nested algebraic expressions force a nested loop
evaluation strategy. Thus, the goal of this section is to remove nested algebraic expressions. As a result, we perform unnesting of nested queries not at the source level, but at the algebraic level. This approach is more versatile and less error-prone.
e1 := R1    e2 := R2    e3 := χ̂_{a:σ̂_{a1=a2}(e2)}(e1)
a1          a2  b       a1  a
1           1   2       1   ⟨[1, 2], [1, 3]⟩
2           1   3       2   ⟨[2, 4], [2, 5]⟩
3           2   4       3   ⟨⟩
            2   5
The order-preserving cross product is defined as
e1 ×̂ e2 := ε if e1 = ε, and e1 ×̂ e2 := (α(e1) Â e2) ⊕ (τ(e1) ×̂ e2) else, where
e1 Â e2 := ε if e2 = ε, and e1 Â e2 := (e1 ◦ α(e2)) ⊕ (e1 Â τ(e2)) else.
The order-preserving join is then
e1 B̂p e2 := σ̂p(e1 ×̂ e2).
The left outer join, which will play an essential role in unnesting, is defined as
e1 Ê^{g:e}_p e2 := (α(e1) B̂p e2) ⊕ (τ(e1) Ê^{g:e}_p e2)   if (α(e1) B̂p e2) ≠ ε,
e1 Ê^{g:e}_p e2 := (α(e1) ◦ ⊥_{A(e2)\{g}} ◦ [g : e]) ⊕ (τ(e1) Ê^{g:e}_p e2)   else,
where g ∈ A(e2 ). Our definition deviates slightly from the standard left outer
join operator, as we want to use it in conjunction with grouping and (aggregate)
functions. Consider the relations R1 and R2 in Figure 7.28. If we want to join
R1 (via a left outerjoin) with e3 , which is grouped on a2 , we need to be able
to handle empty groups (as for the tuple with a1 = 3 in e1 in the example). In
the definition of the left outerjoin with default, the expression e then defines
the value given to the attribute g for all those elements in e1 that do not find a join partner in e2. In our example, we would specify Ê^{g:0}.
We define the dependency join (d-join for short) as
e1 ⟨̂ e2 ⟩̂ := ε   if e1 = ε,
e1 ⟨̂ e2 ⟩̂ := (α(e1) Â e2(α(e1))) ⊕ (τ(e1) ⟨̂ e2 ⟩̂)   else.
Let θ ∈ {=, ≤, ≥, <, >, ≠} be a comparison operator on atomic values. The grouping operator, which produces a sequence-valued new attribute containing "the group", is defined by using a groupjoin. Here, G(x) := f(σ̂_{x|A1 θ A2}(e2)), and the function f assigns a meaningful value to
empty groups. See also Figure 7.28 for an example. The unary grouping oper-
ator processes a single relation and obviously groups only on those values that
are present. The groupjoin works on two relations and uses the left-hand one
to determine the groups. This will become important for the correctness of the
unnesting procedure.
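For θ being '=' and f = id, the behavior of the groupjoin on the data of Figure 7.28 can be illustrated by the following C++ sketch; the struct names are invented for the example:

#include <vector>

struct T1 { int a1; };
struct T2 { int a2, b; };
struct Out { int a1; std::vector<T2> g; };

// e1 Ẑ(a1=a2; g:id) e2: order-preserving; the groups are determined by e1,
// so a tuple of e1 without join partners keeps an empty group (cf. a1 = 3)
std::vector<Out> groupjoin(const std::vector<T1>& e1,
                           const std::vector<T2>& e2) {
  std::vector<Out> result;
  for (const T1& t1 : e1) {
    Out o{t1.a1, {}};
    for (const T2& t2 : e2)
      if (t2.a2 == t1.a1) o.g.push_back(t2);
    result.push_back(o);
  }
  return result;
}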
e1 := R1    e2 := R2    e3 := Γ̂_{a2;g:count}(R2)    e4 := Γ̂_{=a2;g:id}(R2)
a1          a2  b       a2  g                       a2  g
1           1   2       1   2                       1   ⟨[1, 2], [1, 3]⟩
2           1   3       2   2                       2   ⟨[2, 4], [2, 5]⟩
3           2   4
            2   5

e5 := R1 Ẑ_{a1=a2;g:id}(R2)
a1  g
1   ⟨[1, 2], [1, 3]⟩
2   ⟨[2, 4], [2, 5]⟩
3   ⟨⟩

Figure 7.28: Examples for grouping (Γ̂) and groupjoin (Ẑ)
µ̂g(e) := ε   if e = ε,
µ̂g(e) := (α(e)|_{A(e)\{g}} ×̂ α(e).g) ⊕ µ̂g(τ(e))   else,
where e.g retrieves the sequence of tuples of attribute g. In case that g is empty,
it returns the tuple ⊥A(e.g) . (In our example in Figure 7.28, µ̂g (e4 ) = e2 .)
This operator is mainly used for evaluating XPath expressions. Since this is a very complex issue [332, 334, 410], we do not delve into optimizing XPath evaluation but instead take an XPath expression occurring in a query as it is and use it in the place of e2. Optimized translation of XPath is orthogonal to our unnesting approach and not covered here. The interested reader is referred to [410, 411].
7.16.3 Equivalences
To acquaint the reader with ordered sequences, we state some familiar equiva-
lences that still hold.
Of course, in the above equivalences the usual restrictions hold. For ex-
ample, if we want to push a selection predicate into the left part of a join,
it may not reference attributes of the join’s right argument. In other words,
F(p1 ) ∩ A(e2 ) = ∅ is required. As another example, Eqv. 7.214 only holds if
F(e1 ) ∩ A(e1 ) = ∅. In Eqv. 7.213, the function f may not alter the schema, and
b must be an attribute name. Please note that cross product and join are still
associative in the ordered context. However, neither of them is commutative.
Further, pushing selections into the second argument of a left-outer join is (in
general) not possible. For strict predicates we can do better, but this is beyond
the scope of the book.
7.16.4 Bibliography
Zaniolo [528]
7.17 Literature
• Bags, Sets, boolean algebras: [615]
• NF2 : [4, 200, 425, 733, 734, 537, 735, 538, 766]
• HAS: [128]
• OO Algebra [169]
• OO Algebra [184]
• OO Algebra [374]
• OO Algebra [552]
• OO Algebra [748]
• OO Algebra [951]
• SAL [69]: works on lists. Intended for semistructured data. SAL can be
thought of as the order-preserving counterpart of the algebra presented
in [185, 187] extended to handle semistructured data. These extensions
are similar to those proposed in [5, 177]
• TAX [455]: The underlying algebra’s data model is based on sets of or-
dered labeled trees. Intended for XML.
• [363]
• Geo: [379]
7.18 ToDo
ToDo: Grouping, Mapping, and Commutativity. Comment on Π_A(R1 ∩ R2) ≢ Π_A(R1) ∩ Π_A(R2) and Π_A(R1 \ R2) ≢ Π_A(R1) \ Π_A(R2).
[704]
pregrouping Tsois, Sellis: [869]
bulktypes: Albert: [21]
bulktypes: Dayal: [217]
Chapter 8
Declarative Query
Representation
8.2 Datalog
8.5 Expressiveness
transitivity: [678]. aggregates: [494]. complex object and nested relations: [3].
8.6 Bibliography
Chapter 9
Translation and Lifting
9.5 Bibliography
Chapter 10
Query Equivalence,
Containment, Minimization,
and Factorization
Consider, for example, the query
q(X, Y) :− p(X, Y), p(X, Y).
It is equivalent to
q(X, Y) :− p(X, Y)
under set semantics. The latter query now contains fewer body literals. Query
minimization now asks for an equivalent query with the least possible number
of body literals. One possible approach is to successively delete a body literal
until no more body literal can be deleted without violating equivalence to the
original query.
The above example is also illustrative since it shows that query equiva-
lence (and thus query containment) differs under different semantics: whereas
the above two queries are equivalent under set semantics, they are not under
bag semantics. To see this, consider the extensional database {p(a, b), p(a, b)}.
The result of the first query contains p(a, b) four times, whereas the result of the second query contains it only two times.
q1 : r1 :− l1, . . . , lk
q2 : r2 :− l′1, . . . , l′m
Let V(qi) be the set of variables occurring in qi, and C(qi) be the set of constants occurring in qi. Further, let h be a substitution h : V(q2) → (V(q1) ∪ C(q1)). We call h a containment mapping from q2 to q1 if and only if the following conditions are fulfilled:
1. h maps the head literal of q2 onto the head literal of q1, i.e., h(r2) = r1, and
2. for each body literal l′i in q2 there is a body literal lj in q1 such that h(l′i) = lj.
Note that the latter condition does not imply that h is injective or surjective.
The following theorem connects containment mappings with the containment problem: there exists a containment mapping from q2 to q1 if and only if q1 ⊆ q2.
A query q is minimal if it contains the minimal possible number of body literals. More formally, q is minimal if for any query q′ with q ≡ q′ the number of body literals in q′ is greater than or equal to the number of body literals in q. The following theorem shows that our initial thoughts on minimization are correct for conjunctive queries.
This suggests a simple procedure for minimizing a given query q. For every body literal, check whether some containment mapping h exists under which it is subsumed by some other body literal. Note that this containment mapping must not rename head variables.
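Since deciding the existence of a containment mapping is NP-complete, a simple backtracking search suffices conceptually. The following C++ sketch assumes a naive literal representation and the convention that arguments starting with an upper-case letter are variables; the head condition can be enforced by pre-seeding the mapping h with the required head-variable bindings:

#include <cctype>
#include <map>
#include <string>
#include <vector>

struct Literal { std::string pred; std::vector<std::string> args; };

static bool isVar(const std::string& s) {
  return !s.empty() && std::isupper(static_cast<unsigned char>(s[0]));
}

// try to map l2 (from q2) onto l1 (from q1), extending h consistently
static bool matches(const Literal& l2, const Literal& l1,
                    std::map<std::string, std::string>& h) {
  if (l2.pred != l1.pred || l2.args.size() != l1.args.size()) return false;
  for (std::size_t i = 0; i < l2.args.size(); ++i) {
    const std::string& a2 = l2.args[i];
    const std::string& a1 = l1.args[i];
    if (isVar(a2)) {
      auto it = h.find(a2);
      if (it == h.end()) h[a2] = a1;
      else if (it->second != a1) return false;
    } else if (a2 != a1) return false;  // constants map to themselves
  }
  return true;
}

// does some containment mapping send body2[i..] into body1, extending h?
static bool mapBody(const std::vector<Literal>& body2, std::size_t i,
                    const std::vector<Literal>& body1,
                    std::map<std::string, std::string> h) {
  if (i == body2.size()) return true;
  for (const Literal& l1 : body1) {
    auto h2 = h;  // backtrack by copying the partial mapping
    if (matches(body2[i], l1, h2) && mapBody(body2, i + 1, body1, h2))
      return true;
  }
  return false;
}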
Let q and q′ be two conjunctive queries. If q can be derived from q′ solely by reordering body literals and renaming variables, then q and q′ are called isomorphic. Minimal queries are unique up to some isomorphism. Obviously, minimizing conjunctive queries is also NP-complete.
Let us now come to unions of conjunctive queries. Let Q = Q1 ∪ . . . ∪ Qk and Q′ = Q′1 ∪ . . . ∪ Q′l be two unions of conjunctive queries Qi and Q′j with a common head predicate. A containment mapping h from Q to Q′ maps each Qi to some Q′j such that h(Qi) ⊆ Q′j. Sagiv and Yannakakis showed the following theorem [746].
1. R ≡ Q, and
2. ¬∃ R′ ⊂ R with R′ ≡ Q.
This corollary implies that we can minimize a query that is a union of con-
junctive queries by eliminating those conjunctive queries Qi from it that are
contained in some Qj .
For conjunctive queries, the problems of containment, equivalence, and minimization are all NP-complete.
The problems of containment, equivalence, and minimization of conjunctive queries are most difficult if all body literals have a common predicate p. This is quite an unrealistic assumption, as typical conjunctive queries do not exclusively self-join the same relation. A first question is thus whether there exist special cases where there are polynomial algorithms for containment checking. Another line of work is devoted to more complex queries. As it turns out, the results become less nice and more restricted.
Theorem 10.1.5 Assume the two conjunctive queries q1 and q2 are of the form
q1 : p1 :− l1, . . . , lk, e1, . . . , el
q2 : p2 :− l′1, . . . , l′m, e′1, . . . , e′n
where pi are the head literals, li and l′i are ordinary subgoals, and ei and e′i are inequalities. Let h be a containment mapping from q2 to q1 where both are restricted to their ordinary literals. If additionally for all i = 1, . . . , n we have
e1, . . . , el ⟹ h(e′i),
then q1 ⊆ q2.
This result is due to Klug [495], who used the following procedure to reason about inequalities using comparison operators in {=, <, ≤}. Given a set of inequalities L, a directed graph G is defined whose nodes are the variables and constants in L. For every x < y or x ≤ y in L, the edge (x, y) is added to G. For all constants c and c′ in L, if c < c′, then we add an edge (c, c′). Edges are labeled with the according comparison operator. For equality predicates, an edge in both directions is added. Given the graph G, we conclude that x ≤ y if there is a path from x to y, and x < y if additionally at least one edge on the path is labeled by <. An alternative is to use the procedure presented in Section 11.2.3 to solve the inequality inference problem. It also allows for the comparison operator ≠.
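A sketch of this graph-based inference in C++ (conclude returns whether x ≤ y or even x < y is derivable; all names are illustrative):

#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct IneqGraph {
  // node -> list of (successor, edge labeled '<')
  std::map<std::string, std::vector<std::pair<std::string, bool>>> adj;

  void addLe(const std::string& x, const std::string& y) { adj[x].push_back({y, false}); }
  void addLt(const std::string& x, const std::string& y) { adj[x].push_back({y, true}); }
  void addEq(const std::string& x, const std::string& y) { addLe(x, y); addLe(y, x); }

  // 0: nothing derivable, 1: x <= y, 2: x < y
  int conclude(const std::string& x, const std::string& y) const {
    std::set<std::pair<std::string, bool>> seen;
    std::vector<std::pair<std::string, bool>> stack{{x, false}};
    int best = 0;
    while (!stack.empty()) {
      auto [node, strict] = stack.back();
      stack.pop_back();
      if (!seen.insert({node, strict}).second) continue;
      if (node == y) best = std::max(best, strict ? 2 : 1);
      auto it = adj.find(node);
      if (it == adj.end()) continue;
      for (const auto& [next, lt] : it->second)
        stack.push_back({next, strict || lt});
    }
    return best;
  }
};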
To see why a dense domain is important consider the domain of integers.
From 1 < x < 3 we can easily conclude that x = 2, a fact we can derive
neither from the procedure above nor from the axioms and inference procedure
presented in Section 11.2.3.
10.3 Sequences
10.3.1 Path Expressions
We consider the following XPath constructs and their short-hands, which we use to denote XPath sublanguages:
• branching ('[]')
• the wildcard ('∗')
• the descendant axis ('//')
• disjunction ('|')
Otherwise, XPath only contains the child axis and node name tests. These
sublanguages are represented as tree patterns.
Query containment for certain subclasses:
• Por is in PTIME
• P[],or is coNP-complete
• P| is coNP-complete [595]
[632] showed that P[],∗,//,| is coNP-complete for infinite alphabets and in
PSPACE for finite alphabets.
• P//,| is PSPACE-complete
• P[],∗,// with variable binding and equality tests is Π^p_2-hard [231]
10.4 Minimization
minimization: [514]
10.6 Bibliography
In a pair of papers, Aho, Sagiv, and Ullman [15, 16] study equivalence, containment, and minimization problems for tableaux. More specifically, they introduce a restricted variant of relational expressions containing projection, natural join, and selection with predicates that only compare attributes with constants. They further assume the existence of a universal relation. That is, every relation R is the projection of the universal relation on A(R). Now, these restricted conjunctive queries can be expressed with tableaux. The authors study the tableaux equivalence, containment, and minimization problems also in the presence of functional dependencies. The investigated problems are all NP-complete. Since their practical usefulness is limited, we do not give the concrete results of this pair of papers.
[155, 158] contains (complexity) results for deciding query equivalence in
the case of recursive and nonrecursive datalog.
Part III
Rewrite Techniques
Chapter 11
Simple Rewrites
Constant subexpressions are evaluated and the result replaces the subexpression. For example, an expression 1/100 is replaced by 0.01. Other expressions like a − 10 = 50 can be rewritten to a = 60. However, the latter kind of rewrite is rarely performed by commercial systems.
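A minimal sketch of both rewrites on an invented expression-tree layout: constant subtrees are evaluated, and a comparison (x − c1) = c2 is normalized to x = c2 + c1, so that a − 10 = 50 becomes a = 60:

#include <memory>
#include <string>

struct Expr {
  enum Kind { Const, Column, Sub, Eq } kind = Const;
  double val = 0;               // for Const
  std::string name;             // for Column
  std::unique_ptr<Expr> l, r;   // for Sub and Eq
};

static std::unique_ptr<Expr> makeConst(double v) {
  auto e = std::make_unique<Expr>();
  e->kind = Expr::Const;
  e->val = v;
  return e;
}

void fold(std::unique_ptr<Expr>& e) {
  if (!e) return;
  fold(e->l);
  fold(e->r);
  // evaluate constant subexpressions (shown here for '-')
  if (e->kind == Expr::Sub &&
      e->l->kind == Expr::Const && e->r->kind == Expr::Const) {
    e = makeConst(e->l->val - e->r->val);
    return;
  }
  // (x - c1) = c2  ->  x = c2 + c1
  if (e->kind == Expr::Eq && e->l->kind == Expr::Sub &&
      e->l->r->kind == Expr::Const && e->r->kind == Expr::Const) {
    const double c = e->r->val + e->l->r->val;
    e->l = std::move(e->l->l);  // safe: child is released before its parent dies
    e->r = makeConst(c);
  }
}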
Eliminate Between
Eliminate IN
Eliminate LIKE
323
324 CHAPTER 11. SIMPLE REWRITES
Eliminate − and /
(x − y) → x + (−y)
x/y → x ∗ (1/y)
11.2.2 Equality
Equality is a reflexive, symmetric, and transitive binary relationship (see Fig. 11.2). Such a relation is called an equivalence relation. Hence, a set of conjunctively occurring equality predicates partitions the set of IUs into equivalence classes.
NOT true → false
NOT false → true
p AND true → p
p AND false → false
p OR true → true
p OR false → p

x = x
x = y ⟹ y = x
x = y ∧ y = z ⟹ x = z
11.2.3 Inequality
Table 11.1 gives a set of axioms used to derive new predicates from a set of
conjunctively occurring inequalities S (see [874], see Fig. 11.4).
Step 3 can be performed as follows. For any two IUs X and Y, we find those IUs Z with X ≤ Z ≤ Y. Then we check whether any two such Z's are related by ≠. Here, it is sufficient to check the original ≠ pairs in S and those derived in step 1.
A1: X ≤ X
A2: X < Y ⟹ X ≤ Y
A3: X < Y ⟹ X ≠ Y
A4: X ≤ Y ∧ X ≠ Y ⟹ X < Y
A5: X ≠ Y ⟹ Y ≠ X
A6: X < Y ∧ Y < Z ⟹ X < Z
A7: X ≤ Y ∧ Y ≤ Z ⟹ X ≤ Z
A8: X ≤ Z ∧ Z ≤ Y ∧ X ≤ W ∧ W ≤ Y ∧ W ≠ Z ⟹ X ≠ Y
Table 11.1: Axioms for deriving inequality predicates
11.2.4 Aggregation
select A1, . . . , Ak, a1, . . . , al
from R1, . . . , Rn
where pw
group by A1, . . . , Am
having ph
Even if we know that max(salary) > 100.000, the above query block is not
equivalent to
select deptNo, max(salary), min(salary)
from Employee
where salary > 100.000
group by deptNo
Neither is
select deptNo, max(salary)
from Employee
group by deptNo
having avg(salary) > 50.000
equivalent to
11.2.5 ToDo
[569]
11.6.1 Introduction
The growing importance of object-relational database systems (ORDBMS) [830]
has kindled a renewed interest in the efficient processing of set-valued attributes.
One particular problem in this area is the joining of two relations on set-valued
attributes [311, 414, 700]. Recent studies have shown that finding optimal join
algorithms with set-containment predicates is very hard [119]. Nevertheless, a
certain level of efficiency for joins on set-valued attributes is indispensable in
practice.
Obviously, brute force evaluation via a nested-loop join is not going to be
very efficient. An alternative is the introduction of special operators on the
physical level of a DBMS [414, 700]. Integration of new algorithms and data
structures on the physical level is problematic, however. On the one hand, this approach will surely result in tremendous speed-ups; on the other hand, this efficiency is purchased dearly: it is very costly to implement and integrate new algorithms robustly and reliably.
11.6.2 Preliminaries
In this section, we give an overview of the definition of the set type. Due to the deferral of set types to SQL-4 [287], we use a syntax similar to that of Informix. A possible example declaration of a table with a set-valued attribute is:
setID is the key of the relation, whereas content stores the actual set. The
components of a set can be any built-in or user-defined type. In our case we
used set<char(3)>, because we wanted to store 3-grams (see also Section ??).
We further assume that on set-valued attributes the standard set operations
and comparison operators are available.
Our rewriting method is based on unnesting the internal nested representa-
tion. The following view defining the unnested version of the above table keeps
our representation more concise:
where setID identifies the corresponding set, d takes on the different values in
content and card is the cardinality of the set. We also need unnest<char(3)>,
a table function that returns a set in the form of a relation. As unnest<char(3)>
returns an empty relation for an empty set, we have to consider this special case
in the second subquery of the union statement, inserting a tuple containing a
dummy value.
(The comparison with 0 is only needed for DB2, which does not understand the
type bool.)
This query can be rewritten as follows. The basic idea is to join the unnested version of the table based on the set elements, group the tuples by their
set identifiers, count the number of elements for every set identifier and com-
pare this number with the original counts. The filter predicate vn1.card <=
vn2.card discards some sets that cannot be in the result of the set-containment
join. We also consider the case of empty sets in the second part of the query.
Summarizing the rewritten query we get
The formulation of the unnested query is much simpler than the unnested
query in Section 11.6.3. Due to our view definition, not much rewriting is
necessary. We just have to take care of empty sets again, although this time in
a different, simpler way.
11.7 Bibliography
This section is based on the investigations by Helmer and Moerkotte [417]. There, we also find a performance evaluation indicating that the rewrites, depending on the relation sizes, result in speed-up factors between 5 and 50 even for moderately sized relations. Nevertheless, it is argued there that support for set-valued attributes must be built into the DBMS. A viable alternative to the rewrites presented here is the usage of special join algorithms for join predicates involving set-valued attributes [311, 413, 414, 571, 590, 591, 700]. Nevertheless, as has been shown by Cai, Chakaravarthy, Kaushik, and Naughton, dealing with set-valued attributes in joins is a theoretically (and of course practically) difficult issue [119]. Last, to efficiently support simple selection predicates on set-valued attributes, special index structures should be incorporated into the DBMS [415, 416, 418].
void
IU::addEqualityClassUnderThis(IU* aIU) {
  // merge the equivalence class of aIU into the one represented by this
  IU* lRepresentativeThis = this->getEqualityRepresentativeIU();
  IU* lRepresentativeArg  = aIU->getEqualityRepresentativeIU();
  // ... (rest of the merge)
}

void
IU::addEqualityPredicate(Compositing* p) {
  IU* lLeft  = p->leftIU();
  IU* lRight = p->rightIU();
  if (p->isEqualityPredicateIU() &&
      lLeft->getEqualityRepresentativeIU() !=
      lRight->getEqualityRepresentativeIU()) {   // merge distinct classes only
    if (lLeft->isBoundToConstantIU()) {
      // an IU bound to a constant is preferred as representative
      lLeft->addEqualityClassUnderThis(lRight);
    } else if (lRight->isBoundToConstantIU()) {
      lRight->addEqualityClassUnderThis(lLeft);
    } else if (lLeft->_equalityClassRank > lRight->_equalityClassRank) {
      // otherwise: union by rank
      lLeft->addEqualityClassUnderThis(lRight);
    } else {
      lRight->addEqualityClassUnderThis(lLeft);
    }
  }
}

Figure 11.3: Maintaining equivalence classes of IUs
View Merging
create view
However, there are a few pitfalls. This simple version of view merging can
only be applied to simple select-project-join queries not containing duplicate
elimination, set operations, grouping or aggregation. In these cases, complex
view merging must be applied.
but view resolution with a subsequent push-down of the predicate e.salary >
150.000 will result in
select e.eno, e.name
from ((select e1.eno, e1.name, e1.salary, e1.dno
       from Emp1[e1]
       where e1.salary > 150000)
      union all
      (select e2.eno, e2.name, e2.salary, e2.dno
       from Emp2[e2]
       where e2.salary > 150000)) [e]
the query
select *
from EmpStat[e]
where e.dno = 10
can be rewritten to
select e.dno, min(e.salary) minSal, max(e.salary) maxSal, avg(e.salary) avgSal
from Emp[e]
where e.dno = 10
group by e.dno
select d.name, s.avgSalary
from Dept[d], (select e.dno, avg(salary) as avgSalary
               from Emp[e]
               group by e.dno) [s]
where d.location = 'Paris' and
      d.dno = s.dno
This query can then be unnested using the techniques of Section ??.
Sometimes strange results occur. Consider for example the view
This is perfectly o.k. You just need to think twice about it. The resulting plan
will contain two group operations: XXX Plan
and a query asking for all those employees together with their salaries in Parisian
departments earning the minimum salary:
Note that the employee relation occurs twice. Scanning the employee relation twice can be avoided as follows:
12.5 Bibliography
Chapter 13
Quantifier treatment
13.1 Pseudo-Quantifiers
Again, the trick to rewrite subqueries with an ANY or ALL predicate is to apply aggregate functions [310]. A predicate of the form <ANY (select . . . ) can be rewritten to < (select max(. . . ) . . . ), and <ALL to < (select min(. . . ) . . . ).
In the above rewrite rules, the predicate < can be replaced by =, ≤, etc. If the
predicate is > or ≥ then the above rules are flipped. For example, a predicate
of the form >ANY becomes >select min and >ALL becomes >select max.
After the rewrites have been applied, the Type A or Type JA unnesting
techniques can be applied, depending on the details of the inner query block.
...
where exists (select ...
from ...
where ...)
It is equivalent to
...
where 0 < (select count(. . . )
from ...
where ...)
...
where not exists (select ...
from ...
where ...)
is equivalent to
...
where 0 = (select count(. . . )
from ...
where ...)
After these rewrites have been applied, the Type A or Type JA unnesting
techniques can be applied, depending on the details of the inner query block.
Case-No.  1        2        3        4          5         6         7         8
          p()      p()      p()      p()        p(e1)     p(e1)     p(e1)     p(e1)
          q()      q(e1)    q(e2)    q(e1,e2)   q()       q(e1)     q(e2)     q(e1,e2)
Case-No.  9        10       11       12         13        14        15        16
          p(e2)    p(e2)    p(e2)    p(e2)      p(e1,e2)  p(e1,e2)  p(e1,e2)  p(e1,e2)
          q()      q(e1)    q(e2)    q(e1,e2)   q()       q(e1)     q(e2)     q(e1,e2)
Table 13.1: Classification of queries with a universal quantifier
Q ≡ select e1
    from e1 in E1
    where for all e2 in (select e2
                         from e2 in E2
                         where p) :
          q
where p (called the range predicate) and q (called the quantifier predicate) are
predicates in a subset of the variables {e1 , e2 }. This query pattern is denoted
by Q.
In order to emphasize the (non-)occurrence of variables in a predicate p, we
write p(e1 , . . . , en ) if p depends on the variables e1 , . . . , en . Using this conven-
tion, we can list all the possible cases of variable occurrence. Since both e1 and
e2 may or may not occur in p or q, we have to consider 16 cases (see Table 13.1).
All cases but 12, 15, and 16 are rather trivial. Class 12 queries can be unnested
by replacing the universal quantifier by a division, set difference, anti-semijoin,
or counting. Class 15 queries are treated by set difference, anti-semijoin or
grouping with count aggregation. For Class 16 queries, the alternatives are set
difference, anti-semijoin, and grouping with count aggregation. In all cases,
special care has to be taken regarding NULL values. For details see [180].
select al.name
from al in Airline
where for all ap in (select ap
from ap in Airport
where apctry = ’USA’):
ap in al.lounges
Define U ≡ π_ap(σ_{apctry='USA'}(Airport[ap, apctry])). Then the three alternative algebraic expressions equivalent to this query are the following. The first is
if U = ∅ then Airline[name] else µ_{ap:lounges}(Airline[name, lounges]) ÷ U.
This plan is only valid if the projected attributes of Airline form a superkey.
if_{σ_{p(e2)}(E2[e2]) ≠ ∅}((E1[e1] B_{q(e1,e2)} E2[e2]) ÷ σ_{p(e2)}(E2[e2]), E1[e1])
In case the selection σ_{p(e2)}(E2[e2]) yields at least one tuple or object, we can apply the predicate p to the dividend, as in
if_{σ_{p(e2)}(E2[e2]) ≠ ∅}((E1[e1] B_{q(e1,e2)} σ_{p(e2)}(E2[e2])) ÷ σ_{p(e2)}(E2[e2]), E1[e1]).
if_{σ_{p(e2)}(E2[e2]) ≠ ∅}((µ_{e2:SetAttribute}(E1[e1, SetAttribute]) ÷ σ_{p(e2)}(E2[e2])), E1[e1])
E1[e1] \ π_{e1}((E1[e1] × σ_{p(e2)}(E2[e2])) \ (E1[e1] B_{q(e1,e2)} σ_{p(e2)}(E2[e2])))
This plan is mentioned in [819], however using a regular join instead of a semijoin.
The anti-semijoin can be employed to eliminate the set difference, yielding the following plan:
E1[e1] T_{¬q(e1,e2)} σ_{p(e2)}(E2[e2])
This plan is in many cases the most efficient plan. However, the correctness of
this plan depends on the uniqueness of e1 , i.e., the attribute(s) e1 must be a
(super) key of E1 . This is especially fulfilled in the object-oriented context if
e1 consists of or contains the object identifier.
We do not present the plans based on grouping and count operations (see [180]).
E1[e1] \ π_{e1}((E1[e1] B_{p(e1,e2)} E2[e2]) \ (E1[e1] B_{p(e1,e2)} σ_{q(e2)}(E2[e2])))
E1[e1] \ π_{e1}((E1[e1] B_{p(e1,e2)} E2[e2]) \ (E1[e1] B_{p(e1,e2)∧q(e1,e2)} E2[e2]))
This plan can first be refined by replacing the set difference of the two join expressions by a semijoin. Again, the uniqueness constraint on E1[e1] is required for this most efficient plan to be valid.
For all discussed classes, problems with NULL values might occur. In that case, the plans have to be refined [180].
13.4 Bibliography
[459] [216] [180] [702, 695]
Chapter 14
Unnesting Nested Queries
Chapter 15
Optimizing Queries with Materialized Views
15.4 Bibliography
materialized view with aggregates: [817],
materialized view with disjunction: [11],
SQL Server: [328]
other: [12, 148, 149, 160, 540, 832, 868, 935] [137, 141, 161, 151, 279, 474,
527, 663, 694, 773]
some more including maintenance etc: [10, 14, 52, 92, 148, 154, 201, 377, 392]
[428, 473, 539, 691, 736, 262, 817] [832, 841, 840, 960, 936] [6, 248, 249, 399]
Overview: [384]
[541]
performance eval: [90]
Stacked views: [221]
recursion: [251]
with patterns (integration): [694], [250, 252], [230]
Chapter 16
Semantic Query Rewrite
16.4 Bibliography
[81] [72] [930] Foreign functions semantic rules rewrite: [151] Conjunctive Queries,
Branch Minimization: [730]
Part IV
Plan Generation
Chapter 17
Current Search Space and Its Limits
• For large join queries, do not apply transitivity of equality to derive new predicates, and disable cross products and possibly bushy trees.
17.4 Bibliography
Chapter 18
Dynamic Programming-Based
Plan Generation
18.1 Introduction
select * from R, S, T where R.A = S.B and S.C = T.D and R.E + S.F = T.G
[Figure: example query hypergraph over R1, . . . , R6 with simple edges R1 − R2, R2 − R3, R4 − R5, R5 − R6 and the hyperedge ({R1, R2, R3}, {R4, R5, R6}).]
18.2 Hypergraphs
Let us start with the definition of hypergraphs.
that the number of connected subgraphs is far smaller than the number of csg-
cmp-pairs. The problem now is to enumerate the csg-cmp-pairs efficiently and
in an order acceptable for dynamic programming. The latter can be expressed
more specifically. Before enumerating a csg-cmp-pair (S1 , S2 ), all csg-cmp-pairs
(S10 , S20 ) with S10 ⊆ S1 and S20 ⊆ S2 have to be enumerated.
18.4 Neighborhood
The main idea to generate csg-cmp-pairs is to incrementally expand connected
subgraphs by considering new nodes in the neighborhood of a subgraph. Infor-
mally, the neighborhood N (S) under an exclusion set X consists of all nodes
reachable from S that are not in X. We derive an exact definition below.
When choosing subsets of the neighborhood for inclusion, we have to treat
a hypernode as a single instance: either all of its nodes are inside an enumer-
ated subset or none of them. Since we want to use the fast subset enumeration
procedure introduced by Vance and Maier [885], we must have a single bit
representing a hypernode and also single bits for relations occurring in simple
edges. Since these may overlap, we are constrained to choose one unique rep-
resentative of every hypernode occurring in a hyperedge. We choose the node
that is minimal with respect to ≺. Accordingly, we define
min(S) := ∅ if S = ∅, and min(S) := {s} with s ∈ S and s ≺ s′ for all s′ ∈ S \ {s} otherwise.
Note that if S is empty, then min(S) is also empty. Otherwise, it contains a single element. Hence, if S is a singleton set, then min(S) equals the only element contained in S. For our hypergraph in Fig. ?? and with S = {R4, R5, R6}, we have min(S) = {R4}.
Let S be a current set, which we want to expand by adding further relations.
Consider a hyperedge (u, v) with u ⊆ S. Then, we will add min(v) to the
neighborhood of S. However, we have to make sure that the missing elements
of v, i.e. v \ min(v), are also contained in any set emitted. We thus define the complement m̅i̅n̅(S) := S \ min(S) and
E↓′(S, X) := {v | (u, v) ∈ E, u ⊆ S, v ∩ S = ∅, v ∩ X = ∅}
Define E↓(S, X) to be the minimal set of hypernodes such that for all v ∈ E↓′(S, X) there exists a hypernode v′ in E↓(S, X) such that v′ ⊆ v. Note that, apart from the connectedness, we test exactly the conditions given in
Def. 18.3.1. For our hypergraph in Fig. ?? and with X = S = {R1 , R2 , R3 }, we
have E ↓ (S, X) = {{R4 , R5 , R6 }}.
We are now ready to define the neighborhood of a hypernode S, given a set
of excluded nodes X.
IN(S, X) := ⋃_{v ∈ E↓(S,X)} min(v)    (18.1)
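Under a bitmask representation of node sets, Eq. 18.1 amounts to a few lines of C++. The Hyperedge type and the handling of both edge directions are assumptions of this sketch, and the subsumption filter of E↓ is omitted:

#include <cstdint>
#include <vector>

using BitSet = std::uint64_t;
struct Hyperedge { BitSet u, v; };

// IN(S, X): for every hyperedge leaving S whose other side avoids S and X,
// contribute min(v), i.e. the lowest bit of v
BitSet neighborhood(BitSet S, BitSet X, const std::vector<Hyperedge>& E) {
  BitSet N = 0;
  for (const Hyperedge& e : E) {
    if ((e.u & ~S) == 0 && (e.v & (S | X)) == 0)
      N |= e.v & (~e.v + 1);
    if ((e.v & ~S) == 0 && (e.u & (S | X)) == 0)  // symmetric direction
      N |= e.u & (~e.u + 1);
  }
  return N;
}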
2. both the primary connected subgraph and its connected complement are created by recursive graph traversals;
Summarizing the above, the algorithm traverses the graph in a fixed order and recursively produces larger connected subgraphs. The main challenge relative to DPccp is the traversal of hyperedges: First, the "starting" side of the edge can require multiple nodes, which complicates neighborhood computation. In particular, the neighborhood can no longer be computed as a simple bottom-up union of local neighborhoods. Second, the "ending" side of the edge can lead to multiple nodes at once, which disrupts the recursive growth of components. Consider a set S1, which we want to extend by a hyperedge (u, w). Even if u ⊆ S1, there is no guarantee that S1 ∪ w will be connected.
To overcome these problems, the algorithm picks a representative end node. In our example, it picks the 1 in the n : 1 of item 4 (see also Eq. 18.1). With it, it starts the recursive growth and exploits the DP table to check if a valid constellation has been reached, i.e., the constructed hypernode induces a connected subgraph. This exploitation builds on the fact that our DP strategies
enumerate subsets before supersets. We are now prepared to discuss the details
of the algorithm.
We give the implementation of our join ordering algorithm for hypergraphs
by means of the pseudocode for member functions of a class BuEnumCcpHyp.
This allows us to minimize the number of parameters by assuming that this
class contains references to the query hypergraph (G = (V, E)) and to the
dynamic programming table (DpTable).
The whole algorithm is distributed over five subroutines. The top-level rou-
tine BuEnumCcpHyp initializes the DpTable with access plans for single relations
and then calls EmitCsg and EnumerateCsgRec for each set containing exactly
one relation. In a real implementation, the DpTable should be initialized before
calling BuEnumCcpHyp.
The member function EnumerateCsgRec is responsible for enumerating con-
nected subgraphs. It does so by calculating the neighborhood and iterating
over each of its subsets. For each such subset S1 , it calls EmitCsg. This member
function is responsible for finding suitable complements. It does so by calling
EnumerateCmpRec, which recursively enumerates the complements S2 for the
connected subgraph S1 found before. The pair (S1 , S2 ) is a csg-cmp-pair. For
every such pair, EmitCsgCmp is called. Its main responsibility is to consider a
plan built up from the plans for S1 and S2 . The following subsections discuss
these five member functions in detail. We illustrate them with the example
hypergraph shown in Fig. ??. The corresponding traversal steps are shown in
Fig. 18.2; we will refer to them during the description of the algorithm.
18.5.1 BuEnumCcpHyp
The pseudocode for BuEnumCcpHyp looks as follows:
BuEnumCcpHyp()
for each v ∈ V // initialize DpTable
DpTable[{v}] = plan for v
for each v ∈ V descending according to ≺
EmitCsg({v}) // process singleton sets
EnumerateCsgRec({v}, Bv ) // expand singleton sets
return DpTable[V ]
In the first loop, it initializes the dynamic programming table with plans for sin-
gle relations. In the second loop, it calls for every node in the query graph, in de-
creasing order (according to ≺) the two subroutines EmitCsg and EnumerateCsgRec.
In Fig. 18.2, we find the call stack of our algorithm. The calls generated by
BuEnumCcpHyp correspond to those with stack-depth zero, where the stack-
depth is indicated in the second column from the left. For convenience, we not
only give the parameters, but also the neighborhood IN. The algorithm calls
EmitCsg({v}) for single nodes v ∈ V to generate all csg-cmp-pairs ({v}, S2 ) via
calls to EnumerateCmpRec and EmitCsgCmp, where v ≺ min(S2 ) holds. This con-
dition implies that every csg-cmp-pair is generated only once, and no symmetric
pairs are generated. In Fig. 18.2, this corresponds to single vertex graphs, e.g.
step 1 and 2. The calls to EnumerateCsgRec extend the initial set {v} to larger
[Figure 18.2: the 26 traversal steps of BuEnumCcpHyp on the example hypergraph; each step highlights the current connected subgraph, its connected complement, and the forbidden and non-forbidden nodes.]
sets S1 , for which then connected subsets of its complement S2 are found such
that (S1 , S2 ) results in a csg-cmp-pair. In Fig. 18.2, this is shown in step 2, for
example, where EnumerateCsgRec starts with R5 and expands it to {R5 , R6 } in
step 4 (step 3 being the construction of the complement). To avoid duplicates
during enumerations, all nodes that are ordered before v according to ≺ are
prohibited during the recursive expansion [607]. Formally, we define this set as
Bv = {w|w ≺ v} ∪ {v}.
18.5.2 EnumerateCsgRec
The general purpose of EnumerateCsgRec is to extend a given set S1 , which
induces a connected subgraph of G to a larger set with the same property. It
does so by considering each non-empty, proper subset of the neighborhood of S1 .
For each of these subsets N , it checks whether S1 ∪N is a connected component.
This is done by a lookup into the DpTable. If this test succeeds, a new connected
component has been found and is further processed by a call EmitCsg(S1 ∪ N ).
Then, in a second step, for all these subsets N of the neighborhood, we call
EnumerateCsgRec such that S1 ∪ N can be further extended recursively. The
reason why we first call EmitCsg and then EnumerateCsgRec is that in order
to have an enumeration sequence valid for dynamic programming, smaller sets
must be generated first. Summarizing, the code looks as follows:
EnumerateCsgRec(S1, X)
for each N ⊆ IN(S1, X) : N ≠ ∅
  if DpTable[S1 ∪ N] ≠ ∅
    EmitCsg(S1 ∪ N)
for each N ⊆ IN(S1, X) : N ≠ ∅
  EnumerateCsgRec(S1 ∪ N, X ∪ IN(S1, X))
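The iteration over all non-empty subsets N of the neighborhood is where the fast subset enumeration of Vance and Maier [885] comes in; with bitmasks it can be sketched as:

#include <cstdint>

using BitSet = std::uint64_t;

// visit every non-empty subset N of 'mask' exactly once
template <class Callback>
void forEachNonEmptySubset(BitSet mask, Callback emit) {
  for (BitSet N = mask; N != 0; N = (N - 1) & mask)
    emit(N);  // subsets appear in decreasing numeric order
}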
18.5.3 EmitCsg
EmitCsg takes as an argument a non-empty, proper subset S1 of V , which
induces a connected subgraph. It is then responsible to generate the seeds for
all S2 such that (S1 , S2 ) becomes a csg-cmp-pair. Not surprisingly, the seeds
are taken from the neighborhood of S1. All nodes that are ordered before the smallest element in S1 (captured by the set B_{min(S1)}) are removed from the neighborhood to avoid duplicate enumerations [607]. Since the neighborhood
also contains min(v) for hyperedges (u, v) with |v| > 1, it is not guaranteed that
S1 is connected to v. To avoid the generation of false csg-cmp-pairs, EmitCsg
checks for connectedness. However, each single neighbor might be extended to
a valid complement S2 of S1 . Hence, no such test is necessary before calling
EnumerateCmpRec, which performs this extension. The pseudocode looks as
follows:
EmitCsg(S1)
X = S1 ∪ B_{min(S1)}
N = IN(S1, X)
for each v ∈ N descending according to ≺
  S2 = {v}
  if ∃(u, v) ∈ E : u ⊆ S1 ∧ v ⊆ S2
    EmitCsgCmp(S1, S2)
  EnumerateCmpRec(S1, S2, X ∪ B_v(N))
If there is no edge between these two sets, there is no call to EmitCsgCmp. However, the set {R4} can be extended to a valid complement, namely {R4, R5, R6}. Properly extending the seeds of complements is the task of the call to EnumerateCmpRec in step 21.
18.5.4 EnumerateCmpRec
EnumerateCmpRec has three parameters. The first parameter S1 is only used to pass it to EmitCsgCmp. The second parameter is a set S2 which is connected and must be extended until a valid csg-cmp-pair is reached. Therefore, it
considers the neighborhood of S2 . For every non-empty, proper subset N of
the neighborhood, it checks whether S2 ∪ N induces a connected subset and is
connected to S1 . If so, we have a valid csg-cmp-pair (S1 , S2 ) and can start plan
construction (done in EmitCsgCmp). Irrespective of the outcome of the test, we
recursively try to extend S2 such that this test becomes successful. Overall, the
EnumerateCmpRec behaves very much like EnumerateCsgRec. Its pseudocode
looks as follows:
EnumerateCmpRec(S1, S2, X)
for each N ⊆ IN(S2, X) : N ≠ ∅
  if DpTable[S2 ∪ N] ≠ ∅ ∧ ∃(u, v) ∈ E : u ⊆ S1 ∧ v ⊆ S2 ∪ N
    EmitCsgCmp(S1, S2 ∪ N)
X = X ∪ IN(S2, X)
for each N ⊆ IN(S2, X) : N ≠ ∅
  EnumerateCmpRec(S1, S2 ∪ N, X)
18.5.5 EmitCsgCmp
The procedure EmitCsgCmp(S1 ,S2 ) is called for every S1 and S2 such that
(S1 , S2 ) forms a csg-cmp-pair. It is the (call back) interface for BuEnumCcpHyp.
Its only task is to call BuildPlan, which then builds the optimal plan(s) for
(S1 , S2 ).
calcNeighborhood(S, X)
N := ∅
if isConnected(S)
N = simpleNeighborhood(S) \ X
else
foreach s ∈ S
N ∪= simpleNeighborhood(s)
F = (S ∪ X ∪ N ) // forbidden since in X or already handled
foreach (u, v) ∈ E
if u ⊆ S
if v ∩ F = ∅
N += min(v)
F ∪= N
if v ⊆ S
if u ∩ F = ∅
N += min(u)
F ∪= N
hypergraph not containing any subsumed edges. For some set S, for which we
want to calculate the neighborhood, define the set of reachable hypernodes as
where X contains the forbidden nodes. Then, any set of nodes N such that for
every hypernode in W (S, X) exactly one element is contained in N can serve
as the neighborhood.
Further, in order to make BuEnumCcpHyp as efficient as DPccp for simple
graphs, it is convenient to materialize the simple neighborhood for every plan
class contained in the DpTable and calculate it bottom-up. Figure 18.3 contains
one possible implementation of the neighborhood calculation.
18.6 DPhyp
Chapter 19
Optimizing Queries with Disjunctions
19.1 Introduction
Simple rewrites as indicated in Section ?? for IN and OR predicates that boil
down to comparisons of a column with a set of constants can eliminate disjunc-
tion from the plan or push it into a multirange index access.
Another possibility that can be used for disjunctions on single columns is
to use DISJOINT UNION of plans. This is a special form of UNION where
conditions ensure that no phantom duplicates are produced. The DISJOINT
UNION operator merely concatenates the result tables without any further
overhead like duplicate elimination.
For example a predicate of the form x = c1 or y = c2 where x and y are
columns of the same table results in two predicates
1. x = c1
2. x <> c1 AND y = c2
Obviously, no row can satisfy both conditions. Hence, the query select * from R where x = c1 or y = c2 can be safely rewritten to
(select * from R where x = c1) DISJOINT UNION (select * from R where x <> c1 and y = c2).
In case there are indexes on x and y, efficient plans do exist. If they don't, the table R needs to be scanned twice. This problem is avoided by using bypass plans.
DISJOINT UNIONs can also be used for join predicates. Consider the following example query:
select * from R, S where R.a = S.a or R.b = S.a
This query can be rewritten to
(select * from R, S where R.a = S.a)
DISJOINT UNION
(select * from R, S where R.a <> S.a and R.b = S.a)
The general condition here is that all equality predicates have one side identical. Note that both tables are scanned and joined twice. Bypass plans will eliminate this problem.
[Figure 19.1: DNF plans Ia, Ib, and II built from the selections σa, σb, σc and union operators.]
[Figure 19.2: CNF plans I and II consisting of σa and σb∨c in both orders.]
CNF plans never produce duplicates. The evaluation of the boolean factors can stop as soon as some predicate evaluates to true. Again, some (expensive) predicates might be evaluated more than once in CNF plans. Figure 19.3 shows some bypass plans. Note the different output streams. It should be obvious that a bypass plan can be more efficient than both a CNF or DNF plan. It
∪ ∪ ∪ ∪
σc σb σc σa σa
− + +
σb σc σa σa σc σa
σc
σa σa
−− + − + − +
σb σb σb
I II III IV V
is possible to extend the idea of bypass plans to join operators. However, this
and the algorithm to generate bypass plans is beyond the scope of the current
paper (see [481, 824, 182]).
• logical information
• physical information
  – costs
  – cardinality information

For fast processing, the first three set-valued items in the logical information block are represented as bit-vectors. However, no reasonable upper bound on the size of these bit-vectors exists; hence, they are of varying size. It is advisable to have a plan node factory that generates plan nodes of different lengths such that the bit-vectors are included in the plan node itself. A special interpreter class then knows the offsets and lengths of the different bit-vectors and supplies the operations needed to deal with them. This bit-vector interpreter can be attached to the plan generator's control block as indicated in Fig. 25.3.
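A possible C++ sketch of such a factory is shown below; all class and member names are illustrative assumptions, not an actual implementation. The three bit-vectors are allocated in one block directly behind the fixed-size node header, and the interpreter records their common word length and computes their offsets.

#include <cstdint>
#include <cstdlib>
#include <cstring>

struct PlanNode {               // fixed-size header; bit-vectors follow in memory
  double cost;
  double cardinality;
  // ... further fixed-size fields ...
};

// Knows the layout of the variable-size tail: here three bit-vectors
// (e.g., relations, attributes, operators), each 'words' machine words long.
struct BitvectorInterpreter {
  std::size_t words;            // words per bit-vector
  std::uint64_t* vec(PlanNode* n, int which) const {
    auto* base = reinterpret_cast<std::uint64_t*>(n + 1);
    return base + which * words;
  }
};

struct PlanNodeFactory {
  BitvectorInterpreter interp;
  explicit PlanNodeFactory(std::size_t bits) : interp{(bits + 63) / 64} {}
  PlanNode* create() {
    std::size_t size = sizeof(PlanNode) + 3 * interp.words * sizeof(std::uint64_t);
    auto* n = static_cast<PlanNode*>(std::malloc(size));
    std::memset(n, 0, size);    // node and all bit-vectors zero-initialized
    return n;
  }
};

int main() {
  PlanNodeFactory factory(200);            // query touching up to 200 items
  PlanNode* n = factory.create();
  factory.interp.vec(n, 0)[0] |= 1u << 3;  // e.g., mark relation R3
  std::free(n);
}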
19.6 Bibliography
Disjunctive queries: P. Ciaccia and M. Scalas: Optimization Strategies for Relational Queries. IEEE Transactions on Software Engineering 15(10), pp. 1217-1235, 1989.
Kristofer Vorwerk, G. N. Paulley: On Implicate Discovery and Query Optimization. International Database Engineering and Applications Symposium (IDEAS 2002).
Jack Minker, Rita G. Minker: Optimization of Boolean Expressions - Historical Developments. IEEE Annals of the History of Computing 2(3), pp. 227-238, 1980.
Chaudhuri: SIGMOD 03: [146]
Conjunctive Queries, Branch Minimization: [730]
Also Boolean Difference Calculus (?): [813]
Chapter 20

Generating Plans for the Full Algebra
Chapter 21
Generating DAG-structured
Plans
@misc{ roy-optimization,
author = "Prasan Roy",
title = "Optimization of DAG-Structured Query Evaluation Plans",
url = "citeseer.nj.nec.com/roy98optimization.html" }
Chapter 22

Simplifying the Query Graph

22.1 Introduction
As we have seen in Chapter 3, computing the optimal join order for large queries is a very hard problem. Most hand-written queries join just a few (< 15) relations,
but in general join queries can become quite large: Some systems like SAP R/3
store their data in thousands of relations, and subsequently generate large join
queries. Other examples include data warehousing, where a fact table is joined
with a large number of dimension tables, forming a star join, and databases that
make heavy use of views to simplify query formulation (where the views then
implicitly add joins). Existing database management systems have difficulties
optimizing very large join queries, falling back to heuristics when they cannot
solve them exactly anymore. This is unfortunate, as it does not offer a smooth
transition. Ideally, one would optimize a query as much as possible under given
time constraints.
When optimizing join queries, the optimal join order is usually determined
using some variant of dynamic programming (DP). However, finding the optimal join order is NP-hard in general, which means that large join queries become
intractable at some point. On the other hand, the complexity of the problem
depends heavily upon the structure of the query (see Chapter 3), where some
queries can be optimized exactly even for a large number of relations while
other queries quickly become too difficult. As computing the optimal join order
becomes intractable at some point, the standard technique of handling large
join queries resorts to some heuristics. Some commercial database systems first
try to solve the problem exactly using DP, and then fall back to greedy heuris-
tics when they run out of memory. As we have seen in Chapter 3, a wide range
of heuristics has been proposed in the literature. Most of them integrate some
kind of greedy processing in the optimization process, greedily building execu-
tion plan fragments that seem plausible. The inherent problem of this approach
is that it is quite likely to greedily make a decision that one would regret once more information about the complete execution plan becomes available. For example, greedily deciding which two relations should be joined first is very hard, as it depends on all the other relations and predicates involved in the query.
[Figure: optimization time [ms] (log scale, 10 to 1e+09) of exact plan generation over the number of relations (4 to 20) for chain, cycle, star, grid, and clique queries.]
As the figure indicates, exact optimization soon becomes too expensive for everything except chain and cycle queries. Clique queries are particularly bad, of course, but even the data warehousing star queries become too complex relatively soon. For really large queries (e.g., 50 relations), finding the optimal solution using DP is out of the question for most query types.
Now the basic idea of graph simplification stems from the fact that some graphs are easier to solve than others: If the problem is too difficult to solve exactly, we change the query graph to make it easier to solve. We will look at this simplification strategy in the next section.
[Figure: graph simplification on a star-shaped example. Originally the joins are R0 ⋈ R1, R0 ⋈ R2, R0 ⋈ R3; after the 1st step R0 ⋈ R2 becomes {R0, R1} ⋈ R2; after the 2nd step R0 ⋈ R3 becomes {R0, R1} ⋈ R3; after the 3rd step it becomes {R0, R1, R2} ⋈ R3.]
join, and update the neighbors when a join is modified. Further, we remember
the estimated benefit for each neighbor, and keep all joins in a priority queue
ordered by the maximum benefit they can get from ordering a neighbor. This
eliminates the two nested loops in the algorithm, and greatly improves runtime.
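A possible skeleton of this priority-queue-driven loop looks as follows; the entry layout, the version counters used for lazy invalidation, and all names are our assumptions, not the authors' implementation.

#include <queue>
#include <vector>

struct Candidate {
  double benefit;   // best expected benefit of ordering one of its neighbors
  int join;         // the join this candidate belongs to
  int version;      // for lazy invalidation after updates
  bool operator<(const Candidate& o) const { return benefit < o.benefit; }
};

// Skeleton: repeatedly apply the most beneficial ordering, then refresh
// only the candidates of the affected neighboring joins.
void simplify(int steps, std::vector<int>& versionOf /* one counter per join */) {
  std::priority_queue<Candidate> pq;
  // ... initially push one candidate per join ...
  while (steps-- > 0 && !pq.empty()) {
    Candidate c = pq.top();
    pq.pop();
    if (c.version != versionOf[c.join]) continue;  // stale entry, skip
    // applyBestOrdering(c.join);   // turn the ordering into a hyperedge
    // for each neighboring join j: // ++versionOf[j], recompute, push
  }
}

int main() {}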
experiments [627]. Note that we can even avoid generating all possible merge
steps: By using search techniques for unbounded search (e.g., [75]) we can
generate merging steps as required by the search strategy. This does not change
the asymptotic complexity, but it is more efficient if most queries require few
or no simplification steps (which is probably the case in practice).
orderingBenefit(X ⋈1 R1, X ⋈2 R2) = C((X ⋈1 R1) ⋈2 R2) / C((X ⋈2 R2) ⋈1 R1)
The rationale here is that if joining first R2 and then R1 is orders of magnitude cheaper than first joining R1 and then R2, it is very likely that the join with R2 will come before the join with R1 in the optimal solution, regardless of the other relations involved. As the simplification algorithm orders the joins with the highest
expected benefit first, it first enforces orderings where the cost differences are
particularly large (and thus safe).
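Under Cout and the independence assumption, this benefit reduces to a ratio of sums of intermediate-result cardinalities. A small sketch (function names are ours):

// C_out of the sequence X ⋈1 R1 ⋈2 R2: the sum of the output cardinalities
// of the two joins. x, r1, r2 are cardinalities; s1, s2 the selectivities
// of ⋈1 and ⋈2, assumed independent.
double coutSequence(double x, double r1, double s1, double r2, double s2) {
  double first = x * r1 * s1;          // |X ⋈1 R1|
  double second = first * r2 * s2;     // |X ⋈1 R1 ⋈2 R2|
  return first + second;
}

double orderingBenefit(double x, double r1, double s1, double r2, double s2) {
  return coutSequence(x, r1, s1, r2, s2)    // C((X ⋈1 R1) ⋈2 R2)
       / coutSequence(x, r2, s2, r1, s1);   // C((X ⋈2 R2) ⋈1 R1)
}

Since the final result size is identical in both orders, the ratio is driven entirely by the sizes of the two possible first intermediate results.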
Note that the criterion shown above is oversimplified. First, computing
the cost function C is not trivial, as we are only comparing joins and do not
have complete execution plans yet. In particular information about physical
properties of the input is missing, which is required by some cost functions. One
way to avoid this is to use the Cout cost function for the benefit estimation.
The advantage of Cout is that it can be used without physical information,
and further the optimizations based upon Cout are usually not that bad, as
minimizing intermediate results is a plausible goal. Using the real cost function
would be more attractive, but for some cost functions we can only use the real
cost function in the final DP phase, as then physical information is available.
The second problem is that we are not really comparing X ⋈1 R1 with X ⋈2 R2, but S1^L ⋈1 S1^R with S2^L ⋈2 S2^R, where ⋈1 and ⋈2 are neighboring hyperedges in the query graph. There are multiple cases that can occur; here we assume that S2^L ⊆ S1^L, the other cases are analogous. We define |S|⋈ as the output cardinality of joining the relations in S. Then the joins S1^L ⋈1 S1^R and S2^L ⋈2 S2^R can be interpreted as X ⋈1 R1 and X ⋈2 R2 with |X| = max(|S1^L|⋈, |S2^L|⋈), |R1| = |S1^R|⋈, and |R2| = |S2^R|⋈. Note that we
do not have to compute the costs of joining the relations in Si , as we are only
interested in comparing the relative performance of B1 and B2 . Note further
that the accuracy of the prediction will increase over time, as the Si grow and
at some point contain all relations that will come before a join. Therefore it
is important to make the ’safe’ orderings early, when the uncertainty is higher,
and perform the more unclear orderings later when more is known about the
input.
C(R0 ⋈1 R1 ⋈2 R2) ≥ C(R0 ⋈2 R2 ⋈1 R1)
⇒ C(R0 ⋈3 R3 ⋈1 R1 ⋈2 R2) ≥ C(R0 ⋈3 R3 ⋈2 R2 ⋈1 R1)

Or, in other words, the optimal relative ordering of ⋈1 and ⋈2 remains unchanged by changing the cardinality of R0 by a factor of α (here induced by the join with R3). This is closely related to the known ASI property of cost functions [432], as it can be shown easily that every ASI cost function is relative order preserving. But relative order preserving is more general than ASI; for example, the simple sort-merge-join cost function

CSM(R1 ⋈ R2) = C(R1) + C(R2) + |R1| log |R1| + |R2| log |R2|

does not satisfy the ASI property, but is relative order preserving.
As queries we consider star queries of the form Q = (V, E) with V = {R0, . . . , Rn} and E = {R0 ⋈1 R1, . . . , R0 ⋈n Rn} (this form can be guaranteed by renaming relations), and require independence between join predicates and a relative order preserving cost function C. W.l.o.g. we assume that the cost function is symmetric, as we can always construct a symmetric cost function by using min(C(Ri ⋈ Rj), C(Rj ⋈ Ri)). Then, star queries have two distinct properties: First, all query plans are linear, with R0 involved in the first join. Thus, as our cost function is symmetric, we can restrict ourselves to plans of the form (R0 ⋈ Rπ(1)) . . . ⋈ Rπ(n), where π defines a permutation of [1, n]. Second, given a non-empty join tree T and a relation Ri ∉ T, T' = T ⋈ Ri is a valid join tree and

|T'| = |T| |Ri| (|R0 ⋈ Ri| / (|R0| |Ri|)).

Thus any (new) relation can be joined to an existing join tree, and the selectivity of the join is unaffected by the relations already contained in the tree (due to the independence of join predicates). Note that while this holds for star queries, it does not hold in general. For example, clique queries also allow for an arbitrary join order, but the selectivities are affected by previously joined relations.
Using these observations, we now show the optimality for star queries:
Lemma 22.3.1 Given a query Q = (V, E), a relative order preserving cost function C, and four relations R0, Ri, Rj, Rk ∈ V (i ≠ j ≠ k ≠ 0). Then C(R0 ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈j Rj ⋈i Ri) implies C(R0 ⋈i Ri ⋈j Rj ⋈k Rk) ≥ C(R0 ⋈j Rj ⋈i Ri ⋈k Rk).

Proof. Follows directly from the fact that (R0 ⋈i Ri ⋈j Rj) ≡ (R0 ⋈j Rj ⋈i Ri). The join ⋈k gets the same input in both cases and thus causes the same costs. The lemma holds even for non-star queries and arbitrary (monotonic) cost functions.
Lemma 22.3.3 Given a query Q = (V, E), a relative order preserving cost function C, and four relations R0, Ri, Rj, Rk ∈ V (i ≠ j ≠ k ≠ 0). Then C(R0 ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈j Rj ⋈i Ri) implies C(R0 ⋈k Rk ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈k Rk ⋈j Rj ⋈i Ri).

Proof. Follows from the definition of relative order preserving cost functions.
Corollary 1 Given a query Q = (V, E), a relative order preserving cost function C, three relations R0, Ri, Rj ∈ V (i ≠ j ≠ 0), and two join sequences S1, S2 of relations in V such that R0 S1 ⋈i Ri ⋈j Rj S2 forms a valid join tree. Then C(R0 ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈j Rj ⋈i Ri) implies C(R0 S1 ⋈i Ri ⋈j Rj S2) ≥ C(R0 S1 ⋈j Rj ⋈i Ri S2).

Proof. Follows from Lemmas 22.3.1 and 22.3.3. Both assume nothing about Rk except independence; thus ⋈k Rk could be a sequence of joins.
Theorem 1 Given a star query Q = (V, E) and a relative order preserving cost function C. Then for any optimal join tree T and any pair of relations Ri, Rj neighbored in T (i.e., T has the form R0 S1 ⋈i Ri ⋈j Rj S2), the following condition holds: Either C(R0 ⋈i Ri ⋈j Rj) ≤ C(R0 ⋈j Rj ⋈i Ri) or T' = R0 S1 ⋈j Rj ⋈i Ri S2 is optimal, too.

Proof. By contradiction. We assume that C(R0 ⋈i Ri ⋈j Rj) > C(R0 ⋈j Rj ⋈i Ri) and T' is not optimal. By Corollary 1 we can deduce that C(R0 ⋈i Ri ⋈j Rj) > C(R0 ⋈j Rj ⋈i Ri) implies C(T') = C(R0 S1 ⋈j Rj ⋈i Ri S2) ≤ C(R0 S1 ⋈i Ri ⋈j Rj S2) = C(T). This is a contradiction to the assumption that T' is not optimal.
This theorem is a strong indication that our simplification algorithm is plausible, as we know that one of the optimal solutions will satisfy the ordering constraints used by the algorithm. Unfortunately, the authors of [627] were only able to prove optimality by restricting the cost function some more (perhaps unnecessarily): A cost function C is fully relative order preserving if it is relative order preserving and the following condition holds for arbitrary relations R0, . . . , R3 and arbitrary joins ⋈1, ⋈2, ⋈3 with independent join predicates:

C(R0 ⋈1 R1 ⋈2 R2) ≥ C(R0 ⋈2 R2 ⋈1 R1)
⇒ C(R0 ⋈1 R1 ⋈3 R3 ⋈2 R2) ≥ C(R0 ⋈2 R2 ⋈3 R3 ⋈1 R1).

Again, this property is satisfied by all ASI cost functions. Using this definition, we can show the optimality as follows.
Lemma 22.3.7 Given a query Q = (V, E), a fully relative order preserving cost function C, three relations R0, Ri, Rj ∈ V (i ≠ j ≠ 0), and three join sequences S1, S2, S3 of relations in V such that R0 S1 ⋈i Ri S2 ⋈j Rj S3 forms a valid join tree. Then C(R0 ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈j Rj ⋈i Ri) implies C(R0 S1 ⋈i Ri S2 ⋈j Rj S3) ≥ C(R0 S1 ⋈j Rj S2 ⋈i Ri S3).

Proof. Follows from Corollary 1 and the definition of fully relative order preserving cost functions.
Theorem 2 Given a star query Q = (V, E) and a fully relative order preserving cost function C. Applying the GraphSimplificationOptimizer algorithm repeatedly leads to the optimal execution plan.

Proof. As Q is a star query, any linear join order is valid; thus join ordering is done purely based upon costs. The algorithm repeatedly orders the two joins with the largest quotient, which is guaranteed to be ≥ 1 due to the lack of join ordering constraints. Lemma 22.3.7 shows that joins can be ordered relative to each other regardless of other relations; thus, if the algorithm orders ⋈i before ⋈j, there exists an optimal solution with ⋈i before ⋈j (analogous to Theorem 1). The algorithm simplifies the graph until the joins are in a total order, which uniquely describes one optimal execution plan.
[Figure 22.5: The Effect of Simplification Steps for a Star Query with 20 Relations. Three panels over the number of simplification steps (0 to 160): #subgraphs (up to 500,000), optimization time [ms] (up to 14,000), and scaled costs.]
Clearly, each simplification step decreases the search space, i.e., the number of connected subgraphs. Ideally, the optimization time goes down analogously; unfortunately, the costs will go up if the heuristic makes mistakes.
Figure 22.5 shows how the number of connected subgraphs, the optimization
time, and the scaled costs (relative to the optimal solution) change during sim-
plification of a star query with 20 relations. As predicted, the search space
shrinks monotonically with simplification. It does not shrink strictly mono-
tonically, as the simplification algorithm sometimes adds restrictions that are
already implied through other restrictions, but this is not an issue for the full
algorithm due to the binary search. The optimization time follows the search
space size, although there are some local peaks. Apparently they are caused
by the higher costs of hyperedges for the DPhyp algorithm relative to normal
edges. The scaled costs are constantly 1 here, i.e., the algorithm produces the
optimal solution regardless of the number of simplification steps. This is due to
the theoretical properties of the ordering heuristic (see Section 22.3.4), which
in this case is optimal.
For grid queries the general situation is similar, as shown in Figure 22.6. Search space and optimization time decrease similarly to star queries; the costs, however, increase over time. Initially the heuristic performs only the relatively
however increase over time. Initially the heuristic performs only the relatively
safe orderings, which do not cause any increases in costs, but at some point it
makes a mistake in ordering and causes the costs to increase step-wise. Fortu-
nately this happens when the search space has already been reduced a lot, which
means that for simpler queries there is a reasonable hope that the heuristic will
[Figure 22.6: The Effect of Simplification Steps for a Grid Query with 20 Relations. Three panels over the number of simplification steps (0 to 100): #subgraphs (up to 24,000), optimization time [ms] (up to 2,100), and scaled costs.]
Chapter 23

Deriving and Dealing with Interesting Orderings and Groupings

23.1 Introduction
The most expensive operations (e.g. join, grouping, duplicate elimination) dur-
ing query evaluation can be performed more efficiently if the input is ordered
or grouped in a certain way. Therefore, it is crucial for query optimization to
recognize cases where the input of an operator satisfies the ordering or group-
ing requirements needed for a more efficient evaluation. Since a plan generator
typically considers millions of different plans – and, hence, operators –, this
recognition easily becomes a performance bottleneck for plan generation, often
leading to heuristic solutions.
The importance of exploiting available orderings has already been recog-
nized in the seminal work of Selinger et al. [772]. They presented the concept of
interesting orderings and showed how redundant sort operations could be avoid-
ed by reusing available orderings, rendering sort-based operators like sort-merge
join much more interesting.
Along these lines, it is beneficial to reuse available grouping properties, for
example for hash-based operators. While heuristic techniques to avoid redun-
dant group-by operators have been given [152], for a long time groupings have
not been treated as thoroughly as orderings. One reason might be that while
orderings and groupings are related (every ordering is also a grouping), group-
ings behave somewhat differently. For example, a tuple stream grouped on the
attributes {a, b} need not be grouped on the attribute {a}. This is different
from orderings, where a tuple stream ordered on the attributes (a, b) is also
ordered on the attribute (a). Since no simple prefix (or subset) test exists for
groupings, optimizing groupings even in a heuristic way is much more difficult
than optimizing orderings. Still, it is desirable to combine order optimization
and the optimization of groupings, as the problems are related and treated sim-
ilarly during plan generation. Recently, some work in this direction has been
published [898]. However, this only covers a special case of grouping. Instead,
in this chapter we follow the approach presented by Neumann and Moerkotte
[631, 630].
Other existing frameworks usually consider only order optimization, and
experimental results have shown that the costs for order optimization can have
a large impact on the total costs of query optimization [631]. Therefore, some
care is needed when adding groupings to order optimization, as a slowdown of
plan generation would be unacceptable.
In this chapter, we present a framework to efficiently reason about orderings
and groupings. It can be used for the plan generator described in Chapter ??,
but is actually an independent component that could be used in any kind of plan
generator. Experimental results show that it efficiently handles orderings and
groupings at the same time, with no additional costs during plan generation and
only modest one-time costs. Actually, the operations needed for both ordering and grouping optimization during plan generation can be performed in O(1), basically allowing us to exploit groupings for free.
23.2.1 Ordering
During plan generation, many operators require or produce certain orderings.
To avoid redundant sorting, it is required to keep track of the orderings a certain
plan satisfies. The orderings that are relevant for query optimization are called
interesting orders [772]. The set of interesting orders for a given query consists
of
This includes the final ordering requested by the given query, if this is specified.
The interesting orders are logical orderings. This means that they specify a
condition a tuple stream must meet to satisfy the given ordering. In contrast,
the physical ordering of a tuple stream is the actual succession of tuples in
the stream. Note that while a tuple stream has only one physical ordering,
it can satisfy multiple logical orderings. For example, the stream of tuples
((1, 1), (2, 2)) with schema (a, b) has one physical ordering (the actual stream),
but satisfies the logical orderings (a), (b), (a, b), and (b, a).
Some operators, like sort, actually influence the physical ordering of a
tuple stream. Others, like select, only influence the logical ordering. For
example, a sort[a] produces a tuple stream satisfying the ordering (a) by
actually changing the physical order of tuples. After applying select[a=b] to
this tuple stream, the result satisfies the logical orderings (a), (b), (a, b), (b, a),
although the physical ordering did not change. Deduction of logical orderings
can be described by using the well-known notion of functional dependency (FD)
[807]. In general, the influence of a given algebraic operator on a set of logical
orderings can be described by a set of functional dependencies.
We now formalize the problem. Let R = (t1, . . . , tr) be a stream (ordered sequence) of tuples in attributes A1, . . . , An. Then R satisfies the logical ordering o = (Ao1, . . . , Aom) (1 ≤ oi ≤ n) if and only if for all 1 ≤ i < j ≤ r the following condition holds:

(ti.Ao1, . . . , ti.Aom) ≤lex (tj.Ao1, . . . , tj.Aom),

i.e., the tuples appear in lexicographic order with respect to o. Functional dependencies enlarge the set of logical orderings a stream satisfies. Given a set O of logical orderings and a set F of functional dependencies, the set of derivable orderings is defined inductively:

Ω0(O, F) := O
Ωi(O, F) := Ωi−1(O, F) ∪ ⋃ { o' | o ⊢f o' } over all f ∈ F, o ∈ Ωi−1(O, F)

Let Ω(O, F) be the prefix closure of ⋃i≥0 Ωi(O, F). We write o ⊢F o' if and only if o' ∈ Ω({o}, F).
23.2.2 Grouping
It was shown in [898] that, similar to order optimization, it is beneficial to
keep track of the groupings satisfied by a certain plan. Traditionally, group-by
operators are either applied after the rest of the query has been processed or
are scheduled using some heuristics [152]. However, the plan generator could
take advantage of grouping properties, e.g., by avoiding re-hashing, if such information were easily available.
Analogous to order optimization, we call this grouping optimization and
define that the set of interesting groupings for a given query consists of
This includes the grouping specified by the group-by clause of the query, if any
exists.
These groupings are similar to logical orderings, as they specify a condition
a tuple stream must meet to satisfy a given grouping. Likewise, functional
dependencies can be used to infer new groupings.
More formally, a tuple stream R = (t1, . . . , tr) in attributes A1, . . . , An satisfies the grouping g = {Ag1, . . . , Agm} (1 ≤ gi ≤ n) if and only if for all 1 ≤ i < j < k ≤ r the following condition holds:

(∀ 1 ≤ l ≤ m : ti.Agl = tk.Agl) ⇒ (∀ 1 ≤ l ≤ m : ti.Agl = tj.Agl)
Two remarks are in order here. First, note that a grouping is a set of
attributes and not – as orderings – a sequence of attributes. Second, note
that given two groupings g and g 0 ⊂ g and a tuple stream R satisfying the
grouping g, R need not satisfy the grouping g 0 . For example, the tuple stream
((1, 2), (2, 3), (1, 4)) with the schema (a, b) is grouped by {a, b}, but not by {a}.
This is different from orderings, where a tuple stream satisfying an ordering o
also satisfies all orderings that are a prefix of o.
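In other words, all tuples agreeing on g must form one contiguous block. A small C++ sketch checking this property for a materialized stream (types and names are illustrative):

#include <map>
#include <vector>

using Tuple = std::vector<int>;          // attribute values by position

// Returns true iff equal g-projections appear only in contiguous blocks.
bool satisfiesGrouping(const std::vector<Tuple>& stream,
                       const std::vector<int>& g /* attribute positions */) {
  std::map<Tuple, int> lastBlockEnd;     // g-value -> index just after its block
  for (std::size_t i = 0; i < stream.size(); ++i) {
    Tuple key;
    for (int a : g) key.push_back(stream[i][a]);
    auto it = lastBlockEnd.find(key);
    if (it != lastBlockEnd.end() && it->second != static_cast<int>(i))
      return false;                      // key seen before, but not adjacent
    lastBlockEnd[key] = static_cast<int>(i) + 1;
  }
  return true;
}

int main() {
  // ((1,2), (2,3), (1,4)) is grouped by {a,b} but not by {a}.
  std::vector<Tuple> stream = {{1, 2}, {2, 3}, {1, 4}};
  bool byAB = satisfiesGrouping(stream, {0, 1});  // true
  bool byA = satisfiesGrouping(stream, {0});      // false
  return (byAB && !byA) ? 0 : 1;
}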
Analogous to orderings, for a set G of groupings and a set F of functional dependencies we define

Ω0(G, F) := G
Ωi(G, F) := Ωi−1(G, F) ∪ ⋃ { g' | g ⊢f g' } over all f ∈ F, g ∈ Ωi−1(G, F)

Let Ω(G, F) be ⋃i≥0 Ωi(G, F). We write g ⊢F g' if and only if g' ∈ Ω({g}, F).
Functional dependencies, and thus new orderings and groupings, can stem from several sources, among them:

1. key constraints
2. join predicates
3. filter predicates
4. simple expressions
23.3 Overview
As we have seen, explicit maintenance of the set of logical orderings and group-
ings can be very expensive. However, the ADT OrderingGrouping required
for plan generation does not need to offer access to this set: it only has to test whether a given interesting order or grouping is in the set, and to change the set according to new functional dependencies. Hence, it is not required to explicitly
represent this set; an implicit representation is sufficient as long as the ADT
operations can be implemented atop it. In other words, we need not be able
to reconstruct the set of logical orderings and groupings from the state of the
ADT. This gives us room for optimizations.
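The resulting ADT interface is tiny. The following C++ sketch (all names assumed) represents the state as a single integer index into tables precomputed from the FSM construction described below:

#include <vector>

// Implicit representation: the ADT state is a DFSM state index; the
// precomputed tables make both operations O(1).
class OrderingGrouping {
 public:
  OrderingGrouping(const std::vector<std::vector<bool>>* contains,
                   const std::vector<std::vector<int>>* transition,
                   int initialState)
      : contains_(contains), transition_(transition), state_(initialState) {}

  // Does the current plan satisfy interesting order/grouping 'og'?
  bool satisfies(int og) const { return (*contains_)[state_][og]; }

  // Apply functional dependency 'fd' (an index into the edge labels).
  void applyFD(int fd) { state_ = (*transition_)[state_][fd]; }

 private:
  const std::vector<std::vector<bool>>* contains_;
  const std::vector<std::vector<int>>* transition_;
  int state_;
};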
The initial idea (see [631]) was to represent sets of logical orderings as states
of a finite state machine (FSM). Roughly, a state of the FSM represents a
current physical ordering and the set of logical orderings that can be inferred
from it given a set of functional dependencies. The edges (transitions) in the
FSM are labeled by sets of functional dependencies. They lead from one state
to another, if the target state of the edge represents the set of logical orderings
that can be derived from the orderings the edge’s source node represents by
applying the set of functional dependencies the edge is labeled with.

[Figures: NFSM fragments for the orderings (a), (a,b), (a,b,c), (a,b,c,d), (a,b,d), (a,b,d,c) with transitions labeled {b → d}, and their condensed states a, ab, abc, {ab} and abd, abcd, abdc, {abd}.]

A state of the ADT is a state of the FSM, and testing for a logical ordering or
grouping can be performed by checking if the node with the ordering or group-
ing is reachable from the current state by following edges (as we will see, this
can be precomputed to yield the O(1) time bound for the ADT operations). If
the state of the ADT must be changed because of functional dependencies, the
state in the FSM is changed by following the edge labeled with the functional
dependency.
However, the non-determinism of this transition is a problem. Therefore, for
practical purposes the NFSM must be converted into a DFSM. The resulting
DFSM is shown in Figure 23.5. Note that although in this simple example the
DFSM is very small, the conversion could lead to exponential growth. There-
fore, additional pruning techniques for groupings are presented in Section 23.4.7.
However, the inclusion of groupings is not critical for the conversion, as the
grouping part of the NFSM is nearly independent of the ordering part. In
Section 23.5 we look at the size increase due to groupings. The memory con-
sumption usually increases by a factor of two, which is the minimum expected
increase, since every ordering is a grouping.
Some operators, like sort, change the physical ordering. In the NFSM, this
is handled by changing the state to the node corresponding to the new physical
ordering. Implied by its construction, in the DFSM this new physical ordering
typically occurs in several nodes. For example, (a, b, c) occurs in both nodes of
the DFSM in Figure 23.5. It is, therefore, not obvious which node to choose.
We will take care of this problem during the construction of the NFSM (see
Section 23.4.3).
4. Precompute values
For a sample query, the extraction of both interesting orders and groupings is illustrated in Section 23.5.
To illustrate subsequent steps, we assume that the set of sets of functional dependencies

F = {{b → c}, {b → d}},

the interesting orders O = {(b), (a, b), (a, b, c)}, and the interesting groupings G = {{b}, {b, c}} have been extracted from the query. We assume that those in O^T = {(a, b, c)} and G^T = {{b, c}} are tested for but not produced by any operator, whereas those in O^P = {(b), (a, b)} and G^P = {{b}} may be produced by some algebraic operators.
[Figures: construction of the NFSM for the example, with states for the orderings (a), (b), (a,b), (a,b,c) and the groupings {b}, {b,c}, transitions labeled {b → c}, and the artificial start state q0.]
The artificial start state q0 has emanating edges incident to all states representing interesting orders in O_I^P and interesting groupings in G_I^P (the edge set D_A). Also, the states representing orderings have edges to their corresponding grouping states (the edge set D_OG), as every ordering is also a grouping. The final NFSM for the example is shown in Figure 23.10. Note that the states representing (a, b, c) and {b, c} are not linked to q0 by an artificial edge, since they are only tested for: they are in Q_I^T.
[Figure 23.10: the final NFSM with start state q0, ordering states (b), (a,b), (a,b,c), grouping states {b}, {b,c}, and transitions labeled {b → c}. Figure 23.11: the corresponding DFSM with states 1: {b}; 2: (b),{b}; 3: (a),(a,b); 4: {b},{b,c}; 5: (b),{b},{b,c}; 6: (a),(a,b),(a,b,c).]
state   (a)  (a,b)  (a,b,c)  (b)  {b}  {b,c}
1        0    0      0        0    1    0
2        0    0      0        1    1    0
3        1    1      0        0    0    0
4        0    0      0        0    1    1
5        0    0      0        1    1    1
6        1    1      1        0    0    0
The construction of the DFSM from the NFSM follows the standard power
set construction that is used to translate an NFA into a DFA [544]. A formal
description and a proof of correctness is given in Section ??. It is important
to note that this construction preserves the start state and the artificial edges.
The resulting DFSM for the example is shown in Figure 23.11.
state   {b → c}  (a,b)  (b)  {b}
q0      -        3      2    1
1       4        -      -    -
2       5        -      -    -
3       6        -      -    -
4       4        -      -    -
5       5        -      -    -
6       6        -      -    -
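To make the two tables concrete, here is the example encoded directly in C++ (row 0 stands for q0, −1 for a missing transition, column order as in the tables above); the snippet replays "produce ordering (b), then apply b → c":

#include <cassert>

// Rows: q0, states 1..6. Columns: (a), (a,b), (a,b,c), (b), {b}, {b,c}.
const bool contains[7][6] = {
  {0, 0, 0, 0, 0, 0},  // q0
  {0, 0, 0, 0, 1, 0},  // 1: {b}
  {0, 0, 0, 1, 1, 0},  // 2: (b),{b}
  {1, 1, 0, 0, 0, 0},  // 3: (a),(a,b)
  {0, 0, 0, 0, 1, 1},  // 4: {b},{b,c}
  {0, 0, 0, 1, 1, 1},  // 5: (b),{b},{b,c}
  {1, 1, 1, 0, 0, 0},  // 6: (a),(a,b),(a,b,c)
};
// Columns: apply {b -> c}, produce (a,b), produce (b), produce {b}.
const int transition[7][4] = {
  {-1, 3, 2, 1},                                // q0
  {4, -1, -1, -1}, {5, -1, -1, -1}, {6, -1, -1, -1},
  {4, -1, -1, -1}, {5, -1, -1, -1}, {6, -1, -1, -1},
};

int main() {
  int state = transition[0][2];   // produce ordering (b): state 2
  state = transition[state][0];   // apply FD b -> c: state 5
  assert(contains[state][5]);     // now also grouped by {b,c}
}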
the search space: Plans can only be compared and pruned if they have compa-
rable ordering and a comparable set of functional dependencies (see [807, 808]
for details). Reducing the size of the DFSM removes information that is not
relevant for plan generation and, therefore, allows a more aggressive pruning of
plans.
At first, the functional dependencies are pruned: functional dependencies which can never lead to a new interesting order or grouping are removed. For convenience, we extend the definition of Ω(O, F) to the case F = ∅, which then only forms the prefix closure. Pairs of states that behave identically under every functional dependency can be merged; they are given by

{(o1, o2) | o1 ∈ QA, o2 ∈ QA ∧ ∀f ∈ F : (Ω({o1}, {f}) \ Ω({o1}, ∅)) = (Ω({o2}, {f}) \ Ω({o2}, ∅))}.

The following states can be replaced with the next state reachable by an edge:

{o | o ∈ QA ∧ ∀f ∈ F : Ω(Ω({o}, ∅), {f}) \ {o} = Ω(Ω({o}, ∅) \ {o}, {f})}.
In the example, this removed the state (b, c), which was artificial and only led
to the state (b).
These techniques reduce the size of the NFSM, but still most states are
artificial states, i.e. they are only created because they can be reached by con-
sidering functional dependencies when a certain ordering or grouping is avail-
able. But many of these states are not relevant for the actual query processing.
For example, given a set of interesting orders which consists only of a single
ordering (a) and a set of functional dependencies which consists only of a → b,
the NFSM will contain (among others) two states: (a) and (a, b). The state
(a, b) is created since it can be reached from (a) by considering the functional
dependency, however, it is irrelevant for the plan generation, since (a, b) is not
an interesting order and is never created nor tested for. Actually, in the ex-
ample above, the whole functional dependency would be pruned (since b never
occurs in an interesting order), but the problem remains for combinations of
interesting orders: Given the interesting orders (a), (b) and (c) and the func-
tional dependencies {a → b, b → a, b → c, c → b}, the NFSM will contain states
for all permutations of a, b and c. But these states are completely useless, since
all interesting orders consist only of a single attribute and, therefore, only the
first entry of an ordering is ever tested.
Ideally, the NFSM should only contain states which are relevant for the
query; since this is difficult to ensure, a heuristic can be used which greatly
reduces the size of the NFSM and still guarantees that all relevant states are
available: When considering a functional dependency of the form a → b and
an ordering o1 , o2 , . . . , on with oi = a for some i (1 ≤ i ≤ n), the b can be
inserted at any position j with i < j ≤ n + 1 (for the special case of a condition
a = b, i = j is also possible). So, an entry of an ordering can only affect entries
on the right of its own position. This means that it is unnecessary to consider
those parts of an ordering which are behind the length of the longest interesting
order; since that part cannot influence any entries relevant for plan generation,
it can be omitted. Therefore, the orderings created by functional dependencies
can be cut off after the maximum length of interesting orders, which results in
less possible combinations and a smaller NFSM.
The space of possible orderings can be limited further by taking into account
the prefix of the ordering: before inserting an entry b in an ordering o1 , o2 , . . . , on
at the position i, check if there is actually an interesting order with the prefix
o1 , o2 , ...oi−1 , b and stop inserting if no interesting order is found. Also limit the
new ordering to the length of the longest matching interesting order; further
attributes will never be used. If functional dependencies of the form a = b occur,
they might influence the prefix of the ordering and the simple test described
above is not sufficient. Therefore, a representative is chosen for each equivalence
class created by these dependencies, and for the prefix test the attributes are
replaced with their representatives. Since the set of interesting orders with
a prefix of o1, . . . , on is a superset of the set for the prefix o1, . . . , on, on+1, this
heuristic can be implemented very efficiently by iterating over i and reducing
the set as needed.
Additional techniques can be used to avoid creating superfluous artificial
states for groupings: First, in Step 2.3 (see Figure 23.6) the set of attributes
occurring in interesting groupings is determined:
AG = {a | ∃ g ∈ G_I : a ∈ g}

Then, the set of attributes reachable from an attribute a via functional dependencies, restricted to AG, is computed as

r(a, 0) = {a}
r(a, n) = r(a, n−1) ∪ {a' | ∃ (a1 . . . am → a') ∈ F : {a1 . . . am} ∩ r(a, n−1) ≠ ∅}
r(a) = r(a, |F|) ∩ AG

Second, the equivalence classes induced by equations are determined and a representative is chosen for each class:

E(a, 0) = {a}
E(a, n) = E(a, n−1) ∪ {a' | ((a = a') ∈ F) ∨ ((a' = a) ∈ F)}
E(a) = E(a, |F|)
e(a) = a representative chosen from E(a)
e({a1 . . . an}) = {e(a1) . . . e(an)}
G_I^E = {e(g) | g ∈ G_I}
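A sketch of the reachability computation r(a) under an explicit functional dependency representation (types and names are our assumptions):

#include <cstddef>
#include <set>
#include <vector>

struct FD {
  std::set<int> lhs;  // a1 ... am
  int rhs;            // a'
};

// r(a): attributes reachable from 'a' via functional dependencies,
// restricted to the attributes AG occurring in interesting groupings.
// |F| rounds suffice to reach the fixpoint.
std::set<int> reachable(int a, const std::vector<FD>& F, const std::set<int>& AG) {
  std::set<int> r = {a};
  for (std::size_t round = 0; round < F.size(); ++round)
    for (const FD& fd : F)
      for (int x : fd.lhs)
        if (r.count(x)) { r.insert(fd.rhs); break; }
  std::set<int> result;
  for (int x : r)
    if (AG.count(x)) result.insert(x);
  return result;
}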
Note that although they appear to test similar conditions, the first pruning
technique (using r(a)) is not dominated by the second one (using e(a)). Con-
sider e.g. the interesting grouping {a}, the equation a = b and the functional
dependency a → b. Using only the second technique, the grouping {a, b} would
be created, although it is not relevant.
select *
from S s, R r
where r.a=s.a and r.b=s.b and
r.c=s.c and r.d=s.d
When answering this query using a sort-merge join, the operator has to
request a certain ordering. But there are many orderings that could be used: the intuitive ordering would be abcd, but adcb or any other permutation could
have been used as well. This is problematic, as checking for an exponential
number of possibilities is not acceptable in general. Note that this problem is
not specific to our approach; the same is true, e.g., for Simmen's approach.
The problem can be solved by defining a total ordering between the at-
tributes, such that a canonical ordering can be constructed. We give some rules
how to derive such an ordering below, but it can happen that such an ordering is unavailable (or rather, the construction rules are ambiguous). Given, for example, two indices, one on abcd and one on adcb, both orderings would be a reasonable choice. If this happens, the operators have two choices: either they accept all reasonable orderings (which could still be an exponential number, but most likely only a few orderings remain), or they limit themselves to one ordering, which could induce unnecessary sort operators. Probably the second choice is preferable, as the ambiguous case should be rare and does not justify the complex logic of the first solution.
The attribute ordering can be derived by using the following heuristic rules:
1. Only attributes that occur in sets without natural ordering (i.e. complex
join predicates or grouping attributes) have to be ordered.
2. Orderings that are given (e.g., indices, user-requested orderings etc.) or-
der some attributes.
[Figure 23.14: Plan generation for different join graphs, Simmen's algorithm (left) vs. our algorithm (middle).]
The rules must check if they create contradictions. If this happens, the contradicting ordering must be omitted, resulting in potentially superfluous sort operators. Note that in some cases these sort operators are simply unavoidable:
If for the example query one index on R exists with the ordering abcd and one
index on S with the ordering dcba, the heuristic rules detect a contradiction
and choose one of the orderings. This results in a sort operator before the
(sort-merge) join, but this sort could not have been avoided anyway.
calls to the very expensive reduce operation. Second, since Simmen’s algorithm
requires dynamic memory, we implemented a specially tailored memory man-
agement. This alone gave us a speed up by a factor of three. We further tuned
the algorithm by thoroughly profiling it until no more improvements were possi-
ble. For each order optimization framework the plan generator was recompiled
to allow for as many compiler optimizations as possible. We also carefully ob-
served that in all cases both order optimization algorithms produced the same
optimal plan.
We first measured the plan generation times and memory usage for TPC-R Query 8. A detailed discussion of this query follows in Section 23.7; here we ignored the grouping properties to compare it with Simmen's algorithm.
we ignored the grouping properties to compare it with Simmen’s algorithm.
The result of this experiment is summarized in the following table. Since or-
der optimization is tightly integrated with plan generation, it is impossible to
exactly measure the time spent just for order optimization during plan gener-
ation. Hence, we decided to measure the impact of order optimization on the
total plan generation time. This has the advantage that we can also (for the
first time) measure the impact order optimization has on plan generation time.
This is important since one could argue that we are optimizing a problem with
no significant impact on plan generation time, hence solving a non-problem. As
we will see, this is definitely not the case.
In subsequent tables, we denote by t(ms) the total execution time for plan
generation measured in milliseconds, by #Plans the total number of subplans
generated, by t/plan the average time (in microseconds) needed to introduce
one plan operator, i.e. the time to produce a single subplan, and by Memory
the total memory (in KB) consumed by the order optimization algorithms.
              Simmen    Our algorithm
t (ms)           262               52
#Plans        200536           123954
t/plan (µs)     1.31             0.42
Memory (KB)      329              136
From these numbers, it becomes obvious that order optimization has a signif-
icant influence on total plan generation time. It may come as a surprise that
fewer plans need to be generated by our approach. This is due to the fact
that the (reduced) FSM only contains the information relevant to the query,
resulting in fewer states. With Simmen’s approach, the plan generator can only
discard plans if the ordering is the same and the set of functional dependen-
cies is equal (respectively a subset). It does not recognize that the additional
information is not relevant for the query.
In order to show the influence of the query on the possible gains of our
algorithm, we generated queries with 5-10 relations and a varying number of
join predicates, that is, edges in the join graph. We always started from a chain query and then randomly added some edges. For small queries we averaged the results of 100 queries; for large queries we averaged 10 queries.
The results of the experiment can be found in Fig. 23.14. In the second column,
we denote the number of edges in terms of the number of relations (n) given in
the first column. The next six columns contain (1) the total time needed for
plan generation (in ms), (2) the number of (sub-) plans generated, and (3) the
time needed to generate a subplan (in µs), i.e. to add a single plan operator, for
(a) Simmen’s algorithm (columns 3-5) and our algorithm (columns 6-8). The
total plan generation time includes building the DFSM when our algorithm is
used. The last three columns contain the improvement factors for these three
measures achieved by our algorithm. More specifically, column % x contains
the result of dividing the x column of Simmen’s algorithm by the corresponding
x column entry of our algorithm.
Note that we are able to keep the plan generation time below one second
in most cases and three seconds in the worst case, whereas when Simmen’s
algorithm is applied, plan generation time can be as high as 200 seconds. This
observation leads to two important conclusions:
For completeness, we also give the memory consumption during plan gen-
eration for the two order optimization algorithms (see Fig. 23.15). For our
approach, we also give the sizes of the DFSM which are included in the to-
tal memory consumption. All memory sizes are in KB. As one can see, our
approach consumes about half as much memory as Simmen’s algorithm.
select
    o_year,
    sum(case when nation = '[NATION]'
             then volume
             else 0
        end) / sum(volume) as mkt_share
from
    (select
         extract(year from o_orderdate) as o_year,
         l_extendedprice * (1 - l_discount) as volume,
         n2.n_name as nation
     from part, supplier, lineitem, orders, customer,
          nation n1, nation n2, region
     where
         p_partkey = l_partkey and
         s_suppkey = l_suppkey and
         l_orderkey = o_orderkey and
         o_custkey = c_custkey and
         c_nationkey = n1.n_nationkey and
         n1.n_regionkey = r_regionkey and
         r_name = '[REGION]' and
         s_nationkey = n2.n_nationkey and
         o_orderdate between date '1995-01-01' and date '1996-12-31' and
         p_type = '[TYPE]'
    ) as all_nations
group by o_year
order by o_year;
When considering this query, all attributes used in joins, group-by and
order-by clauses are added to the set of interesting orders. Since hash-based
solutions are possible, they are also added to the set of interesting groupings.
Note that here O_I^T and G_I^T are empty, as we assumed that each ordering and grouping would be produced if beneficial. For example, we might assume that it makes no sense to intentionally group by o_year: if a tuple stream is already grouped by o_year, it makes sense to exploit this; however, instead of just grouping by o_year it could make sense to sort by o_year, as this is required anyway (although here it only makes sense if the sort operator performs early aggregation). In this case, {o_year} would move from G_I^P to G_I^T, as it would be only tested for, but not produced.
The set of functional dependencies (and equations) contains all join conditions and constant conditions of the query.

[Figure 23.16: preparation time in ms over the number of relations (4 to 11) for the combined framework (o+g) and the orderings-only framework (o), each with n−1, n, and n+1 additional edges in the join graph.]
Here time and space requirements both increase by a factor of two. Since
all interesting orderings are also treated as interesting groupings, a factor of
about two was expected.
While Query 8 is one of the more complex TPC-R queries, it is not overly
complex when looking at order optimization. It contains 16 interesting order-
ings/groupings and 8 functional dependencies, but they cannot be combined in
many reasonable ways, resulting in a comparatively small DFSM. In order to
get more complex examples, we produced randomized queries with 5-10 rela-
tions and a varying number of join predicates. We always started from a chain
query and then randomly added additional edges to the join graph. The results
are shown for n − 1, n and n + 1 additional edges. In the case of 10 relations,
this means that the join graph consisted of 18, 19 and 20 edges, respectively.
The time and space requirements for the preparation step are shown in
Figure 23.16 and Figure 23.17, respectively. For each number of relations, the
requirements for the combined framework (o+g) and the framework ignoring
groupings (o) are shown. The numbers in parentheses (n − 1, n and n + 1) are
the number of additional edges in the join graph.
[Figure 23.17: memory consumption in KB of the preparation step over the number of relations (4 to 11) for o+g and o, each with n−1, n, and n+1 additional edges.]
A tuple stream satisfying O aG→bG is first grouped by a and then (within the block of tuples with the same a value) grouped by b. However, this is a very strong condition that is usually not satisfied by a hash-based grouping operator. Therefore, their work is not general enough to capture the full functionality offered by a state-of-the-art query execution engine.
In this chapter, we followed [631, 630].
Chapter 24

Cardinality and Cost Estimation

24.1 Introduction
The plan generator relies on a cost function to evaluate the different plans and to
determine the cheapest one. This chapter is concerned with the development of
cost functions. The main input to cost functions are cardinalities. For example,
assume a scan of a relation, which also applies a selection predicate. Clearly,
the cost of scanning the relation depends on the physical layout of the relation
on disk. Further, the CPU cost for evaluating the predicate depends on the
number of tuples in the relation. Note that the cardinality of a relation is
independent of its physical layout.
In general, the cost of an algebraic operator is estimated by using a profile of
the database. The profile must be small, e.g., a couple of kilobytes per relation1 .
We distinguish between the logical and the physical profile. For each database
item and its constituents, there exist specialized logical and physical profiles.
They exist for relations, indices, attributes, and sets of attributes. Consider a
relation R. Its cardinality |R| belongs to its logical profile, whereas the number
of pages ||R|| it occupies belongs to its physical profile. In Chapter 4, we saw
more advanced physical profiles.
The DBMS must be capable of performing several operations to derive profiles and to deal with them. Fig. 24.1 gives an overview. This figure roughly follows the approach of Mannino et al. [573, 572]. The first operation is the build operation, which takes as input a specification of the profiles to be built (because there are many different alternatives, as we will see) and the database. From that, it builds the corresponding profiles for all database items of all the different granularities. When updates arrive, the profiles must be updated. This can either be done by a complete recalculation or by an incremental update operation on the profiles themselves. The latter is reflected in the operation update.
Unfortunately, not all profiles can have an update operation. Within this book,
we will not be too concerned with building and updating profiles. At the end
of this chapter, we will provide some references (see [206] for an overview).

1 Given today's cost for main memory, it may also be reasonable to use a couple of megabytes.

[Figure 24.1: overview of profile operations. The build operation derives logical and physical profiles from a profile specification and the database; the update operation maintains the profiles under updates; cardinality estimation maps an algebraic expression and logical profiles to a cardinality; cost estimation additionally consults physical profiles.]
The main operations this chapter deals with are among the remaining ones.
The first of them is cardinality estimation. Given an algebraic expression or a
calculus expression together with a logical profile of the database, we estimate
the output/result cardinality of the expression. Why do we say algebraic or
calculus expression? Remember that plan generators generate plans for plan
classes. Each plan class corresponds to a set of equivalent plans. They all pro-
duce the same result and, hence, the same number of output tuples. Thus, in
theory, one arbitrary representative of the class of equivalent algebraic expres-
sions should suffice to calculate the logical profile, as a logical profile depends
only on the outcome. On the other hand, the plan class more directly corre-
sponds to a calculus expression. Hence, estimating the result cardinality of a
calculus expression is a viable alternative. In the literature, most papers deal
with the first approach while only a few deal with the latter (e.g., [223]).
The second operation we will be concerned with is cost estimation. Given
logical and physical profiles for all inputs and an algebraic operator (tree), this
operation calculates the actual costs. Chapter 4 contains a detailed discussion
about disk access cost calculation. Hence, this part is considered done for
building blocks and access paths.
The third major task is profile propagation. Given a logical or physical
profile and an expression, we must be able to calculate the profile of the result,
since this may be the input to other expressions and thus be needed for further
cardinality estimates. The estimation of a physical profile occurs mostly in
cases where operators write to disk. Given Chapter 4, this task is easy enough
to be left to the reader.
Since we follow the algebraic approach, we must be able to calculate the
output cardinality of every operator occurring in the algebra. This task is
vastly simplified by the observations contained in Table 24.1.
This shows that we can go a far way if we are able to estimate the output
cardinality for duplicate eliminating projections, selections, (bag) joins, and
semijoins. For a certain class of profiles, Richard shows that a profile consist-
ing ‘only’ of the sizes of all duplicate eliminating projections on all subsets of
attributes of all relations is a complete profile under certain assumptions [717].
Since the set of subsets of a set of attributes can be quite large, Richard ex-
ploits functional dependencies to reduce this set, using the fact that |Π^D_{α∪β}(R)| = |Π^D_α(R)| if there exists a functional dependency α → β.
A major differentiator for logical attribute profiles is the kind of domain of the attribute. We distinguish between categorical attributes (e.g., color), discrete ordered domains (e.g., integer attributes, decimals, strings), and continuous ordered domains (e.g., float). Categorical domains may be ordered or unordered; in the first case they are called ordinal, in the latter nominal. We will be mainly concerned with integer attributes. Strings are special, and we discuss some approaches in Sec. 24.13.6. Continuous domains are also special: the probability that any particular value of a continuous domain occurs in a finite set is zero. The techniques developed in this section can often easily be adapted to continuous domains.

A top-most cost formula combines the I/O and the CPU costs of a plan, for example as

C = CI/O + w ∗ Ccpu (24.1)

where w is a weight which can be adapted to different situations. If, for example, the system is CPU bound, we should increase w, and if it is I/O bound, we decrease w.
However, it is not totally clear what we are going to optimize under this cost
formula. One interpretation could be the following. Assume w = 0.5. Then,
we could interpret the total costs as response time under the assumption that
fifty percent of the CPU time can be executed in parallel with I/O. Accordingly,
we find other top-most cost formulas. For example, the weight is sometimes
dropped [401]:
C = CI/O + Ccpu (24.2)
Under the above interpretation, this would mean that concurrency is totally
absent. The opposite, total concurrency between I/O and CPU, can also be
found [166]:
C = max(CI/O , Ccpu ) (24.3)
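The three variants differ only in how the two cost components are combined; a trivial C++ sketch (names are ours):

// Three top-most cost formulas combining I/O and CPU cost.
double costWeighted(double cIO, double cCPU, double w) {
  return cIO + w * cCPU;            // (24.1): weighted sum
}
double costSum(double cIO, double cCPU) {
  return cIO + cCPU;                // (24.2): no concurrency
}
double costOverlapped(double cIO, double cCPU) {
  return cIO > cCPU ? cIO : cCPU;   // (24.3): full concurrency
}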
In these green days, an alternative is to calculate the power consumption
during query execution. Therefore, we convert CPU time to Watts consumed
by the CPU and disk time to Watts consumed by the disks and simply add up
these numbers to get an estimate of the power consumption of the plan.
However, these formulae sometimes raise a problem. For example, the nested
loop join method requires multiple evaluations of its inner part.
24.2.4 Abbreviations
We need some abbreviations to state our cost formulas. A first bunch of them
is summarized in Table 24.2. There, we assume that an index is always a B+ tree.
R, S, T      relations
I            index
A, B, C      attributes or sets of attributes
D_A          Π^D_A(R), the active domain of A
d_A          |D_A|
min_A        min(Π^D_A(R)) for an attribute A of R
max_A        max(Π^D_A(R)) for an attribute A of R
|R|          number of tuples of R
||R||        number of pages on which R is stored
||A||_B      average length of a value of attribute A of R (in bytes)
||A(R)||_B   average length of a tuple in bytes
||I||        number of leaf pages of an index
H(I)         depth of the index I minus 1
the selectivity of p. It is the main focus of the next subsection. Selinger et al. distinguish two cases. In the first case, all pages containing qualifying tuples fit into main memory. For this case, they estimate the number of pages accessed by

H(I) + F(p) ∗ (||I|| + ||R||).
EXC Note that with the help of Chapter 4, we can already do better. In the second
case, where the pages containing qualifying tuples do not fit into main memory,
they give the estimate
Next, we have to discuss the costs of different join methods. Selinger et al. propose cost formulas for the simple nested loop join (⋈nl) and the sort merge join (⋈sm). Since summing up the costs of all operators in a tree results in some problems for nested loop joins, they adhere to a recursive computation of total costs. Let e1 and e2 be two algebraic expressions. Then they estimate the cost of the simple nested loop join as

C(e1 ⋈nl e2) = C(e1) + |e1| ∗ C(e2).

For the sort merge join, an additional sort cost term
consists of writing and reading the result of ei if it needs to be sorted. This can
be estimated as

2 ∗ ⌈(1.2 ∗ |ei| ∗ ||A(ei)||_B) / pagesize⌉,

where pagesize is the page size in bytes. The factor 1.2 is called the universal fudge factor. In the above case, it takes care of the storage overhead incurred
by using slotted pages. If we assume that the merge phase of the sort merge
join can be performed in main memory, no additional I/O costs occur and we
are done.
Clearly, in the light of Chapter 4, counting the numbers of pages read is not
sufficient as the discrepancy between random and sequential I/O is tremendous.
Thus, better cost functions should use a more elaborate I/O cost model along
the lines of Chapter 4. In any case, note that the calculation of the I/O or CPU
costs of any operator highly depends on its input and output cardinalities.
The idea of the approach of Selinger et al. is to calculate the result cardinal-
ity for a plan class by the following procedure. First, the sizes of all relations
represented by the plan class are multiplied. This is the result of their cross
product. In a second step, they take a look at the predicate p applied to the
relations in the plan class. For p they calculate a selectivity estimate s(p) and
multiply it with the result of the first step. This then gives the result. Hence,
if a plan class represents the algebraic expression σp(R1 × · · · × Rn), its estimated result cardinality is s(p) ∗ |R1| ∗ · · · ∗ |Rn|. Applying these and other simplifying assumptions often leads to an overestimate of real result cardinalities [173, 175].
How bad is it in terms of plan generation if we under- or overestimate the cardinalities of intermediate results? As Ioannidis and Christodoulakis pointed out, errors propagate multiplicatively through joins [444]. Assume we want to join eight relations R1, . . . , R8 and that the cardinality estimates of the Ri are each a factor of 5 off. Then the cardinality estimate of R1 ⋈ R2 ⋈ R3 will be a factor of 125 off. Clearly, this can affect the subsequent join ordering. If we were only a factor of 2 off, the cardinality estimate of R1 ⋈ R2 ⋈ R3 would be only a factor of eight off. This shows that minimizing the multiplicative error is a serious concern.
The effect of misestimating cardinalities on plan quality has not been thor-
oughly investigated. There exists a study by Kumar and Stonebraker, which
concludes that it does not matter [513]. However, we do not trust this conclu-
sion. Swami and Schiefer give a query and its profiles for which bad cardinality
estimates lead to a very bad plan [851]. A very impressive example query is
presented in [871]. The plan produced for the query under cardinality estima-
tion errors runs 40 minutes while the plan produced with better cardinality
estimates takes less than 2 seconds. Later, we will give two further examples
showing that good cardinality estimation is vital for generation of good plans.
Hence, we are very sure that accurate estimation is vital for plan generation.
We suggest that the reader find examples, using the simple Cout cost function,
where wrong cardinality estimates lead to bad plans. EXC
For an attribute A of a relation R, its profile consists of

bA = [lA, uA, fA, dA]

with

lA = min(ΠA(R)),
uA = max(ΠA(R)),
fA = |R|,
dA = |Π^D_A(R)|.
If the attribute A is implicit from the context or does not matter, we may omit
it.
24.3.2 Assumptions
The first two assumptions we make are:
Let DA = {x1, . . . , xdA} with xi < xi+1 be the set of distinct values occurring for attribute A, also known as its active domain. Then we can define the spread as

∆i = xi+1 − xi .
The equal spread assumption (ESA) states that ∆i = ∆j for all 1 ≤ i, j < dA .
Denote this value by ∆A .
There are three subtypes of the equal spread assumption, depending on whether we assume the lower and upper bounds l_A and u_A belong to D_A. Type I assumes l_A, u_A ∈ D_A. Then ∆_A becomes (u_A − l_A)/(d_A − 1). In case of type II, where l_A ∈ D_A and u_A ∉ D_A holds, we have ∆_A = (u_A − l_A)/d_A. For type III, where l_A ∉ D_A and u_A ∉ D_A, we get ∆_A = (u_A − l_A)/(d_A + 1). As an example, take l_A = 1, u_A = 13, and d_A = 3. Then, for the three types, we have the different values 12/2 = 6, 12/3 = 4, and 12/4 = 3. The difference between the types is small if d_A is sufficiently large. If d_A is small, we can store the frequency of each value explicitly; otherwise, d_A is large and it does not matter which type we use. In the case of integer values, the above formulas may yield non-integers. Thus, we prefer to define in this case
    ∆_A = ⌊(u_A − l_A + 1) / d_A⌋.
An alternative to the uniform distribution assumption and the equal spread assumption is the continuous-value assumption. Here, we assume that all values in the (discrete and finite) domain occur with frequency f_A/n_A, where n_A denotes the size of the domain.
Different assumptions can lead to different estimates. To see this, we first fix some notation. Then, we provide estimation procedures under the continuous-value assumption and under the equal spread assumption. Afterwards, we present an example. Assume we are given a relation R and one of its attributes A. The set of possible values for attribute A, as implied by its type, is called the universe and is abbreviated by U_A. The set of actually occurring values is the active domain D_A, which we already saw. The total number of tuples in R is typically called its cardinality and denoted by |R|. However, in this chapter we prefer to call this value the cumulated frequency and denote it by f_A. Remember that we denote the minimum of D_A by l_A and the maximum by u_A.
For attribute A, we consider range queries and try to estimate the result
cardinality thereof. Thus, we are interested in queries Q of the form
select count(*) from R where lq ≤ A ≤ uq .
We denote the result of this range query by fq .
We describe frequency densities of some attribute A by sets of points (xi , fi ),
where xi is a domain value and fi is the frequency of the domain value. Thus,
the frequency density is the result of the query
select A, count(*) from R group by A.
Here is our example for a frequency density: among its points (x_i, f_i) are (1, 7) and (7, 2). Thus, the integer value 1 occurs 7 times and the value 7 occurs 2 times.
Under the continuous-value assumption, the result cardinality of the range query is estimated as
    f̂_q(cva) := ((u_q − l_q + 1) / (u_A − l_A + 1)) * f_A.
Let us recall the spread under the equal spread assumption. For integer values, we defined
    ∆_A := ⌊(u_A − l_A + 1) / d_A⌋.
Using this definition, we provide an estimate f̂_q(esa) by applying the following formula:
    f̂_q(esa) := ⌊(u_q − l_q + 1) / ∆_A⌋ * (f_A / d_A).
Note that if the active domain is dense, i.e., all possible values within [l_A, u_A] occur in the database, then the estimates under cva and esa coincide. EXC
Fig. 24.2 shows the results for 28 different range queries, specified by their lower bound (l_q) and upper bound (u_q), for the frequency density given above. The true cumulated frequency within the given query range is given in the column f_q. The estimates determined under CVA and ESA are presented as well, together with a column indicating the better assumption for that particular query. As we can see, in most cases ESA wins. However, experiments by Wang and Sevcik [893] came to the conclusion that the opposite is true and CVA is superior to ESA. (We can confirm this claim at least for some of their data sets.) Since estimates using CVA are easier to calculate and easily extendible to continuous domains, we prefer them.
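To make the two estimators concrete, the following Python sketch (continuing the Profile sketch from above; all names are ours) computes f̂_q under CVA and ESA for an integer attribute:

def estimate_cva(lq, uq, p):
    # f_q(cva) = (u_q - l_q + 1) / (u_A - l_A + 1) * f_A
    return (uq - lq + 1) / (p.u - p.l + 1) * p.f

def estimate_esa(lq, uq, p):
    # spread Delta_A = floor((u_A - l_A + 1) / d_A); the query range
    # contains about floor((u_q - l_q + 1) / Delta_A) distinct values,
    # each occurring f_A / d_A times on average
    spread = (p.u - p.l + 1) // p.d
    return ((uq - lq + 1) // spread) * p.f / p.d

# For l_A = 1, u_A = 13, d_A = 3, f_A = 10:
# estimate_cva(1, 4, Profile(1, 13, 10, 3)) == 4/13 * 10 ~ 3.08
# estimate_esa(1, 4, Profile(1, 13, 10, 3)) == 1 * 10/3  ~ 3.33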
Given the above assumptions (and one more to come), the task is to establish
the operations cardinality estimation and logical profile propagation. The latter
implies that we can calculate the logical profile of all attributes of any result
relation established by applying some algebraic operator. Assume we have
solved this task. Then it is clear that the cumulated frequency fA , which
equals |R| in this section, solves the task of cardinality estimation. Hence,
we will not mention the cardinality estimation task explicitly any more. The
use of the cumulated frequency fA instead of the seemingly simpler cardinality
notation |R| is motivated by the fact that a single attribute will have multiple
(small, piecewise) profiles if histograms are applied. To make the formulas of
this section readily available for histogram use is the main motivation for using
the cumulated frequency.
Figure 24.2: Sample for range query result estimation under CVA and ESA.
For selection predicates involving two attributes A and B, we again need to give the profile propagation for all attributes C different from them.
Exact match queries The first case we consider is σ_{A=c} for a constant c. Clearly, l'_A = c and u'_A = c. Further,
    d'_A = 1 if c ∈ Π_A(R), and d'_A = 0 else.
We cannot be sure whether the first or second case occurs. Since no reasonable cardinality estimation should ever return zero, we always assume c ∈ Π_A(R). More generally, we assume that all constants in a query are contained in the database in the according attributes.
As every distinct value occurs about f_A/d_A times, we conclude that f'_A = f_A/d_A. A special case occurs if A is a key. Then, we can immediately conclude that f'_A = 1.
Let us now consider another attribute C ∈ A(R), C ≠ A. Since f'_C = f'_A, we only need to establish d'_C. For lack of any further knowledge, we keep the lower and upper bounds, i.e., l'_C = l_C and u'_C = u_C. To derive the number of distinct values remaining for attribute C, we can use the formula by Yao/Waters (see Sec. 4.16.1). Denote by s(p) = |σ_{A=c}(R)|/|R| = f'_A/f_A the fraction of tuples that survives the selection with predicate p ≡ A = c. Fix a distinct value for C. Using the uniform distribution assumption, it occurs in f_C/d_C tuples of R. Then, for this value, there are C(f_A − f_C/d_C, f'_A) possibilities to choose f'_A tuples without it. The total number of possibilities to choose f'_A tuples is C(f_A, f'_A). Thus, we may conclude that
    d'_C = d_C * Y^{f_A}_{f_C/d_C}(f'_A),
where Y denotes the Yao/Waters formula. Alternatively, we could use the estimate d'_C = D(d_C, f'_A) with the formula D by Cardenas.
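A small Python sketch of this computation, with Yao's formula written out via binomial coefficients (function names are ours):

from math import comb

def yao(n, m, k):
    # probability that at least one of the m tuples containing a fixed
    # value is among k tuples chosen from n without replacement:
    # Y^n_m(k) = 1 - C(n - m, k) / C(n, k)
    if k > n - m:
        return 1.0
    return 1.0 - comb(n - m, k) / comb(n, k)

def d_after_selection(d_C, f_C, f_A, f_A_new):
    # d'_C = d_C * Y^{f_A}_{f_C/d_C}(f'_A)
    m = round(f_C / d_C)      # tuples carrying one distinct C-value
    return d_C * yao(f_A, m, f_A_new)

# e.g. f_A = f_C = 1000 tuples, d_C = 10 distinct C-values, and a
# selection keeping f'_A = 5 tuples:
# d_after_selection(10, 1000, 1000, 5) ~ 4.1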
Range queries Let us now turn to range queries, i.e., selection predicates of the form c_1 ≤ A ≤ c_2, where l_A ≤ c_1 < c_2 ≤ u_A. In all of them, the lower and upper bounds are given by the range, i.e., l'_A = c_1 and u'_A = c_2. Using the System R approach, we can estimate
    f'_A = ((c_2 − c_1)/(u_A − l_A)) * f_A
    d'_A = ((c_2 − c_1)/(u_A − l_A)) * d_A
In general, we have
    f'_A = f'_B = f_A Σ_{i=1}^n p(x_i = A) p(x_i = B | x_i = A)    (24.4)
For Π_B(R) ⊆ Π_A(R), we get f'_A = f'_B = f/d_A. Summarizing these cases, we may conclude that
    f'_A = f'_B = f / max(d_A, d_B),
which is the formula applied in System R if indices exist on A and B. Clearly, we can calculate an upper bound on the number of distinct values as min(d_A, d_B).
Let us estimate the cumulated frequency after the selection if none of the above conditions hold but independence of A and B does. Then, the conditional probability p(x_i = B | x_i = A) becomes p(x_i = B) = 1/n. Thus,
    f'_A = f'_B = Σ_{i=1}^n (f/d_A)(d_A/n)(1/n) = f/n.
In case of Π_A(R) ⊆ Π_B(R), only d_A such pairs out of d_A * d_B exist. Thus, the factor becomes d_A/(d_A * d_B) = 1/d_B. For Π_B(R) ⊆ Π_A(R), we have the factor 1/d_A. Both cases can be summarized as in
    d'_A = d'_B = D(n * n, f_A) / max(d_A, d_B).
In case the domain size n is not available, we could estimate it by |Π^D_A(R) ∪ Π^D_B(R)|. If this number is not available either, we could hesitantly use d_A d_B. An alternative is to use the following summary:
predicate      f'                                   d'                                  comment
c1 ≤ A ≤ c2    f'_A = ((c2 − c1)/(uA − lA)) * fA    d'_A = ((c2 − c1)/(uA − lA)) * dA   CVA
c1 ≤ A ≤ c2    f'_A = d'_A * (fA/dA)                d'_A = (c2 − c1)/∆A                 ESA
A = B          f'_A = f/max(dA, dB)                 d'_A = dA * Y^{fA}_{fA/dA}(f'_A)    ΠA(R) ⊆ ΠB(R) or ΠA(R) ⊇ ΠB(R)
A = B          f'_A = f'_B = fA/n                   d'_A = dA * Y^{fA}_{fA/dA}(f'_A)    else
As an exercise, the reader may verify that f'_A = (d_A − 1) f_A / (2 d_A) under the type II equal spread assumption. As an additional exercise, the reader should derive d'_A and d'_B. We conjecture that EXC
    d'_A = D(d_A, f'_A)
or
    d'_A = d_A * Y^{f_A}_{f_A/d_A}(f'_A).
The following observation is crucial: even if the values of A and B are uniformly distributed in the original relations, which typically is not the case, the distribution of the values of A and B after the selection with A ≤ B is non-uniform. For example,
    p(x_i ≤ B) = (x_i − l_B)/(u_B − l_B).
Open ranges and functions There are plenty of other cases of selection predicates which we have not discussed. Let us briefly mention a few of them. An open range like A ≤ c or A ≥ c can be treated as a range query whose other bound is l_A or u_A, respectively. This helps to estimate the f'_A. The d'_A are left to the reader. EXC
Estimating selectivities for (user-defined) functions and expressions can be done by using computed attributes. For example, cardinalities for selections with predicates like g(A) = c for a function g can be treated by introducing an additional attribute g_A for which a profile can be established.
Semijoin Let us now turn to the join operator and its variants. We start with the left-semijoin and consider expressions of the type R ⋉_{A=B} S. If Π_A(R) ⊆ Π_B(S), then R ⋉_{A=B} S = R, and no profiles change. If Π_A(R) ⊇ Π_B(S), then f'_A = f_A d_B/d_A and d'_A = d_B. If A and B are independent, we calculate
    f'_A = Σ_{i=1}^n p(x_i = A) p(x_i ∈ B) = Σ_{i=1}^n (f_A/d_A)(d_A/n)(d_B/n) = f_A d_B / n
and
    d'_A = (d_A d_B)/n.
For an attribute C ∈ A(R) \ {A, B}, we have f'_C = f'_A, and d'_C can again be derived with the Yao/Waters formula as for selections.
Regular Join For the regular join R ⋈_{A=B} S, let us start with an attribute C ∈ A(R) \ {A, B}. We can apply the formulas for the semijoin because Π^D_C(R ⋈_{A=B} S) = Π^D_C(R ⋉ S). For attributes C ∈ A(S) \ {A, B}, remember that the join is symmetric, so the same formulas apply with the roles of R and S exchanged. If A and B are independent, the join cardinality can be estimated as |R ⋈_{A=B} S| = (|R||S|)/n. Rosenthal showed that this result also holds if the condition of fairness holds for at least one relation [721]. A relation is called fair with respect to an attribute A if E(|σ_{A=x}(R)|) = |R|/n_A holds for the expected value. In this case, the expected value for the result of the join is (|R||S|)/n. Note that Π^D_A(R ⋈_{A=B} S) = Π^D_A(R ⋉_{A=B} S). Thus, we can estimate the number of distinct values as
    d'_A = d'_B = d_A d_B / n.
Selfjoin The above formulas only apply if we are not dealing with a selfjoin. Of course, R ⋈_{A=B} R does not pose any problems. However, R ⋈_{A=A} R does, because all tuples find a join partner. The estimates are easy to derive:
    f'_A = (f_A f_A) / d_A
    d'_A = d_A
For all attributes C other than A, f'_C = f'_A and d'_C = d_C.
As pointed out by [23], selfjoin sizes can be used to derive an upper bound for general joins:
    |R ⋈_{A=B} S| ≤ (|R ⋈_{A=A} R| + |S ⋈_{B=B} S|) / 2.
This bound, which is an immediate consequence of the Cauchy-Schwarz inequality, can be used as a sanity check. Table 24.5 summarizes our findings for joins.
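The bound is easily checked programmatically; a small Python sketch (names ours):

from collections import Counter

def selfjoin_size(column):
    # |R join_{A=A} R| = sum over the distinct values a of f_a^2
    return sum(f * f for f in Counter(column).values())

def join_upper_bound(col_R, col_S):
    # Cauchy-Schwarz sanity bound:
    # |R join_{A=B} S| <= (|R join R| + |S join S|) / 2
    return (selfjoin_size(col_R) + selfjoin_size(col_S)) / 2

# Example:
R_A = [1, 1, 2, 3]
S_B = [1, 2, 2, 2]
true_size = sum(1 for a in R_A for b in S_B if a == b)  # 5
bound = join_upper_bound(R_A, S_B)                      # (6 + 10) / 2 = 8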
The result size of a duplicate-eliminating projection on attributes A_1, ..., A_n can be estimated as D(∏_{i=1}^n n_{A_i}, |R|) if n_{A_i}, the size of the domain of A_i, is known. Otherwise, we can use the estimate
    D(∏_{i=1}^n d_{A_i}, |R|).
join           f'                     d'                   comment
R ⋈_{A=B} S    f'_A = fA fB / dB      d'_A = dA            ΠA(R) ⊆ ΠB(S)
R ⋈_{A=B} S    f'_A = fA fB / n       d'_A = dA dB / n     else
R ⋈_{A=A} R    f'_A = fA fA / dA      d'_A = dA

Table 24.5: Profile propagation for joins
The number of distinct values in any attribute does not change, i.e., d'_{A_i} = d_{A_i}. If we have a functional dependency κ → A for a set of attributes A and κ ⊂ A, then
    |Π^D_A(R)| = |Π^D_κ(R)|.
Further, if |Π^D_A(R)| = |R|, we have |Π^D_{A'}(R)| = |R| for all A' with A' ⊇ A.
The above estimates for the result size of a duplicate-eliminating projection assume that the attribute values are uniformly distributed, i.e., every distinct value occurs with the same probability. As we will not deal with projections any more in this part of the book, let us complete the subject by giving an approach where each attribute value can have its own probability of occurrence. Skewed value distributions are not unlikely, and for attributes with few possible values, the following approach proposed by Yu, Zuzarte, and Sevcik is quite reasonable [953].
assumptions are that the attributes are independent and the values of each
of them are drawn by independent Bernoulli trials. Under these assumptions,
they derive the following three results: a lower bound, an upper bound, and an
estimate for the expected number of distinct values in the projection. In order
to state these results, we need some additional notation. Let R be a relation and define N = |R|. Further, let G = {A_1, ..., A_n} be a subset of the attributes of R. Define d_i = |Π^D_{A_i}(R)| to be the number of distinct values occurring in attribute A_i. We denote the values of A_i by a_{i,1}, ..., a_{i,d_i}.
We wish to derive an estimate for D_G = |Π^D_G(R)|. Therefore, we model each attribute A_i by a frequency vector f_i = (f_{i,1}, ..., f_{i,d_i}), where f_{i,j} is the number of occurrences of the j-th distinct value a_{i,j} of A_i divided by N. If, for example, A_1 has three distinct values which occur 90, 9, and 1 times in a relation with N = 100 elements, then f_1 becomes (0.9, 0.09, 0.01).
Let us first look at bounds for D_G. Trivially, D_G is bounded from above by
    D_G ≤ min{N, ∏_{i=1}^n d_i}
and from below by max_{i=1}^n d_i.
These bounds are very rough. This motivated Yu et al. to derive better ones.
Before we proceed, let us consider another example. Assume we have three
attributes A1 , A2 , and A3 all with frequency vectors fi = (0.9, 0.09, 0.01) for a
relation of size N = 100. Since we assume attribute independence, the proba-
bility of (a1,3 , a2,3 , a3,3 ) is 0.01 ∗ 0.01 ∗ 0.01. Thus, its occurrence in a relation of
size 100 is highly unlikely. Hence, we expect D_G to be less than 27 = 3 · 3 · 3. In general, we observe that the probability of occurrence of a tuple (a_{1,j_1}, ..., a_{n,j_n}) is the product of the relative frequencies f_{1,j_1} * ... * f_{n,j_n}. From this, the basic idea of the approach of Yu et al. becomes clear: we have to systematically consider all the different possibilities to multiply relative frequencies. This is nicely captured by the Kronecker product (tensor product).
Before we proceed, let us state the upper and lower bounds in the case of two attributes by giving two theorems developed by Yu et al. [953]. The lower bound is
    D_G^⊥ = max_{i=1,2} Σ_{j=1}^{d_i} l_{i,j},
where the l_{i,j} are determined as in the algorithm below. The algorithm in Fig. 24.3 calculates the lower bound D_G^⊥. Calculating the upper bound D_G^⊤ is much easier. For each f_{i,j}, we compute u_{i,j} by simply comparing f_{i,j} N and d_{i'}, i.e., u_{i,j} = min(f_{i,j} N, d_{i'}) with i' = 3 − i. Adding up the u_{i,j} for each attribute and taking the lesser of the two sums gives the desired result.
CalculateLowerBoundForNumberOfDistinctValues(f_1, f_2)
/* frequency vectors f_1 and f_2 */
sort f_i (i = 1, 2) in descending order;
for (i = 1, 2) {
  i' = 3 − i;
  for (j = 1, ..., d_i) {
    k = 1;
    while (f_{i,j} > Σ_{l=1}^k f_{i',l})
      ++k;
    l_{i,j} = k;
  }
  lb_i = Σ_{j=1}^{d_i} l_{i,j};
}
D_G^⊥ = max_{i=1,2} lb_i;
return D_G^⊥;

Figure 24.3: Calculating the lower bound D_G^⊥
The estimate cannot be calculated as easily. First, we calculate the Kronecker product f_G = f_1 ⊗ ... ⊗ f_n of all frequency vectors. Note that to every value combination v ∈ Π^D_{A_1}(R) × ... × Π^D_{A_n}(R) there corresponds exactly one component of f_G, which contains its probability of occurrence. With this observation, it is easy to derive the following theorem, in which we denote by f_{G,i} the i-th component of f_G and by M its length, i.e., M = ∏_{i=1}^n d_i. Further, remember that N = |R|. The expected number of distinct value combinations then is
    D̂_G = M − Σ_{i=1}^M (1 − f_{G,i})^N.
EstimateNumberOfDistinctValues(f_1, ..., f_n)
/* frequency vectors f_i */
/* step 1: calculate f_G = f_1 ⊗ ... ⊗ f_n */
f_G = f_1;
for (i = 2; i ≤ n; ++i) {
  f_old = f_G;
  f_G = ⟨⟩; // empty vector
  for (j = 1; j ≤ |f_old|; ++j) {
    for (k = 1; k ≤ d_i; ++k) {
      f_G = push_back(f_G, f_old[j] * f_{i,k}); // append a value to a vector
    }
  }
}
/* step 2: compute the expected number of distinct value combinations */
S = 0;
for (j = 1; j ≤ M; ++j) { // M = length(f_G)
  S += (1 − f_G[j])^N;
}
D̂_G = M − S;
return D̂_G;
The algorithm for computing the estimate is given in Fig. 24.4. In the first, most expensive phase, it constructs the Kronecker product. Then, the simple calculations according to the theorem follow. A more efficient implementation would calculate the Kronecker product only implicitly. Further, the frequency vectors may not be completely known, but only partially via some histogram. As was also shown by Yu et al., end-biased histograms (coming soon) are optimal with respect to the error of the resulting estimate D̂_{G,hist} derived from the histogram.
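A compact Python version of the algorithm of Fig. 24.4 that materializes the Kronecker product explicitly (a sketch; as noted above, a production implementation would enumerate it implicitly):

def estimate_distinct(freq_vectors, N):
    # step 1: Kronecker product f_G = f_1 (x) ... (x) f_n
    fG = [1.0]
    for f in freq_vectors:
        fG = [p * q for p in fG for q in f]
    # step 2: expected number of distinct value combinations
    return len(fG) - sum((1.0 - p) ** N for p in fG)

# Three skewed attributes in a relation with N = 100 tuples:
f = [0.9, 0.09, 0.01]
print(estimate_distinct([f, f, f], 100))   # well below the trivial bound 27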
Division Consider the division R ÷ S: a value a of attribute A survives it if the tuples of σ_{A=a}(R) cover all of S. Hence, for any such a, we have f_a = f_A/d_A, and with n_B equal to the size of the common domain of R.B and S.B, we can calculate the survival probability as
    (f_a / |S|) / (n_B / |S|),
provided that f_a ≥ |S| and R is a set. Denote by f'_A and d'_A the cumulated frequency and the number of distinct values for attribute A in the result of R ÷ S. Then we have the estimate
    f'_A = d'_A = d_A * (f_a / |S|) / (n_B / |S|)
in case R is a set.
If R is a bag, we must be prepared to see duplicates in σ_{A=a}(R). In this case, we can adjust the above formula to
    f'_A = d'_A = d_A * (x_A / |S|) / (n / |S|),
where x_A denotes the average number of distinct B values occurring with an A value. If the number h_a of distinct B values occurring with a is known for every a, a more precise estimate is
    f'_A = d'_A = Σ_{a ∈ Π^D_A(R)} (h_a / |S|) / (n / |S|).
Keeping h_a for every possible a may not be practical. However, if the number of distinct values in H = {h_a | a ∈ Π^D_A(R)} is small, we can keep the number of distinct a values for each possible h_a. Assume H = {h_1, ..., h_k} and define
    g_i = |{a ∈ Π^D_A(R) | h_a = h_i}|.
24.3.7 Remarks
NULL Values Our profile is not really complete for attributes which can
have NULL values. To deal with these, we need to extend our profiles by the
frequency d⊥A with which NULL occurs in an attribute A of some relation. It is
straightforward to extend the above profile to deal with this additional count.
For approximating a set of values X = {x_1, ..., x_n} by a single value x̂ under the q-error E_q = max_{i=1}^n max{x_i/x̂, x̂/x_i}, the best choice is the geometric mean of the extremes, x̂ = √(max(X) · min(X)).
Sets of Attributes Note that nothing prevents us from using the formulas developed above for selections and joins if A and B are attribute sets instead of single attributes. We just have to know or calculate d_A for sets of attributes A.
In a linear model, we approximate the given data by a function
    f̂ = Σ_{j=1}^n c_j Φ_j
for coefficients c_j ∈ R. The estimates ŷ_i for y_i are then derived from f̂ by
    ŷ_i := f̂(x_i) = Σ_{j=1}^n c_j Φ_j(x_i).
Note that the functions Φ_j are not necessarily linear functions. For example, we could use polynomials Φ_j(x) = x^{j−1}. Further, there is no need for x to be a single number. It could as well be a vector ~x.
It is convenient to state our approximation problem in terms of vectors and matrices. Let (x_i, y_i), 1 ≤ i ≤ m, be the points we want to approximate and Φ_j, 1 ≤ j ≤ n, be some functions. We define the design matrix A ∈ R^{m×n}, A = (a_{i,j}), by
    a_{i,j} = Φ_j(x_i).    (24.5)
For Φ_1(x) = 1 and Φ_2(x) = x, the design matrix thus has the rows (1, x_1), ..., (1, x_m). As an example, consider the three points (1, 20), (2, 10), and (3, 60), for which
    A = ( 1 1 ; 1 2 ; 1 3 ).
Evaluating A~c gives the result of f̂ for all points. Clearly, ~c should be determined such that the deviation of A~c from ~y = (y_1, ..., y_m)^T becomes minimal.
The deviation could be zero, that is A~c = ~y . However, remember our as-
sumption that m > n. This means that we have more equations than variables.
Thus, we have an overdetermined system of equations and it is quite unlikely
that a solution to this system of equations exists. This motivates our goal to
find an approximation as good as possible. Next, we formalize this goal.
Often used measures for deviations or distances of two vectors are based on
norms.
Definition 24.5.1 (norm) Let S be a linear space. Then a function ||x|| :
S → R is called a norm if and only if it has the following three properties:
1. ||x|| > 0 unless x = 0
2. ||λx|| = |λ| ||x||
3. ||x + y|| ≤ ||x|| + ||y||
Various norms, called l_p norms, can be found in the literature. Let x ∈ R^n and p ≥ 1, where p = ∞ is possible. Then
    ||x||_p = (Σ_{i=1}^n |x_i|^p)^{1/p}.
Using these norms, we can define distance functions d_1, d_2, and d_∞. For two vectors x and y in R^n, we define
    d_p(x, y) = ||x − y||_p for p ∈ {1, 2, ∞}.
It should be clear that these define the error measures E_1, E_2, and E_∞, which we used in Sec. 24.4. The only missing error function is E_q. We immediately fill this gap and start with the one-dimensional case.
Note that for x > 0, ||x||_Q = max(x, 1/x). The multivariate case is a straightforward extension using the maximum over all components: for x ∈ R^n with x_i > 0, we define ||x||_Q = max_{i=1}^n max(x_i, 1/x_i). The Q-paranorm satisfies 1. ||x|| ≥ 0, but it is not a norm; hence the name paranorm. The only missing part is the distance function stated next. Let x and y be two vectors in R^n, where y = (y_1, ..., y_n)^T with y_i > 0. Then we define
    d_q(x, y) = ||x/y||_Q,
where x/y denotes componentwise division.
For the different norms l (distance functions d), we get different approximation problems. For l_1, the problem is called quantile regression. We will not deal with it here, since we do not know of any application of it in the database context. The solutions to the problems for l_2, l_∞, and l_q are discussed in subsequent sections, after we have given some example applications of what needs to be approximated in a DBMS. Before we proceed, let us give the solutions for approximating the points (1, 20), (2, 10), (3, 60) with a linear function α + βx. The following table shows the values of x, y, and the estimates for y produced by the best approximations f̂_l2, f̂_l∞, f̂_lq, which minimize l_2, l_∞, and l_q, resp. Additionally, we give the α and β of the best approximations as well as their quality measured by l_2, l_∞, and l_q.
x    y     f̂_l2     f̂_l∞    f̂_lq
1    20    10        5        10
2    10    30        25       20
3    60    50        45       30

α    −10       −15    0
β     20        20    10
l2   14.1421    15    19.1485
l∞   20         15    30
lq   3          4     2
Let us repeat some general insights into approximation problems as defined
above. Thereby, we follow the exposition of Watson [901]. We start with stating
theorems on the existence of a solution. The following two theorems only apply
to norms. That is, they do not apply to lq . However, as we will see later,
solutions under lq exist.
Theorem 24.5.5 (Existence 1) Let M denote a compact set in a normed
linear space. Then to each point g of the space there exists a point of M closest
to g.
Compactness is a sufficient but not a necessary condition.
Theorem 24.5.6 (Existence 2) Let M be a finite dimensional subspace of a
normed linear space S. Then there exists a best approximation in M to any
point of S.
The next point to consider is the uniqueness of a solution. Proving the unique-
ness of a solution is easy, if the norm is strictly convex.
Definition 24.5.7 ((strictly) convex) Let f (x) be a function on the ele-
ments x of a linear space S. Then f (x) is convex if
f (λx1 + (1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 )
for all x1 , x2 ∈ S and 0 ≤ λ ≤ 1.
If 0 < λ < 1 implies strict inequality in the above inequality, f (x) is called
strictly convex.
It is easy to show that all l_p norms for p ≠ ∞ are strictly convex and that l_∞ and l_q are convex, but not strictly convex. For strictly convex norms, it is easy to show that a solution is unique.
Since many seeks occur during the processing of a single query, l_2 is the appropriate norm. On the surface, we seem to have a problem using a simple linear model, since the seek cost consists of the two parts c_1 + c_2 √d and c_3 + c_4 d, switching at some distance c_0. However, we can approximate the two parts for several distinct c_0, either by trying a full range of values for c_0 or by a binary search. The solution for c_0 we then favor is the one in which the maximum of the errors on both parts becomes minimal. A second problem is the occurrence of √d, since this does not look linear. However, choosing Φ_1(x) = 1 and Φ_2(x) = √x will work fine.
Another method is to transform a set of points (xi , yi ) with two (injective)
transformation functions tx and ty into the set of points (tx (xi ), ty (yi )). Then
this set is approximated and the result is transformed back. While using this
approach, special attention has to be paid to the norm, as it can change due to
the transformation. We see examples of this later on in Sec. 24.5.6.
This works if the active domain of the attribute under consideration is dense, which we assume in this subsection. In Section ??, we present estimation formulas without this assumption. If the number of values between c_1 and c_2 is too large for an explicit summation, we can apply speed-up techniques if the function f̂ has a simple form. For example, if f̂ is a linear function f̂(x) = α + βx, the above sum can be calculated very efficiently. EXC If the number of values between c_1 and c_2 is very large and no efficient form for the above sum can be found, or if we do not have a discrete domain, we can use the integral to approximate the sum. Thus, we use the right-hand side of
    Σ_{c_1 ≤ x ≤ c_2} f̂(x) ≈ ∫_{c_1}^{c_2} f̂(x) dx.
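For instance, for a linear f̂ over an integer range, the sum collapses into the closed form of an arithmetic series (a small worked derivation; this is the efficient calculation alluded to above):
    Σ_{x=c_1}^{c_2} (α + βx) = (c_2 − c_1 + 1) α + β (c_1 + c_2)(c_2 − c_1 + 1)/2,
so the estimate can be computed in constant time, independent of the range size.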
x_i    1   2   3   4   5   6   7   8   9   10
f_i    10  10  0   1   1   1   0   1   40  36
f_i^+  10  20  20  21  22  23  23  24  64  100
5x     5   10  15  20  25  30  35  40  45  50

For our example, let us define f̂^+(x) = 5x. Note that this is a linear approximation. Then, we see that f̂^+(x_i) is never more than a factor of 2 away from f^+(x_i). Thus, it is a pretty good approximation. This is illustrated in Figure 24.6. However, the estimates for the frequencies of single values derived from it, e.g., f̂(9) = f̂^+(9) − f̂^+(8) = 5 for the true frequency f_9 = 40, differ by far more than a factor of 2 from their true values. Thus, we have to look for a different solution.
Figure 24.6: The cumulated frequency (cum freq) and its approximation 5x.
Assume, for a change, that we are interested in half-open intervals. Thus, we would like to provide estimates for f^−(c_1, c_2) := Σ_{c_1 ≤ x_i < c_2} f_i. A rather simple method is to directly approximate f^−(c_1, c_2) by a linear function f̂^−(c_1, c_2) = a c_1 + b c_2 + c, where we apply the two constraints that f̂^−(x, x) = 0 and f̂^−(l_A, u_A) = f_A. Remember that l_A = min(Π_A(R)), u_A = max(Π_A(R)), and f_A = |R|. With these constraints, f̂^− simplifies to
    f̂^−(c_1, c_2) = ((c_2 − c_1)/(u_A − l_A)) * f_A,
which should look familiar to the reader. No error bounds come with this approximation. Even worse, we do not know whether it minimizes the q-error or not. Thus, it seems better to control the q-error more directly.
Consider now the situation where almost all values in [l, u] occur with some frequency in A. Only a few holes exist. Denote by W the set of values in [l, u] that do not occur in A. Again, the selection is of the form σ_{c_1 ≤ A ≤ c_2}(R). In the case of holes, the above summation for range queries leads to a wrong result. It would be better to calculate the result cardinality as in
    Σ_{c_1 ≤ x ≤ c_2} f̂(x) − Σ_{c_1 ≤ x ≤ c_2, x ∈ W} f̂(x).
For this to work, we have to know where the holes in [l, u] are. If there are only a few of them, we can memorize them. If they are too many to be stored, we can approximate them as follows. Let W = {w_1, ..., w_m}. Then, we can use approximation techniques to approximate the set of points (i, w_i), 1 ≤ i ≤ m. Depending on the size of the interval [l, u], either l_∞ or l_q is the appropriate norm. Similarly, the peaks, i.e., the distinct values occurring in the attribute A of R, can be approximated if there are only a few of them in the domain of A.
Why Q?
Consider a query applying selections σ_{p_i}(R_i) to each of the relations R_1, ..., R_n to be joined, where we intentionally left out all the join predicates. Ioannidis and Christodoulakis pointed out that errors propagate exponentially through joins [444]. Denote by s_i the cardinality of σ_{p_i}(R_i) and by ŝ_i its estimate. Further, assume that independence holds. This means that s_i can be written as f_i |R_i|, where f_i is the selectivity of p_i. Denote by f_{i,j} the selectivity of the join predicate between R_i and R_j, if it exists. Otherwise, we define f_{i,j} = 1. The result of joining a subset x ⊆ {R_1, ..., R_n} has cardinality
    s_x = (∏_{R_i ∈ x} f_i)(∏_{R_i,R_j ∈ x} f_{i,j})(∏_{R_i ∈ x} |R_i|).
Denote by f̂_i the estimate for the selectivity of p_i and assume that the join selectivities have been estimated correctly (which, of course, is difficult in practice). Then, the estimated cardinality of the result of joining the relations
in x is
    ŝ_x = (∏_{R_i ∈ x} f̂_i)(∏_{R_i,R_j ∈ x} f_{i,j})(∏_{R_i ∈ x} |R_i|)
        = (∏_{R_i ∈ x} f̂_i/f_i)(∏_{R_i ∈ x} f_i)(∏_{R_i,R_j ∈ x} f_{i,j})(∏_{R_i ∈ x} |R_i|)
        = (∏_{R_i ∈ x} f̂_i/f_i) s_x,
where some i belong to the category with f̂_i/f_i < 1 and others to the one with f̂_i/f_i > 1. Remember that during dynamic programming, all subsets of relations are considered, especially those subsets in which all relations belong to one category only. Hence, building on the cancellation of errors by mixing them from different categories is not a true option. Instead, we should minimize
    ∏_{R_i ∈ x} max{f_i/f̂_i, f̂_i/f_i}
in order to minimize errors and error propagation. This product can be minimized by minimizing each of its factors. This means that if we want to minimize error propagation, we have to minimize the multiplicative error E_q for estimating the cardinalities of selections based on equality. This finding can obviously be generalized to any kind of selections. Thus, for cardinality estimation for selections (and joins, or cardinality estimation in general), the q-error is the error metric of choice.
Consider, for instance, a star query, where the optimal join order is obtained by ordering the satellite relations R_i according to f_{0,i} f_i |R_i|, with f_{0,i} denoting the selectivity of the join predicate connecting R_i to the center relation R_0. Let P be the plan produced under the true selectivities f_i and P̂ the plan produced under the estimates f̂_i. Then P = P̂ if for all i ≠ j
    f_{0,i} f_i |R_i| < f_{0,j} f_j |R_j|  ⟺  f_{0,i} f̂_i |R_i| < f_{0,j} f̂_j |R_j|,
which is equivalent to
    (f_i r_i)/(f_j r_j) < 1  ⟺  (f̂_i r_i)/(f̂_j r_j) < 1
for r_i = f_{0,i} |R_i|. We now show that if
    ||f_i/f̂_i||_Q < √( min_{i≠j} ||(f_i r_i)/(f_j r_j)||_Q )    (24.9)
for all i, then P = P̂. This condition implies the much weaker condition that for all i ≠ j
    ||f̂_i/f_i||_Q ||f̂_j/f_j||_Q < ||(f_i r_i)/(f_j r_j)||_Q.    (24.10)
To show the claim, it suffices to show that (f_i r_i)/(f_j r_j) < 1 implies (f̂_i r_i)/(f̂_j r_j) < 1. This follows from
    (f̂_i r_i)/(f̂_j r_j) = ((f̂_i f_j)/(f_i f̂_j)) * ((f_i r_i)/(f_j r_j))
                        = ((f̂_i f_j)/(f_i f̂_j)) / ||(f_i r_i)/(f_j r_j)||_Q    (*)
                        ≤ (||f̂_i/f_i||_Q ||f̂_j/f_j||_Q) / ||(f_i r_i)/(f_j r_j)||_Q
                        < 1,
where (*) follows from (f_i r_i)/(f_j r_j) < 1. Thus, we have shown that if the q-error is limited as in condition 24.9, the produced plan is still optimal.
Figure 24.7: Ratio cost(P̂)/cost(P) as a function of the q-error for a chain query and a star query.
Let P be the optimal plan under a cost function C and the true cardinalities, and let P̂ be the plan produced under the estimated cardinalities. Denote by C(P) the true costs under C of the optimal plan and by C(P̂) the true costs under C of the plan produced under the estimated cardinalities. Then
    C(P̂) ≤ q^4 C(P),
where q is defined as
    q = max_{x ⊆ X} ||ŝ_x/s_x||_Q,
with X being the set of relations to be joined, and s_x (ŝ_x) is the true (estimated) size of the join of the relations in x. That is, q is the maximum estimation error taken over all intermediate results.
This bound is rather tight, as is demonstrated by the example shown in Fig. 24.7 (taken from [611]). This figure shows, for a chain and a star query with four relations, the quotient cost(P̂)/cost(P) for increasing q-errors. For the star query, we see that this ratio reaches about 11.11, which is about 2^{3.46}. Thus, a bound of the form q^3 C(P) would fail.
Since we are used to solving equations for x, we rewrite our problem to A~x = b. That is, the vector ~x replaces the coefficient vector ~c. Using Theorem 24.5.10, we must have that A~x* − b is orthogonal to the range of A. The range of a matrix A ∈ R^{m×n} is defined as R(A) = {Ax | x ∈ R^n}. Let a_i be the i-th column vector of A and ~x = (x_1, ..., x_n)^T. Then, the best approximation can be found by solving the following system of linear equations, called the (Gauß) normal equations:
    A^T A ~x = A^T b.
Definition 24.5.12 (full rank) A matrix A ∈ Rm×n , m > n has full rank if
its rank is n.
Note that for all matrices A ∈ Rm×n , we always have that AAT and AT A are
symmetric.
Definition 24.5.14 (idempotent) A matrix A ∈ Rn×n is idempotent if and
only if AA = A.
A matrix for which the uniquely determined inverse exists is called regular.
Definition 24.5.16 (orthogonal) A matrix A ∈ Rn×n is orthogonal if and
only if AAT = AT A = I.
where A_{i,j} ∈ R^{(n−1)×(n−1)} results from A by eliminating the i-th row and the j-th column. We denote the set of eigenvalues of A by
    λ(A) := {λ_1, ..., λ_k}.
Two similar matrices have the same eigenvalues, as can be seen from the following theorem.
Theorem 24.5.21 Let A, B ∈ Rn×n be two similar matrices. Then they have
the same characteristic polynomial.
Every matrix and, hence, every vector has a g-inverse. For regular matrices,
the g-inverse and the inverse coincide. In general, the g-inverse is not uniquely
determined. Adding some additional properties makes it unique.
1. A A^+ A = A
2. A^+ A A^+ = A^+
3. (A^+ A)^T = A^+ A
4. (A A^+)^T = A A^+
For every matrix and, hence, every vector there exists a uniquely determined
Moore-Penrose inverse. In case A is regular, A+ = A−1 holds. If A is symmetric,
then A+ A = AA+ . If A is symmetric and idempotent, then A+ = A. Further,
all of A+ A, AA+ , I − A+ A, and I − AA+ are idempotent. Here are some
equalities holding for the Moore-Penrose inverse:
    (A^+)^+ = A                    (24.12)
    (A^T)^+ = (A^+)^T              (24.13)
    (A^T A)^+ = A^+ (A^+)^T        (24.14)
    (A A^T)^+ = (A^+)^T A^+        (24.15)
    A^T A A^+ = A^T                (24.16)
    A^+ A A^T = A^T                (24.17)
For every matrix A ∈ R^{m×n} there exist orthogonal matrices U and V such that
    U^T A V = S,
where S = diag(s_1, ..., s_k) and
    s_1 ≥ s_2 ≥ ... ≥ s_r > s_{r+1} = ... = s_k = 0.
For a proof and algorithms to calculate the SVD of an arbitrary matrix, see the book by Golub and Van Loan [330]. Another proof can be found in the book by Harville [402]. The diagonal elements s_i of S, which is orthogonally equivalent to A, are called singular values. From
    S^T S = (U^T A V)^T (U^T A V) = V^T A^T U U^T A V = V^{−1} A^T A V
The l_∞ approximation problem (24.18) is to find ~a ∈ R^n minimizing ||r(~a)||_∞, where
    r(~a) = ~b − A~a.    (24.19)
The components of the vector r(~a) are denoted by r_i(~a).
As pointed out earlier, l_∞ is a convex norm. Hence, a solution exists. Since l_∞ is not strictly convex, the uniqueness of the solution is not guaranteed. To solve problem 24.18, we follow the approach proposed by Watson [901]. We start by characterizing the solution, continue with the conditions under which uniqueness holds, make some more observations, and finally derive an algorithm for the case n = 2, i.e., we find a best approximation by a linear function. Although only few applications of l_∞ exist in databases, it is very useful for finding a best approximation under l_q if we want to approximate by a function e^{β+αx} (see Sec. 24.5.6).
Assume we have a best solution ~a. Then, for some indices i, r_i(~a) attains the maximum, i.e., |r_i(~a)| = ||r(~a)||_∞. Otherwise, a better solution would exist. We denote the set of indices where the maximum is attained by Ī(~a). We further denote by θ_i(~a) the sign of r_i(~a). Thus, r_i(~a) = θ_i(~a) ||r(~a)||_∞ for all i ∈ Ī. The following theorem gives a characterization of the solution.
Theorem 24.5.25 A vector ~a ∈ R^n solves problem 24.18 if and only if there exists a subset I of Ī with |I| ≤ n + 1 and a vector ~λ ∈ R^m such that
1. λ_i = 0 for all i ∉ I,
2. λ_i θ_i ≥ 0 for all i ∈ I, and
3. A^T ~λ = ~0.
The set I in the theorem is called an extremal subset of a solution ~a.
There are two important corollaries to this theorem.
Corollary 24.5.26 Let ~a solve problem 24.18. Then ~a solves an l_∞ approximation problem in R^{n+1} obtained by restricting the components of r(~a) to some particular n + 1 components. If A has rank t, then the components of r(~a) may be restricted to a particular t + 1 components.
Corollary 24.5.27 Let ~a solve problem 24.18 and let I be chosen according to Theorem 24.5.25 such that λ_i ≠ 0 for all i ∈ I. Further, let ~d be another solution to 24.18. Then
    r_i(~d) = r_i(~a) for all i ∈ I.
Hence, not surprisingly, any two solutions have the same residuals for components where the maximum is attained. The theorem and its first corollary state that we need at most t + 1 components for a matrix A of rank t. The next theorem shows that at least t + 1 indices exist where the maximum is attained.
Theorem 24.5.28 If A has rank t, a solution to problem 24.18 exists for which |Ī| ≥ t + 1.
Thus, any submatrix of A consisting of a subset of the rows of A which correspond to the indices contained in Ī(~a) must have rank t for some solution ~a to problem 24.18.
The above theorems and corollaries indicate that the clue to uniqueness is
the rank of subsets of rows of A. The following definition captures this intuition.
Definition 24.5.29 (Haar condition) A matrix A ∈ R^{m×n}, where m ≥ n, satisfies the Haar condition if and only if every submatrix consisting of n rows of A is nonsingular.
Finally, we can derive uniqueness for those A which satisfy the Haar condition:
Theorem 24.5.30 If A satisfies the Haar condition, the solution to prob-
lem 24.18 is unique.
Obviously, we need to know whether the Haar condition holds for a matrix A. Remember that we want to approximate a set of points by a linear combination of functions Φ_j, 1 ≤ j ≤ n. From the points (x_i, y_i), 1 ≤ i ≤ m, and the Φ_j, the design matrix A is derived as shown in Equation 24.5. If the Φ_j form a Chebyshev set, the design matrix will fulfill the Haar condition.
Definition 24.5.31 (Chebyshev set) Let X be a closed interval of R. A set of continuous functions Φ_1(x), ..., Φ_n(x), Φ_i : X → R, is called a Chebyshev set if every non-trivial linear combination of these functions has at most n − 1 zeros in X.
It is well-known that the set of polynomials Φ_j = x^{j−1}, 1 ≤ j ≤ n, forms a Chebyshev set on any interval X. From now on, we assume that our x_i are ordered, that is, x_1 < ... < x_m, and we define X = [x_1, x_m]. We also assume that the matrix A of Problem 24.18 is defined as given in Equation 24.5, where the Φ_j are continuous functions from X to R.
We still need some more knowledge in order to build an algorithm. The next definition will help to derive a solution for subsets I of {1, ..., m} with |I| = n + 1.
Definition 24.5.32 (alternating set) Let ~a be a vector in R^n. We say that r(~a) alternates s times if there exist points x_{i_1}, ..., x_{i_s} ∈ {x_1, ..., x_m} such that
    r_{i_k}(~a) = −r_{i_{k+1}}(~a)
for 1 ≤ k < s. The set {x_{i_1}, ..., x_{i_s}} is called an alternating set for ~a.
Consider again the example where we want to approximate the three points (1, 20), (2, 10), and (3, 60) by a linear function. We saw that the solution to our problem is f̂_l∞(x) = −15 + 20x. The following table gives the points, the values of f̂_l∞, and the residuals, including their signs.
x    y    f̂_l∞    r_i
1    20   5        +15
2    10   25       −15
3    60   45       +15
where
    ∆(x_1, ..., x_n) = det ( Φ_1(x_1) ... Φ_n(x_1)
                                  ...
                             Φ_1(x_n) ... Φ_n(x_n) ).    (24.20)
Then
    sign(∆_i) = sign(∆_{i+1})  for all 1 ≤ i ≤ n.
Let us take a closer look at Theorem 24.5.33 in the special case where m = 3, i.e., we have exactly three points (x_{i1}, y_{i1}), (x_{i2}, y_{i2}), and (x_{i3}, y_{i3}). We find the best linear approximation f̂(x) = α + βx under l_∞ by solving the following equations:
    y_{i1} − (α + β x_{i1}) = −1 * λ
    y_{i2} − (α + β x_{i2}) = +1 * λ
    y_{i3} − (α + β x_{i3}) = −1 * λ
where λ represents the value of ||r(~a)||_∞ for the solution ~a to be found. Solving these equations results in
    β = (y_{i3} − y_{i1}) / (x_{i3} − x_{i1}),
    α = (y_{i1} + y_{i2} − β (x_{i1} + x_{i2})) / 2,
    λ = (α + β x_{i1}) − y_{i1}.
The algorithm to find the best approximation under l_∞ starts with three arbitrary points with indices i_1, i_2, and i_3 with x_{i1} < x_{i2} < x_{i3}. Next, it derives α, β, and λ using the solutions to the equations above. Then, the algorithm tries to find new indices j_1, j_2, j_3 by exchanging one of the i_j with some k such that λ will be increased. Obviously, we use a k that maximizes the deviation of the data from the best approximation f̂ for i_1, i_2, i_3. As an example, consider the case x_k > x_{i3}:
    if (sign(y_k − f̂(x_k)) == sign(y_{i3} − f̂(x_{i3})))
    then j_1 = i_1, j_2 = i_2, j_3 = k
    else j_1 = i_2, j_2 = i_3, j_3 = k
The above rules are called exchange rules. In general, they state that if k falls between two indices, the one with the same sign as r_k is replaced by k. If k is smaller than the smallest index (larger than the largest index), we consider two cases. If the smallest (largest) index has the same sign of its residue as k, we exchange it with k; otherwise, we exchange it with the largest (smallest) index. Stated this way, we can use the exchange rules for cases where n > 2.
Algorithm 24.8 summarizes the above considerations.
In case n > 2, the above algorithm remains applicable. We just have to use the general exchange rule and provide a routine solving the following system of equations for the coefficients a_j and λ: EXC
    y_{i_k} − Σ_{j=1}^n a_j Φ_j(x_{i_k}) = (−1)^k λ,  1 ≤ k ≤ n + 1.

BestLinearApproximationUnderChebyshevNorm
1. Choose three initial indices i_1 < i_2 < i_3.
2. Compute α, β, and λ for the current three points.
3. Find an x_k for which the deviation of f̂ from the given data is maximized. Call this maximal deviation λ_max.
4. If λ_max ≤ |λ|, stop. Otherwise, apply the exchange rules and continue with step 2.
5. Return α, β, λ.
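The following Python sketch implements this method (a sketch assuming at least three points with pairwise distinct, numeric x_i; it is not tuned for numerical robustness, and all names are ours):

def solve3(p1, p2, p3):
    # best l_inf line through three points: the residuals at the three
    # points have equal magnitude lam and alternating signs
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    beta = (y3 - y1) / (x3 - x1)
    alpha = (y1 + y2 - beta * (x1 + x2)) / 2.0
    lam = abs(y1 - alpha - beta * x1)
    return alpha, beta, lam

def chebyshev_line(points, eps=1e-12):
    # best approximation of points (x_i, y_i) by alpha + beta * x under l_inf
    pts = sorted(points)
    ref = [pts[0], pts[len(pts) // 2], pts[-1]]   # reference points
    while True:
        alpha, beta, lam = solve3(*ref)
        resid = lambda p: p[1] - (alpha + beta * p[0])
        k = max(pts, key=lambda p: abs(resid(p)))
        if abs(resid(k)) <= lam + eps:            # no larger deviation left
            return alpha, beta, lam
        same = lambda p: (resid(p) >= 0) == (resid(k) >= 0)
        # exchange rules: keep the residual signs alternating
        if k[0] < ref[0][0]:
            ref = [k, ref[1], ref[2]] if same(ref[0]) else [k, ref[0], ref[1]]
        elif k[0] > ref[2][0]:
            ref = [ref[0], ref[1], k] if same(ref[2]) else [ref[1], ref[2], k]
        elif k[0] < ref[1][0]:
            ref = [k, ref[1], ref[2]] if same(ref[0]) else [ref[0], k, ref[2]]
        else:
            ref = [ref[0], k, ref[2]] if same(ref[1]) else [ref[0], ref[1], k]

# chebyshev_line([(1, 20), (2, 10), (3, 60)]) returns (-15.0, 20.0, 15.0),
# i.e., the line -15 + 20x from the example above.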
This time, we measure the deviation by applying l_q. That is, we want to find coefficients a_j such that the function
    f̂(x) = Σ_{j=1}^n a_j Φ_j(x)
minimizes
    max_{i=1,...,m} max{ y_i / f̂(x_i), f̂(x_i) / y_i }.
Let ~a and ~b be two vectors in R^n with b_i > 0. Then, we define ~a/~b = (a_1/b_1, ..., a_n/b_n)^T.
Let A ∈ R^{m×n} be a matrix, where m > n, and let ~b = (b_1, ..., b_m)^T be a vector in R^m with b_i > 0. Then we can state the problem as
    find ~a ∈ R^n that minimizes ||A~a/~b||_Q    (24.21)
under the constraint that α_i ~a > 0, 1 ≤ i ≤ m, for all row vectors α_i of A.
Alternatively, we can modify A by “dividing” it by ~b. We need some notation to do so. Let ~b = (b_1, ..., b_m)^T be a vector in R^m. Define diag(~b) to be the m × m diagonal matrix which contains the b_i in its diagonal and is zero outside the diagonal. For vectors ~b with b_i > 0, we can define ~b^{−1} = (1/b_1, ..., 1/b_m)^T. Using these notations, we can define
    A' = diag(~b^{−1}) A.    (24.22)
Keeping the trick with A' in mind, it is easy to see that Problem 24.21 can be solved if we can solve the general problem
    find ~a ∈ R^n that minimizes ||A~a||_Q.    (24.23)
The following proposition ensures that a solution to this general problem exists. Further, since ||A~a||_Q is convex, the minimum is a global one.
Proposition 24.5.1 Let A ∈ R^{m,n} such that R(A) ∩ R^m_{>0} ≠ ∅. Then ||A · ||_Q attains its minimum.
Recall that l_q is subadditive and convex. Further, it is lower semi-continuous (see also [719, p. 52]). However, it is not strictly convex. Hence, as with l_∞, we expect uniqueness to hold only under certain conditions.
We need some more notation. Let A ∈ R^{m,n}. We denote by R(A) = {A~a | ~a ∈ R^n} the range of A and by N(A) = {~a ∈ R^n | A~a = 0} the nullspace of A.
Problem (24.23) can be rewritten as the following constrained minimization problem:
    min_{(~a,q) ∈ R^n × R} q   subject to   1/q ≤ A~a ≤ q and q ≥ 1.    (24.24)
The Lagrangian of (24.24) is given by
    L(~a, q, λ^+, λ^−, µ) := q − (λ^+)^T (q − A~a) − (λ^−)^T (A~a − 1/q) − µ(q − 1).
Assume that R(A) ∩ R^m_{>0} ≠ ∅. Then the set {(~a, q) : 1/q ≤ A~a ≤ q and q ≥ 1} is non-empty and closed, and there exists (~a, q) for which we have strict inequality in all conditions. Then the following Karush-Kuhn-Tucker conditions are necessary and sufficient for (~â, q̂) to be a minimizer of (24.24), see, e.g., [814, p. 62]: there exist λ̂^+, λ̂^− ∈ R^m_{≥0} and µ̂ ≥ 0 such that
    ∇_~a L(~â, q̂, λ̂^+, λ̂^−, µ̂) = A^T λ̂^+ − A^T λ̂^− = 0    (24.25)
    ∂/∂q L(~â, q̂, λ̂^+, λ̂^−, µ̂) = 1 − Σ_{i=1}^m λ̂^+_i − (1/q̂²) Σ_{i=1}^m λ̂^−_i − µ̂ = 0    (24.26)
and, for i = 1, ..., m,
    λ̂^+_i (q̂ − (A~â)_i) = 0,    (24.27)
    λ̂^−_i ((A~â)_i − 1/q̂) = 0,    (24.28)
    µ̂ (q̂ − 1) = 0.
Assume that 1_m ∉ R(A), where 1_m is the vector with all components 1. Then q̂ > 1 and, consequently, µ̂ = 0. Furthermore, it is clear that not both λ̂^+_i and λ̂^−_i can be positive, because the conditions q̂ = (A~â)_i and 1/q̂ = (A~â)_i cannot be fulfilled at the same time, since q̂ > 1.
Setting λ̂ := λ̂^+ − λ̂^−, we can summarize our findings (24.25) - (24.28) in the following theorem.
Theorem 24.5.35 Let A ∈ R^{m,n} such that R(A) ∩ R^m_{>0} ≠ ∅ and 1_m ∉ R(A). Then (~â, q̂) solves (24.24) if and only if there exists λ̂ ∈ R^m such that
    i) A^T λ̂ = 0,
    ii) q̂ = q̂ Σ_{λ̂_i > 0} λ̂_i − (1/q̂) Σ_{λ̂_i < 0} λ̂_i.
Remark. We see that 1 < q̂ = (A~â)_i implies sign((A~â)_i − 1) = 1 and that 1 > 1/q̂ = (A~â)_i implies sign((A~â)_i − 1) = −1; whence λ̂_i ((A~â)_i − 1) ≥ 0. For our approximation problem (24.21), this means that the residuum f̂(x_i) − b_i fulfills λ̂_i (f̂(x_i) − b_i) ≥ 0.
Under certain conditions, problem (24.23) has a unique solution which can be simply characterized. Let us start with some straightforward considerations in this direction. If N(A) ≠ {~0}, then we have for any minimizer ~â of ||A · ||_Q that ~â + β, β ∈ N(A), is also a minimizer. In particular, we have that N(A) ≠ {~0} if m < n. Further, if 1_m ∈ R(A), then the minimum q̂ = 1 is attained on the set A^+ 1_m + N(A).
Proposition 24.5.2 Let A ∈ R^{n+1,n} such that R(A) ∩ R^{n+1}_{>0} ≠ ∅, 1_m ∉ R(A), and rank(A) = n. Then ||A · ||_Q has a unique minimizer if and only if the Lagrange multipliers λ̂_i, i = 1, ..., n + 1, are not zero.
2. The (m, n)-matrix A in (24.22) is the product of the diagonal matrix diag(1/b_i)_{i=1}^m with positive diagonal entries and a Vandermonde matrix. Hence, it can easily be seen that spark(A) = n + 1. If an (m, n)-matrix A has spark(A) = n + 1, then A fulfills the Haar condition.
Proposition 24.5.2 can be reformulated as follows:
Corollary 24.5.36 Let A ∈ R^{n+1,n} such that R(A) ∩ R^{n+1}_{>0} ≠ ∅ and 1_m ∉ R(A). Then ||A · ||_Q has a unique minimizer if and only if spark(A) = n + 1.
Theorem 24.5.37 Let A ∈ R^{m,n} such that R(A) ∩ R^m_{>0} ≠ ∅. Suppose that spark(A) = n + 1. Then ||A · ||_Q has a unique minimizer which is determined by n + 1 rows of A, i.e., there exists an index set J ⊂ {1, ..., m} of cardinality |J| = n + 1 such that ||A · ||_Q and ||A|_J · ||_Q have the same minimum and the same minimizer. Here, A|_J denotes the restriction of A to the rows which are contained in the index set J. We call such an index set J an extremal set.
case. For (1/2, 1)^T we have sign(λ̂_1, λ̂_2, λ̂_3, λ̂_4) = (−1, 0, 1, −1), while the pattern is (0, 1, 1, −1) for (3/2, 2)^T and (0, 0, 1, −1) within the line bounded by these points.
By Theorem 24.5.37, a method for finding the minimizer of ||A · ||_Q would be to compute the unique minimizers of the (m choose n+1) subproblems ||A|_J · ||_Q for all index sets J of cardinality n + 1 and to take the largest minimum q̂ and the corresponding ~â as minimizer of the original problem. For our line problem, there exist (m choose 3) = O(m³) of these subproblems. In the following section, we give another algorithm which is also based on Theorem 24.5.37, but ensures that the value q̂ increases for each new choice of the subset J. Since there is only a finite number of such subsets, we must reach a stage where no further increase is possible and J is an extremal set. In normed spaces, such methods are known as ascent methods, see [901].
In this section, we suggest a detailed algorithm for minimizing ||A · ||_Q, where we restrict our attention to the line problem
    max_{i=1,...,m} max{ b_i / (β + α x_i), (β + α x_i) / b_i },    (24.29)
i.e., to the matrix A in (24.22) with n = 2.
Corollary 24.5.38 Let (x_i, b_i), i = 1, 2, 3, be given points with pairwise distinct x_i ∈ R and positive b_i, i = 1, 2, 3. Then the minimum q̂ and the minimizer ~â ∈ R² of (24.29) are given by q̂ = ||q̂_1||_Q and
    (β̂, α̂)^T = (1/(x_2 − x_1)) ( x_2  −x_1 ; −1  1 ) (b_1 q̂_1, b_2 q̂_2)^T,
where
    q̂_1 := √(r_2 / (1 − r_1))    if r_1 < 0 and r_2 > 0,
    q̂_1 := √((1 − r_2) / r_1)    if r_1 > 0 and r_2 < 0,    (24.30)
    q̂_1 := √(1 / (r_1 + r_2))    if r_1 > 0 and r_2 > 0,
and
    q̂_2 := 1/q̂_1 if r_2 > 0,   q̂_2 := q̂_1 if r_2 < 0,
and
    r_1 := (b_1 (x_2 − x_3)) / (b_3 (x_2 − x_1)),   r_2 := (b_2 (x_3 − x_1)) / (b_3 (x_2 − x_1)).
Remark. If the points are ordered, i.e., x_1 < x_2 < x_3 (or, alternatively, in descending order), then either A~â = (q̂, 1/q̂, q̂)^T or A~â = (1/q̂, q̂, 1/q̂)^T. This means that λ̂ in Theorem 24.5.35 has alternating signs. In other words, the points f(x_1), f(x_3) lie above b_1, b_3 and f(x_2) lies below b_2, or conversely.
Later, we will show that the alternating sign condition is true for general best polynomial approximation with respect to the Q-paranorm.
Corollary 24.5.38 is the basis of Algorithm 24.9, which finds the optimal line with respect to three points in each step and chooses the next three points if the minimum corresponding to their line becomes larger.
Proposition 24.5.3 The algorithm computes the line f(x) = β̂ + α̂x which minimizes (24.29).
Remark. Alternatively, one can deal with ordered points x_1 < x_2 < x_3, which restricts the effort in (24.30) to q̂_1 = √(r_2/(1 − r_1)), but requires an ascending ordering of the points x_{i1}, x_{i2}, x_j in each step of the algorithm.
Finally, we want to generalize the remark on the signs of the Lagrange multipliers given after Corollary 24.5.38. Remember that the set of polynomials Φ_i(x) = x^{i−1}, i = 1, ..., n, forms a Chebyshev set (see Def. 24.5.31). Applying again Lemma 24.5.34, one can easily prove the following result.
In step 1 of Algorithm 24.9, for i = 1, ..., m with i ≠ i_1, i_2, we compute the q-error of the current line at (x_i, y_i) in order to find the point of maximal deviation. For the three-point subproblem, we have to solve the following system of equations for α, β, and λ:
    (1/λ)(α + β x_1) = y_1
    λ(α + β x_2) = y_2
    (1/λ)(α + β x_3) = y_3
If we number the above equations from 1 to 3, then we may conclude that
    β = λ / q_{13},   α = λ (y_1 − x_1/q_{13}),   λ = √(q_{13} y_2 / g),
where
    q_{13} := (x_3 − x_1) / (y_3 − y_1),
    g := q_{13} y_3 − x_3 + x_2.
In the special case y_1 = y_3, we get
    β = 0,   α = λ y_1,   λ = √(y_2 / y_1).
Let us start with the first problem. That is, we ask for an exponential function
    f̂ = e^{Σ_{j=1}^n α_j Φ_j}
which best approximates under l_q a given set of points (x_i, y_i), i = 1, ..., m, with pairwise distinct x_i ∈ R^d and y_i > 0, 1 ≤ i ≤ m. Note that f̂ > 0 by definition. Since the ln function is strictly monotonically increasing, this is equivalent to minimizing
    ln( max_{i=1,...,m} max{ y_i / f̂(x_i), f̂(x_i) / y_i } )
      = max_{i=1,...,m} max{ ln y_i − ln f̂(x_i), ln f̂(x_i) − ln y_i }
      = max_{i=1,...,m} | ln y_i − Σ_{j=1}^n α_j Φ_j(x_i) |
      = ||(ln y_i)_{i=1}^m − Φ α||_∞.
P
Thus, it remains to find the best function nj=1 αj Φj (xi ) with respect to the
l∞ norm.
It is now easy to see that we can solve the second problem as follows.
EXC Let (xi , yi ) be the data we want to approximate by a function of the form
ln(p(x)) while minimizing the Chebyshev norm. We can do so by finding the
best approximation of (xi , eyi ) under lq .
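Combined with chebyshev_line from the sketch above, this yields a simple routine for the best exponential fit under l_q (again a sketch; names ours):

import math

def fit_exponential_q(points):
    # fit f(x) = exp(a + b*x) to points (x_i, y_i), y_i > 0, under l_q:
    # minimizing the q-error of exp(a + b*x) is equivalent to an l_inf
    # fit of a + b*x to the log-transformed points (x_i, ln y_i)
    a, b, lam = chebyshev_line([(x, math.log(y)) for x, y in points])
    return a, b, math.exp(lam)    # exp(lam) is the achieved q-error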
The first two constraints and u ≤ a are already cone constraints. The remaining constraints 1 ≤ a u_i can be rewritten to
    ( √2 u_i, √2 a, u_i + a + √2, u_i + a − √2 )^T ∈ L^4_r
because the following inequalities are equivalent:
    (√2 u_i)² + (√2 a)² ≤ 2 (u_i + a + √2)(u_i + a − √2)
    u_i² + a² ≤ (u_i + a)² − 2
    1 ≤ u_i a.
24.6.1 Bucketization
Assume we wish to partition the active domain D_A (= Π^D_A(R)) of some attribute A of some relation R into β buckets B_i (1 ≤ i ≤ β). Then, each bucket contains a subset of the values of the universe of A, that is, B_i ⊆ [l_A, u_A]. Not every kind of subset is used in practice. Since it is too memory-consuming to store the values in each bucket explicitly, buckets always comprise subintervals of the active domain. Further, these are typically non-overlapping, that is, B_i ∩ B_j = ∅ for i ≠ j.
Such a partitioning of the active domain can be achieved in two steps. In a first step, we fix a set of bucket boundaries b_i ∈ [l_A, u_A] such that l_A = b_0 ≤ b_1 ≤ ... ≤ b_β = u_A. In order to decrease the search space, the b_i are typically chosen from the active domain, that is, b_i ∈ D_A.
In a second step, we use these values as bucket boundaries to determine the buckets. Here, there are several alternatives. Let us first consider the case of an integer-valued attribute A. If we use closed intervals for buckets, we can define a bucket as comprising the values in [b_{i−1} + 1, b_i]. Note that [b_i, b_{i+1}] does not work, since it overlaps with [b_{i+1}, b_{i+2}]. We could also build a bucket [b_{i−1}, b_i − 1]. But with proper choices of the b_i, these two are equivalent. A non-equivalent alternative is to use half-open intervals. In this case, we can define a bucket as [b_i, b_{i+1}[.
Another issue is whether the buckets completely cover the active domain, that is, whether ∪_{i=1}^β B_i = [l_A, u_A] holds or not. In the latter case, we can define buckets comprising closed intervals as [b_i, b_{i+1}] if we do not define buckets for [b_{i−1}, b_i] and [b_{i+1}, b_{i+2}]. Thus, our histogram (the set of buckets we define) contains holes. This is typically only the case if no value from D_A falls into the hole.
Summarizing, we have the following three alternatives for attributes with a discrete, ordered domain:
1. closed-interval histogram without holes
2. closed-interval histogram with holes
3. half-open-interval histogram without holes
If most range queries use closed ranges, 1) and 2) should be the preferred options. To see this, note that we must add or subtract the frequencies of the boundaries to convert a half-open or open interval into a closed interval or vice versa. Further, even if we have a good approximation of the exact frequencies of single values, we still face the problem of values not occurring in the active domain if it is not dense. Nonetheless, the subsequent discussion applies with minor modifications to all three alternatives.
Equi-Width Histograms
Kooi was the first to propose histograms for selectivity estimation [504]. The
first type of histograms he proposed were equi-width histograms. In an equi-
width histogram, the bucket boundaries are determined by
bi = x0 + iδ
where δ = (xd − x0 )/β.
Equi-Depth Histograms
Kooi [504] also proposed the alternative of equi-depth histograms. There, all buckets have about the same cumulated frequency. This can be achieved by a single scan through X, starting a new bucket as soon as the current bucket's cumulated frequency exceeds F⁺/β.
Another interesting approach to equi-depth histograms has been described by Piatetsky-Shapiro and Connell [675]. There, buckets can overlap. The construction is quite simple, though expensive. First, from X we construct a bag Y of cardinality n such that each value x_i occurs exactly f_i times in Y. Then, Y is sorted by increasing values. A parameter S is used to determine the number of values to be stored. S determines the distance in the sorted vector Y between two stored values via N = (n − 1)/S. From the sorted vector, we then pick every value at a position 1 + iN for i = 0, ..., S. Hence, we store S + 1 values. If (n − 1)/S is not an integer, the distance between the last two elements can be smaller.
This approach is called distribution steps, but these values could also be termed quantiles. If some values occur very frequently (more often than N times), then they are stored more than once. Thus, the buckets overlap. Besides the values, nothing else is stored. Piatetsky-Shapiro and Connell then continue to give selectivity estimation formulas [675], which we do not repeat here.
2. largely varying si .
24.7 More on Q
24.7.1 Properties of the Q-Error
Definition of the Q-Error
Let f > 0 be a number and f̂ > 0 be an estimate of f. Then, we define the q-error of f̂ as
    ||f̂/f||_Q,
where ||x||_Q := max(x, 1/x). If, for some value q ≥ 1, ||f̂/f||_Q ≤ q holds, we say that the estimate is q-acceptable.
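In Python, the definition reads (a two-line sketch that we reuse further below):

def q_error(f_hat, f):
    # ||f_hat / f||_Q = max(f_hat / f, f / f_hat), for f_hat, f > 0
    return max(f_hat / f, f / f_hat)

def q_acceptable(f_hat, f, q):
    return q_error(f_hat, f) <= q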
Sums
For 1 ≤ i ≤ n, let f_i be true values and f̂_i be estimates with ||f̂_i/f_i||_Q ≤ q for all 1 ≤ i ≤ n. Then
    (1/q) Σ_{i=1}^n f_i ≤ Σ_{i=1}^n f̂_i ≤ q Σ_{i=1}^n f_i
holds, i.e.,
    || (Σ_{i=1}^n f̂_i) / (Σ_{i=1}^n f_i) ||_Q ≤ q.
Products
For 1 ≤ i ≤ n, let f_i be true values and f̂_i be estimates with ||f̂_i/f_i||_Q ≤ q_i for all 1 ≤ i ≤ n. Then
    (∏_{i=1}^n 1/q_i) ∏_{i=1}^n f_i ≤ ∏_{i=1}^n f̂_i ≤ (∏_{i=1}^n q_i) ∏_{i=1}^n f_i
holds, i.e.,
    || (∏_{i=1}^n f̂_i) / (∏_{i=1}^n f_i) ||_Q ≤ ∏_{i=1}^n q_i.
Note that division behaves like multiplication.
Differences
Assume we are given a total value t > 0, which is the sum of a large values
l > 0 and a small value s > 0. Thus, t = l + s and s ≤ l. The latter implies
that s ≤ t/2 ≤ l.
If we know t and an estimate ˆl of l, we can get an estimate ŝ = t − ˆl. This
kind of estimation is not a good idea, as we see from the following example.
Assume that t = 100, l = 90, s = 10, and ˆl = 99. Although ||ˆl/l||Q = 1.1, we
have ŝ = t − ˆl = 1 and, thus, ||ŝ/s||Q = 10.
The situation is different, if we use an estimate ŝ of (the smaller) s to derive
an estimate ˆl for (the larger) l as the following theorem shows.
Theorem. Let t = l + s with 0 < s ≤ t/2 ≤ l, let ŝ be an estimate of s with ||ŝ/s||_Q ≤ q, and define l̂ := max(t − ŝ, t/2). Then ||l̂/l||_Q ≤ q.
Proof. Note that t/2 ≤ l = t − s (*).
Case 1: t − ŝ < t/2 and, thus, l̂ = t/2. Then,
    t − ŝ < t/2
    ⟹ ŝ > t − t/2 = t/2
    ⟹ qs ≥ ŝ > t/2
    ⟹ 2q > t/s
    ⟹ 1/(2q) < s/t.    (**)
Thus,
    ||l̂/l||_Q = ||(t/2)/l||_Q
             = ||t/(2(t − s))||_Q
             = 2(t − s)/t          (by (*))
             = 2(1 − s/t)
             ≤ 2(1 − 1/(2q))       (by (**))
             = 2 − 1/q
             ≤ q.
To see why the last inequality holds, first observe that 2 − 1/q is increasing in q. Further, remember that 1 ≤ q always holds, and observe that q − (2 − 1/q) is increasing in q and attains its minimum at q = 1, where
    q − (2 − 1/q) ≥ 0  ⟺  q ≥ 2 − 1/q
holds with equality. This finishes Case 1.
Case 2: We have to show that
    (1/q) l ≤ t − ŝ ≤ q l.
We start with the second inequality:
    t − ŝ ≤ q l
    ⟸ t − ŝ ≤ q(t − s)
    ⟸ t − (1/q)s ≤ q(t − s)      (since ŝ ≥ (1/q)s)
    ⟺ qs − (1/q)s ≤ qt − t
    ⟺ (q − 1/q)s ≤ (q − 1)t
    ⟸ (q − 1/q)/(q − 1) ≤ t/s.
The latter holds since
    (q − 1/q)/(q − 1) = (q − 1 + 1 − 1/q)/(q − 1)
                      = 1 + (1 − 1/q)/(q − 1)
                      ≤ 2
                      ≤ t/s.
To see that (1 − 1/q)/(q − 1) ≤ 1, consider
    (1 − 1/q)/(q − 1) ≤ 1
    ⟺ 1 − 1/q ≤ q − 1
    ⟺ 2 ≤ q + 1/q.
For the first inequality, consider
    (1/q) l ≤ t − ŝ
    ⟸ (1/q)(t − s) ≤ t − ŝ
    ⟸ (1/q)(t − s) ≤ t − qs      (since ŝ ≤ qs)
    ⟺ (1/q)t − (1/q)s ≤ t − qs
    ⟺ qs − (1/q)s ≤ t − (1/q)t
    ⟺ (q − 1/q)s ≤ (1 − 1/q)t
    ⟸ (q − 1/q)/(1 − 1/q) ≤ t/s.    (*)
Observe that
    (q − 1/q)/(1 − 1/q) = (q² − 1)/(q − 1)
                        = (q + 1)(q − 1)/(q − 1)
                        = q + 1.    (**)
Since the latter is at most t/s in the case t/2 ≤ (1/q)(t − s), we are done with this case and, thus, with Case 2.  □
Proof. Case 1: l̂ = t̂/2. Define
    q* := ||l̂/l||_Q = ||(t̂/2)/(t − s)||_Q.
Case 1.1: t̂/2 ≥ t − s. Then
    q* = (t̂/2)/(t − s)
       ≤ (t̂/2)/(t/2)
       ≤ q_t.
Note that if we make sure that we overestimate t, i.e., t̂ ≥ t, then only Case 2.1 applies and the estimate is quite precise.
ToDo Proof. Case 1: Consider the case where t̂/2 > t̂ − ŝ and, thus, l̂ = t̂/2. The first condition implies that ŝ > t̂/2. Also, by our preconditions, t/2 ≤ t − s. Define
    q* := ||l̂/l||_Q = ||(t̂/2)/(t − s)||_Q.
Then
    q* = (t̂/2)/(t − s)
       ≤ (q_t t/2)/(t/2)
       ≤ q_t.
Case 2: Consider the case where t̂/2 ≤ t̂ − ŝ and, thus, l̂ = t̂ − ŝ. Define
    q* := ||l̂/l||_Q = ||(t̂ − ŝ)/(t − s)||_Q.
Then
    q* = (t̂ − ŝ)/(t − s)
       ≤ q_t t/(t − s) − (1/q_s) s/(t − s)
       ≤ 2 q_t − (1/q_s) s/(t − s)
       ≤ 2 q_t.
24.7.3 θ,q-Acceptability
One problem occurs if the cardinality estimate for some query is f̂ ≥ 1 while the true cardinality is zero. This happens since we should never return an estimate of zero: an estimate of zero may lead to query simplifications which are wrong or to reorderings which are not appropriate. To solve this dilemma, there is only a single solution: during query optimization, we execute building blocks and even access paths until the first tuple has been delivered. From there on, we know for sure whether the result will be empty or not. If a tuple is delivered, we buffer it, since we want to avoid its recalculation at runtime. The overhead of this method should therefore be low. Now, assume that we are willing to buffer more tuples (say 1000). Then, if there are fewer than 1000 qualifying tuples, we know the exact answer after fetching them. If we have to halt the evaluation of the building block because the buffer is full, we know that there will be ≥ 1000 qualifying tuples. Let us denote by θ_buf the number of tuples we are willing to buffer. Since we interleave query optimization and query execution, this can be considered a small step in the direction of adaptive query optimization [228].
However, before we can evaluate a building block or access path, we have to determine an optimal one, which in turn requires cardinality estimates! Before we proceed, note that cardinality estimates may be imprecise as long as they do not badly influence the decisions of the query optimizer. This means that as long as the query optimizer produces the best plan, any estimate is o.k. Take, for example, the decision whether to exploit an index or not. Assume an index is better than a scan if less than 10% of the tuples qualify (this is a typical value [101, 361]). If the relation has 10000 tuples, the threshold is at 1000 tuples. Thus, assume that for a given range query both the estimate and the true value do not exceed 500. Then, no matter what the estimate is, we should use the index. Note that the q-error can be as large as 500 (e.g., the estimate is 1 and the true value is 500). Still, it does not have any bad influence on our decision. The important thing is that the estimate has to be precise around 1000. For a given relation and one of its indices, we denote by θidx the number of tuples that, if exceeded, makes a table scan more efficient than the index scan.
Let us now combine these two things. Assume we want to have a maximal q-error of q. Define θ = min(θbuf − 1, (1/q)θidx). Assume that f̂ is an estimate for the true cardinality f. Further assume that if f̂ or f exceeds θ, then ||f̂/f||Q ≤ q. Now let's go through the optimizer. In a first step, we define our building blocks and access paths, which requires deciding on index usage. Clearly, the estimate will be precise above (1/q)θidx, which includes the critical part. After evaluating a building block or access path, we have precise cardinality estimates if fewer than θbuf tuples are retrieved. Otherwise, our estimate will obey the given q-error. Thus, we are as precise as necessary under all circumstances.
These simple observations motivate us to introduce the notion of θ,q-acceptability. Let f ≥ 0 be a number and f̂ ≥ 0 be an estimate for f. Let q ≥ 1 and θ ≥ 1 be numbers. We say that f̂ is θ,q-acceptable if

1. f ≤ θ ∧ f̂ ≤ θ, or
2. ||f̂/f||Q ≤ q.
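Using the qerror helper defined earlier, θ,q-acceptability becomes a one-line check; again, this is a sketch of the definition, not any system's actual interface:

// theta,q-acceptability: either both the true value f and the estimate
// fhat lie below the threshold theta, or their q-error is bounded by q.
bool thetaQAcceptable(double f, double fhat, double theta, double q) {
    if (f <= theta && fhat <= theta) return true; // case 1 of the definition
    return qerror(f, fhat) <= q;                  // case 2 of the definition
}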
Discretization
Testing θ,q-acceptability of a given bucket for a continuous domain directly is impossible, since it would involve testing θ,q-acceptability of f̂+(c1, c2) for all c1, c2 within the bucket. In this section, we show that a test quadratic in the number of distinct values in the bucket suffices.
Let [c1, c2] be a query interval. Assume i, j are chosen such that [xi, xj] ⊆ [c1, c2] ⊂ [xi−1, xj+1]. Since there is no distinct value between xi−1 and xi and between xj and xj+1, we have that f+(c1, c2) = f+(xi, xj) < f+(xi−1, xj+1). Assume the following conditions hold:

1. f̂+ is monotonic.
2. ||f̂+(xi, xj)/f+(c1, c2)||Q ≤ q.

Since f̂+(xi, xj) = f̂+(c1, c2) ≤ f̂+(xi−1, xj+1), we then have ||f̂+(c1, c2)/f+(c1, c2)||Q ≤ q.
Exploiting this fact, we can develop the following quadratic test for some given θ and q. If for all i, j such that xi and xj are in the bucket we have that

  f+(xi−1, xj+1) ≤ θ ∧ f̂+(xi−1, xj+1) ≤ θ

or

  ||f̂+(xi, xj)/f+(xi, xj)||Q ≤ q ∧ ||f̂+(xi−1, xj+1)/f+(xi, xj)||Q ≤ q,

then the bucket is θ,q-acceptable.
Subtests

1. f+(xi, xi′) ≤ θ

and

• xi = xi1 and
• xj = xim,

or (b)

  (f+(xi1, xil−1) + θ) / (f̂+(xi1, xil−1) + 1)
  ≤ (f+(xi1, xil−1) + θ) / f̂+(xi1, xil−1)
  ≤ q + θ / f̂+(xi1, xil−1)
  ≤ q + 1/k
Summarizing, we are able to trade accuracy for performance when testing the θ,q-acceptability of some bucket.
2. max_i fi / min_i fi ≤ q^2.
The first condition also holds for non-dense buckets. The last condition only holds if we use our flexibility concerning the α in our approximation function. If we use f̂+avg, we need to exchange it against the true cumulated frequency within an interval contained in the bucket.
Assume now that a range query [c1, c2] spans two buckets B1 = [b0, b1] and B2 = [b1, b2], i.e., b0 ≤ c1 ≤ b1 ≤ c2 ≤ b2. Further, we assume that the approximation function f̂i+(x, y) of every bucket Bi is θ,q-acceptable. We introduce the following abbreviations:

  f1 := f+(c1, b1)
  f2 := f+(b1, c2)
  f  := f1 + f2
  f̂1 := f̂1+(c1, b1)
  f̂2 := f̂2+(b1, c2)
  f̂  := f̂1 + f̂2
Now, we investigate the estimation error for our range query. We distinguish
several cases.
Case 1. In the first case, we assume f ≤ kθ and f̂ ≤ kθ. In this case, the estimate is kθ,q-acceptable.
Case 2. In the second case, we assume that (f1 > θ ∨ f̂1 > θ) ∧ (f2 > θ ∨ f̂2 > θ). Then the θ,q-acceptability of each bucket gives ||f̂1/f1||Q ≤ q and ||f̂2/f2||Q ≤ q, and by the summation property it follows that ||f̂/f||Q ≤ q.
Case 3. We now assume that neither the condition of Case 1 nor the condition of Case 2 holds. Thus,

  ¬(f ≤ kθ ∧ f̂ ≤ kθ) ∧ ¬((f1 > θ ∨ f̂1 > θ) ∧ (f2 > θ ∨ f̂2 > θ)),

which is equivalent to

  (f > kθ ∨ f̂ > kθ) ∧ ((f1 ≤ θ ∧ f̂1 ≤ θ) ∨ (f2 ≤ θ ∧ f̂2 ≤ θ)).

Define

  q* := ||f̂/f||Q = ||(f̂1 + f̂2)/(f1 + f2)||Q.

Case 3.1 Assume f1 ≤ θ ∧ f̂1 ≤ θ and f > kθ. Then

  kθ < f = f1 + f2 ≤ θ + f2,

hence f2 > (k − 1)θ ≥ θ and

  ||f̂2/f2||Q ≤ q.

Case 3.1.1 Assume f̂1 + f̂2 ≥ f1 + f2. Then

  q* = (f̂1 + f̂2)/(f1 + f2)
     = f̂1/(f1 + f2) + f̂2/(f1 + f2)
     < θ/(kθ) + f̂2/f2
     ≤ q + 1/k
Case 3.1.2 Assume f1 + f2 > f̂1 + f̂2. Then

  q* = (f1 + f2)/(f̂1 + f̂2)
     ≤ (θ + q·f̂2)/f̂2
     ≤ q + θ/f̂2
     ≤ q + θ/((1/q)f2)
     ≤ q + θ/((1/q)(k − 1)θ)
     = q + q/(k − 1)

Case 3.2 Assume f2 ≤ θ ∧ f̂2 ≤ θ and f > kθ. This implies

  kθ < f = f1 + f2 ≤ f1 + θ,

hence f1 > (k − 1)θ ≥ θ and

  ||f̂1/f1||Q ≤ q.
Case 3.2.1 Assume f̂1 + f̂2 ≥ f1 + f2. Then f̂1 + f̂2 > kθ. A simple calculation gives us

  q* = (f̂1 + f̂2)/(f1 + f2)
     = f̂1/(f1 + f2) + f̂2/(f1 + f2)
     ≤ f̂1/f1 + θ/(kθ)
     ≤ q + 1/k
Case 3.2.2 Assume f1 + f2 > f̂1 + f̂2. Then

  q* = (f1 + f2)/(f̂1 + f̂2)
     ≤ (q·f̂1 + θ)/f̂1
     ≤ q + θ/f̂1
     ≤ q + θ/((1/q)f1)
     ≤ q + θ/((1/q)(k − 1)θ)
     = q + q/(k − 1)
Case 3.3 Assume f1 ≤ θ ∧ f̂1 ≤ θ and f̂ > kθ. Then f̂2 > (k − 1)θ and

  ||f̂2/f2||Q ≤ q.

If f̂1 + f̂2 ≥ f1 + f2, we get

  q* = (f̂1 + f̂2)/(f1 + f2)
     ≤ f̂1/(f1 + f2) + f̂2/(f1 + f2)
     ≤ θ/f2 + f̂2/f2
     ≤ θ/((1/q)f̂2) + q
     ≤ θ/((1/q)(k − 1)θ) + q
     = q + q/(k − 1)
Case 3.4 Assume f2 ≤ θ ∧ f̂2 ≤ θ and f̂ > kθ. Then f̂1 > (k − 1)θ and

  ||f̂1/f1||Q ≤ q.

Case 3.4.1 Assume f̂1 + f̂2 ≥ f1 + f2. Then

  q* = (f̂1 + f̂2)/(f1 + f2)
     ≤ f̂1/(f1 + f2) + f̂2/(f1 + f2)
     ≤ f̂1/f1 + θ/f1
     ≤ q + θ/((1/q)f̂1)
     ≤ q + θ/((1/q)(k − 1)θ)
     = q + q/(k − 1)
Case 3.4.2 Assume f1 + f2 > f̂1 + f̂2. A simple calculation gives us

  q* = (f1 + f2)/(f̂1 + f̂2)
     ≤ f1/(f̂1 + f̂2) + f2/(f̂1 + f̂2)
     ≤ f1/f̂1 + θ/(kθ)
     ≤ q + 1/k
□
If the query interval spans n buckets, i.e., b0 ≤ c1 ≤ b1 ≤ . . . ≤ bn−1 ≤ c2 ≤ bn, only the two boundary buckets are partially covered. We define

  f1 := f+(c1, b1)
  f2 := f+(b1, bn−1)
  f3 := f+(bn−1, c2)
  f  := f1 + f2 + f3
  f̂1 := f̂1+(c1, b1)
  f̂2 := f̂2+(b1, bn−1)
  f̂3 := f̂3+(bn−1, c2)
  f̂  := f̂1 + f̂2 + f̂3

Again, the interesting case is

  ¬(f ≤ kθ ∧ f̂ ≤ kθ) ∧ ¬((f1 > θ ∨ f̂1 > θ) ∧ (f3 > θ ∨ f̂3 > θ)),

which is equivalent to

  (f > kθ ∨ f̂ > kθ) ∧ ((f1 ≤ θ ∧ f̂1 ≤ θ) ∨ (f3 ≤ θ ∧ f̂3 ≤ θ)).

Case 3.1.1 Assume f1 ≤ θ ∧ f̂1 ≤ θ, f3 ≤ θ ∧ f̂3 ≤ θ, and f > kθ. From f = f1 + f2 + f3 > kθ we get

  f2 > (k − 2)θ

and

  q·f̂2 > (k − 2)θ.
If f ≤ f̂ we get

  q* = f̂/f
     = (f̂1 + f̂2 + f̂3)/(f1 + f2 + f3)
     = (f̂1 + f̂3)/(f1 + f2 + f3) + f̂2/(f1 + f2 + f3)
     ≤ 2θ/(kθ) + q
     = q + 2/k

If f̂ ≤ f we get

  q* = f/f̂
     = (f1 + f2 + f3)/(f̂1 + f̂2 + f̂3)
     ≤ (2θ + f2)/f̂2
     ≤ q + 2θ/f̂2
     ≤ q + 2θ/((1/q)(k − 2)θ)
     = q + 2q/(k − 2)
Case 3.1.2 Assume f̂ > kθ. From f̂ = f̂1 + f̂2 + f̂3 > kθ and f̂1 ≤ θ and f̂3 ≤ θ, we get

  f̂2 > (k − 2)θ

and

  q·f2 > (k − 2)θ.

If f ≤ f̂ we get

  q* = f̂/f
     = (f̂1 + f̂2 + f̂3)/(f1 + f2 + f3)
     ≤ q + 2θ/f2
     ≤ q + 2θ/((1/q)(k − 2)θ)
     = q + 2q/(k − 2)

If f̂ ≤ f we get

  q* = f/f̂
     = (f1 + f2 + f3)/(f̂1 + f̂2 + f̂3)
     ≤ (f2 + 2θ)/(f̂1 + f̂2 + f̂3)
     ≤ q + 2θ/(kθ)
     = q + 2/k
Finally, assume that only one boundary pair is small, say f1 ≤ θ ∧ f̂1 ≤ θ. Then

  f2 + f3 > (k − 1)θ

and

  q(f̂2 + f̂3) > (k − 1)θ,

since ||(f̂2 + f̂3)/(f2 + f3)||Q ≤ q.
If f ≤ f̂ we get

  q* = f̂/f
     = (f̂1 + f̂2 + f̂3)/(f1 + f2 + f3)
     ≤ q + θ/(kθ)
     = q + 1/k

If f̂ ≤ f we get

  q* = f/f̂
     = (f1 + f2 + f3)/(f̂1 + f̂2 + f̂3)
     ≤ q + θ/(f̂2 + f̂3)
     ≤ q + θ/((1/q)(k − 1)θ)
     = q + q/(k − 1)
qcompressb(x, b)
  return (0 == x) ? 0 : ⌊logb(x)⌋ + 1

qdecompressb(y, b)
  return (0 == y) ? 0 : b^((y−1)+0.5)

qcompressbase(x, k)
  // x is the largest number to be compressed
  // k is the number of bits used to store a compressed value
  return x^(1/((1 << k) − 1))
24.7.6 Q-Compression
General Q-Compression
The goal of q-compression is to approximate a number x ≥ 1 with a small q-error. Given some b > 1, let x be some number in the interval [b^(2l), b^(2(l+1))]. If we approximate x by b^(2l+1), then ||b^(2l+1)/x||Q ≤ b. Let xmax be the largest number to be compressed. If xmax ≤ b^(2(k+1)) for some k, we can approximate any x in [1, xmax] with ⌈log2(k)⌉ bits obeying a maximal q-error of b. We can extend q-compression to allow for the compression of 0, as in the code in Fig. 24.11. There, we use the base b instead of b^2 as above. Thus, the error is at most √b. Let us consider a concrete example. Let b = 1.1 and assume we use 8 bits to store a number. Then, since 1.1^254 ≈ 32.6 · 10^9, we can approximate even huge numbers with a small q-error of at most √1.1 ≈ 1.0488. Other examples are given in Table 24.7.
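A direct transcription of Fig. 24.11 into C++ might look as follows; this is only a sketch under the assumptions above (x ≥ 1, base b > 1, code 0 reserved for x = 0, 8-bit codes), with names of our own choosing:

#include <cmath>
#include <cstdint>

// Compress x into a small integer code; code 0 is reserved for x == 0.
uint8_t qcompress(double x, double b) {
    if (x == 0.0) return 0;
    return static_cast<uint8_t>(std::floor(std::log(x) / std::log(b)) + 1);
}

// Decompress by returning the geometric middle of the interval
// [b^(y-1), b^y); its q-error is bounded by sqrt(b).
double qdecompress(uint8_t y, double b) {
    if (y == 0) return 0.0;
    return std::pow(b, (y - 1) + 0.5);
}

// Choose the base such that k bits suffice for values up to xmax.
double qcompressbase(double xmax, int k) {
    return std::pow(xmax, 1.0 / ((1 << k) - 1));
}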
There exists a small disadvantage of q-compression with a general base.
Though calculating the logarithm is quite cheap, since typically machine in-
structions to do so exist, calculating the power during decompression is quite
expensive. On our machine, compression takes roughly 54 ns whereas decom-
pression takes 158 ns. This is bad since in the context of cardinality estimation,
decompression is used far more often than compression. Thus, we introduce an
alternative called binary q-compression.
Binary Q-Compression
The idea of binary q-compression is simple. Let x be the number we want to compress. If we take the base b = 2, then ⌊log2(x)⌋ = k, where k is the index of the highest bit set. This calculation can be done by a rather efficient machine instruction. This gives us a maximum q-error of √2. We can go below this by remembering not only the highest bit set, but the k highest bits. Additionally, we store their position (their shift) in s bits. The pseudocode is given in Fig. 24.12, where we extended the scheme to allow for the compression of zero. So far, this resembles a special floating point representation.
  √(2^n · (2^(n+1) − 1)) ≈ √(2^n · 2^(n+1))
                          = √(2^(2n) · 2)
                          = √2 · 2^n
                          = 2^n + (√2 − 1) · 2^n
The second part can be calculated by a constant (√2 − 1) shifted by n to the left. The pseudocode in Fig. 24.12 gives the calculation of this constant C in C. The best theoretical q-error achievable with storing k bits is √(1 + 2^(1−k)). With our fast approximation, we get pretty close, as the following table shows. The observed maximal q-error column was obtained experimentally. The deviation of the observed maximal q-error from the theoretical maximal q-error is due to the fact that only a small portion of the digits of C is used. Further, compression (2.7 ns) and decompression (2.8 ns) are fast.
qcompress2(x, k, s)
  if 2^k > x
  then
    bits = x
    shift = 0
  else
    shift = index-of-highest-bit-set(x) − k + 1
    bits = (x >> shift)
  return (bits << s) | shift

qdecompress2(y, k, s)
  shift = y & (2^s − 1)
  bits = y >> s
  x = bits << shift
  // assume C = (int) ((sqrt((double) 2.0) − 1.0) * 4 * (1 << 30))
  x |= (C >> (32 − shift))
  return x
k    max q-error observed   max q-error theoretical (√(1 + 2^(1−k)))
1    1.5                    1.41
2    1.25                   1.22
3    1.13                   1.12
4    1.07                   1.06
5    1.036                  1.03
6    1.018                  1.016
7    1.0091                 1.0078
8    1.0045                 1.0039
9    1.0023                 1.00195
10   1.0011                 1.00098
11   1.00056                1.00048
12   1.00027                1.00024
Incremental Updates
It might come as a surprise that q-compressed numbers can be incrementally updated. Morris observed this fact already in 1978 [617]. Later, Flajolet analyzed the probabilistic counting method thoroughly [280]. The main idea is rather simple. For binary q-compressed numbers, the incrementing procedure is defined as follows:

RandomIncrement(int& c)
// c: the counter
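The body of RandomIncrement is cut off in this copy; the idea, following Morris [617], is that c represents roughly 2^c events and is advanced only with probability 2^(−c). A hypothetical C++ sketch of such a counter:

#include <cmath>
#include <random>

// Morris-style probabilistic counter: c represents roughly 2^c events.
// Increment c with probability 2^(-c), so that on average 2^c calls
// are needed to advance c by one.  (A sketch; the book's pseudocode
// for RandomIncrement is truncated here.)
void RandomIncrement(int& c) {
    static std::mt19937_64 gen{std::random_device{}()};
    std::bernoulli_distribution flip(std::ldexp(1.0, -c)); // p = 2^(-c)
    if (flip(gen)) ++c;
}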
[Figure 24.13: four-level tree for the example bucket; the recoverable top levels are τ1,1 = 100 with τ̂1,1 = 100, and τ1,2 = 52, τ2,2 = 48 with m1,2 = 33, τ̂1,2 = 52, τ̂2,2 = 48.]
Consider a bucket containing the distinct values xi with frequencies fi:

  xi  1  2  3   4  5  6   7  8  9  10  11  12  13  14  15  16
  fi  7  5  18  0  6  10  0  6  0  6   9   5   13  0   8   7

This bucket is divided into 8 bucklets of width 16/8 = 2. Every bucklet value τi,8 summarizes the frequencies in bucklet i, 1 ≤ i ≤ 8. The next higher level of the four-level tree contains four values τi,4 (1 ≤ i ≤ 4) summing the frequencies in the i-th quarter of the bucket. Thus, τi,4 = τ2i−1,8 + τ2i,8 for 1 ≤ i ≤ 4. The third level of the four-level tree defines the values τi,2 for i = 1, 2, summing up the frequencies in each half of the bucket. The last level, τ1,1, contains the sum of all frequencies fi in the bucket. This scheme is illustrated in Fig. 24.13 and formally defined as

  τi,2^k := τ2i−1,2^(k+1) + τ2i,2^(k+1)

for k = 0, 1, 2.
The four-level tree in Fig. 24.13 is compressed into 64 bits as follows. τ1,1 is stored in the first 32 bits. Next, the τj,2^k for k > 0 are only stored if j is odd. For even j = 2i, τ2i,2^(k+1) can be calculated given τi,2^k, since τ2i,2^(k+1) = τi,2^k − τ2i−1,2^(k+1). The number bk of bits used at level k is

  k    0   1  2  3
  bk   32  6  5  4

The intention is that if we make a mistake at a higher level, all lower levels are affected. Thus, we want to be precise at higher levels.
Instead of storing τ2i−1,2^(k+1) directly, the ratio τ2i−1,2^(k+1)/τi,2^k is approximated using b(k+1) bits:

  m2i−1,2^(k+1) := round( (τ2i−1,2^(k+1)/τi,2^k) · (2^(b(k+1)) − 1) ).    (24.33)

From these ratios, estimates τ̂ are derived top-down:

  τ̂2i−1,2^(k+1) := round( (m2i−1,2^(k+1)/(2^(b(k+1)) − 1)) · τ̂i,2^k ),    (24.34)

and, for even children, τ̂2i,2^(k+1) := τ̂i,2^k − τ̂2i−1,2^(k+1).
This recursion is possible since we store τ1,1 explicitly. The τ̂ are also given in Fig. 24.13.
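As a quick check of the arithmetic on the values of Fig. 24.13: with τ1,1 = 100, τ1,2 = 52, and b1 = 6 bits, Eq. 24.33 yields m1,2 = round((52/100) · (2^6 − 1)) = round(32.76) = 33. Eq. 24.34 then gives τ̂1,2 = round((33/63) · 100) = round(52.38) = 52 and, by subtraction, τ̂2,2 = 100 − 52 = 48, exactly the values shown in the figure.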
Now, consider the example in Fig. 24.14. It shows the four-level tree for a frequency density where the eight bucklets have the following cumulated frequencies:

  i     1          2        3       4     5    6   7  8
  fi+   1,000,000  100,000  10,000  1000  100  10  1  10,000
[Figure 24.14: four-level tree for the skewed frequencies. Top level: τ1,1 = 1121111, τ̂1,1 = 1121111. Bottom level: τi,8 = 1000000, 100000, 10000, 1000, 100, 10, 1, 10000; m1,8 = 14, m3,8 = 14, m5,8 = 14, m7,8 = 0; τ̂i,8 = 1029762, 73554, 0, 0, 0, 0, 0, 17795.]
As we can see, the error for the last bucklet (8,8) is quite large. The reason is that we subtract an estimate of a larger number from a smaller number, which is not a good idea (see Sec. 24.7.1). Although the four-level tree is an excellent idea, it has two major problems:

2. The τ̂ of a right child is always obtained by subtracting the left child's τ̂ from the parent's value. This results in uncontrollable errors if the right child's τ is smaller than the left child's τ (see Sec. 24.7.1).
(sometimes called column group). Although the number of distinct values for a set of attributes is rather interesting at different points in cardinality estimation (see our simple profile), we show how to provide selectivity estimates under the uniform distribution assumption but without relying on the attribute value independence assumption. The number of distinct values for a column group can be calculated by sorting or hashing, but this is often far too expensive. Thus, we provide a set of techniques based on sketches. Sketches are small summaries of data typically calculated online, i.e., with a single scan over the data. We concentrate on sketches for the count distinct case. A general introduction and an overview is contained in [206], and a recent evaluation of different sketches can be found in [400].
The problem we look at in this section is to approximately answer a query of the form

select count(*)
from Car
where Make = 'Honda' and Model = 'Accord'

Denote by p1 (p2) the first (second) predicate in the query. The selectivities are s(p1) = 1/7 and s(p2) = 1/8. Assuming AVI yields ŝ(p1 ∧ p2) = s(p1) · s(p2) = 1/56. The true selectivity is 1/10. The number of distinct values in the attribute group (Make, Model) can be calculated by a query like

select count(*)
from (select distinct Make, Model from Car)

and results in #DV = 9. Assuming that all distinct values occur equally often (uniformity assumption) results in the selectivity estimate ŝ(p1 ∧ p2) = 1/9, which is much better.
Throughout the rest of this section, we assume that we want to produce an estimate d̂ for the number of distinct values d of a multiset (bag) X = {x1, . . . , xd}.
Car
ID Make Model
1 Honda Accord
2 Honda Civic
3 Toyota Camry
4 Nissan Sentra
5 Toyota Corolla
6 BMW 323
7 Mazda 323
8 Saab 95i
9 Ford F150
10 Mazda 323
24.9.2 DvByKMinVal
Assume the hash function h hashes the elements of our set X to the interval [0, 1[. Further, let H = {hi | hi = h(xi), xi ∈ X} and assume that 0 ≤ hi ≤ hi+1 < 1. If the hash function spreads out the elements evenly, we expect an average distance of δ = 1/(d + 1) ≈ 1/d between two neighboring hash values. For some given k, consider hk, i.e., the k-th smallest value in H. This value can easily be calculated by exploiting a heap to keep the lowest k distinct hash values while scanning X. Clearly, we expect the value of hk to be around kδ. Thus, δ ≈ hk/k. If we plug this into the former equation, we get hk/k = 1/d̂ and hence d̂ = k/hk. This very simple algorithm (see Fig. 24.17) is what we call DvByKMinVal.
LinearCounting(X, h, m)
// X: bag of elements
// h: hash function to [0, m − 1]
// B: bitvector of length m
initialize B with zeros
for all x ∈ X do B[h(x)] := 1
z := 0
for all i ∈ [0, m[ do
  if 0 = B[i] then z := z + 1
if 0 = z then z := 1
o := m − z // number of ones in the bitvector B
if o < √m
then return o
else return m · ln(m/z)
DvByKMinVal(X, h)
// input: a bag X, a hash function h : X → [0, 1]
// output: estimate d̂ for the number of distinct values in X
using, e.g., a heap, calculate the k-th minimal value hk in {h(x) | x ∈ X}
d̂ := (k − 1)/hk
return d̂
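A possible C++ realization of this estimator keeps the k smallest hash values in a max-heap. This is a sketch: mapping a 64-bit hash into [0,1) is our choice, and for brevity it ignores the deduplication of equal hash values that a production implementation would need:

#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// KMV estimate of the number of distinct values in xs: keep the k
// smallest hash values and return (k-1)/h_k.
double dvByKMinVal(const std::vector<std::string>& xs, std::size_t k) {
    std::priority_queue<double> heap; // max-heap of the k smallest values
    std::hash<std::string> h;         // assumes 64-bit size_t
    for (const auto& x : xs) {
        double v = static_cast<double>(h(x)) / static_cast<double>(UINT64_MAX);
        if (heap.size() < k) {
            heap.push(v);
        } else if (v < heap.top()) {
            heap.pop();
            heap.push(v);
        }
    }
    if (heap.size() < k) return static_cast<double>(heap.size()); // tiny input
    return (static_cast<double>(k) - 1.0) / heap.top();
}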
LogarithmicCounting(X, h, b)
// X: bag of elements
// h: hash function
// b: length of bitvector
// constant φ = 0.7735162909
// indices for bitvectors start with 0
let B be a bitvector of length b, with all bits set to zero
for each x ∈ X do B |= lowest-bit-set(h(x))
R := index-of-lowest-zero-bit(B)
return (1/φ) · 2^R
A factor 1/φ corrects this. Fig. 24.18 shows the full algorithm, including φ.
LogarithmicCounting produces rather rough estimates. This is remedied by a first alternative called Multiple Probabilistic Counting. The idea is to calculate m estimates using m independent hash functions and to average them. However, using m different hash functions is expensive, and it may prove difficult to find them [24]. As a variant, Flajolet and Martin suggest using several predetermined permutations and only one hash function [283]. However, both alternatives are still quite expensive. Hence, we do not discuss this algorithm in detail.
The third variant, Probabilistic Counting with Stochastic Averaging (PCSA), also averages several estimates, but it does so without applying multiple hash functions. The idea is to split the bitvector of the hashed value into two parts. The first k bits give an index into an array of bitvectors of size b − k. Then, for every input value xi, only the bitvector determined by the first k bits of h(xi) is manipulated by remembering its lowest bit set. Instead of one R, we now have several Rj for 1 ≤ j ≤ 2^k. These are averaged and the resulting estimate is produced. Fig. 24.19 shows the full pseudocode, where we also integrated the unbiasing presented in [283]. The standard deviation of PCSA is 0.78/√m.
PCSA(X, h, b, k)
// X: bag of elements
// h: hash function
// b: length of bitvector produced by h
// k: length of prefix used to index array
// constant φ = 0.7735162909
// constant ψ = 1 + (0.31/m)
// indices for bitvectors start with zero
m := 2^k
let B be an array of size m containing bitvectors of size b − k
for each x ∈ X do
  // i, r = split h(x) into k and b − k bits:
  i := h(x) & ((1 << k) − 1)
  r := h(x) >> k
  B[i] |= lowest-bit-set(r)
for each B[j], 0 ≤ j < m, do
  Rj := index-of-lowest-zero-bit(B[j])
S := Σj Rj
return (m/(φ · ψ)) · 2^(S/m)
LogLogCounting(M)
αm := (Γ(−1/m) · (1 − 2^(1/m))/ln 2)^(−m) // Γ is the gamma function
d̂loglog := αm · m · 2^((1/m) Σj M[j])
return d̂loglog
FillM(X, h, M)
// X: bag of elements
// h: hash function
// M: array of integers of size m = 2^k
// k: length of prefix used to index array M of maxima
// indices for bitvectors start with one
initialize M to 0
for each x ∈ X do
  y := h(x)
  if 0 == y
  then
    M[0] := max(M[0], 33 − k) // if the length of a hash value is 32 bits
  else
    i := y & ((1 << k) − 1)
    j := idx-lowest-bit-set(y >> k)
    M[i] := max(M[i], j)
SuperLogLog(M)
d̂linc := LinearCounting(M)
d̂loglog := LogLogCounting(M)
d̂supll := α̃(d̂loglog) · m · 2^(a_partial) // a_partial: truncated mean of M (see text)
L := 10m/5
case
  when d̂supll < L ∧ d̂linc < L do N := d̂linc
  when d̂supll > L ∧ d̂linc > L do N := d̂supll
  else N := (d̂linc + d̂supll)/2
esac
H := 2^32 // if 32 bits is the length of a hash value
return −H · ln(1 − N/H) // correction of hash collisions
The truncation eliminates bad accidental outliers. Then, three estimates are calculated. The first is d̂linc, produced by LinearCounting. The second is d̂loglog, produced by LogLogCounting. This estimate is only used to produce the next estimate via some function α̃. The third is d̂supll. The estimate produced by SuperLogLog is then calculated as shown in Fig. 24.21.
The only missing piece is the calculation of the unbiasing function α̃, given in Fig. 24.22. As one can see, a polynomial of degree 4 is evaluated if k exceeds 3.
α̃(x)
// x is the estimate produced by LogLogCounting
// remember: k is the number of bits used for indexing M
κ := ⌊ln(x/m)/ln(2) + 1.48⌋ + 1 − ln(x/m)/ln(2)
if k < 4
then r := 0.74
else r := c4·κ^4 + c3·κ^3 + c2·κ^2 + c1·κ + c0
return r
Coefficients ci:

             c4            c3             c2          c1              c0
k = 4        0.003497      −0.03555       0.1999      −0.4812         1.139000
k = 5        0.00324250    −0.0346687     0.19794194  −0.47555735320  1.140732
k = 6        0.0031390489  −0.0343776755  0.197295    −0.4730536      1.141759
k = 7        0.0030924632  −0.0342657653  0.197045    −0.4718622      1.142318
k = 8        0.0030709     −0.034219      0.19694     −0.47129        1.142600
k ∈ [9,12]   0.0030517     −0.034180      0.19685     −0.47077        1.142870
k > 12       0.0030504     −0.034177      0.19685     −0.47073        1.142880
The coefficients differ for different k; they are given in the figure as well. The standard deviation of SuperLogLogCounting is 1.05/√m. For hashing strings, Flajolet and co-workers suggest using the hash function proposed by Lum, Yuen, and Dodd [562].
24.9.6 DvByMinAvg
Whereas DvByKMinVal calculates the k-th smallest value, Lumbroso proposed to calculate m minima and average them [563]. This is done by splitting the values in the bag X into m partitions using the first l bits of the hash values. The remaining bits are then used to calculate the minima. The code of DvByMinAvg is shown in Fig. 24.24. The average of the minima contained in M is calculated as the estimate E. As before, linear counting is used to estimate small numbers of distinct values. For the medium range, Lumbroso showed that the expected value of the estimate d̂ of the algorithm is (see Theorem 4 in
HyperLogLog(X, h, m)
// X: bag of elements
// h: hash function to {0, 1}^32
// m: number of entries in array M, m = 2^l for some l
FillM(X, h, M)
E := αm · m^2 · (Σ_{i=0}^{m−1} 2^(−M[i]))^(−1) // 'raw' estimate
if E < (5/2)·m
then
  V := number of empty entries in M
  E* := (V = 0) ? E : m·log(m/V)
else if E ≤ (1/30)·2^32
then E* := E
else E* := −2^32 · log(1 − E/2^32)
return E*
[563]):

  E(d̂) ≈ d/(1 − e^(−λ)),

where d is the true number of distinct values in X and λ = d/m. In order to correct this bias, we set y = d̂/m and solve

  y = λ/(1 − e^(−λ))

for λ. Let us denote this inverse function by f^(−1). The best quadratic approximation under ||·||Q is f^(−1)(x) ≈ −0.0329046·x^2 + 1.34703·x − 0.932685, with a maximal q-error of 1.0035.
24.9.7 DvByKMinAvg
Giroire proposed an algorithm we call DvByKMinAvg [325]. Although older than the approach by Lumbroso, DvByKMinAvg is easiest understood as a combination of DvByKMinVal and DvByMinAvg. As can be seen in Fig. 24.25, we maintain an array M of buckets. Each bucket holds the k minimal values assigned to it, where k is a parameter pragmatically chosen to be 3 [325]. This combines relatively low overhead with relatively high precision. After the array M has been filled with the minimal values, the actual estimate is calculated in two steps. First, the sum of the negative logarithms of the k-th minimal values is calculated. In the algorithm, we denote by M_k[i] the k-th smallest value in bucket i. Then, the actual estimate is calculated from this sum. The estimate found in the algorithm corresponds to the logarithm family algorithm. Giroire presented two more estimators, namely the inverse family algorithm and the square root family algorithm [325].
DvByMinAvg(X, h, m)
// X: bag of elements
// h: hash function to [0, 1[
// m: number of entries in array M, m = 2^l for some l
// calculate m minima
for all x ∈ X do
  a := h(x)
  i := ⌊a·m⌋
  M[i] := min(M[i], a·m − ⌊a·m⌋)
od
d̂ := m(m − 1)/(M[0] + . . . + M[m − 1])
V := number of empty entries in M
if V ≤ 0.86m
then E* := m·log(m/V)
else if V < m
then E* := m·f^(−1)(d̂/m)
else E* := d̂
return E*
DvByKMinAvg(X, h, m)
// X: bag of elements
// h: hash function to [0, 1[
// m: number of entries in array M of buckets, m = 2^l for some l
// every bucket in M holds the k smallest values assigned to this bucket
// calculate m times the k smallest values
for all x ∈ X do
  if (i − 1)/m ≤ h(x) ≤ i/m
  then . . .
Consider the query

  σ_{od≤4 ∧ sd≤4}(Orders)

and assume we wish to estimate the result cardinality using the independence assumption. Since the selectivity of od ≤ 4 is 40/84 and the selectivity of sd ≤ 4 is 34/84, the total selectivity under independence is 40/84 · 34/84 ≈ 0.19, and we thus get an estimate of 0.19 · 84 ≈ 16 for our result cardinality. The true result cardinality is 34.
Now consider the query

  σ_{od≤4 ∧ sd≥6}(Orders)

and assume we again wish to estimate the result cardinality using the independence assumption. Since the selectivity of od ≤ 4 is 40/84 and the selectivity of sd ≥ 6 is 40/84, we get a total selectivity of 40/84 · 40/84 ≈ 0.23, and thus an estimate of 0.23 · 84 ≈ 19 for our result cardinality. The true result cardinality is 1.
Two-dimensional synopses are meant to avoid these inaccuracies.
Assume we are given a conjunction of two range predicates

  c1 ≤ A ≤ c2 ∧ d1 ≤ B ≤ d2.    (24.35)

From these inequalities it follows that

  A − B ≤ c2 − d1
  B − A ≤ d2 − c1,

which is equivalent to

  A − B ≥ c1 − d2
  A − B ≤ c2 − d1,

and thus

  (c1 − d2) ≤ (A − B) ≤ (c2 − d1).    (24.36)

Using the one-dimensional histogram on A − B, we can derive an estimate for the selectivity of Eq. 24.36. Call this selectivity s(Eq. 24.36). Additionally, denote by s(c1 ≤ A ≤ c2) and s(d1 ≤ B ≤ d2) the selectivities of the two range predicates. Under the independence assumption, we would calculate the selectivity of Eq. 24.35 as the product of these two selectivities.
Let us see how this works for our example queries. In order to determine s(od ≤ 4 ∧ sd ≤ 4), we have to determine the selectivities of the single predicates, which are s(od ≤ 4) = 40/84 and s(sd ≤ 4) = 34/84. Instantiating Eq. 24.36 with c1 = d1 = 0 and c2 = d2 = 4 gives us −4 ≤ (sd − od) ≤ 4. Thus, all tuples qualify and the selectivity of this predicate is 1. Hence, we derive a cardinality estimate which is closer to the truth than the estimate produced under independence.
SELECT count(*)
FROM Lineitem l, Orders o
WHERE o.orderdate >= 1995.03.01 AND
l.shipdate <= 1995.03.07 AND
l.orderno = o.orderno
Here, the two date attributes come from different relations. The solution to this
problem is rather simple: define a statistical view. Although the exact syntax
may differ, it is simply a view definition as in
Consider the query

select *
from R, S, T
where R.A = S.B and S.B = T.C

The query compiler uses transitivity to derive more predicates in order to increase the search space and make it more independent of the actual query formulation chosen by the user (Sec. 11.2.2). Thus, the query is rewritten to

select *
from R, S, T
where R.A = S.B and S.B = T.C and R.A = T.C

and all of {R.A, S.B, T.C} are within the same equivalence class. All three equality predicates have an associated selectivity. However, after two of the predicates have been applied and their selectivities have been multiplied, the third predicate is implied by the other two and, accordingly, its selectivity should not be used. This can easily be prevented by using a union-find data structure [205] associated with each plan class. It contains only those variables that are contained in equivalence classes with cardinality greater than two. Initially, each of these variables is in its own class. Then, whenever an equality predicate is about to be applied, we check whether the two variables on its left and right are already in the same equivalence class. If so, we ignore the predicate. Otherwise, we apply the predicate and union the two equivalence classes of the variables. There remains only one open question. Assume the plan generator has generated the partial plan R ⋈_{R.A=S.B} S. Then, there are two predicates left to join T: S.B = T.C and R.A = T.C. For this case, where several predicates can be applied, Swami and Schiefer [851] showed that the following rule (called LS) is the correct way to do it:

Thus, to make things more efficient, we sort the equality predicates whose variables belong to equivalence classes with more than two elements by decreasing selectivity. Then, we can proceed as indicated above.
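A minimal union-find over attribute identifiers, as one might attach to a plan class to implement the check just described (a sketch with names of our own choosing; no particular system's interface is implied):

#include <numeric>
#include <vector>

// Union-find over attribute ids; used to detect equality predicates
// that are already implied by previously applied ones.
class EquivClasses {
    std::vector<int> parent;
public:
    explicit EquivClasses(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0); // each var in its own class
    }
    int find(int v) {
        while (parent[v] != v) v = parent[v] = parent[parent[v]]; // path halving
        return v;
    }
    // Returns false if a = b was already implied (predicate must be ignored);
    // otherwise merges the two classes and returns true.
    bool applyEquality(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        parent[ra] = rb;
        return true;
    }
};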
[Figure: a B+-tree; the root is at level 2, inner nodes at level 1, and the leaf nodes at level 0.]
Further, these numbers can be derived from the sizes of the nodes, keys, and page pointers. For B+-trees on attributes with domains of variable size (e.g., varchar), these numbers have to be maintained explicitly, or estimates have to be produced. The same is true for the min[0] and max[0] values of the leaf nodes. Let us first assume that the min[i] and max[i] values are given. This then results in pseudo-ranked trees [31].
For the true number of tuples f[0] in a leaf node, min[0] ≤ f[0] ≤ max[0] holds. For an arbitrary node N[1] at level 1, the number of tuples f[1] in any of its subtrees satisfies min[0]·min[1] ≤ f[1] ≤ max[0]·max[1]. In general, for an arbitrary non-root node at level l, the number of tuples f[l] stored in its subtree satisfies

  Π_{i=0}^{l} min[i] ≤ f[l] ≤ Π_{i=0}^{l} max[i].

Denote by MIN[l] the first product and by MAX[l] the second. Then, the most accurate estimate we can return is

  q-middle(MIN[l], MAX[l]),

i.e., the geometric mean √(MIN[l]·MAX[l]), with a maximal q-error of √(2^(l+1)) if ||min[i]/max[i]||Q ≤ 2 holds at all levels, including the leaf node level.
Given a node N[l, k] at an arbitrary level l > 0 with J child nodes, we can estimate its contribution to a range query with query interval Iq as

  Σ_{j=1}^{J} (len(Iq ∩ N[l, k].I[j]) / len(N[l, k].I[j])) · q-middle(MIN[l], MAX[l]).
This procedure can now be applied to the root node only. However, it may be beneficial to descend into child nodes for better estimates. This is especially true for those child nodes that are not fully contained in the query interval. Thus, two questions arise: (1) into which nodes to descend and (2) when to stop. Several traversal strategies have been defined (see [35]).
This is not too bad, but there are certain problems. As indicated above, variable-length keys and overflow pages due to high skew cause difficulties. Concerning the former problem, one possibility is to explicitly maintain the minimal and maximal fanout for each level. If this is too expensive, we could maintain the number of nodes n[l] at every level l, use n[l + 1]/n[l] as the average fanout at level l, and use this number instead of the minimal and maximal fanout. Of course, we lose any error bounds in this case. Concerning the latter problem, the simplest solution is to maintain the number of leaf nodes explicitly and to derive an average number avg[0] of tuples per leaf node, which is then used instead of min[0] and max[0]. Obviously, we lose precision, which can only be restored by maintaining explicit cardinality counters.
Dictionaries
Introduction. Many main memory database management systems designed for OLAP are column stores. Further, they often use ordered dictionaries to facilitate compression of columns. Three commercial systems following these lines are HANA [], DB2 BLU [], and SQL Server [].
An ordered dictionary for an attribute A stores the distinct values x1 < . . . < xd of A and provides

1. a mapping of i to xi and
2. a mapping from xi to i.

A range query on A is mapped to index bounds as follows:

  lq ≤ A → lidx := min({i | xi ≥ lq})
  lq < A → lidx := min({i | xi > lq})
  A ≤ uq → uidx := max({i | xi ≤ uq})
  A < uq → uidx := max({i | xi < uq})

Any range query (closed, half-open, or open) is then mapped to the closed range query

  Qidx := σ_{lidx ≤ A ≤ uidx}(R).    (24.38)

The mapping itself can be carried out rather efficiently by a binary search within the dictionary.
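With a sorted std::vector as the dictionary, the mapping to index bounds is a direct application of std::lower_bound and std::upper_bound (a sketch; the dictionary layout is our assumption):

#include <algorithm>
#include <cstdint>
#include <vector>

// Map a closed range query lq <= A <= uq on the value domain to a
// closed range query lidx <= A' <= uidx on dictionary indices.
// Returns false if the query interval contains no stored value.
bool mapRange(const std::vector<int64_t>& dict, int64_t lq, int64_t uq,
              std::size_t& lidx, std::size_t& uidx) {
    auto lo = std::lower_bound(dict.begin(), dict.end(), lq); // first >= lq
    auto hi = std::upper_bound(dict.begin(), dict.end(), uq); // first >  uq
    if (lo == dict.end() || lo == hi) return false;           // empty result
    lidx = static_cast<std::size_t>(lo - dict.begin());
    uidx = static_cast<std::size_t>(hi - dict.begin()) - 1;
    return true;
}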
Since Q and Qidx are equivalent, estimation problems can now be solved on Qidx. This task is simplified by the very structure of a dictionary.

Distinct Values. Since the dictionary is typically dense, i.e., no values that do not occur in the active domain are stored, the number of distinct values of A in Q can be calculated exactly:

  |Π^D_A(σ_{lidx ≤ A ≤ uidx}(R))| = uidx − lidx + 1.    (24.39)
This requires that the fi (4 bytes) are stored for every dictionary entry. At the expense of CPU time, we can use q-compression on the fi to diminish memory consumption to one byte per dictionary entry. Thereby, we can be very precise, since, e.g., 1.1^255 ≈ 36 · 10^9.
• build some kind of histogram on the dictionary index, where, within every
bucket, we have to be precise only for ranges comprising more than δ
values (see Sec. 24.8.3).
24.13.2 Sampling
[206]
Part V

Implementation
Chapter 25

Architecture of a Query Compiler
25.2 Architecture
Figure 25.1 shows the path of a query through the optimizer. For every step, a single component is responsible. Providing a facade¹ for the components results in the overall architecture (Fig. 25.2). Every component is reentrant and stateless. The information necessary for a component to process a query is passed via references to control blocks. Control blocks are discussed next; then we discuss memory management. Subsequent sections describe the components in some detail.
[Figure 25.1: the path of a query through the compile time system (CTS): query → parsing → nfst → rewrite I → plan generation → rewrite II → code generation → execution plan; an internal representation is passed between the phases, and rewrite I, plan generation, and rewrite II together form the query optimizer.]
a pointer to the schema cache. The schema cache itself allows us to look up type names, relations, extensions, indexes, and so on.
The query control block contains all the information gathered for the current query so far. It contains the abstract syntax tree after its construction, the analyzed and translated query after NFST has been applied, the rewritten plan
after the Rewrite I phase, and so on. It also contains a link to the memory manager that manages memory for this specific query. After the control block for a query is created, the memory manager is initialized. During the destructor call, the memory manager is destroyed and its memory released.
Some components need helpers. These are also associated with the control blocks. We discuss them together with the components.
25.6 Driver
25.7 Bibliography
¹ Design pattern.
[Figures 25.2/25.3: components and control blocks of the query compiler: NFST (run(NFST_CB*)), CodeGenerator (run(CodeGenerator_CB*)), Scanner, SchemaCache, Factorizer, BlockHandler, OpCodeMapper, MemoryManager, OperatorFactory, RegisterManager; control blocks: Global_CB, NFST_CB, Rewrite_I_CB, Rewrite_II_CB, CodeGenerator_CB.]
Chapter 26

Internal Representations

26.1 Requirements

easy access to information

query representation: overall design goal: methods/functions with semantic meaning, not only syntactic meaning.

relationships: consumer/producer (occurrence), precedence order information, equivalence of expressions (transitivity of equality); see also expr.h for other functions/relationships that are needed.

Two-level representation: the second level materializes some relationships and functions that are needed frequently and are complicated to compute. Another reason for materialization: avoid too many nested for-loops. Example: key check: given a set of attributes and a set of keys, is the attribute set a key? Test every key: does every element of the key occur in the set of attributes? (already three loops!)

Modeling detail: one big struct with a fat case statement, or a fine-grained class hierarchy? Split only where different processing is required within the optimizer.

Representation — info captured:
1) first-class information (information obvious in the original query + (standard) semantic analysis)
2) second-class information (derived information)
3) historic information (during query optimization itself): modified (original expression, modifier), copied (original expression, copier)
4) information about the expression itself (e.g.: is function call, is select)
5) specific representations for specific purposes (optimization algorithms, code generation, semantic analysis), plus the relationships between these representations; info captured for the different parts of the optimizer

syntactic/semantic information

garbage collection: 1) manual, 2) automatic, 3) semi-automatic (collect references, free at end of query)
• correlation predicates

For blocks: an indicator whether they should produce a null-tuple in case they do not produce any tuple. This is nice for some rewrite rules. Other possibility: an if-statement in the algebra.

Costs: these must be computed and serve as the basis for plan evaluation:

total-cost      /* total cost (resource consumption) */
total-cost += cpu-instructions / instructions-per-second
total-cost += seek-cost * overhead (waiting/cpu)
total-cost += i/o-cost * io-weight
cpu-cost        /* pure CPU cost */
i/o-cost        /* secondary storage access (waiting for disk + CPU for page accesses) */
• ordering
• boolean properties
• cost vector
• cardinalities, proven/estimated
• desired buffer
• keys, FDs
• the set of objects on which the plan (which may be a subplan) depends

The following is all blabla, but it points to the fact that something needs to be done in this regard:

--index: determine degree of clustering
- read_rate = #pages_read / pages_of_relation
  a predicate lowers the read_rate; a repeated read due to a displacement ...
  if TIDs are sorted, the fetch_ratio must be recomputed
- pages can be grouped, e.g., on one cylinder, and fetched with a
  prefetch command; estimate the number of seeks
- cluster_ratio (CR)
  CR = P(read(t) without page read) = (card - #pagefetches)/card
     = (card - (#pagefetch - #page))/card
  this is utter nonsense
- cluster_factor (CF)
  CF = P(avoid unnecessary pagefetch) = (pagefetch/maxpagefetch)
     = (card - #fetch)/(card - #pagesinrel)
  this is utter nonsense

index retrieval on full key => set both factors to 100%, since
within an index the TIDs are sorted per key entry.
26.9 Bibliography
Chapter 27

Details on the Phases of Query Compilation

27.1 Parsing
Lexical analysis is pretty much the same as for traditional compilers. However, it is convenient to treat keywords as soft. This allows, for example, for relation names like order, which is a keyword in SQL. This might be very convenient for users, since SQL has plenty (several hundred) of keywords. For some keywords like select, there is less danger of them being relation names. A solution for group and order would be to lex them as a single token together with the following by.
Parsing again is very similar to parsing in compiler construction. For both lexing and parsing, generators can be used to produce these components. The parser specification of SQL is quite lengthy, while the one for OQL is pretty compact. In both cases, an LALR(2) grammar suffices. The outcome of the parser should be an abstract syntax tree. Again, the data structure for abstract syntax trees (ast) as well as the operations to deal with them (allocation, deletion, traversal) can be generated from a corresponding ast specification.
Some of the basic rewriting techniques can already be applied during parsing. For example, between can be eliminated.
In BD II, there are currently four parsers (for SQL, OQL, NQL (a clean version of XQuery), and XQuery). The driver allows stepping through the query compiler and allows influencing its overall behavior. For example, several trace levels can be switched on and off from within the driver. Single rewrites can be enabled and disabled. Further, the driver allows switching to a different query language. This is quite convenient for debugging purposes. We used the Cocktail tools to generate the lexer, parser, ast, and NFST components.
1. normalization of expressions,
Although these are different tasks, a single pass over the abstract syntax tree
suffices to perform all these tasks in one step.
Consider the following example query:
Expression
• for a given expression: compute the set of occurring (consumed, free) IUs
• constant folding
• merge and/or (from e.g. binary to n-ary) and push not operations
27.3 Normalization
Fig. 27.3 shows the result after normalization. The idea of normalization is to
introduce intermediate IUs such that all operators take only IUs as arguments.
This representation is quite useful.
27.4 Factorization
Common subexpressions are factorized by replacing them with references to
some IU. For the expressions in TPCD query 1, the result is shown in Fig. 27.4.
[Figure 27.4: factorized expression trees for TPCD query 1; common subexpressions are replaced by references to intermediate IUs.]
...
select a, b, c
from A, B
where d > e and f = g
...
Consider the semantic analysis of d. Since SQL provides implicit name lookup, we have to check the (formerly analyzed) relations A and B for whether they provide an attribute called d. If none of them provides an attribute d, then we must check the next upper SFW-block. If at least one of the relations A or B provides an attribute d, we just check that only one of them provides such an attribute. Otherwise, there would be a disallowed ambiguity. The blockwise lookup is handled by the block handler. For every newly encountered block (e.g. SFW block), a new block is opened. All identifiers analyzed within that block are pushed into the list of identifiers for that block. In case the query language allows for implicit name resolution, it might also be convenient to push all the attributes of an analyzed relation into the block's list. The lookup is then performed blockwise. Within every block, we have to check for ambiguities. If the lookup fails, we have to proceed by looking up the identifier in the schema. The handling of blocks and lookups is performed by the BlockHandler component attached to the control block of the NFST component (Fig. 25.3).
27.7 Translation

The translation step translates the original AST representation into an internal representation. There are as many internal query representations as there are query compilers. They all build on calculus expressions, operator graphs built over some algebra, or tableaux representations [873, 874]. A very powerful representation that also captures the subtleties of duplicate handling is the query graph model (QGM) [676].
The representation we use here is a mixture of a typed algebra and calculus. Algebraic expressions are simple operator trees with algebraic operators like selection, join, etc. as nodes. These operator trees must be correctly typed. For example, we are very picky about whether a selection operator returns a set or a bag. The expression that resembles a calculus expression more than an algebraic one is the SFWD block used in the internal representation. We first clarify our notion of block within the query representation described here and then give an example of an SFWD block. A block is everything that produces variable bindings, for example a SFWD-block, which pretty directly corresponds to a SFW-block in SQL or OQL. Other examples of blocks are quantifier expressions and grouping operators. A block has the following ingredients:

• a list of inputs of type collection of tuples¹ (labeled from)
• a set of expressions whose top is an IU (labeled define)
• a selection predicate of type bool (labeled where)

For quantifier blocks and group blocks, the list of inputs is restricted to length one. The SFWD-block and the grouping block additionally have a projection list (labeled select) that indicates which IUs are to be projected (i.e., passed to subsequent operators). Blocks are typed (algebraic) expressions and can thus be mixed with other expressions and algebraic operator trees.
An example of a SFWD-block is shown in Fig. 27.5, where dashed lines indicate the produced-by relationship. The graph corresponds to the internal representation of the query.
¹ We use a quite general notion of tuple: a tuple is a set of variable (IU) bindings.
[Figure 27.5: an SFWD-block with its select, where, define, and from parts. The define part computes IUs from arithmetic expressions (a division, a multiplication, and the constants 100 and 100,000); the where part contains a comparison (>); the from part contains key-IU scans of the extents "Employee" (IU: e) and "Department" (IU: d); dashed lines indicate the produced-by relationship.]
• eliminate duplicates
• preserve duplicates

This summary is also attached to every block. Let us illustrate this by a simple example:
For the inner block, the user specifies that duplicates are to be preserved. However, duplicates or not does not modify the outcome of exists. Hence, the contextual information indicates that the outcome for the inner block is a don't care. The processing view can determine whether the block produces duplicates. If for all the entries in the from clause a key is projected in the select clause, then the query does not produce duplicates. Hence, no special care has to be taken to remove duplicates produced by the outer block if we assume that ssno is the key of Employee.
Now let us consider the annotations for the arguments in the from clause. The query
outer part) should be annotated by (OJ). We use (AJ) as the anti-join annotation and (DJ) for a d-join. To complete the annotation, the case of a regular join can be annotated by (J). If the query language also supports all-quantifications, which translate into divisions, then the annotation (D) should be supported.
Since the graphical representation of a query is quite complex, we also use text representations of the result of the NFST phase. Consider the following OQL query:
select distinct s
from s in Student, c in s.courses
where c.name = “Database”
select distinct s
from s in Student, c in s.courses
where cn = “Database”
define cn = c.name
PROJECT [s]
SELECT [cn=”Database”]
EXPAND [cn:c.name]
s from its left input, the d-join computes the set s.courses. For every course c in s.courses, an output tuple containing the original student s and a single course c is produced. If the evaluation of the right argument of the d-join does not depend on the left argument, the d-join is equivalent to a cross product. The first optimization is to replace d-joins by cross products whenever possible.
Queries with a group by clause must be translated using the unary grouping operator GROUP, which we denote by Γ. It is defined as

  Γ_{g;θA;f}(e) = { y.A ∘ [g : G] | y ∈ e, G = f({x | x ∈ e, x.A θ y.A}) }

where the subscripts have the following semantics: (i) g is a new attribute that will hold the elements of the group, (ii) θA is the grouping criterion for a sequence of comparison operators θ and a sequence of attribute names A, and (iii) the function f will be applied to each group after it has been formed. We often use some abbreviations. If the comparison operator θ is equal to "=", we don't write it. If the function f is the identity, we omit it. Hence, Γ_{g;A} abbreviates Γ_{g;=A;id}.
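As a small illustration of the definition (our own example, using the abbreviation just introduced): for a relation e = {[dno: 1, name: "Anna"], [dno: 1, name: "Bob"], [dno: 2, name: "Carl"]}, the expression Γ_{g;dno}(e) yields

  { [dno: 1, g: {[dno: 1, name: "Anna"], [dno: 1, name: "Bob"]}],
    [dno: 2, g: {[dno: 2, name: "Carl"]}] },

i.e., one tuple per distinct dno value, whose attribute g holds the corresponding group.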
Let us complete the discussion on the internal query representation. We already mentioned algebraic operators like selection and join. These are called logical algebraic operators. Their implementations are called physical algebraic operators. Typically, there exist several possible implementations for a single logical algebraic operator. The most prominent example is the join operator, with implementations like Grace hash join, sort-merge join, nested-loop join, etc. All the operators can be modeled as objects. To do so, we extend the expression hierarchy by an algebra hierarchy. Although not shown in Fig. 27.7, the algebra class should be a subclass of the expression class. This is not necessary for SQL but is a requirement for more orthogonal query languages like OQL.
[Figure 27.7: algebra class hierarchy: Expression → Algebraic Operator → Join, DupElim, Sort, Unnest, Select, Chi, Projection, AlgIf, Division, AlgSetop.]
27.8 Rewrite I
27.10 Rewrite II
generated. The first program is the init program. It initializes the registers that will hold the results of the aggregate functions. For example, for an average operation, the register is initialized with 0. The advance program is executed once for every tuple to advance the aggregate computation. For example, for an average operation, the value of some register of the input tuple is added to the result register holding the average. The finalize program performs post-processing for aggregate functions. For example, for the average, it divides the sum by the number of tuples. For hash-based grouping, the last two programs (see Fig. 1.6) compute the hash value of the input register set and compare the group-by attributes of the input registers with those of every group in the hash bucket.
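To make the division of labor concrete, here is a hypothetical C++ rendering of the three programs for an AVG aggregate; the register layout and names are our invention, not the actual AVM operations:

// Registers used by a hypothetical AVG aggregation.
struct AvgRegs {
    double sum;   // running sum
    long   count; // number of tuples seen
    double avg;   // final result
};

// init program: prepare the result registers.
void avgInit(AvgRegs& r) { r.sum = 0.0; r.count = 0; r.avg = 0.0; }

// advance program: executed once per input tuple.
void avgAdvance(AvgRegs& r, double inputRegister) {
    r.sum += inputRegister;
    r.count += 1;
}

// finalize program: post-processing after all tuples have been seen.
void avgFinalize(AvgRegs& r) {
    r.avg = (r.count > 0) ? r.sum / r.count : 0.0;
}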
During code generation for the subscripts, factorization of common subexpressions has to take place. Another task is register allocation and deallocation. This task is performed by the register manager. It uses subroutines to determine whether some registers are no longer needed. The register manager must also keep track of which register some IU is stored in (if at all). Another component used during code generation is a factory that generates new AVM operations. This factory is associated with a table-driven component that maps the operations used in the internal query representation to AVM opcodes.
27.12 Bibliography
Chapter 28
Hard-Wired Algorithms
28.1.1 Introduction
Consider the following query:

select e.name
from Employee e, Department d
where e.dno = d.dno and d.name = "shoe"

[Figure 28.1: operator tree for the example query, with Project (e.name) at the top.]

The bottom level contains two table scans that scan the base tables Employee and Department. Then, a selection operator is applied to restrict the departments to those named "shoe". A nested-loop join is used to select those employees that work in the selected departments. The projection restricts the output to the name of the employees, as required by the query block. For such a plan, a cost function is used to estimate its cost. The goal of plan generation is to generate the cheapest possible plan. Costing is briefly sketched in Section ??.
The foundation of plan generation is algebraic equivalences. For e, e1, e2, . . . being algebraic expressions and p, q predicates, here are some example equivalences:
For more equivalences and conditions that ought to be attached to the equivalences, see Appendix ??. Note that commutativity and associativity of the join operator allow an arbitrary ordering. Since the join operator is the most expensive operation, ordering joins is the most prominent problem in plan generation.
These equivalences are, of course, independent of the actual implementation of the algebraic operators. The total number of plans equivalent to the original query block is called the potential search space. However, the total search space is not always considered. The set of plans equivalent to the original query considered by the plan generator is the actual search space. Since the System R plan generator [772], certain restrictions have been applied. The most prominent are:

• Generate only plans where selections are pushed down as far as possible.
[Figure 28.2: a left-deep join tree and a bushy join tree over the relations R1, R2, R3, R4.]
even harder. Even if there is no predicate, that is, only cross products have to be used, the problem is NP-hard [756]. This is surprising, since generating left-deep trees with cross products as the only operation is very simple: just sort the relations by increasing size.
Given the complexity of the problem, there are only two alternatives to generate plans: either explore the total search space or use heuristics. The former can be quite expensive. This is the reason why the above-mentioned restrictions on the search space have traditionally been applied. The latter approach risks missing good plans. The best-known heuristic is to join next the relation that results in the smallest intermediate result. Estimating the cardinality of such results is discussed in Section ??.
Traditionally, selections were pushed down as far as possible. However, for expensive selection predicates (e.g., user-defined predicates, those involving user-defined functions, predicates with subqueries), this does not suffice. For example, if a computer vision application has to compute the percentage of snow coverage for a given set of satellite images, this is not going to be cheap. In fact, it can be more expensive than a join operation. In these cases, pushing the expensive selection down misses good plans. That is why research lately started to take expensive predicates into account. However, some of the proposed solutions do not guarantee finding the optimal plans. Some approaches and their bugs are discussed in [153, 409, 407, 755, 757]. Although we will subsequently give an algorithm that incorporates correct predicate placement, not all plan generators do so. An alternative (though less good) approach is to pull up expensive predicates in the Rewrite-II phase.
There are several approaches to explore the search space. The original approach is to use dynamic programming [772]. The dynamic programming algorithm is typically hard-coded. Figure 28.3 illustrates the principle of bottom-up plan generation as applied in dynamic programming. The bottom level consists of the original relations to be joined. The next level consists of all plans that join a subset of cardinality two of the original relations. The next level contains all plans for subsets of cardinality three, and so on. With the advent of new query optimization techniques, new data models, and extensible database systems, researchers were no longer satisfied with the hard-wired approach. Instead, they aimed for rule-based plan generation. There exist two different approaches for rule-based query optimizers. In the first approach, the algebraic equivalences that span the search space are used to transform some initial query plan derived from the query block into alternatives. As search strategies, either exhaustive search or some stochastic approach such as simulated annealing, iterative improvement, genetic algorithms, and the like [73, 440, 445, 446, 823, 847, 846, 849] is used. This is the transformation-based approach. This approach is quite inefficient. Another approach is to generate plans by rules in a bottom-up fashion. This is the generation-based approach. In this approach, either a dynamic programming algorithm [556] or memoization [354] is used. It is convenient to classify the rules used into logical and physical rules. The logical rules directly reflect the algebraic equivalences. The physical rules or implementation rules transform a logical algebraic operator into a physical algebraic operator. For example, a join node becomes a nested-loop-join node.
Within the brief discussion in the last subsection, we enumerated plans such
that first all 1-relation plans are generated, then all 2-relation plans and so on.
This enumeration order is not the most efficient one. Let us consider the simple
problem where we have to generate exactly one best plan for the subsets of the
n element set of relations to be joined. The empty subset is not meaningful,
leaving the number of subsets to be investigated at 2n − 1. Enumerating these
subsets can be done most efficient by enumerating them in counting order . That
is, we initialize a n bit counter with 1 and count until have reached 2n − 1. The
n bits represent the subsets. Note that with this enumeration order, plans are
still generated bottom up. For a given subset R of the relations (encoded as the
bit pattern a), we have to generate a plan from subsets of this subset (encoded
as the bit pattern s). For example, if we only want to generate left-deep trees,
then we must consider 1 element subsets and their complements. If we want
to generate bushy trees, all subsets must be considered. We can generate these
subsets by a very fast algorithm developed by Vance and Maier [885]:
s = a & (-a);      // start with the smallest non-empty subset of a
while (s) {
  process(s);
  s = a & (s - a); // advance to the next subset of a in counting order
}
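For context, the following self-contained C++ sketch embeds the trick into bottom-up plan generation in counting order; Plan, scanCost, and combineCost are hypothetical placeholders for a real plan representation and cost model, not part of [885]:

#include <bit>       // std::countr_zero (C++20)
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Plan { double cost = std::numeric_limits<double>::infinity(); };

// Hypothetical cost of joining two subplans.
double combineCost(const Plan& l, const Plan& r) { return l.cost + r.cost + 1.0; }

// Bottom-up generation of bushy plans: the subsets a are enumerated in
// counting order (1 .. 2^n - 1); the proper subsets s of each a are
// enumerated with the Vance/Maier trick.
std::vector<Plan> optimalBushy(int n, const std::vector<double>& scanCost) {
    std::vector<Plan> best(std::size_t(1) << n);
    for (uint32_t a = 1; a < (uint32_t(1) << n); ++a) {
        if ((a & (a - 1)) == 0) {                    // singleton subset: base case
            best[a].cost = scanCost[std::countr_zero(a)];
            continue;
        }
        for (uint32_t s = a & (~a + 1); s != a; s = a & (s - a)) {
            uint32_t c = a & ~s;                     // complement of s within a
            double cost = combineCost(best[s], best[c]);
            if (cost < best[a].cost) best[a].cost = cost;
        }
    }
    return best;                                     // best[2^n - 1] holds the final plan
}

Each pair (s, complement) is considered twice here; a real implementation would break this symmetry, for example by requiring s to contain the lowest bit of a.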
Here, the user requests the result to be ordered on d.dno. Incidentally, this is
also a join attribute. During bottom-up plan generation, we might think that a
Grace hash join is more efficient than a sort-merge join since the cost of
sorting the relations is too high. However, the result has to be sorted anyway,
so this sort may pay off. Hence, we have to keep both plans. The approach is
the following. In the example, an ordering on d.dno is called an interesting
order. In general, any order that is helpful for ordering the output as
requested by the user, for a join operator, for a grouping operator, or for
duplicate elimination is called an interesting order. The dynamic programming
algorithm is then modified such that plans are not pruned if they produce
different interesting orders.
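A minimal C++ sketch of the modified pruning rule (the names are hypothetical): the table of best plans is keyed by the pair (relation set, produced order), so a plan can only prune another plan that produces the same interesting order:

#include <cstdint>
#include <map>
#include <utility>

enum class Order { None, OnDno /* one entry per interesting order */ };

struct Plan { double cost; /* operator tree omitted */ };

// Table of best plans, keyed by (relation set, produced order).
std::map<std::pair<uint32_t, Order>, Plan> best;

// A plan is pruned only by a cheaper plan over the same relations that
// produces the same interesting order.
void submit(uint32_t relations, Order produced, const Plan& p) {
    auto key = std::make_pair(relations, produced);
    auto it  = best.find(key);
    if (it == best.end() || p.cost < it->second.cost)
        best[key] = p;
}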
28.2 Bibliography
proc Optimal-Bushy-Tree(R, P)
1  for k = 1 to n do
2    for all k-subsets Mk of R do
3      for l = 0 to min(k, m) do
4        for all l-subsets Pl of Mk ∩ RS do
5          best_cost_so_far = ∞;
6          for all subsets L of Mk with 0 < |L| < k do
7            L′ = Mk \ L, V = Pl ∩ L, V′ = Pl ∩ L′;
8            p = ∧{pi,j | pi,j ∈ P, Ri ∈ V, Rj ∈ V′}; // p = true might hold
9            T = (T[L, V] ⋈p T[L′, V′]);
10           if Cost(T) < best_cost_so_far then
11             best_cost_so_far = Cost(T);
12             T[Mk, Pl] = T;
13           fi;
14         od;
15         for all R ∈ Pl do
16           T = σR(T[Mk, Pl \ {R}]);
17           if Cost(T) < best_cost_so_far then
18             best_cost_so_far = Cost(T);
19             T[Mk, Pl] = T;
20           fi;
21         od;
22       od;
23     od;
24   od;
25 od;
26 return T[R, S];
Chapter 29
Rule-Based Algorithms
29.3 Bibliography
Chapter 30
Example Query Compiler
[Figure: hierarchical optimizer control. A global control coordinates several regions; each region has its own control, which drives transformations.]
30.1.4 Ereq
A primary goal of the EREQ project is to define a common architecture for the
next generation of database managers. This architecture now includes
• the query language OQL (a la ODMG),
• the logical algebra AQUA (a la Brown), and
• the physical algebra OPA (a la OGI/PSU).
It also includes
• software to parse OQL into AQUA (a la Bolo)
and query optimizers:
• OPT++ (Wisconsin),
• EPOQ (Brown),
• Cascades (PSU/OGI), and
• Reflective Optimizer (OGI).
In order to test this architecture, we hope to conduct a "bakeoff" in which
the four query optimizers will participate. The primary goal of this bakeoff is
to determine whether optimizers written in different contexts can accommodate
the architecture we have defined. Secondarily, we hope to collect enough per-
formance statistics to draw some conclusions about the four optimizers, which
have been written using significantly different paradigms.
[Figure 30.2: Overview of the Exodus optimizer generator. A model description is fed into the optimizer generator, whose output is compiled by a C compiler; a query then passes through syntactic analysis, the generated optimizer, and an interpreter to produce the answer.]
At present, OGI and PSU are testing their optimizers on the bakeoff queries.
Here is the prototype bakeoff optimizer developed at OGI. This set of Web
pages is meant to report on the current progress of their effort, and to define
the bakeoff rules. Please email your suggestions for improvement to Leo Fegaras
[email protected]. Leo will route comments to the appropriate author.
http://www.cse.ogi.edu/DISC/projects/ereq/bakeoff/bakeoff.html
30.1.5 Exodus/Volcano/Cascades
Within the Exodus project, an optimizer generator was developed [351].
Figure 30.2 gives an overview of the Exodus optimizer generator. A model
description file contains all the information needed for an optimizer. Since
the Exodus optimizer generator is meant to support different data models, this
file first contains the definitions of the available operators and methods.
Here, operators denote the operators of the logical algebra, and methods those
of the physical algebra, that is, the implementations of the operators. The
model description file further contains two classes of rules. Transformations
are based on algebraic equivalences and map one operator tree into another.
Implementation rules select a method for a given operator. Both classes of
rules have a left-hand side, which must match a part of the current operator
graph, a right-hand side, which describes the operator graph after the
application of the rule, and a condition, which must be satisfied for the rule
to be applicable. While the left-hand and right-hand sides of a rule are given
as patterns, the condition is expressed as C code. C routines can also be used
for the transformation itself. A concluding section of the model description
file contains the required C routines.
From the model description file, the optimizer generator produces a C program,
which is then compiled and linked. The result is the query optimizer, which
can be used in the conventional way. A compiling approach was chosen for the
rules rather than an interpreting one because, in an experiment previously
conducted by the authors, rule interpretation had turned out to be too slow.
Rule processing in the generated optimizer maintains a list OPEN that holds
all applicable rules. A selection mechanism determines the next rule to apply
and removes it from OPEN. After its application, the rule applications it
newly enables are detected and recorded in OPEN. The selection mechanism
considers both the cost of the current expression and an estimate of a rule's
potential. This estimate of the potential is computed as the quotient of the
costs of an operator tree before and after rule application, taken over a
series of previous applications of the rule. From these two figures, the cost
of the current operator graph to which the rule is to be applied and the
rule's potential, an estimate of the cost of the resulting operator graph can
be computed. The search strategy is hill climbing.
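As an illustration of this selection mechanism, here is a minimal C++ sketch of such an OPEN-driven hill-climbing loop; Expr, RuleApplication, and the cost/potential arithmetic are hypothetical placeholders, not the C code produced by the generator:

#include <cstddef>
#include <vector>

struct Expr { double cost; /* operator graph omitted */ };

struct RuleApplication {
    double potential;  // ratio of costs before/after over past applications
    Expr apply(const Expr& e) const { return Expr{e.cost / potential}; } // placeholder
};

// Estimated cost of the operator graph a rule application would produce.
double promise(const Expr& e, const RuleApplication& r) { return e.cost / r.potential; }

Expr hillClimb(Expr current, std::vector<RuleApplication> open) {
    while (!open.empty()) {
        // select the application with the lowest estimated result cost
        std::size_t pick = 0;
        for (std::size_t i = 1; i < open.size(); ++i)
            if (promise(current, open[i]) < promise(current, open[pick])) pick = i;
        Expr next = open[pick].apply(current);
        open.erase(open.begin() + pick);              // remove the chosen rule from OPEN
        if (next.cost < current.cost) current = next; // hill climbing: accept improvements
        // newly enabled rule applications would be detected and added to OPEN here
    }
    return current;
}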
The main drawback of their optimizer generator noted by the authors, which
they however claim holds for all transformation-based rule-driven optimizers,
is the impossibility of estimating the absolute quality of an operator tree
and the potential of an operator tree with respect to future optimizations.
Hence, it can never be estimated whether the optimal operator tree has already
been reached; only after all alternatives have been generated can the optimal
operator tree be selected. The authors further regret that the A*-algorithm
cannot be used as the search function, since the potential, that is, the
distance to the optimal operator graph, cannot be estimated.
One should also view the valuation of individual rules at least critically:
based on algebraic equivalences, they are of too fine a granularity for a
general valuation to be possible. The successful application of a commutation
of two join operations in one query by no means implies that this commutation
will reduce costs in the next query as well. The main reason for this critical
attitude towards an otherwise quite appealing idea is that a single rule
application takes too little information and context into account. If this
shortcoming were removed, that is, if rules were of decidedly coarser
granularity, the approach would appear promising. An example would be a rule
that orders all join operations according to a given heuristic, that is, a
complex algorithm that incorporates more knowledge into its decisions.
Graefe himself lists some further drawbacks of the Exodus optimizer generator,
which then led to the development of the Volcano optimizer generator [353,
354]. Insufficiently supported are
• non-trivial cost models,
• properties, and
• heuristics.
(select <proj-list>
<sel-pred-list>
<join-pred-list>
<table-list>)
<rel-name>.<attr-name>
[Figure: optimizer phases. Query → generation of the general expression → access plan generation → join order and join methods → evaluation plan.]
30.1.7 Genesis
The global goal of the Genesis project [58, 59, 60, 63] was to modularize
database software as a whole and to achieve an increased reusability of
database modules. Two subgoals were pursued. Here, we are only interested in
how these goals were met in the construction of optimizers [56, 61].
The standardization of the interfaces is achieved by a generalization of query
graphs. The algorithms themselves are described by transformations on query
graphs. Note that this does not mean that the algorithms are also implemented
by transformation rules. Rules merely serve as a means of description in order
to understand the nature of the reusability of optimization algorithms.
Optimization is divided into two phases, the reduction phase and the join
phase. The reduction phase maps query graphs operating on unreduced sets of
data onto query graphs operating on reduced sets of data; it is thus clearly
oriented towards the heuristics of pushing selections and projections. The
second phase determines join orders. The instantiation of the approach
described in the papers is therefore very conservative in the sense that only
classical data models are considered; an application of the methodology to
object-oriented or deductive data models is still outstanding.
Consequently, only the existing classical optimization approaches can be
described sufficiently well with these means. Likewise, the existing classical
optimizers can be described, with the means presented, as compositions of the
algorithms that are also captured in the formalism. The composition itself is
described by algebraic term rewritings. New composition rules then allow new
optimizers to be described that use other combinations of the algorithms.
The formal, implementation-independent description of both the individual
optimization algorithms and the composition of an optimizer optimally supports
the reuse of existing algorithms. The use of the standardized query graphs is
important here. This point is weakened, however, since it is also envisaged to
use different representations of query graphs [59]. This, of course, calls the
reuse of implementations of optimization algorithms into question, since these
typically work only on one particular representation of the query graphs.
If new optimization approaches are developed, they can likewise be described
in the presented formalism. The same holds for new index structures, since
these, too, are described formally [57, 62]. It is not foreseeable to what
extent the standardized query graph will withstand extensions. This, however,
is not a problem specific to the Genesis approach but holds for all
optimizers. It is still open whether the optimization algorithms can be
specified and implemented in such a way that they work independently of the
concrete representation or implementation of the query graphs. The
object-oriented approach may be useful here. The question arises, however,
whether, upon the introduction of a new operator, the existing algorithms can
be implemented such that they can ignore it and still do meaningful work.
The restriction to two optimization phases, the reduction phase and the join
phase, is not a limitation, since this restriction, too, was fixed by term
rewriting rules and can thus easily be changed.
Since the descriptions of the optimizer and of the individual algorithms are
independent of the actual implementation, the global control of the optimizer
and the local controls of the individual algorithms are also decoupled from
each other. This is an important requirement for achieving extensibility. It
is often violated by rule-based optimizers and thus limits their
extensibility.
Evaluability, predictability, and the early assessment of alternatives are not
possible with the presented approach, since the individual algorithms are
viewed as transformations on the query graph. This drawback, however, does not
apply to the Genesis approach alone but, in general, to all but one optimizer.
It is not foreseeable whether this drawback results from the formalism used or
merely from its concretization in the modeling of existing optimizers. It is
quite possible that the formalism, with slight extensions, can also describe
other approaches, in particular the generating one.
All in all, the Genesis approach is a very useful one. Unfortunately, in
contrast to rule-based approaches, it has not found enough resonance. It most
probably has more potential to meet the requirements than has been explored so
far.
30.1.8 GOMbgo
[Figures: the GOMbgo optimizer. A GOMql query is brought into a term representation; heuristic-guided rule application, driven by a rule base and an ASR schema, yields a list of optimized terms, from which a code generator produces the evaluation plan (QEP). Components: toolbox, heuristic evaluator, condition/rule manager, pattern matcher, environment manager, and schema manager (types, access support relations). A further figure shows the optimization phases, normalization → algebraic optimization → factoring of constants and common subexpressions → translation into the expression algebra → non-algebraic optimization, together with an example operator tree (π, χ, sort, σ, head, REL).]
30.1.9 Gral
Gral ist ein erweiterbares geometrisches Datenbanksystem. Der für dieses Sys-
tem entwickelte Optimierer, ein regelbasierter Optimierer in Reinkultur, erzeugt
aus einer gegebenen Anfrage in fünf Schritten einen Ausführungsplan (s. Abb. 30.6
a) [66]. Die Anfragesprache ist gleich der verwendeten deskriptiven Algebra
(descriptive algebra). Diese ist eine um geometrische Operatoren erweiterte
relationale Algebra. Als zusätzliche Erweiterung enthält sie die Möglichkeit,
Ausdrücke an Variablen zu binden. Ein Auswertungsplan wird durch einen Aus-
druck der Ausführungsalgebra (executable algebra) dargestellt. Die Ausführungsalgebra
beinhaltet im wesentlichen verschiedene Implementierungen der deskriptiven
Algebra und Scan-Operationen. Die Trennung zwischen deskriptiver Algebra
und Ausführungsalgebra ist strikt, das heißt, es kommen keine gemischten Aus-
drücke vor (außer während der expliziten Konvertierung (Schritt 4)).
Die Schritte 1 und 3 sind durch feste Algorithmen implementiert. Während
30.1. RESEARCH PROTOTYPES 573
where
specification is of the form
SPEC spec1 , . . . , specn .
The speci are range specifications such as opi in < OpSet >.
definition defines variables (e.g., for attribute sequences). Gral features
different sorts of variables for attributes, operations, relations, etc.
pattern is a pattern in the form of an expression that may contain variables
and constants. The expression can be an expression of the descriptive
algebra or of the executable algebra.
conditioni is a condition in the form of a general Boolean expression. Special
predicates such as ExistsIndex (does an index exist for a relation?) are
provided by Gral.
resulti is again an expression, describing the result of the rule.
valuationi is an arithmetic expression returning a numeric value. It can be
consulted by one of the selection strategies (Gral supports several): the
rule with the smallest valuation is preferred.
A rule is evaluated in the standard way. Let E be the expression to which the
rule is to be applied.
The Gral optimizer is a pure rule-based optimizer following the transformation
approach. Accordingly, all of the previously identified disadvantages of that
approach apply. In detail, the following points are open to criticism:
30.1.10 Lambda-DB
http://lambda.uta.edu/lambda-DB/manual/overview.html
Query Execution Plans QEPs are represented as (deep) processing trees.
Cost Model Quite similar to the one we use. It also employs quantities such as
card(C), size(C), ndist(Ai ), fan(Ai ), share(Ai ). Details can be found
in my write-up.
30.1.12 Opt++
Wisconsin
30.1.13 Postgres
Postgres is not an object base system but falls into the class of extended
relational systems [833]. The essential extensions are
• computable attributes, which are formulated as Quel queries [831],
• operations [829],
• rules [832].
These points, however, shall not concern us here. The optimization techniques
developed there, in particular the materialization of computable attributes,
are described in the literature [461, 394, 392, 393]. Our interest is directed
instead at a more recent publication in which a proposal for ordering
selections and join operations is made [409]. It is briefly presented in the
following, after some preliminary remarks.
If a selection is delayed, that is, executed after a join although this would
not be necessary, it can happen that the selection predicate must be evaluated
on more tuples. It cannot happen, however, that it must be evaluated on more
distinct values. On the contrary, the number of distinct argument values is in
general decreased by a join. Hence, if the already computed values of the
selection predicate are cached, the number of evaluations of the selection
predicate after a join at least does not grow: an evaluation is replaced by a
lookup. Since only expensive selection predicates are considered here, a
lookup is very cheap compared to an evaluation; the costs of the lookup can
even be neglected. There remains the problem of the size of the cache. If the
input is sorted on the arguments of the selection predicate, the size of the
cache can in some circumstances be reduced to 1. The cache becomes entirely
unnecessary if an indirect representation of the join result is used. A
possible indirect representation is shown in Figure ??, where the left one of
the two relations shown contains the arguments of the selection predicate
under consideration.
For each selection predicate p(a1 , . . . , an ) with arguments ai , let cp
denote the cost of its evaluation on one tuple. These costs comprise CPU and
I/O costs (see [409]). A plan is a tree whose leaves are scan nodes and whose
inner nodes are labeled with selection and join predicates. A stream in a plan
is a path from a leaf to the root. The central idea now is not to distinguish
between selection and join predicates but to treat them uniformly. It is
assumed that all these predicates operate on the cross product of all
relations of the query under consideration. This requires an adaptation of the
costs. Let a1 , . . . , an be the relations of the query under consideration
and p a predicate over the relations a1 , . . . , ak . Then the global cost of
p is defined as

C(p) = cp / (|ak+1 | ∗ · · · ∗ |an |)

The global cost captures the cost of evaluating the predicate over the whole
query; of course, the relations that do not influence the predicate have to be
factored out. As an illustration, assume that p is a selection predicate on a
single relation a1 . Applying p directly to a1 incurs the cost cp ∗ |a1 |. In
the unified model, every predicate is assumed to be evaluated on the cross
product of all relations involved in the query; this incurs the cost
C(p) ∗ |a1 | ∗ |a2 | ∗ · · · ∗ |an |, which equals cp ∗ |a1 |. This is correct,
of course, only if a cache for the values of the selection predicates is used.
Note further that the selectivity s(p) of a predicate p is independent of its
position within a stream.
The global rank of a predicate p is defined as

rank(p) = s(p) / C(p)
Note that the predicates within a stream cannot be reordered arbitrarily,
since we must guarantee that the arguments used by a predicate are actually
available. In [409], a further restriction is imposed: the join order must not
be touched. It is thus assumed that an optimal join order has already been
determined and that only the pure selection predicates may be moved.
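Under these restrictions, ordering the movable, cached selection predicates of a stream by ascending global rank is the natural greedy strategy; the following is a minimal C++ sketch of it (the Predicate fields and orderStream are hypothetical names, not the notation of [409]):

#include <algorithm>
#include <vector>

struct Predicate {
    double selectivity;   // s(p), independent of the position in the stream
    double globalCost;    // C(p) = cp / (|a_{k+1}| * ... * |a_n|)
};

double rank(const Predicate& p) { return p.selectivity / p.globalCost; }

// Order the movable selection predicates of one stream by ascending global
// rank; the availability of argument attributes must be checked elsewhere.
void orderStream(std::vector<Predicate>& selections) {
    std::sort(selections.begin(), selections.end(),
              [](const Predicate& a, const Predicate& b) { return rank(a) < rank(b); });
}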
If one first considers only the reordering of the predicates on a single
stream, then, owing to the restrictions on reorderability, one obtains
30.1.15 Secondo
Güting
30.1.16 Squiral
The first approach to a rule-based optimizer, Squiral, can be traced back to
the year 1975 [812]. Note that this paper is four years older than the perhaps
most frequently cited paper on the System R optimizer [772], which, however,
is not rule-based but hard-wired.
Figure 30.7 gives an overview of the structure of Squiral. After syntactic
analysis, an operator graph is available; in Squiral, it is initially
restricted to an operator tree. To handle common subexpressions, the creation
of temporary relations that
[Figure 30.7: Squiral. Query → parsing → operator graph → graph transformations (driven by transformation rules) → optimized operator graph → operator construction procedures (operator base) → cooperative concurrent programs → database machine → result.]
The essential task of operator construction is the selection of the actual
implementations of the operators in the operator graph, making optimal use of
given sort orders. This phase of the optimization, too, is not cost-based in
Squiral. It is realized by two passes over the operator graph. The first pass
computes, bottom-up, the sort orders that are available without additional
effort, for example because relations are already sorted and existing sort
orders are not destroyed by operators. In the second, top-down pass, re-sorts
are introduced only if none of the sort orders computed in the first pass
permits an efficient implementation of the operator to be converted. Both
passes are specified by rule sets. It is remarkable that the number of rules,
32 for the upward pass and 34 for the downward pass, by far exceeds the number
of rules for the transformation phase (7 rules in total). The complexity of
these rules is also considerably higher.
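The two passes can be pictured as follows; this is a schematic C++ sketch, not Squiral's rule notation, and the node structure and order representation are hypothetical:

#include <memory>
#include <set>
#include <string>
#include <vector>

struct Node {
    std::string op;                        // e.g. "scan", "join", "project"
    std::vector<std::unique_ptr<Node>> in; // child operators
    std::set<std::string> freeOrders;      // sort orders available at no cost
};

// Pass 1 (bottom-up): collect sort orders obtainable without extra work,
// e.g. from relations stored in sorted form and order-preserving operators.
void collectOrders(Node& n) {
    for (auto& c : n.in) collectOrders(*c);
    if (n.op == "scan") n.freeOrders = { /* orders of the stored relation */ };
    else if (!n.in.empty()) n.freeOrders = n.in[0]->freeOrders; // order-preserving case
}

// Pass 2 (top-down): insert a sort only if no free order admits an
// efficient implementation of the operator at hand.
void placeSorts(Node& n, const std::string& required) {
    if (!required.empty() && n.freeOrders.count(required) == 0)
        n.op = "sort[" + required + "] -> " + n.op;  // stands in for a sort insertion
    for (auto& c : n.in) placeSorts(*c, "" /* order required from the child */);
}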
Both phases of interest to us, operator graph transformation and operator
construction, are specified by rules. In neither phase, however, is a search
process necessary, since the rules enumerate all cases in a very targeted
manner and thus describe a unique decision tree. An even more minute case
analysis for the generation of execution plans in the operator construction
phase can only be found in Yao [944]; there it has the additional advantage of
being backed by cost calculations.
Since the rules encode the heuristics of their application in their premises,
and since no separate search function for applying the rules exists,
extensibility is very difficult. The absence of any cost assessment makes an
evaluation of the alternatives impossible. It is therefore also hard to assess
the individual components of the optimizer, namely the rules, especially as
the transformation approach was chosen. The demands for predictability and for
graceful degradation are likewise not addressed by this approach.
[Figure 30.8: phases. Parsing → query transformation → plan optimization → plan refinement → evaluation plan.]
(F). Nodes marked with F contribute to the production of the result of an
operator; the quantifier markings contribute to its restriction. The edges are
labeled with the predicates; predicates concerning only one relation thus show
up as self-loops. Further operators are insert, update, intersection, union,
and group-by. In addition, the QGM representation of a query is enriched with
schema information and statistical data; it thus also serves as a collecting
pool for all information concerning the query.
The QGM representation serves as the starting point of the query
transformation (Fig. 30.8). The query transformation generates, for a given
QGM representation, several equivalent QGM representations. Apart from the
differences in representation between QGM and Hydrogen, the query
transformation can be regarded as a variant of source-level transformations.
It is implemented in a rule-based manner, with C as the rule language. A rule
consists of two parts, a condition and an action; each part is described by a
C procedure. This obviates the implementation of a general rule interpreter
with pattern matching. Rules can be grouped. The current optimizer comprises
three classes of rules:
Three different search strategies are available for the execution of the
rules:
1. sequential,
2. priority-controlled, and
[Figure: optimization phases. Declarative query → normalized calculus expression → object algebra expression → type-consistent expression → optimized algebra expression → alternative execution plans.]
• nested loop join, nested loop outer join, index nested loop joins, sort merge
join, sort merge outer join, hash joins, hash outer join, cartesian join, full
outer join, cluster join, anti-joins, semi-joins, uses bitmap indexes for star
queries
• sort group-by,
• index-organized tables
• function-based indexes
• access path: table scan, fast full index scan, index scan, ROWID scans
(access rows by ROWID), cluster scans, hash scans [the former two with
prefetching]. index scans:
• eliminate between
• eliminate x in (c1 . . . cn) (also uses an IN-LIST iterator as outer table
constructor in a d-join or nested-loop-join-like operation)
query transformer:
• view merging
• predicate pushing
• subquery unnesting
remaining subplans for nested query blocks are ordered in an efficient manner
plan generator:
• single row joins are placed first (based on unique and key constraints).
• join statement with outer join: the table with the outer join operator must
come after the other table in the condition in the join order. The optimizer
does not consider join orders that violate this rule.
• histograms
• push-join predicate
• subquery unnesting
• index joins
rest:
parameters:
statistics:
• table statistics
number of rows, number of blocks, average row length
• column statistics
number of distinct values, number of nulls, data distribution
• index statistics
number of keys, (from column statistics?) number of leaf blocks, levels,
clustering factor (collocation amount of the index block/data blocks, 3-17)
• system statistics
I/O performance and utilization, cpu performance and utilization
generating statistics:
• exact computation
histograms:
• value-based histograms
used for number of distinct values ≤ number of buckets
• index-organized tables
• convert b-tree result RID lists to bitmaps for further bitmap anding
• hash clusters
• continue 5-35
Selected Topics
Chapter 31
Generating Plans for Top-N-Queries?
• disable prefetching
Chapter 32
Recursive Queries
Chapter 33
Issues Introduced by OQL
[Figure 33.1: an algebraic expression over SCAN [s:student].]
The algebraic expression in Fig. 33.1 implies a scan of all students and a
subsequent dereferencing of the supervisor attribute in order to access the
supervisors. If not all supervisors fit into main memory, this may result in
many page accesses. Further, if there exists an index on the supervisor's age
and the selection condition ssa < 30 is highly selective, the index should be
applied in order to retrieve only those supervisors required for answering the
query. Type-based rewriting enables this kind of optimization. For any
expression of a certain type with an associated extent, the extent is
introduced in the from clause. For our query this results in
select distinct p
from p in Professor
where p.room.number = 209
Straightforward evaluation of this query would scan all professors. For every
professor, the room relationship would be traversed to find the room where the
professor resides. Last, the room's number would be retrieved and tested to be
209. Using the inverse relationship, the query could as well be rewritten to
[Figure: join tree. JOIN [ss=p] over SELECT [sg>8] and SELECT [pa<30].]
The evaluation of this query can be much more efficient, especially if there
exists an index on the room number. Rewriting queries by exploiting inverse
relationships is another rewrite technique to be applied during Rewrite Phase
I.
[Figure: class hierarchy with Manager, attribute boss: CEO, and CEO.]
sophisticated possibilities to realize extents and scans over them are needed.
The different possible implementations can be classified along two dimensions.
The first dimension distinguishes between logical and physical extents, the
second between strict and (non-strict) extents.
Obviously, the two classifications are orthogonal. Applying them both results
in the four possibilities presented graphically in Fig. 33.4. [191] strongly
argues that strict extents are the method of choice. The reason is that only
in this way can the query optimizer exploit differences between extents. For
example, there might be an index on the age of Manager but not for Employee.
This difference can only be exploited for a query including a restriction on
age if we have strict extents.
However, strict extents result in initial query plans including UNION oper-
ators. Consider the query
select e
from e in Employee
where e.salary > 100.000
[Figure 33.4: the four ways to realize extents, logical vs. physical crossed with excluding vs. including.]
e1 ∪ e2 ≡ e2 ∪ e1 (33.1)
e1 ∪ (e2 ∪ e3 ) ≡ (e1 ∪ e2 ) ∪ e3 (33.2)
σp (e1 ∪ e2 ) ≡ σp (e1 ) ∪ σp (e2 ) (33.3)
χa:e (e1 ∪ e2 ) ≡ χa:e (e1 ) ∪ χa:e (e2 ) (33.4)
(e1 ∪ e2 ) 1p e3 ≡ (e1 1p e3 ) ∪ (e2 1p e3 ) (33.5)
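For the example query above, with hypothetical strict extents EmployeeOnly, ManagerOnly, and CEOOnly, equivalence (33.3), applied twice, pushes the selection into each branch:

σsalary>100.000 (EmployeeOnly ∪ ManagerOnly ∪ CEOOnly) ≡ σsalary>100.000 (EmployeeOnly) ∪ σsalary>100.000 (ManagerOnly) ∪ σsalary>100.000 (CEOOnly)

so that each strict extent can be accessed by its own access path, for example an index on salary, if one exists.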
34.9 Bibliography
Chapter 35
Issues Introduced by XQuery
35.5 Bibliography
[588] [870] [229]
Numbering: [276] Timber [452] TAX Algebra [455], physical algebra of Tim-
ber [661]
Structural Joins [20, 816]
SAL: [69], TAX: [455], XAL: [289]
• StatiX: [290]
Chapter 36
Outlook
What we did not talk about: multiple query optimization, semantic query
optimization, special techniques for optimization in OBMSs, multimedia
databases, object-relational databases, spatial databases, temporal databases,
and query optimization for parallel and distributed database systems.
Recursive Queries?
Appendix A
Query Languages?
A.2 SQL
A.3 OQL
A.4 XPath
A.5 XQuery
A.6 Datalog
Appendix B
Query Execution Engine (?)
• Hash-Teams
Appendix C
Glossary of Rewrite and Optimization Techniques
pareval If one term of a conjunction evaluates to false, the remaining terms
are not evaluated. This falls out automatically if the conjuncts are
executed as a sequence of selections.
pushnot If a predicate has the form ¬(p1 ∧ p2 ), pareval is not applicable.
Therefore, negations are pushed inside; on ¬p1 ∨ ¬p2 , pareval is
applicable again. Pushing negations inside is also indispensable for
correctness in the context of NULL values. This is an optimization
technique that is often performed already at the source level.
projpush The technique for treating projections is not quite as simple as
that for selections. One has to distinguish whether the projection at
hand eliminates duplicates or not. Depending on
grouppush Pushing a grouping operation past a join can lead to better plans.
crossjoin A cross product followed by a selection is converted into a join
operation whenever possible. This optimization technique restricts the
search space, since plans with cross products are avoided.
joinpush Tables that are guaranteed to produce a single tuple are always
pushed to be joined first. This reduces the search space. The single-tuple
condition can be evaluated by determining whether all key attributes of
a relation are fully qualified [318, 319].
unnest Unnesting of queries [185, 187, 310, 486, 492, 493, 676, 819,
821, 822].
pma Predicate move-around moves predicates between queries and subqueries.
Mostly they are duplicated in order to yield as many restrictions in a block
as possible [542]. As a special case, predicates will be pushed into view
definitions if they have to be materialized temporarily [318, 319].
exproj For subqueries with exists, prune unnecessary entries in the select
clause. The intention behind this is that unnecessarily projected attributes
might influence the optimizer's decision on the optimal access path [318, 319].
vm View merging expands the view definition within the query such that it
can be optimized together with the query. Thereby, duplicate accesses to
the view are resolved by different copies of the view's definition in order
to facilitate unnesting [318, 319, 676].
like1 If the like predicate does not start with %, then a prefix index can be
used.
like2 The pattern is analyzed to see whether a range of values can be extracted
such that the pattern does not have to be evaluated on all tuples. The
result is either a pretest or an index access. [318, 319].
tmplay Temporarily changing the layout of an object can be worthwhile if the
costs incurred by this change are more than compensated by the gain from
the repeated use of this layout. A typical example is pointer swizzling.
AggrJoin Joins with non-equi join predicates based on ≤ or < can be processed
more efficiently than by a cross product with a subsequent selection
[189].
rid/tidsort When several tuples qualify during an index scan, the resulting
TIDs can be sorted in order to guarantee sequential access to the base
relation.
multIDXsplit If two ranges are queried within the same query ([1-10],[20-30])
consider multIDX or use a single scan through the index [1-30] with an
additional qualification predicate.
lock The optimizer should choose the correct locks to set on tables. For
example, if a whole table is scanned, a table lock should be set.
stop Stop evaluation after the first tuple qualifies. This is good for existential
subqueries, universal subqueries (disqualify), semi-joins for distinct results
and the like.
One can also sort on a, b, c: this does no harm, but it simplifies duplicate
elimination; only a single sort is necessary.
XXX - use keys, inclusion dependencies, fds, etc. (all user-specified and
derived) (propagate keys over joins as fds), (for a function call: the derived
IU is functionally dependent on the arguments of the function call if the
function is deterministic) (keys can be represented as sets of IUs or as
bitvectors (given a numbering of the IUs)) (numbering imprecise: bitvectors
can be used as filters (like for signatures))
Appendix D
Useful Formulas
The following identities can be found in the book by Graham, Knuth, and
Patashnik [356].
\binom{n}{k} = \begin{cases} \dfrac{n!}{k!\,(n-k)!} & \text{if } 0 \le k \le n \\ 0 & \text{else} \end{cases}   (D.1)

\binom{n}{k} = \binom{n}{n-k}   (D.2)

\binom{n}{k} = \dfrac{n}{k} \binom{n-1}{k-1}   (D.3)

k \binom{n}{k} = n \binom{n-1}{k-1}   (D.4)

(n-k) \binom{n}{k} = n \binom{n-1}{k}   (D.5)

(n-k) \binom{n}{k} = n \binom{n-1}{n-k-1}   (D.6)

\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}   (D.7)

\binom{r}{m} \binom{m}{k} = \binom{r}{k} \binom{r-k}{m-k}   (D.8)

Last,

\sum_{k=0}^{n} k \binom{n}{k} = n \, 2^{n-1}
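The last identity follows, for instance, from the absorption identity (D.4) together with the binomial theorem:

\sum_{k=0}^{n} k \binom{n}{k} = n \sum_{k=1}^{n} \binom{n-1}{k-1} = n \sum_{j=0}^{n-1} \binom{n-1}{j} = n \, 2^{n-1}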
Bibliography
[4] S. Abiteboul and N. Bidoit. Non first normal form relations: An alge-
bra allowing restructuring. Journal of Computer Science and Systems,
33(3):361, 1986.
[8] A. Aboulnaga and J. Naughton. Building XML statistics for the hidden
web. In Int. Conference on Information and Knowledge Management
(CIKM), pages 358–365, 2003.
[12] F. Afrati, C. Li, and J. Ullman. Generating efficient plans for queries
using views. In Proc. of the ACM SIGMOD Conf. on Management of
Data, pages 319–330, 2001.
[19] A.V. Aho, Y. Sagiv, and J.D. Ullman. Equivalence among relational
expressions. SIAM Journal on Computing, 8(2):218–246, 1979.
[21] J. Albert. Algebraic properties of bag data types. In Proc. Int. Conf. on
Very Large Data Bases (VLDB), pages 211–219, 1991.
[23] N. Alon, P. Gibbons, Y. Matias, and M. Szegedy. Tracking join and self-
join sizes in limited storage. J. Comput System Sciences, 35(4):391–432,
2002.
[26] P. Alsberg. Space and time savings through large database compression
and dynamic restructuring. Proceedings of the IEEE, 63(8), Aug. 1975.
[33] G. Antoshenkov. Query processing in DEC Rdb: Major issues and future
challenges. IEEE Data Engineering Bulletin, 16:42–52, Dec. 1993.
[36] P.M.G. Apers, A.R. Hevner, and S.B. Yao. Optimization algorithms for
distributed queries. IEEE Trans. on Software Eng., 9(1):57–68, 1983.
[37] P.M.G. Apers, A.R. Hevner, and S.B. Yao. Optimization algorithms for
distributed queries. IEEE Trans. on Software Eng., 9(1):57–68, 1983.
[41] M.M. Astrahan, M.W. Blasgen, D.D. Chamberlin, K.P. Eswaran, J.N.
Gray, P.P. Griffiths, W.F. King, R.A. Lorie, P.R. Mc Jones, J.W. Mehl,
G.R. Putzolu, I.L. Traiger, B.W. Wade, and V. Watson. System R:
relational approach to database management. ACM Transactions on
Database Systems, 1(2):97–137, June 1976.
[69] C. Beeri and Y. Tzaban. SAL: An algebra for semistructured data and
XML. In ACM SIGMOD Workshop on the Web and Databases (WebDB),
1999.
[75] J. L. Bentley and A. C.-C. Yao. An almost optimal algorithm for un-
bounded searching. Inf. Proc. Lett., 5(3):82–87, 1976.
[85] G. Bhargava, P. Goel, and B. Iyer. No regression algorithm for the enu-
meration of projections in SQL queries with joins and outer joins. In IBM
Centre for Advanced Studies Conference (CASCOM), 1995.
[86] G. Bhargava, P. Goel, and B. Iyer. Efficient processing of outer joins and
aggregate functions. In Proc. IEEE Conference on Data Engineering,
pages 441–449, 1996.
[87] A. Biliris. An efficient database storage structure for large dynamic ob-
jects. In Proc. IEEE Conference on Data Engineering, pages 301–308,
1992.
[88] D. Bitton and D. DeWitt. Duplicate record elimination in large data files.
ACM Trans. on Database Systems, 8(2):255–265, 1983.
[90] J. Blakeley and N. Martin. Join index, materialized view, and hybrid
hash-join: a performance analysis. In Proc. IEEE Conference on Data
Engineering, pages 256–236, 1990.
[95] T. Böhme and E. Rahm. Xmach-1: A benchmark for XML data manage-
ment. In BTW, pages 264–273, 2001.
[96] A. Bolour. Optimal retrieval for small range queries. SIAM J. of Comput.,
10(4):721–741, 1981.
[101] M. Brantner, S. Helmer, C-C. Kanne, and G. Moerkotte. Index vs. navi-
gation in XPath evaluation. In Int. XML Database Symp. (XSym), pages
16–30, 2006.
[106] S. Bressan, M. Lee, Y. Li, Z. Lacroix, and U. Nambiar. The XOO7 XML
Management System Benchmark. Technical Report TR21/00, National
University of Singapore, 2001.
[113] F. Buccafurri and G. Lax. Fast range query estimation by n-level tree
histograms. Data & Knowledge Engineering, 51:257–275, 2004.
[116] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Keys for XML.
In WWW Conference, pages 201–210, 2001.
[132] S. Ceri and G. Gottlob. Translating SQL into relational algebra: Op-
timization, semantics and equivalence of SQL queries. IEEE Trans. on
Software Eng., 11(4):324–345, Apr 1985.
[159] P. Cheeseman, B. Kanefsky, and W. Taylor. Where the really hard prob-
lems are. In Int. Joint Conf. on Artificial Intelligence (IJCAI), pages
331–337, 1991.
[162] Y. Chen and K. Yi. Two-level sampling for join size estimation. In Proc.
of the ACM SIGMOD Conf. on Management of Data, pages 759–774,
2017.
[169] M. Cherniack and S. Zdonik. Rule languages and internal algebras for
rule-based optimizers. In Proc. of the ACM SIGMOD Conf. on Manage-
ment of Data, pages 401–412, 1996.
[170] T.-Y. Cheung. Estimating block accesses and number of records in file
management. Communications of the ACM, 25(7):484–487, 1982.
[185] S. Cluet and G. Moerkotte. Nested queries in object bases. In Proc. Int.
Workshop on Database Programming Languages, pages 226–242, 1993.
[192] E. Codd. A relational model of data for large shared data banks. Com-
munications of the ACM, 13(6):377–387, 1970.
[200] L. Colby. A recursive algebra and query optimization for nested relational
algebra. In Proc. of the ACM SIGMOD Conf. on Management of Data,
pages 273–283, 1989.
[207] D. Cornell and P. Yu. Integration of buffer management and query op-
timization in relational database environments. In Proc. Int. Conf. on
Very Large Data Bases (VLDB), pages 247–255, 1989.
[209] K. Culik, T. Ottmann, and D. Wood. Dense multiway trees. ACM Trans.
on Database Systems, 6(3):486–512, 1981.
[213] D. Das and D. Batory. Praire: A rule specification framework for query
optimizers. In Proc. IEEE Conference on Data Engineering, pages 201–
210, 1995.
[214] C. J. Date. The outer join. In Proc. of the Int. Conf. on Databases,
Cambridge, England, 1983.
[221] D. DeHaan, P.-A. Larson, and J. Zhou. Stacked index views in Microsoft
SQL server. In Proc. of the ACM SIGMOD Conf. on Management of
Data, pages 179–190, 2005.
[222] K. Delaney. Inside Microsoft SQL Server 2005: Query Tuning and Opti-
mization. Microsoft Press, 2008.
[225] N. Derrett and M.-C. Shan. Rule-based query optimization in IRIS. Tech-
nical report, Hewlard-Packard Laboratories, 1501 Page Mill Road, Palo
Alto, CA94303, 1990.
[238] P. Dietz. Optimal algorithms for list indexing and subset ranking. In
Workshop on Algorithms and Data Structures (LNCS 382), pages 39–46,
1989.
[278] T. Fiebig and G. Moerkotte. Algebraic XML construction and its opti-
mization in Natix. World Wide Web Journal, 4(3):167–187, 2002.
[283] P. Flajolet and G. Martin. Probabilistic counting algorithms for data base
applications. Rapports de Recherche 313, INRIA Rocquencourt, 1984.
[284] P. Flajolet and G. Martin. Probabilistic counting algorithms for data base
applications. J. Comput. Syst. Sci., 31(2):182–209, 1985.
[289] F. Frasincar, G.-J. Houben, and C. Pau. XAL: An algebra for XML query
optimization. In Australasian Database Conference (ADC), 2002.
[330] G. Golub and C. van Loan. Matrix Computations. The John Hopkins
University Press, 1996. Third Edition.
[335] M. Gouda and U. Dayal. Optimal semijoin schedules for query processing
in local distributed database systems. In Proc. of the ACM SIGMOD
Conf. on Management of Data, pages 164–175, 1981.
[336] P. Goyal. Coding methods for text string search on compressed databases.
Information Systems, 8(3):231–233, 1983.
[340] G. Graefe. Heap-filter merge join: A new algorithm for joining medium-
size inputs. IEEE Trans. on Software Eng., 17(9):979–982, 1991.
[341] G. Graefe. Query evaluation techniques for large databases. ACM Com-
puting Surveys, 25(2), June 1993.
[344] G. Graefe. The cascades framework for query optimization. IEEE Data
Engineering Bulletin, 18(3):19–29, Sept 1995.
[346] G. Graefe, R. Bunker, and S. Cooper. Hash joins and hash teams in
Microsoft SQL Server. In Proc. Int. Conf. on Very Large Data Bases
(VLDB), pages 86–97, 1998.
[348] G. Graefe and R. Cole. Dynamic query evaluation plans. In Proc. of the
ACM SIGMOD Conf. on Management of Data, pages ?–?, 1994.
[352] G. Graefe, A. Linville, and L. Shapiro. Sort versus hash revisited. IEEE
Trans. on Knowledge and Data Eng., 6(6):934–944, Dec. 1994.
[355] G. Graefe and K. Ward. Dynamic query evaluation plans. In Proc. of the
ACM SIGMOD Conf. on Management of Data, pages 358–366, 1989.
[358] G. Grahne and A. Thomo. New rewritings and optimizations for regular
path queries. In Proc. Int. Conf. on Database Theory (ICDT), pages
242–258, 2003.
[361] J. Gray and G. Graefe. The five-minute rule ten years later, and other
computer storage rules of thumb. ACM SIGMOD Record, 26(4):63–68,
1997.
[362] J. Gray and F. Putzolu. The 5 minute rule for trading memory for disk
accesses and the 10 byte rule for trading memory for CPU time. In Proc.
of the ACM SIGMOD Conf. on Management of Data, pages 395–398,
1987.
[364] T. Grust. Accelerating XPath location steps. In Proc. of the ACM SIG-
MOD Conf. on Management of Data, pages 109–120, 2002.
[365] T. Grust and M. Van Keulen. Tree awareness for relational database
kernels: Staircase join. In Intelligent Search on XML Data, pages 231–
245, 2003.
[366] T. Grust, M. Van Keulen, and J. Teubner. Staircase join: Teach a rela-
tional dbms to watch its (axis) steps. In Proc. Int. Conf. on Very Large
Data Bases (VLDB), pages 524–525, 2003.
[378] R. Güting, R. Zicari, and D. Choy. An algebra for structured office doc-
uments. ACM Trans. on Information Systems, 7(4):123–157, 1989.
[384] A. Halevy. Answering queries using views: A survey. The VLDB Journal,
10(4):270–294, Dec. 2001.
[394] E.N. Hanson, M. Chaabouni, C.-H. Kim, and Y.-W. Wang. A predi-
cate matching algorithm for database rule systems. In Proc. of the ACM
SIGMOD Conf. on Management of Data, pages 271–?, 1990.
[416] S. Helmer and G. Moerkotte. A study of four index structures for set-
valued attributes of low cardinality. Technical Report 02/99, University
of Mannheim, 1999.
[417] S. Helmer and G. Moerkotte. Compiling away set containment and inter-
section joins. Technical Report 4, University of Mannheim, 2002.
[419] S. Helmer, T. Neumann, and G. Moerkotte. Early grouping gets the skew.
Technical Report 9, University of Mannheim, 2002.
[424] T. Hogg and C. Williams. Solving the really hard problems with coopera-
tive search. In Proc. National Conference on Artificial Intelligence, pages
231–236, 1993.
[430] H.-Y. Hwang and Y.-T. Yu. An analytical method for estimating and
interpreting query time. In Proc. Int. Conf. on Very Large Data Bases
(VLDB), pages 347–358, 1987.
[456] C. Janssen. The visual profiler. perform internet search for this or similar
tools.
[459] M. Jarke and J.Koch. Range nesting: A fast method to evaluate quantified
queries. In Proc. of the ACM SIGMOD Conf. on Management of Data,
pages 196–206, 1983.
[487] J. J. King. Exploring the use of domain knowledge for query processing
efficiency. Technical Report STAN-CS-79-781, Computer Science Depart-
ment, Stanford University, 1979.
[491] M. Klettke, L. Schneider, and A. Heuer. Metrics for XML Document Col-
lections. In EDBT Workshop XML-Based Data Management (XMLDM),
pages 15–28, 2002.
[493] A. Klug. Access paths in the “ABE” statistical query facility. In Proc. of
the ACM SIGMOD Conf. on Management of Data, pages 161–173, 1982.
[499] J. Kollias. An estimate for seek time for batched searching of random or
index sequential structured files. The Computer Journal, 21(2):132–133,
1978.
[506] D. Kossmann. The state of the art in distributed query processing. ACM
Computing Surveys, 32(4):422–469, 2000.
[514] I. Kunen and D. Suciu. A scalable algorithm for query minimization. ask
Dan for more information, year.
[515] S. Kwan and H. Strong. Index path length evaluation for the research
storage system of system r. Technical Report RJ2736, IBM Research
Laboratory, San Jose, 1980.
[519] S.-D. Lang, J. Driscoll, and J. Jou. A unified analysis of batched searching
of sequential and tree-structured files. ACM Trans. on Database Systems,
14(4):604–618, 1989.
[527] P.-Å. Larson and H. Yang. Computing queries from derived relations.
In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 259–269,
1985.
[528] Y.-N. Law, H. Wang, and C. Zaniolo. Query languages and data models
for database sequences and data streams. In VLDB, pages 492–503, 2004.
[530] C. Lee, C.-S. Shih, and Y.-H. Chen. Optimizing large join queries using
a graph-based approach. IEEE Trans. on Knowledge and Data Eng.,
13(2):298–315, 2001.
[533] T. Lehman and B. Lindsay. The Starburst long field manager. In Proc.
Int. Conf. on Very Large Data Bases (VLDB), pages 375–383, 1989.
[535] A. Lerner and D. Shasha. AQuery: query language for ordered data,
optimization techniques, and experiments. In Proc. Int. Conf. on Very
Large Data Bases (VLDB), pages 345–356, 2003.
[538] M. Levene and G. Loizou. A fully precise null extended nested relational
algebra. Fundamenta Informaticae, 19(3/4):303–342, 1993.
[543] A.Y. Levy and I.S. Mumick. Reasoning with aggregation constraints. In
P. Apers, M. Bouzeghoub, and G. Gardarin, editors, Proc. European Conf.
on Extending Database Technology (EDBT), Lecture Notes in Computer
Science, pages 514–534. Springer, March 1996.
[545] C. Li, K. Chang, I. Ilyas, and S. Song. RankSQL: Query algebra and
optimization for relational top-k queries. In Proc. of the ACM SIGMOD
Conf. on Management of Data, pages 131–142, 2005.
[553] J. W. S. Liu. Algorithms for parsing search queries in systems with in-
verted file organization. ACM Trans. on Database Systems, 1(4):299–316,
1976.
[558] D. Lomet. B-tree page size when caching is considered. ACM SIGMOD
Record, 27(3):28–32, 1998.
[560] H. Lu and K.-L. Tan. On sort-merge algorithms for band joins. IEEE
Trans. on Knowledge and Data Eng., 7(3):508–510, Jun 1995.
[578] R. Marek and E. Rahm. TID hash joins. In Int. Conference on Informa-
tion and Knowledge Management (CIKM), pages 42–49, 1994.
[584] N. May, S. Helmer, and G. Moerkotte. Three Cases for Query Decorre-
lation in XQuery. In Int. XML Database Symp. (XSym), pages 70–84,
2003.
[588] J. McHugh and J. Widom. Query optimization for XML. In Proc. Int.
Conf. on Very Large Data Bases (VLDB), pages 315–326, 1999.
[596] T. Milo and D. Suciu. Index structures for path expressions. In Proc. Int.
Conf. on Database Theory (ICDT), pages 277–295, 1999.
[607] G. Moerkotte and T. Neumann. Analysis of two existing and one new
dynamic programming algorithm for the generation of optimal bushy trees
without cross products. In Proc. Int. Conf. on Very Large Data Bases
(VLDB), pages 930–941, 2006.
[614] C. Mohan, D. Haderle, Y. Wang, and J. Cheng. Single table access using
multiple indexes: Optimization, execution, and concurrency control tech-
niques. In Int. Conf. on Extended Database Technology (EDBT), pages
29–43, 1990.
[628] T. Neumann and C. Galindo-Legaria. Taking the edge off cardinality es-
timation errors using incremental execution. In Proc. der GI-Fachtagung
Datenbanksysteme für Büro, Technik und Wissenschaft (BTW), pages
73–92, 2013.
[638] P. O’Neil and D. Quass. Improved query performance with variant index-
es. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages
38–49, 1997.
[648] G. Ozsoyoglu and H. Wang. A relational calculus with set operators, its
safety and equivalent graphical languages. IEEE Trans. on Software Eng.,
SE-15(9):1038–1052, 1989.
[649] T. Özsu and J. Blakeley. W. Kim (ed.): Modern Database Systems, chap-
ter Query Processing in Object-Oriented Database Systems, pages 146–
174. Addison Wesley, 1995.
[658] V. Papadimos and D. Maier. Mutant query plans. Information & Software
Technology, 44(4):197–206, 2002.
[663] C.-S. Park, M. Kim, and Y.-J. Lee. Rewriting OLAP queries using materi-
alized views and dimension hierarchies in data. In Proc. IEEE Conference
on Data Engineering, pages 515–523, 2001.
[664] J. Patel, M. Carey, and M. Vernon. Accurate modeling of the hybrid hash
join algorithm. In Proc. ACM SIGMETRICS Conf. on Measurement and
Modeling of Computer Systems, pages 56–66, 1994.
[677] H. Pirahesh, T. Leung, and W. Hassan. A rule engine for query transfor-
mation in Starburst and IBM DB2 C/S DBMS. In Proc. IEEE Conference
on Data Engineering, pages 391–400, 1997.
[683] N. Polyzotis and M. Garofalakis. Structure and value synopsis for XML
data graphs. In Proc. Int. Conf. on Very Large Data Bases (VLDB),
pages 466–477, 2002.
[692] Y.-J. Qyang. A tight upper bound for the lumped disk seek time for the
SCAN disk scheduling policy. Information Processing Letters, 54:355–358,
1995.
[694] A. Rajaraman, Y. Sagiv, and J.D. Ullman. Answering queries using tem-
plates with binding patterns. In Proc. ACM SIGMOD/SIGACT Conf.
on Princ. of Database Syst. (PODS), PODS, 1995.
[695] R. Rantzau, L. D. Shapiro, B. Mitschang, and Q. Wang. Algorithms and
applications for universal quantification in relational databases.
Information Systems, 28(1-2):3–32, 2003.
[705] J. Rao and K. Ross. Reusing invariants: A new strategy for correlated
queries. In Proc. of the ACM SIGMOD Conf. on Management of Data,
pages 37–48, Seattle, WA, 1998.
[706] S. Rao, A. Badia, and D. Van Gucht. Providing better support for a
class of decision support queries. In Proc. of the ACM SIGMOD Conf.
on Management of Data, pages 217–227, 1996.
[710] D. Reiner and A. Rosenthal. Strategy spaces and abstract target machines
for query optimization. Database Engineering, 5(3):56–60, Sept. 1982.
[720] D.J. Rosenkrantz and M.B. Hunt. Processing conjunctive predicates and
queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages
64–74, 1980.
[739] G. Sacco. Index access with a finite buffer. In Proc. Int. Conf. on Very
Large Data Bases (VLDB), pages 301–309, 1987.
[740] G. Sacco and M. Schkolnick. A technique for managing the buffer pool in
a relational system using the hot set model. In Proc. Int. Conf. on Very
Large Data Bases (VLDB), pages 257–262, 1982.
[750] S. Savage. The Flaw of Average. John Wiley & Sons, 2009.
[753] H.-J. Schek and M. Scholl. The relational model with relation-valued
attributes. Information Systems, 11(2):137–147, 1986.
[790] G. M. Shaw and S.B. Zdonik. A query algebra for object-oriented databas-
es. Tech. report no. cs-89-19, Department of Computer Science, Brown
University, 1989.
[791] G.M. Shaw and S.B. Zdonik. An object-oriented query algebra. In 2nd Int.
Workshop on Database Programming Languages, pages 111–119, 1989.
[792] G.M. Shaw and S.B. Zdonik. A query algebra for object-oriented databas-
es. In Proc. IEEE Conference on Data Engineering, pages 154–162, 1990.
[794] E. Shekita, K.-L. Tan, and H. Young. Multi-join optimization for sym-
metric multiprocessors. In Proc. Int. Conf. on Very Large Data Bases
(VLDB), pages 479–492, 1993.
[795] E. Shekita, H. Young, and K.-L. Tan. Multi-join optimization for sym-
metric multiprocessors. In Proc. Int. Conf. on Very Large Data Bases
(VLDB), pages 479–492, 1993.
[796] P. Shenoy and H. Cello. A disk scheduling framework for next generation
operating systems. In Proc. ACM SIGMETRICS Conf. on Measurement
and Modeling of Computer Systems, pages 44–55, 1998.
[802] A. Shrufi and T. Topaloglou. Query processing for knowledge bases using
join indices. In Int. Conference on Information and Knowledge Manage-
ment (CIKM), 1995.
[804] M. Siegel, E. Sciore, and S. Salveter. A method for automatic rule deriva-
tion to support semantic query optimization. ACM Trans. on Database
Systems, 17(4):563–600, 1992.
[813] R. Sosic, J. Gu, and R. Johnson. The Unison algorithm: Fast evalua-
tion of boolean expressions. ACM Transactions on Design Automation of
Electronic Systems (TODAES), 1:456 – 477, 1996.
[831] M. Stonebraker et al. QUEL as a data type. In Proc. of the ACM SIG-
MOD Conf. on Management of Data, Boston, MA, June 1984.
[834] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The design and imple-
mentation of INGRES. ACM Trans. on Database Systems, 1(3):189–222,
1976.
[835] D. Straube and T. Özsu. Access plan generation for an object algebra.
Technical Report TR 90-20, Department of Computing Science, Univer-
sity of Alberta, June 1990.
[841] D. Suciu. Query decomposition and view maintenance for query languages
for unconstrained data. In Proc. Int. Conf. on Very Large Data Bases
(VLDB), pages 227–238, 1996.
[848] A. Swami and B. Iyer. A polynomial time algorithm for optimizing join
queries. Technical Report RJ 8812, IBM Almaden Research Center, 1992.
[849] A. Swami and B. Iyer. A polynomial time algorithm for optimizing join
queries. In Proc. IEEE Conference on Data Engineering, pages 345–354,
1993.
[850] A. Swami and B. Schiefer. Estimating page fetches for index scans with
finite LRU buffers. In Proc. of the ACM SIGMOD Conf. on Management
of Data, pages 173–184, 1994.
[851] A. Swami and B. Schiefer. On the estimation of join result sizes. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 287–300, 1994.
[854] K.-L. Tan and H. Lu. A note on the strategy space of multiway join query
optimization problem in parallel systems. SIGMOD Record, 20(4):81–82,
1991.
[860] J. Teubner, T. Grust, and M. Van Keulen. Bridging the gap between
relational and native XML storage with staircase join. Grundlagen von
Datenbanken, pages 85–89, 2003.
[873] J.D. Ullman. Principles of Database and Knowledge-Base Systems, volume 1. Computer Science Press, 1989.
[874] J.D. Ullman. Principles of Database and Knowledge-Base Systems, volume 2. Computer Science Press, 1989.
[875] J.D. Ullman. Principles of Database and Knowledge-Base Systems. Computer Science Press, 1989.
[876] D. Straube and T. Özsu. Query transformation rules for an object algebra. Technical Report TR 89-23, Department of Computing Science, University of Alberta, Sept. 1989.
[877] T. Urhan, M. Franklin, and L. Amsaleg. Cost based query scrambling for
initial delays. In Proc. of the ACM SIGMOD Conf. on Management of
Data, pages 130–141, 1998.
[885] B. Vance and D. Maier. Rapid bushy join-order optimization with Cartesian products. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 35–46, 1996.
[892] F. Waas and A. Pellenkoft. Join order selection - good enough is easy. In
BNCOD, pages 51–67, 2000.
[896] W. Wang, H. Jiang, H. Lu, and J. Yu. Containment join size estimation: Models and methods. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 145–156, 2003.
[897] W. Wang, H. Jiang, H. Lu, and J. Yu. Bloom histogram: Path selectivity estimation for XML data with updates. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 240–251, 2004.
[899] S. Waters. File design fallacies. The Computer Journal, 15(1):1–4, 1972.
[913] N. Wilhelm. A general model for the performance of disk systems. Journal
of the ACM, 24(1):14–31, 1977.
[915] C. Williams and T. Hogg. Using deep structure to locate hard problems.
In Proc. National Conference on Artificial Intelligence, pages 472–477,
1992.
[934] W. Yan and P.-A. Larson. Eager aggregation and lazy aggregation. In
Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 345–357, 1995.
[935] H. Yang and P.-A. Larson. Query transformation for PSJ-queries. In Proc.
Int. Conf. on Very Large Data Bases (VLDB), pages 245–254, 1987.
[939] B. Yao and T. Özsu. XBench – A Family of Benchmarks for XML DBMSs.
Technical Report CS-2002-39, University of Waterloo, 2002.
[942] S. B. Yao. An attribute based model for database access cost analysis.
ACM Trans. on Database Systems, 2(1):45–67, 1977.
[945] S.B. Yao, A.R. Hevner, and H. Young-Myers. Analysis of database system architectures using benchmarks. IEEE Trans. on Software Eng., SE-13(6):709–725, 1987.
[946] J. Yiannis and J. Zobel. Compression techniques for fast external sorting.
VLDB Journal, 16(2):269–291, 2007.
[947] Y. Yoo and S. Lafortune. An intelligent search method for query optimization by semijoins. IEEE Trans. on Knowledge and Data Eng., 1(2):226–237, June 1989.
ToDo
• [895]
• [98]
• Bypass Plans
• magic sets and semi-join reducers [78, 80, 79, 171, 335, 622, 620, 621,
780, 825, 947]
• join indexes and clustering tuples of different relations with 1:n relationships [226, 395, 879, 880, 802]
• Prefetching [902]
• compression [26, 54, 163, 203, 255, 254, 329, 336] [634, 707, 732, 783, 784,
844, 907, 919]
• semantic query optimization (SQO): [1, 81, 168, 326, 359, 487, 488, 509, 517] [635, 643,
644, 668, 797, 804, 806, 932] [543]
• join+buffer: [916]
• benchmark(ing): Gray Book: [360]; papers: [95, 106, 654, 738, 763, 945,
939, 940]
• BXP (Boolean expressions): [105, 256, 306, 369, 391, 431, 483, 681, 713, 803, 810, 813, 482];
BXP complexity: [76]; BXP variable influence: [467]
• joins: [688]
• fragmentation: [742]
• indexing+caching: [774]
• distributed databases: [36, 37, 94, 212, 937]; Kossmann’s state-of-the-art survey: [506]
• [144, 145]
• nested: [45]
• duplicate elimination: [88]
• early aggregation
• classics: [386]
• Hwang/Yu: [430]
• Kambayashi: [469]
• determine the optimal page access sequence and buffer size for accessing
pairs (x, y) of pages, where the join partners of one relation lie on page x
and those of the other on page y (Fotouhi, Pramanik [288], Merrett,
Kambayashi, Yasuura [593], Omiecinski [636], Pramanik, Ittner [688],
Chan, Ooi [140])
• Sigmod05:
• LOCI: [657]
• PostgresExperience: [894]
• Bruno, Galindo-Legaria, Joshi [110], which is like GOO with the two additional
techniques of pushing partial plans down and pulling partial plans
up whenever a new join is added (see the sketch after this list).
• check: join ordering chapter: ccp: Fig. 3: numbers for #ccp for chains for n = 20;
formula: n^n -> 2^n
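
For orientation on the [110] item above, the following is a minimal, self-contained sketch of plain GOO (greedy operator ordering): repeatedly join the two partial plans whose estimated join result is smallest. It is written in Python for concreteness; the push-down/pull-up refinements of Bruno, Galindo-Legaria, and Joshi are not implemented, and all names and statistics below (goo, cards, edge_sel) are hypothetical illustration values, not code or numbers from the paper.

# Greedy operator ordering (GOO), baseline sketch.
# Assumption: base cardinalities and a selectivity estimator are supplied by the caller.
def goo(cardinalities, selectivity):
    # A partial plan is a triple (set of relations, estimated cardinality, join tree).
    plans = [(frozenset([r]), c, r) for r, c in cardinalities.items()]
    while len(plans) > 1:
        best = None
        for i in range(len(plans)):
            for j in range(i + 1, len(plans)):
                (ra, ca, ta), (rb, cb, tb) = plans[i], plans[j]
                card = ca * cb * selectivity(ra, rb)  # estimate |A join B|
                if best is None or card < best[0]:
                    best = (card, i, j, (ra | rb, card, (ta, tb)))
        _, i, j, merged = best
        # Greedy step: fuse the pair with the smallest estimated join result.
        plans = [p for k, p in enumerate(plans) if k not in (i, j)]
        plans.append(merged)
    return plans[0]

# Toy usage with hypothetical statistics:
cards = {'R1': 1000, 'R2': 100, 'R3': 10}
edge_sel = {frozenset(['R1', 'R2']): 0.01, frozenset(['R2', 'R3']): 0.5}

def selectivity(a, b):
    s = 1.0
    for x in a:
        for y in b:
            s *= edge_sel.get(frozenset([x, y]), 1.0)
    return s

print(goo(cards, selectivity)[2])  # prints ('R1', ('R2', 'R3'))

Note that this greedy scheme readily produces bushy trees and may even pick a cross product when it is the cheapest pair; the [110] variant additionally reshapes the partial plans around each newly added join.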