Adaptive Query Processing

A. Deshpande (University of Maryland, USA, [email protected])
Z. Ives (University of Pennsylvania, USA, [email protected])
V. Raman (IBM Almaden, USA, [email protected])

Foundations and Trends in Databases, Vol. 1, No. 1 (2007) 1–140
© 2007 A. Deshpande, Z. Ives and V. Raman
DOI: 10.1561/1900000001
Abstract
As the data management field has diversified to consider settings in
which queries are increasingly complex, statistics are less available, or
data is stored remotely, there has been an acknowledgment that the
traditional optimize-then-execute paradigm is insufficient. This has led
to a plethora of new techniques, generally placed under the common
banner of adaptive query processing, that focus on using runtime feed-
back to modify query processing in a way that provides better response
time or more efficient CPU utilization.
In this survey paper, we identify many of the common issues, themes,
and approaches that pervade this work, and the settings in which each
piece of work is most appropriate. Our goal with this paper is to be
a “value-add” over the existing papers on the material, providing not
only a brief overview of each technique, but also a basic framework
for understanding the field of adaptive query processing in general.
We focus primarily on intra-query adaptivity of long-running, but not
full-fledged streaming, queries. We conclude with a discussion of open
research problems that are of high importance.
1
Introduction
the reasons behind the push toward adaptivity (Section 1.2); we then
present a road map for the rest of the survey (Section 1.3), and briefly
discuss the related surveys of interest (Section 1.4).
1 Except for certain embedded SQL queries, which may be pre-optimized or optimized once
for multiple possible input bindings.
what tuple to output next). Also, complex query plans may require too
many resources to be fully pipelined. In these settings, the optimizer
must break the plan into multiple segments, materializing (storing)
intermediate results at the end of each stage and using that as an
input to the next stage.
Second, the issue of scheduling computation in a query plan has
many performance implications. Traditional query processing makes
the assumption that an individual operator implementation (e.g., a
nested loops join) should be able to control how CPU cycles are allo-
cated to its child operators. This is achieved through a so-called itera-
tor [53] architecture: each operator has open, close, and getNextTuple
methods. The query engine first invokes the query plan root node’s
open method, which in turn opens its children, and the process repeats
recursively down the plan. Then getNextTuple is called on the root
node. Depending on the operator implementation, it will make calls to
its children’s getNextTuple methods until it can return a tuple to its
parent. The process continues until no more tuples are available, and
then the engine closes the query plan.
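The iterator architecture described above can be sketched as follows; the operator set and tuple representation are illustrative, and getNextTuple is shortened to get_next_tuple:

```python
# A minimal sketch of the iterator (open/getNextTuple/close) model:
# the consumer pulls tuples, so it controls when its children run.

class Scan:
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def get_next_tuple(self):
        if self.pos < len(self.rows):
            t = self.rows[self.pos]
            self.pos += 1
            return t
        return None                    # input exhausted
    def close(self):
        pass

class Select:
    """Filters tuples from its child, pulling only as needed (pipelined)."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()              # open() propagates down the plan
    def get_next_tuple(self):
        while True:
            t = self.child.get_next_tuple()
            if t is None or self.predicate(t):
                return t               # the parent decides when to pull again
    def close(self):
        self.child.close()

# Drive the plan from the root, as the engine does:
plan = Select(Scan([1, 2, 3, 4, 5]), lambda x: x % 2 == 1)
plan.open()
results = []
while (t := plan.get_next_tuple()) is not None:
    results.append(t)
plan.close()
# results == [1, 3, 5]
```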
An alternate approach, so called data-driven or dataflow schedul-
ing [121], is used in many parallel database systems. Here, in order to
allow for concurrent computation across many machines, the data pro-
ducers — not the consumers — control the scheduling. Each operator
takes data from an input queue, processes it, and sends it to an output
queue. Scheduling is determined by the rates at which the queues are
filled and emptied. In this survey, we will discuss a number of adaptive
techniques that in essence use a hybrid of the iterator and data-driven
approaches.
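A minimal sketch of the data-driven alternative, with a bounded queue standing in for the inter-operator queues (the threading details and queue capacity are illustrative):

```python
# Data-driven (dataflow) scheduling: the producer pushes into a queue and
# the consumer drains it, so queue fill/drain rates, not getNext calls,
# determine who runs. A bounded queue provides back-pressure.
import queue
import threading

def producer(out_q, rows):
    for t in rows:
        out_q.put(t)          # blocks when the queue is full (back-pressure)
    out_q.put(None)           # end-of-stream marker

def filter_op(in_q, out_list, predicate):
    while (t := in_q.get()) is not None:
        if predicate(t):
            out_list.append(t)

q = queue.Queue(maxsize=4)
out = []
p = threading.Thread(target=producer, args=(q, range(10)))
c = threading.Thread(target=filter_op, args=(q, out, lambda x: x % 2 == 0))
p.start(); c.start()
p.join(); c.join()
# out == [0, 2, 4, 6, 8]
```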
There have been two responses to the challenges posed above. The
first, a very pragmatic response by application vendors, has been to
build domain-specific optimization capabilities outside the DBMS and
override its local optimizer. Many commercial DBMSs allow users to
specify “hints” on what access methods and join orders to use, via SQL
or catalog tables. Recently, SAP has built an application-level query
processor that runs only a very limited set of plans (essentially, only
table scans), but at very high efficiency [82]. While this achieves SAP’s
target of satisfying its users, it runs counter to the database commu-
nity’s goals of developing high-performance, general-purpose processors
for declarative queries.
Our interest in this survey is on the second development, which
has been the focus of the academic and commercial DBMS research
community: the design and construction of what have come to be known
as adaptive (or autonomic) query processing systems, that use runtime
feedback to adapt query processing.
2.1 Query Optimization
Fig. 2.1 An example query with two expensive user-defined predicates, two serial plans for
it, and one conditional plan that uses age to decide which of the two expensive predicates
to apply first.
Serial Plans: The natural class of execution plans to consider for eval-
uating such queries is the class of serial orders that specify a single order
in which the predicates should be applied to the tuples of the relation
(Figure 2.1). Given a serial order Sπ1, …, Sπn, where π1, …, πn denotes
a permutation of 1, …, n, the expected cost per tuple for that order can
be written as:
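The expression that belongs here is the standard one for this setting. Assuming, as is conventional in this literature, that $c_i$ denotes the per-tuple cost of predicate $S_i$ and $s_i$ its selectivity (treated as independent), the expected per-tuple cost of a serial order is:

```latex
\mathrm{cost}(S_{\pi_1} \rightarrow \cdots \rightarrow S_{\pi_n})
  \;=\; c_{\pi_1} \;+\; \sum_{i=2}^{n} \Bigl( \prod_{j=1}^{i-1} s_{\pi_j} \Bigr) c_{\pi_i}
```

Each predicate $S_{\pi_i}$ is applied only to the fraction of tuples that survive all earlier predicates, which is why placing cheap, selective predicates early minimizes this sum.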
Join Order: The access methods provide source tuples to the query
plan. To join these tables, the optimizer then chooses a join order.
Fig. 2.2 (i) A multi-way join query that we use as the running example; (ii) a left-deep join
order, in which the right child of each join must be a base relation; (iii) a join order that
uses a ternary join operator.
16 Background: Conventional Optimization Techniques
during cost estimation, along with the join order selection, and are
partly constrained by the available access methods. For instance, if
the data from a relation is streaming in, pipelined join operators must
be used. Similarly, an index access method is often ideally suited for
use with an index nested-loops join or sort-merge join.
Fig. 2.4 Symmetric hash join (doubly pipelined hash join) operator builds hash tables on
both inputs.
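A minimal sketch of the symmetric hash join of Fig. 2.4, assuming single-attribute equi-join keys; each arriving tuple is built into its own side's hash table and immediately probed against the other side's, so results stream out without blocking on either input:

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Doubly pipelined hash join: hash tables on both inputs."""
    def __init__(self, key_left, key_right):
        self.keys = (key_left, key_right)
        self.tables = (defaultdict(list), defaultdict(list))

    def insert(self, side, tup):
        """side 0 = left input, side 1 = right input; returns new results."""
        k = self.keys[side](tup)
        self.tables[side][k].append(tup)       # 1. build into own table
        other = 1 - side
        # 2. probe the other side's table and emit matches immediately
        return [(tup, m) if side == 0 else (m, tup)
                for m in self.tables[other][k]]

j = SymmetricHashJoin(key_left=lambda r: r[0], key_right=lambda s: s[0])
out = []
out += j.insert(0, ("a", 1))       # no matches yet
out += j.insert(1, ("a", "x"))     # matches the earlier left tuple
out += j.insert(0, ("a", 2))       # matches the right tuple already seen
# out == [(("a", 1), ("a", "x")), (("a", 2), ("a", "x"))]
```

Because either input can drive the next result, this operator tolerates arbitrary interleavings of its inputs, which is what makes it attractive for the adaptive schemes discussed later.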
2.3 Summary
The traditional, single-pass, optimize-then-execute strategy for query
execution has served the database community quite well since the
1970s. As queries have become more complex and widespread, how-
ever, they have started to run into limitations. Schemes for robust opti-
mization, parametric optimization, and inter-query adaptivity alleviate
some of these difficulties by reducing sensitivity to errors. A significant
virtue of these methods is that they impose little runtime overhead on
query execution. Perhaps even more importantly, they serve as a simple
upgrade path for conventional single-pass query processors.
However, there are settings in which even these techniques run into
limitations: for instance, if the query workload is highly diverse, then
subsequent queries may have little overlap; if the actual costs change
frequently, as they may in highly distributed settings, then the recal-
3
Foundations of Adaptive Query Processing
Fig. 3.1 Example of an eddy instantiated for a 4-way join query (taken from Avnur and
Hellerstein [6]). A routing table can be used to record the valid routing destinations, and
possibly current probabilities for choosing each destination, for different tuple signatures.
operators (thereby, in effect, changing the query plan being used for the
tuple). The eddy operator, which is used as the tuple router, monitors
the execution, and makes the routing decisions for the tuples.
Figure 3.1 shows how an eddy can be used to execute a 4-way join
query. Along with an eddy, three join operators and one selection oper-
ator are instantiated. The eddy executes the query by routing tuples
from the relations R, S, and T through these operators; a tuple that
has been processed by all operators is sent to the output. The eddy can
adapt to changing data or operator characteristics by simply changing
the order in which the tuples are routed through these operators. Note
that the operators themselves must be chosen in advance (this was
somewhat relaxed by a later approach called SteMs that we discuss in
Section 6). These operator choices dictate, to a large degree, the plans
among which the eddy can adapt. Pipelined operators like symmetric
hash join offer the most freedom in adapting and typically also provide
immediate feedback to the eddy (for determining the operator selectivities
and costs). On the other hand, blocking operators like sort-merge
operators are not very suitable since they do not produce output before
consuming the input relations in their entirety.
Various auxiliary data structures are used to assist the eddy during
the execution; broadly speaking, these serve one of two purposes:
Given these data structures, the eddy follows a two-step process for
routing a tuple:
frequency with which the eddy executes Step 1 and the statistics that
it collects; these two factors also determine how well and how quickly
the eddy can adapt. This overhead can be minimized with some careful
engineering (e.g., by amortizing the cost of Step 1 over many tuples [37],
or by using random sampling to maintain the statistics). We will revisit
this issue in more detail when we discuss specific routing policies in the
later sections.
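The two-step routing loop can be sketched as follows; the operators, the signature representation, and the weight-update rule are illustrative assumptions rather than any one system's specific policy:

```python
# Eddy-style routing with a routing table keyed by tuple "signature"
# (the set of operators the tuple has already passed through).
import random

class Eddy:
    def __init__(self, operators):
        self.operators = operators               # name -> filter function
        self.weights = {name: 1.0 for name in operators}

    def route(self, tup):
        done = set()                             # the tuple's signature
        while done != set(self.operators):
            # Step 1: look up valid destinations for this signature
            valid = [o for o in self.operators if o not in done]
            # Step 2: pick one, biased toward operators that drop more tuples
            ws = [self.weights[o] for o in valid]
            op = random.choices(valid, weights=ws)[0]
            if not self.operators[op](tup):
                self.weights[op] += 0.1          # reward selective operators
                return None                      # tuple eliminated
            done.add(op)
        return tup                               # passed every operator

eddy = Eddy({"s1": lambda t: t % 2 == 0, "s2": lambda t: t > 3})
out = [t for t in range(10) if eddy.route(t) is not None]
# out == [4, 6, 8] regardless of the routing order chosen per tuple
```

Note that the output is independent of the routing decisions; only the work done to produce it changes, which is exactly the degree of freedom the eddy exploits.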
Fig. 3.2 Executing a 4-way join query using the MJoin operator. The triangles denote the
in-memory hash indexes built on the relations.
Similarly, when a new R tuple arrives, it is first built into the hash
table on R. It is then probed into the hash table on S on attribute S.a
first, and the resulting matches are then probed into the hash tables
on T and U . Note that the R tuple is not eligible to be probed into
the hash tables on T or U directly, since it does not contain the join
attributes corresponding to either of those two joins.
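The probing sequence described above can be sketched as follows; the schema (R joins S on attribute a, and the R–S matches join T on attribute b) is a simplified, illustrative version of the running example:

```python
# MJoin-style processing of a new R tuple: build into R's hash index,
# then probe the other relations' indexes in the chosen probing sequence.
from collections import defaultdict

# one in-memory hash index per relation, keyed on its join attribute
index = {"R": defaultdict(list), "S": defaultdict(list), "T": defaultdict(list)}

def insert_r(r):
    """r = (a, b): joins S on a; the R-S matches then join T on b."""
    a, b = r
    index["R"][a].append(r)                      # 1. build
    matches = [(r, s) for s in index["S"][a]]    # 2. probe S on R.a
    # 3. probe the intermediate (r, s) matches into T on R.b;
    #    note the intermediate matches themselves are NOT stored anywhere
    return [(r, s, t) for (r, s) in matches for t in index["T"][b]]

index["S"][1].append(("s1",))                    # an S tuple with a = 1
index["T"][2].append(("t1",))                    # a T tuple with b = 2
out = insert_r((1, 2))
# out == [((1, 2), ("s1",), ("t1",))]
```

Changing the probing sequence (e.g., probing T before S) changes the plan without touching any stored state, which is why MJoins adapt so cheaply; the flip side, as discussed below, is that the intermediate matches are recomputed every time.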
An MJoin is significantly more attractive than a tree of binary
operators when processing queries involving sliding windows over data
streams; when a base tuple from a relation drops out of the sliding window,
only the base tuple needs to be located and removed from a hash
table, since intermediate tuples are not stored in the hash tables. Fur-
ther, MJoins are naturally very easy to adapt; the query plan being used
can be changed by simply changing the probing sequence. For these two
reasons, much work in data stream processing has focused on using an
MJoin-style execution [12, 86]. However, MJoins tend not to perform
as well as trees of binary join operators, especially for non-streaming
data. This is mainly because they do not reuse the intermediate results;
we will revisit this issue in more detail in Sections 6 and 7.
ing the key aspects of the technique and its relative trade-offs. First,
we provide further detail on each of the stages.
3.2.1 Measurement
In general, all intra-query adaptation methods perform some form of
monitoring, such as measuring cardinalities at key points in plan execu-
tion. As we will see in Sections 6–8, mid-query reoptimization [75, 87]
does this at the end of each pipeline stage, and it may also add statis-
tics collectors (i.e., histogram construction algorithms) at points that
are judged to be key; eddies measure the selectivities and costs of the
operators; corrective query processing [73] maintains cardinality infor-
mation for all operators and their internal state (including aggregation).
Often it is desirable to explore the costs of options other than the
currently executing query plan. Eddies perform exploration as a part of
execution, by routing tuples through different alternative paths, serv-
ing the goals of execution and information gathering simultaneously.
Antoshenkov’s work on DEC Rdb [3] adopted a different, competitive
strategy in which multiple alternative plans were redundantly run in
parallel; once enough information was gathered to determine which
plan appeared most promising, all other plans were terminated.
Finally, at times it is more efficient to stratify the search and mea-
surement space, by executing a plan in a series of steps, and hence mea-
suring only the active portions of query execution. Choose nodes [52]
and mid-query reoptimization [75, 87] follow this strategy, interleaving
measurement-plus-execution with analysis and actuation.
3.2.2 Analysis
The analysis step focuses on determining how well execution is pro-
ceeding — relative to original estimates or to the estimated or mea-
sured costs of alternative strategies. There are two major caveats to
this phase. First, the only way to know precise costs of alternative
strategies is through competitive execution, which is generally expen-
sive. Thus most adaptive strategies employ sampling or cost modeling
3.2.3 Planning
Planning is often closely interlinked with analysis, since it is quite com-
mon that the same process that reveals a plan to be performing poorly
will also suggest a new plan. Mid-query reoptimization and its variants
(Section 8) compute plans in stages — using analysis from the current
stage to produce a new plan for the next stage, and supplementing
it with the appropriate measurement operators. Corrective query pro-
cessing (Section 7) incrementally collects information as it computes a
query, and it uses this information to estimate the best plan for the
remaining input data.
In some places, changing the query plan requires additional
“repairs”: Query scrambling, for instance, may change the order of exe-
cution of a query plan, and in some cases plan synthesis [118] is required
(see Section 8.3). Similarly, corrective query processing requires a com-
putation to join among intermediate results that were created in dif-
ferent plans; this is often done in a “cleanup” or “stitch-up” phase.
Eddies and their descendants do not plan in the same fashion as the
other strategies: they use queue length, cost, and selectivity estimation
to determine a “next step” in routing a tuple from one plan to another.
SteMs [95] and STAIRs [40] use different planning heuristics to manage
the intermediate state, in essence performing the same actions as the
3.2.4 Actuation
Actuation, the process of changing a query plan, is a mechanism whose
cost depends on how flexible plan execution needs to be. Additionally,
when changing a plan, some previous work may be sacrificed, accumu-
lated execution state in the operators may not be reused easily and
may need to be recomputed.
In the simplest of cases, where query plans can only be changed
after a pipeline finishes (as with mid-query reoptimization [75] and
choose nodes [52]), actuation is essentially free, since it is inexpensive
to reconfigure operators that have not yet begun execution. However,
even with this restriction, it is possible that prior work might have to
be discarded: consider a scenario where sub-optimality is detected after
the build phase of a hash join, and the new plan chooses not to use the
hash table [18]. Kabra and DeWitt [75] explicitly disallowed such actu-
ations. Several other techniques, e.g., query scrambling [2], POP [87],
Rio [9], consider the availability of such state while re-planning
(Section 8).
In contrast, schemes that support the changing of query plans in
the middle of pipelined execution must be more careful. The main con-
cern here regards the state that gets accumulated inside the operators
during execution; if the query plan is changed, we must make sure that
the internal state is consistent with the new query plan. The adaptive
techniques proposed in literature have taken different approaches to
solving this problem. Some AQP techniques will only consider switch-
ing to a new query plan if it is consistent with the previously computed
internal state. For example, the original proposal for the eddies tech-
nique [6] does not allow the access methods chosen at the beginning
to be changed during execution. Although this restricts the adapta-
tion opportunities somewhat, these techniques can still adapt among
a fairly large class of query plans. Another alternative is to switch at
certain consistent points indicated by punctuation [116] markers (e.g.,
the end of a window or the change in a grouping value). Other tech-
R ⋈ S ⋈ T = (R ⋈ S1 ⋈ T) ∪ (R ⋈ S2 ⋈ T)
3.4.1 System R
As discussed in Section 2.1.4, the System R query processor proposed
the optimize-then-execute paradigm still prevalent today. The adaptiv-
ity in this system was limited to inter-query adaptation.
3.4.2 Ingres
The query processor of Ingres, one of the earliest relational database
systems [113], was highly adaptive. It did not have the notion of a query
3.4 Adaptivity Loop and Post-mortem in Some Example Systems
As we can see, the Ingres query processor interleaves the four com-
ponents of the adaptivity loop to a great extent, and is highly adaptive
Post-mortem: Except in some very specific cases, the steps followed dur-
ing the execution of a query in Ingres cannot be written down as tra-
ditional query plans (even if the notion of horizontal partitioning is
employed).
3.4.3 Eddies
In essence, eddies unify the four components of the adaptivity loop into
a single unit and allow arbitrary interleaving between them, leaving
analysis and actuation to the routing policy. We briefly discuss some of
the tradeoffs in the process here, and defer a more detailed discussion
to when we present the routing policies.
planning, several later policies did this periodically [37, 40]; it could
also be done in response to certain events instead (as the A-Greedy
technique [10] does for selection ordering — see Section 4.1).
Actuation: The process of actuation and its cost depend largely on the
operators being used by the eddy, and also the plan space that the eddy
explores (Section 3.2.4). For selection-ordering queries executed using
stateless pipelined filters, the cost of actuation is negligible. For multi-
way join queries, the actuation cost can vary from negligible (if an
n-ary hash join operator is used) to prohibitively high (if state manipulation
is required [40]).
a local disk, may be fetched from a remote data source using a variety
of access methods, or may be streaming into the system.
We relate each technique to the measure/analyze/plan/actuate
loop, and at the end of each section we include a brief recap of how all of
the discussed techniques fit into this loop. For several of the techniques,
we also discuss how a post-mortem analysis of the query execution may
be done, to get more insights into the behavior of those techniques.
4
Adaptive Selection Ordering
(Section 6); this equivalence will not only aid us in understanding com-
plex multi-way join queries, but will also form the kernel around which
many of the adaptive techniques are designed. In fact, the Ingres system
makes little differentiation between selection ordering and join order-
ing. Through the technique of decomposition [122] (Section 3.4.2), it
maps join processing into a series of tuple lookups, variable bindings,
and selection predicates. Thus join processing really becomes a matter
of determining an order for binding tuples and evaluating selections. Addition-
ally, the problem of selection ordering is quite well-understood, with
several analytical and theoretical results known for it. As we will see,
this is in large part due to the “stateless” nature of selection ordering
queries.
We begin this section by presenting an adaptive technique called
A-Greedy [10] that was proposed for evaluating selection ordering
queries over data streams (Section 4.1). We then consider adapting
using the eddy operator discussed in the previous section, and discuss
several routing policies proposed for adapting selection ordering queries
(Section 4.2). We conclude with a brief discussion of several extensions
of the selection ordering problem to parallel and distributed scenarios
(Section 4.3).
Sπ1 → · · · → Sπn would have been the order chosen by the Greedy
algorithm if the following property holds:
Intuitively, if this was not true for indexes πi and πj , then the next
operator chosen after π1 , . . . , πi−1 would have been Sπj , and not Sπi (see
Algorithm 2.1). The reoptimizer continuously monitors this invariant
over the profile tuples; if it discovers that the invariant is violated at
position i, it reoptimizes the evaluation order after Sπi by invoking the
Greedy algorithm over the profile tuples.
The reoptimizer uses a data structure called matrix-view, V
(Figure 4.1), for monitoring the invariant:
in the recent past (as defined by the length of the sliding window) are
predictive of the data properties in the near future. In a later paper,
Munagala et al. [90] show how this assumption may be relaxed, and
present an online algorithm with a competitive ratio of O(log(n)).
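A sketch of the Greedy ordering that A-Greedy maintains; the rank metric (drop rate per unit cost) and the use of a sample of profile tuples to estimate conditional selectivities follow the description above, but the concrete details are illustrative assumptions:

```python
# Greedy selection ordering over correlated predicates: repeatedly pick
# the remaining predicate with the best drop rate per unit cost, where
# selectivity is estimated conditioned on the predicates already chosen
# (by filtering a sample of profile tuples).

def greedy_order(predicates, costs, sample):
    """predicates: name -> bool function; costs: name -> per-tuple cost."""
    order, surviving = [], list(sample)
    remaining = set(predicates)
    while remaining:
        def rank(p):
            if not surviving:
                return 0.0
            # conditional selectivity: fraction of survivors p would pass
            sel = sum(predicates[p](t) for t in surviving) / len(surviving)
            return (1.0 - sel) / costs[p]        # drop rate per unit cost
        best = max(remaining, key=rank)
        order.append(best)
        remaining.discard(best)
        # condition subsequent estimates on the chosen predicate
        surviving = [t for t in surviving if predicates[best](t)]
    return order

preds = {"cheap_weak": lambda t: t < 90,       # drops 10% of 0..99
         "strong":     lambda t: t % 10 == 0}  # drops 90% of 0..99
order = greedy_order(preds, {"cheap_weak": 1.0, "strong": 1.0},
                     list(range(100)))
# order == ["strong", "cheap_weak"]
```

A-Greedy's reoptimizer, as described above, does not rerun this from scratch on every tuple; it monitors the greedy invariant over the profile tuples and re-invokes the algorithm only from the position where a violation is detected.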
We can examine A-Greedy using the adaptivity loop (Section 3.2):
Fig. 4.2 Executing a selection query using an eddy using the Lottery Scheduling policy.
Because of back-pressure, even though S1 has a high ticket count, the next tuple in the
queue at the eddy will not be routed to it.
4.2 Adaptation using Eddies
Table 4.1 Comparing the techniques and the routing policies discussed in this section using
the adaptivity loop. We omit the actuation aspect since the cost of actuation is negligible
for all of these.
Deterministic [37]
  Measurement: selectivities by observing operator behavior (tuples-in/tuples-out); costs monitored explicitly.
  Analysis and planning: periodically re-plan using rank ordering.
A-Greedy [10]
  Measurement: conditional selectivities by random sampling; costs monitored explicitly.
  Analysis: detect violations of the greedy invariant.
  Planning: re-plan using the Greedy algorithm.
Lottery scheduling [6]
  Measurement: monitor ticket counts and queue lengths.
  Analysis and planning: choose the route per-tuple based on those.
Content-based routing [17]
  Measurement: for each operator, monitor conditional selectivities for the best classifier attribute.
  Analysis and planning: choose the route per-tuple based on values of classifier attributes and selectivities (exploits conditional plans).
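The lottery-scheduling policy can be sketched as follows; the ticket counts, queue capacity, and eligibility rule are illustrative assumptions consistent with the back-pressure behavior shown in Fig. 4.2:

```python
# Lottery-scheduling routing: an operator's chance of receiving the next
# tuple is proportional to its ticket count, but an operator whose input
# queue is full (back-pressure) is removed from the lottery entirely.
import random

def pick_destination(tickets, queue_lengths, capacity):
    eligible = [op for op in tickets if queue_lengths[op] < capacity]
    weights = [tickets[op] + 1 for op in eligible]   # +1 so all can win
    return random.choices(eligible, weights=weights)[0]

tickets = {"s1": 50, "s2": 3}
dest = pick_destination(tickets, {"s1": 8, "s2": 2}, capacity=8)
# s1's queue is full, so despite its high ticket count the tuple goes to s2
```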
4.4 Summary
Selection ordering is a much simpler problem than optimizing com-
plex multi-way join queries, and this simplicity not only enables the
design of efficient algorithms for solving this problem, but also makes the
problem more amenable to formal analysis. We discussed two adaptive
query processing techniques for this problem: A-Greedy and eddies.
The A-Greedy algorithm has two notable features: first, it takes into
account the correlations in the data, and second, approximation guar-
antees can be provided on its performance. The second technique we
saw, eddies, is more of an architectural mechanism enabling adap-
tivity that is not tied to any specific decision algorithm; we will
see this again when we discuss eddies for general multi-way join
queries.
As we will see in the next two sections, several problems that arise in
adaptive execution of multi-way join queries can be reduced to selection
ordering, and the algorithms we discussed in this section can be used
unchanged in those settings. This is not particularly surprising; the
similarities between selection ordering and multi-way join queries were
noted by Ibaraki and Kameda [67] who exploited these similarities to
design an efficient query optimization heuristic.
5
Adaptive Join Processing: Overview
Note that most database systems typically use query execution plans
with a mixture of blocking operators and pipelines; these can be
adapted independently using different AQP techniques [18].
6
Adaptive Join Processing: History-Independent
Pipelined Execution
1 When multiple drivers are permitted, query execution using binary join operators is not
history-independent, and will be discussed in the next section.
Fig. 6.1 Two possible orderings of the driven relations. ci and fi denote the costs and
fanouts of the join operators, respectively.
and its tuples are joined with the other (driven) relations one-by-one
(Figure 6.1). The joins are done as nested-loops joins or index joins
or hash joins, depending on the access method used for the driven
relations. If the driven relation is scanned, the join is a nested-loops
join; if it is looked up via an existing index, the join is an index join; if
it is looked up via a dynamically constructed hash index, it is a hash
join. Figure 6.1 shows two example pipelined plans for a 4-relation
query, with the driver relation S.
This type of execution is history-independent in the following sense.
For a fixed driving tuple s, the behavior of the join operators (e.g., their
join selectivities with respect to s) is independent of when s arrives,
whether it is the first tuple from S, the last tuple, or any other. In
other words, the joining with s is a side-effect free operation that does
not alter the internal state of the join operators (the only state change
is the cursor needed to generate join matches, which is transient).
We begin by discussing static optimization in this plan space and
show how the problem of join ordering can be reduced to the selection
ordering problem.
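The reduction can be made concrete. With $c_i$ and $f_i$ denoting the per-probe cost and fanout of the join with the $i$-th driven relation (the notation of Fig. 6.1), the expected cost of driving one tuple through the order $\pi$ is:

```latex
\mathrm{cost}
  \;=\; c_{\pi_1} + f_{\pi_1} c_{\pi_2} + f_{\pi_1} f_{\pi_2} c_{\pi_3} + \cdots
  \;=\; \sum_{i=1}^{n} \Bigl( \prod_{j=1}^{i-1} f_{\pi_j} \Bigr) c_{\pi_i}
```

This has exactly the form of the selection-ordering cost, with fanouts playing the role of selectivities (the difference being that fanouts may exceed 1), so the ordering algorithms of Section 4 carry over.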
relation, the choice of the access method for performing each join, and
the ordering of the joins.
purpose. The Greedy algorithm (Section 2.1.2) can also be used for
this purpose if the correlations across the join operators are known or
monitored.
Adapting the Probing Sequences: Similarly, we can use the
adaptive techniques presented in Section 4 for adapting the probing
sequences. Each of the probing sequences must be adapted separately,
though some statistics can be shared between these. Furthermore, as
discussed in Section 6.1.1.2, the probing sequences may have to obey
precedence constraints depending on the query. Babu et al. [10], in
the context of the STREAM project, use the A-Greedy algorithm
(Section 4.1) for this purpose.
that specify not a particular n-ary join algorithm but rather a charac-
terization of the space of correct n-ary join algorithms. For example,
a slight variation on this dataflow results in an index join, as we see
next.
Asynchronous Index Joins using SteMs: A simple way to perform
an index join is to replace one of the scan AMs in Figure 6.2, say that
on R, with an index AM, as shown in Figure 6.3. Now, R tuples enter
the dataflow only in response to probes from S or T . Notice that the
index AM on R returns non-concatenated matches (i.e., tuples with only
the R fields). These R matches will concatenate with the corresponding
terms of the state built in the operators. This allows the correct-
ness of the join algorithm to be enforced by an external router like
an eddy, without breaking the encapsulation offered by the separate
operators.
Fig. 6.4 (i) MJoins or SteMs implicitly partition the data by order of arrival and routing
decisions; (ii) subqueries are evaluated using left-deep pipelined plans.
6.2.5 Discussion
Although these strategies are easier to reason about and to design poli-
cies for, they tend to suffer from suboptimal performance compared to
using a tree of binary joins or using an eddy with binary join operators
(Section 7.2). The main reason is that these strategies do not reuse
any intermediate tuples that may have been produced during query
execution. This leads to two problems:
Consider the example query in Figure 6.1, and let the current
choice for the probing sequence for R tuples be S → T → U (Fig-
ure 6.5). In other words, a new R tuple, say r, is first joined with
S, and then with T and finally with U . If two consecutive (or close
by) tuples r1 and r2 have identical values of the attribute a (the join
attribute for the join with S), then the resulting S ⋈ T ⋈ U matches
for the probes will be identical. However, the MJoins approach is
unable to take advantage of such similarities, and will re-execute the
joins.
The key additional construct of the A-Caching approach is an inter-
mediate result cache. An intermediate result cache C^X_Y is defined by:
(a) X, the relation for which it is maintained, and (b) Y, the set of
operators in the probing sequence for which it stores the cached results.
For example, the cache for storing the S ⋈ T ⋈ U matches for the R
tuples will be denoted by C^R_{⋈aS, ⋈bT, ⋈cU}. The cached entries are of the
form (u, v) where u denotes the value of the join attribute (attribute
R.a in this case), and v denotes the result tuples that would be gener-
ated by probing into the corresponding operators. A cache might not
contain results for all possible values of the join attribute, but if it does
contain at least one result for a specific value of the join attribute, it
must contain all matching results for that value. In other words, if there
is an entry (a1, (STU)1) in the cache, then (STU)1 must be equal to
σ_{S.a=a1}(S ⋈ T ⋈ U).
6.3 Adaptive Caching (A-Caching)
Cache Update: There are two ways a cache might be updated. First,
the results found by probing into the operators when there is a cache
miss can be inserted back into the cache. Second, when new T and U
tuples arrive (or existing T or U tuples need to be deleted because of
sliding windows on those relations), the cache needs to be updated to
satisfy the constraint described above. This latter step can be compu-
tationally expensive and hence the choice of which caches to maintain
must be made carefully.
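The all-or-nothing invariant and the two update paths can be sketched as follows; the probe function and the value-level invalidation granularity are illustrative assumptions:

```python
# An A-Caching-style intermediate-result cache keyed by join-attribute
# value: a hit returns the complete set of matches for that value; a miss
# runs the probing sequence and inserts ALL of its results, preserving
# the all-or-nothing invariant described in the text.

class IntermediateCache:
    def __init__(self, probe):           # probe(v): compute all matches for v
        self.probe = probe
        self.entries = {}                # value -> complete result list
        self.hits = self.misses = 0

    def lookup(self, v):
        if v in self.entries:            # hit: entry is guaranteed complete
            self.hits += 1
            return self.entries[v]
        self.misses += 1
        results = self.probe(v)          # miss: run the probing sequence
        self.entries[v] = results        # insert all results, never a subset
        return results

    def invalidate(self, v):
        # a new or expired T/U tuple changes the results for v: drop the
        # whole entry rather than leave a partial (hence invalid) one
        self.entries.pop(v, None)

calls = []
cache = IntermediateCache(lambda v: (calls.append(v), [f"stu-{v}"])[1])
r1 = cache.lookup("a1")
r2 = cache.lookup("a1")                  # served from the cache
# r1 == r2 == ["stu-a1"], and the probe ran only once (calls == ["a1"])
```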
6.4 Summary
In this section, we saw several techniques for adaptation of multi-way
join queries when the execution is pipelined and history-independent.
Table 6.1 compares two of them, StreaMON and SteMs, using the
adaptivity loop framework. StreaMON is a specific adaptive system that
chooses the probing sequence using the A-Greedy algorithm of Section 4.1,
while SteMs provide an adaptation framework that allows a rich space of
routing policies.
The biggest advantage of history-independent execution is that the
plan space is much easier to analyze and to design algorithms for. Con-
sider a case when a total of N tuples have been processed by the sys-
tem, and let M denote the number of choices that the executor faced.
We now turn our focus to the space of pipelined query plans where
the state built up inside the join operators during query execution
depends on the optimization or adaptation choices made by the query
processor. This includes the class of traditional query plans where a
tree of fully pipelined (symmetric) binary join operators is used to
execute a query with multiple driver relations; the state accumulated
inside the operators depends on the join order being used to execute the
query.
We begin our discussion with corrective query processing, which uses
a conventional (binary) query plan at any time, but may use multi-
ple plans over the entire execution (Section 7.1). We then revisit the
eddies architecture, and consider adaptation when binary pipelined join
operators are used with eddies (Section 7.2). In Sections 7.3 and 7.4,
we present the STAIR operator and the CAPE adaptive query pro-
cessor, respectively, both of which allow explicit state manipulation
during query processing. Finally, we compare these schemes using the
adaptivity loop, and conclude with a discussion of pros and cons of
history-dependent schemes (Section 7.5).
7.1 Corrective Query Processing
Fig. 7.1 An aggregation/join query as a combination of two plans plus a final stitch-up plan
(see Section 7.1.2).
end. The CQP engine chooses an initial plan, and begins executing this
first plan, requesting data from remote sources in the form of (finite)
sequential streams. As execution progresses, the CQP engine monitors
cost and cardinality and performs re-estimation to determine whether
the initial plan is performing adequately. If not, the plan can be reopti-
mized based on extrapolation from the data encountered to this point.
If a more promising plan is discovered, the current plan's input is sus-
pended and it completes processing of the partition of data it has seen
so far (represented by F0, T0, C0). Now, the new plan begins execu-
tion in a new phase (Plan 1 in the figure), over the remaining data
from the source streams (F1, T1, C1). The plan replacement process
repeats as many times as the optimizer finds more promising alter-
native plans, until all source streams have been consumed. Finally, a
stitch-up phase takes the intermediate results computed in all previous
phases and combines them to return the remaining results. (We note
that the stitch-up phase is not strictly required by the CQP approach,
as cross-phase computation could be integrated into any prior phase —
but the initial work in [73] chose this implementation.) Stitch-up plan
generation is based on the algebraic properties of the query plan and
we discuss it in more detail in Section 7.1.2.
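The monitoring and re-estimation step described above can be sketched as follows. This is a hypothetical illustration, assuming simple linear extrapolation of observed cardinalities and a fixed deviation threshold, rather than the actual CQP re-estimation logic.

```python
def should_reoptimize(observed, processed_frac, estimated, threshold=2.0):
    """Sketch of re-estimation: linearly extrapolate the cardinality seen
    on the prefix processed so far (processed_frac of the input) and
    compare it with the optimizer's original estimate. A deviation beyond
    `threshold` (in either direction) suggests switching plans."""
    if processed_frac <= 0:
        return False                       # nothing observed yet
    extrapolated = observed / processed_frac
    hi, lo = max(extrapolated, estimated), min(extrapolated, estimated)
    return hi / max(lo, 1e-9) > threshold
```

For instance, observing 500 result tuples after only 10% of the input, against an estimate of 1,000, extrapolates to 5,000 and would trigger reoptimization.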
We can examine CQP through the lens of Section 3.2 as follows.
One other aspect worth noting is that the CQP approach scales to
workloads that are larger than memory [71]: in fact, there is a natural
parallel between the horizontal partitioning done on the data by phased
execution and the overflow resolution schemes used by hash join algo-
rithms. We now discuss some of the major design points of the CQP
approach.
2 A third aspect, tuple ordering, which might prioritize some tuples over others [97], allows
for asymmetric treatment of output results, but is not considered here.
The eddy scheduling scheme combines both of these factors in its tuple
routing; in contrast, CQP separates them under the belief that aggres-
sively scheduling a less-desirable plan ordering is generally undesirable
even in the presence of intermittent delays.
CQP relies on pipelined hash joins to perform scheduling in a way
that masks most I/O delays. Its focus is solely on cost-based plan selec-
tion, on ensuring that the plan is changed at consistent points, and on
performing any necessary cross-phase stitch-up computations.
Join: We can take any join expression over m relations, each divided
into n partitions, and write it as

    R1 ⋈ ··· ⋈ Rm = ⋃_{1≤c1≤n, ..., 1≤cm≤n} (R1^c1 ⋈ ··· ⋈ Rm^cm),

where Rj^cj represents some subset of relation Rj. This is equivalent
to the union of the series of join expressions between subsets that have
matching superscripts:

    ⋃_{1≤i≤n} (R1^i ⋈ ··· ⋈ Rm^i),

plus the union of all remaining combinations:

    ⋃ { t | t ∈ (R1^c1 ⋈ ··· ⋈ Rm^cm), 1 ≤ ci ≤ n, ¬(c1 = ··· = cm) }.
Fig. 7.2 Internals of pipelined hash join vs. complementary join pair; “Q”s represent queues
between threads.
Fig. 7.3 An eddy instantiated for the query R ⋈a S ⋈b T. Valid options for routing are
labeled on the edges.
eddy is a tuple router that sits at the center of a dataflow, observing the
data and operator characteristics, and effects plan changes by changing
the way tuples are routed through the operators.
Figure 7.3 shows the eddy instantiated for a three-relation join
query R ⋈a S ⋈b T (we will use this query as the running example for
this section). For this query, along with the eddy operator, two doubly
pipelined (symmetric) hash join operators are instantiated.
The query is executed by routing the input tuples through these two
operators. The valid routing options for various types of tuples (shown
on the data flow edges) are as follows:
• Avnur et al. [6] maintain the lineage in the form of ready and
done bits (cf. Section 3.1.2) that are associated with each
tuple. The done bits indicate which operators the tuple has
already visited, whereas the ready bits indicate the valid rout-
ing destinations for the tuple. The operators are in charge of
setting these bits for a tuple before it is returned to the eddy.
• For efficiency reasons, a later implementation of eddies in
PostgreSQL [37, 40] used a routing table for this purpose.
The lineage of the tuple was encoded as an integer that was
treated as a bitmap (similar to the done bitmap). A rout-
ing table, say r-table, initialized once at the beginning of
query execution, maintained the valid routing destinations
for tuples with lineage x at r-table[x], thereby allowing the
eddy to efficiently find valid routing destinations for any
tuple. The size of the routing table, however, is exponen-
tial in the number of operators, and hence the approach is
not suitable for queries with a large number of operators.
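The routing-table encoding can be illustrated with a short sketch. This is a hypothetical reconstruction, not the PostgreSQL implementation; the valid_destinations function stands in for the query-specific routing constraints. The table's 2^k size for k operators is visible directly in the construction.

```python
def build_routing_table(num_ops, valid_destinations):
    """Build an eddies-style routing table: a tuple's lineage is a bitmap
    of the operators it has already visited (its 'done' bits), and
    table[lineage] lists the operators it may be routed to next.
    The table has 2**num_ops entries: exponential in the operator count."""
    return [valid_destinations(lineage) for lineage in range(2 ** num_ops)]

# Simple policy for illustration: a tuple may visit any operator whose
# done bit is not yet set (real policies also depend on the tuple's source).
def any_unvisited(num_ops):
    return lambda lineage: [op for op in range(num_ops)
                            if not (lineage >> op) & 1]
```

At execution time the eddy only indexes into the precomputed table with the tuple's lineage, which is what makes the scheme efficient per tuple despite its exponential size.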
Fig. 7.4 (i) Distribution of tuples at the end of the execution for the query R ⋈ S ⋈ T;
S1 and S2 denote the partitions of S that were routed to the R ⋈ S and S ⋈ T operators,
respectively. (ii) The two query execution plans executed by the eddy.
Designing effective routing policies for the case of eddies with binary
join operators remains an important open problem in this area. Next
we attempt to provide some insights into this problem by analyzing
eddies, and discussing the issue of state accumulation in more detail.
We then briefly describe an operator called STAIR that was proposed
to handle some of the state accumulation issues.
Note that if the ST tuples were themselves split among the two
possible destinations (R ⋈ S and T ⋈ U), S and T must be further
96 Adaptive Join Processing: History-Dependent Pipelined Execution
Fig. 7.5 An example distribution of tuples for query R ⋈a S ⋈b T ⋈c U at the end of query
execution.
Fig. 7.6 (i) Query execution state at time τ when using an eddy and STAIRs, where Rτ
denotes the tuples of relation R that have been processed by time τ; (ii) the execution state
after Demote(Rτ ⋈ SτR ∈ S.b, SτR); (iii) the execution state after Promote(SτR ∈ S.a, S.b, T.b).
Figure 7.6 (iii) shows the result of these operations. As we can see,
the state now reflects what it would have been if SτR had previously been
routed to the S.b and T.b STAIRs, instead of the R.a and S.a STAIRs. As
a result, future R tuples will not be forced to join with SτR.
The process of moving state from one STAIR to another is referred
to as state migration.
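The mechanics of state migration can be sketched on simple hash-table state. The code below is our own illustration, not the STAIR implementation: demotion corresponds to removing tuples from the table keyed on one join attribute, and promotion to re-inserting them into a table keyed on another.

```python
from collections import defaultdict

def migrate_state(source, dest, key_fn, predicate=lambda t: True):
    """Move the tuples satisfying `predicate` out of the `source` hash
    table (a demotion) and re-insert them into `dest`, re-hashed on the
    destination's join attribute via key_fn (a promotion). Returns the
    number of tuples moved."""
    moved = 0
    for bucket in source.values():
        kept = []
        for t in bucket:
            if predicate(t):
                dest[key_fn(t)].append(t)
                moved += 1
            else:
                kept.append(t)
        bucket[:] = kept                   # mutate in place; keys stay stable
    return moved
```

After migration, tuples probing on the new attribute find the moved state, mirroring the effect shown in Figure 7.6 (iii).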
• (Rτ ⋈ SτR) ⋈ Tτ
• Rτ ⋈ (SτT ⋈ Tτ)
• SτR ⋈ Tτ
• R∞ ⋈ (S∞ ⋈ T∞) − Rτ ⋈ Sτ ⋈ Tτ
7.4 Dynamic Plan Migration in CAPE 103
The first two rows are the plans executed until time τ, and the third
row shows the work done during the state migration step; the last row
specifies the work done after time τ (we abuse the notation somewhat
to indicate that the results already produced by time τ are not regen-
erated).
7.5 Summary
In this section, we discussed a variety of schemes for adaptive query
processing that use trees of binary join operators for query execu-
tion. Table 7.1 recaps some of these techniques from the perspective
of the adaptivity loop. The main challenge with using binary joins
for execution is dealing with and reasoning about the state that gets
Table 7.1 Comparing some of the techniques discussed in this section using the adaptivity
loop.
CQP [73]
Measurement: Operator cardinalities (hence join selectivities);
sort orders; incremental histograms.
Analysis and planning: Periodic or trigger-based re-planning
using an integrated query reoptimizer.
Actuation: By replacing the query plan operators other than
the root and leaves. Requires a stitch-up plan at the end.
8.1 Plan Staging 107
1 This does put the application program in the business of guiding query plans, which goes
against the relational philosophy.
8.2 Mid-Query Reoptimization 109
2 Any other statistic over these tuples, such as a frequency distribution, can be substituted
for cardinality.
110 Adaptive Join Processing: Non-pipelined Execution
3 Generalizing this for near-optimal plans is straightforward: the optimizer computes the
rectangle corresponding to cost(P1) − α·cost(P2) < β for suitable constants α and β.
ear query plans, where at least one input to every join has a known
cardinality.
Fig. 8.5 Query scrambling example. When a delay in arrival of sales causes plan portion
(1) to stall, we can run plan portion (2) until the delay is resolved.
Table 8.1 Comparing some of the techniques discussed in this section using the adaptivity
loop.
Plan staging
Measurement: Sizes of intermediate results.
Analysis and planning: None.
Actuation: Resubmit next stage of query to the DBMS.
Mid-query reoptimization [9, 75, 87]
Measurement: Cardinalities or other table statistics computed at
checkpoints.
Analysis: Detect violations of validity ranges.
Planning: Resubmit a changed query [75], or re-invoke optimizer
on original query, exposing intermediate results as materialized
views [87], or switch to a pre-chosen alternative plan [9].
Actuation: Instantiate operators according to the new plan and start
executing them.
Query scrambling [118]
Measurement: Delays in data arrival.
Analysis and planning: Choose a join that can be run during the
delay, or re-invoke the optimizer to synthesize new operators.
Actuation: Schedule an operator that can be executed given the
availability of data sources.
122 Summary and Open Questions
plan that joins S and T, only to learn that they are also correlated.
This process can continue as it slowly learns about all the correlations
present in the system: each time, the query processor will switch to a
plan that joins tables not joined until then. This phenomenon is easily
observed during mid-query reoptimization and is described as “fleeing
from knowledge to ignorance” in [44]. In some cases this can be a very
good strategy, as it focuses on exploring the unknown; in other cases
it simply results in a continuous succession of bad plans. Again, we
are hopeful that techniques from machine learning might be helpful in
this area.
References
[59] P. J. Haas and J. M. Hellerstein, “Ripple joins for online aggregation,” in SIG-
MOD ’99: Proceedings of the 1999 ACM SIGMOD international conference
on Management of data, (New York, NY, USA), pp. 287–298, ACM Press,
1999.
[60] J. M. Hellerstein, “Optimization techniques for queries with expensive meth-
ods,” ACM Transactions on Database Systems, vol. 23, no. 2, pp. 113–157,
1998.
[61] J. M. Hellerstein, R. Avnur, A. Chou, C. Hidber, C. Olston, V. Raman,
T. Roth, and P. J. Haas, “Interactive data analysis: The Control project,”
Computer, vol. 32, no. 8, pp. 51–59, 1999.
[62] M. Herbster and M. K. Warmuth, “Tracking the best expert,” Machine Learn-
ing, vol. 32, pp. 151–178, August 1998.
[63] D. A. Huffman, “A method for the construction of minimum redundancy
codes,” in Proc. Inst. Radio Eng., pp. 1098–1101, 1952.
[64] A. Hulgeri and S. Sudarshan, “Parametric query optimization for linear and
piecewise linear cost functions,” in VLDB ’02: Proceedings of 28th Interna-
tional Conference on Very Large Data Bases, August 20-23, 2002, Hong Kong,
China, pp. 167–178, 2002.
[65] A. Hulgeri and S. Sudarshan, “AniPQO: Almost non-intrusive parametric
query optimization for nonlinear cost functions,” in VLDB ’03: Proceedings
of 29th International Conference on Very Large Data Bases, September 9-12,
2003, Berlin, Germany, pp. 766–777, 2003.
[66] J.-H. Hwang, M. Balazinska, A. Rasin, U. Cetintemel, M. Stonebraker, and
S. Zdonik, “High-availability algorithms for distributed stream processing,”
in ICDE ’05: Proceedings of the 21st International Conference on Data Engi-
neering (ICDE’05), (Washington, DC, USA), pp. 779–790, IEEE Computer
Society, 2005.
[67] T. Ibaraki and T. Kameda, “On the optimal nesting order for computing
N-relational joins,” ACM Transactions on Database Systems, vol. 9, no. 3,
pp. 482–502, 1984.
[68] Y. E. Ioannidis, “Query optimization,” ACM Computing Surveys, vol. 28,
no. 1, pp. 121–123, 1996.
[69] Y. E. Ioannidis and S. Christodoulakis, “On the propagation of errors in the
size of join results,” in SIGMOD ’91: Proceedings of the 1991 ACM SIGMOD
international conference on Management of data, (New York, NY, USA),
pp. 268–277, ACM Press, 1991.
[70] Y. E. Ioannidis, R. T. Ng, K. Shim, and T. K. Sellis, “Parametric query
optimization,” The VLDB Journal, vol. 6, no. 2, pp. 132–151, 1997.
[71] Z. G. Ives, Efficient Query Processing for Data Integration. PhD thesis, Uni-
versity of Washington, August 2002.
[72] Z. G. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld, “An adaptive
query execution system for data integration,” in SIGMOD ’99: Proceedings
of the 1999 ACM SIGMOD international conference on Management of data,
(New York, NY, USA), pp. 299–310, ACM Press, 1999.
[73] Z. G. Ives, A. Y. Halevy, and D. S. Weld, “Adapting to source properties in
processing data integration queries,” in SIGMOD ’04: Proceedings of the 2004
ACM SIGMOD international conference on Management of data, ACM Press,
2004.
[100] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe, “Efficient and extensible
algorithms for multi query optimization,” in SIGMOD ’00: Proceedings of the
2000 ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 249–260, ACM Press, 2000.
[101] E. A. Rundensteiner, L. Ding, T. M. Sutherland, Y. Zhu, B. Pielech, and
N. Mehta, “CAPE: Continuous query engine with heterogeneous-grained
adaptivity,” in VLDB ’04: Proceedings of the Thirtieth International Con-
ference on Very Large Data Bases, Toronto, Canada, pp. 1353–1356, 2004.
[102] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G.
Price, “Access path selection in a relational database management system,”
in SIGMOD ’79: Proceedings of the 1979 ACM SIGMOD International Con-
ference on Management of Data, 1979.
[103] T. K. Sellis, “Multiple-query optimization,” ACM Trans. Database Syst.,
vol. 13, no. 1, pp. 23–52, 1988.
[104] P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. Y. C. Leung, R. Ramakrishnan,
D. Srivastava, P. J. Stuckey, and S. Sudarshan, “Cost-based optimization for
magic: Algebra and implementation,” in SIGMOD ’96: Proceedings of the 1996
ACM SIGMOD International Conference on Management of Data, pp. 435–
446, ACM Press, 1996.
[105] P. Seshadri, H. Pirahesh, and T. Y. C. Leung, “Complex query decorrelation,”
in ICDE ’96: Proceedings of the Twelfth International Conference on Data
Engineering, New Orleans, LA, pp. 450–458, February 26–March 1 1996.
[106] M. A. Shah, J. M. Hellerstein, and E. Brewer, “Highly available, fault-tolerant,
parallel dataflows,” in SIGMOD ’04: Proceedings of the 2004 ACM SIGMOD
international conference on Management of data, (New York, NY, USA),
pp. 827–838, ACM Press, 2004.
[107] J. Shanmugasundaram, K. Tufte, D. J. DeWitt, J. F. Naughton, and D. Maier,
“Architecting a network query engine for producing partial results,” in ACM
SIGMOD Workshop on the Web (WebDB) 2000, Dallas, TX, pp. 17–22, 2000.
[108] M. A. Shayman and E. Fernandez-Gaucherand, “Risk-sensitive decision-
theoretic diagnosis,” IEEE Transactions on Automatic Control, vol. 46,
pp. 1166–1171, 2001.
[109] H. Simon and J. Kadane, “Optimal problem-solving search: All-or-none solu-
tions,” Artificial Intelligence, vol. 6, pp. 235–247, 1975.
[110] U. Srivastava, K. Munagala, and J. Widom, “Operator placement for
in-network stream query processing,” in PODS ’05: Proceedings of the
Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems, pp. 250–258, 2005.
[111] U. Srivastava, K. Munagala, J. Widom, and R. Motwani, “Query optimiza-
tion over web services,” in VLDB ’06: Proceedings of the 32nd international
conference on Very large data bases, pp. 355–366, VLDB Endowment, 2006.
[112] M. Stillger, G. Lohman, V. Markl, and M. Kandil, “LEO – DB2’s LEarning
Optimizer,” in VLDB ’01: Proceedings of 27th International Conference on
Very Large Data Bases, Morgan Kaufmann, September 11–14 2001.
140 References
[113] M. Stonebraker, E. Wong, P. Kreps, and G. Held, “The design and imple-
mentation of Ingres,” ACM Transactions on Database Systems, vol. 1, no. 3,
pp. 189–222, 1976.
[114] M. Templeton, H. Henley, E. Maros, and D. J. V. Buer, “InterViso: Dealing
with the complexity of federated database access,” The VLDB Journal, vol. 4,
no. 2, 1995.
[115] F. Tian and D. J. DeWitt, “Tuple routing strategies for distributed eddies,” in
VLDB ’03: Proceedings of 29th International Conference on Very Large Data
Bases, pp. 333–344, Berlin, Germany: Morgan Kaufmann, September 9–12
2003.
[116] P. A. Tucker and D. Maier, “Exploiting punctuation semantics in data
streams,” in ICDE ’02: Proceedings of the 18th International Conference on
Data Engineering, (Washington, DC, USA), p. 279, IEEE Computer Society,
2002.
[117] T. Urhan and M. J. Franklin, “XJoin: a reactively-scheduled pipelined join
operator,” IEEE Data Engineering Bulletin, vol. 23, no. 2, pp. 27–33, 2000.
[118] T. Urhan, M. J. Franklin, and L. Amsaleg, “Cost based query scrambling
for initial delays,” in SIGMOD ’98: Proceedings of the 1998 ACM SIGMOD
International Conference on Management of Data, pp. 130–141, Seattle, WA:
ACM Press, June 2–4 1998.
[119] S. Viglas, Novel Query Optimization and Evaluation Techniques. PhD thesis,
University of Wisconsin at Madison, 2003.
[120] S. Viglas, J. F. Naughton, and J. Burger, “Maximizing the output rate of
multi-way join queries over streaming information sources,” in VLDB ’03:
Proceedings of the 29th International Conference on Very Large Data Bases,
Berlin, Germany: Morgan Kaufmann, September 9–12 2003.
[121] A. N. Wilschut and P. M. G. Apers, “Dataflow query execution in a par-
allel main-memory environment,” in PDIS ’91: Proceedings of the First
International Conference on Parallel and Distributed Information Systems,
Fontainebleu Hilton Resort, Miami Beach, FL, pp. 68–77, IEEE Computer
Society, 1991.
[122] E. Wong and K. Youssefi, “Decomposition — a strategy for query processing,”
ACM Transactions on Database Systems, vol. 1, no. 3, pp. 223–241, 1976.
[123] D. Zhang, J. Li, K. Kimeli, and W. Wang, “Sliding window based multi-join
algorithms over distributed data streams,” in ICDE ’06: Proceedings of the
22nd International Conference on Data Engineering (ICDE’06), (Washington,
DC, USA), p. 139, IEEE Computer Society, 2006.
[124] Y. Zhu, E. A. Rundensteiner, and G. T. Heineman, “Dynamic plan migration
for continuous queries over data streams,” in SIGMOD ’04: Proceedings of the
2004 ACM SIGMOD international conference on Management of data, (New
York, NY, USA), pp. 431–442, ACM Press, 2004.