VLDB 2023 - Simple Adaptive Query Processing Vs Learned Query Optimizers - Observations and Analysis (P2962-Zhang)
ABSTRACT

There have been many decades of work on optimizing query processing in database management systems. Recently, modern machine learning (ML), and specifically reinforcement learning (RL), has gained increased attention as a means to develop a query optimizer (QO). In this work, we take a closer look at two recent state-of-the-art (SOTA) RL-based QO methods to better understand their behavior. We find that these RL-based methods do not generalize as well as it seems at first glance. Thus, we ask a simple question: How do SOTA RL-based QOs compare to a simple, modern, adaptive query processing approach? To answer this question, we choose two simple adaptive query processing techniques and implement them in PostgreSQL. The first adapts an individual join operation on-the-fly and switches between a Nested Loop Join algorithm and a Hash Join algorithm to avoid sub-optimal join algorithm decisions. The second is a technique called Lookahead Information Passing (LIP), in which adaptive semijoin techniques are used to make a pipeline of join operations execute efficiently. To our surprise, we find that this simple adaptive query processing approach is not only competitive to the SOTA RL-based approaches but, in some cases, outperforms the RL-based approaches. The adaptive approach is also appealing because it does not require an expensive training step, and it is fully interpretable compared to the RL-based QO approaches. Further, the adaptive method works across complex query constructs that RL-based QO methods currently cannot optimize.

PVLDB Reference Format:
Yunjia Zhang, Yannis Chronis, Jignesh M. Patel, and Theodoros Rekatsinas. Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis. PVLDB, 16(11): 2962-2975, 2023. doi:10.14778/3611479.3611501

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/yunjiazhang/adaptiveness_vs_learning.

∗ Currently at Google.
† Work done while at U. Wisconsin.
‡ Currently at Apple.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 16, No. 11 ISSN 2150-8097. doi:10.14778/3611479.3611501

1 INTRODUCTION

Query optimizers (QOs) are performance-critical components of database management systems (DBMS) as they pick efficient physical query plans for input declarative queries. At the same time, QOs are notoriously hard to develop, fine-tune, and maintain. As a result, machine learning (ML) has been explored as a means to either improve the performance of an existing QO or to shorten the effort required to develop a new QO [44, 45, 57].

However, learned query optimizers have several potential drawbacks. From the perspective of a query optimizer developer, before applying a new method to the query optimizer, they need to understand two critical questions: 1) where does the improvement come from, and 2) when could the learned optimizer fail. As with all ML-based software, these two questions are difficult to answer for learned query optimizers. First, ML models have a black-box nature; their decisions and performance for different inputs are hard to explain. Second, learned QOs may overfit to the data and scenarios used during training, thus exhibiting weak robustness to workload variations. This is a classic problem with ML-based software, and while recent works attempt to resolve it using methods such as transfer learning, it remains challenging. Finally, current learned QOs have focused mainly on non-complex query structures, and it is unclear if they can generalize to complex SQL structures such as common table expressions (CTEs). Due to the above challenges, to the best of our knowledge, to date, no mainstream database engine (commercial or open source) has adopted a learned QO approach, which may indicate that there is some way to go before learned query optimizers go mainstream. However, a learned QO is not the only approach that aims for intelligent query processing.

Adaptive query processing is a different popular approach to efficient query processing. There is a rich body of work here, as these techniques have been studied by the database community for decades (e.g., [11, 19, 24, 25, 28, 33, 35, 37, 59]). An adaptive approach to query processing aims to use runtime methods to adjust the query processing steps to achieve high performance. In contrast to a learned QO approach, an adaptive query processing approach does not intrinsically overfit to any workload since there is no static training step. Moreover, adaptive query processing techniques have already been adopted by several DBMS products, including Oracle [15], SQL Server [56], Spark SQL [52], and SQLite [29].

An interesting observation is that philosophically there is a key similarity between a reinforcement learning (RL) based optimizer and adaptive query processing. Compared with a traditional QO, a learned QO utilizes neural network-based models that are trained
on latency observations over static training workloads to 1) refine cost estimation and 2) predict a join order and configuration of physical operators that will minimize the processing time of a query. In an adaptive query processing engine, such cost estimations are

[Figure legend residue: PostgreSQL, Bao, Balsa, LIP+AJA.]
• Although the two techniques LIP and AJA have been separately proposed before, we combine these two techniques in a unified adaptive query processing framework.
• We evaluate the LIP+AJA approach on the commonly used JOB [41] and Stack [44] benchmark queries. We show that this approach speeds up the JOB and Stack queries significantly and, in many cases, is better than what Balsa and Bao can achieve.
• We demonstrate the versatility of the LIP+AJA approach by applying it to six TPC-H queries that contain CTEs. A speedup of 1.4× was observed on this workload, which learned QOs have not been able to tackle to date.

2 BACKGROUND

This section introduces the necessary background and the terminology that is used throughout the paper.

2.1 Query Optimization

There are two critical optimization dimensions considered by most query optimizers: 1) join order, and 2) physical operators. Both dimensions largely affect the execution latency of the query plans.

Join Order. In a query plan with multiple equijoins (the focus of this paper and also the focus of previous work on ML-based QO), the join order defines the order in which the relations are equijoined. A good join order starts with the join operation that prunes most of the data and minimizes the data that is passed to subsequent joins. While optimizing the join order is critical to the efficiency of a query plan, obtaining a good join order is challenging given the large exponential search space and inaccurate cost estimation models [41].

Physical Operators. In a physical plan, the physical operators define the algorithms that are used to perform the logical operations. In this paper, we only consider the two important operators, namely scan and equijoin. There are two commonly supported physical operators for the scan operation: 1) index scan, and 2) sequential scan. An index scan can be more efficient when a large number of records are eliminated by the selection criteria on the indexed attribute; otherwise, a sequential scan is more efficient. Three algorithms are commonly used for the equijoin operation: 1) hash join, 2) nested loop join, and 3) merge join. Hash join can be more efficient for large relations; nested loop join can be a better choice for small relations; merge join can outperform the other two when the two input relations are sorted on the join key. The choice of physical operators is also critical when assembling an efficient query plan.

Cost Models. The goal of a query optimizer is to select a plan that minimizes the execution cost. Since the real execution cost cannot be retrieved upfront, a query optimizer uses a cost model to make an estimate, and uses a search algorithm to identify which choice of join order and physical join algorithms will minimize the query cost given the cost model [50]. However, the cost model often provides weak estimates of the true cost of a query plan due to inaccurate cardinality estimates, especially for intermediate results [41]. This phenomenon is especially pronounced when there are data correlations and/or dependencies in the underlying database. Thus, even an exhaustive exploration of the plan search space (using dynamic programming [8, 50]) can produce a suboptimal query plan with a high execution cost.

2.2 Reinforcement Learning in QO

Reinforcement Learning (RL) has recently become a popular foundation for proposals that aim to rethink query optimizers [44, 45, 57]. In this paper, we focus on two SOTA RL-based query optimizers: Balsa [57] and Bao [44].

In Balsa, states correspond to partial plans, and actions correspond to steps in building a query plan. An RL-based optimizer generates query plans in a bottom-up fashion: it starts with a single scan node and builds plans by adding new joins to the current node, constructing partial plans, until there are no tables left that can be added to the join. For example, consider a query that needs to join the tables {A, B, C}. The RL-based optimizer starts by considering performing a single scan over one of the tables A, B, or C. Given a chosen table, the next possible actions correspond to joining this table with one of the other tables. For example, if table A is chosen first, the next actions can be A ⋈ B or A ⋈ C. The search proceeds in an iterative manner until all the desired join conditions are met.

At the core of Balsa lies a cost model powered by a neural network (NN). The input of the NN is a (partial) physical query plan with information on both the join order and the physical operators. Given this input, the NN outputs a real number that estimates the overall minimum query latency corresponding to the plan [57]. At each step during the search, partial plans are given to the learned cost model, and the cheapest partial plan is kept for further searching. The NN-based cost model is trained on true query latencies collected by executing physical query plans in the underlying DBMS. During training, at each iteration, the RL-based optimizer first generates plans for the training queries with the current cost model. The generated plans are then fed into the DBMS execution engine to retrieve the true execution cost. Finally, the NN-based cost model is re-trained/updated using the plans and their corresponding execution costs. RL-based query optimizers also balance exploitation (selecting the min-cost plan) and exploration (selecting an unseen plan) to learn a more accurate cost model.

As for Bao, although it also adopts an NN-based cost model that predicts the performance of query plans, the action space of Bao is restricted to a limited set of hints that steer the underlying query optimizer. These hints specify the usage of a combination of different join operators (hash, merge, and nested loop joins) and scan operators (sequential, index, and index-only scans) [4]. Bao predicts hints of physical operators for a given query and relies on the underlying query optimizer to select the join order. Given a query, Bao first builds query plans for each of the hints by passing the query with hints to the underlying query optimizer, and then uses the NN-based cost model to select a plan to execute. The training loop for Bao is similar to that of Balsa: it uses the actual execution cost of the query plan as the training ground truth, and it also strikes a balance between exploitation and exploration.
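To make the search loop concrete, the sketch below illustrates bottom-up plan construction driven by a learned cost estimate. This is our simplified illustration, not Balsa's code: predicted_latency is a toy stand-in for the NN cost model, plans are nested tuples, the search is greedy (beam width 1), and only left-deep plans are enumerated.

def predicted_latency(plan):
    # Toy stand-in for the learned cost model: maps a (partial) plan to an
    # estimated latency. A real system would use a trained neural network.
    if isinstance(plan, str):        # a base-table scan
        return 1.0
    left, right = plan
    return predicted_latency(left) + predicted_latency(right) + 0.5

def tables_in(plan):
    if isinstance(plan, str):
        return {plan}
    return tables_in(plan[0]) | tables_in(plan[1])

def greedy_bottom_up(tables):
    # Start from the scan the model predicts to be cheapest, then repeatedly
    # add the join with the lowest predicted latency (pure exploitation;
    # during training, Balsa-style optimizers also explore unseen plans).
    plan = min(tables, key=predicted_latency)
    while tables_in(plan) != set(tables):
        candidates = [(plan, t) for t in tables if t not in tables_in(plan)]
        plan = min(candidates, key=predicted_latency)
    return plan

print(greedy_bottom_up(["A", "B", "C"]))   # e.g., (('A', 'B'), 'C')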
3 ANALYZING LEARNED QUERY OPTIMIZERS

Given a declarative query, the optimization space considered by learned query optimizers is 1) the order in which joins are evaluated and 2) the physical operators. To better understand the individual impact of these two dimensions, we conduct an ablation study on the predictions of the learned query optimizer. For learned query optimizers, these optimization decisions can be made either directly (Balsa) or indirectly through hints provided to the underlying optimizer (Bao). Since Bao only focuses on steering hints for physical operators and relies on the underlying query optimizer for join orders, we choose to use Balsa for our study in this section as it covers the entire optimization space.

Table 1: Test query sets from JOB, Stack, and TPC-H.

Workload         # of queries   avg. # of joins   min. # of joins   max. # of joins
JOB-Join-1       23             4                 3                 4
JOB-Join-2       18             6                 5                 6
JOB-Join-3       21             7                 7                 7
JOB-Join-4       21             8                 7                 8
JOB-Join-5       21             11                10                11
JOB-Join-6       9              14                13                16
JOB-Rand         19             7                 4                 11
JOB-Slow         19             7                 4                 11
JOB-Rand-CV      19 × 5         -                 -                 -
Stack-Join-1     1,172          3                 3                 3
Stack-Join-2     1,108          5                 5                 5
Stack-Join-3     2,409          6                 6                 6
Stack-Join-4     1,202          7                 7                 7
Stack-Join-5     100            8                 8                 8
Stack-Join-6     200            10                10                11
Stack-Rand       1,238          6                 3                 11
Stack-Slow       1,238          6                 3                 11
Stack-Rand-CV    1,238 × 5      -                 -                 -
TPC-H-Complex    6              4                 3                 8

3.1 Where do the Improvements Come from?

Balsa jointly predicts join orders and physical operators, and both predictions contribute to better query plans. To uncover the individual contributions of these dimensions, we perform an ablation study over each dimension by considering the following configurations:

(1) We consider using the exact query plan generated by Balsa, i.e., we use both the predicted join order and the physical operators in the generated plan; we refer to this configuration as Balsa in the figures and the discussions that follow.
(2) We consider using only the join order that is predicted by Balsa, and couple that decision with the physical join algorithms that are chosen by the original PostgreSQL optimizer; we refer to this configuration as Balsa-JO.

Our hypothesis is that if Balsa-JO is as good as Balsa, then the join order selection contributes more to the performance gains. Otherwise, picking the join algorithm is the more critical decision.

Benchmarks and Metrics. We evaluate Balsa on two workloads: 1) JOB [41] and 2) Stack [44]. We follow the analysis setup presented in the Balsa work [57] and consider two train-test splits of each workload: 1) randomly selected queries as the test set, and 2) the slowest queries (in terms of PostgreSQL run times) as the test set. Specifically, for JOB, we use JOB-Rand and JOB-Slow as described in Section 1. As for Stack, we create two similar train-test splits: 1) Stack-Rand and 2) Stack-Slow. In Stack-Rand, we randomly select 1,238 queries as the test set and use the remaining ones as training queries. For Stack-Slow, we use the 1,238 queries that are slowest in PostgreSQL as the test queries, and the remaining queries are used to train the learned QOs. Table 1 summarizes these four workloads (along with other workloads introduced later in Section 6).

Given that different initializations of the Balsa model can result in diverse workload performance, we conduct five independent training runs of Balsa using distinct random initializations. In the following experiments, we report the mean workload run times of the five different predicted query plan sets, along with the standard errors of the mean workload run times.

Training Setups. Similar to [57], we use Balsa with PostgreSQL as our test framework. We explore four configurations:

• Balsa-overfit. In this setup, all queries in the workload (all 113 queries in the JOB benchmark [41], or all 6,191 queries in the Stack benchmark [44]) are added to the training set. We use the trained model to predict the plans (both physical join algorithms and join orders) for the queries in the test set. With this setup, we seek to evaluate Balsa's memorization performance and the best-case improvements that it can provide for fixed query workloads.
• Balsa-overfit-JO. In this setup, we use the same model as Balsa-overfit, i.e., a model trained on all queries, but we only consider the predicted join order. Then, we use the PostgreSQL optimizer to select the join algorithms. This configuration allows us to understand the contribution of the join order in isolation.
• Balsa. This setup is the original Balsa configuration. The model is trained only on the training portions of the JOB-Rand and the JOB-Slow workloads.
• Balsa-JO. This configuration uses the learned Balsa model obtained by the previous configuration but, again, we only consider the predicted join order provided by Balsa and use the PostgreSQL optimizer to select the join algorithms.

For all settings of Balsa, we limit the wall-clock time to 10 hours (including training, planning, and execution) for all JOB train-test splits, and 100 hours for all Stack train-test splits. We use the model after the last iteration to predict the query plans. We choose this cutoff time as it matches the one used in the Balsa paper [57]; we also use the implementation provided by the authors [5].

Results. The results on the four workloads (JOB-Rand, JOB-Slow, Stack-Rand, and Stack-Slow) are shown in Figure 2. For JOB-Rand (Figure 2a), Balsa-overfit improves the workload execution time over PostgreSQL by 2.5× (15.6s saved) when the tested queries are also present in the training set. If we only apply the predicted join orders, Balsa-overfit-JO yields a comparable speedup of 2.1× (12.2s saved) over PostgreSQL. This result highlights that when the queries are in the training set, the join order predicted by Balsa's RL model is the critical decision that improves execution time.

When trained and tested on different queries, Balsa has a speedup of 1.5× (8.6s saved). As expected, the improvement is lower than that of Balsa-overfit. If we only use the join orders generated by the learned model, i.e., we use the Balsa-JO configuration, we observe a similar average runtime as the original PostgreSQL, with larger standard errors. In addition, the performance gap between the Balsa
[Figure residue: legends "PostgreSQL, Balsa-overfit-JO, Balsa-JO" and "PostgreSQL, Balsa"; panels over JOB-Join-1 through JOB-Join-6.]
[Figure 4: Adaptive query processing with LIP+AJA. Planning and execution: Step 1 initializes and builds the bloom filters; Step 2 executes the query and produces the query results.]

cost model to estimate plan costs, adaptive query processing optimizes joins and physical operators using statistics collected at runtime. Since adaptive query processing is not trained, it intrinsically does not overfit to any query pattern or data set.

While there are a large number of adaptive query processing techniques in the community (e.g., [2, 21, 26, 33, 38]), in this paper, we take a relatively simple approach and pick two techniques to build a rudimentary adaptive query processing framework. Our goal is not to build and evaluate a comprehensive adaptive query processing framework (that is an orthogonal research topic), but to show how a simple adaptive framework compares to learning-based QO approaches. The simple adaptive query processing framework that we consider in this paper combines two existing adaptive mechanisms: Lookahead Information Passing (LIP) and Adaptive Join Algorithm (AJA).

4.1 The LIP+AJA Adaptive Framework

Figure 4 shows an overview of our proposed LIP+AJA framework. This framework takes a physical query plan as input and optimizes the execution of that plan. The input is a plan that can be generated by any query optimizer (learned or not). For all experiments, we use the PostgreSQL optimized plans as input to our adaptive framework, except for the experiments discussed in Section 6.3 where LIP+AJA uses plans that are generated by Balsa and Bao, and for the experiment discussed in Section 6.4 where random plans are fed into LIP+AJA.

For a given execution plan, LIP+AJA first uses LIP to rewrite the execution plan by injecting bloom filters in the equijoin portion of the pipeline. Next, it replaces all physical join operators with adaptive join operators (AJA). Finally, it executes the rewritten plan using an adaptive execution strategy.

LIP generalizes the well-known semijoin technique to optimize an equijoin pipeline [59]. The main idea behind LIP is that a good join order applies the most selective join operation first, reducing the tuples that are passed to subsequent join operations. Likewise, LIP uses lookahead bloom filters [59] to push down the selection predicates in the equijoin pipeline, making the query execution insensitive to the join order [59].

Specifically, during planning, for each selection predicate σi on Ti in the query plan, LIP examines whether there are tables Tj that equijoin with Ti on attribute aij in the "lower levels" of the query plan. If such Tj tables exist, then a lookahead bloom filter σBFi is planned to push down σi to Tj. During execution, the bloom filter σBFi is built by scanning σi(Ti) and adding the set of join keys aij to σBFi. Later, when the DBMS execution engine is processing the join on Tj, each join key aij in Tj is tested for membership in σBFi. If there are multiple bloom filters σBFi on the same table, LIP has an adaptive mechanism that reorders the bloom filters based on the collected runtime statistics [59]. This adaptive reordering mechanism makes the bloom filter probing order converge to the optimal join order over time. For further details about LIP, we refer the reader to [29, 48, 59].
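As an illustration of this mechanism, the following sketch mimics filter building and adaptive probe reordering with plain Python data structures. It is our simplification of the scheme in [59]: sets stand in for bloom filters, tables are lists of dicts, and the reordering policy simply sorts filters by their observed pass rate.

def build_lookahead_filter(table, predicate, join_key):
    # "Bloom filter" over the join keys of the rows satisfying the predicate;
    # a real implementation uses a bloom filter, not an exact set.
    return {row[join_key] for row in table if predicate(row)}

def probe_with_filters(fact_table, filters, reorder_every=1000):
    # Scan the fact table, dropping rows rejected by any filter. Filters are
    # periodically re-sorted so the most selective one (lowest observed pass
    # rate) is probed first.
    stats = [{"probes": 1, "passes": 1} for _ in filters]  # avoid div by zero
    order = list(range(len(filters)))
    out = []
    for i, row in enumerate(fact_table):
        if i % reorder_every == 0:  # adaptive reordering step
            order.sort(key=lambda j: stats[j]["passes"] / stats[j]["probes"])
        keep = True
        for j in order:
            key, bf = filters[j]
            stats[j]["probes"] += 1
            if row[key] in bf:
                stats[j]["passes"] += 1
            else:
                keep = False
                break   # row pruned; the remaining filters are not probed
        if keep:
            out.append(row)
    return out

# Toy usage: push two selections down to the fact table.
dim1 = [{"id": k, "a": k % 2} for k in range(10)]
dim2 = [{"id": k, "b": k % 3} for k in range(10)]
fact = [{"fk1": k % 10, "fk2": (k * 7) % 10} for k in range(100)]
f1 = ("fk1", build_lookahead_filter(dim1, lambda r: r["a"] == 0, "id"))
f2 = ("fk2", build_lookahead_filter(dim2, lambda r: r["b"] == 0, "id"))
print(len(probe_with_filters(fact, [f1, f2])))   # -> 20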
In our adaptive query processing framework, all joins are executed using an adaptive join algorithm (AJA) that selects the join algorithm on-the-fly to avoid a bad join algorithm selection. AJA starts the join operation using a hash join algorithm by default, and if the number of keys inserted in the hash table is smaller than a threshold T, AJA switches to a faster nested loop join algorithm [2, 56].

There are two nested loop join strategies that AJA considers: 1) simple (naive) nested loop join, and 2) index nested loop join [30]. Compared to a simple nested loop join, an index nested loop join utilizes the index built on the inner table's join key to avoid exhaustively scanning the inner table. Given the different complexity of the two nested loop join algorithms, we apply two different values of the threshold T (T = Vind or T = Vnonind) depending on the index availability on the inner-side join key. If an index is available on the inner table's join key and the built hash table is smaller than the threshold T = Vind, AJA switches to an index nested loop join, using the rows in the built hash table as the outer loop and the other table with the index as the inner loop. Otherwise, if there is no such index, AJA uses the threshold T = Vnonind to determine whether to switch the original hash join to a simple nested loop join.

The pseudocode for AJA is shown in Algorithm 1.
Algorithm 1: Adaptive join algorithm (AJA)
Input: join components R1, R2; threshold values Vind, Vnonind
Output: join result Ro
1. hash_table = HashBuild(R1)  // Build the hash table on R1.
2. if an index is available on R2 and |hash_table| ≤ Vind:
3.     Ro = IndexNestedLoop(hash_table, R2)
4. else if no index is available on R2 and |hash_table| ≤ Vnonind:
5.     Ro = SimpleNestedLoop(hash_table, R2)
6. else:
7.     Ro = HashProbe(R2, hash_table)  // Continue with the hash join.
8. return Ro

AJA takes two join components R1, R2 (tables or intermediate results), along with the two threshold values (Vind and Vnonind) as inputs. The two threshold values Vind and Vnonind are universal for the underlying database. As in SQL Server 2019 [2, 56], AJA starts by operating as a hash join operator. It first builds a hash table (Line 1). If there exists an index on the join key of R2 and the number of keys in the hash table (|hash_table| in Line 2) is smaller than the threshold T = Vind, it switches to an index nested loop join (Lines 2 and 3). In the index nested loop join, the outer loop is built using the existing hash table to avoid rescanning R1. If there is no such index on R2, the threshold value Vnonind is used to determine whether to switch to a simple nested loop join (Lines 4 and 5). Finally, if AJA decides not to switch to either type of nested loop join, it continues the hash join by probing the hash table with the rows in R2 (Line 7).

The two values Vind and Vnonind of the threshold T are system dependent and differentiate the costs between the hash join and the two flavors of nested loop join. We use a grid search method with a 1-join query template to determine the two values. We present how we pick the threshold values in Section 5.3.
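As a companion to Algorithm 1, here is a minimal executable sketch of the same control flow. This is our transcription, not the system's code: Python dicts and lists stand in for the storage layer, index_on_r2 is assumed to be a prebuilt mapping from join key to rows, and the default thresholds are the cut-off values derived in Section 5.3.

def adaptive_join(r1, r2, key1, key2, index_on_r2=None,
                  v_ind=500_000, v_nonind=1):
    # Line 1: build the hash table on R1.
    hash_table = {}
    for row in r1:
        hash_table.setdefault(row[key1], []).append(row)

    if index_on_r2 is not None and len(hash_table) <= v_ind:
        # Lines 2-3: index nested loop join. The built hash table serves as
        # the outer loop (avoiding a rescan of R1); the index on R2's join
        # key serves as the inner lookup.
        return [(lrow, rrow)
                for key, lrows in hash_table.items()
                for lrow in lrows
                for rrow in index_on_r2.get(key, [])]

    if index_on_r2 is None and len(hash_table) <= v_nonind:
        # Lines 4-5: simple nested loop join; with at most v_nonind distinct
        # keys on the build side, exhaustively scanning R2 is acceptable.
        outer = [row for rows in hash_table.values() for row in rows]
        return [(lrow, rrow) for lrow in outer for rrow in r2
                if lrow[key1] == rrow[key2]]

    # Lines 6-7: continue as a hash join, probing with the rows of R2.
    return [(lrow, rrow) for rrow in r2
            for lrow in hash_table.get(rrow[key2], [])]

With Vnonind = 1, the simple nested loop path is taken only when the build side holds a single distinct key, matching the cut-off established in Section 5.3.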
4.2 Discussion

Compared to learned query optimizers, the adaptive LIP+AJA approach does not overfit to any data or query workload. The only input signals to LIP+AJA are the statistics collected during query execution, including the selectivities of the bloom filters (LIP) and the number of keys added to hash tables (AJA). This adaptive framework also provides potential robustness to bad input query plans: First, for a sub-optimal join order, LIP reduces data movement by pushing down a semijoin filter to a lower level in the query plan, and this method has been shown to be robust to a variety of input join orders [48, 59]. Second, AJA is robust to the original join algorithm selection since it picks the join algorithm on the fly. We experimentally demonstrate the robustness of LIP+AJA in Sections 6.3 and 6.4. In addition, since the bloom filter only uses an atomic bit-setting primitive in the construction phase and is read-only in the probe phase, LIP is fully parallelizable. Thus, LIP is compatible with parallel query execution plans.

We note that LIP+AJA also introduces runtime overhead. Specifically, LIP introduces the overhead of building and probing bloom filters, and AJA introduces the overhead of building and rescanning a hash table. In practice, these overheads are small, as described in Section 5.4.

5 IMPLEMENTATION

Although LIP+AJA can be integrated into any DBMS, we chose PostgreSQL as the target DBMS as it is also the platform used in many previous works on learned QO. We implement LIP as a PostgreSQL extension and perform a proof-of-concept simulation of AJA. This section introduces the implementation and simulation details. We also analyze the overheads introduced by LIP+AJA.

5.1 LIP as a PostgreSQL Extension

We implemented LIP as a PostgreSQL extension. The input to LIP is the optimized plan from PostgreSQL. The LIP extension consists of multiple user-defined C PostgreSQL functions [9] to initialize, build, and probe the bloom filters. Table 2 lists the key functions that are used in the LIP extension. To execute queries with LIP, we rewrite the queries into two SQL blocks with these functions.

Table 2: Key functions supported in the PostgreSQL extension of LIP (signatures as used in Figure 5).

Function                     Description
pg_lip_bloom_init(n)         Initialize n bloom filters in shared memory
pg_lip_bloom_add(i, key)     Add a join key to bloom filter i
pg_lip_bloom_probe(i, key)   Probe bloom filter i; returns a boolean

Bloom Filter Building Block. During initialization, the number of bloom filters needed (cf. Section 4.1) for the query at hand is decided, and the memory space is declared in the corresponding shared memory (using the pg_lip_bloom_init function). LIP uses shared memory to store the bloom filters since PostgreSQL supports parallel query plans (for example, parallel sequential scan and parallel hash join) and may start multiple workers that use the bloom filters at the same time. We use the method outlined in [18] to pick the parameters for our bloom filters. We feed into the model in [18] a false positive rate threshold of 0.01 and a key threshold of 10^7, and get an optimized bloom filter configuration, which is 11 MiB in size and uses seven hash functions. To build a bloom filter for a table, we scan the table once and add the valid keys to the bloom filter using the function pg_lip_bloom_add, which internally uses MurmurHash2 as the hash function.
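The reported configuration can be checked against the standard bloom filter sizing formulas that underlie [18]; the following is a worked calculation (ours, not code from the extension):

import math

n = 10**7    # key threshold: maximum number of inserted keys
p = 0.01     # target false positive rate

m = -n * math.log(p) / (math.log(2) ** 2)   # optimal number of bits
k = (m / n) * math.log(2)                   # optimal number of hash functions

print(f"{m / 8 / 2**20:.1f} MiB, {math.ceil(k)} hash functions")
# Prints "11.4 MiB, 7 hash functions", in line with the configuration above.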
Query Execution Block. We rewrite an input query using the probing function pg_lip_bloom_probe as a predicate. For a given key, it probes the bloom filter and returns a boolean value.

Example. We use Query 17a from the JOB benchmark to illustrate the use of the LIP extension functions. Figure 5 shows the original query and the LIP-rewritten query. To build the bloom filters, we add a SQL block that consists of queries that build the bloom filters using the functions pg_lip_bloom_init and pg_lip_bloom_add. Then, the original query is rewritten using subqueries that probe the bloom filters with the pg_lip_bloom_probe function.

Figure 5: The original JOB Query 17a (left) and its LIP rewrite (right).

Original query:

SELECT MIN(n.name) AS member_in_charnamed_american_movie,
       MIN(n.name) AS a1
FROM cast_info AS ci,
     company_name AS cn,
     keyword AS k,
     movie_companies AS mc,
     movie_keyword AS mk,
     name AS n,
     title AS t
WHERE cn.country_code = '[us]'
  AND k.keyword = 'character-name-in-title'
  AND n.name LIKE 'B%'
  AND n.id = ci.person_id AND ci.movie_id = t.id
  AND t.id = mk.movie_id AND mk.keyword_id = k.id
  AND t.id = mc.movie_id AND mc.company_id = cn.id
  AND ci.movie_id = mc.movie_id
  AND ci.movie_id = mk.movie_id
  AND mc.movie_id = mk.movie_id;

LIP-rewritten query, Execution Step 1: building the bloom filters:

SELECT pg_lip_bloom_init(3);
SELECT sum(pg_lip_bloom_add(0, id))
FROM keyword AS k WHERE k.keyword = 'character-name-in-title';
SELECT sum(pg_lip_bloom_add(1, id))
FROM company_name AS cn WHERE cn.country_code = '[us]';
SELECT sum(pg_lip_bloom_add(2, id))
FROM name AS n WHERE n.name LIKE 'B%';

Execution Step 2: query execution:

SELECT MIN(n.name) AS member_in_charnamed_american_movie,
       MIN(n.name) AS a1
FROM (SELECT * FROM cast_info
      WHERE pg_lip_bloom_probe(2, person_id)) AS ci,
     company_name AS cn,
     keyword AS k,
     (SELECT * FROM movie_companies
      WHERE pg_lip_bloom_probe(1, company_id)) AS mc,
     (SELECT * FROM movie_keyword
      WHERE pg_lip_bloom_probe(0, keyword_id)) AS mk,
     name AS n,
     title AS t
WHERE cn.country_code = '[us]'
  AND k.keyword = 'character-name-in-title'
  AND n.name LIKE 'B%'
  AND n.id = ci.person_id AND ci.movie_id = t.id
  AND t.id = mk.movie_id AND mk.keyword_id = k.id
  AND t.id = mc.movie_id AND mc.company_id = cn.id
  AND ci.movie_id = mc.movie_id
  AND ci.movie_id = mk.movie_id
  AND mc.movie_id = mk.movie_id;

5.2 Optimization Rules of LIP

Not all bloom filters that LIP builds are equally effective. Turning off ineffective bloom filters improves query execution time, as both building and probing the filters introduce overhead. The effectiveness of a bloom filter depends on the number of rows pruned, and this benefit has to be balanced with the overhead that is incurred in building and probing the bloom filter. While these decisions can be more tightly integrated within the core data processing engine (e.g., as was done in [48]), given our implementation that is based on changing PostgreSQL using only the external query rewrite and extensibility mechanisms (as is also the approach taken in ML-based QO), we use two heuristics to aid our implementation.

Optimizing Bloom Filter Building. First, if the predicate σi used to build the bloom filter cannot prune more than bf_build_pr of the join keys (according to the cardinality estimator in PostgreSQL), then we turn off the bloom filter σBFi; i.e., such bloom filters are deemed ineffective, and they are simply not constructed. Second, if a bloom filter σBFi cannot be pushed at least one level down in the input query pipeline, we turn off that filter. Collectively, these two heuristics prune bloom filters that are unlikely to be effective.

Optimizing Bloom Filter Probing. While we may choose to build a bloom filter σBFi for a table Ti if σi is selective, when we apply σBFi to the next table Tj in the join order, the filter σBFi may not prune sufficiently many rows. Hence, paying the cost to probe the filter σBFi may not result in improved performance. To ameliorate this situation, we introduce an optimization heuristic that adaptively disables the probing of specific bloom filters at runtime. During query processing, we measure the pruning rate of each filter on the first valid_rows rows that it scans. Any filter that prunes less than a bf_probe_pr fraction of the input rows is deactivated. This decision is remembered for the next 10 × valid_rows input rows that are scanned, after which the pruning rate of the filter is re-evaluated, and the filter is re-activated if needed. To further optimize the probing performance, if multiple bloom filters are active on a single table, we rearrange the order in which they are probed, prioritizing the most selective filters first to minimize the number of bloom filter probes [59].

Discussion. The hyper-parameters bf_build_pr, bf_probe_pr, and valid_rows determine which lookahead bloom filters are built and probed. Intuitively, increasing bf_build_pr and bf_probe_pr results in fewer bloom filters being built and probed, leaving the most "effective" bloom filters in place. Decreasing the two constants loosens the restriction on the bloom filters, and more bloom filters will be built and probed. The appropriate setting for bf_build_pr and bf_probe_pr depends on the efficiency of the LIP implementation. Since our first version of LIP still has room for further optimization (see Section 5.4), we aggressively set bf_build_pr to 0.9 and bf_probe_pr to 0.1. For the parameter valid_rows, we set it to a value of 1,000 for all our experiments.
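A sketch of this runtime gating logic is shown below. The class name and structure are ours for illustration; in the actual extension this bookkeeping lives inside the C probe function, and bf_probe_pr and valid_rows are the hyper-parameters just discussed.

class ProbeGate:
    # Adaptively enables/disables the probing of one bloom filter: measure
    # the pruning rate over a window of valid_rows rows; deactivate a filter
    # that prunes less than bf_probe_pr of its input; revisit the decision
    # after 10 * valid_rows further rows.
    def __init__(self, bf_probe_pr=0.1, valid_rows=1000):
        self.bf_probe_pr = bf_probe_pr
        self.valid_rows = valid_rows
        self.seen = 0          # rows seen in the current window
        self.pruned = 0        # rows pruned in the current window
        self.active = True
        self.sleep_for = 0     # rows to skip while deactivated

    def probe(self, hit):
        # `hit` is the bloom filter's membership answer for the row's join
        # key; returns False when the row should be pruned.
        if not self.active:
            self.sleep_for -= 1
            if self.sleep_for <= 0:              # re-evaluation point
                self.active, self.seen, self.pruned = True, 0, 0
            return True                          # filter bypassed
        self.seen += 1
        if not hit:
            self.pruned += 1
        if self.seen == self.valid_rows:         # end of measurement window
            if self.pruned < self.bf_probe_pr * self.seen:
                self.active = False              # too weak: deactivate
                self.sleep_for = 10 * self.valid_rows
            else:
                self.seen = self.pruned = 0      # keep measuring afresh
        return hit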
5.3 Simulating AJA

Algorithm 1 shows the adaptive join algorithm following SQL Server [2, 56]. In Algorithm 1, the hash table must be fully populated. For the simulation, we first execute the query with a hash join algorithm to determine the actual cardinality of each hash table. Beyond the cardinality computation step, our simulation follows the same steps as Algorithm 1: if there is an index and the true cardinality collected is smaller than Vind, we use an index nested loop join algorithm; if there is no such index available and the true cardinality collected is smaller than Vnonind, we apply a simple nested loop join algorithm; otherwise, AJA uses a hash join algorithm. We specify the physical join algorithm in PostgreSQL by adding execution hints to the query [1]. Since both the above simulation and a deeper implementation of AJA use the same true cardinality, the only runtime difference corresponds to the overhead of building (Line 1) and re-scanning the hash table (Line 3 or 5). We analyze this overhead in Section 5.4.

Determining the Threshold Values Vind and Vnonind. In Algorithm 1, the two values Vind and Vnonind of the threshold T determine when AJA switches from a hash join algorithm to one of the nested loop join algorithms. As Vind and Vnonind are relevant to the computation costs of the join operators, we perform a grid search using the simple select-join query template shown below over the IMDB data set [6]. In this template, we use the large fact table cast_info as R2, which allows the generated threshold T to be "safe" for large tables. In practice, for data sets other than IMDB, this 1-join template can be created accordingly using the two largest fact tables as R1 and R2.

WITH R1 AS (SELECT * FROM title LIMIT n)
SELECT * FROM R1, cast_info AS R2
WHERE R1.id = R2.movie_id;

Using the template, we generate queries with varying values of the parameter n. Each query is run using a hash join algorithm and the two nested loop join algorithms, index nested loop join and simple nested loop join (we construct the appropriate index on the underlying inner table to evaluate the index nested loop join option). In practice, by performing a grid search on n, we establish the two cut-off values: Vind = 500,000 and Vnonind = 1.
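The grid search can be scripted along these lines. This is a sketch under our assumptions: time_query is a hypothetical helper that instantiates the template for a given n, forces the join method via a hint (e.g., using pg_hint_plan [1]), runs the query, and returns its latency in milliseconds.

def find_cutoff(time_query, sizes, nl_method):
    # Return the largest n (from the sorted candidate sizes) at which the
    # nested loop variant still beats the hash join; this is the cut-off
    # used as the threshold T. Assumes a single crossover point.
    cutoff = None
    for n in sorted(sizes):
        if time_query(n, method=nl_method) <= time_query(n, method="hashjoin"):
            cutoff = n                # nested loop still wins at this size
        else:
            break                     # hash join has taken over
    return cutoff

# Hypothetical driver:
# v_ind = find_cutoff(time_query, [10**i for i in range(1, 8)], "index_nestloop")
# v_nonind = find_cutoff(time_query, range(1, 100), "nestloop")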
5.4 Analyzing the Overheads

In addition to speeding up query execution, applying LIP and AJA may also introduce overheads to the query execution. In this section, we describe these overheads in detail.

Overheads associated with LIP. There are two sources of overhead associated with the LIP mechanism: 1) the cost to build the bloom filters, and 2) the cost associated with probing the bloom filters. During the bloom filter building phase, we scan the input tables and populate the bloom filters with join keys. This overhead grows linearly with the number of keys added.

During the probing phase, LIP calculates the hash of the key and checks the bloom filter for a hit. This overhead increases linearly with the number of probes. If the bloom filter is selective, fewer tuples will pass through the pipeline, reducing the effect of this overhead. The heuristic proposed in Section 5.2 aims to disable ineffective filters and reorder the enabled ones. The total building and probing overheads of LIP account for 26% of the LIP+AJA query execution time in the JOB-Rand workload and 18% in the JOB-Slow workload (cf. Figure 1).

Overheads of AJA. As shown in Algorithm 1, AJA introduces overheads if it decides to switch the join algorithm. The overhead comes from building a hash table that is not used for the join, and the cost to rescan the hash table to start the nested loop join algorithm.

We estimate these overheads as follows: First, the hash table building phase can be pipelined with the scan phase [7], resulting in a low latency impact, never exceeding two milliseconds in our experiments. Thus, the estimated hash table building overhead (in milliseconds) is 2 × 1{AJA→NL}, where 1{AJA→NL} indicates whether AJA switches from a hash join to a nested loop join. Additionally, we measured the scanning cost of an in-memory hash table, finding that PostgreSQL scans approximately 4,000 rows per millisecond (in a single thread). If the total number of keys in the hash table is n, then the estimated hash table scanning overhead is (n/4000) × 1{AJA→NL}. Thus, the total overhead is (2 + n/4000) × 1{AJA→NL}. This overhead is low in practice since we only switch the join algorithm when n is small. For example, on the JOB queries, the AJA overhead accounts for less than 1% of the query execution time.
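As a worked check of this estimate (plain arithmetic over the constants measured above):

def aja_switch_overhead_ms(n_keys, switched):
    # Estimated AJA overhead: ~2 ms to build the (ultimately unused) hash
    # table, plus n/4000 ms to rescan its n keys; paid only when AJA
    # actually switches to a nested loop join.
    return (2 + n_keys / 4000) * (1 if switched else 0)

print(aja_switch_overhead_ms(8_000, switched=True))    # 4.0 (ms)
print(aja_switch_overhead_ms(8_000, switched=False))   # 0.0: no switch, no cost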
5.5 Discussion

As noted earlier, we have a simple implementation of LIP+AJA, which follows the philosophy of "simplicity in implementation" of previous papers on RL-based QO [44, 45, 57]. Nevertheless, there are opportunities to make this implementation more efficient. In our implementation, the bloom filter construction overhead is high, as the input tables are scanned twice and the bloom filter probe functions are not efficient (cf. Section 5.4). A more sophisticated implementation directly in the core code of PostgreSQL could reduce these overheads. In addition, although our initial simulation of AJA provides a framework to carry out our proof-of-concept evaluations, implementing AJA as another join method in PostgreSQL could make it more general.

6 EVALUATION

In this section, we experimentally answer the central question of this paper: How do the SOTA RL-based query optimizers compare to adaptive query processing with LIP+AJA? Our key finding is that LIP+AJA is not only comparable, but it also outperforms the RL-based query optimizers in many cases. In addition, we also evaluate:

(1) Can LIP+AJA further optimize the plans generated by learned query optimizers? See Section 6.3.
(2) How robust is LIP+AJA to random, possibly bad, query plans? See Section 6.4.
(3) How does LIP+AJA perform on queries with subplans? See Section 6.5.

6.1 Experimental Setup

Baselines. We use two representative but different RL-based query optimizers, Balsa and Bao, with PostgreSQL as our baseline methods. For Balsa and Bao, we use the source code provided by the authors [3, 5]. For JOB, we limit the wall-clock training time to 10 hours for both Balsa and Bao, including the time to train the neural networks, optimize the queries, and execute the workload. We selected these time limits as similar training times are reported in [57]. As for Stack, since there is a larger number of queries in the workload, we increase the training time limit to 100 hours.

Our Method. For LIP+AJA, we use the query plans generated by the PostgreSQL query optimizer as input unless otherwise specified. We report the end-to-end execution time of LIP+AJA, including all the overheads introduced by LIP+AJA (cf. Section 5.4). As discussed in Section 4.2, LIP and AJA are compatible with parallel query execution plans. Therefore, we have enabled parallel plans in PostgreSQL and let PostgreSQL's optimizer determine the number of workers to be launched. Parallel execution as provided by PostgreSQL is also enabled for the learned QOs, Balsa and Bao.

Benchmarks. We use queries from three commonly used benchmarks, JOB [41], Stack [44], and TPC-H [49]. A summary of the test query sets is shown in Table 1.

JOB workloads. As in Section 3.1, we follow the analysis setup presented in Balsa [57] and evaluate query performance on 1) randomly selected queries (JOB-Rand) and 2) slow queries (JOB-Slow). In addition, to reduce the impact of randomness in JOB-Rand, we employ a Monte Carlo cross-validation analysis and create the JOB-Rand-CV workload. In JOB-Rand-CV, we randomly select 19 queries as the test set five times, and for each of the five test sets, the learned query optimizers are trained on the remaining queries.

Stack workloads. We also adopt the Stack workload [44] in our study. This query workload consists of more than six thousand queries.

For each of the train-test configurations (except TPC-H-Complex), Balsa and Bao are trained on the training part of the workload and tested on the queries in the test set. For LIP+AJA, we directly run the queries in the test set, as no training step is required in this case. Since current ML-based query optimizers are not able to tackle queries with complex SQL structures (e.g., CTEs) in the TPC-H-Complex workload, we only present results with LIP+AJA.

Metrics. We use the wall-clock query execution time as the main evaluation metric. We repeat the Balsa- and Bao-related experiments five times with different random initializations. We report the mean workload run times of the five different predicted query plan sets, along with the standard errors associated with the mean workload run times. For LIP+AJA, we also run the test workload five times and report the mean run time. Note that the run times reported for LIP+AJA include all the overheads described in Section 5.4.

System Configuration. We use a machine with 40 CPU cores and 250GB RAM for all experiments. We use an NVIDIA Tesla V100 GPU with 32GB of GPU memory to train the learned query optimizers (Balsa and Bao). We use a configuration of PostgreSQL similar to [41]: We use PostgreSQL 12.5 and set the memory limit per operator (work_mem) to 4GB, the buffer pool size (shared_buffers) to 4GB, and the buffer cache size (effective_cache_size) to 32GB. We allow at most eight parallel workers to work on a query. In addition, we disable the genetic query optimizer in PostgreSQL, allowing queries with a large number of joins to utilize the same query optimizer as smaller queries. To allow faster query execution, we pre-construct indices on the foreign key attributes in all datasets [41, 57].

6.2 Performance of LIP+AJA

We first evaluate the end-to-end performance of all methods on the JOB-Rand, JOB-Slow, Stack-Rand, and Stack-Slow workloads. In addition, we break down the JOB-Rand workload to show the query-by-query performance. Further, we evaluate Balsa, Bao, and LIP+AJA on different random sets of queries by performing a Monte Carlo cross-validation using JOB-Rand-CV and Stack-Rand-CV.

[Figure 6: Performance of LIP+AJA vs. learned QOs. Legend: PostgreSQL, Balsa-overfit, Bao-overfit, Balsa, Bao, LIP+AJA; panels (c) Stack-Rand, (d) Stack-Slow, (e) JOB-Rand-CV, (f) Stack-Rand-CV; y-axis: normalized run time.]

Performance. Figure 6 shows the performance of LIP+AJA, Balsa, and Bao. On the JOB-Rand workload, Balsa and Bao improve over PostgreSQL by 1.5× (8.6s saved) and 1.2× (3.6s saved), respectively, while LIP+AJA improves performance by 1.5× (8.5s saved). Balsa and LIP+AJA have similar performance on random queries.

Turning our attention to JOB-Slow, we observe that Balsa and Bao both improve over PostgreSQL by 1.4× on average (Balsa saved 26.5s, Bao saved 28.8s), while the improvement with LIP+AJA is 2.0× (48.9s saved). In addition, compared with the training performance of the learned query optimizers, LIP+AJA is comparable to Balsa-overfit (1.9×, 47.3s saved) and Bao-overfit (1.8×, 44.8s saved) on JOB-Slow.

As for Stack-Rand, LIP+AJA (2.5×, 2409s saved) has similar performance to Balsa-overfit (2.6×, 2605s saved) and Bao-overfit (2.3×, 2417s saved). However, LIP+AJA outperforms Balsa (2.3×, 1851s saved) and Bao (1.3×, 905s saved). In addition, for Stack-Slow, LIP+AJA (2.6×, 7877s saved) outperforms even the best-performing learned QO configuration, Balsa-overfit (1.7×, 5038s saved).

Performance Breakdown. To demonstrate the performance variations among different queries, we provide a runtime breakdown of the JOB-Rand workload. We compare three approaches, namely Balsa, Balsa-overfit, and LIP+AJA, with PostgreSQL, as depicted in Figure 7. Negative values indicate improved performance. The queries in JOB-Rand are sorted in descending order based on their PostgreSQL runtimes. Therefore, query 17e is the slowest query on PostgreSQL, while query 15b is the fastest query in Figure 7.

Among LIP+AJA, Balsa, and Balsa-overfit, the performance gain for the JOB-Rand workload is primarily in the slower queries (from 17e to 20a), which are also part of the JOB-Slow workload. However, for the faster queries (from 22c to 15b), Balsa-overfit achieves a total runtime of 5.48s, outperforming PostgreSQL (7.15s).
[Figure 7: Performance breakdown of JOB-Rand. Y-axis: relative run time compared to PostgreSQL; x-axis: individual JOB queries. Panels: (a) JOB-Rand, (b) JOB-Slow.]

[Figure 8: Applying LIP+AJA to plans from Balsa and Bao.]
has an average run time of 346 seconds, while LIP+AJA w/ random plan brings down the run time to 31 seconds. For JOB-Slow, LIP+AJA reduces the run time from 832 seconds to 75 seconds.

Takeaway. Although we do not exhaustively generate (the exponential number of) all random plans for the queries in the JOB-Rand and JOB-Slow workloads, the above results demonstrate the robustness of LIP+AJA to a variety of random input query plans.

6.5 Adaptive Query Processing on Subplans

Unlike learned query optimizers that have a static model architecture, LIP+AJA is more flexible and can be applied to queries that have complex SQL constructs. In this experiment, we test LIP+AJA on queries with complex subqueries using the TPC-H-Complex workload. Many of the queries in this workload contain an ORDER BY clause, making the latency sensitive to the order of the output rows. Since in the preliminary AJA (Algorithm 1) the order of the results may not be the same as originally planned (for example, if the join algorithm is switched to a nested loop join algorithm), applying AJA to order-sensitive queries may introduce extra reordering overhead. Thus, for this experiment, we let PostgreSQL's query optimizer choose the physical join operators, and we turn off the AJA mechanism when the query has an ORDER BY clause.

[Figure 10: Performance of LIP on TPC-H-Complex. Legend: PostgreSQL, LIP; y-axis: run time (seconds).]

Results. The results of this experiment are shown in Figure 10. For this workload, LIP improves over PostgreSQL by 1.4×.

Takeaway. LIP+AJA can be used on complex SQL query constructs, making it more broadly applicable than current RL-based QOs.

7 RELATED WORK

The shortcomings of traditional optimization techniques have been well documented and often stem from estimation errors [41]. Query optimizers use estimation models to predict the cardinality of (partial) query plans, estimate their cost, and choose the most efficient plan. The simplifying assumptions used to build estimation models fail to accurately capture real data distributions, leading query optimizers to rely on predictions with large errors.

Many approaches have been proposed to address these shortcomings. Feedback loops can enhance estimate quality by using observed statistics from previous queries [10, 21, 22, 47]. Another approach allows query plans to change during execution if the observed statistics differ significantly from the predicted statistics used at optimization time. A related set of techniques selectively re-optimizes sub-optimal plans/sub-plans while minimizing the cost of re-optimization [17, 20, 23, 32, 35, 37]. Adaptive query processing is a third line of work that moves certain optimization decisions to the execution layer of the database engine, making execution more efficient and robust [11, 12, 14, 19, 24, 28, 31, 33, 54, 59]. Adaptive query processing generally delays making decisions until some execution statistics have been collected, and thus it dramatically reduces the dependence on having accurate estimates upfront. The granularity of the adaptiveness varies and includes tuning the physical implementation of an operator (e.g., [2, 19, 31]) and making the entire equijoin pipeline more robust (e.g., [36, 59]). LIP and AJA are selected as two typical adaptive methods addressing orthogonal aspects: equijoin order selection and join operator algorithms. LIP can be seen as a special case of SIP [36] as it reduces the tuples from the fact table by using the classic semijoin technique [13, 16, 43, 55] to minimize unnecessary data movement. LIP is also similar to the algorithm proposed in [58], but the reduction step and the join step in LIP are interleaved. LIP also optimizes the bloom filters in the query execution pipeline by introducing adaptiveness in both the building and probing phases, which can significantly improve its performance. For further details about LIP and its connection to related work, we refer the reader to [59].

The recent success of ML has led to new ideas to improve query optimization. To mitigate the impact of inaccurate estimates, ML models have been proposed for cardinality estimation [27, 34, 39, 42, 51, 53]. These learned cardinality estimators still need to work with a query optimizer to generate complete physical plans. Reinforcement learning (RL) has recently become popular for end-to-end query optimization [40, 44-46, 54, 57]. DQ [40] and ReJOIN [46] use a neural network to learn the plan enumeration strategy. Bao [44] optimizes queries by learning the optimization flags to set for each input query. Neo [45] and Balsa [57] train neural network-based cost models and generate plans that minimize the estimated costs.

8 CONCLUSIONS

In this paper, we compare a specific, simple, adaptive query processing approach, LIP+AJA, with two SOTA RL-based query optimizers, Balsa and Bao. We show that LIP+AJA is not only comparable to the RL-based optimizers in terms of query execution performance, but is also better in many cases. In addition, the adaptive query processing approach LIP+AJA is able to optimize complex query constructs, which current RL-based query optimizers cannot tackle. Further, the adaptive approach does not require an expensive retraining step if the workload changes. Given the flexibility and effectiveness of adaptive approaches to query processing, our community should continue to acknowledge their appeal, and future ML-based query optimization proposals must consider comparing their approach with adaptive query processing techniques.

ACKNOWLEDGMENTS

This work was supported in part by DARPA under grant ASKEM HR001122S0005. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor. This work was also partly supported by funding from the Wisconsin Alumni Research Foundation. Additional support was provided by the National Science Foundation (NSF) under grants OAC-1835446 and CCF-2312739.
REFERENCES
[1] 2012. pg_hint_plan. Retrieved July 15, 2023 from https://pghintplan.osdn.jp/pg_hint_plan.html
[2] 2019. Adaptive join in Microsoft SQL Server. Retrieved July 15, 2023 from https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-batch-mode-adaptive-joins/ba-p/385411
[3] 2020. Bao source code repository. Retrieved July 15, 2023 from https://github.com/learnedsystems/BaoForPostgreSQL
[4] 2020. Hints used by Bao. Retrieved July 15, 2023 from https://rmarcus.info/appendix.html
[5] 2022. Balsa source code repository. Retrieved July 15, 2023 from https://github.com/balsa-project/balsa/
[6] 2022. IMDB dataset. Retrieved July 15, 2023 from https://www.imdb.com/interfaces/
[7] 2022. Overview of PostgreSQL Internals. Retrieved July 15, 2023 from https://www.postgresql.org/docs/15/executor.html
[8] 2022. Overview of PostgreSQL Query Optimizer. Retrieved July 15, 2023 from https://www.postgresql.org/docs/12/planner-optimizer.html
[9] 2022. PostgreSQL user-defined C functions. Retrieved July 15, 2023 from https://www.postgresql.org/docs/current/xfunc-c.html
[10] Ashraf Aboulnaga and Surajit Chaudhuri. 1999. Self-tuning histograms: Building histograms without looking at data. ACM SIGMOD Record 28, 2 (1999), 181-192.
[11] L. Amsaleg, A. Tomasic, M.J. Franklin, and T. Urhan. 1996. Scrambling query plans to cope with unexpected delays. In Fourth International Conference on Parallel and Distributed Information Systems. 208-219. https://doi.org/10.1109/PDIS.1996.568681
[12] Ron Avnur and Joseph M. Hellerstein. 2000. Eddies: Continuously Adaptive Query Processing. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (Dallas, Texas, USA) (SIGMOD '00). Association for Computing Machinery, New York, NY, USA, 261-272. https://doi.org/10.1145/342009.335420
[13] Edward Babb. 1979. Implementing a relational database by means of specialized hardware. ACM Transactions on Database Systems (TODS) 4, 1 (1979), 1-29.
[14] Shivnath Babu, Rajeev Motwani, Kamesh Munagala, Itaru Nishizawa, and Jennifer Widom. 2004. Adaptive Ordering of Pipelined Stream Filters. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (Paris, France) (SIGMOD '04). Association for Computing Machinery, New York, NY, USA, 407-418. https://doi.org/10.1145/1007568.1007615
[15] Pete Belknap, Ali Cakmak, Sunil Chakkappen, Immanuel Chan, Deba Chatterjee, Dinesh Das, Leonidas Galanis, Bruce Golbus, Shantanu Joshi, Tom Kyte, et al. 2013. Oracle Database SQL Tuning Guide, 12c Release 1 (12.1) E15858-15. (2013).
[16] Philip A Bernstein and Dah-Ming W Chiu. 1981. Using semi-joins to solve relational queries. Journal of the ACM (JACM) 28, 1 (1981), 25-40.
[17] Pedro Bizarro, Nicolas Bruno, and David J. DeWitt. 2009. Progressive Parametric Query Optimization. IEEE Transactions on Knowledge and Data Engineering 21, 4 (2009), 582-594. https://doi.org/10.1109/TKDE.2008.160
[18] Burton H Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7 (1970), 422-426.
[19] Renata Borovica-Gajic, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski, and Campbell Fraser. 2015. Smooth scan: Statistics-oblivious access paths. In 2015 IEEE 31st International Conference on Data Engineering. IEEE, 315-326.
[20] Nicolas Bruno and Rimma V Nehme. 2008. Configuration-parametric query optimization for physical design tuning. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 941-952.
[21] Chungmin Melvin Chen and Nick Roussopoulos. 1994. Adaptive selectivity estimation using query feedback. In Proceedings of the 1994 ACM SIGMOD international conference on Management of data. 161-172.
[27] Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, and Surajit Chaudhuri. 2019. Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment 12, 9 (2019), 1044-1057.
[28] Kwanchai Eurviriyanukul, Norman W. Paton, Alvaro A. A. Fernandes, and Steven J. Lynden. 2010. Adaptive Join Processing in Pipelined Plans. In Proceedings of the 13th International Conference on Extending Database Technology (Lausanne, Switzerland) (EDBT '10). Association for Computing Machinery, New York, NY, USA, 183-194. https://doi.org/10.1145/1739041.1739066
[29] Kevin P Gaffney, Martin Prammer, Larry Brasfield, D Richard Hipp, Dan Kennedy, and Jignesh M Patel. 2022. SQLite: Past, Present, and Future. Proceedings of the VLDB Endowment 15, 12 (2022), 3535-3547.
[30] Goetz Graefe. 1993. Query evaluation techniques for large databases. ACM Computing Surveys (CSUR) 25, 2 (1993), 73-169.
[31] Goetz Graefe. 2011. A generalized join algorithm. Datenbanksysteme für Business, Technologie und Web (BTW) (2011).
[32] G. Graefe and K. Ward. 1989. Dynamic Query Evaluation Plans. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data (Portland, Oregon, USA) (SIGMOD '89). Association for Computing Machinery, New York, NY, USA, 358-366. https://doi.org/10.1145/67544.66960
[33] Joseph M. Hellerstein, Michael J. Franklin, Sirish Chandrasekaran, Amol Deshpande, Kris Hildrum, Samuel Madden, Vijayshankar Raman, and Mehul A. Shah. 2000. Adaptive query processing: Technology in evolution. IEEE Data Eng. Bull. 23, 2 (2000), 7-18.
[34] Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2020. DeepDB: Learn from Data, not from Queries! Proceedings of the VLDB Endowment 13, 7 (2020), 992-1005.
[35] Yannis E Ioannidis, Raymond T Ng, Kyuseok Shim, and Timos K Sellis. 1997. Parametric query optimization. The VLDB Journal 6, 2 (1997), 132-151.
[36] Zachary G Ives and Nicholas E Taylor. 2008. Sideways information passing for push-style query processing. In 2008 IEEE 24th International Conference on Data Engineering. IEEE, 774-783.
[37] Navin Kabra and David J DeWitt. 1998. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proceedings of the 1998 ACM SIGMOD international conference on Management of data. 106-117.
[38] Srikanth Kandula, Laurel Orr, and Surajit Chaudhuri. 2019. Pushing data-induced predicates through joins in big-data clusters. Proceedings of the VLDB Endowment 13, 3 (2019), 252-265.
[39] Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2018. Learned cardinalities: Estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677 (2018).
[40] Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to Optimize Join Queries With Deep Reinforcement Learning. arXiv preprint arXiv:1808.03196 (2018).
[41] Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really? Proceedings of the VLDB Endowment 9, 3 (2015), 204-215.
[42] Henry Liu, Mingbin Xu, Ziting Yu, Vincent Corvinelli, and Calisto Zuzarte. 2015. Cardinality estimation using neural networks. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering. 53-59.
[43] Lothar F Mackert and Guy M Lohman. 1986. R* optimizer validation and performance evaluation for local queries. In Proceedings of the 1986 ACM SIGMOD international conference on Management of data. 84-95.
[44] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2022. Bao: Making learned query optimization practical. ACM SIGMOD Record 51, 1 (2022), 6-13.
[45] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A learned query optimizer. arXiv preprint arXiv:1904.03711 (2019).
[46] Ryan Marcus and Olga Papaemmanouil. 2018. Deep Reinforcement Learning for
[22] Chungmin Melvin Chen and Nick Roussopoulos. 1994. Adaptive Selectivity Join Order Enumeration. In Proceedings of the First International Workshop on
Estimation Using Query Feedback. In Proceedings of the 1994 ACM SIGMOD Inter- Exploiting Artificial Intelligence Techniques for Data Management. 1–4.
national Conference on Management of Data (Minneapolis, Minnesota, USA) (SIG- [47] Volker Markl, Guy M Lohman, and Vijayshankar Raman. 2003. LEO: An auto-
MOD ’94). Association for Computing Machinery, New York, NY, USA, 161–172. nomic query optimizer for DB2. IBM Systems Journal 42, 1 (2003), 98–106.
https://doi.org/10.1145/191839.191874 [48] Jignesh M Patel, Harshad Deshmukh, Jianqiao Zhu, Navneet Potti, Zuyu Zhang,
[23] Richard L. Cole and Goetz Graefe. 1994. Optimization of Dynamic Query Eval- Marc Spehlmann, Hakan Memisoglu, and Saket Saurabh. 2018. Quickstep: A data
uation Plans. In Proceedings of the 1994 ACM SIGMOD International Confer- platform based on the scaling-up approach. Proceedings of the VLDB Endowment
ence on Management of Data (Minneapolis, Minnesota, USA) (SIGMOD ’94). 11, 6 (2018), 663–676.
Association for Computing Machinery, New York, NY, USA, 150–160. https: [49] Meikel Poess and Chris Floyd. 2000. New TPC benchmarks for decision support
//doi.org/10.1145/191839.191872 and web commerce. ACM Sigmod Record 29, 4 (2000), 64–71.
[24] Amol Deshpande, Joseph M Hellerstein, et al. 2004. Lifting the burden of history [50] P Griffiths Selinger, Morton M Astrahan, Donald D Chamberlin, Raymond A
from adaptive query processing. In VLDB. Citeseer, 948–959. Lorie, and Thomas G Price. 1979. Access path selection in a relational database
[25] Amol Deshpande, Zachary G. Ives, and Vijayshankar Raman. 2007. Adaptive management system. In Proceedings of the 1979 ACM SIGMOD international
Query Processing. Found. Trends Databases 1, 1 (2007), 1–140. https://doi.org/10. conference on Management of data. 23–34.
1561/1900000001 [51] Suraj Shetiya, Saravanan Thirumuruganathan, Nick Koudas, and Gautam Das.
[26] Bailu Ding, Surajit Chaudhuri, and Vivek Narasayya. 2020. Bitvector-aware 2020. Astrid: accurate selectivity estimation for string predicates using deep
query optimization for decision support queries. In Proceedings of the 2020 ACM learning. Proceedings of the VLDB Endowment 14, 4 (2020).
SIGMOD International Conference on Management of Data. 2011–2026. [52] Apache Spark. 2018. Spark SQL, DataFrames and datasets guide.
[27] Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, [53] Ji Sun and Guoliang Li. 2019. An end-to-end learning-based cost estimator. arXiv
and Surajit Chaudhuri. 2019. Selectivity estimation for range predicates using preprint arXiv:1906.02560 (2019).
[54] Immanuel Trummer, Samuel Moseley, Deepak Maram, Saehan Jo, and Joseph Antonakakis. 2018. SkinnerDB: Regret-bounded query evaluation via reinforcement learning. Proceedings of the VLDB Endowment 11, 12 (2018), 2074–2077.
[55] Patrick Valduriez and Georges Gardarin. 1984. Join and semijoin algorithms for a multiprocessor database machine. ACM Transactions on Database Systems (TODS) 9, 1 (1984), 133–161.
[56] Bob Ward. 2019. SQL Server 2019 Revealed: Including Big Data Clusters and Machine Learning. Springer.
[57] Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, and Ion Stoica. 2022. Balsa: Learning a Query Optimizer Without Expert Demonstrations. arXiv preprint arXiv:2201.01441 (2022).
[58] Mihalis Yannakakis. 1981. Algorithms for acyclic database schemes. In VLDB, Vol. 81. 82–94.
[59] Jianqiao Zhu, Navneet Potti, Saket Saurabh, and Jignesh M. Patel. 2017. Looking Ahead Makes Query Plans Robust: Making the Initial Case with In-Memory Star Schema Data Warehouse Workloads. Proceedings of the VLDB Endowment 10, 8 (2017), 889–900. https://doi.org/10.14778/3090163.3090167