
Demonstration SIGMOD ’22, June 12–17, 2022, Philadelphia, PA, USA

DeepO: A Learned Query Optimizer


Luming Sun, Tao Ji, Cuiping Li*, Hong Chen
{sunluming,jitao,licuiping,chong}@ruc.edu.cn
Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education
School of Information, Renmin University of China, Beijing, China

ABSTRACT
Query optimization is crucial for the query performance of database systems. Despite decades of effort from both the research and industrial communities, query optimization remains one of the most challenging problems. Thanks to advances in artificial intelligence, data-driven and learning-based techniques have recently been gaining traction in database research. However, most prior learning-based works are less practical because they are evasive about the interaction between the learning components and the database system. In this demonstration, we introduce DeepO, a novel deep-learning-based query optimizer that offers high-quality and fine-grained query optimization efficiently and practically. We implement DeepO and incorporate it into PostgreSQL, and we also provide a web user interface where users can carry out optimization operations interactively and evaluate the optimization performance. Preliminary results show that DeepO outperforms the baseline PostgreSQL optimizer.

CCS CONCEPTS
• Information systems → Query optimization.

KEYWORDS
query optimization, deep learning, AI4DB

ACM Reference Format:
Luming Sun, Tao Ji, Cuiping Li*, Hong Chen. 2022. DeepO: A Learned Query Optimizer. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), June 12–17, 2022, Philadelphia, PA, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3514221.3520167

1 INTRODUCTION
Query optimization is of critical importance to the query performance of database systems. The cost-based approach is the mainstream in query optimization, yet the estimated cost relies heavily on the accuracy of cardinality estimation and hand-crafted cost models. In reality, non-uniform data distributions and attribute correlations make it hard for cardinality estimation techniques such as histograms to give accurate estimates, and building an excellent query optimizer costs a great deal of time and human effort [2].

Artificial intelligence has been widely adopted in many applications, and the database community has also started to explore how learning techniques can be leveraged in database management systems [4]. The majority of these works focus on replacing a component of the optimizer with learned models [5, 6]. However, these methods face several problems. For example, MOSE [5] provides efficient cardinality estimation only for multi-dimensional range predicates but neglects the Join operation. The End-to-End cost estimator [6] demands a large amount of training data before it brings any improvement to query performance. Bao [3] is a very impressive learned optimizer that utilizes optimization strategies (e.g., disable Hash Join, force Index Scan) to generate optimized query plans. However, Bao can only offer query-level optimization; for example, disabling Hash Join in Bao forces all join operations in the query to use other join methods. Besides, Bao does not optimize the join order directly but leaves it to the default query optimizer of the DBMS.

Motivated by the above shortcomings, in this demo we present the implementation of DeepO, a novel Deep-learning-based query Optimizer. DeepO has a query cost learner based on a Bayesian neural network [1] and a tree-structured network [7]; it can embed complex query plans into dense feature vectors and then learn the query execution time accurately. The cost learner does not rely on cardinality, thus mitigating the performance risks caused by inaccurate cardinality estimation, and it can give confidence intervals for the cost predictions generated by the BNN. For a query, DeepO generates a set of optimization candidates (which we call hints) and uses the cost learner to evaluate them. DeepO then offers the top-K hints that have smaller estimated costs with relatively higher confidence; users can apply the predicted optimal hint, or check the hints and decide which one to use. By offering users the opportunity to select the optimization hint, we avoid the interpretability issues of black-box deep learning optimization methods. We implement DeepO and incorporate it with PostgreSQL¹ to evaluate its performance and make it practical. We also provide a web UI where users can submit queries and get recommended optimization hints. The web UI also demonstrates the difference in execution performance of queries before and after they are optimized, so that users can better understand how DeepO improves query performance. We identify the contributions of this demo as follows:

(1) We demonstrate DeepO, a deep-learning-based query optimizer. To our knowledge, DeepO is the first learned optimizer integrated with a real DBMS that offers fine-grained query optimization.
(2) In our demonstration, we give a full explanation of DeepO's internal components and integration details. In particular,

*Corresponding author: Cuiping Li.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SIGMOD '22, June 12–17, 2022, Philadelphia, PA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9249-5/22/06. . . $15.00
https://doi.org/10.1145/3514221.3520167

1 https://www.postgresql.org/


[Figure: system architecture diagram. Online optimization: users submit a new query through the Web UI; the Hint Generator produces candidate hints (scan method, join method, join order); the Cost Learner returns estimated costs with confidence; the Result Ranker returns the top-K hints, which feed the PostgreSQL query optimizer/executor. Offline cost learning: the Query Plan Collector gathers execution plans from PostgreSQL into a query plan dataset used by the Operator Embedder, Plan Embedder, and learning model.]

Figure 1: DeepO system architecture

our demo shows (i) how the cost learner embeds the query plan and learns the cost, and (ii) what the optimization options are and how to obtain them.
(3) We offer a web UI to interact with users and demonstrate the improvements brought by DeepO. Our preliminary results show that DeepO is able to reduce query execution time compared to the baseline optimizer. We have open-sourced DeepO; it is available at https://github.com/RUC-AIDB/DeepO.

2 SYSTEM DESCRIPTION
We first demonstrate the system workflow of DeepO, then we introduce the core component, the Cost Learner.

2.1 DeepO Workflow
As shown in Figure 1, DeepO consists of two phases: an offline cost learning phase, in which historical query processing logs are collected and the learning-based Cost Learner is trained, and an online optimization phase, in which DeepO provides users with query optimization hints and corresponding estimated costs.

During the offline cost learning phase, DeepO collects query execution plan logs from PostgreSQL. For example, given a SQL statement, the EXPLAIN ANALYZE command in PostgreSQL can be used to retrieve the tree-structured query execution plan as well as the actual running cost. DeepO takes these plan-cost pairs as training data for later cost learning.

We train the Cost Learner on the collected data in a supervised way. This process takes in executed query plans and outputs a trained model. It involves three steps: operator embedding, plan embedding, and cost learning. In the first step, all operators of the query plans are embedded into dense, representative feature vectors. Different embedding strategies are designed for leaf operators and non-leaf ones in order to capture underlying and complicated query characteristics. In the second step, we transform the embedded operators into a tree-structured format according to the query execution plan. In the cost learning step, embedded plans are used as input to train the neural network. We adopt a tree-structured network [7] to process the query plan vectors, and we modify the network with a Bayesian neural network [1]. By doing so, we can offer confidence-aware cost estimation for query plans.

In the online optimization phase, DeepO first generates a set of candidate optimization hints for new queries. At present, DeepO offers three kinds of optimizations: scan methods (i.e., Sequential Scan and Index Scan), join methods (i.e., different kinds of join methods such as Hash Join), and join orders (including the order of different tables and the order of inner and outer tables). We utilize the pg_hint_plan² extension to control execution plans with hint phrases describing our optimization methods.

After DeepO obtains the optimization hint candidates, it generates multiple query plans in parallel using the candidates. The plans are sent to the Cost Learner to get the estimated cost and the confidence of the estimation. The Result Ranker then returns the top-K optimization hints that have the smallest estimated cost with high confidence. In the web UI, users can directly apply the hint with minimal cost, or check the recommended hints and decide which one to use. The web UI also allows users to compare the real performance of query execution plans under different optimization options. Every time queries are executed by PostgreSQL, DeepO again collects the run-time performance of the optimized plans and adds them to the plan dataset for further model updates. In case of a change of database schema (e.g., new tables or attributes), DeepO automatically falls back to PostgreSQL's optimizer and adds the plan to the plan dataset to retrain the model. In this way, as DeepO optimizes more queries, the Cost Learner continues to learn and constantly improves its performance.

2.2 Cost Learner Internals
DeepO treats the cost estimation task as a supervised learning problem.

2 https://pghintplan.osdn.jp/pg_hint_plan.html
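The offline collection step described above can be sketched in a few lines. This is a minimal illustration, not DeepO's actual collector: the JSON below is an abridged, hand-written stand-in for what PostgreSQL's EXPLAIN (ANALYZE, FORMAT JSON) returns for a two-table join, and the helper names are hypothetical.

```python
import json

# Abridged stand-in for the output of: EXPLAIN (ANALYZE, FORMAT JSON) <query>
EXPLAIN_JSON = """
[{"Plan": {"Node Type": "Hash Join",
           "Actual Total Time": 12.48,
           "Plans": [{"Node Type": "Seq Scan", "Relation Name": "tablea",
                      "Filter": "(a1 > 100)", "Actual Total Time": 3.1, "Plans": []},
                     {"Node Type": "Seq Scan", "Relation Name": "tableb",
                      "Filter": "(b = 7)", "Actual Total Time": 2.9, "Plans": []}]},
  "Execution Time": 13.02}]
"""

def plan_to_tree(node):
    """Recursively keep the fields a plan embedder would need from one node."""
    return {
        "op": node["Node Type"],
        "detail": node.get("Filter") or node.get("Relation Name", ""),
        "children": [plan_to_tree(c) for c in node.get("Plans", [])],
    }

def collect_pair(explain_json):
    """Turn one EXPLAIN ANALYZE result into a (plan tree, latency label) pair."""
    doc = json.loads(explain_json)[0]
    return plan_to_tree(doc["Plan"]), doc["Execution Time"]

tree, latency = collect_pair(EXPLAIN_JSON)
print(tree["op"], latency)  # Hash Join 13.02
```

In practice such pairs would be accumulated over the workload (e.g., via a PostgreSQL driver) to form the query plan dataset the Cost Learner trains on.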

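The online hint path (candidate generation with pg_hint_plan phrases, then confidence-aware top-K selection) can be sketched as follows. The hint syntax is pg_hint_plan's, but the candidate set, the table aliases, and the ranking rule are simplified illustrations, not DeepO's exact implementation:

```python
from itertools import permutations

TABLES = ["a", "b", "c"]  # aliases of the tables in the query being optimized

def candidate_hints():
    """Enumerate simplified candidates of the three hint kinds DeepO uses."""
    hints = []
    for t in TABLES:                                     # scan methods
        hints += [f"SeqScan({t})", f"IndexScan({t})"]
    for join in ("HashJoin", "MergeJoin", "NestLoop"):   # join methods
        hints.append(f"{join}(a b)")
    for order in permutations(TABLES):                   # join orders
        hints.append("Leading(" + " ".join(order) + ")")
    return [f"/*+ {h} */" for h in hints]                # pg_hint_plan comment form

def top_k(hints, estimates, k=3, min_conf=0.8):
    """Keep only confident estimates, then return the k cheapest hints.

    `estimates` maps hint -> (estimated_cost, confidence in [0, 1])."""
    confident = [h for h in hints if estimates[h][1] >= min_conf]
    return sorted(confident, key=lambda h: estimates[h][0])[:k]

hints = candidate_hints()
print(len(hints), hints[0])  # 15 /*+ SeqScan(a) */
```

Each returned comment would be prepended to the SQL statement so that pg_hint_plan steers the PostgreSQL planner accordingly.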

[Figure: (a) an example plan tree, a Hash Join on TableA.id = TableB.id over two Sequential Scans with filters "a1 > 100 and a2 = 'test'" and "b = 7"; (b) operator & plan embedding, where leaf operators pass through preprocessing and an embedding unit into dense vectors, and non-leaf operators are one-hot encoded over the operator id and column ids before an FC layer; (c) the BNN tree network, whose gated cell combines child hidden vectors H_l and H_r with the parent embedding V_p, with a final FC layer producing the predicted cost.]

Figure 2: Learned cost estimator overview

During the offline training process, the Cost Learner takes query plans as input and the actual execution latency as the label. The Cost Learner is the core of DeepO, and Figure 2 shows an example of the embedding process and the network structure of the learner.

Operator embedding & Plan embedding. An abstract representation of a query execution plan is a physical operator tree, as illustrated in Figure 2(a). We divide the operators of a query plan into two categories according to their characteristics and position in the tree: leaf operators and non-leaf ones. The leaf operators are Scan operators, usually with multiple predicates; the non-leaf operators are Join, Sort, etc. We design different embedding strategies for them in order to capture underlying and complicated query characteristics. For leaf operators, the Scan predicates can be regarded as a set of words, so we treat them as a sentence and represent each non-numeric word using one-hot encoding. Since the range filter conditions on columns vary widely, we scale the numbers into the range [0, 1] using min-max normalization. Then we feed these sequentially embedded vectors into an embedding network. After that, leaf operators are represented as dense, information-rich vectors in low dimension (green and blue in Figure 2(b)). For non-leaf operators, the encoded vectors consist of two parts. First, the operator kind is encoded into a unique one-hot vector. Then, the additional information about the operator (e.g., the Join condition) is encoded into a vector of length T, where T is the number of columns in the database, with the corresponding bit set to 1 when the column appears in the conditions (see Figure 2(b)). In order to offer dense and coordinated input to the cost learning network, we add a fully connected layer to encode the vectors of non-leaf operators into the same length as the leaf embeddings. After the operator embedding phase, an encoded tree with the same structure as the input query plan is obtained, which we call the plan embedding.

BNN-based Tree Network. We adopt a tree-structured network [7] in order to capture the propagation process of the query plan, and we employ a Bayesian neural network (BNN) [1] to deploy the network. BNN serves as a bridge between deep learning and Bayesian inference: in a BNN we sample the network weights from a probability distribution and then optimize the distribution parameters rather than having deterministic weights. So with a BNN, it is possible to give prediction intervals and measure the confidence of the predictions. To do so, we replace the deterministic parameters of the fully-connected layers by sampling the parameters of the layer with the following equation on each feed-forward operation:

ŷ = f_w(x),  w ∼ p_ω(ω)    (1)

where x is the layer input, ŷ is the layer output, and ω is the hyper-parameter of the probability distribution p.

The BNN-based Tree Network unit (Figure 2(c)) has gates and a memory cell, and it can process tree-structured input. It makes full use of information from lower layers by transferring cell memory upward and letting hidden states affect the gate weights. For each operator of the encoded plan, a tree unit is applied to perform a local children-to-parent aggregation. As the figure shows, the hidden vectors of all children (H_l and H_r) as well as the embedded vector of the current parent V_p are combined and transformed into a new hidden vector. The hidden vector generated by the unit is finally fed into a BNN-based fully connected layer to produce the estimated cost and the estimation confidence for the query plan.

[Figure 3: Demonstration of Plan Embedding]

3 EXPERIMENTS AND DEMONSTRATION
In this section, we introduce the user interface of our demonstration and show the preliminary performance of DeepO. We implement the Cost Learner of DeepO using PyTorch³. Our experiments are carried out on the IMDB dataset [2]. We use queries from MSCN⁴; after the offline cost learning process, we set the connection configuration of PostgreSQL and launch the optimization service with the learned model. Users can use the web UI to optimize queries interactively with DeepO and compare the performance between DeepO and other optimizers.

3 https://pytorch.org/
4 https://github.com/andreaskipf/learnedcardinalities
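The operator encoding of Section 2.2 (one-hot predicate words plus min-max-scaled numbers for leaf operators; operator-kind one-hot plus a length-T column bitmap for non-leaf operators) can be sketched as follows. The vocabulary, column list, and value ranges are made-up examples, not DeepO's real ones:

```python
# Illustrative vocabulary/column list; in DeepO these come from the workload.
VOCAB = ["tablea", "a1", ">", "a2", "=", "test"]
COLUMNS = ["tablea.a1", "tablea.a2", "tableb.b"]  # T = 3 columns in total

def minmax(value, lo, hi):
    """Scale a numeric literal into [0, 1] given the column's observed range."""
    return (value - lo) / (hi - lo)

def encode_leaf(words, numbers, ranges):
    """One-hot each non-numeric predicate token, min-max scale each number."""
    vec = []
    for w in words:
        one_hot = [0.0] * len(VOCAB)
        one_hot[VOCAB.index(w)] = 1.0
        vec += one_hot
    vec += [minmax(v, *ranges[i]) for i, v in enumerate(numbers)]
    return vec

def encode_non_leaf(op, cond_columns, ops=("Hash Join", "Sort", "Merge Join")):
    """Operator-kind one-hot concatenated with a length-T column-condition bitmap."""
    kind = [1.0 if op == o else 0.0 for o in ops]
    bitmap = [1.0 if c in cond_columns else 0.0 for c in COLUMNS]
    return kind + bitmap

leaf = encode_leaf(["a1", ">"], [100.0], [(0.0, 400.0)])
join = encode_non_leaf("Hash Join", {"tablea.a1", "tableb.b"})
print(len(leaf), leaf[-1], join)  # 13 0.25 [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```

In the paper, both encodings are then projected by learned layers to a common dimension before the tree network consumes them.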

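Eq. (1) replaces fixed weights with a fresh sample on every forward pass. A minimal single-weight sketch (Gaussian p_ω with toy parameters, not learned ones) also shows how repeated sampling yields the prediction intervals used in Section 3:

```python
import random
import statistics

random.seed(0)

# Toy learned distribution parameters omega = (mu, sigma) for one weight/bias.
MU_W, SIGMA_W = 2.0, 0.1
MU_B, SIGMA_B = 0.5, 0.05

def bnn_forward(x):
    """One feed-forward pass of Eq. (1): sample w ~ p_omega, then y = f_w(x)."""
    w = random.gauss(MU_W, SIGMA_W)
    b = random.gauss(MU_B, SIGMA_B)
    return w * x + b

def predict_with_interval(x, n_samples=1000):
    """Monte Carlo sampling gives a cost estimate plus a ~95% interval."""
    ys = [bnn_forward(x) for _ in range(n_samples)]
    mean = statistics.fmean(ys)
    sd = statistics.stdev(ys)
    return mean, (mean - 2 * sd, mean + 2 * sd)

mean, (lo, hi) = predict_with_interval(3.0)
print(lo < mean < hi and abs(mean - 6.5) < 0.1)  # True
```

A full BNN additionally optimizes the (μ, σ) parameters by variational inference [1]; the sketch only shows the sampling side of Eq. (1).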
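The children-to-parent flow of the tree unit in Figure 2(c) can be illustrated with a stripped-down, non-learned stand-in: the real unit uses Tree-LSTM-style gates and a memory cell [7], whereas this sketch uses a plain element-wise combination purely to show the bottom-up recursion over the embedded plan tree:

```python
def combine(parent_vec, child_hiddens):
    """Stand-in for the tree unit: merge child hidden states into the parent
    embedding element-wise (the real unit uses learned gates and a cell)."""
    hidden = list(parent_vec)
    for child in child_hiddens:
        hidden = [h + 0.5 * c for h, c in zip(hidden, child)]
    return hidden

def encode_plan(node):
    """Bottom-up pass over the embedded plan tree, mirroring Figure 2(c)."""
    child_hiddens = [encode_plan(c) for c in node.get("children", [])]
    return combine(node["vec"], child_hiddens)

plan = {  # a join over two scans, with toy 2-d operator embeddings
    "vec": [1.0, 0.0],
    "children": [{"vec": [0.2, 0.4], "children": []},
                 {"vec": [0.6, 0.8], "children": []}],
}
root_hidden = encode_plan(plan)
print([round(h, 2) for h in root_hidden])  # [1.4, 0.6]
```

The root's hidden vector is what the final BNN-based fully connected layer consumes to produce the cost estimate and its confidence.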

3.1 DeepO Optimization Performance
We now demonstrate the plan embedding results and the optimization performance of DeepO.

Plan Embedding. We first verify that plan embedding can represent query plans in an information-rich fashion. Figure 3 shows the plan embedding page of DeepO. In the right part, we demonstrate the intermediate result of the leaf operator embedding using the Embedding Projector⁵. Points that are similar in the high-dimensional space remain close to each other in the low-dimensional space after t-SNE projection. As demonstrated, Scan operations on the same table are clearly clustered together. More importantly, points are arranged in an order consistent with their semantic information, indicating that the embedded vectors fully represent the operation details.

DeepO Optimization Performance. We evaluate cost estimation performance by using the trained Cost Learner to provide cost estimates for the test set (including queries both before and after optimization). Since DeepO adopts a Bayesian neural network, we can generate prediction intervals with lower and upper bounds. Our experiments show that above 95% of the estimations lie within the prediction interval, meaning that DeepO can provide accurate cost estimation. Besides, we compare the performance of raw queries against queries optimized with the hints offered by DeepO. Results show that, compared with PostgreSQL, DeepO reduces execution latency by more than 5% on 15% of the test queries, and achieves similar execution performance on 79% of the queries.

3.2 User Interface
Figure 3 demonstrates the four tabs in the UI: SQL Query Tool, Plan Optimization Analysis, Query History, and Plan Embedding.

In the SQL Query Tool page, users can set the connection configuration of PostgreSQL and enter the SQL they would like to optimize. As Figure 4 demonstrates, after a query is submitted, several optimization options are shown on the page and users can select the default or any option they want.

[Figure 4: Candidate optimization options]

After several queries are optimized and executed, we can compare their execution performance in the Plan Optimization Analysis page. As demonstrated in Figure 5, besides execution time, we also visualize the plans using pev2⁶.

[Figure 5: Page of optimization comparison]

4 CONCLUSION
In this demo paper, we present a deep-learning-based query optimizer called DeepO. With its cost learning ability, DeepO is able to provide cost estimates for query plans and help optimize the join order and physical operator selection. The main difference between DeepO and existing optimizers is that it learns to improve its cost estimation performance as queries come in, and it no longer relies on cardinality estimation. DeepO is also more practical than former learning-based query optimization components. We implement DeepO as a well-knit component in PostgreSQL and provide a web UI to demonstrate and compare its performance.

5 ACKNOWLEDGEMENT
This work was partially supported by the National Key Research and Development Plan under Grant 2018YFB1004401, the National Natural Science Foundation of China under Grants 62072460, 62076245, and 62172424, and the Beijing Natural Science Foundation under Grant 4212022.

REFERENCES
[1] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. 2015. Weight Uncertainty in Neural Networks. In International Conference on Machine Learning. PMLR, 1613–1622.
[2] Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? Proc. VLDB Endow. 9, 3 (2015), 204–215.
[3] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making Learned Query Optimization Practical. In Proceedings of the 2021 International Conference on Management of Data.
[4] Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self-Driving Database Management Systems. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8–11, 2017.
[5] Luming Sun, Cuiping Li, Tao Ji, and Hong Chen. 2021. MOSE: A Monotonic Selectivity Estimator Using Learned CDF. IEEE Transactions on Knowledge and Data Engineering (2021).
[6] Ji Sun and Guoliang Li. 2019. An End-to-End Learning-based Cost Estimator. Proc. VLDB Endow. 13, 3 (2019).
[7] Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In ACL.

5 https://projector.tensorflow.org/
6 https://github.com/dalibo/pev2
