April 1, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2904720
ABSTRACT The high complexity of the SQL language and of database schemas makes database querying a challenging task for human programmers. In this paper, we present our new natural language database querying (NADAQ) system as an alternative solution, designing new translation models that smoothly fuse deep learning and traditional database parsing techniques. On top of the popular encoder-decoder model for machine translation, NADAQ injects new dimensions of schema-aware bits associated with the input words into the encoder phase, and adds to the decoder phase new hidden memory neurons controlled by a finite state machine for grammatical state tracking. We further develop new techniques that enable the augmented neural network to reject queries irrelevant to the contents of the target database and to recommend candidate queries translated back into natural language. NADAQ performs well on real-world database systems over human-labeled workloads, returning query results at 90% accuracy.
executable logical form by directly linking the semantic interpretation of the input natural language and the records in the database [14], [15], [22].

However, the straightforward solution of using deep learning models does not deliver the expected effectiveness and efficiency. Firstly, most of the computation in deep learning model training is wasted on learning the grammar structure of SQL, which is actually well studied and modeled by mature techniques in the database community. Secondly, machine learning approaches linking query results with individual cells in the database tables [14], [15], [22] do not scale well with the table size, and are not even reusable when updates are applied to the tables. Thirdly, machine learning approaches always output a result SQL query even when the input question does not match the schema or contents of the database.

Inspired by recent work on integrating grammar structure into sequence-to-sequence models [7], [16], [23], we design our new natural-language database querying (NADAQ) system to tackle all these problems by smoothly fusing deep learning and traditional database query parsing techniques. The core technical contributions include: 1) instead of forcing deep learning models to learn SQL grammar, we use finite state machines to track the grammatical states of the output SQL sequence. These states are also fed into the neural network, in order to help the model better select the subsequent words; 2) by tracking the selection of tables, columns and predicate expressions, we build models to evaluate the relevance of the original questions to the result SQL queries, and thus reject irrelevant questions; 3) we adopt a customized beam search technique to identify candidate queries and present natural language explanations of the queries to the user for query refinement.

In the rest of the paper, Section II overviews the system architecture of NADAQ. Section III introduces the models adopted by NADAQ. Section IV presents the demonstration workflow. Section V evaluates the performance of the models experimentally, and Section VI finally concludes the work.

II. SYSTEM OVERVIEW
As shown in Figure 1, NADAQ consists of three major components: the data storage module, the model management module and the user interface module. The data storage module includes MySQL as the underlying database engine, which extracts meta-data from the tables for translation model training and executes the SQL queries to return search results to human users. The model management module is the core of NADAQ, which manages various models for bidirectional translation between natural language and SQL, as well as rejection models for irrelevant question screening. The user interface module contains the interfaces to human users to support all important functionalities. The key technical components include:

Speech Recognition converts the audio input into natural language text. The module uses the voice dictation interface provided by the iflytek open platform (http://www.xfyun.cn). It also supports manual correction, such that the user is allowed to revise the text output based on his/her own understanding.

Translation adopts a customized machine learning model based on the encoder-decoder framework [2], [17], the state-of-the-art solution for machine translation. The key technical innovation in this part is the addition of extra hidden states into the model to track the SQL-aware grammatical states based on a finite state machine. These hidden states help filter out invalid output words in the decoding phase and provide useful hints for better neural network training.

Rejection enables NADAQ to reject meaningless inputs from the users. In order to evaluate the relevance between user inputs and the contents of the database, NADAQ builds a rejection model on top of the translation model. The rejection decisions are made based on the uncertainty of the translation model when choosing tables, columns and predicate expressions for the SQL queries.

Recommendation enhances the effectiveness of NADAQ by providing candidate queries to the users for refinement and selection. To help users without any SQL background knowledge, NADAQ translates the candidate queries into natural language, so that human users can easily understand the physical meanings of the candidate queries, improving the efficiency of human-computer interaction.

Result Display executes the SQL query on the database and displays the query outcomes to the user.

III. MODELS IN NADAQ
A. QUERY TRANSLATION
We develop a SQL-aware recurrent neural network structure on top of the encoder-decoder framework for the task of translation from natural language to SQL, as shown in Figure 2. Basically, the neural network accepts the natural language input as a sequence of words in the encoder phase. After finishing the processing of all input words, the neural network transits to the decoder phase, starting to output words
as the translation outcomes. The motivations for the neural network customization for SQL translation are introduced below.

1) ENCODER PROCESSING
The key task of the encoder phase is to digest the original natural language input and put the most important information into memory before proceeding to the decoder phase. We propose to extract additional semantic features that link the original words to the semantics of the grammatical structure of the target language. We generate a group of labels based on the Backus-Naur Form (BNF) of SQL. Specifically, each label corresponds to a terminal symbol in the BNF grammar of SQL. Given a small group of samples, we manually label the words and employ conditional random fields (CRFs) [10] to build effective classifiers.
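To make this labeling step concrete, the following is a minimal Python sketch assuming the sklearn-crfsuite package; the feature set, the label vocabulary and the training sample are illustrative assumptions rather than details taken from the system.

    # Sketch of the schema-aware word labeling; labels, features and
    # samples are illustrative, not taken from the paper.
    import sklearn_crfsuite

    # Each label corresponds to a terminal symbol in the SQL BNF grammar;
    # "O" marks words carrying no schema semantics (hypothetical label set).
    def word_features(sent, i):
        word = sent[i]
        return {
            "word.lower": word.lower(),
            "word.isdigit": word.isdigit(),
            "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
            "next": sent[i + 1].lower() if i + 1 < len(sent) else "<EOS>",
        }

    def featurize(sent):
        return [word_features(sent, i) for i in range(len(sent))]

    # A small manually labeled sample, as described above (hypothetical).
    train_x = [featurize(["show", "papers", "written", "by", "Jim", "Gray"])]
    train_y = [["O", "TABLE", "O", "O", "VALUE", "VALUE"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(train_x, train_y)

    # The predicted labels supply the extra schema-aware bits that are
    # attached to the word representations fed into the encoder.
    print(crf.predict([featurize(["list", "movies", "by", "James", "Cameron"])]))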
2) DECODER PROCESSING
We incorporate SQL parsing techniques into the neural network in two different ways: by embedding the grammar state in the hidden layer, and by masking the word outputs. In SQL parsers, based on the preceding word outputs, the parser selects the candidate expressions for the following words according to the structure of the BNF. Motivated by this, we use a binary vector structure to represent all possible grammatical states, each of which corresponds to a candidate expression in the BNF. We also employ a stack structure to track the grammar states when sub-queries are generated recursively. Let g_t denote the grammar state in the top entry of the stack at time t. To incorporate g_t into the model, the memory of the neural network is updated as follows:

f_t = σ_g(W_f x_t + U_f h_{t-1} + V_f g_{t-1} + b_f)
i_t = σ_g(W_i x_t + U_i h_{t-1} + V_i g_{t-1} + b_i)
o_t = σ_g(W_o x_t + U_o h_{t-1} + V_o g_{t-1} + b_o)
c_t = f_t ⊗ c_{t-1} + i_t ⊗ σ_c(W_c x_t + U_c h_{t-1} + V_c g_{t-1} + b_c)
h_t = o_t ⊗ σ(c_t)

where ⊗ denotes the element-wise multiplication operation.
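A minimal NumPy sketch of one decoder step with the grammar state g folded into each gate, following the update rules above; the parameter layout, initialization and helper names are our own assumptions, not the system's implementation.

    # Sketch of one step of the grammar-state-augmented LSTM cell above.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def init_params(nx, nh, ng, rng=np.random.default_rng(0)):
        # One (W, U, V, b) tuple per gate: forget, input, output, cell.
        gates = ("f", "i", "o", "c")
        return {g: (rng.normal(0, 0.1, (nh, nx)),   # W_g acts on x_t
                    rng.normal(0, 0.1, (nh, nh)),   # U_g acts on h_{t-1}
                    rng.normal(0, 0.1, (nh, ng)),   # V_g acts on g_{t-1}
                    np.zeros(nh))                   # b_g
                for g in gates}

    def grammar_lstm_step(x_t, h_prev, c_prev, g_prev, params):
        def gate(name, act):
            W, U, V, b = params[name]
            return act(W @ x_t + U @ h_prev + V @ g_prev + b)
        f_t = gate("f", sigmoid)
        i_t = gate("i", sigmoid)
        o_t = gate("o", sigmoid)
        c_t = f_t * c_prev + i_t * gate("c", np.tanh)  # element-wise products
        h_t = o_t * np.tanh(c_t)
        return h_t, c_t

    # Usage: g_prev is the binary grammar state on top of the parser stack.
    params = init_params(nx=32, nh=64, ng=16)
    h, c = grammar_lstm_step(np.zeros(32), np.zeros(64), np.zeros(64),
                             np.zeros(16), params)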
TABLE 1. Partial rules of short-term dependencies.

In the decoder, there are two types of word masks used to filter out invalid words in the output, based on short-term dependencies and long-term dependencies, respectively. At each step, the decoder chooses one rule from the candidate short-term dependencies, e.g., the rules in Table 1, and possibly multiple rules from the candidate long-term dependencies, e.g., the rules in Table 2. The short-term dependency rule is updated according to the current grammar state as well as the last output word from the decoder. The long-term dependencies are updated based on the active symbols chosen by the SQL parser, maintained in the grammar state vector.

TABLE 2. Partial rules of long-term dependencies.

We use a binary vector s to indicate the mask generated by the single rule of short-term dependency, and l_i for the i-th mask generated by the rules of long-term dependencies. Given these masks, the word selection process in the decoder is modified as:

y_t = σ_y(W_y h_t + b_y) ⊗ s ⊗ l_1 ⊗ ... ⊗ l_L,    (1)

where L is the number of active rules of long-term dependencies.
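The masking of Eq. (1) can be sketched as follows in NumPy; the function and variable names are illustrative, and the final renormalization of the masked distribution is our assumption rather than part of Eq. (1).

    # Sketch of the masked word selection in Eq. (1); names illustrative.
    import numpy as np

    def masked_output(h_t, W_y, b_y, s, long_masks):
        logits = W_y @ h_t + b_y
        p = np.exp(logits - logits.max())
        p /= p.sum()                      # softmax over the SQL vocabulary
        mask = s.astype(float)
        for l in long_masks:              # element-wise products of Eq. (1)
            mask *= l
        y = p * mask                      # forbidden words get probability 0
        total = y.sum()
        return y / total if total > 0 else y  # renormalize (our assumption)

    # Usage with a toy 5-word vocabulary and one active long-term mask.
    rng = np.random.default_rng(0)
    y_t = masked_output(rng.normal(size=8), rng.normal(size=(5, 8)),
                        np.zeros(5), np.array([1, 1, 0, 1, 0]),
                        [np.array([1, 0, 1, 1, 1])])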
B. QUERY REJECTION
The query rejection model is used to block meaningless or irrelevant questions that cannot be processed by SQL queries based on the database schema. Our query rejection model is motivated by the observation of the uncertainty of the neural network when outputting keywords of the SQL output, especially the table names, column names and predicate expressions. Therefore, NADAQ collects statistics on the entropies of the output word selection based on the grammatical states maintained by the translation model. The threshold method and the MLP method are two basic strategies employed by the system. The first strategy simply rejects a question if the total entropy is above a specified threshold, while the second strategy builds a 3-layer fully connected neural network model utilizing both valid and invalid questions posed to the database.
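A sketch of the threshold strategy (the function signature and names are illustrative): entropies are collected at the decoding steps that emit table names, column names and predicate expressions, and the question is rejected when their total exceeds the threshold; in the second strategy, an MLP classifier over the same entropy statistics replaces the fixed threshold.

    # Sketch of the entropy-based rejection described above.
    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    def reject_by_threshold(keyword_step_dists, threshold):
        # keyword_step_dists: output word distributions at the steps
        # emitting table names, column names and predicate expressions.
        total = sum(entropy(p) for p in keyword_step_dists)
        return total > threshold

    # Usage: a confident step (low entropy) plus an uncertain one.
    sure = np.array([0.97, 0.01, 0.01, 0.01])
    unsure = np.array([0.25, 0.25, 0.25, 0.25])
    print(reject_by_threshold([sure, unsure], threshold=1.5))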
C. QUERY RECOMMENDATION
There are always errors in the translations from natural language to SQL, regardless of the training performance of the machine learning model. To further enhance the effectiveness of the system, we design a recommendation mechanism, which returns multiple candidate SQL answers to the user. The key technical challenge is to cover meaningful and representative SQL queries that are more likely to be answers to the original question in natural language. To facilitate effective query recommendation, we design a search strategy to generate multiple answers, a ranking strategy for candidate selection and a presentation strategy for better human-computer interaction.
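The search strategy subsection is not reproduced here, but a plain beam search over the masked decoder, in the spirit of the customized beam search mentioned in Section I, can be sketched as follows; all names are illustrative, and step_fn stands for one masked decoder step returning a word distribution and the next decoder state.

    # Sketch of beam search for generating multiple SQL candidates.
    import numpy as np

    def beam_search(step_fn, init_state, bos, eos, width=5, max_len=30):
        beams = [([bos], init_state, 0.0)]        # (words, state, log-prob)
        done = []
        for _ in range(max_len):
            expansions = []
            for words, state, score in beams:
                probs, next_state = step_fn(words[-1], state)
                for w in np.argsort(probs)[::-1][:width]:
                    if probs[w] <= 0:             # masked-out word, skip
                        continue
                    cand = (words + [int(w)], next_state,
                            score + float(np.log(probs[w])))
                    (done if w == eos else expansions).append(cand)
            beams = sorted(expansions, key=lambda b: b[2], reverse=True)[:width]
            if not beams:
                break
        return sorted(done, key=lambda b: b[2], reverse=True)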
FIGURE 3. User interface snapshots in NADAQ. (a) Interface of query inputs and recommendation. (b) Interface of query rejection.
2) RANKING STRATEGY
Given the candidate queries for selection, we rank the queries based on a score function over the SQL queries. The score is calculated based on the probability of the output words at pivot time steps, which are the time steps involving expansion of the beam. The score is then normalized by the number of pivot time steps encountered by the individual SQL query, to avoid an unfair preference for long queries.
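A sketch of this score follows (names are illustrative, and combining the probabilities in log space is our assumption): the log-probabilities of the words emitted at the pivot steps are averaged, so that a query is not penalized merely for passing through more pivot steps.

    # Sketch of the pivot-step ranking score described above.
    import numpy as np

    def ranking_score(pivot_word_probs):
        # pivot_word_probs: probabilities of the chosen output words at
        # the pivot time steps (the steps where the beam was expanded).
        logs = np.log(np.asarray(pivot_word_probs))
        return float(logs.sum() / len(logs))  # normalize by pivot-step count

    # A longer candidate is not penalized just for touching more pivots.
    print(ranking_score([0.9, 0.8]), ranking_score([0.9, 0.8, 0.85]))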
IV. DEMONSTRATION WORKFLOW
In the demonstration of NADAQ, following the general workflow of the system, users are allowed to ask queries in speech. As shown in Figure 3(a), the recommendation candidates are presented on the screen. By clicking the words in a SQL query candidate, NADAQ displays the grammatical status in the top right corner and the word selection options in the bottom right corner. Once the user selects one of the recommendations, NADAQ executes the query and presents the results in the box at the bottom of the interface. If NADAQ decides to reject the query due to the limited relevance of the original question to the contents of the database, as shown in Figure 3(b), it displays the histogram of the valid questions' entropy, as well as the confidence threshold (the red line in the figure) of the rejection decision. Here, the confidence threshold can be determined by the threshold method or the MLP method.

V. EXPERIMENTS
We run our experiments on three databases. The Academic (MAS) database has 17 tables, containing records of 3,543,360 publications and 1,592,014 researchers, collected by Microsoft Academic Search. The IMDB database has 3 tables, containing records of 3,654 movies, 4,370 actors and 1,659 directors, respectively. We also employ the logical form benchmark GEO, by converting the logical queries into equivalent SQL queries as done in [4] and [8]. The GEO database has 7 tables, which contain records of 368 cities in 51 states in the USA. The statistics of the datasets are listed in Table 3. Note that the SQL vocabulary size of GEO is much larger than that of the other two datasets, and nearly 40% of the SQL statements on the GEO dataset contain sub-queries. On GEO, we use the original workloads in the dataset. On the other two databases, we ask volunteers to label randomly generated queries, by describing the queries in English. Two examples of the generated SQL query templates, with aggregation and join operations, are given as follows:

SELECT <column_array>
FROM <table> WHERE <column> =/> <value>

SELECT AGG(<table_1.column_array>)
FROM <table_1> INNER JOIN <table_2>
ON <table_1.key> = <table_2.key>
WHERE <table_2.column> =/> <value>

All these samples are used in the training of the SQL translation model. The translation model is implemented on top of TensorFlow 1.2.0. We employ 1 hidden layer with 256 neurons for both encoder and decoder, and softmax neurons for output word selection. In order to augment Long Short-Term Memory
A. BASELINE APPROACHES
We employ three baseline approaches in the performance evaluation: the convolutional neural network machine translation model (conv-seq for short) [4], the attention-based sequence-to-sequence machine translation model (attn-seq for short) [1], and the semantic parsing model with feedback (SPF for short) [7]. Note that SPF does not work on the IMDB and MAS datasets, because SPF uses templates to enlarge the training data, and there is no template for join operations on these two datasets.

B. PERFORMANCE METRICS
We measure the F1 accuracy of the results based on the recall and precision of the query results. To get the recall and precision, we execute these queries in the relational database and compare the results against those of the ground-truth queries.
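Concretely, the rows returned for each translated query are compared against the rows of the corresponding ground-truth query; a sketch of the per-query computation (function and variable names are illustrative):

    # Sketch of the F1 computation over query result sets.
    def result_f1(returned_rows, truth_rows):
        returned, truth = set(returned_rows), set(truth_rows)
        if not returned or not truth:
            return 0.0
        overlap = len(returned & truth)
        precision = overlap / len(returned)
        recall = overlap / len(truth)
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Usage: two of the three returned rows match the ground truth.
    print(result_f1([("a",), ("b",), ("c",)], [("a",), ("b",), ("d",)]))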
FIGURE 4. Testing results on IMDB. (a) Query result accuracy. (b) Query rejection accuracy.

FIGURE 5. Testing results on MAS. (a) Query result accuracy. (b) Query rejection accuracy.

FIGURE 6. Testing results on GEO. (a) Query result accuracy. (b) Query rejection accuracy.

unnecessary efforts on grammar understanding. Though NADAQ and SPF both performed well on the GEO dataset, the F1 score of NADAQ (0.839) is still higher than that of SPF (0.836) on the GEO database. Moreover, NADAQ involves fewer manual operations. The F1 scores of NADAQ on the IMDB database and GEO are above 80%, close to the standard of commercial usage.

We also report the F1 scores of the query rejection mechanism with the two alternative models on all three datasets. For the threshold method, a candidate is rejected if its entropy is larger than the mean of the valid questions' average entropy and the invalid questions' average entropy. For the MLP method, a multi-layer neural network is trained to determine the threshold. As shown in the figures, the MLP method outperforms the threshold method over all three datasets. In detail, the F1 score of the threshold approach is around 0.75 on IMDB and GEO, and 0.65 on MAS, while that of the MLP method is above 0.9 on IMDB and MAS, and 0.84 on GEO.
VI. CONCLUSION
In this paper, we demonstrate our new natural language database querying (NADAQ) system. The design of NADAQ combines state-of-the-art deep learning techniques with mature SQL parsing techniques. This enables the system to significantly improve the quality of query translation and inspires new solutions for meaningless question rejection and query candidate recommendation. The results show that NADAQ provides a nearly commercial-standard service as a querying interface to human users, even when they neither understand SQL nor comprehend the database schema.
[6] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[7] S. Iyer, I. Konstas, A. Cheung, J. Krishnamurthy, and L. Zettlemoyer, "Learning a neural semantic parser from user feedback," in Proc. ACL, 2017, pp. 963–973.
[8] R. Jia and P. Liang, "Data recombination for neural semantic parsing," in Proc. ACL, 2016, pp. 12–22.
[9] A. Kannan et al., "Smart reply: Automated response suggestion for email," in Proc. KDD, 2016, pp. 955–964.
[10] J. Lafferty et al., "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. 18th Int. Conf. Mach. Learn. (ICML), vol. 1, 2001, pp. 282–289.
[11] F. Li and H. V. Jagadish, "Constructing an interactive natural language interface for relational databases," Proc. VLDB Endowment, vol. 8, no. 1, pp. 73–84, 2014.
[12] C. Liang, J. Berant, Q. Le, K. D. Forbus, and N. Lao, "Neural symbolic machines: Learning semantic parsers on freebase with weak supervision," in Proc. ACL, 2017, pp. 23–33.
[13] P. Liang, "Learning executable semantic parsers for natural language understanding," Commun. ACM, vol. 59, no. 9, pp. 68–76, 2016.
[14] L. Mou, Z. Lu, H. Li, and Z. Jin, "Coupling distributed and symbolic execution for natural language queries," in Proc. ICML, 2017, pp. 2518–2526.
[15] A. Neelakantan, Q. V. Le, M. Abadi, A. McCallum, and D. Amodei, "Learning a natural language interface with neural programmer," 2016. [Online]. Available: https://arxiv.org/abs/1611.08945
[16] M. Rabinovich, M. Stern, and D. Klein, "Abstract syntax networks for code generation and semantic parsing," in Proc. ACL, 2017, pp. 1139–1149.
[17] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. NIPS, 2014, pp. 3104–3112.
[18] Y. Wu et al., "Google's neural machine translation system: Bridging the gap between human and machine translation," 2016. [Online]. Available: https://arxiv.org/abs/1609.08144
[19] C. Xiao, M. Dymetman, and C. Gardent, "Sequence-based structured prediction for semantic parsing," in Proc. ACL, 2016, pp. 1341–1350.
[20] X. Yang, C. M. Procopiuc, and D. Srivastava, "Summarizing relational databases," Proc. VLDB Endowment, vol. 2, no. 1, pp. 634–645, 2009.
[21] X. Yang, C. M. Procopiuc, and D. Srivastava, "Summary graphs for relational database schemas," Proc. VLDB Endowment, vol. 4, no. 11, pp. 899–910, 2011.
[22] P. Yin, Z. Lu, H. Li, and B. Kao, "Neural enquirer: Learning to query tables with natural language," in Proc. IJCAI, 2016, pp. 2308–2314.
[23] P. Yin and G. Neubig, "A syntactic neural model for general-purpose code generation," in Proc. ACL, 2017, pp. 440–450.

ZHENJIE ZHANG received the B.S. degree from the Department of Computer Science and Engineering, Fudan University, in 2004, and the Ph.D. degree in computer science from the School of Computing, National University of Singapore, in 2010. He was a Senior Research Scientist with the Advanced Digital Sciences Center, Illinois at Singapore Pte. He is currently the R&D Director with Singapore R&D, Yitu Technology Pte Ltd. His research interests include a variety of different topics, including causality, database query processing, high-dimensional indexing, and data privacy.

XIAOYAN YANG received the B.S. degree from the Department of Computer Science and Engineering, Fudan University, and the Ph.D. degree in computer science from the School of Computing, National University of Singapore. She was a Postdoctoral Fellow with the Advanced Digital Sciences Center, Illinois at Singapore Pte. She is currently a Senior Data Scientist with Singapore R&D, Yitu Technology Pte Ltd.

ZHIFENG HAO received the B.Sc. degree in mathematics from Sun Yat-sen University, in 1990, and the Ph.D. degree in mathematics from Nanjing University, in 1995. He is currently a Professor with the School of Computer, Guangdong University of Technology, and also with the School of Mathematics and Big Data, Foshan University. His research interests include various aspects of algebra, machine learning, data mining, and evolutionary algorithms.

BOYAN XU received the B.S. degree in computer science and technology from the Guangdong University of Technology, in 2014, where he is currently pursuing the Ph.D. degree in computer application engineering through the combined master's and Ph.D. programs. His research interests include a variety of different topics, including machine learning, natural language processing, and their applications.

ZIJIAN LI received the B.S. degree in software engineering from the Guangdong University of Technology, in 2017, where he is currently pursuing the M.S. degree in computer science and technology. He has been a Visiting Student with the Advanced Digital Sciences Center, Singapore, since 2017, and has also been with Nanyang Technological University, since 2018. His research interests include a variety of different topics, including natural language processing, transfer learning, and domain adaptation.