April 1, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2904720
ABSTRACT The high complexity of the SQL language and of database schemas makes database querying a challenging task for human programmers. In this paper, we present our new natural language database querying (NADAQ) system as an alternative solution, designing new translation models that smoothly fuse deep learning and traditional database parsing techniques. On top of the popular encoder-decoder model for machine translation, NADAQ injects new dimensions of schema-aware bits associated with the input words into the encoder phase, and adds to the decoder phase new hidden memory neurons controlled by a finite state machine for grammatical state tracking. We further develop new techniques that enable the augmented neural network to reject queries irrelevant to the contents of the target database and to recommend candidate queries translated back into natural language. NADAQ performs well on real-world database systems over human-labeled workloads, returning query results at 90% accuracy.
executable logical form by directly linking the semantic interpretation of the input natural language and the records in the database [14], [15], [22].

However, the straightforward solution of using deep learning models does not deliver the expected effectiveness and efficiency. Firstly, most of the computation in deep learning model training is wasted on learning the grammar structure of SQL, which is actually well studied and modeled by mature techniques in the database community. Secondly, machine learning approaches linking query results with individual cells in the database tables [14], [15], [22] do not scale well with the table size, and are not even reusable when updates are applied to the tables. Thirdly, machine learning approaches always output a result SQL query even when the input question does not match the schema or contents of the database.

Inspired by recent work on integrating grammar structure into sequence-to-sequence models [7], [16], [23], we design our new natural-language database querying (NADAQ) system to tackle all these problems by smoothly fusing deep learning and traditional database query parsing techniques. The core technical contributions include: 1) instead of forcing deep learning models to learn SQL grammar, we use finite state machines to track the grammatical states of the output SQL sequence. These states are also fed into the neural network, in order to help the model better select the subsequent words; 2) by tracking the selection of tables, columns and predicate expressions, we build models to evaluate the relevance of the original questions to the result SQL queries, and thus reject irrelevant questions; 3) we adopt a customized beam search technique to identify candidate queries and present natural language explanations of the queries to the user for query refinement.

In the rest of the paper, Section II overviews the system architecture of NADAQ. Section III introduces the models adopted by NADAQ. Section IV presents the demonstration workflow. Section V evaluates the performance of the models experimentally, and Section VI finally concludes the work.

II. SYSTEM OVERVIEW
As shown in Figure 1, NADAQ consists of three major components: the data storage module, the model management module and the user interface module. The data storage module includes MySQL as the underlying database engine, which extracts meta-data from the tables for translation model training and executes the SQL queries to return search results to human users. The model management module is the core of NADAQ, which manages various models for bidirectional translation between natural language and SQL, as well as rejection models for irrelevant question screening. The user interface module contains the interfaces to human users to support all important functionalities. The key technical components include:

Speech Recognition converts the audio input into natural language text. The module uses the voice dictation interface provided by the iflytek open platform (http://www.xfyun.cn). It also supports manual correction, such that the user is allowed to revise the text output based on his/her own understanding.

Translation adopts a customized machine learning model based on the encoder-decoder framework [2], [17], the state-of-the-art solution for machine translation. The key technical innovation in this part is the addition of extra hidden states into the model to track the SQL-aware grammatical states based on a finite state machine. These hidden states help filter out invalid output words in the decoding phase and provide useful hints for better neural network training.

Rejection enables NADAQ to reject meaningless inputs from the users. In order to evaluate the relevance between user inputs and the contents of the database, NADAQ builds a rejection model on top of the translation model. The rejection decisions are made based on the uncertainty of the translation model when choosing tables, columns and predicate expressions for the SQL queries.

Recommendation enhances the effectiveness of NADAQ by providing candidate queries to the users for refinement and selection. To help users without any SQL background knowledge, NADAQ translates the candidate queries into natural language, so that human users can easily understand the physical meanings of the candidate queries, improving the efficiency of human-computer interaction.

Result Display executes the SQL query on the database and displays the query outcomes to the user.

III. MODELS IN NADAQ
A. QUERY TRANSLATION
We develop a SQL-aware recurrent neural network structure on top of the encoder-decoder framework for the task of translation from natural language to SQL, as shown in Figure 2. Basically, the neural network accepts the natural language input as a sequence of words in the encoder phase. After finishing the processing of all input words, the neural network transits to the decoder phase, starting to output words
as the translation outcomes. The motivations for the neural network customization for SQL translation are introduced below.

1) ENCODER PROCESSING
The key task of the encoder phase is to digest the original natural language input and put the most important information into memory before proceeding to the decoder phase. We propose to extract additional semantic features that link the original words to the semantics of the grammatical structure of the target language. We generate a group of labels based on the Backus-Naur Form (BNF) of SQL. Specifically, each label corresponds to a terminal symbol in the BNF grammar of SQL. Given a small group of samples, we manually label the words and employ conditional random fields (CRFs) [10] to build effective classifiers.
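To make this labeling step concrete, the following is a minimal Python sketch assuming the sklearn-crfsuite package; the feature set, the label vocabulary and the training sample are illustrative assumptions rather than details taken from the system.

    # Sketch of the schema-aware word labeling; labels, features and
    # samples are illustrative, not taken from the paper.
    import sklearn_crfsuite

    # Each label corresponds to a terminal symbol in the SQL BNF grammar;
    # "O" marks words carrying no schema semantics (hypothetical label set).
    def word_features(sent, i):
        word = sent[i]
        return {
            "word.lower": word.lower(),
            "word.isdigit": word.isdigit(),
            "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
            "next": sent[i + 1].lower() if i + 1 < len(sent) else "<EOS>",
        }

    def featurize(sent):
        return [word_features(sent, i) for i in range(len(sent))]

    # A small manually labeled sample, as described above (hypothetical).
    train_x = [featurize(["show", "papers", "written", "by", "Jim", "Gray"])]
    train_y = [["O", "TABLE", "O", "O", "VALUE", "VALUE"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(train_x, train_y)

    # The predicted labels supply the extra schema-aware bits that are
    # attached to the word representations fed into the encoder.
    print(crf.predict([featurize(["list", "movies", "by", "James", "Cameron"])]))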
2) DECODER PROCESSING
We incorporate SQL parsing techniques into the neural network in two different ways: by embedding the grammar state in the hidden layer, and by masking the word outputs. In SQL parsers, based on the preceding word outputs, the parser selects the candidate expressions for the following words according to the structure of the BNF. Motivated by this, we use a binary vector structure to represent all possible grammatical states, each of which corresponds to a candidate expression in the BNF. We also employ a stack structure to track the grammar states when sub-queries are generated recursively. Let g_t denote the grammar state in the top entry of the stack at time t. To incorporate g_t into the model, the memory of the neural network is updated as follows:

f_t = σ_g(W_f x_t + U_f h_{t-1} + V_f g_{t-1} + b_f)
i_t = σ_g(W_i x_t + U_i h_{t-1} + V_i g_{t-1} + b_i)
o_t = σ_g(W_o x_t + U_o h_{t-1} + V_o g_{t-1} + b_o)
c_t = f_t ⊗ c_{t-1} + i_t ⊗ σ_c(W_c x_t + U_c h_{t-1} + V_c g_{t-1} + b_c)
h_t = o_t ⊗ σ(c_t)

where ⊗ denotes the element-wise multiplication operation.
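A minimal NumPy sketch of one decoder step with the grammar state g folded into each gate, following the update rules above; the parameter layout, initialization and helper names are our own assumptions, not the system's implementation.

    # Sketch of one step of the grammar-state-augmented LSTM cell above.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def init_params(nx, nh, ng, rng=np.random.default_rng(0)):
        # One (W, U, V, b) tuple per gate: forget, input, output, cell.
        gates = ("f", "i", "o", "c")
        return {g: (rng.normal(0, 0.1, (nh, nx)),   # W_g acts on x_t
                    rng.normal(0, 0.1, (nh, nh)),   # U_g acts on h_{t-1}
                    rng.normal(0, 0.1, (nh, ng)),   # V_g acts on g_{t-1}
                    np.zeros(nh))                   # b_g
                for g in gates}

    def grammar_lstm_step(x_t, h_prev, c_prev, g_prev, params):
        def gate(name, act):
            W, U, V, b = params[name]
            return act(W @ x_t + U @ h_prev + V @ g_prev + b)
        f_t = gate("f", sigmoid)
        i_t = gate("i", sigmoid)
        o_t = gate("o", sigmoid)
        c_t = f_t * c_prev + i_t * gate("c", np.tanh)  # element-wise products
        h_t = o_t * np.tanh(c_t)
        return h_t, c_t

    # Usage: g_prev is the binary grammar state on top of the parser stack.
    params = init_params(nx=32, nh=64, ng=16)
    h, c = grammar_lstm_step(np.zeros(32), np.zeros(64), np.zeros(64),
                             np.zeros(16), params)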
TABLE 1. Partial rules of short-term dependencies.

In the decoder, there are two types of word masks used to filter out invalid words in the output, based on short-term dependencies and long-term dependencies, respectively. At each step, the decoder chooses one rule from the candidate short-term dependencies, e.g., the rules in Table 1, and possibly multiple rules from the candidate long-term dependencies, e.g., the rules in Table 2. The short-term dependency rule is updated according to the current grammar state as well as the last output word from the decoder. The long-term dependencies are updated based on the active symbols chosen by the SQL parser, maintained in the grammar state vector.

TABLE 2. Partial rules of long-term dependencies.

We use a binary vector s to indicate the mask generated by the single rule of short-term dependency, and l_i for the i-th mask generated by the rules of long-term dependencies. Given these masks, the word selection process in the decoder is modified as:

y_t = σ_y(W_y h_t + b_y) ⊗ s ⊗ l_1 ⊗ ... ⊗ l_L,    (1)

where L is the number of active rules of long-term dependencies.
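The masking of Eq. (1) can be sketched as follows in NumPy; the function and variable names are illustrative, and the final renormalization of the masked distribution is our assumption rather than part of Eq. (1).

    # Sketch of the masked word selection in Eq. (1); names illustrative.
    import numpy as np

    def masked_output(h_t, W_y, b_y, s, long_masks):
        logits = W_y @ h_t + b_y
        p = np.exp(logits - logits.max())
        p /= p.sum()                      # softmax over the SQL vocabulary
        mask = s.astype(float)
        for l in long_masks:              # element-wise products of Eq. (1)
            mask *= l
        y = p * mask                      # forbidden words get probability 0
        total = y.sum()
        return y / total if total > 0 else y  # renormalize (our assumption)

    # Usage with a toy 5-word vocabulary and one active long-term mask.
    rng = np.random.default_rng(0)
    y_t = masked_output(rng.normal(size=8), rng.normal(size=(5, 8)),
                        np.zeros(5), np.array([1, 1, 0, 1, 0]),
                        [np.array([1, 0, 1, 1, 1])])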
B. QUERY REJECTION
The query rejection model is used to block meaningless or irrelevant questions that cannot be processed by SQL queries based on the database schema. Our query rejection model is motivated by the observation of the uncertainty of the neural network when outputting keywords of the SQL output, especially the table names, column names and predicate expressions. Therefore, NADAQ collects statistics on the entropies of the output word selection based on the grammatical states maintained by the translation model. The threshold method and the MLP method are two basic strategies employed by the system. The first strategy simply rejects a question if the total entropy is above a specified threshold, while the second strategy builds a 3-layer fully connected neural network model utilizing both valid and invalid questions posed to the database.
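A sketch of the threshold strategy (the function signature and names are illustrative): entropies are collected at the decoding steps that emit table names, column names and predicate expressions, and the question is rejected when their total exceeds the threshold; in the second strategy, an MLP classifier over the same entropy statistics replaces the fixed threshold.

    # Sketch of the entropy-based rejection described above.
    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    def reject_by_threshold(keyword_step_dists, threshold):
        # keyword_step_dists: output word distributions at the steps
        # emitting table names, column names and predicate expressions.
        total = sum(entropy(p) for p in keyword_step_dists)
        return total > threshold

    # Usage: a confident step (low entropy) plus an uncertain one.
    sure = np.array([0.97, 0.01, 0.01, 0.01])
    unsure = np.array([0.25, 0.25, 0.25, 0.25])
    print(reject_by_threshold([sure, unsure], threshold=1.5))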
C. QUERY RECOMMENDATION
There are always errors in the translations from natural language to SQL, regardless of the training performance of the machine learning model. To further enhance the effectiveness of the system, we design a recommendation mechanism, which returns multiple candidate SQL answers to the user. The key technical challenge is to cover meaningful and representative SQL queries that are more likely to be answers to the original question in natural language. To facilitate effective query recommendation, we design a search strategy to generate multiple answers, a ranking strategy for candidate selection and a presentation strategy for better human-computer interaction.
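The search strategy subsection is not reproduced here, but a plain beam search over the masked decoder, in the spirit of the customized beam search mentioned in Section I, can be sketched as follows; all names are illustrative, and step_fn stands for one masked decoder step returning a word distribution and the next decoder state.

    # Sketch of beam search for generating multiple SQL candidates.
    import numpy as np

    def beam_search(step_fn, init_state, bos, eos, width=5, max_len=30):
        beams = [([bos], init_state, 0.0)]        # (words, state, log-prob)
        done = []
        for _ in range(max_len):
            expansions = []
            for words, state, score in beams:
                probs, next_state = step_fn(words[-1], state)
                for w in np.argsort(probs)[::-1][:width]:
                    if probs[w] <= 0:             # masked-out word, skip
                        continue
                    cand = (words + [int(w)], next_state,
                            score + float(np.log(probs[w])))
                    (done if w == eos else expansions).append(cand)
            beams = sorted(expansions, key=lambda b: b[2], reverse=True)[:width]
            if not beams:
                break
        return sorted(done, key=lambda b: b[2], reverse=True)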
FIGURE 3. User interface snapshots in NADAQ. (a) Interface of query inputs and recommendation. (b) Interface of query rejection.
2) RANKING STRATEGY
Given the candidate queries for selection, we rank the queries based on a score function over the SQL queries. The score is calculated based on the probability of the output words at pivot time steps, which are the time steps involving expansion of the beam. The score is then normalized by the number of pivot time steps encountered by the individual SQL query, to avoid an unfair preference for long queries.
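A sketch of this score follows (names are illustrative, and combining the probabilities in log space is our assumption): the log-probabilities of the words emitted at the pivot steps are averaged, so that a query is not penalized merely for passing through more pivot steps.

    # Sketch of the pivot-step ranking score described above.
    import numpy as np

    def ranking_score(pivot_word_probs):
        # pivot_word_probs: probabilities of the chosen output words at
        # the pivot time steps (the steps where the beam was expanded).
        logs = np.log(np.asarray(pivot_word_probs))
        return float(logs.sum() / len(logs))  # normalize by pivot-step count

    # A longer candidate is not penalized just for touching more pivots.
    print(ranking_score([0.9, 0.8]), ranking_score([0.9, 0.8, 0.85]))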
IV. DEMONSTRATION WORKFLOW
In the demonstration of NADAQ, following the general workflow of the system, users are allowed to ask queries in speech. As shown in Figure 3(a), the recommendation candidates are presented on the screen. By clicking the words in a SQL query candidate, NADAQ displays the grammatical status in the top right corner and the word selection options in the bottom right corner. Once the user selects one of the recommendations, NADAQ executes the query and presents the results in the box at the bottom of the interface. If NADAQ decides to reject the query due to the limited relevance of the original question to the contents of the database, as shown in Figure 3(b), it displays the histogram of the valid questions' entropy, as well as the confidence threshold (the red line in the figure) of the rejection decision. Here, the confidence threshold can be determined by the threshold method or the MLP method.

V. EXPERIMENTS
We run our experiments on three databases. The Academic (MAS) database has 17 tables, containing records of 3,543,360 publications and 1,592,014 researchers, collected by Microsoft Academic Search. The IMDB database has 3 tables, containing records of 3,654 movies, 4,370 actors and 1,659 directors, respectively. We also employ the logical form benchmark GEO, by converting the logical queries into equivalent SQL queries as done in [4] and [8]. The GEO database has 7 tables, which contain records of 368 cities in 51 states in the USA. The statistics of the datasets are listed in Table 3. Note that the SQL vocabulary size of GEO is much larger than that of the other two datasets, and nearly 40% of the SQL statements on the GEO dataset contain sub-queries. On GEO, we use the original workloads in the dataset. On the other two databases, we ask volunteers to label randomly generated queries, by describing the queries in English. Two examples of the generated SQL query templates, with aggregation and join operations, are given as follows:

SELECT <column_array>
FROM <table> WHERE <column> =/> <value>

SELECT AGG(<table_1.column_array>)
FROM <table_1> INNER JOIN <table_2>
ON <table_1.key> = <table_2.key>
WHERE <table_2.column> =/> <value>

All these samples are used in the training of the SQL translation model. The translation model is implemented on top of TensorFlow 1.2.0. We employ 1 hidden layer with 256 neurons for both encoder and decoder, and softmax neurons for output word selection. In order to augment Long Short-Term Memory
A. BASELINE APPROACHES
We employ three baseline approaches in the performance evaluation: the convolutional neural network machine translation model (conv-seq for short) [4], the attention-based sequence-to-sequence machine translation model (attn-seq for short) [1], and the semantic parsing model with feedback (SPF for short) [7]. Note that SPF does not work on the IMDB and MAS datasets, because SPF uses templates to enlarge the training data, and there is no template for join operations on these two datasets.

B. PERFORMANCE METRICS
We measure the F1 accuracy of the results based on the recall and precision of the query results. To get the recall and precision, we execute these queries in the relational database and compare the results against those of the ground-truth queries.
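Concretely, the rows returned for each translated query are compared against the rows of the corresponding ground-truth query; a sketch of the per-query computation (function and variable names are illustrative):

    # Sketch of the F1 computation over query result sets.
    def result_f1(returned_rows, truth_rows):
        returned, truth = set(returned_rows), set(truth_rows)
        if not returned or not truth:
            return 0.0
        overlap = len(returned & truth)
        precision = overlap / len(returned)
        recall = overlap / len(truth)
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Usage: two of the three returned rows match the ground truth.
    print(result_f1([("a",), ("b",), ("c",)], [("a",), ("b",), ("d",)]))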
FIGURE 4. Testing results on IMDB. (a) Query result accuracy. (b) Query rejection accuracy.

FIGURE 5. Testing results on MAS. (a) Query result accuracy. (b) Query rejection accuracy.

FIGURE 6. Testing results on GEO. (a) Query result accuracy. (b) Query rejection accuracy.

unnecessary efforts on grammar understanding. Though NADAQ and SPF both performed well on the GEO dataset, the F1 score of NADAQ (0.839) is still higher than that of SPF (0.836) on the GEO database. Moreover, NADAQ involves fewer manual operations. The F1 scores of NADAQ on the IMDB database and GEO are above 80%, close to the standard of commercial usage.

We also report the F1 scores of the query rejection mechanism with the two alternative models on all three datasets. For the threshold method, a candidate is rejected if its entropy is larger than the mean of the valid questions' average entropy and the invalid questions' average entropy. For the MLP method, a multi-layer neural network is trained to determine the threshold. As shown in the figures, the MLP method outperforms the threshold method over all three datasets. In detail, the F1 score of the threshold approach is around 0.75 on IMDB and GEO, and 0.65 on MAS, while that of the MLP method is above 0.9 on IMDB and MAS, and 0.84 on GEO.
VI. CONCLUSION
In this paper, we demonstrate our new natural language database querying (NADAQ) system. The design of NADAQ combines state-of-the-art deep learning techniques with mature SQL parsing techniques. This enables the system to significantly improve the quality of query translation and inspires new solutions for meaningless question rejection and query candidate recommendation. The results show that NADAQ provides a nearly commercial-standard service as a querying interface to human users, even when they neither understand SQL nor comprehend the database schema.
[6] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[7] S. Iyer, I. Konstas, A. Cheung, J. Krishnamurthy, and L. Zettlemoyer, "Learning a neural semantic parser from user feedback," in Proc. ACL, 2017, pp. 963–973.
[8] R. Jia and P. Liang, "Data recombination for neural semantic parsing," in Proc. ACL, 2016, pp. 12–22.
[9] A. Kannan et al., "Smart reply: Automated response suggestion for email," in Proc. KDD, 2016, pp. 955–964.
[10] J. Lafferty et al., "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. 18th Int. Conf. Mach. Learn. (ICML), vol. 1, 2001, pp. 282–289.
[11] F. Li and H. V. Jagadish, "Constructing an interactive natural language interface for relational databases," Proc. VLDB Endowment, vol. 8, no. 1, pp. 73–84, 2014.
[12] C. Liang, J. Berant, Q. Le, K. D. Forbus, and N. Lao, "Neural symbolic machines: Learning semantic parsers on freebase with weak supervision," in Proc. ACL, 2017, pp. 23–33.
[13] P. Liang, "Learning executable semantic parsers for natural language understanding," Commun. ACM, vol. 59, no. 9, pp. 68–76, 2016.
[14] L. Mou, Z. Lu, H. Li, and Z. Jin, "Coupling distributed and symbolic execution for natural language queries," in Proc. ICML, 2017, pp. 2518–2526.
[15] A. Neelakantan, Q. V. Le, M. Abadi, A. McCallum, and D. Amodei, "Learning a natural language interface with neural programmer," 2016. [Online]. Available: https://arxiv.org/abs/1611.08945
[16] M. Rabinovich, M. Stern, and D. Klein, "Abstract syntax networks for code generation and semantic parsing," in Proc. ACL, 2017, pp. 1139–1149.
[17] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. NIPS, 2014, pp. 3104–3112.
[18] Y. Wu et al., "Google's neural machine translation system: Bridging the gap between human and machine translation," 2016. [Online]. Available: https://arxiv.org/abs/1609.08144
[19] C. Xiao, M. Dymetman, and C. Gardent, "Sequence-based structured prediction for semantic parsing," in Proc. ACL, 2016, pp. 1341–1350.
[20] X. Yang, C. M. Procopiuc, and D. Srivastava, "Summarizing relational databases," Proc. VLDB Endowment, vol. 2, no. 1, pp. 634–645, 2009.
[21] X. Yang, C. M. Procopiuc, and D. Srivastava, "Summary graphs for relational database schemas," Proc. VLDB Endowment, vol. 4, no. 11, pp. 899–910, 2011.
[22] P. Yin, Z. Lu, H. Li, and B. Kao, "Neural enquirer: Learning to query tables with natural language," in Proc. IJCAI, 2016, pp. 2308–2314.
[23] P. Yin and G. Neubig, "A syntactic neural model for general-purpose code generation," in Proc. ACL, 2017, pp. 440–450.

ZHENJIE ZHANG received the B.S. degree from the Department of Computer Science and Engineering, Fudan University, in 2004, and the Ph.D. degree in computer science from the School of Computing, National University of Singapore, in 2010. He was a Senior Research Scientist with the Advanced Digital Sciences Center, Illinois at Singapore Pte. He is currently the R&D Director with Singapore R&D, Yitu Technology Pte Ltd. His research interests include a variety of different topics, including causality, database query processing, high-dimensional indexing, and data privacy.

XIAOYAN YANG received the B.S. degree from the Department of Computer Science and Engineering, Fudan University, and the Ph.D. degree in computer science from the School of Computing, National University of Singapore. She was a Postdoctoral Fellow with the Advanced Digital Sciences Center, Illinois at Singapore Pte. She is currently a Senior Data Scientist with Singapore R&D, Yitu Technology Pte Ltd.

ZHIFENG HAO received the B.Sc. degree in mathematics from Sun Yat-sen University, in 1990, and the Ph.D. degree in mathematics from Nanjing University, in 1995. He is currently a Professor with the School of Computer, Guangdong University of Technology, and also with the School of Mathematics and Big Data, Foshan University. His research interests include various aspects of algebra, machine learning, data mining, and evolutionary algorithms.

BOYAN XU received the B.S. degree in computer science and technology from the Guangdong University of Technology, in 2014, where he is currently pursuing the Ph.D. degree in computer application engineering through the combined master's and Ph.D. programs. His research interests include a variety of different topics, including machine learning, natural language processing, and their applications.

ZIJIAN LI received the B.S. degree in software engineering from the Guangdong University of Technology, in 2017, where he is currently pursuing the M.S. degree in computer science and technology. He has been a Visiting Student with the Advanced Digital Sciences Center, Singapore, since 2017, and has also been with Nanyang Technological University, since 2018. His research interests include a variety of different topics, including natural language processing, transfer learning, and domain adaptation.