A Comprehensive Evaluation of ChatGPT's Zero-shot Text-to-SQL Capability
Aiwei Liu1, Xuming Hu1, Lijie Wen1, Philip S. Yu1,2
1 Tsinghua University
2 University of Illinois at Chicago
1 {liuaw20, hxm19}@mails.tsinghua.edu.cn
1 [email protected]  2 [email protected]
(SOTA) model that uses the complete training data, ChatGPT performs only 14% worse without using any task-specific training data. This already demonstrates that ChatGPT is a strong zero-shot Text-to-SQL converter.

2. The robustness of ChatGPT in generating SQL statements is very strong: the performance gap between ChatGPT and the SOTA models is only 7.8% under some robustness settings of the Spider dataset, lower than the 14% gap on the standard Spider dataset.

3. In the ADVETA (Pi et al., 2022) scenario, where the column names in the database are adversarially modified, ChatGPT's performance even surpasses that of the current SOTA models by 4.1%.

4. The Exact Match accuracy of the SQL generated by ChatGPT is very low, because the same query intent can be expressed by many different SQL statements. Therefore, we mainly use execution accuracy as the evaluation metric.

Overall, our experiments demonstrate that ChatGPT has strong Text-to-SQL capabilities and robustness, and it outperforms SOTA models in certain scenarios.

Single-turn prompt:

### Complete sqlite SQL query only and with no explanation
### Sqlite SQL tables, with their properties:
#
# airlines(uid,Airline,Abbreviation,Country);
# airports(City,AirportCode,AirportName,Country,CountryAbbrev);
# flights(Airline,FlightNo,SourceAirport,DestAirport).
#
### What are all the abbreviations?
SELECT

Multi-turn interaction (the schema header of the first prompt is identical to the single-turn prompt; ChatGPT's reply follows each question):

### What are all the abbreviations?
SELECT
SELECT * FROM AIRLINES
### What is the abbreviation for Jetblue Airways?
SELECT
SELECT * FROM AIRLINES WHERE Airline = "JetBlue Airways"
### What is the country corresponding it?
SELECT
SELECT Country FROM AIRLINES WHERE Airline = "JetBlue Airways"

Figure 1: Example prompts for Text-to-SQL using ChatGPT. The prompt at the top is for the single-turn scenario, while the one below is for multi-turn scenarios, where only new questions are added in each interaction.

In the multi-turn scenario, the prompt for the first interaction is the same as that in the single-turn scenario, and for subsequent interactions, only the new questions are required.
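To make the prompt format concrete, the following is a minimal Python sketch of how such prompts could be assembled. It is our own illustration rather than code from this work: the helper names (single_turn_prompt, follow_up_prompt) and the schema dictionary are assumptions, while the serialization mirrors the format shown in Figure 1.

def single_turn_prompt(tables, question):
    """Build the single-turn prompt: task instruction, schema description, question."""
    lines = [
        "### Complete sqlite SQL query only and with no explanation",
        "### Sqlite SQL tables, with their properties:",
        "#",
    ]
    for name, columns in tables.items():
        lines.append("# {}({});".format(name, ",".join(columns)))
    lines += ["#", "### " + question, "SELECT"]
    return "\n".join(lines)

def follow_up_prompt(question):
    """In the multi-turn scenario, later interactions send only the new question."""
    return "### " + question + "\nSELECT"

schema = {
    "airlines": ["uid", "Airline", "Abbreviation", "Country"],
    "airports": ["City", "AirportCode", "AirportName", "Country", "CountryAbbrev"],
    "flights": ["Airline", "FlightNo", "SourceAirport", "DestAirport"],
}
print(single_turn_prompt(schema, "What are all the abbreviations?"))
print(follow_up_prompt("What is the abbreviation for Jetblue Airways?"))

The trailing SELECT at the end of each prompt, as in Figure 1, cues the model to continue directly with the SQL completion rather than with an explanation.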
3 Experiment

3.1 Experiment Setup

Table 1: Comparison of the performance of ChatGPT and other models on the Spider, Spider-SYN, and Spider-Realistic datasets.

Table 2: Performance of different methods on the Spider-DK, ADVETA(RPL), and ADVETA(ADD) benchmark datasets.
(5) Spider-CG(SUB) and Spider-CG(APP) (Gan et al., 2022) are two evaluation datasets that measure the compositional generalization of models; they are constructed by substituting sub-sentences between different examples and by appending a sub-sentence to another sentence, respectively. (6) ADVETA(RPL) and ADVETA(ADD) (Pi et al., 2022) are two challenging test sets for the Spider dataset, composed of adversarial replacements of column names and additions of new column names, respectively. (7) The CSpider (Min et al., 2019) dataset is constructed by translating Spider into Chinese and is the same size as the original Spider dataset. (8) DuSQL (Wang et al., 2020) is a larger-scale Chinese Text-to-SQL dataset with 23,797 question/SQL pairs. (9) SParC (Yu et al., 2019b) and CoSQL (Yu et al., 2019a) are two multi-turn Text-to-SQL datasets with 1,625 and 1,007 questions in their dev sets, respectively.
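Most of these benchmarks are distributed in a Spider-style format. As an illustrative sketch (not code from this work), the snippet below assumes the standard Spider release layout, namely a dev.json file with db_id, question, and query fields plus per-database SQLite files under database/<db_id>/, and yields (db_path, question, gold_sql) triples; the multi-turn datasets SParC and CoSQL use a different, interaction-based format not covered here.

import json
import os

def load_spider_split(data_dir, split_file="dev.json"):
    """Yield (db_path, question, gold_sql) triples from a Spider-style JSON split."""
    with open(os.path.join(data_dir, split_file), encoding="utf-8") as f:
        examples = json.load(f)
    for ex in examples:
        db_id = ex["db_id"]
        db_path = os.path.join(data_dir, "database", db_id, db_id + ".sqlite")
        yield db_path, ex["question"], ex["query"]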
Evaluation Metrics. We mainly adopt three evaluation metrics: valid SQL (VA), execution accuracy (EX), and test-suite accuracy (TS). Valid SQL (VA) is the proportion of predicted SQL statements that can be executed successfully. Execution accuracy (EX) is the proportion of examples whose execution results match those of the gold SQL statements. Test-suite accuracy (TS) (Zhong et al., 2020) is also execution-based and achieves high code coverage by executing queries against a distilled test suite of databases. Note that we do not use the mainstream exact match accuracy, as SQL queries that achieve the same goal can often be expressed in different ways, making it difficult for zero-shot ChatGPT to achieve high exact match accuracy.
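As an illustration of these execution-based metrics, the following minimal Python sketch computes VA and EX by running both the predicted and the gold SQL against the corresponding SQLite database. It is an assumption of how such a check could be implemented, not the official Spider test-suite evaluation of Zhong et al. (2020); the function names and the (db_path, predicted_sql, gold_sql) input format are our own.

import sqlite3

def run_query(db_path, sql):
    """Execute a query on a SQLite database; return its rows, or None if it fails."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()
    except sqlite3.Error:
        return None

def evaluate(examples):
    """examples: iterable of (db_path, predicted_sql, gold_sql) triples."""
    total = valid = correct = 0
    for db_path, pred_sql, gold_sql in examples:
        total += 1
        pred_rows = run_query(db_path, pred_sql)
        gold_rows = run_query(db_path, gold_sql)
        if pred_rows is not None:
            valid += 1      # VA: the prediction executes without error
        if pred_rows is not None and gold_rows is not None and \
                sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr):
            correct += 1    # EX: execution results match the gold query (order-insensitive)
    return valid / total, correct / total   # (VA, EX)

For example, SELECT Country FROM airlines WHERE Airline = "JetBlue Airways" and SELECT T1.Country FROM airlines AS T1 WHERE T1.Airline = "JetBlue Airways" would fail exact match despite returning identical results, which is exactly the behavior that motivates execution-based evaluation here.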
Baselines. Due to our exclusive reliance on execution-based evaluation, we did not employ baselines such as RAT-SQL (Wang et al., 2019) and LGESQL (Cao et al., 2021), which generate only SQL skeletons without generating values. Instead, we primarily utilized three baselines: (1) PICARD (Scholak et al., 2021) constrains the auto-regressive decoders of language models through incremental parsing. (2) RASAT (Qi et al., 2022) introduces relation-aware self-attention into Transformer models and also uses constrained auto-regressive decoding. (3) RESDSQL (Li et al., 2023) proposes a ranking-enhanced encoding and skeleton-aware decoding framework to decouple schema linking from skeleton parsing. Among these, PICARD and RASAT are based on the T5-3B (Raffel et al., 2020) model.

3.2 Main Experiment

Evaluation on Spider Dataset. In Table 1, we present a comparison between ChatGPT and the current state-of-the-art (SOTA) models. Overall, ChatGPT exhibits a strong Text-to-SQL ability. Despite the 14% gap in execution accuracy compared to the current SOTA models and a 13.4%
Methods / Datasets        Spider-CG(SUB)              Spider-CG(APP)
                          VA    EX          TS        VA    EX          TS
T5-3B + PICARD            98.4  82.1        74.3      95.8  68.0        60.5
RASAT + PICARD            99.0  82.6        76.1      96.2  68.6        61.0
RESDSQL-3B + NatSQL       99.4  83.3        77.5      96.4  69.4        62.4
ChatGPT                   98.3  76.6 (6.7↓) 67.2      91.2  61.3 (8.1↓) 47.9

Table 3: Performance of different methods on the Spider-CG(SUB) and Spider-CG(APP) benchmark datasets.