Enhancing Text-To-SQL Capabilities of Large Language Models
Linyong Nan1 Yilun Zhao1 Weijin Zou1 Narutatsu Ri2 Jaesung Tae1
Ellen Zhang1 Arman Cohan1,3 Dragomir Radev1
1 Yale University, 2 Columbia University, 3 Allen Institute for AI
{linyong.nan, yilun.zhao}@yale.edu
with various numbers of demonstrations. To establish substantial conclusions when comparing distinct prompting approaches, we present the mean and standard deviation for models sharing identical configurations except for the varying number of demonstrations. In addition, we employ a majority vote over these models exhibiting diverse performances. Specifically, we obtain the execution results of different models' greedy-decoding predictions, eliminate those with execution errors flagged by a deterministic database management system (DBMS), and choose the prediction that receives the majority vote. Alternative integration methods, such as self-consistency sampling (Wang et al., 2023), are also available, but we reserve their exploration for future research. The comprehensive results are available in Figures 10, 11, and 12 of the Appendix.

We propose the following procedure for constructing prompts for the Text-to-SQL task. Given a set A of annotated examples, we first establish a categorization that divides the pool into disjoint partitions Aα, Aβ, ..., with each partition containing examples whose SQL queries share a relatively similar syntax structure. Next, we apply the k-Means strategy detailed in Section 2.1 to obtain diverse demonstration examples Dj for each partition Aj. For each example, the demonstration is constructed by transforming the database into multiple CREATE queries and augmenting them with schema-related knowledge. During inference, we employ a preliminary model to generate a draft SQL query, which is used to determine the problem category and thus the corresponding Dj for building the prompt. We obtain multiple predictions using various numbers of shots from Dj and perform majority voting to arrive at the final prediction. Details of this approach are shown in Algorithm 2 of the appendix.

3 Experiments

3.1 Experimental Settings

Dataset We conduct comprehensive experiments on the following four semantic parsing datasets:

• Spider (Yu et al., 2018) is a cross-domain semantic parsing dataset that contains complex Text-to-SQL problems. The data originates from 200 databases covering 138 different domains. We use its 7,000 training examples as our pool of annotated examples.

• Spider-Syn (Gan et al., 2021a) replaced schema-related words in the questions of Spider examples with manually selected synonyms that reflect real-world question paraphrases, to evaluate the robustness of systems.

• Spider-DK (Gan et al., 2021b) defined five types of domain knowledge and modified some Spider examples by adding domain knowledge, to evaluate the cross-domain generalization capability of a given system.

• Spider-Realistic (Deng et al., 2021) removed explicit mentions of column names from Spider examples to reflect more realistic text-table alignment settings, and selected eight existing Text-to-SQL datasets for cross-domain evaluation.

Model We evaluate different ICL strategies with Codex (Chen et al., 2021), a GPT-3 variant that was finetuned on code data from the web and has demonstrated state-of-the-art performance at the time of writing (Ni et al., 2023). Specifically, we use the code-davinci-002 engine and present the results of systems with prompts ranging from 1- to 10-shot. Additionally, we report the few-shot results utilizing the ChatGPT (gpt-3.5-turbo) model. However, due to its maximum context length limitation of 4096 tokens, we only obtain results for systems provided with prompts ranging from 1- to 5-shot (public API available at https://openai.com/api/).

Evaluation Metric We use execution accuracy as the evaluation metric for all experiments, which measures the percentage of system predictions leading to the gold execution result.

Baselines We compare the following prompting strategies for generating SQL queries in few-shot and zero-shot settings.

Few-shot

• Random sampling (R): Select demonstration examples randomly from the pool.

• Similarity sampling (S): Select the demonstration examples most similar to the test example via k-nearest neighbors.

• Diversity sampling (D): Select diverse examples from k-Means clusters of the pool.

• Similarity-Diversity sampling (SD): Select examples based on Algorithm 1.

• SD + schema augmentation (SA): Enhance instructions with schema knowledge (semantic augmentation or structure augmentation).
(a) Few-shot results

Figure 1: Few-shot and zero-shot results of Codex for all datasets. In the few-shot setting, error bars indicate means and standard deviations over the performances of systems provided with prompts ranging from 4-shot to 10-shot. To obtain the error bars for the random sampling approach, we conducted 3 independent runs using different random seeds. The schema augmentation used for the results reported in (a) is structure augmentation (add ontology summary). In the zero-shot setting, the error bars indicate means and standard deviations over 3 independent runs. Our results suggest that 1) using similarity and diversity objectives in the sampling process, 2) including schema representation in instructions, and 3) employing model voting with different shot outcomes all contribute to the improvement of ICL performance.
• SD + SA + Voting: Integrated strategy described in Algorithm 2.

Zero-shot

• Baseline - DB as text-seq: Standard prompt for the Text-to-SQL task, where structured knowledge is linearized as a text sequence.

• Baseline - DB as code-seq: Improve instructions by linearizing the structured knowledge source as multiple SQL CREATE queries.

• Baseline - DB as code-seq + SA: Enhance instructions with schema knowledge.

3.2 Main Results

In this section, we present a comprehensive analysis of various prompting strategies, assessing their efficacy across multiple datasets. The evaluation of demonstration sampling strategies in a few-shot setting with code-davinci-002 is illustrated in Figure 1a, and further few-shot results of gpt-3.5-turbo are shown in Figure 2. We compared different demonstration selection strategies, including random selection, k-nearest neighbors selection (similarity sampling; due to the deprecation of the Codex API in March 2023, similarity sampling experiments were conducted only on the Spider dataset), k-means selection (diversity sampling), and our proposed approach, which combines both similarity and diversity. Moreover, we examined the impact of augmenting the schema representation within the task instructions and assessed the performance of our
integrated strategy. Our findings indicate that employing similarity and diversity objectives in the sampling process leads to better performance on average across all datasets. Furthermore, incorporating the schema representation within the instructions enhances performance, and voting over models given different numbers of shots results in a marked improvement in overall performance.

Figure 2: Few-shot results of gpt-3.5-turbo for Spider. Error bars indicate means and standard deviations over the performances of systems provided with 1-shot to 5-shot prompts. The schema augmentation used for the reported results is semantic augmentation (add column summary as block comment).

The efficacy of schema augmentation is further supported by experiments in a zero-shot setting, as illustrated in Figure 1b. We compared systems using different linearization methods for prompts: one that transforms the database into a text sequence, and another that uses multiple CREATE queries to represent the database. The latter method shows a noticeable improvement in performance. We also contrasted two separate techniques for augmenting the schema representation: one that adds semantic information to each column within each table, and another that incorporates entity-relationship knowledge into the schema. The results suggest that structural augmentation (adding an ontology summary) brings a slightly greater improvement in the few-shot setting for Codex (shown in Figure 5), while semantic augmentation (adding column summaries as block comments) proves more beneficial in the zero-shot setting for Codex and also in the few-shot setting for ChatGPT (gpt-3.5-turbo). We hypothesize that this difference may arise from the less descriptive nature of structural augmentation, which calls for more demonstrations in order to effectively understand and utilize the provided information. In future work, we will explore structural schema augmentation that aligns better with the zero-shot setting.

4 Analysis

4.1 Prediction-Syntax based Retrieval

Existing methods for selecting demonstrations rely on the semantic representations of the question and the database. We propose an alternative method, specific to code generation tasks, that focuses on the syntax of the solution code. We examined the syntax coverage and syntax similarity of the prompts produced by different strategies. Syntax coverage is computed by counting the occurrences of syntactic elements (keywords, operators, and identifiers) and dividing by the total number of syntactic elements. Syntax similarity, in turn, is measured by the mean Euclidean distance between the discrete vector representation of the predicted SQL and the vectors representing the gold SQL queries of the selected demonstrations. As indicated in Table 1 of the appendix, both metrics contribute to the quality of the selected examples. Furthermore, a simple summation of the two measurements suggests a correlation with system performance, as illustrated in Figure 6 of the appendix. We justify the efficacy of our strategy with the following rationale: (1) when the pool of annotated examples is limited in the diversity of its problem structures, certain test problems may lack similar examples available for retrieval; and (2) neither the semantic representation of the question/database nor the distance metric inherently supports encapsulating and comparing different problem structures, whereas SQL syntax measures problem structure directly. Given these constraints, the optimal strategy is to select similar examples while covering as many syntax demonstrations as feasible, to mitigate potential failures of similarity-based retrieval.

4.2 Comparative Analysis of Retrieval Methods

We conducted an examination of various similarity-based retrieval methods and present a comparative analysis of their performance in Figure 3. The primary variable in this investigation was the representation extracted for each example, with a focus on extracting and comparing the following
Figure 3: Comparison between various similarity-based demonstration selection methods. Q indicates the embedding model employed to extract the representation of the question; D stands for the database, and S stands for the SQL query.
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544, Seattle, Washington, USA. Association for Computational Linguistics.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems.

Xiang Deng, Ahmed Hassan Awadallah, Christopher Meek, Oleksandr Polozov, Huan Sun, and Matthew Richardson. 2021. Structure-grounded pretraining for text-to-SQL. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1337–1350, Online. Association for Computational Linguistics.

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, and Zhifang Sui. 2023. A survey on in-context learning.

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1536–1547, Online. Association for Computational Linguistics.

Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. 2023. Complexity-based prompting for multi-step reasoning. In The Eleventh International Conference on Learning Representations.

Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver, John R. Woodward, Jinxia Xie, and Pengsheng Huang. 2021a. Towards robustness of text-to-SQL models against synonym substitution. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2505–2515, Online. Association for Computational Linguistics.

Yujian Gan, Xinyun Chen, and Matthew Purver. 2021b. Exploring underexplored limitations of cross-domain text-to-SQL generalization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8926–8931, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Matt Gardner, Pradeep Dasigi, Srinivasan Iyer, Alane Suhr, and Luke Zettlemoyer. 2018. Neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 17–18, Melbourne, Australia. Association for Computational Linguistics.

Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, and Luke Zettlemoyer. 2022. Demystifying prompts in language models via perplexity estimation.

Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, and Mari Ostendorf. 2022. In-context learning for few-shot dialogue state tracking. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2627–2643, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2023. Large language models are zero-shot reasoners.

Haoyang Li, Jing Zhang, Cuiping Li, and Hong Chen. 2023. RESDSQL: Decoupling schema linking and skeleton parsing for text-to-SQL. In AAAI.

Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, and Ni Lao. 2017. Neural symbolic machines: Learning semantic parsers on Freebase with weak supervision. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23–33, Vancouver, Canada. Association for Computational Linguistics.

Aiwei Liu, Xuming Hu, Lijie Wen, and Philip S. Yu. 2023. A comprehensive evaluation of ChatGPT's zero-shot text-to-SQL capability.

Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. 2022. What makes good in-context examples for GPT-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 100–114, Dublin, Ireland and Online. Association for Computational Linguistics.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2020. RoBERTa: A robustly optimized BERT pretraining approach.

Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. 2022. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8086–8098, Dublin, Ireland. Association for Computational Linguistics.

Ansong Ni, Srini Iyer, Dragomir Radev, Ves Stoyanov, Wen-tau Yih, Sida I. Wang, and Xi Victoria Lin. 2023. LEVER: Learning to verify language-to-code generation with execution.

Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, and Mike Lewis. 2023. Measuring and narrowing the compositionality gap in language models.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.

Nitarshan Rajkumar, Raymond Li, and Dzmitry Bahdanau. 2022. Evaluating the text-to-SQL capabilities of large language models.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.

Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Michihiro Yasunaga, Haitian Sun, Dale Schuurmans, Jure Leskovec, and Denny Zhou. 2021. LEGO: Latent execution-guided reasoning for multi-hop question answering on knowledge graphs. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 8959–8970. PMLR.

Ohad Rubin, Jonathan Herzig, and Jonathan Berant. 2022. Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2655–2671, Seattle, United States. Association for Computational Linguistics.

Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. 2021. PICARD: Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9895–9901, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Taylor Sorensen, Joshua Robinson, Christopher Rytting, Alexander Shaw, Kyle Rogers, Alexia Delorey, Mahmoud Khalil, Nancy Fulda, and David Wingate. 2022. An information-theoretic approach to prompt engineering without ground truth labels. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 819–862, Dublin, Ireland. Association for Computational Linguistics.

Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, and Tao Yu. 2023. Selective annotation makes language models better few-shot learners. In The Eleventh International Conference on Learning Representations.

Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and efficient foundation language models.

Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2020. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7567–7578, Online. Association for Computational Linguistics.

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations.

Yue Wang, Weishi Wang, Shafiq Joty, and Steven C.H. Hoi. 2021. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8696–8708, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V Le, and Denny Zhou. 2022. Chain of thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems.

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023. Chain-of-thought prompting elicits reasoning in large language models.

Zhiyong Wu, Yaoxiang Wang, Jiacheng Ye, and Lingpeng Kong. 2022. Self-adaptive in-context learning.

Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, and Tao Yu. 2022. UnifiedSKG: Unifying and multi-tasking structured knowledge grounding with text-to-text language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 602–631, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with Freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 956–966, Baltimore, Maryland. Association for Computational Linguistics.

Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, and Minjoon Seo. 2023. In-context instruction learning.

Pengcheng Yin and Graham Neubig. 2018. TRANX: A transition-based neural abstract syntax parser for semantic parsing and code generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 7–12, Brussels, Belgium. Association for Computational Linguistics.

Pengcheng Yin, Graham Neubig, Wen-tau Yih, and Sebastian Riedel. 2020. TaBERT: Pretraining for joint understanding of textual and tabular data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8413–8426, Online. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Heyang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, and Dragomir Radev. 2019. CoSQL: A conversational text-to-SQL challenge towards cross-domain natural language interfaces to databases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1962–1979, Hong Kong, China. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, Brussels, Belgium. Association for Computational Linguistics.

Yiming Zhang, Shi Feng, and Chenhao Tan. 2022. Active example selection for in-context learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9134–9148, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Appendix
Algorithm 2: Integrated Strategy

Input: Set of annotated examples A, test examples T, number of demonstrations k, categorization {α, β, ...}, and, from Algorithm 1, disjoint partitions {Aα, Aβ, ...} with corresponding demonstrations {Dα, Dβ, ...}
Result: Set of SQL predictions SP, where SP_i is the final prediction for test example T_i

for T_i in test set T do
    T_i.SQL = initial_predictor(T_i);
    c_i = get_category(T_i.SQL);
    for n = 4 to k do
        P_i^n = build_prompt(D_{c_i}[:n], T_i);
        P_i^n* = augment_schema(P_i^n);
        SP_i^n = Model(P_i^n*);
        ER_i^n = DBMS(SP_i^n);
    end
    ER_i^* = Remove_Exec_Errors(ER_i);
    SP_i = Majority_Vote(ER_i^*);
end
return SP
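Algorithm 2 can be transcribed directly into Python. In this sketch every component (preliminary predictor, categorizer, prompt builder, model, DBMS, and voting rule) is injected as a callable, so the function names mirror the pseudocode rather than any real API; `dbms` is assumed to return None for queries with execution errors.

```python
def integrated_strategy(test_examples, demos_by_category, k,
                        initial_predictor, get_category, build_prompt,
                        augment_schema, model, dbms, majority_vote):
    """Transcription of Algorithm 2: draft a query, pick the matching
    demonstration partition, prompt with 4..k shots, execute every
    prediction, drop DBMS errors, and majority-vote on the results."""
    predictions = {}
    for i, example in enumerate(test_examples):
        draft_sql = initial_predictor(example)   # preliminary model
        category = get_category(draft_sql)       # syntax-based category
        executed = []
        for n in range(4, k + 1):                # vary the number of shots
            prompt = build_prompt(demos_by_category[category][:n], example)
            prompt = augment_schema(prompt)
            sql = model(prompt)
            executed.append((sql, dbms(sql)))    # dbms returns None on error
        valid = [(sql, result) for sql, result in executed
                 if result is not None]
        predictions[i] = majority_vote(valid) if valid else None
    return predictions
```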
Table 1: Average syntax coverage and similarity measures of the prompt for different demonstration selection
strategies and the corresponding execution accuracies.
Figure 6: Correlation between syntax coverage and similarity measures of prompts and execution accuracy.
Figure 7: Effects of various prompting strategies on Text-to-SQL problems of different difficulty levels.
Given the following database schema:
gymnast : gymnast_id, floor_exercise_points, pommel_horse_points, rings_points, vault_points, parallel_bars_points, horizontal_bar_points, total_points | people : people_id, name, age, height, hometown

Answer the following: Return the total points of the gymnast with the lowest age.

select t1.total_points from gymnast as t1 join people as t2 on t1.gymnast_id = t2.people_id order by t2.age asc limit 1

Listing 1: Baseline prompt with text representation of the database.
/* Given the following database schema: */
CREATE TABLE IF NOT EXISTS "department" (
    "Department_ID" int, -- a unique identifier for a department
    "Name" text, -- the name of the department
    "Creation" text, -- the date the department was created
    "Ranking" int, -- the ranking of the department within the organization
    "Budget_in_Billions" real, -- the department's budget in billions of dollars
    "Num_Employees" real, -- the number of employees in the department
    PRIMARY KEY ("Department_ID")
);
CREATE TABLE IF NOT EXISTS "head" (
    "head_ID" int, -- a unique identifier for the head of a department
    "name" text, -- the name of the head of the department
    "born_state" text, -- the state where the head of the department was born
    "age" real, -- the age of the head of the department
    PRIMARY KEY ("head_ID")
);
CREATE TABLE IF NOT EXISTS "management" (
    "department_ID" int, -- the unique identifier for the department being managed
    "head_ID" int, -- the unique identifier for the head of the department
    "temporary_acting" text, -- whether the head of the department is serving in a temporary or acting capacity
    PRIMARY KEY ("Department_ID", "head_ID"),
    FOREIGN KEY ("Department_ID") REFERENCES `department`("Department_ID"),
    FOREIGN KEY ("head_ID") REFERENCES `head`("head_ID")
);

/* Answer the following: What are the distinct creation years of the departments managed by a secretary born in state 'Alabama'? */

select distinct t1.creation from department as t1 join management as t2 on t1.department_id = t2.department_id join head as t3 on t2.head_id = t3.head_id where t3.born_state = 'Alabama'

Listing 3: Prompt with semantic augmentation of the schema as inline comments.
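Prompts in this "DB as code-seq" style can be assembled programmatically. The sketch below shows only the basic linearization into CREATE statements, without comment augmentation; the schema dictionary format and helper names are our own, not from the paper.

```python
def linearize_as_code_seq(schema):
    """Render a schema as CREATE TABLE statements ("DB as code-seq").

    `schema` maps each table name to a dict with a "columns" list of
    (name, sql_type) pairs and an optional "pk" list of key columns.
    """
    statements = []
    for table, spec in schema.items():
        lines = [f'    "{name}" {sql_type}'
                 for name, sql_type in spec["columns"]]
        if spec.get("pk"):
            keys = ", ".join(f'"{c}"' for c in spec["pk"])
            lines.append(f"    PRIMARY KEY ({keys})")
        statements.append(
            f'CREATE TABLE IF NOT EXISTS "{table}" (\n'
            + ",\n".join(lines) + "\n);"
        )
    return "\n".join(statements)

def build_zero_shot_prompt(schema, question):
    """Wrap the linearized schema and the question as SQL comments."""
    return (
        "/* Given the following database schema: */\n"
        f"{linearize_as_code_seq(schema)}\n\n"
        f"/* Answer the following: {question} */\n"
    )
```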
/* Given the following database schema: */
CREATE TABLE IF NOT EXISTS "department" (
    "Department_ID" int,
    "Name" text,
    "Creation" text,
    "Ranking" int,
    "Budget_in_Billions" real,
    "Num_Employees" real,
    PRIMARY KEY ("Department_ID")
);
CREATE TABLE IF NOT EXISTS "head" (
    "head_ID" int,
    "name" text,
    "born_state" text,
    "age" real,
    PRIMARY KEY ("head_ID")
);
CREATE TABLE IF NOT EXISTS "management" (
    "department_ID" int,
    "head_ID" int,
    "temporary_acting" text,
    PRIMARY KEY ("Department_ID", "head_ID"),
    FOREIGN KEY ("Department_ID") REFERENCES `department`("Department_ID"),
    FOREIGN KEY ("head_ID") REFERENCES `head`("head_ID")
);

/* Table column descriptions:
{'department': {'Department_ID': 'a unique identifier for a department',
                'Name': 'the name of the department',
                'Creation': 'the date the department was created',
                'Ranking': 'the ranking of the department within the organization',
                'Budget_in_Billions': "the department's budget in billions of dollars",
                'Num_Employees': 'the number of employees in the department'},
 'head': {'head_ID': 'a unique identifier for the head of a department',
          'name': 'the name of the head of the department',
          'born_state': 'the state where the head of the department was born',
          'age': 'the age of the head of the department'},
 'management': {'department_ID': 'the unique identifier for the department being managed',
                'head_ID': 'the unique identifier for the head of the department',
                'temporary_acting': 'whether the head of the department is serving in a temporary or acting capacity'}} */
/* Answer the following: What are the distinct creation years of the departments managed by a secretary born in state 'Alabama'? */

select distinct t1.creation from department as t1 join management as t2 on t1.department_id = t2.department_id join head as t3 on t2.head_id = t3.head_id where t3.born_state = 'Alabama'

Listing 4: Prompt with semantic augmentation of the schema as block comment.
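The block comment in Listing 4 is essentially a printed dictionary, so the semantic augmentation can be generated from a {table: {column: description}} mapping; the helper names below are illustrative, not from the paper.

```python
def semantic_augmentation_block(column_descriptions):
    """Format a nested {table: {column: description}} mapping as the
    block-comment semantic augmentation in the style of Listing 4."""
    return ("/* Table column descriptions:\n"
            + repr(column_descriptions) + " */")

def augment_prompt(schema_sql, column_descriptions, question):
    """Assemble schema, block-comment augmentation, and question."""
    return (
        "/* Given the following database schema: */\n"
        f"{schema_sql}\n\n"
        f"{semantic_augmentation_block(column_descriptions)}\n"
        f"/* Answer the following: {question} */\n"
    )
```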
/* Given the following database schema: */
CREATE TABLE IF NOT EXISTS "continents" (
    "ContId" INTEGER PRIMARY KEY,
    "Continent" TEXT
);
CREATE TABLE IF NOT EXISTS "countries" (
    "CountryId" INTEGER PRIMARY KEY,
    "CountryName" TEXT,
    "Continent" INTEGER,
    FOREIGN KEY (Continent) REFERENCES continents(ContId)
);
CREATE TABLE IF NOT EXISTS "car_makers" (
    "Id" INTEGER PRIMARY KEY,
    "Maker" TEXT,
    "FullName" TEXT,
    "Country" TEXT,
    FOREIGN KEY (Country) REFERENCES countries(CountryId)
);
CREATE TABLE IF NOT EXISTS "model_list" (
    "ModelId" INTEGER PRIMARY KEY,
    "Maker" INTEGER,
    "Model" TEXT UNIQUE,
    FOREIGN KEY (Maker) REFERENCES car_makers(Id)
);
CREATE TABLE IF NOT EXISTS "car_names" (
    "MakeId" INTEGER PRIMARY KEY,
    "Model" TEXT,
    "Make" TEXT,
    FOREIGN KEY (Model) REFERENCES model_list(Model)
);
CREATE TABLE IF NOT EXISTS "cars_data" (
    "Id" INTEGER PRIMARY KEY,
    "MPG" TEXT,
    "Cylinders" INTEGER,
    "Edispl" REAL,
    "Horsepower" TEXT,
    "Weight" INTEGER,
    "Accelerate" REAL,
    "Year" INTEGER,
    FOREIGN KEY (Id) REFERENCES car_names(MakeId)
);

/*
Database ontology:
continents.contid -> countries.continent, countries.countryid -> car_makers.country, car_makers.id -> model_list.maker, model_list.model -> car_names.model, car_names.makeid -> cars_data.id
*/
/* Answer the following: How many continents are there? */

select count(*) from continents;

Listing 5: Prompt with structure augmentation of the schema.
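The ontology summary in Listing 5 lists foreign-key edges as parent.column -> child.column, which can be derived mechanically from the CREATE statements. The regexes below are a simplified sketch, not a full SQL parser, and the function name is ours.

```python
import re

# Matches each CREATE statement (re.S lets the body span multiple lines).
CREATE_RE = re.compile(r'CREATE TABLE IF NOT EXISTS "(\w+)"\s*\((.*?)\);', re.S)
# Matches FOREIGN KEY (col) REFERENCES table(col), quotes/backticks optional.
FK_RE = re.compile(
    r'FOREIGN KEY\s*\(\s*["`]?(\w+)["`]?\s*\)\s*REFERENCES\s*'
    r'["`]?(\w+)["`]?\s*\(\s*["`]?(\w+)["`]?\s*\)',
    re.I,
)

def ontology_summary(schema_sql):
    """Extract foreign-key edges from CREATE statements and render them
    in the parent.column -> child.column style used for structure
    augmentation."""
    edges = []
    for table, body in CREATE_RE.findall(schema_sql):
        for col, ref_table, ref_col in FK_RE.findall(body):
            edges.append(f"{ref_table}.{ref_col.lower()} -> "
                         f"{table.lower()}.{col.lower()}")
    return ", ".join(edges)
```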
Figure 10: Few-shot results comparing different sampling strategies with different numbers of demonstration examples selected for the prompt.

Figure 11: Few-shot results comparing different schema representation augmentation methods with different numbers of demonstration examples selected for the prompt.
Figure 12: Few-shot results comparing different sampling strategies on Text-to-SQL problems of different difficulty levels, with different numbers of demonstration examples selected for the prompt.