CHAPTER 2

LITERATURE REVIEW

2.1 NATURAL LANGUAGE TO SQL GENERATION

2.1.1 DESCRIPTION

The base paper emphasizes the use of large language models (LLMs),
such as GPT-based architectures, for translating natural language into SQL
queries. It focuses on prompt formatting strategies and benchmark testing to
evaluate model performance. While effective in structured experiments, this
approach is limited by static prompt design and lacks integration with live
database systems. Our project builds upon this by implementing a real-time,
schema-aware solution using LangChain. Instead of relying on static schema
inputs, we dynamically fetch and inject database schema information into
prompts, enabling more accurate and context-sensitive SQL generation. Unlike
the base paper, which evaluates outputs offline, our system interacts directly
with live databases, delivering instant query execution and feedback. This
shift makes the solution more practical and adaptable to real-world use cases.
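To make this concrete, the following is a minimal sketch of schema-aware
prompt construction with LangChain. The SQLite URI, database file name, and
prompt wording are illustrative assumptions, not the project's exact
configuration.

    from langchain_community.utilities import SQLDatabase
    from langchain_core.prompts import PromptTemplate

    # Hypothetical connection URI; any SQLAlchemy-compatible database works.
    db = SQLDatabase.from_uri("sqlite:///sales.db")

    # The schema placeholder is filled at request time, not hardcoded.
    prompt = PromptTemplate(
        input_variables=["schema", "question"],
        template=(
            "Given the database schema below, write a syntactically correct "
            "SQL query that answers the question.\n\n"
            "Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
        ),
    )

    def build_prompt(question: str) -> str:
        # get_table_info() returns CREATE TABLE statements with sample rows,
        # so the model always sees the database's current structure.
        return prompt.format(schema=db.get_table_info(), question=question)

    # The SQL returned by the LLM can then be run against the live database:
    # result = db.run(generated_sql)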

2.1.2 MERIT

● Improves database accessibility for non-technical users.
● Automates query generation without the need for SQL expertise.
● Provides schema-aware responses that adapt to different domains.

2.1.3 DEMERIT

● SQL generation was static and not executed in real time.
● Ambiguous natural language can lead to incorrect SQL generation.
● Requires continuous prompt refinement for optimal performance.

2.2 FEW-SHOT LEARNING FOR TEXT-TO-SQL

2.2.1 DESCRIPTION

The base paper explores prompt-based learning using two types of fixed
prompt structures (Type I and Type II), but it applies few-shot examples in a
static manner, embedding hardcoded natural language and SQL pairs into prompts
during benchmarking. While this improves performance, it lacks adaptability
across dynamic schemas or varying user queries. In contrast, our solution
adopts a dynamic few-shot strategy using LangChain's FewShotPromptTemplate.
Here, 3–5 relevant natural language questions and their corresponding SQL
queries are programmatically selected and inserted into the prompt based on
the user's current query context. This method simulates how humans learn: by
observing examples before attempting similar tasks. It improves generalization
without the need for large training datasets or model fine-tuning. The dynamic
nature allows better alignment with schema variations and query intent in
real time.
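A sketch of this dynamic selection is shown below, assuming OpenAI embeddings
and a FAISS vector store for similarity search; the example pool, model
choice, and k value are illustrative assumptions.

    from langchain_community.vectorstores import FAISS
    from langchain_core.example_selectors import SemanticSimilarityExampleSelector
    from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
    from langchain_openai import OpenAIEmbeddings

    # Hypothetical example pool; in practice these pairs are curated per schema.
    examples = [
        {"question": "List all customers in Chennai.",
         "sql": "SELECT * FROM customers WHERE city = 'Chennai';"},
        {"question": "How many orders were placed in 2023?",
         "sql": "SELECT COUNT(*) FROM orders WHERE order_date LIKE '2023%';"},
        {"question": "Show the five most expensive products.",
         "sql": "SELECT * FROM products ORDER BY unit_price DESC LIMIT 5;"},
    ]

    example_prompt = PromptTemplate(
        input_variables=["question", "sql"],
        template="Question: {question}\nSQL: {sql}",
    )

    # Embed the pool once and retrieve the pairs closest to the incoming query.
    selector = SemanticSimilarityExampleSelector.from_examples(
        examples, OpenAIEmbeddings(), FAISS, k=2
    )

    few_shot = FewShotPromptTemplate(
        example_selector=selector,
        example_prompt=example_prompt,
        prefix="Translate the question into SQL, following the examples.",
        suffix="Question: {question}\nSQL:",
        input_variables=["question"],
    )

    print(few_shot.format(question="Show customers located in Bangalore."))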

2.2.2 MERIT

● Enhances model adaptability with minimal examples.
● Requires no fine-tuning or retraining for each schema.
● Improves SQL generation even with minimal data availability.

2.2.3 DEMERIT

● Few-shot examples were static.
● Can be inconsistent if the prompt structure is not standardized.
● Struggles with complex or novel queries not covered in examples.

2.3 DYNAMIC TABLE SELECTION

2.3.1 DESCRIPTION

In multi-table databases, accurately identifying which tables to reference is
essential for generating valid SQL queries. The base paper relies on static
prompt injection of entire schemas, where all table names and columns are
included in the prompt regardless of their relevance to the user's query. This
increases prompt length, introduces noise, and reduces accuracy, especially in
large or complex databases. Our solution overcomes these limitations through
dynamic table selection using semantic embeddings. Each table name and its
metadata are converted into vector representations using a pre-trained
embedding model, and the user's query is embedded in the same space so that
only the most relevant tables are retrieved. This allows the model to focus on
only the necessary schema elements, improving the quality of SQL generation
and minimizing the inclusion of irrelevant or conflicting tables.
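The sketch below illustrates one way to implement this, assuming an OpenAI
embedding model and cosine similarity; the table names and metadata summaries
are invented for illustration.

    import numpy as np
    from langchain_openai import OpenAIEmbeddings  # any embedding model would do

    embedder = OpenAIEmbeddings()

    # Hypothetical table metadata; real summaries would be read from the schema.
    tables = {
        "customers": "customer id, name, city, signup date",
        "orders": "order id, customer id, order date, total amount",
        "products": "product id, name, category, unit price",
    }

    # Embed each table's name and column summary once, up front.
    table_vecs = {
        name: np.array(embedder.embed_query(f"{name}: {desc}"))
        for name, desc in tables.items()
    }

    def select_tables(question: str, k: int = 2) -> list[str]:
        """Return the k tables whose metadata best matches the question."""
        q = np.array(embedder.embed_query(question))
        scores = {
            name: float(vec @ q / (np.linalg.norm(vec) * np.linalg.norm(q)))
            for name, vec in table_vecs.items()
        }
        return sorted(scores, key=scores.get, reverse=True)[:k]

    # Only the selected tables' schemas are injected into the SQL prompt.
    print(select_tables("How many orders did each customer in Bangalore place?"))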

2.3.2 MERIT

● Reduces SQL generation errors by avoiding irrelevant tables.
● Enhances contextual awareness in multi-table databases.
● Scales effectively with growing schema complexity.

2.3.3 DEMERIT

● Includes all schema tables in the prompt, regardless of relevance.
● May return incorrect tables if the query is ambiguous.
● Manual schema injection makes it harder to scale or adapt dynamically.

2.4 PROMPT OPTIMIZATION

2.4.1 DESCRIPTION

The base paper explores two prompt types (Type I and Type II), showing
that changing the structure and verbosity of prompts significantly impacts
performance. However, these prompts were manually crafted and static, requiring
trial-and-error tuning per use case. There was no support for dynamically
adapting the prompt based on schema, task, or query complexity. Our
implementation addresses this limitation through LangChain's PromptTemplate and
FewShotPromptTemplate, which allow programmatic and flexible prompt
construction. Each prompt includes a clear task instruction, dynamically
injected schema, and optionally relevant few-shot examples, all tailored to the
user's current query. This reduces manual overhead, ensures consistency, and
allows the system to handle varied schemas and query types more effectively.

2.4.2 MERIT

● Improves consistency and quality of SQL outputs.
● Reduces ambiguity in model interpretation by clearly separating schema, task, and input.
● Essential for tailoring the model to specific domains or schemas.

2.4.3 DEMERIT

● Manually crafting effective prompts can be time-consuming.
● Hard to generalize across very different schemas or domains.
● Overly long prompts may exceed model token limits.

2.5 QUERY REPHRASING USING LLMs

2.5.1 DESCRIPTION

Natural language queries can be vague or ambiguous, making it difficult for
the model to interpret them accurately. To solve this, the concept of query
rephrasing is introduced. In earlier versions of the system, user queries were
automatically rephrased into multiple semantically similar versions using LLMs.
Each variant was then tested, and the one that yielded the most accurate SQL
was chosen. Though not used in the current implementation, this method forms a
critical part of modern NL2SQL research. It allows the system to overcome
limitations of unclear or grammatically incorrect user input. By rephrasing the
question into more structured and explicit versions, the model is better able to
generate the correct SQL, especially for edge cases.
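The sketch below shows one possible rephrase-and-select loop; the model name
and the generate_sql and validate callables are hypothetical placeholders for
the NL2SQL step and a correctness check.

    from langchain_openai import ChatOpenAI  # model choice is an assumption

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

    def rephrase(question: str, n: int = 3) -> list[str]:
        """Ask the LLM for n more explicit rewordings of the user's question."""
        prompt = (
            f"Rewrite the following question in {n} different, more explicit "
            f"ways. Return one rewrite per line.\n\nQuestion: {question}"
        )
        reply = llm.invoke(prompt).content
        return [line.strip("-. ") for line in reply.splitlines() if line.strip()]

    def best_sql(question: str, generate_sql, validate) -> str | None:
        """Generate SQL from each variant and keep the first that validates.

        generate_sql and validate stand in for the NL2SQL call and a checker
        (for example, running EXPLAIN against the live database).
        """
        for variant in [question] + rephrase(question):
            sql = generate_sql(variant)
            if validate(sql):
                return sql
        return None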

2.5.2 MERIT

● Increases reliability by reducing ambiguity.
● Allows for flexible interpretation of varied user phrasing.
● Can improve accuracy without modifying the underlying model.

2.5.3 DEMERIT

● Rephrasing was not part of the deployed system; it was only used during testing.
● Requires a selection mechanism to pick the best variant.
● May not significantly help with deeply complex queries.

2.6 LANGCHAIN MEMORY FOR CONTEXT RETENTION

2.6.1 DESCRIPTION

While the base paper provided a foundational approach to natural language
queries over data, it lacked a mechanism for context retention across user
interactions. This limited the system's ability to handle follow-up questions
or sustain coherent conversations across multiple turns. To overcome this
limitation, our solution integrates LangChain's memory module, which enables
the system to retain and utilize past interactions. With memory, users no
longer need to repeat details in every query. For instance, after asking for
“clients in Bangalore,” a user can simply follow up with, “How many total
orders do they have?” The system understands that “they” refers to the clients
retrieved in the previous query, thereby supporting natural, human-like
dialogue.
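A minimal sketch using LangChain's ConversationBufferMemory is shown below,
reusing the Bangalore example; the stored SQL answer is illustrative.

    from langchain.memory import ConversationBufferMemory

    # Stores prior turns so follow-ups can resolve references like "they".
    memory = ConversationBufferMemory(memory_key="history")

    memory.save_context(
        {"input": "Show clients in Bangalore."},
        {"output": "SELECT * FROM clients WHERE city = 'Bangalore';"},
    )

    # The accumulated history is injected into the next prompt, letting the
    # model resolve "they" to the clients returned in the previous turn.
    history = memory.load_memory_variables({})["history"]
    next_prompt = (
        f"Conversation so far:\n{history}\n\n"
        "Question: How many total orders do they have?\nSQL:"
    )
    print(next_prompt)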

2.6.2 MERIT

● Maintains dialogue continuity for multi-turn queries.
● Reduces user effort by eliminating repetitive inputs.
● Users can explore data incrementally through chained queries.

2.6.3 DEMERIT

● The base system treats each query in isolation, discarding any context from previous interactions.
● Users must repeat entire query details even for simple follow-ups, which can be inefficient and frustrating.
● Potential risk of context leakage between user sessions.
