CHAPTER 2

LITERATURE REVIEW

2.1 NATURAL LANGUAGE TO SQL GENERATION

2.1.1 DESCRIPTION

The base paper emphasizes the use of large language models (LLMs),
such as GPT-based architectures, for translating natural language into SQL
queries. It focuses on prompt formatting strategies and benchmark testing to
evaluate model performance. While effective in structured experiments, this
approach is limited by static prompt design and lacks integration with live
database systems. Our project builds upon this by implementing a real-time,
schema-aware solution using LangChain. Instead of relying on static schema
inputs, we dynamically fetch and inject database schema information into
prompts, enabling more accurate and context-sensitive SQL generation. Unlike
the base paper, which evaluates outputs offline, our system interacts directly
with live databases, delivering instant query execution and feedback. This
shift makes the solution more practical and adaptable to real-world use cases.
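To make this concrete, the following is a minimal sketch of schema-aware
prompt construction with LangChain. The SQLite URI, database file name, and
prompt wording are illustrative assumptions, not the project's exact
configuration.

    from langchain_community.utilities import SQLDatabase
    from langchain_core.prompts import PromptTemplate

    # Hypothetical connection URI; any SQLAlchemy-compatible database works.
    db = SQLDatabase.from_uri("sqlite:///sales.db")

    # The schema placeholder is filled at request time, not hardcoded.
    prompt = PromptTemplate(
        input_variables=["schema", "question"],
        template=(
            "Given the database schema below, write a syntactically correct "
            "SQL query that answers the question.\n\n"
            "Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
        ),
    )

    def build_prompt(question: str) -> str:
        # get_table_info() returns CREATE TABLE statements with sample rows,
        # so the model always sees the database's current structure.
        return prompt.format(schema=db.get_table_info(), question=question)

    # The SQL returned by the LLM can then be run against the live database:
    # result = db.run(generated_sql)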

2.1.2 MERIT

● Improves database accessibility for non-technical users.
● Automates query generation without the need for SQL expertise.
● Provides schema-aware responses that adapt to different domains.

2.1.3 DEMERIT

● SQL generation was static and not executed in real time.
● Ambiguous natural language can lead to incorrect SQL generation.
● Requires continuous prompt refinement for optimal performance.

2.2 FEW-SHOT LEARNING FOR TEXT-TO-SQL

2.2.1 DESCRIPTION

The base paper explores prompt-based learning using two types of fixed
prompt structures (Type I and Type II), but it applies few-shot examples in a
static manner, embedding hardcoded natural language and SQL pairs into prompts
during benchmarking. While this improves performance, it lacks adaptability
across dynamic schemas or varying user queries. In contrast, our solution
adopts a dynamic few-shot strategy using LangChain's FewShotPromptTemplate.
Here, 3–5 relevant natural language questions and their corresponding SQL
queries are programmatically selected and inserted into the prompt based on
the user's current query context. This method simulates how humans learn: by
observing examples before attempting similar tasks. It improves generalization
without the need for large training datasets or model fine-tuning. The dynamic
nature allows better alignment with schema variations and query intent in
real time.
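A sketch of this dynamic selection is shown below, assuming OpenAI embeddings
and a FAISS vector store for similarity search; the example pool, model
choice, and k value are illustrative assumptions.

    from langchain_community.vectorstores import FAISS
    from langchain_core.example_selectors import SemanticSimilarityExampleSelector
    from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
    from langchain_openai import OpenAIEmbeddings

    # Hypothetical example pool; in practice these pairs are curated per schema.
    examples = [
        {"question": "List all customers in Chennai.",
         "sql": "SELECT * FROM customers WHERE city = 'Chennai';"},
        {"question": "How many orders were placed in 2023?",
         "sql": "SELECT COUNT(*) FROM orders WHERE order_date LIKE '2023%';"},
        {"question": "Show the five most expensive products.",
         "sql": "SELECT * FROM products ORDER BY unit_price DESC LIMIT 5;"},
    ]

    example_prompt = PromptTemplate(
        input_variables=["question", "sql"],
        template="Question: {question}\nSQL: {sql}",
    )

    # Embed the pool once and retrieve the pairs closest to the incoming query.
    selector = SemanticSimilarityExampleSelector.from_examples(
        examples, OpenAIEmbeddings(), FAISS, k=2
    )

    few_shot = FewShotPromptTemplate(
        example_selector=selector,
        example_prompt=example_prompt,
        prefix="Translate the question into SQL, following the examples.",
        suffix="Question: {question}\nSQL:",
        input_variables=["question"],
    )

    print(few_shot.format(question="Show customers located in Bangalore."))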

2.2.2 MERIT

● Enhances model adaptability with minimal examples.
● Requires no fine-tuning or retraining for each schema.
● Improves SQL generation even with minimal data availability.

2.2.3 DEMERIT

● Few-shot examples were static.
● Can be inconsistent if the prompt structure is not standardized.
● Struggles with complex or novel queries not covered in examples.

2.3 DYNAMIC TABLE SELECTION

2.3.1 DESCRIPTION

In multi-table databases, accurately identifying which tables to reference is
essential for generating valid SQL queries. The base paper relies on static
prompt injection of entire schemas, where all table names and columns are
included in the prompt regardless of their relevance to the user's query. This
increases prompt length, introduces noise, and reduces accuracy, especially in
large or complex databases. Our solution overcomes these limitations through
dynamic table selection using semantic embeddings. Each table name and its
metadata are converted into vector representations using a pre-trained
embedding model, and the user's query is embedded in the same space so that
only the most relevant tables are retrieved. This allows the model to focus on
only the necessary schema elements, improving the quality of SQL generation
and minimizing the inclusion of irrelevant or conflicting tables.
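The sketch below illustrates one way to implement this, assuming an OpenAI
embedding model and cosine similarity; the table names and metadata summaries
are invented for illustration.

    import numpy as np
    from langchain_openai import OpenAIEmbeddings  # any embedding model would do

    embedder = OpenAIEmbeddings()

    # Hypothetical table metadata; real summaries would be read from the schema.
    tables = {
        "customers": "customer id, name, city, signup date",
        "orders": "order id, customer id, order date, total amount",
        "products": "product id, name, category, unit price",
    }

    # Embed each table's name and column summary once, up front.
    table_vecs = {
        name: np.array(embedder.embed_query(f"{name}: {desc}"))
        for name, desc in tables.items()
    }

    def select_tables(question: str, k: int = 2) -> list[str]:
        """Return the k tables whose metadata best matches the question."""
        q = np.array(embedder.embed_query(question))
        scores = {
            name: float(vec @ q / (np.linalg.norm(vec) * np.linalg.norm(q)))
            for name, vec in table_vecs.items()
        }
        return sorted(scores, key=scores.get, reverse=True)[:k]

    # Only the selected tables' schemas are injected into the SQL prompt.
    print(select_tables("How many orders did each customer in Bangalore place?"))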

2.3.2 MERIT

● Reduces SQL generation errors by avoiding irrelevant tables.
● Enhances contextual awareness in multi-table databases.
● Scales effectively with growing schema complexity.

2.3.3 DEMERIT

● Includes all schema tables in the prompt, regardless of relevance.
● May return incorrect tables if the query is ambiguous.
● Manual schema injection makes it harder to scale or adapt dynamically.

2.4 PROMPT OPTIMIZATION

2.4.1 DESCRIPTION

The base paper explores two prompt types (Type I and Type II), showing
that changing the structure and verbosity of prompts significantly impacts
performance. However, these prompts were manually crafted and static, requiring
trial-and-error tuning per use case. There was no support for dynamically
adapting the prompt based on schema, task, or query complexity. Our
implementation addresses this limitation through LangChain's PromptTemplate and
FewShotPromptTemplate, which allow programmatic and flexible prompt
construction. Each prompt includes a clear task instruction, dynamically
injected schema, and optionally relevant few-shot examples, all tailored to the
user's current query. This reduces manual overhead, ensures consistency, and
allows the system to handle varied schemas and query types more effectively.

2.4.2 MERIT

● Improves consistency and quality of SQL outputs.
● Reduces ambiguity in model interpretation by clearly separating schema, task, and input.
● Essential for tailoring the model to specific domains or schemas.

2.4.3 DEMERIT

● Manually crafting effective prompts can be time-consuming.
● Hard to generalize across very different schemas or domains.
● Overly long prompts may exceed model token limits.

2.5 QUERY REPHRASING USING LLMs

2.5.1 DESCRIPTION

Natural language queries can be vague or ambiguous, making it difficult for
the model to interpret them accurately. To solve this, the concept of query
rephrasing is introduced. In earlier versions of the system, user queries were
automatically rephrased into multiple semantically similar versions using LLMs.
Each variant was then tested, and the one that yielded the most accurate SQL
was chosen. Though not used in the current implementation, this method forms a
critical part of modern NL2SQL research. It allows the system to overcome
limitations of unclear or grammatically incorrect user input. By rephrasing the
question into more structured and explicit versions, the model is better able to
generate the correct SQL, especially for edge cases.
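The sketch below shows one possible rephrase-and-select loop; the model name
and the generate_sql and validate callables are hypothetical placeholders for
the NL2SQL step and a correctness check.

    from langchain_openai import ChatOpenAI  # model choice is an assumption

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

    def rephrase(question: str, n: int = 3) -> list[str]:
        """Ask the LLM for n more explicit rewordings of the user's question."""
        prompt = (
            f"Rewrite the following question in {n} different, more explicit "
            f"ways. Return one rewrite per line.\n\nQuestion: {question}"
        )
        reply = llm.invoke(prompt).content
        return [line.strip("-. ") for line in reply.splitlines() if line.strip()]

    def best_sql(question: str, generate_sql, validate) -> str | None:
        """Generate SQL from each variant and keep the first that validates.

        generate_sql and validate stand in for the NL2SQL call and a checker
        (for example, running EXPLAIN against the live database).
        """
        for variant in [question] + rephrase(question):
            sql = generate_sql(variant)
            if validate(sql):
                return sql
        return None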

2.5.2 MERIT

● Increases reliability by reducing ambiguity.
● Allows for flexible interpretation of varied user phrasing.
● Can improve accuracy without modifying the underlying model.

2.5.3 DEMERIT

● Rephrasing was not part of the deployed system; it was only used during testing.
● Requires a selection mechanism to pick the best variant.
● May not significantly help with deeply complex queries.

2.6 LANGCHAIN MEMORY FOR CONTEXT RETENTION

2.6.1 DESCRIPTION

While the base paper provided a foundational approach to natural language
queries over data, it lacked a mechanism for context retention across user
interactions. This limited the system's ability to handle follow-up questions
or sustain coherent conversations across multiple turns. To overcome this
limitation, our solution integrates LangChain's memory module, which enables
the system to retain and utilize past interactions. With memory, users no
longer need to repeat details in every query. For instance, after asking for
“clients in Bangalore,” a user can simply follow up with, “How many total
orders do they have?” The system understands that “they” refers to the clients
retrieved in the previous query, thereby supporting natural, human-like
dialogue.
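A minimal sketch using LangChain's ConversationBufferMemory is shown below,
reusing the Bangalore example; the stored SQL answer is illustrative.

    from langchain.memory import ConversationBufferMemory

    # Stores prior turns so follow-ups can resolve references like "they".
    memory = ConversationBufferMemory(memory_key="history")

    memory.save_context(
        {"input": "Show clients in Bangalore."},
        {"output": "SELECT * FROM clients WHERE city = 'Bangalore';"},
    )

    # The accumulated history is injected into the next prompt, letting the
    # model resolve "they" to the clients returned in the previous turn.
    history = memory.load_memory_variables({})["history"]
    next_prompt = (
        f"Conversation so far:\n{history}\n\n"
        "Question: How many total orders do they have?\nSQL:"
    )
    print(next_prompt)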

2.6.2 MERIT

● Maintains dialogue continuity for multi-turn queries.
● Reduces user effort by eliminating repetitive inputs.
● Users can explore data incrementally through chained queries.

2.6.3 DEMERIT

● The base system treats each query in isolation, discarding any context from previous interactions.
● Users must repeat entire query details even for simple follow-ups, which can be inefficient and frustrating.
● Potential risk of context leakage between user sessions.
