Project Report - 4
LITERATURE REVIEW
2.1 LLM-BASED TEXT-TO-SQL GENERATION
2.1.1 DESCRIPTION
The base paper emphasizes the use of large language models (LLMs), such as GPT-based architectures, for translating natural language into SQL queries. It focuses on prompt formatting strategies and benchmark testing to evaluate model performance. While effective in structured experiments, this approach is limited by static prompt design and lacks integration with live database systems. Our project builds upon this by implementing a real-time, schema-aware solution using LangChain. Instead of relying on static schema inputs, we dynamically fetch database schema information and inject it into prompts, enabling more accurate and context-sensitive SQL generation. Unlike the base paper, which evaluates outputs offline, our system interacts directly with live databases, delivering instant query execution and feedback. This shift makes the solution more practical and adaptable to real-world use cases.
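A minimal sketch of this schema-aware flow is shown below, using LangChain's SQLDatabase utility and an OpenAI chat model; the database URI and model name are illustrative placeholders, not the project's actual configuration.

from langchain_community.utilities import SQLDatabase
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Connect to a live database; the URI is an illustrative placeholder.
db = SQLDatabase.from_uri("sqlite:///example.db")

prompt = PromptTemplate.from_template(
    "Given the following database schema:\n{schema}\n\n"
    "Write a syntactically correct SQL query for: {question}\n"
    "Return only the SQL statement."
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # placeholder model

def answer(question: str) -> str:
    # Fetch the schema from the live database at query time,
    # rather than hardcoding it into a static prompt.
    schema = db.get_table_info()
    sql = llm.invoke(prompt.format(schema=schema, question=question)).content
    # Execute immediately against the live database for instant feedback.
    return db.run(sql)

Fetching the schema inside the function, rather than at startup, is what keeps the prompt consistent with the database even as tables change.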
2.1.2 MERIT
2.1.3 DEMERIT
2.2 FEW-SHOT LEARNING FOR TEXT-TO-SQL
2.2.1 DESCRIPTION
The base paper explores prompt-based learning using two types of fixed prompt structures (Type I and Type II), but it applies few-shot examples in a static manner, embedding hardcoded natural language and SQL pairs into prompts during benchmarking. While this improves performance, it lacks adaptability across dynamic schemas or varying user queries. In contrast, our solution adopts a dynamic few-shot strategy using LangChain's FewShotPromptTemplate. Here, three to five relevant natural language questions and their corresponding SQL queries are programmatically selected and inserted into the prompt based on the user's current query context. This method simulates how humans learn: by observing examples before attempting similar tasks. It improves generalization without the need for large training datasets or model fine-tuning, and its dynamic nature allows better alignment with schema variations and query intent in real time.
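The sketch below illustrates this dynamic selection step; the example pairs are illustrative rather than drawn from the project's dataset, and the choice of OpenAI embeddings with a FAISS vector store as the similarity backend is an assumption, not a fixed project decision.

from langchain_community.vectorstores import FAISS
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_openai import OpenAIEmbeddings

# Illustrative NL/SQL pairs; in practice these come from the target schema.
examples = [
    {"question": "List all customers from Chennai.",
     "sql": "SELECT * FROM customers WHERE city = 'Chennai';"},
    {"question": "Count the orders placed in 2023.",
     "sql": "SELECT COUNT(*) FROM orders WHERE strftime('%Y', order_date) = '2023';"},
    {"question": "Show the ten most expensive products.",
     "sql": "SELECT * FROM products ORDER BY unit_price DESC LIMIT 10;"},
]

# Embed the examples once and pick the 3 most similar to each new query.
selector = SemanticSimilarityExampleSelector.from_examples(
    examples, OpenAIEmbeddings(), FAISS, k=3,
)

example_prompt = PromptTemplate.from_template("Question: {question}\nSQL: {sql}")

few_shot_prompt = FewShotPromptTemplate(
    example_selector=selector,
    example_prompt=example_prompt,
    prefix="Translate the question into SQL, following these examples:",
    suffix="Question: {input}\nSQL:",
    input_variables=["input"],
)

print(few_shot_prompt.format(input="How many customers are from Mumbai?"))

Because the selector re-ranks examples for every incoming question, the same template adapts to different schemas and query intents without manual re-tuning.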
2.2.2 MERIT
2.2.3 DEMERIT
2.3 DYNAMIC TABLE SELECTION USING SEMANTIC EMBEDDINGS
2.3.1 DESCRIPTION
The base paper includes the schemas of all database tables in the prompt regardless of their relevance to the user's query. This increases prompt length, introduces noise, and reduces accuracy, especially in large or complex databases. Our solution overcomes these limitations through dynamic table selection using semantic embeddings. Each table name and its metadata are converted into vector representations using a pre-trained embedding model. At query time, the user's question is embedded in the same vector space and compared against these table vectors, and only the most similar tables are retained. This allows the model to focus on only the necessary schema elements, improving the quality of SQL generation and minimizing the inclusion of irrelevant or conflicting tables.
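A minimal sketch of this selection step follows, assuming the sentence-transformers library with a small general-purpose model; the model name, table list, and value of k are illustrative assumptions.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

# Table names plus a short metadata description for each (illustrative).
tables = {
    "customers": "customer id, name, city, signup date",
    "orders": "order id, customer id, order date, total amount",
    "products": "product id, name, category, unit price",
}

# Pre-compute one vector per table from its name and metadata.
table_texts = [f"{name}: {meta}" for name, meta in tables.items()]
table_vecs = model.encode(table_texts, normalize_embeddings=True)

def select_tables(question: str, k: int = 2) -> list[str]:
    """Return the k tables whose embeddings are most similar to the query."""
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = table_vecs @ q_vec  # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    names = list(tables)
    return [names[i] for i in top]

print(select_tables("How much did each customer spend last month?"))
# Likely ['orders', 'customers']; only these schemas enter the prompt.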
2.3.2 MERIT
2.3.3 DEMERIT
2.4 PROMPT DESIGN AND TEMPLATING
2.4.1 DESCRIPTION
The base paper explores two prompt types (Type I and Type II), showing that changing the structure and verbosity of prompts significantly impacts performance. However, these prompts were manually crafted and static, requiring trial-and-error tuning per use case. There was no support for dynamically adapting the prompt based on schema, task, or query complexity. Our implementation addresses this limitation through LangChain's PromptTemplate and FewShotPromptTemplate, which allow programmatic and flexible prompt construction. Each prompt includes a clear task instruction, dynamically injected schema, and, where relevant, few-shot examples tailored to the user's current query. This reduces manual overhead, ensures consistency, and allows the system to handle varied schemas and query types more effectively.
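The sketch below shows this programmatic assembly; the template wording and field names are illustrative rather than the report's exact templates.

from langchain_core.prompts import PromptTemplate

# One reusable template: task instruction, injected schema, optional examples.
sql_prompt = PromptTemplate.from_template(
    "You are an expert SQL assistant.\n\n"
    "Schema:\n{schema}\n\n"
    "{examples}"
    "Question: {question}\nSQL:"
)

def build_prompt(schema: str, question: str, examples: str = "") -> str:
    # Few-shot examples are included only when relevant ones were retrieved.
    block = f"Examples:\n{examples}\n\n" if examples else ""
    return sql_prompt.format(schema=schema, examples=block, question=question)

print(build_prompt(
    schema="CREATE TABLE employees (id INT, name TEXT, dept TEXT);",
    question="How many employees are in each department?",
))

Keeping the instruction, schema, and examples as separate template fields is what lets one template serve many schemas and query types without per-case manual tuning.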
2.4.2 MERIT
2.4.3 DEMERIT
2.5.1 DESCRIPTION
2.5.2 MERIT
2.5.3 DEMERIT
2.6.1 DESCRIPTION
2.6.2 MERIT
2.6.3 DEMERIT
● The base system treats each query in isolation, discarding any context from
previous interactions.
● Users must repeat entire query details even for simple follow-ups, which
can be inefficient and frustrating.
● Potential risk of context leakage between user sessions.