Project Report - 7 - Merged
SQL: AN INTELLIGENT CONVERSATIONAL INTERFACE FOR DATABASE QUERYING
2025
Submitted by
THOUFEEQ A 71772117146
VAITHEESHWARAN S 71772117147
VISWESWARAN G S 71772117150
BOOPATHI HARI R 71772117L03
PROJECT WORK
APRIL 2025
of B.E. (COMPUTER SCIENCE AND ENGINEERING) during the year 2024 - 2025
We would like to dedicate the work to our parents for their constant
encouragement throughout the project. We also thank all our friends for their
cooperation and suggestions towards the successful completion of this project.
SYNOPSIS
The system utilizes Large Language Models (LLMs) to generate SQL queries
from natural-language user questions. To improve accuracy, it employs dynamic
few-shot learning: relevant examples are selected based on their similarity to the
user query, thereby enhancing the relevance of the prompt.
One of the most important features of the system is dynamic table selection:
the system scans input keywords and includes only the relevant tables from the
database schema. This improves the accuracy of SQL results by avoiding
references to unnecessary or irrelevant tables.
The SQL query is generated by GPT and then executed on a networked
MySQL database. The query result is passed through GPT once more to produce
a paraphrased, natural-language response, making the output more user-friendly.
To handle multi-turn conversations and follow-up queries, the system uses
LangChain's Conversation Buffer Memory. With this feature, the chatbot maintains
context awareness across questions, making conversations more natural and
coherent.
TABLE OF CONTENTS
BONAFIDE CERTIFICATE II
ACKNOWLEDGEMENT III
SYNOPSIS IV
TABLE OF CONTENTS V
1 INTRODUCTION 1
1.1 DESCRIPTION
2 LITERATURE REVIEW 4
2.1.1 DESCRIPTION
2.1.2 MERIT
2.1.3 DEMERIT
2.2.1 DESCRIPTION
2.2.2 MERIT
2.2.3 DEMERIT
2.3 DYNAMIC TABLE SELECTION 5
2.3.1 DESCRIPTION
2.3.2 MERIT
2.3.3 DEMERIT
2.4.1 DESCRIPTION
2.4.2 MERIT
2.4.3 DEMERIT
2.5.1 DESCRIPTION
2.5.2 MERIT
2.5.3 DEMERIT
2.6.1 DESCRIPTION
2.6.2 MERIT
2.6.3 DEMERIT
3 SYSTEM SPECIFICATION 9
4.5 TRAINING
5.1 IMPLEMENTATION 21
5.1.1 REQUIREMENTS
5.2 OUTPUT 27
6 CONCLUSION 29
6.1 CONCLUSION
7 REFERENCES 30
CHAPTER 1
INTRODUCTION
1.1 DESCRIPTION
1. Model Selection: With the emergence of various LLMs, selecting the
most suitable model for Text-to-SQL is challenging due to variations in
architecture, size, training data, and computational requirements.
2. No Dynamic Table Selection: Existing models typically consider the
entire schema during query generation, which leads to longer response times and
reduced accuracy.
3. Minimal Use of Few-Shot Learning: While few-shot learning could
enhance adaptability, existing systems rarely use contextual examples to improve
performance on previously unseen query patterns.
4. Lack of Memory Cache: Increases query processing time as results
are not stored for reuse, leading to redundant computations.
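The caching gap noted in point 4 can be illustrated with a short Python sketch using `functools.lru_cache`; the query function and its return value below are hypothetical stand-ins for a real database call, with a counter added so the caching effect is observable.

```python
from functools import lru_cache

# Counts how often the (hypothetical) database is actually hit.
CALLS = {"count": 0}

@lru_cache(maxsize=256)
def run_cached_query(sql: str):
    CALLS["count"] += 1
    # A real system would execute against MySQL here; we echo the
    # query to keep the sketch self-contained.
    return f"result-for:{sql}"

run_cached_query("SELECT COUNT(*) FROM orders")
run_cached_query("SELECT COUNT(*) FROM orders")  # served from cache
```

Repeating the same query does not re-invoke the underlying call, which is exactly the redundant computation a memory cache avoids.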
1.3 PROBLEM DEFINITION
1.5 ORGANIZATION OF THE PROJECT
CHAPTER 2
LITERATURE REVIEW
2.1.1 DESCRIPTION
The base paper emphasizes the use of large language models (LLMs),
such as GPT-based architectures, for translating natural language into SQL
queries. It focuses on prompt formatting strategies and benchmark testing to
evaluate model performance. While effective in structured experiments, this
approach is limited by static prompt design and lacks integration with live
database systems. Our project builds upon this by implementing a real-time,
schema-aware solution using LangChain. Instead of relying on static schema
inputs, we dynamically fetch and inject database schema information into
prompts, enabling more accurate and context-sensitive SQL generation. Unlike
the base paper, which evaluates outputs offline, our system interacts directly with
live databases, delivering instant query execution and feedback. This shift makes
the solution more practical and adaptable to real-world use cases.
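The schema-fetching step described above can be sketched with the standard library alone. The real system targets MySQL through LangChain; this illustrative sketch reads table definitions from an in-memory SQLite database at request time and injects them into the prompt, which is the same idea.

```python
import sqlite3

def get_schema_description(conn) -> str:
    """Read CREATE TABLE statements from the live database so the
    prompt always reflects the current schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return "\n".join(r[0] for r in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Inject the freshly fetched schema into the prompt.
prompt = f"You are an expert SQL assistant.\nSchema:\n{get_schema_description(conn)}"
```

Because the schema is read per request rather than hardcoded, prompts stay correct even after the database structure changes.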
2.1.2 MERIT
2.1.3 DEMERIT
2.2 FEW-SHOT LEARNING FOR TEXT-TO-SQL
2.2.1 DESCRIPTION
The base paper explores prompt-based learning using two types of fixed
prompt structures (Type I and Type II), but it applies few-shot examples in a static
manner, embedding hardcoded natural language and SQL pairs into prompts
during benchmarking. While this improves performance, it lacks adaptability
across dynamic schemas or varying user queries. In contrast, our solution adopts
a dynamic few-shot strategy using LangChain's FewShotPromptTemplate. Here,
3–5 relevant natural language questions and their corresponding SQL queries are
programmatically selected and inserted into the prompt based on the user's
current query context. This method simulates how humans learn: by observing
examples before attempting similar tasks. It improves generalization without the
need for large training datasets or model fine-tuning. The dynamic nature allows
better alignment with schema variations and query intent in real time.
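The dynamic selection step can be sketched without LangChain: score each stored (question, SQL) pair against the incoming query and keep the top matches. The example pool and the word-overlap scorer below are illustrative simplifications; the project itself uses LangChain's FewShotPromptTemplate with similarity-based selection.

```python
EXAMPLES = [  # hypothetical pool of (question, SQL) pairs
    ("How many orders were placed today?",
     "SELECT COUNT(*) FROM orders WHERE order_date = CURDATE();"),
    ("List all customers from Chennai.",
     "SELECT name FROM customers WHERE city = 'Chennai';"),
    ("What is the total revenue this month?",
     "SELECT SUM(amount) FROM orders WHERE MONTH(order_date) = MONTH(NOW());"),
]

def select_examples(user_query: str, k: int = 2):
    """Rank stored examples by word overlap with the user query."""
    q_words = set(user_query.lower().split())
    scored = sorted(
        EXAMPLES,
        key=lambda ex: len(q_words & set(ex[0].lower().split())),
        reverse=True,
    )
    return scored[:k]

best = select_examples("How many orders came in today?", k=1)
```

A production system would replace word overlap with embedding similarity, but the ranking-and-truncation structure is the same.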
2.2.2 MERIT
2.2.3 DEMERIT
2.3.1 DESCRIPTION
The base approach includes every table in the prompt regardless of its
relevance to the user's query. This increases prompt length, introduces noise, and
reduces accuracy, especially in large or complex databases. Our solution
overcomes these limitations through dynamic table selection using semantic
embeddings. Each table name and its metadata are converted into vector
representations using a pre-trained embedding model. This allows the model to
focus on only the necessary schema elements, improving the quality of SQL
generation and minimizing the inclusion of irrelevant or conflicting tables.
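The ranking step behind dynamic table selection can be sketched with a toy bag-of-words vectorizer and cosine similarity; the table names and metadata keywords below are hypothetical, and the real system would use a pre-trained embedding model instead of word counts.

```python
import math
from collections import Counter

TABLES = {  # hypothetical table -> metadata keywords
    "customers": "customer name email city signup",
    "orders": "order date amount customer purchase",
    "products": "product price category stock",
}

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevant_tables(query: str, top_k: int = 1):
    """Rank tables by similarity between the query and table metadata."""
    qv = vectorize(query)
    ranked = sorted(
        TABLES, key=lambda t: cosine(qv, vectorize(TABLES[t])), reverse=True
    )
    return ranked[:top_k]

picked = relevant_tables("total order amount per customer")
```

Only the top-ranked tables are injected into the prompt, keeping it short and free of conflicting schema elements.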
2.3.2 MERIT
2.3.3 DEMERIT
2.4.1 DESCRIPTION
The base paper explores two prompt types (Type I and Type II), showing
that changing the structure and verbosity of prompts significantly impacts
performance. However, these prompts were manually crafted and static, requiring
trial-and-error tuning per use case. There was no support for dynamically adapting
the prompt based on schema, task, or query complexity. Our implementation
addresses this limitation through LangChain's PromptTemplate and
FewShotPromptTemplate, which allow programmatic and flexible prompt
construction. Each prompt includes a clear task instruction, dynamically injected
schema, and optionally relevant few-shot examples tailored to the user's current
query. This reduces manual overhead, ensures consistency, and allows the
system to handle varied schemas and query types more effectively.
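Programmatic prompt assembly can be sketched with plain Python (the project itself uses LangChain's template classes); the schema string and example below are hypothetical.

```python
def build_prompt(schema: str, examples: list[str], question: str) -> str:
    """Assemble instruction, injected schema, optional few-shot
    examples, and the user question into one prompt string."""
    parts = [
        "You are an expert SQL assistant. Return only the raw SQL query.",
        f"Schema:\n{schema}",
    ]
    if examples:
        parts.append("Examples:\n" + "\n".join(examples))
    parts.append(f"User: {question}\nSQL:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "customers(id, name, city)",
    ["Q: count customers -> SELECT COUNT(*) FROM customers;"],
    "How many customers are there?",
)
```

Because each component is a parameter, the same builder serves every schema and query type without manual re-tuning.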
2.4.2 MERIT
2.4.3 DEMERIT
2.5.1 DESCRIPTION
2.5.2 MERIT
2.5.3 DEMERIT
2.6.1 DESCRIPTION
2.6.2 MERIT
2.6.3 DEMERIT
● The base system treats each query in isolation, discarding any context from
previous interactions.
● Users must repeat entire query details even for simple follow-ups, which
can be inefficient and frustrating.
● Potential risk of context leakage between user sessions.
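The context problems listed above are what per-session conversation memory addresses. The project uses LangChain's ChatMessageHistory keyed by session ID; this stdlib sketch mirrors that keying with a plain dict (the session IDs and messages are hypothetical), showing how histories stay isolated so no context leaks between users.

```python
store: dict[str, list[tuple[str, str]]] = {}

def get_history(session_id: str) -> list[tuple[str, str]]:
    """Return (and create if needed) the per-session message buffer."""
    return store.setdefault(session_id, [])

def remember(session_id: str, question: str, answer: str) -> None:
    get_history(session_id).append((question, answer))

remember("user-a", "How many orders today?", "There were 42 orders.")
remember("user-a", "And yesterday?", "There were 37 orders.")
remember("user-b", "List products.", "Here are the products...")
```

A follow-up like "And yesterday?" can be resolved because user-a's earlier turn is still in the buffer, while user-b's history remains untouched.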
CHAPTER 3
SYSTEM SPECIFICATION
RAM : 8 GB or higher
Platform Support
Operating System
Browser Performance
OpenAI provides tools and APIs for developing AI-based applications utilizing
advanced language models such as ChatGPT (GPT-3.5 / GPT-4).
KEY FEATURES
● Facilitates tasks such as text generation, summarization, translation, and
conversational AI.
3.2.5 PACKAGE: PYMYSQL
KEY FEATURES
● Includes connectors and components for third-party tools.
KEY FEATURES
● Integrates well with other Python libraries like NumPy and Matplotlib.
CHAPTER 4
METHODOLOGY
ARCHITECTURE DIAGRAM
4.2 ARCHITECTURE DESCRIPTION
This module ensures that the entire input is correctly captured, stored, and
validated before moving to the next step. It also supports multi-line prompts and
edge cases like incomplete or ambiguous queries, acting as a bridge between
human input and machine logic. Moreover, it enhances usability by allowing
flexible and varied query formats. By handling user input gracefully, it increases
the system's overall user-friendliness.
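A minimal sketch of this validation step is shown below; the specific checks (empty input, too-short questions) are illustrative assumptions, not the project's exact rules.

```python
def validate_input(raw: str) -> tuple[bool, str]:
    """Normalize a user query and flag inputs the pipeline
    cannot act on (empty, or too short to be a real question)."""
    text = " ".join(raw.split())  # collapse whitespace and line breaks
    if not text:
        return False, "Query is empty."
    if len(text.split()) < 2:
        return False, "Query is too short to interpret."
    return True, text

ok, cleaned = validate_input("  how many\n orders today?  ")
```

Multi-line input is flattened into a single normalized question before it reaches the SQL-generation stage.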
4.3.4 Response Generation using GPT
Once the SQL query is executed and the raw result is retrieved from
the database, this module takes over to convert the output into a user-friendly
sentence. It ensures that the final result is not a plain table or number but a
well-structured and grammatically correct English sentence. This is crucial for
users who are not familiar with SQL output formats. The module uses
template-based or model-based methods to transform tabular responses into
human-readable narratives. It may also include additional explanation or
summarization for clarity. This layer adds polish to the user experience and
bridges the gap between technical data and user understanding. It plays a key
role in making the system usable by non-technical stakeholders. Its goal is to
provide accurate, readable, and meaningful results.
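The template-based fallback mentioned above can be sketched as follows; in the actual system GPT performs this rephrasing, so the templates and example rows here are purely illustrative.

```python
def to_sentence(question: str, rows: list[tuple]) -> str:
    """Template-based fallback: turn a raw result set into a
    readable English sentence."""
    if not rows:
        return "No matching records were found."
    if len(rows) == 1 and len(rows[0]) == 1:
        # Single scalar result, e.g. a COUNT or SUM.
        return f"The answer to '{question}' is {rows[0][0]}."
    listing = "; ".join(", ".join(str(v) for v in row) for row in rows)
    return f"Found {len(rows)} matching records: {listing}."

sentence = to_sentence("How many orders today?", [(42,)])
```

The scalar case covers the most common aggregate queries, while the multi-row case degrades gracefully to a summarized listing.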
4.3.7 Chatbot UI
The ChatBot UI is the graphical user interface that allows the user to
communicate with the system. It features an input box for questions, a display
area for the responses, and a conversational layout for interaction history. The UI
is designed to be simple, clean, and responsive to accommodate all types of
users. Built with modern front-end technologies, it supports real-time interactions
and dynamic updates. The interface makes it easy to handle multi-turn queries
and lets users visualize both their input and the output clearly. The UI plays a key
role in increasing engagement and usability. It also offers basic validations, error
handling, and retry mechanisms. Overall, this module makes the underlying
complex system accessible to all users.
4.5 TRAINING
● Query correctness
● Schema relevance
● Response clarity
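Of the criteria above, query correctness is the most mechanical to measure. The sketch below uses normalized exact-match accuracy as a simple proxy; the evaluation protocol actually used in the project may differ, and the sample queries are hypothetical.

```python
def query_correctness(predictions: list[str], gold: list[str]) -> float:
    """Exact-match accuracy after whitespace/case normalization,
    a simple proxy for the 'query correctness' criterion."""
    def norm(q: str) -> str:
        return " ".join(q.lower().split()).rstrip(";")
    hits = sum(norm(p) == norm(g) for p, g in zip(predictions, gold))
    return hits / len(gold) if gold else 0.0

acc = query_correctness(
    ["SELECT COUNT(*) FROM orders;", "select name from customers"],
    ["select count(*) from orders", "SELECT name FROM customers;"],
)
```

Normalization keeps superficial formatting differences (case, trailing semicolons) from counting as errors, so only genuinely different SQL lowers the score.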
4.8 DATASETS USED
This structured schema allows the system to perform complex SQL queries across
multiple tables, supporting a wide range of user prompts. The dataset was
synthetically generated to mimic real-world e-commerce data and is used as the
backend for validating SQL queries and generating responses.
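The kind of cross-table query this schema supports can be illustrated with a two-table sketch; the table names, columns, and rows below are hypothetical stand-ins for the synthetic e-commerce dataset, run here against SQLite instead of MySQL for self-containment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0);
""")

# A multi-table aggregate of the kind user prompts would generate.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY SUM(o.amount) DESC
""").fetchall()
```

Queries like this join and aggregation are what the generated SQL is validated against before the result is rephrased for the user.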
CHAPTER 5
IMPLEMENTATION AND RESULT
5.1.1 REQUIREMENTS
The following packages were installed using pip:
● gunicorn
● Django>=4.2
● openai
● langchain>=0.1.13
● langchain-community>=0.0.26
● langchain-core>=0.1.25
● sqlalchemy>=2.0
● pymysql
● djangorestframework
● django-cors-headers
Langchain_core.py
import os
import re
from functools import lru_cache
from django.conf import settings
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_message_histories import ChatMessageHistory
# === Session Memory ===
store = {}

def get_memory(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
# === SQL Prompt Builder ===
def build_sql_prompt():
    schema = get_schema_description()
    return PromptTemplate.from_template(
        f"""You are an expert SQL assistant.
Use the schema and examples below to write a valid MySQL query for the user's question.
Only return the raw SQL query without explanation or markdown formatting.
Schema:
{schema}
{EXAMPLES}
Chat History:
{{chat_history}}
User: {{question}}
SQL:"""
    )
def execute_query(query):
    try:
        print(f"\nExecuting SQL Query:\n{query}")
        result = db.run(query)
        return result if result else {"info": "No results found."}
    except Exception as e:
        print(f"\nSQL Execution Error: {e}")
        return {"error": f"Query failed: {str(e)}"}

answer_prompt = PromptTemplate.from_template(
    """Given the user's question, the SQL query, and the SQL result, write a natural language answer.
Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer:"""
)
rephrase_chain = answer_prompt | llm | StrOutputParser()

def process_question(question, session_id):
    # 1. Generate SQL
    raw_sql = memory_chain.invoke(
        {"question": question},
        config={"configurable": {"session_id": session_id}}
    )
    clean_query = clean_sql_output(raw_sql)
    clean_query = correct_common_sql_errors(clean_query)
    # 2. Execute SQL
    sql_result = execute_query(clean_query)
    # 3. Rephrase answer
    return rephrase_chain.invoke({
        "question": question,
        "query": clean_query,
        "result": str(sql_result)
    })
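The listing calls clean_sql_output and correct_common_sql_errors, which are defined elsewhere in the module. A plausible sketch of what such helpers do is shown below; the exact rules in the project may differ.

```python
import re

def clean_sql_output(raw: str) -> str:
    """Strip markdown fences and labels the model sometimes emits
    around the SQL it returns."""
    text = re.sub(r"```(?:sql)?", "", raw).strip()
    return text.removeprefix("SQL:").strip()

def correct_common_sql_errors(query: str) -> str:
    """Normalize whitespace and ensure the statement is terminated."""
    query = " ".join(query.split())
    return query if query.endswith(";") else query + ";"

cleaned = correct_common_sql_errors(
    clean_sql_output("```sql\nSELECT * FROM orders\n```")
)
```

Post-processing like this guards against the common failure mode where the LLM wraps otherwise valid SQL in markdown formatting.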
urls.py
urlpatterns = [
    path("ask/", AskQuestion.as_view(), name="ask-question"),
]
views.py
class AskQuestion(APIView):
    def post(self, request):
        question = request.data.get("question")
        session_id = request.data.get("session_id", "default-session")
        answer = process_question(question, session_id)
        return Response({"answer": answer})
Index.tsx
import {
  AlertTriangle,
  CheckCircle,
  HelpCircle,
  Loader,
} from "lucide-react";
type Message = {
  role: "user" | "assistant";
  text: string;
  timestamp: Date;
  status?: "success" | "error" | "pending";
};
useEffect(() => {
  if (textareaRef.current) {
    textareaRef.current.style.height = `${Math.min(
      textareaRef.current.scrollHeight,
      150
    )}px`;
  }
}, [question]);
try {
const res = await axios.post(
"https://nl2sql-backend-zqrg.onrender.com/api/ask/",
{
question: text,
session_id: "frontend-user",
}
);
setMessages([
...updatedMessages,
{
role: "assistant",
text: res.data.answer,
timestamp: new Date(),
status: "success",
},
]);
} catch (err) {
setMessages([
...updatedMessages,
{
role: "assistant",
text: "Something went wrong. Please try again.",
timestamp: new Date(),
status: "error",
},
]);
} finally {
setLoading(false);
}
};
return (
<div
className={`flex flex-col h-screen ${
theme === "dark" ? "bg-gray-900 text-white" : "bg-gray-50 text-gray-900"
}`}
>
{/* Header with better branding */}
<header
className={`${
theme === "dark"
? "bg-gray-800 border-gray-700"
: "bg-white border-gray-200"
} border-b px-6 py-4 flex items-center justify-between shadow-sm`}
>
<div className="flex items-center space-x-3">
<div className="bg-gradient-to-r from-blue-600 to-purple-600 text-white
p-2 rounded-lg">
<Database size={24} />
</div>
<h1 className="text-2xl font-bold bg-gradient-to-r from-blue-600
to-purple-600 text-transparent bg-clip-text">
SQL Assistant
</h1>
</div>
<div className="flex items-center space-x-4">
<button
onClick={toggleTheme}
className={`${
theme === "dark"
? "text-gray-300 hover:text-white"
: "text-gray-500 hover:text-gray-700"
} flex items-center gap-1 text-sm px-3 py-1 rounded-md
hover:bg-opacity-10 hover:bg-gray-500`}
>
{theme === "dark" ? "☀️ Light" : "🌙 Dark"}
</button>
<button
onClick={clearConversation}
className={`${
theme === "dark"
? "text-gray-300 hover:text-white"
: "text-gray-500 hover:text-gray-700"
} flex items-center gap-1 text-sm px-3 py-1 rounded-md
hover:bg-opacity-10 hover:bg-gray-500`}
>
<Trash2 size={16} />
Clear chat
</button>
</div>
</header>
} border rounded-lg text-left hover:border-blue-300 transition-all flex
items-start space-x-3`}
onClick={() => useSuggestion(suggestion.text)}
>
<div
className={`mt-1 ${
theme === "dark" ? "text-blue-400" : "text-blue-500"
}`}
>
<ChevronRight size={16} />
</div>
<div>
<div
className={`text-sm font-medium ${
theme === "dark" ? "text-blue-400" : "text-blue-600"
}`}
>
{suggestion.category}
</div>
<div>{suggestion.text}</div>
</div>
</button>
))}
</div>
</div>
) : null}
{messages.map((msg, i) => (
<div
key={i}
className={`flex ${
msg.role === "user" ? "justify-end" : "justify-start"
}`}
>
<div
className={`rounded-2xl p-4 max-w-3xl whitespace-pre-wrap
shadow-sm
${
msg.role === "user"
? "bg-gradient-to-r from-blue-500 to-blue-600 text-white"
: theme === "dark"
? "bg-gray-800 border-gray-700 text-gray-100"
: "bg-white border border-gray-200 text-gray-800"
}
`}
>
<div className="flex justify-between items-center mb-2">
<div className="flex items-center">
{msg.role === "user" ? (
<span className="font-semibold flex items-center">
You
</span>
):(
<span className="font-semibold flex items-center">
<Database size={16} className="mr-1" /> SQL Assistant
</span>
)}
</div>
<span
className={`text-xs ${
msg.role === "user"
? "opacity-75"
: theme === "dark"
? "text-gray-400"
: "text-gray-500"
}`}
>
{formatTime(msg.timestamp)}
</span>
</div>
<div
className={`${msg.role === "assistant" ? "prose" : ""} ${
theme === "dark" && msg.role === "assistant"
? "prose-invert"
: ""
}`}
>
{msg.text}
</div>
{msg.status === "error" && (
<div className="flex items-center text-red-500 text-sm mt-2">
<AlertTriangle size={14} className="mr-1" /> Error: Unable
to process request
</div>
)}
</div>
</div>
))}
{loading && (
<div className="flex justify-start">
<div
className={`${
theme === "dark"
? "bg-gray-800 border-gray-700 text-gray-300"
: "bg-white border-gray-200 text-gray-600"
} rounded-2xl p-4 shadow-sm border flex items-center space-x-3`}
>
<Loader size={18} className="animate-spin" />
<span>Generating response...</span>
</div>
</div>
)}
theme === "dark" ? "bg-gray-700" : "bg-white"
} rounded-xl border ${
isExpanded
? "border-blue-400 shadow-md"
: theme === "dark"
? "border-gray-600 shadow-sm"
: "border-gray-300 shadow-sm"
} transition-all duration-200`}
>
<textarea
ref={textareaRef}
className={`w-full p-4 pr-24 resize-none focus:outline-none rounded-xl
max-h-36 ${
theme === "dark"
? "bg-gray-700 text-white placeholder-gray-400"
: "bg-white text-gray-700 placeholder-gray-500"
}`}
placeholder="Ask about your database..."
value={question}
onChange={(e) => {
setQuestion(e.target.value);
setIsExpanded(e.target.value.length > 0);
}}
onKeyDown={handleKeyDown}
onFocus={() => setIsExpanded(true)}
onBlur={() => setIsExpanded(question.length > 0)}
rows={1}
/>
<button
onClick={() => askQuestion()}
disabled={loading || !question.trim()}
className={`absolute bottom-3 right-3 p-2 rounded-lg transition-all ${
loading || !question.trim()
? theme === "dark"
? "bg-gray-600 text-gray-400"
: "bg-gray-100 text-gray-400"
: "bg-gradient-to-r from-blue-500 to-purple-600 text-white shadow-md
hover:shadow-lg"
}`}
>
<Send size={20} />
</button>
</div>
<div className="flex justify-between items-center mt-2">
<p
className={`text-xs ${
theme === "dark" ? "text-gray-400" : "text-gray-500"
}`}
>
Press Enter to send • Shift+Enter for new line
</p>
<div className="flex items-center">
<span
className={`text-xs mr-2 ${
theme === "dark" ? "text-gray-400" : "text-gray-500"
}`}
>
Powered by AI
</span>
<Code
size={14}
className={theme === "dark" ? "text-gray-400" : "text-gray-500"}
/>
</div>
</div>
</div>
</footer>
</div>
);
}
5.2 OUTPUT
CHAPTER 6
CONCLUSION
6.1 CONCLUSION
CHAPTER 7
REFERENCES
[1] Biderman, S., Schoelkopf, H., Anthony, Q.G., Bradley, H., O’Brien, K.,
Hallahan, E., Khan, M.A., Purohit, S., Prashanth, U.S.V.S.N., Raff, E., et al., 2023,
"Pythia: A Suite for Analyzing Large Language Models Across Training and
Scaling", International Conference on Machine Learning, PMLR, pp. 2397–2430
[2] Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar,
E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., et al., 2023, "Sparks of Artificial General
Intelligence: Early Experiments with GPT-4", arXiv preprint, arXiv:2303.12712
[3] Dahl, D.A., Bates, M., Brown, M.K., Fisher, W.M., Hunicke-Smith, K., Pallett,
D.S., Pao, C., Rudnicky, A., Shriberg, E., 1994, "Expanding the Scope of the ATIS
Task: The ATIS-3 Corpus", Human Language Technology Workshop Proceedings,
Plainsboro, New Jersey, March 8–11, 1994
[4] Edward, B., Fourrier, C., Habib, N., Han, S., Lambert, N., Rajani, N.,
Sanseviero, O., Tunstall, L., Wolf, T., 2023, "Open LLM Leaderboard",
HuggingFace
[5] Geng, X., Gudibande, A., Liu, H., Wallace, E., Abbeel, P., Levine, S., Song,
D., 2023, "Koala: A Dialogue Model for Academic Research", Blog Post
[6] Hemphill, C.T., Godfrey, J.J., Doddington, G.R., 1990, "The ATIS Spoken
Language Systems Pilot Corpus", Speech and Natural Language Workshop
Proceedings, Hidden Valley, Pennsylvania, June 24–27, 1990
[7] Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T.,
Rutherford, E., de Las Casas, D., Hendricks, L.A., Welbl, J., Clark, A., et al., 2022,
"Training Compute-Optimal Large Language Models", arXiv preprint,
arXiv:2203.15556
[8] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen,
W., 2021, "LoRA: Low-Rank Adaptation of Large Language Models", arXiv
preprint, arXiv:2106.09685
[9] Katsogiannis-Meimarakis, G., Koutrika, G., 2023, "A Survey on Deep
Learning Approaches for Text-to-SQL", The VLDB Journal, pp. 1–32
[10] Kocetkov, D., Li, R., Ben Allal, L., Li, J., Mou, C., Muñoz Ferrandis, C.,
Jernite, Y., Mitchell, M., Hughes, S., Wolf, T., Bahdanau, D., von Werra, L., de
Vries, H., 2022, "The Stack: 3 TB of Permissively Licensed Source Code", Preprint
[11] Köpf, A., Kilcher, Y., von Rütte, D., Anagnostidis, S., Tam, Z.R., Stevens, K.,
Barhoum, A., Duc, N.M., Stanley, O., Nagyfi, R., et al., 2023, "OpenAssistant
Conversations – Democratizing Large Language Model Alignment", arXiv preprint,
arXiv:2304.07327
[12] Koschke, R., Falke, R., Frenzel, P., 2006, "Clone Detection Using Abstract
Syntax Suffix Trees", 13th Working Conference on Reverse Engineering, IEEE,
pp. 253–262
[14] Li, H., Zhang, J., Li, C., Chen, H., 2023, "RESDSQL: Decoupling Schema
Linking and Skeleton Parsing for Text-to-SQL", AAAI Conference on Artificial
Intelligence, Vol. 37, pp. 13067–13075
[15] Li, J., Hui, B., Cheng, R., Qin, B., Ma, C., Huo, N., Huang, F., Du, W., Si, L.,
Li, Y., 2023, "Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware
Layers for Text-to-SQL Parsing", arXiv preprint, arXiv:2301.07507