Final year project report submitted to the Department of Computer Science Engineering in partial fulfillment of the requirements for the degree of Bachelor of Technology
By
Under the guidance of
Prof. Himanshu Verma
Associate Professor
CERTIFICATE
This is to certify that this project report entitled “AI Chatbot”, submitted in
partial fulfillment of the requirements for the degree of Bachelor of Technology in Computer
Science Engineering of the B.K. Birla Institute of Engineering & Technology, Pilani, during the
academic year 2025, is a bonafide record of work carried out under my guidance and
supervision.
Guide:
It gives me immense pleasure to express my deepest sense of gratitude and sincere thanks to our guide Mr.
Himanshu Verma, Assistant Professor, Department of Computer Science Engineering, BKBIET Pilani, for his
valuable guidance, encouragement, and help in completing this work. I would like to express my sincere
thanks to Dr. Nimish Kumar, HOD of Computer Science Engineering, BKBIET Pilani, for giving me this
opportunity to undertake this work. I would also like to thank Dr. Anil Kumar Sharma, Principal, BKBIET Pilani,
for his wholehearted support. I am also grateful to my teacher Mr. Himanshu Verma for his constant support.
Finally, I would like to express my sincere thanks to all my friends and others who helped me directly and indirectly.
Thanking you
We would like to express our heartfelt gratitude to everyone who has helped us throughout
our journey to complete this thesis project. First and foremost, we would like to thank our
project supervisor, Mr. Himanshu Verma, for his constant guidance, support, and
encouragement. His expertise in the fields of computer vision and machine learning has
been invaluable, and his feedback has been instrumental in shaping the direction of this
project.
We would also like to extend our sincere thanks to Dr. Himanshu Verma, the course
coordinator and examiner, for his unwavering support and encouragement throughout this
process. His guidance and feedback have been crucial in ensuring the quality and rigor of
this work. Additionally, we would like to thank our family and friends for their love,
support, and encouragement throughout this journey. Their unwavering belief in us has
been a source of strength and inspiration. Finally, we would like to express our gratitude to
all the researchers, scholars, and practitioners whose work has informed and inspired this
project. Your contributions have been invaluable, and we are honored to have been able to
build upon your work. Thank you all for your contributions, support, and encouragement.
Table of Contents
1. Introduction
1.1 Overview of the Project
1.2 Scope and Objective
2. System Design
2.1 Natural Language Processing
2.2 Advantages of Natural Language Processing
2.3 Disadvantages of Natural Language Processing
2.4 Architecture Diagram
2.5 Hardware Requirements
2.6 Software Requirements
4. Conclusion
Chapter 1.
INTRODUCTION
A chatbot is a way of solving a user’s query by interacting with a computer program. Chatbots and
other virtual methods of communication are becoming more popular as people turn away
from more conventional forms of communication. Chatbots provide a way for people to get
their queries answered virtually without having to correspond over email or talk on the phone
with a customer service person. Chatbots can be adopted by an organization to increase its
time efficiency, as AI-based chatbots can answer customers’ recurring questions easily.
SCOPE
The goal of the chatbot is to respond as precisely as possible to client questions. The
consumer should be able to receive the clearest possible responses to their inquiries.
OBJECTIVE
The main aim of the project is to provide users with a platform to get their questions and
queries answered in an easy and simple way. This helps users avoid having to wait on a call or
wait for an email to get their queries resolved. This chatbot is a way for users to get their
queries answered without having to go through a customer service call or talk to an
automated recording to get their concerns cleared. Here we train the chatbot to recognize
multiple types of words and phrases related to a particular question in order to provide as
accurate a response as possible.
Natural language processing (NLP) refers to the branch of computer science, and more
specifically the branch of artificial intelligence (AI), concerned with giving computers the
ability to understand text and spoken words in much the same way human beings can.
NLP combines computational linguistics with statistical, machine learning, and deep
learning models. Together, these technologies enable computers to process human language
in the form of text or voice data and to ‘understand’ its full meaning, complete with the
speaker or writer’s intent and sentiment.
Natural language processing strives to build machines that understand text or voice data and
respond with text or speech of their own, in much the same way humans do.
The Python programming language provides a wide range of tools and libraries for attacking
specific NLP tasks. Many of these are found in the Natural Language Toolkit, or NLTK, an
open source collection of libraries, programs, and education resources for building NLP
programs.
The NLTK includes libraries for many of the NLP tasks, plus libraries for subtasks such as
tokenization (breaking passages into tokens that help the computer better understand the text).
It also includes libraries for implementing capabilities such as semantic reasoning, the ability
to reach logical conclusions based on facts extracted from text.
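As a simplified, dependency-free sketch of what tokenization does (NLTK's own tokenizers are considerably more sophisticated), consider:

```python
import re

def simple_tokenize(text):
    # Split text into word tokens and punctuation tokens.
    # A crude stand-in for a real tokenizer, shown for illustration only.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("Chatbots answer questions, quickly!")
# tokens: ['Chatbots', 'answer', 'questions', ',', 'quickly', '!']
```

Splitting the text into such tokens is usually the first step before tagging, stemming, or intent matching.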
2.2 Advantages of Natural Language Processing
• The accuracy of the answer increases with the amount of relevant information provided in the question.
• Users can ask questions about any subject and get a direct response in seconds.
• It is easy to implement.
• Using a program is less costly than hiring a person, who can attend to only two or three customers at a time.
• NLP helps computers communicate with humans in their own language.
2.3 Disadvantages of Natural Language Processing
• Training an NLP model can take a week or more to achieve good performance, depending on the amount of data.
• The system is built for a single, specific task only; it is unable to adapt to new domains or problems.
2.5 Hardware Requirements
• Windows 7 or 10
• 32 GB RAM
• 3 GB Storage
2.6 Software Requirements
• Nltk module
• Tensorflow module
A Python library is a collection of related modules. It contains bundles of code that can be used
repeatedly in different programs, which makes Python programming simpler and more convenient
for the programmer, as we don’t need to write the same code again and again for different
programs. Python libraries play a vital role in fields such as Machine Learning and Data Science.
The libraries used in this project are:
• Nltk
• Tensorflow
• Numpy
• Random
• Pickle
Nltk:
The Natural Language Toolkit (NLTK) is a platform used for building Python programs that
work with human language data, for application in statistical natural language processing
(NLP). It contains text processing libraries for tokenization, parsing, classification,
stemming, tagging, and semantic reasoning. It also includes graphical demonstrations and
sample data sets, accompanied by a cookbook and a book that explains the principles behind
the underlying language processing tasks.
It comes with a hands-on guide that introduces topics in computational linguistics as well as
programming fundamentals for Python, which makes it suitable for linguists who have no
deep knowledge of programming, and for engineers and researchers who need to delve into
computational linguistics.
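To make the stemming idea concrete, here is a crude suffix-stripping sketch; NLTK's PorterStemmer implements a far more careful algorithm, and this toy version is only an illustration:

```python
def crude_stem(word):
    # Strip a few common English suffixes; a toy approximation of stemming
    # so that "asking" and "asks" map to the same base form.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# crude_stem("asking") -> "ask", crude_stem("questions") -> "question"
```

Mapping inflected words to a common stem is one way a chatbot can match many phrasings of the same question.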
Tensorflow:
TensorFlow is an open-source library for numerical computation, employed in deep learning
and machine learning algorithms. It was created by Google Brain team researchers within the
Google AI organization and is currently widely utilized by math, physics, and machine
learning researchers for complicated mathematical computations. TensorFlow is designed to
be fast, and it employs techniques such as XLA (Accelerated Linear Algebra, a domain-specific
compiler for linear algebra that can accelerate TensorFlow models with potentially no source
code changes) to do speedy linear algebra computations.
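A minimal sketch of the kind of linear algebra computation TensorFlow performs (assuming the `tensorflow` package is installed):

```python
import tensorflow as tf

# Two small constant tensors; real models use much larger ones.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])  # identity matrix

# Matrix multiplication, the core operation of neural network layers.
c = tf.matmul(a, b)  # multiplying by the identity leaves `a` unchanged
```

Operations like `tf.matmul` are what compilers such as XLA optimize under the hood.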
NumPy:
NumPy is one of the most widely used open-source Python libraries, focusing on scientific
computation. It features built-in mathematical functions for quick computation and supports
big matrices and multidimensional data. The name “NumPy” is short for “Numerical
Python.” It can be used in linear algebra, as a multi-dimensional container for generic data,
and as a random number generator, among other things. Some of the important functions in
NumPy are arcsin(), arccos(), tan(), radians(), etc. A NumPy array is a Python object that
defines an N-dimensional array with rows and columns. In Python, NumPy arrays are
preferred over lists because they take up less memory and are faster and more convenient to use.
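A small sketch of the vectorized math functions mentioned above, applied to a whole array at once:

```python
import numpy as np

# Convert an array of angles from degrees to radians, then take the sine,
# all elementwise and without an explicit Python loop.
degrees = np.array([0.0, 90.0, 180.0])
radians = np.radians(degrees)
sines = np.round(np.sin(radians), 6)
# sines: array([0., 1., 0.])
```

This elementwise style is why NumPy arrays are faster than plain Python lists for numerical work.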
Random:
The Python Random module is a built-in module of Python which is used to generate random
numbers. These are pseudo-random numbers, meaning they are not truly random. This module
can be used to perform random actions such as generating random numbers or picking a
random element from a list.
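For example, a chatbot often uses the random module to vary its canned replies:

```python
import random

random.seed(42)  # fix the seed so the pseudo-random output is reproducible

n = random.randint(1, 10)  # a pseudo-random integer between 1 and 10 inclusive
greeting = random.choice(["Hi!", "Hello!", "Hey there!"])  # a random list element
```

Picking a response with `random.choice` makes repeated conversations feel less mechanical.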
Pickle:
Pickle in Python is primarily used in serializing and deserializing a Python object structure. In
other words, it’s the process of converting a Python object into a byte stream to store it in a
file/database, maintain program state across sessions, or transport data over the network. The
pickled byte stream can be used to re-create the original object hierarchy by unpickling the
stream.
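A minimal round-trip sketch of pickling and unpickling an object in memory:

```python
import pickle

intents = {"greeting": ["hi", "hello"], "farewell": ["bye"]}

data = pickle.dumps(intents)   # serialize the object into a byte stream
restored = pickle.loads(data)  # rebuild an equal object from that stream
# restored == intents
```

The same `dump`/`load` pair works against a file, which is how trained chatbot data can be persisted between runs.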
3.2. Data
The crucial element in artificial intelligence tasks is the data. The results will be highly
influenced by the data that are given: how they are formatted, their consistency, their
relevance to the subject at hand, and so on. At this step, many questions should be answered
in order to guarantee that the results will be accurate and relevant. The data that are used
should be clearly stated, in this case with proper patterns and responses.
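As a hypothetical illustration only (the exact training format is not reproduced in this report), pattern-and-response data for an intent-based chatbot is often organized along these lines:

```python
# Hypothetical intents data: each intent maps user phrasings (patterns)
# to candidate replies (responses). All names here are illustrative.
intents = {
    "greeting": {
        "patterns": ["hi", "hello", "good morning"],
        "responses": ["Hello! How can I help you?"],
    },
    "hours": {
        "patterns": ["when are you open", "opening hours"],
        "responses": ["We are open 9am to 5pm, Monday to Friday."],
    },
}

def match_intent(user_text):
    # Naive exact-match lookup; a trained model would instead score
    # similarity between the user's text and each pattern.
    for name, intent in intents.items():
        if user_text.lower() in intent["patterns"]:
            return intent["responses"][0]
    return "Sorry, I don't understand."
```

Keeping patterns and responses together per intent makes it easy to audit what the bot can and cannot answer.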
3.3. Software Description
3.3.1. Python
Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as for use as a
scripting or glue language to connect existing components together. Python's simple, easy-to-learn
syntax emphasizes readability and therefore reduces the cost of program maintenance.
Python supports modules and packages, which encourages program modularity and code reuse.
The Python interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed.
Python is among the simplest of all the programming languages, and Python code is often
roughly one-fifth the length of equivalent code in other OOP languages. This is why it is
currently among the most well-known languages.
Python comes with prebuilt libraries such as NumPy for scientific calculations, SciPy for
advanced computing, and PyBrain for machine learning.
Python developers all over the globe offer extensive support and assistance through tutorials
and forums, helping the programmer much more than communities of other popular languages.
Python is platform-independent and therefore is among the most adaptable and well-known
options for various platforms and technologies, with minimal modifications to the basics of
the code.
Python offers great flexibility, with the option of choosing between an OOP approach and
scripting. Additionally, an IDE can be used to search through and navigate all the code.
3.3.2. PyCharm
PyCharm provides code analysis, a graphical debugger, an integrated unit tester, integration
with version control systems (VCSes), and supports web development with Django. PyCharm
is available in a Professional Edition, released under a proprietary license, and a Community
Edition, released under the Apache License. The Community Edition is less extensive than the
Professional Edition.
chatbot.py:

# Imports reconstructed to make the listing self-contained; exact module
# paths may vary slightly with the installed langchain version.
import os

from dotenv import load_dotenv
from flask import Flask, request, jsonify, send_from_directory
import google.generativeai as genai
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS

# Load the Gemini API key from a .env file and configure the client.
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

app = Flask(__name__)

def get_pdf_text(pdf_files):
    # Extract and concatenate the text of every page of every uploaded PDF.
    text = ""
    for pdf in pdf_files:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            text += page.extract_text()
    return text

def get_text_chunks(text):
    # Split the text into overlapping chunks so large documents can be
    # embedded and retrieved piece by piece.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    chunks = text_splitter.split_text(text)
    return chunks

def get_vector_store(text_chunks):
    # Embed the chunks and persist a FAISS index to disk.
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
    vector_store.save_local("faiss_index")

def get_conversational_chain():
    prompt_template = """
    Answer the question as detailed as possible from the provided context, make sure to provide all the
    details. If user asks for summary, then summarize all the contents of the file and give the summary of the
    content.\n\n
    Context:\n {context}?\n
    Question: \n{question}\n
    Answer:
    """
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)
    prompt = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
    return chain

@app.route("/")
def serve_index():
    return send_from_directory("static", "index.html")

@app.route("/upload", methods=["POST"])
def upload_files():
    files = request.files.getlist("files")
    pdf_texts = [get_pdf_text([file]) for file in files]
    full_text = "".join(pdf_texts)
    text_chunks = get_text_chunks(full_text)
    get_vector_store(text_chunks)
    return jsonify({"message": "Files processed successfully."})

@app.route("/summarize", methods=["POST"])
def summarize_pdf():
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.load_local(
        "faiss_index", embeddings, allow_dangerous_deserialization=True
    )
    docs = vector_store.similarity_search("Summarize the document.")
    chain = get_conversational_chain()
    summary_response = chain(
        {"input_documents": docs, "question": "Summarize the document in detail"},
        return_only_outputs=True,
    )
    return jsonify({"summary": summary_response["output_text"]})

@app.route("/ask", methods=["POST"])
def ask_question():
    user_question = request.json.get("question")
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.load_local(
        "faiss_index", embeddings, allow_dangerous_deserialization=True
    )
    docs = vector_store.similarity_search(user_question)
    chain = get_conversational_chain()
    response = chain(
        {"input_documents": docs, "question": user_question}, return_only_outputs=True
    )
    return jsonify({"response": response["output_text"]})
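The chunking step used above can be illustrated with a simplified, dependency-free version of the overlap idea (RecursiveCharacterTextSplitter is considerably smarter about choosing split points, preferring paragraph and sentence boundaries):

```python
def naive_chunks(text, chunk_size, overlap):
    # Slide a window of chunk_size characters, stepping forward by
    # chunk_size - overlap so neighboring chunks share some context.
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

parts = naive_chunks("abcdefghij", chunk_size=4, overlap=2)
# parts: ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap matters for retrieval: a sentence that straddles a chunk boundary still appears whole in at least one chunk.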
Key Achievements:
Challenges Addressed:
• The project successfully handles large documents by chunking text into smaller segments, ensuring memory-efficient processing.
• It uses advanced embeddings and similarity search techniques to maintain high accuracy in information retrieval.
• Security and stability concerns are mitigated through configurable storage paths and modular code design.
Future Enhancements:
While the system performs well, there is potential for further improvements:
1. Enhanced Error Handling:
Incorporating comprehensive exception handling to address issues like file format
errors, API failures, and incomplete inputs.
2. Improved Security:
Refining deserialization mechanisms and implementing file validation to protect
against malicious uploads and data breaches.
3. Scalability:
Optimizing the vector store for handling even larger datasets and incorporating
cloud-based solutions for distributed processing.
4. Expanded Functionality:
Adding features such as multi-language support, optical character recognition (OCR)
for scanned PDFs, and dynamic question-answering across multiple files.
This project showcases the transformative potential of AI-driven tools in automating and
enhancing document analysis tasks. It is applicable across a wide range of domains,
including: