The document is a final year project report on an AI chatbot submitted by Ajay Singh and Jitesh Verma to the B.K. Birla Institute of Engineering & Technology. It outlines the project's objectives, system design, and implementation details, emphasizing the use of Natural Language Processing (NLP) and Python libraries. The report includes acknowledgments, a certificate of authenticity, and a structured table of contents detailing various chapters related to the chatbot's development.


AI CHATBOT

Final year project report submitted to the Department of Computer Science Engineering in partial fulfilment of the requirements for the Bachelor of Technology

By

AJAY SINGH (21EBKCS008) & JITESH VERMA (21EBKCS047)

Under the guidance of

Prof. Himanshu Verma

Associate Professor

Department of Computer Science Engineering

DEPARTMENT OF COMPUTER SCIENCE ENGINEERING

B.K. BIRLA INSTITUTE OF ENGINEERING & TECHNOLOGY, PILANI (RAJ.)

Affiliated to Bikaner Technical University, Bikaner


B.K. BIRLA INSTITUTE OF ENGINEERING & TECHNOLOGY
Session 2025

CERTIFICATE

This is to certify that this project report entitled "AI Chatbot", submitted in partial fulfillment of the requirements for the degree of Bachelor of Technology in Computer Science Engineering of the B.K. Birla Institute of Engineering & Technology, Pilani, during the academic year 2025, is a bonafide record of work carried out under my guidance and supervision.

AJAY SINGH (21EBKCS008) & JITESH VERMA (21EBKCS047)

Guide:

Mr. Himanshu Verma                        Dr. Nimish Kumar

(Asst. Professor, CSE Dept.)              (HOD, CSE Dept.)
ACKNOWLEDGEMENT

It gives me immense pleasure to express my deepest sense of gratitude and sincere thanks to our guide Mr. Himanshu Verma, Assistant Professor, Department of Computer Science Engineering, BKBIET Pilani, for his valuable guidance, encouragement, and help in completing this work. I would like to express my sincere thanks to Dr. Nimish Kumar, HOD of Computer Science Engineering, BKBIET Pilani, for giving me this opportunity to undertake this work. I would also like to thank Dr. Anil Kumar Sharma, Principal, BKBIET Pilani, for his wholehearted support. I am also grateful to my teacher Mr. Himanshu Verma for his constant support and guidance. Finally, I would like to express my sincere thanks to all my friends and others who helped me directly and indirectly in completing this work.

Thanking you,

AJAY SINGH (21EBKCS008) & JITESH VERMA (21EBKCS047)


ABSTRACT

We would like to express our heartfelt gratitude to everyone who has helped us throughout
our journey to complete this thesis project. First and foremost, we would like to thank our
project supervisor, Mr. Himanshu Verma, for his constant guidance, support, and
encouragement. His expertise in the fields of natural language processing and machine learning has
been invaluable, and his feedback has been instrumental in shaping the direction of this
project.
We would also like to extend our sincere thanks to Dr. Himanshu Verma, the course
coordinator and examiner, for his unwavering support and encouragement throughout this
process. His guidance and feedback have been crucial in ensuring the quality and rigor of
this work. Additionally, we would like to thank our family and friends for their love,
support, and encouragement throughout this journey. Their unwavering belief in us has
been a source of strength and inspiration. Finally, we would like to express our gratitude to
all the researchers, scholars, and practitioners whose work has informed and inspired this
project. Your contributions to the field of conversational AI have been invaluable,
and we are honored to have been able to build upon your work. Thank you all for your
contributions, support, and encouragement.
Table of Contents

1. Introduction
   1.1 Overview of the Project
   1.2 Scope and Objective

2. System Design
   2.1 Natural Language Processing
   2.2 Advantages of Natural Language Processing
   2.3 Disadvantages of Natural Language Processing
   2.4 Architecture Diagram
   2.5 Hardware Requirements
   2.6 Software Requirements

3. Implementation and Analysis
   3.1 Python Libraries
   3.2 Data
   3.3 Software Description
       3.3.1 Python
       3.3.2 PyCharm
   3.4 Sample Code

4. Conclusion
Chapter 1.
INTRODUCTION

1.1 OVERVIEW OF THE PROJECT

A chatbot is a way of solving a user's query by interacting with a computer program. Chatbots and other virtual methods of communication are becoming more popular as people turn away from more conventional forms of communication. Chatbots provide a way for people to get their queries answered virtually without having to correspond over email or talk on the phone with a customer service person. Chatbots can be adopted by an organization to increase its time efficiency, as AI-based chatbots can answer customers' recurring questions easily.

1.2 SCOPE AND OBJECTIVE

SCOPE

The goal of the chatbot is to respond as precisely as possible to client questions. The consumer should be able to receive the clearest possible responses to their inquiries.

OBJECTIVE

The main aim of the project is to provide users with a platform to get their questions and queries answered in an easy and simple way. This helps users avoid having to wait on a call or wait for an e-mail to get their queries resolved. This chatbot is a way for users to get their queries answered without having to go through a customer service call or talk to an automated recording to get their concerns cleared. Here we train the chatbot to recognize multiple types of words and phrases related to a particular question in order to provide as proper an answer to the user's question as possible.


Chapter 2. System Design

2.1 Natural Language Processing (NLP)

Natural language processing (NLP) refers to the branch of computer science, and more specifically the branch of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.

NLP combines computational linguistics (rule-based modeling of human language) with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or voice data and to 'understand' its full meaning, complete with the speaker or writer's intent and sentiment.

Natural language processing strives to build machines that understand and respond to text or voice data, and respond with text or speech of their own, in much the same way humans do.

Python and the Natural Language Toolkit (NLTK)

The Python programming language provides a wide range of tools and libraries for tackling specific NLP tasks. Many of these are found in the Natural Language Toolkit, or NLTK, an open-source collection of libraries, programs, and educational resources for building NLP programs.

NLTK includes libraries for many of the common NLP tasks plus libraries for subtasks, such as sentence parsing, word segmentation, stemming and lemmatization (methods of trimming words down to their roots), and tokenization (breaking phrases, sentences, paragraphs, and passages into tokens that help the computer better understand the text). It also includes libraries for implementing capabilities such as semantic reasoning, the ability to reach logical conclusions based on facts extracted from text.
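A minimal sketch of the tokenization and stemming steps described above, using NLTK's TreebankWordTokenizer and PorterStemmer (chosen here because neither requires downloading extra corpora; the sample sentence is illustrative):

```python
# Tokenize a sentence, then reduce each token to its stem with NLTK.
from nltk.tokenize import TreebankWordTokenizer
from nltk.stem import PorterStemmer

tokenizer = TreebankWordTokenizer()
stemmer = PorterStemmer()

sentence = "Chatbots are answering recurring questions quickly."
tokens = tokenizer.tokenize(sentence)          # split into word tokens
stems = [stemmer.stem(t) for t in tokens]      # trim each word to its root

print(tokens)   # ['Chatbots', 'are', 'answering', 'recurring', 'questions', 'quickly', '.']
print(stems)
```

Stemming maps inflected forms such as "answering" and "questions" onto common roots ("answer", "question"), which is what lets a chatbot match many phrasings of the same question.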

2.2 Advantages of Natural Language Processing

 The accuracy of the answer increases with the amount of relevant information provided in the question.

 It helps in structuring a highly unstructured data source.

 Users can ask questions about any subject and get a direct response in seconds.

 It is easy to implement.

 Using a program is less costly than hiring a person. A person can take two or three times longer than a machine to execute the same tasks.

 An NLP system provides answers to questions in natural language.

 It allows you to process more language-based data than a human being could, without fatigue and in an unbiased and consistent way.

 NLP helps computers communicate with humans in their own language and scales other language-related tasks.

 It enables faster customer service response times.

2.3 Disadvantages of Natural Language Processing

 If it is necessary to develop a new model without using a pre-trained model, it can take weeks to achieve good performance, depending on the amount of data.

 The system is built for a single, specific task only; it is unable to adapt to new domains and problems because of its limited functions.

 With a complex query language, the system may not be able to provide the correct answer to a question that is poorly worded or ambiguous.

 It is never 100% reliable; there is always the possibility of error in its predictions and results.

2.4 Architecture Diagram

Figure: Architecture diagram of the chatbot system.

2.5 Hardware Requirements

• Modern operating system:

 Windows 7 or 10

 Mac OS X 10.11 or higher, 64-bit

 Linux: RHEL 6/7, 64-bit

• x86 64-bit CPU

• 32 GB RAM

• 3 GB storage

2.6 Software Requirements

• Programming language: Python 3.9

• IDE: PyCharm

• NLTK module

• TensorFlow module

Chapter 3. Implementation and Analysis

3.1 Python Libraries

A Python library is a collection of related modules. It contains bundles of code that can be used repeatedly in different programs, which makes Python programming simpler and more convenient for the programmer, as we don't need to write the same code again and again for different programs. Python libraries play a vital role in fields such as machine learning, data science, and data visualization.

The Python libraries used in this project are:

 NLTK

 TensorFlow

 NumPy

 Random

 Pickle

NLTK:

The Natural Language Toolkit (NLTK) is a platform used for building Python programs that work with human language data for statistical natural language processing (NLP). It contains text processing libraries for tokenization, parsing, classification, stemming, tagging, and semantic reasoning. It also includes graphical demonstrations and sample datasets, and is accompanied by a cookbook and a book that explains the principles behind the underlying language processing tasks that NLTK supports.

It comes with a hands-on guide that introduces topics in computational linguistics as well as programming fundamentals for Python, which makes it suitable for linguists who have no deep knowledge of programming, engineers and researchers who need to delve into computational linguistics, and students and educators.

TensorFlow:

TensorFlow is an open-source, high-performance numerical computation library. It is also employed in deep learning and machine learning algorithms. It was created by researchers on the Google Brain team within the Google AI organization and is currently widely used by math, physics, and machine learning researchers for complicated mathematical computations. TensorFlow is designed to be fast, and it employs techniques such as XLA (Accelerated Linear Algebra, a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes) to do speedy linear algebra computations.
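A minimal sketch of the kind of linear algebra computation described above (the matrices are illustrative only):

```python
# Multiply two small matrices with TensorFlow's vectorized matmul.
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # 2x2 matrix
b = tf.constant([[1.0], [1.0]])             # 2x1 column vector

c = tf.matmul(a, b)                         # matrix product, shape 2x1
print(c.numpy())                            # [[3.] [7.]]
```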

NumPy:

NumPy is one of the most widely used open-source Python libraries, focusing on scientific computation. It features built-in mathematical functions for quick computation and supports big matrices and multidimensional data. The name "NumPy" stands for "Numerical Python". It can be used for linear algebra, as a multi-dimensional container for generic data, and as a random number generator, among other things. Some of the important functions in NumPy are arcsin(), arccos(), tan(), radians(), etc. A NumPy array is a Python object which defines an N-dimensional array with rows and columns. In Python, NumPy arrays are preferred over lists because they take up less memory and are faster and more convenient to use.
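A minimal sketch of the array and trigonometric functions mentioned above (the values and shapes are illustrative):

```python
# Build an N-dimensional array and apply vectorized math functions.
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])    # 2 rows, 3 columns
print(a.shape)                           # (2, 3)

# Vectorized trigonometry: degrees -> radians -> sine, in one pass.
angles_deg = np.array([0, 90, 180])
sines = np.sin(np.radians(angles_deg))
print(np.round(sines, 6))                # [0. 1. 0.]
```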

Random:

The Python random module is a built-in module of Python which is used to generate random numbers. These are pseudo-random numbers, meaning they are not truly random. This module can be used to perform random actions such as generating random numbers, picking a random value from a list or string, etc.
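A chatbot often uses this module to pick one of several canned responses at random; a minimal sketch (the response strings are illustrative):

```python
import random

# Seeding makes the pseudo-random sequence reproducible between runs.
random.seed(42)

n = random.randint(1, 10)            # a pseudo-random integer in [1, 10]
responses = ["Hello!", "Hi there!", "Hey, how can I help?"]
reply = random.choice(responses)     # pick one canned response at random

print(n, reply)
```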

Pickle:

Pickle in Python is primarily used for serializing and deserializing a Python object structure. In other words, it is the process of converting a Python object into a byte stream to store it in a file or database, maintain program state across sessions, or transport data over the network. The pickled byte stream can be used to re-create the original object hierarchy by unpickling the stream.
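A minimal round-trip sketch (the dictionary contents are illustrative, e.g. the vocabulary a chatbot might save between training runs):

```python
import pickle

# A structure a chatbot might persist between training runs.
data = {"words": ["hi", "hello", "bye"], "classes": ["greeting", "goodbye"]}

blob = pickle.dumps(data)        # serialize the object to a byte stream
restored = pickle.loads(blob)    # deserialize back to the original object

print(restored == data)          # True
```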

3.2 Data

The crucial element in artificial intelligence tasks is the data. The results will be highly influenced by the data that is given: how it is formatted, its consistency, its relevance to the subject at hand, and so on. At this step, many questions should be answered in order to guarantee that the results will be accurate and relevant. The data that is used should be clearly stated, in this case, with proper patterns and responses.
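A hypothetical sketch of what "patterns and responses" training data might look like (the intent names and strings below are invented for illustration, not taken from the project's actual dataset):

```python
# Hypothetical intents structure: each intent pairs the phrasings a user
# might type (patterns) with the replies the bot may give (responses).
intents = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hi", "Hello", "Good morning"],
            "responses": ["Hello! How can I help you?"],
        },
        {
            "tag": "goodbye",
            "patterns": ["Bye", "See you later"],
            "responses": ["Goodbye! Have a nice day."],
        },
    ]
}

# Flatten all patterns, as a training step typically would.
all_patterns = [p for intent in intents["intents"] for p in intent["patterns"]]
print(all_patterns)
```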

3.3 Software Description

3.3.1 Python

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.

Python is among the simplest of programming languages; in practice, Python programs are often a fraction of the length of equivalent programs in other OOP languages. This is one reason it is currently among the most well-known languages in the marketplace.

Python comes with prebuilt libraries such as NumPy for scientific calculations, SciPy for advanced computing, and PyBrain for machine learning, making it one of the top languages for AI.

Python developers all over the globe offer extensive support and assistance through tutorials and forums, which makes life easier for the programmer than with many other popular languages.

Python is platform-independent and is therefore among the most adaptable and well-known options across various platforms and technologies, requiring minimal modifications to the basic code.

Python offers great flexibility, with the option of choosing between an OOP approach and scripting. Additionally, an IDE can be used to search through all the code, which is a blessing for developers working with different algorithms.

3.3.2 PyCharm

PyCharm is an Integrated Development Environment (IDE) used for programming in Python. It provides code analysis, a graphical debugger, an integrated unit tester, integration with version control systems (VCSes), and supports web development with Django. PyCharm is developed by the Czech company JetBrains.

It is cross-platform, working on Windows, Mac OS X, and Linux. PyCharm has a Professional Edition, released under a proprietary license, and a Community Edition, released under the Apache License. The Community Edition is less extensive than the Professional Edition.

3.4 Sample Code

chatbot.py:

from flask import Flask, request, jsonify, send_from_directory
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import google.generativeai as genai
from langchain.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv

load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
app = Flask(__name__)


def get_pdf_text(pdf_files):
    text = ""
    for pdf in pdf_files:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            text += page.extract_text()
    return text


def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    chunks = text_splitter.split_text(text)
    return chunks


def get_vector_store(text_chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
    vector_store.save_local("faiss_index")


def get_conversational_chain():
    prompt_template = """
    Answer the question as detailed as possible from the provided context, make sure to provide all the
    details. If the user asks for a summary, then summarize all the contents of the file and give the
    summary of the content.\n\n
    Context:\n {context}?\n
    Question: \n{question}\n

    Answer:
    """
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)
    prompt = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
    return chain


@app.route("/")
def serve_index():
    return send_from_directory("static", "index.html")


@app.route("/upload", methods=["POST"])
def upload_files():
    files = request.files.getlist("files")
    pdf_texts = [get_pdf_text([file]) for file in files]
    full_text = "".join(pdf_texts)
    text_chunks = get_text_chunks(full_text)
    get_vector_store(text_chunks)
    return jsonify({"message": "Files processed successfully."})


@app.route("/summarize", methods=["POST"])
def summarize_pdf():
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.load_local(
        "faiss_index", embeddings, allow_dangerous_deserialization=True
    )
    docs = vector_store.similarity_search("Summarize the document.")
    chain = get_conversational_chain()
    summary_response = chain(
        {"input_documents": docs, "question": "Summarize the document in detail"},
        return_only_outputs=True,
    )
    return jsonify({"summary": summary_response["output_text"]})


@app.route("/ask", methods=["POST"])
def ask_question():
    user_question = request.json.get("question")
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.load_local(
        "faiss_index", embeddings, allow_dangerous_deserialization=True
    )
    docs = vector_store.similarity_search(user_question)
    chain = get_conversational_chain()
    response = chain(
        {"input_documents": docs, "question": user_question}, return_only_outputs=True
    )
    return jsonify({"response": response["output_text"]})


if __name__ == "__main__":
    app.run(debug=True)
Chapter 4. Conclusion

This project successfully implements a robust document processing and conversational AI


system using Flask, demonstrating the effective integration of natural language processing
(NLP) techniques and modern AI tools. The application enables users to upload PDF
documents, extract and process their textual content, and interact with the information
through advanced features such as summarization and question-answering. By leveraging
FAISS (Facebook AI Similarity Search) for efficient vector-based similarity search and
Google Generative AI for generating embeddings and responses, the system ensures
accurate, context-aware outputs.

Key Achievements:

1. Automated Document Analysis:


The system automates the extraction of text from PDFs, splitting the content into
manageable chunks for processing. This eliminates the need for manual document
analysis, saving time and effort.
2. Vector-Based Information Retrieval:
Through FAISS, the system efficiently organizes and retrieves relevant information
from large datasets. This allows precise responses to user queries and ensures the
scalability of the solution.
3. Conversational AI Integration:
The application incorporates advanced AI models to generate detailed and
contextually relevant summaries and answers. The prompt design ensures user queries
are handled effectively, whether the task is summarization or answering specific
questions.
4. User-Friendly Interaction:
By providing endpoints for uploading files, summarizing content, and asking
questions, the system offers an intuitive and seamless user experience, catering to
diverse use cases.

Challenges Addressed:
 The project successfully handles large documents by chunking text into smaller
segments, ensuring memory-efficient processing.
 It uses advanced embeddings and similarity search techniques to maintain high accuracy
in information retrieval.
 Security and stability concerns are mitigated through configurable storage paths and
modular code design.

Future Enhancements:

While the system performs well, there is potential for further improvements:
1. Enhanced Error Handling:
Incorporating comprehensive exception handling to address issues like file format
errors, API failures, and incomplete inputs.
2. Improved Security:
Refining deserialization mechanisms and implementing file validation to protect
against malicious uploads and data breaches.
3. Scalability:
Optimizing the vector store for handling even larger datasets and incorporating cloud-
based solutions for distributed processing.
4. Expanded Functionality:
Adding features such as multi-language support, optical character recognition (OCR)
for scanned PDFs, and dynamic question-answering across multiple files.

Impact and Applications:

This project showcases the transformative potential of AI-driven tools in automating and
enhancing document analysis tasks. It is applicable across a wide range of domains,
including:

 Education: Assisting students and educators by summarizing research papers,


textbooks, and lecture notes.
 Business: Enabling efficient analysis of reports, contracts, and policy documents.
 Research: Facilitating data extraction and insight generation from extensive
academic literature.
