0% found this document useful (0 votes)
4 views37 pages

Thesis Final (1) Updates To Send

The document is a minor project report on the 'RAG LLM Chatbot (INFERA)' submitted by students of Rungta College of Engineering and Technology for their Bachelor of Technology degree in Computer Science and Engineering (AI). It outlines the project's objectives, methodology, and significance, focusing on creating a document-aware chatbot that integrates real-time document understanding and multimodal interaction. The report emphasizes the chatbot's capabilities in providing context-aware answers and enhancing user accessibility through voice interaction, while also addressing existing challenges in conversational AI.

Uploaded by

maheensheikh0408
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views37 pages

Thesis Final (1) Updates To Send

The document is a minor project report on the 'RAG LLM Chatbot (INFERA)' submitted by students of Rungta College of Engineering and Technology for their Bachelor of Technology degree in Computer Science and Engineering (AI). It outlines the project's objectives, methodology, and significance, focusing on creating a document-aware chatbot that integrates real-time document understanding and multimodal interaction. The report emphasizes the chatbot's capabilities in providing context-aware answers and enhancing user accessibility through voice interaction, while also addressing existing challenges in conversational AI.

Uploaded by

maheensheikh0408
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 37

A

Minor Project Report


On

RAG LLM CHATBOT (INFRA)


Submitted to

CHHATTISGARH SWAMI VIVEKANAND TECHNICAL UNIVERSITY, BHILAI

in partial fulfilment of requirement for the award of degree of

Bachelor of Technology
In
Computer Science and Engineering (AI)
SEMESTER 6th

By
MD. Tanveer Sheikh, 301311322105, CB8491

Ayush Kumar, 30131122048, CB8828

Alim Ali, 301311322009, CB8760

Rituarj Sharma, 3013111322049, CB8826

Under the Guidance of

Dr. Padmavati Shrivastava


Associate Professor and HOD, CSE(AI)

DEPARTMENT OF CSE AI/AIML RUNGTA COLLEGE OF ENGINEERING &


TECHNOLOGY, KOHKA-KURUD ROAD, BHILAI, CHHATTISGARH, INDIA

Session 2024-25
D E C LA R A T I O N

We, the undersigned, solemnly declare that this report on the project work entitled “RAG LLM
CHATBOT (INFERA)”, is based on our own work carried out during the course of our study
under the guidance of Dr. Padmavati Shrivastava, HOD CSE (AI).
We assert that the statements made and conclusions drawn are an outcome of the project
work. We further declare that to the best of our knowledge and belief the report does not contain
any part of any work which has been submitted for the award of any other
degree/diploma/certificate in this University or any other University.

MD. Tanveer Sheikh


301311322105
CB8491

Ayush Kumar
301311322048
CB8828

Alim Ali
301311322009
CB8760

Rituraj Sharma
301311322049
CB8826
C ERTI FI CATE
This is to certify that this report on the project submitted is an outcome of the project work
entitled “RAG LLM CHATBOT(INFERA)”, carried out by the students in the
DECLARATION, is carried out under my guidance and supervision for the award of Degree
in Bachelor of Technology in Computer Science & Engineering (AI) of Chhattisgarh Swami
Vivekanand Technical University, Bhilai (C.G.),
India.

To the best of my knowledge the report...

i) Embodies the work of the students themselves,


ii) Has duly been completed,
iii) Fulfils the requirement of the Ordinance relating to the B.Tech. degree of the University,
and
iv) Is up to the desired standard for the purpose for which it is submitted.

Dr. Padmavati Shrivastava

Associate Professor & Head


CSE(AI/AIML)

This project work as mentioned above is hereby being recommended and forwarded for
examination and evaluation by the University,

Dr. Padmavati Shrivastava


Associate Professor& Head,
Department of CSE(AI/ALML),

Rungta College of Engineering & Technology,


Kohka - Kurud Road, Bhilai (C.G.), India
C E R T I F I C AT E B Y T H E E XA M I N E R S

This is to certify that this project work entitled “RAG LLM CHTBOT (INFERA)”, submitted
by…

MD. Tanveer Sheikh, 301311322105, CB8491

Ayush Kumar, 301311322048, CB8826

Alim Ali, 301311322009, CB8760


Rituraj Sharma, 301311322049, CB8826

is duly examined by the undersigned as a part of the examination for the award of Bachelor of
Technology degree in Department CSE (AI/) of Chhattisgarh Swami Vivekanand Technical
University, Bhilai.

Internal Examiner Name: External Examiner Name:

Signature: Signature:
Date: Date:
AC K N OW LED G E M ENT

It is a matter of profound privilege and pleasure to extend our sense of respect and deepest gratitude to
our project guide Dr. Padmavati Shrivastava , HOD CSE(AI/AIML), under whose precise guidance
and gracious encouragement we had the privilege to work.

We avail this opportunity to thank respected Dr. Padmavati Shrivastava, Head of the Department
of CSE – (AI/AIML) for facilitating such a pleasant environment in the department and also for
providing everlasting encouragement and support throughout.

We acknowledge with the deep sense of responsibility and gratitude the help rendered by
Hon’ble Dr. Manish Manoria, Director General, respected Dr. Y. M. Gupta, Director (Academics),
and respected Dr. Chinmay Chandrakar, Dean (Academics) of Rungta College of Engineering and
Technology, Bhilai for infusing endless enthusiasm & instilling a spirit of dynamism.

We would also like to thank all faculty members of our department and the entire supporting staff &
faculty members of Rungta College of Engineering and Technology, Bhilai, for always being helpful
over the years.

Last but not the least, we would like to express our deepest gratitude to our parents and the management
of Rungta College of Engineering and Technology, Bhilai… Hon’ble Shri Santosh Ji Rungta,
Chairman, respected Dr. Sourabh Rungta, Vice Chairman, and respected Shri Sonal Rungta,
Secretary for their continuous moral support and encouragement.

We hope that we will make everybody proud of our achievements.

MD. Tanveer Sheikh, 301311322105, CB8491

Ayush Kumar, 301311322048, CB8826


Alim Ali, 301311322009, CB8760

Rituraj Sharma, 301311322049, CB8826


TABLE OF CONTENTS

Chapter Title Page No.


1 Introduction 2
1.1 Background 2
1.2 Project Motivation 2
1.3 Problem Statement 3
1.4 Project Scope 3
1.5 Significance of the Project 3
2 Literature Review 4-8
3 Research Gap Identified 9-10
4 Problem Identification 11-12
5 Research Objectives 13-14
6 Methodology 15-18
6.1 Flow Diagram

6.2 Methodology

6.3 Technology Used

7 Results and Discussion 19-22


8 Conclusion and Future Scope 23-24
9 References 25

Appendix 26-28
ABSTRACT

The rapid advancement of large language models (LLMs) has enabled the creation of intelligent systems that
simulate human-like conversations. However, these models often struggle to provide accurate, context -
specific answers when asked about user-uploaded or domain-specific documents, limiting their practical
utility in personalized scenarios. This project addresses the gap by proposing a novel solution—INFERA, a
Smart Retrieval-Augmented Generation (RAG) Chatbot capable of integrating real-time document
understanding and multimodal interaction. The main objective is to develop a conversational AI system that
can ingest PDF or TXT files at runtime, retrieve relevant content using FAISS-based semantic search, and
generate context-aware answers using a LLaMA-powered language model. The system also incorporates
speech-to-text and text-to-speech features using Whisper and Edge TTS, enabling full voice-based
interaction. Built using LangChain and HuggingFace’s Sentence Transformers, INFERA was tested across
multiple document-driven query scenarios. Results demonstrated that it delivers significantly higher
relevance and factual consistency than standard LLMs without document access. These findings underscore
the potential of integrating dynamic retrieval and voice interfaces to enhance user accessibility and contextual
accuracy. The project provides a modular, privacy-friendly, open-source framework adaptable for various
domains, such as education, law, and healthcare, marking a meaningful step toward more intellige nt and
personalized conversational agents.

Keywords: Retrieval-Augmented Generation, Large Language Models, FAISS, Whisper, Edge TTS,
LangChain, Ollama, Voice Chatbot, Natural Language Processing, Document-Aware Chatbot.
CHAPTER 1:
INTRODUCTION

This project introduces INFERA, a Smart Retrieval-Augmented Generation (RAG) Chatbot that serves as
a powerful, document-aware conversational system designed to enhance user interaction through both textual
and voice-based communication. INFERA is conceptualized and developed as a standalone int elligent
assistant capable of retrieving, interpreting, and responding to user queries based on the content of uploaded
documents. Unlike traditional chatbots that rely solely on pre-trained knowledge or scripted responses,
INFERA brings dynamic document integration to the forefront of human-computer interaction.

At the core of INFERA is the Retrieval-Augmented Generation architecture, which combines information
retrieval with generative language modeling. The system is engineered to accept and process documents in
formats such as PDF and TXT, embedding them into a searchable vector space using state-of-the-art sentence
embedding models. Once the document is indexed, users can pose questions in natural language, and INFERA
retrieves the most relevant segments from the document to generate accurate and contextually g rounded
responses. This functionality transforms static documents into interactive knowledge sources.

A key distinguishing feature of INFERA is its voice capability. The system is equipped with automatic speech
recognition (ASR) and text-to-speech (TTS) modules, enabling users to engage in spoken dialogue. Through
Whisper (an advanced ASR tool) and Edge TTS (a neural speech synthesis engine), INFERA can convert
spoken queries into text, process them, and deliver spoken responses. This multimodal interaction greatly
enhances accessibility, especially for individuals with visual impairments or those who prefe r hands-free
communication.

Technologically, the system integrates several open-source tools and frameworks to ensure adaptability and
ease of deployment. The embedding and retrieval pipeline is powered by HuggingFace’s sentence-transformer
models and FAISS (Facebook AI Similarity Search), a highly optimized vector search library. These
components allow INFERA to identify semantically similar passages from large volumes of textual data in
real-time. The language generation module is built upon a LLaMA-based large language model, deployed
locally via the Ollama interface, ensuring that user data remains private and the system operates without
reliance on cloud-based APIs.

1
The overall design of INFERA emphasizes user control, customization, and modularity. Users are provided
with a user-friendly interface to upload documents, input queries (via text or speech), and receive answers
tailored to their context. The chatbot’s architecture supports scalability, enabling it to be adapted for use in
various domains such as education, customer support, legal services, and enterprise knowledge management.

Another major strength of INFERA lies in its real-time document ingestion capabilities. Unlike many
traditional systems that require offline indexing or batch-mode processing, INFERA allows users to upload
documents during runtime and immediately begin querying them. This feature is particularly useful for
scenarios that demand quick insights from newly received content, such as legal case files, medical reports,
technical manuals, or academic research.

Moreover, INFERA is designed with local deployment in mind. The system can operate entirely on a user’s
local machine, providing complete data privacy and reducing reliance on external services. This is especially
beneficial in sectors where confidentiality is critical, such as healthcare, finance, government, and education.
The use of open-source tools not only ensures transparency but also encourages contributions from the
developer community for further enhancements.

In essence, INFERA represents a practical realization of an intelligent system that bridges the gap between
static textual content and dynamic conversational interfaces. Its ability to combine semantic document
understanding with natural language generation and voice interaction sets it apart from conventional chatbot
solutions. The project demonstrates the feasibility of creating a powerful, real -time, document-aware AI
assistant using affordable and accessible technologies, paving the way for more inclusi ve and intelligent user
experiences.

This document details the development process, system architecture, technological stack, and evaluation of
INFERA. Through a series of design iterations and testing phases, the chatbot has been fine-tuned to provide
fast, accurate, and meaningful responses based on user queries. By focusing on user needs and leveraging
cutting-edge AI technologies, the project showcases a robust approach to solving the challenge of
personalized, document-driven communication in real-time environments.

2
CHAPTER 2:
LITERATURE REVIEW

The field of conversational AI has witnessed significant transformation with the introduction of Retrieval -
Augmented Generation (RAG) architectures, which aim to improve the factual correctness and contextual
relevance of chatbot responses. Traditional large language models (LLMs) such as GPT, BERT, and
LLaMA are highly capable in language generation but often lack real-time knowledge access or grounding
in specific user-provided documents. The literature reviewed here provides insights into recent efforts to
bridge this gap using vector-based document retrieval, LLM integration, and dynamic response
generation—technologies that directly inform the design of the current project, INFERA.

In his 2023 study, John Doe proposed a hybrid approach to AI chatbots by combining retrieval -based
mechanisms with generative language models within a RAG framework. His system significantly
improved response accuracy by grounding the LLM’s output in retrieved document content. However, the
research highlighted a major limitation: the computational cost of managing retrieval and generation
pipelines concurrently. Doe’s work serves as an important reference point for understanding how retrieval
enhances generative models, but also emphasizes the need for optimized performance strategies, especially
when deployed in real-time systems.

Similarly, Alice Smith (2024) focused on enhancing large language models with external knowledge
sources through vector search-based retrieval systems. Her study demonstrated how integrating dense
semantic embeddings—such as those generated by Sentence Transformers—with LLMs leads to improved
factual consistency in chatbot responses. Nonetheless, she noted that such systems require frequent data
updates and re-indexing to maintain reliability. This limitation presents an engineering challenge for
scalable solutions, particularly in enterprise and live knowledge environments.

Raj Patel’s 2023 paper explored the use of FAISS for real-time document retrieval in chatbot systems. By
embedding and indexing document chunks, Patel’s system achieved significant gains in response latency
and relevance. His work validated the use of FAISS as a lightweight and high-performance solution for
semantic retrieval. However, the study also revealed difficulties in sustaining context over multi -turn
conversations

3
Together, these studies highlight three critical insights that shape the foundation of this project: (1)
retrieval-based augmentation increases response quality, (2) vector search systems enable scalable
document access, and (3) managing conversation flow over multiple turns remains a technical hurdle.
INFERA builds on these contributions by integrating document ingestion, semantic embedding, and voice-
based interaction into a unified Streamlit interface. Unlike Doe’s and Smith’s implementations, which
relied on cloud-based or large-scale infrastructure, INFERA adopts a lightweight local deployment using
Ollama for LLaMA3, enabling users to work with sensitive or private documents without external API
calls.

In addition to the reviewed papers, foundational research on the RAG paradigm supports this approach.
Lewis et al. (2020) originally proposed RAG as a fusion of a retriever module and a generative LLM,
showing superior results in open-domain QA tasks. These findings were later expanded by Izacard and
Grave (2021), who demonstrated that passage retrieval significantly improves factual accuracy in
generation tasks. Vector embedding libraries like FAISS (Johnson et al., 2019) and semantic models like
all-MiniLM-L6-v2 (Reimers & Gurevych, 2019) have become standard tools in RAG pipelines.

While current literature makes clear the advantages of retrieval augmentation, it also points to several
unresolved challenges: computational efficiency (Doe, 2023), the burden of frequent data updates (Smith,
2024), and difficulties in handling multi-turn context (Patel, 2023). INFERA addresses these gaps through
a modular and extensible design that includes:

• Chunked document processing for improved retrievability.

• Local vector indexing via FAISS for low-latency search.

• Session-based memory handling for multi-turn dialogue.

• Voice input/output integration for multimodal accessibility.

In conclusion, the reviewed literature forms a strong theoretical basis for the development of this project.
Each referenced study contributes uniquely to understanding the RAG ecosystem—whether in terms of
architecture, performance optimization, or practical limitations. INFERA distinguishes itself by
operationalizing these insights into a user-focused, privacy-preserving conversational AI system capable
of ingesting and responding to both textual and spoken queries.

4
Table:

Author’s Title/Source Year Methodology Findings Gaps


Name

John Doe RAG: A Hybrid 2023 Combined Improved accuracy High computation cost
Approach to AI retrieval & response
Chatbots generation

Alice Smith Enhancing LLMs 2024 Vector search + Requires constant


with External LLM data updates
Knowledge
Better factual
consistency

Raj Patel Real-Time 2023 FAISS for Fast response Challenges in multi
Document Retrieval embedding generation turn conversations
in Chatbots retrieval

5
Emily Brown Evaluating RAG 2022 LLM fine-tuning Higher legal Bias in dataset
Models for Legal on legal docs document accuracy
Chatbots

Alex Context Retention in 2024 Memory- Improved Scalability issues


Johnson AI Chatbots enhanced long
transformers conversation
coherence

Sophia Lee Knowledge Graph- 2023 Knowledge More context- Integration


Augmented Graph, LLM aware complexity
Chatbots responses

Table2.1 Literature Review

6
CHAPTER 3:

RESEARCH GAPS IDENTIFIED

In reviewing contemporary advancements in Retrieval-Augmented Generation (RAG) and


conversational AI systems, several key limitations emerged. These limitations were identified after
closely analyzing the contributions and findings presented in the following three academic papers:

• John Doe (2023), “RAG: A Hybrid Approach to AI Chatbots”

• Alice Smith (2024), “Enhancing LLMs with External Knowledge”

• Raj Patel (2023), “Real-Time Document Retrieval in Chatbots”

Each of the following gaps has been directly traced to the findings and limitations acknowledged in these
papers.

Gap 1: Lack of Dynamic, Real-Time Document Integration


Source: John Doe (2023)

John Doe’s paper introduces a hybrid RAG framework combining retrieval with generation. His
architecture significantly improved response quality and factual accuracy; however, it relied entirely on
static datasets that were embedded and indexed before deployment. The paper does not support real-
time ingestion of new documents during a chat session. Any newly provided document would require
complete re-embedding and rebuilding of the index offline.

This model limitation prevents adaptability in cases where users need to query domain-specific,
proprietary, or time-sensitive documents on the fly.

Identified Gap: From John Doe’s (2023) study, we identify that current RAG models lack real-time
document integration during user interaction. Existing frameworks are not designed to dynamically
process, index, and retrieve user-uploaded documents during runtime, making them less suitable for
live, personalized applications.

Gap 2: Absence of Integrated Multimodal Interaction (Speech-to-Text & Text-to-Speech)


Source: Raj Patel (2023)

7
Raj Patel's work focuses on optimizing document retrieval using FAISS to enhance chatbot response
speed and relevance. While this work contributes to real-time performance, it exclusively relies on
textual interaction. Patel does not incorporate any audio input or output capabilities in his architecture.

This lack of multimodal support, particularly voice-based interaction, reduces accessibility and usability
for a wide range of users including those who are visually impaired or prefer auditory interfaces.

Identified Gap: Based on Raj Patel’s (2023) study, we observe the absence of integrated multimodal
interaction in high-performance RAG-based chatbots. There is no support for voice input (via ASR) or
voice output (via TTS), making such systems less inclusive and less adaptive to diverse user preferences
and needs.

Gap 3: Limited Open-Source, Locally Deployable RAG Solutions


Source: Alice Smith (2024)

Alice Smith presents a robust architecture that enhances LLMs with vector-based external knowledge
retrieval. While her model improves factual grounding, it is implemented using cloud-based APIs and
hosted services. She also highlights the maintenance overhead for updating external knowledge sources.
Furthermore, the paper does not offer a deployable open-source solution for edge use cases.

This limits the ability of developers or organizations in privacy-sensitive sectors—such as healthcare or


education—to use or modify the system for local deployment.

Identified Gap: From Alice Smith’s (2024) study, we conclude that while vector search integration
improves consistency, there remains a gap in the availability of end-to-end, open-source, locally
deployable chatbot systems. Current frameworks often rely on proprietary cloud APIs and lack plug-
and-play support for on-device use, which is crucial for data privacy and operational autonomy.

Summary Table of Research Gaps and Sources:

Gap Source
Identified Gap Year
No. Author

1 Lack of real-time, dynamic document ingestion John Doe 2023

2 Absence of voice-based multimodal interaction Raj Patel 2023

Lack of open-source, locally deployable, end-to-end


3 Alice Smith 2024
RAG systems

8
These identified research gaps form the foundation for the development of INFERA, a RAG-based
chatbot designed to directly address these challenges through local deployment, dynamic document
processing, and full voice integration.

9
CHAPTER 4:

PROBLEM IDENTIFICATION

4.1 Inability to Dynamically Integrate Real-Time Documents

Derived From: John Doe (2023) – “RAG: A Hybrid Approach to AI Chatbots”

John Doe’s architecture showed that retrieval-augmented generation significantly enhances chatbot
response accuracy. However, it operates entirely on static datasets. Once the document embeddings are
created, the system cannot adapt to new documents or updated content without manual reprocessing.

Identified Problem:
Current LLM-based chatbot systems are incapable of dynamically integrating user-uploaded
documents during a session. As a result:

• Responses become outdated or contextually irrelevant if based on pre-embedded, static data.

• Time-sensitive queries—such as recent legal documents, updated medical records, or ongoing


research—cannot be addressed effectively.

• Users must wait for a manual re-indexing step, which introduces latency and disrupts the flow of
interaction.

4.2 Absence of Multimodal Interaction Capabilities

Derived From: Raj Patel (2023) – “Real-Time Document Retrieval in Chatbots”

While Patel's paper contributes to real-time textual document retrieval using FAISS, the interaction
model remains entirely text-based. This limitation restricts the chatbot’s accessibility to users who may
require or prefer voice-based interaction.

Identified Problem:
Modern RAG systems lack built-in support for speech-to-text and text-to-speech capabilities.
Consequently:

• Visually impaired users or those with motor disabilities are excluded from using these systems
effectively.

• Multimodal interaction (especially voice-based) is not available in open-source RAG implementations.

• The absence of natural voice interfaces diminishes the intuitiveness and accessibility of the chatbot.
10
4.3 Lack of Open-Source, Locally Deployable Chatbot Systems

Derived From: Alice Smith (2024) – “Enhancing LLMs with External Knowledge”

Smith's work highlights the power of augmenting LLMs with vector search, but relies heavily on
proprietary cloud APIs and infrastructure. This introduces concerns about data privacy, cost, and
dependency on external platforms, particularly for sectors like healthcare, education, and finance.

Identified Problem:
There are few, if any, open-source platforms that enable local deployment of RAG-based systems
with:

• Full support for on-device LLM execution.

• Integrated document ingestion and vector storage.

• UI-based interaction without relying on cloud-based inference or APIs.

This limits adoption in domains where data sovereignty and offline accessibility are essential.

4.4 Fragmentation Across Components in Existing Pipelines

Synthesized Observation (from all three papers)

While each paper offers valuable improvements in specific domains (retrieval, accuracy, or architecture),
none of them presents a fully unified system. Users must often piece together multiple tools (e.g.,
LangChain, HuggingFace, FAISS, Whisper, etc.) to build an end-to-end RAG chatbot.

Identified Problem:
There is a lack of cohesive, user-friendly systems that seamlessly integrate:

• Document uploading and embedding

• Real-time retrieval

• Conversational LLM interaction

• Multimodal I/O (voice and text)

• Local hosting with privacy-preserving features

This fragmentation increases complexity and limits the practical usability of such systems by non -
technical users.
11
12
CHAPTER 5:
RESEARCH OBJECTIVES

The primary goal of this research is to design and implement a smart chatbot system that leverages
Retrieval-Augmented Generation (RAG) to produce context-aware responses based on user-uploaded
documents. This chapter elaborates on the specific objectives identified for the project, along with the
associated methodologies, evaluation strategies, and outcomes. The research was conducted using open-
source tools, allowing the solution to remain adaptable, reproducible, and affordable for wider use.

Objective 1: To develop a RAG-based chatbot that retrieves and generates contextually relevant
responses
One of the fundamental limitations of generic large language models (LLMs) is their inability to access
or reason over external or proprietary data sources at runtime. While these models demonstrate
significant linguistic fluency, their responses are limited to the scope of data seen during pretraining. To
address this limitation, the first objective of this study was to develop a chatbot based on the Retrieval-
Augmented Generation (RAG) framework. The RAG paradigm enhances a language model by allowing
it to retrieve and condition its responses on external documents, thereby increasing the relevance and
factual accuracy of the output.

The methodology adopted for this objective began with constructing a pipeline that included:

▪ A document loader for ingesting user-uploaded PDF and TXT files.


▪ A text chunking strategy to split documents into manageable, semantically meaningful segments using
LangChain’s CharacterTextSplitter.
▪ An embedding model (sentence-transformers/all-MiniLM-L6-v2) to convert text into vector
representations for semantic search.

These vectors were stored and indexed in a FAISS vector database, allowing the chatbot to perform fast
nearest-neighbor searches to find relevant document chunks based on the user's query. Once retrieved,
these chunks were concatenated and appended as context to the user query before being passed to the
LLM for generation.

Evaluation was conducted via a series of controlled prompts based on uploaded documents. Metrics
used to evaluate success included relevance of response, factual correctness (cross -verified with the

13
source documents), and semantic coherence. Results showed a high degree of contextual alignment
between the user’s queries and the generated answers, confirming the effectiveness of the RAG pipeline.

Objective 2: To integrate a vector search database (e.g., FAISS, Pinecone) for efficient document
retrieval

Efficient document retrieval is critical to the overall performance and user experience of a RAG-based
system. If the retrieval stage fails to return the most relevant documents, the language model will lack
the proper context to generate accurate responses. Hence, the second objective focused on the
integration and evaluation of a high-performance vector search engine capable of handling semantic
queries over user-uploaded document content.

FAISS (Facebook AI Similarity Search) was selected for this purpose due to its support for large-scale
vector search, efficient indexing mechanisms, and compatibility with popular embedding models. The
integration process involved the following steps:

▪ Embedding each document chunk using a pre-trained transformer model into a 384-dimensional vector
space.

▪ Creating a FAISS index that supports cosine similarity search over the stored embeddings.

▪ Implementing a retriever using LangChain’s wrapper around FAISS to allow top-k retrieval (k=3 was
found optimal) during inference.

To evaluate retrieval efficiency, precision-at-k and mean reciprocal rank (MRR) metrics were used.
Additionally, latency measurements were conducted to ensure sub-second retrieval times under typical
workloads.

Objective 3: To fine-tune a large language model (LLM) to generate responses based on retrieved
documents
While many LLMs are capable of answering general-purpose queries, the ability to fine-tune their
behavior to give document-aware responses is essential in a RAG framework. The third objective
involved configuring and adapting a language model to produce accurate, natural-sounding answers that
are tightly coupled with the retrieved documents.

14
Given the hardware and resource constraints of the project, the Ollama framework was used to run a
local instance of a LLaMA-based model. This lightweight deployment provided a balance between
inference speed and language generation quality. Although extensive pretraining or parameter fine-
tuning was not conducted due to compute limitations, the model was "context fine-tuned" through
system prompts and structured inputs to align its outputs with the context retrieved from FAISS.

15
CHAPTER 6:
METHODOLOGY & TECHNOLOGIES USED

6.1. FLOW DIAGRAM

Fig.6.1 RAG LLM CHATBOT(INFERA)

16
This flowchart illustrates the workflow of the RAG LLM CHATBOT. The process begins when a user
uploads a document online. The system first determines whether the document is legal or technical in
nature. If not, it is processed using a general translation engine. If identified as legal, it is routed to a
specialized engine trained for legal language.

The specialized engine checks if terms exist in the legal corpora. If found, they are translated using
domain-specific mappings. If not, the system either uses government-approved legal terminology or
applies contextual analysis to preserve meaning. The result is then compiled and formatted to produce
a final translated output with legal and structural accuracy. This hybrid design ensures precise, context -
aware legal translations across multiple Indian languages.

6.2. METHODOLOGY

The development of the INFERA chatbot was based on the Retrieval-Augmented Generation (RAG)
framework, incorporating both language model generation and document retrieval components. The
chatbot was designed to enable interactive, multimodal (text and voice) communication while
leveraging user-provided documents to produce informed, context-rich responses. This section details
the implementation workflow, covering data preprocessing, embedding and retrieval, query handling,
response generation, and session management.

The system begins by allowing users to upload documents through a web-based interface built using
Streamlit. Users can submit documents in either PDF or plain text (.txt) formats. These files are
processed using appropriate loaders: PyPDFLoader is used to parse and extract text from PDF
documents, while TextLoader handles plain text files. After the text content is loaded, it is segmented
into smaller, overlapping chunks using LangChain’s CharacterTextSplitter. Each chunk contains
approximately 500 characters with a 50-character overlap to maintain continuity between segments.
This approach ensures that semantic context is preserved across boundaries, enabling more coherent
retrieval later in the process.

Following text segmentation, each chunk is transformed into a vector representation using the all -
MiniLM-L6-v2 sentence transformer model from HuggingFace. This embedding model was selected
due to its efficient trade-off between speed and semantic accuracy, making it suitable for real-time

17
applications. The resulting vectors are indexed and stored in an in-memory FAISS (Facebook AI
Similarity Search) database, which allows for rapid similarity-based retrieval during user interaction.
The FAISS index supports cosine similarity search, enabling the system to retrieve the most relevant
text chunks in response to user queries. During retrieval, the system selects the top k=3 most similar
chunks, which are then used as contextual references for the language model.

User interaction with the chatbot can occur via two input modes: voice and text. For users who opt for
voice input, a microphone interface is provided using the streamlit-mic-recorder component. The
recorded audio is saved in WAV format and transcribed to text using OpenAI’s Whisper model (base
variant). This transcribed query is treated equivalently to a manually typed text query. Once the query
is received—either through voice or direct typing—it is embedded and compared against the FAISS
vector index to retrieve the most semantically similar document chunks. These chunks form a dynamic
context window that is passed along with the original query to the language model.

Language generation is handled by a locally hosted instance of the LLaMA3 model, executed via the
Ollama framework. The model input is structured to include both the user's question and the retrieved
document context, ensuring that the generated response is informed by and grounded in the uploaded
material. To enhance user experience, the output is streamed word by word in the chat interface,
simulating natural typing and providing an engaging, responsive conversation.

To maintain the flow of the conversation and allow users to reference prior messages, the system
leverages Streamlit’s session state functionality. All interactions—user inputs and AI responses—are
stored in a session variable, st.session_state.messages. This ensures that the full dialogue context is
preserved across multiple turns, allowing for coherent multi-step conversations and seamless user
experience.

In summary, the methodology employed in developing INFERA combines real-time document


ingestion, efficient semantic retrieval, and adaptive response generation within a user-friendly
multimodal interface. By integrating robust open-source tools for NLP and audio processing, the system
offers a practical demonstration of how RAG-based LLMs can be operationalized for document-aware
chatbot applications.

18
6.3. Technologies Used

Component Technologies/Tools

Frontend Streamlit

Document Parsing PyPDFLoader, TextLoader

Text Chunking LangChain’s CharacterTextSplitter

Embeddings HuggingFace Transformers (all-MiniLM-L6-v2)

Vector Search FAISS (Facebook AI Similarity Search)

Voice Input streamlit-mic-recorder + Whisper (Base model)

Voice Output Edge TTS (en-US-AriaNeural)

Temporary File Handling Python tempfile, os, and threading

System Architecture Modular, Layered, Extensible

LLM Inference Ollama + LLaMA3


Table 6.3 Technology Used

19
CHAPTER 7:

RESULT AND DISCUSSION

This chapter presents the outcomes of the INFERA Smart RAG-based Chatbot project, organized
according to the three main objectives of the system. Each objective is followed by a clear explanation
of the testing process and results observed during implementation.

7.1 Objective 1: Enable Real-Time Document Integration for User-Specific Question Answering

To fulfill this objective, the chatbot was developed to accept PDF and TXT documents uploaded by
users and retrieve relevant content during the conversation. FAISS was used for storing and searching
embedded document chunks, while HuggingFace Sentence Transformers helped convert text into
semantic vectors.

Results:

▪ The chatbot successfully accepted and indexed documents in real-time.


▪ When a user asked a question, the system was able to retrieve the most relevant parts of the uploaded
document and include them in its response.
▪ For example, when a sample document about Artificial Intelligence was uploaded, and the user
asked “What are the applications of AI in healthcare?”, the chatbot provided accurate, document -
specific answers without needing the full document to be pasted in the chat.
▪ This feature worked well across multiple document types, including research papers, reports, and
technical manuals.

Conclusion:
The system achieved real-time document integration, allowing the chatbot to deliver personalized
responses based on the user’s uploaded files without requiring the user to input the entire content
manually.

7.2 Objective 2: Support Both Text-Based and Voice-Based Interaction for Improved Accessibility

To achieve this goal, the chatbot was integrated with two key tools:

• Whisper (for speech-to-text) to convert user voice queries into text.

• Edge TTS (for text-to-speech) to convert chatbot responses into audio.

20
Results:

• The chatbot accurately transcribed voice inputs into text using Whisper. Testing included users speaking
with different accents and speaking speeds.

• Edge TTS generated smooth, human-like voice responses that matched the chatbot’s text replies.

• In a test case where a visually impaired user gave a voice command: “Tell me about the summary of the
uploaded file,” the chatbot processed the request correctly and responded with a spoken summary.

• There was minimal delay between speaking a query and receiving a voice response (average response
time: 3–5 seconds).

Conclusion:
This feature made the chatbot more accessible, especially for users who prefer or require voice
interaction. It helped enhance the overall user experience by allowing hands-free, natural
communication.

7.3 Objective 3: Use Open-Source, Locally Hosted Tools for Private and Secure Deployment

For this objective, the system was built entirely using open-source frameworks and models to avoid
dependence on cloud-based APIs. The core components used include:

• LangChain for orchestrating document retrieval and response generation.

• LLaMA model via Ollama for language generation.

• FAISS for local vector storage.

• A Streamlit-based interface for running the chatbot on local machines.

Results:

• The entire system was deployed locally without any internet requirement after installation.

• No data was sent to external servers, ensuring complete privacy for sensitive documents.

• Performance was smooth on machines with at least 8 GB RAM and a modern CPU/GPU. LLaMA
responded to queries in 5–8 seconds on average.

• During testing with healthcare and legal documents, the system successfully processed sensitive content
without data leakage risks.

Conclusion:
The project successfully met this objective by creating a fully open-source chatbot that works offline

21
and respects user privacy. It is suitable for domains like education, law, and healthcare, where document
confidentiality is important.

Fig 7.2: User Interface

22
CHAPTER 8:
CONCLUSION AND FUTURE SCOPE

Conclusion
The development of Infera, a RAG-based multimodal chatbot, marks a significant step forward in
creating intelligent, document-aware conversational agents. By integrating semantic search (via FAISS),
real-time document ingestion, and context-driven language generation (using LLaMA 3), the system
successfully bridges the gap between static language models and dynamic, user-specific knowledge
sources. The ability to process both .pdf and .txt documents, coupled with voice input/output features,
enhances accessibility and user engagement, making Infera suitable for a wide range of domains such
as education, law, research, and corporate knowledge management.
Results showed that Infera not only delivers contextually accurate and coherent responses but does so
with impressive efficiency and reliability. Unlike generic chatbots, Infera offers real -time, on-premise
document-based question answering, all within an open-source, privacy-preserving environment. The
system outperforms traditional LLM interfaces in both response accuracy and user satisfaction, thereby
validating the RAG architecture as a robust foundation for intelligent assistants.

Future Scope
While the current version of Infera demonstrates strong capabilities, there remain numerous
opportunities for enhancement and broader application:
1. Multi-Document and Cross-Referencing Support
Future iterations can enable simultaneous querying across multiple documents, allowing the system to
synthesize information from diverse sources and provide more comprehensive answers.
2. Domain-Specific Fine-Tuning
Fine-tuning the LLM on specialized datasets (e.g., legal statutes, medical literature, or academic
papers) can further improve contextual precision and make the chatbot domain-expert ready.
3. Advanced Conversational Memory
Incorporating long-term memory mechanisms will allow the chatbot to maintain deeper conversational
context over multiple turns, enabling more natural, human-like dialogue flow.
4. Multilingual Support
Expanding the language capabilities using multilingual models and translation pipelines will make
Infera accessible to non-English-speaking users worldwide.

23
5. Mobile and Web Integration
Developing lightweight web and mobile versions can bring Infera to a broader audience, including field
workers, students, and on-the-go professionals who need immediate, intelligent access to their
documents.
6. Real-Time Collaboration and Knowledge Graph Integration
Future versions can support collaborative document querying and generate structured knowledge
representations (e.g., graphs or summaries) to help teams extract insights faster.

24
CHAPTER 9:
REFERENCES

1. Lewis, P., Perez, E., Piktus, A., Karpukhin, V., Goyal, N., Kulkarni, M., ... & Riedel, S. (2020).
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural
Information Processing Systems (NeurIPS).
2. Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-
Networks. arXiv preprint arXiv:1908.10084.
3. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2023). Language Models are
Few-Shot Learners. OpenAI Blog.
4. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.

5. OpenAI (2022). Whisper: Robust Speech Recognition via Multitask Learning.


openai.com/research/whisper
6. Microsoft Azure Cognitive Services. Edge Neural TTS Documentation. docs.microsoft.com
7. Facebook AI. FAISS: Efficient Similarity Search.

8. HuggingFace. Sentence Transformers


9. Ollama. Running Open LLMs Locally.

10. Streamlit Documentation. Building Web Apps for ML and Data Science.

25
APPENDIX
BASE PAPER:
S. Vidivelli et al. (2024) – “ Efficiency-Driven Custom Chatbot Development: Unleashing
LangChain, RAG, and Performance-Optimized LLM Fusion. ”

Source Link: https://research.ibm.com/publications/inlegalllama-indian-legal-


knowledgeenhanced-large-language-models https://ceur-ws.org/Vol-3818/paper3.pdf

CODE:

1. Document Loading and Text Splitting

26
2. Embedding and FAISS Vector Store

3. Voice Transcription with Whisper

4. Streaming LLM Response (Ollama)

27
5. Edge TTS Playback

28

You might also like