
Methodology/Algorithms

Large language models (LLMs) are advanced natural language processing models that use deep learning to understand and generate human-like text. Models such as OpenAI's GPT (Generative Pre-trained Transformer) series are trained on vast amounts of text data to learn the intricacies of human language. LLMs are capable of a wide range of language tasks, including text generation, translation, summarization, and question answering.

Fig: Chatbot Architecture

Sentence Embedding:

Sentence embeddings are representations of sentences in a high-dimensional vector space that capture the semantic meaning of the text chunks they represent. These embeddings are typically
generated using pre-trained language models, such as BERT (Bidirectional Encoder Representations
from Transformers) or GPT (Generative Pre-trained Transformer). These models are trained on large
amounts of text data using unsupervised learning techniques to learn contextualized representations
of words and sentences. By encoding sentences into dense vectors, sentence embeddings enable
various natural language processing tasks, including semantic similarity measurement, text
classification, and information retrieval. The embeddings are learned in such a way that similar
sentences are represented by vectors that are close to each other in the embedding space,
facilitating tasks like clustering and retrieval.
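As a minimal sketch of this idea, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model used later in this project (the example sentences are illustrative only), two semantically related sentences yield vectors with a higher cosine similarity than an unrelated one:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

    sentences = [
        "Aspirin is commonly used to relieve mild pain.",
        "Acetylsalicylic acid is a common over-the-counter painkiller.",
        "The hospital cafeteria opens at seven in the morning.",
    ]

    # Encode each sentence into a dense 384-dimensional vector
    embeddings = model.encode(sentences, convert_to_tensor=True)

    # Cosine similarity: related sentences score higher than unrelated ones
    print(util.cos_sim(embeddings[0], embeddings[1]))  # relatively high
    print(util.cos_sim(embeddings[0], embeddings[2]))  # relatively low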

Retrieval-Based QA:

Retrieval-based question answering (QA) is a technique that involves retrieving relevant documents
or passages from a database and selecting answers based on them. In this approach, a question is
first encoded into a vector representation, often using techniques like sentence embedding. Then,
the vector database, containing precomputed embeddings of documents or sentences, is queried to
retrieve the most relevant passages related to the question. Finally, answer selection algorithms are
applied to extract or generate answers from the retrieved passages. This approach is particularly
effective for QA tasks where the answer can be found within the given context, such as factoid-based
questions or information retrieval tasks. By leveraging the semantic similarity between the question
and the documents, retrieval-based QA systems can provide accurate and relevant answers to user
queries.
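An illustrative sketch of this retrieve-then-answer flow, assuming a FAISS index such as the one built later in this project; answer_from_context is a hypothetical placeholder for the answer-selection step, not part of the project code:

    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS

    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    db = FAISS.load_local('vectorstore/db_faiss', embeddings,
                          allow_dangerous_deserialization=True)

    question = "What are the common symptoms of anemia?"

    # 1. The question is embedded and the nearest passages are retrieved
    passages = db.similarity_search(question, k=2)

    # 2. The retrieved passages become the context for answer selection/generation
    context = "\n".join(p.page_content for p in passages)
    # answer = answer_from_context(question, context)  # hypothetical answer-selection step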

Conversational Language Model:

The Llama model (TheBloke/Llama-2-7B-Chat-GGML) is a conversational language model that, in this system, answers questions over the Gale Encyclopedia medical textbook data. The model employs algorithms for natural language understanding and generation in conversational contexts, leveraging pre-trained language representations to comprehend and generate human-like responses in conversations related to medical topics. The chat variant of Llama 2 is fine-tuned on conversational data, which improves its ability to engage in dialogues, understand context, and generate coherent and informative responses. By combining this general conversational capability with domain-specific knowledge retrieved from the medical textbook data, the Llama model supports conversational AI applications such as chatbots for healthcare support, medical question answering, and patient interaction systems.

Data Collection and Preprocessing:

 Text Extraction: Data collection involves ingesting PDF documents containing medical
information. The documents are loaded from a specified directory (DATA_PATH) using a
DirectoryLoader configured with PyPDFLoader, which extracts the text and converts it into a
format suitable for analysis by the chatbot.

Fig: Medical Textbook Used as the Chatbot's Knowledge Base


 Text Chunking: Medical documents can be lengthy and complex. To handle such documents
effectively, a RecursiveCharacterTextSplitter is employed to split the extracted text into
manageable chunks. The text splitter splits the documents into chunks with a specified size
(chunk_size) and overlap (chunk_overlap). These parameters determine the granularity of
the text chunks and the amount of overlap between adjacent chunks. Fine-tuning these
parameters can optimize the balance between chunk size and contextual coherence, as illustrated in the sketch below.
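A minimal sketch of this chunking step; the chunk_size and chunk_overlap values shown are the ones used later in Ingest.py, and the sample string is illustrative only:

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    long_text = ("Anemia is a condition in which the blood lacks enough "
                 "healthy red blood cells to carry adequate oxygen. ") * 20

    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_text(long_text)

    # Each chunk is at most ~500 characters, with ~50 characters shared
    # between adjacent chunks to preserve context across boundaries
    print(len(chunks), len(chunks[0]))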

Embedding Generation:

 After preprocessing, the text chunks undergo embedding generation. This involves
representing each chunk of text as a numerical vector using pre-trained language models.
The HuggingFaceEmbeddings module is employed for this task, utilizing the sentence-
transformers/all-MiniLM-L6-v2 model to generate embeddings (a short sketch follows this list).
 Semantic Representation: The embeddings capture the semantic content of the text,
encoding information about the meaning and context of the medical information contained
in the text chunks.
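A small sketch of this step, assuming the same HuggingFaceEmbeddings wrapper and MiniLM model named above (the example chunks are illustrative):

    from langchain_community.embeddings import HuggingFaceEmbeddings

    embeddings = HuggingFaceEmbeddings(
        model_name='sentence-transformers/all-MiniLM-L6-v2',
        model_kwargs={'device': 'cpu'})

    chunks = ["Iron deficiency is the most common cause of anemia.",
              "Vitamin B12 deficiency can also lead to anemia."]

    # Each chunk becomes a 384-dimensional numerical vector
    vectors = embeddings.embed_documents(chunks)
    print(len(vectors), len(vectors[0]))  # 2 chunks, 384 dimensions each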

Vector Database Creation:

The vector database creation process involves several key components:

 Text Embeddings: The embeddings generated for the text chunks serve as the basis for
constructing the vector database. Each text chunk is represented as a high-dimensional
numerical vector, capturing its semantic content.
 FAISS Library: The FAISS (Facebook AI Similarity Search) library is employed for constructing
and managing the vector database. FAISS provides highly optimized algorithms for similarity
search in large-scale datasets, making it well-suited for the task of retrieving relevant
documents during question answering.
 The embeddings of the text chunks are indexed using FAISS to create the vector database.
This indexing process organizes the embeddings in a structure that enables fast nearest
neighbor search, allowing the chatbot to retrieve the most relevant documents efficiently (a condensed sketch follows this list).
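A condensed sketch of the indexing and nearest-neighbour lookup described above, assuming chunks and embeddings prepared as in the previous steps (the documents and query string here are illustrative only):

    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain_core.documents import Document

    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

    chunks = [Document(page_content="Iron deficiency is the most common cause of anemia."),
              Document(page_content="Aspirin is used to relieve mild to moderate pain.")]

    # Index the chunk embeddings; FAISS organises them for fast nearest-neighbour search
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local('vectorstore/db_faiss')

    # Retrieve the chunks closest to a query in embedding space
    hits = db.similarity_search_with_score("What causes anemia?", k=1)
    for doc, score in hits:
        print(score, doc.page_content)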

Model Loading:

In addition to constructing the vector database, the chatbot also loads a pre-trained language model
for question answering tasks. This model serves as the core component for generating responses to
user queries based on the information retrieved from the vector database.

 Selection of Pre-trained Model: A pre-trained language model suitable for conversational
question answering tasks is chosen. In the provided code, the TheBloke/Llama-2-7B-Chat-
GGML model is selected: a quantized chat variant of Llama 2 whose medical knowledge, in this
system, comes from retrieval over the Gale Encyclopedia medical textbook data rather than
from additional training.
 Integration with Chatbot Pipeline: Once initialized, the language model is integrated into the
chatbot pipeline for question answering. It serves as the primary component responsible for
understanding user queries, retrieving relevant information from the vector database, and
generating informative responses.

Q/A Chain Creation:

The pre-trained language model selected for the chatbot is integrated into the question answering
pipeline. In the provided code, the CTransformers module is used to initialize and configure the
language model.

 The vector database constructed earlier serves as the retrieval component of the question
answering pipeline. It enables the chatbot to efficiently search for relevant documents based
on user queries.
 Prompt Template: A custom prompt template is defined to structure the input for the
question answering model. This template provides context and formatting instructions for
generating responses based on user queries.
 Chain Construction: Using the components mentioned above, the question answering chain
(qa_chain) is constructed. This chain incorporates the language model, retrieval component,
and prompt template to facilitate effective question answering.

Bot Initialization and Execution:


The qa_bot function initializes the chatbot by loading the necessary components, including
the language model, vector database, and prompt template. It sets up the question
answering pipeline qa_chain for processing user queries.

The final_result function takes a user query as input and executes the chatbot by passing the
query through the question answering pipeline. The chatbot retrieves relevant information
from the vector database and generates an informative response based on the user query.
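For example, outside of the Chainlit interface the pipeline can be exercised directly; this assumes model.py is importable as a module, and the query string is illustrative only:

    # Standalone usage of the pipeline defined in model.py
    from model import final_result

    response = final_result("What are the symptoms of diabetes?")
    print(response["result"])             # the generated answer
    print(response["source_documents"])   # the retrieved supporting chunks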

Integration with Chainlit:

Chainlit is a library used for building conversational AI agents, and it plays a crucial role in
orchestrating the interaction between the chatbot and users.

 Message Handlers: Chainlit provides message handlers to manage the flow of conversation
between the chatbot and users. Handlers are set up to handle bot initialization, user queries,
and response delivery.
 User Session Management: Chainlit facilitates user session management, allowing the
chatbot to maintain context and state information across interactions with users.
 Asynchronous Processing: Chainlit supports asynchronous processing of user queries and
responses, ensuring smooth and responsive interaction between the chatbot and users.

Implementation:

Ingest.py

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

DATA_PATH = 'data/'
DB_FAISS_PATH = 'vectorstore/db_faiss'

# Create vector database
def create_vector_db():
    # Load every PDF document from the data directory
    loader = DirectoryLoader(DATA_PATH,
                             glob='*.pdf',
                             loader_cls=PyPDFLoader)
    documents = loader.load()

    # Split the documents into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500,
                                                   chunk_overlap=50)
    texts = text_splitter.split_documents(documents)

    # Embed the chunks and build the FAISS index
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2',
                                       model_kwargs={'device': 'cpu'})
    db = FAISS.from_documents(texts, embeddings)
    db.save_local(DB_FAISS_PATH)

if __name__ == "__main__":
    create_vector_db()
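Running this script once (for example, python Ingest.py) reads the PDFs in data/, splits them into chunks, embeds each chunk, and saves the resulting FAISS index to vectorstore/db_faiss, where the chatbot later loads it.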

model.py
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.prompts import PromptTemplate
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import CTransformers
from langchain.chains import RetrievalQA
import chainlit as cl

DB_FAISS_PATH = 'vectorstore/db_faiss'

custom_prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""

def set_custom_prompt():
    """
    Prompt template for QA retrieval for each vectorstore
    """
    prompt = PromptTemplate(template=custom_prompt_template,
                            input_variables=['context', 'question'])
    return prompt

# Retrieval QA chain
def retrieval_qa_chain(llm, prompt, db):
    qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                           chain_type='stuff',
                                           retriever=db.as_retriever(search_kwargs={'k': 2}),
                                           return_source_documents=True,
                                           chain_type_kwargs={'prompt': prompt})
    return qa_chain

# Loading the model
def load_llm():
    # Load the locally downloaded model here
    llm = CTransformers(
        model="TheBloke/Llama-2-7B-Chat-GGML",
        model_type="llama",
        max_new_tokens=512,
        temperature=0.5
    )
    return llm

# QA model function
def qa_bot():
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2",
                                       model_kwargs={'device': 'cpu'})
    db = FAISS.load_local(DB_FAISS_PATH, embeddings,
                          allow_dangerous_deserialization=True)
    llm = load_llm()
    qa_prompt = set_custom_prompt()
    qa = retrieval_qa_chain(llm, qa_prompt, db)
    return qa

# Output function
def final_result(query):
    qa_result = qa_bot()
    response = qa_result({'query': query})
    return response

# Chainlit code
@cl.on_chat_start
async def start():
    chain = qa_bot()
    msg = cl.Message(content="Starting the bot...")
    await msg.send()
    msg.content = "Hi, Welcome to Medical Bot. What is your query?"
    await msg.update()

    cl.user_session.set("chain", chain)

@cl.on_message
async def main(message: cl.Message):
    chain = cl.user_session.get("chain")
    cb = cl.AsyncLangchainCallbackHandler(
        stream_final_answer=True, answer_prefix_tokens=["FINAL", "ANSWER"]
    )
    cb.answer_reached = True
    res = await chain.acall(message.content, callbacks=[cb])
    answer = res["result"]
    # sources = res["source_documents"]
    print(answer)
    # if sources:
    #     answer += f"\nSources:" + str(sources)
    # else:
    #     answer += "\nNo sources found"

    await cl.Message(content=answer).send()
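With the index built by Ingest.py in place, and assuming a standard Chainlit installation, the assistant can be started from the project directory with a command of the form:

    chainlit run model.py

Chainlit then serves the chat interface and routes user messages through the handlers defined above.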

Results and Discussion:

