
Gen AI Guide

1/1/2024
Durvesh Mahurkar
What is Retrieval Augmented Generation?
If you have been looking up data in a vector store or some other database and passing
relevant information to an LLM as context when generating output, you are already doing
retrieval augmented generation. Retrieval augmented generation, or RAG for short, is an
architecture popularized by Meta in 2020 that aims to improve the performance of LLMs by
passing relevant information to the model along with the question/task details.

Why RAG?
LLMs are trained on large corpora of data and can answer questions or complete tasks
using their parameterized memory. These models have a knowledge cutoff date that depends
on when they were last trained. When asked a question outside their knowledge base or about
events that happened after the knowledge cutoff date, there is a high chance that the model
will hallucinate. Researchers at Meta found that providing relevant information about the
task at hand significantly improves the model's performance at completing it.

For example, if the model is being asked about an event that happened after the cutoff date,
providing information about this event as context and then asking the question will help the
model answer correctly. Because of the limited context window length of LLMs, we can only
pass the most relevant knowledge for the task at hand. The quality of the data we add to the
context influences the quality of the response that the model generates. There are
multiple techniques that ML practitioners use in different stages of a RAG pipeline to
improve the LLM's performance.
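As a minimal illustration of what "passing relevant information along with the question" looks like in practice, the sketch below assembles a context-augmented prompt from a few retrieved snippets. The snippet text, the question and the prompt wording are illustrative placeholders, not part of this guide.

```python
# A minimal sketch of a context-augmented prompt. The retrieved snippets,
# the question and the instruction wording are illustrative placeholders.
retrieved_snippets = [
    "Acme Corp announced its Q3 2023 results on 2023-11-02.",
    "Q3 2023 revenue was $1.2B, up 8% year over year.",
]

question = "How did Acme Corp's revenue change in Q3 2023?"

prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n"
    + "\n".join(f"- {s}" for s in retrieved_snippets)
    + f"\n\nQuestion: {question}\nAnswer:"
)

print(prompt)  # this string is what gets sent to the LLM as its input
```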

RAG vs Fine-tuning

Fine-tuning is the process of training a model on a specific task; for example, one can fine-tune
GPT-3.5 on a question-answering dataset to improve its performance on question answering
for that specific dataset. Fine-tuning is a good approach if you have a dataset large enough for
the task at hand and the dataset doesn't change. If the dataset is dynamic, you will need to keep
retraining the model to keep up with the changes. Fine-tuning is also not a good approach if
you don't have a large dataset for the task at hand. In such cases, you can use RAG to
improve the performance of LLMs. Similarly, you can use RAG to improve the performance
of LLMs on tasks like summarization and translation that may not be practical to fine-tune
for.

How does it work?
The RAG pipeline involves three main stages: data preparation, retrieval and generation.
The data preparation stage involves identifying the data sources, extracting the data from
the sources, cleaning the data and storing it in a database. The retrieval stage involves
retrieving relevant data from the database based on the task at hand. The generation stage
involves generating the output using the retrieved data and the task at hand. The quality of
the output depends on the quality of the data and the retrieval strategy. The following
sections describe each stage in detail.
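To make the three stages concrete, here is a minimal end-to-end skeleton. The embed(), vector_store and llm_generate() names are hypothetical placeholders for components you would supply; this is a sketch of the control flow, not a production implementation.

```python
# A minimal sketch of the three RAG stages. embed(), vector_store and
# llm_generate() are hypothetical placeholders for your own components.

def prepare(documents, vector_store, embed):
    """Data preparation: clean, chunk and index the source documents."""
    for doc in documents:
        for chunk in doc.strip().split("\n\n"):        # naive paragraph chunking
            vector_store.add(vector=embed(chunk), payload=chunk)

def retrieve(query, vector_store, embed, k=3):
    """Retrieval: pull the k chunks most similar to the query."""
    return vector_store.search(vector=embed(query), top_k=k)

def generate(query, chunks, llm_generate):
    """Generation: answer the query using the retrieved chunks as context."""
    context = "\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm_generate(prompt)
```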
Data Preparation

Based on the type of tasks the LLM is going to handle, data preparation usually involves
identifying the data sources, extracting the data from the sources, cleaning the data and
storing it in a database. The kind of database used to store the data and the steps
involved in preparing the data can vary depending on the use case and the retrieval methods.
For example, if you are using a vector store like Weaviate, you will need to create
embeddings for the data and store them in the vector store. If you are using a search engine
like Elasticsearch, you will need to index the data in the search engine. If you are using a
graph database like Neo4j, you will need to create nodes and edges for the data and store
them in the graph database. The following sections discuss the different types of databases
and the steps involved in preparing the data for each.

Vector Store

Vector stores are useful for storing unstructured data like text, images and audio, and for
searching the data based on semantic similarity. An embedding model is used to generate
vector embeddings for the data we store in the database. The data needs to be chunked into
smaller pieces depending on the type of data, the use case and the embedding model. For
example, if you are storing text data, you can chunk the data into sentences or paragraphs. If
you are storing code, you can chunk the data into functions or classes. You may use
smaller chunks if you choose to provide a wide range of snippets in context to the LLM.
Once the data is chunked, you can generate embeddings for each chunk and store them in the
vector store. When a query is made to the vector store, the query is also converted into an
embedding and the vector store returns the most similar embeddings to the query.

Vector databases like Weaviate will take care of generating embeddings during both storage
and retrieval and you can just focus on data modeling and chunking strategies.
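Here is a minimal sketch of the storage side, using the sentence-transformers package for embeddings and a plain in-memory list as a stand-in for a vector store. The model name, the chunking rule and the sample documents are assumptions for illustration, not recommendations from this guide.

```python
# A minimal sketch: chunk text, embed each chunk, and keep it in an in-memory
# "vector store" (a list). The embedding model is an illustrative assumption;
# any sentence-transformers model or other embedding API would work.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_text(text, max_chars=500):
    """Naive chunking by paragraphs, capped at max_chars per chunk."""
    return [p[:max_chars] for p in text.split("\n\n") if p.strip()]

documents = [
    "First document...\n\nSecond paragraph of the first document...",
    "Another document...",
]

vector_store = []                          # list of (embedding, chunk) pairs
for doc in documents:
    for chunk in chunk_text(doc):
        vector_store.append((model.encode(chunk), chunk))
```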
Keyword Search

Keyword search is a simple approach to retrieving data: the data is indexed based on the
keywords it contains, and the search engine returns the documents that match the keywords
in the query. Keyword search is useful for storing structured data like tables and documents
and for searching the data using exact terms.
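As a small sketch of keyword-style retrieval, the example below uses the rank_bm25 package (BM25 is a standard keyword-ranking function). The toy corpus, the whitespace tokenizer and the query are assumptions for illustration.

```python
# A minimal keyword-search sketch with BM25 scoring (pip install rank_bm25).
# The toy corpus and whitespace tokenization are purely illustrative.
from rank_bm25 import BM25Okapi

corpus = [
    "invoice totals by customer for Q3",
    "employee handbook: vacation policy",
    "Q3 revenue report and customer breakdown",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)
query_tokens = "q3 customer revenue".lower().split()

# Return the 2 highest-scoring documents for the keyword query.
print(bm25.get_top_n(query_tokens, corpus, n=2))
```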

Graph Database

Graph databases store data in the form of nodes and edges. They are useful for storing
structured data like tables and documents and for searching the data using the relationships
between the data. For example, if you are storing data about people, you can create nodes for
each person and edges between people who know each other. When a query is made to the
graph database, it returns the nodes that are connected to the query node.
This kind of retrieval, built on knowledge graphs, is useful for tasks like question
answering where the answer is a person or an entity.
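A minimal sketch of the people-and-relationships example using the official neo4j Python driver follows. The connection URI, credentials, node label and relationship name are placeholders, not something prescribed by the guide.

```python
# A minimal sketch of storing people as nodes and "knows" relationships as
# edges in Neo4j, then retrieving a person's neighbors. Connection details
# and the Person/KNOWS schema are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Create two people and a KNOWS edge between them (idempotent with MERGE).
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:KNOWS]->(b)",
        a="Alice", b="Bob",
    )
    # Retrieve everyone connected to Alice via a KNOWS relationship.
    result = session.run(
        "MATCH (:Person {name: $name})-[:KNOWS]-(other:Person) "
        "RETURN other.name AS name",
        name="Alice",
    )
    print([record["name"] for record in result])

driver.close()
```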

Search Engine

Data in a RAG pipeline could be retrieved from public search engines like Google or Bing,
or from internal search engines like Elasticsearch or Solr. During the retrieval stage of the
RAG architecture, the search engine is queried with the question/task details and returns
the most relevant documents. Search engines are useful for retrieving data from the
web and for searching data with keywords. Data from a search engine can be combined
with data from other databases like vector stores and graph databases to improve the quality
of the output.
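As a hedged sketch of the internal-search-engine case, here is how an Elasticsearch index might be queried with the official Python client (8.x style). The cluster URL, index name and field name are assumptions for illustration.

```python
# A minimal sketch of retrieving documents from Elasticsearch for a RAG
# pipeline. The cluster URL, index name and field name are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = "knowledge cutoff date of the model"
response = es.search(
    index="docs",
    query={"match": {"content": query}},   # full-text/keyword match
    size=3,                                 # take the top 3 documents as context
)

context_docs = [hit["_source"]["content"] for hit in response["hits"]["hits"]]
```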

Tip

Hybrid approaches that combine multiple strategies (like semantic search + keyword
matches) are also possible and are known to give better results for most use cases. For
example, you can use a vector store to store text data and a graph database to store structured
data and combine the results from both databases to generate the output.
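One common way to merge results from two retrievers is reciprocal rank fusion (RRF), sketched below. The two input ranking lists and the k constant are illustrative assumptions; other merging schemes (score normalization, weighted sums) work as well.

```python
# A minimal sketch of merging semantic-search and keyword-search results with
# reciprocal rank fusion. Inputs are ranked lists of doc IDs (best first).
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Score each doc by the sum of 1 / (k + rank) across the input rankings."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic_hits = ["doc3", "doc1", "doc7"]   # from the vector store
keyword_hits  = ["doc1", "doc9", "doc3"]   # from keyword search

print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))  # fused ordering
```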

Retrieval

Once the data is identified and processed to be ready for retrieval, the RAG pipeline involves
retrieving the relevant data based on the task being handled (the question asked by the user)
and preparing the context to be passed to the generator. The retrieval strategy can vary
depending on the use case. It usually involves passing the user's query or task to the datastore
and pulling relevant results. For example, if we are building a question answering system with
a vector database storing the chunks of related data, we can generate an embedding for the
user's query, do a similarity search for that embedding in the vector database and retrieve the
most similar chunks (some vector databases take care of generating embeddings during
retrieval). Similarly, depending on the use case, we can do a hybrid search on the same vector
store or across multiple databases and combine the results to pass as context to the generator.
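Continuing the in-memory vector_store list from the Vector Store sketch above, the query side could look like this. Cosine similarity and a top-k of 3 are arbitrary illustrative choices.

```python
# A minimal sketch of the retrieval step against the in-memory vector store
# built earlier: embed the query, rank chunks by cosine similarity, take top-k.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(query, vector_store, k=3):
    q = model.encode(query)
    scored = []
    for emb, chunk in vector_store:
        sim = float(np.dot(q, emb) / (np.linalg.norm(q) * np.linalg.norm(emb)))
        scored.append((sim, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# context = "\n".join(retrieve("What changed in Q3?", vector_store))
```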

Generation
Once the relevant data is retrieved, it is passed to the generator (LLM) along with the user's
query or task. The LLM generates the output using the retrieved data and the user's query or
task. The quality of the output depends on the quality of the data and the retrieval strategy,
and the instructions given for generating the output also have a large impact on its quality.
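As a hedged sketch of this step, here is how the retrieved chunks and the user's question might be passed to a chat-style LLM using the OpenAI Python client. The model name, system prompt and client choice are assumptions; any other LLM API would slot in the same way.

```python
# A minimal sketch of the generation step with the OpenAI Python client.
# The model name and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_answer(question, context_chunks, model="gpt-4o-mini"):
    context = "\n\n".join(context_chunks)
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "Say you don't know if the context is insufficient."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```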

Techniques to improve RAG performance in production


The following are some techniques across the different stages of the RAG pipeline that can be
used to improve its performance in production.

1. Hybrid search: Combining semantic search with keyword search to retrieve relevant
data from a vector store is known to give better results for most use cases.
2. Summaries: It may be beneficial to summarize the chunks and store the summaries
in the vector store instead of the raw chunks. For example, if your data involves a lot of
filler words, it is a good idea to summarize the chunks to remove the filler words and
store the summaries in the vector store. This improves the quality of generation, since
we are removing noise from the data, and also helps reduce the number of tokens in
the input.
3. Overlapping chunks: When splitting the data into chunks for semantic retrieval,
semantic search may pick a chunk whose related and useful context sits in the
neighboring chunks. Passing this chunk to the LLM for generation without the
surrounding context may result in poor quality output. To avoid this, we can overlap
the chunks so that neighboring chunks share some content. For example, if we are
splitting the data into chunks of 100 tokens, we can overlap the chunks by 50 tokens.
This ensures that we are passing the surrounding context to the LLM for generation
(see the chunking sketch after this list).
4. Fine-tuned embedding models: Using off-the-shelf embedding models like BERT or
ada to generate embeddings for the data chunks may work for most use cases. But
if you are working in a specific domain, these models may not represent the domain
well in the vector space, resulting in poor quality retrieval. In such cases, we can fine-
tune an embedding model on data from the domain and use it to improve the quality
of retrieval.
5. Metadata: Providing metadata about the chunks being passed in the context, such as
their source, helps the LLM understand the context better, resulting in better output
generation.
6. Re-ranking: When using semantic search, it is possible that the top-k results are all
very similar to each other. In such cases, we should consider re-ranking the results
based on other factors like metadata and keyword matches so that a wider range of
snippets is covered in the context passed to the LLM.
7. Lost in the middle: It has been observed that LLMs do not place equal weight on all
the tokens in the input. Tokens in the middle appear to be given less weight than the
tokens at the beginning and end of the input. This is known as the lost in the
middle problem. To avoid it, we can re-order the context snippets so that the most
important snippets are placed at the beginning and end of the input and the less
important snippets in the middle (see the re-ordering sketch after this list).
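Here is the overlapping-chunks sketch referenced in item 3. The chunk size and overlap values are illustrative, and this version splits on whitespace tokens rather than model tokens for simplicity.

```python
# A minimal sketch of overlapping chunking. Splits on whitespace "tokens" for
# simplicity; chunk size 100 and overlap 50 are illustrative choices.
def overlapping_chunks(text, chunk_size=100, overlap=50):
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + chunk_size]
        chunks.append(" ".join(chunk))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Each chunk shares 50 tokens with its neighbor, so a retrieved chunk carries
# some of the surrounding context with it.
```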
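And here is the re-ordering sketch referenced in item 7: given snippets sorted from most to least relevant, it places the most relevant ones at the beginning and end of the context and the least relevant ones in the middle. This interleaving scheme is one simple way to do it, not the only one.

```python
# A minimal sketch of reordering for the "lost in the middle" problem.
# Input: snippets sorted most-relevant-first. Output: most relevant snippets
# at the start and end of the list, least relevant in the middle.
def reorder_for_lost_in_the_middle(snippets):
    front, back = [], []
    for i, snippet in enumerate(snippets):
        if i % 2 == 0:
            front.append(snippet)      # 1st, 3rd, 5th ... go to the front
        else:
            back.append(snippet)       # 2nd, 4th, 6th ... go to the back
    return front + back[::-1]          # back is reversed so snippet #2 ends up last

ranked = ["s1", "s2", "s3", "s4", "s5"]
print(reorder_for_lost_in_the_middle(ranked))  # ['s1', 's3', 's5', 's4', 's2']
```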