A Taxonomy of Retrieval Augmented Generation
Components, Concepts, Use Cases & more...
200+ terms to know, build and improve RAG systems
Introduction
Retrieval Augmented Generation, or RAG, stands as a pivotal technique shaping the landscape of applied generative AI. A novel concept introduced by Lewis et al. in their seminal paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, RAG has swiftly emerged as a cornerstone, enhancing reliability and trustworthiness in the outputs from Large Language Models (LLMs).
E
In 2024, RAG is one of the most widely used techniques in generative AI applications: as per Databricks, at least 60% of LLM applications utilise some form of RAG. RAG's acceptance is also propelled by the simplicity of the concept. Simply put, a RAG system searches for information from a knowledge base and sends it, along with the user query, to the LLM to generate a response.
Figure 1: Retrieval Augmented Generation enhances the reliability and trustworthiness of LLM responses (Source: A Simple Guide to Retrieval Augmented Generation)
As RAG continues to evolve, it is crucial to create a shared language and framework for researchers, practitioners, developers and business leaders.

This taxonomy is an attempt to clarify the components of RAG, serve as a guide for understanding the key building blocks and provide a roadmap to navigate this somewhat complex, evolving RAG ecosystem.
Knowledge Cut-off Date: Training an LLM is an expensive and time-consuming process. It takes massive volumes of data and several weeks, or even months, to train an LLM. The data that LLMs are trained on is therefore not current. For example, GPT-4o has knowledge only up to October 2023. Any event that happened after this knowledge cut-off date is not available to the model.
Training Data Limitation: LLMs are trained on large volumes of data from a variety of public sources (Llama 3, for instance, has been trained on a whopping 15 trillion tokens, about 7 times more than Llama 2), but they do not have any knowledge of information that is not public. Publicly available LLMs have not been trained on confidential or proprietary data.
| Model | Developer | Knowledge Cut-off | Context Window |
|---|---|---|---|
| Claude 3 | Anthropic | March 2024 | 200k tokens |
| GPT-4o | OpenAI | October 2023 | 128k tokens |
| LLaMA 3.1 | Meta | June 2024 | 128k tokens |
| PaLM 2 | Google DeepMind | April 2023 | 32k tokens |
| Gemini 1.5 Pro | Google DeepMind | Early 2024 | 256k tokens |
| Claude 2 | Anthropic | Early 2023 | 100k tokens |
| Mistral | Mistral AI | 2023 | 8k tokens |
| Falcon 40B | TII | March 2023 | 2k tokens |
| BLOOM | BigScience | Early 2022 | 2k tokens |
| GPT-NeoX-20B | EleutherAI | April 2022 | 2k tokens |

Table 1: Popular LLMs with their cut-off date and context window
RAG Concepts
Retrieval: The process via which information pertinent to the user query is searched for and fetched from the knowledge base.

Augmentation: The process of adding the retrieved information to the user query.

Generation: The process of generating results by the LLM when provided with an augmented prompt.
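These three steps compose directly into a pipeline. Below is a minimal sketch of that loop in Python; the `knowledge_base.search` method is a hypothetical retriever API and the model name is illustrative, so any retriever and any LLM client can be substituted.

```python
# A minimal sketch of the retrieve-augment-generate loop.
# `knowledge_base.search` and the model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rag_answer(query: str, knowledge_base) -> str:
    # Retrieval: fetch the chunks most pertinent to the query
    chunks = knowledge_base.search(query, top_k=3)  # hypothetical search API

    # Augmentation: add the retrieved information to the user query
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # Generation: the LLM responds to the augmented prompt
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```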
Source Citation: The ability of a RAG system to trace the information in its response back to the source documents in the knowledge base.
Indexing Pipeline: The set of processes that creates and maintains the knowledge base for RAG applications. It is a non-real-time pipeline that updates the knowledge base at periodic intervals.
Source Systems: The original locations where the data desired for the RAG application is stored. These can be data lakes, file systems, CMSs, SQL & NoSQL databases, third-party data stores, etc.
Data Loading: The first step of the indexing pipeline that connects
to source systems to extract and parse files for data to be used in
the RAG knowledge base.
Chunking: The process of breaking down long pieces of text into smaller manageable sizes or "chunks". Chunking is crucial to the efficient creation of the knowledge base for RAG systems. It increases the ease of search and helps overcome the context window limits of LLMs.
Lost in the middle problem: Even in LLMs that have a long context window (Claude 3 by Anthropic has a context window of up to 200,000 tokens), an issue with accurately reading the context persists: models tend to use information placed at the beginning and end of the prompt reliably, while information buried in the middle of a long context is often missed.
Context-Enriched Chunking: Adds context from the larger document to each chunk to enrich the context of the smaller chunk. This makes more context available to the LLM without adding too much noise. It also improves retrieval accuracy and maintains semantic coherence across chunks. This is particularly useful in scenarios where a more holistic view of the information is crucial. While this approach enhances understanding of the broader context, it adds a level of complexity and comes at the cost of higher computational requirements, increased storage needs and possible latency in retrieval.
Agentic Chunking: In agentic chunking, chunks from the text are created based on a goal or a task. Consider an e-commerce platform wanting to analyse customer reviews. The best approach for the platform may be to chunk the reviews so that all reviews on the same topic are put in the same chunk. Similarly, the critical reviews and the positive reviews could be grouped separately.
Recursive Chunking: The text is first broken down into very small units (e.g., sentences or paragraphs), and the small chunks are merged into larger ones until the desired chunk size is achieved. Sliding-window chunking uses overlap between chunks to maintain context across chunk boundaries (see the sketch below).
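A minimal sketch of sliding-window chunking follows; the chunk size and overlap values are illustrative and would normally be tuned to the embedding model and the domain.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Sliding-window chunking: fixed-size chunks that share `overlap`
    characters so context is preserved across chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance by less than a full chunk
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Adjacent chunks overlap by 40 characters:
pieces = chunk_text("some long document text ... " * 100, chunk_size=200, overlap=40)
```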
Chunk Size: Choosing the right chunk size is a trade-off. While larger chunks carry more context, they also carry a lot of noise. Smaller chunks, on the other hand, contain precise information but can miss the broader context.
Metadata: Information attached to each chunk. Beyond standard tags like source, timestamp and author, attributes such as sentiment and category that can be inferred from the text can be used to enhance retrieval.
Parent-Child Indexing: A document structure where documents are organised hierarchically. The parent document contains overarching themes or summaries, while child documents delve into specific details. During retrieval, the system can first locate the most relevant child documents and then refer to the parent documents for additional context if needed. This approach enhances the precision of retrieval while maintaining the broader context. At the same time, this hierarchical structure can present challenges in terms of memory requirements and computational load.
Cosine Similarity: Embeddings are important because they help in establishing semantic relationships between words, phrases and documents; cosine similarity is the standard measure of that relationship. It is calculated as the cosine of the angle between two vectors. The cosine of parallel vectors (angle = 0°) is 1 and the cosine of orthogonal vectors (angle = 90°) is 0. At the other end, the cosine of opposite vectors (angle = 180°) is -1. Therefore, cosine similarity lies between -1 and 1, where unrelated terms have a value close to 0 and related terms have a value close to 1.
Figure 7: Cosine Similarity between vectors
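As a worked example, cosine similarity reduces to a few lines of NumPy; the two vectors below are illustrative stand-ins for real embeddings.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (|a| * |b|), always in [-1, 1]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = np.array([0.2, 0.8, 0.1])
v2 = np.array([0.25, 0.75, 0.05])
print(cosine_similarity(v1, v2))  # close to 1 for semantically related vectors
```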
ELMo: Embeddings from Language Models, a contextual embedding model developed by researchers at the Allen Institute for AI. ELMo embeddings have been shown to improve performance on question answering and sentiment analysis tasks.
BERT: Bidirectional Encoder Representations from Transformers, developed by researchers at Google, is a Transformer architecture-based model. It provides contextualised word embeddings by considering bidirectional context, achieving state-of-the-art performance on various natural language processing tasks.
Pre-trained Embedding Models: Embedding models that have already been trained on large corpora and are available to use off the shelf. This is also one of the reasons why the usage of embeddings has exploded in popularity across machine learning applications.
Vector Indices: Libraries that focus on the core features of indexing and search. They do not support data management, query processing, interfaces, etc., and can be considered bare-bones vector databases. Examples of vector indices are Facebook AI Similarity Search (FAISS), Non-Metric Space Library (NMSLIB) and Approximate Nearest Neighbors Oh Yeah (ANNOY).
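A minimal sketch of building and querying a FAISS index is shown below; the dimensionality and the random vectors are placeholders for real embeddings.

```python
import numpy as np
import faiss  # Facebook AI Similarity Search

dim = 384                                               # embedding dimensionality (illustrative)
vectors = np.random.rand(1000, dim).astype("float32")   # stand-ins for chunk embeddings

index = faiss.IndexFlatL2(dim)   # exact L2-distance index, no training needed
index.add(vectors)               # index the knowledge-base embeddings

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, k=5)  # 5 nearest neighbours
print(ids[0])                    # positions of the retrieved chunks
```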
Figure 8: Creation and maintenance of non-parametric database via the indexing pipeline
Generation Pipeline: The set of processes employed to generate a response to the user query using the information retrieved from the knowledge base.
TF-IDF: Term Frequency-Inverse Document Frequency is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus). It assigns higher weights to words that appear frequently in a document but infrequently across the corpus.
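In one common formulation (variants differ in smoothing and normalisation), the weight is:

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \log\frac{N}{\mathrm{df}(t)}
```

where tf(t, d) is the frequency of term t in document d, N is the total number of documents and df(t) is the number of documents containing t.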
BM25: Best Match 25 is an advanced probabilistic model used to rank documents based on the query terms appearing in each document. It is part of the family of probabilistic information retrieval models and is considered an advancement over the classic TF-IDF model. The improvement that BM25 brings is that it adjusts for the length of documents so that longer documents do not unfairly get higher scores.
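The standard BM25 scoring function makes this length adjustment explicit (typical defaults are k1 between 1.2 and 2.0 and b = 0.75):

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
\frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

where f(q_i, D) is the frequency of query term q_i in document D, |D| is the length of D, and avgdl is the average document length in the corpus.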
DocT5Query: A document expansion technique that uses a T5 model to generate queries a document could answer; the predicted queries are appended to the document before indexing, improving the chances of the document being retrieved.
Quantum IR Models: Apply principles of quantum theory to information retrieval. Example: Quantum Language Models (QLM).
Neural IR Models: Encompass various neural network-based approaches to information retrieval. Examples: NPRF (Neural PRF), KNRM (Kernel-based Neural Ranking Model).
Augmentation: The process of combining the user query and the documents retrieved from the knowledge base.
Prompt Engineering: The technique of giving instructions to an LLM to attain a desired outcome. The goal of prompt engineering is to construct prompts that achieve accuracy and relevance in the model's responses. A template sketch follows below.
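The sketch below shows how augmentation and prompt engineering meet in practice: retrieved chunks are inserted into a prompt template alongside the user query. The template wording is illustrative, not prescriptive.

```python
# A hedged sketch of an augmented prompt template.
PROMPT_TEMPLATE = """You are a helpful assistant.
Use ONLY the context below to answer. If the answer is not in the
context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_augmented_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Augmentation: join the retrieved chunks and slot them into the template
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```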
Chain of Thought (CoT) Prompting: This technique, via the introduction of intermediate "reasoning" steps, improves the performance of LLMs in tasks that require complex reasoning, like arithmetic, common sense and symbolic reasoning.
Self-Consistency: While CoT uses a single reasoning chain in Chain of Thought prompting, Self-Consistency samples multiple diverse reasoning paths and uses their respective generations to arrive at the most consistent answer.
Generated Knowledge Prompting: This technique explores the idea of prompt-based knowledge generation by dynamically constructing relevant knowledge chains, leveraging the model's latent knowledge to improve responses.
Active Prompting: Adapts LLMs to task-specific prompts through a process involving query, uncertainty analysis, human annotation and enhanced inference.
ReAct Prompting: ReAct integrates LLMs for concurrent reasoning traces and task-specific actions, improving performance by interacting with external tools for information retrieval. When combined with CoT, it optimally utilises internal knowledge and external information, enhancing the interpretability and trustworthiness of LLMs.
Recursive Prompting: Recursive prompting breaks down a complex problem into smaller sub-problems, prompting the LLM on each and feeding the intermediate answers into subsequent prompts.
Accuracy: The proportion of correct predictions (both true positives and true negatives) among the total number of cases examined. Even though accuracy is a simple, intuitive metric, it is not the primary metric for retrieval. In a large knowledge base, the majority of documents are usually irrelevant to any given query, which can lead to misleadingly high accuracy scores. Accuracy also does not consider the ranking of the retrieved results.
Precision: Measures the proportion of retrieved documents that are relevant to the user query. It answers the question, "Of all the retrieved documents, how many are relevant?"

Recall: Measures the proportion of relevant documents that are actually retrieved. It answers the question, "Of all the relevant documents, how many were actually retrieved?" Like precision, recall doesn't consider the ranking of the retrieved documents. It can also be misleading, since retrieving every document in the knowledge base trivially achieves perfect recall.
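Both metrics are usually evaluated at a cut-off k over the ranked result list. A small worked sketch (document IDs are illustrative; `relevant` is assumed non-empty):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # fraction of the top-k results that are relevant
    return sum(doc in relevant for doc in retrieved[:k]) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # fraction of all relevant documents found in the top-k results
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant)

retrieved = ["d1", "d7", "d3", "d9"]   # ranked results from the retriever
relevant = {"d1", "d3", "d5"}          # ground-truth relevant documents
print(precision_at_k(retrieved, relevant, k=4))  # 2/4 = 0.5
print(recall_at_k(retrieved, relevant, k=4))     # 2/3 ~ 0.67
```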
NDCG (Normalised Discounted Cumulative Gain): Measures the ranking quality by considering the position of relevant documents in the result list and assigning higher scores to relevant documents appearing earlier. It is particularly effective for scenarios where documents have varying degrees of relevance.
Context Relevance: Evaluates how well the retrieved documents relate to the original query. The key aspects are topical alignment, information usefulness and redundancy. The retrieved context should contain information relevant only to the query or the prompt. For context relevance, a metric 'S' is estimated: the number of sentences in the retrieved context that are relevant for responding to the query or the prompt.
Figure 12: Context relevance evaluates the degree to which the retrieved information is
relevant to the query
Faithfulness: Ensures that the facts in the response do not contradict the context and can be traced back to the source; in other words, it checks that the LLM is not hallucinating. Faithfulness first identifies the number of "claims" made in the response and then calculates the proportion of those claims present in the context.
Hallucination Rate: Calculates the proportion of generated claims in the response that are not present in the retrieved context.
Coverage: Measures the number of relevant claims in the context and calculates the proportion of those relevant claims present in the response.
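In practice the claims themselves are usually extracted with an LLM. Assuming the claims have already been collected into sets, the three metrics above reduce to simple set arithmetic, as this sketch shows (all inputs assumed non-empty):

```python
def faithfulness(response_claims: set[str], context_claims: set[str]) -> float:
    # proportion of response claims that are supported by the context
    return len(response_claims & context_claims) / len(response_claims)

def hallucination_rate(response_claims: set[str], context_claims: set[str]) -> float:
    # proportion of response claims NOT present in the retrieved context
    return len(response_claims - context_claims) / len(response_claims)

def coverage(relevant_context_claims: set[str], response_claims: set[str]) -> float:
    # proportion of relevant context claims that made it into the response
    return len(relevant_context_claims & response_claims) / len(relevant_context_claims)
```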
Noise Robustness: A knowledge base will contain documents that are related to a query yet of little use in answering the questions that can potentially be asked of the system. It is very probable that a document is related to the user query but does not have any meaningful information to answer the query. The ability of the RAG system to separate these noisy documents from the relevant ones is noise robustness.
Negative Rejection: By nature, Large Language Models always generate text. It is possible that there is absolutely no information about the user query in the documents in the knowledge base. The ability of the RAG system to withhold an answer when there is no relevant information is negative rejection.
RAGAs: Retrieval Augmented Generation Assessment is a framework for evaluating the retrieval and generation components of RAG systems without relying on extensive human annotations. RAGAs also helps in synthetically generating a test dataset that can be used to evaluate a RAG pipeline.
Synthetic Test Dataset Generation: Using models like LLMs to automatically generate ground-truth data from the knowledge base.

LLM as a Judge: Using an LLM to evaluate the output of a task, for example scoring a RAG response for relevance or faithfulness.
BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models, covering nine task types and a diverse collection of Question-Answer datasets.
Figure 14: BEIR – 9 tasks and 18 (of 19) datasets (Source: BEIR: A Heterogeneous
Benchmark for Zero-shot Evaluation of Information Retrieval Models)
Retrieve-Read: A framework in which a retriever fetches information and the LLM then reads this information to generate the results.
RAG Failure Points: A RAG system can misfire if the retriever fails to retrieve the entire context or retrieves irrelevant context, or if the LLM, despite being provided the context, does not consider it and, instead of answering the query, picks irrelevant information from the context.
Naive RAG: The basic retrieve-read pipeline, where documents are retrieved and passed to the LLM without any query translation, reranking or other enhancement techniques.
Query Translation: Techniques that modify the input user query in a manner that makes it better suited for the retrieval task, leading to better retrieval.
Query Expansion: The original user query is enriched with the aim
of retrieving more relevant information. This helps in increasing the
recall of the system and overcomes the challenge of incomplete or
very brief user queries.
Pipeline Design
Multi-query expansion: In this approach, multiple variations of the original query are generated using an LLM, and each variant query is used to search and retrieve chunks from the knowledge base (see the sketch after this list).

Sub-query expansion: In this approach, instead of generating variations of the original query, a complex query is broken down into simpler sub-queries.

Step back expansion: The term comes from the step-back prompting approach, where the original query is abstracted to a higher-level conceptual query.
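A sketch of multi-query expansion follows. Here `llm` is a hypothetical callable that returns a text completion for a prompt and `retriever` a hypothetical search function; both are assumptions to keep the sketch client-agnostic.

```python
def expand_query(query: str, llm, n_variants: int = 3) -> list[str]:
    # Ask the LLM for rephrasings of the original query, one per line
    prompt = (
        f"Generate {n_variants} different rephrasings of the following "
        f"search query, one per line:\n\n{query}"
    )
    variants = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [query] + variants[:n_variants]

def multi_query_retrieve(query: str, retriever, llm, top_k: int = 5) -> list[str]:
    # Retrieve for each variant and deduplicate the union of results,
    # preserving first-seen order
    results: dict[str, None] = {}
    for variant in expand_query(query, llm):
        for doc in retriever(variant, top_k):
            results.setdefault(doc, None)
    return list(results)
```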
Query Rewriting: Queries are rewritten from the original input. User input in real-world systems is not always well formed for retrieval, so it is rewritten into a better-suited form, accounting for factors like language, complexity, source of information, etc.
Hybrid Retrieval: A hybrid retrieval strategy is an essential component of production-grade RAG systems. It involves combining retrieval methods for improved retrieval accuracy. This can mean simply using a keyword-based search along with semantic similarity, or combining sparse embedding, dense embedding and knowledge graph-based search. The result lists from the different methods are then fused; one common fusion method is sketched below.
Figure 17: Hybrid retriever employs multiple querying techniques and combines the results
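One widely used way to combine result lists from multiple retrievers is Reciprocal Rank Fusion (RRF); the sketch below uses illustrative document IDs.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    # Each document scores 1 / (k + rank) per list it appears in;
    # k=60 is the commonly used damping constant.
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["d3", "d1", "d8"]    # e.g. BM25 results
semantic_hits = ["d1", "d4", "d3"]   # e.g. dense-vector results
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))  # d1 and d3 rise to the top
```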
Contextual Compression: Compressing the retrieved information to retain only what is important to the query. This also has a positive effect on the cost and efficiency of the system.
Reranking: Retrieved information from different sources and techniques can be further ranked to determine the most relevant documents. Reranking, like hybrid retrieval, is becoming a necessity in production RAG systems. To this end, commonly available rerankers, like multi-vector, Learning to Rank (LTR), BERT-based and even hybrid rerankers, are gaining prominence. A cross-encoder sketch follows below.
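As one example among the reranker families named above, a cross-encoder scores each query-document pair jointly. The sketch uses the sentence-transformers library with a commonly used public checkpoint; both are assumptions, and any scoring model could be swapped in.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    # Score each (query, document) pair and keep the highest-scoring docs
    scores = reranker.predict([(query, doc) for doc in documents])
    ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```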
Search Module: Adapts retrieval to specific scenarios; this involves summarisation, searches over specific databases, or merging different information streams.
Task Adapter: This module makes RAG adaptable to various downstream tasks, allowing the development of task-specific end-to-end retrievers with minimal examples and demonstrating flexibility in handling different tasks. The task adapter module allows the RAG system to be fine-tuned for specific tasks like summarisation, translation or sentiment analysis.
Critical Layers: The foundational layers of the RAGOps stack that are essential to the functioning of a RAG system. A RAG system is likely to fail if any of these layers is missing or incomplete.
Data Layer: The data layer serves the critical role of creating and storing the knowledge base for RAG. It is responsible for collecting data from source systems, transforming it into a usable format and storing it for efficient retrieval.
Model Layer: Predictive models enable generative AI applications. Some models are provided by third parties and some need to be custom trained or fine-tuned. Generating quick and cost-effective model responses is also an important aspect of leveraging predictive models. This layer holds the model library along with training and fine-tuning utilities.
App Orchestration Layer: Responsible for managing the interactions amongst the other layers in the system. It is a central coordinator that enables communication between data, retrieval systems, generation models and other services.
Figure 18: Core RAGOps stack, where the Data, Model, Model Deployment and App Orchestration layers interact with source systems and managed service providers and coordinate with the application layer to interface with the user.
Monitoring Layer: Tracks the health of the RAG system. Understanding system behaviour and identifying points of failure, assessing the relevance and adequacy of information, and tracking regular system metrics like resource utilisation, latency and error rates form part of the monitoring layer.
LLM Security & Privacy Layer: RAG systems rely on large knowledge bases stored in vector databases, which can contain sensitive information. They need to follow all data privacy regulations and employ data protection strategies like anonymisation, encryption, differential privacy, query validation & sanitisation, and output filtering to assist in protection against attacks.

Enhancement Layers: Beyond the critical layers, enhancement layers make the RAG system better and are decided based on the end requirements.
Caching Layer: Caches queries and responses to reduce repeated computation, latency and cost, which is particularly important for large-scale systems.
Explainability and Interpretability Layer: Provides transparency for system decisions, which is especially important for domains requiring accountability.
Collaboration and Experimentation Layer: Useful for teams working on development and experimentation, but non-critical for system operation.
Knowledge Graph Enhanced RAG: Augmenting RAG with graph structures not only increases contextual understanding but also equips the system with enhanced reasoning capabilities and improved explainability.
Knowledge Graphs: Knowledge graphs organise data in a structured manner as entities and relationships.
GraphRAG: An open-source framework developed by Microsoft that facilitates automatic creation of knowledge graphs from source documents and then uses the knowledge graph for retrieval.
Graph Communities: Partitioning the entities and relationships of a graph into closely related groups.
Multimodality: The ability to work with different types of data, such as text, images, video or audio. Multimodal systems can handle multiple modalities simultaneously.
Multimodal Embeddings: A unified vector representation that encodes multiple data types (e.g., text and image embeddings combined) to enable retrieval across different modalities.
CLIP (Contrastive Language-Image Pre-training): A model developed by OpenAI that learns visual concepts from natural language supervision, often used for cross-modal retrieval and generation.
Figure 19: Mapping data of different modalities into a shared embedding space
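A minimal sketch of this shared embedding space using CLIP via the Hugging Face transformers library follows; the image file name and the candidate captions are illustrative.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("product.jpg")           # illustrative file name
texts = ["a red sneaker", "a blue kettle"]  # illustrative captions

# Text and image are encoded into the same embedding space
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher probability means a closer text-image match in the shared space
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)
```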
Agentic RAG: Uses AI agents to adapt the RAG workflow to query types and the type of documents in the knowledge base.
Adaptive Frameworks: Dynamic systems that adjust retrieval and generation strategies based on the evolving context and data, ensuring relevant responses.
Routing Agents: Agents responsible for directing user queries to the most appropriate sources or sub-systems for efficient processing.
Query Planning Agents: Agents that break down complex queries into sub-queries and plan their execution across retrieval pipelines.
Model Providers: OpenAI, HuggingFace, Google Vertex AI, Anthropic, AWS Bedrock, AWS Sagemaker, Cohere, Azure Machine Learning, IBM Watson AI, Mistral AI, Salesforce Einstein

Vector Databases: Milvus, Chroma, Weaviate, Deep Lake, Qdrant, Elasticsearch, Vespa, Redis (Vector Search Support), Vald, Zilliz, Marqo, SingleStore

Data Loading: Snorkel AI, LlamaIndex, LangChain, Scale AI, Roboflow, Datature, V7 Labs, Clarifai

Application Frameworks: LangChain, LlamaIndex, Haystack, Rasa (Conversational AI)

Workflow Orchestration: Flyte, Prefect, Airflow, Metaflow

Prompt Engineering & Evaluation: PromptLayer, TruLens, TruEra, PromptHero, TextSynth

Monitoring & Observability: TruEra, Fiddler AI, Arize AI, Aporia, WhyLabs, Evidently AI, Superwise, Monte Carlo, Datadog

Deployment Frameworks: vLLM, TensorRT-LLM, ONNX Runtime, KubeFlow

Small & Distilled Models: Phi by Microsoft, GPT-Neo by EleutherAI, DistilBERT by HuggingFace, TinyBERT, ALBERT (A Lite BERT) by Google, MiniLM by Microsoft, DistilGPT2 by HuggingFace, Reformer by Google, T5-Base by Google

Synthetic Data Generation: Mostly AI, Tonic.ai, Synthesis AI

Others: Cohere reranker, Unstructured.io

Managed RAG Solutions: Amazon Bedrock Knowledge Bases, Vectorize.io

Ontology: Neo4j, Stardog, TerminusDB, TigerGraph
Corrective RAG: In this approach, real-time information is retrieved to check the factual accuracy of the LLM-generated answer. It is particularly useful in fact-checking and in the medical & legal domains.
Contrastive RAG: Integrates contrastive learning techniques to enhance the retrieval process by distinguishing between relevant and irrelevant documents (https://arxiv.org/abs/2406.06577).
Selective RAG: Optimises the retrieval phase by determining when it is beneficial to retrieve external information. This method aims to skip retrieval when the model's own knowledge is sufficient.
Retrieval-Augmented Fine-Tuning: Approaches that let LLMs access and utilise external knowledge dynamically during their fine-tuning process.
RAPTOR: Recursive Abstractive Processing for Tree-Organised Retrieval focuses on creating a recursive, tree-like structure from documents to improve context-aware information retrieval. It is beneficial for question-answering tasks, especially when dealing with extensive documents or information that requires multi-step reasoning.
Application Areas
Search: Traditional search engine companies are now building LLM-first search engines where RAG is the cornerstone of the algorithm.
Education: In learning & development, RAG is used extensively to create personalised learning paths based on past trends and for automated evaluation and feedback.
Real-time Event Commentary: Imagine an event like a sport or a news event. A retriever can connect to real-time updates/data via APIs and pass this information to the LLM to create a virtual commentator. These can further be augmented with Text-To-Speech models. IBM leveraged this technology for commentary during the 2023 US Open tennis tournament.
Customer Support Chatbots: Conversational agents that help users resolve their issues. These agents can also route users to more specialised agents depending on the nature of the query. Almost all LLM-based chatbots on websites or as internal tools use RAG. These are being used in industries like Travel & Hospitality, Fintech and e-commerce.
Relevance Mismatch: Difficulty retrieving the most relevant documents or passages from a large dataset due to suboptimal ranking or search mechanisms.
Over-Retrieval: Retrieving too many documents, leading to unnecessary noise and irrelevant content in the final generation.
Sparse vs Dense Retrieval Trade-off: Balancing between sparse retrieval (TF-IDF, BM25) and dense retrieval (using embeddings) to maximise relevance without losing performance.
Document Question Answering Systems: With access to an organisation's proprietary documents, RAG enables an intelligent AI system that can answer questions about the organisation.
Latency: Retrieval from large or distributed knowledge bases can introduce significant delays, affecting real-time applications.
Context Fragmentation: Difficulty maintaining context across long, multi-turn queries, leading to disjointed or incomplete answers.
Incoherent Summarisation: Generating inconsistent or disjointed summaries from multiple retrieved documents, leading to poor user experience.
Over-Generation: Generating overly verbose or redundant responses based on retrieved data that fail to condense the key points effectively.
Inconsistent Modal Alignment: Challenges in integrating multimodal data (e.g., text, images, videos), where content retrieved from different modalities does not align coherently in the generated response.
Data Poisoning: The knowledge base can be manipulated to feed biased or poisoned data into the generation pipeline, leading to compromised outputs.
Adversarial Attacks: Security vulnerabilities where attackers may influence retrieval or generation results by exploiting weaknesses in retrieval pipelines.
Knowledge Base Updating: Maintaining an up-to-date knowledge base while keeping the retrieval process fast and accurate can be difficult, especially in dynamic fields like news or finance.
/in/Abhinav-Kimothi @akaiworks
@abhinav_kimothi @abhinavkimothi