RAG Slide ENG

The document discusses Retrieval-Augmented Generation (RAG), which uses large language models to generate answers or text but first retrieves relevant information from external sources to incorporate into the generation. It describes the shifting paradigms in RAG from naive approaches to more advanced techniques, including optimizations to data indexing, pre-retrieval and post-retrieval processing, and modular architectures. The key issues addressed in RAG include what information to retrieve, when to perform retrieval during generation, and how to integrate the retrieved information into the language model.


Retrieval-Augmented Generation (RAG):

Paradigms, Technologies, and Trends

Haofen Wang
Tongji University
CONTENTS

1. RAG Overview
2. RAG Paradigm Shifts
3. Key Technologies and Evaluation
4. RAG Stack and Industry Practices
5. Summary and Prospects


PART 01
Overview of RAG
Background
Drawbacks of LLMs

• Hallucination
• Outdated information
• Low efficiency in parameterizing knowledge
• Lack of in-depth knowledge in specialized domains
• Weak inferential capabilities

Practical Requirements of Applications

• Domain-specific, accurate answering
• Frequent updates of data
• Traceability and explainability of generated content
• Controllable cost
• Privacy protection of data

(Illustration drawn by DALL·E 3)
Retrieval-Augmented Generation (RAG)

When answering questions or generating text, the system first retrieves relevant information from a large number of documents, and the LLM then generates the answer based on this information.

By attaching an external knowledge base, there is no need to retrain the entire large model for each specific task.

The RAG model is especially suitable for knowledge-intensive tasks.

A typical case of RAG

Symbolic Knowledge vs. Parametric Knowledge

Ways to optimize LLMs:

• Prompt Engineering
• Retrieval-Augmented Generation
• Instruction Tuning / Fine-tuning

RAG vs. Fine-tuning
RAG Applications

Scenarios where RAG is applicable:
• Long-tail distribution of data
• Frequent knowledge updates
• Answers requiring verification and traceability
• Specialized domain knowledge
• Data privacy preservation

Representative retrieval-augmented systems by task:
• Q&A: RETRO (Borgeaud et al., 2021), REALM (Guu et al., 2020), ATLAS (Izacard et al., 2022)
• Fact checking: RAG (Lewis et al., 2020), ATLAS (Izacard et al., 2022), Evi. Generator (Asai et al., 2022)
• Dialog: BlenderBot3 (Shuster et al., 2022), Internet-augmented generation (Komeili et al., 2022)
• Summarization: FLARE (Jiang et al., 2023)
• Machine translation: kNN-MT (Khandelwal et al., 2020), TRIME-MT (Zhong et al., 2022)
• Code generation: DocPrompting (Zhou et al., 2023), NaturalProver (Welleck et al., 2022)
• Natural language inference: kNN-Prompt (Shi et al., 2022), NPM (Min et al., 2023)
• Sentiment analysis: kNN-Prompt (Shi et al., 2022), NPM (Min et al., 2023)
• Commonsense reasoning: RACo (Yu et al., 2022)
PART 02
RAG Paradigm Shifts
Naive RAG

Step 1: Indexing
1. Divide the document into even chunks, each chunk being a piece of the original text.
2. Use an encoding model to generate an embedding for each chunk.
3. Store the embedding of each chunk in a vector database.

Step 2: Retrieval
Retrieve the k most relevant documents using vector similarity search.

Step 3: Generation
Combine the original query with the retrieved text and feed both into an LLM to get the final answer. (A minimal sketch of the whole pipeline follows.)
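Below is a minimal, hedged sketch of these three steps. The helpers `embed_texts` (any embedding model) and `call_llm` (any LLM API) are illustrative assumptions, not part of the original slides, and the "vector database" is reduced to an in-memory matrix.

```python
# A minimal sketch of the Naive RAG pipeline (index -> retrieve -> generate).
# `embed_texts` and `call_llm` are hypothetical stand-ins for any embedding
# model and any LLM API; they are assumptions, not prescribed components.
import numpy as np

def chunk(document: str, size: int = 200) -> list[str]:
    """Step 1a: split the document into even, fixed-size chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def build_index(chunks: list[str], embed_texts) -> np.ndarray:
    """Steps 1b/1c: embed each chunk and store the vectors (here: a matrix)."""
    vectors = np.array(embed_texts(chunks), dtype=float)
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def retrieve(query: str, chunks, index, embed_texts, k: int = 3) -> list[str]:
    """Step 2: cosine-similarity search for the k most relevant chunks."""
    q = np.asarray(embed_texts([query])[0], dtype=float)
    q = q / np.linalg.norm(q)
    scores = index @ q
    return [chunks[i] for i in np.argsort(-scores)[:k]]

def generate(query: str, retrieved: list[str], call_llm) -> str:
    """Step 3: combine the query and the retrieved text into one LLM prompt."""
    context = "\n\n".join(retrieved)
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```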
Advanced RAG

Pipeline: Index Optimization → Pre-Retrieval Process → Retrieval → Post-Retrieval Process → Generation

• Optimizing data indexing: sliding window, fine-grained segmentation, adding metadata.
• Pre-retrieval process: retrieval routing, summaries, query rewriting, and confidence judgment.
• Post-retrieval process: reranking and content filtering (see the sketch below).
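As one concrete post-retrieval step, a rerank-then-filter pass might look like the following sketch. `score_pair` stands in for any relevance scorer (for example, a cross-encoder) and is an assumption, as are the cutoff settings.

```python
# Hedged sketch of a post-retrieval rerank + filter step. `score_pair` is an
# assumed relevance scorer; `keep` and `min_score` are illustrative settings.
def rerank_and_filter(query: str, chunks: list[str], score_pair,
                      keep: int = 3, min_score: float = 0.2) -> list[str]:
    scored = [(score_pair(query, c), c) for c in chunks]    # score every chunk
    scored.sort(key=lambda sc: sc[0], reverse=True)         # rerank by score
    return [c for s, c in scored[:keep] if s >= min_score]  # drop weak hits
```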


Modular RAG

Modules: Search, Retrieve, Rewrite, Rerank, Read, Filter, Generate, Predict, Demonstrate, Reflect, Aggregate.

Patterns (modules composed into pipelines):
• Naive RAG: Retrieve → Read
• DSP (2022): Demonstrate → Search → Predict
• Rewrite-Retrieve-Read (2023): Rewrite → Retrieve → Read
• Retrieve-then-read (2023): Retrieve → Read

A small sketch of this "pattern = composition of modules" idea follows.
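The sketch below expresses the modular view in code: each module is a function over a shared state, and a pattern is just a composition. The module names mirror the slide; their internals are assumptions left to the reader.

```python
# Hedged sketch: modular RAG as function composition over a shared state dict.
from typing import Callable

Module = Callable[[dict], dict]  # each module reads and updates the state

def pipeline(*modules: Module) -> Module:
    """Compose modules left-to-right into a single runnable pattern."""
    def run(state: dict) -> dict:
        for module in modules:
            state = module(state)
        return state
    return run

# Patterns from the slide, assuming rewrite/retrieve/read etc. are Modules:
# naive_rag             = pipeline(retrieve, read)
# dsp                   = pipeline(demonstrate, search, predict)
# rewrite_retrieve_read = pipeline(rewrite, retrieve, read)
```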
Comparison of RAG Paradigms
The Three Key Questions of RAG

What to retrieve?
• Token
• Phrase
• Chunk
• Paragraph
• Entity
• Knowledge graph

When to retrieve?
• Single search
• Every token
• Every N tokens
• Adaptive search

How to use the retrieved information?
• Input/Data layer
• Model/Intermediate layer
• Output/Prediction layer

Other issues

Augmentation stage:
• Pre-training
• Fine-tuning
• Inference

Retrieval model choice: BERT, RoBERTa, BGE, ...
Generation model choice: GPT, Llama, T5, ...
Plus model collaboration and scale selection.
Key issue of RAG — What to retrieve

Retrieval granularity ranges from fine-grained to coarse, and the level of structuration from low to high:

• Token | kNN-LM, 2019: excels at handling long-tail and cross-domain issues with high computational efficiency, but requires significant storage.
• Phrase | NPM, 2023
• Chunk | In-Context RAG, 2023: the search is broad, recalling a large amount of information; coverage is high but accuracy is low, and much redundant information is included.
• Entity | EaSe, 2022
• Knowledge graph | 2023: richer semantic and structured information, but retrieval efficiency is lower and results are limited by the quality of the KG.
Key issue of RAG — How to use the retrieved content

The retrieved information can be integrated at different layers of the generation model during the inference process:

• Input / Data layer: simple to use, but it cannot support many retrieved knowledge blocks, and the room for optimization is limited.
• Model / Intermediate layer: supports more retrieved knowledge blocks, but introduces additional complexity and requires training.
• Output / Prediction layer: keeps the output highly grounded in the retrieved content, but efficiency is low. (A small output-layer sketch follows.)
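For the output/prediction layer, one classic instance is kNN-LM-style interpolation of the LM's next-token distribution with a nearest-neighbour distribution. A hedged sketch, assuming `p_lm` and `p_knn` are aligned probability vectors over the same vocabulary:

```python
# Output-layer integration in the spirit of kNN-LM: mix the LM's next-token
# distribution with one induced by retrieved nearest neighbours.
import numpy as np

def interpolate(p_lm: np.ndarray, p_knn: np.ndarray, lam: float = 0.25) -> np.ndarray:
    """p(y|x) = lam * p_kNN(y|x) + (1 - lam) * p_LM(y|x); lam is a tunable weight."""
    return lam * p_knn + (1.0 - lam) * p_lm
```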
Key issue of RAG — When to retrieve

Retrieval frequency ranges from low to high:

• Once | REPLUG, 2023: conduct a single search during the reasoning process. High efficiency, but the retrieved documents may have low relevance.
• Adaptive | FLARE, 2023: adaptively decide when to search. Balances efficiency and relevance, though the retrieved information might not be the optimal solution.
• Every N tokens | Atlas, 2023: retrieve once for every N tokens generated. A large amount of information, but with low efficiency and much redundancy.
Overview of RAG Development
PART 03
Key Technologies and Evaluation
Techniques for Better RAG — Data Indexing Optimization

Chunk Optimization (Small-to-Big and sliding window are sketched in code below)
• Small-to-Big: embed at the sentence level, then expand to the surrounding window during the generation process.
• Sliding window: overlapping chunks cover the entire text, avoiding semantic ambiguity at chunk boundaries.
• Summary: retrieve documents through their summaries first, then retrieve text blocks from within those documents.

Adding Metadata
• Example metadata: page, time, type, document title.
• Pseudo metadata generation: enhance retrieval by generating a hypothetical document for the incoming query and creating questions that the text block can answer.
• Metadata filtering/enrichment: dissect and annotate the document; during the query, infer metadata filters in addition to semantic queries.
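A hedged sketch of the two chunking ideas above. The chunk size, overlap, and character-level splitting are illustrative assumptions; real implementations usually split on tokens or sentences.

```python
# Sliding-window chunking: overlapping chunks so boundary sentences keep context.
def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Small-to-Big: the retriever matches a single sentence; generation receives
# that sentence together with its neighbours (the expanded window).
def small_to_big(sentences: list[str], hit_index: int, window: int = 2) -> str:
    lo = max(0, hit_index - window)
    hi = min(len(sentences), hit_index + window + 1)
    return " ".join(sentences[lo:hi])
```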
Techniques for Better RAG — Structured Corpus

Hierarchical Organization of Retrieval Corpora

• Summary → Document: replace document retrieval with summary retrieval, retrieving not only the most directly relevant nodes but also additional nodes associated with them. (A two-stage sketch follows this list.)
• Document → Embedded Objects: documents contain embedded objects (such as tables and charts); first retrieve the entity that references the object, then query the underlying objects, such as document blocks, databases, or sub-nodes.
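A minimal sketch of the summary-first idea, assuming a generic `retrieve_over(query, items, k)` helper that returns the top-k items; the data layout is an assumption for illustration.

```python
# Hedged sketch of hierarchical (summary -> document -> chunk) retrieval.
def hierarchical_retrieve(query: str, docs: list[tuple[str, list[str]]],
                          retrieve_over, k_docs: int = 2, k_chunks: int = 3) -> list[str]:
    """docs: one (summary, chunks) pair per document; retrieve_over is assumed."""
    top_summaries = retrieve_over(query, [s for s, _ in docs], k=k_docs)
    candidates = [c for s, chunks in docs if s in top_summaries for c in chunks]
    return retrieve_over(query, candidates, k=k_chunks)
```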
Techniques for Better RAG — Retrieval Source Optimization

Unstructured data (phrases, prompts, cross-lingual text):
• Prompt: UPRISE [Cheng et al., 2023]
• Cross-lingual: CREA-ICL [Li et al., 2023]

Structured data (triples, subgraphs):
• Subgraph: SURGE [Kang et al., 2023]

LLM-generated content (generated text, generated code):
• LLM memory: Selfmem [Cheng et al., 2023]
Techniques for Better RAG — KG as a Retrieval Data Source

GraphRAG
• Extract entities from the user's input query, then construct a subgraph to form the context, and finally feed it into the large model for generation.

Implementation
1. Use an LLM (or another model) to extract the key entities from the question.
2. Retrieve a subgraph based on those entities, expanding to a certain depth, such as 2 hops or more.
3. Use the obtained context to generate the answer with the LLM.

(A sketch of this flow follows.)
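A hedged sketch of the three implementation steps. `extract_entities`, the adjacency-list `graph` of (relation, object) edges, and `call_llm` are assumptions for illustration, not a prescribed API.

```python
# Hedged GraphRAG sketch: extract entities, expand an n-hop subgraph around
# them, and pass the collected triples to the LLM as context.
def subgraph(graph, seeds, hops: int = 2) -> set:
    """Collect triples reachable within `hops` of the seed entities."""
    triples, frontier = set(), set(seeds)
    for _ in range(hops):
        nxt = set()
        for s in frontier:
            for rel, obj in graph.get(s, []):
                triples.add((s, rel, obj))
                nxt.add(obj)
        frontier = nxt
    return triples

def graph_rag(question: str, graph, extract_entities, call_llm) -> str:
    seeds = extract_entities(question)                     # step 1: key entities
    triples = sorted(subgraph(graph, seeds))               # step 2: 2-hop subgraph
    context = "\n".join(f"{s} {r} {o}" for s, r, o in triples)
    prompt = f"Context triples:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)                                # step 3: generate
```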
Techniques for Better RAG — Query Optimization

Questions and answers do not always have high semantic similarity; adjusting the query can yield better retrieval results.

Query Rewriting: Rewrite-Retrieve-Read [Ma et al., 2023] (sketched below)
Query Clarification: Tree of Clarifications (ToC) [Kim et al., 2023]
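A minimal sketch of Rewrite-Retrieve-Read-style query rewriting: ask the LLM for a search-friendly query before retrieving. `call_llm` and `retrieve` are the same assumed helpers as in the earlier sketches, not the paper's exact interfaces.

```python
# Hedged sketch of the rewrite -> retrieve -> read loop.
def rewrite_retrieve_read(question: str, call_llm, retrieve) -> str:
    rewritten = call_llm(f"Rewrite this question as a search query: {question}")
    docs = retrieve(rewritten)                    # retrieve with the rewrite
    context = "\n\n".join(docs)
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")
```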


Techniques for Better RAG — Embedding Optimization

Selecting a More Suitable Embedding Provider
• BAAI General Embedding (BGE)
• LLM-Embedder (BGE2) [Aksitov et al., 2023]

Fine-tuning the Embedding Model
• Fine-tuning according to domain-specific repositories and downstream tasks.
• Fine-tuning an adapter module to align the embedding model with the retrieval repository.
Techniques for Better RAG — Retrieval Process Optimization

Iterative: iteratively retrieve from the corpus to acquire more detailed and in-depth knowledge.
• ITER [Feng et al., 2023]
• IRCoT [Trivedi et al., 2022]

Adaptive: the timing and scope of retrieval are dynamically determined by the LLM.
• FLARE [Jiang et al., 2023]
• Self-RAG [Asai et al., 2023]

(An adaptive-retrieval sketch follows.)
Techniques for Better RAG — Hybrid (RAG + Fine-tuning)

Retriever Fine-Tuning
• A highly adaptive, general-purpose retrieval plugin: AAR [Yu et al., 2023]

Generator Fine-Tuning
• Augmenting with structural information integration: SANTA [Li et al., 2023]

Collaborative Fine-Tuning: RA-DIT [Lin et al., 2023]
• R-FT: minimize the KL divergence between the retriever distribution and the LLM's preferences.
• LM-FT: maximize the likelihood of the correct answer given retrieval-augmented instructions.

(A hedged formalization of the two objectives follows.)
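Read formally (our notation, not the slide's): let $p_R(c \mid x)$ be the retriever's distribution over retrieved chunks $c$ for input $x$, and $p_{LM}(y \mid c \circ x)$ the LM's likelihood of answer $y$ with chunk $c$ prepended. A hedged sketch of the two objectives:

```latex
% LM-FT: maximize the likelihood of the correct answer given
% retrieval-augmented instructions (summed over the retrieved chunks C):
\mathcal{L}_{\mathrm{LM}} = -\sum_{c \in \mathcal{C}} \log p_{LM}(y \mid c \circ x)

% R-FT: align the retriever with the LM's preference over chunks, i.e. the
% softmax (temperature \tau) of the LM likelihoods, by minimizing KL:
p_{\mathrm{LSR}}(c \mid x, y) \propto \exp\!\big(\log p_{LM}(y \mid c \circ x) / \tau\big),
\qquad
\mathcal{L}_{R} = \mathrm{KL}\big(p_R(\cdot \mid x) \,\|\, p_{\mathrm{LSR}}(\cdot \mid x, y)\big)
```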
Summary of Related Research

"Retrieval-Augmented Generation for Large Language Models: A Survey"
How to Evaluate the Effectiveness of RAG

Evaluation Methods

Independent Evaluation
• Retriever: evaluate the quality of the text blocks retrieved for the query. Metrics: MRR, Hit Rate, NDCG. (Hit Rate and MRR are sketched in code below.)
• Generation/Synthesis: evaluate the quality of the context assembled from the retrieved documents. Metric: context relevance.

End-to-End Evaluation: evaluate the content ultimately generated by the model.
• By generated content: with labels, EM and Accuracy; without labels, fidelity, relevance, and harmlessness.
• By evaluation method: human evaluation, or automatic evaluation (LLM as judge).

Key Metrics
• Answer relevance: is the answer relevant to the query?
• Context relevance: is the context assembled from the retrieved documents relevant to the query?
• Answer fidelity: is the answer grounded in the given context?

Key Capabilities
• Noise robustness: can the model extract useful information from noisy documents?
• Negative rejection: when the required knowledge is not present in the retrieved documents, the model should decline to answer.
• Information integration: can the model answer complex questions that require integrating information from multiple documents?
• Counterfactual robustness: can the model recognize the risk of known factual errors in the retrieved documents?

Assessment Frameworks
All use an LLM as the adjudicating judge, scoring answer fidelity, answer relevance, and contextual relevance.
• TruLens, RAGAS: based on handwritten prompts.
• ARES: synthetic dataset + fine-tuning + ranking using confidence intervals.
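A hedged sketch of the two retriever metrics named above, assuming `results` holds one ranked list of document IDs per query and `gold` holds the single relevant ID per query:

```python
# Hit Rate and MRR (Mean Reciprocal Rank) over ranked retrieval results.
def hit_rate(results: list[list[str]], gold: list[str], k: int = 5) -> float:
    """Fraction of queries whose gold document appears in the top-k results."""
    return sum(g in r[:k] for r, g in zip(results, gold)) / len(gold)

def mrr(results: list[list[str]], gold: list[str]) -> float:
    """Average of 1/rank of the gold document (0 when it is not retrieved)."""
    total = 0.0
    for r, g in zip(results, gold):
        if g in r:
            total += 1.0 / (r.index(g) + 1)
    return total / len(gold)
```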
PART 04
RAG Stack and Industry Practices
Existing Tech Stack for RAG

• LangChain. Pros: modular, full-featured. Cons: inconsistent behavior, the API conceals details, complexity and low flexibility.
• LlamaIndex. Pros: focused on RAG. Cons: requires combined use with other tools, low customization.
• FlowiseAI. Pros: easy to get started, visual workflows. Cons: does not support complex scenarios.
• AutoGen. Pros: adapts to multi-agent scenarios. Cons: low efficiency, requires multiple rounds of dialogue.
RAG Industry Application Practices

Intelligent upgrades of traditional industries:
• NetEase: ChatBI
• BMW: CarExpert

RAG enhancement of the AI toolchain:
• Cohere: Coral
• Amazon: Kendra
PART 05
Summary and Prospects
Summary — The Framework of RAG

Summary — Three Trends of RAG

Technology
• The scaling law of RAG models
• How to improve the efficiency of retrieving large-scale data
• Mitigation of forgetting in long-context scenarios
• Enhancement of multimodal retrieval

RAG Paradigm
• Modularity will become mainstream
• Patterns for module organization await refinement
• Evaluation systems need to evolve and improve over time

Ecosystem
• Preliminary formation of the toolchain technology stack
• One-stop platforms still require polishing
• Explosion of enterprise-level applications
Prospects — Existing Challenges of RAG
Further address the challenges faced by RAG itself.

Long context
• Retrieved content can be excessive, exceeding the window limit.
• When the context is too long, results suffer from "lost in the middle".
• If the context window were unlimited, would RAG still be needed?

Coordination with FT
• How to simultaneously leverage the effects of RAG and fine-tuning.
• How do the two coordinate, and how are they organized: pipeline, alternating, or end-to-end?

The role of LLMs
• LLMs can be used for retrieval (LLM generation replacing retrieval, retrieving from LLM memory), for generation, and for evaluation. How to further explore the potential of LLMs in RAG?

Robustness
• How to handle incorrectly retrieved content.
• How to filter and verify retrieved content.
• How to improve the model's resistance to toxicity and noise.

Scaling law
• Does the RAG model satisfy the scaling law?
• Does RAG exhibit, or under what scenarios does it exhibit, an inverse scaling law?

Engineering practice
• How to reduce the latency of retrieval over ultra-large-scale corpora.
• How to ensure that retrieved content is not leaked by large models.
Prospects — Multi-Modality Extension
Transferring the concept of RAG from text to other modalities of data:
• Image: RA-CM3 [Yasunaga et al., 2023], RA-CLIP [Xie et al., 2023]
• Audio: Re-AudioLDM [Yuan et al., 2023]
• Code: DocPrompting [Zhou et al., 2023]
Prospects — Development of the RAG Ecosystem
Further expand the downstream tasks of RAG and improve the ecosystem.

Downstream Task Development and Evaluation
• Recommendation systems: TIGER [Rajput et al., 2023]
• Information extraction: Filter-Rerank [Ma et al., 2023]
• Report generation: FABULA [Ranade et al., 2023]

Technology Stack Construction
• Customized functions, meeting a variety of needs (e.g., a personal knowledge assistant based on RAG).
• Simplified use, further reducing the barrier to entry.
• Specialized functions, gradually moving toward open-source frameworks ready for production environments.
References
1. Alon, U. et al. Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval.
2. Lewis, P. et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
3. Guu, K., Lee, K., Tung, Z., Pasupat, P. & Chang, M.-W. REALM: Retrieval-Augmented Language Model Pre-Training. Preprint at http://arxiv.org/abs/2002.08909 (2020).
4. Dai, Z. et al. Promptagator: Few-shot Dense Retrieval From 8 Examples. Preprint at http://arxiv.org/abs/2209.11755 (2022).
5. Izacard, G. et al. Atlas: Few-shot Learning with Retrieval Augmented Language Models. Preprint at http://arxiv.org/abs/2208.03299 (2022).
6. Gao, L., Ma, X., Lin, J. & Callan, J. Precise Zero-Shot Dense Retrieval without Relevance Labels. Preprint at http://arxiv.org/abs/2212.10496 (2022).
7. Muennighoff, N., Tazi, N., Magne, L. & Reimers, N. MTEB: Massive Text Embedding Benchmark. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2014–2037 (Association for Computational Linguistics, 2023).
8. Ren, Y. et al. Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 293–306 (Association for Computational Linguistics, 2023).
9. Zhang, J. et al. ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 1128–1136 (Association for Computational Linguistics, 2023).
10. Khattab, O. et al. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP. Preprint at http://arxiv.org/abs/2212.14024 (2023).
11. Cheng, X. et al. Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory. Preprint at http://arxiv.org/abs/2305.02437 (2023).
12. Luo, Z. et al. Augmented Large Language Models with Parametric Knowledge Guiding. Preprint at http://arxiv.org/abs/2305.04757 (2023).
13. Shi, W. et al. REPLUG: Retrieval-Augmented Black-Box Language Models. Preprint at http://arxiv.org/abs/2301.12652 (2023).
14. Yu, Z., Xiong, C., Yu, S. & Liu, Z. Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In. Preprint at http://arxiv.org/abs/2305.17331 (2023).
15. Kang, M., Kwak, J. M., Baek, J. & Hwang, S. J. Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation. Preprint at http://arxiv.org/abs/2305.18846 (2023).
16. Trivedi, H., Balasubramanian, N., Khot, T. & Sabharwal, A. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. Preprint at http://arxiv.org/abs/2212.10509 (2023).
17. Wang, L., Yang, N. & Wei, F. Learning to Retrieve In-Context Examples for Large Language Models. Preprint at http://arxiv.org/abs/2307.07164 (2023).
18. Li, Z. et al. Towards General Text Embeddings with Multi-stage Contrastive Learning. Preprint at http://arxiv.org/abs/2308.03281 (2023).
19. Ng, Y. et al. SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool. Preprint at http://arxiv.org/abs/2308.03983 (2023).
20. Huang, J. et al. RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models. Preprint at http://arxiv.org/abs/2308.07922 (2023).
References (continued)
21. Zhu, Y. et al. Large Language Models for Information Retrieval: A Survey. Preprint at http://arxiv.org/abs/2308.07107 (2023).
22. Wang, X. et al. KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases. Preprint at http://arxiv.org/abs/2308.11761 (2023).
23. Chen, J., Lin, H., Han, X. & Sun, L. Benchmarking Large Language Models in Retrieval-Augmented Generation. Preprint at http://arxiv.org/abs/2309.01431 (2023).
24. Es, S., James, J., Espinosa-Anke, L. & Schockaert, S. RAGAS: Automated Evaluation of Retrieval Augmented Generation. Preprint at http://arxiv.org/abs/2309.15217 (2023).
25. Yoran, O., Wolfson, T., Ram, O. & Berant, J. Making Retrieval-Augmented Language Models Robust to Irrelevant Context. Preprint at http://arxiv.org/abs/2310.01558 (2023).
26. Feng, Z., Feng, X., Zhao, D., Yang, M. & Qin, B. Retrieval-Generation Synergy Augmented Large Language Models. Preprint at http://arxiv.org/abs/2310.05149 (2023).
27. Zheng, H. S. et al. Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models. Preprint at http://arxiv.org/abs/2310.06117 (2023).
28. Cheng, D. et al. UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation. Preprint at http://arxiv.org/abs/2303.08518 (2023).
29. Wang, B. et al. InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining. Preprint at http://arxiv.org/abs/2310.07713 (2023).
30. Jiang, Z. et al. Active Retrieval Augmented Generation. Preprint at http://arxiv.org/abs/2305.06983 (2023).
31. Gou, Q. et al. Diversify Question Generation with Retrieval-Augmented Style Transfer. Preprint at http://arxiv.org/abs/2310.14503 (2023).
32. Ma, X., Gong, Y., He, P., Zhao, H. & Duan, N. Query Rewriting for Retrieval-Augmented Large Language Models. Preprint at http://arxiv.org/abs/2305.14283 (2023).
33. Yang, H. et al. PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter. Preprint at http://arxiv.org/abs/2310.18347 (2023).
34. Kim, G., Kim, S., Jeon, B., Park, J. & Kang, J. Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models. Preprint at http://arxiv.org/abs/2310.14696 (2023).
35. Shao, Z. et al. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy. Preprint at http://arxiv.org/abs/2305.15294 (2023).
36. Zhang, P., Xiao, S., Liu, Z., Dou, Z. & Nie, J.-Y. Retrieve Anything To Augment Large Language Models. Preprint at http://arxiv.org/abs/2310.07554 (2023).
37. Purwar, A. & Sundar, R. Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface. Preprint at http://arxiv.org/abs/2310.04205 (2023).
38. Lin, X. V. et al. RA-DIT: Retrieval-Augmented Dual Instruction Tuning. Preprint at http://arxiv.org/abs/2310.01352 (2023).
39. Yu, W. et al. Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models. Preprint at http://arxiv.org/abs/2311.09210 (2023).
Thank you!

For more information, please see:

Our paper: https://arxiv.org/abs/2312.10997
Our GitHub: https://github.com/Tongji-KGLLM/RAG-Survey
