RAG Slide ENG

The document discusses Retrieval-Augmented Generation (RAG), which uses large language models to generate answers or text but first retrieves relevant information from external sources to incorporate into the generation. It describes the shifting paradigms in RAG from naive approaches to more advanced techniques, including optimizations to data indexing, pre-retrieval and post-retrieval processing, and modular architectures. The key issues addressed in RAG include what information to retrieve, when to perform retrieval during generation, and how to integrate the retrieved information into the language model.


Retrieval-Augmented Generation (RAG):

Paradigms, Technologies, and Trends

Haofen Wang
Tongji University
CONTENTS

1. RAG Overview
2. RAG Paradigm Shifts
3. Key Technologies and Evaluation
4. RAG Stack and Industry Practices
5. Summary and Prospects


PART 01
Overview of RAG
Background
Drawbacks of LLMs

• Hallucination
• Outdated information
• Low efficiency in parameterizing knowledge
• Lack of in-depth knowledge in specialized domains
• Weak inferential capabilities

Practical Requirements of Applications

• Domain-specific, accurate answering
• Frequent updates of data
• Traceability and explainability of generated content
• Controllable cost
• Privacy protection of data

(Illustration drawn by DALL·E 3)
Retrieval-Augmented Generation (RAG)

When answering questions or generating text, the system first retrieves relevant information from a large number of documents, and the LLM then generates the answer based on this information.

By attaching an external knowledge base, there is no need to retrain the entire large model for each specific task.

The RAG model is especially suitable for knowledge-intensive tasks.

A typical case of RAG

Symbolic Knowledge vs. Parametric Knowledge

Ways to optimize LLMs:

• Prompt Engineering
• Retrieval-Augmented Generation
• Instruction Tuning / Fine-tuning

RAG vs. Fine-tuning
RAG Applications

Scenarios where RAG is applicable:
• Long-tail distribution of data
• Frequent knowledge updates
• Answers requiring verification and traceability
• Specialized domain knowledge
• Data privacy preservation

Representative retrieval-augmented systems by task:
• Q&A: RETRO (Borgeaud et al., 2021), REALM (Guu et al., 2020), ATLAS (Izacard et al., 2022)
• Fact checking: RAG (Lewis et al., 2020), ATLAS (Izacard et al., 2022), Evi. Generator (Asai et al., 2022)
• Dialog: BlenderBot3 (Shuster et al., 2022), Internet-augmented generation (Komeili et al., 2022)
• Summarization: FLARE (Jiang et al., 2023)
• Machine translation: kNN-MT (Khandelwal et al., 2020), TRIME-MT (Zhong et al., 2022)
• Code generation: DocPrompting (Zhou et al., 2023), NaturalProver (Welleck et al., 2022)
• Natural language inference: kNN-Prompt (Shi et al., 2022), NPM (Min et al., 2023)
• Sentiment analysis: kNN-Prompt (Shi et al., 2022), NPM (Min et al., 2023)
• Commonsense reasoning: RACo (Yu et al., 2022)
PART 02
RAG Paradigm Shifts
Naive RAG

Step 1: Indexing
1. Divide the document into even chunks, each chunk being a piece of the original text.
2. Use an encoding model to generate an embedding for each chunk.
3. Store the embedding of each chunk in a vector database.

Step 2: Retrieval
Retrieve the k most relevant documents using vector similarity search.

Step 3: Generation
Combine the original query with the retrieved text and feed both into an LLM to get the final answer. (A minimal sketch of the whole pipeline follows.)
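Below is a minimal, hedged sketch of these three steps. The helpers `embed_texts` (any embedding model) and `call_llm` (any LLM API) are illustrative assumptions, not part of the original slides, and the "vector database" is reduced to an in-memory matrix.

```python
# A minimal sketch of the Naive RAG pipeline (index -> retrieve -> generate).
# `embed_texts` and `call_llm` are hypothetical stand-ins for any embedding
# model and any LLM API; they are assumptions, not prescribed components.
import numpy as np

def chunk(document: str, size: int = 200) -> list[str]:
    """Step 1a: split the document into even, fixed-size chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def build_index(chunks: list[str], embed_texts) -> np.ndarray:
    """Steps 1b/1c: embed each chunk and store the vectors (here: a matrix)."""
    vectors = np.array(embed_texts(chunks), dtype=float)
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def retrieve(query: str, chunks, index, embed_texts, k: int = 3) -> list[str]:
    """Step 2: cosine-similarity search for the k most relevant chunks."""
    q = np.asarray(embed_texts([query])[0], dtype=float)
    q = q / np.linalg.norm(q)
    scores = index @ q
    return [chunks[i] for i in np.argsort(-scores)[:k]]

def generate(query: str, retrieved: list[str], call_llm) -> str:
    """Step 3: combine the query and the retrieved text into one LLM prompt."""
    context = "\n\n".join(retrieved)
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```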
Advanced RAG

Pipeline: Index Optimization → Pre-Retrieval Process → Retrieval → Post-Retrieval Process → Generation

• Optimizing data indexing: sliding window, fine-grained segmentation, adding metadata.
• Pre-retrieval process: retrieval routing, summaries, query rewriting, and confidence judgment.
• Post-retrieval process: reranking and content filtering (see the sketch below).
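As one concrete post-retrieval step, a rerank-then-filter pass might look like the following sketch. `score_pair` stands in for any relevance scorer (for example, a cross-encoder) and is an assumption, as are the cutoff settings.

```python
# Hedged sketch of a post-retrieval rerank + filter step. `score_pair` is an
# assumed relevance scorer; `keep` and `min_score` are illustrative settings.
def rerank_and_filter(query: str, chunks: list[str], score_pair,
                      keep: int = 3, min_score: float = 0.2) -> list[str]:
    scored = [(score_pair(query, c), c) for c in chunks]    # score every chunk
    scored.sort(key=lambda sc: sc[0], reverse=True)         # rerank by score
    return [c for s, c in scored[:keep] if s >= min_score]  # drop weak hits
```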


Modular RAG

Modules: Search, Retrieve, Rewrite, Rerank, Read, Filter, Generate, Predict, Demonstrate, Reflect, Aggregate.

Patterns (modules composed into pipelines):
• Naive RAG: Retrieve → Read
• DSP (2022): Demonstrate → Search → Predict
• Rewrite-Retrieve-Read (2023): Rewrite → Retrieve → Read
• Retrieve-then-read (2023): Retrieve → Read

A small sketch of this "pattern = composition of modules" idea follows.
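The sketch below expresses the modular view in code: each module is a function over a shared state, and a pattern is just a composition. The module names mirror the slide; their internals are assumptions left to the reader.

```python
# Hedged sketch: modular RAG as function composition over a shared state dict.
from typing import Callable

Module = Callable[[dict], dict]  # each module reads and updates the state

def pipeline(*modules: Module) -> Module:
    """Compose modules left-to-right into a single runnable pattern."""
    def run(state: dict) -> dict:
        for module in modules:
            state = module(state)
        return state
    return run

# Patterns from the slide, assuming rewrite/retrieve/read etc. are Modules:
# naive_rag             = pipeline(retrieve, read)
# dsp                   = pipeline(demonstrate, search, predict)
# rewrite_retrieve_read = pipeline(rewrite, retrieve, read)
```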
Comparison of RAG Paradigms
The Three Key Questions of RAG

What to retrieve?
• Token
• Phrase
• Chunk
• Paragraph
• Entity
• Knowledge graph

When to retrieve?
• Single search
• Every token
• Every N tokens
• Adaptive search

How to use the retrieved information?
• Input/Data layer
• Model/Intermediate layer
• Output/Prediction layer

Other issues

Augmentation stage:
• Pre-training
• Fine-tuning
• Inference

Retrieval model choice: BERT, RoBERTa, BGE, ...
Generation model choice: GPT, Llama, T5, ...
Plus model collaboration and scale selection.
Key issue of RAG — What to retrieve

Retrieval granularity ranges from fine-grained to coarse, and the level of structuration from low to high:

• Token | kNN-LM, 2019: excels at handling long-tail and cross-domain issues with high computational efficiency, but requires significant storage.
• Phrase | NPM, 2023
• Chunk | In-Context RAG, 2023: the search is broad, recalling a large amount of information; coverage is high but accuracy is low, and much redundant information is included.
• Entity | EaSe, 2022
• Knowledge graph | 2023: richer semantic and structured information, but retrieval efficiency is lower and results are limited by the quality of the KG.
Key issue of RAG — How to use the retrieved content

The retrieved information can be integrated at different layers of the generation model during the inference process:

• Input / Data layer: simple to use, but it cannot support many retrieved knowledge blocks, and the room for optimization is limited.
• Model / Intermediate layer: supports more retrieved knowledge blocks, but introduces additional complexity and requires training.
• Output / Prediction layer: keeps the output highly grounded in the retrieved content, but efficiency is low. (A small output-layer sketch follows.)
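For the output/prediction layer, one classic instance is kNN-LM-style interpolation of the LM's next-token distribution with a nearest-neighbour distribution. A hedged sketch, assuming `p_lm` and `p_knn` are aligned probability vectors over the same vocabulary:

```python
# Output-layer integration in the spirit of kNN-LM: mix the LM's next-token
# distribution with one induced by retrieved nearest neighbours.
import numpy as np

def interpolate(p_lm: np.ndarray, p_knn: np.ndarray, lam: float = 0.25) -> np.ndarray:
    """p(y|x) = lam * p_kNN(y|x) + (1 - lam) * p_LM(y|x); lam is a tunable weight."""
    return lam * p_knn + (1.0 - lam) * p_lm
```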
Key issue of RAG — When to retrieve

Retrieval frequency ranges from low to high:

• Once | REPLUG, 2023: conduct a single search during the reasoning process. High efficiency, but the retrieved documents may have low relevance.
• Adaptive | FLARE, 2023: adaptively decide when to search. Balances efficiency and relevance, though the retrieved information might not be the optimal solution.
• Every N tokens | Atlas, 2023: retrieve once for every N tokens generated. A large amount of information, but with low efficiency and much redundancy.
Overview of RAG Development
PART 03
Key Technologies and Evaluation
Techniques for Better RAG — Data Indexing Optimization

Chunk Optimization (Small-to-Big and sliding window are sketched in code below)
• Small-to-Big: embed at the sentence level, then expand to the surrounding window during the generation process.
• Sliding window: overlapping chunks cover the entire text, avoiding semantic ambiguity at chunk boundaries.
• Summary: retrieve documents through their summaries first, then retrieve text blocks from within those documents.

Adding Metadata
• Example metadata: page, time, type, document title.
• Pseudo metadata generation: enhance retrieval by generating a hypothetical document for the incoming query and creating questions that the text block can answer.
• Metadata filtering/enrichment: dissect and annotate the document; during the query, infer metadata filters in addition to semantic queries.
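A hedged sketch of the two chunking ideas above. The chunk size, overlap, and character-level splitting are illustrative assumptions; real implementations usually split on tokens or sentences.

```python
# Sliding-window chunking: overlapping chunks so boundary sentences keep context.
def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Small-to-Big: the retriever matches a single sentence; generation receives
# that sentence together with its neighbours (the expanded window).
def small_to_big(sentences: list[str], hit_index: int, window: int = 2) -> str:
    lo = max(0, hit_index - window)
    hi = min(len(sentences), hit_index + window + 1)
    return " ".join(sentences[lo:hi])
```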
Techniques for Better RAG — Structured Corpus

Hierarchical Organization of Retrieval Corpora

• Summary → Document: replace document retrieval with summary retrieval, retrieving not only the most directly relevant nodes but also additional nodes associated with them. (A two-stage sketch follows this list.)
• Document → Embedded Objects: documents contain embedded objects (such as tables and charts); first retrieve the entity that references the object, then query the underlying objects, such as document blocks, databases, or sub-nodes.
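A minimal sketch of the summary-first idea, assuming a generic `retrieve_over(query, items, k)` helper that returns the top-k items; the data layout is an assumption for illustration.

```python
# Hedged sketch of hierarchical (summary -> document -> chunk) retrieval.
def hierarchical_retrieve(query: str, docs: list[tuple[str, list[str]]],
                          retrieve_over, k_docs: int = 2, k_chunks: int = 3) -> list[str]:
    """docs: one (summary, chunks) pair per document; retrieve_over is assumed."""
    top_summaries = retrieve_over(query, [s for s, _ in docs], k=k_docs)
    candidates = [c for s, chunks in docs if s in top_summaries for c in chunks]
    return retrieve_over(query, candidates, k=k_chunks)
```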
Techniques for Better RAG — Retrieval Source Optimization

Unstructured data (phrases, prompts, cross-lingual text):
• Prompt: UPRISE [Cheng et al., 2023]
• Cross-lingual: CREA-ICL [Li et al., 2023]

Structured data (triples, subgraphs):
• Subgraph: SURGE [Kang et al., 2023]

LLM-generated content (generated text, generated code):
• LLM memory: Selfmem [Cheng et al., 2023]
Techniques for Better RAG — KG as a Retrieval Data Source

GraphRAG
• Extract entities from the user's input query, then construct a subgraph to form the context, and finally feed it into the large model for generation.

Implementation
1. Use an LLM (or another model) to extract the key entities from the question.
2. Retrieve a subgraph based on those entities, expanding to a certain depth, such as 2 hops or more.
3. Use the obtained context to generate the answer with the LLM.

(A sketch of this flow follows.)
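A hedged sketch of the three implementation steps. `extract_entities`, the adjacency-list `graph` of (relation, object) edges, and `call_llm` are assumptions for illustration, not a prescribed API.

```python
# Hedged GraphRAG sketch: extract entities, expand an n-hop subgraph around
# them, and pass the collected triples to the LLM as context.
def subgraph(graph, seeds, hops: int = 2) -> set:
    """Collect triples reachable within `hops` of the seed entities."""
    triples, frontier = set(), set(seeds)
    for _ in range(hops):
        nxt = set()
        for s in frontier:
            for rel, obj in graph.get(s, []):
                triples.add((s, rel, obj))
                nxt.add(obj)
        frontier = nxt
    return triples

def graph_rag(question: str, graph, extract_entities, call_llm) -> str:
    seeds = extract_entities(question)                     # step 1: key entities
    triples = sorted(subgraph(graph, seeds))               # step 2: 2-hop subgraph
    context = "\n".join(f"{s} {r} {o}" for s, r, o in triples)
    prompt = f"Context triples:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)                                # step 3: generate
```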
Techniques for Better RAG — Query Optimization

Questions and answers do not always have high semantic similarity; adjusting the query can yield better retrieval results.

Query Rewriting: Rewrite-Retrieve-Read [Ma et al., 2023] (sketched below)
Query Clarification: Tree of Clarifications (ToC) [Kim et al., 2023]
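A minimal sketch of Rewrite-Retrieve-Read-style query rewriting: ask the LLM for a search-friendly query before retrieving. `call_llm` and `retrieve` are the same assumed helpers as in the earlier sketches, not the paper's exact interfaces.

```python
# Hedged sketch of the rewrite -> retrieve -> read loop.
def rewrite_retrieve_read(question: str, call_llm, retrieve) -> str:
    rewritten = call_llm(f"Rewrite this question as a search query: {question}")
    docs = retrieve(rewritten)                    # retrieve with the rewrite
    context = "\n\n".join(docs)
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")
```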


Techniques for Better RAG — Embedding Optimization

Selecting a More Suitable Embedding Provider
• BAAI General Embedding (BGE)
• LLM-Embedder (BGE2) [Aksitov et al., 2023]

Fine-tuning the Embedding Model
• Fine-tuning according to domain-specific repositories and downstream tasks.
• Fine-tuning an adapter module to align the embedding model with the retrieval repository.
Techniques for Better RAG — Retrieval Process Optimization

Iterative: iteratively retrieve from the corpus to acquire more detailed and in-depth knowledge.
• ITER [Feng et al., 2023]
• IRCoT [Trivedi et al., 2022]

Adaptive: the timing and scope of retrieval are dynamically determined by the LLM.
• FLARE [Jiang et al., 2023]
• Self-RAG [Asai et al., 2023]

(An adaptive-retrieval sketch follows.)
Techniques for Better RAG — Hybrid (RAG + Fine-tuning)

Retriever Fine-Tuning
• A highly adaptive, general-purpose retrieval plugin: AAR [Yu et al., 2023]

Generator Fine-Tuning
• Augmenting with structural information integration: SANTA [Li et al., 2023]

Collaborative Fine-Tuning: RA-DIT [Lin et al., 2023]
• R-FT: minimize the KL divergence between the retriever distribution and the LLM's preferences.
• LM-FT: maximize the likelihood of the correct answer given retrieval-augmented instructions.

(A hedged formalization of the two objectives follows.)
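Read formally (our notation, not the slide's): let $p_R(c \mid x)$ be the retriever's distribution over retrieved chunks $c$ for input $x$, and $p_{LM}(y \mid c \circ x)$ the LM's likelihood of answer $y$ with chunk $c$ prepended. A hedged sketch of the two objectives:

```latex
% LM-FT: maximize the likelihood of the correct answer given
% retrieval-augmented instructions (summed over the retrieved chunks C):
\mathcal{L}_{\mathrm{LM}} = -\sum_{c \in \mathcal{C}} \log p_{LM}(y \mid c \circ x)

% R-FT: align the retriever with the LM's preference over chunks, i.e. the
% softmax (temperature \tau) of the LM likelihoods, by minimizing KL:
p_{\mathrm{LSR}}(c \mid x, y) \propto \exp\!\big(\log p_{LM}(y \mid c \circ x) / \tau\big),
\qquad
\mathcal{L}_{R} = \mathrm{KL}\big(p_R(\cdot \mid x) \,\|\, p_{\mathrm{LSR}}(\cdot \mid x, y)\big)
```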
Summary of Related Research

"Retrieval-Augmented Generation for Large Language Models: A Survey"
How to Evaluate the Effectiveness of RAG

Evaluation Methods

Independent Evaluation
• Retriever: evaluate the quality of the text blocks retrieved for the query. Metrics: MRR, Hit Rate, NDCG. (Hit Rate and MRR are sketched in code below.)
• Generation/Synthesis: evaluate the quality of the context assembled from the retrieved documents. Metric: context relevance.

End-to-End Evaluation: evaluate the content ultimately generated by the model.
• By generated content: with labels, EM and Accuracy; without labels, fidelity, relevance, and harmlessness.
• By evaluation method: human evaluation, or automatic evaluation (LLM as judge).

Key Metrics
• Answer relevance: is the answer relevant to the query?
• Context relevance: is the context assembled from the retrieved documents relevant to the query?
• Answer fidelity: is the answer grounded in the given context?

Key Capabilities
• Noise robustness: can the model extract useful information from noisy documents?
• Negative rejection: when the required knowledge is not present in the retrieved documents, the model should decline to answer.
• Information integration: can the model answer complex questions that require integrating information from multiple documents?
• Counterfactual robustness: can the model recognize the risk of known factual errors in the retrieved documents?

Assessment Frameworks
All use an LLM as the adjudicating judge, scoring answer fidelity, answer relevance, and contextual relevance.
• TruLens, RAGAS: based on handwritten prompts.
• ARES: synthetic dataset + fine-tuning + ranking using confidence intervals.
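A hedged sketch of the two retriever metrics named above, assuming `results` holds one ranked list of document IDs per query and `gold` holds the single relevant ID per query:

```python
# Hit Rate and MRR (Mean Reciprocal Rank) over ranked retrieval results.
def hit_rate(results: list[list[str]], gold: list[str], k: int = 5) -> float:
    """Fraction of queries whose gold document appears in the top-k results."""
    return sum(g in r[:k] for r, g in zip(results, gold)) / len(gold)

def mrr(results: list[list[str]], gold: list[str]) -> float:
    """Average of 1/rank of the gold document (0 when it is not retrieved)."""
    total = 0.0
    for r, g in zip(results, gold):
        if g in r:
            total += 1.0 / (r.index(g) + 1)
    return total / len(gold)
```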
PART 04
RAG Stack and Industry Practices
Existing Tech Stack for RAG

• LangChain. Pros: modular, full-featured. Cons: inconsistent behavior, the API conceals details, complexity and low flexibility.
• LlamaIndex. Pros: focused on RAG. Cons: requires combined use with other tools, low customization.
• FlowiseAI. Pros: easy to get started, visual workflows. Cons: does not support complex scenarios.
• AutoGen. Pros: adapts to multi-agent scenarios. Cons: low efficiency, requires multiple rounds of dialogue.
RAG Industry Application Practices

Intelligent upgrades of traditional industries:
• NetEase: ChatBI
• BMW: CarExpert

RAG enhancement of the AI toolchain:
• Cohere: Coral
• Amazon: Kendra
PART 05
Summary and Prospects
Summary — The Framework of RAG

Summary — Three Trends of RAG

Technology
• The scaling law of RAG models
• How to improve the efficiency of retrieving large-scale data
• Mitigation of forgetting in long-context scenarios
• Enhancement of multimodal retrieval

RAG Paradigm
• Modularity will become mainstream
• Patterns for module organization await refinement
• Evaluation systems need to evolve and improve over time

Ecosystem
• Preliminary formation of the toolchain technology stack
• One-stop platforms still require polishing
• Explosion of enterprise-level applications
Prospects — Existing Challenges of RAG
Further address the challenges faced by RAG itself.

Long context
• Retrieved content can be excessive, exceeding the window limit.
• When the context is too long, results suffer from "lost in the middle".
• If the context window were unlimited, would RAG still be needed?

Coordination with FT
• How to simultaneously leverage the effects of RAG and fine-tuning.
• How do the two coordinate, and how are they organized: pipeline, alternating, or end-to-end?

The role of LLMs
• LLMs can be used for retrieval (LLM generation replacing retrieval, retrieving from LLM memory), for generation, and for evaluation. How to further explore the potential of LLMs in RAG?

Robustness
• How to handle incorrectly retrieved content.
• How to filter and verify retrieved content.
• How to improve the model's resistance to toxicity and noise.

Scaling law
• Does the RAG model satisfy the scaling law?
• Does RAG exhibit, or under what scenarios does it exhibit, an inverse scaling law?

Engineering practice
• How to reduce the latency of retrieval over ultra-large-scale corpora.
• How to ensure that retrieved content is not leaked by large models.
Prospects — Multi-Modality Extension
Transferring the concept of RAG from text to other modalities of data:
• Image: RA-CM3 [Yasunaga et al., 2023], RA-CLIP [Xie et al., 2023]
• Audio: Re-AudioLDM [Yuan et al., 2023]
• Code: DocPrompting [Zhou et al., 2023]
Prospects — Development of the RAG Ecosystem
Further expand the downstream tasks of RAG and improve the ecosystem.

Downstream Task Development and Evaluation
• Recommendation systems: TIGER [Rajput et al., 2023]
• Information extraction: Filter-Rerank [Ma et al., 2023]
• Report generation: FABULA [Ranade et al., 2023]

Technology Stack Construction
• Customized functions, meeting a variety of needs (e.g., a personal knowledge assistant based on RAG).
• Simplified use, further reducing the barrier to entry.
• Specialized functions, gradually moving toward open-source frameworks ready for production environments.
References
1. Alon, U. et al. Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval.
2. Lewis, P. et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
3. Guu, K., Lee, K., Tung, Z., Pasupat, P. & Chang, M.-W. REALM: Retrieval-Augmented Language Model Pre-Training. Preprint at http://arxiv.org/abs/2002.08909 (2020).
4. Dai, Z. et al. Promptagator: Few-shot Dense Retrieval From 8 Examples. Preprint at http://arxiv.org/abs/2209.11755 (2022).
5. Izacard, G. et al. Atlas: Few-shot Learning with Retrieval Augmented Language Models. Preprint at http://arxiv.org/abs/2208.03299 (2022).
6. Gao, L., Ma, X., Lin, J. & Callan, J. Precise Zero-Shot Dense Retrieval without Relevance Labels. Preprint at http://arxiv.org/abs/2212.10496 (2022).
7. Muennighoff, N., Tazi, N., Magne, L. & Reimers, N. MTEB: Massive Text Embedding Benchmark. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2014–2037 (Association for Computational Linguistics, 2023).
8. Ren, Y. et al. Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 293–306 (Association for Computational Linguistics, 2023).
9. Zhang, J. et al. ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 1128–1136 (Association for Computational Linguistics, 2023).
10. Khattab, O. et al. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP. Preprint at http://arxiv.org/abs/2212.14024 (2023).
11. Cheng, X. et al. Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory. Preprint at http://arxiv.org/abs/2305.02437 (2023).
12. Luo, Z. et al. Augmented Large Language Models with Parametric Knowledge Guiding. Preprint at http://arxiv.org/abs/2305.04757 (2023).
13. Shi, W. et al. REPLUG: Retrieval-Augmented Black-Box Language Models. Preprint at http://arxiv.org/abs/2301.12652 (2023).
14. Yu, Z., Xiong, C., Yu, S. & Liu, Z. Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In. Preprint at http://arxiv.org/abs/2305.17331 (2023).
15. Kang, M., Kwak, J. M., Baek, J. & Hwang, S. J. Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation. Preprint at http://arxiv.org/abs/2305.18846 (2023).
16. Trivedi, H., Balasubramanian, N., Khot, T. & Sabharwal, A. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. Preprint at http://arxiv.org/abs/2212.10509 (2023).
17. Wang, L., Yang, N. & Wei, F. Learning to Retrieve In-Context Examples for Large Language Models. Preprint at http://arxiv.org/abs/2307.07164 (2023).
18. Li, Z. et al. Towards General Text Embeddings with Multi-stage Contrastive Learning. Preprint at http://arxiv.org/abs/2308.03281 (2023).
19. Ng, Y. et al. SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool. Preprint at http://arxiv.org/abs/2308.03983 (2023).
20. Huang, J. et al. RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models. Preprint at http://arxiv.org/abs/2308.07922 (2023).
References (continued)
21. Zhu, Y. et al. Large Language Models for Information Retrieval: A Survey. Preprint at http://arxiv.org/abs/2308.07107 (2023).
22. Wang, X. et al. KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases. Preprint at http://arxiv.org/abs/2308.11761 (2023).
23. Chen, J., Lin, H., Han, X. & Sun, L. Benchmarking Large Language Models in Retrieval-Augmented Generation. Preprint at http://arxiv.org/abs/2309.01431 (2023).
24. Es, S., James, J., Espinosa-Anke, L. & Schockaert, S. RAGAS: Automated Evaluation of Retrieval Augmented Generation. Preprint at http://arxiv.org/abs/2309.15217 (2023).
25. Yoran, O., Wolfson, T., Ram, O. & Berant, J. Making Retrieval-Augmented Language Models Robust to Irrelevant Context. Preprint at http://arxiv.org/abs/2310.01558 (2023).
26. Feng, Z., Feng, X., Zhao, D., Yang, M. & Qin, B. Retrieval-Generation Synergy Augmented Large Language Models. Preprint at http://arxiv.org/abs/2310.05149 (2023).
27. Zheng, H. S. et al. Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models. Preprint at http://arxiv.org/abs/2310.06117 (2023).
28. Cheng, D. et al. UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation. Preprint at http://arxiv.org/abs/2303.08518 (2023).
29. Wang, B. et al. InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining. Preprint at http://arxiv.org/abs/2310.07713 (2023).
30. Jiang, Z. et al. Active Retrieval Augmented Generation. Preprint at http://arxiv.org/abs/2305.06983 (2023).
31. Gou, Q. et al. Diversify Question Generation with Retrieval-Augmented Style Transfer. Preprint at http://arxiv.org/abs/2310.14503 (2023).
32. Ma, X., Gong, Y., He, P., Zhao, H. & Duan, N. Query Rewriting for Retrieval-Augmented Large Language Models. Preprint at http://arxiv.org/abs/2305.14283 (2023).
33. Yang, H. et al. PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter. Preprint at http://arxiv.org/abs/2310.18347 (2023).
34. Kim, G., Kim, S., Jeon, B., Park, J. & Kang, J. Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models. Preprint at http://arxiv.org/abs/2310.14696 (2023).
35. Shao, Z. et al. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy. Preprint at http://arxiv.org/abs/2305.15294 (2023).
36. Zhang, P., Xiao, S., Liu, Z., Dou, Z. & Nie, J.-Y. Retrieve Anything To Augment Large Language Models. Preprint at http://arxiv.org/abs/2310.07554 (2023).
37. Purwar, A. & Sundar, R. Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface. Preprint at http://arxiv.org/abs/2310.04205 (2023).
38. Lin, X. V. et al. RA-DIT: Retrieval-Augmented Dual Instruction Tuning. Preprint at http://arxiv.org/abs/2310.01352 (2023).
39. Yu, W. et al. Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models. Preprint at http://arxiv.org/abs/2311.09210 (2023).
Thank you!

For more information, please see:

Our paper: https://arxiv.org/abs/2312.10997
Our GitHub: https://github.com/Tongji-KGLLM/RAG-Survey
