
Guide to End-to-End RAG Systems Evaluation

Dipanjan (DJ)

Standard RAG System Evaluation Metrics

Two major components of a RAG system need evaluation (a DeepEval-based sketch of these metrics follows this slide):

Retriever: This is where we measure retrieval performance from the Vector DB for input queries.
Contextual Precision: chunks in the retrieved context that are relevant to the input query should rank higher than irrelevant ones.
Contextual Recall: the retrieved context should cover the information needed to produce the expected ground truth response.
Contextual Relevancy: a high proportion of the statements in the retrieved context should be relevant to the input query.

Generator: This is where we measure the quality of responses generated by the LLM for input queries and retrieved context.
Answer Relevancy: a high proportion of the statements in the generated response should be relevant to the input query (judged by an LLM or via semantic similarity).
Faithfulness: claims made in the generated response should be supported by the retrieved context.
Hallucination Check: the number of statements in the generated response that contradict the ground truth context should be minimal.
Custom LLM as a Judge: you can create your own judging metrics based on custom evaluation criteria as needed.
Source: Dipanjan (DJ)
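
The metrics above map directly onto ready-made metric classes in libraries such as DeepEval. A minimal sketch, assuming DeepEval's metric names and its threshold parameter (exact signatures may differ between versions):

    # Sketch: declaring standard retriever and generator metrics with DeepEval
    from deepeval.metrics import (
        ContextualPrecisionMetric,
        ContextualRecallMetric,
        ContextualRelevancyMetric,
        AnswerRelevancyMetric,
        FaithfulnessMetric,
        HallucinationMetric,
    )

    # Retriever-side metrics: judge the retrieved context against the query and ground truth
    retriever_metrics = [
        ContextualPrecisionMetric(threshold=0.7),   # relevant chunks should rank higher
        ContextualRecallMetric(threshold=0.7),      # context should cover the expected answer
        ContextualRelevancyMetric(threshold=0.7),   # most context statements should be relevant
    ]

    # Generator-side metrics: judge the LLM response against query, context and ground truth
    generator_metrics = [
        AnswerRelevancyMetric(threshold=0.7),
        FaithfulnessMetric(threshold=0.7),
        HallucinationMetric(threshold=0.3),         # lower hallucination score is better
    ]
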
End-to-End RAG Evaluation Workflow

The following key steps are necessary to enable end-to-end evaluation of a RAG system (a minimal code skeleton of this loop follows this slide):
Build a RAG system which can return the generated response and the retrieved context sources in one go
Using your context documents, generate golden reference data samples with LLMs or manually
Run the input query from each reference sample through your RAG system and collect the generated responses
Create test cases from your golden reference data plus the actual generated responses and retrieved contexts
Use any standard RAG evaluation framework to evaluate the test cases with the metrics and settings of your choice
Review the performance of your system based on the results and iterate
Source: Dipanjan (DJ)
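
Purely as an illustration, the skeleton below strings these steps together in Python. The callables it takes (golden_samples, rag_query, evaluate_test_cases) are hypothetical placeholders for your own data generator, RAG system, and evaluation framework, not functions from any particular library:

    # Hypothetical end-to-end evaluation driver; all injected callables are placeholders
    def run_end_to_end_evaluation(golden_samples, rag_query, evaluate_test_cases):
        test_cases = []
        for sample in golden_samples:                                # step 2: golden reference data
            answer, retrieved_context = rag_query(sample["input"])   # step 3: run the RAG system
            test_cases.append({                                      # step 4: assemble a test case
                "input": sample["input"],
                "expected_output": sample["expected_output"],
                "context": sample["context"],
                "actual_output": answer,
                "retrieval_context": retrieved_context,
            })
        results = evaluate_test_cases(test_cases)                    # step 5: run your eval framework
        return results                                               # step 6: review the results and iterate
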
LLM-based Synthetic Golden Reference Data Generator

Create golden reference data samples manually or synthetically using an LLM (a sketch of the synthetic route follows this slide).
Golden reference data samples consist of the following:
Input Query: the input question to the RAG system
Expected Output: the ground truth answer expected from the LLM generator
Context: the expected ground truth context which should be retrieved

Source: https://www.anthropic.com/news/contextual-retrieval
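
A minimal sketch of the synthetic route, assuming the OpenAI Python SDK as the LLM client; the prompt wording, the gpt-4o-mini model choice, and the naive Q:/A: parsing are illustrative assumptions, not part of any specific framework:

    # Sketch: synthetically generating one golden reference sample from a context chunk
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def generate_golden_sample(context_chunk: str) -> dict:
        prompt = (
            "Given the following context, write one question a user might ask "
            "and the correct answer to it, as two lines prefixed with 'Q:' and 'A:'.\n\n"
            f"Context:\n{context_chunk}"
        )
        completion = client.chat.completions.create(
            model="gpt-4o-mini",  # any capable chat model will do
            messages=[{"role": "user", "content": prompt}],
        )
        text = completion.choices[0].message.content
        question = text.split("Q:", 1)[1].split("A:", 1)[0].strip()  # naive parsing of the expected format
        answer = text.split("A:", 1)[1].strip()
        return {
            "input": question,           # Input Query
            "expected_output": answer,   # Expected Output (ground truth answer)
            "context": [context_chunk],  # expected ground truth context
        }
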
RAG System with Sources

Build a RAG system as usual which can return the generated response to any input query
Besides the response, also return the retrieved source context (see the sketch after this slide)
This helps in evaluating retriever and generator metrics in one go
It avoids having to run separate queries against the Vector DB (for retriever metrics) and the RAG system (for generator metrics) for each reference data sample

Source: Dipanjan (DJ)
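
For illustration only, a query function that returns the answer and the retrieved chunks in one call. The vector_store.similarity_search and llm.invoke calls are placeholders for whatever Vector DB and LLM interfaces your stack provides (the names loosely mirror LangChain-style objects but are not tied to any library):

    # Sketch: a RAG query that returns the generated answer together with its sources
    def rag_query_with_sources(query: str, vector_store, llm, k: int = 4):
        # Retrieve the top-k context chunks for the query from the Vector DB
        docs = vector_store.similarity_search(query, k=k)
        retrieved_context = [doc.page_content for doc in docs]

        # Ground the LLM on the retrieved context and generate the answer
        prompt = (
            "Answer the question using only the context below.\n\n"
            "Context:\n" + "\n\n".join(retrieved_context) + f"\n\nQuestion: {query}"
        )
        answer = llm.invoke(prompt)

        # Returning both lets a single call feed retriever and generator metrics
        return answer, retrieved_context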


Create Evaluation Test Cases

Here we take the input query of each golden reference data sample
Pass the query to our RAG system and capture the retrieved context and LLM response as output
Append them to the golden reference data sample to create a test case (a DeepEval sketch follows this slide)
Each test case will consist of the following:
Input Query: the input question to the RAG system
Expected Output: the ground truth answer expected from the LLM generator
Context: the expected ground truth context which should be retrieved
Actual Output: the actual response from the RAG system's LLM generator
Retrieved Context: the actual context retrieved from the RAG system's Vector DB retriever

Source: Dipanjan (DJ)
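
With DeepEval, each such test case maps onto an LLMTestCase. A minimal sketch, reusing the hypothetical golden_samples and rag_query_with_sources helpers from the earlier slides (field names follow DeepEval's API):

    # Sketch: building DeepEval test cases from golden samples plus live RAG outputs
    from deepeval.test_case import LLMTestCase

    test_cases = []
    for sample in golden_samples:
        answer, retrieved_context = rag_query_with_sources(sample["input"], vector_store, llm)
        test_cases.append(
            LLMTestCase(
                input=sample["input"],                      # Input Query
                expected_output=sample["expected_output"],  # ground truth answer
                context=sample["context"],                  # expected ground truth context
                actual_output=answer,                       # response from the LLM generator
                retrieval_context=retrieved_context,        # chunks actually retrieved
            )
        )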


Run RAG Evaluation on Test Cases

Define the RAG metrics you want to evaluate each test case on in terms of:
Metric definition
Pass or fail threshold
Specific evaluation instructions in the case of custom metrics
Evaluate each test case and store the metric results
Visualize them on your dashboard as needed and improve the system over time (see the sketch after this slide)
Source: Dipanjan (DJ)
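
A rough sketch with DeepEval: each metric carries a pass/fail threshold, custom judging criteria can be expressed with GEval, and evaluate() scores every test case (exact signatures may differ between DeepEval versions):

    # Sketch: scoring every test case against the chosen metrics with DeepEval
    from deepeval import evaluate
    from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric, GEval
    from deepeval.test_case import LLMTestCaseParams

    metrics = [
        AnswerRelevancyMetric(threshold=0.7),  # pass/fail threshold per metric
        FaithfulnessMetric(threshold=0.7),
        # Custom LLM-as-a-judge metric with its own evaluation instructions
        GEval(
            name="Conciseness",
            criteria="The answer should be concise and avoid repeating the context verbatim.",
            evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
            threshold=0.5,
        ),
    ]

    results = evaluate(test_cases=test_cases, metrics=metrics)  # store and visualize over time
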
RAG Evaluation Example with DeepEval

You can leverage libraries like DeepEval and Ragas to make things easier, or even create your own custom evaluation metrics; a small DeepEval example follows below.

Source: Dipanjan (DJ)
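
As a self-contained illustration of the DeepEval route, a single test case can also be scored metric by metric to inspect the score and the judge's reasoning. The metric and test-case classes follow DeepEval's API; the sample strings are invented:

    # Sketch: scoring one hand-written test case and inspecting the judge's reason
    from deepeval.metrics import FaithfulnessMetric
    from deepeval.test_case import LLMTestCase

    test_case = LLMTestCase(
        input="What does contextual retrieval add to standard RAG?",
        actual_output="It prepends chunk-specific context before embedding each chunk.",
        expected_output="It adds explanatory context to each chunk before embedding it.",
        retrieval_context=["Contextual retrieval prepends context to chunks before embedding."],
    )

    metric = FaithfulnessMetric(threshold=0.7)
    metric.measure(test_case)                    # runs the LLM judge on this single case
    print(metric.score, metric.is_successful())  # numeric score and pass/fail vs threshold
    print(metric.reason)                         # judge's explanation, useful for debugging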
