Week 5 - LLM - RAG

RETRIEVAL AUGMENTED GENERATION

[Modern Information Retrieval]

Open problem with a billion dollar potential!


INFORMATION RETRIEVAL
SBERT Fine-Tuning
- The query is given a vector representation using embeddings

- Documents in the database are stored as embeddings

- Brute Force Approach:
  Take the dot product of the query vector with the embeddings of all
  the documents, and choose the document that gives the closest match

- Hierarchical Navigable Small World (HNSW):
  Create a layered graph structure over the document embedding vectors
  so that the search process becomes much faster

(A sketch of both approaches follows below.)
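As a rough sketch (the corpus, the embedding dimension, and the use of the
hnswlib package are assumptions for illustration, not part of the slides):

import numpy as np
import hnswlib  # approximate nearest-neighbour library implementing HNSW

dim = 768                                   # illustrative embedding dimension
doc_embeddings = np.random.rand(10_000, dim).astype("float32")
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query = np.random.rand(dim).astype("float32")
query /= np.linalg.norm(query)

# Brute force: dot product of the query with every document embedding.
scores = doc_embeddings @ query             # shape: (10_000,)
best_brute = int(np.argmax(scores))

# HNSW: build a layered graph index once, then query it in sub-linear time.
index = hnswlib.Index(space="ip", dim=dim)  # "ip" = inner product
index.init_index(max_elements=len(doc_embeddings), M=16, ef_construction=200)
index.add_items(doc_embeddings, np.arange(len(doc_embeddings)))
index.set_ef(50)                            # search-time speed/recall trade-off
labels, distances = index.knn_query(query, k=1)
best_hnsw = int(labels[0][0])

Both approaches return the closest document; the HNSW index answers the query
without scanning the full collection, at the cost of approximate results.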
Retrieval Augmented Generation [RAG]

- New data gets generated at a rapid pace

- Re-Training LLMs is expensive

- LLM Hallucinations

- The solution is to extract relevant information from trusted
  sources and then use an LLM to generate a concise response
  (see the sketch below)
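A minimal sketch of this pipeline, assuming the sentence-transformers
package; the model name, corpus, and prompt template are illustrative
placeholders, and the final LLM call is left out:

import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical trusted corpus and an off-the-shelf embedding model.
documents = [
    "Doc about policy updates ...",
    "Doc about product specs ...",
    "Doc about pricing ...",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    q = embedder.encode(query, normalize_embeddings=True)
    idx = np.argsort(doc_vecs @ q)[::-1][:k]
    return [documents[i] for i in idx]

def build_prompt(query: str) -> str:
    """Ground the LLM's answer in retrieved context instead of its parameters."""
    context = "\n\n".join(retrieve(query))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# The assembled prompt is then passed to any LLM to generate the final answer.
print(build_prompt("What changed in the latest policy update?"))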
What are the main challenges in this pipeline?
Keyword Search vs. Semantic Search vs. RAG
CHUNKING
In the context of building LLM-related applications,
chunking is the process of breaking down large pieces
of text into smaller segments.

It’s an essential technique that helps optimize the relevance of the
content we get back from a vector database once the content has been
embedded.
SEMANTIC CHUNKING

● Break up the document into sentences.

● Create sentence groups: for each sentence, create a group containing
  some sentences before and after the given sentence.

● Generate embeddings for each sentence group.

● Compare distances between consecutive groups: as long as the topic or
  theme stays the same, the distance between the sentence-group embedding
  for a given sentence and the one preceding it will be low. A higher
  semantic distance indicates that the theme or topic has changed, which
  marks a chunk boundary. (A sketch of this procedure follows below.)
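A sketch of the procedure above, assuming the sentence-transformers
package; the model name, window size, and distance threshold are
illustrative choices:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model

def semantic_chunks(sentences: list[str], window: int = 1,
                    threshold: float = 0.3) -> list[list[str]]:
    """Split a list of sentences into chunks at points of semantic drift."""
    # 1. Build sentence groups: each sentence plus `window` neighbours each side.
    groups = [" ".join(sentences[max(0, i - window): i + window + 1])
              for i in range(len(sentences))]
    # 2. Embed each group (normalised, so a dot product is cosine similarity).
    vecs = model.encode(groups, normalize_embeddings=True)
    # 3. Compare consecutive groups; a large distance marks a topic boundary.
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        distance = 1.0 - float(vecs[i] @ vecs[i - 1])
        if distance > threshold:          # theme changed, start a new chunk
            chunks.append(current)
            current = []
        current.append(sentences[i])
    chunks.append(current)
    return chunks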
Generally, a combination of different techniques is used, so the same
sentence can appear in multiple chunks.

This leads to a larger database and more computation at retrieval time,
but it improves RAG performance.
RE-RANKING
Rerankers and Two-Stage Retrieval

For vector search to work, we need vectors. These vectors are essentially
compressions of the "meaning" behind some text into (typically) 768- or
1536-dimensional vectors. There is some information loss because we're
compressing this information into a single vector.

Because of this information loss, we often see that the top three (for
example) vector search documents will miss relevant information.
Unfortunately, the retrieval may return relevant information below our
top_k cutoff.

What do we do if relevant information at a lower position would help our
LLM formulate a better response?

The easiest approach is to increase the number of documents we're
returning (increase top_k) and pass them all to the LLM.
Unfortunately, we cannot return everything.

LLMs have limits on how much text we can pass to them — we call this limit
the context window.

Some LLMs have huge context windows, like Anthropic's Claude, with a
context window of 100K tokens.

With that, we could fit many tens of pages of text — so could we return many
documents (not quite all) and "stuff" the context window to improve recall?

Again, no. We cannot use context stuffing because this reduces the LLM's
recall performance.
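To make the limit concrete, here is a small sketch using the tiktoken
package (its cl100k_base encoding is used purely as a proxy tokenizer, and
the 100K token budget mirrors the Claude figure above):

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def fits_context(docs: list[str], question: str,
                 max_tokens: int = 100_000) -> bool:
    """Check whether the stuffed prompt would fit inside the context window."""
    prompt = "\n\n".join(docs) + "\n\n" + question
    return len(encoding.encode(prompt)) <= max_tokens

Even when this check passes, the point above still holds: stuffing the
window tends to hurt the LLM's ability to use the relevant passages.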
The solution to this issue is to maximize retrieval
recall by retrieving plenty of documents and then
maximize LLM recall by minimizing the number of
documents that make it to the LLM.

To do that, we reorder the retrieved documents and keep just the most
relevant ones for our LLM; this reordering step is reranking.

A reranking model (also known as a cross-encoder) is a type of model that,
given a query and document pair, outputs a similarity score. We use this
score to reorder the documents by relevance to our query.
For retrieval, is it better to use bi-encoders or cross-encoders?
Is one of them going to be much slower than the other?
Search engineers have used rerankers in two-stage retrieval systems for a
long time.

In these two-stage systems, a first-stage model (an embedding
model/retriever) retrieves a set of relevant documents from a larger
dataset. Then, a second-stage model (the reranker) reranks the documents
retrieved by the first-stage model.

We use two stages because retrieving a small set of documents from a large
dataset is much faster than reranking a large set of documents (we'll see
why shortly); the TL;DR is that rerankers are slow, and retrievers are fast.

If a reranker is so much slower, why bother using one?
Cross-Encoders
Cross-encoder models redefine the conventional approach by
employing a classification mechanism for pairs of data.

The model takes a pair of data, such as two sentences, as input, and
produces an output value between 0 and 1, indicating the similarity
between the two items.

This departure from vector embeddings allows for a more nuanced
understanding of the relationships between data points.

It's important to note that cross-encoders require a pair of "items" for
every input, which means document scores cannot be precomputed and indexed
ahead of time; this is why rerankers are slow.
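A sketch of the full two-stage setup using the sentence-transformers
bi-encoder and cross-encoder classes (the model names and corpus are
illustrative, not a prescription):

import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

documents = [f"text chunk {i} ..." for i in range(1000)]       # placeholder corpus

# Stage 1: fast bi-encoder retrieval over the whole corpus.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")           # illustrative model
doc_vecs = bi_encoder.encode(documents, normalize_embeddings=True)

# Stage 2: slow but precise cross-encoder reranking of a small candidate set.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative model

def search(query: str, retrieve_k: int = 50, return_k: int = 5) -> list[str]:
    q = bi_encoder.encode(query, normalize_embeddings=True)
    candidate_ids = np.argsort(doc_vecs @ q)[::-1][:retrieve_k]   # stage 1
    pairs = [(query, documents[i]) for i in candidate_ids]
    scores = reranker.predict(pairs)                              # stage 2
    order = np.argsort(scores)[::-1]
    return [documents[i] for i in candidate_ids[order][:return_k]]

The bi-encoder touches every document but only through cheap precomputed
vectors; the cross-encoder sees the raw query-document pairs, which is what
makes it both more accurate and more expensive.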
Ensemble Models

One approach to re-ranking involves using an ensemble of multiple language
models or ranking algorithms.

By combining the strengths of different models, ensemble-based re-ranking
can provide more accurate and diverse results compared to relying on a
single model.
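A common concrete instance is reciprocal rank fusion (RRF); a minimal
sketch, where the input rankings (e.g. one from keyword search and one from
dense retrieval) are placeholders and k=60 is the constant used in the
original RRF paper:

from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    the constant k dampens the influence of any single ranker.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a keyword-based ranking with an embedding-based ranking (ids illustrative).
fused = reciprocal_rank_fusion([["d3", "d1", "d7"], ["d1", "d9", "d3"]])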
Contextual Re-Ranking

This technique involves incorporating contextual information, such as user
preferences or interaction history, into the re-ranking process.

By personalizing the ranking criteria based on the user’s context, the
system can deliver more relevant and engaging responses.
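A hypothetical illustration of the idea: blend each candidate's query
relevance with its similarity to an embedding of the user's recent
interactions (the blending weight and the way the user profile is built are
assumptions, not a standard recipe):

import numpy as np

def contextual_rerank(query_scores: np.ndarray,
                      doc_vecs: np.ndarray,
                      user_history_vecs: np.ndarray,
                      alpha: float = 0.8) -> np.ndarray:
    """Reorder candidates by a blend of query relevance and personalization.

    query_scores:      relevance of each candidate to the current query
    doc_vecs:          normalised embeddings of the candidates
    user_history_vecs: normalised embeddings of items the user engaged with
    """
    profile = user_history_vecs.mean(axis=0)      # crude user-profile vector
    profile /= np.linalg.norm(profile)
    personalization = doc_vecs @ profile          # similarity to the user's interests
    blended = alpha * query_scores + (1 - alpha) * personalization
    return np.argsort(blended)[::-1]              # candidate indices, best first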
Query Expansion

Query expansion is a re-ranking technique that involves modifying or
expanding the user’s initial query to better capture their intent.

This can be achieved by adding related terms, synonyms, or even
paraphrasing the query.

By broadening the scope of the search, query expansion helps retrieve more
relevant and diverse candidates for re-ranking.
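A small sketch of the idea with a hand-written synonym table (in practice
the expansions usually come from a thesaurus, embeddings, or an LLM; the
table here is purely illustrative):

# Purely illustrative synonym table.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}

def expand_query(query: str) -> str:
    """Append synonyms of known terms so retrieval casts a wider net."""
    terms = query.lower().split()
    extra = [syn for term in terms for syn in SYNONYMS.get(term, [])]
    return " ".join(terms + extra)

print(expand_query("cheap car insurance"))
# -> "cheap car insurance inexpensive affordable automobile vehicle"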
Feature-based Re-Ranking

In this approach, the system assigns scores to the top-k candidates based
on a set of predefined features, such as term frequency, document length,
or entity overlap.

These scores are then used to re-rank the candidates, ensuring that
the most relevant and informative responses are selected for the
generation step.
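A sketch of scoring candidates with a couple of hand-picked features (the
features and weights below are illustrative, not a fixed recipe):

def feature_score(query: str, doc: str,
                  weights: tuple[float, float] = (1.0, 0.2)) -> float:
    """Score a candidate by simple predefined features: term overlap and brevity."""
    q_terms, d_terms = set(query.lower().split()), doc.lower().split()
    term_overlap = sum(t in q_terms for t in d_terms) / max(len(d_terms), 1)
    brevity = 1.0 / (1.0 + len(d_terms) / 100.0)   # mildly prefer shorter documents
    return weights[0] * term_overlap + weights[1] * brevity

def feature_rerank(query: str, candidates: list[str]) -> list[str]:
    """Reorder the top-k candidates by their feature-based scores."""
    return sorted(candidates, key=lambda d: feature_score(query, d), reverse=True)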
Re-ranking is an open problem, and a lot of work is needed to improve its
accuracy.

Is RAG a foolproof solution for augmenting LLMs with new information?

What do you think are the challenges?


Non-Graded Task

Build a RAG pipeline (with re-ranking) using books from any domain.

Explore LangChain and LlamaIndex in detail, and use them for your RAG
pipeline.

Compare with classical SEIR algorithms.
