
What Is Retrieval-Augmented Generation, aka RAG?


Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI
models with facts fetched from external sources.
November 15, 2023 by Rick Merritt

Reading Time: 6 mins

Editor’s note: This article was updated on September 23, 2024.

To understand the latest advance in generative AI, imagine a courtroom.

Judges hear and decide cases based on their general understanding of the law. Sometimes a case — like a malpractice suit or a labor dispute — requires special expertise, so judges send court clerks to a law library, looking for precedents and specific cases they can cite.

Like a good judge, large language models (LLMs) can respond to a


wide variety of human queries. But to deliver authoritative answers
that cite sources, the model needs an assistant to do some research.
The court clerk of AI is a process called retrieval-augmented generation, or RAG for short.

How It Got Named ‘RAG’

Patrick Lewis, lead author of the 2020 paper that coined the term, apologized for the unflattering acronym that now describes a growing family of methods across hundreds of papers and dozens of commercial services he believes represent the future of generative AI.

“We definitely would have put more thought into the name had we known our work would become so widespread,” Lewis said in an interview from Singapore, where he was sharing his ideas with a regional conference of database developers.

“We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea,” said Lewis, who now leads a RAG team at AI startup Cohere.


So, What Is Retrieval-Augmented Generation (RAG)?

Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

In other words, it fills a gap in how LLMs work. Under the hood, LLMs are neural networks, typically measured by how many parameters they contain. An LLM’s parameters essentially represent the general patterns of how humans use words to form sentences.

Patrick Lewis

That deep understanding, sometimes called parameterized


knowledge, makes LLMs useful in responding to general prompts at
light speed. However, it does not serve users who want a deeper dive
into a current or more specific topic.

Combining Internal, External Resources

Lewis and colleagues developed retrieval-augmented generation to


link generative AI services to external resources, especially ones rich
in the latest technical details.

The paper, with coauthors from the former Facebook AI Research


(now Meta AI), University College London and New York University,
called RAG “a general-purpose fine-tuning recipe” because it can be
used by nearly any LLM to connect with practically any external
resource.

Building User Trust

Retrieval-augmented generation gives models sources they can cite,


like footnotes in a research paper, so users can check any claims.
That builds trust.

What’s more, the technique can help models clear up ambiguity in a
user query. It also reduces the possibility a model will make a wrong
guess, a phenomenon sometimes called hallucination.

Another great advantage of RAG is that it’s relatively easy to implement. A blog by Lewis


and three of the paper’s coauthors said developers can implement
the process with as few as five lines of code.

That makes the method faster and less expensive than retraining a
model with additional datasets. And it lets users hot-swap new
sources on the fly.
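As a rough illustration of that simplicity, here is a sketch assuming Hugging Face’s implementation of the original RAG models (the model name and dummy index are illustrative choices, not necessarily the blog’s exact snippet):

# Minimal RAG with Hugging Face's reference implementation of the original
# RAG models; the dummy index stands in for a real knowledge base.
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

inputs = tokenizer("who coined the term RAG?", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Because the retriever consults its index at generation time, pointing it at a different index swaps the knowledge source without retraining the model.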

How People Are Using RAG

With retrieval-augmented generation, users can essentially have


conversations with data repositories, opening up new kinds of
experiences. This means the applications for RAG could be multiple
times the number of available datasets.

For example, a generative AI model supplemented with a medical


index could be a great assistant for a doctor or nurse. Financial
analysts would benefit from an assistant linked to market data.

In fact, almost any business can turn its technical or policy manuals,
videos or logs into resources called knowledge bases that can
enhance LLMs. These sources can enable use cases such as
customer or field support, employee training and developer
productivity.

The broad potential is why companies including AWS, IBM, Glean,


Google, Microsoft, NVIDIA, Oracle and Pinecone are adopting RAG.

Getting Started With Retrieval-Augmented Generation

To help users get started, NVIDIA developed an AI workflow for


retrieval-augmented generation. It includes a sample chatbot and the

elements users need to create their own applications with this new
method.

The workflow uses NVIDIA NeMo Retriever, a collection of easy-to-use NVIDIA NIM microservices for large-scale information retrieval. NIM eases deployment of secure, high-performance AI model inferencing across clouds, data centers and workstations.
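NIM microservices expose an OpenAI-compatible API, so a deployed endpoint can be called with a standard client. A sketch, assuming a locally deployed NIM (the URL and model name below are illustrative):

# Call a locally deployed NIM endpoint through its OpenAI-compatible API.
# base_url and model are example values; match them to the NIM you deploy.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "Summarize what RAG is in one sentence."}],
)
print(response.choices[0].message.content)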

These components are all part of NVIDIA AI Enterprise, a software


platform that accelerates development and deployment of
production-ready AI with the security, support and stability
businesses need.

Getting the best performance for RAG workflows requires massive


amounts of memory and compute to move and process data. The
NVIDIA GH200 Grace Hopper Superchip, with its 288GB of fast
HBM3e memory and 8 petaflops of compute, is ideal — it can deliver
a 150x speedup over using a CPU.

Once companies get familiar with RAG, they can combine a variety of
off-the-shelf or custom LLMs with internal or external knowledge
bases to create a wide range of assistants that help their employees
and customers.

RAG doesn’t require a data center. LLMs are debuting on Windows


PCs, thanks to NVIDIA software that enables all sorts of applications
users can access even on their laptops.

An example application for RAG on a PC.

PCs equipped with NVIDIA RTX GPUs can now run some AI models
locally. By using RAG on a PC, users can link to a private knowledge
source – whether that be emails, notes or articles – to improve
responses. The user can then feel confident that their data source,
prompts and response all remain private and secure.

A recent blog provides an example of RAG accelerated by TensorRT-LLM for Windows to get better results fast.

The History of RAG

The roots of the technique go back at least to the early 1970s. That’s
when researchers in information retrieval prototyped what they called
question-answering systems, apps that use natural language
processing (NLP) to access text, initially in narrow topics such as
baseball.

The concepts behind this kind of text mining have remained fairly
constant over the years. But the machine learning engines driving
them have grown significantly, increasing their usefulness and
popularity.

In the mid-1990s, the Ask Jeeves service, now Ask.com, popularized
question answering with its mascot of a well-dressed valet. IBM’s
Watson became a TV celebrity in 2011 when it handily beat two
human champions on the Jeopardy! game show.

Today, LLMs are taking question-answering systems to a whole new


level.

Insights From a London Lab

The seminal 2020 paper arrived as Lewis was pursuing a doctorate in


NLP at University College London and working for Meta at a new
London AI lab. The team was searching for ways to pack more
knowledge into an LLM’s parameters and using a benchmark it
developed to measure its progress.

Building on earlier methods and inspired by a paper from Google


researchers, the group “had this compelling vision of a trained system
that had a retrieval index in the middle of it, so it could learn and
generate any text output you wanted,” Lewis recalled.

The IBM Watson question-answering system became a celebrity when it won big on the TV
game show Jeopardy!

When Lewis plugged into the work in progress a promising retrieval


system from another Meta team, the first results were unexpectedly
impressive.

“I showed my supervisor and he said, ‘Whoa, take the win. This sort of
thing doesn’t happen very often,’ because these workflows can be
hard to set up correctly the first time,” he said.

Lewis also credits major contributions from team members Ethan


Perez and Douwe Kiela, then of New York University and Facebook AI
Research, respectively.

When complete, the work, which ran on a cluster of NVIDIA GPUs,


showed how to make generative AI models more authoritative and
trustworthy. It’s since been cited by hundreds of papers that
amplified and extended the concepts in what continues to be an
active area of research.

How Retrieval-Augmented Generation Works

At a high level, here’s how an NVIDIA technical brief describes the
RAG process.

When users ask an LLM a question, the AI model sends the query to
another model that converts it into a numeric format so machines
can read it. The numeric version of the query is sometimes called an
embedding or a vector.
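For instance, here is a sketch of that conversion step, assuming the sentence-transformers library and an illustrative embedding model:

# Convert a user query into an embedding: a vector of numbers machines can compare.
# The model choice is an illustrative assumption, not a specific NVIDIA component.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
query_vector = embedder.encode("What is retrieval-augmented generation?")
print(query_vector.shape)  # (384,) for this model: 384 numbers encoding the query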

Retrieval-augmented generation combines LLMs with embedding models and vector databases.

The embedding model then compares these numeric values to


vectors in a machine-readable index of an available knowledge base.
When it finds a match or multiple matches, it retrieves the related
data, converts it to human-readable words and passes it back to the
LLM.

Finally, the LLM combines the retrieved words and its own response
to the query into a final answer it presents to the user, potentially
citing sources the embedding model found.
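A sketch of that retrieve-then-generate loop, with a small in-memory index standing in for a real vector database (models and passages are illustrative):

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# A toy knowledge base, embedded once ahead of time.
passages = [
    "RAG retrieves external facts and hands them to an LLM as context.",
    "LLM parameters capture general patterns of how humans use words.",
]
index = embedder.encode(passages, normalize_embeddings=True)

# At query time: embed the question, find the closest passage by cosine similarity.
question = "How does RAG ground an LLM's answer?"
query_vec = embedder.encode(question, normalize_embeddings=True)
best = int(np.argmax(index @ query_vec))

# The retrieved text joins the question in the prompt the LLM finally sees.
prompt = f"Context: {passages[best]}\n\nQuestion: {question}\nAnswer:"

A production system would swap the toy list for a vector database, but the flow (embed, search, augment the prompt) is the same.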

Keeping Sources Current

In the background, the embedding model continuously creates and


updates machine-readable indices, sometimes called vector
databases, for new and updated knowledge bases as they become
available.
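One way to picture that background indexing, sketched with the FAISS library standing in for the vector database (dimensions and vectors are illustrative):

import faiss
import numpy as np

dim = 384  # must match the embedding model's output size
index = faiss.IndexFlatIP(dim)  # inner-product index (cosine after normalization)

# As documents arrive or change, their embeddings are (re)inserted so
# retrieval always reflects the current knowledge base.
new_vectors = np.random.rand(10, dim).astype("float32")  # stand-in for real embeddings
faiss.normalize_L2(new_vectors)
index.add(new_vectors)
print(index.ntotal)  # 10 vectors now searchable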

Many developers find LangChain, an open-source library, can be


particularly useful in chaining together LLMs, embedding models and
knowledge bases. NVIDIA uses LangChain in its reference
architecture for retrieval-augmented generation.
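A rough sketch of that chaining (LangChain’s APIs evolve quickly, so treat the imports and model names below as assumptions):

# Chain an embedding model, a vector store and an LLM with LangChain.
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

texts = ["RAG grounds LLM answers in retrieved documents."]
store = FAISS.from_texts(texts, OpenAIEmbeddings())
retriever = store.as_retriever()

llm = ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice
question = "How does RAG improve reliability?"
context = "\n".join(doc.page_content for doc in retriever.invoke(question))
answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {question}")
print(answer.content)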

The LangChain community provides its own description of a RAG


process.

Looking forward, the future of generative AI lies in creatively chaining


all sorts of LLMs and knowledge bases together to create new kinds
of assistants that deliver authoritative results users can verify.

Get hands-on experience using retrieval-augmented generation with an AI


chatbot in this NVIDIA LaunchPad lab.

Explore generative AI sessions and experiences at NVIDIA GTC, the


global conference on AI and accelerated computing, running March 18-
21 in San Jose, Calif., and online.

Categories: Deep Learning | Explainer | Generative AI

Tags: Artificial Intelligence | Events | Inference | Machine Learning |


New GPU Uses | NVIDIA NeMo | TensorRT | Trustworthy AI
