
What is Retrieval-Augmented Generation (RAG)?

[Video: Grounding for Gemini with Vertex AI Search and DIY RAG (35:30)]

RAG (Retrieval-Augmented Generation) is an AI framework that combines the strengths of traditional information retrieval systems (such as search and databases) with the capabilities of generative large language models (LLMs). By combining your data and world knowledge with LLM language skills, grounded generation is more accurate, up-to-date, and relevant to your specific needs. Check out this e-book to unlock your “Enterprise Truth.”


How does Retrieval-Augmented Generation work?


RAG systems operate in a few main steps to enhance generative AI outputs:

Retrieval and pre-processing: RAGs leverage powerful search algorithms to query external data, such as web pages, knowledge bases, and databases. Once retrieved, the relevant information undergoes pre-processing, including tokenization, stemming, and removal of stop words.

Grounded generation: The pre-processed retrieved information is then seamlessly incorporated into the pre-trained LLM. This integration enhances the LLM's context, providing it with a more comprehensive understanding of the topic and enabling it to generate more precise, informative, and engaging responses. A minimal sketch of this two-step flow follows below.
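
Below is that sketch. The toy corpus, the word-overlap scoring, and the call_llm() stub are illustrative stand-ins, not a specific Google Cloud API; a real system would swap in a production search backend and a real model endpoint.

```python
# Minimal retrieve-then-generate sketch. Corpus, scoring, and call_llm()
# are hypothetical stand-ins for a real search backend and LLM endpoint.

CORPUS = [
    "Vertex AI Search is a fully managed search and RAG builder.",
    "RAG grounds LLM answers in retrieved, up-to-date documents.",
    "Vector databases store document embeddings for semantic search.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Step 1: retrieval. A toy word-overlap score stands in for a
    production semantic search engine."""
    q_terms = set(query.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical LLM endpoint; replace with your model client."""
    return f"[model answer based on prompt of {len(prompt)} chars]"

def grounded_answer(question: str) -> str:
    """Step 2: grounded generation. Retrieved facts go into the prompt
    so the model answers from them rather than from memory alone."""
    facts = "\n".join(f"- {doc}" for doc in retrieve(question))
    prompt = (
        "Answer using ONLY the facts below.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)

print(grounded_answer("What does RAG do for LLM answers?"))
```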

RAG operates by first retrieving relevant information from a database using a query generated by the LLM. This
retrieved information is then integrated into the LLM's query input, enabling it to generate more accurate and
contextually relevant text. Retrieval is usually handled by a semantic search engine that uses embeddings
stored in vector databases and sophisticated ranking and query rewriting features, ensuring that the results
are relevant to the query and will answer the user’s question.
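
As an illustration of embedding-based retrieval, the sketch below ranks documents by cosine similarity against in-memory vectors. The random vectors stand in for a real embedding model's output; a production system would store them in a vector database such as Vertex AI Vector Search.

```python
import numpy as np

# Toy semantic retrieval: rank documents by cosine similarity between a
# query embedding and precomputed, unit-normalized document embeddings.

rng = np.random.default_rng(0)
doc_texts = ["doc about pricing", "doc about RAG", "doc about security"]
doc_embs = rng.normal(size=(len(doc_texts), 8))   # pretend embeddings
doc_embs /= np.linalg.norm(doc_embs, axis=1, keepdims=True)

def semantic_search(query_emb: np.ndarray, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    scores = doc_embs @ q               # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]
    return [doc_texts[i] for i in top]

query_emb = rng.normal(size=8)          # pretend "embed(user question)"
print(semantic_search(query_emb))
```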

Why Use RAG?


RAG offers several advantages over traditional methods of text generation, especially when dealing with factual information or data-driven responses. Here are some key reasons why using RAG can be beneficial:

Access to fresh information

LLMs are limited to their pre-training data, which can lead to outdated and potentially inaccurate responses. RAG overcomes this by providing up-to-date information to LLMs at query time.

Factual grounding

LLMs are powerful tools for generating creative and engaging text, but they can sometimes struggle with
factual accuracy. This is because LLMs are trained on massive amounts of text data, which may contain
inaccuracies or biases.

Providing “facts” to the LLM as part of the input prompt can mitigate “gen AI hallucinations.” The crux of this approach is ensuring that the most relevant facts are provided to the LLM, and that the LLM output is entirely grounded in those facts while also answering the user’s question and adhering to system instructions and safety constraints.

Using Gemini’s long context window (LCW) is a great way to provide source materials to the LLM. If you need to provide more information than fits in the LCW, or if you need to scale up performance, you can use a RAG approach, which reduces the number of tokens and saves you time and cost.
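
The rough arithmetic below illustrates why retrieval saves tokens compared to stuffing a whole corpus into the context window; the chunk counts and sizes are made-up assumptions, not benchmarks.

```python
# Rough illustration of RAG's token savings versus long-context stuffing.
# All numbers are illustrative assumptions.

corpus_chunks = 10_000        # total chunks in your knowledge base
tokens_per_chunk = 200        # assumed average chunk size
top_k = 5                     # chunks a RAG retriever actually sends

full_context_tokens = corpus_chunks * tokens_per_chunk
rag_tokens = top_k * tokens_per_chunk

print(f"Long-context stuffing: {full_context_tokens:,} tokens")  # 2,000,000
print(f"RAG (top-{top_k} chunks): {rag_tokens:,} tokens")        # 1,000
```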

Search with vector databases and relevancy re-rankers

RAGs usually retrieve facts via search, and modern search engines now leverage vector databases to efficiently retrieve relevant documents. Vector databases store documents as embeddings in a high-dimensional space, allowing for fast and accurate retrieval based on semantic similarity. Multimodal embeddings can be used for images, audio, video, and more, and these media embeddings can be retrieved alongside text or multilingual embeddings.

Advanced search engines like Vertex AI Search use semantic search and keyword search together (called hybrid search), plus a re-ranker that scores search results to ensure the top returned results are the most relevant. Additionally, searches perform better with a clear, focused query free of misspellings, so prior to lookup, sophisticated search engines will rewrite the query and fix spelling mistakes.
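
Hybrid search can be sketched as a weighted blend of a keyword-overlap score and a semantic (embedding) score, followed by ranking. The scoring functions and the alpha weight below are illustrative assumptions, not Vertex AI Search internals.

```python
import numpy as np

# Hybrid search sketch: blend a keyword-overlap score with an embedding
# cosine score, then rank. Scores and weights are illustrative only.

def keyword_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def semantic_score(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
    return float(q_emb @ d_emb / (np.linalg.norm(q_emb) * np.linalg.norm(d_emb)))

def hybrid_rank(query, q_emb, docs, d_embs, alpha=0.5, k=3):
    """Blend both signals and return the k best documents."""
    scored = [
        (alpha * keyword_score(query, doc)
         + (1 - alpha) * semantic_score(q_emb, e), doc)
        for doc, e in zip(docs, d_embs)
    ]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

rng = np.random.default_rng(1)
docs = ["rag grounding overview", "vector search pricing", "gemini models"]
d_embs = rng.normal(size=(3, 8))          # pretend document embeddings
print(hybrid_rank("rag grounding", rng.normal(size=8), docs, d_embs))
```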

Relevance, accuracy, and quality

The retrieval mechanism in RAG is critically important. You need the best semantic search on top of a curated
knowledge base to ensure that the retrieved information is relevant to the input query or context. If your retrieved information is irrelevant, your generation could be grounded but off-topic or incorrect.

By fine-tuning or prompt-engineering the LLM to generate text entirely based on the retrieved knowledge, RAG helps to minimize contradictions and inconsistencies in the generated text. This significantly improves the quality of the generated text and the user experience.
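
One common prompt-engineering pattern for grounding is to wrap the retrieved chunks in instructions that restrict the model to them. The template below is a generic sketch, not a prescribed Google format.

```python
# Generic grounding prompt template: instruct the model to answer only
# from the supplied context and to admit when the context is missing.

GROUNDED_PROMPT = """You are a careful assistant.
Answer the question using ONLY the context below.
If the context does not contain the answer, say "I don't know."

Context:
{context}

Question: {question}
Answer:"""

chunks = ["RAG retrieves documents before generation.",
          "Grounding ties model output to retrieved facts."]
prompt = GROUNDED_PROMPT.format(
    context="\n".join(f"[{i+1}] {c}" for i, c in enumerate(chunks)),
    question="What does grounding do?",
)
print(prompt)
```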

The Vertex Eval Service now scores LLM-generated text and retrieved chunks on metrics like “coherence,” “fluency,” “groundedness,” “safety,” “instruction_following,” “question_answering_quality,” and more. These metrics help you measure the grounded text you get from the LLM (for some metrics, that is a comparison against a ground-truth answer you have provided). Implementing these evaluations gives you a baseline measurement, and you can optimize for RAG quality by configuring your search engine, curating your source data, improving source layout parsing or chunking strategies, or refining the user’s question prior to search. A metrics-driven RAG Ops approach like this will help you hill-climb to high-quality RAG and grounded generation.
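
As a rough stand-in for such metrics, the sketch below computes a crude groundedness proxy: the fraction of answer sentences that overlap strongly with at least one retrieved chunk. It is a baseline heuristic only, not the Vertex Eval Service’s groundedness metric.

```python
# Crude groundedness proxy: the share of answer sentences that overlap
# strongly with at least one retrieved chunk. A heuristic baseline only.

def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa), 1)

def groundedness(answer: str, chunks: list[str], threshold: float = 0.5) -> float:
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    supported = sum(
        1 for s in sentences if any(overlap(s, c) >= threshold for c in chunks)
    )
    return supported / max(len(sentences), 1)

chunks = ["RAG retrieves documents before generation."]
answer = "RAG retrieves documents before generation. It was invented in 1850."
print(groundedness(answer, chunks))   # 0.5: one of two sentences supported
```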

RAGs, agents, and chatbots

RAG and grounding can be integrated into any LLM application or agent that needs access to fresh, private, or specialized data. By leveraging external knowledge, RAG-powered chatbots and conversational agents provide more comprehensive, informative, and context-aware responses, improving the overall user experience.

Your data and your use case are what differentiate what you are building with gen AI. RAG and grounding bring
your data to LLMs efficiently and scalably.
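
Wiring RAG into a chatbot amounts to running retrieval on every user turn before calling the model. The loop below is a minimal sketch with hypothetical retrieve() and call_llm() stubs like those earlier on this page.

```python
# Per-turn RAG in a chat loop: retrieve fresh context for every user
# message, then generate. Both helpers are hypothetical stand-ins.

def retrieve(query: str) -> list[str]:
    """Toy retriever; replace with a real search call."""
    return ["Grounding ties model output to retrieved facts."]

def call_llm(prompt: str) -> str:
    """Hypothetical LLM endpoint; replace with your model client."""
    return "[model reply]"

def chat_turn(history: list[str], user_msg: str) -> str:
    context = "\n".join(retrieve(user_msg))        # fresh facts each turn
    prompt = (
        f"Context:\n{context}\n\n"
        + "\n".join(history)
        + f"\nUser: {user_msg}\nAssistant:"
    )
    reply = call_llm(prompt)
    history += [f"User: {user_msg}", f"Assistant: {reply}"]
    return reply

history: list[str] = []
print(chat_turn(history, "What is grounding?"))
```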

What Google Cloud products and services are related to RAG?

The following Google Cloud products are related to Retrieval-Augmented Generation:
Vertex AI Search: Google Search for your data; a fully managed, out-of-the-box search and RAG builder.

Vertex AI Vector Search: The ultra-performant vector index that powers Vertex AI Search; it enables semantic and hybrid search and retrieval from huge collections of embeddings with high recall at high query rates.

BigQuery: Large datasets that you can use to train machine learning models, including models for Vertex AI Vector Search.

Grounded Generation API: Gemini high-fidelity mode grounded with Google Search or inline facts, or bring your own search engine.

AlloyDB: Run models in Vertex AI and access them in your application using familiar SQL queries. Use Google models, such as Gemini, or your own custom models.

LlamaIndex on Vertex: Build your own search engine for RAG and grounding using Google or open-source components and our fully managed orchestration system based on LlamaIndex.

Further reading
Learn more about using retrieval-augmented generation with these resources.

Using Vertex AI to build next-gen search applications | Google Cloud Blog


RAGs powered by Google Search technology
RAG with databases on Google Cloud
Infrastructure for a RAG-capable generative AI application using Vertex AI
APIs to build your own search and Retrieval Augmented Generation (RAG) systems
How to use RAG in BigQuery to bolster LLMs
Code sample and quickstart to get familiar with RAG
