
eBook

A Compact Guide to
Retrieval Augmented Generation (RAG)
Definitions, components and basics for practitioners

Contents

Introduction: Retrieval Augmented Generation (RAG) With Vector Search
    LLMs and prompts
    Vector Search and embedding models

Asking RAG About Databricks Asset Bundles (DABs)
    Asking an LLM without RAG about DABs
    Using RAG to give an LLM access to documentation about DABs

Addressing the Shortcomings of LLMs With RAG
    RAG compared to LLM-only approaches

RAG Use Cases
    Question-answering systems
    Customer service
    Content generation
    Code assistance

RAG With Vector Search — Step by Step
    Data preparation: Getting an external information source into a vector database
    Retrieval: Getting relevant context
    Augmentation: Adding context to the user’s prompt
    Generation: Producing useful output with an LLM
    Evaluation: Measuring RAG performance

Utilizing RAG With Other Modeling and Model Customization Methods
    Prompt engineering
    Fine-tuning
    Pretraining
    RAG
    Combinations of methods

RAG on Databricks
    Lakehouse architecture
    Vector Search
    Model serving
    MLflow
    Lakehouse Monitoring

Summary
    GenAI training
    Additional resources

Introduction: Retrieval Augmented Generation (RAG) With Vector Search


Retrieval augmented generation (RAG) is the process of combining a user’s prompt with relevant external
information to form a new, expanded prompt for a large language model (LLM) such as GPT-4 or Llama 2. The
expanded prompt enables the LLM to provide more relevant, timely and accurate responses.

LLMs offer powerful language understanding and generation capabilities, but they aren’t reliable information
sources and they lack access to proprietary information or any other information they weren’t trained on. They
are also prone to so-called “hallucinations” — fabricating answers instead of acknowledging that they don’t know
the correct answer.

Using RAG with an LLM helps to address some of these issues. Giving the LLM all the information it needs to
answer a question enables it to provide answers about topics it was not trained on and reduces the likelihood
of hallucinations. For example, an LLM trained on public data cannot answer any questions about a company’s
internal memos or project documents. It’s likely to hallucinate if asked about such documents. A RAG
application can supply all or parts of those documents to the LLM, giving it the context it needs to give correct
and relevant answers.

RAG can work with a variety of data sources, including text, podcasts, videos, live search results and structured
databases. In this document, we look at RAG over stored unstructured data such as PDFs, scraped web pages
or code. This guide focuses on a RAG approach in which data is retrieved from vector databases using a
process known as Vector Search.

We’ll start with some brief background information on LLMs and prompts, which form the foundation for RAG.

LLMs and prompts


LLMs are a relatively new class of machine learning algorithms that can interpret, manipulate and generate
text-based content. They’re trained on massive text datasets from diverse sources, including books, text
scraped from the internet and code repositories. During the training process, the model learns statistical
relationships between words and phrases, enabling it to generate new text using the context of text it has
already seen or generated.

LLMs are typically used via “prompting.” A prompt is text that a user provides to an LLM and that the LLM
responds to. Prompts can take many different forms. Some models are trained to complete text, so prompts
to these models take the form of incomplete statements like “Jack and Jill went up the hill to…,” which the
model then continues. Other models expect questions or instructions as prompts. These models can answer
questions such as, “What happened to Jack after Jack and Jill went up the hill?” RAG applications that enable
users to ask questions about text generally use instruction-following and question-answering LLMs.

LLMs can typically handle prompts of at least several paragraphs in length. This is important for RAG. In RAG,
the user’s question or instruction is combined with some information retrieved from an external data source,
forming the new, augmented prompt.

Next, let’s discuss where the external information is retrieved from.



Vector Search and embedding models


An effective RAG application must be able to find information relevant to the user’s prompt and supply it to the
LLM. Selecting the most relevant texts from potentially millions of documents can be a significant challenge. To
address this challenge, we use a technique called Vector Search to identify text relevant to the user’s prompt.

In a RAG system with Vector Search, a special type of language model called an “embedding model” translates
each text we want to search into a numeric vector, a series of numbers that encapsulates the text’s meaning.
The same model also converts the user’s query to a comparable vector. This process makes it possible to mathematically compare the user’s query to the stored texts and identify those that are the most similar and the most relevant.

Crucially, these vectors represent the meanings of the text from which they are generated. Thus, when we
search for relevant vectors, we are searching on the basis of meaning. This approach enables us to retrieve the
text most relevant to the user’s query and pass it to the LLM along with the user’s original query.

We’ve talked a lot about embeddings encoding the meanings of texts. It’s important to note that this is the
meaning according to the model. The models are trained to capture meanings and relationships that humans
care about, but the meaning captured by the embedding model may not be exactly what we want. This is one
reason why it’s essential to test and evaluate every component of a RAG application.

VECTOR DATABASES

The vectors generated by embedding models are often stored in a specialized vector database. Vector
databases are optimized for storing and retrieving vector data efficiently. Like traditional databases, vector
databases can be used to manage permissions, metadata and data integrity, ensuring secure and organized
access to information. They also tend to include update mechanisms so newly added texts are indexed and
ready to use quickly.

Asking RAG About Databricks Asset Bundles (DABs)

Suppose we want to ask some questions about Databricks Asset Bundles (DABs), a new Databricks feature that enables an infrastructure-as-code (IaC) approach to managing Databricks projects. We’ll use the Llama 2 LLM, which doesn’t have access to up-to-date information about DABs because it was trained before the DABs Public Preview was released.

All the following results are actual results generated by an LLM or by the vector database.
Asking an LLM without RAG about DABs
Let’s see what happens if we ask the Llama 2-70B-chat LLM a couple of questions about Databricks Asset
Bundles without providing it with any documentation. In this case, the LLM is being used alone, not in the
context of a RAG application.

User: What are databricks DABs?

Bot: Databricks DABs (Databricks Automation Building blocks) are a set of pre-built, reusable components that enable users to automate and streamline their data engineering workflows on the Databricks platform. DABs provide a way to encapsulate and reuse common data engineering patterns, such as data ingestion, transformation, and loading, across different use cases and projects.

DABs are designed to be modular and can be combined to create more complex workflows. They provide a way to define repeatable data engineering tasks that can be shared across different teams and projects, improving productivity and reducing the risk of errors.

In this example, the model just guessed, stating incorrectly that DAB stands for “Databricks Automation
Building blocks” and expanding on what those might be. Notably, it never indicated that it didn’t know the
answer or that it was making things up. This could represent a real issue in a customer support setting.

Let’s try another question.

User: What version of the databricks CLI do I need in order to use Databricks Asset Bundles?

Bot: To use Databricks Asset Bundles, you need to have a Databricks CLI version that is greater than or equal to 1.10.0.

You can check your current version of the Databricks CLI by running the command `databricks --version` in your terminal or command prompt.

If you have an older version of the Databricks CLI installed, you can upgrade to the latest version by running the command `pip install --upgrade databricks-cli` (for Python 3) or `pip install --upgrade databricks-cli<=1.10.0` (for Python 2).

Alternatively, you can download the latest version of the Databricks CLI from the Databricks website and install it manually.

Once you have installed the required version of the Databricks CLI, you can use the `databricks assets` command to manage Databricks Asset Bundles.

Databricks CLI version 1.10.0 doesn’t exist and isn’t the correct answer. Furthermore, there is no databricks
assets command. In both cases, the model hallucinated an answer.

Using RAG to give an LLM access to documentation about DABs


What happens if we set up a RAG system that can search for relevant segments of the Databricks Asset
Bundles docs and provide them to the model?

We set up a RAG application by extracting text from documentation about DABs, loading it into a vector
database and setting up a retrieval system to pass relevant information to the LLM.

Now when we ask our question, the RAG application queries the vector database and retrieves the most
relevant results, which it sends along with our question to the LLM. Let’s ask the same questions as before
and see how the results differ.

Here are the new results.

User: What are databricks DABs?

Bot: According to the provided context, Databricks Asset Bundles (DABs) are a tool for streamlining the development of complex data, analytics, and machine learning (ML) projects on the Databricks platform. They provide a way to manage complex projects during active development by offering CI/CD capabilities to software development workflows in a single, concise, and declarative YAML syntax. DABs allow users to automate project tests, deployments, and configuration management, which can reduce errors and promote software best practices across an organization through templated projects.

User: What version of the databricks CLI do I need in order to use Databricks Asset Bundles?

Bot: Databricks CLI version 0.205 or higher is required for Databricks Asset Bundles. You can confirm the version by running the command 'databricks --version' in your terminal. Note that version 0.205.2 or higher is required. If you haven't installed the Databricks CLI yet, you can install it by following the instructions in the Install or update the Databricks CLI documentation. Additionally, make sure that your remote Databricks workspaces are configured correctly, specifically that the workspace files feature is enabled, which is the case by default for Databricks Runtime version 11.2 or later.

In both cases, the model generated correct answers because it was given access to the information it needed.

Without RAG, the best the LLM could do was guess or admit it didn’t know. With RAG, the LLM provided the
correct answers.

Addressing the Shortcomings of LLMs With RAG

LLMs are capable of giving compelling and coherent answers to user prompts. They’re often able to detect nuance, identify context and give the appearance of reasoning when answering. And because they’re trained on vast amounts of data, they have access to enormous amounts of knowledge.

LLMs, however, are not reliable as knowledge sources. LLMs often respond with made-up answers, or hallucinations, rather than acknowledging that they don’t know the correct answer. Furthermore, LLMs are fundamentally limited by their training data. An LLM alone doesn’t know anything that happened after its training was completed, and it lacks access to proprietary information, such as company documents, that wasn’t publicly available during training.

We can try to mitigate these problems by explicitly giving the LLM the information it needs to address the
user’s prompt. This can be as simple as copying and pasting a couple of pages of reference documents and
sending them, along with a question, to ChatGPT or another LLM. In so doing, we have augmented the original
prompt (the question) with the information needed to answer the question accurately. The additional step
of building a retrieval system, such as a vector database, allows us to automate this process and ensure
that the model has the most relevant information without requiring the user to seek it out and add it to
the prompt manually.

While implementing RAG with Vector Search involves the extra steps of data processing and managing the
generated vectors (often with a vector database), it can help to address the limitations of using LLMs alone.
RAG improves on LLM-only approaches by providing additional, specific context that the LLM can use when
formulating an answer. RAG also has benefits compared to retrieval-only systems, as the LLM can process
texts from multiple sources into a readable output tailored to the user’s prompt.

RAG compared to LLM-only approaches


Let’s explore some of the benefits of RAG in more detail.

■ RAG applications can incorporate proprietary data. Most LLMs are trained on publicly available data.
They don’t have access to a company’s internal documents or communications. RAG allows you to
supply proprietary or domain-specific information such as internal memos, emails or design documents
to the model.
■ RAG applications can access up-to-date information. LLMs are generally trained at a particular point
in time and then released for use. Older models lack updated information about the state of the world or
a particular field or business. For example, if a new version of a software product is released after an LLM
was trained, it won’t be able to provide assistance specific to the new version. RAG provides a way to
supply the model with up-to-date information.
■ RAG can enhance the accuracy of LLM responses. LLMs alone can respond with incorrect or
fabricated information (hallucinations). They aren’t consistent or reliable information sources. An
effective RAG system can retrieve relevant and correct references and supply them to the model,
potentially reducing the occurrence of hallucinations. Outputs can include citations of original sources,
allowing for human verification.
■ RAG enables fine-grained data access control. LLMs alone cannot reliably provide different
responses to different users based on security or permission considerations. RAG applications,
on the other hand, can be designed to retrieve only documents that a user has permission to access.
This can enable LLMs to securely reference confidential or personal data based on the access
credentials of the system’s user.

RAG Use Cases

RAG equips LLMs with context-specific information that LLMs alone either don’t possess or may not be able to generate reliably when needed. This enables several different applications that would be difficult or impossible using only LLMs.

Question-answering systems
RAG becomes invaluable in use cases where the aim is to “talk to documents,” such as querying HR policies
or accessing real-time financial reports. Using RAG, information can be dynamically retrieved and presented
in a conversational manner to an end user. For instance, a large e-commerce company uses Databricks for an
internal RAG application, enabling their HR team to query hundreds of employee policy documents.

Customer service
RAG systems can streamline the customer service process by providing support personnel with personalized
and more informed responses to customer queries. This can enhance customer experience, reduce response
times and increase resolution efficiency. We see this kind of “internal copilot” RAG application across many
customers seeking to improve the efficiency and effectiveness of internal workers.

Content generation
In content creation scenarios, RAG can be used to draft communications, like sales emails, by integrating the
most recent data and relevant context. This can ensure that customer outreach is both personalized and
reflects the latest information. One Databricks customer is leveraging RAG to draft email responses to inbound
sales emails, incorporating external product and customer information into responses.

Code assistance
RAG can enhance code completion and code Q&A systems by intelligently searching and retrieving
information from code bases, documentation and external libraries. This can result in improved code
generation and more relevant responses compared to LLM-only code assistants.

RAG With Vector Search — Step by Step

RAG with Vector Search involves retrieving information using a vector database, augmenting the user’s prompt with that information and generating a response with an LLM based on the user’s prompt and the retrieved information. In this section, we’ll review each of these steps, focusing on the processes represented in a reasonably standard RAG system. Be aware that there are many different approaches to these steps, including some advanced techniques that may increase performance but add complexity.

Data preparation: Getting an external information source into a vector database

Before we can perform RAG with Vector Search, we need to get data — in this case, unstructured text data —
into a vector database. There are many approaches to doing this, and it’s essential to try different methods to
determine which is the most effective for your use case.

Data preparation generally isn’t a one-time task, because a vector database should be regularly updated to
provide up-to-date and high-quality information. This is one of the key benefits of RAG — we can continuously
update the vector database without needing to update the LLM weights over time.

A few core steps for preparing data for RAG include the following:

PARSING THE INPUT DOCUMENTS

The raw documents may not be in a format amenable to processing for RAG with Vector Search. Images may need to be converted to text; tables or embedded images might require further processing; and there may be extraneous text, such as page headers or page numbers, to clean up or remove. It is often necessary to parse the raw
input documents and get them into a format — usually text — that will work with the rest of the RAG pipeline.
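
For illustration, here is a minimal parsing sketch in Python. It assumes the pypdf package (the guide doesn’t prescribe a particular parser) and simply extracts page text while dropping lines that consist only of page numbers; real pipelines often need more elaborate handling of tables, images and headers.

from pypdf import PdfReader

def pdf_to_text(path: str) -> str:
    """Extract plain text from a PDF and drop page-number-only lines."""
    reader = PdfReader(path)
    pages = []
    for page in reader.pages:
        text = page.extract_text() or ""
        lines = [ln for ln in text.splitlines()
                 if ln.strip() and not ln.strip().isdigit()]
        pages.append("\n".join(lines))
    return "\n\n".join(pages)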

SPLITTING DOCUMENTS INTO CHUNKS

You typically don’t want to retrieve entire books, web pages or articles in a RAG application. Instead, split the
documents into smaller chunks so you can send more specific results to the LLM for context. “Documents” is a
general term for referring to source texts, but you can think of documents as any kind of text.

Chunk size can affect the output quality of a RAG application. If the chunks are too small, they may not include
enough context to address the user’s query. If the chunks are too large, the LLM may fail to pull out the relevant
details, focusing instead on other details included in the chunk.

There’s no one-size-fits-all solution to choosing the best chunk size. It depends on the source documents, the
LLM and the RAG application’s goals. It’s important to try different chunk sizes.
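
As a starting point, here is a minimal chunking sketch. The fixed character-based chunk size and overlap are illustrative assumptions; many applications instead split on sentences, paragraphs or document structure.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping, fixed-size character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than a full chunk so neighboring chunks
        # share some context across the boundary.
        start += chunk_size - overlap
    return chunks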

EMBEDDING THE TEXT CHUNKS

After splitting the source documents into manageable chunks, use an embedding model to convert each of
those chunks into a high-dimensional numerical vector.

An embedding model is a special kind of language model that uses its knowledge of language to generate a numeric vector, a series of numbers called an embedding, from a text. Embeddings encode the nuanced and context-specific meaning of each text in numeric form. A good embedding model will know that “raining cats and dogs” is a phrase about the weather, not a phrase about pets.

The true power of embeddings for RAG is that they can be mathematically compared to each other. We can measure how “similar” two embeddings are, which, in this context, equates to how closely the meanings of their original texts are related. This will be especially useful later in the RAG process when we embed a user’s
prompt, compare it to the embedded texts in the vector database and identify those we think are most likely
to help the LLM provide a useful answer.
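
A minimal embedding sketch, assuming the sentence-transformers library and the BAAI/bge-large-en model (the same general-purpose model used in the worked example later in this guide); any embedding model exposed through a similar API would work.

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en")

chunks = [
    "Databricks Asset Bundles are an infrastructure-as-code (IaC) approach ...",
    "Some ideal scenarios for bundles include ...",
]
# Each chunk becomes a 1024-dimensional vector. Normalizing the vectors makes
# the dot product equivalent to cosine similarity, which simplifies search.
embeddings = embedder.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (2, 1024)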

STORING AND INDEXING THE EMBEDDINGS

Embeddings are stored in a specialized kind of database known as a vector database, which is designed to
efficiently store and search for vector data like embeddings. A vector database is a type of vector store —
these terms are often used interchangeably — though “vector store” can refer to any type of vector storage
solution, not just to databases. Vector databases often incorporate update mechanisms so newly added
chunks can be searched and retrieved immediately. While such databases are not strictly necessary for RAG
or Vector Search, they often meaningfully improve RAG performance and reliability.

Having a huge number of text chunks can result in slower retrieval speeds. A common approach to maintain
performance is to index the embeddings with a vector index. A vector index is a mechanism, often part of a
vector database, that uses various algorithms to organize and map vector embeddings in a way that optimizes
search efficiency.
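
Continuing the sketches above, here is one way to index the embeddings, using FAISS as a stand-in for a vector database. In the worked example later in this guide, Databricks Vector Search plays this role; the FAISS index below is only meant to show the mechanics.

import numpy as np
import faiss  # stand-in for a vector database / vector index

dim = 1024                          # dimensionality of the bge-large-en vectors
index = faiss.IndexFlatIP(dim)      # exact inner-product search; fine for small corpora
index.add(np.asarray(embeddings, dtype="float32"))

# Keep the chunk text and metadata alongside the index, keyed by vector position.
records = [{"text": c, "title": "What are Databricks Asset Bundles?"} for c in chunks]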

RECORDING METADATA

Capturing metadata along with text chunks allows us to filter results based on metadata (if applicable)
and provide detailed references along with the results. A RAG application with metadata could, for instance,
provide specific URLs or page numbers for the sources retrieved, or it could allow users to explicitly filter
by date or source.

EXAMPLE: PREPROCESSING AND EMBEDDING

In the example above, we started with two documents about Databricks Asset Bundles. To use them for RAG, we:

1. Split each document into chunks. For example, one of those chunks is:

Databricks Assets Bundles are an infrastructure-as-code (IaC) approach to managing your Databricks projects. Use them when you want to manage complex projects where multiple contributors and automation are essential, and continuous integration and deployment (CI/CD) are a requirement. Since bundles are defined and managed through YAML templates and files you create and maintain alongside source code, they map well to scenarios where IaC is an appropriate approach.\n\nSome ideal scenarios for bundles include:\n\nDevelop data, analytics, and ML projects in a team-based environment. Bundles can help you organize and manage various source files efficiently. This ensures smooth collaboration and streamlined processes.\n\nIterate on ML problems faster. Manage ML pipeline resources (such as training and batch inference jobs) by using ML projects that follow production best practices from the beginning.

2. Embed the chunks. We use a general-purpose embedding model called bge-large-en to turn each
chunk into a 1024-dimension numeric vector, which is basically a list of 1024 numbers. The chunk above is
translated to:

[0.0209503173828125, 0.0172576904296875, -0.003314971923828125, -0.0025310516357421875, 0.00670623779296875,
-0.00506591796875, 0.0005450248718261719, -0.049896240234375, 0.00630950927734375, 0.0003032684326171875,
-0.001049041748046875, -0.0084991455078125, 0.031585693359375, -0.0621337890625, -0.009765625, 0.017669677734375,
-0.045623779296875, 0.0022907257080078125, -0.0736083984375, 0.0286102294921875, 0.01532745361328125, 0.0298919677734375,
-0.09027099609375, -0.0207977294921875, -0.048736572265625, 0.07818603515625, 0.0648193359375,
...
-0.01076507568359375, 0.0123443603515625, -0.016693115234375, 0.0243377685546875, 0.0244903564453125]



3. Once both documents are split into chunks and embedded, we use Databricks Vector Search to store
and index the embeddings. We also record some metadata — the document title and the document
date — along with each text chunk and embedding. We’ll come back to these vectors and show how
they are used to retrieve relevant information in the next section, which focuses on retrieval.

At this phase, the data has been preprocessed and can now be queried. The next step is to retrieve the
relevant information from the vector database.

Retrieval: Getting relevant context

After preprocessing our original documents, we have a vector database storing the text chunks, embeddings
and metadata. With this in place, we can get to the first step in RAG: retrieval. In the retrieval step, the user
provides a prompt, often a question, to the RAG application. The RAG application uses the prompt to query
the database and identify the most relevant results, which can be used to augment the original prompt (the
next step).

QUERYING THE VECTOR DATABASE

We can’t directly match a user’s input, which is usually plain text, with the records in our vector database.
So, first we need to use the same embedding model that was used to embed the original text chunks to also
embed the user’s query. Once we have the embedded query, we can search the vector database to find the
most similar records.

If the database contains only a small number of records, searching might involve calculating a similarity score
for each record. For larger databases, we use vector indexes and specialized search algorithms — many of
which use approximations to improve efficiency — to speed up the process.

Once the vector database has identified the most relevant results, the texts of those results can be combined
with the user’s prompt and sent to the LLM to generate the final response. Note that the embeddings are
not “translated back” to text. Instead, the text chunks are stored with the embeddings or linked to them via
database keys, so the chunks can simply be retrieved and sent to the next step in the RAG process.

We should decide how many results our RAG system should retrieve. This, like chunk size, is worth testing and
can have a significant impact on the quality of the results. Retrieving too few records may mean missing some
relevant information, while too many results may dilute the relevant information and make it more likely for the
LLM to give irrelevant answers.
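
Continuing the same sketches, retrieval then amounts to embedding the query with the same model and asking the index for the top-k nearest vectors. The value of k is an assumption worth testing, as discussed above.

import numpy as np

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Embed the query with the same embedding model used for the chunks,
    then return the k most similar stored records."""
    query_vec = embedder.encode([query], normalize_embeddings=True)
    scores, ids = index.search(np.asarray(query_vec, dtype="float32"), k)
    return [records[i] for i in ids[0] if i != -1]

top_chunks = retrieve("What are Databricks Asset Bundles?", k=2)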

IMPROVING RETRIEVAL

The approach described above is often quite effective, but there are many more advanced techniques for
improving retrieval, including:

■ Hybrid search: This method blends traditional keyword search with Vector Search, which can improve
retrieval accuracy
■ Reranking: An additional model can be used to reorder the records initially returned by the similarity search, ensuring the most relevant results are prioritized (see the sketch below)
■ Summarized text comparison: Some RAG applications don’t compare the user’s prompt directly
to raw text embeddings. Instead, they use embeddings of summarized texts for a more efficient
matching process.
■ Contextual chunk retrieval: It’s often beneficial to include chunks adjacent to the most relevant ones
(e.g., the paragraphs preceding and following a retrieved chunk). This approach provides more complete
context, which might aid the LLM in generating a useful response.
■ Prompt refinement: Some RAG applications employ a language model to refine the user’s original
prompt, crafting a new query that better captures the user’s intent for more effective searching in the
vector database
■ Domain-specific tuning: Utilizing embedding models that are fine-tuned for specific tasks or domains
can enhance the accuracy and relevance of the retrieved information

These approaches are worth trying if testing reveals that the retrieval component of a RAG application is often
failing to return the most relevant records from the vector database.
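
As an illustration of one of these techniques, here is a minimal reranking sketch. It assumes the sentence-transformers CrossEncoder class and the cross-encoder/ms-marco-MiniLM-L-6-v2 model, both illustrative choices. A typical pattern is to over-retrieve candidates from the vector database (say 20) and let the reranker pick the handful that are actually sent to the LLM.

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Score each (query, candidate) pair and keep the highest-scoring texts."""
    scores = reranker.predict([(query, text) for text in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:top_n]]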

EXAMPLE: RETRIEVING THE INFORMATION

All the information we need about Databricks Asset Bundles is now available in the vector database. In the
retrieval phase, we have to get that information out. To do so, we:

1. Embed the prompt. We use the same embedding model we used to embed the original document
chunks to embed the query, and once again the result is a 1024-dimensional vector. So if we start with
the prompt “What are Databricks Asset Bundles?,” we end up with the embedding:

[0.006649017333984375, 0.029144287109375, 0.0001398324966430664, 0.00481414794921875, -0.006526947021484375,
-0.00818634033203125, 0.029571533203125, -0.031982421875, 0.01082611083984375, 0.0025653839111328125, -0.013031005859375,
0.01155853271484375, 0.054931640625, -0.049224853515625, -8.767843246459961e-05, 0.03131103515625, -0.022613525390625,
0.0148162841796875, -0.052520751953125, 0.003780364990234375, 0.0279998779296875, 0.018585205078125, -0.081787109375,
-0.030731201171875, -0.0236053466796875, 0.0357666015625, 0.03387451171875, 0.0335693359375,
...,
0.0260467529296875, -0.0013675689697265625, 0.032318115234375, -0.002666473388671875, 0.0269012451171875, 0.0616455078125]

2. Use the embedding to search the vector database. We use the built-in similarity_search method
of Databricks Vector Search to query the vector database with the embedded prompt. We specify that
we want it to return the stored text and that we want the two most relevant results. From this,
the database returns:

['What are Databricks Asset Bundles?\nJanuary 08, 2024\n\nIn this article you will learn the basics of using Databricks Asset Bundles, a new tool for streamlining the development of complex data, analytics, and ML projects for the Databricks platform. Bundles make it easy to manage complex projects during active development by providing CI/CD capabilities to your software development workflow in a single concise and declarative YAML syntax. By using bundles to automate your project's tests, deployments, and configuration management you can reduce errors while promoting software best practices across your organization as templated projects.\n\nPreview\n\nThis feature is in Public Preview.\n\nBundles provide a way to include metadata alongside your project's source files to specify information including:\n\nRequired cloud infrastructure and workspace configurations.\n\nUnit and integration tests.',

'Databricks Assets Bundles are an infrastructure-as-code (IaC) approach to managing your Databricks projects. Use them when you want to manage complex projects where multiple contributors and automation are essential, and continuous integration and deployment (CI/CD) are a requirement. Since bundles are defined and managed through YAML templates and files you create and maintain alongside source code, they map well to scenarios where IaC is an appropriate approach.\n\nSome ideal scenarios for bundles include:\n\nDevelop data, analytics, and ML projects in a team-based environment. Bundles can help you organize and manage various source files efficiently. This ensures smooth collaboration and streamlined processes.\n\nIterate on ML problems faster. Manage ML pipeline resources (such as training and batch inference jobs) by using ML projects that follow production best practices from the beginning.']
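
For reference, the query in step 2 might look roughly like the sketch below when using the Databricks Vector Search Python client. The endpoint and index names are placeholders, embedded_prompt stands for the query vector produced in step 1, and client setup can differ between workspaces, so treat this as an outline rather than a drop-in snippet.

from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()
index = client.get_index(
    endpoint_name="my_vector_search_endpoint",           # placeholder
    index_name="my_catalog.my_schema.dabs_docs_index",   # placeholder
)

results = index.similarity_search(
    query_vector=embedded_prompt,      # the 1024-dimensional query embedding from step 1
    columns=["text", "title", "date"], # stored text plus metadata to return
    num_results=2,                     # return the two most relevant chunks
)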

Now what do we do with this information? This is where the augmentation part of RAG comes in.

Augmentation: Adding context to the user’s prompt



The retrieval component of a RAG system queries the vector database with the user’s prompt. It returns a
selection of relevant texts and, in some cases, metadata. The texts are used to augment the original prompt.

AUGMENTING THE PROMPT WITH THE RETRIEVED CONTEXT

In its simplest form, “augmentation” means combining the user’s original prompt with the retrieved texts.
This equips the model with both the prompt and the context needed to address the prompt. In practice, the
structure of the new prompt that combines the retrieved texts and the user’s prompt can impact the quality of
the final result.

For example, the final prompt usually includes an instruction on how to use the context:

```
Based on the following context, answer the user's question. Context:

{context}

User question:

{user question}
```

Depending on the model, putting the context first might be more or less effective than putting the user’s
question first. The phrasing might also be consequential. For example, you might want to phrase the instruction
to emphasize that the model should generate its answer using only the retrieved context.
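
A minimal sketch of this augmentation step in Python; the instruction wording, the ordering of context and question, and the chunk separator are all assumptions worth experimenting with.

def build_prompt(context_chunks: list[str], user_question: str) -> str:
    """Combine retrieved chunks and the user's question into one augmented prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Based on the following context, answer the user's question. "
        "Answer using only the provided context.\n\n"
        f"Context:\n{context}\n\n"
        f"User question:\n{user_question}"
    )

augmented_prompt = build_prompt(
    ["...first retrieved chunk...", "...second retrieved chunk..."],
    "What are Databricks Asset Bundles?",
)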

CONTEXT WINDOW: HOW MUCH CONTEXT SHOULD WE PROVIDE?

LLMs are limited by a “context window,” or the amount of text they can process to generate a response.
Designing a RAG system involves ensuring that all retrieved texts and the user’s prompt fit within this window.
Overloading the system with too many texts might lead to errors or lost context.

Some LLMs boast longer context windows, capable of handling texts as lengthy as short books. But this doesn’t
mean adding more texts to the user’s prompt will always be beneficial. LLMs sometimes struggle to pay equal
attention to all parts of a lengthy context. They typically focus more effectively on the beginning and end,
potentially overlooking the middle content. This is known as the “lost in the middle” phenomenon. Hence,
even with longer context windows, careful selection and arrangement of texts are crucial for augmenting
prompts effectively.
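
One simple way to guard against overflowing the context window is to keep only as many of the top-ranked chunks as fit within a budget. The character-based budget below is a rough stand-in for a real token count, which you would normally compute with the model's own tokenizer.

def fit_to_budget(ranked_chunks: list[str], budget_chars: int = 12000) -> list[str]:
    """Keep the highest-ranked chunks that fit within a rough character budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:   # assumed ordered from most to least relevant
        if used + len(chunk) > budget_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    return selected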

EXAMPLE: PROMPT AUGMENTATION

Our question for the RAG application was “What are Databricks Asset Bundles?” and we retrieved the following
two chunks for context:

['What are Databricks Asset Bundles?\nJanuary 08, 2024\n\nIn this article you will learn the basics of using Databricks Asset Bundles, a new tool for streamlining the development of complex data, analytics, and ML projects for the Databricks platform. Bundles make it easy to manage complex projects during active development by providing CI/CD capabilities to your software development workflow in a single concise and declarative YAML syntax. By using bundles to automate your project's tests, deployments, and configuration management you can reduce errors while promoting software best practices across your organization as templated projects.\n\nPreview\n\nThis feature is in Public Preview.\n\nBundles provide a way to include metadata alongside your project's source files to specify information including:\n\nRequired cloud infrastructure and workspace configurations.\n\nUnit and integration tests.',

'Databricks Assets Bundles are an infrastructure-as-code (IaC) approach to managing your Databricks projects. Use them when you want to manage complex projects where multiple contributors and automation are essential, and continuous integration and deployment (CI/CD) are a requirement. Since bundles are defined and managed through YAML templates and files you create and maintain alongside source code, they map well to scenarios where IaC is an appropriate approach.\n\nSome ideal scenarios for bundles include:\n\nDevelop data, analytics, and ML projects in a team-based environment. Bundles can help you organize and manage various source files efficiently. This ensures smooth collaboration and streamlined processes.\n\nIterate on ML problems faster. Manage ML pipeline resources (such as training and batch inference jobs) by using ML projects that follow production best practices from the beginning.']

We use these and some more general instructions to construct the final prompt we send to the model:

You are a helpful assistant. Answer the user's question. If context is provided, you must answer based only on the context. If no context is provided, answer based on your knowledge. If you don't know the answer, say you don't know. Be concise.

Answer the question based on the provided context. Context:

What are Databricks Asset Bundles?\nJanuary 08, 2024\n\nIn this article you will learn the basics of using Databricks Asset Bundles, a new tool for streamlining the development of complex data, analytics, and ML projects for the Databricks platform. Bundles make it easy to manage complex projects during active development by providing CI/CD capabilities to your software development workflow in a single concise and declarative YAML syntax. By using bundles to automate your project's tests, deployments, and configuration management you can reduce errors while promoting software best practices across your organization as templated projects.\n\nPreview\n\nThis feature is in Public Preview.\n\nBundles provide a way to include metadata alongside your project's source files to specify information including:\n\nRequired cloud infrastructure and workspace configurations.\n\nUnit and integration tests.

Databricks Assets Bundles are an infrastructure-as-code (IaC) approach to managing your Databricks projects. Use them when you want to manage complex projects where multiple contributors and automation are essential, and continuous integration and deployment (CI/CD) are a requirement. Since bundles are defined and managed through YAML templates and files you create and maintain alongside source code, they map well to scenarios where IaC is an appropriate approach.\n\nSome ideal scenarios for bundles include:\n\nDevelop data, analytics, and ML projects in a team-based environment. Bundles can help you organize and manage various source files efficiently. This ensures smooth collaboration and streamlined processes.\n\nIterate on ML problems faster. Manage ML pipeline resources (such as training and batch inference jobs) by using ML projects that follow production best practices from the beginning.

We’ve now augmented the original prompt with the context needed to address it and with instructions on how
to use the context. The last step is to pass this along to the LLM.

Generation: Producing useful output with an LLM

After the retrieval and augmentation steps, we have a prompt and a set of retrieved texts, formatted with
instructions on how to use the texts to answer the prompt. In the generation step of RAG, we send the
augmented prompt to an LLM, and the LLM responds with an answer.

At this stage, many approaches can be used to customize the final output and adjust the end user’s
experience with the RAG system.

PROMPT ENGINEERING

As noted in the previous section, we can include instructions on how the model should use the retrieved
context (e.g., “answer only based on the provided context”). We can provide similar instructions to the LLM
to guide its output tone and structure. We might, for example, specify that the answers should be very polite
or should be phrased in a short and direct manner. We could also instruct the model to refuse to answer
questions about a competitor’s products.

PREPROCESSING AND POSTPROCESSING

Some LLMs are quite good at following instructions, but none are completely reliable. Some programmatic
preprocessing or postprocessing can be used to structure the output in a certain way or to guarantee that
certain types of responses are or are not generated. For example, we could append a list of links to the most
relevant context sources at the end of each response, or we could automatically respond with a polite refusal
to answer if specific words or phrases are included in the user’s prompt.
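
A small postprocessing sketch along these lines; the metadata keys ('title' and 'url') are illustrative and depend on what was recorded during data preparation.

def add_citations(answer: str, sources: list[dict]) -> str:
    """Append a simple source list built from chunk metadata so users can verify the answer."""
    if not sources:
        return answer
    lines = [f"- {s.get('title', 'source')}: {s.get('url', 'n/a')}" for s in sources]
    return answer + "\n\nSources:\n" + "\n".join(lines)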

CONVERSATION HISTORY

Another important consideration is whether the RAG system should have a memory. In other words, can it
engage in multi-turn conversations in which it remembers earlier questions and responses? A conversational
AI system can give users the opportunity to refine their questions and ask follow-ups, but it adds some
complexity. Different models and frameworks use different approaches to storing conversation histories, and
including conversation histories may require more careful management of context length.
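
A minimal sketch of one approach: keep the running conversation as a list of role/content messages and trim older turns as it grows. Here call_llm is a placeholder for whatever chat-completion API you use, and the trimming rule is an assumption; frameworks offer more sophisticated memory strategies.

history = [{"role": "system",
            "content": "You are a helpful assistant. Answer only from the provided context."}]

def ask(user_question: str, context: str) -> str:
    history.append({"role": "user",
                    "content": f"Context:\n{context}\n\nQuestion:\n{user_question}"})
    answer = call_llm(history)   # placeholder for your chat-completion call
    history.append({"role": "assistant", "content": answer})
    # Keep the system message plus the most recent turns so the history
    # continues to fit in the model's context window.
    del history[1:-6]
    return answer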

EXAMPLE: RAG OUTPUT

After we generate our augmented prompt, it is sent to the LLM. This is the response:

User: What are Databricks Asset Bundles?

Bot: Databricks Asset Bundles are a way to manage complex data, analytics, and machine learning projects on the Databricks platform using an infrastructure-as-code (IaC) approach. They provide a way to include metadata alongside project source files to specify information such as required cloud infrastructure and workspace configurations, unit and integration tests, and CI/CD capabilities. Bundles are defined and managed through YAML templates and files, making it easier to manage complex projects with multiple contributors and automation, and promote software best practices across an organization. They are particularly useful in scenarios such as team-based development, iterating on machine learning problems, and managing ML pipeline resources.

One way to enhance the generation step of RAG is to add multi-turn conversation ability. This allows us to ask
a follow-up question that refers back to the original question and answer:

User: What version of the CLI do I need in order to use them?

Bot: According to the provided context, you need Databricks CLI version 0.205.2 or higher to use Databricks Asset Bundles. You can check the version of the Databricks CLI you are currently using by running the command `databricks --version` in your terminal. If you don't have the required version, you can install or update the Databricks CLI following the instructions provided in the article.

Evaluation: Measuring RAG performance


Because a RAG application has many adjustable variables that may affect the retrieval or generation quality, it’s
important to have ways to measure its performance.

RAG evaluation is an area of active research and experimentation. It’s one of the most challenging parts of
setting up a RAG application, and there’s no universal solution. As we’ve discussed, RAG comprises multiple
steps, and it’s often helpful to evaluate these steps separately. A great LLM won’t be able to compensate for a
poor retrieval pipeline, and the best retrieval system can’t overcome the limitations of a weak model.

At its core, RAG evaluation involves generating prompts, identifying the relevant records that should be retrieved to address each prompt, and writing good answers to those prompts. Running the evaluations means passing each evaluation prompt to the RAG application and comparing the desired retrievals and responses to the actual retrievals and responses.

RAG evaluations often rely on other LLMs to judge response quality. For example, RAG responses are often
evaluated on their “faithfulness” to the provided context. A judge LLM examines the context and the end
response from a RAG application and provides a rating on how true the response is to the context.
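
A bare-bones sketch of such an LLM-as-judge check; call_llm is again a placeholder for your chat-completion call, and the rating scale and prompt wording are assumptions. Production evaluation frameworks add careful prompt design, multiple metrics and robust parsing.

FAITHFULNESS_PROMPT = """You are grading a RAG response.

Context:
{context}

Response:
{answer}

On a scale of 1 (contradicts the context) to 5 (fully supported by the context),
how faithful is the response to the context? Reply with a single number."""

def judge_faithfulness(context: str, answer: str) -> int:
    """Ask a judge LLM for a 1-5 faithfulness rating (parsing kept deliberately simple)."""
    reply = call_llm(FAITHFULNESS_PROMPT.format(context=context, answer=answer))
    return int(reply.strip().split()[0])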

Utilizing RAG With Other Modeling and Model Customization Methods

At this point, you should have a good idea of what RAG is and how it compares to using an LLM alone. However, RAG isn’t the only approach to customizing LLMs or equipping them with new information. In this section, we explain how RAG fits into the broader context of LLM customization approaches.

All of the following approaches involve tradeoffs between cost, complexity and expressive power. "Cost" refers simply to the financial cost of setting up and using a given model or system. "Complexity" means the intricacy or technical difficulty, which may be reflected in the time, effort and expertise required. And "expressiveness" refers to the model’s or system’s ability to generate diverse, meaningful and useful responses tailored to your specific needs.

These methods are not mutually exclusive and should be used in combination to maximize task- or domain-specific performance.

Prompt engineering
Prompt engineering is the process of designing prompts or prompt templates that guide a model’s outputs
toward a desired result. It’s typically the least complex approach and entails the lowest up-front costs
because it doesn’t involve changing the model’s weights or working with any external data systems.

The cost associated with prompt engineering will vary. Large and highly capable models are often required in
order to understand and follow complex prompts. These models often entail higher serving costs or per-token
costs than smaller, less-capable models. That said, prompt engineering doesn’t come with the high up-front
costs of training a model or setting up the infrastructure for a production RAG system.

The expressiveness obtainable via prompt engineering is fundamentally limited by the underlying model.
Prompt engineering offers a good alternative to RAG in cases where there’s no need for proprietary
or recent knowledge.

Fine-tuning
Fine-tuning is the process of adapting a pretrained generative model to a new domain or task by training all or
some of the model’s weights (or, in the case of adapter methods, new weights) on new data. The primary goal
of fine-tuning is to enhance the model’s expressiveness and accuracy in handling domain-specific queries
or tasks. For example, a language model might be fine-tuned to follow instructions based on a large dataset
of instruction and response data. Or a model could be fine-tuned on a collection of medical texts in order to
better understand medical jargon.

The cost and complexity of this process can vary greatly, depending on factors such as the size of the model,
the quantity and specificity of the training data and the nature of the task. Fine-tuning can sometimes be
used to reduce costs. A smaller model fine-tuned on a specific task can replace a larger and more expensive
generalist model.
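For illustration, here is a minimal fine-tuning sketch using the Hugging Face transformers and peft libraries with LoRA adapters. The base model, dataset file and hyperparameters are placeholder assumptions chosen for brevity, not recommendations.

    # Minimal, hypothetical fine-tuning sketch using Hugging Face transformers + peft (LoRA).
    # Base model, dataset file and hyperparameters are illustrative assumptions.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_name = "gpt2"  # placeholder; any causal LM follows the same pattern
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Wrap the base model with LoRA adapters so only a small set of new weights is trained;
    # some base models require target_modules to be set explicitly in LoraConfig.
    model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

    # Assumption: a plain-text corpus of domain-specific documents
    dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="finetuned-model",
                               num_train_epochs=1,
                               per_device_train_batch_size=4),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()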

FINE-TUNING AND RAG

While RAG excels in enhancing a model’s responses with additional, relevant information, it doesn’t
fundamentally change the model’s behavior or linguistic style. Any limitations or quirks of the base model will
still be present in a RAG system, while fine-tuning can durably change the model’s behavior in ways that are
less constrained by the base model’s behavior.

On the other hand, fine-tuning doesn’t include a straightforward mechanism for rapidly updating the model
with new information, and it may not be as reliable as RAG for generating relevant responses, even if the model
was fine-tuned on relevant data.

Pretraining
Pretraining is the process of training an LLM from scratch. This is the highest-cost and highest-complexity
approach, but it offers the greatest potential control over the model’s expressiveness.

Pretraining a model gives you control over all the data that goes into it. This might mean including proprietary
data not available to off-the-shelf models or excluding data from sources not deemed trustworthy, reliable or
legally acceptable in a given business context. For example, you may decide not to include Reddit data when
pretraining a model to give legal or financial advice.

Consider pretraining when it is essential to understand and control all the data a model is trained on or when
you need a domain-specific model that meets certain evaluation or performance requirements not available in
existing models.

RAG
RAG is more complex than prompt engineering alone. It requires setting up a retrieval system (i.e., a vector
database) and integrating the retrieved context with the prompt.

If access to external information is the goal, RAG offers many benefits:

■ Ability to add and remove data sources without changing the model
■ Control over who has permission to access certain data sources
■ Flexibility to compare different LLMs without needing to train them on new data

The cost and complexity of a RAG system depend on the choice of model and on the scale and structure of
the retrieval system. As with any database, a vector database that guarantees low-latency retrieval over a vast
number of records will be more costly than a higher-latency, smaller-scale system. But a RAG system alone will
not entail the up-front cost and complexity of pretraining or fine-tuning a model.

The expressiveness of a RAG system, though limited by the choice of LLM, can still be quite high given its
access to contextually relevant external data sources.
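The core retrieve-then-generate loop is short. The sketch below assumes a hypothetical vector_index with a similarity_search method and a generic llm callable; it simply shows how retrieved context is merged with the user’s question into a single prompt.

    # Hypothetical RAG flow: retrieve relevant chunks, then generate with them in the prompt.
    RAG_TEMPLATE = """Answer the question using only the context below.

    Context:
    {context}

    Question: {question}
    Answer:"""

    def answer_with_rag(question, vector_index, llm, k=3):
        # Retrieve the k most similar document chunks for the question
        chunks = vector_index.similarity_search(question, k=k)
        context = "\n\n".join(chunks)
        # Merge the retrieved context and the question into one prompt for the model
        return llm(RAG_TEMPLATE.format(context=context, question=question))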

Combinations of methods
These methods can and often should be used together. RAG and prompt engineering are already inseparable
— merging the user’s prompt with the external data sources is a form of prompt engineering.

Using a custom pretrained or fine-tuned model in a RAG system can improve it by offering fine-grained
control over the model’s training data, the tone and structure of responses and the model’s aptitude for
domain-specific language.

In general, it’s good practice to start with less costly, less complex methods and evaluate their performance.
Moving to more complex methods or combinations of methods is a good option when the simpler methods
prove inadequate for their tasks.

RAG on Databricks

While it’s possible to set up a quick RAG demo on your laptop in a few minutes, a production-ready RAG
system requires careful orchestration of several different components in a reliable, scalable and secure
manner. Databricks offers an end-to-end RAG solution combining data management and governance with a
vector database, model serving and other tools for managing and monitoring AI processes.

Lakehouse architecture
RAG applications in Databricks are built on lakehouse architecture. The lakehouse centralizes the management
of structured data, unstructured data and AI assets under a common governance scheme, Unity Catalog.
Organizations can build cloud-agnostic RAG systems on proprietary data with sophisticated security, lineage
tracking and monitoring.

Vector Search
Databricks Vector Search enables you to create an auto-updating vector database from Delta tables,
managed via Databricks Unity Catalog and searchable using a simple API. Databricks Vector Search scales
automatically to handle different numbers of documents and queries.
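As a hedged sketch of how an index might be created and queried from Python with the databricks-vectorsearch client (the endpoint, catalog, table and column names below are placeholder assumptions, and method names should be checked against the current documentation):

    # Hedged sketch: creating and querying a Delta Sync index with Databricks Vector Search.
    # Endpoint, table and column names are placeholder assumptions.
    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient()

    # Create an index that stays in sync with a source Delta table governed by Unity Catalog
    index = client.create_delta_sync_index(
        endpoint_name="my_vector_search_endpoint",
        index_name="main.docs.docs_index",
        source_table_name="main.docs.chunked_docs",
        pipeline_type="TRIGGERED",
        primary_key="chunk_id",
        embedding_source_column="chunk_text",
        embedding_model_endpoint_name="databricks-bge-large-en",
    )

    # Query the index for the chunks most similar to a question
    results = index.similarity_search(
        query_text="How do I configure a Databricks Asset Bundle?",
        columns=["chunk_id", "chunk_text"],
        num_results=3,
    )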

Model serving
Databricks Model Serving provides a number of different ways to host and use LLMs and embedding models
for RAG. Databricks Model Serving supports custom models via MLflow; governance of external models such as
SaaS models from OpenAI, Anthropic or Google; and state-of-the-art open source models from the Databricks
Foundation Model APIs, which are available with both pay-per-token and provisioned throughput options.
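For example, a served endpoint, whether a Foundation Model API, an external model or a custom MLflow model, can be queried with the MLflow Deployments client. The endpoint name below is a placeholder assumption.

    # Hedged sketch: querying a Databricks Model Serving endpoint via the MLflow Deployments client.
    from mlflow.deployments import get_deploy_client

    client = get_deploy_client("databricks")

    response = client.predict(
        endpoint="databricks-meta-llama-3-70b-instruct",  # placeholder endpoint name
        inputs={
            "messages": [
                {"role": "user", "content": "Summarize what a vector database does."}
            ],
            "max_tokens": 128,
        },
    )
    print(response)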

MLflow
MLflow is an open source end-to-end model lifecycle management platform. It includes a variety of tools
useful for implementing and improving RAG systems, including an evaluation framework and a prompt
engineering UI.

Lakehouse Monitoring
Databricks Lakehouse Monitoring provides a centralized monitoring solution for both model and data
monitoring. It allows you to keep track of various statistical properties for all your data sources as well as to
monitor the performance of your served models.

Summary

Whether you’re looking to disrupt traditional industries, enhance creative endeavors or solve complex
problems in novel ways, the potential applications of generative AI are limited only by your imagination and
willingness to experiment. Remember, every significant advancement in this field began with a simple idea and
the courage to explore it further.

For those seeking more knowledge or who are simply curious about the latest developments in the realm of
generative AI, we’ve provided some resources on training, demos and product information.

GenAI training
■ Generative AI engineer learning pathway: Take self-paced, on-demand and instructor-led courses on
generative AI
■ Free LLM course (edX): An in-depth course to learn GenAI and LLMs inside and out
■ GenAI webinar: Learn how to take control of your GenAI app performance, privacy and cost, and drive
value with generative AI

Additional resources
■ The Big Book of MLOps: A deep dive into the architectures and technologies behind MLOps — including
LLMs and GenAI
■ Mosaic AI: Product page covering the features of Mosaic AI within Databricks
■ The Big Book of Generative AI: Best practices for building production-quality GenAI applications
Build Production-Quality GenAI Applications — See How
Create high-quality generative AI applications and ensure your output is accurate,
governed and safe. See why over 10,000 organizations worldwide rely on Databricks for
all their workloads from BI to AI — test-drive the full Databricks Platform free for 14 days.

Try Databricks free
Take Generative AI Fundamentals On-Demand Training

About Databricks
Databricks is the data and AI company. More than 10,000 organizations worldwide —
including Comcast, Condé Nast, Grammarly and over 50% of the Fortune 500 — rely
on the Databricks Data Intelligence Platform to unify and democratize data, analytics
and AI. Databricks is headquartered in San Francisco, with offices around the globe,
and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake
and MLflow. To learn more, follow Databricks on LinkedIn, X and Facebook.

© Databricks 2024. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation. Privacy Policy | Terms of Use
