Basics of Retrieval-Augmented Generation or RAG
As interest in large language models, or LLMs, increases, developers are exploring ways to harness their
potential.
However, pre-trained LLMs might not perform optimally right out of the box for your business needs.
You may need to decide between model fine-tuning, a process where a pre-trained model is further trained on a new dataset without starting from scratch, and Retrieval-Augmented Generation to enhance performance.
In this episode, we will explore what RAG is and a pattern to implement RAG using Amazon Bedrock
foundation models and other AWS Services.
RAG can be particularly useful in developing applications, like Q&A chatbots, that securely interact with your internal knowledge bases or enterprise data sources.
Such an approach is more suitable than out-of-the-box LLMs, which may lack your enterprise-specific knowledge. Let's dive into understanding what Retrieval-Augmented Generation is.
Retrieval-Augmented Generation retrieves data from outside the foundation model and augments your prompt, the natural language text that asks the LLM to perform a specific task, by adding the relevant retrieved data as context.
It is composed of three components: retrieval, augmentation, and generation. Upon receiving a user query, relevant content is retrieved from external knowledge bases or other data sources based on the specifics of the query.
The retrieved contextual information is then appended to the original user query, creating an
augmented query to serve as the input to the foundation model.
The foundation model then generates a response based on the augmented query.
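To make this flow concrete, here is a minimal sketch in Python, assuming the boto3 SDK and the Amazon Bedrock Converse API; the retrieve_documents helper and the model ID are placeholders standing in for your own retrieval layer and model choice.

```python
import boto3

# Client for invoking foundation models on Amazon Bedrock.
bedrock = boto3.client("bedrock-runtime")

def retrieve_documents(query: str) -> list[str]:
    # Placeholder: look up passages relevant to the query from your knowledge base
    # (for example, via a vector-store similarity search).
    return ["<relevant passage 1>", "<relevant passage 2>"]

def answer_with_rag(query: str) -> str:
    # 1. Retrieval: fetch context relevant to the user query.
    context = "\n".join(retrieve_documents(query))

    # 2. Augmentation: prepend the retrieved context to the original query.
    augmented_prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # 3. Generation: send the augmented prompt to a Bedrock foundation model.
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        messages=[{"role": "user", "content": [{"text": augmented_prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```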
With this high-level flow in mind, let's cover a few different types of retrieval and see where RAG fits in. The three typical types of retrieval are: first, rule-based, in which unstructured data, such as documents, is fetched through keyword-based searches; second, transaction-based, where transactional data is retrieved from a database or an API; and third, semantic-based, where the model retrieves relevant documents based on text embeddings (a technique that converts text into numerical vectors representing the meaning and context of the words).
Semantic retrieval is where RAG is most applicable. First, let's further define embeddings and their relevance when implementing RAG.
Embedding refers to transforming data, like text, images, or audio, into numerical representations in a high-dimensional vector space using machine learning algorithms. This allows models to capture semantics, learn complex patterns, and use the vector representations for applications like search, classification, and natural language processing.
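As a toy illustration of why this matters, the sketch below compares made-up three-dimensional vectors using cosine similarity; real embedding models produce vectors with hundreds or thousands of dimensions, but the idea is the same: similar meaning, nearby vectors.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: closer to 1.0 means the vectors point in similar directions.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy embeddings for illustration only (not produced by a real model).
reset_password = [0.9, 0.1, 0.2]   # "How do I reset my password?"
recover_account = [0.8, 0.2, 0.1]  # "Steps to recover a forgotten password"
pizza_recipe = [0.1, 0.9, 0.7]     # "Best pizza dough recipe"

print(cosine_similarity(reset_password, recover_account))  # high  -> similar meaning
print(cosine_similarity(reset_password, pizza_recipe))     # lower -> unrelated
```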
Let's take a deeper look at an end-to-end RAG architecture leveraging AWS Services.
Step 1: First, you start with the selection of a large language model. Some considerations to keep in mind are use cases, context length, hosting, training data if applicable, customization, and license agreements. For this, you can use Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models from leading AI companies via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
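If it helps while making that selection, here is a minimal sketch, assuming the boto3 SDK, of browsing the foundation models available in your Region; the Region and filter value are illustrative.

```python
import boto3

# Control-plane client for Amazon Bedrock (model listing, not inference).
bedrock = boto3.client("bedrock", region_name="us-east-1")

# List text-generation models with a few attributes relevant to selection.
models = bedrock.list_foundation_models(byOutputModality="TEXT")["modelSummaries"]
for model in models:
    print(model["modelId"], model["providerName"], model.get("inferenceTypesSupported"))
```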
Step 2: Now, with the LLM selection in place, you will start by identifying your knowledge sources and converting them into embeddings for a vector store. For embedding models, factors such as maximum input size, latency, output embedding size, ease of hosting, and accuracy are crucial considerations. Your options include Amazon Titan Embeddings, Cohere Embed, and other embedding models. For your query, generate its embedding using the same embedding model.
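As a rough sketch of what generating an embedding looks like, assuming the boto3 SDK and Amazon Titan Text Embeddings on Amazon Bedrock; treat the model ID as an example to verify for your Region.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    # Invoke the Titan embedding model; the same model must be used for both
    # your documents and your queries so the vectors are comparable.
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # example embedding model ID
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]

query_vector = embed("What is our refund policy?")
print(len(query_vector))  # dimensionality of the embedding vector
```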
Step 3: Next are vector databases, which provide you the ability to store and retrieve vectors as high-dimensional points. With this, you can index the vectors generated by your embedding model into a vector database. When evaluating options, consider the nature of your data sources and formats, dimensions, the choice between fully managed and self-managed services, development complexity, and scalability. Available vector store options include the vector engine for Amazon OpenSearch Serverless, Amazon Kendra, and Amazon Aurora PostgreSQL with pgvector. Now, let's talk about orchestration for all of these components. Some available options are Knowledge Bases and Agents for Amazon Bedrock, LangChain, LlamaIndex, AWS Step Functions, as well as other open-source solutions.
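As one orchestration example, here is a rough sketch, assuming a knowledge base already created with Knowledge Bases for Amazon Bedrock and the boto3 SDK; the knowledge base ID and model ARN are placeholders for your own resources.

```python
import boto3

# Runtime client for Knowledge Bases and Agents for Amazon Bedrock.
agent_runtime = boto3.client("bedrock-agent-runtime")

# Retrieval, augmentation, and generation handled in a single managed call.
response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",  # placeholder
        },
    },
)
print(response["output"]["text"])
```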
In this episode, we discussed the basics of RAG and reference patterns for implementation. We covered the basics of embeddings and why they are important when implementing RAG in your applications.
Finally, we saw an end-to-end architecture using AWS Services, including Amazon Bedrock. Check out
the links in the description below for more details. Thank you for watching "Back to Basics". See you
next time.