Introduction To RAG (Retrieval Augmented Generation) and Vector Database
1. Outdated Knowledge: LLMs can’t access new information after they’re trained. They
rely only on the data they were trained with, so they can’t provide real-time or up-to-
date information.
2. Factual Mistakes: LLMs can generate fluent text but sometimes give incorrect or
misleading answers, especially on less common or specialized topics.
3. No Access to External Information: LLMs can’t look up answers from external sources,
like the internet or a database, which limits their ability to provide specific or accurate
details on certain topics.
image by CampusX
The process of IR generally consists of a few key steps, the first of which is indexing.
Indexing involves converting external data sources into numerical representations, making
it easier to search through large datasets. For example, if you’re looking for information on
the “ICC Cricket World Cup 2023,” IR systems will scan through the database and rank all
related documents, prioritizing the ones that contain the most relevant information.
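To make this concrete, here is a minimal sketch of the indexing step, assuming the sentence-transformers library; the model name and the toy documents are illustrative, not from the original article:

```python
# A minimal sketch of the indexing step (model name and documents are
# illustrative assumptions, not from the article).
from sentence_transformers import SentenceTransformer

# External data source: a tiny corpus of documents to index
documents = [
    "The ICC Cricket World Cup 2023 was hosted by India.",
    "Vector databases store embeddings for fast similarity search.",
    "RAG combines retrieval with text generation.",
]

# Convert each document into a numerical representation (embedding)
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(documents, normalize_embeddings=True)

print(doc_embeddings.shape)  # (3, 384): one vector per document
```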
Workflow of RAG:
In Retrieval-Augmented Generation (RAG), the workflow revolves around three main
components: Retrieve, Augment, and Generate. Here’s a detailed breakdown of each
phase:
1. Retrieve
This phase is responsible for fetching relevant information from an external knowledge
base, database, or document repository. The process begins with a query, usually derived
from the user’s input or a given prompt.
Embedding Model: The input query is first converted into vector embeddings using an
embedding model. This model maps the input into a numerical form that can be used
for similarity searches.
Vector Database: Once the query is embedded, it is sent to a Vector DB, which contains
embeddings of documents, text data, or any relevant external information. This
database is indexed based on vector similarity (cosine similarity is often used).
Retriever & Ranker: A retriever component then selects the top N documents or
relevant data points based on similarity. These documents are ranked in order of
relevance, typically using semantic search or other retrieval algorithms like sparse or
dense retrieval methods.
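Continuing the indexing sketch above, here is a rough illustration of the retrieve step using plain NumPy cosine similarity; a production system would query a vector database instead of an in-memory array, and the query text and variable names are illustrative:

```python
import numpy as np

def retrieve(query, model, documents, doc_embeddings, top_n=2):
    """Embed the query and return the top-N most similar documents."""
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    # With normalized vectors, the dot product equals cosine similarity
    scores = doc_embeddings @ query_embedding
    # Rank documents from most to least similar and keep the top N
    ranked = np.argsort(scores)[::-1][:top_n]
    return [(documents[i], float(scores[i])) for i in ranked]

results = retrieve("Who hosted the Cricket World Cup 2023?",
                   model, documents, doc_embeddings)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```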
2. Augment
In this phase, the retrieved information is used to provide additional context to the query or
prompt, enhancing the model’s understanding of the task.
Retrieved Context: The top N documents fetched in the retrieval stage are passed back
to the model as retrieved context. This information is appended or "augmented" with
the original user query to provide additional details and improve the relevance and
accuracy of the response.
The goal here is to leverage both the external knowledge base and the model’s trained
knowledge to handle specific or unseen questions better.
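Continuing the example, a minimal sketch of this augmentation step might look like the following; the prompt template is an illustrative assumption, and `results` comes from the retrieve sketch above:

```python
def build_augmented_prompt(query, retrieved_docs):
    """Combine retrieved context with the original query into one prompt."""
    context = "\n".join(f"- {doc}" for doc, _score in retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = build_augmented_prompt("Who hosted the Cricket World Cup 2023?", results)
print(prompt)
```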
3. Generate
The final stage is responsible for generating the actual output, which combines the original
prompt/query with the augmented data from the retrieval phase.
LLM (Large Language Model): The augmented prompt, together with the retrieved
context, is passed to the LLM (e.g., GPT or another generative transformer-based model). The
LLM processes the input and generates a response that is more context-aware and
accurate, thanks to the extra information it received from the retrieval phase.
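As a rough sketch of the generate step, the augmented prompt can be sent to any chat-style LLM; the example below assumes the OpenAI Python client and an illustrative model name:

```python
# Sketch of the generate step, assuming the OpenAI Python client is installed
# and OPENAI_API_KEY is set in the environment; the model name is illustrative.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],  # augmented prompt from above
)
print(response.choices[0].message.content)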
Breaking the RAG pipeline down component by component:
1. Knowledge Base:
This represents a large corpus of documents or information that may not be part of the
trained model (such as private databases, custom datasets, or any other source of external
knowledge).
2. Embedding Model:
Documents from the knowledge base (and the incoming query) are converted into vector
embeddings by an embedding model, so they can be compared numerically.
3. Vector DB:
The embeddings are stored in a Vector Database, which allows for efficient similarity
search. When a query is issued, the database retrieves the documents based on their
vector similarity to the query.
4. Retriever & Ranker:
Retriever: When a query is provided, the retriever pulls the top N most relevant
documents (based on vector similarity).
Ranker: The ranker orders these documents by relevance (using similarity
measures such as cosine similarity).
5. Retrieved Context:
The top N most relevant documents or pieces of text are selected as the retrieved
context to be used by the generative model in the next step.
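To make the vector-database component more concrete, here is a minimal sketch using the faiss library as an in-process index over the embeddings from the earlier sketch; a hosted vector database would expose a similar add/search interface, and the query text is illustrative:

```python
import faiss
import numpy as np

# Build an index over the document embeddings (inner product on
# normalized vectors is equivalent to cosine similarity)
dim = doc_embeddings.shape[1]
index = faiss.IndexFlatIP(dim)
index.add(np.asarray(doc_embeddings, dtype="float32"))

# Retriever: embed the query and pull the top-N nearest documents
query_vec = model.encode(["Who hosted the Cricket World Cup 2023?"],
                         normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 2)

# Ranker: FAISS already returns results ordered by similarity
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {documents[i]}")
```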
This highlights the limitation of purely generative models (such as LLMs) that do not
have access to new or external data beyond their training corpus: their answers may not
be up-to-date.
2. Embedding Model:
This step represents how the query and retrieved documents are encoded into
embeddings to be processed by the generative model (e.g., LLMs).
The LLM (like GPT, BART, T5, etc.) takes the retrieved context (from the left side of the
diagram) along with the original query and generates a response. This helps improve
the factual accuracy of the output by integrating external, relevant documents.
The user receives the final output, which is more informed and factually correct, as it
integrates both generative capabilities and retrieved information.
So far, we have covered the basics of RAG. Now, let’s delve into the concept of Vector
Databases.
In a vector database, words, sentences, or documents are stored as numerical vectors
(embeddings). These vectors allow us to capture the semantic similarity between words. For example, the
vectors for “Apple” and “orange” might show similarities in their fruit-related attributes,
while vectors for “Apple” and “Samsung” would highlight their similarities in the tech
context.
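As a small illustration of this idea, assuming the same sentence-transformers model used earlier (the exact scores will vary by model):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
words = ["apple", "orange", "samsung"]
vecs = model.encode(words, normalize_embeddings=True)

# Cosine similarity between normalized vectors is just the dot product
print("apple vs orange :", float(vecs[0] @ vecs[1]))
print("apple vs samsung:", float(vecs[0] @ vecs[2]))
```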
Traditional relational databases might initially seem like a viable option. You’d generate
embeddings, store them in a SQL database, and then compare new query embeddings to
retrieve relevant results. However, this approach struggles with scalability and efficiency
when dealing with vast amounts of data.
To address these challenges, vector databases come into play. They are optimized for
storing and querying large-scale vector embeddings. Instead of linear search, which is
computationally expensive and slow for large datasets, vector databases use techniques like
indexing and locality-sensitive hashing (LSH) to speed up searches.
1. Faster Searches: By employing techniques like LSH, they can rapidly locate relevant
vectors.
2. Optimal Storage: They are designed to handle the unique requirements of vector data,
ensuring efficient storage and retrieval.
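As a rough sketch of such approximate indexing, the faiss library exposes an LSH-based index alongside exact ones; the dimensions, dataset size, and bit count below are arbitrary placeholders:

```python
import faiss
import numpy as np

dim = 384
rng = np.random.default_rng(0)
# Stand-in for a large collection of document embeddings
vectors = rng.standard_normal((10_000, dim)).astype("float32")

# Locality-sensitive hashing index: vectors are reduced to short binary
# codes, so candidate neighbours can be found without a full linear scan
index = faiss.IndexLSH(dim, 256)   # 256 hash bits per vector
index.add(vectors)

query = rng.standard_normal((1, dim)).astype("float32")
distances, ids = index.search(query, 5)
print(ids[0])  # approximate nearest neighbours of the query
```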
As vector databases continue to evolve, they are becoming increasingly essential for
applications involving semantic search, recommendation systems, and more.
I hope this overview helps you understand the fundamentals of vector databases and their
significance in modern data retrieval and search technologies.
References:
Research paper on RAG: Retrieval-Augmented Generation for Large Language Models: A Survey
I trust this blog has enriched your understanding of Retrieval Augmented Generation (RAG).
If you found value in this content, I invite you to stay connected for more insightful posts.
Your time and interest are greatly appreciated. Thank you for reading!