
Introduction to RAG (Retrieval Augmented Generation) and Vector Database

Sachinsoni · 8 min read · Sep 15, 2024


Retrieval-Augmented Generation (RAG) is a powerful technique that enhances the capabilities of language models by combining two key processes: retrieving information and generating text. While traditional language models like GPT rely only on what they were trained on, RAG goes a step further by searching for relevant information from external sources, such as a database or documents, to help generate more accurate and detailed answers. This makes RAG especially useful in tasks where up-to-date or specialized knowledge is needed, such as answering questions or generating informative content.
(Image by CampusX)

Limitations of Large Language Models Before RAG:


Large Language Models (LLMs) like GPT-3 have impressive capabilities, but they come with
several key limitations:

1. Outdated Knowledge: LLMs can’t access new information after they’re trained. They
rely only on the data they were trained with, so they can’t provide real-time or up-to-
date information.

2. Factual Mistakes: LLMs can generate fluent text but sometimes give incorrect or
misleading answers, especially on less common or specialized topics.

3. Hallucination Problem: LLMs sometimes “hallucinate,” meaning they confidently generate information that sounds reasonable but is completely false. This happens because they rely only on patterns in their training data rather than real-time information.

4. No Access to External Information: LLMs can’t look up answers from external sources,
like the internet or a database, which limits their ability to provide specific or accurate
details on certain topics.

How RAG Solves These Issues


Retrieval-Augmented Generation (RAG) helps solve many of these problems by allowing
LLMs to fetch relevant information from external sources. Instead of relying solely on their
internal knowledge, RAG models can search databases or documents for real-time, up-to-
date information, which helps reduce hallucinations and improves the accuracy of
generated content. By integrating retrieval, RAG ensures that the model generates more
reliable, fact-based answers, even for specialized or complex queries.

Understanding Information Retrieval Before RAG


Before diving into Retrieval-Augmented Generation (RAG), it’s crucial to understand
Information Retrieval (IR), which plays a foundational role. As the name suggests,
information retrieval is about finding and extracting relevant data from large datasets.
Think of it like this: when we were kids, we used to answer questions based on a given
paragraph. We didn’t use the entire paragraph; instead, we picked the exact information
needed to answer the question. This is the essence of IR — retrieving only the relevant
information from massive collections, which could be text, images, audio, or even video.

(Image by CampusX)

The process of IR generally consists of a few key steps, the first of which is indexing. Indexing converts external data sources into numerical representations so that large datasets can be searched efficiently. At query time, if you’re looking for information on the “ICC Cricket World Cup 2023,” the IR system scans the index, ranks all related documents, and prioritizes the ones that contain the most relevant information.
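To make this concrete, here is a minimal sketch of indexing and ranking using TF-IDF vectors, a classic IR technique. It assumes scikit-learn is installed; the documents and query are illustrative, not from a real system:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative corpus (an assumption for this sketch).
documents = [
    "India won the ICC Cricket World Cup 2011 at home.",
    "The ICC Cricket World Cup 2023 was hosted by India.",
    "Apple announced a new iPhone at its annual event.",
]

# Indexing: convert every document into a numerical (TF-IDF) vector.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

# Retrieval: vectorize the query the same way and rank documents by similarity.
query_vector = vectorizer.transform(["ICC Cricket World Cup 2023"])
scores = cosine_similarity(query_vector, doc_vectors)[0]
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```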

Workflow of RAG:
In Retrieval-Augmented Generation (RAG), the workflow revolves around three main
components: Retrieve, Augment, and Generate. Here’s a detailed breakdown of each
phase:

1. Retrieve
This phase is responsible for fetching relevant information from an external knowledge base, database, or document repository. The process begins with a query, usually derived from the user’s input or a given prompt (a minimal code sketch follows the steps below).

Embedding Model: The input query is first converted into vector embeddings using an
embedding model. This model maps the input into a numerical form that can be used
for similarity searches.

Vector Database: Once the query is embedded, it is sent to a Vector DB, which contains embeddings of documents, text data, or any relevant external information. This database is indexed for fast similarity search (cosine similarity is the measure most often used).

Retriever & Ranker: A retriever component then selects the top N documents or
relevant data points based on similarity. These documents are ranked in order of
relevance, typically using semantic search or other retrieval algorithms like sparse or
dense retrieval methods.
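Here is a minimal sketch of this Retrieve phase. It assumes the sentence-transformers package is installed; the model name and the toy corpus are illustrative choices:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Embedding model: maps text to dense vectors for similarity search.
model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "RAG combines retrieval with text generation.",
    "Vector databases store embeddings for similarity search.",
    "Bananas are rich in potassium.",
]
corpus_embeddings = model.encode(corpus, normalize_embeddings=True)

# Embed the query; with normalized vectors, cosine similarity is a dot product.
query_embedding = model.encode(["How does RAG work?"], normalize_embeddings=True)[0]
scores = corpus_embeddings @ query_embedding

# Retriever & Ranker: take the top-N documents, highest score first.
top_n = 2
retrieved_docs = [corpus[i] for i in np.argsort(scores)[::-1][:top_n]]
print(retrieved_docs)
```

In a production system, the corpus embeddings would live in a dedicated vector database rather than in memory, but the scoring logic is the same.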

2. Augment
In this phase, the retrieved information is used to provide additional context to the query or
prompt, enhancing the model’s understanding of the task.

Retrieved Context: The top N documents fetched in the retrieval stage are passed back
to the model as retrieved context. This information is appended or "augmented" with
the original user query to provide additional details and improve the relevance and
accuracy of the response.

The goal here is to leverage both the external knowledge base and the model’s trained knowledge to handle specific or unseen questions better, as the sketch below shows.
(Image source)
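A minimal sketch of the Augment step: stitch the retrieved passages into the prompt that will be sent to the LLM. The template is an assumption; real systems tune this formatting carefully:

```python
def build_augmented_prompt(query: str, retrieved_docs: list[str]) -> str:
    # Append the retrieved context to the original user query.
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# retrieved_docs comes from the Retrieve sketch above.
prompt = build_augmented_prompt("How does RAG work?", retrieved_docs)
print(prompt)
```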

3. Generate
The final stage is responsible for generating the actual output, which combines the original
prompt/query with the augmented data from the retrieval phase.

LLMs (Large Language Models): The augmented prompt, along with the retrieved context, is passed to an LLM (e.g., GPT or another transformer-based model). The LLM processes the input and generates a response that is more context-aware and accurate, thanks to the extra information it received from the retrieval phase (see the sketch after this list).

Formatted Response: The output is returned as a formatted response, usually displayed in the user interface. The user query is enriched with additional, often domain-specific or up-to-date, information, addressing some of the inherent limitations of standard language models.
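A minimal sketch of the Generate step, using the OpenAI Python client as one example of a chat-capable LLM. The model name is an assumption, and an OPENAI_API_KEY environment variable is expected to be set:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],  # prompt from the Augment step
)
print(response.choices[0].message.content)  # the formatted, context-aware answer
```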

Visualizing the RAG Workflow:


The diagram below outlines the Retrieval-Augmented Generation (RAG) framework, showing how it integrates retrieval methods with a generative approach to improve text generation.
(Image source)

Explanation of the above diagram’s workflow:


A. Left Side (Retrieval Methods)
1. Private or Custom Data:

This represents a large corpus of documents or information that may not be part of the
trained model (such as private databases, custom datasets, or any source of external
knowledge).

2. Embedding Model:

An embedding model (like BERT or Sentence Transformers) is used to convert the documents and the user’s query into dense vector representations (embeddings). These vectors represent the semantic meaning of the documents and queries in high-dimensional space.

3. Vector DB:

The embeddings are stored in a Vector Database, which allows for efficient similarity
search. When a query is issued, the database retrieves the documents based on their
vector similarity to the query.

4. Retriever and Ranker:

Retriever: When a query is provided, the retriever pulls the top N most relevant
documents (based on vector similarity).
Ranker: The ranker ranks these documents based on relevance (using similarity
measures like cosine similarity).

5. Retrieved Context:

The top N most relevant documents or pieces of text are selected as the retrieved
context to be used by the generative model in the next step.

B. Right Side (Generative Approach)


1. External Data Source:

This highlights the limitation of purely generative models (such as LLMs) that do not
have access to new or external data beyond their training corpus. They may:

Not be up-to-date.

Suffer from hallucinations.

Lack specific domain knowledge.

2. Embedding Model:

This step represents how the query and retrieved documents are encoded into
embeddings to be processed by the generative model (e.g., LLMs).

3. LLM (Large Language Model):

The LLM (like GPT, BART, T5, etc.) takes the retrieved context (from the left side of the diagram) along with the original query and generates a response. This helps to improve the factual accuracy of the output by integrating external, relevant documents.

4. Formatted Response (User Interface):

The user receives the final output, which is more informed and factually correct, as it
integrates both generative capabilities and retrieved information.

So far, we have covered the basics of RAG. Now, let’s delve into the concept of Vector
Databases.

Understanding Vector Databases:


In today’s digital landscape, when you perform a search on Google, such as “calories in apple” versus “employees in Apple,” the search engine cleverly distinguishes between the fruit and the company. But have you ever wondered how Google achieves this? The answer lies in a technique known as semantic search.

Semantic search moves beyond simple keyword matching to understand the intent behind a user’s query and leverage context for more accurate results. At its core, semantic search relies on the concept of embeddings — numerical representations of text.
What Are Embeddings?
Embeddings transform words or sentences into numeric vectors. For instance, consider the
word “Apple.” In one context, it could refer to the fruit, while in another, it might denote the
tech company. To represent this, we create a vector that encodes features related to each
context. For “Apple” the fruit, the vector might reflect attributes like “fruit,” “sweet,” and
“edible.” For “Apple” the company, the vector would focus on “technology,” “company,” and
“innovation.”

(Image source)

These vectors allow us to capture the semantic similarity between words. For example, the
vectors for “Apple” and “orange” might show similarities in their fruit-related attributes,
while vectors for “Apple” and “Samsung” would highlight their similarities in the tech
context.
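The following toy illustration uses hand-crafted vectors rather than a real embedding model; each dimension stands for a feature such as “fruit” or “company,” mirroring the example above:

```python
import numpy as np

#                          fruit  sweet  tech  company
apple_fruit   = np.array([ 0.9,   0.8,   0.0,  0.1])
orange        = np.array([ 0.9,   0.7,   0.0,  0.0])
apple_company = np.array([ 0.0,   0.0,   0.9,  0.9])
samsung       = np.array([ 0.0,   0.0,   0.8,  0.9])

def cosine(a, b):
    # Cosine similarity: near 1.0 means similar meaning, near 0.0 unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(apple_fruit, orange))         # high: both fruits
print(cosine(apple_company, samsung))      # high: both tech companies
print(cosine(apple_fruit, apple_company))  # low: different meanings of "Apple"
```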

The Role of Vector Databases:


With thousands or even millions of embeddings to manage, storing and searching these
vectors efficiently becomes crucial.
(Image source)

Traditional relational databases might initially seem like a viable option. You’d generate
embeddings, store them in a SQL database, and then compare new query embeddings to
retrieve relevant results. However, this approach struggles with scalability and efficiency
when dealing with vast amounts of data.
(Image source)

To address these challenges, vector databases come into play. They are optimized for
storing and querying large-scale vector embeddings. Instead of linear search, which is
computationally expensive and slow for large datasets, vector databases use techniques like
indexing and locality-sensitive hashing (LSH) to speed up searches.

Locality-Sensitive Hashing (LSH):


LSH is a technique that partitions vectors into “buckets” based on their similarity. When a
search query is performed, it is hashed into one of these buckets, significantly reducing the
number of comparisons needed. Instead of comparing the query vector with every stored
vector, you only need to compare it with those in the same bucket. This method accelerates
search times and enhances efficiency.
(Image source)
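Here is a minimal sketch of one common LSH scheme, random hyperplanes, where vectors that fall on the same side of every hyperplane land in the same bucket. The dimensions, number of hyperplanes, and data are illustrative assumptions:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(42)
dim, n_planes = 384, 8
planes = rng.normal(size=(n_planes, dim))  # random hyperplanes

def lsh_bucket(vector: np.ndarray) -> str:
    # One bit per hyperplane: which side of the plane the vector falls on.
    return "".join("1" if s > 0 else "0" for s in planes @ vector)

# Indexing: hash every stored vector into its bucket.
buckets = defaultdict(list)
stored = rng.normal(size=(1000, dim))
for i, vec in enumerate(stored):
    buckets[lsh_bucket(vec)].append(i)

# Querying: compare only against vectors in the query's bucket,
# instead of scanning all 1,000 stored vectors.
query = rng.normal(size=dim)
candidates = buckets[lsh_bucket(query)]
print(f"Comparing against {len(candidates)} candidates instead of {len(stored)}")
```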

Why Vector Databases?


Vector databases offer two primary benefits:

1. Faster Searches: By employing techniques like LSH, they can rapidly locate relevant
vectors.

2. Optimal Storage: They are designed to handle the unique requirements of vector data,
ensuring efficient storage and retrieval.

As vector databases continue to evolve, they are becoming increasingly essential for
applications involving semantic search, recommendation systems, and more.

I hope this overview helps you understand the fundamentals of vector databases and their
significance in modern data retrieval and search technologies.

References:

Research paper on RAG: Retrieval-Augmented Generation for Large Language Models: A Survey

Vector database YouTube video: https://youtu.be/72XgD322wZ8?si=KPFg30be_EBu7EUa

Vector database article: https://www.pinecone.io/learn/vector-database/

I trust this blog has enriched your understanding of Retrieval Augmented Generation (RAG).
If you found value in this content, I invite you to stay connected for more insightful posts.
Your time and interest are greatly appreciated. Thank you for reading!
