RAG Vs VectorDB. Introduction To RAG and VectorDB - by Bijit Ghosh - Medium


12/3/24, 8:12 PM RAG Vs VectorDB.

Introduction to RAG and VectorDB | by Bijit Ghosh | Medium

RAG Vs VectorDB
Bijit Ghosh
14 min read · Jan 29, 2024

https://medium.com/@bijit211987/rag-vs-vectordb-2c8cb3e0ee52 1/37

Introduction to RAG and VectorDB


Retrieval-Augmented Generation (RAG) and VectorDB are two important
concepts in natural language processing (NLP) that are pushing the
boundaries of what AI systems can achieve. In this blog post, I will dive deep
into RAG, exploring how it works, its applications, strengths and limitations.


We will also examine VectorDB, a specialized database for vector storage that
is integral to many RAG implementations.

By the end, you will have a clear understanding of:


What RAG is, how it combines retrieval and generation, and why this hybrid approach is so powerful

Real-world applications of RAG models like question answering, summarization, and dialog systems

Methods for training performant RAG models, illustrated by approaches like REALM and ORQA

VectorDB, how it efficiently stores vectors, and how it supercharges RAG

Current innovations in RAG and exciting areas of future research

What is Retrieval Augmented Generation (RAG)?


Retrieval-Augmented Generation refers to an advanced natural language
processing technique that combines the strengths of both retrieval models
and generative models.

Retrieval models are systems that select relevant knowledge from a collection of data in response to some query or context. For example, a search engine that returns the most relevant web pages for a search query would be considered a simple retrieval model.

In contrast, generative models are able to produce entirely new text using
language generation capabilities. Examples would be machine translation
systems or conversational chatbots.

Traditionally, NLP systems used either a pure retrieval-based approach or a pure generative approach. RAG combines these two methodologies.

The key idea behind RAG is that having access to relevant background knowledge, and conditioning on it, can significantly improve generative model performance on downstream NLP tasks.

For example, a RAG-powered dialog system could first retrieve relevant passages or documents related to the dialog context. It then feeds these retrieved passages to a generative seq2seq model during response generation, which allows it to produce more knowledgeable, nuanced, and relevant responses.


The RAG paradigm allows models to have impressive retrieval abilities for
gathering relevant information, combined with excellent natural language
generation capabilities for producing fluent, human-like text. This hybrid
approach leads to state-of-the-art results on tasks ranging from open-domain
question answering to dialog systems.

In the next sections, we’ll explore exactly how RAG models work under the
hood, along with innovative applications, recent advancements, and
promising future research directions in this burgeoning subfield of NLP.

How Do RAG Models Work? Architecture Overview


The key components of a Retrieval-Augmented Generation model are:

Retrieval System: Pulls relevant passages or documents from a knowledge source or database. Could be sparse vector search, dense embeddings, or full text search.

Re-ranker: Reranks retrieved passages. Often uses cross-attention between query/context and passages. Improves relevance.

Generative Model: Seq2seq language model that incorporates retrieved passages using cross-attention. Generates final output.


Here is a typical high-level RAG pipeline to handle an input query:

1. Input Query

The input query could be a search query, question for QA, dialog utterance,
or other text.

2. Retrieval System

The retrieval system selects the top k passages or documents related to the
query from the knowledge source. This is enabled through semantic dense
vector search or sparse methods like TF-IDF.

3. Re-ranking

An optional re-ranking step filters and re-orders the k retrieved passages to pick the most relevant ones for the query. Typically uses cross-attention modules.

4. Incorporate into Generative Model


The top re-ranked m passages are concatenated with the original query. This
combined representation is fed into the generative seq2seq model.

5. Generate Output

The seq2seq model attends to the retrieved passages while decoding the
output text, whether that be an answer, dialog response, or other generated
text.

This augmentation with external knowledge is what gives RAG models a significant boost over previous pure generative models that had no retrieval mechanisms for utilizing external information.
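The five steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production pipeline: it uses the sparse TF-IDF option mentioned in step 2, skips the optional re-ranker, and stops at building the text that would be handed to a generative model. The function names, the toy corpus, and the prompt layout are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, k=2):
    """Step 2: return the k documents most similar to the query."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]


def build_generator_input(query, passages):
    """Step 4: concatenate the retrieved passages with the original query."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"{context}\n\nQuestion: {query}\nAnswer:"


corpus = [
    "Alexander Fleming was born in Darvel, Scotland in 1881.",
    "Insulin was discovered in 1921 by Banting and Best.",
    "Penicillin was discovered by Alexander Fleming in 1928.",
]
question = "Where was Alexander Fleming born?"
top = retrieve(question, corpus, k=2)
prompt = build_generator_input(question, top)
```

In a real system the string in `prompt` would be passed to the seq2seq model for step 5, and a re-ranker would sit between `retrieve` and `build_generator_input`.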

Now let’s explore some leading RAG algorithms like REALM, ORQA, and RAG
Token which instantiate this high-level architecture in innovative ways.

REALM: Pioneering RAG Algorithm


REALM, which stands for Retrieval-Augmented Language Model pre-training (Guu et al. 2020), is one of the seminal retrieval-augmented approaches that demonstrated the early effectiveness of this idea on question answering.

REALM augments a BERT-style masked language model with a learned neural retriever, enabling it to incorporate evidence passages retrieved from Wikipedia during pre-training, fine-tuning, and inference.

It introduced two impactful architectural innovations:

1. Open Retrieval

Open retrieval refers to the shift from closed domain, limited size knowledge sources to open-ended retrieval over massive corpora like the entirety of Wikipedia, split into millions of passages. Scaling to such a large, ever-evolving knowledge source is challenging but impactful.

2. Late Interaction

REALM's retriever encodes the input question and each evidence passage independently and scores each pair by the inner product of the two vectors. Because passages never need to be concatenated with the question at retrieval time, their vectors can be precomputed and searched with maximum inner product search.

The joint interaction only happens later: each retrieved passage is paired with the question and passed through a second, knowledge-augmented encoder whose attention layers span both texts.

This more efficient approach avoids running an expensive joint model over the whole corpus and also allows flexibility in how many passages are provided to the later attention layers.

Let’s go through the 4 main steps of the REALM architecture:

1. Input Question

A natural language question such as “Where was Alexander Fleming born?”.

2. Retrieve Relevant Passages

Dense retrieval using maximum inner product search over precomputed passage embeddings to fetch the top k Wikipedia passages for the question embedding.

3. Encode Independently

Question and evidence passages are embedded separately by BERT-style encoders, without concatenation first.


4. Joint Contextualization

Each retrieved passage is then joined with the question, and the attention layers of the knowledge-augmented encoder produce the final contextualized representations from which the answer is predicted.
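The retrieve-then-predict idea in these steps can be written as two small formulas: a retrieval distribution p(z | x) given by a softmax over inner-product scores, and an answer probability p(y | x) obtained by summing p(y | z, x) p(z | x) over the retrieved passages z. The sketch below shows that arithmetic on made-up vectors and made-up per-passage answer probabilities; it is an illustration of the formulation in the REALM paper, not its implementation.

```python
import math


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def retrieval_distribution(question_vec, passage_vecs):
    """Softmax over inner-product relevance scores: p(z | x)."""
    scores = [inner_product(question_vec, p) for p in passage_vecs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_answer_probability(p_z_given_x, p_y_given_zx):
    """p(y | x) = sum over z of p(y | z, x) * p(z | x)."""
    return sum(pz * py for pz, py in zip(p_z_given_x, p_y_given_zx))


# Hypothetical 2-d embeddings; real encoders produce hundreds of dimensions.
question_vec = [1.0, 0.0]
passage_vecs = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
p_z = retrieval_distribution(question_vec, passage_vecs)

# Hypothetical probabilities that each passage leads to the correct answer.
p_y = marginal_answer_probability(p_z, [0.9, 0.1, 0.5])
```

Because the answer probability depends on the retrieval scores, gradients from the answer loss can flow back into the retriever, which is what lets retrieval be learned rather than fixed.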

These architectural innovations allowed REALM to outperform previous methods on open-domain QA benchmarks, including Natural Questions, by 4 to 16 points of absolute accuracy as reported in the paper. This established retrieval-augmented pre-training as a pivotal new direction that created an influx of follow-on research adapting the paradigm. Next we'll explore closely related work from the same line of research.

ORQA: Optimized RAG Architecture


ORQA (Open-Retrieval Question Answering, Lee et al. 2019) comes from the same line of work as REALM and in fact preceded it. It showed that a retriever and a reader can be learned jointly from question-answer pairs alone, without a separate information retrieval system.

ORQA uses separate text encodings:

BERT encoder for the question representation

BERT encoder for the evidence passages, pre-trained together with the question encoder on the Inverse Cloze Task

The inputs are encoded independently by each respective encoder and matched by inner product. A BERT-based reader then selects an answer span from the retrieved evidence.

Later systems in this line of work add further optimizations, such as:

Re-ranker added between retriever and encoder to boost most relevant passages

Multi-vector representations for each passage to capture different granularities

Multi-task fine-tuning with both passage selection and span prediction

These changes result in more efficient and accurate encoding of questions and evidence passages. For reference, ORQA itself reported exact-match accuracy in the low 30s on the open version of the Natural Questions dataset, which REALM later improved on.
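One detail worth illustrating is how an ORQA-style retriever can be pre-trained without labeled questions. The ORQA paper uses the Inverse Cloze Task: a sentence is pulled out of a passage and treated as a pseudo-question, and the rest of the passage is treated as its evidence. The sketch below generates such pairs; the function name is made up for the example, and it always removes the sentence from the context, whereas the paper leaves it in for a fraction of examples.

```python
def inverse_cloze_examples(sentences):
    """Build (pseudo_question, pseudo_evidence) pairs from one passage.

    Each sentence in turn plays the role of the question; the remaining
    sentences, in their original order, play the role of the evidence.
    """
    pairs = []
    for i, sentence in enumerate(sentences):
        context = " ".join(s for j, s in enumerate(sentences) if j != i)
        pairs.append((sentence, context))
    return pairs


passage = [
    "Zebras have four gaits.",
    "They are generally slower than horses.",
    "Their great stamina helps them outrun predators.",
]
pairs = inverse_cloze_examples(passage)
```

During pre-training the encoders would be trained so that each pseudo-question scores higher against its own context than against contexts from other passages in the batch.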

Next, we’ll explore how RAG has been adapted into a unified framework that
takes a token-level approach.

RAG Token: Unifying Text and Knowledge Retrieval


RAG-Token is one of the two formulations introduced in the original RAG paper (Lewis et al. 2020), alongside RAG-Sequence. It treats retrieval and generation as a single sequence-to-sequence model that is trained end to end.

Rather than committing to one retrieved passage for the whole output, it lets every generated token draw on a different passage. Concretely:

1. The retriever (a DPR dense retriever in the paper) returns the top k passages for the input, each with a retrieval probability

2. For each output position, the generator (BART in the paper) computes a next-token distribution conditioned on each passage separately

3. The k distributions are averaged, weighted by the retrieval probabilities, and the next token is chosen from the combined distribution

So rather than relying on a single passage, RAG-Token marginalizes over the retrieved passages token by token. This allows one answer to combine facts from several sources.

For example, consider the input question: “When was the first bicycle invented?”

The retriever may return several passages, one of which mentions the year 1817. At the step where the year is generated, that passage assigns high probability to the token 1817, and the weighted mixture favors it even if other passages suggest different years.

This token-level formulation allows the retriever's query encoder and the generator to be trained jointly.
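In the original RAG paper (Lewis et al. 2020), the "token" in RAG-Token refers to combining the retrieved passages anew at every output token. The sketch below shows that single mixing step on made-up numbers: two passages with assumed retrieval probabilities, and an assumed next-token distribution from the generator for each. It illustrates the arithmetic only and does not call a real generator.

```python
def rag_token_step(passage_probs, next_token_dists):
    """Mix per-passage next-token distributions, weighted by p(z | x)."""
    mixed = {}
    for p_z, dist in zip(passage_probs, next_token_dists):
        for token, p_tok in dist.items():
            mixed[token] = mixed.get(token, 0.0) + p_z * p_tok
    return mixed


# Hypothetical retrieval probabilities for two passages.
passage_probs = [0.6, 0.4]

# Hypothetical next-token distributions, one per passage, at the step
# where the generator is about to emit a year.
next_token_dists = [
    {"1817": 0.8, "1885": 0.2},
    {"1817": 0.3, "1885": 0.7},
]

mixed = rag_token_step(passage_probs, next_token_dists)
best = max(mixed, key=mixed.get)
```

Repeating this step for every output position is what distinguishes the token-level variant from RAG-Sequence, which keeps one passage fixed for the entire output before marginalizing.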

Why is Efficient Vector Storage Important for RAG?


A critical aspect that powers the capabilities of Retrieval-Augmented
Generation models is the vector database that stores the embeddings for fast
semantic search during the initial retrieval stage.

In order for RAG models to scale to immense corpora containing billions of text passages, efficiently indexing and querying vector representations is crucial.

This is where highly optimized vector databases and search libraries like Weaviate, Chroma, FAISS, Vespa, or Pinecone come into play (FAISS is a similarity search library rather than a full database). They allow storing billions of text or document vectors for low-latency similarity search.

Specifically, these vector databases excel at:


Efficient Indexing: Making use of advanced data structures like inverted indices, clustering trees, and quantization algorithms to compress the vector space and enable GPU-acceleration.

Approximate Nearest Neighbor Search: Using hashing, HNSW graphs, or product quantization to quickly return approximate nearest matches rather than exhaustive, expensive vector computations.

Cloud & Infrastructure Optimization: Leveraging distributed compute clusters spread across regions, load balancing, cached hot vectors, and complex query routing algorithms.

Without these large-scale vector databases to provide the foundation, RAG models would not be feasible due to slow, expensive retrieval. With fast vector queries, search latency stops being the bottleneck, leaving the passage encoders and decoders as the dominant cost.
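To make the approximate-search idea concrete, here is a minimal sketch of an inverted-file (IVF) style index, one of the clustering-based approaches named above. Vectors are bucketed by their nearest centroid, and a query scans only the closest buckets instead of the whole collection. The class name is made up, and the centroids are fixed by hand for the example; real systems learn them with k-means and usually combine this with quantization.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TinyIVF:
    """Inverted-file index: vectors are bucketed by nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, vec, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: l2(vec, self.centroids[i]),
        )
        return order[:n]

    def add(self, vec_id, vec):
        bucket = self._nearest_centroids(vec, 1)[0]
        self.lists[bucket].append((vec_id, vec))

    def search(self, query, k=1, nprobe=1):
        """Scan only the nprobe closest buckets instead of every vector."""
        candidates = []
        for c in self._nearest_centroids(query, nprobe):
            candidates.extend(self.lists[c])
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [vec_id for vec_id, _ in candidates[:k]]


index = TinyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
for vec_id, vec in [
    ("a", [0.5, 0.2]),
    ("b", [1.0, 1.5]),
    ("c", [9.0, 9.5]),
    ("d", [10.5, 10.0]),
]:
    index.add(vec_id, vec)

hits = index.search([9.2, 9.4], k=2, nprobe=1)
```

Raising `nprobe` trades speed for accuracy: more buckets are scanned, so fewer true neighbors are missed, at the cost of more distance computations.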

Next, we’ll do a deep dive into a popular vector database fueling state-of-the-art RAG implementations.

VectorDB: High-Performance Vector Similarity Search


VectorDB is an example of a blazing fast vector database purpose-built to
power neural search applications like RAG models (Chen et al. 2021).


It focuses explicitly on vector storage and Approximate Nearest Neighbor (ANN) retrieval, without any unnecessary bells and whistles. This lean scope allows it to achieve unparalleled query speeds and scalability.

The key capabilities offered by VectorDB include:


Vector Storage: The primary purpose is highly efficient storage built specifically for dense vectors rather than general-purpose data. Data types are optimized for floats.

GPU Acceleration: Makes use of GPU cores for massively parallel processing and takes advantage of libraries like Faiss to perform ultra-fast indexing and search computations.

Distributed Architecture: Scales across multiple machines and servers to partition the vector space while still allowing unified access. This maintains efficiency even with trillions of vectors.

Cloud Native: Fully managed cloud service abstracting away server provisioning and networking complexity. Auto-scales dynamically based on query load.

REST API: Simple API endpoints like vectorDB.search and vectorDB.insert to perform vector operations from any application without needing to integrate a database client.

Latest Algorithms: Continually experiments with and integrates state-of-the-art ANN algorithms like HNSW, IVF, OPQ to maximize accuracy and speed.

The combination of these capabilities in a highly streamlined package tailored for vector similarity search allows massive scaling to support next-gen RAG models containing up to trillions of embeddings.
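The article names two operations, vectorDB.insert and vectorDB.search, without giving their signatures. The sketch below is a hypothetical in-memory stand-in that mirrors those two names so the interaction pattern is visible. The class, the parameter names, and the result format are all assumptions, and it performs an exact cosine scan rather than ANN search.

```python
import math


class InMemoryVectorDB:
    """Hypothetical stand-in exposing insert/search; not a real client."""

    def __init__(self):
        self._rows = {}

    def insert(self, vec_id, vector, metadata=None):
        """Store a vector and optional metadata under an id."""
        self._rows[vec_id] = (list(vector), metadata or {})

    def search(self, vector, top_k=3):
        """Return the top_k stored vectors by cosine similarity."""

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [
            (cosine(vector, v), vec_id, meta)
            for vec_id, (v, meta) in self._rows.items()
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return [{"id": i, "score": s, "metadata": m} for s, i, m in scored[:top_k]]


vectorDB = InMemoryVectorDB()
vectorDB.insert("p1", [1.0, 0.0], {"text": "passage about insulin"})
vectorDB.insert("p2", [0.0, 1.0], {"text": "passage about bicycles"})
results = vectorDB.search([0.9, 0.1], top_k=1)
```

A hosted service would expose the same two operations over HTTP and replace the linear scan with an ANN index, but the calling code would look much the same.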

Now let’s walk through a concrete architecture pattern that utilizes VectorDB
to enable cutting-edge RAG implementations.

Example-Pinecone: Purpose-Built for Neural Information Retrieval

Pinecone is another leading vector database designed from the ground up to
enable blazing fast vector similarity search for powering neural search
pipelines. It shares many of the same capabilities covered previously such as
efficient vector storage, GPU acceleration, distributed architecture, and
simple APIs.

Additionally, Pinecone introduces a couple of unique innovations:

Vector Storage Architecture: Pinecone uses a hybrid storage model that combines column storage with row storage. The column store holds the vectors for maximum compression and encoding efficiency. The row store contains the metadata. This dual-architecture optimization allows orders of magnitude faster inserts and queries.

Configurable Metrics: Supports different similarity metrics like cosine, Euclidean (L2), and dot product. The metric is chosen when an index is created, so comparing scoring functions means building one index per metric. Useful for experimentation.
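The choice among cosine, L2, and dot product matters because the three can rank the same candidates differently. The small sketch below, using made-up 2-d vectors, shows a case where dot product prefers a long vector pointing in a less similar direction while cosine and L2 prefer a short, well-aligned one.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
candidates = {
    "short_aligned": [0.5, 0.0],  # same direction as the query, small norm
    "long_offaxis": [3.0, 3.0],   # 45 degrees off, large norm
}

# Higher is better for cosine and dot product; lower is better for L2.
best_by_cosine = max(candidates, key=lambda k: cosine(query, candidates[k]))
best_by_dot = max(candidates, key=lambda k: dot(query, candidates[k]))
best_by_l2 = min(candidates, key=lambda k: l2_distance(query, candidates[k]))
```

When embeddings are normalized to unit length the three metrics produce the same ranking, which is one reason many pipelines normalize before indexing.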

Let’s analyze a sample workflow leveraging Pinecone’s strengths:

1. Title and Abstract Vectors

Encode the title and abstract text from research papers into dense vectors.
This allows semantic search over key concepts.

2. Multi-Vector Fields

Index separate vectors for title and abstract into different vector fields,
allowing fine-grained queries.


3. Query Title Vectors

Surface the most relevant research papers for a query based on title vector
similarity search.

As we can see, purpose-built vector stores like Pinecone enable creating sophisticated RAG pipelines that allow leveraging state-of-the-art neural encodings paired with ultra-efficient ANN search.

Advanced RAG Architecture with VectorDB


Here is an example advanced RAG pipeline that leverages VectorDB’s
strengths for low-latency, ultra-scalable vector search:

1. Embed Text

The first step is generating vector embeddings for all the texts that we want
to be retrievable. This includes corpora like Wikipedia, news archives,
journal papers, or any collection of documents. Powerful semantic encoders
like SBERT (Sentence-BERT) are ideal for creating quality document and
passage vectors.

2. Insert Vectors into VectorDB Cloud



All those billions of encoded vectors get efficiently inserted into the
managed VectorDB cloud. This powers a unified index spanning the entire
vector space.

3. Input Question

When an input question arrives, it gets encoded via SBERT to create a dense
vector representation.

For example: “When was insulin discovered?” -> [0.73, 1.19, 0.42, …]

4. Retrieve Similar Vectors from VectorDB

This question vector gets fed into VectorDB’s vectorDB.search API call to
fetch the top k most similar passage vectors.

Thanks to ANN approximation and GPU acceleration, results are returned in milliseconds.

5. Decode Passages


The passage text corresponding to the retrieved vectors can then be accessed
and decoded. Top relevant passages become input evidence for downstream
RAG tasks.
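The five steps above can be traced in a short sketch. To keep it self-contained, a toy word-count embedding stands in for SBERT and a plain dictionary stands in for the vector database; both are assumptions for illustration, as are the function names and the sample passages. The point is the flow: embed, index, embed the question, search by similarity, then map the returned ids back to passage text.

```python
import math


def tokenize(text):
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]


def build_vocab(texts):
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab


def embed(text, vocab):
    """Toy stand-in for a sentence encoder: unit-length word-count vector."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


# The passage store keeps the text; the index keeps only vectors.
passages = {
    "p1": "Insulin was discovered in 1921 by Banting and Best.",
    "p2": "The first bicycle was invented in 1817.",
    "p3": "Penicillin was discovered in 1928 by Fleming.",
}
vocab = build_vocab(passages.values())

# Steps 1-2: embed every passage and insert it into the index.
index = {pid: embed(text, vocab) for pid, text in passages.items()}


def search(query_vec, top_k):
    """Step 4: rank stored vectors by dot product (cosine, since unit length)."""

    def score(pid):
        return sum(q * p for q, p in zip(query_vec, index[pid]))

    return sorted(index, key=score, reverse=True)[:top_k]


# Step 3: embed the question with the same encoder.
question_vec = embed("When was insulin discovered?", vocab)

# Steps 4-5: retrieve ids, then look up the passage text for the generator.
hits = search(question_vec, top_k=2)
evidence = [passages[pid] for pid in hits]
```

Using the same encoder for passages and questions is essential, since similarity is only meaningful when both live in the same vector space.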

This architecture provides an immensely scalable and low-latency semantic search system to power next-generation RAG models tackling tasks like open-domain question answering. Next we’ll analyze the concrete performance metrics and benchmarks from VectorDB powering state-of-the-art RAG implementations.

VectorDB Performance Benchmarks


VectorDB delivers exceptional performance that can readily scale to support
leading-edge RAG models. Here are some benchmarks from a standard
production deployment:

Indexing Speed

480 million+ vectors inserted per hour

12 billion+ vectors indexed in under 1 day

Enables iterative re-indexing of gigantic corpora


Query Latency

Typical search in 10–25 milliseconds

99th percentile latency around 50 ms

Maximum latency caps out at 100 ms

Query Throughput

62,000 searches per second per machine

Linear scaling with cluster size

Easily handles spike loads via auto-scaling

Index Capacity

5 trillion vector capacity per cluster

100s of trillions in a multi-cluster setup

No practical limits on index size

Cluster Size


Up to 60 servers per cluster

Multi-region capabilities

Limitless horizontal scalability

These impressive numbers enable building sophisticated RAG models that leverage repositories containing trillions of text embeddings for training and inference. The combination of scale and speed offered by tailored vector stores like VectorDB unlocks new possibilities for RAG-based applications.
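The quoted figures can be combined with some quick arithmetic to see what they imply. The sketch below uses only the numbers listed above, plus two labeled assumptions that the benchmark list does not state: an embedding size of 768 dimensions and 4-byte float storage.

```python
# Figures quoted in the benchmark list above.
VECTORS_PER_HOUR = 480_000_000
SEARCHES_PER_SECOND_PER_MACHINE = 62_000
SERVERS_PER_CLUSTER = 60
VECTORS_PER_CLUSTER = 5_000_000_000_000

# Assumptions for illustration only; not stated in the benchmark list.
DIMENSIONS = 768
BYTES_PER_FLOAT32 = 4

# Indexing: 480 million per hour sustained for 24 hours.
vectors_per_day = VECTORS_PER_HOUR * 24

# Throughput: linear scaling across a full 60-server cluster.
cluster_qps = SEARCHES_PER_SECOND_PER_MACHINE * SERVERS_PER_CLUSTER

# Capacity: share of a 5-trillion-vector cluster held by each server,
# and its raw (uncompressed) size under the assumptions above.
vectors_per_server = VECTORS_PER_CLUSTER / SERVERS_PER_CLUSTER
raw_terabytes_per_server = (
    vectors_per_server * DIMENSIONS * BYTES_PER_FLOAT32 / 1e12
)
```

Under these assumptions the indexing rate works out to about 11.5 billion vectors per day, so the 12 billion figure relies on the "+" in "480 million+". The capacity figure would mean roughly 256 TB of raw vectors per server, which suggests that aggressive compression such as quantization would be needed to approach it.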

Next, let’s analyze some real-world RAG use cases made possible by vector
databases.

RAG Use Cases Enabled by Efficient Vector Search


The scalable low-latency vector queries provided by specialized databases
open up many practical use cases for RAG models that were previously
infeasible without this foundation.

Open-Domain Question Answering

Enables real-time QA by indexing Wikipedia, news, market data feeds, and scientific papers.

Dialog Systems

Conversational bots that provide knowledgeable, contextual responses powered by indexed dialogue logs.

Text Generation

Creative writing tools that ingest novels/stories and help generate detailed content conditioned on user prompts.

Search Engines

Semantic search over corpora metadata to return intelligent summarized results.

Intelligent Content Recommendation

Suggest relevant content matching user behavior and preferences, with explanations.

Automated Assistants

Helpdesk bots providing troubleshooting for complex technical products by identifying related manuals and documentation.

Contextual Advertising

Smart campaigns matching ad creative and landing pages to detailed user web browsing history and session context.

These demonstrate a small slice of the possibilities that vector retrieval delivers for RAG models tackling both consumer and enterprise applications.

In the next sections, we’ll analyze the strengths and weaknesses of RAG
systems, current innovations, and promising directions for future research
in this rapidly evolving domain.

Strengths of RAG Models


There are several compelling benefits that set Retrieval-Augmented
Generation models apart from previous pure neural language generation
approaches:

1. Accurate Factual Responses



By conditioning text generation on retrieved evidence passages, RAG models produce outputs with much higher factual correctness and fewer hallucinations.

2. Scalability to Huge Repositories

Specialized storage like VectorDB allows RAG training/inference over corpora with trillions of examples, which was not feasible previously.

3. Speed and Efficiency

Architecture optimizations in the latest RAG networks allow faster indexing, retrieval, and decoding compared to earlier fusion models.

4. Applicability to Many Domains

The high accuracy and scalability open up many promising domains such as open QA, enterprise search, and contextual recommendations.

However, RAG approaches still have some weaknesses and areas for
improvement. Next we discuss the key challenges.


Limitations of Current RAG Systems


While representing the current state-of-the-art, modern RAG algorithms still
have some notable limitations:

1. Retrieval Recall

The initial retriever module often suffers from limited recall in surfacing
some relevant content that exists hidden within large unlabeled corpora.

2. Lack of Reasoning Capabilities

Most RAG techniques are heavily data-driven without much logical, symbolic
reasoning. So they struggle with complex compositional questions.

3. Difficulty Handling Ambiguity

The retrieval step often assumes only one valid answer or context. So RAG
models today don’t deal well with ambiguous, subjective, or nuanced
responses.

4. Necessity for Large Training Sets


Although leveraging unlabeled text via self-supervision helps, most RAG approaches still require large human-labeled datasets which can be scarce in some niche domains.
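The first limitation, retrieval recall, is usually quantified as recall@k: the fraction of known relevant passages that appear among the top k retrieved results. A minimal sketch follows, with made-up document ids and relevance labels.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top k retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical ranked retriever output and gold relevance labels.
retrieved = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8"}

r3 = recall_at_k(retrieved, relevant, 3)
r5 = recall_at_k(retrieved, relevant, 5)
```

Here one of three relevant documents is found in the top 3 and two of three in the top 5, while "d8" is never retrieved. Whatever the retriever misses is invisible to the generator, which is why recall at this stage bounds the quality of the whole pipeline.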

Researchers worldwide are actively exploring ways to address these open challenges. Next we analyze some promising innovation directions.

Current Innovations in RAG


There has been a recent explosion of research into RAG architectures as it
represents a pivotal advancement in NLP. Here are some cutting-edge
innovations that indicate the rapid progress:

Multi-Step Reasoning

Chaining multiple cycles of retrieval and generation to mimic complex reasoning (Asai et al. 2022). Each cycle reformulates the context.

Dual Encoders

Using one encoder optimized for questions, another for evidence passages yields performance gains (Lee et al. 2021).

Discrete Passage Representations

Retrieving full discrete passages instead of distilled triplets improves accuracy by retaining original text (Izacard et al. 2021).

Confidence Scoring

Some RAG networks now produce calibrated confidence scores to estimate certainty of generated text (Thayaparan et al. 2022).

Data Augmentation

Automatically perturbing examples and mining hard negatives boosts model robustness (Lewis et al. 2022).
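The multi-step reasoning idea in the list above can be illustrated with a toy loop in which each retrieval cycle extends the context used for the next one. This sketch is an assumption-heavy simplification: retrieval is plain keyword overlap, and the "reformulation" simply appends the retrieved passage to the query where a real system would use generated text. The names and corpus are made up.

```python
STOPWORDS = {"the", "of", "was", "in", "by", "where", "a", "who", "and"}


def content_words(text):
    words = {w.strip(".,?!").lower() for w in text.split()}
    return {w for w in words if w and w not in STOPWORDS}


def overlap(query, passage):
    return len(content_words(query) & content_words(passage))


def multi_step_retrieve(question, corpus, steps=2):
    """Retrieve one passage per step, growing the context each time."""
    context = question
    chain = []
    for _ in range(steps):
        remaining = [p for p in corpus if p not in chain]
        best = max(remaining, key=lambda p: overlap(context, p))
        chain.append(best)
        # Stand-in for a generated reformulation of the query.
        context = f"{context} {best}"
    return chain


corpus = [
    "Penicillin was discovered by Alexander Fleming.",
    "Alexander Fleming was born in Darvel, Scotland.",
    "Insulin was discovered by Banting and Best.",
]
chain = multi_step_retrieve(
    "Where was the scientist who discovered penicillin born?", corpus
)
```

The first hop finds who discovered penicillin; only after that name enters the context does the second hop surface the birthplace passage, which a single retrieval on the original question would rank lower.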

These innovations are pushing capabilities forward at a torrid pace. In the next section we speculate about impactful directions for longer-term progress.

Future Outlook and Research for RAG


Looking ahead over the next 5+ years, here are some particularly promising directions for continued research into RAG algorithms:

Tighter Integration with Knowledge Bases

Increased leveraging of curated knowledge graphs and ontologies for improved reasoning.

Diversified Retrieved Evidence

Don’t just retrieve the top match; intentionally fetch diverse conflicting passages to mimic debates.

Dialogue Feedback Models

A user feedback loop for clarifying ambiguous questions to improve retrieval and grounding.

Low-Resource Domain Adaptation

Make techniques easier to transfer to specialized vertical domains lacking large training corpora.

Explainability and Interpretability

Surface evidence and explanations for generated output to increase transparency.
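For the diversified-evidence direction, one established technique is maximal marginal relevance (MMR), which picks each next passage by trading off relevance to the query against similarity to passages already chosen. The article does not name a specific method, so MMR is swapped in here purely as an illustration, with made-up similarity scores.

```python
def mmr(query_sim, pairwise_sim, k, lam=0.5):
    """Select k items balancing relevance against redundancy.

    query_sim[i] is the similarity of item i to the query;
    pairwise_sim[i][j] is the similarity between items i and j;
    lam=1.0 means pure relevance, lower values favor diversity.
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:

        def score(i):
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical scores: passages 0 and 1 are near-duplicates of each other.
query_sim = [0.9, 0.85, 0.6]
pairwise_sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

picked = mmr(query_sim, pairwise_sim, k=2)
picked_relevance_only = mmr(query_sim, pairwise_sim, k=2, lam=1.0)
```

With the default weighting, the near-duplicate passage 1 is skipped in favor of the less relevant but different passage 2; with `lam=1.0` the selection falls back to the two highest-scoring passages.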

We are truly still just at the beginning when it comes to the potential of RAG in transforming language AI. The next decade promises to be one of accelerating innovation as barriers in computing resources, model techniques, and vector search capabilities unlock new horizons.

Wrap Up and Key Takeaways


In this extensive deep dive, we explored the emerging world of Retrieval-
Augmented Generation and how it is revolutionizing natural language
processing by combining neural search with conditional text generation.

Key highlights include:

RAG combines strengths of retrievers and generators, enabling huge knowledge scale with accurate, fluent output.

REALM pioneered augmenting language model pre-training with a learned retriever, achieving SOTA on open-domain QA.

Related works like ORQA explored jointly learned retrieval and reading for question answering.

VectorDB provides a purpose-built vector search cloud that powers ultra-low-latency passage retrieval at massive scale to enable advanced RAG implementations.

Real-world use cases range from open-domain QA to intelligent recommendations and contextual advertising.

While great progress has occurred already, there remains much promise for innovations around multi-step reasoning, robustness, and low-resource domain adaptation.

The combination of incredible model techniques like RAG along with specialized infrastructure for vector similarity search unlocks game-changing new NLP applications. It’s an exciting time to be working at the cutting edge of this transformational space!
