RAG Vs VectorDB. Introduction To RAG and VectorDB - by Bijit Ghosh - Medium
RAG Vs VectorDB
Bijit Ghosh
14 min read · Jan 29, 2024
https://medium.com/@bijit211987/rag-vs-vectordb-2c8cb3e0ee52 1/37
12/3/24, 8:12 PM RAG Vs VectorDB. Introduction to RAG and VectorDB | by Bijit Ghosh | Medium
We will also examine VectorDB, a specialized database for vector storage that
is integral to many RAG implementations.
Methods for training performant RAG models, as in approaches like REALM
and ORQA
A search engine that returns the most relevant web pages for a search query
would be considered a simple retrieval model.
In contrast, generative models are able to produce entirely new text using
language generation capabilities. Examples would be machine translation
systems or conversational chatbots.
The key idea behind RAG is that having access to relevant background
knowledge, and conditioning on it, can significantly improve generative
model performance on downstream NLP tasks.
The RAG paradigm pairs a retriever that gathers relevant information with a
generator that produces fluent, human-like text. This hybrid approach leads
to state-of-the-art results on tasks ranging from open-domain question
answering to dialog systems.
In the next sections, we’ll explore exactly how RAG models work under the
hood, along with innovative applications, recent advancements, and
promising future research directions in this burgeoning subfield of NLP.
1. Input Query
The input query could be a search query, question for QA, dialog utterance,
or other text.
2. Retrieval System
The retrieval system selects the top k passages or documents related to the
query from the knowledge source. This is enabled through semantic dense
vector search or sparse methods like TF-IDF.
3. Re-ranking
The top re-ranked m passages are concatenated with the original query. This
combined representation is fed into the generative seq2seq model.
5. Generate Output
The seq2seq model attends to the retrieved passages while decoding the
output text, whether that be an answer, dialog response, or other generated
text.
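The flow above (retrieve top k, re-rank to top m, concatenate with the query, hand off to the generator) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any particular RAG system: the function names (`retrieve`, `rerank`, `build_generator_input`), the tiny corpus, and the term-overlap re-ranker are all hypothetical, the retriever uses plain TF-IDF cosine similarity, and the seq2seq model itself is left out.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=3):
    """Step 2: indices of the top-k passages by TF-IDF cosine similarity."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return ranked[:k]

def rerank(query, docs, candidates, m=2):
    """Step 3 (placeholder scorer): keep the m candidates sharing most query terms."""
    q_terms = set(tokenize(query))
    return sorted(
        candidates,
        key=lambda i: len(q_terms & set(tokenize(docs[i]))),
        reverse=True,
    )[:m]

def build_generator_input(query, docs, selected):
    """Step 4: concatenate the selected passages with the original query."""
    context = "\n".join(f"[{rank + 1}] {docs[i]}" for rank, i in enumerate(selected))
    return f"question: {query}\ncontext:\n{context}"

corpus = [
    "Insulin was discovered in 1921 by Banting and Best in Toronto.",
    "The bicycle is a human-powered vehicle with two wheels.",
    "Penicillin was discovered by Alexander Fleming in 1928.",
]
query = "When was insulin discovered?"
top_k = retrieve(query, corpus, k=2)
top_m = rerank(query, corpus, top_k, m=1)
prompt = build_generator_input(query, corpus, top_m)
print(prompt)  # this string is what a seq2seq generator would receive in step 5
```

In a real system the re-ranker would typically be a learned model and the prompt would be passed to a generator; both are out of scope for this sketch.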
Now let’s explore some leading RAG algorithms like REALM, ORQA, and
RAG-Token, which instantiate this high-level architecture in innovative ways.
1. Open Retrieval
Open retrieval refers to the shift from closed-domain, limited-size knowledge
sources to open-ended retrieval over massive corpora like the entirety of
Wikipedia (over 21 billion words). Scaling to such a large, ever-evolving
knowledge source is challenging but impactful.
2. Late Interaction
In contrast, REALM introduces late interaction where the input question and
evidence passages are encoded independently, without concatenation first.
The joint interaction only happens later within the encoder cross-attention
layers.
This more efficient approach prevents an early bottleneck and also allows
flexibility in how many passages are provided to later attention layers.
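To make the idea concrete, here is a rough sketch of independent encoding followed by a later interaction step. Everything in it is an assumption for illustration: `encode` is a toy hashed bag-of-words stand-in for a learned encoder, `attend` is a single dot-product attention step rather than the cross-attention layers of a real model, and the passages are made up. The point is only that passage vectors can be computed once and cached, and that the interaction step accepts any number of passages.

```python
import math
import re
import zlib

DIM = 16  # toy embedding width (assumption)

def encode(text):
    """Toy stand-in for a learned encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def attend(query_vec, passage_vecs):
    """Late interaction: softmax over query-passage dot products, then a weighted sum."""
    scores = [sum(q * p for q, p in zip(query_vec, pv)) for pv in passage_vecs]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [
        sum(w * pv[d] for w, pv in zip(weights, passage_vecs))
        for d in range(DIM)
    ]
    return weights, fused

passages = [
    "Insulin was discovered in 1921 by Banting and Best.",
    "Penicillin was discovered by Alexander Fleming in 1928.",
    "The draisine was an early two-wheeled vehicle.",
]

# Passages are encoded without ever seeing the question, so this can run offline.
passage_cache = [encode(p) for p in passages]

# The question is encoded on its own at query time.
question_vec = encode("When was insulin discovered?")

# Interaction happens only here, over however many passages are supplied.
weights, fused = attend(question_vec, passage_cache)
single_weights, _ = attend(question_vec, passage_cache[:1])
```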
1. Input Question
Sparse index retrieval using BM25 fetches the top k Wikipedia passages for
the question. BM25 scores term overlap, so it operates on the question's
tokens rather than on embeddings.
3. Encode Independently
4. Joint Contextualization
The inputs are encoded in parallel by each respective encoder, fused via
cross-attention, then decoded by a T5 model into an answer span selection
over the evidence.
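The BM25 retrieval step mentioned above can be sketched as follows. This is a compact version of the standard Okapi BM25 formula with commonly used defaults (`k1=1.5`, `b=0.75`); the function names and the three-passage corpus are assumptions made for illustration, and a production system would use an inverted index rather than scoring every passage.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a larger IDF; the +1 inside the log keeps it positive.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by length via b.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def top_k(query, docs, k=2):
    """Return (index, score) pairs for the k best-scoring documents."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]

passages = [
    "The first bicycle, the draisine, was invented by Karl von Drais in 1817.",
    "Insulin was discovered in 1921.",
    "Wikipedia is a free online encyclopedia.",
]
question = "When was the first bicycle invented?"
results = top_k(question, passages, k=2)
```

Because BM25 matches tokens directly, a passage that shares no terms with the question scores exactly zero, which is one reason dense retrieval is often used alongside it.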
Next, we’ll explore how RAG has been adapted into a unified framework that
takes a token-level approach.
For example, consider the input question: “When was the first bicycle
invented?”
This token-based approach allows jointly training over both text corpus
retrieval and knowledge
Next, we’ll do a deep dive into a popular vector database fueling
state-of-the-art RAG implementations.
GPU Acceleration: Makes use of GPU cores for massively parallel processing
and takes advantage of libraries like Faiss to perform ultra-fast indexing and
search computations.
a database client.
Now let’s walk through a concrete architecture pattern that utilizes VectorDB
to enable cutting-edge RAG implementations.
Encode the title and abstract text from research papers into dense vectors.
This allows semantic search over key concepts.
2. Multi-Vector Fields
Index separate vectors for title and abstract into different vector fields,
allowing fine-grained queries.
Surface the most relevant research papers for a query based on title vector
similarity search.
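As a rough illustration of multi-vector fields, the sketch below keeps a separate vector per field and lets the caller choose which field to search. The record layout, the field names `title_vec` and `abstract_vec`, the `search` helper, and the hand-written three-dimensional vectors are all assumptions; real vectors would come from an encoder and a real database would expose its own query syntax.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Each paper carries one vector per field (placeholder values, not encoder output).
papers = [
    {"id": "p1", "title_vec": [0.9, 0.1, 0.0], "abstract_vec": [0.2, 0.8, 0.0]},
    {"id": "p2", "title_vec": [0.1, 0.9, 0.0], "abstract_vec": [0.9, 0.1, 0.1]},
    {"id": "p3", "title_vec": [0.0, 0.2, 0.9], "abstract_vec": [0.1, 0.1, 0.9]},
]

def search(records, field, query_vec, k=2):
    """Rank records by similarity between the query and one chosen vector field."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[field]), reverse=True)
    return [r["id"] for r in ranked[:k]]

query_vec = [1.0, 0.0, 0.0]
by_title = search(papers, "title_vec", query_vec)
by_abstract = search(papers, "abstract_vec", query_vec)
print(by_title, by_abstract)
```

The same query ranks the papers differently depending on the field, which is the fine-grained control that separate vector fields are meant to provide.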
1. Embed Text
The first step is generating vector embeddings for all the texts that we want
to be retrievable. This includes corpora like Wikipedia, news archives,
journal papers, or any collection of documents. Powerful semantic encoders
like SBERT (Sentence-BERT) are ideal for creating quality document and
passage vectors.
All those billions of encoded vectors get efficiently inserted into the
managed VectorDB cloud. This powers a unified index spanning the entire
vector space.
3. Input Question
When an input question arrives, it gets encoded via SBERT to create a dense
vector representation.
For example: “When was insulin discovered?” -> [0.73, 1.19, 0.42, …]
This question vector gets fed into VectorDB’s vectorDB.search API call to
fetch the top k most similar passage vectors.
5. Decode Passages
The passage text corresponding to the retrieved vectors can then be accessed
and decoded. Top relevant passages become input evidence for downstream
RAG tasks.
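The embed, insert, search, and decode steps can be tied together in a small in-memory sketch. None of this is the API of an actual vector database: `InMemoryVectorDB` and its `insert`, `search`, and `get_text` methods are hypothetical names, `embed` is a toy hashed bag-of-words function standing in for SBERT, and the search is brute force where a real system would use an approximate nearest-neighbor index.

```python
import math
import re
import zlib

DIM = 64  # toy embedding width; real sentence encoders emit far more dimensions

def embed(text):
    """Toy stand-in for an SBERT encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class InMemoryVectorDB:
    """Hypothetical brute-force vector store used only for illustration."""

    def __init__(self):
        self._vectors = []
        self._texts = []

    def insert(self, vector, text):
        """Store a vector alongside its source passage and return its id."""
        self._vectors.append(vector)
        self._texts.append(text)
        return len(self._texts) - 1

    def search(self, query_vec, k=3):
        """Return (id, score) pairs for the k vectors with the highest dot product."""
        scored = [
            (sum(q * v for q, v in zip(query_vec, vec)), idx)
            for idx, vec in enumerate(self._vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(idx, score) for score, idx in scored[:k]]

    def get_text(self, idx):
        """Map a retrieved id back to its passage text."""
        return self._texts[idx]

passages = [
    "Insulin was discovered in 1921 by Banting and Best.",
    "Penicillin was discovered by Alexander Fleming in 1928.",
    "The draisine was an early two-wheeled vehicle.",
]

db = InMemoryVectorDB()
for passage in passages:
    db.insert(embed(passage), passage)  # embed each text, then index it

question_vec = embed("When was insulin discovered?")  # encode the question
hits = db.search(question_vec, k=2)                   # vector similarity search
evidence = [db.get_text(idx) for idx, _ in hits]      # recover the passage text
```

The `evidence` list is what would be handed to the downstream RAG model. Since the vectors are unit length, the dot product used here equals cosine similarity.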
Indexing Speed
Query Latency
Query Throughput
Index Capacity
Cluster Size
Multi-region capabilities
Next, let’s analyze some real-world RAG use cases made possible by vector
databases.
Dialog Systems
Text Generation
Creative writing tools that ingest novels/stories and help generate detailed
content conditioned on user prompts.
Search Engines
Automated Assistants
Contextual Advertising
In the next sections, we’ll analyze the strengths and weaknesses of RAG
systems, current innovations, and promising directions for future research
in this rapidly evolving domain.
The high accuracy and scalability open up many promising domains such
as open QA, enterprise search, and contextual recommendations.
However, RAG approaches still have some weaknesses and areas for
improvement. Next we discuss the key challenges.
1. Retrieval Recall
The initial retriever module often has limited recall: relevant content
that exists within a large unlabeled corpus may never be surfaced.
Most RAG techniques are heavily data-driven without much logical, symbolic
reasoning. So they struggle with complex compositional questions.
The retrieval step often assumes only one valid answer or context. So RAG
models today don’t deal well with ambiguous, subjective, or nuanced
responses.
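Retrieval recall, the first challenge listed above, is commonly measured as recall@k: the fraction of known-relevant passages that appear in the top k results. The sketch below is a generic version of that metric; the function names and the document ids are made up for illustration and do not come from any benchmark.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear among the first k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = set(retrieved[:k]) & relevant
    return len(hits) / len(relevant)

def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)

# One query: the retriever's ranked output and the ids judged relevant.
retrieved = ["d3", "d7", "d1", "d9"]
relevant = {"d1", "d2"}

r2 = recall_at_k(retrieved, relevant, k=2)  # neither relevant id is in the top 2
r3 = recall_at_k(retrieved, relevant, k=3)  # "d1" is found, "d2" is still missing
```

Here "d2" is never retrieved at any k, which is the failure mode described above: whatever the retriever misses, the generator never gets to see.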
Multi-Step Reasoning
Dual Encoders
Using one encoder optimized for questions and another for evidence passages
yields performance gains (Lee et al. 2021).
Confidence Scoring
Data Augmentation
algorithms:
Don’t just retrieve the top match; intentionally fetch diverse conflicting
passages to mimic debates.
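One established way to fetch diverse rather than near-duplicate passages is Maximal Marginal Relevance (MMR), which trades relevance to the query against similarity to passages already selected. The article does not name a specific technique, so MMR is swapped in here as an assumption. Note also that it promotes diversity only; detecting passages that actually conflict would need an additional step. The vectors are hand-built two-dimensional placeholders and the `unit` helper is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def unit(degrees):
    """Helper for the demo: a 2-D unit vector at the given angle."""
    radians = math.radians(degrees)
    return [math.cos(radians), math.sin(radians)]

def mmr(query_vec, cand_vecs, k=2, lam=0.5):
    """Greedy MMR: lam weights relevance, (1 - lam) penalizes redundancy."""
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, cand_vecs[i])
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

query_vec = unit(0)
# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
candidates = [unit(20), unit(25), unit(-40)]

relevance_only = mmr(query_vec, candidates, k=2, lam=1.0)
diversified = mmr(query_vec, candidates, k=2, lam=0.5)
```

With `lam=1.0` the selection reduces to plain top-k by relevance and returns the two near-duplicates; with `lam=0.5` the second slot goes to the less similar candidate instead.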
We are still at the beginning of what RAG can do to transform language AI.
The next decade promises to be one of accelerating innovation as barriers
in computing resources, model techniques, and vector search capabilities
fall and unlock new horizons.
While great progress has already been made, there is still much room
for innovation in multi-step reasoning, robustness, and low-resource
domain adaptation.