From RAG to GraphRAG: What is GraphRAG and why do I use it?

Jeong Yitae · 13 min read · Mar 12, 2024

Before discussing RAG and GraphRAG,


The era of ChatGPT has arrived. It’s an era so influenced by large
language models that it could be called the third industrial revolution.
Nowadays, even my mother uses ChatGPT for her queries, showing how
its usage spans generations and is ever-expanding.

The reason for this broad utilization likely lies in its ability to accurately
fetch and convey the information users seek. In an age overwhelmed by
information, it serves to selectively provide the ‘necessary’ information.

Despite the significant progress made to date, there have been numerous
challenges. For instance, one such challenge is the ‘hallucination’
phenomenon, where inaccurate information is provided. This issue
stems from various causes, with a primary one being the
misinterpretation of user intent, leading to irrelevant information being
fetched.

The solution is straightforward: accurately understand the user's intent and deliver 'relevant' information.

Efforts to improve this involve various approaches, mainly categorized into four types:

1. Building large language models from scratch, which allows for clear
data context from the outset but comes with high construction costs.

2. Adopting 'well-trained' large language models and further training them in specific domains, which is cost-effective and relatively accurate but challenging in maintaining the balance between the model's context and the domain-specific context.

3. Using large language models as is, but adding additional context to user queries, which is cost-effective but risks subjectivity and potential bias in context provision.

4. Keeping the large language model while providing extra context on 'relevant information' during the response process, which allows for up-to-date, cost-effective responses but involves complexity in identifying and integrating relevant documents.

Additionally, these methods can be compared across five aspects: cost, accuracy, domain-specific terminology, up-to-date responses, and transparency/interpretability.

For a detailed comparison, refer to https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/.

This post discusses the various methodologies attempted to address the hallucination phenomenon in large language models. Specifically, it examines Retrieval Augmented Generation (RAG), a technique that fetches 'relevant' information and provides it as context, and then explores RAG's limitations and GraphRAG as a means to overcome them.

Brief Introduction to RAG


What is RAG (Retrieval Augmented Generation)? As mentioned, it’s a
technology that interprets user queries ‘well’, fetches ‘relevant’
information, processes it into context, and then incorporates this useful
information into responses.

As referenced in the cited site, RAG is characterized by its cost-effectiveness, relative accuracy, adequacy in providing domain-specific contextualization, ability to reflect the latest information, and transparency and interpretability in tracing the source documents of the information, making it a predominantly chosen approach.
Figure 1. RAG operation process (source: https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/)

The key lies in 'properly' interpreting queries, fetching relevant information, and processing it into context.

As seen in Figure 1, the process from user query → response generation via a pre-trained large language model (LLM) → delivery of the response to the user now includes an additional step in which a Retrieval Model fetches relevant information for the query. This added Retrieval Model is where the aforementioned three elements take place.

To perform these three tasks effectively, the process is divided and implemented/improved in four stages: 1. Pre-Retrieval, 2. Chunking, 3. Retrieval, 4. Post-Retrieval.
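To make the flow concrete, here is a minimal sketch of how these four stages might be wired together in Python. It is an illustrative outline only: the function bodies are placeholders, and llm stands in for whatever language model client is actually used.

def pre_retrieval(corpus):
    # Decide granularity, clean the data, and build the index.
    return [{"id": i, "text": doc} for i, doc in enumerate(corpus)]

def chunk(documents, max_tokens=512):
    # Split documents so each piece fits the model's token budget (naive stand-in).
    return [d["text"][:max_tokens] for d in documents]

def retrieve(query, chunks, k=5):
    # Pick the k chunks most relevant to the query (placeholder ranking).
    return chunks[:k]

def post_retrieval(query, retrieved):
    # Re-rank, summarize, and compress the retrieved context.
    return "\n".join(retrieved)

def answer(query, corpus, llm):
    context = post_retrieval(query, retrieve(query, chunk(pre_retrieval(corpus))))
    return llm(f"Context:\n{context}\n\nQuestion: {query}")

Each of these placeholders is unpacked in the sections that follow.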

Pre-Retrieval
Data granularity refers to the level of detail or precision of the data to be searched by the RAG model to enhance the generation process; it is decided and processed in advance, before the retrieval step.

Combining the strength of large pre-trained language models with a retrieval component, the RAG model generates responses by searching a database of text segments (e.g., sentences, paragraphs, or documents) for relevant information.

Data granularity can range from fine-grained, sentence-level units (e.g., individual facts, sentences, or short paragraphs) to coarse-grained, document-level units (e.g., entire documents or articles).

The choice of data granularity affects the model’s performance and its
ability to generate accurate and contextually relevant text.
Fine-grained data can provide more specific and detailed information for
the generation task, while coarse-grained data can provide broader
context or general knowledge.

Choosing the right data granularity to optimize the effectiveness of the RAG model is crucial. It involves balancing the need to provide detailed and relevant information against the risk of overloading the model with too much data, or with data so general that it becomes unhelpful.
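As a rough illustration, splitting the same source text at sentence level versus paragraph level might look like the sketch below. The regex-based splitter is a deliberate simplification; real pipelines typically use a tokenizer-aware splitter.

import re

def sentence_units(text):
    # Fine-grained: individual sentences carry specific facts.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def paragraph_units(text):
    # Coarse-grained: whole paragraphs carry broader context.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

doc = ("Green tea is rich in antioxidants. It may improve brain function.\n\n"
       "Black tea is more oxidized than green tea.")
print(len(sentence_units(doc)), "sentence units vs", len(paragraph_units(doc)), "paragraph units")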

Chunking
This is the process of shaping the source data into a form that can be fed into a large language model. Since the number of tokens that can be input into a large language model is limited, it is important to segment the information appropriately.

For example, consider a conversation between two people, where the ideal situation is one in which speaking time is evenly distributed over the available time.

If one person speaks for 59 minutes and the other for 1 minute in an
hour, the conversation is dominated by one person ‘inputting’
information, resembling not an exchange but an infusion of information.

Conversely, if each person speaks for 30 minutes, it is considered an efficient conversation because information is evenly exchanged.

In other words, to provide 'good' information to a large language model, it is crucial to give 'appropriate' context. Given the limited length (tokens), it is important to preserve the organic relationship between contexts within the given context limit. Therefore, in processing relevant data, the issue of a 'data length limit' arises.
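A minimal sketch of such a chunker is shown below. It approximates tokens with whitespace-separated words and keeps a small overlap between neighbouring chunks so that context is not cut off mid-thought; the numbers are illustrative defaults, not recommendations.

def chunk_text(text, max_tokens=256, overlap=32):
    # Split text into overlapping windows so each chunk fits the model's
    # input limit while neighbouring chunks still share some context.
    words = text.split()  # whitespace words as a rough proxy for tokens
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_tokens, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # step back so chunks overlap
    return chunks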

Retrieval
This stage involves searching a document or text segment database to
find content related to the user’s query. It includes understanding the
intent and context of the query and selecting the most relevant
documents or texts from the database based on this understanding.

For instance, when processing a query about "the health benefits of green tea," the model finds documents mentioning the health benefits of green tea and selects them based on similarity metrics.
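A bare-bones similarity search over pre-embedded chunks could look like this sketch; embed is assumed to be any callable that maps a list of strings to a 2-D array of vectors (for example, a sentence-transformer's encode method).

import numpy as np

def retrieve(query, chunks, embed, k=3):
    # `embed` maps a list of strings to a 2-D array of vectors.
    vectors = np.asarray(embed(chunks), dtype=float)
    q = np.asarray(embed([query]), dtype=float)[0]
    # Cosine similarity between the query and every chunk.
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [(chunks[i], float(sims[i])) for i in top]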

Post-Retrieval
This stage processes the retrieved information to effectively integrate it
into the generation process. It may include summarizing the searched
text, selecting the most relevant facts, and refining the information to
better match the user’s query.

For example, after analyzing documents on the health benefits of green tea, it may summarize key points like "Green tea is rich in antioxidants, which can reduce the risk of certain chronic diseases and improve brain function," to generate a comprehensive and informative response to the user's query.
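One simple, hedged way to implement this selection step is extractive: split the retrieved chunks into sentences and keep only those that score highest against the query. The sketch below reuses the same kind of embedding callable assumed earlier.

import re
import numpy as np

def compress_context(query, retrieved_chunks, embed, max_sentences=5):
    # Keep only the sentences most similar to the query; a simple stand-in
    # for the summarization / fact-selection step described above.
    sentences = []
    for chunk in retrieved_chunks:
        sentences += [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    vecs = np.asarray(embed(sentences), dtype=float)
    q = np.asarray(embed([query]), dtype=float)[0]
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    keep = sorted(np.argsort(-sims)[:max_sentences])  # preserve original order
    return " ".join(sentences[i] for i in keep)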

RAG's own limitations

RAG has efficient aspects compared to other methods, such as cost, up-to-date information, and domain specificity, but it also has inherent limitations. The seven failure points below (following Barnett et al., 2024) depict these limitations well within the RAG process; we will examine a few representative ones.
1. Missing Content: The first limitation is failing to index documents related
to the user’s query, thus not being able to use them to provide context.
Despite diligently preprocessing and properly storing data in the
database, not being able to utilize it is a significant shortfall.

2. Missed the Top Ranked Documents: The second issue arises when
documents related to the user’s query are retrieved but are of minimal
relevance, leading to answers that don’t satisfy the user’s expectations.
This primarily stems from the subjective nature of determining the
“number of documents” to retrieve during the process, highlighting a
major limitation. Therefore, it’s necessary to conduct various
experiments to define this k hyperparameter properly.

3. Not in Context (Consolidation Strategy Limitations): Documents containing the answer are retrieved from the database but fail to be included in the context for generating an answer. This happens when numerous documents are returned and a consolidation process is required to select the most relevant information.

4. Not Extracted: The fourth is a fundamental limitation of LLMs (Large Language Models), which tend to retrieve 'approximate' rather than 'exact' values. Obtaining 'approximate' or 'similar' values can thus surface irrelevant information, and even minor noise can significantly affect subsequent responses.

5. Wrong Format: The fifth issue appears closely related to instruction tuning, a method of enhancing zero-shot performance by fine-tuning the LLM with an instruction dataset. It occurs when additional instructions are incorrectly formatted in the prompt, leading to misunderstanding or misinterpretation by the LLM and resulting in erroneous answers.

6. Incorrect Specificity: The sixth issue involves either under-using or over-using the user query information, creating problems when weighing the query's importance. This is likely to occur when the input and the retrieval output are combined inappropriately.

7. Incomplete: The seventh limitation is when, despite the ability to use the
context in generating answers, missing information leads to incomplete
responses to the user’s query.

In summary, the main causes of these limitations are: 1. indexing, i.e., retrieving documents relevant to the user's query; 2. properly providing information before generating an answer; and 3. a suitable combination of the input and the pre-/post-retrieval processes. These three factors highlight what is crucial in RAG and pose the question of how these issues can be improved.

When using GraphRAG


it can address some of the limitations of RAG mentioned above from the perspectives of Pre-Retrieval, Post-Retrieval, and Prompt Compression, considering the contexts of the Knowledge Graph's Retrieval and Reasoning.

Graph Retrieval focuses on enhancing context by fetching relevant information, while Graph Reasoning applies to how information, such as chunking and context inputs, is traversed and searched within RAG.

Pre-Retrieval can leverage knowledge graph indexing to fetch related documents. By semantically indexing documents based on nodes and edges within the knowledge graph, it directly retrieves semantically related documents.

The process involves considering whether to fetch nodes or subgraphs. Extracting nodes involves comparing the user query with chunked nodes to find the most similar ones and using their connected paths as query syntax.
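A sketch of that node-matching step, under the assumption that each node in the knowledge graph carries the text of a chunk plus a precomputed embedding vector, might look like this:

import numpy as np

def match_entry_nodes(query_vec, nodes, k=3):
    # `nodes` is assumed to look like {"id": ..., "text": ..., "vec": np.ndarray},
    # built when the graph was indexed. The returned ids seed the path /
    # subgraph expansion described above.
    mat = np.stack([n["vec"] for n in nodes])
    sims = mat @ query_vec / (np.linalg.norm(mat, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [nodes[i]["id"] for i in np.argsort(-sims)[:k]]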

However, this approach requires specifying how many nodes within a path to fetch, and it depends heavily on the information extraction model used to create the knowledge graph, highlighting the importance of that model's performance.

Additionally, Variable Length Edges (VLE) may be used to fetch related information, which necessitates database optimization for efficient retrieval. Discussions on database design and optimization, involving database administrators and machine learning engineers, are crucial for enhancing performance.
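For illustration, a variable-length traversal in Cypher, issued from Python through the official Neo4j driver, could look like the sketch below. The Chunk label and RELATED_TO relationship are hypothetical names; they depend entirely on how the graph was modeled.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# *1..3 bounds the traversal depth, which is the variable-length part of the pattern.
VLE_QUERY = """
MATCH (start:Chunk {id: $node_id})-[:RELATED_TO*1..3]-(neighbour:Chunk)
RETURN DISTINCT neighbour.text AS text
LIMIT 25
"""

def expand_from_node(node_id):
    with driver.session() as session:
        return [record["text"] for record in session.run(VLE_QUERY, node_id=node_id)]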

Subgraphs involve fetching ego-graphs connected to relevant nodes, potentially embedding multiple related ego-graphs to compare their overall context with the user's query.

This method requires various graph embedding experiments due to performance differences based on the embedding technique used.
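If the knowledge graph is available in memory, extracting those ego-graphs is straightforward with networkx; the radius parameter controls how many hops around each seed node are included.

import networkx as nx

def ego_subgraphs(graph, seed_nodes, radius=1):
    # One ego-graph per seed node: the node plus everything within `radius` hops.
    # Each subgraph can then be embedded and compared against the query context.
    return {n: nx.ego_graph(graph, n, radius=radius) for n in seed_nodes}
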
Post-Retrieval involves a re-ranking process that harmoniously uses
values from both RAG and GraphRAG. By leveraging semantic search
values from GraphRAG alongside RAG’s similarity search values, it
generates context. GraphRAG’s values allow for verifying the semantic
basis of the retrieval, enhancing the accuracy of the fetched information.
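A simple way to combine the two signals is a weighted sum over the candidates' scores, as in the sketch below; the candidate dictionaries and the alpha weight are illustrative assumptions, not a prescribed implementation.

def rerank(candidates, alpha=0.5):
    # `candidates` is assumed to be a list of dicts holding both scores, e.g.
    # {"chunk": "...", "vector_score": 0.82, "graph_score": 0.67}.
    # alpha controls how much the graph-based (semantic) signal contributes
    # relative to plain vector similarity.
    for c in candidates:
        c["combined"] = alpha * c["graph_score"] + (1 - alpha) * c["vector_score"]
    return sorted(candidates, key=lambda c: c["combined"], reverse=True)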

Using a single database for both vectorDB and GraphDB allows for
semantic (GraphRAG) and vector (RAG) indexing within the same
database, facilitating verification of retrieval accuracy and enabling
improvements for inaccuracies.

Prompt Compression benefits from graph information during prompt engineering, for example when deciding which chunk information to inject into prompts.

Graphs enable the return of only relevant information after retrieval, based on the relationship between the query context and the documents. This allows the source of irrelevant information to be traced for improvement.

For instance, if an inappropriate response is generated, graph queries can be used to trace back to the problematic part for immediate correction.

Overall, GraphRAG provides a comprehensive approach to addressing RAG's limitations by integrating knowledge graph techniques for better information retrieval, reasoning, and context generation, thereby enhancing the accuracy and relevance of the responses generated.

GraphRAG architecture
There are four modules involved in executing GraphRAG: Query Rewriting, Augmentation, and, within Retrieval, Semantic Search and Similarity Search.

Query Rewriting
Rewriting the user's query is implemented in this process. When a user issues a query to the engine, additional useful context can be appended to the query in the prompt format. In this step, the query is restated in order to clarify the user's intention.
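A minimal rewriting step can itself be delegated to an LLM with a fixed instruction, as in this sketch; the prompt wording and the llm callable are assumptions for illustration, not part of the original post.

REWRITE_PROMPT = """Rewrite the user's question so that it is explicit and self-contained.
Resolve pronouns, expand abbreviations, and add any domain context you can infer.

User question: {question}
Rewritten question:"""

def rewrite_query(question, llm):
    # `llm` is any callable that takes a prompt string and returns text,
    # e.g. a thin wrapper around a chat-completion API.
    return llm(REWRITE_PROMPT.format(question=question)).strip()
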
Pre-Retrieval & Post-Retrieval
This phase involves contemplating what information to retrieve and how
to process that information once retrieved. During the Pre-Retrieval
phase, the focus is primarily on decisions related to setting the chunking
size, how to index, ensuring data is well-cleaned, and detecting and
removing any irrelevant data if present.

In the Post-Retrieval phase, the challenge is to harmonize the data effectively. This stage mainly involves two processes: Re-ranking and Prompt Compression. In Prompt Compression, the query result, specifically the Graph Path, is utilized as part of the Context + Prompt for answer generation, incorporating it as a prompt element. Re-ranking employs the results of Graph Embedding combined with LLM (Large Language Model) Embedding to enhance the diversity and accuracy of the ranking.
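The following sketch shows one way the graph path could be folded into the final prompt alongside the top re-ranked chunks; the shapes of graph_paths and reranked_chunks are assumed purely for illustration.

def build_prompt(question, graph_paths, reranked_chunks, max_chunks=3):
    # The graph path(s) act as a compressed, structured summary of how the
    # retrieved facts relate; only the top-ranked chunks are included verbatim.
    path_lines = "\n".join("- " + " -> ".join(p) for p in graph_paths)
    chunk_lines = "\n\n".join(c["chunk"] for c in reranked_chunks[:max_chunks])
    return (
        "Relationships found in the knowledge graph:\n"
        f"{path_lines}\n\n"
        "Supporting passages:\n"
        f"{chunk_lines}\n\n"
        f"Question: {question}\nAnswer:"
    )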

This approach is strategic in enhancing the performance and relevance of the generated answers, ensuring that the process not only fetches pertinent information but also integrates it efficiently to produce coherent and contextually accurate responses.

Getting the factors ready for GraphRAG

To effectively store, manage, and retrieve graph-shaped data, software that reflects the unique characteristics of that data is necessary. Just as an RDBMS (Relational Database Management System) serves to manage table-form data efficiently, a GDBMS (Graph Database Management System) exists to handle graph-shaped data adeptly. Especially in the context of Knowledge Graph Reasoning, if the database is not optimized for graph structures, the cost of traversing relationships through JOIN operations increases significantly, potentially leading to bottlenecks.

Hence, a GDBMS is essential in GraphRAG for its efficiency in managing all these aspects. For retrieving graphs, a model that generates graph queries is required. Although it might be clear which data is related, automating the process of fetching associated data from specific data points is crucial. This necessitates a natural language processing model dedicated to generating graph queries.
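In the absence of a dedicated model, a common stopgap is to prompt a general-purpose LLM with the graph schema and ask it to emit the query, roughly as sketched below; the schema and prompt wording are hypothetical.

CYPHER_PROMPT = """You translate questions into Cypher queries.
Graph schema (hypothetical): (:Chunk {{id, text}})-[:RELATED_TO]->(:Chunk)

Question: {question}
Cypher query:"""

def generate_cypher(question, llm):
    # `llm` is any text-completion callable; in practice this would ideally be
    # a model fine-tuned for Cypher generation, which is exactly the kind of
    # dataset that is still scarce.
    return llm(CYPHER_PROMPT.format(question=question)).strip()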

Unfortunately, there is a lack of datasets for graph query generation, highlighting the urgent need for data acquisition. Neo4j has taken a step forward by launching a data crowdsourcing initiative, which those interested in contributing or learning more can explore further.

Regarding the extraction of information to create graph forms, an information extraction model is necessary to infer the relationships between well-chunked documents.

Two main approaches can be considered: using NLP's Named Entity Recognition (NER) or employing a Foundation model from a Knowledge Graph. Each approach has its distinct differences.

NLP focuses on semantics from a textual perspective, relying heavily on predefined dependencies among words, whereas Knowledge Graphs, formed from a knowledge base through Foundation models, focus on nodes and can regulate the amount of information transmitted along edges.
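As a hedged illustration of the NER route, the sketch below uses spaCy entities and links every pair of entities that co-occur in a sentence. The fixed MENTIONED_WITH relation is a placeholder; a real pipeline would replace it with a learned relation classifier.

import spacy
from itertools import combinations

nlp = spacy.load("en_core_web_sm")

def cooccurrence_triples(text, relation="MENTIONED_WITH"):
    # Naive information-extraction baseline: connect every pair of named
    # entities that appear together in the same sentence.
    triples = []
    for sent in nlp(text).sents:
        ents = [e.text for e in sent.ents]
        for a, b in combinations(ents, 2):
            triples.append((a, relation, b))
    return triples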

For embedding graph data, a model is utilized to add additional context to the Reranker, incorporating a holistic perspective through Graph Embedding that diverges from the conventional sequence perspective of LLMs (Large Language Models). This imparts structural characteristics, complementing the sequence perspective, which focuses on relationships over time, with a graph perspective that ensures all chunks (nodes) are evenly represented, thereby filling in any potentially missed information.

GraphRAG limitations
GraphRAG, like RAG, has clear limitations, which include how to form graphs, how to generate queries for querying these graphs, and ultimately how much information to retrieve based on those queries. The main challenges are 'query generation', 'reasoning boundary', and 'information extraction'. The 'reasoning boundary' in particular poses a significant limitation: unless the amount of related information is kept within bounds, retrieval can become overloaded, negatively impacting answer generation, which is the core aspect of GraphRAG.

Applying GraphRAG
GraphRAG can utilize graph embeddings produced by a GNN (graph neural network) to enhance the text embeddings used when inferring a response to the user's query. This method, known as soft-prompting, is a type of prompt engineering. Prompt engineering can be divided into Hard and Soft categories. Hard prompting involves explicitly provided prompts, requiring manual context addition to user queries. Its downside is the subjective nature of prompt creation, although it is straightforward to implement.

On the contrary, Soft prompting involves implicitly providing prompts: additional embedding information is added to the model's existing text embeddings to derive similar inference results. This method ensures objectivity by using 'learned' context embeddings and can optimize the weight values. However, it requires direct model design and implementation, making it more complex.
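A minimal sketch of soft prompting in this spirit is shown below, assuming a Hugging Face-style decoder that accepts inputs_embeds: a projection layer maps the GNN's graph embedding into a few "virtual tokens" that are prepended to the token embeddings. This is an assumption-laden illustration, not the author's implementation.

import torch
import torch.nn as nn

class SoftGraphPrompt(nn.Module):
    # Project a graph (GNN) embedding into the LLM's hidden size and prepend it
    # to the token embeddings, injecting graph context implicitly rather than
    # writing it into the prompt text.
    def __init__(self, llm, graph_dim, num_virtual_tokens=4):
        super().__init__()
        self.llm = llm
        self.hidden = llm.get_input_embeddings().embedding_dim
        self.num_virtual_tokens = num_virtual_tokens
        self.project = nn.Linear(graph_dim, num_virtual_tokens * self.hidden)

    def forward(self, input_ids, attention_mask, graph_embedding):
        token_embeds = self.llm.get_input_embeddings()(input_ids)
        virtual = self.project(graph_embedding).view(-1, self.num_virtual_tokens, self.hidden)
        inputs_embeds = torch.cat([virtual, token_embeds], dim=1)
        prefix_mask = torch.ones(attention_mask.size(0), self.num_virtual_tokens,
                                 dtype=attention_mask.dtype, device=attention_mask.device)
        attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        return self.llm(inputs_embeds=inputs_embeds, attention_mask=attention_mask)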

When to Use GraphRAG


GraphRAG is not a cure-all. It is not advisable to use advanced techniques like GraphRAG without a clear need, especially if traditional RAG works well. The introduction of GraphRAG should be justified with factual evidence, particularly when there is a mismatch between the information retrieved during the retrieval stage and the user's query intent. This is akin to the fundamental limitation of vector search, where information is retrieved based on 'approximate' rather than 'exact' values, leading to potential inaccuracies.

When efforts like introducing BM25 for exact search in a hybrid search
approach, improving the ranking process, or fine-tuning for embedding
quality do not significantly enhance RAG performance, it might be worth
considering GraphRAG.
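As a rough sketch of what such a hybrid baseline might look like before committing to GraphRAG, the snippet below blends BM25 lexical scores (via the rank-bm25 package) with vector similarity; the 50/50 weighting and min-max normalization are illustrative choices only.

import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def hybrid_search(query, chunks, embed, k=5, weight=0.5):
    # Blend exact-match (BM25) and approximate (vector) relevance.
    bm25 = BM25Okapi([c.split() for c in chunks])
    lexical = np.array(bm25.get_scores(query.split()), dtype=float)

    vecs = np.asarray(embed(chunks), dtype=float)
    q = np.asarray(embed([query]), dtype=float)[0]
    semantic = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)

    def norm(x):
        # Min-max normalize so the two signals are on a comparable scale.
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    combined = weight * norm(lexical) + (1 - weight) * norm(semantic)
    return [chunks[i] for i in np.argsort(-combined)[:k]]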

Conclusion
This post covered everything from RAG to GraphRAG, focusing on methods like fine-tuning, building from scratch, prompt engineering, and RAG to improve response quality. While RAG is acclaimed for efficiently fetching related documents for answering queries at relatively lower cost, it faces several limitations in the retrieval process. Advanced RAG, or GraphRAG, emerges as a solution to overcome these limitations by leveraging 'semantic' reasoning and retrieval.

Key considerations for effectively utilizing GraphRAG include information extraction techniques to infer and generate connections between chunked data, knowledge indexing for storage and retrieval, and models for generating graph queries, such as a Cypher Generation Model.

With new technologies emerging daily, this post aims to serve as a resource on GraphRAG, helping you become more familiar with this advanced approach. Thank you for reading through this extensive discussion.

Reference

https://medium.com/@bijit211987/top-rag-pain-points-and-solutions-108d348b4e5d

https://luv-bansal.medium.com/advance-rag-improve-rag-performance-208ffad5bb6a

Barnett, Scott, et al. "Seven failure points when engineering a retrieval augmented generation system." arXiv preprint arXiv:2401.05856 (2024).

https://deci.ai/blog/fine-tuning-peft-prompt-engineering-and-rag-which-one-is-right-for-you/

Luo, Linhao, et al. "Reasoning on graphs: Faithful and interpretable large language model reasoning." arXiv preprint arXiv:2310.01061 (2023).

https://towardsdatascience.com/advanced-retrieval-augmented-generation-from-theory-to-llamaindex-implementation-4de1464a9930
