
arXiv:2406.13213v2 [cs.CL] 19 Aug 2024

Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata

Mykhailo Poliakov[0009-0006-5263-762X] and Nadiya Shvai[0000-0001-8194-6196]

National University of Kyiv-Mohyla Academy
{mykhailo.poliakov, n.shvay}@ukma.edu.ua

Abstract. Retrieval-augmented generation (RAG) enables retrieval of relevant information from an external knowledge source and allows large language models (LLMs) to answer queries over previously unseen document collections. However, it has been demonstrated that traditional RAG applications perform poorly in answering multi-hop questions, which require retrieving and reasoning over multiple pieces of supporting evidence. We introduce a new method called Multi-Meta-RAG, which uses database filtering with LLM-extracted metadata to improve the RAG selection of relevant documents from the various sources applicable to the question. While database filtering is specific to a set of questions from a particular domain and format, we found that Multi-Meta-RAG greatly improves the results on the MultiHop-RAG benchmark. The code is available on GitHub.

Keywords: large language models · retrieval augmented generation · multi-hop question answering

1 Introduction

Large language models (LLMs) have shown remarkable language understanding and generation abilities [10,13]. However, they face two main challenges: static knowledge [8] and generative hallucination [5]. Retrieval-augmented generation (RAG) [6] is an established approach for answering user questions over entire datasets. RAG also helps mitigate generative hallucination and provides the LLM with new information on which it was not trained [11]. Real-world RAG pipelines often need to retrieve evidence from multiple documents simultaneously, a procedure known as multi-hop querying. Nevertheless, existing RAG applications struggle to answer multi-hop queries, which require retrieval and reasoning over numerous pieces of evidence [12]. In this paper, we present Multi-Meta-RAG: an improved RAG that uses a database filtering approach with LLM-extracted metadata and significantly improves the results on the MultiHop-RAG benchmark.

2 Related works

MultiHop-RAG [12] is a novel benchmarking dataset focused on multi-hop queries, including a knowledge base, questions, ground-truth responses, and supporting evidence. The news articles were selected from September 26, 2023, to December 26, 2023, extending beyond the knowledge cutoffs of ChatGPT (gpt-3.5-turbo-0613) and GPT-4 (gpt-4-0613). A trained language model extracted factual or opinion sentences from each news article; these factual sentences act as evidence for multi-hop queries. The selection method keeps articles whose evidence shares keywords with other articles, enabling the creation of multi-hop queries with answers drawn from numerous sources. Given the original evidence and its context, GPT-4 was used to rephrase the evidence into what are referred to as claims. Afterward, a bridge entity or topic is used to generate multi-hop queries.
For example, "Did Engadget report a discount on the 13.6-inch MacBook Air before The Verge reported a discount on Samsung Galaxy Buds 2?" is a typical query from the MultiHop-RAG dataset. Answering it requires evidence from both Engadget and The Verge, and it also requires the LLM to figure out the temporal ordering of events. In addition to the temporal query above, MultiHop-RAG contains inference, comparison, and null (no correct answer) queries.

Fig. 1. A naive RAG implementation for MultiHop-RAG queries. RAG selects chunks from articles not asked about in the example query, which leads to the LLM giving a wrong response.

In a typical RAG application, we use an external corpus that comprises multiple documents and serves as the knowledge base. Each document within this corpus is segmented into chunks. These chunks are then converted into vector representations using an embedding model and stored in a vector database. Given a user query, RAG typically retrieves the top-K chunks that best match the query. The retrieved chunks, combined with the query, are submitted to an LLM to generate a final response.
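To make the pipeline above concrete, the following minimal sketch (not the authors' implementation) indexes a few hypothetical chunks with a stand-in embedding function and retrieves the top-K chunks by cosine similarity; a real system would use a trained embedding model such as bge-large-en-v1.5 or voyage-02 and a vector database instead.

import numpy as np
from collections import Counter

def embed(text: str, dim: int = 512) -> np.ndarray:
    """Stand-in hashed bag-of-words embedding; a real RAG app would call a
    trained embedding model (e.g. bge-large-en-v1.5 or voyage-02)."""
    vec = np.zeros(dim)
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % dim] += count
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Hypothetical knowledge base: article chunks with their source metadata.
chunks = [
    {"text": "Engadget: the 13.6-inch MacBook Air is discounted ...", "source": "Engadget"},
    {"text": "The Verge: Samsung Galaxy Buds 2 drop to a new low price ...", "source": "The Verge"},
    {"text": "BBC: unrelated technology coverage ...", "source": "BBC"},
]
index = [(embed(c["text"]), c) for c in chunks]

def retrieve(query: str, k: int = 2):
    """Naive retrieval: top-K chunks by cosine similarity, no metadata filter."""
    q = embed(query)
    scored = sorted(index, key=lambda item: float(item[0] @ q), reverse=True)
    return [c for _, c in scored[:k]]

query = ("Did Engadget report a discount on the 13.6-inch MacBook Air before "
         "The Verge reported a discount on Samsung Galaxy Buds 2?")
for chunk in retrieve(query):
    print(chunk["source"], "->", chunk["text"][:60])

Because nothing in this naive retrieval step knows which sources the query mentions, chunks from unrelated outlets can crowd out the relevant ones, which is exactly the failure mode described next.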
For the MultiHop-RAG benchmark, the scraped articles act as the knowledge base for the RAG application under test. The problem is that a naive RAG application fails to recognize that a query asks for information from specific sources. The top-K chunks that RAG retrieves often contain information from sources other than those mentioned in the query, and may even miss the relevant sources entirely, leading to a wrong response, as depicted in Figure 1.
Several popular benchmarks, such as HotpotQA [16] and 2WikiMultiHopQA [4], can be used for QA over multiple document sources. However, these datasets primarily focus on estimating LLM reasoning skills and do not emphasize retrieving evidence from the knowledge base. Another problem is that they are based on Wikipedia, which means LLMs are already trained on the same data.
Alternative solutions for multi-hop queries include graph-based approaches like Graph RAG [3]. While Graph RAG is evaluated on the MultiHop-RAG dataset, the dataset is used purely as a knowledge base for an independent question set, and another LLM assesses the responses on custom metrics such as comprehensiveness, diversity, empowerment, and directness instead of simple accuracy.

3 Multi-Meta-RAG

3.1 Extraction of Relevant Query Metadata with the LLM

Each question in the MultiHop-RAG [12] benchmark follows a typical structure: every query requests information from one or more news sources, and some temporal queries additionally require news articles from a particular date. We can extract a query filter via a helper LLM by constructing a few-shot prompt [1] with examples of extracted article sources and publishing dates as a filter. The prompt template is provided in Appendix 5.2. We only run metadata extraction with ChatGPT (gpt-3.5-turbo-1106) because this additional RAG pipeline step must be quick and cheap; we found that it takes 0.7 seconds on average per query.
Two query metadata filter fields are extracted: article source and publication date. The complete filter is a dictionary combining the two fields. Samples of extracted metadata filters can be found in Table 1. The primary filtering operator is $in, the only operator provided in the examples of the few-shot prompt template. The LLM also correctly chooses the $nin operator for a tiny fraction of queries without an example. While the LLM only used $in and $nin for article sources, it sometimes chooses other operators such as $lt or $gt for the publication date in a fraction of temporal queries. Because the number of such queries is small, we decided to only use date filters with the $in and $nin operators and the most frequent date format (strftime format %B %-d, %Y) for easier matching in the database.
Table 1. Examples of extracted metadata filters using a few-shot prompt with corresponding queries. Note the correct usage of the $nin operator for the last query.

Query: Does the TechCrunch article report on new hiring at Starz, while the Engadget article discusses layoffs within the entire video game industry?
Extracted filter: {"source": {"$in": ["TechCrunch", "Engadget"]}}

Query: Did The Guardian's report on December 12, 2023, contradict the Sporting News report regarding the performance and future outlook of Manchester United?
Extracted filter: {"published_at": {"$in": ["December 12, 2023"]}, "source": {"$in": ["The Guardian", "Sporting News"]}}

Query: Who is the individual facing a criminal trial on seven counts of fraud and conspiracy, previously likened to a financial icon but not by TechCrunch, and is accused by the prosecution of committing fraud for wealth, power, and influence?
Extracted filter: {"source": {"$nin": ["TechCrunch"]}}
All queries have a source filter extracted, while a publication-date filter was extracted for 15.57% of queries, even though 22.81% of the queries in the MultiHop-RAG dataset are temporal.
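A minimal sketch of this extraction step is shown below. It assumes the OpenAI chat completions API and an abridged version of the few-shot prompt from Appendix 5.2; it is not the authors' exact implementation.

import ast
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Abridged version of the few-shot prompt from Appendix 5.2.
FEW_SHOT_PROMPT = """Given the question, extract the metadata to filter the database about article sources. Avoid stopwords.

Examples to follow:
Question: Who is the individual associated with the cryptocurrency industry facing a criminal trial on fraud and conspiracy charges, as reported by both The Verge and TechCrunch, and is accused by prosecutors of committing fraud for personal gain?
Answer: {{'source': {{'$in': ['The Verge', 'TechCrunch']}}}}

If you detect multiple queries, return the answer for the first. Now it is your turn:
Question: {query}
Answer:"""

def extract_metadata_filter(query: str) -> dict:
    """Ask a cheap helper LLM to produce a metadata filter for the query."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        temperature=0,
        messages=[{"role": "user", "content": FEW_SHOT_PROMPT.format(query=query)}],
    )
    text = response.choices[0].message.content.strip()
    # The few-shot examples use Python-style dict literals, so parse accordingly.
    return ast.literal_eval(text)

query = ("Did Engadget report a discount on the 13.6-inch MacBook Air before "
         "The Verge reported a discount on Samsung Galaxy Buds 2?")
print(extract_metadata_filter(query))
# Expected shape: {'source': {'$in': ['Engadget', 'The Verge']}}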

3.2 Improved Chunk Selection using Metadata Filtering

The extracted metadata can be used to enhance a RAG application (Figure 2). We split the articles in the MultiHop-RAG [12] knowledge base into chunks of 256 tokens each using the LlamaIndex [7] sentence splitter, as in the original MultiHop-RAG implementation. We also picked a chunk overlap of 32, finding that a smaller chunk overlap leads to a better variety of unique chunks in the top-K selection than the original implementation, which used the LlamaIndex default of 200. We selected the LangChain [2] Neo4j [9] vector store as the vector database because its index implementation recently (April 2024) started to support metadata filtering. We then convert the chunks into embeddings using an embedding model and save them into the vector database with the article metadata stored as node properties.
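The indexing stage might look roughly like the sketch below. It is not the released code: the import paths (recent llama-index and langchain-community versions are assumed), the Neo4j connection details, and the article records are assumptions for illustration.

from llama_index.core.node_parser import SentenceSplitter  # assumes llama-index >= 0.10
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Neo4jVector

# Hypothetical article records; the real knowledge base is the MultiHop-RAG news corpus.
articles = [
    {"text": "...full Engadget article text...", "source": "Engadget", "published_at": "October 27, 2023"},
    {"text": "...full The Verge article text...", "source": "The Verge", "published_at": "November 3, 2023"},
]

# Chunking as described above: 256-token chunks with an overlap of 32.
splitter = SentenceSplitter(chunk_size=256, chunk_overlap=32)
docs = [
    Document(page_content=chunk, metadata={"source": a["source"], "published_at": a["published_at"]})
    for a in articles
    for chunk in splitter.split_text(a["text"])
]

# Embed the chunks and store them in Neo4j with the metadata as node properties.
embedding = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")
store = Neo4jVector.from_documents(
    docs,
    embedding,
    url="bolt://localhost:7687",  # assumed local Neo4j instance
    username="neo4j",
    password="password",
    index_name="multihop_rag",
)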
Fig. 2. Multi-Meta-RAG: an improved RAG with database filtering using metadata. Metadata is extracted via a secondary LLM. With filtering, we can ensure the top-K chunks always come from relevant sources, with a better chance of getting a correct overall response.

In the retrieval stage, we transform the query using the same embedding model and retrieve the top-K most relevant chunks with the highest cosine similarity to the query embedding, filtering the chunks with the LLM-extracted metadata in the same stage. Similarly to MultiHop-RAG, we use a Reranker module (bge-reranker-large [15]) to examine retrieval performance: after retrieving 20 candidate chunks using the embedding model and metadata filter, we select the top-K chunks with the Reranker.
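Putting the retrieval stage together, a hedged sketch (reusing store and extract_metadata_filter from the earlier sketches) could look as follows; the filter argument to similarity_search and the FlagEmbedding reranker usage are assumptions about the library APIs rather than the authors' exact code.

from FlagEmbedding import FlagReranker

# Reuses `store` and `extract_metadata_filter` from the sketches above.
reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)

def retrieve_with_filter(query: str, top_k: int = 6):
    """Filtered similarity search followed by reranking, as described above."""
    metadata_filter = extract_metadata_filter(query)
    # Assumes Neo4jVector.similarity_search accepts a metadata `filter`
    # argument (metadata filtering support was added in 2024).
    candidates = store.similarity_search(query, k=20, filter=metadata_filter)
    scores = reranker.compute_score([[query, doc.page_content] for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]

query = ("Did Engadget report a discount on the 13.6-inch MacBook Air before "
         "The Verge reported a discount on Samsung Galaxy Buds 2?")
for doc in retrieve_with_filter(query):
    print(doc.metadata["source"], "->", doc.page_content[:60])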

4 Results

4.1 Chunk Retrieval Experiment

We selected the two best-performing embedding models from the original MultiHop-RAG experiment, bge-large-en-v1.5 [15] and voyage-02 [14], to test chunk retrieval performance with metadata filtering. The retrieved list of chunks is compared with the ground-truth evidence associated with each query, excluding the null queries, as they lack corresponding evidence. For evaluation, we assume the top-K chunks are retrieved and use metrics such as Mean Average Precision at K (MAP@K), Mean Reciprocal Rank at K (MRR@K), and Hit Rate at K (Hit@K). MAP@K measures the average precision of the top-K retrieval across all queries. MRR@K calculates the average reciprocal rank of the first relevant chunk within the top-K retrieved set for each query. Hit@K measures the proportion of evidence that appears in the top-K retrieved set. The experiment (Table 2) showed considerable improvement for both embeddings on all core metrics: MRR@10, MAP@10, Hits@10, and Hits@4. Most notably, for voyage-02, Hits@4 improved by 17.2%. This improvement matters for practical RAG systems, where the top-K retrieved should be as low as possible to respect context window limits and cost.
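For reference, the sketch below shows one plausible way to compute such retrieval metrics over per-query binary relevance judgements; the evaluation script in the MultiHop-RAG repository is the authoritative implementation, so the exact definitions here (in particular the query-level hit rate) are assumptions.

from typing import List

def mrr_at_k(relevant: List[List[bool]], k: int) -> float:
    """Mean reciprocal rank of the first relevant chunk in each top-K list."""
    total = 0.0
    for flags in relevant:
        for rank, hit in enumerate(flags[:k], start=1):
            if hit:
                total += 1.0 / rank
                break
    return total / len(relevant)

def map_at_k(relevant: List[List[bool]], k: int) -> float:
    """Mean average precision of the top-K retrieval across all queries."""
    total = 0.0
    for flags in relevant:
        hits, precision_sum = 0, 0.0
        for rank, hit in enumerate(flags[:k], start=1):
            if hit:
                hits += 1
                precision_sum += hits / rank
        total += precision_sum / hits if hits else 0.0
    return total / len(relevant)

def hit_at_k(relevant: List[List[bool]], k: int) -> float:
    """Fraction of queries whose top-K list contains at least one relevant chunk."""
    return sum(any(flags[:k]) for flags in relevant) / len(relevant)

# Each inner list marks whether the chunk at that rank matches gold evidence.
relevance = [[False, True, False, True], [True, False, False, False]]
print(mrr_at_k(relevance, 4), map_at_k(relevance, 4), hit_at_k(relevance, 4))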

Table 2. Chunk retrieval experiment results. Top-10 chunks are selected with bge-reranker-large after the top-20 chunks are found via similarity search and database metadata filtering. A chunk size of 256 and a chunk overlap of 32 are used. We evaluate both Baseline RAG and Multi-Meta-RAG using the evaluation script provided in the MultiHop-RAG repository.

Baseline RAG [12]
Embedding                     | MRR@10 | MAP@10 | Hits@10 | Hits@4
bge-large-en-v1.5 (evaluated) | 0.6029 | 0.2687 | 0.7490  | 0.6661
voyage-02 (evaluated)         | 0.6016 | 0.2619 | 0.7419  | 0.6630

Multi-Meta-RAG (ours)
Embedding                     | MRR@10 | MAP@10 | Hits@10 | Hits@4
bge-large-en-v1.5             | 0.6574 | 0.3293 | 0.8909  | 0.7672
voyage-02                     | 0.6748 | 0.3388 | 0.9042  | 0.7920

4.2 LLM Response Generation Experiment

Table 3. Overall generation accuracy of LLMs with MultiHop-RAG (top-6 chunks with voyage-02)

LLM                   | Ground-truth [12] | Baseline RAG [12] | Multi-Meta-RAG (ours)
GPT-4 (gpt-4-0613)    | 0.89              | 0.56              | 0.606
PaLM (text-bison@001) | 0.74              | 0.47              | 0.608

As with the embeddings, we picked the two LLMs that performed best on ground-truth chunks in the initial MultiHop-RAG experiments, GPT-4 and Google PaLM. We achieved substantial improvements in accuracy (Table 3) for both models compared to the baseline RAG implementation: Google PaLM accuracy improved from 0.47 to 0.608 (a 29.4% relative increase), and GPT-4 accuracy improved from 0.56 to 0.606 (an 8.2% relative increase). Accuracy is calculated by checking whether any word in the LLM-generated response is present in the correct gold answer for each question.
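A minimal sketch of that word-overlap accuracy check, as one interpretation of the description above rather than the benchmark's exact script, is:

import re

def is_correct(response: str, gold_answer: str) -> bool:
    """True if any word of the model response also appears in the gold answer."""
    response_words = set(re.findall(r"[a-z0-9]+", response.lower()))
    gold_words = set(re.findall(r"[a-z0-9]+", gold_answer.lower()))
    return bool(response_words & gold_words)

responses = ["Yes, it did.", "Insufficient information."]
gold_answers = ["Yes", "No"]
accuracy = sum(is_correct(r, g) for r, g in zip(responses, gold_answers)) / len(gold_answers)
print(accuracy)  # 0.5 on this toy example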

Table 4. Generation accuracy of LLMs with MultiHop-RAG per question type (top-6 chunks with voyage-02)

Question Type | GPT-4 (gpt-4-0613) | PaLM (text-bison@001)
Inference     | 0.951              | 0.9203
Comparison    | 0.382              | 0.5397
Temporal      | 0.256              | 0.4545
Null          | 0.9867             | 0.2492

Table 4 shows the detailed evaluation results per question type for GPT-4 and Google PaLM. Both models achieved remarkable scores exceeding 0.9 on inference queries. Google PaLM performs significantly better than GPT-4 on comparison and temporal queries. However, PaLM struggles with null questions, whereas GPT-4 achieves a near-perfect score. These results suggest that combining both models for different query types could be a valid strategy to further increase overall accuracy.

5 Conclusion

This paper introduces Multi-Meta-RAG, a method for improving RAG for multi-hop queries using database filtering with LLM-extracted metadata. Multi-Meta-RAG considerably improves results in both the chunk retrieval and LLM generation experiments while being relatively straightforward and explainable compared to alternative solutions like Graph RAG [3].

5.1 Limitations

The proposed solution still has some limitations. Firstly, extracting metadata requires a set of queries from a particular domain and question format, as well as additional inference time. Secondly, it requires the manual creation of a prompt template to extract the metadata from the query. Thirdly, while the improved results are encouraging, they still fall considerably below the results achieved by feeding the LLM precise ground-truth facts.
5.2 Future work

Future work includes trying more generic prompt templates for metadata extraction with multi-hop datasets from other domains. In addition, testing alternative LLMs, such as Llama 3.1 [13], on datasets with more recent cutoff dates is worthwhile.

Acknowledgments. This research was partially funded by the OpenAI Researcher Access Program (Application 0000005294).

Appendix

Metadata Extraction Prompt Template

Given the question, extract the metadata to filter the database about article
sources. Avoid stopwords.

The sources can only be from the list: ['Yardbarker', 'The Guardian', 'Revyuh Media', 'The Independent - Sports', 'Wired', 'Sport Grill', 'Hacker News', 'Iot Business News', 'Insidesport', 'Sporting News', 'Seeking Alpha', 'The Age', 'CBSSports.com', 'The Sydney Morning Herald', 'FOX News - Health', 'Science News For Students', 'Polygon', 'The Independent - Life and Style', 'FOX News - Entertainment', 'The Verge', 'Business Line', 'The New York Times', 'The Roar | Sports Writers Blog', 'Sportskeeda', 'BBC News - Entertainment & Arts', 'Business World', 'BBC News - Technology', 'Essentially Sports', 'Mashable', 'Advanced Science News', 'TechCrunch', 'Financial Times', 'Music Business Worldwide', 'The Independent - Travel', 'FOX News - Lifestyle', 'TalkSport', 'Yahoo News', 'Scitechdaily | Science Space And Technology News 2017', 'Globes English | Israel Business Arena', 'Wide World Of Sports', 'Rivals', 'Fortune', 'Zee Business', 'Business Today | Latest Stock Market And Economy News India', 'Sky Sports', 'Cnbc | World Business News Leader', 'Eos: Earth And Space Science News', 'Live Science: The Most Interesting Articles', 'Engadget']

Examples to follow:
Question: Who is the individual associated with the cryptocurrency industry
facing a criminal trial on fraud and conspiracy charges, as reported by both The
Verge and TechCrunch, and is accused by prosecutors of committing fraud for
personal gain?
Answer: {’source’: {’$in’: [’The Verge’, ’TechCrunch’]}}
Question: After the TechCrunch report on October 7, 2023, concerning Dave Clark's comments on Flexport, and the subsequent TechCrunch article on October 30, 2023, regarding Ryan Petersen's actions at Flexport, was there a change in the nature of the events reported?
Answer: {'source': {'$in': ['TechCrunch']}, 'published_at': {'$in': ['October 7, 2023', 'October 30, 2023']}}
Question: Which company, known for its dominance in the e-reader space and for offering exclusive invite-only deals during sales events, faced a stock decline due to an antitrust lawsuit reported by 'The Sydney Morning Herald' and discussed by sellers in a 'Cnbc | World Business News Leader' article?
Answer: {'source': {'$in': ['The Sydney Morning Herald', 'Cnbc | World Business News Leader']}}

If you detect multiple queries, return the answer for the first. Now it is your
turn:
Question: <query>
Answer:

References

1. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems. vol. 33, pp. 1877–1901. Curran Associates, Inc. (2020)
2. Chase, H.: LangChain (Oct 2022), https://github.com/langchain-ai/langchain
3. Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Larson, J.: From local to global: A graph RAG approach to query-focused summarization (2024), https://arxiv.org/abs/2404.16130
4. Ho, X., Duong Nguyen, A.K., Sugawara, S., Aizawa, A.: Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps. In: Scott, D., Bel, N., Zong, C. (eds.) Proceedings of the 28th International Conference on Computational Linguistics. pp. 6609–6625. International Committee on Computational Linguistics, Barcelona, Spain (Online) (Dec 2020). https://doi.org/10.18653/v1/2020.coling-main.580, https://aclanthology.org/2020.coling-main.580
5. Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., Liu, T.: A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions (2023)
6. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., Riedel, S., Kiela, D.: Retrieval-augmented generation for knowledge-intensive NLP tasks. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems. vol. 33, pp. 9459–9474. Curran Associates, Inc. (2020)
7. Liu, J.: LlamaIndex (Nov 2022). https://doi.org/10.5281/zenodo.1234, https://github.com/jerryjliu/llama_index
8. Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., Rozière, B., Schick, T., Dwivedi-Yu, J., Celikyilmaz, A., Grave, E., LeCun, Y., Scialom, T.: Augmented language models: a survey (2023)
9. Neo4j, Inc.: Neo4j graph database, https://neo4j.com/product/neo4j-graph-database
10. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P.F., Leike, J., Lowe, R.: Training language models to follow instructions with human feedback. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems. vol. 35, pp. 27730–27744. Curran Associates, Inc. (2022)
11. Shuster, K., Poff, S., Chen, M., Kiela, D., Weston, J.: Retrieval augmentation reduces hallucination in conversation. In: Moens, M., Huang, X., Specia, L., Yih, S.W. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021. pp. 3784–3803. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.findings-emnlp.320
12. Tang, Y., Yang, Y.: MultiHop-RAG: Benchmarking retrieval-augmented generation for multi-hop queries (2024)
13. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., Lample, G.: LLaMA: Open and efficient foundation language models (2023)
14. Voyage AI: Voyage AI cutting-edge embedding and rerankers, https://www.voyageai.com
15. Xiao, S., Liu, Z., Zhang, P., Muennighoff, N., Lian, D., Nie, J.Y.: C-Pack: Packaged resources to advance general Chinese embedding (2024)
16. Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W., Salakhutdinov, R., Manning, C.D.: HotpotQA: A dataset for diverse, explainable multi-hop question answering. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 2369–2380. Association for Computational Linguistics, Brussels, Belgium (Oct-Nov 2018). https://doi.org/10.18653/v1/D18-1259, https://aclanthology.org/D18-1259
