Context Awareness Gate For Retrieval Augmented Generation
Mohammad Hassan Heydari, Arshia Hemmat, Erfan Naman, Afsaneh Fatemi
Computer Engineering Faculty, University of Isfahan
Isfahan, Iran
[email protected] [email protected] [email protected] [email protected]
arXiv:2411.16133v1 [cs.LG] 25 Nov 2024
Abstract—Retrieval-Augmented Generation (RAG) has emerged as a widely adopted approach to mitigate the limitations of large language models (LLMs) in answering domain-specific questions. Previous research has predominantly focused on improving the accuracy and quality of retrieved data chunks to enhance the overall performance of the generation pipeline. However, despite ongoing advancements, the critical issue of retrieving irrelevant information—which can impair a model's ability to utilize its internal knowledge effectively—has received minimal attention. In this work, we investigate the impact of retrieving irrelevant information in open-domain question answering, highlighting its significant detrimental effect on the quality of LLM outputs. To address this challenge, we propose the Context Awareness Gate (CAG) architecture, a novel mechanism that dynamically adjusts the LLM's input prompt based on whether the user query necessitates external context retrieval. Additionally, we introduce the Vector Candidates method, a core mathematical component of CAG that is statistical, LLM-independent, and highly scalable. We further examine the distributions of relationships between contexts and questions, presenting a statistical analysis of these distributions. This analysis can be leveraged to enhance the context retrieval process in retrieval-augmented generation (RAG) systems.

Index Terms—Retrieval-Augmented Generation, Hallucination, Large Language Models, Open Domain Question Answering

I. INTRODUCTION

Retrieval-augmented generation (RAG) has emerged as a leading approach for implementing question-answering systems that require intensive domain-specific knowledge [1]. This method allows for the utilization of customized datasets to generate answers grounded in the information provided by those datasets. However, the effectiveness of the retrieval component within RAG pipelines is critical, as it directly influences the reliability and quality of the generated outputs [2], [3].

In efforts to enhance the quality of the retrieval component in RAG pipelines, research has demonstrated that transforming the user's input query into varying levels of abstraction before conducting the document search can significantly improve the relevance of the retrieved data. Several methods have been proposed, including query expansion into multi-query searches, chain of verification [4], [5], pseudo-context search [6], and query transformation [7], [8], [9], [10]. These approaches contribute to more accurate and effective retrieval of information.

Despite ongoing efforts to develop more reliable retrieval methods for extracting relevant data chunks, many question-answering systems do not solely rely on local or domain-specific datasets for answering user queries. In addition to domain-specific user queries, many input queries do not necessitate retrieval from local datasets, which reduces the scalability and reliability of question-answering systems [11]. To tackle this limitation, retrieval methods based on query classification and routing mechanisms have proven effective in enhancing retrieval accuracy by directing the search toward a set of documents closely related to the user's query [10]. However, in our study, we demonstrate that even with semantic routing, the probability of retrieving irrelevant information remains non-negligible, particularly when dealing with a broad domain of potential queries.

Due to the inherently local search mechanism of Retrieval-Augmented Generation (RAG) systems [1], [12], even for queries that are largely irrelevant, the pipeline will still return a set number of passages. While existing research has made strides in addressing the challenge of imperfect data retrieval [11], [13], [14], the issue of broad-domain question answering in RAG systems has received relatively little attention.

Many queries submitted to RAG-enhanced question-answering (QA) systems do not require data retrieval, such as daily conversations, general knowledge questions, or questions that large language models (LLMs) themselves can answer using their internal knowledge [10], [11], [15]. Retrieving passages for all input queries, especially in these cases, can significantly diminish the retrieval precision [11] and the context relevancy [16], often rendering them entirely irrelevant.

To address this issue, we propose a novel context-aware gate architecture for RAG-enhanced systems that is highly scalable and capable of dynamically routing the LLM input prompt to increase the quality of pipeline outputs.

For better comprehension of our work, we highlight three main contributions in this study:

• Context Awareness Gate (CAG): We introduce a novel gate architecture that significantly broadens the domain accessibility of RAG systems. CAG leverages both query transformation and dynamic prompting to enhance the
reliability of RAG pipelines in both open-domain and closed-domain question answering tasks.

• Vector Candidates (VC): We propose a statistical semantic analysis algorithm that improves semantic search and routing by utilizing the concept of pseudo-queries and in-dataset embedding distributions.

• Context Retrieval Supervision Benchmark (CRSB) Dataset: Alongside our technical and statistical investigations, we introduce the CRSB dataset, which consists of data from 17 diverse fields. We study the inner context-query distributions of this rich dataset and demonstrate the effectiveness and scalability of Vector Candidates on practical QA systems¹.

al. (2024) [20] argue that simply adding more context to the LLM input prompt does not necessarily improve performance. In a recent and highly relevant study, Wang et al. (2024) [11] show that when retrieval precision is below 20%, RAG is not beneficial for QA systems. They highlight that when retrieval precision approaches zero, the RAG pipeline performs significantly worse than a pipeline without RAG.

III. APPROACH

To address the challenges associated with retrieving irrelevant information [11], we propose the Context Awareness Gate (CAG) architecture, which utilizes Vector Candidates as its primary statistical method for query classification. CAG significantly improves the performance of open-domain question-answering systems by dynamically adjusting the input prompt for the LLM, transitioning from RAG-based context prompts to Few-Shot, Chain-of-Thought (CoT) [4], [5], and other methodologies. Consequently, the LLM responds to user queries based on its internal knowledge base.
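The gate described above can be sketched in a few lines. The following is a minimal illustration, assuming cosine similarity over precomputed embeddings and a threshold estimated from the similarity distribution between contexts and their pseudo-queries; the function names, the percentile choice, and the toy embeddings are our own assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch of a Context Awareness Gate with a
# Vector Candidates-style threshold (assumed design, not the paper's code).
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def vector_candidates_threshold(context_embs, pseudo_query_embs, percentile=5):
    """Estimate a similarity threshold from the in-dataset distribution of
    similarities between each context and its generated pseudo-query."""
    sims = [cosine(c, q) for c, q in zip(context_embs, pseudo_query_embs)]
    return float(np.percentile(sims, percentile))

def context_awareness_gate(query_emb, context_embs, threshold):
    """Return True when the query is close enough to the local dataset that
    retrieval (RAG prompting) should be used; False routes the query to the
    LLM's internal knowledge (e.g., zero-/few-shot or CoT prompting)."""
    best = max(cosine(query_emb, c) for c in context_embs)
    return best >= threshold

# Toy usage with random embeddings standing in for a real encoder.
rng = np.random.default_rng(0)
contexts = rng.normal(size=(100, 8))
pseudo_queries = contexts + 0.1 * rng.normal(size=(100, 8))  # near their contexts
tau = vector_candidates_threshold(contexts, pseudo_queries)
use_rag = context_awareness_gate(contexts[0], contexts, tau)  # in-domain query
```

In this sketch, a query whose best similarity to any dataset context falls below the learned threshold bypasses retrieval entirely, which is the dynamic-prompting behavior the Approach section describes.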