Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

Qi, Jirui; Sarti, Gabriele; Fernández, Raquel; Bisazza, Arianna

arXiv:2406.13663v2 (cs)

[Submitted on 19 Jun 2024 (v1), revised 1 Jul 2024 (this version, v2), latest version 18 Oct 2024 (v4)]

Abstract: Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect LLMs' context usage throughout the generation. In this work, we present MIRAGE (Model Internals-based RAG Explanations), a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing for a finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution.
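The two steps named in the abstract (detecting context-sensitive answer tokens, then pairing them with the retrieved documents that contributed to their prediction) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that, for each answer token, the next-token distributions with and without the retrieved context are already available, and that a per-context-token saliency vector has already been computed by some attribution method. It also assumes KL divergence as the context-sensitivity measure and fixed thresholds for both steps. All function names, parameters, and thresholds here are hypothetical, and MIRAGE's actual metric, saliency method, and selection strategy may differ.

```python
import math
from typing import Dict, List, Tuple


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def context_sensitive_tokens(
    with_ctx: List[List[float]], without_ctx: List[List[float]], threshold: float
) -> List[int]:
    """Step 1 (assumed form): flag answer positions whose prediction shifts when context is removed."""
    return [
        i
        for i, (p, q) in enumerate(zip(with_ctx, without_ctx))
        if kl_divergence(p, q) > threshold
    ]


def attribute_to_documents(
    saliency: List[float], doc_spans: Dict[str, Tuple[int, int]], threshold: float
) -> List[str]:
    """Step 2 (assumed form): sum saliency inside each document's half-open span, keep high scorers."""
    scores = {doc: sum(saliency[start:end]) for doc, (start, end) in doc_spans.items()}
    return sorted(doc for doc, score in scores.items() if score > threshold)


def mirage_like_attribution(
    with_ctx: List[List[float]],
    without_ctx: List[List[float]],
    saliencies: List[List[float]],
    doc_spans: Dict[str, Tuple[int, int]],
    cti_threshold: float,
    cci_threshold: float,
) -> Dict[int, List[str]]:
    """Map each context-sensitive answer position to the documents it is attributed to."""
    return {
        i: attribute_to_documents(saliencies[i], doc_spans, cci_threshold)
        for i in context_sensitive_tokens(with_ctx, without_ctx, cti_threshold)
    }


if __name__ == "__main__":
    # Toy data (fabricated for illustration): two answer tokens, binary vocabulary.
    with_ctx = [[0.9, 0.1], [0.5, 0.5]]
    without_ctx = [[0.1, 0.9], [0.5, 0.5]]
    # One saliency score per context token; doc1 covers tokens 0-1, doc2 covers tokens 2-4.
    saliencies = [[0.05, 0.6, 0.2, 0.05, 0.1], [0.2] * 5]
    doc_spans = {"doc1": (0, 2), "doc2": (2, 5)}
    print(mirage_like_attribution(with_ctx, without_ctx, saliencies, doc_spans, 0.5, 0.5))
    # prints {0: ['doc1']}
```

In this toy run only the first answer token changes its prediction when the context is removed, and most of its saliency mass falls inside `doc1`, so it is cited to `doc1` alone. Exposing the two thresholds separately mirrors, in simplified form, the kind of attribution parameters the abstract says MIRAGE lets users control.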

Comments:	Under review. Code and data released at this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2406.13663 [cs.CL]
	(or arXiv:2406.13663v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.13663

Submission history

From: Jirui Qi
[v1] Wed, 19 Jun 2024 16:10:26 UTC (11,795 KB)
[v2] Mon, 1 Jul 2024 12:39:26 UTC (11,796 KB)
[v3] Thu, 3 Oct 2024 11:03:22 UTC (3,226 KB)
[v4] Fri, 18 Oct 2024 13:16:57 UTC (3,226 KB)
