Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

Chen, Beitao; Lyu, Xinyu; Gao, Lianli; Song, Jingkuan; Shen, Heng Tao


arXiv:2405.15356 (cs)

[Submitted on 24 May 2024 (v1), last revised 19 Nov 2024 (this version, v2)]

Abstract: Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropriately widens the contrastive logits gap between hallucinatory and targeted tokens. However, due to the uncontrollable nature of the global visual uncertainty, they struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations and may even lead to the generation of undesired hallucinations. To tackle this issue, we conduct a theoretical analysis to promote the effectiveness of contrast decoding. Building on this insight, we introduce a novel optimization strategy named Hallucination-Induced Optimization (HIO). This strategy seeks to amplify the contrast between hallucinatory and targeted tokens by relying on a fine-tuned theoretical preference model (i.e., the Contrary Bradley-Terry Model), thereby facilitating efficient contrast decoding to alleviate hallucinations in LVLMs. Extensive experiments demonstrate that our HIO strategy can effectively reduce hallucinations in LVLMs, outperforming state-of-the-art methods across various benchmarks.
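To make the "contrastive logits gap" mentioned in the abstract concrete, the following is a minimal sketch of generic contrastive decoding over a single decoding step. It is not the authors' HIO procedure. The function names, the toy logits, and the weighting form `(1 + alpha) * target - alpha * induced` are assumptions chosen for illustration, and the "induced" logits simply stand in for the output of some hallucination-prone model or input.

```python
def contrastive_logits(target_logits, induced_logits, alpha=1.0):
    """Combine two logit vectors for one decoding step.

    Assumed form (illustrative, not taken from the paper):
        (1 + alpha) * target - alpha * induced
    Tokens that the hallucination-prone ("induced") branch favours
    are pushed down relative to the target branch.
    """
    if len(target_logits) != len(induced_logits):
        raise ValueError("logit vectors must have the same length")
    return [
        (1.0 + alpha) * t - alpha * h
        for t, h in zip(target_logits, induced_logits)
    ]


def argmax(values):
    """Index of the largest value (greedy token choice)."""
    return max(range(len(values)), key=values.__getitem__)


def contrastive_pick(target_logits, induced_logits, alpha=1.0):
    """Greedy token choice after contrasting the two branches."""
    return argmax(contrastive_logits(target_logits, induced_logits, alpha))


# Hypothetical three-token vocabulary and made-up logits.
vocab = ["cat", "dog", "frisbee"]
target = [2.0, 2.2, 0.5]   # target branch slightly prefers "dog"
induced = [1.0, 3.0, 0.5]  # hallucination-prone branch strongly prefers "dog"

plain_choice = vocab[argmax(target)]
contrast_choice = vocab[contrastive_pick(target, induced)]
```

In this toy setting the wider the gap between the two branches on the hallucinatory token, the more strongly that token is suppressed, which is the effect the abstract says precise hallucination induction is meant to strengthen.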

Comments:	Accepted by NeurIPS 2024. arXiv admin note: text overlap with arXiv:2311.16922 by other authors
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.15356 [cs.CV]
	(or arXiv:2405.15356v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.15356

Submission history

From: Beitao Chen
[v1] Fri, 24 May 2024 08:46:31 UTC (950 KB)
[v2] Tue, 19 Nov 2024 13:18:57 UTC (1,081 KB)
