Uncertainty-Aware Image Captioning

Fei, Zhengcong; Fan, Mingyuan; Zhu, Li; Huang, Junshi; Wei, Xiaoming; Wei, Xiaolin

arXiv:2211.16769 (cs)

[Submitted on 30 Nov 2022]

Abstract: It is widely believed that the higher the uncertainty of a word in a caption, the more inter-correlated context information is required to determine it. However, current image captioning methods usually generate all words in a sentence sequentially and treat them equally. In this paper, we propose an uncertainty-aware image captioning framework that iteratively inserts discontinuous candidate words between existing words, in parallel and from easy to difficult, until convergence. We hypothesize that high-uncertainty words in a sentence need more prior information to be decided correctly and should therefore be produced at a later stage. The resulting non-autoregressive hierarchy makes caption generation explainable and intuitive. Specifically, we use an image-conditioned bag-of-words model to measure word uncertainty and apply a dynamic programming algorithm to construct the training pairs. During inference, we devise an uncertainty-adaptive parallel beam search technique that yields an empirically logarithmic time complexity. Extensive experiments on the MS COCO benchmark show that our approach outperforms a strong baseline and related methods in both captioning quality and decoding speed.
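The easy-to-difficult insertion order described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the uncertainty scores are made-up numbers standing in for the output of the image-conditioned bag-of-words model, the function name `insertion_stages` is hypothetical, and a simple greedy rule (one lowest-uncertainty word per gap per stage) is swapped in for the dynamic programming procedure the paper uses to construct training pairs.

```python
from typing import List, Sequence


def insertion_stages(words: Sequence[str],
                     uncertainty: Sequence[float]) -> List[List[str]]:
    """Return the partial caption after each parallel insertion stage.

    Assumption (illustrative only): at every stage, each gap between
    already-placed words receives at most one new word, namely the
    lowest-uncertainty word still missing from that gap.
    """
    if len(words) != len(uncertainty):
        raise ValueError("words and uncertainty must have the same length")

    placed = [False] * len(words)
    stages: List[List[str]] = []

    while not all(placed):
        chosen: List[int] = []
        gap: List[int] = []  # indices of unplaced words in the current gap
        # The extra index len(words) acts as a sentinel that closes the last gap.
        for i in range(len(words) + 1):
            if i < len(words) and not placed[i]:
                gap.append(i)
                continue
            if gap:
                # Pick the easiest (lowest-uncertainty) missing word in this gap.
                chosen.append(min(gap, key=lambda j: uncertainty[j]))
                gap = []
        for j in chosen:
            placed[j] = True
        stages.append([w for w, p in zip(words, placed) if p])

    return stages


if __name__ == "__main__":
    caption = ["a", "dog", "chases", "a", "red", "ball"]
    scores = [0.1, 0.3, 0.6, 0.2, 0.9, 0.4]  # hypothetical uncertainty values
    for step, partial in enumerate(insertion_stages(caption, scores), start=1):
        print(step, " ".join(partial))
```

In this toy setting low-uncertainty words such as articles appear first, while high-uncertainty words such as "red" are filled in last, once their neighbours are available as context. Because every gap can receive a word in the same stage, the number of stages can be much smaller than the caption length.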

Comments:	Accepted by AAAI 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2211.16769 [cs.CV]
	(or arXiv:2211.16769v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2211.16769

Submission history

From: Zhengcong Fei [view email]
[v1] Wed, 30 Nov 2022 06:19:47 UTC (1,287 KB)
