Normalized and Geometry-Aware Self-Attention Network for Image Captioning

Guo, Longteng; Liu, Jing; Zhu, Xinxin; Yao, Peng; Lu, Shichen; Lu, Hanqing

arXiv:2003.08897 (cs)

[Submitted on 19 Mar 2020]

Abstract: Self-attention (SA) networks have shown profound value in image captioning. In this paper, we improve SA in two respects to boost image captioning performance. First, we propose Normalized Self-Attention (NSA), a reparameterization of SA that brings the benefits of normalization inside SA. Whereas normalization has previously been applied only outside SA, we introduce a novel normalization method and demonstrate that it is both possible and beneficial to perform it on the hidden activations inside SA. Second, to compensate for a major limitation of the Transformer, namely that it fails to model the geometric structure of the input objects, we propose a class of Geometry-aware Self-Attention (GSA) that extends SA to explicitly and efficiently consider the relative geometric relations between the objects in the image. To construct our image captioning model, we combine the two modules and apply them to the vanilla self-attention network. We extensively evaluate our proposals on the MS-COCO image captioning dataset and achieve superior results compared with state-of-the-art approaches. Further experiments on three challenging tasks, i.e., video captioning, machine translation, and visual question answering, show the generality of our methods.
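The two ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' formulation: the abstract does not specify which activations are normalized or how geometry enters the attention scores, so the choices below are assumptions made for illustration. Here the queries are standardized per feature channel across the objects (a stand-in for normalization inside SA), and a fixed penalty based on the distance between bounding-box centers is added to the attention logits (a stand-in for a learned geometry term). The function names, the `(cx, cy, w, h)` box layout, and the `weight` parameter are all hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of logits.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def normalize_channels(x, eps=1e-6):
    # Assumption: standardize each feature channel across the objects (rows).
    n, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in x]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        scale = math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = (x[i][j] - mean) / scale
    return out

def geometry_bias(boxes, weight=1.0):
    # Assumption: boxes are (cx, cy, w, h); the bias is a fixed penalty on
    # center distance, scaled by the query box size. A real model would
    # typically learn this term instead.
    bias = []
    for (xi, yi, wi, hi) in boxes:
        size = math.sqrt(wi * hi)
        bias.append([-weight * math.hypot(xi - xj, yi - yj) / size
                     for (xj, yj, _, _) in boxes])
    return bias

def attend(queries, keys, values, boxes, weight=1.0):
    # Scaled dot-product attention with normalized queries and a geometry
    # bias added to the logits before the softmax.
    q = normalize_channels(queries)
    d = len(keys[0])
    bias = geometry_bias(boxes, weight)
    weights = []
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + bias[i][j]
                  for j, kj in enumerate(keys)]
        weights.append(softmax(logits))
    width = len(values[0])
    out = [[sum(w * v[c] for w, v in zip(row, values)) for c in range(width)]
           for row in weights]
    return out, weights

# Three toy objects: two neighbors and one far away along the x axis.
feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
boxes = [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 1.0, 1.0), (10.0, 0.0, 1.0, 1.0)]
out, weights = attend(feats, feats, feats, boxes)
```

Setting `weight=0.0` recovers attention without the geometry term, which makes it easy to compare how much attention the first object pays to the distant third object with and without the bias.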

Comments:	Accepted by CVPR 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Multimedia (cs.MM)
Cite as:	arXiv:2003.08897 [cs.CV]
	(or arXiv:2003.08897v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2003.08897

Submission history

From: Longteng Guo
[v1] Thu, 19 Mar 2020 16:54:16 UTC (1,289 KB)
