Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing

Harzig, Philipp; Zecha, Dan; Lienhart, Rainer; Kaiser, Carolin; Schallner, René

doi:10.1109/MIPR.2019.00085

arXiv:1905.01919 (cs)

[Submitted on 6 May 2019]

Abstract: Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences and disregard the fine-grained semantics of the captions. In our setting of images depicting persons interacting with branded products, the subject, predicate, object, and the name of the branded product are important evaluation criteria for the generated captions. Generating image captions under these constraints is a new challenge, which we tackle in this work. By simultaneously predicting integer-valued ratings that describe attributes of the human-product interaction, we optimize a deep neural network architecture in a multi-task learning setting, which considerably improves the caption quality. Furthermore, we introduce a novel metric that allows us to assess whether the generated captions meet our requirements (i.e., subject, predicate, object, and product name), and we describe a series of experiments on caption quality and on how to address annotator disagreements for the image ratings with an approach called soft targets. We also show that our novel clause-focused metrics are applicable to other image captioning datasets, such as the popular MSCOCO dataset.
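
The abstract names soft targets as a way to handle annotator disagreement on the integer-valued image ratings but does not spell out the formulation. The sketch below shows one common reading of the idea, not necessarily the authors' method: the ratings given by several annotators are turned into an empirical probability distribution, and a predicted distribution is scored against it with cross-entropy. The function names `soft_target` and `cross_entropy`, the five-point rating scale, and the example ratings are all assumptions made for illustration.

```python
import math
from collections import Counter


def soft_target(ratings, num_classes):
    """Convert annotator ratings (integers in 0..num_classes-1) into
    an empirical probability distribution over the rating classes."""
    counts = Counter(ratings)
    total = len(ratings)
    return [counts.get(k, 0) / total for k in range(num_classes)]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one.
    eps guards against log(0); zero-mass target classes are skipped."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted) if t > 0)


# Hypothetical example: three annotators rate one image on an assumed 5-point scale.
target = soft_target([2, 2, 3], num_classes=5)

# A prediction that puts nearly all mass on class 2 ignores the disagreement.
peaked = [0.01, 0.01, 0.96, 0.01, 0.01]

loss_matching = cross_entropy(target, target)
loss_peaked = cross_entropy(target, peaked)
```

Under this reading, a prediction that mirrors the spread of annotator opinions incurs a lower loss than one that commits to a single rating, which is the behaviour a one-hot (majority-vote) target cannot express.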

Comments:	6 pages, accepted at MIPR 2019
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1905.01919 [cs.CV]
	(or arXiv:1905.01919v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1905.01919
Related DOI:	https://doi.org/10.1109/MIPR.2019.00085

Submission history

From: Philipp Harzig
[v1] Mon, 6 May 2019 10:42:10 UTC (1,690 KB)
