Learning Multi-Modal Word Representation Grounded in Visual Context

Zablocki, Éloi; Piwowarski, Benjamin; Soulier, Laure; Gallinari, Patrick

arXiv:1711.03483 (cs)

[Submitted on 9 Nov 2017]

Abstract: Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual context elements in a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results.
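
The abstract does not give the training objective, so the following is only a rough sketch of how a skip-gram loss might be extended with a visual context term; it is not the authors' formulation. It uses the standard skip-gram negative-sampling loss for the textual part and, as an assumption, reuses the same loss for visual context vectors that are presumed to be already projected into the word-embedding space. The function names, the additive combination, and the weight `alpha` are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(word_vec: list[float],
              positive_ctx: list[float],
              negative_ctxs: list[list[float]]) -> float:
    """Standard skip-gram negative-sampling loss for one (word, context) pair."""
    # Pull the word toward its observed context ...
    loss = -math.log(sigmoid(dot(word_vec, positive_ctx)))
    # ... and push it away from each sampled negative context.
    for neg in negative_ctxs:
        loss -= math.log(sigmoid(-dot(word_vec, neg)))
    return loss


def multimodal_loss(word_vec: list[float],
                    text_pos: list[float],
                    text_negs: list[list[float]],
                    vis_pos: list[float],
                    vis_negs: list[list[float]],
                    alpha: float = 0.5) -> float:
    """Hypothetical combination: textual loss plus alpha times visual loss.

    Assumption: vis_pos and vis_negs are visual context features that have
    already been mapped to the same dimension as word_vec.
    """
    text_term = sgns_loss(word_vec, text_pos, text_negs)
    visual_term = sgns_loss(word_vec, vis_pos, vis_negs)
    return text_term + alpha * visual_term


# Toy usage with 2-dimensional vectors (values chosen only for illustration).
word = [0.5, -0.2]
loss = multimodal_loss(word,
                       text_pos=[0.4, 0.1], text_negs=[[-0.3, 0.6]],
                       vis_pos=[0.7, -0.5], vis_negs=[[-0.2, 0.9]])
```

In this sketch the same word vector is shared by both terms, so gradients from textual and visual contexts would update a single embedding; setting `alpha` to 0 recovers the text-only skip-gram loss.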

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1711.03483 [cs.CL]
	(or arXiv:1711.03483v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1711.03483

Submission history

From: Eloi Zablocki
[v1] Thu, 9 Nov 2017 17:28:07 UTC (715 KB)
