Sherlock: Modeling Structured Knowledge in Images

Elhoseiny, Mohamed; Cohen, Scott; Chang, Walter; Price, Brian; Elgammal, Ahmed

arXiv:1511.04891v3 (cs)

[Submitted on 16 Nov 2015 (v1), revised 8 Jan 2016 (this version, v3), latest version 2 Apr 2016 (v4)]

Abstract: How can we build a machine learning method that continuously gains structured visual knowledge by learning structured facts? This paper addresses that question by proposing a problem setting in which training data comes as structured facts in images, of several types: (1) objects (e.g., < boy >), (2) attributes (e.g., < boy, tall >), (3) actions (e.g., < boy, playing >), and (4) interactions (e.g., < boy, riding, a horse >). Each structured fact has a semantic language view (e.g., < boy, playing >) and a visual view (an image containing this fact). A human can efficiently gain visual knowledge by learning facts in a never-ending process and, we believe, in a structured way (e.g., understanding that "playing" is the action part of < boy, playing >, and hence generalizing to recognize < girl, playing > after additionally learning only < girl >). Inspired by human visual perception, we propose a model that (1) learns a representation, which we call wild-card, that covers different types of structured facts, (2) can flexibly be fed structured-fact language-visual view pairs in a never-ending way to gain more structured knowledge, (3) can generalize to unseen facts, and (4) allows retrieval of the fact's language view given the visual view (i.e., an image) and vice versa. We also propose a novel method to generate hundreds of thousands of structured fact pairs from image caption data; these pairs are necessary to train our model and can be useful for other applications.
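
The four fact types listed in the abstract can be pictured as triples whose unused slots are left open. The following is a minimal illustrative sketch, not the authors' model: the class name `StructuredFact`, the `WILDCARD` marker, and the `order` helper are hypothetical names chosen here, and padding empty slots with a marker is only an assumption about how one representation could cover all fact types.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical marker for a slot that a fact does not fill.
WILDCARD = "*"


@dataclass(frozen=True)
class StructuredFact:
    """Language view of a fact: subject, optional predicate, optional object."""

    subject: str
    predicate: Optional[str] = None
    obj: Optional[str] = None

    def __post_init__(self) -> None:
        # An interaction needs a predicate linking subject and object.
        if self.obj is not None and self.predicate is None:
            raise ValueError("an object slot requires a predicate")

    def order(self) -> int:
        """Number of filled slots: 1 (object) up to 3 (interaction)."""
        return 1 + int(self.predicate is not None) + int(self.obj is not None)

    def as_triple(self) -> Tuple[str, str, str]:
        """Pad unfilled slots so every fact has the same three-slot shape."""
        return (
            self.subject,
            self.predicate if self.predicate is not None else WILDCARD,
            self.obj if self.obj is not None else WILDCARD,
        )


# The examples from the abstract, expressed in the sketch's notation.
facts = [
    StructuredFact("boy"),                     # object
    StructuredFact("boy", "tall"),             # attribute
    StructuredFact("boy", "playing"),          # action
    StructuredFact("boy", "riding", "horse"),  # interaction
]

for fact in facts:
    print(fact.order(), fact.as_triple())
```

Because every fact shares the same three-slot shape, < boy, playing > and < girl, playing > differ only in the subject slot, which is the kind of structure the abstract suggests could support generalization to unseen facts.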

Comments:	Jan 7 Update
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1511.04891 [cs.CV]
	(or arXiv:1511.04891v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1511.04891

Submission history

From: Mohamed Elhoseiny
[v1] Mon, 16 Nov 2015 09:56:04 UTC (5,197 KB)
[v2] Thu, 19 Nov 2015 22:36:55 UTC (7,322 KB)
[v3] Fri, 8 Jan 2016 02:56:24 UTC (8,880 KB)
[v4] Sat, 2 Apr 2016 05:26:39 UTC (8,879 KB)
