Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition

Moon, Seungwhan; Kim, Suyoun; Wang, Haohan

arXiv:1412.3121 (cs)

[Submitted on 9 Dec 2014 (v1), last revised 18 Feb 2016 (this version, v2)]

Abstract: We propose a transfer deep learning (TDL) framework that transfers the knowledge obtained from a single-modal neural network to a network of a different modality. Specifically, we show that speech data can be leveraged to fine-tune a network trained for video recognition, given an initial parallel audio-video dataset that shares the same semantics. Our approach first learns analogy-preserving embeddings between the abstract representations learned at intermediate layers of each network, allowing semantics-level transfer between the source and target modalities. We then apply our neural network operation, which fine-tunes the target network with the additional knowledge transferred from the source network while keeping the topology of the target network unchanged. Although we present an audio-visual recognition task as an application, the framework is flexible and can work with any multimodal dataset, or with any existing deep networks that share common underlying semantics. In this work-in-progress report, we aim to provide comprehensive results for different configurations of the proposed approach on two widely used audio-visual datasets, and we discuss potential applications of the proposed approach.
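The two-step idea in the abstract (learn a mapping between intermediate-layer representations from paired data, then carry source-modality activations over to the target network) can be illustrated with a toy sketch. This is not the authors' method: a plain linear map fitted by gradient descent stands in for the analogy-preserving embedding, the activations are synthetic, and every name here (`fit_linear_map`, `apply_map`, `audio_acts`, `video_acts`) is hypothetical.

```python
import random

def fit_linear_map(src, tgt, lr=0.5, epochs=500):
    """Fit W by batch gradient descent so that W @ s approximates t
    for each paired activation (s, t). Assumption: a linear map is
    used here only as a stand-in for a learned embedding."""
    d_s, d_t = len(src[0]), len(tgt[0])
    n = len(src)
    W = [[0.0] * d_s for _ in range(d_t)]
    for _ in range(epochs):
        grad = [[0.0] * d_s for _ in range(d_t)]
        for s, t in zip(src, tgt):
            pred = apply_map(W, s)
            for i in range(d_t):
                err = pred[i] - t[i]
                for j in range(d_s):
                    grad[i][j] += err * s[j] / n
        for i in range(d_t):
            for j in range(d_s):
                W[i][j] -= lr * grad[i][j]
    return W

def apply_map(W, s):
    """Map a source-modality activation vector into the target space."""
    return [sum(w * x for w, x in zip(row, s)) for row in W]

# Synthetic "parallel" data: 3-d audio activations paired with 2-d video
# activations generated from a known linear relation (toy assumption).
random.seed(0)
true_W = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]]
audio_acts = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
video_acts = [apply_map(true_W, s) for s in audio_acts]

# Step 1: learn the cross-modal map from the paired activations.
W = fit_linear_map(audio_acts, video_acts)

# Step 2: project an unpaired audio activation into the video network's
# representation space; in a fuller pipeline such projected vectors would
# serve as extra inputs when fine-tuning the target network's upper layers.
new_audio = [0.2, -0.4, 0.9]
transferred = apply_map(W, new_audio)
```

In this sketch the target network's topology is untouched: only the inputs fed to its upper layers are augmented, which mirrors the constraint stated in the abstract without claiming to reproduce the paper's fine-tuning operation.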

Comments:	6 pages, MMML workshop at NIPS 2015
Subjects:	Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Cite as:	arXiv:1412.3121 [cs.NE]
	(or arXiv:1412.3121v2 [cs.NE] for this version)
	https://doi.org/10.48550/arXiv.1412.3121

Submission history

From: Seungwhan Moon
[v1] Tue, 9 Dec 2014 21:12:19 UTC (554 KB)
[v2] Thu, 18 Feb 2016 19:56:41 UTC (248 KB)
