A Framework for Generative and Contrastive Learning of Audio Representations

Verma, Prateek; Smith, Julius

arXiv:2010.11459 (cs)

[Submitted on 22 Oct 2020 (v1), last revised 16 Mar 2021 (this version, v2)]

Abstract: In this paper, we present a framework for contrastive learning of audio representations in a self-supervised setting, without access to any ground-truth labels. The core idea in self-supervised contrastive learning is to map an audio signal and its various augmented versions (representative of salient aspects of audio such as pitch and timbre) to a space where they are close together and separated from other, different signals. In addition, we explore generative models based on state-of-the-art transformer architectures for learning latent spaces for audio signals, again without access to any labels. Here, we map audio signals on a smaller scale to discrete dictionary elements and train transformers to predict the next dictionary element. We use only the data itself as supervision, bypassing the need for labels to supervise the training of the deep neural networks. We then use a linear classifier head to evaluate the performance of our models, for both the self-supervised contrastive representations and the generative transformer-based representations. Our system achieves considerable performance compared to a fully supervised method that has access to ground-truth labels to train the neural network model. These representations, given the availability of large-scale audio data, show promise for various audio understanding tasks.
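
As a rough illustration of the contrastive idea described in the abstract (embeddings of two augmented views of the same clip are pulled together and pushed away from other clips), the sketch below computes an NT-Xent loss of the kind popularized by SimCLR. The abstract does not state which contrastive objective the authors use, so the choice of NT-Xent, the temperature value, and the function name `nt_xent_loss` are all assumptions made for illustration only.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.1):
    """Hypothetical helper: NT-Xent loss over a batch of embedding pairs.

    z_a[i] and z_b[i] are embeddings of two augmented views of clip i.
    This is an assumed objective, not necessarily the one used in the paper.
    """
    n = z_a.shape[0]
    # Stack both views and L2-normalize so dot products are cosine similarities.
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # A sample must not be compared with itself.
    np.fill_diagonal(sim, -np.inf)
    # Row i of the first view is paired with row i + n, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the positive pair.
    return float(-log_prob[np.arange(2 * n), positives].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(8, 16))                    # stand-in for clip embeddings
    augmented = clean + 0.01 * rng.normal(size=(8, 16))  # stand-in for an augmented view
    print("loss for matching views:", nt_xent_loss(clean, augmented))
```

The loss is small when each embedding is closest to its own augmented counterpart and grows when the paired views are unrelated. In a setup like the one the abstract describes, the encoder producing these embeddings would be trained to minimize such a loss and then frozen before a linear classifier head is fit on top.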

Comments:	6 pages, 2 figures, 5 page version
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2010.11459 [cs.SD]
	(or arXiv:2010.11459v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2010.11459

Submission history

From: Prateek Verma
[v1] Thu, 22 Oct 2020 05:52:32 UTC (523 KB)
[v2] Tue, 16 Mar 2021 21:41:13 UTC (681 KB)
