Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition

Wu, Yujin; Daoudi, Mohamed; Amad, Ali

arXiv:2303.17611 (cs)

[Submitted on 29 Mar 2023]

Abstract: Recently, wearable emotion recognition based on peripheral physiological signals has drawn considerable attention because it is less invasive and applicable in real-life scenarios. However, effectively fusing multimodal data remains a challenging problem. Moreover, traditional fully supervised approaches suffer from overfitting when labeled data is limited. To address these issues, we propose a novel self-supervised learning (SSL) framework for wearable emotion recognition, in which efficient multimodal fusion is realized with temporal convolution-based modality-specific encoders and a transformer-based shared encoder, capturing both intra-modal and inter-modal correlations. A large amount of unlabeled data is automatically assigned labels by five signal transforms, and the proposed SSL model is pre-trained with signal transformation recognition as a pretext task, allowing the extraction of generalized multimodal representations for emotion-related downstream tasks. For evaluation, the proposed SSL model was first pre-trained on a large-scale self-collected physiological dataset, and the resulting encoder was subsequently frozen or fine-tuned on three public supervised emotion recognition datasets. Our SSL-based method achieved state-of-the-art results in various emotion classification tasks. The proposed model also proved to be more accurate and robust than fully supervised methods in low-data regimes.
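
The pretext task described in the abstract (labeling unlabeled signals by the transform applied to them) can be illustrated with a minimal sketch. The abstract does not name the five transforms, so the ones below (noise, scaling, negation, time reversal, segment permutation) are assumptions chosen for illustration, as are all function names, parameter values, and the choice to give the untransformed window its own label. The sketch also handles a single channel only, whereas the paper works with multimodal data.

```python
import numpy as np

# Hypothetical transforms: the abstract does not specify which five are used.
def add_noise(x, rng, scale=0.05):
    # Gaussian noise proportional to the window's standard deviation.
    return x + rng.normal(0.0, scale * x.std(), size=x.shape)

def scale_amplitude(x, rng, factor=1.5):
    return x * factor

def negate(x, rng):
    return -x

def reverse_time(x, rng):
    return x[::-1].copy()

def permute_segments(x, rng, n_segments=4):
    # Split the window into segments and shuffle their order.
    segments = np.array_split(x, n_segments)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order])

TRANSFORMS = [add_noise, scale_amplitude, negate, reverse_time, permute_segments]

def make_pretext_dataset(windows, seed=0):
    """Return (signals, labels); label 0 = original, k = TRANSFORMS[k - 1].

    `windows` is a 2-D array of shape (n_windows, window_length).
    """
    rng = np.random.default_rng(seed)
    signals, labels = [], []
    for w in windows:
        signals.append(w)          # untransformed window (assumed extra class)
        labels.append(0)
        for k, transform in enumerate(TRANSFORMS, start=1):
            signals.append(transform(w, rng))
            labels.append(k)
    return np.stack(signals), np.array(labels)
```

A classifier trained to predict these labels needs no human annotation, which is what allows pre-training on unlabeled recordings before the encoder is frozen or fine-tuned on labeled emotion datasets.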

Comments:	Accepted IEEE Transactions On Affective Computing
Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2303.17611 [cs.HC]
	(or arXiv:2303.17611v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2303.17611

Submission history

From: Mohammed Daoudi
[v1] Wed, 29 Mar 2023 19:45:55 UTC (13,620 KB)
