Deep Joint Spatiotemporal Network (DJSTN) for Efficient Facial Expression Recognition

Dami Jeong; Byung-Gyu Kim; Suh-Yeon Dong

doi:10.3390/s20071936


Sensors (Basel). 2020 Mar 30;20(7):1936. doi: 10.3390/s20071936.

Authors

Dami Jeong¹, Byung-Gyu Kim¹, Suh-Yeon Dong¹

Affiliation

¹ Department of IT Engineering, Sookmyung Women's University, 100 Chungpa-ro 47 gil, Yongsan-gu, Seoul 04310, Korea.

Abstract

Understanding a person's feelings is an important process in affective computing. People express their emotions in various ways, and among them facial expression is the most effective way to convey human emotional status. We propose efficient deep joint spatiotemporal features for facial expression recognition based on deep appearance and geometric neural networks. We apply three-dimensional (3D) convolution to extract spatial and temporal features simultaneously. For the geometric network, 23 dominant facial landmarks are selected to express the movement of facial muscles, based on an analysis of the energy distribution of all facial landmarks. We combine these features with the designed joint fusion classifier so that they complement each other. In the experiments, we obtain recognition accuracies of 99.21%, 87.88%, and 91.83% on the CK+, MMI, and FERA datasets, respectively. Through a comparative analysis, we show that the proposed scheme improves recognition accuracy by at least 4% over the compared methods.
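The abstract names three building blocks: 3D convolution over a frame sequence, selection of dominant landmarks by their energy, and a fusion classifier. The Python/NumPy sketch below illustrates each one under stated assumptions. It is not the authors' implementation. The energy criterion (the sum of squared landmark displacements from the first frame) and the fixed-weight late fusion of class probabilities are assumptions made for illustration, and the function names `conv3d_valid`, `select_dominant_landmarks`, and `joint_fusion` are hypothetical.

```python
import numpy as np


def conv3d_valid(volume: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' 3D cross-correlation, as used in deep-learning conv layers.

    volume: (T, H, W) stack of grayscale frames; kernel: (kt, kh, kw).
    Each output voxel combines kt consecutive frames and a kh x kw spatial
    neighbourhood, so spatial and temporal patterns are filtered jointly.
    """
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1), dtype=float)
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                patch = volume[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(patch * kernel)
    return out


def select_dominant_landmarks(landmarks: np.ndarray, k: int = 23) -> np.ndarray:
    """Hypothetical energy criterion for picking k dominant landmarks.

    landmarks: (T, N, 2) array of (x, y) positions per frame.
    Assumed energy of landmark n: sum over frames of its squared
    displacement from frame 0. Returns the k highest-energy indices, sorted.
    """
    disp = landmarks - landmarks[0:1]          # (T, N, 2) displacement from first frame
    energy = np.sum(disp ** 2, axis=(0, 2))    # (N,) total squared motion per landmark
    top = np.argsort(energy)[::-1][:k]         # indices of the k largest energies
    return np.sort(top)


def joint_fusion(p_app: np.ndarray, p_geo: np.ndarray, w: float = 0.5) -> int:
    """Hypothetical late fusion: weighted average of two class-probability vectors.

    p_app: output of the appearance (3D-CNN) branch; p_geo: output of the
    geometric (landmark) branch. Returns the index of the predicted class.
    """
    fused = w * p_app + (1.0 - w) * p_geo
    return int(np.argmax(fused))
```

For example, convolving a 4-frame, 5x5 volume with a 3x3x3 kernel yields a 2x3x3 output, because the temporal axis shrinks just as the spatial axes do. This shrinkage is what distinguishes 3D convolution from applying a 2D filter frame by frame. With `p_app = [0.6, 0.3, 0.1]` and `p_geo = [0.1, 0.8, 0.1]`, equal weights favor class 1. A weight of 0.9 on the appearance branch favors class 0.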

Keywords: deep learning; deep spatiotemporal network; facial expression recognition (FER); geometric feature; joint fusion classifier; local binary pattern (LBP) feature.

MeSH terms

Adult
Algorithms
Emotions / physiology*
Face / physiology*
Facial Expression
Facial Muscles / physiology
Facial Recognition / physiology*
Female
Humans
Male
Neural Networks, Computer*

Grants and funding

NRF-2016R1D1A1B04934750/National Research Foundation of Korea (NRF)