Subject Independent Emotion Recognition System for People with Facial Deformity: An EEG Based Approach
https://doi.org/10.1007/s12652-020-02338-8
ORIGINAL RESEARCH
Received: 21 January 2020 / Accepted: 10 July 2020 / Published online: 16 July 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Emotion recognition from electroencephalography (EEG) is a better choice for people with facial deformities, such as burned or paralyzed faces, where facial data is inaccurate or unavailable. This research exploits the image processing capability of the convolutional neural network (CNN) and proposes a CNN model to classify different emotions from scalogram images of EEG data. The scalogram images are obtained by applying the continuous wavelet transform to the EEG. The proposed model is subject independent: the objective is to extract emotion-specific features from EEG data irrespective of the source of the data. The proposed emotion recognition model is evaluated on two benchmark public databases, DEAP and SEED. To show that the model is purely subject independent, a cross-database criterion is also used for evaluation. The performance evaluation experiments show that the proposed model is comparable to state-of-the-art approaches in terms of emotion classification accuracy.
Keywords Convolutional neural network · Affective computing · Valence arousal model · Scalogram
sessions is used to train the classifier, and data from a different session is later used to test it. For the subject dependent case, good accuracy has been achieved on the DEAP database. Moon et al. (2018) used brain connectivity features with a CNN on the DEAP database. They considered three connectivity features, Pearson correlation coefficients, phase lag index and phase locking value, each fed separately into the CNN. Salama et al. (2018) represented EEG data from the DEAP database in three-dimensional space and used a 3D CNN as the classifier. They reported model accuracies of 87.44% for valence and 88.49% for arousal using k-fold cross validation with k set to 5. Li et al. (2016) used scalogram images with a hybrid classification model combining a CNN and an LSTM. They reported k-fold cross validation accuracies of 74.12% for arousal and 72.06% for valence with k set to 5, so this work was also subject dependent. A few authors worked on each subject individually and then reported the average classification accuracy, as in Wang et al. (2011), who worked on self-created data and obtained 66.51% accuracy for four emotions with only five subjects. Liang et al. (2019) used a clustering based approach to classify emotions from EEG; on the DEAP database they obtained accuracies of 61.02% for arousal and 52.58% for valence under the subject dependent criterion.

Lakhan et al. (2019) created and worked on their own database to examine the suitability of OpenBCI compared with costly EEG amplifiers for data acquisition, and additionally validated their method on other available benchmark databases. For DEAP data they reported a mean valence accuracy of 57.60% and a mean arousal accuracy of 62.00%.

In the subject independent approach, the training data belongs to one set of subjects and the test data belongs to entirely different subjects. For the subject independent case, very little work has been reported, and with limited classification accuracy. Rayatdoost and Soleymani (2018) used a subject independent approach on the DEAP database and obtained accuracies of 59.22% for valence and 55.70% for arousal. An ensemble classifier was constructed (Fazli et al. 2009) using temporal and spatial filters for EEG data related to brain–computer interfaces. A transfer learning approach was proposed (Li et al. 2019) for fast deployment of emotion recognition models; it improved the classification accuracy by 12.72% compared to a non-transfer-learning approach on SEED data.

Statistical features, such as time domain features, frequency domain features or both, are used to represent EEG. The short-time Fourier transform (Ackermann et al. 2016), CWT (Li et al. 2016) and discrete wavelet transforms (Murugappan et al. 2010; Pandey and Seeja 2019a, b) are some of the techniques used for EEG signal analysis. The Fourier transform gives the phase and amplitude of the sinusoids of a signal, whereas wavelet coefficients give the correlation between the wavelet function and the signal at a particular time. A few works show the use of the fractal dimension (Sourina and Liu 2011) as a feature of EEG signals; fractals are based on self-similarity concepts. A higher-order-crossing (HOC) based feature vector was developed (Petrantonakis and Hadjileontiadis 2009) to overcome the problem of subject-to-subject variability in EEG signals corresponding to emotions. HOC is a time series analysis method used to find the number of zero crossing points in the filtered EEG signal; zero crossings measure the oscillating property of the signal. Some works (Mert and Akan 2018) have extracted intrinsic mode functions (IMFs) from EEG, and several statistical or power spectral density based features have been determined from these IMFs (Pandey and Seeja 2019c).

People who have burnt or paralyzed faces are more prone to emotion disorders, and emotion disorders may lead to various psychological problems. Therefore, continuous monitoring of emotions is required. Moreover, in situations where labeled data of a particular subject (the subject dependent case) is not available for training the model, as for a paralyzed person, the subject independent model is the only way to identify the emotions. Therefore, in this paper a subject independent EEG based emotion recognition system is proposed for people with facial deformity. The major contributions of this research are:

• A purely subject independent emotion recognition system is proposed using a cross-database criterion.
• An optimized CNN model is proposed for EEG based emotion recognition that takes scalogram images of EEG as input and classifies emotions.

2 Materials and methods

2.1 EEG

EEG captures and records the electrical activity of the brain along the scalp. There are primarily five EEG frequency bands: δ (below 4 Hz), θ (4–8 Hz), α (8–14 Hz), β (14–40 Hz) and γ (40–100 Hz). The main difficulty with EEG is its poor spatial resolution. Subjects wear an electrode cap while watching the stimuli for a predetermined amount of time, and the EEG signals are recorded using EEG recording software. The electrodes in the cap should be placed according to the 10/20 international electrode placement system (Klem et al. 1999).

2.2 Valence arousal model

Researchers understand and organize the emotional space using two distinct models, and emotions are classified using these two models (Mauss and Robinson 2009): discrete and dimensional.
2.3 Databases
2.3.1 DEAP Database
Sc = \lVert f_w(s, u) \rVert = \left( \int_{-\infty}^{+\infty} \left| f_w(s, u) \right|^{2} \, du \right)^{1/2}    (2)

The scalogram Sc represents the energy of f_w at a scale s. The scalogram highlights those scales/frequencies of a signal which contribute the most to its total energy. In other words, the time-varying energy content over a span of frequencies can be obtained by plotting the squared modulus of the wavelet transform as a function of frequency and time to create a scalogram (Kareem and Kijewski 2002). Figure 2 shows a sample EEG signal from the DEAP database and its scalogram image.

Fig. 2 A sample EEG signal and its cropped scalogram image
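As a concrete illustration of Eq. (2), the sketch below approximates the per-scale energy of a CWT coefficient matrix in MATLAB (the environment reported later in the implementation section). The signal, its length and the 128 Hz sampling rate are placeholders chosen for illustration, not values taken from the paper.

```matlab
% Illustrative sketch of Eq. (2): per-scale energy of the CWT coefficients.
% The signal and the sampling rate are placeholders, not the authors' data.
fs  = 128;                       % assumed sampling rate (7680 samples ~ 60 s)
eeg = randn(1, 7680);            % stand-in for one EEG channel

[cfs, frq] = cwt(eeg, 'morse', fs);     % cfs: scales-by-time coefficient matrix

% Eq. (2): Sc(s) = ( integral of |f_w(s,u)|^2 du )^(1/2), approximated here
% by a discrete sum over the time samples of each scale/frequency row.
Sc = sqrt(sum(abs(cfs).^2, 2) / fs);

plot(frq, Sc); xlabel('Frequency (Hz)'); ylabel('Scalogram energy S_c');
```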
2.5 CNN

CNN is a deep learning technique that provides sparse connectivity as well as weight sharing. A CNN consists mainly of three distinct types of layers: convolution, pooling and fully connected. The convolution layer is the backbone of the CNN: a learnable filter is convolved across the input volume. Weight sharing is achieved by convolving the same filter at different positions of the image, which results in fewer parameters and hence a faster network. The pooling operation can be average pooling or max pooling.

The scalogram image is cropped before being fed into the deep learning based CNN classifier. The scalogram is cropped to reduce the size of the image and hence save computation; moreover, it is observed that the cropped image gives better classification accuracy than the full image. Figure 3 represents the complete approach used in the proposed study. This work is subject independent, as data from 2 subjects is used for testing and from 30 subjects for training at a time, for both valence and arousal, in the case of the DEAP database. In the case of SEED, data from 12 subjects is used to train the CNN model and from 3 subjects to test it.

In order to implement a purely subject independent approach and to obtain better generalization, a database independent criterion is also utilized: the model is trained with one database and tested with another. Two benchmark EEG emotion databases are used for this purpose, DEAP and SEED. To do so, the EEG from all frontal electrodes of DEAP and SEED is first converted into scalogram images using CWT and distributed into three classes of emotions, positive, neutral and negative, based on the valence rating.

2.6.1 Proposed CNN architecture

The proposed model comprises twelve layers in total. The first is the input layer, into which a scalogram of size 91 × 91 is fed, followed by two sets of convolution, batch normalization, ReLU and max pooling layers. After these, there is
a fully connected layer, which is connected to all the previous inputs. Convolution is a widely used operation in the areas of signal processing, image processing and other engineering and science applications. A centered convolution operation is used in this model. If I is the pixel value of the scalogram image obtained by applying CWT to the EEG signal x, the centered convolution operation can be defined as shown in Eq. (3).

A_{ij} = (I * K)_{ij} = \sum_{a=-m/2}^{m/2} \sum_{b=-n/2}^{n/2} I_{i-a,\, j-b} \, K_{(m/2)+a,\, (n/2)+b}    (3)

Here A_{ij} is the revised entry, I_{ij} is the centered pixel of the image block (portion) over which the filter is convolved, the two-dimensional filter is of size m × n, and K is the weight assigned to the neighbourhood pixels. The result is fed into a pooling layer after passing through a ReLU (rectified linear unit) layer, which converts each negative value to 0; the pooling layer then performs a down-sampling operation, and max pooling is used in this study. A batch normalization layer is added between the convolution and ReLU layers to normalize the feature maps computed by the convolution layer. Next is the fully connected layer, which is fully connected to the inputs of the previous layer. Figure 4 depicts the proposed CNN architecture.
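To make Eq. (3) concrete, the following minimal MATLAB sketch applies a centered convolution to a stand-in 91 × 91 scalogram. The image and the 4 × 4 averaging filter are placeholders, and conv2 with the 'same' option is one standard way to realize the centered form, not necessarily the authors' exact implementation.

```matlab
% Minimal sketch of the centered convolution of Eq. (3).
% I and K are placeholders: any image block and any m-by-n weight matrix.
I = rand(91, 91);          % stand-in for a cropped scalogram image
K = ones(4, 4) / 16;       % stand-in 4-by-4 filter

% conv2 with 'same' keeps the output centered on the input pixel (i, j),
% i.e. A(i,j) = sum_a sum_b I(i-a, j-b) * K(m/2+a, n/2+b), as in Eq. (3).
A = conv2(I, K, 'same');

% In the CNN this result would then pass through batch normalization,
% ReLU (negative values set to 0) and max pooling, as described above.
A = max(A, 0);             % element-wise ReLU
```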
A simple CNN architecture is more appropriate for the proposed work than pre-trained models such as AlexNet or GoogLeNet. The classification task here is to classify only two or three classes of emotions, which requires a less complex CNN architecture than pre-trained models built for ImageNet with a thousand image classes. Moreover, the domain of the training data is entirely different, so domain adaptation (transfer learning) would not be accurate for scalogram images, since pre-trained CNNs are trained on images of everyday objects. Hence, in the proposed CNN model, two sets of convolution, batch normalization, ReLU and max pooling layers are used to recognize emotions.

3 Implementation

The proposed methodology is implemented on a system with 12 GB RAM and an Intel(R) Core(TM) i5 processor running at 2.50 GHz, in a single-GPU environment. The work is implemented using MATLAB R2018b.

In the proposed work, EEG signals are converted into scalogram images using CWT, and these images are fed into the CNN. Since the CNN is a deep learning approach, it does
the entire feature engineering itself. While implementing CWT to convert EEG into scalogram images, the elementary steps of decomposition were determined by the number of voices per octave, so the parameters for data creation were the voices per octave (VOP) and the mother wavelet. VOP was varied from 12 to 48, and the best observed value was 38 for EEG emotion recognition. Three mother wavelets were analyzed, the Morse wavelet, Bump and Amor (analytic Morlet), and the Morse wavelet gave the best results. CWT contains a plethora of frequency components with which to analyze signals, and the resulting time–frequency representation of a signal is termed a scalogram. To generate the scalogram, a filter bank is first created in which the signal length is the length of the EEG of one electrode: 7680 samples in the case of the DEAP database and 8000 in the case of the SEED database. The continuous wavelet transform is then applied using the created filter bank. The time-bandwidth product is kept at 60. CWT does not use a constant window size; it uses smaller scales for high frequencies and larger scales for low frequencies. There are 120 scales in the CWT filter bank used to obtain the scalogram of one EEG signal, with a minimum value of 0.689988916500468 and a maximum value of 666.893121863546.
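A sketch of this scalogram-generation step is shown below, using the filter-bank settings reported here (Morse wavelet, 38 voices per octave, time-bandwidth product 60, signal length 7680 for DEAP). The conversion of the coefficient matrix into a cropped 91 × 91 colour image is our assumption of one reasonable realization, not the authors' exact code.

```matlab
% Sketch: one EEG channel -> scalogram image, with the reported CWT settings.
% 'eeg' is a placeholder signal; the colormap and resizing are assumptions.
eeg = randn(1, 7680);                          % DEAP-length signal (8000 for SEED)

fb = cwtfilterbank('SignalLength', numel(eeg), ...
                   'Wavelet', 'morse', ...
                   'VoicesPerOctave', 38, ...
                   'TimeBandwidth', 60);

cfs = abs(wt(fb, eeg));                        % CWT magnitude, scales-by-time

% Map coefficients to an RGB image and bring it to the CNN input size.
img = ind2rgb(round(rescale(cfs, 0, 255)) + 1, jet(256));
img = imresize(img, [91 91]);                  % cropped/resized scalogram image
imwrite(img, 'scalogram_sample.png');
```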
The activation function used at the fully connected layer is softmax. Several parameters of the proposed CNN model were evaluated at different values, such as the number of filters and filter size, the number of convolutional layers, the number of epochs, the initial learning rate, the bias learning rate and the batch size. Good results are obtained with a batch size of 100, a filter size of 4 × 4, 16 filters, 2 convolution layers, an initial learning rate of 0.0001 and a bias learning rate of 0.001, with the number of epochs varied for the different cases.
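The layer stack and training options below sketch this configuration (91 × 91 input, two convolution–batch normalization–ReLU–max pooling blocks, 4 × 4 filters, 16 filters per layer, softmax output, initial learning rate 0.0001, mini-batch size 100). The pooling size and stride, the solver, the number of epochs, the three colour channels and the bias learning-rate factor are assumptions filled in for completeness.

```matlab
% Sketch of the described twelve-layer CNN with the reported hyper-parameters.
% Pooling size/stride, solver, epochs and input depth are assumptions.
numClasses = 2;                                  % 2 for valence/arousal, 3 for pos/neu/neg

layers = [
    imageInputLayer([91 91 3])                   % cropped scalogram image
    convolution2dLayer(4, 16, 'BiasLearnRateFactor', 10)   % 10 x 1e-4 = reported 0.001
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(4, 16, 'BiasLearnRateFactor', 10)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MiniBatchSize', 100, ...
    'MaxEpochs', 30, ...                         % the epoch count is varied per case
    'Shuffle', 'every-epoch');

% trainDs would be an imageDatastore of scalogram images from the training
% subjects (or the training database in the cross-database experiment).
% net = trainNetwork(trainDs, layers, options);
```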
Three different experiments were carried out to demonstrate the proposed subject independent model. In experiments 1 and 2, data from different subjects were used for training and testing, and in experiment 3, different databases were used for training and testing.

Table 3 Training and testing data statistics (DEAP database)

No. of electrodes        Training   Testing   Total
10 frontal electrodes    12,000     800       12,800
All 32 electrodes        38,400     2,560     40,960

Table 4 Results with DEAP data

No. of electrodes   Accuracy (valence)   Accuracy (arousal)   3 emotions (positive, neutral, negative)
10 frontal          61.50                58.50                46.67
All 32              59.50                58.00                44.33

To deal with the class imbalance problem in both the valence and arousal classes, the valence and arousal thresholds are adjusted. For arousal, the threshold obtained is 5.23: if the arousal rating is less than or equal to 5.23, the corresponding EEG is placed in the low arousal class, otherwise in the high arousal class. Similarly, for valence the threshold value is 5.04. If the participant's rating is less than or equal to 5.04, the corresponding EEG belongs to the negative valence class, and if the rating is greater than 5.04, it belongs to the positive valence class.

For three-emotion classification, the valence ratings are divided into three parts: ratings of one to three belong to the negative class, four to six to the neutral class, and videos rated seven to nine to the positive class. However, the data distribution across the three classes was highly imbalanced. To deal with this problem, the rating ranges were adjusted so that the data distribution becomes nearly balanced. The updated ranges of the valence rating are 1.0 to 4.1 for the negative class, greater than 4.1 and less than 6.7 for the neutral class, and 6.7 to 9.0 for the positive class. The results are shown in Table 4.
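The rating-to-label mapping described in this section can be written down directly; a minimal sketch follows, using the reported thresholds (5.23 for arousal, 5.04 for valence) and the adjusted three-class valence ranges. The rating vectors are placeholders.

```matlab
% Sketch of the rating-to-class mapping described above.
% 'valence' and 'arousal' stand in for DEAP self-assessment ratings (1-9).
valence = [2.3 5.1 7.8 4.0];
arousal = [6.0 5.2 8.1 3.3];

% Two-class labels with the adjusted, near-balanced thresholds.
valenceClass = repmat("negative", size(valence));
valenceClass(valence > 5.04) = "positive";

arousalClass = repmat("low", size(arousal));
arousalClass(arousal > 5.23) = "high";

% Three-class labels from the adjusted valence ranges.
emotion = repmat("neutral", size(valence));     % 4.1 < rating < 6.7
emotion(valence <= 4.1) = "negative";           % 1.0 - 4.1
emotion(valence >= 6.7) = "positive";           % 6.7 - 9.0
```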
Table 5 Training and testing data statistics (SEED database)

No. of electrodes   Training   Testing   Total

Table 6 Results with SEED data

No. of electrodes   Valence acc.   3 emotions (positive, neutral, negative)
10 frontal          56.22          51.5
All 62              53.68          51.0

Table 7 Results for 3 classes (positive, neutral, negative) of emotions

Training database   Test database   No. of electrodes   Accuracy

Table 8 Results for 2 classes (high/low valence)

Training database   Test database   No. of electrodes   Accuracy
DEAP                SEED            10                  54.00
SEED                DEAP                                51.02
From the experiments, it is found that:

(a) From experiments 1 and 2 (Tables 4 and 6), it is clear that the frontal electrodes are better for emotion recognition than all electrodes in the case of valence and arousal classification, since the results obtained with the frontal electrodes are better than those obtained with all 32 electrodes in the case of DEAP and all 62 electrodes in the case of SEED data.
(b) Experiment 2 shows that the SEED-to-SEED accuracy is higher, which may be due to the stimulus videos used in creating the SEED dataset; that is, the videos were able to induce emotions distinctly in SEED. For example, suppose two persons are fighting in a video. Watching this video, one subject may feel sad while another may feel angry at one of the fighting persons, depending on the situation portrayed in the video. Sad and angry are different emotions with different valence levels. From the results, it seems that in SEED,

Method                            Evaluation                             Features/classifier                                                            Accuracy
Jirayucharoensak et al. (2014)    Subject independent (DEAP database)    Fast Fourier Transform (feature-PSD)                                           Valence 53, Arousal 52
Rayatdoost and Soleymani (2018)   Subject independent (DEAP database)    Best at spectral topography maps as features with CNN classifier              Valence 59.22, Arousal 55.70
Proposed                          Subject independent (DEAP database)    Scalogram images of all frontal electrodes as features with CNN classifier    Valence 61.50, Arousal 58.50

5 Conclusion

This paper proposes a deep CNN model for subject independent emotion recognition from EEG data. In order to exploit the image processing capability of the CNN, the proposed methodology uses the scalogram image of the EEG as the input to the CNN. The low/high valence and arousal thresholds are selected in such a way that the dataset is almost balanced for classification. The experiments show that the EEG from the frontal electrodes contains more information about emotion than the other electrodes, which supports the existing theory. In order to evaluate the proposed model as a purely subject independent emotion recognition system, a cross-database criterion is also used for evaluation: the model is trained with one dataset and tested on another dataset for better generalization. The proposed model performs better when it is trained using the benchmark DEAP dataset. The proposed model is found to be effective in subject independent emotion recognition in terms of classification accuracy compared to the state-of-the-art models.
In future work, attention mechanisms over different brain regions may be employed to improve the classification accuracy of subject independent emotion recognition with EEG signals.

References

Ackermann P, Kohlschein C, Bitsch JÁ, Wehrle K, Jeschke S (2016) EEG-based automatic emotion recognition: feature extraction, selection and classification methods. In: e-Health networking, applications and services (Healthcom), 2016 IEEE 18th international conference on. IEEE, pp 1–6
Alarcao SM, Fonseca MJ (2017) Emotions recognition using EEG signals: a survey. IEEE Trans Affect Comput 10(3):374–393
Bolós VJ, Benítez R (2014) The wavelet scalogram in the study of time series. In: Advances in differential equations and applications. Springer, Cham, pp 147–154
Bradley MM, Lang PJ (1994) Measuring emotion: the self-assessment manikin and the semantic differential. J Behav Ther Exp Psychiatry 25(1):49–59
Fazli S, Popescu F, Danóczy M, Blankertz B, Müller KR, Grozea C (2009) Subject-independent mental state classification in single trials. Neural Netw 22(9):1305–1312
Jayaram V, Alamgir M, Altun Y, Scholkopf B, Grosse-Wentrup M (2016) Transfer learning in brain-computer interfaces. IEEE Comput Intell Mag 11(1):20–31
Jirayucharoensak S, Pan-Ngum S, Israsena P (2014) EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J 2014. https://doi.org/10.1155/2014/627892
Kareem A, Kijewski T (2002) Time-frequency analysis of wind effects on structures. J Wind Eng Ind Aerodyn 90(12–15):1435–1452
Klem GH, Lüders HO, Jasper HH, Elger C (1999) The ten-twenty electrode system of the International Federation. Electroencephalogr Clin Neurophysiol 52(3):3–6
Koelstra S, Muhl C, Soleymani M, Lee JS, Yazdani A, Ebrahimi T et al (2011) Deap: a database for emotion analysis; using physiological signals. IEEE Trans Affect Comput 3(1):18–31
Lakhan P, Banluesombatkul N, Changniam V, Dhithijaiyratn R, Leelaarporn P, Boonchieng E et al (2019) Consumer grade brain sensing for emotion recognition. IEEE Sens J 19(21):9896–9907
Lan Z, Sourina O, Wang L, Scherer R, Müller-Putz GR (2018) Domain adaptation techniques for EEG-based emotion recognition: a comparative study on two public datasets. IEEE Trans Cogn Dev Syst 11(1):85–94
Lang PJ (1995) The emotion probe: studies of motivation and attention. Am Psychol 50(5):372
Li X, Song D, Zhang P, Yu G, Hou Y, Hu B (2016) Emotion recognition from multi-channel EEG data through convolutional recurrent neural network. In: 2016 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 352–359
Li Y, Zheng W, Zong Y, Cui Z, Zhang T, Zhou X (2018) A bi-hemisphere domain adversarial neural network model for EEG emotion recognition. IEEE Trans Affect Comput. https://doi.org/10.1109/TAFFC.2018.2885474
Li J, Qiu S, Shen Y, Liu C, He H (2019) Multisource transfer learning for cross-subject EEG emotion recognition. IEEE Trans Cybern 20(7):3281–3293
Liang Z, Oba S, Ishii S (2019) An unsupervised EEG decoding system for human emotion recognition. Neural Netw 116:257–268
Mauss IB, Robinson MD (2009) Measures of emotion: a review. Cogn Emot 23(2):209–237
Mert A, Akan A (2018) Emotion recognition from EEG signals by using multivariate empirical mode decomposition. Pattern Anal Appl 21(1):81–89
Moon SE, Jang S, Lee JS (2018) Convolutional neural network approach for EEG-based emotion recognition using brain connectivity and its spatial information. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2556–2560
Murugappan M, Ramachandran N, Sazali Y (2010) Classification of human emotion from EEG using discrete wavelet transform. J Biomed Sci Eng 3(04):390
Pandey P, Seeja KR (2019a) Emotional state recognition with EEG signals using subject independent approach. In: Mishra D, Yang XS, Unal A (eds) Data science and big data analytics. Lecture Notes on Data Engineering and Communications Technologies, vol 16. Springer, Singapore, pp 117–124
Pandey P, Seeja KR (2019b) Subject-independent emotion detection from EEG signals using deep neural network. In: International conference on innovative computing and communications. Springer, Singapore, pp 41–46
Pandey P, Seeja KR (2019c) Subject independent emotion recognition from EEG using VMD and deep learning. J King Saud Univ-Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2019.11.003
Petrantonakis PC, Hadjileontiadis LJ (2009) Emotion recognition from EEG using higher order crossings. IEEE Trans Inf Technol Biomed 14(2):186–197
Rayatdoost S, Soleymani M (2018) Cross-corpus EEG-based emotion recognition. In: 2018 IEEE 28th international workshop on machine learning for signal processing (MLSP). IEEE, pp 1–6
Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178
Salama ES, El-Khoribi RA, Shoman ME, Shalaby MAW (2018) EEG-based emotion recognition using 3D convolutional neural networks. Int J Adv Comput Sci Appl 9(8):329–337
Song T, Zheng W, Song P, Cui Z (2018) EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans Affect Comput. https://doi.org/10.1109/TAFFC.2018.2817622
Sourina O, Liu Y (2011) A fractal-based algorithm of emotion recognition from EEG using arousal-valence model. In: International Conference on Bio-inspired Systems and Signal Processing, vol 2. SCITEPRESS, pp 209–214
Wang XW, Nie D, Lu BL (2011) EEG-based emotion recognition using frequency domain features and support vector machines. In: International conference on neural information processing. Springer, Berlin, pp 734–743
Zheng WL, Lu BL (2015) Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans Auton Ment Dev 7(3):162–175

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.