
Biomedical Signal Processing and Control 58 (2020) 101756


Biomedical Signal Processing and Control


journal homepage: www.elsevier.com/locate/bspc

EEG-based emotion recognition using simple recurrent units network and ensemble learning
Chen Wei, Lan-lan Chen ∗ , Zhen-zhen Song, Xiao-guang Lou, Dong-dong Li
Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology,
Shanghai, 200237, PR China

Article history:
Received 25 June 2019
Received in revised form 16 October 2019
Accepted 5 November 2019
Available online 8 January 2020

Keywords:
EEG
Emotion recognition
DT-CWT
SRU
Ensemble learning

Abstract

The purpose of this research is to develop an EEG-based emotion recognition system for the identification of three emotions: positive, neutral and negative. Up to now, various modeling approaches for automatic emotion recognition have been reported. However, the time-dependency property of the emotion process has not been fully considered. In order to grasp the temporal information of EEG, we adopt a deep Simple Recurrent Units (SRU) network, which is not only capable of processing sequence data but is also able to solve the problem of long-term dependencies that occurs in the normal Recurrent Neural Network (RNN). Before training the emotion models, the Dual-tree Complex Wavelet Transform (DT-CWT) was applied to decompose the original EEG into five constituent sub-bands, from which features were then extracted using time, frequency and nonlinear analysis. Next, deep SRU models were established using four different features over five frequency bands, and favorable results were found to be related to the higher frequency bands. Finally, three ensemble strategies were employed to integrate the base SRU models to obtain more desirable classification performance. We evaluate and compare the performance of shallow models, deep models and ensemble models. Our experimental results demonstrate that the proposed emotion recognition system based on the SRU network and ensemble learning can achieve satisfactory identification performance at relatively economical computational cost.

© 2019 Published by Elsevier Ltd.

∗ Corresponding author.
E-mail address: [email protected] (L.-l. Chen).
https://doi.org/10.1016/j.bspc.2019.101756
1746-8094/© 2019 Published by Elsevier Ltd.

1. Introduction

Emotion is an essential component of being human, and it has a significant impact on people's daily activities, including communication, interaction and learning. It is important to distinguish the emotional states of people around us for natural communication [1]. Besides, in the field of human-machine interaction (HMI), automatic emotion recognition is also an important and challenging task [2].

A large number of studies on technical approaches to emotion recognition have been reported over the past years; they can mainly be classified into three categories: (1) facial expressions and voice; (2) peripheral physiological signals; (3) brain signals generated by the central nervous system [3]. Among these measures, the audio-visual detectors that interpret facial expression and voice realize noncontact recognition of emotion, but they cannot always return reliable results because people may easily disguise their emotions without being noticed [4]. Compared with audio-visual approaches, physiological signals show relatively high identification accuracy because users cannot consciously control them. There are two kinds of physiological changes related to emotion: one is relevant to the Peripheral Nervous System (PNS) and the other to the Central Nervous System (CNS). Usually, features extracted from peripheral physiological signals such as the electrocardiogram (ECG), skin conductance (SC) and pulse can provide detailed and complex information for recognizing emotional states [5]. Compared with the peripheral physiological signals, electroencephalogram (EEG) signals, captured from the central nervous system, can directly reflect the activity of the brain and have an intrinsic relationship with human emotional states [6].

With the rapid progress of wearable devices, flexible dry electrodes, machine learning and wide applications of human-machine interaction, more and more approaches to EEG-based emotion recognition have been developed. In previous studies, researchers extracted various features from raw EEG signals to evaluate the correlation between the EEG data and emotional states, such as time analysis, wavelet coefficients and nonlinear dynamical analysis [7–9]. To reduce the dimension of the features, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are the most widely used methods [10–12]. Besides,

Minimum Redundancy Maximum Relevance (mRMR), Singular Value Decomposition and QR Factorization have been employed to eliminate weak and redundant features [13,14]. Some modeling techniques, such as Naive Bayes (NB), the Support Vector Machine (SVM) and Random Forest (RF), are usually used together with such manual feature extraction and selection processes [15,16].

Recent deep learning techniques enable automatic feature extraction and selection, and have had a great influence on signal and information processing [17]. Many deep learning models, such as the Auto-Encoder (AE), Deep Belief Network (DBN) and Convolutional Neural Network (CNN), have been successfully applied to physiological signal processing and achieve promising results compared with conventional shallow models [10,11]. A multimodal deep learning approach has been adopted to classify different emotions on two standard emotion recognition datasets (i.e., SEED and DEAP), demonstrating that the high-level representation features extracted by the Bimodal Deep Auto-Encoder (BDAE) were significant for emotion recognition. Wang et al. [18] used a data augmentation method that helps mitigate overfitting by generating modified samples from the training data. After comparing the performance of SVM and CNN models (i.e., LeNet and ResNet) on the SEED and MAHNOB-HCI datasets, they found that the deep models performed better with data augmentation.

In most cases, the subject's emotional states are treated as independent points. However, emotion is a changing process with a time-dependency property, which should be taken into consideration [19]. Deep learning methods like CNN or AE cannot grasp the temporal information of EEG. Another deep learning algorithm, the Recurrent Neural Network (RNN), has connections between units that form a loop, enabling it to process sequence data. RNN has been widely used in machine translation and speech recognition [20]. However, the standard RNN has an inherent problem with long-term dependencies. Some special kinds of RNN, such as Long Short-Term Memory [21], can solve this problem and have been applied to process and analyze physiological signals. Tang et al. [22] introduced a Bimodal-LSTM model for emotion recognition with multimodal signals. When evaluated on the SEED dataset, the Bimodal-LSTM model used EEG and eye movement features as inputs, while it analyzed EEG and peripheral physiological signals when examined on the DEAP dataset. Li et al. [9] first encapsulated the multi-channel signals in the DEAP dataset into grid-like frames by using wavelet and scalogram transforms. They then designed a hybrid deep learning model combining CNN and LSTM, aiming to extract task-related features, explore inter-channel correlation and integrate contextual information from the frames.

Recently, an improved version of RNN, namely Simple Recurrent Units (SRU), has been proposed; it simplifies the computation and exposes more parallelism, so the training time can be greatly reduced. In this paper, we adopt SRU to construct the classification model, which considers both the temporal changes of EEG and computational efficiency to help with the recognition of emotion. Furthermore, ensemble learning methods are applied to integrate multiple SRU models so as to obtain better results than any base model alone. The current work is designed to predict future emotion for a single subject after customizing a person-specific model. To validate the efficiency of the proposed machine learning algorithms, we test the stability of our emotion recognition system within and across experimental days.

The layout of the paper is as follows. Section 2 introduces the SEED dataset studied in this paper. Section 3 gives a description of our methodology, including DT-CWT, the feature extraction methods, the SRU network and the ensemble strategies. Section 4 shows the experimental results. Section 5 reviews similar research and discusses the contributions and limitations of the study; conclusions are drawn in this section as well.

2. SEED dataset

In this paper, emotion recognition research was conducted on the SEED dataset [14], in which emotions were stimulated by film clips. These emotional films were all Chinese films containing scenes and audio, which could place the subjects in a more realistic setting and evoke strong physiological changes. There were 15 clips in total in one experiment, and each clip lasted about 4 min. Three classes of emotions (positive, neutral and negative) were estimated in the dataset, and each emotion had 5 corresponding film clips. In one session, there was a 5 s hint before each clip, and 45 s of self-assessment and 15 s of rest after each clip. For the self-assessment, subjects were required to complete questionnaires to report their emotional experience. Fig. 1 shows the detailed experimental process.

Fig. 1. The experimental process of SEED.

There were 15 subjects in total (7 males and 8 females; 23.27 ± 2.37 years old) participating in the experiments. All participants were native Chinese students with self-reported normal vision and hearing. Each subject executed the experiments 3 times, with an interval of one week or longer between experiments. EEG was recorded from a 62-channel electrode cap according to the 10–20 system at a sampling rate of 1000 Hz. For signal processing, the original EEG data were first downsampled to 200 Hz and then filtered to 0.5–70 Hz.

3. Methodology

The general block diagram of the proposed system is presented in Fig. 2. Firstly, signals are pre-processed by filtering and then decomposed into sub-bands using DT-CWT. Secondly, four kinds of feature extraction methods are executed from the aspects of time, frequency and nonlinear analysis. Thirdly, SRU models are established on the different feature sets. Finally, three different ensemble strategies are adopted and compared to select the best ensemble model.

Fig. 2. General block diagram of emotion recognition based on SRU.

3.1. Dual-tree complex wavelet transform (DT-CWT)

Put forward by Kingsbury [23], DT-CWT utilizes a dual tree of wavelet filters to attain the real and imaginary parts of complex coefficients. Compared with the Discrete Wavelet Transform (DWT), DT-CWT has the prominent attributes of approximate shift invariance and superior anti-aliasing [24].

Fig. 3 presents the decomposition for a 3-level DT-CWT. Tree A and Tree B represent the real and imaginary parts of the transform respectively. The filters h0 and h1 are the low- and high-pass filter pair for Tree A, while g0 and g1 are those for Tree B. Besides, (↓2) is the down-sampling operator. The sampling location of Tree B is always kept in the middle position of Tree A, so Tree B can collect exactly the information missed by Tree A. Tree A and Tree B therefore complement each other. This weakens the effect of alternate sampling, restrains the problem of frequency aliasing, and makes the reconstructed components closer to the real results [25,26].

Fig. 3. Schematic of 3-level DT-CWT decomposition.

EEG has five common rhythms: Delta (0–4 Hz), Theta (4–8 Hz), Alpha (8–13 Hz), Beta (13–30 Hz) and Gamma (30–50 Hz). These five rhythms range from 0 to 50 Hz and contain abundant information about the basic rhythms of EEG. In this study, 5-level decomposition and reconstruction of the EEG are performed by DT-CWT, and thus six components can be acquired: z5 (0–3.125 Hz), y5 (3.125–6.25 Hz), y4 (6.25–12.5 Hz), y3 (12.5–25 Hz), y2 (25–50 Hz) and y1 (50–100 Hz). Among them, the ranges of z5, y5, y4, y3 and y2 are close to the frequency ranges of Delta, Theta, Alpha, Beta and Gamma respectively. Hence, these five sub-bands can represent the corresponding rhythms of EEG.

3.2. Feature extraction

The aim of feature extraction is to obtain the salient features of EEG under different emotional states [7]. For a comprehensive study, we investigate four different kinds of features from the aspects of time, frequency and nonlinear analysis.

The segment length ought to be chosen reasonably. The longer the segment, the more temporal information it reflects, but the fewer samples we obtain in the data pool. To balance the number and length of the segments, we first divided the EEG signals into segments with a length of 5 s. The four feature extraction methods were then carried out on 0.5 s windows with an overlap rate of 50%, so the sequence length of each segment was 19. In addition, because 62 electrode channels were used in this experiment, the input structure of each segment was 62*19. In the 3-class emotion recognition task, the number of segments in each category was approximately equal. The data structure is shown in Fig. 4.

3.2.1. Time analysis
The mean absolute value (MAV) method is employed to extract the time-domain features of the EEG signals [27]. Given a finite set of time-series points x(1), x(2), ..., x(N), the MAV refers to the average of the absolute value of the segmented signal, formulated in Eq. (1):

M = log((1/N) Σ_{i=1}^{N} |x(i)|)    (1)

where N is the number of points in the time-windowed signal, x(i) is the value of the ith point, and M is the log-transformed value.

3.2.2. Frequency analysis
On the basis of the fast Fourier transform (FFT), the power spectral density (PSD) approach is adopted to obtain the characteristics of the EEG signals in the frequency domain [7]. The definition of the PSD comes from the Wiener-Khintchine theorem. Regarding the signal as a stationary random process, the signal autocorrelation function is calculated as:

R̂x(m) = (1/N) Σ_{i=0}^{N−1} x(i) x(i+m)    (2)

where m = −(M−1), ..., −1, 0, 1, ..., M−1, and M ≤ N. The PSD is then the Fourier transform of the signal autocorrelation function, and can be computed as:

Ŝx(e^{jw}) = Σ_{m=−(M−1)}^{M−1} R̂x(m) e^{−jwm}    (3)

3.2.3. Nonlinear analysis
Fractal dimension (FD) is suitable for measuring the complexity of biological signals [28]. The Higuchi method shows an excellent ability to estimate FD, and is used for the nonlinear analysis of the EEG signals in this paper [25,26]. From the time-series points x(1), x(2), ..., x(N), a new time series Xm_k can be constructed as follows:

Xm_k : x(m), x(m+k), x(m+2k), ..., x(m + int((N−m)/k)·k),  m = 1, 2, ..., k    (4)

where m and k represent the initial time and the discrete interval time, respectively. The curve length Lm(k) of Xm_k is calculated as:

Lm(k) = (1/k) [Σ_{i=1}^{int((N−m)/k)} |x(m+ik) − x(m+(i−1)k)|] × (N−1)/(int((N−m)/k)·k)    (5)

Fig. 4. The data structure of samples.
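The windowing arithmetic above (5 s segments sampled at 200 Hz, 0.5 s windows slid with 50% overlap, giving 19 steps over 62 channels) and the MAV feature of Eq. (1) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the random array stands in for one reconstructed sub-band, and the function name is our own.

```python
import numpy as np

FS = 200           # sampling rate after downsampling (Hz)
SEG_LEN = 5 * FS   # one 5 s segment -> 1000 samples per channel
WIN = FS // 2      # 0.5 s feature window -> 100 samples
STEP = WIN // 2    # 50% overlap -> 50-sample step

def segment_mav(segment):
    """Map one (62, 1000) EEG segment to a (62, 19) MAV feature array.

    Eq. (1) is applied per channel to each 0.5 s window; sliding the
    window every 0.25 s gives (1000 - 100) // 50 + 1 = 19 time steps.
    """
    n_channels, n_samples = segment.shape
    n_windows = (n_samples - WIN) // STEP + 1   # = 19
    feats = np.empty((n_channels, n_windows))
    for w in range(n_windows):
        window = segment[:, w * STEP : w * STEP + WIN]
        # Eq. (1): M = log((1/N) * sum(|x(i)|))
        feats[:, w] = np.log(np.mean(np.abs(window), axis=1))
    return feats

rng = np.random.default_rng(0)
eeg = rng.standard_normal((62, SEG_LEN))  # stand-in for one sub-band segment
print(segment_mav(eeg).shape)             # (62, 19)
```

The same sliding-window loop would host the PSD, FD and DE computations of Sections 3.2.2 and 3.2.3, yielding the 62*19 input structure of Fig. 4.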

The mean value of the curve length, L(k), is computed by averaging Lm(k) over all m for each k. After that, the FD can be obtained by:

FD = − log L(k) / log k    (6)

Another nonlinear analysis method used in this research is differential entropy (DE). DE is the continuous version of Shannon entropy, and can be computed as:

h(x) = − ∫ f(x) log(f(x)) dx    (7)

The DE feature is simple and efficient for evaluating the complexity of a continuous random variable. Previous studies have shown the advantage of DE in characterizing EEG time series [14]. For a fixed-length EEG sequence, DE is equal to the logarithm of the PSD in a certain frequency band. If a random variable obeys the Gaussian distribution N(μ, σ²), the DE feature can simply be computed by the following formulation:

h(x) = − ∫_{−∞}^{∞} (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)) · log[(1/√(2πσ²)) exp(−(x − μ)²/(2σ²))] dx = (1/2) log(2πeσ²)    (8)

where e is Euler's number and σ is the standard deviation of the sequence.

3.3. Simple recurrent units (SRU)

A Recurrent Neural Network (RNN) has loops inside, and information is transmitted from the present loop to the next. This chain-like property indicates that RNN is the natural neural-network structure for sequences and lists [25,26]. The structure of a standard RNN is shown in Fig. 5.

Given a general input sequence [x1, x2, ..., xk], where xi ∈ R^d, at each time-step of RNN modeling a hidden state is generated, producing a hidden sequence [c1, c2, ..., ck]. The hidden state at time-step t is computed by an activation function f of the current input xt and the previous hidden state ct−1:

ct = f(xt, ct−1)    (9)

Then an optional output can be produced by ht = g(ct), resulting in an output sequence [h1, h2, ..., hk], which can be used for sequence-to-sequence tasks [29]. However, the standard RNN cannot avoid the problem of long-term dependencies, which implies that it loses the capacity to connect information as the gap between loops increases [30]. Thus, special kinds of RNN have been proposed, such as Long Short-Term Memory [21] and Gated Recurrent Units [31], which have the ability to learn long-term dependencies because of the exclusive design of their units [25,26].

For sequences of symbols, recurrent networks process one symbol at a time. In common RNN architectures, such as LSTM and GRU, the computation in each step depends on completing the previous step, so recurrent computations are less suitable for parallelization [32]. Besides, gating is used in most recurrent architectures to control the information flow and alleviate the vanishing- and exploding-gradient problems. In this process, the computation of the network, especially the matrix multiplication, is the most expensive operation. Lei et al. [32] proposed an improved version of RNN named Simple Recurrent Units (SRU). The key design in SRU is making the gate computation rely only on the current input of the recurrence. In this way, only the point-wise multiplication depends on previous steps, so the matrix multiplications in the feed-forward network can easily be parallelized. The structure of SRU is shown in Fig. 6.

The basic form of SRU includes a single forget gate. Given an input xt at time t, a linear transformation x̃t and the forget gate ft are computed as:

x̃t = W xt    (10)

ft = σ(Wf xt + bf)    (11)

This computation depends only on xt, so it can be carried out in parallel across all time steps. The forget gate is used to modulate the internal state ct:

ct = ft ⊙ ct−1 + (1 − ft) ⊙ x̃t    (12)

The reset gate is employed to calculate the output state ht as a combination of ct and xt:

rt = σ(Wr xt + br)    (13)

ht = rt ⊙ tanh(ct) + (1 − rt) ⊙ xt    (14)

The complete algorithm also utilizes skip connections to improve the training of deep networks with many layers. While even a naive implementation of the approach brings improvements in performance, one of its merits is that it enables optimization for existing hardware architectures: eliminating the dependencies between time steps for the most expensive operations enables parallelization across different dimensions and time steps.

When using an artificial neural network to process data, a multilayer network can usually achieve more satisfactory results than a single-layer network. A deep RNN is a kind of deep network in which depth is added via a recurrent connection in the hidden layer [33].

Fig. 5. The structure of standard RNN.

Fig. 6. The structure of SRU.

Fig. 7. The structure of the 2-level SRU model.

Usually a single-layer network has limited ability to extract abstract features, while a multilayer network can produce more easily learned representations of input sequences, leading to better classification accuracy [34]. But the depth of the network should be chosen reasonably: the more layers the network has, the more training time it takes. With comprehensive consideration of effectiveness and efficiency, this paper designs a 2-level SRU model. As shown in Fig. 7, the deep SRU model consists of an input layer,

a sequence-to-sequence SRU layer, a many-to-one SRU layer, a Fully-Connected (FC) layer and a final soft-max layer for emotion recognition. Ideally, the number of nodes and hidden layers would be adjusted by grid cross-validation, but in consideration of the limited computational resources, the hidden-layer sizes were tested on typical parts of the data. The network appeared to work well when the first SRU layer used 200 units, the second SRU layer contained 100 units and the FC layer had 50 units. Meanwhile, the dropout probability was set to 50% in order to avoid overfitting.

3.4. Ensemble learning

Considering that there are four different kinds of features across five different frequency bands, twenty groups of base models are obtained. To further improve the recognition effect, ensemble learning methods are adopted to make full use of these base models. Ensemble learning can prevent the overfitting problem of a single base classifier, and many ensemble learning algorithms have been developed, such as Boosting, Bagging and Stacking [35]. The base classifiers ought to be designed to be rather different from each other; in general, they can be constructed by using different input features, training sets and learning algorithms [36].

In this paper, different input features and different frequency bands are utilized to design the base SRU models. Multiple SRUs are then integrated in order to obtain better classification results than any single model. We propose three ensemble strategies: (1) select the best feature for each of the five frequency bands for the ensemble (i.e., 5 models from the 20 candidates); (2) select the best two frequency bands for each of the four features for the ensemble (i.e., 8 models from the 20 candidates); and (3) ensemble all 20 candidate SRU models. In these strategies, the outputs of the individual models are combined by majority voting and weighted averaging [37]. The ensemble framework is depicted in Fig. 8.

Let yi(x), i = 1, 2, ..., K, represent the output of the i-th base model, and Y(x) the ensemble result.

(1) Voting

The output class label is determined through majority voting among the base models. That is:

Y(x) = sign(Σ_{i=1}^{K} yi(x))    (15)

(2) Weighted Averaging

The ensemble result is computed by assigning different weights to the base models. That is:

Y(x) = Σ_{i=1}^{K} wi yi(x)    (16)

where wi is the weight of yi, with wi ≥ 0 and Σ_{i=1}^{K} wi = 1. In general, an individual model with higher recognition accuracy should receive more weight. In this paper, wi is chosen based on the accuracy ranking of the base models:

wi = ki / Σ_{i=1}^{K} ki    (17)

In Eq. (17), ki represents the ranking index of the model accuracy on the validation data, e.g., ki = K when the model has the highest validation accuracy, and ki = 1 when the model has the lowest validation accuracy.

4. Results

The proposed system was designed to predict future emotion for a specific subject in two manners. One was the within-one-day manner, where the emotion model was trained and tested using data obtained from the same experimental day of one participant; according to the experimental sequence, the first nine sessions were used for model training while the last six sessions were for testing. The other was the cross-day manner, where the emotion model was trained and tested using data obtained from different experimental days of one participant. Under both manners, 10% of the training data was set aside for validation.

Each model was trained for 200 epochs, and the optimization algorithm used for the SRU model was Adam [38] with a mini-batch size of 50. The initial learning rate was 0.001, while the first and second momentum decay rates were 0.90 and 0.99 respectively. The proposed deep learning method was implemented on the Keras library with a TensorFlow backend and run on a PC with an Intel(R) Core(TM) i7-7700HQ CPU @2.80 GHz, 16 GB memory, an NVIDIA GeForce GTX 1060 GPU, and the 64-bit Windows 10 operating system.

In this section, we aim to examine the SRU model using different features on different frequency bands. After DT-CWT and feature extraction, the raw EEG data were transformed into four different features (MAV, PSD, FD and DE) over five typical frequency bands (Delta, Theta, Alpha, Beta and Gamma). Base SRU models were then trained on these samples respectively. Sections 4.1–4.3 report the results evaluated in the within-one-day manner, and Section 4.4 presents the stability of the emotion recognition model across days.

4.1. Feature analysis with ROC

The receiver operating characteristic (ROC) curve is usually employed as a visual technique to reveal the association between the true positive rate and the false positive rate as a threshold parameter changes [39]. The three emotion categories, i.e., negative, neutral and positive, were divided into three 2-class cases, i.e., negative vs. others, neutral vs. others, and positive vs. others, as shown in Fig. 9.

Fig. 10(a)–(d) show the ROC curves for the MAV, PSD, FD and DE features, respectively. Moreover, the area under the ROC curve (AUC) was calculated to make comparisons among the features and bands. First, we compared the ROC curves of the MAV feature on the different frequency bands. As shown in Fig. 10(a), the AUCs of the Gamma and Beta bands were larger than those of the other frequency bands, with the Gamma band performing best. These results reveal that the Gamma and Beta bands are more related to emotional activity than the other frequency bands [40].

Besides, it can be seen that the AUCs of the Positive state were a little larger than those of the Negative and Neutral states for most of the frequency bands. Taking the MAV feature for instance: for Negative, AUC_Gamma = 0.78, AUC_Beta = 0.74, AUC_Alpha = 0.63; for Neutral, AUC_Gamma = 0.83, AUC_Beta = 0.78, AUC_Alpha = 0.65; for Positive, AUC_Gamma = 0.90, AUC_Beta = 0.86, AUC_Alpha = 0.68. In some sense, the positive class was more recognizable than the other two classes in this experiment. Moreover, when using the PSD, FD and DE features, we reached similar conclusions to those from the MAV feature.

4.2. Classification performance of emotion recognition models

The classification confusion matrices are illustrated in Fig. 11, using the results from the MAV feature for demonstration. In a confusion matrix, each row denotes the target class, while

Fig. 8. Detailed framework of the ensemble model.

Fig. 9. Three emotional states divided into three 2-class cases.

each column denotes the predicted class. The value in element (i, j) is the percentage of samples in class i that are classified as class j. It can be seen that the SRU model achieved relatively competitive recognition performance on the higher frequency bands, with the best classification accuracy of 79.22% on the Gamma band and 75.91% on the Beta band. When tested on the lower frequency bands, the classifier obtained lower accuracies of 69.04%, 69.07% and 68.11% on the Alpha, Theta and Delta bands respectively. In addition, an accuracy of 68.67% was obtained when trained on samples extracted directly from the raw EEG signals.

When using the PSD, FD and DE features, a similar phenomenon was observed as with the MAV feature: the higher frequency bands had stronger discriminative capacity for emotion recognition than the lower frequency bands. Unlike the MAV and PSD features, the FD and DE features captured from the raw signals led to reasonable accuracies of 74.33% and 74.35%, which shows the suitability of the FD and DE features for dealing with raw signals. In addition, the positive class showed a higher correct-identification percentage than the other two classes on all frequency bands. In terms of computational cost, the average training time of the SRU model was 60.1 s; when the SRU layers were replaced by LSTM, the training time became 136.5 s. Obviously, the computation speed of SRU was much superior to that of LSTM.

Further, we compared the SRU with traditional classification algorithms, namely K-Nearest Neighbor (KNN), Naive Bayes (NB) and the Support Vector Machine (SVM). For these methods, the input sample structure (62∗19) was first flattened into a column vector (1178∗1). Then PCA was adopted to reduce the dimension of the features (50∗1). In KNN, the number of nearest neighbors K was searched in the space [1:20] with a step of one for the optimal value. In NB, we assumed that each feature obeyed a Gaussian distribution, and the default prior probability was the appearance frequency of each class. In SVM, we chose a linear kernel, and searched for the optimal value of C in the parameter space 2^[−10:10] with a step of one. The average classification results of all subjects are shown in Table 1.

Similar to the results from the confusion matrices, the SRU model obtained obviously better performance on the higher frequency bands. Besides, SRU achieved relatively stable results on the features extracted from the raw signals and the lower frequency bands. When using the DE feature, the SRU model led to the best classification performance on the Gamma band, with an average accuracy of 80.02%. Compared with the other classifiers, the SRU model attained the best recognition performance on all five frequency bands. For the MAV feature, it obtained the best classification accuracy of 79.22%, which was 11.34%, 10.98% and 10.04% higher than KNN, NB and SVM respectively. For the PSD feature, its best classification accuracy was 78.29%, which was 10.71%, 10.03% and 9.16% higher than KNN, NB and SVM respectively. For the FD feature, the best classification accuracy was 77.22%, which was 10.53%, 12.14% and 8.01% higher than KNN, NB and SVM respectively. For the DE feature, the best classification accuracy was 80.02%, which was 8.63%, 28.45% and 6.33% higher than KNN, NB and SVM respectively. These results demonstrate the superiority of the SRU models for EEG-based emotion recognition.

4.3. Ensemble results

The final results of the three ensemble strategies are presented in Table 2. From it, we can see that the voting and weighted methods of Strategy 2, and the weighted method of Strategy 3, outperformed all individual base models. Strategy 1 did not lead to good classification performance due to the interference of a few lower frequency bands; Strategy 2 achieved more desirable results than Strategy 1. Specifically, the weighted method of Strategy 2 led to the best classification accuracy among all three strategies, 83.13%, which was 3.11% higher than the best individual SRU model (80.02%); Strategy 3 with the weighted method also achieved good classification results, but its training time was much longer than

Fig. 10. The ROC curves of four different features.

the other two strategies. Therefore, the weighted method of Strategy 2 was selected as the final ensemble strategy. Different subjects may have different optimal features and frequency bands; by combining several base models, the ensemble method leads to a better average accuracy and a smaller standard deviation across all subjects.

4.4. Stability of the emotion recognition model across days

In the SEED dataset, each participant executed the experiments three times on different experimental days. By using SEED, we can therefore evaluate the stability of the emotion recognition model across days. We split the data in three different ways:
Fig. 11. The confusion matrices of the MAV feature.
Table 1
Classification accuracy (%) comparison of different classifiers.

Feature  Frequency band  KNN            NB             SVM            SRU
MAV      Gamma           67.88 ± 4.78   68.24 ± 6.85   69.18 ± 10.84  79.22 ± 8.09
MAV      Beta            60.14 ± 2.55   63.55 ± 7.43   66.76 ± 7.27   75.91 ± 8.15
MAV      Alpha           44.92 ± 8.28   39.46 ± 6.48   49.23 ± 7.37   69.04 ± 5.35
MAV      Theta           44.02 ± 7.53   43.73 ± 1.36   45.91 ± 7.46   69.07 ± 3.23
MAV      Delta           44.61 ± 7.34   43.76 ± 2.63   45.33 ± 4.45   68.11 ± 2.94
MAV      Raw signal      49.04 ± 5.97   42.57 ± 7.63   57.76 ± 9.92   68.67 ± 7.77
PSD      Gamma           67.58 ± 10.47  68.26 ± 6.76   69.13 ± 12.10  78.29 ± 8.01
PSD      Beta            60.26 ± 12.42  61.43 ± 9.46   67.27 ± 11.15  76.42 ± 7.91
PSD      Alpha           45.45 ± 9.13   40.82 ± 6.85   49.09 ± 9.28   69.29 ± 4.96
PSD      Theta           44.70 ± 10.66  45.59 ± 2.12   46.10 ± 7.14   68.32 ± 2.56
PSD      Delta           44.82 ± 9.85   43.78 ± 3.32   46.13 ± 7.74   67.46 ± 8.62
PSD      Raw signal      51.10 ± 8.48   48.01 ± 0.36   59.95 ± 11.26  68.09 ± 2.33
FD       Gamma           66.69 ± 7.93   65.08 ± 6.63   69.21 ± 6.42   77.22 ± 7.31
FD       Beta            59.73 ± 3.65   55.26 ± 6.79   65.74 ± 3.58   75.52 ± 6.45
FD       Alpha           44.63 ± 2.79   39.74 ± 9.36   48.04 ± 2.03   69.11 ± 4.12
FD       Theta           44.43 ± 9.63   44.43 ± 3.52   45.57 ± 8.37   68.14 ± 2.75
FD       Delta           44.87 ± 1.69   41.27 ± 2.41   44.84 ± 2.06   68.15 ± 2.34
FD       Raw signal      58.86 ± 10.25  63.16 ± 3.46   68.29 ± 3.05   74.33 ± 7.77
DE       Gamma           71.39 ± 4.56   50.76 ± 4.63   73.69 ± 7.44   78.86 ± 8.09
DE       Beta            68.52 ± 9.75   51.57 ± 5.12   71.43 ± 1.48   80.02 ± 8.15
DE       Alpha           59.89 ± 5.28   46.56 ± 6.31   64.68 ± 8.13   71.52 ± 5.35
DE       Theta           61.54 ± 4.69   46.07 ± 2.12   63.72 ± 4.36   69.85 ± 3.23
DE       Delta           62.08 ± 7.65   44.19 ± 6.57   66.23 ± 9.58   70.05 ± 2.94
DE       Raw signal      63.69 ± 4.39   46.18 ± 3.42   66.95 ± 9.36   74.35 ± 7.77

The best performance for each feature and each frequency band is in bold.
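The baseline protocol described above (flatten each 62 × 19 sample, reduce to 50 dimensions with PCA, then grid-search K for KNN and C for the linear SVM) can be sketched as follows. This is an illustrative reconstruction using scikit-learn and synthetic stand-in data, not the authors' implementation; only the sample shape, PCA dimension and search grids come from the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for the real data: each sample is a 62-channel x 19-feature
# matrix flattened into a 1178-dimensional vector; 3 emotion classes.
X = rng.normal(size=(300, 62 * 19))
y = rng.integers(0, 3, size=300)

searches = {
    # KNN: K searched in [1:20] with a step of one.
    "KNN": (KNeighborsClassifier(), {"clf__n_neighbors": list(range(1, 21))}),
    # NB: Gaussian likelihood per feature, empirical class priors (the defaults).
    "NB": (GaussianNB(), {}),
    # SVM: linear kernel, C searched over 2^[-10:10].
    "SVM": (SVC(kernel="linear"), {"clf__C": [2.0 ** e for e in range(-10, 11)]}),
}

for name, (clf, grid) in searches.items():
    # PCA reduces each 1178-d vector to 50 dimensions before classification.
    pipe = Pipeline([("pca", PCA(n_components=50)), ("clf", clf)])
    search = GridSearchCV(pipe, grid, cv=3)
    search.fit(X, y)
    print(name, search.best_params_)
```

On random stand-in data the accuracies are meaningless; the sketch only shows the shape of the comparison pipeline behind Table 1.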
Type I: The data from the first experimental day of one participant was used as training data and the data from the second experimental day was used as testing data.
Type II: The data from the first experimental day of one participant was used as training data and the data from the third experimental day was used as testing data.
Type III: The data from the first and second experimental days of one participant was used as training data and the data from the third experimental day was used as testing data.
The average results from the 15 participants are presented in Table 3, where differential entropy (DE) in the Gamma band was used as the feature for comparison. Initially, we intuitively thought that the
Table 2
Classification comparison of different ensemble strategies.

Ensemble strategy  Method    Accuracy (%)   Precision (%)  Recall (%)     F-score (%)    Time (s)
Strategy1          Voting    77.12 ± 3.22   76.21 ± 4.34   76.43 ± 3.43   75.51 ± 1.76   315.4
Strategy1          Weighted  78.84 ± 1.88   76.91 ± 5.32   78.12 ± 3.73   77.44 ± 4.38   315.4
Strategy2          Voting    81.82 ± 2.13   80.71 ± 4.23   80.94 ± 6.48   80.21 ± 2.89   378.5
Strategy2          Weighted  83.13 ± 1.67   82.24 ± 3.64   81.53 ± 3.25   81.24 ± 3.65   378.5
Strategy3          Voting    80.21 ± 4.23   79.54 ± 3.34   79.12 ± 3.95   78.81 ± 1.96   936.1
Strategy3          Weighted  81.53 ± 2.22   81.11 ± 1.58   80.93 ± 3.86   80.54 ± 3.86   936.1

The best performance for different ensemble strategies is in bold.
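The voting and weighted fusion schemes compared in Table 2 can be sketched as below. This is an illustrative reconstruction, not the paper's exact implementation: the base-model class probabilities are random stand-ins, and accuracy-proportional weights are one plausible choice for the weighted scheme, which the text does not specify in detail.

```python
import numpy as np

def majority_vote(prob_list):
    """Hard voting: each base model casts one vote (its argmax class) per sample."""
    votes = np.stack([p.argmax(axis=1) for p in prob_list])  # shape (models, samples)
    return np.apply_along_axis(
        lambda v: np.bincount(v, minlength=3).argmax(), 0, votes
    )

def weighted_fusion(prob_list, accs):
    """Weighted scheme: average class probabilities, weighting each base
    model by its normalized validation accuracy (an assumed weighting rule)."""
    w = np.asarray(accs, dtype=float)
    w /= w.sum()
    fused = sum(wi * p for wi, p in zip(w, prob_list))  # shape (samples, classes)
    return fused.argmax(axis=1)

# Stand-in outputs of three base SRU models on 5 samples, 3 emotion classes.
rng = np.random.default_rng(1)
probs = [rng.dirichlet(np.ones(3), size=5) for _ in range(3)]
accs = [0.79, 0.80, 0.76]  # hypothetical validation accuracies of the base models
print(majority_vote(probs), weighted_fusion(probs, accs))
```

Weighting by validation accuracy lets stronger base models (e.g. those trained on Gamma-band features) dominate the fused decision, which matches the observation that the weighted method outperformed plain voting.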
Table 3
The average accuracies (%) of our emotion model within and across days.

Stats.  Type I  Type II  Type III  Within-one-day
Mean    79.03   77.35    79.62     78.86
Std.    7.53    6.73     7.04      8.09

Differential entropy (DE) in the Gamma band is used for comparison.
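The three cross-day splitting schemes can be written down directly. A minimal sketch, where `day1`, `day2` and `day3` are hypothetical stand-ins for the feature arrays recorded on each experimental day:

```python
import numpy as np

def make_split(day1, day2, day3, split_type):
    """Return (train, test) arrays for the three cross-day evaluation types:
    Type I:   train on day 1, test on day 2
    Type II:  train on day 1, test on day 3
    Type III: train on days 1 and 2, test on day 3
    """
    if split_type == "I":
        return day1, day2
    if split_type == "II":
        return day1, day3
    if split_type == "III":
        return np.concatenate([day1, day2]), day3
    raise ValueError(f"unknown split type: {split_type}")

# Stand-in arrays: 10 samples per day, 4 features each.
d1, d2, d3 = (np.ones((10, 4)) * k for k in (1, 2, 3))
train, test = make_split(d1, d2, d3, "III")
print(train.shape, test.shape)  # (20, 4) (10, 4)
```

Because the test day never appears in the training set, any accuracy obtained under these splits measures how stable the learned emotion-EEG relation is across days rather than within one recording session.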
identification accuracy in the within-one-day manner would be higher than that in the cross-day manner. In fact, we found that the recognition accuracy was relatively stable both within and across days, which reflects that the relation between the variation of emotional states and the EEG signal is relatively stable for one person over time. There is a reason why the recognition accuracy in the within-one-day manner is not very high: the movie clips used to stimulate emotion on the same experimental day originated from different films, so using the first nine movie sessions for modeling and the last six for testing is, in some sense, cross-film modeling. Considering that the EEG signal is task-related and sensitive to different emotion stimuli, the cross-task recognition accuracy of 78.86% within a single day is acceptable.

5. Discussions and conclusions

5.1. Comparisons with similar work

This research presented an analysis of multi-domain and multi-frequency features of EEGs related to emotion changes. The Simple Recurrent Units network and ensemble learning methods were utilized to construct an automatic recognition system to distinguish three different emotional states. Several fundamental conclusions can be drawn: 1) Higher frequency bands, like Gamma and Beta, are more favorable for emotion recognition than the lower frequency bands. 2) Stimulated by the scene and audio materials in the experiments, positive emotion is relatively more recognizable than the other two emotional states when using EEG measurements. 3) As an improved kind of RNN, the SRU network is good at grasping the temporal changing property under different emotions by using sequential data. 4) The efficient SRU network and ensemble learning method show high identification performance and acceptable processing efficiency in automatic emotion recognition. 5) The performance of our emotion system shows that the neural patterns are relatively stable within and across days.

A brief summary of emotion recognition based on EEG is shown in Table 4, where the numbers in parentheses denote the number of levels for each emotional dimension. These studies demonstrate the feasibility and availability of establishing emotion models using EEG. In these studies, the stimulus materials used in the experiments include images, music and videos. Early studies often used pictures as emotion elicitation materials. For example, Heraz and Frasson [41], Brown et al. [42] and Jenke et al. [43] used the IAPS (International Affective Picture System) to implement emotion experiments. Later, researchers applied music to stimulate the corresponding emotional states. Lin et al. [44] and Hadjidimitriou and Hadjileontiadis [45] performed emotion recognition based on music stimuli. Compared with pictures and music, videos are composed of scenes and audio and can supply more real-life scenarios to subjects. Thus, videos have become popular in emotion recognition research. Some famous datasets for emotion recognition, such as DEAP [14–16,25,46] and SEED [14,40,47], have been established, where the emotions are all elicited using videos.

Traditional classification algorithms, along with feature selection or dimensionality reduction methods, are extensively used to classify different emotional states and have achieved fine classification effects. Zheng et al. [14] carried out their research on the DEAP and SEED datasets based on traditional machine learning algorithms. They applied Differential Entropy (DE), a Linear Dynamic System (LDS), Minimal Redundancy Maximal Relevance (MRMR), and the Discriminative Graph regularized Extreme Learning Machine (GELM) to feature extraction, smoothing, selection and pattern recognition, respectively. Besides, they investigated the stability of neural patterns over time. In our proposed framework, we adopted a deep learning method, which did not need the step of feature selection or dimensionality reduction and thus could simplify the process of emotion recognition. We also investigated the stability of neural patterns over different sessions and different experimental days.

In recent years, deep learning algorithms have gradually been used to solve emotion recognition problems. Zheng and Lu [40] introduced a deep learning algorithm named the Deep Belief Network (DBN) for emotion recognition on the SEED dataset. DBN models were trained with differential entropy (DE) features; then critical frequency bands and channels were selected according to the weight distributions of the trained DBN models. At last, they designed four different profiles and obtained the best effect with the 12-channel profile. Li et al. [47] first organized DE features from different channels to form two-dimensional maps. Then they trained a Hierarchical Convolutional Neural Network (HCNN) to identify different emotions. HCNN yielded the highest accuracy of 88.2% on the Gamma band. They also confirmed that the high-frequency Gamma and Beta bands were the optimum bands for emotion processing. In our experimental settings, we also investigated critical frequency bands, but we did not select critical channels. The DT-CWT technique was adopted to decompose EEG into five sub-bands so as to study the discriminative capacity of different frequency bands. In addition, we considered the temporal characteristics of EEG and transformed the features into a sequence format. Compared with DBN and HCNN, our SRU model is another deep learning method with different specialties.

Considering the temporal, spatial and frequency characteristics of EEG signals, Alhagry et al. [46] and Li et al. [25,26] took advantage of LSTM to distinguish different emotion classes on the DEAP dataset. Alhagry et al. [46] used LSTM to learn features
Table 4
Comparison of the existing emotion recognition systems. The numbers in parentheses denote the number of levels for each emotional dimension.

Authors | Stimuli | Channels | Subjects | Method description | Emotional states | Performance (accuracy)
Heraz and Frasson [41] | IAPS | 2 | 17 | Amplitudes of four frequency bands, evaluated KNN, Bagging | Valence (12), arousal (12) and dominance (12) | Valence: 74%, arousal: 74% and dominance: 75%
Brown et al. [42] | IAPS | 8 | 11 | Spectral power features, KNN | Positive, negative and neutral | 85%
Jenke et al. [43] | IAPS | 64 | 16 | Higher Order Crossings, Higher Order Spectra and Hilbert-Huang Spectrum features, QDA | Happy, curious, angry, sad and quiet | 36.8%
Lin et al. [44] | Music | 24 | 26 | Power spectral density and asymmetry features of five frequency bands, evaluated SVM | Joy, anger, sadness, and pleasure | 82.29%
Hadjidimitriou and Hadjileontiadis [45] | Music | 14 | 9 | Time-frequency analysis, KNN, QDA and SVM | Like and dislike | 86.52%
Koelstra et al. [15] | Video | 32 | 32 | Spectral power features of four frequency bands, Gaussian naive Bayes classifier | Valence (2), arousal (2) and liking (2) | Valence: 57.6%, arousal: 62.0% and liking: 55.4%
Gupta et al. [16] | Video | 32 | 32 | Graph-theoretic features, SVM/RVM | Valence (2), arousal (2), dominance (2) and liking (2) | Valence: 67%, arousal: 69%, dominance: 65% and liking: 65%
Zheng et al. [40] | Video | 62 | 15 | Differential entropy features of five frequency bands, DBN | Negative, neutral and positive | 86.65%
Zheng et al. [14] | Video | 32 | 32 | Differential entropy features of four/five frequency bands, LDS, MRMR, GELM | Quadrants of VA space (4) | 69.67%
Zheng et al. [14] | Video | 62 | 15 | Differential entropy features of four/five frequency bands, LDS, MRMR, GELM | Negative, neutral and positive | 91.07%
Alhagry et al. [46] | Video | 32 | 32 | Raw EEG signal, LSTM | Valence (2), arousal (2) and liking (2) | Valence: 85.45%, arousal: 85.65% and liking: 87.99%
Li et al. [25] | Video | 32 | 32 | Rational asymmetry features of four frequency bands, LSTM | Valence (2) | 76.67%
Li et al. [47] | Video | 62 | 15 | Differential entropy features organized as two-dimensional maps, HCNN | Negative, neutral and positive | 88.2%
from every 5 s-length EEG data; then a dense layer was applied to classify these features. This method led to average accuracies of 85.65%, 85.45% and 87.99% for the low/high arousal, valence and liking classes, respectively. Li et al. [25,26] extracted Rational Asymmetry (RASM) features from every 63 s-length signal to capture the frequency-space domain characteristics of the EEG signals. Then an LSTM was constructed as the classifier and achieved a mean accuracy of 76.67% on the low/high valence classes. In our research, SRU models were established instead of LSTM networks, which could realize the computation in parallel and achieved obviously improved computational efficiency.

5.2. Merits of the proposed system

Our research constructed an automatic emotion recognition system based on the Simple Recurrent Units network and ensemble learning. The main contributions of this paper can be summarized as follows:

• Comprehensively extract physiological features of EEG from multiple domains and multiple frequency bands.

Considering the brain as a highly complex system, the current study evaluated the characteristics of brain signals from various aspects: time analysis via the mean absolute value (MAV), frequency analysis via the power spectral density (PSD), and nonlinear analysis via the fractal dimension (FD) and differential entropy (DE). Besides, the raw EEG signals were decomposed and reconstructed using the dual-tree complex wavelet transform (DT-CWT), which can be viewed as an improved kind of wavelet transform and has better expression capacity for EEG due to its approximate shift invariance and excellent reconstruction properties. Our results demonstrated that the features extracted from the sub-bands, especially the higher frequency bands, provided more accurate information than those from the original signals.

• Regard emotion as a changing process with the time dependency property.

Most of the shallow learning and deep learning methods treated emotional states as independent points and ignored the time dependency property during the emotion process. In order to grasp the temporal information of EEG, we adopted the Simple Recurrent Units (SRU) network, which is not only capable of processing sequence data but also able to solve the problem of long-term dependencies that occurs in a normal RNN.

• Obtain competitive classification performance with low computational cost.

Superior to a common RNN, SRU realizes computing in parallel and thus can accelerate the calculation to a large degree. Furthermore, ensemble learning methods using three different strategies were applied to integrate multiple SRU models so as to obtain better results than using any base model alone.

• Explore the stability of our emotion recognition system over time.

The dataset used in this paper consisted of 15 participants, and each one performed the experiments three times on different experimental days. The performance of the proposed subject-specific emotion models was evaluated in both within-one-day and cross-day manners. The neural patterns of EEG signals over time for emotion recognition were fully explored.

5.3. Limitations and future work

The limitations of the current work and corresponding directions of future research can be summarized as follows:
• The SRU network parameters, such as the number of nodes and the parameters of the training algorithms, were mostly selected based on experience or by trial and error. Thus, a more systematic method for selecting appropriate parameters should be developed.
• The machine learning model constructed in this paper was subject-specific. The well-learned parameters are appropriate for one subject, but may not be suitable for others. This may limit the applicability of the proposed emotion recognition methods. So it is worth developing a generic (subject-independent or cross-subject) classifier for all subjects.
• Compared with traditional machine learning methods, deep learning is more dependent on powerful computation and will, in general, spend much more time on training. In order to save computational cost, we can consider introducing pre-trained models or combining transfer learning methods to accelerate the model training.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (Nos. 61976091 and 61806078) and the Fundamental Research Funds for the Central Universities (No. 222201917006).

Declaration of Competing Interest

There is no conflict of interest in this work.

Appendix A. A list of acronyms

A summary of the acronyms used in this research is listed in Table A1.

Table A1
A summary of acronyms.

Acronym  Full name
AE       Auto-encoder
AUC      Area under the ROC curve
BDAE     Bimodal deep auto-encoder
CNN      Convolutional neural network
CNS      Central nervous system
DBN      Deep belief network
DE       Differential entropy
DT-CWT   Dual-tree complex wavelet transform
DWT      Discrete wavelet transform
ECG      Electrocardiogram
EEG      Electroencephalogram
FC       Fully-connected
FD       Fractal dimension
FFT      Fast Fourier transform
GELM     Discriminative graph regularized extreme learning machine
HCNN     Hierarchical convolutional neural network
HMI      Human-machine interaction
IAPS     International Affective Picture System
KNN      K-nearest neighbor
LDA      Linear discriminant analysis
LDS      Linear dynamic system
MAV      Mean absolute value
MRMR     Minimum redundancy maximum relevance
NB       Naive Bayes
PCA      Principal component analysis
PNS      Peripheral nervous system
PSD      Power spectral density
RASM     Rational asymmetry
RF       Random forest
RNN      Recurrent neural network
ROC      Receiver operating characteristic
SC       Skin conductance
SRU      Simple recurrent units
SVM      Support vector machine

References

[1] R.W. Picard, Affective Computing, MIT Press, 2000.
[2] J. Wagner, N.J. Kim, E. Andre, From physiological signals to emotions: implementing and comparing selected methods for feature extraction and classification, in: 2005 IEEE International Conference on Multimedia and Expo, IEEE Computer Society, 2005, pp. 940–943.
[3] K.H. Kim, S.W. Bang, S.R. Kim, Emotion recognition system using short-term monitoring of physiological signals, Med. Biol. Eng. Comput. 42 (3) (2004) 419–427.
[4] M.J. Black, Y. Yacoob, Recognizing facial expressions in image sequences using local parameterized models of image motion, Int. J. Comput. Vis. 25 (1) (1997) 23–48.
[5] J.F. Brosschot, J.F. Thayer, Heart rate response is longer after negative emotions than after positive emotions, Int. J. Psychophysiol. 50 (3) (2003) 181–187.
[6] X.W. Li, B. Hu, T.S. Zhu, J.Z. Yan, F. Zheng, Towards affective learning with an EEG feedback approach, in: ACM International Workshop on Multimedia Technologies for Distance Learning, ACM, 2009, pp. 33–38.
[7] X.W. Wang, D. Nie, B.L. Lu, Emotional state classification from EEG data using machine learning approach, Neurocomputing 129 (4) (2014) 94–106.
[8] L.L. Chen, J. Zhang, J.Z. Zou, C.J. Zhao, G.S. Wang, A framework on wavelet-based nonlinear features and extreme learning machine for epileptic seizure detection, Biomed. Signal Process. Control 10 (2014) 1–10.
[9] X. Li, D. Song, P. Zhang, G.L. Yu, Y.X. Hou, B. Hu, Emotion recognition from multi-channel EEG data through convolutional recurrent neural network, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2016, pp. 352–359.
[10] J.X. Liu, H.Y. Meng, A.K. Nandi, M.Z. Li, Emotion detection from EEG recordings, in: International Conference on Natural Computation, IEEE, 2016, pp. 1722–1727.
[11] W. Liu, W.L. Zheng, B.L. Lu, Emotion recognition using multimodal deep learning, in: International Conference on Neural Information Processing, Springer, 2016, pp. 521–529.
[12] L.L. Chen, Y. Zhao, P.F. Ye, J. Zhang, J.Z. Zou, Detecting driving stress in physiological signals based on multimodal feature analysis and kernel classifiers, Expert Syst. Appl. 85 (2017) 279–291.
[13] S.N. Daimi, G. Saha, Classification of emotions induced by music videos and correlation with participants' rating, Expert Syst. Appl. 41 (13) (2014) 6057–6065.
[14] W.L. Zheng, J.Y. Zhu, B.L. Lu, Identifying stable patterns over time for emotion recognition from EEG, arXiv preprint arXiv:1601.02197, 2016.
[15] S. Koelstra, C. Muhl, M. Soleymani, DEAP: a database for emotion analysis using physiological signals, IEEE Trans. Affect. Comput. 3 (1) (2012) 18–31.
[16] R. Gupta, K.U. Laghari, T.H. Falk, Relevance vector classifier decision fusion and EEG graph-theoretic features for automatic affective state characterization, Neurocomputing 174 (2015) 875–884.
[17] X. Chai, Q.S. Wang, Y.P. Zhao, X. Liu, O. Bai, Y.Q. Li, Unsupervised domain adaptation techniques based on auto-encoder for non-stationary EEG-based emotion recognition, Comput. Biol. Med. 79 (2016) 205–214.
[18] F. Wang, S.H. Zhong, J.F. Peng, J.M. Jiang, Y. Liu, Data augmentation for EEG-based emotion recognition with deep convolutional neural networks, in: Conference on Multimedia Modeling, Springer, 2018, pp. 82–93.
[19] N. Zhang, W.L. Zheng, W. Liu, B.L. Lu, Continuous vigilance estimation using LSTM neural networks, in: International Conference on Neural Information Processing, Springer, 2016, pp. 530–537.
[20] L. Deng, J.Y. Li, J.T. Huang, K.S. Yao, D. Yu, F. Seide, M. Seltzer, G. Zweig, X.D. He, J. Williams, Y.F. Gong, A. Acero, Recent advances in deep learning for speech research at Microsoft, in: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, 2013, pp. 8604–8608.
[21] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735–1780.
[22] H. Tang, W. Liu, W.L. Zheng, B.L. Lu, Multimodal emotion recognition using deep neural networks, in: International Conference on Neural Information Processing, Springer, 2017, pp. 811–819.
[23] N.G. Kingsbury, The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters, IEEE Digital Signal Processing Workshop 98 (1998) 2–5.
[24] U. Bal, Dual tree complex wavelet transform based de-noising of optical microscopy images, Biomed. Opt. Express 3 (12) (2012) 3231–3239.
[25] M.Y. Li, W.Z. Chen, T. Zhang, Automatic epileptic EEG detection using DT-CWT-based non-linear features, Biomed. Signal Process. Control 34 (2017) 114–125.
[26] Z.Q. Li, X. Tian, L. Shu, X.M. Xu, B. Hu, Emotion recognition from EEG using RASM and LSTM, in: International Conference on Internet Multimedia Computing and Service, Springer, 2017, pp. 310–318.
[27] H.M. Shim, H. An, S. Lee, E.H. Lee, H.K. Min, S. Lee, EMG pattern classification by split and merge deep belief network, Symmetry 8 (12) (2016) 148.
[28] U.R. Acharya, S.V. Sree, C.A.A. Peng, R. Yanti, J.S. Suri, Application of non-linear and wavelet based features for the automated identification of epileptic EEG signals, Int. J. Neural Syst. 22 (2) (2012) 565–579.
[29] X.Y. Zhang, F. Yin, Y.M. Zhang, C.L. Liu, Y. Bengio, Drawing and recognizing Chinese characters with recurrent neural network, IEEE Trans. Pattern Anal. Mach. Intell. 40 (4) (2018) 849–862.
[30] Y. Bengio, P.Y. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Trans. Neural Netw. 5 (2) (1994) 157–166.
[31] K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078, 2014.
[32] T. Lei, Y. Zhang, S.I. Wang, H. Dai, Y. Artzi, Simple recurrent units for highly parallelizable recurrence, arXiv preprint arXiv:1709.02755, 2017.
[33] R.G. Hefron, B.J. Borghetti, J.C. Christensen, C.M. Kabban, Deep long short-term memory structures model temporal dependencies improving cognitive workload estimation, Pattern Recognit. Lett. 94 (2017) 96–104.
[34] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[35] A. Ozcift, A. Gulten, Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms, Comput. Methods Programs Biomed. 104 (3) (2011) 443–451.
[36] C. Padilha, D.A. Barone, A.D. Neto, A multi-level approach using genetic algorithms in an ensemble of least squares support vector machines, Knowl. Based Syst. 106 (2016) 85–95.
[37] J.H. Zhang, S.N. Li, R.B. Wang, Pattern recognition of momentary mental workload based on multi-channel electrophysiological data and ensemble convolutional neural networks, Front. Neurosci. 11 (2017) 1–16.
[38] D.P. Kingma, J.L. Ba, Adam: a method for stochastic optimization, arXiv preprint arXiv:1412.6980, 2014.
[39] L.L. Chen, Y. Zhao, J. Zhang, J.Z. Zou, Automatic detection of alertness/drowsiness from physiological signals using wavelet-based nonlinear features and machine learning, Expert Syst. Appl. 42 (21) (2015) 7344–7355.
[40] W.L. Zheng, B.L. Lu, Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks, IEEE Trans. Auton. Ment. Dev. 7 (3) (2015) 162–175.
[41] A. Heraz, C. Frasson, Predicting the three major dimensions of the learner's emotions from brainwaves, World Acad. Sci. Eng. Technol. Int. J. Comput. Electr. Autom. Control Inf. Eng. 1 (7) (2007) 1988–1994.
[42] L. Brown, B. Grundlehner, J. Penders, Towards wireless emotional valence detection from EEG, in: International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2011, pp. 2188–2191.
[43] R. Jenke, A. Peer, M. Buss, Feature extraction and selection for emotion recognition from EEG, IEEE Trans. Affect. Comput. 5 (3) (2014) 327–339.
[44] Y.P. Lin, C.H. Wang, T. Jung, T.L. Wu, S. Jeng, J. Duann, J. Chen, EEG-based emotion recognition in music listening, IEEE Trans. Biomed. Eng. 57 (7) (2010) 1798–1806.
[45] S. Hadjidimitriou, L.J. Hadjileontiadis, Toward an EEG-based recognition of music liking using time-frequency analysis, IEEE Trans. Biomed. Eng. 59 (12) (2012) 3498–3510.
[46] S. Alhagry, A.A. Fahmy, R.A. ElKhoribi, Emotion recognition based on EEG using LSTM recurrent neural network, Int. J. Adv. Comput. Sci. Appl. (IJACSA) 8 (10) (2017) 355–358.
[47] J.P. Li, Z.X. Zhang, H.G. He, Hierarchical convolutional neural networks for EEG-based emotion recognition, Cognit. Comput. 10 (2) (2018) 368–380.