EEG Emotion Recognition

Article history:
Received 25 June 2019
Received in revised form 16 October 2019
Accepted 5 November 2019
Available online 8 January 2020

Keywords: EEG; Emotion recognition; DT-CWT; SRU; Ensemble learning

Abstract

The purpose of this research is to develop an EEG-based emotion recognition system for the identification of three emotions: positive, neutral and negative. Up to now, various modeling approaches for automatic emotion recognition have been reported. However, the time dependency property of the emotion process has not been fully considered. In order to grasp the temporal information of EEG, we adopt a deep Simple Recurrent Units (SRU) network, which is not only capable of processing sequence data but also able to solve the problem of long-term dependencies that occurs in the normal Recurrent Neural Network (RNN). Before training the emotion models, the Dual-tree Complex Wavelet Transform (DT-CWT) was applied to decompose the original EEG into five constituent sub-bands, from which features were then extracted using time, frequency and nonlinear analysis. Next, deep SRU models were established using four different features over five frequency bands, and favorable results were found to be related to the higher frequency bands. Finally, three ensemble strategies were employed to integrate the base SRU models to obtain more desirable classification performance. We evaluate and compare the performance of shallow models, deep models and ensemble models. Our experimental results demonstrated that the proposed emotion recognition system based on the SRU network and ensemble learning could achieve satisfactory identification performance at a relatively economical computational cost.

© 2019 Published by Elsevier Ltd.
https://doi.org/10.1016/j.bspc.2019.101756
2 C. Wei, L.-l. Chen, Z.-z. Song et al. / Biomedical Signal Processing and Control 58 (2020) 101756
tain abundant information about basic rhythms of EEG. In this study, 5-level decomposition and reconstruction of EEG are performed by DT-CWT; thus six components, z5 (0–3.125 Hz), y5 (3.125–6.25 Hz), y4 (6.25–12.5 Hz), y3 (12.5–25 Hz), y2 (25–50 Hz) and y1 (50–100 Hz), can be acquired. Among them, the ranges of z5, y5, y4, y3 and y2 are close to the frequency ranges of Delta, Theta, Alpha, Beta and Gamma respectively. Hence, these five sub-bands could represent the corresponding rhythms of EEG.

3.2. Feature extraction

3.2.2. Frequency analysis

On the basis of the fast Fourier transform (FFT), the power spectral density (PSD) approach is adopted to obtain the characteristics of EEG signals in the frequency domain [7]. The definition of PSD is given by the Wiener-Khintchine theorem. Regarding the signal as a stationary random process, the signal autocorrelation function is calculated as:

R̂x(m) = (1/N) Σ_{i=0}^{N−1} x(i) x(i + m)   (2)
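As a numerical sanity check on Eq. (2) and the Wiener-Khintchine relation, the sketch below works on a synthetic signal; the sampling rate, length and random data are illustrative assumptions, not the paper's recordings. The Fourier transform of the circular autocorrelation estimate reproduces the periodogram |X(k)|²/N, and average powers for the five rhythm-related bands follow directly.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n = 200, 512              # illustrative sampling rate (Hz) and length
x = rng.standard_normal(n)    # synthetic stand-in for one EEG channel

# Eq. (2), implemented circularly: R(m) = (1/N) * sum_i x(i) x(i+m mod N)
R = np.array([np.dot(x, np.roll(x, -m)) for m in range(n)]) / n

# Wiener-Khintchine: the PSD is the Fourier transform of the autocorrelation.
# For the circular estimate this equals the periodogram |X(k)|^2 / N exactly.
psd_wk = np.fft.fft(R).real
psd_periodogram = np.abs(np.fft.fft(x)) ** 2 / n
print(np.allclose(psd_wk, psd_periodogram))  # True

# Average power in the five rhythm-related bands used in the paper
bands = {"Delta": (0, 3.125), "Theta": (3.125, 6.25), "Alpha": (6.25, 12.5),
         "Beta": (12.5, 25), "Gamma": (25, 50)}
freqs = np.fft.fftfreq(n, d=1 / fs)
for name, (lo, hi) in bands.items():
    mask = (np.abs(freqs) >= lo) & (np.abs(freqs) < hi)
    print(name, psd_periodogram[mask].mean())
```

On real EEG one would typically window and average multiple segments (Welch-style) rather than rely on a single periodogram.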
Mean value of the curve length, L(k), is computed by averaging Lm(k) over the k sets for all m. After that, FD could be obtained by:

FD = − log L(k) / log k   (6)

Another nonlinear analysis method used in this research is differential entropy (DE). DE is the continuous version of Shannon entropy, and could be computed as:

h(x) = − ∫ f(x) log(f(x)) dx   (7)

The DE feature is simple and efficient for the complexity evaluation of a continuous random variable. Previous studies have shown the advantage of DE in characterizing EEG time series [14]. For a fixed-length EEG sequence, DE is equal to the logarithm of the PSD in a certain frequency band. If a random variable obeys the Gaussian distribution N(μ, σ²), the DE feature can simply be computed by the following formulation:

h(x) = − ∫_{−∞}^{+∞} (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)) log[(1/√(2πσ²)) exp(−(x − μ)²/(2σ²))] dx = (1/2) log(2πeσ²)   (8)

where e is Euler's constant and σ is the standard deviation of the sequence.

3.3. Simple recurrent units (SRU)

Recurrent Neural Network (RNN) has loops inside, and information is transmitted from the present loop to the next. This chain-like property indicates that RNN is the natural neural-network structure for sequences and lists [25,26]. The structure of a standard RNN is shown in Fig. 5.

Given a general input sequence [x1, x2, ..., xk], where xi ∈ R^d, at each time-step of RNN modeling a hidden state is generated, producing a hidden sequence [c1, c2, ..., ck]. The hidden state at time-step t is computed by an activation function f of the current input xt and the previous hidden state ct−1 as:

ct = f(xt, ct−1)   (9)

Then an optional output can be produced by ht = g(ct), resulting in an output sequence [h1, h2, ..., hk], which can be used for sequence-to-sequence tasks [29]. However, the standard RNN cannot avoid the problem of long-term dependencies: RNN loses the capacity to connect information as the gap between loops increases [30]. Thus, special kinds of RNN have been proposed, such as Long Short-Term Memory [21] and Gated Recurrent Units [31], which have the ability to learn long-term dependencies because of the exclusive design of their units [25,26].

For sequences of symbols, recurrent networks process one symbol at a time. In common RNN architectures, such as LSTM and GRU, the computation at each step depends on the completion of the previous step, so recurrent computations are poorly suited to parallelization [32]. Besides, gating is used in most recurrent architectures to control the information flow and alleviate the vanishing and exploding gradient problems. In this process, the computation of the network, especially the matrix multiplication, is the most expensive operation. Lei et al. [32] proposed an improved version of RNN named Simple Recurrent Units (SRU). The key design of SRU is to make the gate computation rely only on the current input of the recurrence. In this way, only the point-wise multiplication depends on previous steps, so the matrix multiplications in the feed-forward network can be easily parallelized. The structure of SRU is shown in Fig. 6.

The basic form of SRU includes a single forget gate. Given an input xt at time t, a linear transformation x̃t and the forget gate ft are computed as:

x̃t = W xt   (10)

ft = σ(Wf xt + bf)   (11)

This computation depends only on xt, which allows it to be carried out in parallel across all time steps. The forget gate is used to modulate the internal state ct:

ct = ft ⊙ ct−1 + (1 − ft) ⊙ x̃t   (12)

The reset gate is employed to calculate the output state ht as a combination of ct and xt:

rt = σ(Wr xt + br)   (13)

ht = rt ⊙ tanh(ct) + (1 − rt) ⊙ xt   (14)

where σ denotes the sigmoid function and ⊙ element-wise multiplication. The complete algorithm also utilizes skip connections to improve the training of deep networks with many layers. While even a naive implementation of the approach brings improvements in performance, one of its merits is that it enables optimization tailored to existing hardware architectures: eliminating the dependencies between time steps for the most expensive operations enables parallelization across different dimensions and time steps.

When using an artificial neural network to process data, a multilayer network can usually achieve better results than a single-layer network. Deep RNN is a kind of deep network in which depth is added via a recurrent connection in the hidden layer [33].
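The recurrence in Eqs. (10)–(14) can be sketched in NumPy as below; the weight shapes, random initialization and input sizes are illustrative assumptions. The point of the design is visible in the code: the three matrix products involve no recurrent state and are computed for the whole sequence at once, leaving only a cheap element-wise loop over time.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_forward(x, W, Wf, bf, Wr, br):
    """x: (T, d) input sequence; returns output states h of shape (T, d)."""
    # Input-side transforms, Eqs. (10), (11), (13): no time dependency,
    # so each is one matmul over all T steps, parallelizable across time.
    x_tilde = x @ W.T              # x̃_t = W x_t
    f = sigmoid(x @ Wf.T + bf)     # f_t = σ(W_f x_t + b_f)
    r = sigmoid(x @ Wr.T + br)     # r_t = σ(W_r x_t + b_r)

    # Only the element-wise state update is sequential.
    T, d = x_tilde.shape
    c = np.zeros(d)
    h = np.empty((T, d))
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]        # Eq. (12)
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]  # Eq. (14); highway term needs d_in == d_hid
    return h

rng = np.random.default_rng(1)
T, d = 19, 8                        # illustrative sequence length and width
x = rng.standard_normal((T, d))
W, Wf, Wr = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
h = sru_forward(x, W, Wf, np.zeros(d), Wr, np.zeros(d))
print(h.shape)  # (19, 8)
```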
Usually a single-layer network has limited ability to extract abstract features, while a multilayer network can produce more easily learned representations of input sequences, leading to better classification accuracy [34]. But the depth of the network should be chosen reasonably: the more layers the network has, the more training time it will take. With comprehensive consideration of effectiveness and efficiency, this paper designs a 2-level SRU model. As shown in Fig. 7, the deep SRU model consists of an input layer,
each column denotes the predicted class. The value in element (i, j) is the percentage of samples in class i that is classified as class j. It can be seen that the SRU model achieved relatively competitive recognition performance on the higher frequency bands, with the best classification accuracy of 79.22% on the Gamma band and 75.91% on the Beta band. When tested on the lower frequency bands, the classifier obtained lower accuracies of 69.04%, 69.07% and 68.11% on the Alpha, Theta and Delta bands respectively. In addition, an accuracy of 68.67% was obtained when trained on samples extracted directly from raw EEG signals.

When using the PSD, FD and DE features, a phenomenon similar to that of the MAV feature could be observed: higher frequency bands had stronger discriminative capacity for emotion recognition tasks than lower frequency bands. Unlike the MAV and PSD features, the FD and DE features captured from raw signals led to reasonable accuracies of 74.33% and 74.35%, which showed the suitability of the FD and DE features for dealing with raw signals. In addition, the positive class showed a higher correct identification percentage than the other two classes on all the frequency bands. In terms of computational cost, the average training time of the SRU model was 60.1 s. When the SRU layers were replaced by LSTM, the training time became 136.5 s. Obviously, the computation speed of SRU was much superior to that of LSTM.

Further, we compared the SRU with traditional classification algorithms: K-Nearest Neighbor (KNN), Naive Bayes (NB) and Support Vector Machine (SVM). For these methods, the input sample structure (62×19) was first flattened into a column vector (1178×1). Then PCA was adopted to reduce the dimension of the features (50×1). In KNN, the number of nearest neighbors K was searched in the space [1:20] with a step of one for the optimal value. In NB, we assumed that each feature obeyed a Gaussian distribution, and the default prior probability was the appearance frequency of each class. In SVM, we chose a linear kernel, and searched for the optimal value of C in the parameter space 2^[−10:10] with a step of one. The average classification results of all subjects are shown in Table 1.

Similar to the results from the confusion matrices, the SRU model obtained obviously better performance on the higher frequency bands. Besides, SRU achieved relatively stable results on features extracted from raw signals and lower frequency bands. When using the DE feature, the SRU model led to the best classification performance on the Gamma band with an average accuracy of 80.02%. Compared with the other classifiers, the SRU model attained the best recognition performance on all five frequency bands. For the MAV feature, it obtained the best classification accuracy of 79.22%, which was 11.34%, 10.98% and 10.04% higher than KNN, NB and SVM respectively. For the PSD feature, its best classification accuracy was 78.29%, which was 10.71%, 10.03% and 9.16% higher than KNN, NB and SVM respectively. For the FD feature, the best classification accuracy was 77.22%, which was 10.53%, 12.14% and 8.01% higher than KNN, NB and SVM respectively. For the DE feature, the best classification accuracy was 80.02%, which was 8.63%, 28.45% and 6.33% higher than KNN, NB and SVM respectively. These results demonstrated the superiority of the EEG-based SRU models for emotion recognition.

4.3. Ensemble results

The final results of the three ensemble strategies are presented in Table 2. From it, we could learn that the voting and weighted methods of Strategy2, and the weighted method of Strategy3, outperformed all individual base models. Strategy1 did not lead to a good classification performance due to the interference of a few lower frequency bands; Strategy2 achieved more desirable results than Strategy1. Specifically, the weighted method of Strategy2 led to the best classification accuracy of 83.13% among all three strategies, which was 3.11% higher than the best individual SRU model (80.02%); Strategy3 using the weighted method also achieved good classification effects, but the training time was much longer than
the other two strategies. Therefore, the weighted method of Strategy2 was selected as the final ensemble strategy. Different subjects may have different optimal features and frequency bands, so the ensemble method could lead to a better average value and a smaller standard deviation for all subjects by combining several base models.

4.4. Stability of the emotion recognition model across days

In the SEED dataset, each participant performed the experiments three times on different experimental days. By using SEED, we could evaluate the stability of the emotion recognition model across days. We split the data in three different ways:
Table 1
Classification accuracy (%) comparison of different classifiers. The best performance for each feature and each frequency band is in bold.
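The baseline pipeline compared in Table 1 (flatten each 62×19 sample to a 1178-dimensional vector, reduce to 50 dimensions with PCA, then classify, with K for KNN searched over [1:20]) can be sketched on synthetic data as follows. The random samples and 3-class labels are purely illustrative, and only the KNN baseline of the three is shown.

```python
import numpy as np

rng = np.random.default_rng(3)
n, ch, seq = 120, 62, 19
X = rng.standard_normal((n, ch, seq)).reshape(n, -1)   # flatten 62*19 -> 1178
y = rng.integers(0, 3, n)                              # 3 emotion classes (random)

# PCA to 50 dimensions via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:50].T                                     # (n, 50) reduced features

# Plain KNN, with the best K searched over [1, 20] as described above
def knn_predict(train, labels, test, k):
    d2 = ((test[:, None, :] - train[None, :, :]) ** 2).sum(-1)
    nearest = labels[np.argsort(d2, axis=1)[:, :k]]
    return np.array([np.bincount(row, minlength=3).argmax() for row in nearest])

tr, te = Z[:90], Z[90:]
accs = {k: (knn_predict(tr, y[:90], te, k) == y[90:]).mean() for k in range(1, 21)}
best_k = max(accs, key=accs.get)
print(best_k, accs[best_k])
```

On random labels the accuracy hovers near chance; the point is the shape of the pipeline, not the numbers.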
Type I: The data from the first experimental day of one participant was used as training data and the data from the second experimental day was used as testing data.

Type II: The data from the first experimental day of one participant was used as training data and the data from the third experimental day was used as testing data.

Type III: The data from the first and second experimental days of one participant was used as training data and the data from the third experimental day was used as testing data.

The average results from the 15 participants are presented in Table 3, where differential entropy (DE) in the Gamma band was used as the feature for comparison. Initially, we intuitively thought that the
Table 2
Classification comparison of different ensemble strategies (columns: Ensemble Strategy, Method, Accuracy (%), Precision (%), Recall (%), F-score (%), Time (s)).

Table 3
The average accuracies (%) of our emotion model within and across days.
identification accuracy in the within-one-day manner would be higher than that in the cross-day manner. In fact, we found that the recognition accuracy was relatively stable within and across days, which reflected that the relation between the variation of emotional states and the EEG signal was relatively stable for one person over time. There is a reason why the recognition accuracy in the within-one-day manner is not very high. The movie clips used to stimulate emotion on the same experimental day originated from different films, so using the first nine movie sessions for modeling and the last six for testing is cross-film modeling in some sense. Considering that the EEG signal is task-related and sensitive to different emotion stimuli, the cross-task recognition accuracy of 78.86% within a single day is acceptable.

5. Discussions and conclusions

5.1. Comparisons with similar work

This research presented an analysis of multi-domain and multi-frequency features of EEGs related to emotion changes. A Simple Recurrent Units network and ensemble learning methods were utilized to construct an automatic recognition system to distinguish three different emotional states. Several fundamental conclusions can be drawn: 1) Higher frequency bands, like Gamma and Beta, are more favorable for emotion recognition than the lower frequency bands. 2) Stimulated by the scene and audio materials in the experiments, positive emotion is relatively more recognizable than the other two emotional states using EEG measurements. 3) As an improved kind of RNN, the SRU network is good at grasping the temporal changing property under different emotions by using sequential data. 4) The efficient SRU network and ensemble learning method show high identification performance and acceptable processing efficiency in automatic recognition of emotion. 5) The performance of our emotion system shows that the neural patterns are relatively stable within and across days.

A brief summary of emotion recognition based on EEG is shown in Table 4, where the numbers in parentheses denote the number of levels for each emotional dimension. These researches present the feasibility and availability of establishing emotion models using EEG. In these studies, the stimulus materials used in the experiments comprise images, music and videos. Early studies often used pictures as emotion elicitation materials. For example, Heraz and Frasson [41], Brown et al. [42] and Jenke et al. [43] used IAPS (International Affective Picture System) to implement emotion experiments. Later, researchers applied music to stimulate corresponding emotional states. Lin et al. [44] and Hadjidimitriou and Hadjileontiadis [45] performed emotion recognition based on music stimuli. Compared with pictures and music, videos are composed of scenes and audio and can supply more realistic scenarios to subjects. Thus, videos have become popular in emotion recognition research. Some famous datasets for emotion recognition, such as DEAP [14–16,25,46] and SEED [14,40,47], have been established, where the emotions are all elicited using videos.

Traditional classification algorithms along with feature selection or dimensionality reduction methods are extensively used to classify different emotional states, and have achieved fine classification effects. Zheng et al. [14] conducted their research on the DEAP and SEED datasets based on traditional machine learning algorithms. They applied Differential Entropy (DE), Linear Dynamic System (LDS), Minimal Redundancy Maximal Relevance (MRMR), and Discriminative Graph regularized Extreme Learning Machine (GELM) to feature extraction, smoothing, selection and pattern recognition, respectively. Besides, they investigated the stability of neural patterns over time. In our proposed framework, we adopted a deep learning method, which did not need the step of feature selection or dimensionality reduction and thus could simplify the process of emotion recognition. We also investigated the stability of neural patterns over different sessions and different experimental days.

In recent years, deep learning algorithms have gradually been used to solve emotion recognition problems. Zheng and Lu [40] introduced a deep learning algorithm named Deep Belief Network (DBN) for emotion recognition on the SEED dataset. DBN models were trained with differential entropy (DE) features, then critical frequency bands and channels were selected according to the weight distributions of the trained DBN models. At last, they designed four different profiles, and obtained the best effect with the profile of 12 channels. Li et al. [47] first organized DE features from different channels to form two-dimensional maps. Then they trained a Hierarchical Convolutional Neural Network (HCNN) to identify different emotions. HCNN yielded the highest accuracy of 88.2% on the Gamma band. They also confirmed that the high-frequency Gamma and Beta bands were the optimum bands for emotion processing. In our experimental settings, we also investigated critical frequency bands, but we did not select critical channels. The DT-CWT technique was adopted to decompose EEG into five sub-bands, so as to study the discriminative capacity of different frequency bands. In addition, we have considered the temporal characteristics of EEG, and transformed the features into a sequence format. Compared with DBN and HCNN, our SRU model is another deep learning method with different specialties.

Considering the temporal, spatial and frequency characteristics of EEG signals, Alhagry et al. [46] and the authors of [25,26] took advantage of LSTM to distinguish different emotion classes on the DEAP dataset. Alhagry et al. [46] used LSTM to learn features
Table 4
Comparison of the existing emotion recognition systems.

Authors | Stimuli | Channels | Subjects | Method Description | Emotional States | Performance (Accuracy)
Heraz and Frasson [41] | IAPS | 2 | 17 | Amplitudes of four frequency bands, evaluated KNN, Bagging | Valence (12), arousal (12) and dominance (12) | Valence: 74%, arousal: 74% and dominance: 75%
Brown et al. [42] | IAPS | 8 | 11 | Spectral power features, KNN | Positive, negative and neutral | 85%
Jenke et al. [43] | IAPS | 64 | 16 | Higher Order Crossings, Higher Order Spectra and Hilbert-Huang Spectrum features, QDA | Happy, curious, angry, sad and quiet | 36.8%
Lin et al. [44] | Music | 24 | 26 | Power spectral density and asymmetry features of five frequency bands, evaluated SVM | Joy, anger, sadness, and pleasure | 82.29%
Hadjidimitriou and Hadjileontiadis [45] | Music | 14 | 9 | Time-frequency analysis, KNN, QDA and SVM | Like and dislike | 86.52%
Koelstra et al. [15] | Video | 32 | 32 | Spectral power features of four frequency bands, Gaussian naive Bayes classifier | Valence (2), arousal (2) and liking (2) | Valence: 57.6%, arousal: 62.0% and liking: 55.4%
Gupta et al. [16] | Video | 32 | 32 | Graph-theoretic features, SVM/RVM | Valence (2), arousal (2), dominant (2) and liking (2) | Valence: 67%, arousal: 69%, dominant: 65% and liking: 65%
Zheng et al. [40] | Video | 62 | 15 | Differential entropy features of five frequency bands, DBN | Negative, neutral and positive | 86.65%
Zheng et al. [14] | Video | 32 | 32 | Differential entropy features of four/five frequency bands, LDS, MRMR, GELM | Quadrants of VA space (4) | 69.67%
Zheng et al. [14] | Video | 62 | 15 | Differential entropy features of four/five frequency bands, LDS, MRMR, GELM | Negative, neutral and positive | 91.07%
Alhagry et al. [46] | Video | 32 | 32 | Raw EEG signal, LSTM | Valence (2), arousal (2) and liking (2) | Valence: 85.45%, arousal: 85.65% and liking: 87.99%
Li et al. [25] | Video | 32 | 32 | Rational asymmetry features of four frequency bands, LSTM | Valence (2) | 76.67%
Li et al. [47] | Video | 62 | 15 | Differential entropy features organized as two-dimensional maps, HCNN | Negative, neutral and positive | 88.2%
from every 5 s-length segment of EEG data; then a dense layer was applied to classify these features. This method led to average accuracies of 85.65%, 85.45% and 87.99% for the low/high arousal, valence and liking classes, respectively. The authors of [25,26] extracted Rational Asymmetry (RASM) features from every 63 s-length signal to capture the frequency-space domain characteristics of the EEG signals. Then an LSTM was constructed as the classifier and achieved a mean accuracy of 76.67% on the low/high valence classes. In our research, SRU models were established instead of LSTM networks, which could realize the computation in parallel and achieved obviously improved computational efficiency.

5.2. Merits of the proposed system

Our research constructed an automatic emotion recognition system based on the Simple Recurrent Units network and ensemble learning. The main contributions of this paper can be summarized as follows:

• Comprehensively extract physiological features of EEG from multiple domains and multiple frequency bands.

Considering the brain as a highly complex system, the current study evaluated the characteristics of brain signals from various aspects: time analysis via mean absolute value (MAV), frequency analysis via power spectral density (PSD), and nonlinear analysis via fractal dimension (FD) and differential entropy (DE). Besides, the raw EEG signals were decomposed and reconstructed using the dual-tree complex wavelet transform (DT-CWT), which can be viewed as an improved kind of wavelet transform with better expression capacity for EEG owing to its approximate shift invariance and excellent reconstruction properties. Our results demonstrated that the features extracted from the sub-bands, especially the higher frequency bands, provided more accurate information than those from the original signals.

• Regard emotion as a changing process with the time dependency property.

Most of the shallow learning and deep learning methods treated emotional states as independent points and ignored the time dependency property of the emotion process. In order to grasp the temporal information of EEG, we adopted the Simple Recurrent Units (SRU) network, which is not only capable of processing sequence data but also able to solve the problem of long-term dependencies that occurs in the normal RNN.

• Obtain competitive classification performance with low computational cost.

Superior to the common RNN, SRU realizes computing in parallel and thus can accelerate the calculation to a large degree. Furthermore, ensemble learning methods using three different strategies were applied to integrate multiple SRU models so as to obtain better results than using any base model alone.

• Explore the stability of our emotion recognition system over time.

The dataset used in this paper consisted of 15 participants, and each one performed the experiments three times on different experimental days. The performance of the proposed subject-specific emotion models was evaluated in both within-one-day and cross-day manners. The neural patterns of EEG signals over time for emotion recognition were fully explored.

5.3. Limitations and future work

The limitations of the current work and corresponding directions of future research could be summarized as follows:
[30] Y. Bengio, P.Y. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Trans. Neural Netw. 5 (2) (1994) 157–166.
[31] K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation, arXiv preprint arXiv:1406.1078, 2014.
[32] T. Lei, Y. Zhang, S.I. Wang, H. Dai, Y. Artzi, Simple Recurrent Units for Highly Parallelizable Recurrence, arXiv preprint arXiv:1709.02755, 2017.
[33] R.G. Hefron, B.J. Borghetti, J.C. Christensen, C.M. Kabban, Deep long short-term memory structures model temporal dependencies improving cognitive workload estimation, Pattern Recognit. Lett. 94 (2017) 96–104.
[34] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[35] A. Ozcift, A. Gulten, Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms, Comput. Methods Programs Biomed. 104 (3) (2011) 443–451.
[36] C. Padilha, D.A. Barone, A.D. Neto, A multi-level approach using genetic algorithms in an ensemble of least squares support vector machines, Knowl. Based Syst. 106 (2016) 85–95.
[37] J.H. Zhang, S.N. Li, R.B. Wang, Pattern recognition of momentary mental workload based on multi-channel electrophysiological data and ensemble convolutional neural networks, Front. Neurosci. 11 (2017) 1–16.
[38] D.P. Kingma, J.L. Ba, Adam: a Method for Stochastic Optimization, arXiv preprint arXiv:1412.6980, 2014.
[39] L.L. Chen, Y. Zhao, J. Zhang, J.Z. Zou, Automatic detection of alertness/drowsiness from physiological signals using wavelet-based nonlinear features and machine learning, Expert Syst. Appl. 42 (21) (2015) 7344–7355.
[40] W.L. Zheng, B.L. Lu, Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks, IEEE Trans. Auton. Ment. Dev. 7 (3) (2015) 162–175.
[41] A. Heraz, C. Frasson, Predicting the three major dimensions of the learner's emotions from brainwaves, World Acad. Sci. Eng. Technol. Int. J. Comput. Electr. Autom. Control Inf. Eng. 1 (7) (2007) 1988–1994.
[42] L. Brown, B. Grundlehner, J. Penders, Towards wireless emotional valence detection from EEG, in: International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2011, pp. 2188–2191.
[43] R. Jenke, A. Peer, M. Buss, Feature extraction and selection for emotion recognition from EEG, IEEE Trans. Affect. Comput. 5 (3) (2014) 327–339.
[44] Y.P. Lin, C.H. Wang, T. Jung, T.L. Wu, S. Jeng, J. Duann, J. Chen, EEG-based emotion recognition in music listening, IEEE Trans. Biomed. Eng. 57 (7) (2010) 1798–1806.
[45] S. Hadjidimitriou, L.J. Hadjileontiadis, Toward an EEG-based recognition of music liking using time-frequency analysis, IEEE Trans. Biomed. Eng. 59 (12) (2012) 3498–3510.
[46] S. Alhagry, A.A. Fahmy, R.A. ElKhoribi, Emotion recognition based on EEG using LSTM recurrent neural network, Int. J. Adv. Comput. Sci. Appl. (IJACSA) 8 (10) (2017) 355–358.
[47] J.P. Li, Z.X. Zhang, H.G. He, Hierarchical convolutional neural networks for EEG-based emotion recognition, Cognit. Comput. 10 (2) (2018) 368–380.