
Hindawi
Journal of Healthcare Engineering
Volume 2023, Article ID 9223599, 19 pages
https://doi.org/10.1155/2023/9223599

Research Article
Extracting a Novel Emotional EEG Topographic Map Based on
a Stacked Autoencoder Network

Elnaz Vafaei,1 Fereidoun Nowshiravan Rahatabad,1 Seyed Kamaledin Setarehdan,2 and Parviz Azadfallah3

1 Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
3 Tarbiat Modares University, Tehran, Iran

Correspondence should be addressed to Seyed Kamaledin Setarehdan; [email protected]

Received 9 April 2022; Revised 2 November 2022; Accepted 23 December 2022; Published 19 January 2023

Academic Editor: Haihong Zhang

Copyright © 2023 Elnaz Vafaei et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Emotion recognition based on brain signals has become increasingly attractive for evaluating humans' internal emotional states. Conventional emotion recognition studies focus on developing machine learning models and classifiers. However, most of these methods do not provide information on the involvement of different areas of the brain in emotions. Brain mapping is considered one of the most distinctive methods for showing the involvement of different brain areas in performing an activity. Most mapping techniques rely on the projection and visualization of only one of the electroencephalogram (EEG) subband features onto brain regions. The present study aims to develop a new EEG-based brain mapping that combines several features to provide more complete and useful information on a single map instead of several conventional maps. In this study, the optimal combination of EEG features for each channel was extracted using a stacked autoencoder (SAE) network and visualized as a topographic map. Based on the research hypothesis, autoencoders can extract optimal features for quantitative EEG (QEEG) brain mapping. The DEAP EEG database was employed to extract the topographic maps. The accuracy of image classifiers based on the convolutional neural network (CNN) was used as a criterion for evaluating how distinguishable the maps obtained by the stacked autoencoder topographic map (SAETM) method are for different emotions. The average classification accuracy was 0.8173 and 0.8037 in the valence and arousal dimensions, respectively. The extracted maps were also ranked by a team of experts in comparison with the common maps. The results of the quantitative and qualitative evaluations showed that the map obtained by the SAETM carries more information than the conventional maps.

1. Introduction

Emotion is one of the essential cognitive aspects of human beings. According to cognitive studies, the evaluation of human emotion in contact with individuals and social environments plays an important role in daily human behavior [1]. The emotions of a typical individual can be recognized by processing bodily reactions, including facial expressions, voice, body gestures, and electrophysiological reactions. Electrophysiological signals are preferable, especially in the case of atypical individuals, whose other bodily reactions rarely represent internal emotional states. Therefore, the study of emotions can have a great impact on the treatment of diseases such as depression, autism, epilepsy, and similar conditions [2]. In addition, emotion recognition is an interesting topic in many research areas. The brain-computer interface (BCI) system introduces methods such as recording physiological signals from the human brain based on the central nervous system [3]. Physiological signals record the electrical activity of neurons in different parts of the cerebral cortex. The electroencephalogram (EEG), which has long been used to detect brain abnormalities, is a noninvasive method for recording brain signals [4] and contains rich information about internal emotional states with the most comprehensive features. The EEG signal can be processed by state-of-the-art machine learning
methods, machine learning classifiers, and classification approaches.

Machine learning is one of the leading methods for developing BCIs. It has many subsets, such as recurrent networks, deep learning networks, and Boltzmann networks, each with its own strengths and weaknesses depending on the application [1, 5, 6]. Deep learning is a specialized example of this field that has received much attention in recent decades. The development of machine learning algorithms is an interesting topic in cognitive science. Deep learning networks are a trending machine learning subject capable of detecting underlying states hidden in EEG signals. Deep learning, especially in the case of large datasets such as EEG, shows acceptable and citable results in both supervised and unsupervised EEG classification [6].

The autoencoder (AE) is a special type of artificial neural network and one of the deep learning algorithms that automatically learns a compressed representation of raw input data [7]. Autoencoders (AEs) can extract low-level features from the input layer and high-level features in deep layers, which is well accomplished with the structure of stacked autoencoders (SAEs) [8]. AEs extract complex nonlinear patterns from EEG data, which makes the process of diagnosing and treating diseases more accurate. Zhao and He [9] developed deep learning networks to analyse early-stage Alzheimer's disease from the EEG signal and reported 92% accuracy in improving the diagnosis of this disease. Jose et al. [8] employed SAEs to study epilepsy and detect epileptic seizures from EEG signals, extracting features such as relative energy, spectral features, and some nonlinear features from each channel. These data were fed into an autoencoder network, which achieved 91.5% accuracy in the adaptive diagnosis of seizures. Furthermore, the study of AE networks for emotion recognition from EEG data has received much attention in recent decades. Yin et al. [6] studied emotion recognition through deep networks based on a multiple-fusion-layer ensemble classifier of stacked autoencoders (SAE); using the AE network increased the average classification accuracy by up to 5.26% compared to other emotion recognition networks [6]. On the other hand, the combination of neural networks is one of the most recently published approaches for emotion classification. Liu et al. [10] combined a convolutional neural network (CNN), an SAE deep neural network, and a deep neural network (DNN) to classify emotional states and reported acceptable results compared to a single neural network method.

The EEG signal has acceptable temporal resolution but does not provide useful information in terms of spatial resolution [11, 12]. Nevertheless, the spatial distribution of EEG contains rich information about emotional states. One of the common methods for visualizing the EEG signal is quantitative EEG (QEEG) analysis, well known as topographic brain mapping, which provides a cost-effective and practical method for the spatial evaluation of neural activities. This method represents structural and effective communication in nerve cells, nerve complexes, and brain structure [13]. Brain topography by the QEEG technique is obtained by extracting features from the EEG signal. Today, with the advancement of topographic maps, the analysis of EEG provides a comprehensive exploration of temporal and spatial characteristics simultaneously [12, 13].

In the conventional topographic brain mapping technique, only one feature is considered to draw a map. For instance, the classical Fourier transform is calculated to quantify the power spectrum in a frequency subband of the EEG signal [14], and entropy is another feature derived from the EEG signal for brain mapping. Keshmiri et al. [15] examined entropy to differentiate between the brain's negative, neutral, and positive responses to emotional stimuli. Moreover, power spectral density (PSD) is another feature that provides a separate topographic brain map [16]. As a consequence, investigating all the features underlying the EEG signal would create a large number of topographic brain maps.

This study aimed to evaluate the hypothesis that the compression of temporal, frequency, linear, and nonlinear EEG features can provide original and useful information about brain function in the form of topographic brain maps. Thus, we present a novel method that reduces the number of topographic brain maps to a single map by preserving spatial features and extracting the optimal combination of all the features present in EEG signals. The resulting topographic brain map is therefore a specific combination of the extracted features that preserves the spatial characteristics of the EEG signals [11]. Hence, a method is required to extract the optimal combination of EEG features, and an AE-based optimal feature selection network is proposed to extract the optimal topographic brain map (stacked autoencoder topographic map, SAETM), which provides more complete information about brain functions. In addition, evaluating one map instead of several maps speeds up the diagnostic process. To test the study hypothesis, the SAETM and conventional topographic maps were compared quantitatively and qualitatively. There are many common criteria for measuring the similarity of two images, including the absolute error, mean square error, peak signal-to-noise ratio, histogram, Euclidean distance similarity, or correlation coefficient between two independent images [17], as well as classifier-based methods. Accordingly, Topic and Russo [3] revealed that CNN networks have the highest performance in calculating the similarity between maps of different classes. In addition, similar studies on the DEAP database using topographic brain maps with deep learning networks have enhanced the emotion recognition process based on the capsule neural network (CapsNet) [18]. Finally, the SAETM and conventional topographic brain maps were compared by a team of specialists based on a scale questionnaire for further evaluation.

2. Materials and Methods
Figure 1: Architecture of the stacked autoencoder topographic map (SAETM) algorithm (part 1: per-channel EEG feature extraction from the DEAP dataset; part 2: 32 SAE networks for abstracted features; part 3: MLP networks and final emotion classification; part 4: topographic brain mapping).

The study consists of four main parts: EEG signal processing, the stacked autoencoder network, emotion classification and algorithm parameters, and the extraction of a new topographic brain map. The first part includes EEG signal preprocessing and the extraction of the features conventionally used in emotion recognition. In the second part, the extracted features are abstracted by the autoencoders. The best structure of features is obtained by the emotion classifier in part three. In the last part, the ultimate features are used to draw the topographic brain map. The architecture of the SAETM is illustrated in Figure 1, including primary feature extraction (part 1), SAE networks for abstracted feature extraction (part 2), multilayer perceptron (MLP) networks to extract the final features based on emotion classification (part 3), and topographic brain mapping (part 4). As shown in Figure 1, the EEG signal features are extracted for each channel and fed to an SAE network; thus, there are 32 SAE networks. At the output of each SAE, an MLP network is used to obtain a final feature, so one feature is obtained for each channel. Moreover, an MLP classifier is applied to the output of the previous MLP layer. The output of this classifier is used for emotion classification in the arousal and valence dimensions, and the parameters of the SAETM algorithm are adjusted by this classifier. A colour is assigned proportionally to each weight of the first MLP layer to draw the topographic brain map.

2.1. Database. In this study, the DEAP physiological dataset was used for emotion analysis, with simultaneous recording of EEG signals and eight peripheral physiological signals, including galvanic skin response, respiratory rate, skin temperature, pulse rate, blood pressure, neck and smile muscle activity, and the EOG signal. The EEG signal was recorded at 32 locations based on the international 10–20 system. The study was conducted on 32 healthy participants aged 19–37 (mean age 26.9), half of whom were women. The experiment was designed in a controlled environment to stimulate emotions. Forty music videos reflecting different emotional states were played while the signals were recorded, with a 3-second interval between music videos to reset the participant's emotional state. A baseline signal was recorded for 5 seconds, after which the videos were displayed to the participants in random order. The videos used as emotional stimuli were categorized with emotional labels using the self-assessment manikin (SAM) questionnaire: each participant gave each video a score from one to nine after watching it in full. Scores 1 to 3 corresponded to the negative state of the valence dimension and the inactive state of the arousal dimension, 4 to 6 to the neutral state of the valence dimension and the normal state of the arousal dimension, and 7 to 9 to the positive state of the valence dimension and the active state of the arousal dimension. These scores were divided into happy, pleased, relaxed, excited, neutral, calm, distressed, miserable, and depressed classes, which relate to four dimensions of emotion: valence (positive/negative), arousal (passive/active), liking (like/dislike), and dominance [19].
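To make this labelling concrete, the short sketch below maps a pair of self-assessment ratings to the four arousal-valence quadrants used later in this study. It is a minimal illustration: the function and quadrant names are ours, not part of the DEAP distribution.

```python
def quadrant_label(valence, arousal, low=3, high=7):
    """Map DEAP ratings (1-9) to an arousal-valence quadrant.

    Ratings of 1-3 are treated as the low/negative band and 7-9 as
    the high/positive band, following the scoring scheme above;
    trials in the neutral band (4-6) return None.
    """
    if valence >= high and arousal >= high:
        return "high arousal-high valence"
    if valence >= high and arousal <= low:
        return "low arousal-high valence"
    if valence <= low and arousal >= high:
        return "high arousal-low valence"
    if valence <= low and arousal <= low:
        return "low arousal-low valence"
    return None  # neutral/normal band, not used in this study
```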
Table 1: Features extracted in the SAETM algorithm. All features are computed per channel for the 32 channels Fp1, AF3, F3, F7, FC5, FC1, C3, T7, CP5, CP1, P3, P7, PO3, O1, Oz, Pz, Fp2, AF4, Fz, F4, F8, FC6, FC2, Cz, C4, T8, CP6, CP2, P4, P8, PO4, and O2, each forming a time series x = [x_1, x_2, ..., x_N].

(1) EEG power of the subbands theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (30–45 Hz), as average PSD (feature indices 1–128; 4 features × 32 channels):
P = (1/N) Σ_{i=1}^{N} |x_i|².

(2) Mean: μ = (1/N) Σ_{i=1}^{N} x_i.

(3) Standard deviation: σ² = (1/(N − 1)) Σ_{i=1}^{N} (x_i − μ)².

(4) Zero-crossing rate: ZCR = (1/2N) Σ_n |sgn[x(n)] − sgn[x(n − 1)]|.

(5) Fractal dimension: N = ε^(−D), where N is the number of measurement units, ε the scaling factor, and D the fractal dimension.

(6) Approximate entropy:
1. Form the time series x = [x_1, x_2, ..., x_N].
2. Form a sequence of vectors U_j = [x_i, x_{i+1}, ..., x_{i+m−1}] for fixed m and real number r.
3. For each i, 1 ≤ i ≤ N − m + 1, compute C_i^m(r) = (number of x_j such that d[x_i, x_j] ≤ r)/(N − m + 1).
4. φ^m(r) = (N − m + 1)^(−1) Σ_{i=1}^{N−m+1} log(C_i^m(r)).
5. ApEn = φ^m(r) − φ^{m+1}(r).

(7) Correlation dimension: for any set of N points x_i = [x_1(i), x_2(i), ..., x_N(i)] in an m-dimensional space, C(ε) = lim_{N→∞} g/N², where g is the total number of pairs of points whose mutual distance is less than ε; then C(ε) ∼ ε^CD.

Features (2)–(7) occupy feature indices 129–320 (6 features × 32 channels).

2.2. Preprocessing. In the preprocessing part, unwanted noise and artifacts are removed from the signal. This study investigated the electroencephalographic signals of the DEAP dataset. The 1-minute (trial) EEG signals of each video were recorded with a sampling frequency of 256 Hz and converted to 128 Hz by downsampling. All EEG trials were then band-pass filtered to 0.05–47 Hz. Recorded EEG is affected by several sources of noise and artefacts. The independent component analysis (ICA) algorithm extracts statistically independent components from a mixture of sources; in this study, ICA was used to remove unwanted signals, including EMG and EOG components. On average, 1–3 artifact-related independent components (ICs) were removed per participant.
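A minimal sketch of this preprocessing chain with the MNE-Python library is shown below. It assumes the recording has already been loaded as an mne.io.Raw object, and the excluded component indices are purely illustrative, since in practice the artifact-related ICs were chosen per participant.

```python
import mne
from mne.preprocessing import ICA

# `raw` is assumed to be one participant's 32-channel recording,
# loaded beforehand as an mne.io.Raw object sampled at 256 Hz.
raw.resample(128)                      # downsample 256 Hz -> 128 Hz
raw.filter(l_freq=0.05, h_freq=47.0)   # band-pass 0.05-47 Hz

ica = ICA(n_components=20, random_state=0)
ica.fit(raw)
ica.exclude = [0, 2]                   # illustrative EOG/EMG component indices
raw_clean = ica.apply(raw.copy())      # typically 1-3 ICs removed per participant
```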
Figure 2: Arousal- and valence-based emotion model. The four areas of the model are represented by the extracted features.

2.3. Primary Feature Extraction. Feature selection is considered one of the most important parts, since the selected features must describe the signal. EEG signal features are divided into three main classes: time, frequency, and time-frequency features [11]. In this study, power and statistical features were selected as linear features, and entropy, fractal dimension, and correlation dimension as nonlinear features; all of these have been considered in previous emotion recognition studies. The calculation of power is a common feature for all EEG subbands [20, 21]. The power spectral density of five subbands, theta (4–8 Hz), low alpha (8–10 Hz), upper alpha (10–12 Hz), beta (12–30 Hz), and gamma (30 Hz and higher), is calculated by Welch's method [22]. The mean, standard deviation, and zero-crossing rate are examined as statistical features [6], and signal complexity is measured by entropy [1]. The fractal dimension is used to measure the complexity and irregularity of the signal [23]. The correlation dimension captures the relationship of the signal with itself, extracting its repetitive and periodic patterns [24]. These features were extracted from the filtered signal and normalized to the baseline signal in the range of zero to one. Table 1 lists the features extracted in this study based on previous studies [23].
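The linear features of Table 1 can be computed in a few lines. The sketch below assumes `x` is one windowed channel as a 1-D NumPy array sampled at 128 Hz; it uses Welch's method for the subband powers, while the nonlinear measures (fractal dimension, approximate entropy, correlation dimension) are left to dedicated implementations.

```python
import numpy as np
from scipy.signal import welch

FS = 128
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 30), "gamma": (30, 45)}

def band_powers(x, fs=FS):
    """Average power spectral density per subband (Welch's method)."""
    f, psd = welch(x, fs=fs, nperseg=min(len(x), fs))
    return {name: psd[(f >= lo) & (f < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

def statistical_features(x):
    """Mean, standard deviation, and zero-crossing rate of the window."""
    zcr = np.abs(np.diff(np.sign(x))).sum() / (2 * len(x))
    return {"mean": x.mean(), "std": x.std(ddof=1), "zcr": zcr}
```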
All data were labelled according to the arousal-valence domain, and the labels were used for the supervised training of the SAETM algorithm. The trials were 1-minute intervals in which music videos with different emotional states were shown, and the DEAP dataset assigns each trial a score from one to nine. This study focused on the high arousal-high valence, low arousal-high valence, high arousal-low valence, and low arousal-low valence conditions. The reason for this choice is that the differences between the positive and negative levels of the valence scale and between the high and low levels of the arousal scale are very significant; the two scales offer two complementary and different aspects for examining positive and negative emotions [22]. A 2-second window with 50% overlap was used to extract the features. A total of 8 music videos fell in high arousal-high valence, so 60 × 8 × 10 features (60 windows × 8 music videos × 10 features) were extracted for this first area. Low arousal-low valence included 12 music videos, yielding 60 windows × 12 music videos × 10 features. The two areas of low arousal-high valence and high arousal-low valence each comprised ten music videos, so 60 × 10 × 10 features (60 windows × 10 music videos × 10 features) were extracted for each area [19] (Figure 2).
3. Stacked Autoencoder Topographic Map (SAETM)

The autoencoder is a deep learning network used to obtain a better description of the features [24]. Autoencoders have a symmetrical structure, and the inputs and outputs are similar [7]. Each autoencoder consists of three layers (an input layer, one hidden layer, and one output layer), and the hidden layer contains two parts, an encoder and a decoder. The stacked autoencoder comprises several autoencoders with a SoftMax layer. The input of the first layer of the SAE network is the set of features extracted from the EEG signal (Table 1). These features are weighted with the weights and biases calculated during the training of the first AE network. The output of the encoder at this stage is the input of the next AE network. This process continues to obtain the final abstracted features, and finally, the output of the last AE network encoder is used to classify emotions [25].

In the first step of SAE training, the network uses unlabelled data to extract abstracted EEG features in an unsupervised procedure. Then, the encoder part is completed with a classifier and trained in a supervised procedure to fine-tune the SAE parameters. This helps to initialize the weights one layer at a time by minimizing the reconstruction loss.

Assume that the vector of features extracted from the input and the vector of the hidden layer are x ∈ R^n and h ∈ R^m, respectively, where n is the dimension of the extracted input features, m is the dimension of the abstracted features, and R denotes the real numbers:

h = σ(x·W + b),  (1)

where W ∈ R^(m×n) is a weight matrix, b ∈ R^m is a bias vector, and σ is the activation function (the sigmoid function) located in the output layer:

σ(z) = 1/(1 + e^(−z)).  (2)

x′ ∈ R^n is the next layer, which has the same dimension as the input vector. The output reconstructs the input vector by updating the hidden layer weights:

x′ = σ(h·Wᵀ + c) = σ[σ(x·W + b)·Wᵀ + c].  (3)

The autoencoder parameters W, Wᵀ, b, and c are obtained by the backpropagation algorithm with the squared-error cost function of equation (4), where l is the number of training samples:

E = Σ_{k=0}^{l} (x − x′)².  (4)

The next autoencoder layer takes h as input, and this operation is repeated l times to produce a stacked autoencoder. The best abstracted features are produced in the hidden layer of each autoencoder, and h^(l) is the best representation of the abstracted features (equation (5)):

h^(l) = σ(W^(l) ⋯ σ(W^(2) σ(W^(1) x + c^(1)) + c^(2)) ⋯ + c^(l)).  (5)

This stage, called pretraining, sets the SAE parameters. An MLP network with one output neuron is added to the encoder side of each SAE to extract the single abstracted feature used to plot the brain map in the topographic map stage. U_i is the output function, in which μ is the matrix of weights, z is the bias vector of the MLP layer, and i indexes the SAEs.
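The pretraining procedure of equations (1)–(5) is sketched below in Keras. The layer sizes follow the F3 channel (10 → 7 → 4, Table 2); the optimizer, epoch count, and the use of untied decoder weights are our assumptions rather than the authors' settings.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def pretrain_sae(x, sizes=(10, 7, 4), epochs=50):
    """Greedy layer-wise pretraining: each AE is trained to
    reconstruct the codes produced by the previous encoder."""
    codes, encoders = x, []
    for n_in, n_hid in zip(sizes[:-1], sizes[1:]):
        inp = keras.Input(shape=(n_in,))
        code = layers.Dense(n_hid, activation="sigmoid")(inp)   # h = sigma(xW + b)
        recon = layers.Dense(n_in, activation="sigmoid")(code)  # decoder (untied weights)
        ae = keras.Model(inp, recon)
        ae.compile(optimizer="adam", loss="mse")                # squared-error cost
        ae.fit(codes, codes, epochs=epochs, verbose=0)
        encoder = keras.Model(inp, code)
        codes = encoder.predict(codes, verbose=0)
        encoders.append(encoder)
    return encoders

def channel_head(encoders, n_in=10):
    """Stack the pretrained encoders and add the one-neuron MLP head
    that yields the abstracted channel feature U_i (equation (6))."""
    inp = keras.Input(shape=(n_in,))
    h = inp
    for enc in encoders:
        h = enc(h)
    u = layers.Dense(1, activation="sigmoid")(h)
    return keras.Model(inp, u)
```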
U_i = σ(μh^(l) + z), i ∈ {1, 2, ..., 32}.  (6)

The feature sets are defined as F^(n): the features of each channel are grouped into ten parts. The subband powers are F_1, F_2, F_3, and F_4 (four subbands are selected). The linear EEG features, namely the mean, standard deviation, and zero-crossing rate, are F_5, F_6, and F_7, respectively. Finally, F_8, F_9, and F_10 are built from the nonlinear features: fractal dimension, approximate entropy, and correlation dimension. Feature vectors are therefore defined as x(F_j) ∈ F_j, j ∈ {1, 2, ..., 10}. We construct one SAE per channel to describe the hidden feature abstractions of each channel based on equation (7), where S¹_sae(x), ..., S³²_sae(x) denote the higher feature abstractions of the 32 channels:

U_1 = σ(μh^(l) + z) = S¹_sae(x),
U_2 = σ(μh^(l) + z) = S²_sae(x),
...
U_32 = σ(μh^(l) + z) = S³²_sae(x).  (7)

The structure of the SAETM is completed by placing two neurons in the last layer (equation (8)), where y = (1, 0)ᵀ or y = (0, 1)ᵀ indicates the low or high level of an emotion dimension:

y = σ(βU_i + α),  (8)

where y is the output function, β is the matrix of weights, and α is the bias vector of the last layer. The fine-tuning stage is an important stage of SAE networks: it is used to train on large labelled datasets and can improve classifier performance [6, 25]. This stage fine-tunes the parameters of the last layer of the SAE with the backpropagation algorithm in a supervised manner. The parameters obtained in part 4 are used for the topographic brain map.

The number of layers and the number of neurons in each SAE layer are important in SAETM training; the minimum number of hidden layers and the minimum number of neurons in each layer are essential for an optimal classification. In this study, the Pearson and Spearman correlation coefficients were used to find the most optimal structure [6, 10]; these two coefficients measure the similarity between the input and output data. The structural loss function (SLF) is therefore defined as

SLF = ω₁(1 − ρ²₁DxDz) + (1 − ω₁)(1 − ρ²₂DxDz),  (9)

where ω₁ = 0.5, and ρ₁DxDz and ρ₂DxDz are the Pearson correlation coefficient and the Spearman rank correlation coefficient, respectively [6]; Dx is the input matrix and Dz is the output matrix.
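Equation (9) translates directly into code. The sketch below assumes the input matrix Dx and the network's reconstruction Dz are given as NumPy arrays; SciPy provides both correlation coefficients.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def structural_loss(d_x, d_z, w1=0.5):
    """SLF of equation (9): weighted 1 - squared Pearson/Spearman
    correlation between the input matrix Dx and the output matrix Dz."""
    x, z = np.ravel(d_x), np.ravel(d_z)
    rho1, _ = pearsonr(x, z)
    rho2, _ = spearmanr(x, z)
    return w1 * (1 - rho1 ** 2) + (1 - w1) * (1 - rho2 ** 2)
```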


3.1. Classifier Evaluation. According to the literature [26], the choice of classifier type can affect the results. For this purpose, reference classifiers are used in this study and the desired classifier is selected based on the results. To check the accuracy of the network, we consider the criteria described below. Equation (10) is used to evaluate the precision of the classifier over the emotion classes, in which TP is true positives and FP is false positives [27]:

Precision = TP / (TP + FP).  (10)

The recall is calculated by equation (11), where FN is false negatives:

Recall = TP / (TP + FN).  (11)

The overall classifier accuracy is obtained from equation (12), in which TN is true negatives:

overall classification accuracy = (TN + TP) / (TN + FN + TP + FP).  (12)

The F1 score combines the precision and recall criteria and is obtained according to equation (13):

F1 score = 2 × (Precision × Recall) / (Precision + Recall).  (13)
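For reference, equations (10)–(13) correspond to standard scikit-learn metrics; a sketch assuming binary low/high labels for one emotion dimension:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

def classifier_report(y_true, y_pred):
    """Precision, recall, overall accuracy, and F1 (equations (10)-(13))."""
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```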
Figure 3: CNN model to classify the topographic maps (input image → Conv1 → max pool → Conv2 → max pool → Conv3 → reshape → fully connected → output).

3.2. Evaluating the Topographic Brain Maps. Extracting a topography, or brain map, is one of the practical methods of QEEG. The parameters for building the brain topography are calculated for the different EEG subbands at each electrode according to the international 10–20 standard. The features extracted in the previous section are taken as the colour mapping parameters, and the bilinear interpolation method is used to estimate the values between the electrodes [13, 27]. In this study, the brain topographic map was extracted with the MNE library in Python.
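A minimal sketch of this mapping step with MNE-Python is given below. The montage handling is standard MNE (recent versions accept an Info object as sensor positions), while the per-channel values here are random stand-ins for the 32 abstracted SAETM features.

```python
import numpy as np
import mne

CHANNELS = ["Fp1", "AF3", "F3", "F7", "FC5", "FC1", "C3", "T7",
            "CP5", "CP1", "P3", "P7", "PO3", "O1", "Oz", "Pz",
            "Fp2", "AF4", "Fz", "F4", "F8", "FC6", "FC2", "Cz",
            "C4", "T8", "CP6", "CP2", "P4", "P8", "PO4", "O2"]

info = mne.create_info(CHANNELS, sfreq=128.0, ch_types="eeg")
info.set_montage(mne.channels.make_standard_montage("standard_1020"))

values = np.random.rand(32)          # stand-in for the 32 SAETM features
mne.viz.plot_topomap(values, info)   # interpolated scalp map
```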
3.3. The CNN Used in Image Classification. The convolutional neural network (CNN) is a feed-forward neural network whose input is image-like; CNNs were originally designed for evaluating images [3]. In this study, we use CNN accuracy as the criterion to measure the similarity between two groups of topographic maps. The building blocks of the CNN architecture include convolution layers, pooling layers, and fully connected layers. The convolutional layer is the central part of a CNN: in this layer, multiple filters (or kernels) slide across the input in the convolution operation. This operation can extract features while preserving the spatial information of the data, and the pooling layer can decrease the spatial dimension of the features; in addition, the pooling layer filters noise out of the image. An image is convolved with a filter to learn one feature over the whole image. The fully connected layers connect the inputs of the previous (pooling) layer to the output neurons [3, 28]. Suppose an M × M image is convolved with a k × k kernel. Equation (14) gives the size of the output image without padding, and equation (15) is the convolution operation. Padding is used to preserve the size of the input image; the size of the output image with padding is given by equation (16):

M × M ∗ k × k ⟶ M′ = M − k + 1,  (14)

O = δ(B + Σ_{i=0}^{2} Σ_{j=0}^{2} w_{i,j} h_{a+i,b+j}),  (15)

M′ = (M + 2P − F)/s + 1,  (16)

where O is the output, P is the padding, s is the stride, B is the bias, δ is the sigmoidal activation function, w is a 3 × 3 matrix of shared weights, and h_{x,y} is the input activation at position (x, y) [29].

The CNN model used in this study is presented in Figure 3: two convolution-plus-max-pooling stages, a third convolution layer, a reshape, and a fully connected output layer. In max pooling, the maximum activation over a 2 × 2 input region is pooled. The parameters of the model were set as follows: number of epochs, 10; optimizer, RMSprop; learning rate, 0.001; parameter β, 0.9; activation, sigmoid; stride 1 for the convolution layers; and stride 2 for the pooling layers.
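A sketch of such a network in Keras is shown below, following the layer order of Figure 3 and the hyperparameters listed above; the filter counts and the input size are assumptions, since they are not stated in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(32, 32, 3), n_classes=4):
    """Conv-pool-conv-pool-conv-dense stack as in Figure 3."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, strides=1, activation="sigmoid", padding="same"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(32, 3, strides=1, activation="sigmoid", padding="same"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(64, 3, strides=1, activation="sigmoid", padding="same"),
        layers.Flatten(),                       # the "reshape" stage
        layers.Dense(n_classes, activation="softmax"),
    ])
    opt = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)  # beta = 0.9
    model.compile(optimizer=opt, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```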
4. Results and Discussion

4.1. Results. In this section, the results obtained from the SAETM for extracting topographic brain maps are presented. The data were divided into training and test groups to evaluate the algorithm. All data were normalized for each participant to a mean of zero and a standard deviation of one, to eliminate differences in the scale of the features. The k-fold cross-validation method was used to evaluate the studied samples more reliably, with k = 10, so that each time 0.1 of the data is selected for testing and the model is trained on the remaining 0.9. This operation is repeated ten times so that all the data are seen by the network.
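This evaluation protocol can be sketched as follows with scikit-learn, where `X` holds the windowed feature vectors and `y` the quadrant labels; the `build_model` callable stands in for the SAETM pipeline. Note that this sketch z-scores per fold, whereas the paper normalizes per participant.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

def cross_validate(X, y, build_model, k=10):
    """10-fold cross-validation as described in Section 4.1."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(X):
        scaler = StandardScaler().fit(X[train_idx])   # zero mean, unit std
        model = build_model()
        model.fit(scaler.transform(X[train_idx]), y[train_idx])
        scores.append(model.score(scaler.transform(X[test_idx]), y[test_idx]))
    return np.mean(scores)
```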

4.2. Architecture of the SAETM. The appropriate selection of the SAETM parameters, that is, the number of hidden layers and the number of neurons in each layer, improves network performance. Figure 4 illustrates the SLF of equation (9) for the ten features selected in Table 1. The SLF was used to optimize the number of neurons in each layer, calculated by adding neurons to each layer one at a time. Figure 4(a) shows the trend of feature abstraction in the F3 channel as an example of the channels in the left hemisphere. As shown, the input of the first hidden layer is the ten features extracted from the EEG signal. The SLF takes its lowest value in the first layer with seven neurons; therefore, seven abstracted features were obtained in the first layer, and adding another neuron to this layer increases the SLF. The seven features extracted from the first hidden layer are the inputs of the second hidden layer, where the minimum SLF is observed with four neurons. Thus, ten neurons are reduced to seven and finally to four. Figure 4(b) presents the same calculations for the F4 channel in the right hemisphere: in this channel, ten features were reduced to six in the first hidden layer, three in the second layer, and finally to one feature. Figure 4(c) is calculated similarly for the Cz channel, where ten features were decreased to seven in the first layer and three in the second layer. Table 2 shows the number of neurons in each hidden layer for each of the 32 channels. The maximum and minimum numbers of neurons in the last layer are four and one, respectively.

Figure 4: SLF changes versus the number of neurons in each hidden layer: (a) the F3 channel in the left hemisphere, (b) the F4 channel in the right hemisphere, and (c) the Cz channel in the centre of the head.

Table 2: Number of neurons in hidden layers 1, 2, and 3 for the 32 SAEs (one per channel).

SAE (Fp1): 8, 5, 3    SAE (Fp2): 6, 4, 1
SAE (AF3): 8, 4, 2    SAE (AF4): 6, 4, 4
SAE (F3): 7, 4, 4     SAE (Fz): 7, 5, 4
SAE (F7): 6, 5, 2     SAE (F4): 6, 3, 1
SAE (FC5): 8, 6, 3    SAE (F8): 7, 5, 3
SAE (FC1): 5, 4, 1    SAE (FC6): 7, 4, 3
SAE (C3): 7, 4, 3     SAE (FC2): 5, 3, 1
SAE (T7): 5, 3, 3     SAE (Cz): 7, 3, 3
SAE (CP5): 6, 4, 1    SAE (C4): 6, 4, 2
SAE (CP1): 8, 5, 4    SAE (T8): 6, 4, 4
SAE (P3): 7, 3, 3     SAE (CP6): 8, 4, 3
SAE (P7): 7, 4, 2     SAE (CP2): 7, 3, 2
SAE (PO3): 6, 4, 2    SAE (P4): 7, 4, 2
SAE (O1): 7, 5, 2     SAE (P8): 6, 4, 3
SAE (Oz): 6, 3, 3     SAE (PO4): 8, 4, 2
SAE (Pz): 8, 6, 2     SAE (O2): 7, 4, 2

4.3. Accuracy Measures for the Comparison of Classifiers. The abstracted features are obtained in the last layer after fine-tuning the SAE parameters. According to the hypothesis of this study, the output of each SAE is used as an optimal feature to extract the brain topographic map. The performance of the SAETM algorithm was compared with several emotion classifiers. Figure 5 shows the comparison of the accuracy of these emotion classifiers with the accuracy of the SAETM. The KNN (k-nearest neighbour), BN (naive Bayes), and SVM (support vector machine) classifiers were selected because they are widely used in EEG-based emotion recognition [23].

In Figure 5, the SAETM is built on the MLP (multilayer perceptron) network [26]. Figures 5(a) and 5(b) show the accuracy of the classifiers in the valence and arousal dimensions, respectively. The accuracies of the SAETM and SVM networks are close to each other: the average accuracies of the SAETM and SVM are 83.3% and 82.7% in the valence dimension and 82.8% and 74.8% in the arousal dimension, respectively. The KNN and BN networks show average accuracies of 74.3% and 79.2% in the valence dimension and 73.4% and 77.2% in the arousal dimension, respectively. The SAETM method had the highest accuracy and the KNN network the lowest. There is a significant difference between these two classifiers (SAETM and SVM) and the other classifiers (p > 0.01). The loss of the proposed SAETM structure, used to check the generalization of the network, is presented in Figure 6. As shown, the SAETM generalizes appropriately on the validation data; the maximum number of epochs was set to 200.

Figure 5: Comparison of emotion classifiers with the SAETM for the two dimensions of valence (a) and arousal (b).

Figure 6: The loss of the SAETM structure with the increase in the number of training epochs.

Figure 7 compares network performance with box-whisker plots in the two dimensions, valence (a) and arousal (b); each column corresponds to a classifier. The highest accuracies are related to the SVM and MLP classifiers. The MLP network was used to simplify the structure of the SAETM. The classification accuracy and the computational time needed to train an emotion recognition network are significant factors when building a new network structure. The computational times taken by the SAETM, SVM, KNN, and BN networks for training are illustrated in Figure 8. The BN has the highest computational time, while the KNN has the lowest; the SAETM requires less computing time than the BN and is near the SVM.

Figure 7: Box-whisker diagram comparing the SAETM classifier with the SVM, KNN, and BN networks for the two dimensions of valence (a) and arousal (b).

Figure 8: Computational time (s) to train the networks.

4.4. Comparison of Different Feature Extraction Methods. In this study, the SAE network was selected as the feature extraction method. To evaluate this choice, the SAE network was compared with the PCA, nonlinear PCA (NPCA), and KLDA feature extraction methods. Figure 9 shows the comparison of the classifier results for the 32 participants based on these three methods in the valence and arousal dimensions: Figure 9(a) is the box-whisker diagram of the comparison in the valence dimension, and Figure 9(b) its arousal counterpart, for PCA, NPCA, and KLDA. The linear PCA method, with an average accuracy of 75.3% in valence, and the KLDA method, with 73.2% in arousal, reported the lowest accuracies, while the SAETM reported 83.3% and 82.8% accuracy in the valence and arousal dimensions, respectively. Based on these results, the SAE network performs better than the other methods (p < 0.01). The computational time for training the network with the different feature extraction methods is shown in Figure 10: the highest value is related to the KLDA method, and the SAETM had the lowest computational time.

Figure 9: Box-whisker diagram comparing the SAE feature extraction method with the NPCA, KLDA, and PCA methods in the two dimensions of valence (a) and arousal (b).

Figure 10: Computational time (s) to train for the different feature extraction methods.

Some linear and nonlinear features of the EEG signal were used in the designed SAETM algorithm according to Table 1. Three modes were examined to evaluate the selected features: in the first, the network trains only on the linear features; the second uses only the nonlinear features; and in the third, the combination of linear and nonlinear features was evaluated. With linear features alone as the input of the SAE networks, the accuracy in the valence and arousal dimensions is 65.7% and 64.2%, respectively; with nonlinear features alone, the network accuracy is 53.6% and 54.9%, respectively. According to Figure 5, the accuracy of the network in the valence and arousal dimensions is 83.3% and 82.8%, respectively, when the linear and nonlinear features together are applied as inputs to the SAE networks (SAETM). In addition, the F1 score of the SAETM in the valence and arousal dimensions, obtained from equations (10) and (11) (the precision and recall concepts), is 81.8% and 80.3%, respectively, while the SVM network reaches 78.4% in the valence dimension and 72.7% in the arousal dimension. Therefore, using linear and nonlinear features together gives better results than the other two modes.

4.5. Comparisons for Combinations of Classifiers and Feature Extraction Methods. The accuracy and computational-time results for combinations of the common classifiers and feature extraction methods are shown in Tables 3 and 4, respectively. The combination of the SVM classifier with the NPCA feature extraction method in valence (78.04%) and of the SVM classifier with the KLD method in arousal (78.23%) performs better than the comparable reported methods (Table 3). On the other hand, the training times in the valence and arousal spaces show that the combination of the KNN classifier and the PCA feature extraction method, at 452 seconds in valence and 470 seconds in arousal, requires less computational time than the others (Table 4).

4.6. Emotional Topographic Brain Mapping. In this study, a brain topographic map is extracted by selecting the MLP network and assigning a colour proportional to the weight of each node in this network (Figure 1). Figures 11(a) and 11(b) show the maps produced by the SAETM method and by the common single-feature method for the ten features of Table 1 while participants watched emotional video clips. Images obtained from the subband power, mean, standard deviation, zero-crossing rate, fractal dimension, entropy, and correlation dimension features are shown separately for the four emotion classes. The right column in Figures 11(a) and 11(b) contains the images from the SAETM algorithm. The SAETM, in the four scales of high

arousal-high valence, low arousal-high valence, high arousal-low valence, and low arousal-low valence, could create more separation at the borders of the active areas of the brain compared with the common methods. Dark red shows the most brain activity and dark blue the least brain activity (Figure 11). In high arousal-high valence (1), in both Figures 11(a) and 11(b), the active regions in the frontal section appear only in the theta power and standard deviation images, and
the images related to the two features of mean and zero-crossing rate show activity in the occipital region. Brain activity was high for relative entropy in the centre of the head toward the frontal lobe. In three images, brain activity related to the theta power, relative entropy, and fractal dimension features appears in the lower right hemisphere. Moreover, the images of relative entropy and correlation dimension in the left hemisphere indicate the lowest values of brain activity. In the SAETM map, the brain's activity can be observed in the frontal areas of the left hemisphere, along with its inactivity in the right hemisphere; the active and inactive parts are separated from the centre of the head into the left and right hemispheres, and most of the brain activity lies in the left hemisphere towards the frontal region.

In low arousal-high valence (2), the active part of the brain is observed in the centre of the head toward the left hemisphere in the theta, alpha, gamma, and standard deviation images. The active part of the beta power image has its maximum value in the centre of the head towards the frontal region. The occipital section is neutral or inactive in all images of this scale (2) except the correlation dimension image, in which the left hemisphere shows the highest brain activity at this scale. In the SAETM image, the frontal area shows the inactive areas of the brain at this scale, and the head-to-back central bar shows the active area of the brain; the image is divided from the middle of the head into two parts, inactive and active, namely the front and the central bar of the head, respectively.

In high arousal-low valence (3), the active parts in the images are the theta, alpha, and beta power and the mean, in the frontal region towards the right hemisphere. The zero-crossing rate and, to some extent, the fractal dimension images show the highest brain activity in the right hemisphere. The central area to the back of the head shows the lowest brain activity in all images except the entropy feature. In the SAETM algorithm, the active and inactive parts of the brain are divided from the centre of the head into the right and left hemispheres; in this image, the frontal region is obtained as two fully active hemispheres.

Finally, in low arousal-low valence (4), the frontal to central part of the head shows low brain activity in all images except alpha power and fractal dimension. Three images, beta power, mean, and fractal dimension, show brain activity in the occipital region. In the SAETM image, two active and inactive parts are divided from the centre of the head to the front and back of the head, showing the brain's activity in the occipital region.
Table 3: Accuracy comparison for combinations of classifiers and feature extraction methods.

(a) Valence, accuracy (%):
Classifier | PCA | NPCA | KLD
SVM | 75.45 | 78.04 | 77.51
KNN | 74.81 | 76.42 | 76.05
BN | 75.32 | 77.41 | 76.13

(b) Arousal, accuracy (%):
Classifier | PCA | NPCA | KLD
SVM | 76.26 | 77.25 | 78.23
KNN | 73.45 | 74.91 | 75.63
BN | 74.96 | 75.56 | 76.91

Table 4: Computational time (s) to train for combinations of classifiers and feature extraction methods.

(a) Valence:
Classifier | PCA | NPCA | KLD
SVM | 558 | 634 | 881
KNN | 452 | 841 | 847
BN | 593 | 729 | 1143

(b) Arousal:
Classifier | PCA | NPCA | KLD
SVM | 571 | 595 | 914
KNN | 470 | 537 | 914
BN | 623 | 755 | 1076

4.7. Comparison of the Resulting Topographic Maps. Several methods can serve as numerical criteria to compare the resulting topographic maps, including the use of classifier networks, with network accuracy as a criterion for how distinguishable the network inputs (the input images) are. Table 5 shows the results of applying successful image classification networks. As shown, the classification results for the maps of the SAETM algorithm have the highest accuracy (0.8305 ± 0.02). In addition, the average accuracy of the different classifiers on the images obtained from this network has the highest value (0.7613 ± 0.04). For the SAETM, the BN classifier has the lowest accuracy, 0.6906 ± 0.12; this value is still higher than the average accuracy of the various classifiers on any single feature, of which the average accuracy for the alpha power, 0.5863, is the highest after the SAETM. Therefore, the images obtained by the SAETM are more distinguishable than any of the common images. Chao et al. [18] reported accuracies of 0.6673 in the valence dimension and 0.6828 in the arousal dimension by creating an image that maps the electrodes onto a two-dimensional matrix. Topic and Russo [3] evaluated images with a CNN network on the DEAP data and extracted features from the resulting images with an accuracy of 0.7630 in the valence dimension and 0.7654 in the arousal dimension. The SAETM achieved accuracies of 0.8173 in the valence dimension and 0.8037 in the arousal dimension with CNN classification. In addition, the F1 score of the SAETM in the valence and arousal dimensions was 0.8031 and 0.7984, respectively.

Table 6 shows the accuracy of the CNN classification over the course of watching ten music videos. For the SAETM, the accuracy of the CNN classification was 0.4874 after the first video; the resulting image reached an accuracy of 0.7923 after five minutes and 0.8305 after ten minutes. According to Table 6, the CNN network classified the image from the SAETM after the fifth music video with an accuracy close to that after the tenth; therefore, the SAETM produces a brain topographic map in a shorter time. For the ten single features, the best CNN accuracy was reached only in the ninth or tenth minute.

4.8. Quality Evaluation of the Resulting Maps. To evaluate the quality of the resulting maps, 20 experts in the field of topographic brain maps were asked to give scores from zero to ten, via a scale questionnaire, to the EEG maps extracted by the SAETM and to the maps obtained by the common methods. The scale questionnaire was designed based on the degree of differentiation and meaningfulness of the images. The results of the ANOVA test show that the topographic maps obtained by the SAETM are preferable to the common methods (P < 0.001) (Figure 12). The resulting maps differentiate well the active areas in different parts of the brain while watching music videos relative to rest. Moreover, these maps show that the extracted topographic maps carry spatial, temporal, and frequency information that can lead to a better understanding of anatomical brain function. Therefore, topographic images, which contain rich spatial and functional information about the brain, will lead to the discovery of further implications about humans.

All software implementations were run on a Windows 10 64-bit workstation with an Intel Celeron 2.4 GHz CPU and 4 GB of RAM.

Figure 11: (a) Images obtained from the features extracted from the EEG signal (power for four subbands and the mean) and images obtained from the SAETM during the viewing of ten music videos. (b) Images obtained from the features extracted from the EEG signal (standard deviation, zero-crossing rate, fractal dimension, entropy, and correlation dimension) and images obtained from the SAETM during the viewing period of ten music videos. Rows: (1) high valence-high arousal, (2) high valence-low arousal, (3) low valence-high arousal, and (4) low valence-low arousal.

5. Discussion

Electroencephalographic methods have high temporal resolution but low spatial resolution for locating sources, and the sensitivity of the spatial resolution decreases as a function of the depth of the neural sources. Therefore, the ability to detect the deep brain generators that are vital to the production of emotions is still a matter of debate. Numerous EEG studies on emotion support the idea that the impact of deep sources such as the hippocampus, the amygdala, or the basal ganglia can be reasonably determined, despite their relatively low signal strength, using a variety of source analysis methods [30].

Because, in the generation of emotions, the EEG signal reflects the triggering of the deeper sources of the brain, topographic brain mapping is a feasible method that allows us to study emotion in more detail through the activity of brain areas. In our study, the features obtained for topographic mapping are a nonlinear combination of the features used in conventional brain mapping. Therefore, the only feature common to the obtained map and the common maps is the degree of participation of each area of the brain in emotional activity. To compare the obtained map with the common maps, we investigated the degree of participation of the brain areas in different emotions. Several studies show that stimuli of differing valence affect the interhemispheric asymmetry within the prefrontal cortex [31], which led to the development of the "hemispheric valence hypothesis" [32], stating that high-valence emotions are largely processed in the left frontal cortex and low-valence emotions largely within the right prefrontal cortex [33].
Table 5: Accuracy of the SVM, BN, KNN, CNN, and CapsNet networks in image classification for the ten single features (subband powers Pθ/Pα/Pβ/Pγ, mean, standard deviation (Std), zero-crossing rate (ZCR), fractal dimension (FD), approximate entropy (ApEn), and correlation dimension (CD)) and for the SAETM.

Classifier | Pθ | Pα | Pβ | Pγ | Mean | Std | ZCR | FD | ApEn | CD | SAETM
SVM | 0.5132 ± 0.02 | 0.5818 ± 0.01 | 0.4363 ± 0.06 | 0.5527 ± 0.01 | 0.3280 ± 0.13 | 0.3620 ± 0.02 | 0.4750 ± 0.12 | 0.5145 ± 0.03 | 0.4239 ± 0.05 | 0.3840 ± 0.01 | 0.7536 ± 0.01
BN | 0.4746 ± 0.13 | 0.5373 ± 0.02 | 0.5248 ± 0.05 | 0.4323 ± 0.01 | 0.4129 ± 0.01 | 0.3359 ± 0.01 | 0.3984 ± 0.03 | 0.4719 ± 0.11 | 0.3487 ± 0.15 | 0.4602 ± 0.04 | 0.6906 ± 0.12
KNN | 0.5601 ± 0.02 | 0.4982 ± 0.12 | 0.4880 ± 0.05 | 0.3760 ± 0.05 | 0.3717 ± 0.01 | 0.3129 ± 0.04 | 0.4573 ± 0.04 | 0.3985 ± 0.03 | 0.4604 ± 0.04 | 0.4228 ± 0.02 | 0.7158 ± 0.04
CNN | 0.6534 ± 0.03 | 0.6710 ± 0.04 | 0.5730 ± 0.07 | 0.5915 ± 0.04 | 0.4872 ± 0.06 | 0.3916 ± 0.01 | 0.4716 ± 0.02 | 0.5610 ± 0.04 | 0.5201 ± 0.07 | 0.5072 ± 0.12 | 0.8305 ± 0.02
CapsNet | 0.6721 ± 0.06 | 0.6430 ± 0.11 | 0.5928 ± 0.02 | 0.6592 ± 0.07 | 0.5340 ± 0.16 | 0.4924 ± 0.12 | 0.5935 ± 0.04 | 0.6026 ± 0.15 | 0.5873 ± 0.02 | 0.6453 ± 0.13 | 0.8159 ± 0.01
Average (ACC) | 0.5747 ± 0.052 | 0.5863 ± 0.06 | 0.5230 ± 0.05 | 0.5223 ± 0.036 | 0.4268 ± 0.074 | 0.3790 ± 0.04 | 0.4792 ± 0.05 | 0.5097 ± 0.072 | 0.4681 ± 0.066 | 0.4839 ± 0.064 | 0.7613 ± 0.04

Table 6: CNN network accuracy in image classification over 10 minutes for the SAETM and the ten single features (column abbreviations as in Table 5).

Time (min) | Pθ | Pα | Pβ | Pγ | Mean | Std | ZCR | FD | ApEn | CD | SAETM
1 | 0.4281 | 0.4323 | 0.4303 | 0.4183 | 0.4065 | 0.3619 | 0.4003 | 0.4549 | 0.4426 | 0.4273 | 0.4874
2 | 0.4170 | 0.4382 | 0.4580 | 0.4089 | 0.4103 | 0.3547 | 0.4097 | 0.4609 | 0.4432 | 0.4175 | 0.4787
3 | 0.4295 | 0.4693 | 0.4531 | 0.4198 | 0.4074 | 0.3683 | 0.4162 | 0.4656 | 0.4594 | 0.4145 | 0.5996
4 | 0.4536 | 0.4726 | 0.4854 | 0.4186 | 0.4238 | 0.3605 | 0.4229 | 0.4768 | 0.4529 | 0.4291 | 0.6491
5 | 0.4609 | 0.4812 | 0.4930 | 0.4347 | 0.4239 | 0.4038 | 0.4140 | 0.4611 | 0.4535 | 0.4043 | 0.7752
6 | 0.4729 | 0.4983 | 0.5034 | 0.4279 | 0.4158 | 0.3794 | 0.4371 | 0.4662 | 0.4485 | 0.4296 | 0.7923
7 | 0.4582 | 0.4978 | 0.5012 | 0.4376 | 0.4283 | 0.3775 | 0.4395 | 0.4693 | 0.4526 | 0.4352 | 0.8164
8 | 0.4901 | 0.5391 | 0.5078 | 0.4594 | 0.4192 | 0.3738 | 0.4588 | 0.4738 | 0.4594 | 0.4808 | 0.8283
9 | 0.5865 | 0.6449 | 0.5539 | 0.5523 | 0.4763 | 0.3875 | 0.4521 | 0.5532 | 0.4820 | 0.4986 | 0.8298
10 | 0.6534 | 0.6710 | 0.5730 | 0.5915 | 0.4872 | 0.3916 | 0.4716 | 0.5610 | 0.5201 | 0.5072 | 0.8305

Figure 12: Statistical comparison of the scores obtained from the scale questionnaire for the maps extracted from the SAETM, the EEG power subbands (theta, alpha, beta, gamma), mean, standard deviation (Std), zero-crossing rate (ZCR), fractal dimension (FD), approximate entropy (ApEn), and correlation dimension (CD), by ANOVA test.

As it can be seen in Figures 11(a) and 11(b), SAETM map the EEG signal. Te accuracy of the network was
is clearly interhemispheric asymmetric and shows that evaluated in three modes of using only linear fea-
arousal is associated with brain activity in the right posterior tures, nonlinear features, and fnally the use of linear
cortex and valence is associated with brain activity in the left and nonlinear features, according to which the
frontal lobe, which is supported by Rogenmoser et al. [34]. choice of linear and nonlinear features increased the
Te relative diferences in interhemisphere asymmetry be- accuracy of the network.
tween high and low valence conditions, were investigated (iii) Te optimal number of neurons in each hidden
and the results of Kolmogorov–Smirnov (ks) tests show layer for each SAE network was calculated based on
signifcant diferences (p < 0.01). We also investigate the the SLF. For example, ten extracted features are
dynamics of interhemisphere asymmetry by applying the compressed into seven and fnally into four features
Shannon entropy of the extracted maps (10 minute) for in the F3 channel.
diferent valence in trials. A signifcant diference was found
(iv) Te accuracy of the SAETM for classifying the four
(p < 0.01). Te results show interhemisphere asymmetry
classes of emotions is a parameter for evaluating the
refects activity in subcortical brain regions. Specifcally
choice of feature extraction method. SAE networks
changes in prefrontal asymmetry are known to be related
can correctly select features due to their deep
with amygdala and cerebellum. As the SATEM map shows
structure and the accuracy calculated in the valence
frontal asymmetry is well refected in high valence-high
dimension (83.3%) and in the arousal dimension
arousal condition as well as supported by Hamann [30].
(82.8%).
As depicted in last row of Figure 11, in low arousal-low
valence condition, asymmetry relates to frontal-occipital and (v) Extracting the topographic maps of the SAETM was
it can be most likely related to visual processing activity used in this study and the results were compared
rather than emotional activity. Furthermore, this can be quantitatively and qualitatively with common maps.
observed in high valence-low arousal, however, frontal Te accuracy of maps classifers as a criterion for
asymmetry is also observed to some extent. Terefore, we quantifying image diferentiation indicated that
can conclude that low arousal stimuli do not cause great deal CNN has the highest accuracy on maps from the
of frontal asymmetry. In addition, as can be seen in high- SAETM (0.8305 ± 0.02). Qualitative evaluation of
arousal stimuli, SAETM map is asymmetric in left and right maps by the experts showed that maps obtained
hemispheres. from the SAETM are signifcantly diferent from
According to the results, the following items were evaluated to test the hypotheses of this study.

(i) SAE networks can extract deep features from the EEG signal owing to their deep structure. Feature extraction by SAE networks was compared with the PCA, nonlinear PCA, and KLDA feature extraction methods. The results showed that SAE networks can extract features with accuracies of 83.3% and 82.8% in the valence and arousal dimensions, respectively.

(ii) The use of both linear and nonlinear features was expected to provide better representations of the signal to the classifier because of the nonlinear nature of the EEG signal. The accuracy of the network was evaluated in three modes: using only linear features, using only nonlinear features, and using both together. The combination of linear and nonlinear features increased the accuracy of the network.

(iii) The optimal number of neurons in each hidden layer of each SAE network was calculated based on the SLF. For example, the ten extracted features are compressed into seven and finally into four features in the F3 channel (see the sketch after this list).

(iv) The accuracy of the SAETM in classifying the four classes of emotions is a parameter for evaluating the choice of feature extraction method. SAE networks can correctly select features owing to their deep structure, with accuracies of 83.3% in the valence dimension and 82.8% in the arousal dimension.

(v) The topographic maps extracted by the SAETM were compared quantitatively and qualitatively with common maps. Taking classifier accuracy as a criterion for quantifying image differentiation, the CNN achieved its highest accuracy on maps from the SAETM (0.8305 ± 0.02). Qualitative evaluation by the experts showed that the maps obtained from the SAETM differ significantly from common maps.

(vi) Features extracted by the SAETM produced maps in less time than any single feature: the CNN classified maps with more than 79% accuracy five minutes after the start of the signal. This result showed that faster map production speeds up user recognition.
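As referenced in item (iii), a per-channel stacked autoencoder with the 10 -> 7 -> 4 compression can be sketched as follows. The layer sizes follow the F3 example above; the activations, training loop, and end-to-end optimization (rather than the greedy layerwise pretraining typical of stacked autoencoders) are simplifying assumptions for illustration.

import torch
import torch.nn as nn

class StackedAutoencoder(nn.Module):
    """Per-channel SAE compressing 10 EEG features -> 7 -> 4 (the F3 example)."""
    def __init__(self, sizes=(10, 7, 4)):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(sizes[0], sizes[1]), nn.Sigmoid(),
            nn.Linear(sizes[1], sizes[2]), nn.Sigmoid(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(sizes[2], sizes[1]), nn.Sigmoid(),
            nn.Linear(sizes[1], sizes[0]),
        )

    def forward(self, x):
        code = self.encoder(x)  # the 4-dimensional code feeds the topographic map
        return self.decoder(code), code

# Reconstruction training on hypothetical per-channel feature vectors.
model = StackedAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 10)  # placeholder batch: 256 samples x 10 features
for _ in range(100):
    optimizer.zero_grad()
    reconstruction, _ = model(x)
    loss = nn.functional.mse_loss(reconstruction, x)
    loss.backward()
    optimizer.step()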
Finally, the limitations of the current work and directions for further work include the following:
(i) The SAETM emotion classifier presented in this study is designed within the classifier paradigm. In future studies, we propose that the network structure of the SAE be formed automatically, with the structure selected according to quantitative criteria for generating topographic maps with the highest distinction.
(ii) The performance of the SAETM is undermined when data are limited; the likely reason is that deep models require large numbers of samples. On the other hand, given that stacked autoencoders are able to extract deep features from data, we suggest using the raw EEG signal, instead of the features used in this study, as the SAE input in order to retain the spatial characteristics of the EEG signals as much as possible (see the sketch after this list).
(iii) Since topographic maps provide rich information for the diagnosis of mental disorders, other directions deserving exploration in future work include applying the method to more datasets, especially those covering mental disorders, and performing functional network analysis based on the decoded hidden features. Moreover, the authors suggest simultaneous fMRI and EEG recordings to investigate the relationship between the obtained maps and deeper sources in the brain.
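As noted in item (ii) above, one way to preserve the spatial characteristics of the signal is to feed windowed raw EEG to the SAE instead of handcrafted features. The sketch below illustrates the windowing step only; the window length, overlap, and array layout are our assumptions, not parameters from this study.

import numpy as np

def raw_eeg_windows(signal: np.ndarray, fs: int = 128,
                    win_s: float = 1.0, overlap: float = 0.5) -> np.ndarray:
    """Slice one EEG channel into overlapping windows for SAE input.

    signal: 1-D array of raw samples; returns (n_windows, window_length).
    """
    win = int(fs * win_s)
    step = max(1, int(win * (1.0 - overlap)))
    n = 1 + (len(signal) - win) // step
    idx = np.arange(win)[None, :] + step * np.arange(n)[:, None]
    return signal[idx]

# DEAP-style example: 60 s of one channel at 128 Hz -> 119 half-overlapping windows.
x = np.random.randn(60 * 128)
print(raw_eeg_windows(x).shape)  # (119, 128)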
6. Conclusions

In this study, we proposed and implemented a stacked autoencoder network that creates novel emotional topographic EEG brain maps. This deep learning approach aimed to extract EEG maps with higher differentiation than common maps. The method combines EEG features commonly used in emotion studies to extract richer features within a supervised emotion classification framework, and the accuracy of the classifier was taken as the criterion for the optimal feature combination. The obtained map can therefore be considered optimal in terms of differentiating between emotional states. The performance of the algorithm was confirmed by the quantitative and qualitative evaluation of classifier accuracy and of the emotional EEG maps extracted from the DEAP database. The results obtained in this study show that the proposed method has an acceptable ability to create topographic brain maps with more differentiation than conventional EEG maps. It also allows us to better understand the involvement of different areas of the brain in emotional activities using state-of-the-art deep learning models.
Data Availability

The DEAP dataset (A Dataset for Emotion Analysis) used to support the findings of this study was supplied by Sander Koelstra et al. under license and so cannot be made freely available. Requests for access to these data should be made to [email protected] (https://www.eecs.qmul.ac.uk/mmv/datasets/deap/).

Consent

The data were recorded with the written consent of the participants.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors are grateful for the support of the Islamic Azad University, Science and Research Branch, Faculty of Biomedical Engineering. This study was self-funded.
References

[1] C. Wei, L. L. Chen, Z. Z. Song, X. G. Lou, and D. D. Li, "EEG-based emotion recognition using simple recurrent units network and ensemble learning," Biomedical Signal Processing and Control, vol. 58, Article ID 101756, 2020.
[2] L. Shu, J. Xie, M. Yang et al., "A review of emotion recognition using physiological signals," Sensors, vol. 18, no. 7, p. 2074, 2018.
[3] A. Topic and M. Russo, "Emotion recognition based on EEG feature maps through deep learning network," Engineering Science and Technology, an International Journal, vol. 24, no. 6, pp. 1442–1454, 2021.
[4] J. Li, Z. Zhang, and H. He, "Hierarchical convolutional neural networks for EEG-based emotion recognition," Cognitive Computation, vol. 10, no. 2, pp. 368–380, 2018.
[5] Y. Gao, H. J. Lee, and R. M. Mehmood, "Deep learning of EEG signals for emotion recognition," in Proceedings of the IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Italy, July 2015.
[6] Z. Yin, M. Zhao, Y. Wang, J. Yang, and J. Zhang, "Recognition of emotions using multimodal physiological signals and an ensemble deep learning model," Computer Methods and Programs in Biomedicine, vol. 140, pp. 93–110, 2017.
[7] A. Emami, N. Kunii, T. Matsuo, T. Shinozaki, K. Kawai, and H. Takahashi, "Autoencoding of long-term scalp electroencephalogram to detect epileptic seizure for diagnosis support system," Computers in Biology and Medicine, vol. 110, pp. 227–233, 2019.
[8] J. Prabin Jose, M. Sundaram, and G. Jaffino, "Adaptive rag-bull rider: a modified self-adaptive optimization algorithm for epileptic seizure detection with deep stacked autoencoder using electroencephalogram," Biomedical Signal Processing and Control, vol. 64, Article ID 102322, 2021.
[9] Y. Zhao and L. He, "Deep learning in the EEG diagnosis of Alzheimer's disease," in Computer Vision, C. Jawahar and S. Shan, Eds., Springer, Berlin, Germany, 2015.
[10] J. Liu, G. Wu, Y. Luo et al., "EEG-based emotion classification using a deep neural network and sparse autoencoder," Frontiers in Systems Neuroscience, vol. 14, p. 43, 2020.
[11] G. Chen, X. Zhang, Y. Sun, and J. Zhang, "Emotion feature analysis and recognition based on reconstructed EEG sources," IEEE Access, vol. 8, pp. 11907–11916, 2020.
[12] C. M. Michel and D. Brunet, "EEG source imaging: a practical review of the analysis steps," Frontiers in Neurology, vol. 10, p. 325, 2019.
[13] L. S. Hooi, H. Nisar, and Y. V. Voon, "Tracking of EEG activity using topographic maps," in Proceedings of the IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Lumpur, Malaysia, October 2015.
[14] Z.-M. Wang, S.-Y. Hu, and H. Song, "Channel selection method for EEG emotion recognition using normalized mutual information," IEEE Access, vol. 7, pp. 143303–143311, 2019.
[15] S. Keshmiri, M. Shiomi, and H. Ishiguro, "Entropy of the multi-channel EEG recordings identifies the distributed signatures of negative, neutral and positive affect in whole-brain variability," Entropy, vol. 21, no. 12, p. 1228, 2019.
[16] S. Siddharth, T.-P. Jung, and T. J. Sejnowski, "Utilizing deep learning towards multi-modal bio-sensing and vision-based affective computing," IEEE Transactions on Affective Computing, vol. 13, no. 1, pp. 96–107, 2022.
[17] G. P. Renieblas, A. T. Nogués, A. M. González, N. Gómez-Leon, and E. G. del Castillo, "Structural similarity index family for image quality assessment in radiological images," Journal of Medical Imaging, vol. 4, no. 3, Article ID 035501, 2017.
[18] H. Chao, L. Dong, Y. Liu, and B. Lu, "Emotion recognition from multiband EEG signals using CapsNet," Sensors, vol. 19, no. 9, p. 2212, 2019.
[19] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, "DEAP: a database for emotion analysis using physiological signals," IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18–31, 2012.
[20] J. Atkinson and D. Campos, "Improving BCI-based emotion recognition by combining EEG feature selection and kernel classifiers," Expert Systems with Applications, vol. 47, pp. 35–41, 2016.
[21] J. Tong, S. Liu, Y. Ke et al., "EEG-based emotion recognition using nonlinear feature," in Proceedings of the 2017 IEEE 8th International Conference on Awareness Science and Technology (iCAST), Taiwan, China, November 2017.
[22] W. L. Zheng, J. Y. Zhu, and B. L. Lu, "Identifying stable patterns over time for emotion recognition from EEG," IEEE Transactions on Affective Computing, vol. 10, no. 3, pp. 417–429, 2019.
[23] C. Yu and M. Wang, "Survey of emotion recognition methods using EEG information," Cognitive Robotics, vol. 2, pp. 132–146, 2022.
[24] K. Mao, R. Tang, X. Wang, W. Zhang, and H. Wu, "Feature representation using deep autoencoder for lung nodule image classification," Complexity, vol. 2018, Article ID 3078374, 11 pages, 2018.
[25] X. Chai, Q. Wang, Y. Zhao, X. Liu, O. Bai, and Y. Li, "Unsupervised domain adaptation techniques based on autoencoder for non-stationary EEG-based emotion recognition," Computers in Biology and Medicine, vol. 79, pp. 205–214, 2016.
[26] J. Wang and M. Wang, "Review of the emotional feature extraction and classification using EEG signals," Cognitive Robotics, vol. 1, pp. 29–40, 2021.
[27] R. Bajpai, R. Yuvaraj, and A. A. Prince, "Automated EEG pathology detection based on different convolutional neural network models: deep learning approach," Computers in Biology and Medicine, vol. 133, Article ID 104434, 2021.
[28] H. Yu, L. T. Yang, Q. Zhang, D. Armstrong, and M. J. Deen, "Convolutional neural networks for medical image analysis: state-of-the-art, comparisons, improvement and perspectives," Neurocomputing, vol. 444, pp. 92–110, 2021.
[29] R. Chauhan, K. K. Ghanshala, and R. C. Joshi, "Convolutional neural network (CNN) for image detection and recognition," in Proceedings of the First International Conference on Secure Cyber Computing and Communication (ICSCCC), Jalandhar, India, December 2018.
[30] S. Hamann, "Mapping discrete and dimensional emotions onto the brain: controversies and consensus," Trends in Cognitive Sciences, vol. 16, no. 9, pp. 458–466, 2012.
[31] L. A. Schmidt and L. J. Trainor, "Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions," Cognition & Emotion, vol. 15, no. 4, pp. 487–500, 2001.
[32] K. M. Heilman, "The neurobiology of emotional experience," Journal of Neuropsychiatry and Clinical Neurosciences, vol. 9, no. 3, pp. 439–448, 1997.
[33] D. Sammler, M. Grigutsch, T. Fritz, and S. Koelsch, "Music and emotion: electrophysiological correlates of the processing of pleasant and unpleasant music," Psychophysiology, vol. 44, no. 2, pp. 293–304, 2007.
[34] L. Rogenmoser, N. Zollinger, S. Elmer, and L. Jäncke, "Independent component processes underlying emotions during natural music listening," Social Cognitive and Affective Neuroscience, vol. 11, no. 9, pp. 1428–1439, 2016.
