A Dual-Branch Dynamic Graph Convolution Based Adaptive Transformer Feature Fusion Network for EEG Emotion Recognition
Abstract—Electroencephalograph (EEG) emotion recognition plays an important role in the brain-computer interface (BCI) field. However, most recent methods adopt shallow graph neural networks with a single temporal feature, which limits emotion classification performance. Furthermore, existing methods generally ignore the individual divergence between subjects, resulting in poor transfer performance. To address these deficiencies, we propose a dual-branch dynamic graph convolution based adaptive transformer feature fusion network with adapter-finetuned transfer learning (DBGC-ATFFNet-AFTL) for EEG emotion recognition. Specifically, a dual-branch graph convolution network (DBGCN) is first designed to effectively capture the temporal and spectral characterizations of EEG simultaneously. Second, the adaptive Transformer feature fusion network (ATFFNet) integrates the obtained feature maps with a channel-weight unit, highlighting the significant differences between channels. Finally, the adapter-finetuned transfer learning method (AFTL) is applied to cross-subject emotion recognition and proves parameter-efficient with few samples of the target subject. Competitive experimental results on three datasets show that our proposed method achieves promising emotion classification performance compared with state-of-the-art methods. The code of our proposed method will be available at: https://github.com/smy17/DANet.
Index Terms—EEG, emotion recognition, graph neural network, Transformer, transfer learning
spectral features simultaneously, which results in the insufficient extraction of EEG emotional information.

In addition, recent studies have demonstrated that deep learning methods are able to learn more discriminative features from data automatically [15]. Many researchers have started to explore high-level information in EEG emotion recognition with graph neural networks [16], fusing the extracted feature maps with shallow 2D convolution. For instance, Song et al. [17] applied a graph neural network to classify DE features, which efficiently modeled the connections between EEG channels with a dynamic adjacency matrix. To take the distribution of brain regions into account simultaneously, a novel representation of EEG features was discussed [18] and 4D convolution was applied to extract high-level feature maps. However, conventional CNNs can only focus on local spatial features of the brain network, which leads to the loss of patterns in higher-dimensional space. To avoid this limitation, attention units were applied with LSTM to improve invariance against emotional intensity fluctuations and to automatically adjust channel weights [35], which may provide a solution for efficiently fusing the feature maps obtained from GNNs.

Some recent studies have started to introduce cross-subject experiments to enable rapid application of emotion recognition in BCIs. Although classical subject-dependent methods [18] have been proved to show outstanding performance in emotion recognition, the risk of overfitting and the dependence on the amount of data limit their flexible application. To overcome these problems, Song et al. [17] and Li et al. [35] applied the leave-one-subject-out method to transfer the general emotion pattern of source subjects to the target subject. Although these methods achieved promising classification results, the non-negligible divergence between individuals and the simple transfer without adaptation may lead to misjudgment by the pretrained model, resulting in poor classification performance on the target subject.

To address the issues above, in this article, we propose a novel dual-branch dynamic graph convolution based adaptive transformer feature fusion network with adapter-finetuned transfer learning, namely DBGC-ATFFNet-AFTL, for EEG emotion recognition. First, both differential entropy and power spectral density features of each EEG segment are computed, and the dual-branch dynamic graph convolution (DBGC) network is designed to capture deeper temporal and spectral features in different frequency bands from the dual branches respectively. Second, the adaptive Transformer feature fusion network (ATFFNet) applies a self-attention mechanism to the different kinds of feature maps in consideration of the channel connections, in order to effectively capture the global pattern of the emotion status. Moreover, the adapter-finetuned transfer learning (AFTL) algorithm is proposed to efficiently avoid the overfitting problem caused by insufficient samples of the target subject, through finetuning only the Adapter modules. The proposed DBGC-ATFFNet-AFTL method is evaluated on three public datasets and gains promising performance compared with the state-of-the-art methods, demonstrating its efficacy in EEG emotion recognition.

Main contributions of this paper are summarized as follows:

1) We propose a novel DBGC-ATFFNet-AFTL method for EEG emotion recognition, which integrates high-level features from dual branches into the deep learning network. The proposed DBGC-ATFFNet-AFTL method performs emotion classification more accurately and efficiently than the widely used dynamic graph neural network.

2) A dual-branch dynamic graph convolution block is developed to acquire the temporal and spectral characteristics through dual branches, which overcomes the weakness of insufficient emotional information extraction with a single encoding path.

3) We design an adaptive Transformer feature fusion network to fuse high-level temporal and spectral features simultaneously, efficiently associating the spatial distribution of EEG channels with deeply encoded emotion characteristics and thus boosting the classification performance.

4) We propose an adapter-finetuned transfer learning algorithm to realize rapid cross-subject EEG emotion recognition through finetuning the Adapter modules. It effectively avoids the overfitting problem of the subject-dependent method and shows outstanding performance with quite a small number of trainable parameters.

2 METHODOLOGY

The overall architecture of our proposed DBGC-ATFFNet-AFTL is outlined in Fig. 1 and summarized as follows. First, differential entropy (DE) and power spectral density (PSD) features of the EEG segments are calculated by the inverse fast Fourier transform (IFFT) and the short-time Fourier transform (STFT), and the DBGC module captures both the temporal DE and spectral PSD information of the EEG signals using dual branches of graph convolution; second, the ATFFNet, which consists of a multi-head self-attention mechanism and a subject-adaptive unit (SAU), effectively fuses the obtained feature maps; finally, the classification block gives the final recognition results. Additionally, the pipeline of our proposed adapter-finetuned transfer learning is as follows: 1) We first divide the whole dataset into two parts: a randomly selected i-th subject is the target subject, while the remaining (N-1) subjects are the source subjects. Specifically, half of the target subject's samples are used for finetuning, while the other half are used to evaluate the classification performance; 2) We then pretrain the proposed model on all samples of the source subjects and update all trainable parameters; 3) Based on the training samples of the target subject, we finetune only the parameters of the Adapter, which is embedded in the SAU, to bridge the gap between the target subject and the source subjects; 4) Finally, the well-trained model is evaluated on the test samples of the target subject. In the following three subsections, we discuss in detail the specific implementation of the proposed innovation modules.
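To make the finetuning step (3) above concrete, the following is a minimal PyTorch-style sketch of adapter-only finetuning; `finetune_adapters_only` and the convention of marking Adapter parameters by the substring "adapter" in their names are illustrative assumptions, not the authors' released code.

```python
import torch
from torch import nn

def finetune_adapters_only(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                           epochs: int = 200, lr: float = 1e-3) -> nn.Module:
    """Freeze the pretrained backbone and update only Adapter parameters
    on the target subject's finetuning half (step 3 of the pipeline).
    Selecting Adapters via the substring 'adapter' in parameter names is
    an illustrative convention, not the authors' implementation."""
    adapter_params = []
    for name, p in model.named_parameters():
        p.requires_grad = "adapter" in name
        if p.requires_grad:
            adapter_params.append(p)

    opt = torch.optim.Adam(adapter_params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)   # x: EEG features, y: emotion labels
        loss.backward()
        opt.step()
    return model
```

Because only the bottleneck Adapter weights receive gradients, the number of trainable parameters during transfer stays small, which is what makes the AFTL step fast and resistant to overfitting on the target subject's few samples.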
2.1 Dual-Branch Dynamic Graph Convolution Network

The original EEG signals are defined as $E = \{(X_i, y_i) \mid i = 1, 2, \ldots, K\}$, where $X_i \in \mathbb{R}^{C \times S}$ is a two-dimensional array representing the $i$-th EEG trial with $C$ channels and $S$ samples, $K$ is the total number of EEG trials, and $y_i$ is the label of $X_i$, taking its value from a label set $L$ of $M$ classes in an emotion recognition task. For example, the label set for the discrete datasets (i.e., SEED) consists of the explicit emotion statuses $L = \{l_1 = \text{"neutral"}, l_2 = \text{"happy"}, l_3 = \text{"sad"}\}$.

As in previous works [13], [20], we first split the original EEG signals with a non-overlapping window of length $T$ s; it has been proved that real-time emotion recognition can be approximately realized, with emotional information largely preserved, when $T$ is set to 1 [17]. Each segment is assigned the same label as the original EEG signal.

According to the experimental results in recent studies [17], [18], differential entropy (DE) and power spectral density (PSD) features have been proved to achieve promising performance in depicting emotional fluctuations. However, for different types of emotional stimulation, the two features contribute differently to the final recognition result. Under video stimulation (i.e., the SEED dataset), for example, DE features perform better in identifying the emotions of subjects [12], while PSD features prove to give outstanding results under music stimulation (i.e., the DEAP dataset) [19]. Therefore, we choose both DE and PSD as the basic input features for our proposed model.

The two types of features of all EEG channels are computed for the five frequency bands [20] (i.e., $\delta$ [1-4 Hz], $\theta$ [4-8 Hz], $\alpha$ [8-14 Hz], $\beta$ [14-31 Hz] and $\gamma$ [31-51 Hz]), all of which have been proven to be effective for emotion recognition [5]. The DE feature for a Gaussian distribution is defined as follows [12]:

$$h_d(X) = -\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \ln\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right] dx = \frac{1}{2}\ln\left(2\pi e \sigma^2\right) \qquad (1)$$

where $X$ denotes the Gaussian distribution $N(\mu, \sigma^2)$, $x$ is a variable, and $\pi$ and $e$ are constants.

The PSD features can be computed using the short-time Fourier transform (STFT) and are defined by [17]:

$$h_p(X) = E\left[x^2\right] \qquad (2)$$

where $x$ is a signal variable acquired from a certain frequency band on a certain EEG channel.
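For concreteness, the band-wise feature extraction can be sketched as follows. This is a simplified illustration that uses time-domain band-pass filtering with the closed forms of Eqs. (1) and (2); the paper's own IFFT/STFT-based extraction may differ in detail.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# The five frequency bands used in the paper.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 51)}

def de_psd_features(segment: np.ndarray, fs: int = 200):
    """Return the DE and PSD feature tensors F_D, F_P of shape (C, B)
    for one EEG segment of shape (C, S). Band-pass filtering plus a
    time-domain variance estimate is used here, with DE given by the
    closed form of Eq. (1) and PSD by Eq. (2)."""
    F_D = np.zeros((segment.shape[0], len(BANDS)))
    F_P = np.zeros_like(F_D)
    for j, (lo, hi) in enumerate(BANDS.values()):
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        x = filtfilt(b, a, segment, axis=1)
        F_D[:, j] = 0.5 * np.log(2 * np.pi * np.e * x.var(axis=1))  # Eq. (1)
        F_P[:, j] = np.mean(x ** 2, axis=1)                          # Eq. (2)
    return F_D, F_P
```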
Therefore, we extract the DE feature tensor and the PSD feature tensor, $F_D, F_P \in \mathbb{R}^{C \times B}$, from each segment, where $C$ is the number of EEG channels and $B$ is the number of frequency bands, respectively. Since these extracted features are relatively independent and fail to fully consider the effects of different brain regions on emotion, a DBGC module is designed to further explore deeper temporal and spectral features together with the channel relationships. The layout of the DBGC is depicted in Fig. 2; it includes two synchronized graph-convolution branches sharing a deeply encoded adjacency matrix.

Fig. 2. The layout of the dual-branch dynamic graph convolution network.

Based on the previously extracted DE and PSD features, a dynamic adjacency matrix is proposed to model the connections among EEG channels [21], [22]. We first randomly initialize an adjacency matrix $A \in \mathbb{R}^{C \times C}$, whose $(i,j)$-th element measures the coupling strength between the $i$-th and $j$-th EEG channels. In this way, $A$ treats every channel as densely related to every other, taking direction and strength into account simultaneously. The matrix is then encoded using a Tanh nonlinearity to simulate the directional dependencies between different channels as follows:

$$\tilde{A}_d = \sigma\left(W_2 \, \delta\left(W_1 \tilde{A}\right)\right) \qquad (3)$$

where $\tilde{A} \in \mathbb{R}^{(C \times C) \times 1}$ is vectorized from $A$, $W_1 \in \mathbb{R}^{\frac{C \times C}{r} \times (C \times C)}$ and $W_2 \in \mathbb{R}^{(C \times C) \times \frac{C \times C}{r}}$ are weight matrices, $\delta(\cdot)$ and $\sigma(\cdot)$ are the ELU and Tanh functions respectively, and $r$ is the reduction ratio. A dense adjacency matrix $A_d \in \mathbb{R}^{C \times C}$ is therefore obtained by reshaping $\tilde{A}_d \in \mathbb{R}^{(C \times C) \times 1}$ into $\mathbb{R}^{C \times C}$, where the $(i,j)$-th entry is learnable and reflects the directional dependency between the $i$-th and $j$-th EEG channels. We then adopt a rectified linear unit (ReLU) to penalize weak channel couplings, and as a result a non-negative adjacency matrix $A_{ds}$ is achieved. Thus, the graph is defined as $G = \{V, F_D, F_P, A_{ds}\}$, where $V$ is the vertex set with $|V| = C$ nodes, node attributes …
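The adjacency construction above can be summarized in a short PyTorch sketch; the reduction ratio default and the class name `DynamicAdjacency` are assumptions for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

class DynamicAdjacency(nn.Module):
    """Sketch of the encoded adjacency matrix of Eq. (3): a randomly
    initialized dense matrix A is vectorized, passed through a
    bottleneck with reduction ratio r (ELU, then Tanh), reshaped back
    to C x C, and rectified so that weak couplings are suppressed."""
    def __init__(self, n_channels: int, r: int = 4):
        super().__init__()
        c2 = n_channels * n_channels
        self.n = n_channels
        self.A = nn.Parameter(torch.randn(n_channels, n_channels))  # A
        self.w1 = nn.Linear(c2, c2 // r, bias=False)                # W_1
        self.w2 = nn.Linear(c2 // r, c2, bias=False)                # W_2

    def forward(self) -> torch.Tensor:
        a_vec = self.A.flatten()                                    # vectorized A
        a_enc = torch.tanh(self.w2(F.elu(self.w1(a_vec))))          # Eq. (3)
        return torch.relu(a_enc.view(self.n, self.n))               # A_ds >= 0
```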
2.2 Adaptive Transformer Feature Fusion Network

$$H_T' = \mathrm{MHSA}\left(\mathrm{LN}\left(H_C\right)\right) + H_C$$
$$H_T = \mathrm{SAU}\left(\mathrm{LN}\left(H_T'\right)\right) + H_T' \qquad (5)$$
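A minimal sketch of the pre-norm residual updates in Eq. (5) follows; the SAU is passed in as a module (e.g., an MLP containing the Adapter described below), and the head count and widths are assumptions rather than the paper's exact configuration.

```python
import torch
from torch import nn

class ATFFBlock(nn.Module):
    """Sketch of Eq. (5): multi-head self-attention and the
    subject-adaptive unit (SAU) each act on a layer-normalized input
    and are added back through skip-connections."""
    def __init__(self, dim: int, heads: int, sau: nn.Module):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)
        self.mhsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sau = sau

    def forward(self, h_c: torch.Tensor) -> torch.Tensor:
        z = self.ln1(h_c)
        attn_out, _ = self.mhsa(z, z, z)           # MHSA(LN(H_C))
        h_t_prime = attn_out + h_c                 # H_T' = MHSA(LN(H_C)) + H_C
        return self.sau(self.ln2(h_t_prime)) + h_t_prime  # Eq. (5)
```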
$$Z = A_{ds} \, \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V \qquad (7)$$

Eq. (7) calculates the weights of the global spatial connections between all EEG channels and fuses the different kinds of extracted features efficiently.
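The adjacency-modulated attention of Eq. (7) can be sketched as follows, assuming per-channel query/key/value projections are already computed.

```python
import torch
import torch.nn.functional as F

def adjacency_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                        a_ds: torch.Tensor) -> torch.Tensor:
    """Sketch of Eq. (7): scaled dot-product attention whose output is
    modulated by the learned non-negative adjacency A_ds, so channel
    couplings reweight the global attention pattern.
    q, k, v: (C, d_k) per-channel projections; a_ds: (C, C)."""
    d_k = q.size(-1)
    attn = F.softmax(q @ k.transpose(-2, -1) / d_k ** 0.5, dim=-1)
    return a_ds @ attn @ v      # Z = A_ds softmax(QK^T / sqrt(d_k)) V
```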
Moreover, we add an Adapter module to the subject-adaptive unit (SAU), alongside the common multi-layer perceptron [23], for the purpose of transfer learning. The architecture of the Adapter module is depicted in Fig. 3; it consists of a bottleneck formed by two feed-forward layers with an ELU activation unit. To make it more parameter-efficient, the Adapter also contains a skip-connection. Adapter modules make general architectural modifications that adapt a pretrained model to the target subject while adding only a small number of new parameters. A detailed description of the transfer learning procedure is given in the next section. Finally, the fused feature map is obtained by processing the output of the self-attention mechanism with matrix multiplication and layer normalization.
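The Adapter described above maps directly to a small module; this is a minimal sketch in which the bottleneck width (16) is an assumed value.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Sketch of the Adapter: a bottleneck of two feed-forward layers
    with an ELU activation in between and a skip-connection, adding
    only a small number of new parameters per block."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)    # down-projection
        self.up = nn.Linear(bottleneck, dim)      # up-projection

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(F.elu(self.down(h)))   # skip-connection
```

The skip-connection means that an Adapter initialized near zero leaves the pretrained block's behavior almost unchanged, which is why only these few parameters need finetuning per target subject.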
2.3 Classification Block

The classification block in the proposed model is designed to give the final affective computing results based on the fused high-level features. We first flatten all the feature maps into a one-dimensional vector and feed them into two fully connected layers. Then, the Softmax function computes the classification probabilities from the output vectors, and the class with the maximum probability is taken as the classification result.
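A minimal sketch of this block follows; the hidden width (128) and the activation between the two fully connected layers are assumptions.

```python
import torch
from torch import nn

class ClassificationBlock(nn.Module):
    """Sketch of the classification block: flatten the fused feature
    maps, apply two fully connected layers, and take the class with
    the maximum Softmax probability."""
    def __init__(self, in_features: int, n_classes: int):
        super().__init__()
        self.fc1 = nn.Linear(in_features, 128)
        self.fc2 = nn.Linear(128, n_classes)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        logits = self.fc2(torch.relu(self.fc1(h.flatten(1))))
        probs = torch.softmax(logits, dim=-1)     # classification probabilities
        return probs.argmax(dim=-1)               # predicted emotion class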
Algorithm 1. The Learning Rate and Parameter Update in the Proposed DBGC-ATFFNet-AFTL Method

Input: Raw EEG signals $x$, the corresponding class labels $y$, the maximum epoch $t$, learning rate $\eta$, regularization weight $\lambda$, the DBGC-ATFFNet-AFTL $\mathrm{Net}(\cdot)$;
Output: The affective computing result $O$;
1: Initialize the parameters of the network as $\theta^{(0)}$;
2: Initialize the data $D_i$ in one batch with $i = 1, 2, \ldots, N$;
3: Initialize $t = 200$, $\eta = 1 \times 10^{-3}$, $q = 0$;
4: while $q \neq t$ do
5:   for $i = 1$ to $N$ do
6:     Generate the conditional probabilities $p_j = \mathrm{Net}(\theta^{(q)}, D_i)$;
7:     Calculate the loss $J^{(q)}$ on $x$ by Eq. (8);
8:     Calculate the gradient $g = \nabla J^{(q)}$;
9:     Update the parameters: $\theta^{(q+1)} \leftarrow \theta^{(q)} - \eta g$;
10:  end for
11:  $q \leftarrow q + 1$;
12: end while
13: Get the classification result $O = \mathrm{Net}(\theta^{(t)}, x)$;
To summarize, the optimization procedure of our proposed DBGC-ATFFNet-AFTL is shown in Algorithm 1. The model is trained by minimizing the cross-entropy loss $J$ between the model predictions and the labels, defined by:

$$J = -\sum_{j=1}^{N} \sum_{i=1}^{M} \log\left(p_i\right) v\left(y_i = l_i\right) + \lambda \lVert \theta \rVert \qquad (8)$$

where $p_i$ is the $i$-th conditional probability generated by the model, $l_i$ is the $i$-th class from the label set $L$, and $v(\cdot)$ is the indicator function. In order to avoid overfitting of the proposed model, we also introduce a trade-off regularization term in Eq. (8), where $\theta$ refers to the learnable parameters of the model and $\lambda$ is the regularization weight.
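A minimal PyTorch-style sketch of Algorithm 1 with the loss of Eq. (8) is given below; the choice of plain SGD and the value of the regularization weight `lam` are assumptions.

```python
import torch
from torch import nn

def train_net(model: nn.Module, loader, epochs: int = 200,
              lr: float = 1e-3, lam: float = 1e-4) -> nn.Module:
    """Sketch of Algorithm 1: minimize the cross-entropy loss of
    Eq. (8) plus a norm penalty over the learnable parameters as the
    trade-off regularization term."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):                       # while q != t
        for x, y in loader:                       # for each batch D_i
            opt.zero_grad()
            loss = ce(model(x), y)                # cross-entropy part of Eq. (8)
            loss = loss + lam * sum(p.norm() for p in model.parameters())
            loss.backward()
            opt.step()                            # theta <- theta - eta * g
    return model
```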
3 EXPERIMENTAL RESULTS

3.1 EEG Datasets

The effectiveness of the proposed DBGC-ATFFNet-AFTL is evaluated on three public EEG datasets, described below:

1) SJTU Emotion EEG Dataset (SEED) [12]: contains EEG data of 15 subjects (7 males and 8 females), collected via 62 EEG electrodes while the subjects watched fifteen Chinese film clips covering three types of emotions, i.e., negative, positive and neutral. Each subject has three sessions and there are 15 trials (5 for each class) per session. The EEG signals were recorded by 62 electrodes and downsampled to 200 Hz.

2) SJTU Emotion EEG Dataset IV (SEED IV) [25]: comprises EEG data of 15 subjects (7 males and 8 females) recorded over 62 channels. The experimental setting is the same as SEED. The data were collected while participants watched movies evoking four types of emotions, namely neutral, sad, fear, and happy; the eye movement features are not used in this paper. Each movie lasts around 2 minutes. Three sessions of data were collected, and each session comprises 24 trials/movies per subject.

3) Database for Emotion Analysis using Physiological Signals (DEAP) [19]: consists of EEG data from 32 subjects (16 males and 16 females) who were shown 40 music videos while their physiological signals were recorded. Additionally, the subjects gave rating values for four emotional dimensions (valence, arousal, liking, and dominance) using continuous values between 1 and 9. Each DEAP recording is 63 s long, sampled at 128 Hz over 32 channels.

As discussed in [26], there are repeated sessions in SEED and the first one reflects stronger emotion feedback that is more reliable than the latter two sessions; we therefore only use the first session of each subject to ensure consistency of evaluation in the experiments of this paper. We strictly follow the evaluation protocol in [5] for all three datasets, and detailed information on the three datasets can be found in Table 1.

TABLE 1
Details of SEED, SEED IV, and DEAP

Item                 SEED        SEED-IV      DEAP
Channel Num          62          62           32
Subject Num          15          15           32
Video Num            15          24           40
Stimulus Materials   Film Clips  Film Clips   Music Videos
Emotion Status       3 classes   4 classes    —
TABLE 2
The Overall Comparison of Classification Performance on SEED Dataset
3.2 Evaluation Metrics and Models

The proposed DBGC-ATFFNet-AFTL is evaluated by the classification accuracy (Acc) [12], F1-score (F1) [27], and area under the curve (AUC) [28], respectively. Moreover, the standard deviation (Std) [29] is introduced to assess the robustness of our proposed model. Specifically, for a certain subject, the Std value is calculated over the accuracies of multiple testing folds, while for the average result, Std is obtained over the average accuracies of all subjects. For comparison with the proposed method, we use three baseline models, all retested on the three datasets in the same experimental environment. We faithfully reproduced these deep learning models; brief introductions of the three baselines are given as follows:

1) DGCNN [17]: This method uses a dynamic adjacency matrix to simulate the channel relationships with shallow DE features, and has shown its ability in EEG feature extraction.

2) 4D-CRNN [18]: This is a CNN-based model which also combines a recurrent neural network and DE features to integrate both spatial and temporal information.

3) resHGCN [27]: This is a deep learning model with a residual architecture and an adjacency matrix encoded by multiple fully connected layers, which largely retains the DE features of the preprocessed EEG signals.

Moreover, to further investigate the importance of each feature in our DBGC-ATFFNet-AFTL, we carry out an ablation study with two simplified models, each of which consists of only one feature branch.

3.3 Performance of Subject-Dependent Experiments

In order to evaluate the efficacy of the proposed framework, we compare our DBGC-ATFFNet-AFTL with the above baseline models. Tables 2 and 3 list the classification results of the subject-dependent experiments on the SEED and SEED IV datasets. From Table 2 we can learn that all three baseline models achieve relatively high average accuracies on the 15 subjects of the SEED dataset. Notably, our proposed DBGC-ATFFNet-AFTL has the highest average Acc, F1, and AUC. In terms of Acc, our DBGC-ATFFNet-AFTL reaches 97.31%, which is 1.93%, 2.87%, and 7.92% higher than the baseline methods respectively. The obvious improvement in Acc shows that our proposed method better fuses the two types of features; in the meantime, the relatively low standard deviation of our method indicates its robust high-level feature extraction.

In terms of the SEED IV dataset, our proposed approach achieves 89.97%, 0.898 and 0.971 on Acc, F1, and AUC, performing better than all three baseline models. Moreover, a relatively low standard deviation of 2.85% demonstrates the robustness of our proposed method. Since there are in total 32 subjects in the DEAP dataset, we summarize those experimental results in Fig. 4 due to space limitations. The confusion matrices are utilized for validation in Fig. 5. We find that our proposed DBGC-ATFFNet-AFTL achieves encouraging results for VA-space (4-class: HVHA, HVLA, LVHA, LVLA) classification on the Acc, F1 and AUC metrics. Our DBGC-ATFFNet gains outstanding accuracy results on all three datasets, with an impressive recognition performance close to 1 for the three kinds of emotions on the SEED dataset. As a result, the above experimental results prove the ability of our proposed method in EEG emotion classification.

To demonstrate the superiority of our DBGC-ATFFNet-AFTL on emotion recognition, Table 4 compares classification results reported in recent years on both the SEED and SEED IV datasets. Since most methods only report Acc and Std in their papers, we use these two metrics for comparison here. From Table 4, we can learn that our proposed method has a better classification performance than most of these methods on the SEED and SEED IV datasets. Compared with the methods which adopt a single DE feature with GNNs, our method fuses both the DE and PSD features and gains accuracy increases of 1.21% and 7.24% on the subject-dependent task of the SEED and SEED IV datasets. Moreover, in comparison with 4D-CRNN, which uses convolution operations, our proposed method with the Adaptive Transformer attends not only to the spatial location relationships but also to the global connections of
TABLE 3
The Overall Comparison of Classification Performance on SEED IV Dataset

different EEG channels, reaching 3.32% higher accuracy. Especially on the SEED IV dataset, which contains shorter EEG recordings, our method learns the common pattern of the feature distribution across subjects and shows a more robust classification performance. Additionally, the performance comparison on the three classification tasks of the DEAP dataset is listed in Table 5; our proposed method performs better on the Valence, Arousal and VA-Space classification tasks and gains relatively low standard deviations of 3.27%, 2.89% and 3.10% respectively. Consequently, it is clear that our DBGC-ATFFNet-AFTL can distinguish between EEG signals of different emotions more accurately and effectively.

TABLE 5
Comparison With the State-of-the-Art Methods of Subject-Dependent on DEAP Dataset

3.4 Performance of Cross-Subject Experiments

Our proposed method also achieves an impressive classification performance in cross-subject experiments. As DEAP is a dimensional dataset [19] and there was no previous work adopting this dataset for similar testing, the experiments are carried out on the SEED and SEED IV datasets for further comparison. There are in total 15 subjects in each dataset; to apply the adapter-finetuned transfer learning algorithm, we split the whole dataset into two parts: the EEG data of 14 subjects is used for pretraining, and the EEG data of the remaining subject is used for finetuning and for evaluating the performance of the proposed method. The detailed process is depicted in Fig. 1.

Firstly, our proposed approach is pretrained on the source data; we then freeze all the learnable parameters except the Adapter modules. It is proven through a series of experiments that our adapter-finetuned transfer learning strategy achieves the optimal classification performance with a relatively low amount of data when 50 percent of the target samples are randomly chosen to finetune the parameters of the Adapters. Finally, based on the well-trained model, we carry out the evaluation experiments on the remaining EEG data of the target subject. To demonstrate the superiority of our Adapter-finetuning method on cross-subject emotion recognition, Table 6 compares classification results reported in recent years on both the SEED and SEED IV datasets. Our method gains accuracies of 94.39% and 89.78% on the SEED and SEED IV datasets respectively. Especially on the SEED IV dataset, which contains shorter EEG recordings, our proposed method learns the common pattern of the feature distribution from the source subjects and shows a more robust classification performance on the target subject.

TABLE 6
Comparison With the State-of-the-Art Methods of Cross-Subject on SEED and SEED IV Datasets

Method       Year   SEED Acc (Std %)   SEED-IV Acc (Std %)
DGCNN [17]   2018   79.95 (9.02)       —
RGNN [30]    2020   85.30 (6.72)       73.84 (8.02)
TANN [34]    2021   84.41 (8.75)       68.00 (8.35)
BiHDM [35]   2021   85.40 (7.53)       69.03 (8.66)
Ours         2022   94.39 (3.23)       89.78 (3.09)

Bold fonts indicate best results.
4 DISCUSSIONS

4.1 Efficacy of the Dual-Branch Dynamic Graph Convolution Network

To validate the superiority of our extracted features, t-SNE visualization is utilized. t-SNE visualizes the extracted EEG features in a 2D embedding space. The experimental results for all three datasets are presented in Fig. 6. From the t-SNE plots, compared with the methods that apply DGCNN [17], 4D-CRNN [18] and resHGCN [27] to single DE features, our proposed method obtains an outstanding classification performance with considerable inter-class distance on the discrete emotion datasets SEED and SEED IV, while there is a relatively dense distribution of features for DEAP, which is a dimensional emotion dataset. It is obvious that the introduction of PSD features compensates for the lack of information in the spectral domain of the EEG signals. Additionally, we convolve these two types of pre-computed features with dual branches of graph convolution to make the most of their different distributions in the temporal and spectral domains. This enhancement indicates that it is helpful to input more pre-processed features, and that there is probably complementary information in these two types of features, both beneficial to the extraction of high-level features.

Fig. 6. The t-SNE visualization in 2D embedding space of different types of features on all three datasets.
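A visualization of this kind can be reproduced with a short sketch; the perplexity value and plotting style are assumed defaults, not the paper's exact settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features: np.ndarray, labels: np.ndarray, title: str) -> None:
    """Sketch of the Fig. 6 visualization: project extracted EEG
    features into a 2D embedding with t-SNE and color the points by
    emotion label."""
    emb = TSNE(n_components=2, perplexity=30.0, init="pca").fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=5)
    plt.title(title)
    plt.show()
```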
TABLE 7
Ablation Study of the Adaptive Transformer Feature Fusion Network on All Three Datasets

Method        SEED Acc (Std %)   SEED IV Acc (Std %)   DEAP Acc (Std %)
w/o ATFFNet   95.44 (3.06)       88.75 (3.72)          90.46 (3.98)
w/ ATFFNet    97.31 (1.47)       89.97 (2.85)          91.98 (3.10)

Bold fonts indicate best results; "w/o" denotes "without" and "w/" denotes "with".

4.2 Efficacy of the Adaptive Transformer Feature Fusion Network

To assess the effectiveness of the Adaptive Transformer Feature Fusion Network, we conduct ablation experiments. From the results listed in Table 7, we can find that accuracy increases from 95.44%, 88.75% and 90.46% to 97.31%,
89.97%, and 91.98% on the three datasets, with relatively low standard deviations. In general, the outstanding performance of our DBGC-ATFFNet-AFTL method indicates that the ATFFNet unit effectively fuses the previously extracted features while exploring the connections between EEG channels more comprehensively. Unlike the 2D and 3D convolutions commonly used in previous works [18], the Adaptive Transformer in our ATFFNet avoids over-dependence on the explicit spatial location distribution. Moreover, features in sequential form help decrease the number of trainable parameters of the model and reduce the risk of overfitting. Additionally, the Channel-Weight unit reinforces the model's learning of the channel connections with the existing trainable parameters.

4.3 Efficacy of the Adapter-Finetuned Transfer Learning

In order to assess the effectiveness of our adapter-finetuned transfer learning method, we apply the same protocol to the corresponding subject-dependent experiments; the results of the two training methods on the SEED and SEED IV datasets are compared in Fig. 7 and Table 8, respectively. We explore the effectiveness of the Adapter-finetuning method from two perspectives. In terms of classification performance, from Fig. 7 we can learn that the proposed DBGC-ATFFNet-AFTL with finetuned Adapters reaches average accuracies of 94.39% and 89.78% on the two datasets, both higher than the non-transfer DBGC-ATFFNet method. Especially on the SEED IV dataset, which has a relatively small number of EEG samples, the accuracy of the model using transfer learning is 4.81% higher than the non-transfer one. The experimental results indicate that the transfer learning strategy indeed helps the model extract higher-level features that are easily missed when analyzing a single subject. Thus, it is clear that transfer learning based on finetuning the Adapter modules contributes to cross-subject EEG emotion classification. In terms of computational cost, as shown in Table 8, our proposed Adapter-finetuned DBGC-ATFFNet spends less time on training and involves fewer parameters, which enhances the efficiency of transfer learning. It is worth mentioning that the training and testing of the deep learning model are performed on an Nvidia Tesla V100 GPU with 32 GB memory. In this environment, it takes on average just 53.3 s and 22.2 s to finish the finetuning step and apply the pretrained model to the target subject on the SEED and SEED IV datasets respectively, while also significantly reducing the number of trainable parameters compared with the non-transfer model.

TABLE 8
Comparison of Parameters and Computational Cost on SEED and SEED IV Datasets

4.4 Saliency Map Analysis of EEG Channels

In order to demonstrate the interpretability of our proposed approach, we apply the saliency map method [36], [37] based on gradient propagation, which is widely used in the computer vision area. Fig. 8 depicts the average neural patterns for positive, neutral and negative emotions in the five frequency bands. The visualization on the scalp maps presents the spatial distributions relevant to the emotion recognition tasks, directly reflecting how our proposed method behaves when making inferences on EEG signals. For example, from Fig. 8, the emotionally active areas are mainly concentrated at the left prefrontal and parietal sites. Brain activities associated with emotions generally have a significant response in the β band. The findings of these saliency maps are in line with existing emotion studies [38], [39], [40]. Beyond these findings, we can further learn that the responses of both the β and γ bands are more notable in all three kinds of emotional states. For neutral emotions, the neural patterns tend to be gentler compared with the positive and negative ones, while for negative emotions, there
are significantly higher β responses at the prefrontal cortex, and the lateral temporal and parietal areas activate more, similarly to positive emotions.
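A per-channel gradient saliency of the kind underlying Fig. 8 can be sketched as follows; the function name and the reduction over frequency bands are illustrative assumptions, and the model is assumed to output class logits.

```python
import torch
from torch import nn

def channel_saliency(model: nn.Module, x: torch.Tensor,
                     target_class: int) -> torch.Tensor:
    """Sketch of the gradient-based saliency map behind Fig. 8:
    backpropagate the target-class score to the input features and
    take the absolute gradient, reduced to one value per channel,
    which can then be rendered as a scalp topography.
    x: a single input of shape (1, C, B)."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]             # class score for this input
    score.backward()
    return x.grad.abs().amax(dim=-1).squeeze(0)   # (C,) per-channel saliency
```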
4.5 Limitations and Future Directions

Although the proposed method achieves outstanding classification results, our present work still suffers from several limitations. First, the features extracted from the preprocessed EEG signals are constrained by prior knowledge, which may lead to the loss of useful information hidden in the original EEG signals. Therefore, an important piece of future work is to handle affective computing tasks with different types of feature maps extracted directly from the raw signals by end-to-end deep learning models. Second, although our method shows its effectiveness in cross-subject emotion classification, we involve all the available EEG channels in transfer learning. It is possible that several channels make a greater contribution to affective computing while others provide noisy signals that are harmful to the analysis. Thus, in future work we will consider adaptively selecting a few EEG channels with outstanding performance to improve our method.

5 CONCLUSION

In this article, we propose a novel dual-branch dynamic graph convolution based adaptive Transformer feature fusion network (DBGC-ATFFNet-AFTL) for EEG emotion recognition. Specifically, our proposed DBGC-ATFFNet-AFTL first computes differential entropy and power spectral density features from the preprocessed EEG signals. Next, the dual-branch dynamic graph convolution network extracts high-level EEG information of different emotions through two branches of feature inputs. Moreover, the adaptive Transformer feature fusion network is further employed to fuse the obtained features while simultaneously learning the connection relationships of the EEG channels. Additionally, we introduce adapter-finetuned transfer learning to the cross-subject emotion classification task by finetuning the Adapter modules. We conduct experiments on three public emotional EEG datasets to evaluate the effectiveness of our proposed DBGC-ATFFNet-AFTL method. The experimental results show that our proposed method performs better in accuracy, F1-score and AUC value. Detailed discussions demonstrate that our proposed approach is able to effectively learn emotion information from EEG signals, giving it a promising application prospect in the affective computing area.

REFERENCES

[1] L. H. He, D. Hu, M. Wan, Y. Wen, K. M. Von Deneen, and M. C. Zhou, "Common Bayesian network for classification of EEG-based multiclass motor imagery BCI," IEEE Trans. Syst., Man, Cybern.: Syst., vol. 46, no. 6, pp. 843-854, Jun. 2016.
[2] L. Fiorini, G. Mancioppi, F. Semeraro, H. Fujita, and F. Cavallo, "Unsupervised emotional state classification through physiological parameters for social robotics applications," Knowl.-Based Syst., vol. 190, 2020, Art. no. 105217.
[3] S. Katsigiannis and N. Ramzan, "DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices," IEEE J. Biomed. Health Inform., vol. 22, no. 1, pp. 98-107, Jan. 2018.
[4] J. J. Yan, W. M. Zheng, Q. Y. Xu, G. M. Lu, H. B. Li, and B. Wang, "Sparse kernel reduced-rank regression for bimodal emotion recognition from facial expression and speech," IEEE Trans. Multimedia, vol. 18, no. 7, pp. 1319-1329, Jul. 2016.
[5] W. L. Zheng, J. Y. Zhu, and B. L. Lu, "Identifying stable patterns over time for emotion recognition from EEG," IEEE Trans. Affect. Comput., vol. 10, no. 3, pp. 417-429, Jul.-Sep. 2017.
[6] J. C. Britton, K. L. Phan, S. F. Taylor, R. C. Welsh, K. C. Berridge, and I. Liberzon, "Neural correlates of social and nonsocial emotions: An fMRI study," Neuroimage, vol. 31, no. 1, pp. 397-409, 2006.
[7] E. Lotfi and M.-R. Akbarzadeh-T, "Practical emotional neural networks," Neural Netw., vol. 59, pp. 61-72, 2014.
[8] G. Pfurtscheller et al., "The hybrid BCI," Front. Neurosci., vol. 4, 2010, Art. no. 3.
[9] S. Alhagry, A. A. Fahmy, and R. A. El-Khoribi, "Emotion recognition based on EEG using LSTM recurrent neural network," Emotion, vol. 8, no. 10, pp. 355-358, 2017.
[10] M. Murugappan, M. Rizon, R. Nagarajan, and S. Yaacob, "Inferring of human emotional states using multichannel EEG," Eur. J. Sci. Res., vol. 48, no. 2, pp. 281-299, 2010.
[11] B. Reuderink, C. Mühl, and M. Poel, "Valence, arousal and dominance in the EEG during game play," Int. J. Auton. Adaptive Commun. Syst., vol. 6, no. 1, pp. 45-62, 2013.
[12] W. L. Zheng and B. L. Lu, "Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks," IEEE Trans. Auton. Ment. Develop., vol. 7, no. 3, pp. 162-175, Sep. 2015.
[13] W. L. Zheng, J. Y. Zhu, Y. Peng, and B. L. Lu, "EEG-based emotion classification using deep belief networks," in Proc. IEEE Int. Conf. Multimedia Expo, 2014, pp. 1-6.
[14] G. W. Xiao, M. Shi, M. W. Ye, B. W. Xu, Z. D. Chen, and Q. S. Ren, "4D attention-based neural network for EEG emotion recognition," Cogn. Neurodyn., vol. 16, pp. 805-818, 2022.
[15] H. Zeng et al., "EEG emotion classification using an improved SincNet-based deep learning model," Brain Sci., vol. 9, no. 11, 2019, Art. no. 326.
[16] Y. Li, J. Y. Liu, Z. Y. Tang, and B. Y. Lei, "Deep spatial-temporal feature fusion from adaptive dynamic functional connectivity for MCI identification," IEEE Trans. Med. Imag., vol. 39, no. 9, pp. 2818-2830, Sep. 2020.
[17] T. F. Song, W. M. Zheng, P. Song, and Z. Cui, "EEG emotion recognition using dynamical graph convolutional neural networks," IEEE Trans. Affect. Comput., vol. 11, no. 3, pp. 532-541, Third Quarter 2020.
[18] F. Y. Shen, G. J. Dai, G. Lin, J. H. Zhang, W. Z. Kong, and H. Zeng, "EEG-based emotion recognition using 4D convolutional recurrent neural network," Cogn. Neurodyn., vol. 14, no. 6, pp. 815-828, 2020.
[19] S. Koelstra et al., "DEAP: A database for emotion analysis using physiological signals," IEEE Trans. Affect. Comput., vol. 3, no. 1, pp. 18-31, Jan.-Mar. 2011.
[20] L. N. Wang et al., "Automatic epileptic seizure detection in EEG signals using multi-domain feature extraction and nonlinear analysis," Entropy, vol. 19, no. 6, 2017, Art. no. 222.
[21] D. A. Spielman, "Spectral graph theory and its applications," in Proc. 48th Annu. IEEE Symp. Foundations Comput. Sci., 2007, pp. 29-38.
[22] Y. Li, Y. Liu, W. G. Cui, Y. Z. Guo, H. Huang, and Z. Y. Hu, "Epileptic seizure detection in EEG signals using a unified temporal-spectral squeeze-and-excitation network," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 28, no. 4, pp. 782-794, Apr. 2020.
[23] A. Vaswani et al., "Attention is all you need," in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 5998-6008.
[24] Z. Liu et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012-10022.
[25] W. L. Zheng, W. Liu, Y. F. Lu, B. L. Lu, and A. Cichocki, "EmotionMeter: A multimodal framework for recognizing human emotions," IEEE Trans. Cybern., vol. 49, no. 3, pp. 1110-1122, Mar. 2019.
[26] G. H. Zhang, M. J. Yu, Y. J. Liu, G. Z. Zhao, D. Zhang, and W. M. Zheng, "SparseDGCNN: Recognizing emotion from multichannel EEG signals," IEEE Trans. Affect. Comput., to be published, doi: 10.1109/TAFFC.2021.3051332.
[27] Y. Li, Y. Liu, Y. Z. Guo, X. F. Liao, B. Hu, and T. Yu, "Spatio-temporal-spectral hierarchical graph convolutional network with semisupervised active learning for patient-specific seizure prediction," IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2021.3071860.
[28] K. Li et al., "Multi-label spacecraft electrical signal classification method based on DBN and random forest," PLoS One, vol. 12, no. 5, 2017, Art. no. e0176614.
[29] Y. Li, L. H. Guo, Y. Liu, J. Y. Liu, and F. G. Meng, "A temporal-spectral-based squeeze-and-excitation feature fusion network for motor imagery EEG decoding," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 29, pp. 1534-1545, 2021.
[30] P. X. Zhong, D. Wang, and C. Y. Miao, "EEG-based emotion recognition using regularized graph neural networks," IEEE Trans. Affect. Comput., to be published, doi: 10.1109/TAFFC.2020.2994159.
[31] J. Y. Liu, Y. X. Zhao, H. Wu, and D. M. Jiang, "Positional-spectral-temporal attention in 3D convolutional neural networks for EEG emotion recognition," 2021, arXiv:2110.09955.
[32] J. X. Ma, H. Tang, W. L. Zheng, and B. L. Lu, "Emotion recognition using multimodal residual LSTM network," in Proc. 27th ACM Int. Conf. Multimedia, 2019, pp. 176-183.
[33] W. Tao et al., "EEG-based emotion recognition via channel-wise attention and self attention," IEEE Trans. Affect. Comput., to be published, doi: 10.1007/S42486-021-00078-Y.
[34] Y. Li, B. X. Fu, F. Li, G. M. Shi, and W. M. Zheng, "A novel transferability attention neural network model for EEG emotion recognition," Neurocomputing, vol. 447, pp. 92-101, 2021.
[35] Y. Li et al., "A novel bi-hemispheric discrepancy model for EEG emotion recognition," IEEE Trans. Cogn. Develop. Syst., vol. 13, no. 2, pp. 354-367, Jun. 2021.
[36] Y. Li, M. Y. Lei, W. G. Cui, Y. Z. Guo, and H. L. Wei, "A parametric time-frequency conditional Granger causality method using ultra-regularized orthogonal least squares and multiwavelets for dynamic connectivity analysis in EEGs," IEEE Trans. Biomed. Eng., vol. 66, no. 12, pp. 3509-3525, Dec. 2019.
[37] Y. Li, H. Yang, B. Y. Lei, J. Y. Liu, and C. Y. Wee, "Novel effective connectivity inference using ultra-group constrained orthogonal forward regression and elastic multilayer perceptron classifier for MCI identification," IEEE Trans. Med. Imag., vol. 38, no. 5, pp. 1227-1239, May 2019.
[38] S. K. Hadjidimitriou and L. J. Hadjileontiadis, "Toward an EEG-based recognition of music liking using time-frequency analysis," IEEE Trans. Biomed. Eng., vol. 59, no. 12, pp. 3498-3510, Dec. 2012.
[39] R. Jenke, A. Peer, and M. Buss, "Feature extraction and selection for emotion recognition from EEG," IEEE Trans. Affect. Comput., vol. 5, no. 3, pp. 327-339, Jul.-Sep. 2014.
[40] M. Balconi and C. Lucchiari, "Consciousness and arousal effects on emotional face processing as revealed by brain oscillations. A gamma band analysis," Int. J. Psychophysiol., vol. 67, no. 1, pp. 41-46, 2008.

Mingyi Sun received the bachelor's degree in engineering from Beihang University, Beijing, China, in 2021, where he is currently working toward the master's degree with the Department of Automation Science and Electrical Engineering. His current research interests include signal processing, machine learning, and brain-computer interface.

Shuyue Yu received the BS degree in measurement, control technology, and instrument and the MS degree in control science and engineering from the Beijing University of Posts and Telecommunications, Beijing, China, in 2016 and 2019, respectively. She is currently an engineer with Beijing Aerospace Measurement and Control Technology Company, Ltd. Her current research interests include robotics and BCI.

Hongbin Han received the undergraduate and graduate degrees in clinical medicine from Dalian Medical University, Dalian, China, in 1996, and the doctor of radiology degree from Peking University Health Center, Beijing, China, in 1998. He completed the radiology residency with the Peking University Health Center, joined the faculty of the Department of Radiology, Peking University Third Hospital in 1998, and is currently a professor with the Radiology Department. He established the Beijing MRI Technology Research Laboratory and has served as its Director since 2010. His research interests include the development and application of advanced MRI techniques for the diagnosis and therapy of human brain diseases.

Bin Hu (Member, IEEE) received the PhD degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, China, in 1998. Since 2008, he has been a professor with the School of Information Science and Engineering, Lanzhou University, China. He also held a guest professorship at ETH Zurich, Switzerland, until 2011. His research interests include pervasive computing, computational psychophysiology, and data modeling.

Yang Li received the PhD degree in automatic control and systems engineering from the University of Sheffield, Sheffield, U.K., in 2011. He did post-doctoral research with the Department of Computer and Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC, for one year. In 2013, he joined the Department of Automation Sciences and Electrical Engineering, Beihang University, Beijing, China, as a professor. His current research interests include system identification and modeling for complex nonlinear processes (NARMAX methodology and applications), nonstationary signal processing and sparse representation, medical image analysis, and brain-computer interface.