
2570 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 24, NO. 9, SEPTEMBER 2020

Motor Imagery Classification via Temporal Attention Cues of Graph Embedded EEG Signals

Dalin Zhang, Student Member, IEEE, Kaixuan Chen, Student Member, IEEE, Debao Jian, and Lina Yao, Member, IEEE

Abstract: Motor imagery classification from EEG signals is essential for motor rehabilitation with a Brain-Computer Interface (BCI). Most current works on this issue require a subject-specific adaptation step before being applied to a new user. Thus, research on directly extending a pre-trained model to new users is particularly desired and indispensable. As brain dynamics fluctuate considerably across different subjects, it is challenging to design practical hand-crafted features based on prior knowledge. To address this gap, this paper proposes a Graph-based Convolutional Recurrent Attention Model (G-CRAM) to explore EEG features across different subjects for motor imagery classification. A graph structure is first developed to represent the positioning information of EEG nodes. Then a convolutional recurrent attention model learns EEG features from both spatial and temporal dimensions and emphasizes the most distinguishable temporal periods. We evaluate the proposed approach on two benchmark EEG datasets of motor imagery classification under subject-independent testing. The results show that the G-CRAM achieves superior performance to state-of-the-art methods regarding recognition accuracy and ROC-AUC. Furthermore, model interpretation studies reveal the learning process of different neural network components and demonstrate that the proposed model can extract detailed features efficiently.

Index Terms: EEG, Motor Imagery, Deep Learning.

Manuscript received September 9, 2019; revised December 18, 2019; accepted January 12, 2020. Date of publication January 16, 2020; date of current version September 3, 2020. (Corresponding author: Dalin Zhang.)
The authors are with the School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: dalin.[email protected]; [email protected]; [email protected]; [email protected]).
This article has supplementary downloadable material available at https://ieeexplore.ieee.org, provided by the authors.
Digital Object Identifier 10.1109/JBHI.2020.2967128
2168-2194 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

MOTOR imagery classification is fundamental to a BCI, which supports motor rehabilitation of post-stroke patients [1]. EEG signals, which are captured from the human scalp and thus reflect the electrical activities of the human cortex, are among the most actively studied physiological cues for building a BCI system. Researchers have widely explored the EEG-based BCI due to its zero clinical risks as well as its portable and cost-effective acquisition devices.

In recent years, there have been substantial achievements in EEG-based motor imagery classification. However, the outstanding works generally focus on the subject-dependent scenario, where training and test data are from the same group of subjects [2]. In this condition, a brief calibration session is essential before a BCI system is ready to be used by a new user [3]. This adaptation process needs to be performed on each new subject and in each usage, which is labor-intensive and time-consuming, resulting in limited usability and scalability of BCI systems. It is essential to overcome this subject-independent issue. However, the apparent changes in EEG signals across different subjects cause enormous challenges in solving such a problem [4].

Traditional EEG analysis methods depend on hand-crafted features and subsequent machine learning algorithms. One of the most popular hand-crafted features is the power spectral density (PSD). The phenomenon of EEG power in some frequency bands increasing/decreasing, which is called event-related synchronization/desynchronization (ERS/ERD), is widely observed when analyzing the PSD patterns of motor imagery EEG signals [5]. In the meantime, not all EEG nodes can provide distinguishable information in terms of PSD features. An EEG channel selection approach is usually preferred to choose the most discriminative EEG nodes [6]. C3, C4, and Cz are three commonly reported channels that are most useful for motor imagery classification. However, there are some drawbacks when using the hand-crafted features. First, previous studies disagree on the range of some frequency bands. For example, [7] defined the mu band between 8–13 Hz, while [8] defined the mu band between 8–12 Hz. Second, the number of effective EEG nodes that are selected by a channel selection algorithm is generally decided by an expert's knowledge and experience. Third, in these traditional works, all steps are separated, which not only wastes time but also removes the potential for different steps to promote each other during the feature learning process. Even though powerful state-of-the-art classifiers have been used on hand-crafted features and achieved partial improvements in performance [9], human-designed features may neglect the critical information within raw EEG signals [10].

In contrast to hand-crafted features, deep learning methods can learn the underlying information across different subjects automatically [11]. Considerable effort has been devoted to
developing EEG analysis approaches using deep learning techniques and has achieved promising results [12]–[15]. Reference [13] presents a compact convolutional neural network (CNN) and demonstrates its success on different EEG paradigms. To make use of the temporal dynamics efficiently, [14] proposes to adopt a recurrent neural network (RNN) of long short-term memory (LSTM) cells besides CNNs. Some works also combine traditional spectral features and deep learning methods [12], [15]. Despite the success of deep learning in EEG analysis, few deep learning works build a motor imagery classification model demonstrating generalization abilities on new subjects [16], [17].

To solve the subject-independent problem of EEG-based motor imagery classification, this work proposes a novel Graph-based Convolutional Recurrent Attention Model (G-CRAM) that efficiently learns the spatial information with the aid of graph representations of EEG nodes and extracts attentional temporal dynamics using a recurrent attention network. First, we utilize the spatial positioning of EEG nodes to form the EEG graph to explicitly exhibit the spatial information of EEG node connections. Different from previous spatial filtering methods, the proposed graph embedding does not rely on subjects or tasks, thus being more robust on new subjects. Then, a sliding window technique is used to split the EEG representations into multiple consecutive temporal slices, and a specifically designed CNN structure learns spatio-temporal traits within an EEG temporal slice. Lastly, we employ a recurrent attention network to acquire the temporal dependencies across different EEG temporal slices. In the standard recurrent module, the temporal cues are usually accumulated to the last time step and then used for classification, so some critical information in early time steps may be omitted. In contrast, the proposed model assigns weights to different temporal cues and aggregates all information for the final classification. This study employs two benchmark EEG motor imagery datasets to validate the proposed method in a subject-independent manner and demonstrates its superiority to a series of comparison approaches. Besides the performance improvement, understanding how the model works is also a critical and attractive research topic. Therefore, in addition to the overall performance evaluation, insights into how each part of the presented model works are also investigated and discussed in detail. A preliminary version of this work has been reported [18]. The implementation code is made publicly available.1 The main contributions of this work are summarized as follows:

• This work designs a novel deep learning method for the EEG motor imagery classification task. The proposed model uses a graph embedding to represent EEG spatial information, which differs from other spatial filtering methods by its independence of subjects and tasks. A recurrent attention module is then developed to assign weights to different temporal cues, instead of relying on accumulative temporal information by a standard recurrent network;
• We carry out comprehensive experiments on two benchmark datasets with the subject-independent setting, which is rarely reported in previous studies. The results show the superior generalization performance of the proposed method on new subjects;
• We provide detailed insight discussions on the model interpretation of the neural network and the performance impact of the graph representation. The CNN prefers to focus on small brain areas, and the recurrent attention network not only focuses on the last temporal step but also gives high weights to early steps. The results also indicate that the graph embedding has more impact on the dataset with a larger number of EEG nodes.

1 [Online]. Available: https://github.com/dalinzhang/GCRAM

II. RELATED WORK

A. EEG Motor Imagery Classification

EEG-based motor imagery classification is the basis of many synchronous BCIs, and abundant approaches have been published on this task. Common Spatial Pattern (CSP) is one of the most popular and effective feature extraction methods in motor imagery EEG classification [19]. It is a spatial filtering approach that tries to find a linear combination of EEG channels such that the power difference of different motor imagery classes is maximized [20]. Many works have been reported to extend CSP and achieve remarkable improvement [21], [22]. One of its most successful variants is the Filter Bank CSP (FBCSP), which addresses CSP's drawback of depending on a particular frequency band by applying CSP to different frequency bands and selecting subject-specific features by a feature selection method [22]. FBCSP was the state-of-the-art method in motor imagery EEG classification and has provided excellent results [23]. In terms of classifiers, traditional algorithms like SVM and LDA are commonly used in many EEG motor imagery studies [22], [24].

Due to its end-to-end structure and superior performance, deep learning has been applied to classifying motor imagery in some studies. [24] proposes a carefully designed CNN and a crop training strategy and achieves better performance than FBCSP. [13] designs a lightweight CNN that shows competitive performance to state-of-the-art methods in diverse BCI paradigms, such as motor imagery and P300. RNN is also extensively used to extract temporal features in the motor imagery classification task. [25] proposes to fuse the CNN and RNN with fuzzy integral and leverage the reinforcement learning technique to optimize the fuzzy measures. Feature engineering is also used to improve the performance of deep learning models. Power spectral features are popular for motor imagery classification due to previous evidence of their discriminative ability [26]. [27] proposes to use the CNN and the power spectral density features for motor imagery classification and achieves competitive accuracy.

B. Graph Theory and Attention Model

Graph theory can be used to model many types of relations and has been applied to different areas such as human action recognition [28], recommendation systems [29], and anomaly detection [30]. Kipf et al. propose Graph Convolutional Networks (GCN) for semi-supervised citation-based document classification [31]. Given the labels of some nodes, a neural
network tries to infer the labels of the remaining nodes with the aid of graph theory, which represents the relations between all the nodes. Considering that the spatial relations between different EEG nodes are crucial to successful EEG analysis [7], [12], [32], we propose to utilize graph theory to represent the spatial relationships of EEG nodes in this work.

The attention-based neural network architecture has shown promising power on various tasks, like speech recognition [33], machine translation [34], and activity recognition [35], [36]. The self-attention mechanism is a specific and powerful soft attention mechanism [34]. Vaswani et al. rely entirely on the self-attention mechanism for the sequence-to-sequence task without using traditional CNN or RNN structures and achieve state-of-the-art results [37]. It has also shown promising performance in various tasks like abstractive summarization [38], intention recognition [39], and textual entailment [40]. Given this characteristic of the self-attention mechanism and the fact that human attention is focused on different periods, we adopt the self-attention mechanism to emphasize different EEG temporal periods.

III. METHODOLOGY

A. Problem Definition

Before going into the methodology details, we first briefly introduce the motor imagery experiment process and formally define the research problem. Fig. 1 shows the timing scheme of a typical motor imagery experiment. The beep and cue are used to alert and instruct the subject to perform the motor imagery task. The duration of motor imagery is of research interest. Formally, the duration of interest is $T$ seconds long. Each of the $n$ EEG nodes has a sensor recording sequence $r_{i \in [1,n]} = [s_{i1}, s_{i2}, \ldots, s_{ik}] \in \mathbb{R}^{k}$ through $k = T \times f$ time points, where $f$ is the sampling frequency and $s_{it}$ is the measurement of the $i$th EEG sensor at the time point $t$. Thus the raw EEG features of the trial $T$ form a two-dimensional (2D) tensor $X_T = [r_1; r_2; \ldots; r_n] \in \mathbb{R}^{n \times k}$ with one dimension representing EEG nodes and the other representing time series. Our goal is to make motor imagery classification of the EEG trials $X_T$. The following experiment results are based on single-trial subject-independent testing, where training and testing trials are drawn from different groups of subjects.

Fig. 1. A timing scheme of a motor imagery EEG acquisition experiment.

B. Pipeline Overview

Fig. 2 shows an overview of our proposed approach. The EEG signals are first embedded by a graph representation. Three different graph embedding schemes are developed based on different considerations. The embedded EEG signals are then cut into slices along the time dimension and fed into the attention-based neural network, which can better extract the temporal features for the motor imagery classification. The model is an end-to-end framework that can be trained by standard back-propagation.

Fig. 2. Overview of the graph Convolutional Recurrent Attention Model (G-CRAM) on EEG motor imagery classification. We first represent the raw EEG measurement by a spatial graph drawn from EEG node positions; then we apply a sliding window technique to crop continuous EEG sequences into temporal slices and utilize a CNN layer to extract spatio-temporal features of each slice; a recurrent attention layer is used to extract the attentive temporal dynamic features; lastly the extracted features are classified to the target using a dense layer and a standard softmax classifier.

C. Represent EEG Node Connections

In the node dimension of $X_T$, one EEG node has at most two neighbors. Such a representation cannot fully reflect the real-world situation where an EEG node usually has multiple neighboring nodes acquiring EEG signals of a certain brain area. Thus representing the relations of different EEG nodes is essential to successful EEG analysis. In our work, we leverage the EEG node positioning to form graph representations of EEG nodes, which include spatial information of the natural EEG node connections. In particular, we construct an undirected spatial graph $G = (V, E)$ on the EEG node positioning. The node set $V = \{s_i \mid i \in [1, n]\}$ includes all the EEG nodes in an experiment. Depending on the structure of the adjacency matrix of EEG nodes, we design three EEG representation graphs: N-Graph (NG), D-Graph (DG), and S-Graph (SG). The graph definition enhances the brain area representation ability of EEG signals and decreases the effect of noise on each EEG node by combining neighboring nodes to represent the central one. This design also makes the EEG representations robust to missing value issues by embedding each EEG node with the assistance of its neighboring nodes instead of only relying on the measurement of itself.

1) N-Graph: Fig. 3 shows an example positioning of 64-channel EEG nodes. In the 2D position projection (Fig. 3(b)), each node has several natural neighbors (up, down, left, right, up-left, up-right, down-left, and down-right); for example, the node $s_{11}$ has eight neighbouring nodes ($s_3, s_4, s_5, s_{12}, s_{19}, s_{18}, s_{17}, s_{10}$). Based on this observation, we build a connection between two naturally neighboring EEG nodes. Formally, the edge set can be denoted as $E_v = \{s_i s_j \mid (i, j) \in H\}$, where $H$ is the set of naturally neighboring EEG nodes. We also regard each node as connecting to itself. We can define the adjacency matrix of the N-Graph as a square matrix $|V| \times |V|$ with its binary element representing whether two EEG nodes are neighboring to each other:

$$A_{ij} = \begin{cases} 1 & \text{if } s_i s_j \in E_v \\ 0 & \text{else.} \end{cases}$$

We then follow the spectral graph theory [31] to normalize the adjacency matrix: $\hat{A}_v = \tilde{D}_v^{-\frac{1}{2}} \tilde{A}_v \tilde{D}_v^{-\frac{1}{2}}$, where $\tilde{A}_v = A_v + I_n$, $\tilde{D}_v = \mathrm{diag}\left(\sum_j \tilde{A}_{1j}, \sum_j \tilde{A}_{2j}, \ldots, \sum_j \tilde{A}_{|V|j}\right)$ is the diagonal node degree matrix, and $\tilde{D}_v^{-\frac{1}{2}} = \mathrm{diag}\left(\frac{1}{\sqrt{\sum_j \tilde{A}_{1j}}}, \frac{1}{\sqrt{\sum_j \tilde{A}_{2j}}}, \ldots, \frac{1}{\sqrt{\sum_j \tilde{A}_{|V|j}}}\right)$. Then the N-Graph representation $Z_v$ of raw EEG signals is the matrix product of the normalized N-Graph adjacency matrix $\hat{A}_v$ and the raw EEG trial $X_T$:

$$Z_v = \hat{A}_v X_T, \quad Z_v \in \mathbb{R}^{n \times k}.$$

Fig. 3. An example positioning of 64-channel EEG nodes.

2) D-Graph: The adjacency matrix of the N-Graph is a simple binary embedding that roughly represents the spatial information without a refined depiction of EEG node spatial relationships. The binary adjacency matrix treats all neighboring nodes as contributing equally to the central node, while in the real-world situation relatively distant neighboring nodes have less influence and relatively adjacent neighboring nodes have more influence on the central node. In Fig. 3(b) for example, the central node $s_{11}$ may be influenced to different degrees by its eight neighboring nodes ($s_3, s_4, s_5, s_{12}, s_{19}, s_{18}, s_{17}, s_{10}$) based on the spatial distance between each neighboring node and the central node $s_{11}$. The simple binary adjacency matrix is not flexible enough to convey this kind of information.

Considering the above disadvantages, we define a distance-based EEG graph called D-Graph, which uses the real-world 3D distance between EEG nodes rather than the binary connections between naturally neighboring nodes. The adjacency matrix of the D-Graph has as its elements weights derived from the distance between two neighboring EEG nodes (its reciprocal, as defined below) instead of binary elements indicating neighboring or not. First, we define the set of distances between any two EEG nodes as $L = \{d_{ij} \mid (s_i, s_j) \in V^2, i \neq j\}$, where $d_{ij}$ is the Euclidean distance between nodes $s_i$ and $s_j$. The locations of the EEG nodes are from the international 10-10 system [41] with the three-dimensional Talairach coordinate representation. In practice, two issues should be addressed before constructing the adjacency matrix: 1) how to define neighboring nodes; 2) how to define the distance between a node and itself. For the first problem, we regard two EEG nodes as neighboring if the distance between them is smaller than the average value of the distance set $L$. For the second problem, the distance between a node and itself is defined as the average distance of other neighboring nodes to this node. Therefore, we define the elements of the adjacency matrix $A_d$ as:

$$A_{ij} = \begin{cases} \frac{1}{d_{ij}} & \text{if } d_{ij} < E(L) \\ 0 & \text{if } d_{ij} \geq E(L) \\ \frac{1}{E(\{d_{iq} \mid d_{iq} < E(L),\ q \in [1,n]\})} & \text{if } i = j \end{cases}$$

where $E(L)$ is the average of the distance set $L$. Similar to the N-Graph, the D-Graph adjacency matrix is also normalized to $\hat{A}_d = \tilde{D}_d^{-\frac{1}{2}} A_d \tilde{D}_d^{-\frac{1}{2}}$, where $\tilde{D}_d = \mathrm{diag}\left(\sum_j A_{1j}, \sum_j A_{2j}, \ldots, \sum_j A_{|V|j}\right)$ is the diagonal degree matrix, and $\tilde{D}_d^{-\frac{1}{2}} = \mathrm{diag}\left(\frac{1}{\sqrt{\sum_j A_{1j}}}, \frac{1}{\sqrt{\sum_j A_{2j}}}, \ldots, \frac{1}{\sqrt{\sum_j A_{|V|j}}}\right)$. The raw EEG trial $X_T$ represented with the D-Graph spatial information is:

$$Z_d = \hat{A}_d X_T, \quad Z_d \in \mathbb{R}^{n \times k}.$$

3) S-Graph: In the definition of the D-Graph, the distance between a node and itself is defined as the average distance of other neighboring nodes to this node. Another strategy for defining the self-distance is to use the shortest distance from a node's neighbors to itself. We call this graph definition S-Graph. Similarly, its adjacency matrix element is defined as:

$$A_{ij} = \begin{cases} \frac{1}{d_{ij}} & \text{if } d_{ij} < E(L) \\ 0 & \text{if } d_{ij} \geq E(L) \\ \frac{1}{\mathrm{Min}(\{d_{iq} \mid q \in [1,n]\})} & \text{if } i = j \end{cases}$$

The S-Graph is also normalized in the same way to avoid changing the scale of $X_T$. The final representation of the S-Graph is:

$$Z_s = \hat{A}_s X_T, \quad Z_s \in \mathbb{R}^{n \times k}.$$
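The three graph definitions can be illustrated with a minimal sketch in plain Python. It is not taken from the released implementation: the function names (`neighbor_graph`, `distance_graph`, `normalize`, `embed`), the toy coordinates, and the neighbor pairs are hypothetical. Two assumptions are made: the self-connection of the N-Graph is added exactly once, and a node with no neighbor below the threshold falls back to its nearest node for the self-distance (a case the text does not cover). Node positions are assumed to be distinct.

```python
import math

def neighbor_graph(n, neighbor_pairs):
    """N-Graph: binary adjacency over n nodes, with one self-connection per node."""
    adj = [[0.0] * n for _ in range(n)]
    for i, j in neighbor_pairs:
        adj[i][j] = adj[j][i] = 1.0          # undirected edge between natural neighbors
    for i in range(n):
        adj[i][i] = 1.0                      # assumption: self-loop added once
    return adj

def distance_graph(coords, self_rule="mean"):
    """D-Graph (self_rule="mean") or S-Graph (self_rule="min") from 3D positions."""
    n = len(coords)
    d = [[math.dist(a, b) for b in coords] for a in coords]
    off_diagonal = [d[i][j] for i in range(n) for j in range(n) if i != j]
    threshold = sum(off_diagonal) / len(off_diagonal)    # E(L)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [d[i][q] for q in range(n) if q != i]
        near = [x for x in others if x < threshold]
        for j in range(n):
            if j != i and d[i][j] < threshold:
                adj[i][j] = 1.0 / d[i][j]                # closer nodes weigh more
        if self_rule == "mean" and near:
            self_distance = sum(near) / len(near)        # D-Graph self-distance
        else:
            self_distance = min(others)                  # S-Graph rule, also the fallback
        adj[i][i] = 1.0 / self_distance
    return adj

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, with D the row-sum degree matrix."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in adj]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def embed(adj_hat, trial):
    """Z = A_hat X for a trial X given as n rows (nodes) of k samples each."""
    return [[sum(a * x for a, x in zip(row, column)) for column in zip(*trial)]
            for row in adj_hat]

# Toy example: four hypothetical node positions and a constant 3-sample trial.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (4.0, 4.0, 0.0)]
trial = [[1.0, 2.0, 3.0] for _ in coords]
z_n = embed(normalize(neighbor_graph(4, [(0, 1), (0, 2), (1, 2)])), trial)
z_d = embed(normalize(distance_graph(coords, "mean")), trial)
z_s = embed(normalize(distance_graph(coords, "min")), trial)
```

Because the adjacency matrix depends only on node positions, it can be computed once per electrode layout and reused for every subject and trial.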


TABLE I
THE CONFIGURATIONS OF THE SPATIO-TEMPORAL ENCODING NETWORK

Different from other spatial filtering like CSP [42], the graph
embedding only relies on the real-world node placement, so
it is independent of subjects, targeting tasks, and manually-set
parameters. Therefore, it is a supplement to the following neural
network, which is a data-driven feature learning approach.

D. Spatio-Temporal Encoding

After embedding raw EEG signals, a sliding window is applied to cut the EEG representations along the time dimension into several temporal slices $Q_i \in \mathbb{R}^{n \times w}$, where $w$ is the temporal slice length. Let the interval between two neighbouring slices be $p$; then $m = \mathrm{int}((k - w)/p)$ slices are obtained from one EEG trial. We specifically design a CNN to encode the spatio-temporal information within a temporal slice.

Although deep networks have strong learning abilities, deeper is not always better for EEG analysis [43]. Table I gives the detailed configuration of the proposed spatio-temporal encoding network. We use one CNN layer and one pooling layer. The height of the CNN kernel is set to $n$, the same as the number of EEG nodes, to consider all EEG nodes at once. The width of the kernel is extended to 45 for exploring long temporal dynamics. The number of CNN filters is empirically set to 40. The convolutional filtering thus can uncover the spatio-temporal information across different EEG nodes. Each temporal slice is encoded to higher-level representations $\{U_i \in \mathbb{R}^{w_c} \mid U_i = \mathrm{Conv}(Q_i), i \in [1, m]\}$. The activation function used in the convolutional operations is the Exponential Linear Unit (ELU) function. We use the valid padding option. Thus the output of the CNN layer has a height of 1. A max-pooling layer is then applied to reduce the number of parameters and extract important information. The final encoded representation is $\{S_i \in \mathbb{R}^{w_p} \mid S_i = \mathrm{MaxPool}(U_i), i \in [1, m]\}$.

E. Exploring Attentive Temporal Dynamics

Following the spatio-temporal feature extraction within single EEG temporal slices, a recurrent attention network is introduced to discover the attentive temporal dependencies across different EEG temporal slices. In traditional recurrent networks, the features that are accumulated from the previous time step are usually adopted for further analysis. However, some crucial information in early steps may be forgotten inevitably due to the structural limitation of recurrent cells. To overcome this issue, we propose to utilize a self-attention module to assign adaptive weights to different recurrent time steps, so that information in early time steps is preserved and flexibly incorporated.

The Long Short-Term Memory (LSTM) units are used to build two stacked RNN layers. After flattening the output of the previous spatio-temporal encoding, $m$ 1D vectors are obtained and input into the RNN. Therefore, each RNN layer has $m$ LSTM cells. The output of the RNN is the hidden states of the second recurrent layer $\{h'_i \in \mathbb{R}^{l} \mid h'_i = \mathrm{LSTM}(S_i), i = 1 \ldots m\}$, where $l$ is the hidden state size.

Because a subject usually concentrates on the experiment at some times but is distracted at others, and different subjects pay attention at different times within a trial, emphasizing the EEG temporal slices in which a subject concentrates on the experiment while neglecting the other slices is necessary for successful EEG analysis. A self-attention mechanism [34], as illustrated in Fig. 4, is used to allocate adaptable weights to different input elements according to their values and aggregate this information to form a final representation. One important feature of the self-attention mechanism is that the weight values are adapted according to the input values, which meets the demand of subject-independent EEG signal analysis where different subjects concentrate on different temporal periods. Each slice representation $h'_i$ is first non-linearly transformed into a latent space:

$$H_i = \tanh(W_i h'_i + b_i), \quad H_i \in \mathbb{R}^{h_a}$$

where $W_i \in \mathbb{R}^{l \times h_a}$ and $b_i \in \mathbb{R}^{h_a}$ are the input-to-hidden weight matrix and bias for a hidden layer of size $h_a$. The softmax activation function, defined as $\mathrm{softmax}(x_i) = \frac{1}{Z} \exp(x_i)$ with $Z = \sum_j \exp(x_j)$, is applied to the nonlinear latent representation $H_i$ to obtain the weight of importance for each slice:

$$V_i = \frac{\exp(H_i^{\top} v_i)}{\sum_j \exp(H_j^{\top} v_j)}.$$

The slice attention vector $v_i \in \mathbb{R}^{h_a}$ is randomly initialized and jointly learned during the training process. The $\mathrm{softmax}()$ function guarantees that all the computed weights sum to 1. These weights focus on specific temporal slices that are more distinguishable than others. Lastly, in the interest of computational efficiency, a weighted sum of all EEG temporal slices is computed to form a slice-focused representation:

$$A = \sum_i V_i h'_i, \quad A \in \mathbb{R}^{l}.$$

Fig. 4. Illustration of the self-attention module. A nonlinear encoding layer first transforms the encoded EEG temporal slices and the results are scaled and normalized to get the attention weight of each temporal slice. Lastly, the attention weight is multiplied with its corresponding encoded features.
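The attention pooling over the recurrent hidden states can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' TensorFlow code: the function name `attention_pool` and all numeric values are hypothetical, and a single weight matrix, bias, and attention vector are assumed to be shared across slices. The weight matrix is stored as $h_a$ rows of length $l$ so that it can be applied directly to a hidden state.

```python
import math

def attention_pool(hidden_states, weight, bias, context):
    """Weight m hidden states (each of length l) and return (weights, pooled vector).

    weight: h_a rows of length l; bias and context: vectors of length h_a.
    Sharing these parameters across slices is an assumption of this sketch.
    """
    scores = []
    for h in hidden_states:
        # Latent representation H = tanh(W h + b)
        latent = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
                  for row, b in zip(weight, bias)]
        # Scalar relevance score: inner product of H with the attention vector
        scores.append(sum(u * v for u, v in zip(latent, context)))
    peak = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]         # softmax over slices, sums to 1
    size = len(hidden_states[0])
    pooled = [sum(wt * h[d] for wt, h in zip(weights, hidden_states))
              for d in range(size)]             # weighted sum of hidden states
    return weights, pooled

# Toy example: m = 3 slices, hidden size l = 2, latent size h_a = 3.
hidden_states = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.0]
context = [0.5, -0.5, 1.0]
slice_weights, pooled = attention_pool(hidden_states, weight, bias, context)
```

Unlike taking only the last hidden state, every slice contributes to the pooled vector in proportion to its learned weight, which is how early time steps stay visible to the classifier.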


The attentive temporal dynamic representation $A$ is fed into a standard softmax classifier:

$$P = \mathrm{softmax}(W A + b),$$

where $W$ and $b$ are the weight matrix and bias vector, respectively, of the motor imagery classification layer. Then the cross-entropy error over all labeled samples is evaluated:

$$L = -\sum_c \hat{Y}_c \log(P_c),$$

where $\hat{Y}_c$ and $P_c$ are the label and the classification probability of motor imagery strategy $c$, respectively. The network weights and biases are trained with batch gradient descent. The final classification result is defined as the motor imagery strategy with the maximum classification probability.

TABLE II
COMPARISON WITH STATE-OF-THE-ART AND BASELINE MODELS
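The classification layer and loss described above can be checked on a small hand-worked case. The sketch below uses plain Python with hypothetical names (`softmax`, `classify`, `cross_entropy`) and made-up weights; it covers a single sample and leaves out training.

```python
import math

def softmax(logits):
    """Normalize logits into probabilities that sum to 1."""
    peak = max(logits)                               # stabilizes the exponentials
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(weight, bias, representation):
    """P = softmax(W A + b), with W stored as one row per class."""
    logits = [sum(w * a for w, a in zip(row, representation)) + b
              for row, b in zip(weight, bias)]
    return softmax(logits)

def cross_entropy(one_hot_label, probabilities):
    """L = -sum_c Y_c log(P_c) for one sample."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probabilities) if y > 0)

# Toy two-class example with an identity weight matrix (values are illustrative).
probabilities = classify([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [2.0, 0.0])
loss = cross_entropy([1.0, 0.0], probabilities)
predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
```

With logits of 2 and 0 the first class wins, and the loss equals $\log(1 + e^{-2})$, the negative log-probability assigned to the true class.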

IV. EXPERIMENT AND RESULTS


A. Dataset and Implementation Details
The proposed method is evaluated on two widely used bench-
mark EEG dataset: PhysioNet EEG Motor Imagery Dataset [44]
and BCI Competition IV dataset 2a [45].
1) PhysioNet Dataset: The PhysioNet dataset comprises 109
healthy subjects executing left/right fist open and close imagery.
The EEG data is collected using BCI2000 instrumentation with
64 EEG nodes and a 160 Hz sampling rate. Each trial lasts about
3.1 seconds resulting in 497 recording time steps. After data in- 400 and the step of is 10 and 50 for the PhysioNet and BCICIV2a
spection, we remove the data of subject #88, #89, #92, and #100 dataset respectively. We make use of the TensorFlow framework
because of the damaged recordings with multiple consecutive for a GPU-based implementation using matrix multiplications.
“rest” sections. As a result, we have 105 subjects in total.We The stochastic gradient descent with Adam update rule is used
prepared nine subject-independent evaluation sets (A01-A09), to minimize the cross-entropy loss function. The network pa-
each of which contains ten randomly selected subjects as the test set and the remaining 95 subjects
as the training set. Because each subject has around 43 trials with a roughly balanced ratio of
right- and left-fist motor imagery, there are about 4085 trials in one training set and 430 trials
in one test set.
2) BCICIV2a Dataset: The BCICIV2a dataset contains EEG signals of 22 nodes recorded from nine
healthy subjects in two sessions on two different days. Each session consists of 288 four-second
trials of motor imagery per subject (imagining the movement of the left hand, the right hand, the
feet, and the tongue). The signals were sampled at 250 Hz and bandpass-filtered between 0.5 Hz and
100 Hz by the dataset provider before release. The original dataset uses the 288 trials of the
first session for training and the 288 trials of the second session for testing. However, in the
subject-independent scenario, the original dataset needs to be re-split by subject in a
leave-one-subject-out manner. Consequently, nine evaluation datasets (A01-A09) are obtained, each
of which has 576 trials (288 trials × 2 sessions) of one subject as the test set and 4608 trials
(288 trials × 2 sessions × 8 subjects) of the remaining eight subjects as the training set.
3) Implementation Details: One of the crucial advantages of deep learning is that it needs no
hand-crafted features. Following the conventions in [43], [46], we directly feed the raw EEG data
into the proposed framework without any filtering. The sliding window size for both datasets is

rameters are optimized with a learning rate of $10^{-5}$. Dropout regularization is applied after
the CNN layer and the recurrent network layer with a dropout probability of 0.5. The hidden state
size of the LSTM cell l is 64. The non-linear transformation size of the self-attention is 512.
The proposed model has 16 hyper-parameters and 420,356 trainable parameters.

B. Experimental Results
1) Comparison Results: The PhysioNet and BCICIV2a datasets we used are roughly balanced. Thus we
evaluate the proposed model with classification accuracy and the Area Under the ROC Curve
(ROC-AUC). Table II presents the overall comparison results, and the detailed results can be found
in the supporting documents. Because deep learning is an advanced technique that relies on proper
structure design, we compare with several deep learning approaches with various model structures
and feature embedding strategies. To make a fair comparison and show the superior structure of the
proposed approach, the most recent state-of-the-art approaches whose implementation code is
available online are selected for comparison. We first make a comparison with the recently
published EEGNet [13], which encapsulates well-known EEG feature extraction concepts for BCI to
construct a uniform approach for different BCI paradigms. Then the proposed model is compared with
the CTCNN (CroppedTrainingCNN) [43] method, as this work
reports comprehensive research on various CNN architectures and proposes a crop training strategy
which outperforms the traditional trial-based training manner. A further comparison with the
EEG-Image [15] approach is performed. The EEG-Image model selects three widely explored aspects of
EEG signals (spectral, spatial, and temporal) as prior features and proposes a carefully designed
convolutional recurrent architecture for mental workload classification. The Cascade and Parallel
CRNN reported in [46] are also used for comparison, as this work also reports preserving EEG
spatial information by considering adjacent EEG nodes. It provides state-of-the-art results on the
PhysioNet dataset in the subject-dependent scenario. Lastly, the proposed G-CRAM is compared with
two traditional EEG analysis methods, PSD-SVM [47] and FBCSP [22]. The PSD provides time-frequency
features that are commonly used in traditional EEG motor imagery analysis. FBCSP is a widely used
traditional method and has won several BCI competitions. Apart from the state-of-the-art
approaches, the proposed model is further compared with two self-built baseline models: CNN and
RNN. The CNN model has three CNN layers directly applied on the raw EEG trials. The RNN model has
two LSTM-based RNN layers to find the temporal relationships between different slices in an EEG
trial. We also compare the proposed method with the CRAM model to demonstrate the effectiveness of
the graph representation of EEG signals.
In Table II, the NG-CRAM, DG-CRAM, and SG-CRAM represent the G-CRAM with N-Graph, D-Graph, and
S-Graph respectively. All three models outperform the comparison methods in terms of accuracy and
ROC-AUC in subject-independent testing. The primary reason that the proposed methods surpass
traditional methods like FBCSP is the multiple nonlinear transformation process, which is a main
advantage of deep learning frameworks. When compared with deep learning models, our proposed
methods have two main advantages that help to produce superior performance. The first advantage is
our proposed graph embedding method. Compared with the pure deep learning models which do not have
particular data representations, like EEGNet and CTCNN, our proposed graph representation embeds
the spatial relationship of EEG nodes, which helps the following neural network analyze EEG
signals. On the other hand, compared with the EEG-Image model, which has a spatial representation,
our graph representation scheme does not rely on data implanting, so it is free of the risk of
introducing noise. Finally, different from the cascade model and parallel model, which only adopt
naive channel re-arrangement, our graph scheme introduces an adjacency matrix to optimize the raw
data into a more effective embedding. The second advantage is the recurrent attention module.
Compared with the naive recurrent network, the recurrent-attention module not only takes the
temporal information into consideration but also assigns adaptive weights to different time
periods within an EEG trial. The recurrent-attention module has already been widely demonstrated
to be more powerful than the traditional recurrent network in exploring temporal features [48].
2) How Does the G-CRAM Encode Spatial Information?: In this section, we analyze the learning
process of the CNN layer to show how EEG features are learned for motor imagery classification.
Specifically, instead of summing up the convolutional results along the EEG node dimension, we
retain the convolutional results in the EEG node dimension and average the results along the time
dimension. Therefore, a feature vector of size n is obtained after ELU activation, with each
element representing the extracted features of each EEG node from the CNN layer. The feature
vector is then normalized to [−1, 1] and visualized with topographic scalp plots.
Fig. 5 presents 10 representative topographic scalp plots of convolutional feature maps for each
evaluation dataset. As shown in Fig. 5, the CNN layer focuses on relatively small detailed brain
areas, which is important for successful EEG feature extraction [12]. Furthermore, it is
consistent with previous reports [49], [50] that the CNN layer emphasizes the central (FC, C, and
CP) and frontal (F)/pre-frontal (Fp) areas of the human brain. More specifically, some
convolutional feature maps activate at the three EEG nodes C3, C4, and Cz, which have been widely
demonstrated in previous studies [47], [51] to hold the most distinguishable information for
EEG-based motor imagery classification. For example, as presented in Fig. 5(a) for the PhysioNet
dataset, kernel #1 focuses on Cz; kernel #22 focuses on C4; kernels #5, #24, #35 and #37 focus on
C3. For the BCICIV2a dataset presented in Fig. 5(b), kernels #8, #26 and #27 focus on Cz; kernels
#17 and #27 focus on C4; kernels #6, #22, #27 and #32 focus on C3. Besides, the CNN layer also
learns to target other EEG nodes, which is helpful to discriminate different motor imagery tasks
as well [52], [53], especially for different subjects and paradigms. Therefore, the
spatio-temporal encoding layer is able to act as a spatial filter to extract features of the most
distinguishable EEG nodes.
3) How Does the Recurrent Attention Module Work?: In order to understand how the recurrent
attention module learns EEG features, we collect the attention matrices of the correctly
classified test samples of the DG-CRAM model and plot the statistical results in Fig. 6. The
elements in the attention matrix indicate the weight values assigned to the corresponding RNN
output. In Fig. 6, a larger number on the X axis means later in time. Most temporal slices that
are later in time clearly have larger weight values, suggesting more influence on the final
classification results. This trend shows that the recurrent attention module tends to focus more
on later temporal slices. Intuitively, different subjects have different ways of thinking and
would concentrate on different temporal periods. Thus effective attention would focus on different
temporal periods. However, considering that the input of the self-attention module is the features
from the previous RNN layers, which accumulate information from early time steps and aggregate it
gradually to the final time step, the later time steps would carry more information than earlier
time steps. Hence the self-attention module would give larger weights to later RNN outputs.
To explore further, we build a comparison model with the self-attention module directly after the
pooling layer, without RNN layers in between. We collect the comparison model’s attention matrices
of the correctly classified test samples and plot the statistical results in Fig. 7. It is found
that there is no such “later-higher” trend in the attention matrices, demonstrating that the RNN
layer is the cause of the “later-higher” trend

Fig. 5. Visualization of the convolutional feature maps in topographic plots. We select 10 representative feature maps for each dataset. The feature
values of all EEG nodes are normalized to range [−1, 1]. A large value indicates a major impact on subsequent motor imagery classification; in
contrast a small value represents minor impact. # represents the digital label of the feature map.
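
The per-node values plotted in Fig. 5 can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the convolutional output of one kernel is an array of shape (EEG nodes, time steps), averages it over time, applies ELU, and rescales the result to [−1, 1] by dividing by the largest absolute value. The order of averaging and activation, the max-abs scaling rule, the function names, and the random input are all assumptions; the text only states that a size-n vector is obtained and normalized to [−1, 1].

```python
import numpy as np

def elu(x, alpha=1.0):
    # Exponential linear unit: identity for x > 0, alpha * (exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def node_feature_vector(conv_out):
    """conv_out: (n_nodes, n_time_steps) output of one kernel (hypothetical layout)."""
    averaged = conv_out.mean(axis=1)   # average along time, keep the node dimension
    activated = elu(averaged)          # ELU activation gives a vector of size n
    peak = np.max(np.abs(activated))
    if peak == 0:
        return activated               # all-zero vector: nothing to rescale
    return activated / peak            # assumed max-abs scaling into [-1, 1]

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(64, 160))  # 64 nodes as in PhysioNet; 160 steps is arbitrary
features = node_feature_vector(conv_out)
print(features.shape, float(features.min()), float(features.max()))
```

Each entry of `features` would then be drawn at the scalp position of its EEG node to produce one topographic plot.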

Fig. 6. Box plot of the attention matrices of the correctly classified test samples with RNN layers. Larger number on X axis means later in time.
Fig. 8. Box plot of the attention matrices of the correctly classified test samples without RNN layers of individual subjects.
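
The statistics behind the box plots of attention weights in Figs. 6 and 8 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the attention weights are taken to be a hypothetical array of shape (samples, temporal slices), only correctly classified samples are kept, and per-slice quartiles are computed. The array names, the softmax-generated weights, and the labels are synthetic and exist only for the example.

```python
import numpy as np

def attention_box_stats(weights, y_true, y_pred):
    """Per-slice quartiles of attention weights over correctly classified samples.

    weights: (n_samples, n_slices) attention weights (hypothetical layout).
    """
    correct = y_true == y_pred                 # keep correctly classified samples only
    kept = weights[correct]
    q1, median, q3 = np.percentile(kept, [25, 50, 75], axis=0)
    return {"q1": q1, "median": median, "q3": q3, "n": int(correct.sum())}

rng = np.random.default_rng(1)
n_samples, n_slices = 200, 8
# Synthetic scores with a bias towards later slices, turned into weights by a softmax.
scores = rng.normal(size=(n_samples, n_slices)) + np.linspace(0.0, 2.0, n_slices)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
y_true = rng.integers(0, 2, size=n_samples)
y_pred = np.where(rng.random(n_samples) < 0.8, y_true, 1 - y_true)  # roughly 80% correct

stats = attention_box_stats(weights, y_true, y_pred)
print(stats["n"], np.round(stats["median"], 3))
```

With data of this kind, medians that rise with the slice index would correspond to the "later-higher" pattern discussed in the text.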

in the attention matrices. In the PhysioNet results, the attention matrices show an even
distribution over the temporal slices (Fig. 7(a)), while in the BCICIV2a results, there is a
peaking trend throughout the attention matrices (Fig. 7(b)). The main reason is that there are ten
subjects in the PhysioNet evaluation set; thus, the attention matrix distribution tends to be
even. By contrast, there is only one subject in each BCICIV2a evaluation set, so the attention
matrix exhibits a clear subject-specific pattern. As shown in Fig. 8(a), (b), and (c), different
subjects have different attention patterns, indicating strong variability across subjects.

Fig. 7. Box plot of the attention matrices of the correctly classified test samples without RNN layers. Larger number on X axis means later in time.

C. Discussion
1) Robustness to Artifacts: Traditional methods use frequency filtering to remove high- and
low-frequency artifacts before classification. In contrast, as deep learning is a new technique
that combines feature extraction and classification into an end-to-end framework, it is capable of
processing raw data directly. Besides, [43] argued that a CNN could learn to work as a frequency
filter to extract the band power in frequency bands relevant to motor imagery. Therefore, the CNN
in the proposed model can help to minimize artifacts. In terms of experimental results, the
proposed method outperforms the FBCSP, which uses frequency filtering to remove artifacts. As a
result, the proposed model is, qualitatively, at least as robust to artifacts as frequency
filtering. The quantitative analysis of the robustness against artifacts is critical but rarely
studied, so we leave it for future work.
2) Statistical Significance Test: Following the conventions in previous EEG studies [12], [43], we
perform the Wilcoxon signed-rank test to analyze the statistical significance of the performance
improvement of the proposed approaches. The detailed results are summarized in the supporting
document. It is demonstrated that on the PhysioNet dataset, our proposed G-CRAM models outperform
all the state-of-the-art and baseline methods significantly (p < 0.05). On the other hand, on the
BCICIV2a dataset, the performance improvement of the proposed models over the comparison methods
is significant (p < 0.05) except for the baseline CRAM model. This result indicates that the graph
representation does not significantly improve the performance of the proposed models on the
BCICIV2a dataset (p > 0.05 when comparing CRAM with NG-CRAM, DG-CRAM, and SG-CRAM). The reason may
be that the BCICIV2a dataset has so few EEG nodes that the benefits brought by the graph embedding
are limited. The difference between
the three graph representation methods is also not significant (p > 0.05), suggesting that the
exact distance between EEG nodes is not essential in current graph schemes. However, the
distance-based method adapts more easily to various sets of EEG nodes.
3) Effect of the Number of EEG Nodes: The proposed graph embedding strategy can be directly
applied to EEG data with any number of EEG nodes. As the graph representation is location-based
and the locations of EEG nodes are internationally standardized (such as in the 10-10 system), the
coordinates (locations) of the EEG nodes of a given EEG headset are fixed, and consequently the
graph representation can be constructed. Therefore, the proposed graph representation approach is
adaptive to different numbers of EEG nodes.
The graph embedding aims at explicitly representing the spatial distribution of EEG nodes to help
the following neural network extract powerful features. However, if there were only a few EEG
nodes (such as three nodes or a single node), the network would easily find the node correlations
and extract useful features without the help of spatial embedding. In the evaluation, two datasets
with different numbers of EEG nodes (PhysioNet with 64 vs. BCICIV2a with 22) are used. The
significance test results show that the graph representation has a significant impact on the
dataset of 64 nodes (p < 0.05) but an insignificant impact on the dataset of 22 nodes (p > 0.05).
In addition to the number of EEG nodes, the locations of EEG nodes are also critical to model
performance. If the EEG nodes were not placed over the active brain areas, the graph
representation would not work even with many EEG nodes.
4) Effect of Temporal Slice Size: The size of the temporal slice is an important hyper-parameter.
In light of the evidence that the EEG signal presents multiple time scales, such as both local and
global oscillations in time [43], [54], [55], we design the temporal slices for local temporal
feature extraction. Then the embedded local temporal features are input into a recurrent attention
module to obtain attentive global temporal features. A temporal slice that is too large or too
small would degrade the model performance. We carefully tuned the size of the temporal slices and
reported the best results.

V. CONCLUSION AND FUTURE DIRECTION

A. Conclusion
This paper targets the EEG motor imagery classification task and proposes a novel deep learning
approach. The deep learning model leverages an original graph representation to embed the spatial
information, which differs from other spatial filtering methods in its independence of both
subjects and tasks. A recurrent attention network is used to assign weights to different temporal
cues instead of using a standard recurrent network to accumulate temporal information.
Comprehensive experiments on two benchmark datasets show the superior performance of the proposed
model on new subjects, which is rarely reported in previous studies. Detailed insights into
feature extraction and the impact of EEG nodes are also investigated by interpretation experiments
and statistical significance tests. It is revealed that the EEG graph with a larger number of
nodes improves the overall performance more significantly.

B. Future Direction
There are several future research directions for further developing the proposed method. The
first direction is to explore the G-CRAM on other BCI modalities, such as P300 and SSVEP. Due to
its task-irrelevant scheme, the proposed graph representation can be directly applied to other BCI
modalities. Meanwhile, the recurrent attention module would also benefit the extraction of the
most discriminative temporal cues, such as that of P300. As most current works focus on a
particular EEG task, the potential of G-CRAM being adaptive to different EEG tasks would be of
great interest to researchers. The second research direction is to incorporate the proposed method
into a real-world BCI and evaluate its online performance. The presented model can be efficiently
incorporated into an online BCI with an off-the-shelf deep learning framework, like Tensorflow.
Since only a few works have been reported to incorporate a deep learning model into a real-world
BCI application, developing such an online system would be a remarkable contribution to the
community. Taking power spectral features into consideration is also an exciting research
opportunity. The power spectrum is another widely used feature for EEG motor imagery
classification. The proposed G-CRAM can be used to represent the power distribution over the scalp
and find the distinguishable power changes over time.

REFERENCES
[1] A. Berger, F. Horst, S. Müller, F. Steinberg, and M. Doppelmayr, “Current state and future prospects of EEG and fNIRS in robot-assisted gait rehabilitation: A brief review,” Frontiers Human Neurosci., vol. 13, pp. 172:1–172:17, 2019.
[2] D. Zhang, L. Yao, K. Chen, and J. Monaghan, “A convolutional recurrent attention model for subject-independent EEG signal analysis,” IEEE Signal Process. Lett., vol. 26, no. 5, pp. 715–719, May 2019.
[3] J. Van Erp, F. Lotte, and M. Tangermann, “Brain-computer interfaces: Beyond medical applications,” Computer, vol. 45, no. 4, pp. 26–34, 2012.
[4] H.-I. Suk and S.-W. Lee, “A novel Bayesian framework for discriminative feature extraction in brain-computer interfaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 2, pp. 286–299, Feb. 2013.
[5] M. Hamedi, S.-H. Salleh, and A. M. Noor, “Electroencephalographic motor imagery brain connectivity analysis for BCI: A review,” Neural Computation, vol. 28, no. 6, pp. 999–1041, 2016.
[6] Y. Yang, S. Chevallier, J. Wiart, and I. Bloch, “Subject-specific time-frequency selection for multi-class motor imagery-based BCIs using few Laplacian EEG channels,” Biomed. Signal Process. Control, vol. 38, pp. 302–311, 2017.
[7] Y. Kim, J. Ryu, K. K. Kim, C. C. Took, D. P. Mandic, and C. Park, “Motor imagery classification using Mu and beta rhythms of EEG with strong uncorrelating transform based complex common spatial patterns,” Comput. Intell. Neurosci., vol. 2016, pp. 1 489 692:1–1 489 692:13, 2016.
[8] D. J. McFarland and J. R. Wolpaw, “Sensorimotor rhythm-based brain-computer interface (BCI): Feature selection by regression improves performance,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 13, no. 3, pp. 372–379, Sep. 2005.
[9] C. Ieracitano, N. Mammone, A. Bramanti, A. Hussain, and F. C. Morabito, “A convolutional neural network approach for classification of Dementia stages based on 2D-spectral representation of EEG recordings,” Neurocomputing, vol. 323, pp. 96–107, 2019.
[10] Z. Jiao, X. Gao, Y. Wang, J. Li, and H. Xu, “Deep convolutional neural networks for mental load classification based on EEG data,” Pattern Recognit., vol. 76, pp. 582–595, 2018.
[11] K. Chen, L. Yao, D. Zhang, X. Chang, G. Long, and S. Wang, “Distributionally robust semi-supervised learning for people-centric sensing,” in Proc. Thirty-Third AAAI Conf. Artif. Intell., vol. 33, 2019, pp. 3321–3328.
[12] P. Zhang, X. Wang, W. Zhang, and J. Chen, “Learning spatial–spectral–temporal EEG features with recurrent 3D convolutional neural networks for cross-task mental workload assessment,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 27, no. 1, pp. 31–42, Jan. 2019.
[13] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNET: A compact convolutional network for EEG-based brain-computer interfaces,” J. Neural Eng., vol. 15, no. 15, pp. 056 013:1–056 013:17, 2018.
[14] D. Zhang, L. Yao, X. Zhang, S. Wang, W. Chen, and R. Boots, “Cascade and parallel convolutional recurrent neural networks on EEG-based intention recognition for brain computer interface,” in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 1703–1710.
[15] P. Bashivan, I. Rish, M. Yeasin, and N. Codella, “Learning representations from EEG with deep recurrent-convolutional neural networks,” in Proc. Int. Conf. Learn. Representation, 2016, pp. 1–15.
[16] X. Zhu, P. Li, C. Li, D. Yao, R. Zhang, and P. Xu, “Separated channel convolutional neural network to realize the training free motor imagery BCI systems,” Biomed. Signal Process. Control, vol. 49, pp. 396–403, 2019.
[17] M. Riyad, M. Khalil, and A. Adib, “Cross-subject EEG signal classification with deep neural networks applied to motor imagery,” in International Conference on Mobile, Secure, and Programmable Networking, Berlin, Germany: Springer, 2019, pp. 124–139.
[18] D. Zhang, K. Chen, D. Jian, L. Yao, S. Wang, and P. Li, “Learning attentional temporal cues of brainwaves with spatial embedding for motion intent detection,” in Proc. 19th IEEE Int. Conf. Data Mining, 2019, pp. 1–6.
[19] I. Xygonakis, A. Athanasiou, N. Pandria, D. Kugiumtzis, and P. D. Bamidis, “Decoding motor imagery through common spatial pattern filters at the EEG source space,” Comput. Intell. Neurosci., vol. 2018, pp. 1–10, 2018.
[20] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Trans. Rehabil. Eng., vol. 8, no. 4, pp. 441–446, Dec. 2000.
[21] F. Lotte and C. Guan, “Regularizing common spatial patterns to improve BCI designs: Unified theory and new algorithms,” IEEE Trans. Biomed. Eng., vol. 58, no. 2, pp. 355–362, Feb. 2011.
[22] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan, “Filter bank common spatial pattern (FBCSP) in brain-computer interface,” in Proc. IEEE Int. Joint Conf. Neural Netw., 2008, pp. 2390–2397.
[23] K. K. Ang, Z. Y. Chin, C. Wang, C. Guan, and H. Zhang, “Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b,” Frontiers Neurosci., vol. 6, p. 39, 2012.
[24] A. Schlögl, F. Lee, H. Bischof, and G. Pfurtscheller, “Characterization of four-class motor imagery EEG data for the BCI-competition 2005,” J. Neural Eng., vol. 2, no. 4, p. L14, 2005.
[25] D. Zhang, L. Yao, S. Wang, K. Chen, Z. Yang, and B. Benatallah, “Fuzzy integral optimization with deep q-network for EEG-based intention recognition,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining, Berlin, Germany: Springer, 2018, pp. 156–168.
[26] P. Herman, G. Prasad, T. M. McGinnity, and D. Coyle, “Comparative analysis of spectral approaches to feature extraction for EEG-based motor imagery classification,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 16, no. 4, pp. 317–326, Aug. 2008.
[27] A. Pérez-Zapata, A. F. Cardona-Escobar, J. A. Jaramillo-Garzón, and G. M. Díaz, “Deep convolutional neural networks and power spectral density features for motor imagery classification of EEG signals,” in International Conference on Augmented Cognition, Berlin, Germany: Springer, 2018, pp. 158–169.
[28] Y. Yi and M. Lin, “Human action recognition with graph-based multiple-instance learning,” Pattern Recognit., vol. 53, pp. 148–162, 2016.
[29] Q. Yuan, G. Cong, and A. Sun, “Graph-based point-of-interest recommendation with geographical and temporal influences,” in Proc. 23rd Int. Conf. Inf. Knowl. Manage., 2014, pp. 659–668.
[30] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: A survey,” Data Mining Knowl. Discovery, vol. 29, no. 3, pp. 626–688, 2015.
[31] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. Int. Conf. Learn. Representation, 2017, pp. 1–14.
[32] D. Zhang, L. Yao, K. Chen, S. Wang, P. D. Haghighi, and C. Sullivan, “A graph-based hierarchical attention model for movement intention detection from EEG signals,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 27, no. 11, pp. 2247–2253, Nov. 2019.
[33] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-end attention-based large vocabulary speech recognition,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2016, pp. 4945–4949.
[34] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in Proc. Int. Conf. Learn. Representation, 2015, pp. 1–15.
[35] K. Chen et al., “Interpretable parallel recurrent neural networks with convolutional attentions for multi-modality activity modeling,” in Proc. IEEE Int. Joint Conf. Neural Netw., 2018, pp. 1–8.
[36] K. Chen, L. Yao, D. Zhang, B. Guo, and Z. Yu, “Multi-agent attentional activity recognition,” in Proc. 28th Int. Joint Conf. Artif. Intell., 2019, pp. 4031–4038.
[37] A. Vaswani et al., “Attention is all you need,” in Proc. Advances Neural Inf. Process. Syst., 2017, pp. 5998–6008.
[38] R. Paulus, C. Xiong, and R. Socher, “A deep reinforced model for abstractive summarization,” in Proc. Int. Conf. Learn. Representation, 2018, pp. 1–12.
[39] D. Zhang, L. Yao, K. Chen, and S. Wang, “Ready for use: Subject-independent movement intention recognition via a convolutional attention model,” in Proc. 27th ACM Int. Conf. Inf. Knowl. Manag., 2018, pp. 1763–1766.
[40] A. P. Parikh, O. Täckström, D. Das, and J. Uszkoreit, “A decomposable attention model for natural language inference,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 2249–2255.
[41] V. Jurcak, D. Tsuzuki, and I. Dan, “10/20, 10/10, and 10/5 systems revisited: Their validity as relative head-surface-based positioning systems,” Neuroimage, vol. 34, no. 4, pp. 1600–1611, 2007.
[42] G. Pfurtscheller and C. Neuper, “Motor imagery and direct brain-computer communication,” Proc. IEEE, vol. 89, no. 7, pp. 1123–1134, Jul. 2001.
[43] R. T. Schirrmeister et al., “Deep learning with convolutional neural networks for EEG decoding and visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017.
[44] A. L. Goldberger et al., “Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
[45] C. Brunner, R. Leeb, G. Müller-Putz, A. Schlögl, and G. Pfurtscheller, “BCI competition 2008–graz data set a,” Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology, vol. 16, pp. 1–6, 2008.
[46] D. Zhang, L. Yao, K. Chen, S. Wang, X. Chang, and Y. Liu, “Making sense of spatio-temporal preserving representations for EEG-based human intention recognition,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2019.2905157.
[47] V. P. Oikonomou, K. Georgiadis, G. Liaros, S. Nikolopoulos, and I. Kompatsiaris, “A comparison study on EEG signal processing techniques using motor imagery EEG data,” in Proc. IEEE 30th Int. Symp. Comput.-Based Med. Syst., 2017, pp. 781–786.
[48] Y. Wang, M. Huang, X. Zhu, and L. Zhao, “Attention-based LSTM for aspect-level sentiment classification,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 606–615.
[49] J. Shin, J. Kwon, and C.-H. Im, “A ternary hybrid EEG-NIRS brain-computer interface for the classification of brain activation patterns during mental arithmetic, motor imagery, and idle state,” Frontiers Neuroinformatics, vol. 12, pp. 5:1–9, 2018.
[50] B. Shrestha, I. Vlachos, J. A. Adkinson, and L. Iasemidis, “Distinguishing motor imagery from motor movement using phase locking value and Eigenvector centrality,” in Proc. IEEE 32nd Southern Biomed. Eng. Conf., 2016, pp. 107–108.
[51] G. Pfurtscheller, C. Brunner, A. Schlögl, and F. L. Da Silva, “Mu rhythm (de) synchronization and EEG single-trial classification of different motor imagery tasks,” NeuroImage, vol. 31, no. 1, pp. 153–159, 2006.
[52] A. Ghaemi, E. Rashedi, A. M. Pourrahimi, M. Kamandar, and F. Rahdari, “Automatic channel selection in EEG signals for classification of left or right hand movement in brain computer interfaces using improved binary gravitation search algorithm,” Biomed. Signal Process. Control, vol. 33, pp. 109–118, 2017.
[53] H. Shan, H. Xu, S. Zhu, and B. He, “A novel channel selection method for optimal classification in different motor imagery BCI paradigms,” BioMedical Eng. Online, vol. 14, no. 1, pp. 93:1–93:18, 2015.
[54] S. Monto, S. Palva, J. Voipio, and J. M. Palva, “Very slow EEG fluctuations predict the dynamics of stimulus detection and oscillation amplitudes in humans,” J. Neurosci., vol. 28, no. 33, pp. 8268–8272, 2008.
[55] S. Vanhatalo, J. M. Palva, M. Holmes, J. Miller, J. Voipio, and K. Kaila, “Infraslow oscillations modulate excitability and interictal Epileptic activity in the human cortex during sleep,” Proc. Nat. Academy Sci., vol. 101, no. 14, pp. 5053–5057, 2004.
