Hybrid deep neural network using transfer learning for EEG motor imagery decoding
Ruilong Zhang, Qun Zong, Liqian Dou ∗, Xinyi Zhao, Yifan Tang, Zhiyu Li
School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Keywords: Motor imagery; Brain–computer interfaces; Deep neural network; End-to-end learning; Transfer learning

A major challenge in motor imagery (MI) electroencephalogram (EEG) based brain–computer interfaces (BCIs) is the variability between individuals. Retraining the classification model from scratch for every new subject often leads to unnecessary time consumption. In this paper, a ‘‘brain-ID’’ framework based on a hybrid deep neural network with transfer learning (HDNN-TL) is proposed to deal with individual differences in the 4-class MI task. An end-to-end HDNN is developed to learn the common features of the MI signal. The HDNN consists of a convolutional neural network (CNN) and a long short-term memory (LSTM) network, which decode the spatial and temporal features of the MI signal simultaneously. To deal with the individual-differences problem in EEG, a transfer learning technique is applied to fine-tune the subsequent fully connected (FC) layers so that the model accommodates a new subject with fewer training data. The classification performance of the proposed HDNN-TL on BCI Competition IV dataset 2a, in terms of kappa value, is 0.8. We compared HDNN-TL, HDNN, and other state-of-the-art methods, and the experimental results demonstrate that the proposed method achieves satisfactory results for new subjects with less time and fewer training data in the MI task.
∗ Corresponding author.
E-mail address: [email protected] (L. Dou).
https://doi.org/10.1016/j.bspc.2020.102144
Received 24 June 2020; Received in revised form 28 July 2020; Accepted 7 August 2020
Available online 25 August 2020
Fig. 2. The proposed CNN_1 model. It provides different forms of output for the subsequent CNN_2 and LSTM.

Fig. 4. The framework of the LSTM cell: ‘‘S’’ denotes the sigmoid activation function, ‘‘tanh’’ the hyperbolic tangent activation function, ‘‘+’’ element-wise addition, and ‘‘×’’ element-wise multiplication. ‘‘$C_t$’’ denotes the state of the LSTM cell at the current time step [36].

Fig. 5. The proposed long short-term memory (LSTM) network, which consists of 4 LSTM cells. The output of CNN_1 feeds into the LSTM cell sequentially as the input signal at every time step. The parameters of the 4 LSTM cells are shared; the framework of the LSTM cell is shown in Fig. 4.
2.3. LSTM
The cell state is updated by the residual and the new useful information,

$i_t = \sigma(\boldsymbol{W}_i \cdot [\boldsymbol{h}^l_{t-1}, \boldsymbol{x}_t] + \boldsymbol{b}_i)$, (6)

$\hat{\boldsymbol{C}}_t = \tanh(\boldsymbol{W}_c \cdot [\boldsymbol{h}^l_{t-1}, \boldsymbol{x}_t] + \boldsymbol{b}_c)$, (7)

$\boldsymbol{C}_t = f_t \times \boldsymbol{C}_{t-1} + i_t \times \hat{\boldsymbol{C}}_t$, (8)

where $\boldsymbol{W}_i$, $\boldsymbol{W}_c$, $\boldsymbol{b}_i$ and $\boldsymbol{b}_c$ are the weights and biases of the input gate, respectively, and $\boldsymbol{C}_t$ denotes the updated current state of the LSTM cell.

Eventually, the output of the current LSTM cell is obtained by

$o_t = \sigma(\boldsymbol{W}_o \cdot [\boldsymbol{h}^l_{t-1}, \boldsymbol{x}_t] + \boldsymbol{b}_o)$, (9)

$\boldsymbol{h}^l_t = o_t \times \tanh(\boldsymbol{C}_t)$, (10)

where $\boldsymbol{W}_o$ and $\boldsymbol{b}_o$ are the weight and bias of the output gate, respectively, and $\boldsymbol{h}^l_t$ denotes the current temporal feature of the MI signal.
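To make Eqs. (6)–(10) concrete, the following minimal NumPy sketch computes one LSTM time step. The forget gate $f_t$ follows the standard formulation of [35]; the array shapes, function name, and argument layout are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM time step following Eqs. (6)-(10).

    x_t: input at time t, shape (input_dim,)
    h_prev, c_prev: previous hidden and cell states, shape (cell_size,)
    Each W_* has shape (cell_size, cell_size + input_dim) and acts on the
    concatenation [h_prev, x_t]; each b_* has shape (cell_size,).
    """
    z = np.concatenate([h_prev, x_t])   # [h^l_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate (standard LSTM, [35])
    i_t = sigmoid(W_i @ z + b_i)        # input gate, Eq. (6)
    c_hat = np.tanh(W_c @ z + b_c)      # candidate state, Eq. (7)
    c_t = f_t * c_prev + i_t * c_hat    # cell state update, Eq. (8)
    o_t = sigmoid(W_o @ z + b_o)        # output gate, Eq. (9)
    h_t = o_t * np.tanh(c_t)            # temporal feature, Eq. (10)
    return h_t, c_t
```

In the setting of Fig. 5, the cell size is 32 and the same parameter set is reused across the 4 time steps.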
2.4. FC with transfer learning

After the CNN and LSTM extract the spatial and temporal features of the MI signal, an FC network is used to synthesize the obtained features and produce a classification result. The structure of the FC network is shown in Fig. 6; it is a 3-layer feedforward neural network (FNN). ReLU is used as the activation function in each hidden layer, and the softmax function is selected to represent the exponential probability distribution over the classes in the output layer,

$y_{p,m} = \dfrac{e^{y_m}}{\sum_{m=1}^{T} e^{y_m}}$, (11)

where $m$ is the index of each class and $T$ denotes the total number of classes. Furthermore, dropout is used in the second hidden layer to reduce network overfitting.

Fig. 6. The structure of the fully connected (FC) network for transfer learning. It contains 3 hidden layers (512, 512, and 32 nodes, respectively), and dropout is used in the second hidden layer. When training a classification model for a new subject, only the parameters of the FC network need to be fine-tuned.
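As an illustration of the FC head described above and in Fig. 6, one possible Keras definition is sketched below; the input feature dimension and the dropout rate are assumptions, since they are not specified here.

```python
import tensorflow as tf

def build_fc_head(feature_dim, num_classes=4, dropout_rate=0.5):
    """FC classifier following Fig. 6: 512-512(+dropout)-32 hidden units.

    feature_dim and dropout_rate are illustrative assumptions.
    """
    return tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu",
                              input_shape=(feature_dim,)),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),  # dropout on 2nd hidden layer
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # Eq. (11)
    ])
```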
the HDNN was trained in 500 iterations by datasets ‘‘T’’ (an iteration
For transfer learning technique, we follow the method of [37,38]
where the parameters of HDNN are fine-tuned as a constant. The corresponding a mini-batch samples for training), and datasets ‘‘E’’ was
FC parameters are random initialized and freshly trained, in order used for evaluating the training effect after every training iteration. The
to accommodate the new subject MI feature in our application. The relationship between training error rate and evaluating accuracy were
learning rate of FC is kept with default learning rate. Transfer learning overlaid in Fig. 7. The mean total training time of nine subjects was
in deep learning representation, as empirically is verified in many about 15 s. As shown in Fig. 7, the training error was less than 0.1
previous literature, including many application in medical image, such and evaluating accuracy curve converged obviously to around 0.8 after
as [39–42]. More thorough theoretical studies on imaging statistics 300 iterations. To compare HDNN with other state-of-art method, we
with transfer learning will be needed for future studies. provided kappa values for HDNN, as presented in Table 1.
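A minimal sketch of the two training phases described in Sections 2.5.1 and 2.5.2 is given below, assuming Keras-style models. Here `build_hdnn_feature_extractor` is a hypothetical helper standing in for the CNN + LSTM part of the HDNN, the 32-dimensional feature output matches the LSTM cell size, the Adam optimizer is assumed (the paper cites [43]), and the fine-tuning epoch count is illustrative.

```python
import tensorflow as tf

init = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=0.1)

# --- Pre-training (Section 2.5.1): all HDNN parameters trained from scratch.
# x_pretrain, y_pretrain: merged training data of the other subjects (assumed).
hdnn_base = build_hdnn_feature_extractor(kernel_initializer=init)  # hypothetical CNN+LSTM
fc_head = build_fc_head(feature_dim=32)
model = tf.keras.Sequential([hdnn_base, fc_head])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_pretrain, y_pretrain, epochs=1000, batch_size=24)

# --- Fine-tuning (Section 2.5.2): HDNN frozen, FC re-initialized and trained.
# x_new_subject, y_new_subject: the new subject's labeled trials (assumed).
hdnn_base.trainable = False                # HDNN parameters held constant
new_fc = build_fc_head(feature_dim=32)     # fresh, randomly initialized FC
model_tl = tf.keras.Sequential([hdnn_base, new_fc])
model_tl.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                 loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model_tl.fit(x_new_subject, y_new_subject, epochs=100, batch_size=24)  # epochs illustrative
```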
3. Evaluation and discussion

In this section, we evaluate our proposed method and compare the performance of HDNN-TL, HDNN, and other state-of-the-art methods on BCI Competition IV dataset 2a.

3.1. Dataset

In this paper, the public 2008 BCI Competition IV dataset 2a EEG dataset [44] is adopted to evaluate the proposed HDNN and HDNN-TL. The dataset is a 4-class MI task (left hand, right hand, feet, and tongue) recorded from 9 healthy subjects with 22 scalp electrodes at a 250-Hz sampling rate. Each session has 72 trials per class, corresponding to 288 samples in total. The timing scheme consists of 2 s of fixation, a 1.25 s cue, and the following 4 s MI process. As in previous research on MI classification, the performance of the proposed method is measured in terms of accuracy and Cohen's kappa.

3.2. HDNN evaluation

In BCI Competition IV dataset 2a, each subject is recorded in two sessions: the first one, datasets ‘‘T’’, is used for training the classification algorithm, and the other, datasets ‘‘E’’, is used for evaluating the trained classification algorithm. In the training process, the HDNN was trained for 500 iterations on datasets ‘‘T’’ (one iteration corresponding to one mini-batch of training samples), and datasets ‘‘E’’ were used to evaluate the training effect after every training iteration. The relationship between training error rate and evaluation accuracy is overlaid in Fig. 7. The mean total training time over the nine subjects was about 15 s. As shown in Fig. 7, the training error fell below 0.1 and the evaluation accuracy curve clearly converged to around 0.8 after 300 iterations. To compare HDNN with other state-of-the-art methods, we provide kappa values for HDNN, as presented in Table 1.
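Since performance is reported in terms of Cohen's kappa, the following small sketch shows how kappa can be computed from true and predicted labels via the confusion matrix; it is a generic implementation, not code from the paper.

```python
import numpy as np

def cohen_kappa(y_true, y_pred, num_classes=4):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), from the confusion matrix.

    y_true, y_pred: integer class labels, e.g. 0..3 for the 4 MI classes.
    """
    cm = np.zeros((num_classes, num_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    p_o = np.trace(cm) / n                           # observed agreement (accuracy)
    p_e = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2   # chance agreement
    return (p_o - p_e) / (1 - p_e)
```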
Table 1
Mean kappa values of the HDNN, HDNN-TL and competing methods on the BCI Competition IV dataset 2a.

Methods                              A01   A02   A03   A04   A05   A06   A07   A08   A09   Average
Ang et al. (FBCSP) [11]              0.68  0.42  0.75  0.48  0.40  0.27  0.77  0.75  0.61  0.57
Kam et al. [45]                      0.74  0.35  0.76  0.53  0.38  0.31  0.84  0.74  0.74  0.60
LDA                                  0.76  0.41  0.83  0.56  0.35  0.26  0.79  0.80  0.72  0.60
Blumberg et al. (EM-LDA) [46]        0.59  0.41  0.82  0.57  0.38  0.29  0.79  0.80  0.72  0.60
Vidaurre et al. (PMean) [13]         0.76  0.38  0.87  0.60  0.46  0.34  0.77  0.76  0.74  0.63
Luis et al. [47]                     0.83  0.51  0.88  0.68  0.56  0.35  0.90  0.84  0.75  0.70
Rebeca et al. [48]                   0.84  0.55  0.90  0.71  0.66  0.44  0.94  0.85  0.76  0.74
Sakhavi et al. [19]                  0.88  0.65  0.90  0.66  0.62  0.45  0.89  0.83  0.79  0.74
Ai et al. [49]                       0.77  0.54  0.84  0.70  0.63  0.61  0.77  0.84  0.86  0.73
Zhang et al. (shared network) [17]   0.87  0.59  0.90  0.76  0.82  0.66  0.95  0.86  0.89  0.80
HDNN                                 0.82  0.61  0.85  0.64  0.78  0.73  0.84  0.85  0.87  0.78
HDNN-TL                              0.92  0.63  0.86  0.67  0.81  0.75  0.86  0.87  0.91  0.81
Fig. 7. The relationship between training error, evaluation accuracy, and iterations. We trained the HDNN for 500 iterations; for each iteration, the training batch is 24 sets of EEG training samples, and the corresponding evaluation dataset was used to evaluate the network after every training iteration. The total training time is about 16 s.

Fig. 8. The relationship between training error and iterations for different sizes of training samples. The training error converges faster with a small training sample set.
3.3. HDNN-TL evaluation

Training HDNN-TL takes two steps: pre-training and fine-tune training. We merged datasets ‘‘T’’ from all subjects except the subject to be evaluated to pre-train the HDNN-TL, and we used datasets ‘‘T’’ of the subject absent from the pre-training process to fine-tune the FC parameters. For example, to evaluate the HDNN-TL for subject 8, we merge datasets ‘‘T’’ from subjects 1–7 and 9 to pre-train HDNN-TL, and then datasets ‘‘T’’ from subject 8 are used to fine-tune the FC layers of HDNN-TL. The comparison of HDNN-TL, HDNN, and other state-of-the-art methods evaluated on BCI Competition IV dataset 2a is shown in Table 1, including the champion of this competition [11]. Moreover, LDA results are provided as the baseline. The highest kappa value is highlighted in boldface.
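The leave-one-subject-out procedure above can be sketched as follows; the `data_T` dictionary layout and the function name are hypothetical, for illustration only.

```python
import numpy as np

def loso_split(data_T, test_subject):
    """Merge datasets 'T' of all subjects except `test_subject` for pre-training;
    the held-out subject's 'T' session is reserved for fine-tuning.

    data_T: dict mapping subject id (1..9) -> (X, y) arrays of session 'T' (assumed).
    """
    xs, ys = zip(*(data_T[s] for s in data_T if s != test_subject))
    x_pretrain, y_pretrain = np.concatenate(xs), np.concatenate(ys)
    x_finetune, y_finetune = data_T[test_subject]
    return (x_pretrain, y_pretrain), (x_finetune, y_finetune)

# e.g. evaluating subject 8: pre-train on subjects 1-7 and 9, fine-tune on subject 8
(pre_x, pre_y), (ft_x, ft_y) = loso_split(data_T, test_subject=8)
```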
Although the classification performance differs for each subject, HDNN and HDNN-TL show a large improvement over the previous traditional methods in general. The mean kappa values are 0.57 for FBCSP and 0.60 for LDA, versus 0.81 for HDNN-TL and 0.78 for HDNN, i.e., relative improvements of about 42% and 37% over FBCSP and about 35% and 30% over LDA, respectively. Compared with HDNN, the improvement of HDNN-TL is about 4%; a likely reason is that HDNN-TL can extract and learn more common features of the MI signal. Compared with some current state-of-the-art methods [19,48,49], both HDNN and HDNN-TL improve the mean kappa value by about 5 percentage points. Compared with the shared network in [17], HDNN is less impressive. The possible reason is that all the subjects' datasets are used to train the shared network, so the MI features are learned more comprehensively. However, the drawback of the shared network in [17] is that the training datasets need to contain a certain number of samples from the subjects to be classified; for that reason, the network needs to be retrained whenever a new subject arrives. As shown in Table 1, there is little to choose between the proposed HDNN-TL and the shared network [17] in 4-class MI classification. However, the remarkable advantage of the proposed HDNN-TL is that the network learns the individual MI features faster and needs fewer training samples for a new subject.

Fig. 8 shows the training error convergence of HDNN-TL in the fine-tune training process for subject 9 with different sizes of training samples. Just as expected, the training error converges faster with a smaller training set. Fig. 9 shows the evaluation accuracy for each subject with different numbers of training samples. Although a larger dataset often leads to better classification results, a satisfactory classification result can also be obtained with a small one, even better than with traditional methods. The advantages of training the network with a small sample set are faster network convergence and fewer training data to be collected from the new subject.

4. Conclusion

The emergence of deep learning (DL) techniques has greatly enhanced classification tasks in several fields, such as natural language and image processing. In recent years, deep learning methods have also been used widely in BCI applications: huge amounts of multi-channel EEG time series can be fed into deep neural networks to obtain satisfactory results. The challenges in improving the classification accuracy are (1) how to learn the feature representation of the MI task sufficiently and (2) how to deal with individual differences in the EEG signals of different subjects. To address the above challenges, we proposed the ‘‘brain-ID’’ framework based on deep transfer learning methods for the classification of MI tasks.
Fig. 9. Comparison of evaluation accuracy for different sizes of training samples. Although a better classification result can be obtained with a larger dataset, the result obtained with a small dataset is also acceptable.
The proposed end-to-end HDNN avoids the loss of features by learning the spatial and temporal features simultaneously from the MI signal (addressing problem 1). The combination of CNN and LSTM builds a parallel network that achieves significantly higher accuracy than traditional feature extraction and classification techniques. Furthermore, the OVR-FBCSP method is adopted to make the MI features more prominent. Exactly as concluded in [19], the CSP method is not affected by the network optimization, and in turn the network is forced to work with an input it has no control over.

Unlike other methods, the end-to-end HDNN-TL framework can fine-tune its parameters to adjust to a new subject quickly by using the transfer learning technique (addressing problem 2). With traditional classification techniques, it is time-consuming and troublesome to train a classification model repeatedly for each new subject, because (a) sufficient MI data are needed to make the classification model accurate, and collecting these data obviously costs a lot of time and labor; and (b) training the classification model from scratch also takes a lot of time. To address the above problems, we proposed the HDNN to learn the common features of the MI task and the FC network to map the common features to a new subject. Therefore, a small sample size can achieve an acceptable classification accuracy, and the training time decreases because there are fewer trainable parameters.

Furthermore, to alleviate the over-fitting problem, regularization techniques such as dropout are utilized. Another advantage of this work is the zero-padding strategy used in the CNN framework to avoid losing the edge information of the EEG spatial features.

One limitation of this study is that the transfer learning method we used still requires a small number of labeled samples to train the networks. According to the latest transfer learning research, rich labeled data from relevant domains can be exploited to help learning in the target task via unsupervised domain adaptation [50,51]. This might be a possible way to improve performance. In the future, we will attempt to explore an unsupervised domain adaptation transfer learning method for MI classification.

Overall, this study indicates that the ‘‘brain-ID’’ framework built from CNN, LSTM, and transfer learning is a promising classification technique for the MI task, outperforming techniques such as LDA, SVM, and other traditional classifiers. The presented work can be applied to imagery-based BCI systems and extended to other types of EEG-based BCIs.

CRediT authorship contribution statement

Ruilong Zhang: Writing - original draft, Conceptualization, Methodology, Software. Qun Zong: Project administration, Supervision. Liqian Dou: Writing - review & editing, Supervision. Xinyi Zhao: Writing - review & editing, Conceptualization. Yifan Tang: Software, Validation. Zhiyu Li: Investigation, Software.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported in part by the Fund of Science and Technology on Space Intelligent Control Laboratory (6142208180202), the Ministry of Education Equipment Development Fund (6141A0202304, 6141A02033311), and the National Key Research and Development Program of China under Grant 2018AAA0102401.

References

[1] J.R. Wolpaw, N. Birbaumer, D.J. McFarland, G. Pfurtscheller, T.M. Vaughan, Brain–computer interfaces for communication and control, Clin. Neurophysiol. 113 (6) (2002) 767–791, http://dx.doi.org/10.1016/S1388-2457(02)00057-3.
[2] B. van de Laar, D. Plass-Oude Bos, B. Reuderink, M. Poel, A. Nijholt, How much control is enough? Influence of unreliable input on user experience, IEEE Trans. Cybern. 43 (6) (2013) 1584–1592, http://dx.doi.org/10.1109/TCYB.2013.2282279.
[3] J. Long, Y. Li, H. Wang, T. Yu, J. Pan, F. Li, A hybrid brain computer interface to control the direction and speed of a simulated or real wheelchair, IEEE Trans. Neural Syst. Rehabil. Eng. 20 (5) (2012) 720–729, http://dx.doi.org/10.1109/TNSRE.2012.2197221.
[4] M. Arvaneh, C. Guan, K.K. Ang, C. Quek, Optimizing spatial filters by minimizing within-class dissimilarities in electroencephalogram-based brain–computer interface, IEEE Trans. Neural Netw. Learn. Syst. 24 (4) (2013) 610–619, http://dx.doi.org/10.1109/TNNLS.2013.2239310.
[5] G. Pfurtscheller, Graphical display and statistical evaluation of event-related desynchronization (ERD), Electroencephalogr. Clin. Neurophysiol. 43 (5) (1977) 757–760.
[6] G. Pfurtscheller, Event-related synchronization (ERS): an electrophysiological correlate of cortical areas at rest, Electroencephalogr. Clin. Neurophysiol. 83 (1) (1992) 62–69.
[7] G. Pfurtscheller, C. Neuper, D. Flotzinger, M. Pregenzer, EEG-based discrimination between imagination of right and left hand movement, Electroencephalogr. Clin. Neurophysiol. 103 (6) (1997) 642–651.
[8] T.M. Vaughan, W.J. Heetderks, L.J. Trejo, W.Z. Rymer, M. Weinrich, M.M. Moore, A. Kübler, B.H. Dobkin, N. Birbaumer, E. Donchin, Brain-computer interface technology: a review of the second international meeting, IEEE Trans. Neural Syst. Rehabil. Eng. 11 (2) (2003) 94–109.
[9] J. Klonovs, C.K. Petersen, H. Olesen, A. Hammershoj, ID proof on the go: Development of a mobile EEG-based biometric authentication system, IEEE Veh. Technol. Mag. 8 (1) (2013) 81–89, http://dx.doi.org/10.1109/MVT.2012.2234056.
[10] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, K. Muller, Optimizing spatial filters for robust EEG single-trial analysis, IEEE Signal Proc. Mag. 25 (1) (2008) 41–56, http://dx.doi.org/10.1109/MSP.2008.4408441.
[11] K.K. Ang, Z.Y. Chin, C. Wang, C. Guan, H. Zhang, Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b, Front. Neurosci. 6 (2012) 39, http://dx.doi.org/10.3389/fnins.2012.00039.
[12] S. Kumar, A. Sharma, T. Tsunoda, An improved discriminative filter bank selection approach for motor imagery EEG signal classification using mutual information, BMC Bioinformatics 18 (Suppl 16) (2017) 545.
[13] C. Vidaurre, M. Kawanabe, P. von Bünau, B. Blankertz, K.R. Müller, Toward unsupervised adaptation of LDA for brain–computer interfaces, IEEE Trans. Biomed. Eng. 58 (3) (2011) 587–597, http://dx.doi.org/10.1109/TBME.2010.2093133.
[14] Y. Li, C. Guan, An extended EM algorithm for joint feature extraction and classification in brain-computer interfaces, Neural Comput. 18 (11) (2006) 2730–2761.
[15] S. Sun, C. Zhang, Adaptive feature extraction for EEG signal classification, Med. Biol. Eng. Comput. 44 (10) (2006) 931.
[16] W. Wu, X. Gao, B. Hong, S. Gao, Classifying single-trial EEG during motor imagery by iterative spatio-spectral patterns learning (ISSPL), IEEE Trans. Biomed. Eng. 55 (6) (2008) 1733–1743, http://dx.doi.org/10.1109/TBME.2008.919125.
[17] R. Zhang, Q. Zong, L. Dou, X. Zhao, A novel hybrid deep learning scheme for four-class motor imagery classification, J. Neural Eng. (2019) http://dx.doi.org/10.1088/1741-2552/ab3471, URL http://iopscience.iop.org/10.1088/1741-2552/ab3471.
[18] N. Lu, T. Li, X. Ren, H. Miao, A deep learning scheme for motor imagery classification based on restricted Boltzmann machines, IEEE Trans. Neural Syst. Rehabil. Eng. 25 (6) (2017) 566–576, http://dx.doi.org/10.1109/TNSRE.2016.2601240.
[19] S. Sakhavi, C. Guan, S. Yan, Learning temporal information for brain-computer interface using convolutional neural networks, IEEE Trans. Neural Netw. Learn. Syst. 29 (11) (2018) 5619–5629, http://dx.doi.org/10.1109/TNNLS.2018.2789927.
[20] R.T. Schirrmeister, J.T. Springenberg, L.D.J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, T. Ball, Deep learning with convolutional neural networks for EEG decoding and visualization, Hum. Brain Mapp. 38 (11) (2017) 5391–5420.
[21] S. Sakhavi, C. Guan, S. Yan, Parallel convolutional-linear neural network for motor imagery classification, in: 2015 23rd European Signal Processing Conference (EUSIPCO), 2015, pp. 2736–2740, http://dx.doi.org/10.1109/EUSIPCO.2015.7362882.
[22] Y.R. Tabar, U. Halici, A novel deep learning approach for classification of EEG motor imagery signals, J. Neural Eng. 14 (1) (2017) 016003.
[23] K.P. Thomas, C. Guan, C.T. Lau, A.P. Vinod, K.K. Ang, Adaptive tracking of discriminative frequency components in electroencephalograms for a robust brain–computer interface, J. Neural Eng. 8 (3) (2011) 1–15.
[24] S. Lu, C. Guan, H. Zhang, Unsupervised brain computer interface based on intersubject information and online adaptation, IEEE Trans. Neural Syst. Rehabil. Eng. 17 (2) (2009) 135–145, http://dx.doi.org/10.1109/TNSRE.2009.2015197.
[25] C. Vidaurre, C. Sannelli, K.R. Müller, B. Blankertz, Machine-learning-based coadaptive calibration for brain-computer interfaces, Neural Comput. 23 (3) (2011) 791–816.
[26] W. Samek, C. Vidaurre, K.R. Müller, M. Kawanabe, Stationary common spatial patterns for brain–computer interfacing, J. Neural Eng. 9 (2) (2012) 026013.
[27] F. Lotte, C. Guan, Regularizing common spatial patterns to improve BCI designs: Unified theory and new algorithms, IEEE Trans. Biomed. Eng. 58 (2) (2011) 355–362, http://dx.doi.org/10.1109/TBME.2010.2082539.
[28] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (10) (2010) 1345–1359, http://dx.doi.org/10.1109/TKDE.2009.191.
[29] Q. Yang, X. Wu, 10 challenging problems in data mining research, Int. J. Inf. Tech. Decis. 5 (04) (2006) 597–604.
[30] Y.A.L. Alsabahi, L. Fan, X. Feng, Image classification method in DR image based on transfer learning, in: 2018 Eighth International Conference on Image Processing Theory, Tools and Applications (IPTA), 2018, pp. 1–4, http://dx.doi.org/10.1109/IPTA.2018.8608157.
[31] Y. Zhang, X. Gao, L. He, W. Lu, R. He, Objective video quality assessment combining transfer learning with CNN, IEEE Trans. Neural Netw. Learn. Syst. (2019) 1–15, http://dx.doi.org/10.1109/TNNLS.2018.2890310.
[32] M. Völker, R.T. Schirrmeister, L.D.J. Fiederer, W. Burgard, T. Ball, Deep transfer learning for error decoding from non-invasive EEG, in: 2018 6th International Conference on Brain-Computer Interface (BCI), 2018, pp. 1–6, http://dx.doi.org/10.1109/IWW-BCI.2018.8311491.
[33] D. Wu, B. Lance, V. Lawhern, Transfer learning and active transfer learning for reducing calibration data in single-trial classification of visually-evoked potentials, in: 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2014, pp. 2801–2807, http://dx.doi.org/10.1109/SMC.2014.6974353.
[34] G. Dornhege, B. Blankertz, G. Curio, K.-R. Müller, Increase information transfer rates in BCI by CSP extension to multi-class, in: Advances in Neural Information Processing Systems, 2004, pp. 733–740.
[35] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735–1780.
[36] K. Greff, R.K. Srivastava, J. Koutník, B.R. Steunebrink, J. Schmidhuber, LSTM: A search space odyssey, IEEE Trans. Neural Netw. Learn. Syst. 28 (10) (2017) 2222–2232.
[37] A.S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, CNN features off-the-shelf: An astounding baseline for recognition, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 512–519, http://dx.doi.org/10.1109/CVPRW.2014.131.
[38] R. Girshick, J. Donahue, T. Darrell, J. Malik, Region-based convolutional networks for accurate object detection and segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 38 (1) (2016) 142–158, http://dx.doi.org/10.1109/TPAMI.2015.2437384.
[39] H. Shin, L. Lu, L. Kim, A. Seff, J. Yao, R.M. Summers, Interleaved text/image deep mining on a large-scale radiology database, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1090–1099, http://dx.doi.org/10.1109/CVPR.2015.7298712.
[40] A. Gupta, M. Ayhan, A. Maida, Natural image bases to represent neuroimaging data, in: International Conference on Machine Learning, 2013, pp. 987–994.
[41] F. Fahimi, Z. Zhang, W. Boon Goh, T.-S. Lee, K. Ang, C. Guan, Inter-subject transfer learning with end-to-end deep convolutional neural network for EEG-based BCI, J. Neural Eng. 16 (2018) http://dx.doi.org/10.1088/1741-2552/aaf3f6.
[42] C. Wei, Y. Lin, Y. Wang, T. Jung, N. Bigdely-Shamlo, C. Lin, Selective transfer learning for EEG-based drowsiness detection, in: 2015 IEEE International Conference on Systems, Man, and Cybernetics, 2015, pp. 3229–3232, http://dx.doi.org/10.1109/SMC.2015.560.
[43] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint arXiv:1412.6980.
[44] M. Tangermann, K.-R. Müller, A. Aertsen, N. Birbaumer, C. Braun, C. Brunner, R. Leeb, C. Mehring, K.J. Miller, G. Müller-Putz, G. Nolte, G. Pfurtscheller, H. Preissl, G. Schalk, A. Schlögl, C. Vidaurre, S. Waldert, B. Blankertz, Review of the BCI competition IV, Front. Neurosci. 6 (2012) 55.
[45] T.E. Kam, H.I. Suk, S.W. Lee, Non-homogeneous spatial filter optimization for electroencephalogram (EEG)-based motor imagery classification, Neurocomputing 108 (5) (2013) 58–68.
[46] J. Blumberg, J. Rickert, S. Waldert, A. Schulze-Bonhage, A. Aertsen, C. Mehring, Adaptive classification for brain computer interfaces, in: 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2007, pp. 2536–2539, http://dx.doi.org/10.1109/IEMBS.2007.4352845.
[47] L.F. Nicolas-Alonso, R. Corralejo, J. Gomez-Pilar, R. Hornero, Adaptive semi-supervised classification to reduce intersession non-stationarity in multiclass motor imagery-based brain-computer interfaces, Neurocomputing 159 (C) (2015) 186–196, http://dx.doi.org/10.1016/j.neucom.2015.02.005.
[48] L.F. Nicolas-Alonso, R. Corralejo, J. Gomez-Pilar, D. Álvarez, R. Hornero, Adaptive stacked generalization for multiclass motor imagery-based brain computer interfaces, IEEE Trans. Neural Syst. Rehabil. Eng. 23 (4) (2015) 702–712, http://dx.doi.org/10.1109/TNSRE.2015.2398573.
[49] Q. Ai, A. Chen, K. Chen, Q. Liu, T. Zhou, S. Xin, Z. Ji, Feature extraction of four-class motor imagery EEG signals based on functional brain network, J. Neural Eng. 16 (2) (2019) 026032.
[50] E. Tzeng, J. Hoffman, K. Saenko, T. Darrell, Adversarial discriminative domain adaptation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7167–7176.
[51] Y. Zhang, Y. Wei, Q. Wu, P. Zhao, S. Niu, J. Huang, M. Tan, Collaborative unsupervised domain adaptation for medical image diagnosis, 2020, arXiv preprint arXiv:2007.07222.