
A Survey on Deep Learning-based Non-Invasive Brain Signals:

Recent Advances and New Frontiers

Xiang Zhang1,5, Lina Yao1, Xianzhi Wang2, Jessica Monaghan3, David McAlpine3, Yu Zhang4

1 University of New South Wales, Australia
2 University of Technology Sydney, Australia
3 Macquarie University, Australia
4 Lehigh University, USA
5 Harvard University, USA

arXiv:1905.04149v5 [cs.HC] 21 Oct 2020

Accepted October 2020

Abstract. Brain signals refer to the biometric information collected from the human brain. Research on brain signals aims to discover the underlying neurological or physical status of individuals through signal decoding. Emerging deep learning techniques have significantly improved the study of brain signals in recent years. In this work, we first present a taxonomy of non-invasive brain signals and the basics of deep learning algorithms. Then, we map the frontiers of applying deep learning to non-invasive brain signal analysis by summarizing a large number of recent publications. Moreover, building on these deep learning-powered brain signal studies, we report potential real-world applications that benefit not only disabled people but also healthy individuals. Finally, we discuss the open challenges and future directions.

Submitted to: J. Neural Eng.



1. Introduction

Brain signals measure the intrinsic biometric information of the human brain, which reflects the user's passive or active mental state. Through precise brain signal decoding, we can recognize the underlying psychological and physical status of the user and further improve his/her quality of life. Based on the collection method, brain signals comprise invasive signals and non-invasive signals. The former are acquired by electrodes implanted under the scalp, while the latter are collected on the human scalp without electrodes being inserted. In this survey, we mainly consider non-invasive brain signals¹.

Figure 1: General workflow of brain signal analysis. The system is called a Brain-Computer Interface if the classified signals are used to control smart equipment (dashed lines).

1.1. General Workflow

Figure 1 shows the general paradigm of brain signal decoding, which receives brain signals and produces the user's latent informatics. The workflow includes several key components: brain signal collection, signal preprocessing, feature extraction, classification, and data analysis. The brain signals are collected from humans and sent to the preprocessing component for denoising and enhancement. Then, discriminating features are extracted from the processed signals and sent to the classifier for further analysis.

The collection methods differ from signal to signal. For example, EEG signals measure the voltage fluctuations resulting from ionic currents within the neurons of the brain. Collecting EEG signals requires placing a series of electrodes on the scalp of the human head to record the electrical activity of the brain. Since the ionic current generated within the brain is measured at the scalp, obstacles (e.g., the skull) greatly decrease the signal quality: the fidelity of the collected EEG signals, measured as Signal-to-Noise Ratio (SNR), is only approximately 5% of that of the original brain signals [1]. The collection methods of other non-invasive signals can be found in Appendix A.

Therefore, brain signals are usually preprocessed before feature extraction to increase the SNR. The preprocessing component contains multiple steps such as signal cleaning (smoothing the noisy signals or resolving inconsistencies), signal normalization (normalizing each channel of the signals along the time axis), signal enhancement (removing direct current), and signal reduction (presenting a reduced representation of the signal).

Feature extraction refers to the process of extracting discriminating features from the input signals through domain knowledge. Traditional features are extracted from the time domain (e.g., variance, mean value, kurtosis), the frequency domain (e.g., fast Fourier transform), and time-frequency domains (e.g., discrete wavelet transform). They enrich the distinguishable information regarding user intention. Feature extraction is highly dependent on domain knowledge; for example, neuroscience knowledge is required to extract distinctive features from motor imagery EEG signals. Manual feature extraction is also time-consuming and difficult. Recently, deep learning has provided a better option for automatically extracting distinguishable features.

The classification component refers to the machine learning algorithms that classify the extracted features into logical control signals recognizable by external devices. Deep learning algorithms have been shown to be more powerful than traditional classifiers [2, 3, 4].

The classification results reflect the user's psychological or physical status and can inspire further information analysis. This is widely used in real-world applications such as neurological disorder diagnosis, emotion measurement, and driving fatigue detection. Appropriate treatment, therapy, and precaution can be conducted based on the analysis results.

Specifically, the system is called a Brain-Computer Interface (BCI) when the decoded brain signals are converted into digital commands that control smart equipment and react with the user (dashed lines in Figure 1). BCI² systems interpret human brain patterns into messages or commands to communicate with the outer world [5].

¹ Without specification, the brain signals mentioned in this work refer to non-invasive signals.
² Apart from BCI, there are a number of similar terms for systems in which machines are directly controlled by human brain signals, such as Brain-Machine Interface (BMI), Brain Interface (BI), Direct Brain Interface (DBI), Adaptive Brain Interface (ABI), and so on.
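To make the preprocessing and feature-extraction components above concrete, the following is a minimal sketch of our own (not taken from the paper) of a typical pipeline for multi-channel EEG, using NumPy/SciPy and PyWavelets; the filter bands, 50 Hz notch, 30 s window, and 256 Hz sampling rate are illustrative assumptions:

    import numpy as np
    import pywt  # PyWavelets, for the time-frequency features
    from scipy.signal import butter, filtfilt, iirnotch
    from scipy.stats import kurtosis

    FS = 256  # assumed sampling rate (Hz)

    def preprocess(eeg, fs=FS):
        """eeg: (n_channels, n_samples). Returns cleaned, normalized 30 s epochs."""
        # Signal cleaning: band-pass 0.5-40 Hz (illustrative band) ...
        b, a = butter(4, [0.5, 40.0], btype="bandpass", fs=fs)
        x = filtfilt(b, a, eeg, axis=-1)
        # ... plus a 50 Hz notch against power-line interference.
        bn, an = iirnotch(50.0, Q=30.0, fs=fs)
        x = filtfilt(bn, an, x, axis=-1)
        # Signal normalization: z-score each channel along the time axis.
        x = (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + 1e-8)
        # Segment into non-overlapping 30 s windows (a common epoch length).
        win = 30 * fs
        n = x.shape[-1] // win
        return x[:, : n * win].reshape(x.shape[0], n, win).transpose(1, 0, 2)

    def handcrafted_features(epoch, fs=FS):
        """Time-, frequency-, and time-frequency-domain features of one (ch, win) epoch."""
        feats = [epoch.mean(axis=-1), epoch.var(axis=-1), kurtosis(epoch, axis=-1)]
        # Frequency domain: FFT-based power in the classical EEG bands.
        psd = np.abs(np.fft.rfft(epoch, axis=-1)) ** 2
        freqs = np.fft.rfftfreq(epoch.shape[-1], d=1.0 / fs)
        for lo, hi in [(0.5, 4), (4, 8), (8, 13), (13, 30)]:  # delta, theta, alpha, beta
            band = (freqs >= lo) & (freqs < hi)
            feats.append(psd[:, band].mean(axis=-1))
        # Time-frequency domain: discrete wavelet transform sub-band energies.
        coeffs = pywt.wavedec(epoch, "db4", level=4, axis=-1)
        feats.extend((c ** 2).mean(axis=-1) for c in coeffs)
        return np.concatenate(feats)  # one flat feature vector per epoch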

BCI is generally a closed-loop system with an external device (e.g., a wheelchair or robotic arm) that can directly serve the user. In contrast, brain signal analysis does not require a specific device, as long as the analysis results can benefit society and individuals.

In this survey, we summarize the state-of-the-art studies that adopt deep learning models: 1) for feature extraction only; 2) for classification only; and 3) for both feature extraction and classification. The details will be introduced in Section 4. Brain signals underpin many novel applications that are important to people's daily lives. For example, brain signal-based user identification systems, with high fake-resistance, allow healthy people to enjoy enhanced entertainment and security [6]; for people with psychological/physical diseases or disabilities, brain signals enable them to control smart devices such as wheelchairs, home appliances, and robots. We present a wide range of deep learning-based brain signal applications in Section 5.

1.2. Why Deep Learning?

Although traditional brain signal systems have made tremendous progress [7, 8], they still face significant challenges. First, brain signals are easily corrupted by various biological artifacts (e.g., eye blinks, muscle artifacts, fatigue, and concentration level) and environmental artifacts (e.g., noise) [7]. Therefore, it is crucial to distill informative data from corrupted brain signals and build a robust system that works in different situations. Second, such systems face the low SNR of non-stationary electrophysiological brain signals [9]. The low SNR cannot be easily addressed by traditional preprocessing or feature extraction methods due to the time complexity of those methods and the risk of information loss [10]. Third, feature extraction highly depends on human expertise in the specific domain. For example, basic biological knowledge is required to investigate sleep states through Electroencephalogram (EEG) signals. Human experience may help in certain aspects but falls short in more general circumstances; an automatic feature extraction method is highly desirable. Moreover, most existing machine learning research focuses on static data and therefore cannot classify rapidly changing brain signals accurately. For instance, the state-of-the-art classification accuracy for multi-class motor imagery EEG is generally below 80% [11]. Novel learning methods are required to deal with dynamic data streams in brain signal systems.

Until now, deep learning has been applied extensively in brain signal applications and has shown success in addressing the above challenges [12, 13]. Deep learning has two advantages. First, it works directly on raw brain signals, thus avoiding time-consuming preprocessing and feature extraction. Second, deep neural networks can capture both representative high-level features and latent dependencies through deep structures.

1.3. Why is this Survey Necessary?

We conduct this survey for three reasons. First, there is no comprehensive survey on non-invasive brain signals. Table 1 summarizes the existing surveys on brain signals. To the best of our knowledge, the limited existing surveys [14, 24, 7, 11, 5, 8, 15] only cover parts of the EEG family. For example, Lotte et al. [11] and Wang et al. [18] focus on general EEG without analyzing EEG subtypes; Cecotti et al. [28] focus on Event-Related Potentials (ERP); Haseer et al. [29] focus on functional near-infrared spectroscopy (fNIRS); Mason et al. [15] briefly cover neurological phenomena such as event-related desynchronization (ERD), P300, SSVEP, Visual Evoked Potentials (VEP), and Auditory Evoked Potentials (AEP) but do not organize them systematically; Abdulkader et al. [7] present a topology of brain signals but do not mention spontaneous EEG or Rapid Serial Visual Presentation (RSVP); Lotte et al. [5] do not consider ERD or RSVP; and VEP should be a subtype of ERP in [8]. Ahn et al. [21] review the performance variation in MI-EEG based BCI systems. Roy et al. [17] list some deep learning-based EEG studies but offer little technical insight, provide limited analysis of the deep learning algorithms, and do not investigate non-invasive brain signals beyond EEG. In particular, compared to [17], this work provides a better introduction to deep learning, including the basic concepts, algorithms, and popular models (Section 3 and Appendix B). Moreover, this paper discusses high-level guidelines for brain signal analysis in terms of brain signal paradigms, suitable deep learning frameworks, and promising real-world applications (Section 6).

Second, little research has investigated the association between deep learning ([30, 31]) and brain signals ([32, 7, 11, 5, 8, 15]). To the best of our knowledge, this paper is among the first comprehensive surveys of recent advances in deep learning-based brain signal analysis. We also point out frontiers and promising directions in this area.

Lastly, the existing surveys focus on specific areas or applications and lack an overview of broader scenarios. For example, Litjens et al. [16] summarize several deep neural network concepts aimed at medical image analysis; Soekadar et al. [20] review BCI systems and machine learning methods for stroke-related motor paralysis based on Sensorimotor Rhythms (SMR); and Vieira et al. [33] investigate the application of brain signals to neurological disorders and psychiatry.

1.4. Our Contributions

This survey can mainly benefit: 1) researchers with a computer science background who are interested in brain signal research; and 2) biomedical/medical/neuroscience experts who want to adopt deep learning techniques to solve problems in basic science.

Table 1: The existing surveys on brain signals in the last decade. The column 'Comprehensiveness' indicates whether the survey covers all subcategories of non-invasive brain signals. MI EEG refers to Motor Imagery EEG signals. A dash marks cells that are blank in the source.

No. | Reference     | Comprehensiveness | Signal                                       | Deep Learning | Year | Area
2   | [14]          | No                | fMRI                                         | Yes           | 2018 | Mental Disease Diagnosis
3   | [11]          | Partial           | EEG (MI EEG, P300)                           | No            | 2007 | Classification
4   | [5]           | Partial           | EEG (MI EEG, P300)                           | Partial       | 2018 | Classification
5   | [15]          | Partial           | EEG (ERD, P300, SSVEP, VEP, AEP)             | No            | 2007 | –
6   | [16]          | No                | MRI, CT                                      | Partial       | 2017 | Medical Image Analysis
7   | [17]          | No                | EEG                                          | Yes           | 2019 | –
8   | [8]           | No                | EEG                                          | No            | 2007 | Signal Processing
9   | [18]          | Partial           | EEG                                          | No            | 2016 | BCI Applications
10  | [7]           | Yes               | –                                            | No            | 2015 | –
11  | [19]          | No                | EEG                                          | Partial       | 2018 | –
12  | [20]          | No                | EEG, fMRI                                    | No            | 2015 | Neurorehabilitation of Stroke
13  | [21]          | No                | MI EEG                                       | No            | 2015 | –
14  | [22]          | No                | fMRI                                         | No            | 2014 | –
15  | [23]          | No                | ERP (P300)                                   | No            | 2017 | Applications of ERP
16  | [24]          | No                | fMRI                                         | Yes           | 2018 | Applications of fMRI
17  | [25]          | No                | ERP                                          | No            | 2017 | Classification
18  | [26]          | Partial           | EEG                                          | No            | 2019 | Brain Biometrics
19  | [27]          | Partial           | EEG                                          | No            | 2018 | BCI Paradigms
20  | Current Study | Yes               | EEG and its subcategories, fNIRS, fMRI, MEG  | Yes           | –    | –

To the best of our knowledge, this survey is the first comprehensive survey of the recent advances and frontiers of deep learning-based brain signal analysis. To this end, we have summarized over 200 contributions, most of which were published in the last five years. We make several key contributions in this survey:

• We review brain signals and deep learning techniques to help readers gain a comprehensive understanding of this area of research.

• We discuss the popular deep learning techniques and state-of-the-art models for brain signals, providing practical guidelines for choosing a suitable deep learning model given a specific subtype of signal.

• We review the applications of deep learning-based brain signal analysis and highlight some promising topics for future research.

The rest of this survey is structured as follows. Section 2 briefly introduces a taxonomy of brain signals to help the reader build a big picture of this field. Section 3 overviews the commonly used deep learning models to provide the basic knowledge for researchers (e.g., neurological and biomedical scholars) who are not familiar with deep learning. Section 4 presents the state-of-the-art deep learning techniques for brain signals. Section 5 discusses the applications related to brain signals. Section 6 provides a detailed analysis and gives guidelines for choosing an appropriate deep learning model based on the specific brain signal. Section 7 points out the open challenges and future directions. Finally, Section 8 gives the concluding remarks.

2. Brain Imaging Techniques

In this section, we present a brief introduction to typical non-invasive brain imaging techniques. More fundamental details about non-invasive brain signals (e.g., concepts, characteristics, advantages, and drawbacks) are provided in Appendix A.

Figure 2 shows a taxonomy of non-invasive brain signals based on the signal collection method. Non-invasive signals divide into Electroencephalogram (EEG), functional near-infrared spectroscopy (fNIRS), functional magnetic resonance imaging (fMRI), and Magnetoencephalography (MEG) [34]. Table 2 summarizes the characteristics of these brain signals. In this survey, we mainly focus on EEG signals and their subcategories because they dominate the non-invasive signals. EEG monitors the voltage fluctuations generated by electrical currents within human neurons. Electrodes attached to the scalp can measure various types of EEG signals, including spontaneous EEG [35] and evoked potentials (EP) [36]. Depending on the scenario, spontaneous EEG further diverges into sleep EEG, motor imagery EEG, emotional EEG, mental disease EEG, and others.

Figure 2 (taxonomy tree, reproduced here as a list):

Non-Invasive Brain Signals
  - EEG
    - Spontaneous EEG: sleeping, motor imagery, emotional, mental disease, others
    - Evoked Potential (EP)
      - Event-Related Potential (ERP): Visual Evoked Potential (VEP), including Rapid Serial Visual Presentation (RSVP); Auditory Evoked Potential (AEP), including Rapid Serial Auditory Presentation (RAVP); Somatosensory Evoked Potential (SEP)
      - Steady-State Evoked Potential (SSEP): Steady-State Visually Evoked Potential (SSVEP); Steady-State Auditory Evoked Potential (SSAEP); Steady-State Somatosensory Evoked Potential (SSSEP)
  - fNIRS
  - fMRI
  - MEG

Figure 2: The taxonomy of non-invasive brain signals. The dashed nodes of the original figure (RAVP, SEP, SSAEP, and SSSEP) are not included in this survey because there is no existing work on them involving deep learning algorithms. P300, a positive potential recorded approximately 300 ms after the onset of the presented stimulus, is not listed in this signal tree because it is included in ERP (which refers to all potentials following the presented stimulus). In this classification, brain imaging techniques beyond EEG (e.g., MEG and fNIRS) could, in theory, also involve visual/auditory tasks, but we omit them since no existing work adopts deep learning for these tasks.

Similarly, EP divides into event-related potentials (ERP) [28] and steady-state evoked potentials (SSEP) [37] according to the frequency of the external stimuli. Each potential contains visual, auditory, and somatosensory potentials based on the external stimulus type.

Regarding the other non-invasive techniques: fNIRS produces functional neuroimages by employing near-infrared (NIR) light to measure the aggregation degree of oxygenated hemoglobin (Hb) and deoxygenated hemoglobin (deoxy-Hb), both of which absorb light more strongly than other head components such as the skull and scalp [38]; fMRI monitors brain activity by detecting blood flow changes in brain areas [14]; and MEG reflects brain activity via magnetic changes [39].

3. Overview of Deep Learning Models

In this section, we formally introduce the deep learning models, including the concepts, architectures, and techniques commonly used in the field of brain signal research. Deep learning is a class of machine learning techniques that uses many layers of information-processing stages in hierarchical architectures for pattern classification and feature/representation learning [31]. More detailed information about the deep learning techniques commonly used in brain signal analysis can be found in Appendix B.

Deep learning algorithms comprise several subcategories based on the aim of the techniques (Figure 3):

Table 2: Summary of non-invasive brain signals' characteristics.

Signal                | EEG        | fNIRS        | fMRI         | MEG
Spatial resolution    | Low        | Intermediate | High         | Intermediate
Temporal resolution   | High       | Low          | Low          | High
Signal-to-Noise Ratio | Low        | Low          | Intermediate | Low
Portability           | High       | High         | Low          | Low
Cost                  | Low        | Low          | High         | High
Characteristic        | Electrical | Metabolic    | Metabolic    | Magnetic

Figure 3: Deep learning models. They can be divided into discriminative, representative, generative, and hybrid models based on their function. Discriminative models (Appendix B.1) mainly include the Multi-Layer Perceptron (MLP), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN); the two mainstream RNN variants are Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). Representative models (Appendix B.2) can be divided into the Autoencoder (AE), the Restricted Boltzmann Machine (RBM), and Deep Belief Networks (DBN). D-AE denotes the Deep Autoencoder, i.e., an autoencoder with multiple hidden layers; likewise, D-RBM denotes a Deep Restricted Boltzmann Machine with multiple hidden layers. A Deep Belief Network can be composed of AEs or RBMs; therefore, we divide DBN into DBN-AE and DBN-RBM. Generative models (Appendix B.3) commonly used in non-invasive brain signal analysis include the Variational Autoencoder (VAE) and Generative Adversarial Networks (GAN).

Table 3: Summary of deep learning model types.

Deep Learning  | Input      | Output         | Function                           | Training Method
Discriminative | Input data | Label          | Feature extraction, classification | Supervised
Representative | Input data | Representation | Feature extraction                 | Unsupervised
Generative     | Input data | New sample     | Generation, reconstruction         | Unsupervised
Hybrid         | Input data | –              | –                                  | –

• Discriminative deep learning models, which classify the input data into pre-known labels based on adaptively learned discriminative features. Discriminative algorithms are able to learn distinctive features through non-linear transformation and perform classification through probabilistic prediction³. Thus, these algorithms can play the role of both feature extraction and classification (corresponding to Figure 1). Discriminative architectures mainly include the Multi-Layer Perceptron (MLP) [40], Recurrent Neural Networks (RNN) [41], and Convolutional Neural Networks (CNN) [42], along with their variations.

• Representative deep learning models, which learn pure and representative features from the input data. These algorithms only serve the feature extraction function (Figure 1) and cannot perform classification. Commonly used deep learning algorithms for representation are the Autoencoder (AE) [43], the Restricted Boltzmann Machine (RBM) [44], and Deep Belief Networks (DBN) [45], along with their variations.

• Generative deep learning models, which learn the joint probability distribution of the input data and the target label. In the brain signal scope, generative algorithms are mostly used to generate batches of brain signal samples to enhance the training set. Generative models commonly used in brain signal analysis include the Variational Autoencoder (VAE)⁴ [46] and Generative Adversarial Networks (GANs) [47].

• Hybrid deep learning models, which combine two or more deep learning models. For example, a typical hybrid deep learning model employs a representative algorithm for feature extraction and a discriminative algorithm for classification.

³ The classification function is achieved by the combination of a softmax layer and one-hot label encoding. One-hot label encoding refers to encoding the label by the one-hot method, i.e., as a group of bits among which the only valid combinations of values are those with a single high (1) bit and all the other bits low (0). For instance, the label set {0, 1, 2, 3} can be encoded as (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1).
⁴ VAE is a variation of AE; however, they work in different ways, so we introduce AE and VAE separately.
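As a concrete illustration of footnote 3 (a sketch we add here, not taken from the paper), one-hot encoding and a softmax output layer can be written in a few lines of NumPy, using the same illustrative label set {0, 1, 2, 3}:

    import numpy as np

    def one_hot(labels, n_classes):
        """Encode integer labels as one-hot rows, e.g. 2 -> (0, 0, 1, 0)."""
        encoded = np.zeros((len(labels), n_classes))
        encoded[np.arange(len(labels)), labels] = 1.0
        return encoded

    def softmax(logits):
        """Turn a network's output layer into class probabilities."""
        z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    y = one_hot(np.array([0, 1, 2, 3]), 4)          # the example from footnote 3
    p = softmax(np.array([[2.0, 0.5, 0.1, -1.0]]))  # class probabilities, sum to 1
    # Training minimizes the cross-entropy -sum(y * log(p)) between the two.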

The characteristics of each deep learning subcategory are summarized in Table 3. Almost all classification functions in neural networks are implemented by a softmax layer, which is not regarded as an algorithmic component in this survey. For instance, a model combining a DBN and a softmax layer is still regarded as a representative model rather than a hybrid model.

4. State-of-the-Art DL Techniques for Brain Signals

In this section, we thoroughly summarize the advanced studies on deep learning-based brain signals (Table 4). The hybrid models are divided into three parts: the combination of RNN and CNN, the combination of representative and discriminative models (denoted as 'Repre + Discri'), and other hybrid models.

4.1. EEG

Due to their high portability and low price, EEG signals have attracted much attention, and most of the latest publications on non-invasive brain signals are related to EEG. In this section, we summarize two classes of EEG signals: spontaneous EEG and evoked potentials. As implied by the names, the former arise spontaneously while the latter require outside stimuli.

4.1.1. Spontaneous EEG We present the deep learning models for spontaneous EEG according to the application scenarios as follows.

(1) Sleep EEG. Sleep EEG is mainly used for recognizing sleep stages and diagnosing sleep disorders, or for cultivating healthy habits [48, 49]. According to the Rechtschaffen and Kales (R&K) rules, the sleep stages include wakefulness, non-REM (rapid eye movement) 1, non-REM 2, non-REM 3, non-REM 4, and REM. The American Academy of Sleep Medicine (AASM) recommends segmenting sleep into five stages: wakefulness, non-REM 1, non-REM 2, slow wave sleep (SWS), and REM; non-REM 3 and non-REM 4 are combined into SWS since there is no clear distinction between them [49]. Generally, in sleep stage analysis, the EEG signals are preprocessed by a filter whose passband varies across papers but is always notched at 50 Hz, and the signals are usually segmented into 30 s windows.

(i) Discriminative models. CNNs are frequently used for sleep stage classification on single-channel EEG [25, 50]. For example, Viamala et al. [51] manually extracted time-frequency features and achieved a classification accuracy of 86%. Others used RNN [52] and LSTM [53] models based on various features from the frequency domain, correlation, and graph theory.

(ii) Representative models. Tan et al. [54] adopted a DBN-RBM algorithm to detect sleep spindles based on Power Spectral Density (PSD) features extracted from sleep EEG signals and achieved an F-1 score of 92.78% on a local dataset. Zhang et al. [49] further combined a DBN-RBM model with three RBMs for sleep feature extraction.

(iii) Hybrid models. Manzano et al. [55] presented a multi-view algorithm to predict sleep stages by combining a CNN and an MLP: the CNN received the raw time-domain EEG oscillations, while the MLP received the spectral signals produced by the Short-Time Fourier Transform (STFT) over 0.5-32 Hz. Fraiwan et al. [56] combined a DBN with an MLP for neonatal sleep state identification. Supratak et al. [57] proposed a model combining a multi-view CNN and an LSTM for automatic sleep stage scoring, in which the former was adopted to discover time-invariant features while the latter (a bidirectional LSTM) learned the temporal dependencies across epochs. Dong et al. [58] proposed a hybrid deep learning model aimed at temporal sleep stage classification, taking advantage of an MLP for detecting hierarchical features along with an LSTM for sequential information learning.
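The CNN-plus-(bi)LSTM pattern shared by these sleep-staging hybrids can be sketched as follows. This is our own minimal PyTorch illustration of the pattern, not the architecture of any cited paper; the layer sizes, 100 Hz sampling rate, and five-class output are illustrative assumptions:

    import torch
    import torch.nn as nn

    class CnnLstmSleepStager(nn.Module):
        """Per-epoch CNN features + a bidirectional LSTM across epochs."""

        def __init__(self, n_channels=1, n_classes=5, feat_dim=64):
            super().__init__()
            # CNN encoder applied to every 30 s epoch independently.
            self.cnn = nn.Sequential(
                nn.Conv1d(n_channels, 16, kernel_size=64, stride=8), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=8), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # -> one 32-dim vector per epoch
            )
            self.proj = nn.Linear(32, feat_dim)
            # Bidirectional LSTM models stage transitions across epochs.
            self.lstm = nn.LSTM(feat_dim, 64, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * 64, n_classes)

        def forward(self, x):  # x: (batch, n_epochs, channels, samples_per_epoch)
            b, s, c, t = x.shape
            z = self.cnn(x.reshape(b * s, c, t)).squeeze(-1)  # (b*s, 32)
            z = self.proj(z).reshape(b, s, -1)                # (b, n_epochs, feat_dim)
            h, _ = self.lstm(z)                               # (b, n_epochs, 128)
            return self.head(h)                               # per-epoch stage logits

    model = CnnLstmSleepStager()
    night = torch.randn(2, 10, 1, 30 * 100)  # 2 recordings, 10 epochs of 30 s at 100 Hz
    print(model(night).shape)                # torch.Size([2, 10, 5])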
(2) MI EEG. Deep learning models have shown superiority in the classification of Motor Imagery (MI) EEG and real-motor EEG [59, 60].

(i) Discriminative models. Such models mostly use CNNs to recognize MI EEG [61]; some are based on manually extracted features [62, 63]. For instance, Lee et al. [64] and Zhang et al. [65] employed a CNN and a 2-D CNN, respectively, for classification; Zhang et al. [65] learned affective information from EEG signals and built a modified LSTM to control smart home appliances. Others also used CNN for feature extraction [66]. For example, Wang et al. [67] first used a CNN to capture latent connections from MI-EEG signals and then applied weak classifiers to choose important features for the final classification; Hartmann et al. [59] investigated how a CNN represents spectral features through the sequence of MI EEG samples. MLP has also been applied to MI EEG recognition [68], showing higher sensitivity to EEG phase features at earlier stages and higher sensitivity to EEG amplitude features at later stages.
Table 4: A summary of non-invasive brain signal studies based on deep learning models. In the original two-page matrix, the studies for each signal type are grouped into columns by model family: discriminative models (MLP, RNN, CNN), representative models (AE/D-AE, RBM/D-RBM, DBN-AE, DBN-RBM), generative models (VAE, GAN), and hybrid models (LSTM+CNN, 'Repre + Discri', and others). The studies per signal type are:

Sleep EEG: [25, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 69, 70]
MI EEG: [2, 4, 6, 10, 59-68, 71-86]
Emotional EEG: [87-108]
Mental Disease EEG: [109-131]
Data Augmentation: [81, 132-134]
Others: [135-160]
ERP / VEP: [73, 96, 97, 134, 147, 161-165, 167-173]
ERP / AEP: [165, 166, 187, 188]
ERP / RSVP: [12, 174-186]
SSEP / SSVEP: [189-197]
fNIRS: [38, 71, 198-201]
fMRI: [63, 117, 194, 202-217]
MEG: [204, 218-220]

(ii) Representative models. DBN is widely used as a basis for MI EEG classification due to its high representative ability [80, 79]. For example, Ren et al. [78] applied a convolutional DBN based on RBM components, showing better feature representation than hand-crafted features. Li et al. [77] processed EEG signals with the discrete wavelet transform and then applied a DBN-AE based on denoising AEs. Other models include the combination of an AE (for feature extraction) and a KNN classifier [75], the combination of a Genetic Algorithm (for hyper-parameter tuning) and an MLP (for classification) [84], the combination of an AE and XGBoost for multi-person scenarios [76], and the combination of LSTM and reinforcement learning for multi-modality signal classification [85, 2].

(iii) Hybrid models. Several studies proposed hybrid models for the recognition of MI EEG [81]. For example, Tabar et al. [4] extracted high-level representations from the time domain, frequency domain, and location information of EEG signals using a CNN and then used a DBN-AE with seven AEs as the classifier; Tan et al. [82] used a denoising AE for dimensionality reduction and a multi-view CNN combined with an RNN for discovering latent temporal and spatial information, finally achieving an average accuracy of 72.22% on a public dataset.

(3) Emotional EEG. The emotion of an individual can be evaluated along three dimensions: valence, arousal, and dominance. Combinations of these three dimensions form emotions such as fear, sadness, and anger, which can be revealed by EEG signals.

(i) Discriminative models. MLPs are traditionally used [137, 87], while CNN and RNN are increasingly popular in EEG-based emotion prediction [89, 90]. Typical CNN-based work in this category includes hierarchical CNNs [89, 92] and augmenting the training set for CNN [91]. Li et al. [89] were the first to propose capturing the spatial dependencies among EEG channels by converting multi-channel EEG signals into a 2-D matrix. Besides, Talathi [110] used a discriminative deep learning model composed of GRU cells. Zhang et al. [88] proposed a spatial-temporal recurrent neural network, which employs a multi-directional RNN layer to discover long-range contextual cues and a bi-directional RNN layer to capture the sequential features produced by the spatial RNN.

(ii) Representative models. DBN, especially DBN-RBM, is widely used for its unsupervised representation ability in emotion recognition [100, 106, 103]. For instance, Xu et al. [99, 101] proposed a DBN-RBM algorithm with three RBMs and an RBM-AE to predict affective states; Zhao et al. [126] and Zheng et al. [102] combined DBN-RBM with an SVM and a Hidden Markov Model (HMM), respectively, to address the same problem; Zheng et al. [96, 97] introduced a D-RBM with five hidden RBM layers to search for the important frequency patterns and informative channels in affect recognition; and Jia et al. [98] eliminated channels with high errors and then used a D-RBM for affective state recognition based on representative features of the remaining channels.

Emotion is affected by many subjective and environmental factors (e.g., gender and fatigue). Yan et al. [95] investigated the discrepancy in emotional patterns between men and women by proposing a novel model called the Bimodal Deep AutoEncoder (BDAE), which received both EEG and eye movement features and shared the information in a fusion layer connected to an SVM classifier. The results showed that females have higher EEG signal diversity for the fearful emotion while males do for the sad emotion; moreover, for women, the inter-subject differences in fear are more significant than for other emotions [95]. To overcome the mismatched distributions among samples collected from different subjects or different experimental sessions, Chai et al. [94] proposed an unsupervised domain adaptation technique called the subspace alignment autoencoder (SAAE), combining an AE with a subspace alignment solution. The proposed approach obtained a mean accuracy of 77.88% in a person-independent scenario.

(iii) Hybrid models. One commonly used hybrid model is the combination of an RNN and an MLP. For example, Alhagry et al. [108] employed an LSTM architecture for feature extraction from emotional EEG signals, with the features forwarded to an MLP for classification. Furthermore, Yin et al. [107] proposed a multi-view ensemble classifier to recognize individual emotions using multimodal physiological signals. The ensemble classifier contains several D-AEs with three hidden layers and a fusion structure: each D-AE receives one physiological signal (e.g., EEG) and sends its output to a fusion structure composed of another D-AE; at last, an MLP classifier makes the prediction based on the fused features. Kawde et al. [105] implemented an affect recognition system by combining a DBN-RBM for effective feature extraction with an MLP for classification.
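The 'representative model for features, discriminative model for decisions' pattern that recurs in these hybrids can be sketched as below. This is our own illustrative PyTorch example (the layer sizes and the 310-dimensional input are assumptions, not taken from any cited study): an autoencoder is first trained unsupervised to reconstruct the input features, and its encoder output is then fed to a small MLP classifier.

    import torch
    import torch.nn as nn

    class AEWithClassifier(nn.Module):
        """Unsupervised AE for representation + supervised MLP head."""

        def __init__(self, in_dim=310, code_dim=32, n_classes=3):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                         nn.Linear(128, code_dim))
            self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                         nn.Linear(128, in_dim))
            self.classifier = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                            nn.Linear(64, n_classes))

        def reconstruct(self, x):   # stage 1: minimize MSE(x, x_hat), no labels needed
            return self.decoder(self.encoder(x))

        def forward(self, x):       # stage 2: classify the learned code
            return self.classifier(self.encoder(x))

    model = AEWithClassifier()
    x = torch.randn(8, 310)  # a batch of physiological feature vectors
    recon_loss = nn.functional.mse_loss(model.reconstruct(x), x)  # stage-1 objective
    logits = model(x)        # stage-2 emotion logits for a cross-entropy loss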

(4) Mental Disease EEG. A large number of researchers have exploited EEG signals to diagnose neurological disorders, especially epileptic seizures [109].

(i) Discriminative models. The CNN is widely used in the automatic detection of epileptic seizures [112, 114, 116, 93]. For example, Johansen et al. [118] adopted a CNN to work on high-passed (1 Hz) EEG signals of epileptic spikes and achieved an AUC of 94.7%. Acharya et al. [113] employed a CNN model with 13 layers for depression detection, which was evaluated on a local dataset with 30 subjects and achieved accuracies of 93.5% and 96.0% based on the left- and right-hemisphere EEG signals, respectively. Morabito et al. [115] exploited a CNN structure to extract suitable features of multi-channel EEG signals to distinguish Alzheimer's Disease patients from patients with Mild Cognitive Impairment and a healthy control group; the EEG signals were bandpass filtered (0.1 ∼ 30 Hz), and the model achieved an accuracy of around 82% for three-class classification. Rapid Eye Movement Behavior Disorder (RBD) may lead to mental disorders like Parkinson's disease (PD); Ruffini et al. [111] described an Echo State Network (ESN) model, a particular class of RNN, to distinguish RBD patients from healthy individuals. In some research, the discriminative model is only employed for feature extraction. For example, Ansari et al. [119] used a CNN to extract latent features, which were fed into a Random Forest classifier for the final seizure detection in neonatal babies. Chu et al. [149] combined a CNN with a traditional classifier for schizophrenia recognition.

(ii) Representative models. For disease detection, one commonly used method is a representative model (e.g., DBN) followed by a softmax layer for classification [127]. Page et al. [125] adopted a DBN-AE to extract informative features from seizure EEG signals; the extracted features were fed into a traditional logistic regression classifier for seizure detection. Al et al. [131] proposed a multi-view DBN-RBM structure to analyze EEG signals from depression patients; the proposed approach contains multiple input pathways, each composed of two RBMs and corresponding to one EEG channel, and all the input pathways merge into a shared structure composed of further RBMs. Some papers preprocess the EEG signals through dimensionality reduction methods such as PCA [129], while others prefer to feed the raw signals directly to the representative model [122]. Lin et al. [122] proposed a sparse D-AE with three hidden layers to extract representative features from epileptic EEG signals, while Hosseini et al. [129] adopted a similar sparse D-AE with two hidden layers.

(iii) Hybrid models. A popular hybrid method is the combination of an RNN and a CNN. Shah et al. [128] investigated the performance of a CNN-LSTM on seizure detection after channel selection: the sensitivities ranged from 33% to 37% while the false alarms ranged from 38% to 50%. Golmohammadi et al. [130] proposed a hybrid architecture for the automatic interpretation of EEG that integrates both temporal and spatial information: 2D and 1D CNNs capture the spatial features while LSTM networks capture the temporal features; the authors reported a sensitivity of 30.83% and a specificity of 96.86% on the well-known TUH EEG seizure corpus. For the detection of early-stage Creutzfeldt-Jakob Disease (CJD), Morabito et al. [123] combined a D-AE and an MLP: the EEG signals of CJD patients were first bandpass filtered (0.5 ∼ 70 Hz) and then fed into a D-AE with two hidden layers for feature representation; finally, the MLP classifier obtained an accuracy of 81 ∼ 83% on a local dataset. A convolutional autoencoder, which replaces the fully-connected layers of a standard AE with convolutional and de-convolutional layers, has been applied to extract seizure features in an unsupervised manner [124].

(5) Data augmentation. Generative models such as GANs can be used for data augmentation in brain signal classification [132]. Palazzo et al. [133] first demonstrated that the information contained in brainwaves is sufficient to distinguish visual objects and then extracted more robust and distinguishable representations of the EEG data using an RNN; finally, they employed the GAN paradigm to train an image generator conditioned on the learned EEG representations, which could convert EEG signals into images [133]. Kavasidis et al. [134] also aimed at converting EEG signals into images: the EEG signals were collected while the subjects observed images on a screen; an LSTM layer was employed to extract latent features from the EEG signals, and the extracted features served as the input of a GAN structure whose generator and discriminator were both composed of convolutional layers; after pre-training, the generator was supposed to generate an image based on the input EEG signals. Abdelfattach et al. [132] adopted a GAN for seizure data augmentation, with the generator and discriminator both composed of fully-connected layers. The authors demonstrated that the GAN outperforms other generative models such as AE and VAE: after augmentation, the classification accuracy increased dramatically from 48% to 82%.
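A fully-connected GAN of the kind used for such augmentation can be sketched as follows. This is a minimal PyTorch illustration under assumed sizes (a flattened 1-D segment of 256 samples, 64-dimensional noise), not the configuration of the cited work:

    import torch
    import torch.nn as nn

    SIG_LEN, NOISE_DIM = 256, 64  # assumed: one flattened EEG segment per sample

    # Generator: noise vector -> synthetic EEG-like segment.
    G = nn.Sequential(nn.Linear(NOISE_DIM, 128), nn.ReLU(),
                      nn.Linear(128, SIG_LEN), nn.Tanh())
    # Discriminator: segment -> real/fake logit.
    D = nn.Sequential(nn.Linear(SIG_LEN, 128), nn.LeakyReLU(0.2),
                      nn.Linear(128, 1))

    bce = nn.BCEWithLogitsLoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

    def train_step(real):  # real: (batch, SIG_LEN), scaled to [-1, 1]
        b = real.size(0)
        fake = G(torch.randn(b, NOISE_DIM))
        # Discriminator step: push real -> 1 and fake -> 0.
        loss_d = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # Generator step: try to fool the discriminator (fake -> 1).
        loss_g = bce(D(fake), torch.ones(b, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()

    # After training, G(torch.randn(n, NOISE_DIM)) yields n synthetic segments
    # that can be added to the minority class of the training set.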
(6) Others. Researchers have explored a wide range of further topics. The first is how EEG signals are affected by persistent audio/visual stimuli. This differs from the potentials evoked by audio/visual stimulation because here the stimuli are continuously present rather than flickering at a particular frequency. Stober et al. [188, 142] claimed that rhythm-evoked EEG signals are informative enough to distinguish the rhythm stimuli. The authors conducted an experiment in which 13 participants were stimulated by 24 rhythmic stimuli, comprising 12 East African and 12 Western stimuli; for the 24-category classification, the proposed CNN achieved a mean accuracy of 24.4%. After that, the authors exploited a convolutional AE for representation learning and a CNN for recognition and achieved an accuracy of 27% in 12-class classification [157]. Sternin et al. [148] adopted a CNN to capture discriminative features from EEG oscillations to distinguish whether the subject was listening to or imagining music. Similarly, Sarkar et al. [165] designed two deep learning models to recognize the EEG signals aroused by audio or visual stimuli: for this binary classification task, the proposed CNN and a DBN-RBM with three RBMs achieved accuracies of 91.63% and 91.75%, respectively. Furthermore, spontaneous EEG can be used to distinguish the user's mental state (logical versus emotional) [172].

Moreover, some researchers focus on the impact of cognitive load [138] or physical workload [221] on EEG. Bashivan et al. [159] first extracted informative features through wavelet entropy and band-specific power, fed them into a DBN-RBM for further refinement, and finally employed an MLP for cognitive load level recognition. In another work [171], the same authors attempted to find general features that remain constant across inter-/intra-subject scenarios under various mental loads.

Yin et al. [150] collected EEG signals at different mental workload levels (e.g., high and low) for binary classification: the EEG signals were low-pass filtered, transformed to the frequency domain, and converted to power spectral density (PSD) features; the extracted PSD features were fed into a denoising D-AE structure for further refinement, finally yielding an accuracy of 95.48%. Li et al. [155] worked on the recognition of mental fatigue levels, including alert, slight fatigue, and severe fatigue.

In addition, EEG-based driver fatigue detection is an attractive area [158, 151, 147]. Huang et al. [140] designed a 3D CNN to predict the reaction time in drowsy driving, which is meaningful for reducing traffic accidents. Hajinoroozi et al. [153] adopted a DBN-RBM to handle EEG signals that had been processed by ICA, achieving an accuracy of around 85% in binary classification ('drowsy' or 'alert'). A strength of this paper is that it evaluated the DBN-RBM on three levels (time samples, channel epochs, and windowed samples); the experiments showed that the channel-epoch level outperformed the other two. San et al. [154] combined deep learning models with a traditional classifier to detect driver fatigue: the model contains a DBN-RBM structure followed by an SVM classifier and achieved a detection accuracy of 73.29%. Almogbel et al. [145] investigated drivers' mental states under different low workload levels; the proposed CNN is claimed to detect the driving workload directly from the raw EEG signals.

Research on eye state detection has shown exceedingly high accuracy. Narejo et al. [152] explored the detection of eye state (closed or open) based on EEG signals: they tried a DBN-RBM with three RBMs and a DBN-AE with three AEs and achieved a high accuracy of 98.9%. Reddy et al. [136] tried a simpler structure, an MLP, and obtained a slightly lower accuracy of 97.5%.

Furthermore, to make this survey more complete, we provide a brief introduction to event-related desynchronization/synchronization (ERD/ERS). ERD/ERS refers to the phenomenon that the magnitude and frequency distribution of the EEG signal power change during a specific brain state [36]. In particular, ERD denotes a power decrease of the ongoing EEG signals, while ERS represents a power increase. This characteristic can be used to detect the event that caused the EEG fluctuation; for example, [222] presents the ERD/ERS phenomena recorded in the motor cortex during a motor imagery task.

ERD/ERS mainly appears in sensory, cognitive, and motor procedures; it is not widely used in brain research due to drawbacks such as unstable accuracy across subjects [36]. In most situations, ERD/ERS is regarded as a specific feature of EEG power for further analysis [81, 4]. A task causes an ERD in the mu band (8-13 Hz) of the EEG and an ERS in the beta band (13-30 Hz). In particular, ERD/ERS is calculated as the relative change in power with respect to a baseline: ERD/ERS = (P_e − P_b)/P_b, where P_e denotes the signal power over a one-second segment while the event is occurring, and P_b denotes the signal power over a one-second baseline segment before the event [71]; generally, the baseline refers to the rest state. For example, Sakhavi et al. calculated the ERD/ERS map and analyzed the different patterns among different tasks; the analysis demonstrated that the dynamics of the energy should be considered because the static energy does not contain enough information [86].
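As a worked illustration of this formula (our own sketch, with an assumed 256 Hz sampling rate and the mu band as the default), the band powers of a one-second baseline and event segment can be compared directly:

    import numpy as np
    from scipy.signal import butter, filtfilt

    FS = 256  # assumed sampling rate (Hz)

    def band_power(segment, lo, hi, fs=FS):
        """Mean power of one channel within a frequency band."""
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        filtered = filtfilt(b, a, segment)
        return np.mean(filtered ** 2)

    def erd_ers(baseline, event, lo=8.0, hi=13.0):
        """Relative power change (P_e - P_b) / P_b in, e.g., the mu band (8-13 Hz).
        Negative values indicate ERD (power decrease), positive values ERS."""
        p_b = band_power(baseline, lo, hi)
        p_e = band_power(event, lo, hi)
        return (p_e - p_b) / p_b

    rng = np.random.default_rng(0)
    rest = rng.standard_normal(FS)        # one-second baseline segment before the event
    task = 0.7 * rng.standard_normal(FS)  # one-second segment during the event
    print(erd_ers(rest, task))            # < 0 here, i.e., an ERD in the mu band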
There are several overlooked yet promising areas. Baltatzis et al. [141] adopted a CNN to detect school bullying from the EEG recorded while subjects watched specific videos, achieving 93.7% and 88.58% for binary and four-class classification, respectively. Khurana et al. [223] proposed deep dictionary learning that outperformed several deep learning methods. Volker et al. [143] evaluated the use of deep CNNs in the flanker task, achieving an average accuracy of 84.1% on seen subjects and 81.7% on unseen subjects. Zhang et al. [160] combined a CNN and a graph network to discover latent information from EEG signals.

Miranda-Correa et al. [104] proposed a cascaded framework combining an RNN and a CNN to predict individuals' affective levels and personal factors (Big-Five personality traits, mood, and social context). An experiment conducted by Putten et al. [146] attempted to identify the user's gender from their EEG signals; they employed a standard CNN algorithm and achieved a binary classification accuracy of 81% on a local dataset. The detection of emergency braking intention could help reduce the response time: Hernandez et al. [144] demonstrated that the driver's EEG signals can distinguish braking intention from the normal driving state, applying a CNN algorithm that achieved an accuracy of 71.8% in binary classification. Behncke et al. [139] applied deep learning (a CNN model) in the context of robot-assistive devices, attempting to use the CNN to improve the accuracy of decoding robot errors from EEG while the subject watched the robot during both an object-grasping and a pouring task.

Teo et al. [135] tried to combine brain signals with a recommender system by predicting the user's preference from EEG signals. Sixteen participants took part in the experiment, in which EEG signals were collected while each subject was presented with 60 bracelet-like objects as rotating visual (3D) stimuli; an MLP algorithm was then adopted to classify whether the user liked or disliked each object, and this exploration obtained a prediction accuracy of 63.99%. Some researchers have tried to explore a common framework that can be used for various brain signal paradigms: Lawhern et al. [73] introduced EEGNet, based on a compact CNN, and evaluated its robustness in various brain signal contexts [73].

4.1.2. Evoked Potential Next, we introduce the latest research on evoked potentials, including ERP and SSEP.

(1) ERP. In most situations, ERP signals are analyzed through the P300 phenomenon; meanwhile, almost all studies on P300 are based on ERP scenarios. Therefore, in this section, the majority of P300-related publications are introduced in the VEP/AEP subsections according to the scenario.

(i) VEP. VEP is one of the most popular subcategories of ERP [23, 224, 163]. Ma et al. [225] worked on motion-onset VEP (mVEP) by extracting representative features through deep learning: they adopted a genetic algorithm combined with a multi-level sensing structure to compress the raw signals, and the compressed signals were sent to a DBN-RBM algorithm to capture more abstract high-level features. Maddula et al. [170] filtered the P300 signals elicited by visual stimuli with a bandpass filter (2 ∼ 35 Hz) and then fed them into a proposed hybrid deep learning model for further analysis; the model includes a 2D CNN structure to capture the spatial features, followed by an LSTM layer for temporal feature extraction. Liu et al. [168] combined a DBN-RBM representative model with an SVM classifier for a concealed information test and achieved a high accuracy of 97.3% on a local dataset. Gao et al. [167] employed an AE model for feature extraction followed by an SVM classifier; in the experiment, each segment contained 150 points, divided into five time steps of 30 points each, and the model achieved an accuracy of 88.1% on a local dataset. A wide range of P300-related studies is based on the P300 speller [173], which allows the user to write characters. Cecotti et al. [177] tried to increase the P300 detection accuracy for more precise word spelling: a new model was presented based on CNNs, comprising five low-level CNN classifiers with different feature sets whose outputs are voted into the final high-level result; the highest accuracy reached 95.5% on dataset II of the third BCI competition. Liu et al. [164] proposed a Batch Normalized Neural Network (BN3), a variant of CNN, for the P300 speller; the proposed method consists of six layers, with batch normalization applied to each batch. Kawasaki et al. [162] employed an MLP model to separate P300 segments from non-P300 segments and achieved an accuracy of 90.8%.

(ii) AEP. A few works have focused on the recognition of AEP. For example, Carabez et al. [187] proposed and tested 18 CNN structures to classify single-trial AEP signals. In the experiment, the volunteers wore earphones producing auditory stimuli designed according to the oddball paradigm. The experimental analysis demonstrated that the CNN frameworks, regardless of the number of convolutional layers, were effective at extracting the temporal and spatial features and provided competitive results. The AEP signals were filtered to 0.1 ∼ 8 Hz and downsampled from 256 Hz to 25 Hz; the experimental results showed that the downsampled data worked better.

(iii) RSVP. Among the various VEP paradigms, RSVP has attracted much attention [183]. In the analysis of RSVP, a number of discriminative deep learning models (e.g., CNN [177, 178, 182] and MLP [174]) have achieved great success. A common preprocessing method for RSVP signals is frequency filtering, with passbands generally ranging over 0.1 ∼ 50 Hz [176, 185]. Cecotti et al. [12] worked on the classification of ERP signals in the RSVP scenario and proposed a modified CNN model for the detection of a specific target in RSVP: in the experiment, images of faces and cars were regarded as targets or non-targets, respectively; the image presentation frequency was 2 Hz, the target probability in each session was 10%, and the proposed model achieved an AUC of 86.1%. Hajinoroozi et al. [179] adopted a CNN model targeting inter-subject and inter-task detection of RSVP; the experimental results showed that the CNN worked well cross-task but failed to reach satisfying performance in the cross-subject scenario. Mao et al. [175] compared three different deep neural network algorithms in predicting whether the subject had seen the target or not: the MLP, CNN, and DBN models obtained AUCs of 81.7%, 79.6%, and 81.6%, respectively. The same author also applied a CNN model to analyze RSVP signals for person identification [180].

Representative deep learning models are also applied to RSVP. Vareka et al. [186] verified whether deep learning performs well for single-trial P300 classification: they conducted an RSVP experiment in which the subjects were asked to recognize targets among non-targets and distracters; a DBN-AE was then implemented and compared with several non-deep-learning algorithms. The DBN-AE was composed of five AEs, with the hidden layer of the last AE having only two nodes, which can be used for classification through a softmax function; finally, the proposed model achieved an accuracy of 69.2%. Manor et al. [181] applied two deep neural networks to RSVP signals after lowpass filtering (0 ∼ 51 Hz): a discriminative CNN achieved an accuracy of 85.06%, while a representative convolutional D-AE achieved an accuracy of 80.68%.
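The P300 analyses above all start from stimulus-locked epochs. The following is our own minimal NumPy sketch of that step (the 256 Hz rate, window bounds, and baseline interval are illustrative assumptions): windows are cut around each stimulus onset, baseline-corrected, and averaged per condition so that the target-minus-non-target difference around 300 ms becomes visible.

    import numpy as np

    FS = 256  # assumed sampling rate (Hz)

    def extract_epochs(eeg, onsets, fs=FS, tmin=-0.2, tmax=0.8):
        """Cut stimulus-locked windows from continuous EEG.
        eeg: (n_channels, n_samples); onsets: stimulus sample indices,
        assumed far enough from the recording edges."""
        pre, post = int(-tmin * fs), int(tmax * fs)
        epochs = np.stack([eeg[:, o - pre : o + post] for o in onsets])
        # Baseline correction: subtract the mean of the pre-stimulus interval.
        baseline = epochs[:, :, :pre].mean(axis=-1, keepdims=True)
        return epochs - baseline  # (n_trials, n_channels, pre + post)

    def erp_difference(eeg, onsets, is_target):
        """Average target and non-target epochs; the difference is the P300 effect."""
        epochs = extract_epochs(eeg, np.asarray(onsets))
        mask = np.asarray(is_target, dtype=bool)
        return epochs[mask].mean(axis=0) - epochs[~mask].mean(axis=0)

    # e.g., inspect the difference wave ~300 ms after onset:
    # diff = erp_difference(eeg, onsets, labels); p300 = diff[:, int((0.2 + 0.3) * FS)]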

80%. Thomas et al. [190] first filter the raw SSVEP signals (PET) and fMRI, fNIRS has higher time resolution and
through a bandpass filter (5 ∼ 48 Hz) and then operated more affordable [201].
discrete FFT on consecutive 512 points. The processed data
were classified by a CNN (69.03%) and an LSTM (66.89%)
independently. 4.3. fMRI
Perez et al. [197] adopted a representative model, a
sparse AE, to extract the distinct features from the SSVEP Recently, several deep learning methods have been applied
from multi-frequency visual stimuli. The proposed model to fMRI analysis, especially on the diagnosis of cognitive
employed a softmax layer for the final classification and impairment [14, 33].
achieved the accuracy of 97.78%. Kulasingham et al. (1) Discriminative models. Among the discriminative
[195] classified SSVEP signals in the context of guilty models, CNN is a promising model to analyze fMRI
knowledge test. The authors applied DBN-RBM and DBN- [206]. For example, Havaei et al. built a segmentation
AE independently and achieved the accuracy of 86.9% and approach for brain tumor based on fMRI with a novel CNN
86.01%, respectively. Hachem et al. [189] investigated algorithm which can capture both the global features and
the influence of fatigue on SSVEP through an MLP model the local features simultaneously [205]. The convolutional
during wheelchair navigation. The goal of this study was filters have different size. Thus, the small-size and large-
to seek the key parameters to switch between manual, size filter could exploit the local and global features,
semi-autonomous, and autonomous wheelchair command. independently. Sarraf et al. [226, 207] applied deep CNN
Aznan et al. [193] explored the SSVEP classification, to recognize Alzheimer’s Disease based on fMRI and MRI
where the signals were collected through dry electrodes. data. Morenolopez et al. [227] employed a CNN model
The dry signals were more challenging for the lower SNR to deal with fMRI of brain tumor patients for three-class
than standard EEG signals. This study applied a CNN recognition (normal, edema, or active tumor). The model
discriminative model and achieved the highest accuracy of was evaluated over BRATS dataset and obtained the F1
96% over a local dataset. score of 88%. Hosseini et al. [117] employed CNN for
4.2. fNIRS

Up to now, only a few researchers have paid attention to deep learning-based fNIRS. Naseer et al. [38] analyzed the difference between two mental tasks (mental arithmetic and rest) based on fNIRS signals. The authors manually extracted six features from prefrontal cortex fNIRS and compared six different classifiers. The results demonstrated that the MLP, with an accuracy of 96.3%, outperformed all the traditional classifiers, including SVM, KNN, naive Bayes, etc. Huve et al. [198] classified fNIRS signals collected from subjects during three mental states: subtraction, word generation, and rest. The employed MLP model achieved an accuracy of 66.48% based on hand-crafted features (e.g., the concentration of OxyHb/DeoxyHb). After that, the authors studied mobile robot control through fNIRS signals and obtained binary classification accuracies of 82% (offline) and 66% (online) [199]. Chiarelli et al. [71] exploited the combination of fNIRS and EEG for left/right MI EEG classification. Sixteen features extracted from fNIRS signals (eight from OxyHb and eight from DeoxyHb) were fed into an MLP classifier with four hidden layers.

On the other hand, Hiroyasu et al. [201] attempted to detect the gender of the subject through fNIRS signals. The authors employed a denoising AE with three hidden layers to extract distinctive features, which were fed into an MLP classifier for gender detection. The model was evaluated over a local dataset and gained an average accuracy of 81%. In this study, the authors also pointed out that, compared with Positron Emission Tomography (PET) and fMRI, fNIRS has a higher temporal resolution and is more affordable [201].

4.3. fMRI

Recently, several deep learning methods have been applied to fMRI analysis, especially for the diagnosis of cognitive impairment [14, 33].

(1) Discriminative models. Among the discriminative models, CNN is a promising model for analyzing fMRI [206]. For example, Havaei et al. built a segmentation approach for brain tumors based on fMRI with a novel CNN algorithm which can capture both global and local features simultaneously [205]: the convolutional filters have different sizes, so the small-size and large-size filters exploit the local and global features, respectively. Sarraf et al. [226, 207] applied deep CNNs to recognize Alzheimer's Disease based on fMRI and MRI data. Morenolopez et al. [227] employed a CNN model to deal with fMRI of brain tumor patients for three-class recognition (normal, edema, or active tumor). The model was evaluated over the BRATS dataset and obtained an F1 score of 88%. Hosseini et al. [117] employed a CNN for feature extraction; the extracted features were classified by an SVM for the detection of epileptic seizures.

Furthermore, Li et al. proposed a data completion method based on CNN. In particular, it utilizes the information from fMRI data to complete PET, and then trains the classifier based on both fMRI and PET [208]. In the model, the input of the proposed CNN is an fMRI patch, and the output is a PET patch. There are two convolutional layers with ten filters mapping the fMRI to PET. The experiments illustrated that the classifier trained on the combination of fMRI and PET (92.87%) outperformed the one trained on fMRI alone (91.92%). Moreover, Koyamada et al. used a nonlinear MLP to extract common features from different subjects. The model was evaluated over a dataset from the Human Connectome Project (HCP) [202].
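The patch-to-patch mapping of [208] can be sketched as a small convolutional regressor. The two-layer, ten-filter structure follows the description above, while the patch size, kernel size, and the single-channel output projection are our illustrative assumptions rather than details from the paper.

import torch
import torch.nn as nn

class FMRIToPET(nn.Module):
    """Two-layer CNN mapping a 3-D fMRI patch to a PET patch, after [208].

    Patch size (15x15x15) and kernel size are assumptions for illustration.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 10, kernel_size=3, padding=1),  # ten filters, as described
            nn.ReLU(),
            nn.Conv3d(10, 1, kernel_size=3, padding=1),  # project back to one channel (assumed)
        )

    def forward(self, fmri_patch):
        return self.net(fmri_patch)

model = FMRIToPET()
fmri = torch.randn(8, 1, 15, 15, 15)      # a batch of fMRI patches
pet_pred = model(fmri)                     # same-shape synthetic PET patches
loss = nn.functional.mse_loss(pet_pred, torch.randn_like(pet_pred))
loss.backward()

In practice the synthesized PET patches are only an intermediate product: the downstream classifier is trained on the concatenation of the real fMRI and the completed PET.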
(2) Representative models. A wide range of publications demonstrated the effectiveness of representative models in the recognition of fMRI data [213]. Hu et al. [217] demonstrated that deep learning outperforms other machine learning methods in the diagnosis of neurological disorders such as Alzheimer's disease. Firstly, the fMRI images were converted into a matrix representing the activity of 90 brain regions. Secondly, a correlation matrix was obtained by calculating the correlation between each pair of brain regions, representing the functional connectivity between them. Furthermore, a targeted AE was built to classify the correlation matrix, which is sensitive to AD. The proposed approach achieved an accuracy of 87.5%.
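This region-correlation pipeline is easy to reproduce: average the BOLD signal within each region, correlate region pairs, and feed the resulting connectivity matrix to the classifier. A minimal NumPy sketch follows; the region count comes from the description above, while the time-series length is an assumption.

import numpy as np

def functional_connectivity(region_ts):
    """Pairwise Pearson correlation between regional BOLD time series.

    region_ts: array of shape (n_regions, n_timepoints), e.g. (90, T).
    Returns an (n_regions, n_regions) connectivity matrix, as in [217].
    """
    return np.corrcoef(region_ts)

bold = np.random.randn(90, 200)             # 90 regions, 200 volumes (assumed length)
conn = functional_connectivity(bold)        # (90, 90) correlation matrix
ae_input = conn[np.triu_indices(90, k=1)]   # flatten the unique pairs for the AE
print(ae_input.shape)                       # (4005,)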
Plis et al. [211] employed a DBN-RBM with three RBM components to extract distinctive features from ICA-processed fMRI and finally achieved an average F1 measure above 90% over four public datasets.
Suk et al. compared the effectiveness of DBN-RBM and DBN-AE for Alzheimer's disease detection, and the experimental results showed that the former obtained an accuracy of 95.4%, slightly lower than the latter (97.9%) [210]. Suk et al. [209] applied a D-AE model to extract latent features from resting-state fMRI for the diagnosis of Mild Cognitive Impairment (MCI). The latent features were fed into an SVM classifier, which achieved an accuracy of 72.58%. Ortiz et al. [212] proposed a multi-view DBN-RBM that receives the information of MRI and PET simultaneously. The learned representations were sent to several simple SVM classifiers, which were ensembled by voting into a stronger high-level classifier.

(3) Generative models. The reconstruction of natural images (e.g., from fMRI) has attracted lots of attention [215, 88, 203]. Seeliger et al. [214] proposed a deep convolutional GAN for reconstructing visual stimuli from fMRI, which aims at training a generator to create an image similar to the visual stimulus. The generator contains four convolutional layers that convert the input fMRI to a natural image. Han et al. [215] focused on the generation of synthetic multi-sequence fMRI using GAN. The generated images can be used for data augmentation to improve diagnostic accuracy, or for physician training to help better understand various diseases. The authors applied the existing Deep Convolutional GAN (DCGAN) [228] and Wasserstein GAN (WGAN) [229] and found that the former works better. Shen et al. [203] presented another image recovery approach by minimizing the distance between the real image and the image generated from real fMRI.
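As a rough illustration of the generator side of such a model, the sketch below maps an fMRI feature vector to an image through four transposed-convolution layers, echoing the four-layer generator described for [214]; the latent size and the 64x64 grayscale output are assumptions for illustration, not the architecture of any specific paper.

import torch
import torch.nn as nn

class FMRIGenerator(nn.Module):
    """DCGAN-style generator: fMRI feature vector -> 64x64 grayscale image.

    Four transposed-convolution layers, as described for [214];
    the latent size and image resolution are assumptions.
    """
    def __init__(self, latent_dim=128):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 4x4 -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 8x8 -> 16x16
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),    # 16x16 -> 32x32
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),     # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, fmri_features):
        x = self.fc(fmri_features).view(-1, 256, 4, 4)
        return self.deconv(x)

images = FMRIGenerator()(torch.randn(2, 128))  # two reconstructed stimuli, shape (2, 1, 64, 64)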
4.4. MEG

Garg et al. [218] worked on refining MEG signals by removing artifacts such as eye-blinks and cardiac activity. The MEG signals were decomposed by ICA first and then classified by a 1-D CNN model. The proposed approach achieved a sensitivity of 85% and a specificity of 97% over a local dataset. Hasasneh et al. [220] also focused on artifact detection (cardiac and ocular artifacts). Their approach uses a CNN to capture temporal features and an MLP to extract spatial information. Shu et al. [219] employed a sparse AE to learn the latent dependencies of MEG signals in a single-word decoding task. The results demonstrated that the proposed approach is advantageous for some subjects, although it did not produce an overall increase in decoding accuracy. Cichy et al. [204] applied a CNN model to recognize visual objects based on MEG and fMRI signals.

5. Brain Signal-based Applications

Deep learning models have contributed to a variety of brain signal applications, as summarized in Table 5. Papers focused on signal classification without an application background are not listed in this table; therefore, the publication counts in this table are lower than in Table 4.

5.1. Health Care

In the health care area, deep learning-based brain signal systems mainly work on the detection and diagnosis of mental diseases such as sleep disorders, Alzheimer's Disease, epileptic seizure, and other disorders. In the first place, for sleep disorder detection, most studies focus on sleep stage detection based on sleep spontaneous EEG. In this situation, researchers do not need to recruit patients with sleep disorders because sleep EEG signals can easily be collected from healthy individuals. In terms of algorithms, it can be observed from Table 5 that DBN-RBM and CNN are widely adopted for feature selection and classification. Ruffini et al. [111] go one step further by detecting REM Behavior Disorder (RBD), which may lead to neurodegenerative diseases such as Parkinson's disease. They achieved an average accuracy of 85% in distinguishing RBD from healthy controls.

Moreover, fMRI is widely applied in the diagnosis of Alzheimer's Disease. By taking advantage of the high spatial resolution of fMRI, the diagnosis achieved accuracies above 90% in several studies. Another reason that contributes to the competitive performance is the binary classification scenario. Apart from that, several publications diagnose AD based on spontaneous EEG [115, 126].

Besides, the diagnosis of epileptic seizure has attracted much attention. Seizure detection is mainly based on spontaneous EEG. The popular deep learning models in this scenario include independent CNNs and RNNs, along with hybrid models combining RNN and CNN. Some models integrate deep learning models for feature extraction with traditional classifiers for detection [127, 125]. For example, Yuan et al. [121] applied a D-AE for feature extraction followed by an SVM for seizure diagnosis. Ullah et al. [112] adopted voting for post-processing: several different CNN classifiers were proposed, and the final result was predicted by voting.

Furthermore, many other healthcare issues can be addressed by brain signal research. Cardiac artifacts in MEG can be automatically detected by deep learning models [218, 220]. Several modified CNN structures have been proposed to detect brain tumors based on fMRI from the public BRATS dataset [205, 206]. Researchers have demonstrated the effectiveness of deep learning models in the detection of a wide range of mental diseases such as depression [113], Interictal Epileptic Discharge (IED) [230], schizophrenia [211], Creutzfeldt-Jakob Disease (CJD) [123], and Mild Cognitive Impairment (MCI) [209].
Figure 4: Illustration of the publication proportions for crucial brain signals and deep learning models. (a) Brain signals. (b) Deep learning models.

5.2. Smart Environment

The smart environment is a promising application scenario for brain signals in the future. With the development of the Internet of Things (IoT), an increasing number of smart environments can be connected to brain signals. For example, assisting robots can be used in a smart home [65, 2], where the robot is controlled by the brain signals of the individual. Moreover, Behncke et al. [139] and Huve et al. [199] investigated the robot control problem based on visually stimulated spontaneous EEG and fNIRS signals. A brain signal-controlled exoskeleton could help people with lower-limb motor impairments in walking and daily activities [191]. In the future, research on brain-controlled appliances may benefit the elderly or disabled in smart homes and smart hospitals.

5.3. Communication

One of the biggest advantages of brain signals, compared to other human-machine interface techniques, is that they enable patients who have lost most motor abilities, such as speaking, to communicate with the outer world. Deep learning technology has improved the efficiency of brain signal-based communication. One typical paradigm which enables individuals to type without any motor system is the P300 speller, which can convert the user's intent into text [162]. Powerful deep learning models empower brain signal systems to distinguish the P300 segment from the non-P300 segment, where the former carries the communication information of the user [166]. At a higher level, representative deep learning models can help to detect which character the user is focusing on and print it on the screen to chat with others [166, 170, 164]. Additionally, Zhang et al. [10] proposed a hybrid model that combines RNN, CNN, and AE to extract informative features from MI EEG to recognize which letter the user wants to speak.

5.4. Security

Brain signals can be used in security scenarios such as identification (or recognition) and authentication (or verification). The former conducts multi-class classification to recognize a person's identity [6]; the latter conducts binary classification to decide whether a person is authorized [61].

The majority of existing biometric identification/authentication systems rely on individuals' intrinsic physiological features such as the face, iris, retina, voice, and fingerprint [6]. They are vulnerable to various attacks based on anti-surveillance prosthetic masks, contact lenses, vocoders, and fingerprint films. EEG-based biometric person identification is a promising alternative given its high resilience to spoofing attacks: an individual's EEG signals are virtually impossible for an imposter to mimic. Koike et al. [161] adopted deep neural networks to identify the user's ID based on VEP signals; Mao et al. [180] applied a CNN for person identification based on RSVP signals; Zhang et al. [6] proposed an attention-based LSTM model and evaluated it over both public and local datasets. EEG signals have also been combined with gait information in a hybrid deep learning model for a dual-authentication system [61].

5.5. Affective Computing

Affective states of a user provide critical information for many applications such as personalized information (e.g., multimedia content) retrieval and intelligent human-computer interface design [99]. Recent research illustrated that deep learning models can enhance the performance of affective computing. The most widely used circumplex model holds that emotions are distributed in two dimensions: arousal and valence. Arousal refers to the intensity of the emotional stimulus, i.e., how strong the emotion is. Valence refers to how positive or negative the emotion experienced by the person is. In some other models, dominance and liking dimensions are also deployed.
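As a concrete reading of the circumplex model, the snippet below maps a (valence, arousal) pair onto the four conventional quadrants; the quadrant names and the 0.5 neutral threshold on a normalized [0, 1] scale are illustrative assumptions, not categories fixed by the studies surveyed here.

def circumplex_quadrant(valence, arousal, neutral=0.5):
    """Map normalized valence/arousal in [0, 1] to a coarse emotion quadrant.

    The labels (happy/calm/angry/sad) are common illustrative names
    for the circumplex quadrants, assumed here for readability.
    """
    if valence >= neutral:
        return "happy/excited" if arousal >= neutral else "calm/content"
    return "angry/stressed" if arousal >= neutral else "sad/bored"

print(circumplex_quadrant(0.8, 0.9))  # high valence, high arousal -> happy/excited
print(circumplex_quadrant(0.2, 0.1))  # low valence, low arousal  -> sad/bored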
Some research [89, 90, 91] attempts to classify users' emotional states into two (positive/negative) or three categories (positive, neutral, and negative) based on EEG signals using deep learning algorithms such as CNN and its variants [87]. DBN-RBM is the most representative deep learning model for discovering the concealed features of emotional spontaneous EEG [99, 96]. Xu et al. [99] applied DBN-RBM as a feature extractor to classify affective states based on EEG.

Further, some researchers aim to recognize the positive/negative state of each specific emotional dimension. For example, Yin et al. [107] employed an ensemble classifier of AEs to recognize the user's affect. Each AE uses three hidden layers to filter out noise and to derive stable physiological feature representations. The proposed model was evaluated over the DEAP benchmark and achieved an arousal accuracy of 77.19% and a valence accuracy of 76.17%.

5.6. Driver Fatigue Detection

Vehicle drivers' ability to stay alert and maintain optimal performance dramatically affects traffic safety [145].
Table 5: Summary of deep learning-based brain signal applications. The 'Local' dataset refers to a private or unavailable dataset; the public datasets (along with download links) are introduced in Section 5.9. Among the signals, S-EEG, MD EEG, and E-EEG denote sleep EEG, mental disease EEG, and emotional EEG, respectively, while plain 'EEG' refers to the other subcategories of spontaneous EEG. Among the models, RF and LR denote random forest and logistic regression, respectively. In the performance column, 'N/A', 'Sen', 'Spe', 'Aro', 'Val', 'Dom', and 'Lik' denote not found, sensitivity, specificity, arousal, valence, dominance, and liking, respectively. For each application scenario, the literature is sorted by signal type and deep learning model. (Columns: Reference | Signals | Deep Learning Models | Dataset | Performance)

Health Care — Sleep Quality Evaluation:
Shahin et al. [69] | S-EEG | MLP | University Hospital in Berlin | 0.9
Biswai et al. [52] | S-EEG | RNN | Local | 0.8576
Ruffini et al. [111] | S-EEG | RNN | Local | 0.85
Vilamala et al. [51] | S-EEG | CNN | Sleep-EDF | 0.86
Tsinalis et al. [25] | S-EEG | CNN | Sleep-EDF | 0.82
Sors et al. [50] | S-EEG | CNN | SHHS | 0.87
Chambon et al. [48] | S-EEG | Multi-view CNN | MASS session 3 | N/A
Manzano et al. [55] | S-EEG | CNN + MLP | Sleep-EDF | 0.732
Fraiwan et al. [56] | S-EEG | DBN-AE + MLP | Local | 0.804
Tan et al. [54] | S-EEG | DBN-RBM | Local | 0.9278 (F1)
Zhang et al. [49] | S-EEG | DBN + voting | UCD | 0.9131
Fernandez et al. [70] | S-EEG | CNN | SHHS | 0.9 (F1)
Supratak et al. [57] | S-EEG | CNN + LSTM | MASS/Sleep-EDF | 0.862/0.82

Health Care — AD Detection:
Morabito et al. [115] | MD EEG | CNN | Local | 0.82
Zhao et al. [126] | MD EEG | DBN-RBM | Local | 0.92
Suk et al. [210] | fMRI | DBN-AE; DBN-RBM | ADNI | 0.979; 0.954
Sarraf et al. [207] | fMRI | CNN | ADNI | 0.9685
Li et al. [208] | fMRI | CNN + LR | ADNI | 0.9192
Hu et al. [217] | fMRI | D-AE + MLP | ADNI | 0.875
Ortiz et al. [212] | fMRI, PET | DBN-RBM + SVM | ADNI | 0.9
Hosseini et al. [120] | EEG | CNN | Local | 0.96

Health Care — Seizure Detection:
Yuan et al. [109] | MD EEG | Attention-MLP | CHB-MIT | 0.9661
Tsiouris et al. [53] | MD EEG | LSTM | CHB-MIT | >0.99
Talathi et al. [110] | MD EEG | GRU | BUD | 0.996
Acharya et al. [114] | MD EEG | CNN | UBD | 0.8867
Schirmeister et al. [116] | MD EEG | CNN | TUH | 0.854
Hosseini et al. [117] | MD EEG | CNN | Local | N/A
Johansen et al. [118] | MD EEG | CNN | Local | 0.947 (AUC)
Ansari et al. [119] | MD EEG | CNN + RF | Local | 0.77
Ullah et al. [112] | MD EEG | CNN + voting | UBD | 0.954
Wen et al. [124] | MD EEG | AE | Local | 0.92
Lin et al. [122] | MD EEG | D-AE | UBD | 0.96
Yuan et al. [121] | MD EEG | D-AE + SVM | CHB-MIT | 0.95
Page et al. [125] | MD EEG | DBN-AE + LR | N/A | 0.8∼0.9
Turner et al. [127] | MD EEG | DBN-RBM + LR | Local | N/A
Hosseini et al. [129] | MD EEG | D-AE + MLP | Local | 0.94
Golmohammadi et al. [130] | MD EEG | RNN + CNN | TUH | Sen: 0.3083; Spe: 0.9686
Shah et al. [128] | MD EEG | CNN + LSTM | TUH | Sen: 0.39; Spe: 0.9037
Table 5. Summary of deep learning-based brain signal applications (continued). IED and CJD refer to Interictal Epileptic Discharge and Creutzfeldt-Jakob Disease, respectively. (Columns: Reference | Signals | Deep Learning Models | Dataset | Performance)

Health Care — Other Disorders:
IED: Antoniades et al. [231] | EEG | AE + CNN | Local | 0.68
CJD: Morabito et al. [123] | MD EEG | D-AE | Local | 0.81∼0.83
Depression: Acharya et al. [113] | MD EEG | CNN | Local | 0.935∼0.9596
Depression: Al et al. [131] | MD EEG | DBN-RBM + MLP | Local | 0.695
Brain Tumor: Morenolopez et al. [227] | fMRI | CNN | BRATS | 0.88 (F1)
Brain Tumor: Shreyas et al. [206] | fMRI | CNN | BRATS | 0.83
Brain Tumor: Havaei et al. [205] | fMRI | Multi-scale CNN | BRATS | 0.88 (F1)
Schizophrenia: Plis et al. [211] | fMRI | DBN-RBM | Combined | 0.9 (F1)
Schizophrenia: Chu et al. [149] | — | CNN + RF + Voting | Local | 0.816, 0.967, 0.992
Mild Cognitive Impairment (MCI): Suk et al. [209] | fMRI | AE + SVM | ADNI2 | 0.7258
Cardiac Artifact Detection: Garg [218] | MEG | CNN | Local | Sen: 0.85, Spe: 0.97
Cardiac Artifact Detection: Hasasneh et al. [220] | MEG | CNN + MLP | Local | 0.944

Smart Environment:
Robot Control: Behncke et al. [139] | EEG | CNN | Local | 0.75
Smart Home: Zhang et al. [65] | MI EEG | RNN | EEGMMI | 0.9553
Exoskeleton Control: Kwak et al. [191] | SSVEP | CNN | Local | 0.9403
Exoskeleton Control: Huve et al. [199] | fNIRS | MLP | Local | 0.82

Communication:
Zhang et al. [10] | MI EEG | LSTM + CNN + AE | Local | 0.9452
Kawasaki et al. [162] | VEP | MLP | Local | 0.908
Cecotti et al. [166] | VEP | CNN | The third BCI competition, Dataset II | 0.945
Liu et al. [164] | VEP | CNN | The third BCI competition, Dataset II | 0.92∼0.96
Cecotti et al. [166] | VEP | CNN + Voting | The third BCI competition, Dataset II | 0.955
Maddula et al. [170] | VEP | RCNN | Local | 0.65∼0.76

Security — Identification:
Zhang et al. [6] | MI EEG | Attention-based RNN | EEGMMI + local | 0.9882
Koike et al. [161] | VEP | MLP | Local | 0.976
Mao et al. [180] | RSVP | CNN | Local | 0.97

Security — Authentication:
Zhang et al. [61] | MI EEG | Hybrid | EEGMMI + local | 0.984

Affective Computing:
Frydenlund et al. [87] | E-EEG | MLP | DEAP | N/A
Zhang et al. [88] | E-EEG | RNN | SEED | 0.895
Li et al. [201] | E-EEG | CNN | SEED | 0.882
Liu et al. [90] | E-EEG | CNN | Local | 0.82
Li et al. [89] | E-EEG | Hierarchical CNN | SEED | 0.882
Chai et al. [94] | E-EEG | AE | SEED | 0.818
Xu et al. [99] | E-EEG | DBN-AE, DBN-RBM | DEAP | >0.86 (F1)
Jia et al. [98] | E-EEG | DBN-RBM | DEAP | 0.8∼0.85 (AUC)
Li et al. [100] | E-EEG | DBN-RBM | DEAP | Aro: 0.642, Val: 0.584, Dom: 0.658
Xu et al. [101] | E-EEG | DBN-RBM | DEAP | Aro: 0.6984, Val: 0.6688, Lik: 0.7539
Table 5. Summary of deep learning-based brain signal applications (continued). (Columns: Reference | Signals | Deep Learning Models | Dataset | Performance)

Affective Computing (continued):
Zheng et al. [102] | E-EEG | DBN-RBM + HMM | Local | 0.8762
Zhang et al. [96, 97] | E-EEG | DBN-RBM + MLP | SEED | 0.8608
Gao et al. [106] | E-EEG | DBN-RBM + MLP | Local | 0.684
Yin et al. [107] | E-EEG | Multi-view D-AE + MLP | DEAP | Aro: 0.7719; Val: 0.7617
Mioranda et al. [104] | E-EEG | RNN + CNN | AMIGOS | <0.7
Alhagry et al. [108] | E-EEG | LSTM + MLP | DEAP | Aro: 0.8565, Val: 0.8545, Lik: 0.8799
Liu et al. [95] | EEG | AE | SEED, DEAP | 0.9101, 0.8325
Kawde et al. [105] | EEG | DBN-RBM | DEAP | Aro: 0.7033; Val: 0.7828; Dom: 0.7016

Driver Fatigue Detection:
Hung et al. [140] | EEG | CNN | Local | 0.572 (RMSE)
Almogbel et al. [145] | EEG | CNN | Local | 0.9531
Hajinoroozi et al. [147] | EEG | CNN | Local | 0.8294
Hajinoroozi et al. [153] | EEG | DBN-RBM | Local | 0.85
San et al. [154] | EEG | DBN-RBM + SVM | Local | 0.7392
Chai et al. [158] | EEG | DBN + MLP | Local | 0.931
Du et al. [151] | EEG | D-AE + SVM | Local | 0.094 (RMSE)
Hachem et al. [189] | SSVEP | MLP | Local | 0.75

Mental Load Measurement:
Yin et al. [150] | EEG | D-AE | Local | 0.9584
Bashivan et al. [159] | EEG | DBN-RBM | Local | 0.92
Li et al. [155] | EEG | DBN-RBM | Local | 0.9886
Bashivan et al. [171] | EEG | R-CNN | Local | 0.9111
Bashivan et al. [172] | EEG | DBN + MLP | Local | N/A
Naseer et al. [38] | fNIRS | MLP | Local | 0.963
Hennrich et al. [200] | fNIRS | MLP | Local | 0.641

Other Applications:
School Bullying: Baltatzis et al. [141] | EEG | CNN | Local | 0.937
Music Detection: Stober et al. [142] | EEG | CNN | Local | 0.776
Music Detection: Stober et al. [157] | EEG | AE + CNN | Open MIIR | 0.27 for 12-class
Music Detection: Stober et al. [188] | EEG | CNN | Local | 0.244
Music Detection: Sternin et al. [148] | EEG | CNN | Local | 0.75
Number Choosing: Waytowich et al. [192] | SSVEP | CNN | Local | 0.8
Visual Object Recognition: Cichy et al. [204] | fMRI, MEG | CNN | N/A | N/A
Visual Object Recognition: Manor et al. [176] | RSVP | CNN | Local | 0.75
Visual Object Recognition: Cecotti et al. [177] | RSVP | CNN | Local | 0.897 (AUC)
Visual Object Recognition: Hajinoroozi et al. [179] | RSVP | CNN | Local | 0.7242 (AUC)
Visual Object Recognition: Shamwell et al. [185] | RSVP | CNN | Local | 0.7252 (AUC)
Guilty Knowledge Test: Perez et al. [197] | SSVEP | AE | Local | 0.9778
Guilty Knowledge Test: Kulasingham et al. [195] | SSVEP | DBN-RBM; DBN-AE | Local | 0.869; 0.8601
Concealed Information Test: Liu et al. [168] | EEG | DBN-RBM | Local | 0.973
Flanker Task: Volker et al. [143] | EEG | CNN | Local | 0.841
Eye State: Narejo et al. [152] | EEG | DBN-RBM | UCI | 0.989
Eye State: Reddy et al. [136] | EEG | MLP | Local | 0.975
User Preference: Teo et al. [135] | EEG | MLP | Local | 0.6399
Emergency Braking: Hernandez et al. [144] | EEG | CNN | Local | 0.718
Gender Detection: Putten et al. [146] | EEG | CNN | Local | 0.81
Gender Detection: Hiroyasu et al. [201] | fNIRS | D-AE + MLP | Local | 0.81
EEG signals have proven useful in evaluating a human's cognitive state in different contexts. Generally, a driver is regarded as being in an alert state if the reaction time is lower than 0.7 seconds and in a fatigue state if it is higher than 2.1 seconds. Hajinoroozi et al. [153] considered the detection of driver fatigue from EEG signals by discovering distinct features. They explored an approach based on DBN for dimension reduction.
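A sketch of the reaction-time labeling rule above; how trials in the ambiguous middle band (0.7∼2.1 s) are treated is our assumption, since the studies surveyed here do not specify it.

def fatigue_label(reaction_time_s):
    """Label a driving trial from reaction time, per the rule of thumb above.

    < 0.7 s   -> 'alert'
    > 2.1 s   -> 'fatigue'
    otherwise -> None (ambiguous; such trials are often discarded
                 when building training sets, which is an assumption here)
    """
    if reaction_time_s < 0.7:
        return "alert"
    if reaction_time_s > 2.1:
        return "fatigue"
    return None

print([fatigue_label(rt) for rt in (0.45, 1.3, 2.6)])  # ['alert', None, 'fatigue']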
Detecting driver fatigue is crucial because the drowsiness of the driver may lead to disaster, and driver fatigue detection is feasible in practice. In the hardware aspect, the collection equipment for EEG signals is off-the-shelf and portable enough to be used in a car. Moreover, the price of an EEG headset is affordable for most people. In the algorithm aspect, deep learning models have enhanced the performance of fatigue detection. As we summarized, EEG-based driving drowsiness can be recognized with high accuracy (82%∼95%).

The future scope of driver fatigue detection lies in the self-driving scenario. In most situations of self-driving (e.g., Automation Level 3, see https://en.wikipedia.org/wiki/Self-driving_car), the human driver is expected to respond appropriately to a request to intervene, which indicates that the driver should remain in an alert state. Therefore, we believe the application of brain signal-based driver fatigue detection will benefit the development of the self-driving car.

5.7. Mental Load Measurement

EEG oscillations can be used to measure the mental workload level, which can sustain decision making and strategy development in the context of human-machine interaction [150]. Additionally, an appropriate mental workload is essential for maintaining human health and preventing accidents. For example, an abnormal mental workload of the human operator may result in performance degradation, which could cause catastrophic accidents [232]. Evaluating operator mental workload levels via ongoing EEG is quite promising in human-machine collaborative task environments for warning of temporary operator performance degradation.

Several researchers have paid attention to this topic. The mental workload can be measured from fNIRS signals or spontaneous EEG. Naseer et al. adopted an MLP algorithm for fNIRS-based binary mental task level classification (mental arithmetic and rest) [38]. The experimental results showed that the MLP outperformed traditional classifiers like SVM and KNN, achieving the highest accuracy of 96.3%. Bashivan et al. [159] presented a DBN model for the recognition of mental workload levels based on single-trial EEG. Before the DBN, the authors manually extracted the wavelet entropy and band-specific power from three frequency bands (theta, alpha, and beta). The experiments demonstrated that the recognition of mental workload achieved an overall accuracy of 92%. Zhang et al. [156] investigated mental load measurement across multiple mental tasks via a recurrent-convolutional framework. The model simultaneously learns EEG features from the spatial, spectral, and temporal dimensions, which results in an accuracy of 88.9% in binary classification (high/low workload levels).
5.8. Other Applications

There are plenty of interesting scenarios beyond the above where deep learning-based brain signals can be applied, such as recommender systems [135] and emergency braking [144]. One possible topic is the recognition of a visual object, which may be used in guilty knowledge tests [195] and concealed information tests [168]. The neurons of the participant produce a pulse when he/she suddenly watches a familiar object. Based on this theory, visual target recognition mainly uses RSVP signals. Cecotti et al. [177] aimed to build a common model for target recognition which can work for various subjects instead of a specific subject.

Besides, researchers have investigated distinguishing the subject's gender from fNIRS [201] and spontaneous EEG [146]. Hiroyasu et al. [201] adopted deep learning to recognize the gender of the subject based on cerebral blood flow. The experimental results suggested that cerebral blood flow changes in different ways for males and females. Putten et al. [146] tried to discover sex-specific information in brain rhythms and adopted a CNN model to recognize the participant's gender. This paper illustrated that fast beta activity (20∼25 Hz) is one of the most distinctive attributes.

5.9. Benchmark Datasets

We have extensively explored the benchmark datasets usable for deep learning-based brain signals (Table 6). We provide a set of public datasets with download links, which cover most brain signal types. In particular, BCI competition IV (BCI-C IV) contains five datasets via the same link. For better understanding, we present the number of subjects, the number of classes (how many categories), the sampling rate, and the number of channels of each dataset. In the '# Channel' column, the default channel count is for EEG
Table 6: Summary of the public datasets for brain signal studies. '# Sub', '# Cla', and 'S-Rate' denote the number of subjects, the number of classes, and the sampling rate, respectively. FM denotes finger movement; BCI-C denotes the BCI Competition. '# Channel' refers to the number of brain signal channels. (Columns: Name | Link | # Sub | # Cla | S-Rate | # Channel)

Sleep EEG:
Sleep-EDF: Telemetry | https://physionet.org/physiobank/database/sleep-edfx/ | 22 | 6 | 100 | 2
Sleep-EDF: Cassette | https://physionet.org/physiobank/database/sleep-edfx/ | 78 | 6 | 100, 1 | 2
MASS-1 | https://massdb.herokuapp.com/en/ | 53 | 5 | 256 | 17
MASS-2 | https://massdb.herokuapp.com/en/ | 19 | 6 | 256 | 19
MASS-3 | https://massdb.herokuapp.com/en/ | 62 | 5 | 256 | 20
MASS-4 | https://massdb.herokuapp.com/en/ | 40 | 6 | 256 | 4
MASS-5 | https://massdb.herokuapp.com/en/ | 26 | 6 | 256 | 20
SHHS | https://physionet.org/pn3/shhpsgdb/ | 5804 | N/A | 125, 50 | 2

Seizure EEG:
CHB-MIT | https://physionet.org/pn6/chbmit/ | 22 | 2 | 256 | 18
TUH | https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml | 315 | 2 | 200 | 19

MI EEG:
EEGMMI | https://physionet.org/pn4/eegmmidb/ | 109 | 4 | 160 | 64
BCI-C II, Dataset III | http://www.bbci.de/competition/ii/ | 1 | 2 | 128 | 3
BCI-C III, Dataset III a | — | 3 | 4 | 250 | 60
BCI-C III, Dataset III b | — | 3 | 2 | 125 | 2
BCI-C III, Dataset IV a | — | 5 | 2 | 1000 | 118
BCI-C III, Dataset IV b | — | 1 | 2 | 1000 | 118
BCI-C III, Dataset IV c | — | 1 | 2 | 1000 | 118
BCI-C IV, Dataset I | — | 7 | 2 | 1000 | 64
BCI-C IV, Dataset II a | — | 9 | 4 | 250 | 22
BCI-C IV, Dataset II b | — | 9 | 2 | 250 | 3

Emotional EEG:
AMIGOS | http://www.eecs.qmul.ac.uk/mmv/datasets/amigos/readme.html | 40 | 4 | 128 | 14
SEED | http://bcmi.sjtu.edu.cn/~seed/download.html | 15 | 3 | 200 | 62
DEAP | https://www.eecs.qmul.ac.uk/mmv/datasets/deap/ | 32 | 4 | 512 | 32

Others EEG:
Open MIIR | https://owenlab.uwo.ca/research/the_openmiir_dataset.html | 10 | 12 | 512 | 64

VEP:
BCI-C II, Dataset II b | http://www.bbci.de/competition/ii/ | 1 | 36 | 240 | 64
BCI-C III, Dataset II | — | 2 | 26 | 240 | 64

fMRI:
ADNI | http://adni.loni.usc.edu/data-samples/access-data/ | 202 | 3 | N/A | N/A
BRATS 2013 | https://www.med.upenn.edu/sbia/brats2018/data.html | 65 | 4 | N/A | N/A

MEG:
BCI-C IV, Dataset III | — | 2 | 4 | 400 | 10
signals. Some datasets contain more biometric signals (e.g., ECG), but we only list the channels related to brain signals.

6. Analysis and Guidelines

In this section, we first analyze the most suitable deep learning models for each brain signal. Then, we summarize the popular deep learning models in brain signal research. At last, we investigate brain signals in terms of applications. We hope this survey helps readers to select the most effective and efficient methods when dealing with brain signals. Please recall Table 4, where we summarize the brain signals and the corresponding deep learning models of the state-of-the-art papers. Figure 4 illustrates the publication proportions of the crucial brain signals and deep learning models.

6.1. Brain Signal Acquisition

Among the non-invasive signals, the studies on EEG far outnumber the sum of all the other brain signal paradigms (fNIRS, fMRI, and MEG). Furthermore, about 70% of the EEG papers pay attention to spontaneous EEG (133 publications). For better understanding, we split spontaneous EEG into several aspects: sleep, motor imagery, emotional, mental disease, data augmentation, and others.

First, the classification of sleep EEG mainly depends on discriminative and hybrid models. Among the nineteen studies on sleep stage classification, six employed CNN or modified CNN models independently, while two papers adopted RNN models. Three hybrid models were built on the combination of CNN and RNN.

Second, in terms of research on MI EEG (30 publications), independent CNN and CNN-based hybrid models are widely used. As for representative models, DBN-RBM is often applied to capture the latent features from MI EEG signals.

Third, there are twenty-five publications related to spontaneous emotional EEG. More than half of them employed representative models (such as D-AE, D-RBM,
especially DBN-RBM) for unsupervised feature learning. The most typical state recognition works classify the user's emotion as positive, neutral, or negative. Some researchers take a further step and classify the valence and arousal rates, which is more complex and challenging.

Fourth, the research on mental disease diagnosis is promising and attractive. The majority of the related research focuses on the detection of epileptic seizure and Alzheimer's Disease. Since such detection is a binary classification problem, which is rather easier than multi-class classification, many studies achieve a high accuracy, above 90%. In this area, the standard CNN model and the D-AE are prevalent. One possible reason is that CNN and AE are the most well-known and effective deep learning models for classification and dimensionality reduction, respectively.

Fifth, several publications pay attention to GAN-based data augmentation. Finally, about thirty studies investigate other spontaneous EEG tasks such as driving fatigue, audio/visual stimuli impact, cognitive/mental load, and eye state detection. These studies extensively apply standard CNN models and their variants.

Moreover, apart from spontaneous EEG, evoked potentials have also attracted much attention. On the one hand, within ERP, VEP and its subcategory RSVP have drawn lots of investigation because visual stimuli, compared to other stimuli, are easier to administer and more applicable in the real world (e.g., the P300 speller can be used for brain typing). For VEP (twenty-one publications), eleven studies applied discriminative models and six adopted hybrid models. In terms of RSVP, the sole CNN dominates the algorithms. Apart from these, five papers focused on the analysis of AEP signals. On the other hand, among the steady-state-related research, only SSVEP has been studied with deep learning models. Most of these studies only applied discriminative models for the recognition of the target image.

Furthermore, beyond the diverse EEG paradigms, a wide range of papers paid attention to fNIRS and fMRI. The fNIRS images are rarely studied by deep learning, and the major studies just employed simple MLP models. We believe more attention should be paid to research on fNIRS because of its high portability and low cost. As for fMRI, twenty-three papers proposed deep learning models for classification. The CNN model is widely used for its outstanding performance in feature learning from images. There are also several papers interested in image reconstruction based on fMRI signals. One reason why fMRI is so popular is that several public datasets are available on the Internet, although fMRI equipment is expensive. The MEG signals are mainly used in the medical area, which has so far been less receptive to deep learning algorithms; thus, we found only very few studies on MEG. The sparse AE and CNN algorithms have a positive influence on the feature refining and classification of MEG.

6.2. Selection Criteria for Deep Learning Models

Our investigation shows that discriminative models are the most frequent in the summarized publications. This is reasonable at a high level because a large proportion of brain signal problems can be regarded as classification problems. Another observation is that CNN and its variants are adopted in more than 70% of the discriminative models, for which we provide reasons as follows.

First, the design of CNN is powerful enough to extract latent discriminative features and spatial dependencies from EEG signals for classification. As a result, CNN structures are adopted for classification in some studies and for feature extraction in others.

Second, CNN has achieved great success in other research areas (e.g., computer vision), which makes it extremely famous and feasible to use (public code is available). Thus, brain signal researchers have more opportunities to understand and apply CNN in their work.

Third, some brain signal paradigms (e.g., fMRI) are naturally formed as two-dimensional images that are conducive to being processed by CNN. Meanwhile, other 1-D signals (e.g., EEG) can be converted into 2-D images for further analysis by CNN. Here, we provide two methods for converting 1-D EEG signals (with multiple channels) into a 2-D matrix: 1) convert each time-point (i.e., one sampling point; for example, there are 100 time-points per second under a 100 Hz sampling rate) into a 2-D image; 2) convert a segment into a 2-D matrix. In the first situation, suppose we have 32 channels; we can then collect 32 elements (each element corresponding to a channel) at each time-point. As described in [89], the collected 32 elements can be converted into a 2-D image based on the spatial positions of the electrodes. In the second situation, suppose we have 32 channels and the segment contains 100 time-points. The collected data can be arranged as a matrix with the shape [32, 100], where each row and column refers to a specific channel and time-point, respectively.
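A minimal sketch of both conversions, assuming the 32-channel, 100-time-point setting above; the electrode coordinates here are random placeholders rather than a real montage.

import numpy as np

n_channels, n_timepoints = 32, 100
segment = np.random.randn(n_channels, n_timepoints)    # one EEG segment

# Method 2: a segment is already a 2-D matrix of shape [channels, time-points].
matrix_2d = segment                                     # shape (32, 100)

# Method 1: place each time-point's 32 values on a 2-D grid by electrode position.
grid_size = 9                                           # assumed topographic resolution
positions = np.random.randint(0, grid_size, (n_channels, 2))  # placeholder (row, col) per channel

def timepoint_to_image(values, positions, grid_size=9):
    """Scatter one time-point's channel values onto a 2-D topographic grid,
    in the spirit of the spatial mapping used in [89]."""
    img = np.zeros((grid_size, grid_size))
    img[positions[:, 0], positions[:, 1]] = values
    return img

images = np.stack([timepoint_to_image(segment[:, t], positions)
                   for t in range(n_timepoints)])       # (100, 9, 9)
print(matrix_2d.shape, images.shape)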
Fourth, there are many variants of CNN which are suitable for a wide range of brain signal scenarios. For example, single-channel EEG signals can be processed by a 1-D CNN. In terms of RNN, only about 20% of the discriminative-model-based papers adopted RNNs, which is much less than we expected since RNNs have been demonstrated to be powerful in temporal feature learning. One possible reason for this phenomenon is that processing a long sequence with an RNN is time-consuming, and EEG signals generally form long sequences. For example, sleep signals are usually sliced into 30-second segments, which have 3000 time-points under a 100 Hz sampling rate. For a sequence with 3000 elements, through our preliminary experiments, an RNN takes more than 20 times the training time of a CNN. Moreover, MLP is not popular due to its inferior effectiveness (e.g., weaker non-linear representation ability) compared with the other algorithms, owing to its simple architecture.
As for representative models, DBN, especially DBN-RBM, is the most popular model for feature extraction. DBN is widely used for brain signals for two reasons: 1) it efficiently learns generative parameters that reveal the relationships of variables in neighboring layers; 2) it makes it straightforward to calculate the values of the latent variables in each hidden layer [233]. However, most works that employed the DBN-RBM model were published before 2016. It can be inferred that researchers preferred to use DBN for feature learning followed by a non-deep-learning classifier before 2016, but more recently, an increasing number of studies adopt CNN or hybrid models for both feature learning and classification.

Moreover, generative models are rarely employed independently. GAN- and VAE-based data augmentation and image reconstruction mainly focus on fMRI and EEG signals. It has been demonstrated that a trained classifier achieves more competitive performance after data augmentation. Therefore, this is a promising research prospect for the future.

Last but not least, fifty-three publications proposed hybrid models for brain signal studies. Among them, combinations of RNN and CNN account for about one-fifth. Since RNN and CNN have been shown to have excellent temporal and spatial feature extraction abilities, respectively, it is natural to combine them for joint temporal and spatial feature learning. Another type of hybrid model is the combination of representative and discriminative models. This is easy to understand because the former is employed for feature refining and the latter for classification. Twenty-eight publications, covering almost all brain signals, proposed this type of hybrid deep learning model. The adopted representative models are mostly AE or DBN-RBM; meanwhile, the adopted discriminative models are mostly CNN. Apart from that, twelve papers proposed other hybrid models such as combinations of two discriminative models. For example, several studies proposed the combination of CNN and MLP, where a CNN structure is used to extract spatial features and an MLP is used for classification.

6.3. Application Performance

In order to have a closer observation of the recent advances in deep learning-based brain signal analysis, we analyze the brain signal acquisition methods and the deep learning algorithms in terms of application performance. In some cases, various studies adopt the same deep architecture on the same dataset but report different performance, which may be caused by different pre-processing methods and hyper-parameter settings.

To begin with, the most appealing and hottest field is the use of brain signal analysis in the health care area. For sleep quality evaluation, the dominant brain signals are spontaneous EEG, measured while the patient is sleeping. Single RNN or CNN models seem to have a good discriminative feature learning ability and lead to competitive performance. Generally, most of the deep learning algorithms can achieve an accuracy above 85% in the multiple-sleep-stage scenario. Upon this, the combined hybrid models (e.g., CNN integrated with LSTM) offer only incremental improvements.

One key method to detect Alzheimer's Disease is brain signal analysis measuring the functions of specific brain regions. In detail, the diagnosis can be conducted with spontaneous EEG signals or fMRI images. For MD EEG, DBN is supposed to outperform CNN since the EEG signals contain more temporal than spatial information. As for fMRI pictures, CNNs have great advantages in learning from grid-arranged spatial information, which yields a very competitive classification accuracy (above 90%). As for epileptic seizure, the diagnosis is generally based on EEG signals. A single RNN classifier (e.g., LSTM or GRU) seems to work better than its counterparts due to its excellent ability to represent temporal dependencies. Here, complex hybrid models indeed outperform single components. For example, [130] achieves a better specificity than [116] on the same dataset because of the combination with an RNN. Most of the epileptic seizure detection models claim a rather high classification accuracy (above 95%). One possible reason is that the binary recognition scenario is much easier than multi-class classification.

The brain signal-controlled smart environment appears in only a small number of publications. Among them, the brain signals are collected through very different methods. This is an emerging but promising field because it is easy to integrate with smart homes and smart hospitals to benefit individuals, whether healthy or disabled. Another advantage of brain signals is bridging people's inner and outer worlds through communication techniques. In this area, many investigations focus on VEP signals because the visual evoked potential is obvious and easy to detect. One important data source is the third BCI competition. In addition, brain signal analysis can be widely implemented in security systems since brain signals are invisible and very hard to mimic. This characteristic of high fake-resistance makes brain signals a rising star for identification/authentication in confidential scenarios. The drawbacks of brain signal-based security systems are the expensive equipment and the inconvenience (e.g., the subject has to wear an EEG headset to monitor the brainwaves).

Affective computing has drawn much attention in recent years. EEG signals have a high temporal resolution and are able to capture quickly varying emotions. Therefore, almost all the studies are based on spontaneous EEG signals. The signals are gathered while the subject is watching videos that are supposed to arouse specific emotions. Another reason for this phenomenon is that there are several open-source EEG-based affective analysis datasets (e.g., DEAP and SEED), which greatly
promote the investigation in this area. EEG-based affective computing contains two mainstreams. One of them focuses on developing powerful discriminative classifiers (such as hierarchical CNN) which are designed to perform feature extraction and classification in the same step. The other tries to learn latent features through deep representative models (e.g., DBN-RBM) and then sends the learned representations into a powerful classifier (such as HMM or MLP). It can be observed that the former models ([88, 201]) seem to outperform the latter methods ([96]) by a small margin on the SEED dataset.

Driver fatigue detection can be easily integrated into platforms such as self-driving vehicles. Nevertheless, there are only a few publications in this area due to the expensive experimental cost and the lack of accessible datasets. Moreover, many other interesting applications (e.g., guilty knowledge test and gender detection) have been explored by deep learning models.

7. Open Issues

Although deep learning has lifted the performance of brain signal systems, technical and usability challenges remain. The technical challenges concern the classification ability in complex scenarios, and the usability challenges refer to limitations in large-scale real-world deployment. In this section, we introduce these challenges and point out possible solutions.

7.1. Explainable General Framework

Until now, we have introduced several types of brain signals (e.g., spontaneous EEG, ERP, fMRI) and the deep learning models that have been applied to each type. One promising research direction for deep learning-based brain signal research is to develop a general framework that can handle various brain signals regardless of the number of channels used for signal collection, the sample dimensions (e.g., 1-D or 2-D samples), the stimulation types (e.g., visual or audio stimuli), etc. The general framework would require two key capabilities: an attention mechanism and the ability to capture latent features. The former guarantees that the framework can focus on the most valuable parts of the input signals, and the latter enables the framework to capture distinctive and informative features.

The attention mechanism can be implemented based on attention scores or through various machine learning algorithms such as reinforcement learning. Attention scores can be inferred from the input data and act as weights that help the framework pay attention to the parts with high scores. Reinforcement learning has been shown to be able to find the most valuable part through policy search [85]. CNN is the most suitable structure for capturing features at various levels and ranges. In the future, CNN could be used as a fundamental feature learning tool and be integrated with suitable attention mechanisms to form a general classification framework.

One additional direction we may consider is how to interpret the feature representations derived by a deep neural network, and what the intrinsic relationship is between the learned features and the task-related neural patterns or the neuropathology of mental disorders. More and more people are realizing that interpretation could be even more important than prediction performance, since we usually just treat deep learning as a black box.

7.2. Subject-Independent Classification

Until now, most brain signal classification tasks have focused on person-dependent scenarios, where the training samples and testing samples are collected from the same individual. The future direction is to realize person-independent classification, so that the testing data never appear in the training set. High-performance person-independent classification is compulsory for the wide application of brain signals in the real world.

One possible solution to achieving this goal is to build a personalized model with transfer learning. A personalized affective model can adopt a transductive parameter transfer approach to construct individual classifiers and to learn a regression function that maps the relationship between the data distribution and the classifier parameters [234]. Another potential solution is mining the subject-independent component from the input data. The input data can be decomposed into two parts: a subject-dependent component, which depends on the subject, and a subject-independent component, which is common to all subjects. A hybrid multi-task model can work on two tasks simultaneously, one focusing on person identification and the other on class recognition; a well-trained and converged model is supposed to extract subject-independent features in the class recognition task.
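A minimal sketch of such a hybrid multi-task model: a shared encoder feeds two heads, one predicting the subject identity and one predicting the task class. The layer sizes, subject/class counts, and the simple summed loss are illustrative assumptions.

import torch
import torch.nn as nn

class MultiTaskEEG(nn.Module):
    """Shared encoder with two heads: subject identification and class recognition.

    Intuition from Section 7.2: features that serve class recognition but
    not person identification tend to be subject-independent.
    """
    def __init__(self, n_features=640, n_subjects=20, n_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
        )
        self.subject_head = nn.Linear(64, n_subjects)  # person identification task
        self.class_head = nn.Linear(64, n_classes)     # class recognition task

    def forward(self, x):
        z = self.encoder(x)
        return self.subject_head(z), self.class_head(z)

model = MultiTaskEEG()
x = torch.randn(16, 640)                               # a batch of flattened EEG segments
subj_logits, class_logits = model(x)
loss = nn.functional.cross_entropy(subj_logits, torch.randint(0, 20, (16,))) \
     + nn.functional.cross_entropy(class_logits, torch.randint(0, 4, (16,)))
loss.backward()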
7.3. Semi-supervised and Unsupervised Classification

The performance of deep learning highly depends on the size of the training data, which, however, requires expensive and time-consuming manual labeling to collect abundant class labels for a wide range of scenarios such as sleep EEG. While supervised learning requires both observations and labels for training, unsupervised learning requires no labels, and semi-supervised learning only requires partial labels [98]. They are, therefore, more suitable for problems with little ground truth.

Zhang et al. proposed an Adversarial Variational Embedding (AVAE) framework that combines a VAE++ model (as a high-quality generative model) and a semi-supervised GAN (as a posterior distribution learner) [235] for robust and effective semi-supervised learning. Jia et al. proposed a semi-supervised framework that leverages the data distribution of unlabelled data to prompt the representation learning of labelled data [98].
Two methods may enhance unsupervised learning: one is to employ crowd-sourcing to label the unlabeled observations; the other is to leverage unsupervised domain adaptation to align the distribution of the source brain signals and the distribution of the target signals with a linear transformation.

7.4. Online Implementation

Most of the existing brain signal systems focus on the offline procedure, which means that the training and testing datasets are pre-collected and evaluated offline. However, in real-world scenarios, brain signal systems are supposed to receive a live data stream and produce classification results in real time, which is still very challenging.

For EEG signals, compared to the offline procedure, the live signals gathered in an online system are more noisy and unstable due to many factors, such as reduced concentration of the subject [236] and the inherent destabilization of the equipment (e.g., a fluctuating sampling rate). Through our empirical experiments, online brain signal systems generally show an accuracy about 10% lower than their offline counterparts. One future scope of online implementation is to develop a batch of robust algorithms that handle these influencing factors and discover the latent distinctive patterns underlying the noisy live brain signals. [237] implemented an EEG-based online system that achieves comparable performance; however, this work only investigates a very high-level target (i.e., human attention). Discovering latent invariant representations through covariance matrices of EEG signals can help to mitigate the influence of extrinsic perturbations [238]. Some post-processing methods (e.g., voting and aggregating) [166, 149] can help to improve the decoding performance by averaging the results from multiple consecutive samples. However, these methods inevitably bring higher latency; thus, the post-processing requires a trade-off between high accuracy and low latency.
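A sketch of this voting-style post-processing, aggregating the predictions of the last few consecutive windows; the window count k is an illustrative choice, and larger k raises accuracy at the cost of extra latency, which is exactly the trade-off discussed above.

from collections import Counter, deque

class MajorityVoter:
    """Smooth streaming predictions by majority vote over the last k windows."""
    def __init__(self, k=5):
        self.window = deque(maxlen=k)

    def update(self, prediction):
        self.window.append(prediction)
        # Most common label in the buffer; ties resolve to the earliest-seen label.
        return Counter(self.window).most_common(1)[0][0]

voter = MajorityVoter(k=5)
stream = ["rest", "left", "left", "rest", "left", "left"]
print([voter.update(p) for p in stream])
# ['rest', 'rest', 'left', 'rest', 'left', 'left']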
For fNIRS and fMRI, online evaluation is relatively less challenging since they have a rather low temporal resolution. The online images, with fewer dynamics, can be regarded as static images to some extent, which makes the online system approximate the offline system. Furthermore, most fMRI and MEG signals are used to evaluate the user's neurological status (e.g., to detect the effects of a tumor), which does not require an instantaneous response. Thus, they have less demand for a real-time monitoring system.

7.5. Hardware Portability

Poor portability of hardware has been preventing brain signals from wide application in the real world. In most scenarios, users would like to use small, comfortable, or even wearable brain signal hardware to collect brain signals and to control appliances and assistant robots.

Currently, there are three types of EEG collection equipment: unportable equipment, portable headsets, and ear-EEG sensors. The unportable equipment has a high sampling frequency, channel count, and signal quality but is expensive; it is suitable for physical examination in a hospital. The portable headsets (e.g., Neurosky, Emotiv EPOC) have 1∼14 channels and 128∼256 Hz sampling rates but give inaccurate readings and cause discomfort after long-time use. The ear-EEG sensors, which are attached to the outer ear, have gained increasing attention recently but remain mostly at the laboratory stage [239]. The ear-EEG sensors contain a series of electrodes which are placed in each ear canal and concha [240]. The cEEGrid (http://ceegrid.com/home/concept/), to the best of our knowledge, is the only commercial ear-EEG; it has multi-channel sensor arrays placed around the ear using an adhesive and is even more expensive. A promising future direction is to improve usability by developing cheaper (e.g., below $200) and more comfortable (e.g., wearable for longer than 3 hours without discomfort) wireless ear-EEG equipment.

8. Conclusion

In this paper, we thoroughly summarize the recent advances in deep learning models for non-invasive brain signal analysis. Compared with traditional machine learning methods, deep learning not only learns high-level features automatically from brain signals but also has less dependency on domain knowledge. We organize brain signals and dominant deep learning models, then discuss the state-of-the-art deep learning techniques for brain signals. Moreover, we provide guidelines to help researchers find suitable deep learning algorithms for each category of brain signals. Finally, we overview deep learning-based brain signal applications and point out the open challenges and future directions.

References

[1] T. Ball, M. Kern, I. Mutschler, A. Aertsen, and A. Schulze-Bonhage, "Signal quality of simultaneously recorded invasive and non-invasive EEG," NeuroImage, vol. 46, no. 3, pp. 708–716, 2009.
[2] X. Zhang, L. Yao, S. Zhang, S. Kanhere, M. Sheng, and Y. Liu, "Internet of things meets brain-computer interface: A unified deep learning framework for enabling human-thing cognitive interactivity," IEEE Internet of Things Journal, 2018.
[3] X. An, D. Kuang, X. Guo, Y. Zhao, and L. He, "A deep learning method for classification of EEG data based on motor imagery," in International Conference on Intelligent Computing, 2014, pp. 203–210.
[4] Y. R. Tabar and U. Halici, "A novel deep learning approach for classification of EEG motor imagery signals," Journal of Neural Engineering, vol. 14, no. 1, p. 016003, 2016.
[5] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, "A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year
update," Journal of Neural Engineering, vol. 15, no. 3, p. 031005, 2018.
[6] X. Zhang, L. Yao, S. S. Kanhere, Y. Liu, T. Gu, and K. Chen, "MindID: Person identification from brain waves through attention-based recurrent neural network," Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 2, no. 3, p. 149, 2018.
[7] S. N. Abdulkader, A. Atia, and M.-S. M. Mostafa, "Brain computer interfacing: Applications and challenges," Egyptian Informatics Journal, vol. 16, no. 2, pp. 213–230, 2015.
[8] A. Bashashati, M. Fatourechi, R. K. Ward, and G. E. Birch, "A survey of signal processing algorithms in brain–computer interfaces based on electrical brain signals," Journal of Neural Engineering, vol. 4, no. 2, p. R32, 2007.
[9] W. Samek, K.-R. Müller, M. Kawanabe, and C. Vidaurre, "Brain-computer interfacing in discriminative and stationary subspaces," in Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE, 2012, pp. 2873–2876.
[10] X. Zhang, L. Yao, Q. Z. Sheng, S. S. Kanhere, T. Gu, and D. Zhang, "Converting your thoughts to texts: Enabling brain typing via deep feature learning of EEG signals," in 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom), 2018, pp. 1–10.
[11] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, and B. Arnaldi, "A review of classification algorithms for EEG-based brain–computer interfaces," Journal of Neural Engineering, vol. 4, no. 2, p. R1, 2007.
[12] H. Cecotti, M. P. Eckstein, and B. Giesbrecht, "Single-trial classification of event-related potentials in rapid serial visual presentation tasks using supervised spatial filtering," IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 11, pp. 2030–2042, 2014.
[13] M. Mahmud, M. S. Kaiser, A. Hussain, and S. Vassanelli, "Applications of deep learning and reinforcement learning to biological data," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2063–2079, 2018.
[14] D. Wen, Z. Wei, Y. Zhou, G. Li, X. Zhang, and W. Han, "Deep learning methods to process fMRI data and their application in the diagnosis of cognitive impairment: A brief overview and our opinion," Frontiers in Neuroinformatics, vol. 12, p. 23, 2018.
[15] S. Mason, A. Bashashati, M. Fatourechi, K. Navarro, and G. Birch, "A comprehensive survey of brain interface technology designs," Annals of Biomedical Engineering, vol. 35, no. 2, pp. 137–169, 2007.
[16] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[17] Y. Roy, H. Banville, I. Albuquerque, A. Gramfort, T. H. Falk, and J. Faubert, "Deep learning-based electroencephalography analysis: a systematic review," Journal of Neural Engineering, vol. 16, no. 5, p. 051001, 2019.
[18] X. Wang, G. Gong, N. Li, and Y. Ma, "A survey of the BCI and its application prospect," in Theory, Methodology, Tools and Applications for Modeling and Simulation of Complex Systems, 2016, pp. 102–111.
[19] F. Movahedi, J. L. Coyle, and E. Sejdić, "Deep belief networks for electroencephalography: A review of recent contributions and future outlooks," IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 3, pp. 642–652, 2018.
[20] S. R. Soekadar, N. Birbaumer, M. W. Slutzky, and L. G. Cohen,
vol. 95, pp. 4–20, 2014.
[23] A. Haider and R. Fazel-Rezai, "Application of P300 event-related potential in brain-computer interface," in Event-Related Potentials and Evoked Potentials, 2017.
[24] J. Liu, Y. Pan, M. Li, Z. Chen, L. Tang, C. Lu, and J. Wang, "Applications of deep learning to MRI images: a survey," Big Data Mining and Analytics, vol. 1, no. 1, pp. 1–18, 2018.
[25] O. Tsinalis, P. M. Matthews, Y. Guo, and S. Zafeiriou, "Automatic sleep stage scoring with single-channel EEG using convolutional neural networks," arXiv preprint arXiv:1610.01683, 2016.
[26] Q. Gui, M. Ruiz-Blondet, S. Laszlo, and Z. Jin, "A survey on brain biometrics," ACM Computing Surveys, vol. 51, no. 112, 2019.
[27] R. Abiri, S. Borhani, E. W. Sellers, Y. Jiang, and X. Zhao, "A comprehensive review of EEG-based brain–computer interface paradigms," Journal of Neural Engineering, vol. 16, no. 1, p. 011001, 2019.
[28] H. Cecotti and A. J. Ries, "Best practice for single-trial detection of event-related potentials: Application to brain-computer interfaces," International Journal of Psychophysiology, vol. 111, pp. 156–169, 2017.
[29] N. Naseer and K.-S. Hong, "fNIRS-based brain-computer interfaces: a review," Frontiers in Human Neuroscience, vol. 9, p. 3, 2015.
[30] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[31] L. Deng, "A tutorial survey of architectures, algorithms, and applications for deep learning," APSIPA Transactions on Signal and Information Processing, vol. 3, 2014.
[32] M. Fatourechi, A. Bashashati, R. K. Ward, and G. E. Birch, "EMG and EOG artifacts in brain computer interface systems: A survey," Clinical Neurophysiology, vol. 118, no. 3, pp. 480–494, 2007.
[33] S. Vieira, W. H. Pinaya, and A. Mechelli, "Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: Methods and applications," Neuroscience & Biobehavioral Reviews, vol. 74, pp. 58–75, 2017.
[34] P. Aricò, G. Borghini, G. Di Flumeri, N. Sciaraffa, and F. Babiloni, "Passive BCI beyond the lab: current trends and future directions," Physiological Measurement, vol. 39, no. 8, p. 08TR02, 2018.
[35] G. Pfurtscheller and C. Neuper, "Motor imagery and direct brain-computer communication," Proceedings of the IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
[36] D. Huang, K. Qian, D.-Y. Fei, W. Jia, X. Chen, and O. Bai, "Electroencephalography (EEG)-based brain–computer interface (BCI): A 2-D virtual wheelchair control based on event-related desynchronization/synchronization and state control," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 20, no. 3, pp. 379–388, 2012.
[37] D. Regan, "Steady-state evoked potentials," JOSA, vol. 67, no. 11, pp. 1475–1489, 1977.
[38] N. Naseer, N. K. Qureshi, F. M. Noori, and K.-S. Hong, "Analysis of different classification techniques for two-class functional near-infrared spectroscopy-based brain-computer interface," Computational Intelligence and Neuroscience, vol. 2016, 2016.
[39] S. Singh, S. Jain, T. Ahuja, Y. Sharma, and N. Pathak, "Study for reduction of pollution level in diesel engines, petrol engines and generator sets by bio signal ring," International Journal of Advance Research and Innovation, vol. 6, no. 3, pp. 175–181, 2018.
[40] S. K. Pal and S. Mitra, "Multilayer perceptron, fuzzy sets, and classification," IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 683–697, 1992.
[41] T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, and S. Khudanpur,
“Brain–machine interfaces in neurorehabilitation of stroke,” “Recurrent neural network based language model,” in Eleventh
Neurobiology of disease, vol. 83, pp. 172–179, 2015. annual conference of the international speech communication
[21] M. Ahn and S. C. Jun, “Performance variation in motor association, 2010.
imagery brain–computer interface: a brief review,” Journal of [42] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, and B. Ramabhadran,
neuroscience methods, vol. 243, pp. 103–110, 2015. “Deep convolutional neural networks for lvcsr,” in 2013
[22] S. Ruiz, K. Buyukturkoglu, M. Rana, N. Birbaumer, and R. Sitaram, IEEE international conference on acoustics, speech and signal
“Real-time fmri brain computer interfaces: self-regulation processing, 2013, pp. 8614–8618.
of single brain regions to networks,” Biological psychology, [43] M. A. Kramer, “Nonlinear principal component analysis using
[43] M. A. Kramer, "Nonlinear principal component analysis using autoassociative neural networks," AIChE journal, vol. 37, no. 2, pp. 233–243, 1991.
[44] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[45] G. E. Hinton, "Deep belief networks," Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
[46] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," arXiv preprint arXiv:1312.6114, 2013.
[47] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in neural information processing systems, 2014, pp. 2672–2680.
[48] S. Chambon, M. N. Galtier, P. J. Arnal, G. Wainrib, and A. Gramfort, "A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series," IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2018.
[49] J. Zhang, Y. Wu, J. Bai, and F. Chen, "Automatic sleep stage classification based on sparse deep belief net and combination of multiple classifiers," Transactions of the Institute of Measurement and Control, vol. 38, no. 4, pp. 435–451, 2016.
[50] A. Sors, S. Bonnet, S. Mirek, L. Vercueil, and J.-F. Payen, "A convolutional neural network for sleep stage scoring from raw single-channel eeg," Biomedical Signal Processing and Control, vol. 42, pp. 107–114, 2018.
[51] A. Vilamala, K. H. Madsen, and L. K. Hansen, "Neural networks for interpretable analysis of eeg sleep stage scoring," in International Workshop on Machine Learning for Signal Processing 2017, 2017.
[52] S. Biswal, J. Kulas, H. Sun, B. Goparaju, M. B. Westover, M. T. Bianchi, and J. Sun, "Sleepnet: automated sleep staging system via deep learning," arXiv preprint arXiv:1707.08262, 2017.
[53] K. M. Tsiouris, V. C. Pezoulas, M. Zervakis, S. Konitsiotis, D. D. Koutsouris, and D. I. Fotiadis, "A long short-term memory deep learning network for the prediction of epileptic seizures using eeg signals," Computers in biology and medicine, vol. 99, pp. 24–37, 2018.
[54] D. Tan, R. Zhao, J. Sun, and W. Qin, "Sleep spindle detection using deep learning: a validation study based on crowdsourcing," in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, 2015, pp. 2828–2831.
[55] M. Manzano, A. Guillén, I. Rojas, and L. J. Herrera, "Combination of eeg data time and frequency representations in deep networks for sleep stage classification," in International Conference on Intelligent Computing, 2017, pp. 219–229.
[56] L. Fraiwan and K. Lweesy, "Neonatal sleep state identification using deep learning autoencoders," in Signal Processing & its Applications (CSPA), 2017 IEEE 13th International Colloquium on, 2017, pp. 228–231.
[57] A. Supratak, H. Dong, C. Wu, and Y. Guo, "Deepsleepnet: a model for automatic sleep stage scoring based on raw single-channel eeg," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 25, no. 11, pp. 1998–2008, 2017.
[58] H. Dong, A. Supratak, W. Pan, C. Wu, P. M. Matthews, and Y. Guo, "Mixed neural network approach for temporal sleep stage classification," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 26, no. 2, pp. 324–333, 2018.
[59] K. G. Hartmann, R. T. Schirrmeister, and T. Ball, "Hierarchical internal representation of spectral features in deep convolutional networks trained for eeg decoding," in Brain-Computer Interface (BCI), 2018 6th International Conference on, 2018, pp. 1–6.
[60] E. Nurse, B. S. Mashford, A. J. Yepes, I. Kiral-Kornek, S. Harrer, and D. R. Freestone, "Decoding eeg and lfp signals using deep learning: heading truenorth," in Proceedings of the ACM International Conference on Computing Frontiers, 2016, pp. 259–266.
[61] X. Zhang, L. Yao, K. Chen, X. Wang, Q. Sheng, and T. Gu, "Deepkey: An eeg and gait based dual-authentication system," ACM Transactions on Intelligent Systems and Technology (TIST), 2017.
[62] H. Yang, S. Sakhavi, K. K. Ang, and C. Guan, "On the use of convolutional neural networks and augmented csp features for multi-class motor imagery of eeg signals classification," in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, 2015, pp. 2620–2623.
[63] L. Jingwei, C. Yin, and Z. Weidong, "Deep learning eeg response representation for brain computer interface," in Control Conference (CCC), 2015 34th Chinese, 2015, pp. 3518–3523.
[64] H. K. Lee and Y.-S. Choi, "A convolution neural networks scheme for classification of motor imagery eeg based on wavelet time-frequency image," in International Conference on Information Networking (ICOIN) 2018, 2018, pp. 906–909.
[65] X. Zhang, L. Yao, C. Huang, Q. Z. Sheng, and X. Wang, "Intent recognition in smart living through deep recurrent neural networks," in International Conference on Neural Information Processing, 2017, pp. 748–758.
[66] Z. Tang, C. Li, and S. Sun, "Single-trial eeg classification of motor imagery using deep convolutional neural networks," Optik-International Journal for Light and Electron Optics, vol. 130, pp. 11–18, 2017.
[67] Q. Wang, Y. Hu, and H. Chen, "Multi-channel eeg classification based on fast convolutional feature extraction," in International Symposium on Neural Networks, 2017, pp. 533–540.
[68] I. Sturm, S. Lapuschkin, W. Samek, and K.-R. Müller, "Interpretable deep neural networks for single-trial eeg classification," Journal of neuroscience methods, vol. 274, pp. 141–145, 2016.
[69] M. Shahin, B. Ahmed, S. T.-B. Hamida, F. L. Mulaffer, M. Glos, and T. Penzel, "Deep learning and insomnia: Assisting clinicians with their diagnosis," IEEE journal of biomedical and health informatics, vol. 21, no. 6, pp. 1546–1553, 2017.
[70] I. Fernández-Varela, D. Athanasakis, S. Parsons, E. Hernández-Pereira, and V. Moret-Bonillo, "Sleep staging with deep learning: a convolutional model," in Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2018).
[71] A. M. Chiarelli, P. Croce, A. Merla, and F. Zappasodi, "Deep learning for hybrid eeg-fnirs brain–computer interface: application to motor imagery classification," Journal of neural engineering, vol. 15, no. 3, p. 036028, 2018.
[72] T. Uktveris and V. Jusas, "Application of convolutional neural networks to four-class motor imagery classification problem," Information Technology And Control, vol. 46, no. 2, pp. 260–273, 2017.
[73] V. Lawhern, A. Solon, N. Waytowich, S. M. Gordon, C. Hung, and B. J. Lance, "Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces," Journal of neural engineering, 2018.
[74] J. Li, Z. Struzik, L. Zhang, and A. Cichocki, "Feature learning from incomplete eeg with denoising autoencoder," Neurocomputing, vol. 165, pp. 23–31, 2015.
[75] S. Redkar, "Using deep learning for human computer interface via electroencephalography," IAES International Journal of Robotics and Automation, vol. 4, no. 4, 2015.
[76] X. Zhang, L. Yao, D. Zhang, X. Wang, Q. Z. Sheng, and T. Gu, "Multi-person brain activity recognition via comprehensive eeg signal analysis," in Proceedings of the 14th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, 2017, pp. 28–37.
[77] J. Li and A. Cichocki, "Deep learning of multifractal attributes from motor imagery induced eeg," in International Conference on Neural Information Processing, 2014, pp. 503–510.
[78] Y. Ren and Y. Wu, "Convolutional deep belief networks for feature extraction of eeg signal," in International Joint Conference on Neural Networks (IJCNN), 2014, pp. 2850–2853.
[79] S. Kumar, A. Sharma, K. Mamun, and T. Tsunoda, "A deep learning approach for motor imagery eeg signal classification," in Computer Science and Engineering (APWC on CSE), 2016 3rd Asia-Pacific World Congress on, 2016, pp. 34–39.
[80] N. Lu, T. Li, X. Ren, and H. Miao, "A deep learning scheme for motor imagery classification based on restricted boltzmann machines," IEEE transactions on neural systems and rehabilitation engineering, vol. 25, no. 6, pp. 566–576, 2017.
[81] M. Dai, D. Zheng, R. Na, S. Wang, and S. Zhang, "Eeg classification of motor imagery using a novel deep learning framework," Sensors, vol. 19, no. 3, p. 551, 2019.
[82] C. Tan, F. Sun, W. Zhang, J. Chen, and C. Liu, "Multimodal classification with deep convolutional-recurrent neural networks for electroencephalography," in International Conference on Neural Information Processing, 2017, pp. 767–776.
[83] L. Duan, M. Bao, J. Miao, Y. Xu, and J. Chen, "Classification based on multilayer extreme learning machine for motor imagery task from eeg signals," Procedia Computer Science, vol. 88, pp. 176–184, 2016.
[84] E. S. Nurse, P. J. Karoly, D. B. Grayden, and D. R. Freestone, "A generalizable brain-computer interface (bci) using machine learning for feature discovery," PloS one, vol. 10, no. 6, p. e0131328, 2015.
[85] X. Zhang, L. Yao, C. Huang, S. Wang, M. Tan, G. Long, and C. Wang, "Multi-modality sensor data classification with selective attention," International Joint Conferences on Artificial Intelligence (IJCAI), 2018.
[86] S. Sakhavi, C. Guan, and S. Yan, "Parallel convolutional-linear neural network for motor imagery classification," in Signal Processing Conference (EUSIPCO), 2015 23rd European, 2015, pp. 2736–2740.
[87] A. Frydenlund and F. Rudzicz, "Emotional affect estimation using video and eeg data in deep neural networks," in Canadian Conference on Artificial Intelligence, 2015, pp. 273–280.
[88] T. Zhang, W. Zheng, Z. Cui, Y. Zong, and Y. Li, "Spatial-temporal recurrent neural network for emotion recognition," IEEE transactions on cybernetics, no. 99, pp. 1–9, 2018.
[89] J. Li, Z. Zhang, and H. He, "Implementation of eeg emotion recognition system based on hierarchical convolutional neural networks," in International Conference on Brain Inspired Cognitive Systems, 2016, pp. 22–33.
[90] W. Liu, H. Jiang, and Y. Lu, "Analyze eeg signals with convolutional neural network based on power spectrum feature selection," Proceedings of Science, 2017.
[91] F. Wang, S.-h. Zhong, J. Peng, J. Jiang, and Y. Liu, "Data augmentation for eeg-based emotion recognition with deep convolutional neural networks," in International Conference on Multimedia Modeling, 2018, pp. 82–93.
[92] J. Li, Z. Zhang, and H. He, "Hierarchical convolutional neural networks for eeg-based emotion recognition," Cognitive Computation, pp. 1–13, 2017.
[93] K. Wang, Y. Zhao, Q. Xiong, M. Fan, G. Sun, L. Ma, and T. Liu, "Research on healthy anomaly detection model based on deep learning from multiple time-series physiological signals," Scientific Programming, vol. 2016, 2016.
[94] X. Chai, Q. Wang, Y. Zhao, X. Liu, O. Bai, and Y. Li, "Unsupervised domain adaptation techniques based on auto-encoder for non-stationary eeg-based emotion recognition," Computers in biology and medicine, vol. 79, pp. 205–214, 2016.
[95] W. Liu, W.-L. Zheng, and B.-L. Lu, "Emotion recognition using multimodal deep learning," in International Conference on Neural Information Processing, 2016, pp. 521–529.
[96] W.-L. Zheng and B.-L. Lu, "Investigating critical frequency bands and channels for eeg-based emotion recognition with deep neural networks," IEEE Transactions on Autonomous Mental Development, vol. 7, no. 3, pp. 162–175, 2015.
[97] W.-L. Zheng, H.-T. Guo, and B.-L. Lu, "Revealing critical channels and frequency bands for emotion recognition from eeg with deep belief network," in Neural Engineering (NER), 2015 7th International IEEE/EMBS Conference on, 2015, pp. 154–157.
[98] X. Jia, K. Li, X. Li, and A. Zhang, "A novel semi-supervised deep learning framework for affective state recognition on eeg signals," in IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014, pp. 30–37.
[99] H. Xu and K. N. Plataniotis, "Affective states classification using eeg and semi-supervised deep learning approaches," in Multimedia Signal Processing (MMSP), 2016 IEEE 18th International Workshop on, 2016, pp. 1–6.
[100] X. Li, P. Zhang, D. Song, G. Yu, Y. Hou, and B. Hu, "Eeg based emotion identification using unsupervised deep feature learning," 2015.
[101] H. Xu and K. N. Plataniotis, "Eeg-based affect states classification using deep belief networks," in Digital Media Industry & Academic Forum (DMIAF), 2016, pp. 148–153.
[102] W.-L. Zheng, J.-Y. Zhu, Y. Peng, and B.-L. Lu, "Eeg-based emotion classification using deep belief networks," in Multimedia and Expo (ICME), 2014 IEEE International Conference on, 2014, pp. 1–6.
[103] K. Li, X. Li, Y. Zhang, and A. Zhang, "Affective state recognition from eeg with deep belief networks," in 2013 IEEE International Conference on Bioinformatics and Biomedicine, 2013, pp. 305–310.
[104] J. A. Miranda-Correa and I. Patras, "A multi-task cascaded network for prediction of affect, personality, mood and social context using eeg signals," in Automatic Face & Gesture Recognition (FG 2018), 2018 13th IEEE International Conference on, 2018, pp. 373–380.
[105] P. Kawde and G. K. Verma, "Deep belief network based affect recognition from physiological signals," in Electrical, Computer and Electronics (UPCON), 2017 4th IEEE Uttar Pradesh Section International Conference on, 2017, pp. 587–592.
[106] Y. Gao, H. J. Lee, and R. M. Mehmood, "Deep learning of eeg signals for emotion recognition," in Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on, 2015, pp. 1–5.
[107] Z. Yin, M. Zhao, Y. Wang, J. Yang, and J. Zhang, "Recognition of emotions using multimodal physiological signals and an ensemble deep learning model," Computer methods and programs in biomedicine, vol. 140, pp. 93–110, 2017.
[108] S. Alhagry, A. A. Fahmy, and R. A. El-Khoribi, "Emotion recognition based on eeg using lstm recurrent neural network," Emotion, vol. 8, no. 10, 2017.
[109] Y. Yuan, G. Xun, F. Ma, Q. Suo, H. Xue, K. Jia, and A. Zhang, "A novel channel-aware attention framework for multi-channel eeg seizure detection via multi-view deep learning," in International Conference on Biomedical & Health Informatics (BHI), 2018, pp. 206–209.
[110] S. S. Talathi, "Deep recurrent neural networks for seizure detection and early seizure detection systems," arXiv preprint arXiv:1706.03283, 2017.
[111] G. Ruffini, D. Ibañez, M. Castellano, S. Dunne, and A. Soria-Frisch, "Eeg-driven rnn classification for prognosis of neurodegeneration in at-risk patients," in International Conference on Artificial Neural Networks, 2016, pp. 306–313.
[112] I. Ullah, M. Hussain, H. Aboalsamh et al., "An automated system for epilepsy detection using eeg brain signals based on deep learning approach," Expert Systems with Applications, vol. 107, pp. 61–71, 2018.
[113] U. R. Acharya, S. L. Oh, Y. Hagiwara, J. H. Tan, H. Adeli, and D. P. Subha, "Automated eeg-based screening of depression using deep convolutional neural network," Computer methods and programs in biomedicine, vol. 161, pp. 103–113, 2018.
[114] U. R. Acharya, S. L. Oh, Y. Hagiwara, J. H. Tan, and H. Adeli, "Deep convolutional neural network for the automated detection and diagnosis of seizure using eeg signals," Computers in biology and medicine, vol. 100, pp. 270–278, 2018.
[115] F. C. Morabito, M. Campolo, C. Ieracitano, J. M. Ebadi, L. Bonanno, A. Bramanti, S. Desalvo, N. Mammone, and P. Bramanti, "Deep convolutional neural networks for classification of mild cognitive impaired and alzheimer's disease patients from scalp eeg recordings," in Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), 2016 IEEE 2nd International Forum on, 2016, pp. 1–6.
[116] R. Schirrmeister, L. Gemein, K. Eggensperger, F. Hutter, and T. Ball, "Deep learning with convolutional neural networks for decoding and visualization of eeg pathology," in Signal Processing in Medicine and Biology Symposium (SPMB), 2017 IEEE, 2017, pp. 1–7.
[117] M.-P. Hosseini, T. X. Tran, D. Pompili, K. Elisevich, and H. Soltanian-Zadeh, "Deep learning with edge computing for localization of epileptogenicity using multimodal rs-fmri and eeg big data," in Autonomic Computing (ICAC), 2017 IEEE International Conference on, 2017, pp. 83–92.
[118] A. R. Johansen, J. Jin, T. Maszczyk, J. Dauwels, S. S. Cash, and M. B. Westover, "Epileptiform spike detection via convolutional neural networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 754–758.
[119] A. H. Ansari, P. J. Cherian, A. Caicedo, G. Naulaers, M. De Vos, and S. Van Huffel, "Neonatal seizure detection using deep convolutional neural networks," International journal of neural systems, p. 1850011, 2018.
[120] M.-P. Hosseini, D. Pompili, K. Elisevich, and H. Soltanian-Zadeh, "Optimized deep learning for eeg big data and seizure prediction bci via internet of things," IEEE Transactions on Big Data, vol. 3, no. 4, pp. 392–404, 2017.
[121] Y. Yuan, G. Xun, K. Jia, and A. Zhang, "A novel wavelet-based model for eeg epileptic seizure detection using multi-context learning," in IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2017, pp. 694–699.
[122] Q. Lin, S.-q. Ye, X.-m. Huang, S.-y. Li, M.-z. Zhang, Y. Xue, and W.-S. Chen, "Classification of epileptic eeg signals with stacked sparse autoencoder based on deep learning," in International Conference on Intelligent Computing, 2016, pp. 802–810.
[123] F. C. Morabito, M. Campolo, N. Mammone, M. Versaci, S. Franceschetti, F. Tagliavini, V. Sofia, D. Fatuzzo, A. Gambardella, A. Labate et al., "Deep learning representation from electroencephalography of early-stage creutzfeldt-jakob disease and features for differentiation from rapidly progressive dementia," International journal of neural systems, vol. 27, no. 02, p. 1650039, 2017.
[124] T. Wen and Z. Zhang, "Deep convolution neural network and autoencoders-based unsupervised feature learning of eeg signals," IEEE Access, vol. 6, pp. 25 399–25 410, 2018.
[125] A. Page, J. Turner, T. Mohsenin, and T. Oates, "Comparing raw data and feature extraction for seizure detection with deep learning methods." in FLAIRS Conference, 2014.
[126] Y. Zhao and L. He, "Deep learning in the eeg diagnosis of alzheimer's disease," in Asian Conference on Computer Vision, 2014, pp. 340–353.
[127] J. Turner, A. Page, T. Mohsenin, and T. Oates, "Deep belief networks used on high resolution multichannel electroencephalography data for seizure detection," in 2014 AAAI Spring Symposium Series, 2014.
[128] V. Shah, M. Golmohammadi, S. Ziyabari, E. Von Weltin, I. Obeid, and J. Picone, "Optimizing channel selection for seizure detection," in Signal Processing in Medicine and Biology Symposium (SPMB), 2017 IEEE, 2017, pp. 1–5.
[129] M.-P. Hosseini, H. Soltanian-Zadeh, K. Elisevich, and D. Pompili, "Cloud-based deep learning of big eeg data for epileptic seizure prediction," arXiv preprint arXiv:1702.05192, 2017.
[130] M. Golmohammadi, S. Ziyabari, V. Shah, S. L. de Diego, I. Obeid, and J. Picone, "Deep architectures for automated seizure detection in scalp eegs," arXiv preprint arXiv:1712.09776, 2017.
[131] A. M. Al-kaysi, A. Al-Ani, and T. W. Boonstra, "A multichannel deep belief network for the classification of eeg data," in International Conference on Neural Information Processing, 2015, pp. 38–45.
[132] S. M. Abdelfattah, G. M. Abdelrahman, and M. Wang, "Augmenting the size of eeg datasets using generative adversarial networks," in 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–6.
[133] S. Palazzo, C. Spampinato, I. Kavasidis, D. Giordano, and M. Shah, "Generative adversarial networks conditioned by brain signals," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3410–3418.
[134] I. Kavasidis, S. Palazzo, C. Spampinato, D. Giordano, and M. Shah, "Brain2image: Converting brain signals into images," in Proceedings of the 25th ACM international conference on Multimedia, 2017, pp. 1809–1817.
[135] J. Teo, C. L. Hou, and J. Mountstephens, "Deep learning for eeg-based preference classification," in AIP Conference Proceedings, vol. 1891, no. 1, 2017, p. 020141.
[136] T. K. Reddy and L. Behera, "Online eye state recognition from eeg data using deep architectures," in Systems, Man, and Cybernetics (SMC), 2016 IEEE International Conference on, 2016, pp. 000 712–000 717.
[137] A. J. Yepes, J. Tang, and B. S. Mashford, "Improving classification accuracy of feedforward neural networks for spiking neuromorphic chips," arXiv preprint arXiv:1705.07755, 2017.
[138] J. Shang, W. Zhang, J. Xiong, and Q. Liu, "Cognitive load recognition using multi-channel complex network method," in International Symposium on Neural Networks, 2017, pp. 466–474.
[139] J. Behncke, R. T. Schirrmeister, W. Burgard, and T. Ball, "The signature of robot action success in eeg signals of a human observer: Decoding and visualization using deep convolutional neural networks," in International Conference on Brain-Computer Interface (BCI) 2018, 2018, pp. 1–6.
[140] Y.-C. Hung, Y.-K. Wang, M. Prasad, and C.-T. Lin, "Brain dynamic states analysis based on 3d convolutional neural network," in Systems, Man, and Cybernetics (SMC), 2017 IEEE International Conference on, 2017, pp. 222–227.
[141] V. Baltatzis, K.-M. Bintsi, G. K. Apostolidis, and L. J. Hadjileontiadis, "Bullying incidences identification within an immersive environment using hd eeg-based analysis: A swarm decomposition and deep learning approach," Scientific reports, vol. 7, no. 1, p. 17292, 2017.
[142] S. Stober, D. J. Cameron, and J. A. Grahn, "Classifying eeg recordings of rhythm perception." in ISMIR, 2014, pp. 649–654.
[143] M. Völker, R. T. Schirrmeister, L. D. Fiederer, W. Burgard, and T. Ball, "Deep transfer learning for error decoding from non-invasive eeg," in Brain-Computer Interface (BCI), 2018 6th International Conference on, 2018, pp. 1–6.
[144] L. G. Hernández, O. M. Mozos, J. M. Ferrández, and J. M. Antelis, "Eeg-based detection of braking intention under different car driving conditions," Frontiers in neuroinformatics, vol. 12, 2018.
[145] M. A. Almogbel, A. H. Dang, and W. Kameyama, "Eeg-signals based cognitive workload detection of vehicle driver using deep learning," in Advanced Communication Technology (ICACT), 2018 20th International Conference on, 2018, pp. 256–259.
[146] M. J. Putten, S. Olbrich, and M. Arns, "Predicting sex from brain rhythms with deep learning," Scientific reports, vol. 8, no. 1, p. 3069, 2018.
[147] M. Hajinoroozi, Z. Mao, and Y. Huang, "Prediction of driver's drowsy and alert states from eeg signals with deep learning," in Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2015 IEEE 6th International Workshop on, 2015, pp. 493–496.
[148] A. Sternin, S. Stober, J. Grahn, and A. Owen, "Tempo estimation from the eeg signal during perception and imagination of music," in International Symposium on Computer Music Multidisciplinary Research, 2015.
[149] L. Chu, R. Qiu, H. Liu, Z. Ling, T. Zhang, and J. Wang, "Individual recognition in schizophrenia using deep learning methods with random forest and voting classifiers: Insights from resting state eeg streams," arXiv preprint arXiv:1707.03467, 2017.
[150] Z. Yin and J. Zhang, "Cross-session classification of mental workload levels using eeg and an adaptive deep learning model," Biomedical Signal Processing and Control, vol. 33, pp. 30–47, 2017.
[151] L.-H. Du, W. Liu, W.-L. Zheng, and B.-L. Lu, "Detecting driving fatigue with multimodal deep learning," in Neural Engineering (NER), 2017 8th International IEEE/EMBS Conference on, 2017, pp. 74–77.
[152] S. Narejo, E. Pasero, and F. Kulsoom, "Eeg based eye state classification using deep belief network and stacked autoencoder," International Journal of Electrical and Computer Engineering (IJECE), vol. 6, no. 6, pp. 3131–3141, 2016.
[153] M. Hajinoroozi, T.-P. Jung, C.-T. Lin, and Y. Huang, "Feature extraction with deep belief networks for driver's cognitive states prediction from eeg data," in Signal and Information Processing (ChinaSIP), 2015 IEEE China Summit and International Conference on, 2015, pp. 812–815.
[154] P. P. San, S. H. Ling, R. Chai, Y. Tran, A. Craig, and H. Nguyen, "Eeg-based driver fatigue detection using hybrid deep generic model," in Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the, 2016, pp. 800–803.
[155] P. Li, W. Jiang, and F. Su, "Single-channel eeg-based mental fatigue detection based on deep belief network," in Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the, 2016, pp. 367–370.
[156] P. Zhang, X. Wang, W. Zhang, and J. Chen, "Learning spatial–spectral–temporal eeg features with recurrent 3d convolutional neural networks for cross-task mental workload assessment," IEEE Transactions on neural systems and rehabilitation engineering, vol. 27, no. 1, pp. 31–42, 2018.
[157] S. Stober, A. Sternin, A. M. Owen, and J. A. Grahn, "Deep feature learning for eeg recordings," arXiv preprint arXiv:1511.04306, 2015.
[158] R. Chai, S. H. Ling, P. P. San, G. R. Naik, T. N. Nguyen, Y. Tran, A. Craig, and H. T. Nguyen, "Improving eeg-based driver fatigue classification using sparse-deep belief networks," Frontiers in neuroscience, vol. 11, p. 103, 2017.
[159] P. Bashivan, M. Yeasin, and G. M. Bidelman, "Single trial prediction of normal and excessive cognitive load through eeg feature fusion," in Signal Processing in Medicine and Biology Symposium (SPMB), 2015 IEEE, 2015, pp. 1–5.
[160] X. Zhang, L. Yao, C. Huang, S. S. Kanhere, and D. Zhang, "Brain2object: Printing your mind from brain signals with spatial correlation embedding," arXiv preprint arXiv:1810.02223, 2018.
[161] T. Koike-Akino, R. Mahajan, T. K. Marks, Y. Wang, S. Watanabe, O. Tuzel, and P. Orlik, "High-accuracy user identification using eeg biometrics," in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2016, pp. 854–858.
[162] K. Kawasaki, T. Yoshikawa, and T. Furuhashi, "Visualizing extracted feature by deep learning in p300 discrimination task," in Soft Computing and Pattern Recognition (SoCPaR), 2015 7th International Conference of, 2015, pp. 149–154.
[163] C. Spampinato, S. Palazzo, I. Kavasidis, D. Giordano, N. Souly, and M. Shah, "Deep learning human mind for automated visual classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6809–6817.
[164] M. Liu, W. Wu, Z. Gu, Z. Yu, F. Qi, and Y. Li, "Deep learning based on batch normalization for p300 signal detection," Neurocomputing, vol. 275, pp. 288–297, 2018.
[165] S. Sarkar, K. Reddy, A. Dorgan, C. Fidopiastis, and M. Giering, "Wearable eeg-based activity recognition in phm-related service environment via deep learning," Int. J. Progn. Health Manag, vol. 7, pp. 1–10, 2016.
[166] H. Cecotti and A. Graser, "Convolutional neural networks for p300 detection with application to brain-computer interfaces," IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 3, pp. 433–445, 2011.
[167] W. Gao, J.-a. Guan, J. Gao, and D. Zhou, "Multi-ganglion ann based feature learning with application to p300-bci signal classification," Biomedical Signal Processing and Control, vol. 18, pp. 127–137, 2015.
[168] Q. Liu, X.-G. Zhao, Z.-G. Hou, and H.-G. Liu, "Deep belief networks for eeg-based concealed information test," in International Symposium on Neural Networks, 2017, pp. 498–506.
[169] T. Ma, H. Li, H. Yang, X. Lv, P. Li, T. Liu, D. Yao, and P. Xu, "The extraction of motion-onset vep bci features based on deep learning and compressed sensing," Journal of neuroscience methods, vol. 275, pp. 80–92, 2017.
[170] R. Maddula, J. Stivers, M. Mousavi, S. Ravindran, and V. de Sa, "Deep recurrent convolutional neural networks for classifying p300 bci signals," in Proceedings of the 7th Graz Brain-Computer Interface Conference, Graz, Austria, 2017, pp. 18–22.
[171] P. Bashivan, I. Rish, M. Yeasin, and N. Codella, "Learning representations from eeg with deep recurrent-convolutional neural networks," ICLR, 2016.
[172] P. Bashivan, I. Rish, and S. Heisig, "Mental state recognition via wearable eeg," arXiv preprint arXiv:1602.00985, 2016.
[173] A. Shanbhag, A. P. Kholkar, S. Sawant, A. Vicente, S. Martires, and S. Patil, "P300 analysis using deep neural network," in 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), 2017, pp. 3142–3147.
[174] Z. Mao, V. Lawhern, L. M. Merino, K. Ball, L. Deng, B. J. Lance, K. Robbins, and Y. Huang, "Classification of non-time-locked rapid serial visual presentation events for brain-computer interaction using deep learning," in Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on, 2014, pp. 520–524.
[175] Z. Mao, "Deep learning for rapid serial visual presentation event from electroencephalography signal," Ph.D. dissertation, The University of Texas at San Antonio, 2016.
[176] R. Manor and A. B. Geva, "Convolutional neural network for multi-category rapid serial visual presentation bci," Frontiers in computational neuroscience, vol. 9, p. 146, 2015.
[177] H. Cecotti, "Convolutional neural networks for event-related potential detection: impact of the architecture," in Engineering in Medicine and Biology Society (EMBC), 2017 39th Annual International Conference of the IEEE, 2017, pp. 2031–2034.
[178] A. J. Solon, S. M. Gordon, B. Lance, and V. Lawhern, "Deep learning approaches for p300 classification in image triage: Applications to the nails task," in Proceedings of the 13th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-13, Tokyo, Japan, 2017, pp. 5–8.
[179] M. Hajinoroozi, Z. Mao, Y.-P. Lin, and Y. Huang, "Deep transfer learning for cross-subject and cross-experiment prediction of image rapid serial visual presentation events from eeg data," in International Conference on Augmented Cognition, 2017, pp. 45–55.
[180] Z. Mao, W. X. Yao, and Y. Huang, "Eeg-based biometric identification with deep learning," in Neural Engineering (NER), 2017 8th International IEEE/EMBS Conference on, 2017, pp. 609–612.
[181] R. Manor, L. Mishali, and A. B. Geva, "Multimodal neural network for rapid serial visual presentation brain computer interface," Frontiers in computational neuroscience, vol. 10, p. 130, 2016.
[182] Z. Lin, Y. Zeng, L. Tong, H. Zhang, C. Zhang, and B. Yan, "Method for enhancing single-trial p300 detection by introducing the complexity degree of image information in rapid serial visual presentation tasks," PloS one, vol. 12, no. 12, p. e0184713, 2017.
[183] S. M. Gordon, M. Jaswa, A. J. Solon, and V. J. Lawhern, "Real world bci: cross-domain learning and practical applications," in Proceedings of the 2017 ACM Workshop on An Application-oriented Approach to BCI out of the laboratory, 2017, pp. 25–28.
[184] J. Yoon, J. Lee, and M. Whang, "Spatial and time domain feature of erp speller system extracted via convolutional neural network," Computational intelligence and neuroscience, vol. 2018, 2018.
[185] J. Shamwell, H. Lee, H. Kwon, A. R. Marathe, V. Lawhern, and W. Nothwang, "Single-trial eeg rsvp classification using convolutional neural networks," in Micro- and Nanotechnology Sensors, Systems, and Applications VIII, vol. 9836, 2016, p. 983622.
[186] L. Vařeka and P. Mautner, "Stacked autoencoders for the p300 component detection," Frontiers in neuroscience, vol. 11, p. 302, 2017.
[187] E. Carabez, M. Sugi, I. Nambu, and Y. Wada, "Identifying single trial event-related potentials in an earphone-based auditory brain-computer interface," Applied Sciences, vol. 7, no. 11, p. 1197, 2017.
[188] S. Stober, D. J. Cameron, and J. A. Grahn, "Using convolutional neural networks to recognize rhythm stimuli from electroencephalography recordings," in Advances in neural information processing systems, 2014, pp. 1449–1457.
[189] A. Hachem, M. M. B. Khelifa, A. M. Alimi, P. Gorce, S. V. Arasu, S. Baulkani, S. K. Bisoy, P. K. Pattnaik, S. Ravindran, N. Palanisamy et al., "Effect of fatigue on ssvep during virtual wheelchair navigation," Journal of Theoretical and Applied Information Technology, vol. 65, no. 1, 2014.
[190] J. Thomas, T. Maszczyk, N. Sinha, T. Kluge, and J. Dauwels, "Deep learning-based classification for brain-computer interfaces," in Systems, Man, and Cybernetics (SMC), 2017 IEEE International Conference on, 2017, pp. 234–239.
[191] N.-S. Kwak, K.-R. Müller, and S.-W. Lee, "A convolutional neural network for steady state visual evoked potential classification under ambulatory environment," PloS one, vol. 12, no. 2, p. e0172578, 2017.
[192] N. R. Waytowich, V. Lawhern, J. O. Garcia, J. Cummings, J. Faller, P. Sajda, and J. M. Vettel, "Compact convolutional neural networks for classification of asynchronous steady-state visual evoked potentials," arXiv preprint arXiv:1803.04566, 2018.
[193] N. K. N. Aznan, S. Bonner, J. D. Connolly, N. A. Moubayed, and T. P. Breckon, "On the classification of ssvep-based dry-eeg signals via convolutional neural networks," arXiv preprint arXiv:1805.04157, 2018.
[194] T. Tu, J. Koss, and P. Sajda, "Relating deep neural network representations to eeg-fmri spatiotemporal dynamics in a perceptual decision-making task," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 1985–1991.
[195] J. Kulasingham, V. Vibujithan, and A. De Silva, "Deep belief networks and stacked autoencoders for the p300 guilty knowledge test," in Biomedical Engineering and Sciences (IECBES), 2016 IEEE EMBS Conference on, 2016, pp. 127–132.
[196] M. Attia, I. Hettiarachchi, M. Hossny, and S. Nahavandi, "A time domain classification of steady-state visual evoked potentials using deep recurrent-convolutional neural networks," in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, 2018, pp. 766–769.
[197] J. Pérez-Benítez, J. Pérez-Benítez, and J. Espina-Hernández, "Development of a brain computer interface interface using multi-frequency visual stimulation and deep neural networks," in Electronics, Communications and Computers (CONIELECOMP), 2018 International Conference on, 2018, pp. 18–24.
[198] G. Huve, K. Takahashi, and M. Hashimoto, "Brain activity recognition with a wearable fnirs using neural networks," in Mechatronics and Automation (ICMA), 2017 IEEE International Conference on, 2017, pp. 1573–1578.
[199] ——, "Brain-computer interface using deep neural network and its application to mobile robot control," in Advanced Motion Control (AMC), 2018 IEEE 15th International Workshop on, 2018, pp. 169–174.
[200] J. Hennrich, C. Herff, D. Heger, and T. Schultz, "Investigating deep learning for fnirs based bci." in EMBC, 2015, pp. 2844–2847.
[201] T. Hiroyasu, K. Hanawa, and U. Yamamoto, "Gender classification of subjects from cerebral blood flow changes using deep learning," in Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on, 2014, pp. 229–233.
[202] S. Koyamada, Y. Shikauchi, K. Nakae, M. Koyama, and S. Ishii, "Deep learning of fmri big data: a novel approach to subject-transfer decoding," arXiv preprint arXiv:1502.00093, 2015.
[203] G. Shen, T. Horikawa, K. Majima, and Y. Kamitani, "Deep image reconstruction from human brain activity," PLoS computational biology, vol. 15, no. 1, p. e1006633, 2019.
[204] R. M. Cichy, A. Khosla, D. Pantazis, A. Torralba, and A. Oliva, "Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence," Scientific reports, vol. 6, p. 27755, 2016.
[205] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle, "Brain tumor segmentation with deep neural networks," Medical image analysis, vol. 35, pp. 18–31, 2017.
[206] V. Shreyas and V. Pankajakshan, "A deep learning architecture for brain tumor segmentation in mri images," in Multimedia Signal Processing (MMSP), 2017 IEEE 19th International Workshop on, 2017, pp. 1–6.
[207] S. Sarraf and G. Tofighi, "Deep learning-based pipeline to recognize alzheimer's disease using fmri data," in Future Technologies Conference (FTC), 2016, pp. 816–820.
[208] R. Li, W. Zhang, H.-I. Suk, L. Wang, J. Li, D. Shen, and S. Ji, "Deep learning based imaging data completion for improved brain disease diagnosis," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2014, pp. 305–312.
[209] H.-I. Suk, C.-Y. Wee, S.-W. Lee, and D. Shen, "State-space model with deep learning for functional dynamics estimation in resting-state fmri," NeuroImage, vol. 129, pp. 292–307, 2016.
[210] H.-I. Suk, D. Shen, A. D. N. Initiative et al., "Deep learning in diagnosis of brain disorders," in Recent Progress in Brain and Cognitive Engineering, 2015, pp. 203–213.
[211] S. M. Plis, D. R. Hjelm, R. Salakhutdinov, E. A. Allen, H. J. Bockholt, J. D. Long, H. J. Johnson, J. S. Paulsen, J. A. Turner, and V. D. Calhoun, "Deep learning for neuroimaging: a validation study," Frontiers in neuroscience, vol. 8, p. 229, 2014.
[212] A. Ortiz, J. Munilla, J. M. Gorriz, and J. Ramirez, "Ensembles of deep learning architectures for the early diagnosis of the alzheimer's disease," International journal of neural systems, vol. 26, no. 07, p. 1650025, 2016.
[213] N. F. M. Suhaimi, Z. Z. Htike, and N. K. A. M. Rashid, "Studies on classification of fmri data using deep learning approach," 2015.
[214] K. Seeliger, U. Güçlü, L. Ambrogioni, Y. Güçlütürk, and M. Van Gerven, "Generative adversarial networks for reconstructing natural images from brain activity," NeuroImage, vol. 181, pp. 775–785, 2018.
[215] C. Han, H. Hayashi, L. Rundo, R. Araki, W. Shimoda, S. Muramatsu, Y. Furukawa, G. Mauri, and H. Nakayama, "Gan-based synthetic brain mr image generation," in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), 2018, pp. 734–738.
[216] P. Zhang, F. Wang, W. Xu, and Y. Li, "Multi-channel generative adversarial network for parallel magnetic resonance image reconstruction in k-space," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018, pp. 180–188.
[217] C. Hu, R. Ju, Y. Shen, P. Zhou, and Q. Li, "Clinical decision support for alzheimer's disease based on deep learning and brain network," in Communications (ICC), 2016 IEEE International Conference on, 2016, pp. 1–6.
[218] P. Garg, E. Davenport, G. Murugesan, B. Wagner, C. Whitlow, J. Maldjian, and A. Montillo, "Automatic 1d convolutional neural network-based detection of artifacts in meg acquired without electrooculography or electrocardiography," in Pattern Recognition in Neuroimaging (PRNI), 2017 International Workshop on, 2017, pp. 1–4.
[219] M. Shu and A. Fyshe, "Sparse autoencoders for word decoding from magnetoencephalography," in Proceedings of the third NIPS Workshop on Machine Learning and Interpretation in NeuroImaging (MLINI), 2013.
[220] A. Hasasneh, N. Kampel, P. Sripad, N. J. Shah, and J. Dammers, "Deep learning approach for automatic classification of ocular and cardiac artifacts in meg data," Journal of Engineering, vol. 2018, 2018.
[221] Y. Gordienko, S. Stirenko, Y. Kochura, O. Alienin, M. Novotarskiy, and N. Gordienko, "Deep learning for fatigue estimation on the basis of multimodal human-machine interactions," arXiv preprint arXiv:1801.06048, 2017.
[222] G. Pfurtscheller and F. L. Da Silva, "Event-related eeg/meg synchronization and desynchronization: basic principles," Clinical neurophysiology, vol. 110, no. 11, pp. 1842–1857, 1999.
[223] P. Khurana, A. Majumdar, and R. Ward, "Class-wise deep dictionaries for eeg classification," in International Joint Conference on Neural Networks (IJCNN), 2016, pp. 3556–3563.
[224] E. Yin, T. Zeyl, R. Saab, T. Chau, D. Hu, and Z. Zhou, "A hybrid brain–computer interface based on the fusion of p300 and ssvep scores," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 23, no. 4, pp. 693–701, 2015.
[225] S. Min, B. Lee, and S. Yoon, "Deep learning in bioinformatics," Briefings in bioinformatics, vol. 18, no. 5, pp. 851–869, 2017.
[226] S. Sarraf, G. Tofighi et al., "Deepad: Alzheimer's disease classification via deep convolutional neural networks using mri and fmri," bioRxiv, p. 070441, 2016.
[227] L. Marc Moreno, "Deep learning for brain tumor segmentation," Master's dissertation, University of Colorado Colorado Springs, 2017.
[228] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," International Conference on Learning Representations (ICLR), 2016.
[229] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in International Conference on Machine Learning (ICML), 2017, pp. 214–223.
[230] A. Antoniades, L. Spyrou, C. C. Took, and S. Sanei, "Deep learning for epileptic intracranial eeg data," in Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on, 2016, pp. 1–6.
[231] A. Antoniades, L. Spyrou, D. Martin-Lopez, A. Valentin, G. Alarcon, S. Sanei, and C. C. Took, "Deep neural architectures for mapping scalp to intracranial eeg," International journal of neural systems, p. 1850009, 2018.
[232] R. Parasuraman and Y. Jiang, "Individual differences in cognition, affect, and performance: Behavioral, neuroimaging, and molecular genetic approaches," Neuroimage, vol. 59, no. 1, pp. 70–82, 2012.
[233] L. Deng, "Three classes of deep learning architectures and their applications: a tutorial survey," APSIPA transactions on signal and information processing, 2012.
[234] W.-L. Zheng and B.-L. Lu, "Personalizing eeg-based affective models with transfer learning," in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 2732–2738.
[235] X. Zhang, L. Yao, and F. Yuan, "Adversarial variational embedding for robust semi-supervised learning," in SIGKDD 2019, 2019.
[236] S. Koyama, S. M. Chase, A. S. Whitford, M. Velliste, A. B. Schwartz, and R. E. Kass, "Comparison of brain–computer interface decoding algorithms in open-loop and closed-loop control," Journal of computational neuroscience, vol. 29, no. 1-2, pp. 73–87, 2010.
[237] S. Aliakbaryhosseinabadi, E. N. Kamavuako, N. Jiang, D. Farina, and N. Mrachacz-Kersting, "Online adaptive synchronous bci system with attention variations," in Brain-Computer Interface Research, 2019, pp. 31–41.
[238] E. K. Kalunga, S. Chevallier, Q. Barthélemy, K. Djouani, E. Monacelli, and Y. Hamam, "Online ssvep-based bci using riemannian geometry," Neurocomputing, vol. 191, pp. 55–68, 2016.
[239] M. Pacharra, S. Debener, and E. Wascher, "Concealed around-the-ear eeg captures cognitive processing in a visual simon task," Frontiers in human neuroscience, vol. 11, p. 290, 2017.
[240] K. B. Mikkelsen, S. L. Kappel, D. P. Mandic, and P. Kidmose, "Eeg recorded from the ear: Characterizing the ear-eeg method," Frontiers in neuroscience, vol. 9, p. 438, 2015.
[241] S. B. Rutkove, "Introduction to volume conduction," in The clinical neurophysiology primer, 2007, pp. 43–53.
[242] B. Burle, L. Spieser, C. Roger, L. Casini, T. Hasbroucq, and F. Vidal, "Spatial and temporal resolutions of eeg: Is it really black and white? a scalp current density view," International Journal of Psychophysiology, vol. 97, no. 3, pp. 210–220, 2015.
[243] J. Malmivuo, R. Plonsey et al., Bioelectromagnetism: principles and applications of bioelectric and biomagnetic fields, 1995.
[244] M. Tortella-Feliu, A. Morillas-Romero, M. Balle, J. Llabrés, X. Bornas, and P. Putman, "Spontaneous eeg activity and spontaneous emotion regulation," International Journal of Psychophysiology, vol. 94, no. 3, pp. 365–372, 2014.
[245] A. Salek-Haddadi, K. Friston, L. Lemieux, and D. Fish, "Studying spontaneous eeg activity with fmri," Brain research reviews, vol. 43, no. 1, pp. 110–133, 2003.
[246] A. Ikeda and Y. Washizawa, "Spontaneous eeg classification using complex valued neural network," in International Conference on Neural Information Processing, 2019, pp. 495–503.
[247] A. M. Norcia, L. G. Appelbaum, J. M. Ales, B. R. Cottereau, and B. Rossion, "The steady-state visual evoked potential in vision research: a review," Journal of vision, vol. 15, no. 6, pp. 4–4, 2015.
[248] S. Lees, N. Dayan, H. Cecotti, P. Mccullagh, L. Maguire, F. Lotte, and D. Coyle, "A review of rapid serial visual presentation-based brain–computer interfaces," Journal of neural engineering, vol. 15, no. 2, p. 021001, 2018.
[249] K. H. Chiappa, Evoked potentials in clinical medicine, 1997.
[250] V. Mayya, B. Mainsah, and G. Reeves, "Information-theoretic analysis of refractory effects in the p300 speller," arXiv preprint arXiv:1701.03313, 2017.
[251] C. Guger, S. Daban, E. Sellers, C. Holzner, G. Krausz, R. Carabalona, F. Gramatica, and G. Edlinger, "How many people are able to control a p300-based brain–computer interface (bci)?" Neuroscience letters, vol. 462, no. 1, pp. 94–98, 2009.
[252] A. Belitski, J. Farquhar, and P. Desain, "P300 audio-visual speller," Journal of Neural Engineering, vol. 8, no. 2, p. 025022, 2011.
[253] M. Welvaert and Y. Rosseel, "On the definition of signal-to-noise ratio and contrast-to-noise ratio for fmri data," PloS one, vol. 8, no. 11, 2013.
[254] R. M. Cichy, A. Khosla, D. Pantazis, and A. Oliva, "Dynamics of scene representations in the human brain revealed by magnetoencephalography and deep neural networks," Neuroimage, vol. 153, pp. 346–358, 2017.
[255] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[256] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using rnn encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[257] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[258] P. O. Glauner, "Comparison of training methods for deep neural networks," arXiv preprint arXiv:1504.06825, 2015.
[259] G. St-Yves and T. Naselaris, "Generative adversarial networks conditioned on brain activity reconstruct seen images," in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2018, pp. 1054–1061.
Appendices

A. Non-invasive Brain Signals

Here, we present a detailed introduction of brain signals as shown in Figure 2. Non-invasive brain signals can be collected using electrical, magnetic, or metabolic methods, which mainly include Electroencephalogram (EEG), Functional near-infrared spectroscopy (fNIRS), Functional magnetic resonance imaging (fMRI), and Magnetoencephalography (MEG).

A.1. Electroencephalography (EEG)

Electroencephalography (EEG) is the most commonly used non-invasive technique for measuring brain activities. EEG monitors the voltage fluctuations generated by electrical currents within human neurons. Electrodes placed on the scalp measure the amplitude of the EEG signals. EEG signals have a low spatial resolution due to the effect of volume conduction, which refers to the complex effects of measuring electrical potentials at a distance from their source generators [241, 242]. EEG electrode locations generally follow the international 10-20 system [243]. The specific placement of electrodes is presented in Figure 5 [10]; there, the EEG signals are collected while the subject is undertaking an imagination task, and each line represents the signal stream collected from a single EEG electrode (also called a 'channel') over time.
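For readers who wish to work with the 10-20 layout programmatically, the following minimal sketch uses the open-source MNE-Python library (our choice for illustration; it is not tied to any surveyed study) to load the standard 10-20 montage and print the scalp coordinates of a few electrodes named in Figure 5.

```python
# Illustrative sketch; assumes the MNE-Python package is installed
# (pip install mne). Not taken from any of the surveyed works.
import mne

# Load the standard international 10-20 electrode montage.
montage = mne.channels.make_standard_montage("standard_1020")

# Print the 3-D scalp coordinates (in meters) of a few electrodes
# named in Figure 5: pre-frontal, frontal, central, parietal, occipital.
ch_pos = montage.get_positions()["ch_pos"]
for name in ("Fp1", "Fz", "Cz", "Pz", "O1"):
    print(name, ch_pos[name])

# montage.plot()  # uncomment to draw the layout (requires matplotlib)
```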
The temporal resolution of EEG signals is much better than the spatial resolution. The ionic current changes rapidly, which offers a temporal resolution higher than 1000 Hz. The SNR of EEG is generally very poor due to both objective and subjective factors. Objective factors include environmental noise, the obstruction of the skull and other tissues between the cortex and the scalp, and differences in stimulation. Subjective factors include the subject's mental state, fatigue status, the variance among different subjects, and so on.

EEG recording equipment can be installed in a cap-like headset, which can be mounted on the user's head to gather signals. Compared to other equipment used to measure brain signals, EEG headsets are portable and more accessible for most applications.

The EEG signals collected from any typical EEG hardware have several non-overlapping frequency bands (Delta, Theta, Alpha, Beta, and Gamma), based on the strong intra-band correlation with a distinct behavioral state [10]. Each EEG pattern contains signals associated with particular brain information. Table 7 shows the EEG frequency patterns and the corresponding characteristics. Here, the degree of awareness denotes the perception of individuals when presented with external stimuli.

Compared to other signals (e.g., fMRI, fNIRS, MEG), EEG has several important advantages: 1) the hardware is more portable and much cheaper; 2) the temporal resolution is very high (millisecond level); among the other non-invasive techniques, only MEG offers the same level of temporal resolution; 3) EEG is relatively tolerant of subject movement and artifacts, which can be minimized by existing signal processing methods; 4) the subject does not need to be exposed to a high-intensity (>1 Tesla) magnetic field; therefore, EEG can serve subjects who have metal implants in their body (such as metal-containing pacemakers).

As EEG is the most commonly used signal, it has a large number of sub-classes. In this section, we present a methodical introduction of the EEG sub-class signals. As shown in Figure 2, we divide EEG signals into spontaneous EEG and evoked potentials. Evoked potentials can be split into event-related potentials and steady-state evoked potentials based on the frequency of the external stimuli [7]. Each category contains visual, auditory, and somatosensory potentials, depending on the type of external stimulus. The dashed quadrilaterals in Figure 2, such as Intracortical, SEP, SSAEP, SSSEP, and RSAP, are not included in this survey because there are very few existing studies working on them with deep learning algorithms. We list these signals for systematic completeness.

(a) EEG electrode locations (b) EEG signals

Figure 5: EEG electrode locations on the scalp (10-20 system) and the gathered EEG signals [10]. The electrodes' names are marked by their position: Fp (pre-frontal), F (frontal), T (temporal), P (parietal), O (occipital), and C (central).
Table 7: EEG patterns and corresponding characteristics. Awareness Degree denotes the degree of being aware of the external world. The awareness degree mentioned here is mainly defined in physiology rather than psychology.

Patterns | Frequency (Hz) | Amplitude | Brain State | Awareness Degree | Produced Location
Delta | 0.5-4 | Higher | Deep sleep pattern | Lower | Frontally and posteriorly
Theta | 4-8 | High | Light sleep pattern | Low | Entorhinal cortex, hippocampus
Alpha | 8-12 | Medium | Closing the eyes, relaxed state | Intermediate | Posterior regions of head
Beta | 12-30 | Low | Active thinking, focus, high alert, anxious | High | Most evident frontally, motor areas
Gamma | 30-100 | Lower | During cross-modal sensory processing | Higher | Somatosensory, auditory cortices
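The band boundaries in Table 7 translate directly into band-pass filters. The sketch below is a minimal illustration (ours, not drawn from any surveyed study) using SciPy Butterworth filters to split a single-channel trace into the five bands; the 256 Hz sampling rate, filter order, and white-noise input are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256  # assumed sampling rate (Hz); real headsets vary

# Band edges (Hz) taken from Table 7; Gamma is capped below Nyquist.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 100)}

def band_decompose(eeg, fs=FS, order=4):
    """Return {band name: zero-phase band-pass filtered copy of eeg}."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        hi = min(hi, 0.99 * fs / 2)  # keep the upper edge inside Nyquist
        sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
        out[name] = sosfiltfilt(sos, eeg)  # zero-phase: no time shift
    return out

# Toy usage on 10 s of synthetic "EEG" (white noise stands in for data).
eeg = np.random.randn(10 * FS)
bands = band_decompose(eeg)
powers = {name: float(np.mean(trace ** 2)) for name, trace in bands.items()}
print(powers)
```

Per-band power computed this way is one of the simplest frequency-domain features mentioned earlier in the survey's workflow.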
A.1.1. Spontaneous EEG Typically, when we talk about the term 'EEG,' we refer to spontaneous EEG, which measures the brain signals under a specific state without external stimulation [244, 245, 246]. In particular, spontaneous EEG includes the EEG signals recorded while the individual is sleeping, undertaking a mental task (e.g., counting), suffering from a brain disorder, undertaking motor imagery tasks, in a certain emotion, and so on.

The EEG signals recorded while a user stares at a color/shape/image also belong to this category. While the subject is gazing at a specific image, the visual stimulus is steady without any change. This scenario differs from the visual stimuli in evoked potentials, where the visual stimuli change at a specific frequency. Thus, we regard the image stimulation as a particular state and regard the recorded signals as spontaneous EEG. Spontaneous EEG-based systems are challenging to train, due to the lower SNR and the larger variation across subjects [35].

According to the gathering scenarios, spontaneous EEG contains several subordinate classes: sleep, motor imagery, emotion, mental disease, and others.

A.1.2. Evoked Potential (EP) Evoked Potentials (EP), or evoked responses, refer to the EEG signals that are evoked by an external stimulus rather than arising spontaneously. An EP is time-locked to the external stimulus, while the aforementioned spontaneous EEG is non-time-locked. In contrast to spontaneous EEG, EP generally has a higher amplitude and a lower frequency. As a result, EP signals are more robust across subjects. According to the stimulation method, there exist two categories of EP: the Event-Related Potential (ERP) and the Steady-State Evoked Potential (SSEP). ERP records the EEG signals in response to an isolated discrete stimulus event (or event change). To achieve this isolation, stimuli in an ERP experiment are typically separated from each other by a long inter-stimulus interval, allowing for the estimation of the response to each single event. According to the type of external stimulus, ERP can be divided into Visual Evoked Potentials (VEP); Auditory Evoked Potentials (AEP); and Somatosensory Evoked Potentials (SEP) [28]. The VEP signals are mainly observed over the occipital lobe, and the highest signal amplitudes are collected at the Calcarine sulcus.

1) Visual Evoked Potentials (VEP). Visual Evoked Potentials are a specific category of ERP which is caused by a visual stimulus (e.g., an alternating checkerboard pattern on a computer screen). VEP signals are hidden within the normal spontaneous EEG. To separate VEP signals from the background EEG readings, repetitive stimulation and time-locked signal-averaging techniques are generally employed.
as spontaneous EEG. Spontaneous EEG-based systems are a series of items (e.g., images) are presented one-by-one.
challenging to train, due to the lower SNR and the larger There is a specific item (called the target) separates from
variation across subjects [35]. the rest of the other items (called distracters). The subject
According to the gathering scenarios, the spontaneous knows which is the target before the RSVP experiment. For
EEG contains several subordinates: sleeping, motor instance, the distracters can be a color change or letters
imagery, emotional, mental disease and others. among numbers. RSVP contains a static mode (the items
appear on the screen and then disappear without moving)
A.1.2. Evoked Potential (EP) Evoked Potentials (EP) or and a moving mode (the items appear on the screen, move
evoked responses refers to the EEG signals which are to another place, and finally disappear). Nowadays, brain
evoked by an external stimulus instead of spontaneously. signal research mainly focuses on the static mode RSVP.
An EP is time-locked to the external stimulus while the Usually, the frequency of RSVP is 10Hz which means that
aforementioned spontaneous EEG is non-time-locked. In each item will stay on the screen for 0.1 seconds.
contrast to spontaneous EEG, EP generally has higher 2) Auditory Evoked Potentials (AEP). Auditory
amplitude and lower frequency. As a result, the EP Evoked Potentials are a specific subclass of ERP in which
signals are more robust across subjects. According to responses to auditory (sound) stimuli are recorded. AEP
the stimulation method, there exist two categories of EP: is mainly recorded from the scalp but originates at the
the Event-Related Potential (ERP) and the Steady State brainstem or cortex. The most common AEP measured is
Evoked Potential (SSEP). ERP records the EEG signals in the auditory brainstem response (ABR) which is generally
response to an isolated discrete stimulus event (or event employed to test the hearing ability of newborns and
change). To achieve this isolation, stimuli in an ERP infants. In the brain signal area, AEP is mainly used in
experiment are typically separated from each other by a clinical tests for its accuracy and reliability in detecting
long inter-stimulus interval, allowing for the estimation of a unilateral loss [249]. Similar to RSVP, Rapid Serial
stimulus-independent baseline reference [247]. The stimuli Auditory Presentation (RSAP) refers to the experiments
frequency of ERP is generally lower than 2 Hz. In contrast, with rapid serial presentation of sound stimuli. The task
SSEP is generated in response to a periodic stimulus at a for the subject is to recognize the target audio among the
fixed rate. The stimuli frequency of SSEP generally ranges distracters.
within 3.5-75 Hz. 3) Somatosensory Evoked Potentials (SEP).21 So-
Event-related potential (ERP). There are three kinds matosensory Evoked Potentials are another commonly used
of evoked potentials in extensive research and clinical 21 Generally, Somatosensory Evoked Potentials is abbreviated as SSEP

use: Visual Evoked Potentials (VEP); Auditory Evoked or SEP. In this paper, we choose SEP as the abbreviation in case of the
conflict with Steady-State Evoked Potentials (SSEP).
34

SEP signals comprise a series of amplitude deflections that can be elicited by virtually any sensory stimulus.

P300. P300 (also called P3) is an important component of ERP [251]. Here we introduce the P300 signal separately since it is widely used in brain signal analysis. Figure 6a shows the ERP signal fluctuation in the 500 ms after the stimulus onset. The waveform mainly comprises five components: P1, N1, P2, N2, and P3. The capital character P/N represents a positive/negative electrical potential, and the following number refers to the occurrence time of the specific potential. Thus, P300 denotes the positive potential of the ERP waveform at approximately 300 ms after the presented stimulus. Compared to other components, P300 has the highest amplitude and is the easiest to detect; thus, a large number of brain signal studies focus on P300 analysis. P300 is more of an informative feature than a type of brain signal (e.g., VEP); therefore, we do not list P300 in Figure 2. P300 can be analyzed in most ERP signals such as VEP, AEP, and SEP.

In practice, P300 can be elicited by rare, task-relevant events in an 'oddball' paradigm (e.g., the P300 speller). In the oddball paradigm, the subject receives a series of stimuli where low-probability target items are mixed with high-probability non-target items. Visual and auditory stimuli are the most commonly used in the oddball paradigm. Figure 6b shows an example of a visual P300 speller, which enables the subject to spell letters/numbers directly through brain signals [250]. The 26 letters of the alphabet and the Arabic numerals are displayed on a computer screen which serves as the keyboard. The subject focuses attention successively on the characters they wish to spell, and the computer detects the chosen character online in real time. This detection is achieved by repeatedly flashing rows and columns of the matrix: when the elements containing the selected character flash, a P300 fluctuation is elicited. In the 6 × 6 matrix screen, the rows and columns flash in mixed random order. The flash duration and the interval between adjacent flashes are generally set to 100 ms [252]. The columns and rows flash separately. First, the columns flash six times, with each column flashing one time; second, the rows flash six times. After that, this paradigm repeats several times (e.g., N times). The P300 signals of the total 12N flashes are analyzed to output a single outcome (i.e., one letter/number).

Figure 6: P300 waves and the visual P300 speller [250]. (a) ERP components; (b) P300 speller.

Steady State Evoked Potentials (SSEP). The Steady-State Evoked Potential is another subcategory of evoked potentials: periodic cortical responses evoked by certain repetitive stimuli with a constant frequency. It has been demonstrated that the brain oscillations generally maintain a steady level over time while the potentials are evoked by steady-state stimuli (e.g., a flickering light with a fixed frequency). Technically, SSEP is defined as a form of response to repetitive sensory stimulation in which the constituent frequency components of the response remain constant over time in both amplitude and phase [37]. Depending on the type of stimulus, SSEP divides into three subcategories: Steady-State Visually Evoked Potentials (SSVEP), Steady-State Auditory Evoked Potentials (SSAEP), and Steady-State Somatosensory Evoked Potentials (SSSEP). In the brain signal area, most studies focus on visually evoked steady-state potentials, and only rarely do papers focus on auditory or somatosensory stimuli. Therefore, in this survey, we mainly introduce SSVEP rather than SSAEP and SSSEP.

Commonly Used Visual-Related Potentials. Visual evoked potentials are the most commonly used potentials; therefore, it is essential to distinguish the three different visual evoked potential paradigms: VEP, RSVP, and SSVEP. Here, we introduce the characteristics of each paradigm and then give three demonstration videos to provide a better understanding. First, the frequencies are different: the frequency of VEP is less than 2 Hz, the frequency of RSVP is around 10 Hz, and the frequency of SSVEP ranges from 3.5 to 75 Hz. Second, they have different presentation protocols. In the VEP paradigm, different visual patterns are presented on the screen to check for changes in the user's brain signals; for instance, in this video²², the image pattern fills the screen and changes dramatically. In the RSVP paradigm, several items are presented on a screen one-by-one; all the items are shown in the same place and share the same frequency. For example, the video²³ shows an RSVP scenario called speed reading. In the SSVEP paradigm, several items are presented on a screen at the same time, while the items are shown at different positions with different frequencies. For example, in this demonstration video²⁴, there are four circles distributed on the up, down, left, and right sides of a screen, and the frequency of each item differs from the others.

²² https://www.youtube.com/watch?v=iUW l5YAEEM
²³ https://www.youtube.com/watch?v=5yddeRrd0hA&t=36s
²⁴ https://www.youtube.com/watch?v=t96rl1SFHlI
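To make the 12N-flash decoding procedure of the P300 speller concrete, the following sketch (our illustration, with synthetic detector scores standing in for real single-trial P300 classification) averages the per-flash scores over N repetitions and selects the character at the intersection of the strongest row and column.

```python
# Illustrative sketch (not from the survey): decoding one character from the
# 6 x 6 P300 speller paradigm described above. The per-flash P300 scores are
# synthetic stand-ins for the output of a real P300 detector.
import numpy as np

rng = np.random.default_rng(0)
N = 10                          # repetitions of the 12-flash cycle
target_row, target_col = 2, 4   # ground truth (unknown to the decoder)

# Each cycle flashes the 6 columns and then the 6 rows (12N flashes total).
# Target flashes elicit a P300, so their scores are shifted upward here.
col_scores = rng.normal(0.0, 1.0, size=(N, 6))
row_scores = rng.normal(0.0, 1.0, size=(N, 6))
col_scores[:, target_col] += 1.0   # simulated P300 on target-column flashes
row_scores[:, target_row] += 1.0   # simulated P300 on target-row flashes

# Average over the N repetitions, then pick the strongest row and column;
# their intersection identifies the selected character.
best_col = int(np.argmax(col_scores.mean(axis=0)))
best_row = int(np.argmax(row_scores.mean(axis=0)))
print(f"decoded cell: row {best_row}, column {best_col}")
```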

A.2. Functional Near-infrared Spectroscopy (fNIRS)

Functional near-infrared spectroscopy (fNIRS) is a non-invasive functional neuro-imaging technology using near-infrared (NIR) light [38]. Specifically, fNIRS employs NIR light to measure the aggregation degree of oxygenated hemoglobin (Hb) and deoxygenated hemoglobin (deoxy-Hb), because Hb and deoxy-Hb have higher light absorbance than other head components such as the skull and scalp. fNIRS relies on the blood-oxygen-level-dependent (BOLD) response, or hemodynamic response, to form a functional neuro-image. The BOLD response detects the oxygenated and deoxygenated blood levels in the brain blood. The relative levels reflect blood flow and neural activation, where increased blood flow implies a higher metabolic demand caused by active neurons. For example, when the user is concentrating on a mental task, the prefrontal cortex neurons will be activated, and the BOLD response in the prefrontal cortex area will be stronger [200].

Single or multiple emitter-detector pairs measure the Hb and deoxy-Hb: the emitter transmits NIR light through the blood vessels to the detector. Most existing studies use fNIRS technologies to measure the status of the prefrontal and motor cortices; the former responds to mental tasks and music/image imagery while the latter responds to motor-related tasks (e.g., motor imagery). The monitored Hb and deoxy-Hb change slowly since the blood flow varies at a relatively slow rate compared to electrical signals. Temporal resolution refers to the smallest interval of neural activity reliably separated by the signal; fNIRS has lower temporal resolution compared with electrical or magnetic signals. The spatial resolution depends on the number of emitter-detector pairs: in current studies, three emitters and eight detectors suffice for adequately acquiring the prefrontal cortex signals, and six emitters and six detectors suffice for covering the motor cortex area [29]. fNIRS has the drawback that it cannot be used to measure cortical activity occurring deeper than 4 cm in the brain, due to limitations in light emitter power and spatial resolution.

A.3. Functional Magnetic Resonance Imaging (fMRI)

Functional magnetic resonance imaging (fMRI) monitors brain activities by detecting changes associated with blood flow in brain areas [14]. Similar to fNIRS, fMRI relies on the BOLD response. The main differences between fNIRS and fMRI are as follows [24]. First, as the name implies, fMRI measures the BOLD response through magnetic rather than optical methods. Hemoglobin differs in how it responds to magnetic fields depending on whether it has a bound oxygen molecule; magnetic fields are more sensitive to, and more easily distorted by, deoxy-Hb molecules. Second, magnetic fields have higher penetration than NIR light, which gives fMRI a greater ability than fNIRS to capture information from deep parts of the brain. Third, fMRI has a higher spatial resolution than fNIRS, since the latter's spatial resolution is limited by the emitter-detector pairs. However, the temporal resolutions of fMRI and fNIRS are at an equal level because both are constrained by the blood flow speed.

fMRI has several flaws compared to fNIRS: 1) fMRI requires an expensive scanner to generate magnetic fields; 2) the scanner is heavy and has poor portability. To measure the signal of interest, the CNR (Contrast-to-Noise Ratio) has been investigated as a measure of fMRI image quality, because researchers are more interested in the contrast between images than in the raw images. For fMRI data, using the CNR of the time series instead of the (temporal) SNR is preferred because CNR compares a measure of the activation fluctuations to the noise [253].

A.4. Magnetoencephalography (MEG)

Magnetoencephalography (MEG) is a functional neuroimaging technique for mapping brain activity by recording the magnetic fields produced by electrical currents occurring naturally in the brain, using very sensitive magnetometers [254]. The ionic currents of active neurons create weak magnetic fields, which can be measured by magnetometers such as SQUIDs (superconducting quantum interference devices). However, producing a detectable magnetic field requires a massive number (e.g., 50,000) of active neurons with similar orientation. The source of the magnetic field measured by MEG is the pyramidal cells, which are perpendicular to the cortex surface.

MEG has a relatively low spatial resolution since the signal quality highly depends on the measurement factors (e.g., brain area, neuron orientations, neuron depth). However, MEG can provide very high temporal resolution (≥1000 Hz) since MEG directly monitors brain activity at the neuron level, on the same level as intracortical signals. MEG equipment is expensive and not portable, which limits its real-world deployment.
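As a rough illustration of the CNR-versus-tSNR distinction (our sketch, not from the surveyed studies; exact definitions vary across the literature), the following computes both quantities for a synthetic voxel time series under an assumed block task design.

```python
# Illustrative sketch (not from the survey): tSNR and a simple block-design
# CNR for one synthetic fMRI voxel time series.
import numpy as np

rng = np.random.default_rng(1)
n = 200
task = np.tile(np.r_[np.zeros(10), np.ones(10)], n // 20)  # on/off blocks
series = 100 + 2.0 * task + rng.normal(0, 1.0, n)          # baseline + activation + noise

tsnr = series.mean() / series.std()       # temporal SNR: mean over overall fluctuation
signal_change = series[task == 1].mean() - series[task == 0].mean()
noise_sd = series[task == 0].std()        # noise estimated from the rest blocks
cnr = signal_change / noise_sd            # contrast (activation) relative to noise
print(f"tSNR = {tsnr:.1f}, CNR = {cnr:.2f}")
```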

B. Basic Deep Learning in Brain Signal Analysis

In this part, we give a relatively detailed introduction of various deep learning models, because some potential readers from non-computer areas (e.g., biomedical) may not be familiar with deep learning.

For simplicity, we first define an operation T(·) as

T(x) = w ∗ x + b    (1)
T(x, x′) = w ∗ x + b + w′ ∗ x′ + b′    (2)

where x and x′ denote two variables while w, w′, b, and b′ denote the corresponding weights and biases.

B.1. Discriminative Deep Learning Models

Since the main task of brain signal analysis is brain signal recognition, discriminative deep learning models are the most popular and powerful algorithms. Suppose we have a dataset of brain signal samples {X, Y}, where X denotes the set of brain signal observations and Y denotes the set of sample ground truths (i.e., labels). Consider a specific sample-label pair {x ∈ R^N, y ∈ R^M}, where N and M denote the dimension of the observations and the number of sample categories, respectively. The aim of discriminative deep learning models is to learn a function with the mapping x → y. In short, discriminative models receive the input data and output the corresponding category or label. All the discriminative models introduced in this section are supervised learning techniques, which require both the observations and the ground truth.

B.1.1. Multi-Layer Perceptron (MLP). The most basic neural network is the fully-connected neural network (Figure 7a), which contains only one hidden layer. The input layer receives the raw data or extracted features of brain signals while the output layer shows the classification results. The term 'fully-connected' denotes that each node in a specific layer is connected with all the nodes in the previous and next layers. This network is too 'shallow' and is generally not regarded as a 'deep' neural network.

The Multilayer Perceptron is the simplest and most basic deep learning model. The key difference between MLP and the fully-connected neural network is that MLP has more than one hidden layer. All the nodes are fully connected with the nodes of the adjacent layers but have no connection with the other nodes of the same layer. As shown in Figure 7b, we take a structure with two hidden layers as an example to describe the data flow in MLP.

The input layer receives the observation x and feeds it forward to the first hidden layer,

x_h1 = σ(T(x))    (3)

where x_h1 denotes the data flow in the first hidden layer and σ represents the non-linear activation function. There are several commonly used activation functions, such as sigmoid/logistic, tanh, and ReLU; we choose the sigmoid activation function as an example in this section. Then, the data flow to the second hidden layer and the output layer,

x_h2 = σ(T(x_h1))    (4)
y′ = σ(T(x_h2))    (5)

where y′ denotes the predicted results in one-hot format. The error (i.e., loss) can be calculated based on the distance between y′ and the ground truth y. For instance, the Euclidean-distance based error can be calculated by

error = ‖y′ − y‖₂    (6)

where ‖·‖₂ denotes the Euclidean norm. Afterward, the error is back-propagated and minimized by a suitable optimizer, which adjusts all the weights and biases in the model until the error converges. The most widely used loss functions include cross-entropy, negative log-likelihood, mean square error, etc. The most widely used optimizers include Adaptive Moment Estimation (Adam), Stochastic Gradient Descent (SGD), Adagrad (adaptive subgradient method), etc.

Several terms are easily confused with each other: Artificial Neural Network (ANN), Deep Neural Network (DNN), and MLP. These terms have no strict difference, are often mixed in the literature, and are commonly used as synonyms. Generally, ANN and DNN can describe deep learning models overall, including not only fully-connected networks but also other networks (e.g., recurrent and convolutional networks), whereas MLP refers only to fully-connected networks. Additionally, ANN covers all neural network models, either shallow (one hidden layer) or deep (multiple hidden layers), while DNN does not cover shallow neural networks [30, 31].

Figure 7: Illustration of the standard neural network and the multilayer perceptron. (a) The basic structure of the fully-connected neural network: the input layer receives the raw data or extracted features of brain signals while the output layer shows the classification results; 'fully-connected' denotes that each node in a specific layer is connected with all the nodes in the previous and next layers. (b) An MLP can have multiple hidden layers (the more, the deeper); this example with two hidden layers is the simplest MLP model.
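As a concrete companion to Equations (3)-(6), the following PyTorch sketch (ours, not from the surveyed studies; the layer sizes are arbitrary placeholders) builds a two-hidden-layer MLP with sigmoid activations, measures a Euclidean-style error against one-hot labels, and lets the Adam optimizer adjust the weights and biases until the error converges.

```python
# Illustrative sketch (not from the survey): the two-hidden-layer MLP of
# Equations (3)-(6), with sigmoid activations and an MSE (Euclidean-style)
# loss against one-hot labels.
import torch
import torch.nn as nn

N, M = 64, 4                       # observation dim / number of categories

model = nn.Sequential(
    nn.Linear(N, 32), nn.Sigmoid(),    # first hidden layer, Eq. (3)
    nn.Linear(32, 16), nn.Sigmoid(),   # second hidden layer, Eq. (4)
    nn.Linear(16, M), nn.Sigmoid(),    # output layer, Eq. (5)
)
criterion = nn.MSELoss()           # Euclidean-distance style error, Eq. (6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, N)              # a batch of 8 synthetic feature vectors
y = torch.eye(M)[torch.randint(0, M, (8,))]   # one-hot ground truth

for _ in range(100):               # the optimizer adjusts weights and biases
    optimizer.zero_grad()
    loss = criterion(model(x), y)  # distance between y' and y
    loss.backward()                # back-propagate the error
    optimizer.step()
```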

B.1.2. Recurrent Neural Networks (RNN). The Recurrent Neural Network is a specific subclass of discriminative deep learning models designed to capture temporal dependencies among the input data [41]. Figure 8a describes the activity of a specific RNN node in the time domain. At each time step in the range [1, t + 1], the node receives an input I (the subscript represents the specific time) and a hidden state c from the previous time step (except at the first step). For instance, at time t it receives not only the input I_t but also the hidden state of the previous node, c_{t−1}. The hidden state can be regarded as the 'memory' of the node, which helps the RNN 'remember' the historical input.

Next, we describe two typical RNN architectures which have attracted much attention and achieved great success: long short-term memory and gated recurrent units. Both follow the basic principles of RNN, and we focus on the more complicated internal structure of each node. Since this structure is much more complicated than that of a general neural node, we call it a 'cell'; cells in RNN are equivalent to nodes in MLP.

Long Short-Term Memory (LSTM). Figure 9a shows the structure of a single LSTM cell at time t [255]. The LSTM cell has three inputs (I_t, O_{t−1}, and c_{t−1}) and two outputs (c_t and O_t). The operation is as follows:

I_t, O_{t−1}, c_{t−1} → c_t, O_t    (7)

I_t denotes the input value at time t, O_{t−1} denotes the output at the previous time step (i.e., time t − 1), and c_{t−1} denotes the hidden state at the previous time step; c_t and O_t denote the hidden state and the output at time t, respectively. Therefore, we can observe that the output O_t at time t is related not only to the input I_t but also to the information from the previous time step. In this way, LSTM is empowered to remember important information in the time domain. Moreover, the essential idea of LSTM is to control the memory of specific information. For this aim, the LSTM cell adopts four gates: the input gate, forget gate, output gate, and input modulation gate. Each gate produces a weight that controls how much information can flow through it. For example, if the weight of the forget gate is zero, the LSTM cell forgets all the information passed from the previous time t − 1; if the weight is one, the LSTM cell retains all of it (see Equation 12). The corresponding activation function determines the weight. The detailed data flow is as follows:

f = σ(T(I_t, O_{t−1}))    (8)
i = σ(T(I_t, O_{t−1}))    (9)
o = σ(T(I_t, O_{t−1}))    (10)
m = tanh(T(I_t, O_{t−1}))    (11)
c_t = f ∗ c_{t−1} + i ∗ m    (12)
O_t = o ∗ tanh(c_t)    (13)

where f, i, o, and m represent the forget gate, input gate, output gate, and input modulation gate, respectively (each gate has its own weights in T(·)).

Gated Recurrent Units (GRU). Another widely used RNN architecture is the GRU [256]. Similar to LSTM, GRU attempts to exploit information from the past. GRU does not require a hidden state; instead, it receives temporal information only from the output at time t − 1. Thus, as shown in Figure 9b, GRU has two inputs (I_t and O_{t−1}) and one output (O_t). The mapping can be described as:

I_t, O_{t−1} → O_t    (14)

GRU contains two gates: a reset gate r and an update gate z. The former decides how to combine the input with the previous memory; the latter decides how much of the previous memory to keep, which is similar to the forget gate of LSTM. The data flow is as follows:

z = σ(T(I_t, O_{t−1}))    (15)
r = σ(T(I_t, O_{t−1}))    (16)
Ō_t = tanh(T(I_t, r ∗ O_{t−1}))    (17)
O_t = (1 − z) ∗ O_{t−1} + z ∗ Ō_t    (18)

It can be observed that there is an intermediate variable Ō_t, which is similar to the hidden state of LSTM; however, Ō_t works only at this time point and cannot be passed to the next time point.

We here give a brief comparison between LSTM and GRU since they are very similar. First, LSTM and GRU have comparable performance, as studied in the literature; for any specific task, it is recommended to try both to determine which provides better performance. Second, GRU is lightweight since it has only two gates and no hidden state; therefore, GRU is faster to train and requires less data to generalize. Third, in contrast, LSTM generally works better if the training dataset is big enough, because LSTM has better non-linearity than GRU, having two more control gates (the input modulation gate and the forget gate). As a result, LSTM, compared with GRU, is more powerful at discovering the latent distinctive information from large-scale training datasets.
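The gating equations (8)-(13) can be written out directly in code. The sketch below (ours, with arbitrary sizes) implements a single LSTM cell, realizing each T(I_t, O_{t−1}) as its own linear layer over the concatenated inputs; in practice, the built-in torch.nn.LSTM and torch.nn.GRU modules would be used instead.

```python
# Illustrative sketch (not from the survey): an LSTM cell written out as in
# Equations (8)-(13). Each gate applies its own T(It, Ot-1), realized here as
# a Linear layer over the concatenated [It, Ot-1].
import torch
import torch.nn as nn

class LSTMCellSketch(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        d = input_size + hidden_size
        self.f_gate = nn.Linear(d, hidden_size)  # forget gate, Eq. (8)
        self.i_gate = nn.Linear(d, hidden_size)  # input gate, Eq. (9)
        self.o_gate = nn.Linear(d, hidden_size)  # output gate, Eq. (10)
        self.m_gate = nn.Linear(d, hidden_size)  # input modulation gate, Eq. (11)

    def forward(self, x_t, o_prev, c_prev):
        z = torch.cat([x_t, o_prev], dim=-1)     # the pair (It, Ot-1)
        f = torch.sigmoid(self.f_gate(z))
        i = torch.sigmoid(self.i_gate(z))
        o = torch.sigmoid(self.o_gate(z))
        m = torch.tanh(self.m_gate(z))
        c_t = f * c_prev + i * m                 # hidden state, Eq. (12)
        o_t = o * torch.tanh(c_t)                # output, Eq. (13)
        return o_t, c_t

cell = LSTMCellSketch(input_size=8, hidden_size=16)
o, c = torch.zeros(1, 16), torch.zeros(1, 16)
for x_t in torch.randn(5, 1, 8):                 # unroll over 5 time steps
    o, c = cell(x_t, o, c)
```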

Figure 8: Illustration of the RNN and CNN models. (a) The recurrent procedure of the RNN model for a specific node over the time range [1, t + 1]. The node at time t receives two input variables (I_t, the input at time t, and c_{t−1}, the hidden state at time t − 1) and exports two variables (the output O_t and the hidden state c_t at time t). (b) The paradigm of the CNN model, which includes two convolutional layers, two pooling layers, and one fully-connected layer.

Figure 9: Illustration of the detailed LSTM and GRU cell structures. (a) The LSTM cell receives three inputs (I_t, the input at time t; O_{t−1}, the output of the previous time step; and c_{t−1}, the hidden state of the previous time step) and exports two outputs (the output O_t and the hidden state c_t of this time step). The LSTM cell contains four gates to control the data flow: the input gate, output gate, forget gate, and input modulation gate. (b) The GRU cell receives two inputs (the input I_t of this time step and the output O_{t−1} of the previous time step) and exports its output O_t. The GRU cell contains only two gates, the reset gate and the update gate. Unlike the hidden state c_t in the LSTM cell, there is no transmittable hidden state in the GRU cell, only the intermediate variable Ō_t.

B.1.3. Convolutional Neural Networks (CNN). The Convolutional Neural Network is one of the most popular deep learning models, specialized in spatial information exploration [42]. This section briefly introduces the working mechanism of CNN. CNN is widely used to discover latent spatial information in applications such as image recognition, ubiquitous computing, and object search, due to its salient features such as regularized structure, good spatial locality, and translation invariance. In the area of brain signals, specifically, CNN is expected to capture the distinctive dependencies among the patterns associated with different brain signals.

We present a standard CNN architecture in Figure 8b. This CNN contains one input layer, two convolutional layers each followed by a pooling layer, one fully-connected layer, and one output layer. The square patch in each layer shows the processing progress of a specific batch of input values. The key idea of CNN is to reduce the input data into a form which is easier to recognize, with as little information loss as possible. CNN has three kinds of stacked layers: the convolutional layer, the pooling layer, and the fully-connected layer.

The convolutional layer is the core block of CNN. It contains a set of filters that convolve the input data, followed by a nonlinear transformation to extract geographical features. In a deep learning implementation, several key hyper-parameters should be set for the convolutional layer, such as the number of filters, the size of each filter, etc. The pooling layer generally follows the convolutional layer and aims to progressively reduce the spatial size of the features. In this way, it helps to decrease the number of parameters (e.g., weights and biases) and the computational burden. There are three kinds of pooling operations: max, min, and average. Take max pooling for example: the pooling operation outputs the maximum value of the pooling area as the result. The hyper-parameters of the pooling layer include the pooling operation, the size of the pooling area, the strides, etc. In the fully-connected layer, as in the basic neural network, the nodes have full connections to all activations in the previous layer.

CNN is the most popular deep learning model in brain signal research; it can be used to exploit the latent spatial dependencies among the input brain signals such as fMRI images, spontaneous EEG, and so on. More details are reported in Section 4.
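The architecture of Figure 8b translates almost line-by-line into code. The following PyTorch sketch (ours; the channel counts, kernel sizes, 32 × 32 input shape, and 4-class output are placeholders) stacks two convolution + max-pooling blocks and a fully-connected classifier.

```python
# Illustrative sketch (not from the survey): the CNN of Figure 8b -- two
# convolutional layers, each followed by max pooling, then a fully-connected
# classifier.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolutional layer 1
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer 1: 32x32 -> 16x16
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # convolutional layer 2
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer 2: 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 4),                    # fully-connected layer
)

x = torch.randn(2, 1, 32, 32)   # e.g., a 2-D map derived from EEG or an fMRI slice
print(model(x).shape)           # torch.Size([2, 4])
```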
Figure 10: Illustration of several standard representative deep learning models. (a) A basic autoencoder contains one hidden layer; the process from the input layer to the hidden layer is the encoder while the process from the hidden layer to the output layer is the decoder. (b) The Restricted Boltzmann Machine: the encoder and the decoder share the same transformation weights, and the input layer and the output layer are merged into the visible layer. (c) A Deep AE with multiple hidden layers; generally, the number of hidden layers is odd, and the middle layer holds the learned representative features. (d) A Deep RBM has one visible layer and multiple hidden layers; the last layer is the encoded representation.

B.2. Representative Deep Learning Models

The term representative deep learning refers to the use of deep neural networks for representation learning, which aims to learn representations of the input data that make it easier to perform a downstream task (e.g., classification, generation, and clustering) [257].

The essential building blocks of representative deep learning models are autoencoders and restricted Boltzmann machines.²⁵ Deep Belief Networks are composed of AEs or RBMs. The representative models, including AE, RBM,²⁶ and DBN, are unsupervised learning methods; thus, they can learn representative features from only the input observations x, without the ground truth y. In short, representative models receive the input data and output a dense representation of the data. There are various definitions in different studies for several models (such as DBN, Deep RBM, and Deep AE); in this survey, we choose the most understandable definitions and present them in detail in this section.

²⁵ AE and RBM are generally regarded as kinds of deep learning although they only have three and two layers, respectively.
²⁶ We regard AE and RBM as representative methods because most brain signal studies adopt them for feature representation.

B.2.1. Autoencoder (AE). As shown in Figure 10a, an autoencoder is a neural network that has three layers: the input layer, the hidden layer, and the output layer [43]. It differs from the standard neural network in that the AE is trained to reconstruct its inputs, which forces the hidden layer to learn good representations of the inputs.

The structure of an AE contains two blocks. The first block is called the encoder, which embeds the observation into a latent representation (also called the 'code'),

x_h = σ(T(x))    (19)

where x_h represents the hidden layer. The second block is called the decoder, which decodes the representation back into the original space,

y′ = σ(T(x_h))    (20)

where y′ represents the output. The AE forces y′ to be equal to the input x and calculates the error based on the distance between them; thus, the AE can compute the loss function from x alone, without the ground truth y:

error = ‖y′ − x‖₂    (21)

Compared to Equation 6, this equation does not involve the variable y because it takes the input x as the ground truth. This is why the AE is able to perform unsupervised learning.

Naturally, one variant of the AE is the Deep-AE (D-AE), which has more than one hidden layer. We present the structure of a D-AE with three hidden layers in Figure 10c. From the figure, we can observe that there is one more hidden layer in both the encoder and the decoder. The symmetrical structure ensures the smoothness of the encoding and decoding procedures. Thus, a D-AE generally has an odd number of hidden layers (e.g., 2n + 1), where the first n layers belong to the encoder, the (n + 1)-th layer works as the code (belonging to both the encoder and the decoder), and the last n layers belong to the decoder. The data flow of the D-AE (Figure 10c) can be represented as

x_h1 = σ(T(x))    (22)
x_h2 = σ(T(x_h1))    (23)

where x_h2 denotes the middle hidden layer (the code). Then, decoding the hidden layer, we get

x_h3 = σ(T(x_h2))    (24)
y′ = σ(T(x_h3))    (25)

It is almost the same as the basic AE except that the D-AE has more hidden layers. Apart from the D-AE, the AE has many other variants such as the denoising autoencoder, sparse autoencoder, contractive AE, etc. Here we only introduce the D-AE because it is easily confused with the AE-based deep belief network; the key difference between them will be provided in Section B.2.3.

The core idea of the AE and its variants is simple: condense the input data x into a code x_h (generally the code layer has a lower dimension) and then reconstruct the data based on the code. If the reconstructed y′ approximates the input data x, it demonstrates that the condensed code x_h carries enough information about x; thus, we can regard x_h as a representation of the input data for further operations (e.g., classification).
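Equations (19)-(21) correspond to only a few lines of code. The sketch below (ours; the code dimension is a placeholder) trains a basic AE purely from unlabeled observations x, with the input itself serving as the reconstruction target.

```python
# Illustrative sketch (not from the survey): the basic autoencoder of
# Equations (19)-(21), trained without any ground-truth labels.
import torch
import torch.nn as nn

N = 64                                        # input dimension
encoder = nn.Sequential(nn.Linear(N, 16), nn.Sigmoid())   # x_h = sigma(T(x)), Eq. (19)
decoder = nn.Sequential(nn.Linear(16, N), nn.Sigmoid())   # y' = sigma(T(x_h)), Eq. (20)

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(32, N)                         # unlabeled observations only
for _ in range(200):
    optimizer.zero_grad()
    y_prime = decoder(encoder(x))
    loss = ((y_prime - x) ** 2).mean()        # Eq. (21): no ground-truth y needed
    loss.backward()
    optimizer.step()
# After training, encoder(x) yields the dense representation x_h.
```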
B.2.2. Restricted Boltzmann Machine (RBM). The Restricted Boltzmann Machine is a stochastic artificial neural network that can learn a probability distribution over its set of inputs [44]. It contains two layers: one visible layer (the input layer) and one hidden layer, as shown in Figure 10b. From the figure, we can see that the connection lines between the two layers are bidirectional. The RBM is a variant of the Boltzmann Machine with the stronger restriction of having no intra-layer connections; in a general Boltzmann machine, the nodes in the same hidden layer are connected. Similar to the AE, the procedure of the RBM includes two steps: the first step condenses the input data from the original space into the hidden layer in a latent space; after that, the hidden layer is used to reconstruct the input data in an identical way. Compared to the AE, the RBM has a stronger constraint: the encoder weights and the decoder weights must be equal. We have

x_h = σ(T(x))    (26)
x′ = σ(T(x_h))    (27)

In the above two equations, the weights of T(·) are the same. Then, the error for training can be calculated by

error = ‖x′ − x‖₂    (28)

We can observe from Figure 10d that the Deep-RBM (D-RBM) is an RBM with multiple hidden layers. The input data from the visible layer first flow to the first hidden layer and then to the second hidden layer; the code then flows backward into the visible layer for reconstruction.

B.2.3. Deep Belief Networks (DBN). A Deep Belief Network (DBN) is a stack of simple networks, such as AEs or RBMs [258]. Thus, we divide DBN into the DBN-AE (also called a stacked AE), which is composed of AEs, and the DBN-RBM (also called a stacked RBM), which is composed of RBMs.

As shown in Figure 11a, the DBN-AE contains two AE structures, where the hidden layer of the first AE works as the input layer of the second AE. This diagram has two stages. In the first stage, the input data feed into the first AE, following the rules introduced in Section B.2.1. The reconstruction error is calculated and back-propagated to adjust the corresponding weights and biases, and this iteration continues until the AE converges. We obtain the mapping

x_1 → x_h1    (29)

Then, we move on to the second stage, where the learned representative code in the hidden layer, x_h1, is used as the input layer of the second AE:

x_2 = x_h1    (30)

and then, after the second AE converges, we have

x_2 → x_h2    (31)

where x_h2 denotes the hidden layer of the second AE and, at the same time, the final outcome of the DBN-AE.

The core idea of the AE is to learn a representative code with lower dimensionality that still contains most of the information of the input data. The idea behind the DBN-AE is to learn a more representative and purer code.

Similarly, the DBN-RBM is composed of several single RBM structures. Figure 11b shows a DBN with two RBMs, where the hidden layer of the first RBM is used as the visible layer of the second RBM.

Compare the DBN-RBM (Figure 11b) and the D-RBM (Figure 10d): they have almost the same architecture. Likewise, the DBN-AE (Figure 11a) and the D-AE (Figure 10c) have similar architectures. The most important difference between the DBN and the deep AE/RBM is that the former is trained greedily while the latter is trained jointly. In particular, for the DBN, the first AE/RBM is trained first and, after it converges, the second AE/RBM is trained [44]. For the deep AE/RBM, joint training means that the whole structure is trained together, no matter how many layers it has.

Figure 11: Illustration of deep belief networks. (a) A DBN composed of autoencoders. The DBN-AE contains multiple AE components (in this case, two AEs), with the hidden layer of the previous AE working as the input layer of the next AE; the hidden layer of the last AE is the learned representation. (b) A DBN composed of RBMs. In this illustration, there are two RBM components, with the hidden layer of the first RBM working as the visible layer of the second RBM; the last hidden layer is the encoded representation. While the DBN-RBM and the D-RBM (Figure 10d) have similar architectures, the former is trained greedily while the latter is trained jointly.
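The greedy, stage-wise procedure of Equations (29)-(31) can be illustrated as follows (our sketch; dimensions and step counts are placeholders): each AE is trained to convergence before its code becomes the input of the next one, in contrast to the joint training of the deep AE.

```python
# Illustrative sketch (not from the survey): greedy layer-wise training of a
# DBN-AE (stacked AE), following Equations (29)-(31).
import torch
import torch.nn as nn

def train_ae(data: torch.Tensor, code_dim: int, steps: int = 200) -> torch.Tensor:
    """Train one AE on `data`; return the learned code (hidden activations)."""
    enc = nn.Sequential(nn.Linear(data.shape[1], code_dim), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(code_dim, data.shape[1]), nn.Sigmoid())
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((dec(enc(data)) - data) ** 2).mean()   # reconstruction error
        loss.backward()
        opt.step()
    return enc(data).detach()

x1 = torch.rand(128, 64)        # stage 1: train the first AE,  x1 -> x_h1, Eq. (29)
x_h1 = train_ae(x1, 32)
x2 = x_h1                       # the first code becomes the new input, Eq. (30)
x_h2 = train_ae(x2, 16)         # stage 2: train the second AE, x2 -> x_h2, Eq. (31)
# x_h2 is the final DBN-AE representation; training all layers at once would
# instead correspond to the jointly trained deep AE of Figure 10c.
```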
Figure 12: Illustration of generative deep learning models. (a) The VAE contains two hidden layers. The first hidden layer is composed of two components, the expectation and the standard deviation, which are learned separately from the input layer; the second hidden layer represents the encoded information. ε denotes a sample from the standard normal distribution. (b) GAN mainly contains two crucial components: the generator and the discriminator network. The former receives a latent random variable to generate a fake brain signal, while the latter receives both the real and the generated brain signals and attempts to determine whether its input is genuine. In the area of brain signals, GAN is used to reconstruct or augment data instead of performing classification.

B.3. Generative Deep Learning Models

Generative deep learning models are mainly used to generate training samples or perform data augmentation. In other words, generative deep learning models play a supporting role in the brain signal field, enhancing the quality and quantity of the training data; after the data augmentation, discriminative models are employed for the classification. This procedure improves the robustness and effectiveness of the trained deep learning networks, especially when the training data are limited. In short, generative models receive the input data and output a batch of similar data. In this section, we introduce two typical generative deep learning models: the Variational Autoencoder (VAE) and Generative Adversarial Networks (GAN).

B.3.1. Variational Autoencoder (VAE). The Variational Autoencoder, proposed in 2013 [46], is an important variant of the AE and one of the most powerful generative algorithms. The standard AE and its other variants can be used for representation but fail at generation, because the learned code (or representation) may not be continuous; therefore, we cannot generate a random sample that is similar to the input sample. In other words, the standard AE does not allow interpolation: we can replicate the input sample but cannot generate a similar one. The VAE has one fundamentally unique property that separates it from the other AEs, and it is this property that makes the VAE so useful for generative modeling: its latent spaces are designed to be continuous, which allows easy random sampling and interpolation. Next, we introduce how the VAE works.

Similar to the standard AE, the VAE can be divided into an encoder and a decoder, where the former embeds the input data into a latent space and the latter transfers the data from the latent space back to the original space. However, the learned representation in the latent space is forced to approximate a prior distribution p̄(z), which is generally set as a standard Gaussian distribution. Based on the reparameterization trick [46], the first hidden layer of the VAE is designed to have two parts, where one denotes the expectation µ and the other denotes the standard deviation σ (note that σ(·) with an argument denotes the activation function); thus, we have

µ = σ(T(x))    (32)
σ = σ(T(x))    (33)

Then, the latent code in the hidden layer is not directly calculated but sampled from the Gaussian distribution N(µ, σ²); the stochastic code is

z = µ + σ ∗ ε    (34)

where ε ∼ N(0, I). The representation z is forced toward the prior distribution, and the distance error_KL is measured by the Kullback–Leibler divergence,

error_KL = D_KL(z, p̄(z))    (35)

where p̄(z) denotes the prior distribution. In the decoder, z is decoded into the output y′,

y′ = σ(T(z))    (36)

and the reconstruction error is

error_recon = ‖y′ − x‖₂    (37)

The overall error for the VAE combines the KL divergence and the reconstruction error,

error = error_KL + error_recon    (38)

The key point of the VAE is that all the latent representations z are forced to obey the normal distribution. Thus, we can randomly sample a representation z′ ∼ p̄(z) from the prior distribution and then reconstruct a sample based on z′. This is why the VAE is so powerful in generation.
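The reparameterization trick and the two-part loss of Equations (32)-(38) are shown in the sketch below (ours; dimensions are placeholders, and the KL term uses the closed form for a standard-Gaussian prior). After training, new samples are generated by decoding draws from the prior.

```python
# Illustrative sketch (not from the survey): a VAE with the reparameterization
# trick of Equations (32)-(38). A log-variance branch is used for numerical
# stability, a common implementation choice.
import torch
import torch.nn as nn

N, Z = 64, 8
enc_mu = nn.Linear(N, Z)          # expectation branch,            Eq. (32)
enc_logvar = nn.Linear(N, Z)      # (log-)standard-deviation branch, Eq. (33)
decoder = nn.Sequential(nn.Linear(Z, N), nn.Sigmoid())   # Eq. (36)

params = (list(enc_mu.parameters()) + list(enc_logvar.parameters())
          + list(decoder.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(32, N)
for _ in range(200):
    opt.zero_grad()
    mu, logvar = enc_mu(x), enc_logvar(x)
    sigma = torch.exp(0.5 * logvar)
    z = mu + sigma * torch.randn_like(sigma)          # z = mu + sigma * eps, Eq. (34)
    recon = ((decoder(z) - x) ** 2).mean()            # reconstruction error, Eq. (37)
    kl = -0.5 * torch.mean(1 + logvar - mu ** 2 - logvar.exp())  # KL term, Eq. (35)
    loss = recon + kl                                 # overall error, Eq. (38)
    loss.backward()
    opt.step()

z_new = torch.randn(1, Z)         # sample z' from the standard-Gaussian prior
generated = decoder(z_new)        # and decode it into a new synthetic sample
```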

B.3.2. Generative Adversarial Networks (GAN). Generative Adversarial Networks [47] were proposed in 2014 and have achieved great success in a wide range of research areas (e.g., computer vision and natural language processing). A GAN is composed of two simultaneously trained neural networks: a generator and a discriminator. The generator captures the distribution of the input data, and the discriminator is used to estimate the probability that a sample came from the training data. The generator aims to generate fake samples while the discriminator aims to distinguish whether a sample is genuine. The functions of the generator and the discriminator are opposed, which is why GAN is called 'adversarial.' After the convergence of both the generator and the discriminator, the discriminator ought to be unable to recognize the generated samples. Thus, the pre-trained generator can be used to create a batch of samples for further operations such as classification.

Figure 12b shows the procedure of a standard GAN. The generator receives a noise signal s, which is randomly sampled from a multimodal Gaussian distribution, and outputs the fake brain signals x_F. The discriminator receives the real brain signals x_R and the generated fake samples x_F, and then predicts whether the received sample is real or fake. The internal architectures of the generator and discriminator are designed depending on the data types and scenarios. For instance, we can build the GAN with convolutional layers for fMRI images, since CNN has an excellent ability to extract spatial features. The discriminator and the generator are trained jointly. After convergence, numerous brain signals x_G can be created by the generator; thus, the training set is enlarged from x_R to {x_R, x_G} to train a more effective and robust classifier.

B.4. Hybrid Model

Hybrid deep learning models refer to models which are composed of at least two basic deep learning models, where each basic model is a discriminative, representative, or generative deep learning model. Hybrid models comprise two subcategories based on their targets: classification-aimed (CA) hybrid models and non-classification-aimed (NCA) hybrid models.

Most of the deep-learning-related studies in the brain signal area focus on the first category. Based on the existing literature, the representative and generative models are employed to enhance the discriminative models: the representative models can provide more informative and lower-dimensional features for the discrimination, while the generative models can help to augment the training data quality and quantity, which supplies more information for the classification. The CA hybrid models can be further subdivided into: 1) several discriminative models combined to extract more distinctive and robust features (e.g., CNN+RNN); 2) a representative model followed by a discriminative model (e.g., DBN+MLP); 3) a generative + representative model followed by a discriminative model; 4) a generative + representative model followed by a non-deep-learning classifier. Here, a representative model followed by a non-deep-learning classifier is regarded as a representative deep learning model.

A few NCA hybrid models aim for brain signal reconstruction. For example, St-yves et al. [259] adopted GAN to reconstruct visual stimuli based on fMRI images.
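To close, the adversarial procedure of Section B.3.2 is summarized in the following sketch (ours, not drawn from the surveyed studies; an MLP generator/discriminator on synthetic 1-D signals is assumed, whereas a real system would match the architecture to the data type, e.g., convolutional layers for fMRI images).

```python
# Illustrative sketch (not from the survey): the adversarial training loop of
# Section B.3.2 on synthetic 1-D 'brain signals'. All dimensions are placeholders.
import torch
import torch.nn as nn

N, S = 64, 16                      # signal dimension / noise dimension
G = nn.Sequential(nn.Linear(S, 64), nn.ReLU(), nn.Linear(64, N))        # generator
D = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

x_real = torch.randn(32, N) * 0.5 + 1.0        # stand-in for real brain signals
real_lbl, fake_lbl = torch.ones(32, 1), torch.zeros(32, 1)

for _ in range(500):
    # 1) discriminator step: real samples vs. generated fakes
    x_fake = G(torch.randn(32, S)).detach()
    opt_d.zero_grad()
    d_loss = bce(D(x_real), real_lbl) + bce(D(x_fake), fake_lbl)
    d_loss.backward()
    opt_d.step()
    # 2) generator step: try to fool the discriminator into predicting 'real'
    opt_g.zero_grad()
    g_loss = bce(D(G(torch.randn(32, S))), real_lbl)
    g_loss.backward()
    opt_g.step()

x_aug = G(torch.randn(100, S)).detach()   # augmented set {x_R, x_G} for a classifier
```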
