Epilepsy Detection
Abstract—Epilepsy is a common brain disease that has serious negative effects on patients. Electroencephalogram (EEG) recordings are widely used for detecting epileptic signals. Because epilepsy EEG data sets are generally small, in this paper we use a shallow convolutional neural network (CNN) to classify the raw data. To improve the performance of the model, we propose a layer-wise pre-training mechanism. In our experiments, we validate the effectiveness of the method on the public epilepsy EEG data set provided by the University of Bonn.

Index Terms—epilepsy, electroencephalogram, convolutional neural network, pre-training
I. INTRODUCTION

Epilepsy is a chronic disease that causes sudden abnormal discharge of brain neurons and transient brain dysfunction. It is one of the most prevalent neurological diseases in the world [1]. Timely diagnosis of seizures is therefore key to improving patients' quality of life. Electroencephalography (EEG), as a record of the brain's electrical activity, is widely used in the diagnosis of epilepsy. Usually, an experienced neurologist evaluates the EEG signal to determine whether the patient is experiencing seizures, but this is an inefficient and expensive process. An automatic epilepsy detection system is therefore urgently needed to make the diagnosis process easier, more accurate, and more efficient.

At present, the classical automatic detection system for epilepsy based on EEG signals can be divided into two parts: the first preprocesses the signal, mainly to remove noise; the second constructs hand-crafted features for classification. Many feature extraction methods have been proposed in previous work. Samiee used the rational discrete short-time Fourier transform (DSTFT) for feature extraction and fed the features into a multilayer perceptron (MLP) architecture to obtain the best result [2]. Kaya used the one-dimensional local binary pattern (1D-LBP) for feature extraction and classified the extracted features with several classifier algorithms [3]. Gupta first decomposed EEG signals into different brain rhythms and then modeled these rhythms statistically; the model parameters and autoregressive moving average parameters were used as features, which were classified by an SVM [4]. These methods all require a good understanding of the characteristics of the EEG signal. Moreover, since EEG data are non-stationary, designing hand-crafted features is not only time-consuming but also less robust. Therefore, constructing a model framework that automatically extracts features and classifies the raw data is a more sensible choice.

With the development of deep learning, many neural-network-based classification models have been shown to automatically extract powerful features [5]. In EEG epilepsy detection, many deep-learning-based methods have been proposed. Wen used a deep convolutional network and an autoencoder-based model for feature extraction and classified the data with different classifiers [6]. Acharya used a 13-layer CNN to classify the raw data [7]. San-Segundo used empirical mode decomposition (EMD), the fast wavelet transform (FWT), and the Fourier transform for feature extraction and classified the data with a CNN of 3 convolutional layers and 3 fully connected layers [8]. These deep-learning-based methods simplify the feature extraction process and can even form an end-to-end network that classifies raw data. However, such models have many parameters, and because epilepsy EEG data sets are generally small, a model with fewer parameters is preferable. In general, though, the classification performance of a shallow network is poor. To improve its performance, we propose a layer-wise pre-training mechanism applied to a 3-layer CNN. The experimental results show that the model trained with the proposed pre-training mechanism achieves outstanding results.
II. METHODOLOGY

A. Data

In our experiments, we used the public epilepsy EEG data set provided by the University of Bonn [9], one of the most commonly used data sets for epilepsy detection. The data set contains five collections: set A, set B, set C, set D, and set E. Each set contains 100 single-channel EEG segments with a duration of 23.6 seconds. Sets A and B were recorded from five healthy subjects using a standardized electrode placement scheme; the subjects were relaxed and awake, with eyes open during the recording of set A and eyes closed during the recording of set B. Sets C, D, and E were recorded from five epileptic patients: set D was recorded from the epileptogenic zone, set C from the hippocampal formation of the opposite hemisphere of the brain, and set E was recorded while the patients were experiencing active seizures. The sampling frequency of the data was 173.61 Hz, and the signals had been band-pass filtered at 0.53–40 Hz. Since the data set is small, we divide each recording with a two-second sliding window without overlap to increase the number of samples.
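As a concrete sketch of this segmentation (assuming each Bonn recording is a 1-D array of 4097 samples, i.e., 23.6 s at 173.61 Hz, and taking a 2 s window of about 347 samples; both values are inferred from the stated duration and sampling rate, not given explicitly in the paper):

```python
import numpy as np

FS = 173.61                      # sampling frequency (Hz) of the Bonn data set
WIN = int(2 * FS)                # two-second window: about 347 samples

def segment(recording: np.ndarray) -> np.ndarray:
    """Split one single-channel recording into non-overlapping 2 s windows."""
    n_windows = len(recording) // WIN
    return recording[: n_windows * WIN].reshape(n_windows, WIN)

# Example: a 23.6 s Bonn recording of 4097 samples yields 11 windows of 347 samples.
x = np.random.randn(4097)
print(segment(x).shape)          # (11, 347)
```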
B. The Network Architecture

Our neural network structure is summarized in Fig. 1. It contains three convolutional layers and three batch normalization layers, with ReLU as the activation function; it is a shallow neural network. The convolutional kernel size is 1x5. It is worth noting that the input of each convolutional layer is padded before the convolution operation so that the input and output have the same dimensions. The details of the network structure are given in Fig. 1.

Fig. 1. Proposed vanilla convolutional neural network.
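As a rough illustration, the following is a minimal PyTorch sketch of this vanilla CNN, assuming 2 s inputs of 347 samples and 8 kernels per layer (the kernel count is taken from the training details in Section III); pooling and any other details of Fig. 1 not stated in the text are omitted, so this is a sketch rather than the authors' exact model:

```python
import torch
import torch.nn as nn

class VanillaCNN(nn.Module):
    """3 conv layers (8 kernels of size 1x5) + BN + ReLU, then one FC layer."""
    def __init__(self, n_samples: int = 347, n_classes: int = 2):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(3):
            layers += [
                nn.Conv1d(in_ch, 8, kernel_size=5, padding=2),  # 'same' padding
                nn.BatchNorm1d(8),
                nn.ReLU(),
            ]
            in_ch = 8
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Linear(8 * n_samples, n_classes)

    def forward(self, x):
        h = self.features(x)                    # (batch, 8, n_samples)
        return self.classifier(h.flatten(1))    # softmax is absorbed into the loss

model = VanillaCNN()
print(model(torch.randn(4, 1, 347)).shape)      # torch.Size([4, 2])
```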
C. Activation Function

In the past, activation functions such as tanh or sigmoid were used when modeling neurons. But these two activation functions have saturation regions in which the derivative is very close to zero, which drives the gradient of the loss function toward zero so that the training process cannot continue. Therefore, we use ReLU as the nonlinear activation function of the model [10]. Experiments have shown that deep neural networks train faster with ReLU than identical networks with the tanh activation function [11]. The ReLU activation function is defined as follows:

y = max(0, x) (1)
D. Batch Normalization

To speed up the convergence of the neural network and reduce its sensitivity to initial values, we apply a batch normalization layer after each convolutional layer to make the distribution of the hidden layer more reasonable [12]. The batch normalization transformation is:

y = (x − E[x]) / √(Var[x] + ε) · γ + β (2)

where x is a sample in the mini-batch, ε is a small constant that guarantees numerical stability, and γ and β are two learnable parameters that control the distribution of y.
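As a quick numerical check of Eq. (2), here is a toy NumPy example with γ = 1 and β = 0 (normalizing over the batch axis is our assumed convention):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Eq. (2): normalize a mini-batch, then scale by gamma and shift by beta."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps) * gamma + beta

xb = np.array([[1.0], [2.0], [3.0]])   # toy mini-batch of three samples
print(batch_norm(xb).ravel())          # approx. [-1.2247, 0.0, 1.2247]
```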
E. Pre-training Mechanism

To improve the performance of the model, we propose a layer-wise pre-training mechanism for convolutional neural networks. With this mechanism, the network can be trained more efficiently. There are two main existing pre-training strategies: (1) build an autoencoder [13] from the original network, train it on the data set by unsupervised, greedy layer-wise training [14], use the trained parameters as initial values for the original network, and then complete supervised learning as usual; (2) train the model on other data sets that are similar to the target data set, so that the resulting parameters serve as good initial weights. However, both methods have certain limitations:

• The entire pre-training process is one-shot: it is not possible to keep pre-training to fine-tune the model during supervised learning.
• Because every training run has a certain randomness, these methods may sometimes yield a bad initial value and increase the difficulty of supervised learning.

To overcome these problems, in this paper we propose the layer-wise pre-training mechanism shown in Fig. 2. First we train sub-network 1 through an auxiliary fully connected layer, then pre-train sub-network 2 through another auxiliary fully connected layer, and finally we train the entire network. An epoch consists of these three training steps, all of which use supervised learning on the entire training set. The advantage of this method is that each epoch performs three rounds of training (two pre-trainings plus one full training), so the pre-training mechanism remains active throughout the training process. The model can therefore be continuously fine-tuned, and the damage caused by randomness can be avoided.

Fig. 2. The proposed training processes.
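A minimal PyTorch sketch of one epoch of this three-step schedule follows. The sub-network split (sub-network 1 = the first conv block, sub-network 2 = the first two blocks), the auxiliary heads, and the use of Adam are illustrative assumptions read off Fig. 2, not the authors' exact implementation; the three learning rates are those reported in Section III:

```python
import torch
import torch.nn as nn

def conv_block(in_ch):
    return nn.Sequential(nn.Conv1d(in_ch, 8, 5, padding=2),
                         nn.BatchNorm1d(8), nn.ReLU())

# Assumed split: sub-network 1 = block 1, sub-network 2 = blocks 1-2.
blocks = nn.ModuleList([conv_block(1), conv_block(8), conv_block(8)])
heads = nn.ModuleList([nn.Linear(8 * 347, 2) for _ in range(3)])  # 2 auxiliary + 1 final head

def run(x, depth):
    for b in blocks[:depth]:
        x = b(x)
    return heads[depth - 1](x.flatten(1))

criterion = nn.CrossEntropyLoss()
# One optimizer per step, since the three sub-networks use different learning rates.
opts = [torch.optim.Adam(list(blocks[:d].parameters()) + list(heads[d - 1].parameters()), lr=lr)
        for d, lr in zip((1, 2, 3), (5e-4, 2.5e-4, 1e-5))]

def train_one_epoch(loader):
    # One epoch = pre-train sub-network 1, pre-train sub-network 2, train the full network,
    # each step sweeping the entire training set with supervised learning.
    for depth, opt in zip((1, 2, 3), opts):
        for xb, yb in loader:
            opt.zero_grad()
            loss = criterion(run(xb, depth), yb)
            loss.backward()
            opt.step()
```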
III. EXPERIMENT

In our experiments, we considered different classification problems to evaluate the performance of our method. We used K-fold cross-validation: the data are randomly divided into K disjoint folds, K−1 of which are selected as the training set, with the remaining one as the test set. We record the classification accuracy on the test set, repeat the process K times, and report the average accuracy over the K runs. We chose K = 10. To help the training process converge, each sample has its own mean subtracted.
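This evaluation protocol can be sketched as follows, where train_and_score is a hypothetical callback that trains the CNN on the training folds and returns the test-fold accuracy:

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(X, y, train_and_score, k=10, seed=0):
    """K-fold CV with per-sample mean subtraction, as described above."""
    X = X - X.mean(axis=1, keepdims=True)   # subtract each sample's own mean
    accs = []
    for tr, te in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        accs.append(train_and_score(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(accs))             # average accuracy over the K runs
```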
A. Model Training

To evaluate the proposed method, we separately trained the vanilla CNN with and without the proposed pre-training mechanism. The model structures are as follows.

Vanilla convolutional neural network without proposed pre-training. The network structure and some training parameters are:

• Network Depth: 3 convolutional layers + 3 BN layers + 1 fully connected layer + 1 softmax layer
• Convolutional Kernel Size per Layer: 1x5
• Number of Convolutional Kernels per Layer: 8
• Learning Rate: 5e-5
• Number of Iterations: 300

Vanilla convolutional neural network with proposed pre-training. The basic network structure is the same as the vanilla CNN, with the proposed pre-training mechanism applied. It should be noted that since the learning rates required by the three sub-networks may differ, we selected three different learning rates: 5e-4, 2.5e-4, and 1e-5, respectively.
IV. RESULTS

Our experimental results are shown in Table I. Among all the classification tasks, some results are comparable to previous work, while others improve considerably upon it. In almost all classification tasks, the classification accuracy of the model with pre-training is higher than that of the model without pre-training. It is also worth noting that, in almost all tasks, the standard deviation of the CNN with pre-training is lower than that of the model without the pre-training mechanism, as shown in Table II. These results show that the pre-training mechanism not only improves classification accuracy but also makes the training process more robust.

Finally, we visualize the confusion matrices of the E-C-A, E-D-A, and E-D-B classification tasks in Fig. 3. These results show that the main confusions appear between sets A/B and sets C/D.

V. CONCLUSIONS

In this paper, we propose a layer-wise pre-training mechanism applied to a 3-layer CNN for automatic epilepsy classification. The proposed pre-training mechanism improves the performance of the shallow network, which has fewer parameters and a reduced risk of overfitting. The experimental results demonstrate the effectiveness and superiority of the proposed method.

VI. ACKNOWLEDGEMENT

This work was supported in part by the National Natural Science Foundation of China under Grants 61836003 and 61573152.
Fig. 3. 10-fold cross-validation confusion matrix.

TABLE I
COMPARISON WITH PREVIOUS METHODS ON THE SAME DATA SET

Tasks      Authors (Year)                           Accuracy (%)
E-A        Subasi (2007) [15]                       95
           Guo et al. (2009) [16]                   95.2
           Nicolaou and Georgiou (2012) [17]        93.5
           Samiee et al. (2015) [2]                 99.8
           Kaya et al. (2014) [3]                   99.5
           Gupta et al. (2018) [4]                  94.85
           San-Segundo et al. (2019) [8]            99.8
           Ullah et al. (2018) [18]                 99.9
           Proposed Vanilla CNN                     100
           Proposed Vanilla CNN with pre-training   100
E-B        Nicolaou and Georgiou (2012) [17]        82.9
           Samiee et al. (2015) [2]                 99.3
           Gupta et al. (2018) [4]                  99.0
           Ullah et al. (2018) [18]                 99
           Proposed Vanilla CNN                     98.6
           Proposed Vanilla CNN with pre-training   99.0
E-C        Nicolaou and Georgiou (2012) [17]        88.0
           San-Segundo et al. (2019) [8]            98.5
           Gupta et al. (2018) [4]                  97.5
           Ullah et al. (2018) [18]                 98.1
           Proposed Vanilla CNN                     97.9
           Proposed Vanilla CNN with pre-training   98.7
E-D        Nicolaou and Georgiou (2012) [17]        79.9
           Samiee et al. (2015) [2]                 94.9
           Kaya et al. (2014) [3]                   95.5
           Gupta et al. (2018) [4]                  96.35
           Ullah et al. (2018) [18]                 97.4
           Proposed Vanilla CNN                     97.5
           Proposed Vanilla CNN with pre-training   98.1
E-A,B,C,D  Samiee et al. (2015) [2]                 98.1
           Gupta et al. (2018) [4]                  97.79
           San-Segundo et al. (2019) [8]            99.5
           Proposed Vanilla CNN                     97.3
           Proposed Vanilla CNN with pre-training   98.1
E-C-A      San-Segundo et al. (2019) [8]            96.5
           Proposed Vanilla CNN                     94.4
           Proposed Vanilla CNN with pre-training   96.3
E-D-A      Kaya et al. (2014) [3]                   95.67
           San-Segundo et al. (2019) [8]            95.7
           Proposed Vanilla CNN                     92.9
           Proposed Vanilla CNN with pre-training   95.4
E-D-B      Acharya et al. (2018) [7]                88.67
           Proposed Vanilla CNN                     94.9
           Proposed Vanilla CNN with pre-training   96.3
E-C,D      Kaya et al. (2014) [3]                   97.0
           Gupta et al. (2018) [4]                  96.92
           Ullah et al. (2018) [18]                 98.8
           Proposed Vanilla CNN                     97.5
           Proposed Vanilla CNN with pre-training   98.0

TABLE II
10-FOLD CROSS-VALIDATION STD FOR THE CNN MODEL

Tasks      with Pre-training std (%)   without Pre-training std (%)
E-A        0                           0
E-B        0.006                       0.008
E-C        0.008                       0.011
E-D        0.010                       0.008
E-A,B,C,D  0.009                       0.015
E-C-A      0.010                       0.012
E-D-A      0.011                       0.012
E-D-B      0.006                       0.008
E-C,D      0.008                       0.016

REFERENCES

[1] L. D. Iasemidis, "Epileptic seizure prediction and control," IEEE Transactions on Biomedical Engineering, vol. 50, no. 5, pp. 549–558, 2003.
[2] K. Samiee, P. Kovacs, and M. Gabbouj, "Epileptic seizure classification of EEG time-series using rational discrete short-time Fourier transform," IEEE Transactions on Biomedical Engineering, vol. 62, no. 2, pp. 541–552, 2014.
[3] Y. Kaya, M. Uyar, R. Tekin, and S. Yıldırım, "1D-local binary pattern based feature extraction for classification of epileptic EEG signals," Applied Mathematics and Computation, vol. 243, pp. 209–219, 2014.
[4] A. Gupta, P. Singh, and M. Karlekar, "A novel signal modeling approach for classification of seizure and seizure-free EEG signals," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 26, no. 5, pp. 925–935, 2018.
[5] Y. LeCun, Y. Bengio et al., "Convolutional networks for images, speech, and time series," The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, p. 1995, 1995.
[6] T. Wen and Z. Zhang, "Deep convolution neural network and autoencoders-based unsupervised feature learning of EEG signals," IEEE Access, vol. 6, pp. 25399–25410, 2018.
[7] U. R. Acharya, S. L. Oh, Y. Hagiwara, J. H. Tan, and H. Adeli, "Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals," Computers in Biology and Medicine, vol. 100, pp. 270–278, 2018.
[8] R. San-Segundo, M. Gil-Martín, L. F. D'Haro-Enríquez, and J. M. Pardo, "Classification of epileptic EEG recordings using signal transforms and convolutional neural networks," Computers in Biology and Medicine, vol. 109, pp. 148–158, 2019.
[9] R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, and C. E. Elger, "Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state," Physical Review E, vol. 64, no. 6, p. 061907, 2001.
[10] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[12] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[13] G. E. Hinton and J. L. McClelland, "Learning representations by recirculation," in Neural Information Processing Systems, 1988, pp. 358–366.
[14] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems, 2007, pp. 153–160.
[15] A. Subasi, "EEG signal classification using wavelet feature extraction and a mixture of expert model," Expert Systems with Applications, vol. 32, no. 4, pp. 1084–1093, 2007.
[16] L. Guo, D. Rivero, J. A. Seoane, and A. Pazos, "Classification of EEG signals using relative wavelet energy and artificial neural networks," in Proceedings of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation. ACM, 2009, pp. 177–184.
[17] N. Nicolaou and J. Georgiou, "Detection of epileptic electroencephalogram based on permutation entropy and support vector machines," Expert Systems with Applications, vol. 39, no. 1, pp. 202–209, 2012.
[18] I. Ullah, M. Hussain, H. Aboalsamh et al., "An automated system for epilepsy detection using EEG brain signals based on deep learning approach," Expert Systems with Applications, vol. 107, pp. 61–71, 2018.