11 - Efficient - Epileptic - Seizure - Prediction - Based - On - Deep - Learning

This paper presents a novel seizure prediction technique using deep learning applied to EEG recordings, aiming to detect the preictal brain state effectively and in real-time. The proposed method combines feature extraction and classification into a single automated system, achieving a high accuracy of 99.6% and a low false alarm rate of 0.004 h−1. Four deep learning models are introduced, utilizing convolutional and recurrent neural networks to enhance prediction capabilities and reduce computational complexity.


804 IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS, VOL. 13, NO. 5, OCTOBER 2019

Efficient Epileptic Seizure Prediction Based on Deep Learning
Hisham Daoud and Magdy A. Bayoumi, Life Fellow, IEEE

Abstract—Epilepsy is one of the world’s most common neurological diseases. Early prediction of the incoming seizures has a great influence on epileptic patients’ life. In this paper, a novel patient-specific seizure prediction technique based on deep learning and applied to long-term scalp electroencephalogram (EEG) recordings is proposed. The goal is to accurately detect the preictal brain state and differentiate it from the prevailing interictal state as early as possible and make it suitable for real time. The feature extraction and classification processes are combined into a single automated system. Raw EEG signal without any preprocessing is considered as the input to the system, which further reduces the computations. Four deep learning models are proposed to extract the most discriminative features which enhance the classification accuracy and prediction time. The proposed approach takes advantage of the convolutional neural network in extracting the significant spatial features from different scalp positions and the recurrent neural network in expecting the incidence of seizures earlier than the current methods. A semi-supervised approach based on transfer learning technique is introduced to improve the optimization problem. A channel selection algorithm is proposed to select the most relevant EEG channels, which makes the proposed system a good candidate for real-time usage. An effective test method is utilized to ensure robustness. The achieved highest accuracy of 99.6% and lowest false alarm rate of 0.004 h⁻¹ along with very early seizure prediction time of 1 h make the proposed method the most efficient among the state of the art.

Index Terms—Classification, deep learning, epilepsy, EEG, interictal, preictal, seizure prediction.

Manuscript received March 1, 2019; revised May 4, 2019 and June 26, 2019; accepted June 28, 2019. Date of publication July 17, 2019; date of current version November 4, 2019. This paper was recommended by Associate Editor L. Najafizadeh. (Corresponding author: Hisham Daoud.)
H. Daoud is with the Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70503 USA (e-mail: [email protected]).
M. A. Bayoumi is with the Department of Electrical and Computer Engineering, University of Louisiana at Lafayette, Lafayette, LA 70503 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TBCAS.2019.2929053

I. INTRODUCTION

EPILEPSY is defined according to the International League Against Epilepsy (ILAE) report [1], as a neurological brain disorder identified by the frequent occurrence of symptoms called epileptic seizure due to abnormal brain activities. Seizure’s characteristics include loss of awareness or consciousness and disturbances of movement, sensation or other cognitive functions. The overall incidence of epilepsy is 23–100 per 100,000. People at extremes of age are the most affected age group while the disease crests among young individuals in ages between 10 to 20 years old [2].

Epilepsy has a high disease burden where 50 million people worldwide have epilepsy and there are about two million new patients recorded every year. Up to 70% of the epileptic patients could be controlled by the Anti-Epileptic Drugs (AED) while the other 30% are uncontrollable [2].

Electroencephalogram (EEG) is the electrical recording of the brain activities and is considered the most powerful diagnostic and analytical tool of epilepsy. Physicians classify the brain activity of the epileptic patients according to the EEG recordings into four states: preictal state, which is defined by the time period just before the seizure, ictal state which is during the seizure occurrence, postictal state that is assigned to the period after the seizure took place and finally the interictal state which refers to the period between seizures other than the previously mentioned states [3]. These four states are illustrated in Fig. 1.

Fig. 1. Brain states in a typical epileptic EEG recording.

Due to unexpected seizure times, epilepsy has a strong psychological and social effect and can also be considered a life-threatening disease. Consequently, the prediction of epileptic seizures would greatly contribute to improving the quality of life of epileptic patients in many aspects, like raising an alarm before the occurrence of the seizure to provide enough time for taking proper action, developing new treatment methods and setting new strategies to better understand the nature of the disease. According to the above categorization of the epileptic patient’s brain activities, the seizure prediction problem could be viewed as a classification task between the preictal and interictal brain states. An alarm is raised in case of detecting the preictal state among the predominant interictal states indicating a potential seizure is coming as shown in Fig. 1. The prediction time is the time before the seizure onset when the preictal state is detected.

In the literature, there are various methods proposed to address the seizure prediction problem trying to reach high classification accuracy with early prediction. Since EEG signals are different across patients due to the variations in seizure type and location [4], most seizure prediction methods are therefore patient-specific. In these methods, supervised learning techniques are used through two main stages which are feature extraction
1932-4545 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Authorized licensed use limited to: University of Canberra. Downloaded on August 10,2024 at 07:21:16 UTC from IEEE Xplore. Restrictions apply.

and classification between preictal states and interictal states. In [5], the authors categorize the feature extraction schemes in terms of localization into univariate and bivariate and in terms of linearity into linear and nonlinear. Multiple features are sometimes combined to capture the brain dynamics, which ends up in dimensionality increase. The extracted features are used to train the classifier that could then be used for the analysis of new EEG recordings to predict the occurrence of the seizure by detecting the preictal state.

In the previous studies, the extracted features are categorized into three main groups: time domain, frequency domain and nonlinear features. The authors in [6] used some statistical measures like variance, skewness and kurtosis as time domain features. In [7], the authors calculated the spectral power of the EEG signals for frequency domain analysis. Some nonlinear features that are derived from the dynamic systems’ theory were investigated such as Lyapunov exponent [8] and dynamic similarity index [9]. Based on the selected features, a prediction scheme that detects the preictal brain state is implemented. Most of the previous work proposed machine learning based prediction schemes like Support Vector Machine (SVM). SVM classifier is used in numerous studies like [7], [10], [11] to predict the epileptic seizures. SVMs achieved outstanding results over other types of classifiers in terms of specificity and sensitivity [5].

Deep learning algorithms achieved great success in multiple classification problems for various applications like computer vision and speech recognition. Some previous work utilized deep learning in the classification stage for the seizure prediction problem. In [12], the authors applied multi-layer perceptron to the extracted features. In [13] and [14], the authors used a convolutional neural network as a classifier that is applied on the extracted features from EEG data to predict seizures.

The main challenge of the previously proposed methods is to determine the most discriminative features that best represent each class. The computation time needed to extract these features depends on the process complexity and is considered another challenge especially in real-time application. Motivated by these challenges and due to the significance of the early and accurate seizure prediction, we developed deep learning based seizure prediction algorithms that combine the feature extraction and classification stages into a single automated framework.

In this paper, we aim at automatic extraction of the most important features by developing deep learning based algorithms without any preprocessing. Multi-Layer Perceptron is applied to the raw EEG recordings as a simple architecture of multiple trainable hidden layers, then Deep Convolutional Neural Network (DCNN) is used to learn the discriminative spatial features between interictal and preictal states. In another proposed model, Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network is concatenated to the DCNN to do the classification task. An Autoencoder (AE) based semi-supervised model is proposed and pre-trained using transfer learning technique to enhance the model optimization and converge faster. For the system to be suitable for real-time usage, computation complexity should be considered, therefore we introduce a channel selection algorithm to select the best representing channels from the multi-channel EEG recording. The used testing method proves the robustness of the proposed algorithms over different seizures.

II. METHODOLOGY

In this paper, we propose four deep learning based models for the purpose of early and accurate seizure prediction taking into account the real-time operation. The seizure prediction problem is formulated as a classification task between interictal and preictal brain states, in which a true alarm is considered when the preictal state is detected within the predetermined preictal period as shown in Fig. 1. In spite of the abundant research work done in seizure prediction, there is no standard duration for the preictal state. In our experiments, the preictal duration was chosen to be one hour before the seizure onset and interictal duration was chosen to be at least four hours before or after any seizure as in [15]. Raw EEG data without any preprocessing and without handcrafted feature extraction is used as the input to all the models. The discriminative features are learned automatically using the deep learning algorithms in order to reduce the overhead and speed up the classification task. Due to the limited number of seizures for each patient, there is an imbalance between preictal and interictal samples. Obviously, the number of interictal samples is much larger than the number of preictal samples, and the classifiers tend to be more accurate toward the class with the larger number of training samples [16]. In our experiments, we selected the number of interictal samples to be equal to the number of preictal samples to make the data balanced. The EEG signals are divided into non-overlapping five-second segments, and each segment is considered as a training batch.

Fig. 2. Block Diagram of MLP based Seizure predictor.

Fig. 3. Block Diagram of DCNN + MLP based Seizure predictor.

In our first model, Multi-layer Perceptron (MLP), a simple deep neural network, is trained on the selected patients to learn the network parameters that are able to do the classification task. The block diagram of the model is shown in Fig. 2. To enhance the classification accuracy, we propose the second model that relies on Deep Convolutional Neural Network (DCNN) which extracts the spatial features from different electrodes’ locations and uses MLP for the classification task as illustrated in Fig. 3. In order to use DCNN, EEG data is represented by a matrix


TABLE I
INFORMATION OF EEG DATA OF THE SELECTED PATIENTS

Fig. 4. Block Diagram of DCNN + Bi-LSTM based Seizure predictor.

with one dimension being the number of channels and the other being the time steps. In our third model, proposed in [17], DCNN is utilized and concatenated with a Bidirectional Long Short-Term Memory (Bi-LSTM) Network as the model back-end to do the classification as shown in Fig. 4. LSTM networks are known for their excellence in learning temporal features while maintaining long-time sequences dependencies which helps in early prediction. Prediction problems are handled better using Bi-LSTM as it uses information from both previous and next time instances. For the sake of training time reduction, we developed the fourth model that implements Deep convolutional Autoencoder (DCAE) architecture. In DCAE, we pre-trained the model front-end, DCNN, in an unsupervised manner. Then, the training process is launched with some initial values that will help the network to converge faster and enhance the network optimization, which in turn reduces the training time and increases the accuracy. Transfer learning approach is used to train the DCAE to improve the generalization across different seizures for the same patient. After training the AE, the trained encoder is connected to Bi-LSTM network for classification. Fig. 5 illustrates the two parts of the DCAE model. We propose a channel selection algorithm to reduce the number of EEG channels, which in turn reduces the computation complexity and allocated memory, making the system suitable for real-time application.

Fig. 5. Block Diagram of the semi-supervised DCAE + Bi-LSTM model, (a) pre-training phase of DCAE to generate the reconstructed EEG signals from the latent space representation through unsupervised learning and (b) pre-trained classifier that predicts seizures through supervised learning.

A. Dataset

In this paper, we trained the proposed models and evaluated their performance on the CHB-MIT EEG dataset recorded at Children’s Hospital Boston [18], [19] which is publicly available [20]. The dataset is composed of long-term scalp EEG data for 22 pediatric subjects with intractable seizures and one recording with missing data. The recordings were taken during several days after anti-seizure medication withdrawal to characterize their seizures and evaluate their candidacy for surgical intervention. Most cases have EEG recordings from surface electrodes of 23 channels in accordance with the International 10–20 system. The sampling rate of the acquired EEG signals is 256 samples per second with 16-bit resolution. There are some variations in many factors between all subjects such as interictal period, preictal period, number of channels, and recording continuity. Therefore, we chose eight subjects in this study such that the pre-determined interictal and preictal periods are satisfied, the recordings are not interrupted and the full channels’ recordings are available. Table I summarizes the details about the EEG recordings used in our experiments.

B. Multi-Layer Perceptron

Multilayer Perceptron (MLP) is considered one of the most widely used artificial neural networks (ANNs). MLP consists usually of three successive layers, called: input layer, hidden layers, and output layer [21]. Deep ANNs are composed of multiple hidden layers that enable the network to learn the features better using the non-linear activation functions. The ANN idea is motivated by the structure of the human brain’s neural system. A typical ANN is a buildup of connected units called neurons. These artificial neurons incorporate the received data and transmit it to the other associated neurons, much like the biological neurons in the brain. The output of a neuron in any ANN is computed by applying a linear or non-linear activation function to the weighted sum of the neurons’ output in the preceding layer. When the ANN is used as a classifier, the final output at the output layer indicates the appropriate predicted class of the corresponding input data.

In our first proposed seizure prediction model, Fig. 2, we apply the raw EEG after segmentation to MLP with four hidden layers as depicted in Fig. 6. The number of units in each layer is 300, 100, 50, 20 starting from the first hidden layer to the fourth one. The total number of trainable parameters is 8,870,291 which is considered high due to the fully connected architecture. The model is trained with backpropagation and optimized using RMSprop algorithm. The loss function used is the binary cross


entropy defined by (1).

$$l(y, \hat{y}) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right] \tag{1}$$

where $y$ and $\hat{y}$ are the desired output and the calculated output, respectively, and $l(y, \hat{y})$ is the loss function.

Rectifier Linear Unit (ReLU) activation function [22], as defined by (2), is used across the hidden layers to add nonlinearity and to ensure robustness against noise in the input data.

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \tag{2}$$

where $x$ is the sum of the weighted input signals and $f(x)$ is the ReLU activation function.

Sigmoid activation function (3) is selected for the output layer to predict the input data class.

$$p_i = \frac{1}{1 + e^{-x_i}} \tag{3}$$

where $x_i$ is the sum of the weighted input signals and $p_i$ is the probability of the input example being preictal.

Fig. 6. The architecture of the proposed MLP based classifier.

C. Deep Convolutional Neural Network

Convolutional Neural networks (CNNs) have shown great success in different pattern recognition and computer vision applications [23]. This is due to the ability of CNN to automatically extract significant spatial features that best represent the data from its raw form without any preprocessing and without any human decision in selecting these features [24]. The sparse connectivity and parameter sharing of CNN give it high superiority regarding the memory footprint as it requires much less memory to store the sparse weights. The equivariant representation property of the CNN increases the detection accuracy of a pattern when it exists in a different location across the image [25]. A typical CNN is formed of three types of layers: convolution layer, pooling layer and fully connected layer. The convolution layer is used to generate the feature map by applying filters with trainable weights to the input data. This feature map is then down-sampled by applying the pooling layer to reduce the features’ dimension and therefore the computational complexity. Finally, the fully connected layer is applied to all the preceding layer’s output to generate the one-dimensional feature vector. CNN is used as a feature extractor to replace the complex feature engineering used in previous work.

The proposed DCNN architecture model is shown in Fig. 7, in which the EEG segment is converted into a 2D matrix to be suitable for the DCNN. The architecture consists of four convolutional layers and three maximum pooling layers arranged alternately. We chose the number of kernels in each convolution layer to be 32 with kernel size of 3 × 2 to cover the non-square matrix of EEG data. The maximum pooling layers have pool size of 2 × 2. ReLU activation function is used across all the convolutional layers. Batch Normalization technique [26] is used to improve the training speed and reduce overfitting through adding some noise to each layer’s activation.

The Batch Normalization Transform is defined as:

$$BN_{\gamma,\beta}(x_i) = \gamma \, \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta \tag{4}$$

where $x_i$ is the vector to be normalized in a mini-batch $B = \{x_1, x_2, \ldots, x_m\}$. $\mu_B$ and $\sigma_B^2$ are the mean and variance of the current mini-batch of $x_i$, respectively. $\epsilon$ is a constant added to the mini-batch variance for numerical stability. $\gamma$ and $\beta$ are learned parameters used to scale and shift the normalized value, respectively [26].

The proposed DCNN architecture is used as the front-end feature extractor in our three proposed models in Figs. 3, 4, 5(b) which helps in spatial feature extraction from the different electrode positions on the scalp. The number of trainable parameters is drastically decreased when employing DCNN due to the weight sharing property. The number of trainable parameters in the second model, DCNN + MLP, is almost 520K, while in the third and fourth models, DCNN + Bi-LSTM and DCAE + Bi-LSTM, the number of trainable parameters is almost 28K.

D. Bidirectional-LSTM Recurrent Neural Network

Recurrent neural network (RNN) is a type of neural network that can maintain state along the sequential inputs. It can process a temporal sequence of data depending on the processing done on the previous sequences. This property of RNN makes it suitable for applications like prediction of time series data. The typical architecture of RNN is trained using backpropagation through time (BPTT) which has some drawbacks like exploding and vanishing gradients and information morphing.

Long Short Term Memory Networks (LSTMs) [27] are a type of RNN, implemented to overcome the problems of basic RNN. LSTMs are able to solve the problem of vanishing gradient by maintaining the gradient values during the training process and backpropagating them through layer and time, thus LSTM has the capability of learning long-term dependencies. LSTM cell, as shown in Fig. 8, consists of three controlling gates that could store or forget the previous state and use or discard the current state. Any LSTM cell computes two states at each time step: a cell state (c) that could be maintained for long time steps and a hidden state (h) that is the new output of the cell at each time


Fig. 7. The architecture of the proposed DCNN front-end in DCNN based models.
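As a rough illustration of why weight sharing keeps the DCNN front-end small, the sketch below counts the trainable weights of four convolutional layers with 32 kernels of size 3 × 2 each, as stated in Section II-C, and compares them with one fully connected layer of 300 units fed with a raw segment. The single input feature map, the 23 × 1280 segment shape (23 channels, 5 s at 256 samples per second) and the helper names are assumptions made for this example. Batch-normalization and classifier parameters are left out, so the totals are not meant to reproduce the parameter counts reported in the paper.

```python
def conv2d_params(kernel_h, kernel_w, in_maps, out_maps):
    """Trainable weights of one 2-D convolution: shared kernels plus one bias per output map."""
    return kernel_h * kernel_w * in_maps * out_maps + out_maps


def dense_params(n_in, n_out):
    """Trainable weights of one fully connected layer: full weight matrix plus biases."""
    return n_in * n_out + n_out


def conv_stack_params(n_layers=4, kernels=32, kernel_size=(3, 2), in_maps=1):
    """Sum the convolution parameters of a stack of identical layers (pooling adds none)."""
    total = 0
    for _ in range(n_layers):
        total += conv2d_params(kernel_size[0], kernel_size[1], in_maps, kernels)
        in_maps = kernels  # the next layer sees one feature map per kernel
    return total


# Assumed input: 23 channels x (5 s * 256 samples/s) = 23 x 1280 samples per segment.
conv_total = conv_stack_params()            # 18752, independent of the segment length
dense_first = dense_params(23 * 1280, 300)  # 8832300 for a single dense layer
print(conv_total, dense_first)
```

The convolutional count does not grow with the segment length because the same kernels slide over the whole input, whereas the dense layer needs one weight per input sample and unit.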

Fig. 9. The unrolled Bidirectional LSTM Network.
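The cell update in (5) to (10) and the bidirectional arrangement of Fig. 9 can be illustrated with a minimal sketch that uses scalar inputs and a scalar state, so that every weight is a single number. The parameter dictionary, the function names and the choice to combine the two directions by returning both final hidden states are assumptions made for this example; the proposed network uses 20 units and learned weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of an LSTM cell with scalar input and scalar state."""
    f_t = sigmoid(p["W_fh"] * h_prev + p["W_fx"] * x_t + p["b_f"])        # forget gate, (5)
    i_t = sigmoid(p["W_ih"] * h_prev + p["W_ix"] * x_t + p["b_i"])        # input gate, (6)
    o_t = sigmoid(p["W_oh"] * h_prev + p["W_ox"] * x_t + p["b_o"])        # output gate, (7)
    c_tilde = math.tanh(p["W_ch"] * h_prev + p["W_cx"] * x_t + p["b_c"])  # candidate, (8)
    c_t = f_t * c_prev + i_t * c_tilde                                    # cell state, (9)
    h_t = o_t * math.tanh(c_t)                                            # hidden state, (10)
    return h_t, c_t


def run_lstm(xs, p):
    """Run the cell over a sequence from zero initial states and return the last hidden state."""
    h, c = 0.0, 0.0
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
    return h


def bilstm_last(xs, p_forward, p_backward):
    """Forward pass over xs and backward pass over reversed xs, each with its own weights."""
    return [run_lstm(xs, p_forward), run_lstm(list(reversed(xs)), p_backward)]


PARAM_NAMES = ["W_fh", "W_fx", "b_f", "W_ih", "W_ix", "b_i",
               "W_oh", "W_ox", "b_o", "W_ch", "W_cx", "b_c"]
zero_params = {name: 0.0 for name in PARAM_NAMES}
demo_params = dict(zero_params, W_cx=1.0)
print(bilstm_last([0.1, 0.5, 0.1], demo_params, demo_params))
```

In a classifier, the two returned values would be fed to a sigmoid output unit; that final layer is omitted here to keep the sketch focused on the recurrence.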

step. The mathematical expressions governing the cell gates’ operation are defined as follows:

$$f_t = \sigma(W_{fh} h_{t-1} + W_{fx} x_t + b_f) \tag{5}$$

$$i_t = \sigma(W_{ih} h_{t-1} + W_{ix} x_t + b_i) \tag{6}$$

$$o_t = \sigma(W_{oh} h_{t-1} + W_{ox} x_t + b_o) \tag{7}$$

$$\tilde{c}_t = \tanh(W_{ch} h_{t-1} + W_{cx} x_t + b_c) \tag{8}$$

$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \tag{9}$$

$$h_t = o_t \circ \tanh(c_t) \tag{10}$$

where $x_t$ is the input at time $t$, $c_t$ and $h_t$ are the cell state and the hidden state at time $t$, respectively. $W$ and $b$ denote weight and bias parameters, respectively. $\sigma$ is the sigmoid function and $\circ$ is the Hadamard product operator. $\tilde{c}_t$ is a candidate for updating $c_t$ through the input gate.

The input gate $i_t$ decides whether to update the cell with a new cell state $\tilde{c}_t$, while the forget gate $f_t$ decides what to keep or forget from the previous cell state and finally the output gate $o_t$ decides how much information to be passed to the next cell.

Fig. 8. Basic LSTM cell.

Instead of using LSTM as the classifier, we used a Bidirectional-LSTM (Bi-LSTM) network [28] in which each LSTM block is replaced by two blocks that process temporal sequence simultaneously in two opposite directions as depicted in Fig. 9. In the forward pass block, the feature vector generated from the DCNN is processed starting from its first time instance to the end, while the backward pass block processes the same segment in the reverse order. The network output at each time step is the combined outputs of the two blocks at this time step. In addition to the previous context processing in standard LSTM, Bi-LSTM processes the future context which enhances the prediction results. Using Bi-LSTM as a classifier enhances the prediction accuracy through extracting the important temporal features in addition to the spatial features extracted by the DCNN.

Bi-LSTM is used in two proposed models in Figs. 4, 5(b), as the back-end classifier that works on the feature vector generated by DCNN. The proposed network consists of a single bidirectional layer that predicts the class label at the last time instance after processing all the EEG segments as shown in Fig. 9. We chose the number of units, dimensionality of the output space, to be 20. Dropout regularization technique is utilized to avoid overfitting. The dropout is applied to the input and the recurrent state with factors of 10% and 50%, respectively. The sigmoid activation function is used for prediction of the EEG segment’s class and RMSprop is selected for optimization.

E. Deep Convolutional Autoencoder

Autoencoders (AEs) are unsupervised neural networks whose target is to find a lower dimensional representation of the input data. This technique has many applications like data compression [29], dimensionality reduction [30], visualizing high dimensional data [31] and removing noise from the input data. The AE network has two main parts namely, encoder and decoder. The encoder compresses the high dimensional input data into lower dimensional representation called latent space representation or bottleneck and the decoder retrieves the data back to its original dimension. The simple AE uses fully


Fig. 10. The architecture of the proposed DCAE. C stands for convolution, P for pooling, D for deconvolution and U for upsampling layer.
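The autoencoder in Fig. 10 is trained to make the decoder output match the encoder input, and the mean square error in (11) measures that mismatch. A minimal sketch of this cost is given here, assuming each EEG segment is flattened into a list of samples and that the squared difference in (11) is summed over all samples of a segment; the function name is hypothetical.

```python
def reconstruction_cost(inputs, reconstructions):
    """Mean square reconstruction error J = 1/(2m) * sum_i ||x_hat_i - x_i||^2.

    inputs and reconstructions are lists of m equally shaped, flattened segments.
    """
    m = len(inputs)
    if m == 0 or m != len(reconstructions):
        raise ValueError("need the same non-zero number of inputs and reconstructions")
    total = 0.0
    for x, x_hat in zip(inputs, reconstructions):
        total += sum((r - v) ** 2 for v, r in zip(x, x_hat))
    return total / (2 * m)


segments = [[0.0, 0.0], [1.0, 1.0]]
rebuilt = [[1.0, 0.0], [1.0, 3.0]]
print(reconstruction_cost(segments, rebuilt))  # (1 + 4) / (2 * 2) = 1.25
```

A perfect reconstruction gives a cost of zero, which is the value the unsupervised pre-training phase drives towards.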

connected layers for the encoder and decoder. The aim is to learn the parameters that minimize the cost function which expresses the difference between the original data and the retrieved one. Deep Convolutional Autoencoder (DCAE) replaces the fully connected layers in the simple AE with convolution layers.

Due to the limited EEG dataset for each patient, we decided to extend our work to develop an unsupervised training algorithm using DCAE as shown in Fig. 5(a). The proposed architecture of the DCAE model is depicted in Fig. 10. We used the same proposed DCNN model as an encoder and added the decoder network to build the DCAE. Unsupervised learning is deployed using transfer learning technique by training the DCAE on all the selected patients’ data (not patient-specific). Transfer learning helps to obtain better generalization and enhance the optimization of our prediction model, therefore reducing the training time.

In the DCAE, Fig. 10, the encoder part consists of alternating convolution and pooling layers, while in the decoder part, the deconvolution and upsampling layers are used to reconstruct the original EEG segment. The encoder output is the latent space representation which is low dimensional features that best represent the EEG input segment. On the other hand, the decoder output is the reconstructed version of the original input. The learned encoder parameters are saved to be used later for training the prediction model in Fig. 5(b), allowing the training process to have a good start point instead of random initialization of the parameters, which reduces the training time drastically.

Training of the DCAE is done using unlabeled EEG segments (balanced data of preictal and interictal segments) of all the selected patients. ReLU activation function is used across all the convolutional layers. Batch Normalization technique is used to improve the training speed and to reduce overfitting. The DCAE is optimized using RMSprop optimizer. The mean square error is utilized as the cost function and is defined as

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{x}^{(i)} - x^{(i)} \right)^2 \tag{11}$$

where $x^{(i)}$ is the input EEG signal and $\hat{x}^{(i)}$ is the reconstructed EEG signal. $m$ is the number of training examples and $\theta$ is the parameters being learned.

After DCAE training, the pre-trained encoder is used as a front-end of the fourth proposed model, DCAE + Bi-LSTM, as shown in Fig. 5(b) while the back-end is Bi-LSTM network. We used the same network architecture of the DCNN and Bi-LSTM that is used in the third model (DCNN + Bi-LSTM). Training of this model is done in a supervised manner to predict the patient-specific seizure onset. Since we used both unsupervised and supervised learning algorithms, this model is considered a semi-supervised learning model.

F. EEG Channel Selection

We introduce an EEG channel selection algorithm to select the most important and informative EEG channels related to our problem. Decreasing the number of channels helps with reducing the features’ dimension, the computation load and the required memory for the model to be suitable for real-time application. The proposed channel selection algorithm is explained in Table II. We provide the algorithm with the EEG preictal segments for each patient and the measured prediction accuracy by running our fourth model, DCAE + Bi-LSTM, using all channels. On the other hand, the algorithm will output the reduced channels that give the same accuracy by omitting redundant or irrelevant channels. We start by computing the statistical variance defined by (12) and the entropy defined by (13) for all the available channels (23 channels) of the preictal segments. Then, we select the channels with highest variance-entropy product that provide the same given prediction accuracy. This is done through an iterative process by training our model on the reduced channels over each iteration. The variance is estimated as

$$\sigma^2(X_c) = \frac{1}{N} \sum_{i=1}^{N} \left( x_c(i) - \mu_c \right)^2 \tag{12}$$

where $X_c$, $\mu_c$ and $N$ are the EEG data after normalization, mean and number of samples of channel $c$, respectively. The entropy of channel $c$ is calculated as

$$H(X_c) = -\sum_{i=1}^{N} p(x_c(i)) \log_2 p(x_c(i)) \tag{13}$$
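The variance in (12), the entropy in (13) and the iterative selection described in Section II-F can be sketched as follows. Several details are assumptions made for illustration rather than the authors' procedure: the probability mass function is estimated with a fixed-width histogram and the entropy is summed over its bins, channels are added in order of decreasing variance-entropy product, and `evaluate` stands for a user-supplied routine that trains the model on a channel subset and returns its accuracy.

```python
import math
from collections import Counter


def variance(samples):
    """Population variance of one channel, as in (12)."""
    n = len(samples)
    mu = sum(samples) / n
    return sum((x - mu) ** 2 for x in samples) / n


def entropy(samples, n_bins=16):
    """Shannon entropy in bits of a histogram estimate of the channel's PMF (assumed estimator)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # a constant channel carries no information
    width = (hi - lo) / n_bins
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rank_channels(channels):
    """channels maps a channel name to its samples; highest variance-entropy product first."""
    score = {name: variance(x) * entropy(x) for name, x in channels.items()}
    return sorted(score, key=score.get, reverse=True)


def select_channels(channels, evaluate, target_accuracy):
    """Grow the subset in ranked order until evaluate(subset) reaches the target accuracy."""
    ranked = rank_channels(channels)
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        if evaluate(subset) >= target_accuracy:
            return subset
    return ranked  # fall back to all channels


toy = {"flat": [0.0, 0.0, 0.0, 0.0], "active": [0.0, 1.0, 0.0, 1.0]}
print(select_channels(toy, lambda subset: 1.0 if "active" in subset else 0.0, 1.0))
```

In practice `evaluate` is the expensive step, since it retrains the predictor on every candidate subset, so the ranking serves to reach an acceptable subset in few iterations.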


TABLE II 80% of the training data is assigned to the training set while 20%
THE PROPOSED EEG CHANNEL SELECTION ALGORITHM
is assigned to the validation set over which the hyperparameters
are updated and the model is optimized.
We evaluated the performance of our models by calculating
some measures such as sensitivity, specificity, and accuracy on
the test data. These measures are averaged across all patients.
The prediction time of each model is recorded at the time of first
preictal segment detection. The evaluation measures are defined
as follows:
TP
Sensitivity = (14)
TP + FN
TN
Specificity = (15)
TN + FP
TP + TN
Accuracy = (16)
TP + TN + FP + FN
where TN, TP, FN and FP are the true negative, true positive,
false negative and false positive respectively.

III. RESULTS
A. Performance Evaluation and Analysis
We evaluated our proposed patient-specific models on the
selected patients by calculating some performance measures
such as prediction accuracy, prediction time, sensitivity, speci-
ficity and false alarm per hour. The training time is also com-
puted to evaluate our proposed channel selection algorithm.
Table III shows the obtained values of these measures for the
proposed four models which are MLP, DCNN + MLP, DCNN
+ Bi-LSTM, DCAE + Bi-LSTM. The fifth model, DCAE +
Bi-LSTM + CS, is the same as the fourth one but with using the
where p(xc (i)) is the probability mass function of the channel
channel selection algorithm.
c having N samples.
As could be noticed from Table III, MLP has the worst
In the channel selection algorithm, we chose the channels
accuracy, sensitivity, specificity and false alarm rate among the
with the highest variance entropy product because we want to
proposed models and this is because the learning process in this
maximize both. We want to select the channel that has a high
model aims at updating the network parameters for the output to
variance during the preictal interval and also provide the largest
be close to the ground truth without extracting any features from
amount of information.
the input data. The huge number of parameters in this model
(around 9 million) is another drawback. The training time is
G. Training and Testing Method
moderate (7.3 min) due to network simplicity.
In order to overcome the problem of the imbalanced dataset, By introducing the DCNN as a front-end, we found around
we selected the number of interictal segments to be equal to 10% enhancement in the accuracy, sensitivity and specificity and
the available number of preictal segments during the training the false alarm rate is improved by 60%. This improvement is due
process. The interictal segments were selected at random from to the ability of DCNN to extract the spatial features across dif-
the overall interictal samples. To ensure robustness and general- ferent scalp positions to use it in discrimination between preictal
ity of the proposed models, we used the Leave-one-out cross and interictal brain states. On the other hand, the training time
validation (LOOCV) technique as the evaluation method for is increased by 5 min and this is due to the added computation
all of our proposed models. In LOOCV, the training is done N complexity by the DCNN. The network parameters are dras-
separate times, where N is the number of seizures for a specific tically decreased because of the parameter sharing and sparse
patient. Each time, all seizures are involved in the training connectivity properties of the DCNN. In our third model, we
process except one seizure on which the testing is applied. The used Bi-LSTM as the back-end along with DCNN and this model
process is then repeated by changing the seizure under test. increase the accuracy to be 99.6%, the sensitivity to be 99.72%
By using this method, we ensure that the testing covers all the and the specificity to be 99.6%. The false alarm rate is enhanced
seizures and the tested seizures are unseen during the training. a lot to reach 0.004 false alarm per hour. This improvement is
The performance for one patient is the average across N trials due to using Bi-LSTM as a classifier instead of MLP. Bi-LSTM
and the overall performance is the average across all patients. extracts temporal features from the input sequence which helps

in predicting seizures more accurately, at the cost of training time, which reached 14.2 min. The number of parameters is decreased by 94% by getting rid of the MLP. In the fourth model, DCAE is used to train the front-end part of our model. This improves the network optimization by starting the training with an initial set of parameters that makes the convergence process faster. As a result, the training time decreased to 4.25 min on average with the same highest performance. Utilizing the transfer learning technique reduces overfitting and improves generalization.

The proposed channel selection algorithm reduces the number of channels to 10 on average across the selected patients, instead of using all 23 channels. The computational complexity is therefore reduced, bringing the training time to 2.2 min on average with the lowest number of parameters, around 18K, which makes this model suitable for real-time applications. All the obtained results are shown graphically for the different models across the selected patients in Figs. 11–15.

Regarding the prediction time, all the proposed models were able to accurately predict the tested seizures from the start of the preictal segments; thus the prediction time is one hour before the seizure onset, or less in the case of a shorter preictal segment.

TABLE III
PERFORMANCE EVALUATION OF THE PROPOSED MODELS

Fig. 11. The measured accuracy among three different proposed algorithms.
Fig. 12. The measured sensitivity among three different proposed algorithms.
Fig. 13. The measured specificity among three proposed algorithms.
Fig. 14. The measured false alarm rate among three proposed algorithms.
Fig. 15. The measured training time on the test set among five proposed algorithms: MLP, DCNN + MLP, DCNN + Bi-LSTM, DCAE + Bi-LSTM and DCAE + Bi-LSTM + CS.

B. Statistical Analysis

We performed the Kruskal-Wallis test [32], a nonparametric test, to compare the accuracy, sensitivity, specificity and false alarm rate of the three basic models, which are MLP, DCNN + MLP, and DCNN + Bi-LSTM.
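A three-group Kruskal-Wallis comparison of this kind can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' analysis code: the function name is hypothetical, and the per-patient accuracy values are invented placeholders, not results from the paper. The p-value uses the chi-square approximation, whose survival function reduces to exp(-H/2) for the two degrees of freedom that three groups give. In practice a statistics library routine such as `scipy.stats.kruskal` would normally be used instead.

```python
import math
from itertools import chain


def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic (with tie correction) for three groups.

    Returns (H, p). The p-value uses the chi-square approximation with
    2 degrees of freedom, for which the survival function is exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    pooled = sorted(chain.from_iterable(groups))
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j

    h = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)
    h /= 1.0 - tie_term / (n_total ** 3 - n_total)  # tie correction
    return h, math.exp(-h / 2.0)


# Hypothetical per-patient accuracies (%), invented for illustration only.
acc_mlp = [82.1, 85.4, 79.8, 88.0, 84.3, 81.7, 86.2, 83.5]
acc_dcnn_mlp = [91.5, 93.2, 90.1, 95.0, 92.7, 89.9, 94.1, 92.0]
acc_dcnn_bilstm = [99.1, 99.8, 99.5, 99.9, 99.6, 99.3, 99.7, 99.4]

h_stat, p_value = kruskal_wallis(acc_mlp, acc_dcnn_mlp, acc_dcnn_bilstm)
print(f"H = {h_stat:.2f}, p = {p_value:.2g}, significant: {p_value < 0.05}")
```

The same call would be repeated for sensitivity, specificity and false alarm rate, giving one p-value per performance measure.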

The Kruskal-Wallis test yielded p-values below 0.05 for all the performance measures, indicating a statistically significant difference between the results of the proposed models: p-value = 0.01 for the accuracy, p-value = 0.006 for the sensitivity, p-value = 0.04 for the specificity, and p-value = 0.04 for the false alarm rate.

C. Comparison With Other Methods

For further evaluation of our proposed method, we compared our experimental results with previous work that used the same dataset, as shown in Table IV. While the same criterion to select the patients from the dataset is applied in this paper and [10], the other compared work employed different criteria, which led to a different selection of patients. In the presented previous work, some features were extracted, such as the Zero-Crossing (ZC) interval in the EEG signals as in [33], the ZC of the Wavelet Transform (WT) coefficients of the EEG signals as in [10], the WT of the EEG signals as in [13], spectral power as in [34], and a set of features in the time domain, the frequency domain and from graph theory as in [11]. These studies used machine learning based classifiers like SVM or Gaussian Mixture Model (GMM). The authors in [13] used a CNN as a classifier. The proposed method achieved the highest accuracy, sensitivity and specificity among the compared methods. Our prediction time is the earliest and our false alarm rate is the lowest.

TABLE IV
COMPARISON WITH OTHER SEIZURE PREDICTION METHODS APPLIED TO CHB-MIT DATASET

IV. CONCLUSION

In this paper, a novel deep learning based patient-specific epileptic seizure prediction method using long-term scalp EEG data has been proposed. This method achieves a prediction accuracy of 99.6%, a sensitivity of 99.72%, a specificity of 99.60%, a false alarm rate of 0.004 per hour and a prediction time of one hour prior to the seizure onset. Important spatial and temporal features are learned from the raw data by the DCNN and Bi-LSTM networks, respectively. A DCAE-based semi-supervised learning approach is investigated with the transfer learning technique, which led to a reduction in the training time. For the system to be suitable for real-time application, a channel selection algorithm is proposed which reduces the computational load and the training time. Using the leave-one-out exhaustive cross-validation technique to test the proposed models demonstrates the robustness and generality of our method against variation across various seizure types. Our experimental results and the comparison with previous work demonstrate that the proposed method is efficient, reliable and suitable for real-time application of seizure prediction. It achieves an accuracy higher than the state of the art with an earlier prediction time, which helps to mitigate the potential life-threatening incidents for epileptic patients.

REFERENCES

[1] R. S. Fisher et al., “ILAE official report: A practical clinical definition of epilepsy,” Epilepsia, vol. 55, no. 4, pp. 475–482, Apr. 2014.
[2] World Health Organization, Neurological Disorders: Public Health Challenges. Geneva, Switzerland: World Health Organization, 2006.
[3] C.-Y. Chiang, N.-F. Chang, T.-C. Chen, H.-H. Chen, and L.-G. Chen, “Seizure prediction based on classification of EEG synchronization patterns with on-line retraining and post-processing scheme,” in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Boston, MA, USA, 2011, pp. 7564–7569.
[4] “Epilepsy prevalence, incidence and other statistics,” Joint Epilepsy Council, Leeds, U.K., 2005.
[5] E. Bou Assi, D. K. Nguyen, S. Rihana, and M. Sawan, “Towards accurate prediction of epileptic seizures: A review,” Biomed. Signal Process. Control, vol. 34, pp. 144–157, Apr. 2017.
[6] A. Aarabi, R. Fazel-Rezai, and Y. Aghakhani, “EEG seizure prediction: Measures and challenges,” in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Sep. 2009, pp. 1864–1867.
[7] M. Bandarabadi, C. A. Teixeira, J. Rasekhi, and A. Dourado, “Epileptic seizure prediction using relative spectral power features,” Clin. Neurophysiol., vol. 126, no. 2, pp. 237–248, Feb. 2015.
[8] L. D. Iasemidis, J. C. Sackellares, H. P. Zaveri, and W. J. Williams, “Phase space topography and the Lyapunov exponent of electrocorticograms in partial seizures,” Brain Topography, vol. 2, no. 3, pp. 187–201, 1990.
[9] M. Le Van Quyen, J. Martinerie, M. Baulac, and F. Varela, “Anticipating epileptic seizures in real time by a non-linear analysis of similarity between EEG recordings,” Neuroreport, vol. 10, no. 10, pp. 2149–2155, Jul. 1999.
[10] S. Elgohary, S. Eldawlatly, and M. I. Khalil, “Epileptic seizure prediction using zero-crossings analysis of EEG wavelet detail coefficients,” in Proc. IEEE Conf. Comput. Intell. Bioinf. Comput. Biol., 2016, pp. 1–6.
[11] K. M. Tsiouris, V. C. Pezoulas, D. D. Koutsouris, M. Zervakis, and D. I. Fotiadis, “Discrimination of preictal and interictal brain states from long-term EEG data,” in Proc. IEEE 30th Int. Symp. Comput.-Based Med. Syst., 2017, pp. 318–323.
[12] C. A. Teixeira et al., “Epileptic seizure predictors based on computational intelligence techniques: A comparative study with 278 patients,” Comput. Methods Programs Biomed., vol. 114, no. 3, pp. 324–336, May 2014.
[13] H. Khan, L. Marcuse, M. Fields, K. Swann, and B. Yener, “Focal onset seizure prediction using convolutional networks,” IEEE Trans. Biomed. Eng., vol. 65, no. 9, pp. 2109–2118, Sep. 2018.
[14] H. G. Daoud, A. M. Abdelhameed, and M. Bayoumi, “Automatic epileptic seizure detection based on empirical mode decomposition and deep neural network,” in Proc. IEEE 14th Int. Colloq. Signal Process. Appl., 2018, pp. 182–186.
[15] “American epilepsy society seizure prediction challenge,” Accessed: Jun. 17, 2018. [Online]. Available: https://www.kaggle.com/c/seizure-prediction
[16] N. V. Chawla, N. Japkowicz, and A. Kotcz, “Editorial: special issue on learning from imbalanced data sets,” SIGKDD Explorations, vol. 6, pp. 1–6, Jun. 2004.
[17] H. Daoud and M. Bayoumi, “Deep Learning based Reliable Early Epileptic Seizure Predictor,” in Proc. IEEE Biomed. Circuits Syst. Conf., Cleveland, OH, USA, 2018, pp. 1–4.
[18] A. H. Shoeb, “Application of machine learning to epileptic seizure onset detection and treatment,” M.Sc. Thesis, Massachusetts Inst. Technol., Cambridge, MA, USA, 2009.
[19] A. L. Goldberger et al., “PhysioBank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, Jun. 2000.
[20] “CHB-MIT scalp EEG database,” Accessed: May 2, 2019. [Online]. Available: https://physionet.org/pn6/chbmit/
[21] N. Siddique and H. Adeli, Computational Intelligence: Synergies of Fuzzy Logic, Neural Networks and Evolutionary Computing. Hoboken, NJ, USA: Wiley, 2013.
[22] R. H. R. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung, “Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit,” Nature, vol. 405, no. 6789, pp. 947–951, Jun. 2000.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Red Hook, NY, USA: Curran Associates, Inc., 2012, pp. 1097–1105.
[24] Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time-series,” in The Handbook of Brain Theory and Neural Networks. Cambridge, MA, USA: MIT Press, 1995.
[25] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[26] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. 32nd Int. Conf. Mach. Learn., Feb. 2015, vol. 37, pp. 448–456.
[27] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[28] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM networks,” in Proc. IEEE Int. Joint Conf. Neural Netw., 2005, vol. 4, pp. 2047–2052.
[29] P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” in Proc. ICML Workshop Unsupervised Transfer Learn., 2012, pp. 37–49.
[30] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” May 2014, arXiv:1312.6114.
[31] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
[32] W. H. Kruskal and W. A. Wallis, “Use of ranks in one-criterion variance analysis,” J. Amer. Statist. Assoc., vol. 47, pp. 583–621, Dec. 1952.
[33] A. Shahidi Zandi, R. Tafreshi, M. Javidan, and G. A. Dumont, “Predicting epileptic seizures in scalp EEG based on a variational Bayesian Gaussian mixture model of zero-crossing intervals,” IEEE Trans. Biomed. Eng., vol. 60, no. 5, pp. 1401–1413, May 2013.
[34] Z. Zhang and K. K. Parhi, “Low-complexity seizure prediction from iEEG/sEEG using spectral power and ratios of spectral power,” IEEE Trans. Biomed. Circuits Syst., vol. 10, no. 3, pp. 693–706, Jun. 2016.

Hisham Daoud received the B.Sc. and M.Sc. degrees from Cairo University, Giza, Egypt, and the Ph.D. degree from Ain Shams University, Cairo, Egypt, in 2004, 2007, and 2014, respectively, all in electronics and communications engineering.
Since 2004, he has held multiple positions in both industry and academia. He is currently with the University of Louisiana, Lafayette, LA, USA. His research interests include biomedical signal processing, machine learning, deep learning, and neuromorphic computing.
Dr. Daoud was the recipient of the IEEE CASS Student Travel Award and the Best Paper Award at the 14th IEEE Colloquium on Signal Processing and its Applications Conference (CSPA’2018). He has served as a Reviewer for several IEEE conferences and journals.

Magdy A. Bayoumi (LF’16) received the B.Sc. and M.Sc. degrees in electrical engineering from Cairo University, Giza, Egypt, in 1973 and 1977, respectively, the M.Sc. degree in computer engineering from Washington University, St. Louis, MO, USA, in 1981, and the Ph.D. degree in electrical engineering from the University of Windsor, ON, Canada, in 1984.
He is currently the Department Head of the W. H. Hall Department of Electrical and Computer Engineering. He is the Hall Endowed Chair in computer engineering. He was the Director of the Center for Advanced Computer Studies and the Department Head of the Computer Science Department. He was also the Loflin Eminent Scholar Endowed Chair in computer science, all at the University of Louisiana at Lafayette, where he has been a faculty member since 1985. He has graduated about 100 Ph.D. and 150 M.Sc. students, and authored/coauthored about 600 research papers and more than 10 books. He was the Guest/Coguest Editor of more than 10 special journal issues, the latest of which was on Machine to Machine Interface. He has served the IEEE Computer, Signal Processing, and Circuits & Systems (CAS) societies. He is currently the Vice President of Technical Activities of the IEEE RFID council and he is on the IEEE RFID Distinguished Lecture Program (DLP). He is a member of the IEEE IoT Activity Board. He was the recipient of many awards, among them the IEEE CAS Education award and the IEEE CAS Distinguished Service award. He was on the IEEE DLP programs for the CAS and Computer societies. He was on the IEEE Fellow Selection Committee. He has been an ABET Evaluator and he was an ABET Commissioner and Team Chair. He has given numerous keynote/invited lectures and talks nationally and internationally. He was the General Chair of IEEE ICASSP 2017 in New Orleans. He also chaired many conferences including ISCAS 2007, ICIP 2009, and ICECS 2015. He was the Chair of an international delegation to China, sponsored by People-to-People Ambassador, 2000. He received the French Government Fellowship, University of Paris Orsay, 2003–2005 and 2009. He was a Visiting Professor with the King Saud University. He was a United Nations Visiting Scholar. He has been an advisor to many EE/CMPS Departments in several countries. He was on the State of Louisiana Comprehensive Energy Policy Committee. He was the Vice President of the Acadiana Technology Council. He was on the Chamber of Commerce Tourism and Education committees. He was a member of several delegations representing Lafayette to international cities. He was on the Le Centre International Board. He was the General Chair of the SEASME (an organization of French Speaking cities) conference in Lafayette. He is a member of the Lafayette Leadership Institute; he was a founding member of its executive committee. He was a Column Editor for the Lafayette newspaper, the Daily Advertiser.
