
Predict Epileptic Episodes from EEG Time Series

with Deep Neural Networks


Liang Ma
Department of Biomedical Engineering
Columbia University
New York, NY 10027
[email protected]

Mara Kaspers
Department of Biomedical Engineering
Columbia University
New York, NY 10027
[email protected]

Abstract—Identifying seizure discharges in the electroencephalogram (EEG) is an early and crucial step in the clinical diagnosis of epilepsy by neurologists. The manual review of an EEG for seizure detection is a laborious and error-prone process; therefore, automated seizure detection has been studied extensively. In recent years, deep learning techniques have been adopted to avoid manual feature extraction and selection. In this short project, we briefly compare the performance of two different types of deep neural networks at predicting epileptic episodes from raw EEG time series. We use a relatively small, publicly accessible dataset of surface EEG recorded from human subjects. The data are organized and sectioned into EEG epochs labeled with 5 categories, including seizure activity. We designed two tasks based on the data: one classifies seizure activity against normal activity, and the other classifies seizure activity against all other categories. A convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM) are implemented to classify the inputs. The classification results on the test dataset show that the BiLSTM, with accuracy above 0.97, greatly outperforms the CNN.

I. INTRODUCTION

Epilepsy is a disease of the brain defined by recurrent unprovoked seizures and related syndromes [1]. The primary tool for seizure detection in a clinical setting is the electroencephalogram (EEG). Identifying seizure discharges in EEG is an early and crucial step in the clinical diagnosis of epilepsy by neurologists. EEG continuously measures the electrical activity of the brain via electrodes placed on the scalp. It holds the advantage of being accessible, non-invasive, and able to record from large cortical areas. Manual inspection of long, continuous EEG for seizure detection is a time-consuming and laborious process in both clinical and experimental settings. It can take many hours to meticulously examine days of EEG recordings from patients hospitalized for epilepsy diagnosis. In experimental settings, long-term EEG recordings (up to several months) often need to be reviewed. Furthermore, EEG readings made by different inspectors can be inconsistent, as the criteria for abnormal EEG findings are experiential. Aside from manual annotation by experts with domain knowledge, classical methods include modeling signal characteristics such as time-frequency and power analyses. However, these methods are also subject to high variation due to experimental setup, quality of recordings, and analysis pipelines.

Researchers have employed various machine learning techniques to automatically detect seizures [2]. Yet the extreme variability in EEG from different patients, and sometimes within the same patient, causes significant difficulty in automatic detection [3]. Another source of difficulty is that EEG signals are highly non-stationary and nonlinear [4]. A generalized seizure detector requires the extraction of features that discriminate between seizure and non-seizure EEG. Traditional methods rely on hand-engineered EEG features from the time domain, the frequency domain, the time-frequency domain, or combinations of these domains. Time-domain features include the amplitude and duration of waveforms as well as their variation coefficients. Frequency-domain features can be spectral characteristics from the Fast Fourier Transform (FFT) or a periodogram. Time-frequency decompositions usually come from the Short-Time Fourier Transform (STFT) or a wavelet transform. These features are extracted, statistically analyzed, ranked, and selected for classification, and combinations of features from multiple domains are commonly used. Classification methods include logistic regression, support vector machines (SVM), k-nearest neighbors, and more recently t-distributed stochastic neighbor embedding (t-SNE), as well as neural networks. In general, traditional approaches have two stages, feature extraction and classification. Both the identification of appropriate features and the choice of a proper classifier play important roles in optimizing algorithm performance. These stages depend heavily on domain expertise and consume a great deal of time and effort.

We have learned in class that deep learning approaches can automatically discover and learn the discriminative features needed to classify inputs [5]. Recently, many studies have investigated deep learning for seizure detection, based on different deep neural network structures such as the fully connected neural network (FCNN) [6], the convolutional neural network (CNN) [7], and the recurrent neural network (RNN) [8]. We learned that CNNs and RNNs are especially good at processing biomedical signals and sequences. To take advantage of automatic feature discovery, we used as input only the raw EEG signals segmented into epochs. Using these time series, we trained, validated, tested, and compared two models. We find that, for
the given dataset, a bidirectional long short-term memory network performs better and more consistently than a convolutional neural network does.

II. MATERIALS AND METHODS

A. Data

The data were downloaded from kaggle.com, an open-access, web-based data science community for sharing data and code. The data originally come from the University of California Irvine Machine Learning Repository [9]. The EEG data are multivariate time series with real and integer values.

The data are organized into 5 folders, each with 100 files, and each file represents a single subject. Each file is a recording of brain activity for 23.6 seconds, sampled into 4097 data points; each data point is the value of the EEG recording at a point in time. In total there are 500 individuals, each with 4097 data points over 23.6 seconds.

The original authors divided and shuffled every 4097 data points into 23 chunks, each containing 178 data points for 1 second. This yields 23 x 500 = 11500 rows, each holding 178 data points (columns) for 1 second, with the last column holding the label y in {1, 2, 3, 4, 5}.

The response variable is y in column 179; the explanatory variables are X1, X2, ..., X178. y contains the category of the 178-dimensional input vector:

1 - recording of seizure activity;
2 - EEG recorded from the brain area where a tumor was located;
3 - EEG recorded from the healthy brain area of a patient whose tumor region had been identified;
4 - EEG recorded while the patient had their eyes closed;
5 - EEG recorded while the patient had their eyes open.

Subjects in classes 2, 3, 4, and 5 did not have an epileptic seizure; only subjects in class 1 did (Fig. 1). We noted that seizure activity is characterized by composite oscillatory patterns and high amplitude.

B. Tasks

We wanted to start with a simple task. The numbers of EEG epochs in each category are comparable, so we first sought to classify epochs of epileptic seizure against recordings from healthy regions of the brain (Fig. 2). The two categories are balanced: there are 1488 epochs of seizure activity and 1462 healthy epochs, 2950 epochs in total. The labels for healthy recordings were changed from 3s to 0s.

This subset of the data was further divided into training, validation, and test sets. 80 percent of the epochs were first randomly selected as the training set; of the remaining 20 percent, half was randomly designated as the validation set and the other half as the test set. We also expanded the dimensionality, and the final shapes are:
- training set: (2360, 178, 1)
- validation set: (295, 178, 1)
- test set: (295, 178, 1)

Fig. 2. Example recording from a healthy region.

We designed the next task to take advantage of the unused portion of the dataset, categories 2, 4, and 5 (Fig. 3, 4, 5). In addition to the epileptic and healthy recordings, we include 1497 epochs of EEG from tumor regions, 1450 epochs recorded while the subject had eyes closed, and 1508 epochs recorded while the subject had eyes open. All categories other than seizure were relabeled as 0s.

This subset was divided into training, validation, and test sets by the same 80/10/10 random scheme, and the dimensionality was again expanded; the final shapes are:
- training set: (5924, 178, 1)
- validation set: (740, 178, 1)
- test set: (741, 178, 1)
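The preparation steps above can be sketched as follows. The column layout (178 sample columns plus a label column) follows the dataset description; the use of a random stand-in array in place of the downloaded CSV is an assumption for illustration, and in practice the arrays would be read from the file (e.g., with pandas). Note that the split is purely random, with no stratification by category.

```python
import numpy as np

# Hypothetical stand-in for the downloaded CSV contents:
# 11500 one-second epochs of 178 samples, labels 1..5.
rng = np.random.default_rng(0)
X = rng.normal(size=(11500, 178)).astype("float32")
y = rng.integers(1, 6, size=11500)

# Task 1: seizure (label 1) vs healthy region (label 3);
# relabel healthy 3s as 0s so seizure epochs carry label 1.
keep = np.isin(y, [1, 3])
X1 = X[keep]
y1 = (y[keep] == 1).astype("int64")

# Random 80/10/10 split; category proportions are NOT preserved.
order = rng.permutation(len(X1))
n_tr = int(0.8 * len(X1))
n_va = (len(X1) - n_tr) // 2
tr, va, te = order[:n_tr], order[n_tr:n_tr + n_va], order[n_tr + n_va:]

# Expand dimensionality so each epoch has shape (178, 1) for the 1-D networks.
X_train, X_val, X_test = (X1[s][..., np.newaxis] for s in (tr, va, te))
print(X_train.shape[1:], X_val.shape[1:], X_test.shape[1:])  # (178, 1) each
```

With the actual label counts (1488 seizure, 1462 healthy), this split reproduces the shapes listed above for the first task.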

Fig. 1. Example recording of epileptic activity.
Fig. 3. Example recording from tumor area.
Fig. 4. Example recording when the patient has eyes closed.
Fig. 5. Example recording when the patient has eyes open.

C. Network structures

For each of the two tasks we sought to compare two models of different network types, a convolutional neural network and a bidirectional long short-term memory network. We have learned and seen examples showing that these types of networks are suitable for biomedical signals and sequences. We designed, implemented, and assessed the models using Keras from TensorFlow. Experiments were carried out in an online Python notebook in Google Colab, with a copy of the data stored in and accessed from Google Drive.

Our CNN has 9 hidden layers. The input layer has shape (None, 178, 1), which corresponds to one epoch of EEG. After the input layer, the first hidden layer is a one-dimensional convolution layer with 32 filters and kernel size 6. Its output is batch normalized and fed through a 2-element max pooling layer. A second convolution layer with 64 filters and kernel size 3 follows; its output is also batch normalized and max pooled. The 7th layer is a flattening layer. The flattened output is fed into a densely connected layer with 32 hidden units, followed by another dense layer with 16 hidden units; both layers use the rectified linear unit (ReLU) as activation function. The output layer has size 2 and uses softmax as activation. In total there are 95,474 parameters, 95,282 of which are trainable.

Fig. 6. Summary of Convolutional Neural Network.

The input layer of our BiLSTM model is the same as that of the CNN, with shape (None, 178, 1). The input is first fed through a densely connected layer with 32 hidden units and ReLU activation. The key layer is the bidirectional recurrent layer, whose recurrent units we specified as long short-term memory units; there are 128 LSTM units. Its output goes through a dropout layer with probability 0.3, is then batch normalized, and is fed through a dense layer with 64 hidden units. This is followed by another dropout layer with probability 0.3 and another batch normalization layer. The output layer has size 2 and uses softmax as activation. In total there are 182,786 parameters, 182,146 of which are trainable.

Fig. 7. Summary of Bidirectional Long Short Term Memory Network.

For both models we use the Adam algorithm for stochastic optimization and choose sparse categorical crossentropy as the objective function to be minimized. Our metric for assessing model performance is prediction accuracy on the test dataset. During training, we also iteratively saved the model parameters with the highest validation accuracy and compared that saved model with the final model.

III. RESULTS AND CONCLUSION

A. Experiment results

We trained a CNN and a bidirectional LSTM model on the two different categorical combinations of our EEG dataset: epileptic episodes against non-epileptic episodes, and epileptic episodes against all 4 remaining categories. In both experiments, the LSTM model achieved higher accuracy and lower loss than the CNN model (Table I).
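The two architectures described above can be sketched in Keras as below. The layer order follows the text, but the convolution activations and padding are our inference rather than the authors' exact code; the padding was chosen so the totals match the reported parameter counts (95,474 and 182,786).

```python
# Sketch of the two networks as described in the text, using Keras from
# TensorFlow. Padding and convolution activations are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_classes=2):
    """1-D CNN: two conv/batch-norm/max-pool stages, then dense layers."""
    return models.Sequential([
        layers.Input(shape=(178, 1)),
        layers.Conv1D(32, 6, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling1D(2),
        layers.Conv1D(64, 3, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

def build_bilstm(n_classes=2):
    """Dense front end, bidirectional LSTM core, dropout/batch-norm head."""
    return models.Sequential([
        layers.Input(shape=(178, 1)),
        layers.Dense(32, activation="relu"),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dropout(0.3),
        layers.BatchNormalization(),
        layers.Dense(64),
        layers.Dropout(0.3),
        layers.BatchNormalization(),
        layers.Dense(n_classes, activation="softmax"),
    ])

for model in (build_cnn(), build_bilstm()):
    # Adam optimizer and sparse categorical crossentropy, as in the text.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    print(model.count_params())  # 95474, then 182786

# Keeping the weights with the highest validation accuracy during training
# could be done with a checkpoint callback, e.g.:
# ckpt = tf.keras.callbacks.ModelCheckpoint(
#     "best_model.keras", monitor="val_accuracy", save_best_only=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[ckpt])
```

Under these assumptions, `count_params()` reproduces the totals stated in the text, including the 192 and 640 non-trainable batch-normalization parameters.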
TABLE I
SUMMARY OF PERFORMANCE METRICS FOR CNN AND BiLSTM

                       Epileptic VS Healthy     Epileptic VS All
                       CNN*       BiLSTM        CNN*       BiLSTM
Epochs                 500        100           100        100
Training Loss          0.6931     0.0207        0.6931     0.0265
Training Accuracy      0.4580     0.9947        0.8194     0.9912
Validation Loss        0.6931     0.1192        0.6931     0.1001
Validation Accuracy    0.4949     0.9729        0.8230     0.9770
Test Accuracy          0.5492     0.9831        0.8313     0.9798
Best Model Test Acc    0.9220     0.9797        0.8421     0.9838

* Results varied widely between training sessions; the values reported here are from one example session.

When trained on the epileptic (n=1,488) versus healthy (n=1,462) EEG data, the CNN model generated inconsistent accuracy metrics between training sessions; the final validation accuracy varied roughly from 0.1 to 0.9. Here we report a final test accuracy of 0.55, while the test accuracy of the saved best model was 0.922. Additionally, the training and validation accuracy curves, over the course of 500 training epochs, behaved erratically with high peaks and low valleys (Fig. 8A). Further, the validation and training loss values were equal at 0.6931, and the curves are superimposed and do not exhibit the expected asymptotic behavior over training time (Fig. 8B).

The LSTM model trained on the same two categories of healthy versus epileptic EEG data showed more consistent results. The validation accuracy was 0.973, and the training accuracy remained higher than the validation accuracy (Fig. 9A). We report a test accuracy of 0.983, while the test accuracy of the saved best model was 0.980. Additionally, the training and validation loss after 100 epochs were 0.0068 and 0.0739, respectively, suggesting an appropriate fit to the data (Fig. 9B).

The CNN model trained on the full dataset, categorized as epileptic (n=1,488) versus all other categories (n=5,917), behaved similarly to the previous experiment on the smaller subset of data (Fig. 10). Both the training and validation accuracy varied widely from 0.1 to 0.9 between training sessions and behaved erratically within sessions. Again, the training and validation loss were equal, at 0.693. Here we report a final test accuracy of 0.831, while the test accuracy of the saved best model was 0.842.

The bidirectional LSTM model trained on the full dataset showed validation loss and accuracy similar to those obtained on the smaller subset in the first test (Fig. 11). Again, its performance metrics were significantly better than the CNN model's. The validation accuracy and loss were 0.977 and 0.108, respectively, and the training loss and accuracy were better than their validation counterparts, indicating an appropriate fit of the model. We report a final test accuracy of 0.980, while the test accuracy of the saved best model was 0.984.

Comparing the results, the more sophisticated model (BiLSTM) with the larger amount of data (epileptic vs. all other) yielded the best performance.

B. Conclusion and discussion

We have demonstrated that a bidirectional LSTM model is more accurate at predicting epileptic episodes from EEG data than a CNN model of the same depth. By comparing the models' performance metrics when trained on two different categorizations of the same dataset, we showed that the BiLSTM model produces higher validation accuracy and lower validation loss than the CNN model, regardless of categorization. Additionally, we find that the CNN model produces inconsistent and erratic accuracy metrics, whereas the BiLSTM model consistently predicts epileptic activity with accuracy around 0.97.

We attribute the CNN's inconsistent performance to our deliberate data division scheme. When assigning data to the training, validation, and test sets, we did not maintain proportional balance between the categories: in one execution the proportion of seizure activity in the training set might be very low while the proportion in the validation or test set is high, and the opposite can occur in a different execution. This further strengthens our claim of the BiLSTM's superiority, since it was insensitive to the proportional imbalance. Although the lack of thorough regularization in the CNN model, or its relatively smaller size, might play a role in the difference, we believe that the BiLSTM architecture's ability to learn bidirectional time dependencies, which are characteristic of EEG data and even more pronounced in the strong oscillatory patterns of seizure activity, accounts for its superior performance over the CNN model.

It is important to note that a different type of network or architecture might work better given different input modalities. A recent publication showed that the CNN is the most suitable structure for automated seizure detection when applied to 2D images of raw EEG waveforms [10]. We emphasize that our claim is based on a specific task on a particular set of data. Nevertheless, our exploration and experimentation have been a fruitful learning experience. Further directions of research may include identifying the mechanisms through which the BiLSTM outperforms the CNN, finding the best input modality for each network type, or integrating distinct network types into one model suitable for a variety of data.

ACKNOWLEDGMENT

We would like to thank Prof. Sajda and the wonderful TAs for all the valuable guidance and assistance throughout the course, especially in such an unusual and uncertain semester.
Fig. 8. CNN Performance on Seizure vs. Healthy (panels A, B).

Fig. 9. LSTM Performance on Seizure vs. Healthy (panels A, B).

Fig. 10. CNN Performance on Seizure vs. All Other Categories (panels A, B).

Fig. 11. LSTM Performance on Seizure vs. All Other Categories (panels A, B).
REFERENCES

[1] Fisher, R. S. et al. ILAE official report: a practical clinical definition of epilepsy. Epilepsia 55, 475–482 (2014). https://doi.org/10.1111/epi.12550
[2] Mohseni, H. R., Maghsoudi, A. & Shamsollahi, M. B. Seizure detection in EEG signals: a comparison of different approaches. In: Proceedings of the IEEE Engineering in Medicine and Biology Society Suppl, 6724–6727 (2006). https://doi.org/10.1109/IEMBS.2006.260931
[3] McShane, T. A clinical guide to epileptic syndromes and their treatment. Arch. Dis. Child. 89(6), 591 (2004).
[4] Palus, M. Nonlinearity in normal human EEG: cycles, temporal asymmetry, nonstationarity and randomness, not chaos. Biol. Cybern. 75, 389–396 (1996).
[5] LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015). https://doi.org/10.1038/nature14539
[6] Jang, H. J. & Cho, K. O. Dual deep neural network-based classifiers to detect experimental seizures. Korean J. Physiol. Pharmacol. 23, 131–139 (2019). https://doi.org/10.4196/kjpp.2019.23.2.131
[7] Zhou, M. et al. Epileptic seizure detection based on EEG signals and CNN. Front. Neuroinform. 12, 95 (2018). https://doi.org/10.3389/fninf.2018.00095
[8] Hussein, R., Palangi, H., Ward, R. K. & Wang, Z. J. Optimized deep neural network architecture for robust detection of epileptic seizures using EEG signals. Clin. Neurophysiol. 130, 25–37 (2019). https://doi.org/10.1016/j.clinph.2018.10.010
[9] Andrzejak, R. G., Lehnertz, K., Rieke, C., Mormann, F., David, P. & Elger, C. E. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys. Rev. E 64, 061907 (2001).
[10] Cho, K. O. & Jang, H. J. Comparison of different input modalities and network structures for deep learning-based seizure detection. Sci. Rep. 10, 122 (2020). https://doi.org/10.1038/s41598-019-56958-y
