TABLE I
INFORMATION OF EEG DATA OF THE SELECTED PATIENTS
l(y, ŷ) = −[y log(ŷ) + (1 − y) log(1 − ŷ)]   (1)

where y is the desired output, ŷ is the calculated output, and l(y, ŷ) is the loss function.

The Rectified Linear Unit (ReLU) activation function [22], defined by (2), is used across the hidden layers to add nonlinearity and to ensure robustness against noise in the input data:

f(x) = x, if x > 0;  0, otherwise   (2)

where x is the sum of the weighted input signals and f(x) is the ReLU activation function.

The sigmoid activation function (3) is selected for the output layer to predict the class of the input data:

p_i = 1 / (1 + e^{−x_i})   (3)

where x_i is the sum of the weighted input signals and p_i is the probability of the input example being preictal.

C. Deep Convolutional Neural Network

Convolutional neural networks (CNNs) have shown great success in different pattern recognition and computer vision applications [23]. This success is due to the ability of the CNN to automatically extract the significant spatial features that best represent the data from its raw form, without any preprocessing and without any human decision in selecting these features [24]. The sparse connectivity and parameter sharing of the CNN give it a small memory footprint, as it requires much less memory to store the sparse weights. The equivariant representation property of the CNN increases the detection accuracy of a pattern when it appears at different locations across the image [25]. A typical CNN is formed of three types of layers: convolution layers, pooling layers and fully connected layers. The convolution layer generates the feature map by applying filters with trainable weights to the input data. This feature map is then down-sampled by the pooling layer to reduce the features' dimension. Batch normalization [26] is also employed, as defined by (4):

BN_{γ,β}(x_i) = γ · (x_i − μ_B) / √(σ_B² + ε) + β   (4)

where x_i is the vector to be normalized in a mini-batch B = {x_1, x_2, ..., x_m}; μ_B and σ_B² are the mean and variance of the current mini-batch of x_i, respectively; ε is a constant added to the mini-batch variance for numerical stability; and γ and β are learned parameters used to scale and shift the normalized value, respectively [26].

The proposed DCNN architecture is used as the front-end feature extractor in the three proposed models of Figs. 3, 4 and 5(b), where it extracts spatial features across the different electrode positions on the scalp. The number of trainable parameters is drastically decreased when employing the DCNN, owing to its weight-sharing property: the second model, DCNN + MLP, has almost 520K trainable parameters, while the third and fourth models, DCNN + Bi-LSTM and DCAE + Bi-LSTM, have almost 28K.
D. Bidirectional-LSTM Recurrent Neural Network

A recurrent neural network (RNN) is a type of neural network that can maintain state along sequential inputs: it processes a temporal sequence of data depending on the processing done on the previous elements of the sequence. This property makes the RNN suitable for applications such as the prediction of time-series data. The typical RNN architecture is trained using backpropagation through time (BPTT), which has drawbacks such as exploding and vanishing gradients and information morphing.

Long Short-Term Memory networks (LSTMs) [27] are a type of RNN designed to overcome the problems of the basic RNN. LSTMs address the vanishing-gradient problem by maintaining the gradient values during training and backpropagating them through layers and time, giving the LSTM the capability of learning long-term dependencies. The LSTM cell, shown in Fig. 8, consists of three controlling gates that can store or forget the previous state and use or discard the current state. An LSTM cell computes two states at each time step: a cell state (c) that can be maintained over long time spans, and a hidden state (h) that is the new output of the cell at that time step.
Fig. 7. The architecture of the proposed DCNN front-end in DCNN based models.
Fig. 8. Basic LSTM cell.

The mathematical expressions governing the operation of the cell gates are defined as follows:

f_t = σ(W_fh h_{t−1} + W_fx x_t + b_f)   (5)
i_t = σ(W_ih h_{t−1} + W_ix x_t + b_i)   (6)
o_t = σ(W_oh h_{t−1} + W_ox x_t + b_o)   (7)
c̃_t = tanh(W_ch h_{t−1} + W_cx x_t + b_c)   (8)
c_t = f_t ∘ c_{t−1} + i_t ∘ c̃_t   (9)
h_t = o_t ∘ tanh(c_t)   (10)

where x_t is the input at time t; c_t and h_t are the cell state and the hidden state at time t, respectively; W and b denote the weight and bias parameters, respectively; σ is the sigmoid function; and ∘ is the Hadamard product operator. c̃_t is a candidate for updating c_t through the input gate.

The input gate i_t decides whether to update the cell with the new candidate state c̃_t, the forget gate f_t decides what to keep or forget from the previous cell state, and the output gate o_t decides how much information is passed to the next cell.
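For illustration only (this is not the authors' code), the gate equations (5) through (10) map line for line onto a single time-step update in NumPy; the dictionary keys for the weights and biases are our own naming.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W['fh'], W['fx'], ... hold the gate weight matrices; b['f'], ... the biases.
    f_t = sigmoid(W['fh'] @ h_prev + W['fx'] @ x_t + b['f'])      # Eq. (5) forget gate
    i_t = sigmoid(W['ih'] @ h_prev + W['ix'] @ x_t + b['i'])      # Eq. (6) input gate
    o_t = sigmoid(W['oh'] @ h_prev + W['ox'] @ x_t + b['o'])      # Eq. (7) output gate
    c_tilde = np.tanh(W['ch'] @ h_prev + W['cx'] @ x_t + b['c'])  # Eq. (8) candidate state
    c_t = f_t * c_prev + i_t * c_tilde                            # Eq. (9) new cell state
    h_t = o_t * np.tanh(c_t)                                      # Eq. (10) new hidden state
    return h_t, c_t
```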
Instead of using an LSTM as the classifier, we used a Bidirectional-LSTM (Bi-LSTM) network [28], in which each LSTM block is replaced by two blocks that process the temporal sequence simultaneously in opposite directions, as depicted in Fig. 9. In the forward pass block, the feature vector generated by the DCNN is processed from its first time instance to the end, while the backward pass block processes the same segment in reverse order. The network output at each time step is the combination of the two blocks' outputs at that time step. In addition to the previous context processed by a standard LSTM, the Bi-LSTM also processes the future context, which enhances the prediction results: used as a classifier, it improves the prediction accuracy by extracting the important temporal features in addition to the spatial features extracted by the DCNN.

The Bi-LSTM is used in two of the proposed models, Figs. 4 and 5(b), as the back-end classifier operating on the feature vector generated by the DCNN. The proposed network consists of a single bidirectional layer that predicts the class label at the last time instance, after processing all the EEG segments, as shown in Fig. 9. We chose the number of units (the dimensionality of the output space) to be 20. The dropout regularization technique is utilized to avoid overfitting: dropout is applied to the input and to the recurrent state with factors of 10% and 50%, respectively. The sigmoid activation function is used to predict the EEG segment's class, and RMSprop is selected for optimization.
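A plausible Keras rendering of this classifier is sketched below. The number of time steps and the DCNN feature dimension are assumptions on our part; the 20 units, the 10%/50% dropout factors, the sigmoid output and the RMSprop optimizer follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical input shape: (time steps, DCNN feature dimension); both are assumptions.
inputs = layers.Input(shape=(None, 128))
# Single bidirectional layer with 20 units; 10% input dropout, 50% recurrent dropout.
x = layers.Bidirectional(layers.LSTM(20, dropout=0.1, recurrent_dropout=0.5))(inputs)
# Sigmoid output predicting the preictal probability at the last time instance.
outputs = layers.Dense(1, activation="sigmoid")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
```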
E. Deep Convolutional Autoencoder

Autoencoders (AEs) are unsupervised neural networks whose target is to find a lower dimensional representation of the input data. This technique has many applications, such as data compression [29], dimensionality reduction [30], visualization of high dimensional data [31] and removal of noise from the input data. The AE network has two main parts, an encoder and a decoder. The encoder compresses the high dimensional input data into a lower dimensional representation, called the latent space representation or bottleneck, and the decoder retrieves the data back to its original dimension.
Fig. 10. The architecture of the proposed DCAE. C stands for convolution, P for pooling, D for deconvolution and U for upsampling layer.
The simple AE uses fully connected layers for the encoder and the decoder. The aim is to learn the parameters that minimize a cost function expressing the difference between the original data and the retrieved data. The deep convolutional autoencoder (DCAE) replaces the fully connected layers of the simple AE with convolution layers.

Due to the limited EEG dataset for each patient, we decided to extend our work and develop an unsupervised training algorithm using the DCAE, as shown in Fig. 5(a). The proposed architecture of the DCAE model is depicted in Fig. 10. We used the same proposed DCNN model as the encoder and added a decoder network to build the DCAE. Unsupervised learning is deployed using the transfer learning technique, by training the DCAE on the data of all the selected patients (not patient-specific). Transfer learning helps to obtain better generalization, enhances the optimization of our prediction model and therefore reduces the training time.

In the DCAE, Fig. 10, the encoder part consists of alternating convolution and pooling layers, while in the decoder part, deconvolution and upsampling layers are used to reconstruct the original EEG segment. The encoder output is the latent space representation: the low dimensional features that best represent the EEG input segment. The decoder output, on the other hand, is the reconstructed version of the original input. The learned encoder parameters are saved to be used later for training the prediction model in Fig. 5(b), allowing the training process to start from a good initial point instead of a random initialization of the parameters, which reduces the training time drastically.

Training of the DCAE is done using unlabeled EEG segments (balanced data of preictal and interictal segments) of all the selected patients. The ReLU activation function is used across all the convolutional layers, and batch normalization is used to improve the training speed and reduce overfitting. The DCAE is optimized using the RMSprop optimizer. The mean square error is utilized as the cost function, defined as

J(θ) = (1/2m) Σ_{i=1}^{m} (x̂^{(i)} − x^{(i)})²   (11)

where x^{(i)} is the input EEG signal, x̂^{(i)} is the reconstructed EEG signal, m is the number of training examples and θ denotes the parameters being learned.

After the DCAE training, the pre-trained encoder is used as the front-end of the fourth proposed model, DCAE + Bi-LSTM, as shown in Fig. 5(b), while the back-end is the Bi-LSTM network. We used the same DCNN and Bi-LSTM architectures that are used in the third model (DCNN + Bi-LSTM). Training of this model is done in a supervised manner to predict the patient-specific seizure onset. Since both unsupervised and supervised learning algorithms are used, this model is considered a semi-supervised learning model.
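As a rough sketch of this pipeline (the exact filter counts, kernel sizes and segment shape live in Fig. 10 and are assumed here), the DCAE and the reusable encoder could look as follows in Keras:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical EEG segment shape: samples x channels; all layer sizes are assumptions.
inp = layers.Input(shape=(1024, 23))

# Encoder: alternating convolution and pooling, with batch normalization (plays the DCNN role).
x = layers.Conv1D(32, 5, padding="same", activation="relu")(inp)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling1D(2)(x)
x = layers.Conv1D(16, 5, padding="same", activation="relu")(x)
x = layers.BatchNormalization()(x)
latent = layers.MaxPooling1D(2)(x)            # latent space representation (bottleneck)

# Decoder: deconvolution and upsampling layers reconstruct the original segment.
y = layers.Conv1DTranspose(16, 5, padding="same", activation="relu")(latent)
y = layers.UpSampling1D(2)(y)
y = layers.Conv1DTranspose(32, 5, padding="same", activation="relu")(y)
y = layers.UpSampling1D(2)(y)
out = layers.Conv1D(23, 5, padding="same")(y)  # reconstructed EEG segment

encoder = models.Model(inp, latent)
dcae = models.Model(inp, out)
dcae.compile(optimizer="rmsprop", loss="mse")  # Eq. (11) cost with RMSprop, as in the text

# After unsupervised training on all patients' unlabeled segments, the encoder's
# learned weights initialize the front-end of the DCAE + Bi-LSTM predictor.
```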
F. EEG Channel Selection

We introduce an EEG channel selection algorithm to select the most important and informative EEG channels for our problem. Decreasing the number of channels helps reduce the features' dimension, the computational load and the required memory, making the model suitable for real-time application. The proposed channel selection algorithm is summarized in Table II. The algorithm is given the EEG preictal segments of each patient and the prediction accuracy measured by running our fourth model, DCAE + Bi-LSTM, using all channels; it outputs the reduced set of channels that gives the same accuracy, omitting redundant or irrelevant channels. We start by computing the statistical variance defined by (12) and the entropy defined by (13) for all the available channels (23 channels) of the preictal segments. Then, we select the channels with the highest variance-entropy product that provide the same given prediction accuracy. This is done through an iterative process, training our model on the reduced set of channels in each iteration. The variance is estimated as

σ²(X_c) = (1/N) Σ_{i=1}^{N} (x_c(i) − μ_c)²   (12)

where X_c, μ_c and N are the normalized EEG data, the mean and the number of samples of channel c, respectively. The entropy of channel c is calculated as

H(X_c) = − Σ_{i=1}^{N} p(x_c(i)) log₂ p(x_c(i))   (13)

where p(x_c(i)) is the probability mass function of channel c having N samples.

In the channel selection algorithm, we chose the channels with the highest variance-entropy product because we want to maximize both quantities: we want to select the channels that have a high variance during the preictal interval and that also provide the largest amount of information.

TABLE II
THE PROPOSED EEG CHANNEL SELECTION ALGORITHM
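The ranking step of Table II is straightforward to prototype. The sketch below is our illustration, with an assumed histogram-based estimate of the probability mass function in (13); the iterative retraining on the reduced channel set is omitted.

```python
import numpy as np

def channel_ranking(eeg, n_bins=64):
    """eeg: array of shape (n_channels, n_samples), already normalized."""
    scores = []
    for x_c in eeg:
        var = np.mean((x_c - x_c.mean()) ** 2)            # Eq. (12) channel variance
        counts, _ = np.histogram(x_c, bins=n_bins)
        p = counts[counts > 0] / len(x_c)                 # empirical pmf (assumed estimator)
        ent = -np.sum(p * np.log2(p))                     # Eq. (13) channel entropy
        scores.append(var * ent)                          # variance-entropy product
    return np.argsort(scores)[::-1]                       # channels ranked best-first
```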
G. Training and Testing Method

In order to overcome the problem of the imbalanced dataset, we set the number of interictal segments used during training equal to the number of available preictal segments; the interictal segments were selected at random from the overall interictal samples. To ensure the robustness and generality of the proposed models, we used the leave-one-out cross-validation (LOOCV) technique as the evaluation method for all of our proposed models. In LOOCV, the training is done N separate times, where N is the number of seizures for a specific patient. Each time, all seizures are involved in the training process except one seizure, on which the testing is applied; the process is then repeated by changing the seizure under test. This method ensures that the testing covers all the seizures and that the tested seizures are unseen during training. The performance for one patient is the average across the N trials, and the overall performance is the average across all patients. 80% of the training data is assigned to the training set, while 20% is assigned to the validation set, over which the hyperparameters are updated and the model is optimized.

We evaluated the performance of our models by calculating measures such as sensitivity, specificity and accuracy on the test data; these measures are averaged across all patients. The prediction time of each model is recorded at the time of the first preictal segment detection. The evaluation measures are defined as follows:

Sensitivity = TP / (TP + FN)   (14)

Specificity = TN / (TN + FP)   (15)

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (16)

where TN, TP, FN and FP denote the numbers of true negatives, true positives, false negatives and false positives, respectively.
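To make the protocol concrete, the following skeleton pairs the seizure-wise LOOCV loop with the measures (14) through (16). The function build_model and the Keras-style fit/predict calls are placeholders we introduce for illustration, not the authors' code.

```python
import numpy as np

def evaluate(y_true, y_pred):
    # Confusion-matrix counts for Eqs. (14)-(16).
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)                # Eq. (14)
    specificity = tn / (tn + fp)                # Eq. (15)
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # Eq. (16)
    return sensitivity, specificity, accuracy

def loocv_over_seizures(seizure_data, build_model):
    """seizure_data: list of (X, y) pairs, one pair per seizure of one patient."""
    results = []
    for k, (X_test, y_test) in enumerate(seizure_data):
        # Train on all seizures except the k-th, which is held out for testing.
        X_train = np.concatenate([X for j, (X, _) in enumerate(seizure_data) if j != k])
        y_train = np.concatenate([y for j, (_, y) in enumerate(seizure_data) if j != k])
        model = build_model()
        # 80/20 train/validation split inside the training data, as in the text.
        model.fit(X_train, y_train, validation_split=0.2)
        y_pred = (model.predict(X_test).ravel() > 0.5).astype(int)
        results.append(evaluate(y_test, y_pred))
    # Patient-level performance is the average across the N held-out trials.
    return np.mean(results, axis=0)
```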
III. RESULTS

A. Performance Evaluation and Analysis

We evaluated our proposed patient-specific models on the selected patients by calculating performance measures such as prediction accuracy, prediction time, sensitivity, specificity and false alarms per hour. The training time is also computed, in order to evaluate our proposed channel selection algorithm. Table III shows the obtained values of these measures for the four proposed models, MLP, DCNN + MLP, DCNN + Bi-LSTM and DCAE + Bi-LSTM. The fifth model, DCAE + Bi-LSTM + CS, is the same as the fourth one but uses the channel selection algorithm.

As can be noticed from Table III, the MLP has the worst accuracy, sensitivity, specificity and false alarm rate among the proposed models. This is because the learning process in this model aims only at updating the network parameters so that the output is close to the ground truth, without extracting any features from the input data. The huge number of parameters in this model (around 9 million) is another drawback. The training time is moderate (7.3 min) due to the network's simplicity.

By introducing the DCNN as a front-end, we found around 10% enhancement in the accuracy, sensitivity and specificity, and the false alarm rate improved by 60%. This improvement is due to the ability of the DCNN to extract the spatial features across different scalp positions and use them to discriminate between the preictal and interictal brain states. On the other hand, the training time increased by 5 min, owing to the added computational complexity of the DCNN; the number of network parameters, however, is drastically decreased because of the parameter sharing and sparse connectivity properties of the DCNN. In our third model, we used the Bi-LSTM as the back-end along with the DCNN; this model increases the accuracy to 99.6%, the sensitivity to 99.72% and the specificity to 99.6%, and the false alarm rate is greatly improved, reaching 0.004 false alarms per hour. This improvement comes from using the Bi-LSTM as a classifier instead of the MLP: the Bi-LSTM extracts temporal features from the input sequence, which helps in the discrimination between the preictal and interictal brain states.
TABLE III
PERFORMANCE EVALUATION OF THE PROPOSED MODELS
Fig. 11. The measured accuracy among three different proposed algorithms.
Fig. 12. The measured sensitivity among three different proposed algorithms.
Fig. 13. The measured specificity among three proposed algorithms.
Fig. 14. The measured false alarm rate among three proposed algorithms.
TABLE IV
COMPARISON WITH OTHER SEIZURE PREDICTION METHODS APPLIED TO CHB-MIT DATASET
The Kruskal-Wallis test [32] yielded p < 0.05 for all the performance measures, indicating a statistically significant difference between the results of the proposed models: for the accuracy, p = 0.01; for the sensitivity, p = 0.006; for the specificity, p = 0.04; and for the false alarm rate, p = 0.04.
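Such a test is easy to reproduce with SciPy. In the sketch below, the per-patient accuracies are hypothetical placeholders, not the paper's data.

```python
from scipy import stats

# Hypothetical per-patient accuracies for three of the proposed models.
acc_mlp       = [0.85, 0.88, 0.90, 0.87, 0.86]
acc_dcnn_mlp  = [0.95, 0.96, 0.97, 0.95, 0.96]
acc_dcnn_lstm = [0.99, 1.00, 0.99, 1.00, 0.99]

h, p = stats.kruskal(acc_mlp, acc_dcnn_mlp, acc_dcnn_lstm)
# p < 0.05 indicates a statistically significant difference between the models.
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")
```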
C. Comparison With Other Methods

For further evaluation of our proposed method, we compared our experimental results with previous work that used the same dataset, as shown in Table IV. While this paper and [10] apply the same criterion for selecting patients from the dataset, the other compared works employed different criteria, which led to different selections of patients. In the presented previous work, hand-crafted features were extracted, such as the zero-crossing (ZC) intervals of the EEG signals as in [33], the ZC of the wavelet transform (WT) coefficients of the EEG signals as in [10], the WT of the EEG signals as in [13], the spectral power as in [34], and a set of features in the time domain, the frequency domain and from graph theory as in [11]. These studies used machine learning based classifiers such as the SVM or the Gaussian mixture model (GMM), while the authors in [13] used a CNN as the classifier. The proposed method achieved the highest accuracy, sensitivity and specificity among them; our prediction time is the earliest, and our false alarm rate is the lowest.

IV. CONCLUSION

In this paper, a novel deep learning based patient-specific epileptic seizure prediction method using long-term scalp EEG data has been proposed. The method achieves a prediction accuracy of 99.6%, a sensitivity of 99.72%, a specificity of 99.60%, a false alarm rate of 0.004 per hour, and a prediction time of one hour prior to the seizure onset. The important spatial and temporal features of the raw data are learned by the DCNN and the Bi-LSTM networks, respectively. A DCAE based semi-supervised learning approach is investigated together with the transfer learning technique, which reduces the training time. For the system to be suitable for real-time application, a channel selection algorithm is proposed that reduces the computational load and the training time. Using the leave-one-out exhaustive cross-validation technique to test the proposed models proves the robustness and generality of our method against variation across various seizure types. Our experimental results and the comparison with previous work demonstrate that the proposed method is efficient, reliable and suitable for the real-time application of seizure prediction, achieving higher accuracy than the state of the art with an earlier prediction time, mitigating potentially life-threatening incidents for epileptic patients.

REFERENCES

[1] R. S. Fisher et al., "ILAE official report: A practical clinical definition of epilepsy," Epilepsia, vol. 55, no. 4, pp. 475–482, Apr. 2014.
[2] World Health Organization, Neurological Disorders: Public Health Challenges. Geneva, Switzerland: World Health Organization, 2006.
[3] C.-Y. Chiang, N.-F. Chang, T.-C. Chen, H.-H. Chen, and L.-G. Chen, "Seizure prediction based on classification of EEG synchronization patterns with on-line retraining and post-processing scheme," in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Boston, MA, USA, 2011, pp. 7564–7569.
[4] "Epilepsy prevalence, incidence and other statistics," Joint Epilepsy Council, Leeds, U.K., 2005.
[5] E. Bou Assi, D. K. Nguyen, S. Rihana, and M. Sawan, "Towards accurate prediction of epileptic seizures: A review," Biomed. Signal Process. Control, vol. 34, pp. 144–157, Apr. 2017.
[6] A. Aarabi, R. Fazel-Rezai, and Y. Aghakhani, "EEG seizure prediction: Measures and challenges," in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Sep. 2009, pp. 1864–1867.
[7] M. Bandarabadi, C. A. Teixeira, J. Rasekhi, and A. Dourado, "Epileptic seizure prediction using relative spectral power features," Clin. Neurophysiol., vol. 126, no. 2, pp. 237–248, Feb. 2015.
[8] L. D. Iasemidis, J. C. Sackellares, H. P. Zaveri, and W. J. Williams, "Phase space topography and the Lyapunov exponent of electrocorticograms in partial seizures," Brain Topography, vol. 2, no. 3, pp. 187–201, 1990.
[9] M. Le Van Quyen, J. Martinerie, M. Baulac, and F. Varela, "Anticipating epileptic seizures in real time by a non-linear analysis of similarity between EEG recordings," Neuroreport, vol. 10, no. 10, pp. 2149–2155, Jul. 1999.
[10] S. Elgohary, S. Eldawlatly, and M. I. Khalil, "Epileptic seizure prediction using zero-crossings analysis of EEG wavelet detail coefficients," in Proc. IEEE Conf. Comput. Intell. Bioinf. Comput. Biol., 2016, pp. 1–6.
[11] K. M. Tsiouris, V. C. Pezoulas, D. D. Koutsouris, M. Zervakis, and D. I. Fotiadis, "Discrimination of preictal and interictal brain states from long-term EEG data," in Proc. IEEE 30th Int. Symp. Comput.-Based Med. Syst., 2017, pp. 318–323.
[12] C. A. Teixeira et al., "Epileptic seizure predictors based on computational intelligence techniques: A comparative study with 278 patients," Comput. Methods Programs Biomed., vol. 114, no. 3, pp. 324–336, May 2014.
[13] H. Khan, L. Marcuse, M. Fields, K. Swann, and B. Yener, "Focal onset seizure prediction using convolutional networks," IEEE Trans. Biomed. Eng., vol. 65, no. 9, pp. 2109–2118, Sep. 2018.
[14] H. G. Daoud, A. M. Abdelhameed, and M. Bayoumi, "Automatic epileptic seizure detection based on empirical mode decomposition and deep neural network," in Proc. IEEE 14th Int. Colloq. Signal Process. Appl., 2018, pp. 182–186.
[15] "American epilepsy society seizure prediction challenge," Accessed: Jun. 17, 2018. [Online]. Available: https://www.kaggle.com/c/seizure-prediction
[16] N. V. Chawla, N. Japkowicz, and A. Kotcz, "Editorial: Special issue on learning from imbalanced data sets," SIGKDD Explorations, vol. 6, pp. 1–6, Jun. 2004.
[17] H. Daoud and M. Bayoumi, "Deep learning based reliable early epileptic seizure predictor," in Proc. IEEE Biomed. Circuits Syst. Conf., Cleveland, OH, USA, 2018, pp. 1–4.
[18] A. H. Shoeb, "Application of machine learning to epileptic seizure onset detection and treatment," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, USA, 2009.
[19] A. L. Goldberger et al., "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals," Circulation, vol. 101, no. 23, pp. e215–e220, Jun. 2000.
[20] "CHB-MIT scalp EEG database," Accessed: May 2, 2019. [Online]. Available: https://physionet.org/pn6/chbmit/
[21] N. Siddique and H. Adeli, Computational Intelligence: Synergies of Fuzzy Logic, Neural Networks and Evolutionary Computing. Hoboken, NJ, USA: Wiley, 2013.
[22] R. H. R. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung, "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit," Nature, vol. 405, no. 6789, pp. 947–951, Jun. 2000.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Red Hook, NY, USA: Curran Associates, 2012, pp. 1097–1105.
[24] Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time-series," in The Handbook of Brain Theory and Neural Networks. Cambridge, MA, USA: MIT Press, 1995.
[25] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[26] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. 32nd Int. Conf. Mach. Learn., 2015, vol. 37, pp. 448–456.
[27] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[28] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM networks," in Proc. IEEE Int. Joint Conf. Neural Netw., 2005, vol. 4, pp. 2047–2052.
[29] P. Baldi, "Autoencoders, unsupervised learning, and deep architectures," in Proc. ICML Workshop Unsupervised Transfer Learn., 2012, pp. 37–49.
[30] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," May 2014, arXiv:1312.6114.
[31] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
[32] W. H. Kruskal and W. A. Wallis, "Use of ranks in one-criterion variance analysis," J. Amer. Statist. Assoc., vol. 47, pp. 583–621, Dec. 1952.
[33] A. Shahidi Zandi, R. Tafreshi, M. Javidan, and G. A. Dumont, "Predicting epileptic seizures in scalp EEG based on a variational Bayesian Gaussian mixture model of zero-crossing intervals," IEEE Trans. Biomed. Eng., vol. 60, no. 5, pp. 1401–1413, May 2013.
[34] Z. Zhang and K. K. Parhi, "Low-complexity seizure prediction from iEEG/sEEG using spectral power and ratios of spectral power," IEEE Trans. Biomed. Circuits Syst., vol. 10, no. 3, pp. 693–706, Jun. 2016.

Hisham Daoud received the B.Sc. and M.Sc. degrees from Cairo University, Giza, Egypt, and the Ph.D. degree from Ain Shams University, Cairo, Egypt, in 2004, 2007, and 2014, respectively, all in electronics and communications engineering.
Since 2004, he has held multiple positions in both industry and academia. He is currently with the University of Louisiana at Lafayette, Lafayette, LA, USA. His research interests include biomedical signal processing, machine learning, deep learning, and neuromorphic computing.
Dr. Daoud was the recipient of the IEEE CASS Student Travel Award and the Best Paper Award at the 14th IEEE Colloquium on Signal Processing and its Applications (CSPA 2018). He has served as a reviewer for several IEEE conferences and journals.

Magdy A. Bayoumi (LF'16) received the B.Sc. and M.Sc. degrees in electrical engineering from Cairo University, Giza, Egypt, in 1973 and 1977, respectively, the M.Sc. degree in computer engineering from Washington University, St. Louis, MO, USA, in 1981, and the Ph.D. degree in electrical engineering from the University of Windsor, Windsor, ON, Canada, in 1984.
He is currently the Department Head of the W. H. Hall Department of Electrical and Computer Engineering and holds the Hall Endowed Chair in computer engineering. He was the Director of the Center for Advanced Computer Studies, the Department Head of the Computer Science Department, and the Loflin Eminent Scholar Endowed Chair in computer science, all at the University of Louisiana at Lafayette, where he has been a faculty member since 1985. He has graduated about 100 Ph.D. and 150 M.Sc. students, and has authored or coauthored about 600 research papers and more than 10 books. He has been the guest or co-guest editor of more than 10 special journal issues, most recently on machine-to-machine interfaces. He has served the IEEE Computer, Signal Processing, and Circuits and Systems (CAS) societies. He is currently the Vice President of Technical Activities of the IEEE RFID Council and is on the IEEE RFID Distinguished Lecturer Program (DLP). He is a member of the IEEE IoT Activities Board. He was the recipient of many awards, among them the IEEE CAS Education Award and the IEEE CAS Distinguished Service Award. He was on the IEEE DLP programs for the CAS and Computer societies and on the IEEE Fellow Selection Committee. He has been an ABET evaluator, and he was an ABET commissioner and team chair. He has given numerous keynote and invited lectures and talks nationally and internationally. He was the General Chair of IEEE ICASSP 2017 in New Orleans, and he also chaired many conferences, including ISCAS 2007, ICIP 2009, and ICECS 2015. He was the Chair of an international delegation to China sponsored by the People-to-People Ambassador Program in 2000. He received the French Government Fellowship, University of Paris Orsay, in 2003-2005 and 2009. He was a Visiting Professor with King Saud University and a United Nations Visiting Scholar, and he has been an advisor to many EE/CMPS departments in several countries. He was on the State of Louisiana Comprehensive Energy Policy Committee, was the Vice President of the Acadiana Technology Council, and was on the Chamber of Commerce Tourism and Education committees. He was a member of several delegations representing Lafayette to international cities and was on the Le Centre International Board. He was the General Chair of the SEASME (an organization of French-speaking cities) conference in Lafayette. He is a member of the Lafayette Leadership Institute and was a founding member of its executive committee. He was a column editor for the Lafayette newspaper, the Daily Advertiser.