Automatic Detection of Schizophrenia by Applying Deep Learning Over Spectrogram Images of EEG Signals
https://doi.org/10.18280/ts.370209
Received: 17 January 2020
Accepted: 20 March 2020
Keywords: schizophrenia, CNN, deep learning, spectrogram

ABSTRACT
This study presents a method that aims to automatically diagnose Schizophrenia (SZ) patients by using EEG recordings. Unlike many studies in the literature, the proposed method does not manually extract features from EEG recordings; instead, it transforms the raw EEG into 2D by using the Short-time Fourier Transform (STFT) in order to obtain a useful representation of frequency-time features. This work is the first in the relevant literature to use 2D time-frequency features for the purpose of automatic diagnosis of SZ patients. In order to extract the most useful features out of all those present in the 2D space and classify samples with high accuracy, a state-of-the-art Convolutional Neural Network architecture, namely VGG-16, is trained. The experimental results show that the method presented in the paper is successful in the task of classifying SZ patients and healthy controls, with a classification accuracy of 95% and 97% in two datasets of different age groups. With this performance, the proposed method outperforms most of the methods in the literature. The experiments of the study also reveal that there is a relationship between the frequency components of an EEG recording and the SZ disease. Moreover, the Grad-CAM images presented in the paper clearly show that mid-level frequency components matter more when discriminating an SZ patient from a healthy control.
predict working memory performance of healthy and SZ patients. Their method reached an accuracy of 87% in prediction performance. Similarly, Santos-Mayo et al. [13] tested various ML approaches and feature selection algorithms, including electrode grouping and filtering. As a result, they reported that the Multi-Layer Perceptron (MLP) and SVM algorithms have the best classification accuracy, with 93.42% and 92.23%, respectively. Moreover, classification over features obtained from the J5 feature selection algorithm [14] performed better. In the study of Aslan and Akın [15], features are extracted from the EEG signals by using Relative Wavelet Energy. These features are then fed to a K-Nearest Neighbors algorithm in order to classify healthy and SZ cases. They reported an accuracy of nearly 90%. Thilakvathi et al. [16] used the Support Vector Machines (SVM) algorithm to discriminate SZ patients. They used Shannon Entropy, Spectral Entropy, Information Entropy, Higuchi’s Fractal Dimension and Kolmogorov Complexity values as input features to the SVM. They reported an accuracy of 88.5%. In all the literature methods mentioned up to now, the EEG recordings are not used in raw form; instead, researchers crafted features out of the EEG signals and fed these features to an ML algorithm of their choice. This feature engineering approach has advantages such as good predictive performance, but it requires experts with comprehensive knowledge of the target domain. Also, all these extracted features are ad hoc solutions specific to the data, and they are not proven to generalize well to all cases.

As an alternative approach, researchers have recently been investigating Deep Learning (DL) algorithms such as CNNs to automatically diagnose SZ patients, because DL algorithms do not require the practitioner to extract any features manually from the input. The features in the given input are extracted automatically in layers of the network such as convolutional and pooling layers. There are only a few methods in the literature that utilize CNNs to detect SZ in given EEG signals. In one study, Phang et al. [17] proposed a method that accepts brain functional connectivity information as features. These features are extracted from EEG recordings by use of a vector autoregressive (VAR) model, partial directed coherence (PDC) and complex network measures of network topology. The obtained features are subsequently fed to two Convolutional Neural Network (CNN) models, which in turn are fused into a Fully-Connected Neural Network (FCN) that is capable of classifying healthy controls and SZ patients. They reported an accuracy of 93.06% in the classification task. Their method is reported to reach a satisfactory accuracy but relies on additional data such as brain connectivity features. In another study, Oh et al. [6] utilized a CNN model to classify 19-channel EEG recordings of 14 healthy controls and 14 SZ patients. Their CNN model had a total of 11 layers, including regular convolutional, pooling and dense layers. No preprocessing is used, and the raw EEG channels are fed into the CNN model all at once. Their method is reported to reach an accuracy of 81.26% for subject-based testing and 98.07% for non-subject-based testing. Both methods that utilize DL to diagnose SZ lack interpretability due to the use of CNNs as a black-box solution. This is partly because raw EEG signals are not obviously correlated with possible visual outcomes of convolution operations.

In this study, we propose a method that attempts to detect SZ patients with high accuracy, a simple pipeline and interpretable outputs. The proposed study is novel in that it hypothesizes that the frequency-time features of an EEG recording are sufficient to automatically discriminate SZ patients. The frequency-time features are obtained by converting raw EEG signals into 2D spectrogram images by using the Short-time Fourier Transform (STFT). To our knowledge, this is the first time in the relevant literature that spectrograms are used as inputs to detect SZ patients. The most useful features that are thought to be present in these images are automatically extracted by a state-of-the-art CNN model, which also classifies samples as healthy or diseased in the later layers of the network. Therefore, the proposed method has an advantage over the many literature methods that require expert knowledge to extract useful features from EEG recordings, because it extracts features automatically through the layers of the CNN. Moreover, when it is contrasted with literature methods that use a CNN, it still has advantages such as interpretable outputs, a simpler pipeline and an architecture that is easier to implement. The proposed method is tested on two different datasets, each of which contains patients and healthy controls of a different age group. The fact that the proposed method reaches high accuracy (95% and 97%) on both children and adult data shows that it is a robust method for the task of automatically diagnosing the SZ disease. Moreover, the obtained accuracy values are better than those of most of the literature methods. One more advantage of the method is that, because it uses images as inputs, the model can output interpretable results such as Grad-CAM images that reveal the relationship between frequency components and the disease.

2. METHODS

2.1 Methods

2.1.1 Deep learning
Deep Learning is a recent approach in Machine Learning that adopts hierarchical learning of features with deeper neural network architectures. Network structures in DL look similar to those utilized in traditional ML; however, they differ in that DL algorithms attempt to learn features by themselves automatically, while ML methods often require proper features to be given to the network by the practitioner. DL started to come forward as a successful alternative to traditional ML algorithms only after large-scale datasets became publicly available and the hardware required to process such data became cheaper and thus more accessible. Therefore, DL has recently been used frequently as a method to process, analyze and evaluate medical images, mostly in the form of CNNs [18]. This study also uses a CNN model to classify spectrogram images.

2.1.2 Convolutional Neural Networks
Convolutional Neural Networks are DL networks that are designed to process multimedia data types (e.g., images) in such a way that features can be automatically extracted in a hierarchical manner through the layers of the network [19]. At its core, a CNN model often consists of two modules: a) feature extraction through convolutional and pooling layers, and b) a classification stage via Fully-Connected Network (FCN) layers, which operates similarly to a traditional Multi-Layer Perceptron (MLP).

In a regular CNN model, there are a number of successive convolutional and pooling layers, each of which is responsible for extracting features from the previous layer’s output. In this way,
early layers of the network extract simple features such as lines in an image and feed later layers with these features, so that subsequent layers can process these simple features and extract more complex features such as objects in an image. This kind of hierarchical learning of features is inspired by the human cortex, in which cells respond to visual elements in a similar hierarchical way [20]. Figure 1 depicts a block consisting of a convolutional layer, a non-linear layer and a pooling layer.

inputs are downsampled by a factor of 2. Then the acquired feature set is connected to an FCN that completes the classification task. Figure 2 depicts a general overview of the VGG-16 architecture [25].

VGG-16 is an example of a family of state-of-the-art CNN models that includes other well-known architectures such as CIFAR, GoogLeNet and AlexNet. Before choosing VGG-16, we empirically tested other CNN models with other data and observed that, with some exceptions, all models performed comparably. VGG-16, however, slightly outperformed the others. Therefore, for the sake of simplicity, we only show the results with VGG-16 in the paper.
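To make the convolution, non-linearity and pooling block described above more concrete, the following is a minimal pure-Python sketch of one such block applied to a toy image. It is an illustration only and not the implementation used in the study: the function names (`conv2d_valid`, `relu`, `max_pool_2x2`, `conv_block`), the 6x6 toy image and the hand-picked edge kernel are all assumptions made for the example, whereas a trained CNN learns its kernels from data.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Non-linear layer: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap: Matrix) -> Matrix:
    """2x2 max pooling with stride 2: each spatial dimension is halved."""
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [
        [
            max(fmap[2 * r][2 * c], fmap[2 * r][2 * c + 1],
                fmap[2 * r + 1][2 * c], fmap[2 * r + 1][2 * c + 1])
            for c in range(w)
        ]
        for r in range(h)
    ]


def conv_block(image: Matrix, kernel: Matrix) -> Matrix:
    """One block: convolution, then non-linearity, then pooling."""
    return max_pool_2x2(relu(conv2d_valid(image, kernel)))


# Toy 6x6 image (hypothetical): dark left half, bright right half.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hand-picked vertical-edge kernel (an assumption; real kernels are learned).
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

features = conv_block(image, kernel)  # 6x6 -> 4x4 after conv -> 2x2 after pooling
```

The pooling step is what produces the downsampling by a factor of 2 mentioned in the text; stacking several such blocks yields progressively smaller feature maps that summarize larger regions of the input.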
Figure 3. An example spectrogram generated by Matlab software
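The spectrograms in the study were generated with Matlab. As a rough, language-neutral illustration of what an STFT-based spectrogram computes, the sketch below implements a magnitude STFT in pure Python with a Hann window and a naive DFT. The sampling rate of 128 Hz and the 5-second segment length come from the paper; the window size (64 samples), hop (32 samples), the synthetic 10 Hz test tone and the function name `stft_magnitude` are assumptions chosen for the example, since the paper's STFT settings are not given here.

```python
import cmath
import math
from typing import List


def stft_magnitude(signal: List[float], window_size: int, hop: int) -> List[List[float]]:
    """Magnitude STFT: one row per time frame, window_size // 2 + 1 frequency bins."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
              for n in range(window_size)]
    n_bins = window_size // 2 + 1
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_size)]
        # Naive DFT of the windowed frame (an FFT would be used in practice).
        spectrum = [
            abs(sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                    for n in range(window_size)))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


FS = 128            # samples per second, as stated for Dataset A
DURATION_S = 5      # segment length used in the study
WINDOW = 64         # assumption: not specified in the text
HOP = 32            # assumption: 50% overlap

# Synthetic single-channel test signal: a 10 Hz sine (hypothetical stand-in for EEG).
signal = [math.sin(2 * math.pi * 10.0 * t / FS) for t in range(FS * DURATION_S)]

spec = stft_magnitude(signal, WINDOW, HOP)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * FS / WINDOW  # bin index converted to frequency
```

Each row of `spec` is one time frame and each column one frequency bin, so rendering the array as an image gives the kind of time-frequency picture that is fed to the CNN. In this toy case the energy concentrates in the bin corresponding to the 10 Hz tone.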
Figure 6. 19 channel electrode setup for Dataset B
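The segment and image counts reported for the two datasets in Section 3.2 follow from simple arithmetic, which the short sketch below reproduces. The sampling rate, channel count, segment length and per-subject segment counts are taken from the paper; the helper names are hypothetical, and the 60-second recording length for Dataset A is an inference from 1008 images over 84 individuals (12 segments of 5 seconds each), not a figure stated in the text.

```python
from typing import List, Sequence

SAMPLING_RATE_HZ = 128   # "128 values per second" (Dataset A)
SEGMENT_SECONDS = 5      # segment length used for both datasets


def segment_vector_length(channels: int) -> int:
    """Number of values in one multi-channel segment."""
    return SAMPLING_RATE_HZ * channels * SEGMENT_SECONDS


def split_into_segments(samples: Sequence[float], segment_len: int) -> List[Sequence[float]]:
    """Cut a recording into non-overlapping segments, dropping any remainder."""
    n_segments = len(samples) // segment_len
    return [samples[i * segment_len:(i + 1) * segment_len] for i in range(n_segments)]


# Dataset A: 16 channels, 39 healthy + 45 SZ children.
vector_len_a = segment_vector_length(channels=16)

# Assumption (inferred, not stated): 60 s of signal per child, shown for one channel.
one_channel = list(range(SAMPLING_RATE_HZ * 60))
segments = split_into_segments(one_channel, SAMPLING_RATE_HZ * SEGMENT_SECONDS)
images_a = (39 + 45) * len(segments)

# Dataset B: 14 controls with 173 segments each, 14 patients with 148 each.
images_b = 14 * 173 + 14 * 148
```

Running the sketch gives a segment vector length of 10240 and image totals of 1008 and 4494, which match the numbers reported for Dataset A and Dataset B.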
3.2 Experiments

In this study, the proposed method is evaluated on two datasets. In the first dataset (A), there are 16-channel EEG recordings of 39 healthy children and 45 children with SZ. Each recording is divided into 5-second segments, each of which is represented by a vector of length 10240 (128 values per second x 16 channels x 5 seconds). These vectors are then converted into spectrograms of size 224x224. We therefore have 1008 images for 84 individuals in Dataset A. Figures 7 and 8 show example spectrogram images for a healthy control and an SZ patient for Dataset A and B, respectively.

Figure 7. Example spectrogram images from Dataset A for (a) healthy control and (b) SZ patient

Figure 8. Example spectrogram images from Dataset B for (a) healthy control and (b) SZ patient

The dataset of spectrogram images is split into train and test sets with a ratio of 80% and 20%, respectively. These images are fed into the VGG-16 CNN model in order to classify each as either healthy or SZ patient. The hyperparameters of the network are as follows: input image size 112x112, batch size 128, learning rate 1.0e-4 and the Adam optimizer [29]. Through the experiments we observed that 50 epochs of training sufficed for the network to converge. On average, the network reached an accuracy of 95% on the test set.

In the second dataset (B), there are 19-channel EEG recordings of 28 adults (14 controls and 14 SZ patients). The length of the recordings varied between 12 and 15 minutes. Therefore, within each of the healthy and SZ groups, every recording is truncated to the length of the shortest recording in that group. Subsequently, we obtained 173 segments for each healthy control and 148 segments for each SZ patient, where the length of each segment was again 5 seconds. After this point, a total of 4494 spectrogram images are processed in the same way as those in Dataset A, with the same settings and hyperparameters. As a result, the network reached an accuracy of 97.4% at 30 epochs. Figure 9 shows the change in accuracy with respect to training time in epochs.

Figure 9. Accuracy against training time in epochs for (a) Dataset A and (b) Dataset B

As can be seen in Figure 9, a sufficient amount of training is important and affects the accuracy of the model significantly. Unfortunately, there is no predefined number for the required amount of training in the literature; it is in general determined empirically for each dataset through experiments.

3.3 Evaluation metrics
The experiments of the study are evaluated, and the results confirmed, with a number of well-known and widely used evaluation metrics. The details and interpretations of these metrics are explained in other papers [30, 31], and therefore we only include the basic calculations of these metrics.

3.3.1 Confusion matrix (as shown in Figure 10 above)

3.3.2 Prediction error and accuracy

$$\mathrm{ERR} = \frac{FP + FN}{FP + FN + TP + TN} = 1 - \mathrm{ACC} \qquad (2)$$

The ROC curve is a common graphical metric used in the ML literature. It plots TPR (y-axis) against FPR (x-axis) as the threshold used by the binary classifier to discriminate between 0 and 1 values is varied. The AUC (Area Under the Curve) value is the total area under the curve. A high AUC is better, because it indicates that the classifier does well at most threshold values in classifying “0”s as “0” and “1”s as “1”. Figure 11 shows an example ROC curve.

4. DISCUSSION
Figure 14. Activation maximization images for (a) healthy control and (b) SZ patient
Figure 15. Grad-CAM images obtained from a set of individuals of (a) healthy controls and (b) SZ patients
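For readers unfamiliar with how a Grad-CAM heatmap [33] is formed, the sketch below shows only its core combination step in pure Python: each feature map of a convolutional layer is weighted by the spatial average of the class-score gradient for that map, the weighted maps are summed, and a ReLU keeps the regions that support the predicted class. This is a simplified illustration under stated assumptions, not the tooling used in the study: the function name `grad_cam` and the toy 2x2 activations and gradients are hypothetical, and in practice the gradients come from backpropagation through the trained network and the heatmap is upsampled and overlaid on the input image.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Combine K feature maps into one heatmap scaled to the range [0, 1]."""
    height, width = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * width for _ in range(height)]
    for a_k, g_k in zip(activations, gradients):
        # Channel weight: global average of the class-score gradient over the map.
        alpha_k = sum(sum(row) for row in g_k) / (height * width)
        for r in range(height):
            for c in range(width):
                heat[r][c] += alpha_k * a_k[r][c]
    # ReLU: keep only regions that increase the score of the predicted class.
    heat = [[max(0.0, v) for v in row] for row in heat]
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


# Toy inputs (hypothetical): two 2x2 feature maps and their gradients.
activations = [
    [[1.0, 0.0], [0.0, 0.0]],   # map 0 fires in the top-left corner
    [[0.0, 0.0], [0.0, 2.0]],   # map 1 fires in the bottom-right corner
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],       # map 0 supports the class (weight +1)
    [[-1.0, -1.0], [-1.0, -1.0]],   # map 1 opposes the class (weight -1)
]

heatmap = grad_cam(activations, gradients)
```

In this toy case only the top-left cell remains highlighted, because the second map contributes negatively and is removed by the ReLU. On a spectrogram input, the highlighted cells correspond to time-frequency regions, which is why the heatmaps can be read in terms of frequency components.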
Table 3. A summary of relevant literature methods that aim to automatically detect SZ patients
Therefore, as a second attempt, we utilized another technique called Gradient-weighted Class Activation Mapping (Grad-CAM) [33] in order to understand what really differs between a healthy control and an SZ patient. In a Grad-CAM image, a color map is overlaid on an input image, where the colors show the relevance of each region of the image to the predicted class. Colors close to red therefore mean that the region holds important spatial features for the predicted class. Figure 15 shows Grad-CAM images of different individuals grouped by class. The Grad-CAM images belonging to the diseased group clearly reveal a pattern in which the colors are usually centered in the middle of the image, whereas in healthy control images high-intensity colors either never appear or appear only at the top and bottom of the image.

Please note that the pattern shown in Figure 15 is common to other individuals not shown in the figure. Therefore, a few results can be inferred from these images. First and foremost, frequency components matter in the discrimination of SZ patients and healthy controls. Moreover, the mid-level frequency components are the most important ones for discriminating SZ patients, as almost all samples in this group have a similar pattern in the spectrogram region that represents mid-level frequency components.

Table 3 summarizes the relevant literature methods with respect to methodology and the accuracy reached. From the table, it is clear that the method proposed in this paper outperforms all the methods listed except the study conducted by Oh et al. [6], which has performance comparable to that of the proposed method. Beyond the improved performance tested on two different datasets, the proposed method has several advantages over literature methods. Firstly, unlike the literature studies that utilize ML methods, it does not have a preprocessing stage to manually extract features. Furthermore, it does not need to preprocess the spectrogram images. Secondly, its results are interpretable and reveal which frequency components matter for the EEG recordings of SZ patients. Lastly, it has a simple pipeline: raw EEG signals are transformed into spectrograms, which are then given to the CNN model.

5. CONCLUSION

In this study, a method is proposed to automatically diagnose SZ patients. The experiments conducted on two separate datasets show that the method is capable of discriminating healthy controls and SZ patients in a robust and accurate way. The accuracy values reached by the method are 95% and 97% for Dataset A and B, respectively. With these results, the proposed method outperforms most of the methods in the relevant literature.

Furthermore, the proposed method reveals that analyzing the frequency components of an EEG recording is a robust way of discriminating the SZ disease, which is described as a brain disorder. In particular, Grad-CAM images show that the mid-level frequency components of an EEG record of an SZ patient show a specific pattern that makes the records separable. The method introduced in the paper can be used as a framework for CAD studies that attempt to detect diseases that are considered to leave some trace in EEG recordings.

The proposed method uses a state-of-the-art CNN architecture called VGG-16. As new CNN models are introduced frequently, newer models may help improve the classification performance of the proposed method. Also, it should be noted that low-complexity models with fewer layers/nodes/parameters and lower computational requirements will nonetheless be preferable even though they do not come with a performance advantage.

ETHICAL APPROVAL

Data used in this study are taken from publicly available datasets, each of which has been used in previous studies. The first dataset is available at http://brain.bio.msu.ru/eeg_schizophrenia.htm. The second one is accessible from https://repod.pon.edu.pl/dataset/eeg-in-schizophrenia.

REFERENCES

[1] Buettner, R., Hirschmiller, M., Schlosser, K., Rössle, M., Fernandes, M., Timm, I.J. (2019). High-performance exclusion of schizophrenia using a novel machine learning method on EEG data. 2019 IEEE International Conference on E-health Networking, Application & Services (HealthCom), Bogota, Colombia, pp. 1-6. http://dx.doi.org/10.1109/HealthCom46333.2019.9009437
[2] Zhang, L. (2019). EEG signals classification using machine learning for the identification and diagnosis of schizophrenia. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, pp. 4521-4524. http://dx.doi.org/10.1109/EMBC.2019.8857946
[3] Shim, M., Hwang, H.J., Kim, D.W., Lee, S.H., Im, C.H. (2016). Machine-learning-based diagnosis of schizophrenia using combined sensor-level and source-level EEG features. Schizophrenia Research, 176(2-3): 314-319. http://dx.doi.org/10.1016/j.schres.2016.05.007
[4] Cao, B., Cho, R.Y., Chen, D.C., Xiu, M.H., Wang, L., Soares, J.C., Zhang, X.Y. (2020). Treatment response prediction and individualized identification of first-episode drug-naïve schizophrenia using brain functional connectivity. Molecular Psychiatry, 25: 906-913. https://doi.org/10.1038/s41380-018-0106-5
[5] Jack Jr, C.R., Lowe, V.J., Weigand, S.D., Wiste, H.J., Senjem, M.L., Knopman, D.S., Shiung, M.M., Gunter, J.L., Boeve, B.F., Kemp, B.J., Weiner, M., Petersen, R.C., the Alzheimer's Disease Neuroimaging Initiative. (2009). Serial PIB and MRI in normal, mild cognitive impairment and Alzheimer’s disease: Implications for sequence of pathological events in Alzheimer’s disease. Brain, 132(5): 1355-1365. https://doi.org/10.1093/brain/awp062
[6] Oh, S.L., Vicnesh, J., Ciaccio, E.J., Yuvaraj, R., Acharya, U.R. (2019). Deep convolutional neural network model for automated diagnosis of schizophrenia using EEG signals. Applied Science, 9(14): 2870. https://doi.org/10.3390/app9142870
[7] Kim, J.W., Lee, Y.S., Han, D.H., Min, K.J., Lee, J., Lee, K. (2015). Diagnostic utility of quantitative EEG in un-medicated schizophrenia. Neuroscience Letters, 589: 126-131. https://doi.org/10.1016/j.neulet.2014.12.064
[8] Delorme, A., Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1): 9-21.
https://doi.org/10.1016/j.jneumeth.2003.10.009
[9] Dvey-Aharon, Z., Fogelson, N., Peled, A., Intrator, N. (2015). Schizophrenia detection and classification by advanced analysis of EEG recordings using a single electrode approach. PLoS One, 10(4): e0123033. https://doi.org/10.1371/journal.pone.0123033
[10] Stockwell, R.G., Mansinha, L., Lowe, R.P. (1996). Localization of the complex spectrum: The S transform. IEEE Transactions on Signal Processing, 44(4): 998-1001. http://dx.doi.org/10.1109/78.492555
[11] Johannesen, J.K., Bi, J., Jiang, R., Kenney, J.G., Chen, C.M.A. (2016). Machine learning identification of EEG features predicting working memory performance in schizophrenia and healthy adults. Neuropsychiatric Electrophysiology, 2: 3. https://doi.org/10.1186/s40810-016-0017-0
[12] Guyon, I., Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3: 1157-1182.
[13] Santos-Mayo, L., San-José-Revuelta, L.M., Arribas, J.I. (2016). A computer-aided diagnosis system with EEG based on the P3b wave during an auditory odd-ball task in schizophrenia. IEEE Transactions on Biomedical Engineering, 64(2): 395-407. https://doi.org/10.1109/TBME.2016.2558824
[14] Devijver, P.A., Kittler, J. (1982). Pattern Recognition: A Statistical Approach. Prentice Hall.
[15] Aslan, Z., Akın, M. (2019). Detection of schizophrenia on EEG signals by using relative wavelet energy as a Feature Extractor. UEMK 2019 4th International Energy & Engineering Congress, pp. 301-310.
[16] Thilakvathi, B., Shenbaga Devi, S., Bhanu, K., Malaippan, M. (2017). EEG signal complexity analysis for schizophrenia during rest and mental activity. Biomedical Research, 28(1).
[17] Phang, C.R., Ting, C.M., Noman, F., Ombao, H. (2019). Classification of EEG-based brain connectivity networks in schizophrenia using a multi-domain connectome convolutional neural network. arXiv Prepr. arXiv1903.08858.
[18] Aslan, Z. (2019). On the use of deep learning methods on medical images. The International Journal of Energy and Engineering Sciences, 3(2): 1-15.
[19] Hubel, D.H., Wiesel, T.N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195(1): 215-243. http://dx.doi.org/10.1113/jphysiol.1968.sp008455
[20] Min, S., Lee, B., Yoon, S. (2017). Deep learning in bioinformatics. Briefings in Bioinformatics, 18(5): 851-869. http://dx.doi.org/10.1093/bib/bbw068
[21] Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M., van Ginneken, B., Sánchez, C.I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42: 60-88. http://dx.doi.org/10.1016/j.media.2017.07.005
[22] Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep learning. MIT Press, 19: 305-307. http://dx.doi.org/10.1007/s10710-017-9314-z
[23] Prabhu, “Understanding of Convolutional Neural Network (CNN) — Deep Learning.” [Online]. Available: https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148, accessed on 10-Jan-2020.
[24] Arsa, D.M.S., Susila, A.A.N.H. (2019). VGG16 in batik classification based on random forest. 2019 International Conference on Information Management and Technology (ICIMTech), 1: 295-299.
[25] Tindall, L., Luong, C., Saad, A. (2015). Plankton classification using vgg16 network.
[26] Yuan, L., Cao, J. (2017). Patients’ EEG data analysis via spectrogram image with a convolution neural network. International Conference on Intelligent Decision Technologies, pp. 13-21. http://dx.doi.org/10.1007/978-3-319-59421-7_2
[27] Borisov, S.V., Kaplan, A.Y., Gorbachevskaya, N.L., Kozlova, I.A. (2005). Analysis of EEG structural synchrony in adolescents with schizophrenic disorders. Human Physiology, 31: 255-261. https://doi.org/10.1007/s10747-005-0042-z
[28] Olejarczyk, E., Jernajczyk, W. (2017). Graph-based analysis of brain connectivity in schizophrenia. PLoS One, 12(11): e0188629. https://doi.org/10.1371/journal.pone.0188629
[29] Kingma, D.P., Ba, J. (2014). Adam: A method for stochastic optimization. arXiv Prepr. arXiv1412.6980.
[30] Raschka, S. (2014). An overview of general performance metrics of binary classifier systems. arXiv Prepr. arXiv1410.5330.
[31] Goutte, C., Gaussier, E. (2005). A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. European Conference on Information Retrieval, pp. 345-359. http://dx.doi.org/10.1007/978-3-540-31865-1_25
[32] Kotikalapudi, R. (2007). keras-vis. GitHub, https://github.com/raghakot/keras-vis, accessed on 12 December 2019.
[33] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision, pp. 618-626. http://dx.doi.org/10.1007/s11263-019-01228-7