
Automatic Detection of Schizophrenia by Applying Deep Learning Over Spectrogram Images of EEG Signals

This study introduces a novel method for the automatic diagnosis of schizophrenia using deep learning techniques on EEG signal spectrograms. By transforming raw EEG data into 2D images through Short-Time Fourier Transform and employing a VGG-16 convolutional neural network, the method achieved classification accuracies of 95% and 97% across different age groups. The results indicate a strong correlation between frequency components of EEG signals and schizophrenia, showcasing the effectiveness of this approach compared to traditional methods.


Traitement du Signal

Vol. 37, No. 2, April, 2020, pp. 235-244


Journal homepage: http://iieta.org/journals/ts

Automatic Detection of Schizophrenia by Applying Deep Learning over Spectrogram Images of EEG Signals
Zülfikar Aslan1*, Mehmet Akin2
1 Institute of Natural Sciences, Dicle University, Diyarbakır 21280, Turkey
2 Electrical-Electronics Engineering, Faculty of Engineering, Dicle University, Diyarbakır 21280, Turkey

Corresponding Author Email: [email protected]

https://doi.org/10.18280/ts.370209

Received: 17 January 2020
Accepted: 20 March 2020

Keywords: schizophrenia, CNN, deep learning, spectrogram

ABSTRACT

This study presents a method that aims to automatically diagnose Schizophrenia (SZ) patients by using EEG recordings. Unlike many literature studies, the proposed method does not manually extract features from EEG recordings, instead it transforms the raw EEG into 2D by using Short-time Fourier Transform (STFT) in order to have a useful representation of frequency-time features. This work is the first in the relevant literature in using 2D time-frequency features for the purpose of automatic diagnosis of SZ patients. In order to extract most useful features out of all present in the 2D space and classify samples with high accuracy, a state-of-art Convolutional Neural Network architecture, namely VGG-16, is trained. The experimental results show that the method presented in the paper is successful in the task of classifying SZ patients and healthy controls with a classification accuracy of 95% and 97% in two datasets of different age groups. With this performance, the proposed method outperforms most of the literature methods. The experiments of the study also reveal that there is a relationship between frequency components of an EEG recording and the SZ disease. Moreover, Grad-CAM images presented in the paper clearly show that mid-level frequency components matter more while discriminating a SZ patient from a healthy control.

1. INTRODUCTION

Schizophrenia (SZ) is a serious neuropsychiatric disease that is estimated to affect nearly 1% of the world population. Patients with this disease suffer from hallucinations and delusions as well as diminished motivation and difficulty in expressing emotions [1]. These symptoms generally begin at an early age and the damage to the brain caused by the disease increases over time. Early diagnosis of the disease and patient-specific treatment may help reduce deformations in the brain; it is, however, difficult even for experts to diagnose the disease in its early stages [2]. Therefore, the development of computer methods that diagnose the disease in order to help clinicians in decision making has been an important research topic in the relevant literature. Even though most literature methods such as [3, 4] utilized traditional Machine Learning (ML) algorithms, recent developments in Deep Learning (DL) offer a promising new direction for researchers in the field.

Electroencephalography (EEG) recording is an important tool to analyze brain activity and functions. An EEG record contains information obtained from electrical signals detected by electrodes placed on different areas of the patient's head. These signals are often digitized and analyzed by dedicated computer programs in order to help experts evaluate information which is otherwise hard to analyze, specifically in cases like SZ where the raw signal does not directly show any disease-related anomaly. In the computer-aided diagnosis (CAD) of SZ, using EEG recordings is not the only approach in the literature. In order to perform automatic detection of the disease, researchers have used several different medical imaging techniques like magnetic resonance imaging (MRI), positron emission tomography (PET), functional magnetic resonance imaging (fMRI) and diffusion tensor magnetic resonance imaging (DTI). These alternative techniques, however, have not been considered as favorable as EEG for reasons such as the high cost of imaging hardware and the images produced by these machines not always being of the desired quality [5]. Therefore, EEG comes into prominence as a low-cost and reliable alternative to be used as an input to a CAD system designed to automatically detect many diseases such as SZ [6].

Therefore, in the relevant literature, most of the CAD systems to detect SZ focused on using EEG signals. Many researchers attempted to diagnose the disease by using traditional ML techniques over features extracted from EEG signals. The key to successful diagnosis of the disease is extracting relevant features from the signals, and therefore various methods have been proposed in the literature to this end. In the study of Kim et al. [7], 5 frequency bands from 21-channel EEG recordings are selected. They applied Fast Fourier Transform (FFT) over these bands, and the spectral power of these bands is calculated by using EEGLAB software [8]. They classified healthy and SZ patients with an accuracy of 62.2% by using the delta frequency. In another study, Dvey-Aharon et al. [9] preprocessed EEG signals by using the Stockwell transformation to extract features [10]. Their method called "TFFO" (Time-Frequency transformation followed by Feature-Optimization) showed a satisfactory accuracy between 92% and 93.9%. Moreover, Johannesen et al. [11] used Support Vector Machines (SVM) to extract most relevant features [12] from the EEG recordings in order to

predict working memory performance of healthy and SZ patients. Their method reached an accuracy of 87% in the prediction performance. Similarly, Santos-Mayo et al. [13] tested various ML approaches and feature selection algorithms including electrode grouping and filtering. As a result, they reported that Multi-Layer Perceptron (MLP) and SVM algorithms have the best classification accuracy with 93.42% and 92.23%, respectively. Moreover, classification over features obtained from the J5 feature selection algorithm [14] performed better. In the study of Aslan and Akın [15], features are extracted from the EEG signals by using Relative Wavelet Energy. These features are then fed to the K-Nearest Neighbors algorithm in order to classify healthy and SZ cases. They reported nearly 90% accuracy. Thilakvathi et al. [16] used the Support Vector Machines (SVM) algorithm to discriminate SZ patients. They used Shannon Entropy, Spectral Entropy, Information Entropy, Higuchi's Fractal Dimension and Kolmogorov Complexity values as input features to the SVM. They reported an accuracy of 88.5%. In all the literature methods mentioned up to now, the EEG recordings are not used in raw form; instead, researchers crafted features out of EEG signals and fed an ML algorithm of their choice with these features. This feature engineering approach has several advantages like good predictive performance, but it requires experts with comprehensive knowledge of the target domain. Also, all these extracted features are ad hoc solutions specific to the data and they are not proven to generalize well to all cases.

As an alternative approach, researchers have recently been investigating Deep Learning (DL) algorithms such as CNNs to automatically diagnose SZ patients because DL algorithms do not require the practitioner to extract any features manually from the input. The features in the given input are extracted automatically in layers of the network such as convolutional and pooling layers. There are only a few literature methods that utilize CNNs to detect SZ in given EEG signals. In one study, Phang et al. [17] proposed a method that accepts brain functional connectivity information as features. These features are extracted from EEG recordings by use of a vector autoregressive (VAR) model, partial directed coherence (PDC) and complex network measures of network topology. The obtained features are subsequently fed to two Convolutional Neural Network (CNN) models which in turn are fused into a Fully-Connected Neural Network (FCN) that is capable of classifying healthy controls and SZ patients. They reported an accuracy of 93.06% in the classification task. Their method is reported to reach a satisfactory accuracy but relies on additional data such as brain connectivity features. In another study, Oh et al. [6] utilized a CNN model to classify 19-channel EEG recordings of 14 healthy controls and 14 SZ patients. Their CNN model had a total of 11 layers including regular convolutional, pooling and dense layers. No preprocessing is used and the raw EEG channels are fed into the CNN model all at once. Their method is reported to reach an accuracy of 81.26% for subject based testing and 98.07% for non-subject based testing. Both methods that utilize DL to diagnose SZ lack interpretability due to the use of CNNs as a black-box solution. This is partly because raw EEG signals are not obviously correlated with possible visual outcomes of convolution operations.

In this study, we propose a method that attempts to detect SZ patients with high accuracy, a simple pipeline and interpretable outputs. The proposed study is novel in that it hypothesizes that frequency-time features of an EEG recording are sufficient to automatically discriminate SZ patients. The frequency-time features are obtained by converting raw EEG signals into 2D spectrogram images by using Short-time Fourier Transformation (STFT). To our knowledge, this is the first time in the relevant literature that spectrograms are used as inputs to detect SZ patients. The most useful features that are thought to be present in these images are automatically extracted by a state-of-the-art CNN model which also classifies samples into healthy or diseased in later layers of the network. Therefore, the proposed method has an advantage over many literature methods that require expert knowledge to extract useful features from EEG recordings because it extracts features automatically through the layers of the CNN. Moreover, when it is contrasted with literature methods that use a CNN, it still has advantages such as interpretable outputs, a simpler pipeline and an easier-to-implement architecture. The proposed method is tested against two different datasets, each of which contains patients and healthy controls of a different age group. The fact that the proposed method reaches high accuracy (95% and 97%) on both children and adult data shows that it is a robust method for the task of automatically diagnosing the SZ disease. Moreover, the obtained accuracy values are better than those of most of the literature methods. It should also be noted that one more advantage of the method is that, because it uses images as inputs, the model can output interpretable results such as Grad-CAM images that reveal the relationship between frequency components and the disease.

2. METHODS

2.1 Methods

2.1.1 Deep learning
Deep Learning is a recent approach in Machine Learning that adopts hierarchical learning of features with deeper neural network architectures. Network structures in DL look similar to those utilized in traditional ML; however, they differ in that DL algorithms attempt to learn features by themselves automatically while ML methods often require proper features given to the network by the practitioner. DL started to come forward as a successful alternative to traditional ML algorithms only after large-scale datasets became publicly available and the hardware required to process such data became cheaper and thus more accessible. Therefore, DL has recently been used frequently as a method to process, analyze and evaluate medical images, mostly in the form of CNNs [18]. This study also uses a CNN model to classify spectrogram images.

2.1.2 Convolutional Neural Networks
Convolutional Neural Networks are DL networks that are designed to process multimedia data types (e.g., images) in a way that features can be automatically extracted in a hierarchical manner through the layers of the network [19]. At the core, a CNN model often consists of two modules: a) feature extraction through convolutional and pooling layers, and b) a classification stage via Fully-Connected Network layers (FCN) which operates similarly to a traditional Multi-Layer Perceptron (MLP).

In a regular CNN model, there are a number of subsequent convolutional and pooling layers each of which is responsible for extracting features from the previous layer's output. In this way,

early layers of the network extract simple features such as lines in an image and feed later layers with these features so that subsequent layers can process these simple features and extract more complex features like objects in an image. This kind of hierarchical learning of features is inspired by the human cortex in which cells respond to visual elements in a similar hierarchical way [20]. Figure 1 depicts a block of convolutional layer, non-linear layer and pooling layer.

Figure 1. A block of convolutional layer, non-linear layer and pooling layer

In a convolutional layer, there are typically many filters, $W = \{W_1, W_2, \ldots, W_k\}$, each of which is convolved with the input image to calculate a feature map $X_k$ of the image. Therefore, we have as many feature maps as the number of filters in the convolutional layer. More formally, each feature map is calculated via Eq. (1) where $b$ denotes bias and $\sigma(\cdot)$ is a non-linear transfer function [21]:

$X_k^{l} = \sigma\left(W_k^{l-1} * X^{l-1} + b_k^{l-1}\right)$ (1)

A convolutional layer is generally followed by a pooling layer in which feature maps are downsampled in accordance with the selected pooling function: max, min or avg. The function of choice is applied to every group of pixels in the feature map and the result of the function (e.g., the maximum value in that group) is selected to represent the group in the new downsampled feature map. In an overview, a CNN model consists of three different layer types: (1) convolutional layers, (2) pooling layers and (3) a FCN [22].

The convolutional layer is the first layer that extracts features from the input image. The convolution operation uses small matrices (e.g., of size 3x3) called filters to learn image features while keeping spatial information in the image. The pooling layer is in general used to reduce the number of parameters. Since it keeps the spatial information, it is often known as a downsampling operation. The behaviour of the downsampling operation depends on the function selected. For instance, in max pooling, the maximum value in the region of interest is selected to replace all values in that region. The other types, min pooling and avg pooling, do a similar job but they take the minimum or the average value instead. The output of the pooling layer is connected to a FCN that does the classification [23]. A FCN is often a MLP with a Softmax output layer. As usual, it is trained with the backpropagation algorithm [24].

2.1.3 VGG-16 architecture

VGG-16 is a state-of-the-art 16-layer CNN model developed by the Oxford University Visual Geometry Group for the ILSVRC-2014 competition. Its major difference is that it has a deeper architecture than its predecessors. In VGG-16, images are converted to 224x224x3 (RGB) and passed through 5 blocks of convolutional layers each of which has a filter size of 3x3. Each block ends with a max pooling layer in which inputs are downsampled by a factor of 2. Then the acquired feature set is connected to a FCN that completes the classification task. Figure 2 depicts a general overview of the VGG-16 architecture [25].

VGG-16 is an example of a family of state-of-the-art CNN models that includes other well-known architectures such as CIFAR, GoogLeNet and AlexNet. Before choosing VGG-16, we empirically tested other CNN models with other data and observed that, with some exceptions, all models performed comparably. VGG-16, however, outperformed the others slightly. Therefore, for the sake of simplicity we only show the results with VGG-16 in the paper.

2.1.4 Generating spectrogram images from EEG signals

Short-Time Fourier Transformation (STFT) is a general purpose tool that converts a signal in the time domain into a joint time-frequency representation. The STFT is calculated by multiplying the signal with a sliding window function and taking the Fourier transform of each windowed segment. A spectrogram is a visual depiction of the signal in the frequency domain within a time interval [26]. Therefore, it shows how the frequency components of the signal change in time. In this study, short segments of the EEG signal (e.g., 5 seconds long) are converted into spectrograms so that we can have the frequency components of different time points in one image. We used MATLAB software to obtain spectrogram images of these short EEG segments. As an example of the default configuration of MATLAB's spectrogram function, Nx = 1024 samples of a signal that consists of a sum of sinusoids are generated. The normalized frequencies of the sinusoids are 2π/5 rad/sample and 4π/5 rad/sample.

In the example spectrogram in Figure 3, the y-axis represents frequency, which is normalized between 0 and 1, while the x-axis stands for time. Colors approaching red show high values at that frequency whereas colors close to blue are used to show low intensity values. Therefore, in the example spectrogram, it is observed that low frequency components in the signal are more intense than high frequency components at most of the time segments. High frequency components reach high values only at a few time points and are thus mostly depicted with different tones of blue in the example spectrogram.

2.1.5 General architecture of the proposed method
As mentioned, our method does not include any manual processing or ad hoc altering of the input. Each input, i.e., a set of EEG channel data, is passed through a series of transformations (segmentation, spectrogram generation and CNN evaluation) and finally results in a class value, either SZ patient or healthy control. Figure 4 depicts a general overview of the process.

Figure 2. VGG-16 architecture
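The convolution, non-linearity and pooling block of Section 2.1.2 (Eq. (1)), which VGG-16 stacks with 3x3 filters and 2x2 max pooling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy 6x6 input, the filter values, the bias and the choice of ReLU for the non-linearity are all assumptions made for illustration. As in most DL frameworks, the "convolution" below is computed as a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_valid(x, w, b=0.0):
    """'Valid' 2D cross-correlation of image x with filter w, plus bias b."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

def relu(z):
    """Non-linear transfer function sigma(.) of Eq. (1); ReLU is an assumption."""
    return np.maximum(z, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)    # toy 6x6 "image" (hypothetical)
w = np.array([[-1.0, 0.0, 1.0]] * 3)            # toy 3x3 horizontal-gradient filter
feature_map = relu(conv2d_valid(x, w, b=0.5))   # Eq. (1) for a single filter
pooled = max_pool2d(feature_map, size=2)        # downsample by a factor of 2
print(feature_map.shape, pooled.shape)          # (4, 4) (2, 2)
```

A real convolutional layer applies many such filters in parallel and learns their values by backpropagation; the loop-based form is chosen here only for readability.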
Figure 3. An example spectrogram generated by Matlab software
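For readers without MATLAB, the spectrogram computation of Section 2.1.4 can be sketched in NumPy as a windowed FFT. The test signal reuses the sum of sinusoids quoted in the text (1024 samples, 2π/5 and 4π/5 rad/sample). The Hann window, the window length of 128 and the hop of 64 samples are assumptions for illustration; they are not the settings used by the authors or the defaults of MATLAB's spectrogram function.

```python
import numpy as np

def stft_spectrogram(x, win_len=128, hop=64):
    """Power spectrogram: Hann-windowed frames -> one-sided FFT -> |.|^2.

    Returns an array of shape (frequency bins, time frames).
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)
    return (np.abs(spectrum) ** 2).T

n = np.arange(1024)
x = np.sin(2 * np.pi / 5 * n) + np.sin(4 * np.pi / 5 * n)
S = stft_spectrogram(x)
print(S.shape)  # (65, 15)

# Frequency axis in rad/sample, and the average power per frequency bin.
freqs = np.fft.rfftfreq(128) * 2 * np.pi
mean_power = S.mean(axis=1)
```

Plotting `10 * np.log10(S)` as an image with a color map gives a picture of the same kind as Figure 3, with two horizontal bands at the frequencies of the two sinusoids.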

Figure 4. Flowchart of the proposed method
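The first stage of the pipeline, segmentation, can be sketched as follows. The dimensions (128 Hz, 16 channels, 60 seconds, 5-second segments) follow the description of Dataset A in Sections 3.1.1 and 3.2; the random array standing in for a recording and the order in which channels are flattened into one vector are assumptions, since the paper does not specify the latter.

```python
import numpy as np

def segment_recording(eeg, fs, seg_seconds=5):
    """Split a (samples, channels) recording into non-overlapping segments.

    Returns an array of shape (n_segments, seg_len, channels); trailing
    samples that do not fill a whole segment are discarded.
    """
    seg_len = int(fs * seg_seconds)
    n_segments = eeg.shape[0] // seg_len
    trimmed = eeg[:n_segments * seg_len]
    return trimmed.reshape(n_segments, seg_len, eeg.shape[1])

fs, channels, seconds = 128, 16, 60
# Stand-in for one subject's 7680 x 16 recording (random data, hypothetical).
eeg = np.random.default_rng(0).standard_normal((fs * seconds, channels))

segments = segment_recording(eeg, fs)
# One vector per segment; the flattening order is an assumption.
vectors = segments.reshape(len(segments), -1)
print(segments.shape, vectors.shape)  # (12, 640, 16) (12, 10240)
```

Each of the 12 vectors would then be turned into a spectrogram image and passed to the CNN.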

3. RESULTS

3.1 Material

3.1.1 Dataset A
The first dataset used in this study is the set of EEG recordings that belong to 39 healthy control subjects and 45 children that have the same kind of schizophrenic disorder. The diagnoses of all SZ patients in this dataset are confirmed by the Mental Health Research Center (MHRC) experts. None of the patients in this dataset has undergone chemical treatment. The eldest of the SZ patients is 14 years old and the youngest one is 10 years and 8 months old, while the eldest and the youngest control subjects are 13 years 9 months old and 11 years old, respectively. The average age in both groups is 12 years and 3 months [27]. The data is recorded while the subjects are comfortable and awake with their eyes shut and 16 electrodes connected to their heads. EEG is recorded in accordance with the international 10-20 standard with the electrode sequence of O1, O2, P3, P4, Pz, T5, T6, C3, C4, Cz, T3, T4, F3, F4, F7 and F8 (see Figure 5). Each EEG recording is 60 seconds long and recorded with a sampling rate of 128 Hz. Therefore, the EEG data for each subject is represented with a 7680 x 16 matrix.

3.1.2 Dataset B
The second dataset utilized in this study contains EEG recordings of 14 healthy controls and 14 SZ patients. This data is recorded by the Institute of Psychiatry and Neurology in Warsaw, Poland from 14 male and 14 female subjects with average ages of 27.3±3.3 and 28.3±4.1, respectively. The subjects keep their eyes shut during the recording, which is taken with a sampling frequency of 250 Hz for between 12 and 15 minutes. Each record has 19 channels with the electrode sequence of Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1 and O2 (see Figure 6) [28].

Figure 5. 16 channel electrode setup for Dataset A

Figure 6. 19 channel electrode setup for Dataset B

3.2 Experiments

In this study, the proposed method is evaluated against two datasets of samples. In the first dataset (A), there are 16-channel EEG recordings of 39 healthy children and 45 children with SZ disease. Each sample is divided into 5-second long segments each of which is represented with a vector of length 10240 (128 values per second x 16 channels x 5 seconds). These vectors are then converted into spectrograms of size 224x224. Therefore, we have 1008 images for 84 individuals in Dataset A. Figures 7 and 8 show example spectrogram images for a healthy control and a SZ patient for Dataset A and B, respectively.

Figure 7. Example spectrogram images from Dataset A for (a) healthy control and (b) SZ patient

The data set of spectrogram images is split into train and test sets with a ratio of 80% and 20%, respectively. These images are fed into the VGG-16 CNN model in order to classify each as either healthy or SZ patient. The hyperparameters for the network are taken as follows: input image size 112x112, batch size 128, learning rate 1.0e-4 and the Adam optimizer [29]. Through the experiments we observed that 50 epochs of training sufficed for the network to converge. On average, the network reached an accuracy of 95% on the test set.

In the second dataset, there are 19-channel EEG recordings of 28 adults (14 controls and 14 SZ patients). The length of the recordings varied between 12 and 15 minutes. Therefore, in the healthy and SZ groups, the length of each recording is set to the length of the shortest record in the group. Subsequently, we obtained 173 segments for each healthy control and 148 segments for each SZ patient, where the length of each segment was again 5 seconds. After this point, a total of 4494 spectrogram images are processed similarly to the ones in Dataset A with the same settings and hyperparameters. As a result, the network reached an accuracy of 97.4% at 30 epochs. Figure 9 shows the change in the accuracy with respect to training time in epochs.

Figure 9. Accuracy against training time in epochs for (a) Dataset A and (b) Dataset B

As can be seen in Figure 9, a sufficient amount of training is important and affects the accuracy of the model significantly. Unfortunately, there is no predefined number for the required amount of training in the literature and it is in general determined empirically for each dataset through experiments.
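The image counts quoted in this section can be checked with a few lines of arithmetic. The sketch below only reuses numbers stated in the text (subjects per group, segments per subject, the 80/20 split). The rounding rule for the test set is an assumption, since the paper does not state how fractional sizes were handled.

```python
def n_images(subjects, segments_per_subject):
    """Number of spectrogram images when every segment yields one image."""
    return subjects * segments_per_subject

# Dataset A: 60 s recordings cut into 5 s segments give 12 segments per subject.
images_a = n_images(39 + 45, 60 // 5)

# Dataset B: per-subject segment counts as reported in the text.
images_b = n_images(14, 173) + n_images(14, 148)

def split_sizes(total, test_fraction=0.2):
    """Train/test sizes for a given test fraction (rounding is an assumption)."""
    n_test = round(total * test_fraction)
    return total - n_test, n_test

print(images_a, images_b)     # 1008 4494
print(split_sizes(images_a))  # (806, 202)
print(split_sizes(images_b))  # (3595, 899)
```

The resulting test-set sizes are close to the support values of 202 and 898 reported in Tables 1 and 2; the difference of one image for Dataset B suggests a slightly different rounding in the original experiments.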
Figure 8. Example spectrogram images from Dataset B for (a) healthy control and (b) SZ patient

3.3 Evaluation metrics

Figure 10. The confusion matrix

The experiments of the study are evaluated and the results are confirmed with a number of well-known and widely-used evaluation metrics. The details and interpretations of these metrics are explained in other papers [30, 31] and therefore we only include basic calculations of these metrics.

3.3.1 Confusion matrix (as shown in Figure 10 above)

3.3.2 Prediction error and accuracy

$ERR = \frac{FP + FN}{FP + FN + TP + TN} = 1 - ACC$ (2)

$ACC = \frac{TP + TN}{FP + FN + TP + TN} = 1 - ERR$ (3)

3.3.3 False and true positive rates

$FPR = \frac{FP}{N} = \frac{FP}{FP + TN}$ (4)

$TPR = \frac{TP}{P} = \frac{TP}{FN + TP}$ (5)

3.3.4 Precision, recall, F1 score
F1-Score is a more capable metric that can evaluate the performance of a classifier in a more balanced way than a single FPR or TPR metric. In order to calculate the F1-Score, the Precision (PRE) and Recall (REC) metrics should be calculated beforehand; Recall is also known as TPR or Sensitivity (SEN). Eqns. (6)-(8) show the calculation of the F1-score [31].

$PRE = \frac{TP}{TP + FP}$ (6)

$REC = TPR = \frac{TP}{P} = \frac{TP}{FN + TP}$ (7)

$F1 = 2 \cdot \frac{PRE \cdot REC}{PRE + REC}$ (8)

As an additional widely-used metric, the Specificity (SPC) measures how well the classifier avoids false positives and is ideally equal to 1.

$SPC = TNR = \frac{TN}{N} = \frac{TN}{FP + TN}$ (9)

3.3.5 Receiver operator characteristic (ROC) and AUC (area under the curve)
The ROC curve is a common graphical metric used in the ML literature. It plots TPR (y-axis) against FPR (x-axis) as the threshold used by the binary classifier to discriminate between 0 and 1 values is varied. The AUC (Area Under the Curve) value is the total area under the curve. A high AUC is better because it tells that the classifier does well with most threshold values while classifying "0"s as "0" and "1"s as "1". Figure 11 shows an example ROC curve.

4. DISCUSSION

An ideal classifier should detect diseased patients with a high rate while ruling out all healthy controls as non-diseased. Therefore, both the Precision and Recall metrics of the classifier should be high at the same time, which eventually results in a high F1-score as well. The results of the experiments shown in Table 1 and Table 2 show that our proposed method performs well in these aspects of the classification task with 95% and 97% F1-score values for Dataset A and B, respectively. Note that the support value in Tables 1 and 2 stands for the true number of samples for each row.

The confusion matrices given in Figure 12 clearly show that the correct classification rate for diseased and non-diseased samples is high (>= 0.94) while misclassification is very low with values close to 0 (<= 0.05) for both datasets.

Table 1. Performance results of the proposed method against Dataset A

Dataset A          Precision   Recall   F1-score   Support
Healthy Control    0.95        0.95     0.95       94
SZ Patient         0.95        0.95     0.95       108
Accuracy                                0.95       202
Macro avg          0.95        0.95     0.95       202
Weighted avg       0.95        0.95     0.95       202

Table 2. Performance results of the proposed method against Dataset B

Dataset B          Precision   Recall   F1-score   Support
Healthy Control    0.97        0.98     0.98       484
SZ Patient         0.98        0.96     0.97       414
Accuracy                                0.97       898
Macro avg          0.97        0.97     0.97       898
Weighted avg       0.97        0.97     0.97       898

A high TPR and a low FPR obtained for a specific configuration do not mean that a binary classifier is good at all threshold values; it may still suffer from poor performance because of close values produced for "0"s and "1"s. However, this is not the case for the proposed method since AUC values are obtained as 0.95 and 0.974 for Dataset A and B, respectively (see Figure 13). That is, the values produced to represent "0"s and "1"s are very close to "0" and "1" and thus most threshold values suffice to discriminate between the two. That makes the proposed method a robust classifier for most cases.
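The relationship between thresholds, the ROC curve and the AUC can be made concrete with a small pure-Python sketch. It sweeps the decision threshold over the classifier scores, collects (FPR, TPR) points and integrates them with the trapezoidal rule. The labels and scores are toy values chosen for illustration, not outputs of the proposed model.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold over all scores.

    Assumes binary labels (0/1) with at least one positive and one negative.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for k, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        last_of_group = k + 1 == len(ranked) or ranked[k + 1][0] != score
        if last_of_group:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]            # toy labels (hypothetical)
scores = [0.1, 0.4, 0.35, 0.8]   # toy classifier outputs (hypothetical)
print(auc(roc_points(y_true, scores)))  # 0.75
```

When the scores of the two classes are perfectly separated, every threshold between them gives TPR = 1 and FPR = 0 and the AUC equals 1, which is the situation the paragraph above describes as values "very close to 0 and 1".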

Figure 11. An example ROC curve
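The scalar metrics of Section 3.3 follow directly from the four confusion-matrix counts. The sketch below implements Eqs. (2)-(9) in plain Python; the counts in the usage example are hypothetical and are not taken from the confusion matrices of the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics of Eqs. (2)-(9) from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and at
    least one positive prediction).
    """
    total = tp + fp + fn + tn
    acc = (tp + tn) / total          # Eq. (3)
    pre = tp / (tp + fp)             # Eq. (6)
    rec = tp / (fn + tp)             # Eqs. (5) and (7): REC = TPR
    return {
        "ERR": 1 - acc,                      # Eq. (2)
        "ACC": acc,
        "FPR": fp / (fp + tn),               # Eq. (4)
        "TPR": rec,
        "PRE": pre,
        "REC": rec,
        "F1": 2 * pre * rec / (pre + rec),   # Eq. (8)
        "SPC": tn / (fp + tn),               # Eq. (9)
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that SPC and FPR always sum to 1, so reporting one of them determines the other.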

Figure 12. The confusion matrices of the proposed method for (a) Dataset A and (b) Dataset B

Dataset A was previously evaluated by Phang et al. [17], who reported a classification accuracy of 93.06%. Their method was a CNN model that used raw EEG signals as input. The method used an ensemble of 1D and 2D CNNs each of which classified the given signals by using different sets of features like brain connectivity features. The second dataset was also evaluated against a CNN model proposed by Oh et al. [6]. In their method, raw EEG signals are fed without preprocessing to a CNN that directly predicts the class value with an accuracy of 98.07%.

In comparison to these previous literature methods related to our work, our method outperforms most of them and performs comparably with one study [6]. To our knowledge, due to reasons such as the lack of abundant publicly available SZ patient data, most methods measured the performance with a single set of data whereas our method is evaluated against two separate sets of data (children and adult). This supports the robustness of the proposed method for different cases. Secondly, our improved performance with respect to methods that utilize raw EEG signals may be linked to converting EEGs into spectrogram images which exhibit the frequency information of the signal explicitly.

Figure 13. ROC for (a) Dataset A and (b) Dataset B

Furthermore, CNNs are designed to classify images that have spatial relations between pixels and mostly rely upon these spatial features while classifying an object. In a raw EEG signal, only temporal relationships are explicitly available. A spectrogram, being an image that keeps frequency and time information with spatial relationships, is thus a more suitable input for a CNN model.

Furthermore, the proposed method in this study is capable of producing more human-interpretable outputs. In this context, when the spectrogram images for a healthy control and a SZ patient shown in Figures 7 and 8 are inspected, it can be seen that these spectrograms have differences such as different frequency values at different times.

However, it is hard for a human to generalize these differences for all samples in a dataset and formulate them quantitatively. In this regard, we attempt to reveal what these differences are by using a technique called Activation Maximization (AC) [32].

An AC image is an artificial image synthesized by iteratively finding the values that maximize the output of the network for a single class. Therefore, it can be thought of as an ideal input that represents a class. Even though there are clear differences in the AC images presented in Figure 14 for a healthy control and a SZ patient, these images only show that the network responds to spectrograms of different classes in a very diverse way.

This is because the features seen in AC images are artificially generated to maximize the filter activation and therefore do not necessarily represent real spectrogram images.
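The idea behind Activation Maximization, iteratively adjusting an input so that the network output for one class grows, can be illustrated on a toy model. The sketch below uses a small linear softmax classifier with random weights in place of the trained VGG-16, plain gradient ascent on the log-probability of the target class, and an L2 penalty that keeps the input bounded. The model, the step size, the penalty weight and the number of steps are all assumptions for illustration and do not reproduce the procedure of [32].

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def activation_maximization(W, b, target, steps=200, lr=0.1, l2=0.01):
    """Gradient ascent on the input x to maximize log p(target | x).

    For a linear softmax model p = softmax(W x + b), the gradient of
    log p[target] with respect to x is W[target] - sum_k p[k] * W[k].
    """
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        p = softmax(W @ x + b)
        grad = W[target] - p @ W
        x += lr * (grad - l2 * x)   # ascent step with L2 regularization
    return x

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 16))   # toy 2-class model on 16 "pixels" (hypothetical)
b = np.zeros(2)

x_star = activation_maximization(W, b, target=1)
p_before = softmax(b)[1]           # class probability for the all-zero input
p_after = softmax(W @ x_star + b)[1]
print(p_before, p_after > p_before)
```

The synthesized input `x_star` is whatever pattern drives the class score up, which is why, as noted above, such images need not resemble real spectrograms.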

Figure 14. Activation maximization images for (a) healthy control and (b) SZ patient

Figure 15. Grad-CAM images obtained from a set of individuals of (a) healthy controls and (b) SZ patients

Table 3. A summary of relevant literature methods that aim to automatically detect SZ patients

Year of study | The study | Methods | Data | Accuracy
2015 | Kim et al. [7] | Obtaining different frequency bands; calculation of spectral power; Fast Fourier Transformation; ROC analysis | 90 healthy control, 90 SZ patient | 62.2%
2015 | Dvey-Aharon et al. [9] | Stockwell transformation; "TFFO" (Time-Frequency transformation followed by Feature-Optimization) | 25 healthy control, 50 SZ patient | Between 92% and 93.9%
2016 | Johannesen et al. [11] | Statistical analysis over spectral power; SVM classification | 12 healthy control, 40 SZ patient | 87%
2016 | Santos-Mayo et al. [13] | EEGLAB feature extraction; J5 feature extraction; MLP and SVM classification | 31 healthy control, 16 SZ patient | MLP: 93.42%, SVM: 92.23%
2017 | Thilakvathi B et al. [16] | Shannon entropy; spectral entropy; information entropy; Higuchi's fractal dimension; Kolmogorov complexity and approximate entropy; SVM classification | 23 healthy control, 55 SZ patient | 88.5%
2019 | Aslan and Akın [15] | Wavelet; Relative Wavelet Energy; KNN classification | 39 healthy control, 45 SZ patient | 90%
2019 | Phang et al. [17] | CNN | 39 healthy control, 45 SZ patient | 93.06%
2019 | Shu Lih Oh et al. [6] | CNN | 14 healthy control, 14 SZ patient | 98.07%

Therefore, as a second attempt, we utilized another technique called Gradient-weighted Class Activation Mapping (Grad-CAM) [33] in order to understand what really differs between a healthy control and a SZ patient. In a Grad-CAM image, a color map is overlaid on an input image, where the colors show the relevance of each region of the image to the predicted class. Colors close to red therefore mean that the region holds important spatial features for the predicted class. Figure 15 shows Grad-CAM images of different individuals grouped by class. The Grad-CAM images belonging to the diseased group clearly reveal a pattern in which the colors are usually centered in the middle of the image, whereas in healthy control images high-intensity colors either never appear or appear only at the top and bottom of the image.

Please note that the pattern shown in Figure 15 is common to other individuals not shown in the figure. Therefore, a few conclusions can be drawn from these images. First and foremost, frequency components matter in the discrimination of SZ patients and healthy controls. Moreover, the mid-level frequency components are the most important ones for discriminating SZ patients, as almost all samples in this group have a similar pattern in the spectrogram region that represents mid-level frequency components.

Table 3 summarizes the relevant literature methods with respect to their methodology and the accuracy reached. The table shows that the method proposed in this paper outperforms all the listed methods except the study conducted by Oh et al. [6], whose performance is comparable to that of the proposed method. Beyond the improved performance tested on two different datasets, the proposed method has several advantages over the literature methods. Firstly, unlike the literature studies that utilize ML methods, it does not have a preprocessing stage to manually extract features. Furthermore, it does not need to preprocess spectrogram images. Secondly, its results are interpretable and reveal which frequency components matter for EEG recordings of SZ patients. Lastly, it has a simple pipeline: raw EEG signals are transformed into spectrograms, which are then given to the CNN model.

5. CONCLUSION

In this study, a method is proposed to automatically diagnose SZ patients. The experiments conducted on two separate datasets show that the method is capable of discriminating healthy controls and SZ patients in a robust and accurate way. The accuracy values reached by the method are 95% and 97% for Dataset A and B, respectively. With these results, the proposed method outperforms most of the methods in the relevant literature.

Furthermore, the proposed method reveals that analyzing frequency components in an EEG recording is a robust way of detecting SZ, which is described as a brain disorder. In particular, Grad-CAM images show that the mid-level frequency components of an EEG record of a SZ patient exhibit a specific pattern that makes the records separable. The method introduced in the paper can be used as a framework for CAD studies that attempt to detect diseases considered to leave some trace in EEG recordings.

The proposed method uses a state-of-the-art CNN architecture called VGG-16. As new CNN models are introduced frequently, newer models may help improve the classification performance of the proposed method. Also, it should be noted that low-complexity models with fewer layers/nodes/parameters and lower computational requirements will nonetheless be preferable even though they do not come with a performance advantage.

ETHICAL APPROVAL

Data used in this study are taken from publicly available datasets, each of which has been used in previous studies. The first dataset is available at http://brain.bio.msu.ru/eeg_schizophrenia.htm. The second one is accessible from https://repod.pon.edu.pl/dataset/eeg-in-schizophrenia.

REFERENCES

[1] Buettner, R., Hirschmiller, M., Schlosser, K., Rössle, M., Fernandes, M., Timm, I.J. (2019). High-performance exclusion of schizophrenia using a novel machine learning method on EEG data. 2019 IEEE International Conference on E-health Networking, Application & Services (HealthCom), Bogota, Colombia, pp. 1-6. http://dx.doi.org/10.1109/HealthCom46333.2019.9009437
[2] Zhang, L. (2019). EEG signals classification using machine learning for the identification and diagnosis of schizophrenia. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, pp. 4521-4524. http://dx.doi.org/10.1109/EMBC.2019.8857946
[3] Shim, M., Hwang, H.J., Kim, D.W., Lee, S.H., Im, C.H. (2016). Machine-learning-based diagnosis of schizophrenia using combined sensor-level and source-level EEG features. Schizophrenia Research, 176(2-3): 314-319. http://dx.doi.org/10.1016/j.schres.2016.05.007
[4] Cao, B., Cho, R.Y., Chen, D.C., Xiu, M.H., Wang, L., Soares, J.C., Zhang, X.Y. (2020). Treatment response prediction and individualized identification of first-episode drug-naïve schizophrenia using brain functional connectivity. Molecular Psychiatry, 25: 906-913. https://doi.org/10.1038/s41380-018-0106-5
[5] Jack Jr, C.R., Lowe, V.J., Weigand, S.D., Wiste, H.J., Senjem, M.L., Knopman, D.S., Shiung, M.M., Gunter, J.L., Boeve, B.F., Kemp, B.J., Weiner, M., Petersen, R.C., the Alzheimer's Disease Neuroimaging Initiative. (2009). Serial PIB and MRI in normal, mild cognitive impairment and Alzheimer’s disease: Implications for sequence of pathological events in Alzheimer’s disease. Brain, 132(5): 1355-1365. https://doi.org/10.1093/brain/awp062
[6] Oh, S.L., Vicnesh, J., Ciaccio, E.J., Yuvaraj, R., Acharya, U.R. (2019). Deep convolutional neural network model for automated diagnosis of schizophrenia using EEG signals. Applied Sciences, 9(14): 2870. https://doi.org/10.3390/app9142870
[7] Kim, J.W., Lee, Y.S., Han, D.H., Min, K.J., Lee, J., Lee, K. (2015). Diagnostic utility of quantitative EEG in un-medicated schizophrenia. Neuroscience Letters, 589: 126-131. https://doi.org/10.1016/j.neulet.2014.12.064
[8] Delorme, A., Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1): 9-21.

https://doi.org/10.1016/j.jneumeth.2003.10.009
[9] Dvey-Aharon, Z., Fogelson, N., Peled, A., Intrator, N. (2015). Schizophrenia detection and classification by advanced analysis of EEG recordings using a single electrode approach. PLoS One, 10(4): e0123033. https://doi.org/10.1371/journal.pone.0123033
[10] Stockwell, R.G., Mansinha, L., Lowe, R.P. (1996). Localization of the complex spectrum: The S transform. IEEE Transactions on Signal Processing, 44(4): 998-1001. http://dx.doi.org/10.1109/78.492555
[11] Johannesen, J.K., Bi, J., Jiang, R., Kenney, J.G., Chen, C.M.A. (2016). Machine learning identification of EEG features predicting working memory performance in schizophrenia and healthy adults. Neuropsychiatric Electrophysiology, 2: 3. https://doi.org/10.1186/s40810-016-0017-0
[12] Guyon, I., Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3: 1157-1182.
[13] Santos-Mayo, L., San-José-Revuelta, L.M., Arribas, J.I. (2016). A computer-aided diagnosis system with EEG based on the P3b wave during an auditory odd-ball task in schizophrenia. IEEE Transactions on Biomedical Engineering, 64(2): 395-407. https://doi.org/10.1109/TBME.2016.2558824
[14] Devijver, P.A., Kittler, J. (1982). Pattern Recognition: A Statistical Approach. Prentice Hall.
[15] Aslan, Z., Akın, M. (2019). Detection of schizophrenia on EEG signals by using relative wavelet energy as a Feature Extractor. UEMK 2019 4th International Energy & Engineering Congress, pp. 301-310.
[16] Thilakvathi, B., Shenbaga Devi, S., Bhanu, K., Malaippan, M. (2017). EEG signal complexity analysis for schizophrenia during rest and mental activity. Biomedical Research, 28(1).
[17] Phang, C.R., Ting, C.M., Noman, F., Ombao, H. (2019). Classification of EEG-based brain connectivity networks in schizophrenia using a multi-domain connectome convolutional neural network. arXiv Prepr. arXiv1903.08858.
[18] Aslan, Z. (2019). On the use of deep learning methods on medical images. The International Journal of Energy and Engineering Sciences, 3(2): 1-15.
[19] Hubel, D.H., Wiesel, T.N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195(1): 215-243. http://dx.doi.org/10.1113/jphysiol.1968.sp008455
[20] Min, S., Lee, B., Yoon, S. (2017). Deep learning in bioinformatics. Briefings in Bioinformatics, 18(5): 851-869. http://dx.doi.org/10.1093/bib/bbw068
[21] Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M., van Ginneken, B., Sánchez, C.I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42: 60-88. http://dx.doi.org/10.1016/j.media.2017.07.005
[22] Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep learning. MIT Press, 19: 305-307. http://dx.doi.org/10.1007/s10710-017-9314-z
[23] Prabhu, “Understanding of Convolutional Neural Network (CNN) — Deep Learning.” [Online]. Available: https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148, accessed on 10-Jan-2020.
[24] Arsa, D.M.S., Susila, A.A.N.H. (2019). VGG16 in batik classification based on random forest. 2019 International Conference on Information Management and Technology (ICIMTech), 1: 295-299.
[25] Tindall, L., Luong, C., Saad, A. (2015). Plankton classification using vgg16 network.
[26] Yuan, L., Cao, J. (2017). Patients’ EEG data analysis via spectrogram image with a convolution neural network. International Conference on Intelligent Decision Technologies, pp. 13-21. http://dx.doi.org/10.1007/978-3-319-59421-7_2
[27] Borisov, S.V., Kaplan, A.Y., Gorbachevskaya, N.L., Kozlova, I.A. (2005). Analysis of EEG structural synchrony in adolescents with schizophrenic disorders. Human Physiology, 31: 255-261. https://doi.org/10.1007/s10747-005-0042-z
[28] Olejarczyk, E., Jernajczyk, W. (2017). Graph-based analysis of brain connectivity in schizophrenia. PLoS One, 12(11): e0188629. https://doi.org/10.1371/journal.pone.0188629
[29] Kingma, D.P., Ba, J. (2014). Adam: A method for stochastic optimization. arXiv Prepr. arXiv1412.6980.
[30] Raschka, S. (2014). An overview of general performance metrics of binary classifier systems. arXiv Prepr. arXiv1410.5330.
[31] Goutte, C., Gaussier, E. (2005). A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. European Conference on Information Retrieval, pp. 345-359. http://dx.doi.org/10.1007/978-3-540-31865-1_25
[32] Kotikalapudi, R. (2007). keras-vis. GitHub, https://github.com/raghakot/keras-vis, accessed on 12 December 2019.
[33] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision, pp. 618-626. http://dx.doi.org/10.1007/s11263-019-01228-7
