Effect of Photic Stimulation For Migraine Detection Using Random Forest and Discrete Wavelet Transform
Article info
Article history: Received 22 October 2017; received in revised form 24 October 2018; accepted 6 December 2018
Keywords: Medical decision support system; Migraine; Electroencephalogram (EEG); Discrete wavelet transform (DWT); Machine learning algorithms

Abstract
Migraine is a neurological disorder characterized by persistent attacks and marked by sensitivity to light. One of the main reasons migraine is a serious problem is that physicians cannot diagnose it easily, because many of its symptoms overlap with those of other diseases, such as epilepsy and tension headache. Consequently, a growing number of studies investigate computerized decision support systems for migraine diagnosis. In most laboratory studies, flash stimulation at different frequencies is applied while electroencephalogram (EEG) signals are recorded, and the recordings are analysed in time windows of variable length (in seconds). The main contribution of this study is an investigation of the effect of flash stimulation on classification accuracy, and of how to find an effective window length for EEG signal classification. To this end, we tested different machine learning algorithms on features extracted from the EEG signals with the discrete wavelet transform. Our tests on a real-world dataset recorded in the laboratory show that flash stimulation can improve classification accuracy by more than 10%. The choice of time window length has a similarly strong effect: selecting the proper window length is crucial for accurate migraine identification.
© 2018 Elsevier Ltd. All rights reserved.
1. Introduction

Migraine is a persistent neurological disorder with highly severe symptoms, such as throbbing pain on one or both sides of the head and sensitivity to light [1]. It is the third most prevalent disease, affecting one in every seven people. Migraine is also ranked globally as the seventh most disabling disease, and the first among neurological disorders [2]. However, migraine is often misdiagnosed because its symptoms overlap with those of other conditions, such as tension headache, epilepsy and stroke [3]. In past decades, many studies have aimed at highly accurate migraine identification. One method from these studies has already shown promising results: flash stimulation. This approach analyses the subject's neural responses, recorded with multichannel electroencephalography (EEG) under flash stimulation at different frequencies for variable amounts of time (in seconds). One review study in the literature found a flash frequency of 4 Hz to be the most accurate [40].

Ziming et al. [4] proposed a decision support system for the clinical diagnosis of probable migraine. Its effectiveness was measured using recall, precision, F-score and accuracy. In general, this method proved relatively accurate in diagnosing tension headache and probable migraine. Bellotti et al. [5], in contrast, focused on spontaneous EEG data. Their process had two stages: the signals were first characterized by means of wavelet-based features, and a supervised neural network was then used to classify the EEG patterns of migraine patients into two classes: headache-free and during attacks. This study found that the signals of headache-free patients and healthy controls were highly discriminable [5]. Another automatic medical decision support system (DSS), suggested in [6], used different machine learning (ML) tools for headache classification. Because the existing diagnostic criteria were already simplified, the suggested DSS was able to achieve more accurate results, as the model considered only clearly defined headache types: tension-type headache, migraine headache and other headache. Akben et al. [7] analysed EEG signals of migraine patients under flash stimulation using an artificial neural network (ANN). First, power spectra were acquired under 2 Hz, 4 Hz and 6 Hz flash stimulation frequencies; then an ANN was employed to determine which

∗ Corresponding author.
E-mail addresses: [email protected] (A. Subasi), [email protected] (A. Ahmed), [email protected] (E. Aličković), [email protected] (A. Rashik Hassan).
https://doi.org/10.1016/j.bspc.2018.12.011
1746-8094/© 2018 Elsevier Ltd. All rights reserved.
232 A. Subasi et al. / Biomedical Signal Processing and Control 49 (2019) 231–239
These are sets of procedures for each approximation, from A1 to A6. Further information can be found in [11–14,45].

Using the WT, EEG signals can be described by discrete wavelet coefficients. These coefficients become more informative when they are characterised by their statistics, and such statistical features also reduce the dimensionality of the signal. Reducing the EEG signal dimension has improved performance in various studies [15,13,16]. In this paper, EEG signals are described using the following set of features [17], where n is the number of coefficients in a sub-band and c_i is its i-th coefficient:

(1) Mean of the coefficients in each sub-band, $\mu = \frac{1}{n}\sum_{i=1}^{n} c_i$.
(2) Average power of the coefficients in each sub-band:
$$\text{Average power} = \frac{1}{n}\sum_{i=1}^{n} c_i^{2} \tag{3}$$
(3) Standard deviation of the coefficients in each sub-band:
$$\text{Standard deviation} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(c_i-\mu\right)^{2}} \tag{4}$$
(4) Ratio of the mean values of neighbouring sub-bands, where c_i and c_j are the coefficients of two neighbouring sub-bands:
$$\text{Ratio} = \frac{\frac{1}{n}\sum_{i=1}^{n} c_i}{\frac{1}{n}\sum_{j=1}^{n} c_j} \tag{5}$$

In the implementation, different mother wavelet families were tried, and the one achieving the best classification accuracy was selected: the Daubechies 4 wavelet. Likewise, the EEG signals were decomposed to different levels, and the level yielding the highest classification accuracy was selected. As a result, the DWT coefficients A5, D5, D4, D3, D2 and D1 are used to compute the aforementioned statistical features. Considering these six sub-bands, we extracted in total:

(1) 6 features from the mean of each sub-band's coefficients,
(2) 6 features from the average power of each sub-band's coefficients,
(3) 6 features from the standard deviation of each sub-band's coefficients, and
(4) 5 features from the ratio of the mean values of neighbouring sub-bands; since each ratio involves two neighbouring sub-bands, 6 sub-bands yield 5 ratios.

This results in 23 features per channel (6 + 6 + 6 + 5 = 23). As we consider 2 channels (T3 and F7), we extract 23 × 2 = 46 features in total from each signal. Info-gain feature selection with the ranker algorithm was then applied, but all features were retained after its application.

2.3. Classification module: machine learning algorithms

2.3.1. Standard machine learning approaches
We begin our analysis by considering standard tools from the machine learning literature. These include (1) ANN, (2) k-NN (k-nearest neighbour), (3) SVM (support vector machine) and (4) a group of decision tree (DT) methods, which includes the following tools: CART (classification and regression tree), C4.5 decision tree and REPTree. Since these tools are widely discussed in the literature, we do not explain them in detail here; instead, we refer the interested reader to the following literature: ANN, k-NN, SVM, CART, C4.5 [18]. We only briefly describe REPTree.

2.3.1.1. REPTree. REPTree generates a classification (or regression) tree by means of information gain (or variance reduction), and pruning is performed with the reduced-error approach. The advantage of REPTree is that it is optimized for speed: numeric attribute values are sorted only once. Furthermore, REPTree can deal with the […]

[…] combination of decision trees and Naive Bayes (NBTree) results in classifiers that normally achieve better performance than either method alone, particularly on larger databases. In NBTree, Naive-Bayes classifiers are used at the leaves of a decision tree with univariate splits at each node. NBTree works similarly to classical recursive partitioning schemes, but its leaf nodes are Naive-Bayes categorizers instead of nodes predicting a single class; like decision trees, it uses standard entropy minimization to choose a threshold for continuous attributes. The utility of a node is calculated by discretizing the data and computing the cross-validation accuracy estimate of Naive Bayes at the node, which is used to select attributes [19].

2.3.2. Ensemble machine learning approaches
More recently, machine learning researchers have reported that ensemble machine learning (EML) algorithms, consisting of a finite set of machine learning models combined together, might offer more accurate solutions than any single machine learning model. In the present contribution, we consider several EML approaches that have proved very efficient in different applications [20].

2.3.2.1. Random tree classifiers. One of the better-known EML algorithms is Random Tree (RT), introduced by Leo Breiman and Adele Cutler. RT uses the bagging process to generate a number of single learners. It is noteworthy that RT can solve both classification and regression problems. For classification, the procedure is as follows: (1) the RT classifier takes the input feature vector, (2) every tree in the forest classifies it, and (3) the "winning" class label is selected by majority vote. With this procedure, RTs consist of acceptably balanced trees, where a single global setting for the ridge value works across all leaves, which further simplifies the optimization problem [14,21–23].

2.3.2.2. ADTree. ADTree is derived from AdaBoost by adding three nodes to the tree at each boosting iteration. For each splitter node, the set of instances is split into subsets and two prediction nodes are created. In principle, an ADTree is a graph in which the knowledge in the tree is distributed across multiple paths leading to predictions. An instance follows multiple splitter nodes, and the values of the prediction nodes it reaches are summed to obtain an overall prediction value [24]. Another attractive feature of ADTree is that ADTrees can be merged together, which is impossible with conventional boosting procedures. The approach can be extended to multiclass classification problems by creating a set of voting models [25].

2.3.2.3. LADTree. The LADTree classification technique uses the LogitBoost algorithm to produce an alternating decision tree. At
each iteration, a single attribute is selected as a splitter node for the tree, and each training instance is weighted on a per-class basis. The goal is to fit the mean value of the instances by minimizing the least-squares error. The sum of the responses of all the ensemble classifiers over the classes then produces a vector of predicted values. In order to produce distinct trees for each class in parallel, the LogitBoost algorithm can be merged with the induction of ADTrees [25].

2.3.2.4. Random forests (RF). Random forest (RF), first proposed by L. Breiman, is an ensemble-learning-based supervised learning algorithm. RF is essentially a collection of decision trees, each trained on a different subset of the training dataset, which improves the generalization capability of RF. To construct each tree, RF draws a bootstrapped subset of the original training set containing about two-thirds of the training samples; the remaining data are left out as out-of-bag (OOB) instances. RF builds each decision tree to the maximum size without pruning, i.e., each tree is grown until every terminal node contains only members of one class. This keeps the bias of each tree low, but its variance remains high. As candidate splitters at each node of a decision tree, a random subset of q features out of the total set of Q features is used; the number q stays the same throughout the construction of all the trees of RF. A test data point passes through the respective tests as it traverses each tree from the root node to a leaf, and when it arrives at a leaf node, that decision tree casts a vote. The final output of RF for the test instance is the class with the highest number of votes. Combining such weak classifiers (decision trees, which individually have low bias but high variance) makes RF a robust and reliable classification model with not only low bias but also low variance. Owing to its efficacy and reliability, RF has been successfully applied to various signal classification problems [14,26]. RF is computationally less expensive than boosting or bagging, yet robust to noise and outliers. Furthermore, RF is easy to parallelize.

2.3.2.5. Rotation forest (RoF). RoF is, together with RF, one of the most widely applied EML algorithms. To create the training data for a base classifier, the feature set is randomly split into N subsets (N is a tuning parameter), and Principal Component Analysis (PCA) is applied to each subset. To retain the variability information present in the data, all principal components are kept. This yields N axis rotations that form the new features for a base classifier. The main reason for using rotation is to simultaneously improve the individual accuracy of the base classifiers and keep diversity within the ensemble. Decision trees were chosen as base classifiers because they are sensitive to rotation of the feature axes, hence the name "forest". Accuracy is pursued by keeping all principal components and by using the whole dataset to train each base classifier. RoF details are given in [20,27].

[…] controls. In this study, EEG signals with and without flash stimulation are used for the analysis and diagnosis of migraine without aura. This type of migraine usually begins with a dull ache and then grows into a constant, aching and pulsating pain, which occurs especially at the superficial temporal artery as well as the occipital artery of the head [7].

3.1. Performance evaluation metrics
In this contribution, a number of different machine learning algorithms were applied to the DWT-extracted features in order to examine the effect of window length and photic stimulation on accurate migraine identification. The DWT feature-extraction method was used throughout so that both the comparison between techniques and the estimation of the classification error could be examined. Since both datasets are limited and the samples must be selected individually, the EEG data were divided into training and test sets using k-fold cross-validation (CV) and leave-one-out cross-validation (LOOCV). Many researchers use these methods to limit the bias related to the random sampling of the generated datasets [28,12,16,29]. In k-fold cross-validation, the data are randomly partitioned into a defined number of subsets, known as folds. Cross-validation accuracy is formulated as follows:
$$\text{CVA} = \frac{1}{k}\sum_{i=1}^{k} A_i \tag{6}$$
where k is the number of folds and A_i is the accuracy of the i-th fold [30].

The LOOCV technique applies the idea of k-fold cross-validation with k equal to the number of instances in the dataset. Besides the increased variance due to the large k, the LOOCV technique can produce overoptimistic estimates. However, LOOCV is a reasonable assessment technique when building classification models from small datasets; it is not preferred for large datasets because of its computational cost. Unlike hold-out and k-fold cross-validation, LOOCV involves no randomness in the assessment process and is fully deterministic and reproducible [31].

To evaluate the performance of a classifier, the numbers of true negatives (TN), true positives (TP), false positives (FP) and false negatives (FN) are used, and several derived measures describe the results in different domains. Sensitivity and specificity are widely used in diagnostic and identification tests and are defined as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100\% \tag{7}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \times 100\% \tag{8}$$
The most commonly used performance measure of a classifier is the total classification accuracy, which is defined as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$$
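The definitions in Eqs. (6)–(8) and the total accuracy above reduce to a few lines of arithmetic on confusion counts and fold accuracies. The sketch below is illustrative only and uses just the Python standard library; all function names are hypothetical. The F-measure is assumed to be the standard F1 score (its definition is not reproduced in this section), and the AUC is computed with the rank-based (Mann–Whitney) formulation, which equals the area under the empirical ROC curve for one set of scores, rather than by counting true and false positives fold by fold as described in the text. The fold splitter shows how LOOCV is simply k-fold CV with k equal to the number of instances.

```python
import random
from typing import List, Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Eq. (7): percentage of actual positives (migraine) classified correctly."""
    return 100.0 * tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Eq. (8): percentage of actual negatives (controls) classified correctly."""
    return 100.0 * tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Total classification accuracy in percent."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Standard F1 score: harmonic mean of precision and recall (assumed definition)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cv_accuracy(fold_accuracies: Sequence[float]) -> float:
    """Eq. (6): mean of the k per-fold accuracies A_i."""
    return sum(fold_accuracies) / len(fold_accuracies)


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition instance indices 0..n-1 into k folds.

    With k == n every fold holds exactly one instance, which is LOOCV.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random positive outscores a random
    negative, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, a classifier with 9 TP, 1 FN, 8 TN and 2 FP has a sensitivity of 90%, a specificity of 80% and a total accuracy of 85%.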
Migraine is the primary headache type and the most widespread among the active population. Although substantial progress has been achieved over the past decades in understanding the genetics, pharmacology, pathophysiology and epidemiology of migraine, several problems still require further investigation. This paper proposes a CDSS for automated migraine detection that combines ensemble classification methods with a DWT feature extraction algorithm. The ensemble classifiers are capable of accurate decision making, while the feature extraction and dimension reduction methods enhance the classification step. New developments in ensemble classification methods contribute to a fuller understanding of this important medical issue. Findings by neuroscientists, combined with machine learning methods, show that diagnostic accuracy can be further improved and daily clinical practice made easier. Using machine learning methods for migraine detection helps physicians make their judgments faster and more precisely [6].

The general framework of the proposed system is presented in Fig. 2. It has three main modules: (1) signal segmentation, (2) feature extraction/dimension reduction and (3) classification. The flash stimulations are applied in different time periods, each of 30 s duration (with on and off periods). Since the number of subjects is limited, the recorded signals were then divided, using rectangular windowing, into shorter segments of 1, 2 or 3 s. For the 1 s data we used 435 instances per class, for the 2 s data 210 instances per class, and for the 3 s data 135 instances per class. The statistical features extracted from each DWT sub-band were then used to classify the EEG signals for migraine diagnosis: a total of 46 features (23 for each of the channels T3 and F7) are extracted from each signal segment (see Fig. 3). Finally, different single and ensemble classification algorithms were employed for more precise EEG signal recognition.

The performances of the single and ensemble classifiers are evaluated in terms of total classification accuracy, F-measure and AUC, as defined in the previous section. We compared the performance of the machine learning techniques using 10-fold cross-validation and leave-one-out cross-validation (LOOCV). Several classification algorithms were implemented with different window lengths for two scenarios: (1) with photic stimulation and (2) without photic stimulation. For each subject, EEG signals were acquired after each flash stimulation delivered to the face. The flash stimulation was applied at different frequencies (2, 4 and 8 Hz) and for different durations (4 and 8 s). For each series, we calculated the 10-fold CV and LOOCV accuracy. The classifier performances on the clinical EEG data are summarized in Tables 1 to 4. All methods performed reasonably well in terms of total classification accuracy, F-measure and AUC.

Table 1
Classification accuracy (%) for window lengths W = 256, 512 and 768 samples (1, 2 and 3 s).
                        Non-Photic                  Photic
Classifier        W=256   W=512   W=768     W=256   W=512   W=768
SVM               67.82   68.57   80.74     74.14   80.24   84.07
k-NN              65.75   70.48   77.78     74.25   80.00   83.33
ANN               65.52   69.52   75.93     74.71   76.19   82.22
Random Forest     68.28   71.67   75.19     78.85   78.57   85.19
CART              64.83   66.67   67.04     71.72   71.90   77.78
C4.5              65.40   68.57   64.81     70.69   68.57   77.04
Rotation Forest   67.36   73.10   77.41     77.70   79.05   83.70
REPTree           64.14   68.33   65.93     72.41   68.81   74.81
RandomTree        61.95   65.00   68.15     71.38   68.81   76.30
ADTree            64.83   67.62   66.67     69.43   71.67   73.33
LADTree           64.71   68.10   70.74     72.07   72.38   77.41
NBTree            65.98   72.14   65.93     72.53   75.71   75.93

Table 2
Performance evaluation of classification methods with a window length of 3 s.
                           Non-Photic                        Photic
Classifier        Accuracy (%)   AUC   F-Measure   Accuracy (%)   AUC   F-Measure
SVM               80.74        0.807   0.807       84.07        0.841   0.841
k-NN              77.78        0.833   0.778       83.33        0.86    0.833
ANN               75.93        0.815   0.758       82.22        0.888   0.822
Random Forest     75.19        0.843   0.752       85.19        0.909   0.852
CART              67.04        0.678   0.669       77.78        0.787   0.778
C4.5              64.81        0.686   0.643       77.04        0.807   0.77
Rotation Forest   77.41        0.84    0.773       83.70        0.912   0.837
REPTree           65.93        0.722   0.658       74.81        0.815   0.748
RandomTree        68.15        0.681   0.681       76.30        0.763   0.763
ADTree            66.67        0.749   0.666       73.33        0.832   0.732
LADTree           70.74        0.768   0.707       77.41        0.834   0.774
NBTree            65.93        0.688   0.657       75.93        0.792   0.759

From the results in Table 1, we can determine the optimal window size and the effect of window length on classification performance. As Table 1 shows, the accuracy of the classifiers generally increased with the window length, both with and without flash stimulation. Without flash stimulation and with a window length of 3 s (768 samples), rotation forest, random forest, k-NN and ANN gave similar performance, yielding 77.41%, 75.19%, 77.78% and 75.93%, respectively. Nevertheless, SVM emerges as the best classification model, yielding 80.74% accuracy for 3 s signal segments in the non-photic case. In contrast, random forest is superior to the other methods for EEG pattern classification with photic stimulation.

The effect of flash stimulation on the performance of the classifiers is apparent from the results in Table 2: the performance of all classifiers improves when flash stimulation is applied for migraine detection. With flash stimulation and the same window length (3 s), random forest achieves the highest classification accuracy (85.19%), SVM comes second (84.07%) and rotation forest third (83.70%). ADTree comes last with 73.33% accuracy, slightly lower than the other DT classifiers. The total accuracy of random forest increased from 75.19% to 85.19% with flash stimulation; this 10-percentage-point improvement confirms the benefit of flash stimulation for migraine detection.

The experimental results presented in Tables 1 and 2 show that the machine learning methods achieved good classification performance for migraine detection with different window lengths. The best results were achieved by Random Forest, and the results are also satisfactory from a medical point of view. Hence, it may be possible to implement a fully automated decision support system for migraine detection with diagnostic accuracy comparable to, or even slightly better than, that of an experienced neuroscientist. Furthermore, models built with photic stimulation give better results than models built without it.

To further validate the performance of the different models, we also consider the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate. The curve is obtained by counting the numbers of true and false positives in the test set in each fold of cross-validation [21], and classification performance can then be evaluated by the area under the ROC curve (AUC). Another way to evaluate performance is the mean AUC, which demonstrates the reliability of the results obtained from the input data [34–39]. Table 2 shows that, with flash stimulation, the AUC of the Rotation Forest classifier is the highest (0.912), followed by Random Forest (0.909), whereas Random Tree has the lowest AUC (0.763).
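The segmentation and feature-extraction modules of the framework described in this section can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical; the sampling rate of 256 Hz is inferred from the statement that 3 s corresponds to 768 samples; windows are non-overlapping and rectangular, with any incomplete tail dropped; and the ratio feature is assumed to be the mean of each sub-band divided by the mean of the next one in the order A5, D5, D4, D3, D2, D1 (the text does not state the direction). The DWT itself is not reimplemented; the sub-band coefficient lists are assumed to come from a db4, level-5 decomposition, for example `pywt.wavedec(window, 'db4', level=5)` from PyWavelets, which returns the coefficients in that order.

```python
from statistics import fmean, pstdev
from typing import List, Sequence

FS = 256  # Hz; assumption inferred from "3 s (768 samples)"


def segment(signal: Sequence[float], window_s: float, fs: int = FS) -> List[Sequence[float]]:
    """Cut a recording into non-overlapping rectangular windows of window_s seconds,
    dropping an incomplete final window."""
    n = int(window_s * fs)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def subband_features(subbands: Sequence[Sequence[float]]) -> List[float]:
    """23 statistics for one channel from sub-bands ordered [A5, D5, D4, D3, D2, D1]."""
    means = [fmean(c) for c in subbands]                  # feature (1): 6 values
    powers = [fmean(x * x for x in c) for c in subbands]  # feature (2), Eq. (3): 6 values
    stds = [pstdev(c) for c in subbands]                  # feature (3), Eq. (4): 6 values
    # Feature (4), Eq. (5): mean of each sub-band over the next one -> 5 values.
    # The direction is an assumption; this fails if a sub-band mean is exactly zero.
    ratios = [means[i] / means[i + 1] for i in range(len(means) - 1)]
    return means + powers + stds + ratios


def window_features(channels: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Concatenate per-channel vectors, e.g. T3 and F7 -> 2 x 23 = 46 features."""
    features: List[float] = []
    for subbands in channels:
        features.extend(subband_features(subbands))
    return features
```

With these assumptions, a 30 s recording at 256 Hz yields 30, 15 or 10 windows for window lengths of 1, 2 or 3 s. Using the population standard deviation (`pstdev`) matches the 1/n normalization of Eq. (4).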
Table 5
The average accuracy and standard deviation (SD) of classification methods with a window length of 3 s using 10-fold CV.
                      Non-Photic             Photic
Classifier        Average     SD        Average     SD
SVM               80.70%    0.93%       84.10%    0.90%
k-NN              77.80%    1.20%       83.30%    0.96%
ANN               75.90%    1.30%       82.20%    1.10%
Random Forest     75.20%    0.95%       85.20%    0.92%
NBTree            65.90%    2.50%       75.90%    2.20%
CART              67.00%    2.30%       77.80%    2.60%
C4.5              64.80%    2.40%       77.00%    1.90%
Rotation Forest   77.40%    1.95%       83.70%    2.10%
REPTree           65.90%    2.60%       74.80%    2.50%
RandomTree        68.10%    2.10%       76.30%    2.25%
LADTree           70.70%    2.25%       77.40%    1.95%
ADTree            66.70%    2.80%       73.30%    2.75%

[…] is the highest (88.90%) with flash stimulation using 10-fold cross-validation. With 10-fold cross-validation, the sensitivity of the SVM classifier is the highest (82.20%) without flash stimulation, and the sensitivity of the Rotation Forest classifier is the highest (86.70%) with flash stimulation.

With LOOCV, the specificity of the NBTree classifier is the highest (81.50%) without flash stimulation, and the specificities of the k-NN and ADTree classifiers are the highest (88.10%) with flash stimulation. The sensitivity of the SVM classifier is again the highest (82.20%) without flash stimulation, and the sensitivity of the Random Forest classifier is the highest (85.95%) with flash stimulation. Overall, Random Forest outperforms all other classifiers in terms of both total classification accuracy and AUC with 10-fold cross-validation and LOOCV.

We also examined the mean accuracy and standard deviation of each classifier, presented in Table 5. The standard deviations range from 0.90% to 2.80%. The highest average accuracy (85.20%), with a standard deviation of 0.92%, is achieved by the Random Forest classifier using 10-fold cross-validation. To test the statistical significance of the differences in the performance metrics, a paired t-test was used, and the differences are highly significant (p ≤ 0.01). The low standard deviations and the paired t-test results thus indicate that our results are statistically significant.

In control subjects, applying flash stimulation substantially increased the accuracy of many classifiers, although some classifiers achieved lower accuracy. For the migraine patients, the same trend was observed, but more strongly: after flash stimulation, accuracy increased substantially for all classifiers. Hence, flash stimulation increased the classification accuracy in migraineurs, and the gain was larger than in the control subjects.

3.4. Discussion
In this study, a CDSS for the identification of migraine was developed using ensemble machine learning algorithms applied to features extracted with the DWT from neural responses to repetitive flash stimulation, recorded with multichannel EEG. The proposed system has several advantages. The first is a high classification accuracy of approximately 86%. The second is the relatively short segment window (3 s) on which a classification can be made once the model is trained. The third is that the system is completely automated and does not require a physician.

The aim of this study was to design a model to support the diagnosis and treatment of migraine using EEG signals. Because EEG signals are high-dimensional, working with the raw data is difficult. Therefore, the DWT was first used to extract valuable and informative features. Statistical values of the DWT sub-bands were then calculated to represent the distribution of the wavelet coefficients. In the last step, the resulting feature set was used as input to the classifiers. The features were first classified with different single classifiers and then with the Random Tree, ADTree, LADTree, Random Forest and Rotation Forest ensemble classifiers, and the results were compared. This study shows that ensemble learning techniques improved the success rate significantly. A review of the literature on EEG signal classification reveals that there are almost no studies on ensemble techniques in this regard.

In summary, a CDSS for the diagnosis of migraine has been developed using the DWT and several machine learning techniques. The system achieved high classification performance for both control subjects and migraineurs, and the proposed CDSS offers multiple advantages. Since Random Forest is an ensemble approach, it is particularly efficient for migraine detection under flash stimulation; moreover, compared with the other machine learning techniques, Random Forest is fast and more accurate. The proposed framework is designed to help neurologists in migraine detection. The results show a high degree of accuracy in recognizing migraine with photic stimulation and a 3 s signal length; the accuracy was lower when photic stimulation was not applied and shorter signal lengths were used. With the combination of flash stimulation and a 3 s signal length, migraine can be diagnosed with a high degree of accuracy. The proposed framework as a whole could be deployed in hospitals to help neurologists as well as general practitioners in migraine detection.

Moreover, flash-stimulated EEG changes were evaluated using different machine learning techniques, which revealed elusive changes in brain electrical activity in migraine patients during flash stimulation. In the present study, the condition of continuous photic stimuli
Table 6
Comparison of the Classification Accuracies Achieved by Different Studies.
increased the accuracy in controls and, especially, in migraine patients.
The time–frequency analysis reproduced the findings of Mouraux and
Plaghki [42] and Ohara et al. [43], who showed that alpha and beta power
increase after the photic stimulus in both healthy subjects and migraine
patients. The experimental results of this study are consistent with these
findings. The time–frequency analysis confirmed that alpha and beta power
change after photic stimulation, and that these changes are time-correlated
with long-persisting cortical activation. In addition, further analysis
showed consistent differences in the cortical activation pattern between
healthy subjects and migraineurs. We have presented a machine learning
approach to EEG signal modelling that uses the DWT for model
regularization and nonlinear modelling of the EEG signals. This allows the
randomness in clinical EEG signals to be measured, using 10-fold
cross-validation (CV) and leave-one-out cross-validation (LOOCV), in the
analysis of EEG signals from healthy subjects and migraineurs with and
without flash stimulation. The analysis showed that using flash
stimulation yields better classification accuracy.

Regarding clinical application, flash stimulation increased classifier
performance. Since flash stimulation creates an association of local
activity in the cortex, EEG signals become more recognizable after the
stimulation. The improved EEG recognition after flash stimulation might be
a sign of cortical reactivity to peripheral circumstances, in which the
cortex might decrease the degree of randomness in order to effectively
capture and integrate the novel photic stimulus. Indeed, de Tommaso et al.
[44] reported that the variations in EEG recognition were accompanied by
an overall decrease of beta-band spectral density over the whole analyzed
time after the stimulus, a phenomenon that was especially evident in
migraine patients. These techniques made it possible to represent the
altered cortical response of migraineurs to pain under sustained
attention. Their method was very sensitive to stimulus-related EEG
changes. If the painful flash stimuli were not the source of migraine, the
proper involvement of these cortical regions in the pain modulation system
could be compromised in diverse situations.

Each classification method has its own logic for adjusting parameters.
For example, only one key parameter (the number of trees) is adjusted in
RF. Furthermore, clinicians do not know the exact meaning of some
parameters. Because creating an appropriate algorithm is difficult in
clinical practice, various kinds of machine learning methods have been
developed. Criteria such as ease of use, performance and interpretability
are the most important concerns in choosing an appropriate algorithm.
There are almost no studies in the literature that apply ensemble learning
methods to the diagnosis of neuromuscular disorders. Hence, this study
indicates that ensemble learning methods have potential for migraine
detection from EEG signals. Comparing the classifiers created in this
study with similar systems is a challenging task, given the diversity of
the classification techniques, of the EEG signals classified by those
systems, and of the signal processing techniques and features they use.
Compared with examples from the literature, the results obtained in this
study are satisfactory, with a success rate of 85.95%. In this study, we
evaluated the performance of different classifiers for migraine detection
from EEG signals. To evaluate the classifiers, a comparative study was
carried out using EEG signals recorded with and without photic
stimulation. The proposed technique, feature extraction using DWT
modelling, appears suitable for EEG signal analysis and migraine
detection. The effects of window size and photic stimulation were also
presented. Based on the classification results for the EEG signals, we
emphasize the following:

1 Among the machine learning techniques, Random Forest can be
successfully applied to EEG signal classification for migraine
detection because of its stable and high performance, with an accuracy
of around 86%.
2 Selection of the input variables is an important step in creating the
classifier. In this study, statistical values computed for each DWT
sub-band are used as the classifier inputs. Statistical features reduce
the number of features in a subset, which allows the use of the smallest
subset that remains consistent with the full feature set. Hence, the
features chosen for model construction are those related to the signal
statistics of the different frequency bands. Clinicians should take care
to understand the requirements of the model before use.
3 There were no significant differences in accuracy between the two
validation methods (10-fold cross-validation and LOOCV). However,
classification performance decreased slightly with LOOCV without flash
stimulation. In contrast, most of the classifiers showed some improvement
in specificity, sensitivity and classification accuracy with LOOCV under
photic stimulation. Prediction of the patients with migraine was stable,
with 88% sensitivity under both methods.
4 The performance of some methods decreased slightly with LOOCV, in
exchange for improved specificity. The ADtree classifier performed worst,
with 51.90% classification accuracy. Finally, classifier performance
improved noticeably with increasing window length and with flash
stimulation.

4. Conclusions

In this study, we investigated the optimal choice of classifier for
EEG-based migraine detection. We also studied the optimal window size and
the effect of flash stimulation on detection performance. The novel
contribution of this article is the development of a classification
algorithm for migraine detection using EEG signals and the determination
of the best window length. Our results demonstrate that the algorithm
achieves its best performance with flash stimulation and a 3-s window
length. While examining the efficacy of various classification models for
different window sizes, we concluded that higher accuracy can be achieved
by tuning the parameters of the classifiers. DWT feature extraction and a
Random Forest classifier with flash stimulation may be of great help to
clinicians in migraine diagnosis. Given the results, it can be argued
that EEG is more decisive in diagnosing migraine without aura. In
addition, migraine patients were observed to be more strongly triggered
by exposure to painful stimuli such as flash stimulation. After the DWT
and classification phases, it can be concluded that EEG is sufficient for
the diagnosis
of migraine under flash stimulation. Light and sound are known to be
triggering factors in migraine. The outcomes of this study are relevant
to the physiological response of the brain.

Furthermore, using a clinical dataset, a systematic comparison and
assessment of different well-known supervised machine learning
classifiers has been presented. 10-fold cross-validation and LOOCV were
used for performance evaluation, and statistical measures such as
specificity, sensitivity and total classification accuracy were employed
to evaluate the classifiers. The experimental results are promising. With
LOOCV, the Random Forest classifier achieved 85.95% classification
accuracy. In fact, the performance of most classifiers increased slightly
with LOOCV, whereas the classification accuracies of some classifiers
decreased slightly with 10-fold cross-validation. Moreover, the ensemble
classifier improved the classification accuracy. Hence, ensemble
classifiers can be an appropriate classification method for migraine
detection.

We should emphasize that the results obtained in this study are a
preliminary investigation of the broad problem of automated migraine
detection. Our future work will concentrate on a full diagnostic case
including different diseases. We would like to use more complex
classification techniques, such as hierarchical classifiers, to improve
the quality of the proposed CDSS.

Acknowledgement

The authors would like to thank Dr. Deniz Tuncel from Kahramanmaras Sutcu
Imam University, Neurology Department, for providing the EEG data utilised
in this research.

References

[1] M.R. Foundation, About Migraine, Migraine Research Foundation, 2016. Retrieved April 4, from http://www.migraineresearchfoundation.org/about-migraine.html.
[2] T.M. Trust, Facts and Figures, 2014. Retrieved April 7, 2016, from https://www.migrainetrust.org/about-migraine/migraine-what-is-it/facts-figures/.
[3] R. Diagnosis, Right Diagnosis (healthgrades), 2016. Retrieved April 4, from http://www.rightdiagnosis.com/m/migraine/misdiag.htm.
[4] Y. Ziming, D. Zhao, L. Xudong, Y. Shengyuan, C. Xiaoyan, D. Huilong, A clinical decision support system for the diagnosis of probable migraine and probable tension-type headache based on case-based reasoning, J. Headache Pain 16 (29) (2015) 1–9, http://dx.doi.org/10.1186/s10194-015-0512-x.
[5] R. Bellotti, F. De Carlo, M. De Tommaso, M. Lucente, Classification of spontaneous EEG signals in migraine, Physica (2007), 1, 2, 4, 8. Retrieved 2016.
[6] B. Krawczyk, D. Simić, S. Simić, M. Woźniak, Automatic diagnosis of primary headaches by machine learning methods, Open Med. 8 (2) (2013) 157–165.
[7] S.B. Akben, D. Tuncel, A. Alkan, Classification of multi-channel EEG signals for migraine detection, Biomed. Res. 27 (3) (2016) 743–748.
[8] E. Niedermeyer, F.L. da Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields, Lippincott Williams & Wilkins, 2005.
[9] J.L. Semmlow, Biosignal and Biomedical Image Processing, Marcel Dekker, New York, 2004.
[10] R. Begg, D.T. Lai, M. Palaniswami, Computational Intelligence in Biomedical Engineering, CRC Press, Boca Raton, 2008.
[11] M. Akay, Wavelet applications in medicine, IEEE Spectr. 34 (1997) 50–56.
[12] A. Subasi, Classification of EMG signals using combined features and soft computing, Appl. Soft Comput. 12 (8) (2012) 2188–2198.
[13] A. Subasi, EEG signal classification using wavelet feature extraction and a mixture of expert model, Expert Syst. Appl. 32 (2007) 1084–1093.
[14] E. Alickovic, A. Subasi, Medical decision support system for diagnosis of heart arrhythmia using DWT and random forests classifier, J. Med. Syst. 40 (108) (2016).
[15] A. Kandaswamy, C.S. Kumar, R.P. Ramanathan, S. Jayaraman, N. Malmurugan, Neural classification of lung sounds using wavelet coefficients, Comput. Biol. Med. 34 (6) (2004) 523–537.
[16] A. Subasi, Medical decision support system for diagnosis of neuromuscular disorders using DWT and fuzzy support vector machines, Comput. Biol. Med. 42 (8) (2012) 806–815.
[17] D.G. Manolakis, V.K. Ingle, S.M. Kogon, Statistical and Adaptive Signal Processing, McGraw-Hill, MA, 2005.
[18] I.H. Witten, E. Frank, M.A. Hall, C.J. Pal, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2016.
[19] R. Kohavi, Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid, vol. 96, Presented at the KDD, Citeseer, 1996, pp. 202–207.
[20] E. Alickovic, A. Subasi, Breast cancer diagnosis using GA feature selection and Rotation Forest, Neural Comput. Appl. 28 (4) (2017) 753–763.
[21] I. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, Elsevier, San Francisco, CA, 2005.
[22] K. Sushilkumar, Analysis of WEKA data mining algorithm REPTree, Simple CART and RandomTree for classification of Indian news, International Journal of Innovative Science, Engineering and Technology 2 (2) (2015) 438–446.
[23] A. Subasi, E. Alickovic, J. Kevric, Diagnosis of chronic kidney disease by using random forest, International Conference on Medical and Biological Engineering 2017 (CMBEBIH 2017) (2017).
[24] Y. Freund, L. Mason, The Alternating Decision Tree Learning Algorithm, vol. 99, Presented at the ICML, 1999, pp. 124–133.
[25] G. Holmes, B. Pfahringer, R. Kirkby, E. Frank, M. Hall, Multiclass alternating decision trees, Presented at the European Conference on Machine Learning (2002) 161–172.
[26] A.R. Hassan, M.I. Bhuiyan, Computer-aided sleep staging using complete ensemble empirical mode decomposition with adaptive noise and bootstrap aggregating, Biomed. Signal Process. Control 24 (2016) 1–10.
[27] J. Rodríguez, L. Kuncheva, C. Alonso, Rotation forest: a new classifier ensemble method, IEEE Trans. Pattern Anal. Machine Intell. 28 (10) (2006) 1619–1630.
[28] G.J. McLachlan, C. Ambroise, Selection bias in Gene extraction on the basis of microarray Gene-expressed data, Proc. Natl. Acad. Sci. U. S. A. (2002) 6562–6566.
[29] J.J. Sandvig, B. Mobasher, R. Burke, A survey of collaborative recommendation and the robustness of model-based algorithms, IEEE Comput. Soc. Techn. Committee Data Eng. 31 (2) (2008) 3–13.
[30] R. Zhang, G. McAllister, B. Scotney, S. McClean, Combining wavelet analysis and bayesian networks for the classification of auditory brainstem response, Inform. Technol. Biomed. IEEE Trans. on 10 (2006) 458–467.
[31] P. Cichosz, Data Mining Algorithms: Explained Using R, John Wiley & Sons, 2014.
[32] A. Subasi, Classification of EMG signals using PSO optimized SVM for diagnosis, Comput. Biol. Med. 43 (5) (2013) 576–586.
[33] M. Hall, I. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Kaufmann, Burlington, 2011.
[34] J.A. Handley, B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143 (1982) 29–36.
[35] N.A. Obuchowski, Receiver operating characteristic curves and their use in radiology, Radiology 229 (2003) 3–8.
[36] J.A. Swets, ROC analysis applied to the evaluation of medical imaging techniques, Invest. Radiol. 14 (1979) 109–121.
[37] J.M. Deleo, Receiver operating characteristic laboratory (ROCLAB): software for developing decision strategies that account for uncertainty, in: Proceedings of the Second International Symposium on Uncertainty Modelling and Analysis, 1993, pp. 318–325.
[38] A. Devos, L. Lukas, J.A. Suykens, L. Vanhamme, A.R. Tate, F.A. Howe, et al., Classification of brain tumours using short echo time 1H MR spectra, J. Magn. Reson. 170 (2004) 164–175.
[39] M.H. Zweig, G. Campbell, Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine, Clin. Chem. 39 (8) (1993) 561–577.
[40] S.B. Akben, A. Subasi, D. Tuncel, Analysis of repetitive flash stimulation frequencies and record periods to detect migraine using artificial neural network, J. Med. Syst. 36 (2) (2012) 925–931.
[41] K. Jackowski, D. Jankowski, D. Simić, S. Simić, Migraine diagnosis support system based on classifier ensemble, in: ICT Innovations 2014, Springer, 2015, pp. 329–339.
[42] A. Mouraux, L. Plaghki, Single-trial detection of human brain responses evoked by laser activation of Aδ-nociceptors using the wavelet transform of EEG epochs, Neurosci. Lett. 361 (1–3) (2004) 241–244.
[43] S. Ohara, A. Ikeda, T. Kunieda, S. Yazawa, K. Baba, T. Nagamine, et al., Movement-related change of electrocorticographic activity in human supplementary motor area proper, Brain 123 (6) (2000) 1203–1215.
[44] M. de Tommaso, D. Marinazzo, S. Stramaglia, The measure of randomness by leave-one-out prediction error in the analysis of EEG after laser painful stimulation in healthy subjects and migraine patients, Clin. Neurophysiol. 116 (12) (2005) 2775–2782.
[45] E. Alickovic, J. Kevric, A. Subasi, Performance evaluation of empirical mode decomposition, discrete wavelet transform, and wavelet packed decomposition for automated epileptic seizure detection and prediction, Biomed. Signal Process. Control 39 (2018) 94–102.

A. Subasi et al. / Biomedical Signal Processing and Control 49 (2019) 231–239 239
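The feature-extraction step described in the discussion (3-s windows, DWT sub-bands, and statistical values per sub-band as classifier inputs, item 2 of the list) can be illustrated with a small, dependency-free sketch. The section does not state the mother wavelet, the decomposition level, the sampling rate, or which statistics were used. All of these are therefore assumptions here: a hand-written Haar wavelet, 4 decomposition levels, a hypothetical 128 Hz sampling rate in the usage example, and three common statistics (mean absolute value, average power, standard deviation). With real data, a library routine such as PyWavelets' `pywt.wavedec` would normally replace the hand-written transform.

```python
import math
import statistics


def segment(signal, fs, window_s=3.0):
    """Split a 1-D signal into non-overlapping windows of window_s seconds.

    fs is the sampling rate in Hz (an assumed input). Trailing samples that
    do not fill a whole window are dropped.
    """
    n = int(round(fs * window_s))
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def haar_step(x):
    """One level of the orthonormal Haar DWT -> (approximation, detail)."""
    if len(x) % 2:  # pad odd-length input by repeating the last sample
        x = list(x) + [x[-1]]
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def haar_wavedec(x, level):
    """Multi-level Haar DWT, ordered like pywt.wavedec: [cA_L, cD_L, ..., cD_1]."""
    approx, details = list(x), []
    for _ in range(level):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def subband_stats(coeffs):
    """Hypothetical feature set: mean |c|, average power and std per sub-band."""
    feats = []
    for c in coeffs:
        feats.append(statistics.fmean(abs(v) for v in c))
        feats.append(statistics.fmean(v * v for v in c))
        feats.append(statistics.pstdev(c))
    return feats


def window_features(signal, fs, window_s=3.0, level=4):
    """One feature vector per window: (level + 1) sub-bands x 3 statistics."""
    return [subband_stats(haar_wavedec(w, level))
            for w in segment(signal, fs, window_s)]


if __name__ == "__main__":
    fs = 128  # hypothetical sampling rate
    # 10 s of a synthetic 10 Hz (alpha-band) sine wave
    sig = [math.sin(2 * math.pi * 10 * t / fs) for t in range(10 * fs)]
    feats = window_features(sig, fs)
    print(len(feats), len(feats[0]))  # 3 windows of 3 s, 15 features each
```

Because this Haar version is orthonormal, the squared coefficients of all sub-bands sum to the energy of the window. That property is a convenient sanity check when swapping in a different wavelet implementation. Changing `window_s` is the only change needed to compare window lengths, as the study does.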
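The evaluation protocol discussed in item 3 of the list and in the conclusions compares 10-fold cross-validation with LOOCV and scores each classifier by sensitivity, specificity and total accuracy. A minimal sketch of that protocol follows. LOOCV is written here as k-fold CV with k equal to the number of samples, and the predictions from all held-out folds are pooled before scoring. The `fit_predict` hook, the label convention (1 = migraine, 0 = control), and the nearest-centroid classifier are illustrative assumptions. The nearest-centroid model is a stand-in only and is not the Random Forest used in the study; it keeps the example free of third-party dependencies.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds (sizes differ by <= 1)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confusion_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy, with `positive` as the migraine label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


def cross_validate(X, y, fit_predict, k, seed=0):
    """k-fold CV with pooled predictions; k == len(X) gives LOOCV."""
    y_pred = [None] * len(y)
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold])
        for i, p in zip(fold, preds):
            y_pred[i] = p
    return confusion_metrics(y, y_pred)


def nearest_centroid(train_X, train_y, test_X):
    """Stand-in classifier (NOT Random Forest): predict the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_X]


if __name__ == "__main__":
    # Toy, well-separated feature vectors: label 0 = control, 1 = migraine.
    X = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
    y = [0] * 10 + [1] * 10
    print("10-fold:", cross_validate(X, y, nearest_centroid, k=10))
    print("LOOCV:  ", cross_validate(X, y, nearest_centroid, k=len(X)))
```

To mirror the classifier discussed above, the stand-in could be replaced by a wrapper around scikit-learn's `RandomForestClassifier`. Its `n_estimators` argument is the number-of-trees parameter mentioned in the discussion. `sklearn.model_selection.KFold` and `LeaveOneOut` provide equivalent fold generators.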