
J Med Syst (2012) 36:3327–3337

DOI 10.1007/s10916-012-9825-3

ORIGINAL PAPER

A Computer Aided Diagnosis System for Thyroid Disease Using Extreme Learning Machine
Li-Na Li & Ji-Hong Ouyang & Hui-Ling Chen &
Da-You Liu

Received: 17 November 2011 / Accepted: 26 January 2012 / Published online: 12 February 2012
© Springer Science+Business Media, LLC 2012

Abstract In this paper, we present an effective and efficient computer aided diagnosis (CAD) system based on principal component analysis (PCA) and extreme learning machine (ELM) to assist the task of thyroid disease diagnosis. The CAD system comprises three stages. Focusing on dimension reduction, the first stage applies PCA to construct the most discriminative new feature set. The system then switches to the second stage, whose target is model construction: an ELM classifier is trained to obtain an optimal predictive model whose parameters are optimized. Since the number of hidden neurons plays an important role in the performance of ELM, we propose an experimental method to search for its optimal value. Finally, the optimal ELM model performs the thyroid disease diagnosis tasks using the most discriminative new feature set and the optimal parameters. The effectiveness of the resultant CAD system (PCA-ELM) has been rigorously evaluated on a thyroid disease dataset taken from the UCI machine learning repository, and compared with other related methods in terms of classification accuracy. Experimental results demonstrate that PCA-ELM outperforms the other methods reported so far under 10-fold cross-validation, with a mean accuracy of 97.73% and a maximum accuracy of 98.1%. Besides, PCA-ELM runs much faster than a support vector machine (SVM) based CAD system. Consequently, the proposed PCA-ELM method can be considered a powerful new tool for diagnosing thyroid disease with excellent performance in less time.

Keywords Thyroid disease diagnosis · Extreme learning machine (ELM) · Principal component analysis (PCA)

L.-N. Li · J.-H. Ouyang · H.-L. Chen · D.-Y. Liu (*)
College of Computer Science and Technology, Jilin University, No. 2699, QianJin Road, Changchun, Jilin 130012, China
e-mail: [email protected]

D.-Y. Liu
e-mail: [email protected]

L.-N. Li · J.-H. Ouyang · H.-L. Chen · D.-Y. Liu
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun, Jilin 130012, China

Introduction

The thyroid is a small butterfly-shaped gland located in the lower part of the neck, below the skin and muscle layers. It produces two active thyroid hormones, levothyroxine (abbreviated T4) and triiodothyronine (abbreviated T3). These hormones are important in the production of proteins, in the regulation of body temperature, and in overall energy production and regulation. As a result, thyroid function affects every essential organ in the body, and the seriousness of thyroid disorders should not be underestimated (http://thyroid.about.com/library/links/blthyroid.htm).

The thyroid gland is prone to several very distinct problems, some of which are extremely common. Production of too little thyroid hormone causes hypothyroidism, and production of too much causes hyperthyroidism. In the former case, the thyroid is underactive and unable to produce sufficient levels of thyroid hormone; in the latter case, the gland is overactive and produces an excess of thyroid hormone. Both types of disorders are relatively common in the general population. Doctors can incorporate numerous factors, including clinical evaluation, blood tests, imaging tests, biopsies,
and other tests to diagnose thyroid disease. A commonly used method is the thyroid-stimulating hormone (TSH) test, which can identify thyroid disorders even before the onset of symptoms.

Nowadays, CAD systems are becoming more and more popular, because with their help, possible errors made by experts in the course of diagnosis can be avoided, and medical data can be examined in a shorter time and in more detail. In fact, thyroid function diagnosis can be formulated as a classification problem, so it can be performed automatically with the aid of CAD systems. Machine learning techniques are increasingly used to construct CAD systems owing to their strong capability of extracting complex relationships in biomedical data. Recently, various methods have been presented to solve this problem. In 2002, Ozyilmaz et al. [1] used various neural network methods, including the Multi Layer Perceptron with back-propagation (MLP), Radial Basis Function (RBF) and adaptive Conic Section Function Neural Network (CSFNN), to help diagnose thyroid disease; their classification accuracies were 88.3%, 81.69% and 85.92%, respectively. In 1997, a Probabilistic Potential Function Neural Network (PPFNN) classifier [2] was employed and an accuracy of 78.14% was obtained. In 2004, Pasi et al. [3] applied five different methods to perform classification: Linear Discriminant Analysis (LDA), C4.5 with default learning parameters (C4.5-1), C4.5 with parameter c equal to 5 (C4.5-2), C4.5 with parameter c equal to 95 (C4.5-3), and DIMLP with two hidden layers and default learning parameters (DIMLP); the accuracies reached 81.34%, 93.26%, 92.81%, 92.94% and 94.86%, respectively. In 2006, an accuracy of 81% was obtained with the artificial immune recognition system (AIRS) proposed by Polat et al. [4]. Furthermore, the authors studied a hybrid method that combines AIRS with a fuzzy weighted pre-processing step, and obtained a classification accuracy of 85%. In 2008, Keles et al. [5] diagnosed thyroid diseases with an expert system called ESTDD (expert system for thyroid disease diagnosis), whose accuracy was 95.33%. In 2009, Temurtas [6] performed the diagnosis with a Multi Layer Perceptron trained by the Levenberg-Marquardt (LM) algorithm (MLP with LM), and the corresponding accuracy was 93.19%. In 2011, a Generalized Discriminant Analysis and Wavelet Support Vector Machine (GDA-WSVM) method [7] for the diagnosis of thyroid diseases was presented, obtaining 91.86% classification accuracy. Also in 2011, Chen et al. [8] proposed a CAD system for thyroid disease based on particle swarm optimization (PSO) optimized support vector machines with Fisher score (FS-PSO-SVM), and an average accuracy of 97.49% was achieved.

From these works, we can clearly see that the support vector machine (SVM) based CAD system [8] has achieved the best performance among the common classifiers from the machine learning community. However, for multi-class problems, SVM usually combines binary classifiers via One-Versus-All (OVA) or One-Versus-One (OVO) strategies [9], which results in a heavy computational burden and long training time. In addition, many CAD systems are based on artificial neural networks (ANN), such as CSFNN, PPFNN and DIMLP, which have also achieved very good performance on the thyroid disease diagnostic problem. However, ANN training easily gets stuck in local minima, since it is based on the empirical risk minimization principle. Recently, a new learning algorithm for single hidden layer feedforward neural networks (SLFNs), called extreme learning machine (ELM), was proposed; ELM was first introduced by Huang [10]. Unlike gradient-based methods, which iteratively adjust the network parameters, ELM chooses the input weights and hidden biases randomly, and the output weights are determined analytically using the Moore–Penrose (MP) generalized inverse. ELM not only learns much faster with higher generalization performance, but also avoids parameter tuning. Thanks to these properties, ELM has found application in a wide range of tasks such as sales forecasting [11] and predicting patient outcomes [12]. In particular, ELM has shown unique advantages over other learning algorithms on several disease diagnostic tasks. Han et al. [13] proposed an effective ELM-based model to forecast how long postoperative patients suffering from non-small cell lung cancer will survive; the experimental results showed that the proposed model obtains better prediction accuracy and a faster convergence rate than ANN models. Zhang et al. [14] evaluated the multi-category classification performance of ELM for cancer diagnosis on microarray data sets; the results indicate that ELM produces comparable or better classification accuracies with reduced training time and implementation complexity compared to ANN and SVM methods. Helmy et al. [15] proposed using ELM for five kinds of disease diagnostic problems; the evaluation results indicate that ELM produces better classification accuracy with reduced training time and implementation complexity compared to other models. Gomathi et al. [16] proposed an ELM-based CAD system for lung cancer detection, whose experimental results show that using ELM instead of SVM results in better classification accuracy.

In this paper, we investigate the effectiveness of the ELM approach for the thyroid disease diagnostic problem. Aiming to improve the efficiency and the classification accuracy of thyroid disease diagnosis, a CAD system based on ELM is introduced. Previous works [8, 17–20] have pointed out that applying feature selection or feature extraction before the classification task can improve diagnostic accuracy. Here, we attempt to examine the
effectiveness of the dimensionality reduction technique before constructing the ELM classifier for thyroid disease diagnosis. Principal component analysis (PCA) is used for feature extraction, projecting the original feature space into a new space on which ELM performs the diagnostic task. The resultant CAD system is named PCA-ELM. The effectiveness of the proposed PCA-ELM is examined in terms of classification accuracy on the thyroid disease dataset taken from the UCI machine learning repository. Promisingly, the developed PCA-ELM CAD system achieves high accuracy and runs very fast as well.

The remainder of this paper is organized as follows. ‘Background materials’ offers brief background knowledge on PCA and ELM. The implementation details of the hybrid PCA-ELM method are described in ‘Proposed PCA-ELM CAD system’. ‘Experimental designs’ presents the experimental results and discussion of the proposed method. Finally, conclusions and recommendations for future work are summarized in ‘Conclusions’.

Background materials

Principal component analysis (PCA)

PCA was invented in 1901 by Karl Pearson [21]. It is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. Since patterns can be hard to find in high-dimensional data, PCA supplies the user with a lower-dimensional picture, a “shadow” of the data when viewed from its most informative viewpoint. The other main advantage of PCA is that once these patterns have been found, the data can be compressed by reducing the number of dimensions without much loss of information. This technique is mostly used as a tool in exploratory data analysis and for making predictive models, and has found application in fields such as face recognition and image compression.

PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. It can be performed by eigenvalue decomposition of the data covariance matrix or by singular value decomposition (SVD) of the data matrix, usually after mean-centering the data for each dimension. The transformation is defined in such a way that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to the preceding components.

The PCA method can be formulated as follows. Let M be an s×t matrix representing a t-dimensional data set, in which each row is a different observation and each column gives the observed values of one variable. M is assumed to have zero mean; that is, the mean is first subtracted from each data dimension. The SVD of M is $M = U\Sigma V^{T}$, where U is an s×s orthonormal matrix containing the left singular vectors of M, Σ is an s×t rectangular diagonal matrix with nonnegative real numbers (the singular values) on the diagonal, and V is a t×t orthonormal matrix whose columns, the right singular vectors of M, are the principal directions. If the PCA transformation is required to preserve dimensionality, the new matrix N is given by $N = MV$. For a reduced-dimensionality representation, M is projected onto the reduced space spanned by only the first L right singular vectors $V_L$, giving the s×L matrix $N = MV_L$.

A brief review of extreme learning machine (ELM)

Extreme learning machine (ELM), a learning algorithm for single hidden layer feedforward neural networks (SLFNs) as shown in Fig. 1, was first introduced by Huang et al. [10]. ELM seeks to overcome challenging issues faced by traditional SLFN learning algorithms, such as slow learning speed, tedious parameter tuning and poor generalization capability. ELM has demonstrated great potential in handling classification and regression tasks with excellent generalization performance. Its learning speed is much faster than that of conventional gradient-based iterative learning algorithms for SLFNs, such as the back-propagation algorithm, while it obtains better generalization performance. ELM has several significant features [22], such as extremely fast learning speed, high generalization performance and freedom from parameter tuning, which distinguish it from traditional SLFN learning algorithms.

Fig. 1 The structure of ELM model
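ELM's learning procedure, which Eqs. 1–6 below formalize (random input weights and hidden biases, the hidden layer output matrix H, and output weights obtained through the Moore–Penrose generalized inverse), can be sketched in a few lines of NumPy. This is a minimal illustration, not the MATLAB implementation used in the experiments: the uniform [−1, 1] initialization, the sigmoid activation, the 0/1 one-hot target encoding, the arg-max decision rule and the function names `elm_train` and `elm_predict` are assumptions of this sketch, and the toy data are synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Train an ELM on inputs X (N x n) and targets T (N x m)."""
    # Step (1): random input weights w_i (columns of W) and biases b_i, drawn from U[-1, 1]
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step (2): hidden layer output matrix H (N x n_hidden), H[j, i] = g(w_i . x_j + b_i)
    H = sigmoid(X @ W + b)
    # Step (3): output weights beta = pinv(H) @ T (n_hidden x m)
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network outputs o_j for every row x_j of X (N x m)."""
    return sigmoid(X @ W + b) @ beta

def one_hot(y, n_classes):
    """Encode integer labels 0..n_classes-1 as 0/1 target rows t_j."""
    return np.eye(n_classes)[y]

# Synthetic example: three well-separated 2-D Gaussian clusters, 50 points each
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

W, b, beta = elm_train(X, one_hot(y, 3), n_hidden=10, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)   # predicted class = largest output
print("training accuracy:", np.mean(pred == y))
```

`np.linalg.pinv` computes the Moore–Penrose inverse through an SVD, so step (3) returns the minimum-norm least-squares solution even when H is rank-deficient; no iterative weight updates are involved.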
Given a training set $\aleph = \{(\mathbf{x}_i, \mathbf{t}_i) \mid \mathbf{x}_i \in \mathbb{R}^n, \mathbf{t}_i \in \mathbb{R}^m, i = 1, 2, \ldots, N\}$, where $\mathbf{x}_i$ is the n×1 input feature vector and $\mathbf{t}_i$ is an m×1 target vector, a standard SLFN with activation function g(x) and $\tilde{N}$ hidden neurons can be mathematically modeled as

$$\sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{o}_j, \qquad j = 1, 2, \ldots, N \qquad (1)$$

where $\mathbf{w}_i$ is the weight vector connecting the input layer to the ith hidden neuron, $b_i$ is the bias of the ith hidden neuron, $\boldsymbol{\beta}_i$ is the weight vector connecting the ith hidden neuron to the output layer, and $\mathbf{o}_j$ is the network output for the jth input sample. Here, $\mathbf{w}_i \cdot \mathbf{x}_j$ denotes the inner product of $\mathbf{w}_i$ and $\mathbf{x}_j$.

If the SLFN can approximate these N samples with zero error, we have $\sum_{j=1}^{N} \lVert \mathbf{o}_j - \mathbf{t}_j \rVert = 0$, i.e., there exist $\boldsymbol{\beta}_i$, $\mathbf{w}_i$ and $b_i$ such that $\sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j$, $j = 1, 2, \ldots, N$. These N equations can be written compactly as

$$H\boldsymbol{\beta} = T \qquad (2)$$

where

$$H(\mathbf{w}_1, \ldots, \mathbf{w}_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, \mathbf{x}_1, \ldots, \mathbf{x}_N) = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}} \qquad (3)$$

$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^{T} \\ \vdots \\ \boldsymbol{\beta}_{\tilde{N}}^{T} \end{bmatrix}_{\tilde{N} \times m} \quad \text{and} \quad T = \begin{bmatrix} \mathbf{t}_1^{T} \\ \vdots \\ \mathbf{t}_N^{T} \end{bmatrix}_{N \times m} \qquad (4)$$

As named by Huang et al. [23], H is called the hidden layer output matrix of the neural network; the ith column of H is the output of the ith hidden neuron with respect to the inputs $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$. Huang et al. [24, 25] have shown that the input weights and hidden layer biases of SLFNs need not be adjusted at all and can be assigned arbitrarily. Under this assumption, the output weights can be determined analytically by finding the least-squares solution $\hat{\boldsymbol{\beta}}$ of the linear system $H\boldsymbol{\beta} = T$:

$$\lVert H(\mathbf{w}_1, \ldots, \mathbf{w}_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}})\,\hat{\boldsymbol{\beta}} - T \rVert = \min_{\boldsymbol{\beta}} \lVert H(\mathbf{w}_1, \ldots, \mathbf{w}_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}})\,\boldsymbol{\beta} - T \rVert \qquad (5)$$

Equation 5 can easily be solved using a linear method such as the Moore–Penrose (MP) generalized inverse of H, as shown in Eq. 6:

$$H\boldsymbol{\beta} = T \;\Rightarrow\; \hat{\boldsymbol{\beta}} = H^{\dagger} T \qquad (6)$$

where $H^{\dagger}$ is the MP generalized inverse of the matrix H. The MP generalized inverse yields the minimum norm least-squares (LS) solution, which is unique and has the smallest norm among all LS solutions. As analyzed by Huang et al. [10], by using this MP inverse method, ELM tends to obtain good generalization performance with a dramatically increased learning speed.

In summary, given a training set $\aleph = \{(\mathbf{x}_i, \mathbf{t}_i) \mid \mathbf{x}_i \in \mathbb{R}^n, \mathbf{t}_i \in \mathbb{R}^m, i = 1, 2, \ldots, N\}$, an activation function g(x), and the number of hidden neurons $\tilde{N}$, the ELM algorithm consists of three steps:

(1) Randomly assign the input weights $\mathbf{w}_i$ and biases $b_i$, $i = 1, 2, \ldots, \tilde{N}$.
(2) Calculate the hidden layer output matrix H.
(3) Calculate the output weights $\hat{\boldsymbol{\beta}} = H^{\dagger} T$, where $T = [\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_N]^{T}$.

Proposed PCA-ELM CAD system

In this section, we describe the proposed PCA-ELM CAD system for thyroid disease diagnosis. Its architecture is shown in Fig. 2. As mentioned in the Introduction, the aim of this system is to maximize the generalization capability of ELM for thyroid disease diagnosis. To achieve this goal, we designed a hybrid method. In the first stage, dimension reduction is performed using PCA. In the second stage, the different new feature sets are fed into the ELM classifier to train an optimal model, while the number of hidden neurons that gives the most accurate results is selected. Finally, the predictive model performs the diagnostic tasks using the most discriminative new feature set and the optimal parameters.

Fig. 2 The architecture of the proposed PCA-ELM CAD system

The feature extraction and feature reduction phase

Feature extraction plays an important role in classification. The features can be divided into two subsets: one contains the features that carry most of the useful information, and the other is composed of dispensable features. Since the latter set of features hardly influences the classification performance, those features can be eliminated so that the feature vector is reduced to a lower dimension. On the one hand, this improves computation speed through dimension reduction; on the other hand, the remaining dimensions consist of highly discriminative attributes, which increases the accuracy of the resulting model. In our system, the feature extraction process is performed by PCA. First, the thyroid gland dataset is obtained; it contains a number of instances, each with several features. Second, PCA is used to reduce the original dimensions to a lower level. Finally, the transformed dataset is obtained. The detailed pseudo-code is given below:
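A minimal NumPy sketch of this feature-extraction phase follows, assuming the data arrive as a matrix with one row per patient and one column per attribute. It mean-centres the data, takes the SVD, and projects onto the first L principal directions (with observations in rows, these are the leading right singular vectors), producing one candidate feature set for every L from 1 up to the number of attributes, which matches the 1 to 5 PCs compared in the experiments. The function name `pca_feature_sets` and the toy data are hypothetical; this is an illustration, not the authors' pseudo-code.

```python
import numpy as np

def pca_feature_sets(X, max_components=None):
    """Return ({L: N_L}, S): N_L is X projected onto its first L principal
    directions (rows of X = observations, columns = features), S the singular values."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                             # zero-mean each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # Xc = U diag(S) Vt
    t = X.shape[1] if max_components is None else max_components
    # Rows of Vt are the right singular vectors, i.e. the principal directions
    return {L: Xc @ Vt[:L].T for L in range(1, t + 1)}, S

# Hypothetical toy data: 215 observations of 5 correlated attributes
rng = np.random.default_rng(0)
latent = rng.normal(size=(215, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(215, 5))

feature_sets, S = pca_feature_sets(X)
for L, N in feature_sets.items():
    print(L, N.shape)   # one candidate matrix N with L columns per choice of L
```

Each N_L keeps the first L columns of the full projection, so the candidate sets are nested, and the variance of the kth column equals S[k]**2 / (s − 1), decreasing with k as the PCA definition requires.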
The classification phase

In the second stage, the ELM model performs the classification tasks using the new feature set produced by PCA. This stage includes two main sub-procedures. First, all the parameters of the ELM model must be set up. Since the number of hidden neurons has an important influence on the performance of the ELM model, we design an experimental strategy to choose the optimal number of neurons. The idea is to compare the outputs of ELM with different numbers of hidden neurons. Suppose $N_{\min}$ and $N_{\max}$ are the minimum and maximum numbers of hidden neurons, respectively, and n is the current number of hidden neurons. For each choice of n, we compute the average accuracy obtained by ELM via 10-fold cross-validation; finally, the value with the highest average accuracy is selected as the optimal number of hidden neurons. After choosing the optimal number of hidden neurons, we use the ELM classifier to compute the classification accuracy on the output of PCA, and then average the obtained results. The detailed steps are as follows:

Step 1: Pre-process the datasets. Divide the transformed data provided by PCA into 10 subsets for 10-fold cross-validation; in each fold, 9 subsets form the training set and 1 subset forms the test set. Each dataset is represented by an L-column matrix N.

Step 2: Experimentally determine the optimal number of hidden neurons. First, set the number of hidden neurons n to $N_{\min}$. For each fold, train an ELM model on that fold's training data using the method described in ‘A brief review of extreme learning machine (ELM)’, then apply the trained model to classify the fold's test set and store the test accuracy. After all 10 folds are completed, the average accuracy is obtained as the mean of the stored accuracies. Then increase the number of hidden neurons and repeat the above procedure to obtain a new average accuracy. The iterative process stops when n reaches the maximum number of hidden neurons $N_{\max}$ or the average accuracy reaches a predefined threshold. The number of hidden neurons that yielded the highest average accuracy is then taken as the optimal value.

Step 3: Classify the test datasets with the ELM model using the optimal number of hidden neurons and the optimal new feature set obtained by PCA. As before, train ELM on the training datasets of the 10 folds, and then use the optimal ELM to classify the corresponding test datasets. The average accuracy is taken as the final performance estimate. The pseudo-code of this stage, termed the classification phase, is given below:
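The hidden-neuron search of Steps 2 and 3 can be sketched as below, assuming NumPy. The helper names (`select_hidden_neurons`, `cv_accuracy`, `stratified_folds`, `elm_fit_predict`) are hypothetical, the uniform [−1, 1] initialization and one-hot targets are assumptions, and the optional `threshold` mirrors the predefined stopping threshold of Step 2, whose value the text does not give. The experiments themselves used a MATLAB ELM and swept 1 to 50 neurons; the toy data here are synthetic and the sweep is shortened to keep the example fast.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit_predict(X_tr, y_tr, X_te, n_hidden, n_classes, rng):
    """ELM steps (1)-(3) on the training fold, then class labels for the test fold."""
    W = rng.uniform(-1.0, 1.0, size=(X_tr.shape[1], n_hidden))   # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                     # random hidden biases
    T = np.eye(n_classes)[y_tr]                                   # one-hot targets
    beta = np.linalg.pinv(sigmoid(X_tr @ W + b)) @ T              # beta = pinv(H) T
    return (sigmoid(X_te @ W + b) @ beta).argmax(axis=1)

def stratified_folds(y, k, rng):
    """Fold id per sample; each class is spread evenly over the k folds."""
    fold = np.empty(len(y), dtype=int)
    offset = 0
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        fold[idx] = (np.arange(len(idx)) + offset) % k
        offset += len(idx)          # rotate the start so leftovers spread across folds
    return fold

def cv_accuracy(X, y, fold, n_hidden, rng):
    """Mean test accuracy over the folds defined by `fold`."""
    n_classes = int(y.max()) + 1
    accs = []
    for f in np.unique(fold):
        te, tr = fold == f, fold != f
        pred = elm_fit_predict(X[tr], y[tr], X[te], n_hidden, n_classes, rng)
        accs.append(np.mean(pred == y[te]))
    return float(np.mean(accs))

def select_hidden_neurons(X, y, n_min=1, n_max=50, k=10, threshold=None, seed=0):
    """Sweep n = n_min..n_max and keep the n with the best mean k-fold accuracy."""
    rng = np.random.default_rng(seed)
    fold = stratified_folds(y, k, rng)   # same split for every n, so scores are comparable
    best_n, best_acc = n_min, -np.inf
    for n in range(n_min, n_max + 1):
        acc = cv_accuracy(X, y, fold, n, rng)
        if acc > best_acc:
            best_n, best_acc = n, acc
        if threshold is not None and acc >= threshold:
            break                        # optional early stop at a predefined threshold
    return best_n, best_acc

# Hypothetical toy data standing in for the PCA-transformed matrix N
rng = np.random.default_rng(7)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

best_n, best_acc = select_hidden_neurons(X, y, n_min=1, n_max=15)
print(f"optimal hidden neurons: {best_n}, mean 10-fold accuracy: {best_acc:.4f}")
```

Reusing one fold assignment for every candidate n is a design choice of this sketch: it keeps the comparison between neuron counts from being dominated by split-to-split noise. The random hidden weights still differ between candidates, as they do in any ELM run.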

Experimental designs

Data description

In this section, we present experiments performed on the thyroid disease database taken from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Thyroid+Disease, last accessed: 3 December 2010). This dataset is commonly used by other classification systems, so we chose it to compare our study with others on the thyroid diagnosis problem. It comprises 215 patients from the same hospital. Each individual was characterized by the results of five laboratory tests, and all of the features are continuous, as shown in Table 1. The individuals were divided into three groups of known classification based on diagnosis results; the class distribution is:

• Class 1: Healthy individuals (normal), 150 individuals in total;
• Class 2: Patients suffering from hyperthyroidism (hyper), 35 individuals;
• Class 3: Patients suffering from hypothyroidism (hypo), 30 individuals.

Fig. 3 Classification accuracies obtained by different dimensions with increasing neurons

All the input attributes are first scaled so that they lie in a suitable range. Usually, the data are normalized into the interval [−1, 1] according to Eq. 7, where x is the original value, x′ is the scaled value, $\max_a$ is the maximum value of feature a, and $\min_a$ is the minimum value of feature a:

$$x' = \left(\frac{x - \min_a}{\max_a - \min_a}\right) \times 2 - 1 \qquad (7)$$

To guarantee valid results, the k-fold cross-validation (CV) method [26] is used to evaluate the classification accuracy. k-fold cross-validation is widely applied by researchers with the aim of minimizing the bias associated with the random sampling of the training data. First, all of the data are randomly divided into k mutually exclusive subsets of approximately equal size. Second, the classification algorithm is trained and tested k times. To ensure the same class distribution in each subset, the data are split via stratified sampling, in which the class proportions in each subset are the same as those in the whole population. Empirical studies have shown that stratified cross-validation tends to give comparison results with lower bias and lower variance than regular k-fold cross-validation [27]. In this study, k is set to 10, i.e., the data are divided into ten subsets. Each time, one of the ten subsets is used as the test set and the other nine subsets are put together to form the training set. The average error across all ten trials is then computed. To evaluate performance accurately, the 10-fold CV is repeated 5 times and the results are averaged.

Experimental setting

The proposed PCA-ELM CAD system was implemented on the MATLAB platform. For ELM, the implementation by Zhu and Huang available from http://www3.ntu.edu.sg/home/egbhuang was used. For SVM, the LIBSVM implementation, originally developed by Chang and Lin [28], was used. PCA was implemented from scratch. The experiments were conducted on an AMD Dual Core 5000+ processor (2.6 GHz) with 4 GB of RAM.

ELM models were built via the stratified 10-fold cross-validation procedure on the thyroid disease database, gradually increasing the number of hidden neurons from 1 to 50 in steps of 1. The best number of neurons was used to create the training model. The sigmoid activation function was used to compute the hidden layer output matrix.

For SVM, we adopted the same setting as in [8]. The parameters C and γ were varied over $C \in \{2^{-5}, 2^{-3}, \ldots, 2^{15}\}$ and $\gamma \in \{2^{-15}, 2^{-13}, \ldots, 2^{1}\}$, and the grid search technique [29] was employed with 5-fold cross-validation to find the optimal parameter values of the RBF kernel function.
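The stratified, repeated 10-fold protocol described above can be sketched as follows, assuming NumPy. After shuffling within each class, the samples are dealt out round-robin over the folds, so every fold keeps nearly the same class proportions as the full dataset, and a fresh assignment is drawn for each of the 5 repetitions. The helper names are hypothetical and only the class sizes (150/35/30) come from the data description; the text does not say how the authors generated their folds.

```python
import numpy as np

def stratified_folds(y, k, rng):
    """Assign a fold id in 0..k-1 to every sample, spreading each class evenly."""
    fold = np.empty(len(y), dtype=int)
    offset = 0
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))   # shuffle the samples of class c
        fold[idx] = (np.arange(len(idx)) + offset) % k    # deal them out round-robin
        offset += len(idx)                                 # rotate start to balance fold sizes
    return fold

def repeated_stratified_folds(y, k=10, n_repeats=5, seed=0):
    """One independent stratified fold assignment per repetition of k-fold CV."""
    rng = np.random.default_rng(seed)
    return [stratified_folds(y, k, rng) for _ in range(n_repeats)]

# Class sizes from the data description: 150 normal, 35 hyper, 30 hypo
labels = np.repeat([0, 1, 2], [150, 35, 30])
folds = repeated_stratified_folds(labels, k=10, n_repeats=5, seed=0)

# Per-fold class counts for the first repetition (rows = folds, columns = classes)
counts = np.array([np.bincount(labels[folds[0] == f], minlength=3) for f in range(10)])
print(counts)

# Typical use: fold f of repetition r is the test set, the other nine folds train
for r, fold in enumerate(folds):
    for f in range(10):
        test_idx = np.flatnonzero(fold == f)
        train_idx = np.flatnonzero(fold != f)
        # train on train_idx, evaluate on test_idx, then average over all r and f
```

With these class sizes, every fold receives 15 normal and 3 hypothyroid cases, while the 35 hyperthyroid cases split 4/3 across folds, which is as close to the population proportions as integer counts allow.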

Table 1 The details of the 5 attributes of the thyroid disease dataset

Attribute | Description | Mean | Standard deviation
F1 | T3-resin uptake test (a percentage) | 109.595 | 13.145
F2 | Total serum thyroxin as measured by the isotopic displacement method | 9.805 | 4.697
F3 | Total serum triiodothyronine as measured by radioimmunoassay | 2.050 | 1.420
F4 | Basal thyroid-stimulating hormone (TSH) as measured by radioimmunoassay | 2.880 | 6.118
F5 | Maximal absolute difference of TSH value after injection of 200 μg of thyrotropin-releasing hormone as compared to the basal value | 4.199 | 8.071
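Eq. 7 can be applied column by column, as in the following NumPy sketch. The function name `minmax_scale` and the example values are hypothetical (they are not patient data). The optional `mins`/`maxs` arguments are an addition of this sketch: the text scales all input attributes before cross-validation, but storing the per-feature constants also allows other data to be mapped with constants computed on a training set only.

```python
import numpy as np

def minmax_scale(X, mins=None, maxs=None):
    """Column-wise scaling to [-1, 1] following Eq. 7.

    If mins/maxs are given (e.g. taken from a training set), they are reused,
    so other data can be mapped with the same per-feature constants.
    """
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0) if mins is None else np.asarray(mins, dtype=float)
    maxs = X.max(axis=0) if maxs is None else np.asarray(maxs, dtype=float)
    span = np.where(maxs > mins, maxs - mins, 1.0)   # guard against constant features
    return (X - mins) / span * 2.0 - 1.0, mins, maxs

# Made-up values for the five attributes F1..F5, for illustration only
X = np.array([[100.0,  5.0, 1.0, 0.5,  0.0],
              [110.0, 10.0, 2.0, 1.5,  5.0],
              [120.0, 15.0, 3.0, 2.5, 10.0]])
X_scaled, mins, maxs = minmax_scale(X)
print(X_scaled)                      # each column becomes -1, 0, 1

# Reusing the stored constants for a new row
x_new, _, _ = minmax_scale([[105.0, 7.5, 1.5, 1.0, 2.5]], mins, maxs)
print(x_new)
```

Scaling matters for ELM in particular because the random input weights are drawn from a fixed range; attributes on very different scales (F1 is around 110, F3 around 2 in Table 1) would otherwise dominate the pre-activations w_i · x_j + b_i.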
3334 J Med Syst (2012) 36:3327–3337

Experimental results and discussion Table 2 Classification results of PCA-ELM

Number of PCs Classification accuracy (%)


To evaluate the effectiveness of the proposed method, we
conducted experiments on the diagnosis of thyroid disease. Mean SD Max Min
The validation accuracy against the number of hidden neu-
1 97.73 0.39 98.1 97.16
rons for ELM based on different number of principles (PCs)
2 93.48 1.36 95.35 92.06
on the thyroid database is shown in Fig. 3. It can be seen
3 93.81 1.01 94.37 92.03
from the figure that the performance of ELM is changed
with different number of neurons and PCs. The highest 4 93.55 0.69 94.37 92.51
validation accuracy has been achieved when the number of 5 91.67 1.21 92.97 90.17
hidden neurons is equal to 13, 37, 29, 27 and 46 for 5
different ELM models based on 1 PC, 2 PCs, 3 PCs, 4
PCs and 5PCs respectively. Therefore, 13, 37, 29, 27 and Table 3 shows the classification confusion matrix of the
46 hidden neurons are chosen to create the training model best ELM model based on 1PC in 5 runs. The confusion
for ELM based on 1 PC, 2 PCs, 3 PCs, 4 PCs and 5PCs in matrix summed the whole 10 test subsets in 10-fold cross-
subsequent analysis, respectively. validation. As can be seen from the table, PCA-ELM correctly
After obtaining the optimal number of hidden neurons for classifies the whole 150 normal cases, misclassifies 1 case of
different dimension of feature vector, we tested the classifi- hyperthyroidism as normal one and 3 cases of hypothyroidism
cation accuracies of ELM model with the optimal parameter as the normal ones. For comparison purpose, Table 4 lists the
on different PCs. The classification results over 5 runs of 10- classification accuracies of our method and previous methods.
fold CV for the 5 models are shown in Fig. 4.The details of As shown in Table 4, our developed PCA-ELM CAD system
the results are given in Table 2. As can be seen from the can obtain better classification accuracy than that of all avail-
table, the best results were obtained with the model based on able methods proposed in previous studies.
1 PC with respect to the mean and maximum classification In order to further investigate the effectiveness and effi-
results. The best result was 98.1% and the mean classifica- ciency of the proposed PCA-ELM CAD system, we attemp-
tion result was 97.73%. Note that the ELM model based on ted to compare the results of the proposed approach with
5 PCs can be regard as the model without dimension reduc- that of PCA-SVM CAD system implemented in [8] in terms
tion, because the whole original features are served as the of the classification accuracy and CPU time. In PCA-SVM,
inputs into the ELM model. Compared with the ELM model all the five principle components (PCs) were fed into the
based on 5 PCs, the one based on 1 PC has enhanced further SVM classifier. As reported in [8], the model using 2 PCs
the mean classification accuracy by 6.06% and the maxi- achieved the best results with the mean classification accu-
mum classification accuracy by 5.13% thanks to using PCA. racy of 96.40% and the maximum classification accuracy of
The relatively bad performance of the ELM model based on 97.25%. Here, the SVM model based on 2 PCs was com-
5 PCs is due to the existence of irrelevant information in the pared with the PCA-ELM model based on 1 PC in our
data, which leads to decreasing the performance of the ELM experiment. The comparison results are summarized in
classifier. Table 5.
The results show that PCA-ELM outperforms PCA-SVM in classification accuracy at the 5% significance level (paired t-test p-value of 0.016, Table 5). In addition, the standard deviation of the PCA-ELM accuracy (0.39) is smaller than that of PCA-SVM (0.52), which indicates the consistency and stability of the proposed model. As for running time, the mean running time of the ELM model is 7.54 s, shorter than the 11.48 s of the SVM model. Note that the running time for ELM includes both the selection of the optimal number of hidden neurons and the training of the ELM model. This implies that, with prior knowledge of the optimal number of hidden neurons, the running time could be reduced further.

These analyses show that the PCA-ELM model is an efficient classification method in comparison with the PCA-SVM method. Therefore, PCA-ELM appears to be a more appropriate tool for the thyroid disease diagnosis problem than the other methods compared. Consequently, we are more convinced that the proposed CAD system can be very helpful in assisting physicians to make accurate diagnoses.

Table 3 Classification confusion matrix for the whole 10 test subsets in one run of 10-fold CV

Actual \ Predicted   Normal   Hyperthyroidism   Hypothyroidism
Normal               150      0                 0
Hyperthyroidism      1        34                0
Hypothyroidism       3        0                 27

Fig. 4 Classification accuracies over 5 runs of 10-fold CV with different dimensions

Table 4 Classification accuracies obtained with our method and other methods

Study                              Method                                    Accuracy (%)
Serpen et al. (1997) [2]           MLP                                       36.74 (test data)
                                   LVQ                                       81.86 (test data)
                                   RBF                                       72.09 (test data)
                                   PPFNN                                     78.14 (test data)
Ozyilmaz and Yildirim (2002) [1]   MLP with back-propagation                 86.33 (average-3-fold-CV)
                                   MLP with fast back-propagation            89.80 (average-3-fold-CV)
                                   RBF                                       79.08
                                   CSFNN                                     91.14
Pasi (2004) [3]                    LDA                                       81.34 (test data)
                                   C4.5-1                                    93.26 (test data)
                                   C4.5-2                                    92.81 (test data)
                                   C4.5-3                                    92.94 (test data)
                                   MLP                                       96.24 (test data)
                                   DIMLP                                     94.86 (test data)
Polat et al. (2007) [4]            AIRS                                      81.00 (average-10-fold-CV)
                                   AIRS with fuzzy weighted pre-processing   85.00 (average-3-fold-CV)
Keles et al. (2008) [5]            ESTDD                                     95.33 (10-fold-CV)
Temurtas (2009) [6]                MLNN with LM                              92.96 (3-fold-CV)
                                   PNN                                       94.43 (3-fold-CV)
                                   LVQ                                       89.79 (3-fold-CV)
                                   MLNN with LM                              93.19 (10-fold-CV)
                                   PNN                                       94.81 (10-fold-CV)
                                   LVQ                                       90.05 (10-fold-CV)
Esin Dogantekin et al. (2011) [7]  GDA-WSVM                                  91.86 (test data)
Chen et al. (2011) [8]             FS-PSO-SVM                                97.40 (average-10-fold-CV)
This study                         PCA-ELM                                   97.73 (average-10-fold-CV)
                                                                             98.10 (maximum, 10-fold-CV)
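As a worked check on Table 3, the short Python sketch below (an illustration, not part of the original study) pools the confusion matrix over the 10 test subsets and computes overall accuracy plus per-class sensitivity (recall) and precision. The 215 instances are simply the row totals 150 + 35 + 30 taken from the table.

```python
# Confusion matrix from Table 3: rows = actual class, columns = predicted class.
labels = ["Normal", "Hyperthyroidism", "Hypothyroidism"]
cm = [
    [150, 0, 0],
    [1, 34, 0],
    [3, 0, 27],
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))
accuracy = correct / total

# Sensitivity (recall): correct predictions divided by the actual count of each class.
sensitivity = {lab: cm[i][i] / sum(cm[i]) for i, lab in enumerate(labels)}
# Precision: correct predictions divided by how often each class was predicted.
precision = {lab: cm[i][i] / sum(row[i] for row in cm) for i, lab in enumerate(labels)}

print(f"accuracy = {correct}/{total} = {accuracy:.4f}")  # accuracy = 211/215 = 0.9814
for lab in labels:
    print(f"{lab}: sensitivity {sensitivity[lab]:.4f}, precision {precision[lab]:.4f}")
# Normal: sensitivity 1.0000, precision 0.9740
# Hyperthyroidism: sensitivity 0.9714, precision 1.0000
# Hypothyroidism: sensitivity 0.9000, precision 1.0000
```

Three of the four errors in this run are hypothyroid cases predicted as normal, which is why hypothyroidism has the lowest sensitivity.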

Table 5 The performance comparison of PCA-ELM with PCA-SVM over 5 runs of 10-fold CV

            PCA-ELM                     PCA-SVM [8]
Run         (1) ACC (%)    Time (s)     (2) ACC (%)    Time (s)
#1 run      98.10          7.7          96.32          11.2
#2 run      97.16          7.4          96.30          11.4
#3 run      98.10          7.3          95.82          11.7
#4 run      97.64          7.9          96.30          11.3
#5 run      97.64          7.4          97.25          11.8
Mean ± SD   97.73 ± 0.39   7.54 ± 0.25  96.40 ± 0.52   11.48 ± 0.26

Paired t-test p-value (between (1) and (2)): 0.016
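The p-value in Table 5 can be re-derived from the per-run accuracies. The sketch below assumes a standard two-sided paired t-test on the five run-wise accuracy pairs (the table names the test but not its sidedness). To stay within the Python standard library, it integrates the Student t density numerically with Simpson's rule instead of calling a statistics package; `t_pdf` and `paired_t_test` are illustrative helper names.

```python
import math
from statistics import mean, stdev

# Per-run accuracies (%) copied from Table 5.
elm = [98.10, 97.16, 98.10, 97.64, 97.64]
svm = [96.32, 96.30, 95.82, 96.30, 97.25]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_test(a, b, steps=2000):
    """Two-sided paired t-test; the p-value uses Simpson's rule on the t density."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))  # stdev uses the n - 1 denominator
    df = n - 1
    # P(|T| <= |t|) = 2 * integral of the density from 0 to |t| (steps must be even).
    h = abs(t) / steps
    s = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    p = 1.0 - 2.0 * s * h / 3.0
    return t, p


t, p = paired_t_test(elm, svm)
print(f"t = {t:.2f}, p = {p:.3f}")  # prints: t = 4.00, p = 0.016
```

The result, t ≈ 4.00 with 4 degrees of freedom and p ≈ 0.016, agrees with the value reported in Table 5.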

Conclusions

In this work, we have developed a CAD system, PCA-ELM, to assist the diagnosis of thyroid disease. The main aim of this system is to exploit the distinctive features of the ELM classifier, including better generalization performance, fast learning speed, simplicity, and freedom from tedious and time-consuming parameter tuning, to perform thyroid disease diagnosis. To remove irrelevant information from the thyroid data, PCA was used for feature reduction before applying the ELM classifier. Experimental results demonstrated that the proposed system performed well in distinguishing among hyperthyroidism, hypothyroidism and normal cases. PCA-ELM achieved a maximum classification accuracy of 98.1% and a mean classification accuracy of 97.73% using 10-fold cross-validation. A comparative study was also conducted between PCA-SVM and PCA-ELM. The experimental results showed that PCA-ELM significantly outperformed PCA-SVM in classification accuracy, with shorter run time. Therefore, the results suggest that the developed PCA-ELM CAD system can help physicians make accurate diagnostic decisions. Future investigations will focus on evaluating the proposed system on other medical diagnosis problems.

Acknowledgements This research is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61133011, 61170092, 60973088, 60873149.

References

1. Ozyilmaz, L., and Yildirim, T., Diagnosis of thyroid disease using artificial neural network methods. In Proceedings of ICONIP'02, ninth international conference on neural information processing, Orchid Country Club, Singapore, pp. 2033–2036, 2002.
2. Serpen, G., Jiang, H., and Allred, L., Performance analysis of probabilistic potential function neural network classifier. In Proceedings of artificial neural networks in engineering conference, St. Louis, MO, Vol. 7, pp. 471–476, 1997.
3. Pasi, L., Similarity classifier applied to medical data sets. In International conference on soft computing, Helsinki, Finland & Gulf of Finland & Tallinn, Estonia, 2004.
4. Polat, K., Sahan, S., and Gunes, S., A novel hybrid method based on artificial immune recognition system (AIRS) with fuzzy weighted pre-processing for thyroid disease diagnosis. Expert Syst. Appl. 32(4):1141–1147, 2007.
5. Keles, A., and Keles, A., ESTDD: expert system for thyroid diseases diagnosis. Expert Syst. Appl. 34(1):242–246, 2008.
6. Temurtas, F., A comparative study on thyroid disease diagnosis using neural networks. Expert Syst. Appl. 36(1):944–949, 2009.
7. Dogantekin, E., Dogantekin, A., and Avci, D., An expert system based on generalized discriminant analysis and wavelet support vector machine for diagnosis of thyroid diseases. Expert Syst. Appl. 38(1):146–150, 2011.
8. Chen, H. L., Yang, B., Wang, G., Liu, J., Chen, Y. D., and Liu, D. Y., A three-stage expert system based on support vector machines for thyroid disease diagnosis. J. Med. Syst., 2011. http://dx.doi.org/10.1007/s10916-011-9655-8.
9. Hsu, C. W., and Lin, C. J., A comparison of methods for multi-class support vector machines. IEEE Trans. Neural Netw. 13(2):415–425, 2002.
10. Huang, G. B., Zhu, Q. Y., and Siew, C. K., Extreme learning machine: a new learning scheme of feedforward neural networks. IEEE Int. Jt. Conf. Neural Netw. 2:985–990, 2004.
11. Chen, F. L., and Ou, T. Y., Sales forecasting system based on gray extreme learning machine with Taguchi method in retail industry. Expert Syst. Appl. 38(3):1336–1345, 2011.
12. Liu, N., Lin, Z., Koh, Z., Huang, G. B., Ser, W., and Ong, M. E. H., Patient outcome prediction with heart rate variability and vital signs. J. Signal Proc. Syst. 1–14, 2010.
13. Han, F., et al., The forecast of the postoperative survival time of patients suffered from non-small cell lung cancer based on PCA and extreme learning machine. Int. J. Neural Syst. 16(1):39–46, 2006.
14. Zhang, R., et al., Multicategory classification using an extreme learning machine for microarray gene expression cancer diagnosis. IEEE/ACM Trans. Comput. Biol. Bioinforma. 4(3):485–494, 2007.
15. Helmy, T., and Rasheed, Z., Multi-category bioinformatics dataset classification using extreme learning machine. In IEEE Congress on Evolutionary Computation (CEC '09), 2009.
16. Gomathi, M., and Thangaraj, P., A computer aided diagnosis system for lung cancer detection using machine learning technique. Eur. J. Sci. Res. 51(2):260–275, 2011.
17. Chen, H. L., Liu, D. Y., Yang, B., Liu, J., and Wang, G., A new hybrid method based on local fisher discriminant analysis and support vector machines for hepatitis disease diagnosis. Expert Syst. Appl. 38(9):11796–11803, 2011.
18. Chen, H. L., Yang, B., Liu, J., and Liu, D. Y., A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis. Expert Syst. Appl. 38(7):9014–9022, 2011.
19. Polat, K., and Gunes, S., Computer aided medical diagnosis system based on principal component analysis and artificial immune recognition system classifier algorithm. Expert Syst. Appl. 34(1):773–779, 2008.
20. Polat, K., and Gunes, S., An expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease. Digit. Signal Proc. 17(4):702–710, 2007.
21. Pearson, K., On lines and planes of closest fit to systems of points in space. Philos. Mag. 2(6):559–572, 1901.
22. Huang, G. B., Zhu, Q. Y., and Siew, C. K., Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501, 2006.
23. Huang, G. B., and Babri, H. A., Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions. IEEE Trans. Neural Netw. 9(1):224–229, 1998.
24. Huang, G. B., Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE Trans. Neural Netw. 14(2):274–281, 2003.
25. Huang, G. B., Chen, L., and Siew, C. K., Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17(4):879–892, 2006.
26. Salzberg, S. L., On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min. Knowl. Discov. 1(3):317–328, 1997.
27. Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th international joint conference on artificial intelligence, Vol. 2, 1995.
28. Chang, C. C., and Lin, C. J., LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm.
29. Hsu, C. W., Chang, C. C., and Lin, C. J., A practical guide to support vector classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 2003. Available at http://www.csie.ntu.edu.tw/cjlin/libsvm/.
