Disease Using Extreme ML
DOI 10.1007/s10916-012-9825-3
ORIGINAL PAPER
Received: 17 November 2011 / Accepted: 26 January 2012 / Published online: 12 February 2012
© Springer Science+Business Media, LLC 2012
Abstract In this paper, we present an effective and efficient computer aided diagnosis (CAD) system based on principal component analysis (PCA) and extreme learning machine (ELM) to assist the task of thyroid disease diagnosis. The CAD system comprises three stages. Focusing on dimension reduction, the first stage applies PCA to construct the most discriminative new feature set. The system then switches to the second stage, whose target is model construction. An ELM classifier is explored to train an optimal predictive model whose parameters are optimized. As is well known, the number of hidden neurons plays an important role in the performance of ELM, so we propose an experimental method to search for the optimal value. Finally, the obtained optimal ELM model performs the thyroid disease diagnosis tasks using the most discriminative new feature set and the optimal parameters. The effectiveness of the resultant CAD system (PCA-ELM) has been rigorously estimated on a thyroid disease dataset taken from the UCI machine learning repository. We compare it with other related methods in terms of classification accuracy. Experimental results demonstrate that PCA-ELM outperforms the other methods reported so far under 10-fold cross-validation, with a mean accuracy of 97.73% and a maximum accuracy of 98.1%. Besides, PCA-ELM runs much faster than the support vector machines (SVM) based CAD system. Consequently, the proposed PCA-ELM method can be considered a powerful new tool for diagnosing thyroid disease with excellent performance and less time.

Keywords Thyroid disease diagnosis . Extreme learning machine (ELM) . Principal component analysis (PCA)

L.-N. Li : J.-H. Ouyang : H.-L. Chen : D.-Y. Liu (*)
College of Computer Science and Technology, Jilin University, No.2699, QianJin Road, Changchun, Jilin 130012, China
e-mail: [email protected]
D.-Y. Liu
e-mail: [email protected]

L.-N. Li : J.-H. Ouyang : H.-L. Chen : D.-Y. Liu
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun, Jilin 130012, China

Introduction

The thyroid is a small gland, shaped like a butterfly, located in the lower part of the neck below the skin and muscle layers. The thyroid gland produces two active thyroid hormones, thyroxine (abbreviated T4) and triiodothyronine (abbreviated T3). These hormones are important in the production of proteins, in the regulation of body temperature, and in overall energy production and regulation. As a result, thyroid function affects every essential organ in the body. The seriousness of thyroid disorders should not be underestimated (http://thyroid.about.com/library/links/blthyroid.htm).

The thyroid gland is prone to several very distinct problems, some of which are extremely common. Production of too little thyroid hormone causes hypothyroidism, and production of too much thyroid hormone causes hyperthyroidism. In the former case, the thyroid is under-active and unable to produce sufficient levels of thyroid hormone. In the latter case, on the contrary, the thyroid gland is overactive and produces an excess of thyroid hormone. Both types of disorders are relatively common in the general population. Doctors can incorporate numerous factors, including clinical evaluation, blood tests, imaging tests, biopsies,
3328 J Med Syst (2012) 36:3327–3337
and other tests to diagnose thyroid disease. A commonly used method is the thyroid-stimulating hormone (TSH) test, which can identify thyroid disorders even before the onset of symptoms.

Nowadays, CAD systems are becoming more and more popular. With the help of CAD systems, possible errors made by experts in the course of diagnosis can be avoided, and medical data can be examined in a shorter time and in more detail. In fact, thyroid function diagnosis can be formulated as a classification problem, so it can be performed automatically with the aid of CAD systems. Machine learning techniques are increasingly used to construct CAD systems owing to their strong capability of extracting complex relationships from biomedical data. Various methods have been presented to solve this problem. In 2002, Ozyilmaz et al. [1] used various neural network methods including Multi Layer Perceptron with Back-Propagation (MLP), Radial Basis Function (RBF) and adaptive Conic Section Function Neural Network (CSFNN) to help diagnose thyroid disease; their classification accuracies were 88.3%, 81.69% and 85.92%, respectively. In 1997, a Probabilistic Potential Function Neural Network (PPFNN) classifier [2] was employed and an accuracy of 78.14% was obtained. In 2004, Pasi et al. [3] applied five different methods including Linear Discriminant Analysis (LDA), C4.5 with default learning parameters (C4.5-1), C4.5 with parameter c equal to 5 (C4.5-2), C4.5 with parameter c equal to 95 (C4.5-3) and DIMLP with two hidden layers and default learning parameters (DIMLP) to perform classification, and the accuracies reached 81.34%, 93.26%, 92.81%, 92.94% and 94.86%, respectively. In 2006, an accuracy of 81% was obtained with the artificial immune recognition system (AIRS) proposed by Polat et al. [4]. Furthermore, the authors studied a hybrid method that combines AIRS with a fuzzy weighted pre-processing step, and obtained a classification accuracy of 85%. In 2008, Keles et al. [5] diagnosed thyroid diseases with an expert system called ESTDD (expert system for thyroid disease diagnosis), whose accuracy was 95.33%. In 2009, Temurtas [6] performed the diagnosis with a Multi Layer Perceptron trained by the Levenberg-Marquardt (LM) algorithm (MLP with LM), and the corresponding accuracy was 93.19%. In 2011, a Generalized Discriminant Analysis (GDA) and Wavelet Support Vector Machine (WSVM) method (GDA-WSVM) [7] for diagnosis of thyroid diseases was presented, and obtained 91.86% classification accuracy. In 2011, Chen et al. [8] proposed a CAD system for thyroid disease based on particle swarm optimization optimized support vector machines with Fisher score (FS-PSO-SVM), and an average accuracy of 97.49% was achieved.

From these works, we can clearly see that the support vector machines (SVM) based CAD system [8] has achieved the best performance among the common classifiers from the machine learning community. However, for multi-class problems, SVM usually combines binary classifiers via One-Versus-All (OVA) or One-Versus-One (OVO) strategies [9], which results in a great computational burden and long training time. In addition, many CAD systems are based on artificial neural networks (ANN), such as CSFNN, PPFNN and DIMLP, which have also achieved very good performance on the thyroid disease diagnostic problem. However, ANN is prone to getting stuck in local minima since it is based on the empirical risk minimization principle. Recently, a new learning algorithm for single hidden layer feedforward neural networks (SLFNs), called extreme learning machine (ELM), was proposed by Huang et al. [10]. Unlike gradient-based methods, which iteratively adjust the network parameters, ELM chooses the input weights and hidden biases randomly, and the output weights are analytically determined by using the Moore–Penrose (MP) generalized inverse. ELM not only learns much faster with higher generalization performance, but also requires almost no parameter tuning. Thanks to these properties, ELM has found application in a wide range of tasks such as sales forecasting [11], predicting patient outcomes [12] and so on. In particular, ELM has shown advantages over other learning algorithms on several disease diagnostic tasks. Han et al. [13] proposed an effective model based on ELM to forecast the postoperative survival time of patients suffering from non-small cell lung cancer. The experimental results showed that the proposed model obtains better prediction accuracy and a faster convergence rate than ANN models. Zhang et al. [14] evaluated the multi-category classification performance of ELM for cancer diagnosis based on microarray data sets; the results indicate that ELM produces comparable or better classification accuracies with reduced training time and implementation complexity compared to ANN and SVM methods. Helmy et al. [15] proposed to use ELM for five kinds of disease diagnostic problems; the evaluation results indicate that ELM produces better classification accuracy with reduced training time and implementation complexity compared to other models. Gomathi et al. [16] proposed an ELM based CAD system for lung cancer detection, whose experimental results show that using ELM instead of SVM results in better classification accuracy.

In this paper we attempt to investigate the effectiveness of the ELM approach for the thyroid disease diagnostic problem. Aiming at improving the efficiency and accuracy of thyroid disease diagnosis, a CAD system based on ELM is introduced. Previous works [8, 17–20] have pointed out that using feature selection or feature extraction before conducting the classification tasks can improve the diagnosis accuracy. Here, we attempt to examine the
effectiveness of the dimensionality reduction technique before constructing the ELM classifier for thyroid disease diagnosis. Principal component analysis (PCA) is utilized to do the feature extraction, which projects the original feature space into a new space, on which the ELM is used to perform the diagnostic task. The resultant CAD system is named PCA-ELM. The effectiveness of the proposed PCA-ELM is examined in terms of classification accuracy on the thyroid disease dataset taken from the UCI machine learning repository. Promisingly, the developed PCA-ELM CAD system achieves high accuracy and runs very fast as well.

The remainder of this paper is organized as follows. ‘Background materials’ offers brief background knowledge on PCA and ELM. The implementation details of the hybrid method PCA-ELM are described in ‘Proposed PCA-ELM CAD system’. ‘Experimental designs’ presents the experimental results and discussion of the proposed method. Finally, conclusions and recommendations for future work are summarized in ‘Conclusions’.

Background materials

Principal component analysis (PCA)

PCA was invented in 1901 by Karl Pearson [21]. It is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. Since patterns can be hard to find in data of high dimension, PCA supplies the user with a lower-dimensional picture, a “shadow” of the data when viewed from its most informative viewpoint. The other main advantage of PCA is that once these patterns have been found, the data can be compressed by reducing the number of dimensions without much loss of information. This technique is mostly used as a tool in exploratory data analysis and for making predictive models, and has found application in fields such as face recognition and image compression.

PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. It can be done by eigenvalue decomposition of a data covariance matrix or singular value decomposition of a data matrix, usually after mean centering the data for each dimension. The transformation is defined in such a way that the first principal component has as high a variance as possible, and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to the preceding components.

The PCA method can be formulated as follows. Given an $s \times t$ matrix $M$, a $t$-dimensional data set in which each row represents a different observation and each column gives the observed values of one variable, we first subtract the mean of each variable so that every column of $M$ has zero mean. The singular value decomposition of $M$ is $M = U \Sigma V^{T}$, where $U$ is an $s \times s$ orthonormal matrix containing the left singular vectors of $M$, $\Sigma$ is an $s \times t$ rectangular diagonal matrix with nonnegative real numbers on the diagonal, and the $t \times t$ matrix $V$ contains the right singular vectors of $M$, which are the principal directions. If we require that the PCA transformation preserves dimensionality, the new matrix $N$ is given by $N = MV$. If we want a reduced-dimensionality representation, we can project $M$ onto the space spanned by only the first $L$ right singular vectors $V_L$, so that $N = M V_L$ is an $s \times L$ matrix.

A brief review of extreme learning machine (ELM)

Extreme learning machine (ELM), a learning algorithm for single hidden layer feedforward neural networks (SLFNs) as shown in Fig. 1, was first introduced by Huang et al. [10]. ELM seeks to overcome the challenging issues faced by traditional SLFN learning algorithms, such as slow learning speed, tedious parameter tuning and poor generalization capability. ELM has demonstrated great potential in handling classification and regression tasks with excellent generalization performance. The learning speed of ELM is much faster than that of conventional gradient based iterative learning algorithms for SLFNs, such as the back propagation algorithm, while obtaining better generalization performance. ELM has several significant features [22], such as extremely fast learning speed, high generalization performance and freedom from parameter tuning, which distinguish it from the traditional learning algorithms of SLFNs.

Fig. 1 The structure of ELM model
Given a training set $\aleph = \{(x_i, t_i) \mid x_i \in \mathbb{R}^{n},\ t_i \in \mathbb{R}^{m},\ i = 1, 2, \ldots, N\}$, where $x_i$ is the $n \times 1$ input feature vector and $t_i$ is an $m \times 1$ target vector, the standard SLFNs with an activation function $g(x)$ and $\tilde{N}$ hidden neurons can be mathematically modeled as follows:

\[ \sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N \tag{1} \]

where $w_i$ is the weight vector between the $i$th neuron in the hidden layer and the input layer, $b_i$ is the bias of the $i$th neuron in the hidden layer, $\beta_i$ is the weight vector between the $i$th hidden neuron and the output layer, and $o_j$ is the network output vector for the $j$th input. Here, $w_i \cdot x_j$ denotes the inner product of $w_i$ and $x_j$.

If SLFNs can approximate these $N$ samples with zero error, we will have $\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0$, i.e., there exist $\beta_i$, $w_i$ and $b_i$ such that $\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = t_j$, $j = 1, 2, \ldots, N$. The above equation can be reformulated compactly as:

\[ H\beta = T \tag{2} \]

where

\[ H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, x_1, \ldots, x_N) = \begin{pmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{pmatrix}_{N \times \tilde{N}} \tag{3} \]

\[ \beta = \begin{pmatrix} \beta_1^{T} \\ \vdots \\ \beta_{\tilde{N}}^{T} \end{pmatrix}_{\tilde{N} \times m} \quad \text{and} \quad T = \begin{pmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{pmatrix}_{N \times m} \tag{4} \]

As named by Huang et al. [23], $H$ is called the hidden layer output matrix of the neural network, with the $i$th column of $H$ being the $i$th hidden neuron output with respect to inputs $x_1, x_2, \ldots, x_N$. Huang et al. [24, 25] have shown that the input weights and the hidden layer biases of SLFNs need not be adjusted at all and can be arbitrarily given. Under this assumption, the output weights can be analytically determined by finding the least squares solution $\hat{\beta}$ of the linear system $H\beta = T$:

\[ \lVert H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}})\,\hat{\beta} - T \rVert = \min_{\beta} \lVert H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}})\,\beta - T \rVert \tag{5} \]

Equation 5 can be easily solved using a linear method, such as the Moore–Penrose (MP) generalized inverse of $H$, as shown in Eq. 6:

\[ H\beta = T \;\Rightarrow\; \hat{\beta} = H^{\dagger} T \tag{6} \]

where $H^{\dagger}$ is the MP generalized inverse of the matrix $H$. The use of the MP generalized inverse leads to the minimum norm least-squares (LS) solution, which is unique and has the smallest norm among all the LS solutions. As analyzed by Huang et al. [10], by using the MP inverse method, ELM tends to obtain good generalization performance with a dramatically increased learning speed.

In summary, the ELM algorithm consists of the following three steps. Given a training set $\aleph = \{(x_i, t_i) \mid x_i \in \mathbb{R}^{n},\ t_i \in \mathbb{R}^{m},\ i = 1, 2, \ldots, N\}$, an activation function $g(x)$, and the number of hidden neurons $\tilde{N}$:

(1) Randomly assign the input weights $w_i$ and biases $b_i$, $i = 1, 2, \ldots, \tilde{N}$.
(2) Calculate the hidden layer output matrix $H$.
(3) Calculate the output weight $\beta = H^{\dagger} T$, where $T = [t_1, t_2, \ldots, t_N]^{T}$.

Proposed PCA-ELM CAD system

In this section, we describe the proposed PCA-ELM CAD system for thyroid disease diagnosis. The architecture is shown in Fig. 2. As mentioned in the Introduction, the aim of this system is to maximize the generalization capability of ELM for thyroid disease diagnosis. To achieve this goal, we designed a hybrid method. In the first stage, dimension reduction is performed using PCA. In the second stage, different new feature sets are fed into the ELM classifier to train an optimal model, while the number of hidden neurons that yields the most accurate results is selected. Finally, the predictive model conducts the diagnostic tasks using the most discriminative new feature set and the optimal parameters.

The feature extraction and feature reduction phase

As is well known, feature extraction plays an important role in classification. The features can be divided into two subsets: one contains the features that carry most of the useful information, and the other is composed of the dispensable features. Since the latter set of features hardly influences the classification performance, we can eliminate those features, so that the dimension of the feature vector is reduced. On one side, this improves the computation speed through dimension reduction; on the other side, these
remaining dimensions are composed of attributes with high discriminative power, which increases the accuracy of the resulting model. In our system, the feature extraction process is performed by PCA. Firstly, the thyroid gland dataset is obtained; it contains a number of instances, each of which has several features. Secondly, we use PCA to reduce the original dimensions to a lower level. Finally, we get the transformed dataset. The detailed pseudo-code is given below:
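A minimal sketch of this feature reduction phase, assuming NumPy and taking the rows of the data matrix M as observations, so that the principal directions are the right singular vectors of the mean-centered M. The function name `pca_reduce` and the synthetic 215 x 5 data are hypothetical placeholders for illustration, not the authors' implementation or the real thyroid data.

```python
import numpy as np

def pca_reduce(M, L):
    """Project an (s x t) data matrix onto its first L principal directions.

    Hypothetical helper for illustration; rows are observations,
    columns are variables.
    """
    M = np.asarray(M, dtype=float)
    mean = M.mean(axis=0)
    centered = M - mean                      # subtract the mean of each variable
    # Thin SVD: centered = U @ diag(S) @ Vt; rows of Vt are principal directions.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:L]                      # first L principal directions (L x t)
    N = centered @ components.T              # transformed dataset (s x L)
    explained = (S ** 2) / np.sum(S ** 2)    # fraction of variance per component
    return N, components, mean, explained[:L]

# Synthetic stand-in data: 215 instances with 5 partly correlated features.
rng = np.random.default_rng(0)
M = rng.normal(size=(215, 5))
M[:, 1] = 2.0 * M[:, 0] + 0.1 * rng.normal(size=215)

N, components, mean, explained = pca_reduce(M, L=3)
print(N.shape)
print(explained)
```

New samples would be transformed with the stored `mean` and `components`, i.e. `(x - mean) @ components.T`, so that training and test data share the same projection.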
The classification phase

In the second stage, the ELM model performs the classification tasks using the new feature set produced by PCA. It includes two main sub-procedures. At first, we set up all the parameters of the ELM model. Since the number of hidden neurons has an important influence on the performance of the ELM model, we design an experimental strategy to choose the optimal number of neurons. The idea is to consider the outputs of ELM with different numbers of hidden neurons. Suppose Nmin and Nmax are the minimum and maximum numbers of hidden neurons, respectively, and n is the current number of hidden neurons. For each choice of n, we measure the average accuracy obtained by ELM via the 10-fold cross-validation technique; finally, the value with the highest average accuracy is selected as the optimal number of hidden neurons. After choosing the optimal number of hidden neurons, we use the ELM classifier to compute the classification accuracy on the output of PCA, and then average the obtained results. The detailed steps are as follows:

Step1: Pre-process the datasets. Divide the transformed data provided by PCA into 10 subsets using the 10-fold cross-validation method. In each fold, 9 subsets are used for training and 1 subset is used for testing. Each dataset is represented by an L-column matrix N.

Step2: Experimentally decide the optimal number of hidden neurons. Firstly, set the number of hidden neurons to Nmin. For each fold, we train the ELM model on the training data of that fold by the method described in ‘A brief review of extreme learning machine (ELM)’, and then apply the trained ELM model to classify the test set. The accuracy on the test set is stored. After all 10 folds are completed, the average accuracy is obtained as the mean of the stored accuracies. Then we increase the number of hidden neurons and repeat the above procedure to get a new average accuracy. When n reaches the maximum number of hidden neurons Nmax or the average accuracy reaches a predefined threshold, we stop this iterative process. The number of hidden neurons with the highest average accuracy is taken as the optimal value.

Step3: Classify the test datasets by the ELM model with the optimal number of hidden neurons and the optimal new feature set obtained by PCA. Similarly, train ELM on the training data of each of the 10 folds, and then use the optimal ELM to classify the test data. The average accuracy is taken as the final performance estimate. The pseudo-code of this stage, termed the classification phase, is given below:
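A minimal sketch of the classification phase, assuming NumPy, a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and one-hot target vectors. All function names, the `step` and `threshold` parameters, and the synthetic three-class data are hypothetical choices made for illustration; they are not taken from the authors' code.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer output matrix H for a sigmoid activation g."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden, rng):
    """ELM training: random (W, b), then beta = pinv(H) @ T."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # step (1)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = hidden_output(X, W, b)                               # step (2)
    beta = np.linalg.pinv(H) @ T                             # step (3)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predicted class index = output neuron with the largest response."""
    return np.argmax(hidden_output(X, W, b) @ beta, axis=1)

def cv_accuracy(X, y, n_hidden, n_classes, k=10, seed=0):
    """Average test accuracy of ELM over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    T = np.eye(n_classes)[y]                                 # one-hot targets
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        W, b, beta = elm_train(X[train], T[train], n_hidden, rng)
        accs.append(np.mean(elm_predict(X[test], W, b, beta) == y[test]))
    return float(np.mean(accs))

def select_hidden_neurons(X, y, n_classes, n_min, n_max, step=1, threshold=1.0):
    """Scan n from n_min to n_max; stop early if the threshold is reached."""
    best_n, best_acc = n_min, -1.0
    for n in range(n_min, n_max + 1, step):
        acc = cv_accuracy(X, y, n, n_classes)
        if acc > best_acc:
            best_n, best_acc = n, acc
        if acc >= threshold:
            break
    return best_n, best_acc

# Synthetic stand-in for the PCA-transformed data: three separated classes.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

best_n, best_acc = select_hidden_neurons(X, y, n_classes=3, n_min=5, n_max=30, step=5)
print(best_n, best_acc)
```

Because the input weights are random, the selected number of hidden neurons can vary with the seed; averaging over several runs of 10-fold cross-validation reduces this variance.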
Fig. 4 Classification accuracies over 5 runs of 10-fold CV with different dimensions

                     Predicted
Actual               Normal    Hyperthyroidism    Hypothyroidism
Normal               150       0                  0
Hyperthyroidism      1         34                 0
Hypothyroidism       3         0                  27
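As a worked check of the confusion matrix above, the following sketch computes the overall accuracy and the per-class recall. It assumes that the predicted columns follow the same class order as the actual rows (Normal, Hyperthyroidism, Hypothyroidism), which the extracted table does not state explicitly.

```python
# Rows: actual class; columns: predicted class (column order is an assumption).
labels = ["Normal", "Hyperthyroidism", "Hypothyroidism"]
confusion = [
    [150, 0, 0],
    [1, 34, 0],
    [3, 0, 27],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = correct / total

# Recall per class: correctly classified instances / actual instances.
recall = {
    labels[i]: confusion[i][i] / sum(confusion[i])
    for i in range(len(labels))
}

print(f"accuracy = {correct}/{total} = {accuracy:.4f}")  # accuracy = 211/215 = 0.9814
for name, value in recall.items():
    print(f"recall({name}) = {value:.4f}")
```

Under this reading, 211 of the 215 instances are classified correctly, which agrees with the reported maximum accuracy of 98.1%.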
the proposed model. As for the running time, we can observe that the mean running time of the ELM model is 7.54 s, which is shorter than that of the SVM model, whose mean value is 11.48 s. We should note that the running time for ELM includes the time to select the optimal number of hidden neurons as well as the training of the ELM model. This implies that if we have prior knowledge about the optimal number of hidden neurons, the running time can be further reduced. Through these analyses, it is clear that the PCA-ELM model is an efficient classification method in comparison with the PCA-SVM method. Therefore, PCA-ELM is a much more appropriate tool for the thyroid disease diagnosis problem than the other methods. Consequently, we are more convinced that the proposed CAD system can be very helpful in assisting physicians to make accurate diagnoses for their patients.
Conclusions

In this work, we have developed a CAD system, PCA-ELM, for assisting the diagnosis of thyroid disease. The main aim of this system is to apply the distinctive features of the ELM classifier, including better generalization performance, fast learning speed, simplicity, and freedom from tedious and time-consuming parameter tuning, to thyroid disease diagnosis. In order to get rid of the irrelevant information in the thyroid data, PCA was used for feature reduction before applying the ELM classifier. Experimental results demonstrated that the proposed system performed very well in distinguishing among hyperthyroidism, hypothyroidism and normal cases. It was observed that PCA-ELM achieved the highest classification accuracy of 98.1% and a mean classification accuracy of 97.73% using 10-fold cross-validation. Meanwhile, a comparative study was conducted on the PCA-SVM and PCA-ELM methods. The experimental results showed that PCA-ELM outperformed PCA-SVM in terms of classification accuracy with a shorter run time. Therefore, it can be concluded that the developed PCA-ELM CAD system is helpful for making accurate diagnostic decisions. Future investigation will focus on evaluating the proposed system on other medical diagnosis problems.

Acknowledgements This research is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61133011, 61170092, 60973088, 60873149.

References

1. Ozyilmaz, L., and Yildirim, T., Diagnosis of thyroid disease using artificial neural network methods. In Proceedings of ICONIP’02 ninth international conference on neural information processing, Orchid Country Club, Singapore, pp. 2033–2036, 2002.
2. Serpen, G., Jiang, H., and Allred, L., Performance analysis of probabilistic potential function neural network classifier. In Proceedings of artificial neural networks in engineering conference, St. Louis, MO, Vol. 7, pp. 471–476, 1997.
3. Pasi, L., Similarity classifier applied to medical data sets, in international conference on soft computing. Helsinki, Finland & Gulf of Finland & Tallinn, Estonia, 2004.
4. Polat, K., Sahan, S., and Gunes, S., A novel hybrid method based on artificial immune recognition system (AIRS) with fuzzy weighted pre-processing for thyroid disease diagnosis. Expert Syst. Appl. 32(4):1141–1147, 2007.
5. Keles, A., and Keles, A., ESTDD: expert system for thyroid diseases diagnosis. Expert Syst. Appl. 34(1):242–246, 2008.
6. Temurtas, F., A comparative study on thyroid disease diagnosis using neural networks. Expert Syst. Appl. 36(1):944–949, 2009.
7. Dogantekin, E., Dogantekin, A., and Avci, D., An expert system based on generalized discriminant analysis and wavelet support vector machine for diagnosis of thyroid diseases. Expert Syst. Appl. 38(1):146–150, 2011.
8. Chen, H. L., Yang, B., Wang, G., Liu, J., Chen, Y. D., and Liu, D. Y., A three-stage expert system based on support vector machines for thyroid disease diagnosis. J. Med. Syst. http://dx.doi.org/10.1007/s10916-011-9655-8, 2011.
9. Hsu, C. W., and Lin, C. J., A comparison of methods for multi-class support vector machines. Neural Networks, IEEE Transactions on. 13(2):415–425, 2002.
10. Huang, G. B., Zhu, Q. Y., and Siew, C. K., Extreme learning machine: a new learning scheme of feed forward neural networks. IEEE Int. Jt. Conf. Neural Netw. 2:985–990, 2004.
11. Chen, F. L., and Ou, T. Y., Sales forecasting system based on gray extreme learning machine with taguchi method in retail industry. Expert Syst. Appl. 38(3):1336–1345, 2011.
12. Liu, N., Lin, Z., Koh, Z., Huang, G. B., Ser, W., and Ong, M. E. H., Patient outcome prediction with heart rate variability and vital signs. J. Signal Proc. Syst. 1–14, 2010.
13. Han, F., et al., The forecast of the postoperative survival time of patients suffered from non-small cell lung cancer based on PCA and extreme learning machine. Int. J. Neural Syst. 16(1):39–46, 2006.
14. Zhang, R., et al., Multicategory classification using an extreme learning machine for microarray gene expression cancer diagnosis. IEEE/ACM Trans. Comput. Biol. Bioinforma. 4(3):485–494, 2007.
15. Helmy, T., and Rasheed, Z., Multi-category bioinformatics dataset classification using extreme learning machine. in Evolutionary Computation, 2009. CEC '09. IEEE Congress on. 2009.
16. Gomathi, M., and Thangaraj, P., A computer aided diagnosis system for lung cancer detection using machine learning technique. Eur. J. Sci. Res. 51(2):260–275, 2011.
17. Chen, H. L., Liu, D. Y., Yang, B., Liu, J., and Wang, G., A new hybrid method based on local fisher discriminant analysis and support vector machines for hepatitis disease diagnosis. Expert Syst. Appl. 38(9):11796–11803, 2011.
18. Chen, H. L., Yang, B., Liu, J., and Liu, D. Y., A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis. Expert Syst. Appl. 38(7):9014–9022, 2011.
19. Polat, K., and Gunes, S., Computer aided medical diagnosis system based on principal component analysis and artificial immune recognition system classifier algorithm. Expert Syst. Appl. 34(1):773–779, 2008.
20. Polat, K., and Gunes, S., An expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease. Digit. Signal Proc. 17(4):702–710, 2007.
21. Pearson, K., On lines and planes of closest fit to systems of points in space. Philos. Mag. 2(6):559–572, 1901.
22. Huang, G.-B., Zhu, Q. Y., and Siew, C.-K., Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501, 2006.
23. Huang, G. B., and Babri, H. A., Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions. Neural Netw., IEEE Trans. on. 9(1):224–229, 1998.
24. Huang, G. B., Learning capability and storage capacity of two-hidden-layer feedforward networks. Neural Netw., IEEE Trans. on. 14(2):274–281, 2003.
25. Huang, G. B., Chen, L., and Siew, C. K., Universal approximation using incremental constructive feedforward networks with random hidden nodes. Neural Netw., IEEE Trans. on. 17(4):879–892, 2006.
26. Salzberg, S. L., On comparing classifiers: pitfalls to avoid and a recommended approach. Data Min. Knowl. Discov. 1(3):317–328, 1997.
27. Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection, in Proceedings of the 14th international joint conference on Artificial intelligence, Vol. 2, 1995.
28. Chang, C. C., and Lin, C. J., LIBSVM: a library for support vector machines. 2001, Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
29. Hsu, C. W., Chang, C. C., and Lin, C. J., A practical guide to support vector classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 2003. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.