
AdaBoost Algorithm with Random Forests for Predicting Breast Cancer Survivability
Jaree Thongkam, Guandong Xu and Yanchun Zhang

Manuscript received December 15, 2007. J. Thongkam is with the School of Computer Science and Mathematics, Victoria University, Melbourne, Australia (e-mail: [email protected]). G. Xu is with the School of Computer Science and Mathematics, Victoria University, Melbourne, Australia (e-mail: [email protected]). Y. Zhang is with the School of Computer Science and Mathematics, Victoria University, Melbourne, Australia (e-mail: [email protected]).

Abstract—In this paper we propose a combination of the AdaBoost and random forests algorithms for constructing a breast cancer survivability prediction model. We use random forests as the weak learner of AdaBoost, selecting high-weight instances during the boosting process, to improve accuracy and stability and to reduce overfitting. The capability of this hybrid method is evaluated using basic performance measurements (accuracy, sensitivity and specificity), the Receiver Operating Characteristic (ROC) curve and the Area Under the receiver operating characteristic Curve (AUC). Experimental results indicate that the proposed method outperforms a single classifier and other combined classifiers for breast cancer survivability prediction.

I. INTRODUCTION

BREAST CANCER is the second most common cause of cancer death among women in Thailand [1]. Its incidence has been increasing over the past several years, with more than 5,000 new cases reported every year. Several research studies have investigated contributing factors such as lifestyle changes, dietary patterns and genetic issues [2]. Much research has also analyzed the course and outcome of the disease, which can help patients make decisions about their quality of life in accordance with their finances [3], [4]. For example, the Kaplan-Meier and Cox proportional hazards models, both traditional statistical methods, are commonly used to estimate the survival rate of a particular patient suffering from a disease over a particular time period [5]. Currently, advanced techniques in the field of data mining, a new stream of methodologies, have come into existence. These techniques have proven to be more powerful than traditional statistical methods [6]. They provide processes for discovering useful patterns or models from large data sets [7]. One of the most widely used techniques in data mining is classification, which is used to extract models describing important data classes and to predict the outcome of unseen data at a single point in time [8]. Therefore, to help medical practitioners predict accurate outcomes, data mining and decision-support tools are needed to process the huge amount of data available from previously solved cases and to suggest probable treatments based on analyzing the abnormal values of several significant attributes.

In relation to current medical analysis, the decision tree is widely used in the medical domain. Several research studies have successfully employed decision trees to extract knowledge from medical data sets. For example, Delen, Walker and Kadam [3] employed classification and regression trees to predict breast cancer survivability in the SEER medical databases. Their results showed that the decision tree algorithm was superior for extracting knowledge from their data set. Many researchers have utilized a single classifier to extract knowledge from data sets. For example, Yi and Fuyong [9] applied Support Vector Machines (SVM) alone to discover breast cancer diagnosis patterns from the University of Wisconsin Hospital. Their results showed that SVM was suitable for diagnosing breast cancer patterns. Moreover, Ryu, Chandrasekaran and Jacob [4] employed an isotonic separation technique to predict breast cancer in the Wisconsin breast cancer diagnosis data set and the Ljubljana breast cancer recurrence data set. Their results showed that the isotonic separation technique outperformed C4.5, Robust LP-P and the SVM Gaussian kernel.

In order to enhance the ability of standard algorithms, several attribute selection methods have been utilized in the medical domain for selecting significant attributes. For example, Xiong et al. [10] combined Principal Components Analysis (PCA) and Partial Least Squares (PLS) linear regression for analyzing attributes, and then used decision trees and association rules to extract knowledge from a breast cancer diagnosis data set from the University of Wisconsin Madison. This data set included 699 breast cancer patients, with 458 instances of the benign class and 241 instances of the malignant class. Their results showed a percentage of correctness of 96.57%. Moreover, Wang et al. [11] utilized Independent Component Analysis (ICA) to select the best attributes and applied Least Squares Support Vector Machines (LS-SVM) to detect breast cancer tumors. Experiment results showed that the accuracy of LS-SVM with ICA was significantly improved over using LS-SVM alone.

Recently, the AdaBoost technique has become an attractive ensemble method in machine learning since it has a low error rate and performs well on low-noise data sets [12], [13]. As a successor of the boosting algorithm, it is used to combine a set of weak classifiers to form a model with better prediction outcomes [12]. As a result, several research studies have successfully applied the AdaBoost algorithm to solve classification problems in object detection, including face recognition, video sequences and signal processing systems.

For example, Zhou and Wei [14] utilized the AdaBoost algorithm to extract the top 20 significant features from the XM2VTS face database. Their results showed that the AdaBoost algorithm reduces the computation time by 54.23%. Additionally, Sun, Wang and Wong [15] applied the AdaBoost algorithm to extract high-order pattern and weight-of-evidence rule based classifiers from the UCI Machine Learning Repository. Their results showed that the composed classifiers achieve better classification accuracy than the HPWR classifiers alone. However, few research studies have utilized AdaBoost and random forests to make predictions on medical databases.

We propose a combination of AdaBoost and random forests for predicting breast cancer survivability from a data set collected at Srinagarind Hospital in Thailand. We investigate the performance of the AdaBoost algorithm using random forests as the weak learner algorithm to generate better prediction models for breast cancer survivability. The 10-fold cross-validation method, the confusion matrix, the ROC curve, the AUC score, accuracy, sensitivity and specificity are used to evaluate the breast cancer survivability prediction models.

The remainder of this paper is organized as follows. Section II introduces the basic concepts of AdaBoost, random forests, and the hybrid of AdaBoost and random forests. Section III presents the methodologies and experiment design used in this paper. Experiment results and discussions are presented in Section IV. The conclusion and an outline of future work are given in Section V.

II. BASIC CONCEPTS OF ALGORITHMS

This section briefly describes the theoretical background of AdaBoost, random forests and the proposed combination algorithm used in this paper.

A. AdaBoost

AdaBoost is one of the most popular ensemble methods. It is used for prediction in classification tasks and is reported to provide self-rated confidence scores that estimate the reliability of its predictions [16]. It is a learning algorithm used to generate multiple classifiers, from which the best classifier is selected [16], [17]. It not only has high flexibility for combining with other methods, such as the decision stump and classification and regression trees (CART), but it also requires fewer input parameters and less computing background knowledge to improve the accuracy of prediction models built from a data set.

In this paper we utilize AdaBoost.M1 [18] (Gentle AdaBoost), which originates from the setting of weights over the training set. The training set is (x1,y1),...,(xn,yn), where each xi belongs to the instance space X and each label yi is in the label set Y, which is equal to {-1,+1}. The weight of training example i on round k is denoted Dk(i). The same weight is set for every example at the starting point (D1(i) = 1/n, i=1,...,n). Then the weight of each example misclassified by the base learning algorithm (called the weak hypothesis) is increased, so that the boosting process concentrates on the hard examples in the training set in each round. The eight steps of the AdaBoost algorithm are given in Fig. 1.

Input: S: training set, S = {xi} (i=1,2,...,n), with labels yi ∈ Y
       K: number of iterations
1) Assign the sample (x1,y1),...,(xn,yn); xi ∈ X, yi ∈ {-1,+1}
2) Initialize the weights D1(i) = 1/n, i=1,...,n
3) for k=1,...,K
4)   Call WeakLearn, providing it with the distribution Dk
5)   Get the weak hypothesis hk: X → {-1,+1} with its error: εk = Σ_{i: hk(xi) ≠ yi} Dk(i)
6)   Update the distribution: Dk+1(i) = Dk(i) exp(−αk yi hk(xi)) / Zk
7) next k
8) Output: H(x) = sign( Σ_{k=1..K} αk hk(x) )
Fig. 1. AdaBoost algorithm

where Zk is a normalization constant (chosen so that Dk+1 will be a distribution). αk, presented in Equation (1), improves the generalization result and also addresses the overfitting and noise-sensitivity problems [19]. W refers to the class probability estimate used to construct the real value of αk hk(x).

αk = (1/2) ln( (W+1 − W−1) / (W−1 + W−1) )    (1)

Therefore, the final hypothesis H(x) is a weighted majority vote of the K weak hypotheses, where αk is the weight assigned to hk. In addition, AdaBoost handles not only binary classes but also numerical classes for prediction purposes [17].
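To make the loop in Fig. 1 concrete, the following Python sketch implements a minimal AdaBoost.M1-style booster around an interchangeable scikit-learn weak learner. It is an illustration under our own assumptions (numpy arrays, labels in {-1,+1}, and the classical error-based weight αk = ½ ln((1−εk)/εk) rather than the class-probability form of Equation (1)); it is not the authors' implementation.

```python
# Minimal AdaBoost.M1-style sketch of the loop in Fig. 1 (illustrative, not the authors' code).
# Assumes numpy arrays and labels in {-1, +1}; the weak learner is any scikit-learn classifier
# that accepts sample_weight in fit().
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, weak_learner=None, K=10):
    n = len(y)
    D = np.full(n, 1.0 / n)                      # step 2: uniform initial weights
    hypotheses, alphas = [], []
    for _ in range(K):                           # step 3
        h = clone(weak_learner) if weak_learner is not None else DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=D)             # step 4: call WeakLearn with distribution D_k
        pred = h.predict(X)
        eps = D[pred != y].sum()                 # step 5: weighted error of h_k
        if eps <= 0.0 or eps >= 0.5:             # degenerate weak hypothesis: stop boosting
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)  # classical AdaBoost.M1 weight, not Eq. (1)
        D = D * np.exp(-alpha * y * pred)        # step 6: emphasize misclassified examples
        D = D / D.sum()                          # normalize by Z_k so D_{k+1} is a distribution
        hypotheses.append(h)
        alphas.append(alpha)

    def H(X_new):                                # step 8: weighted majority vote
        scores = sum(a * h.predict(X_new) for a, h in zip(alphas, hypotheses))
        return np.sign(scores)
    return H
```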
B. Random Forests

Random forests (RF) [20] is one of the most successful ensemble learning techniques and has proven to be a very popular and powerful technique in pattern recognition and machine learning for high-dimensional classification [21] and skewed problems [20]. These studies used RF to construct a collection of individual decision tree classifiers, each built with the classification and regression trees (CART) algorithm [22]. CART is a rule-based method that generates a binary tree through a binary recursive partitioning process that splits a node based on the yes/no answers of the predictors. The rule generated at each step maximizes the class purity within the two resulting subsets, and each subset is split further based on independent rules. CART uses the Gini index to measure the impurity of a data partition or set of training instances [7]. Although the aim of CART is to maximize the difference in heterogeneity, on real-world data sets it often encounters the overfitting problem, which causes the classifier to have a high prediction error on the unseen data set. Therefore, the bagging mechanism in RF enables the algorithm to create classifiers for high-dimensional data very quickly [20], [21]. The accuracy of the classification decision is obtained by voting among the individual classifiers in the ensemble. The common element in all of these steps is that, for each of the B trees, a random vector (Sb) is generated by bootstrap sampling, independently of the past random vectors but with the same distribution, and a tree is grown using the training set and Sb. The random forests algorithm is shown in Fig. 2.



Input: S: training sample
       f: number of inputs to be used at each tree
       B: number of generated trees in the random forest
1) E is empty
2) for b=1 to B
3)   Sb = bootstrapSample(S)
4)   Cb = BuildRandomTreeClassifiers(Sb, f)
5)   E = E ∪ {Cb}
6) next b
7) return E
Fig. 2. Random forests algorithm
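To make Fig. 2 concrete, here is a hedged Python sketch of the same bagging-of-random-trees idea built from scikit-learn components; reading f as the number of candidate features tried at each split (passed as max_features) is our interpretation, and the function names are our own, not the paper's.

```python
# Sketch of the Fig. 2 procedure: grow B random trees on bootstrap samples and vote
# (illustrative only). Assumes numpy arrays and labels in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, B=100, f="sqrt", seed=0):
    rng = np.random.RandomState(seed)
    n = len(y)
    ensemble = []                                    # step 1: E is empty
    for b in range(B):                               # step 2
        idx = rng.randint(0, n, size=n)              # step 3: bootstrap sample S_b
        tree = DecisionTreeClassifier(max_features=f,
                                      random_state=rng.randint(1 << 30))
        tree.fit(X[idx], y[idx])                     # step 4: C_b = random tree grown on S_b
        ensemble.append(tree)                        # step 5: E = E U {C_b}
    return ensemble                                  # step 7

def random_forest_predict(ensemble, X_new):
    # Classification by majority vote of the individual trees, as described in Section II.B.
    votes = np.stack([tree.predict(X_new) for tree in ensemble])
    return np.sign(votes.sum(axis=0))
```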
Many research studies have applied the random forests algorithm to construct decision trees. For example, Kim, Lee and Park [23] utilized the random forests algorithm to build a lightweight intrusion detection system. Their results showed that random forests outperformed Support Vector Machines (SVM) and artificial neural networks (ANN). However, the random forests classifier is weak on high-noise data, which can cause an overfitting problem that reduces the accuracy of the models on the unseen data set (test set). Moreover, it suffers from the difficulty of growing trees without pruning [21].

C. AdaBoost and Random Forests

For the combination of AdaBoost and random forests (ABRF) technique used in this paper, we utilize random forests as the weak learner to generate prediction models with a lower error rate. Although AdaBoost works fast with simple weak learners, random forests are of interest for our real-world data set, because few research studies have employed this method for prediction in the medical domain. The thirteen steps of the hybrid AdaBoost and random forests algorithm are given in Fig. 3.
Input: S: training set, S = {xi} (i=1,2,...,n), with labels yi ∈ Y
       K: number of iterations
       L: Learn (random forests algorithm as the weak learner)
       f: number of inputs to be used at each tree
       B: number of generated trees in the random forest
1) Assign the sample (x1,y1),...,(xn,yn); xi ∈ X, yi ∈ {-1,+1}
2) Initialize the weights D1(i) = 1/n, i=1,...,n
3) for k=1,...,K
4)   empty E with the distribution Dk
5)   for b=1 to B
6)     Sb = bootstrapSample(S)
7)     Cb = BuildRandomTreeClassifiers(Sb, f)
8)     E = E ∪ {Cb}
9)   next b
10)  Get the weak hypothesis hk: X → {-1,+1} with its error: εk = Σ_{i: hk(xi) ≠ yi} Dk(i)
11)  Update the distribution: Dk+1(i) = Dk(i) exp(−αk yi hk(xi)) / Zk
12) next k
13) Output: H(x) = sign( Σ_{k=1..K} αk hk(x) )
Fig. 3. AdaBoost and random forests
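For readers who want to experiment with the ABRF idea outside WEKA, one hedged way to approximate Fig. 3 is to plug a random forest in as the AdaBoost weak learner using scikit-learn; the parameter values below (K = 10 boosting iterations, B = 10 trees) are our own illustration, not the settings reported in this paper.

```python
# Illustrative ABRF-style model: AdaBoost boosting a random-forest weak learner.
# Mirrors the idea of Fig. 3 but is not the authors' WEKA configuration.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

abrf = AdaBoostClassifier(
    RandomForestClassifier(n_estimators=10, max_features="sqrt"),  # weak learner: B = 10 trees
    n_estimators=10,   # K boosting iterations
    random_state=1,
)
# The first argument is named 'estimator' in recent scikit-learn releases and
# 'base_estimator' in older ones; passing it positionally works for both.
# Usage: abrf.fit(X_train, y_train); y_pred = abrf.predict(X_test)
```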
This combination has advantages, including increased performance and prediction ability of the models on some data sets. For instance, Leshem [24] utilized the AdaBoost algorithm with the random forests algorithm as the base learning algorithm to predict traffic flow. Their results showed that this combination has a low error rate. The error rate is the basic measurement used to investigate the weak and strong points of algorithms under various evaluation methods. Thus, we propose a combination of AdaBoost and random forests to improve the performance of the prediction models, measuring accuracy, sensitivity, specificity, the ROC curve and the AUC score on the filtered breast cancer survivability data set collected in Thailand.

III. METHODOLOGIES AND EXPERIMENT DESIGN

In this section we first describe the breast cancer data preparation used in this experiment. Then we present the performance evaluation methods, including accuracy, sensitivity, specificity, the Receiver Operating Characteristic (ROC) curve and the Area Under the receiver operating characteristic Curve (AUC).

A. Data Set

The breast cancer survivability data were obtained from Srinagarind Hospital in Thailand. The data include patient information and the treatment choices of patients who were diagnosed with breast cancer in 1990-2001. The breast cancer survivability data consist of 2,462 instances and 26 attributes. Descriptive statistics showed that some attributes have more than 30% missing values while some attributes have only one value. The reason is that some patients were diagnosed at Srinagarind but received treatments in other hospitals. Accordingly, we eliminated the outlier instances from the original data set [25]. The final data set therefore consists of 570 instances, 11 attributes and a binary class attribute. The binary class attribute was coded as 0 ('dead') if a patient survived less than 60 months, and otherwise was coded as 1 ('alive'). The whole data set is thus divided into two classes, the 'dead' class and the 'alive' class, in which the 'dead' class has 322 instances while the 'alive' class consists of 248 instances. The attribute list is presented in Table I.
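The class coding described above can be expressed in a few lines; this is a hypothetical sketch with toy values and column names of our own choosing, not the hospital data.

```python
# Hypothetical sketch of the survivability class coding (column names and values are ours).
import pandas as pd

records = pd.DataFrame({"survival_months": [12, 48, 60, 84]})   # toy values for illustration
# 0 ('dead') if the patient survived less than 60 months, otherwise 1 ('alive').
records["survivability"] = (records["survival_months"] >= 60).astype(int)
print(records)
```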
B. Evaluation Methods

In this experiment, we applied three evaluation methods: basic performance measures, the ROC curve and the AUC score. These evaluation methods are based on the confusion matrix. The confusion matrix is a visualization tool commonly used to present the performance of classifiers in classification tasks [7]. It shows the relationships between the real class attributes and the predicted classes. The level of effectiveness of the classification model is calculated from the number of correct and incorrect classifications for each possible value of the variable being classified in the confusion matrix [26] (see Fig. 4).

                       Predicted Classes
                       'Dead'     'Alive'
Outcomes  'Dead'        TP          FN
          'Alive'       FP          TN
Fig. 4. The confusion matrix

The confusion matrix is used to compute true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), as represented in Fig. 4.

1) Performance Measures



There are three commonly used performance measurements: accuracy, sensitivity and specificity [7]. The accuracy of a classifier is the percentage of correct outcomes among the test sets exploited in this study; it is defined in (2). The sensitivity is referred to as the true positive rate, and the specificity as the true negative rate. Sensitivity and specificity, both used for measuring the factors that affect performance, are presented in (3) and (4), respectively.

accuracy = (TP + TN) / (TP + FP + TN + FN)    (2)
sensitivity = TP / (TP + FN)    (3)
specificity = TN / (TN + FP)    (4)

In this study, the sensitivity is the probability of a correct test among 'dead' patients. In contrast, the specificity is the probability of a correct test among 'alive' patients.
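A minimal sketch of Equations (2)-(4), computed directly from the confusion-matrix counts of Fig. 4 (the example counts are arbitrary, not the paper's results):

```python
# Accuracy, sensitivity and specificity from confusion-matrix counts, following Eqs. (2)-(4).
def basic_measures(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # Eq. (2)
    sensitivity = tp / (tp + fn)                 # Eq. (3): correct tests among 'dead' patients
    specificity = tn / (tn + fp)                 # Eq. (4): correct tests among 'alive' patients
    return accuracy, sensitivity, specificity

print(basic_measures(tp=50, fp=10, tn=35, fn=5))  # arbitrary counts for illustration
```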
2) Receiver Operating Characteristic (ROC) Curve

The receiver operating characteristic curve graphically interprets the performance of a decision-making algorithm with regard to the decision parameter [27], [28]. It has often been used not only as an evaluation criterion for the predictive performance of classification or data mining algorithms but also as an alternative single-number measure for evaluating the performance of learning algorithms [29].

TPR = TP / (TP + FN)    (5)
FPR = FP / (TN + FP)    (6)

The ROC curve is a two-dimensional graph in which the true positive rate (TPR) (5) is plotted on the Y axis and the false positive rate (FPR) (6) is plotted on the X axis. TPR is the true positive value, which reflects the number of correct predictions in the 'dead' class. FPR is the false positive value, which reflects the number of incorrect predictions in the 'dead' class. Two important points on the ROC curve are the lower left point (0, 0) and the upper right point (1, 1). The lower left point (0, 0) represents the strategy of never issuing a positive classification; such a classifier commits no false positive errors but also gains no true positives. The upper right point (1, 1) represents the opposite strategy, of unconditionally issuing positive classifications. Furthermore, ROC analysis offers a more robust evaluation of the relative prediction performance of the models than the traditional comparison of relative error, such as the error rate [30].
3) Area Under the Receiver Operating Characteristic Curve (AUC)

The Area under the ROC Curve (AUC) is traditionally used in medical diagnosis systems. Recently, the AUC has been introduced as an alternative measure for evaluating the predictive ability of learning algorithms [7]. The AUC also provides an approach for evaluating models based on an average over each point on the curve [27], [28]. The AUC score is always between 0 and 1, and a model with a higher AUC score gives better classifier performance than the others. Moreover, Huang and Ling [31] demonstrated that the AUC is a better evaluation measure than accuracy or error rate.
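As a hedged sketch of how such a ROC curve and AUC score can be produced from classifier scores (scikit-learn and matplotlib are our tooling choices here, not the paper's):

```python
# Plot a ROC curve and compute the AUC from predicted scores for the positive ('dead') class.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

def plot_roc(y_true, scores):
    fpr, tpr, _ = roc_curve(y_true, scores)    # FPR (Eq. 6) on the X axis, TPR (Eq. 5) on the Y axis
    auc = roc_auc_score(y_true, scores)        # area under the curve, always between 0 and 1
    plt.plot(fpr, tpr, label="AUC = %.3f" % auc)
    plt.plot([0, 1], [0, 1], linestyle="--")   # from (0, 0) "never positive" to (1, 1) "always positive"
    plt.xlabel("False positive rate (FPR)")
    plt.ylabel("True positive rate (TPR)")
    plt.legend()
    plt.show()
    return auc
```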
IV. RESULTS AND DISCUSSIONS

The WEKA environment is a well-defined framework that offers a variety of learning algorithms for the development of new data mining and machine learning algorithms. The WEKA version 3.5.6 [32] tools were selected to evaluate the performance and effectiveness of the combination of AdaBoost and random forests (ABRF). In Section IV(A), the performance of the proposed algorithm is compared with 10 single classifiers: AdaBoost, the alternating decision tree (ADTree), Bagging, C4.5, the conjunctive rule, Naïve Bayes, the Nearest-Neighbor classifier (NN-classifier), random forests, Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and Support Vector Machines (SVM). In Section IV(B) the capability of the proposed algorithm is compared with 8 base classifiers of AdaBoost: ADTree, C4.5, the conjunctive rule, the decision stump, Naïve Bayes, the NN-classifier, RIPPER and SVM. In Section IV(C) the performance of ensemble methods with and without random forests is compared using the ROC curve and the AUC score.

Experiments were performed using a 10-fold cross-validation approach to reduce the bias associated with the random sampling strategy [33], [34] on the breast cancer survivability data set from Srinagarind Hospital in Thailand. For each of the 10 runs, the data set was divided into a training set (9 folds) and a test set (the remaining fold). The results were averaged over these 10 runs.
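A hedged sketch of this protocol in scikit-learn terms (the estimator mirrors the earlier ABRF sketch; the stratified folds and fixed random seed are our own choices, not necessarily the WEKA defaults used in the paper):

```python
# 10-fold cross-validation: train on 9 folds, test on the remaining fold, average over 10 runs.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def evaluate_abrf_10fold(X, y):
    abrf = AdaBoostClassifier(RandomForestClassifier(n_estimators=10),
                              n_estimators=10, random_state=1)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
    accuracies = cross_val_score(abrf, X, y, cv=folds, scoring="accuracy")
    return accuracies.mean()   # average accuracy over the 10 runs
```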
A. Classification Performance Comparison among Single Classifiers

In these experiments, the performance and effectiveness of the proposed algorithm is compared with 10 single classifiers: AdaBoost, ADTree, Bagging, C4.5, the conjunctive rule, Naïve Bayes, the NN-classifier, random forests, RIPPER and SVM. The experiment results are displayed in Table II.

Table II shows the accuracies, sensitivities and specificities of 11 classifiers: ABRF, AdaBoost, ADTree, Bagging, C4.5, the conjunctive rule, Naïve Bayes, the NN-classifier, random forests, RIPPER and SVM. The experiment results show that the accuracy of ABRF is 100% when evaluating the training set and 88.60% when evaluating the test set with 10 iterations. Moreover, using random forests as the weak learner algorithm increases the average model accuracy (by 8.25%), sensitivity (by 11.37%) and specificity (by 2.6%) on the test set compared with basic AdaBoost. Likewise, random forests as the weak learner algorithm increases the average model accuracy, sensitivity and specificity by 2.81%, 2.67% and 3%, respectively, compared with the random forests algorithm. As shown by the results, the ABRF algorithm is better than basic AdaBoost and random forests in terms of accuracy, sensitivity and specificity, and it is better than random forests at addressing the overfitting problem. In addition, the ABRF algorithm outperforms ADTree, Bagging, C4.5, the conjunctive rule, Naïve Bayes, the NN-classifier, RIPPER and SVM on both the training and test sets.



This may be due to the ability of the AdaBoost algorithm to vote on the important instances, which reduces the error of the random forests algorithm and thus improves the performance of the prediction model.

B. Classification Performance Comparison among Multiple Classifier Combinations

In these experiments, the capability of the proposed algorithm is compared with 8 weak learners of AdaBoost: ADTree, C4.5, the conjunctive rule, the decision stump, Naïve Bayes, the NN-classifier, RIPPER and SVM. The default settings of each weak learner are used to generate the models. These models were evaluated using 10-fold cross-validation, measuring accuracy, sensitivity and specificity. The experiment involved increasing the number of AdaBoost iterations by 5 each time, up to 100 iterations, to illustrate the performance of the models. The experiment results are given in Figs. 5, 6 and 7, respectively.

Fig. 5. The accuracy comparison (accuracy (%) vs. iterations for ABRF, ADTree, C4.5, conjunctive rule, decision stump, Naïve Bayes, NN-classifier, RIPPER and SVM)
Fig. 6. The sensitivity comparison (sensitivity (%) vs. iterations for the same classifiers)
Fig. 7. The specificity comparison (specificity (%) vs. iterations for the same classifiers)
Figs. 5, 6 and 7 show the performance measures, in terms of accuracy, sensitivity and specificity respectively, of ABRF and the 8 weak learner algorithms ADTree, C4.5, decision stump, conjunctive rule, Naïve Bayes, NN-classifier, RIPPER and SVM. The experiment results show that the ABRF algorithm outperforms ADTree, C4.5, the decision stump, the conjunctive rule, Naïve Bayes, the NN-classifier, RIPPER and SVM. Likewise, using random forests as the weak learner algorithm increases the average model accuracy, sensitivity and specificity by 7.85%, 9.93% and 4.54%, respectively, compared with basic AdaBoost on the same test set after running 100 iterations. Moreover, our results indicate that most hybrid classifiers, including ADTree, C4.5, the conjunctive rule and the decision stump, improved their accuracy after 10 rounds of re-boosting. On the other hand, Naïve Bayes and the NN-classifier decreased in accuracy after 10 rounds. Although the sensitivity of the ensemble classifiers seems stable, the specificity of the ensemble classifiers seems uncertain in the results. This might be related to the fact that AdaBoost concentrates on improving the majority class, which is the 'dead' class in our case.

In addition, statistical analysis including the minimum (Min), maximum (Max), average (Avg) and variance (Var) of the accuracy, sensitivity and specificity of the proposed method was carried out. The analysis results are given in Table III.

Table III shows the statistical results for accuracy, sensitivity and specificity, including the minimum, maximum, average and variance, of the ensemble classifiers. The experiment results show that the average accuracy, sensitivity and specificity of the ABRF algorithm outperform the other ensemble classifiers. Furthermore, the variance of the accuracy and specificity of ABRF being the lowest indicates that ABRF remains stable even as the number of iterations increases. Therefore, the computation cost can be reduced by applying only a few iterations.

C. Model Selection



In these experiments, the performance of the proposed algorithm (ABRF) is compared with three classifiers, AdaBoost, random forests and C4.5, using the ROC curve. The experiment results are given in Fig. 8.

Fig. 8. ROC curve (AdaBoost, random forests, ABRF and C4.5)

Fig. 8 illustrates the predictive performance of the four classifiers AdaBoost, random forests, ABRF and C4.5. The results show that the ABRF algorithm improves the prediction ability of random forests at some points and performs relatively well compared with AdaBoost and C4.5 in terms of the ROC curve. However, it is hardly possible to distinguish the difference in performance between the ABRF and random forests models in the ROC curve. Therefore, an advanced technique to select between these models, such as the AUC score, is needed. The experiment results are shown in Fig. 9.

Fig. 9. AUC scores of AdaBoost, random forests, ABRF and C4.5 (reported values include 85.20, 86.90, 94.30 and 94.40)

Fig. 9 displays the AUC scores of the four classifiers AdaBoost, random forests, ABRF and C4.5. Although the experiment results indicate that ABRF outperforms AdaBoost, random forests and C4.5 on our data set, the improvement of ABRF in terms of the AUC score does not appear significant. This might be due to the fact that AdaBoost concentrates on improving the error rate in the majority class. As shown by the results, using RF as the base learning algorithm reduces the prediction ability in the 'alive' class but makes the false positive rate in the 'dead' class lower than using RF alone.
V. CONCLUSION AND FUTURE WORK

In this paper we proposed a combination of the AdaBoost and random forests algorithms for constructing a breast cancer survivability prediction model. We illustrated the capability and effectiveness of the proposed method using 10-fold cross-validation, accuracy, sensitivity and specificity. The results showed that the proposed method improved the accuracy up to 88.60% compared with several single and combined classifiers. Although the prediction improvements in terms of the ROC curve and AUC achieved in the experiments were sometimes not significant, the experiment results have shown the improvement of the models for further developing suitable prediction models.

As a result, the proposed method is capable of extracting patterns, but without the cooperation of and feedback from medical practitioners, these results would be of little use. Moreover, this method is not aimed at replacing medical practitioners and researchers, but rather at complementing their invaluable efforts to save more human lives. Therefore, the patterns found via this hybrid method should be evaluated by medical practitioners.

As for further work, we plan to investigate the diversity of the number of classifiers in the ensemble and compare them with other ensemble methods in this respect. Another possibility to investigate is using the ABRF algorithm on larger data sets. Finally, a comparison with rule-based AdaBoost ensembles would be of interest.

ACKNOWLEDGEMENTS

Thanks to the IT and Cancer department staff at Srinagarind Hospital for providing the data. Thanks to Dr. Vatinee Sukmak for helpful comments, suggestions and criticisms.

REFERENCES
[1] National Cancer Institute of Thailand, "Cancer in Thailand 1995-1997," Available: http://www.nci.go.th/cancer_record/.
[2] T. Srinivasan, A. Chandrasekhar, J. Seshadri and J. B. S. Jonathan, "Knowledge discovery in clinical databases with neural network evidence combination," in Proc. International Conference on Intelligent Sensing and Information, 2005, pp. 512-517.
[3] D. Delen, G. Walker and A. Kadam, "Predicting breast cancer survivability: a comparison of three data mining methods," J. Artificial Intelligence in Medicine, vol. 34, pp. 113-127, 2005.
[4] Y. U. Ryu, R. Chandrasekaran and V. S. Jacob, "Breast cancer prediction using the isotonic separation technique," J. European Operational Research, vol. 181, pp. 842-854, 2007.
[5] S. Borovkova, "Analysis of survival data," Available: http://www.math.leidenuniv.nl/~naw/serie5/deel03/dec2002/pdf/borovkova.pdf.
[6] L. Ohno-Machado, "Modeling medical prognosis: survival analysis techniques," J. Biomedical Informatics, vol. 34, pp. 428-439, 2001.
[7] J. Han and M. Kamber, Data mining: concepts and techniques, 2nd ed. San Francisco: Morgan Kaufmann, Elsevier Science, 2006.
[8] M. T. Skevofilakas, K. S. Nikita, P. H. Templaleksis, K. N. Birbas, I. G. Kaklamanos and G. N. Bonatsos, "A decision support system for breast cancer treatment based on data mining technologies and clinical practice guidelines," in IEEE-EMBS the Twenty-Seventh Annual International Conference on Medicine and Biology Society, 2005, pp. 2429-2432.
[9] W. Yi and W. Fuyong, "Breast cancer diagnosis via support vector machines," in Proc. the Twenty Fifth Chinese Control Conference, 2006, pp. 1853-1856.



[10] X. Xiong, Y. Kim, Y. Baek, D. W. Rhee and S.-H. Kim, "Analysis of
breast cancer using data mining & statistical techniques,” in the Sixth
International Conference on Software Engineering, Artificial
Intelligence, Networking and Parallel, 2005, pp. 82-87.
[11] C.-Y. Wang, C.-G. Wu, Y.-C. Liang and X.-C. Guo, “Diagnosis of
breast cancer tumor based on ICA and LS-SVM,” in Proc. IEEE
International Conference on Machine Learning and Cybernetics,
2006, pp. 2565-2570.
[12] Y. Ma and X. Ding, “Robust real-time face detection based on cost-
sensitive AdaBoost method,” in Proc. the International Conference
on Multimedia and Expo, 2003, pp. 465-473.
[13] A. Vezhnevets and V. Vezhnevets, "'Modest AdaBoost' - teaching
AdaBoost to generalize better,” Novosibirsk Akademgorodok, Russia
2005.
[14] M. Zhou and H. Wei, “Face Verification Using GaborWavelets and
AdaBoost,” in the Eighteenth International Conference on Pattern
Recognition, Hong Kong, 2006, pp. 404-407.
[15] Y. Sun, Y. Wang and A. K. C. Wong, “Boosting an associative
classifier,” IEEE Trans. Knowledge and Data Engineering vol. 18,
pp. 988-992, 2006.
[16] R. E. Schapire, “A brief introduction to boosting,” in Proc. the
International Joint Conference on Artificial Intelligence, 1999, pp.
1401-1405.
[17] R. E. Schapire and Y. Singer, “Improved boosting algorithms using
confidence-rated predictions,” J. Machine Learning, vol. 37(3), pp.
297-336, 1999.
[18] Y. Freund and R. E. Schapire, “Experiments with a new boosting
algorithm,” in Proc. the Thirteenth International Conference on
Machine Learning, San Francisco, 1996, pp. 148-156.
[19] J. Friedman, T. Hastie and R. Tibshirani, “Additive logistic
regression: A statistical view of boosting,” J. the Annals of Statistics,
vol. 28, pp. 337-407, 2000.
[20] L. Breiman, “Random Forests,” J. Machine Learning vol. 45, pp. 5–
32, 2001.
[21] N. Meinshausen, “Quantile Regression Forests,” J. Machine Learning
Research, vol. 7, pp. 983–999, 2006.
[22] L. Breiman, J. Friedman, R. Olshen and C. Stone, Classification and
regression trees. Wadsworth: Belmont, 1984.
[23] D. S. Kim, S. M. Lee and J. S. Park, “Building lightweight intrusion
detection system based on random forest,” in Advances in Neural
Networks, vol. 3973, Springer-Verlag Berlin Heidelberg, 2006.
[24] G. Leshem and Y. Ritov, "Traffic flow prediction using AdaBoost
algorithm with random forests as a weak learner,” J. International
Journal of Intelligent Technology, vol. 2, pp. 1305-6417, 2007.
[25] J. Thongkam, G. Xu, Y. Zhang and F. Huang, “Support vector
machines for outlier detection in cancers survivability prediction,” in
International Workshop on Health Data Management, to be
published, 2008.
[26] P. Cabena, P. Hadjinian, R. Stadler, J. Verhees and A. Zanasi,
Discovering data mining from concept to implementation. Upper
Saddle River, N.J.: Prentice Hall, 1998.
[27] X. He and E. C. Frey, “Three-class ROC analysis-the equal error
utility assumption and the optimality of three-class ROC surface using
the ideal observer,” IEEE Trans. Medical Imaging,vol. 25(8), pp.
979-986, 2006.
[28] K. Woods and K. W. Bowyer, “Generating ROC curves for artificial
neural networks,” IEEE Trans. Medical Imaging,vol. 16(3), pp. 329-
337, 1997.
[29] S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled and D. Roth,
“Generalization bounds for the area under the ROC curve,” J.
Machine Learning Research, vol. 6, pp. 393-425, 2005.
[30] R. O. Duda, D. G. Stork and P. E. Hart, Pattern classification. 2nd ed.
ed. New York: Wiley, 2001.
[31] J. Huang and C. X. Ling, “Using AUC and accuracy in evaluating
learning algorithms,” IEEE Trans. Knowledge and Data
Engineering,vol. 17(3), pp. 299-310, 2005.
[32] I. H. Witten and E. Frank, Data mining: practical machine learning
tools and techniques. 2 ed. San Francisco: Morgan Kaufmann, 2005.
[33] R. Kohavi, “A study of cross-validation and bootstrap for accuracy
estimation and model selection,” in Proc. the International Joint
Conference on Artificial Intelligence, 1995, pp. 1137-1143.
[34] J. Thongkam, G. Xu and Y. Zhang, “An analysis of data selection
methods on classifiers accuracy measures," J. Khon Kaen University,
vol. 35(1), Jan-Feb 2008.
TABLE I
INPUT ATTRIBUTES OF BREAST CANCER DATA
No Attributes Types
1 Age Number
2 Marital Status Category(3)
3 Occupation Category(26)
4 Basis of diagnosis Category(6)
5 Topography Category(9)
6 Morphology Category(14)
7 Extent Category(4)
8 Stage Category(4)
9 Received Surgery Category(2)
10 Received Radiation Category(2)
11 Received Chemo Category(2)
12 Survivability (Class) Category(2)

TABLE II
PERFORMANCE COMPARISON AMONG SINGLE CLASSIFIERS ON THE TRAINING AND TEST SETS
Classifiers | Training Set: Accuracy (%), Sensitivity (%), Specificity (%) | Test Set: Accuracy (%), Sensitivity (%), Specificity (%)
ABRF 100.00 100.00 100.00 88.60 89.30 87.65
AdaBoost 80.88 78.55 85.28 80.35 77.93 85.05
ADTree 85.09 85.59 84.39 82.28 83.59 80.50
Bagging 91.23 92.24 89.92 83.86 84.64 82.77
C4.5 92.46 93.19 91.50 84.04 87.38 80.08
Conjunctive Rule 77.54 74.74 83.71 77.54 74.74 83.71
Naïve Bayes 84.04 85.54 82.04 83.51 84.97 81.56
NN-classifier 100.00 100.00 100.00 83.86 85.49 81.71
Random forests 99.65 99.69 99.60 85.79 86.63 84.65
RIPPER 87.54 91.15 83.40 85.79 88.25 82.75
SVM 99.82 99.69 100.00 85.96 86.45 85.29

TABLE III
PERFORMANCE COMPARISON AMONG MULTIPLE CLASSIFIERS ON THE TEST SETS
Base Classifiers | Accuracy (%): Min, Max, Avg, Var | Sensitivity (%): Min, Max, Avg, Var | Specificity (%): Min, Max, Avg, Var
ABRF 88.42 89.30 88.79 0.05 88.36 90.37 89.79 0.18 86.99 88.94 87.48 0.22
AD Tree 81.05 87.72 86.07 4.34 83.23 89.62 88.02 3.67 77.56 85.89 83.56 0.51
C4.5 82.81 88.07 86.95 1.25 86.36 89.66 88.35 0.64 78.63 86.59 85.15 2.80
Conjunctive Rule 77.37 81.58 80.94 1.61 74.55 80.91 79.85 4.08 82.65 85.33 82.94 0.43
Decision Stump 77.54 81.75 80.40 0.92 74.74 81.87 79.74 3.38 80.18 85.05 81.69 2.15
Naïve Bayes 81.75 83.51 81.98 0.25 84.71 85.94 84.87 0.13 78.13 81.56 78.44 0.64
NN-classifier 81.58 83.68 82.15 0.23 84.01 85.23 84.47 0.08 78.49 81.63 79.19 0.60
RIPPER 84.21 86.49 85.74 0.15 86.19 87.46 86.38 0.20 80.38 85.29 84.89 1.33
SVM 85.96 88.42 87.79 0.29 87.35 89.02 88.40 0.18 84.15 87.76 86.98 0.65
Note: Min refers to minimum; Max refers to maximum; Avg refers to average; Var refers to variance.

