AdaBoost Algorithm with Random Forests for Predicting Breast Cancer Survivability
Jaree Thongkam, Guandong Xu and Yanchun Zhang
Abstract—In this paper we propose a combination of the AdaBoost and random forests algorithms for constructing a breast cancer survivability prediction model. We use random forests as a weak learner of AdaBoost, selecting the high-weight instances during the boosting process, to improve accuracy and stability and to reduce overfitting problems. The capability of this hybrid method is evaluated using basic performance measurements (e.g., accuracy, sensitivity, and specificity), the Receiver Operating Characteristic (ROC) curve and the Area Under the receiver operating characteristic Curve (AUC). Experimental results indicate that the proposed method outperforms a single classifier and other combined classifiers for breast cancer survivability prediction.

Manuscript received December 15, 2007.
J. Thongkam is with the School of Computer Science and Mathematics, Victoria University, Melbourne, Australia (e-mail: [email protected]).
G. Xu is with the School of Computer Science and Mathematics, Victoria University, Melbourne, Australia (e-mail: [email protected]).
Y. Zhang is with the School of Computer Science and Mathematics, Victoria University, Melbourne, Australia (e-mail: [email protected]).

I. INTRODUCTION

BREAST CANCER is the second most common cause of cancer death among women in Thailand [1]. Its incidence has been increasing over the past several years, with more than 5,000 new cases reported every year. Several research studies have investigated contributing factors in such diseases, including lifestyle changes, dietary patterns, and genetic issues [2]. Also, much research has analyzed the course and outcome of disease, which can help patients form an idea of how to make decisions about their quality of life in accordance with their finances [3], [4]. For example, the Kaplan-Meier and Cox proportional hazards models, traditional statistical methods, are commonly used to estimate the survival rate of a particular patient suffering from a disease over a particular time period [5]. Currently, advanced techniques in the field of data mining, a new stream of methodologies, have come into existence. These techniques have proven to be more powerful than traditional statistical methods [6]. They provide processes for discovering useful patterns or models from large data sets [7]. One of the most widely used techniques in data mining is classification. It is used to extract models describing important data classes and to predict the outcome in unseen data at a single point of time [8]. Therefore, in order to help medical practitioners predict accurate outcomes, data mining and decision-support tools are needed that can process the huge amount of data available from previously solved cases and suggest probable treatments based on analyzing the abnormal values of several significant attributes.

In relation to current medical analysis, the decision tree is widely used in the medical domain. Several research studies have successfully employed decision trees to extract knowledge from medical data sets. For example, Delen, Walker and Kadam [3] employed classification and regression trees to predict breast cancer survivability in the SEER medical databases. Their results showed that the decision tree algorithm was superior for extracting knowledge from their data set. Many researchers have utilized a single classifier to extract knowledge from data sets. For example, Yi and Fuyong [9] applied Support Vector Machines (SVM) alone to discover breast cancer diagnosis patterns from the University of Wisconsin Hospital. Their results showed that SVM was suitable for diagnosing breast cancer patterns. Moreover, Ryu, Chandrasekaran and Jacob [4] employed an isotonic separation technique to predict breast cancer in the Wisconsin breast cancer diagnosis data set and the Ljubljana breast cancer recurrence data set. Their results showed that the isotonic separation technique outperformed C4.5, Robust LP-P, and an SVM with a Gaussian kernel.

In order to enhance the ability of standard algorithms, several attribute selection methods have been utilized in the medical domain for selecting the significant attributes. For example, Xiong et al. [10] combined Principal Components Analysis (PCA) and Partial Least Squares (PLS) linear regression for analyzing attributes, and then used decision trees and association rules to extract knowledge from a breast cancer diagnosis data set from the University of Wisconsin, Madison. This data set included 699 breast cancer patients, with 458 instances of the benign class and 241 instances of the malignant class. Their results showed a percentage of correctness of 96.57%. Moreover, Wang et al. [11] utilized Independent Component Analysis (ICA) to select the best attributes and applied Least Squares Support Vector Machines (LS-SVM) to detect breast cancer tumors. Experimental results showed that the accuracy of LS-SVM with ICA was significantly improved over using LS-SVM alone.

Recently, the AdaBoost technique has become an attractive ensemble method in machine learning since it has a low error rate and performs well on low-noise data sets [12], [13]. As a successor of the boosting algorithm, it is used to combine a set of weak classifiers to form a model with higher prediction outcomes [12]. As a result, several research studies have successfully applied the AdaBoost algorithm to solve classification problems in object detection, including face recognition, video sequences and signal processing systems. For example, Zhou and Wei [14] utilized the AdaBoost algorithm to extract the top 20 significant features from the XM2VT face database.
Their results showed that the AdaBoost algorithm reduces 54.23% of the computation time. Additionally, Sun, Wang and Wong [15] applied the AdaBoost algorithm to extract high-order pattern and weight-of-evidence rule-based classifiers from the UCI Machine Learning Repository. Their results showed that the composed classifiers can achieve better classification accuracy than the HPWR classifiers alone. However, few research studies have utilized AdaBoost and random forests to make predictions on medical databases.

We propose a combination of AdaBoost and random forests for predicting breast cancer survivability from a data set collected at Srinagarind Hospital in Thailand. We investigate the performance of the AdaBoost algorithm using random forests as the weak learner algorithm to generate better prediction models for breast cancer survivability investigation. The 10-fold cross-validation method, confusion matrix, ROC curve, AUC score, accuracy, sensitivity and specificity are used to evaluate the breast cancer survivability prediction models.
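As a concrete illustration of this evaluation protocol, the sketch below shows one way such a hybrid could be assembled and scored with 10-fold cross-validation using scikit-learn. It is a minimal sketch, not the authors' implementation: the AdaBoostClassifier/RandomForestClassifier pairing only approximates the combined method described in Section II, and the feature matrix X and label vector y are synthetic placeholders standing in for the (non-public) Srinagarind data set.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(570, 10))                  # placeholder patient attributes
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # placeholder survivability labels (0/1)

# random forests used as the weak learner inside AdaBoost
abrf = AdaBoostClassifier(RandomForestClassifier(n_estimators=10), n_estimators=50)

accs, sens, specs, aucs = [], [], [], []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
for train_idx, test_idx in cv.split(X, y):
    abrf.fit(X[train_idx], y[train_idx])
    pred = abrf.predict(X[test_idx])
    score = abrf.predict_proba(X[test_idx])[:, 1]
    tn, fp, fn, tp = confusion_matrix(y[test_idx], pred).ravel()   # confusion matrix entries
    accs.append((tp + tn) / (tp + tn + fp + fn))    # accuracy
    sens.append(tp / (tp + fn))                     # sensitivity
    specs.append(tn / (tn + fp))                    # specificity
    aucs.append(roc_auc_score(y[test_idx], score))  # area under the ROC curve

print("accuracy=%.3f sensitivity=%.3f specificity=%.3f AUC=%.3f"
      % (np.mean(accs), np.mean(sens), np.mean(specs), np.mean(aucs)))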
The remainder of this paper is organized as follows. Section II introduces the basic concepts of AdaBoost, random forests, and the hybrid of AdaBoost and random forests. Section III presents the methodologies and the experiment design used in this paper. Experiment results and discussions are presented in Section IV. The conclusion and an outline of future work are given in Section V.
II. BASIC CONCEPTS OF ALGORITHMS

This section briefly describes the theoretical background of AdaBoost, random forests and the proposed combination algorithm used in this paper.

A. AdaBoost

AdaBoost is one of the newest and most popular ensemble methods. It is used for prediction in classification tasks, and it is reported to produce self-rated confidence scores that estimate the reliability of its predictions [16]. It is a learning algorithm used to generate multiple classifiers, from which the best classifier is selected [16], [17]. It not only has high flexibility for combining with other methods, such as the decision stump and classification and regression trees (CART), but it also requires fewer input parameters and less computing background knowledge to improve the accuracy of the prediction models built from the data set.

In this paper we utilize AdaBoost.M1 [18], a gentle AdaBoost, which originates from setting weights over the training set. The training set is (x1,y1),…,(xn,yn), where each xi belongs to an instance space X, and each label yi is in the label set Y, which is equal to {-1,+1}. The weight assigned to training example i on round k is denoted Dk(i). The same weight is set for every example at the starting point (D1(i)=1/n, i=1,…,n). Then the weights of the examples misclassified by the base learning algorithm (called the weak hypothesis) are increased, so that the hard examples in the training set receive more attention in each round. The eight steps of the AdaBoost algorithm are given in Fig. 1.

Input: S: training set, S=(x1,y1),…,(xn,yn), with labels yi ∈ Y
       K: number of iterations
1) Assign the sample S=(x1,y1),…,(xn,yn); xi ∈ X, yi ∈ {-1,+1}
2) Initialize the weights D1(i)=1/n, i=1,…,n
3) for k=1,…,K
4)    Call WeakLearn, providing it with the distribution Dk
5)    Get a weak hypothesis hk: X → {-1,+1} with its error εk = Σ{i: hk(xi) ≠ yi} Dk(i)
6)    Update the distribution Dk: Dk+1(i) = Dk(i) exp(-αk yi hk(xi)) / Zk
7) next k
8) Output: H(x) = sign( Σ(k=1..K) αk hk(x) )

Fig. 1. AdaBoost algorithm

where Zk is the normalization constant (chosen so that Dk+1 will be a distribution). αk, presented in Equation (1), improves the generalization result and also alleviates the overfitting and noise-sensitivity problems [19]. W refers to the class probability estimate used to construct the real value of αk hk(x).

αk = (1/2) ln( W+1 / W-1 )    (1)

Therefore, the final hypothesis H(x) is a weighted majority vote of the K weak hypotheses, where αk is the weight assigned to hk. In addition, AdaBoost handles not only the binary class but also the numerical class for prediction purposes [17].
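A compact sketch of the boosting loop in Fig. 1 is given below, assuming a generic weak_learn factory (for example, one that returns a small random forest) whose classifiers accept sample weights, and labels in {-1,+1}. For simplicity it uses the common error-based weight αk = (1/2) ln((1-εk)/εk) rather than the class-probability form of Equation (1); it illustrates the loop structure only and is not the authors' implementation.

import numpy as np

def adaboost(X, y, weak_learn, K=50):
    n = len(y)
    D = np.full(n, 1.0 / n)            # step 2: uniform initial weights D1(i) = 1/n
    hypotheses, alphas = [], []
    for k in range(K):                 # step 3
        h = weak_learn()
        h.fit(X, y, sample_weight=D)   # step 4: call WeakLearn with the distribution Dk
        pred = h.predict(X)
        eps = D[pred != y].sum()       # step 5: weighted error of hk
        if eps >= 0.5:                 # stop when the weak learner is no better than chance
            break
        eps = max(eps, 1e-10)          # guard against a perfect fit (log of zero)
        alpha = 0.5 * np.log((1 - eps) / eps)
        D = D * np.exp(-alpha * y * pred)   # step 6: up-weight misclassified examples
        D = D / D.sum()                     # normalize by Zk so Dk+1 is a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    def H(Xnew):                       # step 8: weighted majority vote of the weak hypotheses
        votes = sum(a * h.predict(Xnew) for a, h in zip(alphas, hypotheses))
        return np.sign(votes)
    return H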
B. Random Forests

Random forests (RF) [20] is one of the most successful ensemble learning techniques, and it has proven to be a very popular and powerful technique in pattern recognition and machine learning for high-dimensional classification [21] and skewed problems [20]. These studies used RF to construct a collection of individual decision tree classifiers built with the classification and regression trees (CART) algorithm [22]. CART is a rule-based method that generates a binary tree through a binary recursive partitioning process that splits a node based on the yes and no answers of the predictors. The rule generated at each step maximizes the class purity within the two resulting subsets, and each subset is split further based on independent rules. CART uses the Gini index to measure the impurity of a data partition or set of training instances [7]. Although the aim of CART is to maximize the difference of heterogeneity, in real-world data sets the overfitting problem, which causes the classifier to have a high prediction error on unseen data, is often encountered. Therefore, the bagging mechanism in RF enables the algorithm to create classifiers for high-dimensional data very quickly [20], [21]. The accuracy of the classification decision is obtained by voting among the individual classifiers in the ensemble. The common element in all of these steps is that, for the b-th tree, a random vector Sb is generated from a bootstrap sample, independent of the past random vectors but with the same distribution, and a tree is grown using the training set and Sb. The random forests algorithm is shown in Fig. 2.
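The sketch below illustrates the bootstrap-and-vote mechanism just described: B CART-style trees, each grown on its own bootstrap sample Sb, combined by majority vote. It is only an illustration under these assumptions, with scikit-learn's Gini-based DecisionTreeClassifier standing in for CART, numpy arrays for X and y, and 0/1 class labels; it is not Breiman's exact algorithm nor the implementation used in the paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier   # CART-style tree using the Gini index

def random_forest_fit(X, y, B=100, seed=0):
    # grow B trees, each on its own bootstrap sample S_b of the training set
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for b in range(B):
        S_b = rng.integers(0, n, size=n)           # bootstrap indices, drawn with replacement
        tree = DecisionTreeClassifier(criterion="gini", max_features="sqrt")
        tree.fit(X[S_b], y[S_b])
        trees.append(tree)
    return trees

def random_forest_predict(trees, Xnew):
    # majority vote over the ensemble (class labels assumed to be 0/1)
    votes = np.stack([t.predict(Xnew) for t in trees])
    return (votes.mean(axis=0) > 0.5).astype(int)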
The proposed algorithm is compared with 8 weak learners of AdaBoost, including ADTree, C4.5, conjunctive rule, decision stump, Naïve Bayes, NN-classifier, RIPPER and SVM. The default settings of each weak learner are used to generate the models. These models were evaluated using 10-fold cross-validation, measuring accuracy, sensitivity and specificity. The experiment involved increasing the number of AdaBoost iterations by 5 each time, up to 100 iterations, to illustrate the performance of the models. The experiment results are given in Figs. 5, 6 and 7, respectively.

Fig. 7. The specificity comparison (specificity, %, versus the number of iterations, 0-100).
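As an illustration of this sweep, the sketch below varies the number of boosting iterations from 5 to 100 in steps of 5 and estimates specificity by 10-fold cross-validation. It is a hedged sketch: a depth-1 CART (decision stump) stands in for one of the weak learners above, scikit-learn defaults replace the tools actually used in the experiments, and X and y are synthetic placeholders.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(570, 10))                  # placeholder data
y = (X[:, 0] - 0.3 * X[:, 2] > 0).astype(int)

for k in range(5, 101, 5):                      # 5, 10, ..., 100 boosting iterations
    # a depth-1 CART (decision stump) stands in for one of the weak learners above
    model = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=k)
    pred = cross_val_predict(model, X, y, cv=10)     # 10-fold cross-validated predictions
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    print("K=%3d  specificity=%5.2f%%" % (k, 100.0 * tn / (tn + fp)))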
TABLE II
PERFORMANCE COMPARISON AMONG SINGLE CLASSIFIERS ON THE TRAINING AND TEST SETS

                      Training Set                                      Test Set
Classifiers           Accuracy(%)  Sensitivity(%)  Specificity(%)       Accuracy(%)  Sensitivity(%)  Specificity(%)
ABRF                  100.00       100.00          100.00               88.60        89.30           87.65
AdaBoost               80.88        78.55           85.28               80.35        77.93           85.05
ADTree                 85.09        85.59           84.39               82.28        83.59           80.50
Bagging                91.23        92.24           89.92               83.86        84.64           82.77
C4.5                   92.46        93.19           91.50               84.04        87.38           80.08
Conjunctive Rule       77.54        74.74           83.71               77.54        74.74           83.71
Naïve Bayes            84.04        85.54           82.04               83.51        84.97           81.56
NN-classifier         100.00       100.00          100.00               83.86        85.49           81.71
Random forests         99.65        99.69           99.60               85.79        86.63           84.65
RIPPER                 87.54        91.15           83.40               85.79        88.25           82.75
SVM                    99.82        99.69          100.00               85.96        86.45           85.29
TABLE III
PERFORMANCE COMPARISON AMONG MULTIPLE CLASSIFIERS ON TEST SETS

                      Accuracy (%)                  Sensitivity (%)               Specificity (%)
Base Classifiers      Min    Max    Avg    Var      Min    Max    Avg    Var      Min    Max    Avg    Var
ABRF                  88.42  89.30  88.79  0.05     88.36  90.37  89.79  0.18     86.99  88.94  87.48  0.22
ADTree                81.05  87.72  86.07  4.34     83.23  89.62  88.02  3.67     77.56  85.89  83.56  0.51
C4.5                  82.81  88.07  86.95  1.25     86.36  89.66  88.35  0.64     78.63  86.59  85.15  2.80
Conjunctive Rule      77.37  81.58  80.94  1.61     74.55  80.91  79.85  4.08     82.65  85.33  82.94  0.43
Decision Stump        77.54  81.75  80.40  0.92     74.74  81.87  79.74  3.38     80.18  85.05  81.69  2.15
Naïve Bayes           81.75  83.51  81.98  0.25     84.71  85.94  84.87  0.13     78.13  81.56  78.44  0.64
NN-classifier         81.58  83.68  82.15  0.23     84.01  85.23  84.47  0.08     78.49  81.63  79.19  0.60
RIPPER                84.21  86.49  85.74  0.15     86.19  87.46  86.38  0.20     80.38  85.29  84.89  1.33
SVM                   85.96  88.42  87.79  0.29     87.35  89.02  88.40  0.18     84.15  87.76  86.98  0.65
Note: Min refers to minimum; Max refers to maximum; Avg refers to average; Var refers to variance.