Models To Predict Cardiovascular Risk - Comparison of CART, Multilayer Perceptron and Logistic Regression
Results

Reference logistic model
The reference logistic model takes into account seven out of the ten original variables. Table 1 describes these clinical characteristics in the database. Table 2 presents their predictive importance in the logistic model. Each categorical variable with n categories is recoded for the model into n - 1 binary variables.

Table 1: Descriptive characteristics of the total population for diseased and non-diseased people
Mean (SD) or %     With outcome (n = 891)    Without outcome (n = 14,553)
Age (y)            60.5 (9.7)                52.5 (9.5)
Sex (% males)      66%                       52%
SBP* (mmHg)        174 (22)                  161 (20)
DBP* (mmHg)        98 (10)                   98 (8)
Diabetes (%)       2.1                       1.4
Smokers-Sk (%)     36.3                      29.7
ECG-LVH (%)        24.2                      11.2
BMI* (kg/m²)       26.9 (4.5)                27.3 (4.6)
TC (mmol/l)        6.4 (1.2)                 6.3 (1.1)
*: SBP: systolic blood pressure; DBP: diastolic blood pressure; Sk: binary smoking status; TC: total cholesterol; ECG-LVH: left ventricular hypertrophy at ECG; BMI: body mass index

Comparison of implementation criteria

Multilayer perceptron (MLP)
The learning time was not a major problem with the MLP (no more than a few seconds) as long as we did not use any bootstrapping method to optimize the learning process. The learning time is slightly influenced by several other learning parameters, such as the number of iterations required, the number of cross validations, and the pre-treatment of data by normalization.
Input data have to be numerical, and this is the only requirement for the learning system to work. Several options of data standardization were explored and did not influence the performance of the model.
NevProp provides predicted probabilities of belonging to class 1, and various global performance indicators such as the total proportion of misclassified cases. No classification matrix is directly available to compute the sensitivity and specificity. During the learning process, a decision threshold (the so-called "Score Threshold") is used to optimize the error at each iteration. The same threshold can also be applied to the final predicted probability to ultimately classify each case.

CART
The learning time with the CART software also depends on the validation options chosen to optimize the learning process: all re-sampling methods are time consuming (cross-validation, bootstrap and bagging).
CART can take as input either continuous or categorical variables. No distributional hypothesis is required for these variables. However, the tree structure relies on their binarization.
The CART output provides a classification matrix from which the sensitivity and specificity can be calculated. For a given case, the classification process checks which terminal node applies to the case. The output includes the following information on each terminal node:
* a probability which represents the degree of membership of the terminal node in the class. This probability is computed taking into account the relative frequency of each class in the terminal node and the node's relative size compared with the whole sample.
* the class assigned to the terminal node according to its probability and to pre-specified misclassification costs.

Table 2: Ranking of predictive variables: comparison of CART, MLP and logistic regression
Logistic Regression (coefficient/SD)    MLP (%)           CART (%)
Age (13.7)                              Age (100)         Age (100)
Sex (9.6)                               Sex (38.1)        SBP (59.6)
Sk-cat2* (5.4)                          SBP (17.5)        ECG (26.5)
SBP (5.4)                               ECG (5.7)         Sex (23.0)
TC (4.3)                                TC (5.7)          Sk-cat (16.3)
ECG1* (4.0)                             Sk-bin (3.4)      Sk-bin (15.2)
ECG2* (3.3)                             DBP (1.4)         TC (14.5)
Sk-cat3* (2.4)                          Diabetes (0.7)    DBP (13.9)
Diabetes (2.4)                          Sk-cat (0.2)      Diabetes (1.3)
Sk-cat1* (2.0)                          BMI (0.2)         BMI (0.9)
ECG4* (1.8)
ECG3* (1.2)
*binary variable recoded from a multi-categorical variable

Comparison of methods' explicative performance
Each method analyzed reports an indicator reflecting the predictive importance of variables. NevProp uses an Automatic Relevance Determination (ARD) function to rank the importance of variables in predicting the outcome. In CART, an indicator of variable importance is computed according to information collected at each node. This information refers to the improvement of discrimination attributable to each potential test on the variables. Results on variable importance reported in the logistic model and in the models optimized with NevProp and CART were slightly different (Table 2).
In CART, a graphical representation of the decision tree helps to understand the role of all predictive variables and the interactions between them. No such graphical representation is available in NevProp.
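As a minimal sketch of how a classification matrix can be obtained when only predicted probabilities are available (as with NevProp), the Python fragment below applies a score threshold to predicted probabilities and derives sensitivity and specificity. The function names, the toy data and the choice of `>=` at the threshold are illustrative assumptions and are not taken from NevProp or CART; the value 0.1 simply mirrors the ScoreThreshold reported for the MLP.

```python
from typing import Sequence, Tuple


def classification_matrix(
    y_true: Sequence[int], p_hat: Sequence[float], threshold: float = 0.1
) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) after thresholding predicted probabilities.

    A case is classified as class 1 when p_hat >= threshold
    (assumption: the comparison is inclusive).
    """
    tp = fp = fn = tn = 0
    for observed, probability in zip(y_true, p_hat):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and observed == 1:
            tp += 1
        elif predicted == 1 and observed == 0:
            fp += 1
        elif predicted == 0 and observed == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Sensitivity = tp / (tp + fn); specificity = tn / (tn + fp)."""
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical outcomes (1 = event) and predicted probabilities.
    outcomes = [1, 1, 0, 0, 0, 1, 0, 0]
    probabilities = [0.30, 0.08, 0.05, 0.12, 0.02, 0.15, 0.09, 0.40]
    counts = classification_matrix(outcomes, probabilities, threshold=0.1)
    sens, spec = sensitivity_specificity(*counts)
    print(f"tp, fp, fn, tn = {counts}")
    print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Raising the threshold trades sensitivity for specificity, which is why a single threshold gives only one point of the ROC curve.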
Comparison of discriminative performance results
Taking into account the primary sensitivity analyses, the specifications of the models were as follows:
- NevProp: 10 input units, 10 hidden units, 30 splits for cross validation (on the learning set), 30 iterations for the learning process, ScoreThreshold at 0.1.
- CART: Gini impurity function, split-sample validation, misclassification costs at 1 (the influence of misclassification costs on model performance is described in Figure 1).

[Figure 1 plot: sensitivity, specificity, percentage of correctly classified cases and area under the ROC curve, plotted against the cost for misclassification of class 1 in class 0]
Figure 1: Variation of misclassification costs between 1 and 1.8 in CART: effects on model's performance

Table 4: Performance results for CART, MLP compared with the logistic model
                                                         LR                  MLP                   CART
Correctly classified cases (%) in test set (n = 5148)    65.9                76.0                  69.1
Area under ROC curve (95% CI)                            0.78 (0.75-0.81)    0.78 (0.75-0.80)      0.76 (0.73-0.79)
AUCs difference with the logistic model (p value)                            -0.9562 (p = 0.33)    -2.1864 (p = 0.02)

Table 4 describes the performance results of the models applied in the test set. ROC curves for CART and the MLP are not significantly different from the one obtained with the logistic model (Figure 2).

Discussion and Conclusion
This work comparatively evaluates the implementation and performance of two machine learning methods (a multilayer perceptron and an inductive decision tree based on the CART algorithm) by reference to a logistic regression model, in the real context of cardiovascular prevention. The predictive performance of CART is slightly lower than the performance of the other methods. However, we met several problems in the task of comparative evaluation.

Figure 2: ROC curve for CART, MLP and logistic regression

First, at the implementation stage, we chose to evaluate the methods at their best performance, i.e. after optimization of the modeling specifications. This required understanding the meaning of each learning parameter and testing its influence on the final results. Some qualitative standards should probably be clearly stated for the implementation of these methods in order to make any kind of evaluation interpretable. An effort has already been made in that direction in the NevProp manual [14]. A common environment is also still lacking for implementation and evaluation: such an environment has been developed for UNIX but is not user-friendly [15].
Another difficulty was to define a common framework of indicators to evaluate the same type of results for each optimized model. This framework is based on an assessment of explicative performance and of discriminative predictive performance by ROC analysis. Indeed, a risk prediction model can be considered as a diagnostic test, and the ROC curve has been recommended as an appropriate measure of diagnostic accuracy by clinical epidemiologists [16].
While comparing methods, it is necessary to understand the semantics that underlie the output of each method and to fit it into the common comparison framework. CART and MLP provide fundamentally different types of results. The extraction of predicted probabilities from the CART output for the ROC analysis is open to discussion. We chose an approach that has already been described and criticized [17]. Indeed, the probabilities available for each terminal node remain dependent on the tree's structure (namely its depth), and the interpretation of this probability may not be exactly the same as the
one provided by the MLP or the logistic model. Moreover, we did not consider any measure of calibration, which would provide, along with discriminative performance, a complete measure of the accuracy of models [18]. Another complementary work would consist in analyzing the precision and generalizability of the performance results [19].
The contribution of our models to the domain of cardiovascular prevention needs further discussion: all three models provide a relatively low predictive performance, difficult to use for decision making in individuals. However, each method shows some characteristics which may be interesting in the context of clinical practice. First, the tree representation in CART is close to medical reasoning and can help to structure the understanding of prediction. Second, models obtained with neural networks are not fixed, since the iterative learning process can continue on local data. These methods probably have the potential to complement existing statistical models and to contribute to the interpretation and presentation of risk in computerized decision support systems. Other machine-learning methods such as genetic algorithms, Bayesian networks and support vector machines should also be explored.

Acknowledgements
We thank the collaborators of the INDANA project (Individual Data Analysis of Antihypertensive Intervention Trials) for allowing us to access the INDANA database (Coope J, Cutler J, Ekbom T, Fagard R, Friedman L, Kerlikowske K, Mitchell Perry H, Pocock S, Prineas R, Schron E).

References
1. Prevention of coronary heart disease in clinical practice. Recommendations of the Second Joint Task Force of European and other Societies on coronary prevention. Eur Heart J 1998;19(10):1434-503.
2. Jackson R, Barham P, Bills J, Birch T, McLennan L, MacMahon S, et al. Management of raised blood pressure in New Zealand: a discussion document. BMJ 1993;307(6896):107-10.
3. Unwin N, Thomson R, O'Byrne AM, Laker M, Armstrong H. Implications of applying widely accepted cholesterol screening and management guidelines to a British adult population: cross sectional study of cardiovascular disease and risk factors. BMJ 1998;317(7166):1125-30.
4. Grover SA, Paquet S, Levinton C, Coupal L, Zowall H. Estimating the benefits of modifying risk factors of cardiovascular disease: a comparison of primary vs secondary prevention. Arch Intern Med 1998;158(6):655-62.
5. Cooper GF, Aliferis CF, Ambrosino R, Aronis J, Buchanan B, Caruana R, et al. An evaluation of machine-learning methods for predicting pneumonia mortality. Artif Intell Med 1997;9:107-38.
6. Knuiman MW, Vu HT, Segal MR. An empirical comparison of multivariable methods for estimating risk of death from coronary heart disease. J Cardiovasc Risk 1997;4(2):127-34.
7. Lapuerta P, Azen PS, LaBree L. Use of neural networks in predicting the risk of coronary artery disease. Comp Biomed Res 1995;28:38-52.
8. Gueyffier F, Boutitie F, Boissel JP, Coope J, Cutler J, Ekbom T, et al. INDANA: a meta-analysis on individual patient data in hypertension. Protocol and preliminary results. Therapie 1995;50(4):353-62.
9. Goodman P. NevProp software, version 3. ftp://ftp.scs.unr.edu/pub/cbmr/nevpropdir [February 22, 2000]
10. CART: Tree-structured non-parametric data analysis [program]. Version 3.6. San Diego, CA: Salford Systems, 1995.
11. Breiman L, Friedman J, Olshen R, Stone C. Classification and Regression Trees. Pacific Grove: Wadsworth, 1984.
12. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143(1):29-36.
13. Metz C. ROCKIT software, v0.9 beta. http://www-radiology.uchicago.edu/krl/toppage11.htm [February 22, 2000]
14. Goodman P, Harrell FJ. Neural networks: advantages and limitations for biostatistical modeling. http://www.scs.unr.edu/nevprop [February 29, 2000]
15. Rasmussen C, Neal R, Hinton G, van Camp D, Revow M, Ghahramani Z, et al. The DELVE Manual, v1.1. http://www.cs.utoronto.ca/~delve [February 29, 2000]
16. Irwig L, Tosteson AN, Gatsonis C, Lau J, Colditz G, Chalmers TC, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med 1994;120(8):667-76.
17. Raubertas RF, Rodewald LE, Humiston SG, Szilagyi PG. ROC curves for classification trees. Med Decis Making 1994;14(2):169-74.
18. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15(4):361-87.
19. Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999;130(6):515-24.
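As an illustration of the ROC analysis used for the discriminative comparison, the area under the ROC curve can be read as the probability that a randomly chosen case with the outcome receives a higher predicted probability than a randomly chosen case without it, ties counting one half [12]. The Python sketch below computes this quantity by direct pairwise comparison. It is a simplified, nonparametric illustration under assumed toy data and a hypothetical function name; it is not claimed to reproduce how the AUCs in Table 4 were obtained. The toy scores take only three distinct values, as happens when every case falling in the same CART terminal node receives the same probability.

```python
from typing import Sequence


def auc_by_pairs(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the proportion of concordant pairs.

    Every (case with outcome, case without outcome) pair scores 1 when the
    case with the outcome has the higher score, 0.5 on a tie, 0 otherwise.
    """
    with_outcome = [s for y, s in zip(y_true, scores) if y == 1]
    without_outcome = [s for y, s in zip(y_true, scores) if y == 0]
    if not with_outcome or not without_outcome:
        raise ValueError("both classes must be present")

    concordance = 0.0
    for s_pos in with_outcome:
        for s_neg in without_outcome:
            if s_pos > s_neg:
                concordance += 1.0
            elif s_pos == s_neg:
                concordance += 0.5
    return concordance / (len(with_outcome) * len(without_outcome))


if __name__ == "__main__":
    # Hypothetical tree with three terminal nodes: each case carries the
    # probability of its node, so many scores are tied.
    outcomes = [0, 0, 0, 1, 0, 1, 1]
    node_probabilities = [0.02, 0.02, 0.10, 0.10, 0.10, 0.25, 0.25]
    print(f"AUC = {auc_by_pairs(outcomes, node_probabilities):.3f}")
```

With few terminal nodes the empirical ROC curve has only a few operating points, which is one reason node probabilities depend on the tree's depth.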