
Models to predict cardiovascular risk: comparison of CART, multilayer perceptron and logistic regression


Isabelle Colombet MD, MPH*, Alan Ruelland, MS*,
Gilles Chatellier MD*, François Gueyffier MD, PhD**, Patrice Degoulet MD, PhD*,
Marie-Christine Jaulent PhD*
*Medical Informatics Department, Broussais Hospital, Paris, France
**Clinical Pharmacology Department, Claude Bernard University, Lyon, France
colombet@hbroussais.fr
Abstract

The estimate of a multivariate risk is now required in guidelines for cardiovascular prevention. Limitations of existing statistical risk models lead to the exploration of machine-learning methods. This study evaluates the implementation and performance of a decision tree (CART) and a multilayer perceptron (MLP) to predict cardiovascular risk from real data.
The study population was randomly split into a learning set (n=10,296) and a test set (n=5,148). CART and the MLP were implemented at their best performance on the learning set, applied to the test set and compared with a logistic model. Implementation, explicative and discriminative performance criteria are considered, based on ROC analysis.
Areas under the ROC curves and their 95% confidence intervals are 0.78 (0.75-0.81), 0.78 (0.75-0.80) and 0.76 (0.73-0.79) respectively for logistic regression, MLP and CART. Given their implementation and explicative characteristics, these methods can complement existing statistical models and contribute to the interpretation of risk.

Introduction

Current guidelines published for the management of the main cardiovascular risk factors (hypertension, hypercholesterolemia, type 2 diabetes) are based on a decision-making strategy that uses a multivariate estimate of cardiovascular risk [1-3]. This strategy is supposed to lead to a more accurate identification of the patients who will benefit most from the treatment of risk factors [4].
Any multivariate estimate of cardiovascular risk is currently based on statistical models inferred from cohort data with methods such as logistic regression or Cox proportional hazards analysis. Machine-learning methods are increasingly explored and evaluated for risk prediction purposes in medical domains [5]. Few works have been published on the evaluation of machine-learning methods to predict cardiovascular risk. Knuiman et al. find similar discriminative performance of a decision tree and a logistic regression model to predict coronary mortality in the Busselton cohort [6]. Lapuerta et al. use a neural network to predict coronary risk from serum lipid profiles, taking censored data into account; the neural network correctly classified a higher proportion of observations than a Cox model [7]. The evaluation of prediction methods relies on the analysis of the predictive performance of models. This performance is often incompletely assessed by indicators such as the proportion of correctly classified observations. Other aspects of method implementation are not always addressed.
The objective of this work is to evaluate the implementation and performance of two machine-learning methods (a multilayer perceptron and an inductive decision tree based on the CART algorithm) in comparison with a logistic regression model, in order to predict the risk of cardiovascular disease in a real database from the INDANA project (Individual Data Analysis of Antihypertensive Intervention Trials) [8]. This paper describes how these methods have been applied to the INDANA database and presents a comparison framework. The results are reported and discussed according to this framework.

Material and methods

The INDANA database
The INDANA database has been previously described [8]. Briefly, it consists of the individual data of 10 randomized controlled trials designed to evaluate the preventive effects of antihypertensive drugs. In this study, we only used data from the control groups. Observations with missing data were mostly clustered by trial and were dropped from the original dataset. The final dataset consists of 15,444 subjects, described by several clinical characteristics and prospectively followed for at least 6 years for the incidence of cardiovascular outcomes. Problems of heterogeneity of outcome and predictive variable measurements between trials are addressed by Gueyffier et al. [8]. The outcome considered in this paper is the 6-year incidence of the combined endpoint defined by the occurrence of

1067-5027/00/$5.00 © 2000 AMIA, Inc. 156


myocardial infarction, stroke or cardiovascular death. It is represented in the dataset by a binary variable: occurrence (class 1) or no occurrence (class 0) of the outcome event.

Application of learning methods
Three learning methods were used to fit a prediction model to the data: logistic regression, a neural network and an inductive decision tree. The fitting process is described hereafter for each method.

Logistic regression
The reference model was built by forced entry of 10 variables followed by removal of those with no significant partial correlation (R statistic). The SPSS v7.5.2F for Windows (1997) statistical package was used for these analyses.

Neural network: NevProp (Nevada backPropagation)
We use a common feedforward backpropagation multilayer perceptron (MLP) simulator developed in the NevProp software package at the University of Nevada and freely available on the Internet [9]. The prediction method is based on the nonlinear weighted combination of input units (i.e. predictive variables) to predict one or more output units (i.e. the outcome variable). The learning process is iterative and essentially consists in adjusting the weights to decrease the output error. The network was specified with one input layer (representing the ten predictive variables), one hidden layer (including ten hidden units) and one output layer (with one output unit representing a binary cardiovascular event). Several sensitivity analyses were performed to test how the prediction results could be influenced by variations of the learning parameters and to elicit the most optimized network. These parameters refer to the architecture of the network (number of hidden units), the method of internal validation (number of iterations and data-splitting processes), the options of data pre-treatment (i.e. normalization of inputs), the activation function for hidden units, and the "ScoreThreshold" used by the system to classify a case from its predicted probability.

Decision tree: CART (Classification And Regression Tree)
We use the software CART v3.6, developed by Salford Systems [10] and based on Breiman's original algorithm [11]. An inductive decision tree is essentially a set of rules represented by decision nodes and leaves (i.e. terminal nodes), which are assigned to a class. The learning process consists in 1) selecting the most discriminative variable according to an impurity function to partition the data, 2) repeating this partition until the nodes are considered pure enough to be terminal, and 3) pruning the resulting complete tree to avoid overfitting. Here again, sensitivity analyses were performed to elicit the best tree. Learning parameters refer here to the choice of the impurity function (i.e. the Gini index), the internal validation method (split-sample, cross-validation, bootstrap), and the specification of prior probabilities and/or misclassification costs.

Sampling method
A split-sample strategy is used for the application of the prediction methods. A randomly selected two thirds of the dataset is used to learn the prediction model (learning set: n = 10,296). The remaining third is used to validate the model (test set: n = 5,148).
All predictive models were optimized from a set of ten predictive variables (age, sex, systolic and diastolic blood pressure, serum total cholesterol, binary or multi-category smoking status, diabetes, left ventricular hypertrophy on EKG, body mass index).

Comparison framework
A comparison framework was defined to consider metrics other than just the performance of the models. Three types of indicators were assessed, based on intrinsic properties of the algorithms and on properties that are clinically useful for the defined task:
- Implementation criteria reflect the difficulty of optimally applying the method to new data. Three qualitative criteria are considered: 1) control of the learning time, 2) representation of the predictive variables (are any transformations required?), and 3) representation of the output result (is any decision threshold implicitly used, and is it automatically defined by the system or user-defined?).
- Explicative performance criteria reflect the extent to which the model explains the prediction process by itself. Three criteria are considered: 1) expressiveness of the outcome result (binary classification versus any other membership function), 2) report on the predictive variables implied in the decision and their relative importance, and 3) availability of a graphical representation to understand the model itself.
- Discriminative performance criteria reflect the ability of the model to separate high-risk subjects from low-risk subjects. Three criteria are considered: 1) the ROC curve and the area under it (or c index) [12], 2) the sensitivity (i.e. true positive rate), and 3) the specificity (i.e. true negative rate, or 1 - false positive rate).
All ROC curve analyses were performed using the RocKit software, which takes as input a vector of predicted probabilities along with the observed events [13]. The logistic model and the neural network were applied to the test set to obtain this input vector. In CART, these probabilities had to be extracted from the terminal node information given in the output. This extraction was done with an EXCEL macro.
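The discriminative criteria above can be computed directly from a vector of predicted probabilities and observed events. The following pure-Python sketch is illustrative only (it is not the RocKit implementation): it computes sensitivity and specificity at a given decision threshold, and the area under the ROC curve as the concordance (c) index.

```python
def sens_spec(y_true, y_prob, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate)
    obtained by thresholding predicted probabilities into classes 1/0."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_prob):
    """Area under the ROC curve, computed as the c index: the probability
    that a randomly chosen event case receives a higher predicted
    probability than a randomly chosen non-event case (ties count 1/2)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]  # event cases
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]  # non-event cases
    concordant = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                concordant += 1.0
            elif pp == pn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

For example, `auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])` returns 0.75: three of the four event/non-event pairs are ranked concordantly.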

Results

Reference logistic model
The reference logistic model takes into account seven out of the ten original variables. Table 1 describes these clinical characteristics in the database. Table 2 presents their predictive importance in the logistic model. The n-categorical variables are transformed for the model into n - 1 binary variables.

Table 1: Descriptive characteristics of the total population for diseased and non-diseased people

                      With outcome    Without outcome
Mean (SD) or %        (n = 891)       (n = 14,553)
Age (y)               60.5 (9.7)      52.5 (9.5)
Sex (% males)         66%             52%
SBP* (mmHg)           174 (22)        161 (20)
DBP* (mmHg)           98 (10)         98 (8)
Diabetes (%)          2.1             1.4
Smokers-Sk (%)        36.3            29.7
ECG-LVH (%)           24.2            11.2
BMI* (kg/m2)          26.9 (4.5)      27.3 (4.6)
TC (mmol/l)           6.4 (1.2)       6.3 (1.1)

*: SBP: systolic blood pressure; DBP: diastolic blood pressure; Sk: binary smoking status; TC: total cholesterol; ECG-LVH: left ventricular hypertrophy at ECG; BMI: body mass index

Comparison of implementation criteria
Multilayer perceptron (MLP)
The learning time was not a major problem with the MLP (no more than a few seconds), as far as we did not use any bootstrapping method for optimization of the learning process. The learning time is slightly influenced by several other learning parameters, such as the number of iterations required, the number of cross-validations, and the pre-treatment of data by normalization.
Input data have to be numerical, and this is the only requirement for the learning system to work. Several options of data standardization were explored and did not influence the performance of the model.
NevProp provides predicted probabilities of belonging to class 1, and various global performance indicators such as the total proportion of misclassified cases. No classification matrix is directly available to compute the sensitivity and specificity. During the learning process, a decision threshold (the so-called "ScoreThreshold") is used to optimize the error at each iteration. The same threshold can also be applied to the final predicted probability to ultimately classify each case.

CART
The learning time with the CART software also depends on the validation options chosen to optimize the learning process: all re-sampling methods are time-consuming (cross-validation, bootstrap and bagging).
CART can take as input either continuous or categorical variables. No distributional hypothesis is required for these variables. However, the tree structure relies on their binarization.
CART output provides a classification matrix that allows the sensitivity and specificity to be calculated. For a given case, the classification process checks which terminal node applies to the case. The output includes the following information on each terminal node:
* a probability which represents the membership degree of the terminal node to the class. This probability is computed taking into account the relative frequency of each class in the terminal node and its relative size compared with the whole sample.
* the class assigned to the terminal node according to its probability and to pre-specified misclassification costs.

Table 2: Ranking of predictive variables: comparison of CART, MLP and logistic regression

Logistic regression       MLP (%)          CART (%)
(coefficient/SD)
Age (13.7)                Age (100)        Age (100)
Sex (9.6)                 Sex (38.1)       SBP (59.6)
Sk-cat2* (5.4)            SBP (17.5)       ECG (26.5)
SBP (5.4)                 ECG (5.7)        Sex (23.0)
TC (4.3)                  TC (5.7)         Sk-cat (16.3)
ECG1* (4.0)               Sk-bin (3.4)     Sk-bin (15.2)
ECG2* (3.3)               DBP (1.4)        TC (14.5)
Sk-cat3* (2.4)            Diabetes (0.7)   DBP (13.9)
Diabetes (2.4)            Sk-cat (0.2)     Diabetes (1.3)
Sk-cat1* (2.0)            BMI (0.2)        BMI (0.9)
ECG4* (1.8)
ECG3* (1.2)

*binary variable recoded from a multi-categorical variable

Comparison of methods' explicative performance
Each analyzed method reports an indicator reflecting the predictive importance of variables. NevProp uses an Automatic Relevance Determination (ARD) function to rank the importance of variables in predicting the outcome. In CART, an indicator of variable importance is computed according to information collected at each node. This information refers to the improvement of discrimination attributable to each potential test on the variables. Results on variable importance reported in the logistic model and in the models optimized with NevProp and CART were slightly different (Table 2). In CART, a graphical representation of the decision tree helps to understand the role of all predictive variables and the interactions between them. No such graphical representation is available in NevProp.
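The first step of the CART learning process described in the Methods (selecting the most discriminative binary split according to the Gini impurity function) can be sketched in pure Python. This is an illustrative sketch only, not the Salford CART v3.6 implementation; the toy data below are hypothetical.

```python
def gini(labels):
    """Gini impurity of a set of binary class labels (0/1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Find the (feature_index, threshold) of the binary split x[j] <= t
    that most reduces the weighted Gini impurity (CART, step 1)."""
    best_feature, best_threshold = None, None
    best_impurity = gini(labels)
    n = len(labels)
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue  # degenerate split: one side is empty
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            if weighted < best_impurity:
                best_feature, best_threshold, best_impurity = j, t, weighted
    return best_feature, best_threshold
```

On hypothetical rows such as `[[50, 120], [60, 180], [65, 190], [45, 110]]` with labels `[0, 1, 1, 0]`, the split on the first feature at threshold 50 separates the classes perfectly (weighted impurity 0); CART would then recurse on each child node until the nodes are pure enough, and finally prune.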

Comparison of discriminative performance results
Taking into account the primary sensitivity analyses, the specifications of the models were as follows:
- NevProp: 10 input units, 10 hidden units, 30 splits for cross-validation (on the learning set), 30 iterations for the learning process, ScoreThreshold at 0.1.
- CART: Gini impurity function, split-sample validation, misclassification costs at 1 (the influence of misclassification costs on model performance is described in Figure 1).

[Figure 1: Variation of misclassification costs between 1 and 1.8 in CART: effects on model's performance. The figure plots sensitivity, specificity, area under the ROC curve and the proportion of correctly classified cases against the cost of misclassification of class 1 into class 0.]

Table 4: Performance results for CART and MLP compared with the logistic model

                                   LR                MLP                 CART
Correctly classified cases (%)     65.9%             76.0%               69.1%
in test set (n = 5,148)
Area under ROC curve (95% CI)      0.78 (0.75-0.81)  0.78 (0.75-0.80)    0.76 (0.73-0.79)
AUC difference with the            -                 -0.9562 (p = 0.33)  -2.1864 (p = 0.02)
logistic model (p value)

Table 4 describes the performance results of the models applied to the test set. The ROC curves for CART and the MLP are not significantly different from the one obtained with the logistic model (Figure 2).

[Figure 2: ROC curves for CART, MLP and logistic regression.]

Discussion and Conclusion
This work comparatively evaluates the implementation and performance of two machine-learning methods (a multilayer perceptron and an inductive decision tree based on the CART algorithm) by reference to a logistic regression model, in the real context of cardiovascular prevention. The predictive performance of CART is slightly lower than the performance of the other methods. However, we met several problems in the task of comparative evaluation.
First, at the implementation stage, we chose to evaluate the methods at their best performance, i.e. after optimization of the modeling specifications. This required understanding the meaning of each learning parameter and testing its influence on the final results. Some qualitative standards should probably be clearly stated for the implementation of these methods in order to make any kind of evaluation interpretable. An effort was already made in that direction in the NevProp manual [14]. A common environment is also still lacking for implementation and evaluation: such an environment has been developed for UNIX but is not user-friendly [15].
Another difficulty was to define a common framework of indicators to evaluate the same type of results for each optimized model. This framework is based on an assessment of explicative performance and of discriminative predictive performance by ROC analysis. Indeed, a risk prediction model can be considered as a diagnostic test, and the ROC curve has been recommended as an appropriate measure of diagnostic accuracy by clinical epidemiologists [16]. While comparing methods, it is necessary to understand the semantics that underlie the output result of each method and to fit it into the common comparison framework. CART and the MLP provide fundamentally different types of results. The extraction of predicted probabilities from CART output for the ROC analysis can be discussed. We chose an approach that has already been described and criticized [17]. Indeed, the probabilities available for each terminal node remain dependent on the tree's structure (namely its depth), and the interpretation of this probability may not be exactly the same as the

one provided by the MLP or the logistic model. Moreover, we did not consider any measure of calibration, which would provide, along with discriminative performance, a complete measure of the accuracy of the models [18]. Another complementary work would consist in analyzing the precision and generalizability of the performance results [19].
The contribution of our models to the domain of cardiovascular prevention needs further discussion: all three models provide a relatively low predictive performance, difficult to use for decision making in individuals. However, each method shows some characteristics which may be interesting in the context of clinical practice. First, the tree representation in CART is close to medical reasoning and can help to structure the understanding of prediction. Second, models obtained with neural networks are not fixed, since the iterative learning process can continue on local data. These methods probably have the potential to complement existing statistical models and to contribute to the interpretation and presentation of risk in computerized decision support systems. Other machine-learning methods such as genetic algorithms, Bayesian networks and support vector machines should also be explored.

Acknowledgements
We thank the collaborators of the INDANA project (Individual Data Analysis of Antihypertensive Intervention Trials) for allowing us to access the INDANA database (Coope J, Cutler J, Ekbom T, Fagard R, Friedman L, Kerlikowske K, Mitchell Perry H, Pocock S, Prineas R, Schron E).

References
1. Prevention of coronary heart disease in clinical practice. Recommendations of the Second Joint Task Force of European and other Societies on coronary prevention. Eur Heart J 1998;19(10):1434-503.
2. Jackson R, Barham P, Bills J, Birch T, McLennan L, MacMahon S, et al. Management of raised blood pressure in New Zealand: a discussion document. BMJ 1993;307(6896):107-10.
3. Unwin N, Thomson R, O'Byrne AM, Laker M, Armstrong H. Implications of applying widely accepted cholesterol screening and management guidelines to a British adult population: cross sectional study of cardiovascular disease and risk factors. BMJ 1998;317(7166):1125-30.
4. Grover SA, Paquet S, Levinton C, Coupal L, Zowall H. Estimating the benefits of modifying risk factors of cardiovascular disease: a comparison of primary vs secondary prevention. Arch Intern Med 1998;158(6):655-62.
5. Cooper GF, Aliferis CF, Ambrosino R, Aronis J, Buchanan B, Caruana R, et al. An evaluation of machine-learning methods for predicting pneumonia mortality. Artif Intell Med 1997;9:107-38.
6. Knuiman MW, Vu HT, Segal MR. An empirical comparison of multivariable methods for estimating risk of death from coronary heart disease. J Cardiovasc Risk 1997;4(2):127-34.
7. Lapuerta P, Azen SP, LaBree L. Use of neural networks in predicting the risk of coronary artery disease. Comput Biomed Res 1995;28:38-52.
8. Gueyffier F, Boutitie F, Boissel JP, Coope J, Cutler J, Ekbom T, et al. INDANA: a meta-analysis on individual patient data in hypertension. Protocol and preliminary results. Therapie 1995;50(4):353-62.
9. Goodman P. NevProp software, version 3. ftp://ftp.scs.unr.edu/pub/cbmr/nevpropdir [February 22, 2000].
10. CART: Tree-structured non-parametric data analysis [program]. v3.6. San Diego, CA: Salford Systems, 1995.
11. Breiman L, Friedman J, Olshen R, Stone C. Classification and Regression Trees. Pacific Grove: Wadsworth, 1984.
12. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143(1):29-36.
13. Metz C. ROCKIT software, v0.9 beta. http://www-radiology.uchicago.edu/krl/toppage11.htm [February 22, 2000].
14. Goodman P, Harrell FE Jr. Neural networks: advantages and limitations for biostatistical modeling. http://www.scs.unr.edu/nevprop [February 29, 2000].
15. Rasmussen C, Neal R, Hinton G, van Camp D, Revow M, Ghahramani Z, et al. The DELVE Manual, v1.1. http://www.cs.utoronto.ca/~delve [February 29, 2000].
16. Irwig L, Tosteson AN, Gatsonis C, Lau J, Colditz G, Chalmers TC, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med 1994;120(8):667-76.
17. Raubertas RF, Rodewald LE, Humiston SG, Szilagyi PG. ROC curves for classification trees. Med Decis Making 1994;14(2):169-74.
18. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15(4):361-87.
19. Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999;130(6):515-24.
