The Risk of Determining Risk With Multivariable Models
• Purpose: To review the principles of multivariable analysis and to examine the application of multivariable statistical methods in general medical literature.
• Data Sources: A computer-assisted search of articles in The Lancet and The New England Journal of Medicine identified 451 publications containing multivariable methods from 1985 through 1989. A random sample of 60 articles that used the two most common methods (logistic regression or proportional hazards analysis) was selected for more intensive review.
• Data Extraction: During review of the 60 randomly selected articles, the focus was on generally accepted methodologic guidelines that can prevent problems affecting the accuracy and interpretation of multivariable analytic results.
• Results: From 1985 to 1989, the relative frequency of multivariable statistical methods increased annually from about 10% to 18% among all articles in the two journals. In 44 (73%) of 60 articles using logistic or proportional hazards regression, risk estimates were quantified for individual variables ("risk factors"). Violations and omissions of methodologic guidelines in these 44 articles included overfitting of data; no test of conformity of variables to a linear gradient; no mention of pertinent checks for proportional hazards; no report of testing for interactions between independent variables; and unspecified coding or selection of independent variables. These problems would make the reported results potentially inaccurate, misleading, or difficult to interpret.
• Conclusions: The findings suggest a need for improvement in the reporting and perhaps conducting of multivariable analyses in medical research.

Annals of Internal Medicine. 1993;118:201-210.

From Yale University School of Medicine, New Haven, Connecticut. For current author addresses, see end of text.

Although most physicians have received no instruction in multivariable methods of statistical analysis, the methods now commonly appear in medical literature. The results of multivariable analyses are often expressed in statements such as, "When other risk factors are controlled, a decrease of 5 units in substance X reduced disease by 10%," or "After adjustment for age and stage of disease, treatment with procedure Y reduced mortality by 25%."

Our purpose in the current research was to note the frequency with which multivariable analyses now appear in general medical journals, to identify some common problems and desirable precautions in the analyses, and to determine how well the challenges are being met. The investigation also provided a framework for a brief review, intended for clinical readers, of commonly used multivariable statistical methods.

General Principles

Format of Multivariable Analysis

In the types of multivariable analyses discussed here, the mathematical expressions described in Appendix 1 are used to relate two or more independent variables to an outcome or dependent variable. In those expressions, a linear regression coefficient indicates the impact of each independent variable on the outcome in the context of (or "adjusting for") all other variables. The values of the regression coefficients are obtained as the best mathematical fit for the specified model, although the selected model and the multivariable analysis may or may not provide a good absolute fit for the data.

The four main multivariable methods in medical literature have many mathematical similarities but differ in the expression and format of the outcome expressed as the dependent variable:

1. In multiple linear regression, the outcome variable is a continuous quantity, such as blood pressure in millimeters of mercury or sodium concentration in milliequivalents per liter.

2. In multiple logistic regression, the observed outcome variable is usually a binary event, such as alive versus dead or case versus control. The event occurs at a fixed point in time, such as mortality 1 year after surgery.

3. In discriminant function analysis, the outcome variable is a category or group to which a subject belongs. For example, patients may be classified as having obstructive, restrictive, vascular, or "other" forms of pulmonary dysfunction. The analytic results are often converted to a "score" used to classify observations into one of the categorical groups. For only two categories (such as healthy or diseased), this form of multivariable analysis produces results similar to logistic regression.
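Appendix 1 is not reproduced in this excerpt. As a reminder of the general shape of such expressions, the standard textbook forms for the first two methods listed above are sketched here. The symbols (Y, X_i, b_i, p, k) are notation assumed for illustration and may differ from the article's own.

```latex
% Multiple linear regression: continuous outcome Y, k independent variables
\[ Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k \]

% Multiple logistic regression: p is the probability of the binary event
\[ \ln\!\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k \]
```

In both forms, each coefficient b_i is the change in the left-hand side per one-unit change in X_i with the other variables held fixed, which is the sense of "adjusting for" used in the text.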
Indexes of "goodness-of-fit" evaluate how effectively the calculated model fits the actual data for estimating the outcome variable. Although various indexes have been developed (2), a consensus is lacking even among statisticians about which index is most appropriate. (Details of the methods and discussions are beyond the scope of this review.) As a fundamental principle, including additional independent variables in a model will enhance mathematical goodness of fit but can cause problems such as overfitting of data or collinearity of variables.

Results

Frequency of Use

Table 1 shows results for the four multivariable methods in the two journals from 1985 to 1989. The number of annual citations has increased steadily, although the total number of pertinent articles has remained essentially constant. The frequency of the multivariable methods has thus increased from 10% to 18% over the 5-year period; and in 1989, one of the four methods appeared on average at least once per week in each journal.

The manual inspection of articles from July to December 1989 determined that the computer search had correctly identified "original articles," "special articles," "medical intelligence," "medical progress," and "case records" in The New England Journal of Medicine (n = 191); and "original articles," "preliminary communications," and "methods and devices" in The Lancet (n = 135). Eleven additional publications in The Lancet, however, were mistakenly identified as "original," including two articles on medicine and the law, a letter to the editor, a correction, and so forth. Thus, the computer search identified a proportionate excess of 11 of 326, or 3%, of the desired articles for the checked period in both journals. The error rate seemed too small to warrant corrections for the reported frequency counts.

The manual inspection was also used to evaluate the computer citations of the multivariable methods. In three articles, the search term appeared in the discussion (such as describing a logistic analysis done in another publication) rather than in the study methods. Several articles using major modifications of classical multivariable methods were not identified, but these techniques were not an intended subject of our investigation. Furthermore, in five articles the authors did not distinguish between simple linear regression and multiple linear regression when reporting the use of "linear regression." Thus, the results in Table 1 can be regarded as reasonably accurate frequencies of the cited multivariable methods in these two journals.

Because the logistic regression and proportional hazards methods were particularly common, they were the focus of further evaluation in the random sample of 60 articles.

Authors' Purpose in Using Multivariable Analysis

In 44 (73%) of the 60 publications using logistic and Cox regression, the multivariable methods were applied to quantify risk estimates reported as regression coefficients, odds ratios, or relative risks for individual variables. For example, in a study of prenatal X-ray exposure and childhood cancer in twins (31), the relative risk for cancer, adjusting for birth weight, was 2.4 for exposed compared with nonexposed children; and when Type A behavior was related to outcome of coronary heart disease (79), the mortality for Type A persons, after adjustment for other risk variables, was 58% that of Type B persons.

The remaining 16 (27%) of the 60 articles used multivariable methods as follows: Thirteen studies confirmed the results of other forms of analysis (such as simple bivariate analysis [40] or a Mantel-Haenszel analysis [62]); one study screened data for important variables (to identify risk factors for gastric cancer after gastric surgery for benign disease [65]); one report created a risk score (to predict relapse among patients with testicular teratoma [72]); and one investigation checked for interactions only (in a report of tamoxifen therapy for breast cancer [58]).

Problems in Reporting and Application

The pertinent statistical "package" or program, such as SAS (19) or BMDP (20), used to perform a multivariable analysis should be reported to the reader. Citing the information is analogous to a laboratory researcher indicating the particular experimental protocol used for physiologic measurements. Yet, the pertinent program was mentioned in only 17 (39%) of the 44 articles reviewed.

In addition to this general consideration, the six cited principles were evaluated in the 44 studies where logistic regression and proportional hazards analysis methods were applied for quantifying risk of individual variables. Potential problems involving collinear variables, influential observations, and model validation were not evaluated in the articles under review.
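For logistic regression, a regression coefficient and an odds ratio are two expressions of the same estimate: the odds ratio is the exponential of the coefficient. The following is a minimal sketch of that conversion. The function name is chosen for illustration, and the coefficient (0.5) and standard error (0.2) are hypothetical numbers, not values from any reviewed article.

```python
import math


def odds_ratio(beta, se, delta=1.0, z=1.96):
    """Odds ratio, with an approximate 95% interval, for a delta-unit change.

    beta  : logistic regression coefficient (log odds per unit of the predictor)
    se    : standard error of beta
    delta : size of the change in the predictor (e.g. -5 for a 5-unit decrease)
    z     : normal quantile for the interval (1.96 gives roughly 95%)
    """
    point = math.exp(beta * delta)
    # Exponentiate both ends of the interval on the log-odds scale, then sort
    # so the bounds stay ordered even when delta is negative.
    bounds = sorted(math.exp((beta + sign * z * se) * delta) for sign in (-1, 1))
    return point, bounds[0], bounds[1]


# Hypothetical coefficient and standard error, for illustration only.
point, low, high = odds_ratio(beta=0.5, se=0.2)
print(f"odds ratio {point:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The `delta` argument shows why the coding of a variable matters when reading a reported estimate: the same coefficient yields a different odds ratio for a 1-unit change than for a 5-unit change.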
were not confirmed in a subsequent, similar investigation (22). Although the inconsistent results were attributed to various elements of study design, limitations of the data analysis itself were not considered.

The problems of overfitted models have been reported (2, 13, 23) in the statistical literature but are not widely recognized. The key issue in the overfitting is an ample number of outcome events, not just a large sample size. When numerous variables are included in an attempt to "control" or "adjust" the data, accuracy of results can be threatened by overfitting or by other mechanisms (24). The number of variables selected for analysis should therefore be parsimonious, based on clinical sensibility and suitable data quality.

In checking the problem of nonlinearity when ranked variables are used directly, the analyst can compare the observed and the multivariable model's predicted values for the outcome over the range of each variable. A single risk estimate is inappropriate if the pattern of "errors" is nonrandom. For example, if arterial carbon dioxide tension (PCO2) is included as a predictor in a multivariable analysis of death from chronic obstructive lung disease, the corresponding linear regression coefficient will represent the average impact of PCO2 on mortality. If the actual mortality substantially differs from predicted mortality for "high" values of PCO2, then the analysis will incorrectly estimate the true risk for such patients.

In "nonlinear" circumstances the risks should be quantitatively estimated not as a single value but in zones or categories of the data. Although checking for a linear gradient is not a trivial exercise, a common method available in software packages involves visual inspection of appropriate data. Alternatively, the analyst can use other forms of multivariable analysis, such as cross-stratification (7), to evaluate whether the variables conform to a linear gradient.

In the papers under review, the problem of nonconformity to a linear gradient was frequently avoided by the strategy of using binary independent variables, a tactic found in 30 of the 44 pertinent articles. The true impact of continuous or ordinal variables, however, may be masked when two binary zones are created. For example, the "J-shape" relationship of serum cholesterol and mortality cannot be described by binary zones such as < 5.20 mmol/L versus > 5.20 mmol/L (corresponding to < 200 mg/dL versus > 200 mg/dL). Rather than using a dichotomous classification, continuous variables may be converted into an array of ordinal zones or transformed into "dummy" variables (2) appropriate for the clinical context of each analysis. Although the ideal number of zones cannot be specified in advance and often requires judgment, clinicians should be aware of this issue in multivariable modeling.

The problem of nonproportionality in Cox regression can be avoided if hazard functions are suitably checked and reported. Although criteria for identifying "severe" violations are lacking, a rigorous scientific analysis should include evaluating methodologic assumptions and reporting the results. Techniques such as checking for proportional hazards using logarithmic graphs (15) may not be familiar to all readers but, when described, would indicate that the proportional hazards assumption had been evaluated.

The interaction problem is illustrated by the association of asbestos exposure and cigarette smoking with lung cancer, initially thought (25) to interact: The risk for asbestos-exposed persons who also smoked cigarettes was substantially greater than the risk anticipated merely from combining the risks calculated individually for asbestos exposure and for cigarette smoking. Although subsequent data (26) did not confirm these results, such interactions represent another threat to the constancy implied by the reporting of regression coefficients. A variable whose impact is linear when acting alone may be nonlinear when acting jointly with other variables.

The fifth principle requires an explicit statement of the way the independent variables are analytically classified and coded. This statement can be easily incorporated in the text, tables, or appendix to allow the reader to interpret the quantitative results. Such disclosure is obviously crucial for interpreting the numerical magnitude of a cited risk factor.

Similarly, a statement indicating the method of selecting among candidate independent variables is desirable. Readers should be aware that some variables may have minimal impact on the outcome despite achieving "statistical significance," whereas other variables that fail to achieve the threshold of "P < 0.05" may still have a substantial effect on the outcome. (This distinction between quantitative and statistical significance occurs in
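The check for a linear gradient described in the text, comparing observed with model-predicted outcomes over the range of a variable, can be sketched as follows. This is an illustrative sketch only: the function name, the cutpoints, and the PCO2 values are hypothetical, and the predicted probabilities are assumed to come from a model that has already been fitted elsewhere.

```python
from statistics import mean


def observed_vs_predicted_by_zone(values, outcomes, predicted, cutpoints):
    """Compare observed event rates with mean predicted risk in each zone.

    values    : the predictor being checked (e.g. PCO2), one per subject
    outcomes  : observed binary outcome per subject (1 = event, 0 = no event)
    predicted : the fitted model's predicted probability per subject
    cutpoints : ascending boundaries that split the predictor into zones
    Returns one row per non-empty zone:
    (zone index, n, observed rate, mean predicted, observed - predicted).
    """
    zones = [[] for _ in range(len(cutpoints) + 1)]
    for value, outcome, prob in zip(values, outcomes, predicted):
        # Zone index = number of cutpoints at or below the subject's value.
        index = sum(value >= cut for cut in cutpoints)
        zones[index].append((outcome, prob))

    rows = []
    for index, members in enumerate(zones):
        if not members:
            continue
        observed = mean(outcome for outcome, _ in members)
        expected = mean(prob for _, prob in members)
        rows.append((index, len(members), observed, expected, observed - expected))
    return rows


# Hypothetical data, for illustration only.
pco2 = [38, 42, 47, 52, 61, 66]
died = [0, 0, 0, 1, 1, 1]
model_risk = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

rows = observed_vs_predicted_by_zone(pco2, died, model_risk, cutpoints=[45, 60])
for zone, n, observed, expected, gap in rows:
    print(f"zone {zone}: n={n} observed={observed:.2f} predicted={expected:.2f} gap={gap:+.2f}")
```

If the gaps drift steadily in one direction across zones instead of scattering around zero, the pattern of "errors" is nonrandom, and a single coefficient for that variable is suspect. In practice each zone would need far more subjects than in this toy example.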