STATISTICALLY SPEAKING
Multivariate or Multivariable Regression?
The terms multivariate and multivariable are often used interchangeably in the public health literature. However, these terms actually represent 2 very distinct types of analyses. We define the 2 types of analysis and assess the prevalence of use of the statistical term multivariate in a 1-year span of articles published in the American Journal of Public Health. Our goal is to make a clear distinction and to identify the nuances that make these types of analyses so distinct from one another.
Most regression models are described in terms of the way the outcome variable is modeled: in linear regression the outcome is continuous, logistic regression has a dichotomous outcome, and survival analysis involves a time-to-event outcome. Statistically speaking, multivariate analysis refers to statistical models that have 2 or more dependent or outcome variables,¹ and multivariable analysis refers to statistical models in which there are multiple independent or predictor variables.²

A multivariable model can be thought of as a model in which multiple variables are found on the right side of the model equation. This type of statistical model can be used to attempt to assess the relationship between a number of variables; one can assess independent relationships while adjusting for potential confounders.

A simple linear regression model has a continuous outcome and one predictor, whereas a multiple or multivariable linear regression model has a continuous outcome and multiple predictors (continuous or categorical). A simple linear regression model would have the form

$$y = \alpha + x\beta + \varepsilon \tag{1}$$

By contrast, a multivariable or multiple linear regression model would take the form

$$y = \alpha + x_1\beta_1 + x_2\beta_2 + \cdots + x_k\beta_k + \varepsilon \tag{2}$$

where $y$ is a continuous dependent variable, $x$ is a single predictor in the simple regression model, and $x_1, x_2, \ldots, x_k$ are the predictors in the multivariable model.

As is the case with linear models, logistic and proportional hazards regression models can be simple or multivariable. Each of these model structures has a single outcome variable and 1 or more independent or predictor variables.

Multivariate, by contrast, refers to the modeling of data that are often derived from longitudinal studies, wherein an outcome is measured for the same individual at multiple time points (repeated measures), or the modeling of nested/clustered data, wherein there are multiple individuals in each cluster. A multivariate linear regression model would have the form

$$Y_{n \times p} = X_{n \times (k+1)}\,\beta_{(k+1) \times p} + \varepsilon \tag{3}$$

where the relationships between multiple dependent variables (i.e., Ys), which are measures of multiple outcomes, and a single set of predictor variables (i.e., Xs) are assessed.

We took a systematic approach to assessing the prevalence of use of the statistical term multivariate. That is, we used PubMed and the keyword "multivariate" to review articles published in the American Journal of Public Health over a 1-year span (December 2010–November 2011). We identified 30 articles in which the authors indicated the use of a "multivariate" statistical method. Each of the articles was individually reviewed to assess the type of analysis defined as multivariate.

In 5 (17%) of the 30 articles, multivariate models (as we have defined them here) were used; 4 (13%) of these models were derived from longitudinal data and 1 from nested data. The remaining 25 (83%) articles involved multivariable analyses; logistic regression (21 of 30, or 70%) was the most prominent type of analysis used, followed by linear regression (3 of 30, or 10%). Interestingly, in 2 of the 30 articles (7%), the terms multivariate and multivariable were used interchangeably. This further underscores the need to establish consistency in use of the 2 statistical terms.

Although some may argue that the interchangeable use of multivariate and multivariable is simply semantics, we believe that differentiating between the 2 terms is important for the field of public health. In general, models used in public health research should be described as simple or multivariable, to indicate the number of predictors, and as linear, logistic, multivariate, or proportional hazards, to indicate the type of outcome (e.g., continuous, dichotomous, repeated measures, time to event).

Our review revealed that there is a need for more accurate application and reporting of multivariable methods. This issue is not unique to public health research and has been identified as affecting other areas of research as well (e.g., medicine, psychology, political science).³ However, we hope to see a clearer distinction in the use of the terms multivariate and multivariable.
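As an illustration of the distinction drawn between the multivariable form (2) and the multivariate form (3), the following is a minimal sketch in Python with NumPy. It is not taken from the article: the simulated data, the sizes `n`, `k`, and `p`, the "true" coefficients, and the use of ordinary least squares via `numpy.linalg.lstsq` are all assumptions made for illustration. It shows only the difference in the shape of the outcome and of the coefficient array, and does not model the correlation among repeated or clustered measurements.

```python
import numpy as np

# Hypothetical sizes: n individuals, k predictors, p outcomes.
rng = np.random.default_rng(0)
n, k, p = 200, 3, 2

# Design matrix with an intercept column, so X is n x (k+1) as in equation (3).
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

# Multivariable model, equation (2): ONE outcome y, several predictors.
beta_true = np.array([1.0, 0.5, -2.0, 0.0])  # assumed coefficients (alpha first)
y = X @ beta_true + rng.normal(scale=0.1, size=n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Multivariate model, equation (3): SEVERAL outcomes (columns of Y), same X.
B_true = rng.normal(size=(k + 1, p))  # assumed (k+1) x p coefficient matrix
Y = X @ B_true + rng.normal(scale=0.1, size=(n, p))
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("multivariable coefficients:", beta_hat.shape)  # one column of k+1 values
print("multivariate coefficients: ", B_hat.shape)     # (k+1) rows, p columns
```

In this least-squares sketch, each column of `B_hat` matches the fit obtained by regressing that outcome on `X` alone; what a multivariate analysis adds in practice is joint inference that accounts for the dependence among the outcomes, which this sketch does not attempt.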
January 2013, Vol 103, No. 1 | American Journal of Public Health Hidalgo and Goodman | Statistically Speaking | 39
Bertha Hidalgo, PhD, MPH
Melody Goodman, PhD, MS

About the Authors
Bertha Hidalgo is with the Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham. Melody Goodman is with the Department of Surgery, Division of Public Health Sciences, School of Medicine, Washington University in St. Louis, St. Louis, MO.
Correspondence should be sent to Bertha Hidalgo, PhD, MPH, 1665 University Blvd, RPHB 443, Birmingham, AL 35294 (e-mail: [email protected]). Reprints can be ordered at http://www.ajph.org by clicking the "Reprints" link.
This article was accepted May 6, 2012.
doi:10.2105/AJPH.2012.300897

Contributors
Both authors contributed equally to this article. B. Hidalgo conducted the literature review and led the writing. M. Goodman conceived the topic and supervised the development of the article.

Acknowledgments
B. Hidalgo was supported in part by a predoctoral training grant from the National Cancer Institute (grant 5R25CA047888) and a postdoctoral training grant from the National Heart, Lung, and Blood Institute (grant T32HL072757). M. Goodman was supported by the Siteman Cancer Center, the National Cancer Institute (grant U54CA153460), and the Washington University Faculty Diversity Scholars Program.

References
1. Van Belle G. Biostatistics: A Methodology for the Health Sciences. Hoboken, NJ: Wiley-Interscience; 2004.
2. Katz MH. Multivariable analysis: a primer for readers of medical research. Ann Intern Med. 2003;138(8):644–650.
3. Freedland KE, Reese RL, Steinmeyer BC. Multivariable models in biobehavioral research. Psychosom Med. 2009;71(2):205–216.