0% found this document useful (1 vote)
99 views

How To Interpret and Report The Results From Multi Variable Analysis

This document provides an overview of multivariable analyses, which allow researchers to analyze relationships between more than two variables. It defines key terms like dependent (outcome) variables and independent (predictor) variables. Multivariable analyses can be used to test predefined hypotheses or explore relationships in clinical trial data. Parameters like odds ratios, hazard ratios, and beta coefficients are used to quantify effects. It's important to consider potential confounding variables and perform bivariate analyses first to understand raw relationships between variables before interpreting a multivariable analysis.

Uploaded by

Sofia Batalha
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (1 vote)
99 views

How To Interpret and Report The Results From Multi Variable Analysis

This document provides an overview of multivariable analyses, which allow researchers to analyze relationships between more than two variables. It defines key terms like dependent (outcome) variables and independent (predictor) variables. Multivariable analyses can be used to test predefined hypotheses or explore relationships in clinical trial data. Parameters like odds ratios, hazard ratios, and beta coefficients are used to quantify effects. It's important to consider potential confounding variables and perform bivariate analyses first to understand raw relationships between variables before interpreting a multivariable analysis.

Uploaded by

Sofia Batalha
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

How to interpret and report the

results from multivariable analyses


Neus Valveny and Stephen Gilliver analyses the participating variables can be
classified into:
TFS Develop, Medical ● Dependent (or outcome or predicted)
Writing Unit, Barcelona, variables and
Spain / Lund, Sweden ● Independent (or predictor or explanatory)
variables, which in some models can be
further classified into factors and
covariates (or confounding factors).
In a bivariate analysis (sometimes
Correspondence to: referred to as univariate – see Box 1 below)
Neus Valveny there is only one independent and one
TFS Develop dependent variable.
Consell de Cent, 334-336, 4th floor
ES-08009 Barcelona, Spain
+ 34 617 414 124
[email protected]

Abstract
Multivariable analyses are some of the mean, percentage). With
central statistical methods of clinical univariate analyses we can
trials, and yet some medical writers may only answer “descriptive
be unsure as to what they are and how questions” in a single arm or
best to interpret and report the results. In cohort, such as “What is the rate of
this article we provide an overview of responders to drug X?” or “What is the
multivariable analyses, introducing some mean survival time in patients treated
of the core models biostatisticians use to with drug Y?”
analyse trial data. We focus on odds But what about situations where we
ratios, hazard ratios, and β coefficients as wish to analyse more than one variable at a
key parameters and provide guidance on time? The purpose of bivariate and multi-
important considerations when reporting variable analyses is to probe the relationships
them. between two (bivariate) or more than two
(multivariable) variables. These types of
analyses allow us to test a previously defined
What is a multivariable hypothesis (e.g. the primary efficacy analysis In a multivariable analysis there are:
analysis? of a confirmatory study) or to explore the ● One dependent variable and
Univariate analyses – analyses involving only existing relationships between the collected ● Two or more independent variables.
a single variable – are descriptive by nature. variables (e.g. between-arm analyses, sub-
They allow us to describe the distribution of group analyses, exploratory analyses). With BOX 1: Bivariate analyses that analyse the
a variable in a sample of n individuals or n bivariate and multivariable analyses we can relationship between one independent
tumour biopsies, for example. In univariate answer “analytical questions” in one or more variable and one dependent variable are
analyses we commonly use parameters such cohorts, such as “What is the overall survival often referred to as “univariate” analyses
as the median, mean, and standard deviation with drug X compared with drug Y?”, “What to distinguish them from multivariable
to describe quantitative (or continuous) is the efficacy of drug Z, based on the analyses, in which two or more
variables and frequencies and percentages to reduction in cholesterol levels, compared independent variables are assessed in
describe categorical variables. We can also with placebo?”, or “What is the relationship relation to a dependent outcome. In this
estimate population parameters by calcul- between response rate to drug X and the context, the term “univariate” is correct
ating 95% confidence intervals (CIs) for the level of biomarker Y?” and replaces the term “bivariate”.
aforementioned summary statistics (median, In both bivariate and multivariable

www.emwa.org Volume 25 Number 3 | Medical Writing September 2016 | 37


How to interpret and report the results from multivariable analyses – Valveny and Gilliver

Outcome
Multivariable analyses The dependent vari- we must also consider that some
should not be con- able is the one that is independent variables may be entered in the
fused with multivariate assessed with the model because they are confounding variables
analyses, which are study. Sometimes it is (sometimes also denoted as covariates).
used to assess the referred to as the Confounding variables are factors related to
relationships of several endpoint. Usually this both the dependent and independent
predictors with two or term is reserved for the variables. Unless we adjust our multivariable
more dependent vari- combination of the out- analysis for confounding variables, we may
ables or outcomes at the come plus the timepoint(s) end up with an inaccurate or incorrect
same time. In this article we of assessment (e.g. if the representation of the true relationships
will not review multivariate outcome is “mortality”, the between the dependent and independent
analyses. However, medical writers endpoint could be “mortality rate at 6 variables. For example, in many clinical trials
should be aware that the terms multivariate months”). the baseline value for a quantitative
and multivariable are often used inter- The independent variables define the outcome (e.g. baseline blood pressure in a
changeably. Do not be surprised to see subgroups of patients in which the outcome hypertension trial) is a potential con-
multivariable analyses described as will be compared (e.g. treatment arms). founding variable if it is not fully balanced
multivariate. ● If the independent variable is categorical between the two treatment arms, despite
To correctly interpret a multivariable (e.g. treatment arm, gender), the para- randomisation of the patients, because it is
analysis it is highly recommendable to first meters of the multivariable models we also related to the outcome. For this reason,
look at the bivariate analyses between the will review in later sections – the odds the primary efficacy analysis should always
variables that were involved in the ratio (OR), hazard ratio (HR), and beta include the baseline value for the
multivariable modelling. They show you: coefficient (β) – always estimate the quantitative outcome as a covariate.
1. the raw relationships between the depen- effect on the outcome of one or more
dent and independent variables (which allow categories versus a reference category When to apply a multivariable
the unadjusted associations to be quantified) (e.g. placebo or female gender), which analysis
and 2. correlations or associations between must be defined a priori. A multivariable analysis is needed in the
independent variables (which, if present, ● If the independent variable is quantit- following cases:
may require changes to the model). ative (e.g. age), no subgroups are 1. If there is one main independent variable
compared and the OR, β, and HR of interest (the other independent
Variables: estimate the effect on the outcome of variables being secondary factors):
Dependent vs independent / each 1-unit increase in the a. To evaluate the relationship between
Quantitative vs categorical independent variable (e.g. “for each 1 the variable of interest and the out-
It is very important to note that both the mg/dl increase in baseline cholesterol”). come after adjusting (or controlling)
dependent and independent variables can It is very common for continuous for other independent variables that
be either quantitative or categorical, and predictors to be transformed into categorical may also be related to the outcome
correct identification of these statistical variables prior to the multivariable analysis (confounding factors or covariates).
properties is essential for the medical using a previously defined cut-off point Examples: Variable of
writer to correctly interpret and report the (from the literature). This is because the “Patients treated with drug A had interest
results. parameters of the models are much easier significantly higher cholesterol levels at 6 Outcome
Common quantitative outcomes include for physicians to interpret if they compare months compared to patients treated with
cholesterol levels, blood pressure, and quest- one category to another than if they placebo, after adjustment for
ionnaire scores and common categorical inform about the risk associated baseline cholesterol.” Variable of
outcomes are survival (yes/no), response to with a 1-unit increase in the “Higher biomarker X interest
treatment, and presence/absence of a predictor. However, this leads to levels were significantly assoc-
specific event (e.g. cardiovascular event, a loss of statistical power and to iated with a higher response Outcome
relapse). the risk of not finding rate, independently of/after
Common quantitative predictors include significant results. If the model adjusting for age and gender.”
age, BMI, and baseline values for the includes the original continuous (Box 2 opposite)
outcomes. Common categorical predictors predictor, the medical writer may 2. If there are two or more
include treatment arm, gender, and baseline facilitate interpretation of the results main independent variables
disease severity. by reporting the risk associated with, for of interest:
Note that categorical variables with only example, a 10-unit increase in the predictor. a. To explore which of the independent
two categories are referred to as dichotomous. In interpreting a multivariable analysis variables are independently associated

38 | September 2016 Medical Writing | Volume 25 Number 3


Valveny and Gilliver – How to interpret and report the results from multivariable analyses

with the outcome, i.e. they keep a Please note that the words “independently potentially use multiple Cox regression.
significant p-value in the model associated” and “independent factor/ Finally, multiple linear regression,
despite the inclusion of other predictor” imply that a multi- multiple ANOVA, and ANCOVA are
independent variables: variable model has been used multivariable models in which the
exploratory models. and that the described dependent variable is continuous, i.e. it can
These models are relationship has been theoretically take any value in its given
commonly used to look adjusted for at least one range. Despite being slightly different from
for “causal relationships”, additional factor. each other, these models can be considered
although the results must equivalent from a medical writer’s point of
always be interpreted with Multivariable view. An example scenario would be to
caution because associations analyses commonly determine whether a new treatment
may be due to confounding used in biomedical (independent variable) reduces the score for
factors that were not accounted for. studies disease index X (dependent variable) after
Example: There are several different types of multi- adjusting for country and baseline disease
“In patients with disease Z, male gender variable analysis. Three of the most index X score (independent variables
and higher blood pressure were indep- commonly used analyses are multiple logistic considered as covariates).
endently associated with higher regression, multiple Cox regression, and These multivariable analyses will be
mortality.” multiple linear regression/multiple analysis of discussed in further detail below. The aim is
b. To predict an outcome with indep- variance (ANOVA)/analysis of covariance not to explain how to run the analyses,
endent variables that are known to be (ANCOVA) (Table 1 overleaf). It is rather how to interpret and report the results
associated with the outcome: important to note that multiple regression they give. The focus will be on ORs, HRs,
predictive models. and multivariate regression are not the same and β coefficients.
These models are commonly used in thing. In multiple regression there is only
oncology to establish prognostic one dependent variable; multivariate Multiple logistic regression:
factors that may be useful to select regression involves two or more main What is an odds ratio?
candidate patients for more aggressive dependent variables and is less commonly What is an OR? Let’s define two groups of
therapies. They can also be used to used. subjects: a test group we are interested in
predict response, compliance, and With multiple logistic regression the aim and a reference group we wish to compare
quality of life. is to determine how one dichotomous the test group to. The OR is the ratio of two
Example: dependent variable varies according to two sets of odds: the odds of an event occurring
“In patients with disease Z, the or more independent (quantitative or cate- in the test group divided by the odds of the
independent factors predicting response gorical) variables. Multiple logistic regress- same event in the reference group. Note that
to drug X were tumour stage at diagnosis ion might, for example, be used to test odds are not the same as probability: the
and baseline beta-2-microglobulin level. the relationships of weekly alcohol odds are the probability of an event (e.g.
The model with these two variables consumption at age 30 and gender death) occurring divided by the probability
correctly predicted the response in 65% of (independent variables) with probability of of it not occurring. While probability ranges
patients”. developing liver cancer during a 10 year from 0 to 1, the odds may range from 0 to
period (dependent variable). Liver cancer is positive infinity.
a categorical variable with two categories at Going back to our example above, how
BOX 2: When describing associations the end of the follow-up period: “cancer” do weekly alcohol consumption and gender
between different variables, a common and “no cancer”. affect the odds of developing liver cancer?
mistake is to not give the direction of the Multiple Cox regression is similar to Here we can define two reference groups:
association, e.g. “Biomarker X levels were multiple logistic regression but it explores one for weekly alcohol consumption and
significantly associated with the response the relationships between independent one for gender. The reference group for
rate.” From this sentence, the reader variables and a time-to-event dependent weekly alcohol consumption might be “0
cannot ascertain whether a higher variable (dichotomous), e.g. time to death. units” and let’s say the one for gender is
response rate is associated with high or If we wanted to determine whether a new “female”. (If you’re wondering how a
low biomarker X levels. Although, if not treatment (independent variable) affects categorical independent variable such as
otherwise indicated, such an association probability of disease progression (depend- gender may be entered into a mathematical
would usually be interpreted as positive, ent variable) in patients with renal cell model, this can be achieved by creating a
a good medical writer should clearly carcinomas of different clinical stages at dummy variable with a value of 0 or 1. In the
indicate the direction of the association. baseline (second independent variable that present example, females may be given a
may be considered a covariate), we could value of 0 and males 1.)

www.emwa.org Volume 25 Number 3 | Medical Writing September 2016 | 39


How to interpret and report the results from multivariable analyses – Valveny and Gilliver

Multiple logistic regression Multiple Cox regression Multiple linear regression /


Multiple ANOVA / ANCOVA

Dependent variable Dichotomous Time to event Quantitative


(no information about timepoint) (dichotomous with information
about timepoint)
Example: Treatment response Example: Overall survival Example: Blood pressure
(yes/no)

Independent variables 2 or more quantitative or 2 or more quantitative or 2 or more quantitative or


categorical variables categorical variables categorical variablesa

Equationb logit(p) = a + b1x1 + b2x2… log(hi(t)) = a + b1x1 + b2x2… y = a + b1x1 + b2x2…

Parameter OR (= Exp(b)) HR (= Exp(b)) β (= b)

Interpretation Odds for: Instantaneous risk/hazard (hazard Size of the effect on the outcome
• Category X vs reference per unit time) for: (in outcome units) for:
category (if independent • Category X vs reference • Category X vs reference
variable is categorical) category (if independent category (if independent
• A 1-unit increase (if variable is categorical) variable is categorical)
independent variable is • A 1-unit increase (if indepen- • A 1-unit increase (if indepen-
quantitative) dent variable is quantitative) dent variable is quantitative)

Example of reporting “…odds of treatment failure were 3 “…risk of death was 3 times higher in “…systolic blood pressure was 3
times higher in men than in women” men versus women” mmHg higher in men than in women”

a For ANOVA and ANCOVA at least 1 categorical variable is needed


b logit(p) is log(p/1-p), where p is the probability of the outcome; a denotes a constant, bn denotes the coefficient for each independent
variable, xn denotes an independent variable, hi(t) is the hazard to individual i at time t, and y denotes a dependent variable

Table 1. Types of multivariable models commonly used in biomedical studies

Say we obtain an OR for liver cancer of Narrower intervals are obtained with larger Note that we are not claiming that
1.68 for people who consume 40+ units of samples. For an OR, a CI that includes 1 alcohol consumption causes liver cancer
alcohol per week versus those who consume (e.g. 0.9 to 2.5) prevents us from inferring a (although there is ample evidence to suggest
0 units per week. This means that the odds significant difference between groups. this is the case). Rather, we are merely
of liver cancer are 1.68 times as high (or 68% If we adjust our multiple logistic saying that excessive alcohol consumption
higher) for those consuming 40+ units of regression model for confounder variables, is associated with liver cancer; it may or may
alcohol per week than for teetotallers. then the ORs we obtain will be referred to not cause liver cancer.
Similarly, an OR of 1.22 for males versus as adjusted ORs. If in the present example we Risk has a particular meaning in statistics,
females would mean that males have 22% calculate a 95% CI of 1.25 to 2.13 for our with relative risk (RR) implying a
higher odds of developing liver cancer OR of 1.68, we could describe the results of comparison of probabilities, not odds. In the
compared to females. the multiple logistic regression thus: above example, the odds of liver cancer were
ORs are typically presented with CIs. In Compared to teetotallers, those who 1.22 times higher in males compared to
general terms, the CI is a range of values consumed 40+ units of alcohol per week at females; we should not write that the “risk”
within which the true value of a parameter age 30 had higher odds of developing liver of liver cancer was 1.22 times higher in
in the population (not in the study sample) cancer (adjusted OR=1.68, 95% CI=1.25 males, because this would be inaccurate.
is expected to lie. A narrow CI indicates to 2.13). Males had higher odds of liver Phrases that indicate or imply probability,
good precision in our OR estimate; a wider cancer than females (adjusted OR=1.22, such as “X times as likely to” and “a 50%
CI would indicate more uncertainty. 95% CI=1.03 to 1.44). higher probability of ”, should also be

40 | September 2016 Medical Writing | Volume 25 Number 3


Valveny and Gilliver – How to interpret and report the results from multivariable analyses

avoided when reporting ORs. Note that the ANOVA, and ANCOVA, the variable and the dependent
OR gives a reasonable approximation for the dependent variable is variable (after adjusting for
RR when the event is rare, but not when the continuous. One such covariates) is significant.
event is common. variable is height at age 18. Thus we could describe the
What is its relationship with results of the current analysis
Multiple Cox regression: birth length and age at puberty as:
What is a hazard ratio? onset (independent variables)? Higher birth length was associated
Multiple Cox regression is used to calculate In addressing this question by with greater height at age 18 (β=1.2
HRs. An HR indicates the instantaneous risk multiple linear regression we obtain one β cm/cm, 95% CI=0.93 to 1.49). Age at
or hazard (hazard per unit time, usually coefficient for each quantitative independent puberty onset was inversely associated with
1 day) of an event (e.g. death) in a test group variable and for each non-reference category height at age 18 (β=-0.3 cm/year, 95%
relative to a reference group. Let’s return to of each categorical independent variable. CI=-0.19 to -0.45).
the example of the new treatment for renal For continuous independent variables
cell carcinoma. The new treatment (test such as birth length the β coefficient As a final remark regarding β coefficients,
group) gives an HR for death of 0.5 versus indicates how a 1-unit change in the value of please be aware that they are sometimes also
the existing gold standard treatment the independent variable would affect the provided for multiple logistic regression and
(reference group). How do we interpret value of the dependent variable if all other Cox regression models. In such cases, β is
this? variables in the model were held constant, simply the natural logarithm (ln) of the OR
In this example, the HR indicates the and the units for β are the units for the (logistic regression) or HR (Cox
relative rates of death per day in the two dependent variable divided by those for the regression).
treatment groups. The value of 0.5 indicates independent variable. For categorical
that the rate of death at any time during the independent variables the units of the β How to report the results from
follow-up period is twice as high with the coefficient are the same as those of the multivariable models
gold standard treatment compared to the dependent variable. It is very important to Whatever the model used, good medical
new treatment. A value of 1.0 would indicate understand this to correctly describe the writing practice is to list all the factors that
no difference in rate of death between the results of the model. were taken into account in the multivariable
two treatments. If the β coefficient for birth length is analysis, including those that were discarded
Like ORs and β coefficients, HRs are positive (e.g. 1.2 cm/cm), then a higher during the modelling process. Also,
typically presented with CIs. Assuming we birth length will be associated with a greater remember always to include the parameter
adjust our multiple Cox regression model height at age 18. A negative β coefficient for that indicates the strength and direction of
for several confounder variables, we could age at puberty onset (e.g. -0.3 cm/year) the association (i.e. the OR, HR, or β
report the results in the present example as: indicates a negative association between age coefficient), preferably with the 95% CI
Compared to the gold standard treatment, at puberty onset and height at age 18. The and/or the p-value for the variable
the new treatment was associated with a statistical significance of these results should (different from the overall p-value for the
significantly lower rate of death (adjusted be reported using the p-value associated model). If you are at all unsure as to the
HR=0.5, 95% CI=0.25 to 0.75). with each β coefficient. direction of a particular association, ask a
For categorical independent statistician for clarification.
In descriptions of survival variables such as gender the β When reporting the parameter, the
analyses, RR cannot be used coefficient and the corr- writing differs depending on the direction
instead of HR, since the two esponding p-value will of the association. We round off our
terms are not synonymous. indicate whether the introduction to multivariable analyses with
Though they can be inter- category is associated with some illustrative examples:
preted in more or less the greater height at age 18 ● For ORs (logistic regression) and HRs
same way, HRs and RRs are compared to the reference (Cox regression), results are significant
calculated differently. Notably, category. A positive β coefficient when the 95% CI does not include 1:
RRs do not account for the timing for males relative to females with a ● A value <1 implies that the factor is
of the events of interest. Don’t write p-value of <0.05 would indicate that males negatively associated with (i.e.
relative risk when you mean hazard ratio! are likely to be taller than females at age 18. protects against) the outcome. The
β coefficients are often presented with percentage decrease in the odds (OR)
Multiple linear regression / corresponding CIs; sometimes the CI is or risk (HR) is (1 - OR or HR) × 100.
Multiple ANOVA / ANCOVA: replaced by the standard error (SE) and the Example: “Category X protected against
What is the β coefficient? p-value. If a CI does not include 0, the mortality (adjusted OR=0.8, 95%
In multiple linear regression, multiple association between the independent CI=0.6 to 0.9 versus reference category

www.emwa.org Volume 25 Number 3 | Medical Writing September 2016 | 41


How to interpret and report the results from multivariable analyses – Valveny and Gilliver

Y)” or “Compared to Y, X was associated ● A value <0 implies that the factor is Acknowledgements
with a 20% reduction in the odds of negatively associated with the outcome. We would like to thank Elena Stolyarova, a
death.” Examples: biostatistician at TFS, for her insightful
● A value >1 implies that the factor is Quantitative factor: “A 1-unit increase comments on an early draft of this article.
positively associated with (i.e. in X was associated with a decrease of We also thank Adam Jacobs, Senior
increases the risk of) the outcome. 3 mmHg in systolic blood pressure at Principal Statistician at Premier Research,
The percentage increase in the odds 4 weeks (β=-3, 95% CI=-2.1 to -3.9)”. for providing constructive feedback and
(OR) or risk (HR) is (OR or HR - 1) Categorical factor: “Compared to suggesting improvements.
× 100. placebo, treatment with drug Z was
Example: “Category X was a risk factor associated with a decrease of 3 mmHg in Conflicts of Interest and
for mortality (adjusted HR=1.5, 95% systolic blood pressure at 4 weeks (β=-3, Disclaimers
CI=1.1 to 1.9 versus reference category 95% CI=-2.1 to 3.9).” The content of this article reflects the view
Y)” or “Compared to Y, X was associated ● A value >0 implies that the factor is of the authors, and not those of TFS. There
with a 50% increase in the risk of death.” positively associated with the are no conflicts of interest.
When the percentage is ≥100, the outcome.
“number of times” construction is Examples:
often used: Quantitative factor: “A 1 mg/dl Author information
Example: “Patients with X had a risk of increase in X was associated with a 2- Neus Valveny has been a medical writer
death approximately three times higher unit increase in quality of life score at 6 for the last 15 years, preparing more than
compared to those with Y (adjusted months (β=2, SE=0.3, p=0.025).” 40 manuscripts, 50 abstracts and posters,
HR=3.2, 95% CI=2.1 to 4.9).” Categorical factor: “Compared to 20 study protocols, and 15 study reports.
● For β coefficients (multiple regression, patients with mild disease at baseline, She has also worked as a statistician. She
multiple ANOVA, ANCOVA), severe disease was associated with a 2- is now Associate Director of Medical
results are significant when unit lower quality of life score at Writing at TFS, a medium-sized
the 95% CI does not 6 months (β=–2, SE=0.3, international CRO founded in Sweden.
include 0: p=0.025).”
Previously a full-time science editor and
part-time copy editor, Stephen Gilliver
is now a medical writer at TFS. He is also
Co-Editor of Medical Writing.

Call for Companies


The 2nd Medical Writing Internship Forum will be held at our
May 2017 Conference in Birmingham, UK.

Please contact [email protected] for more information.

42 | September 2016 Medical Writing | Volume 25 Number 3

You might also like