Logistic Regression: Multivariate Analysis
What is a log and an exponent?
Log is the power to which a base of 10 must be raised to produce a given number. The log of 1000 is 3, as 10^3 = 1000.
The log of an odds ratio of 1.0 is 0, as 10^0 = 1.
The exponent e (about 2.718) raised to a certain power is the antilog of that number. Thus, exp(β) = antilog of β.
The antilog of a log odds of 0 is 2.718^0 = e^0 = 1.
Exponential increases are curvilinear.
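A quick numeric check of these log/antilog facts, as a minimal sketch in Python:

import math

print(math.log10(1000))   # 3.0, since 10**3 == 1000
print(10 ** 0)            # 1: the log of an odds ratio of 1.0 is 0
print(math.exp(0))        # 1.0: the antilog of a log odds of 0
print(math.e)             # 2.71828..., the base of the natural log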
Main questions with logistic regression
How do the odds of a successful
outcome depend upon or change based
on each explanatory variable (X)?
How does the probability that a
successful outcome occurs depend
upon or change based on each
explanatory variable (X)?
Logistic regression
Single binary response variable predicted by
categorical and interval variables
Maximum likelihood model – the coefficients
that make sample observations most likely are
reported in the final model
Assumes a binomial distribution and a sigmoid (non-linear) curve
The probability of success falls between 0 and
1 for all possible values of X (s-curve bends)
Sigmoid curve for logistic regression
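A minimal sketch of the s-curve in Python, using illustrative values α = 0 and β = 1 (not from any fitted model):

import math

def sigmoid(alpha, beta, x):
    # pi = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))
    z = alpha + beta * x
    return math.exp(z) / (1 + math.exp(z))

# probability of success stays between 0 and 1 for any X
for x in (-4, -2, 0, 2, 4):
    print(x, round(sigmoid(0, 1, x), 3))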
Response variable
Denote Y by 0 and 1 (dummy coding)
0 and 1 are usually termed failure and
success of an outcome (by convention,
success is category 1)
The sample mean of Y is the sum of the
number of successes divided by the
sample size (proportion of success)
Odds ratios in logistic regression
Can be thought of as the likelihood or odds of success based on the impact of each predictor in the model
For interval variables: the odds of success for those who are one unit apart on X, net of other predictors
For dummy coefficients: the odds of success for those in category 1 of X compared with those in the omitted (reference) category 0
Every unit increase in X has a multiplicative (exponential) effect on the odds of success, so an odds ratio can be greater than 1
Odds ratio
π / (1 − π) is the odds ratio, or the odds of success
When the probability of success (π) is ½, or 50-50, the odds of success equal .5 / (1 − .5) = 1.0. This means that success is equally as likely as failure
Thus, predicted probability of .5 and an
odds ratio of 1.0 are our points of
comparison when making inferences
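The same arithmetic as a one-line check in Python:

pi = 0.5
odds = pi / (1 - pi)   # .5 / (1 - .5) = 1.0
print(odds)            # success is equally as likely as failure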
Logistic transformation of odds ratio
To model dichotomous outcomes, SPSS takes logistic transformation
of odds ratios:
Log(π / (1 − π)) = α + β1X1 + β2X2 + …
To interpret, we take the exponent of the beta coefficient for each predictor (can do this for all predictors in the model)
Odds ratio or the odds of success are:
π / (1 − π) = e^(α + βX) = e^α (e^β)^X
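A sketch of that identity in Python; the coefficients α = −1.2 and β = .35 are hypothetical, made up for illustration:

import math

alpha, beta = -1.2, 0.35   # hypothetical coefficients
x = 2
print(math.exp(alpha + beta * x))              # odds at X = 2
print(math.exp(alpha) * math.exp(beta) ** x)   # identical: e^alpha * (e^beta)^X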
Exponent
We can also talk about the percentage
change in odds for interval and dummy
variables
Thus, the exponential beta value in the SPSS output can be converted to a percentage by 100(exp(β) − 1), the percentage change in odds for each unit increase in the independent variable.
We don’t really talk about the intercept
here … betas for each predictor are our
concern
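A small Python check of the percentage-change formula, using made-up betas:

import math

b = 0.35                          # hypothetical positive beta
print(100 * (math.exp(b) - 1))    # ~41.9% increase in odds per unit of X
b = -0.20                         # hypothetical negative beta
print(100 * (math.exp(b) - 1))    # ~-18.1%, a decrease in odds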
We can also talk about the probability of
success or π
Can calculate point estimates by substituting
specific X values, thus it is good for
forecasting, given respondent characteristics
Impact of X on π is interactive/non-constant
π is the probability of success; this probability varies as X changes and is expressed in proportion form (ranges 0 to 1)
π = e^(α + βX) / (1 + e^(α + βX)), or odds / (1 + odds)
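A minimal point-estimate sketch in Python, reusing the same hypothetical α and β as above:

import math

alpha, beta = -1.2, 0.35   # hypothetical coefficients

def prob_success(x):
    odds = math.exp(alpha + beta * x)
    return odds / (1 + odds)   # pi = odds / (1 + odds)

print(round(prob_success(3), 3))   # forecast for a respondent with X = 3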
Slope in logistic regression models (FYI)
Like the slope of a straight line, β refers to whether the sigmoid curve (π, the probability of success) increases (β > 0) or decreases (β < 0) as an interval X increases, or as we move from 0 to 1 for a dummy
Steepness of s-curve increases as absolute
value of β increases
The rate at which the curve climbs or descends changes according to the values of the independent variable; the slope thus depends on both β and X
Slope in logistic regression models (FYI)
When β = 0, π does not change as X
increases (X has no bearing on probability or
odds of success ) so the curve is flat, there is
just a straight line
For β > 0, π increases as X increases
(probability of success increases thus curve
increases)
For β < 0, π decreases as X increases
(probability of success decreases thus curve
decreases)
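A sketch of these three cases in Python (α = 0 and the β values are illustrative):

import math

def pi(beta, x, alpha=0):
    z = alpha + beta * x
    return math.exp(z) / (1 + math.exp(z))

for beta in (0.0, 0.8, -0.8):
    print(beta, [round(pi(beta, x), 2) for x in (-2, 0, 2)])
# beta = 0: flat at .5; beta > 0: rising curve; beta < 0: falling curve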
Slope in logistic regression (FYI)
Null hypothesis for predictors
Ho: β = 0
for Log(π / (1 − π)) = α + β1X1 + … + βiXi
X has no effect on the likelihood that an outcome will occur [Y = 1]
Y is independent of X, so the likelihood of being successful is the same for all income groups
Wald Statistic
Null hypothesis test statistic for each predictor
in your model
Wald statistic is the significance test for
each parameter in the model
Null is that each β = 0
Has df=1; Chi-square distribution
It is the square of the z statistic, where z = β / SE(β)
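A minimal Wald computation in Python; the coefficient and standard error are hypothetical, and scipy is assumed to be available:

from scipy.stats import chi2

b, se = 0.35, 0.12          # hypothetical beta and standard error
wald = (b / se) ** 2        # square of z = beta / SE(beta)
print(wald)                 # ~8.51
print(chi2.sf(wald, df=1))  # p-value from chi-square with 1 df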
-2 log likelihood as test of null hypothesis for entire model
A test of significance for the whole model, like the F-ratio; chi-square distribution; df = (number of parameters with predictors) − (number with the intercept alone), i.e., the number of predictors
Does the observed likelihood or odds of
success differ from 1?
Compares the model with the intercept alone
to intercept and predictors. Do your predictors
add to the predictive power of the model?
Tests if the difference is 0 and is referred to
as the model chi-square
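A sketch of the model chi-square arithmetic in Python; the −2 log likelihood values and df are made up, and scipy is assumed:

from scipy.stats import chi2

neg2ll_intercept_only = 280.4   # hypothetical -2LL, intercept alone
neg2ll_full = 262.9             # hypothetical -2LL, intercept + predictors
model_chi2 = neg2ll_intercept_only - neg2ll_full   # 17.5
df = 3                          # number of predictors added
print(model_chi2, chi2.sf(model_chi2, df))         # model chi-square, p-value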
Goodness of Fit Statistic – null for residuals (FYI)
Compares observed probabilities or what you
observed in the sample to the predicted
probabilities of an outcome occurring based on
model parameters in your equation
Examines residuals – do the model coefficients minimize the squared distances between observed and predicted?
Chi-square distribution; df = p
Should be non-significant (NS), as observed and predicted are anticipated to be quite similar
Mean of our response variable: attending self-help group (FYI)
The sample mean of Y is the sum of the
number of successes (yes to attend)
divided by the sample size, n
The sample mean is the proportion of
successful outcomes
Thus, 44 said yes and n = 400, so the mean proportion of yes is 44/400 = .11, or 11%
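The same proportion as a quick Python check:

ys = [1] * 44 + [0] * 356   # 44 yes, 356 no; n = 400
print(sum(ys) / len(ys))    # 0.11 -> 11% said yes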
Odds ratio and % change in odds by age
Age: β = −.0586, p < .01 (beta negative).
Thus, the log odds of attending a self-help group decrease as a person gets older
Exp(β) = .9431 … the odds ratio [exp(β) < 1], thus the odds decrease
The % change (in this case a reduction) in the odds of attending for each additional year of age is 100(.9431 − 1) = −5.7%

Goodness of fit for the self-help model (FYI)
Chi-square = 371.093, 12 df, ns
Our model parameters minimize the squared distances [residuals] between actual sample observations of attendance and what the logistic regression equation predicts (odds and probabilities)
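Reproducing the age effect in Python from the reported coefficient:

import math

b_age = -0.0586
odds_ratio = math.exp(b_age)    # ~0.9431 (< 1, so odds decrease)
pct = 100 * (odds_ratio - 1)    # ~-5.7% in odds per additional year of age
print(round(odds_ratio, 4), round(pct, 2))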
Logistic Regression References
DeMaris, A. (1995). A tutorial in logistic regression. Journal of Marriage and the Family, 57(4), 956–968.