
Logistic Regression

Multivariate Analysis
What is a log and an exponent?
 A log is the power to which a base must be raised to produce a given number. With base 10, the log of 1000 is 3 because 10^3 = 1000.
 The log of an odds of 1.0 is 0, since any base raised to the power 0 equals 1 (e.g., 10^0 = 1).
 Logistic regression uses natural logs, whose base is e ≈ 2.718. Raising e to a power is the antilog of that number; thus exp(β) is the antilog of β.
 The antilog of a log odds of 0 is e^0 = 1.
 Exponential increases are curvilinear.
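A quick check of these facts in Python (standard library only):

```python
import math

print(math.log10(1000))  # 3.0  -> log base 10 of 1000
print(10 ** 3)           # 1000 -> the antilog (base 10) of 3
print(math.log(1.0))     # 0.0  -> the natural log of an odds of 1.0
print(math.exp(0))       # 1.0  -> e**0, the antilog of a log odds of 0
```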
Main questions with logistic regression
 How do the odds of a successful outcome depend upon or change with each explanatory variable (X)?
 How does the probability (π) that a successful outcome occurs depend upon or change with each explanatory variable (X)?
Logistic regression
 A single binary response variable is predicted by categorical and interval variables
 Maximum likelihood model – the coefficients that make the sample observations most likely are reported in the final model
 Assumes a binomial distribution and a sigmoid (non-linear) curve
 The probability of success falls between 0 and 1 for all possible values of X (the s-curve bends); a fitting sketch follows below
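These slides work in SPSS; purely as an illustration, here is a minimal sketch of the same kind of model in Python using statsmodels, with simulated data (every number below is made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
age = rng.uniform(18, 90, 400)                 # interval predictor
female = rng.integers(0, 2, 400)               # dummy predictor (1 = female)
log_odds = -1.0 - 0.05 * age + 1.25 * female
p = np.exp(log_odds) / (1 + np.exp(log_odds))  # sigmoid keeps p in (0, 1)
y = rng.binomial(1, p)                         # binary response: 0 = failure, 1 = success

X = sm.add_constant(np.column_stack([age, female]))
result = sm.Logit(y, X).fit()                  # coefficients chosen by maximum likelihood
print(result.summary())                        # betas, Wald z tests, log-likelihood
```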
Sigmoid curve for logistic regression
Response variable
 Denote Y by 0 and 1 (dummy coding)
 0 and 1 are usually termed failure and success of an outcome (by convention, success is category 1)
 The sample mean of Y is the number of successes divided by the sample size (the proportion of successes)
Odds ratios in logistic regression
 Can be thought of as the impact of each predictor in the model on the odds of success
 For interval predictors: the odds of success for those who are one unit apart on X, net of other predictors
 For dummy predictors: the odds of success for those in the category coded 1 compared with those in the omitted (reference) category coded 0
 Each unit increase in X has a multiplicative effect on the odds of success, so an odds ratio can be greater or less than 1
Odds ratio
 π / (1 − π) is the odds of success
 When the probability of success π is ½ (50-50), the odds of success equal .5 / (1 − .5) = 1.0. This means success is equally as likely as failure
 Thus, a predicted probability of .5 and odds of 1.0 are our points of comparison when making inferences
Logistic transformation of the odds
 To model dichotomous outcomes, SPSS takes the logistic transformation of the odds:
log(π / (1 − π)) = α + β1X1 + β2X2 + …
 To interpret, we take the exponent of the beta coefficient for each predictor (can do for all in the model)
 The odds of success are then:
π / (1 − π) = e^(α + βX) = e^α (e^β)^X
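A numeric sketch of the two forms of the equation, with a hypothetical intercept and slope:

```python
import numpy as np

alpha, beta, x = -2.0, 0.5, 3.0           # hypothetical values
log_odds = alpha + beta * x               # log(pi / (1 - pi)), linear in x
odds = np.exp(log_odds)                   # pi / (1 - pi)
print(odds)                               # 0.6065...
print(np.exp(alpha) * np.exp(beta) ** x)  # same value: e**alpha * (e**beta)**x
```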


Exponent β
We can also talk about the percentage change in odds for interval and dummy variables
 The exponential beta value in the SPSS output can be converted to a percentage by 100(exp(β) − 1), the percentage change in odds for each unit increase in the independent variable
 We don’t really talk about the intercept here … the betas for each predictor are our concern
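The formula in code, with a made-up coefficient:

```python
import numpy as np

beta = 0.25                      # hypothetical coefficient
print(100 * (np.exp(beta) - 1))  # 28.4 -> % change in odds per unit increase in X
```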
We can also talk about the probability of success, π
 Can calculate point estimates by substituting specific X values; thus it is good for forecasting, given respondent characteristics
 The impact of X on π is interactive/non-constant
 π is the probability of success; it varies as X changes and ranges from 0 to 1 (often reported as a percentage)
 π = e^(α + βX) / (1 + e^(α + βX)), or odds / (1 + odds)
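The same formula as a small helper function, with hypothetical values:

```python
import numpy as np

def predicted_probability(alpha, beta, x):
    """pi = e**(alpha + beta*x) / (1 + e**(alpha + beta*x))"""
    z = alpha + beta * x
    return np.exp(z) / (1 + np.exp(z))

print(predicted_probability(-2.0, 0.5, 3.0))  # ~0.378 for these made-up values
```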
Slope in logistic regression models (FYI)
 Like the slope of a straight line, β refers to whether the sigmoid curve (π, the probability of success) increases (β > 0) or decreases (β < 0) as the value of an interval predictor increases or as we move from 0 to 1 on a dummy
 The steepness of the s-curve increases as the absolute value of β increases
 The rate at which the curve climbs or descends changes according to the values of the independent variable, thus β(X)
Slope in logistic regression models (FYI)
 When β = 0, π does not change as X increases (X has no bearing on the probability or odds of success), so the curve is flat – just a straight line
 For β > 0, π increases as X increases (the probability of success increases, thus the curve rises)
 For β < 0, π decreases as X increases (the probability of success decreases, thus the curve falls)
Slope in logistic regression (FYI)
Null hypothesis for predictors

 H0: β = 0 in log(π / (1 − π)) = α + βX
 X has no effect on the likelihood that an outcome will occur [Y = 1]
 Y is independent of X, so the likelihood of success is the same for, e.g., all income groups
Wald Statistic
Null hypothesis test statistic for each predictor in your model
 The Wald statistic is the significance test for each parameter in the model
 The null is that each β = 0
 Has df = 1; chi-square distribution
 It is the square of the z statistic, where z = β / s.e.(β)
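A sketch of the computation, assuming a hypothetical coefficient and standard error:

```python
from scipy.stats import chi2

beta, se = 0.5, 0.2               # hypothetical coefficient and standard error
z = beta / se
wald = z ** 2                     # chi-square distributed with 1 df
print(wald, chi2.sf(wald, df=1))  # 6.25, p ~= .012
```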
-2 log likelihood as test of null hypothesis for entire model
 A test of significance for the model, akin to the F-ratio; chi-square distribution; df = the number of parameters in the full model minus the number in the intercept-only model (i.e., the number of predictors)
 Does the observed likelihood or odds of success differ from 1?
 Compares the intercept-only model with the model containing the intercept and predictors. Do your predictors add to the predictive power of the model?
 Tests whether the difference is 0 and is referred to as the model chi-square
Goodness of Fit Statistic – null for residuals (FYI)
 Compares the observed probabilities (what you observed in the sample) to the predicted probabilities of an outcome occurring based on the model parameters in your equation
 Examines residuals – do the predictor coefficients significantly minimize their squared distances?
 Chi-square distribution; df = p
 Should be non-significant, as the observed and predicted values are anticipated to be quite similar
Mean of our response variable: attending a self-help group (FYI)
 The sample mean of Y is the number of successes (yes to attend) divided by the sample size, n
 The sample mean is the proportion of successful outcomes
 Here, 44 said yes and n = 400, so the mean proportion of yes is 44/400 = .11, or 11%
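The arithmetic, for completeness:

```python
successes, n = 44, 400
print(successes / n)  # 0.11 -> 11% of the sample said yes to attending
```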
Odds ratio and % change in odds by age
 Age: β = −.0586, p < .01 (beta is negative). Thus, the log odds of attending a self-help group decrease as a person gets older
 exp(β) = .9431 … the odds ratio [exp(β) < 1], thus the odds decrease
 The % change (here a reduction) in the odds of attending for each additional year of age is 100(exp(β) − 1) = 100(.9431 − 1) = −5.69%, i.e., 5.69% less likely with each year one ages
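Reproducing the slide’s numbers:

```python
import numpy as np

beta_age = -0.0586
odds_ratio = np.exp(beta_age)               # 0.9431
print(odds_ratio, 100 * (odds_ratio - 1))   # 0.9431, -5.69
```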
Predicted probability of attending by age
 When π < .5, the probability of attending declines and we would see a downward dip in the sigmoid curve with increasing values of X (keeping in mind probability ranges from 0 to 1)
 More meaningful with all predictors; however, a point estimate for age 80 (intercept omitted) would be:
π = e^((−.0586)(80)) / (1 + e^((−.0586)(80))) = .009. The probability of those 80 years of age attending a group is about 1%
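The same point estimate in code:

```python
import numpy as np

z = -0.0586 * 80                    # age 80; intercept omitted, as on the slide
print(np.exp(z) / (1 + np.exp(z)))  # ~0.009 -> about a 1% probability
```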
Odds ratio and % change in odds by gender
 Gender: β = 1.2540, p < .05. Thus, the odds of attending a self-help group are greater among females (female is the category coded 1 and the beta is positive)
 exp(β) = 3.5045 … the odds of attending are 3.5 times as large for females as for males [exp(β) > 1]
 The % change (here an increase) in the odds of attending when a person is female is 100(exp(β) − 1) = 100(3.50 − 1) = 250%
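Checking the slide’s numbers:

```python
import numpy as np

beta_gender = 1.2540
odds_ratio = np.exp(beta_gender)            # 3.5045
print(odds_ratio, 100 * (odds_ratio - 1))   # 3.5045, ~250
```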
Predicted probability of attending by gender
 π = e^((1.254)(1)) / (1 + e^((1.254)(1))) = .78 (intercept omitted)
 Thus, the probability of attending among females is about 78%
 When π > .5, the probability of attending increases and we would see an upward trend in the sigmoid curve with increasing values of X on the horizontal axis (keeping in mind probability ranges from 0 to 1)
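The point estimate in code (intercept again omitted, as on the slide):

```python
import numpy as np

z = 1.254 * 1                       # female coded 1; intercept omitted
print(np.exp(z) / (1 + np.exp(z)))  # ~0.778 -> about 78%
```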
Wald statistic
 Tests whether the coefficient for each independent variable is 0
 Tells us which variables significantly predict the likelihood of attending a self-help group
 Age = 7.2298 (p < .01); Gender = 5.7723 (p < .05)
Likelihood statistic for the model
 Null: the likelihood or odds are 1.0, the predicted probability is .5, and all of our predictor variables have β = 0
 −2LL for the constant alone minus −2LL for the constant plus all predictors: 281.368 − 240.518 = 40.85
 With 12 df, a model chi-square of 40.85 has p < .0001
 The predictors in the model significantly add to our capacity to predict attendance
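The model chi-square from the two −2 log likelihood values on the slide:

```python
from scipy.stats import chi2

neg2ll_null, neg2ll_full = 281.36838, 240.518
model_chi2 = neg2ll_null - neg2ll_full         # 40.85
print(model_chi2, chi2.sf(model_chi2, df=12))  # p < .0001
```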
Goodness of Fit (FYI)

 371.093, 12 df, ns
 Our model parameters minimize the squared distances [residuals] between the actual sample observations of attendance and those the logistic regression equation predicts (odds and probabilities)