Logistic Regression: Prof. Andy Field
Slide 3
With One Predictor
P(Y) = 1 / (1 + e^−(b0 + b1X1i))
• Outcome
– We predict the probability of the outcome
occurring
• b0 and b1
– Can be thought of in much the same way as
multiple regression
– Note the normal regression equation forms part
of the logistic regression equation
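As a sketch, the one-predictor equation can be computed directly; the coefficient and predictor values here are invented purely for illustration:

```python
import math

def predicted_probability(b0, b1, x):
    """P(Y) = 1 / (1 + e^-(b0 + b1*x)): probability of the outcome occurring.

    Note how the familiar regression equation b0 + b1*x forms the core of it.
    """
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Illustrative coefficients (not from any real analysis):
p = predicted_probability(-1.0, 0.5, 3.0)  # linear part = -1.0 + 0.5*3 = 0.5
```

Because the linear part is wrapped in the logistic function, the result is always squeezed between 0 and 1, so it can be read as a probability.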
Slide 4
With Several Predictors
P(Y) = 1 / (1 + e^−(b0 + b1X1i + b2X2i + ... + bnXni))
• Outcome
– We still predict the probability of the outcome
occurring
• Differences
– Note the multiple regression equation forms part
of the logistic regression equation
– This part of the equation expands to
accommodate additional predictors
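The expanded equation can be sketched the same way: only the linear part grows, the logistic wrapper is unchanged. The coefficients below are again invented for illustration:

```python
import math

def predicted_probability(b0, coefs, xs):
    """Multi-predictor logistic model: the linear part b0 + b1*x1 + ... + bn*xn
    simply expands to accommodate additional predictors."""
    linear = b0 + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-linear))

# Two illustrative predictors whose contributions cancel the intercept:
p = predicted_probability(-2.0, [0.8, 1.2], [1.0, 1.0])  # linear part = 0.0
```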
Slide 5
Assessing the Model: the Log-Likelihood Statistic
log-likelihood = Σ(i = 1 to N) [Yi ln(P(Yi)) + (1 − Yi) ln(1 − P(Yi))]
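The sum can be sketched directly in code; the observed outcomes and fitted probabilities below are invented for illustration:

```python
import math

def log_likelihood(y, p):
    """Sum of Yi*ln(P(Yi)) + (1 - Yi)*ln(1 - P(Yi)) over the N cases.

    y: observed 0/1 outcomes; p: the model's predicted probabilities.
    Larger (closer to 0) values mean the model fits the data better.
    """
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Illustrative data: four cases, mostly well predicted
ll = log_likelihood([1, 0, 1, 1], [0.9, 0.2, 0.8, 0.6])
```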
• The R-statistic
– is the partial correlation between the outcome
variable and each of the predictor variables.
– It can vary between −1 and 1.
Assessing Predictors: The Wald Statistic
Wald = b / SE(b)
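The statistic is just the coefficient divided by its standard error; the values below are illustrative, not from a real analysis:

```python
def wald(b, se_b):
    """Wald statistic for a predictor: b / SE(b).

    Some software reports the squared version, which is compared to a
    chi-square distribution; this is the z-form.
    """
    return b / se_b

z = wald(1.23, 0.50)  # illustrative coefficient and standard error
```

A caveat worth remembering: when b is large, SE(b) tends to be inflated, which can make the Wald statistic misleadingly small.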
Slide 9
The odds ratio: exp(B)
Odds ratio = (odds after a unit change in the predictor) / (original odds)
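This ratio works out algebraically to exp(b), which a quick numeric sketch confirms; the coefficients are invented for illustration:

```python
import math

def prob(b0, b1, x):
    # one-predictor logistic model
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    # odds of the outcome occurring
    return p / (1.0 - p)

b0, b1 = -1.0, 0.7  # illustrative coefficients
# Odds after a unit change in the predictor, divided by the original odds:
odds_ratio = odds(prob(b0, b1, 1.0)) / odds(prob(b0, b1, 0.0))
# ...which equals exp(b1), the quantity software reports as exp(B).
```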
Slide 10
Methods of Regression
• Forced Entry: All variables entered
simultaneously.
• Hierarchical: Variables entered in blocks.
– Blocks should be based on past research or the theory
being tested. A good method.
• Stepwise: Variables entered on the basis of
statistical criteria (i.e. relative contribution to
predicting outcome).
– Should be used only for exploratory analysis.
Slide 11
Model building and Parsimony
• When building a model we should strive
for parsimony.
– Predictors should not be included unless they
have explanatory benefit.
• First fit the model that includes all
potential predictors, and then
systematically remove any that don’t seem
to contribute to the model.
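This backward pruning is usually judged by the change in −2LL between the full model and the model with a predictor removed, which follows a chi-square distribution. A minimal sketch, with hypothetical −2LL values and the df = 1 case (one predictor removed):

```python
import math

def lr_test_df1(neg2ll_small, neg2ll_large):
    """Likelihood-ratio test for dropping one predictor (df = 1).

    The drop in -2LL between the smaller and larger model is chi-square
    distributed; for df = 1 the upper-tail p-value is erfc(sqrt(x/2)).
    """
    chi_sq = neg2ll_small - neg2ll_large
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value

# Hypothetical -2LL values: removing the predictor barely hurts the model,
# so it has little explanatory benefit and can be dropped.
chi_sq, p = lr_test_df1(105.2, 104.9)
```

A non-significant p-value here is the "doesn't seem to contribute" criterion in numeric form.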
Things That Can Go Wrong
• Assumptions from Linear Regression:
– Linearity
– Independence of Errors
– Multicollinearity
• Unique Problems
– Incomplete Information
– Complete Separation
– Overdispersion
Incomplete Information From the
Predictors
• Categorical Predictors:
– Predicting cancer from smoking and eating tomatoes.
– We don’t know what happens when nonsmokers eat tomatoes
because we have no data in this cell of the design.
• Continuous variables:
– Will your sample be large enough to include an 80-year-old,
highly anxious, Buddhist, left-handed lesbian?
Complete Separation
• When the outcome variable can be perfectly
predicted.
– E.g. predicting whether someone is a burglar or your
teenage son or your cat based on weight.
– Weight is a perfect predictor of cat/burglar unless you
have a very fat cat indeed!
[Figure: two panels plotting probability of the outcome against weight, illustrating an ordinary logistic curve versus complete separation.]
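A quick numeric sketch (with invented toy weights) shows why complete separation is a problem: the log-likelihood keeps improving as the slope grows, so maximum-likelihood estimation never converges on a finite estimate.

```python
import math

# Toy, completely separated data: everything under ~35 units is a cat (0),
# everything heavier is a burglar (1). Weights are invented for illustration.
weights = [3.0, 4.0, 5.0, 70.0, 80.0, 90.0]
outcome = [0, 0, 0, 1, 1, 1]

def softplus(z):
    # numerically stable log(1 + e^z)
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def log_likelihood(b1):
    """Log-likelihood of a logistic model with slope b1, with the intercept
    fixed so the curve is centred at the separating weight (35)."""
    ll = 0.0
    for x, y in zip(weights, outcome):
        z = b1 * (x - 35.0)
        ll -= softplus(-z) if y == 1 else softplus(z)
    return ll

# The likelihood climbs towards its ceiling of 0 as the slope steepens,
# so there is no finite maximum:
lls = [log_likelihood(b1) for b1 in (0.1, 1.0, 10.0)]
```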
Slide 17
Output: Initial Analysis
The output is split into two blocks: block 0 describes the model before Intervention is
included, and block 1 describes the model after Intervention is included. As such, block 1
is the main part in which we're interested.
Output: Block 0
The part of the block 0 output that does come in useful is Output 19.3, which will be there
only if you selected Iteration history in Figure 19.10. This table tells us the initial
−2LL, which is 154.084. We'll use this value later, so don't forget it.
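To see how that baseline value gets used: the model chi-square is the baseline −2LL minus the new model's −2LL. Only the 154.084 comes from Output 19.3; the block 1 value below is hypothetical, purely to show the arithmetic.

```python
# Baseline -2LL from Output 19.3 (the intercept-only block 0 model):
baseline_neg2ll = 154.084

# Hypothetical -2LL once Intervention is added (NOT from the actual output):
new_neg2ll = 144.16

# The drop in -2LL is the model chi-square, tested against a chi-square
# distribution with df equal to the number of predictors added.
model_chi_square = baseline_neg2ll - new_neg2ll
```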