Lecture 13: Introduction to
Logistic Regression
Sandy Eckel
[email protected]
13 May 2008
Logistic Regression
Basic Idea:
Logistic regression is the type of
regression we use for a response variable
(Y) that follows a binomial distribution
Linear regression is the type of regression
we use for a continuous, normally
distributed response (Y) variable
Remember the Binomial Distribution?
Review of the Binomial Model
Y ~ Binomial(n,p)
n independent trials
(e.g., coin tosses)
p = probability of success on each trial
(e.g., p = ½ = Pr of heads)
Y = number of successes out of n trials
(e.g., Y= number of heads)
Binomial Distribution Example
Binomial probability mass function (pmf):
P(Y = y) = (n choose y) p^y (1 − p)^(n−y)
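As a quick illustrative sketch (not part of the lecture), the pmf above can be computed directly with Python's standard library:

```python
# Binomial pmf, computed directly from the formula above.
from math import comb

def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# Example: probability of 2 heads in 4 tosses of a fair coin
print(binom_pmf(2, 4, 0.5))  # 0.375
```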
Why can’t we use Linear Regression
to model binary responses?
The response (Y) is NOT normally distributed
The variability of Y is NOT constant
Variance of Y depends on the expected value of Y
For a binary response Y ~ Binomial(1,p) we have Var(Y) = pq, which
depends on the expected response, E(Y) = p
The model must produce predicted/fitted
probabilities that are between 0 and 1
Linear models produce fitted responses that vary
from -∞ to ∞
Binomial Y example
Consider a phase I clinical trial in which
35 independent patients are given a
new medication for pain relief. Of the 35
patients, 22 report “significant” relief
one hour after medication
Question: How effective is the drug?
Model
Y = # patients who get relief
n = 35 patients (trials)
p = probability of relief for any patient
The truth we seek in the population
How effective is the drug? What is p?
Want a method to
Get best estimate of p given data
Determine range of plausible values for p
How do we estimate p?
Maximum Likelihood Method
The method of maximum likelihood estimation chooses
values for parameter estimates which make the observed
data “maximally likely” under the specified model
Likelihood Function: Pr(Y = 22 of 35)
[Figure: the likelihood plotted against p = Prob(Event) over 0 to 1; the curve peaks at the maximum likelihood estimate, p = 0.63.]
Maximum Likelihood
Clinical trial example
Under the binomial model, ‘likelihood’ for observed Y=y
P(Y = y) = (n choose y) p^y (1 − p)^(n−y)
So for this example the likelihood function is:
P(Y = 22) = (35 choose 22) p^22 (1 − p)^13
So, estimate p by choosing the value for p which makes
observed data “maximally likely”
i.e., choose p that makes the value of Pr (Y=22) maximal
The ML estimate of p is y/n
= 22/35
= 0.63
The estimated proportion of patients who will experience
relief is 0.63
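The maximization can also be sketched numerically; the crude grid search below (an illustration, not the lecture's method) recovers the closed-form MLE y/n:

```python
from math import comb

def likelihood(p, y=22, n=35):
    """Binomial likelihood for the observed trial data (22 of 35 relieved)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# crude grid search over p in (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # ~0.629, matching the closed-form MLE 22/35 = 0.63
```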
Confidence Interval (CI) for p
Recall the general form of any CI:
Estimate ± (something near 2) x SE(estimate)
Variance of p̂: Var(p̂) = p(1 − p)/n = pq/n
"Standard Error" of p̂: √(pq/n)
Estimate of "Standard Error" of p̂: √(p̂q̂/n)
Confidence Interval for p
95% Confidence Interval for the ‘true’
proportion, p:
p̂ ± 1.96·√(p̂q̂/n) = 0.63 ± 1.96·√((0.63)(0.37)/35)
LB: 0.63 − 1.96(0.082) = 0.47
UB: 0.63 + 1.96(0.082) = 0.79
= (0.47, 0.79)
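The interval arithmetic above can be checked with a short script (illustrative sketch):

```python
from math import sqrt

y, n = 22, 35
p_hat = y / n                         # 0.63
se = sqrt(p_hat * (1 - p_hat) / n)    # ~0.082
lb, ub = p_hat - 1.96 * se, p_hat + 1.96 * se
print(round(lb, 2), round(ub, 2))     # 0.47 0.79
```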
Conclusion
Based upon our clinical trial in which 22 of 35
patients experience relief, we estimate that 63%
of persons who receive the new drug experience
relief within 1 hour (95% CI: 47% to 79%)
Whether 63% (47% to 79%) represents an
'effective' drug will depend on many things,
especially on the science of the problem.
Sore throat pain?
Arthritis pain?
Childbirth pain?
Aside: Review of Probabilities and Odds
The odds of an event are defined as:
odds(Y=1) = P(Y = 1)/P(Y = 0) = P(Y = 1)/[1 − P(Y = 1)] = p/(1 − p)
We can go back and forth between odds and
probabilities:
Odds = p/(1 − p)
p = odds/(odds + 1)
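These two conversions can be sketched as a pair of helper functions (the names `odds` and `prob` are mine):

```python
def odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability: odds / (odds + 1)."""
    return o / (o + 1)

print(odds(0.5))                       # 1.0
print(round(prob(odds(0.63)), 10))     # 0.63 (round trip)
```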
Aside: Review of Odds Ratio
We saw that an odds ratio (OR) can be
helpful for comparisons.
Recall the Vitamin A trial where we
looked at the odds ratio of death
comparing the vitamin A group to the
no vitamin A group:
OR = odds(Death | Vit. A) / odds(Death | No Vit. A)
Aside: Review of Odds Ratio Interpretation
The OR here describes the benefits of
Vitamin A therapy. We saw for this
example that:
OR = 0.59
The Vitamin A group had 0.59 times the
odds of death of the no Vitamin A group; or
roughly an estimated 40% reduction in mortality
OR is a building block for logistic
regression
Logistic Regression
Suppose we want to ask whether new
drug is better than a placebo and have
the following observed data:
Relief? Drug Placebo
No 13 20
Yes 22 15
Total 35 35
Confidence Intervals for p
[Figure: 95% confidence intervals for p, drawn on the 0-to-1 probability scale, for the Placebo and Drug groups.]
Odds Ratio
OR = odds(Relief | Drug) / odds(Relief | Placebo)
   = {P(Relief | Drug)/[1 − P(Relief | Drug)]} / {P(Relief | Placebo)/[1 − P(Relief | Placebo)]}
   = [0.63/(1 − 0.63)] / [0.43/(1 − 0.43)] = 2.26
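The calculation can be reproduced directly from the 2×2 table (an illustrative sketch):

```python
# 2x2 table from the trial
relief = {"drug": 22, "placebo": 15}
no_relief = {"drug": 13, "placebo": 20}

odds_drug = relief["drug"] / no_relief["drug"]            # 22/13
odds_placebo = relief["placebo"] / no_relief["placebo"]   # 15/20
or_hat = odds_drug / odds_placebo
print(round(or_hat, 2))  # 2.26
```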
Confidence Interval for OR
The CI uses Woolf's method for the standard error of log(ÔR) (from Lecture 6):
se(log(ÔR)) = √(1/22 + 1/13 + 1/15 + 1/20) = 0.489
Find log(ÔR) ± 1.96·se(log(ÔR)) = (L, U)
Then the CI for the OR is (e^L, e^U)
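Woolf's standard error and the resulting interval can be sketched as:

```python
from math import exp, log, sqrt

a, b, c, d = 22, 13, 15, 20   # relief/no-relief counts: drug, then placebo
log_or = log((a / b) / (c / d))
se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
print(round(se, 3))  # 0.489

# CI on the log scale, then exponentiated back to the OR scale
lo, hi = exp(log_or - 1.96 * se), exp(log_or + 1.96 * se)
print(round(lo, 2), round(hi, 2))  # close to the slide's (0.86, 5.90) up to rounding
```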
Interpretation
OR = 2.26
95% CI: (0.86 , 5.90)
The Drug is an estimated 2 ¼ times
better than the placebo.
But could the difference be due to
chance alone?
YES ! 1 is a ‘plausible’ true population OR
Logistic Regression
Can we set up a model for this binomial
outcome similar to what we’ve done in
regression?
Idea: model the log odds of the event,
(in this example, relief) as a function of
predictor variables
A regression model for the log odds
log[ odds(Relief | Tx) ] = log[ P(relief | Tx) / P(no relief | Tx) ] = β0 + β1·Tx
where: Tx = 0 if Placebo
1 if Drug
log( odds(Relief|Drug) ) = β0 + β1
log( odds(Relief|Placebo) ) = β0
log( odds(Relief|D)) – log( odds(Relief|P)) = β1
And…
Because of the basic property of logs:
log( odds(Relief|D)) – log( odds(Relief|P)) = β1
log[ odds(R | D) / odds(R | P) ] = β1
And: OR = exp(β1) = e^β1 !!
So: exp(β1) = odds ratio of relief for patients
taking the Drug-vs-patients taking the Placebo.
Logistic Regression
Logit estimates Number of obs = 70
LR chi2(1) = 2.83
Prob > chi2 = 0.0926
Log likelihood = -46.99169 Pseudo R2 = 0.0292
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Tx | .8137752 .4889211 1.66 0.096 -.1444926 1.772043
(Intercept) | -.2876821 .341565 -0.84 0.400 -.9571372 .3817731
------------------------------------------------------------------------------
Estimates:
log( odds(relief|Tx) ) = β̂0 + β̂1Tx
= -0.288 + 0.814(Tx)
Therefore: OR = exp(0.814) = 2.26 !
So 2.26 is the odds ratio of relief for patients taking the Drug
compared to patients taking the Placebo
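Because this model has a single binary predictor, it is saturated and its MLEs have a closed form (the sample log-odds), so the Stata estimates can be verified by hand; a sketch:

```python
from math import exp, log

# trial counts
relief_drug, no_relief_drug = 22, 13
relief_placebo, no_relief_placebo = 15, 20

# saturated 2x2 logistic model: MLEs are the sample log-odds
b0 = log(relief_placebo / no_relief_placebo)   # intercept: log-odds of relief on placebo
b1 = log(relief_drug / no_relief_drug) - b0    # log odds ratio, drug vs placebo
print(round(b0, 3), round(b1, 3))  # -0.288 0.814
print(round(exp(b1), 2))           # 2.26
```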
It’s the same as the OR we got before!
So, why go to all the trouble of setting up a
linear model?
What if there is a biologic reason to expect
that the rate of relief (and perhaps drug
efficacy) is age dependent?
What if
Pr(relief) = function of Drug or Placebo AND Age
We could easily include age in a model such
as:
log( odds(relief) ) = β0 + β1Drug + β2Age
Logistic Regression
As in MLR, we can include many
additional covariates
For a Logistic Regression model with
r number of predictors:
log ( odds(Y=1)) = β0 + β1X1 + ... + βrXr
where: odds(Y=1) = Pr(Y = 1)/[1 − Pr(Y = 1)] = Pr(Y = 1)/Pr(Y = 0)
Logistic Regression
Thus:
log[ Pr(Y = 1)/Pr(Y = 0) ] = β0 + β1X1 + ... + βrXr
But, why use log(odds)?
Linear regression might estimate anything
(-∞, +∞), not just a proportion in the range of
0 to 1
Logistic regression is a way to estimate a
proportion (between 0 and 1) as well as some
related items
Another way to motivate using log(odds) for the
left-hand side of logistic regression
We would like to use something like
what we know from linear regression:
Continuous outcome = β0 + β1X1 + β2X2+…
How can we turn a proportion into a
continuous outcome?
Transforming a proportion…
A proportion is a value between 0 and 1
The odds are always positive:
odds = p/(1 − p) ⇒ [0, +∞)
The log odds is continuous:
log odds = ln[ p/(1 − p) ] ⇒ (−∞, +∞)
“Logit” transformation of the probability
Measure                              Min   Max   Name
Pr(Y = 1)                            0     1     "probability"
Pr(Y = 1)/[1 − Pr(Y = 1)]            0     ∞     "odds"
log{ Pr(Y = 1)/[1 − Pr(Y = 1)] }     −∞    ∞     "log-odds" or "logit"
Logit Function
Relates log-odds (logit) to p = Pr(Y=1)
[Figure: the logit function; log-odds (−10 to 10) plotted against the probability of success (0 to 1).]
Key Relationships
Relating log-odds, probabilities, and
parameters in logistic regression:
Suppose we have the model:
logit(p) = β0 + β1X
i.e. log[ p/(1 − p) ] = β0 + β1X
Take "anti-logs" to get back to the odds scale:
p/(1 − p) = exp(β0 + β1X)
Solve for p as a function of the coefficients
p/(1-p) = exp(β0 + β1X)
p = (1 – p)⋅exp(β0 + β1X)
p = exp(β0 + β1X) – p ⋅ exp(β0 + β1X)
p + p ⋅exp(β0 + β1X) = exp(β0 + β1X)
p ⋅{1+ exp(β0 + β1X)} = exp(β0 + β1X)
p = exp(β0 + β1X) / [1 + exp(β0 + β1X)]
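The final expression is the inverse-logit function; a minimal sketch (the name `expit` is mine, matching common usage):

```python
from math import exp

def expit(x):
    """Inverse logit: maps a log-odds value to a probability in (0, 1)."""
    return exp(x) / (1 + exp(x))

print(expit(0))                          # 0.5
print(round(expit(-0.288 + 0.814), 2))   # 0.63: Pr(relief | Drug) from the fitted model
```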
What’s the point of all that algebra?
Now we can determine the estimated
probability of success for a specific set
of covariates, X, after running a logistic
regression model
Example
Dependence of Blindness on Age
The following data concern the Aegean
island of Kalytos where inhabitants
suffer from a congenital eye disease
whose effects become more marked
with age.
Samples of 50 people were taken at five
different ages and the numbers of blind
people were counted
Example: Data
Age Number blind / 50
20 6 / 50
35 7 / 50
45 26 / 50
55 37 / 50
70 44 / 50
Question
The scientific question of interest is to
determine how the probability of
blindness is related to age in this
population
Let pi = Pr(a person in age class i is blind)
Model 1 – Intercept only model
logit(pi) = β0*
β0*= log-odds of blindness for all ages
exp(β0*) = odds of blindness for all ages
No age dependence in this model
Model 2 – Intercept and age
logit(pi) = β0 + β1(agei – 45)
β0 = log-odds of blindness among 45 year olds
exp(β0) = odds of blindness among 45 year olds
β1 = difference in log-odds of blindness
comparing a group that is one year older than
another
exp(β1) = odds ratio of blindness comparing a
group that is one year older than another
Results
Model 1: logit(pi) = β0*
Logit estimates Number of obs = 250
LR chi2(0) = 0.00
Prob > chi2 = .
Log likelihood = -173.08674 Pseudo R2 = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(Intercept) | -.0800427 .1265924 -0.63 0.527 -.3281593 .1680739
------------------------------------------------------------------------------
logit(p̂i) = −0.08   or   p̂i = exp(−0.08)/[1 + exp(−0.08)] = 0.48
Results
Model 2: logit(pi) = β0 + β1(agei – 45)
Logit estimates Number of obs = 250
LR chi2(1) = 99.30
Prob > chi2 = 0.0000
Log likelihood = -123.43444 Pseudo R2 = 0.2869
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0940683 .0119755 7.86 0.000 .0705967 .1175399
(Intercept) | -4.356181 .5700966 -7.64 0.000 -5.473549 -3.238812
------------------------------------------------------------------------------
logit(p̂i) = −4.36 + 0.094·agei
(equivalently, −0.13 + 0.094(agei − 45), since −4.36 + 0.094×45 ≈ −0.13)
or
p̂i = exp(−4.36 + 0.094·agei)/[1 + exp(−4.36 + 0.094·agei)]
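Using the (rounded) Stata estimates, the fitted probabilities at the five study ages can be compared with the observed proportions; a sketch:

```python
from math import exp

def p_blind(age, b0=-4.356, b1=0.094):
    """Fitted Pr(blind | age) from Model 2, using the rounded Stata estimates."""
    lp = b0 + b1 * age
    return exp(lp) / (1 + exp(lp))

observed = {20: 6 / 50, 35: 7 / 50, 45: 26 / 50, 55: 37 / 50, 70: 44 / 50}
for age, obs in observed.items():
    print(age, round(p_blind(age), 2), round(obs, 2))
```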
Test of significance
Is the addition of the age variable in the
model important?
Maximum likelihood estimates:
β̂1 =0.094 s.e.(β̂1 )=0.012
z-test: H0: β1 = 0
z=7.855; p-val=0.000
95% C.I. (0.07, 0.12)
What about the Odds Ratio?
Maximum likelihood estimates:
OR = exp( β̂1)= exp(0.094)= 1.10
SE(β̂1) = SE(log(ÔR)) = 0.012
Same z-test, reworded for the OR scale:
H0: exp(β1) = 1
z = 7.86 p-val = 0.000
95% C.I. for exp(β1): (1.07, 1.12)
*(calculated on the log scale, then exponentiated!!)
(e^(0.094 − 1.96×0.012), e^(0.094 + 1.96×0.012))
It appears that blindness is age dependent
Note: exp(0) = 1, where is this fact useful?
Model 1 fit
Plot of observed proportion -vs-
predicted proportions using an intercept
only model
[Figure: observed proportions blind vs. age (20 to 80), with the flat predicted probability from the intercept-only model.]
Model 2 fit
Plot of observed proportion -vs-
predicted proportions with age in the
model
[Figure: observed proportions blind vs. age (20 to 80), with the rising predicted probability curve from Model 2.]
Conclusion
Model 2 clearly fits better than Model 1!
Including age in our model is better
than intercept alone.
Lecture 13 Summary
Logistic regression gives us a framework in which
to model binary outcomes
Uses the structure of linear models, with
outcomes modelled as a function of covariates
As we’ll see, many concepts carry over from
linear regression
Interactions
Linear splines
Tests of significance for coefficients
All coefficients will have different
interpretations in logistic regression
Log odds or Log odds ratios!
HW 3 Hint
General logistic model specification:
Systematic:
logit(P(Yi = 1)) = log(odds(Yi = 1)) = β0 + β1x1 + β2x2
Random:
Yi ~ Binomial(1, pi)
where pi depends on the covariates for person i