Logistic Reg

This document discusses logistic regression for modelling binary event probabilities. It defines key concepts such as: 1) The probability, odds, and log-odds of an event occurring, and how these measures are related. 2) How logistic regression models the log-odds of an event in terms of predictor variables using a logistic regression equation. 3) How the intercept and coefficients in this equation can be interpreted on the probability, odds, and log-odds scales to understand how the predictors influence the event probability.

Uploaded by

Eric Lauron


Logistic Regression: Univariate and Multivariate

Events and Logistic Regression

• Logistic regression is used for modelling event probabilities.
• Example of an event: Mrs. Smith had a myocardial infarction between 1/1/2000 and 31/12/2009.
• The occurrence of an event is a binary (dichotomous) variable. There are two possibilities: the event occurs or it does not occur.
• For this reason, event occurrence variables can always be coded as 0 or 1, e.g.
    Y_i = 1 ⟺ person i became pregnant in 2011.
    Y_i = 0 ⟺ person i did not become pregnant in 2011.

Measuring the Probability of an Event

• There are many equivalent ways of measuring the probability of an event.
• We will use three:
    1. probability of the event
    2. odds in favour of the event
    3. log-odds in favour of the event
• These are equivalent in the sense that if you know the value of one measure for an event, you can compute the values of the other two measures for the same event (cf. measuring a distance in kilometres, statute miles or nautical miles).

The Probability of an Event

• This is a number π between 0 and 1. We write

    π = P(Y = 1)

  to mean π is the probability that Y = 1.
• π = 1 means we know the event is certain to occur.
• π = 0 means we know the event is certain not to occur.
• Values between 0 and 1 represent intermediate states of certainty: the larger π is, the more certain we are that the event occurs.
• Because we are certain that exactly one of Y = 1 and Y = 0 is true (they cannot be true simultaneously):

    P(Y = 0) = 1 − P(Y = 1) = 1 − π.

Odds in Favour of an Event

• The odds in favour of an event is defined as the probability the event occurs divided by the probability the event does not occur.
• The odds in favour of Y = 1 is defined as:

$$\mathrm{ODDS}(Y = 1) = \frac{P(Y = 1)}{P(Y \neq 1)} = \frac{P(Y = 1)}{P(Y = 0)} = \frac{\pi}{1 - \pi}.$$

• Note:

$$\mathrm{ODDS}(Y = 0) = \frac{1}{\mathrm{ODDS}(Y = 1)} = \frac{1 - \pi}{\pi},$$

  so

$$\mathrm{ODDS}(Y = 1) \times \mathrm{ODDS}(Y = 0) = 1.$$
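
The odds definitions above can be checked numerically. The following is a minimal R sketch; the probability 0.8 and the variable names are assumptions chosen only for illustration.

```r
# Hypothetical probability, chosen only for illustration.
p <- 0.8

odds_for     <- p / (1 - p)   # ODDS(Y = 1) = pi / (1 - pi)
odds_against <- (1 - p) / p   # ODDS(Y = 0) = (1 - pi) / pi

odds_for                 # about 4
odds_against             # about 0.25
odds_for * odds_against  # equals 1 up to floating-point rounding
```

An odds of about 4 in favour corresponds to an odds of about 1/4 against, which is the reciprocal relationship stated in the note.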

Interpreting the Odds in Favour of an Event

• An odds is a number between 0 and ∞.
• An odds of 0 means we are certain the event does not occur.
• An increased odds corresponds to increased belief in the occurrence of the event.
• An odds of 1 corresponds to a probability of 1/2.
• An odds of ∞ corresponds to certainty the event occurs.

Log-odds in Favour of an Event

• The log-odds in favour of an event is defined as the log of the odds in favour of the event:

$$\log \mathrm{ODDS}(Y = 1) = \log \frac{P(Y = 1)}{P(Y = 0)} = \log \frac{\pi}{1 - \pi}.$$

• Note:

$$\log \mathrm{ODDS}(Y = 1) = -\log \mathrm{ODDS}(Y = 0) = -\log \frac{1 - \pi}{\pi}.$$
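
As a quick numerical check of the sign relationship between the two log-odds, here is a small R sketch. The probability 0.8 is an assumed value for illustration only.

```r
p <- 0.8  # hypothetical probability, for illustration only

log_odds_for     <- log(p / (1 - p))   # log ODDS(Y = 1)
log_odds_against <- log((1 - p) / p)   # log ODDS(Y = 0)

log_odds_for       # positive, because p > 1/2
log_odds_against   # same magnitude, opposite sign

# The two log-odds differ only in sign.
all.equal(log_odds_for, -log_odds_against)
```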

Interpreting the Log-odds in Favour of an Event

• A log-odds is a number between −∞ and ∞.
• A log-odds of −∞ means we are certain the event does not occur.
• An increased log-odds corresponds to increased belief in the occurrence of the event.
• A log-odds of 0 corresponds to a probability of 1/2.
• A log-odds of ∞ corresponds to certainty the event occurs.

Moving between Probability, Odds and Log-odds

• You can use the following table to compute one measure of probability from another:

    Starting with             P                 ODDS          log ODDS
    P(Y = 1) = π              π                 π/(1 − π)     log(π/(1 − π))
    ODDS(Y = 1) = o           o/(1 + o)         o             log o
    log ODDS(Y = 1) = x       e^x/(1 + e^x)     e^x           x

• Choose the row corresponding to the quantity you start with and the column corresponding to the quantity you want to compute.
• log(π/(1 − π)) is often written logit(π).
• exp(x)/(1 + exp(x)) is often written inv. logit(x) (sometimes expit(x)).

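The conversions in the table above can be written as short R functions. This is a sketch: the function names and the starting probability 0.25 are assumptions for illustration, while `qlogis()` and `plogis()` are base R's logit and inverse logit.

```r
# Conversions between the three scales, following the conversion table.
# Function names are illustrative, not from the slides.
prob_to_odds    <- function(p) p / (1 - p)
odds_to_prob    <- function(o) o / (1 + o)
prob_to_logodds <- function(p) log(p / (1 - p))        # logit
logodds_to_prob <- function(x) exp(x) / (1 + exp(x))   # inverse logit

p <- 0.25                 # hypothetical starting probability
o <- prob_to_odds(p)      # 1/3
x <- prob_to_logodds(p)   # log(1/3), about -1.0986

# Round trips recover the starting probability.
all.equal(odds_to_prob(o), p)
all.equal(logodds_to_prob(x), p)

# Base R provides the same transformations as qlogis() and plogis().
all.equal(x, qlogis(p))
all.equal(p, plogis(x))
```

In practice `plogis()` is the safer choice for large |x|, since the hand-written `exp(x) / (1 + exp(x))` overflows for very large x.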
Motivation for (Multivariate) Logistic Regression

• We want to model P(Y = 1) in terms of a set of predictor variables X1, X2, ..., Xp (for univariate regression p = 1).
• In linear regression we use the regression equation

    E(Y) = β0 + β1 X1 + β2 X2 + ... + βp Xp        (1)

• However, for a binary Y (0 or 1), E(Y) = 1 × P(Y = 1) + 0 × P(Y = 0) = P(Y = 1).
• We cannot use equation (1) directly, because the left-hand side is a number between 0 and 1 while the right-hand side is potentially any number between −∞ and ∞.
• Solution: replace the left-hand side with logit E(Y):

    logit E(Y) = β0 + β1 X1 + β2 X2 + ... + βp Xp
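
To see why the logit is needed, the sketch below evaluates a linear predictor at a few points and maps it back to the probability scale. The coefficients and predictor values are assumed for illustration and do not come from any data set.

```r
# Hypothetical coefficients and predictor values, for illustration only.
b0 <- -1
b1 <- 2
x1 <- c(-3, -1, 0, 1, 3)

eta <- b0 + b1 * x1   # linear predictor: can be any real number
eta                   # -7 -3 -1  1  5  (not valid probabilities)

p <- plogis(eta)      # inverse logit maps each value into (0, 1)
p
range(p)

# Applying the logit recovers the linear predictor.
all.equal(qlogis(p), eta)
```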

Logistic Regression Equation Written on Three Scales

• We defined the regression equation on the logit or log ODDS scale:

    log ODDS(Y = 1) = β0 + β1 X1 + β2 X2 + ... + βp Xp

• On the ODDS scale the same equation may be written:

    ODDS(Y = 1) = exp(β0 + β1 X1 + β2 X2 + ... + βp Xp)

• On the probability scale the equation may be written:

$$P(Y = 1) = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}$$
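
The three forms of the equation can be evaluated side by side. The sketch below assumes p = 2 predictors with made-up coefficients and predictor values; none of these numbers come from the slides.

```r
# Hypothetical coefficients and predictor values (p = 2 predictors).
beta <- c(b0 = -0.5, b1 = 1.2, b2 = -0.7)
x    <- c(x1 = 1, x2 = 2)

log_odds <- beta[["b0"]] + beta[["b1"]] * x[["x1"]] + beta[["b2"]] * x[["x2"]]
odds     <- exp(log_odds)
prob     <- exp(log_odds) / (1 + exp(log_odds))

c(log_odds = log_odds, odds = odds, prob = prob)

# The three values describe the same fit: odds / (1 + odds) is the probability.
all.equal(prob, odds / (1 + odds))
```

Here the log-odds is about −0.7, so the odds are below 1 and the probability is below 1/2.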

Interpreting the Intercept

• In order to obtain a simple interpretation of the intercept we need to find a situation in which the terms involving the other parameters (β1, ..., βp) vanish.
• This happens when X1, X2, ..., Xp are all equal to 0.
• Consequently we can interpret β0 in 3 equivalent ways:
    1. β0 is the log-odds in favour of Y = 1 when X1 = X2 = ... = Xp = 0.
    2. β0 is such that exp(β0) is the odds in favour of Y = 1 when X1 = X2 = ... = Xp = 0.
    3. β0 is such that exp(β0)/(1 + exp(β0)) is the probability that Y = 1 when X1 = X2 = ... = Xp = 0.
• You can choose any one of these three interpretations when you make a report.

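The three readings of the intercept can be illustrated on a fitted model. This is a sketch on simulated data with a single binary predictor; the data, seed and true coefficients are assumptions for illustration, not the data used in the slides.

```r
# Simulated data, for illustration only (not the data used in the slides).
set.seed(1)
n <- 500
x <- rbinom(n, size = 1, prob = 0.5)                 # binary predictor, 0 or 1
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * x))

fit <- glm(y ~ x, family = binomial)
b0  <- coef(fit)[["(Intercept)"]]

b0           # log-odds of y = 1 when x = 0
exp(b0)      # odds of y = 1 when x = 0
plogis(b0)   # probability of y = 1 when x = 0

# With a single binary predictor, the fitted probability at x = 0
# matches the observed proportion of y = 1 among rows with x = 0.
all.equal(plogis(b0), mean(y[x == 0]), tolerance = 1e-5)
```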
Univariate Picture: Intercept
[Figure: Pr(Y = 1) = inv. logit(β0 + β1 X1) plotted against X1 (from −2 to 3), with the value exp(β0)/(1 + exp(β0)) marked on the vertical axis.]

• P(Y = 1) vs. X1 when p = 1 (univariate regression).

Univariate Picture: Sign of β1

[Figure: Pr(Y = 1) plotted against X1 (from −2 to 2), showing a blue curve and a red curve.]

• When β1 > 0, P(Y = 1) increases with X1 (blue curve).
• When β1 < 0, P(Y = 1) decreases with X1 (red curve).

Univariate Picture: Magnitude of β1

[Figure: Pr(Y = 1) plotted against X1 (from −2 to 2), showing a blue curve and a red curve.]

• β1 = 2 (blue curve), β1 = 4 (red curve).
• When |β1| is greater, changes in X1 more strongly influence the probability that the event occurs.

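The effect of the sign and magnitude of β1 can also be seen numerically. In this sketch the intercept is set to 0, which is an assumption: the slides do not state the intercept used in the figures.

```r
# Hypothetical values; b0 = 0 is an assumption, since the slides do not
# state the intercept used in the figures.
b0 <- 0
x1 <- c(-1, 0, 1)

p_pos   <- plogis(b0 + 2 * x1)   # beta1 =  2: probability rises with x1
p_neg   <- plogis(b0 - 2 * x1)   # beta1 = -2: probability falls with x1
p_steep <- plogis(b0 + 4 * x1)   # beta1 =  4: same direction, steeper curve

rbind(x1, p_pos, p_neg, p_steep)

# Change in probability as x1 goes from 0 to 1 for beta1 = 2 versus 4.
p_pos[3]   - p_pos[2]     # about 0.38
p_steep[3] - p_steep[2]   # about 0.48
```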
Interpreting β1: Univariate Logistic Regression

• To obtain a simple interpretation of β1 we need to find a way to remove β0 from the regression equation.
• On the log-odds scale we have the regression equation:

    log ODDS(Y = 1) = β0 + β1 X1

• This suggests looking at the difference in the log-odds at two different values of X1, say t + z and t:

    log ODDS(Y = 1 | X1 = t + z) − log ODDS(Y = 1 | X1 = t)

  which is equal to

    β0 + β1 (t + z) − (β0 + β1 t) = zβ1.
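
The cancellation of β0 can be verified with concrete numbers. The values of the coefficients, t and z below are assumptions chosen for illustration.

```r
# Hypothetical values for the coefficients and for t and z.
b0 <- -1.5
b1 <- 0.8
t  <- 2
z  <- 3

# Regression equation on the log-odds scale.
log_odds <- function(x1) b0 + b1 * x1

diff_log_odds <- log_odds(t + z) - log_odds(t)
diff_log_odds   # about 2.4

# The intercept b0 cancels, so the difference equals z * b1.
all.equal(diff_log_odds, z * b1)
```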

Interpreting β1: Univariate Logistic Regression

• By putting z = 1 we arrive at the following interpretation of β1:
    β1 is the additive change in the log-odds in favour of Y = 1 when X1 increases by 1 unit.
• We can write an equivalent second interpretation on the odds scale:
    exp(β1) is the multiplicative change in the odds in favour of Y = 1 when X1 increases by 1 unit.

β1 as a Log-odds Ratio

• The first interpretation of β1 expresses the equation:

$$\log \frac{\mathrm{ODDS}(Y = 1 \mid X_1 = t + z)}{\mathrm{ODDS}(Y = 1 \mid X_1 = t)} = z\beta_1$$

  whilst the second interpretation expresses the equation:

$$\frac{\mathrm{ODDS}(Y = 1 \mid X_1 = t + z)}{\mathrm{ODDS}(Y = 1 \mid X_1 = t)} = \exp(z\beta_1).$$

• The quantity ODDS(Y = 1 | X1 = t + z) / ODDS(Y = 1 | X1 = t) is the odds ratio in favour of Y = 1 for X1 = t + z vs. X1 = t.
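
For a binary predictor (t = 0, z = 1), the odds ratio from a fitted model can be compared with the odds ratio computed directly from the data. This sketch uses simulated data; the seed, sample size and true coefficients are assumptions for illustration.

```r
# Simulated data, for illustration only; x is a hypothetical binary predictor.
set.seed(2)
n <- 400
x <- rbinom(n, size = 1, prob = 0.4)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x))

fit <- glm(y ~ x, family = binomial)
or_model <- exp(coef(fit)[["x"]])   # exp(beta1): odds ratio for x = 1 vs. x = 0

# Odds of y = 1 within each group, computed directly from the data.
odds_x1 <- mean(y[x == 1]) / (1 - mean(y[x == 1]))
odds_x0 <- mean(y[x == 0]) / (1 - mean(y[x == 0]))
or_data <- odds_x1 / odds_x0

c(or_model = or_model, or_data = or_data)
all.equal(or_model, or_data, tolerance = 1e-5)
```

The agreement holds because a model with one binary predictor reproduces the observed proportions in both groups; with a continuous predictor the model-based odds ratio is a smoothed summary rather than a direct ratio of observed odds.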

Interpreting Coefficients in Multivariate Logistic Regression

• The interpretation of regression coefficients in multivariate logistic regression is similar to the interpretation in univariate regression.
• We dealt with β0 previously.
• In general the coefficient βk (corresponding to the variable Xk) can be interpreted as follows:
    βk is the additive change in the log-odds in favour of Y = 1 when Xk increases by 1 unit, while the other predictor variables remain unchanged.
• As in the univariate case, an equivalent interpretation can be made on the odds scale:
    exp(βk) is the multiplicative change in the odds in favour of Y = 1 when Xk increases by 1 unit, while the other predictor variables remain unchanged.
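
The "other predictors unchanged" condition can be made concrete with fitted log-odds for two hypothetical individuals. The sketch below uses simulated data with two predictors; all names and values are assumptions for illustration.

```r
# Simulated data with two hypothetical predictors, for illustration only.
set.seed(3)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(0.3 + 1.0 * x1 - 0.6 * x2))

fit <- glm(y ~ x1 + x2, family = binomial)

# Two hypothetical individuals who differ by 1 unit in x1 and share the same x2.
new <- data.frame(x1 = c(0.5, 1.5), x2 = c(2, 2))
lo  <- predict(fit, newdata = new, type = "link")   # fitted log-odds

diff(lo)            # difference in log-odds between the two individuals
coef(fit)[["x1"]]   # the x1 coefficient; should match the difference
```

Exponentiating either quantity gives the corresponding odds ratio for a 1-unit increase in x1 at fixed x2.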

Fitting a Logistic Regression in R
• We fit a logistic regression in R using the glm function:

    > output <- glm(sta ~ sex, data=icu1.dat, family=binomial)

• This fits the regression equation

    logit P(sta = 1) = β0 + β1 × sex.

• data=icu1.dat tells glm the data are stored in the data frame icu1.dat.
• family=binomial tells glm to fit a logistic model (the binomial family uses the logit link by default).
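
Since icu1.dat is not reproduced here, the following sketch runs the same call on a small simulated stand-in. The data frame `icu_sim`, its values and the 0/1 coding of `sex` are assumptions made only so the example is runnable; they are not the ICU data.

```r
# icu1.dat is not reproduced here, so this sketch builds a small simulated
# stand-in with the same column names (sta, sex). The values are made up,
# and the 0/1 coding of sex is an assumption.
set.seed(4)
icu_sim <- data.frame(sex = rbinom(200, size = 1, prob = 0.4))
icu_sim$sta <- rbinom(200, size = 1, prob = plogis(-1.4 + 0.3 * icu_sim$sex))

output <- glm(sta ~ sex, data = icu_sim, family = binomial)

summary(output)$coefficients   # estimates, standard errors, z values, p-values
coef(output)                   # beta0 and beta1 on the log-odds scale
exp(coef(output))              # the same coefficients on the odds scale
```

`coef(output)` returns the estimates of β0 and β1 on the log-odds scale, so the interpretations given earlier apply directly to them.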