Lect7 Math231
Statistics
Logistic Regression
Shaheena Bashir
FALL, 2019
Outline
Background
Introduction
Logit Transformation
Assumptions
Estimation
Example
Analysis
How Good is the Fitted Model?
Motivating Example
Background
Scatter Plot: Relationship between Age & CHD
[Figure: scatter plot of the binary coronary heart disease outcome (0.0–1.0) against age in years, 20–70; the raw 0/1 responses lie on two horizontal bands.]
Not informative!!
[Figure: % with CHD by age group plotted against age in years, 20–70; the grouped proportions rise with age.]
Introduction
Sigmoid Function
Logit Transformation
Logit Function

The logit function ln[p/(1 − p)] (also called the log-odds) is simply the log of the ratio of P(Y = 1) to P(Y = 0):

    ln[p/(1 − p)] = Xβ

The odds:

    p/(1 − p) = exp(Xβ)

Solving

    p = Pr(Y = 1 | X = x) = exp(y)/[1 + exp(y)] = 1/[1 + exp(−y)]

gives the standard logistic function, where y = Xβ.
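These two transformations can be checked numerically. The course uses R; the short Python sketch below is only an illustration, and the helper names `logit` and `logistic` are chosen here, not taken from the slides:

```python
import math

def logit(p):
    """Log-odds: ln(p / (1 - p)) for p in (0, 1)."""
    return math.log(p / (1 - p))

def logistic(y):
    """Standard logistic function: 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

# The two functions are inverses: logit maps (0, 1) onto the whole real
# line, and the logistic function maps it back into (0, 1).
p = 0.8
y = logit(p)                  # log-odds of p = 0.8
print(round(logistic(y), 6))  # recovers 0.8
```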
Logit Function

g(x) = ln[p/(1 − p)] has many of the desirable properties of a linear regression model:
I It may be continuous
I It is linear in the parameters
I It has the potential for a range between −∞ and +∞, depending on the range of x.
Assumptions
Estimation

The parameters β are estimated by maximum likelihood, i.e., by maximizing the binomial likelihood of the observed 0/1 responses.
Example
Analysis
CHD Analysis
OR = exp(0.11) = 1.116
The odds of getting CHD are · · · · · · when age increases by 1 year
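A quick numerical check of this odds ratio (Python is used here only as a calculator; the 10-year extrapolation is an added illustration, not from the slides):

```python
import math

# Slope for age from the fitted CHD model on the slide: beta1 = 0.11.
beta1 = 0.11
or_per_year = math.exp(beta1)
print(round(or_per_year, 3))   # 1.116: the odds multiply by ~1.116 per extra year

# Because the log-odds are linear in age, a 10-year increase multiplies
# the odds by exp(10 * beta1).
print(round(math.exp(10 * beta1), 3))   # 3.004
```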
Fitted Values

    p = exp(βo + β1 X) / [1 + exp(βo + β1 X)]
      = exp(−5.31 + 0.11 Age) / [1 + exp(−5.31 + 0.11 Age)]
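The fitted equation can be evaluated at a few ages. A Python sketch using the slide's coefficients (−5.31 and 0.11); the example ages are chosen here for illustration:

```python
import math

def fitted_p(age, b0=-5.31, b1=0.11):
    """Fitted CHD probability from the slide's model."""
    y = b0 + b1 * age
    return math.exp(y) / (1.0 + math.exp(y))

# Fitted probabilities rise with age, e.g. age 40 gives p ~ 0.287.
for age in (25, 40, 55, 70):
    print(age, round(fitted_p(age), 3))
```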
R Software
Predicted Probabilities
[Figure: predicted probabilities from the fitted model plotted against age, 20–70 years; the curve rises in an S shape.]
How Good is the Fitted Model?
Analysis of Deviance

Model: binomial, link: logit
Terms added sequentially (first to last)

How well our model fits depends on the difference between the fitted model and the observed data. The Hosmer–Lemeshow test compares observed and expected event counts across groups of fitted probabilities:

library(ResourceSelection)
hoslem.test(as.numeric(chdage$chd) - 1, fitted(mod1))

R Output

Hosmer and Lemeshow goodness of fit (GOF) test
data: as.numeric(chdage$chd) - 1, fitted(mod1)
X-squared = 2.2243, df = 8, p-value = 0.9734

The large p-value (0.9734) gives no evidence of lack of fit.
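The statistic behind this output sums, over groups, the squared gap between observed and expected events scaled by a binomial variance term. A Python sketch of that formula on hypothetical grouped counts (NOT the chdage data); each tuple holds (group size, observed events, mean fitted probability):

```python
# Hypothetical groups for illustration only.
groups = [(10, 1, 0.08), (10, 2, 0.22), (10, 4, 0.41), (10, 7, 0.68)]

hl = 0.0
for n_g, obs, pbar in groups:
    expected = n_g * pbar
    # Hosmer-Lemeshow contribution: (O - E)^2 / (n * pbar * (1 - pbar))
    hl += (obs - expected) ** 2 / (n_g * pbar * (1 - pbar))

# A small statistic (relative to a chi-square with g - 2 df)
# means no evidence of lack of fit.
print(round(hl, 3))
```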
Single Categorical Predictor
Past exposure    yi     ni
Smokers          112    288
Non-smokers      88     312

Then yi ∼ Bin(ni, pi), where xi is the binary predictor of past smoking:
I xi = 1 if a past smoker
I xi = 0 if a non-smoker in the past
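The observed group proportions estimate the two binomial probabilities; a small Python sketch computing them, together with each group's odds, from the table above:

```python
# Observed counts from the smoking table.
y_smokers, n_smokers = 112, 288
y_nonsmokers, n_nonsmokers = 88, 312

p1 = y_smokers / n_smokers          # observed proportion, past smokers
p0 = y_nonsmokers / n_nonsmokers    # observed proportion, non-smokers
odds1 = p1 / (1 - p1)               # 112/176
odds0 = p0 / (1 - p0)               # 88/224
print(round(p1, 3), round(p0, 3))       # 0.389 0.282
print(round(odds1, 3), round(odds0, 3))  # 0.636 0.393
```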
logit(pi) = βo + β1 xi

    β1 = logit(pi | xi = 1) − logit(pi | xi = 0) = log [ odds(xi = 1) / odds(xi = 0) ]

∴ OR = exp(β1) = · · · · · ·
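Because the difference of logits is the log of the odds ratio, β1 can be checked numerically from the observed group odds in the smoking table (a sketch; with a single binary predictor the fitted probabilities equal the observed group proportions):

```python
import math

# Observed odds in each exposure group from the table.
odds_smokers = 112 / (288 - 112)      # 112/176
odds_nonsmokers = 88 / (312 - 88)     # 88/224

# beta1 is the log of the observed odds ratio; exponentiating recovers OR.
beta1 = math.log(odds_smokers / odds_nonsmokers)
print(round(math.exp(beta1), 3))      # odds ratio of past smoking vs not
```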
Types of Logistic Regression Models