Logistic Regression: Multivariate
Events and Logistic Regression
Measuring the Probability of an Event
The Probability of an Event
π = P(Y = 1)
P(Y = 0) = 1 − P(Y = 1) = 1 − π.
Odds in Favour of an Event
ODDS(Y = 1) × ODDS(Y = 0) = 1.
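The identity above follows from the usual definition of odds as the probability that the event occurs divided by the probability that it does not. That definition is assumed here, since it is not written out on the slide:

```latex
\mathrm{ODDS}(Y = 1) = \frac{\pi}{1 - \pi},
\qquad
\mathrm{ODDS}(Y = 0) = \frac{1 - \pi}{\pi},
\qquad
\mathrm{ODDS}(Y = 1) \times \mathrm{ODDS}(Y = 0)
  = \frac{\pi}{1 - \pi} \cdot \frac{1 - \pi}{\pi} = 1 .
```

For example, with π = 0.2 the odds in favour are 0.2/0.8 = 0.25 and the odds against are 0.8/0.2 = 4, and 0.25 × 4 = 1.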
Interpreting the Odds in Favour of an Event
Log-odds in Favour of an Event
Interpreting the Log-odds in Favour of an Event
Moving between Probability, Odds and Log-odds
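A minimal R sketch of the conversions between the three scales. The helper names are illustrative and not from the slides; `qlogis` and `plogis` are the base R equivalents of the log-odds transform and its inverse.

```r
# Illustrative helpers (hypothetical names) for moving between scales.
prob_to_odds    <- function(p) p / (1 - p)
odds_to_prob    <- function(o) o / (1 + o)
prob_to_logodds <- function(p) log(p / (1 - p))        # same as qlogis(p)
logodds_to_prob <- function(l) exp(l) / (1 + exp(l))   # same as plogis(l)

p       <- 0.2                      # an arbitrary example probability
odds    <- prob_to_odds(p)          # odds in favour of the event
logodds <- prob_to_logodds(p)       # log-odds in favour of the event
p_back  <- logodds_to_prob(logodds) # round trip back to the probability
```

Each conversion is one-to-one, so a value on any one scale determines the other two.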
Logistic Regression Equation Written on Three Scales
$$P(Y = 1) = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}$$
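Only the probability-scale form is written out above. The other two scales follow algebraically from it; the sketch below states them under the usual definition of odds as P(Y = 1)/(1 − P(Y = 1)):

```latex
% Odds scale
\mathrm{ODDS}(Y = 1) = \frac{P(Y = 1)}{1 - P(Y = 1)}
  = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)

% Log-odds scale
\log \mathrm{ODDS}(Y = 1)
  = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p
```

The model is linear in the covariates only on the log-odds scale, which is why the coefficients are interpreted there first.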
Interpreting the Intercept
[Figure: P(Y = 1) plotted against X1, with exp(β0)/(1 + exp(β0)) marked on the vertical axis.]
• P(Y = 1) vs. X1 when p = 1 (univariate regression).
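As a small numerical illustration, the intercept can be converted to a probability with the formula exp(β0)/(1 + exp(β0)). The sketch assumes the intercept estimate −1.4271 quoted in the `sta ~ sex` glm output in these slides; the variable names are illustrative.

```r
beta0 <- -1.4271                      # intercept estimate from the sta ~ sex fit
p0 <- exp(beta0) / (1 + exp(beta0))   # P(Y = 1) when all covariates are 0
p0_builtin <- plogis(beta0)           # base R inverse-logit, same quantity
```

Here `p0` is the fitted probability of the event for the baseline group (all covariates equal to zero).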
Univariate Picture: Sign of β1
[Figure: Pr(Y = 1) plotted against X1.]
• When β1 > 0, P(Y = 1) increases with X1 (blue curve).
Univariate Picture: Magnitude of β1
[Figure: Pr(Y = 1) plotted against X1 for two values of β1.]
• β1 = 2 (blue curve), β1 = 4 (red curve).
• When |β1| is greater, changes in X1 more strongly influence the probability that the event occurs.
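The effect of the sign and magnitude of β1 can be checked numerically. This sketch assumes β0 = 0 (the slides do not state the intercept used in the figures) and evaluates the fitted probability on a small grid of X1 values.

```r
x     <- seq(-2, 2, by = 1)          # grid of X1 values
beta0 <- 0                           # assumed intercept, for illustration only

p_b2  <- plogis(beta0 + 2 * x)       # beta1 = 2
p_b4  <- plogis(beta0 + 4 * x)       # beta1 = 4
p_neg <- plogis(beta0 - 2 * x)       # beta1 = -2, probability falls as X1 rises

# Change in P(Y = 1) when X1 moves from 0 to 1 under each slope
rise_b2 <- p_b2[x == 1] - p_b2[x == 0]
rise_b4 <- p_b4[x == 1] - p_b4[x == 0]
```

With the larger slope, the same one-unit move in X1 away from 0 produces a larger change in probability.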
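Interpreting β1: Univariate Logistic Regression
log ODDS(Y = 1) = β0 + β1 X1
Increasing X1 from t to t + z changes the log-odds by
log ODDS(Y = 1|X1 = t + z) − log ODDS(Y = 1|X1 = t),
which is equal to
β0 + β1 (t + z) − (β0 + β1 t) = zβ1.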
β1 as a Log-odds Ratio
$$\log \frac{\mathrm{ODDS}(Y = 1 \mid X_1 = t + z)}{\mathrm{ODDS}(Y = 1 \mid X_1 = t)} = z\beta_1$$
$$\frac{\mathrm{ODDS}(Y = 1 \mid X_1 = t + z)}{\mathrm{ODDS}(Y = 1 \mid X_1 = t)} = \exp(z\beta_1).$$
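The odds-ratio identity can be verified numerically. The parameter values below are arbitrary and chosen only for illustration; `t0` stands for the baseline value t.

```r
beta0 <- -1     # arbitrary illustrative intercept
beta1 <- 0.5    # arbitrary illustrative slope
t0    <- 2      # baseline value of X1 (t in the formula)
z     <- 3      # size of the increase in X1

# Odds of the event at a given X1 under the univariate model
odds_at <- function(x) {
  p <- plogis(beta0 + beta1 * x)
  p / (1 - p)
}

odds_ratio <- odds_at(t0 + z) / odds_at(t0)
expected   <- exp(z * beta1)
```

The ratio does not depend on `t0` or `beta0`, only on the increase `z` and the slope `beta1`.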
Interpreting Coefficients in Multivariate Logistic Regression
Fitting a Logistic Regression in R
• We fit a logistic regression in R using the glm function:
> output <- glm(sta ~ sex, data=icu1.dat, family=binomial)
Logistic Regression: glm Output in R
Call:
glm(formula = sta ~ sex, family = binomial, data = icu1.dat)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.6876  -0.6876  -0.6559  -0.6559   1.8123

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.4271     0.2273  -6.278 3.42e-10 ***
sex1          0.1054     0.3617   0.291    0.771
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Computing a 95% Confidence Interval from glm
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.4271     0.2273  -6.278 3.42e-10 ***
sex1          0.1054     0.3617   0.291    0.771
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
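A minimal sketch of a Wald-type 95% interval on the log-odds scale, computed as estimate ± z × standard error from the `sex1` row above. Using `qnorm(0.975)` as the critical value is an assumption; it agrees with the `$slopes.ci` values reported for this fit in these slides up to rounding of the inputs.

```r
est <- 0.1054                 # sex1 estimate from the coefficient table
se  <- 0.3617                 # sex1 standard error from the coefficient table

z_crit     <- qnorm(0.975)    # about 1.96 for a 95% interval
ci_logodds <- est + c(-1, 1) * z_crit * se   # lower and upper limits
```

The interval contains 0, which matches the large p-value (0.771) for `sex1`.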
Computing a 95% Confidence Interval on Odds Scale
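Because the odds ratio is the exponential of the log-odds coefficient, an interval on the odds scale can be obtained by exponentiating the limits of the log-odds interval. The sketch below starts from the `sex1` estimate and the `$slopes.ci` limits reported in these slides; the variable names are illustrative.

```r
est        <- 0.1054                       # sex1 estimate (log-odds scale)
ci_logodds <- c(-0.6035757, 0.8142967)     # $slopes.ci for sex1 from the slides

or_hat <- exp(est)          # estimated odds ratio
or_ci  <- exp(ci_logodds)   # 95% interval for the odds ratio
```

The resulting interval is not symmetric around the estimated odds ratio, and it contains 1 exactly when the log-odds interval contains 0.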
Logistic Regression with Dummy Variables
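As an illustration of how R codes a categorical covariate with dummy variables, the sketch below builds the design matrix for a three-level factor. The data values are made up; the factor is named `loc` only to mirror the coefficient names `loc1` and `loc2` that appear in the glm output in these slides.

```r
loc <- factor(c(0, 1, 2, 0, 2))   # made-up values with levels 0, 1, 2

# Default treatment coding: level 0 is the reference category and each
# other level gets its own 0/1 indicator column.
X <- model.matrix(~ loc)
dummy_names <- colnames(X)
```

A factor with k levels contributes k − 1 dummy variables, and each coefficient is a log-odds ratio relative to the reference level.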
Multivariate Logistic Regression ICU Example
• Summarise the data:
> summary(icu1.dat)
      sta       loc     sex     ser
 Min.   :0.0   0:185   0:124   0: 93
 1st Qu.:0.0   1:  5   1: 76   1:107
 Median :0.0   2: 10
 Mean   :0.2
 3rd Qu.:0.0
 Max.   :1.0
• Vital status (rows) vs. service type at ICU (columns):
      0   1
  0  67  93
  1  26  14
• Vital status (rows) vs. level of consciousness (columns):
       0   1   2
  0  158   0   2
  1   27   5   8
• Sex (rows) vs. service type at ICU (columns):
      0   1
  0  54  70
  1  39  37
• Sex (rows) vs. level of consciousness (columns):
       0   1   2
  0  116   3   5
  1   69   2   5
• Service type (rows) vs. level of consciousness (columns):
       0   1   2
  0   84   2   7
  1  101   3   3
• Now look at univariate regressions.
glm(formula = sta ~ sex, family = binomial, data = icu1.dat)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.4271     0.2273  -6.278 3.42e-10 ***
sex1          0.1054     0.3617   0.291    0.771
---
$intercept.ci
[1] -1.8726220 -0.9816107
$slopes.ci
[1] -0.6035757 0.8142967
$OR
sex1
1.111111
$OR.ci
[1] 0.5468528 2.2575874
• Univariate regression of sta on ser:
$slopes.ci
[1] -1.6685958 -0.2252964
$OR
     ser1
0.3879239
$OR.ci
[1] 0.1885116 0.7982796
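The `ser` results can be reproduced from the vital status vs. service type table alone. The sketch below rebuilds one row per patient from the four cell counts in that table (67, 93, 26, 14) and refits the univariate model. The data frame name `icu_ser` is illustrative, and the Wald interval with `qnorm(0.975)` is an assumption about how the reported intervals were computed.

```r
# Rebuild patient-level data from the 2 x 2 table of sta (rows) by ser (columns).
counts  <- c(67, 93, 26, 14)
icu_ser <- data.frame(
  sta = rep(c(0, 0, 1, 1), times = counts),
  ser = factor(rep(c(0, 1, 0, 1), times = counts))
)

fit <- glm(sta ~ ser, data = icu_ser, family = binomial)

slope    <- unname(coef(fit)["ser1"])                    # log-odds ratio
se       <- coef(summary(fit))["ser1", "Std. Error"]     # its standard error
slope_ci <- slope + c(-1, 1) * qnorm(0.975) * se         # Wald 95% interval
or_hat   <- exp(slope)                                   # odds ratio
or_ci    <- exp(slope_ci)                                # interval on odds scale
```

For a single binary covariate the fitted odds ratio equals the cross-product ratio of the table, (14/93)/(26/67), so the model adds nothing beyond the table in this case.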
Call:
glm(formula = sta ~ loc, family = binomial, data = icu1.dat)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7668     0.2082  -8.484  < 2e-16 ***
loc1         18.3328  1073.1090   0.017 0.986370
loc2          3.1531     0.8175   3.857 0.000115 ***
---
$intercept.ci
[1] -2.174912 -1.358605
$slopes.ci
             [,1]        [,2]
[1,] -2084.922247 2121.587900
[2,]     1.550710    4.755395
• Multivariate analysis:
Call:
glm(formula = sta ~ sex + ser, family = binomial, data = icu1.dat)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.96129    0.27885  -3.447 0.000566 ***
sex1         0.03488    0.36896   0.095 0.924688
ser1        -0.94442    0.36915  -2.558 0.010516 *
---
$intercept.ci
[1] -1.5078281 -0.4147469
$slopes.ci
           [,1]      [,2]
[1,] -0.6882692  0.758025
[2,] -1.6679299 -0.220904
$OR
     sex1      ser1
1.0354933 0.3889063
Main Conclusions:
• The univariate and multivariate models show the same pattern of significance: sex is not significant and ser is significant in both.
Prediction In Logistic Regression
> newdata
  sex ser
1   0   0
2   0   1
3   1   0
4   1   1
• Predict on the log-odds scale (i.e. log(π̂i / (1 − π̂i))):
> predict(output, newdata=newdata)
         1          2          3          4
-0.9612875 -1.9057045 -0.9264096 -1.8708266
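The log-odds predictions can be reproduced by hand from the `sta ~ sex + ser` coefficients and then converted to probabilities with the inverse-logit. This sketch uses the rounded estimates from the multivariate output, so it matches the values above only to about four decimal places; the vector name `beta` is illustrative.

```r
newdata <- data.frame(sex = c(0, 0, 1, 1), ser = c(0, 1, 0, 1))

# Rounded estimates from the sta ~ sex + ser fit reported in the slides
beta <- c(intercept = -0.96129, sex1 = 0.03488, ser1 = -0.94442)

# Linear predictor (log-odds) for each row of newdata
logodds <- beta[["intercept"]] +
  beta[["sex1"]] * newdata$sex +
  beta[["ser1"]] * newdata$ser

# Predicted probabilities of the event
prob <- plogis(logodds)
```

With a fitted glm object, `predict(output, newdata = newdata, type = "response")` returns predictions on the probability scale directly.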
• age: age in years
• Vital status (rows) vs. typ (columns):
      0    1
  0  51  109
  1   2   38
• Service type (rows) vs. typ (columns):
      0   1
  0   1  92
  1  52  55
• Boxplot showing the distribution of age stratified by vital status:
> boxplot(list("sta = 0" = icu2.dat$age[icu2.dat$sta == 0],
+              "sta = 1" = icu2.dat$age[icu2.dat$sta == 1]),
+         ylab = "age (years)")
• Multivariate analysis:
Call:
glm(formula = sta ~ sex + ser + age + typ, family = binomial,
    data = icu2.dat)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.2753  -0.7844  -0.3920  -0.2281   2.5072

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.26359    1.11678  -4.713 2.44e-06 ***
sex1        -0.20092    0.39228  -0.512  0.60851
ser1        -0.23891    0.41697  -0.573  0.56667
age          0.03473    0.01098   3.162  0.00156 **
typ1         2.33065    0.80238   2.905  0.00368 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
• Multivariate analysis on odds scale:
$OR
      sex1       ser1        age       typ1
 0.8179766  0.7874880  1.0353364 10.2846123
$OR.ci
          [,1]      [,2]
[1,] 0.3791710  1.764602
[2,] 0.3477894  1.783083
[3,] 1.0132920  1.057860
[4,] 2.1340289 49.565050
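The odds ratios above are the exponentials of the log-odds coefficients from the `sta ~ sex + ser + age + typ` fit. The sketch below recomputes them from the rounded estimates and, as an added illustration, applies the exp(zβ) rule to a 10-year increase in age; the choice of z = 10 is an example and does not appear in the slides.

```r
# Rounded estimates from the multivariate fit reported in the slides
est <- c(sex1 = -0.20092, ser1 = -0.23891, age = 0.03473, typ1 = 2.33065)

or <- exp(est)                       # odds ratio per one-unit increase

# Odds ratio for a 10-year increase in age, holding the other covariates fixed
or_age_10 <- exp(10 * est[["age"]])
```

For a continuous covariate such as age, the per-unit odds ratio (about 1.035) looks small, but it compounds multiplicatively over larger increases.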