Logistic Regression Assignment

Solutions
David M. Rocke

April 15, 2021

David M. Rocke Logistic Regression Assignment Solutions April 15, 2021 1 / 17


Suppose we have data on 100 cases of myocardial
infarction and 150 healthy individuals (mi = 1 if MI,
0 otherwise) matched to the MI group by age and sex.
From their medical records before the MI (if they had
one), we classify the individuals as diabetic, metabolic
disorder, and normal blood glucose (bg = norm,
metdis, diabetic). The table on the next page shows
the number of individuals in each group.

           norm   metdis   diabetic   Total
Control      85       50         15     150
MI           35       30         35     100
Total       120       80         50     250

Find the odds ratio for MI for diabetic individuals
vs. normal individuals (ignoring the metabolic
disorder individuals). Interpret.

\[ \mathrm{OR} = \frac{35/15}{35/85} = \frac{85}{15} = 5.67 \]
The odds of an MI for a diabetic individual are 5.67
times the odds for an individual with normal
blood glucose.
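The arithmetic above can be checked with a few lines of R. This is only a sketch: the vector names `mi`, `control`, and `odds` are illustrative and do not appear in the original slides.

```r
# Counts from the table, restricted to the norm and diabetic columns
mi      <- c(norm = 35, diabetic = 35)
control <- c(norm = 85, diabetic = 15)

# Odds of MI within each blood glucose group
odds <- mi / control

# Odds ratio, diabetic vs. normal
or_diabetic_vs_norm <- unname(odds["diabetic"] / odds["norm"])
print(round(or_diabetic_vs_norm, 2))
```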
Write down the logistic regression model
formulation in detail for predicting MI from bg.
Specifically make sure you have defined the
coefficients in the model. Use “normal” as the base
level for the bg factor.
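\[ p = \Pr(\mathrm{MI} \mid \mathrm{bg}) \]
\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_{\mathrm{metdis}}\, x_{\mathrm{metdis}} + \beta_{\mathrm{diabetic}}\, x_{\mathrm{diabetic}} \]
$x_{\mathrm{metdis}} = 1$ if bg = metdis, 0 otherwise
$x_{\mathrm{diabetic}} = 1$ if bg = diabetic, 0 otherwise
$\beta_0$ = log-odds of MI for normal blood glucose
$\beta_{\mathrm{metdis}}$ = log-odds ratio of metdis vs. normal
$\beta_{\mathrm{diabetic}}$ = log-odds ratio of diabetic vs. normal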
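As a rough illustration of the indicator coding, R's `model.matrix` builds the two dummy variables when `bg` is a factor whose first level is the base level. The level order below is an assumption chosen so that "norm" is the base; R's default alphabetical order would make "diabetic" the base instead.

```r
# One row per blood glucose group; the first factor level is the base level
bg <- factor(c("norm", "metdis", "diabetic"),
             levels = c("norm", "metdis", "diabetic"))

# Columns: intercept, indicator for metdis, indicator for diabetic
X <- model.matrix(~ bg)
print(X)
```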
Derive the likelihood equation for the model.
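\[ \ell^{-1}(x) = \frac{1}{1 + \exp(-x)} \]
\[ p_0 = \ell^{-1}(\beta_0) \]
\[ p_1 = \ell^{-1}(\beta_0 + \beta_{\mathrm{metdis}}) \]
\[ p_2 = \ell^{-1}(\beta_0 + \beta_{\mathrm{diabetic}}) \]
\[ L(\beta_0, \beta_{\mathrm{metdis}}, \beta_{\mathrm{diabetic}}) = \binom{120}{35} p_0^{35} (1 - p_0)^{85} \times \binom{80}{30} p_1^{30} (1 - p_1)^{50} \times \binom{50}{35} p_2^{35} (1 - p_2)^{15} \]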
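One way to make the likelihood concrete is to code its logarithm as a function of the three coefficients. The sketch below is not from the original slides; the function name `loglik` and the vectors `mi` and `n` are illustrative. It uses `plogis` for the inverse logit and `dbinom(..., log = TRUE)`, which includes the binomial coefficients.

```r
# Observed MI counts and group sizes for norm, metdis, diabetic
mi <- c(35, 30, 35)
n  <- c(120, 80, 50)

# Log-likelihood as a function of beta = (beta0, beta_metdis, beta_diabetic)
loglik <- function(beta) {
  eta <- c(beta[1], beta[1] + beta[2], beta[1] + beta[3])  # linear predictors
  p   <- plogis(eta)                                       # inverse logit
  sum(dbinom(mi, n, p, log = TRUE))
}

# Example evaluation at beta = (0, 0, 0), i.e. p = 0.5 in every group
print(loglik(c(0, 0, 0)))
```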
Derive the maximum likelihood estimates for the
parameters of the model, using normal as the
default level.

p0 = 35/120 = 0.2917
p1 = 30/80 = 0.375
p2 = 35/50 = 0.70
β0 = log[35/85] = −0.8873
β0 + β1 = log[30/50] = −0.5108
β0 + β2 = log[35/15] = 0.8473
β1 = −0.5108 − (−0.8873) = 0.3765
β2 = 0.8473 − (−0.8873) = 1.7346
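Because the model is saturated, the estimates are just logits of the observed proportions and their differences. A minimal sketch of that calculation, with illustrative names (`p_hat`, `logit`, `beta_hat`) that are not in the original:

```r
# Observed MI counts and group sizes
mi <- c(norm = 35, metdis = 30, diabetic = 35)
n  <- c(norm = 120, metdis = 80, diabetic = 50)

p_hat <- mi / n          # fitted probabilities p0, p1, p2
logit <- qlogis(p_hat)   # log(p / (1 - p)) for each group

# Intercept is the logit for norm; the others are differences from it
beta_hat <- c(beta0 = unname(logit["norm"]),
              beta1 = unname(logit["metdis"]   - logit["norm"]),
              beta2 = unname(logit["diabetic"] - logit["norm"]))
print(round(beta_hat, 4))
```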
> logistic.example
        bg mi non.mi
1     norm 35     85
2   metdis 30     50
3 diabetic 35     15

> logistic.example.mi
     [,1] [,2]
[1,]   35   85
[2,]   30   50
[3,]   35   15
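The slides print these two objects without showing how they were built. One plausible construction is sketched below; it is an assumption, not the author's code. The explicit `levels` argument is what makes "norm" the base level in the later fit.

```r
# Data frame with one row per blood glucose group (assumed construction)
logistic.example <- data.frame(
  bg     = factor(c("norm", "metdis", "diabetic"),
                  levels = c("norm", "metdis", "diabetic")),
  mi     = c(35, 30, 35),
  non.mi = c(85, 50, 15)
)

# Two-column response matrix: successes (MI) and failures (no MI)
logistic.example.mi <- cbind(logistic.example$mi, logistic.example$non.mi)

print(logistic.example)
print(logistic.example.mi)
```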

> summary(glm(logistic.example.mi~bg,family=binomial,data=logistic.example))

Call:
glm(formula = logistic.example.mi ~ bg, family = binomial, data = logistic.example)

Deviance Residuals:
[1] 0 0 0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8873     0.2008  -4.418 9.96e-06 ***
bgmetdis      0.3765     0.3061   1.230    0.219
bgdiabetic    1.7346     0.3682   4.711 2.47e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  2.4696e+01  on 2  degrees of freedom
Residual deviance: -1.6653e-14  on 0  degrees of freedom
AIC: 20.031

Number of Fisher Scoring iterations: 3
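The standard errors in this output are the square roots of the diagonal of the estimated covariance matrix. A sketch of how the coefficients and that matrix could be extracted from a stored fit follows; the names `d`, `fit`, `beta_hat`, and `V_hat` are illustrative, and the data frame is rebuilt here only to keep the example self-contained.

```r
# Rebuild the grouped data (assumed construction, norm as base level)
d <- data.frame(
  bg     = factor(c("norm", "metdis", "diabetic"),
                  levels = c("norm", "metdis", "diabetic")),
  mi     = c(35, 30, 35),
  non.mi = c(85, 50, 15)
)

# Binomial GLM with a two-column (successes, failures) response
fit <- glm(cbind(mi, non.mi) ~ bg, family = binomial, data = d)

beta_hat <- coef(fit)   # estimates
V_hat    <- vcov(fit)   # estimated covariance matrix of the estimates
print(round(beta_hat, 4))
print(round(V_hat, 5))

# Analytic check for the intercept variance in this saturated model
print(1/35 + 1/85)
```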

Compute the maximized log likelihood. What is the
deviance? Why?

\[ p_0 = 35/120, \quad p_1 = 30/80, \quad p_2 = 35/50 \]
\[ L = \binom{120}{35} p_0^{35} (1 - p_0)^{85} \times \binom{80}{30} p_1^{30} (1 - p_1)^{50} \times \binom{50}{35} p_2^{35} (1 - p_2)^{15} \]


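> p0 <- 35/120
> p1 <- 30/80
> p2 <- 35/50
> ll <- lchoose(120,35)+35*log(p0)+85*log(1-p0)
> ll <- ll + lchoose(80,30)+30*log(p1)+50*log(1-p1)
> ll <- ll + lchoose(50,35)+35*log(p2)+15*log(1-p2)
> print(ll)
[1] -7.015692

> # store the fit from the earlier glm() call so that it can be reused
> logistic.example.glm <- glm(logistic.example.mi~bg,family=binomial,data=logistic.example)
> logLik(logistic.example.glm)
'log Lik.' -7.015692 (df=3)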
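The deviance is zero because this is a saturated model: it has three
parameters for three binomial observations, so the fitted probabilities
equal the observed proportions.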
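The deviance is twice the gap between the saturated log-likelihood and the model log-likelihood. The sketch below, with illustrative names (`ll_at`, `dev_bg`, `dev_null`), shows that the gap vanishes for the bg model, while the same formula applied to a single pooled probability gives the null deviance reported in the summary output.

```r
# Observed MI counts and group sizes
mi <- c(35, 30, 35)
n  <- c(120, 80, 50)

# Binomial log-likelihood at a given vector (or scalar) of probabilities
ll_at <- function(p) sum(dbinom(mi, n, p, log = TRUE))

ll_saturated <- ll_at(mi / n)   # one free probability per observation

# bg model: fitted probabilities equal the observed proportions
dev_bg <- 2 * (ll_saturated - ll_at(mi / n))

# Null model: one pooled probability for all three groups
dev_null <- 2 * (ll_saturated - ll_at(sum(mi) / sum(n)))

print(ll_saturated)
print(dev_bg)
print(dev_null)
```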
If the three parameters are $\beta_0$ (the intercept),
$\beta_{\mathrm{metdis}}$, and $\beta_{\mathrm{diabetic}}$ in that order, and if the
covariance matrix of the parameters is
\[ \begin{pmatrix} 0.04034 & -0.04034 & -0.04034 \\ -0.04034 & 0.09367 & 0.04034 \\ -0.04034 & 0.04034 & 0.13557 \end{pmatrix} \]
test the hypotheses (separately) that each of the
two non-intercept parameters is zero.

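\[ \beta_0 = -0.8873, \quad \beta_1 = 0.3765, \quad \beta_2 = 1.7346 \]
\[ V = \begin{pmatrix} 0.04034 & -0.04034 & -0.04034 \\ -0.04034 & 0.09367 & 0.04034 \\ -0.04034 & 0.04034 & 0.13557 \end{pmatrix} \]
\[ z_1 = 0.3765 / \sqrt{0.09367} = 1.230, \quad p = 0.219 \]
\[ z_2 = 1.7346 / \sqrt{0.13557} = 4.711, \quad p = 2.5 \times 10^{-6} \]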
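These Wald statistics can be reproduced from the rounded estimates and covariance matrix given above. The sketch assumes two-sided normal p-values; the names `beta`, `V`, `se`, `z`, and `p_value` are illustrative.

```r
# Rounded estimates and covariance matrix as given in the problem
beta <- c(b0 = -0.8873, b1 = 0.3765, b2 = 1.7346)
V <- matrix(c( 0.04034, -0.04034, -0.04034,
              -0.04034,  0.09367,  0.04034,
              -0.04034,  0.04034,  0.13557),
            nrow = 3, byrow = TRUE)

se <- sqrt(diag(V))                 # standard errors
z  <- beta[2:3] / se[2:3]           # Wald z statistics for b1 and b2
p_value <- 2 * pnorm(-abs(z))       # two-sided p-values

print(round(z, 3))
print(signif(p_value, 3))
```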

> summary(glm(logistic.example.mi~bg,family=binomial,data=logistic.example))

Call:
glm(formula = logistic.example.mi ~ bg, family = binomial, data = logistic.example)

Deviance Residuals:
[1] 0 0 0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8873     0.2008  -4.418 9.96e-06 ***
bgmetdis      0.3765     0.3061   1.230    0.219
bgdiabetic    1.7346     0.3682   4.711 2.47e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  2.4696e+01  on 2  degrees of freedom
Residual deviance: -1.6653e-14  on 0  degrees of freedom
AIC: 20.031

Number of Fisher Scoring iterations: 3

Test the hypothesis that the log-odds ratio for MI of diabetic
vs. metabolic disorder subjects is 0.
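\[ \beta_0 = -0.8873, \quad \beta_1 = 0.3765, \quad \beta_2 = 1.7346 \]
\[ V = \begin{pmatrix} 0.04034 & -0.04034 & -0.04034 \\ -0.04034 & 0.09367 & 0.04034 \\ -0.04034 & 0.04034 & 0.13557 \end{pmatrix} \]
\[ x_3 = 1.7346 - 0.3765 = 1.3581 \]
\[ s_3 = \sqrt{0.13557 + 0.09367 - 2(0.04034)} = 0.3854 \]
\[ z_3 = x_3 / s_3 = 3.524, \quad p = 4.26 \times 10^{-4} \]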
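The same test can be written as a linear contrast of the coefficients, which makes the variance formula explicit. This is a sketch using the rounded values above; the contrast vector `cvec` and the other names are illustrative.

```r
# Rounded estimates and covariance matrix as given in the problem
beta <- c(-0.8873, 0.3765, 1.7346)
V <- matrix(c( 0.04034, -0.04034, -0.04034,
              -0.04034,  0.09367,  0.04034,
              -0.04034,  0.04034,  0.13557),
            nrow = 3, byrow = TRUE)

# Contrast picking out beta2 - beta1 (diabetic vs. metdis)
cvec <- c(0, -1, 1)

est <- sum(cvec * beta)                      # estimated difference
se  <- sqrt(drop(t(cvec) %*% V %*% cvec))    # sqrt of c' V c
z   <- est / se
p_value <- 2 * pnorm(-abs(z))                # two-sided p-value

print(round(c(estimate = est, se = se, z = z), 4))
print(signif(p_value, 3))
```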

Find a 95% confidence interval for the odds ratio for
MI of diabetic vs. normal individuals.
\[ \beta_0 = -0.8873, \quad \beta_1 = 0.3765, \quad \beta_2 = 1.7346 \]
\[ V = \begin{pmatrix} 0.04034 & -0.04034 & -0.04034 \\ -0.04034 & 0.09367 & 0.04034 \\ -0.04034 & 0.04034 & 0.13557 \end{pmatrix} \]
\[ 1.7346 \pm 1.960 \sqrt{0.13557} \]
\[ 1.7346 \pm 0.7217 \]
(1.0129, 2.4563) log-odds ratio
(2.75, 11.66) odds ratio
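A short sketch of the interval calculation: build the Wald interval on the log-odds scale, then exponentiate the endpoints. The variable names are illustrative, and the inputs are the rounded values used above.

```r
est <- 1.7346            # estimated log-odds ratio, diabetic vs. normal
se  <- sqrt(0.13557)     # its standard error from the covariance matrix

# 95% Wald interval on the log-odds scale
ci_log <- est + c(-1, 1) * qnorm(0.975) * se

# Exponentiate the endpoints to get the interval for the odds ratio
ci_or <- exp(ci_log)

print(round(ci_log, 4))
print(round(ci_or, 2))
```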
How would you perform the likelihood ratio test for
the given model vs. the null model?
The log-likelihood of the fitted model is −7.015692. We need to
compare this to the log-likelihood for the null
model, in which p does not depend on the bg
variable. The MLE for p is then 100/250 = 0.40.
Minus twice the difference between the null and fitted
log-likelihoods is asymptotically $\chi^2_2$ (chi-squared with
2 degrees of freedom) under the null hypothesis. Calculated
using the pooled p, we get a log-likelihood of −19.36386.
The test statistic is then −2[−19.36386 − (−7.015692)] = 24.696
with $p = 4.34 \times 10^{-6}$.

> p <- 100/250
> ll2 <- lchoose(120,35)+35*log(p)+85*log(1-p)
> ll2 <- ll2 + lchoose(80,30)+30*log(p)+50*log(1-p)
> ll2 <- ll2 + lchoose(50,35)+35*log(p)+15*log(1-p)
> print(ll2)
[1] -19.36386

> x2 <- -2*(ll2-ll)
> x2
[1] 24.69634

> pchisq(x2,2,lower.tail=FALSE)
[1] 4.337673e-06

> drop1(logistic.example.glm,test="Chisq")
Single term deletions

Model:
logistic.example.mi ~ bg
       Df Deviance    AIC    LRT  Pr(>Chi)
<none>       0.000 20.031
bg      2   24.696 40.728 24.696 4.338e-06 ***
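An equivalent route, sketched here as an alternative to `drop1`, is to fit the null model explicitly and compare the two fits with `anova`. The object names `d`, `fit_null`, and `fit_bg` are illustrative, and the data frame is rebuilt only to keep the example self-contained.

```r
# Rebuild the grouped data (assumed construction, norm as base level)
d <- data.frame(
  bg     = factor(c("norm", "metdis", "diabetic"),
                  levels = c("norm", "metdis", "diabetic")),
  mi     = c(35, 30, 35),
  non.mi = c(85, 50, 15)
)

fit_null <- glm(cbind(mi, non.mi) ~ 1,  family = binomial, data = d)
fit_bg   <- glm(cbind(mi, non.mi) ~ bg, family = binomial, data = d)

# Likelihood ratio (analysis of deviance) table for the nested models
print(anova(fit_null, fit_bg, test = "Chisq"))

# The same statistic and p-value computed directly from the deviances
lrt_stat <- deviance(fit_null) - deviance(fit_bg)
lrt_df   <- df.residual(fit_null) - df.residual(fit_bg)
lrt_p    <- pchisq(lrt_stat, lrt_df, lower.tail = FALSE)
print(c(statistic = lrt_stat, df = lrt_df, p = lrt_p))
```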
