Homework 3: Jiawei Li Sahil Bhagat Shahrzad Baraeinezhad
Q1
glm1<-glm(switch~1,family=binomial(logit),data=Wells)
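a. AIC = 4120.1
b. BIC: glm1$BIC returns NULL because a glm object stores no BIC component; the BIC has to be obtained with BIC(glm1) instead.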
summary(glm1)
##
## Call:
## glm(formula = switch ~ 1, family = binomial(logit), data = Wells)
##
## Deviance Residuals:
##    Min      1Q  Median      3Q     Max
## -1.308  -1.308   1.052   1.052   1.052
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.30296    0.03681    8.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 4118.1  on 3019  degrees of freedom
## AIC: 4120.1
glm1$BIC
## NULL
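Because a glm object has no BIC component, `glm1$BIC` is NULL. As a rough check that needs no data, the BIC can be recomputed from the printed residual deviance. This is only a sketch: it assumes n = 3020 observations (the 3019 null degrees of freedom plus one) and relies on the fact that, for ungrouped 0/1 responses, the residual deviance equals -2 times the log-likelihood. The variable names are ours and are not part of the original solution.

```r
# Values copied by hand from the summary(glm1) output
resid_dev <- 4118.1    # residual deviance = -2 * logLik for 0/1 responses
n_obs     <- 3019 + 1  # assumption: null degrees of freedom + 1
k         <- 1         # number of estimated coefficients (intercept only)

aic_check <- resid_dev + 2 * k           # should reproduce the printed AIC
bic_check <- resid_dev + k * log(n_obs)  # BIC penalty is log(n) per parameter

print(c(AIC = aic_check, BIC = bic_check))
```

With the fitted object available, `BIC(glm1)` returns this quantity directly.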
c. Since the model contains no predictors, the estimated log odds of switching are the same for every household and equal the intercept: log odds = 0.303 (standard error 0.037).
Q2
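a.
glm2<-glm(switch~distance+1,family=binomial(logit),data=Wells)
glm2
##
## Call:  glm(formula = switch ~ distance + 1, family = binomial(logit),
##     data = Wells)
##
## Coefficients:
## (Intercept)     distance
##    0.605959    -0.006219
##
## Degrees of Freedom: 3019 Total (i.e. Null);  3018 Residual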
summary(glm2)
##
## Call:
## glm(formula = switch ~ distance + 1, family = binomial(logit),
##     data = Wells)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.4406  -1.3058   0.9669   1.0308   1.6603
##
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.6059594  0.0603102  10.047  < 2e-16 ***
## distance    -0.0062188  0.0009743  -6.383 1.74e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 4076.2  on 3018  degrees of freedom
## AIC: 4080.2
##  (Intercept)     distance
##  0.605959365 -0.006218819
The coefficient for distance is negative (-0.0062 on the log-odds scale per unit of distance), so the farther a family lives from the nearest safe well, the less likely it is to switch.
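To make the sign of the distance coefficient concrete, the fitted log odds can be converted to probabilities at a few distances. The sketch below uses the two coefficients printed above, typed in by hand; the distances are illustrative values we chose, not values taken from the assignment.

```r
# Coefficients copied from the glm2 output
b0 <- 0.605959365    # intercept
b1 <- -0.006218819   # distance coefficient (log-odds scale)

# Illustrative distances (our choice)
dist_vals <- c(0, 50, 100, 200)

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
p_switch <- plogis(b0 + b1 * dist_vals)
names(p_switch) <- dist_vals
print(round(p_switch, 3))

# Multiplicative change in the odds of switching per 100 units of distance
odds_ratio_100 <- exp(100 * b1)
print(odds_ratio_100)
```

The probabilities decrease as distance grows, and the odds ratio below 1 says the same thing on the odds scale.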
d.
p<-exp(0.303)/(exp(0.303)+1)
p
## [1] 0.5751757
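b. Slope at the half-way point
x1=97.44 # half-way point, where p = 0.5: 0.605959/0.006219
slope=exp(glm2$coefficients[1]+glm2$coefficients[2]*x1)/(exp(glm2$coefficients[1]+glm2$coefficients[2]*x1)+1)^2
slope
## (Intercept)
##        0.25
The value 0.25 is the derivative of the logistic curve with respect to the linear predictor at the half-way point. The slope with respect to distance is this value times the distance coefficient: 0.25 * (-0.006219), or about -0.00155, so near the half-way point the probability of switching falls by roughly 0.16 percentage points per additional unit of distance.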
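The half-way point and the slope there follow directly from the two glm2 coefficients. The sketch below assumes the coefficients as printed earlier (typed in by hand) and uses the standard result that the derivative of the inverse logit with respect to x is the coefficient times p(1 - p), which equals the coefficient divided by 4 when p = 0.5. Variable names are ours.

```r
# Coefficients copied from the glm2 output
b0 <- 0.605959365
b1 <- -0.006218819

# Half-way point: the distance at which the linear predictor is 0, so p = 0.5
x_half <- -b0 / b1

# Derivative of the inverse logit with respect to the linear predictor
eta     <- b0 + b1 * x_half
dp_deta <- exp(eta) / (1 + exp(eta))^2

# Chain rule: slope of the probability curve with respect to distance
slope_distance <- b1 * dp_deta

print(c(x_half = x_half, dp_deta = dp_deta, slope_distance = slope_distance))
```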
Q4
a.
glm3<-glm(switch~.+1,family=binomial(logit),data=Wells)
summary(glm3)
##
## Call:
## glm(formula = switch ~ . + 1, family = binomial(logit), data = Wells)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.5942  -1.1976   0.7541   1.0632   1.6739
##
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -0.156712   0.099601  -1.573    0.116
## arsenic         0.467022   0.041602  11.226  < 2e-16 ***
## distance       -0.008961   0.001046  -8.569  < 2e-16 ***
## education       0.042447   0.009588   4.427 9.55e-06 ***
## associationyes -0.124300   0.076966  -1.615    0.106
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 3907.8  on 3015  degrees of freedom
## AIC: 3917.8
## Direction:  backward/forward
## Criterion:  AIC
##
## Start:  AIC=3917.83
## switch ~ arsenic + distance + education + association + 1
##
##               Df Deviance    AIC
## <none>             3907.8 3917.8
## - association  1   3910.4 3918.4
## - education    1   3927.7 3935.7
## - distance     1   3985.2 3993.2
## - arsenic      1   4056.1 4064.1
summary(switch.best)
##
## Call:
## glm(formula = switch ~ arsenic + distance + education + association +
##     1, family = binomial(logit), data = Wells)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.5942  -1.1976   0.7541   1.0632   1.6739
##
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -0.156712   0.099601  -1.573    0.116
## arsenic         0.467022   0.041602  11.226  < 2e-16 ***
## distance       -0.008961   0.001046  -8.569  < 2e-16 ***
## education       0.042447   0.009588   4.427 9.55e-06 ***
## associationyes -0.124300   0.076966  -1.615    0.106
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 3907.8  on 3015  degrees of freedom
## AIC: 3917.8
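The significant predictors are arsenic, distance, and education. Association is not significant at the 0.05 level (p = 0.106), but the stepwise search retains it because dropping it would raise the AIC from 3917.8 to 3918.4. The AIC of the selected model is 3917.8.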
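The AIC column of the stepwise table can be reproduced from the deviance column, since for ungrouped binary responses AIC = deviance + 2k, where k is the number of estimated coefficients. The sketch below retypes the printed deviances by hand; the data frame and its column names are ours, introduced only for illustration.

```r
# Deviances copied from the stepwise output; n_coef counts the coefficients
# left in each candidate model (5 in the full model, 4 after dropping one term)
step_tab <- data.frame(
  dropped  = c("<none>", "association", "education", "distance", "arsenic"),
  deviance = c(3907.8, 3910.4, 3927.7, 3985.2, 4056.1),
  n_coef   = c(5, 4, 4, 4, 4),
  stringsAsFactors = FALSE
)

# AIC = deviance + 2 * (number of coefficients) for 0/1 responses
step_tab$aic <- step_tab$deviance + 2 * step_tab$n_coef

# The candidate with the smallest AIC is the one the search keeps
best <- step_tab$dropped[which.min(step_tab$aic)]
print(step_tab)
print(best)
```

No single deletion lowers the AIC, which is why the search stops at the full model.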
par(mfrow=c(4,3))
crPlots(switch.best,span=0.1)
## Warning in smoother(.x, partial.res[, var], col = col.lines[2], log.x =
## FALSE, : could not fit smooth
crPlots(switch.best,span=0.25)
crPlots(switch.best,span=0.75)
crPlots(switch.best,span=0.90)
After trying smoothing spans of 0.1, 0.25, 0.75, and 0.90, the component-plus-residual plots give no indication that a quadratic term should be added.
Q5
Wells$arsenic2<-(Wells$arsenic)^2
Wells$arseniclog<-log(Wells$arsenic)
quadratic model
glm4<-glm(switch~.-arseniclog-arsenic,family=binomial(logit),data=Wells)
summary(glm4)
##
## Call:
## glm(formula = switch ~ . - arseniclog - arsenic, family = binomial(logit),
##     data = Wells)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -3.1699  -1.2250   0.8184   1.0596   1.5998
##
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)     0.251526   0.086568   2.906  0.00367 **
## distance       -0.007872   0.001021  -7.711 1.25e-14 ***
## education       0.041400   0.009504   4.356 1.32e-05 ***
## associationyes -0.128019   0.076361  -1.676  0.09364 .
## arsenic2        0.080732   0.009229   8.747  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 3954.0  on 3015  degrees of freedom
## AIC: 3964
##
## Number of Fisher Scoring iterations: 4
logarithm model
glm5<-glm(switch~.-arsenic2-arsenic,family=binomial(logit),data=Wells)
summary(glm5)
##
## Call:
## glm(formula = switch ~ . - arsenic2 - arsenic, family = binomial(logit),
##     data = Wells)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.0742  -1.1760   0.7369   1.0382   1.8044
##
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)     0.372386   0.084997   4.381 1.18e-05 ***
## distance       -0.009791   0.001061  -9.227  < 2e-16 ***
## education       0.042740   0.009656   4.426 9.59e-06 ***
## associationyes -0.123718   0.077431  -1.598     0.11
## arseniclog      0.886959   0.068901  12.873  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 3875.6  on 3015  degrees of freedom
compare AIC
glm4$aic
## [1] 3963.983
glm5$aic
## [1] 3885.602
## Wald test:
## ----------
##
## Chi-squared test:
## X2 = 184.2, df = 5, P(> X2) = 0.0
## Wald test:
## ----------
##
## Chi-squared test:
## X2 = 265.5, df = 5, P(> X2) = 0.0
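The Wald chi-squared statistic is 184.2 for glm4 and 265.5 for glm5, both with df = 5, and both p-values print as 0.0, so each model is jointly significant. The printed p-values therefore cannot separate the two models, and since the models are not nested, AIC is the more appropriate basis for comparison. glm5 has the lower AIC (3885.6 versus 3964.0), so the log-arsenic model is preferred.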
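Both Wald p-values are far too small to be told apart at the printed precision. The sketch below recomputes the upper-tail chi-squared probabilities from the printed statistics and, separately, the AIC gap from the two printed AIC values. It works from the reported numbers only and does not rerun the tests; variable names are ours.

```r
# Upper-tail chi-squared probabilities for the printed Wald statistics (df = 5)
p_glm4 <- pchisq(184.2, df = 5, lower.tail = FALSE)
p_glm5 <- pchisq(265.5, df = 5, lower.tail = FALSE)
print(c(glm4 = p_glm4, glm5 = p_glm5))

# AIC values copied from glm4$aic and glm5$aic
aic <- c(glm4 = 3963.983, glm5 = 3885.602)

# A positive difference favours glm5 (lower AIC is better)
delta_aic <- aic[["glm4"]] - aic[["glm5"]]
print(delta_aic)
```

An AIC gap of this size is a much clearer basis for preferring glm5 than a comparison of two p-values that both round to zero.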
Q6 a.
summary(glm5)
##
## Call:
## glm(formula = switch ~ . - arsenic2 - arsenic, family = binomial(logit),
##     data = Wells)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.0742  -1.1760   0.7369   1.0382   1.8044
##
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)     0.372386   0.084997   4.381 1.18e-05 ***
## distance       -0.009791   0.001061  -9.227  < 2e-16 ***
## education       0.042740   0.009656   4.426 9.59e-06 ***
## associationyes -0.123718   0.077431  -1.598     0.11
## arseniclog      0.886959   0.068901  12.873  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 3875.6  on 3015  degrees of freedom
## AIC: 3885.6
##
## Number of Fisher Scoring iterations: 4
glm5$coefficients
##    (Intercept)       distance      education associationyes     arseniclog
##    0.372385812   -0.009791329    0.042740432   -0.123718388    0.886959380
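# Log odds of switching with every predictor held at its mean.
# Caution: as.numeric() on the factor Wells$association returns the level codes 1/2,
# not 0/1, so that term is shifted by one full associationyes coefficient;
# mean(Wells$association=="yes") would give the proportion of "yes" instead.
# The output below comes from the expression as written.
switch.logodds=(glm5$coefficients[1]+glm5$coefficients[2]*mean(Wells$distance)
+glm5$coefficients[3]*mean(Wells$education)+glm5$coefficients[4]*mean(as.numeric(Wells$association))
+glm5$coefficients[5]*mean(Wells$arseniclog))
switch.prob=exp(switch.logodds)/(exp(switch.logodds)+1)
switch.prob
## (Intercept)
##    0.551782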
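The log-odds expression applies `as.numeric()` to `Wells$association`, which is a factor (the coefficient is named `associationyes`). A small self-contained illustration of what that conversion returns is sketched below. The five-element factor is toy data we made up, and only the coefficient is taken from the glm5 output.

```r
# Toy factor with the same two levels as Wells$association (made-up values)
association <- factor(c("no", "yes", "yes", "no", "yes"))

# as.numeric() on a factor returns the internal level codes: no = 1, yes = 2
mean_codes <- mean(as.numeric(association))

# Proportion of "yes", which is what the 0/1 dummy variable averages to
prop_yes <- mean(association == "yes")

print(c(mean_codes = mean_codes, prop_yes = prop_yes))

# Effect on the log odds of using the codes instead of the dummy:
# the difference is exactly one associationyes coefficient
b_assoc <- -0.123718388
logodds_shift <- b_assoc * (mean_codes - prop_yes)
print(logodds_shift)
```

Because the mean of the level codes is always one more than the proportion of "yes", the log odds computed with `as.numeric()` are lower by one `associationyes` coefficient than the log odds at the true mean of the dummy variable.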
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.7823  -1.2004   0.7696   1.0816   1.8476
##
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)      -0.147868   0.117538  -1.258  0.20838
## distance         -0.005772   0.002092  -2.759  0.00579 **
## arsenic           0.555977   0.069319   8.021 1.05e-15 ***
## distance:arsenic -0.001789   0.001023  -1.748  0.08040 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 3927.6  on 3016  degrees of freedom
## AIC: 3935.6
The coefficients for distance and arsenic are interpreted as before, except that each now describes the effect of its predictor when the other predictor equals zero. The interaction coefficient (-0.001789) is the change in the distance slope, on the log-odds scale, for each one-unit increase in arsenic (equivalently, the change in the arsenic slope per unit of distance). We do not consider the interaction meaningful, because its p-value (0.0804) is greater than 0.05.
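One way to read the interaction term is to compute the distance slope at several arsenic levels. The sketch below uses the distance and distance:arsenic coefficients from the summary above, typed in by hand; the arsenic levels are illustrative values we chose.

```r
# Coefficients copied from the interaction model output
b_dist <- -0.005772   # distance slope when arsenic = 0
b_int  <- -0.001789   # distance:arsenic interaction

# Illustrative arsenic levels (our choice)
arsenic_levels <- c(0.5, 1, 2, 4)

# Distance slope on the log-odds scale at each arsenic level
slope_dist <- b_dist + b_int * arsenic_levels
names(slope_dist) <- arsenic_levels
print(slope_dist)
```

The slope becomes more negative as arsenic rises, but since the interaction is not significant at the 0.05 level, this pattern should be read with caution.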
Q8 Centering distance and arsenic
Wells$centereddistance<-Wells$distance-mean(Wells$distance)
Wells$centeredarsenic<-Wells$arsenic-mean(Wells$arsenic)
glm7<-glm(switch~centereddistance*centeredarsenic+centereddistance+centeredarsenic,family=binomial(logit),data=Wells)
summary(glm7)
##
## Call:
## glm(formula = switch ~ centereddistance * centeredarsenic + centereddistance +
##     centeredarsenic, family = binomial(logit), data = Wells)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.7823  -1.2004   0.7696   1.0816   1.8476
##
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)                       0.351094   0.039852   8.810   <2e-16 ***
## centereddistance                 -0.008737   0.001048  -8.337   <2e-16 ***
## centeredarsenic                   0.469508   0.042074  11.159   <2e-16 ***
## centereddistance:centeredarsenic -0.001789   0.001023  -1.748   0.0804 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##     Null deviance: 4118.1  on 3019  degrees of freedom
## Residual deviance: 3927.6  on 3016  degrees of freedom
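Centering only reparameterizes the interaction model. The interaction coefficient is unchanged (-0.001789), and each main-effect coefficient shifts by the interaction times the mean of the other predictor. As a consistency check, the sketch below backs out the sample means implied by the two printed coefficient tables. All numbers are typed in by hand, the means are implied values rather than values read from the data, and the variable names are ours.

```r
# Uncentered interaction model
b_dist <- -0.005772
b_ars  <-  0.555977
b_int  <- -0.001789

# Centered model (glm7)
c_dist <- -0.008737
c_ars  <-  0.469508

# Substituting distance = centereddistance + mean(distance), and likewise for
# arsenic, gives:
#   c_dist = b_dist + b_int * mean(arsenic)
#   c_ars  = b_ars  + b_int * mean(distance)
implied_mean_arsenic  <- (c_dist - b_dist) / b_int
implied_mean_distance <- (c_ars - b_ars) / b_int

print(c(mean_arsenic = implied_mean_arsenic,
        mean_distance = implied_mean_distance))
```

After centering, each main-effect coefficient is the effect of its predictor when the other predictor is at its sample mean, which is usually easier to interpret than the effect at zero.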