26GeneralizedLinearModelBernoulliAnnotated PDF
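In the normal linear model,
$$\forall\, i = 1, \ldots, n, \quad y_i \sim N(\mu_i, \sigma^2), \quad \mu_i = x_i'\beta.$$
In the Bernoulli model,
$$\forall\, i = 1, \ldots, n, \quad y_i \sim \text{Bernoulli}(\pi_i), \quad \pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)},$$
and $y_1, \ldots, y_n$ are independent.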
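To make the Bernoulli model concrete, the following is a minimal R sketch that simulates independent responses from it. The sample size, covariate, and coefficient values are made-up assumptions for illustration; they are not taken from the disease data set.

```r
# Minimal sketch: simulate y_i ~ Bernoulli(pi_i) with
# pi_i = exp(x_i'beta) / (1 + exp(x_i'beta)).
# All numbers below are hypothetical illustration values.
set.seed(1)
n <- 1000
X <- cbind(1, runif(n, 0, 80))       # intercept plus one hypothetical covariate
beta <- c(-2, 0.03)                  # hypothetical coefficient vector
eta <- drop(X %*% beta)              # linear predictor x_i'beta
p <- exp(eta) / (1 + exp(eta))       # success probabilities pi_i
y <- rbinom(n, size = 1, prob = p)   # independent Bernoulli draws
```

Each `y[i]` is 0 or 1, and base R's `plogis(eta)` returns the same probabilities as the explicit formula.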
age: age in years
ses: socioeconomic status (1 = upper, 2 = middle, 3 = lower)
sector: sector (1 or 2)
$$\Pr(y = k) = f(k) = \begin{cases} \pi^k (1 - \pi)^{1-k} & \text{for } k \in \{0, 1\} \\ 0 & \text{otherwise} \end{cases}$$
Thus,
$$\Pr(y = 0) = f(0) = \pi^0 (1 - \pi)^{1-0} = 1 - \pi$$
and
$$\Pr(y = 1) = f(1) = \pi^1 (1 - \pi)^{1-1} = \pi.$$
Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 11 / 46
The variance of y is a function of the mean of y.
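$$E(y) = \sum_{k=0}^{1} k f(k) = 0 \cdot (1 - \pi) + 1 \cdot \pi = \pi$$
$$E(y^2) = \sum_{k=0}^{1} k^2 f(k) = 0^2 \cdot (1 - \pi) + 1^2 \cdot \pi = \pi$$
$$\text{Var}(y) = E(y^2) - \{E(y)\}^2 = \pi - \pi^2 = \pi(1 - \pi)$$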
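As a numerical check of the moment calculations above, this short R sketch sums over the two support points of the Bernoulli mass function and computes the variance as $E(y^2) - \{E(y)\}^2$. The function name `bernoulli_moments` and the value 0.3 are illustrative assumptions.

```r
# Sketch: first two moments of a Bernoulli(p) variable from its mass function.
# bernoulli_moments is a hypothetical helper name used only for illustration.
bernoulli_moments <- function(p) {
  k <- c(0, 1)
  f <- p^k * (1 - p)^(1 - k)         # f(0) = 1 - p, f(1) = p
  m1 <- sum(k * f)                   # E(y)
  m2 <- sum(k^2 * f)                 # E(y^2)
  c(mean = m1, var = m2 - m1^2)      # Var(y) = E(y^2) - E(y)^2
}
m <- bernoulli_moments(0.3)
```

For `p = 0.3` the variance works out to `0.3 * 0.7`, so the variance is determined by the mean.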
For $i = 1, \ldots, n$, $y_i \sim \text{Bernoulli}(\pi_i)$, where
$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$
The function
$$g(\pi) = \log\left(\frac{\pi}{1 - \pi}\right)$$
is called the logit function.
$$g(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \log[\exp(x_i'\beta)] = x_i'\beta.$$
$y_i \sim \text{Bernoulli}(\pi_i)$, where
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i'\beta.$$
In Generalized Linear Models terminology, the logit is
called the link function because it “links” the mean of
$y_i$ (i.e., $\pi_i$) to the linear predictor $x_i'\beta$.
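The relationship between the mean and the linear predictor can be checked numerically. The sketch below defines the logit and its inverse by hand (the names `logit` and `inv_logit` and the values in `eta` are illustrative assumptions) and confirms that applying the logit to the inverse-logit probabilities recovers the linear predictor.

```r
# Sketch: the logit link and its inverse, written out explicitly.
logit <- function(p) log(p / (1 - p))
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

eta <- c(-2, 0, 1.5)                 # hypothetical linear predictor values
p <- inv_logit(eta)                  # corresponding means pi
eta_back <- logit(p)                 # maps the means back to eta
```

Base R provides the same pair as `qlogis` (logit) and `plogis` (inverse logit), and `binomial(link = "logit")$linkfun` is the link function object that `glm` uses.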
Complementary log-log (cloglog in R):
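The formula for this link did not survive in the text here, so the sketch below relies on the standard definition $g(\pi) = \log\{-\log(1 - \pi)\}$ and compares it with the link object R builds for `binomial(link = "cloglog")`. The helper name `cloglog` and the probabilities are illustrative assumptions.

```r
# Sketch: complementary log-log link, using the standard definition
# g(p) = log(-log(1 - p)) (an assumption, since the slide formula is missing).
cloglog <- function(p) log(-log(1 - p))

p <- c(0.1, 0.5, 0.9)                # hypothetical probabilities
fam <- binomial(link = "cloglog")    # R's family object with the cloglog link
eta <- fam$linkfun(p)                # R's version of the link
p_back <- fam$linkinv(eta)           # inverse link returns the probabilities
```

Unlike the logit, this link is not symmetric about $\pi = 0.5$.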
$$\hat\beta \;\dot\sim\; N\big(\beta,\ \hat{I}^{-1}(\hat\beta)\big)$$
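One practical use of this approximate normality is a Wald-type confidence interval for each coefficient. The sketch below fits a logistic regression to simulated data (all data-generating values are made-up assumptions) and takes standard errors from the diagonal of `vcov(fit)`, which for a `glm` fit is the estimated inverse information matrix.

```r
# Sketch: Wald intervals based on the approximate normality of beta-hat.
# The simulated data and true coefficients are hypothetical.
set.seed(2)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1 * x))
fit <- glm(y ~ x, family = binomial(link = logit))

est <- coef(fit)
se <- sqrt(diag(vcov(fit)))          # square roots of the diagonal of I-hat^{-1}
wald <- cbind(lower = est - qnorm(0.975) * se,
              upper = est + qnorm(0.975) * se)
```

These intervals differ slightly from the profile-likelihood intervals that `confint` reports for `glm` fits.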
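Let $\pi = \dfrac{\exp(x'\beta)}{1 + \exp(x'\beta)}$ and $\tilde\pi = \dfrac{\exp(\tilde{x}'\beta)}{1 + \exp(\tilde{x}'\beta)}$, where $\tilde{x}$ equals $x$ except that its $j$th element is $x_j + 1$. Then
$$\begin{aligned}
\frac{\tilde\pi}{1 - \tilde\pi} \Big/ \frac{\pi}{1 - \pi}
&= \exp\left\{ \log\left( \frac{\tilde\pi}{1 - \tilde\pi} \Big/ \frac{\pi}{1 - \pi} \right) \right\} \\
&= \exp\left\{ \log\left( \frac{\tilde\pi}{1 - \tilde\pi} \right) - \log\left( \frac{\pi}{1 - \pi} \right) \right\} \\
&= \exp\{ \tilde{x}'\beta - x'\beta \} \\
&= \exp\{ (x_j + 1)\beta_j - x_j \beta_j \} \\
&= \exp\{ \beta_j \}.
\end{aligned}$$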
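The derivation above can be verified numerically: raise one explanatory variable by one unit, hold the rest fixed, and compare the ratio of odds with $\exp\{\beta_j\}$. The coefficient vector and covariate values below are hypothetical.

```r
# Sketch: numerical check that a one-unit increase in x_j multiplies
# the odds by exp(beta_j). All values are hypothetical.
beta <- c(-2.2, 0.03, 1.2)
x <- c(1, 40, 1)
x_tilde <- x
x_tilde[2] <- x[2] + 1               # x_tilde equals x except x_j + 1 (here j = 2)

odds <- function(x) {
  p <- plogis(sum(x * beta))         # pi = exp(x'beta) / (1 + exp(x'beta))
  p / (1 - p)
}
odds_ratio <- odds(x_tilde) / odds(x)
```

The result does not depend on the starting value of `x`, which is why $\exp\{\beta_j\}$ can be interpreted on its own.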
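If $(L_j, U_j)$ is a $100(1 - \alpha)\%$ confidence interval for $\beta_j$, so that $L_j \le \beta_j \le U_j$ exactly when $\exp(L_j) \le \exp(\beta_j) \le \exp(U_j)$,
then
$(\exp(L_j), \exp(U_j))$
is a $100(1 - \alpha)\%$ confidence interval for $\exp(\beta_j)$.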
> head(d)
  id age ses sector disease savings
1  1  33   1      1       0       1
2  2  35   1      1       0       1
3  3   6   1      1       0       0
4  4  60   1      1       0       1
5  5  18   3      1       1       0
6  6  26   3      1       0       0
>
Annotation (next to the savings column): ignore this variable in this example.
> d$ses=factor(d$ses)
> d$sector=factor(d$sector)
Call:
glm(formula = disease ~ age + ses + sector,
    family = binomial(link = logit),
    data = d)

Annotation: we will learn about these later.
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6576  -0.8295  -0.5652   1.0092   2.0842
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
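Annotation: full model, $\text{logit}(\pi_i) = x_i'\beta$.
AIC: 221.22
Annotations: the residual deviance is $-2\ell(\hat\beta)$ on $n - 5$ degrees of freedom, and AIC $= -2\ell(\hat\beta) + 2(5)$.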
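The AIC reported by `glm` can be reproduced by hand as $-2\ell(\hat\beta) + 2p$, where $p$ is the number of estimated coefficients. The sketch below checks this on simulated 0/1 data (the data-generating values are assumptions) and then repeats the arithmetic with the numbers printed in this example: a residual deviance of 211.22 and 5 coefficients.

```r
# Sketch: AIC = -2 * log-likelihood + 2 * (number of coefficients).
# The simulated data are hypothetical; only the last line uses slide values.
set.seed(3)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x))
fit <- glm(y ~ x, family = binomial(link = logit))

p <- length(coef(fit))
loglik <- as.numeric(logLik(fit))
aic_by_hand <- -2 * loglik + 2 * p   # should agree with AIC(fit)

# With 0/1 responses the residual deviance equals -2 * log-likelihood,
# so the full model's AIC can be recovered from its residual deviance.
aic_full_model <- 211.22 + 2 * 5
```

The last line gives 221.22, matching the AIC in the output.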
> round(vcov(o),3)
            (Intercept)    age   ses2   ses3 sector2
(Intercept)       0.191 -0.002 -0.083 -0.102  -0.080
age              -0.002  0.000  0.000  0.000   0.000
ses2             -0.083  0.000  0.187  0.072   0.003
ses3             -0.102  0.000  0.072  0.164   0.039
sector2          -0.080  0.000  0.003  0.039   0.124
Annotation: this matrix is $\widehat{\text{Var}}(\hat\beta)$.
> confint(o)
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) -3.19560769 -1.47574975
age 0.01024152 0.04445014
ses2 -0.81499026 0.89014587
ses3 -0.53951033 1.05825383
sector2 0.56319260 1.94992969
Annotation: reduced model with $x_i' = [1, \text{age}_i, \text{sector}_i]$.
> oreduced=glm(disease~age+sector,
+ family=binomial(link=logit),
+ data=d)
>
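> anova(oreduced,o,test="Chisq")
Analysis of Deviance Table

Model 1: disease ~ age + sector
Model 2: disease ~ age + ses + sector
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1       193     211.64
2       191     211.22  2   0.4193   0.8109

Annotations: the null hypothesis is $H_0: \beta_3 = \beta_4 = 0$, i.e., $H_0: \beta_{ses2} = \beta_{ses3} = 0$. The residual degrees of freedom are $n - 3 = 193$ for the reduced model and $n - 5 = 191$ for the full model. The residual deviances are $-2\ell(\hat\beta_r) = 211.64$ and $-2\ell(\hat\beta_f) = 211.22$; their difference, 0.4193, has $5 - 3 = 2$ degrees of freedom. The p-value is $P(\chi^2_2 \ge 0.4193) = 0.8109$, which is not significant.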
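The p-value in this table can be reproduced from the printed quantities alone. The sketch below uses only numbers shown in the output above; the variable names are illustrative.

```r
# Sketch: likelihood ratio test of the reduced model against the full model,
# using the residual deviances and degrees of freedom printed by anova().
dev_reduced <- 211.64
dev_full <- 211.22
lrt_rounded <- dev_reduced - dev_full   # 0.42 from rounded deviances; R reports 0.4193
df <- 193 - 191                         # difference in residual degrees of freedom

# Upper-tail chi-square probability for the reported test statistic.
p_value <- pchisq(0.4193, df = df, lower.tail = FALSE)
```

`p_value` rounds to 0.8109, in agreement with the `Pr(>Chi)` column.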
> o=oreduced
> anova(o,test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: disease

       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                     195     236.33
age     1   12.013       194     224.32 0.0005283 ***
sector  1   12.677       193     211.64 0.0003702 ***

Annotations: the p-values are $P(\chi^2_1 \ge 12.013)$ for age and $P(\chi^2_1 \ge 12.677)$ for sector.
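Each p-value in the sequential table is an upper-tail chi-square probability with 1 degree of freedom, evaluated at the drop in residual deviance when the term is added. A short sketch using only the printed values:

```r
# Sketch: reproduce the sequential analysis-of-deviance p-values
# from the printed deviance drops (each on 1 degree of freedom).
drop_age <- 236.33 - 224.32             # about 12.01; R reports 12.013
drop_sector <- 224.32 - 211.64          # about 12.68; R reports 12.677

p_age <- pchisq(12.013, df = 1, lower.tail = FALSE)
p_sector <- pchisq(12.677, df = 1, lower.tail = FALSE)
```

Small differences from the printed p-values can arise because the deviances shown are rounded.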
> head(model.matrix(o))
(Intercept) age sector2
1 1 33 0
2 1 35 0
3 1 6 0
4 1 60 0
5 1 18 0
6 1 26 0
>
> b=coef(o)
> b
(Intercept) age sector2
-2.15965912 0.02681289 1.18169345
> ci=confint(o)
Waiting for profiling to be done...
> ci
2.5 % 97.5 %
(Intercept) -2.86990940 -1.51605906
age 0.01010532 0.04421365
sector2 0.52854584 1.85407936
> exp(b[2])
     age
1.027176
> #All else equal, the odds of disease are about 1.027
> #times greater for someone age x+1 than for someone
> #age x. An increase of one year in age is associated
> #with an increase in the odds of disease by about 2.7%.
> #A 95% confidence interval for the multiplicative
> #increase factor is
> exp(ci[2,])
2.5 % 97.5 %
1.010157 1.045206
> exp(b[3])
 sector2
 3.25989
> #All else equal, the odds of disease are about 3.26
> #times greater for someone living in sector 2 than for
> #someone living in sector 1.
> #A 95% confidence interval for the multiplicative
> #increase factor is
> exp(ci[3,])
2.5 % 97.5 %
1.696464 6.385816
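The same interpretation extends to differences other than one unit: a difference of $c$ years in age multiplies the odds by $\exp\{c\,\beta_{age}\}$. The sketch below re-enters the printed coefficient estimates by hand; the 10-year comparison is an added illustration, not something computed in the original output.

```r
# Sketch: odds multipliers implied by the printed coefficient estimates.
b <- c("(Intercept)" = -2.15965912, age = 0.02681289, sector2 = 1.18169345)

or_age_1yr <- exp(b["age"])             # odds multiplier per additional year
pct_increase <- 100 * (or_age_1yr - 1)  # percent increase in odds per year
or_age_10yr <- exp(10 * b["age"])       # hypothetical 10-year comparison
or_sector <- exp(b["sector2"])          # sector 2 versus sector 1
```

Because the model is linear on the log-odds scale, `or_age_10yr` equals `or_age_1yr^10`.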
Annotation: $x = [1, 40, 1]'$.
> x=c(1,40,1)
> 1/(1+exp(-t(x)%*%b))
          [,1]
[1,] 0.5236198
Annotation: this computes $1 / \{1 + \exp(-x'\hat\beta)\}$.
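Given the model matrix columns (Intercept, age, sector2), the vector `c(1,40,1)` corresponds to a 40-year-old living in sector 2. The sketch below repeats the calculation with the printed coefficients typed in by hand and with `plogis`, which is base R's inverse logit.

```r
# Sketch: estimated probability of disease at age 40 in sector 2,
# using the coefficient estimates printed above.
b <- c(-2.15965912, 0.02681289, 1.18169345)
x <- c(1, 40, 1)                        # intercept, age = 40, sector2 = 1

eta_hat <- sum(x * b)                   # estimated linear predictor x'b
p_hat <- plogis(eta_hat)                # same as 1 / (1 + exp(-eta_hat))
```

`p_hat` agrees with the value 0.5236198 shown in the output.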
> #Approximate 95% confidence interval
> #for the probability in question.
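> sexb=sqrt(t(x)%*%vcov(o)%*%x)
> cixb=c(t(x)%*%b-2*sexb,t(x)%*%b+2*sexb)
> 1/(1+exp(-cixb))
[1] 0.3965921 0.6476635
Annotations: sexb is $\sqrt{x'\,\widehat{\text{Var}}(\hat\beta)\,x}$, and cixb is $x'\hat\beta \pm 2\sqrt{x'\,\widehat{\text{Var}}(\hat\beta)\,x}$.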
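The interval above is built on the linear predictor scale and then mapped through the inverse logit, which keeps both endpoints between 0 and 1. Because the covariance matrix of the reduced model is not printed here, the sketch below illustrates the same steps on simulated data (all data-generating values are assumptions) and cross-checks them against `predict` with `se.fit = TRUE`. It uses `qnorm(0.975)` where the transcript uses the rounder multiplier 2.

```r
# Sketch: approximate 95% interval for a fitted probability.
# The simulated data stand in for the disease data, which are not available here.
set.seed(4)
n <- 300
age <- runif(n, 0, 80)
sector2 <- rbinom(n, 1, 0.4)
y <- rbinom(n, 1, plogis(-2 + 0.03 * age + 1.2 * sector2))
fit <- glm(y ~ age + sector2, family = binomial(link = logit))

x <- c(1, 40, 1)                                   # age 40, sector 2
eta_hat <- sum(x * coef(fit))                      # x'b
se_eta <- sqrt(drop(t(x) %*% vcov(fit) %*% x))     # sqrt(x' Var-hat(b) x)
ci_eta <- eta_hat + c(-1, 1) * qnorm(0.975) * se_eta
ci_prob <- plogis(ci_eta)                          # back-transform the endpoints

# Cross-check: predict() returns the same fit and standard error on the link scale.
pr <- predict(fit, newdata = data.frame(age = 40, sector2 = 1),
              type = "link", se.fit = TRUE)
```

Back-transforming the endpoints, rather than adding and subtracting a standard error on the probability scale, is what guarantees a valid probability interval.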
[Figure: Estimated Probability of Disease (vertical axis, 0.0 to 0.8) versus Age (horizontal axis, 0 to 80), with separate curves for Sector 1 and Sector 2.]
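A plot of this kind can be drawn from the printed coefficient estimates. The sketch below assumes the figure shows fitted probabilities from the reduced model (age and sector only); the line types and legend position are arbitrary choices, not taken from the original figure.

```r
# Sketch: fitted probability curves by sector, assuming the reduced model
# with the coefficient estimates printed earlier.
b <- c(-2.15965912, 0.02681289, 1.18169345)
age <- seq(0, 80, by = 1)

p_sector1 <- plogis(b[1] + b[2] * age)          # sector2 indicator = 0
p_sector2 <- plogis(b[1] + b[2] * age + b[3])   # sector2 indicator = 1

plot(age, p_sector1, type = "l", lty = 1, ylim = c(0, 0.8),
     xlab = "Age", ylab = "Estimated Probability of Disease")
lines(age, p_sector2, lty = 2)
legend("topleft", legend = c("Sector 1", "Sector 2"), lty = c(1, 2))
```

The two curves are parallel on the log-odds scale, separated by the sector2 coefficient, but not on the probability scale.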