Advanced Research Skills: GLMs II, Binomial Family

This document discusses using generalized linear models (GLMs) with binomial family distributions. It first provides an example analyzing the effects of dose on mortality for tobacco budworm moths. Higher doses were found to significantly increase odds of death. A second example analyzes factors related to erythrocyte sedimentation rate, finding fibrinogen level significantly predicts higher rates but globulin does not provide additional predictive value when included in the model. The document demonstrates fitting and interpreting GLMs for binomial response data.

Uploaded by

Diego Moreno
Copyright
© Attribution Non-Commercial (BY-NC)

Lecture 7

GLMs II
Binomial Family
Olivier MISSA, [email protected]
Advanced Research Skills
Outline
Continue our Introduction to Generalized Linear Models.
In this lecture: illustrate the use of GLMs for proportion and binary data.
Reminder
Binary & proportion data tend to follow the Binomial distribution:

$$\text{Mean} = n\,p, \qquad \text{Var} = n\,p\,(1-p)$$

The canonical link of this GLM family is the logit function:

$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right)$$

The variance reaches a maximum for intermediate values of p and a minimum at either 0% or 100%.
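As a small numerical check of this reminder, the R sketch below tabulates the binomial variance n p (1 - p) over a grid of p and evaluates the logit link. The batch size `n = 20`, the grid of p values and the helper name `logit` are illustrative assumptions, not part of the original slide.

```r
# Illustrative values (assumptions): 20 trials, p on a grid from 0 to 1
n <- 20
p <- seq(0, 1, by = 0.1)

# Binomial mean and variance for each value of p
mean_successes <- n * p
variance <- n * p * (1 - p)

# The variance peaks at the intermediate value of p and vanishes at 0 and 1
p_at_max <- p[which.max(variance)]
print(data.frame(p, mean_successes, variance))
print(p_at_max)

# Logit link written out by hand; qlogis() is base R's built-in equivalent
logit <- function(p) log(p / (1 - p))
print(c(by_hand = logit(0.25), built_in = qlogis(0.25)))
```

The table makes the shape of the variance function visible: it is symmetric around p = 0.5 and shrinks towards both ends of the probability scale.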
Three ways to work with binary data
In R, binary/proportion data can be entered into a model as a response in three different ways:
as a numeric vector (holding 0/1 outcomes, or the proportion of successes with the number of trials supplied as weights),
as a logical vector or a factor (TRUE, or any factor level other than the first, is counted as a success; the first factor level is counted as a failure),
as a two-column matrix (the first column holding the number of successes and the second column the number of failures).
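The three response formats can be compared with a minimal sketch. It reuses the budworm counts from the 1st Example with a dose-only model; the object names (`fit_matrix`, `fit_prop`, `fit_binary`, `dead_each`, `ldose_each`) are hypothetical, and the proportion form assumes the number of trials is passed through `weights`.

```r
# Budworm data from the 1st Example: dead moths out of 20 per batch
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
ldose <- rep(0:5, 2)          # log2 of the doses 1, 2, 4, 8, 16, 32
n <- rep(20, length(numdead)) # moths tested per batch

# 1. Two-column matrix: successes first, failures second
fit_matrix <- glm(cbind(numdead, n - numdead) ~ ldose, family = binomial)

# 2. Proportion of successes, with the number of trials given as weights
fit_prop <- glm(numdead / n ~ ldose, family = binomial, weights = n)

# 3. One 0/1 value per moth (1 = dead), expanded from the batch counts
dead_each <- unlist(lapply(numdead, function(k) rep(c(1, 0), times = c(k, 20 - k))))
ldose_each <- rep(ldose, each = 20)
fit_binary <- glm(dead_each ~ ldose_each, family = binomial)

# The coefficient estimates agree across the three formats
print(rbind(matrix = coef(fit_matrix),
            proportion = coef(fit_prop),
            binary = coef(fit_binary)))
```

The estimates coincide, but the residual deviance of the ungrouped 0/1 fit differs from the grouped fits because the saturated model it is compared against is different.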
1st Example
Toxicity to tobacco budworm (moth) of different doses of trans-cypermethrin.
Batches of 20 moths (of each sex) were put in contact for three days with increasing doses of the pyrethroid.

Number of dead moths out of 20 tested

         Dose (micrograms)
Sex       1   2   4   8  16  32
Male      1   4   9  13  18  20
Female    0   2   6  10  12  16
> (dose <- rep(2^(0:5), 2))
[1] 1 2 4 8 16 32 1 2 4 8 16 32
> numdead <- c(1,4,9,13,18,20,0,2,6,10,12,16)
> (sex <- factor( rep( c("M","F"), c(6,6) ) ))
[1] M M M M M M F F F F F F
Levels: F M
> SF <- cbind(numdead, numalive=20-numdead)
1st Example
> modb <- glm(SF ~ sex*dose, family=binomial)
> summary(modb)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.71578 0.32233 -5.323 1.02e-07 ***
sexM -0.21194 0.51523 -0.411 0.68082
dose 0.11568 0.02379 4.863 1.16e-06 ***
sexM:dose 0.18156 0.06692 2.713 0.00666 **
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.876 on 11 degrees of freedom
Residual deviance: 18.164 on 8 degrees of freedom
AIC: 56.275
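What is modelled is the proportion of successes. The residual deviance reported above is

$$D = 2 \sum_{i} \left[ y_i \ln\left(\frac{y_i}{\hat{y}_i}\right) + (n - y_i) \ln\left(\frac{n - y_i}{n - \hat{y}_i}\right) \right]$$

where the sum runs over the batches, $y_i$ is the observed number of successes in batch $i$, $\hat{y}_i$ the fitted number, and $n$ the number of trials per batch.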
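The binomial deviance can be checked by hand against what `glm` reports. The sketch below refits the `sex*dose` model from this slide and recomputes D from the observed and fitted counts; the helper `xlogy` and the other object names are hypothetical, and terms with a zero count are taken as contributing zero (the usual convention for 0 × ln 0).

```r
# Data and model as on the slides
dose <- rep(2^(0:5), 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex <- factor(rep(c("M", "F"), c(6, 6)))
n <- 20
fit <- glm(cbind(numdead, n - numdead) ~ sex * dose, family = binomial)

# Fitted number of deaths per batch
yhat <- n * fitted(fit)

# x * log(ratio), with the convention that a zero count contributes 0
xlogy <- function(x, ratio) ifelse(x == 0, 0, x * log(ratio))

# Deviance: twice the sum over batches of the success and failure terms
D <- 2 * sum(xlogy(numdead, numdead / yhat) +
             xlogy(n - numdead, (n - numdead) / (n - yhat)))

print(c(by_hand = D, from_glm = deviance(fit)))
```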
1st Example
> ldose <- log2(dose)
> modb2 <- glm(SF ~ sex*ldose, family=binomial)
> summary(modb2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.9935 0.5527 -5.416 6.09e-08 ***
sexM 0.1750 0.7783 0.225 0.822
ldose 0.9060 0.1671 5.422 5.89e-08 ***
sexM:ldose 0.3529 0.2700 1.307 0.191
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.8756 on 11 degrees of freedom
Residual deviance: 4.9937 on 8 degrees of freedom
AIC: 43.104

1st Example
> drop1(modb2, test="Chisq")
Single term deletions

Model:
SF ~ sex * ldose
Df Deviance AIC LRT Pr(Chi)
<none> 4.994 43.104
sex:ldose 1 6.757 42.867 1.763 0.1842

> modb3 <- update(modb2, ~ . - sex:ldose)
> summary(modb3)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***
sexM 1.1007 0.3558 3.093 0.00198 **
ldose 1.0642 0.1311 8.119 4.70e-16 ***
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.876 on 11 degrees of freedom
Residual deviance: 6.757 on 9 degrees of freedom
AIC: 42.867
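The LRT column of the `drop1` table is simply the difference between two residual deviances, referred to a chi-squared distribution. A minimal sketch using the rounded deviances printed above (variable names are hypothetical):

```r
# Residual deviances copied from the summaries above
dev_with_interaction <- 4.994     # SF ~ sex * ldose, 8 residual df
dev_without_interaction <- 6.757  # SF ~ sex + ldose, 9 residual df

# Likelihood-ratio statistic and its degrees of freedom
lrt <- dev_without_interaction - dev_with_interaction
df_diff <- 9 - 8

# Upper-tail chi-squared probability
p_value <- pchisq(lrt, df = df_diff, lower.tail = FALSE)
print(c(LRT = lrt, p = p_value))
```

Up to rounding, this reproduces the LRT of 1.763 and the p-value of 0.1842 shown for the `sex:ldose` term, which is why the interaction is dropped.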
1st Example
> drop1(modb3, test="Chisq")
Single term deletions

Model:
SF ~ sex + ldose
Df Deviance AIC LRT Pr(Chi)
<none> 6.757 42.867
sex 1 16.984 51.094 10.227 0.001384 **
ldose 1 118.799 152.909 112.042 < 2.2e-16 ***

> shapiro.test(residuals(modb3, type="deviance"))

Shapiro-Wilk normality test

data: residuals(modb3, type = "deviance")
W = 0.9666, p-value = 0.8725

1st Example
> par(mfrow=c(2,2))
> plot(modb3)
1st Example
> plot( c(0,1) ~ c(1,32), type="n", log="x",
xlab="dose", ylab="Probability")
> text(dose, numdead/20, labels=as.character(sex) )
> ld <- seq(1,32,0.5)
> lines (ld, predict(modb3, data.frame(ldose=log2(ld),
sex=factor(rep("M", length(ld)), levels=levels(sex))),
type="response") )
> lines (ld, predict(modb3, data.frame(ldose=log2(ld),
sex=factor(rep("F", length(ld)), levels=levels(sex))),
type="response"), lty=2, col="red" )
1st Example
> modbp <- glm(SF ~ sex*ldose,
family=binomial(link="probit"))
> AIC(modbp)
[1] 41.87836

> modbc <- glm(SF ~ sex*ldose,
family=binomial(link="cloglog"))
> AIC(modbc)
[1] 43.8663

> AIC(modb3)
[1] 42.86747
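The three links compared here differ only in how a probability is mapped to the linear-predictor scale. The sketch below writes each transformation out by hand and checks it against the link functions stored in R's `binomial()` family objects; the probabilities in `p` and the object names are illustrative assumptions.

```r
# Illustrative probabilities (assumption)
p <- c(0.1, 0.5, 0.9)

# The three link functions written out by hand
links <- rbind(
  logit = log(p / (1 - p)),
  probit = qnorm(p),
  cloglog = log(-log(1 - p))
)

# The same transformations as stored in the binomial family objects
built_in <- rbind(
  logit = binomial(link = "logit")$linkfun(p),
  probit = binomial(link = "probit")$linkfun(p),
  cloglog = binomial(link = "cloglog")$linkfun(p)
)

colnames(links) <- paste0("p=", p)
print(round(links, 3))
print(isTRUE(all.equal(links, built_in, check.attributes = FALSE)))
```

The logit and probit links are symmetric around p = 0.5, whereas the complementary log-log link is not, which is one reason the fits can differ in AIC.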
1st Example
> summary(modb3)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***
sexM 1.1007 0.3558 3.093 0.00198 **
ldose 1.0642 0.1311 8.119 4.70e-16 ***
---

These estimates are on the logit scale, $\log\left(\frac{p}{1-p}\right)$.

> exp(modb3$coeff) ## careful it may be misleading
(Intercept) sexM ldose
0.031019 3.006400 2.898560 ## odds scale: p / (1-p)

> exp(modb3$coeff[1]+modb3$coeff[2]) ## odds for males at dose 1 (ldose = 0)
(Intercept)
0.09325553

Every doubling of the dose will lead to an increase in the odds of dying over surviving by a factor of 2.899.
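To make the move from the logit scale to odds and probabilities concrete, the sketch below works directly from the rounded coefficients printed in the summary. The variable names are hypothetical, and dose 4 is chosen only as an illustration.

```r
# Rounded coefficients copied from summary(modb3)
b0 <- -3.4732      # intercept: females at ldose = 0 (dose 1)
b_sexM <- 1.1007   # shift for males on the logit scale
b_ldose <- 1.0642  # slope per doubling of the dose

# Males at dose 1: linear predictor, odds, and probability of dying
eta_male_dose1 <- b0 + b_sexM
odds_male_dose1 <- exp(eta_male_dose1)
prob_male_dose1 <- odds_male_dose1 / (1 + odds_male_dose1)  # same as plogis(eta)

# Odds ratio for one doubling of the dose
odds_ratio_per_doubling <- exp(b_ldose)

# Males at dose 4 (two doublings): the odds are multiplied twice
odds_male_dose4 <- odds_male_dose1 * odds_ratio_per_doubling^2

print(c(odds_dose1 = odds_male_dose1, prob_dose1 = prob_male_dose1,
        OR_doubling = odds_ratio_per_doubling, odds_dose4 = odds_male_dose4))
```

The multiplicative effect applies to the odds, not to the probability itself, which is why exponentiated coefficients can mislead if read as changes in probability.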
2nd Example
Erythrocyte Sedimentation Rate (ESR) in a group of patients.
Two groups: ESR < 20 mm/hour (healthy) or ESR > 20 mm/hour (ill).
Q: Is it related to globulin & fibrinogen levels in the blood?
> data("plasma", package="HSAUR")
> str(plasma)
'data.frame': 32 obs. of 3 variables:
$ fibrinogen: num 2.52 2.56 2.19 2.18 3.41 2.46 3.22 2.21 ...
$ globulin : int 38 31 33 31 37 36 38 37 39 41 ...
$ ESR : Factor w/ 2 levels "ESR < 20","ESR > 20": 1 1 ...
> summary(plasma)
fibrinogen globulin ESR
Min. :2.090 Min. :28.00 ESR < 20:26
1st Qu.:2.290 1st Qu.:31.75 ESR > 20: 6
Median :2.600 Median :36.00
Mean :2.789 Mean :35.66
3rd Qu.:3.167 3rd Qu.:38.00
Max. :5.060 Max. :46.00
2nd Example
> stripchart(globulin ~ ESR, vertical=T, data=plasma,
xlab="Erythrocyte Sedimentation Rate (mm/hr)",
ylab="Globulin blood level", method="jitter" )

> stripchart(fibrinogen ~ ESR, vertical=T, data=plasma,
xlab="Erythrocyte Sedimentation Rate (mm/hr)",
ylab="Fibrinogen blood level", method="jitter" )
2nd Example
> mod1 <- glm(ESR~fibrinogen, data=plasma, family=binomial)
> summary(mod1)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.8451 2.7703 -2.471 0.0135 *
fibrinogen 1.8271 0.9009 2.028 0.0425 *
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 30.885 on 31 degrees of freedom
Residual deviance: 24.840 on 30 degrees of freedom
AIC: 28.840

> mod2 <- glm(ESR~fibrinogen+globulin, data=plasma,
family=binomial)
> AIC(mod2)
[1] 28.97111
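Here the response (ESR) is a factor: its first level (ESR < 20) is treated as a failure, so the model describes the probability of ESR > 20.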
2nd Example
> anova(mod1, mod2, test="Chisq")
Analysis of Deviance Table

Model 1: ESR ~ fibrinogen
Model 2: ESR ~ fibrinogen + globulin
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 30 24.8404
2 29 22.9711 1 1.8692 0.1716

> summary(mod2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -12.7921 5.7963 -2.207 0.0273 *
fibrinogen 1.9104 0.9710 1.967 0.0491 *
globulin 0.1558 0.1195 1.303 0.1925
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 30.885 on 31 degrees of freedom
Residual deviance: 22.971 on 29 degrees of freedom
AIC: 28.971
The difference in deviance between these models is not significant, which leads us to select the least complex model (mod1).
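The AIC values of `mod1` and `mod2` can be reproduced from the residual deviances alone. For a binary (0/1) response the saturated model has a log-likelihood of zero, so the residual deviance equals minus twice the log-likelihood and AIC is the deviance plus twice the number of coefficients. The helper name `aic_binary` below is hypothetical.

```r
# AIC for a binary-response GLM: residual deviance + 2 * number of coefficients
aic_binary <- function(residual_deviance, n_coefficients) {
  residual_deviance + 2 * n_coefficients
}

# Values copied from the summaries above
aic_mod1 <- aic_binary(24.840, 2)  # intercept + fibrinogen
aic_mod2 <- aic_binary(22.971, 3)  # intercept + fibrinogen + globulin
print(c(mod1 = aic_mod1, mod2 = aic_mod2))
```

This shortcut holds only for ungrouped binary data. In the 1st Example the response is a count out of 20, and the AIC of 42.867 is not the residual deviance of 6.757 plus 2 × 3.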
2nd Example
> shapiro.test(residuals(mod1, type="deviance"))
Shapiro-Wilk normality test

data: residuals(mod1, type = "deviance")
W = 0.6863, p-value = 5.465e-07
> par(mfrow=c(2,2))
> plot(mod1)
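The very small Shapiro-Wilk p-value need not indicate a poor model. With a 0/1 response, each deviance residual is positive when the event occurred and negative when it did not, so the residuals form two separate clusters whatever the quality of the fit. The simulation below is a sketch under assumed settings (sample size, predictor range, true coefficients and all object names are hypothetical): the fitted model is the true one, yet the test is still expected to reject normality.

```r
# Simulate binary data from a correctly specified logistic model (assumed settings)
set.seed(1)
n_obs <- 200
x <- runif(n_obs, min = -2, max = 2)
y <- rbinom(n_obs, size = 1, prob = plogis(x))

# Fit the true model and extract deviance residuals
fit <- glm(y ~ x, family = binomial)
dev_res <- residuals(fit, type = "deviance")

# Residuals split into a negative cluster (y = 0) and a positive cluster (y = 1)
print(tapply(dev_res, y, range))

# Normality test on the deviance residuals
sw <- shapiro.test(dev_res)
print(sw)
```

For grouped data such as the batches of 20 moths in the 1st Example, deviance residuals can be closer to normal, which is consistent with the non-significant test result obtained for `modb3`.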
