GLM Ohp
Generalized linear models in R
Dr Peter K Dunn
http://www.usq.edu.au
Department of Mathematics and Computing
University of Southern Queensland
ASC, July 2008

Regression models
The usual linear regression models assume data come from a Normal distribution, with the mean related to predictors.
Generalized linear models (GLMs) assume data come from some distribution, with a function of the mean related to predictors.

Model              Randomness     Structure
Regression model   Y ∼ N(µ, φ)    µ = Xβ
GLM                Y ∼ P(µ, φ)    g(µ) = Xβ
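The relationship between the two rows of the table can be checked numerically. The following is a minimal sketch on simulated data (the variables x and y are made up for illustration): a Normal GLM with an identity link should reproduce the ordinary regression fit.

```r
# Simulated data (hypothetical; not from the talk)
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)

# Ordinary linear regression: Y ~ N(mu, phi), mu = X beta
fit.lm <- lm(y ~ x)

# The same model written as a GLM: Normal distribution, identity link
fit.glm <- glm(y ~ x, family = gaussian(link = "identity"))

# The two sets of coefficient estimates agree
all.equal(coef(fit.lm), coef(fit.glm))
```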
Regression-type models Examples Using R R examples Regression-type models Examples Using R R examples
Examples

Example
Counts may be modelled using a Poisson distribution
Usually, use a log link
Define µ = E[Y] as the expected count
The model is
  Yi ∼ Poisson(µi)    (random)
  log µi = Xβ         (systematic)
The log link ensures µ = exp(Xβ) is always positive
The log link means the effect of the covariates xj on µ is multiplicative, not additive

Example
Proportions may be modelled using a binomial distribution
Often, use a logit link (to get a logistic regression model)
Define µ = E[Y] as the expected proportion
The model is
  Yi ∼ Binomial(µi)    (random)
  logit(µi) = Xβ       (systematic)
that is,
  Yi ∼ Binomial(µi)            (random)
  log( µi / (1 − µi) ) = Xβ    (systematic)
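The two properties of the link functions stated in these examples can be illustrated with a short sketch. The data are simulated and the coefficient values 0.5 and 0.8 are arbitrary choices; qlogis and plogis are base R's logit and inverse logit.

```r
# Simulated counts (hypothetical data, for illustration only)
set.seed(2)
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 0.8 * x))

fit <- glm(y ~ x, family = poisson(link = "log"))
b <- coef(fit)

# Fitted means at x = 1 and x = 2
mu1 <- predict(fit, newdata = data.frame(x = 1), type = "response")
mu2 <- predict(fit, newdata = data.frame(x = 2), type = "response")

# Increasing x by one unit multiplies the fitted mean by exp(b["x"])
ratio <- unname(mu2 / mu1)
all.equal(ratio, unname(exp(b["x"])))

# The logit link and its inverse
mu  <- 0.2
eta <- qlogis(mu)                   # log(mu / (1 - mu))
all.equal(eta, log(mu / (1 - mu)))
all.equal(plogis(eta), mu)          # the back-transform lies in (0, 1)
```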
What link function can I choose?
(default = the family's default link; yes = also available)

Link function                gaussian   binomial   poisson
identity   µ = η             default               yes
log        log µ = η         yes                   default
inverse    1/µ = η           yes
sqrt       √µ = η                                  yes
logit      logit(µ) = η                 default
probit     probit(µ) = η                yes
cauchit    cauchit(µ) = η               yes
cloglog    cloglog(µ) = η               yes

What link function can I choose?

Link function                Gamma      inverse.gaussian
identity   µ = η             yes        yes
log        log µ = η         yes        yes
inverse    1/µ = η           default    yes
1/mu^2     1/µ² = η                     default
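The default links can be read off the family objects in R. This is a small sketch using only base R; the object names fams, defaults, sq and cl are introduced here for illustration.

```r
# Default link for each family
fams <- list(gaussian(), binomial(), poisson(), Gamma(), inverse.gaussian())
defaults <- sapply(fams, function(f) f$link)
names(defaults) <- sapply(fams, function(f) f$family)
defaults

# A non-default link is requested through the family function
sq <- poisson(link = "sqrt")
sq$linkfun(4)   # eta = sqrt(mu)
sq$linkinv(2)   # mu = eta^2

# make.link() gives the link and inverse link functions directly
cl <- make.link("cloglog")
cl$linkinv(cl$linkfun(0.3))   # round trip back to mu
```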
In R. . .
To fit a glm in R, we need to specify:
  The linear predictor: x1+x2+log(x3)
  The distribution: family=poisson
  The link function: link="log"
They work together like this:
  glm( y ~ x1 + x2 + log(x3),
       family=poisson(link = "log") )

Glms in R?
Fitting glms is locally like fitting a standard regression model
So most regression concepts have (approximate) analogies for glms
For example, R allows the user to:
  fit glms (use glm)
  find important predictors (F-tests using anova; t-tests using summary)
  compute residuals (using resid; quantile residuals in package statmod strongly recommended: qresid)
  perform diagnostics (using plot, hatvalues, cooks.distance, etc.)
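The steps listed above can be sketched end to end on simulated data. The variable names follow the slide's formula, but the data and coefficient values are made up. For a Poisson model the dispersion is fixed, so this sketch asks anova for chi-square tests, and summary reports z rather than t statistics. Quantile residuals from statmod are left out so that the sketch needs only base R.

```r
# Simulated data (hypothetical; variable names follow the slide's formula)
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- runif(n, 1, 10)
y  <- rpois(n, lambda = exp(0.3 + 0.5 * x1 + 0.4 * log(x3)))  # x2 has no effect

fit <- glm(y ~ x1 + x2 + log(x3), family = poisson(link = "log"))

anova(fit, test = "Chisq")   # sequential tests, terms added in order
summary(fit)                 # Wald tests for each coefficient

r.dev <- resid(fit, type = "deviance")  # deviance residuals (the default type)
h     <- hatvalues(fit)                 # leverages
cd    <- cooks.distance(fit)            # Cook's distances

# plot(fit) would draw the standard diagnostic plots
```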
Example
What predictors are significant? Sequential test:
> anova(dep.full, test = "Chisq")
Analysis of Deviance Table
Model: poisson, link: log
Response: Counts

Example
What predictors are significant? Post-fit test:
> summary(dep.full)
Call:
glm(formula = Counts ~ D * S * C, family = poisson(link = log),
    data = dep)
Deviance Residuals:
[1] 0 0 0 0 0 0 0 0
family=poisson(link=log) )
Note that S * D means S + D and the interaction S : D
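The expansion of the * operator can be inspected without fitting a model. This is a minimal sketch using terms() from base R; the names two.way and three.way are introduced here.

```r
# Term labels produced by expanding the formula operators
two.way   <- attr(terms(~ S * D), "term.labels")
three.way <- attr(terms(~ D * S * C), "term.labels")

two.way     # main effects S and D, plus the interaction S:D
three.way   # main effects, all two-way interactions, and D:S:C
```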
[Figure: hatvalues(dep.opt) against Index]
[Figures: cooks.distance(dep.opt); Q-Q plot with Sample Quantiles on the y-axis]
[Figures: diagnostic plots, with panels labelled Residuals and Cook's distance]
A plot of ri vs hi with contours of equal Di (default)
[Figure: Residuals vs Leverage (y-axis: Std. deviance resid.; contours: Cook's distance)]

Example
[Table: Hours, No. turbines, No. fissures, Prop. fissures]
Example
[Figure: proportion of turbines with fissures against Hours of use]

Example
Three ways to fit binomial glms in R; here are two:
  family=binomial(link=logit) )
Can use alternative links:
  td.glm <- glm( prop ~ Hours, weights=Turbines,
                 family=binomial(link=probit) )
  td.glm <- glm( prop ~ Hours, weights=Turbines,
                 family=binomial(link=cloglog) )
We use the default logit link
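The ways of specifying a binomial response can be compared on a small example. This is a sketch on made-up grouped data (dose, n and succ are hypothetical, not the turbine data), and it assumes the three forms meant are the ones R's glm accepts: a proportion with prior weights, a two-column matrix of successes and failures, and a 0/1 response with one row per unit.

```r
# Hypothetical grouped data (illustrative values, not the data from the talk)
dose <- c(1, 2, 3, 4, 5, 6)
n    <- c(40, 40, 40, 40, 40, 40)   # group sizes
succ <- c(2, 5, 11, 19, 28, 35)     # successes in each group
prop <- succ / n

# Form 1: proportion as response, group sizes as prior weights
fit.w <- glm(prop ~ dose, weights = n, family = binomial(link = "logit"))

# Form 2: two-column response of (successes, failures)
fit.c <- glm(cbind(succ, n - succ) ~ dose, family = binomial(link = "logit"))

# Form 3: one 0/1 observation per unit
long <- data.frame(
  dose = rep(dose, times = n),
  y    = rep(rep(c(1, 0), length(dose)),
             times = as.vector(rbind(succ, n - succ)))
)
fit.b <- glm(y ~ dose, data = long, family = binomial(link = "logit"))

# All three give the same coefficient estimates
all.equal(coef(fit.w), coef(fit.c))
all.equal(coef(fit.w), coef(fit.b), tolerance = 1e-6)
```

The residual deviance of the 0/1 fit differs from the grouped fits, so the forms are interchangeable for estimation but not for deviance-based goodness-of-fit checks.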
Example
The fitted model is:
> summary(td.glm)

Call:
glm(formula = prop ~ Hours, family = binomial(link = logit),
    weights = Turbines)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5055  -0.7647  -0.3036   0.4901   2.0943

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.9235966  0.3779589 -10.381   <2e-16 ***
Hours        0.0009992  0.0001142   8.754   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 112.670  on 10  degrees of freedom
Residual deviance:  10.331  on  9  degrees of freedom
AIC: 49.808

Example
> td.cf <- signif(coef(td.glm), 3)
> td.cf
(Intercept)       Hours
  -3.920000    0.000999

From R output, the fitted model is
  log( µi / (1 − µi) ) = −3.92 + 0.000999 × Hours
where µ is the expected proportion of turbines with fissures
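The fitted equation can be turned into predicted proportions by inverting the logit. This sketch uses the rounded coefficients reported in the output; the Hours values are chosen only for illustration.

```r
# Rounded coefficients as reported in the R output
b0 <- -3.92
b1 <- 0.000999

# Hours values chosen only for illustration
hours <- c(1000, 2000, 4000)

eta <- b0 + b1 * hours   # linear predictor (log-odds scale)
mu  <- plogis(eta)       # inverse logit: exp(eta) / (1 + exp(eta))
round(mu, 3)

# Factor by which the odds of a fissure change per extra 1000 hours
odds.ratio <- exp(1000 * b1)
odds.ratio
```

With the fitted object available, predict(td.glm, type = "response") would give the same kind of values using the unrounded coefficients.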
> plot(hatvalues(td.glm), type = "h", lwd = 2, col = "blue")
[Figure: hatvalues(td.glm) against Index]

> plot(cooks.distance(td.glm), type = "h", lwd = 2,
+     col = "blue")
[Figure: cooks.distance(td.glm) against Index]
[Figure: Q-Q plot, with Theoretical Quantiles on the x-axis]

Age group, followed by one pair of values for each of four cities:
60–64   11 710   15 923    7 895   10 839
65–69   10 581   10 834   11 702   14 631
70–74   11 509   12 634    9 535    8 539
74+     10 605    2 782   12 659    7 619
[Figure: plots by Age group (40–54, 55–59, 60–64, 65–69, 70–74, >74) and by City (Fredericia, Horsens, Kolding, Vejle)]

log µi = log Ti + Xβ
log Ti is an offset: a component of the linear predictor whose coefficient is known (here, one) rather than estimated.
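In R an offset is written inside the model formula. This is a minimal sketch on simulated data, assuming Ti plays the role of an exposure such as a population size; the names grp, T.i and the rates are hypothetical.

```r
# Simulated rates (hypothetical data): counts y observed over exposures T.i
set.seed(4)
grp <- gl(3, 20, labels = c("a", "b", "c"))
T.i <- round(runif(60, 500, 3000))
y   <- rpois(60, lambda = T.i * c(0.002, 0.004, 0.008)[grp])

# log(T.i) enters the linear predictor with its coefficient fixed at one
fit <- glm(y ~ grp + offset(log(T.i)), family = poisson(link = "log"))

# No coefficient is estimated for the offset
names(coef(fit))

# exp(coefficients): baseline rate for group a, then rate ratios for b and c
exp(coef(fit))
```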
[Figures: No. Lung cancers by Age group and by City]
family=poisson(link=log) )
[Figure: hatvalues(lc.glm) against Index]
[Figures: cooks.distance(lc.glm) against Index; Q-Q plot with Sample Quantiles on the y-axis]
Other models
We have looked at fitting glms to
  Proportions
  Counts
  Rates
Can also fit glms to
  Positive continuous data (family=Gamma or family=inverse.gaussian)
  Overdispersed counts (family=quasipoisson)
  Overdispersed proportions (family=quasibinomial)
  Positive continuous data with exact zeros (family=tweedie using package statmod)
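Two of these families can be tried with base R alone. The sketch below uses simulated data with arbitrary parameter values. It fits a Gamma glm to positive continuous data and compares poisson with quasipoisson on overdispersed counts. The tweedie family is omitted because it needs the statmod package.

```r
# Simulated data (hypothetical), for illustration only
set.seed(5)
x <- runif(200)

# Positive continuous response: Gamma family with a log link
y.pos <- rgamma(200, shape = 2, rate = 2 / exp(1 + x))   # mean is exp(1 + x)
fit.gamma <- glm(y.pos ~ x, family = Gamma(link = "log"))

# Overdispersed counts: negative binomial draws have variance above the mean
y.cnt <- rnbinom(200, mu = exp(1 + x), size = 2)
fit.pois  <- glm(y.cnt ~ x, family = poisson)
fit.quasi <- glm(y.cnt ~ x, family = quasipoisson)

# Same coefficient estimates; quasipoisson estimates the dispersion
# instead of fixing it at 1, which changes the standard errors
all.equal(coef(fit.pois), coef(fit.quasi))
summary(fit.quasi)$dispersion
```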