GLM Ohp

The document discusses regression-type models, focusing on Generalized Linear Models (GLMs) and their application in R. It explains the components of GLMs, including the randomness and structure models, and provides examples of distributions and link functions that can be used. Additionally, it covers fitting GLMs in R, significance testing for predictors, and diagnostic plots.


Regression-type models Examples Using R R examples Regression-type models Examples Using R R examples

Generalized linear models in R
Dr Peter K Dunn
http://www.usq.edu.au
Department of Mathematics and Computing
University of Southern Queensland
ASC, July 2008

Regression models
The usual linear regression models assume data come from a Normal distribution...
... with the mean related to predictors
Generalized linear models (GLMs) assume data come from some distribution...
... with a function of the mean related to predictors

Model             Randomness     Structure
Regression model  Y ∼ N(µ, φ)    µ = Xβ
GLM               Y ∼ P(µ, φ)    g(µ) = Xβ


Generalized linear models
Generalized linear models have two main components
1 The model for the randomness: Y ∼ P(µ, φ)
2 The model for the structure: g(µ) = Xβ
We can choose from many distributions P
We can choose from many link functions g(µ) in a separate decision
(Using a transformation in regression approximately makes both decisions at once)

Normal regression models are not always appropriate
There are obvious occasions when a Normal distribution is inappropriate:
Counts cannot have normal distributions: they are non-negative integers
Proportions cannot have normal distributions: they are constrained between 0 and 1
Lots of continuous data are non-negative and have non-constant variance
In all cases, the variance cannot be constant since a boundary on the responses exists
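
The two components can be inspected directly in R, because a family object carries the link, its inverse and the variance function. The following is a minimal sketch using base R only; the object names (fam, eta, mu, fam_sqrt) are illustrative and not from the slides.

```r
# A family object bundles the link g, its inverse, and the variance function.
fam <- poisson(link = "log")

eta <- c(-1, 0, 2)          # some values of the linear predictor X beta
mu  <- fam$linkinv(eta)     # mu = exp(eta), so mu is always positive
fam$linkfun(mu)             # g(mu) = log(mu) recovers eta
fam$variance(mu)            # Poisson variance function: V(mu) = mu

# Same distribution, different link: the two choices are separate decisions.
fam_sqrt <- poisson(link = "sqrt")
fam_sqrt$linkinv(eta)       # under the sqrt link, mu = eta^2
```

The variance changing with µ is what the boundary argument above is pointing at: near the boundary the responses have less room to vary.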


Examples

Example
Counts may be modelled using a Poisson distribution
Usually, use a log link
Define µ = E[Y] as the expected count
The model is
    Y_i ∼ Poisson(µ_i)    (random)
    log µ_i = Xβ          (systematic)
The log link ensures µ = exp(Xβ) is always positive
The log link means the effect of the covariates x_j on µ is multiplicative not additive

Examples

Example
Proportions may be modelled using a binomial distribution
Often, use a logit link (to get a logistic regression model)
Define µ = E[Y] as the expected proportion
The model is
    Y_i ∼ Binomial(µ_i)                            (random)
    logit(µ_i) = log( µ_i / (1 − µ_i) ) = Xβ       (systematic)
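
To see what "multiplicative not additive" means under the log link, here is a small sketch on simulated counts. The data, the true coefficients (0.5 and 1.2) and all object names are made up for illustration; only glm, predict and coef are standard R.

```r
set.seed(1)

# Simulated counts whose mean depends on one covariate x through a log link.
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))

fit <- glm(y ~ x, family = poisson(link = "log"))

# Predicted means at x = 0 and x = 1.
mu0 <- predict(fit, newdata = data.frame(x = 0), type = "response")
mu1 <- predict(fit, newdata = data.frame(x = 1), type = "response")

# Increasing x by one unit multiplies the mean by exp(beta_x).
ratio <- unname(mu1 / mu0)
ratio
exp(coef(fit)["x"])
```

The same reading applies to the logit link, except that the multiplicative effect is on the odds µ/(1 − µ) rather than on µ itself.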


Basic fitting of glms in R
Fit a regression model in R using
    lm( y ~ x1 + log( x2 ) + x3 )
To fit a glm, R must know the distribution and link function
Fit a glm in R using (for example)
    glm( y ~ x1 + log( x2 ) + x3,
         family=poisson( link="log" ) )

What distributions can I choose?
gaussian: a Gaussian (Normal) distribution
binomial: a binomial distribution for proportions
poisson: a Poisson distribution for counts
Gamma: a gamma distribution for positive continuous data
inverse.gaussian: an inverse Gaussian distribution for positive continuous data
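
As a quick check that lm is the special case of glm with a gaussian family and identity link, the sketch below fits both to the same simulated data. The data and the names fit_lm and fit_glm are invented for the illustration.

```r
set.seed(42)

# Simulated predictors and a Normal response (x2 is kept positive for the log).
n  <- 100
x1 <- rnorm(n)
x2 <- runif(n, min = 1, max = 10)
x3 <- rnorm(n)
y  <- 2 + 0.5 * x1 + 1.5 * log(x2) - x3 + rnorm(n)

fit_lm  <- lm(y ~ x1 + log(x2) + x3)
fit_glm <- glm(y ~ x1 + log(x2) + x3, family = gaussian(link = "identity"))

# The two sets of coefficient estimates agree.
all.equal(coef(fit_lm), coef(fit_glm))
```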

What link function can I choose?
(default = the family's default link; yes = also available)

Link function                 gaussian   binomial   poisson
identity   µ = η              default               yes
log        log µ = η          yes                   default
inverse    1/µ = η            yes
sqrt       √µ = η                                   yes
logit      logit(µ) = η                  default
probit     probit(µ) = η                 yes
cauchit    cauchit(µ) = η                yes
cloglog    cloglog(µ) = η                yes

What link function can I choose?

Link function                 Gamma      inverse.gaussian
identity   µ = η              yes        yes
log        log µ = η          yes        yes
inverse    1/µ = η            default    yes
1/mu^2     1/µ² = η                      default
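
The default link of each family can be read off the family object itself. A minimal sketch in base R follows; the names families and default_links are illustrative.

```r
# Build each family with its default link and extract the link name.
families <- list(
  gaussian         = gaussian(),
  binomial         = binomial(),
  poisson          = poisson(),
  Gamma            = Gamma(),
  inverse.gaussian = inverse.gaussian()
)
default_links <- sapply(families, function(f) f$link)
print(default_links)

# A non-default link is requested by name.
probit_link <- binomial(link = "probit")$link
probit_link
```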


In R...
To fit a glm in R, we need to specify:
The linear predictor: x1+x2+log(x3)
The distribution: family=poisson
The link function: link="log"
They work together like this:
    glm( y ~ x1 + x2 + log(x3),
         family=poisson(link = "log") )

Glms in R?
Fitting glms is locally like fitting a standard regression model
So most regression concepts have (approximate) analogies for glms
For example, R allows the user to:
fit glms (use glm)
find important predictors (F-tests using anova; t-tests using summary)
compute residuals (using resid; quantile residuals in package statmod strongly recommended: qresid)
perform diagnostics (using plot, hatvalues, cooks.distance, etc.)


Example: Poisson

Example
                   3 children < 14 (C = 1)    Others (C = 0)
                   SLE        No SLE          SLE        No SLE
                   (S = 1)    (S = 0)         (S = 1)    (S = 0)
Depres. (D = 1)    9          0               24         4
OK (D = 0)         12         20              119        231

The data are counts, so use a poisson family (and default log link)
Initially, use the linear predictor C + S + D

Example
To fit the minimal model in R:
    dep.glm <- glm( Counts ~ C + S + D,
                    family=poisson(link=log) )
To fit the full model in R:
    dep.full <- glm( Counts ~ C * S * D,
                     family=poisson(link=log) )
We assume all qualitative variables are declared as factors
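
The slides do not show how the table was entered into R. One possible layout, assumed here, is a data frame called dep (the name that appears in the printed Call) with one row per cell of the table and C, S and D declared as factors with baseline level 0.

```r
# One row per cell of the 2 x 2 x 2 table; the layout is an assumption.
dep <- data.frame(
  C      = factor(c(1, 1, 1, 1, 0, 0, 0, 0), levels = c(0, 1)),
  S      = factor(c(1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1)),
  D      = factor(c(1, 1, 0, 0, 1, 1, 0, 0), levels = c(0, 1)),
  Counts = c(9, 0, 12, 20, 24, 4, 119, 231)
)

# Main-effects model and the full (saturated) model from the slides.
dep.glm  <- glm(Counts ~ C + S + D, family = poisson(link = "log"), data = dep)
dep.full <- glm(Counts ~ C * S * D, family = poisson(link = "log"), data = dep)

deviance(dep.glm)      # residual deviance of the main-effects model
df.residual(dep.glm)   # 8 cells minus 4 coefficients
deviance(dep.full)     # essentially zero: one coefficient per cell
```

With this coding the intercept of the full model is the log of the count in the baseline cell (C = 0, S = 0, D = 0), that is log(231).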


Example
What predictors are significant? Sequential test:
    > anova(dep.full, test = "Chisq")
    Analysis of Deviance Table

    Model: poisson, link: log

    Response: Counts

    Terms added sequentially (first to last)

          Df Deviance Resid. Df Resid. Dev P(>|Chi|)
    NULL                      7     717.32
    D      1   330.63         6     386.69 7.005e-74
    S      1    19.92         5     366.77 8.066e-06
    C      1   312.41         4      54.35 6.505e-70
    D:S    1    44.36         3       9.99 2.725e-11
    D:C    1     7.45         2       2.54      0.01
    S:C    1     0.54         1       2.00      0.46
    D:S:C  1     2.00         0  4.122e-10      0.16

Example
What predictors are significant? Post-fit test:
    > summary(dep.full)

    Call:
    glm(formula = Counts ~ D * S * C, family = poisson(link = log),
        data = dep)

    Deviance Residuals:
    [1] 0 0 0 0 0 0 0 0

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)    5.4424     0.0658  82.718  < 2e-16 ***
    D1            -4.0561     0.5043  -8.043 8.77e-16 ***
    S1            -0.6633     0.1128  -5.878 4.15e-09 ***
    C1            -2.4467     0.2331 -10.497  < 2e-16 ***
    D1:S1          2.4550     0.5517   4.450 8.60e-06 ***
    D1:C1        -21.2422 42247.1657  -0.001     1.00
    S1:C1          0.1525     0.3822   0.399     0.69
    D1:S1:C1      22.5556 42247.1657   0.001     1.00
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 7.1732e+02 on 7 degrees of freedom
    Residual deviance: 4.1223e-10 on 0 degrees of freedom
    AIC: 51.42

    Number of Fisher Scoring iterations: 20
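
The P(>|Chi|) column of the analysis of deviance table can be reproduced by hand: each term's drop in deviance is compared with a chi-squared distribution on that term's degrees of freedom. The sketch below uses the rounded deviances printed in the table, so the results agree with the table only up to rounding; the vector names are illustrative.

```r
# Deviance drops (Df = 1 each) copied from the sequential table.
dev_drop <- c("D" = 330.63, "S" = 19.92, "C" = 312.41,
              "D:S" = 44.36, "D:C" = 7.45, "S:C" = 0.54, "D:S:C" = 2.00)

# Upper-tail chi-squared probability for each term.
p <- setNames(pchisq(unname(dev_drop), df = 1, lower.tail = FALSE),
              names(dev_drop))
signif(p, 3)
```

Because the test is sequential, each p-value refers to adding that term to a model already containing the terms listed above it.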



Example
To fit one suggested model in R:
    dep.opt <- glm( Counts ~ C + S * D,
                    family=poisson(link=log) )
Note that S * D means S + D and the interaction S : D

Plots: Hat diagonals
    > plot(hatvalues(dep.opt), type = "h", lwd = 2,
    +      col = "blue")
[Figure: hatvalues(dep.opt) against Index]


Plots: Cook’s distance
    > plot(cooks.distance(dep.opt), type = "h", lwd = 2,
    +      col = "blue")
[Figure: cooks.distance(dep.opt) against Index]

Plots: Q–Q plots
    > library(statmod)
    > qqnorm(qresid(dep.opt))
[Figure: Normal Q–Q Plot, Sample Quantiles against Theoretical Quantiles]
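
The numbers behind the hat-diagonal and Cook's distance plots can also be examined directly. This sketch re-enters the depression table in an assumed data frame layout (one row per cell, factors with baseline 0) and uses base R only; the quantile residuals need package statmod and are left out here.

```r
# Depression data, one row per cell of the table (layout is an assumption).
dep <- data.frame(
  C      = factor(c(1, 1, 1, 1, 0, 0, 0, 0), levels = c(0, 1)),
  S      = factor(c(1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1)),
  D      = factor(c(1, 1, 0, 0, 1, 1, 0, 0), levels = c(0, 1)),
  Counts = c(9, 0, 12, 20, 24, 4, 119, 231)
)
dep.opt <- glm(Counts ~ C + S * D, family = poisson(link = "log"), data = dep)

h  <- hatvalues(dep.opt)        # leverages (hat diagonals)
cd <- cooks.distance(dep.opt)   # influence of each cell on the fit

sum(h)                          # equals the number of fitted coefficients
length(coef(dep.opt))
which.max(cd)                   # the most influential cell
```

With only eight cells and five coefficients the average leverage is 5/8, so high hat values are expected for this model.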


Typing plot( glm.object ) produces six plots, four by default:
1 Residuals r_i vs fitted values µ̂ (default)
2 a Q–Q plot (default)
3 √|r_i| vs µ̂ (default)
4 A plot of Cook’s distance D_i
5 A plot of r_i vs h_i with contours of equal D_i (default)
6 A plot of D_i vs h_i/(1 − h_i), with contours of equal D_i

    > par(mfrow = c(2, 2))
    > plot(dep.opt)
    > par(mfrow = c(1, 1))
[Figure: the four default panels: Residuals vs Fitted, Normal Q–Q, Scale–Location, Residuals vs Leverage]

    > plot(dep.opt, which = 5)
[Figure: Residuals vs Leverage for glm(Counts ~ D * S + C), Std. deviance resid. against Leverage with Cook's distance contours]

Example

Example
Hours   No. turbines   No. fissures   Prop. fissures
 400    39              0             0.00
1000    53              4             0.08
1400    33              2             0.06
1800    73              7             0.10
2200    30              5             0.17
2600    39              9             0.23
3000    42              9             0.21
3400    13              6             0.46
3800    34             22             0.65
4200    40             21             0.53
4600    36             21             0.58

The data are proportions: use binomial family



Example
[Figure: Proportion of turbines with fissures against Hours of use]

Example
Three ways to fit binomial glms in R; here are two:
1 Proportions as the response, with the number of turbines as weights:
    td.glm <- glm( prop ~ Hours, weights=Turbines,
                   family=binomial(link=logit) )
2 A two-column response of successes and failures (turbines with and without fissures):
    td.glm <- glm( cbind(Fissures, Turbines - Fissures) ~ Hours,
                   family=binomial(link=logit) )
Can use alternative links:
    td.glm <- glm( prop ~ Hours, weights=Turbines,
                   family=binomial(link=probit) )
    td.glm <- glm( prop ~ Hours, weights=Turbines,
                   family=binomial(link=cloglog) )
We use the default logit link
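
The two specifications should give the same fit. The sketch below enters the turbine table in a data frame called td (a name assumed here, as are fit_prop and fit_cbind) and compares them. Note that the two-column response holds successes and failures, so its second column is Turbines - Fissures, and that prop is computed exactly rather than taken from the rounded column of the table.

```r
# Turbine data from the table; the data frame name td is an assumption.
td <- data.frame(
  Hours    = c(400, 1000, 1400, 1800, 2200, 2600, 3000, 3400, 3800, 4200, 4600),
  Turbines = c(39, 53, 33, 73, 30, 39, 42, 13, 34, 40, 36),
  Fissures = c(0, 4, 2, 7, 5, 9, 9, 6, 22, 21, 21)
)
td$prop <- td$Fissures / td$Turbines   # unrounded proportions

# Way 1: proportion as response, group sizes as prior weights.
fit_prop <- glm(prop ~ Hours, weights = Turbines,
                family = binomial(link = "logit"), data = td)

# Way 2: two-column response of (successes, failures).
fit_cbind <- glm(cbind(Fissures, Turbines - Fissures) ~ Hours,
                 family = binomial(link = "logit"), data = td)

coef(fit_prop)
coef(fit_cbind)
```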

Example
The fitted model is:
    > summary(td.glm)

    Call:
    glm(formula = prop ~ Hours, family = binomial(link = logit),
        weights = Turbines)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.5055  -0.7647  -0.3036   0.4901   2.0943

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept) -3.9235966  0.3779589 -10.381   <2e-16 ***
    Hours        0.0009992  0.0001142   8.754   <2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 112.670 on 10 degrees of freedom
    Residual deviance:  10.331 on  9 degrees of freedom
    AIC: 49.808

    Number of Fisher Scoring iterations: 4

Example
    > td.cf <- signif(coef(td.glm), 3)
    > td.cf
    (Intercept)       Hours
      -3.920000    0.000999

From R output, the fitted model is
    log( µ_i / (1 − µ_i) ) = −3.92 + 0.000999 × Hours
where µ is the expected proportion of turbines with fissures
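
The fitted equation can be turned back into expected proportions by inverting the logit. A small worked sketch using the rounded coefficients from the slide; the chosen values of Hours and the variable names are illustrative.

```r
# Rounded coefficients from the fitted model.
b0 <- -3.92
b1 <- 0.000999

hours <- c(400, 2600, 4600)
eta   <- b0 + b1 * hours    # linear predictor on the logit scale
mu    <- plogis(eta)        # inverse logit: exp(eta) / (1 + exp(eta))
round(mu, 3)                # at 2600 hours this is about 0.21

# Each extra 1000 hours multiplies the odds of fissures by exp(1000 * b1).
odds_ratio_1000 <- exp(1000 * b1)
odds_ratio_1000
```

The value at 2600 hours is close to the observed proportion 9/39 in the data table.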


Plots: Hat diagonals
    > plot(hatvalues(td.glm), type = "h", lwd = 2, col = "blue")
[Figure: hatvalues(td.glm) against Index]

Plots: Cook’s distance
    > plot(cooks.distance(td.glm), type = "h", lwd = 2,
    +      col = "blue")
[Figure: cooks.distance(td.glm) against Index]


Plots: Q–Q plots
    > qqnorm(qresid(td.glm))
[Figure: Normal Q–Q Plot, Sample Quantiles against Theoretical Quantiles]

Example

Example
          F           H           K           V
Age       C     P     C     P     C     P     C     P
40–54    11  3059    13  2879     4  3142     5  2520
55–59    11   800     6  1083     8  1050     7   878
60–64    11   710    15   923     7   895    10   839
65–69    10   581    10   834    11   702    14   631
70–74    11   509    12   634     9   535     8   539
74+      10   605     2   782    12   659     7   619

Plots: Number of cancers
[Figure: No. lung cancers by Age group and by City (Fredericia, Horsens, Kolding, Vejle)]

Rates
Number of lung cancer patients is a count, so use a Poisson glm:
    glm( Cases ~ City + Age,
         family=poisson(link=log) )
But lung cancer rate probably more useful
Expected cancer rate is E[Y_i/T_i] = E[Y_i]/T_i = µ_i/T_i, where µ_i is the expected number of cancers and T_i is the population at risk
Note T_i is known and not random.
Using a logarithmic link, model the cancer rate as
    log(µ_i/T_i) = Xβ
or
    log µ_i = log T_i + Xβ
log T_i is an offset: a component of the linear predictor with a known parameter value, here one.

Plots: Number of cancers
[Figure: No. lung cancers by Age group and by City]

Plots: Rates of cancer
[Figure: Lung cancer rate by Age group and by City]


Rates
To model lung cancer rate, use a Poisson glm with an offset:
    lc.glm <- glm( Cases ~ offset( log(Population))
                   + City + Age,
                   family=poisson(link=log) )

Plots: Hat diagonals
    > plot(hatvalues(lc.glm), type = "h", lwd = 2, col = "blue")
[Figure: hatvalues(lc.glm) against Index]
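
A sketch of the offset model on the lung cancer table follows. The data frame name lc and the long-format layout are assumptions, as is reading the table columns C and P as Cases and Population (the variable names used in the glm call); the city names and age labels follow the plot axes. It also shows that the offset can be given through the offset argument of glm instead of inside the formula.

```r
# Lung cancer data in long format: 4 cities x 6 age groups (layout assumed).
ages <- c("40-54", "55-59", "60-64", "65-69", "70-74", ">74")
lc <- data.frame(
  City = factor(rep(c("Fredericia", "Horsens", "Kolding", "Vejle"), each = 6)),
  Age  = factor(rep(ages, times = 4), levels = ages),
  Cases = c(11, 11, 11, 10, 11, 10,
            13,  6, 15, 10, 12,  2,
             4,  8,  7, 11,  9, 12,
             5,  7, 10, 14,  8,  7),
  Population = c(3059,  800, 710, 581, 509, 605,
                 2879, 1083, 923, 834, 634, 782,
                 3142, 1050, 895, 702, 535, 659,
                 2520,  878, 839, 631, 539, 619)
)

# Offset inside the formula, as on the slide.
lc.glm <- glm(Cases ~ offset(log(Population)) + City + Age,
              family = poisson(link = "log"), data = lc)

# Equivalent: offset supplied as a separate argument.
lc.alt <- glm(Cases ~ City + Age, offset = log(Population),
              family = poisson(link = "log"), data = lc)

# Fitted rates are fitted counts divided by the known population.
rate_hat <- fitted(lc.glm) / lc$Population
range(rate_hat)
```

No coefficient is estimated for log(Population): its coefficient is fixed at one, which is exactly what makes the model a model for the rate.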


Plots: Cook’s distance
    > plot(cooks.distance(lc.glm), type = "h", lwd = 2,
    +      col = "blue")
[Figure: cooks.distance(lc.glm) against Index]

Plots: Q–Q plots
    > library(statmod)
    > qqnorm(qresid(lc.glm))
[Figure: Normal Q–Q Plot, Sample Quantiles against Theoretical Quantiles]



Other models
We have looked at fitting glms to
Proportions
Counts
Rates
Can also fit glms to
Positive continuous data (family=Gamma or family=inverse.gaussian)
Overdispersed counts (family=quasipoisson)
Overdispersed proportions (family=quasibinomial)
Positive continuous data with exact zeros (family=tweedie using package statmod)
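
Two of these families are sketched below on simulated data, using base R only. The data, the seed and the object names are invented for the illustration. The counts are generated from a negative binomial distribution so that they are overdispersed relative to the Poisson. Note the capital G in Gamma: lower-case gamma is R's gamma function, not a family.

```r
set.seed(2008)
n <- 500
x <- runif(n)

# Overdispersed counts: variance larger than the mean.
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1)
fit_pois  <- glm(y ~ x, family = poisson(link = "log"))
fit_quasi <- glm(y ~ x, family = quasipoisson(link = "log"))

coef(fit_pois)                   # same point estimates as the quasi fit
coef(fit_quasi)
summary(fit_quasi)$dispersion    # estimated dispersion, well above 1 here

# Positive continuous data with a gamma family and log link.
w <- rgamma(n, shape = 2, rate = 2 / exp(0.3 + 0.8 * x))
fit_gamma <- glm(w ~ x, family = Gamma(link = "log"))
coef(fit_gamma)
```

The quasi families leave the coefficient estimates unchanged and scale the standard errors by the square root of the estimated dispersion.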
