Advanced Research Skills: GLMs II, Binomial Family

This document discusses using generalized linear models (GLMs) with binomial family distributions. It first provides an example analyzing the effects of dose on mortality for tobacco budworm moths. Higher doses were found to significantly increase odds of death. A second example analyzes factors related to erythrocyte sedimentation rate, finding fibrinogen level significantly predicts higher rates but globulin does not provide additional predictive value when included in the model. The document demonstrates fitting and interpreting GLMs for binomial response data.

Lecture 7

GLMs II
Binomial Family
Olivier MISSA
Advanced Research Skills
Outline
Continue our introduction to Generalized Linear Models.
In this lecture: illustrate the use of GLMs for proportion and binary data.
Reminder
Binary & proportion data tend to follow the Binomial distribution:

$$\text{Mean} = n\,p \qquad \text{Var} = n\,p\,(1 - p)$$

The canonical link of this GLM family is the logit function:

$$\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right)$$

The variance reaches a maximum for intermediate values of p and a minimum at either 0% or 100%.
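As a quick numerical check of the reminder above, the following R sketch evaluates the logit and the binomial mean and variance on a grid of probabilities. The helper name `logit` and the choice of n = 20 trials are illustrative assumptions, not part of the lecture.

```r
# Logit link written out; base R provides the same function as qlogis()
logit <- function(p) log(p / (1 - p))

n <- 20                          # illustrative number of trials
p <- seq(0.05, 0.95, by = 0.05)  # grid of success probabilities

mean_np <- n * p                 # binomial mean
var_np  <- n * p * (1 - p)       # binomial variance

p[which.max(var_np)]             # probability at which the variance peaks
logit(c(0.1, 0.5, 0.9))          # values are symmetric around 0 on the logit scale
```

The variance peaks at the middle of the grid and shrinks towards both ends, which is why the binomial family does not assume a constant variance.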
Three ways to work with binary data
In R, binary/proportion data can be entered into a model as a response in three different ways:
as a numeric vector
(holding 0/1 outcomes, or proportions of successes with the number of trials supplied through the weights argument)
as a logical vector or a factor
(TRUE, or any factor level other than the first, is treated as a success; the first factor level is treated as a failure).
as a two-column matrix
(the first column holding the number of successes and the second column the number of failures).
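A minimal sketch of the three response formats, using the male budworm counts from the 1st Example later in this lecture (doses on the log2 scale). All variable names are illustrative assumptions. The three fits are expected to return the same coefficient estimates, since they describe the same data.

```r
dead  <- c(1, 4, 9, 13, 18, 20)   # dead moths per batch (males)
total <- rep(20, 6)               # moths tested per batch
ldose <- 0:5                      # log2 of the dose

# (a) two-column matrix: successes, failures
fit_matrix <- glm(cbind(dead, total - dead) ~ ldose, family = binomial)

# (b) numeric vector of proportions, with the number of trials as weights
fit_prop <- glm(dead / total ~ ldose, family = binomial, weights = total)

# (c) logical vector, one element per moth (TRUE = died)
died_long  <- rep(rep(c(TRUE, FALSE), 6),
                  times = c(rbind(dead, total - dead)))
ldose_long <- rep(ldose, times = total)
fit_binary <- glm(died_long ~ ldose_long, family = binomial)

# Compare the estimates side by side
rbind(matrix = coef(fit_matrix),
      prop   = coef(fit_prop),
      binary = coef(fit_binary))
```

The residual deviance of the binary fit differs from the other two, because the saturated model is defined per moth rather than per batch; the coefficients and their standard errors are unaffected.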
1st Example
Toxicity to the tobacco budworm (moth) of different doses of trans-cypermethrin.
Batches of 20 moths (of each sex) were put in contact for three days with increasing doses of the pyrethroid.

Number of dead moths out of 20 tested, by dose (micrograms):

Sex      1    2    4    8    16   32
Male     1    4    9    13   18   20
Female   0    2    6    10   12   16
> (dose <- rep(2^(0:5), 2))
[1] 1 2 4 8 16 32 1 2 4 8 16 32
> numdead <- c(1,4,9,13,18,20,0,2,6,10,12,16)
> (sex <- factor( rep( c("M","F"), c(6,6) ) ))
[1] M M M M M M F F F F F F
Levels: F M
> SF <- cbind(numdead, numalive=20-numdead)
1st Example
> modb <- glm(SF ~ sex*dose, family=binomial)
> summary(modb)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.71578 0.32233 -5.323 1.02e-07 ***
sexM -0.21194 0.51523 -0.411 0.68082
dose 0.11568 0.02379 4.863 1.16e-06 ***
sexM:dose 0.18156 0.06692 2.713 0.00666 **
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.876 on 11 degrees of freedom
Residual deviance: 18.164 on 8 degrees of freedom
AIC: 56.275
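What is modelled is the proportion of successes.
The residual deviance is

$$D = 2\sum_{i=1}^{n}\left[\, y_i \ln\!\left(\frac{y_i}{\hat{y}_i}\right) + (n_i - y_i)\,\ln\!\left(\frac{n_i - y_i}{n_i - \hat{y}_i}\right)\right]$$

where the sum runs over the n observations (batches), y_i and \hat{y}_i are the observed and fitted numbers of successes, and n_i is the number of trials in observation i (20 moths per batch here).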
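The binomial deviance can be evaluated by hand and compared with the value R reports. The sketch below rebuilds the data and the model `modb` from the slides; the helper `xlogy` is a hypothetical name introduced here to apply the usual convention that a term with zero observed count contributes 0.

```r
dose    <- rep(2^(0:5), 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
n_trial <- 20
SF      <- cbind(numdead, numalive = n_trial - numdead)

modb <- glm(SF ~ sex * dose, family = binomial)

# Fitted number of deaths per batch
yhat <- n_trial * fitted(modb)

# a * log(a / b), with the convention 0 * log(0) = 0
xlogy <- function(a, b) ifelse(a == 0, 0, a * log(a / b))

D <- 2 * sum(xlogy(numdead, yhat) +
             xlogy(n_trial - numdead, n_trial - yhat))

c(by_hand = D, from_glm = deviance(modb))
```

Both numbers should agree with the residual deviance printed in the summary above.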
1st Example
> ldose <- log2(dose)
> modb2 <- glm(SF ~ sex*ldose, family=binomial)
> summary(modb2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.9935 0.5527 -5.416 6.09e-08 ***
sexM 0.1750 0.7783 0.225 0.822
ldose 0.9060 0.1671 5.422 5.89e-08 ***
sexM:ldose 0.3529 0.2700 1.307 0.191
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.8756 on 11 degrees of freedom
Residual deviance: 4.9937 on 8 degrees of freedom
AIC: 43.104

1st Example
> drop1(modb2, test="Chisq")
Single term deletions

Model:
SF ~ sex * ldose
Df Deviance AIC LRT Pr(Chi)
<none> 4.994 43.104
sex:ldose 1 6.757 42.867 1.763 0.1842

> modb3 <- update(modb2, ~ . - sex:ldose)
> summary(modb3)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***
sexM 1.1007 0.3558 3.093 0.00198 **
ldose 1.0642 0.1311 8.119 4.70e-16 ***
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 124.876 on 11 degrees of freedom
Residual deviance: 6.757 on 9 degrees of freedom
AIC: 42.867
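The LRT statistic and p-value reported by drop1 can be reproduced by hand from the two residual deviances shown above, assuming the usual chi-square reference distribution with 1 degree of freedom (one parameter removed). The deviances are copied from the printed output, so the result is subject to rounding.

```r
dev_with_interaction    <- 4.994   # residual deviance of modb2 (8 df)
dev_without_interaction <- 6.757   # residual deviance of modb3 (9 df)

# Likelihood ratio statistic: increase in deviance when the term is dropped
lrt <- dev_without_interaction - dev_with_interaction

# Upper-tail chi-square probability with 9 - 8 = 1 degree of freedom
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)

c(LRT = lrt, p = p_value)
```

A p-value well above 0.05 is what justifies dropping the interaction and moving to the additive model.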
1st Example
> drop1(modb3, test="Chisq")
Single term deletions

Model:
SF ~ sex + ldose
Df Deviance AIC LRT Pr(Chi)
<none> 6.757 42.867
sex 1 16.984 51.094 10.227 0.001384 **
ldose 1 118.799 152.909 112.042 < 2.2e-16 ***

> shapiro.test(residuals(modb3, type="deviance"))

Shapiro-Wilk normality test

data: residuals(modb3, type = "deviance")
W = 0.9666, p-value = 0.8725

1st Example
> par(mfrow=c(2,2))
> plot(modb3)
1st Example
> plot( c(0,1) ~ c(1,32), type="n", log="x",
xlab="dose", ylab="Probability")
> text(dose, numdead/20, labels=as.character(sex) )
> ld <- seq(0,32,0.5)
> lines (ld, predict(modb3, data.frame(ldose=log2(ld),
sex=factor(rep("M", length(ld)), levels=levels(sex))),
type="response") )
> lines (ld, predict(modb3, data.frame(ldose=log2(ld),
sex=factor(rep("F", length(ld)), levels=levels(sex))),
type="response"), lty=2, col="red" )
1st Example
> modbp <- glm(SF ~ sex*ldose,
family=binomial(link="probit"))
> AIC(modbp)
[1] 41.87836

> modbc <- glm(SF ~ sex*ldose,
family=binomial(link="cloglog"))
> AIC(modbc)
[1] 43.8663

> AIC(modb3)
[1] 42.86747
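The probit and cloglog fits above include the sex:ldose interaction, whereas modb3 does not, so the three AIC values are not strictly like for like. A minimal sketch of a like-for-like comparison, assuming the additive formula SF ~ sex + ldose is used for every link (the object name `aic_by_link` is illustrative):

```r
ldose   <- rep(0:5, 2)                                  # log2 of the dose
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)

links <- c("logit", "probit", "cloglog")

# Fit the same additive model under each link and collect the AIC
aic_by_link <- sapply(links, function(lk) {
  AIC(glm(SF ~ sex + ldose, family = binomial(link = lk)))
})

sort(aic_by_link)   # smallest AIC first
```

With only 12 batches, small AIC differences between links should be read cautiously.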
> summary(modb3)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***
sexM 1.1007 0.3558 3.093 0.00198 **
ldose 1.0642 0.1311 8.119 4.70e-16 ***
---

> exp(modb3$coeff) ## careful, it may be misleading
(Intercept) sexM ldose
0.031019 3.006400 2.898560 ## odds p / (1-p) for the intercept, odds ratios for sexM and ldose

> exp(modb3$coeff[1]+modb3$coeff[2]) ## odds for males at ldose = 0 (dose = 1)
(Intercept)
0.09325553
1st Example
logit scale: $\log\left(\dfrac{p}{1 - p}\right)$
Every doubling of the dose will lead
to an increase in the odds of dying
over surviving by a factor of 2.899
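To make the odds interpretation concrete, the sketch below converts the printed coefficients of modb3 into odds and probabilities for males at doses 1 and 2. The coefficient values are copied from the summary above (rounded), and the names `b` and `odds_to_prob` are illustrative.

```r
# Coefficients of modb3 on the logit scale, as printed in the summary
b <- c(intercept = -3.4732, sexM = 1.1007, ldose = 1.0642)

# Odds of dying for males at dose 1 (ldose = 0)
odds_male_dose1 <- exp(b[["intercept"]] + b[["sexM"]])

# Doubling the dose adds 1 to ldose, i.e. multiplies the odds by exp(ldose coefficient)
odds_male_dose2 <- odds_male_dose1 * exp(b[["ldose"]])

# Convert odds back to probabilities
odds_to_prob <- function(odds) odds / (1 + odds)

c(p_dose1 = odds_to_prob(odds_male_dose1),
  p_dose2 = odds_to_prob(odds_male_dose2))
```

The odds are multiplied by the same factor at every doubling, but the change in probability is not constant: it is largest near p = 0.5 and small near 0 or 1.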
2nd Example
Erythrocyte Sedimentation Rate (ESR) in a group of patients.
Two groups: ESR < 20 mm/hour (healthy) or ESR > 20 mm/hour (ill).
Q: Is it related to the globulin and fibrinogen levels in the blood?
> data("plasma", package="HSAUR")
> str(plasma)
'data.frame': 32 obs. of 3 variables:
$ fibrinogen: num 2.52 2.56 2.19 2.18 3.41 2.46 3.22 2.21 ...
$ globulin : int 38 31 33 31 37 36 38 37 39 41 ...
$ ESR : Factor w/ 2 levels "ESR < 20","ESR > 20": 1 1 ...
> summary(plasma)
fibrinogen globulin ESR
Min. :2.090 Min. :28.00 ESR < 20:26
1st Qu.:2.290 1st Qu.:31.75 ESR > 20: 6
Median :2.600 Median :36.00
Mean :2.789 Mean :35.66
3rd Qu.:3.167 3rd Qu.:38.00
Max. :5.060 Max. :46.00
2nd Example
> stripchart(globulin ~ ESR, vertical=T, data=plasma,
xlab="Erythrocyte Sedimentation Rate (mm/hr)",
ylab="Globulin blood level", method="jitter" )

> stripchart(fibrinogen ~ ESR, vertical=T, data=plasma,
xlab="Erythrocyte Sedimentation Rate (mm/hr)",
ylab="Fibrinogen blood level", method="jitter" )
2nd Example
> mod1 <- glm(ESR~fibrinogen, data=plasma, family=binomial)
> summary(mod1)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.8451 2.7703 -2.471 0.0135 *
fibrinogen 1.8271 0.9009 2.028 0.0425 *
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 30.885 on 31 degrees of freedom
Residual deviance: 24.840 on 30 degrees of freedom
AIC: 28.840

> mod2 <- glm(ESR~fibrinogen+globulin, data=plasma,
family=binomial)
> AIC(mod2)
[1] 28.97111
Note: the response ESR is entered as a factor.
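The null deviance of 30.885 reported for mod1 can be recovered from the group counts alone (26 patients with ESR < 20, 6 with ESR > 20), since the null model fits a single common probability. A small worked check, with illustrative variable names:

```r
n_ill     <- 6    # ESR > 20
n_healthy <- 26   # ESR < 20
p0 <- n_ill / (n_ill + n_healthy)   # fitted probability under the null model

# For binary data the deviance is -2 * log-likelihood
null_dev <- -2 * (n_ill * log(p0) + n_healthy * log(1 - p0))

# Intercept of the null model on the logit scale
intercept_null <- log(p0 / (1 - p0))

c(null_deviance = null_dev, intercept = intercept_null)
```

The residual deviance of mod1 is then read as the part of this baseline that remains after fibrinogen is accounted for.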
2nd Example
> anova(mod1, mod2, test="Chisq")
Analysis of Deviance Table

Model 1: ESR ~ fibrinogen
Model 2: ESR ~ fibrinogen + globulin
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 30 24.8404
2 29 22.9711 1 1.8692 0.1716

> summary(mod2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -12.7921 5.7963 -2.207 0.0273 *
fibrinogen 1.9104 0.9710 1.967 0.0491 *
globulin 0.1558 0.1195 1.303 0.1925
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 30.885 on 31 degrees of freedom
Residual deviance: 22.971 on 29 degrees of freedom
AIC: 28.971
The difference in deviance between these models is not significant, which leads us to select the simpler model (mod1).
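For ungrouped binary data the saturated model has a log-likelihood of 0, so the residual deviance equals minus twice the log-likelihood and the AIC is the residual deviance plus twice the number of estimated coefficients. A worked check against the printed values (deviances copied from the output above, so subject to rounding):

```r
# Residual deviance and number of coefficients for each model
dev_mod1 <- 24.840; k_mod1 <- 2   # intercept + fibrinogen
dev_mod2 <- 22.971; k_mod2 <- 3   # intercept + fibrinogen + globulin

aic_mod1 <- dev_mod1 + 2 * k_mod1
aic_mod2 <- dev_mod2 + 2 * k_mod2

c(mod1 = aic_mod1, mod2 = aic_mod2, difference = aic_mod2 - aic_mod1)
```

The deviance drops by less than the penalty of 2 for the extra parameter, so AIC points to the same choice as the analysis of deviance.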
2nd Example
> shapiro.test(residuals(mod1, type="deviance"))
Shapiro-Wilk normality test

data: residuals(mod1, type = "deviance")
W = 0.6863, p-value = 5.465e-07
> par(mfrow=c(2,2))
> plot(mod1)
