Example - Zero-Inflated, Generalized Linear Mixed Model For Count Data
The model
Data collected in ecological studies are often complex. Studies may involve
repeat observations on the same units (e.g., individuals, quadrats, stations).
Predictor variables may be categorical or continuous, and interactions may be
of interest. In addition, such data may contain excess zero-valued observations
(with respect to a Poisson model) because of sampling variability and/or
incomplete knowledge of the processes that generated the data (e.g., factors that
define suitable species habitat; Zuur et al., 2009). Zero-inflated generalized linear
mixed-effects models (ZIGLMMs) are a class of models, incorporating aspects
of generalized linear models, mixed models, and zero-inflated models, that are
both flexible and computationally efficient tools for data of this sort.
The data for this example, taken from Zuur et al. (2009) and ultimately
from Roulin and Bersier (2007), quantify the amount of sibling negotiation (vocalizations when parents are absent) by owlets (owl chicks) in different nests
as a function of food treatment (deprived or satiated), the sex of the parent,
and arrival time of the parent at the nest. Since the same nests are observed
repeatedly, it is natural to consider a mixed-effects model for these data, with
the nest as a random effect. Because we are interested in explaining variability
in the number of vocalizations per chick, the log of the total brood size in each
nest is used as an offset in the model.
Zuur et al. (2009) fitted a Poisson generalized linear mixed-effects model to
these data (see their Section 13.2.2). We extend that example by considering
zero-inflation.
We use a zero-inflated Poisson model with a log link function for the count
(Poisson) part of the model [i.e., the inverse link is exponential] and a logit link
for the binary part [i.e., the inverse link is logistic]:
$$Y_{ij} \sim \text{ZIPois}(\lambda = \exp(\eta_{ij}),\; p = \text{logistic}(z_i)) \qquad (1)$$
In this case we use only a single, overall zero-inflation parameter for the
whole model, $z_i = z_0$. (In R's Wilkinson-Rogers notation, this would be
written as z~1.)
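As a concrete reading of eq. (1), the following is a minimal R sketch of the zero-inflated Poisson distribution. The functions dzip and rzip and the parameter values are hypothetical, introduced here only for illustration; they are not part of any package used in this example.

```r
## Hypothetical helpers (not from any package used in this document).
## ZIP: with probability p an observation is a structural zero;
## otherwise it is drawn from a Poisson distribution with mean lambda.
dzip <- function(y, lambda, p) {
  ifelse(y == 0,
         p + (1 - p) * dpois(0, lambda),
         (1 - p) * dpois(y, lambda))
}

rzip <- function(n, lambda, p) {
  structural_zero <- rbinom(n, size = 1, prob = p)
  ifelse(structural_zero == 1, 0, rpois(n, lambda))
}

## Illustrative parameters on the link scales used in eq. (1)
eta    <- 1.2            # linear predictor of the count part (log link)
z0     <- -1             # single zero-inflation parameter (logit link)
lambda <- exp(eta)       # inverse log link
p      <- plogis(z0)     # inverse logit link

## Probability of observing a zero under the ZIP and under a plain Poisson
p_zero_zip  <- dzip(0, lambda, p)
p_zero_pois <- dpois(0, lambda)
```

Comparing p_zero_zip with p_zero_pois shows how the extra parameter p raises the probability of a zero above what the Poisson part alone would give.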
The model for the vector $\eta$, the linear predictor of the Poisson part of the
distribution, follows a standard linear mixed model formulation:
$$(\eta \mid B = b) = X\beta + Zb + f \qquad (2)$$
and
$$B \sim \text{Normal}(0, \Sigma_\theta) \qquad (3)$$
where $\eta$ is the vector of linear predictors; $B$ is the random variable representing
a vector of random effects; $b$ is a particular value of this random variable; $X$
is the fixed-effect design matrix; $\beta$ is the vector of fixed-effect parameters; $Z$
is the random-effect design matrix; $\Sigma_\theta$ is the variance-covariance matrix of the
random effects, with parameters $\theta$; and $f$ is an offset vector.
In this case, the fixed-effect design matrix $X$ includes columns for the
intercept, the difference between males and females, the difference between food
treatments, arrival time, and their interactions. The random-effect design matrix
$Z$ is a simple dummy matrix encoding nest identity. There is only a single random
effect, so $\Sigma_\theta$ is a diagonal matrix with the random-effects (nest) variance
$\sigma^2_{\text{nest}}$ on the diagonal (the nest random effects are independent and
identically distributed) and the random-effects parameter vector is just
$\theta = \{\sigma^2_{\text{nest}}\}$. The offset $f$, equal to the log of the number of chicks
in the nest, accounts for the fact that the data give the total number of
vocalizations per nest, while we are ultimately interested in the number of
vocalizations per chick.
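To make the roles of $X$, $Z$, $b$ and $f$ concrete, here is a small simulation sketch in base R that builds each piece of eqs. (1)-(3) for a made-up data set. All dimensions, covariates and parameter values below are assumptions chosen for illustration; none of them come from the owl data.

```r
set.seed(101)

## Hypothetical dimensions and values, for illustration only
n_nest  <- 5
n_obs   <- 40
nest    <- factor(sample(seq_len(n_nest), n_obs, replace = TRUE),
                  levels = seq_len(n_nest))
brood   <- sample(2:6, n_nest, replace = TRUE)[as.integer(nest)]  # chicks per nest
arrival <- rnorm(n_obs)                                           # a centered covariate

X <- model.matrix(~ arrival)      # fixed-effect design matrix
Z <- model.matrix(~ nest - 1)     # dummy matrix encoding nest identity
f <- log(brood)                   # offset: log brood size

beta       <- c(0.8, -0.1)        # fixed-effect parameters
sigma_nest <- 0.4                 # among-nest standard deviation
b <- rnorm(n_nest, mean = 0, sd = sigma_nest)   # eq. (3), Sigma = sigma_nest^2 * I

eta    <- as.vector(X %*% beta + Z %*% b + f)   # eq. (2)
lambda <- exp(eta)                              # expected calls per nest visit

## eq. (1): zero-inflated Poisson response with a single zero-inflation p
p <- plogis(-1)
y <- ifelse(rbinom(n_obs, 1, p) == 1, 0, rpois(n_obs, lambda))

## Because f = log(brood), the expected number of calls per chick
## does not depend on brood size
per_chick <- lambda / brood
```

Dividing lambda by brood size removes the offset, which is why a coefficient fixed at 1 for log brood size corresponds to modeling calls per chick.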
For the purposes of model comparison (timing, accuracy, etc.) we will stick
with the offset, zero-inflated Poisson model described by eqs. (1) and (2), but in
the discussion below we will also consider alternative formulations that seem to
fit the data well (e.g., using overdispersed distributions such as the
lognormal-Poisson or negative binomial; allowing the response to brood size to be
other than strictly proportional).
Preliminaries
Load packages:
> library(coda)
> library(reshape)
> library(MCMCglmm)
> library(coefplot2)
> library(glmmADMB)
> library(bbmle)
> library(ggplot2)
> theme_set(theme_bw())
> library(RColorBrewer)
> library(R2jags)
> ## library(lme4)
Package versions:
bbmle          1.0.5.2
coda           0.15-2
coefplot2      0.1.3
ggplot2        0.9.2.1
glmmADMB       0.7.2.12
lme4           0.99999911-0
MCMCglmm       2.16
R2WinBUGS      2.1-18
RColorBrewer   1.0-5
reshape        0.8.4
rjags          3-7
R2jags         0.03-08
Load the data and use the rename function from the reshape package to
change the name of the response variable:
> load("../DATA/Owls.rda")
> library(reshape)
> Owls <- rename(Owls,c(SiblingNegotiation="NCalls"))
Scale (here, center) arrival time: necessary for WinBUGS, useful to have it done
up front for the sake of parameter comparisons (see footnote 1):
> Owls <- transform(Owls,ArrivalTime=scale(ArrivalTime,center=TRUE,scale=FALSE))
There are (at least) three different ways to do this problem in R, although (as
far as we know) there is no simple, out-of-the-box method that is built purely
on R (or underlying C or FORTRAN code) that solves the problem as we have
stated it.
The reference method, used for comparisons with ADMB and WinBUGS,
is the MCMCglmm package; this method is the only one that works out of
the box in R without recourse to other (non-R) tools. Its only disadvantage
is that it fits a lognormal-Poisson version of the model, i.e., with
an observation-level random effect added (Elston et al., 2001), rather than
the reference Poisson model. In terms of equations 2 and 3, we add a parameter
$\sigma^2_{\text{obs}}$ to the random-effects parameter vector $\theta$ and expand
$\Sigma_\theta$ to incorporate an additional set of diagonal components
$\sigma^2_{\text{obs}}$ (the observation-level random effects are also independent
and identically distributed).
(Because it allows for overdispersion in the count part of the model, this
model is actually superior to the ZIP we originally proposed, but it is not
the reference model.)
Footnote 1: Here we replace the original variable with the scaled version, rather
than creating a new variable called (say) scArrivalT. This can potentially lead to
confusion if we forget whether we are dealing with the scaled or the unscaled
version.
One can cheat a little bit and make use of ADMB (but without having to
do any explicit AD Model Builder coding) by using the glmmADMB package,
which encapsulates a subset of ADMB's capabilities into a pre-compiled
binary that can be run from within R using a fairly standard formula
syntax.
With a bit of hacking, one can write up a fairly simple, fairly generic implementation of the expectation-maximization (EM) algorithm that alternates between fitting a GLMM (using glmer) with data that are weighted
according to their zero probability, and fitting a binary GLM (using glm)
for the probability that a data point is zero. We have implemented this
in the code in owls_R_funs.R, as a function called zipme (see below).
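The alternation described in the last approach can be sketched in a few lines of base R. This is not the zipme function from owls_R_funs.R: as a simplifying assumption it uses glm rather than glmer (so there are no random effects), and it handles only an intercept-only zero-inflation part, for which the binary fit reduces to a mean. The name zip_em and the simulated data are hypothetical.

```r
## Hypothetical sketch, not the authors' zipme(): EM for a ZIP model with
## fixed effects only and a single zero-inflation probability p.
zip_em <- function(formula, data, maxitr = 200, tol = 1e-8) {
  y  <- model.response(model.frame(formula, data))
  p  <- 0.5 * mean(y == 0)            # starting value for zero-inflation
  mu <- rep(mean(y), length(y))       # starting values for the Poisson means
  loglik_old <- -Inf
  for (itr in seq_len(maxitr)) {
    ## E-step: probability that each observed zero is a structural zero
    w <- ifelse(y == 0, p / (p + (1 - p) * exp(-mu)), 0)
    ## M-step (count part): Poisson GLM weighted by P(not a structural zero)
    data$.w <- 1 - w
    cfit <- glm(formula, family = poisson, data = data, weights = .w)
    mu <- fitted(cfit)
    ## M-step (binary part): with an intercept-only model this is the mean weight
    p <- mean(w)
    ## Observed-data log-likelihood, used as the convergence criterion
    loglik <- sum(log(ifelse(y == 0,
                             p + (1 - p) * exp(-mu),
                             (1 - p) * dpois(y, mu))))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(coef = coef(cfit), p = p, loglik = loglik, iterations = itr)
}

## Simulated example with known parameters (assumed values)
set.seed(42)
n <- 2000
d <- data.frame(x = rnorm(n))
d$y <- ifelse(rbinom(n, 1, 0.3) == 1, 0, rpois(n, exp(1 + 0.5 * d$x)))
fit <- zip_em(y ~ x, data = d)
```

The weights are stored as a column of the data frame so that glm can find them regardless of where the formula was created.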
3.1 MCMCglmm
f1   x1
A    1.1
A    2.3
B    1.7

trait     f1   x1
resp      A    1.1
resp      A    2.3
resp      B    1.7
zi_resp   A    1.1
zi_resp   A    2.3
zi_resp   B    1.7
only the binary part (zi_resp) of the model: the first would be specified as an
interaction of at.level(trait,1) with a covariate (i.e., the covariate affects
only responses for level 1 of trait, which are the count responses).
Here is the model specified by Zuur et al. (2009), which includes an offset effect of brood size and most but not all of the interactions between FoodTreatment,
SexParent, and ArrivalTime on the count aspect of the model, but only a single fixed (intercept) effect on the binary part of the model (i.e., the zero-inflation
term):
> fixef2 <- NCalls~trait-1+ ## intercept terms for both count and binary terms
at.level(trait,1):logBroodSize+
at.level(trait,1):((FoodTreatment+ArrivalTime)*SexParent)
(If we wanted to include covariates in the model for the level of zero-inflation
we would use an interaction with at.level(trait,2): for example, we
would add a term at.level(trait,2):SexParent to the fixed-effect model
if we wanted to model a situation where the zero-inflation proportion varied
according to parental sex.)
The next complexity is in the specification of the priors, which (ironically)
we have to do in order to make the model simple enough. By default, MCMCglmm
will fit the same random-effect models to both parts of the model (count and
binary). Here, we want to turn off the random effect of nest for the binary
aspect of the model. In addition, MCMCglmm always fits residual error terms for
both parts of the model. In our first specification, we fix the residual error
(R) of the binary part of the data to 1 (because it is not identifiable) by setting
fix=2; the parameter nu=0.002 specifies a very weak prior for the other
(non-fixed) variance term. (In order to get reasonable mixing of the chain we have
to fix it to a non-zero value.) We also essentially turn off the random effect on
the binary part of the model by fixing its variance to $10^{-6}$, in the same way.
> prior_overdisp <- list(R=list(V=diag(c(1,1)),nu=0.002,fix=2),
G=list(list(V=diag(c(1,1e-6)),nu=0.002,fix=2)))
prior_overdisp will serve as a base version of the prior, but we also want
to specify that the log brood size enters the model as an offset. We do this
by making the priors for all but the log-brood-size effect (nearly) equal to the
default for fixed effects [$N(\mu = 0, \sigma^2 = 10^8)$, an (extremely) weak prior
centered on zero] and setting a very strong prior centered at 1
[$N(\mu = 1, \sigma^2 = 10^{-6})$] for the log-brood-size effect (see footnote 2).
> prior_overdisp_broodoff <- c(prior_overdisp,
list(B=list(mu=c(0,1)[offvec],
V=diag(c(1e8,1e-6)[offvec]))))
Footnote 2: If we set $\sigma^2 = 10^{10}$, the default value, for the non-offset
predictors, we get an error saying that "fixed effect V prior is not positive
definite": the difference in variances is so large that the smaller variance
underflows to zero in a calculation.
 G-structure:  ~idh(trait):Nest
               post.mean l-95% CI u-95% CI eff.samp
NCalls.Nest     0.095764 0.020663 0.203210    439.1
zi_NCalls.Nest  0.000001 0.000001 0.000001      0.0
 R-structure:  ~idh(trait):units
NCalls.units
zi_NCalls.units
[Figure: "Regression estimates" coefficient plot for the MCMCglmm fit, showing
NCalls, zi_NCalls, logBroodSize, FTSatiated, ArrivalTime, SexParentMale,
FTSatiated:SexParentMale, and ArrivalTime:SexParentMale, together with the
variance components VCV.NCalls.Nest and VCV.NCalls.units.]
We have to look at the trace plots (plots of the time series of the Markov
chain) to make sure that the fits behaved appropriately. We are hoping that
the trace plots look like white noise, with rapid and featureless variation.
For the fixed effects:
> print(xyplot(mfit1$Sol,layout=c(3,3)))
[Figure: trace plots of the fixed-effect parameters (NCalls, zi_NCalls,
logBroodSize, FTSatiated, ArrivalTime, SexParentMale, FTSatiated:SexParentMale,
ArrivalTime:SexParentMale) against iteration number.]
The one parameter that looks slightly questionable is zi_NCalls, and indeed
its effective size is rather lower than we'd like (we should be aiming for at least
200 if we want reliable confidence intervals):
> round(sort(effectiveSize(mfit1$Sol)))
                zi_NCalls                FTSatiated ArrivalTime:SexParentMale
                       78                       331                       541
              ArrivalTime  FTSatiated:SexParentMale                    NCalls
                      549                       570                       671
            SexParentMale              logBroodSize
                      811                      1000
Note that although logBroodSize is varying, it's varying over a very small range
(0.998-1.002) because of the strong prior we imposed.
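To give a feel for what an effective sample size measures, here is a hedged base-R sketch that estimates it from an autoregressive fit to the chain (the spectral density at frequency zero). This follows the same general idea as coda's effectiveSize but is not a substitute for it; the helper ess_ar and the simulated chains are hypothetical.

```r
## Hypothetical helper: effective sample size from an AR fit to one chain.
ess_ar <- function(x) {
  fit   <- ar(x, aic = TRUE)                     # AR model, order chosen by AIC
  spec0 <- fit$var.pred / (1 - sum(fit$ar))^2    # spectral density at frequency 0
  length(x) * var(x) / spec0
}

set.seed(1)
n <- 1000
white  <- rnorm(n)                                    # a well-mixed "chain"
sticky <- as.numeric(arima.sim(list(ar = 0.9), n))    # a strongly autocorrelated chain

ess <- c(white = ess_ar(white), sticky = ess_ar(sticky))
```

A chain that looks like white noise retains an effective size close to its length, while a slowly mixing chain of the same length is worth far fewer independent samples.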
We can also run a quantitative check on convergence, using geweke.diag
which gives a Z-statistic for the similarity between the first 10% and the last
50% of the chain:
> geweke.diag(mfit1$Sol)
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5

                   NCalls                 zi_NCalls              logBroodSize
                  0.94798                  -1.75351                  -0.07509
               FTSatiated               ArrivalTime             SexParentMale
                  0.13767                  -0.58302                  -0.41654
 FTSatiated:SexParentMale ArrivalTime:SexParentMale
                  0.63148                   1.25217
All the values are within the 95% confidence interval of the standard
normal (i.e., |x| < 1.96), although zi_NCalls (-1.75) comes closest to the
boundary.
For variance parameters:
> vv <- mfit1$VCV
> ## drop uninformative ZI random effects
> vv <- vv[,c("NCalls.Nest","NCalls.units")]
> print(xyplot(vv,layout=c(1,2)))
> effectiveSize(vv)
 NCalls.Nest NCalls.units
    439.1036     325.4316
[Figure: trace plots of the variance parameters NCalls.units and NCalls.Nest
against iteration number.]
Density plots display the posterior distributions of the parameters, useful for
checking whether the distribution of sampled points is somehow odd (strongly
skewed, bimodal, etc.). Symmetry and approximate normality of the posterior
are useful for inference (e.g., the DIC is based on an assumption of approximate
normality, and quantiles and highest posterior density intervals give the same
results for symmetric posteriors), but not absolutely necessary.
For fixed effects:
> print(densityplot(mfit1$Sol,layout=c(3,3)))
[Figure: posterior density plots of the fixed-effect parameters (NCalls,
zi_NCalls, logBroodSize, FTSatiated, ArrivalTime, SexParentMale,
FTSatiated:SexParentMale, ArrivalTime:SexParentMale).]
[Figure: posterior density plots of the variance parameters NCalls.units and
NCalls.Nest.]
Comparing the fits of the model with and without log brood size:
> coefplot2(list("brood-offset"=mfit1,"brood-est"=mfit2),
intercept=TRUE,
varnames=vn1,
legend=TRUE,legend.x="right")
[Figure: "Regression estimates" coefficient plot comparing the brood-offset and
brood-est fits for NCalls, zi_NCalls, logBroodSize, FTSatiated, ArrivalTime,
SexParentMale, FTSatiated:SexParentMale, and ArrivalTime:SexParentMale.]
With the exception of the log brood size and intercept parameters, the parameters are similar. The log brood size parameter is quite far from 1:
> coda::HPDinterval(mfit2$Sol)["logBroodSize",]
     lower      upper
0.03850979 0.73115415
We can also create versions of the model that attempt to eliminate the
overdispersion in the count part of the model. We can't fix both variance
parameters, but we can make the variance prior informative (by setting nu=100, or
nu=1000) and make the (1,1) element of the variance matrix small. However, if
we try too hard to do this, we sacrifice a well-mixed chain (this mixing property
is the reason that MCMCglmm automatically adds an observation-level variance to
every model). Since the model runs reasonably fast, for this case we might be
able to use brute force and just run the model 10 or 100 times as long.
We could set up prior specifications for this non-overdispersed case as follows:
> prior_broodoff <- within(prior_overdisp_broodoff,
R <- list(V=diag(c(1e-6,1)),nu=100,fix=2))
> prior_null <- within(prior_overdisp,
R <- list(V=diag(c(1e-6,1)),nu=100,fix=2))
3.2 glmmADMB
This problem can also be done with glmmADMB. The model is about equally fast,
and the coding is easier! On the other hand, there is arguably some advantage
to having the MCMC output (which would take longer to get with ADMB).
> library(glmmADMB)
> gt1 <- system.time(gfit1 <- glmmadmb(NCalls~(FoodTreatment+ArrivalTime)*SexParent+
offset(logBroodSize)+(1|Nest),
data=Owls,
zeroInflation=TRUE,
family="poisson"))
It takes 26.9 seconds, slightly faster than MCMCglmm.
> summary(gfit1)
Call:
glmmadmb(formula = NCalls ~ (FoodTreatment + ArrivalTime) * SexParent +
    offset(logBroodSize) + (1 | Nest), data = Owls, family = "poisson",
    zeroInflation = TRUE)

Coefficients:
                                    Estimate Std. Error z value Pr(>|z|)
(Intercept)                           0.8571     0.0823   10.41  < 2e-16 ***
FoodTreatmentSatiated                -0.3314     0.0635   -5.22  1.8e-07 ***
ArrivalTime                          -0.0807     0.0156   -5.18  2.3e-07 ***
SexParentMale                        -0.0838     0.0455   -1.84    0.066 .
FoodTreatmentSatiated:SexParentMale   0.0740     0.0761    0.97    0.331
ArrivalTime:SexParentMale            -0.0150     0.0143   -1.05    0.295
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of observations: total=599, Nest=27
Random effect variance(s):
Group=Nest
            Variance StdDev
(Intercept)     0.14 0.3742

Zero-inflation: 0.25833  (std. err.:  0.018107 )

Log-likelihood: -1985.3
The results are quite similar, although MCMCglmm gives wider confidence
intervals in general because it considers the uncertainties in all model components, and because it allows for overdispersion.
[Figure: "Regression estimates" coefficient plot comparing glmmADMB and MCMCglmm
for FTSatiated, ArrivalTime, SexParentMale, FTSatiated:SexParentMale, and
ArrivalTime:SexParentMale.]
[Figure: "Regression estimates" coefficient plot comparing glmmADMB_NB,
glmmADMB_Poiss, and MCMCglmm for the same parameters.]
It seems that most of the difference in confidence interval size was due to
overdispersion (or lack of it), rather than to what components of uncertainty
were included.
To explore the variance-mean relationship, we can calculate the mean and
variance by group and look at the relationship. A linear relationship implies
that a quasi-Poisson or negative binomial type 1 model, with variance V
proportional to the mean (Hardin and Hilbe, 2007), is likely to be best, while a
relationship of the form $V = \mu + C\mu^2$ implies that a negative binomial type 2
(the standard in R and most other statistics packages) or lognormal-Poisson fit is
likely to be best.
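For reference, the variance functions being compared can be written side by side. The symbols $\phi$ (a dispersion factor), $k$ (the negative binomial size parameter) and $\sigma^2$ (the variance of the observation-level random effect on the log scale) are notation assumed here rather than taken from the text above; under these parameterizations the constant $C$ corresponds to $1/k$ for the negative binomial type 2 and to $e^{\sigma^2} - 1$ for the lognormal-Poisson.

```latex
\begin{align*}
\text{Poisson:} \quad & V = \mu \\
\text{quasi-Poisson / NB1:} \quad & V = \phi\,\mu \\
\text{NB2:} \quad & V = \mu + \mu^2 / k \\
\text{lognormal-Poisson:} \quad & V = \mu + \mu^2 \left(e^{\sigma^2} - 1\right)
\end{align*}
```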
> library(plyr)
> mvtab <- ddply(Owls,
.(FoodTreatment:SexParent:Nest),
summarise,
callmean=mean(NCalls),
callvar=var(NCalls))
> q1 <- qplot(callmean,callvar,data=mvtab)
> print(q1+
## linear (quasi-Poisson/NB1) fit
geom_smooth(method="lm",formula=y~x-1)+
## smooth (loess)
geom_smooth(colour="red")+
## semi-quadratic (NB2/LNP)
geom_smooth(method="lm",formula=y~I(x^2)+offset(x)-1,colour="purple")+
## Poisson (v=m)
geom_abline(a=0,b=1,lty=2))
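[Figure: scatterplot of callvar against callmean with the fitted linear, loess,
semi-quadratic, and Poisson (v=m) lines.]
[Figure: "Regression estimates" coefficient plot comparing glmmADMB_NB1,
glmmADMB_NB, glmmADMB_Poiss, and MCMCglmm for FTSatiated, ArrivalTime,
SexParentMale, FTSatiated:SexParentMale, and ArrivalTime:SexParentMale.]
3.3 EM algorithm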
we have written which takes a standard R model formula for the Poisson part
(cformula); a model formula with z on the left-hand side for the zero-inflation
part (zformula); and additional parameters data, maxitr (maximum number of
iterations), tol (tolerance criterion), and verbose (whether to print out progress
messages).
> source("../R/owls_R_funs.R")
> library(lme4)
> zt1 <- system.time(ofit_zipme <-
zipme(cformula=NCalls~(FoodTreatment+ArrivalTime)*SexParent+
offset(logBroodSize)+(1|Nest),
zformula=z ~ 1,
data=Owls,maxitr=20,tol=1e-6,
verbose=FALSE))
> Owls$obs <- seq(nrow(Owls))
> zt2 <- system.time(ofit2_zipme <-
zipme(cformula=NCalls~(FoodTreatment+ArrivalTime)*SexParent+
offset(logBroodSize)+(1|Nest)+(1|obs),
zformula=z ~ 1,
data=Owls,maxitr=20,tol=1e-6,
verbose=FALSE))
The EM fits take 9.9 and 91.2 seconds without and with observation-level
random effects, respectively.
The EM results are similar to the corresponding fits from other approaches:
the Poisson fit looks most similar to the glmmADMB Poisson fit (this is the
reference model, which doesn't account for overdispersion) and the
lognormal-Poisson fit (with an observation-level random effect) looks most like the
MCMCglmm fit.
The only innovation here is using brewer.pal(6,"Dark2") from the RColorBrewer
package to get some nicer colors.
[Figure: "Regression estimates" coefficient plot comparing zipme2, zipme,
glmmADMB_NB1, glmmADMB_NB, glmmADMB_Poiss, and MCMCglmm for FTSatiated,
ArrivalTime, SexParentMale, FTSatiated:SexParentMale, and
ArrivalTime:SexParentMale.]
4 ADMB
4.1 Code in owls.tpl
 1  DATA_SECTION
 2    init_int nobs                          // # of observations (599)
 3    init_int nnests                        // # of Random Effect Levels (Nests) (27)
 4    init_int nf                            // # of fixed effects incl interactions and intercept
 5    init_matrix fdesign(1,nobs,1,nf)       //Design matrix (intercept & fixed effects)
 6    init_matrix rdesign(1,nobs,1,nnests)   //Design matrix (random effects)
 7    init_vector obs(1,nobs)                //Response variable (number of calls)
 8    init_vector Lbrood(1,nobs)             //Log of brood size (offset)
 9
10  PARAMETER_SECTION
11    init_vector fcoeffs(1,nf)
12    init_number logit_p
13    init_bounded_number sigma(0.001,100)
14    vector eta(1,nobs)
15    vector mu(1,nobs)
16    number p
17    objective_function_value jnll
18    random_effects_vector rcoeffs(1,nnests)
19
20  PROCEDURE_SECTION
21    jnll = 0.0;
22    p = exp(logit_p)/(1.0+exp(logit_p));
23    eta = Lbrood + fdesign*fcoeffs + rdesign*(rcoeffs*sigma);
24    mu = exp(eta);
25-40  [lines not recovered; see the line-by-line description below]
41  TOP_OF_MAIN_SECTION
42    gradient_structure::set_MAX_NVAR_OFFSET(764);
43  GLOBALS_SECTION
the joint negative log-likelihood of the observation model and the random
effects.
line 22 transforms the logit of the zero-inflation parameter back to the
probability scale;
line 23 computes the linear predictor (offset + fixed effects + random
effects): in rcoeffs*sigma (scaling the random effects by the random-effects
standard deviation: vector × scalar), * acts as elementwise multiplication; in the
other cases (multiplying fixed-effect and random-effect design matrices by the
corresponding parameter vectors) it acts as matrix multiplication;
line 24 converts the linear predictor from the log to the count scale;
lines 26-36 loop over the observations, subtracting the ZIP log-likelihood
(computed on lines 28-35) for each observation from the objective function
value (joint negative log-likelihood) jnll;
line 38 adds the negative log-likelihood of the random effects to jnll;
line 42 makes sure there's enough space. ADMB told us to use this
number when we tried to run the program.
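The quantity assembled by these lines can be mirrored in a short R function, which may be easier to experiment with than the template. This is a hypothetical transcription for illustration only (the function zip_jnll and the toy inputs are not part of the original code); it assumes, as in the template, that rcoeffs are standard-normal random effects scaled by sigma.

```r
## Hypothetical R transcription of the joint negative log-likelihood;
## the fitted model uses the ADMB template, not this function.
zip_jnll <- function(fcoeffs, logit_p, sigma, rcoeffs,
                     obs, fdesign, rdesign, Lbrood) {
  p   <- plogis(logit_p)                               # cf. line 22
  eta <- as.vector(Lbrood + fdesign %*% fcoeffs +
                   rdesign %*% (rcoeffs * sigma))      # cf. line 23
  mu  <- exp(eta)                                      # cf. line 24
  ## ZIP log-likelihood of each observation
  ll <- ifelse(obs == 0,
               log(p + (1 - p) * exp(-mu)),
               log(1 - p) + dpois(obs, mu, log = TRUE))
  ## standard-normal log-density of the unscaled random effects
  ll_re <- dnorm(rcoeffs, mean = 0, sd = 1, log = TRUE)
  -sum(ll) - sum(ll_re)
}

## Toy inputs (assumed values): four observations from two nests
obs     <- c(0, 3, 0, 5)
fdesign <- cbind(1, c(-1, 0, 1, 2))
rdesign <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
Lbrood  <- log(c(2, 2, 4, 4))
jnll <- zip_jnll(fcoeffs = c(0.5, 0.1), logit_p = 0, sigma = 0.3,
                 rcoeffs = c(0, 0), obs = obs, fdesign = fdesign,
                 rdesign = rdesign, Lbrood = Lbrood)
```

ADMB minimizes this joint quantity over the random effects and applies the Laplace approximation; the R function only evaluates it at given parameter values.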
4.2
Load the R2admb package and tell R where to find the admb executable:
> library(R2admb)
> setup_admb()
[1] "/usr/local/admb"
Read in the data (if we hadn't already done so):
> load("../DATA/Owls.rda")
> ## don't forget to transform!
> Owls <- transform(Owls,ArrivalTime=scale(ArrivalTime,center=TRUE,scale=FALSE))
Organize the data: these objects need to be defined in advance in order
for R2admb to check them properly.
> LBrood <- log(Owls$BroodSize)
> mmf <- model.matrix(~(FoodTreatment+ArrivalTime)*SexParent,
data=Owls)
> mmr <- model.matrix(~Nest-1, data=Owls)
> response <- Owls$SiblingNegotiation
> nf <- ncol(mmf)
> nobs <- nrow(Owls)
> nnests <- length(levels(Owls$Nest))
Combine these objects into a list and write them to a file in the appropriate
format for ADMB:
> regdata=list(nobs=nrow(Owls),
nnests=length(levels(Owls$Nest)),
nf=ncol(mmf),
fdesign=mmf,
rdesign=mmr,
obs=response,
Lbrood=LBrood)
> write_dat("owls", L=regdata)
Define and write a list of starting values for the coefficients:
> regparams=list(fcoeffs=rep(0,ncol(mmf)),
logit_p=0,
sigma=1,
rcoeffs=rep(0.001,length(levels(Owls$Nest))))
> write_pin("owls", L=regparams)
Compile the model, specifying that it contains random effects:
> compile_admb("owls", re=TRUE)
Run the compiled executable:
> xargs <- "-noinit -nox -l1 40000000 -nl1 40000000 -ilmn 5"
> run_admb("owls",extra.args=xargs)
The extra argument -noinit tells ADMB to start the random effects from the
last optimum values, instead of the pin file values, when doing the Laplace
approximation. -nox reduces the amount of information output while it's
running. -l1 allocates memory to prevent ADMB from creating the temporary
storage file f1b2list1, which is much slower than using RAM. -nl1 is similar
to -l1, but for the temporary file nf1b2list1. Users add these command-line
options when they see temporary files created. The amount to allocate is found
by trial and error or experience. In this user's experience, 40000000 is a good
value to try. -ilmn 5 is used to make ADMB run faster when there are a lot of
random effects; it tells ADMB to use a limited-memory quasi-Newton optimization
algorithm and only save 5 steps. (See the ADMB and ADMB-RE manuals, and
http://admb-project.org/community/tutorials-and-examples/memory-management,
for more information.)
Read in the results and clean up files that were produced by ADMB:
4.3 Results
The fit_admb object read in by R2admb works with many of the standard accessor methods in R: coef, stdEr, vcov, confint, etc.:
> methods(class="admb")
 [1] AIC.admb*      coef.admb*     coeftab.admb*  confint.admb*  deviance.admb*
 [6] logLik.admb*   print.admb*    stdEr.admb*    summary.admb*  vcov.admb*
FoodTreatmentSatiated       -0.2911060
SexParentMale               -0.0809064
ArrivalTime:SexParentMale   -0.0213974
sigma                        0.3596638
These coefficients are in the same order as the columns of the model matrix
mmf:
> colnames(mmf)
[1] "(Intercept)"                         "FoodTreatmentSatiated"
[3] "ArrivalTime"                         "SexParentMale"
[5] "FoodTreatmentSatiated:SexParentMale" "ArrivalTime:SexParentMale"
Checking equivalence:
> write_pin("owls",as.list(c(coef(gfit1),
logitpz=qlogis(gfit1$pz),
sigma=sqrt(gfit1$S[[1]]))))
Comparing the three reference implementations (glmmADMB, the EM algorithm
(zipme), and ADMB): the results from glmmADMB and ADMB aren't quite identical,
and it is an open question how much this matters. glmmADMB has a better
log-likelihood (-1985 vs. -1999), so either ADMB got stuck or the likelihoods are
calculated differently (robustification, etc.).
[Figure: coefficient plot comparing ADMB, zipme, and glmmADMB_Poiss for
FTSatiated, ArrivalTime, SexParentMale, FTSatiated:SexParentMale, and
ArrivalTime:SexParentMale.]
4.4
The same model can be written in a much more efficient way if we exploit the
separability of its parameters. We do this by operating on each observation
separately and only sending the necessary parameters to the separable function.
This lets ADMB know how sparse the Hessian will be. The fewer parameters
get sent together, the sparser the Hessian. So instead of using a matrix for
the random effects, we use an index vector for which nest corresponds to each
observation. Then, for each observation, only the random effect for the relevant
nest gets sent to the separable function.
The new TPL file is owls_sep.tpl.
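The equivalence between the dummy-matrix form and the index-vector form can be checked directly in R. The toy nest factor and random-effect values below are hypothetical; the point is only that looking up each observation's nest gives the same contribution to the linear predictor as the matrix product, while touching a single random effect per observation.

```r
## Hypothetical toy data: six observations from three nests
nest  <- factor(c("a", "a", "b", "c", "c", "c"))
r     <- c(0.2, -0.1, 0.4)     # one (unscaled) random effect per nest
sigma <- 0.5                   # random-effects standard deviation

## Matrix form, as in owls.tpl: rdesign * (rcoeffs * sigma)
rdesign   <- model.matrix(~ nest - 1)
re_matrix <- as.vector(rdesign %*% (r * sigma))

## Index form, as in owls_sep.tpl: look up the effect for each observation
nests    <- as.integer(nest)
re_index <- r[nests] * sigma

## Fraction of non-zero entries in the dummy matrix (1 / number of nests here)
dummy_density <- mean(rdesign != 0)
```

With 27 nests the dummy matrix is even sparser, which is the structure the separable formulation lets ADMB exploit.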
> sepdat=list(nobs=nrow(Owls),
nnests=length(levels(Owls$Nest)),
nf=ncol(mmf),
fdesign=mmf,
nests=as.integer(Owls$Nest),
obs=response,
Lbrood=LBrood)
> write_dat("owls_sep", L=sepdat)
The DATA_SECTION of the new tpl file looks like this:
DATA_SECTION
  init_int nobs                      // # obs (=599)
  init_int nnests                    // # random effects levels (=nests=27)
  init_int nf                        // # fixed effects (incl intercept & interaction)
  init_matrix fdesign(1,nobs,1,nf)   // Design matrix for fixed effects + intercept
  init_ivector nests(1,nobs)         // Nest indices
  init_vector obs(1,nobs)            // Response variable (# calls)
  init_vector Lbrood(1,nobs)         // Log of brood size (offset)
The PARAMETER_SECTION and the starting-value (pin) file are the same:
> write_pin("owls_sep", L=regparams)
The new PROCEDURE_SECTION is followed by two separable function definitions:
one to calculate the negative log-likelihood of the observations, and another to
calculate the negative log-likelihood of the random effects. These negative
log-likelihoods get added to the joint negative log-likelihood jnll. Note that the
definition of a SEPARABLE_FUNCTION must all be on one line, but it's split in this
document to fit on the page.
PROCEDURE_SECTION
jnll = 0.0;
for(int i=1; i<=nobs; i++)
{
pois(i, fcoeffs, rcoeffs(nests(i)), logit_p, sigma);
}
for(int n=1; n<=nnests; n++)
{
rand(rcoeffs(n));
}
SEPARABLE_FUNCTION void pois(int i, const dvar_vector& fcoeffs, const dvariable& r,
const dvariable& logit_p, const dvariable& sigma)
dvariable p = exp(logit_p)/(1.0+exp(logit_p));
dvariable eta = Lbrood(i) + fdesign(i)*fcoeffs + r*sigma;
dvariable mu = exp(eta);
if(obs(i)==0)
{
jnll-=log(p + (1-p)*exp(-mu) );
}
else//obs(i)>0
{
jnll-=log(1-p) + obs(i)*log(mu) - mu- gammln(obs(i)+1);
}
SEPARABLE_FUNCTION void rand(const dvariable& r)
jnll+=0.5*(r*r)+0.5*log(2.*M_PI); //for the random effects distributed N(0,1)
Then we compile and run the code from R and read back the results
> compile_admb("owls_sep", safe=FALSE, re=TRUE, verbose=FALSE)
> run_admb("owls_sep",verbose=FALSE,
extra.args="-shess -noinit -nox -l1 40000000 -nl1 40000000 -ilmn 5")
> fit_admb_sep <- read_admb("owls_sep")
The extra argument -shess tells ADMB to use algorithms that are efficient
for sparse Hessian matrices.
We get the same answer, at least up to a precision of $10^{-7}$:
> all.equal(coef(fit_admb),coef(fit_admb_sep),tol=1e-7)
[1] "Names: 6 string mismatches"
5 BUGS
model {
  ## PRIORS
  for (m in 1:5){
    beta[m] ~ dnorm(0, 0.01)
  }
  alpha ~ dnorm(0, .01)
  sigma ~ dunif(0, 5)
  tau <- 1/(sigma*sigma)
  psi ~ dunif(0, 1)
  for(j in 1:nnests){
    a[j] ~ dnorm(0, tau)
  }
  # Linear effects
  for(i in 1:N){
    SibNeg[i] ~ dpois(mu[i])
    mu[i] <- lambda[i]*z[i]+0.00001 ## hack required for Rjags -- otherwise 'incompatible'
    z[i] ~ dbern(psi)
    log(lambda[i]) <- offset[i] + alpha + inprod(X[i,],beta) + a[nest[i]]
  }
}
For convenience, we packaged the R code to run the BUGS model (in JAGS)
as a function.
> owls_BUGS_fit <- function(data, ni=25000, nb=2000, nt=10, nc=3) {
## Bundle data
data$ArrivalTime <- scale(data$ArrivalTime,center=TRUE,scale=FALSE)
fmm <- model.matrix(~(FoodTreatment+ArrivalTime)*SexParent,data=data)[,-1]
bugs.data <- with(data,list(N=nrow(data),
nnests=length(levels(Nest)),
offset = logBroodSize, ## nb specified on original scale
SibNeg = SiblingNegotiation,
nest = as.numeric(Nest),
X=fmm))
## Inits function
inits <- function(){list(a=rnorm(27, 0, .5),
sigma=runif(1,0,.5), alpha=runif(1, 0, 2), beta = rnorm(5))}
## Parameters to estimate
params <- c("alpha", "beta", "sigma", "psi")
The results give information about the fit (number of chains, length of burn-in,
number of samples, number saved) as well as estimated means, standard
deviations, and quantiles for each saved parameter. We use print(ofit,digits=2)
to reduce the resolution a bit so the print-out fits on the page better. (By
default, R2jags prints results with 3 digits of precision, while R2WinBUGS prints
them with only 1 digit of precision: you can override either of these defaults by
using print with an explicit digits option set.) In addition, the results contain
two useful diagnostics, the Gelman-Rubin statistic (Rhat) and the effective
size (n.eff). Because we have run multiple chains, we can use the (preferred)
Gelman-Rubin test of convergence, which assesses whether the variance among
and within chains is consistent with those chains sampling the same space (i.e.,
whether the chains have converged). The G-R statistics should be close to 1
(equal within- and between-chain variance), with a general rule of thumb that
$\hat{R} < 1.2$ is acceptable. (In this case it looks like we might have done a bit of
overkill, with the maximum Rhat value equal to 1.006.) The effective sample size
corrects for autocorrelation in the chain, telling how many equivalent samples
we would have if the chains were really independent. In this case the intercept
parameter mixes most poorly (n = 390), while another parameter is effectively
uncorrelated (n = 6900, which is equal to the total number of samples saved). In
general n > 1000 is overkill, n > 200 is probably acceptable.
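The logic of the Gelman-Rubin statistic can be illustrated with a basic version computed by hand for a single parameter. This sketch omits the refinements used by packaged implementations, so it will not reproduce their values exactly; the helper rhat_basic and the simulated chains are hypothetical.

```r
## Hypothetical helper: basic Gelman-Rubin statistic for one parameter.
## chains: matrix with one row per iteration and one column per chain.
rhat_basic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(chains))         # between-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(2)
mixed   <- matrix(rnorm(3 * 1000), ncol = 3)    # three chains, same target
unmixed <- sweep(mixed, 2, c(0, 1, 2), "+")     # chains stuck in different places

rhat <- c(mixed = rhat_basic(mixed), unmixed = rhat_basic(unmixed))
```

Chains that sample the same distribution give a value near 1, whereas chains centered in different places inflate the between-chain variance and push the statistic above the 1.2 rule of thumb.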
The R2jags package provides a plot method for rjags objects, which falls
back on the method defined in R2WinBUGS for bugs objects; plot(ofit) would
give us a nice, rich plot that compares the chains, but the effect here is a
bit ruined by the deviance, which is on a very different scale from the other
variables. We extract the BUGSoutput component of the fit (which is what gets
plotted anyway) and use a utility function dropdev (defined elsewhere) to drop
the deviance component from the fit:
> source("../R/owls_R_funs.R")
> oo <- dropdev(ofit$BUGSoutput)
> plot(oo)
[Figure: plot(oo) summary, titled: Bugs model at "owls.txt", fit using jags,
3 chains, each with 25000 iterations (first 2000 discarded). Panels show the 80%
interval for each chain and Rhat for alpha, beta[1] to beta[5], and psi, and
summaries for alpha, beta, psi, and sigma.]
> print(xyplot(mm2,layout=c(3,3)))
[Figure: trace plots of alpha, beta[1] to beta[5], psi, and sigma against .index.]
Combined summary
Finally, we will use some utility functions to summarize and compare all of the
parameter estimates from the methods described, for model variants as close
as we can get to the reference model (i.e. zero-inflated Poisson, log-brood-size
offset).
> b <- basesum(ovals,ncolparam=3)
> print(b$plots$params)
[Figure: parameter estimates (est) with confidence intervals for each method
(R.MCMCglmm.CI.quantile, R.MCMCglmm.CI.mcmc, BUGS.CI.mcmc, R.zipme.CI.quad,
R.glmmadmb.CI.quad, ADMB.separable.CI.quad, ADMB.regular.CI.quad), in panels
intercept, food, arrival, sex, foodsex, arrivalsex, logitpz, and nestvar.]
The MCMCglmm fit is the most noticeably different from all the rest, presumably because it is fitting a zero-inflated lognormal-Poisson (overdispersed)
model rather than a zero-inflated Poisson model.
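To see why a lognormal-Poisson model is overdispersed relative to a Poisson model, the following sketch simulates counts with and without a normal deviate added to each observation on the log scale. The log mean (1) and the standard deviation (0.5) are arbitrary values chosen for illustration, not estimates from the owl data, and zero-inflation is left out for simplicity.

```r
set.seed(101)
n   <- 1e5
eta <- 1                                            # log mean (arbitrary)
y_pois <- rpois(n, exp(eta))                        # plain Poisson counts
y_lnp  <- rpois(n, exp(eta + rnorm(n, sd = 0.5)))   # lognormal-Poisson counts

# Variance-to-mean ratio: about 1 for the Poisson, larger when overdispersed
disp_pois <- var(y_pois) / mean(y_pois)
disp_lnp  <- var(y_lnp) / mean(y_lnp)
print(c(poisson = disp_pois, lognormal_poisson = disp_lnp))
```

Because the extra variance term absorbs part of the variability that a pure Poisson model must attribute to the mean structure, estimates from the two kinds of model need not agree.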
The next most obvious discrepancy is in the confidence intervals for the
among-nest variance (nestvar). The ADMB fit was done on the standard deviation scale, and the confidence intervals were then transformed by squaring. The
more symmetric appearance of the MCMCglmm and BUGS confidence intervals
(which are based on the posterior distribution of the variance, and hence are invariant across changes in parameter scales) suggests that the sampling/posterior
distribution of the variance is actually more symmetric on the variance scale
than on the standard deviation scale.
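The asymmetry introduced by squaring can be checked with a short calculation. In this sketch the estimate and standard error on the standard deviation scale are hypothetical numbers chosen for illustration, not values from the ADMB fit.

```r
sd_hat <- 0.35   # hypothetical among-nest standard deviation
se_sd  <- 0.08   # hypothetical standard error on the sd scale

ci_sd   <- sd_hat + c(-1, 1) * qnorm(0.975) * se_sd  # symmetric Wald interval
ci_var  <- ci_sd^2                                   # transformed by squaring
var_hat <- sd_hat^2

# Distances from the point estimate to each limit on the variance scale
lower_gap <- var_hat - ci_var[1]
upper_gap <- ci_var[2] - var_hat
print(c(lower_gap = lower_gap, upper_gap = upper_gap))
```

The upper gap exceeds the lower gap by twice the squared half-width of the original interval, so an interval that is symmetric on the standard deviation scale (with a positive lower limit) is right-skewed on the variance scale.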
Beyond this, there are slight differences in the parameter estimates from
glmmADMB, possibly because of some difference in stopping criteria. (One could
try restarting glmmADMB at the consensus parameter values.)
Finally, we look at the timings of the various methods. The R methods are
all in the same ballpark, along with the non-separable variant of the ADMB model
(with E-M/zipme slightly faster than the others). BUGS is much slower (about 12
minutes), while the separable ADMB model is much faster (about 4 seconds).
We could use print(b$times) to look at the plot, but the values (times in
seconds) are easy enough to read:
> ff <- subset(ovals$base,select=c(.id,time))
> ff$time <- round(ff$time,1)
> ff[order(ff$time),]
             .id  time
2 ADMB.separable   4.5
7        R.zipme  15.5
4     R.MCMCglmm  25.5
5     R.MCMCglmm  25.5
6     R.glmmadmb  27.9
1   ADMB.regular  31.7
3           BUGS 725.9
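As a quick check on the rounded figures quoted in the text, the printed timings can be re-entered by hand and converted to minutes and to multiples of the fastest method. The data frame below simply retypes the output above; the object name timings and the column names minutes and rel are introduced here for illustration.

```r
timings <- data.frame(
  .id  = c("ADMB.separable", "R.zipme", "R.MCMCglmm", "R.MCMCglmm",
           "R.glmmadmb", "ADMB.regular", "BUGS"),
  time = c(4.5, 15.5, 25.5, 25.5, 27.9, 31.7, 725.9)   # seconds
)
timings$minutes <- round(timings$time / 60, 1)                # time in minutes
timings$rel <- round(timings$time / min(timings$time), 1)     # relative to fastest
print(timings)
```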
References
Dempster, A., N. Laird, and D. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. B 39,
1–38.
Elston, D. A., R. Moss, T. Boulinier, C. Arrowsmith, and X. Lambin (2001).
Analysis of aggregation, a worked example: numbers of ticks on red grouse
chicks. Parasitology 122 (5), 563–569.
Gelman, A. and J. Hill (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge, England: Cambridge University Press.
Hardin, J. W. and J. Hilbe (2007, February). Generalized linear models and
extensions. Stata Press.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to
defects in manufacturing. Technometrics 34, 1–14.
Minami, M., C. Lennert-Cody, W. Gao, and M. Roman-Verdesoto (2007,
April). Modeling shark bycatch: The zero-inflated negative binomial regression model with smoothing. Fisheries Research 84 (2), 210–221.
Roulin, A. and L. Bersier (2007). Nestling barn owls beg more intensely in
the presence of their mother than in the presence of their father. Animal
Behaviour 74, 1099–1106.
Schielzeth, H. (2010). Simple means to improve the interpretability of regression
coefficients. Methods in Ecology and Evolution 1, 103–113.
Spiegelhalter, D. J., N. Best, B. P. Carlin, and A. Van der Linde (2002). Bayesian
measures of model complexity and fit. Journal of the Royal Statistical Society
B 64, 583–640.
Zuur, A. F., E. N. Ieno, N. J. Walker, A. A. Saveliev, and G. M. Smith (2009,
March). Mixed Effects Models and Extensions in Ecology with R (1 ed.).
Springer.