
RMS: Research in Mathematics & Statistics
Journal homepage: www.tandfonline.com/journals/oama22

Bayesian inference for generalized linear mixed models: A comparison of different statistical software procedures

Belay Birlie Yimer & Ziv Shkedy

To cite this article: Belay Birlie Yimer & Ziv Shkedy (2021) Bayesian inference for generalized
linear mixed models: A comparison of different statistical software procedures, RMS: Research
in Mathematics & Statistics, 8:1, 1896102, DOI: 10.1080/27658449.2021.1896102

To link to this article: https://doi.org/10.1080/27658449.2021.1896102

© 2021 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.


Published online: 13 Apr 2021.


RMS: RESEARCH IN MATHEMATICS & STATISTICS
2021, VOL. 8, NO. 1, 1–16
https://doi.org/10.1080/27658449.2021.1896102

STATISTICS | RESEARCH ARTICLE

Bayesian inference for generalized linear mixed models: A comparison of different statistical software procedures

Belay Birlie Yimer (a)* and Ziv Shkedy (b)

(a) Department of Statistics, Jimma University, Jimma, Ethiopia; (b) Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-biostat), Hasselt University, Diepenbeek, Belgium

ABSTRACT
Bayesian inference for generalized linear mixed models (GLMM) is appealing, but its widespread use has been hampered by the lack of a fast implementation tool and the difficulty in specifying prior distributions. In this paper, we conduct an extensive simulation study to evaluate the performance of INLA for estimation of hierarchical Poisson regression models with overdispersion, in comparison with JAGS and Stan, while assuming a variety of prior specifications for the variance components. Further, we analysed the influence of different factors such as a small number of observations per cluster, different values of the cluster variance, and estimation from a misspecified model. The simulation study shows that the approximation strategy employed by INLA is accurate in general and that all software leads to similar results for most of the cases considered. Estimation of the variance components, however, is difficult for all estimation methods and prior specifications when their true value is small. The estimates obtained for all software tend to be biased downward or upward depending on the assumed priors.

ARTICLE HISTORY
Received 19 August 2020; Accepted 22 February 2021

KEYWORDS
Clustering; Bayesian modelling; overdispersion; GLMM; count data; INLA; JAGS; Stan

1 Introduction

The Integrated Nested Laplace Approximation (INLA) by Rue et al. (2009) is a Bayesian estimation method which is computationally faster than its predecessors. Since its introduction, its performance compared with other software for Bayesian analysis has been widely reported in the literature. Taylor and Diggle (2013) compared INLA and Markov chain Monte Carlo (MCMC) in the context of spatial log-Gaussian Cox processes. Their simulation study confirms the advantage of INLA in terms of computational time, but shows that INLA has a lower predictive accuracy in certain scenarios. The use of INLA for Bayesian inference for generalized linear mixed models (GLMMs) was investigated by Fong et al. (2010), who made a comparison with the maximum likelihood approach via Penalized Quasi-Likelihood (Breslow & Clayton, 1993). Fong et al. (2010) showed that the approximations were inaccurate for binary data with few or no replications. Further, the performance of Bayesian inference using INLA for a random intercept logit model was investigated by Grilli et al. (2015), who presented a comparison with Bayesian MCMC Gibbs sampling and maximum likelihood with Adaptive Gaussian Quadrature (AGQ) approximation. In contrast to the previous two studies, Grilli et al. (2015) showed that the specification of the prior distribution is more relevant than the choice of the estimation method. Following Fong et al. (2010), INLA's developers addressed the inaccuracy of INLA for binary data with few or no replications by introducing a new correction term for INLA (Ferkingstad & Rue, 2015), and they claim that their correction has significantly improved the accuracy.

In this paper, we extend the investigation of Grilli et al. (2015) to longitudinal count data with overdispersion and evaluate the performance of INLA in comparison with JAGS (Plummer, 2003) and Stan (Stan Development Team, 2015), while assuming a variety of prior specifications for the variance components. In contrast with the comparison presented in Ferkingstad and Rue (2015), which is based on the difference between the results obtained by INLA after the correction and the results obtained by MCMC simulation, we base our comparison on the difference between the parameter estimates obtained by all methods and the true parameter values. Further, we investigate the influence of small sample sizes and of the true values of the cluster variance and the overdispersion.

CONTACT Belay Birlie Yimer ([email protected]), Department of Statistics, Jimma University, Jimma, Ethiopia
*Arthritis Research UK Centre for Epidemiology, Division of Musculoskeletal and Dermatological Sciences, The University of Manchester, Manchester, UK
Reviewing Editor: Yueqing Hu, Fudan University School of Life Sciences, Shanghai, China
Supplemental data for this article can be accessed here.

We proceed as follows. In Section 2, the two case studies used for illustration are presented. In Section 3, we review the generalized linear mixed effects model, Bayesian estimation approaches, and prior specifications. Section 4 presents the results obtained for the two case studies, where we apply the three estimation methods to the two longitudinal count data sets. The simulation design and the main results of the simulation study are discussed in Section 5. Finally, Section 6 gives a discussion and concluding remarks.

2 Case studies

Two data sets with longitudinal count outcomes were used to illustrate the methodological aspects discussed in this paper.

2.1 Anopheles mosquitoes count data

A longitudinal entomological study was conducted between June 2013 and November 2013 in Jimma town, south-western Ethiopia, to investigate whether the ecological transformation due to resettlement has an influence on the abundance and species composition of Anopheles mosquitoes. This was done by comparing villages at the centre of the town with newly emerged villages located at the suburb of the town. The study design and rationale are presented in Degefa et al. (2015).

The study consists of longitudinal count data in which adult Anopheles mosquitoes resting inside human habitations were collected monthly (June 2013 to November 2013) from 40 selected houses using pyrethrum spray catches (PSCs). Half of the households belong to resettled villages and are considered to be at risk of malaria infection. The second half of the households belong to villages located in low-risk areas (control). Figure 1 (left panel) presents the monthly female Anopheles mosquito counts, while the right panel presents the mean evolution over time.

Figure 1. Female Anopheles mosquito count data. Individual count (left panel) and average profile (right panel) in at-risk and control villages, Jimma town, Southwest Ethiopia (June to November 2013).

2.2 The epilepsy data

The epilepsy data set contains information about 59 epileptic patients randomized into two treatment arms, a placebo or a new drug, in a randomized clinical trial of anticonvulsant therapy (Thall & Vail, 1990). The response variable, the number of epilepsy seizures, was measured at four visits over time. The data were presented and analysed by Breslow and Clayton (1993) and used as an illustrating example by Fong et al. (2010), who evaluated the performance of INLA in comparison with penalized quasi-likelihood (PQL). The epilepsy data set is available in R (R Core Team, 2016) as part of the R package INLA (Rue et al., 2009). Figure 2 shows the individual and average profiles of the epilepsy data for both treatment groups.

Figure 2. Individual profiles (left panel) and average profile (right panel) of the epilepsy data for both treatment groups.

3 Hierarchical Bayesian models for count data

3.1 Hierarchical Poisson-normal models

Generalized linear mixed models (GLMMs) extend the generalized linear model with a subject-specific random effect, usually of Gaussian type, added to the linear predictor, giving a rich family of models that have been used in a wide variety of applications (McCulloch & Neuhaus, 2001; Molenberghs & Verbeke, 2005). The analysis presented in this paper focuses on count variables (the number of female Anopheles mosquitoes and the number of epilepsy seizures) which were repeatedly measured over time. Hence, we formulated a hierarchical Poisson model for both case studies. Let Y_ij represent the response variable of the ith subject measured at time j, i = 1, 2, ..., I and j = 1, 2, ..., J. A hierarchical Poisson-normal (HPN) model can be formulated as

Y_ij ~ Poisson(λ_ij),
η_ij = log(λ_ij) = x_ij^T β + b_0i,

b_0i ~ N(0, σ_b0²),   (1)

where x_ij is a p-dimensional design vector of the fixed effect parameters β, and b_0i is a random intercept typically used to model repeated count responses. To complete the specification, we used non-informative prior distributions, namely a normal distribution with mean zero and variance 10,000 for the β's, and considered various non-informative prior distributions for σ_b0² (see Section 3.2).

The HPN model specified in (1) assumes that all sources of variability in the data can be captured by the random effects and the Poisson variability. A commonly encountered problem with count data is overdispersion (or underdispersion), which may cause serious flaws in precision estimation and inference (Breslow, 1990) if not appropriately accounted for. A number of extensions of the HPN model have been proposed to account for extra dispersion in the setting of longitudinal count data (Aregay et al., 2013, 2015; Molenberghs et al., 2007). Aregay et al. (2015) extend the HPN model in (1) by considering two separate random effects added to the linear predictor. The mean structure for their model (HPNOD) is given by

η_ij = log(λ_ij) = x_ij^T β + b_0i + u_ij,
u_ij ~ N(0, σ_u²),   (2)

where b_0i is the random intercept used to account for a possible clustering effect as before, and the random effect u_ij is included to accommodate the overdispersion not captured by the normal random effect b_0i. The assumed priors for σ_u² are discussed in Section 3.2. We chose the additive overdispersion model above rather than the multiplicative overdispersion model (Aregay et al., 2013; Molenberghs et al., 2007) to keep the parametrization of the model the same across software packages. Further, a comparison of the additive model and the multiplicative model has been described by Aregay et al. (2015), and they reported the performance of the two approaches to be the same.

3.1.1 Model formulation for Anopheles mosquitoes count data

Let Y_ij represent the number of female Anopheles mosquitoes counted for household i during month j of the follow-up period, let t_ij be the time point (in months) at which Y_ij has been measured, t_ij = 1, ..., 6 for all households, and let x_i be an indicator variable denoting the village type of the ith household, which takes the value one if the household is located in a resettled (at-risk) village and zero if the household is located in a control village. We assumed Y_ij | β, b_0i ~ Poisson(λ_ij), i = 1, ..., 40, j = 1, ..., 6, and that the pattern of Anopheles mosquito abundance over time is log-linear and possibly different between the control and at-risk villages. Thus, the HPN model has a linear predictor of the form

η_ij = log(λ_ij) = β_00 x_i + β_01 (1 − x_i) + β_10 x_i t_ij + β_11 (1 − x_i) t_ij + b_0i,   (3)

and the mean structure for the HPNOD model is given by

η_ij = log(λ_ij) = β_00 x_i + β_01 (1 − x_i) + β_10 x_i t_ij + β_11 (1 − x_i) t_ij + b_0i + u_ij.   (4)

3.1.2 Model formulation for epilepsy data

Breslow and Clayton (1993) analyzed the epilepsy data set presented in Section 2.2 using likelihood-based inference via penalized quasi-likelihood (PQL). Fong et al. (2010) used this data set to illustrate how Bayesian inference may be performed using INLA. We concentrate on the two random-effects models fit by Breslow and Clayton (1993) and Fong et al. (2010).
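To make the data-generating mechanism of models (3) and (4) concrete, here is a minimal Python sketch (the paper's own analyses use R; the β and σ values below are illustrative placeholders, not the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_hpnod(n_households=40, n_months=6,
                   beta=(1.5, 0.7, -0.27, -0.25),
                   sigma_b0=0.9, sigma_u=0.1):
    """Simulate counts from the HPNOD linear predictor (equation (4)).

    beta = (beta00, beta01, beta10, beta11); the first half of the
    households are controls (x_i = 0), the rest at-risk (x_i = 1).
    Setting sigma_u = 0 reduces the model to the HPN model (3).
    """
    b00, b01, b10, b11 = beta
    x = np.repeat([0, 1], n_households // 2)           # village-type indicator
    t = np.arange(1, n_months + 1)                     # months 1..6
    b0 = rng.normal(0.0, sigma_b0, size=n_households)  # random intercepts b_0i
    u = rng.normal(0.0, sigma_u, size=(n_households, n_months))  # overdispersion u_ij
    # linear predictor eta_ij, broadcast over households (rows) and months (cols)
    eta = (b00 * x[:, None] + b01 * (1 - x)[:, None]
           + b10 * x[:, None] * t + b11 * (1 - x)[:, None] * t
           + b0[:, None] + u)
    return rng.poisson(np.exp(eta))                    # Y_ij ~ Poisson(lambda_ij)

counts = simulate_hpnod()
print(counts.shape)  # (40, 6)
```

The same sketch, with σ_u = 0, generates data from the HPN model.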

Let Y_ij be the number of seizures for patient i at visit j, i = 1, 2, ..., 59 and j = 1, 2, ..., 4, assumed to be conditionally independent Poisson variables with mean λ_ij = exp(η_ij), where the linear predictor for the HPN model is given by

η_ij = log(λ_ij) = β_0 + β_1 log(Baseline_i/4) + β_2 Trt_i + β_3 Trt_i × log(Baseline_i/4) + β_4 log(Age_i) + β_5 V4_i + b_0i,   (5)

and the linear predictor for the HPNOD model is given by

η_ij = log(λ_ij) = β_0 + β_1 log(Baseline_i/4) + β_2 Trt_i + β_3 Trt_i × log(Baseline_i/4) + β_4 log(Age_i) + β_5 V4_i + b_0i + u_ij.   (6)

Here, Baseline_i/4 denotes one-fourth of the baseline seizure count of the ith patient, Trt_i is an indicator variable for the treatment arm of the ith patient, which takes the value one if the patient is assigned to the new drug and zero if the patient is assigned to the placebo group, and V4_i is an indicator variable for the fourth visit. To aid convergence when fitting the HPN and HPNOD models (5) and (6), the covariates log(Baseline_i/4), log(Age_i), and Trt_i × log(Baseline_i/4) were centred about their means.

3.2 Priors for the variance components of the hierarchical Poisson models

A Bayesian approach is attractive for modelling complex longitudinal count data, but it requires the specification of prior distributions for all the random elements of the model. For the hierarchical models in Section 3.1, this involves choosing priors for the regression coefficients and for the hyperparameters σ_b0² and σ_u² of the subject- and observation-specific random effects, respectively. Two classes of prior distributions, informative and non-informative priors, are used in Bayesian modelling. One can use informative priors when substantial prior information is available, for instance from previous studies relevant to the current data set. Non-informative prior distributions are intended to allow Bayesian inference for parameters about which little is known beyond the data included in the analysis at hand (Gelman, 2006). In this paper, we focus on the choice of non-informative priors.

The specification of a non-informative prior that expresses the absence of prior information about the cluster variance and the overdispersion is often difficult (Gelman, 2006). Various non-informative prior distributions have been suggested in the Bayesian literature, including an improper uniform density on the scale of the standard deviation (Gelman et al., 2003), proper distributions such as the inverse gamma(ε, ε) with small positive ε for the variance (Lunn et al., 2012), and the conditionally conjugate folded-non-central-t family of prior distributions for the standard deviation (Gelman, 2006). In this paper, we concentrate on the latter two approaches. We considered three specifications based on an inverse gamma prior for the variance (equivalently, a gamma prior for the precision σ⁻²) and a half-Cauchy prior (a special case of the conditionally conjugate folded-non-central-t family) for the standard deviation (see Figure 1 in the supplementary material):

1. p(σ⁻²) ~ Γ(1, 0.0005) — the default choice of the INLA software (Rue et al., 2009);
2. p(σ⁻²) ~ Γ(0.001, 0.001) — the default choice of the BUGS software (Lunn et al., 2012) and the most popular choice in Bayesian analysis;
3. p(σ⁻²) ~ Γ(0.5, 0.0164) — a specification proposed by Fong et al. (2010);
4. a half-Cauchy prior with scale 25 on σ — a specification proposed by Gelman (2006).

4 Results

4.1 Analysis of the Anopheles mosquitoes count data

In this subsection, we present the analysis of the Anopheles mosquitoes count data introduced in Section 2.1. We considered three estimation routines (namely Stan, JAGS, and INLA), all accessed through R version 3.3.2 (R Core Team, 2016), to fit the HPN and HPNOD models presented in Sections 3.1.1 and 3.1.2. For the Bayesian inference using runjags (Denwood, 2016), we ran 3 chains with 30,000 MCMC iterations per chain, of which the first 10,000 iterations were considered the burn-in period, while for RStan (Stan Development Team, 2016) the results are based on 4 chains with 4,000 MCMC iterations per chain (with the first 2,000 as burn-in). Note that the models can also be fitted in Stan using brms (Bürkner, 2017). An elaborate discussion of the two packages, RStan and brms, is given in Section 6 of the supplementary appendix. Model selection was made using the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002) for INLA and JAGS, and the widely applicable information criterion (WAIC; Vehtari et al., 2017) for Stan.

The posterior means and standard deviations of the parameters of the HPN and HPNOD models estimated with INLA, JAGS, and Stan under the four prior specifications are presented in Table 1. For all priors considered, the DIC and WAIC values of the HPNOD model are slightly higher than those of the HPN model, suggesting that the HPN model is preferred for this data set. All prior distributions and all estimation methods we examined yielded similar results for the regression coefficients. However, the results in Table 1 reveal the influence of the software used to fit the models and of the prior specification on the parameter estimates corresponding to the random-effect terms. The three estimation methods give similar point estimates for the variance of the random intercept under the Γ(0.001, 0.001) and Γ(0.5, 0.0164) priors in both the HPN and HPNOD models. On the other hand, the difference among estimation methods is noticeable under the Γ(1, 0.0005) and half-Cauchy priors, where the posterior mean of σ_b0² obtained from Stan is about 6% larger than that of INLA under the Γ(1, 0.0005) prior and about 13% larger under the half-Cauchy prior. Figure 3 shows the posterior density of the precision of the random intercept (σ_b0⁻²) obtained from JAGS, Stan, and INLA under the four prior distributions. All estimation methods lead to similar results under the Γ(0.001, 0.001) and Γ(0.5, 0.0164) priors, but the posterior density of INLA for σ_b0⁻² deviates from that of Stan under the Γ(1, 0.0005) and half-Cauchy priors. For INLA, we observed differences in the point estimates of σ_b0² and σ_u² across priors, with a profound variation across priors observed in the posterior density of σ_u² (Figure 4).

4.2 Analysis of the epilepsy data

The HPN and HPNOD models formulated in equations (5) and (6) were fitted using the three software packages and the four priors discussed in the previous sections. Table 2 presents the posterior means obtained for all fitted models. The DIC (WAIC) values of the HPNOD model obtained for all estimation methods are smaller than those of the HPN model for all prior settings, indicating that the HPNOD model is preferred. The three estimation methods lead to virtually identical results for the HPNOD model under priors 2 and 3. However, under the half-Cauchy prior (prior 4), the posterior means of the random-intercept variance (σ_b0²) and the overdispersion (σ_u²) obtained for Stan and JAGS are slightly higher than the posterior mean obtained for INLA.

Table 1. Parameter estimates of the HPN and HPNOD models for the Anopheles mosquitoes count data using INLA, JAGS and Stan
Stan JAGS INLA
HPN HPNOD HPN HPNOD HPN HPNOD
Prior Param. Est sd Est sd Est sd Est sd Est sd Est sd
1 β00 1.55 0.23 1.55 0.22 1.55 0.23 1.55 0.23 1.56 0.23 1.56 0.23
β01 0.69 0.27 0.71 0.28 0.70 0.26 0.69 0.26 0.69 0.26 0.69 0.26
β10 −0.27 0.03 −0.27 0.03 −0.27 0.03 −0.27 0.03 −0.27 0.03 −0.27 0.03
β11 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05
σ b0 0.88 0.13 0.87 0.12 0.86 0.12 0.86 0.13 0.85 0.12 0.85 0.12
σu 0.00 0.00 0.03 0.02 0.03 0.02
DIC* 682.2 682.8 690.5 690.5 691.35 691.44
2 β00 1.55 0.24 1.55 0.23 1.55 0.23 1.56 0.23 1.55 0.23 1.55 0.23
β01 0.67 0.27 0.68 0.27 0.68 0.27 0.69 0.27 0.68 0.27 0.68 0.27
β10 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.03
β11 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05
σ b0 0.89 0.13 0.89 0.13 0.89 0.13 0.89 0.13 0.89 0.13 0.89 0.13
σu 0.04 0.05 0.08 0.05 0.08 0.05
DIC* 682.4 682.5 690.2 692.0 691.13 692.14
3 β00 1.55 0.23 1.55 0.23 1.55 0.23 1.56 0.23 1.55 0.23 1.55 0.23
β01 0.68 0.27 0.68 0.27 0.69 0.26 0.70 0.27 0.69 0.26 0.69 0.27
β10 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.04
β11 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05
σ b0 0.88 0.13 0.88 0.13 0.88 0.13 0.87 0.13 0.87 0.12 0.87 0.13
σu 0.05 0.05 0.14 0.05 0.13 0.04
DIC* 682.8 683.1 690.3 692.8 691.22 693.27
4 β00 1.54 0.24 1.55 0.24 1.55 0.23 1.57 0.24 1.56 0.23 1.56 0.23
β01 0.68 0.27 0.68 0.28 0.68 0.26 0.69 0.27 0.69 0.26 0.69 0.26
β10 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.04 −0.27 0.03 −0.28 0.03
β11 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05
σ b0 0.91 0.14 0.91 0.14 0.87 0.12 0.87 0.13 0.85 0.12 0.85 0.12
σu 0.10 0.06 0.14 0.06 0.12 0.05
DIC* 681.6 683.7 690.2 693.8 691.35 693.06
Time** 4.28 4.74 2.88 6.06 2.49 2.56
* The WAIC (Vehtari et al., 2017) is reported rather than DIC for Stan.
** The computational time for INLA is in seconds whereas the computational time for JAGS and Stan is in minutes.

Figure 3. Anopheles mosquitoes count data: posterior density for the precision of the random intercept (σ_b0⁻²) obtained for the HPN model. (Four panels, one per prior 1–4; each overlays the JAGS, INLA, and Stan densities.)

Figure 4. Anopheles mosquitoes count data: marginal posterior distribution of σ_b0 (left panel) and σ_u (right panel) for the HPNOD model estimated via INLA. Pr1 — Γ(1, 0.0005); Pr2 — Γ(0.001, 0.001); Pr3 — Γ(0.5, 0.0164); Pr4 — half-Cauchy(0, 25).

Figures 5 and 6 show the posterior density of the precision of the random intercept (σ_b0⁻²) and of the precision of the overdispersion parameter (σ_u⁻²) obtained from JAGS, Stan, and INLA under the four prior distributions. We notice a small difference between JAGS and INLA, as well as between Stan and INLA, for the first two prior specifications (priors 1 and 2). Differences between the estimation methods are clearly seen when the half-Cauchy prior is used.

5 Simulation study

A simulation study was conducted to compare the performance of the three Bayesian software packages and the four prior specifications for σ_b0 and σ_u presented in Section 3.2.

5.1 Simulation setting

The simulation represents a longitudinal study where the data are Poisson distributed.

Table 2. Parameter estimates of the HPN and HPNOD models for the epilepsy data set using INLA, JAGS and Stan
Stan JAGS INLA
HPN HPNOD HPN HPNOD HPN HPNOD
Prior Param. Est sd Est sd Est sd Est sd Est sd Est sd
1 β0 1.62 0.08 1.57 0.08 1.62 0.08 1.58 0.07 1.62 0.08 1.58 0.08
β1 0.88 0.14 0.88 0.14 0.88 0.13 0.89 0.13 0.88 0.14 0.88 0.13
β2 −0.34 0.16 −0.34 0.15 −0.34 0.15 −0.33 0.15 −0.34 0.15 −0.33 0.15
β3 0.34 0.21 0.35 0.21 0.34 0.20 0.35 0.20 0.34 0.21 0.35 0.21
β4 0.47 0.36 0.48 0.36 0.47 0.36 0.48 0.35 0.48 0.36 0.48 0.35
β5 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09
σ b0 0.53 0.06 0.49 0.07 0.52 0.06 0.48 0.07 0.52 0.06 0.48 0.07
σu 0.36 0.04 0.35 0.04 0.35 0.04
DIC* 1328.4 1151.9 1271.9 1159.3 1272.3 1158.2
2 β0 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08
β1 0.88 0.14 0.88 0.14 0.88 0.15 0.88 0.14 0.88 0.14 0.88 0.14
β2 −0.34 0.16 −0.33 0.16 −0.34 0.16 −0.33 0.16 −0.34 0.16 −0.33 0.16
β3 0.34 0.22 0.35 0.21 0.35 0.22 0.36 0.21 0.34 0.22 0.35 0.21
β4 0.47 0.37 0.48 0.37 0.47 0.38 0.48 0.37 0.47 0.37 0.48 0.36
β5 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09
σ b0 0.54 0.07 0.50 0.07 0.54 0.07 0.50 0.07 0.54 0.06 0.50 0.07
σu 0.36 0.04 0.36 0.04 0.36 0.04
DIC* 1328.9 1149.1 1271.75 1159.0 1272.1 1157.8
3 β0 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08
β1 0.88 0.14 0.88 0.14 0.88 0.13 0.89 0.14 0.88 0.14 0.88 0.14
β2 −0.34 0.16 −0.33 0.15 −0.33 0.16 −0.34 0.16 −0.34 0.16 −0.33 0.15
β3 0.34 0.21 0.35 0.21 0.35 0.21 0.33 0.21 0.34 0.21 0.35 0.21
β4 0.48 0.37 0.49 0.36 0.49 0.37 0.46 0.36 0.48 0.37 0.48 0.36
β5 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09
σ b0 0.54 0.07 0.49 0.07 0.53 0.06 0.49 0.07 0.53 0.06 0.49 0.07
σu 0.36 0.04 0.36 0.04 0.36 0.04
DIC* 1326.7 1148.4 1271.5 1159.1 1272.2 1157.9
4 β0 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08 1.62 0.08 1.58 0.08
β1 0.88 0.14 0.88 0.14 0.90 0.15 0.87 0.14 0.88 0.14 0.88 0.13
β2 −0.33 0.16 −0.33 0.16 −0.34 0.16 −0.33 0.16 −0.34 0.15 −0.33 0.15
β3 0.34 0.22 0.35 0.22 0.31 0.22 0.36 0.22 0.34 0.21 0.35 0.21
β4 0.47 0.37 0.48 0.37 0.45 0.38 0.48 0.36 0.48 0.36 0.48 0.35
β5 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09
σ b0 0.55 0.07 0.51 0.07 0.54 0.07 0.50 0.07 0.52 0.06 0.48 0.07
σu 0.37 0.04 0.37 0.04 0.35 0.04
DIC* 1326.5 1147.2 1271.7 1159.0 1272.3 1158.1
Time** 5.80 5.44 3.23 5.06 4.30 12.78
* The WAIC (Vehtari et al., 2017) is reported rather than DIC for Stan.
** The computational time for INLA is in seconds whereas the computational time for JAGS and Stan is in minutes.

The steps of the simulation study are as follows. For i = 1, ..., N clusters (subjects) and j = 1, ..., J observations per cluster, observed at equally spaced sampling times t_ij = (t_i1, ..., t_iJ)', we generate Y_ij ~ Poisson(λ_ij), where

η_ij = log(λ_ij) = β_00 x_i + β_01 (1 − x_i) + β_10 x_i t_ij + β_11 (1 − x_i) t_ij + b_0i + u_ij,   (7)

with b_0i ~ N(0, σ_b0²), u_ij ~ N(0, σ_u²), and x_i an indicator variable such that x_i = 0 for i ≤ N/2 and x_i = 1 otherwise. The model formulated in (7) is the HPNOD model discussed in Section 3.1.1. Low, moderate, and high levels of overdispersion were considered, corresponding to σ_u = 0.2, 1, and 2, respectively (Aregay et al., 2015). A model without overdispersion (HPN) was formulated by omitting u_ij from the mean structure in (7). The true values of the fixed effect parameters for both the HPN and HPNOD models are β = (β_00, β_01, β_10, β_11)' = (2, 2, 0.05, 0.2)'. To study how the performance of the estimation methods and the role of the prior distribution are affected by the value of the cluster variance, we used four different values for the standard deviation parameter of the random intercept term, i.e. σ_b0 = 0.1, 0.5, 1, 1.5. Further, in order to study the effect of a small number of observations per cluster, we considered balanced designs with J = 2 or 5 observations per cluster and the number of clusters equal to N = 20 or 40. In total, 4 × 4 × 2 × 2 = 64 simulation settings were considered (four values of σ_b0; four levels of overdispersion, σ_u = 0, 0.2, 1, 2; two values of J; two values of N). For each setting, 500 datasets were generated. For each dataset, the HPN and HPNOD models were fitted with INLA, JAGS, and Stan, using a flat prior distribution N(0, 1000) for the fixed effect parameters and the four prior distributions listed in Section 3.2 for σ_b0² and σ_u². For JAGS and Stan, we ran 3 chains with 20,000 iterations per chain, with the first 10,000 iterations of each chain discarded as burn-in.

Figure 5. Epilepsy data: posterior density for the precision of the random intercept (σ_b0⁻²) obtained for the HPNOD model. (Four panels, one per prior 1–4; each overlays the JAGS, INLA, and Stan densities.)

Figure 6. Epilepsy data: posterior density for the precision of the overdispersion parameter (σ_u⁻²) obtained for the HPNOD model. (Four panels, one per prior 1–4; each overlays the JAGS, INLA, and Stan densities.)

which the first 10,000 iterations were considered as the burn-in period. For each parameter of interest θ, the relative bias was calculated as

(1/500) Σ_{i=1}^{500} (θ̂_i − θ)/θ,

and the mean squared error (MSE) as

(1/500) Σ_{i=1}^{500} [Var(θ̂_i) + (θ̂_i − θ)²],

where θ̂_i is the parameter estimate of θ obtained for the ith simulated dataset. A simulation study for unbalanced longitudinal data was conducted as well; the simulation setting and results are discussed in detail in Section 5 of the supplementary appendix.
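These two summary measures can be computed directly from the per-dataset estimates; a minimal numpy sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def relative_bias(estimates, theta):
    # (1/S) * sum_i (theta_hat_i - theta) / theta over S replicates
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - theta) / theta)

def mse(estimates, posterior_vars, theta):
    # (1/S) * sum_i [ Var(theta_hat_i) + (theta_hat_i - theta)^2 ],
    # i.e. posterior variance plus squared estimation error per replicate
    estimates = np.asarray(estimates, dtype=float)
    posterior_vars = np.asarray(posterior_vars, dtype=float)
    return np.mean(posterior_vars + (estimates - theta) ** 2)

# toy check with 3 replicates instead of 500
est = [1.1, 0.9, 1.0]       # hypothetical posterior means of theta
pvar = [0.01, 0.01, 0.01]   # hypothetical posterior variances
print(relative_bias(est, theta=1.0))   # errors cancel: ~0
print(mse(est, pvar, theta=1.0))       # mean of (0.02, 0.02, 0.01)
```

In the study itself, `est` would hold the 500 posterior point estimates of, e.g., σ_b0 under one prior/software combination, and `pvar` the corresponding posterior variances.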
BAYESIAN INFERENCE FOR GENERALIZED LINEAR MIXED MODELS 9

[Figure 7 here: panel blocks for σ_u = 0, 0.2, 1, 2, each with columns INLA, JAGS, Stan and rows HPN, HPNOD; y-axis relative bias, x-axis σ_b0.]

Figure 7. Relative bias for the random intercept standard deviation σ_b0 (y-axis) as a function of the true value of σ_b0 (x-axis); panels by level of overdispersion σ_u. Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)), and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.

5.2 Simulation results

5.2.1 Estimation of σ_b0
Figure 7 (upper left panel) presents the simulation results obtained for the datasets generated without overdispersion. For all prior specifications, and for all software, differences were observed for σ_b0 = 0.1. For INLA and JAGS, prior 1 led to an underestimation of σ_b0, while priors 2–4 led to an overestimation. For Stan, prior 4 led to an overestimation, while priors 1–3 led to an underestimation of σ_b0. Note that this pattern was observed for both HPN and HPNOD, which in this case mis-specified the mean structure (by including the overdispersion parameter).
The upper right and the lower panels in Figure 7 present the simulation results obtained for the datasets generated with varying levels of overdispersion (σ_u = 0.2, σ_u = 1, and σ_u = 2). In this case, the HPN model mis-specified the mean structure of the underlying model used to generate the data (by omitting the overdispersion parameter). For datasets generated with a low level of overdispersion (σ_u = 0.2), we observe a similar pattern to that of no overdispersion. When the data are generated with a moderate to high level of overdispersion, for the HPN model, all prior specifications and software have a tendency to overestimate the value of σ_b0, with the magnitude of overestimation decreasing as the value of σ_b0 increases. For the HPNOD model (which specified the mean structure correctly), for σ_b0 = 0.1, prior 1 consistently led to an underestimation of σ_b0 across all software, but with the smallest magnitude compared to the overestimation observed for the other priors.
Similar patterns were observed for the second simulation study of unbalanced longitudinal data (see Section 5 of the supplementary material). When the

true values for σ_b are small, substantial variation is observed among software and prior specifications, and this variation vanishes when the true values are relatively large.

5.2.2 Estimation of σ_u
Figure 8 shows the results for the estimation of σ_u across the levels of σ_b0 for the HPNOD model. Differences between the priors can be observed for σ_u = 0.2. For INLA and Stan, all priors led to an underestimation of σ_u (prior 1 with the highest magnitude). For JAGS, prior 4 led to an overestimation of σ_u, while the other priors led to an underestimation.

5.2.3 Estimation of the β's
The relative biases for the regression coefficients for given values of σ_b0 and σ_u are shown in Figure 9. For INLA and Stan, all priors lead to similar relative bias for all regression coefficients under all settings. For JAGS, we observed a different pattern for β_10 when the data are generated with a high level of overdispersion (σ_u = 2) and the HPNOD model is used to fit the data. The relative bias for this parameter increases in a linear fashion with the value of the standard deviation of the random intercept.

5.2.4 Effect of cluster size
A simulation study also investigated the effect of a small number of observations per cluster on the estimation. Two and five observations per cluster were considered. Figure 10 presents the relative bias for all the parameters of the HPNOD model. The results obtained for N = 20, σ_b0 = 1, and σ_u = 1 indicate that, as expected, the relative bias decreases as the sample size increases. Note that this pattern was observed for all software and priors considered.

[Figure 8 here: columns σ_b0 = 0.1, 0.5, 1, 1.5; rows INLA, JAGS, Stan; y-axis relative bias, x-axis σ_u.]

Figure 8. Relative bias for the overdispersion parameter σ_u (y-axis) as a function of the true value of σ_u (x-axis). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)), and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.

[Figure 9 here: four panel grids (β_00, β_01, β_10, β_11), each split by model (HPN, HPNOD) and overdispersion level (σ_u = 0, 0.2, 1, 2), with rows INLA, JAGS, Stan; y-axis relative bias, x-axis σ_b0.]

Figure 9. Relative bias for the regression coefficients: β_00 (top left panel), β_01 (top right panel), β_10 (bottom left panel), and β_11 (bottom right panel). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)), and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.

The results obtained for the HPN model are similar and are presented in Section 3 of the supplementary appendix.

5.3 Random intercept and slope model

We extend the simulation setting in Section 5.1 to a random intercept and slope model. For i = 1, …, N clusters (subjects) and j = 1, …, J observations per cluster, observed at equally spaced sampling times t_i = (t_i1, …, t_iJ)′, we generate Y_ij ~ Poisson(λ_ij), where

η_ij = log(λ_ij) = β_00 x_i + β_01 (1 − x_i) + β_10 x_i t_ij + β_11 (1 − x_i) t_ij + b_0i + b_1i t_ij + u_ij,   (8)

with b_0i ~ N(0, σ²_b0), b_1i ~ N(0, σ²_b1), u_ij ~ N(0, σ²_u), and x_i an indicator variable such that x_i = 0 for i ≤ N/2 and x_i = 1 otherwise. We considered four sets of values for σ_b0 and σ_b1, i.e., diag(σ_b0, σ_b1) = diag(0.1, 0.1), diag(0.1, 1.5), diag(1.5, 0.1), and diag(1.5, 1.5). Further, low, moderate, and high levels of overdispersion were considered, corresponding to σ_u = 0.2, 1, and 2, respectively. A model without overdispersion (HPN) was formulated by omitting u_ij from the mean structure in (8). The true values of the fixed effect parameters are β = (β_00, β_01, β_10, β_11)′ = (2, 2, 0.05, 0.2)′. For each setting, we generated 100 simulated datasets consisting of 20 subjects with 5 measurements per subject and fitted the HPN and HPNOD models with INLA, JAGS, and Stan using the four prior distributions listed in Section 3.2 for σ²_b0, σ²_b1, and σ²_u. For JAGS and Stan, we ran 3 chains with 20,000 iterations per chain, from which the first 10,000 iterations were considered as the burn-in period. Then, we computed

[Figure 10 here: columns β_0, β_1, β_10, β_11, σ_b0, σ_u; rows INLA, JAGS, Stan; y-axis relative bias, x-axis J = 2, …, 5 observations per cluster.]

Figure 10. Relative bias for the HPNOD model parameters (y-axis) as a function of the number of observations per cluster, J (x-axis). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)), and purple for prior 4 (half-Cauchy(0, 25)). N = 20, σ_b0 = 1, and σ_u = 1.

the relative bias and mean squared error for each parameter (see Section 5.1).

5.3.1 Estimation of σ_b0 and σ_b1
The relative bias for σ_b0 obtained for the datasets generated with varying levels of overdispersion (σ_u = 0, 0.2, 1, and 2) and different values of σ_b1 (σ_b1 = 0.1 and σ_b1 = 1.5) is presented in Figure 11. We observe substantial variation among priors and software when σ_b0 = 0.1. When σ_b0 = 0.1 and the data are generated without overdispersion, prior 1 (Γ(1, 0.0005)) leads to underestimation, while priors 2–4 lead to overestimation for all software. The relative bias decreases as the true value of σ_b0 increases. We also observed an influence of the level of variation in the random slope (σ_b1) on the estimate of σ_b0, especially for JAGS.
Figure 12 presents the relative bias for σ_b1 obtained for the datasets generated with varying levels of overdispersion (σ_u = 0, 0.2, 1, and 2) and different values of σ_b0 (σ_b0 = 0.1 and σ_b0 = 1.5). As before, variation among priors and software is observed when the true value of σ_b1 is small (σ_b1 = 0.1) and the true value of σ_b0 is large (σ_b0 = 1.5). Overall, INLA performs better than JAGS and Stan under all scenarios.

5.3.2 Estimation of σ_u
Figure 13 presents the relative bias for σ_u obtained for the datasets generated with varying levels of overdispersion. The variation among priors decreases as the true value of the overdispersion parameter (σ_u) increases. However, a substantial difference is observed among software. INLA performs better than JAGS and Stan under all scenarios. JAGS and Stan consistently underestimate σ_u when diag(σ_b0, σ_b1) = diag(0.1, 0.1) and

[Figure 11 here: panel blocks for combinations of σ_b1 (0.1, 1.5) and σ_u (0, 0.2, 1, 2), each with columns INLA, JAGS, Stan and rows HPN, HPNOD; y-axis relative bias, x-axis σ_b0.]

Figure 11. Relative bias for σ_b0 as a function of its true value (x-axis); panels by level of overdispersion (σ_u) and true value of σ_b1. Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)), and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.

[Figure 12 here: panel blocks for combinations of σ_b0 (0.1, 1.5) and σ_u (0, 0.2, 1, 2), each with columns INLA, JAGS, Stan and rows HPN, HPNOD; y-axis relative bias, x-axis σ_b1.]

Figure 12. Relative bias for σ_b1 as a function of its true value (x-axis); panels by level of overdispersion (σ_u) and true value of σ_b0. Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)), and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.

diag(0.1, 1.5). Further, when the data are generated with diag(σ_b0, σ_b1) = diag(0.1, 1.5) or diag(1.5, 1.5), JAGS and Stan overestimate σ_u when σ_u = 0.2 and underestimate σ_u when σ_u = 1 or σ_u = 2.

6 Concluding remarks

In this paper, we performed a Monte Carlo simulation study in order to simultaneously evaluate the performance of the Bayesian estimation methods and prior specifications for variance components in the context of longitudinal count data. We compared the results obtained with INLA to the results obtained with JAGS, which uses Gibbs sampling, and Stan, which uses Hamiltonian Monte Carlo, while assuming a variety of prior specifications for variance components. We analysed the influence of different factors, such as a small number of observations per cluster, different values of the random effect variance, and estimation from a misspecified

[Figure 13 here: columns (σ_b0, σ_b1) = (0.1, 0.1), (0.1, 1.5), (1.5, 0.1), (1.5, 1.5); rows INLA, JAGS, Stan; y-axis relative bias, x-axis σ_u.]

Figure 13. Relative bias for σ_u as a function of its true value (x-axis). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)), and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.

model on the bias and mean squared errors of the parameter estimates.
A simulation study has shown that the approximation strategy employed by INLA is accurate in general, and all software leads to similar results in most of the cases considered. Estimation of the variance components, however, is difficult when their true value is small, for all estimation methods and prior specifications. The estimates obtained with all software tend to be biased downward or upward depending on the prior. The results of the simulation study also show that there is an effect of cluster size: for all software and prior specifications, the relative bias for all parameters decreases as cluster size increases. For the random intercept and slope model, INLA performs better than JAGS and Stan under all scenarios. In our simulation study of the random intercept and slope model, we only considered independent priors for the random intercept and slope. Future research should focus on the comparison of different software assuming general priors for the variance-covariance matrix.

Acknowledgements

The authors gratefully acknowledge the support from VLIR-UOS. For the simulations, the Flemish Supercomputer Centre, funded by the Hercules Foundation and the Flemish Government of Belgium – department EWI, was used.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

The authors received no direct funding for this research.

Notes on contributors

Belay Birlie Yimer is currently a research associate at the Versus Arthritis Centre for Epidemiology, Division of Musculoskeletal and Dermatological Sciences, The University of Manchester, UK. Prior to joining the University of Manchester, Belay was an assistant professor in the Department of Statistics, Jimma University, Ethiopia. He has completed his B.Sc. and M.Sc. in statistics and is currently pursuing his PhD in statistics. His main research area focuses on Bayesian methods and modelling complex time-to-event data with applications to infectious and chronic diseases.

Ziv Shkedy is a Professor of Biostatistics and Bioinformatics at the University of Hasselt in Belgium.

ORCID

Belay Birlie Yimer http://orcid.org/0000-0001-8621-6539

PUBLIC INTEREST STATEMENT

Longitudinal count data are now an integral part of experimental and empirical studies across a range of disciplines, from the medical to the social and business sciences. Special models for longitudinal data are required when there are repeated measurements of the count outcome from the same individual over time, which leads to a dependence structure in the data. Generalized linear mixed models (GLMM) are among the most used models for longitudinal count data. Bayesian inference for GLMMs is appealing, but its widespread use has been hampered by the lack of fast implementation tools and the difficulty in specifying prior distributions. In this paper, we conduct an extensive simulation study to evaluate the performance of INLA for estimation of hierarchical Poisson regression models with overdispersion, in comparison with JAGS and Stan, while assuming a variety of prior specifications for variance components.

References

Aregay, M., Shkedy, Z., & Molenberghs, G. (2013). A hierarchical Bayesian approach for the analysis of longitudinal count data with overdispersion: A simulation study. Computational Statistics & Data Analysis, 57(1), 233–245. https://doi.org/10.1016/j.csda.2012.06.020
Aregay, M., Shkedy, Z., & Molenberghs, G. (2015). Comparison of additive and multiplicative Bayesian models for longitudinal count data with overdispersion parameters: A simulation study. Communications in Statistics - Simulation and Computation, 44(2), 454–473. https://doi.org/10.1080/03610918.2013.781629
Breslow, N. (1990). Tests of hypotheses in overdispersed Poisson regression and other quasilikelihood models. Journal of the American Statistical Association, 85(410), 565–571. https://doi.org/10.1080/01621459.1990.10476236
Breslow, N. E., & Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88(421), 9–25. https://www.jstor.org/stable/2290687
Bürkner, P. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Degefa, T., Zeynudin, A., Godesso, A., Michael, Y. H., Eba, K., Zemene, E., Emana, D., Birlie, B., Tushune, K., & Yewhalaw, D. (2015). Malaria incidence and assessment of entomological indices among resettled communities in Ethiopia: A longitudinal study. Malaria Journal, 14(1), 24. https://doi.org/10.1186/s12936-014-0532-z
Denwood, M. J. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1–25. https://doi.org/10.18637/jss.v071.i09
Ferkingstad, E., & Rue, H. (2015). Improving the INLA approach for approximate Bayesian inference for latent Gaussian models. Electronic Journal of Statistics, 9(2), 2706–2731. https://doi.org/10.1214/15-EJS1092
Fong, Y., Rue, H., & Wakefield, J. (2010). Bayesian inference for generalized linear mixed models. Biostatistics, 11(3), 397–412. https://doi.org/10.1093/biostatistics/kxp053
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1(3), 515–534. https://doi.org/10.1214/06-BA117A
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis (Chapman & Hall/CRC Texts in Statistical Science). Chapman & Hall/CRC.
Grilli, L., Metelli, S., & Rampichini, C. (2015). Bayesian estimation with integrated nested Laplace approximation for binary logit mixed models. Journal of Statistical Computation and Simulation, 85(13), 2718–2726. https://doi.org/10.1080/00949655.2014.935377
Lunn, D., Jackson, C., Best, N., Thomas, A., & Spiegelhalter, D. (2012). The BUGS book: A practical introduction to Bayesian analysis. CRC Press.
McCulloch, C. E., & Neuhaus, J. M. (2001). Generalized linear mixed models. Wiley Online Library.
Molenberghs, G., & Verbeke, G. (2005). Missing data concepts. Models for Discrete Longitudinal Data, 481–488.
Molenberghs, G., Verbeke, G., & Demétrio, C. G. (2007). An extended random-effects approach to modeling repeated overdispersed count data. Lifetime Data Analysis, 13(4), 513–531. https://doi.org/10.1007/s10985-007-9064-y
Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing (Vol. 124, p. 125). Vienna.
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319–392. https://doi.org/10.1111/j.1467-9868.2008.00700.x
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583–639. https://doi.org/10.1111/1467-9868.00353
Stan Development Team (2015). Stan: A C++ library for probability and sampling, version 2.5.0.
Stan Development Team (2016). RStan: The R interface to Stan, version 2.12.1.
Taylor, B. M., & Diggle, P. J. (2013). INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. Journal of Statistical Computation and Simulation, 84(10), 2266–2284.
Thall, P. F., & Vail, S. C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46(3), 657–671. https://doi.org/10.2307/2532086
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4