Bayesian Inference for Generalized Linear Mixed Models: A Comparison of Different Statistical Software Procedures
To cite this article: Belay Birlie Yimer & Ziv Shkedy (2021) Bayesian inference for generalized
linear mixed models: A comparison of different statistical software procedures, RMS: Research
in Mathematics & Statistics, 8:1, 1896102, DOI: 10.1080/27658449.2021.1896102
CONTACT Belay Birlie Yimer [email protected] Department of Statistics, Jimma University, Jimma, Ethiopia
*Arthritis Research UK Centre for Epidemiology, Division of Musculoskeletal and Dermatological Sciences, the University of Manchester, Manchester, UK
Reviewing Editor: Yueqing Hu, Fudan University School of Life Sciences, Shanghai, China
Supplemental data for this article can be accessed here.
© 2021 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.
We proceed as follows. In Section 2, the two case studies used for illustration are presented. In Section 3, we review the generalized linear mixed effects model, Bayesian estimation approaches and prior specifications. Section 4 presents the results obtained for the two studies, where we apply the three estimation methods to the two longitudinal count data sets. The simulation design and the main results of the simulation study are presented in Section 5. Finally, Section 6 gives a discussion and concluding remarks.

2 Case studies
Two data sets with longitudinal count outcomes were used to illustrate the methodological aspects discussed in this paper.

2.2 The epilepsy data
The epilepsy data set contains information about 59 epileptic patients who were randomized to one of two treatment arms, placebo or a new drug, in a clinical trial of anticonvulsant therapy (Thall & Vail, 1990). The response variable, the number of epileptic seizures, was measured at four visits over time. The data were presented and analysed by Breslow and Clayton (1993) and used as an illustrative example by Fong et al. (2010), who evaluated the performance of INLA in comparison with penalized quasi-likelihood (PQL). The epilepsy data set is available in R (R Core Team, 2016) as part of the R package INLA (Rue et al., 2009). Figure 2 shows the individual and average profiles of the epilepsy data for both treatment groups.
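The sketch below is a minimal illustration (not taken from the paper) of how the epilepsy data can be loaded in R; it assumes the INLA package ships the data set as the data frame Epil.

## Minimal sketch: loading the epilepsy data in R (assumes INLA exposes it as `Epil`)
library(INLA)
data(Epil)   # long format: one row per patient-visit with the seizure count and covariates
str(Epil)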
Figure 1. Female Anopheles mosquito count data: individual counts (left panel) and average profiles (right panel) in at-risk and control villages, Jimma town, Southwest Ethiopia (June to November 2013).
Figure 2. Individual profiles (left panel) and average profile (right panel) of the epilepsy data for both treatment groups.
b0i ~ N(0, σ²b0),   (1)

where xij is a p-dimensional design vector of the fixed-effect parameters β and b0i is a random intercept typically used to model repeated count responses. To complete the specification, we used non-informative prior distributions, namely a normal distribution with mean zero and variance 10,000 for the β's, and considered various non-informative prior distributions for σ²b0 (see Section 3.2).

The HPN model specified in (1) assumes that all sources of variability in the data can be captured by the random effects and the Poisson variability. A commonly encountered problem with count data is overdispersion (or underdispersion), which may cause serious flaws in precision estimation and inference (Breslow, 1990) if not appropriately accounted for. A number of extensions of the HPN model have been proposed to account for extra-dispersion in the setting of longitudinal count data (Aregay et al., 2013, 2015; Molenberghs et al., 2007). Aregay et al. (2015) extend the HPN model in (1) by considering two separate random effects added to the linear predictor. The mean structure for their model (HPNOD) is given by

ηij = log(λij) = xij' β + b0i + uij,   uij ~ N(0, σ²u),   (2)

where b0i is the random intercept used to account for a possible clustering effect as before, and the random effect uij is included to accommodate the overdispersion not captured by the normal random effect b0i. The assumed priors for σ²u are discussed in Section 3.2. We chose the additive overdispersion model above rather than the multiplicative overdispersion model (Aregay et al., 2013; Molenberghs et al., 2007) to keep the parametrization of the model the same across software packages. Further, a comparison of the additive and multiplicative models has been described by Aregay et al. (2015), who reported the performance of the two approaches to be the same.

3.1.1 Model formulation for Anopheles mosquitoes count data
Let Yij represent the number of female Anopheles mosquitoes counted for household i during month j of the follow-up period, let tij be the time point (in months) at which Yij was measured, tij = 1, …, 6 for all households, and let xi be an indicator variable for the village type of the ith household, taking the value one if the household is located in a resettled (at-risk) village and zero if the household is located in a control village. We assumed Yij | β, b0i ~ Poisson(λij), i = 1, …, 40, j = 1, …, 6, and that the pattern of Anopheles mosquito abundance over time is log-linear and possibly different between the control and at-risk villages. Thus, the HPN model has a linear predictor of the form

ηij = log(λij) = β00 xi + β01 (1 − xi) + β10 xi tij + β11 (1 − xi) tij + b0i,   (3)

and the mean structure for the HPNOD model is given by

ηij = log(λij) = β00 xi + β01 (1 − xi) + β10 xi tij + β11 (1 − xi) tij + b0i + uij.   (4)
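For concreteness, a possible JAGS translation of the HPNOD model (3)-(4) is sketched below. This is an illustrative sketch rather than the code used for the analyses reported here; the data names (y, x, t, N, J) are placeholders, the fixed effects receive the N(0, 10000) priors mentioned above, and prior 2 of Section 3.2 (Γ(0.001, 0.001) on the precisions) is used as an example. The HPN model is obtained by deleting the u[i, j] term.

## Hypothetical JAGS translation of the HPNOD model (3)-(4); names are placeholders.
hpnod_model <- "
model {
  for (i in 1:N) {                      # households
    b0[i] ~ dnorm(0, tau.b0)            # random intercept
    for (j in 1:J) {                    # monthly measurements
      u[i, j] ~ dnorm(0, tau.u)         # observation-level overdispersion effect
      log(lambda[i, j]) <- beta00 * x[i] + beta01 * (1 - x[i]) +
                           beta10 * x[i] * t[j] + beta11 * (1 - x[i]) * t[j] +
                           b0[i] + u[i, j]
      y[i, j] ~ dpois(lambda[i, j])
    }
  }
  beta00 ~ dnorm(0, 1.0E-4)             # N(0, 10000) priors for the fixed effects
  beta01 ~ dnorm(0, 1.0E-4)
  beta10 ~ dnorm(0, 1.0E-4)
  beta11 ~ dnorm(0, 1.0E-4)
  tau.b0 ~ dgamma(0.001, 0.001)         # prior 2 on the random-intercept precision
  tau.u  ~ dgamma(0.001, 0.001)         # prior 2 on the overdispersion precision
  sigma.b0 <- 1 / sqrt(tau.b0)
  sigma.u  <- 1 / sqrt(tau.u)
}"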
3.1.2 Model formulation for epilepsy data
Breslow and Clayton (1993) analyzed the epilepsy data set presented in Section 2.2 using likelihood-based inference via penalized quasi-likelihood (PQL). Fong et al. (2010) used this data set to illustrate how Bayesian inference may be performed using INLA. We concentrate on the two random-effects models fitted by Breslow and Clayton (1993) and Fong et al. (2010). Let Yij be the number of seizures for patient i at visit j, i = 1, 2, …, 59 and j = 1, 2, …, 4, assumed to be conditionally independent Poisson variables with mean λij, where the linear predictor for the HPN model is given by

ηij = log(λij) = β0 + β1 log(Baseline_i/4) + β2 Trt_i + β3 Trt_i × log(Baseline_i/4) + β4 log(Age_i) + β5 V4_i + b0i,   (5)

and the linear predictor for the HPNOD model is given by

ηij = log(λij) = β0 + β1 log(Baseline_i/4) + β2 Trt_i + β3 Trt_i × log(Baseline_i/4) + β4 log(Age_i) + β5 V4_i + b0i + uij.   (6)

Here, Baseline_i/4 denotes one-fourth of the baseline seizure count of the ith patient, Trt_i is an indicator variable for the treatment arm of the ith patient, taking the value one if the patient is assigned to the new drug and zero if the patient is assigned to the placebo group, and V4_i is an indicator variable for the fourth visit. To aid convergence when fitting the HPN and HPNOD models (5) and (6), respectively, the covariates log(Baseline_i/4), log(Age_i), and Trt_i × log(Baseline_i/4) were centred about their means.

3.2 Priors for the variance components of the hierarchical Poisson models
A Bayesian approach is attractive for modelling complex longitudinal count data, but it requires the specification of prior distributions for all random elements of the model. For the hierarchical models in Section 3.1, this involves choosing priors for the regression coefficients and for the hyperparameters σ²b0 and σ²u of the subject- and observation-specific random effects, respectively. Two classes of prior distributions, informative and non-informative priors, are used in Bayesian modelling. Informative priors can be used when substantial prior information is available, for instance from previous studies relevant to the current data set. Non-informative prior distributions are intended to allow Bayesian inference for parameters about which little is known beyond the data included in the analysis at hand (Gelman, 2006). In this paper, we focus on non-informative priors.

The specification of a non-informative prior to express the absence of prior information about the cluster variance and the overdispersion is often difficult (Gelman, 2006). Various non-informative prior distributions have been suggested in the Bayesian literature, including an improper uniform density on the scale of the standard deviation (Gelman et al., 2003), proper distributions such as the inverse gamma(ε, ε) with small positive ε for the variance (Lunn et al., 2012), and the conditionally conjugate folded-non-central-t family of prior distributions for the standard deviation (Gelman, 2006). In this paper, we concentrate on the latter two approaches. We considered three specifications based on an inverse gamma prior for the variance (equivalently, a gamma prior for the precision) and one based on a half-Cauchy prior (a special case of the conditionally conjugate folded-non-central-t family) for the standard deviation (see Figure 1 in the supplementary material):

1. σ⁻² ~ Γ(1, 0.0005), the default choice of the INLA software (Rue et al., 2009);
2. σ⁻² ~ Γ(0.001, 0.001), the default choice of the BUGS software (Lunn et al., 2012) and the most popular choice in Bayesian analysis;
3. σ⁻² ~ Γ(0.5, 0.0164), a specification proposed by Fong et al. (2010);
4. a half-Cauchy prior with scale 25 on σ, a specification proposed by Gelman (2006).
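As an illustration of how these four choices might be encoded in practice, the fragment below writes them for the random-intercept variance component; the JAGS lines act on the precision (priors 1-3) or the standard deviation (prior 4), and the INLA specification passes a loggamma hyperprior on the log-precision of the iid random effect. This is a sketch of one possible encoding, not the exact code used here; in particular, the half-Cauchy prior is not built into INLA and would require a user-defined prior.

## Hypothetical encodings of the four prior choices (use one at a time per fit).
## JAGS:
##   tau.b0   ~ dgamma(1, 0.0005)              # prior 1 (INLA default)
##   tau.b0   ~ dgamma(0.001, 0.001)           # prior 2 (BUGS default)
##   tau.b0   ~ dgamma(0.5, 0.0164)            # prior 3 (Fong et al., 2010)
##   sigma.b0 ~ dt(0, pow(25, -2), 1) T(0, )   # prior 4: half-Cauchy(0, 25) on sigma
##   tau.b0  <- pow(sigma.b0, -2)              # (used together with prior 4)
## INLA: gamma priors on the precision, e.g. prior 2 for an iid random intercept
hyper_p2 <- list(prec = list(prior = "loggamma", param = c(0.001, 0.001)))
## passed as f(hh, model = "iid", hyper = hyper_p2) inside the inla() formula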
4 Results

4.1 Analysis of the Anopheles mosquitoes count data
In this subsection, we present the analysis of the Anopheles mosquitoes count data introduced in Section 2.1. We considered three estimation routines, namely Stan, JAGS, and INLA, all of which were accessed through R version 3.3.2 (R Core Team, 2016), to fit the HPN and HPNOD models presented in Sections 3.1.1 and 3.1.2. For the Bayesian inference using runjags (Denwood, 2016), we ran 3 chains with 30,000 MCMC iterations per chain, of which the first 10,000 iterations were discarded as burn-in, while for RStan (Stan Development Team, 2016) the results are based on 4 chains with 4,000 MCMC iterations per chain (the first 2,000 of which were used as burn-in). Note that the models can also be fitted in Stan using brms (Bürkner, 2017); an elaborate discussion of the two packages, RStan and brms, is given in Section 6 of the supplementary appendix. Model selection was based on the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002) for INLA and JAGS and on the widely applicable information criterion (WAIC; Vehtari et al., 2017) for Stan.
The posterior means and standard deviations of the parameters of the HPN and HPNOD models estimated using INLA, JAGS, and Stan under the four prior specifications are presented in Table 1. For all priors considered, the DIC and WAIC values of the HPNOD model are slightly higher than those of the HPN model, suggesting that the HPN model is preferred for this data set. All prior distributions and all estimation methods we examined yielded similar results for the regression coefficients. However, the results in Table 1 reveal the influence of the software used to fit the models and of the prior specification on the parameter estimates corresponding to the random-effect terms. The three estimation methods give similar point estimates for the variance of the random intercept under the Γ(0.001, 0.001) and Γ(0.5, 0.0164) priors in both the HPN and HPNOD models. On the other hand, the differences among estimation methods are noticeable under the Γ(1, 0.0005) and half-Cauchy priors, where the posterior mean of σ²b0 obtained from Stan is about 6% larger than that from INLA under the Γ(1, 0.0005) prior and about 13% larger under the half-Cauchy prior. Figure 3 shows the posterior density of the precision of the random intercept (σ⁻²b0) obtained from JAGS, Stan and INLA under the four prior distributions. All estimation methods lead to similar results under the Γ(0.001, 0.001) and Γ(0.5, 0.0164) priors, but the posterior density from INLA for σ⁻²b0 deviates from that of Stan under the Γ(1, 0.0005) and half-Cauchy priors. For INLA, we observed differences in the point estimates of σ²b0 and σ²u across priors, with a particularly pronounced variation across priors in the posterior density of σ²u (Figure 4).

4.2 Analysis of the epilepsy data
The HPN and HPNOD models formulated in equations (5) and (6) were fitted using the three software packages and the four priors discussed in the previous sections.

Table 2 presents the posterior means obtained for all fitted models. The DIC (WAIC) values of the HPNOD model obtained with all estimation methods are smaller than those of the HPN model for all prior settings, indicating that the HPNOD model is preferred. The three estimation methods lead to virtually identical results for the HPNOD model under priors 2 and 3. However, under the half-Cauchy prior (prior 4), the posterior means of the random-intercept (σ²b0) and overdispersion (σ²u) parameters obtained with Stan and JAGS are slightly higher than those obtained with INLA.
Table 1. Parameter estimates of the HPN and HPNOD models for the Anopheles mosquitoes count data using INLA, JAGS and Stan
Stan JAGS INLA
HPN HPNOD HPN HPNOD HPN HPNOD
Prior Param. Est sd Est sd Est sd Est sd Est sd Est sd
1 β00 1.55 0.23 1.55 0.22 1.55 0.23 1.55 0.23 1.56 0.23 1.56 0.23
β01 0.69 0.27 0.71 0.28 0.70 0.26 0.69 0.26 0.69 0.26 0.69 0.26
β10 −0.27 0.03 −0.27 0.03 −0.27 0.03 −0.27 0.03 −0.27 0.03 −0.27 0.03
β11 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05
σb0 0.88 0.13 0.87 0.12 0.86 0.12 0.86 0.13 0.85 0.12 0.85 0.12
σu 0.00 0.00 0.03 0.02 0.03 0.02
DIC* 682.2 682.8 690.5 690.5 691.35 691.44
2 β00 1.55 0.24 1.55 0.23 1.55 0.23 1.56 0.23 1.55 0.23 1.55 0.23
β01 0.67 0.27 0.68 0.27 0.68 0.27 0.69 0.27 0.68 0.27 0.68 0.27
β10 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.03
β11 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05
σb0 0.89 0.13 0.89 0.13 0.89 0.13 0.89 0.13 0.89 0.13 0.89 0.13
σu 0.04 0.05 0.08 0.05 0.08 0.05
DIC* 682.4 682.5 690.2 692.0 691.13 692.14
3 β00 1.55 0.23 1.55 0.23 1.55 0.23 1.56 0.23 1.55 0.23 1.55 0.23
β01 0.68 0.27 0.68 0.27 0.69 0.26 0.70 0.27 0.69 0.26 0.69 0.27
β10 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.04
β11 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05
σb0 0.88 0.13 0.88 0.13 0.88 0.13 0.87 0.13 0.87 0.12 0.87 0.13
σu 0.05 0.05 0.14 0.05 0.13 0.04
DIC* 682.8 683.1 690.3 692.8 691.22 693.27
4 β00 1.54 0.24 1.55 0.24 1.55 0.23 1.57 0.24 1.56 0.23 1.56 0.23
β01 0.68 0.27 0.68 0.28 0.68 0.26 0.69 0.27 0.69 0.26 0.69 0.26
β10 −0.27 0.03 −0.28 0.03 −0.27 0.03 −0.28 0.04 −0.27 0.03 −0.28 0.03
β11 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05 −0.25 0.05
σb0 0.91 0.14 0.91 0.14 0.87 0.12 0.87 0.13 0.85 0.12 0.85 0.12
σu 0.10 0.06 0.14 0.06 0.12 0.05
DIC* 681.6 683.7 690.2 693.8 691.35 693.06
Time** 4.28 4.74 2.88 6.06 2.49 2.56
* The WAIC (Vehtari et al., 2017) is reported rather than DIC for Stan.
** The computational time for INLA is in seconds whereas the computational time for JAGS and Stan is in minutes.
Figure 3. Anopheles mosquitoes count data: posterior density of the precision of the random intercept (σ⁻²b0) obtained for the HPN model with JAGS, INLA and Stan under the four prior specifications (one panel per prior).
Figure 4. Anopheles mosquitoes count data: marginal posterior distributions of σb0 (left panel) and σu (right panel) for the HPNOD model estimated via INLA. Pr1: Γ(1, 0.0005), Pr2: Γ(0.001, 0.001), Pr3: Γ(0.5, 0.0164), Pr4: half-Cauchy(0, 25).
Figures 5 and 6 show the posterior densities of the precision of the random intercept (σ⁻²b0) and of the precision of the overdispersion parameter (σ⁻²u) obtained from JAGS, Stan and INLA under the four prior distributions. We notice a small difference between JAGS and INLA, as well as between Stan and INLA, for the first two prior specifications (priors 1 and 2). Differences between the estimation methods are clearly seen when the half-Cauchy prior is used.

5 Simulation study
A simulation study was conducted to compare the performance of the three Bayesian software packages and the four prior specifications for σb0 and σu presented in Section 3.2.

5.1 Simulation setting
The simulation represents a longitudinal study where the data are Poisson distributed.
Table 2. Parameter estimates of the HPN and HPNOD models for the epilepsy data set using INLA, JAGS and Stan
Stan JAGS INLA
HPN HPNOD HPN HPNOD HPN HPNOD
Prior Param. Est sd Est sd Est sd Est sd Est sd Est sd
1 β0 1.62 0.08 1.57 0.08 1.62 0.08 1.58 0.07 1.62 0.08 1.58 0.08
β1 0.88 0.14 0.88 0.14 0.88 0.13 0.89 0.13 0.88 0.14 0.88 0.13
β2 −0.34 0.16 −0.34 0.15 −0.34 0.15 −0.33 0.15 −0.34 0.15 −0.33 0.15
β3 0.34 0.21 0.35 0.21 0.34 0.20 0.35 0.20 0.34 0.21 0.35 0.21
β4 0.47 0.36 0.48 0.36 0.47 0.36 0.48 0.35 0.48 0.36 0.48 0.35
β5 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09
σb0 0.53 0.06 0.49 0.07 0.52 0.06 0.48 0.07 0.52 0.06 0.48 0.07
σu 0.36 0.04 0.35 0.04 0.35 0.04
DIC* 1328.4 1151.9 1271.9 1159.3 1272.3 1158.2
2 β0 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08
β1 0.88 0.14 0.88 0.14 0.88 0.15 0.88 0.14 0.88 0.14 0.88 0.14
β2 −0.34 0.16 −0.33 0.16 −0.34 0.16 −0.33 0.16 −0.34 0.16 −0.33 0.16
β3 0.34 0.22 0.35 0.21 0.35 0.22 0.36 0.21 0.34 0.22 0.35 0.21
β4 0.47 0.37 0.48 0.37 0.47 0.38 0.48 0.37 0.47 0.37 0.48 0.36
β5 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09
σb0 0.54 0.07 0.50 0.07 0.54 0.07 0.50 0.07 0.54 0.06 0.50 0.07
σu 0.36 0.04 0.36 0.04 0.36 0.04
DIC* 1328.9 1149.1 1271.75 1159.0 1272.1 1157.8
3 β0 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08
β1 0.88 0.14 0.88 0.14 0.88 0.13 0.89 0.14 0.88 0.14 0.88 0.14
β2 −0.34 0.16 −0.33 0.15 −0.33 0.16 −0.34 0.16 −0.34 0.16 −0.33 0.15
β3 0.34 0.21 0.35 0.21 0.35 0.21 0.33 0.21 0.34 0.21 0.35 0.21
β4 0.48 0.37 0.49 0.36 0.49 0.37 0.46 0.36 0.48 0.37 0.48 0.36
β5 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09
σb0 0.54 0.07 0.49 0.07 0.53 0.06 0.49 0.07 0.53 0.06 0.49 0.07
σu 0.36 0.04 0.36 0.04 0.36 0.04
DIC* 1326.7 1148.4 1271.5 1159.1 1272.2 1157.9
4 β0 1.62 0.08 1.57 0.08 1.62 0.08 1.57 0.08 1.62 0.08 1.58 0.08
β1 0.88 0.14 0.88 0.14 0.90 0.15 0.87 0.14 0.88 0.14 0.88 0.13
β2 −0.33 0.16 −0.33 0.16 −0.34 0.16 −0.33 0.16 −0.34 0.15 −0.33 0.15
β3 0.34 0.22 0.35 0.22 0.31 0.22 0.36 0.22 0.34 0.21 0.35 0.21
β4 0.47 0.37 0.48 0.37 0.45 0.38 0.48 0.36 0.48 0.36 0.48 0.35
β5 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09 −0.16 0.05 −0.10 0.09
σb0 0.55 0.07 0.51 0.07 0.54 0.07 0.50 0.07 0.52 0.06 0.48 0.07
σu 0.37 0.04 0.37 0.04 0.35 0.04
DIC* 1326.5 1147.2 1271.7 1159.0 1272.3 1158.1
Time** 5.80 5.44 3.23 5.06 4.30 12.78
* The WAIC (Vehtari et al., 2017) is reported rather than DIC for Stan.
** The computational time for INLA is in seconds whereas the computational time for JAGS and Stan is in minutes.
The steps of the simulation study are as follows. For i = 1, …, N clusters (subjects) and j = 1, …, J observations per cluster, observed at equally spaced sampling times tij = (ti1, …, tiJ)', we generate Yij ~ Poisson(λij), where

ηij = log(λij) = β00 xi + β01 (1 − xi) + β10 xi tij + β11 (1 − xi) tij + b0i + uij,   (7)

with b0i ~ N(0, σ²b0), uij ~ N(0, σ²u), and xi an indicator variable such that xi = 0 for i ≤ N/2 and xi = 1 otherwise. The model formulated in (7) is the HPNOD model discussed in Section 3.1.1. Low, moderate, and high levels of overdispersion were considered, corresponding to σu = 0.2, 1 and 2, respectively (Aregay et al., 2015). A model without overdispersion (HPN) was obtained by omitting uij from the mean structure in (7). The true values of the fixed-effect parameters for both the HPN and HPNOD models are β = (β00, β01, β10, β11)' = (2, 2, 0.05, 0.2)'. To study how the performance of the estimation methods and the role of the prior distribution are affected by the value of the cluster variance, we used four different values for the standard deviation of the random intercept, σb0 = 0.1, 0.5, 1, 1.5. Further, in order to study the effect of a small number of observations per cluster, we considered four balanced designs with J = 2 or 5 observations per cluster and N = 20 or 40 clusters. In total, 4 × 4 × 2 × 2 = 64 simulation settings were considered, and for each setting 500 data sets were generated. For each data set, the HPN and HPNOD models were fitted with INLA, JAGS, and Stan, using a flat N(0, 1000) prior distribution for the fixed-effect parameters and the four prior distributions listed in Section 3.2 for σ²b0 and σ²u. For JAGS and Stan, we ran 3 chains with 20,000 iterations per chain, of which the first 10,000 iterations were discarded as burn-in.
Figure 5. Epilepsy data: posterior density of the precision of the random intercept (σ⁻²b0) obtained for the HPNOD model with JAGS, INLA and Stan under the four prior specifications (one panel per prior).
Figure 6. Epilepsy data: posterior density of the precision of the overdispersion parameter (σ⁻²u) obtained for the HPNOD model with JAGS, INLA and Stan under the four prior specifications (one panel per prior).
For each parameter of interest θ, the relative bias was calculated as

(1/500) Σ_{i=1}^{500} (θ̂_i − θ)/θ,

and the mean squared error (MSE) as

(1/500) Σ_{i=1}^{500} [Var(θ̂_i) + (θ̂_i − θ)²],

where θ̂_i is the parameter estimate of θ obtained for the ith simulated data set. A simulation study for unbalanced longitudinal data was conducted as well; its setting and results are discussed in detail in Section 5 of the supplementary appendix.
Figure 7. Relative bias for the random-intercept standard deviation σb0 (y-axis) as a function of its true value (x-axis); panels by level of overdispersion σu (0, 0.2, 1, 2), software (INLA, JAGS, Stan) and fitted model (HPN, HPNOD). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)) and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.
5.2 Simulation results

5.2.1 Estimation of σb0
Figure 7 (upper left panel) presents the simulation results obtained for the data sets generated without overdispersion. For all prior specifications and all software, differences were observed at σb0 = 0.1. For INLA and JAGS, prior 1 led to an underestimation of σb0 while priors 2-4 led to an overestimation. For Stan, prior 4 led to an overestimation, while priors 1-3 led to an underestimation of σb0. Note that this pattern was observed for both the HPN and the HPNOD model, the latter of which in this case mis-specified the mean structure (by including the overdispersion parameter).

The upper right and lower panels in Figure 7 present the simulation results obtained for the data sets generated with varying levels of overdispersion (σu = 0.2, σu = 1, and σu = 2). In this case, the HPN model mis-specified the mean structure of the underlying data-generating model (by omitting the overdispersion parameter). For data sets generated with a low level of overdispersion (σu = 0.2), we observe a pattern similar to that without overdispersion. When the data are generated with a moderate to high level of overdispersion, all prior specifications and all software have a tendency, under the HPN model, to overestimate σb0, with the magnitude of the overestimation decreasing as the true value of σb0 increases. For the HPNOD model (which specifies the mean structure correctly), at σb0 = 0.1 prior 1 consistently led to an underestimation of σb0 across all software, but with a smaller magnitude than the overestimation observed for the other priors.

Similar patterns were observed in the second simulation study, for unbalanced longitudinal data (see Section 5 of the supplementary material).
When the true value of σb0 is small, substantial variation is observed among software and prior specifications; this variation vanishes when the true value is relatively large.

5.2.2 Estimation of σu
Figure 8 shows the results for the estimation of σu across the levels of σb0 for the HPNOD model. Differences between the priors can be observed at σu = 0.2. For INLA and Stan, all priors led to an underestimation of σu (prior 1 with the largest magnitude). For JAGS, prior 4 led to an overestimation of σu while the other priors led to an underestimation.

5.2.3 Estimation of the β's
The relative biases of the regression coefficients for given values of σb0 and σu are shown in Figure 9. For INLA and Stan, all priors lead to similar relative biases for all regression coefficients under all settings. For JAGS, we observed a different pattern for β10 when the data are generated with a high level of overdispersion (σu = 2) and the HPNOD model is used to fit the data: the relative bias of this parameter increases approximately linearly with the value of the standard deviation of the random intercept.

5.2.4 Effect of cluster size
The simulation study also investigated the effect of a small number of observations per cluster on the estimation; two and five observations per cluster were considered. Figure 10 presents the relative bias of all parameters of the HPNOD model. The results obtained for N = 20, σb0 = 1, and σu = 1 indicate that, as expected, the relative bias decreases as the cluster size increases. This pattern was observed for all software and priors considered.
Figure 8. Relative bias for the overdispersion parameter σu (y-axis) as a function of its true value (x-axis), with rows by software (INLA, JAGS, Stan). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)) and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.
Figure 9. Relative bias for the regression coefficients: β00 (top left), β01 (top right), β10 (bottom left), and β11 (bottom right), as a function of the true value of σb0, by software (INLA, JAGS, Stan) and fitted model (HPN, HPNOD). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)) and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.
The results obtained for the HPN model are similar and are presented in Section 3 of the supplementary appendix.

5.3 Random intercept and slope model
We extend the simulation setting of Section 5.1 to a random intercept and slope model. For i = 1, …, N clusters (subjects) and j = 1, …, J observations per cluster, observed at equally spaced sampling times tij = (ti1, …, tiJ)', we generate Yij ~ Poisson(λij), where

ηij = log(λij) = β00 xi + β01 (1 − xi) + β10 xi tij + β11 (1 − xi) tij + b0i + b1i tij + uij,   (8)

with b0i ~ N(0, σ²b0), b1i ~ N(0, σ²b1), uij ~ N(0, σ²u), and xi an indicator variable such that xi = 0 for i ≤ N/2 and xi = 1 otherwise. We considered four sets of values for σb0 and σb1, namely (σb0, σb1) = (0.1, 0.1), (0.1, 1.5), (1.5, 0.1) and (1.5, 1.5). Further, low, moderate, and high levels of overdispersion were considered, corresponding to σu = 0.2, 1 and 2, respectively. A model without overdispersion (HPN) was obtained by omitting uij from the mean structure in (8). The true values of the fixed-effect parameters are β = (β00, β01, β10, β11)' = (2, 2, 0.05, 0.2)'. For each setting, we generated 100 simulated data sets consisting of 20 subjects with 5 measurements per subject, and fitted the HPN and HPNOD models with INLA, JAGS, and Stan using the four prior distributions listed in Section 3.2 for σ²b0, σ²b1 and σ²u. For JAGS and Stan, we ran 3 chains with 20,000 iterations per chain, of which the first 10,000 iterations were discarded as burn-in. We then computed the relative bias and mean squared error for each parameter (see Section 5.1).
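A small, self-contained extension of the earlier simulation sketch to model (8) is given below: a cluster-specific slope b1i is drawn independently of the intercept and enters the linear predictor as b1i tij. The function name and defaults are illustrative only.

## Illustrative generator for the random intercept and slope model (8).
simulate_hpnod_slope <- function(N = 20, J = 5, beta = c(2, 2, 0.05, 0.2),
                                 sigma_b0 = 0.1, sigma_b1 = 1.5, sigma_u = 1) {
  x   <- rep(c(0, 1), each = N / 2)
  t   <- 1:J
  b0  <- rnorm(N, 0, sigma_b0)                 # random intercepts
  b1  <- rnorm(N, 0, sigma_b1)                 # independent random slopes
  dat <- expand.grid(j = 1:J, i = 1:N)
  u   <- rnorm(nrow(dat), 0, sigma_u)          # overdispersion effects
  eta <- with(dat, beta[1] * x[i] + beta[2] * (1 - x[i]) +
                   beta[3] * x[i] * t[j] + beta[4] * (1 - x[i]) * t[j] +
                   b0[i] + b1[i] * t[j]) + u
  dat$y <- rpois(nrow(dat), exp(eta))
  dat
}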
Figure 10. Relative bias for the HPNOD model parameters (y-axis) as a function of the number of observations per cluster J (x-axis), with rows by software (INLA, JAGS, Stan). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)) and purple for prior 4 (half-Cauchy(0, 25)). N = 20, σb0 = 1, and σu = 1.
5.3.1 Estimation of σb0 and σb1
The relative bias for σb0 obtained for the data sets generated with varying levels of overdispersion (σu = 0, 0.2, 1, and 2) and different values of σb1 (σb1 = 0.1 and σb1 = 1.5) is presented in Figure 11. We observe substantial variation among priors and software when σb0 = 0.1. When σb0 = 0.1 and the data are generated without overdispersion, prior 1 (Γ(1, 0.0005)) leads to underestimation while priors 2-4 lead to overestimation for all software. The relative bias decreases as the true value of σb0 increases. We also observed an influence of the level of variation in the random slope (σb1) on the estimate of σb0, especially for JAGS.

Figure 12 presents the relative bias for σb1 obtained for the data sets generated with varying levels of overdispersion (σu = 0, 0.2, 1, and 2) and different values of σb0 (σb0 = 0.1 and σb0 = 1.5). As before, variation among priors and software is observed when the true value of σb1 is small (σb1 = 0.1) and the true value of σb0 is large (σb0 = 1.5). Overall, INLA performs better than JAGS and Stan under all scenarios.

5.3.2 Estimation of σu
Figure 13 presents the relative bias for σu obtained for the data sets generated with varying levels of overdispersion. The variation among priors decreases as the true value of the overdispersion (σu) increases. However, a substantial difference is observed among the software packages. INLA performs better than JAGS and Stan under all scenarios. JAGS and Stan consistently underestimate σu when (σb0, σb1) = (0.1, 0.1) and (0.1, 1.5).
Figure 11. Relative bias for σb0 as a function of its true value (x-axis); panels by level of overdispersion (σu), true value of σb1, software (INLA, JAGS, Stan) and fitted model (HPN, HPNOD). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)) and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.
Figure 12. Relative bias for σb1 as a function of its true value (x-axis); panels by level of overdispersion (σu), true value of σb0, software (INLA, JAGS, Stan) and fitted model (HPN, HPNOD). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)) and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.
Further, when the data are generated with (σb0, σb1) = (0.1, 1.5) or (1.5, 1.5), JAGS and Stan overestimate σu when σu = 0.2 and underestimate σu when σu = 1 or σu = 2.

6 Concluding remarks
In this paper, we performed a Monte Carlo simulation study in order to simultaneously evaluate the performance of Bayesian estimation methods and prior specifications for variance components in the context of longitudinal count data. We compared the results obtained with INLA to those obtained with JAGS, which uses Gibbs sampling, and Stan, which uses Hamiltonian Monte Carlo, under a variety of prior specifications for the variance components. We analysed the influence of different factors, such as a small number of observations per cluster, different values of the random-effect variances, and estimation from a misspecified model, on the bias and mean squared errors of the parameter estimates.
Figure 13. Relative bias for σu as a function of its true value (x-axis); panels by software (INLA, JAGS, Stan) and by the true values of (σb0, σb1). Red lines for prior 1 (Γ(1, 0.0005)), green for prior 2 (Γ(0.001, 0.001)), light blue for prior 3 (Γ(0.5, 0.0164)) and purple for prior 4 (half-Cauchy(0, 25)). N = 20 and J = 5.
The simulation study has shown that the approximation strategy employed by INLA is accurate in general and that all software packages lead to similar results for most of the cases considered. Estimation of the variance components, however, is difficult for all estimation methods and prior specifications when their true value is small, and the estimates obtained with all software packages tend to be biased downward or upward depending on the prior. The results of the simulation study also show that there is an effect of cluster size: for all software and prior specifications, the relative bias of all parameters decreases as the cluster size increases. For the random intercept and slope model, INLA performs better than JAGS and Stan under all scenarios. In our simulation study of the random intercept and slope model, we only considered independent priors for the random intercept and slope; future research should focus on the comparison of the different software packages assuming general priors for the variance-covariance matrix.

Acknowledgements
The authors gratefully acknowledge the support from VLIR-UOS. For the simulations, the Flemish Supercomputer Centre, funded by the Hercules Foundation and the Flemish Government of Belgium (department EWI), was used.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
The authors received no direct funding for this research.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583–639. https://doi.org/10.1111/1467-9868.00353
Stan Development Team. (2015). Stan: A C++ library for probability and sampling, version 2.5.0.
Stan Development Team. (2016). RStan: The R interface to Stan. R package version 2.12.1.
Taylor, B. M., & Diggle, P. J. (2013). INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. Journal of Statistical Computation and Simulation, 84(10), 2266–2284.
Thall, P. F., & Vail, S. C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46(3), 657–671. https://doi.org/10.2307/2532086
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4