Intro Spatial Models INLA-3-43

Uploaded by

maria.guntsche


Contents

Introduction

1 Spatio-temporal disease mapping
1.1 Introduction
1.2 Spatio-temporal models for disease mapping
1.2.1 Linear time trend models
1.2.2 General time trend models
1.3 Model fitting and inference
1.3.1 Penalized quasi-likelihood (PQL)
1.3.2 Integrated nested Laplace approximations (INLA)
1.4 The R-INLA package
1.4.1 Models for the latent Gaussian field
1.4.2 Implementing the lCAR prior
1.4.3 Prior distribution for the hyperparameters
1.4.4 Posterior distribution of linear combinations
1.4.5 Linear constraints for the latent Gaussian fields
1.4.6 Model selection criteria
1.4.7 Model calibration
1.5 Illustration
1.5.1 Brain cancer mortality data
1.5.2 R code for model fitting in INLA

2 Evaluation of models for the detection of high-risk areas
2.1 Introduction
2.2 P-splines in spatio-temporal disease mapping
2.2.1 Interaction P-spline model
2.2.2 ANOVA-type P-spline model
2.3 Autoregressive and moving average models
2.3.1 BYMar model
2.3.2 STMARS model
2.4 Some aspects of model fitting and model comparisons
2.5 Illustration
2.6 Simulation study
2.6.1 Scenario 1
2.6.2 Scenario 2
2.7 Discussion

3 A new model proposal: two-level spatially structured models
3.1 Introduction
3.2 Two-level structure models
3.3 Illustration
3.3.1 Brain cancer mortality data in the municipalities of Navarre and Basque Country
3.3.2 R code for model fitting in INLA
3.4 Simulation study
3.4.1 Data generation
3.4.2 Results
3.5 Discussion

4 Bayesian P-splines models for spatio-temporal count data
4.1 Introduction

Conclusions and further work

References

List of Figures

List of Tables

Introduction
1 Spatio-temporal disease mapping
1.1 Introduction
Spatio-temporal disease mapping comprises a wide range of models used to describe the geographical distribution of a disease and its evolution in time. Statistical techniques for disease mapping have developed enormously in the last few years, mainly because modern registers provide high-quality data recorded over many years and regions. The information obtained from these analyses is invaluable for health researchers and policy makers, as it helps to formulate hypotheses about the etiology of a disease and its main risk factors, to allocate funds efficiently in hot-spot areas, and to plan prevention or intervention programmes.
The main reason to use models in spatio-temporal disease mapping studies is to borrow strength from spatial and temporal neighbors to reduce the high variability inherent to classical risk estimators, such as the standardized mortality ratio (SMR) or the standardized incidence ratio (SIR). This variability is particularly severe when studying rare diseases or when the analysis involves a large number of areas or time periods. Research into spatial and spatio-temporal disease mapping has been carried out within a hierarchical Bayesian framework, with generalized linear mixed models (GLMM) playing a major role. Two main approaches have been followed for model fitting and inference: the empirical Bayes (EB) and the fully Bayes (FB) approach. Both have been used in the literature and both have advantages and disadvantages (see Goicoa et al., 2012 for some discussion), but the FB approach has expanded enormously thanks to the advent of modern computers, free software to run Markov chain Monte Carlo (McMC) algorithms such as WinBUGS (Spiegelhalter et al., 2003), and the publication of practical monographs (see for example Lawson, 2013).

The FB approach provides posterior marginal distributions of the target parameters instead of a single point estimate. However, the posterior distributions are not usually available in closed form, so McMC algorithms, a computer-intensive simulation-based technique, have to be used. Even though these methods are very general and can be applied to virtually any model providing exact inference, in practice they can lead to high Monte Carlo errors and long computation times due to the complexity of disease mapping models (Schrödle et al., 2011) and the high dimension of the data. Moreover, specific algorithms not implemented in available software are often needed (Schmid and Held, 2004). Hence, a trade-off between exact inference, model complexity and computing time has to be achieved. This becomes an issue in spatio-temporal disease mapping, where the data at hand are usually large and the models are complex. Additionally, the choice of priors for the hyperparameters is important to obtain reliable inference (see for example Wakefield, 2007; Fong et al., 2010 for some discussion). As an alternative to McMC, an approximate method for Bayesian inference in latent Gaussian models has been developed by Rue et al. (2009), which uses integrated nested Laplace approximations (INLA) to estimate the posterior marginal distributions of the quantities of interest. Many latent Gaussian models admit conditional independence properties leading to sparse precision matrices, and INLA takes advantage of this to speed up computation and to provide Bayesian inference without running long and complex McMC algorithms.
Model fitting and estimation in the EB approach commonly rely on the well-known penalized quasi-likelihood (PQL) technique. Maximum likelihood estimation of GLMMs for counts usually requires numerical integration, and PQL reduces the problem to a series of weighted least squares regressions using a Laplace approximation to the quasi-likelihood (see Breslow and Clayton, 1993). Hence, it has been used in disease mapping as an alternative to McMC methods. It provides good point estimates for Poisson models (Dean et al., 2004), it is computationally simple and fast, and it has few convergence problems. However, it can be less accurate for binomial data, and inference relies on asymptotic distributions without clear guidelines about when this theory provides accurate inference (see Breslow, 2004 and the references therein for an in-depth discussion of PQL). An additional drawback of PQL is that the variability due to the estimation of the variance components is not taken into account in the overall risk variability, although some authors (see for example Ugarte et al., 2008) have developed a mean squared error estimator to circumvent this problem.
Different spatio-temporal disease mapping models have been proposed in the literature, including parametric and non-parametric time trends and interactions, and the literature on Bayesian spatio-temporal disease mapping is extensive. For example, Bernardinelli et al. (1995) use a spatio-temporal model with a linear trend, while Assunção et al. (2001) consider a second-degree polynomial trend model. Among non-parametric models, the work by Knorr-Held (2000), where four types of space-time interactions are proposed, deserves particular attention. Martínez-Beneito et al. (2008) focus on an autoregressive approach to spatio-temporal disease mapping, and Ugarte et al. (2009a) compare the performance of different space-time models. Most of the research in disease mapping is based on conditional autoregressive (CAR) priors for both spatial and temporal effects, extending the seminal work of Besag et al. (1991). However, other approaches based on splines have been developed. Within an EB approach, MacNab and Dean (2001) consider autoregressive local smoothing in space and B-spline smoothing for time. Ugarte et al. (2010, 2012b) consider a pure interaction P-spline model for space and time, and Ugarte et al. (2012a) use an ANOVA-type P-spline model to describe spatio-temporal patterns of prostate cancer mortality in Spain. Spline smoothing has also been used in disease mapping from an FB approach (see for example MacNab, 2007; MacNab and Gustafson, 2007).
In this chapter, our aim is to explore in depth the possibilities that INLA offers for fitting space-time disease mapping models. Most of the work in spatial and spatio-temporal disease mapping with INLA considers the Besag et al. (1991) model (hereafter BYM model), which includes two spatial effects: one with a Gaussian exchangeable prior to model unstructured heterogeneity, and another with an intrinsic conditional autoregressive (iCAR) prior for the spatially structured variability. See for example Schrödle et al. (2011), Schrödle and Held (2011a), Held et al. (2010), Schrödle and Held (2011b) or Blangiardo et al. (2013). However, the iCAR prior is improper and has the undesirable large-scale property of tending to a negative pairwise correlation for regions located further apart (see MacNab, 2011; Botella-Rocamora et al., 2013). In addition, the variance components in the BYM convolution model are not identifiable from the data (MacNab, 2014), and informative hyperpriors are needed for posterior inference. Here we consider the prior proposed by Leroux et al. (1999), which has been shown to outperform the iCAR prior (Lee, 2011). This model can be easily implemented using the R-INLA package, as will be shown later. It has already been used to construct a local adaptive algorithm for spatial smoothing (Lee and Mitchell, 2013).
In Section 1.2, different spatio-temporal models are described and the necessary identifiability constraints for each model are provided. Details about both the PQL and INLA estimation techniques are given in Section 1.3. The R-INLA package and some of its most useful tools are described in detail in Section 1.4. Finally, in Section 1.5, male brain cancer mortality data from Spanish provinces are used to illustrate how to fit spatio-temporal models using INLA.

1.2 Spatio-temporal models for disease mapping

A wide range of spatio-temporal models for disease mapping have been proposed in the literature, most of them based on CAR models extending the well-known BYM model (Besag et al., 1991). In this section, two models with parametric time trends and a battery of non-parametric models including different types of space-time interactions (Knorr-Held, 2000) are described.
Suppose that the region under study is divided into $n$ small areas labelled $i = 1, \ldots, n$. For each area $i$, data are available for time periods labelled $t = 1, \ldots, T$. Then, conditional on the relative risk $r_{it}$, the number of observed cases $O_{it}$ is assumed to be Poisson distributed with mean $\mu_{it} = e_{it} r_{it}$, where $e_{it}$ is the number of expected cases for area $i$ and time $t$. That is,

$$O_{it} \mid r_{it} \sim \mathrm{Poisson}(\mu_{it} = e_{it} r_{it}),$$

$$\log(\mu_{it}) = \log(e_{it}) + \log(r_{it}).$$

Here, $\log(e_{it})$ is an offset, and different models are defined depending on the specification of $\log(r_{it})$.
To compute the number of expected cases $e_{it}$, both direct and indirect 'age-and-sex' standardization procedures can be performed. The direct method uses a single standard population to compute the 'age-and-sex' adjusted rates for each area and time period, producing the rates that these areas would have if they had the same age and sex distribution as the standard population. The indirect method, on the other hand, applies the same 'age-and-sex' rates (generally those computed using the information from all the areas together over the whole study period) to the observed population in each small area and time point. The indirect standardization procedure has been used in all the real data analyses presented in this dissertation, so that $e_{it}$ is computed as

$$e_{it} = \sum_{j=1}^{J} N_{itj} \frac{O_j}{N_j}, \qquad i = 1, \ldots, n; \; t = 1, \ldots, T,$$

where $O_j$ and $N_j$ are respectively the number of observed cases and the population size in 'age-and-sex' group $j \in \{1, \ldots, J\}$, so that

$$O_j = \sum_{i=1}^{n} \sum_{t=1}^{T} O_{itj}, \qquad N_j = \sum_{i=1}^{n} \sum_{t=1}^{T} N_{itj}.$$
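
The indirect standardization formula can be illustrated with a short sketch. It is written in plain Python rather than R, and the toy populations and counts are hypothetical values chosen only for illustration; the function name `expected_cases` is likewise an assumption, not something defined in the text.

```python
def expected_cases(obs, pop):
    """Indirect standardization: obs[i][t][j], pop[i][t][j] -> e[i][t]."""
    n, T, J = len(pop), len(pop[0]), len(pop[0][0])
    cells = [(i, t) for i in range(n) for t in range(T)]
    # Overall cases O_j and population N_j in each age-and-sex group j
    O = [sum(obs[i][t][j] for i, t in cells) for j in range(J)]
    N = [sum(pop[i][t][j] for i, t in cells) for j in range(J)]
    rates = [O[j] / N[j] for j in range(J)]
    # e_it = sum_j N_itj * O_j / N_j
    return [[sum(pop[i][t][j] * rates[j] for j in range(J)) for t in range(T)]
            for i in range(n)]


# Hypothetical toy data: n = 2 areas, T = 2 periods, J = 2 groups
pop = [[[1000, 500], [1100, 520]], [[2000, 800], [2100, 790]]]
obs = [[[2, 5], [1, 6]], [[3, 9], [4, 7]]]

e = expected_cases(obs, pop)
# Classical SMR for each area and period: observed over expected cases
smr = [[sum(obs[i][t]) / e[i][t] for t in range(2)] for i in range(2)]
```

Because the reference rates are computed from the same data, the expected cases add up to the total number of observed cases, which is a quick sanity check on any implementation.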

1.2.1 Linear time trend models

In this section, parametric Bayesian models with a linear time trend similar to the ones proposed by Bernardinelli et al. (1995) are considered. These models are natural extensions of the BYM spatial model, with an additional linear time trend and a differential time trend for each small area. The log risks are modelled as

$$\log(r_{it}) = \eta + \xi_i + (\beta + \varphi_i) \cdot t, \tag{1.1}$$

where $\eta$ is an intercept, $\xi_i$ is the spatial effect, $\beta$ represents an overall linear time trend and $\varphi_i$ captures the interaction between the linear time trend and the spatial effect $\xi_i$. In this dissertation, the Leroux et al. (1999) CAR prior distribution (lCAR) has been considered for the spatial effect $\xi_i$. That is, the prior for the spatial random effects $\xi = (\xi_1, \ldots, \xi_n)'$ is given by

$$\xi \sim N\left(0, [\tau_\xi(\lambda_\xi R_\xi + (1 - \lambda_\xi) I_n)]^{-1}\right), \tag{1.2}$$

where $\lambda_\xi$ is a spatial smoothing parameter taking values between 0 and 1, $I_n$ is the identity matrix of dimension $n \times n$, and $R_\xi$ is the $n \times n$ spatial neighborhood matrix with diagonal elements equal to the number of neighbors of each area and off-diagonal elements $(R_\xi)_{ij} = -1$ if areas $i$ and $j$ are neighbors and $(R_\xi)_{ij} = 0$ otherwise. Here, two areas are considered neighbors if they share a common border. Note that when $\lambda_\xi = 0$ the lCAR prior reduces to an exchangeable prior $\xi \sim N(0, \tau_\xi^{-1} I_n)$, whereas the intrinsic CAR prior $\xi \sim N(0, [\tau_\xi R_\xi]^{-})$ is obtained when $\lambda_\xi = 1$. The symbol $^{-}$ denotes the Moore-Penrose generalized inverse of a matrix. The univariate full conditional distributions corresponding to Equation (1.2) are given by

$$\xi_i \mid \xi_{-i} \sim N\left(\frac{\lambda_\xi}{1 - \lambda_\xi + \lambda_\xi n_i} \sum_{j \sim i} \xi_j, \; \frac{\tau_\xi^{-1}}{1 - \lambda_\xi + \lambda_\xi n_i}\right), \tag{1.3}$$

where $\xi_{-i}$ denotes the random effect vector without the $i$th component, $j \sim i$ indicates that areas $i$ and $j$ are neighbors, and $n_i$ is the number of neighbors of area $i$.
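
As a check on Equations (1.2) and (1.3), the following plain-Python sketch builds the lCAR precision matrix for a hypothetical map of four areas arranged in a line and compares the full conditional obtained from the precision matrix with the closed form above. The adjacency matrix, the parameter values and the function names are illustrative assumptions only.

```python
def lcar_precision(adj, tau, lam):
    """Precision tau * (lam * R + (1 - lam) * I) for a 0/1 adjacency matrix."""
    n = len(adj)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                n_i = sum(adj[i])  # number of neighbours of area i
                Q[i][j] = tau * (lam * n_i + (1.0 - lam))
            else:
                Q[i][j] = tau * lam * (-adj[i][j])
    return Q


def full_conditional(Q, xi, i):
    """Zero-mean GMRF: mean = -sum_{j != i} Q_ij xi_j / Q_ii, variance = 1 / Q_ii."""
    mean = -sum(Q[i][j] * xi[j] for j in range(len(xi)) if j != i) / Q[i][i]
    return mean, 1.0 / Q[i][i]


# Hypothetical map: four areas in a line (1-2-3-4)
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
tau, lam = 2.0, 0.6
xi = [0.3, -0.1, 0.4, 0.2]

Q = lcar_precision(adj, tau, lam)
mean, var = full_conditional(Q, xi, 1)

# Closed form of Equation (1.3) for the second area (two neighbours)
n_i = sum(adj[1])
denom = 1.0 - lam + lam * n_i
formula_mean = lam / denom * sum(xi[j] for j in range(4) if adj[1][j])
formula_var = (1.0 / tau) / denom
```

Setting `lam = 0.0` in `lcar_precision` returns `tau` times the identity, which corresponds to the exchangeable prior mentioned above.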
Two different prior distributions for the differential trend $\varphi_i$ are examined. The first one assumes an exchangeable distribution, that is, the prior for the differential trend $\varphi = (\varphi_1, \ldots, \varphi_n)'$ is given by $\varphi \sim N(0, \tau_\varphi^{-1} I_n)$. The second one considers an iCAR prior distribution, so that $\varphi \sim N(0, [\tau_\varphi R_\xi]^{-})$.

1.2.2 General time trend models

The assumption of a linear time trend may be very unrealistic in practice, where it is common to observe change points in temporal trends due to improvements in treatments, screening and early detection programmes, and research advances in general. A natural extension of Equation (1.1) is to drop linearity and assume non-parametric trends. In this dissertation, different non-parametric models including space-time interactions are considered. The models are similar to those proposed by Knorr-Held (2000), except for the prior distribution used for the spatial component. Here, the log-risk is modelled as

$$\log(r_{it}) = \eta + \xi_i + \phi_t + \gamma_t + \delta_{it}, \tag{1.4}$$

where $\eta$ quantifies the logarithm of the global risk, $\xi_i$ is the spatial component, $\phi_t$ and $\gamma_t$ represent the unstructured and structured temporal effects respectively, and $\delta_{it}$ is the space-time interaction effect. Note that dropping the interaction terms leads to additive models. All the components in Equation (1.4) can be modelled as Gaussian Markov random fields (GMRF) (see Rue and Held, 2005), and prior densities can be written according to some structure matrices. Again, the lCAR prior is considered for the spatial random effect $\xi$. The unstructured temporal random effects $\phi_t$ are modelled as independent and identically distributed normal random variables with mean 0 and precision $\tau_\phi$. That is, $\phi = (\phi_1, \ldots, \phi_T)' \sim N(0, \tau_\phi^{-1} I_T)$, where $I_T$ is the $T \times T$ identity matrix. For the structured temporal random effects $\gamma = (\gamma_1, \ldots, \gamma_T)'$, random walk prior distributions of first (RW1) or second order (RW2) will be assumed, i.e., $\gamma \sim N(0, [\tau_\gamma R_\gamma]^{-})$, where $R_\gamma$ is the $T \times T$ structure matrix of a RW1

$$R_\gamma = \begin{pmatrix}
1 & -1 & & & & & \\
-1 & 2 & -1 & & & & \\
 & -1 & 2 & -1 & & & \\
 & & \ddots & \ddots & \ddots & & \\
 & & & -1 & 2 & -1 & \\
 & & & & -1 & 2 & -1 \\
 & & & & & -1 & 1
\end{pmatrix}$$

or a RW2

$$R_\gamma = \begin{pmatrix}
1 & -2 & 1 & & & & \\
-2 & 5 & -4 & 1 & & & \\
1 & -4 & 6 & -4 & 1 & & \\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
 & & 1 & -4 & 6 & -4 & 1 \\
 & & & 1 & -4 & 5 & -2 \\
 & & & & 1 & -2 & 1
\end{pmatrix}$$

Table 1.1: Specification and rank deficiency of $R_\delta$ for the four possible types of space-time interaction proposed by Knorr-Held (2000).

| Space-time interaction | $R_\delta$ | Rank deficiency of $R_\delta$ (RW1 prior for $\gamma$) | Rank deficiency of $R_\delta$ (RW2 prior for $\gamma$) |
|---|---|---|---|
| Type I | $I_n \otimes I_T$ | − | − |
| Type II | $I_n \otimes R_\gamma$ | $n$ | $2n$ |
| Type III | $R_\xi \otimes I_T$ | $T$ | $T$ |
| Type IV | $R_\xi \otimes R_\gamma$ | $n + T - 1$ | $2n + T - 2$ |
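
The RW1 and RW2 structure matrices above can be generated as $D'D$, where $D$ is the matrix of first or second order differences. This construction is a standard one but is not stated in the text, so the sketch below (plain Python, illustrative function names) should be read as an assumption-based check rather than the author's code.

```python
def difference_matrix(T, order):
    """(T - order) x T matrix of order-th differences."""
    D = [[int(i == j) for j in range(T)] for i in range(T)]  # start from identity
    for _ in range(order):
        # Each pass replaces consecutive rows by their difference
        D = [[b - a for a, b in zip(D[r], D[r + 1])] for r in range(len(D) - 1)]
    return D


def structure_matrix(T, order):
    """Random walk structure matrix R = D'D (order 1: RW1, order 2: RW2)."""
    D = difference_matrix(T, order)
    return [[sum(row[i] * row[j] for row in D) for j in range(T)] for i in range(T)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


rw1 = structure_matrix(5, 1)
rw2 = structure_matrix(6, 2)

# Null spaces: constants for RW1; constants and linear trends for RW2
rw1_times_ones = matvec(rw1, [1] * 5)
rw2_times_ones = matvec(rw2, [1] * 6)
rw2_times_trend = matvec(rw2, list(range(1, 7)))
```

The vectors annihilated by each matrix (constants for RW1, constants and linear trends for RW2) explain why the RW1 structure matrix has rank deficiency 1 and the RW2 structure matrix has rank deficiency 2.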

The interaction random effect $\delta = (\delta_{11}, \ldots, \delta_{1T}, \ldots, \delta_{n1}, \ldots, \delta_{nT})'$ is assumed to follow the multivariate normal distribution $\delta \sim N(0, [\tau_\delta R_\delta]^{-})$, where $R_\delta$ is the $nT \times nT$ matrix obtained as the Kronecker product of the corresponding spatial and temporal structure matrices (see Clayton, 1996). The four types of interactions proposed by Knorr-Held (2000) will be considered. In Type I interactions ($R_\delta = I_n \otimes I_T$), all parameters $\delta_{it}$ are a priori independent, without any structure in space or time. In Type II interactions ($R_\delta = I_n \otimes R_\gamma$), each $\delta_{i\cdot}$ for $i = 1, \ldots, n$ follows a random walk independently of all other regions. That is, temporal trends differ from region to region but do not have any structure in space. In Type III interactions ($R_\delta = R_\xi \otimes I_T$), each $\delta_{\cdot t}$ for $t = 1, \ldots, T$ follows an independent intrinsic CAR prior distribution. That is, a different spatial distribution is assumed for each time point, without any temporal structure. Finally, in Type IV interactions ($R_\delta = R_\xi \otimes R_\gamma$), the $\delta_{it}$ are completely dependent over space and time. That is, different temporal trends are assumed from region to region, but they are more likely to be similar for adjacent regions. The structure matrices for the different types of interactions and their rank deficiencies are summarized in Table 1.1.
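
The rank deficiencies in Table 1.1 can be verified numerically on a small example. The sketch below is plain Python with exact rational arithmetic; the choice of $n = 3$ areas in a line and $T = 4$ periods is hypothetical, and the helper names are illustrative. It relies on the standard fact that the rank of a Kronecker product is the product of the ranks.

```python
from fractions import Fraction


def rw_structure(T, order):
    """RW1 (order 1) or RW2 (order 2) structure matrix, built as D'D."""
    D = [[int(i == j) for j in range(T)] for i in range(T)]
    for _ in range(order):
        D = [[b - a for a, b in zip(D[r], D[r + 1])] for r in range(len(D) - 1)]
    return [[sum(row[i] * row[j] for row in D) for j in range(T)] for i in range(T)]


def identity(m):
    return [[int(i == j) for j in range(m)] for i in range(m)]


def kron(A, B):
    """Kronecker product A (x) B of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


n, T = 3, 4
# Hypothetical spatial structure matrix: three areas in a line
R_xi = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]


def deficiencies(order):
    R_gamma = rw_structure(T, order)
    types = {
        "II": kron(identity(n), R_gamma),
        "III": kron(R_xi, identity(T)),
        "IV": kron(R_xi, R_gamma),
    }
    return {name: n * T - rank(R) for name, R in types.items()}


rw1_def = deficiencies(1)
rw2_def = deficiencies(2)
```

For this toy map the results agree with the formulas in Table 1.1, for instance $n + T - 1 = 6$ for a Type IV interaction with a RW1 prior.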
In the spatio-temporal model of Equation (1.4), identifiability problems arise because the overall level can be absorbed by the spatial and the time effects, and the interaction terms are confounded with the main effects. To ensure model identifiability, sum-to-zero constraints are usually imposed on the random effects of the model (see for example Knorr-Held, 2000; Schmid and Held, 2004 or Schrödle et al., 2011). The necessary identifiability constraints when using a RW1 or RW2 prior for the temporally structured random effect and the different types of space-time interactions are summarized in Table 1.2. However, no clear guidance is given in the literature about why these constraints have to be considered. The details of how these sum-to-zero constraints solve the identifiability problems in spatio-temporal models are given in ??.
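
A minimal sketch of the confounding between the overall level and the main effects, in plain Python and for an additive model without interaction (a simplification chosen here for brevity). All parameter values and function names are hypothetical.

```python
def linear_predictor(eta, xi, gamma):
    """Additive log-risk eta + xi_i + gamma_t for every area i and period t."""
    return [[eta + x + g for g in gamma] for x in xi]


def center(eta, xi, gamma):
    """Impose sum-to-zero constraints by moving the means into the intercept."""
    mean_xi = sum(xi) / len(xi)
    mean_gamma = sum(gamma) / len(gamma)
    return (eta + mean_xi + mean_gamma,
            [x - mean_xi for x in xi],
            [g - mean_gamma for g in gamma])


# Two different parameter sets that differ by a constant shift c = 1
eta_a, xi_a, gamma_a = 0.1, [0.2, -0.1, 0.5], [0.0, 0.3]
eta_b, xi_b, gamma_b = eta_a + 1.0, [x - 1.0 for x in xi_a], gamma_a

pred_a = linear_predictor(eta_a, xi_a, gamma_a)
pred_b = linear_predictor(eta_b, xi_b, gamma_b)

# After centring, both parameter sets map to the same constrained version
centered_a = center(eta_a, xi_a, gamma_a)
centered_b = center(eta_b, xi_b, gamma_b)
```

Both parameter sets give the same log-risks, so the data cannot distinguish them; the sum-to-zero constraints select a single representative.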

1.3 Model fitting and inference

Model fitting and inference with spatio-temporal disease mapping models have usually been done using either an empirical Bayes (EB) or a fully Bayes (FB) approach. In the EB approach, penalized quasi-likelihood (PQL) has been widely used (see for example MacNab and Dean, 2001; Dean et al., 2004; Ugarte et al., 2008, 2009b, 2010). From an FB perspective, Markov chain Monte Carlo (McMC) techniques have been used because the posterior distributions usually cannot be obtained in closed form (see for example Bernardinelli et al., 1995; Knorr-Held and Besag, 1998; Knorr-Held, 2000; Best et al., 2005; Ainsworth and Dean, 2006; Martínez-Beneito et al., 2008 or Ugarte et al., 2009a). Although these techniques have been widely used, their implementation may not be easy for practitioners, as algorithms have to be carefully chosen (Knorr-Held and Rue, 2002; Schmid and Held, 2004), and difficulties such as long computing times and large Monte Carlo errors usually appear with complex models (Schrödle et al., 2011). An alternative to McMC for Bayesian inference, based on integrated nested Laplace approximations and known as INLA, has been proposed by Rue et al. (2009). This technique can be easily used in the free statistical software R through the R-INLA package.
In the following sections, both the PQL and INLA techniques are briefly described.

1.3.1 Penalized quasi-likelihood (PQL)

The penalized quasi-likelihood algorithm (PQL) was proposed by Laird (1978) and Stiratelli et al. (1984) as an approximate Bayes procedure for GLMMs. This method has been widely used in disease mapping applications since Breslow and Clayton (1993) applied it to the well-known Scottish lip cancer data. The PQL analysis relies on a series of approximations to the mixed model using a first-order Taylor series expansion of the link function. First, an appropriate working vector (Schall, 1991) must be defined to achieve correspondence with the normal mixed model. Once estimates of the fixed and random effects are obtained, the restricted maximum likelihood (REML) equations (see Harville, 1977) are used to estimate the variance components of the model.

Table 1.2: Identifiability constraints for the spatio-temporal CAR models described in Section 1.2.2.

| Interaction | RW1 prior for $\gamma$ | RW2 prior for $\gamma$ |
|---|---|---|
| Type I | $\sum_{i=1}^{n} \xi_i = 0$, $\sum_{t=1}^{T} \gamma_t = 0$, $\sum_{i=1}^{n} \sum_{t=1}^{T} \delta_{it} = 0$ | $\sum_{i=1}^{n} \xi_i = 0$, $\sum_{t=1}^{T} \gamma_t = 0$, $\sum_{i=1}^{n} \sum_{t=1}^{T} \delta_{it} = \sum_{i=1}^{n} \sum_{t=1}^{T} t\,\delta_{it} = 0$ |
| Type II | $\sum_{i=1}^{n} \xi_i = 0$, $\sum_{t=1}^{T} \gamma_t = 0$, $\sum_{t=1}^{T} \delta_{it} = 0$ for $i = 1, \ldots, n$ | $\sum_{i=1}^{n} \xi_i = 0$, $\sum_{t=1}^{T} \gamma_t = \sum_{t=1}^{T} t\,\gamma_t = 0$, $\sum_{t=1}^{T} \delta_{it} = 0$ for $i = 1, \ldots, n$ |
| Type III | $\sum_{i=1}^{n} \xi_i = 0$, $\sum_{t=1}^{T} \gamma_t = 0$, $\sum_{i=1}^{n} \delta_{it} = 0$ for $t = 1, \ldots, T$ | $\sum_{i=1}^{n} \xi_i = 0$, $\sum_{t=1}^{T} \gamma_t = 0$, $\sum_{i=1}^{n} \delta_{it} = 0$ for $t = 1, \ldots, T$ |
| Type IV | $\sum_{i=1}^{n} \xi_i = 0$, $\sum_{t=1}^{T} \gamma_t = 0$, $\sum_{t=1}^{T} \delta_{it} = 0$ for $i = 1, \ldots, n$, $\sum_{i=1}^{n} \delta_{it} = 0$ for $t = 1, \ldots, T$ | $\sum_{i=1}^{n} \xi_i = 0$, $\sum_{t=1}^{T} \gamma_t = 0$, $\sum_{t=1}^{T} \delta_{it} = 0$ for $i = 1, \ldots, n$, $\sum_{i=1}^{n} \delta_{it} = 0$ for $t = 1, \ldots, T$ |
For notational simplicity, let us consider the spatio-temporal CAR model of Equation (1.4) without the unstructured temporal random effect $\phi_t$. This model can be expressed in matrix form as

$$\log(r) = 1_{nT}\,\eta + (I_n \otimes 1_T)\xi + (1_n \otimes I_T)\gamma + (I_n \otimes I_T)\delta,$$

where $r = (r_{11}, \ldots, r_{1T}, \ldots, r_{n1}, \ldots, r_{nT})'$, and $1_n$ and $1_T$ are column vectors of ones of length $n$ and $T$ respectively. The working vector for this spatio-temporal model is

$$Y = X\eta + Z_1\xi + Z_2\gamma + Z_3\delta + (O - \mu)\,g'(\mu),$$

where $X$ is the fixed effects matrix (here a column of ones), $Z_1 = I_n \otimes 1_T$ is the design matrix of the main spatial random effect, $Z_2 = 1_n \otimes I_T$ is the design matrix of the main temporal random effect, $Z_3 = I_n \otimes I_T$ is the design matrix of the interaction term, $\mu$ is the vector of means of the Poisson distribution, $g$ is the link function (here the logarithm) and $g'(\mu) = 1/\mu$, with the product $(O - \mu)\,g'(\mu)$ taken elementwise. Then a correspondence with a normal mixed model is attained as

$$Y = X\eta + Z_1\xi + Z_2\gamma + Z_3\delta + \epsilon,$$

where $\epsilon = (O - \mu)\,g'(\mu) \sim N(0, W^{-1})$ and $W = \mathrm{diag}\{\mu_{it}\}$. The fixed effect estimator is obtained as

$$\hat{\eta} = (X'\hat{V}^{-1}X)^{-1}X'\hat{V}^{-1}Y,$$

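with asymptotic covariance matrix $(X'\hat{V}^{-1}X)^{-1}$, where

$$V = W^{-1} + Z_1 G_1 Z_1' + Z_2 G_2 Z_2' + Z_3 G_3 Z_3'$$

and

$$G_1 = [\tau_\xi(\lambda_\xi R_\xi + (1 - \lambda_\xi) I_n)]^{-1}, \qquad G_2 = [\tau_\gamma R_\gamma]^{-}, \qquad G_3 = [\tau_\delta R_\delta]^{-}$$

are the covariance matrices of the spatial ($\xi$), temporal ($\gamma$) and spatio-temporal ($\delta$) random effects. Given $\Theta = (\eta, \tau_\xi, \lambda_\xi, \tau_\gamma, \tau_\delta)'$, the random effects are predicted as

$$\hat{\xi} = E[\xi \mid Y, \hat{\Theta}] = \hat{G}_1 Z_1' \hat{V}^{-1}(Y - X\hat{\eta}),$$

$$\hat{\gamma} = E[\gamma \mid Y, \hat{\Theta}] = \hat{G}_2 Z_2' \hat{V}^{-1}(Y - X\hat{\eta}),$$

$$\hat{\delta} = E[\delta \mid Y, \hat{\Theta}] = \hat{G}_3 Z_3' \hat{V}^{-1}(Y - X\hat{\eta}),$$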
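because the conditional distributions of the random effects given $Y$ are

$$\xi \mid Y \sim N\left(G_1 Z_1' V^{-1}(Y - X\eta), \; G_1 - G_1 Z_1' V^{-1} Z_1 G_1\right),$$

$$\gamma \mid Y \sim N\left(G_2 Z_2' V^{-1}(Y - X\eta), \; G_2 - G_2 Z_2' V^{-1} Z_2 G_2\right),$$

$$\delta \mid Y \sim N\left(G_3 Z_3' V^{-1}(Y - X\eta), \; G_3 - G_3 Z_3' V^{-1} Z_3 G_3\right).$$

To estimate the variance components $\theta = (\tau_\xi, \lambda_\xi, \tau_\gamma, \tau_\delta)'$, the REML equations are used:

$$\frac{1}{2}\left[(Y - X\hat{\eta})' V^{-1} \frac{\partial V}{\partial \theta_r} V^{-1} (Y - X\hat{\eta}) - \mathrm{tr}\left(P \frac{\partial V}{\partial \theta_r}\right)\right] = 0,$$

where $P = V^{-1} - V^{-1}X(X'\hat{V}^{-1}X)^{-1}X'V^{-1}$. The asymptotic variance of $\hat{\theta}$, $\mathrm{Var}(\hat{\theta})$, is given by the inverse of the information matrix $I$ with components

$$I_{rs} = \frac{1}{2}\,\mathrm{tr}\left(P \frac{\partial V}{\partial \theta_r} P \frac{\partial V}{\partial \theta_s}\right).$$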

Given initial estimates of the parameters, PQL first solves for the fixed and random effects, keeping the variance parameters fixed. Then, the variance parameters are updated from the REML equations. The procedure is repeated until the convergence criterion is met. Finally, the relative risks are estimated as

$$\hat{r} = \exp\{X\hat{\eta} + Z_1\hat{\xi} + Z_2\hat{\gamma} + Z_3\hat{\delta}\},$$

where $\hat{r}$ is the vector of estimated risks $\hat{r}_{it}$ and the exponential is taken elementwise.
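
To make the working-vector step concrete, the sketch below runs the weighted least squares iteration for a deliberately simplified case: a Poisson model with an offset and only the intercept $\eta$, with no random effects, so that $V = W^{-1}$. This is not the full PQL algorithm described above, and the counts and expected cases are hypothetical.

```python
import math


def fit_intercept(O, e, n_iter=50):
    """Iterated weighted least squares for log(mu_i) = log(e_i) + eta."""
    eta = 0.0
    for _ in range(n_iter):
        mu = [e_i * math.exp(eta) for e_i in e]
        # Working vector: Y_i = eta + (O_i - mu_i) * g'(mu_i), with g'(mu) = 1 / mu
        Y = [eta + (o_i - m_i) / m_i for o_i, m_i in zip(O, mu)]
        # eta = (X'WX)^{-1} X'WY with X a column of ones and W = diag(mu)
        eta = sum(m_i * y_i for m_i, y_i in zip(mu, Y)) / sum(mu)
    return eta


# Hypothetical observed and expected cases for three areas
O = [5, 8, 12]
e = [6.0, 7.5, 10.0]

eta_hat = fit_intercept(O, e)
overall_log_smr = math.log(sum(O) / sum(e))
```

In this special case the iteration converges to the logarithm of the overall SMR, $\log(\sum O / \sum e)$, which gives a simple way to check the update.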

In disease mapping models, the computation of the prediction errors is crucial for detecting extreme risk areas. Accurate confidence intervals for the true value of the risks must be computed in order to decide whether an area exhibits extreme risk. In general, if the lower (upper) bound of the interval for the risk is greater (lower) than 1, the region is classified as a high (low) risk area. Several techniques have been developed to assess the prediction error in small area applications (and in disease mapping in particular). A key paper on this issue is Prasad and Rao (1990), where the authors incorporate the uncertainty associated with the estimation of the variance components. Ugarte et al. (2008) derive an appropriate mean squared error (MSE) estimator of the log-risk predictor and build the corresponding confidence intervals for the risks in a spatial context with CAR models. A very similar approach can be considered to compute the MSE estimator in the spatio-temporal context.
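
The classification rule can be sketched as follows in plain Python. The interval is assumed here to be of the form $\exp\{\log \hat{r} \pm z \sqrt{\mathrm{MSE}}\}$ with a normal quantile $z = 1.96$; that form, the function names and the input values are assumptions made for illustration and are not taken from Ugarte et al. (2008).

```python
import math


def risk_interval(log_risk, mse, z=1.96):
    """Assumed interval for the risk: exp(log_risk -/+ z * sqrt(mse))."""
    half_width = z * math.sqrt(mse)
    return math.exp(log_risk - half_width), math.exp(log_risk + half_width)


def classify(log_risk, mse, z=1.96):
    """High risk if the whole interval lies above 1, low risk if below 1."""
    lower, upper = risk_interval(log_risk, mse, z)
    if lower > 1.0:
        return "high"
    if upper < 1.0:
        return "low"
    return "inconclusive"


# Hypothetical log-risk predictions and MSE estimates for three areas
labels = [
    classify(math.log(1.5), 0.01),
    classify(math.log(0.6), 0.01),
    classify(math.log(1.1), 0.04),
]
```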

1.3.2 Integrated nested Laplace approximations (INLA)

The INLA approach proposed by Rue et al. (2009) is a deterministic algorithm for Bayesian inference based on integrated nested Laplace approximations. INLA is especially designed for latent Gaussian models (a subclass of structured additive regression models), which are flexible enough to be used in many different types of applications. In these models, the response variable $y = (y_1, \ldots, y_N)'$ is assumed to belong to an exponential family, where the mean $\mu_i$ is linked to a predictor $\nu_i$ through a link function $g(\cdot)$, such that $g(\mu_i) = \nu_i$. The structured additive predictor $\nu_i$ is defined as

$$\nu_i = \beta_0 + \sum_{j=1}^{J} \beta_j u_{ji} + \sum_{l=1}^{L} f_l(z_{li}) \quad \text{for } i = 1, \ldots, N, \tag{1.5}$$

where $\beta_0$ is a scalar representing the intercept, the coefficients $\beta = \{\beta_1, \ldots, \beta_J\}$ quantify the linear effect of some covariates $u = (u_1, \ldots, u_J)'$ on the response, and $f = \{f_1(\cdot), \ldots, f_L(\cdot)\}$ is a collection of unknown functions of the covariates $z = (z_1, \ldots, z_L)'$. The terms $f_l(\cdot)$ can take different forms, such as smooth and nonlinear effects of covariates, time trends and seasonal effects, as well as temporal or spatial random effects, defining a very flexible class of models.
The models described in Section 1.2 fit into this framework and are usually built as Bayesian hierarchical models with three stages. The first stage is the observational model $\pi(y \mid x)$, where $\pi(\cdot \mid \cdot)$ denotes the conditional density and $y$ is the vector of observations. We assume that the $y_i$ are conditionally independent given the vector $x$ of all the latent (non-observable) components of interest and the vector of hyperparameters $\theta$, so the distribution of the $N$ observations is given by the likelihood

$$\pi(y \mid x, \theta) = \prod_{i=1}^{N} \pi(y_i \mid x_i, \theta).$$

The second stage is the latent Gaussian field $\pi(x \mid \theta)$, where a multivariate Gaussian prior with zero mean and precision matrix $Q$ is assumed for $x$. This precision matrix typically depends on the hyperparameters $\theta$ (third stage), which are not necessarily Gaussian. That is, $x \sim N(0, Q^{-1}(\theta))$ with density function given by

$$\pi(x \mid \theta) = (2\pi)^{-N/2}\, |Q(\theta)|^{1/2} \exp\left(-\frac{1}{2}\, x' Q(\theta)\, x\right).$$

The components of the latent Gaussian field $x$ are assumed to be conditionally independent, with the consequence that $Q(\theta)$ is a sparse precision matrix. Note that if the components $x_i$ and $x_j$ are conditionally independent given all the other components $x_{-ij}$, that is, if the joint conditional distribution can be factorized as $\pi(x_i, x_j \mid x_{-ij}) = \pi(x_i \mid x_{-ij})\,\pi(x_j \mid x_{-ij})$, then $Q_{ij}(\theta) = 0$, and vice versa. This specification is known as a latent Gaussian Markov random field (GMRF; Rue and Held, 2005). Therefore, numerical methods for sparse matrices, which are much quicker than general algorithms for dense matrices, can be used when making inference with GMRFs (see Rue and Held, 2005 for algorithms).
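
The link between conditional independence and zeros in the precision matrix can be seen in a three-component example. The sketch below, in plain Python with exact rational arithmetic, inverts a hypothetical tridiagonal precision matrix: the first and third components are conditionally independent given the second ($Q_{13} = 0$), yet they are marginally correlated, so the covariance matrix is dense.

```python
from fractions import Fraction


def inverse(M):
    """Exact matrix inverse by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if A[i][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for i in range(n):
            if i != c and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [row[n:] for row in A]


# Hypothetical sparse precision matrix: x_1 and x_3 are conditionally
# independent given x_2 because Q[0][2] == 0
Q = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
Sigma = inverse(Q)

zeros_in_Q = sum(1 for row in Q for v in row if v == 0)
zeros_in_Sigma = sum(1 for row in Sigma for v in row if v == 0)
```

Working with $Q$ rather than its inverse is what allows sparse matrix algorithms to be used; the gain grows with the dimension of the field.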
Note that in the particular disease mapping model of Section 1.2.2, the latent Gaussian field is defined as $x = (\eta, \xi', \phi', \gamma', \delta')'$, while the unknown precision parameters and the spatial smoothing parameter form the vector of hyperparameters $\theta = (\tau_\xi, \lambda_\xi, \tau_\phi, \tau_\gamma, \tau_\delta)'$.
In the following, we briefly explain the approximate Bayesian inference strategy of INLA. For further details see Rue et al. (2009) or Blangiardo and Cameletti (2015). The main goal is to estimate the marginal posterior distributions for each element of the GMRF

$$\pi(x_i \mid y) = \int \pi(x_i, \theta \mid y)\, d\theta = \int \pi(x_i \mid \theta, y)\, \pi(\theta \mid y)\, d\theta \tag{1.6}$$

and for each element of the hyperparameter vector

$$\pi(\theta_k \mid y) = \int \pi(\theta \mid y)\, d\theta_{-k}.$$

The key feature of the INLA approach is to construct nested approximations of Equation (1.6), for which it is necessary to compute
(i) $\pi(\theta \mid y)$, from which the relevant marginals $\pi(\theta_k \mid y)$ can also be obtained,

(ii) $\pi(x_i \mid \theta, y)$, which is needed to compute the marginal posteriors $\pi(x_i \mid y)$.
For the first task (i), the Laplace approximation method described in Tierney and Kadane (1986) can be used, so that the joint posterior density of the hyperparameters $\pi(\theta \mid y)$ is approximated as

$$\tilde{\pi}(\theta \mid y) \propto \left.\frac{\pi(y \mid x, \theta)\, \pi(x \mid \theta)\, \pi(\theta)}{\tilde{\pi}_G(x \mid \theta, y)}\right|_{x = x^*(\theta)}, \tag{1.7}$$

where the denominator $\tilde{\pi}_G(x \mid \theta, y)$ denotes the Gaussian approximation to the full conditional distribution of $x$, and $x^*(\theta)$ is the mode of this full conditional for a given $\theta$. To integrate out the uncertainty with respect to $\theta$, it is essential to explore the properties of Equation (1.7) and find good evaluation points $\theta_k$ for a numerical integration of Equation (1.6). This is done by an iterative algorithm (Rue et al., 2009). Additionally, an appropriate area weight $\Delta_k$ must be assigned to each $\theta_k$. Details about how the posterior marginals $\pi(\theta_k \mid y)$ are computed using numerical integration of an interpolant are available in Martins et al. (2013).
For the second task (ii), three different approaches are possible: a Gaus-
sian approximation, a full Laplace approximation, and a simplified Laplace
approximation. In the Gaussian approximation, the posterior conditional
distributions π(xi |θ, y) are directly approximated as the marginals from
π̃G (x|θ, y). Although this approximation is the fastest option and often
gives accurate results, according to Rue and Martino (2007) it can be
unsatisfactory because of errors in the location of the posterior marginals,
errors due to the lack of skewness, or both. The approximation can be
improved by applying another Laplace approximation to π(xi |θ, y), similar
to the one described in Equation (1.7). However, this “full Laplace”
strategy can be computationally expensive. For this reason, Rue et al.
(2009) developed the simplified Laplace approximation, based on a series
expansion of the full Laplace approximation. This method is less time
consuming and is very competitive in many applications.
Finally, an approximation to the posterior marginal density of
Equation (1.6) is given by

$$\tilde{\pi}(x_i \mid y) = \sum_k \tilde{\pi}(x_i \mid \theta_k, y)\, \tilde{\pi}(\theta_k \mid y)\, \Delta_k.$$
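The nested scheme above can be illustrated on a toy model in base R. This is only a sketch under stated assumptions, not the R-INLA algorithm: the model (Gaussian observations with unit variance, a single latent effect x with precision τ, and a N(0, 1) prior on θ = log τ) and all object names are hypothetical. In this toy case the Gaussian approximation in Equation (1.7) is exact, so the code only shows how the evaluation points θk and the weights ∆k enter the final sum.

```r
set.seed(1)
y <- rnorm(10, mean = 0.5, sd = 1)   # toy data (assumption)
n <- length(y)

# Log of the unnormalised pi(theta | y), using the ratio in Equation (1.7)
# evaluated at the conditional mode x*(theta); here theta = log(tau).
log_post_theta <- function(theta) {
  tau   <- exp(theta)
  prec  <- n + tau                 # precision of x | theta, y
  xmode <- sum(y) / prec           # mode (and mean) of x | theta, y
  sum(dnorm(y, xmode, 1, log = TRUE)) +              # log pi(y | x, theta)
    dnorm(xmode, 0, 1 / sqrt(tau), log = TRUE) +     # log pi(x | theta)
    dnorm(theta, 0, 1, log = TRUE) -                 # log pi(theta), hypothetical prior
    dnorm(xmode, xmode, 1 / sqrt(prec), log = TRUE)  # log pi_G(x | theta, y) at the mode
}

theta_k <- seq(-6, 6, length.out = 61)               # evaluation points
delta_k <- rep(diff(theta_k)[1], length(theta_k))    # equal area weights
lp  <- sapply(theta_k, log_post_theta)
w_k <- exp(lp - max(lp)) * delta_k
w_k <- w_k / sum(w_k)                # normalised pi(theta_k | y) * Delta_k

# Mixture approximation of pi(x | y) on a grid of x values
x_grid <- seq(-2, 3, length.out = 501)
dens_x <- sapply(x_grid, function(x) {
  tau <- exp(theta_k)
  sum(w_k * dnorm(x, sum(y) / (n + tau), 1 / sqrt(n + tau)))
})
post_mean <- sum(x_grid * dens_x) * diff(x_grid)[1]
```

The vector w_k plays the role of π̃(θk |y)∆k, and dens_x is the resulting finite mixture of conditional densities.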

1.4 The R-INLA package

The INLA methodology briefly described above is implemented in the open
source GMRFLib library written in C and Fortran (Martino and Rue, 2009).
An interface with the free statistical software R (R Core Team, 2016) is also
available, called R-INLA, allowing model specification and fitting within R.
The package can be downloaded and installed in R by typing

> install.packages("INLA", repos="https://www.math.ntnu.no/inla/R/stable")

for the stable version, or

> install.packages("INLA", repos="https://www.math.ntnu.no/inla/R/testing")

for the testing version. To upgrade the package to the latest version (type
inla.version() to find out the currently installed version), use either the
inla.upgrade(testing=TRUE) or inla.upgrade(testing=FALSE) commands.

Documentation for the package, many worked examples, and a discussion
forum are also available on the R-INLA website http://www.r-inla.org/.
As mentioned in the previous section, fixed effects, smooth and nonlinear
terms, seasonal effects and random effects can be included in a formula
argument as calls to the f() function. The interface is flexible enough to
allow for the specification of different latent models and prior distributions
for the hyperparameters (see Section 1.4.1 and Section 1.4.3 for details). We
run the INLA algorithm with a call to the inla() function

> inla(formula, family=<family>, data=<data>, ...)

where formula has been previously defined, family is a string indicating
the likelihood family (see footnote 1), and data is a data frame or list
containing all the variables included in the model. Many additional
arguments can be passed to the inla() function; see help(inla) for a
complete list.
The output of the function is an object of class inla, a list containing
all the results which can be explored with the names() function. By default,
marginal distributions for the latent field and for the hyperparameters are
computed. In addition, the marginal posterior distribution of the linear
predictor can be computed using the control.predictor=list(compute=TRUE)
argument. Other features, such as the integration strategy for π(θk |y)
("auto" (default), "ccd", "grid" or "eb") and the approximation strategy for
π(xi |θ, y) ("gaussian", "simplified.laplace" (default) or "laplace"), can
also be controlled within the control.inla=list(...) argument.
Many examples of regression models, area and point-level spatial and
spatio-temporal processes, as well as the corresponding R code for model
fitting in R-INLA can be found in Blangiardo and Cameletti (2015). The
following sections describe in detail some of the R-INLA (version
0.0-1480869339, dated 2016-12-04) model fitting tools used throughout this
dissertation.

1.4.1 Models for the latent Gaussian field

Many different latent models are implemented in the R-INLA package. The
list of all available models can be obtained by typing

> names(inla.models()$latent)
 [1] "linear"       "iid"          "mec"          "meb"
 [5] "rgeneric"     "rw1"          "rw2"          "crw2"
 [9] "seasonal"     "besag"        "besag2"       "bym"
[13] "bym2"         "besagproper"  "besagproper2" "fgn"
[17] "ar1"          "ar"           "ou"           "generic"
[21] "generic0"     "generic1"     "generic2"     "generic3"
[25] "spde"         "spde2"        "spde3"        "iid1d"
[29] "iid2d"        "iid3d"        "iid4d"        "iid5d"
[33] "2diid"        "z"            "rw2d"         "rw2diid"
[37] "slm"          "matern2d"     "copy"         "clinear"
[41] "sigm"         "revsigm"      "log1exp"      "logdist"

Footnote 1: Type names(inla.models()$likelihood) to obtain the list of
available likelihoods.

For each model, a detailed description and usage examples are provided in
http://www.r-inla.org/models/latent-models. Some of them are briefly
described in what follows, assuming that x = (x1 , . . . , xk )′ is a vector
of length k:

• The "iid" model defines an independent random noise (or exchangeable)
prior for x, that is,

$$x \sim N(0, \tau^{-1} I_k), \tag{1.8}$$

where τ is the precision parameter and Ik is the identity matrix of
dimension k × k. The full conditional distribution of Equation (1.8) is
given by

$$x_i \mid x_{-i} \sim N(0, \tau^{-1}).$$

This model is specified inside the f() function as

> f(x, model="iid", ..., hyper=list(prec=list(...)))

• The "besag" model defines an intrinsic CAR prior for x, that is,

$$x \sim N(0, [\tau R]^{-}), \tag{1.9}$$

where τ is the precision parameter, R is the k × k spatial neighborhood
matrix (see Section 1.2.1) and the superscript − denotes a generalized
inverse. The full conditional distribution of Equation (1.9) is given by

$$x_i \mid x_{-i} \sim N\left( \frac{1}{n_i} \sum_{j \sim i} x_j,\ \frac{1}{n_i \tau} \right),$$

where ni is the number of neighbors of area i, and j ∼ i indicates that
areas i and j are neighbors. This model is specified inside the f() function
as

> f(x, model="besag", graph=<graph>, ...,
+ hyper=list(prec=list(...)))

where the spatial neighborhood matrix R is passed to the program through
the graph argument (an inla.graph object, a symmetric matrix or a
filename containing the graph). The model automatically imposes the
sum-to-zero constraint $\sum_{i=1}^{k} x_i = 0$ (constr=TRUE).

• The "bym" model defines the BYM (or convolution) prior for x proposed
by Besag et al. (1991), that is,

$$x = u + v, \quad \text{with} \quad u \sim N(0, [\tau_u R]^{-}), \quad v \sim N(0, \tau_v^{-1} I_k), \tag{1.10}$$

where u = (u1 , . . . , uk )′ is the spatially structured component with
precision parameter τu (iCAR prior) and v = (v1 , . . . , vk )′ represents the
unstructured spatial component with precision parameter τv (iid prior). This
model is specified inside the f() function as

> f(x, model="bym", graph=<graph>, ...,
+ hyper=list(prec.spatial=list(...), prec.unstruct=list(...)))

Since each data point is represented by two random effects, only their sum
is identifiable. The "bym" model computes both the posterior distribution
of u + v (first k elements), and the posterior distribution of the spatially
structured effect u (elements from k + 1 to 2k). The model automatically
imposes the sum-to-zero constraint $\sum_{i=1}^{k} (u_i + v_i) = 0$ (constr=TRUE).

• The "rw1" model defines a first order random walk prior for x. It is
constructed assuming independent increments

$$\Delta x_i = x_i - x_{i+1} \sim N(0, \tau^{-1}), \quad \text{for } i = 1, \ldots, k-1, \tag{1.11}$$

where τ is the precision parameter. The full conditional distribution of
Equation (1.11) is given, for interior indices 1 < i < k, by

$$x_i \mid x_{-i} \sim N\left( \frac{x_{i-1} + x_{i+1}}{2},\ \frac{1}{2\tau} \right).$$

This model is specified inside the f() function as

> f(x, model="rw1", ..., hyper=list(prec=list(...)))

The model automatically imposes the sum-to-zero constraint
$\sum_{i=1}^{k} x_i = 0$ (constr=TRUE).
• The "rw2" model defines a second order random walk prior for x. It is
constructed assuming independent second-order increments

$$\Delta^2 x_i = x_i - 2x_{i+1} + x_{i+2} \sim N(0, \tau^{-1}), \quad \text{for } i = 1, \ldots, k-2, \tag{1.12}$$

where τ is the precision parameter. The full conditional distribution of
Equation (1.12) is given, for interior indices 2 < i < k − 1, by

$$x_i \mid x_{-i} \sim N\left( \frac{4(x_{i+1} + x_{i-1}) - (x_{i+2} + x_{i-2})}{6},\ \frac{1}{6\tau} \right).$$

This model is specified inside the f() function as

> f(x, model="rw2", ..., hyper=list(prec=list(...)))

The model automatically imposes the sum-to-zero constraint
$\sum_{i=1}^{k} x_i = 0$ (constr=TRUE).
• The "generic0" model defines a generic prior for x such that

$$x \sim N(0, [\tau C]^{-1}),$$

where τ is the precision parameter and C is a structure (symmetric) matrix
of dimension k × k defined by the user. This model is specified inside the
f() function as

> f(x, model="generic0", Cmatrix=<Cmat>, ...,
+ hyper=list(prec=list(...)))

where the structure matrix C is passed to the program through the Cmatrix
argument (a dense or sparse matrix).
• The "generic3" model defines a generic prior for x such that

$$x \sim N\left( 0, \left[ \sum_{i=1}^{m} \tau_i C_i \right]^{-1} \right),$$

where τi is the specific precision parameter of the structure matrix Ci ≥ 0
(of dimension k × k) defined by the user. This model is specified inside
the f() function as

> f(x, model="generic3", Cmatrix=<list.Cmat>, ...,
+ hyper=list(prec1=list(...),prec2=list(...),...))

where list.Cmat is a list of length m (maximum 10) with the Ci matrices.
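The structure matrices behind the "besag", "rw1" and "rw2" models described above can be built by hand in base R, which helps to check the full conditional distributions and the rank deficiencies involved. The following is a minimal sketch: the map of k = 6 areas arranged on a line and the helper rank_def() are hypothetical choices made only for illustration.

```r
# Hypothetical map: k areas on a line, 1-2-...-k (assumption for illustration)
k <- 6
W <- matrix(0, k, k)
W[cbind(1:(k - 1), 2:k)] <- 1
W <- W + t(W)                          # symmetric 0/1 adjacency matrix

# Spatial neighborhood matrix R of the iCAR prior:
# n_i on the diagonal, -1 for neighboring areas, 0 otherwise
R <- diag(rowSums(W)) - W

# RW1 and RW2 structure matrices built from difference operators
D1 <- diff(diag(k), differences = 1)   # (k-1) x k first differences
D2 <- diff(diag(k), differences = 2)   # (k-2) x k second differences
R_rw1 <- crossprod(D1)                 # t(D1) %*% D1
R_rw2 <- crossprod(D2)                 # t(D2) %*% D2

# Rank deficiency = number of (numerically) zero eigenvalues
rank_def <- function(M) sum(abs(eigen(M, symmetric = TRUE)$values) < 1e-8)

R_rw2[3, ]   # interior row: coefficients behind the RW2 full conditional
```

For this line-shaped map the RW1 structure matrix coincides with R. The interior row of R_rw2 has diagonal 6 and off-diagonal entries −4 and 1, which matches the conditional mean and the precision 6τ of the "rw2" full conditional.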

1.4.2 Implementing the lCAR prior

As already mentioned, the Leroux et al. (1999) CAR prior distribution (lCAR)
has been considered in this dissertation for the spatial random effect when
fitting spatio-temporal log-risk models. Recall that an lCAR prior
distribution for x = (x1 , . . . , xk )′ is given by

$$x \sim N(0, [\tau(\lambda R + (1 - \lambda) I_k)]^{-1}), \tag{1.13}$$

where τ is the precision parameter, λ is the spatial smoothing parameter,

R is the spatial neighborhood matrix of dimension k × k and Ik is an iden-
tity matrix. See Equation (1.3) for the expression of the full conditional
distribution.
This model was not originally available in R-INLA (see footnote 2), but
Ugarte et al. (2014) show how to build this prior distribution using the
"generic1" model. According to the documentation, this model defines the
following prior for x

$$x \sim N\left( 0, \left[ \tau \left( I_k - \frac{\beta}{\lambda_{\max}} C \right) \right]^{-1} \right), \tag{1.14}$$

where τ is the precision parameter, C is a structure (symmetric) matrix of
dimension k × k defined by the user, and λmax is the maximum eigenvalue of
C, which allows β to be in the range β ∈ [0, 1).
Let us define C = (cij ) as

$$C = I_k - R, \qquad c_{ij} = \begin{cases} -n_i + 1, & i = j \\ 1, & i \sim j \\ 0, & \text{otherwise} \end{cases} \tag{1.15}$$


where ni is the number of neighbors of the ith area. If λmax = 1, then the
covariance matrix defined in Equation (1.14) takes the following expression

$$\left[ \tau \left( I_k - \frac{\beta}{\lambda_{\max}} C \right) \right]^{-1} = [\tau (I_k - \beta(I_k - R))]^{-1} = [\tau (\beta R + (1 - \beta) I_k)]^{-1},$$

which corresponds with the parameterization of the covariance matrix of the
lCAR prior distribution described in Equation (1.13) with β = λ. Hence we
need to show that the maximum eigenvalue of C is equal to 1.

Footnote 2: At the present time, an experimental version of the model is
implemented under the name "besagproper2".

Proof 1.1: First note that C = (cij ) is an ML-matrix (see for example
Seneta, 1981, p.45), that is, a real matrix for which cij ≥ 0, i ≠ j. Let
us consider the non-negative matrix T = µIk + C, where µ = max{ni : i = 1, . . . , k}.
Since T and C share the same off-diagonal elements, C is irreducible if T
is irreducible. To show that T is irreducible we assume that in the graph
associated to the neighborhood matrix R (see Rue and Held, 2005, p.18),
there is a path from node i to node j, for all i, j (i.e. regions i and j
are connected). When the neighbourhood structure is defined by adjacency,
this condition holds provided there is no isolated region or group of
regions (for example islands). Let us suppose that T is not irreducible
(i.e. T is reducible). Hence, there exists a permutation matrix P (see Rao
and Rao, 1998, p.468) such that

$$P T P' = \begin{pmatrix} A & 0 \\ B & D \end{pmatrix}.$$

As T is a symmetric matrix, PTP′ is also symmetric and therefore B = 0.
Note that the off-diagonal elements of T are the same as those of C and
clearly

$$P C P' = \begin{pmatrix} E & F \\ G & H \end{pmatrix},$$

where F = 0 and G = 0. Hence, the columns of matrix E represent regions
not connected with regions represented by the columns of matrix H. This is
a contradiction since all the regions are assumed to be connected.
Consequently, T is an irreducible matrix and so is C.
Since C is a symmetric matrix, all its eigenvalues are real. As C is an
irreducible ML-matrix, it is a Perron matrix, and hence

$$\min_i \sum_j c_{ij} \leq \lambda_{\max} \leq \max_i \sum_j c_{ij},$$

where λmax is the maximum eigenvalue of C (see Seneta, 1981, p.52). Clearly
$\sum_j c_{ij} = 1$, ∀i = 1, . . . , k, and hence λmax = 1.

So, the lCAR prior can be specified inside the f() function as
> f(x, model="generic1", Cmatrix=<C.Leroux>, constr=TRUE, ...,
+ hyper=list(prec=list(...),beta=list(...)))
where C.Leroux is the structure matrix defined in Equation (1.15).
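The result of the proof can be checked numerically before passing the matrix to R-INLA. The sketch below uses base R only; the connected map with five areas and the values of tau and beta are hypothetical and chosen just for illustration.

```r
# Hypothetical connected map with 5 areas (adjacency chosen for illustration)
W <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(1, 3), c(2, 5))
W[edges] <- 1
W <- W + t(W)

R <- diag(rowSums(W)) - W      # spatial neighborhood matrix
C.Leroux <- diag(5) - R        # structure matrix of Equation (1.15)

# Maximum eigenvalue of C (expected to be 1 for a connected graph)
lambda.max <- max(eigen(C.Leroux, symmetric = TRUE)$values)

# Compare the "generic1" precision with the lCAR precision for one (tau, beta)
tau <- 2
beta <- 0.7
Q.generic1 <- tau * (diag(5) - (beta / lambda.max) * C.Leroux)
Q.lcar     <- tau * (beta * R + (1 - beta) * diag(5))
max(abs(Q.generic1 - Q.lcar))  # numerically zero
```

Every row of C.Leroux sums to one, which is the property used at the end of the proof.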
1.4.3 Prior distribution for the hyperparameters

As for the latent models, several prior distributions are implemented in
R-INLA for the hyperparameters θk . The list of all available priors can be
obtained by typing

> names(inla.models()$prior)
 [1] "normal"                 "gaussian"
 [3] "wishart1d"              "wishart2d"
 [5] "wishart3d"              "wishart4d"
 [7] "wishart5d"              "loggamma"
 [9] "minuslogsqrtruncnormal" "logtnormal"
[11] "logtgaussian"           "flat"
[13] "logflat"                "logiflat"
[15] "mvnorm"                 "pc.ar"
[17] "none"                   "invalid"
[19] "betacorrelation"        "logitbeta"
[21] "pc.prec"                "pc.dof"
[23] "pc.cor0"                "pc.cor1"
[25] "pc.fgnh"                "pc.spde.GA"
[27] "pc.matern"              "pc.range"
[29] "pc"                     "ref.ar"
[31] "jeffreystdf"            "expression:"
[33] "table:"

See the web page http://www.r-inla.org/models/priors for a detailed
description and examples of some of these priors. A novel approach using
penalised complexity priors (PC priors) is described in Simpson et al. (2014).
In all the models for latent Gaussian fields described in Section 1.4.1,
prior distributions for the precision parameters have to be specified. By
default, R-INLA assigns a log-Gamma distribution with parameters 1 and
5e-05 to the log-precision parameters, that is,

θ = log(τ ) ∼ logGamma(1, 5e-05).

Note that for the "bym" model, the vector of hyperparameters is represented
as θ = (log(τu ), log(τv )), where τu and τv are respectively the precision pa-
rameters of the spatially structured and unstructured components of the
model. For the lCAR model described in Section 1.4.2, the vector of
hyperparameters is represented as θ = (log(τ ), logit(β)), with default
hyperprior distribution

$$\text{logit}(\beta) = \log\left( \frac{\beta}{1 - \beta} \right) \sim N(0, 0.1).$$

In addition to the prior distributions already implemented in R-INLA,
the "expression" and "table" priors allow the user to define priors that
are not yet implemented. Instead of using the default hyperpriors given by
R-INLA, more suitable priors have been implemented when analyzing real
data examples in this dissertation. Specifically, an improper uniform prior
distribution on the positive real line for the standard deviation, i.e.,
σ = 1/√τ ∼ U (0, ∞), and a standard uniform distribution for the spatial
smoothing parameter, i.e., β ∼ U (0, 1), have been defined for the
hyperparameters of the random effects. However, INLA only allows the
log-density to be defined as an expression in θ1 = log(τ ) in the first case
and in θ2 = logit(β) in the second case. So, appropriate transformations are
necessary to obtain equivalent distributions in each case.
Note that the σ ∼ U (0, ∞) prior distribution can be translated to an
equivalent distribution on the log-precision, by making

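$$\pi(\theta_1) = \pi(\log(\tau)) = \pi(\sigma) \left| \frac{\partial \sigma}{\partial \log(\tau)} \right| \propto 1 \cdot \left| \frac{\partial \exp(-\log(\tau)/2)}{\partial \log(\tau)} \right| \propto \exp(-\log(\tau)/2),$$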

and it can be implemented in R-INLA as

> sdunif = "expression:
+ logdens = -log_precision/2;
+ return(logdens)"
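The change of variables behind this expression can be checked numerically in base R. The sketch below is only a sanity check under the stated transformation σ = exp(−θ1/2); the grid of θ1 values and the step h are arbitrary choices. It compares the log-Jacobian obtained by finite differences with the log-density −θ1/2 used in the prior, which should differ only by an additive constant.

```r
theta <- seq(-3, 3, by = 0.5)                  # arbitrary grid of log-precisions
sigma_of_theta <- function(th) exp(-th / 2)    # sigma = tau^(-1/2), theta = log(tau)

# |d sigma / d theta| by central finite differences
h <- 1e-6
jac <- abs((sigma_of_theta(theta + h) - sigma_of_theta(theta - h)) / (2 * h))

logdens_num    <- log(1 * jac)   # log of pi(sigma) * |Jacobian| with pi(sigma) = 1
logdens_sdunif <- -theta / 2     # log-density written in the expression prior

# The difference is constant in theta (equal to log(1/2)), so both
# log-densities define the same prior up to normalisation
diff_const <- logdens_num - logdens_sdunif
```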
Once the "sdunif" prior distribution has been defined, it can be included
inside the f() function as
> f(x, model=<model>, ..., hyper=list(prec=list(prior=sdunif)))

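In a similar way, accounting for θ2 = logit(β) = log(β/(1 − β)), the density
function of θ2 is expressed as

$$\pi(\theta_2) = \pi(\text{logit}(\beta)) = \pi(\beta) \left| \frac{\partial \beta}{\partial \theta_2} \right| = \pi(\beta) \cdot \frac{\exp(\theta_2)}{(1 + \exp(\theta_2))^2} = \pi(\beta) \cdot \beta(1 - \beta).$$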

To define the standard uniform distribution β ∼ U (0, 1) ≡ Beta(1, 1), the

log density of π(θ2 ) can be implemented in R-INLA as
> lunif = "expression:
+ a = 1;
+ b = 1;
+ beta = exp(theta)/(1+exp(theta));
+ logdens = lgamma(a+b)-lgamma(a)-lgamma(b)+(a-1)*log(beta)+(b-1)*log(1-beta);
+ log_jacobian = log(beta*(1-beta));
+ return(logdens+log_jacobian)"
Once the "lunif" prior distribution has been defined, it can be included
inside the f() function as
> f(x, model=<model>, ...,
+ hyper=list(beta=list(prior=lunif,initial=0)))
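The log-density of θ2 = logit(β) implied by a Beta(a, b) prior on β can also be written and checked in base R. This is a sketch, not R-INLA code: the function name log_dens_theta is hypothetical, and plogis() with log.p=TRUE is used to evaluate log(β) and log(1 − β) stably. With a = b = 1 the result is the standard logistic density.

```r
# Log-density of theta = logit(beta) when beta ~ Beta(a, b)
log_dens_theta <- function(theta, a = 1, b = 1) {
  log_beta   <- plogis(theta, log.p = TRUE)     # log(beta)
  log_1mbeta <- plogis(-theta, log.p = TRUE)    # log(1 - beta)
  logdens <- lgamma(a + b) - lgamma(a) - lgamma(b) +
    (a - 1) * log_beta + (b - 1) * log_1mbeta   # Beta(a, b) log-density
  log_jacobian <- log_beta + log_1mbeta         # log(beta * (1 - beta))
  logdens + log_jacobian
}

# The implied density of theta should integrate to (almost) one
total_unif <- integrate(function(th) exp(log_dens_theta(th)),
                        lower = -30, upper = 30)$value
total_beta <- integrate(function(th) exp(log_dens_theta(th, a = 2, b = 3)),
                        lower = -30, upper = 30)$value
```

The second call uses a hypothetical Beta(2, 3) prior only to exercise the terms that vanish when a = b = 1.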

1.4.4 Posterior distribution of linear combinations

Depending on the context, it might be necessary to compute the posterior
marginals for linear combinations of the elements in the latent field (‘fixed’
or ‘random’ effects) or for the linear predictor of the model. Details on how
to compute these linear combinations within R-INLA are described in what
follows.
For the first case, assume that our interest lies in computing the posterior
marginals of
w = Ax,
where x = (x1 , . . . , xk )′ is the latent field and A is a p × k matrix where
p is the number of linear combinations of the latent field. The functions
inla.make.lincomb() and inla.make.lincombs() can be used to define
such linear combinations. As remarked in Martins et al. (2013), two different
approaches are provided in R-INLA. The first approach creates an enlarged
latent field x̃ = (x, w) and then posterior marginals for x̃ are computed with
the INLA method using the Gaussian, simplified Laplace or full Laplace ap-
proximation strategies described in Section 1.3.2. However, the addition of
many linear combinations will lead to more dense precision matrices which
will consequently slow down the computations. The second approach does
not include w in the latent field, but post-processes the output given by
INLA and approximates the posterior marginals of w by a Gaussian
distribution where

$$E[w \mid \theta, y] = A\mu^* \quad \text{and} \quad \text{Var}[w \mid \theta, y] = A (Q^*)^{-1} A^T,$$

in which µ∗ is the mean of the marginal approximation π̃(xi |θ, y) and Q∗
is the precision matrix of the Gaussian approximation π̃G (x|θ, y) used in
Equation (1.7). This approach leads to a much faster approximation of the
posterior marginals for w and that is why this is the default method in
R-INLA. However, more accurate approximations can be obtained switching
to the first approach, if necessary, by including the following argument into
the inla function
> inla(formula, family=<family>, data=<data>, ...,
+ control.inla=list(lincomb.derived.only=FALSE))
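The post-processing step of the default approach reduces to two matrix products, which can be sketched in base R. The values of mu.star, Q.star and A below are hypothetical and only illustrate the formulas for E[w|θ, y] and Var[w|θ, y]; they are not output from R-INLA.

```r
# Hypothetical conditional mean and precision of a 3-dimensional latent field
mu.star <- c(0.2, -0.1, 0.4)
Q.star  <- matrix(c( 2, -1,  0,
                    -1,  2, -1,
                     0, -1,  2), nrow = 3, byrow = TRUE)

# Two linear combinations: x1 + x2 and x3 - x1
A <- rbind(c( 1, 1, 0),
           c(-1, 0, 1))

w.mean <- as.vector(A %*% mu.star)     # A mu*
w.var  <- A %*% solve(Q.star, t(A))    # A (Q*)^{-1} A^T, without forming the inverse
w.sd   <- sqrt(diag(w.var))
```

Using solve(Q.star, t(A)) instead of inverting Q.star explicitly is a design choice that matters when the precision matrix is large and sparse.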
Example 1.1: Suppose that in the spatio-temporal model of Equation (1.4)

we are interested in computing the posterior distribution of wt = φt + γt .
This linear combination can be specified in R-INLA as

> # Define the linear combinations

> lc <- inla.make.lincombs(ID.year1=diag(t),ID.year2=diag(t))
> names(lc) <- paste("temporal.lincomb.",seq(1,t),sep="")
>
> # Compute w_t using the default method
> inla(formula, family=<family>, data=<data>, ..., lincomb=lc)
>
> # Compute w_t using the more accurate method
> inla(formula, family=<family>, data=<data>, ..., lincomb=lc,
+ control.inla=list(lincomb.derived.only=FALSE))

where formula contains the specification of the latent Gaussian fields of the
model, t is the number of time periods, and ID.year1 and ID.year2 are
respectively the internal variable names of φt and γt .

In some cases, the data/response might depend on a linear combination

of the linear predictor described in Equation (1.5). Internally, R-INLA adds
another layer in the hierarchical model

ν ∗ = Aν,

where in this case the likelihood function is linked to the latent field through
ν ∗ instead of ν, i.e.,

$$\pi(y \mid x, \theta) = \prod_{i=1}^{N} \pi(y_i \mid \nu_i^*, \theta).$$

According to Martins et al. (2013), this feature is implemented by also adding

ν ∗ to the latent model, where the conditional distribution for ν ∗ has the mean
Aν and the precision matrix κA I where the constant κA is set to a high value.
In terms of output from inla, the vector (ν ∗ , ν) will be the linear predictor.

Example 1.2: Suppose that we want to fit the following model

$$y_i = \eta + \sum_{j=1}^{k} a_{ij} x_j, \quad \text{for } i = 1, \ldots, N,$$

where y = (y1 , . . . , yN )′ is the response vector, η is an intercept,
A = (aij ) is a design matrix of dimension N × k and x = (x1 , . . . , xk )′ is a
vector of unknown coefficients where an exchangeable prior is considered,
i.e., x ∼ N (0, τ −1 Ik ).

This model can be specified in R-INLA as

> # Define the formula argument
> formula <- y ~ -1 + intercept + f(x,model="iid",...)
>
> # Call to the inla() function
> eta <- rep(1,N)
> inla(formula, family=<family>, data=<data>, ...,
+ control.predictor=list(A=cbind(eta,<A.matrix>),...))

where A.matrix contains the coefficients of the linear combinations.

1.4.5 Linear constraints for the latent Gaussian fields

In the spatio-temporal disease mapping models described in Section 1.2.2
sum-to-zero constraints are considered to ensure model identifiability be-
tween the intercept, the main spatial and temporal effects and the space-time
interaction effect.
In R-INLA, a sum-to-zero constraint is imposed by default on all
intrinsic models (see Section 1.4.1) for the latent Gaussian field x, that
is, $\sum_{i=1}^{k} x_i = 0$. Including this constraint (specified inside the f()
function with the constr=TRUE argument) makes it possible to identify the
intercept and the main spatial/temporal effect. However, as stated in
Table 1.2, additional constraints must be imposed on the spatio-temporal
interaction term depending on its prior distribution. Using the extraconstr
argument, linear constraints of the form Ax = e can be specified, where the
number of rows of A is equal to the number of constraints to impose on x.

Example 1.3: Let us consider the spatio-temporal model of Equation (1.4)
where an RW1 prior is given to the temporal random effect γ = (γ1 , . . . , γT )′
and a completely structured prior (Type IV) is considered for the interaction
effect δ = (δ11 , . . . , δ1T , . . . , δn1 , . . . , δnT )′.

According to Table 1.2, the following n + T sum-to-zero constraints are
needed over the interaction term

$$\sum_{t=1}^{T} \delta_{it} = 0, \quad \text{for } i = 1, \ldots, n,$$

$$\sum_{i=1}^{n} \delta_{it} = 0, \quad \text{for } t = 1, \ldots, T.$$

Note that these constraints can be written in the form Aδ = 0 with

$$A = \begin{pmatrix} I_T \otimes 1_n' \\ 1_T' \otimes I_n \end{pmatrix}. \tag{1.16}$$

These constraints are specified in the f() function as

> f(ID.delta, model="generic0", Cmatrix=<Cmat>,
+ rankdef=<rankdef>, constr=TRUE, hyper=list(prec=...),
+ extraconstr=list(A=<A.constr>, e=rep(0,<n.constr>)))

where Cmat is the Kronecker product of the spatial and temporal structure
matrices, rankdef is its rank deficiency, A.constr is the (n + T ) × nT
matrix given in Equation (1.16) and n.constr is equal to the number of
constraints to be imposed.
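The constraint matrix of Equation (1.16) can be assembled with kronecker() in base R. The sketch below makes an explicit assumption about the ordering of the interaction vector: the area index runs fastest within each time period, which is the ordering under which the two blocks of Equation (1.16) give the sums over areas and over time periods. The dimensions and the toy interaction values are hypothetical.

```r
n.area <- 4   # number of areas (hypothetical)
n.time <- 3   # number of time periods (hypothetical)

# Assumed ordering: delta = (delta_11, ..., delta_n1, ..., delta_1T, ..., delta_nT)'
A.constr <- rbind(
  kronecker(diag(n.time), t(rep(1, n.area))),   # one row per period: sum over areas
  kronecker(t(rep(1, n.time)), diag(n.area))    # one row per area: sum over periods
)
n.constr <- nrow(A.constr)
e <- rep(0, n.constr)

# Toy interaction surface, double-centred so that all row and column sums are zero
delta.mat <- matrix(seq_len(n.area * n.time)^2, n.area, n.time)
delta.mat <- delta.mat - outer(rowMeans(delta.mat), rep(1, n.time)) -
  outer(rep(1, n.area), colMeans(delta.mat)) + mean(delta.mat)
delta <- as.vector(delta.mat)                   # column-major: areas run fastest

max(abs(A.constr %*% delta - e))                # numerically zero
qr(A.constr)$rank                               # n + T - 1 independent constraints
```

One of the n + T rows is linearly dependent on the others, since summing all the time-wise constraints gives the same equation as summing all the area-wise constraints.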

1.4.6 Model selection criteria

When the interest lies in comparing different models in terms of
performance, their deviance can be used. Given the data y with likelihood
π(y|x, θ), the Bayesian deviance is defined as

$$D(x, \theta) = -2 \log\{\pi(y \mid x, \theta)\} + 2 \log\{f(y)\}. \tag{1.17}$$

More specifically, for members of the exponential family with E[y] = µ(x, θ),
the saturated deviance obtained by setting f (y) = π(y|µ(x, θ) = y) is
used. The deviance of the model measures the variability linked to the
likelihood, which is the probabilistic structure used for the observations
conditional on the parameters. Typically, the posterior mean deviance
$\bar{D}(x, \theta)$ is used as a measure of fit, as it is very robust. However,
more complex models will fit the data better, and consequently lower values
of the mean deviance will be obtained. The Deviance Information Criterion
(DIC, Spiegelhalter et al., 2002) is the most commonly used measure of model
fit based on the deviance for Bayesian models. The DIC is computed as the
sum of the posterior mean deviance (a measure of goodness of fit) and the
number of effective parameters (a measure of model complexity), which is
defined as

$$p_D = \bar{D}(x, \theta) - D(\bar{x}, \bar{\theta}),$$

so that $\text{DIC} = \bar{D}(x, \theta) + p_D$. Analogously to the Akaike information
criterion (AIC), models with smaller DIC values provide a better trade-off
between model fit and complexity. The option control.compute=list(dic=TRUE)
inside the inla() function is used to compute the DIC. The details about
how these quantities are computed in R-INLA can be found in Rue et al.
(2009). It is important to note that INLA does not compute the saturated
deviance, i.e., the deviance of the saturated model is removed from
Equation (1.17).
The Watanabe-Akaike information criterion (WAIC, Watanabe, 2010)
can also be computed by including the control.compute=list(waic=TRUE)
option.
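The quantities entering the DIC can be illustrated with posterior samples from a toy model in base R. This is a sketch under stated assumptions and does not reproduce how R-INLA computes the DIC: the Poisson-Gamma model with a single relative risk r, the expected counts E, the prior parameters and the function dev.fun are all hypothetical. As in INLA, the saturated term of Equation (1.17) is left out.

```r
set.seed(123)
E <- c(12, 8, 20, 15, 9, 30)               # hypothetical expected counts
y <- rpois(length(E), lambda = 1.2 * E)    # toy observed counts

# Conjugate model: y_i ~ Poisson(E_i * r), r ~ Gamma(a, b),
# so that r | y ~ Gamma(a + sum(y), b + sum(E))
a <- 1
b <- 1
r.samples <- rgamma(20000, shape = a + sum(y), rate = b + sum(E))

# Deviance without the saturated term: -2 * log-likelihood
dev.fun <- function(r) -2 * sum(dpois(y, lambda = E * r, log = TRUE))

D.bar <- mean(sapply(r.samples, dev.fun))  # posterior mean deviance
D.hat <- dev.fun(mean(r.samples))          # deviance at the posterior mean
p.D   <- D.bar - D.hat                     # effective number of parameters
DIC   <- D.bar + p.D
```

Since the toy model has a single parameter, p.D is expected to be close to one.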

1.4.7 Model calibration

Given the set of spatio-temporal observations y = (y11 , . . . , ynT )′, two types
of “leave-one-out” measures are defined to assess the calibration of the model:

(i) The conditional predictive ordinate (Pettit, 1990)

$$\text{CPO}_{it} = \Pr(Y_{it} = y_{it} \mid y_{-it}),$$

that is, the cross-validated predictive probability mass at the observed
count yit .

(ii) The probability integral transform (Dawid, 1984)

$$\text{PIT}_{it} = \Pr(Y_{it} \leq y_{it} \mid y_{-it}),$$

that is, the cross-validated predictive cumulative distribution at the
observed count yit .

As described by Rue et al. (2009), these quantities are computed in R-INLA
without re-running the model by including the argument
control.compute=list(cpo=TRUE) in the inla() function. Their accuracy in
comparison with quantities that are obtained by McMC methods is discussed
in Held et al. (2010). As noted in that paper, the approximation of the
predictive measures might fail if the approximation of the latent field is
not accurate enough. This is due to an insufficient exploration of the tail
properties of the densities involved. Hence, the full Laplace approximation
might be required to get reliable results. It is also possible to increase
the accuracy of the estimation for the tails of the marginal distributions
by adding the option control.inla=list(strategy="laplace", npoints=<h>) to
use more evaluation points than the default npoints=9.
Using CPO values (whose extreme values indicate surprising observations),
the Logarithmic Score (Gneiting and Raftery, 2007) can be computed as

$$LS = -\sum_{i=1}^{n} \sum_{t=1}^{T} \log(\text{CPO}_{it}), \tag{1.18}$$

which is asymptotically equivalent to the Akaike information criterion if the
observations are independent (Stone, 1977). The smaller the resulting score,
the better the predictive power of the model.
If y comes from a continuous distribution and the model is well calibrated,
the PIT values have a standard uniform distribution. So the histogram of the
computed PIT values can be used as a diagnostic tool. U-shaped histograms
indicate underdispersed predictive distributions, while hump or inverse
U-shaped histograms point at overdispersion. However, in the case of count
data the predictive distribution is discrete and the PITs are no longer
uniform under the hypothesis of an ideal forecast. In this case, the
adjusted version of the PIT suggested by Czado et al. (2009) can be used
instead.
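The leave-one-out measures and the Logarithmic Score can be computed exactly in a toy conjugate model, which gives a concrete picture of what the values returned under control.compute=list(cpo=TRUE) represent. The sketch below uses base R only; the Poisson-Gamma model, the expected counts E and the prior parameters a and b are hypothetical, and the code does not reproduce the approximation used by R-INLA.

```r
set.seed(42)
E <- c(12, 8, 20, 15, 9, 30, 11, 25)       # hypothetical expected counts
y <- rpois(length(E), lambda = 1.1 * E)    # toy observed counts
a <- 1                                     # hypothetical Gamma(a, b) prior
b <- 1                                     # on the relative risk r

loo <- t(sapply(seq_along(y), function(i) {
  shape <- a + sum(y[-i])                  # posterior of r given y_{-i}
  rate  <- b + sum(E[-i])
  # The predictive distribution of y_i given y_{-i} is negative binomial
  prob <- rate / (rate + E[i])
  c(cpo = dnbinom(y[i], size = shape, prob = prob),   # Pr(Y_i  = y_i | y_{-i})
    pit = pnbinom(y[i], size = shape, prob = prob))   # Pr(Y_i <= y_i | y_{-i})
}))

LS <- -sum(log(loo[, "cpo"]))              # Logarithmic Score, Equation (1.18)
```

A histogram of loo[, "pit"] would be the unadjusted PIT diagnostic, which for count data is subject to the discreteness issue discussed above.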

1.5 Illustration
In this section, Spanish male brain cancer mortality data during the period
1986-2013 will be used to fit the spatio-temporal CAR models described
in Section 1.2 with INLA. Since the computational costs are substantially
reduced in comparison to McMC methods, a battery of models can be fitted
and compared in a reasonable time. The model selection criteria described
in Section 1.4.6 and Section 1.4.7 will be used to select the best model. The
R-INLA code to fit these models is also included.

1.5.1 Brain cancer mortality data

Brain cancer mortality represented 2.4% of all male cancer deaths in Spain
in 2011. Mortality is slightly higher among men than among women and
has increased over the last 20 years. In 2014, the European population
adjusted mortality rate was 5.48 per 100000 inhabitants, and the average age
at death was 64.05 years. Differences in brain cancer mortality risk among
Spanish provinces are known to exist, with Navarra and the Basque
provinces among those with a significantly high relative risk (Ugarte et al.,
2010, 2014).
Brain cancer mortality data (International Classification of Disease-10:
code C71) registered during the period 1986-2010 in each of the 50 Spanish
provinces (excluding Ceuta and Melilla) have been obtained from the Span-
ish National Epidemiology Center. From a total of 50450 deaths recorded
throughout the studied period 28426 correspond to males and 22024 to fe-
males. The number of observed cases for males varies from 0 to 185, and
indirect age-standardization has been used to compute the number of ex-
pected deaths for each province and year.
The parametric and non-parametric models described in Section 1.2 have
been fitted to the real data. The DIC, WAIC and the Logarithmic Score have
been used as model selection criteria. In addition, a corrected version of the
DIC proposed by Plummer (2008) has been included (DICc), because it has
been shown that DIC values may under-penalize complex models in disease
mapping.
Improper uniform prior distributions are given to the standard deviations
in the model and a vague zero mean normal distribution with precision close
to zero is considered for the intercept (η). Finally, a Uniform(0, 1) distribu-
tion has been used for the spatial smoothing parameter of the lCAR prior.
The results for all the fitted models using the simplified Laplace approxima-
tion are shown in Table 1.3.
Parametric models exhibit low values of the effective number of parameters
but the highest values of posterior deviance, leading to the largest DIC
values. Additive models are also discarded because of their large values in all
the model selection criteria. In general, models with an RW1 prior
distribution for the structured temporal component show a better fit than
those with an RW2 prior. The model without unstructured temporal component
and completely structured (Type IV) interaction term is the best model in
terms of a trade-off between model fit and complexity (smallest DIC, DICc
and WAIC values), and also the best in terms of predictive power (lowest
Logarithmic Score value). To obtain more accurate posterior distributions,
the model has been fitted again using the full Laplace approximation.
The estimated log-relative risks can be split up into different components.
An overall global risk (given by η̂); a risk related to the spatial location (ξ̂)
that can be attributed to factors associated to a particular region; a tem-
poral risk trend common to all areas (γ̂) that can be attributed to changes
in coding the disease, diagnostics or policies affecting the whole country;
and an area specific temporal risk trend (δ̂) that reflects particular effects
32 Spatio-temporal disease mapping

Table 1.3: Posterior mean of the deviance (D̄), number of effective parameters
(pD), model selection criteria (DIC, DICc, WAIC and Logarithmic Score, LS)
and computational time (in seconds) from fitted models in the analysis of
male brain cancer mortality data in Spain.

Parametric models: log(rit) = η + ξi + (β + ϕi) · t

Model  Prior for ϕ                 D̄      pD     DIC    DICc    WAIC      LS  time
M1     iid                     7223.7    50.9  7274.6  7277.5  7288.0  3644.2     6
M2     iCAR                    7217.4    48.5  7265.8  7268.4  7278.9  3639.6     5

Nonparametric models: log(rit) = η + ξi + γt + φt + δit

Rγ ≡ RW1 structure matrix

Model  Space-time interaction      D̄      pD     DIC    DICc    WAIC      LS  time
M3     Additive model          7183.3    52.1  7235.5  7238.4  7249.9  3625.2     7
M4     Type I                  6965.1   221.2  7186.3  7277.0  7194.8  3611.9    10
M5     Type II                 7032.6   136.3  7168.9  7192.7  7188.5  3597.8   190
M6     Type III                6968.2   187.5  7155.8  7222.1  7162.0  3591.3    65
M7     Type IV                 7022.4   123.2  7145.6  7166.1  7162.0  3584.0   452

Rγ ≡ RW2 structure matrix

Model  Space-time interaction      D̄      pD     DIC    DICc    WAIC      LS  time
M8     Additive model          7185.0    50.7  7235.6  7238.3  7249.5  3625.0     6
M9     Type I                  6964.8   221.0  7185.8  7276.4  7194.3  3611.6    11
M10    Type II                 7068.4   135.5  7203.9  7227.3  7227.9  3617.9   189
M11    Type III                6969.2   186.4  7155.7  7221.1  7162.2  3591.3    63
M12    Type IV                 7058.4   131.7  7190.1  7212.3  7212.1  3609.8   468

Nonparametric models: log(rit) = η + ξi + γt + δit

Rγ ≡ RW1 structure matrix

Model  Space-time interaction      D̄      pD     DIC    DICc    WAIC      LS  time
M13    Additive model          7185.0    50.1  7235.1  7237.7  7248.8  3624.6     4
M14    Type II                 7033.8   134.7  7168.5  7191.4  7188.1  3597.4   170
M15    Type IV                 7023.8   121.4  7145.1  7164.9  7161.7  3583.6   394

Rγ ≡ RW2 structure matrix

Model  Space-time interaction      D̄      pD     DIC    DICc    WAIC      LS  time
M16    Additive model          7199.0    42.8  7241.8  7243.5  7253.3  3626.8     4
M17    Type II                 7072.8   132.7  7205.5  7228.0  7229.9  3618.7   182
M18    Type IV                 7072.7   123.3  7196.0  7215.5  7218.8  3612.7   432

of each province. Figure 1.1a shows the spatial mortality risk (ζi = exp(ξi))
associated with each region, which is constant over the period, while
Figure 1.1b displays the posterior probability that the spatial risk is greater
than one. Probabilities above 0.9 (below 0.1) point toward high (low) risk
regions (see Richardson et al., 2004; Ugarte et al., 2009a,b for some discussion
about reference thresholds in relative risks and cut-off probabilities).
According to these maps, a high risk is associated with some northern provinces
of Spain. The temporal risk trend exp(γt) common to all regions is represented
in Figure 1.1c, together with its 95% credible interval. A clear increasing
trend is observed during the study period.
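As a minimal sketch of how the 0.9 and 0.1 cut-off probabilities translate into a classification of areas, the snippet below uses a made-up vector of posterior probabilities. The area labels, the probabilities and the "unclear" category are hypothetical and serve only as illustration.

```r
# Hypothetical posterior probabilities P(zeta_i > 1 | O) for five areas
prob <- c(A = 0.97, B = 0.55, C = 0.04, D = 0.91, E = 0.12)

# Above 0.9: high-risk area; below 0.1: low-risk area; otherwise no clear signal
risk.class <- ifelse(prob > 0.9, "high",
              ifelse(prob < 0.1, "low", "unclear"))
print(risk.class)
```

Areas A and D would be flagged as high risk and area C as low risk, while B and E remain unclassified under these thresholds.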

Figure 1.1: Spatial and temporal effects of male brain cancer mortality
relative risks in Spanish provinces. (a) Map of posterior means of the spatial
pattern of mortality risks ζi = exp(ξi). (b) Map of spatial posterior
probabilities P(ζi > 1|O). (c) Common temporal trend exp(γt) of brain cancer
mortality relative risks, by year (1986-2010).

Finally, Figure 1.2 shows the spatio-temporal evolution of male brain cancer
mortality risks for each province (compared with the whole of Spain) during
the study period 1986-2010, and the posterior probabilities that the relative
risks are greater than one. The risk scale was originally constructed on the
logarithmic scale, so that excesses and deficits of risk of the same magnitude
with respect to Spain are represented symmetrically. It was then
back-transformed to make the maps easier to read and interpret. For example,
1.67 means a 67% excess risk with respect to Spain in the study period, and
0.60 (1/1.67) means a risk deficit of the same magnitude on the logarithmic
scale. Combining the information provided by both maps, an increase in risk is
observed as the maps get darker over the years. A group of provinces in the
north and central-east of Spain exhibits high risk.
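The construction of such a symmetric risk scale can be sketched as follows. The grid of log-risk values (from -0.5 to 0.5 in steps of 0.125) is an assumption made for illustration; only the 1.67 and 0.60 example comes from the text.

```r
# Assumed grid: evenly spaced on the log scale and symmetric around 0
log.breaks <- seq(-0.5, 0.5, by = 0.125)

# Back-transform to the relative risk scale used in the map legend
breaks <- exp(log.breaks)
print(round(breaks, 2))

# Symmetry on the log scale: each break times its mirror image equals 1
sym <- breaks * rev(breaks)

# The example in the text: the mirror image of a 1.67 excess is about 0.60
deficit <- round(1 / 1.67, 2)
print(deficit)
```

On the back-transformed scale the breaks are not equally spaced, but each value above one has a reciprocal counterpart below one.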

1.5.2 R code for model fitting in INLA

The R code to fit the 18 models displayed in Table 1.3 with INLA is detailed
below. First, the data frame (or list) containing the variables of the model
has to be defined:
> Data <- data.frame(O=<observed>, E=<expected>, ID.area=rep(1:n,each=t),
+ ID.area1=rep(1:n,each=t), year=rep(1:t,n),
+ ID.year=rep(1:t,n), ID.year1=rep(1:t,n),
+ ID.area.year=seq(1,n*t))

where observed and expected are, respectively, the vectors of observed and
expected deaths, and n and t are the number of areas and time periods for
which data are available (n=50 provinces and t=25 years for the brain cancer
mortality data). Note that the data must be ordered according to the Kronecker
product given for the structure matrix of the space-time interaction random
effect δ defined in Table 1.1. For details about how to introduce the data
or how the IDs must be specified in INLA, see the examples and tutorials
provided at http://www.r-inla.org/examples.
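The ordering requirement can be illustrated on a toy example. The sketch below uses assumed dimensions (3 areas, 2 time periods) and names the number of periods nt rather than t, simply to avoid masking the base function t(). It shows that, with the IDs built as above, the year index runs fastest within each area, which is the ordering implied by a Kronecker product whose first factor is spatial and whose second factor is temporal.

```r
n  <- 3   # toy number of areas (assumption)
nt <- 2   # toy number of time periods (assumption)

# Same construction as in the data frame above, without the data columns
ids <- data.frame(ID.area      = rep(1:n, each = nt),
                  ID.year      = rep(1:nt, n),
                  ID.area.year = seq(1, n * nt))
print(ids)

# Position of the pair (area i, year j) in a vector ordered area by area,
# with years running fastest: (i - 1) * nt + j
pos <- (ids$ID.area - 1) * nt + ids$ID.year
```

Here pos coincides with ID.area.year, so the k-th row of the data corresponds to the k-th row and column of a structure matrix built as kronecker(spatial matrix, temporal matrix).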
Then, we define the spatial neighborhood matrix Rξ and the structure
matrix to implement the lCAR prior distribution using the "generic1"
model (see Section 1.4.2) as
> g <- inla.read.graph("prov_nb.inla")
> R.xi <- matrix(0, g$n, g$n)
> for (i in 1:g$n){
+ R.xi[i,i]=g$nnbs[[i]]
+ R.xi[i,g$nbs[[i]]]=-1
+ }
> R.Leroux <- diag(n)-R.xi

Figure 1.2: Posterior mean distribution of male brain cancer mortality risks
(top) and P(r̂it > 1|O) posterior probability maps (bottom). Maps are shown
for the years 1986, 1991, 1996, 2000, 2005 and 2010.

where "prov_nb.inla" is the file containing the neighboring structure of the
Spanish provinces, which inla.read.graph() reads into an inla.graph object.
Similarly, we define the temporal structure matrix Rγ of a random walk of
first or second order as

> D1 <- diff(diag(t),differences=1)
> R.gammaRW1 <- t(D1)%*%D1
> D2 <- diff(diag(t),differences=2)
> R.gammaRW2 <- t(D2)%*%D2
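The properties of these structure matrices can be checked without INLA on a small example. The sketch below replaces the graph read from "prov_nb.inla" with a hypothetical neighborhood of four areas on a line, and uses nt for the number of time periods. The largest eigenvalue of R.Leroux is relevant because the "generic1" model, as documented in R-INLA, scales Cmatrix by its largest eigenvalue.

```r
# Toy neighborhood: 4 areas on a line, 1-2-3-4 (hypothetical)
nbs <- list(2, c(1, 3), c(2, 4), 3)
n <- length(nbs)

R.xi <- matrix(0, n, n)
for (i in 1:n) {
  R.xi[i, i] <- length(nbs[[i]])  # number of neighbors on the diagonal
  R.xi[i, nbs[[i]]] <- -1         # -1 for each neighboring area
}
R.Leroux <- diag(n) - R.xi

nt <- 6  # toy number of time periods (assumption)
D1 <- diff(diag(nt), differences = 1)
D2 <- diff(diag(nt), differences = 2)
R.gammaRW1 <- crossprod(D1)  # same as t(D1) %*% D1
R.gammaRW2 <- crossprod(D2)  # same as t(D2) %*% D2

# Rank deficiency = number of numerically zero eigenvalues
rank.def <- function(R) sum(abs(eigen(R, symmetric = TRUE)$values) < 1e-8)

row.sums   <- rowSums(R.xi)                                   # all zero
lambda.max <- max(eigen(R.Leroux, symmetric = TRUE)$values)   # 1 up to rounding
def.rw1    <- rank.def(R.gammaRW1)                            # 1
def.rw2    <- rank.def(R.gammaRW2)                            # 2
```

The rows of R.xi sum to zero, the RW1 and RW2 structure matrices have rank deficiency 1 and 2 respectively, and the largest eigenvalue of diag(n) - R.xi is one.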

The formula objects for the models presented in Table 1.3 are defined as follows.

M1 model
> f.M1 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.area1, year, model="iid",
+ hyper=list(prec=list(prior=sdunif))) +
+ I(year-mean(year))

M2 model
> f.M2 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.area1, year, model="besag", graph="prov_nb.inla",
+ hyper=list(prec=list(prior=sdunif))) +
+ I(year-mean(year))

M3 model
> f.M3 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw1", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif)))

M4 model
> f.M4 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw1", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.area.year, model="iid", constr=TRUE,
+ hyper=list(prec=list(prior=sdunif)))

M5 model
> R <- kronecker(diag(n),R.gammaRW1)
> r.def <- n
> A.constr <- kronecker(diag(n),matrix(1,1,t))
>
> f.M5 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw1", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.area.year, model="generic0", Cmatrix=R, rankdef=r.def,
+ constr=TRUE, hyper=list(prec=list(prior=sdunif)),
+ extraconstr=list(A=A.constr, e=rep(0,n)))

M6 model
> R <- kronecker(R.xi,diag(t))
> r.def <- t
> A.constr <- kronecker(matrix(1,1,n),diag(t))
>
> f.M6 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw1", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.area.year, model="generic0", Cmatrix=R, rankdef=r.def,
+ constr=TRUE, hyper=list(prec=list(prior=sdunif)),
+ extraconstr=list(A=A.constr, e=rep(0,t)))

M7 model
> R <- kronecker(R.xi,R.gammaRW1)
> r.def <- n+t-1
> A1 <- kronecker(diag(n),matrix(1,1,t))
> A2 <- kronecker(matrix(1,1,n),diag(t))
> A.constr <- rbind(A1,A2)
>
> f.M7 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw1", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.area.year, model="generic0", Cmatrix=R, rankdef=r.def,
+ constr=TRUE, hyper=list(prec=list(prior=sdunif)),
+ extraconstr=list(A=A.constr, e=rep(0,n+t)))

M8 model
> f.M8 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw2", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif)))

M9 model
> f.M9 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw2", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.area.year, model="iid", constr=TRUE,
+ hyper=list(prec=list(prior=sdunif)),
+ extraconstr=list(A=matrix(rep(1:t,n),1,n*t),e=0))

M10 model
> R <- kronecker(diag(n),R.gammaRW2)
> r.def <- 2*n
> A.constr <- kronecker(diag(n),matrix(1,1,t))
>
> f.M10 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw2", hyper=list(prec=list(prior=sdunif)),
+ extraconstr=list(A=matrix(1:t,1,t),e=0)) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.area.year, model="generic0", Cmatrix=R, rankdef=r.def,
+ constr=TRUE, hyper=list(prec=list(prior=sdunif)),
+ extraconstr=list(A=A.constr, e=rep(0,n)))

M11 model
> R <- kronecker(R.xi,diag(t))
> r.def <- t
> A.constr <- kronecker(matrix(1,1,n),diag(t))
>
> f.M11 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw2", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.area.year, model="generic0", Cmatrix=R, rankdef=r.def,
+ constr=TRUE, hyper=list(prec=list(prior=sdunif)),
+ extraconstr=list(A=A.constr, e=rep(0,t)))

M12 model
> R <- kronecker(R.xi,R.gammaRW2)
> r.def <- 2*n+t-2
> A1 <- kronecker(diag(n),matrix(1,1,t))
> A2 <- kronecker(matrix(1,1,n),diag(t))
> A.constr <- rbind(A1,A2)
>
> f.M12 <- O ~ f(ID.area, model="generic1", Cmatrix=R.Leroux, constr=TRUE,
+ hyper=list(prec=list(prior=sdunif),beta=list(prior=lunif))) +
+ f(ID.year, model="rw2", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.year1, model="iid", hyper=list(prec=list(prior=sdunif))) +
+ f(ID.area.year, model="generic0", Cmatrix=R, rankdef=r.def,
+ constr=TRUE, hyper=list(prec=list(prior=sdunif)),
+ extraconstr=list(A=A.constr, e=rep(0,n+t)))
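The values assigned to r.def in the code above can be verified numerically on a small example. The sketch below assumes a toy spatial structure matrix (four areas on a line) and six time periods, named nt here; it counts the zero eigenvalues of each Kronecker product and compares them with the expressions used for rankdef.

```r
n <- 4; nt <- 6  # toy dimensions (assumptions)

# Toy spatial structure matrix: 4 areas on a line
R.xi <- matrix(c( 1, -1,  0,  0,
                 -1,  2, -1,  0,
                  0, -1,  2, -1,
                  0,  0, -1,  1), n, n, byrow = TRUE)
R.rw1 <- crossprod(diff(diag(nt), differences = 1))
R.rw2 <- crossprod(diff(diag(nt), differences = 2))

# Rank deficiency = number of numerically zero eigenvalues
rank.def <- function(R) sum(abs(eigen(R, symmetric = TRUE)$values) < 1e-8)

def <- c(TypeII.RW1 = rank.def(kronecker(diag(n), R.rw1)),  # n
         TypeIII    = rank.def(kronecker(R.xi, diag(nt))),  # t
         TypeIV.RW1 = rank.def(kronecker(R.xi, R.rw1)),     # n + t - 1
         TypeII.RW2 = rank.def(kronecker(diag(n), R.rw2)),  # 2n
         TypeIV.RW2 = rank.def(kronecker(R.xi, R.rw2)))     # 2n + t - 2
print(def)
```

With n=4 and six time periods, the computed deficiencies match the formulas n, t, n+t-1, 2n and 2n+t-2 used in models M5 to M7 and M10 to M12.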

As described in Section 1.4.5, the linear constraints that make each model
identifiable (see Table 1.2) are specified through the constr=TRUE (a
sum-to-zero constraint over the random effect) and extraconstr arguments. Note

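that in the case of completely structured (Type IV) interaction term

\[
\sum_{t=1}^{T} \delta_{it} = 0, \quad \text{for } i = 1, \ldots, n
\iff (I_n \otimes \mathbf{1}_T')\,\delta = 0,
\]

and

\[
\sum_{i=1}^{n} \delta_{it} = 0, \quad \text{for } t = 1, \ldots, T
\iff (\mathbf{1}_n' \otimes I_T)\,\delta = 0.
\]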
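The equivalence between the sum-to-zero constraints and their Kronecker form can be checked on a toy vector. The sketch below assumes 3 areas and 4 time periods (named nt) and a simulated δ ordered area by area with the year index running fastest, as in the data frame; none of these values come from the original analysis.

```r
n <- 3; nt <- 4  # toy dimensions (assumptions)
set.seed(1)
delta <- rnorm(n * nt)  # toy interaction vector, year index fastest

# Same values arranged with one row per area and one column per year
delta.mat <- matrix(delta, nrow = n, ncol = nt, byrow = TRUE)

A1 <- kronecker(diag(n), matrix(1, 1, nt))  # n x (n*nt): sums over time
A2 <- kronecker(matrix(1, 1, n), diag(nt))  # nt x (n*nt): sums over areas

sum.by.area <- drop(A1 %*% delta)  # one sum per area
sum.by.year <- drop(A2 %*% delta)  # one sum per year

# Rank of the stacked constraint matrix, via its non-zero singular values
A.rank <- sum(svd(rbind(A1, A2))$d > 1e-8)
```

The products reproduce the row and column sums of the area-by-year matrix. The n + t stacked constraints have rank n + t - 1, because both blocks add up to the same overall sum; this matches the rank deficiency of the Type IV structure matrix with an RW1 temporal component.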

To define the formula object for the nonparametric models without the
unstructured temporal component φt (models M13-M18), the f(ID.year1,...) term
has to be removed. For details about the implementation of the
hyperprior distributions in R-INLA, see Section 1.4.3.
Finally, we run the INLA algorithm with a call to the inla() function as

> inla(<formula>, family="poisson", data=Data, E=E,

+ control.predictor=list(compute=TRUE, cdf=c(log(1))),
+ control.compute=list(dic=TRUE, cpo=TRUE, waic=TRUE),
+ control.inla=list(strategy="simplified.laplace"))

In addition to the marginal posterior distribution of the linear predictor
log(rit), the posterior probabilities P(log(rit) > 0|O) are also computed by
including the control.predictor=list(compute=TRUE, cdf=c(log(1)))
argument.
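The relation between a cumulative probability evaluated at log(1) = 0 and the exceedance probability P(log(rit) > 0|O) can be sketched as follows. This assumes that the quantity reported for cdf=c(log(1)) is the lower-tail probability P(log(rit) ≤ 0|O), so that the exceedance probability is its complement. The Gaussian marginals and their means and standard deviations are hypothetical stand-ins; the marginals returned by INLA are generally not Gaussian.

```r
# Hypothetical Gaussian summaries of the posterior of log(r_it) for 3 cells
post.mean <- c(0.25, -0.10, 0.02)
post.sd   <- c(0.10,  0.08, 0.15)

# Lower-tail probability at log(1) = 0, i.e. P(log(r_it) <= 0 | O)
cdf.at.0 <- pnorm(log(1), mean = post.mean, sd = post.sd)

# Exceedance probability P(log(r_it) > 0 | O) = P(r_it > 1 | O)
prob.gt.1 <- 1 - cdf.at.0
print(round(prob.gt.1, 3))
```

Because the logarithm is monotone, the exceedance probability on the log scale equals P(rit > 1|O), which is the quantity mapped in Figure 1.2.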