Hierarchical Bayesian species distribution models with the hSDM R Package

May 6, 2019

Adansonia grandidieri Baill. next to Andavadoaka village (southwest Madagascar).

Ghislain Vieilledent*,1, Cory Merow2, Jérôme Guélat3, Andrew M. Latimer4, Marc Kéry3,
Alan E. Gelfand5, Adam M. Wilson6, Frédéric Mortier1 and John A. Silander Jr.2

[*] Corresponding author. E-mail: [email protected]. Phone: +33.(0)4.67.59.37.51.
Fax: +33.(0)4.67.59.39.09
[1] Cirad – UMR AMAP, F–34398 Montpellier, France
[2] University of Connecticut – Department of Ecology and Evolutionary Biology, Storrs, CT 06269,
USA
[3] Swiss Ornithological Institute – 6204 Sempach, Switzerland
[4] University of California – Department of Plant Sciences, Davis, CA 95616, USA
[5] Duke University – Department of Statistical Science, Durham, NC 27708, USA
[6] Yale University – Department of Ecology and Evolutionary Biology, New Haven, CT 06520, USA

Florebo quocumque ferar

“I will flower everywhere I am planted”

Abstract

Species distribution models (SDM) are useful tools to explain or predict species ranges from
various environmental factors. SDM are thus widely used in conservation biology. Because they
are based on observations of the species in the field (occurrence or abundance data), SDM face
two major problems which lead to bias in model results: imperfect detection and spatial
correlation of the observations.
At present, there is a lack of statistical tools to analyse large occurrence or abundance
data-sets (typically with tens of hundreds of observation points) while taking into account
both imperfect detection and spatial correlation.
Here, we present the hSDM R package, which aims at providing user-friendly statistical
functions to fill this gap. Functions were developed through a hierarchical Bayesian
approach. They call a Metropolis-within-Gibbs algorithm coded in C to estimate the model
parameters. Using compiled C code for the Gibbs sampler drastically reduces the computation
time.
By making these new statistical tools available to the scientific community, we hope to
democratize the use of more complex, but more realistic, statistical models for increasing
knowledge in ecology and conserving biodiversity.

Keywords: R, C code, site-occupancy models, CAR process, spatial autocorrelation,
biodiversity, SDM, niche modelling, detection probability, count data, presence-absence,
false absence, uncertainty, hierarchical Bayesian models, Metropolis, MCMC, Gibbs sampler

CHAPTER 1

Introduction

1.1 Species distribution models

Biogeography is the study of the distribution of species over space and time and biogeog-
raphers try to understand the factors determining a species distribution (Smith, 1868;
Wallace, 1876). A species distribution is often represented with a map (Wallace, 1876).
This knowledge on the ecology of the species can be used for several applications such as
conservation biology (Thuiller, 2014).

Species distribution modelling (alternatively known as “environmental niche modelling”,
“ecological niche modelling”, “predictive habitat distribution modelling”, and “climate
envelope modelling”) refers to the process of using computer algorithms to predict the
distribution of species in geographic space on the basis of a mathematical representation of
their known distribution in environmental space (i.e. the realized ecological niche). The
environment is in most cases represented by climate data (such as temperature and
precipitation), but other variables such as soil type and land cover can also be used. Species
distribution models (SDM) allow estimating the probability of presence or abundance of a
species over a large geographical range using a limited number of species observations (Elith
& Leathwick, 2009; Guisan & Zimmermann, 2000). Species observations can be occurrence
data (presence-absence data or presence-only data) or abundance data (also known as
count data).

1.2 Imperfect detection and spatial correlation of the
observations

When considering presence-absence or abundance data for species distribution modelling,
strong assumptions are usually made (Araujo & Guisan, 2006; Guisan & Thuiller, 2005;
Sinclair et al., 2010). Among these assumptions, two can lead to biased estimates of species
distribution. The first one deals with imperfect detection and the second one with spatial
correlation of the observations.

Regarding imperfect detection, the occurrence of a species is typically not observed
perfectly. Species traits, survey-specific conditions and site-specific characteristics may
influence species detection probability, which is often < 1 (Chen et al., 2013). Thus,
observations might include false absences. For example, the habitat can be suitable and the
species present, but individuals have not been seen during the census. Or the habitat can be
suitable but the species has not yet dispersed to the site (typical example for plant species,
see Latimer et al. (2006)) or was not present on the site at the moment of the observation
(typical example for animal species such as birds, see Kéry et al. (2005)). Treating observed
occurrence and distribution as the true occurrence and distribution, without correcting for
imperfect detection, may lead to problems in species distribution studies, habitat models
and biodiversity management (Kéry & Schmidt, 2008; Lahoz-Monfort et al., 2014; Latimer
et al., 2006).
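
To make the effect of imperfect detection concrete, the following sketch computes the chance that a species present at a site is missed at every one of K independent visits, which is how a false absence arises. The per-visit detection probability p and the numbers of visits are illustrative assumptions, not values taken from the studies cited above.

```r
# Illustrative per-visit detection probability (assumption)
p <- 0.3
# Number of independent visits to a site where the species is present
K <- 1:5
# Probability of a false absence: the species is missed at every visit
p.false.absence <- (1 - p)^K
# Probability of detecting the species at least once
p.detected <- 1 - p.false.absence
round(data.frame(K, p.false.absence, p.detected), 3)
```

Under these assumptions, the probability of a false absence decreases geometrically with the number of visits, which is why repeated visits carry information about detectability.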

Regarding spatial correlation, most species present geographical patchiness (positive
spatial autocorrelation). This pattern is often driven by multiple causes that may be
associated with exogenous environmental factors such as climate or soil (which might be partly
taken into account in species distribution models), but also with endogenous biotic
processes, called contagious processes, such as dispersal, migration, conspecific attraction
or mortality, which are rarely considered (Dormann et al., 2007; Legendre, 1993; Lichstein
et al., 2002; Sokal & Oden, 1978). Due to the contagious biotic processes, the presence or
abundance of a species at one site is influenced by the presence or abundance of the species
at surrounding sites. A species might be present at a site where the environment is less
suitable because of the presence of the species at neighbouring sites where the environment
is highly suitable. Thus, ignoring spatial correlation may lead to biased conclusions about
ecological relationships (Lichstein et al., 2002) and even invert the slope of relationships
from non-spatial analysis in some particular cases (Kühn et al., 2006). In addition to
its ecological significance, spatial autocorrelation is problematic for classical species
distribution models which assume independently distributed errors (Dormann et al., 2007;
Legendre, 1993; Lichstein et al., 2002).

1.3 Methods and software to account for imperfect
detection and spatial correlation
New classes of models, called site-occupancy models (MacKenzie et al., 2002) or zero
inflated binomial (ZIB) models (Latimer et al., 2006) for presence-absence data and N-
mixture models (Royle, 2004) or zero inflated Poisson (ZIP) models for abundance data
(Flores et al., 2009), were developed to solve the problems created by imperfect detection.
These models combine two processes, an ecological process which describes habitat suit-
ability and an observation process which takes into account imperfect detection. Because
they mix probability distributions to represent the suitability and observation processes,
these models have also been called mixture models. Mixture models use information from
repeated observations at several sites to estimate detectability. Detectability may vary
with site characteristics (e.g., habitat variables) or survey characteristics (e.g., weather
conditions), whereas suitability relates only to site characteristics.
One additional point regarding site-occupancy models is that they form a unifying
framework for a very large array of capture-recapture models to estimate population size in
animal ecology (Nichols, 1992): using parameter-expanded data augmentation (Royle et al.,
2007), most models for population size, survival, recruitment and similar demographic
quantities (presented in detail in standard references such as Williams et al. (2002), Royle &
Dorazio (2008) and Kéry & Schaub (2012)) can be cast into the framework of an occupancy
model and this makes their fitting much easier.
Several studies have demonstrated the advantages of site-occupancy and N-mixture
models over classical models which do not consider imperfect detection. These studies
have focused on the distribution of various plant or animal species in marine and terrestrial
ecosystems (see Chen et al. (2013); Latimer et al. (2006) for plants, Dorazio et al. (2006);
Kéry et al. (2005); Rota et al. (2011); Royle (2004) for birds, Kéry et al. (2010) for insects,
Bailey et al. (2004); Chelgren et al. (2011); MacKenzie et al. (2002) for amphibians, Monk
(2014) for fishes, and Gray (2012); Poley et al. (2014) for mammals).
Several software packages can be used to fit site-occupancy and N-mixture models (Table 1.2).
Some are based on the maximum likelihood approach (such as the widely used free Windows
programs MARK and PRESENCE and the R package unmarked) while others are based
on the hierarchical Bayesian approach (such as the WinBUGS and OpenBUGS programs).

Software   Socc  Nmix  Sp  Approach  OS              Reference                      URL
PRESENCE   1     1     0   ML        MS-W            MacKenzie (2006)               PRESENCE
MARK       1     1     0   ML        MS-W            White & Burnham (1999)         MARK
E-SURGE    1     0     0   ML        MS-W            Choquet et al. (2009)          E-SURGE
unmarked   1     1     0   ML        cross-platform  Fiske & Chandler (2011)        unmarked
stocc      1     0     1   Bayesian  cross-platform  Johnson et al. (2013)          stocc
JAGS       1     1     0   Bayesian  cross-platform  -                              JAGS
Stan       1     1     0   Bayesian  cross-platform  Stan Development Team (2014)   Stan
WinBUGS    1     1     1   Bayesian  MS-W            Lunn et al. (2009)             WinBUGS
OpenBUGS   1     1     1   Bayesian  cross-platform  Lunn et al. (2009)             OpenBUGS
hSDM       1     1     1   Bayesian  cross-platform  -                              hSDM

Table 1.2: Software available for modelling species distribution including imperfect detection.
Socc: site-occupancy models; Nmix: N-mixture models; Sp: spatial autocorrelation
(1: available, 0: not available); ML: maximum likelihood; MS-W: Microsoft Windows.

A variety of methods have been developed to correct for the effects of spatial
autocorrelation in species distribution models based on occurrence or abundance data (Cressie &
Cassie, 1993; Dormann et al., 2007; Keitt et al., 2002; Miller et al., 2007). In their review
article, Dormann et al. (2007) described six different statistical approaches to account for
spatial autocorrelation: autocovariate regression; spatial eigenvector mapping; generalised
least squares; (conditional and simultaneous) autoregressive models and generalised
estimating equations.
Several studies have demonstrated the advantages of these methods, focusing on a variety
of plant or animal species (see Gelfand et al. (2005); Kühn et al. (2006); Latimer et al.
(2006) for plants, Lichstein et al. (2002) for birds, and Johnson et al. (2013); Poley et al.
(2014) for mammals).
Among the methods available to account for spatial autocorrelation, conditional
autoregressive (CAR) models, which incorporate spatial autocorrelation through a
neighbourhood structure, are commonly implemented in statistical software (Dormann et al.,
2007). The most commonly used programs to implement CAR models are OpenBUGS
and WinBUGS (Lunn et al., 2009), which have built-in functions (car.normal
and car.proper) to describe the CAR process. CAR models can also be implemented
in BayesX (Brezger et al., 2005) and in the following R packages: R-INLA (Rue et al.,
2009), CARBayes (Lee, 2013), stocc (for binary data only), spatcounts (for count data
only), CARramps (for Gaussian data only), and spdep (for Gaussian data only)
(Table 1.4).

Software    Type of data        Approach  OS              Reference               URL
OpenBUGS    all                 Bayesian  cross-platform  Lunn et al. (2009)      OpenBUGS
WinBUGS     all                 Bayesian  MS-W            Lunn et al. (2009)      WinBUGS
BayesX      all                 Bayesian  cross-platform  Brezger et al. (2005)   BayesX
R-INLA      all                 Bayesian  cross-platform  Rue et al. (2009)       R-INLA
CARBayes    all                 Bayesian  cross-platform  Lee (2013)              CARBayes
stocc       binary              Bayesian  cross-platform  Johnson et al. (2013)   stocc
spatcounts  count               Bayesian  cross-platform  -                       spatcounts
CARramps    Gaussian            Bayesian  cross-platform  -                       CARramps
spdep       Gaussian            ML        cross-platform  -                       spdep
hSDM        binomial and count  Bayesian  cross-platform  -                       hSDM

Table 1.4: Software available for modelling species distribution including spatial autocorrelation.

1.4 Objectives of the hSDM R package
Among the available statistical programs, only OpenBUGS can be used on any operating
system to fit site-occupancy or N-mixture models that also include a spatial
autocorrelation process (Table 1.2 and Table 1.4). One problem is that OpenBUGS, for such
models, cannot handle large data-sets (typically, data-sets with tens of thousands of sites).
Moreover, for smaller data-sets, models can be fitted but computation time can be long
because the OpenBUGS code is interpreted and not compiled. For this reason,
we decided to develop the hSDM (for hierarchical Bayesian species distribution models) R
package. The stocc R package (Johnson et al., 2013; Poley et al., 2014), which can handle
binary data only, has been developed for the same reasons. The hSDM package allows
the user to fit mixture models which take into account imperfect detection (site-occupancy,
N-mixture, ZIB and ZIP models) and account for spatial autocorrelation. Spatial
autocorrelation is represented through an intrinsic CAR process (Besag et al., 1991). Functions
in the hSDM R package use an adaptive Metropolis algorithm (Metropolis et al., 1953;
Robert & Casella, 2004) in a Gibbs sampler (Casella & George, 1992; Gelfand & Smith,
1990) to obtain the posterior distribution of the model parameters. The Gibbs sampler is
written in C code and compiled to optimize computation efficiency. Thus, the hSDM
package can be used for very large data-sets while drastically reducing the computation
time.
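
As an illustration of the neighbourhood structure behind an intrinsic CAR process, the sketch below builds the neighbours of each cell on a small regular grid and computes the conditional distribution of one spatial random effect given the others. It relies on the usual intrinsic CAR formulation, in which the effect of a cell is Normal with mean equal to the average of its neighbours' effects and variance inversely proportional to the number of neighbours. The grid size, the shared-edge neighbourhood, the variance value and all object names are assumptions made for the example; this is not the code used inside hSDM.

```r
# Small regular grid (assumption: 4 x 4 cells, neighbours share an edge)
nr <- 4
nc <- 4
cell <- matrix(seq_len(nr * nc), nrow = nr, ncol = nc)

# For each cell, identify the cells that share an edge with it
neighbours <- lapply(seq_len(nr * nc), function(k) {
  r <- row(cell)[k]
  cl <- col(cell)[k]
  cand <- rbind(c(r - 1, cl), c(r + 1, cl), c(r, cl - 1), c(r, cl + 1))
  ok <- cand[, 1] >= 1 & cand[, 1] <= nr & cand[, 2] >= 1 & cand[, 2] <= nc
  cell[cand[ok, , drop = FALSE]]
})
n.neigh <- lengths(neighbours)

# Illustrative spatial random effects and variance (assumptions)
set.seed(1)
rho <- rnorm(nr * nc)
Vrho <- 1

# Conditional distribution of rho[j] given its neighbours
j <- 6
mu.j <- mean(rho[neighbours[[j]]])   # mean of neighbouring effects
sd.j <- sqrt(Vrho / n.neigh[j])      # sd shrinks with more neighbours
c(mean = mu.j, sd = sd.j)
```

Corner cells have two neighbours and interior cells four, so the conditional variance is larger at the edges of the grid.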
In this vignette, we present examples to illustrate the use of the hSDM package in the
R statistical environment (R Development Core Team, 2014). Examples use virtual or real
data-sets. Results obtained with functions in the hSDM package are compared with the
results obtained with other software and models.

CHAPTER 2

Occurrence data

2.1 Binomial model

2.1.1 Mathematical formulation

Let’s consider a random variable $y_i$ representing the total number of presences of a species
after several visits $v_i$ at a particular site $i$. Random variable $y_i$ can take values from 0
to $v_i$ and can be assumed to follow a Binomial distribution with parameters $v_i$ and $\theta_i$
(Eq. 2.1). Parameter $\theta_i$ can be interpreted as the probability of presence of the species
at site $i$. Using a logit link function, $\theta_i$ can be expressed as a linear model combining
explanatory variables $X_i$ and parameters $\beta$ (Eq. 2.1).

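\begin{equation}
\begin{aligned}
y_i &\sim \mathrm{Binomial}(v_i, \theta_i) \\
\mathrm{logit}(\theta_i) &= X_i \beta
\end{aligned}
\tag{2.1}
\end{equation}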
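
As a small numerical illustration of Eq. 2.1, the sketch below uses base R's plogis() (inverse logit) and qlogis() (logit) to move between the linear predictor and the probability of presence, and then draws Binomial counts. The design matrix, the parameter values and the number of visits are illustrative assumptions.

```r
# Hypothetical design matrix: intercept and one centered, scaled covariate
X <- cbind(1, c(-2, 0, 2))
# Illustrative parameter values (assumption)
beta <- c(-1, 1)

# Linear predictor and probability of presence (inverse logit)
eta <- X %*% beta
theta <- plogis(eta)

# qlogis() is the logit function and recovers the linear predictor
all.equal(qlogis(theta), eta)

# Number of presences out of v visits at each site (v is an assumption)
v <- c(3, 3, 3)
set.seed(1)
y <- rbinom(length(v), size = v, prob = theta)
y
```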

Using this statistical model, we aim at representing a “suitability process”. Given
environmental variables $X_i$, how suitable is the habitat at site $i$ for the species under
consideration? Parameters $\beta$ indicate how much each environmental variable contributes
to the suitability process. Like every other function in the hSDM R package, function
hSDM.binomial() estimates the parameters $\beta$ of such a model in a Bayesian framework.
Parameter inference is done using a Gibbs sampler including a Metropolis algorithm. The
Gibbs sampler is coded in the C language to optimize computation efficiency.
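
To give an idea of what a Gibbs sampler including a Metropolis step does, here is a minimal sketch in plain R for a Bernoulli logistic regression. It is not the C code used by hSDM: the proposal standard deviations are fixed rather than adapted, and the simulated data, the Normal(0, 1.0E6) priors and all object names are assumptions made for the example.

```r
# Simulated data (assumption: one covariate, Bernoulli response)
set.seed(1234)
n <- 200
x <- rnorm(n)
X <- cbind(1, x)
beta.true <- c(-1, 1)
y <- rbinom(n, 1, plogis(as.vector(X %*% beta.true)))

# Log-posterior: Bernoulli log-likelihood plus independent Normal(0, V) priors
log.post <- function(beta, V = 1e6) {
  eta <- as.vector(X %*% beta)
  sum(dbinom(y, 1, plogis(eta), log = TRUE)) +
    sum(dnorm(beta, 0, sqrt(V), log = TRUE))
}

# Metropolis-within-Gibbs: each coefficient is updated in turn
niter <- 2000
draws <- matrix(NA_real_, nrow = niter, ncol = 2)
beta <- c(0, 0)
sd.prop <- c(0.3, 0.3)  # fixed random-walk proposal sd (no adaptation)
for (s in seq_len(niter)) {
  for (k in 1:2) {
    prop <- beta
    prop[k] <- rnorm(1, beta[k], sd.prop[k])
    # Accept the proposal with probability min(1, posterior ratio)
    if (log(runif(1)) < log.post(prop) - log.post(beta)) beta <- prop
  }
  draws[s, ] <- beta
}

# Discard the first 1000 iterations as burn-in
post <- draws[-(1:1000), ]
colMeans(post)
```

With such vague priors, the posterior means should be close to the maximum likelihood estimates returned by glm(y ~ x, family = binomial).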

2.1.2 Data generation
To explore the characteristics of the hSDM.binomial() function, we generate a virtual
data-set on the basis of the Binomial model described above (Eq. 2.1). In the most common
case, sites are visited once ($v_i = 1$). Thus, the random variable $y_i$ follows a Bernoulli
distribution of parameter $\theta_i$ and habitat characteristics $X_i$ are fixed for site $i$. We
generate a virtual data-set in this particular case. For data generation, we import virtual
altitudinal data in R. Altitude is used as an explanatory variable to determine habitat
suitability, i.e. the probability of presence of a virtual species. Altitudinal data are
provided with the hSDM R package (data frame altitude).
These data are transformed into a raster object using the function rasterFromXYZ()
from the raster package. The raster has 2500 cells (50 columns and 50 rows) and the
altitude ranges roughly between 100 and 600 m (Fig. 2.1). For linear models, explanatory
variables are usually centered and scaled to facilitate inference and interpretation of model
parameters.

# Load altitudinal data and create raster

library(raster)
library(hSDM)
data(altitude,package="hSDM")
alt.orig <- rasterFromXYZ(altitude)
extent(alt.orig) <- c(0,50,0,50)
plot(alt.orig)
# Center and scale altitudinal data
alt <- scale(alt.orig,center=TRUE,scale=TRUE)
plot(alt)

A linear model including altitude (variable denoted $A$) is used to compute the
probability of presence of the species (Eq. 2.2).

\begin{equation}
\begin{aligned}
y_i &\sim \mathrm{Bernoulli}(\theta_i) \\
\mathrm{logit}(\theta_i) &= \beta_0 + \beta_1 A_i
\end{aligned}
\tag{2.2}
\end{equation}

We fix the parameters to $\beta_0 = -1$ and $\beta_1 = 1$. The species has a higher probability of
presence at higher altitudes (Fig. 2.2).

# Target parameters
beta.target <- matrix(c(-1,1),ncol=1)
# Matrix of covariates (including the intercept)
ncells <- ncell(alt)
X <- cbind(rep(1,ncells),values(alt))
# Probability of presence as a linear function of altitude
logit.theta <- X %*% beta.target

Figure 2.1: Altitudinal data. Original values (in m) on the left. Centered and scaled
values on the right.

theta <- inv.logit(logit.theta)

# Coordinates of raster cells
coords <- coordinates(alt)
# Transform the probability of presence into a raster
theta <- rasterFromXYZ(cbind(coords,theta))
# Color palette for probability plots
colRP <- colorRampPalette(c("white","yellow","orange",
"red","brown","black"))
# Plot the probability of presence
brks <- seq(0,1,length.out=100)
arg <- list(at=seq(0,1,length.out=5), labels=c("0","0.25","0.5","0.75","1"))
nb <- length(brks)-1
plot(theta,main="Initial probabilities",col=colRP(nb),
breaks=brks,axis.args=arg,zlim=c(0,1))

We can assume a number n of sites in the landscape where the presence or absence of the
species has been recorded. We can simulate the presence or absence of the species
at these n sites given our model (Fig. 2.3).

# Number of observation sites

nsite <- 200
# Set seed for repeatability
seed <- 1234
# Sample the observations in the landscape

Figure 2.2: Probability of presence.

set.seed(seed)
x.coord <- runif(nsite,0,50)
set.seed(2*seed)
y.coord <- runif(nsite,0,50)
library(sp)
sites.sp <- SpatialPoints(coords=cbind(x.coord,y.coord))
# Extract altitude data for sites
alt.sites <- extract(alt,sites.sp)
# Compute theta for these observations
X.sites <- cbind(rep(1,nsite),alt.sites)
logit.theta.site <- X.sites %*% beta.target
theta.site <- inv.logit(logit.theta.site)
# Simulate observations
visits <- rep(1,nsite) # One visit per site for the moment
set.seed(seed)
Y <- rbinom(nsite,visits,theta.site)
# Group explicative and response variables in a data-frame
data.obs.df <- data.frame(Y,visits,alt=X.sites[,2])
# Transform observations in a spatial object
data.obs <- SpatialPointsDataFrame(coords=coordinates(sites.sp),
data=data.obs.df)
# Plot observations
plot(alt.orig)
points(data.obs[data.obs$Y==1,],pch=16)
points(data.obs[data.obs$Y==0,],pch=1)

Figure 2.3: Observation points. Presences (full circles) and absences (empty circles) are
localized on the altitude map (in m).

2.1.3 Parameter inference using the hSDM.binomial() function

The hSDM.binomial() function performs a Binomial logistic regression in a Bayesian
framework. Before using this function, we need to prepare the data for predictions.
We want predictions for the whole landscape, not only at observation points. To
obtain these predictions directly, we can create a data frame including altitudinal data for
the whole landscape. This data frame is passed to the suitability.pred argument.
The data frame for predictions must include the same column names as those used in the
formula for the suitability argument (i.e. “alt” in our example).

data.pred <- data.frame(alt=values(alt))

We can now call the hSDM.binomial() function. Setting parameter save.p to 1 saves in
memory the MCMC values for predictions. These values can be used to compute
several statistics for each prediction (mean, median, 2.5% and 97.5% quantiles). For example,
the mean and these two quantiles are useful to estimate the uncertainty around the mean
predictions.

mod.hSDM.binomial <- hSDM.binomial(presences=data.obs$Y,
                                   trials=data.obs$visits,
                                   suitability=~alt,
                                   data=data.obs,
                                   suitability.pred=data.pred,
                                   burnin=1000, mcmc=1000, thin=1,
                                   beta.start=0,
                                   mubeta=0, Vbeta=1.0E6,
                                   seed=1234, verbose=1, save.p=1)

2.1.4 Analysis of the results

The hSDM.binomial() function returns an MCMC (Markov chain Monte Carlo) sample for each
parameter of the model and also for the model deviance. To obtain parameter estimates,
MCMC values can be summarized through a call to the summary() function from the coda
package. We can check that the values of the target parameters, $\beta_0 = -1$ and $\beta_1 = 1$, are
within the 95% credible interval of the parameter estimates.

summary(mod.hSDM.binomial$mcmc)

Parameter estimates can be compared to results obtained with the glm() function.

#== glm results for comparison

mod.glm <- glm(cbind(Y,visits-Y)~alt,family="binomial",data=data.obs)
summary(mod.glm)

##
## Call:
## glm(formula = cbind(Y, visits - Y) ~ alt, family = "binomial",
## data = data.obs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1290 -0.7509 -0.6041 -0.1749 2.7277
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.3822 0.1966 -7.032 2.03e-12 ***
## alt 0.9518 0.2764 3.444 0.000573 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 215.71 on 199 degrees of freedom
## Residual deviance: 199.79 on 198 degrees of freedom
## AIC: 203.79
##
## Number of Fisher Scoring iterations: 5
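
As a quick check of the glm() fit, the sketch below computes approximate 95% Wald confidence intervals from the estimates and standard errors printed in the output above, and tests whether the target parameters fall inside them. The object names are illustrative; with the fitted object at hand, confint.default(mod.glm) should give the same kind of interval.

```r
# Estimates and standard errors copied from the glm() output above
est <- c("(Intercept)" = -1.3822, alt = 0.9518)
se <- c("(Intercept)" = 0.1966, alt = 0.2764)
# Target parameters used for data generation
target <- c("(Intercept)" = -1, alt = 1)

# Approximate 95% Wald confidence intervals
z <- qnorm(0.975)
ci <- cbind(lower = est - z * se, upper = est + z * se)

# Are the target values inside the intervals?
inside <- target >= ci[, "lower"] & target <= ci[, "upper"]
cbind(round(ci, 4), target, inside)
```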

The MCMC output can also be summarized graphically with a call to the plot.mcmc() function,
also in the coda package. Each chain is plotted as a trace of the sampled output together
with a density estimate for each variable (Fig. 2.4). This plot can be used to visually
check that the chains have converged.

plot(mod.hSDM.binomial$mcmc)

The hSDM.binomial() function also returns two other objects. The first one, theta.latent,
is the predictive posterior mean of the latent variable θ (the probability of presence) for
each observation.

str(mod.hSDM.binomial$theta.latent)

## num [1:200] 0.2191 0.0992 0.1038 0.1878 0.221 ...

summary(mod.hSDM.binomial$theta.latent)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0171 0.1540 0.2179 0.2297 0.2974 0.4970

Figure 2.4: Trace and density estimate for each variable of the MCMC.

The second one, theta.pred, is the set of sampled values from the predictive posterior
(if parameter save.p is set to 1) or the predictive posterior mean (if save.p is set to 0)
for each prediction. In our example, save.p is set to 1 and theta.pred is an mcmc object.
Values in theta.pred can be used to plot the predicted probability of presence on the
whole landscape and the uncertainty associated with predictions (Fig. 2.5).

# Create a raster for predictions

theta.pred.mean <- raster(theta)
# Create rasters for uncertainty
theta.pred.2.5 <- theta.pred.97.5 <- raster(theta)
# Attribute predicted values to raster cells
theta.pred.mean[] <- apply(mod.hSDM.binomial$theta.pred,2,mean)
theta.pred.2.5[] <- apply(mod.hSDM.binomial$theta.pred,2,quantile,0.025)
theta.pred.97.5[] <- apply(mod.hSDM.binomial$theta.pred,2,quantile,0.975)
# Plot the predicted probability of presence and uncertainty
plot(theta.pred.mean,main="Mean",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))
plot(theta.pred.2.5,main="Quantile 2.5 %",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))
plot(theta.pred.97.5,main="Quantile 97.5 %",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))

In our example, we can compare the predictions to the initial probability of presence
computed from our model to check that our predictions are correct (Fig. 2.6).

# Comparing predictions to initial values

plot(theta[],theta.pred.mean[],cex.lab=1.4,xlim=c(0,1),ylim=c(0,1))
points(theta[],theta.pred.2.5[],cex.lab=1.4,col=grey(0.5))
points(theta[],theta.pred.97.5[],cex.lab=1.4,col=grey(0.5))
abline(a=0,b=1,col="red",lwd=2)

Figure 2.5: Predicted probability of presence and uncertainty of predictions.
Mean probability of presence (top), predictions at 2.5% quantile (bottom left) and 97.5%
quantile (bottom right) can be plotted from the mcmc object theta.pred returned by
function hSDM.binomial().

Figure 2.6: Predicted vs. initial probabilities of presence. Initial probabilities of
presence are computed from the Binomial logistic regression model with target parameters.

2.2 Site-occupancy model

2.2.1 Mathematical formulation

Let’s consider the random variable $z_i$ describing habitat suitability at site $i$. The random
variable $z_i$ takes value 1 if the habitat is suitable and 0 if it is not. Habitat at site $i$
is described by environmental variables $X_i$. Random variable $z_i$ can be assumed to follow a
Bernoulli distribution of parameter $\theta_i$ (Eq. 2.3). In this case, $\theta_i$ is the probability
that the habitat is suitable. Several visits at times $t_1$, $t_2$, etc., can occur at site $i$.
Let’s consider the random variable $y_{it}$ representing the presence of the species at site $i$
and time $t$. The species is observed at site $i$ ($\sum_t y_{it} \geq 1$) only if the habitat is
suitable ($z_i = 1$). The species is unobserved at site $i$ ($\sum_t y_{it} = 0$) if the habitat is
not suitable ($z_i = 0$), or if the habitat is suitable ($z_i = 1$) but the probability $\delta_{it}$
of detecting the species at site $i$ and time $t$ is less than 1. Thus, $y_{it}$ is assumed to
follow a Bernoulli distribution of parameter $z_i \delta_{it}$. Using a logit link function,
$\delta_{it}$ can be expressed as a linear model combining explanatory variables $W_{it}$ and
parameters $\gamma$ (Eq. 2.3). Typically, explanatory variables $W_{it}$ are site characteristics
(e.g., habitat variables) or survey characteristics (e.g., weather conditions). The function
hSDM.siteocc() estimates the parameters $\beta$ and $\gamma$ of such a model.

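\begin{equation}
\begin{aligned}
&\text{Ecological process:} \\
&z_i \sim \mathrm{Bernoulli}(\theta_i) \\
&\mathrm{logit}(\theta_i) = X_i \beta \\
&\text{Observation process:} \\
&y_{it} \sim \mathrm{Bernoulli}(z_i \delta_{it}) \\
&\mathrm{logit}(\delta_{it}) = W_{it} \gamma
\end{aligned}
\tag{2.3}
\end{equation}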
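
The two processes of the site-occupancy model can be simulated in a few lines to see why ignoring detection is misleading. In the sketch below, the suitability and detection probabilities are constant across sites and visits, and their values, the number of sites and the number of visits are all illustrative assumptions. The naive occupancy (proportion of sites with at least one detection) is compared with the proportion of suitable sites.

```r
set.seed(1)
nsite <- 1000
nvisit <- 3
theta <- 0.6   # probability that the habitat is suitable (assumption)
delta <- 0.3   # per-visit detection probability (assumption)

# Ecological process: latent suitability at each site
z <- rbinom(nsite, 1, theta)

# Observation process: detections at each visit, possible only if z = 1
# (matrix filled by column, so rep(z, nvisit) aligns z[i] with row i)
y <- matrix(rbinom(nsite * nvisit, 1, rep(z, nvisit) * delta), nrow = nsite)

# Naive occupancy: proportion of sites with at least one detection
detected <- rowSums(y) >= 1
naive <- mean(detected)

# Expected value of the naive estimate under the model
expected <- theta * (1 - (1 - delta)^nvisit)
c(true = mean(z), naive = naive, expected = expected)
```

Under these assumptions, the naive estimate is well below the true proportion of suitable sites, which is the bias the site-occupancy model is meant to correct.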

2.2.2 Data generation

To explore the characteristics of the hSDM.siteocc() function, we can generate a new
virtual data-set on the basis of the site-occupancy model described above (Eq. 2.3). In
the most general case, the observation protocol includes several visits with varying survey
conditions (e.g. weather conditions) to several sites with fixed site characteristics (e.g.
habitat variables). We will generate a virtual data-set following this protocol using the
altitudinal data of the previous example for the Binomial model (Sec. 2.1).
We draw at random the number of visits at each site of the previous example (see
Fig. 2.3 of Sec. 2.1).

# Number of visits associated to each observation point

set.seed(seed)
visits <- rpois(nsite,lambda=3) # Mean number of visits ~3
# NB: Setting a too low mean number of visits per site (lambda < 3)
# leads to inaccurate parameter estimates
visits[visits==0] <- 1 # Number of visits must be > 0
# Vector of observation sites
sites <- vector()
for (i in 1:nsite) {
sites <- c(sites,rep(i,visits[i]))
}
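
# The loop above can also be written without growing a vector, using the
# vectorized form of rep(). The short check below uses a small hypothetical
# number of sites and visits (names ending in .toy are assumptions) to show
# that both approaches give the same vector of site indices.

```r
# Toy example (hypothetical values)
nsite.toy <- 5
visits.toy <- c(2, 1, 3, 1, 2)

# Loop version, as in the code above
sites.loop <- vector()
for (i in 1:nsite.toy) {
  sites.loop <- c(sites.loop, rep(i, visits.toy[i]))
}

# Vectorized version: site i is repeated visits.toy[i] times
sites.vec <- rep(seq_len(nsite.toy), times = visits.toy)

# Both give the same site indices
identical(as.integer(sites.loop), sites.vec)
```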

The survey conditions for each visit are determined by two explanatory variables, $w_1$
and the altitude (variable denoted $A$). These two variables explain the observability of the
species (Eq. 2.4).

\begin{equation}
\begin{aligned}
y_{it} &\sim \mathrm{Bernoulli}(z_i \delta_{it}) \\
\mathrm{logit}(\delta_{it}) &= \gamma_0 + \gamma_1 w_{1it} + \gamma_2 A_{it}
\end{aligned}
\tag{2.4}
\end{equation}

We fix the intercept and the effects of these two variables: $\gamma_0 = -1$, $\gamma_1 = 1$ and
$\gamma_2 = -1$ for determining the detection probability. In our case, the detection probability
decreases with altitude ($\gamma_2 < 0$).
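
To see what these target values imply, the sketch below evaluates the detection probability of Eq. 2.4 for a few values of the centered and scaled altitude, holding the survey covariate at zero. The altitude values and the choice of fixing the survey covariate at 0 are assumptions made for illustration.

```r
# Target parameters of the observation process (from the text)
gamma.toy <- c(-1, 1, -1)
# Centered and scaled altitude values (illustrative)
alt.std <- c(-2, -1, 0, 1, 2)
# Design matrix: intercept, survey covariate fixed at 0 (assumption), altitude
W.toy <- cbind(1, 0, alt.std)
# Detection probability through the inverse logit
delta.toy <- plogis(as.vector(W.toy %*% gamma.toy))
round(data.frame(alt.std, delta = delta.toy), 3)
```

The detection probability decreases monotonically as altitude increases, as expected from the negative altitude coefficient.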

# Explicative variables for observation process
nobs <- sum(visits)
set.seed(seed)
w1 <- rnorm(n=nobs,0,1)
W <- cbind(rep(1,nobs),w1,X.sites[sites,2])
# Target parameters for observation process
gamma.target <- matrix(c(-1,1,-1),ncol=1)

Using covariates and parameters for the two processes, we compute the probability that
the habitat is suitable ($\theta_i$) and the species detection probability ($\delta_{it}$). We also
draw the random variables $z_i$ and $y_{it}$ and construct the observation data-set.

# Ecological process (suitability)

logit.theta.site <- X.sites %*% beta.target
theta.site <- inv.logit(logit.theta.site)
set.seed(seed)
Z <- rbinom(nsite,1,theta.site)

# Observation process (detectability)

logit.delta.obs <- W %*% gamma.target
delta.obs <- inv.logit(logit.delta.obs)
set.seed(seed)
Y <- rbinom(nobs,1,delta.obs*Z[sites])

# Data-sets
data.obs <- data.frame(Y,w1,alt=X.sites[sites,2],site=sites)
data.suit <- data.frame(alt=X.sites[,2])

2.2.3 Parameter inference using the hSDM.siteocc() function

The hSDM.siteocc() function estimates the parameters of a site-occupancy model in a
Bayesian framework.

mod.hSDM.siteocc <- hSDM.siteocc(# Observations

presence=data.obs$Y,
observability=~w1+alt,
site=data.obs$site,
data.observability=data.obs,
# Habitat
suitability=~alt,
data.suitability=data.suit,
# Predictions
suitability.pred=data.pred,

# Chains
burnin=1000, mcmc=1000, thin=1,
# Starting values
beta.start=0,
gamma.start=0,
# Priors
mubeta=0, Vbeta=1.0E6,
mugamma=0, Vgamma=1.0E6,
# Various
seed=1234, verbose=1, save.p=1)

2.2.4 Analysis of the results

summary(mod.hSDM.siteocc$mcmc)

##
## Iterations = 1001:2000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) -0.8013 0.3357 0.010617 0.03161
## beta.alt 1.1533 0.4902 0.015501 0.04079
## gamma.(Intercept) -1.2915 0.2261 0.007149 0.02448
## gamma.w1 0.9379 0.2273 0.007188 0.02099
## gamma.alt -0.9588 0.2173 0.006871 0.01604
## Deviance 296.0651 3.2473 0.102688 0.28808
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) -1.4767 -1.0039 -0.8107 -0.5990 -0.08896
## beta.alt 0.4223 0.7532 1.0780 1.4936 2.21094
## gamma.(Intercept) -1.7531 -1.4323 -1.2875 -1.1551 -0.77827
## gamma.w1 0.5312 0.7753 0.9254 1.0772 1.46189
## gamma.alt -1.3678 -1.1046 -0.9737 -0.8099 -0.52155
## Deviance 291.8021 293.5787 295.4287 298.0085 303.22179
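The header of the summary reports which iterations were kept. The sketch below is an assumption about this bookkeeping (the helper `saved_iterations()` is hypothetical), but it reproduces the headers printed in this document for the settings used here and for the iCAR models further below.

```r
# Iterations retained after burn-in, keeping one value every `thin` iterations
saved_iterations <- function(burnin, mcmc, thin) {
  seq(from = burnin + 1, to = burnin + mcmc, by = thin)
}

# Settings used for hSDM.siteocc(): burnin=1000, mcmc=1000, thin=1
it1 <- saved_iterations(burnin = 1000, mcmc = 1000, thin = 1)
range(it1)   # 1001 2000
length(it1)  # 1000

# Settings used for the iCAR models: burnin=5000, mcmc=5000, thin=5
it2 <- saved_iterations(burnin = 5000, mcmc = 5000, thin = 5)
range(it2)   # 5001 9996
length(it2)  # 1000
```

In both cases the number of retained samples is mcmc/thin.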

Figure 2.7: Trace and density estimate for each variable of the MCMC. Panels show the
trace (iterations 1001 to 2000) and the posterior density of beta.(Intercept), beta.alt,
gamma.(Intercept), gamma.w1, gamma.alt and the Deviance.

plot(mod.hSDM.siteocc$mcmc)

# Create a raster for predictions

theta.pred.mean <- raster(theta)
# Computing mean and quantiles for uncertainty
theta.pred.mean[] <- apply(mod.hSDM.siteocc$theta.pred,2,mean)
theta.pred.2.5 <- apply(mod.hSDM.siteocc$theta.pred,2,quantile,0.025)
theta.pred.97.5 <- apply(mod.hSDM.siteocc$theta.pred,2,quantile,0.975)
# Plot the predicted probability of presence
plot(theta.pred.mean,main="hSDM.siteocc",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))

# Comparing predictions to initial values

plot(theta[],theta.pred.mean[],xlim=c(0,1),ylim=c(0,1),cex.lab=1.4)
points(theta[],theta.pred.2.5[],cex.lab=1.4,col=grey(0.5))
points(theta[],theta.pred.97.5[],cex.lab=1.4,col=grey(0.5))
abline(a=0,b=1,col="red",lwd=2)

Parameter estimates can be compared to the results obtained with the glm() function,
which assumes perfect detection.

Figure 2.8: Comparing predicted probability of presence with initial probabilities.
Top panels: maps of the initial probabilities and of the hSDM.siteocc predictions.
Bottom panel: theta.pred.mean[] plotted against theta[].

#== glm results for comparison
mod.glm <- glm(Y~alt,family="binomial",data=data.obs)
summary(mod.glm)

##
## Call:
## glm(formula = Y ~ alt, family = "binomial", data = data.obs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.5284 -0.4417 -0.4270 -0.4079 2.2650
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.3175 0.1443 -16.056 <2e-16 ***
## alt -0.1330 0.1289 -1.031 0.302
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 362.10 on 594 degrees of freedom
## Residual deviance: 361.08 on 593 degrees of freedom
## AIC: 365.08
##
## Number of Fisher Scoring iterations: 5

# Create a raster for predictions

theta.pred.glm <- raster(theta)
# Attribute predicted values to raster cells
theta.pred.glm[] <- predict.glm(mod.glm,newdata=data.pred,type="response")
# Plot the predicted probability of presence
plot(theta.pred.glm,main="GLM",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))

# Comparing predictions to initial values

plot(theta[],theta.pred.glm[],
xlim=c(0,1),ylim=c(0,1),cex.lab=1.4)
points(theta[],theta.pred.mean[],col=grey(0.5))
abline(a=0,b=1,col="red",lwd=2)

In Figure 2.9, we can see that using a GLM in the case of imperfect detection can lead
to very inaccurate parameter estimates and predictions for the probability of presence of
the species. This is particularly true when detection probability is negatively correlated with
presence probability (through an explanatory variable such as the altitude in our example).
Here, the GLM estimates a slightly negative, non-significant altitude effect (-0.13), whereas
hSDM.siteocc() estimates a clearly positive one (posterior mean 1.15).
This bias has been clearly demonstrated by Lahoz-Monfort et al. (2014).
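One way to see how much information is lost through imperfect detection is to compute the probability of detecting the species at least once at an occupied site after K visits, 1 − (1 − δ)^K. The sketch below is only an illustration: it assumes a constant detection probability equal to the value at the intercept (γ0 = −1, covariates at zero), and the helper name is hypothetical.

```r
# Probability of at least one detection in K visits at an occupied site
p_detect_once <- function(delta, K) {
  1 - (1 - delta)^K
}

delta0 <- plogis(-1)  # detection probability at the intercept, about 0.27
round(p_detect_once(delta0, K = 1:5), 2)
# 0.27 0.47 0.61 0.71 0.79
```

With about three visits per site, a sizeable share of occupied sites would still yield no detection, and a GLM treats every non-detection as an absence.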

Figure 2.9: Comparing predicted probability of presence using GLM with initial
probabilities. Top panels: maps of the initial probabilities and of the GLM predictions.
Bottom panel: theta.pred.glm[] plotted against theta[]. Grey dots show the predictions
obtained with the hSDM.siteocc() function whereas black dots show the predictions
obtained with the glm() function.

2.3 Binomial iCAR model
2.3.1 Mathematical formulation
2.3.2 Data generation with iCAR

# Rasters must be projected to correctly compute the neighborhood

crs(alt) <- '+proj=utm +zone=1'
# Neighborhood matrix
neighbors.mat <- adjacent(alt, cells=c(1:ncells), directions=8,
pairs=TRUE, sorted=TRUE)
# Number of neighbors by cell
n.neighbors <- as.data.frame(table(as.factor(neighbors.mat[,1])))[,2]
# Adjacent cells
adj <- neighbors.mat[,2]
# Generate symmetric adjacency matrix, A
A <- matrix(0,ncells,ncells)
index.start <- 1
for (i in 1:ncells) {
index.end <- index.start+n.neighbors[i]-1
A[i,adj[c(index.start:index.end)]] <- 1
index.start <- index.end+1
}
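The structure of the neighborhood objects can be checked on a small grid without any spatial package. The sketch below builds the 8-neighbor (queen) adjacency matrix of a hypothetical 3 x 3 grid in base R; all object names are illustrative and cells are numbered row by row.

```r
nr <- 3
nc <- 3
ncells_toy <- nr * nc

# Row and column index of each cell
row_id <- rep(seq_len(nr), each = nc)
col_id <- rep(seq_len(nc), times = nr)

# Two distinct cells are neighbors if they differ by at most 1
# in both the row and the column direction (8 directions)
A_toy <- matrix(0, ncells_toy, ncells_toy)
for (i in seq_len(ncells_toy)) {
  for (j in seq_len(ncells_toy)) {
    if (i != j && abs(row_id[i] - row_id[j]) <= 1 &&
        abs(col_id[i] - col_id[j]) <= 1) {
      A_toy[i, j] <- 1
    }
  }
}

# Number of neighbors per cell, laid out as the grid
n_neighbors_toy <- rowSums(A_toy)
matrix(n_neighbors_toy, nr, nc, byrow = TRUE)
isSymmetric(A_toy)
```

Corner cells have 3 neighbors, edge cells 5 and the central cell 8, and the matrix is symmetric, as expected for an undirected neighborhood.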

# Function to draw in a multivariate normal

rmvn <- function(n, mu=0, V=matrix(1), seed=1234) {
p <- length(mu)
if (any(is.na(match(dim(V), p)))) {
stop("Dimension problem!")
}
D <- chol(V)
set.seed(seed)
t(matrix(rnorm(n*p),ncol=p)%*%D+rep(mu,rep(n,p)))
}
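The rmvn() function relies on the Cholesky factor D of V: if the rows of a matrix are independent standard normal draws, multiplying by D gives draws with covariance t(D) %*% D = V. The sketch below checks this on a hypothetical 2 x 2 covariance matrix.

```r
set.seed(1234)
V_toy <- matrix(c(2, 0.8, 0.8, 1), 2, 2)  # hypothetical covariance matrix
D_toy <- chol(V_toy)                       # upper triangular factor

# The factor reproduces V
all.equal(t(D_toy) %*% D_toy, V_toy)

# Independent standard normal draws transformed by the factor
n_toy <- 20000
draws <- matrix(rnorm(n_toy * 2), ncol = 2) %*% D_toy
round(cov(draws), 2)  # close to V_toy
```

chol() requires V to be positive definite, which is why the precision matrix used for the spatial random effects must be made non-singular before it is inverted.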

# Generate spatial random effects

Vrho.target <- 5 # Variance of spatial random effects
d <- 1 # Spatial dependence parameter = 1 for intrinsic CAR
Q <- diag(n.neighbors)-d*A + diag(.0001,ncells) # Add small constant to
# make Q non-singular
covrho <- Vrho.target*solve(Q) # Covariance of rhos
rho <- c(rmvn(1,mu=rep(0,ncells),V=covrho,seed=seed)) # Spatial Random Effects
rho <- rho-mean(rho) # Centering rhos on zero

rho.rast <- rasterFromXYZ(xyz=cbind(coords,rho))
# Probability of presence
theta.cells <- inv.logit(X %*% beta.target + rho)
theta <- rasterFromXYZ(cbind(coords,theta.cells))
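The small constant added to the diagonal of Q is needed because the intrinsic CAR precision matrix (d = 1) is singular: each row sums to zero. The sketch below illustrates this on a hypothetical chain of four cells (1 - 2 - 3 - 4); the object names are illustrative.

```r
# Adjacency matrix of a chain of 4 cells
A_toy <- matrix(0, 4, 4)
A_toy[cbind(1:3, 2:4)] <- 1
A_toy <- A_toy + t(A_toy)

# Intrinsic CAR precision: number of neighbors on the diagonal minus adjacency
Q0 <- diag(rowSums(A_toy)) - A_toy
rowSums(Q0)   # all zero, so Q0 cannot be inverted
qr(Q0)$rank   # 3, one less than the number of cells

# Adding a small constant to the diagonal makes the matrix invertible
Q_ridge <- Q0 + diag(1e-4, 4)
cov_toy <- solve(Q_ridge)
round(diag(cov_toy))
```

The resulting variances are very large because the overall level of the random effects is almost unconstrained, which is consistent with centering rho on zero after the draw.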

# Ecological process (suitability)

cells <- extract(alt,sites.sp,cell=TRUE)[,1]
logit.theta.site <- X.sites %*% beta.target + rho[cells]
theta.site <- inv.logit(logit.theta.site)
set.seed(seed)
Y <- rbinom(nsite,visits,theta.site)
# Data-sets
data.suit <- data.frame(Y,visits,alt=X.sites[,2],cells)
data.pred <- data.frame(alt=values(alt),cell=c(1:ncells))
# Transform observations into a spatial object
data.suit <- SpatialPointsDataFrame(coords=coordinates(sites.sp),
data=data.suit)

# Plot spatial random effects

plot(rho.rast,main="Spatial random effects")
# Plot initial probabilities and observations
plot(theta,main="Initial probabilities (iCAR model)",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))
points(data.suit[data.suit$Y>0,],pch=16)
points(data.suit[data.suit$Y==0,],pch=1)

2.3.3 Parameter inference using the hSDM.binomial.iCAR() function

Start <- Sys.time() # Start the clock

mod.hSDM.binomial.iCAR <- hSDM.binomial.iCAR(presences=data.suit$Y,
trials=data.suit$visits,
suitability=~alt,
spatial.entity=data.suit$cells,
data=data.suit,
n.neighbors=n.neighbors,
neighbors=adj,
suitability.pred=data.pred,
spatial.entity.pred=data.pred$cell,
burnin=5000, mcmc=5000, thin=5,
beta.start=0,
Vrho.start=1,
priorVrho="1/Gamma",
mubeta=0, Vbeta=1.0E6,
shape=1, rate=1,
Vrho.max=10,
seed=1234, verbose=1,
save.rho=1, save.p=0)
Time.hSDM <- difftime(Sys.time(),Start,units="sec") # Time difference

Figure 2.10: Spatial random effects.

Figure 2.11: Initial probability of presence and observations. Presences (full circles)
and absences (empty circles).

2.3.4 Analysis of the results with iCAR

summary(mod.hSDM.binomial.iCAR$mcmc)

# Predictions for spatial random effects

rho.pred <- apply(mod.hSDM.binomial.iCAR$rho.pred,2,mean)
rho.pred.rast <- rasterFromXYZ(cbind(coords,rho.pred))
plot(rho.pred.rast,main="Predictions rho")
# Predictions for probability of presence

theta.pred <- mod.hSDM.binomial.iCAR$theta.pred
theta.pred.rast <- rasterFromXYZ(cbind(coords,theta.pred))
plot(theta.pred.rast,main="Predictions theta",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))
# Predictions vs. initial spatial random effects
plot(rho[-cells],rho.pred[-cells],xlab="rho target",ylab="Predictions rho")
points(rho[cells],rho.pred[cells],col="blue",pch=16)
abline(a=0,b=1,col="red")
# Predictions vs. initial probabilities
plot(values(theta)[-cells],theta.pred[-cells],xlab="theta target",
ylab="Predictions theta")
points(values(theta)[cells],theta.pred[cells],col="blue",pch=16)
abline(a=0,b=1,col="red")

Figure 2.12: Predictions vs. initial values. Map panels: Spatial random effects, Initial
probabilities (iCAR model), Predictions rho, Predictions theta. Scatter plots: Predictions
rho against rho target, and Predictions theta against theta target.

2.3.5 Comparison with OpenBUGS results

# BUGS model
modelBUGS1.txt <-
"model {

# likelihood
for (n in 1:nobs) {
y[n] ~ dbin(theta[n], visits[n])
logit(theta[n]) <- Xbeta[n] + rho[IdCell[n]]
Xbeta[n] <- beta[1] + beta[2]*x1[n]
}

# CAR prior distribution for spatial random effects:

rho[1:ncells] ~ car.normal(adj[], weights[], num[], tau)
for (k in 1:sumNumNeigh) {
weights[k] <- 1 # set equal weights for all neighbors
}

# Other priors
for (i in 1:2) {
beta[i] ~ dnorm(0,1.0E-6)
}
Vrho <- 1/tau
tau ~ dgamma(1,1)
}"

# Write the model to modelBUGS1.txt in the working directory

writeLines(modelBUGS1.txt,con="modelBUGS1.txt")
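The BUGS priors are meant to mirror those passed to hSDM. BUGS parameterises dnorm() with a precision, so dnorm(0,1.0E-6) corresponds to a variance of 1.0E6, the value of Vbeta. The sketch below also draws from the prior implied on Vrho by tau ~ dgamma(1,1); that this matches hSDM's priorVrho="1/Gamma" with shape=1 and rate=1 exactly is an assumption made here for illustration.

```r
# Precision used in BUGS and the equivalent variance
prec_beta <- 1.0E-6
Vbeta_equiv <- 1 / prec_beta
Vbeta_equiv  # 1e+06

# Prior implied on Vrho = 1/tau when tau ~ Gamma(shape = 1, rate = 1)
set.seed(1234)
tau_draws <- rgamma(10000, shape = 1, rate = 1)
Vrho_draws <- 1 / tau_draws
quantile(Vrho_draws, c(0.025, 0.5, 0.975))
```

The median of this inverse-gamma prior is 1/log(2), about 1.44, with a long right tail.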

# Data for OpenBUGS

y <- data.suit$Y
visits <- data.suit$visits
IdCell <- data.suit$cells
x1 <- data.suit$alt
num <- n.neighbors
adj <- adj
nobs <- length(y)
ncells <- length(n.neighbors)
sumNumNeigh <- length(adj)
data <- list("y","visits","IdCell","x1","num",
"adj","nobs","ncells","sumNumNeigh")

# Inits

inits <- list(list(beta=rep(0,2),rho=rep(0,ncells),tau=1))

Value          OpenBUGS    hSDM
β0                -0.99   -0.99
βalt               0.73    0.75
Vρ                 2.72    2.80
Deviance         328.62  327.84
Time (secs)          91       7

Table 2.1: Comparison between hSDM and OpenBUGS outputs.

# OpenBUGS call
library(R2OpenBUGS)
Start <- Sys.time() # Start the clock
Open <- bugs(data,inits,
model.file="modelBUGS1.txt",
parameters=c("beta","Vrho","rho"),
n.chains=1,
OpenBUGS.pgm="/usr/bin/OpenBUGS",
n.iter=2000,
n.burnin=1000,
n.thin=5,
DIC=TRUE,
debug=FALSE,
clearWD=FALSE)
Time.OpenBUGS <- difftime(Sys.time(),Start,units="sec") # Time difference

# Time difference
ratio.time <- as.numeric(Time.OpenBUGS)/as.numeric(Time.hSDM)
ratio.time # For this example, hSDM is X times faster
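As a worked example of this ratio, the sketch below plugs in the two running times reported in Table 2.1 (91 and 7 seconds). The object names are hypothetical stand-ins for Time.OpenBUGS and Time.hSDM.

```r
# Running times reported in Table 2.1, stored as difftime objects
time_openbugs <- as.difftime(91, units = "secs")
time_hsdm <- as.difftime(7, units = "secs")

# Same computation as ratio.time above
ratio_time <- as.numeric(time_openbugs) / as.numeric(time_hsdm)
ratio_time  # 13
```

For this example, hSDM would thus be 13 times faster; timings depend on the machine and on the chain lengths requested from each program.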

#== Outputs
Open$DIC
Open$pD
beta.pred.Open <- apply(Open$sims.list$beta,2,mean)
Vrho.pred.Open <- mean(Open$sims.list$Vrho)
deviance.Open <- mean(Open$sims.list$deviance)
rho.OpenBUGS <- apply(Open$sims.list$rho,2,mean)
plot(rho.pred,rho.OpenBUGS)
abline(a=0,b=1,col="red")

Figure 2.13: Comparison between hSDM and OpenBUGS for spatial random
effect estimates. Scatter plot of rho.OpenBUGS against rho.pred.

2.3.6 Comparison with GLM results

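#== glm results for comparison
# NB: glm() reads a two-column binomial response as cbind(successes, failures),
# i.e. cbind(Y,visits-Y). The output below was produced by the call as written.

mod.glm <- glm(cbind(Y,visits)~alt,family="binomial",data=data.suit)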
summary(mod.glm)

##
## Call:
## glm(formula = cbind(Y, visits) ~ alt, family = "binomial", data = data.suit)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3707 -0.9457 -0.6015 0.4535 1.8110
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.20339 0.08846 -13.604 < 2e-16 ***
## alt 0.44159 0.10239 4.313 1.61e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 157.26 on 199 degrees of freedom

## Residual deviance: 135.34 on 198 degrees of freedom
## AIC: 355.04
##
## Number of Fisher Scoring iterations: 4

# Create a raster for predictions

theta.pred.glm <- raster(theta)
# Attribute predicted values to raster cells
theta.pred.glm[] <- predict.glm(mod.glm,newdata=data.pred,type="response")
# Plot the predicted probability of presence
plot(theta.pred.glm,main="GLM for iCAR",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))

# Comparing predictions to initial values

plot(theta[],theta.pred.glm[],
xlim=c(0,1),ylim=c(0,1),cex.lab=1.4)
abline(a=0,b=1,col="red",lwd=2)
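The form of the two-column response matters for a binomial GLM. The self-contained sketch below simulates binomial counts without any spatial effect, using hypothetical target values, and compares the fit obtained with cbind(successes, failures) to the fit obtained with cbind(successes, trials).

```r
set.seed(1234)
n_toy <- 5000
alt_toy <- rnorm(n_toy)          # hypothetical standardised covariate
visits_toy <- rep(3, n_toy)      # 3 visits per site
beta_toy <- c(-1, 1)             # hypothetical target values

# Number of presences out of `visits_toy` trials
Y_toy <- rbinom(n_toy, visits_toy, plogis(beta_toy[1] + beta_toy[2] * alt_toy))

# Response as (successes, failures)
fit_ok <- glm(cbind(Y_toy, visits_toy - Y_toy) ~ alt_toy, family = "binomial")
# Response as (successes, trials)
fit_alt <- glm(cbind(Y_toy, visits_toy) ~ alt_toy, family = "binomial")

round(rbind(successes_failures = coef(fit_ok),
            successes_trials = coef(fit_alt)), 2)
```

Only the first specification recovers the target values in this simulation; the second treats the number of visits as the number of failures and returns a lower intercept.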

Figure 2.14: Comparing predicted probability of presence using GLM with initial
probabilities for a binomial model with iCAR process. Top panels: maps of the initial
probabilities (iCAR model) and of the GLM predictions. Bottom panel: theta.pred.glm[]
plotted against theta[].

2.4 Site-occupancy iCAR model
2.4.1 Mathematical formulation
2.4.2 Data generation

# Ecological process (suitability)

logit.theta.site <- X.sites %*% beta.target + rho[cells]
theta.site <- inv.logit(logit.theta.site)
set.seed(seed)
Z <- rbinom(nsite,1,theta.site)

# Observation process (detectability)

nobs <- sum(visits)
set.seed(seed)
Y <- rbinom(nobs,1,delta.obs*Z[sites])

# Data-sets
data.obs <- data.frame(Y,w1,alt=X.sites[sites,2],site=sites)
data.suit <- data.frame(alt=X.sites[,2],cell=cells)

2.4.3 Parameter inference using the hSDM.siteocc.iCAR() function

Start <- Sys.time() # Start the clock

mod.hSDM.siteocc.iCAR <- hSDM.siteocc.iCAR(# Observations
presence=data.obs$Y,
observability=~w1+alt,
site=data.obs$site, data.observability=data.obs,
# Habitat
suitability=~alt, data.suitability=data.suit,
# Spatial structure
spatial.entity=data.suit$cell,
n.neighbors=n.neighbors, neighbors=adj,
# Predictions
suitability.pred=data.pred,
spatial.entity.pred=data.pred$cell,
# Chains
burnin=5000, mcmc=5000, thin=5,
# Starting values
beta.start=0,
gamma.start=0,

Vrho.start=1,
# Priors
mubeta=0, Vbeta=1.0E6,
mugamma=0, Vgamma=1.0E6,
# priorVrho="1/Gamma",
# shape=1, rate=1,
priorVrho="Uniform",
Vrho.max=10,
# Various
seed=1234, verbose=1,
save.rho=1, save.p=0)
Time.hSDM <- difftime(Sys.time(),Start,units="sec") # Time difference

summary(mod.hSDM.siteocc.iCAR$mcmc)

##
## Iterations = 5001:9996
## Thinning interval = 5
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) -1.3887 0.4224 0.013357 0.03063
## beta.alt 1.0793 0.4787 0.015138 0.05328
## gamma.(Intercept) -1.1084 0.2568 0.008121 0.01169
## gamma.w1 1.1206 0.2587 0.008180 0.01065
## gamma.alt -0.5615 0.2236 0.007071 0.00881
## Vrho 6.8428 2.0419 0.064569 0.57398
## Deviance 253.1036 10.1066 0.319600 1.01861
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) -2.2137 -1.6727 -1.3879 -1.1172 -0.5645
## beta.alt 0.1648 0.7386 1.0730 1.3929 2.1004
## gamma.(Intercept) -1.6208 -1.2980 -1.1016 -0.9277 -0.6422
## gamma.w1 0.6383 0.9490 1.1230 1.2783 1.6412
## gamma.alt -0.9982 -0.7134 -0.5594 -0.4180 -0.1070
## Vrho 3.2623 5.1637 7.0166 8.6736 9.8196
## Deviance 233.3963 246.3764 252.9402 260.1291 274.6428

# Predictions for spatial random effects
rho.pred <- apply(mod.hSDM.siteocc.iCAR$rho.pred,2,mean)
rho.pred.rast <- rasterFromXYZ(cbind(coords,rho.pred))
plot(rho.pred.rast,main="Predictions rho")
# Predictions for probability of presence
theta.pred <- mod.hSDM.siteocc.iCAR$theta.pred
theta.pred.rast <- rasterFromXYZ(cbind(coords,theta.pred))
plot(theta.pred.rast,main="Predictions theta",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))
# Predictions vs. initial spatial random effects
plot(rho[-cells],rho.pred[-cells],xlab="rho target",ylab="Predictions rho")
points(rho[cells],rho.pred[cells],col="blue",pch=16)
abline(a=0,b=1,col="red")
# Predictions vs. initial probabilities
plot(values(theta)[-cells],theta.pred[-cells],xlab="theta target",
ylab="Predictions theta")
points(values(theta)[cells],theta.pred[cells],col="blue",pch=16)
abline(a=0,b=1,col="red")

Figure 2.15: Predictions vs. initial values. Map panels: Spatial random effects, Initial
probabilities (iCAR model), Predictions rho, Predictions theta. Scatter plots: Predictions
rho against rho target, and Predictions theta against theta target.

2.4.4 Comparison with OpenBUGS results

# BUGS model
modelBUGS.txt <-
"model {

# Suitability process
for (i in 1:nsite) {
z[i] ~ dbern(theta[i])
logit(theta[i]) <- Xbeta[i] + rho[IdCellforSite[i]]
Xbeta[i] <- beta[1] + beta[2]*alt.suit[i]
}

# Observability process
for (n in 1:nobs) {
y[n] ~ dbern(delta.prim[n])
delta.prim[n] <- delta[n]*z[IdSiteforObs[n]]
logit(delta[n]) <- gamma[1] + gamma[2]*w1[n] + gamma[3]*alt.obs[n]
}

# CAR prior distribution for spatial random effects:

rho[1:ncells] ~ car.normal(adj[], weights[], num[], tau)
for (k in 1:sumNumNeigh) {
weights[k] <- 1 # set equal weights for all neighbors
}

# Other priors
for (i in 1:2) {
beta[i] ~ dnorm(0,1.0E-6)
}
for (i in 1:3) {
gamma[i] ~ dnorm(0,1.0E-6)
}
Vrho ~ dunif(0,10)
tau <- 1/Vrho
}"

# Write the model to modelBUGS.txt in the working directory

writeLines(modelBUGS.txt,con="modelBUGS.txt")

# Data for OpenBUGS

y <- data.obs$Y
IdCellforSite <- data.suit$cell
IdSiteforObs <- data.obs$site
alt.suit <- data.suit$alt
w1 <- data.obs$w1
alt.obs <- data.obs$alt
num <- n.neighbors
adj <- adj
nobs <- length(y)
nsite <- length(IdCellforSite)
ncells <- length(n.neighbors)
sumNumNeigh <- length(adj)
data <- list("y","IdCellforSite","IdSiteforObs","alt.suit","w1","alt.obs","num",
"adj","nobs","nsite","ncells","sumNumNeigh")

# Inits
inits <- list(list(beta=rep(0,2),gamma=rep(0,3),rho=rep(0,ncells),Vrho=1))

# OpenBUGS call
library(R2OpenBUGS)
Start <- Sys.time() # Start the clock
Open <- bugs(data,inits,
model.file="modelBUGS.txt",
parameters=c("beta","gamma","Vrho","rho"),
n.chains=1,
OpenBUGS.pgm="/usr/bin/OpenBUGS",
n.iter=2000,
n.burnin=1000,
n.thin=5,
DIC=TRUE,
debug=FALSE,
clearWD=FALSE)
Time.OpenBUGS <- difftime(Sys.time(),Start,units="sec") # Time difference

Value          OpenBUGS    hSDM
β0                -1.49   -1.39
βalt               1.11    1.08
γ0                -1.06   -1.11
γw1                1.13    1.12
γalt              -0.55   -0.56
Vρ                 6.38    6.84
Time (secs)          59      14

Table 2.2: Comparison between hSDM and OpenBUGS outputs.

# Time difference
ratio.time <- as.numeric(Time.OpenBUGS)/as.numeric(Time.hSDM)
ratio.time # For this example, hSDM is X times faster

#== Outputs
Open$DIC
Open$pD
beta.pred.Open <- apply(Open$sims.list$beta,2,mean)
gamma.pred.Open <- apply(Open$sims.list$gamma,2,mean)
Vrho.pred.Open <- mean(Open$sims.list$Vrho)
deviance.Open <- mean(Open$sims.list$deviance)
rho.OpenBUGS <- apply(Open$sims.list$rho,2,mean)
plot(rho.pred,rho.OpenBUGS)
abline(a=0,b=1,col="red")

50
[Figure: scatter plot of rho.OpenBUGS against rho.pred.]

Figure 2.16: Comparison between hSDM and OpenBUGS for spatial random
effect estimates.

2.4.5 Comparison with GLM results

#== glm results for comparison

mod.glm <- glm(y~alt,family="binomial",data=data.obs)
summary(mod.glm)

##
## Call:
## glm(formula = y ~ alt, family = "binomial", data = data.obs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.4152 -0.4147 -0.4145 -0.4144 2.2352
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.410839 0.149237 -16.154 <2e-16 ***
## alt -0.001029 0.144026 -0.007 0.994
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 338.53 on 594 degrees of freedom
## Residual deviance: 338.53 on 593 degrees of freedom
## AIC: 342.53
##
## Number of Fisher Scoring iterations: 5

# Create a raster for predictions

theta.pred.glm <- raster(theta)
# Attribute predicted values to raster cells
theta.pred.glm[] <- predict.glm(mod.glm,newdata=data.pred,type="response")
# Plot the predicted probability of presence
plot(theta.pred.glm,main="GLM for siteocc iCAR",col=colRP(nb),breaks=brks,
axis.args=arg,zlim=c(0,1))

# Comparing predictions to initial values

plot(theta[],theta.pred.glm[],
xlim=c(0,1),ylim=c(0,1),cex.lab=1.4)
abline(a=0,b=1,col="red",lwd=2)

[Figure: two maps on a 50 x 50 grid, titled "Initial probabilities (iCAR model)" and "GLM for siteocc iCAR" (colour scale 0 to 1), and a scatter plot of theta.pred.glm[] against theta[].]

Figure 2.17: Comparing predicted probability of presence using GLM with initial
probabilities for a site-occupancy model with iCAR process.

CHAPTER 3

Abundance data

CHAPTER 4

Additional examples with real data

4.1 Binomial iCAR model with tens of thousands of spatial cells

This example illustrates the use of the hSDM.binomial.iCAR() function on a large region
(tens of thousands of grid cells). The data-set includes presence-absence observations for
Protea punctata Meisn. (Fig. 4.1) in the Cape Floristic Region. It also includes
environmental variables for 36909 one-minute by one-minute grid cells covering the whole of
South Africa's Cape Floristic Region (Fig. 4.2).

# Libraries
require(sp)
require(raster)
library(hSDM)

# Load data
data(cfr.env, package="hSDM")
dim(cfr.env) # 36909 cells
data(punc10, package="hSDM")
dim(punc10) # 2934 observations

# Standardize predictors
for (i in 3:8) {
m <- cfr.env[,i]-mean(cfr.env[,i], na.rm=T)
cfr.env[,i] <- m/sd(cfr.env[,i], na.rm=T)
}

Figure 4.1: Photograph of Protea punctata Meisn.

# Make both data sets spatial objects

cfr.env <- SpatialPixelsDataFrame(points=cfr.env[c("lon","lat")],
tol=0.175039702866343,
data=cfr.env[,-c(1,2)])
fullgrid(cfr.env) <- TRUE
coordinates(punc10) <- c(2,3)

# Plot the whole data set

spplot(cfr.env,sp.layout=list('sp.points',
punc10[punc10$Occurrence==1,],
col='black',pch='o'),
col.regions=rainbow(100,start=0.67,end=0))

# Get the indices of cells where presences and absences have been observed.
cfr.env.rast <- stack(cfr.env)
pres <- extract(cfr.env.rast, SpatialPoints(punc10[punc10$Occurrence==1,]),
cellnumbers=TRUE)[,1]
abs <- extract(cfr.env.rast, SpatialPoints(punc10[punc10$Occurrence==0,]),
cellnumbers=TRUE)[,1]

# Make the data frame used in regressions

ncelltot <- length(cfr.env) # Including NULL cells
d <- data.frame(lon=coordinates(cfr.env)[,1],lat=coordinates(cfr.env)[,2],
Y=rep(0,ncelltot),
trials=rep(0,ncelltot),
cell.orig=1:ncelltot,
cfr.env@data)

[Figure: maps of the environmental variables text1, text2, fert3, ph1, min07 and smdwin, with presence points overlaid.]

Figure 4.2: Values of environmental variables in the Cape Floristic Region. Points
of presence of Protea punctata are represented by a circle.

d$Y[pres] <- 1
d$trials[c(pres,abs)] <- 1

# Remove NAs (logical indexing stays correct even when no row is incomplete)
d <- d[complete.cases(d),]
summary(d)

# Make d a spatial object for later use

coordinates(d) <- c(1,2)

# Find cells' neighborhood with function 'adjacent' from the 'raster' package
plot(cfr.env.rast)
sel.cell <- d$cell.orig
neighbors.mat <- adjacent(cfr.env.rast, cells=sel.cell, directions=8,
pairs=TRUE, target=sel.cell, sorted=TRUE)
n.neighbors <- as.data.frame(table(as.factor(neighbors.mat[,1])))[,2]
neighbors.orig <- neighbors.mat[,2]

# Sorting cells from 1 to dim(d)[1] (dim(d)[1]=36907)

s.cell <- sort(unique(d$cell.orig))
d$cell <- match(d$cell.orig,s.cell)
s.neighbors <- sort(unique(neighbors.orig))
neighbors <- match(neighbors.orig,s.neighbors)
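
The re-indexing step above can be illustrated on a toy example. The sketch below is not part of the original analysis: the cell numbers and adjacency pairs are made up, and it uses base R only. It shows the layout passed to the iCAR functions in this vignette: n.neighbors holds one count per cell, and neighbors concatenates the neighbor indices cell by cell, both expressed on a compact index running from 1 to the number of retained cells.

```r
# Toy example (hypothetical values): 4 retained cells whose original
# raster indices are not contiguous because NA cells were dropped
cell.orig <- c(3, 7, 8, 12)

# Hypothetical adjacency pairs (from, to), sorted by 'from', in the same
# two-column layout as the matrix returned by adjacent(..., pairs=TRUE)
pairs <- rbind(c(3, 7), c(7, 3), c(7, 8), c(8, 7), c(8, 12), c(12, 8))

# Number of neighbors per cell; explicit factor levels keep a zero
# count for any cell that has no neighbor
n.neighbors <- as.vector(table(factor(pairs[, 1], levels = cell.orig)))

# Compact indices 1..n for the cells and for their neighbors
s.cell <- sort(unique(cell.orig))
cell <- match(cell.orig, s.cell)
neighbors <- match(pairs[, 2], s.cell)

n.neighbors                             # 1 2 2 1
cell                                    # 1 2 3 4
neighbors                               # 2 1 3 2 4 3
sum(n.neighbors) == length(neighbors)   # TRUE
```

Matching the neighbors against the same sorted vector as the cells (s.cell) is a design choice of this sketch: it keeps both indices consistent even if some retained cell never appears as a neighbor.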

# glm, just to compare

mod.glm <- glm(cbind(Y,trials-Y)~min07+smdwin,data=d, family="binomial")
summary(mod.glm)

# hSDM
mod.hSDM.binomial.iCAR <- hSDM.binomial.iCAR(presences=d$Y[d$trials>0],
trials=d$trials[d$trials>0],
suitability=~min07+smdwin,
spatial.entity=d$cell[d$trials>0],
data=d[d$trials>0,],
n.neighbors=n.neighbors,
neighbors=neighbors,
suitability.pred=d,
spatial.entity.pred=d$cell,
burnin=1000,
mcmc=1000, thin=1,
beta.start=c(0,0,0),
Vrho.start=10,
priorVrho="1/Gamma",
mubeta=0, Vbeta=1.0E6,
shape=2, rate=1,
Vrho.max=10,
seed=1234, verbose=1, save.rho=0)

# Outputs
summary(mod.hSDM.binomial.iCAR$mcmc)

##
## Iterations = 1001:2000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) -6.7924 0.2721 0.008605 0.05725
## beta.min07 -2.7341 0.1630 0.005153 0.03472
## beta.smdwin 0.7583 0.1404 0.004439 0.02668
## Vrho 6.7782 1.0610 0.033552 0.67466
## Deviance 489.1038 20.4751 0.647480 4.02088
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) -7.4383 -6.969 -6.7523 -6.6060 -6.363
## beta.min07 -3.0927 -2.840 -2.7185 -2.6345 -2.420
## beta.smdwin 0.4662 0.674 0.7581 0.8437 1.007
## Vrho 5.3646 5.934 6.5480 7.6397 8.843
## Deviance 451.8252 474.131 488.0859 502.8298 533.916

# Put output together

out <- data.frame(d,pred=mod.hSDM.binomial.iCAR$theta.pred,
sp.ef=mod.hSDM.binomial.iCAR$rho.pred)

# Plot results
coordinates(out) <- coordinates(d)
out <- SpatialPixelsDataFrame(out,tol=0.175039702866343,data=data.frame(out))
fullgrid(out) <- TRUE
p1 <- spplot(out['pred'],col.regions=rainbow(100,start=0.67,end=0),
sp.layout=list('sp.points',punc10[punc10$Occurrence==1,],
col='black',pch='o'))

p2 <- spplot(out['sp.ef'],col.regions=rainbow(100,start=0.67,end=0))
print(p1,position=c(0,0.5,1,1),more=TRUE)
print(p2,position=c(0,0,1,0.5))

[Figure: two maps. Top: predicted probability of presence (colour scale 0.0 to 1.0). Bottom: estimated spatial random effects.]

Figure 4.3: Predicted probability of presence (top) and estimated spatial random
effects (bottom). Points of presence of Protea punctata are represented by a circle.

Using the hSDM.binomial.iCAR() function, we were able to estimate the spatial random
effects for 36907 cells (Fig. 4.3). This shows that the use of this function is not
limited, through memory problems or excessive computation time, by the number
of spatial grid cells. Nevertheless, in this particular example, it is very difficult to reach
convergence for the variance of the spatial random effects (see the large time-series
standard error of Vrho in the MCMC outputs above).
This is likely due to the low information content of binary maps and the relatively low
number of observations (2934). As previously underlined by Dormann et al. (2007), we
argue that binomial intrinsic CAR models require further study and caution in their use.
The hSDM R package offers tools to help ecologists explore the behavior and performance
of such models.
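
The convergence problem can be read from the MCMC summary printed above. The sketch below is a rough illustration, assuming the usual definitions behind this kind of summary (naive SE = SD/sqrt(n) and time-series SE = SD/sqrt(effective sample size)). Under that assumption, an approximate effective sample size can be recovered from the reported values, which are copied here from the output above.

```r
# Posterior SD and time-series SE copied from the MCMC summary above
post.sd <- c(intercept = 0.2721, min07 = 0.1630, smdwin = 0.1404, Vrho = 1.0610)
ts.se   <- c(intercept = 0.05725, min07 = 0.03472, smdwin = 0.02668, Vrho = 0.67466)

# Approximate effective sample size, assuming time-series SE = SD / sqrt(ESS)
ess <- (post.sd / ts.se)^2
round(ess, 1)
```

Under this assumption, the 1000 saved samples for Vrho are worth only about two to three independent draws, against roughly 20 to 30 for the regression coefficients.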

Figure 4.4: Photograph of Protea cynaroides (L.) L.

4.2 Binomial iCAR model with data from Latimer et al. (2006)

In Appendix B of their article, Latimer et al. (2006) provide code to
fit what they call “Model 2”, a Binomial iCAR model using presence/absence data for
Protea cynaroides (L.) L., a common Protea and the national flower of South Africa
(Fig. 4.4).
For the purpose of their example, Latimer et al. (2006) provide data for a small region
including 476 one-minute by one-minute grid cells. This region is a small corner of South
Africa’s Cape Floristic Region, and includes very high plant species diversity and a World
Biosphere Reserve. Unlike the previous example, the data-set includes several visits
at the same site.

# Library
library(hSDM)

# Load data
data(datacells.Latimer2006,package="hSDM")
datacells.Latimer2006$cell <- c(1:dim(datacells.Latimer2006)[1])
data(neighbors.Latimer2006,package="hSDM")

# Format data
p <- datacells.Latimer2006$y[datacells.Latimer2006$n>0]
t <- datacells.Latimer2006$n[datacells.Latimer2006$n>0]
s <- datacells.Latimer2006$cell[datacells.Latimer2006$n>0]
data.obs <- datacells.Latimer2006[datacells.Latimer2006$n>0,]

# Model
Start <- Sys.time() # Start the clock
mod.hSDM.Lat2006.iCAR <- hSDM.binomial.iCAR(presences=p,
trials=t,
suitability=~rough+julmint+pptcv+smdsum+evi+ph1,
spatial.entity=s,
data=data.obs,
n.neighbors=datacells.Latimer2006$num,
neighbors=neighbors.Latimer2006,
suitability.pred=datacells.Latimer2006,
spatial.entity.pred=datacells.Latimer2006$cell,
burnin=5000,
mcmc=5000, thin=5,
beta.start=0,
Vrho.start=10,
priorVrho="1/Gamma",
mubeta=0, Vbeta=1.0E6,
shape=0.001, rate=0.001,
Vrho.max=1000,
seed=1234, verbose=1,
save.rho=0,save.p=0)
Time.hSDM <- difftime(Sys.time(),Start,units="sec") # Time difference

# Some outputs
summary(mod.hSDM.Lat2006.iCAR$rho.pred)
summary(mod.hSDM.Lat2006.iCAR$theta.latent)
summary(mod.hSDM.Lat2006.iCAR$theta.pred)

# Parameter estimates
summary(mod.hSDM.Lat2006.iCAR$mcmc)

## beta.pptcv -0.50551 0.3131 0.009900 0.07505
## beta.smdsum -0.05816 0.2581 0.008162 0.06383
## beta.evi -0.12120 0.2250 0.007116 0.03272
## beta.ph1 1.18415 0.3666 0.011594 0.07490
## Vrho 10.01343 1.6786 0.053082 0.14259
## Deviance 741.74700 23.2539 0.735353 1.33653
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) -2.2863 -1.98541 -1.85755 -1.75081 -1.5319
## beta.rough -0.3059 -0.09902 0.04951 0.18855 0.4036
## beta.julmint -1.0113 -0.81231 -0.70643 -0.59932 -0.2302
## beta.pptcv -1.1328 -0.70159 -0.50666 -0.29266 0.1193
## beta.smdsum -0.5741 -0.23491 -0.05943 0.12900 0.4283
## beta.evi -0.5819 -0.26339 -0.12097 0.01707 0.3810
## beta.ph1 0.5368 0.92446 1.17545 1.41019 1.9645
## Vrho 7.1367 8.79720 9.92110 11.08064 13.5447
## Deviance 695.1199 725.74487 742.33633 758.12635 786.8725

Unlike in the previous example, it was easier to reach convergence for the variance of
the spatial random effects here, because each site is visited several times and the data
therefore carry more information.

# BUGS model
modelBUGS2.txt <-
"model {

# Likelihood
for (i in 1:N_nonzeroy) {
y[ind[i]] ~ dbin(p[ind[i]], n[ind[i]])
}

for(i in 1:N_LOC){
logit(p[i]) <- rho[i]+xbeta[i]+mu
xbeta[i]<-beta[1]*rough[i] + beta[2]*julmint[i] + beta[3]*pptcv[i] +
beta[4]*smdsum[i] + beta[5]*evi[i] + beta[6]*ph1[i]
}

# CAR prior distribution for spatial random effects:

rho[1:N_LOC] ~ car.normal(adj[], weights[], num[], tau)
for(k in 1:sumNumNeigh) {
weights[k] <- 1 # set equal weights for all neighbors
}

# Other priors
mu ~ dnorm(0,0.1)
for (i in 1:6) {
beta[i] ~ dnorm(0, 0.2)
}
vrho <- 1/tau
tau ~ dgamma(0.001,0.001)
}"

# Write the model to modelBUGS2.txt in the working directory
writeLines(modelBUGS2.txt, "modelBUGS2.txt")

# Data for OpenBUGS

y <- datacells.Latimer2006$y
n <- datacells.Latimer2006$n
rough <- datacells.Latimer2006$rough
julmint <- datacells.Latimer2006$julmint
pptcv <- datacells.Latimer2006$pptcv
smdsum <- datacells.Latimer2006$smdsum
evi <- datacells.Latimer2006$evi
ph1 <- datacells.Latimer2006$ph1
num <- datacells.Latimer2006$num
adj <- neighbors.Latimer2006
ind <- which(datacells.Latimer2006$n!=0)
N_LOC <- length(y)
N_nonzeroy <- length(ind)
sumNumNeigh <- length(adj)

data <- list("y","n","rough","julmint","pptcv","smdsum",

"evi","ph1","num",
"adj","ind","N_LOC","N_nonzeroy","sumNumNeigh")

# Inits
inits <- list(list(mu=1,beta=rep(1.5,6),rho=rep(0,N_LOC),tau=1))

# OpenBUGS call
library(R2OpenBUGS)
Start <- Sys.time() # Start the clock
Open <- bugs(data,inits,
model.file="modelBUGS2.txt",
parameters=c("mu","beta","vrho"),
n.chains=1,
OpenBUGS.pgm="/usr/bin/OpenBUGS",
n.iter=2000,
n.burnin=1000,
n.thin=5,
DIC=TRUE,
debug=FALSE,
clearWD=FALSE)
Time.OpenBUGS <- difftime(Sys.time(),Start,units="sec") # Time difference

# Time difference
ratio.time <- as.numeric(Time.OpenBUGS)/as.numeric(Time.hSDM)

# Parameter estimates with OpenBUGS

print(Open$summary[,c(1,2)])

## mean sd
## mu -1.85205700 0.1598051
## beta[1] 0.03595761 0.1455205
## beta[2] -0.74706210 0.2354220
## beta[3] -0.49305137 0.2673517
## beta[4] -0.12947751 0.3064259
## beta[5] -0.14999565 0.1899236
## beta[6] 1.15288960 0.2608238
## vrho 9.59598700 1.5356158
## deviance 740.95200000 21.0887086

For this example, hSDM and OpenBUGS gave similar estimates for model parameters.
For the same number of iterations (10000), and for a relatively low number of grid cells
(476), hSDM was more than twice as fast as OpenBUGS.
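
To make "similar estimates" concrete, the sketch below compares the posterior means printed above for the parameters whose hSDM means appear in the output (the first rows of the hSDM summary are not shown). The values are copied from the two summaries. The mapping of beta[3] to beta[6] onto pptcv, smdsum, evi and ph1 follows the xbeta definition in the BUGS model. The standardized difference is only an informal yardstick, not a formal test.

```r
# Posterior means and SDs copied from the hSDM summary above
hSDM.mean <- c(pptcv = -0.50551, smdsum = -0.05816, evi = -0.12120,
               ph1 = 1.18415, Vrho = 10.01343)
hSDM.sd   <- c(pptcv = 0.3131, smdsum = 0.2581, evi = 0.2250,
               ph1 = 0.3666, Vrho = 1.6786)

# Posterior means copied from the OpenBUGS summary above (beta[3] to beta[6], vrho)
bugs.mean <- c(pptcv = -0.49305137, smdsum = -0.12947751, evi = -0.14999565,
               ph1 = 1.15288960, Vrho = 9.59598700)

# Difference between posterior means, in units of the hSDM posterior SD
std.diff <- (hSDM.mean - bugs.mean) / hSDM.sd
round(std.diff, 2)
```

All differences are well below one posterior standard deviation. The two fits do not use identical priors (Vbeta=1.0E6 in the hSDM call against dnorm(0, 0.2) in the BUGS model), so exact agreement is not expected.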

4.3 ZIB model with data from Latimer et al. (2006)

Because sites have been visited several times, the same data-set can be used to fit a ZIB
(zero-inflated binomial) model accounting for imperfect detection. If the observation
conditions differed from one visit to another, we would have to use the hSDM.siteocc()
function, which uses a mixture model combining two Bernoulli processes. In this case, the
observation conditions are not specified and can be assumed to be the same for all visits,
so we can use the hSDM.ZIB() function of the hSDM package. The hSDM.ZIB() function uses a
mixture model combining a Binomial process for observability and a Bernoulli process for
suitability.
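
The mixture can be written down directly. The sketch below is an illustration in base R, not the package's internal code. zib.lik is a hypothetical helper giving the probability of observing y presences in n visits when the site is suitable with probability theta and, if suitable, the species is detected at each visit with probability delta.

```r
# Hypothetical helper: marginal probability of y detections in n visits
# under a zero-inflated binomial (ZIB) mixture
#   z ~ Bernoulli(theta)              (suitability process)
#   y | z ~ Binomial(n, z * delta)    (observation process)
zib.lik <- function(y, n, theta, delta) {
  theta * dbinom(y, size = n, prob = delta) + (1 - theta) * (y == 0)
}

# With 3 visits, a zero can come from an unsuitable site or from
# three missed detections at a suitable one
zib.lik(0, n = 3, theta = 0.6, delta = 0.5)   # 0.6 * 0.125 + 0.4 = 0.475

# The probabilities over y = 0..n sum to one
sum(zib.lik(0:3, n = 3, theta = 0.6, delta = 0.5))
```

The second term only contributes when y is zero, which is why repeated visits are needed to separate low suitability from low detectability.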

# Model
mod.hSDM.Lat2006.ZIB <- hSDM.ZIB(presences=p,
trials=t,
suitability=~rough+julmint+pptcv+smdsum+evi+ph1,
observability=~1,
data=data.obs,
suitability.pred=datacells.Latimer2006,
burnin=1000,
mcmc=1000, thin=1,
beta.start=0,
gamma.start=0,
mubeta=0, Vbeta=1.0E6,
mugamma=0, Vgamma=1.0E6,
seed=1234, verbose=1,
save.p=0)

# Some outputs
summary(mod.hSDM.Lat2006.ZIB$prob.p.pred)
summary(mod.hSDM.Lat2006.ZIB$prob.p.latent)
summary(mod.hSDM.Lat2006.ZIB$prob.q.latent)

# Parameter estimates
summary(mod.hSDM.Lat2006.ZIB$mcmc)

##
## Iterations = 1001:2000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) 3.376e-01 0.22592 0.007144 0.024106
## beta.rough -7.023e-03 0.26532 0.008390 0.023803
## beta.julmint 8.855e-01 0.31619 0.009999 0.040362
## beta.pptcv -4.258e-01 0.33632 0.010635 0.044614
## beta.smdsum 6.609e-01 0.32246 0.010197 0.038242
## beta.evi -9.290e-01 0.30885 0.009767 0.027722
## beta.ph1 1.607e+00 0.34856 0.011022 0.035347
## gamma.(Intercept) 1.259e-01 0.03747 0.001185 0.003638
## Deviance 1.838e+03 4.15851 0.131504 0.408601
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) -0.06711 0.17945 3.156e-01 0.4737 0.7671
## beta.rough -0.56028 -0.17994 -2.721e-03 0.1628 0.4449
## beta.julmint 0.22677 0.69281 9.022e-01 1.1097 1.4876
## beta.pptcv -1.10536 -0.61322 -3.971e-01 -0.2145 0.2101
## beta.smdsum 0.05880 0.44203 6.429e-01 0.8748 1.3067
## beta.evi -1.63302 -1.14023 -8.925e-01 -0.6909 -0.4173
## beta.ph1 0.91858 1.38330 1.596e+00 1.8395 2.3208
## gamma.(Intercept) 0.04765 0.09901 1.235e-01 0.1497 0.1992
## Deviance 1831.80407 1834.72720 1.837e+03 1839.9540 1847.4970

# Detection probability
gamma.hat <- mean(mod.hSDM.Lat2006.ZIB$mcmc[,"gamma.(Intercept)"])
delta.est <- inv.logit(gamma.hat)
delta.est

## [1] 0.5314216

Using this type of model, we can estimate the detection probability of the species
(delta.est = 0.53).
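
The same back-transformation gives an interval for the detection probability. The sketch below uses the base R function plogis() as the inverse logit and the posterior summaries of gamma.(Intercept) copied from the output above. Because the inverse logit is monotone, transforming the 2.5% and 97.5% quantiles of gamma gives the corresponding quantiles of delta.

```r
# Posterior summaries of gamma.(Intercept) copied from the output above
gamma.mean <- 0.1259
gamma.q    <- c(lower = 0.04765, upper = 0.1992)   # 2.5% and 97.5% quantiles

# plogis() is the base R inverse-logit function
delta.est <- plogis(gamma.mean)
delta.ci  <- plogis(gamma.q)   # quantiles are preserved by a monotone transform

round(delta.est, 2)   # 0.53
round(delta.ci, 2)    # 0.51 0.55
```

This is the detection probability per visit under the intercept-only observability model used here.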

4.4 ZIB iCAR model with data from Latimer et al. (2006)

# Model
mod.hSDM.Lat2006.ZIB.iCAR <- hSDM.ZIB.iCAR(presences=p,
trials=t,
suitability=~rough+julmint+pptcv+smdsum+evi+ph1,
observability=~1,
spatial.entity=s,
data=data.obs,
n.neighbors=datacells.Latimer2006$num,
neighbors=neighbors.Latimer2006,
suitability.pred=datacells.Latimer2006,
spatial.entity.pred=datacells.Latimer2006$cell,
burnin=5000,
mcmc=5000, thin=5,
beta.start=0,
gamma.start=0,
Vrho.start=10,
priorVrho="Uniform",
mubeta=0, Vbeta=1.0E6,
mugamma=0, Vgamma=1.0E6,
shape=2, rate=1,
Vrho.max=10,
seed=1234, verbose=1,
save.rho=0,save.p=0)

# Some outputs
summary(mod.hSDM.Lat2006.ZIB.iCAR$prob.p.pred)
summary(mod.hSDM.Lat2006.ZIB.iCAR$prob.p.latent)
summary(mod.hSDM.Lat2006.ZIB.iCAR$prob.q.latent)

# Parameter estimates
summary(mod.hSDM.Lat2006.ZIB.iCAR$mcmc)

##
## Iterations = 5001:9996
## Thinning interval = 5
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) 0.7138 0.26719 0.008449 0.014766
## beta.rough 0.5643 0.39828 0.012595 0.046211
## beta.julmint -1.1857 0.59331 0.018762 0.069454
## beta.pptcv -0.7958 0.49922 0.015787 0.061846
## beta.smdsum -0.4783 0.62096 0.019636 0.088051
## beta.evi -0.5099 0.38857 0.012288 0.023951
## beta.ph1 1.3835 0.42089 0.013310 0.035438
## gamma.(Intercept) 0.1299 0.03698 0.001170 0.001329
## Vrho 9.1243 0.76955 0.024335 0.061263
## Deviance 1729.7004 10.98519 0.347382 0.845395
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) 0.21252 0.5418 0.7078 0.90554 1.22941
## beta.rough -0.15347 0.2965 0.5564 0.81978 1.41734
## beta.julmint -2.37556 -1.5903 -1.1581 -0.77204 -0.09995
## beta.pptcv -1.68890 -1.1230 -0.8478 -0.46587 0.26083
## beta.smdsum -1.56522 -0.8984 -0.5346 -0.08909 0.84548
## beta.evi -1.27733 -0.7641 -0.4957 -0.23088 0.17985
## beta.ph1 0.61176 1.0899 1.3566 1.67052 2.17121
## gamma.(Intercept) 0.06067 0.1043 0.1309 0.15590 0.19941
## Vrho 7.11220 8.7303 9.3438 9.73560 9.98959
## Deviance 1710.18198 1721.6650 1729.2054 1737.04701 1752.44302

# Detection probability
gamma.hat <- mean(mod.hSDM.Lat2006.ZIB.iCAR$mcmc[,"gamma.(Intercept)"])
delta.est <- inv.logit(gamma.hat)
delta.est

## [1] 0.5324361

4.5 Abundance models with data from Kéry & Andrew Royle (2010)

4.5.1 Presentation of the data

The data-set from Kéry & Andrew Royle (2010) includes repeated count data for the
Willow tit (Poecile montanus, a passerine bird, see Fig. 4.5) in Switzerland over the period
1999-2003. Data come from the Swiss national breeding bird survey MHB (Monitoring
Häufige Brutvögel). MHB is based on 264 1-km² sampling units (quadrats) laid out as a
grid (Fig. 4.6). Since 1999, every quadrat has been surveyed two to three times during most
breeding seasons (15 April to 15 July). The Willow tit is a widespread but moderately
rare bird species. It has a weak song and elusive behaviour and can be rather difficult to
detect.
This data-set is available in the hSDM R package. It can be loaded with the data
command and formatted for use with hSDM functions.
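
Because the species is hard to detect, raw counts underestimate abundance, which is the motivation for the N-mixture models fitted in section 4.5.3. The sketch below assumes the standard N-mixture formulation: a latent abundance N ~ Poisson(lambda), and each repeated count y_j ~ Binomial(N, delta). The hypothetical helper nmix.lik sums over the unobserved abundance up to an arbitrary bound K. It is an illustration in base R, not the package's implementation.

```r
# Hypothetical helper: likelihood of the repeated counts y at one site,
# marginalised over the latent abundance N (sum truncated at K)
nmix.lik <- function(y, lambda, delta, K = 200) {
  N <- max(y):K                     # N cannot be below the largest count
  p.N <- dpois(N, lambda)           # ecological process
  p.y <- sapply(N, function(n) prod(dbinom(y, size = n, prob = delta)))  # observation process
  sum(p.N * p.y)
}

# Likelihood of three visits with counts 2, 0 and 1
nmix.lik(c(2, 0, 1), lambda = 3, delta = 0.4)

# With a single visit, the count is Poisson with mean lambda * delta:
# abundance and detectability cannot be separated without repeated visits
all.equal(nmix.lik(2, lambda = 3, delta = 0.4), dpois(2, 3 * 0.4))
```

The single-visit check shows why a simple Poisson model fitted to counts estimates the product of abundance and detection probability rather than abundance itself.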

# Load libraries
library(hSDM)
library(sp)
library(raster)

# Load Kéry et al. 2010 data

data(data.Kery2010,package="hSDM")
head(data.Kery2010)

# Normalized variables
elev.mean <- mean(data.Kery2010$elevation)
elev.sd <- sd(data.Kery2010$elevation)
juldate.mean <- mean(c(data.Kery2010$juldate1,
data.Kery2010$juldate2,
data.Kery2010$juldate3),na.rm=TRUE)
juldate.sd <- sd(c(data.Kery2010$juldate1,
data.Kery2010$juldate2,
data.Kery2010$juldate3),na.rm=TRUE)
data.Kery2010$elevation <- (data.Kery2010$elevation-elev.mean)/elev.sd

Figure 4.5: Willow tit (Poecile montanus).

data.Kery2010$juldate1 <- (data.Kery2010$juldate1-juldate.mean)/juldate.sd

data.Kery2010$juldate2 <- (data.Kery2010$juldate2-juldate.mean)/juldate.sd
data.Kery2010$juldate3 <- (data.Kery2010$juldate3-juldate.mean)/juldate.sd

# Landscape and observation sites

sites.sp <- SpatialPointsDataFrame(coords=data.Kery2010[c("coordx","coordy")],
data=data.Kery2010[,-c(1,2)])
xmin <- min(data.Kery2010$coordx)
xmax <- max(data.Kery2010$coordx)
ymin <- min(data.Kery2010$coordy)
ymax <- max(data.Kery2010$coordy)
ext <- extent(c(xmin,xmax,ymin,ymax))
ncol <- round((xmax-xmin)/10)
nrow <- round((ymax-ymin)/10)
landscape <- raster(ncols=ncol,nrows=nrow,ext)
values(landscape) <- runif(ncell(landscape),0,1)
landscape.po <- rasterToPolygons(landscape)
plot(landscape.po)
plot(sites.sp,add=TRUE,col="red",pch=16)
# Neighborhood
# Rasters must be projected to correctly compute the neighborhood
crs(landscape) <- '+proj=utm +zone=1'
# Cell for each site
cells <- extract(landscape,sites.sp,cell=TRUE)[,1]
# Neighborhood matrix
ncells <- ncell(landscape)
neighbors.mat <- adjacent(landscape, cells=c(1:ncells), directions=8,
pairs=TRUE, sorted=TRUE)
# Number of neighbors by cell

n.neighbors <- as.data.frame(table(as.factor(neighbors.mat[,1])))[,2]
# Adjacent cells
adj <- neighbors.mat[,2]

# Arranging data
# data.obs
nsite <- length(data.Kery2010$coordx)
count <- c(data.Kery2010$count1,data.Kery2010$count2,data.Kery2010$count3)
juldate <- c(data.Kery2010$juldate1,data.Kery2010$juldate2,
data.Kery2010$juldate3)
site <- rep(1:nsite,3)
data.obs <- data.frame(count,juldate,site)
data.obs <- data.obs[!is.na(data.obs$juldate),]
# data.suit
data.suit <- data.Kery2010[c("coordx","coordy","elevation","forest")]
data.suit$cells <- cells
data.suit <- data.suit[-139,] # Removing site 139 with no juldate

4.5.2 Simple Poisson model

# hSDM.poisson
data.pois <- data.obs
data.pois$elevation <- data.suit$elevation[as.numeric(as.factor(data.obs$site))]
mod.Kery2010.pois <- hSDM.poisson(counts=data.pois$count,
suitability=~elevation+I(elevation^2),
data=data.pois,beta.start=0)

# Outputs
summary(mod.Kery2010.pois$mcmc)

##
## Iterations = 5001:14991
## Thinning interval = 10
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) 0.02814 0.06428 0.002033 0.003726
## beta.elevation 3.08127 0.15347 0.004853 0.017159
## beta.I(elevation^2) -1.79995 0.10235 0.003236 0.010673
## Deviance 2157.88058 2.40508 0.076055 0.116160
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) -0.09261 -0.01438 0.02764 0.07104 0.1562
## beta.elevation 2.80721 2.97833 3.07749 3.17837 3.3823
## beta.I(elevation^2) -2.00237 -1.86256 -1.79905 -1.73075 -1.6108
## Deviance 2155.19932 2156.14125 2157.27804 2158.97142 2164.3795

[Figure: map of the observation quadrats (points) on the landscape grid.]

Figure 4.6: Location of the 264 1-km² quadrats of the Swiss national breeding
bird survey. Points are located on a grid of 10-km² cells. The grid covers the
geographical extent of the observation points.

# Predictions
npred <- 100
nsamp <- dim(mod.Kery2010.pois$mcmc)[1]
# Abundance-elevation
elev.seq <- seq(500,3000,length.out=npred)
elev.seq.n <- (elev.seq-elev.mean)/elev.sd
beta <- as.matrix(mod.Kery2010.pois$mcmc[,1:3])
tbeta <- t(beta)
X <- matrix(c(rep(1,npred),elev.seq.n,elev.seq.n^2),ncol=3)
N <- matrix(NA,nrow=nsamp,ncol=npred)
for (i in 1:npred) {
N[,i] <- exp(X[i,] %*% tbeta)
}
N.est.pois <- apply(N,2,mean)
N.q1.pois <- apply(N,2,quantile,0.025)
N.q2.pois <- apply(N,2,quantile,0.975)

4.5.3 N-mixture model with imperfect detection

# hSDM.Nmixture
mod.Kery2010.Nmix <- hSDM.Nmixture(# Observations
counts=data.obs$count,
observability=~juldate+I(juldate^2),
site=data.obs$site,
data.observability=data.obs,
# Habitat
suitability=~elevation+I(elevation^2),
data.suitability=data.suit,
# Predictions
suitability.pred=NULL,
# Chains
burnin=10000, mcmc=5000, thin=5,
# Starting values
beta.start=0,
gamma.start=0,
# Priors
mubeta=0, Vbeta=1.0E6,
mugamma=0, Vgamma=1.0E6,
# Various
seed=1234, verbose=1,
save.p=0, save.N=0)

# Outputs
summary(mod.Kery2010.Nmix$mcmc)

##
## Iterations = 10001:14996
## Thinning interval = 5
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) 0.6726 0.08660 0.002739 0.007342
## beta.elevation 2.7879 0.19608 0.006200 0.023100
## beta.I(elevation^2) -1.7905 0.14436 0.004565 0.018337
## gamma.(Intercept) 0.2537 0.10346 0.003272 0.008333
## gamma.juldate -0.2253 0.08493 0.002686 0.004897
## gamma.I(juldate^2) 0.2658 0.08202 0.002594 0.005667
## Deviance 1887.5828 27.73003 0.876900 2.195556
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) 0.50177 0.6142 0.6744 0.7296 0.83792
## beta.elevation 2.41646 2.6443 2.7935 2.9233 3.17012
## beta.I(elevation^2) -2.07261 -1.8875 -1.7856 -1.6886 -1.51417
## gamma.(Intercept) 0.05384 0.1768 0.2596 0.3280 0.44389
## gamma.juldate -0.38921 -0.2820 -0.2230 -0.1683 -0.05706
## gamma.I(juldate^2) 0.09516 0.2089 0.2669 0.3209 0.41889
## Deviance 1837.06989 1868.2134 1885.6799 1906.7686 1942.45453

# Predictions
nsamp <- dim(mod.Kery2010.Nmix$mcmc)[1]
# Abundance-elevation
beta <- as.matrix(mod.Kery2010.Nmix$mcmc[,1:3])
tbeta <- t(beta)
N <- matrix(NA,nrow=nsamp,ncol=npred)
for (i in 1:npred) {
N[,i] <- exp(X[i,] %*% tbeta)
}
N.est.Nmix <- apply(N,2,mean)
N.q1.Nmix <- apply(N,2,quantile,0.025)
N.q2.Nmix <- apply(N,2,quantile,0.975)
# Detection-Julian date
juldate.seq <- seq(100,200,length.out=npred)
juldate.seq.n <- (juldate.seq-juldate.mean)/juldate.sd
gamma <- as.matrix(mod.Kery2010.Nmix$mcmc[,4:6])
tgamma <- t(gamma)
W <- matrix(c(rep(1,npred),juldate.seq.n,juldate.seq.n^2),ncol=3)
delta <- matrix(NA,nrow=nsamp,ncol=npred)
for (i in 1:npred) {
delta[,i] <- inv.logit(W[i,] %*% tgamma) # W is the Julian date design matrix
}
delta.est.Nmix <- apply(delta,2,mean)
delta.q1.Nmix <- apply(delta,2,quantile,0.025)
delta.q2.Nmix <- apply(delta,2,quantile,0.975)

4.5.4 N-mixture model with iCAR process

# hSDM.Nmixture.iCAR
mod.Kery2010.Nmix.iCAR <- hSDM.Nmixture.iCAR(# Observations
counts=data.obs$count,
observability=~juldate+I(juldate^2),
site=data.obs$site,
data.observability=data.obs,
# Habitat
suitability=~elevation+I(elevation^2),
data.suitability=data.suit,
# Spatial structure
spatial.entity=data.suit$cells,
n.neighbors=n.neighbors, neighbors=adj,
# Chains
burnin=20000, mcmc=10000, thin=10,
# Starting values
beta.start=0,
gamma.start=0,
Vrho.start=1,
# Priors
mubeta=0, Vbeta=1.0E6,
mugamma=0, Vgamma=1.0E6,
priorVrho="1/Gamma",
shape=1, rate=1,
# Various
seed=1234, verbose=1,
save.rho=0, save.p=0, save.N=0)

# Outputs
summary(mod.Kery2010.Nmix.iCAR$mcmc)

##
## Iterations = 20001:29991
## Thinning interval = 10
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) 0.3037 0.29294 0.009264 0.033554
## beta.elevation 2.1858 0.52575 0.016626 0.109007
## beta.I(elevation^2) -1.9443 0.31587 0.009989 0.051676
## gamma.(Intercept) -0.7980 0.17325 0.005479 0.048075
## gamma.juldate -0.1653 0.06898 0.002181 0.004864
## gamma.I(juldate^2) 0.1451 0.05597 0.001770 0.003551
## Vrho 15.2809 3.51718 0.111223 0.362266
## Deviance 1383.3447 45.30455 1.432656 7.774908
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) -0.26973 0.1131 0.2992 0.4970 0.8584
## beta.elevation 1.28847 1.8279 2.1260 2.4716 3.5030
## beta.I(elevation^2) -2.63988 -2.1388 -1.9197 -1.7269 -1.3687
## gamma.(Intercept) -1.10129 -0.9177 -0.8203 -0.6720 -0.4640
## gamma.juldate -0.29742 -0.2094 -0.1668 -0.1206 -0.0243
## gamma.I(juldate^2) 0.04049 0.1070 0.1458 0.1805 0.2670
## Vrho 9.69374 12.8070 14.8371 17.2770 23.6666
## Deviance 1300.66478 1351.5389 1381.0295 1411.1528 1479.4193

# Spatial random effects
rho.pred <- mod.Kery2010.Nmix.iCAR$rho.pred
r.rho.pred <- rasterFromXYZ(cbind(coordinates(landscape),rho.pred))
plot(r.rho.pred)
# Mean abundance by site
ma <- apply(sites.sp@data[,3:5],1,mean,na.rm=TRUE)
points(sites.sp,pch=".",cex=2)
points(sites.sp,pch=1,cex=ma/2)

# Predictions
nsamp <- dim(mod.Kery2010.Nmix.iCAR$mcmc)[1]
# Abundance-elevation
beta <- as.matrix(mod.Kery2010.Nmix.iCAR$mcmc[,1:3])
tbeta <- t(beta)
N <- matrix(NA,nrow=nsamp,ncol=npred)
# Simplified way of obtaining samples for rho
rho.samp <- sample(rho.pred,nsamp,replace=TRUE)
for (i in 1:npred) {
N[,i] <- exp(X[i,] %*% tbeta + rho.samp)
}
N.est.Nmix.iCAR <- apply(N,2,mean)
N.q1.Nmix.iCAR <- apply(N,2,quantile,0.025)
N.q2.Nmix.iCAR <- apply(N,2,quantile,0.975)

# Detection-Julian date
gamma <- as.matrix(mod.Kery2010.Nmix.iCAR$mcmc[,4:6])
tgamma <- t(gamma)
delta <- matrix(NA,nrow=nsamp,ncol=npred)
for (i in 1:npred) {
delta[,i] <- inv.logit(W[i,] %*% tgamma) # W is the Julian date design matrix
}
delta.est.Nmix.iCAR <- apply(delta,2,mean)
delta.q1.Nmix.iCAR <- apply(delta,2,quantile,0.025)
delta.q2.Nmix.iCAR <- apply(delta,2,quantile,0.975)

[Figure: map of estimated spatial random effects with observation quadrats overlaid.]

Figure 4.7: Estimated spatial random effects. Locations of observation quadrats are
represented by dots. The mean abundance on each quadrat is represented by a circle of
size proportional to abundance.

4.5.5 Comparing predictions from the three different models

# Expected abundance - Elevation

par(mar=c(4,4,1,1),cex=1.4,tcl=+0.5)
plot(elev.seq,N.est.pois,type="l",
xlim=c(500,3000),
ylim=c(0,7),
lwd=2,
xlab="Elevation (m a.s.l.)",
ylab="Expected abundance",
axes=FALSE)
#lines(elev.seq,N.q1.pois,lty=3,lwd=1)
#lines(elev.seq,N.q2.pois,lty=3,lwd=1)
axis(1,at=seq(500,3000,by=500),labels=seq(500,3000,by=500))
axis(2,at=seq(0,7,by=1),labels=seq(0,7,by=1))
# Nmix
lines(elev.seq,N.est.Nmix,lwd=2,col="red")
#lines(elev.seq,N.q1.Nmix,lty=3,lwd=1,col="red")
#lines(elev.seq,N.q2.Nmix,lty=3,lwd=1,col="red")
# Nmix.iCAR
lines(elev.seq,N.est.Nmix.iCAR,lwd=2,col="dark green")
#lines(elev.seq,N.q1.Nmix.iCAR,lty=3,lwd=1,col="dark green")
#lines(elev.seq,N.q2.Nmix.iCAR,lty=3,lwd=1,col="dark green")

# Detection probability - Julian date

par(mar=c(4,4,1,1),cex=1.4,tcl=+0.5)
plot(juldate.seq,delta.est.Nmix,type="l",
xlim=c(100,200),
ylim=c(0,1),
lwd=2,
col="red",
xlab="Julian date",
ylab="Detection probability",
axes=FALSE)
lines(juldate.seq,delta.q1.Nmix,lty=3,lwd=1,col="red")
lines(juldate.seq,delta.q2.Nmix,lty=3,lwd=1,col="red")
axis(1,at=seq(100,200,by=20),labels=seq(100,200,by=20))
axis(2,at=seq(0,1,by=0.2),labels=seq(0,1,by=0.2))
# Nmix.iCAR
lines(juldate.seq,delta.est.Nmix.iCAR,lwd=2,col="dark green")
lines(juldate.seq,delta.q1.Nmix.iCAR,lty=3,lwd=1,col="dark green")
lines(juldate.seq,delta.q2.Nmix.iCAR,lty=3,lwd=1,col="dark green")

Figure 4.8: Comparing predictions from the three different models. Left panel:
expected abundance against elevation (m a.s.l.). Right panel: detection probability
against Julian date. The three models are: Poisson (black), N-mixture (red) and
N-mixture with iCAR process (green). The solid lines represent the posterior predictive
mean of the abundance or of the detection probability, while the dotted lines represent
the 2.5% and 97.5% quantiles of the posterior predictive distribution given parameter
uncertainty.

CHAPTER 5

Some technical aspects of parameter inference

5.1 Likelihood for site-occupancy models

As previously detailed in the mathematical formulation of the site-occupancy model, let us
consider the random variable $z_i$ describing habitat suitability at site $i$. The random
variable $z_i$ takes value 1 if the habitat is suitable ($z_i = 1$) and 0 if it is not
($z_i = 0$). It can be assumed to follow a Bernoulli distribution with parameter
$\theta_i$, where $\theta_i$ is the probability that the habitat is suitable. Several
visits, at times $t_1$, $t_2$, etc., can occur at site $i$. Let us consider the random
variable $y_{it}$ representing the observed presence of the species at site $i$ and time
$t$. The species is observed at site $i$ ($\sum_t y_{it} \geq 1$) only if the habitat is
suitable ($z_i = 1$). The species is unobserved at site $i$ ($\sum_t y_{it} = 0$) if the
habitat is not suitable ($z_i = 0$), or if the habitat is suitable ($z_i = 1$) but the
probability $\delta_{it}$ of detecting the species at site $i$ and time $t$ is less than 1.
Given $H_i$, the set of observations (list of presences/absences) at site $i$, the
likelihood $L$ for site-occupancy models can be computed as follows (Eq. 5.1).

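\begin{equation}
\begin{aligned}
L &= \prod_i p(H_i) \\
\text{if } \sum_t y_{it} \geq 1: \quad p(H_i) &= p(z_i = 1) \prod_t p(y_{it}) \\
p(H_i) &= \theta_i \prod_t p(y_{it}) \\
&\quad \text{with } p(y_{it} = 1) = \delta_{it} \text{ and } p(y_{it} = 0) = 1 - \delta_{it} \\
\text{if } \sum_t y_{it} = 0: \quad p(H_i) &= p(z_i = 0) + p(z_i = 1) \prod_t p(y_{it} = 0) \\
p(H_i) &= (1 - \theta_i) + \theta_i \prod_t (1 - \delta_{it})
\end{aligned}
\tag{5.1}
\end{equation}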

For site-occupancy models, there is a strong advantage in visiting a site several times.
If the species has been observed at least once during the different visits, we can assert
that the habitat at this site is suitable. Any visit during which the species went
unobserved at this site can then only be due to imperfect detection. For more details,
please refer to the original paper by MacKenzie et al. (2002) and the very pedagogical
note by Bailey & Adams (2005).
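The likelihood of Eq. 5.1 can be sketched in a few lines of R. This is only an illustration of the formula, not the code used inside the hSDM package: the function name `siteocc_loglik` and its arguments are hypothetical, and the detection histories are assumed to be stored as a site-by-visit matrix with `NA` for missing visits.

```r
# Log-likelihood of a site-occupancy model, following Eq. 5.1 (illustrative sketch).
# y:     matrix of 0/1 observations, one row per site, one column per visit (NA = no visit)
# theta: vector of habitat suitability probabilities, one per site
# delta: matrix of detection probabilities, same dimensions as y
siteocc_loglik <- function(y, theta, delta) {
  # Probability of each observation given a suitable habitat:
  # delta if the species was seen, 1 - delta otherwise
  p_obs <- ifelse(y == 1, delta, 1 - delta)
  # Product over visits for each site (missing visits are ignored)
  p_hist <- apply(p_obs, 1, prod, na.rm = TRUE)
  # Sites where the species was observed at least once
  detected <- rowSums(y, na.rm = TRUE) >= 1
  # Eq. 5.1: theta * prod(...) if detected, (1 - theta) + theta * prod(1 - delta) otherwise
  p_H <- ifelse(detected, theta * p_hist, (1 - theta) + theta * p_hist)
  sum(log(p_H))
}

# Two sites visited three times: the species is seen once at site 1, never at site 2
y <- matrix(c(1, 0, 0,
              0, 0, 0), nrow = 2, byrow = TRUE)
theta <- c(0.5, 0.5)
delta <- matrix(0.5, nrow = 2, ncol = 3)
ll <- siteocc_loglik(y, theta, delta)
```

For the site without any observation, the two terms of the sum correspond to the two possible explanations: unsuitable habitat, or suitable habitat with the species missed at every visit.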

5.2 Random walk for estimating latent variables in N-mixture models

For N-mixture models, in the hSDM package, the abundances $N_i$ at site $i$ are considered
as latent variables. The $N_i$ are estimated using a (simple) reflecting random walk
Metropolis algorithm (Hastings, 1970). Hastings (1970) suggests the following probabilities
for the proposal value $N^\star$ (Eq. 5.2).

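\begin{equation}
\begin{aligned}
\text{If } N = 0: \quad & p(N^\star = 0 \mid N = 0) = 1/2, \quad p(N^\star = 1 \mid N = 0) = 1/2 \\
\text{If } N > 0: \quad & p(N^\star = i + 1 \mid N = i) = 1/2, \quad p(N^\star = i - 1 \mid N = i) = 1/2
\end{aligned}
\tag{5.2}
\end{equation}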

In practice, if the $N_i$ are small, this choice seems to work fairly well and fast to
approximate the probability distributions of the $N_i$, since it suffices that the chain
visits only the first few integers.
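A minimal sketch of such a reflecting random walk for a single site is given below. It assumes that the Poisson mean `lambda` and the detection probability `delta` are known and constant across visits, which is a simplification: in the full model they are updated together with the other parameters. The function names are hypothetical and the code is not taken from the hSDM source. Because the proposal of Eq. 5.2 is symmetric, the acceptance ratio reduces to the ratio of the target densities.

```r
# Unnormalised log posterior of the latent abundance N at one site,
# given repeated counts y, Poisson mean lambda and detection probability delta
log_post_N <- function(N, y, lambda, delta) {
  dpois(N, lambda, log = TRUE) + sum(dbinom(y, size = N, prob = delta, log = TRUE))
}

# Reflecting random walk Metropolis sampler for N (proposal of Eq. 5.2)
sample_N <- function(y, lambda, delta, n_iter = 2000) {
  N <- max(y)              # valid starting value: N cannot be below the largest count
  draws <- numeric(n_iter)
  for (k in seq_len(n_iter)) {
    step <- sample(c(-1, 1), 1)   # +1 or -1 with probability 1/2 each
    N_star <- max(N + step, 0)    # reflection: from N = 0, a downward step stays at 0
    log_ratio <- log_post_N(N_star, y, lambda, delta) - log_post_N(N, y, lambda, delta)
    # Proposals below max(y) have zero density and are always rejected
    if (log(runif(1)) < log_ratio) N <- N_star
    draws[k] <- N
  }
  draws
}

set.seed(1)
y_site <- c(2, 3, 1)
draws <- sample_N(y_site, lambda = 4, delta = 0.5)
```

The vector `draws` approximates the posterior distribution of the abundance at the site; its mean or quantiles can be computed after discarding the first iterations.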

The approach used in the hSDM package for estimating parameters of N-mixture
models is different from the one proposed by Royle (2004), where the summation over all
possible $N_i$ values in the likelihood computation is approximated by a finite sum (see
Eq. 3 in Royle (2004)). In practice, the summation over $N_i$ is restricted to a finite but
large bound $K$. $K$ should be set high enough so that it does not affect the parameter
estimates, but computation time increases with $K$. In a Bayesian framework with MCMC
methods, this approach (although leading to equivalent parameter estimates for large
values of $K$) is much slower than the approach considering latent variables. For a
comparison between the two approaches, a function called hSDM.Nmixture.K(), which uses
the approach by summation with bound K, is available in the hSDM package.
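The effect of the bound K can be illustrated with a short sketch of the marginal likelihood at one site. This is not the code of hSDM.Nmixture.K(): the function name `nmix_site_lik_K` is hypothetical, and the detection probability is assumed constant across visits to keep the example short.

```r
# Marginal likelihood of the counts y at one site, summing the latent abundance N
# from max(y) up to the bound K (terms with N < max(y) are zero)
nmix_site_lik_K <- function(y, lambda, delta, K) {
  stopifnot(K >= max(y))
  N <- max(y):K
  # For each candidate N: Poisson prior weight times Binomial probability of the counts
  p_y_given_N <- sapply(N, function(n) prod(dbinom(y, size = n, prob = delta)))
  sum(dpois(N, lambda) * p_y_given_N)
}

y_site <- c(2, 3, 1)
lik_K5   <- nmix_site_lik_K(y_site, lambda = 4, delta = 0.5, K = 5)
lik_K50  <- nmix_site_lik_K(y_site, lambda = 4, delta = 0.5, K = 50)
lik_K500 <- nmix_site_lik_K(y_site, lambda = 4, delta = 0.5, K = 500)
```

With a bound that is too low (K = 5 here), part of the probability mass is missing and the likelihood is underestimated. Beyond a sufficiently large K, the value no longer changes but each evaluation costs more.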

5.3 Adaptive Metropolis within Gibbs

Except for the variance of the spatial random effects of the iCAR models, for which we
proposed conjugate priors, we used an adaptive Metropolis algorithm (Metropolis et al.,
1953; Robert & Casella, 2004) within a Gibbs sampler (Casella & George, 1992; Gelfand &
Smith, 1990) to draw samples from the posterior distribution of the model parameters.

The proposal distribution in the Metropolis algorithm is a Normal distribution centered
on the current parameter value and with standard deviation $\sigma$. The standard deviation
$\sigma$ is set to 1 at the beginning of the MCMC and is continuously adjusted so that the
acceptance rate is 0.44 for non-hierarchical models (hSDM.binomial() and hSDM.poisson()
functions) and 0.234 for hierarchical models (other hSDM functions). These target
acceptance rates (0.44 for low-dimensional models and 0.234 for high-dimensional models)
improve the efficiency of the Metropolis algorithm and speed up MCMC convergence
(Roberts et al., 1997; Roberts & Rosenthal, 2009; Roberts et al., 2001).
The updated value $\sigma^\star$ of the standard deviation of the proposal distribution is
computed from the current acceptance rate $A$, the optimal acceptance rate $r$ (0.44 or
0.234) and the current standard deviation $\sigma$ (Eq. 5.3).

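\begin{equation}
\begin{aligned}
\text{if } A \geq r: \quad & \sigma^\star = \sigma \left( 2 - \frac{1 - A}{1 - r} \right) \\
\text{else}: \quad & \sigma^\star = \frac{\sigma}{2 - A / r}
\end{aligned}
\tag{5.3}
\end{equation}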

The tuning of the proposal is only done during the burn-in period. After the burn-in
period, the standard deviation of the proposal distribution is fixed at its current value.
The adaptive Metropolis within Gibbs algorithm is written in C and compiled to optimize
computational efficiency.
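The update rule of Eq. 5.3 is easy to check numerically. The function below is an R transcription of the equation for illustration only (the package itself performs this step in C), and the name `update_sigma` is hypothetical.

```r
# Update of the proposal standard deviation following Eq. 5.3
# sigma: current standard deviation, A: current acceptance rate,
# r: target acceptance rate (0.44 or 0.234)
update_sigma <- function(sigma, A, r) {
  if (A >= r) {
    sigma * (2 - (1 - A) / (1 - r))   # too many acceptances: widen the proposal
  } else {
    sigma / (2 - A / r)               # too few acceptances: narrow the proposal
  }
}

s_target <- update_sigma(1, A = 0.44, r = 0.44)  # acceptance rate on target
s_all    <- update_sigma(1, A = 1,    r = 0.44)  # every proposal accepted
s_none   <- update_sigma(1, A = 0,    r = 0.44)  # every proposal rejected
```

The rule leaves the standard deviation unchanged when the acceptance rate equals its target, at most doubles it when every proposal is accepted, and at most halves it when every proposal is rejected.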

5.4 Intrinsic conditional autoregressive (iCAR) model

To capture the spatial autocorrelation, we employ a Gaussian intrinsic conditional
autoregressive (iCAR) model (Besag, 1974). To specify this model, we assume that the
conditional distribution of the spatial random effect $\rho_j$ in cell $j$, given values
for the spatial random effect in all other cells $j' \neq j$, depends only on the spatial
random effects of the neighboring cells of $j$. Here, we specify that cell $j'$ is a
neighbor of $j$ if their boundaries intersect (Fig. 5.1). In the current version of the
iCAR process used in the hSDM R package, the spatial effect for any given cell depends
only on the values of $\rho$ for the cells in its neighborhood, and the neighborhood
encompasses only the eight immediately adjacent cells (“king movement” in chess). The
neighborhood could alternatively be defined to be larger, and different weights could be
assigned to cells at different distances. Formally, the Gaussian iCAR model for the
spatial random effect at cell $j$ can be presented by a conditional distribution (Eq. 5.4).

\begin{equation}
p(\rho_j \mid \rho_{j'}) \sim \mathrm{Normal}(\mu_j, V_\rho / n_j)
\tag{5.4}
\end{equation}

$\mu_j$: mean of $\rho_{j'}$ in the neighborhood of $j$.
$V_\rho$: variance of the spatial random effects.
$n_j$: number of neighbors for cell $j$.

Figure 5.1: Diagram of the grid cell neighborhood used in the intrinsic conditional
autoregressive (iCAR) models

The variance of the spatial random effects $V_\rho$ is also a parameter to be estimated.
We use a conjugate prior to infer $V_\rho$ and we propose two prior distributions: an
Inverse-Gamma distribution with shape and rate parameters, or a Uniform distribution with
zero for the lower bound of the interval and one parameter for the upper bound.
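As an illustration of Eq. 5.4, the sketch below computes the conditional mean and variance of the spatial random effect of one cell from the current values in its neighborhood. Only the iCAR prior is covered: within the full model, the update of $\rho_j$ also involves the likelihood of the data observed in cell $j$. The function name `icar_conditional` and the list-based storage of neighbors are assumptions made for the example.

```r
# Conditional mean and variance of rho_j given its neighbors (Eq. 5.4)
# j:         index of the cell
# rho:       current values of the spatial random effects for all cells
# neighbors: list giving, for each cell, the indices of its neighboring cells
# V_rho:     variance of the spatial random effects
icar_conditional <- function(j, rho, neighbors, V_rho) {
  nb <- neighbors[[j]]
  list(mean = mean(rho[nb]), var = V_rho / length(nb))
}

# Toy example with four cells on a line: 1 - 2 - 3 - 4
rho <- c(0.5, -0.2, 1.0, 0.3)
neighbors <- list(2, c(1, 3), c(2, 4), 3)
cond_2 <- icar_conditional(2, rho, neighbors, V_rho = 2)
cond_1 <- icar_conditional(1, rho, neighbors, V_rho = 2)
```

A cell with more neighbors has a smaller conditional variance, so its random effect is pulled more strongly towards the local mean.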

5.5 Difference between site-occupancy and ZIB models

Both site-occupancy and ZIB models (with the hSDM.siteocc() and hSDM.ZIB() functions
respectively) can be used to model the presence-absence of a species while taking into
account imperfect detection. The site-occupancy model can be used in all cases but can be
less convenient and slower to fit when the repeated visits at each site are made under the
exact same observation conditions. In this particular case, a Binomial distribution can be
used for the observation process and we suggest the use of a ZIB model for computational
efficiency (see example in Section 4.3).
In contrast, when the data set includes several visits at each site under different
observation conditions, a Bernoulli distribution must be used for the observation process
(not a Binomial distribution). In this case, ZIB models must not be used. For hSDM.ZIB()
functions, the fact that the observations are made at the same site is implicitly assumed
by the data structure (see the presences and trials arguments for each observation/site).
Thus, unlike hSDM.siteocc() functions, hSDM.ZIB() functions have no site argument to
specify the site for each observation.
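The difference in data structure can be made concrete with a small example. The sketch below collapses a table with one row per visit (the layout suited to a site-occupancy model) into one row per site with a number of presences and a number of trials (the layout suited to a ZIB model). The data and the column names `site` and `y` are made up for the illustration; `presences` and `trials` simply echo the argument names mentioned above. Such a collapse is only appropriate when all visits to a site share the same observation conditions.

```r
# One row per visit: site identifier and observed presence (1) or absence (0)
visits <- data.frame(
  site = c(1, 1, 1, 2, 2, 3),
  y    = c(0, 1, 1, 0, 0, 1)
)

# One row per site: number of visits with a detection and total number of visits
zib_data <- data.frame(
  site      = sort(unique(visits$site)),
  presences = as.vector(tapply(visits$y, visits$site, sum)),
  trials    = as.vector(tapply(visits$y, visits$site, length))
)
```

The order of the visits within a site is lost in the collapsed table, which is harmless only because the visits are treated as exchangeable.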

5.6 Difference between N-mixture and ZIP models
For count data with imperfect detection, both N-mixture and ZIP models can be used
(with the hSDM.Nmixture() and hSDM.ZIP() functions respectively). But the interpretation
of the underlying processes and the structure of the data that can be used differ between
the two models.

For the N-mixture model, the suitability process is modelled by a Poisson distribution.
In this case, we interpret the number of individuals at one site as a function of
environmental variables and we assume that there are more individuals when the habitat is
more suitable. In a second step, the observability process is modelled by a Binomial
distribution. We only see a fraction of the individuals present at one site due to
observation conditions (Eq. 5.5).
For the N-mixture model, several visits can occur at one site under different observation
conditions (see response variable $y$, explanatory variables $W$ and probability $\delta$,
all indexed on both $i$ and $t$).

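\begin{equation}
\begin{aligned}
&\text{Ecological process:} \\
&N_i \sim \mathrm{Poisson}(\lambda_i) \\
&\log(\lambda_i) = X_i \beta \\
&\text{Observation process:} \\
&y_{it} \sim \mathrm{Binomial}(N_i, \delta_{it}) \\
&\mathrm{logit}(\delta_{it}) = W_{it} \gamma
\end{aligned}
\tag{5.5}
\end{equation}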
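Simulating data from Eq. 5.5 helps to see the two levels of the N-mixture model. The sketch below uses arbitrary parameter values, one site covariate and one visit covariate, all of them made up for the illustration.

```r
set.seed(42)
n_site <- 200
n_visit <- 3

# Ecological process: abundance depends on a site covariate x (columns of X)
x <- rnorm(n_site)
beta <- c(1, 0.5)                       # arbitrary intercept and slope
lambda <- exp(beta[1] + beta[2] * x)
N <- rpois(n_site, lambda)              # latent abundance at each site

# Observation process: detection depends on a visit covariate w (columns of W)
w <- matrix(rnorm(n_site * n_visit), nrow = n_site, ncol = n_visit)
gamma <- c(0, -0.5)                     # arbitrary intercept and slope
delta <- plogis(gamma[1] + gamma[2] * w)

# Counts: at each visit, each of the N_i individuals is detected with probability delta_it
y <- matrix(rbinom(n_site * n_visit, size = rep(N, n_visit), prob = delta),
            nrow = n_site, ncol = n_visit)
```

Each row of `y` holds the repeated counts for one site. No count can exceed the latent abundance `N` of its site, and the counts vary between visits because the detection probability does.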

For the ZIP model, the suitability process is modelled by a Bernoulli distribution. In
this case, we interpret the habitat at a particular site to be suitable for the species
($z_i = 1$) or not ($z_i = 0$). Then, the process determining the number of individuals
observed at suitable sites (the abundance) is modelled by a Poisson distribution. Thus,
this second process can include both ecological and detection factors explaining the
abundance of the species at suitable sites (Eq. 5.6). Flores et al. (2009) provide a good
example of the application of a ZIP model to the distribution of tree saplings.

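\begin{equation}
\begin{aligned}
&\text{Suitability process:} \\
&z_i \sim \mathrm{Bernoulli}(\theta_i) \\
&\mathrm{logit}(\theta_i) = X_i \beta \\
&\text{Abundance process:} \\
&y_i \sim \mathrm{Poisson}(z_i \lambda_i) \\
&\log(\lambda_i) = W_i \gamma
\end{aligned}
\tag{5.6}
\end{equation}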
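The same kind of simulation for Eq. 5.6 highlights how the ZIP model differs: there is a single count per site, and zeros arise either from unsuitable habitat or from the Poisson process itself. Parameter values and covariates are again arbitrary choices made for the illustration.

```r
set.seed(42)
n_site <- 200

# Suitability process: habitat is suitable (1) or not (0)
x <- rnorm(n_site)
beta <- c(0, 1)                         # arbitrary intercept and slope
theta <- plogis(beta[1] + beta[2] * x)
z <- rbinom(n_site, size = 1, prob = theta)

# Abundance process: Poisson counts at suitable sites, structural zeros elsewhere
w <- rnorm(n_site)
gamma <- c(1, 0.5)                      # arbitrary intercept and slope
lambda <- exp(gamma[1] + gamma[2] * w)
y <- rpois(n_site, z * lambda)

# Zeros from unsuitable habitat versus zeros from the Poisson process at suitable sites
n_structural_zeros <- sum(z == 0)
n_sampling_zeros <- sum(z == 1 & y == 0)
```

Unlike in the N-mixture simulation, nothing in the data separates true abundance from detection: `lambda` describes the expected observed count at a suitable site.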

Note that ZIP models cannot be used when the data set includes several visits per site.
The likelihood of the ZIP models does not account for the fact that if the species is
observed at least once at one site during the visits, then the habitat at this site is
necessarily suitable. Thus, as with hSDM.ZIB() functions, hSDM.ZIP() functions do not
have a site argument to specify the site for each observation (unlike hSDM.Nmixture()
functions).

5.7 Difference between site and spatial.entity

For site-occupancy and N-mixture models taking into account both imperfect detection and
spatial correlation, the user must distinguish between the site argument, which indicates
the site where the repeated observations have been made, and the spatial.entity argument,
which indicates the spatial entity for the spatial correlation process. These two spatial
levels are clearly distinct: several sites (places visited) can be located in the same
spatial entity (region, state, etc.).
Of course, in some particular cases, the site and the spatial entity can coincide.
Nonetheless, it is recommended to choose a reasonable spatial scale (not too fine) for the
spatial correlation process. With a limited number of spatial entities, there can be more
observations in each spatial entity. This should increase the amount of information for
estimating the spatial random effects and also speed up the computation, as there are
fewer spatial random effects to estimate. But the number of spatial entities should also
be large enough to be able to estimate the variance of the spatial random effects. For
example, Maas & Hox (2005) suggest a minimum of 50 levels for a random effect factor.
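The two spatial levels can be illustrated with a toy data set. The sketch below uses made-up data and a hypothetical lookup vector `entity_of_site` to derive the spatial entity of each observation from its site and to count how much information each entity receives. How such vectors are then passed to the hSDM functions should be checked against their help pages.

```r
# Six visits to three sites (one row per observation)
obs <- data.frame(
  site = c(1, 1, 2, 2, 3, 3),   # site where each observation was made
  y    = c(0, 1, 0, 0, 2, 1)    # count recorded at each visit
)

# Hypothetical lookup: spatial entity (e.g. grid cell or region) containing each site.
# Sites 1 and 2 fall in entity 1, site 3 falls in entity 2.
entity_of_site <- c(1, 1, 2)

# Spatial entity of each observation, derived from its site
obs$spatial.entity <- entity_of_site[obs$site]

# Number of distinct sites and number of observations per spatial entity
sites_per_entity <- tapply(obs$site, obs$spatial.entity, function(s) length(unique(s)))
obs_per_entity <- table(obs$spatial.entity)
```

Tabulating the number of observations per entity in this way is a quick check that the chosen spatial scale is not so fine that most entities hold no data.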

5.8 Computing the neighborhood for iCAR model

Section to be written...

• raster package
• The landscape raster must be projected (otherwise, torus system)
• function adjacent()
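Pending the full section, the eight-cell neighborhood described in Section 5.4 can be sketched in base R for a regular grid. The function `grid_neighbors` is hypothetical and written only for this illustration; it assumes cells numbered row by row from the top-left corner, which is the numbering convention of the raster package, and treats the grid edges as hard boundaries (no wrap-around). For a real landscape raster, the adjacent() function of the raster package listed above is the tool to use.

```r
# Indices of the (up to) eight neighbors of every cell of an n_row x n_col grid,
# with cells numbered row by row from the top-left corner
grid_neighbors <- function(n_row, n_col) {
  # Row and column offsets of the eight surrounding cells
  offsets <- expand.grid(d_row = -1:1, d_col = -1:1)
  offsets <- offsets[!(offsets$d_row == 0 & offsets$d_col == 0), ]
  lapply(seq_len(n_row * n_col), function(k) {
    row_k <- (k - 1) %/% n_col + 1
    col_k <- (k - 1) %% n_col + 1
    rows <- row_k + offsets$d_row
    cols <- col_k + offsets$d_col
    # Drop neighbors falling outside the grid
    inside <- rows >= 1 & rows <= n_row & cols >= 1 & cols <= n_col
    sort((rows[inside] - 1) * n_col + cols[inside])
  })
}

nb <- grid_neighbors(3, 3)
n_neighbors <- lengths(nb)   # number of neighbors of each cell
adj <- unlist(nb)            # neighbor indices concatenated cell after cell
```

On a 3 x 3 grid, the corner cells have three neighbors, the edge cells five and the central cell eight. The pair of vectors `n_neighbors` and `adj` is one compact way of storing the neighborhood structure.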

5.9 Forecasting species distribution under future climate change

Section to be written...

• How to obtain predictions

• What about the spatial random effects, do we include them?

5.10 Computation time
When comparing OpenBUGS and hSDM outputs, computation times are given for guidance.
The computer used for the statistical analyses had four 2.5 GHz processors and 8 GB of
RAM. No parallelization is implemented in the Gibbs sampler, so only one processor is
used. The operating system installed on the computer was Linux Debian 9.0.

5.11 Package development, git and Sourceforge

Section to be written...

• Git repository: https://github.com/ghislainv/hSDM

• Web site on Sourceforge: https://ecology.ghislainv.fr/hSDM

• Number of lines of code

Development work to be done:

• Analytically estimate the latent variables in N-mixture models

• Probit link function for Binomial model

• Random site effect for observability process

• Multispecies approach (see jSDM R package)

CHAPTER 6

Conclusion

Section to be written...

• Advantages of hSDM

– User friendly
– Speed
– Can handle large data-sets

• Recommendations

– Fitting complex models implies the use of data sets providing sufficient information
(in number of observations, in number of repetitions, etc.).
– Users must be careful, especially with non-identifiable, over-parametrized models.
– Using hierarchical Bayesian species distribution models is only an option. Be
careful with “statistical machismo” (see
http://dynamicecology.wordpress.com/2012/09/11/statistical-machismo/ and Hodges &
Reich (2010) for example).

CHAPTER 7

Acknowledgements

Support was provided by Cirad and FRB (Fondation pour la Recherche sur la Biodiversité)
through the BioSceneMada project (project agreement AAP-SCEN-2013 I).

Bibliography

Araujo MB, Guisan A (2006) Five (or so) challenges for species distribution modelling.
Journal of Biogeography, 33, 1677–1688.

Bailey L, Adams MJ (2005) Occupancy Models to Study Wildlife. 2005-3096. U.S. Geological
Survey. URL http://fresc.usgs.gov/products/fs/fs2005-3096.pdf.

Bailey LL, Simons TR, Pollock KH (2004) Estimating site occupancy and species detection
probability parameters for terrestrial salamanders. Ecological Applications, 14, 692–702.

Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. Journal
of the Royal Statistical Society. Series B (Methodological), pp. 192–236.

Besag J, York J, Mollié A (1991) Bayesian image restoration, with two applications in
spatial statistics. Annals of the Institute of Statistical Mathematics, 43, 1–20.

Brezger A, Kneib T, Lang S (2005) Bayesx: Analyzing bayesian structural additive
regression models. Journal of Statistical Software, 14, 1–22. URL
http://www.jstatsoft.org/v14/i11.

Casella G, George EI (1992) Explaining the Gibbs Sampler. American Statistician, 46,
167–174.

Chelgren ND, Adams MJ, Bailey LL, Bury RB (2011) Using multilevel spatial models to
understand salamander site occupancy patterns after wildfire. Ecology, 92, 408–421.

Chen G, Kéry M, Plattner M, Ma K, Gardner B (2013) Imperfect detection is the rule
rather than the exception in plant distribution studies. Journal of Ecology, 101, 183–191.

Choquet R, Rouan L, Pradel R (2009) Program e-surge: a software application for fitting
multievent models. In: Modeling demographic processes in marked populations, pp. 845–
865. Springer.

Cressie NA, Cassie NA (1993) Statistics for spatial data, vol. 900. Wiley New York.

Dorazio RM, Royle JA, Soderstrom B, Glimskar A (2006) Estimating species richness and
accumulation by modeling species occurrence and detectability. Ecology, 87, 842–854.

Dormann CF, McPherson JM, Araujo M, et al. (2007) Methods to account for spatial
autocorrelation in the analysis of species distributional data: a review. Ecography, 30,
609–628. URL http://dx.doi.org/10.1111/j.2007.0906-7590.05171.x.

Elith J, Leathwick JR (2009) Species distribution models: Ecological explanation and
prediction across space and time. Annu. Rev. Ecol. Evol. Syst., 40, 677–697. URL
http://dx.doi.org/10.1146/annurev.ecolsys.110308.120159.

Fiske I, Chandler R (2011) unmarked: An R package for fitting hierarchical models of
wildlife occurrence and abundance. Journal of Statistical Software, 43, 1–23. URL
http://www.jstatsoft.org/v43/i10/.

Flores O, Rossi V, Mortier F (2009) Autocorrelation offsets zero-inflation in models of
tropical saplings density. Ecological Modelling, 220, 1797–1809.

Gelfand AE, Schmidt AM, Wu S, Silander JA, Latimer A, Rebelo AG (2005) Modelling
species diversity through species level hierarchical modelling. Journal of the Royal Sta-
tistical Society: Series C (Applied Statistics), 54, 1–20.

Gelfand AE, Smith AFM (1990) Sampling-Based Approaches to Calculating Marginal Den-
sities. Journal of American Statistical Association, 85, 398–409.

Gray TN (2012) Studying large mammals with imperfect detection: Status and habitat
preferences of wild cattle and large carnivores in eastern cambodia. Biotropica, 44,
531–536.

Guisan A, Thuiller W (2005) Predicting species distribution: offering more than simple
habitat models. Ecology Letters, 8, 993–1009.

Guisan A, Zimmermann NE (2000) Predictive habitat distribution models in ecology.
Ecological Modelling, 135, 147–186.

Hastings WK (1970) Monte carlo sampling methods using markov chains and their appli-
cations. Biometrika, 57, 97–109.

Hodges JS, Reich BJ (2010) Adding spatially-correlated errors can mess up the fixed effect
you love. The American Statistician, 64, 325–334. doi:10.1198/tast.2010.10052. URL
http://dx.doi.org/10.1198/tast.2010.10052.

Johnson DS, Conn PB, Hooten MB, Ray JC, Pond BA (2013) Spatial occupancy models
for large data sets. Ecology, 94, 801–808.

Keitt TH, Bjørnstad ON, Dixon PM, Citron-Pousty S (2002) Accounting for spatial pattern
when modeling organism-environment interactions. Ecography, 25, 616–625.

Kühn I, Bierman SM, Durka W, Klotz S (2006) Relating geographical variation in polli-
nation types to environmental and spatial factors using novel statistical methods. New
Phytologist, 172, 127–139.

Kéry M, Andrew Royle J (2010) Hierarchical modelling and estimation of abundance and
population trends in metapopulation designs. Journal of Animal Ecology, 79, 453–461.

Kéry M, Gardner B, Monnerat C (2010) Predicting species distributions from checklist
data using site-occupancy models. Journal of Biogeography, 37, 1851–1862.

Kéry M, Royle JA, Schmid H (2005) Modeling avian abundance from replicated counts
using binomial mixture models. Ecological applications, 15, 1450–1461.

Kéry M, Schaub M (2012) Bayesian population analysis using WinBUGS: a hierarchical
perspective. Academic Press.

Kéry M, Schmidt BR (2008) Imperfect detection and its consequences for monitoring for
conservation. Community Ecology, 9, 207–216.

Lahoz-Monfort JJ, Guillera-Arroita G, Wintle BA (2014) Imperfect detection impacts the
performance of species distribution models. Global Ecology and Biogeography, 23, 504–
515. doi:10.1111/geb.12138. URL http://dx.doi.org/10.1111/geb.12138.

Latimer AM, Wu SS, Gelfand AE, Silander JA (2006) Building statistical models to analyze
species distributions. Ecological Applications, 16, 33–50.

Lee D (2013) Carbayes: An r package for bayesian spatial modeling with conditional
autoregressive priors. Journal of Statistical Software, 55. URL
http://www.jstatsoft.org/v55/i13.

Legendre P (1993) Spatial autocorrelation: trouble or new paradigm? Ecology, 74, 1659–
1673.

Lichstein JW, Simons TR, Shriner SA, Franzreb KE (2002) Spatial autocorrelation and
autoregressive models in ecology. Ecological Monographs, 72, 445–463.

Lunn D, Spiegelhalter D, Thomas A, Best N (2009) The bugs project: Evolution, critique
and future directions. Statistics in medicine, 28, 3049–3067.

Maas CJ, Hox JJ (2005) Sufficient sample sizes for multilevel modeling. Methodology:
European Journal of Research Methods for the Behavioral and Social Sciences, 1, 86.

MacKenzie DI (2006) Occupancy estimation and modeling: inferring patterns and dynamics
of species occurrence. Academic Press.

MacKenzie DI, Nichols JD, Lachman GB, Droege S, Andrew Royle J, Langtimm CA (2002)
Estimating site occupancy rates when detection probabilities are less than one. Ecology,
83, 2248–2255.

Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of
state calculations by fast computing machines. The journal of chemical physics, 21,
1087–1092.

Miller J, Franklin J, Aspinall R (2007) Incorporating spatial dependence in predictive
vegetation models. Ecological Modelling, 202, 225–242.

Monk J (2014) How long should we ignore imperfect detection of species in the marine
environment when modelling their distribution? Fish and Fisheries, 15, 352–358.

Nichols JD (1992) Capture-recapture models. BioScience, pp. 94–102.

Poley LG, Pond BA, Schaefer JA, Brown GS, Ray JC, Johnson DS (2014) Occupancy
patterns of large mammals in the far north of ontario under imperfect detection and
spatial autocorrelation. Journal of Biogeography, 41, 122–132.

R Development Core Team (2014) R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
ISBN 3-900051-07-0.

Robert CP, Casella G (2004) Monte Carlo statistical methods, vol. 319. Citeseer.

Roberts GO, Gelman A, Gilks WR, et al. (1997) Weak convergence and optimal scaling of
random walk metropolis algorithms. The annals of applied probability, 7, 110–120.

Roberts GO, Rosenthal JS (2009) Examples of adaptive mcmc. Journal of Computational
and Graphical Statistics, 18, 349–367.

Roberts GO, Rosenthal JS, et al. (2001) Optimal scaling for various metropolis-hastings
algorithms. Statistical science, 16, 351–367. doi:10.1214/ss/1015346320.

Rota CT, Fletcher RJ, Evans JM, Hutto RL (2011) Does accounting for imperfect detection
improve species distribution models? Ecography, 34, 659–670.

Royle JA (2004) N-mixture models for estimating population size from spatially replicated
counts. Biometrics, 60, 108–115.

Royle JA, Dorazio RM (2008) Hierarchical modeling and inference in ecology: the analysis
of data from populations, metapopulations and communities. Academic Press.

Royle JA, Dorazio RM, Link WA (2007) Analysis of multinomial models with unknown
index using data augmentation. Journal of Computational and Graphical Statistics, 16,
67–85.

Rue H, Martino S, Chopin N (2009) Approximate bayesian inference for latent gaussian
models by using integrated nested laplace approximations. Journal of the royal statistical
society: Series b (statistical methodology), 71, 319–392.

Sinclair SJ, White MD, Newell GR (2010) How useful are species distribution models for
managing biodiversity under future climates? Ecology and Society, 15, 8.

Smith SI (1868) The geographical distribution of animals. The American Naturalist, 2,
pp. 124–131. URL http://www.jstor.org/stable/2447129.

Sokal RR, Oden NL (1978) Spatial autocorrelation in biology: 2. some biological implica-
tions and four applications of evolutionary and ecological interest. Biological Journal
of the Linnean Society, 10, 229–249.

Stan Development Team (2014) Stan Modeling Language Users Guide and Reference Manual,
Version 2.2. URL http://mc-stan.org/.

Thuiller W (2014) Editorial commentary on ‘biomod – optimizing predictions of species
distributions and projecting potential future shifts under global change’. Global Change
Biology, 20, 3591–3592. doi:10.1111/gcb.12728. URL http://dx.doi.org/10.1111/gcb.12728.

Wallace AR (1876) The geographical distribution of animals: with a study of the relations of
living and extinct faunas as elucidating the past changes of the earth’s surface. Macmillan
& Co., London.

White GC, Burnham KP (1999) Program mark: survival estimation from populations of
marked animals. Bird study, 46, S120–S139.

Williams BK, Nichols JD, Conroy MJ (2002) Analysis and management of animal popula-
tions: modeling, estimation, and decision making. Academic Press.

No ratings yet
Grade 5 - Week 13 - Science Questions
4 pages
Species Distribution Models: Ecological Explanation and Prediction Across Space and Time
No ratings yet
Species Distribution Models: Ecological Explanation and Prediction Across Space and Time
24 pages
Es 40 Elith
No ratings yet
Es 40 Elith
20 pages
Research: When Is Variable Importance Estimation in Species Distribution Modelling Affected by Spatial Correlation?
No ratings yet
Research: When Is Variable Importance Estimation in Species Distribution Modelling Affected by Spatial Correlation?
11 pages
2024 Acuvue Price List
No ratings yet
2024 Acuvue Price List
2 pages
Species Distribution Modelling Vith R
100% (1)
Species Distribution Modelling Vith R
72 pages
Specification 201 Quality Systems 14 April 2016.RCN-D1623234100
No ratings yet
Specification 201 Quality Systems 14 April 2016.RCN-D1623234100
59 pages
Exercises
No ratings yet
Exercises
9 pages
30 Day Diabetic Mealplan PDF
50% (2)
30 Day Diabetic Mealplan PDF
1 page
Modelling Distribution and Abundance With Presence-Only Data
No ratings yet
Modelling Distribution and Abundance With Presence-Only Data
8 pages
Practical 8
No ratings yet
Practical 8
4 pages
Annexure A - 1 - BOQ, ADMIN (Editable Excel File)
No ratings yet
Annexure A - 1 - BOQ, ADMIN (Editable Excel File)
50 pages
SAR Data Access and Availability One-Pager
No ratings yet
SAR Data Access and Availability One-Pager
2 pages
Save Time & Effort and Avoid Risk: Werum PAS-X MES Helps You To Digitize Your Pharma and Biotech Production
No ratings yet
Save Time & Effort and Avoid Risk: Werum PAS-X MES Helps You To Digitize Your Pharma and Biotech Production
16 pages
High Current Linear Regulated Bench Power Supply
No ratings yet
High Current Linear Regulated Bench Power Supply
14 pages
Minion Pro
No ratings yet
Minion Pro
35 pages
Diversity: Muddy Boots Beget Wisdom: Implications For Rare or Endangered Plant Species Distribution Models
No ratings yet
Diversity: Muddy Boots Beget Wisdom: Implications For Rare or Endangered Plant Species Distribution Models
11 pages
Modeling of Species Distributions With Maxent: New Extensions and A Comprehensive Evaluation
No ratings yet
Modeling of Species Distributions With Maxent: New Extensions and A Comprehensive Evaluation
15 pages
Role of UN and International NGOs in Global Health Governance - Edited
No ratings yet
Role of UN and International NGOs in Global Health Governance - Edited
3 pages
Guisan 2000 - Predictive Habitat Distribution Models in Ecology
No ratings yet
Guisan 2000 - Predictive Habitat Distribution Models in Ecology
40 pages
Handbook-Riser-Design - Clamps PDF
67% (3)
Handbook-Riser-Design - Clamps PDF
46 pages
VVCSL Seafarers Health Self Declaration With COVID 19 Vaccine and Testing and Temperature Control Form
No ratings yet
VVCSL Seafarers Health Self Declaration With COVID 19 Vaccine and Testing and Temperature Control Form
3 pages
A Trek Through Time - The History of Trek Bicycles
No ratings yet
A Trek Through Time - The History of Trek Bicycles
5 pages
Newtxdoc Font Package
No ratings yet
Newtxdoc Font Package
14 pages
VLSI System Design
No ratings yet
VLSI System Design
91 pages
Predictive Habitat Distribution Models in Ecology
No ratings yet
Predictive Habitat Distribution Models in Ecology
40 pages
Predicting Species' Geographic Distributions Based On Ecological Niche Modeling
No ratings yet
Predicting Species' Geographic Distributions Based On Ecological Niche Modeling
7 pages
SIS ESD Sistems For Process Industries Using IEC 61508 Unit7 SIL Selection
100% (1)
SIS ESD Sistems For Process Industries Using IEC 61508 Unit7 SIL Selection
100 pages
Entanglement - The Greatest Mystery in Physics - A. Aczel (2001) WW
100% (11)
Entanglement - The Greatest Mystery in Physics - A. Aczel (2001) WW
303 pages
Mega Project Interface Management
100% (3)
Mega Project Interface Management
3 pages