
Equivalence of MAXENT and Poisson Point Process Models for Species Distribution Modeling in Ecology

Article in Biometrics · February 2013


DOI: 10.1111/j.1541-0420.2012.01824.x · Source: PubMed

Biometrics 69, 274–281, March 2013
DOI: 10.1111/j.1541-0420.2012.01824.x

Equivalence of MAXENT and Poisson Point Process Models for Species Distribution Modeling in Ecology

Ian W. Renner∗ and David I. Warton


School of Mathematics and Statistics and Evolution & Ecology Research Centre,
The University of New South Wales, NSW 2052, Australia.

email: [email protected]

Summary. Modeling the spatial distribution of a species is a fundamental problem in ecology. A number of modeling methods
have been developed, an extremely popular one being MAXENT, a maximum entropy modeling approach. In this article,
we show that MAXENT is equivalent to a Poisson regression model and hence is related to a Poisson point process model,
differing only in the intercept term, which is scale-dependent in MAXENT. We illustrate a number of improvements to
MAXENT that follow from these relations. In particular, a point process model approach facilitates methods for choosing
the appropriate spatial resolution, assessing model adequacy, and choosing the LASSO penalty parameter, all currently
unavailable to MAXENT. The equivalence result represents a significant step in the unification of the species distribution
modeling literature.
Key words: Habitat modeling; Location-only; Maximum entropy; Poisson likelihood; Presence-only data; Use-availability.

© 2013, The International Biometric Society

1. Introduction and Background

Species distribution modeling (SDM), where the goal is to explain the occurrence of a species using a set of environmental variables, is an important goal in ecology. This is a fast-moving field; in fact ISI's Essential Science Indicators (July 2012) identifies SDM as one of the top five ranked research fronts in ecology and the environmental sciences. One potential reason for such high interest is that SDM aims to address important topical questions such as the potential effects of climate change on species distributions (Thullier et al., 2008). Rapid progress in this field has been facilitated by recent significant technological advances in remote sensing, GIS (O'Sullivan and Unwin, 2010), and computational power, enabling models to be built at increasingly fine resolutions and increasingly large spatial scales.

Ideally, an SDM could be constructed using systematically collected presence/absence data so that logistic regression (McCullagh and Nelder, 1989) and its extensions (Hastie and Tibshirani, 1990; Elith, Leathwick, and Hastie, 2008) may be used. However, the best available data often come not from systematic surveys but from lists of locations where a species is reported to be present, with no corresponding information about where a species is reported to be absent (Pearce and Boyce, 2006). This type of data, known as "presence-only" data, is typically found in museums, herbaria, and atlases (Pearce and Boyce, 2006).

An example used throughout this article is a list of 95 locations (NSW Office of Environment and Heritage, 2010) of Sydney eucalypt Corymbia eximia observed between 1990 and 2008 within 100 km of the Greater Blue Mountains World Heritage Area, Australia (Figure 1a). We would like to model the distribution of C. eximia as a function of climatic and fire history variables in order to explore the nature of association of each of these variables with occurrence of this species.

MAXENT (Phillips, Anderson, and Schapire, 2006), based on a maximum entropy approach, is particularly common in SDM, having been cited 378 times in 2011 according to Google Scholar. Its rise in popularity has been meteoric, having only been introduced to ecology 6 years ago, although the concept of maximum entropy modeling has been around for a long time (Jaynes, 1957). A comprehensive study of current SDM methods found MAXENT to outperform nearly all other methods (Elith et al., 2006), and this may explain its prevalence in the literature. Nevertheless, MAXENT has a number of shortcomings, as demonstrated in Sections 3 and 4. In particular, it is unclear what diagnostic tools may be used to assess whether the fitted model is reasonable. Moreover, MAXENT analyzes data after first aggregating them into presence/absence grid cells (as in Figure 1a), and it is currently unclear what spatial resolution should be used when constructing these grid cells. Further, some key components of the output such as the intercept and fitted probabilities are dependent on this choice of spatial resolution ("scale-dependence").

[Figure 1 image not reproduced: panels (a) and (b), each showing a point process map (left) and a MAXENT map (right).]

Figure 1. Comparison of point process and MAXENT analyses of Corymbia eximia data. (a) Response variables for C. eximia: a Poisson point process model or an area-interaction point process model (left) analyzes presence points $y_P = \{y_1, \ldots, y_m\}$; MAXENT (right) analyzes presence/absence in grid cells $\{g_1, \ldots, g_n\}$, with $n = 258$ here. A key issue with MAXENT is determining how many grid cells $n$ to use for analysis. (b) Predicted species distribution maps for an area-interaction model (left) and MAXENT (right). This figure appears in color in the electronic version of this article.

In this article we show that MAXENT is mathematically equivalent to Poisson regression (McCullagh and Nelder, 1989) and related to a Poisson point process model (Warton and Shepherd, 2010). Relationships between maximum likelihood and maximum entropy have been known for a long time: this relationship was explored for exponential families in the late 1950s (Kullback, 1959), while an equivalence for contingency tables was established in 1963 (Good, 1963), and maximum entropy was later linked to the maximum likelihood of a Gibbs distribution (Della Pietra, Della Pietra, and Lafferty, 1997). Nonetheless, the direct link we make in this article between MAXENT and Poisson point process models is new. Warton and Shepherd (2010) introduced Poisson point process models as a way to address "problems of model specification, interpretation, and implementation" inherent in pseudo-absence regression, another popular method of SDM. This article achieves a similar goal in relation to MAXENT: all of the problems described in Sections 3 and 4 can be addressed by reframing the problem using a Poisson point process model. Section 2 demonstrates the equivalence of Poisson point process models and MAXENT. Section 3 demonstrates by example how this equivalence can improve on current practice in MAXENT modeling. Finally, Section 4 demonstrates that these proposed improvements can lead to more accurate predictions of a species' actual distribution.
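The equivalence with Poisson regression claimed above can be previewed numerically before the formal statement in Section 2. The sketch below is illustrative only: the synthetic data, the function name `fit_poisson_regression`, and the plain Newton solver are assumptions made here, not the authors' code. It fits a log-linear Poisson regression to a response equal to 1/m in m "presence" cells and 0 elsewhere, then checks that the fitted means sum to one and reproduce the covariate means over the presence cells, which are the two constraints that define the maximum entropy fit in Section 2.

```python
import numpy as np

def fit_poisson_regression(X, z, n_iter=100, tol=1e-12):
    """Maximize sum(z * ln(mu) - mu) with ln(mu) = X @ beta by Newton's method."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(z.mean())              # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (z - mu)              # gradient of the Poisson log-likelihood
        if np.max(np.abs(score)) < tol:
            break
        hessian = X.T @ (X * mu[:, None])   # negative second derivative
        beta = beta + np.linalg.solve(hessian, score)
    return beta

# Synthetic grid: n cells, p covariates, m presence cells (all hypothetical).
rng = np.random.default_rng(0)
n, p, m = 400, 2, 30
covariates = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), covariates])   # leading 1 gives the intercept

probs = np.exp(0.5 * covariates[:, 0])
probs /= probs.sum()
presence = np.zeros(n, dtype=bool)
presence[rng.choice(n, size=m, replace=False, p=probs)] = True
z = presence / m                                # response: 1/m or 0

beta_hat = fit_poisson_regression(X, z)
pi_hat = np.exp(X @ beta_hat)                   # fitted means, one per cell

print("sum of fitted means:", pi_hat.sum())
print("fitted covariate means:  ", covariates.T @ pi_hat)
print("presence covariate means:", covariates[presence].mean(axis=0))
```

The two checks hold because the Poisson score equations with an intercept force the fitted means to match the response in total and in each covariate moment; the noninteger response is not a problem for the optimizer.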
2. Equivalence of MAXENT and Poisson Point Process Models

The goal of SDM is to link the location of species presences to some number $p$ of environmental variables. Let $y_P = \{y_1, \ldots, y_m\}$ be presence-only locations for a particular species over some region $A$ and $x(y) = \{1, x_1(y), \ldots, x_p(y)\}$ be the vector of $p$ environmental variables corresponding to location $y$ in the study region $A$. We fit an SDM by regressing the $y \in A$ against the $x(y)$, using one of a few methods.

Rather than using the presence-only locations $y_P = \{y_1, \ldots, y_m\}$, the MAXENT procedure analyzes data by splitting $A$ into $n$ grid cells with centers at the locations in $g = \{g_1, \ldots, g_n\}$. A binary response vector $z^{(n)}(g) = \{z^{(n)}(g_1), \ldots, z^{(n)}(g_n)\}$ is formed where $z^{(n)}(g_i) = 1/m^{(n)}$ if the $i$th grid cell contains at least one presence location and 0 otherwise, and $m^{(n)}$ is the count of grid cells that contain at least one presence location. Without loss of generality, we partition $\{g_1, \ldots, g_n\}$ as $\{g_P, g_0\}$, where $g_P = \{g_1, \ldots, g_{m^{(n)}}\}$ are the $m^{(n)}$ presence cells. We index $z$ and $m$ with the superscript $(n)$ to emphasize that these quantities depend on the spatial resolution (hence the number of grid cells $n$) used in analysis. The goal in MAXENT is to model $\pi(g_i)$, the probability that if there is one presence then it is located in the $i$th grid cell. $\pi(g) = \{\pi(g_1), \ldots, \pi(g_n)\}$ is estimated to maximize the entropy $H\{\pi(g)\} = -\sum_{i=1}^{n} \pi(g_i) \ln \pi(g_i)$, subject to two types of constraint:

$$\sum_{i=1}^{n} \pi(g_i)\, x_j(g_i) = \frac{1}{m^{(n)}} \sum_{i=1}^{m^{(n)}} x_j(g_i), \quad \forall j, \tag{1}$$

$$\sum_{i=1}^{n} \pi(g_i) = 1. \tag{2}$$

Equation (1) ensures that the predicted mean of each environmental variable equals its observed mean for the presence data while (2) ensures that the probabilities add to 1.

We will show that the MAXENT procedure is equivalent to Poisson regression when applied to grid cell data $z^{(n)}(g)$. That is, we model the mean $\mu_i = \mu(g_i)$ of $z^{(n)}(g_i)$ as a log-linear model:

$$\ln \mu_i = x(g_i)^\top \beta. \tag{3}$$

We estimate the parameters $\beta$ to maximize the likelihood function (McCullagh and Nelder, 1989):

$$l\{\beta; z^{(n)}(g)\} = \sum_{i=1}^{n} z^{(n)}(g_i) \ln \mu(g_i) - \sum_{i=1}^{n} \mu(g_i) - \sum_{i=1}^{n} \ln\{z^{(n)}(g_i)!\}. \tag{4}$$

On face value, this analysis appears to be based on a nonsensical model for the data, as it implicitly assumes that a set of noninteger values comes from a Poisson distribution. However, we will show first that this is precisely what MAXENT does and later that this can be motivated as a point process model, which can be fitted for a noninteger response using the result of Berman and Turner (1992).

Theorem 1. The MAXENT procedure and Poisson regression are equivalent. That is,

1. They fit the same model:

$$\ln \pi(g_i) = \ln \mu(g_i) = x(g_i)^\top \beta.$$

2. They estimate parameters to maximize the same function up to a constant:

$$\Lambda\{\beta; z^{(n)}(g)\} = l\{\beta; z^{(n)}(g)\} + C,$$

where $C$ is a constant and $\Lambda\{\beta; z^{(n)}(g)\}$ is the Lagrangian function to maximize entropy $H\{\pi(g)\}$ subject to the constraints stated in equations (1)–(2). Hence the maximum entropy estimate $\hat{\beta}_{\text{MAXENT}}$ equals the maximum likelihood estimate from Poisson regression $\hat{\beta}_{\text{GLM}}$.

The proof of Theorem 1 appears in Web Appendix 1. Part 1 of Theorem 1 (that MAXENT fits a log-linear model) is well known (e.g. Dutta, 1966), but Part 2 (the link to Poisson regression) is new. This link to Poisson likelihood was enabled by specifying the MAXENT model in a slightly different way to what is conventional in the maximum entropy literature. It is typical to exclude the intercept term from the model and introduce a normalization constant in its place after optimization to ensure that the sum of $\pi$ is one. Instead, we included an intercept term and the constraint of equation (2) in the optimization problem, which was the key to our derivation. Hence we have shown that some maximum entropy problems, including MAXENT, can be solved using standard generalized linear modeling software via Poisson regression, which can accommodate a large number of predictors. We demonstrate this result numerically in Web Figure 1. Further, this enables a link with Poisson point process models below.

A Poisson point process regression model (PPM) analyzes $m$ presence-only locations $y_P = \{y_1, \ldots, y_m\}$ as a point process in which the locations of the $m$ points are assumed to be independent. Unlike MAXENT, which models probability $\pi(g_i)$ per grid cell, a Poisson PPM models the limiting expected count ($\lambda(y)$, the "intensity") per unit area (Cressie, 1993) for any location $y \in A$. Intensity is modeled as a log-linear function of $p$ explanatory variables: $\ln\{\lambda(y)\} = x(y)^\top \beta$. An analysis on a per area basis rather than a per grid cell basis is a key distinction between a point process model and MAXENT.

The log-likelihood of a Poisson point process model (Cressie, 1993) is:

$$l(\beta; y_P) = \sum_{i=1}^{m} \ln \lambda(y_i) - \int_{y \in A} \lambda(y)\, dy - \ln(m!). \tag{5}$$

By using numerical quadrature (Davis and Rabinowitz, 1984), the likelihood expression for $y_P$ can be approximated as a weighted Poisson likelihood (Berman and Turner, 1992):

$$l_{\text{ppm}}(\beta; y_P, y_0, w) \approx \sum_{i=1}^{n} w_i \left[ z_{w,i} \ln\{\lambda(y_i)\} - \lambda(y_i) \right], \tag{6}$$

where $y_0 = \{y_{m+1}, \ldots, y_n\}$ are quadrature points and $z_{w,i} = I(i \in \{1, \ldots, m\}) / w_i$ for quadrature weights $w = \{w_1, \ldots, w_n\}$, and $I(\cdot)$ is the indicator function. A natural way to choose quadrature points is to break the region $A$ into a regular grid.
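The quadrature approximation in equation (6) can be written out directly. The sketch below is illustrative only: the function name `ppm_quadrature_loglik`, the toy grid, and the constant-intensity check are assumptions made here, not part of the paper or of any package. For a constant intensity and weights that sum to |A|, the weighted sum should equal m ln λ − λ|A|, which is equation (5) without the constant − ln(m!).

```python
import math
import numpy as np

def ppm_quadrature_loglik(beta, X, is_presence, w):
    """Weighted Poisson form of equation (6).

    X           : (n, p + 1) design matrix at presence and quadrature points
    is_presence : boolean vector, True for the m presence points
    w           : positive quadrature weights, one per point
    """
    lam = np.exp(X @ beta)                 # ln(lambda) = x' beta
    z_w = is_presence / w                  # z_{w,i} = I(i in 1..m) / w_i
    return float(np.sum(w * (z_w * np.log(lam) - lam)))

# Toy check with a homogeneous intensity on a region of area |A| = 10.
area = 10.0
m, n = 5, 25                               # 5 presence points among 25 points
is_presence = np.arange(n) < m
w = np.full(n, area / n)                   # equal weights summing to |A|
X = np.ones((n, 1))                        # intercept-only model
beta = np.array([math.log(0.7)])           # constant intensity 0.7 per unit area

approx = ppm_quadrature_loglik(beta, X, is_presence, w)
exact_without_constant = m * math.log(0.7) - 0.7 * area

print(approx, exact_without_constant)
```

With covariates, the same function can be handed to any optimizer; the point of the check is only that the weights, not the number of points, carry the area of the region.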
A quadrature point is inserted at the center of each cell, meaning that $y_0 = g_0$. Each cell can then be assigned a quadrature weight which equals its area divided by the number of locations in $\{y_P, y_0\}$ contained in the cell.

An alternative representation of the point process likelihood, suggested during review, was to use $I(i \in \{1, \ldots, m\})$ as the response and $\ln w_i$ as an offset term. This would produce a likelihood expression proportional to (5), but without the need for a noninteger response.

We find a relation in Theorem 2 between MAXENT and the above formulation for Poisson point process models by analyzing data at grid cell locations $\{g_P, g_0\}$ instead of $\{y_P, y_0\}$. That is, we use in the analysis the same quadrature points $y_0 = g_0$, but use the locations of the $m^{(n)}$ presence grid cells $g_P$ in place of the $m$ actual presence locations in $y_P$. This results in some loss of information, discussed in Section 3.

Theorem 2. Consider a Poisson point process model fitted to grid cell data $z^{(n)}(g)$, with parameter estimates stored in $\hat{\beta}_{\text{PPM}}$. Then:

$$\hat{\beta}_{\text{MAXENT}} = \hat{\beta}_{\text{PPM}} + J_C,$$

where $J_C = \{\ln C, 0, \ldots, 0\}$ is a vector of length $p + 1$, and $C = |A| / (m^{(n)} n)$.
In other words, the MAXENT and PPM solutions for grid cell data are proportional, and estimates of slope parameters are identical.

The proof of Theorem 2 appears in Web Appendix 1.

Corollary 1: For a given presence-only dataset $y_P$, consider a set of vectors of grid cell data constructed at increasingly fine spatial resolutions (e.g., by recursively partitioning $\{z^{(n)}(g);\ n = 1, 2, 2^2, 2^3, \ldots\}$). As $n \to \infty$, the MAXENT solution for $z^{(n)}(g)$ becomes proportional to the Poisson point process model solution for $y_P$. That is:

$$\hat{\beta}_{\text{MAXENT}} - J_C \to \hat{\beta},$$

where $J_C$ is as defined in Theorem 2.

The proof follows by noting that as $n \to \infty$, the number and location of presence points in $g_P$ approach those in $y_P$ and the quadrature approximation of (6) approaches the exact solution in (5).

This result is similar to Theorem 3.2 of Warton and Shepherd (2010), who showed that when fitting a Poisson PPM with constant quadrature weights $C$, ignoring these weights changes the solution by the factor $C$. MAXENT can be represented as a Poisson point process model ignoring quadrature weights, so a similar result applies. These quadrature weights are the mechanism that ensures that analysis is performed on an area basis instead of a grid cell basis (Warton and Shepherd, 2010). Hence while Poisson point process model and MAXENT solutions are qualitatively identical, analyzing data on a grid cell basis instead of an area basis induces scale dependence in MAXENT: as $n \to \infty$, $\pi(g_i) \to 0$. Hence the maps in Web Figure 2 look the same, but only for the Poisson point process models is the scale unchanged by changing spatial resolution.

3. Model Application

We will now demonstrate the application of a point process model to the presence-only locations of Corymbia eximia, illustrating many features currently unavailable to MAXENT. Software for the below analyses including example data will be available in the R package ppmlasso. Our analysis will consist of four steps: (1) determine the appropriate spatial resolution for analysis; (2) assess whether a Poisson point process model is appropriate; (3) estimate the LASSO parameter (Tibshirani, 1996) for regularization; and (4) compare results with a MAXENT model. We use four environmental variables as in Warton and Shepherd (2010): minimum and maximum temperature, number of fires since 1943, and annual rainfall. The likelihood of observing a presence point depends not just on the spatial distribution of the species, but also on the spatial distribution of observers, which is strongly affected by site accessibility. Hence we include two variables to measure site accessibility: distance from main roads and distance from urban areas. Intensity of C. eximia was modeled as a quadratic function of the six available variables, including interactions between the four environmental variables and between the two accessibility variables (but assuming additivity between environmental and accessibility variables). So long as all six of these variables are independent of variables associated with species detection probability, parameters of a Poisson point process model will be consistently estimated (Dorazio, in press).

Prior to applying the LASSO to point process models, variables were standardized to have mean 0 and variance 1 as in Tibshirani (1996), such that the LASSO penalty was applied to standardized coefficients. In MAXENT, variables were instead standardized to have minimum 0 and maximum 1.

3.1 Choosing the Appropriate Spatial Resolution

NSW Office of Environment and Heritage (2010) provides environmental data over the study region at the 100 m resolution. However, performing an analysis at such a fine resolution is computationally expensive and may not be necessary. Using a Poisson point process model specification facilitates the use of a numerical integration framework for choosing an appropriate spatial resolution for a particular species. As the absence grid cells $g_0$ are used as quadrature points, the question of what spatial resolution needs to be used can be rephrased as a question of how many quadrature points are needed to obtain a sufficiently accurate estimate of the log-likelihood. The same idea was used in Warton and Shepherd (2010) to clarify the role of pseudo-absences in presence-only analysis, and how their number and location can be chosen.

Following Warton and Shepherd (2010), we add quadrature points at increasingly fine resolutions until the log-likelihood has converged. For Corymbia eximia, the likelihood appears to converge at a spatial resolution of 800 m (Figure 2a), suggesting that model output will not appreciably change at finer spatial resolutions. However, the entropy of analogous MAXENT models does not converge due to the scale dependence of $\pi(g)$ and hence MAXENT is not very informative about which spatial resolution to use for analysis. The scale dependence of MAXENT can be adjusted for in part (using "gain," defined as $\ln n - \text{entropy}$), but not completely, since the loss of information incurred by absorbing the $m$ presence
locations into a smaller number $m^{(n)}$ of presence grid cells varies with the choice of spatial resolution. Hence the gain will not converge until $m^{(n)}$ converges.

[Figure 2 image not reproduced. Panel (a): maximized log-likelihood (Poisson PPM) and entropy and gain (MAXENT) plotted against spatial resolution (m), from 400 to 12800. Panel (b): inhomogeneous K-function K(r) plotted against radius (km), from 0 to 25, for the Poisson PPM (left) and the area-interaction model (right).]

Figure 2. Model checking for the Corymbia eximia analysis: (a) Spatial resolution can be chosen for a point process model from a plot of maximized log-likelihood at differing spatial resolutions. Convergence is achieved at the 800 m resolution for the Poisson point process model, suggesting this is the optimal spatial resolution at which to perform analysis. There is no convergence for the entropy used by MAXENT. We can attempt to address this by analyzing "gain" (defined as $\ln n - \text{entropy}$), but gain (rescaled) does not converge until the number of presence cells $m^{(n)}$ converges. (b) Inhomogeneous K-function (solid line), with 95% simulation envelope (shaded area), for a Poisson point process model (left) and an area-interaction model with radius 1 km (right). The deviation from the envelope for the Poisson point process model suggests additional clustering unaccounted for in the model. This figure appears in color in the electronic version of this article.

3.2 Is a Poisson PPM Appropriate?

The underlying assumption of a Poisson point process model (and by equivalence, MAXENT) is that the point locations are independent, conditional on model covariates. This may not be appropriate for Corymbia eximia. While MAXENT offers no method for checking this assumption, there are a number of diagnostic tools to assess model adequacy of a Poisson point process model (Cressie, 1993; Baddeley et al., 2005). One such method is to construct the inhomogeneous K-function (Baddeley, Møller, and Waagepetersen, 2000) and corresponding simulation envelope (Diggle, 2003) of the fitted model. In Figure 2b, it can be seen that for C. eximia, a Poisson point process model may not be suitable for the data, as the inhomogeneous K-function falls well outside a 95% envelope formed by simulating 1000 realizations from a Poisson point process model with intensity function as estimated from the C. eximia data. The deviation above the envelope suggests that the presence locations of C. eximia are more clustered than would be expected for a true Poisson point process model. Instead, Figure 2b demonstrates that an area-interaction model (Baddeley and van Lieshout, 1995) with radius 1 km is more appropriate, which we fit using a Poisson pseudo-likelihood as in the spatstat (Baddeley and Turner, 2005) package of R. There is built-in code in spatstat for fitting a large suite of other spatial processes involving dependence between points (Baddeley and Turner, 2005; Chakraborty et al., 2011) that may be suitable.

3.3 Choosing the LASSO parameter

MAXENT is often fitted using a LASSO penalty to control overfitting. For Corymbia eximia, MAXENT software uses an ad hoc value of 709 for the LASSO penalty parameter (λ).
The ad hoc MAXENT value was chosen without any consideration for predictive performance of the model at hand but rather based entirely on the number of presence cells (90). Alternatively, some data-driven criterion could be used to try to choose a λ which optimizes predictive performance (Tibshirani, 1996; Fu, 2005; Zou, Hastie, and Tibshirani, 2007). We used a simple line search algorithm to find the value that minimized nonlinear GCV (Fu, 2005), which returned a value of 4.907.

3.4 Results

The coefficients for both the point process model and the MAXENT model (Web Table 1) are qualitatively different due largely to the different LASSO parameters. Of the 19 model coefficients, only 11 are nonzero in the point process model, as opposed to 17 for MAXENT. Moreover, the harsher LASSO penalty of the point process model ensures that each of the estimated coefficients is smaller than the corresponding coefficient of the MAXENT model. Otherwise, the models are broadly similar and hence the maps produced by both models identify the same geographic hot spots for Corymbia eximia (Figure 1b).

3.5 Summary

In analyzing the Corymbia eximia data we have seen a number of advantages of the Poisson point process model approach in choosing the spatial resolution, assessing model adequacy, and choosing the LASSO parameter. These are summarized in Table 1. Another potential advantage is in assessing model uncertainty: a point process framework can be used to put standard errors on model coefficients and predictions, although when using the LASSO in estimation (Fan and Li, 2001) there are some difficulties (Kyung et al., 2010). A final advantage worthy of mention is in computation time: Figure 1b took 12 seconds to produce for the point process model, but 130 seconds using MAXENT software (Table 1).

Table 1
Current problems with MAXENT and their proposed solutions available through reexpression as a Poisson point process model

MAXENT problem                                 Poisson PPM solution
Predicted probabilities are scale-dependent    Predicted intensities are scale-invariant
How to determine spatial resolution?           Increase until log-likelihood converges
How to assess model adequacy?                  Various goodness-of-fit procedures available
How to choose LASSO parameter?                 Various data-driven methods
Available in MAXENT software only              Use any standard GLM software
130 seconds to fit models in Figure 1b         12 seconds to fit models in Figure 1b

4. Improvements in Predictive Performance

We will now compare the predictive performance of the point process approach described in Section 3 to MAXENT in order to assess whether the refinements we proposed (in particular, modeling point interactions and data-driven estimation of the LASSO penalty parameter) improve the performance of the model. The approach we take is to model Corymbia eximia presence-only data and predict to new areas, assessing predictive performance using a separate presence–absence dataset from 8678 systematically collected transects (NSW Office of Environment and Heritage, 2010), as in Elith et al. (2006). This presence–absence dataset may be considered a "gold standard," where observers have gone to each of the 8678 sites and specifically noted presences of C. eximia. We apply a spatial fivefold cross-validation in which sites are assigned to 30 square 64 × 64 km spatial blocks that are randomly assigned to test and training samples. We employ this procedure to minimize the influence of spatial autocorrelation, which is not considered by MAXENT.

We evaluate the performance of MAXENT and various models from the point process approach by comparing predicted intensities at the systematically collected transects against observed presence/absence, using area under an ROC curve (Elith and Leathwick, 2007). Table 2 reveals that choosing the LASSO parameter to minimize the nonlinear GCV performs better than using MAXENT's default method for C. eximia for both point process models. Hence, while MAXENT achieves high predictive performance relative to other methods (Elith et al., 2006), there is the potential to improve it further by using the data to inform the choice of the LASSO parameter.

Table 2
Predictive performance (measured as average area under the ROC curve for 20 different fivefold spatial cross-validation schemes) of different presence-only models for C. eximia when predicting to a separate presence–absence dataset. Note that the point process approach proposed in Section 3 has the highest predictive performance

Model               LASSO penalty criteria    AUC       Standard error
Poisson PPM         No penalty                0.7555    0.0070
MAXENT              ad hoc MAXENT             0.8508    0.0060
Poisson PPM         Nonlinear GCV             0.8813    0.0051
Area-interaction    Nonlinear GCV             0.9066    0.0036

5. Discussion

Some recent papers (Elith and Leathwick, 2009; Aarts, Fieberg, and Matthiopoulos, 2012) have called for greater unification and synthesis of the literature on SDM. To that end, we have demonstrated equivalence of MAXENT and a Poisson point process model. Warton and Shepherd (2010) showed the equivalence of Poisson point process models and pseudo-absence regression, which aside from MAXENT is the most commonly used approach to presence-only modeling at the moment. Hence our work represents a significant unification of the literature, using Poisson point process models to link the two most widely used presence-only methods, MAXENT and pseudo-absence regression. This work has significant practical ramifications, given that MAXENT (Table 1) and pseudo-absence regression (Warton and Shepherd, 2010) have shortcomings stemming largely from the framework used for modeling, which can be resolved by using a Poisson point process model instead. Others have made further connections between point process models and alternative approaches to
analysis: Aarts et al. (2012) made a connection to the estimation of "resource selection functions" via presence–absence analysis, and Dorazio (in press) to case-augmented binary regression. Point process models are a natural framework for analyzing presence-only data and it is interesting that a variety of different methods of analysis can all be connected to them in some way, and in many instances, improved through this connection.

A key distinction between point process models and MAXENT is that in the former we model $\lambda(y)$ on a per area basis whereas for the latter, we model $\pi(g_i)$ per grid cell; the per area analysis is thus invariant under choice of spatial resolution while the per grid cell analysis is not (because increasing spatial resolution increases the number of grid cells). This is related to the distinction between probability and frequency models (Aarts et al., 2012). It is this distinction that enables the likelihood convergence for a Poisson point process model (Figure 2a) and hence a data-driven choice of spatial resolution. However, MAXENT is proportional to a Poisson point process model (Theorem 2), which suggests that it can achieve the same qualitative answer but with the disadvantage of scale dependence of the predicted probabilities and an arbitrary choice of spatial resolution.

One important disadvantage of MAXENT is that in its current form, it does not estimate the intercept consistently (Elith et al., 2011). The intercept term diverges to $-\infty$ as spatial resolution increases. Theorem 2 gives the form of the term causing this divergence. This means that MAXENT as currently posed cannot predict species intensity for any subset of the study region $A$ or likewise model abundance in the way that point process models can.

The new-found ability to use data to estimate spatial resolution (Figure 2a) is of interest for a couple of reasons. First, the resolution of the process is largely a function of biological factors and measurement error, and estimating this resolution informs us about the spatial scale at which such processes are operating. Second, the resolution of the process is of interest for computational reasons, because data are becoming available at increasingly fine resolutions (we originally had access to 8,620,092 points at the 100 m resolution, but even finer resolutions are now available), and analysis at such fine resolutions can be very computationally intensive. We know of colleagues analyzing this type of data in biology departments who have constructed their own parallel computing arrays to

or equivalently, to identify the number of "background points" to use in analysis.

In Section 4 we demonstrated that point process models achieve a higher predictive performance for Corymbia eximia by choosing the LASSO penalty parameter to minimize nonlinear GCV. However, this may not be true of all species. We are currently investigating the question of how predictive performance varies with different methods of choosing the LASSO parameter across multiple species endemic to the Blue Mountains.

While MAXENT has become extraordinarily popular in ecology, the lack of a model-based framework and diagnostic tools means that it is used rather uncritically and sometimes inappropriately. A key advantage to the point process approach is that we have a model and can hence check assumptions. We found that a Poisson point process model did not fit C. eximia well due to violations of the independence assumption, in which case there are many alternative options for presence-only data (Baddeley and Turner, 2005; Chakraborty et al., 2011). The suite of diagnostic tools available via point process models (Cressie, 1993; Diggle, 2003; Baddeley and Turner, 2005; Baddeley et al., 2005) offers the possibility for users to think more critically about the appropriateness of the model they are fitting, which can ultimately have benefits in model interpretability and performance.

6. Supplementary Materials

Web Appendix 1, Web Figures 1 and 2 and Web Table 1 referenced in Sections 2 and 3 are available with this article at the Biometrics website on Wiley Online Library.

Acknowledgements

We would like to thank Dan Ramp and Evan Webster for providing access to the data. We thank the Co-Editor, Associate Editor, and three referees for their valuable suggestions. IWR is supported by an Australian Postgraduate Award and DIW by the Australian Research Council Discovery Projects funding scheme (project number DP0985886).

References

Aarts, G., Fieberg, J., and Matthiopoulos, J. (2012). Comparative interpretation of count, presence-absence and point methods for
analyze this type of data for multiple species at fine resolu-
species distribution models. Methods in Ecology and Evolution 3,
tions. Hence it is of considerable practical interest to know 177–187.
whether such a fine resolution is required, and in our case, it Baddeley, A. and Turner, R. (2005). Spatstat: An R package for ana-
clearly was not required as we only needed 134,716 quadrature lyzing spatial point patterns. Journal of Statistical Software 12,
points and were able to analyze data in seconds on a desktop 1–42.
computer (Table 1), with negligible loss of information. Baddeley, A. J. and van Lieshout, M. (1995). Area-interaction point
An alternative approach to MAXENT analysis of all grid processes. Annals of the Institute of Statistical Mathematics 47,
cells is to randomly select empty grid cells as “background 601–619.
points” for analysis. This obviates any computational need to Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000). Non- and
coarsen resolution for analysis. The default approach that has semiparametric estimation of interaction in inhomogeneous point
patterns. Statistica Neerlandica 54, 329–350.
been advocated (Phillips and Dudı́k, 2008) and implemented
Baddeley, A. J., Turner, R., Møller, J., and Hazelton, M. (2005). Resid-
in MAXENT software is to use 10,000 random background ual analysis for spatial point processes. Journal of the Royal Sta-
points, which for our data was clearly insufficient (Figure 2a), tistical Society, Series B 67, 617–666.
equivalent to using a resolution of nearly 3 km. We advise that Berman, M. and Turner, T. (1992). Approximating point process likeli-
as a matter of routine, presence-only analysts should use their hoods with GLIM. Journal of the Royal Statistics Society Series
data to identify a spatial resolution appropriate for analysis, C – Applied Statistics 41, 31–38.

Chakraborty, A., Gelfand, A., Wilson, A., Latimer, A., and Silander, J. (2011). Point pattern modelling for degraded presence-only data over large regions. Journal of the Royal Statistical Society, Series C 60, 757–776.
Cressie, N. A. C. (1993). Statistics for Spatial Data. New York: John Wiley & Sons.
Davis, P. J. and Rabinowitz, P. (1984). Methods of Numerical Integration, 2nd edition. Orlando, Florida: Academic Press, Inc.
Della Pietra, S., Della Pietra, V., and Lafferty, J. (1997). Inducing features on random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 380–393.
Diggle, P. (2003). Statistical Analysis of Spatial Point Patterns, 2nd edition. New York: Oxford University Press, Inc.
Dorazio, R. M. (in press). Predicting the geographic distribution of a species from presence-only data subject to detection errors. Biometrics.
Dutta, M. (1966). On maximum (information-theoretic) entropy estimation. Sankhya: The Indian Journal of Statistics, Series A 28, 319–328.
Elith, J. and Leathwick, J. R. (2007). Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines. Diversity and Distributions 13, 265–275.
Elith, J. and Leathwick, J. R. (2009). Species distribution models: Ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics 40, 677–697.
Elith, J., Graham, C. H., Anderson, R. P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R. J., Huettmann, F., Leathwick, J. R., Lehmann, A., Li, J., Lohmann, L. G., Loiselle, B. A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J. McC., Peterson, A. T., Phillips, S. J., Richardson, K. S., Scachetti-Pereira, R., Schapire, R. E., Soberón, J., Williams, S., Wisz, M. S., and Zimmermann, N. E. (2006). Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29, 129–151.
Elith, J., Leathwick, J. R., and Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology 77, 802–813.
Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., and Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions 17, 43–57.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.
Fu, W. J. (2005). Nonlinear GCV and quasi-GCV for shrinkage models. Journal of Statistical Planning and Inference 131, 333–347.
Good, I. J. (1963). Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. Annals of Mathematical Statistics 34, 911–934.
Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Boca Raton, Florida: Chapman & Hall.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review 106, 620–630.
Kullback, S. (1959). Information Theory and Statistics. New York: John Wiley & Sons.
Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis 5, 369–411.
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models, 2nd edition. London: Chapman and Hall.
NSW Office of Environment and Heritage (2010). Atlas of NSW Wildlife database. Data accessed 20/04/2010.
O’Sullivan, D. and Unwin, D. J. (2010). Geographic Information Analysis, 2nd edition. Hoboken, New Jersey: John Wiley & Sons.
Pearce, J. L. and Boyce, M. S. (2006). Modelling distribution and abundance with presence-only data. Journal of Applied Ecology 43, 405–412.
Phillips, S. J. and Dudík, M. (2008). Modeling of species distributions with Maxent: New extensions and a comprehensive evaluation. Ecography 31, 161–175.
Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259.
Thuiller, W., Albert, C., Araújo, M., Berry, P., Cabeza, M., Guisan, A., Hickler, T., Midgley, G. F., Paterson, J., Schurr, F. M., Sykes, M. T., and Zimmermann, N. E. (2008). Predicting global change impacts on plant species distributions: Future challenges. Perspectives in Plant Ecology, Evolution and Systematics 9, 137–152.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267–288.
Warton, D. I. and Shepherd, L. C. (2010). Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology. Annals of Applied Statistics 4, 1383–1402.
Zou, H., Hastie, T., and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. The Annals of Statistics 35, 2173–2192.

Received March 2012. Revised July 2012. Accepted August 2012.
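
As a rough check on the resolution figures quoted in the Discussion, the sketch below converts a number of background or quadrature points into an equivalent grid resolution. It rests on two assumptions that are not stated in the text: that the area of the study region can be approximated by the 8,620,092 cells of the 100 m grid (0.01 km² each), and that points are spread evenly, so that each point stands for one square cell. The function name `equivalent_resolution_km` is hypothetical and is introduced only for illustration.

```python
import math


def equivalent_resolution_km(n_points: int, area_km2: float) -> float:
    """Side length (km) of a square cell if n_points cells tile area_km2 evenly."""
    return math.sqrt(area_km2 / n_points)


# Assumption: the study area equals the 100 m grid cell count times the cell area.
CELL_SIDE_KM = 0.1
N_FINE_CELLS = 8_620_092
area_km2 = N_FINE_CELLS * CELL_SIDE_KM ** 2

# Default number of random background points in the MAXENT software.
res_background_km = equivalent_resolution_km(10_000, area_km2)

# Number of quadrature points found sufficient in the article.
res_quadrature_km = equivalent_resolution_km(134_716, area_km2)

print(f"10,000 background points  -> {res_background_km:.2f} km cells")
print(f"134,716 quadrature points -> {res_quadrature_km:.2f} km cells")
```

Under these assumptions, 10,000 background points correspond to cells roughly 2.9 km across, in line with the "nearly 3 km" quoted in the text, while 134,716 quadrature points correspond to cells roughly 0.8 km across.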
