
Rethinking Receiver Operating Characteristic Analysis

This document discusses limitations of using receiver operating characteristic (ROC) analysis to evaluate ecological niche models. It notes that ROC analysis favors models that predict across the entire spectrum of proportional areas, rather than those with narrower ranges. Additionally, ROC does not properly distinguish between errors in species distribution modeling versus ecological niche modeling. The authors propose modifying ROC analysis to address these issues by incorporating niche modeling considerations and using partial-area ROC approaches.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/222825004

Rethinking receiver operating characteristic analysis applications in ecological niche modeling

Article in Ecological Modelling · April 2008


DOI: 10.1016/j.ecolmodel.2007.11.008

Citations: 1,487; Reads: 3,187

3 authors:

Andrew Townsend Peterson, University of Kansas (1,071 publications, 89,694 citations)
Monica Papes, University of Tennessee (124 publications, 5,549 citations)
Jorge Soberón, University of Kansas (224 publications, 32,968 citations)

All content following this page was uploaded by Andrew Townsend Peterson on 15 October 2017.



This article was published in an Elsevier journal. The attached copy
is furnished to the author for non-commercial research and
education use, including for instruction at the author’s institution,
sharing with colleagues and providing to institution administration.
Other uses, including reproduction and distribution, or selling or
licensing copies, or posting to personal, institutional or third party
websites are prohibited.
In most cases authors are permitted to post their version of the
article (e.g. in Word or Tex form) to their personal website or
institutional repository. Authors requiring further information
regarding Elsevier’s archiving and manuscript policies are
encouraged to visit:

http://www.elsevier.com/copyright
Author's personal copy

Ecological Modelling 213 (2008) 63–72

available at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/ecolmodel

Rethinking receiver operating characteristic analysis applications in ecological niche modeling

A. Townsend Peterson∗, Monica Papeş, Jorge Soberón


Natural History Museum and Biodiversity Research Center, The University of Kansas, Lawrence, KS 66045 USA

Article info

Article history:
Received 25 May 2007
Received in revised form 29 October 2007
Accepted 16 November 2007
Published online 9 January 2008

Keywords:
Ecological niche model
Model evaluation
Receiver operating characteristic
Area under curve
Omission error

Abstract

The area under the curve (AUC) of the receiver operating characteristic (ROC) has become a dominant tool in evaluating the accuracy of models predicting distributions of species. ROC has the advantage of being threshold-independent, and as such does not require decisions regarding thresholds of what constitutes a prediction of presence versus a prediction of absence. However, we show that, comparing two ROCs, using the AUC systematically undervalues models that do not provide predictions across the entire spectrum of proportional areas in the study area. Current ROC approaches in ecological niche modeling applications are also inappropriate because the two error components are weighted equally. We recommend a modification of ROC that remedies these problems, using partial-area ROC approaches to provide a firmer foundation for evaluation of predictions from ecological niche models. A worked example demonstrates that models that are evaluated favorably by traditional ROC AUCs are not necessarily the best when niche modeling considerations are incorporated into the design of the test.
© 2007 Elsevier B.V. All rights reserved.

The tools and techniques of ecological niche modeling (ENM) and the related ideas of species distribution modeling (SDM) have seen an impressive increase in activity in recent years (Guisan and Zimmermann, 2000; Soberón and Peterson, 2004; Araújo and Guisan, 2006). Many facets of these tools and their application have been examined in detailed analyses (Stockwell and Peterson, 2002a,b, 2003; Anderson et al., 2003; Pearson and Dawson, 2003; Araújo et al., 2005a,b; Guisan and Thuiller, 2005; Guisan et al., 2006; Pearson et al., 2007) that have greatly clarified the conditions of their use. However, in spite of such attention, the question of how to evaluate the predictions of these models statistically remains incompletely and unsatisfactorily resolved (Fielding and Bell, 1997; Araújo and Guisan, 2006; Guisan et al., 2006; Lobo et al., 2007).

In recent publications, statistical evaluations of niche and distribution model predictions have generally been based on receiver operating characteristic (ROC) analyses (DeLong et al., 1988), as exemplified by a recent, large-scale model comparison (Elith et al., 2006) and many similar studies. Spatial predictions can present errors of omission (false negatives, leaving out known distributional area) and errors of commission (false positives, including unsuitable areas in the prediction). ROC analysis involves plotting sensitivity (i.e., the proportion of known presences predicted present, = 1 − false negative rate) against 1 − specificity (i.e., the proportion of known absences predicted present, = false positive rate; Fig. 1). The area under the ROC curve (AUC) is then compared against null expectations [the area under the line linking the origin with the upper right corner of the graph (1,1), = 0.5], either probabilistically or via bootstrap manipulations.

Here, we point out two sources of problems in ROC analyses that consistently favor certain kinds of algorithms over others. The first limitation of ROCs derives from the fact that certain algorithms span broad spectra of possible commission errors, whereas others are restricted to smaller ranges; we show that ROCs consistently favor the former over the latter. The second limitation derives from the very different meanings of “absence” in the context of ENM versus SDM: as currently used, ROC analyses do not distinguish between the two and, again, consistently favor model predictions oriented toward one type of analysis (SDM) over the other (ENM). We present a modification of the traditional ROC approach that takes steps towards resolving these two problems.

∗ Corresponding author. Tel.: +1 785 864 3926.
E-mail address: [email protected] (A.T. Peterson).
0304-3800/$ – see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.ecolmodel.2007.11.008

1. The (simple part of the) problem: unequal span of model predictions

A diverse set of inferential tools has been applied to the challenge of estimating niches and predicting geographic distributions of species (Elith et al., 2006; Peterson, 2006), ranging from simple range rules to complex neural networks, genetic algorithms, maximum entropy, and multivariate regression algorithms. The outputs from these different techniques have different characteristics; most relevant here is that different techniques may span very different ranges of predicted area of presence of a species (e.g., range rules predict one or a few thresholds, whereas multivariate regression approaches produce predictions across most of the spectrum of probabilities from 0 to 1). These differences have implications for how AUC scores are calculated, because AUC calculations assume that 1 − specificity spans the entire range [0,1], even though model predictions may not span that whole range. Special modifications to the approach are required for AUC comparisons in partial ROCs that span only a subset of the full spectrum of areal predictions (Jiang et al., 1996; Dodd and Pepe, 2003).

ROC can be applied directly to evaluation of SDM predictions (Fielding and Bell, 1997; Fawcett, 2003; Phillips et al., 2006), although even this use is not above question (Lobo et al., 2007). An SDM produces a prediction value related (sometimes equal) to the probability that a species is present in a cell. By assigning thresholds, the continuous scores can be turned into binary predictions, which can be correct or incorrect, producing a contingency table called the “confusion matrix” (see Table 1). One confusion matrix exists per threshold value, and the four elements of the matrix can be used to calculate error characteristics.

Table 1 – Schema of a confusion matrix, in which predicted presences and absences are related to their known status as observed presence or absence

                        Observed
                        Present   Absent
Predicted   Present     a         b
            Absent      c         d

See text for explanation.

In a conventional ROC, the proportion of true positives [a/(a + c)], equivalent to the sensitivity (the complement of omission error), is plotted against the proportion of false positives [b/(b + d)], which in turn is equivalent to 1 − specificity, or the commission error. The plot in ROC space of sensitivity versus 1 − specificity displays how well an algorithm classifies instances as the threshold changes. In SDM and ENM applications, changing the threshold also changes the area predicted as present. Important sectors of this ROC space are the origin (0,0), where the algorithm never falsely identifies absences as presences but also fails to identify any known presence (which is useless), and the top right corner (1,1), where the algorithm identifies every true presence correctly but misidentifies all absences as presences (also useless, although in a different way). Finally, in the top left corner (0,1), the algorithm correctly identifies all true presences and never misclassifies a true absence as a presence. Therefore, the regions in ROC space near the (0,1) corner represent model predictions that successfully identify true presences and seldom misidentify absences as presences.

Now consider the behavior of a random classifier. Such an algorithm always randomly identifies as present a fixed proportion p of any set of instances, p being a function of the proportional area predicted present. This prediction rate is represented by the straight line joining the points (0,0) and (1,1). A random classifier will select as present only a fraction p of true presences, giving a value of p on the sensitivity axis (y-axis). It will also (wrongly) select a fraction p of absences as presences, giving the same value of p on the x-axis. Therefore, as p varies, it traces a line along which the true-positive rate equals the false-positive rate (Fig. 1).

The above ideas can be applied directly to situations in which true presences and true absences are known, such as the typical SDM problem (Guisan and Zimmermann, 2000). By varying the threshold at which the score of an algorithm is regarded as a presence, a curve in ROC space is traced (Fig. 1); the elevation of this curve above the straight line of random expectation is a measure of the discrimination capacity of the algorithm (i.e., its capacity to correctly classify true presences and true absences) (Fielding and Bell, 1997; Guisan and Zimmermann, 2000). In an ENM context, however, the situation differs slightly, but in important ways (see below).

In comparing the performance of different algorithms, in either an SDM or an ENM context, a problem exists that, to our knowledge, has not been discussed previously in the literature on ENM or SDM: some algorithms span the entire range of possible commission errors, while others cover only comparatively small regions of the overall ROC plot, either by design or by the intrinsic operation of the algorithm. In other words, while one algorithm may predict responses from 0 to 100% of false positives, another may predict only in the range of, for example, 40–90%. Fig. 2 illustrates this for Maxent, which predicts across the whole spectrum of areas, compared with GARP, which predicts only at the broader end of the spectrum, i.e., above ∼60% (details of methodologies for model generation are provided below in the worked example). Note that the x-axis in Fig. 2 differs from that of a conventional ROC curve, an issue discussed in detail below.

In practice, the ROC AUC is calculated based on a series of trapezoids (Fawcett, 2003), with the curve in essence “connecting the dots” that represent the different thresholds of the prediction. In the example in Fig. 2, Maxent has an AUC of 0.72 (ratio of observed to null expectations = 1.44), but GARP an AUC of only 0.63 (ratio = 1.26; Table 1); the difference arises because the first point of the GARP curve is automatically connected by a straight line to the origin. This procedure in effect penalizes algorithms with ROC curves that do not begin at or near the origin. In other words, in the example in Fig. 2, because GARP predicts only relatively broad geographic areas that have high rates of prediction of true presences (= low omission error), and does not make predictions at lower thresholds that would have higher omission errors, its ROC curve is defined only within a subset of possible areas. In this sense, we can now distinguish between two types of poor performance in ROC analyses: ROC curves that are genuinely lower and closer to the line of no information (AUC = 0.5), versus those that have artificially low AUC scores because they do not predict across the whole spectrum of proportional area predicted present. These complications are far from limited to GARP: BIOCLIM and related algorithms offer only a few thresholds of prediction, and many regression-based approaches predict only limited ranges of probabilities.

Fig. 1 – Summary of new recommendations for receiver operating characteristic (ROC) analysis in niche modeling. Upper left-hand panel: traditional ROC approach, comparing the AUC of the test curve with 0.5, which is the AUC of the null expectation curve. Upper right-hand panel: comparison of two curves, and illustration of how the user-chosen error tolerance E identifies different critical area thresholds for the two curves. Lower panels: illustration of the AUC comparisons that would be used to characterize each of the two curves.

Fig. 2 – Summary of characteristics of three example model predictions. Top panel: area predicted present across the spectrum of thresholds for each model, based on a 3-threshold moving window. Bottom panel: closely approximates a traditional receiver operating characteristic plot, except that the x-axis is measured as the proportion of the study area predicted present instead of as success in predicting absence points; for simplicity, only GARP and Maxent results are shown in this panel.

2. Niche modeling considerations

The above is an artifactual problem that can affect any ROC analysis that compares predictions spanning different extents of proportional area (Jiang et al., 1996; Dodd and Pepe, 2003). Another, more subtle problem affects ROC analyses in ENM applications. Previous contributions have discussed differences between models of species’ distributions and models of species’ ecological niches (Soberón and Peterson, 2004, 2005; Peterson, 2006). Although seemingly a minor distinction, these differences have important implications for how model predictions should be evaluated.

Models of ecological niches are designed explicitly to predict potential areas of distribution, and therefore their predictions are generally broader than actual distributional areas (Hirzel et al., 2002; Soberón and Peterson, 2005; Phillips et al., 2006). Often, ENM applications are based on presence information only, owing quite simply to the practical lack of absence information; but even if absence data were available, they would have to be data regarding absence from the potential distributional area. As a consequence, data on absences of species are of dubious utility in the process of modeling ecological niches, unless some approach for generating more realistic absence data is used (Lobo et al., 2006). This point is easily understood if one considers invasive species: had one modeled a species’ ecological niche a few decades prior to its introduction in a novel region, one would have counted its future adventive distributional area as absence. This area was in fact well within the ecological niche dimensions of the species, as the later invasion demonstrated.

As such, unless obtained somehow from the potential distribution (e.g., if annual plants were seeded experimentally across the region), absence data should not be employed in evaluating model quality in ENM applications. For this reason, and following previous observations by Phillips et al. (2006) that the logic of ROC allows for more general partitioning of instances than “presences” versus “absences,” we follow a modified ROC procedure that dispenses entirely with absence data. Rather, we calculate the values used as the x-axis as the proportion of the overall area predicted as present, instead of using commission error calculated from vaguely defined (and often unavailable) data summarizing “absences” (Phillips et al., 2006).

Since absence data have been omitted and the 1 − specificity axis has been changed to the proportion of area predicted as present, new interpretations are needed. Previous analyses in which niche models were evaluated via expert opinion (Anderson et al., 2003) showed that omission error characteristics are more important than commission error considerations in distinguishing good from bad models. Put simply, in a niche-modeling framework, a model that errs by omitting known points of presence is more seriously flawed than one that predicts areas not known to be inhabited (Raxworthy et al., 2003). Moreover, among models that overpredict, some ‘overprediction’ falls in disjunct areas that are likely inaccessible to the species for reasons unrelated to landscape suitability (e.g., historical dispersal limitations, speciation events, interspecific interactions) (Peterson et al., 1999; Wiens, 2004; Wiens and Graham, 2005): these areas do not represent model prediction error, but rather offer an accurate depiction of the spatial extent of habitable conditions for the species. Other models may reconstruct overly broad suites of environmental conditions as suitable for the species; these models genuinely fail in reconstructing the ecological niche of the species because they do not distinguish effectively between potential presence and absence. Distinguishing between these two possibilities (predicting areas not inhabited for nonecological reasons versus predicting an overly broad suite of environmental conditions) represents an important ongoing priority in the development of this field, and depends in large part on being able to decide which models are “better” than others.

3. Modified ROC approach

Given the above considerations, we outline a series of modifications to ROC analysis that make it consistent with the characteristics of ENM applications, building on previous work with partial-area ROC analyses, as follows. (1) The x-axis is not calibrated based on successful versus unsuccessful prediction of absence points, but rather on the proportion of the overall area under consideration predicted as present. This change follows the reasoning that Phillips et al. (2006) used in substituting “background points” for “absence points” in their analyses of the Maxent algorithm. (2) AUC calculations are restricted to the domain of prediction of the algorithm, and do not extend to intervals along the x-axis in which an algorithm does not make predictions. Finally, (3) we restrict AUC calculations to the domain within which omission error is sufficiently low to meet user-defined requirements of predictive ability (Pepe, 2000). In this section, we develop these latter ideas in detail, and then illustrate the differences in a worked example.

We begin by defining a user-selected parameter E, the amount of error admissible along the true-positives axis given the requirements and conditions of the study. This parameter specifies how much omission error is acceptable: it might be set at E = 0 in applications in which highest-quality occurrence data are used, or it might be higher (perhaps 5–20%) when the occurrence data are known to include certain amounts of error (e.g., when using “found” data). Hence, the researcher considers the error characteristics of the data that will be used to test the model predictions and the needs of the particular study, and chooses a value of E appropriate to the question at hand.

Fig. 1 illustrates these ideas graphically. The upper left-hand panel depicts a typical ROC analysis, in which a curve representing some model prediction has an AUC = 0.8, which is then compared to the AUC for a line of null expectations (= 0.5), with significance values obtained either by combinatorial probability calculations or by bootstrapping (DeLong et al., 1988; Vida, 2006). The upper right-hand panel shows two such curves, one of which (curve A) is clearly ‘better’ than the other (curve B), in that it is more elevated above the line of null expectations.

In our proposed modification, the line defined by 1 − E on the vertical axis is intersected with the two ROC curves, and the projection of each intersection onto the x-axis identifies key area thresholds for the models, in this case xA and xB (Fig. 1). The lower two panels of Fig. 1 show the AUC comparisons for the ROC curves that would be used in our modified comparisons. In each, we consider only the portion of the ROC curve that lies within the predictive range of the modeling algorithm and within the range of acceptable models in terms of omission error (1 − E to 1). Also in each, the null expectations of AUC are <0.5 because only part of the full range of proportional areas predicted present is included in the calculations. The area under the ROC curve for each model can then be calculated empirically as a series of trapezoids (DeLong et al., 1988; Burden and Faires, 2005). Given both the change in the definition of the x-axis and the now-variable AUC for the null expectation, we now express ROC results as ratios of the area under the observed curve to the area under the trapezoid defined by the random line and the interval xA (or xB) to 1. This ratio rises above unity as the model's ROC curve improves with respect to random expectations, and comparisons of model ROC AUCs with null expectations must be achieved by means of bootstrapping.
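The partial-area ratio described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: each model is summarized as hypothetical (area, sensitivity) pairs, one per threshold, where area is the proportion of the study area predicted present; E is passed as a fraction (0.05 for E = 5%); the all-present point (1, 1) is appended; and the crossing with 1 − E is located by linear interpolation between adjacent thresholds, a choice the text does not specify. The null area is the integral of the random line y = x from x_E to 1, i.e., (1 − x_E²)/2. A conventional full AUC, with the first point joined to the origin, is included for contrast; all function names are illustrative.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through (xs[i], ys[i])."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def sorted_curve(area, sensitivity):
    """Sort points by area; append the all-present point (1, 1) if missing."""
    pts = sorted(zip(area, sensitivity))
    if pts[-1] != (1.0, 1.0):
        pts.append((1.0, 1.0))
    return [p[0] for p in pts], [p[1] for p in pts]


def traditional_auc(area, sensitivity):
    """Conventional full AUC: the first point is joined to the origin (0, 0)."""
    xs, ys = sorted_curve(area, sensitivity)
    return trapezoid_area([0.0] + xs, [0.0] + ys)


def partial_auc_ratio(area, sensitivity, E):
    """Partial AUC over sensitivity >= 1 - E, divided by the area under the
    null line y = x over the same x-interval [x_E, 1]. Returns (ratio, x_E)."""
    xs, ys = sorted_curve(area, sensitivity)
    floor = 1.0 - E
    # First threshold whose sensitivity meets the omission requirement.
    k = next(i for i, y in enumerate(ys) if y >= floor)
    if k == 0:
        # Requirement already met at the first point: do not extrapolate
        # below the algorithm's own domain of prediction.
        x_e, cx, cy = xs[0], xs, ys
    else:
        # Interpolate where the curve crosses the horizontal line y = 1 - E.
        x0, y0, x1, y1 = xs[k - 1], ys[k - 1], xs[k], ys[k]
        x_e = x0 + (floor - y0) * (x1 - x0) / (y1 - y0)
        cx, cy = [x_e] + xs[k:], [floor] + ys[k:]
    null_area = (1.0 - x_e ** 2) / 2.0  # area under y = x from x_e to 1
    if null_area <= 0.0:
        raise ValueError("1 - E is reached only at area 1; ratio undefined")
    return trapezoid_area(cx, cy) / null_area, x_e


# Hypothetical curves: "narrow" predicts only large areas, "broad" spans most
# of the x-axis.
narrow = ([0.6, 0.8, 1.0], [0.90, 0.97, 1.0])
broad = ([0.1, 0.3, 0.5, 0.7, 1.0], [0.3, 0.6, 0.8, 0.9, 1.0])
print(traditional_auc(*broad) / 0.5, traditional_auc(*narrow) / 0.5)
print(partial_auc_ratio(*broad, E=0.05)[0], partial_auc_ratio(*narrow, E=0.05)[0])
```

With these made-up curves, the broad model scores higher conventionally (AUC ratios of about 1.40 versus 1.31 for the narrow model), whereas at E = 5% the narrow model scores higher (about 1.12 versus 1.05), the kind of reversal discussed above. Significance testing by bootstrapping would repeat partial_auc_ratio on resampled test points and is not shown.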
Fig. 3 – Occurrence data used in the example discussed in the text: black and white points are occurrences of Mourning Dove (Zenaida macroura) drawn from the North American Breeding Bird Survey (1991–2000). Models were built based on the points in the off-diagonal quadrants, and were tested based on the points in the on-diagonal quadrants.

4. Worked example

4.1. Methods

We use here an example analysis drawn from a recent comparative study (Peterson et al., 2007). As full details are provided in that publication, and given that the points made herein are not specific to any particular methodology, we provide here only a sketch of the methods that were employed. We based this example on Mourning Dove (Zenaida macroura) occurrence data drawn from the North American Breeding Bird Survey (BBS) (Sauer et al., 2001). To ensure that occurrences used in analyses represent reasonably stable populations, we used only BBS survey routes on which the species had been detected in ≥8 years in 1991–2000; overall, 1202 presence points were available for the analyses.

To challenge the ENM algorithms to predict into broad unsampled areas (a niche-modeling challenge), we separated available occurrence points into quadrants based on whether their coordinates fell above or below the median longitude and median latitude of occurrence localities. Henceforth, we refer to the NW and SE quadrants as ‘on-diagonal,’ and the NE and SW pair of quadrants as ‘off-diagonal’ (Fig. 3); we trained models based on the off-diagonal quadrants (582 points) and tested them using the independent occurrence points in the on-diagonal quadrants (620 points) (Peterson and Shaw, 2003). This manipulation challenges modeling algorithms to predict into unsampled regions, rather than simply interpolating or filling gaps in a densely sampled landscape. It is important to note that all aspects of model development (including, e.g., best subsets filtering in GARP) (Anderson et al., 2003) were carried out on one pair of quadrants, and testing and model evaluation in the other pair of quadrants only.

We characterized North American (24.3–76.5° N, 52.0–169.5° W) environments based on 19 biologically meaningful climate parameters drawn from the 10′ WorldClim data set (Hijmans et al., 2005), supplemented with information on topographic features summarized in four additional raster data layers (elevation, slope, aspect, compound topographic index) from the 1 km resolution Hydro-1K digital elevation model data set (USGS, 2001). All data sets were resampled to 10′ resolution to reflect the spatial accuracy of the occurrence data; the dimensionality of the environmental data was reduced by means of principal components analysis (PCA) to create new axes that summarized variation in fewer (independent) dimensions (Peterson et al., 2007). We retained the first 11 components, which together explained >99% of the overall variation in environmental parameters.

Several approaches have been used to approximate species’ ecological niches (Segurado and Araújo, 2004), as exemplified by a recent broad comparative study of model performance (Elith et al., 2006). Here, for the purpose of illustration, we compared three methods: one that performed relatively poorly in the Elith et al. (2006) study, the Genetic Algorithm for Rule-set Prediction (GARP) (Stockwell, 1999; Pereira, 2002), versus one of the top performers, a maximum entropy (Maxent) approach (Phillips et al., 2006). Also included was the Minimum Distance algorithm (OpenModeller¹, version 0.1; hereafter MinDist), which is equivalent to the simplest manifestation of DOMAIN but based on Euclidean distances instead of on the Gower metric (Carpenter et al., 1993); we included it because it presents some interesting contrasts with the other two algorithms.

GARP models were developed using a desktop version that permits flexibility in model development (Pereira, 2002).

¹ http://openmodeller.sourceforge.net/.
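The quadrant partition described in Section 4.1 can be sketched as follows. This is an illustrative helper, not the procedure used in the study: it assumes coordinates in signed decimal degrees (west longitudes negative) and, because the text does not say how points lying exactly on a median were handled, assigns such points to the east or north side. The function name is hypothetical.

```python
from statistics import median


def split_quadrants(points):
    """Split (lon, lat) points at the median longitude and median latitude.

    Returns (on_diag, off_diag): NW + SE points versus NE + SW points.
    Points lying exactly on a median are assigned to the east / north side.
    """
    med_lon = median(lon for lon, _ in points)
    med_lat = median(lat for _, lat in points)
    on_diag, off_diag = [], []
    for lon, lat in points:
        west = lon < med_lon    # west longitudes are negative, so smaller = more western
        north = lat >= med_lat
        # NW (west and north) and SE (east and south) are on-diagonal.
        (on_diag if west == north else off_diag).append((lon, lat))
    return on_diag, off_diag
```

Models would then be trained on off_diag and evaluated on on_diag, matching the off-diagonal training and on-diagonal testing design of the worked example.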

In GARP, occurrence points from the pair of quadrants on which models are to be trained are divided randomly into training and “extrinsic test data” sets; the former is again divided evenly into “training data” (for model rule development) and “intrinsic test data” sets (for model rule evaluation and refinement). GARP works in an iterative process of rule selection, evaluation, testing, and incorporation or rejection: first, a method is chosen from a set of possibilities and applied to the training data, and a rule is developed; rules may evolve by a number of means (e.g., truncation, point changes, crossing-over among rules) to maximize predictivity. Predictive accuracy is then evaluated on independent points resampled from the intrinsic test data, and the change in predictive accuracy from one iteration to the next is used to decide whether a particular rule should be incorporated into the model. To force GARP models to be general, and to minimize overfitting, we followed the procedures of all recent GARP applications and used the best subsets procedure (Anderson et al., 2003). We then summed the resulting 100 grids to create a surface summarizing model agreement, with integer values ranging 0–100.
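The agreement surface described above, together with the area-based x-axis of Section 3, can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the GARP or ArcView workflow: grids are flattened to lists, the 0/1 grids stand in for the selected best-subsets models, test_cells holds indices of cells containing independent test occurrences, and a cell is treated as predicted present when its score is at or above the threshold. Both function names are hypothetical.

```python
def agreement_surface(binary_grids):
    """Cell-wise sum of equally sized 0/1 grids (each flattened to a list).

    With 100 binary models this yields integer agreement values 0-100.
    """
    return [sum(cells) for cells in zip(*binary_grids)]


def modified_roc_points(scores, test_cells):
    """One (proportion of area predicted present, sensitivity) pair per threshold.

    scores: one value per cell of the study area.
    test_cells: indices of cells holding independent test occurrences.
    Thresholds run from the strictest (highest score) to the loosest.
    """
    n = len(scores)
    points = []
    for t in sorted(set(scores), reverse=True):
        present = [s >= t for s in scores]
        area = sum(present) / n
        sensitivity = sum(present[i] for i in test_cells) / len(test_cells)
        points.append((area, sensitivity))
    return points


grids = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
surface = agreement_surface(grids)
print(modified_roc_points(surface, test_cells=[0, 2]))
```

Each returned pair is one point of the modified ROC curve (x = proportion of area predicted present, y = sensitivity); the same thresholding would apply to any integer-valued surface, such as the scaled Maxent grids described next.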
Maxent models were developed using software described and tested in detail in recent publications (Phillips et al., 2004, 2006). Maxent fits a probability distribution for occurrence of the species in question to the set of pixels across the study region, based on the idea that the best explanation for unknown phenomena will maximize the entropy of the probability distribution, subject to the appropriate constraints. In the case of modeling ecological niches of species, these constraints require that the difference between the mean values of the variable distributions predicted by the algorithm and the observed means always remain smaller than a “regularization parameter,” β (Phillips et al., 2004, 2006). We used default parameters for Maxent models (i.e., no random subsampling, regularization multiplier = 1, 500 maximum iterations, 10,000 background points, convergence limit = 10⁻⁵). Given the real-number nature of Maxent predictions, and given the much greater ease of manipulating integer grids, we imported results into ArcView as floating-point grids, multiplied them by 100, and converted them to integer grids for further analysis.
Finally, we estimated niche models using the very simple MinDist algorithm. Here, for each pixel in the landscape, the Euclidean distance in a normalized environmental space is calculated to each known occurrence point, and the minimum of this set of distances is assigned as the predicted value of the pixel in question. Although a maximum distance parameter can be set to eliminate very large distances from consideration, we did not set one, and instead allowed each pixel in the landscape to be assigned a continuous value indicating its similarity to known occurrences of the species.
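The MinDist scoring just described can be sketched in Python. The text does not say how the environmental space was normalized, so this sketch assumes z-scores computed over the study-area cells (a constant variable is left unscaled); it requires Python 3.8+ for math.dist and, like the study, applies no maximum-distance cutoff. Function names are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def zscore_params(cells):
    """Per-variable mean and population SD over the study-area cells."""
    cols = list(zip(*cells))
    # A constant variable has SD 0; use 1.0 so it is left unscaled.
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def min_dist_scores(cells, occurrences):
    """Minimum Euclidean distance from each cell to any occurrence, computed
    in an environmental space standardized with z-scores (an assumption)."""
    mus, sds = zscore_params(cells)

    def standardize(v):
        return tuple((x - m) / s for x, m, s in zip(v, mus, sds))

    z_occ = [standardize(o) for o in occurrences]
    return [min(dist(standardize(c), o) for o in z_occ) for c in cells]


cells = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0), (3.0, 5.0)]
print(min_dist_scores(cells, occurrences=[(0.0, 5.0)]))
```

Lower scores mean greater environmental similarity, so the scores would need to be negated (or the threshold direction reversed) before applying a rule in which higher values mean predicted presence.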
We summarized these three models in several ways that relate to ROC analyses, all based solely on independent testing points from the quadrants that were not used to train the models. In particular, at each predictive threshold, we calculated sensitivity as 1 − omission error, the latter measured with the independent testing data from the other two quadrants of the species’ distribution (Fig. 3). We calculated AUCs using the trapezoid method (Burden and Faires, 2005), and present our AUC comparisons as the ratio of the model AUC to the null expectation described above (referred to henceforth as “AUC ratios”). Bootstrapping manipulations to permit evaluation of the statistical significance of AUCs (as compared with null expectations) were achieved by resampling 290 test points (50% of the total test points available) with replacement 200 times from the overall pool of testing data in S-Plus (version 7). One-tailed significance of differences in AUC from the line of null expectations was assessed in two ways: by fitting a standard normal variate (the z-statistic) and calculating the probability that the mean AUC ratio is ≤1, and separately by counting the number of bootstrap replicates with AUC ratios ≤1.

Fig. 4 – Illustration of receiver operating characteristic (ROC) curves at different thresholds of E, the user-defined error tolerance. E = 100 is equivalent to the traditional ROC analysis; the lower two panels show E = 5 and 1. The point is that the relationships between the curves change as one focuses on the lower-omission models instead of the whole spectrum of thresholds.

Table 2 – Summary of statistics describing receiver operating characteristic curves for three modeling algorithms (MinDist, GARP, Maxent) at each of three values of E, the threshold of acceptable omission error (MinDist statistics only presented for two E values, for reasons explained in the text)

Algorithm  E (%)   Minimum  Maximum  Mean    SD      Replicates ≤ 1   P
MinDist    100     1.406    1.500    1.457   0.020   0                9.8 × 10⁻¹¹⁴
MinDist    5       1.048    1.130    1.093   0.018   0                2.7 × 10⁻⁷
GARP       100     1.248    1.304    1.273   0.010   0                1.0 × 10⁻¹⁷⁴
GARP       5       1.119    1.183    1.146   0.014   0                8.1 × 10⁻²⁵
GARP       1       1.080    1.122    1.087   0.007   0                8.2 × 10⁻³⁴
Maxent     100     1.439    1.539    1.488   0.018   0                1.3 × 10⁻¹⁵⁶
Maxent     5       0.982    1.179    1.132   0.049   18               0.0032
Maxent     1       0.953    1.118    1.060   0.041   25               0.0736

Values presented are AUC ratios (minimum, maximum, mean, and standard deviation, SD) across 200 bootstrap replicates, the number of bootstrap replicates falling at or below unity, and the probability P that the mean is ≤1 based on a standard normal variate associated with the mean and standard deviation.

4.2. Application

We developed modified ROC curves for each of the three model outputs at each of three values of E: 100% (in which the user accepts models across the entire spectrum of areas predicted as present, equivalent to the traditional ROC application), 5%, and 1% (Fig. 4, Table 2). As mentioned above, at E = 100%, Maxent clearly outperformed GARP, as did MinDist. However, Fig. 4 makes clear that much of this difference arises because the GARP model makes no predictions of less than ∼65% of the study area: the chord drawn from that point on the graph to the origin leaves out much of the area included under the Maxent and MinDist curves. AUC ratios were 1.46 for MinDist and 1.49 for Maxent, but only 1.27 for GARP, although all three were significantly elevated above the line of null expectations (bootstrap manipulation, all P ≪ 0.05; Table 2).

At E = 5%, however, the relative positions of the curves shift. Ignoring the lower part of the curves (corresponding to model thresholds that omit more than the user-stated tolerance), Maxent and GARP are now the higher curves, and MinDist is considerably lower (Fig. 4, Table 2). AUC ratios were highest for Maxent and GARP (1.13 and 1.15, respectively), and lower for MinDist (1.09). Although all three were significantly better than null expectations by the z-statistic, Maxent did not achieve statistical significance based on the simpler count of bootstrap replicates with AUC ratios ≤1 (18 of 200 replicates; Table 2). Note that MinDist provides predictions only up to 84.8% of the study area, so the region between that value and unity along the x-axis was filled by a straight line. The incomplete trace of the ROC curve provided by MinDist is unlikely to account for its poor performance in the partial-area AUC calculation, and we retain it in this example for full comparability with the other two methods. Full implementations of the methodology we present may need to limit comparisons in this region of the graph as well.

Finally, at E = 1%, MinDist provided no predictions in this interval (Fig. 4), and so was excluded from calculations. GARP had an AUC ratio of 1.09, compared with Maxent’s 1.06. While the GARP AUC was significantly higher than null expectations by both measures, the Maxent AUC ratio was ≤1 in 12% of the bootstrap replicates, and the z-statistic yielded P = 0.074, indicating that the Maxent curve was not significantly elevated above null expectations (Table 2). Hence, the models that appeared most accurate at E = 100% were not the most accurate when model tests were restricted to the region of interest in the niche modeling exercise.

5. Discussion and conclusions

We emphasize that the purpose of this contribution is not to establish that any niche modeling method is better or worse than any other. In fact, we considered removing modeling method names from the manuscript and replacing them with “X,” “Y,” and “Z,” to focus readers’ attention on the key points. In particular, we assert that currently accepted model evaluation techniques are not adequate for niche modeling applications, and can yield inaccurate and inappropriate conclusions in many cases. This paper presents a first set of steps towards remedying these failings.

A recent broad comparison (in which two of the authors of this paper participated) compared 14 modeling methods, and identified a suite of methods with particularly good predictive abilities that included Maxent (Elith et al., 2006). This large-scale comparison was nonetheless designed as an SDM (distribution-modeling) exercise, and as such included absence data as an integral element in model testing. The three measures of model predictivity employed (the customary ROC analyses, a kappa statistic, and a correlation-based procedure) all balance correct predictions of presences and absences, measuring the ability of an algorithm to discriminate between sites where a species is present and those where it is absent (Elith et al., 2006). However, as demonstrated above, traditional ROC approaches can identify a method as highly accurate when the method is in fact inferior in the range of predictive thresholds likely to be of interest in niche modeling exercises (which indicates that part of the variation among models in that study is artifactual, regardless of whether the application is SDM or ENM). Furthermore, many recent applications of these methods are explicitly ENM (niche-modeling) applications (Anderson et al., 2002; Pearson et al., 2002; Graham et al., 2004; Araújo et al., 2005a; Thuiller et al., 2005; Wiens and Graham, 2005), so the customary ROC analyses should be regarded with caution in these cases.

The evaluation methodology outlined in this paper achieves several of our goals. First, it removes the emphasis on absence data, which in niche-modeling applications can be positively misleading (Peterson, 2006). Second, it emphasizes the key role of omission error in evaluating niche model predictivity (Anderson et al., 2003). Finally, we follow a previous suggestion, made in a very different application of ROC analysis, that analyses may best be limited to subsectors of the ROC space when certain portions of that space are not directly relevant to the applications of interest (Jiang et al., 1996; Pepe, 2000; Dodd and Pepe, 2003). In niche modeling, this modification allows the user to set bounds on the types of predictions that are to be considered (Pepe, 2000; Dodd and Pepe, 2003). A researcher interested in evaluating the invasive potential of a species would almost certainly be disappointed with a method that performs well at thresholds with associated omission errors of >50%! Taking into account the intended uses of the model, as well as the error-related characteristics of the input data, is an important refinement to model evaluation approaches.

Clearly, much work remains in the development of these methodologies. In reality, limits on both the sensitivity and the false-positive axes may be desirable (Jiang et al., 1996; Dodd and Pepe, 2003). In our modified approach, limiting the fraction of total area that an algorithm is allowed to predict (the overprediction) may be biologically sensible. This restriction would create partial ROC curves, limited on one side by the minimum acceptable sensitivity and on the other by the maximum tolerable overprediction. Once more experience with our modified approach has been gathered, a broad comparative study parallel to the previous (distribution-modeling) study (Elith et al., 2006), but based on niche modeling ideas, would be particularly instructive. Because the approach we present here differs in several ways from conventional ROC analysis, the probabilistic interpretations of ROC scores (Fawcett, 2003) and their relation to the Mann–Whitney test will need to be reassessed. Finally, once the final form of the methodology for partial-ROC applications to predictions of species’ geographic distributions is clear, developing program code to permit easy implementation would be desirable.

Acknowledgements

We thank many valued colleagues for discussions of these and related topics over the past several years, particularly Enrique Martínez-Meyer, Robert Anderson, Richard Pearson, Jake Overton, and Simon Ferrier, although we may or may not have agreed.

References

Anderson, R.P., Lew, D., Peterson, A.T., 2003. Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecol. Modell. 162, 211–232.
Anderson, R.P., Peterson, A.T., Gómez-Laverde, M., 2002. Using niche-based GIS modeling to test geographic predictions of competitive exclusion and competitive release in South American pocket mice. Oikos 93, 3–16.
Araújo, M.B., Guisan, A., 2006. Five (or so) challenges for species distribution modelling. J. Biogeogr. 33, 1677–1688.
Araújo, M.B., Pearson, R.G., Thuiller, W., Erhard, M., 2005a. Validation of species-climate impact models under climate change. Global Change Biol. 11, 1504–1513.
Araújo, M.B., Whittaker, R.J., Ladle, R.J., Erhard, M., 2005b. Reducing uncertainty in projections of extinction risk from climate change. Global Ecol. Biogeogr. 14, 529–538.
Burden, R.L., Faires, J.D., 2005. Numerical Analysis, eighth ed. Thomson Books, Belmont, California.
Carpenter, G., Gillison, A.N., Winter, J., 1993. DOMAIN: a flexible modeling procedure for mapping potential distributions of animals and plants. Biodivers. Conserv. 2, 667–680.
DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L., 1988. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845.
Dodd, L.E., Pepe, M.S., 2003. Partial AUC estimation and regression. Biometrics 59, 614–623.
Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettman, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberón, J., Williams, S.E., Wisz, M.S., Zimmermann, N.E., 2006. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29, 129–151.
Fawcett, R., 2003. ROC Graphs: Notes and Practical Considerations for Data Mining Research. Technical Report HPL-2003-4. HP Laboratories, Palo Alto, California.
Fielding, A.H., Bell, J.F., 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.
Graham, C.H., Ron, S.R., Santos, J.C., Schneider, C.J., Moritz, C., 2004. Integrating phylogenetics and environmental niche models to explore speciation mechanisms in dendrobatid frogs. Evolution 58, 1781–1793.
Guisan, A., Lehmann, A., Ferrier, S., Austin, M.P., Overton, J.M., Aspinall, R., Hastie, T., 2006. Making better biogeographical predictions of species’ distributions. J. Appl. Ecol. 43, 386–392.
Guisan, A., Thuiller, W., 2005. Predicting species distribution: offering more than simple habitat models. Ecol. Lett. 8, 993–1009.
Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology. Ecol. Modell. 135, 147–186.
Hijmans, R.J., Cameron, S., Parra, J., 2005. WorldClim, Version 1.3, http://biogeo.berkeley.edu/worldclim/worldclim.htm. University of California, Berkeley.
Hirzel, A.H., Hausser, J., Chessel, D., Perrin, N., 2002. Ecological-niche factor analysis: how to compute habitat-suitability maps without absence data? Ecology 83, 2027–2036.

Jiang, Y., Metz, C.E., Nishikawa, R.M., 1996. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 201, 745–750.
Lobo, J.M., Jimenez-Valverde, A., Real, R., 2007. AUC: a misleading measure of the performance of predictive distribution models. Global Ecol. Biogeogr.
Lobo, J.M., Verdú, J.R., Numa, C., 2006. Environmental and geographical factors affecting the Iberian distribution of flightless Jekelius species (Coleoptera: Geotrupidae). Divers. Distributions 12, 179–188.
Pearson, R.G., Dawson, T.P., 2003. Predicting the impacts of climate change on the distribution of species: are bioclimate envelope models useful? Global Ecol. Biogeogr. 12, 361–371.
Pearson, R.G., Dawson, T.P., Berry, P.M., Harrison, P.A., 2002. SPECIES: a spatial evaluation of climate impact on the envelope of species. Ecol. Modell. 154, 289–300.
Pearson, R.G., Raxworthy, C., Nakamura, M., Peterson, A.T., 2007. Predicting species’ distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. J. Biogeogr. 34, 102–117.
Pepe, M.S., 2000. Receiver operating characteristic methodology. J. Am. Stat. Assoc. 95, 308–311.
Pereira, R.S., 2002. Desktop GARP. http://www.lifemapper.org/desktopgarp/.
Peterson, A.T., 2006. Uses and requirements of ecological niche models and related distributional models. Biodivers. Inform. 3, 59–72.
Peterson, A.T., Papeş, M., Eaton, M., 2007. Transferability and model evaluation in ecological niche modeling: a comparison of GARP and Maxent. Ecography 30, 550–560.
Peterson, A.T., Shaw, J.J., 2003. Lutzomyia vectors for cutaneous leishmaniasis in southern Brazil: ecological niche models, predicted geographic distributions, and climate change effects. Int. J. Parasitol. 33, 919–931.
Peterson, A.T., Soberón, J., Sánchez-Cordero, V., 1999. Conservatism of ecological niches in evolutionary time. Science 285, 1265–1267.
Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modeling of species geographic distributions. Ecol. Modell. 190, 231–259.
Phillips, S.J., Dudik, M., Schapire, R.E., 2004. A maximum entropy approach to species distribution modeling. In: Proceedings of the 21st International Conference on Machine Learning.
Raxworthy, C.J., Martínez-Meyer, E., Horning, N., Nussbaum, R.A., Schneider, G.E., Ortega-Huerta, M.A., Peterson, A.T., 2003. Predicting distributions of known and unknown reptile species in Madagascar. Nature 426, 837–841.
Sauer, J.R., Hines, J.E., Fallon, J., 2001. The North American Breeding Bird Survey, Results and Analysis 1966–2000, version 2001.2. USGS Patuxent Wildlife Research Center, Laurel, MD.
Segurado, P., Araújo, M.B., 2004. An evaluation of methods for modelling species distributions. J. Biogeogr. 31, 1555–1568.
Soberón, J., Peterson, A.T., 2004. Biodiversity informatics: managing and applying primary biodiversity data. Philos. Trans. R. Soc. Lond. B 359, 689–698.
Soberón, J., Peterson, A.T., 2005. Interpretation of models of fundamental ecological niches and species’ distributional areas. Biodivers. Inform. 2, 1–10.
Stockwell, D.R.B., 1999. Genetic algorithms II. In: Fielding, A.H. (Ed.), Machine Learning Methods for Ecological Applications. Kluwer Academic Publishers, Boston, pp. 123–144.
Stockwell, D.R.B., Peterson, A.T., 2002a. Controlling bias in biodiversity data. In: Scott, J.M., Heglund, P.J., Morrison, M.L. (Eds.), Predicting Species Occurrences: Issues of Scale and Accuracy. Island Press, Washington, DC, pp. 537–546.
Stockwell, D.R.B., Peterson, A.T., 2002b. Effects of sample size on accuracy of species distribution models. Ecol. Modell. 148, 1–13.
Stockwell, D.R.B., Peterson, A.T., 2003. Comparison of resolution of methods used in mapping biodiversity patterns from point occurrence data. Ecol. Indicators 3, 213–221.
Thuiller, W., Richardson, D.M., Pysek, P., Midgley, G.F., Hughes, G.O., Rouget, M., 2005. Niche-based modelling as a tool for predicting the risk of alien plant invasions at a global scale. Global Change Biol. 11, 2234–2250.
USGS, 2001. HYDRO1k Elevation Derivative Database, http://edcdaac.usgs.gov/gtopo30/hydro/. U.S. Geological Survey, Washington, D.C.
Vida, S., 2006. Accumetric Test Performance Analysis, Version 1.1. Accumetric Corporation, Montreal.
Wiens, J.J., 2004. Speciation and ecology revisited: phylogenetic niche conservatism and the origin of species. Evolution 58, 193–197.
Wiens, J.J., Graham, C.H., 2005. Niche conservatism: integrating evolution, ecology, and conservation biology. Annu. Rev. Ecol. Evol. Syst. 36, 519–539.
