Rethinking Receiver Operating Characteristic Analysis
3 authors:
Jorge Soberón
University of Kansas
Author's personal copy
Ecological Modelling 213 (2008) 63–72
available at www.sciencedirect.com
Article info

Article history: Received 25 May 2007; received in revised form 29 October 2007; accepted 16 November 2007; published online 9 January 2008.

Keywords: Ecological niche model; Model evaluation; Receiver operating characteristic; Area under curve; Omission error

Abstract

The area under the curve (AUC) of the receiver operating characteristic (ROC) has become a dominant tool in evaluating the accuracy of models predicting distributions of species. ROC has the advantage of being threshold-independent, and as such does not require decisions regarding thresholds of what constitutes a prediction of presence versus a prediction of absence. However, we show that, when two ROCs are compared, the AUC systematically undervalues models that do not provide predictions across the entire spectrum of proportional areas in the study area. Current ROC approaches in ecological niche modeling applications are also inappropriate because the two error components are weighted equally. We recommend a modification of ROC that remedies these problems, using partial-area ROC approaches to provide a firmer foundation for evaluation of predictions from ecological niche models. A worked example demonstrates that models evaluated favorably by traditional ROC AUCs are not necessarily the best when niche modeling considerations are incorporated into the design of the test.

© 2007 Elsevier B.V. All rights reserved.
∗ Corresponding author (A.T. Peterson). Tel.: +1 785 864 3926.
ISSN 0304-3800. doi:10.1016/j.ecolmodel.2007.11.008

The tools and techniques of ecological niche modeling (ENM) and the related ideas of species distribution modeling (SDM) have seen an impressive increase in activity in recent years (Guisan and Zimmermann, 2000; Soberón and Peterson, 2004; Araújo and Guisan, 2006). Many facets of these tools and their application have been examined in detailed analyses (Stockwell and Peterson, 2002a,b, 2003; Anderson et al., 2003; Pearson and Dawson, 2003; Araújo et al., 2005a,b; Guisan and Thuiller, 2005; Guisan et al., 2006; Pearson et al., 2007) that have greatly clarified the conditions of their use. However, in spite of such attention, how to evaluate the predictions of these models statistically remains incompletely and unsatisfactorily resolved (Fielding and Bell, 1997; Araújo and Guisan, 2006; Guisan et al., 2006; Lobo et al., 2007).

In recent publications, statistical evaluations of niche and distribution model predictions have generally been based on receiver operating characteristic (ROC) analyses (DeLong et al., 1988), as exemplified by a recent, large-scale model comparison (Elith et al., 2006) and many similar studies. Spatial predictions can present errors of omission (false negatives, leaving out known distributional area) and errors of commission (false positives, including unsuitable areas in the prediction). ROC analysis involves plotting sensitivity (i.e., the proportion of known presences predicted present, = 1 − false negative rate) against 1 − specificity (i.e., the proportion of known absences predicted present, = false positive rate; Fig. 1). The area under the ROC curve (AUC) is then compared against null expectations [the area under the line linking the origin with the upper right corner of the graph (1,1), which equals 0.5] either probabilistically or via bootstrap manipulations.

Here, we point out two sources of problems in ROC analyses that consistently favor certain kinds of algorithms over others. The first limitation of ROCs derives from the fact that certain algorithms span broad spectra of possible commission errors, whereas others are restricted to smaller ranges; we show that ROCs consistently favor the former over the latter. The second limitation derives from the very different meanings of “absence” in the context of ENM versus SDM; as currently used, ROC analyses do not distinguish between the two, and, again, consistently favor model predictions oriented toward one type of analysis (SDM) over the other (ENM). We present a modification of the traditional ROC approach that takes steps towards resolving these two problems.

1. The (simple part of the) problem: unequal span of model predictions

A diverse set of inferential tools has been applied to the challenge of estimating niches and predicting geographic distributions of species (Elith et al., 2006; Peterson, 2006), ranging from simple range rules to complex neural networks, genetic algorithms, maximum entropy, and multivariate regression algorithms. The outputs from these different techniques have different characteristics: most relevant here is that different techniques may span very different ranges of predicted area of presence of a species (e.g., range rules predict at one or a few thresholds, whereas multivariate regression approaches produce predictions across most of the spectrum of probabilities from 0 to 1). These differences have implications for how AUC scores are calculated, because AUC calculations assume that 1 − specificity spans the entire range [0,1], even though model predictions may not span that whole range. Special modifications to the approach are required for AUC comparisons in partial ROCs that span only a subset of the full spectrum of areal predictions (Jiang et al., 1996; Dodd and Pepe, 2003).

ROC can be applied directly to evaluation of SDM predictions (Fielding and Bell, 1997; Fawcett, 2003; Phillips et al., 2006), although even this use is not above question (Lobo et al., 2007). An SDM produces a prediction value related (sometimes equal) to the probability that a species is present in a cell. By assigning thresholds, the continuous scores can be turned into binary predictions, which can be correct or incorrect, producing a contingency table called the “confusion matrix” (Table 1). One confusion matrix exists per threshold value, and the four elements of the matrix can be used to calculate error characteristics.

In a conventional ROC, the proportion of true positives [a/(a + c)], equivalent to the sensitivity (or absence of omission error), is plotted against the proportion of false positives [b/(b + d)], which in turn is equivalent to 1 − specificity or the commission error. The plot in ROC space of sensitivity versus 1 − specificity displays how well an algorithm classifies instances as the threshold changes. In SDM and ENM applications, threshold changes mean that the area predicted as present also changes. Important sectors of this ROC space are the origin (0,0), where the algorithm never falsely identifies absences as presences but also fails to identify any known presence (which is useless), and the top right corner (1,1), where the algorithm identifies every true presence correctly but misidentifies all absences as presences (also useless, although in a different way). Finally, in the top left corner (0,1), the algorithm correctly identifies all true presences and never misclassifies a true absence as a presence. Therefore, the regions in ROC space near the (0,1) corner represent model predictions that successfully identify true presences and seldom misidentify absences as presences.

Now consider the behavior of a random classifier. Such an algorithm always randomly identifies as present a fixed proportion p of any set of instances, a function of the proportional area predicted present. This prediction rate is represented by the straight line joining the points (0,0) and (1,1). A random classificatory algorithm will select as present only a fraction p of true presences, giving a value of p on the sensitivity axis (y-axis). It will also select (wrongly) a fraction p of absences as presences, giving the same value of p on the x-axis. Therefore, as p varies, it traces a line along which the true-positive rate equals the false-positive rate (Fig. 1).

The above ideas can be applied directly to situations in which true presences and true absences are known, such as the typical SDM problem (Guisan and Zimmermann, 2000). By varying the threshold at which the score of an algorithm is regarded as a presence, a curve in ROC space is traced (Fig. 1); elevation of this curve above the straight line of random expectation is a measure of the discrimination capacity of the algorithm (i.e., its capacity to classify true presences and true absences correctly) (Fielding and Bell, 1997; Guisan and Zimmermann, 2000). In an ENM context, however, the situation differs slightly, but in important ways (see below).

In comparing the performance of different algorithms, in either an SDM or ENM context, a problem exists that, to our knowledge, has not been discussed previously in the literature on ENM or SDM: some algorithms span the entire range of possible commission errors, while others cover only
comparatively small regions of the overall ROC plot, either by design or by the intrinsic operation of the algorithm. In other words, while one algorithm may predict responses from 0 to 100% of false positives, another may predict only in the range of, for example, 40–90% (illustrated in Fig. 2 for Maxent, which predicts across the whole spectrum of areas, compared with GARP, which predicts only at the broader end of the spectrum, i.e., above ∼60%; details of methodologies for model generation are provided below in the worked example). Note that the x-axis differs from that of a conventional ROC curve, an issue that will be discussed in detail below.

Table 1 – Schema of a confusion matrix, in which predicted presences and absences are related to their known status as observed presence or absence

                        Observed
                        Present    Absent
Predicted   Present     a          b
            Absent      c          d

See text for explanation.

In practice, the ROC AUC is calculated based on a series of trapezoids (Fawcett, 2003), with the curve in essence “connecting the dots” in representing the different thresholds of
Fig. 1 – Summary of new recommendations for receiver operating characteristic (ROC) analysis in niche modeling. Upper left
hand panel: traditional ROC approach, comparing the AUC of the test curve with 0.5, which is the AUC of the null
expectation curve. Upper right hand panel: comparison of two curves, and illustration of how the user-chosen error
tolerance E identifies different critical area thresholds for the two curves. Lower panels: illustration of the AUC comparisons
that would be used to characterize each of the two curves.
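As a minimal sketch of the conventional ROC calculation described above (Python; not the authors' code, and the rule that a score at or above the threshold counts as “present” is an assumption), the functions below build one Table 1 confusion matrix per threshold, turn it into the point (b/(b + d), a/(a + c)) = (1 − specificity, sensitivity), and sum trapezoids to obtain the AUC. A Bernoulli version of the random classifier is included as well: labelling each instance present with probability p places it near (p, p), on the null line shown in the upper left-hand panel of Fig. 1.

```python
import random


def confusion_matrix(pres_scores, abs_scores, threshold):
    """Table 1 cells (a, b, c, d) when scores >= threshold count as 'present'."""
    a = sum(s >= threshold for s in pres_scores)  # observed present, predicted present
    c = len(pres_scores) - a                       # observed present, predicted absent (omission)
    b = sum(s >= threshold for s in abs_scores)    # observed absent, predicted present (commission)
    d = len(abs_scores) - b                        # observed absent, predicted absent
    return a, b, c, d


def roc_points(pres_scores, abs_scores):
    """One (1 - specificity, sensitivity) point per distinct threshold, plus the origin."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pres_scores) | set(abs_scores), reverse=True):
        a, b, c, d = confusion_matrix(pres_scores, abs_scores, t)
        points.append((b / (b + d), a / (a + c)))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve that 'connects the dots'."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def random_classifier_rates(p, n_pres, n_abs, seed=0):
    """Label each instance present with probability p; return (false-positive rate, sensitivity)."""
    rng = random.Random(seed)
    tp = sum(rng.random() < p for _ in range(n_pres))
    fp = sum(rng.random() < p for _ in range(n_abs))
    return fp / n_abs, tp / n_pres
```

For the toy scores presences = [0.9, 0.8, 0.4] and absences = [0.7, 0.3, 0.2], trapezoid_auc(roc_points(presences, absences)) is 8/9 (about 0.889), which equals the proportion of presence–absence pairs in which the presence scores higher; random_classifier_rates(0.3, 20000, 20000) returns two values close to 0.3.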
Fig. 2 – Summary of characteristics of three example model predictions. Top panel: area predicted present across the spectrum of thresholds for each model, based on a 3-threshold moving window. The bottom panel closely approximates a traditional receiver operating characteristic curve, except that the x-axis is measured as the proportion of the study area predicted present instead of as success in predicting absence points; for simplicity, only GARP and Maxent results are shown in this panel.
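A sketch of how the modified axes of Fig. 2 might be derived, assuming the model output is a score for every cell of the study area and that independent test presences are scored from the same map (the toy values below are illustrative, not the Mourning Dove data). For each distinct score threshold, x is the proportion of cells predicted present and y is the proportion of test presences predicted present. The helper predicted_area_span (a hypothetical name) reports the range of areas a model actually covers, which is narrow for coarse outputs such as the GARP example, with no predictions below roughly 60% of the area.

```python
def area_roc_points(cell_scores, presence_scores):
    """Modified ROC points: (proportion of area predicted present, sensitivity).

    cell_scores: model score for every cell of the study area.
    presence_scores: model score at each independent test presence.
    """
    n_cells, n_pres = len(cell_scores), len(presence_scores)
    points = []
    for t in sorted(set(cell_scores), reverse=True):  # strictest threshold first
        area = sum(s >= t for s in cell_scores) / n_cells
        sens = sum(s >= t for s in presence_scores) / n_pres
        points.append((area, sens))
    return points


def predicted_area_span(points):
    """Smallest and largest proportional areas the model actually predicts."""
    areas = [x for x, _ in points]
    return min(areas), max(areas)
```

With cells = [0, 0, 1, 1, 2, 2, 2, 2, 3, 3] and presences = [3, 3, 2, 1], area_roc_points returns [(0.2, 0.5), (0.6, 0.75), (0.8, 1.0), (1.0, 1.0)], and the span is (0.2, 1.0). No absence points are needed: the x-axis is built entirely from the map itself.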
are of dubious utility in the process of modeling ecological niches, unless some approach for generating more realistic absence data is used (Lobo et al., 2006). This point is easily understood if one considers invasive species: had one modeled the species’ ecological niche a few decades prior to its introduction in the novel region, one would have counted its future adventive distributional area as absence. This area was actually well within the ecological niche dimensions of the species, as the later invasion demonstrated.

As such, unless obtained somehow from the potential distribution (e.g., if annual plants were seeded experimentally across the region), absence data should not be employed in evaluating model quality in ENM applications. For this reason, and following previous observations by Phillips et al. (2006) that the logic of ROC allows for more general partitioning of instances than “presences” versus “absences,” we follow a modified ROC procedure that disposes entirely of absence data. Rather, we calculate the values used as the x-axis as the proportion of the overall area predicted as present, instead of using commission error calculated from vaguely defined (and often unavailable) data summarizing “absences” (Phillips et al., 2006).

Since absence data have been omitted and the 1 − specificity axis changed to proportion of area predicted as present, new interpretations are needed. In previous analyses, in which niche models were evaluated via expert opinion (Anderson et al., 2003), it was shown that omission error characteristics are more important in distinguishing good from bad models than are commission error considerations. Put simply, in a niche-modeling framework, a model that errs by omitting known points of presence is more seriously flawed than one that predicts areas not known to be inhabited (Raxworthy et al., 2003). Moreover, among models that overpredict, some ‘overprediction’ is in disjunct areas that likely represent areas inaccessible to the species for reasons unrelated to landscape suitability (e.g., historical dispersal limitations, speciation events, interspecific interactions) (Peterson et al., 1999; Wiens, 2004; Wiens and Graham, 2005): these areas do not represent model prediction error, but rather offer an accurate depiction of the spatial extent of habitable conditions for the species. Other models may reconstruct overly broad suites of environmental conditions as suitable for the species; these models genuinely fail in reconstructing the ecological niche of the species because they do not distinguish effectively between potential presence and absence. Distinguishing between these two possibilities (predicting areas not inhabited for nonecological reasons versus predicting an overly broad suite of environmental conditions) is an important ongoing priority in the development of this field, and depends in large part on being able to decide which models are “better” than others.

3. Modified ROC approach

Given the above considerations, we outline a series of modifications to ROC analysis that make it consistent with the characteristics of ENM applications, building on previous work with partial-area ROC analyses, as follows. (1) The x-axis is calibrated not on successful versus unsuccessful prediction of absence points, but on the proportion of the overall area under consideration that is predicted as present. This change follows the reasoning that Phillips et al. (2006) used in substituting “background points” for “absence points” in their analyses of the Maxent algorithm. (2) AUC calculations are restricted to the domain of prediction of the algorithm, and do not extend to intervals along the x-axis in which an algorithm does not make predictions. Finally, (3) we restrict AUC calculations to the domain within which omission error is sufficiently low to meet user-defined requirements of predictive ability (Pepe, 2000). In this section, we develop these latter ideas in detail, and then illustrate the differences in a worked example.

We begin by defining a user-selected parameter E, which refers to the amount of error admissible along the true-positives axis, given the requirements and conditions of the study. This parameter expresses how much omission error is acceptable: it might be set at E = 0 in applications in which highest-quality occurrence data are used, or it might be higher (perhaps 5–20%) when the occurrence data are known to include certain amounts of error (e.g., when using “found” data). Hence, the researcher considers the error characteristics of the data that will be used to test the model predictions and the needs of the particular study, and chooses a value of E appropriate to the question at hand.

Fig. 1 illustrates these ideas graphically: the upper left-hand panel depicts a typical ROC analysis, in which a curve representing some model prediction has an AUC = 0.8, which is then compared to the AUC for a line of null expectations (= 0.5), and significance values are obtained either by combinatorial probability calculations or by bootstrapping (DeLong et al., 1988; Vida, 2006). The upper right-hand panel shows two such curves, one of which (curve A) is clearly ‘better’ than the other (curve B), in that it is more elevated above the line of null expectations.

In our proposed modification, the line defined by 1 − E on the vertical axis is intersected with the two ROC curves, and the projection of each intersection onto the x-axis identifies key area thresholds for the models, in this case xA and xB (Fig. 1). The lower two panels of Fig. 1 show the AUC comparisons for the ROC curves that would be used in our modified comparisons. In each, we consider only the portion of the ROC curve that lies within the predictive range of the modeling algorithm and within the range of acceptable models in terms of omission error (1 − E to 1). Also in each, the null expectations of AUC are <0.5 because only part of the full range of proportional areas predicted present is included in the calculations. The area under the ROC curve for each model can then be calculated empirically as a series of trapezoids (DeLong et al., 1988; Burden and Faires, 2005). Given both the change in the definition of the x-axis and the now-variable AUC for the null expectation, we express ROC results as ratios of the area under the observed curve to the area under the trapezoid defined by the random line and the interval from xA (or xB) to 1; for curve A, this null area is (1 − xA²)/2. The ratio rises above unity as the model’s ROC curve improves with respect to random expectations, and comparisons of model ROC AUCs with null expectations must be made by means of bootstrapping.
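The AUC ratio just described can be sketched as follows. This is a hedged reading of the text, not the authors’ S-Plus code. Inputs are (proportion of area predicted present, sensitivity) pairs; the curve is closed with straight lines to (0,0) and (1,1); xE is found by linear interpolation where sensitivity first reaches 1 − E; and a hypothetical flag restrict_to_domain applies point (2) by never integrating to the left of the smallest area the model actually predicts. The null area over [xE, 1] is (1 − xE²)/2, the area under y = x.

```python
def _interp(xs, ys, x):
    """Linearly interpolate y at x on a polyline whose xs are non-decreasing."""
    for k in range(1, len(xs)):
        if xs[k] >= x:
            x0, y0, x1, y1 = xs[k - 1], ys[k - 1], xs[k], ys[k]
            return y1 if x1 == x0 else y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    return ys[-1]


def partial_auc_ratio(area, sens, E, restrict_to_domain=True):
    """Partial AUC of a model divided by that of the null line y = x.

    area: proportion of the study area predicted present at each threshold.
    sens: proportion of test presences predicted present at the same thresholds.
    E:    admissible omission error as a fraction (0.05 for E = 5%).
    """
    if not 0.0 <= E <= 1.0:
        raise ValueError("E must be a fraction between 0 and 1")
    pts = sorted(zip(area, sens))
    # Close the curve with straight lines to (0, 0) and (1, 1); the text fills
    # MinDist's unpredicted range (above 84.8% of the area) the same way.
    xs = [0.0] + [x for x, _ in pts] + [1.0]
    ys = [0.0] + [y for _, y in pts] + [1.0]

    # x_E: area at which sensitivity first reaches 1 - E (ys[-1] == 1, so one exists).
    target = 1.0 - E
    i = next(k for k, y in enumerate(ys) if y >= target)
    if i == 0:
        x_e = 0.0  # E = 100%: the whole curve is admissible
    else:
        x0, y0, x1, y1 = xs[i - 1], ys[i - 1], xs[i], ys[i]
        x_e = x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    if restrict_to_domain:
        x_e = max(x_e, pts[0][0])  # do not credit the chord to the origin
    if 1.0 - x_e < 1e-12:
        return float("nan")  # no admissible interval left: ratio undefined

    # Trapezoids under the model curve from x_E to 1.
    seg = [(x_e, _interp(xs, ys, x_e))] + [(x, y) for x, y in zip(xs, ys) if x > x_e]
    model_area = sum((xb - xa) * (ya + yb) / 2.0
                     for (xa, ya), (xb, yb) in zip(seg, seg[1:]))
    null_area = (1.0 - x_e ** 2) / 2.0  # area under y = x between x_E and 1
    return model_area / null_area
```

For toy points area = [0.2, 0.5, 0.8] and sensitivity = [0.6, 0.9, 0.97], E = 0.05 gives xE = 5/7 (about 0.714) and an AUC ratio of about 1.14. With E = 1.0 and restrict_to_domain=False, the function returns the full-curve AUC divided by 0.5 (0.7625/0.5 = 1.525), which is the traditional comparison. Whether the chord to the origin should count at E = 100% is a judgment call. The worked example treats E = 100% as equivalent to the traditional ROC, with the chord included, whereas point (2) excludes areas where the algorithm makes no predictions; the flag exposes that choice.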
Fig. 3 – Occurrence data used in the example discussed in the text: black and white points are occurrences of Mourning
Dove (Zenaida macroura) drawn from the North American Breeding Bird Survey (1991–2000). Models were built based on the
points in the off-diagonal quadrants, and were tested based on the points in the on-diagonal quadrants.
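The quadrant-based split in Fig. 3 can be sketched as below. The dividing longitude and latitude (lon0, lat0) and the choice of which pair of quadrants counts as “on-diagonal” (here NE and SW) are assumptions made for illustration, because the caption does not specify them. A split of this kind keeps test points spatially separate from the points used to build the models.

```python
def quadrant_split(points, lon0, lat0):
    """Split (lon, lat) occurrences into (training, testing) lists by quadrant.

    Assumed convention: the quadrants NE and SW of (lon0, lat0) are
    'on-diagonal' (testing); NW and SE are 'off-diagonal' (training).
    """
    train, test = [], []
    for lon, lat in points:
        # NE -> (True, True), SW -> (False, False): both count as on-diagonal.
        on_diagonal = (lon >= lon0) == (lat >= lat0)
        (test if on_diagonal else train).append((lon, lat))
    return train, test
```

For example, quadrant_split([(-100, 40), (-80, 40), (-100, 30), (-80, 30)], -90, 35) puts (-100, 40) and (-80, 30) in the training list and (-80, 40) and (-100, 30) in the testing list.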
Table 2 – Summary of statistics describing receiver operating characteristic curves for three modeling algorithms
(MinDist, GARP, Maxent) at each of three values of E, the threshold of acceptable omission error (MinDist statistics only
presented for two E values, for reasons explained in the text)
MinDist GARP Maxent
Values presented are AUC ratios (minimum, maximum, mean, and standard deviation) across 200 bootstrap replicates, the number of bootstrap
replicates falling at or below unity, and the probability that the mean is ≤1 based on a standard normal variate associated with the mean and
standard deviation.
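The summaries in Table 2 can be outlined with a sketch like the following. It assumes a caller-supplied function (the hypothetical ratio_fn) that computes an AUC ratio from a resampled set of test points, for example by scoring them and applying a partial-ROC calculation. Resampling half of the test points with replacement 200 times mirrors the design described in the text (290 of the available test points). The normal-approximation probability is P = Φ((1 − mean)/sd), computed with the error function.

```python
import math
import random
import statistics


def bootstrap_ratios(test_points, ratio_fn, n_reps=200, frac=0.5, seed=None):
    """Resample frac * n test points with replacement n_reps times; return AUC ratios."""
    rng = random.Random(seed)
    k = int(round(frac * len(test_points)))
    return [ratio_fn(rng.choices(test_points, k=k)) for _ in range(n_reps)]


def summarize_ratios(ratios):
    """Table 2-style summary of bootstrap AUC ratios."""
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)
    if sd == 0:
        p_le_1 = 1.0 if mean <= 1.0 else 0.0  # degenerate case: no spread
    else:
        z = (1.0 - mean) / sd  # standard normal variate at ratio = 1
        # Probability that a Normal(mean, sd) variate is <= 1.
        p_le_1 = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return {
        "min": min(ratios),
        "max": max(ratios),
        "mean": mean,
        "sd": sd,
        "n_at_or_below_1": sum(r <= 1.0 for r in ratios),
        "p_mean_le_1": p_le_1,
    }
```

Reporting both the count of replicates at or below 1 and the normal-approximation probability follows the two significance criteria used in the worked example, which can disagree (as they did for Maxent at E = 5%).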
model AUC to the null expectation described above (referred to henceforth as “AUC ratios”). Bootstrap manipulations to evaluate the statistical significance of AUCs (as compared with null expectations) were performed by resampling 290 test points (50% of the total test points available) with replacement 200 times from the overall pool of testing data in S-Plus (version 7). One-tailed significance of differences in AUC from the line of null expectations was assessed both by fitting a standard normal variate (the z-statistic) and calculating the probability that the mean AUC ratio is ≤1, and separately by counting the number of bootstrap replicates with AUC ratios of ≤1.

4.2. Application

We developed modified ROC curves for each of the three model outputs at each of three values of E: 100% (in which the user accepts models across the entire spectrum of areas predicted as present, equivalent to the traditional ROC application), 5%, and 1% (Fig. 4, Table 2). As mentioned above, at E = 100%, Maxent clearly outperformed GARP, as did MinDist. However, Fig. 4 makes clear that much of this difference springs from the fact that the GARP model makes no predictions of less than ∼65% of the study area: the chord drawn from that point on the graph to the origin leaves out much of the area included under the Maxent and MinDist curves. AUC ratios were 1.46 for MinDist and 1.49 for Maxent, but only 1.27 for GARP, although all three were significantly elevated above the line of null expectations (bootstrap manipulation, all P < 0.05; Table 2).

At E = 5%, however, the relative positions of the curves shift. Ignoring the lower part of the curves (corresponding to model thresholds that omit more than the user-stated tolerance), Maxent and GARP are now the higher curves, and MinDist is considerably lower (Fig. 4, Table 2). AUC ratios were highest for Maxent and GARP (1.13 and 1.15, respectively), and lower for MinDist (1.09). Although all three were significantly better than null expectations using the z-statistic, Maxent did not achieve statistical significance based on the simpler counts of numbers of replicates with AUC ratios of > 1 (Table 2). Note that MinDist provides predictions only up to 84.8% of the study area, so the region between that value and unity along the x-axis was filled by a straight line; the incomplete trace of the ROC curve provided by MinDist is unlikely to account for its poor performance in the partial-area AUC calculation, and we retain it in this example for full comparability with the other two methods. Full implementations of the methodology we present may also wish to limit comparisons in this region of the graph.

Finally, at E = 1%, MinDist provided no predictions in this interval (Fig. 4), and so was excluded from calculations. GARP had an AUC ratio of 1.09, as compared with Maxent’s 1.06. While the GARP AUC was significantly higher than null expectations by both measures, the Maxent AUC ratio was ≤1 in 12% of the bootstrap replicates, and the z-statistic yielded P = 0.074, indicating that the Maxent curve was not significantly elevated above the null expectations (Table 2). Hence, models that appeared most accurate in their predictions at E = 100% were not the most accurate when model tests were restricted to the region of interest in the niche modeling exercise.

5. Discussion and conclusions

We emphasize that the purpose of this contribution is not to establish that any niche modeling method is better or worse than any other method. In fact, we considered removing modeling method names from the manuscript and replacing them with “X,” “Y,” and “Z,” to focus readers’ attention on the key points. In particular, we assert that currently accepted model evaluation techniques are not adequate for niche modeling applications, and can yield inaccurate and inappropriate conclusions in many cases. This paper presents a first set of steps towards remedying these failings.

A recent broad comparison (in which two of the authors of this paper participated) evaluated 14 modeling methods, and identified a suite of methods with particularly good predictive abilities that included Maxent (Elith et al., 2006). This large-scale comparison was nonetheless designed as an SDM (distribution-modeling) exercise, and as such included absence data as an integral element in model testing. The three measures of model predictivity employed (the customary ROC analyses, a kappa statistic, and a correlation-based procedure) all balance correct predictions of presences and absences, measuring the ability of an algorithm to discriminate between sites where a species is present and those
where it is absent (Elith et al., 2006). However, as demonstrated above, traditional ROC approaches can identify a method as highly accurate when the method is in fact inferior in the range of predictive thresholds likely to be of interest in niche modeling exercises (this point indicates that part of the variation among models in the study in question is artifactual, regardless of whether the application is one of SDM or one of ENM). Furthermore, many recent applications of these methods are explicitly ENM (niche-modeling) applications (Anderson et al., 2002; Pearson et al., 2002; Graham et al., 2004; Araújo et al., 2005a; Thuiller et al., 2005; Wiens and Graham, 2005), so the customary ROC analyses should be regarded with caution in these cases.

The evaluation methodology outlined in this paper achieves several of our goals. First, it removes the emphasis on absence data, which in niche-modeling applications can be positively misleading (Peterson, 2006). Second, it emphasizes the key role of omission error in evaluating niche model predictivity (Anderson et al., 2003). Finally, it follows a previous suggestion, made in a very different application of ROC analysis, that analyses may best be limited to subsectors of the ROC space when certain portions of that space are not directly relevant to applications of interest (Jiang et al., 1996; Pepe, 2000; Dodd and Pepe, 2003); in niche modeling, this modification allows the user to set bounds on the types of predictions that are to be considered (Pepe, 2000; Dodd and Pepe, 2003). A researcher interested in evaluating the invasive potential of a species would almost certainly be disappointed with a method that performs well only at thresholds with associated omission errors of >50%! Taking into account the intended uses of the model, as well as the error-related characteristics of the input data, is an important refinement to model evaluation approaches.

Clearly, much work remains in the development of these methodologies. In reality, limits on both the sensitivity and the false-positive axes may be desirable (Jiang et al., 1996; Dodd and Pepe, 2003). In our modified approach, limiting the fraction of total area that an algorithm is allowed to predict (the overprediction) may be biologically sensible. This restriction would create partial ROC curves, limited on one side by the minimum acceptable sensitivity, and on the other by the maximum tolerable overprediction. Once more experience in the use of our modified approach is gathered, a broad comparative study parallel to the previous (distribution modeling) study (Elith et al., 2006), but based on niche modeling ideas, would be particularly instructive. Because the approach we present here is distinct in several ways from conventional ROC analysis, the probabilistic interpretations of ROC scores (Fawcett, 2003) and their relation to the Mann–Whitney test will need to be reassessed. Finally, once the final form of the methodology for partial-ROC applications to predictions of species’ geographic distributions is clear, developing program code to permit easy implementation would be desirable.

Acknowledgements

We thank many valued colleagues for discussions of these and related topics over the past several years, particularly Enrique Martínez-Meyer, Robert Anderson, Richard Pearson, Jake Overton, and Simon Ferrier, although we may or may not have agreed.

References

Anderson, R.P., Lew, D., Peterson, A.T., 2003. Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecol. Modell. 162, 211–232.
Anderson, R.P., Peterson, A.T., Gómez-Laverde, M., 2002. Using niche-based GIS modeling to test geographic predictions of competitive exclusion and competitive release in South American pocket mice. Oikos 93, 3–16.
Araújo, M.B., Guisan, A., 2006. Five (or so) challenges for species distribution modelling. J. Biogeogr. 33, 1677–1688.
Araújo, M.B., Pearson, R.G., Thuiller, W., Erhard, M., 2005a. Validation of species-climate impact models under climate change. Global Change Biol. 11, 1504–1513.
Araújo, M.B., Whittaker, R.J., Ladle, R.J., Erhard, M., 2005b. Reducing uncertainty in projections of extinction risk from climate change. Global Ecol. Biogeogr. 14, 529–538.
Burden, R.L., Faires, J.D., 2005. Numerical Analysis, eighth ed. Thomson Books, Belmont, California.
Carpenter, G., Gillison, A.N., Winter, J., 1993. DOMAIN: a flexible modeling procedure for mapping potential distributions of animals and plants. Biodivers. Conserv. 2, 667–680.
DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L., 1988. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845.
Dodd, L.E., Pepe, M.S., 2003. Partial AUC estimation and regression. Biometrics 59, 614–623.
Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettman, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberón, J., Williams, S.E., Wisz, M.S., Zimmermann, N.E., 2006. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29, 129–151.
Fawcett, R., 2003. ROC Graphs: Notes and Practical Considerations for Data Mining Research. Technical Report HPL-2003-4. HP Laboratories, Palo Alto, California.
Fielding, A.H., Bell, J.F., 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.
Graham, C.H., Ron, S.R., Santos, J.C., Schneider, C.J., Moritz, C., 2004. Integrating phylogenetics and environmental niche models to explore speciation mechanisms in dendrobatid frogs. Evolution 58, 1781–1793.
Guisan, A., Lehmann, A., Ferrier, S., Austin, M.P., Overton, J.M., Aspinall, R., Hastie, T., 2006. Making better biogeographical predictions of species’ distributions. J. Appl. Ecol. 43, 386–392.
Guisan, A., Thuiller, W., 2005. Predicting species distribution: offering more than simple habitat models. Ecol. Lett. 8, 993–1009.
Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology. Ecol. Modell. 135, 147–186.
Hijmans, R.J., Cameron, S., Parra, J., 2005. WorldClim, Version 1.3, http://biogeo.berkeley.edu/worldclim/worldclim.htm. University of California, Berkeley.
Hirzel, A.H., Hausser, J., Chessel, D., Perrin, N., 2002. Ecological-niche factor analysis: how to compute habitat-suitability maps without absence data? Ecology 83, 2027–2036.
Jiang, Y., Metz, C.E., Nishikawa, R.M., 1996. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 201, 745–750.
Lobo, J.M., Jimenez-Valverde, A., Real, R., 2007. AUC: a misleading measure of the performance of predictive distribution models. Global Ecol. Biogeogr.
Lobo, J.M., Verdú, J.R., Numa, C., 2006. Environmental and geographical factors affecting the Iberian distribution of flightless Jekelius species (Coleoptera: Geotrupidae). Divers. Distributions 12, 179–188.
Pearson, R.G., Dawson, T.P., 2003. Predicting the impacts of climate change on the distribution of species: are bioclimate envelope models useful? Global Ecol. Biogeogr. 12, 361–371.
Pearson, R.G., Dawson, T.P., Berry, P.M., Harrison, P.A., 2002. SPECIES: a spatial evaluation of climate impact on the envelope of species. Ecol. Modell. 154, 289–300.
Pearson, R.G., Raxworthy, C., Nakamura, M., Peterson, A.T., 2007. Predicting species’ distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. J. Biogeogr. 34, 102–117.
Pepe, M.S., 2000. Receiver operating characteristic methodology. J. Am. Stat. Assoc. 95, 308–311.
Pereira, R.S., 2002. Desktop GARP. http://www.lifemapper.org/desktopgarp/.
Raxworthy, C.J., Martínez-Meyer, E., Horning, N., Nussbaum, R.A., Schneider, G.E., Ortega-Huerta, M.A., Peterson, A.T., 2003. Predicting distributions of known and unknown reptile species in Madagascar. Nature 426, 837–841.
Sauer, J.R., Hines, J.E., Fallon, J., 2001. The North American Breeding Bird Survey, Results and Analysis 1966–2000, version 2001.2. USGS Patuxent Wildlife Research Center, Laurel, MD.
Segurado, P., Araújo, M.B., 2004. An evaluation of methods for modelling species distributions. J. Biogeogr. 31, 1555–1568.
Soberón, J., Peterson, A.T., 2004. Biodiversity informatics: managing and applying primary biodiversity data. Philos. Trans. R. Soc. Lond. B 359, 689–698.
Soberón, J., Peterson, A.T., 2005. Interpretation of models of fundamental ecological niches and species’ distributional areas. Biodivers. Inform. 2, 1–10.
Stockwell, D.R.B., 1999. Genetic algorithms II. In: Fielding, A.H. (Ed.), Machine Learning Methods for Ecological Applications. Kluwer Academic Publishers, Boston, pp. 123–144.
Stockwell, D.R.B., Peterson, A.T., 2002a. Controlling bias in biodiversity data. In: Scott, J.M., Heglund, P.J., Morrison, M.L. (Eds.), Predicting Species Occurrences: Issues of Scale and Accuracy. Island Press, Washington, DC, pp. 537–546.
Stockwell, D.R.B., Peterson, A.T., 2002b. Effects of sample size on accuracy of species distribution models. Ecol. Modell. 148,
Peterson, A.T., 2006. Uses and requirements of ecological niche 1–13.
models and related distributional models. Biodivers. Inform. Stockwell, D.R.B., Peterson, A.T., 2003. Comparison of resolution
3, 59–72. of methods used in mapping biodiversity patterns from point
Peterson, A.T., Papeş, M., Eaton, M., 2007. Transferability and occurrence data. Ecol. Indicators 3, 213–221.
model evaluation in ecological niche modeling: a comparison Thuiller, W., Richardson, D.M., Pysek, P., Midgley, G.F., Hughes,
of GARP and Maxent. Ecography 30, 550–560. G.O., Rouget, M., 2005. Niche-based modelling as a tool for
Peterson, A.T., Shaw, J.J., 2003. Lutzomyia vectors for cutaneous predicting the risk of alien plant invasions at a global scale.
leishmaniasis in southern Brazil: ecological niche models, Global Change Biol. 11, 2234–2250.
predicted geographic distributions, and climate change USGS, 2001. HYDRO1k Elevation Derivative Database,
effects. Int. J. Parasitol. 33, 919–931. http://edcdaac.usgs.gov/gtopo30/hydro/. U.S. Geological
Peterson, A.T., Soberón, J., Sánchez-Cordero, V., 1999. Survey, Washington, D.C.
Conservatism of ecological niches in evolutionary time. Vida, S., 2006. Accumetric Test Performance Analysis, Version 1.1.
Science 285, 1265–1267. Accumetric Corporation, Montreal.
Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum Wiens, J.J., 2004. Speciation and ecology revisited: phylogenetic
entropy modeling of species geographic distributions. Ecol. niche conservatism and the origin of species. Evolution 58,
Modell. 190, 231–259. 193–197.
Phillips, S.J., Dudik, M., Schapire, R.E., 2004. A maximum entropy Wiens, J.J., Graham, C.H., 2005. Niche conservatism: integrating
approach to species distribution modeling. In: Proceedings of evolution, ecology, and conservation biology. Annu. Rev. Ecol.
the 21st International Conference on Machine Learning. Evol. Syst. 36, 519–539.
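The conclusion notes that the probabilistic interpretation of ROC scores and their relationship to the Mann–Whitney test will need to be reassessed for partial ROC. As background, the sketch below illustrates only the conventional, full-curve case. It computes the AUC two ways and shows that it coincides with the normalized Mann–Whitney U statistic: the probability that a randomly drawn presence point scores higher than a randomly drawn background point, with ties counted as one half. This is a minimal Python illustration, not the authors' code. The function names and example scores are hypothetical. The "background" scores are assumed to stand in for either absences or random points across the study area; in the latter case the x-axis becomes the proportion of area predicted present. Whether an analogous probabilistic reading carries over to partial-ROC areas is the open question raised in the text, and the sketch does not address it.

```python
"""Full-curve ROC AUC computed two ways, to show that it equals the
normalized Mann-Whitney U statistic. Illustrative sketch only: the function
names and example scores are hypothetical, not taken from the paper."""
from typing import List, Sequence, Tuple


def auc_mann_whitney(presence: Sequence[float], background: Sequence[float]) -> float:
    """U / (n1 * n0): share of (presence, background) pairs in which the
    presence point has the higher score; tied pairs count one half."""
    wins = 0.0
    for p in presence:
        for b in background:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence) * len(background))


def roc_points(presence: Sequence[float],
               background: Sequence[float]) -> List[Tuple[float, float]]:
    """ROC vertices, sweeping the threshold from strictest to most lenient.

    x = fraction of background points scored >= threshold (predicted present)
    y = fraction of presence points scored >= threshold (1 - omission)
    """
    thresholds = sorted(set(presence) | set(background), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        x = sum(b >= t for b in background) / len(background)
        y = sum(p >= t for p in presence) / len(presence)
        points.append((x, y))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    presence = [0.9, 0.8, 0.8, 0.4]          # hypothetical scores at test presences
    background = [0.8, 0.5, 0.3, 0.1, 0.1]   # hypothetical scores at background points
    print(round(auc_mann_whitney(presence, background), 6))
    print(round(auc_trapezoid(roc_points(presence, background)), 6))
```

With these made-up scores, both functions give 0.85. Each of the two tied pairs at a score of 0.8 counts one half in the Mann–Whitney tally. In the trapezoid version, those ties become a diagonal segment of the curve that splits the corresponding cell in half.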