Mod Ev A
Mod Ev A
Mod Ev A
URL http://modeva.r-forge.r-project.org/
Repository CRAN
Repository/R-Forge/Project modeva
Repository/R-Forge/Revision 183
Repository/R-Forge/DateTimeStamp 2023-04-14 16:14:41
Date/Publication 2023-04-14 23:10:02 UTC
NeedsCompilation no
Depends R (>= 2.10)
R topics documented:
modEvA-package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
applyThreshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1
2 modEvA-package
arrangePlots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
AUC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Boyce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
confusionLabel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
confusionMatrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Dsquared . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
evaluate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
evenness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
getBins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
getModEqn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
getThreshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
HLfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
inputMunch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
lollipop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
MESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
MillerCalib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
mod2obspred . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
modEvAmethods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
multModEv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
OA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
optiPair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
optiThresh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
plotGLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
predDensity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
predPlot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
prevalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
ptsrast2obspred . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
range01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
RMSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
rotif.mods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
RsqGLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
standard01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
threshMeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
varImp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
varPart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Index 81
Description
The modEvA package can analyse species distribution models and evaluate their performance. It
includes functions for performing variation partitioning; calculating several measures of model dis-
crimination, classification, explanatory power, and calibration; optimizing prediction thresholds
based on a number of criteria; performing multivariate environmental similarity surface (MESS)
analysis; and displaying various analytical plots.
modEvA-package 3
Details
Package: modEvA
Type: Package
Version: 3.9.3
Date: 2023-04-14
License: GPL-3
Author(s)
Barbosa A.M., Brown J.A., Jimenez-Valverde A., Real R.
A. Marcia Barbosa <[email protected]>
References
Barbosa A.M., Real R., Munoz A.R. & Brown J.A. (2013) New measures for assessing model
equilibrium and prediction mismatch in species distribution models. Diversity and Distributions
19: 1333-1338 (DOI: 10.1111/ddi.12100)
See Also
PresenceAbsence, ROCR, verification, Metrics
Examples
# load sample models:
data(rotif.mods)
# calculate the area under the ROC curve for the model:
AUC(model = mod)
Description
This function applies a threshold value to the continuous predictions of a model, converting them to
binary predictions: 1 for values above the threshold, and 0 for values below it. If two thresholds are
provided (e.g. to separate high, low and intermediate predictions), the result is 0 below the lowest
threshold, 1 above the highest threshold, and 0.5 between them.
Usage
applyThreshold(model = NULL, obs = NULL, pred = NULL, thresh, right = FALSE,
interval = 0.01, quant = 0, na.rm = TRUE)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments in-
stead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
applyThreshold 5
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
thresh numeric vector of length 1 or 2, containing the threshold value(s) with which
to reclassify ’pred’, or the criteria under which to compute these thresholds –
run modEvAmethods("getThreshold") for available options, and see Details in
getThreshold for their description.
right logical value indicating if the interval should be closed on the right (and open on
the left) or vice versa, i.e., if predictions equalling the threshold value(s) should
be classified as lower rather than higher. The default is FALSE.
interval Argument to pass to optiThresh indicating the interval between the thresholds
to test, if ’thresh’ implies optimizing a threshold-based measure. The default
is 0.01. Smaller values may provide more precise results but take longer to
compute.
quant Numeric value indicating the proportion of presences to discard if any of ’thresh’
is "MTP" (minimum training presence). With the default value 0, MTP will be
the threshold at which all observed presences are classified as such; with e.g.
quant=0.05, MTP will be the threshold at which 5% presences will be classified
as absences.
na.rm Logical value indicating whether NA values should be ignored. Defaults to
TRUE.
Details
Several criteria have been proposed for selecting thresholds with which to convert continuous model
predictions (of presence probability, habitat suitability or alike) into binary predictions of presence
or absence. A threshold is required for computing threshold-based model evaluation metrics, such
as those in threshMeasures. This function reclassifies the predictions of a model given one or two
numeric thresholds, or one or two threshold selection criteria implemented in getThreshold.
Value
This function returns an object of the same class as ’pred’ with the reclassified values after applica-
tion of the threshold.
Author(s)
A. Marcia Barbosa
See Also
getThreshold, threshMeasures
6 arrangePlots
Examples
# load sample models:
data(rotif.mods)
# you can also use applyThreshold with vectors of observed and predicted values:
Description
Get an appropriate row/column combination (for par(mfrow)) for arranging a given number of
plots within a plotting window.
Usage
arrangePlots(n.plots, landscape = FALSE)
Arguments
n.plots number of plots to be placed in the graphics device.
landscape logical, whether the plotting window should be landscape/horizontal (number
of columns larger than the number of rows) or not. The value does not make a
difference if the number of plots makes for a square plotting window.
AUC 7
Details
This function is used internally by optiThresh, but can also be useful outside it.
Value
An integer vector of the form c(nr, nc) indicating, respectively, the number of rows and of columns
of plots to set in the graphics device.
Author(s)
A. Marcia Barbosa
See Also
plot, layout
Examples
arrangePlots(10)
data(iris)
names(iris)
# say you want to plot all columns in a nicely arranged plotting window:
par(mfrow = arrangePlots(ncol(iris)))
for (i in 1:ncol(iris)) {
plot(1:nrow(iris), iris[, i])
}
Description
This function calculates the Area Under the Curve of the receiver operating characteristic (ROC)
plot, or alternatively the precision-recall (PR) plot, for either a model object or two matching vectors
of observed binary (1 for occurrence vs. 0 for non-occurrence) and predicted continuous (e.g.
occurrence probability) values, respectively.
8 AUC
Usage
Arguments
method character indicating with which method to calculate the AUC value. Available
options are "rank" (the default and most accurate, but implemented only if curve
= "ROC") and "trapezoid" (the default if curve = "PR"). The latter is computed
more accurately if ’interval’ is decreased (see ’interval’ argument).
plot logical, whether or not to plot the curve. Defaults to TRUE.
diag logical, whether or not to add the reference diagonal (if plot = TRUE). Defaults
to TRUE.
diag.col line colour for the reference diagonal (if diag = TRUE).
diag.lty line type for the reference diagonal (if diag = TRUE).
curve.col line colour for the curve.
curve.lty line type for the curve.
curve.lwd line width for the curve.
plot.values logical, whether or not to show in the plot the values associated to the curve
(e.g., the AUC). Defaults to TRUE.
plot.digits integer number indicating the number of digits to which the values in the plot
should be rounded. Defaults to 3. This argument is ignored if ’plot’ or ’plot.values’
are set to FALSE.
plot.preds logical value indicating whether the proportions of ’pred’ values for each thresh-
old should be plotted as proportionally sized blue circles. Can also be provided
as a character vector specifying if the circles should be plotted on the "curve"
(the default) and/or at the "bottom" of the plot. The default is FALSE for no cir-
cles, but it may be interesting to try it, especially if your curve has long straight
lines or does not cover the full length of the plot.
grid logical, whether or not to add a grid to the plot, marking the analysed thresholds.
Defaults to FALSE.
xlab label for the x axis. By default, a label is automatically generated according to
the specified ’curve’.
ylab label for the y axis. By default, a label is automatically generated according to
the specified ’curve’.
ticks logical, whether or not to add blue tick marks at the bottom of the plot to mark
the thresholds at which there were values from which to to draw the curve. De-
faults to FALSE.
na.rm Logical value indicating if missing values should be ignored in computations.
The default is TRUE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
... further arguments to be passed to the plot function.
Details
In the case of the "ROC" curve (the default), the AUC is a measure of the overall discrimination
power of the predictions, or the probability that an occurrence site has a higher predicted value than
10 AUC
a non-occurrence site. It can thus be calculated with the Wilcoxon rank sum statistic, as is done
with the default method="rank". There’s also an option to compute, instead of the ROC curve, the
precision-recall ("PR") curve, which is more robust to imbalanced data, e.g. species rarity (Sofaer
et al. 2019), as it doesn’t value true negatives.
If ’curve’ is set to "PR", or if ’method’ is manually set to "trapezoid", the AUC value will be more
accurate if ’interval’ is decreased (see ’method’ and ’interval’ arguments above). The plotted curve
will also be more accurate with smaller ’interval’ values, especially for imbalanced datasets (which
can cause an apparent disagreement between the look of the curve and the actual value of the AUC).
Mind that the AUC has been widely criticized (e.g. Lobo et al. 2008, Jimenez-Valverde et al.
2013), but is still among the most widely used metrics in model evaluation. It is highly correlated
with species prevalence (as are all model discrimination and classification metrics), so prevalence
is also output by the AUC function (if simplif = FALSE, the default) for reference.
Although there are functions to calculate the AUC in other R packages (e.g. ROCR, Presence-
Absence, verification, Epi, PRROC, PerfMeas, precrec), the AUC function is more compatible
with the remaining functions in modEvA, and it can be applied not only to a set of observed vs.
predicted values, but also directly to a model object of class "glm", "gam", "gbm", "randomForest"
or "bart".
Value
If simplif = TRUE, the function returns only the AUC value (a numeric value between 0 and 1).
Otherwise (the default), it returns a list with the following components:
thresholds a data frame of the true and false positives, the sensitivity, specificity and recall
of the predictions, and the number of predicted values at each analysed thresh-
old.
N the total number of obervations.
prevalence the proportion of presences (i.e., ones) in the data (which correlates with the
AUC of the "ROC" plot).
AUC the value of the AUC).
AUCratio the ratio of the obtained AUC value to the null expectation (0.5).
meanPrecision the arithmetic mean of precision (proportion of predicted presences actually ob-
served as presences) across all threshold values (defined by ’interval’). It is close
to the AUC of the precision-recall (PR) curve.
Author(s)
A. Marcia Barbosa
References
Lobo, J.M., Jimenez-Valverde, A. & Real, R. (2008). AUC: a misleading measure of the perfor-
mance of predictive distribution models. Global Ecology and Biogeography 17: 145-151
Jimenez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimina-
tion capacity in species distribution models depends on the representativeness of the environmental
domain. Global Ecology and Biogeography 22: 508-516
Sofaer, H.R., Hoeting, J.A. & Jarnevich, C.S. (2019). The area under the precision-recall curve as
a performance metric for rare binary events. Methods in Ecology and Evolution, 10: 565-577
Boyce 11
See Also
threshMeasures
Examples
# load sample models:
data(rotif.mods)
AUC(model = mod)
Description
This function computes the (continuous) Boyce index (Boyce 2002; Hirzel et al. 2006) for either:
1) a model object; or 2) two paired numeric vectors of observed (binary, 1 for occurrence vs. 0 for
no occurrence records) and predicted (continuous, e.g. occurrence probability) values; or 3) a set of
12 Boyce
presence point coordinates and a raster map with the predicted values for the entire model evaluation
area. This metric is designed for evaluating model predictions against presence/background data
(i.e. presence/available, where "available" includes both presences and absences; Boyce 2002), so
the function uses the model predictions for the presence sites (ones) against the predictions for the
entire dataset (ones and zeros).
Usage
Boyce(model = NULL, obs = NULL, pred = NULL, n.bins = NA,
bin.width = "default", res = 100, method = "spearman", rm.dup.classes = FALSE,
rm.dup.points = FALSE, plot = TRUE, plot.lines = TRUE, plot.values = TRUE,
plot.digits = 3, na.rm = TRUE, ...)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments (e.g.
for external test data) instead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
n.bins number of classes or bins (e.g. 10) in which to group the ’pred’ values, or a
vector with the bin thresholds. If n.bins = NA (the default), a moving window
is used (see next parameters), so as to compute the "continuous Boyce index"
(Hirzel et al. 2006).
bin.width width of the moving window (if n.bins = NA), in the units of ’pred’ (e.g. 0.1).
By default, it is 1/10th of the ’pred’ range).
res resolution of the moving window (if n.bins = NA). By default it is 100 focals,
providing 100 moving bins).
method argument to be passed to cor indicating which correlation coefficient to use. The
default is 'spearman' as per Boyce et al. (2002), but 'pearson' and 'kendall'
can also be used.
rm.dup.classes if TRUE (as in ’ecospat::ecospat.boyce’) and if there are different bins with the
same predicted/expected ratio, only one of each is used to compute the correla-
tion. See Examples.
Boyce 13
rm.dup.points if TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
plot logical, whether or not to plot the predicted/expected ratio against the median
prediction of each bin. Defaults to TRUE.
plot.lines logical, whether or not to add lines connecting the points in the plot (if plot=TRUE).
Defaults to TRUE.
plot.values logical, whether or not to show in the plot the value of the Boyce index. Defaults
to TRUE.
plot.digits number of digits to which the value in the plot should be rounded (if ’plot’ and
’plot.values’ are TRUE). Defaults to 3.
na.rm Logical value indicating if missing values should be removed from computa-
tions. The default is TRUE.
... some additional arguments can be passed to plot, e.g. ’main’ or ’xlim’.
Details
The Boyce index is the correlation between model predictions and area-adjusted frequencies (i.e.,
observed vs. expected proportion of occurrences) along different prediction classes. In other words,
it measures how model predictions differ from a random distribution of the observed presences
across the prediction gradient (Boyce et al. 2002). It can take values between -1 and 1. Positive
values indicate that presences are more frequent than expected by chance (given availability) in
areas with higher predicted values. Values close to zero mean that predictions are no better than
random, and negative values indicate counter predictions, i.e. that presences are more frequent in
areas with lower predicted values.
The code is largely based on the ’ecospat.boyce’ function in the ecospat package (version 3.2.1),
but it is modified to match the input types in the remaining functions of ’modEvA’, and to return a
more complete output.
Value
This function returns a list with the following components:
bins a data frame with the number of values in each bin, their median and range of
predicted values, and the corresponding predicted/expected ratio of presences.
B the numeric value of the Boyce index, i.e. the coefficient of correlation between
the median predicted value in each bin and the corresponding predicted/expected
ratio.
If plot=TRUE (the default), the function also plots the predicted/expected ratio for the utilized bins
along the prediction range. A good model should yield a monotonically increasing curve (but see
Note).
Note
In bins with overly small sample sizes, the comparison between median prediction and random
expectation may not be meaningful, although these bins will equally contribute to the overall Boyce
14 confusionLabel
index. When there are bins with less than 30 values, a warning is emitted and their points are plotted
in red, but mind that 30 is a largely arbitrary number. See the $bins$bin.N section of the console
output, and use the ’bin.width’ argument to enlarge the bins.
Author(s)
A. Marcia Barbosa, with significant chunks of code from the ’ecospat::ecospat.boyce’ function by
Blaise Petitpierre and Frank Breiner (ecospat package version 3.2.1).
References
Boyce, M.S., P.R. Vernier, S.E. Nielsen & F.K.A. Schmiegelow (2002) Evaluating resource selec-
tion functions. Ecological Modelling 157: 281-300
Hirzel, A.H., G. Le Lay, V. Helfer, C. Randin & A. Guisan (2006) Evaluating the ability of habitat
suitability models to predict species presences. Ecological Modelling 199: 142-152
Examples
# load sample models:
data(rotif.mods)
Description
This function labels the predictions of a binary-response model according to their confusion matrix
categories, i.e., it classifies each prediction into a false positive, false negative, true positive or true
negative, given a user-defined threshold value.
confusionLabel 15
Usage
Arguments
Value
This function returns a character vector of the same length as ’obs’ and ’pred’, or of the same
number of rows as the data in ’model’, containing the confusion matrix label for each value.
Author(s)
A. Marcia Barbosa
See Also
threshMeasures, confusionMatrix
Examples
# load sample models:
data(rotif.mods)
Description
This function computes the confusion (or contingency) matrix for a binary-response model, con-
taining the numbers of false positives, false negatives, true positives and true negatives, given a
user-defined threshold value.
Usage
confusionMatrix(model = NULL, obs = NULL, pred = NULL, thresh,
interval = 0.01, quant = 0, verbosity = 2, na.rm = TRUE, rm.dup = FALSE)
confusionMatrix 17
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments (e.g.
for external test data) instead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
thresh numeric value of the threshold to separate predicted presences from predicted
absences; can be "preval", to use the prevalence of ’obs’ (or of the response
variable in ’model’) as the threshold, or any real number between 0 and 1. See
Details in threshMeasures for an informed choice.
interval numeric value, used if ’thresh’ is a threshold optimization method such as "maxKappa"
or "maxTSS", indicating the interval between the thresholds to test. The default
is 0.01. Smaller values may provide more precise results but take longer to
compute.
quant numeric value indicating the proportion of presences to discard if thresh="MTP"
(minimum training presence). With the default value 0, MTP will be the thresh-
old at which all observed presences are classified as such; with e.g. quant=0.05,
MTP will be the threshold at which 5% presences will be classified as absences.
verbosity integer specifying the amount of messages to display. Defaults to the maximum
implemented; lower numbers (down to 0) decrease the number of messages.
na.rm logical argument indicating whether to remove (with a warning saying how
many) rows with NA in any of the ’obs’ or ’pred’ values. The default is FALSE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
Value
This function returns a data frame containing the four values of the confusion matrix.
Author(s)
A. Marcia Barbosa
18 Dsquared
See Also
threshMeasures, confusionLabel
Examples
# load sample models:
data(rotif.mods)
Description
This function computes the (adjusted) amount of deviance accounted for by a model, given a model
object or a set of observed and predicted values.
Usage
Dsquared(model = NULL, obs = NULL, pred = NULL, family = NULL,
adjust = FALSE, npar = NULL, na.rm = TRUE, rm.dup = FALSE)
Arguments
model a model object of class implemented in mod2obspred. If this argument is pro-
vided, ’obs’ and ’pred’ will be extracted with that function. Alternatively, you
can input the ’obs’ and ’pred’ arguments instead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
values of the response variable. Alternatively (and if ’pred’ is a ’SpatRaster’),
a two-column matrix or data frame containing, respectively, the x (longitude)
and y (latitude) coordinates of presence points, in which case the ’obs’ vector
of presences and absences will be extracted with ptsrast2obspred. This argu-
ment is ignored if ’model’ is provided.
Dsquared 19
pred alternatively to ’model’ and together with ’obs’, a vector with the corresponding
predicted values, of the same length and in the same order as ’obs’. Alternatively
(and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of the predicted
values for the entire evaluation region, in which case the ’pred’ vector will be
extracted with ptsrast2obspred. This argument is ignored if ’model’ is pro-
vided.
family a character vector (i.e. in quotes) of length 1 specifying the family of the model
that generated the ’pred’ values. This argument is ignored if model is provided
and is of a class for which family works; otherwise (i.e. if ’obs’ and ’pred’ are
provided rather than a model object), family can be specified by the user, or (if
left NULL) will be guessed (with a message) given the values of the response
variable.
adjust logical, whether or not to adjust the D-squared value for the number of obser-
vations and parameters in the model (see Details). The default is FALSE; TRUE
requires either providing the model object of class GLM, or specifying the num-
ber of parameters in the model that produced the pred values.
npar integer value indicating the number of parameters in the model. This argument
is ignored and taken from model if this argument is provided and of class GLM,
or if adjust = FALSE.
na.rm Logical value indicating whether missing values should be ignored in computa-
tions. Defaults to TRUE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
Details
Linear models have an R-squared value (commonly provided with the model summary) which mea-
sures the proportion of variation that the model accounts for. For generalized linear models (GLMs)
and others based on non-continuous response variables, an equivalent is the amount of deviance ac-
counted for (D-squared; Guisan & Zimmermann 2000), though this value is not routinely provided
with the model summary. The Dsquared function calculates it as the proportion of the null de-
viance (i.e. the deviance of a model with no predictor variables) that is accounted for by the model.
There is also an option to compute the adjusted D-squared, which takes into account the number of
observations and the number of parameters, thus allowing direct comparison among the output for
different models (Weisberg 1980, Guisan & Zimmermann 2000).
The function computes the mean residual deviance (as in the calc.deviance function of package
dismo) of the observed (response) against the predicted values, and the mean deviance of a null
model (with no predictor variables), i.e. of the response against the mean of the response. Finally,
it gets the explained deviance as (null-residual)/null.
Value
The function returns a numeric value indicating the (optionally adjusted) proportion of deviance
accounted for by the input model predictions.
20 evaluate
Author(s)
A. Marcia Barbosa, with parts of code from ’dismo::calc.deviance’ by John R. Leathwick and Jane
Elith
References
Guisan, A. & Zimmermann, N.E. (2000) Predictive habitat distribution models in ecology. Ecolog-
ical Modelling 135: 147-186
Weisberg, S. (1980) Applied Linear Regression. Wiley, New York
See Also
plotGLM, RsqGLM, dismo::calc.deviance
Examples
# load sample models:
data(rotif.mods)
Dsquared(model = mod)
# you can also use Dsquared with vectors of observed and predicted values
# instead of with a model object:
Description
This function evaluates the classification performance of a model based on the values of a confusion
matrix obtained at a particular threshold.
evaluate 21
Usage
evaluate(a, b, c, d, N = NULL, measure = "CCR")
Arguments
a number of correctly predicted presences
b number of absences incorrectly predicted as presences
c number of presences incorrectly predicted as absences
d number of correctly predicted absences
N total number of cases. If NULL (the default), it is calculated automatically by
adding up a, b, c and d.)
measure a character vector of length 1 indicating the the evaluation measure to use. Type
modEvAmethods("threshMeasures") for available options.
Details
A number of measures can be used to evaluate continuous model predictions against observed bi-
nary occurrence data (Fielding & Bell 1997; Liu et al. 2011; Barbosa et al. 2013). The ’eval-
uate’ function can calculate a few threshold-based classification measures from the values of a
confusion matrix obtained at a particular threshold. The ’evaluate’ function is used internally by
threshMeasures. It can also be accessed directly by the user, but it is usually more practical to use
’threshMeasures’, which calculates the confusion matrix automatically.
Value
The value of the specified evaluation measure.
Note
Some measures (e.g. NMI, odds ratio) don’t work with zeros in (some parts of) the confusion
matrix. Also, TSS and NMI are not symmetrical, i.e. "obs" vs "pred" is different from "pred" vs
"obs".
Author(s)
A. Marcia Barbosa
References
Barbosa A.M., Real R., Munoz A.R. & Brown J.A. (2013) New measures for assessing model
equilibrium and prediction mismatch in species distribution models. Diversity and Distributions,
19: 1333-1338
Fielding A.H. & Bell J.F. (1997) A review of methods for the assessment of prediction errors in
conservation presence/absence models. Environmental Conservation 24: 38-49
Liu C., White M., & Newell G. (2011) Measuring and comparing the accuracy of species distribu-
tion models with presence-absence data. Ecography, 34, 232-243.
22 evenness
See Also
threshMeasures
Examples
evaluate(23, 44, 21, 34)
Description
For building and evaluating species distribution models, the porportion of presences (prevalence)
of a species and the balance between the number of presences and absences may be issues to take
into account (e.g. Jimenez-Valverde & Lobo 2006, Barbosa et al. 2013). The evenness function
calculates the presence-absence balance in a binary (e.g., presence/absence) vector.
Usage
evenness(obs)
Arguments
obs a vector of binary observations (e.g. 1 or 0, male or female, disease or no disease,
etc.)
Value
A number ranging between 0 when all values are the same, and 1 when there are the same number
of cases with each value in obs.
Author(s)
A. Marcia Barbosa
References
Barbosa A.M., Real R., Munoz A.R. & Brown J.A. (2013) New measures for assessing model
equilibrium and prediction mismatch in species distribution models. Diversity and Distributions,
19: 1333-1338
Jimenez-Valverde A. & Lobo J.M. (2006) The ghost of unbalanced species distribution data in
geographical model predictions. Diversity and Distributions, 12: 521-524.
See Also
prevalence
getBins 23
Examples
(x <- rep(c(0, 1), each = 5))
(y <- c(rep(0, 3), rep(1, 7)))
(z <- c(rep(0, 7), rep(1, 3)))
prevalence(x)
evenness(x)
prevalence(y)
evenness(y)
prevalence(z)
evenness(z)
Description
Get continuous predicted values into bins according to specific criteria.
Usage
getBins(model = NULL, obs = NULL, pred = NULL, id = NULL,
bin.method, n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15,
min.prob.interval = 0.1, quantile.type = 7, simplif = FALSE,
verbosity = 2, na.rm = TRUE, rm.dup = FALSE)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments in-
stead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
24 getBins
id optional vector of row identifiers; must be of the same length and in the same
order of obs and pred (or of the cases used to build model)
bin.method the method with which to divide the values into bins. Type modEvAmeth-
ods("getBins") for available options and see Details for more information on
these methods.
n.bins the number of bins in which to divide the data.
fixed.bin.size logical, whether all bins should have (approximally) the same size.
min.bin.size integer value defining the minimum number of observations to include in each
bin. The default is 15, the minimum required for accurate comparisons within
bins (Jovani & Tella 2006, Jimenez-Valverde et al. 2013).
min.prob.interval
minimum range of probability values in each bin. The default is 0.1.
quantile.type argument to pass to quantile specifying the algorithm to use if bin.method =
"quantiles". The default is 7 (the quantile default in R), but check out other
types, e.g. 3 (used by SAS), 6 (used by Minitab and SPSS) or 5 (appropriate for
deciles, which correspond to the default n.bins = 10).
simplif logical, whether to calculate a faster, simplified version (used internally in other
functions). The default is FALSE.
verbosity integer specifying the amount of messages or warnings to display. Defaults to
the maximum implemented; lower numbers (down to 0) decrease the number of
messages.
na.rm logical argument indicating whether to remove (with a warning saying how
many) rows with NA in any of the ’obs’ or ’pred’ values. The default is TRUE,
as some ’bin.method’ options will fail if there are NAs.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
Details
Mind that different bin.methods can lead to visibly different results regarding the bins and any
operations that depend on them (such as HLfit). Currently available bin.methods are:
- round.prob: probability values are rounded to the number of digits of min.prob.interval -
e.g., if min.prob.interval = 0.1 (the default), values under 0.05 get into bin 1 (rounded probability =
0), values between 0.05 and 0.15 get into bin 2 (rounded probability = 0.1), etc. until values with
probability over 0.95, which get into bin 11. Arguments n.bins, fixed.bin.size and min.bin.size are
ignored by this bin.method.
- prob.bins: probability values are grouped into bins of the given probability intervals - e.g., if
min.prob.interval = 0.1 (the default), bin 1 gets the values between 0 and 0.1, bin 2 gets the values
between 0.1 and 0.2, etc. until bin 10 which gets the values between 0.9 and 1. Arguments n.bins,
fixed.bin.size and min.bin.size are ignored by this bin.method.
- size.bins: probability values are grouped into bins of (approximately) equal size, defined by
argument min.bin.size. Arguments n.bins and min.prob.interval are ignored by this bin.method.
- n.bins: probability values are divided into the number of bins given by argument n.bins, and their
sizes may or may not be forced to be (approximately) equal, depending on argument fixed.bin.size
getBins 25
(which is FALSE by default). Arguments min.bin.size and min.prob.interval are ignored by this
bin.method.
- quantiles: probability values are divided using R function quantile, with probability cutpoints
defined by the given n.bins (i.e., deciles by default), and with the quantile algorithm defined by
argument quantile.type. Arguments fixed.bin.size, min.bin.size and min.prob.interval are ignored
by this bin.method.
Value
The output of getBins is a list with the following components:
Note
This function is still under development and may fail for some datasets and binning methods (e.g.,
ties may sometimes preclude binning under some bin.methods). Fixes and further binning methods
are in preparation. Feedback is welcome.
Author(s)
A. Marcia Barbosa
References
Jimenez-Valverde A., Acevedo P., Barbosa A.M., Lobo J.M. & Real R. (2013) Discrimination ca-
pacity in species distribution models depends on the representativeness of the environmental do-
main. Global Ecology and Biogeography 22: 508-516.
Jovani R. & Tella J.L. (2006) Parasite prevalence and sample size: misconceptions and solutions.
Trends in Parasitology 22: 214-218.
See Also
HLfit
Examples
# load sample models:
data(rotif.mods)
Description
This function retrieves the equation of a model, to print or apply elsewhere.
Usage
getModEqn(model, type = "Y", digits = NULL, prefix = NULL,
suffix = NULL)
Arguments
model a model object of class ’lm’ or glm’.
type the type of equation to get; can be either "Y" (the default, for the linear model
equation), "P" (for probabiity) or "F" (for favourability).
digits the number of significant digits to which to round the coefficient estimates in the
equation.
prefix the prefix to add to each variable name in the equation.
suffix the suffix to add to each variable name in the equation.
Details
The summary of a model in R gives you a table of the coefficient estimates and other parameters.
Sometimes it may be useful to have a string of text with the model’s equation, so that you can
present it in an article (e.g. Real et al. 2005) or apply it in a (raster map) calculation, either in R
(although here you can usually use the ’predict’ function for this) or in a GIS software (e.g. Barbosa
et al. 2010). The getModEqn function gets this equation for linear or generalized linear models.
By default it prints the "Y" linear equation, but for generalized linear models you can also set type
= "P" (for the equation of probability) or type = "F" (for favourability, which modifies the intercept
to eliminate the effect of modelled prevalence - see Real et al. 2006).
If the variables to which you want to apply the model have a prefix or suffix (e.g. something like
prefix = "raster.stack$" for the R ’raster’ or ’terra’ package, or prefix = "mydata$" for a data frame,
or suffix = "@1" in QGIS, or suffix = "@mapset" in GRASS), you can get these in the equation too,
using the prefix and/or the suffix argument.
getThreshold 27
Value
Author(s)
A. Marcia Barbosa
References
Barbosa A.M., Real R. & Vargas J.M. (2010) Use of coarse-resolution models of species’ distribu-
tions to guide local conservation inferences. Conservation Biology 24: 1378-87
Real R., Barbosa A.M., Martinez-Solano I. & Garcia-Paris, M. (2005) Distinguishing the distribu-
tions of two cryptic frogs (Anura: Discoglossidae) using molecular data and environmental model-
ing. Canadian Journal of Zoology 83: 536-545
Real R., Barbosa A.M. & Vargas J.M. (2006) Obtaining environmental favourability functions from
logistic regression. Environmental and Ecological Statistics 13: 237-245
Examples
# load sample models:
data(rotif.mods)
getModEqn(mod)
Description
Usage
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments in-
stead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
threshMethod Criterion under which to compute the threshold. Run modEvAmethods("getThreshold")
for available options.
interval Argument to pass to optiThresh indicating the interval between the thresholds
to test. The default is 0.01. Smaller values may provide more precise results but
take longer to compute.
quant Numeric value indicating the proportion of presences to discard if threshMethod="MTP"
(minimum training presence). With the default value 0, MTP will be the thresh-
old at which all observed presences are classified as such; with e.g. quant=0.05,
MTP will be the threshold at which 5% presences will be classified as absences.
na.rm Logical value indicating whether NA values should be ignored. Defaults to
TRUE.
Details
Several criteria have been proposed for selecting a threshold with which to convert continuous
model predictions (of presence probability, habitat suitability or alike) into binary predictions of
presence or absence. A threshold is required for computing threshold-based model evaluation met-
rics, such as those in threshMeasures. This function implements a few of these threshold selection
criteria, including those outlined in Liu et al. (2005, 2013) and a couple more:
- "preval", "trainPrev": prevalence (proportion of presences) in the supplied ’model’ or ’obs’
- "meanPred": mean predicted value in the supplied ’model’ or ’pred’
- "midPoint": median predicted value in the supplied ’model’ or ’pred’
- "maxKappa": threshold that maximizes Cohen’s kappa
- "maxCCR", "maxOA", "maxOPS": threshold that maximizes the Correct Classification Rate, aka
Overall Accuracy, aka Overall Prediction Success
- "maxF": threshold that maximizes the F value
getThreshold 29
Value
This function returns a numeric value indicating the threshold selected under the specified ’thresh-
Method’.
Author(s)
A. Marcia Barbosa
References
Liu C., Berry P.M., Dawson T.P. & Pearson R.G. (2005) Selecting thresholds of occurrence in the
prediction of species distributions. Ecography 28: 385-393
Liu C., White M. & Newell G. (2013) Selecting thresholds for the prediction of species occurrence
with presence-only data. Journal of Biogeography 40: 778-789
See Also
threshMeasures, optiThresh
Examples
# load sample models:
data(rotif.mods)
# you can also use getThreshold with vectors of observed and predicted values
# instead of with a model object:
Description
This function calculates a model’s calibration performance (reliability) with the Hosmer & Lemeshow
goodness-of-fit statistic, which compares predicted probability to observed occurrence frequency at
each portion of the probability range.
Usage
HLfit(model = NULL, obs = NULL, pred = NULL, bin.method,
n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15,
min.prob.interval = 0.1, quantile.type = 7, simplif = FALSE,
verbosity = 2, alpha = 0.05, plot = TRUE, plot.values = TRUE,
plot.bin.size = TRUE, xlab = "Predicted probability",
ylab = "Observed prevalence", na.rm = TRUE, rm.dup = FALSE, ...)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments in-
stead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
HLfit 31
bin.method argument to pass to getBins specifying the method for grouping the records
into bins within which to compare predicted probability to observed prevalence;
type modEvAmethods("getBins") for available options, and see Details for more
information.
n.bins argument to pass to getBins specifying the number of bins to use if bin.method
= n.bins or bin.method = quantiles. The default is 10.
fixed.bin.size argument to pass to getBins, a logical value indicating whether to force bins to
have (approximately) the same size. The default is FALSE.
min.bin.size argument to pass to getBins specifying the minimum number of records in each
bin. The default is 15, the minimum required for accurate comparisons within
bins (Jovani & Tella 2006, Jimenez-Valverde et al. 2013).
min.prob.interval
argument to pass to getBins specifying the minimum interval (range) of proba-
bility values within each bin. The default is 0.1.
quantile.type argument to pass to quantile specifying the algorithm to use if bin.method =
"quantiles". The default is 7 (the quantile default in R), but check out other
types, e.g. 3 (used by SAS), 6 (used by Minitab and SPSS) or 5 (appropriate for
deciles, which correspond to the default n.bins = 10).
simplif logical, wheter to perform a faster simplified version returning only the basic
statistics. The default is FALSE.
verbosity integer specifying the amount of messages or warnings to display. Defaults to
the maximum implemented; lower numbers (down to 0) decrease the number of
messages.
alpha alpha value for confidence intervals if plot = TRUE.
plot logical, whether to produce a plot of the results. The default is TRUE.
plot.values logical, whether to report measure values in the plot. The default is TRUE.
plot.bin.size logical, whether to report bin sizes in the plot. The default is TRUE.
xlab label for the x axis.
ylab label for the y axis.
na.rm Logical value indicating whether missing values should be ignored in computa-
tions. Defaults to TRUE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
... further arguments to pass to the plot function.
Details
Most of the commonly used measures for evaluating model performance focus on the discrimina-
tion or the classification capacity, i.e., how well the model is capable of distinguishing or classifying
presences and absences (often after the model’s continuous predictions of presence probability or
alike are converted to binary predictions of presence or absence). However, there is another im-
portant facet of model evaluation: calibration or reliability, i.e., the relationship between predicted
probability and observed occurrence frequency (Pearce & Ferrier 2000; Jimenez-Valverde et al.
32 HLfit
2013). The HLfit function measures model reliability with the Hosmer & Lemeshow goodness-of-
fit statistic (Hosmer & Lemeshow 1980).
Note that this statistic has strong limitations and caveats (see e.g. http://www.statisticalhorizons.com/hosmer-
lemeshow, Allison 2014), mainly due to the need to group the values into bins within which to com-
pare probability and prevalence, and the strong influence of the binning method on the results. The
’HLfit’ function can use several binning methods, which are implemented and roughly explained in
the getBins function and can be accessed by typing ’modEvAmethods("getBins")’. You should try
’HLfit’ with different binning methods to see how if the results are robust.
Value
HLfit returns a list with the following components:
bins.table a data frame of the obtained bins and the values resulting from the hosmer-
Lemeshow goodness-of-fit analysis.
chi.sq the value of the Chi-squared test.
DF the number of degrees of freedom.
p.value the p-value of the Hosmer-Lemeshow test. Note that this is one of those tests
for which higher p-values are better.
RMSE the root mean squared error.
Note
The 4 lines of code from "observed" to "p.value" were adapted from the ’hosmerlem’ function
available at http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt. The plotting code
was loosely based on the calibration.plot function in package PresenceAbsence. HLfit still
needs some code simplification, and may fail for some datasets and binning methods. Fixes are
being applied. Feedback is welcome.
Author(s)
A. Marcia Barbosa
References
Allison P.D. (2014) Measures of Fit for Logistic Regression. SAS Global Forum, Paper 1485
Hosmer D.W. & Lemeshow S. (1980) A goodness-of-fit test for the multiple logistic regression
model. Communications in Statistics, A10: 1043-1069
Jimenez-Valverde A., Acevedo P., Barbosa A.M., Lobo J.M. & Real R. (2013) Discrimination ca-
pacity in species distribution models depends on the representativeness of the environmental do-
main. Global Ecology and Biogeography 22: 508-516
Jovani R. & Tella J.L. (2006) Parasite prevalence and sample size: misconceptions and solutions.
Trends in Parasitology 22: 214-218
Pearce J. & Ferrier S. (2000) Evaluating the Predictive Performance of Habitat Models Developed
using Logistic Regression. Ecological Modeling, 133: 225-245
HLfit 33
See Also
getBins, MillerCalib
Examples
# load sample models:
data(rotif.mods)
# you can also use 'predPlot' with vectors of observed and predicted values
# instead of a model object:
Description
This function is used internally by many other functions in this package to check and extract the
’obs’ and ’pred’ vectors from the user inputs.
Usage
inputMunch(model = NULL, obs = NULL, pred = NULL, rm.dup = FALSE, na.rm = FALSE)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments (e.g.
for external test data) instead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
rm.dup logical argument to be passed to ptsrast2obspred indicating whether repeated
points within the same pixel should be removed. The default is FALSE.
na.rm logical argument indicating whether to remove (with a warning saying how
many) rows with NA in any of the resulting ’obs’ or ’pred’ values. The default
is FALSE.
Value
This function returns a two-column data frame containing the ’obs’ and ’pred’ values, or an error
message if inputs are not as required.
lollipop 35
Author(s)
A. Marcia Barbosa
Description
This function creates a lollipop chart from a (optionally named) numeric vector.
Usage
lollipop(x, names = NULL, ymin = 0, sticks = TRUE, col = "royalblue",
grid = TRUE, cex = 1, cex.axis = 1, las = 2, ...)
Arguments
x a numeric vector.
names a vector of the same length as ’x’ with the names to be plotted below the lol-
lipops. If this argument is left NULL and ’x’ has names, then these will be
used.
ymin numeric value for the lower limit of the y axis. The default is zero. If set to NA,
the minimum of ’x’ will be used.
sticks logical value indicating whether the sticks of the lollipops should be drawn. The
default is TRUE.
col colour for the lollipops.
grid logical, whether or not to add a grid to the plot. The default is TRUE.
cex numeric value indicating the size of the lollipops. Will be passed as ’cex’ to
’points’ and as ’lwd’ to ’arrows’ (the lines or lollipop sticks).
cex.axis numeric value indicating the size of the x and y axis labels.
las argument to pass to par indicating the orientation of the axis labels.
... additional arguments that can be used for the plot, e.g. ’main’.
Details
According to modern data viz recommendations, lollipop charts are generally a better alternative to
bar charts, as they reduce the visual distortion caused by the length of the bars, making it easier to
compare the values.
Value
This function produces a lollipop chart of the values in ’x’.
36 MESS
Author(s)
A. Marcia Barbosa
See Also
barplot
Examples
lollipop(mtcars[,1], names = rownames(mtcars), las = 2, ylab = names(mtcars)[1],
cex.axis = 0.6, main = "Lollipop chart")
Description
This function performs the MESS analysis of Elith et al. (2010) to determine the extent of the
environmental differences between model training and model projection (extrapolation) data. It is
applicable to variables in a matrix or data frame.
Usage
Arguments
V a matrix or data frame containing the variables (one in each column) in the
training dataset.
P a matrix or data frame containing the same variables in the area to which the
model(s) will be projected. Variables (columns) must be in the same order as in
V, and colnames(P) must exist.
id.col optionally, the index number of a column containing the row identifiers in P. If
provided, this column will be excluded from MESS calculations but included in
the output.
verbosity Integer number indicating the amount of messages to display while computing
the results. The default is to display all messages. Set verbosity=0 for no mes-
sages.
MESS 37
Details
When model predictions are projected into regions, times or spatial resolutions not analysed in the
training data, it may be important to measure the similarity between the new environments and
those in the training sample (Elith et al. 2010), as models are not so reliable when predicting
outside their domain (Barbosa et al. 2009). The Multivariate Environmental Similarity Surfaces
(MESS) analysis measures the similarity in the analysed variables between any given locality in the
projection dataset and the localities in the reference (training) dataset (Elith et al. 2010).
MESS analysis is implemented in the MAXENT software (Phillips et al. 2006) and in the dismo
R package, but there it requires input variables in raster format. This implies not only the use of
complex spatial data structures, but also that the units of analysis are rectangular pixels, whereas we
often need to model distribution data recorded on less regular units (e.g. provinces, river basins),
or on equal-area cells that are not necessarily rectangular (e.g. UTM cells, equal-area hexagons or
other geometric shapes). The MESS function computes this analysis for variables in a data frame,
where localities (in rows) may be of any size or shape.
Value
The function returns a data frame with the same column names as P, plus a column named TOTAL,
quantifying the similarity between each point in the projection dataset and those in the reference
dataset. Negative values indicate localities that are environmentally dissimilar from the reference
region. The last column, MoD, indicates which of the column names of P corresponds to the most
dissimilar variable, i.e., the limiting factor or the variable that drives the MESS in that locality (Elith
et al. 2010).
Note
Newer and apparently more complete methods for analysing environmental dissimilarities have
been developed, such as extrapolation detection (ExDet; Mesgaran et al. 2014) and Mobility-
Oriented Parity analysis (MOP; Owens et al. 2013).
Author(s)
Alberto Jimenez-Valverde, A. Marcia Barbosa
References
Barbosa A.M., Real R. & Vargas J.M. (2009) Transferability of environmental favourability models
in geographic space: the case of the Iberian desman (Galemys pyrenaicus) in Portugal and Spain.
Ecological Modelling 220: 747-754
Elith J., Kearney M. & Phillips S. (2010) The art of modelling range-shifting species. Methods in
Ecology and Evolution 1: 330-342
Mesgaran M.B., Cousens R.D. & Webber B.L. (2014) Here be dragons: a tool for quantifying
novelty due to covariate range and correlation change when projecting species distribution models.
Diversity and Distributions, 20: 1147-1159
Owens H.L., Campbell L.P., Dornak L.L., Saupe E.E., Barve N., Soberon J., Ingenloff K., Lira-
Noriega A., Hensz C.M., Myers C.E. & Peterson A.T. (2013) Constraints on interpretation of eco-
logical niche models by limited environmental ranges on calibration areas. Ecological Modelling,
263: 10-18
38 MESS
Phillips S.J., Anderson R.P. & Schapire R.E. (2006) Maximum entropy modeling of species geo-
graphic distributions. Ecological Modelling 190: 231-259
See Also
OA; mess in package dismo; ecospat.climan in package ecospat; kuenm_mop and kuenm_mmop in
package kuenm
Examples
## Not run:
# load package 'fuzzySim' and its sample data:
require(fuzzySim)
data(rotif.env)
unique(rotif.env$CONTINENT)
rotif.env$HEMISPHERE[rotif.env$CONTINENT %in%
c("NORTHERN_AMERICA", "SOUTHERN_AMERICA")] <- "Western"
head(rotif.env)
head(east)
head(east.with.ID) # ID is in column 1
head(mess)
head(mess.with.ID)
range(mess[ , "TOTAL"])
## End(Not run)
MillerCalib 39
Description
This function calculates Miller’s (1991) calibration statistics for a presence probability model –
namely, the intercept and slope of a logistic regression of the response variable on the logit of
predicted probabilities. Optionally and by default, it also plots the corresponding regression line
over the reference diagonal (identity line). If the model is well calibrated, the line should lie along
(or at least be nearly parallel to) the reference diagonal, i.e. the slope should ideally equal 1 (i.e.,
45 degrees).
Usage
MillerCalib(model = NULL, obs = NULL, pred = NULL, plot = TRUE,
line.col = "black", diag = TRUE, diag.col = "grey",
plot.values = TRUE, digits = 2, xlab = "", ylab = "",
main = "Miller calibration", na.rm = TRUE, rm.dup = FALSE, ...)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments in-
stead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
plot logical, whether or not to produce a plot of the Miller regression line. Defaults
to TRUE.
line.col colour for the Miller regression line (if plot = TRUE).
diag logical, whether or not to add the reference diagonal (if plot = TRUE). Defaults
to TRUE.
diag.col line colour for the reference diagonal.
40 MillerCalib
plot.values logical, whether or not to report the values of the intercept and slope on the plot.
Defaults to TRUE.
digits integer number indicating the number of digits to which the values in the plot
should be rounded. Dafaults to 2. This argument is ignored if ’plot’ or ’plot.values’
are set to FALSE.
xlab label for the x axis.
ylab label for the y axis.
main title for the plot.
na.rm Logical value indicating whether missing values should be ignored in computa-
tions. Defaults to TRUE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
... additional arguments to pass to plot.
Details
Calibration or reliability measures how a model’s predicted probabilities relate to observed species
prevalence or proportion of presences in the modelled data (Pearce & Ferrier 2000; Wintle et al.
2005; Franklin 2010). If predictions are perfectly calibrated, the slope will equal 1 and the intercept
will equal 0, so the model’s calibation line will perfectly overlap with the reference diagonal or
identity line.
Note that Miller’s statistics assess the model globally: a model is well calibrated if the average of
all predicted probabilities equals the proportion of presences in the modelled data. For logistic re-
gression models, perfect calibration is always attained on the same data used for building the model
(Miller 1991); Miller’s calibration statistics are mainly useful when projecting a model outside those
training data.
Calibration can be separated into two measurable components, bias and spread, and a third com-
ponent, unexplained error. Bias describes a consistent overestimate or underestimate of presence
probability, which is reflected by a Miller intercept above or below 0 (i.e., a model line above or
below the reference diagonal). Spread describes a departure of the model line from the 45-degree
slope. A slope greater than 1 indicates that predicted values above 0.5 are underestimating, and
values below 0.5 are overestimating, the probability of presence. A slope smaller than 1 (while
greater than 0) implies that predicted values below 0.5 are underestimating, and values above 0.5
are overestimating, the probability of presence (Pearce & Ferrier 2000). A Miller slope very differ-
ent from 1 indicates a poorly calibrated model. The unexplained error component can be assessed,
though only in part, through residual analysis (Miller 1991; Pearce & Ferrier 2000).
While Miller’s calibration statistics were originally conceived for generalized linear models with
binomial distribution and logit link (Miller 1991), they may also apply to other models that also
estimate presence probability, including those that use different link functions such as probit or
cloglog. Regardless of how they get there, these models attempt to estimate presence probabilities
as well calibrated as possible, and the logit is the canonical link for the Bernoulli distribution which
is appropriate for a binary response variable. Indeed, the Miller slope is visibly worse when those
other link functions are used for computing it.
Miller’s calibration slope (though not the intercept) is also adequate to assess the calibration of
other predictions related to presence probability, such as suitability and favourability (see e.g. ’Fav’
MillerCalib 41
function in the fuzzySim package). Indeed, the slope is the same for presence probability and its
corresponding favourability value (see Examples, bottom).
Value
This function returns a list of two integer values:
If plot = TRUE, a plot will be produced with the model calibration line, optionally (if diag = TRUE)
over the reference diagonal, and optionally (if plot.values = TRUE) with the intercept and slope
values printed on it.
Author(s)
A. Marcia Barbosa
References
Franklin, J. (2010) Mapping Species Distributions: Spatial Inference and Prediction. Cambridge
University Press, Cambridge
Miller M.E., Hui S.L. & Tierney W.M. (1991) Validation techniques for logistic regression models.
Statistics in Medicine, 10: 1213-1226
Pearce J. & Ferrier S. (2000) Evaluating the predictive performance of habitat models developed
using logistic regression. Ecological Modelling, 133: 225-245
Wintle B.A., Elith J. & Potts J.M. (2005) Fauna habitat modelling and mapping: A review and case
study in the Lower Hunter Central Coast region of NSW. Austral Ecology, 30: 719-738
See Also
HLfit, Dsquared, RsqGLM, Boyce
Examples
# load sample models:
data(rotif.mods)
MillerCalib(model = mod)
MillerCalib(model = mod, plot.values = FALSE)
MillerCalib(model = mod, main = "Model calibration line")
# you can also use MillerCalib with vectors of observed and predicted values
# instead of a model object:
## Not run:
# (the following code requires the 'fuzzySim' pkg installed)
## End(Not run)
Description
This function takes a model object and returns the observed and (optionally) the fitted values in that
model.
Usage
mod2obspred(model, obs.only = FALSE)
Arguments
model a model object of class "glm", "gam", "gbm", "randomForest" or "bart" from
which the response variable and fitted (predicted) values can be extracted. Note
that, for "randomForest" models, only the out-of-bag prediction is available from
the model object (see ?predict.randomForest if you have that package in-
stalled), so here you’ll get different results if you provide ’model’ or the mod-
elled ’obs’ (and corresponding ’pred’) values.
obs.only logical value indicating whether only ’obs’ should be obtained (saves computing
time when ’pred’ not needed – used e.g. by prevalence). Defaults to FALSE.
Value
A data frame with one column containing the observed and (if obs.only=FALSE, the default) an-
other column containing the predicted values from ’model’.
Author(s)
A. Marcia Barbosa
modEvAmethods 43
See Also
prevalence
Examples
data(rotif.mods)
mod <- rotif.mods$models[[1]]
obspred <- mod2obspred(mod)
head(obspred)
Description
This function allows retrieving the methods available for some of the functions in modEvA, such as
threshMeasures, optiThresh, multModEv, getThreshold and getBins.
Usage
modEvAmethods(fun)
Arguments
fun a character vector of length 1 specifying the name (in quotes) of the function
for which to obtain the available methods. Must be one of "threshMeasures",
"optiThresh", "multModEv", "getThreshold" or "getBins".
Value
a character vector of the available methods for the specified function.
Author(s)
A. Marcia Barbosa
See Also
threshMeasures, optiThresh, getBins, multModEv
Examples
modEvAmethods("threshMeasures")
modEvAmethods("multModEv")
modEvAmethods("optiThresh")
modEvAmethods("getBins")
44 multModEv
Description
If you have a list of GLM model objects (created, e.g., with the multGLM function of the ’fuzzySim’
R-Forge package), or a data frame with presence-absence data and the corresponding predicted
values for a set of species, you can use the multModEv function to get a set of evaluation measures
for all models simultaneously, as long as they all have the same sample size.
Usage
multModEv(models = NULL, obs.data = NULL, pred.data = NULL,
measures = modEvAmethods("multModEv"), standardize = FALSE,
thresh = NULL, bin.method = NULL, verbosity = 0, ...)
Arguments
models a list of model object(s) of class "glm", all applied to the same data set. Evalua-
tion is based on the cases included in the models.
obs.data a data frame with observed (training or test) binary data. This argument is ig-
nored if ’models’ is provided.
pred.data a data frame with the corresponding predicted (training or test) values, with both
rows and columns in the same order as in ’obs.data’. This argument is ignored
if ’models’ is provided. Note that, for calibration measures (based on HLfit
or MillerCalib), the results are only valid if the input predictions represent
probability.
measures character vector of the evaluation measures to calculate. The default is all imple-
mented measures, which you can check by typing ’modEvAmethods("multModEv")’.
But beware: calibration measures (i.e., HL and Miller) are only valid if your
predicted values reflect actual presence probability (not favourability, habitat
suitability or others); you should exclude them otherwise.
standardize logical, whether to standardize measures that vary between -1 and 1 to the 0-1
scale (see standard01). The default is FALSE.
thresh argument to pass to threshMeasures if any of ’measures’ is calculated by
that function. The default is NULL, but a valid method must be specified
if any of ’measures’ is threshold-based - i.e., any of those in ’modEvAmeth-
ods("threshMeasures")’.
bin.method the method with which to divide the data into groups or bins, for calibration or
reliability measures such as HLfit. The default is NULL, but a valid method
must be specified if ’measures’ includes "HL" or "HL.p". Type modEvAmeth-
ods("getBins") for available options), and see HLfit and getBins for more in-
formation.
verbosity integer specifying the amount of messages or warnings to display. Defaults to
0, but can also be 1 or 2 for more messages from the functions within.
multModEv 45
... optional arguments to pass to HLfit (if "HL" or "HL.p" are included in ’mea-
sures’), namely n.bins, fixed.bin.size, min.bin.size, min.prob.interval or quan-
tile.type.
Value
A data frame with the value of each evaluation measure for each model.
Author(s)
A. Marcia Barbosa
See Also
threshMeasures
Examples
data(rotif.mods)
head(eval1)
head(eval2)
head(eval3)
46 OA
OA Overlap Analysis
Description
This function analyses the range of values of the given environmental variables at the sites where a
species has been recorded present.
Usage
OA(data, sp.cols, var.cols)
Arguments
data a data frame with your species’ occurrence data and the predictor variables.
sp.cols index number of the column containing the occurrence data of the species to be
modelled. Currently only one species can be analysed at a time.
var.cols index numbers of the columns containing the predictor variables to be used.
Details
Overlap Analysis is one of the simplest forms of modelling species’ distributions. It assesses the
ranges of values of the given environmental variables at the sites where a species has been recorded
present, and predicts where that species should be able to occur based on those presence data (e.g.
Brito et al. 1999, Arntzen & Teixeira 2006).
OA can also be useful when extrapolating models outside their original scope (geographical area,
time period or spatial resolution), as it can identify which localities are within the model’s domain
- i.e., within the analysed ranges of values of the variables, outside which the model may not be
reliable (e.g. Barbosa et al. 2009). In this case, the response is not a species’ presence, but rather
the sites that have been included in the model. See also the MESS function for a comparison between
modelled and extrapolation environments.
Input data for the OA function are a vector or column with ones and zeros (presences vs. absences
of a species if we want to model its occurrence, or modelled vs. non-modelled sites if we want to
know which non-modelled sites are within the modelled range), and a matrix or data frame with
the corresponding values of the environmental variables to consider (one variable in each column,
values in rows).
Value
A binary vector whith 1 where the values of all predictors lie within the ranges observed for the
presence records, and 0 otherwise.
Author(s)
A. Marcia Barbosa
optiPair 47
References
Arntzen J.W, Teixeira J. (2006) History and new developments in the mapping and modelling of the
distribution of the golden-striped salamander, Chioglossa lusitanica. Zeitschrift fur Feldherpetolo-
gie, Supplement: 1-14.
Barbosa, A.M., Real, R. & Vargas, J.M. (2009) Transferability of environmental favourability mod-
els in geographic space: the case of the Iberian desman (Galemys pyrenaicus) in Portugal and Spain.
Ecological Modelling 220: 747-754.
Brito J.C., Crespo E.G., Paulo O.S. (1999) Modelling wildlife distributions: Logistic Multiple Re-
gression vs Overlap Analysis. Ecography 22: 251-260.
See Also
MESS
Examples
## Not run:
# load package 'fuzzySim' and its sample data:
require(fuzzySim)
data(rotif.env)
names(rotif.env)
## End(Not run)
optiPair Optimize the classification threshold for a pair of related model eval-
uation measures.
Description
This function can optimize a model’s classification threshold based on a pair of model evaluation
measures that balance each other, such as sensitivity-specificity, precision-recall (i.e., positive pre-
dictive power vs. sensitivity), or omission-commission, or underprediction-overprediction (Fielding
& Bell 1997; Liu et al. 2011; Barbosa et al. 2013). The function plots both measures of the given
pair against all thresholds with a given interval, and calculates the optimal sum, difference and mean
of the two measures.
Usage
optiPair(model = NULL, obs = NULL, pred = NULL,
measures = c("Sensitivity", "Specificity"), interval = 0.01,
plot = TRUE, plot.sum = FALSE, plot.diff = FALSE, ylim = NULL,
na.rm = TRUE, exclude.zeros = TRUE, rm.dup = FALSE, ...)
48 optiPair
Arguments
Value
The output is a list with the following components:
measures.values
a data frame with the values of the chosen pair of measures, as well as their
difference, sum and mean, at each threshold.
MinDiff numeric value, the minimum difference between both measures.
ThreshDiff numeric value, the threshold that minimizes the difference between both mea-
sures.
MaxSum numeric value, the maximum sum of both measures.
ThreshSum numeric value, the threshold that maximizes the sum of both measures.
MaxMean numeric value, the maximum mean of both measures.
ThreshMean numeric value, the threshold that maximizes the mean of both measures.
If plot=TRUE (the default), a plot is also produced with the value of each of ’measures’ at each
threshold, and horizontal and vertical lines marking, respectively, the threshold and value at which
the difference between the two ’measures’ is minimal.
Author(s)
A. Marcia Barbosa
References
Barbosa, A.M., Real, R., Munoz, A.-R. & Brown, J.A. (2013) New measures for assessing model
equilibrium and prediction mismatch in species distribution models. Diversity and Distributions 19:
1333-1338
Fielding A.H. & Bell J.F. (1997) A review of methods for the assessment of prediction errors in
conservation presence/absence models. Environmental Conservation 24: 38-49
Liu C., White M., & Newell G. (2011) Measuring and comparing the accuracy of species distribu-
tion models with presence-absence data. Ecography, 34, 232-243.
See Also
optiThresh, threshMeasures
Examples
# load sample models:
data(rotif.mods)
optiPair(model = mod)
Description
The ’optiThresh’ function calculates optimal thresholds for a number of model evaluation measures
(see threshMeasures). Optimization is given for each measure, and/or for all measures according
to particular criteria (e.g. Jimenez-Valverde & Lobo 2007; Liu et al. 2005; Nenzen & Araujo 2011).
Results are given numerically and in plots.
Usage
optiThresh(model = NULL, obs = NULL, pred = NULL, interval = 0.01,
measures = modEvAmethods("threshMeasures"),
optimize = modEvAmethods("optiThresh"), simplif = FALSE, plot = TRUE,
sep.plots = FALSE, xlab = "Threshold", na.rm = TRUE, rm.dup = FALSE, ...)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments in-
stead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
optiThresh 51
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
interval numeric value between 0 and 1 indicating the interval between the thresholds at
which to calculate the evaluation measures. Defaults to 0.01.
measures character vector indicating the names of the model evaluation measures for
which to calculate optimal thresholds. The default is using all measures avail-
able in ’modEvAmethods("threshMeasures")’.
optimize character vector indicating the threshold optimization criteria to use; "each" cal-
culates the optimal threshold for each model evaluation measure, while the re-
maining options optimize all measures according to the specified criterion. The
default is using all criteria available in ’modEvAmethods("optiThresh")’.
simplif logical, whether to calculate a faster simplified version. Used internally in other
functions.
plot logical, whether to plot the values of each evaluation measure at all thresholds.
sep.plots logical. If TRUE, each plot is presented separately (you need to be recording R
plot history to be able to browse through them all); if FALSE(the default), all
plots are presented together in the same plotting window.
xlab character vector indicating the label of the x axis.
na.rm Logical value indicating whether missing values should be ignored in computa-
tions. Defaults to TRUE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
... additional arguments to pass to plot.
Value
This function returns a list with the following components:
all.thresholds a data frame with the values of all analysed measures at all analysed thresholds.
optimals.each if "each" is among the threshold criteria specified in ’optimize’, optimals.each
is output as a data frame with the value of each measure at its optimal threshold,
as well as the type of optimal for that measure (which may be the maximum for
measures of goodness such as "Sensitivity", or the minimum for measures of
badness such as "Omission").
optimals.criteria
a data frame with the values of measure at the threshold that maximizes each of
the criteria specified in ’optimize’ (except for "each", see above).
Note
"Sensitivity" is the same as "Recall", and "PPP" (positive predictive power) is the same as "Preci-
sion". "F1score"" is the harmonic mean of precision and recall.
52 optiThresh
Note
Some measures cannot be calculated for thresholds at which there are zeros in the confusion ma-
trix, hence the eventual ’NaN’ or ’Inf’ in results. Also, optimization may be deceiving for some
measures; use ’plot = TRUE’ and inspect the plot(s).
Author(s)
A. Marcia Barbosa
References
Jimenez-Valverde A. & Lobo J.M. (2007) Threshold criteria for conversion of probability of species
presence to either-or presence-absence. Acta Oecologica 31: 361-369.
Liu C., Berry P.M., Dawson T.P. & Pearson R.G. (2005) Selecting thresholds of occurrence in the
prediction of species distributions. Ecography 28: 385-393.
Nenzen H.K. & Araujo M.B. (2011) Choice of threshold alters projections of species range shifts
under climate change. Ecological Modelling 222: 3346-3354.
See Also
threshMeasures, optiPair
Examples
# load sample models:
data(rotif.mods)
## Not run:
optiThresh(model = mod)
## End(Not run)
# you can also use optiThresh with vectors of observed and predicted
# values instead of with a model object:
## Not run:
optiThresh(obs = mod$y, pred = mod$fitted.values, pch = 20)
## End(Not run)
plotGLM 53
Description
This function plots the observed (presence/absence) data and the predicted (probability) values of a
Generalized Linear Model against the y regression equation (logit) values. Only logistic regression
(binomial response, logit link) is currently implemented.
Usage
plotGLM(model = NULL, obs = NULL, pred = NULL, link = "logit",
plot.values = TRUE, plot.digits = 3, xlab = "Logit (Y)",
ylab = "Predicted probability", main = "Model plot", na.rm = TRUE,
rm.dup = FALSE, ...)
Arguments
model a binary-response model object of class "glm". If this argument is provided,
’obs’ and ’pred’ will be extracted with mod2obspred. Alternatively, you can
input the ’obs’ and ’pred’ arguments instead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
link the link function of the GLM; only ’logit’ (the default) is implemented.
plot.values logical, whether to include in the plot diagnostic values such as explained de-
viance (calculated with the Dsquared function) and pseudo-R-squared measures
(calculated with the RsqGLM function). Defaults to TRUE.
plot.digits integer number indicating the number of digits to which the values in the plot
should be rounded (if ’plot.values = TRUE’). Defaults to 3.
xlab character string specifying the label for the x axis.
54 plotGLM
Value
This function outputs a plot of model predictions against observations.
Author(s)
A. Marcia Barbosa
References
Guisan A. & Zimmermann N.E. (2000) Predictive habitat distribution models in ecology. Ecologi-
cal Modelling 135: 147-186
Weisberg S. (1980) Applied Linear Regression. Wiley, New York
See Also
predPlot, predDensity
Examples
# load sample models:
data(rotif.mods)
plotGLM(model = mod)
predDensity Plot the density of predicted values for presences and absences.
Description
This function produces a histogram and/or a kernel density plot of predicted values for a binary-
response model, possibly separately for the observed presences and absences, given a model object
or a vector of predicted values and (optionally) a vector of the corresponding observed values. When
there are multiple predicted values for each site, it can also plot a confidence interval.
Usage
predDensity(model = NULL, obs = NULL, pred = NULL, separate = TRUE,
type = "both", ci = NA, legend.pos = "topright", main = "Density of predicted values",
na.rm = TRUE, rm.dup = FALSE, ...)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’pred’ (and optionally ’obs’)
argument(s) instead of ’model’.
obs alternatively to ’model’ and together with ’pred’, an optional numeric vector
(in the same order of ’pred’) of observed presences (1) and absences (0) of a
binary response variable. Alternatively (and if ’pred’ is a ’SpatRaster’), a two-
column matrix or data frame containing, respectively, the x (longitude) and y
(latitude) coordinates of the presence points, in which case the ’obs’ vector will
be extracted with ptsrast2obspred. This argument may be omitted (to show
the density plot of all ’pred’ values combined), and it is ignored if ’model’ is
provided.
pred alternatively to ’model’, a vector of predicted values of presence probability,
habitat suitability, environmental favourability or alike. Must be of the same
length and in the same order as ’obs’ (if the latter is provided). Alternatively (and
if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of the predicted values
for the entire evaluation region, in which case the ’pred’ vector will be extracted
with ptsrast2obspred. This argument is ignored if ’model’ is provided.
separate logical value indicating whether prediction densities should be computed sepa-
rately for observed presences (ones) and absences (zeros). Defaults to TRUE,
but it is automatically changed to FALSE if either ’model’ or ’obs’ are not pro-
vided, or if ’ci’ is not NULL.
type character vector specifying whether to produce a "histogram", a "density" plot,
or "both" (the default). Partial argument matching is used.
ci numeric value for a confidence interval to add to the plot, e.g. 0.95 for 95%.
The default is NA.
56 predDensity
legend.pos character specifying the position for the legend; NA or "n" for no legend. Posi-
tion can be "topright" (the default), "topleft, "bottomright"", "bottomleft", "top",
"bottom", "left", "right", or "center". Partial argument matching is used.
main main title for the plot.
na.rm logical value indicating whether missing values should be ignored in computa-
tions. Defaults to TRUE.
rm.dup if TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
... additional arguments to pass to hist, e.g. ’breaks’.
Details
For more details, please refer to the documentation of the functions mentioned under "See Also".
Value
This function outputs and plots the object(s) specified in ’type’ – by default, a density object and
a histogram.
Author(s)
A. Marcia Barbosa
See Also
hist, density, predPlot
Examples
# load sample models:
data(rotif.mods)
predDensity(model = mod)
predPlot Plot predicted values for presences and absences, optionally classified
according to a prediction threshold.
Description
This function plots predicted values separated into observed presences and absences and (optionally
and by default) coloured according to whether they are above or below a given prediction threshold.
The plot imitates (with permission from the author) one of the graphical outputs of the ’summary’
of models built with the embarcadero package (Carlson, 2020), but it can be applied to other types
of models or to a set of observed and predicted values, and it allows specifying a user-defined
threshold.
Usage
predPlot(model = NULL, obs = NULL, pred = NULL, thresh = "preval",
main = "Classified predicted values", legend.pos = "n", pch = 1, cex = 0.5,
col = c("black", "grey"), na.rm = TRUE, rm.dup = FALSE, interval = 0.01,
quant = 0)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments in-
stead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
58 predPlot
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
thresh threshold value to separate predicted presences from predicted absences in ’pred’;
can be "preval" (the default), to use the prevalence (i.e. proportion of pres-
ences) in ’obs’; or any real number between 0 and 1; or any of the options
available on modEvAmethods("getThreshold") – see Details in getThreshold
for their description. This value, if not NA or NULL, will be used to draw a
vertical line on the plot and to colour the points (predicted values) according to
whether they fall above or below the threshold.
main Main title for the plot.
legend.pos character value specifying the position for the legend on the plot. Can be "bot-
tomleft", "bottom", "bottomright", "topleft", "left", "top", "topright", "right",
"center", or NA or "n" for no legend (the default). Partial argument matching is
used.
pch plotting character for the presences and absences (see par).
cex relative size of the plotting character (see par).
col vector of length 2 indicating the colours with which to plot predicted presences
and absences (points above and below the threshold), respectively. If ’thresh’ is
NA or NULL, all points will have the first of the specified colours.
na.rm Logical value indicating whether missing values should be ignored in computa-
tions. Defaults to TRUE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
interval Argument to pass to optiThresh indicating the interval between the thresholds
to test, if ’thresh’ implies optimizing a threshold-based measure. The default
is 0.01. Smaller values may provide more precise results but take longer to
compute.
quant Numeric value indicating the proportion of presences to discard if thresh="MTP"
(minimum training presence). With the default value 0, MTP will be the thresh-
old at which all observed presences are classified as such; with e.g. quant=0.05,
MTP will be the threshold at which 5% presences will be classified as absences.
Value
This function outputs a plot as per ’Description’.
Note
Points are jittered randomly along the y axis to minimize visual overlap. So, each run of ’pred-
Plot’ (unless you use set.seed first) will produce a different arrangement of points for the same
data, although their x-axis values are faithful.
predPlot 59
Author(s)
A. Marcia Barbosa
References
Carlson C.J. (2020) embarcadero: Species distribution modelling with Bayesian additive regression
trees in R. Methods in Ecology and Evolution, 11: 850-858.
See Also
predDensity, plotGLM
Examples
# load sample models:
data(rotif.mods)
predPlot(model = mod)
## Not run:
threshold <- optiThresh(mod, measures = "TSS", optimize = "each")
threshold <- threshold$optimals.each[ , "threshold"]
threshold
predPlot(model = mod, thresh = threshold)
## End(Not run)
# you can also use 'predPlot' with vectors of observed and predicted values
# instead of a model object:
prevalence Prevalence
Description
For building and evaluating species distribution models, the porportion of presences of the species
may be an issue to take into account (e.g. Jimenez-Valverde & Lobo 2006, Barbosa et al. 2013).
The prevalence function calculates this measure.
Usage
prevalence(obs, model = NULL, event = 1, na.rm = TRUE)
Arguments
obs a vector or a factor of binary observations (e.g. 1 vs. 0, male vs. female, disease
vs. no disease, etc.). This argument is ignored if ’model’ is provided.
model alternatively to ’obs’, a binary-response model object of class "glm", "gam",
"gbm", "randomForest" or "bart". If this argument is provided, ’obs’ will be
extracted with mod2obspred.
event the value whose prevalence we want to calculate (e.g. 1, "present", etc.). This
argument is ignored if ’model’ is provided.
na.rm logical, whether NA values should be excluded from the calculation. The default
is TRUE.
Value
Numeric value of the prevalence of event in the obs vector.
Author(s)
A. Marcia Barbosa
References
Barbosa A.M., Real R., Munoz A.R. & Brown J.A. (2013) New measures for assessing model
equilibrium and prediction mismatch in species distribution models. Diversity and Distributions, in
press
Jimenez-Valverde A. & Lobo J.M. (2006) The ghost of unbalanced species distribution data in
geographical model predictions. Diversity and Distributions, 12: 521-524.
See Also
evenness
ptsrast2obspred 61
Examples
prevalence(x)
prevalence(y)
prevalence(z)
data(rotif.mods)
prevalence(mod = rotif.mods$models[[1]])
ptsrast2obspred Observed and predicted values from presence points and a raster map.
Description
This function takes presence points or coordinates and a raster map of model predictions, and it
returns a data frame with two columns containing, respectively, the observed (presence or no pres-
ence) and the predicted value for each pixel. Duplicate points (i.e., points falling in the same pixel,
whether or not they have the exact same coordinates) can be kept or removed.
Usage
ptsrast2obspred(pts, rst, rm.dup = FALSE, na.rm = FALSE, verbosity = 2)
Arguments
pts a ’SpatVector’ map of the presence points, or a two-column matrix or data frame
containing their x (longitude) and y (latitude) coordinates, respectively.
rst a one-layer ’SpatRaster’ map of the model predictions, in the same CRS as
’pts’. If you have a raster map in another format, you can try to convert it with
’terra::rast()’
62 ptsrast2obspred
rm.dup logical, whether repeated points within the same pixel should be removed. See
Examples. The default is FALSE.
na.rm logical, whether presence points with missing or non-finite values of ’rst’ should
be excluded from the output. The default is FALSE.
verbosity integer value indicating the amount of messages to display. Defaults to 2, for
the maximum amount of messages.
Value
This function outputs a data frame with one column containing the observed (1 for presence, 0 for
absence) and another column containing the corresponding predicted values from ’rst’.
Author(s)
A. Marcia Barbosa
Examples
## Not run:
# you can run these examples if you have the 'terra' package installed
require(terra)
plot(rst)
## End(Not run)
range01 63
Description
This function re-scales a numeric vector so that it ranges between 0 and 1. So, the lowest value
becomes 0, the highest becomes 1, and the ones in the middle retain their rank and relative diference.
Usage
Arguments
x a numeric vector.
na.rm logical, whether to remove NA values.
Details
Value
A numeric vector of the same length as the input, now with the values ranging from 0 to 1.
Author(s)
A. Marcia Barbosa
See Also
standard01
Examples
range01(0:10)
range01(-12.3 : 21.7)
64 RMSE
Description
This function computes the root mean square error of a model object or a set of observed and
predicted values or maps.
Usage
RMSE(model = NULL, obs = NULL, pred = NULL, na.rm = TRUE, rm.dup = FALSE)
Arguments
model a model object of class implemented in mod2obspred. If this argument is pro-
vided, ’obs’ and ’pred’ will be extracted with that function. Alternatively, you
can input the ’obs’ and ’pred’ arguments instead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
values of the response variable. Alternatively (and if ’pred’ is a ’SpatRaster’),
a two-column matrix or data frame containing, respectively, the x (longitude)
and y (latitude) coordinates of presence points, in which case the ’obs’ vector
of presences and absences will be extracted with ptsrast2obspred. This argu-
ment is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the corresponding
predicted values, of the same length and in the same order as ’obs’. Alternatively
(and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of the predicted
values for the entire evaluation region, in which case the ’pred’ vector will be
extracted with ptsrast2obspred. This argument is ignored if ’model’ is pro-
vided.
na.rm Logical value indicating whether missing values should be ignored in computa-
tions. Defaults to TRUE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
Details
The root mean square error is computed as the square root of the mean of the squared differences
between observed and predicted values. It is (approximately) the same as the standard deviation
of the model residuals (prediction errors), i.e., a measure of how spread out these residuals are, or
how concentrated the observations are around the model prediction line. The smaller the RMSE,
the better.
Value
The function returns a numeric value indicating the root mean square error of the model predictions.
rotif.mods 65
Author(s)
A. Marcia Barbosa
References
Kenney J.F. & Keeping E.S. (1962) Root Mean Square. "Mathematics of Statistics", 3rd ed. Prince-
ton, NJ: Van Nostrand, pp. 59-60.
See Also
plotGLM, RsqGLM, Dsquared
Examples
# load sample models:
data(rotif.mods)
RMSE(model = mod)
# you can also use RMSE with vectors of observed and predicted values
# instead of with a model object:
Description
A set of generalized linear models of rotifer species distributions on TDWG level 4 regions of
the world (Fontaneto et al. 2012), together with their predicted values. Mind that these models
are provided just as sample data and have limited application, due to limitations in the underlying
distribution records. See Details for more information.
Usage
data(rotif.mods)
66 RsqGLM
Format
A list of 2 elements:
$ predictions: a data.frame with 291 observations of 60 variables, namely the presence probability
(P) and environmental favourability (F) for each of 30 species of rotifers, obtained from the rotif.env
dataset in the ’fuzzySim’ R-Forge package
$ models: a list of the 30 generalized linear model (glm) objects which generated those predictions.
Details
These models were obtained with the ’multGLM’ function and the rotif.env dataset from R-Forge
package ’fuzzySim’ using the following code:
require(fuzzySim)
data(rotif.env)
rotif.mods <- multGLM(data = rotif.env, sp.cols = 18:47, var.cols = 5:17, step = FALSE, trim =
TRUE)
See package ’fuzzySim’ (currently available on R-Forge at http://fuzzysim.r-forge.r-project.
org) for more information on the source data that were used to build these models.
References
Fontaneto D., Barbosa A.M., Segers H. & Pautasso M. (2012) The ’rotiferologist’ effect and other
global correlates of species richness in monogonont rotifers. Ecography, 35: 174-182.
Examples
data(rotif.mods)
head(rotif.mods$predictions)
rotif.mods$models[[1]]
Description
This function calculates some (pseudo) R-squared statistics for binomial Generalized Linear Mod-
els.
Usage
Arguments
model a binary-response model object of class "glm". Alternatively, you can input the
’obs’ and ’pred’ arguments instead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a vector of observed presences
(1) and absences (0) of a binary response variable. This argument is ignored if
’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the corresponding
predicted values of presence probability. Must be of the same length and in the
same order as ’obs’. This argument is ignored if ’model’ is provided.
use argument to be passed to cor for handling mising values.
plot logical value indicating whether or not to display a bar chart or (by default) a
lollipop chart of the calculated measures.
plot.type character value indicating the type of plot to produce (if plot=TRUE). Can be
"lollipop" (the default) or "barplot".
... additional arguments to pass to the plot function (see Examples).
Details
Implemented measures include the R-squareds of McFadden (1974), Cox-Snell (1989), Nagelkerke
(1991, which corresponds to the corrected Cox-Snell, eliminating its upper bound), and Tjur (2009).
See Allison (2014) for a brief review of these measures.
Value
The function returns a named list of the calculated R-squared values.
Note
Tjur’s R-squared can only be calculated for models with binomial response variable; otherwise, NA
will be returned.
Author(s)
A. Marcia Barbosa
References
Allison P. (2014) Measures of fit for logistic regression. SAS Global Forum, Paper 1485-2014
Cox, D.R. & Snell E.J. (1989) The Analysis of Binary Data, 2nd ed. Chapman and Hall, London
McFadden, D. (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka P.
(ed.) Frontiers in Economics. Academic Press, New York
Nagelkerke, N.J.D. (1991) A note on a general definition of the coefficient of determination. Biometrika,
78: 691-692
Tjur T. (2009) Coefficients of determination in logistic regression models - a new proposal: the
coefficient of discrimination. The American Statistician, 63: 366-372.
68 standard01
See Also
Dsquared, AUC, threshMeasures, HLfit
Examples
# load sample models:
data(rotif.mods)
RsqGLM(model = mod)
# you can also use RsqGLM with vectors of observed and predicted values
# instead of a model object:
Description
This function converts the score of a measure that ranges from -1 to 1 (e.g. a kappa or TSS value
obtained for a model) into its (linearly) corresponding value in 0-to-1 scale, so that it can be com-
pared directly with measures that range between 0 and 1 (such as CCR or AUC). It can also perform
the conversion in the opposite direction.
Usage
standard01(score, direction = c("-1+1to01", "01to-1+1"))
Arguments
score numeric value indicating the score of the measure of interest.
direction character value indicating the direction in which to perform the standardization.
The default, "-1+1to01", can be switched to "01to-1+1".
standard01 69
Details
While most of the threshold-based measures of model evaluation range theoretically from 0 to 1,
some of them (such as Cohen’s kappa and the true skill statistic, TSS) may range from -1 to 1
(Allouche et al. 2006). Thus, the values of different measures may not be directly comparable (Bar-
bosa 2015). We do not usually get negative values of TSS or kappa (nor values under 0.5 for CCR
or AUC, for example) because that only happens when model predictions perform worse than ran-
dom guesses; still, such values are mathematically possible, and can occur e.g. when extrapolating
models to regions where where the species-environment relationships differ. This standardization
is included as an option in the threshMeasures function.
Value
The numeric value of ’score’ when re-scaled to the 0-to-1 (or to the -1 to +1) scale.
Note
Note that this is not the same as re-scaling a vector so that it ranges between 0 and 1, which is done
by range01.
Author(s)
A. Marcia Barbosa
References
Allouche O., Tsoar A. & Kadmon R. (2006) Assessing the accuracy of species distribution models:
prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology 43: 1223-1232
Barbosa, A.M. (2015) Re-scaling of model evaluation measures to allow direct comparison of their
values. The Journal of Brief Ideas, 18 Feb 2015, DOI: 10.5281/zenodo.15487
See Also
threshMeasures, range01
Examples
standard01(0.6)
Description
This function calculates a number of measures for evaluating the classification accuracy of a species
distribution (or ecological niche, or bioclimatic envelope...) model against observed presence-
absence data (Fielding & Bell 1997; Liu et al. 2011; Barbosa et al. 2013), upon the choice of
a threshold value above which the model is considered to predict that the species should be present.
Usage
threshMeasures(model = NULL, obs = NULL, pred = NULL, thresh,
measures = modEvAmethods("threshMeasures")
[-grep("OddsRatio", modEvAmethods("threshMeasures"))], simplif = FALSE,
plot = TRUE, plot.type = "lollipop", plot.ordered = FALSE, standardize = TRUE,
verbosity = 2, interval = 0.01, quant = 0, na.rm = TRUE, rm.dup = FALSE, ...)
Arguments
model a binary-response model object of class "glm", "gam", "gbm", "randomForest"
or "bart". If this argument is provided, ’obs’ and ’pred’ will be extracted with
mod2obspred. Alternatively, you can input the ’obs’ and ’pred’ arguments in-
stead of ’model’.
obs alternatively to ’model’ and together with ’pred’, a numeric vector of observed
presences (1) and absences (0) of a binary response variable. Alternatively (and
if ’pred’ is a ’SpatRaster’), a two-column matrix or data frame containing, re-
spectively, the x (longitude) and y (latitude) coordinates of the presence points,
in which case the ’obs’ vector will be extracted with ptsrast2obspred. This
argument is ignored if ’model’ is provided.
pred alternatively to ’model’ and together with ’obs’, a vector with the correspond-
ing predicted values of presence probability, habitat suitability, environmental
favourability or alike. Must be of the same length and in the same order as ’obs’.
Alternatively (and if ’obs’ is a set of point coordinates), a ’SpatRaster’ map of
the predicted values for the entire evaluation region, in which case the ’pred’
vector will be extracted with ptsrast2obspred. This argument is ignored if
’model’ is provided.
thresh threshold to separate predicted presences from predicted absences in ’model’ or
’pred’; can be a numeric value between 0 and 1, or any of the options provided
with modEvAmethods("getThreshold"). See Details in getThreshold for a
description of the available options, and also Details below for a more informed
choice.
measures character vector of the evaluation measures to use. By default, all measures
available through modEvAmethods("getThreshold") are included, except for
"OddsRatio" which usually yields overly large values that stand out in the plot.
threshMeasures 71
simplif logical, whether to calculate a faster, simplified version. Used internally by other
functions in the package. Defaults to FALSE.
plot logical, whether to produce a bar chart or (by default) a lollipop chart of the
calculated measures. Defaults to TRUE.
plot.type character value indicating the type of plot to produce (if plot=TRUE). Can be
"lollipop" (the default) or "barplot".
plot.ordered logical, whether to plot the measures in decreasing order rather than in input
order. Defaults to FALSE.
standardize logical, whether to change measures that may range between -1 and +1 (namely
kappa and TSS) to their corresponding value in the 0-to-1 scale (skappa and
sTSS), so that they can compare directly to other measures (see standard01).
The default is TRUE, but a message is displayed to inform the user about it.
verbosity integer specifying the amount of messages to display. Defaults to the maximum
implemented; lower numbers (down to 0) decrease the number of messages.
interval Numeric value, used if ’thresh’ is a threshold optimization method such as
"maxKappa" or "maxTSS", indicating the interval between the thresholds to
test. The default is 0.01. Smaller values may provide more precise results but
take longer to compute.
quant Numeric value indicating the proportion of presences to discard if thresh="MTP"
(minimum training presence). With the default value 0, MTP will be the thresh-
old at which all observed presences are classified as such; with e.g. quant=0.05,
MTP will be the threshold at which 5% presences will be classified as absences.
na.rm Logical value indicating whether missing values should be ignored in computa-
tions. Defaults to TRUE.
rm.dup If TRUE and if ’pred’ is a SpatRaster and if there are repeated points within the
same pixel, a maximum of one point per pixel is used to compute the presences.
See examples in ptsrast2obspred. The default is FALSE.
... additional arguments to be passed to the plot function.
Details
The threshold value can be chosen according to a number of criteria (see e.g. Liu et al. 2005,
2013; Jimenez-Valverde & Lobo 2007; Nenzen & Araujo 2011). You can choose a fixed numeric
value, or set ’thresh’ to "preval" (species’ prevalence or proportion of presences in the data in-
put to this function), or calculate optimal threshold values according to different criteria with the
getThreshold, optiThresh or optiPair function. If you are using "environmental favourabil-
ity" as input ’pred’ data (Real et al. 2006; see ’Fav’ function in R package fuzzySim), then the
0.5 threshold equates to using training prevalence in logistic regression (GLM with binomial error
distribution and logit link function).
While most of these threshold-based measures range from 0 to 1, some of them (such as kappa
and TSS) may range from -1 to 1 (Allouche et al. 2006), so their raw scores are not directly
comparable. ’threshMeasures’ includes an option (used by default) to standardize these measures
to 0-1 (Barbosa 2015) using the standard01 function, so that you obtain the standardized versions
skappa and sTSS.
This function can also be used to calculate the agreement between different presence-absence (or
other types of binary) data, as e.g. Barbosa et al. (2012) did for comparing mammal distribution
72 threshMeasures
data from atlas and range maps. Notice, however, that some of these measures, such as TSS or
NMI, are not symmetrical (obs vs. pred is different from pred vs. obs).
Value
If ’simplif=TRUE’, the output is a numeric matrix with the name and value of each measure. If
’simplif=FALSE’ (the default), the ouptut is a list with the following components:
Note
"Sensitivity" is the same as "Recall", and "PPP" (positive predictive power) is the same as "Preci-
sion". Some of these measures (like NMI, UPR, OPR, PPP, NPP) cannot be calculated for thresholds
at which there are zeros in the confusion matrix, so they can yield NaN values.
Author(s)
A. Marcia Barbosa
References
Allouche O., Tsoar A. & Kadmon R. (2006) Assessing the accuracy of species distribution models:
prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology 43: 1223-1232.
Barbosa, A.M. (2015) Re-scaling of model evaluation measures to allow direct comparison of their
values. The Journal of Brief Ideas, 18 Feb 2015, DOI: 10.5281/zenodo.15487
Barbosa A.M., Estrada A., Marquez A.L., Purvis A. & Orme C.D.L. (2012) Atlas versus range
maps: robustness of chorological relationships to distribution data types in European mammals.
Journal of Biogeography 39: 1391-1400
Barbosa A.M., Real R., Munoz A.R. & Brown J.A. (2013) New measures for assessing model
equilibrium and prediction mismatch in species distribution models. Diversity and Distributions
19: 1333-1338
Fielding A.H. & Bell J.F. (1997) A review of methods for the assessment of prediction errors in
conservation presence/absence models. Environmental Conservation 24: 38-49
Jimenez-Valverde A. & Lobo J.M. (2007) Threshold criteria for conversion of probability of species
presence to either-or presence-absence. Acta Oecologica 31: 361-369
Liu C., Berry P.M., Dawson T.P. & Pearson R.G. (2005) Selecting thresholds of occurrence in the
prediction of species distributions. Ecography 28: 385-393
Liu C., White M. & Newell G. (2011) Measuring and comparing the accuracy of species distribution
models with presence-absence data. Ecography 34: 232-243
varImp 73
Liu C., White M. & Newell G. (2013) Selecting thresholds for the prediction of species occurrence
with presence-only data. Journal of Biogeography, 40: 778-789
Nenzen H.K. & Araujo M.B. (2011) Choice of threshold alters projections of species range shifts
under climate change. Ecological Modelling 222: 3346-3354
Real R., Barbosa A.M. & Vargas J.M. (2006) Obtaining environmental favourability functions from
logistic regression. Environmental and Ecological Statistics 13: 237-245
See Also
optiThresh, optiPair, AUC
Examples
# load sample models:
data(rotif.mods)
Description
This function gets, and optionally plots, variable importance for an input model object of an imple-
mented class.
74 varImp
Usage
Arguments
Details
Variable importance in a model of class "glm" obtained with the glm function can be measured by
the magnitude of the absolute z-value test statistic, which is provided with summary(model). The
’varImp’ function outputs the absolute z value of each variable, and in the plot (by default) uses
different colours for variables with positive and negative relationships with the response.
If the input model is of class "gbm" of the gbm package, variable importance is obtained from
summary.gbm(model) and divided by 100 to get the result as a proportion rather than a percentage.
If the input model is of class "randomForest" of the randomForest package, variable importance is
obtained with model$importance.
If the input model is of class "bart" of the dbarts package, variable importance is obtained as the
mean number of tree splits each variable was chosen for. If ’error.bars’ is not NA, the error is also
computed according to the specified metric ("sd" or standard deviation by default).
Value
This function outputs, and optionally plots, a named numeric vector of variable importance, as
measured by ’imp.type’ (see Details). If ’model’ is of class "bart" and ’error.bars’ is not NA, the
output is a row-named data frame with the mean and the lower and upper bounds of the error bars
of variable importance.
Author(s)
A. Marcia Barbosa
References
Weissgerber T.L., Milic N.M., Winham S.J. & Garovic V,D, (2015) Beyond Bar and Line Graphs:
Time for a New Data Presentation Paradigm. PLOS Biol 13:e1002128. https://doi.org/10.1371/JOURNAL.PBIO.1002128
See Also
summary.glm
Examples
# load sample models:
data(rotif.mods)
plot.type = "barplot",
cex.names = 0.8,
main = "Variable importance in \n my model")
Description
This function performs variation partitioning (Borcard et al. 1992) among two factors (e.g. Ribas
et al. 2006) or three factors (e.g. Real et al. 2003) for either linear regression models (LM) or
generalized linear models (GLM).
Usage
varPart(A, B, C = NA, AB, AC = NA, BC = NA, ABC = NA, model.type = NULL,
A.name = "Factor A", B.name = "Factor B", C.name = "Factor C",
model = NULL, groups = NULL, pred.type = "Y", cor.method = "pearson",
return.models = FALSE, plot = TRUE, plot.digits = 3, cex.names = 1.5,
cex.values = 1.2, main = "", cex.main = 2, plot.unexpl = TRUE, colr = FALSE)
Arguments
A numeric value of the R-squared of the regression of the response variable on
the variables related to factor ’A’. NOTE: INSTEAD of this and the next 10
arguments, you can use arguments ’model’ and ’groups’ below.
B numeric value of the R-squared of the regression of the response variable on the
variables related to factor ’B’
C (optionally, if there are 3 factors) numeric value of the R-squared of the regres-
sion of the response on the variables related to factor ’C’
AB numeric value of the R-squared of the regression of the response on the variables
of factors ’A’ and ’B’ simultaneously
AC (if there are 3 factors) numeric value of the R-squared of the regression of the
response on the variables of factors ’A’ and ’C’ simultaneously
BC (if there are 3 factors) numeric value of the R-squared of the regression of the
response on the variables of factors ’B’ and ’C’ simultaneously
ABC (if there are 3 factors) numeric value of the R-squared of the regression of the
response on the variables of factors ’A’, ’B’ and ’C’ simultaneously
model.type deprecated argument, kept here for back-compatibility
A.name character string indicating the name of factor ’A’
B.name character string indicating the name of factor ’B’
C.name character string indicating the name of factor ’C’ (if there are 3 factors)
model a model object of class ’glm’ (for linear models, instead of ’lm’ you can use
use ’glm’ with family=gaussian). If this argument is provided, all previous ar-
guments are ignored, as they are computed instead from ’model’ and ’groups’.
varPart 77
groups data frame with 2 columns, the 1st one containing the names of the variables,
and the 2nd one containing the names of the factors in which they should be
grouped (e.g. climatic, human, topographic) for the variation partitioning. This
argument is required (and only used) if ’model’ is provided.
pred.type character value specifying the type of predictions among which to calculate the
R-squared values, or squared correlations. Can be "Y" (the default) for the ’link’
function (in the scale of the predictor variables), "P" for using the ’response’
(e.g. in the scale of probability for models of family binomial), or "F" for using
favourability (i.e., probability after removing the effect of modelled prevalence,
as in the ’Fav’ function of package fuzzySim); see Details. This argument is
only used if ’model’ is provided.
cor.method character value to pass to the ’method’ argument of cor specifying the correla-
tion coefficient to use. The default is "pearson". This argument is only used if
’model’ is provided.
return.models logical value indicating whether to include in the output the model obtained for
each group of variables. The default is FALSE. This argument is only used if
’model’ is provided.
plot logical, whether to plot the variation partitioning diagram. The default is TRUE.
plot.digits integer value of the number of digits to which to round the values in the plot.
The default is 3.
cex.names numeric value indicating character expansion factor to define the size of the
names of the factors displayed in the plot.
cex.values numeric value indicating character expansion factor to define the size of the
values displayed in the plot.
main optional character string indicating the main title for the plot. The default is
empty.
cex.main numeric value indicating character expansion factor to define the font size of the
plot title (if provided).
plot.unexpl logical value indicating whether the amount of unexplained variation should be
included in the plot. The default is TRUE.
colr logical value indicating whether or not to colour the circles in the plot. The
default is FALSE for back-compatibility.
Details
If you have linear models (i.e. GLMs of family Gaussian), input data for ’varPart’ are the coeffi-
cients of determination (R-squared values) of the linear regressions of the response variable on all
the variables in the model, on the variables related to each particular factor, and (when there are 3
factors) on the variables related to each pair of factors. The outputs are the amounts of variance ex-
plained exclusively by each factor, the amounts explained exclusively by the overlapping effects of
each pair of factors, and the amount explained by the overlap of the 3 factors if this is the case (e.g.
Real et al. 2003). The amount of variation not explained by the complete model is also provided.
If you have generalized linear models (GLMs) such as logistic regression (see glm), you have no
true R-squared values; inputs can then be the squared coefficients of correlation between the model
predictions given by each factor (or pair of factors) and the predictions of the complete model.
78 varPart
Predictions can be probability (e.g. Munoz & Real 2006), favourability (Baez et al. 2012, Estrada
et al. 2016), or the ’logit’ linear predictor (Real et al. 2013); the correltion coefficient can be e.g.
Pearson’s (Munoz & Real 2006) or Spearman’s (Baez et al. 2012). An adjusted R-squared can
also be used (De Araujo et al. 2014). In GLMs, the "total variation" (AB or ABC, depending on
whether you have two or three factors) is 1 (correlation of the predictions of the complete model
with themselves), and output values are not the total amounts of variance (of the response variable)
explained by variable groups and their overlaps, but rather their proportional contribution to the
total variation explained by the model.
Value
This function returns a data frame indicating the proportion of variance accounted for by each of
the factors or groups, and (if ’plot = TRUE’) a Venn diagram of the contributions of each factor or
overlap. If ’return.models=TRUE’, the output includes also the model obtained for each group of
variables.
Note
These results derive from arithmetic operations between your input values, and they always sum up
to 1; if your input is incorrect, the results will be incorrect as well, even if they sum up to 1.
This function had a bug up to modEvA version 0.8: a badly placed line break prevented the ABC
overlap from being calculated correctly. Thanks to Jurica Levatic for pointing this out and helping
to solve it!
Oswald van Ginkel also suggested a fix to some plotting awkwardness when using only two factors,
and a nice option for colouring the plot. Many thanks!
Author(s)
A. Marcia Barbosa
References
Baez J.C., Estrada A., Torreblanca D. & Real R (2012) Predicting the distribution of cryptic species:
the case of the spur-thighed tortoise in Andalusia (southern Iberian Peninsula). Biodiversity and
Conservation 21: 65-78
Borcard D., Legendre P., Drapeau P. (1992) Partialling out the spatial component of ecological
variation. Ecology 73: 1045-1055
De Araujo C.B., Marcondes-Machado L.O. & Costa G.C. (2014) The importance of biotic interac-
tions in species distribution models: a test of the Eltonian noise hypothesis using parrots. Journal
of Biogeography 41: 513-523
Estrada A., Delgado M.P., Arroyo B., Traba J., Morales M.B. (2016) Forecasting Large-Scale Habi-
tat Suitability of European Bustards under Climate Change: The Role of Environmental and Geo-
graphic Variables. PLoS ONE 11(3): e0149810
Munoz A.-R. & Real R. (2006) Assessing the potential range expansion of the exotic monk parakeet
in Spain. Diversity and Distributions 12: 656-665
Real R., Barbosa A.M., Porras D., Kin M.S., Marquez A.L., Guerrero J.C., Palomo L.J., Justo E.R.
& Vargas J.M. (2003) Relative importance of environment, human activity and spatial situation in
varPart 79
Examples
# if you have a linear model (LM), use (non-adjusted) R-squared values
# for each factor and for their combinations as inputs:
# with 2 factors:
# with 3 factors:
# if you have a model object and a table classifying the variables into groups:
data(rotif.mods)
mod <- rotif.mods$models[[2]]
head(mod$model)
vars <- colnames(mod$model)[-1]
80 varPart
vars
var_groups <- data.frame(vars = vars, groups = c("Spatial", "Spatial",
"Climate", "Climate", "Climate", "Human"))
var_groups
81