On Garp: Preparing Environmental Data
Once the extension is turned on, it shows up as GARP in the main menu of ArcView.
With the raster datasets of interest loaded as grids in ArcView, one initiates the extension, and then
must provide a path where the datasets are to be deposited. Note that the path must end with a \.
Then, one must choose a clipping method by which to reduce the environmental data grids to the study
area (as is justified in the conceptual section). The simplest options are user-input coordinates or a
rectangle drawn on the map using the graphics draw capabilities in ArcView (the rectangle must be
selected). A slightly more complicated option is to provide a polygon shapefile that has a single
element in its attribute table. Regardless, one chooses among these options on the next page:
The program then provides a summary of the spatial resolutions associated with each of the grids:
One is then asked if one wishes to resample grids to a different cell size (answer yes!):
And then one specifies the resolution to which all of the grids should be resampled:
And the program provides some information about the grids that will result:
Then, one is asked about the manner of resampling. One normally would choose Nearest Neighbor,
but Bilinear Interpolation and Cubic Convolution are provided as additional options.
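To make the Nearest Neighbor option concrete, the following is a minimal Python sketch of nearest-neighbor resampling on a small in-memory grid. It is not the code that the ArcView extension runs; the function name resample_nearest and its arguments are hypothetical, and the grid extent is assumed to stay fixed while the cell size changes.

```python
def resample_nearest(grid, old_cell, new_cell):
    """Resample a 2-D list of values to a new cell size by nearest neighbor.

    grid is a list of rows; old_cell and new_cell are cell sizes in the
    same map units. Hypothetical helper, for illustration only.
    """
    old_rows, old_cols = len(grid), len(grid[0])
    # The extent stays fixed, so only the number of cells changes.
    new_rows = max(1, round(old_rows * old_cell / new_cell))
    new_cols = max(1, round(old_cols * old_cell / new_cell))
    out = []
    for r in range(new_rows):
        # Center of the new cell, expressed in old-cell index units.
        src_r = min(old_rows - 1, int((r + 0.5) * new_cell / old_cell))
        row = []
        for c in range(new_cols):
            src_c = min(old_cols - 1, int((c + 0.5) * new_cell / old_cell))
            # Copy the value of the old cell that contains this center.
            row.append(grid[src_r][src_c])
        out.append(row)
    return out
```

Because each output cell simply copies the value of an existing cell, no new values are interpolated, which is one reason this option is the usual choice.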
The key feature of the ASCII raster grids that are output is that they have exactly the same header
information, that is, the specifications of the position and size of the grid. When one opens one of the
ASCII raster grids, this header information looks like this:
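For orientation, a header in the standard ESRI ASCII grid layout consists of six lines (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value); the values shown in the comment below are hypothetical. A minimal Python sketch, assuming that six-line layout, can confirm that every grid in a folder shares the same header before one moves on. The function names are illustrative and are not part of GARP or ArcView.

```python
import glob
import os

# Assumed header layout (values are hypothetical):
#   ncols         720
#   nrows         360
#   xllcorner     -180
#   yllcorner     -90
#   cellsize      0.5
#   NODATA_value  -9999
HEADER_LINES = 6


def read_header(path):
    """Return the header of an ASCII raster grid as {lowercase key: value string}."""
    header = {}
    with open(path) as f:
        for _ in range(HEADER_LINES):
            parts = f.readline().split()
            if len(parts) != 2:
                raise ValueError("unexpected header line in " + path)
            key, value = parts
            header[key.lower()] = value
    return header


def headers_match(folder):
    """Return True if every .asc file in folder has an identical header."""
    paths = sorted(glob.glob(os.path.join(folder, "*.asc")))
    headers = [read_header(p) for p in paths]
    # Values are compared as text, so 0.5 and 0.50 count as different.
    return all(h == headers[0] for h in headers)
```

Comparing the values as text is deliberately strict, in keeping with the requirement that the headers be exactly the same.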
Note that GARP requires an additional layer, which must be named mask.asc. This layer serves to
restrict the analyses that GARP develops to a particular area, which (see conceptual section) should be
equivalent to the M in the BAM diagram. Frequently, this layer is simply a copy of one of the other
environmental layers.
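When the mask is simply a copy of an existing layer, the copy can be scripted. The sketch below is one hedged way to do so in Python; the function name make_mask is hypothetical, and it assumes that the chosen layer already covers exactly the area to which analyses should be restricted.

```python
import os
import shutil


def make_mask(layer_path):
    """Copy an existing environmental layer to mask.asc in the same folder."""
    mask_path = os.path.join(os.path.dirname(layer_path), "mask.asc")
    # A plain file copy keeps the header identical to the other layers.
    shutil.copyfile(layer_path, mask_path)
    return mask_path
```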
The next step involves using the program module GARP Dataset Manager, which downloads and installs
with the main GARP program. This program has two functions: (1) it rescales all values in each of the
environmental grids to the range 0-254, and (2) it removes all line breaks, creating in effect a single
vector of values across the entire region of analysis. The program is initiated from the main menu.
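The two functions of the Dataset Manager can be illustrated with a short Python sketch. This is not the program's actual code: it assumes a simple linear (min-max) rescaling to 0-254, and the value written for NODATA cells (nodata_out, 255 here) is a hypothetical placeholder.

```python
def rescale_and_flatten(rows, nodata, nodata_out=255):
    """Rescale valid cells linearly to 0-254 and return one flat list.

    rows is a list of rows of numbers; nodata is the NODATA value of the
    grid. nodata_out is an assumed marker for NODATA cells in the output.
    """
    valid = [v for row in rows for v in row if v != nodata]
    lo, hi = min(valid), max(valid)
    span = (hi - lo) or 1  # avoid division by zero for a constant layer
    flat = []
    for row in rows:  # rows are appended end to end, so line breaks vanish
        for v in row:
            if v == nodata:
                flat.append(nodata_out)
            else:
                flat.append(round((v - lo) * 254 / span))
    return flat
```

For example, a 2 x 2 grid with values 0, 50, 100 and one NODATA cell becomes the single vector 0, 127, 254, 255 under these assumptions.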
To begin processing the datasets for use in GARP, select Load Layers > From ASCII Raster Grids.
One then navigates to the folder in which the ASCII raster grids are stored, and one clicks on any of the
grids. All ASCII raster grids in the folder will load automatically. Note that this process can be
time-consuming, particularly if the grids that you are using are large.
One must then provide an identifier for the data set; this is accomplished by filling in the top two lines
on the left side of the program interface with the same identifier:
This identifier is what will appear in the main GARP program as the tag for this particular data set.
Finally, the user must recall that GARP's use of these datasets is "dumb"; that is, if you are planning to
project among different environmental datasets, the only guide that GARP has for connecting different
datasets is the order of the data layers in the GARP data file (i.e., what is produced by this program). As
a consequence, you must put the different data layers in the same order, among different GARP
datasets. The program has a panel of buttons for reordering data layers, as well as removing and adding
them to the data set (recall, though, that the headers of the different environmental layer ASCII files
must be exactly the same, or this program will bomb!):
Note also that Maxent's use of different data layers when projecting from one dataset to another is
different, so you must pay careful attention to this difference between the two programs. After the data
layers are in the proper order, one simply saves (File > Save) the GARP dataset, generally in the same
folder as the data layers, and one is ready to roll on actual analyses using GARP.
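Since layer order is the only link between datasets, it can be worth checking the order before saving. The sketch below is a hedged Python illustration: it assumes that one has typed out the ordered layer names of each dataset by hand, and that corresponding layers carry the same name in every dataset. The function names and the layer names in the usage example are hypothetical.

```python
def first_mismatch(reference, other):
    """Return the first position at which two layer lists differ, or None."""
    for i, (a, b) in enumerate(zip(reference, other)):
        if a != b:
            return i
    if len(reference) != len(other):
        # One list is a prefix of the other; they differ where the shorter ends.
        return min(len(reference), len(other))
    return None


def same_layer_order(datasets):
    """datasets maps a dataset name to its ordered list of layer names."""
    orders = list(datasets.values())
    return all(first_mismatch(orders[0], o) is None for o in orders)
```

For instance, same_layer_order({"present": ["temp", "precip"], "future": ["precip", "temp"]}) returns False, which flags a pair of datasets that GARP would silently mismatch.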
When one hits the Upload Data Points button, the file loads automatically. If there are multiple species
in the list, then multiple lines will appear in the Species List window.
2. Optimization Parameters
Here, one specifies the number of replicate models (runs) that one wishes to develop. The
convergence limit is rarely changed, but can be reduced to make the algorithm run longer and in
greater detail; if this is done, then the Max iterations should be increased accordingly. Rule types can
be turned on and off, but all are known to contribute significantly, so it is recommended that all be left
on.
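The interplay between the convergence limit and Max iterations can be pictured with a generic stopping rule. The Python sketch below is not GARP's internal algorithm; it only assumes that a run stops either when the change in some score drops below the convergence limit or when the iteration cap is reached. The function name, the step callable, and the default values are illustrative.

```python
def run_until_converged(step, convergence_limit=0.01, max_iterations=1000):
    """Call step() until its score changes by less than convergence_limit.

    step is any callable that advances the model by one iteration and
    returns the current score. Returns (iterations used, last change).
    """
    previous = step()
    iterations = 1
    change = float("inf")
    while iterations < max_iterations:
        current = step()
        iterations += 1
        change = abs(current - previous)
        previous = current
        if change < convergence_limit:
            break  # converged before hitting the iteration cap
    return iterations, change
```

Under this kind of rule, a smaller convergence limit demands more iterations, so a cap that is left unchanged may end the run before the limit is ever reached.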
3. Environmental Layers
To add environmental datasets to an analysis, one chooses Datasets > Scan Directory,
and then one identifies the directory in which all of the environmental datasets of interest are located.
In this case, 5 environmental datasets are contained within the folder Future. By typing any character
in the File Name area, and hitting open, the program will detect all datasets in any folder or subfolder of
this directory.
Once one has scanned the directory, one can use the picklist at Dataset to choose a dataset. The
dataset in this window will be used for model calibration. One can also turn on and off specific layers via
checking and unchecking the boxes next to each particular layer. Finally, one can do a variety of jackknife
manipulations of inclusion or exclusion of layers in the analyses, with a greater flexibility than is offered
by Maxent.
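To see what a jackknife over layers amounts to, the following Python sketch enumerates the layer sets obtained by leaving out a given number of layers. It only illustrates the combinatorics and does not reproduce GARP's own jackknife options; the function name and the layer names used in the example are hypothetical.

```python
from itertools import combinations


def jackknife_layer_sets(layers, leave_out=1):
    """Yield every layer subset obtained by leaving out leave_out layers."""
    keep = len(layers) - leave_out
    for subset in combinations(layers, keep):
        yield list(subset)
```

For example, list(jackknife_layer_sets(["temp", "precip", "elev"])) gives the three two-layer sets, each one omitting a different layer.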
4. Projection Layers
In this box, one can specify up to 9 environmental datasets onto which the models trained in the
analyses will be projected. They will appear in the results with suffixes _1, _2, etc., in order
corresponding to their order in this list.
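Keeping track of which suffix belongs to which dataset is easy to get wrong, so a small lookup can help. The Python sketch below assumes the numbering described in this document: _0 for the calibration dataset and _1, _2, and so on for the projection datasets in list order, with at most 9 projections. The function name and the dataset names in the example are hypothetical.

```python
MAX_PROJECTIONS = 9  # limit stated for the Projection Layers box


def projection_suffixes(calibration, projections):
    """Map each results suffix to the dataset it refers to."""
    if len(projections) > MAX_PROJECTIONS:
        raise ValueError("at most 9 projection datasets can be specified")
    suffixes = {"_0": calibration}  # projection onto the calibration layers
    for i, name in enumerate(projections, start=1):
        suffixes["_" + str(i)] = name
    return suffixes
```

For example, projection_suffixes("Present", ["Future_A", "Future_B"]) returns {"_0": "Present", "_1": "Future_A", "_2": "Future_B"}.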
5. Output:
In this window, one specifies the path of the folder in which one wishes for the results to be placed. One
can choose among three types of output, although ASCII Grids and ARC/INFO Grids are the only useful
ones. Specifications for enabling ARC/INFO Grid output are provided in the manual to desktop GARP.
The buttons under Models are not enabled and make no difference in the output.
This panel includes the most complicated set of options and selections that must be made. The user is
referred to a detailed publication (Anderson et al. 2003) for a close understanding of these options.
However, a basic summary is as follows:
a. Omission measure: indicates whether the data set aside from modeling in panel 1 are used for
calculations of omission rates, or whether the actual data used in model calibration are used.
Extrinsic is almost always used in this case.
b. Omission threshold: indicates whether an absolute (hard) or relative (soft) criterion is used
for omission criteria.
c. % distribution: this percentage is either the absolute value (e.g., omission rate <5%) or the
relative value (e.g., the 20% of replicate models that have the lowest omission rates), depending
on (b).
d. Commission threshold: this field indicates the proportion of the low-omission models identified
in (c) to be retained in the best subsets section.
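The way options (b) through (d) combine can be sketched in Python. This is a hedged illustration and not GARP's code: it assumes, following the general approach of Anderson et al. (2003), that among the low-omission models the ones retained are those whose commission values lie closest to the median. The function name, argument names, and dictionary keys are hypothetical, and the hard criterion is applied here as "less than or equal to".

```python
from statistics import median


def best_subset(models, hard, distribution, commission_threshold):
    """Select a best subset from replicate models.

    models: list of dicts with 'omission' and 'commission' (both in %).
    hard: True for an absolute omission criterion, False for a relative one.
    distribution: the % distribution value from option (c).
    commission_threshold: % of low-omission models to retain, option (d).
    """
    if hard:
        # Absolute criterion: keep models at or below the omission rate.
        low = [m for m in models if m["omission"] <= distribution]
    else:
        # Relative criterion: keep the given % of models with lowest omission.
        n = max(1, round(len(models) * distribution / 100))
        low = sorted(models, key=lambda m: m["omission"])[:n]
    if not low:
        return []
    mid = median(m["commission"] for m in low)
    keep = max(1, round(len(low) * commission_threshold / 100))
    # Assumed rule: retain the models nearest the median commission.
    return sorted(low, key=lambda m: abs(m["commission"] - mid))[:keep]
```

With 10 replicate models, a soft threshold of 40% and a commission threshold of 50% would first keep the 4 models with the lowest omission and then retain 2 of those.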
The results are organized as follows. Within the target directory (in the example, G1_LPC), there is a
directory BestSubsets that includes a subdirectory for each species, and that for each species includes
a subdirectory for each projection. The projection to the environmental layers on which the model was
calibrated is the _0 subdirectory. Subsequent projections are in the same order as on the
configuration page.
Finally, the program exports a Results file, both in ASCII and Excel formats. This file provides quite a lot
of information about the model, including the following fields:
Field name          Description
Task                A running count of the models having been run as part of this analysis.
Run                 A running count of the models run for this particular species.
Species             The species that is under analysis in this particular case.
Atomic Rules        Inclusion of atomic rule types in this analysis.
Range Rules         Inclusion of range rule types in this analysis.
Negated Rules       Inclusion of negated range rule types in this analysis.
Logit Rules         Inclusion of logit rule types in this analysis.
Iter.               The iteration at which the model evolution ceased.
Conv.               The change in the optimization parameter that caused the analysis to stop.
Train Acc           Overall correct prediction of training data.
Pr:Pr/Ac:Pr         Elements of the confusion matrix based on training data.
Pr:Ab/Ac:Pr         Elements of the confusion matrix based on training data.
Pr:Pr/Ac:Ab         Elements of the confusion matrix based on training data.
Pr:Ab/Ac:Ab         Elements of the confusion matrix based on training data.
Test Acc            Overall correct prediction of the intrinsic testing data.
Pr:Pr/Ac:Pr         Elements of the confusion matrix based on intrinsic testing data.
Pr:Ab/Ac:Pr         Elements of the confusion matrix based on intrinsic testing data.
Pr:Pr/Ac:Ab         Elements of the confusion matrix based on intrinsic testing data.
Pr:Ab/Ac:Ab         Elements of the confusion matrix based on intrinsic testing data.
Total Area          The total area (in pixels) that is involved in the analysis (i.e., calibration area).
Presence Area       The total area (in pixels) that is predicted as suitable.
Absence Area        The total area (in pixels) that is predicted as unsuitable.
Non-predicted Area  The total area (in pixels) that is not predicted as suitable or unsuitable, taken as predicted as unsuitable.
Yes                 The number of extrinsic testing points that was correctly predicted as suitable.
No                  The number of extrinsic testing points that was incorrectly predicted as unsuitable.
ChiSq               A chi-square statistic based on the preceding 5 fields (not in use these days).
p                   The p-value associated with the chi-square statistic in preceding field.
Commission          The proportion of the overall study area that was predicted as suitable.
Omission (int)      The proportion of training occurrence points that was predicted incorrectly as unsuitable.
Omission (ext)      The proportion of the extrinsic testing occurrence points that was predicted incorrectly as unsuitable.
Status              Whether the model was processed.
Message             Whether the model processing was successful.
Layers:             The remaining columns indicate the inclusion or exclusion of different environmental layers in the analysis.
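Several of these fields are related by simple arithmetic, which can be useful for checking a Results file by hand. The Python sketch below works from the field descriptions given here rather than from GARP's code, so the formulas are assumptions: extrinsic omission is taken as No / (Yes + No), and commission as Presence Area / Total Area. The function names are hypothetical.

```python
def extrinsic_omission(yes, no):
    """Proportion of extrinsic testing points predicted as unsuitable."""
    total = yes + no
    if total == 0:
        return None  # no extrinsic testing points were set aside
    return no / total


def commission(presence_area, total_area):
    """Proportion of the study area (in pixels) predicted as suitable."""
    return presence_area / total_area
```

For example, a model with Yes = 18 and No = 2 would have an extrinsic omission of 0.1 under these assumptions, and one with a Presence Area of 250 pixels out of a Total Area of 1000 would have a commission of 0.25.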