Species Distribution Modelling with MAXENT
Mikael von Numers
Åbo Akademi
Why model species distribution?
– Knowledge about the geographical distribution of species
is crucial for conservation and spatial planning.
– Detailed data on species distribution are usually not
available, and collecting such data is costly and
labor-intensive.
– Conservationists often have to rely on predictive models
to estimate patterns of species distribution and to design
conservation strategies.
– SDMs provide one of the best ways to overcome the
sparseness typical of distributional data, by relating the
data to a set of geographic or environmental predictors.
What do we need for SDM?
• Maxent is a fairly new method, but it has performed very well in comparative tests against
similar methods.
• It is easy to use and has a nice, user-friendly interface.
• Shareware, active discussion group, many recently published papers.
• Download from: www.cs.princeton.edu/~schapire/maxent/
• Major conclusions drawn from Elith et al. 2006:
– Presence-only data are useful for modelling species' distributions
– Presence-only data can be sufficiently accurate to be used in conservation planning
– New modelling methods, such as MAXENT, generally outperform established methods
• Drawbacks:
– A “black box”: it is not easy to understand how the method works compared with,
for instance, GLM or GAM.
– According to the literature, not as “mature” a statistical method as GAM or
GLM.
– Sample selection bias is a bigger problem for presence-only methods than for
presence-absence methods. If sampling is biased, you get a model that
combines the species distribution with the distribution of sampling effort.
In this case the fitted model might be closer to a model of survey effort than of distribution.
• There are methods to deal with this problem: you can provide Maxent
with a “bias” raster to correct for the bias in sampling effort.
– If absence data are available, a presence-absence method is a better choice
than a presence-only (po) method.
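To make the idea of a sampling-effort raster concrete, the sketch below counts survey records per grid cell. It is only an illustration under stated assumptions: the function name effort_grid, the toy coordinates and the choice of "records per cell" as a measure of effort are not from the slides, and how such a grid should be scaled before it is given to Maxent as a bias raster is not covered here.

```python
import numpy as np

def effort_grid(xs, ys, xll, yll, cellsize, ncols, nrows):
    """Count records per cell; row 0 is the northernmost row, as in an ASCII grid."""
    cols = np.floor((np.asarray(xs, dtype=float) - xll) / cellsize).astype(int)
    rows_from_bottom = np.floor((np.asarray(ys, dtype=float) - yll) / cellsize).astype(int)
    rows = nrows - 1 - rows_from_bottom
    # Ignore records that fall outside the grid extent.
    inside = (cols >= 0) & (cols < ncols) & (rows >= 0) & (rows < nrows)
    grid = np.zeros((nrows, ncols))
    np.add.at(grid, (rows[inside], cols[inside]), 1)
    return grid

# Hypothetical survey records on a 3 x 2 grid with 10 m cells.
xs = [5, 5, 25, 100]
ys = [5, 6, 15, 100]
effort = effort_grid(xs, ys, xll=0, yll=0, cellsize=10, ncols=3, nrows=2)
print(effort)
```

Cells with many records indicate heavily surveyed areas; a model fitted without accounting for this may partly reproduce the pattern of survey effort.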
The Maxent user interface
Zostera marina
Species data:
75 presence points of
Zostera marina in the
S. Archipelago Sea
Species data format:
• data as a comma-delimited *.csv file (use Excel).
• only 3 columns needed: species name(s) and co-ordinates.
Species, X_coord, Y_coord
Zostera, 3214710, 6666810
Zostera, 3191860, 6681080
Zostera, 3195940, 6674130
Zostera, 3215030, 6679040
Zostera, 3208580, 6653860
Zostera, 3184780, 6642620
Zostera, 3205750, 6669300
Zostera, 3196800, 6646150
Zostera, 3213730, 6678190
Zostera, 3206280, 6678010
Zostera, 3199600, 6647510
Zostera, 3197280, 6646490
Zostera, 3200910, 6648660
Zostera, 3212160, 6647820
Zostera, 3212160, 6647890
Zostera, 3189660, 6683280
Zostera, 3205810, 6669390
Zostera, 3213530, 6654590
Fucus, 3209220, 6657510
Fucus, 3194840, 6646240
Fucus, 3196250, 6646940
Fucus, 3189310, 6683540
…
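A quick check of the species file before a run can catch formatting problems early. The sketch below reads a three-column, comma-delimited file into (species, x, y) records; the function name read_presence_records and the short in-memory sample are illustrative assumptions, with the sample rows copied from the listing above.

```python
import csv
import io

# A few rows in the three-column format: species name, x, y.
SAMPLE = """Species,X_coord,Y_coord
Zostera, 3214710, 6666810
Zostera, 3191860, 6681080
Fucus, 3209220, 6657510
"""

def read_presence_records(handle):
    """Return a list of (species, x, y) tuples from a comma-delimited file."""
    reader = csv.reader(handle, skipinitialspace=True)
    header = next(reader)
    if len(header) != 3:
        raise ValueError(f"expected 3 columns, found {len(header)}")
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        species, x, y = (field.strip() for field in row)
        records.append((species, float(x), float(y)))
    return records

records = read_presence_records(io.StringIO(SAMPLE))
print(len(records), "records;", sorted({r[0] for r in records}))
```

For a real file, pass an open file handle instead of the in-memory sample.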
Predictor layers describing the environmental variables
ncols 1827
nrows 2044
xllcorner 3176430
yllcorner 6636626
cellsize 25
NODATA_value -9999
-9999 -9999 -9999 -9999 -9999 -9999 -0.1697558 -0.3892355 -0.629083 -0.8858771 -1.15194
-1.418818 -1.683608 -1.943836 -2.19765 -2.453322 -2.724256 -3.016762 -3.336428
-3.700734 -4.129993 -4.631121 -5.202521 -5.847729 -6.573002 -7.368282 -8.198206
-9.017972 -9.795128 -10.51915 -11.18508 -11.76465 -12.1964 -12.40763 -12.36905
-12.19018 -12.21704 -12.41916 -13.14217 -14.7096 -17.03474 -19.10044 -20.86929
-22.32145 -23.51356 -24.54868 -25.52947 -26.52113 -27.53157 -28.51738 -29.42035
-30.20646 -30.87352 -31.43348 -31.87958 -32.1587 -32.1725 -31.80916 -30.98529 -29.68661
-27.99283 -26.07573 -24.18093 -22.60346 -21.62301 -21.31837 -21.2968 -21.14873
-20.38395 -18.66777 -15.95716 -13.17438 -10.75567 -8.945774 -6.735695 -4.542916
-2.320879 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999
-9999 -9999 -0.5598915
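The layer format above is a six-line header followed by whitespace-separated cell values. As a minimal sketch of how such a file can be read, the code below parses a toy 4 x 2 grid (not the real 1827 x 2044 layer) and turns the NODATA value into NaN; the function name read_ascii_grid and the toy values are assumptions for illustration.

```python
import io
import numpy as np

# Toy grid in the same layout as the predictor layer (header + values).
GRID_TEXT = """ncols 4
nrows 2
xllcorner 3176430
yllcorner 6636626
cellsize 25
NODATA_value -9999
-9999 -0.1697558 -0.3892355 -0.629083
-9999 -9999 -1.15194 -1.418818
"""

def read_ascii_grid(handle):
    """Parse the six header lines, then the cell values, row by row from the top."""
    header = {}
    for _ in range(6):
        key, value = handle.readline().split()
        header[key.lower()] = float(value)
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    values = np.array(handle.read().split(), dtype=float)
    if values.size != ncols * nrows:
        raise ValueError("number of values does not match ncols * nrows")
    grid = values.reshape(nrows, ncols)
    # Replace the NODATA marker with NaN so it is excluded from statistics.
    grid = np.where(grid == header["nodata_value"], np.nan, grid)
    return header, grid

header, grid = read_ascii_grid(io.StringIO(GRID_TEXT))
print(grid.shape, "deepest value:", np.nanmin(grid))
```

All predictor layers used in one run should share the same header values (extent and cell size), so comparing the parsed headers is a useful pre-processing check.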
Predictors:
• Depth (DEM)
• Exposure
• Distance from sand: a proxy for sandy substrate, for which no data are available
• Slope (derived from the DEM)
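As an illustration of how a slope layer can be derived from a depth grid, the sketch below uses plain finite differences with NumPy. This is an assumption for illustration only: GIS packages typically use their own slope algorithms, the slides do not say which one was used, and the function name slope_degrees and the toy DEM are hypothetical.

```python
import numpy as np

def slope_degrees(dem, cellsize):
    """Slope in degrees from a DEM, using finite differences in both directions."""
    dz_dy, dz_dx = np.gradient(dem, cellsize)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Toy DEM: depth drops 25 m per 25 m cell towards the east.
dem = np.array([[0.0, -25.0, -50.0],
                [0.0, -25.0, -50.0],
                [0.0, -25.0, -50.0]])
slope = slope_degrees(dem, cellsize=25.0)
print(slope)
```

A drop of one cell height per cell width corresponds to a 45 degree slope, which is what the toy DEM yields in every cell.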
The Maxent output probability raster is an ASCII (.asc) raster, which is easy
to export to ArcView for further analysis and symbolisation.
Substrate data = categorical data
Cormorant fishing areas
Substrate included as a categorical variable
Worth remembering when modelling:
1. Garbage in, garbage out.
2. Use a sufficient number of records. No algorithm can model extremely sparse species
data. Guideline: more than 30 records.
3. Each record should bring new information to the model; reduce clusters of observations
to a single observation.
4. Samples should be spread across the whole area of interest, for example by stratified sampling.
5. Beware of sampling bias, especially in presence-only methods.
6. Pre-process the predictors carefully: resolution, collinearity, etc.
7. Check the model fit (AUC, cross-validation, training and test datasets). A large
literature is available.
8. There are many sources of error, so predictions will always be uncertain. Be realistic and
cautious when interpreting the results.
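Two of the points above, reducing clusters to one observation and checking fit with AUC, can be sketched in a few lines. Both functions are illustrative assumptions rather than what Maxent does internally: thin_to_cells keeps the first record in each square cell of a chosen size (100 m here is an arbitrary choice), and auc computes the share of presence/background pairs in which the presence site gets the higher predicted score, counting ties as half. The scores are made up.

```python
def thin_to_cells(points, cellsize):
    """Keep only the first (x, y) record that falls in each square cell."""
    kept = {}
    for x, y in points:
        cell = (int(x // cellsize), int(y // cellsize))
        kept.setdefault(cell, (x, y))
    return list(kept.values())

def auc(presence_scores, background_scores):
    """Probability that a random presence site scores above a random background site."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Two of these Zostera records lie 70 m apart and share a 100 m cell.
points = [(3212160, 6647820), (3212160, 6647890), (3214710, 6666810)]
thinned = thin_to_cells(points, cellsize=100)
print(len(points), "->", len(thinned), "records")

# Hypothetical predicted scores at presence and background sites.
score = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print("AUC:", round(score, 3))
```

An AUC of 0.5 corresponds to a model that ranks sites no better than chance, and 1.0 to perfect ranking; the value should be computed on records held out from model fitting.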
Workflow:
1. The Maxent program
2. The Maxent output
3. Do a Maxent run using Zostera data and four predictor layers (individually or together)
4. Import the Maxent predictions to ArcView (together)
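The run in step 3 is done in the Maxent user interface, but the same settings can also be written down as a command line, which helps when a run has to be repeated. The sketch below only assembles and prints such a command. The option names (samplesfile, environmentallayers, outputdirectory, togglelayertype, autorun) are given as recalled from the Maxent tutorial and should be checked against the help of the installed version; all file and directory names are hypothetical.

```python
def build_maxent_command(jar, samples_csv, layers_dir, output_dir, categorical=()):
    """Assemble a Maxent command line as a list of arguments (nothing is executed)."""
    cmd = [
        "java", "-jar", jar,
        f"samplesfile={samples_csv}",          # species CSV: name, x, y
        f"environmentallayers={layers_dir}",   # directory holding the .asc layers
        f"outputdirectory={output_dir}",
    ]
    # Layers named here are switched to categorical, e.g. a substrate layer.
    cmd += [f"togglelayertype={name}" for name in categorical]
    cmd.append("autorun")                      # start without waiting for the GUI
    return cmd

command = build_maxent_command(
    "maxent.jar", "zostera.csv", "layers", "outputs", categorical=["substrate"]
)
print(" ".join(command))
```

Passing the list to subprocess.run would start the run, provided Java and maxent.jar are available at the given locations.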