Behrens Et Al. 2018 - Spatial Modelling With Euclidean Distance Fields and Machine Learning

This study introduces a new hybrid spatial modelling framework called Euclidean distance fields in machine learning (EDM) that accounts for spatial non-stationarity, autocorrelation, and environmental correlation when mapping soil properties. EDM uses a set of generic spatially autocorrelated Euclidean distance fields as additional predictors alongside commonly used environmental covariates in machine learning methods. The approach provides advantages over other prediction methods like regression kriging and geographically weighted regression. The study demonstrates EDM produces accurate digital soil maps comparable to other contextual multiscale methods, with best results from tree-based algorithms like Cubist and random forest. EDM is a new practical alternative for digital soil mapping that enhances the available toolbox.


European Journal of Soil Science, 2018 doi: 10.1111/ejss.12687

Spatial modelling with Euclidean distance fields and machine learning

T. Behrens (a), K. Schmidt (a), R. A. Viscarra Rossel (b), P. Gries (a), T. Scholten (a) & R. A. MacMillan (c)
(a) Department of Geosciences, Soil Science and Geomorphology, University of Tübingen, Rümelinstraße 19-23, 72074, Tübingen, Germany, (b) CSIRO Land and Water, GPO Box 1700, Canberra ACT 2601, Australia, and (c) LandMapper Environmental Solutions Inc., 7415 118 A Street NW, Edmonton, Alberta Canada

Summary
This study introduces a hybrid spatial modelling framework, which accounts for spatial non-stationarity, spatial
autocorrelation and environmental correlation. A set of geographic spatially autocorrelated Euclidean distance
fields (EDF) was used to provide additional spatially relevant predictors to the environmental covariates
commonly used for mapping. The approach was used in combination with machine-learning methods, so we
called the method Euclidean distance fields in machine-learning (EDM). This method provides advantages over
other prediction methods that integrate spatial dependence and state factor models, for example, regression kriging
(RK) and geographically weighted regression (GWR). We used seven generic EDFs and several commonly used
predictors with different regression algorithms in two digital soil mapping (DSM) case studies and compared the
results to those achieved with ordinary kriging (OK), RK and GWR as well as the multiscale methods ConMap,
ConStat and contextual spatial modelling (CSM). The algorithms tested in EDM were a linear model, bagged
multivariate adaptive regression splines (MARS), radial basis function support vector machines (SVM), Cubist,
random forest (RF) and a neural network (NN) ensemble. The study demonstrated that DSM with EDM provided
results comparable to RK and to the contextual multiscale methods. Best results were obtained with Cubist, RF
and bagged MARS. Because the tree-based approaches produce discontinuous response surfaces, the resulting
maps can show visible artefacts when only the EDFs are used as predictors (i.e. no additional environmental
covariates). Artefacts were not obvious for SVM and NN and to a lesser extent bagged MARS. An advantage of
EDM is that it accounts for spatial non-stationarity and spatial autocorrelation when using a small set of additional
predictors. The EDM is a new method that provides a practical alternative to more conventional spatial modelling
and thus it enhances the DSM toolbox.

Highlights
• We present a hybrid mapping approach that accounts for spatial dependence and environmental correlation.
• The approach is based on a set of generic Euclidean distance fields (EDF).
• Our Euclidean distance fields in machine learning (EDM) can model non-stationarity and spatial autocorrelation.
• The EDM approach eliminates the need for kriging of residuals and produces accurate digital soil maps.

Correspondence: T. Behrens. Email: [email protected]
Received 20 June 2017; revised version accepted 3 May 2018
© 2018 British Society of Soil Science

Introduction

The spatial variation and distribution of soil properties can be represented by random fields (i.e. random functions in the geographical space that are spatially correlated) (Matérn, 1960). Soil variation and its spatial distribution are also often strongly related to the parametric distribution of the ‘clorpt’ soil forming factors (i.e. climate, organisms, relief, parent material and time) described by Jenny (1941). Here, we introduce Euclidean distance fields (EDFs) (Rosenfeld & Pfaltz, 1968) as new, spatially relevant, covariates for modelling soil spatial variation. The EDFs can be used alone as predictors in regression approaches for mapping, or in combination with other more commonly used predictors, such as climate and terrain attributes, when there is a need to account for
both spatial dependence and environmental correlation. We propose the use of EDFs with machine learning methods; therefore, we call the method Euclidean distance fields in machine learning (EDM).

Spatial dependence refers to the covariation of variables within geographic space and includes descriptions of spatial autocorrelation and non-stationarity. Spatial autocorrelation is a quality of the data whereby observations are interrelated in space. It is used by methods for spatial interpolation to estimate values at unobserved locations using values at observed locations in a neighbourhood (Oliver & Webster, 1990; Páez, 2004). Spatial non-stationarity is a condition in which a global regression model cannot explain the relations between some sets of variables (i.e. where variation in relations occurs over space and where models that allow local variation are required) (Brunsdon et al., 1996).

Models that account explicitly and simultaneously for both spatial dependence and environmental correlation are called ‘hybrid’ models. In this respect the ‘scorpan’ approach introduced by McBratney et al. (2003) is an extension of Jenny’s clorpt state factor equation to account more explicitly for spatial effects. In addition to Jenny’s clorpt factors, the scorpan model adds ‘s’ for other soil properties and ‘n’, which is important for hybrid models because it refers to space or location, and thus any spatial dependence in the data.

Two well-known hybrid approaches are regression kriging (RK) (Neuman & Jacobson, 1984; Odeh et al., 1994) and geographically weighted regression (GWR) (Brunsdon et al., 1996). In the case of RK, a regression model is first generated using a set of environmental covariates followed by kriging (spatial prediction) of the regression residuals. Both models are summed to generate the final prediction map. In contrast, GWR is based on localized calibrations, where different regression models are applied in different regions using local sets of sample points. Kriging is commonly used to predict spatially autocorrelated variables under assumptions of stationarity, whereas GWR was developed to predict spatially non-stationary variables. Local RK approaches (Sun et al., 2012) adapt the GWR idea to the RK framework and derive local models within a given geographic neighbourhood.

Another approach to account for spatial dependence is trend surface mapping (TSM) (Unwin, 1975), which is often used for detrending (i.e. separating relatively large-scale systematic spatial trends from non-systematic small-scale variation arising from local effects) (Krumbein, 1959). It is based on fitting linear or polynomial regression equations to the geographic (x, y) coordinates. Applying the concept of TSM with non-linear modelling approaches, such as artificial neural networks or decision trees, should produce local rather than global trend models that can characterize local variation. Including additional EDFs as predictors (i.e. not only x, y) should help to map complex processes that are characterized by both global and local spatial variation. We suggest here a set of five generic geographic EDFs that can be used as additional spatial predictors in ‘scorpan’ models.

Because EDFs partition geographic space into sub-regions, the regression model can deal with the interactions between environmental covariates and soil that vary non-linearly over space. This is a common advantage of EDM, GWR and local RK. In contrast, RK cannot account for spatial non-stationarity because of its two-step approach in setting up a global regression model with fixed geographical relations and subsequent interpolation by kriging to account for spatial autocorrelation in the residuals. Compared to EDM, the regression models in GWR are explicitly local and thus based only on a local subset of sample data. Therefore, the local regression models in GWR might not reveal important global spatial dependencies.

There are additional hybrid methods described in the DSM literature. These are mostly variations of RK (i.e. RK with regression approaches other than the linear model) or combinations of GWR and kriging, such as GWR-kriging or local RK (e.g. Kumar et al., 2012; Viscarra Rossel et al., 2015; Sun et al., 2012). Although local kriging models can handle non-stationarity better than the usual RK, all local models have the same theoretical drawbacks. Regression kriging with non-linear regression models can, in some cases, improve the accuracy of prediction (Viscarra Rossel et al., 2014).

Another category of hybrid DSM methods comprises those that use multiscale contextual derivatives or indicators of environmental covariates, such as terrain; that is ConMap (Behrens et al., 2010), ConStat (Behrens et al., 2014) and contextual spatial modelling (CSM, Behrens et al., 2018), referred to as contextual mapping in the following. They do not rely on spatial predictors, such as in EDM, and are not based on local regression models as in GWR. Rather, they make use of spatial contextual environmental covariate information. This contextual information is extracted at local to supra-regional scales in terms of circular spatial neighbourhoods or decomposed scales, and considers much larger neighbourhoods than common multiscale approaches (e.g. Wood, 1996; Moran & Bui, 2002; Behrens et al., 2018). These contextual mapping approaches account for the source of spatial dependence (i.e. small to large-scale interactions between environmental covariates across the landscape), instead of using spatial autocorrelation or factoring in non-stationarity for mapping directly, as with GWR or RK. In this respect, contextual mapping represents a contrasting approach to EDM. Although EDM makes use of spatial autocorrelation, the contextual mapping approaches try to remove the spatially autocorrelated part of the residuals.

Our aims here are to describe the EDM concept, to demonstrate its application with two datasets and to compare the efficacy of EDM against ordinary kriging (OK), RK, GWR and contextual mapping methods.

Materials and methods

Study sites

The aim of this study was to introduce and evaluate a new hybrid spatial mapping approach using Euclidean distance fields embedded in a ‘scorpan’ framework with additional environmental covariates; therefore, we needed datasets with both spatial dependence and some correlation with environmental factors. Consequently, we used the same two datasets analysed previously by Behrens et al. (2010, 2014, 2018) because they met the required criteria.
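As a compact restatement of the two-step regression kriging (RK) procedure described in the Introduction, the RK prediction at an unsampled location can be written as the sum of a regression estimate and a kriged residual. The notation below (the location s_0, covariates q_k with q_0 = 1, coefficients β_k, kriging weights λ_i and residuals e) is our own and is not taken from the text; it follows the form commonly used in the DSM literature.

```latex
\hat{z}_{\mathrm{RK}}(\mathbf{s}_0)
  = \underbrace{\sum_{k=0}^{p} \hat{\beta}_k \, q_k(\mathbf{s}_0)}_{\text{global regression on covariates}}
  + \underbrace{\sum_{i=1}^{n} \lambda_i \, e(\mathbf{s}_i)}_{\text{kriged residuals}},
\qquad
e(\mathbf{s}_i) = z(\mathbf{s}_i) - \sum_{k=0}^{p} \hat{\beta}_k \, q_k(\mathbf{s}_i)
```

Because the coefficients are estimated once for the whole study area, the regression term is global; this is the reason given above for why RK cannot represent spatially non-stationary relations.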


The study site in Rhine-Hesse, Germany, is a large wine-growing and loess-covered region of approximately 1150 km². The mean annual precipitation ranges from 500 to 850 mm. Luvisols, Cambisols and Podzols characterize this region. The major process that influenced topsoil silt content, which is analysed in this study, was local translocation of loess during the Wuerm glaciation. The loess was blown out of the surrounding riverbeds and deposited on the plateaus and in lee areas. This process is not well reflected by the underlying parent material (map) or by current climate conditions or land cover. Historical land use might be relevant to explain subsequent erosion processes better. However, data on historical land use are not available. Therefore, the only relevant data available for prediction of this property in this area are those on terrain. Terrain shape modulated climate conditions, leading to translocation of loess. Gravity, and thus terrain, is also the major controlling factor of soil erosion. Other relevant processes in this region, such as the formation of periglacial slope deposits, are controlled by solar insolation (i.e. slope and aspect) and contributing area. In total we used 342 samples to predict topsoil silt content (0–10 cm) with a digital elevation model (DEM) with a resolution of 20 m (Figure 1). The topsoil silt content ranges from 2 to 83%.

The Piracicaba study area covers an area of approximately 300 km² and is in the state of São Paulo (Brazil); the area comprises mainly sugarcane fields. The mean annual precipitation is 1328 mm. The geology of the area is characterized by sandstone, siltstone and shale, and to a lesser extent limestone, basalt and colluvial deposits (Mezzalira, 1965). Arenosols, Ferralsols, Acrisols, Alisols, Nitisols, Cambisols and Lixisols are the major soil types. Because of the predominantly sedimentary substrates, ranging from sandstone to limestone, clay content, which is modelled in this study, has a wide range of values from 6 to 72%. Therefore, parent material plays a major role, but there is no appropriate map of parent material available. Nevertheless, the effect of parent material can be inferred from the geomorphic signature (Behrens et al., 2014, 2018). Land use does not play a role because all the data come from sugarcane fields. Hence, terrain was again used as the only relevant covariate. In total 321 soil samples were available to predict topsoil clay content (0–10 cm) with a shuttle radar topography mission (SRTM) DEM with a resolution of 90 m (Figure 1).

Environmental covariates

The hypothesis behind this study is that spatial dependence, and more specifically spatial autocorrelation and spatial non-stationarity, can be modelled by the EDM approach. To interpret the effect of spatial dependence in relation to environmental correlation, we compare EDM with OK, RK using a linear regression model, GWR and the three contextual mapping approaches. The idea was to compare the proposed EDM approach with several of the most commonly used spatial prediction approaches. Because we are introducing a new methodology, to be able to focus on the EDF predictors and make comparisons with previous studies (Behrens et al., 2010, 2014, 2018), we use only terrain attributes as the environmental covariates.

We derived the following seven, moderately correlated (maximum r = 0.66), terrain attributes from the DEMs as environmental covariates for both study sites with the common terrain analysis tools provided in SAGA GIS 6.0 (Conrad et al., 2017): valley depth, topographic wetness index (TWI), slope, longitudinal curvature, elevation, cross-sectional curvature and aspect. Sine and cosine transforms (sinAspect and cosAspect) were applied to linearize the circular measurements of aspect. Thus, including elevation, eight terrain attributes were computed and used. All terrain attributes were subsequently log10 transformed and rescaled to a range of 0 to 1.

Euclidean distance fields and terrain attribute subsets

The X and Y coordinates used alone as spatial predictors in soil mapping might not be sufficient to produce accurate models. Thus, additional spatial Euclidean distance fields (Krumbein, 1959) can be used as explicitly autocorrelated indicators of spatial position and context for ‘scorpan’ models in common regression approaches. We integrated the following generic Euclidean distance fields in the EDM:

• X and Y coordinates, which are distance fields to the edges of a rectangle around the sample set,
• the distances to the corners of a rectangle around the sample set (C1, C2, C3, C4), and
• the distances to the centre location of the sample set (CC).

Figure 2 shows an example of these spatial predictors for a square region.

The idea behind the selection of the corners and centre distance was that these provide additional (non-linear) information about position without introducing too many additional predictors, to maintain parsimony in the models. Moreover, this set of EDFs is independent from the sample locations and can easily be calculated in any geographic information system (GIS). Other metrics related to position might also be possible. However, some measures would be required to determine the positions of the location of the origin of the EDFs. Using the corners (and the same applies for XY) ensures that a specific distance is directionally unique across the study area (i.e. the minimum and maximum are located at different corners or boundaries). The converse is the centre distance, which does not provide directionally unique distances. Yet, in combination with the X and Y coordinates and the corner EDFs it might help to account for non-linear effects. Another approach, instead of using the seven generic EDFs applied in this study, would be to use a distance transform for each sample location (i.e. sample EDFs). This might enable less complex regression algorithms to reveal local effects better, but it would also increase markedly the number of predictors and computation by a factor of 45 in this study. We used the minimum and maximum values of the X and Y coordinates of the sample set to determine the origin of the distance transforms. An additional buffer is not required and, because EDM might be considered an interpolation method, spatial extrapolation is not recommended.
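The seven generic EDFs listed above can be computed directly from the bounding rectangle of the sample set. The following Python sketch is illustrative only: the function names, the sample coordinates, the numbering of the corners (C1 to C4) and the use of the rectangle centre for CC are our assumptions, because the text does not fix them.

```python
import math


def sample_bbox(samples):
    """Bounding rectangle (xmin, ymin, xmax, ymax) of the sample coordinates."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    return min(xs), min(ys), max(xs), max(ys)


def edf_predictors(x, y, bbox):
    """Seven generic Euclidean distance fields for one location (x, y)."""
    xmin, ymin, xmax, ymax = bbox
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0  # assumed centre for CC
    return {
        "X": x - xmin,                         # distance to the western edge
        "Y": y - ymin,                         # distance to the southern edge
        "C1": math.hypot(x - xmin, y - ymax),  # assumed: upper-left corner
        "C2": math.hypot(x - xmax, y - ymax),  # assumed: upper-right corner
        "C3": math.hypot(x - xmin, y - ymin),  # assumed: lower-left corner
        "C4": math.hypot(x - xmax, y - ymin),  # assumed: lower-right corner
        "CC": math.hypot(x - cx, y - cy),      # distance to the centre
    }


# Hypothetical sample coordinates (map units), for illustration only.
samples = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0), (2.0, 1.0)]
bbox = sample_bbox(samples)
edf_at_origin = edf_predictors(0.0, 0.0, bbox)
```

Evaluating `edf_predictors` at every raster cell centre would give seven predictor grids that depend only on the extent of the sample set, not on the individual sample locations, which matches the independence property noted above.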

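The sine and cosine transforms of aspect and the rescaling to a range of 0 to 1, described under Environmental covariates, can be sketched as follows. This is a minimal illustration under our own assumptions: aspect is taken to be in degrees, the function names are hypothetical, and the log10 step is left out because the text does not say how zero or negative attribute values were handled.

```python
import math


def aspect_to_sin_cos(aspect_deg):
    """Map a circular aspect angle (degrees) to its sine and cosine."""
    rad = math.radians(aspect_deg)
    return math.sin(rad), math.cos(rad)


def rescale_01(values):
    """Min-max rescale a list of numbers to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Aspects of 1 and 359 degrees are nearly the same direction; the raw
# angle hides this, whereas the cosine values are practically equal.
sin_1, cos_1 = aspect_to_sin_cos(1.0)
sin_359, cos_359 = aspect_to_sin_cos(359.0)
scaled = rescale_01([2.0, 4.0, 6.0])
```

Using both components keeps the full directional information: the cosine separates north from south and the sine separates east from west.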

Figure 1 Sample locations showing the silt and clay content and elevation a.s.l. (m) for (a) Piracicaba and (b) Rhine-Hesse.

Figure 2 Experimental variograms (dots) and fitted models (dashed lines) for the clay dataset of Piracicaba and the silt dataset of Rhine-Hesse. Axes: semivariance against distance (m). Fitted models: Piracicaba, exponential (nugget 0.003, sill 0.02, range 2872); Rhine-Hesse, spherical (nugget 0.009, sill 0.03, range 17394).
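The two variogram model types named in the figure can be written as simple functions of lag distance. The sketch below is illustrative and rests on assumptions that the text does not confirm: the reported sill is treated as the total sill (nugget plus partial sill), and the exponential model uses the range as the distance parameter a in 1 − exp(−h/a). The function names are our own.

```python
import math


def spherical(h, nugget, sill, rng):
    """Spherical semivariance at lag h (total sill assumed)."""
    if h <= 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, rng):
    """Exponential semivariance at lag h (rng is the distance parameter)."""
    if h <= 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))


# Parameters as reported for the two fitted models.
gamma_rh = spherical(10000.0, nugget=0.009, sill=0.03, rng=17394.0)
gamma_pi = exponential(2872.0, nugget=0.003, sill=0.02, rng=2872.0)
```

Under these assumptions the spherical model reaches its sill exactly at the range, whereas the exponential model only approaches it asymptotically.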

To evaluate the effects of combining EDFs with relevant, and partially irrelevant, terrain attributes on the performance of different models, we produced two terrain attribute subsets (T1, T2) from a feature-importance analysis. We derived the average feature importance with the model-specific feature-importance analysis functions implemented in random forests (RF) and Cubist, together with a model-independent filter approach that uses the R² value from a locally weighted scatterplot smoothing (LOESS) regression for each feature. All calculations were carried out using the R package caret (Kuhn, 2017). The first terrain subset (T1) contains only the most important terrain attribute and the second subset (T2) contains the three most important terrain attributes. The T3 subset contains all eight terrain attributes.

To determine the effect of the EDF predictors, their combinations and the influence of the terrain attributes, we evaluated the following combinations of EDM predictors with the environmental covariates:

• X and Y coordinates (XY)
• Corner distances + centre distance (CD)
• X and Y coordinates + corner distances + centre distance (XY + CD)
• All terrain attributes (T3)
• X and Y coordinates + terrain attribute subset 1 (XY + T1)
• X and Y coordinates + terrain attribute subset 2 (XY + T2)
• X and Y coordinates + all terrain attributes (XY + T3)
• X and Y coordinates + corner distances + centre distance + terrain attribute subset 1 (XY + CD + T1)
• X and Y coordinates + corner distances + centre distance + terrain attribute subset 2 (XY + CD + T2)
• X and Y coordinates + corner distances + centre distance + all terrain attributes (XY + CD + T3)

The range of the feature space of all spatial predictors was standardized to values between 0 and 1, which is required for some algorithms such as neural networks (NN) (cf. Behrens et al., 2005). All models used the same data without any further model-specific preprocessing in terms of variable selection or scaling.

Regression approaches tested with the EDF predictors

The choice of the regression can be important for EDM because of the effect it has on the map of predictions. Most transitions in nature
are relatively smooth, over longer or shorter distances. The use of a decision tree on continuous X and Y spatial coordinates to interpolate a spatially correlated field can produce discontinuous response surfaces (sharp boundaries), which are unnatural and not visually appealing (Figures 10 and 11). Yet, these discontinuous surfaces might produce a model that is well validated. Therefore, implementing EDM requires careful choice of the appropriate regression for the specific aim. Consequently, we compared several algorithms.

Table 1 lists the models tested using the caret package in R (Kuhn, 2017). The following sections provide a short overview of the methods. Most of these methods have already been applied successfully in the context of digital soil analysis and mapping (e.g. Grimm et al., 2008; Bui et al., 2009; Viscarra Rossel & Behrens, 2010; Schmidt et al., 2014).

Table 1 Regression models and the corresponding R libraries compared for Euclidean distance fields in machine learning (EDM) in this study.

Regression model                                    Method      R library      Library reference
Linear regression
  Linear model                                      lm          base           R Core Team, 2017
Support vector regression
  Radial basis function kernel                      svmRadial   kernlab        Karatzoglou et al., 2004
Regression trees
  Random forest (RF)                                RF          randomForest   Liaw & Wiener, 2002
  Cubist                                            Cubist      cubist         Quinlan, 1992; Kuhn, 2017
MARS
  Bagged multivariate adaptive regression splines   bagEarth    earth          Milborrow, 2017; Kuhn, 2017
Neural networks
  Model averaged neural network                     avNNet      nnet           Venables & Ripley, 2002; Kuhn, 2017

Linear regression

Multiple linear regression examines linear correlations between multiple independent variables and a dependent variable. We applied the least squares criterion for calibrating the model (Rao & Toutenburg, 1999). It is the most general and widely used model and served as a reference in this study. It was also used as part of regression kriging and GWR in this study.

Support vector machines

Support vector machines are a kernel-based learning method from statistical learning theory. They make use of an implicit mapping of the input data into a high-dimensional feature space defined by a kernel function (Karatzoglou et al., 2004). It is possible to derive a linear hyperplane as a decision function for non-linear problems and then apply a back-transformation in the non-linear space. We used the typical general-purpose radial basis function kernel in this study (Karatzoglou et al., 2004).

Decision tree ensembles

One of the most recent improvements in ensemble learning, which has become widely adopted, is RF (Breiman, 2001; Grimm et al., 2008). It aggregates multiple classification or regression tree predictions based on changes in the training dataset through sampling in the instance and feature space.

Cubist uses if–then rules that partition the data. When the conditions in each rule are satisfied, a linear least squares model is used to predict the response (Quinlan, 1992). There are various examples of the use of Cubist for digital soil mapping (e.g. Bui et al., 2009).

Multivariate adaptive regression splines

Multivariate adaptive regression splines (MARS) introduced by Friedman (1991) are a generalization of recursive partitioning regression approaches such as classification and regression trees (CART, Breiman et al., 1984). By applying linear basis functions between the splits of the partitioned space, MARS generates piecewise linear models instead of piecewise constant models like CART. Therefore, when the underlying function is continuous, the accuracy of prediction is expected to be higher with MARS (Friedman, 1991). The piecewise functions are aggregated in terms of an additive model.

Single hidden layer artificial neural networks

Neural networks are frequently used in DSM studies (e.g. Behrens et al., 2005). In this study we used single hidden layer feed-forward NN as described in Venables & Ripley (2002). Kuhn (2017) extended the model in terms of an ensemble approach, which aggregates the same NN model based on different random number seeds.

Reference algorithms currently used for DSM

Kriging. Kriging is applied in several fields of environmental science; it was one of the first, and most important, spatial interpolation techniques. Kriging is based on the theory of regionalized variables (Matheron, 1963) and is a spatial distance-weighted interpolation method that assumes stationarity or intrinsic stationarity of the mean (Webster & Oliver, 2007). The weighting is based on the spatial autocorrelation function, which can be visualized and analysed using a variogram.

Regression kriging (RK) is a hybrid extension of the above that combines a linear regression model with kriging of the residuals (Neuman & Jacobson, 1984; Odeh et al., 1994). The advantage of RK is the inclusion of an external trend. Regression kriging is equivalent to universal kriging or kriging with external drift
(e.g. Hengl et al., 2003). We used the gstat package (Pebesma, 2004) in R for variography and kriging.

Figure 3 Visualization of the Euclidean distance fields in machine learning (EDM) predictors.

Table 2 Pearson correlation coefficients between clay (Piracicaba) and silt (Rhine-Hesse), and the EDFs and terrain attributes.

Predictor                    Clay     Silt
Y                            0.56    −0.11
X                            0.10     0.19
CC                          −0.20     0.27
C4                           0.45    −0.25
C3                          −0.51    −0.13
C2                           0.49     0.15
C1                          −0.48     0.25
Valley depth                −0.46    −0.13
TWI                          0.03    −0.03
Slope                        0.05     0.14
Longitudinal curvature       0.19    −0.01
Elevation                    0.56     0.29
Cross-sectional curvature    0.09    −0.08
sin(Aspect)                  0.08     0.17
cos(Aspect)                  0.03     0.05

CC, distances to the centre location of the sample set; C1, C2, C3, C4, distances to the corners of a rectangle around the sample set; EDF, Euclidean distance fields; TWI, topographic wetness index.

Geographically weighted regression. Geographically weighted regression (GWR) is a local distance-weighted linear regression technique that accounts for local spatial variation (Brunsdon et al., 1996). It enables regional prediction of properties based on a linear regression with spatially varying regression coefficients. The spatial kernel used to weight observations in the regression is based on their distance to the centre and can be constant or adaptive. We used an adaptive Gaussian spatial kernel as implemented in the GWmodel package in R (Gollini et al., 2015), which ensured that the kernel size was adapted to the sample density.

Contextual mapping. In contrast to common DSM approaches based on derivatives computed from digital terrain analysis, ConMap (Behrens et al., 2010) and ConStat (Behrens et al., 2014) do not derive standard terrain attributes. Both ConMap and ConStat are designed to analyse simultaneously a wide range of spatial scales, from the local or point scale to supra-regional scales, which is not typically accomplished with conventional terrain analysis. In the Rhine-Hesse case study the largest scale or neighbourhood size was 1000 times larger than the cell size of the DEM.

ConMap and ConStat differ in the way that terrain features are generated and described. ConMap uses elevation differences from the centre pixel to each pixel in a sparse circular neighbourhood. ConStat uses statistical measures within growing, sparse circular neighbourhoods. In both cases, the terrain indices extracted for each location are used as predictors. The advantage of ConStat is that the resulting model can be interpreted in terms of soil genesis using feature importance analysis and partial dependence models (Behrens et al., 2014). Both methods depend on terrain indices only; therefore, they have no explicit geographic component as in RK, GWR or EDM. However, they can account for spatial dependence (Behrens et al., 2010, 2014).

A related approach to the above methods is CSM (Behrens et al., 2018). In contrast to ConMap and ConStat, CSM uses a small set of common terrain attributes derived from scaled versions of the DEM generated by a Gaussian pyramid approach. The advantages compared to ConMap and ConStat are that the models are easier to interpret, the entire range of scales can be covered, and the approach is computationally less demanding.

The maximum spatial context analysed with ConMap and ConStat in this study was set to a neighbourhood radius of 20 km for Rhine-Hesse and 25 km for Piracicaba. The variograms of the clay and silt distribution are shown in Figure 3.

Validation. Several aspects have to be considered regarding validation accuracy:

• the effect of different EDF and terrain predictor combinations on modelling accuracy,
• the differences between regression approaches tested with the EDF predictors, and
• the comparison of EDM predictions against the reference approaches (OK, RK, GWR and contextual mapping).

Ten times ten-fold cross-validation was used to determine modelling accuracy for all models. Begleiter & El-Yaniv (2008) proposed that estimation of parameters by the regression approaches and accuracy of estimates from modelling should be embedded within one cross-validation procedure. Therefore, we tested different parameter settings in a grid learning approach (e.g. Schmidt et al., 2008) implemented in the R package caret (Kuhn, 2017). For this parameter tuning we used a single ten-fold cross-validation approach. Ten times ten-fold cross-validation was used to derive the final modelling accuracy and the 95% confidence intervals of the accuracies
Piracicaba Rhine-Hesse
100 100

Relative feature importance


Relative feature importance

90 90
80 80
70 70
60 60
50 50
40 40
30 30
20 20
10 10
0 0

Average RF Cubist LOESS Average RF Cubist LOESS

Figure 4 Feature-importance values for Piracicaba. Feature-importance Figure 5 Feature-importance values for Rhine-Hesse. Feature-importance
values calculated by random forests (RF), Cubist and locally weighted values calculated by random forests (RF), Cubist and locally weighted
scatterplot smoothing (LOESS) regression using the R package caret. scatterplot smoothing (LOESS) regression using the R package caret.

from ten-fold cross-validation. We used the coefficient of determination (R²) as the criterion to interpret the differences between the models and the study sites.

Results

Correlation and feature importance

The correlation between the soil properties and the EDFs, as well as the terrain attributes, was comparable at both study sites, indicating a similar effect of terrain and spatial dependence on the distribution of the soil property (Table 2). In Piracicaba, the maximum value of the correlation coefficient was 0.56 for the EDFs and for the terrain attributes. In Rhine-Hesse, the maximum was 0.27 for the EDFs and 0.29 for terrain, indicating a more complex landscape and thus pedogenesis.

Figures 4 and 5 show that the general trend of the three feature-importance measures is relatively consistent. The trend appears to be more important for the EDFs than the terrain attributes for both study sites. Elevation was the most important terrain attribute in both cases and was the only one with the same range of importance as the EDF predictors. All other terrain attributes had average importance values below the weakest EDF. For Rhine-Hesse, the most useful remaining terrain attributes selected for subset T2 were slope and sin(Aspect) (Figure 5). For Piracicaba, the additional terrain attributes for T2 were valley depth and TWI (Figure 4).

Validation

Because the results of both study sites show similar general patterns, in most cases, the results are described together. Figure 6 shows the R² cross-validation values and the corresponding 95% confidence intervals of the EDM approaches averaged across all regression algorithms. Figures 7 and 8 give the R² values and corresponding 95% confidence intervals for the regression approaches for Piracicaba and Rhine-Hesse, respectively. Figure 9 shows the R² cross-validation results of the reference mapping approaches. The R² values of the CSM approach are taken from Behrens et al. (2018).

The largest validation accuracies within each group of predictor combinations indicate that the two tree-based modelling approaches, Cubist and RF, generally performed best (Figures 7 and 8). For the predictor combinations including terrain attributes, bagEarth also had large validation accuracies. The svmRadial and avNNet methods were in some cases similar to bagEarth but were less accurate overall. The linear model did not perform well in any of the scenarios tested.

The main results of the comparison of combinations of the X and Y coordinates (XY), the corner and centre distance transforms (CD) and the terrain attributes are:

• prediction accuracy of the CD data was generally significantly greater than for the XY data,
• prediction accuracy of the XY + CD data was generally significantly higher than for the CD data alone,
• the increase in prediction accuracy when adding CD to XY was comparable for both study sites,
• although correlation analysis suggested a comparable effect of terrain and EDF predictors, the prediction accuracy was least when only terrain attributes were used,
• the largest prediction accuracies were obtained when EDF and terrain attributes were combined,
• in general, the XY + CD + T models attained significantly greater accuracies than the corresponding XY + T models,
• in some cases for Cubist and RF there was no significant difference when the CD were added to XY,
• adding non-relevant predictors, such as in T3, often had a significant negative effect on prediction accuracy, which was an effect of fitting noise,
• analysis of single regression approaches showed that the strength of the negative effect of non-relevant predictors was related to the regression approaches,
• svmRadial was the most adversely affected by noisy predictors, whereas the linear regression model (lm) was not affected at all,
• bagEarth, RF and Cubist analysed the EDF + T datasets best,
• the negative effect of noise was less pronounced with the XY + CD + T data than XY + T data, and
• the greatest increase in R² with XY and CD combined was for lm and avNNet, showing that less complex models benefit from the additional information.

Importantly, the additional CD spatial location predictors with XY or XY + T significantly increased prediction accuracy in most cases.

Figure 6 Average R² values of the Euclidean distance fields in machine learning (EDM) models for Piracicaba and Rhine-Hesse. The lines indicate the 95% confidence interval. [Plot: average prediction accuracy, R² (0 to 0.60), by predictor combination (XY, CD, XY+CD, T3, XY+T1, XY+T2, XY+T3, XY+CD+T1, XY+CD+T2, XY+CD+T3) for each study site.]

Figure 7 The R² values of the Euclidean distance fields in machine learning (EDM) models for Piracicaba. The lines indicate the 95% confidence interval. [Plot: R² (0 to 0.60) by predictor combination (XY, CD, XY+CD, T3, XY+T1, XY+T2, XY+T3, XY+CD+T1, XY+CD+T2, XY+CD+T3) for lm, avNNet, svmRadial, bagEarth, cubist and rf.]

Mapping

Visualization of the model results is part of interpreting the behaviour of the machine learning approaches when using different combinations of EDFs and environmental covariates. We illustrate the most relevant models based on the XY, XY + CD, T3 and XY + CD + T3 predictors. The EDM predictions and the maps of the reference DSM approaches for Piracicaba are shown in Figure 10, and for Rhine-Hesse in Figure 11. In each case, the legends are restricted to the range of the soil property values in the sample data. The following sections pertain to the mapping.
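As a rough illustration of the XY and CD predictor sets used in these models, the sketch below computes Euclidean distance fields for a set of locations. It assumes that CD denotes the distances to the four corners and to the centre of the bounding rectangle of the study area; the function names and dictionary keys are hypothetical and are not taken from the authors' code.

```python
import math


def bounding_box(points):
    """Axis-aligned bounding rectangle (xmin, ymin, xmax, ymax) of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def edf_predictors(x, y, bbox):
    """Return the XY and CD predictors for one location.

    Assumption: CD = distances to the four corners and to the centre of `bbox`.
    """
    xmin, ymin, xmax, ymax = bbox
    corners = {
        "dist_sw": (xmin, ymin),
        "dist_nw": (xmin, ymax),
        "dist_se": (xmax, ymin),
        "dist_ne": (xmax, ymax),
    }
    centre = ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)
    features = {"x": x, "y": y}  # XY predictors
    for name, (cx, cy) in corners.items():  # corner distance fields
        features[name] = math.hypot(x - cx, y - cy)
    features["dist_centre"] = math.hypot(x - centre[0], y - centre[1])
    return features


def edf_table(points, bbox=None):
    """EDF predictors for many locations, e.g. sample sites or raster cell centres."""
    if bbox is None:
        bbox = bounding_box(points)
    return [edf_predictors(x, y, bbox) for x, y in points]
```

In such a setup the seven columns would be appended to the terrain covariates before fitting a regression model, and the same bounding rectangle would have to be reused for the prediction grid so that the distance fields are comparable between calibration and mapping.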

Figure 8 The R² values of the Euclidean distance fields in machine learning (EDM) models for Rhine-Hesse. The lines indicate the 95% confidence interval. [Plot: R² (0 to 0.60) by predictor combination (XY, CD, XY+CD, T3, XY+T1, XY+T2, XY+T3, XY+CD+T1, XY+CD+T2, XY+CD+T3) for lm, avNNet, svmRadial, bagEarth, cubist and rf.]
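The R² values and 95% confidence intervals shown in the validation figures derive from ten times ten-fold cross-validation. A minimal sketch of such a summary is given below. It assumes that R² is computed per fold as 1 − SS_res/SS_tot and that the interval is a normal approximation (mean ± 1.96 standard errors) over the fold-level values; neither choice is stated in the text, other definitions such as the squared Pearson correlation give different values, and all function names are hypothetical. A one-predictor least-squares fit stands in for the regression algorithms.

```python
import math
import random
import statistics


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def mean_ci95(values):
    """Mean and normal-approximation 95% interval (an assumed CI method)."""
    mean = statistics.fmean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half, mean + half


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; a stand-in for any regressor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validated_r2(xs, ys, k=10, repeats=10, seed=0):
    """Fold-level R2 values from repeated k-fold cross-validation."""
    scores = []
    for train, test in repeated_kfold(len(xs), k, repeats, seed):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [float(i) for i in range(100)]
    ys = [0.5 * x + rng.gauss(0.0, 5.0) for x in xs]  # synthetic data only
    scores = cross_validated_r2(xs, ys)
    print(len(scores), mean_ci95(scores))
```

With ten repeats of ten folds the summary rests on 100 fold-level values. Because folds from the same repeat share training data, these values are not independent, so an interval of this kind is best read as a descriptive band rather than a strict significance test.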

Figure 9 The R² values of the reference modelling approaches for Piracicaba and Rhine-Hesse. [Plot: panel "Reference models"; R² (0.40 to 0.70) for OK, RK, GWR, ConMap, ConStat and CSM at each study site.]

The EDM XY and XY + CD models

The most interesting comparisons concern the EDM modelling approaches that use only EDFs as predictors. In general, the local patterns were stronger when XY + CD were used as predictors instead of XY only. In this case, even the linear regression model showed the general spatial trend of the silt and clay distributions. However, only Cubist, RF and svmRadial showed details comparable to OK. Cubist and RF, and to a lesser extent bagEarth, produced visible artefacts, which stemmed from the methods.

The EDM XY + CD + T3 models

Visually, the EDM XY + CD + T3 RF, Cubist and bagEarth predictions resembled the ConMap predictions most closely. SvmRadial and avNNet showed a similar pattern, but with more spatial variation, which might be a function of fitting non-relevant predictors, resulting in spurious spatial detail. Like ConMap, the EDM predictions with RF, Cubist and bagEarth showed less local variation than for svmRadial, but they achieved greater prediction accuracies. Thus, they were less affected by noisy or irrelevant predictors.

The EDM predictors and piece-wise models

The XY models clearly showed artefacts related to the piece-wise modelling approaches of the tree-based ensembles and to a lesser extent the bagEarth model. Although this resulted in unnatural looking response surfaces, they were the best performing interpolation approaches based on the EDF + T datasets tested so far. One advantage of this behaviour might be that we are immediately reminded that we are looking at a model and not at ground truth.

The addition of CD to the XY + T predictors also produced similar artefacts (e.g. abrupt boundaries), but they were less obvious. When we included terrain data as additional covariates in hybrid ‘scorpan’ modelling, only a very few such artefacts remained visible.

Figure 10 Spatial modelling results for Piracicaba. The upper section shows the spatial models arranged as a matrix where the rows reference the machine learning algorithms and the columns the different Euclidean distance fields in machine learning (EDF) combinations used for modelling (XY, XY + CD and XY + CD + T). The lower section shows the spatial models of the reference algorithms OK, RK, GWR, ConMap and ConStat.

Figure 11 Spatial modelling results for Rhine-Hesse. The upper section shows the spatial models arranged as a matrix where the rows reference the machine learning algorithms and the columns the different Euclidean distance fields in machine learning (EDF) combinations used for modelling (XY, XY + CD and XY + CD + T). The lower section shows the spatial models of the reference algorithms OK, RK, GWR, ConMap and ConStat.

Comparison of EDM to the reference mapping approaches

Both EDM (XY + CD + T3) and RK produced similar accuracies. The contextual mapping approaches gave results that were slightly better than all other models in Rhine-Hesse. In Piracicaba they were similar to RK and the XY + CD + T models. In both study
areas, GWR was only slightly better than lm. The local GWR regressions seemed to lack the global information required to derive a good general model. The relations are also non-linear in general so that local linear models do not account well for spatially varying geographical relationships. Another reason is the generally small effect of terrain at the scale of the DEM resolution, which might be a case where GWR should not be considered.

The generally smaller contribution of common terrain attributes to model accuracy in Rhine-Hesse also seemed to account for the poorer validation accuracy of RK compared to OK. Therefore, the residuals used in the kriging part of RK seemed to provide less information than the original data, which must also be attributed to the adverse effect of non-relevant predictors.

In Piracicaba RK was equivalent to the contextual mapping approaches. However, it performed less well in Rhine-Hesse. The best EDM approaches in Rhine-Hesse were based on RF and Cubist, which were similar to OK, but their R² values were approximately 5% smaller than for contextual mapping (Figures 7–9).

It is unlikely that a different set of conventional terrain attributes at the scale of resolution of the DEM would help to increase the prediction accuracy of the EDM approach, RK or GWR in both regions significantly. For Rhine-Hesse the set of reference terrain attributes in Behrens et al. (2010) for ConMap was different and contained many more attributes. Nevertheless, the R² value obtained with RF in that study was also smaller than 0.2. It seems for both datasets that the better the terrain-based predictions performed, the more comparable were the validation accuracies between all models.

Discussion

Contextual spatial modelling, EDM and GWR differ from RK in their ability to consider non-stationary relations between a soil property and environmental covariates directly. The contextual mapping approaches and EDM account for spatial autocorrelation by using covariates that identify and describe some of the spatial structure in a target variable (e.g. de Knegt et al., 2010). In the case of contextual mapping, these covariates represent substitutes for environmental factors, whereas for EDM they are explicit spatially autocorrelated Euclidean distance fields that are not specific regarding the location and values of the sample set and environmental covariates. These EDFs enable the regression model to infer the relevant spatial dependence when predicting unknown values at new locations based on known values at nearby sampled locations. This is the opposite of GWR, for which local models are used, which also aim to extract the spatial structure that is not described by environmental covariates. Contextual mapping, EDM and RK differ from GWR because GWR does not use the entire sample set for creating the regression model at specific locations, but only local subsets. Thus, it might fail to reveal relevant parts of the soil–environmental relations, which might only be extractable from the entire dataset.

The theoretical advantages of EDM have not resulted here in clear increases in prediction accuracy compared to the other (reference) methods, although the accuracy attained by EDM was similar to that of RK, which cannot account for changing geographical relations. However, the theoretical advantage of EDM should result in greater prediction accuracies when there is strong spatial autocorrelation, when the relation of the dependent variable to the environmental covariates is strong and when the extent of changes in the geographic relations in space is large.

The GWR, and similar local models, often cannot develop relations from the entire dataset because it might not be useful for DSM. As shown in the Rhine-Hesse dataset, RK can be less accurate than OK if the regression model is developed on ‘noisy’ and potentially artificial relations. In the presence of non-stationarity, EDM might perform better than RK because regional variation can be dealt with directly. If the spatial structure shows a large-scale trend, ConMap and ConStat might not be able to resolve it fully. In such cases, EDM, RK and GWR might outperform ConMap and ConStat.

Linear models should also be able to produce results strongly correlated with those of OK when the distances to each sample location are used as additional predictors for interpolation. However, from a comparison with the reference methods, a further increase in prediction accuracy with additional distance fields is not expected in hybrid ‘scorpan’ models for non-linear regressions. Nevertheless, it might help to improve the performance of some algorithms, such as artificial neural networks.

For the regression methods tested with EDM, the application of machine learning algorithms depends on the dataset and they can perform differently (cf. Viscarra Rossel & Behrens, 2010). Random forests, Cubist and bagged MARS seem to be good options for EDM. In many cases, SVM and NN should provide good predictions. An outstanding question is what to do about the visible artefacts in the response surfaces of the tree-based approaches.

Most of the methods tested, including EDM, are not suited for interpreting the models entirely in terms of the processes that result in soil formation. Such interpretations are achieved best with machine learning methods (e.g. RF or Cubist), together with the contextual mapping. Conversely, EDM is fast because it requires only a small set of additional predictors.

Conclusions

The generic EDFs that we described and evaluated represent a new option for improving representation of the spatial factor (n) in the ‘scorpan’ model. We showed that the accuracy of predictions made with several commonly used machine-learning approaches generally improved when the new location covariates, or EDFs, were included in the analysis. These additional measures of spatial context and position can, and do, undoubtedly contribute to improvements in spatial prediction of soil properties and spatial data mining, in general. They enable machine-learning models to vary predictions locally to model non-stationary conditions
and to make locally varying predictions that use information that quantifies local spatial autocorrelation.

The results of this paper add to the growing body of evidence which suggests that machine learning models that use covariates describing spatial position or spatial context might eliminate the need for a second, separate step to correct residuals, as done in the kriging step of RK. They do this in a single-stage spatially varying prediction model. The remaining residuals ought not to exhibit any remaining spatial dependence.

Acknowledgements

This research was funded by the German Research Foundation (DFG) under the PedoScale project (BE 4023/3-1). We are very grateful to José A.M. Demattê for providing the Brazilian dataset and to the Federal Geological Survey of Rhineland Palatinate for providing the Rhine-Hesse dataset.

References

Begleiter, R. & El-Yaniv, R. 2008. A Generic Tool for Performance Evaluation of Supervised Learning Algorithms. Technical Report No CS-2008-01. Computer Science Department, Technion, Israel [WWW document]. URL http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2008/CS/CS-2008-01.pdf [accessed on 27 April 2018].
Behrens, T., Förster, H., Scholten, T., Steinrücken, U., Spies, E.-D. & Goldschmitt, M. 2005. Digital soil mapping using artificial neural networks. Journal of Plant Nutrition and Soil Science, 168, 21–33.
Behrens, T., Schmidt, K., Zhu, A.-X. & Scholten, T. 2010. The ConMap approach for terrain based digital soil mapping. European Journal of Soil Science, 61, 133–143.
Behrens, T., Schmidt, K., Ramirez-Lopez, L., Gallant, J., Zhu, A.-X. & Scholten, T. 2014. Hyper-scale digital soil mapping and soil formation analysis. Geoderma, 213, 578–588.
Behrens, T., Schmidt, K., MacMillan, R.A. & Viscarra Rossel, R.A. 2018. Multiscale contextual spatial modelling with the Gaussian scale space. Geoderma, 310, 128–137.
Breiman, L. 2001. Random forests. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J. 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.
Brunsdon, C., Fotheringham, A.S. & Charlton, M.E. 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geographical Analysis, 28, 281–298.
Bui, E., Hendersen, B. & Viergever, K. 2009. Using knowledge discovery with data mining from the Australian soil resource information system database to inform soil carbon mapping in Australia. Global Biogeochemical Cycles, 23, 1–15.
Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., Gerlitz, L. et al. 2017. System for Automated Geoscientific Analyses (SAGA). Version 3.0 [WWW document]. URL http://www.saga-gis.org/en/index.html [accessed on 31 May 2017].
Friedman, J.H. 1991. Multivariate adaptive regression splines. Annals of Statistics, 19, 1–67.
Gollini, I., Lu, B., Charlton, M., Brunsdon, C. & Harris, P. 2015. GWmodel: an R package for exploring spatial heterogeneity using geographically weighted models. Journal of Statistical Software, 63, 1–50.
Grimm, R., Behrens, T., Märker, M. & Elsenbeer, A. 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island–digital soil mapping using random forests analysis. Geoderma, 146, 102–113.
Hengl, T., Heuvelink, G. & Stein, A. 2003. Comparison of Kriging with External Drift and Regression Kriging. International Institute for Geo-information Science and Earth, Enschede [WWW document]. URL https://webapps.itc.utwente.nl/librarywww/papers_2003/misca/hengl_comparison.pdf [accessed on 26 April 2018].
Jenny, H. 1941. Factors of Soil Formation. McGraw-Hill, New York.
Karatzoglou, A., Smola, A., Hornik, K. & Zeileis, A. 2004. kernlab–An S4 package for kernel methods in R. Journal of Statistical Software, 11, 1–20.
de Knegt, H.J., van Langevelde, F., Coughenour, M.B., Skidmore, A.K., de Boer, W.F., Heitkönig, I.M.A. et al. 2010. Spatial autocorrelation and the scaling of species–environment relationships. Ecology, 91, 2455–2465.
Krumbein, W.C. 1959. Trend surface analysis of contour type maps with irregular control-point spacing. Journal of Geophysical Research, 64, 823–834.
Kuhn, M. 2017. Caret: Classification and Regression Training. R Package Version 6.0–76 [WWW document]. URL https://CRAN.R-project.org/package=caret [accessed on 31 May 2017].
Kumar, S., Lal, R. & Liu, D. 2012. A geographically weighted regression kriging approach for mapping soil organic carbon stock. Geoderma, 189–190, 627–634.
Liaw, A. & Wiener, M. 2002. Classification and regression by randomForest. R News, 2, 18–22.
Matérn, B. 1960. Spatial variation. Meddelanden från Statens Skogsforskningsinstitut, 2nd edn (1986), Lecture Notes in Statistics, No 36 edn, Volume 49. Springer, New York.
Matheron, G. 1963. Principles of geostatistics. Economic Geology, 58, 1246–1266.
McBratney, A.B., Mendonca Santos, M.L. & Minasny, B. 2003. On digital soil mapping. Geoderma, 117, 3–52.
Mezzalira, S. 1965. Descrição Geográfica e Geológica das folhas de Piracicaba e São Carlos (SP). In: Boletim do Instituto Geográfico e Geológico No 43. Instituto Geográfico e Geológico, Sao Paulo.
Milborrow, S. 2017. Earth: Multivariate Adaptive Regression Splines. R Package Version 4.5.0 [WWW document]. URL https://CRAN.R-project.org/package=earth [accessed on 31 May 2017].
Moran, C. & Bui, E. 2002. Spatial data mining for enhanced soil map modelling. International Journal of Geographical Information Science, 16, 533–549.
Neuman, S.P. & Jacobson, E.A. 1984. Analysis of nonintrinsic spatial variability by residual kriging with application to regional groundwater level. Mathematical Geology, 16, 499–521.
Odeh, I., McBratney, A. & Chittleborough, D. 1994. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma, 63, 197–214.
Oliver, M.A. & Webster, R. 1990. Kriging: a method of interpolation for geographical information systems. International Journal of Geographical Information Systems, 4, 313–332.
Páez, A. 2004. Anisotropic variance functions in geographically weighted regression models. Geographical Analysis, 36, 299–314.
Pebesma, E.J. 2004. Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30, 683–691.
Quinlan, R. 1992. Learning with continuous classes. In: Proceedings of the 5th Australian Joint Conference on Artificial Intelligence (A. Adams & L. Sterling), 343–348, World Scientific, Singapore.
R Core Team 2017. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna [WWW document]. URL https://www.R-project.org/ [accessed on 31 May 2017].
Rao, C. & Toutenburg, H. 1999. Linear Models: Least Squares and Alternatives. Springer, New York.
Rosenfeld, A. & Pfaltz, J.L. 1968. Distance functions and digital pictures. Pattern Recognition, 1, 33–61.
Schmidt, K., Behrens, T. & Scholten, T. 2008. Instance selection and classification tree analysis for large spatial datasets in digital soil mapping. Geoderma, 146, 138–146.
Schmidt, K., Behrens, T., Daumann, J., Ramirez-Lopez, L., Werban, U., Dietrich, P. et al. 2014. A comparison of calibration sampling schemes at the field scale. Geoderma, 232–234, 243–256.
Sun, W., Minasny, B. & McBratney, A. 2012. Analysis and prediction of soil properties using local regression-kriging. Geoderma, 171–172, 16–23.
Unwin, D. 1975. An Introduction to Trend Surface Analysis (Concepts and Techniques in Modern Geography; No5). Geo Abstracts, Norwich.
Venables, W.N. & Ripley, B.D. 2002. Modern Applied Statistics with S, 4th edn. Springer, New York.
Viscarra Rossel, R.A. & Behrens, T. 2010. Data mining and knowledge discovery techniques to model and interpret soil diffuse reflectance spectra. Geoderma, 158, 46–54.
Viscarra Rossel, R.A., Webster, R., Bui, E.N. & Baldock, J.A. 2014. Baseline map of organic carbon in Australian soil to support national carbon accounting and monitoring under climate change. Global Change Biology, 20, 2953–2970.
Viscarra Rossel, R.A., Chen, C., Grundy, M.J., Searle, R., Clifford, D. & Campbell, H. 2015. The Australian three-dimensional soil grid: Australia’s contribution to the Global Soil Map project. Soil Research, 53, 845–864.
Webster, R. & Oliver, M.A. 2007. Geostatistics for Environmental Scientists. Second Edition. John Wiley & Sons Ltd., Chichester.
Wood, J. 1996. The geomorphological characterization of digital elevation models. Doctoral thesis, University of Leicester, Leicester, UK.
