Spatial Predictive Modeling
with R
Jin Li
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
© 2022 Jin Li
Reasonable efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The authors
and publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any future
reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or
contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
For works that are not available on CCC please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used
only for identification and explanation without intent to infringe.
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors
Contents
Preface
Appendix
References
Index
Preface
and optimized (Li and Heap 2011, 2014). Spatial predictive models are often developed in
terms of predictive accuracy based on model validation methods and are fundamentally
different from other modeling types (Leek and Peng 2015).
TABLE 1: Development of the hybrid methods for SPM (Li, Heap, Potter, and Daniell
2010, 2011b, 2011a; Li, Potter, Huang, Daniell, et al. 2010; Li 2011; Li, Potter, Huang, and
Heap 2012a, 2012b) (modified from Li (2018a)).
Predictive accuracy is the key criterion for predictive modeling, and it is used for parameter,
variable, and model/method selection in this book. The property of spatial predictions is
a further criterion for predictive modeling, where professional knowledge plays its role in
examining whether the predictions are scientifically sound, reasonable, and interpretable.
This book aims to introduce SPM as a discipline to modelers and researchers. It systematically
introduces the entire process of SPM. The process contains the following components:
data acquisition, method and variable selection, model or parameter optimization, accuracy
assessment, and the generation and visualization of spatial predictions. Each of these modeling
components plays an important role in model development. Incorrect or inappropriate
implementation of any component may lead to less accurate or even misleading predictive
model(s). This book provides tools for relevant components to improve the quality of
spatial predictions in various disciplines. It also provides guidelines, suggestions, recommendations,
and reproducible examples in R for developing the most accurate predictive model
by considering these components, relevant requirements, and factors associated with each
component.
This book concentrates more on the applications of predictive methods and less on the mathematical
and statistical details that can be found in previous studies (e.g., Goovaerts 1997;
Webster and Oliver 2001; Venables and Ripley 2002; Wackernagel 2003; Bivand, Pebesma,
and Gomez-Rubio 2013; van Lieshout 2019). Since this book focuses specifically on SPM
with R, readers interested in machine learning and spatio-temporal statistics with R are
referred to other relevant publications (e.g., Kuhn and Johnson 2013; James et al. 2017; Wikle,
Zammit-Mangion, and Cressie 2019).
This book covers the whole modeling process, which is not only important for SPM but also
provides valuable tools for other predictive modeling fields. It is expected to boost the
application of appropriate SPM processes and improve the quality of spatial predictions in
various disciplines. It is also expected to stimulate further research in this field, so that
further novel spatial predictive methods with improved performance can be developed
and applied.
R versions
Two versions of R were used to run R code in this book.
R version 3.6.3 (2020-02-29) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under:
Windows 10 x64 (build 19041)
R version 4.0.3 (2020-10-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under:
Windows 10 x64 (build 19041)
R packages
An R package, spm (Li 2019b), initially released on CRAN in 2017, is used as one of the core
packages in this book. Three other core R packages, spm2 (Li 2021a), steprf (Li 2021c), and
stepgbm (Li 2021b), are developed for and released with this book, to cover the methods and
functions that are not currently included in the spm package.
1. The spm package
The spm package introduces some novel, accurate hybrid methods of geostatistical and machine
learning methods for SPM. It contains two commonly used geostatistical methods (i.e.,
OK and IDW), two machine learning methods (RF and GBM), four hybrid methods (i.e.,
RFIDW, RFOK, GBMIDW, and GBMOK), and two averaging methods (the average of RFOK
and RFIDW (RFOKRFIDW), and the average of GBMOK and GBMIDW (GBMOKGBMIDW)). For each
method, two functions are provided, with one function for assessing the predictive errors and accuracy
of the method based on cross-validation and the other for generating spatial predictions.
2. The spm2 package
The spm2 package is an extended version of spm, by further introducing some novel functions
for statistical methods (i.e., GLM, glmnet, GLS), thin plate splines, SVM, kriging methods
(i.e., simple kriging, universal kriging, block kriging, kriging with an external drift), and
228 hybrid methods plus numerous variants for SPM. For each method, two functions are
provided, with one function for assessing the predictive errors and accuracy of the method
based on cross-validation and the other for generating spatial predictions if needed. It also
contains a couple of functions for data preparation and predictive accuracy assessment.
3. The steprf package
The steprf package introduces several novel variable selection methods for RF. They are
based on averaged variable importance (AVI), and knowledge informed AVI (KIAVI and
KIAVI2) methods.
4. The stepgbm package
The stepgbm package introduces a couple of novel variable selection methods for GBM. They
are based on relative variable influence (RVI) and knowledge informed RVI (KIRVI and
KIRVI2) methods.
Data sets
All data sets used in this book are available either in the spm, spm2, and sp (Pebesma and
Bivand 2020) packages or online as detailed in Appendix A.
References for R
Learning materials are available online for beginners, such as
(1) “An Introduction to R”;
(2) “R Language Definition”;
(3) “R Installation and Administration”;
(4) “R Data Import/Export”; and
(5) “Writing R Extensions”.
They can also be accessed from Manuals (in PDF) tab under Help tab in RGui.
Caveats
Although in this book I attempt to cover relevant components, which contribute to the
improvement of predictive accuracy, as comprehensively as possible, the SPM field is too
broad to allow that to be done completely in this book. This is because different disciplines
have their own specific features and requirements. Therefore, further work is needed to
identify factors in relevant components or additional components that can further improve
predictive accuracy in each discipline.
Furthermore, the data sets and software packages are provided in good faith, but neither the
author nor the publishers and distributors warrant their accuracy or are responsible for the
consequences of their use.
Contact details
The author may be contacted by email at
[email protected]
and would be grateful to be informed of errors in, and improvements to, the contents of this
book. Errata and updates are available at https://github.com/jinli22.
Acknowledgment
This book would not be possible without:
(1) R, a free software environment for statistical computing and graphics (R Core Team
2020);
(2) RStudio, an integrated development environment (IDE) for R (RStudio Team 2020);
and
(3) R bookdown, a powerful tool for combining analysis and reporting into the same docu-
ment (Xie 2016).
I would like to acknowledge the contribution of many people to the conception and completion
of this book. Over the years, many people have greatly influenced my career, but
special recognition must be given to Bob Murison, University of New England, who, with
enthusiasm and patience, helped me in modern statistics and statistical computing in the 1990s.
Tony Arthur and Steve Henry kindly helped to get the bee data set released from CSIRO for
this book. I am also grateful to Yanchang Zhao for helpful discussions and suggestions in the
early stage of the preparation of the book. I am indebted to Xiufu Zhang for proofreading
and critical comments on the preliminary draft. I would like to thank three reviewers for
valuable comments on the proposal of this book. I am very grateful to Rob Calver
at Chapman & Hall/CRC for seeking a book idea back in 2015 and to his team for continuous
support, patience, and help at each phase of the preparation of this book. Finally, I would
like to thank my family and my parents for their love and support.
Jin Li
July , 2021
Author Bio
1
Data acquisition, data quality control, and spatial
reference systems
This chapter introduces relevant sampling designs for spatial data, factors to be considered
for data quality control (QC), and spatial data types and spatial reference systems to be
used for spatial predictive modeling.
library(raster)
# import the Petrel bathymetry raster as a data frame
pb.df <- as.data.frame(raster("./data/bathy_1km_Petrel.tif"), xy = TRUE, na.rm = TRUE)
names(pb.df) <- c("long", "lat", "bathy")
The data sets, pps.df and pb.df, are in dataframe format. They need to be converted to
spatial objects as below according to Bivand, Pebesma, and Gomez-Rubio (2013) so that
they can be visualized.
DOI: 10.1201/9781003091776-1
library(sp)
pb <- pb.df
gridded(pb) <- ~long + lat # convert to a SpatialPixelsDataFrame
Spatial distribution of sediment samples in the Petrel area is illustrated in Figure 1.1 by
par(font.axis = 2, font.lab = 2)
lab.palette <- colorRampPalette(c("dark blue", "blue", "light blue"), space = "Lab")
image(pb, axes = TRUE, xlab = "Longitude", ylab = "Latitude", col = lab.palette(255))
plot(pps, add = TRUE, pch = 1, col = "red", cex = 0.5)
It is apparent that the samples are non-randomly distributed and even clustered.
FIGURE 1.1: Samples selected (red circle) by ad-hoc sampling designs; and the back-
ground shows the spatial patterns of bathymetry in the Petrel area.
For sampling point locations in an area using spsample, the following arguments need to be
specified:
(1) x, a spatial object;
(2) n, sample size; and
(3) type, a character to specify a sample method (e.g., “random”, “regular”).
We apply spsample to the Petrel area and produce 100 samples as below.
systsps <- spsample(pb, n = 100, type = "regular")
The samples selected, systsps, are illustrated in Figure 1.2. As expected for systematic
sampling design, the samples are evenly distributed over space.
FIGURE 1.2: Samples selected (red circle) by systematic sampling design; and the back-
ground shows the spatial patterns of bathymetry in the Petrel area.
For spatial predictive modeling, non-random sampling methods are not recommended. However,
non-random sampling designs were compared previously (Diggle and Ribeiro Jr. 2010),
which provides some useful clues for lattice plus close pairs sampling for spatial predictive
modeling.
We apply spsample to pb and produce 100 samples using unstratified equal probability design
as an example below. To make the results reproducible, we need to use function set.seed
that takes an (arbitrary) integer argument.
set.seed(1234)
unstran <- spsample(pb, n = 100, type = "random")
FIGURE 1.3: Samples selected (red circle) by unstratified random sampling design, and
the background shows the spatial patterns of bathymetry in the Petrel area.
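The systematic and unstratified random designs above can be tried without the Petrel raster. The following is a minimal sketch, assuming only that the sp package is installed; the 20 x 20 grid toy and its bathy values are made up for illustration and are not the book's data. According to the sp documentation, n is an approximate sample size, so the number of points returned may differ slightly from 100.

```r
library(sp)

# Hypothetical 20 x 20 grid standing in for the Petrel bathymetry grid
toy <- expand.grid(long = 0:19, lat = 0:19)
toy$bathy <- -50 - toy$long            # made-up depth values
gridded(toy) <- ~long + lat            # SpatialPixelsDataFrame, as for pb

# Systematic (regular) design
set.seed(1234)
toy.regular <- spsample(toy, n = 100, type = "regular")

# Unstratified random design; resetting the seed reproduces the same sample
set.seed(1234)
toy.random1 <- spsample(toy, n = 100, type = "random")
set.seed(1234)
toy.random2 <- spsample(toy, n = 100, type = "random")
identical(coordinates(toy.random1), coordinates(toy.random2))

# Quick visual comparison of the two designs
image(toy, axes = TRUE, xlab = "long", ylab = "lat")
plot(toy.regular, add = TRUE, pch = 1, col = "red")
plot(toy.random1, add = TRUE, pch = 3, col = "black")
```

Without set.seed, each call to spsample would return a different set of locations, which is why the seed is fixed before every sampling call.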
Unstratified random sampling is not recommended for collecting data for spatial predictive
modeling, so no example will be provided for the unstratified unequal probability design.
This is because (1) spatial information is available for sampling design, which would lead
to spatially stratified sampling as discussed below in Section 1.1.3, and (2) unstratified
random sampling design may even be outperformed by lattice sampling, a kind of non-random
design (Diggle and Ribeiro Jr. 2010).
ing a survey for a region. Some recently developed randomized spatial sampling procedures
were reviewed and compared using simple random sampling without replacement as a benchmark,
and guidance has also been provided for choosing an appropriate
spatial sampling method (Benedetti, Piersimoni, and Postiglione 2017). Furthermore, some
applications of stratified random sampling with spatial information (i.e., spatially stratified
random sampling) and some R packages for stratified random sampling have been reviewed
for spatial predictive modeling (Li 2019a).
Spatially stratified random sampling design can be generated in several ways in R (Li 2019a)
using additional information. Three examples using different functions are provided below.
1. Function spsample
FIGURE 1.4: Samples selected (red circle) by spatially stratified random sampling design
using the function spsample in the sp package, and the background shows the spatial patterns
of bathymetry in the Petrel area.
The second example applies the functions stratify and spsample in the spcosa package
(Walvoort, Brus, and de Gruijter 2020) to pb.
The function stratify partitions a spatial object into compact strata by means of k-means
(Walvoort, Brus, and de Gruijter 2020). The descriptions of relevant arguments of
stratify are detailed in its help file, which can be accessed by ?stratify.
For sampling point locations in an area using stratify, the following arguments need to be
specified:
(1) x, a spatial object;
(2) nStrata, number of strata; and
(3) equalArea, if FALSE the algorithm results in compact strata, and if TRUE the algorithm
results in compact strata of equal size.
We apply stratify to pb and produce 20 strata as below.
library(spcosa)
The arguments of spsample in spcosa are the same as the arguments of spsample in sp. We
apply spsample to strata20 and select five samples from each stratum as below.
# select 100 samples (i.e., 5 samples from each stratum)
set.seed(1234)
sp20 <- spsample(strata20, n = 5)
sp20.100 <- as(sp20, 'data.frame')
3. Function clhs
The last example uses the function clhs in the clhs package (Roudier 2011, 2020). The
function clhs is to implement the conditioned Latin hypercube sampling (Roudier 2011).
The descriptions of relevant arguments of clhs are detailed in its help file, which can be
accessed by ?clhs.
For sampling point locations in an area using clhs, the following arguments need to be
specified:
(1) x, a data.frame or a spatial object; and
(2) size, sample size.
We apply clhs to pb.df, a data.frame. This sampling design uses both location information
and bathymetry data to spatially stratify the sampling area and randomly select 100 samples
as below.
library(clhs)
set.seed(1234)
sample100 <- clhs(pb.df, size = 100)
sample100.sp <- pb.df[sample100, ]
sp100 <- sample100.sp[, -3]
FIGURE 1.5: Samples selected (gray dot) by spatially stratified random sampling using
the functions stratify and spsample in the spcosa package, where the region to be sampled
is divided into 20 equal areas (green).
FIGURE 1.6: Samples selected (red circle) by spatially stratified random sampling design
using the function clhs; and the background shows the spatial patterns of bathymetry in
the Petrel area.
set.seed(1234)
# generate the design according to the altered inclusion probabilities
samp <- quasiSamp(n = n, dimension = 2, study.area = NULL, potential.sites = X,
  inclusion.probs = altInclProbs)
Since this sampling method uses the spatial extent of the survey area (i.e., potential sites)
instead of its actual spatial domain, some samples may be located outside the domain. If this
occurs, the sample size (i.e., the number of samples to be selected) may need to be increased
to ensure sufficient samples fall within the domain.
We can visualize the adjusted inclusion probabilities (white to blue areas show the inclusion
probability changing from low to high), legacy sites (red circle), and sample locations
as in Figure 1.7 by
X1 <- X
X1$prob <- altInclProbs
gridded(X1) <- ~long + lat
# adjusted inclusion probabilities
image(X1, axes = TRUE, xlab = "Longitude", ylab = "Latitude", col = hcl.colors(10, "blues", rev = TRUE))
FIGURE 1.7: Samples selected (red diamond) by spatially stratified random sampling
with prior information (i.e., legacy sites, red circle), with the adjusted inclusion probabilities
increasing from the legacy sites (white to blue areas show the inclusion probability changing
from low to high).
still affected by many factors, such as data credibility, data accuracy, and completeness (Li,
Potter, Huang, Daniell, et al. 2010; Li, Potter, Huang, and Heap 2012b). Data QC process
is still required for the sediment samples.
We will use sediment samples in the Petrel area as an example, and the samples will be
extracted from MARS database as below.
sed1 <- read.csv("./data/MARS_Grain_size_as_mud_sand_gravel_mean_texture_20200524_092819.csv")
# names (sed1)
dim(sed1)
[1] 385 20
The data structure and first three rows of sediment samples are
str(sed1, digits.d = 6, width = 65, strict.width = "cut")
head(sed1, 3)
The samples in sed1 can be quality controlled based on the data QC criteria developed
previously (Li, Potter, Huang, Daniell, et al. 2010; Li, Potter, Huang, and Heap 2012b). In
total, six data QC criteria are considered below to demonstrate how to conduct data QC.
library(spm2)
sed1$lat.digit <- decimaldigit(sed1$lat)
sed1$long.digit <- decimaldigit(sed1$long)
Then we remove samples with two or fewer digits after the decimal point in both longitude
and latitude; that is, a sample is kept if either coordinate has three or more decimal digits.
sed1 <- subset(sed1, sed1$lat.digit >= 3 | sed1$long.digit >= 3)
sed1 <- sed1[, -c(11:12)]
dim(sed1)
[1] 374 10
This shows that sample size has been reduced from 385 to 374, that is, 11 samples have
been removed.
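The coordinate-precision check above relies on decimaldigit from spm2. To make the idea concrete without that package, here is a minimal base R sketch; count_decimals is a hypothetical helper written for this illustration (it is not the spm2 implementation), and the toy coordinates are made up. It assumes coordinates print in fixed notation, which holds for ordinary longitude and latitude values.

```r
# Hypothetical helper: number of digits after the decimal point
count_decimals <- function(x) {
  s <- as.character(x)                          # e.g., 129.5 becomes "129.5"
  has.point <- grepl(".", s, fixed = TRUE)
  ifelse(has.point, nchar(sub("^[^.]*\\.", "", s)), 0L)
}

# Made-up coordinates with different recording precision
toy <- data.frame(long = c(129.5, 129.123, 129.25, 129.1),
                  lat = c(-12.333, -12.5, -12.75, -12.4567))
toy$long.digit <- count_decimals(toy$long)
toy$lat.digit <- count_decimals(toy$lat)

# Same rule as applied to sed1: keep a sample if either coordinate
# has three or more decimal digits
kept <- subset(toy, lat.digit >= 3 | long.digit >= 3)
nrow(toy) - nrow(kept)   # number of samples removed
```

In this toy data only the third sample is dropped, because both of its coordinates are recorded to two decimal digits.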
  Sample.type          Frequency
1 CORE BOX                     5
2 CORE GRAVITY               130
3 DREDGE PIPE                 45
4 DREDGE UNSPECIFIED          92
5 GRAB SHIPEK                 31
6 GRAB SMITH MCINTYRE         68
7 GRAB UNSPECIFIED             3
Among these seven sampling methods, dredge methods (i.e., dredge pipe and dredge unspecified)
are assumed to be unreliable, and samples collected using these methods are less
accurate. So, we need to remove these dredged samples.
sed1 <- subset(sed1, sed1$sample.type != "DREDGE PIPE" & sed1$sample.type != "DREDGE UNSPECIFIED")
dim(sed1)
[1] 237 10
It is clear that the sample size has been reduced from 374 to 237, that is, in total, 137
samples were collected using dredge methods and have been removed.
FALSE TRUE
230 7
It shows that there are seven locations with duplicated samples. Since the data set is sorted
by the mud variable in descending order, the samples with less mud content at each location
are the ones labeled as duplicated. These duplicated samples with less mud content are
removed below.
sed1$best_loc <- duplicated(sed1$location.id)
sed1 <- subset(sed1, best_loc == FALSE)
sed1 <- sed1[, -c(11:12)]
dim(sed1)
[1] 230 10
This shows that sample size has been reduced from 237 to 230, that is, seven duplicated
samples have been removed.
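The reason sorting matters here is that duplicated flags every occurrence of a value after the first one. A minimal base R sketch with made-up data (the toy data frame and its values are illustrative only) shows how sorting by mud in descending order makes the first, and therefore retained, record at each location the one with the highest mud content.

```r
# Made-up samples: locations A and B each have two records
toy <- data.frame(location.id = c("A", "B", "A", "C", "B"),
                  mud = c(10, 55, 40, 5, 20))

# Sort by mud in descending order so the highest-mud record comes first
toy <- toy[order(toy$mud, decreasing = TRUE), ]

# duplicated() marks the second and later records of each location
toy$best_loc <- duplicated(toy$location.id)
table(toy$best_loc)

# Keep only the first (highest-mud) record per location
best <- subset(toy, best_loc == FALSE)
best
```

If the data were sorted in ascending order instead, the same two calls would keep the records with the lowest mud content.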
[1] 125 10
This shows that sample size has been reduced from 230 to 125, that is, 105 samples with a
base depth more than 5 cm have been removed.
sed1 <- subset(sed1 , mud >= 0 & sand >= 0 & gravel >= 0)
dim(sed1)
[1] 111 10
It shows that sample size has been reduced from 125 to 111, that is, 14 samples contain
NA values and have been removed.
In fact, on further inspection, the sums of sediment for these 14 samples are all 100% if
the NAs are replaced with 0s. This suggests that 0s were omitted for these samples in the
MARS database, resulting in these missing values. If one wishes, it is reasonable to
include these 14 samples by simply replacing the NAs with 0s.
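The alternative treatment described above (replacing NAs with 0s rather than dropping the samples) can be sketched in base R as follows. The toy data frame is made up for illustration; the check that rows sum to exactly 100 is an assumption of this sketch and would need a tolerance for real data.

```r
# Made-up sediment fractions (%); NA where a zero was apparently not recorded
toy <- data.frame(mud = c(60, NA, 30, 50),
                  sand = c(40, 100, NA, 30),
                  gravel = c(NA, NA, 70, 20))

# The filter used for sed1 drops every row containing an NA
strict <- subset(toy, mud >= 0 & sand >= 0 & gravel >= 0)
nrow(strict)

# Alternative: treat NAs as 0s, then confirm the fractions still sum to 100%
filled <- toy
filled[is.na(filled)] <- 0
rowSums(filled)
recovered <- filled[rowSums(filled) == 100, ]
nrow(recovered)
```

Checking the row sums after the replacement guards against treating a genuinely missing measurement as a zero.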
[1] 99 102
99 100 102
1 109 1
This shows that: (1) the sum of the three sediment types is 100% for 109 samples, and (2) for
the remaining two samples the sum is not 100%, but close to 100%. The differences could
result from rounding errors in their recordings, so no further samples are removed.
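A check of this kind can be sketched in base R as below. The toy values and the tolerance of 2 percentage points are assumptions made for this illustration; the book does not state a numerical tolerance, and an appropriate value depends on how the fractions were recorded.

```r
# Made-up sediment fractions (%) whose totals are 100, 99, and 102
toy <- data.frame(mud = c(60, 29, 35),
                  sand = c(30, 50, 45),
                  gravel = c(10, 20, 22))

total <- rowSums(toy)
range(total)    # smallest and largest totals
table(total)    # how many samples have each total

# Flag samples whose total deviates from 100% by more than the tolerance
tol <- 2        # assumed tolerance, in percentage points
suspect <- toy[abs(total - 100) > tol, ]
nrow(suspect)
```

With this tolerance no toy sample is flagged, which mirrors the decision above to keep the two samples whose totals are close to 100%.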
2. Data limits
For marine samples, bathymetry data are stored as negative, while for terrestrial samples
elevation data usually are positive. Samples with positive bathymetry are assumed to be
incorrect and need to be removed.
sed1b <- subset(sed1 , bathy <= 0)
dim(sed1b)
[1] 0 10
This suggests that no sample is selected. This is because bathymetry data are actually missing
for all these remaining samples, which can be shown from summary(sed1) or simply from
sed1$bathy. Since we can derive bathy data for these samples from bathy data for the Australian
EEZ as shown in Appendix A, the missing values do not prevent these samples from being selected.
sed1.1 <- sed1[, -c(1:4)]
names(sed1.1); dim(sed1.1)
[1] 111 6
In total, there are 111 samples remaining after the data QC process. In comparison with
the pps.df data set that was generated back in 2012 (Li, Potter, Huang, Daniell, et al. 2010;
Li, Potter, Huang, and Heap 2012b), there are three fewer samples in this data set. This
could be due to samples having been removed from the MARS database by its data QC process
since 2012.
This example intends to provide some clues on how to QC a data set at hand. Sometimes,
noise in the data may result from repeated measures, and certain rules may need to be
developed to clean such samples based on professional knowledge (e.g., Li et al. 2009).
Furthermore, exploratory analysis can be used to further detect abnormal samples as detailed
in the next chapter.
data(petrel)
class(petrel)
[1] "data.frame"
head(petrel, 3)
petrel[1:2, ]
data(sponge)
class(sponge)
[1] "data.frame"
head(sponge, 3)
The samples are point data and stored together with their associated location information.
The location information in the petrel data set is longitude and latitude, while the location
information in the sponge data set is easting and northing.
2. Grid data of predictive variables with location information
We use two grid data sets, petrel.grid and sponge.grid, in the spm package as examples below.
data(petrel.grid)
class(petrel.grid)
[1] "data.frame"
petrel.grid[1:2, ]
data(sponge.grid)
class(sponge.grid)
[1] "data.frame"
sponge.grid[1:2, ]
The examples show that location information in the petrel.grid data set is longitude and
latitude, while the location information in the sponge.grid data set is easting and northing.
The spatial reference system for the location information can be changed when required as
demonstrated below.
The spatial information, easting and northing, in the sponge data set is stored in utm zone
52 south. Since the sponge data set is in dataframe format, it needs to be converted to
SpatialPoints format with its associated coordinate reference system prior to reprojecting as
below (Bivand, Keitt, and Rowlingson 2019; Pebesma and Bivand 2020; Hijmans 2020).
library(sp)
library(raster)
library(rgdal)
Given that the data format needs to be changed from dataframe to SpatialPoints format as
below, it is a good idea to reassign the dataframe sponge to a different object spng that will
be reformatted. Thus the format of sponge will remain unchanged for future use.
spng[1:2, ]
crs(spng)
CRS arguments:
+proj=utm +zone=52 +south +ellps=WGS84 +units=m +no_defs
Now the spng data set can be reprojected from easting and northing in utm zone 52 south
to longitude and latitude in WGS84 using spTransform as demonstrated below.
spng.wgs84 <- spTransform(spng, CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
class(spng.wgs84)
crs(spng.wgs84)
# windows()
# spplot(spng.wgs84, "sponge")
write.csv(spng.wgs84, "./data/spongelonglat.csv", row.names = FALSE)
Function st_transform
Alternatively, the spng data set can be reprojected from easting and northing in utm zone 52
south to longitude and latitude in WGS84 using st_transform as shown below.
library(sf)
spng2 <- st_as_sf(sponge, coords = c("easting", "northing"))
st_crs(spng2) <- 32752
# crs(spng2)
spng.wgs84.2 <- st_transform(spng2, st_crs(4326))
class(spng.wgs84.2)
crs(spng.wgs84.2)
For spatial predictive modeling, coordinate information may be used as predictive variable(s).
Thus spng.wgs84.2 needs to be converted into a dataframe format as follows.
spng.wgs84.3 <- cbind((st_coordinates(spng.wgs84.2)), st_set_geometry(spng.wgs84.2, NULL))
class(spng.wgs84.3)
[1] "data.frame"
spng.wgs84.3[1:2, ]
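The same sequence of steps can be tried on a small made-up data set when the sponge data are not at hand. The sketch below assumes the sf package is installed; the two points and the sponge values in toy are invented for illustration. It sets the coordinate reference system when the object is created, reprojects from utm zone 52 south (EPSG:32752) to WGS84 (EPSG:4326), and returns a plain data frame.

```r
library(sf)

# Two made-up points in utm zone 52 south; not the sponge data
toy <- data.frame(easting = c(500000, 510000),
                  northing = c(8700000, 8710000),
                  sponge = c(3, 7))

# Create an sf object and declare its coordinate reference system
toy.utm <- st_as_sf(toy, coords = c("easting", "northing"), crs = 32752)

# Reproject to longitude and latitude in WGS84
toy.wgs84 <- st_transform(toy.utm, 4326)

# Back to a plain data frame with the coordinates as ordinary columns
toy.df <- cbind(st_coordinates(toy.wgs84), st_drop_geometry(toy.wgs84))
names(toy.df)[1:2] <- c("long", "lat")
class(toy.df)
toy.df
```

An easting of 500000 m lies on the central meridian of the zone, so the first point should come back with a longitude of 129 degrees, which gives a quick sanity check on the reprojection.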
pg[1:2, ]
Now the pg data set can be reprojected from longitude and latitude in WGS84 to easting
and northing in utm zone 52 south as below.
pg.utm52s <- spTransform(pg, CRS("+proj=utm +zone=52 +south +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"))
class(pg.utm52s); crs(pg.utm52s)
CRS arguments:
+proj=utm +zone=52 +south +ellps=WGS84 +units=m +no_defs
Similar to the reprojection of point data spng above, st_transform can be used to reproject
the grid data set from WGS84 to utm zone 52 south.
3. Selection of spatial reference systems
The spatial reference system used to project spatial information is often assumed to have certain
effects on the performance of predictive models; thus, in practice, various spatial reference
systems have been developed to minimize such effects (Jiang and Li 2014). For spatial
predictive modeling, a spatial reference system that can minimize distortion in distance over
space is ideal. When a study area is relatively small and located within one utm zone, spatial
data are often projected using the utm zone or an appropriate projection system. When the
study area spans two or more utm zones, the existing geographic coordinate
system (i.e., WGS84) could be used. This is because the effects of spatial reference systems
on the predictive accuracy of spatial predictive methods (i.e., IDW and OK) could be
negligible for areas at various latitudinal locations (up to 70 decimal degrees) and spatial
scales (Jiang and Li 2013, 2014; Turner, Li, and Jiang 2017). Hence, spatial data (most likely
in WGS84) can be used for spatial predictive modeling without reprojection for areas with
latitude less than 70 decimal degrees.
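To give a feel for what distortion in distance means when longitude and latitude are treated as planar coordinates, the following base R sketch computes the ground length of one degree of longitude at several latitudes. It assumes a spherical Earth with a mean radius of 6371 km, and the function names are made up for this illustration. It only illustrates the geometry and does not reproduce the accuracy comparisons in the studies cited above.

```r
# Ground length (km) of one degree of longitude on a sphere
earth.radius.km <- 6371                        # assumed mean Earth radius
deg2rad <- function(deg) deg * pi / 180
lon.degree.km <- function(lat) earth.radius.km * cos(deg2rad(lat)) * deg2rad(1)

lat <- c(0, 35, 70)
data.frame(lat = lat,
           km.per.degree.long = lon.degree.km(lat),
           ratio.to.equator = lon.degree.km(lat) / lon.degree.km(0))
```

On this sphere one degree of latitude stays at about 111 km everywhere, while one degree of longitude shrinks in proportion to the cosine of the latitude, so distances computed directly from decimal degrees become increasingly anisotropic toward the poles.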
Furthermore, since predictive accuracy is the key for predictive modeling, an optimal
spatial reference system should be selected to maximize predictive accuracy. The spatial
reference system that can minimize the distortion in distance should be identified and used.
An optimal spatial reference system can also be selected based on its
effect on predictive accuracy for the relevant predictive method. This can be achieved with
the cross-validation functions that are introduced later for relevant predictive methods in
this book.
2
Predictive variables and exploratory analysis
This chapter introduces data preparation of predictive variables and exploratory analysis for
relevant predictive methods, including (1) principles for pre-selection of predictive variables
and limitations, (2) predictive variables, and (3) role and limitations of exploratory analysis
in variable pre-selection.
DOI: 10.1201/9781003091776-2
(i.e., surrogate variables) are used instead for spatial predictive modeling. Proxy variables
are usually variables directly caused by response variable and/or correlated variables. They
can be identified based on expert or professional knowledge (e.g., McArthur et al. 2010).
Certainly, predictive models can use causal variables, proxy variables, or both if causal
variables are not all available.
2.1.4 Limitations
For spatial predictive modeling, how to select potential predictive variables can be limited
or constrained by certain factors. For example, (1) they need to be continuously available
for the entire area to be predicted; (2) spatial resolution of various predictive variables
needs to meet desired resolution for the final predictions, although they can be re-scaled
or aggregated. Sometimes, even though we know certain possible predictive variables, they
may not meet these requirements and cannot be used for spatial predictive modeling. This
is particularly true in mountainous and deep sea areas for spatial predictive modeling in
the environmental sciences.
2. Climatic variables
Climatic variables may include temperature, precipitation, wind speed, and various derived
variables (e.g., humidity, seasonality).
3. Topographical variables
Topographical variables may include elevation, slope, aspect, distance-to-sea, and relevant
derived variables such as topographic position index.
4. Optical remote sensing data
Optical remote sensing data may include various reflectance bands and relevant derived
variables (e.g., NDVI, EVI).
5. Vegetation information
Vegetation information may include variables like vegetation types, abundance, and coverage.
6. Substrate data
Substrate data may include soil type, soil nutrients, organic matter, and soil moisture.
7. Disturbance information
Disturbance information may include grazing, fertilization, burning, and so on.
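As an illustration of a derived variable from the list above, NDVI can be computed from near-infrared and red reflectance with the standard formula (NIR - Red) / (NIR + Red). The sketch below uses made-up reflectance values, and the function name `ndvi` is an assumption introduced for this example.

```r
# Hypothetical reflectance values for three grid cells (made up for illustration).
red <- c(0.10, 0.20, 0.30)
nir <- c(0.50, 0.20, 0.10)

# NDVI = (NIR - Red) / (NIR + Red); values range from -1 to 1.
ndvi <- function(nir, red) {
  (nir - red) / (nir + red)
}

ndvi_values <- ndvi(nir, red)
```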
This is based on the hbee1 data set from a previous study (Arthur et al. 2010, 2020).
Any outliers identified could then be excluded from further modeling and analysis.
Although identifying outliers is important for predictive modeling, the outliers identified
may be caused by certain conditions (e.g., an optimal environmental condition) (Arthur
et al. 2010, 2020). They could therefore be false outliers, as shown in Figure 2.2 (Li 2008;
Arthur et al. 2010) for model glmmpql1, a glmmPQL model fitted as follows.
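library(MASS)
# Random intercepts for paddock and for plot nested within paddock;
# quasi family with a log link and variance function mu^2
glmmpql1 <- glmmPQL(hbee ~ inf + c500 + I(inf^2) + I(links300^2) + inf:I(links300^2),
  random = ~ 1 | paddock/plot, data = hbee1,
  family = quasi(var = "mu^2", link = "log"), maxit = 1000)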
It is apparent that one outlier identified in Figure 2.1 (i.e., the observation with the
maximum bee count) could no longer be classified as an outlier in Figure 2.2.
FIGURE 2.2: Observed honey bee count data vs. fitted values of an optimal model:
showing a false outlier (i.e., the sample with a count value > 60).
Outliers may also change with the predictive model developed; that is, a false outlier could
also be produced by a sub-optimal model, e.g., model glmmpql2 below.
glmmpql2 <- glmmPQL(hbee ~ inf + c500 + w2000 + w300 + I(inf^2) + I(c500^2) +
    I(w2000^2) + I(w300^2) + inf:w2000 + inf:w300 + c500:w2000 + c500:w300 +
    I(inf^2):w2000 + I(inf^2):w300,
  data = hbee1, random = ~ 1 | paddock/plot,
  family = quasi(link = "log", var = "mu^2"), maxit = 1000)
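The results are shown in Figure 2.3 (Li 2008; Arthur et al. 2010), which is produced by:
par(font.axis = 2, font.lab = 2, las = 1)
# Observed counts are taken from hbee1, the data set used to fit the model
plot(hbee1$hbee, predict(glmmpql2, type = "response"), xlab = "Observed values",
  ylab = "Fitted values")
lines(hbee1$hbee, hbee1$hbee)  # 1:1 reference line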
FIGURE 2.3: Observed honey bee count data vs. fitted values of a sub-optimal model:
showing an outlier (i.e., the observation with a count value > 60).
Under this sub-optimal model, the observation with the maximum bee count becomes an outlier.
Therefore, caution should be taken in dealing with outliers.
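The dependence of outlier status on the fitted model can be sketched with a deliberately simple stand-in. The example below is an assumption-laden illustration: it uses made-up data and ordinary `lm` fits rather than the hbee1 data and glmmPQL models discussed here. The largest observation has a large residual under a mis-specified linear model and essentially none under a model with the correct quadratic term.

```r
# Made-up data with an exactly quadratic relationship (not the hbee1 data).
x <- 1:10
y <- x^2

fit_linear    <- lm(y ~ x)            # mis-specified (sub-optimal) model
fit_quadratic <- lm(y ~ x + I(x^2))   # model with the correct functional form

# Residual of the observation with the maximum response under each model.
i_max <- which.max(y)
res_linear    <- unname(residuals(fit_linear)[i_max])
res_quadratic <- unname(residuals(fit_quadratic)[i_max])
```

Here the same observation looks extreme under the linear model only, which is why a candidate outlier is worth re-checking against any improved model before it is excluded.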
2. Homogeneity of variance
The variance of the response variable (or dependent variable) can be either homogeneous
(homoscedastic) or heterogeneous (heteroscedastic). For spatial predictive modeling using
regression methods (e.g., GLM), we need to consider the variance of the response variable in
relation to its mean, as shown in Figure 2.4, which is based on the hbee1 data set and uses:
mu <- with(hbee1, tapply(hbee, list(paddock, obs), mean))
vars <- with(hbee1, tapply(hbee, list(paddock, obs), var))
Apparently, the variance changes with the sample mean and is heterogeneous. This can be
further examined based on the residuals; for details, see McCullagh and Nelder (1999) and
Crawley (2007).
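One way to summarize a mean-variance relationship like this is to regress the log of the group variances on the log of the group means; a slope near 2 would be consistent with a variance function proportional to mu^2. The sketch below is only an illustration with made-up values constructed so that the variance is exactly 0.25 times the squared mean. The names `group`, `centre` and `y` are assumptions for this example and are not part of the hbee1 data set.

```r
# Made-up data: four groups whose values scale with the group mean,
# so that var = 0.25 * mean^2 by construction.
group  <- rep(c("a", "b", "c", "d"), each = 3)
centre <- rep(c(1, 2, 4, 8), each = 3)
y      <- centre * rep(c(0.5, 1, 1.5), times = 4)

# Group means and variances, as in the tapply calls above.
mu   <- as.numeric(tapply(y, group, mean))
vars <- as.numeric(tapply(y, group, var))

# Slope of log(variance) on log(mean) estimates the power in var ~ mu^power.
fit   <- lm(log(vars) ~ log(mu))
slope <- unname(coef(fit)[2])
```

With real data the points will scatter around the fitted line, so the slope should be read as a rough guide to a suitable variance function rather than as an exact value.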