Geostatistical Analysis Tutorial
Table of Contents
Introduction to the ArcGIS Geostatistical Analyst Tutorial
Exercise 1: Creating a surface using default parameters
Tutorial scenario
The U.S. Environmental Protection Agency is responsible for monitoring atmospheric ozone concentration in
California. Ozone concentration is measured at monitoring stations throughout the state. The locations of
the stations are shown here. The concentration levels of ozone are known for all the stations, but the ozone
values for other (unmonitored) locations in California are also of interest. However, due to cost and
practicality, monitoring stations cannot be placed everywhere. Geostatistical Analyst provides tools that
make optimal predictions possible by examining the relationships between all the sample points and
producing a continuous surface of ozone concentration, standard errors (uncertainty) of predictions, and
probabilities that critical values are exceeded.
The data required to complete the exercises is included on the tutorial data DVD. When you run the wizard
to install this data, check to install the Geostatistical Analyst data (the default installation path is
C:\ArcGIS\ArcTutor\Geostatistical Analyst). The following datasets (which are contained in a geodatabase
called ca_ozone.gdb) will be used in the tutorial:
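Dataset         Description
ca_outline      Outline of California
O3_Sep06_3pm    Ozone concentration (ppm) measured at monitoring stations on September 6, 2007, 3:00 to 4:00 p.m.
ca_cities       Locations of major cities in California
ca_hillshade    Hillshade of California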
The ozone dataset (O3_Sep06_3pm) was provided courtesy of the California Air Resources Board and
represents the concentration of ozone measured on September 6, 2007 between 3:00 and 4:00 p.m. in parts
per million (ppm). The original data has been modified for the purpose of the tutorial and should not be
considered accurate data.
From the ozone point samples (measurements), you will produce two continuous surfaces (maps) predicting
the values of ozone concentration for every location in the state of California. The first map that you create
will simply use all the default options to introduce you to the process of creating a surface from your sample
points. The second map that you produce will allow you to incorporate more of the spatial relationships that
are discovered among the points. When creating this second map, you will use the exploratory spatial data
analysis (ESDA) tools to examine your data. You will also be introduced to some of the geostatistical options
that you can use to create a surface, such as removing trends and modeling spatial autocorrelation. By
using the ESDA tools and working with the geostatistical parameters, you will be able to create a more
accurate surface. Often, it is not the actual value of some critical health risk that is of concern but
rather whether the value is above some toxic level; if it is, immediate action must be taken.
A third surface will assess the probability that a critical ozone threshold value has been
exceeded. For this tutorial, the critical threshold is 0.09 ppm (California's ambient air quality standard
for hourly measurements); any location where ozone exceeds this value should be closely monitored.
You will use Geostatistical Analyst to predict the probability that ozone values across California
exceeded this standard on September 6, 2007 between 3:00 and 4:00 p.m.
This tutorial is divided into individual tasks that are designed to let you explore the capabilities of
Geostatistical Analyst at your own pace.
Exercise 1 takes you through accessing Geostatistical Analyst and the process of creating a surface
of ozone concentration using default parameter values to introduce you to the steps involved in
creating an interpolation model.
Exercise 2 guides you through the process of exploring your data before you create the surface to
spot outliers and recognize trends.
Exercise 3 creates a second surface that incorporates more of the spatial relationships discovered in
exercise 2 and improves on the surface you created in exercise 1. This exercise also introduces you
to some of the basic concepts of geostatistics.
Exercise 4 shows you how to compare the results of the two surfaces that you created in exercises
1 and 3 and how to decide which surface provides the better predictions of the unknown values.
Exercise 5 takes you through the process of mapping the probability that ozone exceeded a critical
threshold, creating a third surface.
You will need a few hours of focused time to complete the tutorial. However, you can also perform the
exercises one at a time if you want. It is recommended that you save your results after each exercise.
Start ArcMap by clicking Start > All Programs > ArcGIS > ArcMap 10.
2.
3.
4.
5.
Click Close.
On the main menu, click Customize > Toolbars > Geostatistical Analyst.
The Geostatistical Analyst toolbar is added to your ArcMap session.
The extension and toolbar only need to be enabled and added once; they will be active and
present the next time you open ArcMap.
2.
Navigate to the folder where you installed the tutorial data (the default installation path is
C:\ArcGIS\ArcTutor\Geostatistical Analyst).
3.
4.
Hold down the CTRL key and choose the O3_Sep06_3pm and ca_outline datasets.
5.
Click Add.
6.
Right-click the ca_outline layer legend (the box below the layer's name) in the table of contents
and click No Color, as shown in the following figure:
Only the outline of California is displayed. This allows you to see the layers that you will create
in this tutorial underneath this layer.
7.
8.
9.
In the Show dialog box, click Quantities and then Graduated colors.
10.
11.
Choose the White to Black color ramp so that the points will stand out against the color
surfaces you will create in this tutorial. The symbology dialog should look like this:
12.
Click OK.
Note that the highest ozone values occurred in California's Central Valley, while the lowest
values occurred along the coast. Mapping the data is the first step in exploring it and
understanding more about the phenomenon you want to model.
Steps:
1.
2.
Browse to your working folder (for example, you could create the following folder to store your
work: C:\Geostatistical Analyst Tutorial).
3.
4.
Click Save.
You needed to provide a name for the map because this is the first time you have saved it. To
save the ArcMap document in the future, simply click Save.
Click the Geostatistical Analyst arrow on the Geostatistical Analyst toolbar and click
Geostatistical Wizard.
3.
4.
Click the Data Field arrow and click the OZONE attribute.
5.
Click Next.
By default, Ordinary (kriging) and Prediction (map) are selected on the dialog box. Since the
method to map the ozone surface is selected, you could click Finish to create a surface using
the default parameters. However, steps 6 to 10 will expose you to other dialog boxes. In each
step of the wizard, the interior panels (windows) can be resized by dragging the dividers
between them.
Note that there is a box on the bottom-right of the Geostatistical Wizard that shows a brief
description of the highlighted method or parameter. At this stage, the box shows the dataset
and field that will be used to create the surface.
6.
Click Next.
The semivariogram/covariance model is displayed, allowing you to examine spatial
relationships between measured points. You can assume that things that are closer together
are more alike than things that are farther apart. The semivariogram allows you to explore this
assumption. The process of fitting a semivariogram model to capture the spatial relationships
in the data is known as variography.
7.
Click Next.
The crosshairs show a location that has no measured value. To predict a value at the
crosshairs, you can use the values at the measured locations. You can assume that the values at the
closest measured locations are the most similar to the value at the unmeasured location that you are
trying to predict. The red points in the image below will be weighted more heavily (that is, they will
have more influence on the predicted value) than the green points because they are closer to the
location you are predicting. Using the surrounding points and the semivariogram/covariance model fitted
previously, you can predict values for the unmeasured location.
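The weighting described above can be sketched as an ordinary kriging system. The following Python sketch is illustrative only and is not Geostatistical Analyst's implementation: it assumes a hypothetical exponential semivariogram (no nugget, sill 1, range 100 km) instead of the stable model the wizard fits by default, and the function names are our own. Solving the system gives one weight per measured point; the weights sum to 1, and nearer points generally receive larger weights.

```python
import math

def exponential_gamma(h, sill=1.0, rng=100_000.0):
    """Hypothetical exponential semivariogram (no nugget); sill and range are illustrative."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging_weights(points, target, gamma=exponential_gamma):
    """Weights given to each measured point (x, y) when predicting at target (x, y)."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances among measured points, bordered by the constraint that weights sum to 1.
    a = [[gamma(dist(points[i], points[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each measured point and the prediction location.
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    return solution[:n]  # the final entry is the Lagrange multiplier

def predict(points, values, target):
    """Ordinary kriging prediction: the weighted sum of the measured values."""
    weights = ordinary_kriging_weights(points, target)
    return sum(w * v for w, v in zip(weights, values))
```

Because this model has no nugget, predicting exactly at a measured location returns the measured value, which is why kriging without a nugget effect is called an exact interpolator.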
8.
Click Next.
The cross-validation diagram gives you an idea of how well the model predicts the values at
the unknown locations.
You will learn how to use the graph and understand the statistics in exercise 4.
9.
Click Finish.
The Method Report dialog box summarizes information on the method (and its associated
parameters) that will be used to create the output surface.
10.
Click OK.
The predicted ozone map is added as the top layer in the table of contents.
11.
Double-click the layer in the table of contents to open the Layer Properties dialog box.
12.
Click the General tab and change the layer's name to Default Kriging and click OK.
Changing the layer's name will help you distinguish this layer from the one you will create in
exercise 3.
13.
Notice that the interpolation continues into the ocean because the extent of the layer is the same
as the extent of the input data (O3_Sep06_3pm).
14.
To restrict the prediction surface to within California, right-click the Default Kriging layer and
click Properties.
15.
16.
Click the Set the extent to arrow, click the rectangular extent of ca_outline, then click OK.
The interpolated area extends so that it covers all of California.
17.
Right-click the Layers data frame in the table of contents, click Properties, then click the Data
Frame tab.
18.
Click the Clip Options arrow, choose Clip to shape, then click the Specify Shape button.
19.
On the Data Frame Clipping dialog box, click the Outline of Features button, click the Layer
arrow, then click ca_outline.
20.
21.
22.
Right-click the Default Kriging layer in the table of contents and click Validation/Prediction.
This opens the GA Layer To Points geoprocessing tool with the Default Kriging layer specified
as the input geostatistical layer.
23.
Input geostatistical layer should be automatically set to Default Kriging. For Point
observation locations, navigate to the geodatabase that contains the data for this tutorial,
and click the ca_cities dataset. Leave the Field to validate on (optional) empty as we just
want to generate ozone predictions for the major cities, not validate the predicted values
against measured values. For Output statistics at point locations navigate to the folder you
created for the output and name the output file CA cities ozone.shp.
The GA Layer To Points geoprocessing tool dialog boxes should look like this:
24.
25.
Once the tool has run, click the Add Data button
26.
27.
28.
29.
Right-click the CA_ozone_cities layer and click Remove to remove the layer from the project.
30.
Surface-fitting methodology
You have now created a map of ozone concentration and completed exercise 1. While it is a simple task to
create a surface map using the default options that the Geostatistical Wizard provides, it is important to
follow a structured process such as the one shown below:
You will practice this structured process in the following exercises of the tutorial. In addition, in exercise 5, you
will create a surface showing the probability that ozone concentrations exceed a specified threshold. Note that
you have already performed the first step of this process, representing the data, in exercise 1. In exercise 2,
you will explore the data.
In this exercise, you were introduced to Geostatistical Wizard and to the process of creating an interpolation
model. The following exercises will refine this process by extracting as much pertinent information as possible
from the data to create a better model.
If you closed your previous ArcMap session, start the program again and open Ozone
Prediction Map.mxd.
2.
Click the ca_outline layer and drag it under the O3_Sep06_3pm layer in the table of contents.
3.
4.
On the Geostatistical Analyst toolbar, click Geostatistical Analyst > Explore Data >
Histogram.
5.
On the Histogram dialog box, click the Attribute arrow and click OZONE.
The x-axis values have been rescaled by a factor of 10 to make them easier to read. You
might want to resize and move the Histogram dialog box so that you can also see the map, as
shown below.
The distribution of the ozone values is depicted in the histogram with the range of values split
into 10 classes. The frequency of data within each class is represented by the height of each
bar. Generally, the important features of a distribution are its central value, spread, and
symmetry. As a quick check, if the mean and the median are approximately the same value,
you have one piece of evidence that the data may be normally distributed.
The ozone data histogram indicates that the data is unimodal (one hump) and skewed right.
The right tail of the distribution indicates the presence of a relatively small number of sample
points with large ozone concentration values. It seems that the data is not close to a normal
distribution.
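The histogram checks above can be reproduced numerically. This standard-library sketch (the function name is hypothetical) counts values in 10 equal-width classes, as the histogram here does, and compares the mean with the median. A positive moment skewness coefficient corresponds to the long right tail seen in the ozone histogram. The Histogram tool's exact class boundaries may differ from this equal-width assumption.

```python
import statistics

def histogram_summary(values, classes=10):
    """Class counts, mean, median, and moment skewness of a list of sample values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / classes
    counts = [0] * classes
    for v in values:
        # Clamp so the maximum value falls in the last class instead of one past it.
        idx = min(int((v - lo) / width), classes - 1) if width > 0 else 0
        counts[idx] += 1
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    # Positive skewness indicates a longer right tail (a few unusually large values).
    skew = sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3) if sd > 0 else 0.0
    return counts, mean, median, skew
```

For example, histogram_summary([0.03, 0.04, 0.04, 0.05, 0.05, 0.05, 0.06, 0.12]), using made-up values, yields a mean above the median and a positive skewness, the same pattern as right-skewed ozone data.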
6.
Select the two histogram bars with ozone values larger than 0.10 ppm (recall that the values
have been rescaled by a factor of 10) by clicking and dragging the pointer over them.
The sample points within this range are selected on the map. Note that most of these sample
points are located in California's Central Valley.
7.
Click the Clear Selected Features button on the Tools toolbar to clear the selected points on
the map and histogram.
8.
Click the Close button located in the upper corner of the Histogram dialog box.
1.
On the Geostatistical Analyst toolbar, click Geostatistical Analyst > Explore Data > Normal
QQPlot.
2.
A general QQ plot is a graph on which the quantiles from two distributions are plotted against
each other. For two identical distributions, the QQ plot is a straight line. Therefore, it is
possible to check the normality of the ozone data by plotting the quantiles of that data against
the quantiles of a standard normal distribution. The normal QQ plot above shows that the points
do not lie very close to a straight line. The main departure from the line occurs at low values
of ozone concentration (shown in green in the image above, where they were selected by clicking
and dragging the pointer over them).
If the data does not exhibit a normal distribution in either the histogram or normal QQ plot, it
may be necessary to transform the data to make it conform to a normal distribution before
using certain kriging interpolation techniques.
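The idea behind the normal QQ plot can be sketched in a few lines. Each sorted data value is paired with the standard normal quantile at its plotting position; this sketch assumes the common (i - 0.5)/n convention, which may not be the one Geostatistical Analyst uses, and the function names are hypothetical. The correlation of those pairs summarizes how close the plot is to a straight line, so comparing it before and after a log transformation (one commonly used transformation for right-skewed data) gives a rough check of whether the transformation helps.

```python
import math
from statistics import NormalDist

def normal_qq_points(values):
    """Pair each sorted value with the standard normal quantile of its plotting position."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i - 0.5) / n is one common plotting-position convention; others exist.
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]

def qq_correlation(pairs):
    """Pearson correlation of QQ pairs; values near 1 mean the points lie near a straight line."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

When data are roughly lognormal, qq_correlation(normal_qq_points([math.log(v) for v in values])) is typically closer to 1 than the untransformed version.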
3.
Click the Close button located in the upper corner of the Normal QQPlot dialog box.
On the Geostatistical Analyst toolbar, click Geostatistical Analyst > Explore Data > Trend
Analysis.
2.
3.
Click the Rotate Locations scroll bar and scroll left until the rotation angle is 90 degrees.
You can see that as you rotate the points, the trends always exhibit upside-down U shapes.
Also, the trend does not seem to be stronger (a more pronounced U shape) for any particular
rotation angle, reaffirming the observation above that there is a strong trend from the center of
the data domain in all directions. Because the trend is U-shaped, a second-order polynomial is
a good choice for a global trend model. A possible cause of this trend is that pollution is low at
the coast, higher farther inland where there are large human populations, and low again in the
mountains, where the population tapers off. You will remove these trends in exercise 3.
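A rough approximation of what the Trend Analysis tool displays is to project the sample values onto an axis at a given rotation angle and fit a second-order polynomial along it. In this hypothetical sketch, a negative quadratic coefficient (c2 < 0) indicates an upside-down U. The rotation convention (counterclockwise from the x-axis) and the centering and scaling step are assumptions, not the tool's documented algorithm.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(us, zs):
    """Least-squares coefficients (c0, c1, c2) of z = c0 + c1*u + c2*u**2, via Cramer's rule."""
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(z * u ** k for u, z in zip(us, zs)) for k in range(3)]
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(a)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return coeffs

def projected_trend(points, values, angle_deg):
    """Fit a second-order polynomial to values projected onto an axis rotated by angle_deg.

    Projected coordinates are centered and scaled to [-1, 1] to keep the normal equations
    well conditioned, so the sign of c2 is preserved but not its original units.
    """
    th = math.radians(angle_deg)
    us = [x * math.cos(th) + y * math.sin(th) for x, y in points]
    center = sum(us) / len(us)
    scale = max(abs(u - center) for u in us) or 1.0
    return fit_quadratic([(u - center) / scale for u in us], values)
```

For a dome-shaped surface such as z = 1 - x**2 - y**2 sampled on a regular grid, c2 is negative at 0, 45, and 90 degrees, mirroring the observation that the U shape persists at every rotation.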
4.
Click the Close button located in the upper corner of the Trend Analysis dialog box.
On the Geostatistical Analyst toolbar, click Geostatistical Analyst > Explore Data >
Semivariogram/Covariance Cloud.
2.
3.
Click the Select Features by Rectangle button on the Tools toolbar, then click and drag the
pointer over some points with large semivariogram (y-axis) values in the Semivariogram/
Covariance Cloud dialog box to select them. (Use the diagram on the left as a guide. It is not
important to select exactly the same points as those shown in the diagram below.)
The pairs of sample locations that are selected in the semivariogram are highlighted on the
map, and lines link the locations, indicating the pairing. As might be expected from the
default kriging prediction map, the lines with high semivariogram values for a particular
distance between the points in a pair are those that span the largest gradient in the
ozone values.
The diagram below shows pairs with typical semivariogram values for approximately the same
distances between the pairs of points.
Most of these lines are roughly parallel to the coastline, which indicates that directional
influences are affecting the data. The reasons for these directional influences may be known to
local environmental scientists, but they can be statistically quantified without knowing the
sources of the high air pollution. These directional influences will affect the accuracy of the
surface you create in the next exercise. However, once you know that one exists, Geostatistical
Analyst provides tools to account for it in the surface-creation process. To explore for a
directional influence in the semivariogram cloud, you can use the Search Direction tools.
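Each point in the semivariogram cloud represents one pair of sample locations. The sketch below (hypothetical function names, standard-library Python) computes the separation distance and semivariance for every pair; the semivariance of a pair is half the squared difference of the two values. Keeping the pair indices (i, j) is what allows a selected cloud point to be drawn as a line linking two locations on the map.

```python
import math
from itertools import combinations

def semivariogram_cloud(points, values):
    """One (i, j, distance, semivariance) entry per pair of sample locations.

    The semivariance of a pair is half the squared difference between its two values.
    """
    cloud = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        h = math.hypot(p[0] - q[0], p[1] - q[1])
        cloud.append((i, j, h, 0.5 * (values[i] - values[j]) ** 2))
    return cloud

def highest_pairs(cloud, k):
    """The k pairs with the largest semivariance, like dragging a box over the top of the cloud."""
    return sorted(cloud, key=lambda entry: entry[3], reverse=True)[:k]
```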
4.
5.
The direction the pointer is facing determines which pairs of data locations are plotted on the
semivariogram. For example, if the pointer is facing an east-west direction, only the pairs of
data locations that are east or west of one another will be plotted on the semivariogram. This
enables you to eliminate pairs you are not interested in and to explore the directional
influences on the data.
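The pair filtering performed by the Search Direction tool can be approximated as follows. This sketch keeps only pairs whose separation vector lies within an angular tolerance of the chosen direction. The azimuth convention (degrees clockwise from north) and the 22.5-degree default tolerance are assumptions made for illustration, not the tool's documented settings; the function name is hypothetical. Directions are treated as axial, so 90 and 270 degrees select the same east-west pairs.

```python
import math

def pairs_in_direction(points, direction_deg, tolerance_deg=22.5):
    """Index pairs whose separation vector lies within tolerance_deg of direction_deg."""
    target = direction_deg % 180.0
    selected = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            # Azimuth clockwise from north, folded onto [0, 180) because direction is axial.
            azimuth = math.degrees(math.atan2(dx, dy)) % 180.0
            diff = abs(azimuth - target)
            diff = min(diff, 180.0 - diff)
            if diff <= tolerance_deg:
                selected.append((i, j))
    return selected
```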
6.
Click and drag the Select Features by Rectangle tool along the pairs with the highest
semivariogram values to select them on the plot and in the map. (Use the following diagram as
a guide. It is not important to select the exact points in the diagram or to use the same search
direction.)
Notice that the majority of the linked locations (representing pairs of points on the map)
correspond to one of the sample points from the central California region. This is because the
values of ozone in this area are higher than anywhere else in California.
7.
Click the Close button in the upper corner of the dialog box.
8.
Click the Clear Selected Features button on the Tools toolbar to clear the selected points on
the map.
The normal QQ plot also shows that the data is not normally distributed, since the points in the plot do
not form a straight line. A data transformation may be necessary.
Using the Trend Analysis tool, you saw that the data exhibited a trend and, after refining the view,
determined that the trend would be best fit by a second-order polynomial.
The semivariogram/covariance cloud illustrated that the unusually high semivariogram values are
largely represented by the lines perpendicular to the coast. The analysis using this tool indicates that
the interpolation model should account for anisotropy.
The semivariogram surface indicates that there is spatial autocorrelation in the data. Because there
are no outliers (or erroneous sample points) in the dataset, you can proceed with confidence to the
surface interpolation. You will be able to create a more accurate surface than the one you created in
exercise 1 using default options and parameter values because you now know that there is trend and
anisotropy in the data and you can adjust for it in the interpolation. Also, a data transformation may
improve the prediction model.
In exercise 3, you will use what you have learned about the ozone data to create a better interpolation model
than the one created in exercise 1, which was based on default parameter values.
You will again use the ordinary kriging interpolation method, but this time incorporate trend and anisotropy in
your model to create better predictions. Ordinary kriging is the simplest geostatistical model because it
relies on the fewest assumptions.
Steps:
1.
If you closed your previous ArcMap session, start the program again and open Ozone
Prediction Map.mxd.
2.
Make sure that none of the points representing ozone measurements are selected. If some are,
clear the selection by clicking the Clear Selected Features button on the Tools toolbar.
3.
On the Geostatistical Analyst toolbar, click Geostatistical Analyst > Geostatistical Wizard.
4.
5.
6.
Click the Attribute drop-down arrow and click the OZONE attribute.
7.
Click Next.
By default, Ordinary Kriging type and Prediction output type are selected.
From the exploration of your data in exercise 2, you discovered a global trend. After refinement
with the Trend Analysis tool, you determined that a second-order polynomial seemed
reasonable. This trend can be represented by a mathematical formula and removed from the
data.
Once the trend is removed, the statistical analysis will be performed on the residuals, or the
short-range variation component of the surface. The trend will automatically be added back before the
final surface is created so that the predictions produce meaningful results.
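Removing a second-order trend amounts to fitting a full quadratic surface, z = b0 + b1 x + b2 y + b3 x^2 + b4 xy + b5 y^2, and keeping the residuals. This sketch assumes ordinary least squares solved through the normal equations, which is not necessarily how Geostatistical Analyst fits its trend; the function names are hypothetical. For real coordinates in meters, center and scale x and y first, because the normal equations become badly conditioned with large coordinate values.

```python
def design_row(x, y):
    """Terms of a full second-order polynomial in x and y."""
    return [1.0, x, y, x * x, x * y, y * y]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_second_order_trend(points, values):
    """Ordinary least-squares coefficients of the quadratic trend surface."""
    rows = [design_row(x, y) for x, y in points]
    k = len(rows[0])
    # Normal equations (A^T A) c = A^T z.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atz = [sum(r[i] * z for r, z in zip(rows, values)) for i in range(k)]
    return solve(ata, atz)

def trend_at(coeffs, x, y):
    """Value of the fitted trend surface at (x, y)."""
    return sum(c * term for c, term in zip(coeffs, design_row(x, y)))

def remove_trend(points, values):
    """Return the trend coefficients and the residuals (value minus trend) at each point."""
    coeffs = fit_second_order_trend(points, values)
    residuals = [z - trend_at(coeffs, x, y) for (x, y), z in zip(points, values)]
    return coeffs, residuals
```

Adding trend_at(coeffs, x, y) back to a residual recovers the original value, which corresponds to the trend being added back before the final surface is created.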
8.
Click the Order of trend removal drop-down arrow and click Second.
A second-order polynomial will be fitted because a U-shaped curve was detected in the Trend
Analysis dialog box in exercise 2.
9.
Click Next.
By default, Geostatistical Analyst maps the global trend in the dataset. The surface indicates the
most rapid change in the southwest-northeast direction and a more gradual change in the
northwest-southeast direction (causing the ellipse shape).
Trends should only be removed if there is justification for doing so. The southwest-northeast
trend in air quality can be attributed to an ozone buildup between the mountains and the coast.
The elevation and prevailing wind direction are contributing factors to the relatively low values in
the mountains and at the coast. The high population density between the mountains and the coast also
leads to high levels of pollution there. Hence, you can justifiably remove these trends.
10.
Semivariogram/Covariance modeling
Using the semivariogram/covariance cloud tool in exercise 2, you explored the overall spatial autocorrelation
of the measured points. To do so, you examined semivariogram values, which are half the squared
difference of the ozone measurements taken at pairs of sampling locations separated by different distances.
The goal of semivariogram/covariance modeling is to determine the best fit for a model that will pass
through the points in the semivariogram (shown by the blue line in the diagram below). The semivariogram
is a graphic representation used to provide a picture of the spatial correlation in the dataset.
The Semivariogram/Covariance Modeling dialog box allows you to fit models to the spatial relationships in
the dataset. Geostatistical Analyst first determines good lag sizes for grouping semivariogram values. The
lag size is the size of a distance class into which pairs of locations are grouped to reduce the large number
of possible combinations. In exercise 2, the semivariogram/covariance cloud showed one red point for every
pair of points in the dataset. Our goal now is to fit a curve through those points. To have a clearer picture of
the semivariogram values, the empirical semivariogram values (red points) are grouped according to the
separation distance they are associated with. The points are split into bins (or lags), and the lag size
determines how wide each interval (bin) will be. This process is known as binning.
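Binning can be sketched by averaging the pairwise semivariances whose separation distance falls in each lag. This simplified version uses fixed-width bins (lag_size) and reports each bin at its center distance; the wizard's fitting actually uses variable lag sizes, so its values will differ. The function and parameter names are hypothetical.

```python
import math
from itertools import combinations

def empirical_semivariogram(points, values, lag_size, n_lags):
    """(lag center, mean semivariance or None, pair count) for each fixed-width distance bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        h = math.hypot(p[0] - q[0], p[1] - q[1])
        k = int(h // lag_size)  # index of the bin this pair falls into
        if k < n_lags:           # pairs beyond the last bin are ignored
            sums[k] += 0.5 * (values[i] - values[j]) ** 2
            counts[k] += 1
    return [((k + 0.5) * lag_size, sums[k] / counts[k] if counts[k] else None, counts[k])
            for k in range(n_lags)]
```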
As a result of the binning, notice that there are fewer points in this semivariogram than in the semivariogram
cloud you saw in exercise 2. The Semivariogram/Covariance Modeling dialog box displays the
semivariogram values as a surface (map on the bottom left of the dialog box) and as a scatterplot relating
semivariogram values to separation distance. By default, optimal parameter values are calculated for an
omnidirectional (all directions) stable semivariogram model. There are several other types of semivariogram
models that could be used, depending on how well they fit the data. Parameter values for the
omnidirectional stable semivariogram model are the nugget, range, partial sill, and shape. You will notice
that at smaller distances, the semivariogram model (blue line) rises sharply, then levels off. The range is the
distance where it levels off. This flattening out of the semivariogram indicates that there is little
autocorrelation in attribute (ozone) values beyond the range.
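The stable model can be written down directly. The form used here, nugget + partial_sill * (1 - exp(-3 * (h / range) ** shape)) for h > 0 with a shape parameter between 0 and 2, is our reading of the stable model as Esri describes it; treat it as an assumption and check the Geostatistical Analyst documentation before relying on it. With the factor of 3, the model reaches about 95 percent of the partial sill (above the nugget) at the range, and a shape of 1 reduces it to the exponential model. Any parameter values passed to it in practice should come from the wizard's fitted model; the values in the test below are hypothetical.

```python
import math

def stable_semivariogram(h, nugget, partial_sill, rng, shape):
    """Stable semivariogram value at separation distance h (assumed Esri-style parameterization)."""
    if not 0.0 < shape <= 2.0:
        raise ValueError("shape must lie in (0, 2]")
    if h == 0.0:
        return 0.0  # a location has zero semivariance with itself; the nugget applies for h > 0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / rng) ** shape))
```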
The fitting algorithm actually uses variable lag sizes, which allow the spatial autocorrelation in ozone
concentrations to be captured quite well, especially at short distances (which are the most important for
interpolation). Semivariogram values for these lag sizes can be exported by selecting Save geometrical
values as table in the Export > Variography section of the dialog box and visualized using Excel, for
example.
By removing the trend, the semivariogram will model the spatial autocorrelation among data points without
having to consider the trend in the data. The trend will be automatically added back to the calculations
before the final surface is produced.
The color scale, which represents the calculated semivariogram value, provides a direct link between the
empirical semivariogram values on the graph and those on the semivariogram surface. The value of each
cell in the semivariogram surface is color coded, with lower values shown in blue and green and higher
values shown in orange and red. The average value for each cell of the semivariogram surface is plotted on
the semivariogram graph and depicted as a red point. The average value for each lag (which encompasses
many cells) is plotted on the semivariogram graph as well, and is depicted as a blue cross. The x-axis on the
semivariogram graph is the distance from the center of the cell to the center of the semivariogram surface.
For the ozone data, the semivariogram starts low at short distances (ozone values measured at locations
that are close together are similar) and increases as distance increases (ozone values get more dissimilar
the farther apart they were measured). Notice from the semivariogram surface that dissimilarity in ozone
values increases more rapidly in the west-east direction than in the south-north direction. Earlier, you
removed a coarse-scale trend. Now it appears that there is still a directional component to the
autocorrelation, so you will incorporate that into the next model.
Directional semivariograms
A directional influence will affect the points of the semivariogram and the model that will be fit. In certain
directions, things that are closer to each other may be more alike than in other directions. Geostatistical
Analyst can account for directional influences, or anisotropy in the semivariogram model. Anisotropy can be
caused by wind, runoff, a geological structure, or a wide variety of other processes. The directional influence
can be statistically quantified and accounted for when making your map.
You can explore the dissimilarity in data points for a certain direction with the Search Direction tool. This
allows you to examine directional influences on the semivariogram chart. It does not affect the output
surface. The following steps show how to achieve this.
Steps:
1.
Type a new Lag size value of 15000. Reducing the lag size means that you are zooming in to
model the details of the local spatial data variation.
2.
3.
Click and hold the mouse pointer on the center blue line of the Search Direction tool.
Change the search direction by dragging the center line. As you change the direction of the
search, note how the semivariogram graph changes. Only the semivariogram surface values
within the direction of the search are plotted on the semivariogram graph above.
To actually account for the directional influences on the semivariogram model for the surface
calculations, you must calculate the anisotropical semivariogram or covariance model.
4.
The blue ellipse on the semivariogram surface indicates the range of the semivariogram in
different directions. In this case, the major axis lies approximately in the NNW-SSE direction.
Anisotropy will now be incorporated into the model to adjust for the directional influence of
autocorrelation in the output surface.
5.
Change the search direction angle under View Settings from 0 to 61.35 to make the directional
pointer coincide with the minor axis of the anisotropical ellipse.
Note that the semivariogram curve rises more rapidly to its sill value. The x- and y-coordinates are in meters, so the range in this direction is approximately 110 kilometers.
6.
Change the search direction angle under View Settings from 61.35 to 151.35 to make the
directional pointer coincide with the major axis of the anisotropical ellipse.
The semivariogram model increases more gradually, then flattens out. The range in this direction
is about 180 kilometers. The plateau that the semivariogram models reach in both steps 5 and
6 is the same and is known as the sill. The range is the distance at which the semivariogram
model reaches its limiting value (the sill). Beyond the range, the dissimilarity between points
becomes constant with increased lag distance. Points separated by distances larger than the
range are spatially uncorrelated with each other.
The nugget represents measurement error and/or microscale variation (variation at spatial scales
too fine to detect). It is possible to estimate the measurement error if you have multiple
observations per location, or you can decompose the nugget into measurement error and
microscale variation using the Error Measurement control.
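With geometric anisotropy, the range traces an ellipse whose major and minor axes correspond to the two directions examined in steps 5 and 6. The sketch below assumes azimuths measured clockwise from north and uses the approximate values from this exercise (major axis at 151.35 degrees with a range of about 180 km, minor range of about 110 km). It rescales a separation vector so that an isotropic model with the major range applies, and also reports the range along any direction. The function names are hypothetical.

```python
import math

def anisotropic_distance(dx, dy, major_azimuth_deg, major_range, minor_range):
    """Rescale separation (dx, dy) so an isotropic model with range major_range applies.

    Azimuths are assumed to be measured in degrees clockwise from north (the +y axis).
    """
    th = math.radians(major_azimuth_deg)
    along = dx * math.sin(th) + dy * math.cos(th)    # component along the major axis
    across = dx * math.cos(th) - dy * math.sin(th)   # component along the minor axis
    # Stretch the minor-axis component so that minor_range maps onto major_range.
    return math.hypot(along, across * major_range / minor_range)

def directional_range(azimuth_deg, major_azimuth_deg, major_range, minor_range):
    """Radius of the range ellipse along a given azimuth."""
    d = math.radians(azimuth_deg - major_azimuth_deg)
    return 1.0 / math.hypot(math.cos(d) / major_range, math.sin(d) / minor_range)
```

For example, directional_range(61.35, 151.35, 180_000.0, 110_000.0) returns approximately 110,000 m, and an azimuth of 151.35 returns approximately 180,000 m.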
7.
Click Next.
Now you have a fitted model to describe the spatial autocorrelation, taking into account the trend
and directional influences in the data. This information, along with the configuration and
measurements of locations around the prediction location, is used to make a prediction. But how
should measured values be used to make the predictions?
Searching neighborhood
It is common practice to limit the data used by defining a circle (or ellipse) to enclose the points that are
used to predict a value at an unmeasured location. Additionally, to avoid bias in a particular direction, the
circle (or ellipse) can be divided into sectors from which an equal number of points are selected. By using
the Searching Neighborhood dialog box, you can specify the number of points (a maximum of 200), the
radius (or major/minor axis), and the number of sectors of the circle (or ellipse) to be used for prediction.
The points selected in the data view window indicate the weights that will be associated with each measured
value to predict a value for the location marked by the crosshair. In this example, five measured values
(shown in red) have weights of more than 10 percent. The larger the weight, the more impact that value will
have on the prediction for the location at the crosshair.
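Sector-based neighbor selection can be sketched as follows. This version assumes a circular neighborhood, sectors of equal angular width starting at north, and a fixed maximum number of neighbors per sector; elliptical neighborhoods, offset sectors, and minimum neighbor counts are not modeled here. The function name and defaults are hypothetical.

```python
import math

def sector_neighbors(points, target, radius, n_sectors=4, max_per_sector=5):
    """Indices of up to max_per_sector nearest points in each angular sector around target."""
    tx, ty = target
    width = 360.0 / n_sectors
    sectors = [[] for _ in range(n_sectors)]
    for idx, (x, y) in enumerate(points):
        d = math.hypot(x - tx, y - ty)
        if d > radius:
            continue  # outside the search circle
        azimuth = math.degrees(math.atan2(x - tx, y - ty)) % 360.0  # clockwise from north
        sectors[int(azimuth // width) % n_sectors].append((d, idx))
    chosen = []
    for members in sectors:
        # Nearest points first, so each sector contributes its closest neighbors.
        chosen.extend(idx for _, idx in sorted(members)[:max_per_sector])
    return sorted(chosen)
```

Limiting each sector separately keeps a dense cluster on one side of the prediction location from supplying all of the neighbors.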
Steps:
1.
Click on the surface preview to select a prediction location (where the crosshairs are located).
Note the change in the selection of data locations (together with their associated weights) that
will be used for calculating the value at the prediction location.
2.
For the purpose of this exercise, under Predicted Value, type 66000 for X and -220000 for Y.
3.
Change Copy from Variogram to False and type 90 in the Angle text box.
Notice how the shape of the search ellipse changes.
However, to account for the directional influences, change the Copy from Variogram control back
to True.
4.
Click Next.
The Cross Validation dialog box will appear.
Before you actually create the surface, you will use the Cross Validation dialog box to perform
diagnostics on the parameters to determine how good your model will be.
Cross-validation
The objective of cross-validation is to help you make an informed decision about which model provides the
most accurate predictions. It gives you an idea of how well the model predicts unknown values. Cross-validation sequentially omits a point in the dataset, predicts the value at that point's location using the
rest of the data, and then compares the measured and predicted values (the difference between the measured
and predicted value is known as a prediction error). The statistics calculated on the prediction errors serve
as diagnostics that indicate whether the model is reasonable for decision making and map production.
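The leave-one-out procedure can be sketched in Python as follows. To keep the example short, it uses inverse distance weighting (IDW) as the predictor instead of kriging, so its numbers are not what Geostatistical Analyst would report. The function names idw_predict and leave_one_out are hypothetical, and the error is computed as predicted minus measured, which is an assumed sign convention.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def idw_predict(target: Point, points: Sequence[Point], values: Sequence[float],
                power: float = 2.0) -> float:
    """Inverse distance weighted prediction (a simple stand-in for kriging)."""
    num = den = 0.0
    for (x, y), z in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z  # IDW honors the data exactly at a sample location
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

def leave_one_out(points: List[Point], values: List[float]) -> List[float]:
    """Omit each sample in turn, predict it from the others, return predicted - measured."""
    errors = []
    for i in range(len(points)):
        other_pts = points[:i] + points[i + 1:]
        other_vals = values[:i] + values[i + 1:]
        errors.append(idw_predict(points[i], other_pts, other_vals) - values[i])
    return errors

errors = leave_one_out([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [1.0, 2.0, 3.0])
mean_error = sum(errors) / len(errors)
```

With three collinear points measured at 1, 2, and 3, the errors come out at approximately 1.2, 0.0, and -1.2. The middle point is predicted exactly, while each end point is pulled toward its neighbors. The mean error is close to 0, so this tiny example shows no overall bias.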
To judge whether a model provides accurate predictions, verify the following:
- The predictions are unbiased, indicated by a mean prediction error close to 0.
- The standard errors are accurate, indicated by a root-mean-square standardized prediction error close to 1.
- The predictions do not deviate much from the measured values, indicated by a root-mean-square error and an average standard error that are as small as possible.
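These criteria can be computed from cross-validation output with a few lines of Python. This is a sketch: cv_summary is a hypothetical helper, and errors are taken as predicted minus measured. The average standard error is computed as the root mean square of the prediction standard errors. That is an assumed definition, not one confirmed by this tutorial.

```python
import math
from typing import Dict, Sequence

def cv_summary(measured: Sequence[float], predicted: Sequence[float],
               std_errors: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: cross-validation diagnostics from paired samples."""
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "mean": sum(errors) / n,                                            # ideally close to 0
        "rms": math.sqrt(sum(e * e for e in errors) / n),                   # as small as possible
        "rms_standardized": math.sqrt(sum(z * z for z in standardized) / n),  # ideally close to 1
        # Assumed definition: root mean square of the prediction standard errors.
        "average_standard_error": math.sqrt(sum(s * s for s in std_errors) / n),
    }

# Hypothetical values for illustration only.
summary = cv_summary([1.0, 2.0, 3.0], [1.5, 2.0, 2.5], [0.5, 0.5, 0.5])
```

Here the root-mean-square standardized error is about 0.82, below 1. A value below 1 suggests that the standard errors overstate the actual prediction errors.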
The Cross Validation dialog box also allows you to display scatterplots that show the error, standardized
error, and QQ plot for each data point.
Steps:
1.
From the QQPlot tab, you can see that some values fall slightly above the line and some slightly
below the line, but most points fall very close to the straight dashed line, indicating that prediction
errors are close to being normally distributed.
2.
To select the location for a particular point, click the row that relates to the point of interest in the
table. The selected point is shown in green on the QQ plot.
3.
Optionally, click the Export Result Table button to save a point feature class for further analysis
of the results.
4.
Click Finish.
The Method Report dialog box provides a summary of the model that will be used to create a
surface.
5.
Click OK.
The predicted ozone map will appear as the top layer in ArcMap. By default, the layer assumes
the name of the interpolation method used to produce the surface (for instance, Kriging).
6.
Click the layer name and change the name to Trend Removed.
7.
To extend the prediction surface so that it covers all of California, right-click the Trend
Removed layer, click Properties, then click the Extent tab. Under Set the extent to, specify the
rectangular extent of ca_outline and click OK.
8.
Drag the O3_Sep06_3pm layer to the top of the table of contents so that you can see the points
on top of the interpolated surface.
9.
Right-click the Trend Removed layer that you created and click Change output to Prediction
Standard Error.
The prediction standard errors quantify the uncertainty for each location in the surface that you
created. A simple rule of thumb is that 95 percent of the time, the true ozone value will be inside
the interval formed by the predicted value plus or minus 2 times the prediction standard error, assuming that
the data is normally distributed. Notice in the prediction standard error surface that locations near
sample points generally have lower error.
10.
Right-click the Trend Removed layer that you created and click Change output to Prediction
to get back to the ozone prediction map.
11.
The surface you created in exercise 1 simply used the defaults of the Geostatistical Wizard, with no
consideration of trends in the surface, of using smaller lag sizes, or of using an anisotropic semivariogram
model. The prediction surface you created in this exercise took into consideration the global trends in the data
and adjusted for the local directional influence (anisotropy) in the semivariogram.
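In broad terms, trend removal fits a smooth surface to the data, models the residuals, and adds the trend back to the predictions. The Python sketch below shows only the first part, for a first-order (planar) trend fitted by least squares. The trend order, the helper names fit_plane and residuals, and the solver are assumptions chosen for brevity. This is not the wizard's own procedure.

```python
from typing import List, Sequence, Tuple

def fit_plane(xs: Sequence[float], ys: Sequence[float],
              zs: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: least-squares fit of z = b0 + b1*x + b2*y."""
    rows = [(1.0, x, y) for x, y in zip(xs, ys)]
    # Normal equations (A^T A) b = A^T z, stored as an augmented 3x4 matrix.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    m = [ata[i] + [atz[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    b = [0.0] * 3
    for i in range(2, -1, -1):
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    return b[0], b[1], b[2]

def residuals(xs: Sequence[float], ys: Sequence[float], zs: Sequence[float]) -> List[float]:
    """Data minus the fitted planar trend; these are what would then be modeled."""
    b0, b1, b2 = fit_plane(xs, ys, zs)
    return [z - (b0 + b1 * x + b2 * y) for x, y, z in zip(xs, ys, zs)]

xs = [0.0, 1.0, 0.0, 1.0]
ys = [0.0, 0.0, 1.0, 1.0]
zs = [1.0 + 2.0 * x - y for x, y in zip(xs, ys)]  # points lying exactly on a plane
coeffs = fit_plane(xs, ys, zs)
res = residuals(xs, ys, zs)
```

Because these illustrative points lie exactly on the plane z = 1 + 2x - y, the fitted coefficients are close to (1, 2, -1) and every residual is close to 0. With real data, the residuals carry the local variation that the semivariogram then models.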
In exercise 4, you will compare the two models to see which one provides a better prediction of unknown
values.
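The rule of thumb for the prediction standard error surface in step 9 can be written out in Python. The helper approx_95_interval is hypothetical. Like the rule itself, it assumes normally distributed errors. The factor 2 is a rounded version of the two-sided 95 percent standard normal quantile, about 1.96.

```python
from statistics import NormalDist
from typing import Tuple

def approx_95_interval(prediction: float, std_error: float,
                       z: float = 2.0) -> Tuple[float, float]:
    """Hypothetical helper: (prediction - z * std_error, prediction + z * std_error)."""
    half_width = z * std_error
    return prediction - half_width, prediction + half_width

# Exact two-sided 95 percent multiplier for a normal distribution (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)

# Hypothetical values, not taken from the tutorial data:
# 0.08 ppm predicted with a 0.01 ppm prediction standard error.
low, high = approx_95_interval(0.08, 0.01)
```

Passing z=Z_95 instead of the default gives a slightly narrower interval than the rule of thumb.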
If you closed your previous ArcMap session, start the program again and open Ozone
Prediction Map.mxd.
2.
The Cross Validation Comparison dialog box will appear and automatically compare the Trend
Removed model to the Default Kriging model (as it is the only other model in the table of
contents).
3.
The predictions do not deviate much from the measured values, indicated by root-mean-square error and average standard error that are as small as possible.
You can also use the Predicted, Error, Standardized Error, and Normal QQPlot tabs to see
graphical representations of the performance of each model.
Based on these criteria, the Trend Removed model performs better than Default Kriging.
4.
5.
6.
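One way to make the comparison explicit is to tally, criterion by criterion, which model comes closer to the ideal values of the cross-validation statistics introduced earlier. The Python sketch below does this. better_model is a hypothetical helper, and weighting the four criteria equally is an assumption. It is not how the Cross Validation Comparison dialog box reaches a verdict.

```python
from typing import Dict

Stats = Dict[str, float]

def better_model(a: Stats, b: Stats) -> str:
    """Hypothetical helper: return "a", "b", or "tie" by counting criteria won."""
    checks = [
        (abs(a["mean"]), abs(b["mean"])),                                  # mean error: closer to 0
        (abs(a["rms_standardized"] - 1), abs(b["rms_standardized"] - 1)),  # closer to 1
        (a["rms"], b["rms"]),                                              # smaller is better
        (a["average_standard_error"], b["average_standard_error"]),        # smaller is better
    ]
    score_a = sum(1 for va, vb in checks if va < vb)
    score_b = sum(1 for va, vb in checks if vb < va)
    if score_a > score_b:
        return "a"
    if score_b > score_a:
        return "b"
    return "tie"

# Made-up statistics for illustration; not the values reported in this tutorial.
model_a = {"mean": 0.0001, "rms_standardized": 0.98, "rms": 0.010, "average_standard_error": 0.011}
model_b = {"mean": 0.0010, "rms_standardized": 1.20, "rms": 0.012, "average_standard_error": 0.013}
verdict = better_model(model_a, model_b)
```

A simple tally like this hides how large each difference is, so it works best alongside the Predicted, Error, Standardized Error, and Normal QQPlot tabs rather than as a replacement for them.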
You have now identified the best of two prediction surfaces, but you might want to create other types of
surfaces to support your analysis of the phenomenon and any decision you may need to make based on the
interpolated values.
In exercise 5, you will use indicator kriging to calculate the probability that a critical ozone threshold value was
exceeded and generate a final map showing all the surfaces you have created in this tutorial.
If you closed your previous ArcMap session, start the program again and open Ozone
Prediction Map.mxd.
2.
On the Geostatistical Analyst toolbar, click Geostatistical Analyst > Geostatistical Wizard.
3.
4.
5.
Click the Attribute drop-down arrow and click the OZONE attribute.
6.
Click Next.
7.
Click Indicator Kriging; notice that Probability Map is selected as the output type.
8.
Make sure that the Threshold is set to Exceed, then set the Primary Threshold Value to 0.09.
9.
Click Next.
10.
11.
Change Anisotropy to True to account for the directional nature of the data.
The blue lines show the estimated semivariogram models in different directions.
12.
13.
The blue line represents the threshold value (0.09 ppm). Points to the left of the blue line have an
indicator-transform value of 0, whereas points to the right have an indicator-transform value of 1.
14.
Click to select a row in the table with an indicator value of 0. The selected point will be shown in
green on the scatterplot, to the left of the blue threshold line.
In the case of the selected row in the figure below, the prediction is exactly the same as the
indicator value.
The Measured and Indicator columns display the actual and transformed values for each sample
location. The indicator prediction values can be interpreted as the probability of exceeding the
threshold. The indicator prediction values are calculated using the semivariogram modeled from
the binary (0, 1) data created by indicator transformations of your original data. Cross-validation sequentially omits each point and calculates an indicator prediction value for it. For
example, the highest measured value is 0.121. If this location had not actually been measured,
the indicator kriging model shows about a 78 percent chance that the ozone value at that
location was above the 0.09 ppm threshold.
15.
16.
It is clear from the map that in central California, the probability that ozone concentrations
exceeded the threshold of 0.09 ppm is high.
17.
18.
Drag the Indicator Kriging layer to reposition it between the O3_Sep06_3pm and Trend
Removed layers.
19.
20.
Click the Extent tab and, under Set the extent to, specify the rectangular extent of ca_outline.
21.
Click Apply.
22.
23.
Uncheck the Filled Contours option and check the Contours option.
24.
Click Contours so that the symbology for contours (lines) appears. Choose a green to blue color
ramp.
25.
Click the Classify button. In the Classification dialog box, change Method to Equal Interval and
Classes to 5.
26.
27.
As a final touch to the map, you can add the ca_hillshade dataset to the project (from
C:\ArcGIS\ArcTutor\Geostatistical Analyst). It should be added to the bottom of the table of
contents and depicted using a white to black color ramp.
28.
Right-click the Trend Removed layer, click Properties, then click the Display tab.
29.
30.
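The indicator transformation and the probability interpretation from steps 13 and 14 can be sketched in Python. The helpers indicator_transform and indicator_prediction are hypothetical. Values equal to the threshold are treated as not exceeding it, which is an assumption. The kriging weights are supplied by hand rather than solved from an indicator semivariogram. When the weights sum to 1, the prediction is a weighted average of 0s and 1s, which is why it can be read as a probability. Because kriging weights can be negative, a real indicator kriging prediction can fall slightly outside the range 0 to 1.

```python
from typing import List, Sequence

def indicator_transform(values: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical helper (Exceed mode): 1 if a value is above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def indicator_prediction(weights: Sequence[float], indicators: Sequence[int]) -> float:
    """Hypothetical helper: weighted average of 0/1 indicators; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * i for w, i in zip(weights, indicators))

# Hypothetical measurements (ppm) and hand-picked weights, for illustration only.
measured = [0.121, 0.085, 0.095, 0.070]
indicators = indicator_transform(measured, 0.09)  # [1, 0, 1, 0]
probability = indicator_prediction([0.4, 0.1, 0.3, 0.2], indicators)  # about 0.7
```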
In this tutorial, you have been introduced to the Geostatistical Wizard, data exploration using the ESDA tools,
ordinary kriging (using default parameter values and more refined options) to predict ozone values across
California, and indicator kriging to map the probability that ozone concentrations exceeded a critical threshold
value. Many other interpolation methods are offered in the Geostatistical Wizard, and several are also offered
in geoprocessing tools that can be used in ModelBuilder.