Spatial Analysis with ArcGIS Pro®
STUDENT EDITION
Copyright © 2019 Esri
All rights reserved.
The information contained in this document is the exclusive property of Esri. This work is
protected under United States copyright law and other international copyright treaties and
conventions. No part of this work may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying and recording, or by any information storage or
retrieval system, except as expressly permitted in writing by Esri. All requests should be sent to
Attention: Director, Contracts and Legal, Esri, 380 New York Street, Redlands, CA 92373-8100,
USA.
Export Notice: Use of these Materials is subject to U.S. export control laws and regulations
including the U.S. Department of Commerce Export Administration Regulations (EAR). Diversion
of these Materials contrary to U.S. law is prohibited.
Commercial Training Course Agreement Terms: The Training Course and any software,
documentation, course materials or data delivered with the Training Course is subject to the
terms of the Master Agreement for Products and Services, which is available at
http://www.esri.com/~/media/Files/Pdfs/legal/pdfs/ma-full/ma-full.pdf. The license rights in
the Master Agreement strictly govern Licensee's use, reproduction, or disclosure of the
software, documentation, course materials and data. Training Course students may use the
course materials for their personal use and may not copy or redistribute for any purpose.
Contractor/Manufacturer is Esri, 380 New York Street, Redlands, CA 92373-8100, USA.
Esri Trademarks: Esri trademarks and product names mentioned herein are subject to the terms
of use found at the following website: http://www.esri.com/legal/copyright-trademarks.html.
Other companies and products or services mentioned herein may be trademarks, service marks or
registered marks of their respective mark owners.
Table of Contents
Esri resources for your organization.............................................................................................ix
Course introduction
Course introduction .................................................................................................................... 1
Course goals ................................................................................................................................ 2
Installing the course data............................................................................................................. 2
Training Services account credentials .......................................................................................... 3
Icons used in this workbook ........................................................................................................ 4
Understanding the ArcGIS platform ............................................................................................ 5
3 Proximity analysis
Lesson introduction .................................................................................................................. 3-1
Using proximity in everyday life................................................................................................ 3-2
Choosing the best distance measure ....................................................................................... 3-3
Ways to measure distance ........................................................................................................ 3-4
Outputs of proximity analysis ................................................................................................... 3-5
Buffering using different distance measures............................................................................. 3-7
Measuring cost ......................................................................................................................... 3-8
Exercise 3: Analyze proximity ................................................................................................... 3-9
Prepare the project ........................................................................................................... 3-10
Select features based on distance .................................................................................... 3-11
Create proximity zones ..................................................................................................... 3-12
Determine the closest store to each customer ................................................................. 3-16
Add and calculate a field .................................................................................................. 3-17
Create desire lines............................................................................................................. 3-18
Create drive-time polygons .............................................................................................. 3-20
Create a distance surface .................................................................................................. 3-23
Lesson review.......................................................................................................................... 3-25
Answers to Lesson 3 questions............................................................................................... 3-26
4 Overlay analysis
Lesson introduction .................................................................................................................. 4-1
Introducing overlay ................................................................................................................... 4-2
How overlay works.................................................................................................................... 4-3
Overlay tools............................................................................................................................. 4-5
Choosing the appropriate tool ................................................................................................. 4-7
Exercise 4: Perform overlay analysis ......................................................................................... 4-8
Make selections based on location ..................................................................................... 4-9
Overlay customers and driving times using the Intersect tool.......................................... 4-11
Overlay customers and driving times using the Identity tool ........................................... 4-14
Remove customers within 15 miles ................................................................................... 4-16
Summarize stream length in a watershed ......................................................................... 4-17
Calculate the amount of each land-use classification ....................................................... 4-19
Lesson review.......................................................................................................................... 4-21
Answers to Lesson 4 questions............................................................................................... 4-22
Add the XY Table To Point tool ........................................................................................... 5-8
Add the Near tool ............................................................................................................... 5-9
Add the Make Feature Layer tool ..................................................................................... 5-10
Add the XY To Line tool .................................................................................................... 5-11
Run the model................................................................................................................... 5-12
Automating and sharing models ............................................................................................ 5-14
Exercise 5B: Use a model to process multiple inputs ............................................................. 5-16
Prepare ArcGIS Pro and make a copy of a model............................................................. 5-17
Add an iterator to a model ............................................................................................... 5-18
Set model parameters....................................................................................................... 5-20
Change model element labels .......................................................................................... 5-24
Lesson review.......................................................................................................................... 5-28
Answers to Lesson 5 questions............................................................................................... 5-29
7 Suitability modeling
Lesson introduction .................................................................................................................. 7-1
What is suitability modeling?.................................................................................................... 7-2
Suitability modeling workflow................................................................................................... 7-3
Evaluating analysis criteria ........................................................................................................ 7-4
Choosing vector or raster overlay............................................................................................. 7-5
Deriving surfaces from other sources ....................................................................................... 7-6
Raster functions and geoprocessing tools................................................................................ 7-7
Levels of measurement ............................................................................................................. 7-8
Transforming values to a common scale................................................................................. 7-10
Exercise 7A: Build a model and classify data to a common scale .......................................... 7-12
Prepare a project and set environments ........................................................................... 7-13
Create a model ................................................................................................................. 7-14
Add input layers and Euclidean Distance tools ................................................................ 7-14
Add the Slope tool and set parameters............................................................................ 7-17
Reclassify land-use values ................................................................................................. 7-17
Rescale the roads distance surface ................................................................................... 7-19
Rescale the stream distance surface ................................................................................. 7-20
Rescale the slope surface.................................................................................................. 7-21
Run the model................................................................................................................... 7-22
Types of raster overlay ............................................................................................................ 7-24
The Raster Calculator.............................................................................................................. 7-26
Locating and analyzing results................................................................................................ 7-27
Exploring data sources ........................................................................................................... 7-29
Exercise 7B: Perform suitability modeling .............................................................................. 7-30
Overlay input rasters ......................................................................................................... 7-31
Create regions................................................................................................................... 7-34
Lesson review.......................................................................................................................... 7-36
Answers to Lesson 7 questions............................................................................................... 7-37
8 Spatial statistics
Lesson introduction .................................................................................................................. 8-1
Spatial patterns......................................................................................................................... 8-2
What are spatial statistics?........................................................................................................ 8-3
Types of spatial statistics........................................................................................................... 8-5
Interpreting inferential statistics................................................................................................ 8-7
Descriptive versus inferential .................................................................................................... 8-9
Spatial statistics tools.............................................................................................................. 8-12
Clusters and outliers ............................................................................................................... 8-13
Clustering tools....................................................................................................................... 8-15
Exercise 8A: Use spatial statistics to explore data .................................................................. 8-17
Prepare ArcGIS Pro ........................................................................................................... 8-18
Locate directional trends in data....................................................................................... 8-19
Run the Average Nearest Neighbor tool .......................................................................... 8-20
Run the Spatial Autocorrelation tool................................................................................. 8-22
Run the Hot Spot Analysis tool ......................................................................................... 8-23
Create a density surface.................................................................................................... 8-25
Exercise 8B: Perform clustering and outlier analysis............................................................... 8-27
Prepare the project ........................................................................................................... 8-28
Perform density-based clustering ..................................................................................... 8-28
Perform optimized hot spot analysis................................................................................. 8-31
Perform optimized outlier analysis .................................................................................... 8-34
Lesson review.......................................................................................................................... 8-37
Answers to Lesson 8 questions............................................................................................... 8-38
9 Space-time analysis
Lesson introduction .................................................................................................................. 9-1
Incorporating time into your analysis........................................................................................ 9-2
Temporal analysis...................................................................................................................... 9-3
Exercise 9A: Explore data ......................................................................................................... 9-5
Use a chart to explore data ................................................................................................. 9-6
Space-time analysis .................................................................................................................. 9-8
Emerging hot spot analysis..................................................................................................... 9-11
Space-time analysis workflow ................................................................................................. 9-13
Exercise 9B: Explore space-time pattern mining tools ........................................................... 9-15
Explore data using charts.................................................................................................. 9-16
Create a space-time cube ................................................................................................. 9-18
Run the Emerging Hot Spot Analysis tool......................................................................... 9-19
Visualize a space-time cube in 3D..................................................................................... 9-21
Lesson review.......................................................................................................................... 9-24
Answers to Lesson 9 questions............................................................................................... 9-25
10 Regression analysis
Lesson introduction ................................................................................................................ 10-1
Explaining spatial patterns...................................................................................................... 10-2
Causes of spatial patterns....................................................................................................... 10-3
What is regression?................................................................................................................. 10-4
Regression equation ............................................................................................................... 10-6
OLS regression........................................................................................................................ 10-9
Checkpoint ........................................................................................................................... 10-11
Interpreting OLS diagnostics ................................................................................................ 10-12
Six OLS checks...................................................................................................................... 10-14
OLS reports........................................................................................................................... 10-17
Exploratory regression .......................................................................................................... 10-19
Exercise 10: Find a properly specified regression model ..................................................... 10-21
Set up ArcGIS Pro ........................................................................................................... 10-22
Perform exploratory data analysis................................................................................... 10-22
Use the Generalized Linear Regression tool to test for higher spending factors............ 10-24
Evaluate the spatial output from the GLR tool................................................................ 10-25
Create a scatter plot matrix............................................................................................. 10-27
Run the GLR tool on multiple dependent variables........................................................ 10-30
Perform OLS checks ........................................................................................................ 10-30
Lesson review........................................................................................................................ 10-35
Enriching data for analysis .................................................................................................... 10-36
Answers to Lesson 10 questions........................................................................................... 10-37
12 Geostatistical interpolation
Lesson introduction ................................................................................................................ 12-1
Deterministic interpolation ..................................................................................................... 12-2
Geostatistical interpolation..................................................................................................... 12-4
Kriging .................................................................................................................................... 12-5
Geostatistical workflow ........................................................................................................... 12-6
Exercise 12: Use the Geostatistical Wizard to perform kriging............................................... 12-9
Set up the ArcGIS Pro project ......................................................................................... 12-10
Explore the data distribution .......................................................................................... 12-10
Perform kriging using the Geostatistical Wizard ............................................................. 12-12
Evaluate predicted value and error ................................................................................. 12-15
Empirical Bayesian kriging (EBK) .......................................................................................... 12-17
Lesson review........................................................................................................................ 12-20
13 3D analysis
Lesson introduction ................................................................................................................ 13-1
When to use 3D analysis......................................................................................................... 13-2
3D analysis examples.............................................................................................................. 13-3
Interactive 3D analysis ............................................................................................................ 13-6
Exercise 13: Perform 3D analysis ............................................................................................ 13-8
Set up the project ............................................................................................................. 13-9
Create sight lines .............................................................................................................. 13-9
Perform line-of-sight analysis .......................................................................................... 13-11
Create a 3D buffer........................................................................................................... 13-13
Intersect 3D features ....................................................................................................... 13-15
Lesson review........................................................................................................................ 13-18
Answers to Lesson 13 questions........................................................................................... 13-19
Appendices
Appendix A: Esri data license agreement ............................................................................... A-1
Appendix B: Answers to lesson review questions ....................................................................B-1
Appendix C: Additional resources........................................................................................... C-1
Esri resources
Take advantage of these resources to develop ArcGIS software skills, discover applications of
geospatial technology, and tap into the experience and knowledge of the ArcGIS community.
Esri publications: Access online editions of ArcNews, ArcUser, and ArcWatch at esri.com/esri-news/publications
Esri Press
Esri Press publishes books on the science and technology of GIS in numerous public and private
sectors. esripress.esri.com
Esri resources (continued)
GIS bibliography
A comprehensive index of journals, conference proceedings, books, and reports related to GIS,
including references and full-text materials. gis.library.esri.com
GeoNet
Join the online community of GIS users and experts. esri.com/geonet
Esri events
Esri conferences and user group meetings offer a great way to network and learn how to achieve
results with ArcGIS. esri.com/events
Esri Videos
View an extensive collection of videos by Esri leaders, event keynote speakers, and product
experts. youtube.com/user/esritv
GIS Dictionary
This term browser defines and describes thousands of GIS terms. http://support.esri.com/other-resources/gis-dictionary
Course introduction
Welcome to Spatial Analysis with ArcGIS Pro. In this course, you will learn essential concepts and
a standard workflow that you can apply to any spatial analysis project. You will work with various
ArcGIS tools to explore, analyze, and produce reliable information from data.
• Overlay combines features and attributes, and you can apportion numeric attributes for split
features.
• Overlay can be performed on vector or raster data; each uses different tools.
This course will help you understand GIS analysis, which helps people answer questions about
their data and the spatial relationships within the data. It teaches a standard GIS analysis workflow
that can be applied to any analysis question.
After learning this workflow, you will follow it while performing the four types of analysis to answer real-world questions.
Course goals
After completing this course, you will be able to perform the following tasks:
• Quantify spatial patterns using spatial statistics and analyze change over time to identify
emerging hot spots.
• Use interpolation and regression analysis to explain why patterns occur and predict how
patterns will change.
• Prepare data and choose appropriate tools and settings for an analysis.
• Examine features and distribution patterns within an area of interest and identify optimal
locations using 2D and 3D analysis tools.
DISCLAIMER: Some courses use sample scripts or applications that are supplied
either on the DVD or on the Internet. These samples are provided "AS IS," without
warranty of any kind, either express or implied, including but not limited to, the
implied warranties of merchantability, fitness for a particular purpose, or
noninfringement. Esri shall not be liable for any damages under any theory of law
related to the licensee's use of these samples, even if Esri is advised of the possibility
of such damage.
Training Services account credentials
Your instructor will provide a temporary account and group to use during class.
Password: __________________________________________________________________
After completing this course, you will need your own account to perform course exercises that
require signing in to ArcGIS Online. The sign-in steps will vary based on your account type.
Icons used in this workbook
Estimated times provide guidance on approximately how many minutes an
exercise will take to complete.
Understanding the ArcGIS platform
ArcGIS is a Web GIS platform that you can use to deliver your authoritative maps, apps,
geographic information layers, and analytics to wider audiences.
• Individuals interact with ArcGIS through apps running on desktops, in web browsers, and on
mobile devices.
• Organizations share their authoritative geospatial data, maps, and tools as web services to a
central portal that supports self-service mapping, analytics, and collaboration. Organizations
deploy portals in the cloud, in their own infrastructure, or in both.
• Individuals use ArcGIS apps and portals to find authoritative content, create web maps and
web apps, perform analytics, and share results.
• Organizations leverage the information shared by individuals to make more informed
decisions, communicate with partners and stakeholders, and engage the public.
• A portal is a collaborative space where users can create, analyze, organize, store, and share
geospatial content. Within ArcGIS there are two ways to implement a portal: use ArcGIS
Online or deploy ArcGIS Enterprise.
1 Building a foundation for spatial analysis
Welcome to Spatial Analysis with ArcGIS Pro, a course that will help you use spatial analysis to
make important decisions in your work. This lesson introduces spatial analysis and presents a
workflow that you can apply to any analytical project and data. You will also learn about the
various types of spatial analysis, many of which you will use throughout the course.
Topics covered
Lesson 1
When you look at a map, you think about the features and relationships that you see. If the map
illustrates wildlife habitats, you might conclude that some animals chose particular areas for forest
cover or proximity to water. If the map illustrates fire risk areas, you might conclude that the fire
risk comes from a certain vegetation type, the lack of rainfall, wind exposure, or a combination of
them all. Based on what you see in a map, you draw conclusions that reflect your understanding
of spatial data.
But sometimes the map's visual elements are not enough for you to understand what is occurring
or why.
Can you make any assumptions based on the following crime locations?
Figure 1.1. Points on the left show where crimes have occurred. Results of spatial analysis on the right show hot and
cold spots of crime incidents.
When you cannot rely solely on a map's visual elements to answer questions, you can perform
spatial analysis. Spatial analysis is the process of examining the locations, attributes, and
relationships of features in spatial data to help gain a better understanding or answer questions.
In this short video, Lauren Bennett, a product engineer on the Esri spatial analysis team, discusses
the benefits of spatial analysis.
Spatial analysis can provide numerous benefits, such as reduction of costs and increases in
efficiency, productivity, and revenue. Spatial analysis sets true GIS software, including ArcGIS Pro,
apart from other map-viewing applications.
You have probably performed some type of nonspatial analysis before. When you add a
geographical element to your questions and subsequent decisions, you make analysis more
complex by adding spatial properties like distance and direction. Spatial properties have a
significant effect on the analytical methods that you use to solve a particular problem.
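For example, the straight-line distance and the grid ("city block") distance between the same two points can differ noticeably, which is one reason the choice of distance measure matters in spatial analysis. A minimal pure-Python sketch with made-up coordinates (illustrative only; ArcGIS Pro tools compute these measures against real feature data):

```python
import math

def euclidean(p, q):
    # Straight-line ("as the crow flies") distance between two planar points
    return math.hypot(q[0] - p[0], q[1] - p[1])

def manhattan(p, q):
    # Grid ("city block") distance: the sum of the offsets along each axis
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

store = (0.0, 0.0)
customer = (3.0, 4.0)

print(euclidean(store, customer))  # 5.0
print(manhattan(store, customer))  # 7.0
```

The same pair of points is 5 units apart in a straight line but 7 units apart along a grid, so an analysis that buffers or selects by distance can return different features depending on the measure chosen.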
To help classify spatial properties, analytical problems are categorized into six groups. Each group
reflects a set of related questions, described in the following table.
• Understand where: At the most rudimentary level, you are lost if you do not know where you are or what is around you. Asking "Where?" is the first question in spatial analysis.
• Measure size, shape, and distribution: You may want to describe an object in terms of its geometry, such as area, perimeter, or length. You may also want to describe the distribution of several objects.
• Determine how places are related: You may need to describe and quantify the relationships among features to determine what is near, what is within, or how something overlaps in space and time.
• Find the best locations and paths: You may need to find the best route to travel, or the best location to build a new storefront or station.
• Detect and quantify patterns: You may need to find patterns in data, such as hot spots or outliers. You may also need to determine how those patterns change over time.
• Make predictions: You may need to determine how things may appear in the future or how crime or fire danger will spread.
You can use different types of spatial analysis to answer questions. Most real-world GIS analyses use several types at the same time to solve spatial problems.
• Proximity: Determines which features are close to other features, the exact distance between features, or which features are within a certain distance of other features.
• Network: Determines solutions for complex routing problems to help locate the best, most cost-effective path for delivering resources.
The analysis workflow provides a framework for you to plan, organize, execute, and share your
spatial analysis project. The analysis process may not always be linear. Sometimes, after the initial
examination of the analysis results, you may have more questions that require another smaller,
more focused analysis before you can answer the initial question.
Figure 1.3. The spatial analysis workflow contains standard steps that you can apply to any analysis.
1. Ask questions: Determine the questions and the criteria, which in turn determine the data.
3. Analyze and model: Choose the appropriate methods and tools, and run the analysis.
4. Interpret results: View and interpret results to identify flaws or errors in the process.
7. Make decisions: Use analysis results to answer the initial question and make decisions.
You have learned about various spatial analysis tools and a standard workflow. You will apply what
you have learned to determine the possible ways to solve a spatial problem.
1. How can spatial analysis help identify the best location for the distribution center?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
2. Which types of analysis tools would you use to locate a suitable site for the distribution
center?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Lesson review
2. What helps you choose the appropriate datasets for your analysis?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 1 questions
2. Which types of analysis tools would you use to locate a suitable site for the distribution center?
• Proximity
• Overlay
• Network for routing and delivery
2 Planning and preparing for spatial analysis
As you progress through the spatial analysis workflow, you will see that exploring your data
can be a time-consuming process. It takes time to examine data quality and completeness,
and to determine whether the analysis of your data will yield the results you want. In this
lesson, you will explore data and learn how to modify your existing datasets efficiently to
ensure optimal analysis results.
Topics covered
Environment settings
Lesson 2
Data properties
Preparing to perform analysis involves identifying criteria that are useful for answering your analysis questions. These criteria include metadata properties, which make your data easier to use and more effective. Next, you will review several useful metadata properties.
Planning and preparing for spatial analysis
When you perform analysis with raster data, you will consider things that you do not normally
consider with vector data, such as cell size, masks, and NoData values. You can work with all the
raster data considerations mentioned here as geoprocessing environment settings.
Cell size
Cell size refers to the ground dimensions of a single cell in a raster, measured in map units. Cell
size is determined at the point of data capture based on the scale and device. You must
determine the best cell size for your data and analysis as it is often a parameter in analysis tools.
With rasters of varying resolutions, your analysis results are only as accurate as the lowest-resolution dataset.
Figure 2.1. Each raster has a specific cell size. Cell size is often used synonymously with pixel size or resolution. The
larger the cell size, the less detail, or resolution; the smaller the cell size, the more detail, or resolution.
Figure 2.2. The mask is the Ohio state boundary. The result of processing with that mask includes only data within
the mask.
If rasters for your analysis contain NoData values, you can set ArcGIS Pro to ignore
those cells or to estimate values for them.
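The mask and NoData behavior can be sketched in plain Python. This is a toy illustration of the concept, not how ArcGIS Pro is implemented: cells outside the mask, or flagged as NoData, are simply excluded from the calculation.

```python
# Toy sketch of mask and NoData handling on a small raster
# (illustrative only; ArcGIS Pro handles this internally).
NODATA = -9999

raster = [
    [4, 7, NODATA],
    [2, 5, 9],
    [NODATA, 3, 6],
]
# Mask: True = inside the area of interest (e.g., the Ohio boundary)
mask = [
    [True, True, False],
    [True, True, False],
    [False, True, False],
]

def masked_mean(raster, mask):
    """Mean of cells inside the mask, ignoring NoData cells."""
    values = [
        cell
        for row, mrow in zip(raster, mask)
        for cell, inside in zip(row, mrow)
        if inside and cell != NODATA
    ]
    return sum(values) / len(values)

print(masked_mean(raster, mask))  # mean of 4, 7, 2, 5, 3
```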
Extent: The minimum bounding rectangle that defines an area of analysis. Extent could be
another layer or the current extent of the map.
Resampling: The process of aggregating or interpolating new cell values when transforming
rasters to a new coordinate space or cell size.
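As a concrete illustration of aggregating cell values, here is a minimal sketch that resamples a raster to a coarser cell size by averaging each 2x2 block. It is a simplification: ArcGIS Pro offers several resampling methods (such as nearest neighbor and bilinear), and this block-mean approach only stands in for the general idea.

```python
# Simplified resampling sketch: aggregate a raster to a cell size
# twice as coarse by averaging each 2x2 block of cells.
def aggregate_mean(raster, factor=2):
    rows, cols = len(raster), len(raster[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [
                raster[rr][cc]
                for rr in range(r, min(r + factor, rows))
                for cc in range(c, min(c + factor, cols))
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

fine = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
print(aggregate_mean(fine))  # [[1.0, 2.0], [3.0, 4.0]]
```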
Environment settings
What are environment settings and why should you use them?
Environment settings are background settings that directly affect tool outputs. Environment
settings help ensure consistent analysis results.
Exercise 2 (35 minutes)
When you have identified your analysis question and criteria, you should have a good idea of the
required data. The second step of the analysis workflow is to explore data that you have and, if
necessary, change or acquire new data to replace or supplement the existing data.
e Type the organizational account user name and password provided to you by your instructor
and click Sign In.
The map displays customer locations in the Boston, Massachusetts, area. You will use the
customer locations for several different analyses in the course.
h In the Catalog pane, expand Folders, and then expand SNAP to see the course folder
structure.
The course data is stored in several folders and geodatabases, but you will create outputs in the
project geodatabase. By default, the output workspace is already set to the project geodatabase
in the environment settings.
The scratch workspace differs from the current workspace in that it is designed for
output data that you do not want to maintain. The primary purpose of the scratch
workspace is for use in ModelBuilder and Python scripts.
j Click OK.
The Customers layer is stored in NAD 1927 UTM Zone 19N. Your organization has standardized on the NAD 1983 StatePlane Massachusetts FIPS 2001 coordinate system. For analysis, it is best to store all data in the same coordinate system to ensure consistent results.
d Click OK.
Next, you will find the Project tool to reproject data. All licensed geoprocessing tools are available
in the Geoprocessing pane. Some commonly used tools are located in the Analysis Gallery.
• In the Coordinate system dialog box, type 1983 StatePlane Massachusetts and
press Enter.
• Expand Projected Coordinate System, State Plane, and NAD 1983 (Meters), and
then click NAD 1983 StatePlane Massachusetts FIPS 2001 (Meters) and click OK.
• Geographic Transformation: Use the default setting.
i Click Run.
j In the Contents pane, double-click the first Customers layer to open its properties.
The updated coordinate system information indicates that your data is in the correct coordinate
system for analysis. The coordinate system of the map is still set to NAD 1927, so you will change
that.
o In the search field, type 1983 StatePlane Massachusetts and press Enter.
p Under Projected Coordinate System, expand State Plane and NAD 1983 (Meters), and then
select NAD 1983 StatePlane Massachusetts FIPS 2001 (Meters).
q Click OK.
r In the Contents pane, right-click the second Customers layer (the one in NAD27) and choose
Remove.
You will use the x- and y-coordinates to create the stores spatially.
d Search for xy, and find and open the XY Table To Point tool.
f Click Run.
g In the Contents pane, under the BostonStores layer, click the symbol and assign the Square 1
symbology to it.
The stores are now part of your geodatabase as their own feature class.
BostonStores contains the same attributes as the original Stores table. However, some attributes, such as the name and address, are not present.
The StoresTable contains address information and the number of employees for each store.
1. How can you join the attributes from StoresTable to the BostonStores feature class?
__________________________________________________________________________________
b In the Contents pane, right-click BostonStores, point to Joins And Relates, and choose Add
Join.
d Click Run.
The attributes from StoresTable are added into BostonStores. Joins are stored within the project,
but you could also export the layer that the join is based on into the geodatabase. You will keep
the join as a virtual join for your analysis so that you can accomplish the same tasks without
creating another feature class.
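Conceptually, the join above matches rows on a shared key field. The sketch below shows that idea with plain Python dictionaries; the field names and sample values are invented for illustration and do not come from the course data.

```python
# Sketch of an attribute join: match BostonStores rows to
# StoresTable rows on a shared store ID field (sample data is invented).
boston_stores = [
    {"store_id": 1, "x": -71.06, "y": 42.36},
    {"store_id": 2, "x": -71.10, "y": 42.34},
]
stores_table = {
    1: {"name": "Downtown", "employees": 12},
    2: {"name": "Back Bay", "employees": 8},
}

# Merge matching rows; stores with no match keep only their own fields.
joined = [
    {**feature, **stores_table.get(feature["store_id"], {})}
    for feature in boston_stores
]
print(joined[0]["name"], joined[0]["employees"])  # Downtown 12
```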
a From the Insert tab, in the Project group, click Import Map.
The map contains streams and land-use classifications for Ohio and Indiana, hydrologic unit
boundaries for several states, and the state boundary of Ohio. Your analysis will focus on Ohio, so
you will use the Ohio boundary layer to extract other features to narrow down the data.
a In the Contents pane, turn off Region5_HUC8 and NLCD_OhioInd, and turn on
OhioIndStreams.
The Ohio state boundary will act as a "cookie cutter" to extract only the streams that are within it.
You can also interactively draw the clip features for this parameter using the
pencil icon.
• Output Feature Class: OhioStreams
d Click Run.
You have extracted the streams for Ohio that you will use for overlay analysis. Next, you will
extract the hydrologic unit boundaries for Ohio.
All tools that you have run and their associated parameters are saved in the Geoprocessing history
in each project. You can see the tools that you used in this exercise, including the Clip tool.
You can quickly modify a parameter and rerun a tool from its history.
a In the Contents pane, make Ohio, NLCD_OhioInd, and the basemap the only visible layers.
It is important to know the cell size of the input raster so that the output is the same.
You can set cell size as an environment setting.
In the Raster Analysis section, you can see that Cell Size is set to Maximum Of Inputs. This setting ensures that a smaller, higher-resolution cell size is not applied to the output raster, which would falsely imply higher-quality data. For the data that you are working with, leaving the cell size environment as it is set will result in output rasters with a 30-meter cell size.
You want to extract the cells that fall within Ohio, but the Clip tool only works on vector data.
When you want to extract and analyze raster data, you must use Spatial Analyst tools. Even
though you are extracting cells in a raster dataset, you can use either a vector or raster dataset as
the mask.
g In the Geoprocessing pane, click the Back button, and then click the Toolboxes tab, if
necessary.
j Click Run.
The output raster does not have the standard NLCD symbology, so you will import it from a layer
file.
m In the top-right corner of the Symbology pane, click the Options menu button and choose
Import.
q Close the Ohio map, save your project, and keep ArcGIS Pro open.
Lesson review
1. What are some things to consider when preparing data for analysis?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 2 questions
3 Proximity analysis
Proximity analysis helps answer questions about the distances between features. It helps you
understand details about features in close proximity to one another and features that are
distant from one another.
ArcGIS Pro provides numerous proximity analysis tools that are designed to help answer
various questions about proximal relationships. You will learn how ArcGIS Pro measures
distance, the various data types on which you can use proximity tools, and how to apply
several tools to answer spatial questions.
Topics covered
Measuring distance
Determining cost
Lesson 3
How does proximity play a role in daily life? People think spatially every day. Many of your spatial
thoughts pertain to proximity. How far away is the store? What is the best route to work?
1. What is the first thing that you do when figuring out how to get to a new place?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Proximity analysis
ArcGIS Pro tools calculate distance in four ways: cost, Euclidean, geodesic, and network.
• Euclidean: Measures straight-line (planar) distance between features.
• Geodesic: Measures distance along the curved surface of the earth.
• Cost: Measures distance as the accumulated impedance of moving across a surface, such as travel time.
• Network: Measures distance along a linear network, such as roads.
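To make the Euclidean versus geodesic distinction concrete, the sketch below compares planar distance with the haversine formula, a common spherical approximation of geodesic distance. This is a simplification: ArcGIS Pro computes geodesic distance on an ellipsoid, and the coordinates used here are approximate.

```python
import math

def euclidean(p, q):
    """Straight-line (planar) distance between two x,y points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Approximate geodesic distance on a spherical earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Planar distance in projected map units (e.g., meters)
print(euclidean((0, 0), (3, 4)))  # 5.0
# Geodesic distance between (approximate) Boston and Redlands, in km
print(round(haversine_km(42.36, -71.06, 34.06, -117.19)))
```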
The various ways to measure distance in proximity tools are all applicable in certain situations and
with certain data. For each example, choose the best way to measure distance in proximity
analysis.
1. Which distance measurement would you use when you run the Buffer tool on small-extent
(large-scale) data?
_____________________________________________________________________________________
2. Which distance measurement would you use when you run the Buffer tool on large-extent
(small-scale) data?
_____________________________________________________________________________________
3. Which proximity distance measurement is most suitable for finding the best path for a
pipeline?
_____________________________________________________________________________________
The proximity analysis tools in ArcGIS Pro create different kinds of outputs—either an expanded
area or a numeric value.
Area expanding
Some proximity analysis tools create polygon or raster data that represents a specific distance or a
proximity zone, used for allocation.
Figure 3.1. Buffer and Thiessen polygons are examples of area-expanding proximity tools.
• Create Thiessen Polygons: Creates proximity zones around points, indicating that all the area within a zone is closer to the point within it than to any other point.
• Euclidean Allocation: Similar to Create Thiessen Polygons, but the result is a raster dataset that can be used for raster analysis.
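Both tools rest on the same allocation idea: every location belongs to its closest input point. The sketch below assigns each customer to its nearest store with a simple distance search; it is a conceptual stand-in, not the actual polygon or raster construction, and the coordinates are invented.

```python
import math

# Hypothetical store coordinates in projected map units
stores = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}

def nearest_store(x, y):
    """Return the store whose location is closest to (x, y)."""
    return min(stores, key=lambda s: math.dist(stores[s], (x, y)))

# Each customer falls in the "zone" of its nearest store, which is
# exactly what a Thiessen polygon around that store represents.
customers = [(1.0, 1.0), (9.0, 1.0), (5.0, 7.0)]
print([nearest_store(x, y) for x, y in customers])  # ['A', 'B', 'C']
```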
Numeric value
The numeric values returned by some tools include the distance to other features and, optionally, the x,y coordinates of the closest feature.
Figure 3.2. The Near tool returns a numeric value of the closest feature's ID, distance, and coordinates.
• Near: Determines the closest features in two different layers and appends the closest feature's ID and distance to the input table. Adding the closest feature's coordinates is optional.
• Generate Near Table: Determines the distances between features, within a specified search radius, and creates a table of distances.
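The core logic of Generate Near Table can be sketched as a nested loop: for each input feature, record every candidate feature within the search radius along with its ID and distance. The sketch below is a toy version; the field names mirror the tool's output, but the data is invented.

```python
import math

def generate_near_table(inputs, candidates, search_radius):
    """For each input point, list candidate points within the radius,
    with their IDs and distances (a toy Generate Near Table)."""
    rows = []
    for in_id, in_pt in inputs.items():
        for near_id, near_pt in candidates.items():
            d = math.dist(in_pt, near_pt)
            if d <= search_radius:
                rows.append({"IN_FID": in_id, "NEAR_FID": near_id,
                             "NEAR_DIST": round(d, 2)})
    return rows

customers = {1: (0, 0), 2: (20, 0)}
stores = {"A": (3, 4), "B": (50, 50)}
# Only customer 1 has a store within 10 map units
print(generate_near_table(customers, stores, search_radius=10))
```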
When buffering at a small scale, such as a world scale, you should use geodesic measurements.
Geodesic measurements are the only true distance; they consider the curvature of the earth,
unlike other proximity metrics like Euclidean or cost. When you buffer at a large scale, you can use
planar measurements because your data is probably in a projected 2D coordinate system.
It is important to work with a projection that properly preserves distance at the given scale.
Figure 3.3. Shown is the difference between straight-line and geodesic distance measurements used for
5,000-kilometer and 10,000-kilometer buffers around North Korea.
Measuring cost
Cost is another way in which distance is measured in ArcGIS Pro. Cost is the amount of
unfavorable impact or impedance associated with moving across a geographic area. A common
example of cost is time; it may cost more time to travel one route than another.
You can analyze cost using the Network Analyst extension. Network Analyst allows you to assess
cost over a linear network, such as roads. You can use Network Analyst to determine the best
routes for delivery companies to deliver packages, determine driving times, and allocate
resources.
Figure 3.4. You can analyze cells in a raster dataset to create a least-cost path for transporting resources. You can
also use Network Analyst to find the best route or driving times from specific locations.
You can also analyze surfaces to create a cost distance or cost-path surface. For example, assume
that you have an elevation surface and want to go from point A to point B. The higher the
elevation, the higher the cost of traveling that cell.
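The cost-distance idea can be illustrated with a small grid and Dijkstra's algorithm: each cell's value is its traversal cost, and the cheapest accumulated path between two cells avoids high-cost cells. This sketch is a simplification; the actual cost tools also handle diagonal moves, cell size, and NoData.

```python
import heapq

def least_cost(cost_grid, start, goal):
    """Cheapest accumulated cost from start to goal over a cost raster,
    moving between 4-connected neighbor cells (Dijkstra's algorithm)."""
    rows, cols = len(cost_grid), len(cost_grid[0])
    best = {start: cost_grid[start[0]][start[1]]}
    heap = [(best[start], start)]
    while heap:
        acc, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return acc
        if acc > best.get((r, c), float("inf")):
            continue  # stale entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new = acc + cost_grid[nr][nc]
                if new < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new
                    heapq.heappush(heap, (new, (nr, nc)))
    return None

# Higher values = higher travel cost (e.g., steeper elevation)
terrain = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
print(least_cost(terrain, (0, 0), (0, 2)))  # 7: going around avoids the 9s
```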
Esri Training courses: Creating Optimized Routes Using ArcGIS Pro, Creating an
Origin-Destination Cost Matrix in ArcGIS Pro, and Finding the Closest Facilities
Using ArcGIS Pro
Exercise 3 (40 minutes)
Analyze proximity
You have prepared store and customer data for Boston. You will use the data to perform proximity
analysis to determine store-customer relationships, distances, and driving times for each store
based on an online service.
b From the Catalog pane, expand Maps and open the Boston map.
If you did not finish the previous exercise, import the result map file named Ex2.mapx
from C:\EsriTraining\SNAP\Results\Ex02 to begin this exercise. Result maps are
provided for all exercises in the Results folder.
d In the Contents pane, change the name of the BostonStores layer to Stores.
You will also change the color of the Customers layer to blue.
f Click the Customers symbol, and then in the Symbology pane, click the Properties tab.
g Update the Color to a dark blue of your choice, and then click Apply.
a At the top of the Contents pane, click the List By Selection button .
b Right-click Stores and choose Make This The Only Selectable Layer.
c From the Map tab, in the Selection group, click the Select tool.
Next, you will select all customers within 5 miles of the chosen store.
g Click Run.
For all geoprocessing tools, if a selection is present in an input layer, processing takes
place only on the selected features.
You have located customers within 5 miles of a store. You could export these customers to their
own feature class for further analysis or create a selection layer in the project.
j At the top of the Contents pane, click the List By Drawing Order button .
a From the Map tab, in the Selection group, click Clear to clear the selection.
c In the Geoprocessing pane, click the Back button, and then click the Toolboxes tab, if
necessary.
You have probably used a buffer to create a zone around a feature based on a specified distance.
You will use a few other Proximity tools to analyze stores and customers. You want to create zones
around each store that contain the closest customers. The zones indicate the store that customers
will most likely travel to.
The Create Thiessen Polygons tool creates proximity zones around points: any location within a polygon is closer to that polygon's point than to any other point.
e Open the Create Thiessen Polygons tool, and set the following parameters:
f Click Run.
Your symbology may be different than what is shown in graphics throughout the
exercises.
You can see the extent of the zones created for the stores. Each customer point that falls within a
zone is closer to that store than to any other store.
2. Is there anything about the result that you notice as a potential problem?
__________________________________________________________________________________
4. How can you modify the extent of geoprocessing outputs to better suit your analysis?
__________________________________________________________________________________
Next, you will run the tool again. This time, you will modify the output extent environment setting.
g Before you run the Create Thiessen Polygons tool again, on the Analysis tab, click
Environments.
h Under Processing Extent, update the Extent to Customers, and then click OK.
j Change the StoreZones symbol to No Color and a solid black outline with a width of 2.
Hint: In the Symbology pane, from the Gallery tab, click Black Outline (2 pts).
Setting the extent environment to match the extent of the Customers layer allows the tool to incorporate all customers. You will use the StoreZones layer when you perform overlay analysis.
a In the Contents pane, make Stores, Customers, and the basemap the only visible layers.
After Latitude and Longitude, the table does not contain any additional fields.
d In the Geoprocessing pane, click the Back button, and, if necessary, click the Toolboxes tab.
Checking the Location box will add the x,y coordinates of the closest feature as
separate fields, NEAR_X and NEAR_Y.
g Click Run.
NEAR_DIST: The distance, in map units, between the input feature and the closest feature.
Some features have actual values, whereas others have a value of -1. A search distance of 10 miles was used, and -1 indicates that the feature falls outside the search radius. After you run the Near tool, you can query the Customers layer to determine which store is closest, the store's x,y coordinates, and the distance to it.
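The Miles field that you calculate in the following steps amounts to a unit conversion that preserves the -1 flag. A minimal sketch, assuming the map units are meters:

```python
# Convert NEAR_DIST values (assumed to be meters) to miles,
# leaving the -1 "no store within the search radius" flag as-is.
METERS_PER_MILE = 1609.344

def near_dist_to_miles(near_dist):
    if near_dist == -1:  # outside the 10-mile search radius
        return -1
    return round(near_dist / METERS_PER_MILE, 2)

print(near_dist_to_miles(8046.72))  # 5.0
print(near_dist_to_miles(-1))       # -1
```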
c Press Tab until you reach Data Type, and then choose Float.
g Under Rounding, lower the Decimal Places to 2, and then click OK.
h From the ribbon, on the Fields tab, in the Changes group, click Save.
j In the Customers table, right-click the Miles field and choose Calculate Field.
k In the Geoprocessing pane, for Fields, scroll down and double-click NEAR_DIST to add it to
the expression.
m Click Run.
You can get a better idea of the distance using a more standard measurement.
a From the Map tab, in the Selection group, click Select By Attributes.
b In the Geoprocessing pane, for Input Rows, ensure that Customers is chosen.
e Click Add.
f Click Run.
g In the Geoprocessing pane, click the Back button, and then search for and open the XY To
Line tool.
i Click Run.
k In the Contents pane, for Customers, change the symbol size to 3 pt.
l For StoreCust10Miles, change the Color to a medium gray and the Line Width to .75 pt.
Now, you have a good visual representation of the relationship between stores and customers.
The lines are beneficial for visualizing cannibalization (when customers visit one store even though another store is closer). You would see long lines from a store to customers even though
those customers are much closer to another store. Visualizing the lines may help businesses
identify potential stores for closing, or at least ask why customers choose to drive farther to shop
at another store.
b From the Analysis tab, in the Tools group, click Network Analysis and choose Service Area.
Network Analyst creates a Service Area group layer, adds it to the Contents pane, and adds a
Service Area tab to the ribbon. Next, you will import the stores into the Network Analyst Facilities
layer.
c From the Service Area tab, in the Input Data group, click Import Facilities.
f Click Run.
g In the Contents pane, turn off the Stores layer, and then zoom to the Facilities layer.
The stores are added into the Facilities layer. Next, you will create drive-time polygons. Driving
times give businesses an idea of which customers live within a designated time from a store.
h In the Contents pane, click the Service Area group layer to select it.
i On the Service Area tab, in the Travel Settings group, set Cutoffs to be only 15 for a 15-minute
driving time.
j In the Arrive/Depart Time group, verify that Not Using Time is selected.
k In the Output Geometry group, click Standard Precision and choose Generalized.
By default, Network Analyst uses an ArcGIS Online service to calculate driving times. If
you have your own network dataset, you can specify to use that. Using the ArcGIS
Online network dataset consumes credits.
m In the Contents pane, change the color of the Cutoff polygons to a beige, and then close the
Symbology pane.
You used Network Analyst to create drive-time polygons, which offer a different perspective on
proximity from that of a straight-line distance.
a In the Contents pane, make Stores and the basemap the only visible layers.
b In the Geoprocessing pane, click the Back button, and, if necessary, click the Toolboxes tab.
You can use several Distance tools to produce a raster surface, including Euclidean Distance. The
Euclidean Distance tool uses straight-line or geodesic distance from the inputs to create a
distance surface. You can use the distance surface in suitability modeling, as you will do later in the course.
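Conceptually, a Euclidean distance surface assigns each cell its straight-line distance to the nearest source. A toy sketch with a single source and a cell size of 1:

```python
import math

def distance_surface(rows, cols, sources):
    """Assign each cell its straight-line distance to the nearest
    source cell (a toy Euclidean distance surface, cell size = 1)."""
    return [
        [round(min(math.dist((r, c), s) for s in sources), 2)
         for c in range(cols)]
        for r in range(rows)
    ]

# One "store" at cell (1, 1) on a 3x3 grid
surface = distance_surface(3, 3, [(1, 1)])
for row in surface:
    print(row)
# Center cell is 0.0, edge cells 1.0, corner cells about 1.41
```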
d Open the Euclidean Distance tool and set the following parameters:
e Click Run.
The Contents pane now shows distance bands and a legend. Cells are given a value based on
their straight-line distance from each store.
Lesson review
2. Explain the difference between using a straight-line distance and using cost.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 3 questions
2. Is there anything about the result that you notice as a potential problem?
Yes. Not all the customers are accounted for in the zones.
4. How can you modify the extent of geoprocessing outputs to better suit your analysis?
You can change the environment setting.
4 Overlay analysis
Overlay is a fundamental and important type of spatial analysis. Overlay analysis is used to
explore both spatial and attribute characteristics of combined data layers. More specifically,
data is overlaid to answer questions about which geographic features are on top of other
features (for example, what crimes are reported within which patrol areas). This lesson focuses
on using overlay in analysis and how attributes of input layers are combined during the
process. You will also learn how different tools manage the data's output extent when an
overlay operation is performed (for example, whether all areas are preserved in the output or
only the areas of overlap). Finally, you will learn about other tools that process the cells of a
raster dataset based on how they overlap with other datasets.
Topics covered
Lesson 4
Introducing overlay
Overlay analysis can help you determine features that overlap. When you overlay one set of
features with another, you create new information.
Figure 4.1. Overlay processes features and attributes into new information.
Overlay analysis
What is overlay?
Overlay analysis is the geometric intersection of multiple datasets to combine, erase, modify, or
update features in a new output dataset. As you learned earlier, the output dataset contains new
information that combines existing information from the inputs. Overlay helps answer one of the
basic questions in GIS: What is on top of what?
Using overlay
Overlay allows all combinations of geometry. The extents of the inputs need not be identical.
Overlay tools always result in the simplest geometry from all inputs. As in the following example, a
polygon and line are the inputs and a line feature class is the result.
Figure 4.2. On the left, streams and watersheds are inputs to an overlay operation. On the right, the result is
streams within specified watersheds.
Although an important result of overlay is the spatial dataset, you also get an attribute table that
contains valuable information. All the overlay tools, except for Erase, produce an output feature
class in which additional attributes are defined and populated. Having all attributes in one table
allows you to quickly query a single feature to discover other attributes or apply symbology using
the other attributes.
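The attribute-combining behavior is easiest to see in one dimension. The sketch below intersects two sets of labeled intervals, a 1D stand-in for polygon overlay: only overlapping spans survive, and each carries attributes from both inputs, like the combined table described above. The sample values are invented.

```python
# 1D stand-in for Intersect: overlapping spans keep attributes
# from both input "layers".
def intersect_1d(layer_a, layer_b):
    out = []
    for a_start, a_end, a_attrs in layer_a:
        for b_start, b_end, b_attrs in layer_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # the spans actually overlap
                out.append((start, end, {**a_attrs, **b_attrs}))
    return out

streams = [(0, 6, {"stream": "Mill Creek"})]
watersheds = [(0, 4, {"huc": "0401"}), (4, 10, {"huc": "0402"})]
# The stream is split at 4, and each piece gets its watershed's attributes
print(intersect_1d(streams, watersheds))
```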
Figure 4.3. On top, the tables for the two input datasets are shown. Below, the resulting table from performing
overlay has attributes from both streams and the watersheds that overlap.
Overlay tools
ArcGIS Pro contains several tools for performing overlay analysis. The tool that you use depends on the question that you want to answer, the types of features in your input data, and which features you want to include in the output.
Identity
• Combines features of any type (point, line, or polygon) with "identity" features, which must be polygons or have the same geometry as the input features.
• The output feature class has the same extent as the input feature class.
• Any input features that overlap the identity features get the attributes of those identity features.
• Input features that do not overlap have null attributes.

Erase
• Creates a feature class by overlaying the input features with the polygons of the erase features.
• Only those portions of the input features falling outside the erase features' boundaries are copied to the output feature class.
The following tools are not in the Overlay toolset but create information based on overlapping
features.
Summarize Within
• Overlays a polygon layer with another layer to summarize the number of points, length of the lines, or area of the polygons within each polygon.
• Calculates summary statistics about the attributes of the features within the polygons.
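The summary logic amounts to a group-by aggregation. The toy sketch below totals stream length per watershed, assuming the containing polygon for each segment is already known; the real tool performs the spatial containment test as well, and the sample values are invented.

```python
# Toy Summarize Within: total stream length per watershed,
# assuming each segment's containing polygon is already known.
segments = [
    {"watershed": "A", "length_km": 2.5},
    {"watershed": "A", "length_km": 1.0},
    {"watershed": "B", "length_km": 4.2},
]

def summarize_within(features, key, value):
    """Sum a numeric field, grouped by a containing-polygon field."""
    totals = {}
    for f in features:
        totals[f[key]] = totals.get(f[key], 0.0) + f[value]
    return totals

print(summarize_within(segments, "watershed", "length_km"))
# {'A': 3.5, 'B': 4.2}
```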
For each of the following examples, choose the appropriate overlay tool.
1. Which overlay tool will output all customers but will only append attributes for customers
who overlap 15-minute drive-time polygons?
_____________________________________________________________________________________
2. Which overlay tool would you use to find the average value of vacant parcels within each
city boundary?
_____________________________________________________________________________________
3. Which overlay tool will create a layer containing only schools that fall outside the flood
zone?
_____________________________________________________________________________________
Exercise 4 (35 minutes)
In this exercise, you will use several overlay tools to locate customers based on their spatial
relationships with the drive-time polygons. You will also use Spatial Analyst tools to summarize the
length of streams in watersheds and calculate the amount of each land-use classification in a
raster.
b Make Customers, StoreZones, and the basemap the only visible layers.
c From the Contents pane, change the symbol size for the Customers layer to 6, and then close
the Symbology pane.
d Zoom to the Thiessen polygon that contains the largest number of customers.
e From the Map tab, in the Selection group, choose the Select tool.
Now that you have selected the customers that intersect the selected zone, you can export them to their own layer for further analysis.
j In the Contents pane, right-click the Customers layer, point to Selection, and choose Make
Layer From Selected Features.
You have isolated the customers who are closer to the store in the Thiessen polygon than to any
other store.
Step 2: Overlay customers and driving times using the Intersect tool
You want to show only the customers who live within the drive-time polygons. You could use a
spatial query, but you want to store the attributes for customers and the drive-time polygons in
the same layer. An advantage of using overlay tools is that they append attributes. In this step,
you will intersect the customers with the drive-time polygons.
a Make Customers, Service Area, and the basemap the only visible layers.
d For Extent, click the As Specified Below down arrow and choose Customers.
f For Input Features, click the Add Many button, choose Customers and Service Area\Polygons, and then click Add.
h Click Run.
i In the Contents pane, turn off the Customers layer, and drag the CustDriveInt layer above the
Service Area layer.
The Intersect tool creates a feature class containing only the customers who intersect the drive-time polygons. These customers also carry the drive-time attributes.
Now you can query a customer and see the ID of the store that is within the 15-minute driving
time. You could create desire lines again, this time showing 15-minute driving times instead of 10
miles.
m Find and open the XY To Line tool, and then set the following parameters:
n Ensure that the following fields are set with these parameters (they should have been
previously set):
o Click Run.
p From the Contents pane, change the CustDriveInt layer symbol size to 4 pt, and then close the
Symbology pane.
r In the Contents pane, turn the StoreCust10Miles layer on and off to see the difference
between 10 miles and 15 minutes.
You can use the results to determine where customers are coming from, how far they are willing to
travel, and perhaps where new customers could exist. The way that you conceptualize distance,
whether it be time or miles, can affect analysis results.
Step 3: Overlay customers and driving times using the Identity tool
What if you wanted to show all the stores but append only drive-time attributes to the stores that
fall within the drive-time polygons? In this step, you will use the Identity overlay tool to append
attributes for only overlapping features, while keeping all features in the output.
a In the Contents pane, make Customers, Service Area, and the basemap the only visible layers.
b In the Geoprocessing pane, search for and open the Identity tool.
d Click Run.
All customer points are retained in the output. The difference is that features have attributes for
customers and driving times.
1. Why do only some of the points have FacilityID and Name values?
__________________________________________________________________________________
j Click a customer within the drive-time polygon, and then in the pop-up window, scroll all the
way down.
Now there are no drive-time attributes because the customer falls outside the polygon. Using the
results of the Identity tool, you could symbolize different colors for a customer based on whether
that customer is within 15 minutes of a store.
a In the Geoprocessing pane, search for and open the Erase tool.
c Click Run.
d In the Contents pane, make PotentialCustomers, Service Area, and the basemap the only
visible layers.
Now you can see only the customers who are not within 15 minutes of other stores. These
customer locations can help you identify the need for another store.
b Make Ohio, OhioHUC, and the basemap the only visible layers.
Due to the number of stream features, you will focus your analysis on a subset of
watersheds to save processing time.
e Select three watersheds by drawing a box that touches the following features:
You may want to update the OhioHUC symbology so that you can see it more clearly.
You will get a total length of streams within the three selected watersheds.
f In the Geoprocessing pane, click the Back button twice, and then click the Toolboxes tab.
You will use the Summarize Within tool to get the total length of streams in each watershed.
Although the Summarize Within tool is not categorized as an overlay tool, it will summarize
features based on streams overlapping a watershed.
h Open the Summarize Within tool and set the following parameters:
i Click Run.
j In the Contents pane, turn off the Ohio and OhioHUC layers, and zoom out.
The three watersheds that you selected are the only features in the output feature class. The summary statistics, calculated from the streams overlapping each watershed, are stored in the table.
a In the Contents pane, make OhioHUC and NLCD_Ohio the only visible layers.
You will use a zonal analysis tool to tabulate the area of land use within each watershed polygon.
The Zonal tools allow you to perform analysis where the output is a result of computations
performed on all cells that belong to each input zone. In this case, the watershed polygons act as
zones.
d Open the Tabulate Area tool and set the following parameters:
e Click Run.
f In the Contents pane, under Standalone Tables, open the LanduseArea table.
Now you have a table that has the areas of each land-use classification tabulated. You could join
the LanduseArea table to the Ohio_HUC layer to add the areas for all land-use types to the
Ohio_HUC layer.
Lesson review
2. If you use the Intersect tool with streams and watersheds as the inputs, what would the
resulting feature class contain?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 4 questions
5 Automating spatial analysis
Running geoprocessing tools one by one in succession can be a successful workflow for
producing desired information products and results. But you may want a visual representation
of your analysis that you can modify and rerun. You may also want a tool that can perform the
same process over and over on multiple inputs. ModelBuilder allows you to chain together
tools, automate workflows, and share tools so others can add their own data.
In this lesson, you will focus on ModelBuilder as a means of automation, but you will also see
how Python and tasks can be used.
Topics covered
Automating workflows
Earlier, you used individual geoprocessing tools to perform analysis and produce effective results.
ArcGIS Pro enables you to automate your analysis—to set it to process multiple datasets at a time.
You can perform spatial analysis by running individual geoprocessing tools in succession and produce satisfactory results. However, you may want to run the same tool on many inputs at one time, or to have a visual representation of your analysis that you can modify and rerun.
Figure 5.1. Ways to automate analysis and other operations in ArcGIS Pro.
Automation method: Batch geoprocessing
Allows you to run a tool multiple times using many input datasets or different parameter settings. Makes it possible to run a given tool as many times as needed with very little interaction.

Automation method: Python
The scripting language of ArcGIS. ArcGIS includes ArcPy, which gives you access to all geoprocessing tools, scripting functions, and specialized modules that help you automate a GIS process.

Automation method: Tasks
A set of preconfigured steps that guide you and others through a workflow or business process. You can use a task to implement a best-practice workflow, improve the efficiency of a workflow, or create a series of interactive tutorial steps.
Batch geoprocessing
You may have dozens of datasets that you want to clip to the same boundary. Without coding or
creating a model, you could set up batch processing one time to execute the Clip tool on all
inputs. Most tools have batch mode, but you can verify by right-clicking the tool. If Batch is not
listed in the menu, then you cannot run the tool in batch mode.
Figure 5.2. After you choose Batch and set the batch parameter, you run the tool in batch mode.
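Conceptually, batch mode is a loop: the tool runs once per input. A minimal sketch of that pattern, where `clip` and the dataset names are hypothetical placeholders (with ArcPy available, the call inside the loop would be a real geoprocessing tool such as Clip):

```python
# Sketch of what batch mode does under the hood: run one tool once per input.
# The clip() function and dataset names are hypothetical stand-ins.

inputs = ["roads", "rivers", "parcels"]
clip_boundary = "study_area"

def clip(dataset, boundary):
    # Placeholder standing in for a Clip operation; returns the output name.
    return f"{dataset}_clipped_to_{boundary}"

# One run of the "tool" per input dataset, with no further interaction.
outputs = [clip(ds, clip_boundary) for ds in inputs]
print(outputs)
```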
Exercise 5A: Build a model (30 minutes)
You want to create a visual representation of your analysis that you can modify and rerun as
needed. You will use ModelBuilder to chain tools together for the analysis of customers and stores
that you performed earlier. The model will create desire lines from stores to customers.
• Create a model.
• Add tools to a model and set parameters.
The map file contains one feature class of stores and five stand-alone tables. You will use the
CustomerTable in the first exercise, and then you will use the other four tables in the second
exercise.
d Open CustomerTable.
CustomerTable contains many customer-specific attributes, such as name and address. It also
contains x,y coordinates. The table is nonspatial, so you will use a geoprocessing tool to create a
point feature class from the table.
f Zoom to roughly a scale of 1:900,000, with the stores centered in the map.
h Under Processing Extent, set Extent to Current Display Extent, and then click OK.
e For Label, type Customers and Stores Analysis, and then click OK.
Names cannot include spaces, but labels can include them.
It is important to document your models so that other users understand what the model does.
g In the Catalog pane, right-click the model and choose Edit Metadata.
i For Summary, type: The model automates adding customer points from a table, finding
the closest store, and creating desire lines.
a From the Geoprocessing pane, search for the XY Table To Point tool.
The tool is added along with an output element. You must specify an input table for the tool to be
ready to run. You will use the CustomerTable as the input.
c From the Contents pane, drag the CustomerTable into the model to the left of the tool.
e Click the blue input data element, and then drag a line to the tool.
You set the input by connecting the model elements, but you can open the tool to set other
parameters.
g Double-click the XY Table To Point tool, and set the following parameters:
h Click OK.
c If necessary, move the Near tool to the right of the green Customers element.
d Connect the Customers output data element to the Near tool as Input Features.
f Click OK.
g From the ModelBuilder tab, in the View group, click Auto Layout.
b Add the Make Feature Layer tool under the output of the Near tool.
c For the Make Feature Layer tool, set the following parameters:
d Click OK.
As you did earlier, you use the NEAR_FID attribute to process only the features within the 10-mile search radius.
a In the Geoprocessing pane, click the Back button until you see the Favorites tab.
b Click Favorites, if necessary, and under Recent, add the XY To Line tool to the end of the
model.
On the ModelBuilder tab, in the View group, model zoom tools are available to zoom
in and out of the model so that you can place elements.
c Connect the output of the Make Feature Layer tool to the XY To Line tool as Input Table.
e Click OK.
a In the model, right-click the Customers_Layer output data element and choose Add To Display, and then do the same for the DesireLines output data element.
All the model elements are colored appropriately, so the model is ready to run. If any
element had parameter issues, it would be gray.
You have created a model that performs the same analysis that you did earlier in the course. Next,
you will automate this workflow to account for multiple inputs.
g Save the project, and leave ArcGIS Pro open for the next exercise.
You can increase the power of your models through iteration, the ability to process multiple datasets at one time. You add an element called an iterator to enable bulk processing on items like feature classes or tables, turning your model into a powerful tool.
After you create and automate your model, you may want to share it as a tool. When you share a
model, you should set model parameters for specific input data elements and individual tool
parameters.
Automating a model by adding an iterator enables processing of multiple datasets at one time. You set model parameters if you plan to share your model as a geoprocessing tool.
Figure 5.3. Setting model parameters allows users to add their own data and to choose tool parameters that meet
their needs.
Exercise 5B (25 minutes)
You are working as a business analyst, and each week you receive a report listing new customers. You will use information about these new customers to compare weeks and see where the new customers are coming from. Because you get a customer report every week, you want to create a tool that processes multiple tables at once, running the XY Table To Point, Near, and XY To Line tools. You already performed the analysis workflow using the individual tools. Next, you will set up your model for multiple inputs using an iterator.
a Restore the ArcGIS Pro project, and activate the Automation map, if necessary.
The table contains the x,y coordinates for customer locations. Each of the four tables that you
added contains the same attributes. As you did before, you will use the x,y coordinates to map
the customers.
f From the ModelBuilder tab, in the Mode group, click Select to activate the tool, if necessary.
g Drag a box around all model elements to select them (handles will appear around selected
elements).
l Select the initial CustomerTable blue input data element and press Delete.
n Update the name for the model to IterateTables, and then update the label to Iterate Tables
and click OK.
a From the ModelBuilder tab, in the Insert group, click Iterators and choose Iterate Tables.
b Drag all elements associated with the iterator to the left of the XY Table To Point tool.
The * indicates that you want to process all tables that start with the word Week.
f Click OK.
g Connect the green output data element from the iterator to the XY Table To Point tool as
Input Table.
i For Output Feature Class, replace Customers with %Name%_Points, and ensure that the
output is being added to SNAPCourse.gdb.
Because you are using an iterator and will be processing multiple inputs, you must use
an in-line variable for the output name.
k Click OK.
m Open the XY To Line tool and change the Output Feature Class name to
%Name%_DesireLines.
n For Start X Field, choose POINT_X, and for Start Y Field, choose POINT_Y.
You must use an in-line variable for the output name because four feature classes are being
created. If you did not use a variable, each feature class would have the same name and be
overwritten, and only one feature class would be created.
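The in-line variable works as a simple string substitution: at each iteration, %Name% is replaced with the current table's name, so every output is unique. A sketch with hypothetical table names:

```python
# Sketch of in-line variable substitution: %Name% is replaced with each
# iterated table's name so every output feature class gets a unique name.
# The table names are hypothetical.

template = "%Name%_DesireLines"
table_names = ["Week1Cust", "Week2Cust", "Week3Cust", "Week4Cust"]

outputs = [template.replace("%Name%", name) for name in table_names]
print(outputs[0])                          # Week1Cust_DesireLines
assert len(set(outputs)) == len(outputs)   # no two outputs share a name
```

Without the variable, every iteration would write to the same fixed name and each run would overwrite the last, which is exactly the collision described above.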
o Click OK.
p Right-click the intermediate data element named %Name%_Points and ensure that Add To
Display is not checked.
q Right-click the final output of the model (%Name%_DesireLines) and uncheck Add To Display.
r From the ModelBuilder tab, click Validate, and then click Run.
v Select and add each of the Week#Cust_DesireLines feature classes to the map.
You used ModelBuilder and iterators to automate an analysis workflow. You can run the same
model on other input tables that you receive to map your customers.
a Add a new model, and change its name and label to DesireLineTool.
b Copy all model elements from the Iterate Tables model, and paste them into the
DesireLineTool model.
d In the Catalog pane, if necessary, expand Toolboxes and SNAPCourse.tbx, and then double-click the DesireLineTool model.
The model opens in a tool dialog box, which states that it has no parameters. Model parameters
are required if you want to give users the ability to change inputs, outputs, or other tool
properties.
1. Which items in your model should be made available for users to provide their own
data?
__________________________________________________________________________________
__________________________________________________________________________________
e In the DesireLineTool model, click in the white space to clear your selection.
f Right-click the first blue input data element (Business.gdb) and choose Parameter.
The parameter now displays in the tool dialog box, thus allowing users to input their own data.
Next, you will make model parameters for the XY Table To Point tool.
i Right-click the XY Table To Point tool, point to Create Variable, point to From Parameter, and
choose X Field.
j In the same manner, add model parameters for Y Field and Coordinate System.
k From the ModelBuilder tab, in the View group, click Auto Layout.
All the model parameters that you set appear in the Geoprocessing pane. You will continue adding parameters for most of the remaining elements.
n Using the following table as a guide, create model parameters for the remaining elements.
Element: XY To Line
Parameters: Start X Field, Start Y Field, End X Field, End Y Field, Spatial Reference (set all as model parameters)
a In the model, right-click the Business.gdb input data element and choose Rename.
b Overwrite the existing name with Workspace containing customer tables and press Enter.
X Field label: Customer X
Y Field label: Customer Y
You have customized the labels of your parameters so that users know what each parameter is for.
You can make your tool accessible from the Analysis gallery so it is easier to find.
h In the Catalog pane, right-click the DesireLineTool model and choose Add To Analysis Gallery.
If you were to share your project as a project package, others could run the tool from the Analysis
gallery.
Lesson review
2. Why would you set model parameters for your model elements and variables?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 5 questions
6 Creating surfaces using interpolation
Suppose that you want to model a feature as a continuous surface, but you only have data
values for a finite number of points. For example, you want to create a precipitation map for
an entire region, but you only have a few rain gauge locations recording observations in the
area. How would you do it? Because surfaces represent continuous phenomena that have
values at every point across their extent, you must interpolate the values for the unknown
locations.
The word "interpolate" means to estimate a value that lies between two other values. From a
GIS perspective, spatial interpolation refers to the process of estimating or predicting the
unknown data values for specific locations using the known data values. Sample data is often
collected at irregularly distributed locations, and attributes are sometimes difficult to
consistently quantify. With GIS, you can use point samples to model complex surfaces that
suit your specific needs and provide the information necessary to make informed and
defensible decisions.
Topics covered
What is interpolation?
The renowned geographer and cartographer Waldo Tobler formulated a statement known as the
First Law of Geography:
"Everything is related to everything else, but near things are more related than distant things."
Tobler's First Law of Geography is the foundation for one of the most important concepts in spatial analysis: spatial autocorrelation. Spatial autocorrelation is a measure of the degree to which a set of spatial features and their associated data values tend to be clustered together in space (positive spatial autocorrelation), dispersed (negative spatial autocorrelation), or randomly distributed (no autocorrelation). Spatial autocorrelation is an important concept used in interpolation and spatial statistics.
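One common statistic for measuring spatial autocorrelation is Moran's I (not introduced in this lesson, but standard in spatial statistics). A minimal sketch on a one-dimensional transect of made-up values, using adjacent positions as neighbors:

```python
# A minimal Moran's I sketch on a one-dimensional transect of values,
# using adjacent positions as neighbors with binary weights. The sample
# values are made up for illustration.

def morans_i(values):
    """Moran's I for a 1-D sequence where position i neighbors i-1 and i+1."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Each adjacent pair contributes twice (the weight matrix is symmetric).
    cross = 2 * sum(dev[i] * dev[i + 1] for i in range(n - 1))
    w_sum = 2 * (n - 1)                 # total of all binary weights
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (cross / denom)

clustered = [1, 1, 9, 9]   # similar values sit next to each other
dispersed = [1, 9, 1, 9]   # similar values are kept apart

print(round(morans_i(clustered), 3))   # 0.333 -> positive, clustered
print(round(morans_i(dispersed), 3))   # -1.0 -> negative, dispersed
```

A positive value means near things have similar values, matching Tobler's law; a negative value means neighbors tend to differ.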
What is interpolation?
When you watch the weather news coverage on TV, you probably see maps like the map in this
image—with rainfall, snowfall, temperature, or wave height represented. The temperature values
are represented as a surface. Surface data is commonly used in GIS to model continuous
phenomena, like elevation, soil nutrient levels, air pollution, or temperature. Continuous
phenomena do not have discrete x,y coordinates that define a boundary. However, the surface
was derived from discrete points that contain a temperature value.
Figure 6.1. Weather monitoring stations are used throughout Europe to record temperature and other atmospheric
conditions. A single monitoring station records values for only one location.
Data cannot be captured at every location, so a technique called spatial interpolation is used to
estimate unknown values from known values. Your data must contain a value—such as elevation,
precipitation, or another continuous variable.
General example
On the left are known values for some phenomenon. On the right is a surface created using the
known values to predict values where no samples were recorded.
Figure 6.2. Imagine that the point values on the left are temperature readings. The surface on the right was created
through interpolation to estimate unknown values.
Figure 6.3. Temperature values at weather stations across Europe interpolated to a continuous weather surface.
Interpolation methods
Interpolation methods can be either deterministic or geostatistical. All methods rely on the
similarity of nearby sample points to create the surface, which is referred to as spatial dependence
(or spatial autocorrelation).
Deterministic
Deterministic methods use mathematical (nonstatistical) models to create surfaces from measured points. These methods are "deterministic" because the output is determined entirely by the input data and the model parameters that the user specifies.
Figure 6.4. When using deterministic methods, no assumptions are made about the spatial statistical structure of
variability in the data values. Also, uncertainty is not considered in the predictions.
Geostatistical
Geostatistical methods rely on both mathematical and statistical models to create output surfaces.
The model parameters are estimated based on the spatial structure and statistical properties of
the underlying data. Geostatistical methods assume that the data being modeled is subject to
random variation and measurement error.
Figure 6.5. Geostatistical methods produce a prediction surface and a surface (not shown) showing estimates of
prediction uncertainty.
Interpolation tools
Each interpolation method estimates unknown values from known values. However, the ways in
which methods work are different. The following list includes commonly used deterministic and
geostatistical interpolation tools.
Tool: Inverse Distance Weighted (IDW)
Uses the measured values surrounding the prediction location to predict a value for any unsampled location. Predicted values are based on the assumption that things that are close to one another are more alike than things that are farther apart.

Tool: Natural Neighbor
Finds the closest subset of input samples to a query point and applies weights to them based on proportionate areas to interpolate a value.
Interpolation tools are located in the Spatial Analyst, Geostatistical Analyst, and 3D Analyst
toolboxes, and all require extensions.
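The IDW idea can be sketched directly from its definition: the estimate at an unsampled location is a weighted average of nearby sample values, with weights of 1/d^power. The sample points below are made up, and ArcGIS Pro's IDW tool produces a raster with many more options; this is only the core formula.

```python
# A minimal IDW sketch: estimate a value at an unsampled location from
# distance-weighted sample values. Sample data is hypothetical.
import math

def idw(samples, x, y, power=2):
    """samples: list of (sx, sy, value). Returns the IDW estimate at (x, y)."""
    num = den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value              # the surface is exact at a sample point
        w = 1.0 / d ** power          # nearer samples get larger weights
        num += w * value
        den += w
    return num / den

samples = [(0, 0, 0.04), (10, 0, 0.08), (0, 10, 0.06)]
print(round(idw(samples, 5, 5), 4))   # estimate falls within the sample range
```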
Deterministic interpolation
Deterministic interpolation techniques create surfaces from measured points, based on either the
extent of similarity (inverse distance weighted) or the degree of smoothing.
• Manually validate the surfaces using the Explore tool, and compare the cell values with the
sample point values in the same location.
• Create the surface on a subset of points, thus withholding some sample points. After you
create the surface on the subset, explore how well the interpolator estimated values where
the withheld points are located. You can create the subset manually using the Subset
Features tool in the Geostatistical Analyst toolbox to create training and testing data, and
then use the GA Layer To Points tool to perform the validation.
• Use the Cross Validation tool in the Geostatistical Analyst toolbox. If you use a deterministic
interpolator from the Geostatistical Analyst toolbox, you can perform cross validation on it
using a geoprocessing tool.
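The withheld-points approach in the second bullet can be sketched as follows: interpolate from a training subset, then score the estimates at the withheld locations with a root-mean-square error (RMSE). The data and the simple inverse-distance estimator are illustrative only.

```python
# Sketch of validating a surface with withheld points: interpolate from a
# training subset, then compute RMSE at the withheld locations. Sample
# coordinates and values are hypothetical.
import math

def estimate(train, x, y, power=2):
    """Simple inverse-distance estimate at (x, y) from training samples."""
    num = den = 0.0
    for sx, sy, v in train:
        d = math.hypot(x - sx, y - sy) or 1e-12   # guard exact hits
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

train = [(0, 0, 0.04), (10, 0, 0.08), (0, 10, 0.06), (10, 10, 0.07)]
withheld = [(5, 5, 0.065), (2, 2, 0.05)]          # points held back from training

errors = [estimate(train, x, y) - true for x, y, true in withheld]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(round(rmse, 4))   # lower RMSE indicates a better-performing interpolator
```

Running this comparison for several interpolators on the same withheld points indicates which method predicts best for the dataset.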
Exercise 6: Interpolate surfaces (30 minutes)
The U.S. Environmental Protection Agency is responsible for monitoring atmospheric ozone
concentration in California. Ozone concentration is measured at monitoring stations throughout
the state. The concentration levels of ozone are known for all the stations, but the ozone values
for other unmonitored locations in California are also of interest. However, it is too costly and
impractical to put monitoring stations everywhere. In this exercise, you will use interpolation to
create continuous surfaces from the ozone sample points.
c From the Insert tab, in the Project group, click Import Map.
You can clearly see visual clusters of high and low ozone concentrations. The clusters indicate that
interpolation is a good option for creating a continuous surface. Next, you will explore the
attributes for the sample points.
e In the Contents pane, right-click Samples and open its attribute table.
The table contains the name of the monitoring station, its elevation, and the ozone measurement.
Currently, the Samples layer is symbolized using the OZONE field. You will use the OZONE field
when you interpolate a continuous surface from the sample points.
g In the Contents pane, turn off the Samples and Hillshade layers.
c Click OK.
d Open the Natural Neighbor tool, and set the following parameters:
e Click Run.
There are no sample points in some areas, and the Natural Neighbor tool creates a surface only across the area covered by the sample points. You set the analysis mask, but Natural Neighbor does not honor an analysis mask.
f In the Contents pane, turn the Samples layer on and off to view the points with the surface.
g When you are finished, ensure that the Samples and OzoneNN layers are turned off.
a In the top-left corner of the Geoprocessing pane, click the Back button.
b From the Interpolation toolbox, open the Spline tool, and set the following parameters:
c Click Run.
d Turn the Samples layer on and off to view it with the surface.
Next, you will run the Spline tool using the tension method. With the Tension spline type, higher
values entered for the weight parameter result in somewhat coarser surfaces that nonetheless
closely conform to the control points.
h From the Appearance tab, in the Effects group, use the Swipe tool to compare results from the
regularized and tension methods.
Regularized spline creates a smoother surface than the tension method does.
a In the Geoprocessing pane, click the Back button, and from the Interpolation toolbox, open
the IDW tool.
c Click Run.
Next, you will modify the Power parameter. The Power parameter allows you to control the
significance of known points on the interpolated values based on their distance from the output
point. By defining a higher Power value, more emphasis is placed on the nearest points. Thus,
nearby data will have the most influence on the estimated values.
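The effect of the Power parameter shows up directly in the normalized weights. A small sketch comparing the weight share of a near sample and a far sample at two power values (the distances are illustrative):

```python
# Sketch of how the Power parameter shifts influence toward nearby points:
# normalized inverse-distance weights for one near and one far sample.

def normalized_weights(distances, power):
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

near_far = [1.0, 5.0]    # one sample 1 unit away, one 5 units away

low = normalized_weights(near_far, power=1)
high = normalized_weights(near_far, power=3)
print(round(low[0], 3), round(high[0], 3))
# The near sample's share of the total weight grows as power increases.
```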
d In the Geoprocessing pane, change the following parameters for the IDW tool:
f In the Contents pane, compare the IDW results by turning the layers off and on.
a In the Contents pane, turn on all layers except the basemap and Hillshade.
b From the Map tab, in the Navigate group, click the Explore tool to activate it.
c Click the Explore tool down arrow and choose Selected In Contents.
Now the Explore tool displays attributes based on the layer that is selected in the Contents pane.
You will focus on the sample points specified in the graphic to see which tool predicted values closest to the recorded value.
f With the Explore tool, click the second point from the top.
The Natural Neighbor tool interpolated this part of the surface to a value of 0.070. Next, you will
get values for the Spline and IDW tool surfaces.
i Using the skills that you have learned, get the interpolated value at the same point for the
other interpolated layers.
Closing the pop-up window is not necessary. Simply change the selected layer, and
then click the point again. If the pop-up window opens over the point, zoom out so that
you can see the point and the pop-up window together or move the pop-up.
The predicted values for the sample point are all close or exact. Normally, you would perform this operation on several points throughout the study area to determine which tool made the best predictions.
You could also determine the best surface by withholding some sample points from the
interpolation. You could perform the interpolation on a subset of points, and then compare the
predicted values with the values of the withheld points.
k If you have time and want to explore the interpolated surfaces further, proceed to the
challenge step.
Challenge (10 minutes)
c Use the Explore tool to see how well the interpolation tools predicted values that were
excluded from the interpolation.
Lesson review
1. Describe interpolation.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
3. What are some ways in which you can validate surfaces created using interpolation?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 6 questions
• Houses in one development are more similar in value than in neighborhoods farther
away.
• If it is snowing where you are, it is likely to be snowing 100 feet away from where you
are but maybe not 100 miles from where you are.
Exercise 6 challenge solution
b In the Catalog pane, expand Folders, and then expand Snap and Interpolation.
SubsetPoints has fewer sample points in it than the initial Samples layer. You will use the same
interpolation tools as you did earlier, but this time, you will use the SubsetPoints feature class as
the input.
d Locate and run the Natural Neighbor tool with the following parameters:
e Locate and run the Spline tool with the following parameters:
f Locate and run the IDW tool with the following parameters:
g From the Catalog pane, in the CaliOzone geodatabase, add WithHeldPoints to the map.
Figure 6.7.
j In the map, using the Explore tool, click the top point.
Figure 6.8.
k In the Contents pane, select NNSubset, and then click the point.
l Repeat the process of selecting SplineSubset and then IDWSubset and clicking the point to
get the predicted values.
1. Which interpolation tool best predicted the value of the withheld point?
IDW, with a value of 0.074
As you can see, manually validating a surface is time-consuming. Later in the course, you will use
geostatistical tools to validate surfaces.
7 Suitability modeling
Suitability modeling is a type of analysis that locates places that are "suitable," or favorable, for a
certain phenomenon. An example would be the best locations for a wind farm in Colorado.
You will learn about a standard workflow for performing suitability modeling that you can
apply to any data or analyses and use it to solve a problem.
Topics covered
Levels of measurement
Lesson 7
Suitability modeling is the process of combining multiple datasets, usually rasters, into one layer
with the intention of finding optimal locations for various phenomena. ArcGIS Pro has several
raster overlay tools, such as Weighted Overlay and Weighted Sum, that allow you to weight layers
based on their relative importance to a suitability modeling scenario.
Figure 7.1. Distance, land use, and slope rasters are combined into a single raster that contains cells suitable for a
vineyard. Vineyards are successful on certain slopes, on certain land uses, and at certain distances from roads.
When performing suitability modeling, you can guide your analysis using a standard workflow.
Figure 7.2. You can follow the suitability modeling workflow using any analysis problems and datasets.
Define the problem: State the problem that you are trying to solve, such as finding the best
location for a wind energy facility.
Identify and derive criteria: Criteria are the conditions that geographic areas must meet to be
considered suitable. For example, daily average wind speed must be at least 25 mph.
Transform values to a common scale: When dealing with datasets containing different measures
and ranges, you must transform data values so that you can rank them on the same scale.
Weight layers and combine: Suitability modeling involves weighting layers based on their relative
importance to the problem that you are trying to solve. After you weight the layers, you use tools
to combine them onto a suitability surface.
Locate the phenomenon: With the resulting suitability surface, you can dig deeper to locate the
best sites or the regions that are most suitable.
Analyze the results: Explore the findings, perhaps alter some tool parameters, and rerun the
analysis to get the best result.
The first step of the suitability modeling workflow is to define the problem. You must determine
the proper climate conditions for growing crops. You will consider phenomena like temperature,
elevation, slope, and distance from roads to find the most suitable places. The datasets of interest
become the analysis criteria.
What types of data can you identify in the criteria, and can you use vector overlay tools like
Intersect and Union with them?
Based on the data and scenario provided, determine whether you would use raster or vector
analysis tools.
Potential datasets include wind, ports, shipping lanes, bathymetry, and nature preserve boundary.
1. Would you use raster or vector overlay to determine suitable ocean locations to harvest
wind power? Why?
_____________________________________________________________________________________
Potential datasets include roads, census blocks, competitor stores, and zoning.
2. Would you use raster or vector overlay to determine the most suitable locations for a
shopping center? Why?
_____________________________________________________________________________________
In raster overlay analysis, you work with surfaces. A surface is a geographic phenomenon
represented as a set of continuous data (such as elevation, geological boundaries, or air
pollution). Surfaces do not have discrete x,y coordinates for phenomena because the data being
modeled does not have set boundaries and is more continuous over the landscape. After
determining the criteria that you need for your analysis, you may not always have the data that is
required to model these criteria. For example, you might need slope but have only elevation, or
need distance and have only roads.
Figure 7.3. You can derive surfaces from vector or raster data using geoprocessing tools or raster functions.
In ArcGIS Pro, you can create raster data in two main ways: raster functions and geoprocessing
tools.
Raster functions
Using a raster function is a quick way to process and analyze rasters in ArcGIS Pro. You can apply a
raster function to raster datasets, mosaic datasets, or image services that are in your map. The
resulting virtual layers are stored in your current project. You can apply system functions for data
management, visualization, and analysis.
Raster functions do not create permanent data; they process only the pixels that are visible on
your screen, creating virtual layers in the map, which saves disk space and results in fast
processing. If you want to save the result of a raster function, you can export it to a geodatabase
raster.
Geoprocessing tools
To build a geodatabase with rasters, you would use geoprocessing tools. Some raster functions
and geoprocessing tools are similar, such as Hillshade and Slope, and whether you use a function
or a tool depends on the output that you want. You can add geoprocessing tools to models, but
you cannot add raster functions. However, you can create a function chain, which is similar to a
model.
Levels of measurement
There are various types of data in GIS—nominal, ordinal, interval, and ratio—that are referred to
as levels of measurement. Each type allows various mathematical operations to be performed on
it. An understanding of levels of measurement is vital in weighted suitability modeling because
you are essentially taking nominal (land use), interval (temperature), or ratio (distance to roads)
data and transforming it into interval or ratio data.
Figure 7.4. Nominal data is a name or description, such as the peak names shown in the image.
Ordinal data supports the relational operators equal to (=), not equal to (!=), greater than (>), less
than (<), greater than or equal to (>=), and less than or equal to (<=).
Figure 7.5. Ordinal measurements determine importance, such as 1st, 2nd, and 3rd place in a race, or which peak is
higher than another. You cannot add, subtract, multiply, or divide the numbers.
Interval measurements capture values that are measurable on an interval scale, such as
temperature or elevation. Interval data has an arbitrary zero point (for example, 0 degrees F does
not imply "no temperature"). Interval data supports all relational operations supported by nominal
and ordinal measurements, and the mathematical operations of addition and subtraction.
Figure 7.6. The elevation of Mt. Everest is an example of interval data or measurement. The zero point (sea level) is
arbitrary.
Ratio scales are used for many measurements in the physical sciences and engineering, such as
mass, length, time, height, and energy. A variable measured at the ratio level supports all
relational operators and all four arithmetic operators (+, −, ×, ÷). For example, height can be
represented as ratio data because one object, such as a building, can be twice as tall as another.
Ratio data has an absolute zero point. For example, a county with zero population implies the
complete absence of people.
Figure 7.7. The height of a mountain from an absolute zero point is an example of ratio data or measurement.
When you perform a weighted overlay, you combine many surfaces that contain different ranges
of values and may be in different units of measure. You must address this issue of differing values
and ranges before you overlay the rasters. Based on the type of data that you have, you may
reclassify the data values manually or use a tool that automates the process.
Reclassify
Reclassification involves manually assigning data values to discrete classes. Reclassify is best
used for discrete data that has distinct class breaks. You can enter the values manually or
load them from a table. The output raster contains only the values of your suitability scale, with
its cells distributed among the classes that you specify.
Figure 7.8. You can reclassify using individual values (shown in the example on the left) or ranges of values (shown in
the example on the right).
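In code, reclassification amounts to a table lookup applied to every cell. The tiny input raster
below is hypothetical; the remap entries follow a few of the land-use assignments used later in this
exercise (for example, 6 maps to 9).

```python
# Sketch of Reclassify: each cell value is replaced by the class
# assigned to it in a remap table. The 2 x 3 raster is hypothetical.

remap = {1: 1, 2: 1, 3: 2, 6: 9, 7: 7, 8: 8}

landuse = [
    [1, 6, 6],
    [3, 7, 8],
]

reclassified = [[remap[value] for value in row] for row in landuse]
```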
Rescale by function
Another method available to transform data values is called rescale by function. Many times, the
suitability changes continuously with the changing values of the criterion and often does so in a
nonlinear manner. For example, cell locations close to existing roads may be the most preferred in
a housing suitability model because the cost of getting power to those locations is cheaper. As
the distance from a road increases, the cost of getting power to those locations may increase
exponentially. As a result, the suitability for farther locations may decrease dramatically. Rescale
by function is a better option than reclassification for continuous data.
Figure 7.9. A function used to transform data values onto the desired suitability scale. Notice that there are no
distinct class breaks.
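A simplified sketch of two transformation functions on a hypothetical 1-to-10 suitability scale.
These are stand-ins for illustration only; the actual Rescale By Function transformations (Linear,
Small, Logistic Decay, and others) have their own parameter forms.

```python
# linear(): preference rises at a constant rate between lo and hi.
# decreasing(): smaller inputs are preferred; a simplified stand-in
# for decreasing-preference functions such as Small.

def linear(x, lo, hi):
    x = min(max(x, lo), hi)          # clamp to the input range
    return 1 + 9 * (x - lo) / (hi - lo)

def decreasing(x, lo, hi):
    return 11 - linear(x, lo, hi)

# Distance from roads: farther is better under a rising linear function.
nearest = linear(0, 0, 5000)         # least suitable
farthest = linear(5000, 0, 5000)     # most suitable
```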
Exercise 7A 30 minutes
You will build a model to derive surfaces from various sources and to classify and transform data
values to a common suitability scale. Then, in the second exercise, you will overlay the surfaces.
The map displays streams, roads, land use, and elevation for an area in Vermont. For raster
analysis, it is important to verify the cell size of your data.
g Click OK.
h Using the same steps that you just performed, find the cell size for the LandUse raster.
In this case, the cell sizes match for the input rasters. You will set the output cell size for other
rasters that you create to match the cell size of the inputs, which is 30 meters. Next, you will set
several environments for the analysis.
l Click OK.
b Update the model name to BearSuitability and its label to Bear Suitability.
Hint: Catalog pane > Toolboxes > SNAPCourse.tbx > Right-click Model > Properties
a From the Contents pane, drag each of the layers, excluding the basemap, into the model and
arrange them one below the other as follows:
• LandUse
• Roads
• Streams
• Elevation
You will use two vector layers, Streams and Roads, as inputs into the Euclidean Distance tool to
create distance surfaces for the raster analysis.
c Drag the Euclidean Distance tool into the model and place it to the right of Streams.
e Open the Euclidean Distance tool, and change the Output Distance Raster name to
StreamDist.
f Click OK.
Next, you will add another Euclidean Distance tool to create the distance surface for the Roads
layer.
h From the Geoprocessing pane, add another Euclidean Distance tool to the model to the right
of Roads.
i Connect Roads to Euclidean Distance (2) as Input Raster Or Feature Source Data.
j Open the Euclidean Distance (2) tool, and change the Output Distance Raster name to
RoadsDist.
k Click OK.
b Drag Slope (Spatial Analyst Tools) into the model to the right of Elevation.
d Open the Slope tool, and for Output Raster, type Slope.
f Click OK.
Integer-based raster datasets can have attribute tables. The CLASS_NAMES field contains
descriptions for each of the cell values. You can analyze the table to determine which land-use
codes are most suitable for bear habitats.
1. Which values should receive the most suitable class of 5 when you reclassify?
__________________________________________________________________________________
2. Which values do you consider to be highly suitable for bear habitats, but not as good as
values 6, 7, and 8?
__________________________________________________________________________________
c Add a Reclassify (Spatial Analyst Tools) tool to the model, placing it to the right of the
LandUse data element.
e Open the Reclassify tool, and ensure that Reclass Field is set to VALUE.
g In the Value and New columns, type the following values and classes, pressing Enter after each
row to add another row.
After you press Enter, you must double-click in the Value cell to type, and then press
Tab to move to the New cell.
Value   New
1       1
2       1
3       2
4       3
5       4
6       9
7       7
8       8
9       10
10      9
11      2
12      3
When you are not sure of the class breaks, you can let ArcGIS Pro determine them using a
transformation function. Next, you will use Rescale By Function to rescale the RoadsDist layer.
b Add Rescale By Function (Spatial Analyst Tools) to the model next to RoadsDist.
d Open the Rescale By Function tool, and set the following parameters:
The Linear transformation function is best used when the preferences for values increase or
decrease at a constant linear rate. For example, the most suitable bear habitats are farther away
from roads.
e Click OK.
a In the Geoprocessing pane, add the Rescale By Function tool to the model, placing it to the
right of StreamDist.
c Open the Rescale By Function (2) tool, and set the following parameters:
Typically, bears like to be closer to streams for food and water. The Small transformation function
is used when the smaller input values are more preferred.
d Click OK.
a Add a Rescale By Function tool to the model, placing it to the right of the Slope (2) data
element.
c Open the Rescale By Function (3) tool, and set the following parameters:
The Logistic Decay function is best used when the lower input values are more preferred, and as
the input values increase, the preferences rapidly decrease.
d Click OK.
f With the Explore tool active, hold the Shift key and select SlopeRescale, RoadsRescale, and
StreamsRescale, and then right-click and choose Add To Display.
b From the ModelBuilder tab, click Validate, and then click Run.
The legends for the layers that you rescaled are different from the legend for LandUseRcl in that
they are stretched rather than having distinct class breaks.
In the rescaled layers, green indicates more suitable and red indicates less suitable areas.
Consider distance from roads: The linear function that you used makes smaller distances less
suitable and greater distances more suitable.
e In the Contents pane, make Roads and RoadsRescale the only visible layers.
By viewing the layers together, you can see how the layer was rescaled. Areas with no roads
nearby are the most suitable for bear habitats.
Streams cover most of the study area, so if you consider only distance from streams, there are many
suitable areas.
i In the Bear Suitability model, right-click each of the final green output data elements and
uncheck Add To Display.
You have added the necessary tools to derive surfaces, reclassify the land-use raster, and rescale
the continuous surfaces. You have also explored the results of the model up to this point. In the
next exercise, you will overlay the reclassified and rescaled layers to create a suitability surface.
Binary
In binary overlay, each cell is assigned a value of 0 or 1, based on whether it meets all the analysis
criteria. Zero (0) indicates that the cell does not meet all the criteria, and one (1) indicates that a
cell meets all the criteria.
Figure 7.10. For each input surface, a 1 indicates that a cell is suitable. The final result reveals that only two cells are
suitable based on the criteria; all other cells (that is, cells with a value of 0) are considered unsuitable.
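The combination in Figure 7.10 amounts to a cell-by-cell logical AND, which can be sketched with
hypothetical 0/1 rasters:

```python
# Sketch of binary overlay: a cell is suitable (1) only if every
# input raster marks it as meeting its criterion.

slope_ok    = [[1, 0], [1, 1]]
landuse_ok  = [[1, 1], [0, 1]]
distance_ok = [[1, 1], [1, 0]]

suitable = [
    [a & b & c for a, b, c in zip(r1, r2, r3)]
    for r1, r2, r3 in zip(slope_ok, landuse_ok, distance_ok)
]
# Only cells where all three inputs are 1 remain 1.
```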
Weighted
In weighted overlay, values are manually reclassified onto a common scale (for example, 1 to 5 or
1 to 10). Layers are also weighted based on their influence on the particular analysis scenario,
which is a key component of the suitability modeling workflow. Weighting allows stakeholders to
assign relative importance to certain layers in the analysis. Weights can be a highly subjective
component of your analysis unless they are determined in a proper manner (for example, using
the Delphi method). Altering weights can change the results of your analysis. You have the option
to choose weights as relative percentages that sum to 1.
Figure 7.11. In this example, 1 indicates a cell that is not suitable and 9 indicates the most suitable cells. The
remaining values indicate varying degrees of suitability.
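Cell by cell, a weighted combination multiplies each input by its weight and sums the products.
The weights below sum to 1, and the single-row rasters are hypothetical.

```python
# Sketch of a weighted sum of reclassified rasters. Each output cell
# is sum(weight_i * value_i) across the inputs.

weights = {"landuse": 0.25, "slope": 0.20, "roads": 0.20, "streams": 0.35}
rasters = {
    "landuse": [[9, 2]],
    "slope":   [[8, 4]],
    "roads":   [[7, 5]],
    "streams": [[9, 3]],
}

rows, cols = 1, 2
suitability = [
    [sum(weights[name] * rasters[name][r][c] for name in weights)
     for c in range(cols)]
    for r in range(rows)
]
```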
Fuzzy overlay
Fuzzy overlay is based on fuzzy logic. The basic premise behind fuzzy logic is that there are
inaccuracies in attributes and in the geometry of spatial data. Cells are assigned values that
represent their membership to a set of suitable locations. These membership values range from
zero to 1, with zero indicating non-membership to a set (unsuitable) and 1 indicating membership
to a set (suitable). Fuzzy overlay is best suited for analyzing data that does not adhere to discrete
polygons and boundaries, such as landslides or disease outbreaks.
Figure 7.12. This image is the result of running fuzzy overlay to find bald eagle habitats near Big Bear Lake,
California. Red cells are least suitable and green are more suitable.
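A minimal sketch of combining fuzzy memberships, assuming the fuzzy AND operator, which keeps
the minimum membership across inputs; the membership rasters are hypothetical.

```python
# Sketch of fuzzy overlay: cells hold membership values in [0, 1],
# and fuzzy AND keeps the minimum membership across the inputs.

wind_membership  = [[0.9, 0.2]]
depth_membership = [[0.7, 0.8]]

fuzzy_and = [
    [min(a, b) for a, b in zip(row1, row2)]
    for row1, row2 in zip(wind_membership, depth_membership)
]
```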
Binary analysis determines good or bad sites by assigning a value of 0 or 1, with 0 being
unsuitable and 1 being suitable. The Raster Calculator is a commonly used Spatial Analyst tool for
performing map algebra. Map algebra is a powerful language for raster analysis that allows you to
perform various mathematical, logical, relational, and other types of operations on raster data. In
this example, you are testing criteria within each raster and then creating a raster that identifies
the cells in which all the criteria have been met.
After you have transformed values to a common suitability scale and combined the layers, your
output is a suitability surface. Within the suitability surface are values within your chosen suitability
scale (1 to 5, 1 to 10, and so on). You may want to show only the best sites that range from 8 to
10.
Figure 7.14. On the left, a suitability surface with values ranging from 1 to 10. On the right, the result of using map
algebra in the Raster Calculator to show only cells with a value of 8 or higher.
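The map-algebra test in Figure 7.14 behaves like a conditional applied to every cell; a sketch with
a hypothetical surface:

```python
# Sketch of extracting best sites: keep cells with suitability >= 8
# and zero out the rest (0 stands in for NoData here).

surface = [[9, 4], [8, 7]]
best_sites = [[v if v >= 8 else 0 for v in row] for row in surface]
```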
Locating regions
When you create a suitability surface, it may be difficult to know which areas are the most suitable.
Often, the most suitable areas are not contiguous or are isolated from one another. By
incorporating a tool called Locate Regions, you can avoid arbitrary assignment of the most
suitable locations. The Locate Regions tool is part of the final step in any suitability modeling
workflow. Locate Regions identifies the best regions, or groups of contiguous cells, in the
suitability surface that meet your desired suitability criteria and other spatial constraints.
Locate Regions is often used along with the Cost Connectivity tool to select and then connect the
best available regions in the least-cost way.
The analysis that you perform in the next exercise will determine the most suitable places for bear
habitats in northern Vermont, near Lake Champlain. Before you perform the analysis, you will use
ArcGIS Pro to evaluate the criteria and data.
For this analysis, the criteria for suitable bear habitats are as follows:
Instructions
a If necessary, start ArcGIS Pro and restore the course project.
1. Does the BearSuitability geodatabase contain the necessary datasets for each criterion?
_____________________________________________________________________________________
2. Which tool will you use on Streams and Roads to create distance surfaces that you can use
in raster overlay?
_____________________________________________________________________________________
4. Do you have to derive a surface from the LandUse raster or can you use it as an input as is?
_____________________________________________________________________________________
Exercise 7B 15 minutes
You have built a model and classified and transformed data to a common scale. Now, you are
ready to overlay the rasters to locate suitable regions for bear habitats in Vermont.
• Overlay rasters.
• Locate suitability regions.
a If necessary, restore the ArcGIS Pro project and view the Bear Suitability model.
There are several raster overlay tools, such as Weighted Overlay and Weighted Sum. You will use
Weighted Sum because it allows you to input floating point rasters (for example, the outputs of
the Rescale By Function tool) and Weighted Overlay does not.
c Add Weighted Sum (Spatial Analyst Tools) to the right of the model.
Raster           Weight
LandUseRcl       0.25
SlopeRescale     0.20
RoadsRescale     0.20
StreamsRescale   0.35
i Click OK.
q For Min and Max, replace the current values with 0.5 for each.
In many cases, you can assume that most pixel values fall within an upper and lower
limit. Therefore, it is reasonable to trim off the extreme values. You can do this
statistically by defining either a standard deviation or a clipping percent.
The suitability surface illustrates the suitability ranges for the entire study area. From the surface,
you can get a better idea of where bear habitat is suitable.
r In the Contents pane, turn input layers on and off to evaluate the results.
You can see that the cells are red, or less suitable, where there are roads. You can also see that it is
more suitable for bear habitats where the elevation is lower, due to less steep slopes.
c Add the Locate Regions tool to the model next to the BearSuitability output element.
e Open the Locate Regions tool, and set the following parameters:
f Click OK.
k In the Contents pane, for BearPatches, set the fill for the zero value to No Color.
l In the Contents pane, toggle off and on BearPatches so that you can see how they compare to
the original suitability surface.
You have used raster data to perform suitability modeling to locate bear habitat regions based
on a set of criteria.
Lesson review
2. Explain the difference between the Reclassify tool and the Rescale By Function tool.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 7 questions
You cannot use vector overlay tools, such as Intersect and Union, on continuous (raster) data. You
must use Spatial Analyst tools to analyze raster data.
Exercise 7A: Build a model and classify data to a common scale (page 7-12)
1. Which values should receive the most suitable class of 5 when you reclassify?
Values 6, 7, and 8, which represent deciduous, evergreen, and mixed forest
2. Which values do you consider to be highly suitable for bear habitats, but not as good as values
6, 7, and 8?
Values 9 and 10, which are scrub/shrub and forested wetlands
3. Which values are least suitable for bear habitats?
Values 1, 2, and 3, which represent developed areas
2. Which tool will you use on Streams and Roads to create distance surfaces that you can use in
raster overlay?
Use the Euclidean Distance tool.
4. Do you have to derive a surface from the LandUse raster or can you use it as an input as is?
You can use it as an input as is.
8 Spatial statistics
When you look at a map, your mind will naturally try to identify patterns, trends, and spatial
relationships.
Spatial statistics extend these natural processes by quantifying spatial distributions and spatial
relationships. Spatial statistics allow you to supplement the subjective perspective of your
data with concrete numbers and statistics. Statistics help with enhancing communication,
fostering consensus, facilitating problem-solving through analysis, promoting decision
making, and providing mechanisms for evaluating the impacts of those decisions. In this
lesson, you will focus on the most intuitive and commonly used spatial statistics solutions.
Topics covered
Spatial patterns
Data distributions
Lesson 8
Spatial patterns
A spatial pattern may lead to questions about the possible processes that create the pattern.
Most spatial phenomena exhibit some type of pattern that is probably influenced by some other
factor. For example, an animal species may migrate along the same path every year because there
is plenty of food and water and few predators.
Have you worked with datasets containing thousands of points, like you see in the map on the
left? Did you feel unsure about where to begin to better understand patterns?
Figure 8.1. It is difficult to distinguish spatial patterns from a map of points, as shown on the left. The result of
running a spatial statistics tool, shown on the right, shows you clusters of high and low counts of graffiti incidents.
The red clusters in the map on the right are called hot spots. Hot spots are statistically significant
spatial clusters of high values. You will use hot spots to distinguish spatial patterns.
1. What have you done to busy, subjective maps to show them in a meaningful way?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Spatial statistics are the application of tools and methods that use space and spatial relationships
(such as distance, area, length, orientation, centrality, coincidence, and connectivity) directly in
their mathematical computations. Spatial statistics help you do the following:
• Minimize the subjectivity inherent in human visual interpretation of maps and spatial data.
• Identify and quantify patterns and trends in data that may not be revealed in visual analysis.
• Answer questions more confidently and make important decisions using more than simple
visual analysis.
Figure 8.2. Maps displayed using a graduated color classification can be subjective and highlight things that the
mapmaker wants you to notice. Both maps have the same crime index data but are visualized using different
classification methods.
The following map is based on statistically significant hot and cold spots.
Figure 8.3. The same data as the previous maps is now visualized using a hot spot map. The red and blue colors
indicate the results of a statistical test for spatial clustering, rather than the data values of the attribute mapped in
the graduated color map.
Descriptive statistics
Descriptive statistics return a summary about your data, whether the result is quantitative like a
summary statistic (mean, sum, and so on) or visual, such as a graph or feature class. In GIS,
descriptive statistics commonly measure central tendency, dispersion or concentration, or
orientation of spatial phenomena.
Figure 8.4. On the left, the Mean Center tool locates the geographic center for a sample of points. On the right, the
Directional Distribution tool uses standard deviational ellipses to show directional trends of incidents by day and
night.
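As a sketch of the simplest descriptive spatial statistic, the mean center averages the x- and
y-coordinates of a point set; the coordinates below are hypothetical.

```python
# Sketch of the Mean Center computation: the average x and average y
# of all input points.

points = [(2.0, 3.0), (4.0, 7.0), (6.0, 2.0)]
mean_center = (
    sum(x for x, _ in points) / len(points),
    sum(y for _, y in points) / len(points),
)
```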
Inferential statistics
In classical statistics, inferential statistics infer something about the entire population based on the
distribution of values in a sample. A common example of inferential statistics is predicting the
outcome of an election based on polls. Inferential statistical tests begin by identifying a null
hypothesis. The null hypothesis for pattern analysis tools is complete spatial randomness in either
the location of features or the values associated with those features. Some spatial statistics tools
return statistics that indicate the degree of statistical significance, which in turn provides you with
a degree of confidence in rejecting or not rejecting the null hypothesis of complete spatial
randomness.
Esri Press: The Esri Guide to GIS Analysis, Volume 2: Spatial Measurements and
Statistics
Given the null hypothesis of complete spatial randomness, many of these spatial statistics tools
compare the observed spatial distribution in data to that of a theoretical random spatial
distribution and calculate common statistical significance tests from this comparison. You can
visually represent your randomization null hypothesis using the standard normal distribution (with
a mean of zero and a standard deviation of 1) and use it to interpret two common outputs from
these tools: z-scores and p-values.
Figure 8.6. A normal distribution chart looks like this example, with most of the data falling around the mean and the
highest and lowest values occurring in the tails. When this graphic is used to interpret the results of spatial statistics
tools, output values in the tails indicate that it is unlikely that the observed spatial pattern is the result of random
chance.
In spatial statistics, the z-score and p-value may be interpreted differently. When you run a spatial
statistics tool, the resulting statistics are then compared with the expected value of that statistic
under the null hypothesis of complete spatial randomness. This comparison results in the z-scores
that you see in the outputs.
For example, assume that you run a tool and receive a z-score of 2.16. For a p-value of 0.05, this
z-score exceeds the critical value of 1.96, meaning that you can reject the null hypothesis of
complete spatial randomness. In other words, there is less than 5 percent likelihood that the
observed pattern is the result of random chance.
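The relationship between a z-score and its two-tailed p-value under the standard normal distribution can be sketched in plain Python (the function name and the toy call below are illustrative, not part of the workbook or any Esri tool):

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a z-score under the standard normal
    distribution (mean 0, standard deviation 1)."""
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) is the normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

p = two_tailed_p(2.16)   # the example z-score above; p is about 0.031
```

Because p is below 0.05, the null hypothesis of complete spatial randomness can be rejected at the 95 percent confidence level, matching the reasoning above.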
Figure 8.7. You can compare the legend with the chart to see which features are statistically significant.
Based on the result provided, determine whether the statistical tool used is descriptive or
inferential.
1. Does the Directional Distribution tool describe your data or make statistical inferences
based on the data provided?
_____________________________________________________________________________________
The Average Nearest Neighbor tool was run to create a report and provide statistics about a
distribution of point features.
2. Does the Average Nearest Neighbor tool describe your data or make statistical inferences
about it based on the data provided?
_____________________________________________________________________________________
3. Does the Optimized Hot Spot tool describe your data or make statistical inferences about
it based on the data provided?
_____________________________________________________________________________________
Spatial statistics may seem daunting, but running the tools can quickly provide valuable
information that you can use to understand your data and make informed decisions. Two common
tools used are the Directional Distribution tool and the Spatial Autocorrelation tool.
Figure 8.11. Results of running the Directional Distribution tool on the left and the Spatial Autocorrelation tool on
the right.
• Directional Distribution tool: Creates standard deviational ellipses to summarize the spatial
characteristics of geographic features, such as central tendency, dispersion, and directional
trends.
• Spatial Autocorrelation tool: Measures spatial autocorrelation based on both feature
locations and feature values simultaneously. Given a set of features and an associated
attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random.
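Global Moran's I, the statistic behind the Spatial Autocorrelation tool, can be illustrated with a toy computation in plain Python. This is a simplified sketch with binary adjacency weights along a line of six locations, not the tool's implementation:

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a symmetric weights
    matrix weights[i][j] (binary adjacency in this sketch)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Six locations in a row; features one step apart are neighbors
adj = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
i_clustered = morans_i([1, 1, 1, 9, 9, 9], adj)   # similar values adjacent
i_dispersed = morans_i([1, 9, 1, 9, 1, 9], adj)   # alternating values
```

A positive I indicates a clustered pattern, a strongly negative I a dispersed one, and a value near zero a random pattern, which is how the tool's report is read.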
Clusters occur when phenomena are found in close proximity to one another. Clusters can also form when groups of features with similarly high or low values are in close proximity, or when feature attributes are highly similar. Clusters in your data can identify the locations of
hot spots, cold spots, outliers, and similar features. Finding locations of clusters and spatial
patterns in your data can lead to powerful discoveries and fuel further research questions and
analysis.
Figure 8.12. This graphic illustrates the different ways in which your data can be spatially distributed. To the left, the
data is dispersed, and there is no clustering observed. As you move to the right, the data gets more clustered.
Figure 8.13. On the left, results of the Optimized Hot Spot tool show statistically significant hot and cold spots
(clusters). On the right, results of the Optimized Outlier Analysis tool identify areas where there are low outliers
within hot spots and high outliers within cold spots.
Cluster analysis also may find unusual or extreme data values, called outliers, where one or a few
features may have values that are very different from nearby features. In data analysis, outliers can
potentially have a strong effect on results, so they must be analyzed carefully to determine if they
represent valid or erroneous data.
Density-based clustering
Another powerful clustering tool is Density-based Clustering. The Density-based Clustering tool finds clusters of point features within surrounding noise based on their spatial distribution. Do not confuse density-based clustering with density analysis tools, such as Kernel Density. Density analysis tools take known quantities of a phenomenon and spread them across the landscape.
Figure 8.14. Density-based clustering can identify natural clusters in your data.
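Density-based clustering can be sketched in plain Python with a minimal DBSCAN, one of the clustering methods the tool offers. Unlike the Self-Adjusting (HDBSCAN) method described later, this sketch requires a user-supplied search distance (`eps`); the function is illustrative, not the tool's code:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster ID per point; -1 marks noise,
    mirroring the Cluster ID field of the Density-based Clustering tool."""
    n = len(points)
    def neighbors(i):
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]
    labels = [None] * n                  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1               # noise (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:       # core point: expand the cluster
                queue.extend(jn)
    return labels

# Two tight groups plus one isolated point (noise)
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=2.0, min_pts=2)
```

The two groups receive positive cluster IDs and the isolated point is labeled -1, just as noise features are labeled in the tool's output.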
Clustering tools
Figure 8.15. Hot spot, outlier, and multivariate clustering tools in action.
The Gi* statistic returned for each feature in the dataset is a z-score. For statistically significant positive z-scores, the larger the z-score is, the more intense the clustering of high values (hot spot). For statistically significant negative z-scores, the smaller the z-score is, the more intense the clustering of low values (cold spot).
• You would use the Hot Spot Analysis (Getis-Ord Gi*) tool when you want full control over every parameter option.
• You would use the Optimized Hot Spot Analysis tool when you want the tool to interrogate
your data to determine optimal parameter values.
The Optimized Outlier Analysis tool identifies statistically significant spatial clusters of high values (hot spots) and low values (cold spots), as well as high and low outliers within your dataset.
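The mapping from a Gi* z-score to the hot and cold spot confidence classes shown in tool legends follows the usual two-tailed critical values from standard normal tables (1.65, 1.96, and 2.58 for 90, 95, and 99 percent). The helper below is a hypothetical plain-Python sketch of that mapping, not tool code:

```python
def gi_star_class(z):
    """Classify a Gi* z-score into the familiar legend categories
    using two-tailed critical values."""
    for crit, conf in ((2.58, 99), (1.96, 95), (1.65, 90)):
        if z >= crit:
            return f"hot spot ({conf}% confidence)"
        if z <= -crit:
            return f"cold spot ({conf}% confidence)"
    return "not significant"
```

For example, a feature with z = 2.0 falls in the 95 percent hot spot class, while z = 0.5 is not statistically significant.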
Exercise 8A 25 minutes
You will use various descriptive and inferential statistics tools to quantify patterns and relationships
in ozone sample data.
The map displays the same ozone points that you worked with in the interpolation lesson.
g Click OK.
c Open the Directional Distribution (Standard Deviational Ellipse) tool, and set the following
parameters:
d Click Run.
When the underlying spatial pattern of features is concentrated toward the center with
fewer features toward the periphery, one standard deviational ellipse polygon will cover
approximately 63 percent of the features.
The ellipse indicates a directional trend that is based on the orientation of the sample points.
Next, you will run the same tool using a different ellipse size to account for more sample points.
f In the Geoprocessing pane, modify the following parameters for the Directional Distribution
tool:
g Click Run.
Choosing two standard deviations and including more of the sample points results in a larger ellipse with a directional trend similar to that of the one-standard-deviation ellipse.
The Average Nearest Neighbor tool uses area as a parameter, so you will copy the area from the
StateBoundary layer.
b In the table, double-click the value for AREA, right-click it, and choose Copy.
f Open the Average Nearest Neighbor tool, and set the following parameters:
g Click Run.
The output from the Average Nearest Neighbor tool is a report, not a feature class.
The null hypothesis states that the locations of the features are randomly distributed. However,
the z-score (-3.30) returned from the Average Nearest Neighbor tool is statistically significant at a
confidence level of 99 percent, meaning that there is less than a 1 percent likelihood that the
spatial pattern of ozone locations is the result of random chance.
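The Average Nearest Neighbor statistic itself is straightforward to approximate in plain Python. This is a simplified sketch of the standard observed-versus-expected formulas under complete spatial randomness (CSR); the function name and toy datasets are illustrative, and the real tool handles details this version omits:

```python
import math

def average_nearest_neighbor(points, area):
    """Observed vs. expected mean nearest-neighbor distance under CSR.
    Returns (ratio, z_score): ratio < 1 suggests clustering,
    ratio > 1 suggests dispersion."""
    n = len(points)
    d_obs = sum(min(math.dist(points[i], points[j])
                    for j in range(n) if j != i)
                for i in range(n)) / n
    d_exp = 0.5 / math.sqrt(n / area)          # expected mean distance under CSR
    se = 0.26136 / math.sqrt(n * n / area)     # standard error under CSR
    return d_obs / d_exp, (d_obs - d_exp) / se

# A regular 5 x 5 grid (dispersed) vs. 25 tightly packed points
grid = [(x, y) for x in range(5) for y in range(5)]
packed = [(0.01 * i, 0.0) for i in range(25)]
r_grid, z_grid = average_nearest_neighbor(grid, area=25.0)
r_packed, z_packed = average_nearest_neighbor(packed, area=25.0)
```

A clustered pattern yields a ratio well below 1 and a strongly negative z-score, which is the situation the ozone report describes (z = -3.30).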
i Close the report, minimize File Explorer, and return to ArcGIS Pro.
a In the Geoprocessing pane, go back to the search results and search for spatial.
b Open the Spatial Autocorrelation (Global Moran's I) tool, and set the following parameters:
c Click Run.
The warning is due to not providing a distance band or threshold distance. The
software calculates a default value of 154,019 meters, or roughly 95 miles, to ensure
that every feature has at least one neighbor.
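The idea of a default distance that guarantees every feature at least one neighbor can be illustrated in plain Python: it is at least the largest nearest-neighbor distance in the dataset. This hypothetical helper is only a sketch of the idea; the software's actual default may be computed differently:

```python
import math

def min_threshold_with_neighbor(points):
    """Smallest distance band that still gives every feature at least
    one neighbor: the largest nearest-neighbor distance in the dataset."""
    n = len(points)
    return max(min(math.dist(points[i], points[j])
                   for j in range(n) if j != i)
               for i in range(n))

d = min_threshold_with_neighbor([(0, 0), (1, 0), (5, 0)])
```

Here the isolated point at (5, 0) forces the threshold up to 4 map units, even though the other points are only 1 unit apart.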
If you were investigating crime clusters, a distance of 800 meters might work because that is the
average size of a city block. The appropriate distance threshold is often chosen based on common sense, theory, and the field that you are working in.
The high z-score indicates that you can reject the null hypothesis and that the spatial distribution
of high values and low values in the data is more spatially clustered than would be expected if the
underlying spatial processes were random.
e Close the report, minimize File Explorer, and return to ArcGIS Pro.
b Open the Hot Spot Analysis (Getis-Ord Gi*) tool, and set the following parameters:
150000 is the value that the Spatial Autocorrelation tool used as the threshold distance.
Statistically significant hot spots of high ozone values are in north-central California. These hot
spots indicate higher ozone values surrounded by other higher ozone values.
e From the map tab, in the Navigate group, click the Explore down arrow and choose Topmost
Layer.
The OZONE values are all close to one another and are relatively high, around 0.08 to 0.1.
Another benefit of performing hot spot analysis is seeing where there are clusters of low values, or
cold spots.
h Zoom to the southern part of California where there are cold spots (blue points).
Most of the ozone values for the cold spots are around 0.05 or lower.
Each point is assigned a z-score and a p-value, and those values are added to the attribute table. A combination of a high z-score and a low p-value indicates spatial clustering of high values. A low negative z-score in combination with a low p-value indicates spatial clustering of low values.
Next, you will use the Optimized Hot Spot Analysis tool and let ArcGIS Pro determine the optimal
parameters.
m Open the Optimized Hot Spot Analysis tool, and set the following parameters:
n Click Run.
The results are similar to those of the Hot Spot Analysis tool, but there are some differences if you turn the two layers off and on.
a In the Contents pane, turn off the OptimizedOzone and HotSpots layers.
c Open the Kernel Density tool, and then at the top of the pane, click the Environments tab.
e At the top of the pane, click Parameters, and then set the following parameters:
The density surface aligns well with the sample points, as it should, because you did not use a
population field and only used the point location to create the surface. However, the symbology
of the heat map makes it difficult to interpret patterns.
h In the Contents pane, turn on the OptimizedOzone layer and turn off the Samples layer.
i In the OptimizedOzone layer legend, click the symbol for Hot Spot - 99% Confidence.
m In the map, zoom in on the northernmost part of California, where there is a high density of
sample points.
The hot spots at the 99 percent confidence level are not all located in the high-density area of the points. The density surface is symbolized based on the density of features, not on ozone values. Comparing a density surface with a statistical result demonstrates that spatial statistics quantify spatial patterns, whereas heat maps do not.
n Save the project, and then close the map view and continue to the next exercise.
Exercise 8B 20 minutes
Clustering is an important concept that can give great insights into your data, its relationships with other data, and potential reasons why it behaves a certain way. You will use the Spatial
Statistics clustering tools to locate natural clusters in your data based on geographic location,
perform a hot spot analysis on incident points to determine clustering of incidents, and run outlier
analysis to verify patterns that you have visually analyzed.
From a quick visual analysis, you can tell that more applicants are in the larger cities, like Atlanta.
You will use density-based clustering tools to validate your initial insights and locate natural
clusters in the data.
b Open the Density-Based Clustering tool and set the following parameters:
Some clustering methods require the user to input a search distance. The Self-Adjusting
(HDBSCAN) method is a data-driven approach where the software determines the best
search distance for the input features.
c Click Run.
All the colored areas represent clustering, whereas the gray locations represent "noise" in your
data.
Each feature is assigned a Cluster ID value. If a feature is part of a valid cluster, then the value is
positive, whereas a -1 indicates that the point location is noise. ArcGIS Pro symbolizes the output
layer using the Cluster ID attribute. Next, you will find the best locations for career fairs using the
clusters and Mean Center, a descriptive statistical tool.
g From the Map tab, in the Selection group, click Select By Attributes.
All the features that are part of a natural cluster are selected. Next, you will use the Mean Center
tool on the selected points to locate the best places for career fairs.
l In the Geoprocessing pane, search for and open the Mean Center tool.
n Click Run.
The Mean Center tool located the geographic center of each cluster, thus identifying some
possible areas to explore for locating the career fairs.
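The Mean Center computation itself is simple: the average x and average y of the selected features. The plain-Python sketch below mirrors the per-cluster workflow above, excluding noise features (Cluster ID -1) as the attribute selection did; the function name and toy coordinates are hypothetical:

```python
from collections import defaultdict

def mean_centers(points, cluster_ids):
    """Mean center (average x, average y) per cluster.
    Features with Cluster ID -1 (noise) are excluded."""
    groups = defaultdict(list)
    for pt, cid in zip(points, cluster_ids):
        if cid != -1:
            groups[cid].append(pt)
    return {cid: (sum(x for x, _ in g) / len(g),
                  sum(y for _, y in g) / len(g))
            for cid, g in groups.items()}

centers = mean_centers([(0, 0), (2, 2), (10, 10), (12, 12), (50, 50)],
                       [1, 1, 2, 2, -1])
```

Each cluster yields one candidate location, analogous to the career-fair sites identified above.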
There are more than 120,000 graffiti locations in these New York neighborhoods. With more than
120,000 points, how can you acquire useful information from the data? Using spatial statistics
tools is a great place to start. You will first perform an optimized hot spot analysis to identify hot
and cold spots based solely on location.
f Search for and open the Optimized Hot Spot Analysis tool, and then set the following
parameters:
If you do not override the cell size, the cell size will be determined using the Average Nearest Neighbor tool, which will be roughly 430 meters. You can let the software calculate certain settings, or you can override them if you are familiar with the data.
g Click Run.
There are statistically significant hot spots and some statistically significant cold spots. You may
want to examine the hot spots further to determine the causes for more graffiti in those areas and
then establish a remediation plan.
The non-significant areas often have no graffiti points in them. For example, the large area in the
center has no points.
You will use the basemap layer to gain a better understanding of why some locations have no
graffiti points.
There are no graffiti incidents reported in Central Park, possibly due to more security and a lack of
structures or objects to paint or draw on.
b With the Explore tool, zoom to the large hot spot that is located farthest north.
c In the Contents pane, ensure that the OptimizedHS layer is selected, and then click the
Appearance tab.
e Click in the map, and drag the Optimized layer to see the basemap.
The Bronx Zoo is in a location where there are few to no graffiti points, yet it falls within a hot spot. Hot spots can appear even where a bin contains no points because each bin's statistic is based on the values of its neighbors. You will use a local outlier tool to explore this area further.
f Search for and open the Optimized Outlier Analysis tool, and then set the following
parameters:
g Click Run.
The Optimized Outlier Analysis tool validates your finding that the area around the zoo contains spatial outliers of graffiti incidents. The blue bins indicate a low number of graffiti incidents per bin, surrounded primarily by bins with high numbers of incidents.
Lesson review
Answers to Lesson 8 questions
9 Space-time analysis
Incorporating time into your spatial analysis allows you to focus on how spatial patterns in
your data may vary or change over time. By analyzing data over time, you may detect trends
or patterns that you would not otherwise detect had you analyzed the data for the entire time
period. Understanding how patterns have changed over time can help you determine how
they might change in the future and better prepare you for these changes. ArcGIS Pro has
tools for analyzing data in space and time.
Topics covered
Temporal analysis
You have performed many analyses in the course, and each one answered one specific question:
Where is something?
GIS is often used to find where things are located, but you can add another factor to your analysis
that can give you more information about data and its patterns: time.
How can incorporating time into your data improve your analysis results?
Temporal analysis
Proximity, overlay, and statistical analysis help you determine the "where" questions about your
data and look at the spatial variations in your data. Time-based analysis, or temporal analysis,
adds another dimension to your analyses and can help answer the "when" questions about your
data. In GIS, temporal analysis refers to an analysis that involves a time attribute. Temporal
analysis is useful for studying the variation, or changes, in data over time at the same location.
To gain an understanding of the temporal variability of the ozone measurements, you would
capture ozone readings at the same location and measure the variance in the samples over time.
• Reveal patterns that may not be detected when visualizing data over the full time period.
• Determine whether an event occurs more frequently during certain hours, days, weeks, or months.
• Focus efforts on more recent occurrences.
Exercise 9A 10 minutes
Explore data
Exploratory data analysis is a big part of analysis. You can view data in a map, view attributes, or
make charts to discover new information about your data that will help in your analysis. You will
examine historical tornado data from Texas and do some exploratory analysis using time.
d In the Contents pane, open the Tornado Start Points layer attribute table and explore the
attributes.
1. Which attribute can you use to analyze the temporal aspect of the tornado points?
__________________________________________________________________________________
f In the Contents pane, ensure that Tornado Start Points is selected, and then click the Data tab.
g In the Visualize group, click Create Chart and choose Line Chart.
i Under Time Binning Options, set Interval Size to 1 Years, and then click away from the setting
to see the change in the chart.
3. How can using 70 years' worth of data influence a hot spot analysis?
__________________________________________________________________________________
j Close the Texas map, and keep ArcGIS Pro open for the next exercise.
Space-time analysis
Everything happens within the context of space (location) and time. GIS can analyze spatial
patterns well, but spatial patterns may change over time. If you are analyzing your data in space
only, then you may only be getting half the story. You can create and analyze time snapshots and
perform true space-time analysis.
Time snapshots
Time is often analyzed as a time snapshot, or arbitrary groupings based on time. For example, you
may have data that spans one year and break it up into 12 layers, each representing a month, or
you could add a Month attribute to the table and categorize by month. While breaking up layers
by time snapshots allows you to visualize temporal trends over the course of the year, you may be
arbitrarily breaking up data that is truly related in space and time, and possibly missing important
patterns and trends.
In the following image, the data is broken up into snapshots for January and February, but notice the dates associated with each point. The incidents occurred within a few days of one another, but by arbitrarily separating them into month bins, you may miss a potential pattern.
True space-time analysis considers each incident in relation to incidents near to it in both space
and time, and it is not dependent on arbitrary categories, such as months or days.
In the following graphic, the time slice, also referred to as the time step interval, may seem similar
to a time snapshot, but it is not. A time step interval is a moving window of time that considers all
its neighbors in time and space, while a time snapshot does not consider other features.
Figure 9.4. Each point is placed into a time-series bin, with older events at the bottom and more recent events at the top.
Two tools for creating a space-time cube are the Create Space Time Cube By Aggregating Points
tool and the Create Space Time Cube From Defined Locations tool. Both tools take time-stamped
features and structure them into a netCDF (network Common Data Form) data cube by
generating space-time bins with either aggregated incident points or defined features with
associated spatiotemporal attributes. NetCDF is a file format for storing multidimensional
scientific data (variables), such as temperature, humidity, pressure, wind speed, and direction.
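Aggregation into space-time bins can be illustrated with a toy plain-Python version. The function and sample events below are hypothetical; the real tool writes a netCDF cube and estimates statistics for empty bins, which this sketch omits:

```python
from collections import Counter
from datetime import datetime, timedelta

def aggregate_space_time(events, cell_size, t0, step):
    """Count time-stamped (x, y, t) events per space-time bin.
    cell_size is in map units; step is the time-step interval."""
    bins = Counter()
    for x, y, t in events:
        key = (int(x // cell_size),      # column index
               int(y // cell_size),      # row index
               (t - t0) // step)         # time-step index
        bins[key] += 1
    return bins

events = [
    (120, 80, datetime(2020, 1, 15)),
    (130, 90, datetime(2020, 2, 20)),   # same cell, same time step
    (130, 90, datetime(2020, 7, 4)),    # same cell, two time steps later
]
cube = aggregate_space_time(events, cell_size=100,
                            t0=datetime(2020, 1, 1),
                            step=timedelta(days=91))   # roughly one season
```

Each bin's count becomes one value in the cube, stacked by time step the way Figure 9.4 depicts.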
Earlier, you examined a dataset containing more than 120,000 graffiti incident locations. Because
of the sheer number of points, it was difficult to visually identify any spatial patterns. You ran the
Optimized Hot Spot Analysis tool and its result showed statistically significant hot and cold spots.
From the optimized map and statistical results, you can identify clusters. You know where the hot
and cold spots are, but you want to add time into the analysis to see how graffiti patterns have
changed to help narrow your areas of interest. You cannot put resources everywhere, so by
narrowing the focus to current hot spots, you may be more efficient at prevention or mitigation.
Figure 9.6. Hot spots created from thousands of points indicate clusters where there are statistically significantly
higher numbers of graffiti incidents occurring than the remainder of the study area. But does this map tell the whole
story?
You can use the Create Space Time Cube By Aggregating Points tool to create a netCDF file containing the space-time cube. Then you can add the space-time cube to the Emerging Hot
Spot Analysis tool to get the result in the following graphic. The following result uses a space-time
cube based on eight years of data broken up into three-month time intervals to reflect the
seasons, as there may be seasonal variance to graffiti occurrences. The legend for the layer
provides useful descriptions of the symbology in the map. For example, a new hot spot may be an
area of interest as it was never a hot spot before the most recent time interval.
Figure 9.7. Emerging Hot Spot Analysis results show you varying degrees of hot and cold spots based on a time
interval.
• Run the Create Space Time Cube By Aggregating Points tool to create a netCDF dataset.
• Run the Emerging Hot Spot Analysis tool.
• Run the Visualize Space Time Cube In 3D tool.
You can run the Visualize Space Time Cube In 3D tool and the Emerging Hot Spot Analysis tool in either order, but you must create the space-time cube before running either one. The space-time cube netCDF file is used as an input to both the Emerging Hot Spot Analysis tool and the Visualize Space Time Cube In 3D tool.
• Better understand the structure of the space-time cube and how the process of aggregation
into the cube works.
• Offer insights into the results of Emerging Hot Spot Analysis and Local Outlier Analysis,
providing evidence that can help you understand the result categories themselves.
• Visualize summary fields and variables, which can help you understand how confident you can be in subsequent analyses by displaying the spatial pattern of empty bins whose values had to be estimated.
Exercise 9B 20 minutes
Earlier, you used the Optimized Hot Spot Analysis tool to evaluate more than 120,000 graffiti
incidents and show where there are hot spots, cold spots, and areas with no statistically significant
clusters of graffiti. However, the data spans an eight-year time period, and you would like to dive
deeper into the analysis by incorporating a third dimension: time. Your optimized hot spot map
shows you where there are hot spots and cold spots of graffiti incidents for eight years of
cumulative data without consideration of how consistent they have been or if they have changed
in location over time. By factoring in time, you can narrow down areas of interest and focus on a
few problem spots to help reduce graffiti. In this exercise, you will use ArcGIS Pro space-time
pattern mining tools to further explore the incidents by factoring in when the incident occurred.
c In the Contents pane, make Graffiti, OptimizedHS, and the basemap the only visible layers.
You will incorporate time into your analysis to determine if there is more to the story regarding the
graffiti incidents. There are 120,878 points that span eight years. You are more concerned
about incidents that happened in the last few months to a year rather than ones that happened
eight years ago. Space-time analysis tools can locate temporal hot spots that you can focus on.
You likely clicked a different point in the map, but you will notice the Created_Date field. Each
point has a date and time associated with it that you will use to further investigate the spatial
patterns in the graffiti incidents. Exploratory data analysis is always recommended before
statistical analysis, as many statistical methods require an understanding of data distribution,
presence of outliers, spatial autocorrelation, and other factors. You will display the temporal trend
in the graffiti incidents using a line chart.
i With the Graffiti layer still selected, click the Data tab.
j In the Visualize group, click Create Chart and choose Line Chart.
The chart shows all eight years of data using a one-month interval size. You can see that there are
natural dips and spikes in incidents. One notable pattern is that in the spring of each year, there is
a large spike in incidents. You are not trying to determine why graffiti occurs; rather, you are trying
to get a sense of temporal patterns. The interval size used in the chart depends on the data that
you have. For example, you could show data that spans 50 years in five-year intervals, but if the
data is for one day, you may show it in increments of several hours. For the incidents, you will
change the interval to three months, based on seasons.
When you view the chart using a three-month time interval based on season length, you can still
see a temporal pattern that indicates graffiti spikes at times and dips at times. Temporal variation
in graffiti incidents may be caused by seasonal changes, as spring and summer weather may be
more common times to create graffiti. The chart shows you that there is temporal variation in
graffiti incidents, but by incorporating time using the space-time pattern mining tools, you can
learn much more.
a In the Geoprocessing pane, click the Back button until you see the list of recently used tools.
Hint: If you closed the Geoprocessing pane: Analysis tab > Tools.
d Open the Create Space Time Cube By Aggregating Points tool, and set the following
parameters:
The Time Step Interval parameter is not a time snapshot. By setting the time step
interval to 3 months, you will be able to compare each season to the previous season,
and so on.
Hexagon grids are a good alternative to a fishnet grid. Hexagon grids reduce
sampling bias and represent patterns in your data more naturally than a fishnet
grid. Finding neighbors is also easier because the length of contact is the same
on each side.
• Distance Interval: 500 Meters
You used 500 meters when you ran the Optimized Hot Spot Analysis tool, so you
will use the same value here.
e Click Run.
f At the bottom of the Geoprocessing pane, point to the green box to view the messages for
the tool.
g Scroll down in the messages to view the statistics about your space-time cube file.
No visual result is created, but you will input the netCDF file that you created into the Emerging
Hot Spot Analysis tool.
b In the Space Time Pattern Mining Tools toolbox, open the Emerging Hot Spot Analysis tool.
e In the Contents pane, make EmergingHS and the basemap the only visible layers.
Running the Emerging Hot Spot Analysis tool provides valuable information about your data. You
can match the map symbology with the legend and description to gain a good understanding of
the areas that you should focus on. For example, any persistent, intensifying, or new hot spots
may be of interest, while sporadic hot spots may not be a concern. The results narrow down the
focus even more from the result of the Optimized Hot Spot Analysis tool because you
incorporated time.
How are the descriptions in the legend created? You will find information in the ArcGIS Pro Help
documentation.
Hint: In the upper-right corner of ArcGIS Pro, click the View Help question mark.
The table provides descriptions of all the categories in the emerging hot spot result. You can see
why certain locations in your map are new hot spots—they were a statistically significant hot spot
for the final time step and were never a hot spot before. The information that you can discover
from the space-time pattern mining tools may reveal patterns and trends that were not visible to the human eye, or that were not apparent when analyzing the data over the entire time period.
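A very simplified sketch of the category logic described in the Help can be written in plain Python. This is my own simplification for illustration: the actual tool defines many more categories and uses the Mann-Kendall trend test, which this sketch omits:

```python
def classify_bin(hot_flags):
    """Classify one bin's history, where hot_flags[i] is True if the
    bin was a statistically significant hot spot in time step i."""
    if not any(hot_flags):
        return "never a hot spot"
    if hot_flags[-1] and not any(hot_flags[:-1]):
        return "new hot spot"          # hot only in the final time step
    if sum(hot_flags) / len(hot_flags) >= 0.9:
        return "persistent hot spot"   # hot in at least 90% of time steps
    return "sporadic hot spot"
```

A bin that is significant only in the final time step is a new hot spot, matching the description you just read.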
i Briefly read the descriptions for Intensifying Hot Spot and Persistent Hot Spot.
You started with eight years' worth of data and more than 120,000 points. Now, you have
narrowed down your analysis focus to two or three smaller areas.
a From the View tab, in the View group, click Convert and choose To Local Scene.
d Open the Visualize Space Time Cube In 3D tool, and set the following parameters:
The Display Theme parameter determines the type of information shown in the
legend.
• Output Features: GraffitiCube3D
e Click Run.
f In the Contents pane, in 2D Layers, turn off all layers, including the basemap.
h Press the mouse wheel button, and drag to tilt the 3D view.
Visualizing the space-time cube in 3D allows you to see how hot and cold spots have changed
over time. Each hexagon bin represents a three-month time interval, with the oldest time intervals
on the bottom and the most recent on the top.
You can enable time on the layer to see each time interval display using the Time Slider.
l For Layer Time, choose Each Feature Has Start And End Time Fields.
m Click OK.
n If necessary, zoom out so that you can see all the hexagon bins.
o Point to the time slider, and then click the play button to watch each hexagon bin's three-
month time interval appear.
You have used space-time analysis tools to dig deeper into your data to better understand it.
Lesson review
3. Differentiate between analyzing time snapshots of data and true space-time analysis.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 9 questions
3. How can using 70 years' worth of data influence a hot spot analysis?
Hot spot analysis will show clusters, but what you will not see are the possible changes in
those hot or cold spots over time.
10 Regression analysis
Most of the GIS analyses that you have performed in the course (including proximity, overlay,
and statistical hot spot analyses) determine where phenomena are located. You may want to
further explore the observed spatial patterns to determine why a phenomenon is occurring.
When you understand what contributes to a phenomenon, you are better equipped to offer data-driven solutions to mitigate or prevent a problem.
In this lesson, you will learn about regression—how to use it for explanatory analysis and some
of the statistics that help you locate a valid model. You will also learn a workflow for finding
the best regression model, and use ArcGIS Pro to perform ordinary least squares (OLS)
regression.
Topics covered
What is regression?
Regression equation
Most GIS analysis results tell you where something is located—for example, the best site for a new
store, spatial clustering of various phenomena, or areas where a specific disease is more
prominent than other diseases. Knowing where something occurs is beneficial, but understanding
what contributes to spatial phenomena can help solve a problem. ArcGIS Pro provides statistical
analysis tools that allow you to model spatial relationships to help you understand the factors
causing spatial patterns and predict future patterns based on current data and trends.
Figure 10.1. In this map, optimized hot spot analysis indicates the location of statistically significant hot spots of
higher Medicare spending in hospital referral regions. But why might this spatial pattern exist?
When you look at this map of Medicare spending, you may wonder why there is a hot spot for
spending in the southern and southeastern states. ArcGIS Pro can help identify factors that
contribute to or cause spatial patterns.
You will consider two situations to determine the factors that might have caused certain
phenomena.
Scenario 2: Graffiti
In your job as a GIS analyst for a police department, you notice a high volume of graffiti in the
city. You want to determine the possible causes of graffiti in certain areas.
What is regression?
Regression is a statistical method for evaluating the relationship between variables. By evaluating
that relationship, it is possible to hypothesize about the causes of a pattern. Using regression
analysis, you can model, examine, and explore spatial relationships to better understand factors
behind spatial patterns. When you have quantified the factors that contribute to a phenomenon,
you can make better decisions.
The type of regression model to use depends on the dependent variable that you are modeling:
• Continuous (Gaussian): The variable that you are modeling is continuous. This model performs ordinary least squares (OLS) regression.
• Binary (logistic): The variable that you are modeling represents presence or absence. This
can be either conventional 1s and 0s, or continuous data that has been recoded based on
some threshold value.
• Count (Poisson): The variable that you are modeling is discrete and represents events (for
example, crime counts, disease incidents, or traffic accidents).
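For the binary (logistic) model type, continuous data can be recoded to presence/absence around a threshold. A minimal Python sketch follows; the threshold and values are made up for illustration only.

```python
# Recode a continuous variable to 1s and 0s for a binary (logistic)
# model. The threshold is a hypothetical cutoff; choose one that is
# meaningful for your own data.
def recode_binary(values, threshold):
    """Return 1 where the value meets the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]

incident_rates = [12.0, 75.5, 50.0, 3.2, 91.8]   # made-up data
print(recode_binary(incident_rates, 50.0))       # [0, 1, 1, 0, 1]
```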
Benefits of regression
• Explore correlations: Does higher Medicare spending translate to better health or better-
quality health care?
• Predict unknown values: How many claims for heat-related illness are expected given current
weather forecasts?
• Understand or explain key factors that contribute to a process: Why are test scores higher in
certain parts of the country?
When you understand the key factors that contribute to a process, you can be confident that the relationships that you find are real, and you can use that information to guide decision making.
Regression equation
The regression equation is the core of regression analysis. It provides a context for understanding
terms used in regression analysis. ArcGIS Pro regression tools create the equation based on
parameters that you set for those tools. In regression analysis, there is a dependent variable and
one or more independent variables thought to influence or contribute to the dependent variable.
Regression is used to predict the value of the dependent variable that you are trying to model or
to determine the degree to which an independent variable is important to your model.
Figure 10.3. The OLS regression equation. The equation assumes that all relationships are linear.
Dependent variable: The variable representing the process being predicted or modeled, such as
test scores, foreclosures, or Medicare spending. The dependent variable is also called the
response variable.
Independent variable: One variable or a set of variables used to explain or predict the
dependent variable values. Independent variables are often called explanatory variables.
Coefficient: A value computed for each independent variable, reflecting the strength and type of its relationship to the dependent variable. The larger the coefficient (relative to the units of the independent variable that it is associated with), the stronger the relationship.
Figure 10.6. Each independent variable has a coefficient. For example, spending = b0 + b1(distance) + b2(imaging
events) + b3(hospital beds) + e
Coefficients can indicate positive, negative, or no relationship between the dependent and
independent variables.
Figure 10.7. Scatter plot: A scatter plot is a type of mathematical diagram using Cartesian coordinates to display
values for typically two variables for a set of data. In each example, there is one dependent variable and one
independent variable (univariate).
The equation also has a y-intercept. The y-intercept (b0) is the expected value for the dependent
variable if all the independent variables are zero.
Negative: As the value of the independent variable increases, the dependent variable's value decreases. For example, as the percentage of college-educated people increases (y-axis), the unemployment rate (x-axis) decreases.
Residual: The over- and under-predictions (errors) in the model, or the differences between actual
observed values and predicted values.
Figure 10.8. Distribution of the residuals can indicate whether you have found all key variables. The magnitude of
the difference between the observed and predicted values is one measure of model fit.
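The terms of the regression equation can be made concrete with a small numeric sketch. The Python below uses made-up data (not course data) and a univariate case, while real models usually have several independent variables; it fits y = b0 + b1(x) + e by least squares and computes the residuals.

```python
# A sketch of the OLS equation y = b0 + b1(x) + e for one independent
# variable, fit with the closed-form least-squares solution.
# The data values below are made up for illustration only.
def ols_fit(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope b1: covariance of x and y divided by variance of x
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x  # intercept: expected y when x is zero
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.2, 5.9, 8.1, 9.9]
b0, b1 = ols_fit(x, y)

# Residuals: observed minus predicted (the over- and under-predictions);
# for a least-squares fit with an intercept, they sum to zero
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b1, 2))                    # 1.95
print(round(abs(sum(residuals)), 10))  # 0.0
```

The intercept b0 and coefficient b1 play exactly the roles described above, and the residuals correspond to the e term.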
OLS regression
OLS is the best known of all regression techniques, and you can access it in the Generalized Linear Regression (GLR) tool using the continuous (Gaussian) model type. OLS is widely used outside GIS, and it is the proper starting point for all spatial regression analyses. It provides a global model of the dependent variable, or the process that you are trying to explain or predict. Global means that a single regression equation representing the process is applied to all the features in the study area, which assumes that the relationships are fixed. You evaluate the OLS summary, which contains various diagnostics, including how well the model is performing and how each independent variable is helping the model.
Figure 10.9. OLS is a global regression model that applies one regression equation to all features.
Being a global regression model, OLS creates one equation. Each variable has a single coefficient, and the relationships between data variables are fixed across geographic space. This property is referred to as stationarity. OLS is global and assumes stationarity, meaning that you could move all the points to different locations and the regression equation would be the same. Another type of regression analysis that you will use later accounts for spatial variation in your variables' relationships (nonstationarity).
1. Identify the process that you want to explain or predict, as well as the data variable that
represents it.
2. Select variables that represent the factors influencing the process.
3. Explore and analyze data (descriptive stats, univariate, bivariate).
4. Choose the method (for example, OLS) and specify the model based on what you learned
about data relationships in step 3.
5. Validate and evaluate the model; perform six checks.
Figure 10.10. The regression workflow is iterative and can require a lot of work to properly specify an OLS model.
Depending on what happens in step 5, you may have several different options:
Checkpoint
1. Which of the following options describes the dependent variable in the regression
equation?
c. OLS attempts to explain which variables explain a phenomenon, and to what degree.
Performing OLS regression analysis involves more than running a tool. You must evaluate the statistical results to determine whether the variables that you selected explain the variance in the dependent variable. If your variables meet all six requirements, then you have found a properly specified model. Watch the video to see how to interpret the OLS results, and then answer the following questions.
1. What should you do if the probability associated with a coefficient is not statistically
significant?
_____________________________________________________________________________________
2. What is the adjusted R-squared statistic, and how does it indicate model performance?
_____________________________________________________________________________________
_____________________________________________________________________________________
3. What does it mean if the OLS residuals are spatially clustered? What should you do to
solve the problem?
_____________________________________________________________________________________
_____________________________________________________________________________________
When you perform OLS regression, you must ensure that your model passes the six checks. In this
video, you learned how to find the results and to perform the six OLS checks. A model that passes
all six checks is properly specified.
The regression workflow is to identify your dependent and independent variables, run OLS, and then perform checks on the statistics in the OLS report. The six checks determine whether the variables result in a usable model. Your goal is to find a properly specified model, or one that you can trust to explain the process represented by your dependent variable. After you run OLS, you manually perform the six checks in any order.
Check 3: Residuals should not be clustered in location or in value.
After you run OLS, you will see a message in the geoprocessing results suggesting that you run the Moran's I tool to ensure that your residuals are not spatially autocorrelated. Statistically significant spatial autocorrelation (clustering of residuals) can be a symptom of misspecification: the model is the wrong type, or one or more key variables are missing.
Check 4: Verify that residuals are normally distributed using the Jarque-Bera test.
A properly specified model has residuals that are normally distributed with a mean of zero. One example of a biased model is one that does a good job of predicting high values but performs poorly when predicting low values. A biased model might be the result of outliers in the data or nonlinear relationships between the data variables. If the Jarque-Bera statistic (test) is statistically significant (it has an asterisk next to the p-value), the model is biased and you cannot trust it.
Check 5: Are all VIF values lower than 7.5?
The variance inflation factor (VIF) should be less than 7.5. A VIF over 7.5 for an independent variable indicates variable redundancy (multicollinearity). At least one of the variables with a VIF above 7.5 should be removed.
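Checks 4 and 5 can be illustrated with a small Python sketch of the underlying statistics. The numbers are made up; in practice you read the Jarque-Bera and VIF values directly from the GLR diagnostics rather than computing them yourself.

```python
import math

def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4),
    where S is sample skewness and K is sample kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def vif_two_vars(x1, x2):
    """VIF for the simple two-variable case: 1 / (1 - r^2),
    where r is the correlation between the two variables."""
    n = len(x1)
    m1, m2_ = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2_) for a, b in zip(x1, x2))
    r = cov / math.sqrt(sum((a - m1) ** 2 for a in x1) *
                        sum((b - m2_) ** 2 for b in x2))
    return 1.0 / (1.0 - r ** 2)

# Symmetric residuals have zero skewness, so JB reflects kurtosis only
print(jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0]) >= 0.0)         # True
# Nearly duplicate variables produce a VIF far above 7.5 (redundancy)
print(vif_two_vars([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]) > 7.5)  # True
```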
Figure 10.12. Match the number in the table with the number in the diagram to see where each of the six checks is
in the OLS diagnostic report.
OLS reports
Examine the following OLS reports that use the same dependent variable yet with different
combinations of independent variables, and answer the questions.
2. These OLS reports are for modeling total crime in a city but use different independent
variables. Which model is better and why?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Exploratory regression
Figure 10.15. The goal of exploratory regression is finding properly specified OLS models, given the variables that
you provide.
Exercise 10 (25 minutes)
You will use OLS regression to analyze Medicare spending and explain the factors that contribute
to higher spending.
The map is symbolized using graduated colors based on the total Medicare spending per hospital
referral region (HRR). Lighter colors indicate less spending, and darker colors indicate more
spending. You will use OLS regression to analyze potential causes for higher Medicare spending.
The Total Costs 2010 variable contains the total cost of Medicare spending per HRR. According to
the regression workflow, the process or phenomenon that you are trying to understand is
Medicare spending costs, so Total Costs 2010 will be the dependent variable. Other attributes,
such as number of hospital beds, readmission rate, and number of emergency visits, may be
independent variables that explain costs.
b Scroll through the table to view the attributes, and then close it.
In your first analysis, you will test how the hierarchical condition category (HCC) score (a measure
of a population's overall health) contributes to spending. The HCC score measures the prevalence
of chronic health conditions. Before you use HCC as the independent variable to help explain
total costs, you will create a scatter plot to explore the relationships between total costs and HCC
score.
d From the Data tab, click Create Chart, and then choose Scatter Plot.
e In the Chart Properties pane, for X-Axis Number, choose Average HCC Score 2010.
There is a positive relationship between the HCC score and total costs (as the HCC score increases, total costs increase). The R-squared of 0.66 indicates that the HCC score explains 66 percent of the variance in total costs, a solid value. With an R-squared value of 0.66, you may want to include the HCC score variable in your OLS model. Although a scatter plot shows you the relationship and an R-squared value, you cannot view and evaluate other key diagnostics to determine whether the model is properly specified.
Step 3: Use the Generalized Linear Regression tool to test for higher
spending factors
In this step, you will test how well the HCC index score explains higher spending.
b Open the Generalized Linear Regression (GLR) tool, and set the following parameters:
c Click Run.
In your map, the States layer may be covered by the GLRContinuous layer.
The results of the GLR tool are displayed in the map and symbolized using the standardized
residuals. After running regression, evaluate the report to see how well the model performed and
explains variance in the dependent variable.
d At the bottom of the Geoprocessing pane, in the green box, click View Details.
e Expand the Generalized Linear Regression report, both horizontally and vertically, so that you
can see the entire report.
1. What does the adjusted R-squared value tell you about HCC score and Medicare
spending?
__________________________________________________________________________________
2. What does the AIC value of 1770 tell you about the HCC score and Medicare spending?
__________________________________________________________________________________
An adjusted R-squared of 0.65 is suitable. If this model were properly specified, it would explain about 65 percent of the variation in Medicare spending. However, HCC index scores may not tell the whole Medicare spending story in this area. Several other OLS regression assumptions should be met before you have a properly specified model. You will wait until you have created the full OLS model using several independent variables to perform the six checks, but first, you will explore the spatial output of OLS.
3. What can you extrapolate about the residuals from the map?
__________________________________________________________________________________
You can use the Spatial Autocorrelation tool to validate your visual analysis of the residuals.
b Search for and open the Spatial Autocorrelation tool, and set the following parameters:
c Click Run.
The spatial autocorrelation result validates your visual observation of residual clustering, which suggests that your regression model may be missing key variables.
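The Spatial Autocorrelation tool's core statistic is global Moran's I. As an illustration only, here is a Python sketch with a tiny made-up binary contiguity weights matrix; the real tool builds weights from your features and your chosen conceptualization of spatial relationships.

```python
# A minimal sketch of global Moran's I applied to residuals.
# values: one residual per area; weights: n x n spatial weights matrix.
def morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                  # deviations from mean
    s0 = sum(sum(row) for row in weights)           # sum of all weights
    num = sum(weights[i][j] * z[i] * z[j]
              for i in range(n) for j in range(n))
    den = sum(zi ** 2 for zi in z)
    return (n / s0) * (num / den)

# Four areas in a line; adjacent areas weighted 1 (binary contiguity)
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered = [10.0, 9.0, 1.0, 2.0]  # similar residuals sit next to each other
print(round(morans_i(clustered, w), 3))  # 0.323 (positive: clustered)
```

A positive, statistically significant Moran's I on the residuals is the condition that fails check 3.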
b From the Data tab, click Create Chart and choose Scatter Plot Matrix.
You will use Total Costs 2010 as the dependent variable because you want to determine the
factors that contribute to it. You will choose several independent variables that previous research
and theory have indicated are strong factors that contribute to higher Medicare spending. You will
analyze the number of hospital beds, evaluation and management costs, total imaging events
(MRI, CAT scan), distance to Houston, and dehydration rates.
c In the Chart Properties pane, for Numeric Fields, select the following options:
You will focus on the column on the far left, which has the dependent variable Total Costs 2010 on the y-axis and the independent variables on the x-axis.
e In the Chart Properties pane, check the Show Histogram and Show Linear Trend boxes.
The histogram shows the distribution of the variables, and the linear trends indicate positive,
negative, or no relationship between the variables.
For the most part, the relationships between the variables and Total Costs 2010 are positive. A positive relationship makes sense: more hospital beds, higher evaluation and management costs, more imaging events, a greater distance from Houston, and a higher dehydration rate all mean more costs. These relationships are expected and indicate that these variables may be strong factors in higher costs.
The R2, or adjusted R-squared, is an indicator of how well the independent variables model the
dependent variable. So, how well do the five variables chosen explain total Medicare costs?
Viewing the adjusted R-squared value is another exploratory measure that you can perform before
you run a regression tool.
You can also look at the relationships between independent variables. A strong positive correlation indicates that the variables may be telling the same part of the story (multicollinearity). You can remove independent variables that tell the same story to reduce redundancy.
b Double-click the Generalized Linear Regression tool that you ran earlier.
c Modify only the following parameters (leaving the others as they are):
d Click Run.
The spatial output is added to the map. Next, you will use the results to perform the six checks
and validate the model.
a In the Contents pane, find the charts created with the output from the GLR tool.
The Relationships Between Variables chart shows a scatter plot matrix similar to the one that you
viewed before you ran the GLR tool. The Distribution Of Standard Residual chart shows how the
residuals from the model compare to a normal distribution. The Standardized Residual vs. Predicted Plot shows the standardized residuals plotted against the standardized predicted values. No patterns should be present if the model fits well. Do you see a pattern in the chart?
e Expand the Generalized Linear Regression report so that you can see the entire report.
4. What does the OLS summary indicate about the statistical significance for each
variable?
__________________________________________________________________________________
5. What does it mean when a coefficient's probability has an asterisk next to it?
__________________________________________________________________________________
Number of hospital beds shows a positive relationship with Medicare spending costs.
Most of the other variables have a positive relationship. For example, higher evaluation and management costs in an HRR, more MRI and other imaging events, and a higher dehydration rate all contribute to more spending.
There is one negative relationship: the distance to Houston. The distance to Houston variable
measures how far each referral region is from Houston, as Houston has one of the largest medical
complexes in the world. This variable is spatial. When you are not finding a properly specified
model, including a spatial variable can sometimes help capture the nonstationarity (regional
variation) in the data relationships.
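A distance variable like this is computed before regression. Here is a minimal sketch with made-up projected coordinates; in practice you would use a geoprocessing tool such as Near or Generate Near Table to attach the distance to each feature.

```python
import math

# Straight-line distance from each feature to a reference location.
# Assumes projected (planar) coordinates; the values are made up.
def distance_to(point, reference):
    dx = point[0] - reference[0]
    dy = point[1] - reference[1]
    return math.hypot(dx, dy)

reference = (0.0, 0.0)               # stand-in for the Houston location
centroids = [(3.0, 4.0), (6.0, 8.0)] # stand-ins for HRR centroids
print([distance_to(c, reference) for c in centroids])  # [5.0, 10.0]
```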
9. Which of the OLS checks assesses normality in the distribution (nonspatial) of residual
values? (Hint: Look in the GLR Diagnostics section.)
__________________________________________________________________________________
11. Based on the spatial OLS output of the residuals, do you think that the residuals are
clustered?
__________________________________________________________________________________
g Search for and open the Spatial Autocorrelation tool, and set the following parameters:
h Click Run.
Spatial autocorrelation validates your initial visual analysis of randomly distributed residuals.
j Close the Spatial Autocorrelation Report, minimize File Explorer, and return to the Generalized
Linear Regression report.
13. What information can you obtain from the AIC and adjusted R-squared values in
comparison to the first model that tested only the HCC score variable?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
Based on the six OLS checks, you have a properly specified regression model, or one that you can trust to explain total Medicare costs in this study area. As you know, OLS is a global regression model that applies one equation to all features and assumes fixed relationships. What if the relationships behind Medicare spending changed based on location? Most spatial relationships are not static, and you can improve your regression model by incorporating varying relationships. In the next lesson, you will work with a different regression tool that incorporates spatial variation of variables.
k Leave the Generalized Linear Regression report open, as you will use it in the next exercise.
Lesson review
2. Explain what the AIC value represents and how to use it.
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
3. You want to perform OLS regression analysis but do not have key independent variables in
the attribute table. What can you do to get the required information in the table?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
You can enrich your data by adding demographic and landscape facts about the people and places that surround or are inside your data locations. Enriched data comes from demographic data that Esri curates annually, available through ArcGIS Online or locally installed Business Analyst data. The output feature class from the Enrich tool is a duplicate of your input with new attribute fields added to the table. You can then use the attributes for any operation that uses attributes, including as variables in regression analysis. This tool requires either an ArcGIS Online organizational account (and consumes credits) or a locally installed Business Analyst dataset.
Answers to Lesson 10 questions
Scenario 2: Graffiti
2. What factors might contribute to more graffiti in certain areas of a city?
Answers may vary, but some potential causes include time of year, weather, income, and
availability of structures.
Checkpoint (page 10-11)
1. Which of the following options describes the dependent variable in the regression equation?
c. OLS attempts to explain which variables explain a phenomenon, and to what degree.
2. What is the adjusted R-squared statistic, and how does it indicate model performance?
Adjusted R-squared is a statistic that quantifies model performance. The value returned is
a percentage that shows how much of the variation in the dependent variable is
explained by the independent variables in the model.
3. What does it mean if the OLS residuals are spatially clustered? What should you do to solve the
problem?
Spatially clustered residuals indicate that you may be missing key variables. Try adding
other independent variables to the model until the residuals are not spatially clustered.
OLS reports (page 10-17)
2. What does the AIC value of 1770 tell you about the HCC score and Medicare spending?
Nothing. When you have other regression models using Total Costs 2010 as the
dependent variable, you can compare AIC values.
3. What can you extrapolate about the residuals from the map?
Visually, there appears to be some spatial clustering of over-predictions (red) and under-
predictions (blue).
4. What does the OLS summary indicate about the statistical significance for each variable?
The variables have an asterisk (*) and are therefore statistically significant, helping the
model.
5. What does it mean when a coefficient's probability has an asterisk next to it?
There is a low probability that the variable is not helping the model.
9. Which of the OLS checks assesses normality in the distribution (nonspatial) of residual values?
(Hint: Look in the GLR Diagnostics section.)
Jarque-Bera
11. Based on the spatial OLS output of the residuals, do you think that the residuals are clustered?
No, they do not appear to be clustered.
13. What information can you obtain from the AIC and adjusted R-squared values in comparison to
the first model that tested only the HCC score variable?
Adjusted R-squared increased from 0.65 to 0.86, indicating that the variables explain 86
percent of the Medicare spending story. Further, the AIC value decreased from 1770 to
1672, indicating a better model for the dependent variable.
11 Geographically weighted regression
You have used OLS, a global regression method, to create a regression model that is applied
to all features in the study area. Tobler's First Law of Geography states: "Everything is related
to everything else, but near things are more related than distant things." With Tobler's Law in
mind, you may speculate that spatial relationships vary over a study area. OLS regression assumes that the relationships between your variables are static over space, but another type of regression analysis, called geographically weighted regression (GWR), allows these variable relationships to change over space. In this lesson, you will use GWR to see whether your model improves by allowing the data relationships to vary spatially.
Topics covered
Earlier, you learned several important terms and concepts related to OLS regression. Using what
you have learned about dependent and independent variables, coefficients, and R-squared
values, assess the following situation and answer the questions.
1. Is OLS, a global regression model, an appropriate choice to make these predictions? Why?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
GWR characteristics
You have learned that OLS regression constructs one equation for all features in the study area. As such, it is considered a global regression model, which assumes that the relationships between your data variables are static over space. Another type of regression, GWR, is a local regression model: it constructs a separate equation for each feature in the study area, calibrated using only that feature's neighbors. As a result, GWR allows variable relationships to change over space. When you find a properly specified model using OLS, you can use the same variables in GWR and potentially improve your results.
• You have found a properly specified OLS model and you want to know if allowing for spatially
varying relationships improves model performance. Use the same variables as you did in the
OLS model.
• You want to predict alternative or future values using a model calibrated with existing values.
• The OLS diagnostics indicate statistically significant nonstationarity.
It is important to find a properly specified model using OLS regression. OLS has strong
diagnostics that you can use to perform the six checks to find the best combination of variables.
GWR has some diagnostics, but not all of them are suitable for finding a properly specified model.
A recommended approach is to find a properly specified model using OLS, and then use the
same variables in GWR.
Figure 11.1. Perform six checks using OLS before running GWR.
In the OLS diagnostic report, there is a statistic called the Koenker statistic. The Koenker (BP)
statistic is a value that indicates whether the independent variables in the model have a consistent
relationship to the dependent variable both in geographic space and in data space. A statistically
significant Koenker value, or an asterisk (*) next to it, indicates that the modeled relationships are
not consistent.
Figure 11.2. If the Koenker test is statistically significant, use the Robust Probability values to determine coefficient
significance.
GWR in action
GWR tips
• Use GWR after you find a properly specified model using OLS and the same independent
variables.
• Use GWR if the Koenker test statistic has an asterisk (*) next to it after running OLS. The
asterisk (*) next to the Koenker statistic indicates statistically significant nonstationarity.
• Use GWR when you are predicting future values based on current and estimated values.
• An optional output of GWR is creating coefficient surfaces so that you can visualize the
relationships between each independent variable and dependent variable to see where the
relationships are stronger.
• GWR prints R-squared as a measure of goodness of fit for the model. Its value varies from 0.0
to 1.0, with higher values being preferable. It may be interpreted as the proportion of
dependent variable variance accounted for by the regression model. The denominator for
the R-squared computation is the sum of squared dependent variable values. Adding an
extra explanatory variable to the model does not alter the denominator but does alter the
numerator, giving the impression of improvement in model fit that may not be real.
• Because of the previously described problem for the R-squared value, calculations for the
adjusted R-squared value normalize the numerator and denominator by their degrees of
freedom. This has the effect of compensating for the number of variables in a model, and
consequently, the adjusted R-squared value is almost always smaller than the R-squared
value. However, in making this adjustment, you lose the interpretation of the value as a
proportion of the variance explained. In GWR, the effective number of degrees of freedom is
a function of the bandwidth, so the adjustment may be quite marked in comparison to a
global model like OLS. For this reason, the AICc is preferred as a means of comparing
models.
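The relationship between R-squared and adjusted R-squared described above can be made concrete with a short Python sketch. The observed and fitted values below are made up for illustration, and the functions use the standard textbook formulas, not Esri's exact implementation:

```python
def r_squared(y, y_hat):
    """Proportion of dependent-variable variance explained by the model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, k):
    """Normalize by degrees of freedom to penalize the k explanatory variables."""
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observed and fitted values
y     = [10.0, 12.0, 15.0, 11.0, 14.0, 13.0]
y_hat = [10.5, 11.8, 14.6, 11.2, 13.9, 13.1]

print(round(r_squared(y, y_hat), 3))
print(round(adjusted_r_squared(y, y_hat, k=2), 3))  # smaller, because of the penalty
```

For any model with at least one explanatory variable, the adjusted value is smaller than the plain R-squared, which is why adding a weak variable can raise R-squared while leaving the adjusted value nearly unchanged.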
Exercise 11: Perform GWR (20 minutes)
In this exercise, you will use the same variables from a properly specified model to get a better
result by allowing for spatial variation in the variable relationships. You will also use GWR to
predict Medicare costs related to reducing dehydration rates.
a Restore the ArcGIS Pro project, and verify that you are viewing the Regression Analysis map
and the Generalized Linear Regression report window.
b In the GLR report's GLR Diagnostics section, locate the Koenker statistic.
Running GWR does not require additional effort to find the correct variables; you already did that
work using OLS. When you run GWR, you will use the same variables from the properly specified
OLS model from the previous exercise.
d Search for and open the Geographically Weighted Regression (GWR) tool, and set the
following parameters:
e Click Run.
g Expand the Geographically Weighted Regression (GWR) window, scroll to the bottom, and
find Model Diagnostics.
There are fewer diagnostics in the GWR report than in the OLS report, but there are adjusted R-
squared and AIC values. The OLS model from the previous exercise had an adjusted R-squared of
0.86 and an AIC of 1672.
Find a properly specified model using OLS first, and then use the same variables in
GWR.
2. What can you say about the adjusted R-squared and AIC values in the GWR model
compared to the values in the OLS model?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
By taking into account nonstationarity, GWR provides an improved model of Medicare spending.
c Click the Symbology down arrow, and then choose Graduated Colors.
The darker areas show locations where the relationship between number of imaging events and
Medicare spending is the strongest. Knowing where relationships are strongest can help you focus
any remediation efforts or decide how to address the problem. Next, you will symbolize the GWR
layer by another coefficient.
f In the Symbology pane, change the Field to Coefficient (PQI10D), which is the dehydration
rate, and keep the other settings as they are.
Viewing the coefficient map for dehydration rates indicates an underlying spatial process. In the
western part of the study area (Texas), dehydration rates have more of an impact on Medicare
spending than in the southeastern states, such as the Carolinas and Georgia. The map does not
indicate that there is more dehydration or more spending in the dark areas; rather, it shows where
the relationship between dehydration rates and Medicare spending is strongest.
Alternatively, you can set an optional parameter for GWR to create coefficient surfaces
for each independent variable by specifying an output workspace. A coefficient surface
conveys the same information as symbolizing the GWR layer by a coefficient, but as a
raster dataset that shows the spatial variation in each variable's relationship.
Based on the results, you can target outreach programs to help educate people on staying
properly hydrated. You cannot assign resources everywhere, so GWR helps narrow down problem
areas so that you can target efforts to resolve the issue.
b Open the previous run of the GWR tool, and leave all parameters that you set previously as
they were.
d Click Run.
GWR starts by calibrating the model using the original data, so you will notice that the diagnostics
in the report are the same. The difference is that after it completes the original analysis, GWR then
predicts the impact based on any new variables provided; in this case, a reduced dehydration rate. You will
update the symbology of the output prediction layer to match the original map of Medicare
spending, using that predicted value for cost.
f In the Symbology pane, ensure that the Primary Symbology is set to Graduated Colors.
g In the upper-right corner of the Symbology pane, click the Options button and choose
Import Symbology.
h In the Geoprocessing pane, for Symbology Layer, choose Study Area, and then click Run.
k In the Contents pane, make PredictReducedDehy, States, and Study Area the only visible
layers.
Initial costs:
You can see what the impact of reducing dehydration by 50 percent would be and the areas
where it would be the most effective.
Lesson review
Answers to Lesson 11 questions
2. What can you say about the adjusted R-squared and AIC values in the GWR model compared to
the values in the OLS model?
Adjusted R-squared is 0.88, indicating that you are telling more of the Medicare spending
story in this area by allowing the relationships to vary across space. Additionally, the
lower AIC value for GWR indicates that it provides a better fit than the OLS model, given
that the models use the same dependent variable.
12 Geostatistical interpolation
You have used deterministic interpolation to create continuous surfaces from sample points
using deterministic methods. You also learned that there is another method of interpolation
called geostatistical interpolation. In this lesson, you will learn the basics of geostatistical
interpolation and create a prediction surface.
Topics covered
Geostatistical interpolation
Kriging
Geostatistical workflow
Deterministic interpolation
All spatial interpolators attempt to predict the value of an attribute at unknown locations using
attribute values from known sampled locations.
Earlier in the course, you used deterministic interpolation methods to create surfaces from known
point locations, predicting unknown values from known values. Deterministic interpolators use a
mathematical formula to calculate this predicted value based on the degree of smoothing or
similarity in relation to neighboring points.
Figure 12.2. Some interpolators attempt to fit a mathematical function to a distribution of points, much like bending
a piece of paper to fit the distribution.
When you use deterministic interpolators, the output is fully determined by the user-specified
parameter values and the data. Changing these parameter values may produce different results,
but deterministic methods do not consider the statistical properties or the underlying spatial
structure of the data.
Figure 12.3. In IDW, a deterministic interpolation method, the user can specify a search neighborhood. Only values
within the search neighborhood determine the unknown values.
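The idea behind a deterministic interpolator such as IDW can be sketched in a few lines of Python. The sample coordinates, values, and radius below are hypothetical, and the function is a conceptual illustration of inverse distance weighting with a search neighborhood, not the IDW tool's implementation:

```python
import math

def idw_predict(known, x0, y0, power=2, radius=1.5):
    """Average known values, weighted by 1/d**power, using only
    the sample points that fall inside the search radius."""
    num = den = 0.0
    for x, y, value in known:
        d = math.hypot(x - x0, y - y0)
        if d == 0:
            return value                      # exact hit on a sample point
        if d <= radius:
            w = 1.0 / d ** power
            num += w * value
            den += w
    return num / den if den else None         # None: no samples in the neighborhood

# Hypothetical samples: (x, y, measured value)
samples = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 14.0), (3, 3, 40.0)]
print(idw_predict(samples, 0.5, 0.5))
```

Note how the far-away sample at (3, 3) is excluded by the search radius, so only the three nearby values determine the prediction, exactly as described for the search neighborhood above.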
Geostatistical interpolation
Geostatistical interpolation uses the statistical properties of your measured points to predict
values at unknown locations across the surface. The process of modeling the statistical correlation between
all pairs of points in a dataset allows the spatial dependence to be inferred based on the
underlying spatial structure of the dataset.
Figure 12.4. Geostatistical interpolation models the statistical correlation between all pairs of known points based
on both their distance apart and their values.
Geostatistics is based on the regionalized variable theory, which says the variation in a surface can
be decomposed into three main components: a deterministic trend component, an
autocorrelated error component, and a random error component.
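As a rough illustration, the three components can be combined like this; the linear trend and the sine term are hypothetical stand-ins for real components, not a model of actual data:

```python
import math

def surface_value(x, noise=0.0):
    """Regionalized variable theory: value = trend + autocorrelated error + noise."""
    trend = 0.5 * x                        # deterministic trend component
    autocorrelated = math.sin(x / 3.0)     # smooth, spatially structured error
    return trend + autocorrelated + noise  # noise: independent random error

print(surface_value(3.0))
```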
Geostatistical example:
Figure 12.5. Each blue point represents a sample location that includes a temperature value. Thus, you can
calculate a mean, which is a statistic. Deterministic methods ignore even basic statistics, such as the mean.
Kriging
One of the most popular geostatistical interpolation methods is kriging. Kriging assumes that at
least some of the spatial variation observed in natural phenomena can be modeled by random
processes with spatial autocorrelation, and it requires that the spatial autocorrelation be explicitly
modeled.
Kriging assumptions
Spatial continuity: Every location in the area has a value, but not all values are available to you.
Stationarity: The relationship between two points and their values depends only on the distance
between them, not their exact location.
Normally distributed: Use histograms and other charts to verify a normal distribution, and apply a
transformation if needed.
No global trends: There is a constant average in your data values across the surface. You can
remove trends.
Spatial clustering: The data is evenly distributed, not spatially clustered, as clusters will not
appropriately represent the study area.
You can use kriging techniques to describe and model spatial patterns, predict values at
unmeasured locations, and assess the uncertainty associated with a predicted value at the
unmeasured locations.
Geostatistical workflow
The workflow for performing geostatistical interpolation is more complex than performing
deterministic interpolation. Knowledge of the data and of the various geostatistical properties and
options in the tools is vital to creating a valid prediction surface.
Figure 12.7. Checking the distribution of your data is a commonly used operation for data exploration.
Esri Training course: Exploring Spatial Patterns in Your Data Using ArcGIS
Figure 12.8. Each red dot represents a pair of point locations. The x-axis represents the distance between paired
locations, whereas the y-axis represents semivariance. A red dot in the lower left represents a pair of points that are
close together in distance and are also similar in value. A red dot in the top-right corner represents a point pair that
is far away in distance and also dissimilar in value.
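The pairing of distance and semivariance shown in the figure can be sketched as follows. The transect points are made up, and the fixed-width binning is a simplified version of what the Geostatistical Wizard does when it builds an empirical semivariogram:

```python
import itertools
import math
from collections import defaultdict

def empirical_semivariogram(points, bin_width=1.0):
    """For every pair of points, compute half the squared difference in value
    (the semivariance) and average it per distance bin."""
    bins = defaultdict(list)
    for (x1, y1, v1), (x2, y2, v2) in itertools.combinations(points, 2):
        distance = math.hypot(x2 - x1, y2 - y1)
        bins[int(distance // bin_width)].append(0.5 * (v1 - v2) ** 2)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

# Hypothetical samples along a transect: nearby values are similar
pts = [(0, 0, 1.0), (1, 0, 1.2), (2, 0, 1.9), (5, 0, 3.5)]
print(empirical_semivariogram(pts))
```

With these values, the average semivariance rises with distance: close pairs are similar (lower left of the plot), and distant pairs are dissimilar (upper right).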
Cross-validation
Cross-validation is a procedure for testing how well the model predicts values at unknown
locations. In cross-validation, a piece of data whose value is known independently is removed
from the dataset, and the rest of the data is used to predict its value. This estimate is then
compared to the actual sample value to calculate the model error.
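The leave-one-out procedure can be sketched like this. The inverse-distance predictor below is a toy stand-in for the kriging model, and the sample values are hypothetical:

```python
import math

def predict(known, x0, y0):
    """Toy inverse-distance predictor standing in for the kriging model."""
    num = den = 0.0
    for x, y, value in known:
        w = 1.0 / math.hypot(x - x0, y - y0) ** 2
        num += w * value
        den += w
    return num / den

def loo_cross_validate(points):
    """Leave-one-out cross-validation: hide each known point, predict it from
    the remaining points, and summarize the prediction errors."""
    errors = []
    for i, (x, y, actual) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errors.append(predict(rest, x, y) - actual)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return errors, rmse

samples = [(0, 0, 0.042), (1, 0, 0.050), (0, 1, 0.055), (1, 1, 0.057)]
print(loo_cross_validate(samples)[1])
```

A small root mean square error suggests the model generalizes well to unmeasured locations; large or systematically signed errors point to a misspecified model.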
Exercise 12 (20 minutes)
You have used deterministic interpolation methods to create surfaces from sample points. Now,
you will use geostatistical techniques to create a prediction surface. You will analyze the same
ozone sample points from an earlier exercise, but you will use kriging this time.
c Turn off all layers in the Interpolation map except Samples and StateBoundary.
f Set the Extent to Same As Layer - StateBoundary, and then click OK.
b From the Data tab, click Create Chart and choose Histogram.
The mean of the data is displayed with the red vertical line in the chart.
The height of each bar represents the frequency of data within each bin. Generally, the important
features of the distribution are its central value (for example, mean and median), spread, and
symmetry. The ozone data histogram indicates that the data is unimodal (one hump) and nearly
symmetric. The right tail of the distribution indicates the presence of a relatively small number of
sample points with large ozone concentration values. As a quick check, if the mean and the
median are approximately the same value, you have one piece of evidence that the data may be
approximately normally distributed.
The mean and median values are 0.058 and 0.056, respectively, so they are very close.
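This quick check can be reproduced with Python's statistics module. The ozone values below are hypothetical, chosen to mimic the course data:

```python
import statistics

# Hypothetical ozone concentrations (parts per million)
ozone = [0.042, 0.050, 0.055, 0.056, 0.057, 0.060, 0.064, 0.080]

mean = statistics.mean(ozone)
median = statistics.median(ozone)
print(round(mean, 3), round(median, 4))
# mean slightly above median hints at the right tail seen in the histogram
print(mean > median)
```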
e In the Chart Properties pane, check the Show Normal Distribution box.
Although the ozone measurements do not fit perfectly into the bell-shaped curve, other indicators
(such as mean, median, and kurtosis) suggest a normal distribution. Kriging assumes that the data
is normally distributed, so if your data is not normal, you should apply a transformation. You can
visualize what a transformation will do to the data in the chart. Charts are connected to the
features in the map, so you can select a column in the histogram and see the associated features
selected in the map.
f In the histogram, draw a box around the four bars on the right to select them.
The features highlighted in the map belong to one of the four bins that you selected in
the chart.
h On the Data tab, in the Selection group, click Clear to clear your selection.
i From the Data tab, click Create Chart and choose QQ Plot.
j In the Chart Properties pane, for Compare The Distribution Of, select OZONE.
This is a normal QQ plot: it plots the quantiles of a numeric variable (OZONE) against the
quantiles of a normal distribution. If the two distributions are identical, the plotted points form an
approximately straight line. The farther the plotted points deviate from a straight line, the less
similar the compared distributions are. In this case, the ozone values closely follow a normal
distribution.
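The construction behind a normal QQ plot can be sketched as follows. The ozone values are hypothetical, and the plotting itself is omitted; only the quantile pairing is shown:

```python
import statistics

def normal_qq_pairs(data):
    """Pair each sorted data value with the matching quantile of a standard
    normal distribution; plotted together, the pairs form a normal QQ plot."""
    nd = statistics.NormalDist()
    ordered = sorted(data)
    n = len(ordered)
    return [(nd.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]

ozone = [0.042, 0.050, 0.055, 0.056, 0.057, 0.060, 0.064, 0.080]
for theoretical, observed in normal_qq_pairs(ozone):
    print(round(theoretical, 2), observed)
```

If the printed pairs rise roughly linearly, the data is approximately normal; curvature at either end indicates heavy or light tails.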
a From the Analysis tab, in the Tools group, click Geostatistical Wizard.
e Click Next.
h Click Next.
The semivariogram model is displayed, which allows you to examine spatial relationships between
measured points. Now you would like to fit the semivariogram model to capture the spatial
relationships in the data and use it in the prediction model. The goals are to achieve the best fit
and incorporate your knowledge of the phenomenon in the model. You can change parameters to
get the best fit, or you can let ArcGIS Pro optimize the model.
j Click Next.
You can assume that as locations get farther from the prediction location, the measured values
have less spatial autocorrelation with the prediction location. As these points have little or no
effect on the predicted value, they can be eliminated from the calculation of that particular
prediction point by defining a search neighborhood. You can control the size and shape of the
search neighborhood and other properties.
k Click Next.
The final panel of the wizard is for cross-validation. You learned about manual ways in which to
validate surfaces earlier in the course. With the Geostatistical Wizard, validation is built in.
Validation removes one data location and predicts the associated data using the data at the rest
of the locations. The primary use for this tool is to compare the predicted value to the observed
value to obtain useful information about some of your model parameters.
l Click Finish.
The Method Report window summarizes information on the method and its associated
parameters that will be used to create the output surface.
m Click OK.
n If necessary, zoom out in the map, and then turn off the Samples layer.
The surface created is a layer in the project and not a raster dataset in a geodatabase.
To save the layer to disk, you would right-click it, point to Export Layer, and choose To
Rasters.
q Visually judge how well the default Kriging layer represents the measured ozone values.
In general, do high ozone predictions occur in the same areas where high ozone concentrations
were measured?
You have created a geostatistical surface using kriging. Although you did not alter any
parameters, you see that there are many important data considerations when you perform kriging.
locations where predictions and validations will be performed. The result is a point layer
containing each city, its original attributes, a prediction value, and a standard error value.
You will first create a point layer from the Kriging layer.
a In the Contents pane, right-click the Kriging layer, point to Export Layer, and choose To Points
to open the GA Layer To Points tool.
Each city now has a predicted ozone value, as well as a standard error value that indicates the
level of uncertainty associated with the ozone prediction for each city.
Empirical Bayesian kriging
Empirical Bayesian kriging (EBK) is a geostatistical interpolation method that automates the most
difficult aspects of building a valid kriging model. Other kriging methods in Geostatistical Analyst
require you to manually adjust parameters to receive accurate results, but EBK automatically
calculates these parameters through a process of subsetting and simulations. EBK offers a data-
driven approach to interpolation.
Advantages
Standard errors of prediction are more accurate than those of other kriging methods.
There is also a tool called Empirical Bayesian Kriging 3D (EBK3D) that allows you to interpolate
points in 3D to account for both horizontal and vertical changes in data values. Imagine that you
have points that represent greenhouse gas samples throughout the atmosphere at varying
altitudes. You could use EBK3D to interpolate values where no samples were recorded.
Lesson review
13 3D analysis
Throughout the course, you have witnessed and applied many analysis tools to examine the
spatial relationships in data. Imagine being able to see the effects that mountains, valleys,
buildings, and other 3D objects have on these relationships. Using 3D GIS, you can detect
trends and patterns that are not as apparent in 2D. Further, many analysis questions can only
be answered using 3D tools and visualization.
In this lesson, you will learn techniques for analyzing both surface and 3D feature data in
ArcGIS Pro to identify patterns not apparent in 2D. You will perform line-of-sight analysis,
buffer 3D features, and use 3D overlay tools. 3D capability is included in ArcGIS Pro, so it is
unnecessary to have separate apps to handle the 3D visualization and analysis. However, 3D
Analyst is required for 3D analysis tools.
Topics covered
You have performed many types of analyses in the course, but all have been in 2D. You will learn
about 3D analysis and situations when it would enhance or make your analysis possible.
How could your analysis benefit from incorporating 3D, and what are some potential 3D
analysis examples?
3D analysis examples
There are many uses for 3D analysis, such as analyzing underground resources, determining
visibility or line of sight, shadow-volume analysis, and volumetric and area analysis. In ArcGIS Pro,
you can view and edit 3D data out of the box. If you want to perform 3D analysis using
geoprocessing tools, you must have the 3D Analyst extension.
Multipatch features
A multipatch feature is a GIS object that stores a collection of patches to represent the boundary
of a 3D object as a single row in a database. Patches store texture, color, transparency, and
geometric information representing parts of a feature. All multipatches store z-values as part of
the coordinates used to construct patches. When you create a feature class, you can specify
multipatch as its type, rather than point, line, or polygon.
Sun-shadow volume
Sun-shadow volume creates volumes that model shadows cast by each feature using sunlight for a
given date and time. You can use sun-shadow volume analysis to visualize the effects of a new
building on surrounding areas, such as a park.
Line-of-sight analysis
Line-of-sight analysis determines the visibility of sight lines over obstructions consisting of a
surface and an optional multipatch dataset.
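The underlying visibility test can be sketched with a simple terrain profile. The elevations and observer heights below are made up, and real line-of-sight analysis samples a surface (and optionally multipatch buildings) rather than a 1D list:

```python
def line_of_sight(profile, observer_height, target_height):
    """Check target visibility across a terrain profile sampled at equal
    spacing between the observer (first sample) and the target (last sample)."""
    n = len(profile) - 1
    start = profile[0] + observer_height      # eye level above the ground
    end = profile[-1] + target_height
    for i in range(1, n):
        sightline = start + (end - start) * i / n
        if profile[i] > sightline:
            return False                      # terrain obstructs the sight line
    return True

terrain = [10, 12, 30, 14, 11]                # a 30 m ridge mid-profile
print(line_of_sight(terrain, 2, 2))           # blocked by the ridge
print(line_of_sight(terrain, 40, 2))          # visible from a high vantage point
```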
3D feature analysis
The 3D Features toolset provides a collection of tools for constructing features and assessing
geometric properties in three-dimensional space. You can buffer, intersect, and apply union to 3D
features as you do with 2D features.
Interactive 3D analysis
Figure 13.5. Interactive 3D analysis tools: Viewshed (on the left) and Line Of Sight (on the right).
• Line Of Sight creates sight lines to determine if one or more targets are visible from a given
observer location.
• View Dome determines the parts of a sphere that are visible from an observer located at the
center.
• Viewshed determines the visible surface area from a given observer location through a
defined viewing angle.
• Slice temporarily suppresses part of a scene's display to reveal hidden content. It can be
applied to any content in the scene, making it possible to see inside buildings, explore
stacked volumes, and push through subsurface geology.
Each tool uses a different method to achieve visibility analysis and has customizable creation
methods and parameter values. The analysis feedback in the view is color-coded to distinguish
what is obstructed, unobstructed, and out of range.
Exercise 13: Perform 3D analysis (20 minutes)
You will use 3D buildings and other features in Montreal, Quebec, to perform visibility analysis
using 3D Analyst tools. You will also use 3D features tools to buffer and intersect 3D features.
a If necessary, start or restore ArcGIS Pro and view the SNAPCourse project.
The map file contains the definition for a 3D scene that has buildings, a route, and observer points
for Montreal, Quebec. Next, you will set analysis environments.
h Click OK.
on rooftops or other high vantage points to observe crowd behavior during special events. In this
exercise, you will perform a line-of-sight analysis for an event in Montreal.
a In the Contents pane, right-click the Observers layer and choose Zoom To Layer.
b Tilt and navigate the scene to see the observers and the route.
The two yellow points represent observers, either a camera or a person. You will use 3D
functionality to determine what each potential observer can see along the route. First, you will
create lines, called sight lines, between each of your observer points and the route. Sight lines are
a required parameter in the Line Of Sight tool. You will space these lines 30 feet apart along the
route.
c In the Geoprocessing pane, view the toolsets in the 3D Analyst Tools toolbox.
d Expand Visibility.
e Open the Construct Sight Lines tool, and set the following parameters:
f Click Run.
Sight lines are added from each observer to the route. Some of these sight lines are obstructed by
buildings, while others are not.
a In the Geoprocessing pane, open the Line Of Sight tool, and then set the following
parameters:
b Click Run.
In the LOS layer, the green lines indicate visible lines of sight. The red lines indicate sight lines
that are blocked (that is, the buildings block the observers' visibility of portions of the route).
c Navigate around the scene to view the sight lines and how they are affected by the buildings.
Sometimes, a line of sight can start as visible but become red (not visible) when it is obstructed.
a In the Contents pane, make PipeExplosion and Buildings the only visible layers.
b From the Map tab, click Bookmarks, and then choose Pipe Explosion.
The red symbol indicates where the steam pipe burst. You have used the Buffer tool to create
buffers for 2D features, but there is another buffer tool specifically for 3D features that you will use
for this analysis.
c In the Geoprocessing pane, return to the 3D Analyst Tools toolbox and expand the 3D
Features toolset.
The 3D Features toolset contains comparable tools for performing buffer, intersect,
union, and near operations on 3D features.
e Click Run.
f In the Contents pane, change the color of the Pipe3DBuffer layer to a bright red.
A planar buffer would be flat on a 2D surface, but because you are working in a 3D scene and
using 3D analysis tools, the buffer is a multipatch feature that contains z-values.
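The role of z-values can be illustrated with a point-in-buffer test. This is a conceptual sketch with made-up coordinates, not the Buffer 3D tool's multipatch geometry:

```python
import math

def within_3d_buffer(point, center, radius):
    """True if a point lies inside a spherical 3D buffer around the burst site;
    unlike a planar buffer, the z difference counts toward the distance."""
    dx, dy, dz = (p - c for p, c in zip(point, center))
    return math.sqrt(dx * dx + dy * dy + dz * dz) <= radius

burst = (0.0, 0.0, 0.0)                                  # hypothetical burst site
print(within_3d_buffer((30.0, 0.0, 0.0), burst, 50.0))   # inside the radius
print(within_3d_buffer((30.0, 0.0, 45.0), burst, 50.0))  # too high: outside
```

A planar buffer would accept the second point, because its x,y distance is only 30; counting the z difference pushes it outside the 50-unit radius.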
a In the Geoprocessing pane, return to the 3D Analyst Tools toolbox, open the Intersect 3D
tool, and set the following parameters:
b Click Run.
Disregard the warning. It appears because some multipatch features in the Buildings
feature class are not fully enclosed, but this issue will not affect your results.
d Tilt the scene to view the intersected features from various angles.
The texture of the buildings can interfere with the display of the overlapping features, so you can
turn off the Buildings layer to better see the intersect result.
The areas where the buildings are red are where the blast radius intersected the buildings; these
areas should be tested for contamination.
With the Buildings layer off, you can clearly see the areas where the buildings and buffer
intersected. If you have the appropriate data, you can also use the result of the intersection to
select building interior features, such as rooms. This action will quickly give you a list of locations
to check for breached windows for possible contamination.
The attributes for the buffer and the buildings are in the AffectedAreas layer. If the attribute table
had more information, such as address, occupant name, and so on, it would also be in the table.
Lesson review
1. The core functionality of ArcGIS Pro includes 3D visualization. Does the core functionality
also include 3D analysis geoprocessing tools?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
Answers to Lesson 13 questions
Benefits: Gain insight into the landscape and how real-world features relate to one
another, for enhanced visualization and analysis, to solve spatial problems for which 3D is
essential.
Appendix A
Esri data license agreement
Training Materials Reservation of Ownership. This Agreement gives You certain limited rights to
use electronic and tangible versions of the digital or printed content required to complete a
course, which may include, but are not limited to, workbooks, data, concepts, exercises, and
exams ("Training Materials"). Esri and its licensor(s) retain exclusive rights, title, and ownership to
the copy of Training Materials, software, data, and documentation licensed under this Agreement.
Training Materials are protected by United States copyright laws and applicable international
copyright treaties and/or conventions. All rights not specifically granted in this Agreement are
reserved to Esri and its licensor(s).
Grant of License. Esri grants to You a personal, nonexclusive, nontransferable license to use
Training Materials for Your own training purposes. You may run and install one (1) copy of Training
Materials and reproduce one (1) copy of Training Materials. You may make one (1) additional copy
of the original Training Materials for archive purposes only, unless Esri grants in writing the right to
make additional copies.
Training Materials are intended solely for the use of the training of the individual who registered
and attended a specific training course. You may not (i) separate the component parts of the
Training Materials for use on multiple systems or in the cloud, use in conjunction with any other
software package, and/or merge and compile into a separate database(s) or documents for other
analytical uses; (ii) make any attempt to circumvent the technological measure(s) (e.g., software or
hardware key) that effectively controls access to Training Materials; (iii) remove or obscure any
copyright, trademark, and/or proprietary rights notices of Esri or its licensor(s); or (iv) use audio
and/or video recording equipment during a training course.
Term. The license granted by this Agreement will commence upon Your receipt of the Training
Materials and continue until such time that (1) You elect to discontinue use of the Training
Materials or (2) Esri terminates this Agreement for Your material breach of this Agreement. This
Agreement will be terminated automatically without notice if You fail to comply with any provision
of this Agreement. Upon termination of this Agreement in either instance, You will return to Esri or
destroy all copies of the Training Materials, including any whole or partial copies in any form, and
deliver evidence of such destruction to Esri, and which evidence will be in a form acceptable to
Esri in its sole discretion. The parties hereby agree that all provisions that operate to protect the
rights of Esri and its licensor(s) will remain in force should breach occur.
Limited Warranty. Esri warrants that the media on which Training Materials is provided will be
free from defects in materials and workmanship under normal use and service for a period of
ninety (90) days from the date of receipt.
Disclaimer of Warranties. EXCEPT FOR THE LIMITED WARRANTY SET FORTH ABOVE, THE
TRAINING AND TRAINING MATERIALS CONTAINED THEREIN ARE PROVIDED "AS IS,"
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, AND NONINFRINGEMENT. ESRI DOES NOT WARRANT THAT THE TRAINING OR
TRAINING MATERIALS WILL MEET YOUR NEEDS OR EXPECTATIONS; THAT THE USE OF
TRAINING MATERIALS WILL BE UNINTERRUPTED; OR THAT ALL NONCONFORMITIES,
DEFECTS, OR ERRORS CAN OR WILL BE CORRECTED. THE TRAINING DATABASE HAS BEEN
OBTAINED FROM SOURCES BELIEVED TO BE RELIABLE, BUT ITS ACCURACY AND
COMPLETENESS, AND THE OPINIONS BASED THEREON, ARE NOT GUARANTEED. THE
TRAINING DATABASE MAY CONTAIN SOME NONCONFORMITIES, DEFECTS, ERRORS, AND/
OR OMISSIONS. ESRI AND ITS LICENSOR(S) DO NOT WARRANT THAT THE TRAINING
DATABASE WILL MEET YOUR NEEDS OR EXPECTATIONS, THAT THE USE OF THE TRAINING
DATABASE WILL BE UNINTERRUPTED, OR THAT ALL NONCONFORMITIES CAN OR WILL BE
CORRECTED. ESRI AND ITS LICENSOR(S) ARE NOT INVITING RELIANCE ON THIS TRAINING
DATABASE, AND YOU SHOULD ALWAYS VERIFY ACTUAL DATA, SUCH AS MAP, SPATIAL,
RASTER, OR TABULAR INFORMATION. THE DATA CONTAINED IN THIS PACKAGE IS SUBJECT
TO CHANGE WITHOUT NOTICE. IN ADDITION TO AND WITHOUT LIMITING THE PRECEDING
PARAGRAPH, ESRI DOES NOT WARRANT IN ANY WAY TRAINING DATA. TRAINING DATA MAY
NOT BE FREE OF NONCONFORMITIES, DEFECTS, ERRORS, OR OMISSIONS; BE AVAILABLE
WITHOUT INTERRUPTION; BE CORRECTED IF ERRORS ARE DISCOVERED; OR MEET YOUR
NEEDS OR EXPECTATIONS. YOU SHOULD NOT RELY ON ANY TRAINING DATA UNLESS YOU
HAVE VERIFIED TRAINING DATA AGAINST ACTUAL DATA FROM DOCUMENTS OF RECORD,
FIELD MEASUREMENT, OR OBSERVATION.
Exclusive Remedy. Your exclusive remedy and Esri's entire liability for breach of the limited
warranties set forth above will be limited, at Esri's sole discretion, to (i) replacement of any
defective Training Materials; (ii) repair, correction, or a workaround for Training Materials; or (iii)
return of the fees paid by You for Training Materials that do not meet Esri's limited warranty,
provided that You uninstall, remove, and destroy all copies of the Training Materials and execute
and deliver evidence of such actions to Esri.
Export Regulation. You must comply with all applicable laws and regulations of the United States
including, without limitation, its export control laws. You expressly acknowledge and agree not to
export, reexport, transfer, or release Esri-provided Training Materials, in whole or in part, to (i) any
US embargoed country (including to a resident of any US embargoed country); (ii) any person or
entity on the US Treasury Department Specially Designated Nationals List; (iii) any person or entity
on the US Commerce Department Lists of Parties of Concern; or (iv) any person or entity where
such export, reexport, or provision violates any US export control laws or regulations including,
but not limited to, the terms of any export license or licensing provision and any amendments and
supplemental additions to US export laws.
Governing Law. This Agreement is governed by and construed in accordance with the laws of the
state in which training is being held or, in the case of training provided over the Internet, the laws
of the State of California, without reference to its conflict of laws principles.
Appendix B
Answers to lesson review questions
2. What helps you choose the appropriate datasets for your analysis?
Analysis criteria help you choose the appropriate datasets.
• Add attributes
• Calculate values
• Join fields
• Edit features and attributes
• Modify spatial reference
• Display XY data
• Extract features of interest
2. Explain the difference between using a straight-line distance and using cost.
Straight-line distance is an as-the-crow-flies measurement. For some applications, such as
routing, a straight line does not accurately reflect distance. You can use a cost, such as
time, to create driving-time data that accurately reflects traffic, population, and other factors.
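The contrast can be sketched in pure Python (outside ArcGIS Pro; the travel-time grid and cell values below are made up for illustration). Straight-line distance ignores the surface, while cost distance accumulates per-cell travel cost along the cheapest path, here found with Dijkstra's algorithm:

```python
import heapq
import math

def straight_line(a, b):
    """Euclidean (as-the-crow-flies) distance between two grid cells."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def cost_distance(cost, start, goal):
    """Least accumulated cost from start to goal over a cost grid
    (4-neighbor moves; entering a cell costs that cell's value),
    computed with Dijkstra's algorithm."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float("inf")):
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return float("inf")

# Hypothetical travel-time grid (minutes per cell); the middle column
# is congested, so the cheapest route detours around it.
travel_time = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
start, goal = (0, 0), (0, 2)
```

The straight-line distance between `start` and `goal` is 2 cells, but the least-cost route detours around the congested column, which is exactly the difference the answer describes.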
2. If you use the Intersect tool with streams and watersheds as the inputs, what would the resulting
feature class contain?
Overlay tools output the simpler of the geometries from the inputs, so the output would
contain the streams that fall within the watersheds. Further, the attribute table would
have both stream and watershed attributes. Users could query streams and determine
which watersheds they fall within, or the other way around.
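A one-dimensional sketch in Python can illustrate the two key behaviors described above (the stream and watershed names and extents are invented for illustration): the output keeps the simpler geometry, clipped to the overlap, and each output row carries attributes from both inputs:

```python
def intersect(streams, watersheds):
    """1D sketch of a line-on-polygon intersect: each stream and each
    watershed is (name, start, end) along a single axis.  Output rows
    are stream segments clipped to the overlapping extent, carrying
    attributes from both inputs."""
    out = []
    for s_name, s0, s1 in streams:
        for w_name, w0, w1 in watersheds:
            lo, hi = max(s0, w0), min(s1, w1)
            if lo < hi:  # the stream overlaps this watershed
                out.append({"stream": s_name, "watershed": w_name,
                            "start": lo, "end": hi})
    return out

# Hypothetical inputs.
streams = [("Bear Creek", 0, 10), ("Elk River", 5, 20)]
watersheds = [("Upper Basin", 0, 8), ("Lower Basin", 8, 20)]
result = intersect(streams, watersheds)
```

Because each output row names both a stream and a watershed, you can query in either direction, as the answer notes.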
2. Why would you set model parameters for your model elements and variables?
You set model parameters to share your model with other users who want to run it with
their own data.
3. What are some ways in which you can validate surfaces created using interpolation?
Use the Explore tool to click a sample point and the surface at the same location and
compare the measured value with the estimated value. You can also interpolate using a
subset of your sample points and then use the withheld points to see how well the
interpolator estimated values.
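The withheld-points idea can be sketched in pure Python with a simple inverse distance weighted (IDW) interpolator (the sample coordinates and values are made up; ArcGIS Pro's Geostatistical Analyst automates this as cross-validation). Each sample is withheld in turn, estimated from the rest, and the errors are summarized:

```python
import math

def idw(points, x, y, power=2):
    """Inverse distance weighted estimate at (x, y) from (x, y, z) samples."""
    num = den = 0.0
    for px, py, pz in points:
        d = math.hypot(x - px, y - py)
        if d == 0:
            return pz  # exactly on a sample point
        w = 1.0 / d ** power
        num += w * pz
        den += w
    return num / den

# Hypothetical samples: four corners plus a center point.
samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0),
           (10, 10, 40.0), (5, 5, 25.0)]

# Withhold each point in turn, interpolate from the rest, and compare.
errors = []
for i, (x, y, z) in enumerate(samples):
    rest = samples[:i] + samples[i + 1:]
    errors.append(idw(rest, x, y) - z)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
```

A low root-mean-square error across the withheld points suggests the interpolator is estimating values well at unsampled locations.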
2. Explain the difference between the Reclassify tool and the Rescale By Function tool.
With Reclassify, the user manually sets the class breaks and which values go into each
class. With Rescale By Function, the software determines the classes based on a function
and your input data. Reclassify is best for discrete data, and Rescale By Function is best
for continuous data.
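The two approaches can be contrasted in a small Python sketch (the slope breaks, class values, and output range below are illustrative, not the tools' defaults): manual breaks assign each value to a discrete class, while a function maps values onto a continuous output scale:

```python
def reclassify(value, breaks):
    """Manual class breaks, as in Reclassify: breaks is a list of
    (upper_bound, class) pairs in ascending order."""
    for upper, cls in breaks:
        if value <= upper:
            return cls
    return breaks[-1][1]  # anything above the last break gets the top class

def rescale_linear(value, vmin, vmax, out_min=1.0, out_max=10.0):
    """A continuous linear function, in the spirit of Rescale By
    Function: maps [vmin, vmax] onto [out_min, out_max]."""
    return out_min + (value - vmin) / (vmax - vmin) * (out_max - out_min)

# Hypothetical slope (degrees) -> suitability class breaks.
slope_breaks = [(5, 1), (15, 2), (30, 3)]
```

Here `reclassify(12, slope_breaks)` lands in class 2 regardless of whether the slope is 6 or 14 degrees, while `rescale_linear` preserves that difference on a continuous scale, which is why the latter suits continuous data.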
3. Differentiate between analyzing time snapshots of data and true space-time analysis.
Analyzing by time snapshots groups features into arbitrary bins based on a day, week,
month, year, and so on. Time snapshots may break up related data and do not give you
the whole story. Space-time analysis assesses each feature separately, so features that
fall within a specified time period of one another are analyzed together even when they
span a bin boundary, such as two calendar months.
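A small Python sketch shows the difference (the event timestamps are invented): binning splits a cluster of events that straddles a bin boundary, while a per-feature time window keeps them together:

```python
# Hypothetical incident timestamps (day of year); days 29-33 form one
# cluster that straddles a 30-day bin boundary.
events = [29, 30, 31, 32, 33, 200]

def month_bins(days, bin_size=30):
    """Snapshot approach: group events into fixed bins."""
    bins = {}
    for d in days:
        bins.setdefault(d // bin_size, []).append(d)
    return bins

def time_window_neighbors(days, window=3):
    """Space-time approach: for each event, count the other events
    within the time window, regardless of any bin boundary."""
    return [sum(1 for o in days if o != d and abs(o - d) <= window)
            for d in days]
```

The snapshot bins isolate day 29 from days 30-33 even though they are one cluster, while the window count sees day 29's neighbors and correctly flags day 200 as isolated.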
2. Explain what the AIC value represents and how to use it.
AIC measures relative model performance; a lower value indicates a better model. AIC is
only comparable between models that use the same dependent variable.
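The comparison can be made concrete with a pure-Python sketch (the data and parameter counts here are illustrative, and ArcGIS Pro's regression tools actually report the corrected AICc). For least-squares models, AIC can be computed, up to an additive constant, from the residual sum of squares:

```python
import math

def ols_fit(xs, ys):
    """Fit y = a + b*x by least squares; return the residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def aic(rss, n, k):
    """AIC for a least-squares model, up to an additive constant:
    n * ln(RSS / n) + 2k, where k counts estimated parameters."""
    return n * math.log(rss / n) + 2 * k

# Made-up observations of one dependent variable with a linear trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

rss_line = ols_fit(xs, ys)                                 # slope + intercept
rss_mean = sum((y - sum(ys) / len(ys)) ** 2 for y in ys)   # intercept only

# k = 3 (slope, intercept, error variance) vs. k = 2 (mean, error variance).
aic_line, aic_mean = aic(rss_line, 6, 3), aic(rss_mean, 6, 2)
```

Both models predict the same `ys`, so their AIC values are comparable, and the linear model's lower AIC identifies it as the better model despite its extra parameter.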
3. You want to perform OLS regression analysis but do not have key independent variables in the
attribute table. What can you do to get the required information in the table?
You could manually add attributes or join fields from other data sources. If you do not
have other data sources, you can use the Enrich Layer tool to add attributes from
ArcGIS Online.
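The join-fields option can be sketched in Python (the tract IDs, field names, and values below are invented): rows from the attribute table are matched to an external table by a shared key, and the missing independent variable is carried over:

```python
# Attribute table lacking an independent variable needed for regression.
tracts = [
    {"tract_id": "A", "crime_rate": 3.2},
    {"tract_id": "B", "crime_rate": 1.4},
]
# Hypothetical external table holding the missing variable, keyed by ID.
income = {"A": 52000, "B": 71000}

def join_field(rows, lookup, key, new_field):
    """Join a value from another source onto each row by a shared key,
    mirroring an attribute-table join; unmatched rows get None."""
    return [dict(row, **{new_field: lookup.get(row[key])}) for row in rows]

joined = join_field(tracts, income, "tract_id", "median_income")
```

After the join, every row holds both the dependent variable and the new independent variable, so the table is ready for regression.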
• Enhanced visualization
• Ability to perform analyses not possible in 2D
• Opportunity to gain another perspective about your data
Appendix C
Additional resources
Lesson 3 Resources
Measuring cost
• Esri Training courses: Creating Optimized Routes Using ArcGIS Pro, Creating an Origin-Destination Cost Matrix in ArcGIS Pro, and Finding the Closest Facilities Using ArcGIS Pro
Lesson 5 Resources
Automation methods in ArcGIS Pro
• ArcGIS Pro Help: Create a new task
Lesson 6 Resources
Interpolation methods
• ArcGIS Pro Help: Deterministic methods for spatial interpolation
• ArcGIS Pro Help: What are geostatistical interpolation techniques?
Interpolation tools
• ArcGIS Pro Help: Classification trees of the interpolation methods offered in Geostatistical Analyst
Deterministic interpolation
• ArcGIS Pro Help: Subset Features
• ArcGIS Pro Help: GA Layer To Points
• ArcGIS Pro Help: Performing cross-validation and validation
Lesson 7 Resources
Lesson 8 Resources
Interpreting inferential statistics
• ArcGIS Pro Help: What is a z-score? What is a p-value?
Lesson 9 Resources
Space-time analysis
• ArcGIS Pro Help: Why hexagons?
Lesson 10 Resources
Exploratory regression
• ArcGIS Pro Help: How Exploratory Regression works
Lesson 11 Resources
GWR in action
• ArcGIS Pro Help: Interpreting GWR results
Lesson 12 Resources
Kriging
• ArcGIS Pro Help: How kriging works
• ArcGIS Pro Help: Kriging in Geostatistical Analyst
• ArcGIS Pro Help: Understanding how to create surfaces using geostatistical techniques
Geostatistical workflow
• ArcGIS Pro Help: The geostatistical workflow
• ArcGIS Pro Help: Essential vocabulary for Geostatistical Analyst
• ArcGIS Pro Help: Understanding the semivariogram: The range, sill, and nugget
• ArcGIS Pro Help: Modeling a semivariogram
Lesson 13 Resources
Interactive 3D analysis
• ArcGIS Pro Help: Exploratory analysis tools