Final Exam Guide GIS 311
▪ What is GIS and how can it be described
▪ What is GIS used for
GIS is a technology that is used to create, manage, analyze, and map all types of data.
GIS connects data to a map, integrating location data (where things are) with all types of descriptive
information (what things are like there). This provides a foundation for mapping and analysis that is
used in science and almost every industry. GIS helps users understand patterns, relationships, and
geographic context. The benefits include improved communication, efficiency, management, and
decision-making.
▪ What fundamental questions does GIS help address
Spatial data models are how we store the "what" and the "where" in GIS
A layer represents geographic data in ArcMap, such as a particular theme of data. Examples of map
layers include streams and lakes, terrain, roads, political boundaries, parcels, building footprints, utility
lines, and orthophoto imagery.
Each map layer is used to display and work with a specific GIS dataset. A layer references the data
stored in geodatabases, coverages, shapefiles, imagery, rasters, CAD files, and so on, rather than
actually storing the geographic data. Thus, a layer always reflects the most up-to-date information in
your database. A layer won't draw on your map unless you also have access to the data source on which
the layer is based.
A working GIS integrates five key components: hardware, software, data, people, and methods.
▪ Description of the five components of GIS, how do they work, how do they interact with each
other
Software
GIS software provides the functions and tools needed to store, analyze, and display geographic
information. Key software components are:
Tools for the input and manipulation of geographic information
A database management system (DBMS)
Tools that support geographic query, analysis, and visualization
A graphical user interface (GUI) for easy access to tools
Data
Possibly the most important component of a GIS is the data. Geographic data and related tabular data
can be collected in-house or purchased from a commercial data provider. A GIS will integrate spatial
data with other data resources and can even use the DBMS that most organizations already use to
organize and maintain their data to manage spatial data as well.
People
GIS technology is of limited value without the people who manage the system and develop plans for
applying it to real-world problems. GIS users range from technical specialists who design and maintain
the system to those who use it to help them perform their everyday work.
Methods
A successful GIS operates according to a well-designed plan and business rules, which are the models
and operating practices unique to each organization.
▪ Integration of data and technologies with government agencies and federal standards
National governments use geographic information system (GIS) technology to manage programs and
evaluate policy outcomes. With GIS, agency staff and leaders integrate diverse types of data to derive
understanding, operationalize solutions, communicate insights, and engage stakeholders and the public.
Using GIS to incorporate location intelligence into high-level decision-making provides powerful
insights on critical issues the nation faces, and supports science- and analysis-based policy making.
▪ What are spatial data models, why are they adopted, how do they work
In GIS, we represent the real world using spatial data models. These include the objects in a spatial
database (e.g., the point, line, or polygon geometries) as well as the relationships between them.
Spatial data models are how we store the "what" and the "where" in GIS. Let's review a bit on spatial
data models by watching the video below.
▪ What are the two basic types of spatial data models, how they differ from each other, why do we
choose one rather than the other
There are two main types of spatial data models used in GIS: vectors and rasters. Vectors use discrete
elements - points, lines, polygons - to represent the geometry and coordinates (the "where") of real-
world entities. Vectors also comprise a table of attributes (the "what") that are linked to the point, line,
or polygon geometry. The video below goes into more depth on the vector data model.
The raster data model uses a regular set of cells in a grid to represent real-world entities. Rasters are a
more natural way to represent spatial features that vary continuously over space such as elevation,
rainfall, or temperature. They are also used to store imagery. The video below provides more
information on the raster data model.
There are three common spatial data models being used in GIS today: vector, raster, and triangulated
irregular network (TIN).
▪ Vector data models: definition, characteristics, and geometry
A vector data model defines discrete objects. Examples of discrete objects are fire hydrants, roads,
ponds, or cadastral parcels. A vector data model is broken down into three basic types: points, lines, and
polygons. All three types of vector data are composed of coordinates and attributes.
Points: A point uses a single coordinate pair to define its location. Points are considered to have no
dimension even though they may have a real world dimension. For the purposes of a GIS, no
dimension is assumed. Each point has associated attribute information, and the information is attached
to the center of the point. Examples of spatial phenomena that would be modeled well as points are
light poles, manhole covers, and crime locations.
What you see in this image are the different ways to represent airports using a point vector type. Even
though the symbols are different, they all represent airports, and, all of the attributes for each airport are
linked to the center of each one of these symbols.
A Line Vector: A line vector type is defined by an ordered set of coordinates. Each line or curve is
made up of multiple line segments; on occasion, curved lines are represented mathematically instead.
There are two terms we need to define when discussing lines: the node and the vertex. A node is
where a line begins or ends. A vertex is where a line changes direction. The smallest possible line will
have two nodes, a start node and an end node. Longer lines will still have two nodes, with many
vertices in between where the line changes direction. Attributes may be attached to the entire line, to
individual nodes, or to individual vertices, so each line may have multiple rows of attributes in the
attribute table.
For example, if a line represents a road, each road segment between two intersections may have its
own address information, such as the start address and the end address for that block. An intersection
may have an attribute that describes whether the intersection has a stop sign or a stoplight. The other
option is for the entire line to have one row of attributes no matter how complex the line is. Examples of
spatial phenomena that are modeled well by lines are roads, pipelines, outlines of objects, and power
lines.
Polygon: The last vector data type is the polygon. A polygon is formed by a set of connected lines
where the start and end point have the same coordinate. Because the start and end point have the same
coordinate, the polygon will close and will have an interior region. Attribute information is attached to
the center of the polygon no matter how complex the polygon. Examples of spatial phenomena
modeled well by polygons are lakes, cities, tree stands, and political boundaries.
Tabular information is the basis of geographic features, allowing you to visualize, query, and analyze
your data. In the simplest terms, tables are made up of rows and columns, and all rows have the same
columns. In ArcGIS, rows are known as records and columns are fields. Each field can store a specific
type of data, such as a number, date, or piece of text.
Feature classes are tables with special fields that contain information about the geometry of the
features. These include the Shape field for point, line, and polygon feature classes and the BLOB field
for annotation feature classes. Some fields, such as the unique identifier number (Object ID) and Shape,
are automatically added, populated, and maintained by ArcGIS.
A shapefile stores nontopological geometry and attribute information for the spatial
features in a data set. The geometry for a feature is stored as a shape comprising a set of
vector coordinates.
A shapefile stores integer and double-precision numbers. The numeric types used in the format are:
Integer: Signed 32-bit integer (4 bytes)
Double: Signed 64-bit IEEE double-precision floating point number (8 bytes)
.shp: The main file that stores the feature geometry. No attributes are stored in this file, only
geometry. (Required: Yes)
.shx: A companion file to the .shp that stores the position of individual feature IDs in the .shp file.
(Required: Yes)
.dbf: The dBASE table that stores the attribute information of features. (Required: Yes)
.atx: An attribute index file. (Required: No)
.ixs and .mxs: Geocoding index files. (Required: No)
.prj: The file that stores the coordinate system (projection) information. (Required: No)
.xml: Metadata file. (Required: No)
▪ Topology in GIS: what is it, what is it used for and why, what happens when it doesn’t work
properly
•Dimensionality: the distinction between point, line, area, and volume, which are said to have
topological dimensions of 0, 1, 2, and 3 respectively
Many phenomena are subject to topological constraints: for example, two counties cannot overlap, two
contours cannot cross, and the boundary of an area cannot cross itself. Topology is an important
concept therefore in constructing and editing spatial databases. Figure 2-7, below, illustrates the
concept of topology, using the example of two areas that share a common boundary. While stretching
of the space can change the shapes of the areas, they will remain in contact (retain their adjacency)
however much stretching is applied.
▪ Semi-line algorithm, PIP point object in polygon spatial operation, and ArcGIS PIP-related
topology rules
One of the most basic of spatial operations is that of determining whether a given point lies inside a
polygon. More generally the problem extends to the case of multiple points and polygons, with the
problem being to assign every point to the correct polygon. Related problems include line in polygon
and polygon in polygon tests. A fast initial screening approach is to check whether a point (or line,
polygon) lies in the polygon’s MBR. However, the standard algorithm for determining PIP in the vector
model (known as the semi-line algorithm) is to extend a line vertically upwards (or horizontally, e.g. to
the right) and then count the number of times this line crosses the polygon boundary. If the line crosses
the boundary an odd number of times, the point lies inside the polygon. This is true even when the polygon is
concave or contains holes. A number of special cases need to be considered, as illustrated in Figure
4-13, below. These include points that lie on the boundary or at a vertex, and points that lie directly
below a vertical segment of the polygon boundary. A useful discussion of PIP algorithms and related
procedures is provided in Worboys and Duckham (2004, pp 197-202).
If the given point lies on the boundary or at a vertex of the polygon there is a further difficulty, since a
unique assignment rule could prevent the point from being allocated to any polygon. Solutions to such
cases include: assigning the point to the first polygon tested and not to any subsequent polygons;
randomly assigning the point to a unique polygon from the set whose boundary it lies on; reporting an
exception (non-assignment); assigning a weight value to the point (e.g. 0.5 on a boundary or 1/n at a
vertex, where n is the number of polygons meeting at that vertex).
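A minimal Python sketch of this semi-line (ray casting) test, ignoring the boundary and vertex special cases just discussed; the square polygon and test points are made up for illustration:

```python
# Minimal sketch of the semi-line (ray casting) point-in-polygon test described
# above: cast a ray horizontally to the right and count boundary crossings; an
# odd count means the point is inside. Boundary/vertex special cases are ignored.

def point_in_polygon(x, y, polygon):
    """polygon: list of (x, y) vertices in order; returns True if (x, y) is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the test point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:          # crossing lies to the right of the point
                inside = not inside
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical polygon
print(point_in_polygon(3, 4, square))   # True
print(point_in_polygon(12, 4, square))  # False
```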
▪ Winding Number method as alternate PIP algorithm
Another technique used to check if a point is inside a polygon is to compute the given point's winding
number with respect to the polygon. If the winding number is non-zero, the point lies inside the
polygon. This algorithm is sometimes also known as the nonzero-rule algorithm.
Draw a horizontal line to the right of each point and extend it to infinity.
Count the number of times the line intersects with polygon edges.
A point is inside the polygon if either the count of intersections is odd or the point lies on an edge of the
polygon. If neither condition is true, then the point lies outside.[4]
One way to compute the winding number is to sum up the angles subtended by each side of the
polygon.[5] However, this involves costly inverse trigonometric functions, which generally makes this
algorithm slower than the ray casting algorithm. Luckily, these inverse trigonometric functions do not
need to be computed. Since the result, the sum of all angles, can add up only to 0 or 2π (or multiples of
2π), it is sufficient to track through which quadrants the polygon winds[6] as it turns around the test
point, which makes the winding number algorithm comparable in speed to counting the boundary
crossings.
▪ Topology and Graph Theory; the case of Konigsberg and the Seven Bridges
The Seven Bridges of Königsberg is a historically notable problem in mathematics. Its negative
resolution by Leonhard Euler, in 1736,[1] laid the foundations of graph theory and prefigured the idea
of topology.[2]
The city of Königsberg in Prussia (now Kaliningrad, Russia) was set on both sides of the Pregel River,
and included two large islands—Kneiphof and Lomse—which were connected to each other, and to the
two mainland portions of the city, by seven bridges. The problem was to devise a walk through the city
that would cross each of those bridges once and only once.
Solutions in which the walker reaches an island or mainland bank other than via one of the bridges, or
accesses any bridge without crossing to its other end, are explicitly unacceptable.
Euler proved that the problem has no solution. The difficulty he faced was the development of a
suitable technique of analysis, and of subsequent tests that established this assertion with mathematical
rigor.
▪ What’s geodesy
Geodesy is the science of measuring the shape of the Earth. Geodesy addresses Earth's curvature, how
the Earth deviates from an idealized sphere, and inaccuracies in measurement. These factors result in
the use of different coordinate systems, which cause confusion and spatial errors.
The Earth is an oblate ellipsoid. It is fatter than it is tall - yes, the Earth is a bit squished! Because the
Earth is not perfectly spherical, it is actually quite complicated to define its shape.
Additionally, Earth's shape is impacted by gravity differently depending on the mass balance. The
undulating surface is called the geoid. The geoid is the 3D surface that approximates the true
gravitational shape of the Earth. Certain areas of the world have higher or lower mean sea level values
depending on how strongly gravity is affecting these areas. These undulations can be more than 100m,
so geoidal deviations are not insignificant! But they do make it more difficult to measure the shape of
Earth.
The Datum
A datum is a reference ellipsoid and an identified point of origin to which the ellipsoid is tied. The
figure below shows the geoid, (red, irregular sphere) fit with a datum that consists of the reference
ellipsoid/spheroid (blue circle) and a point of origin.
Datum Adjustments
Periodically, datum adjustments need to be made as the Earth shifts its mass balance. Shifts in the mass
balance change how gravity impacts Earth, which changes the geoid, which warrants a datum
adjustment.
What types of events might warrant a datum adjustment? How often do you think major events occur
that change how Earth's mass is distributed and warrant adjustments to datums?
Events that lead to the need to adjust a datum are infrequent, but they do occur. Any event that
significantly changes the mass balance of the Earth could force a datum adjustment. Examples include
volcanic eruptions, major earthquakes, and even manmade events. Construction of the Three Gorges
Dam in China shifted the balance of water on Earth so much that datum adjustments were needed!
One of the most commonly used datums is the World Geodetic System (WGS) based on the WGS 1984
ellipsoid (WGS84). WGS84 was developed by the U.S. Department of Defense and is used in global
navigation satellite systems.
After the size and shape of the reference ellipsoid have been determined and the origin selected, the
next step is to define the poles, equator, parallels and meridians.
Parallels are lines of latitude. Lines of latitude range from 0° at the Equator to 90° at the poles. A 1°
change in latitude covers the same distance everywhere on Earth.
Meridians are lines of longitude. Lines of longitude range from 0° (the Greenwich Meridian) to 180°.
A 1° change in longitude is NOT the same distance everywhere on Earth because meridians converge
at the poles.
Coordinate Systems
Coordinates are sets of numbers that unambiguously define locations. There are many different sets of
coordinates that describe the same locations on Earth.
Geographic coordinates DO NOT form a Cartesian surface. They occur on a curved surface where the
meridians converge at the poles.
Geographic coordinate systems use a three-dimensional spherical surface to define locations on Earth.
Geographic coordinate systems consist of a datum (see above), a prime meridian, and an angular unit of
measurement.
Because geographic coordinate systems define locations on a curved, three-dimensional surface,
lengths, angles, and areas are not constant across two dimensions. Geographic coordinate
systems, like WGS_84, are not suitable for GIS 311 project work.
Map Projections
The final step in going from the 3D Earth to a 2D map is a map projection. A map projection is a
means of projecting locations from a curved surface (the Earth) onto a flat plane (the map). The video
below shows why map projections are needed.
Developable Surfaces
To project points on a sphere onto a plane, you need some sort of surface to serve as the plane. We call
these surfaces developable surfaces because they can be rolled out into a plane. Three types of
developable surfaces used in map projections are:
Conical (cones)
Cylindrical (cylinders)
Planar (planes)
Distortions
All world maps are wrong because it is impossible to represent the surface of a sphere with a plane
without some form of distortion. Distortions generally fall into four categories:
Shape
Area
Direction
Distance
Tissot's indicatrix is a method for measuring map distortions. Ellipses are placed at the intersections of
parallels and meridians on a sphere. When the sphere is projected onto a plane, the distortions to the
circles make it easy to tell which properties have been distorted.
The figure below shows Tissot's indicatrix of the Mercator projection. The size of the circles becomes
larger moving away from the Equator, but the shape does not get distorted.
The figure below shows Tissot's indicatrix for the Mollweide projection.
Two common projections
There are many different projections that have been developed for different purposes ranging from
navigation to aesthetics. Two projections that are commonly used in GIS are the Lambert Conformal
Conic and the Transverse Mercator.
Lambert Conformal Conic
As the name suggests, this projection uses a cone as the developable surface. The cone is set secant to
the earth so that there are two standard parallels that intersect the surface of the earth. The figure below
from Paul Bolstad's text "Fundamentals of GIS" illustrates how the cone intersects the Earth at the
standard parallels.
When the cone is unrolled, the parallels that intersected the earth have zero distortion. Errors increase
as you move away from the standard parallels.
Transverse Mercator
The transverse Mercator map projection is an adaptation of the standard Mercator projection where the
developable surface - a cylinder - is set transverse to the Earth. The transverse version is widely used in
national and international mapping systems around the world, including the Universal Transverse
Mercator system (discussed on the following page).
The cylindrical developable surface is placed tangent to the Earth along a meridian, and errors are zero along that line of tangency.
Errors increase moving away from this central meridian, as shown in Tissot's indicatrix in the above
image.
There are two very common projected coordinate systems that are used in GIS: the State Plane
Coordinate System and the Universal Transverse Mercator (UTM) system.
State Plane Coordinate System
There are 120 State Plane zones in the U.S. that follow county boundaries. If you find the county, you can find the
zone!
Each zone has its own projection system:
Zones oriented in a North-South direction (e.g., IL-W, IL-E) use a transverse Mercator projection
Zones oriented in an East-West direction (e.g., ND-N, ND-S) use a Lambert Conformal Conic
projection
An example of how a projection is fit for each zone is shown below for the central zone in the state of
Minnesota (MN-C). Since this zone is longer in the E-W direction than the N-S, it uses a Lambert
Conformal Conic projection. The projection is placed secant to the zone, with two standard parallels
placed 1/6 of the way from the bottom and top of the zone, respectively. Distortions increase as you
move north and south of those two lines.
Within each zone, locations are given sets of numbers in meters based on how far the location is from
the Equator, and how far it is from the edge of the zone. Most of Arizona falls within Zone 12, but a
small piece on the west side falls in Zone 11. It is important to note that coordinates are not continuous
across zones. This means that if your study site straddles a UTM zone, it is best NOT to use this
projected coordinate system.
UTM Basics
In the UTM system, coordinates are numbered in meters.
Instead of placing the origin (0,0) in the center of each zone and having negative coordinates, UTM
coordinates are measured relative to an origin placed 500,000 meters west of the zone's central meridian.
These coordinates are called false eastings since the easting value is relative to a somewhat arbitrary
point.
In the northern hemisphere, such as in the picture shown to the right, all northing coordinates are
measured in meters north of the Equator, which is assigned a northing of 0.
In the southern hemisphere, zones also have a false northing to ensure that coordinates are positive.
If the Equator was used as zero, then any coordinates in the southern hemisphere would be negative.
For these zones, northing values are 10,000,000 meters at the Equator and decrease in value as one
moves south.
In this way, all areas in the southern hemisphere maintain a positive northing value.
Eastings still use a false origin located 500,000 meters west of the central meridian.
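A tiny numeric sketch of the false easting/northing arithmetic; the point offsets are made up for illustration, while the 500,000 m and 10,000,000 m constants come from the description above:

```python
# Minimal illustration of UTM false easting/northing arithmetic for a point in
# a southern-hemisphere zone. The point offsets below are hypothetical.

FALSE_EASTING = 500_000        # meters assigned to the zone's central meridian
FALSE_NORTHING_S = 10_000_000  # meters assigned to the Equator in southern zones

meters_east_of_central_meridian = 2_300
meters_south_of_equator = 1_500_000

easting = FALSE_EASTING + meters_east_of_central_meridian    # 502,300 m
northing = FALSE_NORTHING_S - meters_south_of_equator        # 8,500,000 m (still positive)

print(easting, northing)
```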
MODULE 2:
▪ Standard deviational ellipse, dispersion, directional distribution, and central tendency, trends:
definition, calculation, real-world examples, importance
The example below shows two maps. The map on the left shows a point for all tornadoes reported in
the U.S. between 1995 and 2015. The map on the right shows the mean center and standard deviational
ellipse for that same set of tornadoes.
A useful analysis of point locations, especially when they represent plants or animals, is to estimate
home range. Convex hull polygons and characteristic hull polygons are two simple methods for
delineating home ranges based on a set of (x,y) points.
▪ Convex hull polygons and characteristic hull polygons (CHP): definition, characteristics, real-
world examples, differences and limitations
A convex hull is the minimum bounding geometry that completely encloses a set of points. In other
words, it is the smallest polygon that contains all the points. You can think of it as a rubber band that
fits around the outermost set of points. A convex hull can be affected considerably by extreme outlier
points, which also will impact the shape, area, perimeter length and centroid.
The image below taken from Paul Bolstad's text "Fundamentals of GIS" shows how characteristic hull
polygons are formed. Notice how the areas without any points in the convex hull on the left are
removed (white areas) from the characteristic hull polygons on the right.
▪ Kernel density estimation
Suggested Reading - Read Chapter 4.3.4: Density, kernels and occupancy of Geospatial Analysis
before starting this section.
One problem with Convex Hulls and Characteristic Hull Polygons is that they don't account for
different densities of points in different areas of the home range. Kernel density estimation estimates
the probability of finding a point within every area of the study region and maps those probabilities as a
continuous surface.
Kernel density can be difficult to conceptualize. Click here to engage with an interactive website that
helps visualize how kernel density estimation works.
There are many reasons why you might use areal (polygon) data for your GIS analysis.
Certain data are only available in polygon format. For example, the U.S. Census Bureau does not
publish census results at the individual household level to protect privacy.
Some operations require data to be in polygon format
If you ever need to convert points into polygons, you can use the spatial join feature in GIS.
Overlay analyses - these operations include union, intersect, identity, clip, merge, dissolve, buffers, etc.
These were covered in GIS 211 and will not be reviewed here.
Identify characteristics of a distribution - these techniques are similar to those covered in the first part
of this Module including mean center, standard deviational ellipse, etc. One important thing to note is
that when these operations are computed for polygons, they operate on the centroid points of the
polygon.
Describing spatial patterns - this category of techniques help you determine whether features and/or
values are clustered, dispersed, or randomly positioned throughout the study area.
The remainder of this module will focus on describing spatial patterns in areal (polygon) datasets.
▪ Dr. Amy Frazier and the relationship between crime and building demolition in the City of Buffalo, NY
▪ Eric Newburger's explanations of why we use statistics in GIS, what are the three questions of
statistics, what additional information can be extracted via statistics from GIS data
▪ The case of Dr. John Snow’s cholera map in London, 1854, as early example of spatial analysis
The term statistical significance refers to the likelihood a relationship is caused by something other
than random chance. In the words of Eric Newburger, significance testing answers the question: 'are
you sure that's not just dumb luck?'
When we are analyzing spatial patterns, we are interested in determining if there are underlying
processes influencing the locations and values of features. In other words, is something other than
random chance influencing locations.
With 99% significance, there is just a 1 in 100 chance that the observation would have occurred
naturally, and what we are observing is EXTREMELY unusual.
This law, Tobler's First Law of Geography, suggests that objects or phenomena that are geographically close to each other are more likely
to be similar or have some kind of spatial relationship compared to objects that are farther apart.
For example, this concept applies to pollution, noise, soil sciences, and countless phenomena.
▪ Concept of spatial autocorrelation and its three declinations in detecting spatial patterns
(positive/negative/no spatial autocorrelation)
Spatial autocorrelation measures how similar objects are to other objects close to them. In other words,
it measures the correlation of a variable with itself through space. Spatial autocorrelation can be
positive or negative. Positive spatial autocorrelation occurs when similar values occur near one another.
Negative spatial autocorrelation occurs when dissimilar values occur near one another. These principles
of autocorrelation are depicted in the figure below.
When you are describing spatial patterns for GIS data, trends can often be uncovered more easily using
spatial autocorrelation measures. You will learn two separate analyses below. Both are based on spatial
autocorrelation, and both can be used to describe spatial patterns in areal data.
▪ Moran’s I global statistics: definition, calculation, uses, real-world examples, capabilities and
limits
Moran's I is a global statistic for measuring spatial autocorrelation. The term global means the statistics
analyzes the entire dataset together and returns a single value describing the pattern of the features.
Moran's I measures global spatial autocorrelation based on feature locations and feature attribute values
simultaneously.
Given a set of features and an associated attribute, Moran's I evaluates whether the pattern is clustered,
dispersed, or random. The tool calculates the Moran's I Index value and both a z-score and a p-value to
evaluate the statistical significance of the Index. In other words, it helps answer the question "are you
sure that Moran's I value isn't just dumb luck?"
The math behind the Global Moran's I statistic works as follows: the tool computes the mean and
variance for the attribute being evaluated. Then, for each feature value, it subtracts the mean, creating a
deviation from the mean.
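The course figure with the formula is not reproduced in this guide; for reference, the standard form of the Global Moran's I statistic is:

$$ I = \frac{n}{W}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^{2}} $$

where n is the number of features, x_i is the attribute value of feature i, x̄ is the mean of the attribute, w_{ij} is the spatial weight between features i and j, and W is the sum of all the spatial weights.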
Moran's I can provide an indication of WHETHER OR NOT features are clustered, but it does not
provide any information on WHERE features might be clustering. To answer that question, you will
need to use the Getis-Ord Gi* statistic.
The Getis-Ord Gi* statistic (pronounced G-i-star) provides an indication of where features with either
high or low values cluster spatially. This tool works by looking at each polygon feature, setting a
neighborhood search distance, and determining whether a feature with a high value is surrounded by
other features in the neighborhood that have high values as well.
The local sum for a feature and its neighbors is compared proportionally to the sum of all features;
when the local sum is very different from the expected local sum, and when that difference is too large
to be the result of random chance, a statistically significant z-score results.
▪ Concepts of the MAUP, or modifiable areal unit problem, and its two aspects of zoning or
zonation effect and scale effect
The modifiable areal unit problem, or MAUP (sounds like 'mop'), refers to the
biases that are introduced into statistical analysis when data are
aggregated into areal units. There are two related MAUP issues:
1.Zoning effects - the statistics will change according to the shape of the
units
2.Scale effects - the statistics will change according to the size of the units
The figure below illustrates the zoning effect, where the shape of the units
(how the boundaries were drawn) changes the illness rate result. In the left-hand
side, the curved boundary produces an illness rate of 50% below the line and 0%
above. In the right-hand side, the straight boundary produces a different result:
100% to the right and 0% to the left.
▪ Gerrymandering as an example of MAUP; real-world example of MAUP as political occurrence
in Indonesia
MODULE 3:
▪ The two fundamental tables required in the node-link model of a relational database
There are two fundamental tables required in the Node-Link model that can be
stored in a relational database. The first is a Node Table, which stores the unique
identifier for each node along with its x and y coordinates. The second is a Link
Table, which stores the link unique identifier, the node of origin, and the node of
destination. This Link Table can also store whether or not the link is unidirectional
or not.
Demands, such as the number of people that will board the bus at a certain stop,
can be added to networks through attributes.
Turns
When a link terminates in a node (junction), four types of actions can occur:
1.The traveler can pass through the node and continue straight
2.The traveler can backtrack - make a U-Turn
3.The traveler can turn left
4.The traveler can turn right
At each node in the network, the number of possible actions (turns) that can be
performed is represented by n², where n is the number of edges connected to
the node. For example, in the diagram below, the red node is serviced by three
links. Therefore, the number of possible turns for that node is 3² = 9.
Stops
Unlike nodes, stops do not affect or alter the path direction. Stops are placed in the network to signal
where the route should visit. Stops do not necessarily have to occur on the network. The network will
find the closest node. Information for stops includes:
▪ Network analysis applications: definitions and special features of optimal routes analysis,
closest facility & location/allocation analysis, service area analysis
Network analysis is used in many different applications. Four are highlighted here:
Optimal Routes
Closest Facility
Service Areas
Origin-Destination Cost Matrix
Optimal Routes
Anytime you have used Google Maps to get directions to a restaurant, job
interview, or a friend's house, you are performing an optimal route analysis on a
network (well, Google is). Optimal routes are one of the most common
types of network analysis and are now commonplace on most mapping apps.
Depending on the analysis, the definition of 'optimal' may change. For example, you may want to find
the quickest, shortest, or even the most scenic route. The answer will change depending on which
impedance you choose to solve. If the impedance is time, then the optimal route will be the one that
gets you to the destination the fastest. The impedance may include live traffic information from apps
such as Waze, or historical traffic information.
If the impedance is distance, then the optimal route will be the shortest distance without taking into
account traffic or time. Maybe you are running low on gas or on mileage on your car lease and need to
choose the route with the shortest distance!
When performing an optimal route analysis, the best route is defined as the route that has the lowest
impedance, or least cost, where the impedance is chosen by the user. Any cost attribute can be used as
the impedance when determining the best route.
The figure below is from Esri's online documentation and shows an example of
closest facility analysis for locations (crimes, symbolized in yellow) and facilities
(patrol cars on duty). Notice how not all facilities (patrol cars) are assigned to an
incident. The user can specify thresholds for distances (or impedances) that are
too far or costly.
Service areas differ from Euclidean distance (straight line) buffers because they follow the network that
agents such as humans or cars travel along (e.g., roads, railroads, etc.) instead of computing distances
"as the crow flies".
The next step in this analysis could intersect the service area polygons that were
generated through this analysis with a population dataset to determine how
many people/households live within the service area polygons. This analysis
could help determine whether another daycare was needed to service the
population in these areas.
▪ Origin-destination cost matrix and calculation of the least-cost paths along a network
An Origin-Destination cost matrix finds and measures the least-cost paths along
the network from multiple origins to multiple destinations. The output is a matrix
showing the distance between every origin and every destination in the network.
▪ Gamma (γ) index of a network structure: definition and calculation; examples and exercises
The gamma (γ) index provides a measure of how complex a network is. The gamma
index measures the ratio of the actual number of links to the maximum possible number of
links in a network:
γ = l / lmax = l / (3(n - 2))
where l is the number of links in the network, and lmax is the maximum number
of possible links in the network. The maximum number of possible links is equal
to the number of nodes, n, minus 2, multiplied by 3.
Gamma ranges from 0 to 1 with values near 0 indicating simple networks and
values near 1 indicating more complex networks.
In the exercise below, you will compute the gamma index (gamma) for the
following networks - A, B, C, and D. Each network is more complex than the
previous, and so the gamma index values should increase. The first one has been
completed for you, and the answers are filled into the table below the figures.
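A minimal Python sketch of the gamma index calculation; the 5-node, 6-link network used here is made up for illustration and is not one of networks A-D:

```python
# Minimal sketch of the gamma index described above: gamma = l / (3 * (n - 2)),
# where l is the number of links and n is the number of nodes.

def gamma_index(num_links: int, num_nodes: int) -> float:
    """Ratio of actual links to the maximum possible links, 3 * (n - 2)."""
    max_links = 3 * (num_nodes - 2)
    return num_links / max_links

# Hypothetical network with 5 nodes and 6 links.
print(round(gamma_index(num_links=6, num_nodes=5), 2))  # 6 / 9 = 0.67
```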
▪ First-order (C1 ) matrix in network connectivity: definition and calculation; examples and
exercises
The first matrix you will compute is called a 'first-order' matrix. A first-order
matrix measures which nodes are directly connected to another node. If you
can travel from one node to another node without passing through a third node,
then the two nodes are directly connected.
The example below shows a very simple network with four nodes on the left and a
First-Order Matrix on the right. If two nodes are connected by a single link, there is
a '1' in the matrix. Since a node cannot be connected to itself directly, there
are '0's along the diagonal of the matrix.
The row sum equals the direct connectivity for each node. For example, node 'C' is
located centrally in the network. It is the only node that is directly connected to all
other nodes. Node C has a first-order connectivity value of '3'.
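A small Python sketch of building a first-order matrix and taking row sums. The exact links in the course figure are not given here, so the 4-node layout below is an assumption that simply mirrors the description (node C linked to every other node):

```python
import numpy as np

# Hypothetical 4-node network (A, B, C, D) in which C is directly linked to
# every other node. A '1' means the two nodes share a single link.
nodes = ["A", "B", "C", "D"]
C1 = np.array([
    [0, 1, 1, 0],   # A links to B and C
    [1, 0, 1, 0],   # B links to A and C
    [1, 1, 0, 1],   # C links to A, B, and D
    [0, 0, 1, 0],   # D links to C only
])

# Row sums give the first-order (direct) connectivity of each node.
for name, degree in zip(nodes, C1.sum(axis=1)):
    print(name, degree)   # C prints 3, matching the example in the text
```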
▪ Second-order (C2 ) matrix in network connectivity: definition and calculation; examples and
exercises
Second-order connectivity measures nodes that are connected to each other through exactly two links.
While it may be easy to count these in your head for the simple example, for much larger networks that
would not be feasible. Therefore, you need a quick way to compute how many second-order
connections there are in the network.
To compute second order connectivity, you multiply the C1 matrix times the C1 matrix.
For each cell in the output matrix (the C2 matrix), you need to multiply a row in
the first C1 matrix by a column in the second C1 matrix. Let's walk through it.
The empty matrix below is the C2 matrix you need to fill in with values. The
highlighted red square in the top, left corner is the value you are going to fill first.
That square is in Row 1, Column 1. Therefore, you are going to multiply Row 1
from C1 by Column 1 in C1, and then add those values together.
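A sketch of the same multiplication in Python; C1 here is the hypothetical 4-node network from the previous sketch, not the matrix in the course figure:

```python
import numpy as np

# Second-order connectivity: multiply the first-order matrix by itself.
# Each cell of C2 counts the number of distinct two-link paths between nodes.
C1 = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

C2 = C1 @ C1
print(C2)
# C2[0, 0] = Row 1 of C1 dotted with Column 1 of C1
#          = 0*0 + 1*1 + 1*1 + 0*0 = 2  (two two-link paths from A back to A)
```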
Dijkstra's algorithm finds the shortest path between nodes in a network. The video
below explains Dijkstra's Shortest Path Algorithm.
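A compact Python sketch of the algorithm using a priority queue; the weighted graph is made up for illustration:

```python
import heapq

# Sketch of Dijkstra's shortest-path algorithm on a weighted graph stored as an
# adjacency dictionary: node -> list of (neighbor, link cost) pairs.

def dijkstra(graph, source):
    """Return the least-cost distance from source to every reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(queue, (new_d, neighbor))
    return dist

graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```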
MODULE 4:
In the figure below, the white cells have a value of '0', and the gray cells have a
value of '1'. The cell by cell encoding has been completed for the first two rows.
Fill in the rest and check your answers below.
▪ Run length encoding
Run length encoding also stores raster values by row, but groups 'like' values that
are next to each other to save storage space. As above, the white cells in the
figure below are '0' and the gray cells are '1'. First, the number of cells of 'like'
values is reported, followed by the value. In the first row, there are four white cells
(4:0), followed by 1 gray cell (1:1), followed by three white cells (3:0). The first
two rows have been completed. Fill in the rest and check your answers below.
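A minimal Python sketch of run length encoding a single row, using the count:value convention from the text; the sample row reproduces the first row described above (four white cells, one gray cell, three white cells):

```python
# Minimal run-length encoder for one raster row: each run of identical values
# is written as count:value.

def run_length_encode(row):
    runs = []
    count = 1
    for previous, current in zip(row, row[1:]):
        if current == previous:
            count += 1
        else:
            runs.append(f"{count}:{previous}")
            count = 1
    runs.append(f"{count}:{row[-1]}")
    return runs

print(run_length_encode([0, 0, 0, 0, 1, 0, 0, 0]))  # ['4:0', '1:1', '3:0']
```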
▪ Quad tree diagrams: definition and calculation; tree hierarchy, leaf nodes and non-leaf nodes; white
leaves and gray leaves; examples and exercises
Quad tree diagrams are the most complex of the three data structures reviewed
here. Rather than code the raster row by row, quad trees create a hierarchical tree
diagram describing how the raster can be drawn. The raster is divided into
hierarchical quads (or quarters), and a code is stored only when the entire quad
consists of a single value. The goal is to build a tree with all leaves. For quadrants in
which all cells are a single value, the value is stored as a leaf node. For quadrants
with multiple values, the value is stored as a non-leaf node. Non-leaf nodes must
be broken down further until they can be stored as a leaf. Let's walk through an
example below.
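A compact recursive sketch of the idea in Python for a square binary raster whose side is a power of two; the 4 x 4 raster is made up for illustration:

```python
import numpy as np

# Sketch of quad tree encoding: uniform quadrants become leaf values; mixed
# quadrants become nested lists (non-leaf nodes) that are subdivided further.

def quad_tree(block):
    values = set(block.flatten())
    if len(values) == 1:
        return int(block[0, 0])          # leaf node: the whole quadrant is one value
    half = block.shape[0] // 2
    return [                             # non-leaf node: NW, NE, SW, SE children
        quad_tree(block[:half, :half]),
        quad_tree(block[:half, half:]),
        quad_tree(block[half:, :half]),
        quad_tree(block[half:, half:]),
    ]

# Hypothetical 4 x 4 raster: the NW quadrant is all gray (1), the NE quadrant is mixed.
raster = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
print(quad_tree(raster))  # [1, [0, 0, 0, 1], 0, 0]
```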
▪ Map algebra of raster files: definition and the four main groups
Map algebra is a framework for analyzing data stored as rasters. Map algebra
operations and functions are typically broken down into four groups: local, focal,
zonal, and global. The first three will be discussed on this page. Global operations
are discussed on the next page.
▪ Local raster operations: unary and binary operations, definition and calculation, real-world
examples and exercises
Local raster operations are performed on a cell-by-cell basis. The functions are
applied to each individual cell, and in situations where multiple rasters are
involved, local operations only involve those cells sharing the same
location.
The figure below shows a unary local operation on the left, and a binary local
operation on the right. In the unary operation, there is only one input raster, and
the mathematical operation is performed on each cell individually. In the binary
operation, the mathematical operation is performed between the two rasters. In
each case, a new output raster is created.
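A minimal sketch of a unary and a binary local operation using NumPy arrays as stand-ins for rasters; the values are made up:

```python
import numpy as np

# Local (cell-by-cell) map algebra: operations apply only between cells that
# share the same location.
raster_a = np.array([[1, 2], [3, 4]])
raster_b = np.array([[10, 20], [30, 40]])

unary_output = raster_a * 2            # unary local operation: one input raster
binary_output = raster_a + raster_b    # binary local operation: cellwise sum of two rasters

print(unary_output)   # [[2 4] [6 8]]
print(binary_output)  # [[11 22] [33 44]]
```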
▪ Focal raster operations or neighborhood operations: definition and calculation; real-world examples;
examples and exercises with the use of the 3x3 moving window to compute focal operations such as
maximum/minimum/average statistics
The term ‘Focal Operation’ refers to each operation being performed on a cell that
is the ‘focus’ of a neighborhood. The neighborhood can take many shapes, but it
is commonly a square box that glides across the raster, computing a statistic
within the box and replacing the center cell with the new value. The video below
explains focal operations in more depth.
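A minimal sketch of 3x3 focal statistics using SciPy's ndimage filters; the small raster is made up, and edge cells are handled by the filters' default reflection behavior:

```python
import numpy as np
from scipy import ndimage

# Focal (neighborhood) operations: a 3x3 moving window computes a statistic
# and writes it to the center cell of the window.
raster = np.array([
    [1, 1, 1, 5],
    [1, 1, 1, 5],
    [1, 1, 9, 5],
    [5, 5, 5, 5],
], dtype=float)

focal_max = ndimage.maximum_filter(raster, size=3)   # 3x3 focal maximum
focal_mean = ndimage.uniform_filter(raster, size=3)  # 3x3 focal mean

print(focal_max)
print(focal_mean)
```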
▪ The effect of increasing/decreasing the size of the moving window over the anomalies in a raster:
changes in value/resolution due to focal operations performed on rasters
▪ Zonal raster operations: definition and calculation; difference from focal operations; real-world
examples and exercises
A zonal operation computes a new summary statistic value (such as the mean,
median, or maximum) from cells aggregated for some zonal unit. Zonal operations
differ from focal operations in several key ways. Zones do not have to be
symmetrical or regular-shaped. Also, a zonal operation changes the values for all
of the cells in the zone, not just a focal cell, and all cells in the zone are given the
same value in the output raster.
Global operations compute an output raster dataset in which the output value at
each cell location is potentially a function of all the cells combined from the
various input raster datasets. In this manner, global operations are considered
'per-raster' operations, since the output depends on the entire raster or rasters.
▪ Euclidean distance surfaces: definitions of source and of Euclidean distance; how they work;
how they are identified and computed; real-world examples
The Euclidean distance tools describe each cell's relationship to a source or a set
of sources based on the straight-line distance.
Source
The source identifies the location of the objects of interest, such as roads, rivers,
fire hydrants, or conservation areas. If the source is a raster, it must contain only
the values of the source cells, while other cells must be NoData. If the source is a
feature, it will internally be transformed into a raster when you run the tool.
Distance
Euclidean distance is calculated from the center of the source cell to the center of
every surrounding cell. The distance to each source cell is determined by
calculating the hypotenuse of the right triangle that can be drawn between the
source and each surrounding cell (see figure below). When there are multiple
sources (e.g., many roads), the shortest distance to a source is determined, and
the value is assigned to the cell location on the output raster.
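A minimal sketch using SciPy's Euclidean distance transform as a stand-in for the Esri tool; the source raster and the 10 m cell size are made up for illustration:

```python
import numpy as np
from scipy import ndimage

# Euclidean distance surface: every cell gets the straight-line distance (in
# map units) from its center to the center of the nearest source cell.
sources = np.array([
    [0, 0, 0, 0],
    [0, 1, 0, 0],   # source cells are marked 1
    [0, 0, 0, 0],
    [0, 0, 0, 1],
])

# distance_transform_edt measures distance to the nearest zero, so invert the
# source mask; `sampling` scales the result by the cell size in y and x.
distance = ndimage.distance_transform_edt(sources == 0, sampling=(10, 10))
print(distance)
```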
▪ Weighted distance or cost surfaces: definition
Cost surfaces differ from Euclidean distance surfaces in one respect: in addition to
the distance from a source or destination, a cost surface also includes a
computation of the cost to travel there such as time, money, calories, or carbon
emissions.
Below is a set of practice examples based on the same surface used in the video.
The first one is done for you. Compute the cost for all of the empty cells.
▪ Concepts of aggregation and block statistics operations: definition and examples, when they’re
applied, different statistics to perform either operation
Block statistics partitions the input raster into non-overlapping blocks and
calculates a new value for each cell in that block based on a statistic of the input
values.
The outputs of the two operations will look exactly the same on screen. The
only way to know if the resolution is unchanged is to check the raster properties.
You'll want to use aggregation in situations where you need to resample the
resolution of the raster for further analysis and block statistics in situations
where you just need to change the values of the cells in a block.
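A minimal NumPy sketch of a 2x2 block mean, with a note on how aggregation would differ; the 4x4 raster and block size are made up:

```python
import numpy as np

# Block statistics: partition the raster into non-overlapping 2x2 blocks,
# compute a statistic per block, and assign it to every cell in the block.
raster = np.array([
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
], dtype=float)
block = 2

rows, cols = raster.shape
# Mean of each 2x2 block (a 2x2 array of block means).
block_means = raster.reshape(rows // block, block, cols // block, block).mean(axis=(1, 3))

# Block statistics: broadcast each block mean back to every cell, so the output
# keeps the original resolution. Keeping block_means as-is instead would be an
# aggregation to a coarser-resolution raster.
block_stats = np.kron(block_means, np.ones((block, block)))
print(block_stats)
```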
There are many available statistics for performing either aggregation or block
statistics:
Mean
Minimum
Maximum
Majority
Sum
Range
Median
Standard Deviation
▪ Basic raster reclassification: definition and main uses
Reclassification is the process of assigning new values to the cells in a raster
based on a set of new data values. Reclassifying data is useful when you want to:
Rescale values from numerous different rasters into a common scale
Reduce the number of values for mapping/analysis by grouping together
certain values
Convert ratio or interval data into categorical classes
Set certain values to 'No Data'
While changing the symbology of a raster can produce a similar looking result
as reclassification, it is important to understand that changing symbology does
not reclassify the data.
The figure below shows an example of a reclassification. The old values were
symbolized according to ranges of values. Those ranges were assigned new
values in the reclassification table. The output raster reflects the new data values.
If...then...else
The equation at the top of the figure below is read as: The output for each cell
equals a conditional statement (CON). IF the value of a cell in LayerA is less than
3 assign the cell the value from LayerB. ELSE (otherwise), assign it the value from
LayerC.
In the example above, the CONDITION is that LayerA is less than 3. The
reclassification values then depend on whether or not that condition is satisfied in
a particular cell. There are two options for output values - LayerB and LayerC.
The option selected depends on the answer to whether LayerA is less than 3.
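A minimal NumPy sketch of the same if...then...else logic, with np.where standing in for the Con tool; the three small rasters are made up:

```python
import numpy as np

# Where LayerA is less than 3, take the value from LayerB; otherwise take the
# value from LayerC.
layer_a = np.array([[1, 4], [2, 5]])
layer_b = np.array([[10, 10], [10, 10]])
layer_c = np.array([[99, 99], [99, 99]])

output = np.where(layer_a < 3, layer_b, layer_c)
print(output)  # [[10 99] [10 99]]
```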
Cartographic modeling involves multiple data layers that are combined through
a set of operations, which often include overlay, reclassification, buffering, and
other functions.
The flow chart shows the input and output datasets in boxes. The operations performed on the
datasets are shown in ovals. There are three input datasets: lakes, roads, and hydric status. The lakes
and roads layers are buffered to identify areas 'near' lakes and roads (the first two criteria). The hydric
status is reclassified (recoded) to identify wetlands. The lakes and roads buffer layers are
combined through a union operation. The wetlands are then subtracted from the
lakes and roads buffer union to produce the areas suitable for the new park.
While the example above uses primarily vector data, cartographic modeling
often uses raster data. Cartographic models are used for a variety of
applications including land use planning, transportation studies, modeling disease
spread, and identifying preservation sites.
Intermediate Data
Cartographic modeling often produces 'intermediate' data layers that are not
part of the final output of the model. For instance, in the example above, the lakes
and roads buffer layers were intermediate data layers. Some software programs
such as ArcGIS allow you to designate these data layers as temporary in the
model, so that you do not save a large number of unnecessary data files during
the project.
▪ Site suitability analysis via weighted overlay as cartographic modeling: definition and basic
concepts, ranked values and assigned weights
Weighted Overlay
Site Suitability analyses are a common example of cartographic models. Site
suitability analysis involves the ranking of values within each data layer to put
them on the same scale and the weighting of different data layers to produce a
single, final, output surface showing areas of high and low suitability.
The figure below shows how multiple criteria, each represented by a separate
raster, can be combined through weighting to produce a single output
surface. The process involves a pixel-by-pixel, weighted combination of
four input layers. Every pixel in each layer has been ranked on the same scale (1
to 9, with 1 being best and 9 being worst). The layers are each then given
different weights according to their relative importance in the overall analysis.
The value for each pixel is the sum of the products (ranked value times weight)
divided by the sum of the weights for all the layers.
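A minimal NumPy sketch of that sum-of-products calculation; the four ranked layers and the weights are made up for illustration:

```python
import numpy as np

# Weighted overlay: each criterion raster is already ranked on a common 1-9
# scale (1 best), each layer gets a weight, and the suitability value is the
# sum of (rank x weight) divided by the sum of the weights.
ranked_layers = [
    np.array([[1, 5], [3, 9]]),   # e.g., distance to roads
    np.array([[2, 4], [6, 8]]),   # e.g., land use
    np.array([[1, 1], [5, 9]]),   # e.g., drainage
    np.array([[3, 7], [2, 6]]),   # e.g., slope
]
weights = [4, 3, 2, 1]

weighted_sum = sum(w * layer for w, layer in zip(weights, ranked_layers))
suitability = weighted_sum / sum(weights)
print(suitability)   # lower values = more suitable on this 1-9 ranking
```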
▪ Steps to perform a site suitability analysis with definitions and relative procedures: selection of
criteria; assignment of rankings and weightings; creation of the final suitability surface and how
to compute it
Step 1: Selecting Site Criteria
The first step in designing a site suitability analysis is to design the complete set
of criteria needed to perform the analysis. In lab, you'll be locating the optimal
site for corn production using four criteria: distance to roads, land use, drainage,
and slope. This set of criteria is simplified so that you can complete the lab in a
reasonable amount of time, but in the real world, the criteria would be much more
extensive.
Let's imagine you are scouting out a site to build a new ski resort. What are
some criteria that you might want to consider when selecting the
site? Several possible criteria are listed below, see if you can think of any others
not on the list.
2. Slopes (slopes should be steep enough for skiing, but not too steep!)
Before data layers can be combined together in a weighted overlay, the values
within each layer need to be formalized into a common scale. Rankings
can either be continuous or discrete. An example of discrete rankings would be
that soils are either good or bad for construction. An example of continuous
rankings would be that slopes can be rated continuously from 0% (best) to 100%
(worst) for suitability.
All layers should have values in the same scale range. For example, the cost to
acquire a parcel of land cannot be ranked according to dollar amount from $0 to
$100,000 while soils are ranked on a scale of 1-10. The values of the parcel cost
would dominate the equation when the layers are combined.
To put these values onto the same scale, the cost of the parcels needs to
be reclassified to a scale of 1 to 10 so that it matches the scale of the soils.
There are two methods to determine appropriate rankings: empirically or theoretically.
There are many instances where determining a formalized ranking scale either
empirically or theoretically is difficult. Rankings based on personal values can
be particularly difficult to assign. For instance, values such as "private" or
"isolated" can be difficult to formalize. In these cases, it is best to work closely
with the decision-maker to ensure the rankings reflect their personal values.
Importance Ranking
One common method for assigning weights to different layers is to rank the layers
according to their importance, and then use the equation below to compute
weights.
The denominator is simply the sum of all of the numerators, which you saw from
the previous answer was 10.
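The course equation itself is not reproduced here; the sketch below uses a common rank-sum formulation that is consistent with the note above that the numerators for four layers sum to 10:

```python
# Rank-sum weighting: one common way to turn importance ranks into weights.
# For rank r (1 = most important) out of n layers, the numerator is n - r + 1,
# and the denominator is the sum of all numerators (4 + 3 + 2 + 1 = 10 for n = 4).

def rank_sum_weights(num_layers):
    numerators = [num_layers - rank + 1 for rank in range(1, num_layers + 1)]
    denominator = sum(numerators)
    return [numerator / denominator for numerator in numerators]

print(rank_sum_weights(4))  # [0.4, 0.3, 0.2, 0.1]
```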
Review
Weighted overlay operations are commonly used to solve multicriteria problems
such as site suitability analysis. These operations require that the multiple layers
contain values on the same scale, which can be accomplished through rankings.
Next, weights are assigned to each layer based on its relative importance.
The final suitability surface is the sum across all layers of the value in each cell
times the weight of the layer, divided by the sum of the weights across all layers.
MODULE 5:
▪ DEM Digital Elevation Models: definition, basic concepts, sources for DEM data, common uses,
real-world examples
As mentioned before, the USGS has DEM layers available at different resolutions,
including 30-meter resolution, with DEMs as fine as 5-meter resolution in Alaska and
1-meter resolution in the conterminous U.S.
Slope
Slope is the incline, or steepness, of a surface. Slope can be measured in
degrees from horizontal (0-90°) or as percent slope (the rise divided by the
run, multiplied by 100). Keep in mind that a slope of 45 degrees equals 100
percent slope because the rise and the run are the same.
Computing slope on a DEM is a bit more complicated since the elevation change is
the 'rise' (the z dimension), but there are now two dimensions for the 'run' (the x
and y dimensions). Therefore, computing slope on a DEM must take into account
the change in elevation values in both the x and y directions. There are two
primary methods for doing this:
1.The 4-nearest neighbor method
2.The 3rd order finite differencing approach
Both methods for computing slope are a type of focal analysis, so if you
need a refresher on moving windows and kernels, please refer back to the page in
Module 4 on Local, Focal, and Zonal Operations.
In the 4-neighbor method, only the four cells that share an edge with the center
cell are used in the computation of slope. Slope (s) in the equation below is
computed as:
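The equation appears as a figure on the course page and is not reproduced here; written out from the description that follows, it is:

$$ s = \arctan\left(\sqrt{\left(\frac{dZ}{dx}\right)^{2} + \left(\frac{dZ}{dy}\right)^{2}}\right) $$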
There are two main components of the equation above: dZ/dx and dZ/dy.
In plain English, the first value (dZ/dx) represents the change in elevation value in
the x direction (dZ) divided by the distance covered in the x-direction (dx). The
second term (dZ/dy) represents the change in elevation in the y direction (dZ)
divided by the distance covered in the y-direction. Those terms are each squared
and summed, and slope is the arctangent (atan) of the square root of the summed
terms.
Let's walk through the example of computing slope for the center cell for the
raster below. You will notice that the cells are labeled Z1 through Z9. These labels will
help keep the computations clear.
The first step is to compute the dZ/dx term in the numerator of the equation
above. dZ refers to the difference of the z-value, where z is the elevation. In this
first term, you want to find dZ in the x-direction, so subtract the terms to the right
and left of the center cell.
Notice that the focal cell, the center cell, is not actually used in the
calculation. The differencing of values occurs across the center cell.
The denominator of the term is simply the distance between the centroids of the cells to the left and
right of the center cell. Since the cell size is 10, that centroid-to-centroid distance is 20.
By plugging in 0.45 and 0.15 into the equation above, it is now simple to compute
the slope value for the center cell. Just be sure to always compute slope in
degrees, not radians!
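As a quick check on that arithmetic, with dZ/dx = 0.45 and dZ/dy = 0.15:

$$ s = \arctan\left(\sqrt{0.45^{2} + 0.15^{2}}\right) = \arctan\left(\sqrt{0.2025 + 0.0225}\right) = \arctan(0.474) \approx 25.4^{\circ} $$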
The figure below (Bolstad 2015) provides an example of how to compute the
dZ/dx and dZ/dy terms. Notice that the equation does not change. The only thing
that changes is the derivation of dZ/dx and dZ/dy. Using the same cell numbering
scheme as above:
Aspect
Another piece of information that can be derived from a DEM is the aspect of a
slope. Aspect shows what direction the slope is facing for each location.
Review this website on how aspect is represented. We will not be going through
the actual calculation of aspect. However, if you are given the calculated
degrees for a cell, you should be able to determine the direction the slope is
facing. For example, if a cell was calculated to have a value of 96 degrees, what
direction would the slope be facing? Hint, 96 degrees is in the yellow color ramp. L
i
The brief video below shows the making of an aspect map in ArcGIS. The version
n
of ArcGIS that he uses is older than yours, but his discussion of aspect and how it
k
relates to slope is quite helpful.
s
t
Viewsheds and Line of Sight o
The diagram below (taken from here) shows a line of sight profile from a location
at the base of a mountain. The area in green is what is visible from that location
while the red shows what is not visible. A viewshed combines the lines of sight in
all directions to determine what is visible on the raster from that location. L
i
Many industries, such as the telecommunications industry, use the construction of
n
a viewshed to aid in determining where a cell phone tower would have the most
coverage. Additionally, a viewshed analysis would be helpful prior to homek
construction and would also have military uses in determining the best positioning s
for troops and armament. t
o
▪ Solar radiation: definition and basic concepts; common applications; modeling the solar
radiation in four steps
Quantifying the amount of solar radiation an area on the ground receives can
be useful if you need to map and understand the effects of the sun over a
geographic area for specific time periods. Solar radiation is different from air
temperature because it measures the amount of radiation reaching the ground in
a particular spot, not the temperature of the air above.
Applications of solar radiation include determining growth periods and optimal
planting dates for agriculture crops, locating places for installing solar panel
arrays, optimal building orientations, and predicting the behavior of forest fires.
The total radiation a spot on earth is receiving is a combination of the direct
radiation, reflected radiation, and diffuse radiation:
Total Radiation = Direct + Reflected + Diffuse
Reflected radiation is usually minimal, and it can be difficult to model, so for the
purposes of GIS, we just include Direct and Diffuse radiation in the calculation.
Keep in mind that this is the calculation for one single pixel in the DEM. It is
necessary to repeat this process for every single pixel in the study
area as each will have a different upward-looking hemispherical viewshed.
In the second step, you need to overlay the viewshed on a direct sun map to
estimate radiation. A sun map is a raster displaying discrete sectors
defined by sun’s position as it varies through the hours of the day and
days of the year.
A sun map displays the sun track (the apparent position of the sun) as it
varies through the hours of day and days of the year. You can think of the
sun track as if you were looking up and watching as the sun's position
moves across the sky over a period of time. The sun map consists
of discrete sectors defined by the sun's position at particular hours
during the day and days throughout the year.
The sun track is calculated based on the latitude of the study area and
the time range of interest. The map below on the left is the direct sun
map for 45°N latitude calculated from winter solstice (December 22nd)
to summer solstice (June 22nd). Each colored box is a sector
representing the sun's position using 1/2 hour intervals through the day
and monthly intervals through the year.
The solar radiation originating from each sector (colored box in the map
below) is calculated separately, and the upward-looking
hemispherical viewshed is overlaid on the map for calculation of
direct radiation.
When running a solar radiation operation, the user must select the
appropriate date or range of dates (e.g., growing season) over which to
compute solar radiation.
The map on the right shows the sun map overlain with the upward-looking
hemispherical viewshed. For this particular pixel, the area would receive sun from
about 05:00 until about 16:00, but after that the sun would be obstructed (gray
area).
Total direct insolation for a given location is the sum of the direct
insolation from all sun map sectors.
In the third step, you need to overlay the viewshed on a diffuse sky map to
estimate diffuse radiation. The diffuse sky map is different from the direct sun
map. The diffuse sky map is the hemispherical view of the entire sky divided into
sectors defined by 8 zenith and 16 azimuth angles. Each sector has different
diffuse radiation. The diffuse sky map is overlaid with the viewshed to compute
diffuse radiation only for the areas that can be seen from the sky.
The result from the direct sun map and the diffuse sky map are added
together to compute the total radiation received for a pixel in a DEM.
Step 4: Repeat the process for every single location (pixel) in
the study area
Lastly, you must repeat this very computationally intensive process for every
single pixel in the study area.
MODULE 6:
Deterministic Interpolation
Deterministic interpolation methods use mathematical functions to
calculate values at unknown locations based on either the degree of
similarity to known sample locations/values or the degree of smoothing in relation
to neighboring data points.
▪ IDW or inverse distance weighted interpolation: definition and characteristics; advantages and
disadvantages of the method; calculation; uses; exercises to solve upon a central unknown
“starred” point surrounded by known points
Using the equation above, we can solve for Z - the value for the unsampled point:
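The IDW equation appears as a figure on the course page; the Python sketch below implements the standard inverse-distance-weighted average (weights of 1/dⁿ) with made-up sample distances and values, not the ones from the course exercise:

```python
# Inverse distance weighted (IDW) interpolation: the estimate at an unsampled
# point is a weighted average of nearby sample values, with weights of 1/d**n.

def idw(samples, n=2):
    """samples: list of (distance_to_unknown_point, value) pairs."""
    weights = [1 / distance ** n for distance, _ in samples]
    weighted_values = [w * value for w, (_, value) in zip(weights, samples)]
    return sum(weighted_values) / sum(weights)

samples = [(100, 8.0), (200, 12.0), (400, 10.0)]   # hypothetical samples
print(round(idw(samples, n=1), 2))  # 9.43: distant samples still matter
print(round(idw(samples, n=2), 2))  # 8.86: nearby samples dominate more strongly
```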
Why do you get different answers when you change the value for n?
In IDW, the exponent (n) controls how smooth the output surface will be. Lower
exponents give distant points relatively more influence and generally produce smoother surfaces;
higher exponents make the estimate honor the nearest sample points more closely. The number of
neighbors (i) also influences how smooth the surface will be. The figures below are from Paul
Bolstad's textbook "Fundamentals of GIS". You can see how changing the
exponent (n) and number of neighbors (i) changes the output surface. IDW is notorious for having
"bulls-eye" artifacts around the sample points.
The figure below is from Paul Bolstad's textbook "Fundamentals of GIS". The
figure on the left shows an original surface with sampling points (black dots). The
figure on the right shows the Thiessen polygons for the sampled points. Take note
of the coarseness of the interpolation result.
▪ Trend Interpolation: definition and characteristics; linear and second-order polynomial trend
surfaces; when to use trend surfaces
The technique is global because the entire set of points is used in the
computation of a single surface. The surface is not optimized in local
neighborhoods around individual points.
The two figures below show a trend surface (red line) fit to a set of sample points.
The difference between the two figures is the order of the polynomial. On the left,
a first-order polynomial was used (linear). On the right, a second-order polynomial
was used (quadratic).
The property being interpolated varies gradually over the study region, such
as pollution.
The analyst is interested in removing local effects and focusing on global
trends.
Kriging is probably the most well known and widely used geostatistical
interpolation method. Kriging is a multi-stage interpolation process that rests on
calculating and modeling the variogram. The following pages will walk you
through the variogram and how it is used in kriging.
▪ Assessing interpolation error and uncertainty via the RMSE Root Mean Square Error:
definition and basic concepts, method of calculation by steps
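The course content for this topic is not reproduced in this guide; for reference, the standard definition, where ẑᵢ is the interpolated value and zᵢ the observed value at each of n checkpoints, is:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{z}_i - z_i\right)^{2}} $$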