GEC ABE 2 - Lecture Notes 2
Unit No. : 7
Introduction: In Unit 6 "Data Characteristics and Visualization", we discussed different ways to query,
classify, and summarize information in attribute tables. These methods are indispensable for
understanding the basic quantitative and qualitative trends of a dataset. However, they don’t take
particular advantage of the greatest strength of a geographic information system (GIS), namely the
explicit spatial relationships. Spatial analysis is a fundamental component of a GIS that allows for an in-
depth study of the topological and geometric properties of a dataset or datasets. In this unit, we discuss
the basic spatial analysis techniques for vector datasets.
Aim: Become familiar with concepts and terms related to the variety of single layer analysis
techniques available to analyze and manipulate the spatial attributes of a vector feature dataset.
Learning Objectives:
1. Become familiar with concepts and terms related to the implementation of basic multiple layer
operations and methodologies used on vector feature datasets.
2. Become familiar with basic single and multiple raster geoprocessing techniques.
3. Understand how local, neighborhood, zonal, and global analyses can be applied to raster
datasets.
4. Become familiar with concepts and terms related to GIS surfaces, how to create them, and how
they are used to answer specific spatial questions.
5. Learn to apply basic raster surface analyses to terrain mapping applications.
Topics:
1. Single Layer Analysis
2. Multiple Layer Analysis
3. Scale of Analysis
4. Surface Analysis
UNIT 7
GEOSPATIAL ANALYSIS I: VECTOR OPERATIONS
Single Layer Analysis
As the name suggests, single layer analyses are those that are undertaken on an individual feature
dataset. Buffering is the process of creating an output polygon layer containing a zone (or zones) of a
specified width around an input point, line, or polygon feature. Buffers are particularly suited for
determining the area of influence around features of interest. Geoprocessing is a suite of tools
provided by many geographic information system (GIS) software packages that allow the user to
automate many of the mundane tasks associated with manipulating GIS data. Geoprocessing usually
involves the input of one or more feature datasets, followed by a spatially explicit analysis, and resulting
in an output feature dataset.
Buffering
Buffers are common vector analysis tools used to address questions of proximity in a GIS and can be
used on points, lines, or polygons (Figure 7.1 "Buffers around Red Point, Line, and Polygon Features").
For instance, suppose that a natural resource manager wants to ensure that no areas are disturbed
within 1,000 feet of breeding habitat for the federally endangered Delhi Sands flower-loving fly
(Rhaphiomidas terminatus abdominalis). This species is found only in the few remaining Delhi Sands soil
formations of the western United States. To accomplish this task, a 1,000-foot protection zone (buffer)
could be created around all the observed point locations of the species. Alternatively, the manager may
decide that there is not enough point-specific location information related to this rare species and decide
to protect all Delhi Sands soil formations. In this case, he or she could create a 1,000-foot buffer around
all polygons labeled as “Delhi Sands” on a soil formations dataset. In either case, the use of buffers
provides a quick-and-easy tool for determining which areas are to be maintained as preserved habitat
for the endangered fly.
Several buffering options are available to refine the output. For example, the buffer tool will typically
buffer only selected features. If no features are selected, all features will be buffered. Two primary types
of buffers are available to GIS users: constant width and variable width. Constant width buffers
require users to input a value by which features are buffered (Figure 7.1 "Buffers around Red Point, Line,
and Polygon Features"), such as is seen in the examples in the preceding paragraph. Variable width
buffers, on the other hand, call on a premade buffer field within the attribute table to determine the
buffer width for each specific feature in the dataset (Figure 7.2 "Additional Buffer Options around Red
Features: (a) Variable Width Buffers, (b) Multiple Ring Buffers, (c) Doughnut Buffer, (d) Setback Buffer,
(e) Nondissolved Buffer, (f) Dissolved Buffer").
In addition, users can choose to dissolve or not dissolve the boundaries between overlapping, coincident
buffer areas. Multiple ring buffers can be made such that a series of concentric buffer zones (much like
an archery target) are created around the originating feature at user-specified distances (Figure 7.2
"Additional Buffer Options around Red Features: (a) Variable Width Buffers, (b) Multiple Ring Buffers, (c)
Doughnut Buffer, (d) Setback Buffer, (e) Nondissolved Buffer, (f) Dissolved Buffer"). In the case of
polygon layers, buffers can be created that include the originating polygon feature as part of the buffer
or they can be created as a doughnut buffer that excludes the input polygon area. Setback buffers are
similar to doughnut buffers; however, they only buffer the area inside of the polygon boundary. Linear
features can be buffered on both sides of the line, only on the left, or only on the right. Linear features
can also be buffered so that the end points of the line are rounded (ending in a half-circle) or flat
(ending in a rectangle).
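To make these options concrete, the buffering steps described above might be sketched in Python with the GeoPandas library. The shapefile names, the assumption of a projected coordinate system measured in feet, and the premade BUF_DIST field are all hypothetical, so treat this as an illustrative sketch rather than a prescribed workflow (passing a column of distances to buffer also assumes a reasonably recent GeoPandas release).

# A minimal sketch of constant-width, variable-width, and dissolved buffers
# with GeoPandas. The file names, foot-based projected CRS, and "BUF_DIST"
# attribute field are hypothetical assumptions.
import geopandas as gpd

points = gpd.read_file("habitat_points.shp")   # species occurrence points

# Constant-width buffer: a 1,000-foot protection zone around every point
protection = points.copy()
protection["geometry"] = points.geometry.buffer(1000)

# Variable-width buffer: distances read from a premade attribute field
variable = points.copy()
variable["geometry"] = points.geometry.buffer(points["BUF_DIST"])

# Dissolved buffer: merge overlapping, coincident buffer zones into one shape
dissolved = gpd.GeoDataFrame(geometry=[protection.geometry.unary_union],
                             crs=points.crs)

protection.to_file("protection_zones.shp")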
Geoprocessing Operations
“Geoprocessing” is a loaded term in the field of GIS. The term can (and should) be widely applied to any
attempt to manipulate GIS data. However, the term came into common usage due to its application to a
somewhat arbitrary suite of single layer and multiple layer analytical techniques in the Geoprocessing
Wizard of ESRI’s ArcView software package in the mid-1990s. Regardless, the suite of geoprocessing
tools available in a GIS greatly expands and simplifies many of the management and manipulation
processes associated with vector feature datasets. The primary use of these tools is to automate the
repetitive preprocessing needs of typical spatial analyses and to assemble exact graphical
representations for subsequent analysis and/or inclusion in presentations and final mapping products.
The union, intersect, symmetrical difference, and identity overlay methods discussed in Section 7.2.2
"Other Multilayer Geoprocessing Options" are often used in conjunction with these geoprocessing tools.
The following represents the most common geoprocessing tools.
The dissolve operation combines adjacent polygon features in a single feature dataset based on a single
predetermined attribute. For example, part (a) of Figure 7.3 "Single Layer Geoprocessing Functions"
shows the boundaries of seven different parcels of land, owned by four different families (labeled 1
through 4). The dissolve tool automatically combines all adjacent features with the same attribute
values. The result is an output layer with the same extent as the original but without all of the
unnecessary, intervening line segments. The dissolved output layer is much easier to visually interpret
when the map is classified according to the dissolved field.
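As an illustration, a dissolve of this kind might be sketched with GeoPandas as follows; the parcels.shp file and its OWNER field are hypothetical.

# A minimal sketch of the dissolve operation; file and field names are
# hypothetical assumptions.
import geopandas as gpd

parcels = gpd.read_file("parcels.shp")

# Combine adjacent polygons sharing the same OWNER value, removing the
# unnecessary, intervening boundary lines between them
ownership = parcels.dissolve(by="OWNER")
ownership.to_file("ownership_dissolved.shp")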
The append operation creates an output layer by combining the spatial extent of two or more
layers (part (d) of Figure 7.3 "Single Layer Geoprocessing Functions"). For use with point, line, and
polygon datasets, the output layer will be the same feature type as the input layers (which must each be
the same feature type as well). Unlike the dissolve tool, append does not remove the boundary lines
between appended layers (in the case of lines and polygons). Therefore, it is often useful to perform a
dissolve after the use of the append tool to remove these potentially unnecessary dividing lines. Append
is frequently used to mosaic data layers, such as digital US Geological Survey (USGS) 7.5-minute
topographic maps, to create a single map for analysis and/or display.
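A hedged sketch of an append, followed by the optional dissolve mentioned above, might look like the following in GeoPandas; the quad file names and the shared MAP_UNIT field are hypothetical.

# A minimal sketch of appending (mosaicking) two layers of the same feature
# type; the file and field names are hypothetical assumptions.
import geopandas as gpd
import pandas as pd

quad_a = gpd.read_file("usgs_quad_a.shp")
quad_b = gpd.read_file("usgs_quad_b.shp")

# Append: stack the features of both layers into a single output layer
mosaic = gpd.GeoDataFrame(pd.concat([quad_a, quad_b], ignore_index=True),
                          crs=quad_a.crs)

# Optional follow-up dissolve to remove the dividing lines between the
# appended layers, here on a hypothetical shared "MAP_UNIT" field
mosaic = mosaic.dissolve(by="MAP_UNIT")
mosaic.to_file("mosaic.shp")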
The select operation creates an output layer based on a user-defined query that selects particular
features from the input layer (part (f) of Figure 7.3 "Single Layer Geoprocessing Functions"). The output
layer contains only those features that are selected during the query. For example, a city planner may
choose to perform a select on all areas that are zoned “residential” so he or she can quickly assess which
areas in town are suitable for a proposed housing development.
Finally, the merge operation combines features within a point, line, or polygon layer into a single feature
with identical attribute information. Often, the original features will have different values for a given
attribute. In this case, the first attribute encountered is carried over into the attribute table, and the
remaining attributes are lost. This operation is particularly useful when polygons are found to be
unintentionally overlapping. Merge will conveniently combine these features into a single entity.
Among the most powerful and commonly used tools in a geographic information system (GIS) is the
overlay of cartographic information. In a GIS, an overlay is the process of taking two or more different
thematic maps of the same area and placing them on top of one another to form a new map (Figure 7.4
"A Map Overlay Combining Information from Point, Line, and Polygon Vector Layers, as Well as Raster
Layers"). Inherent in this process, the overlay function combines not only the spatial features of the
dataset but also the attribute information.
A common example used to illustrate the overlay process is, “Where is the best place to put a mall?”
Imagine you are a corporate bigwig and are tasked with determining where your company’s next
shopping mall will be placed. How would you attack this problem? With a GIS at your command,
answering such spatial questions begins with amassing and overlaying pertinent spatial data layers. For
example, you may first want to determine what areas can support the mall by accumulating information
on which land parcels are for sale and which are zoned for commercial development. After collecting
and overlaying the baseline information on available development zones, you can begin to determine
which areas offer the most economic opportunity by collecting regional information on average
household income, population density, location of proximal shopping centers, local buying habits, and
more. Next, you may want to collect information on restrictions or roadblocks to development such as
the cost of land, cost to develop the land, community response to development, adequacy of
transportation corridors to and from the proposed mall, tax rates, and so forth. Indeed, simply collecting
and overlaying spatial datasets provides a valuable tool for visualizing and selecting the optimal site for
such a business endeavor.
Overlay Operations
Several basic overlay processes are available in a GIS for vector datasets: point-in-polygon,
polygon-on-point, line-on-line, line-in-polygon, polygon-on-line, and polygon-in-polygon. As you may be
able to divine from the names, one of the overlay datasets must always be a line or polygon layer, while
the second may be point, line, or polygon. The new layer produced following the overlay operation is
termed the “output” layer.
The point-in-polygon overlay operation requires a point input layer and a polygon overlay layer. Upon
performing this operation, a new output point layer is returned that includes all the points that occur
within the spatial extent of the overlay (Figure 7.4 "A Map Overlay Combining Information from Point,
Line, and Polygon Vector Layers, as Well as Raster Layers"). In addition, all the points in the output layer
contain their original attribute information as well as the attribute information from the overlay. For
example, suppose you were tasked with determining if an endangered species residing in a national park
was found primarily in a particular vegetation community. The first step would be to acquire the point
occurrence locales for the species in question, plus a polygon overlay layer showing the vegetation
communities within the national park boundary. Upon performing the point-in-polygon overlay
operation, a new point file is created that contains all the points that occur within the national park. The
attribute table of this output point file would also contain information about the vegetation
communities being utilized by the species at the time of observation. A quick scan of this output layer
and its attribute table would allow you to determine where the species was found in the park and to
review the vegetation communities in which it occurred. This process would enable park employees to
make informed management decisions regarding which onsite habitats to protect to ensure continued
site utilization by the species.
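In practice, a point-in-polygon overlay of this kind is often carried out as a spatial join. The sketch below uses GeoPandas; the layer names and the VEG_TYPE field are hypothetical, and older GeoPandas releases use the keyword op instead of predicate.

# A minimal sketch of a point-in-polygon overlay via a spatial join; the
# layer names and "VEG_TYPE" field are hypothetical assumptions.
import geopandas as gpd

sightings = gpd.read_file("species_points.shp")    # point input layer
vegetation = gpd.read_file("veg_communities.shp")  # polygon overlay layer

# Keep only points that fall inside a vegetation polygon and append the
# polygon's attributes (such as VEG_TYPE) to each retained point
points_in_veg = gpd.sjoin(sightings, vegetation, how="inner", predicate="within")
print(points_in_veg[["geometry", "VEG_TYPE"]].head())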
As its name suggests, the polygon-on-point overlay operation is the opposite of the point-in-polygon
operation. In this case, the polygon layer is the input, while the point layer is the overlay. The polygon
features that overlay these points are selected and subsequently preserved in the output layer. For
example, given a point dataset containing the locales of some type of crime and a polygon dataset
representing city blocks, a polygon-on-point overlay operation would allow police to select the city
blocks in which crimes have been known to occur and hence determine those locations where an
increased police presence may be warranted.
A line-on-line overlay operation requires line features for both the input and overlay layer. The output
from this operation is a point or points located precisely at the intersection(s) of the two linear datasets
(Figure 7.7 "Line-on-Line Overlay"). For example, a linear feature dataset containing railroad tracks may
be overlain on linear road network. The resulting point dataset contains all the locales of the railroad
crossings over a town’s road network. The attribute table for this railroad crossing point dataset would
contain information on both the railroad and the road over which it passed.
The line-in-polygon overlay operation is similar to the point-in-polygon overlay, with the obvious
exception that a line input layer is used instead of a point input layer. In this case, each line that has any
part of its extent within the overlay polygon layer will be included in the output line layer, although
these lines will be truncated at the boundary of the overlay (Figure 7.9 "Polygon-on-Line Overlay"). For
example, a line-in-polygon overlay can take an input layer of interstate line segments and a polygon
overlay representing city boundaries and produce a linear output layer of highway segments that fall
within the city boundary. The attribute table for the output interstate line segments will contain
information on the interstate name as well as the city through which each segment passes.
The polygon-on-line overlay operation is the opposite of the line-in-polygon operation. In this case, the
polygon layer is the input, while the line layer is the overlay. The polygon features that overlay these
lines are selected and subsequently preserved in the output layer. For example, given a layer containing
the path of a series of telephone poles/wires and a polygon map containing city parcels, a polygon-on-line
overlay operation would allow a land assessor to select those parcels containing overhead telephone
wires.
Finally, the polygon-in-polygon overlay operation employs a polygon input and a polygon overlay. This
is the most commonly used overlay operation. Using this method, the polygon input and overlay layers
are combined to create an output polygon layer with the extent of the overlay. The attribute table will
contain spatial data and attribute information from both the input and overlay layers (Figure 7.10
"Polygon-in-Polygon Overlay"). For example, you may choose an input polygon layer of soil types with an
overlay of agricultural fields within a given county. The output polygon layer would contain information
on both the location of agricultural fields and soil types throughout the county.
The overlay operations discussed previously assume that the user desires the overlain layers to be
combined. This is not always the case. Overlay methods can be more complex than that and therefore
employ the basic Boolean operators: AND, OR, and XOR (see Section 6.1.2 "Measures of Central
Tendency"). Depending on which operator(s) are utilized, the overlay method employed will result in an
intersection, union, symmetrical difference, or identity.
Specifically, the union overlay method employs the OR operator. A union can be used only in the case of
two polygon input layers. It preserves all features, attribute information, and spatial extents from both
input layers (part (a) of Figure 7.11 "Vector Overlay Methods "). This overlay method is based on the
polygon-in-polygon operation described in Section 7.1.1 "Buffering".
Alternatively, the intersection overlay method employs the AND operator. An intersection requires a
polygon overlay, but can accept a point, line, or polygon input. The output layer covers the spatial extent
of the overlay and contains features and attributes from both the input and overlay (part (b) of Figure
7.11 "Vector Overlay Methods ").
The symmetrical difference overlay method employs the XOR operator, which results in the opposite
output as an intersection. This method requires both input layers to be polygons. The output polygon
layer produced by the symmetrical difference method represents those areas common to only one of
the feature datasets (part (c) of Figure 7.11 "Vector Overlay Methods ").
In addition to these simple operations, the identity (also referred to as “minus”) overlay method creates
an output layer with the spatial extent of the input layer (part (d) of Figure 7.11 "Vector Overlay
Methods ") but includes attribute information from the overlay (referred to as the “identity” layer, in
this case). The input layer can be points, lines, or polygons. The identity layer must be a polygon dataset.
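These four overlay methods map directly onto the how argument of the GeoPandas overlay function, as in the sketch below; the soils and fields layer names are hypothetical and both are assumed to be polygon layers.

# A minimal sketch of the union, intersection, symmetrical difference, and
# identity overlays; the layer names are hypothetical assumptions.
import geopandas as gpd

soils = gpd.read_file("soils.shp")     # polygon input layer
fields = gpd.read_file("fields.shp")   # polygon overlay layer

union_out = gpd.overlay(soils, fields, how="union")                    # OR
intersect_out = gpd.overlay(soils, fields, how="intersection")         # AND
sym_diff_out = gpd.overlay(soils, fields, how="symmetric_difference")  # XOR
identity_out = gpd.overlay(soils, fields, how="identity")              # input extent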
In addition to the aforementioned vector overlay methods, other common multiple layer geoprocessing
options are available to the user. These include the clip, erase, and split tools. The clip geoprocessing
operation is used to extract those features from an input point, line, or polygon layer that fall within
the spatial extent of the clip layer (part (e) of Figure 7.11 "Vector Overlay Methods "). Following the clip,
all attributes from the preserved portion of the input layer are included in the output. If any features are
selected during this process, only those selected features within the clip boundary will be included in the
output. For example, the clip tool could be used to clip the extent of a river floodplain by the extent of a
county boundary. This would provide county managers with insight into which portions of the floodplain
they are responsible to maintain. This is similar to the intersect overlay method; however, the attribute
information associated with the clip layer is not carried into the output layer following the overlay.
The erase geoprocessing operation is essentially the opposite of a clip. Whereas the clip tool preserves
areas within an input layer, the erase tool preserves only those areas outside the extent of the
analogous erase layer (part (f) of Figure 7.11 "Vector Overlay Methods "). While the input layer can be a
point, line, or polygon dataset, the erase layer must be a polygon dataset. Continuing with our clip
example, county managers could then use the erase tool to erase the areas of private ownership within
the county floodplain area. Officials could then focus specifically on public reaches of the countywide
floodplain for their upkeep and maintenance responsibilities.
The split geoprocessing operation is used to divide an input layer into two or more layers based on a
split layer (part (g) of Figure 7.11 "Vector Overlay Methods "). The split layer must be a polygon, while
the input layers can be point, line, or polygon. For example, a homeowner’s association may choose to
split up a countywide soil series map by parcel boundaries so each homeowner has a specific soil map
for their own parcel.
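A hedged sketch of the clip, erase, and split operations with GeoPandas follows; the floodplain, county, and parcel layers (and the PARCEL_ID field) are hypothetical, and the erase is expressed here as an overlay with how="difference".

# A minimal sketch of clip, erase, and split; all layer and field names are
# hypothetical assumptions.
import geopandas as gpd

floodplain = gpd.read_file("floodplain.shp")
county = gpd.read_file("county_boundary.shp")
parcels = gpd.read_file("parcels.shp")

# Clip: keep only the portion of the floodplain inside the county boundary
county_floodplain = gpd.clip(floodplain, county)

# Erase: keep only the portion of the floodplain outside the parcels layer
public_floodplain = gpd.overlay(floodplain, parcels, how="difference")

# Split: write one floodplain layer per parcel polygon
for parcel_id, parcel in parcels.groupby("PARCEL_ID"):
    piece = gpd.clip(floodplain, parcel)
    if not piece.empty:
        piece.to_file(f"floodplain_{parcel_id}.shp")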
Spatial Join
A spatial join is a hybrid between an attribute operation and a vector overlay operation. Like the “join”
attribute operation described in Section 5.2.2 "Joins and Relates", a spatial join results in the
combination of two feature dataset tables by a common attribute field. Unlike the attribute operation, a
spatial join determines which fields from a source layer’s attribute table are appended to the
destination layer’s attribute table based on the relative locations of selected features. This relationship
is explicitly based on the property of proximity or containment between the source and destination
layers, rather than the primary or secondary keys. The proximity option is used when the source layer is
a point or line feature dataset, while the containment option is used when the source layer is a polygon
feature dataset.
When employing the proximity (or “nearest”) option, a record for each feature in the source layer’s
attribute table is appended to the closest given feature in the destination layer’s attribute table. The
proximity option will typically add a numerical field to the destination layer attribute table, called
“Distance,” within which the measured distance between the source and destination feature is placed.
For example, suppose a city agency had a point dataset showing all known polluters in town and a line
dataset of all the river segments within the municipal boundary. This agency could then perform a
proximity-based spatial join to determine the nearest river segment that would most likely be affected
by each polluter.
When using the containment (or “inside”) option, a record for each feature in the polygon source layer’s
attribute table is appended to the record in the destination layer’s attribute table that it contains. If a
destination layer feature (point, line, or polygon) is not completely contained within a source polygon,
no value will be appended. For example, suppose a pool cleaning business wanted to hone its marketing
services by providing flyers only to homes that owned a pool. They could obtain a point dataset
containing the location of every pool in the county and a polygon parcel map for that same area. That
business could then conduct a spatial join to append the parcel information to the pool locales. This
would provide them with information on each land parcel that contained a pool, and they could
subsequently send their mailers only to those homes.
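Both flavors of spatial join can be sketched with GeoPandas, as below; the layer names are hypothetical, sjoin_nearest requires GeoPandas 0.10 or later, and older releases use op instead of predicate.

# A minimal sketch of proximity- and containment-based spatial joins; layer
# names are hypothetical assumptions.
import geopandas as gpd

polluters = gpd.read_file("polluters.shp")      # points
rivers = gpd.read_file("river_segments.shp")    # lines
pools = gpd.read_file("pools.shp")              # points
parcels = gpd.read_file("parcels.shp")          # polygons

# Proximity ("nearest") option: append the nearest river segment's attributes
# to each polluter, along with the measured distance
nearest_river = gpd.sjoin_nearest(polluters, rivers, distance_col="Distance")

# Containment ("inside") option: append parcel attributes to each pool point
pools_with_parcels = gpd.sjoin(pools, parcels, how="inner", predicate="within")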
Overlay Errors
Although overlays are one of the most important tools in a GIS analyst’s toolbox, there are some
problems that can arise when using this methodology. In particular, slivers are a common error
produced when two slightly misaligned vector layers are overlain (Figure 7.12 "Slivers"). This
misalignment can come from several sources including digitization errors, interpretation errors, or
source map errors (Chang, K. 2008. Introduction to Geographic Information Systems. New York: McGraw-Hill). For example, most vegetation and soil maps are created from field survey data,
satellite images, and aerial photography. While you can imagine that the boundaries of soils and
vegetation frequently coincide, the fact that they were most likely created by different researchers at
different times suggests that their boundaries will not perfectly overlap. To ameliorate this problem, GIS
software incorporates a cluster tolerance option that forces nearby lines to be snapped together if they
fall within a user-specified distance. Care must be taken when assigning cluster tolerance. Too strict a
setting will not snap shared boundaries, while too lenient a setting will snap unintended, neighboring
boundaries together (Wang, F., and P. Donaghy. 1995. “A Study of the Impact of Automated Editing on Polygon Overlay Analysis Accuracy.” Computers and Geosciences 21: 1177–85).
A second potential source of error associated with the overlay process is error propagation. Error
propagation arises when inaccuracies are present in the original input and overlay layers and are
propagated through to the output layer (MacDougall, E. 1975. “The Accuracy of Map Overlays.” Landscape Planning 2: 23–30). These errors can be related to positional inaccuracies of the
points, lines, or polygons. Alternatively, they can arise from attribute errors in the original data table(s).
Regardless of the source, error propagation represents a common problem in overlay analysis, the
impact of which depends largely on the accuracy and precision requirements of the project at hand.
Basic Geoprocessing with Rasters
Like the geoprocessing tools available for use on vector datasets (discussed above), raster data can
undergo similar spatial operations. Although the actual computation of these
operations is significantly different from their vector counterparts, their conceptual underpinning is
similar. The geoprocessing techniques covered here include both single layer (Section 8.1.1 "Single Layer
Analysis") and multiple layer (Section 8.1.2 "Multiple Layer Analysis") operations.
Reclassifying, or recoding, a dataset is commonly one of the first steps undertaken during raster
analysis. Reclassification is basically the single layer process of assigning a new class or range value to all
pixels in the dataset based on their original values (Figure 8.1 "Raster Reclassification"). For example, an
elevation grid commonly contains a different value for nearly every cell within its extent. These values
could be simplified by aggregating each pixel value into a few discrete classes (e.g., 0–100 = “1,” 101–200 =
“2,” 201–300 = “3,” etc.). This simplification allows for fewer unique values and cheaper storage
requirements. In addition, these reclassified layers are often used as inputs in secondary analyses, such
as those discussed later in this section.
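The elevation example above can be sketched in a few lines of NumPy; the small elevation grid and class breaks are made up for illustration.

# A minimal sketch of reclassifying an elevation raster into discrete classes.
import numpy as np

elevation = np.array([[ 12,  85, 140],
                      [205, 167,  98],
                      [260, 231, 155]])

# Class scheme: 0-100 -> 1, 101-200 -> 2, 201-300 -> 3
reclassified = np.digitize(elevation, bins=[100, 200], right=True) + 1
print(reclassified)
# [[1 1 2]
#  [3 2 1]
#  [3 3 2]]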
As described in Chapter 7 "Geospatial Analysis I: Vector Operations", buffering is the process of creating
an output dataset that contains a zone (or zones) of a specified width around an input feature. In the
case of raster datasets, these input features are given as a grid cell or a group of grid cells containing a
uniform value (e.g., buffer all cells whose value = 1). Buffers are particularly suited for determining the
area of influence around features of interest. Whereas buffering vector data results in a precise area of
influence at a specified distance from the target feature, raster buffers tend to be approximations
representing those cells that are within the specified distance range of the target (Figure 8.2 "Raster
Buffer around a Target Cell(s)"). Most geographic information system (GIS) programs calculate raster
buffers by creating a grid of distance values from the center of the target cell(s) to the center of the
neighboring cells and then reclassifying those distances such that a “1” represents those cells composing
the original target, a “2” represents those cells within the user-defined buffer area, and a “0” represents
those cells outside of the target and buffer areas. These cells could also be further classified to represent
multiple ring buffers by including values of “3,” “4,” “5,” and so forth, to represent concentric distances
around the target cell(s).
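The distance-then-reclassify approach described above might be sketched with NumPy and SciPy as follows; the toy grid, single target cell, and two-cell buffer width are made-up assumptions.

# A minimal sketch of a raster buffer built from a distance grid
# (1 = target, 2 = buffer, 0 = outside).
import numpy as np
from scipy.ndimage import distance_transform_edt

target = np.zeros((7, 7), dtype=int)
target[3, 3] = 1                      # a single target cell

# Distance (in cell widths) from every cell to the nearest target cell
distance = distance_transform_edt(target == 0)

raster_buffer = np.zeros_like(target)
raster_buffer[distance <= 2] = 2      # cells within the buffer distance
raster_buffer[target == 1] = 1        # the original target cell(s)
print(raster_buffer)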
A raster dataset can also be clipped similar to a vector dataset (Figure 8.3 "Clipping a Raster to a Vector
Polygon Layer"). Here, the input raster is overlain by a vector polygon clip layer. The raster clip process
results in a single raster that is identical to the input raster but shares the extent of the polygon clip
layer.
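A raster clip of this kind might be sketched with the rasterio and GeoPandas libraries; the file names are hypothetical and the raster and clip layer are assumed to share the same coordinate system.

# A minimal sketch of clipping a raster to a vector polygon layer.
import geopandas as gpd
import rasterio
from rasterio.mask import mask

clip_polygons = gpd.read_file("clip_boundary.shp")

with rasterio.open("input_raster.tif") as src:
    clipped, transform = mask(src, clip_polygons.geometry, crop=True)
    profile = src.profile
    profile.update(height=clipped.shape[1], width=clipped.shape[2],
                   transform=transform)

with rasterio.open("clipped_raster.tif", "w", **profile) as dst:
    dst.write(clipped)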
Raster overlays are relatively simple compared to their vector counterparts and require much less
computational power (Burroughs, P. 1983. Geographical Information Systems for Natural Resources Assessment. New York: Oxford University Press). Despite their simplicity, it is
important to ensure that all overlain rasters are coregistered (i.e., spatially aligned), cover identical
areas, and maintain equal resolution (i.e., cell size). If these assumptions are violated, the analysis will
either fail or the resulting output layer will be flawed. With this in mind, there are several different
methodologies for performing a raster overlay (Chrisman, N. 2002. Exploring Geographic Information Systems. 2nd ed. New York: John Wiley and Sons).
The mathematical raster overlay is the most common overlay method. The numbers within the aligned
cells of the input grids can undergo any user-specified mathematical transformation. Following the
calculation, an output raster is produced that contains a new value for each cell (Figure 8.4
"Mathematical Raster Overlay"). As you can imagine, there are many uses for such functionality. In
particular, raster overlay is often used in risk assessment studies where various layers are combined to
produce an outcome map showing areas of high risk/reward.
The Boolean raster overlay method represents a second powerful technique. As discussed in Chapter 6
"Data Characteristics and Visualization", the Boolean connectors AND, OR, and XOR can be employed to
combine the information of two overlying input raster datasets into a single output raster. Similarly, the
relational raster overlay method utilizes relational operators (<, <=, =, <>, >, and >=) to evaluate
conditions of the input raster datasets. In both the Boolean and relational overlay methods, cells that
meet the evaluation criteria are typically coded in the output raster layer with a 1, while those evaluated
as false receive a value of 0.
The simplicity of this methodology, however, can also lead to easily overlooked errors in interpretation if
the overlay is not designed properly. Assume that a natural resource manager has two input raster
datasets she plans to overlay; one showing the location of trees (“0” = no tree; “1” = tree) and one
showing the location of urban areas (“0” = not urban; “1” = urban). If she hopes to find the location of
trees in urban areas, a simple mathematical sum of these datasets will yield a “2” in all pixels containing
a tree in an urban area. Similarly, if she hopes to find the location of all treeless (or “non-tree”),
nonurban areas, she can examine the summed output raster for all “0” entries. Finally, if she hopes to
locate urban, treeless areas, she will look for all cells containing a “1.” Unfortunately, the cell value “1”
also is coded into each pixel for nonurban, tree cells. Indeed, the choice of input pixel values and overlay
equation in this example will yield confounding results due to the poorly devised overlay scheme.
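The ambiguity in the summed raster disappears if the question is posed as an explicit Boolean overlay, as in the NumPy sketch below; the two small input grids are made up.

# A minimal sketch contrasting a mathematical (summed) overlay with Boolean
# (AND) overlays for the tree/urban example above.
import numpy as np

trees = np.array([[0, 1, 1],
                  [0, 1, 0],
                  [1, 0, 0]])
urban = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 0]])

summed = trees + urban   # a "1" here is ambiguous (tree-only or urban-only)

# Boolean overlays give unambiguous 1/0 answers to each specific question
urban_trees = np.logical_and(trees == 1, urban == 1).astype(int)
open_land = np.logical_and(trees == 0, urban == 0).astype(int)

print(summed)
print(urban_trees)
print(open_land)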
Scale of Analysis
Raster analyses can be undertaken on four different scales of operation: local, neighborhood, zonal, and
global. Each of these presents unique options to the GIS analyst and is discussed in this section.
Local Operations
Local operations can be performed on single or multiple rasters. When used on a single raster, a local
operation usually takes the form of applying some mathematical transformation to each individual cell
in the grid. For example, a researcher may obtain a digital elevation model (DEM) with each cell value
representing elevation in feet and apply a single arithmetic transformation (multiplying every cell by
0.3048) to re-express those elevations in meters.
When applied to multiple rasters, it becomes possible to perform such analyses as changes over time.
Given two rasters containing information on groundwater depth on a parcel of land at Year 2000 and
Year 2010, it is simple to subtract these values and place the difference in an output raster that will note
the change in groundwater between those two times (Figure 8.5 "Local Operation on a Raster Dataset").
These local analyses can become somewhat more complicated, however, as the number of input rasters
increases. For example, the Universal Soil Loss Equation (USLE) applies a local mathematical formula to
several overlying rasters including rainfall intensity, erodibility of the soil, slope, cultivation type, and
vegetation type to determine the average soil loss (in tons) in a grid cell.
Neighborhood Operations
Tobler’s first law of geography states that “everything is related to everything else, but near things are
more related than distant things.” Neighborhood operations represent a group of frequently used
spatial analysis techniques that rely heavily on this concept. Neighborhood functions examine the
relationship of an object with similar surrounding objects. They can be performed on point, line, or
polygon vector datasets as well as on raster datasets. In the case of vector datasets, neighborhood
analysis is most frequently used to perform basic searches. For example, given a point dataset
containing the location of convenience stores, a GIS could be employed to determine the number of
stores within 5 miles of a linear feature (e.g., Interstate 10 in California).
Neighborhood analyses are often more sophisticated when used with raster datasets. Raster analyses
employ moving windows, also called filters or kernels, to calculate new cell values for every location
throughout the raster layer’s extent. These moving windows can take many different forms depending
on the type of output desired and the phenomena being examined. For example, a rectangular, 3-by-3
moving window is commonly used to calculate the mean, standard deviation, sum, minimum, maximum,
or range of values immediately surrounding a given “target” cell (Figure 8.6 "Common Neighborhood
Types around Target Cell “x”: (a) 3 by 3, (b) Circle, (c) Annulus, (d) Wedge"). The target cell is that cell
found in the center of the 3-by-3 moving window. The moving window passes over every cell in the
raster. As it passes each central target cell, the nine values in the 3-by-3 window are used to calculate a
new value for that target cell. This new value is placed in the identical location in the output raster. If
one wanted to examine a larger sphere of influence around the target cells, the moving window could
be expanded to 5 by 5, 7 by 7, and so forth. Additionally, the moving window need not be a simple
rectangle. Other shapes used to calculate neighborhood statistics include the annulus, wedge, and circle
(Figure 8.6 "Common Neighborhood Types around Target Cell “x”: (a) 3 by 3, (b) Circle, (c) Annulus, (d)
Wedge").
Neighborhood operations are commonly used for data simplification on raster datasets. An analysis that
averages neighborhood values would result in a smoothed output raster with dampened highs and lows
as the influence of the outlying data values is reduced by the averaging process. Alternatively,
neighborhood analyses can be used to exaggerate differences in a dataset. Edge enhancement is a type
of neighborhood analysis that examines the range of values in the moving window. A large range value
would indicate that an edge occurs within the extent of the window, while a small range indicates the
lack of an edge.
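Both uses of a 3-by-3 moving window, smoothing and edge enhancement, can be sketched with SciPy's image filters; the small elevation grid is made up.

# A minimal sketch of 3-by-3 neighborhood operations: a mean filter for
# smoothing and a range (max - min) filter for edge enhancement.
import numpy as np
from scipy.ndimage import uniform_filter, maximum_filter, minimum_filter

grid = np.array([[10., 10., 11., 50., 52.],
                 [10., 11., 12., 51., 53.],
                 [11., 12., 13., 52., 54.],
                 [12., 13., 14., 53., 55.]])

smoothed = uniform_filter(grid, size=3)                              # 3x3 mean
edges = maximum_filter(grid, size=3) - minimum_filter(grid, size=3)  # 3x3 range

print(edges)   # large values flag the abrupt break between the two blocks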
Zonal Operations
A zonal operation is employed on groups of cells of similar value or like features, not surprisingly called
zones (e.g., land parcels, political/municipal units, waterbodies, soil/vegetation types). These zones
could be conceptualized as raster versions of polygons. Zonal rasters are often created by reclassifying
an input raster into just a few categories (see Section 8.2.2 "Neighborhood Operations"). Zonal
operations may be applied to a single raster or two overlaying rasters. Given a single input raster, zonal
operations measure the geometry of each zone in the raster, such as area, perimeter, thickness, and
centroid. Given two rasters in a zonal operation, one input raster and one zonal raster, a zonal operation
produces an output raster, which summarizes the cell values in the input raster for each zone in the
zonal raster (Figure 8.7 "Zonal Operation on a Raster Dataset").
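A two-raster zonal operation of this kind might be sketched with SciPy's labeled statistics, as below; the input and zonal grids are made up.

# A minimal sketch of a zonal operation: summarize (here, average) the input
# raster within each zone of the zonal raster.
import numpy as np
from scipy import ndimage

input_raster = np.array([[4., 6., 8., 2.],
                         [5., 7., 9., 3.],
                         [1., 2., 3., 4.]])
zones = np.array([[1, 1, 2, 2],
                  [1, 1, 2, 2],
                  [3, 3, 3, 3]])

zone_ids = np.unique(zones)
zone_means = ndimage.mean(input_raster, labels=zones, index=zone_ids)

# Write each zone's summary statistic back out as an output raster
output = np.zeros_like(input_raster)
for zone_id, zone_mean in zip(zone_ids, zone_means):
    output[zones == zone_id] = zone_mean
print(output)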
Zonal operations and analyses are valuable in fields of study such as landscape ecology where the
geometry and spatial arrangement of habitat patches can significantly affect the type and number of
species that can reside in them. Similarly, zonal analyses can effectively quantify the narrow habitat
corridors that are important for regional movement of flightless, migratory animal species moving
through otherwise densely urbanized areas.
Global Operations
Global operations are similar to zonal operations whereby the entire raster dataset’s extent represents
a single zone. Typical global operations include determining basic statistical values for the raster as a
whole. For example, the minimum, maximum, average, range, and so forth can be quickly calculated
over the entire extent of the input raster and subsequently be output to a raster in which every cell
contains that calculated value (Figure 8.8 "Global Operation on a Raster Dataset").
A surface is a vector or raster dataset that contains an attribute value for every locale throughout its
extent. In a sense, all raster datasets are surfaces, but not all vector datasets are surfaces. Surfaces are
commonly used in a geographic information system (GIS) to visualize phenomena such as elevation,
temperature, slope, aspect, rainfall, and more. In a GIS, surface analyses are usually carried out on
either raster datasets or TINs (Triangulated Irregular Networks; Chapter 5 "Geospatial Data Management",
Section 5.3.1 "Vector File Formats"), but isolines or point arrays can also be used. Interpolation is used
to estimate the value of a variable at an unsampled location from measurements made at nearby or
neighboring locales. Spatial interpolation methods draw on the theoretical creed of Tobler’s first law of
geography, which states that “everything is related to everything else, but near things are more related
than distant things.” Indeed, this basic tenet of positive spatial autocorrelation forms the backbone of
many spatial analyses (Figure 8.9 "Positive and Negative Spatial Autocorrelation").
Creating Surfaces
The ability to create a surface is a valuable tool in a GIS. The creation of raster surfaces, however, often
starts with the creation of a vector surface. One common method to create such a vector surface from
point data is via the generation of Thiessen (or Voronoi) polygons. Thiessen polygons are mathematically
generated areas that define the sphere of influence around each point in the dataset relative to all other
points (Figure 8.10 "A Vector Surface Created Using Thiessen Polygons"). Specifically, polygon
boundaries are calculated as the perpendicular bisectors of the lines between each pair of neighboring
points. The derived Thiessen polygons can then be used as crude vector surfaces that provide attribute
information across the entire area of interest. A common example of Thiessen polygons is the creation
of a rainfall surface from an array of rain gauge point locations. Employing some basic reclassification
techniques, these Thiessen polygons can be easily converted to equivalent raster representations.
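Generating Thiessen polygons from a set of rain gauges might be sketched as follows with Shapely (1.8 or later) and GeoPandas; the gauge and study-area layer names are hypothetical.

# A minimal sketch of building Thiessen (Voronoi) polygons from gauge points.
import geopandas as gpd
from shapely.geometry import MultiPoint
from shapely.ops import voronoi_diagram

gauges = gpd.read_file("rain_gauges.shp")
study_area = gpd.read_file("study_area.shp")

# Build the Voronoi diagram around all gauge locations
cells = voronoi_diagram(MultiPoint(list(gauges.geometry)))

# Store the polygons as a crude vector surface, clipped to the study area
thiessen = gpd.GeoDataFrame(geometry=list(cells.geoms), crs=gauges.crs)
thiessen = gpd.clip(thiessen, study_area)

A containment-based spatial join can then attach each gauge's rainfall value to its surrounding polygon.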
While the creation of Thiessen polygons results in a polygon layer whereby each polygon, or raster zone,
maintains a single value, interpolation is a potentially complex statistical technique that estimates the
value of all unknown points between the known points. The three basic methods used to create
interpolated surfaces are spline, inverse distance weighting (IDW), and trend surface. The spline
interpolation method forces a smoothed curve through the set of known input points to estimate the
unknown, intervening values. IDW interpolation estimates the values of unknown locations using the
distance to proximal, known values. The weight placed on each proximal value is in inverse
proportion to its spatial distance from the target locale. Therefore, the farther the proximal point, the
less weight it carries in defining the target point’s value. Finally, trend surface interpolation is the most
complex method as it fits a multivariate statistical regression model to the known points, assigning a
value to each unknown location based on that model.
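Of the three, IDW is the most straightforward to write out directly; the NumPy sketch below uses made-up sample points and the commonly used power of 2 for the distance weighting.

# A minimal sketch of inverse distance weighted (IDW) interpolation.
import numpy as np

xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])  # samples
values = np.array([3.0, 7.0, 5.0, 9.0])                              # measured

def idw(target, xy, values, power=2.0):
    """Estimate the value at an unsampled target location."""
    distances = np.sqrt(((xy - target) ** 2).sum(axis=1))
    if np.any(distances == 0):              # target coincides with a sample
        return values[distances == 0][0]
    weights = 1.0 / distances ** power      # nearer points carry more weight
    return np.sum(weights * values) / np.sum(weights)

print(idw(np.array([2.0, 3.0]), xy, values))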
Other highly complex interpolation methods exist such as kriging. Kriging is a complex geostatistical
technique, similar to IDW, that employs semivariograms to interpolate the values of an input point layer
and is more akin to a regression analysis (Krige, D. 1951. A Statistical Approach to Some Mine Valuations and Allied Problems at the Witwatersrand. Master’s thesis, University of Witwatersrand). The specifics of the kriging methodology will not be covered here as this is beyond the scope of this text. For more information on kriging, consult review texts such as Stein's Statistical Interpolation of Spatial Data: Some Theories for Kriging (New York: Springer, 1999).
Conversely, raster data can also be used to create vector surfaces. For instance, isoline maps are made up
of continuous, nonoverlapping lines that connect points of equal value. Isolines have specific monikers
depending on the type of information they model (e.g., elevation = contour lines, temperature =
isotherms, barometric pressure = isobars, wind speed = isotachs). Figure 8.11 "Contour Lines Derived
from a DEM" shows an isoline elevation map. As the elevation values of this digital elevation model
(DEM) range from 450 to 950 feet, the contour lines are placed at 500, 600, 700, 800, and 900 feet
elevations throughout the extent of the image. In this example, the contour interval, defined as the
vertical distance between each contour line, is 100 feet. The contour interval is determined by the user
during the creation of the surface.
Surface analysis is often referred to as terrain (elevation) analysis when information related to slope,
aspect, viewshed, hydrology, volume, and so forth are calculated on raster surfaces such as DEMs
(digital elevation models; Chapter 5 "Geospatial Data Management", Section 5.3.1 "Vector File
Formats"). In addition, surface analysis techniques can also be applied to more esoteric mapping efforts
such as probability of tornados or concentration of infant mortalities in a given region. In this section we
discuss a few methods for creating surfaces and common surface analysis techniques related to terrain
datasets.
Several common raster-based neighborhood analyses provide valuable insights into the surface
properties of terrain. Slope maps (part (a) of Figure 8.12 "(a) Slope, (b) Aspect, and (c and d) Hillshade
Maps") are excellent for analyzing and visualizing landform characteristics and are frequently used in
conjunction with aspect maps (defined later) to assess watershed units, inventory forest resources,
determine habitat suitability, estimate slope erosion potential, and so forth. They are typically created
by fitting a planar surface to a 3-by-3 moving window around each target cell. When dividing the
vertical distance within the window (measured as the difference between the largest cell value and the
central cell value) by the horizontal distance across the moving window (which is determined via the
spatial resolution of the raster image), the slope is relatively easily obtained. The output raster of slope
values can be calculated as either percent slope or degree of slope.
Any cell that exhibits a slope must, by definition, be oriented in a known direction. This orientation is
referred to as aspect. Aspect maps (part (b) of Figure 8.12 "(a) Slope, (b) Aspect, and (c and d) Hillshade
Maps") use slope information to produce output raster images whereby the value of each cell denotes
the direction it faces. This is usually coded as either one of the eight principal compass directions (north,
northeast, east, southeast, south, southwest, west, northwest) or in degrees from 1° (nearly due north) to 360°
(back to due north). Flat surfaces have no aspect and are given a value of −1. To calculate aspect, a 3-by-
3 moving window is used to find the highest and lowest elevations around the target cell. If the highest
cell value is located at the top-left of the window (“top” being due north) and the lowest value is at the
bottom-right, it can be assumed that the aspect is southeast. The combination of slope and aspect
information is of great value to researchers such as botanists and soil scientists because sunlight
availability varies widely between north-facing and south-facing slopes. Indeed, the various light and
moisture regimes resulting from aspect changes encourage vegetative and edaphic differences.
A hillshade map (part (c) of Figure 8.12 "(a) Slope, (b) Aspect, and (c and d) Hillshade Maps") represents
the illumination of a surface from some hypothetical, user-defined light source (presumably, the sun).
Indeed, the slope of a hill is relatively brightly lit when facing the sun and dark when facing away. Using
the surface slope, aspect, angle of incoming light, and solar altitude as inputs, the hillshade process
codes each cell in the output raster with an 8-bit value (0–255) increasing from black to white. As you
can see in part (c) of Figure 8.12 "(a) Slope, (b) Aspect, and (c and d) Hillshade Maps", hillshade
representations are an effective way to visualize the three-dimensional nature of land elevations on a
two-dimensional monitor or paper map. Hillshade maps can also be used effectively as a baseline map
when overlain with a semitransparent layer, such as a false-color digital elevation model (DEM; part (d)
of Figure 8.12 "(a) Slope, (b) Aspect, and (c and d) Hillshade Maps").
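The slope, aspect, and hillshade calculations described above can be sketched from a DEM with NumPy gradients. The small DEM, the 10 m cell size, and the sun position are made-up assumptions, and the illumination formula is the widely used cosine model rather than any particular package's exact implementation; aspect sign conventions also vary between packages.

# A minimal sketch of slope, aspect, and hillshade grids from a DEM.
import numpy as np

dem = np.array([[100., 102., 105., 110.],
                [101., 104., 108., 114.],
                [103., 107., 112., 119.],
                [106., 111., 117., 125.]])
cell_size = 10.0                               # meters (assumed)

dzdy, dzdx = np.gradient(dem, cell_size)       # elevation change per meter

slope = np.arctan(np.sqrt(dzdx**2 + dzdy**2))  # slope in radians
slope_deg = np.degrees(slope)

aspect = np.arctan2(-dzdx, dzdy)               # one common aspect convention

azimuth = np.radians(315.0)                    # sun from the northwest
altitude = np.radians(45.0)                    # sun 45 degrees above horizon
hillshade = 255 * (np.sin(altitude) * np.cos(slope)
                   + np.cos(altitude) * np.sin(slope) * np.cos(azimuth - aspect))
hillshade = np.clip(hillshade, 0, 255)

print(slope_deg.round(1))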
Viewshed analysis is a valuable visualization technique that uses the elevation value of cells in a DEM or
TIN (Triangulated Irregular Network) to determine those areas that can be seen from one or more
specific location(s) (part (a) of Figure 8.13 "(a) Viewshed and (b) Watershed Maps"). The viewing
location can be either a point or line layer and can be placed at any desired elevation. The output of the
viewshed analysis is a binary raster that classifies cells as either 1 (visible) or 0 (not visible). In the case of
two viewing locations, the output raster values would be 2 (visible from both points), 1 (visible from one
point), or 0 (not visible from either point).
Additional parameters influencing the resultant viewshed map are the viewing azimuth (horizontal
and/or vertical) and viewing radius. The horizontal viewing azimuth is the horizontal angle of the view
area and is set to a default value of 360°. The user may want to change this value to 90° if, for example,
the desired viewshed included only the area that could be seen from an office window. Similarly, vertical
viewing angle can be set from 0° to 180°. Finally, the viewing radius determines the distance from the
viewing location that is to be included in the output. This parameter is normally set to infinity
(functionally, this includes all areas within the DEM or TIN under examination). It may be decreased if,
for instance, you only wanted to include the area within the 100 km broadcast range of a radio station.
Similarly, watershed analyses are a series of surface analysis techniques that define the topographic
divides that drain surface water for stream networks (part (b) of Figure 8.13 "(a) Viewshed and (b)
Watershed Maps"). In geographic information systems (GISs), a watershed analysis is based on input of
a “filled” DEM. A filled DEM is one that contains no internal depressions (such as would be seen in a
pothole, sink wetland, or quarry). From these inputs, a flow direction raster is created to model the
direction of water movement across the surface. From the flow direction information, a flow
accumulation raster calculates the number of cells that contribute flow to each cell. Generally speaking,
cells with a high value of flow accumulation represent stream channels, while cells with low flow
accumulation represent uplands. With this in mind, a network of rasterized stream segments is created.
These stream networks are based on some user-defined minimum threshold of flow accumulation. For
example, it may be decided that a cell needs at least one thousand contributing cells to be considered a
stream segment. Altering this threshold value will change the density of the stream network. Following
the creation of the stream network, a stream link raster is calculated whereby each stream segment
(line) is topologically connected to stream intersections (nodes). Finally, the flow direction and stream
link raster datasets are combined to determine the output watershed raster as seen in part (b) of Figure
8.13 "(a) Viewshed and (b) Watershed Maps" (Chang 2008).Chang, K. 2008. Introduction to Geographic
Information Systems. New York: McGraw-Hill. Such analyses are invaluable for watershed management
and hydrologic modeling.
Unit No. : 8
Introduction: From projections to data management to spatial analysis, we have up to now focused on
the more technical points of a geographic information system (GIS). This unit is concerned less with the
computational options available to the GIS user and more with the artistic options.
Aim: Gain an understanding of the properties of color and how best to utilize them in your cartographic
products.
Learning Objectives:
1. Understand how to best utilize point, line, and polygon symbols to assist in the interpretation of
your map and its features.
2. Become familiar with the basic cartographic principles that contribute to effective map
design.
Topics:
1. Color
2. Symbology
3. Cartographic Design
UNIT 8
CARTOGRAPHIC PRINCIPLES
Although a high-quality map is composed of many different elements, color is one of the first
components noticed by end-users. This is partially due to the fact that we each have an intuitive
understanding of how colors are, and should be, used to create an effective and pleasing visual
experience. Nevertheless, it is not always clear to the map-maker which colors should be used to best
convey the purpose of the product. This intuition is much like listening to our favorite music. We know
when a note is in tune or out of tune, but we wouldn’t necessarily have any idea of how to fix a bad
note. Color is indeed a tricky piece of the cartographic puzzle and is not surprisingly the most frequently
criticized variable on computer-generated maps (Monmonier, M. 1996. How to Lie with Maps. 2nd ed. Chicago: University of Chicago Press). This section attempts to outline the basic components of color and the guidelines to most effectively employ this important map attribute.
Color Basics
As electromagnetic radiation (ER) travels via waves from the sun (or a lightbulb) to objects on the earth,
portions of the ER spectrum are absorbed, scattered, or reflected by various objects. The resulting
property of the absorbed, scattered, and reflected ER is termed “color.” White is the color resulting
from the full range of the visual spectrum and is therefore considered the benchmark color by which all
others are measured. Black is the absence of ER. All other colors result from a partial interaction with
the ER spectrum.
The three primary aspects of color that must be addressed in map making are hue, value, and
saturation. Hue is the dominant wavelength or color associated with a reflecting object. Hue is the most
basic component of color and includes red, blue, yellow, purple, and so forth. Value is the amount of
white or black in the color. Value is often synonymous with contrast. Variations in the amount of value
for a given hue result in varying degrees of lightness or darkness for that color. Lighter colors are said to
possess high value, while dark colors possess low value. Monochrome colors are groups of colors with
the same hue but with incremental variations in value. As seen in Figure 9.1 "Value", variations in value
will typically lead the viewer’s eye from dark areas to light areas.
Saturation describes the intensity of color. Full saturation results in pure colors, while low saturation
colors approach gray. Variations in saturation yield different shades and tints. Shades are produced by
blocking light, such as by an umbrella, tree, curtain, and so forth. Increasing the amount of shading
results in grays and blacks. Tint is the opposite of shade and is produced by adding white to a color. Tints
and shades are particularly germane when using additive color models (see Section 9.1.2 "Color Models"
for more on additive color models). To maximize the interpretability of a map, use saturated colors to
represent hierarchically prominent features and washed-out colors to represent background features.
If used properly, color can greatly enhance and support map design. Likewise, color can detract from a
mapping product if abused. To use color properly, one must first consider the purpose of the map. In
some cases, the use of color is not warranted. Grayscale maps can be just as effective as color maps if
the subject matter merits it. Regardless, there are many reasons to use color. The five primary reasons
are outlined here.
Color is particularly suited to convey meaning (Figure 9.2 "Use of Color to Provide Meaning"). For
example, red is a strong color that evokes a passionate response in humans. Red has been shown to
evoke physiological responses such as increasing the rate of respiration and raising blood pressure. Red
is frequently associated with blood, war, violence, even love. On the other hand, blue is a color
associated with calming effects. Associated with the sky or ocean, blue colors can actually assist in sleep
and are therefore recommended colors for bedrooms. Too much blue, however, can result in a lapse
from calming effects into feelings of depression (i.e., having the “blues”). Green is most commonly
associated with life or nature (plants). The color green is certainly one of the most topical colors in
today’s society with commonplace references to green construction, the Green party, going green, and
so forth. Green, however, can also represent envy and inexperience (e.g., the green-eyed monster,
greenhorn). Brown is also a nature color but more as a representation of earth and stone. Brown can
also imply dullness. Yellow is most commonly associated with sunshine and warmth, somewhat similar
to red. Yellow can also represent cowardice (e.g., yellow-bellied). Black, the absence of color, is possibly
the most meaning-laden color in modern parlance. Even more than the others, the color black purports
surprisingly strong positive and negative connotations. Black conveys mystery, elegance, and
sophistication (e.g., a black-tie affair, in the black), while also conveying loss, evil, and negativity (e.g.,
blackout, black-hearted, black cloud, blacklist).
The second reason to use color is for clarification and emphasis (Figure 9.3 "Use of Color to Provide
Emphasis"). Warm colors, such as reds and yellows, are notable for emphasizing spatial features. These
colors will often jump off the page and are usually the first to attract the reader’s eye, particularly if they
are counterbalanced with cool colors, such as blues and greens (see Section 9.1.3 "Color Choices" for
more on warm and cool colors). In addition, the use of a hue with high saturation will stand out starkly
against similar hues of low saturation.
Color use is also important for creating a map with pleasing aesthetics (Figure 9.4 "Use of Color to
Provide Aesthetics"). Certainly, one of the most challenging aspects of map creation is developing an
effective color palette. When looking at maps through an aesthetic lens, we are truly starting to think of
our creations as artwork. Although somewhat particular to individual viewers, we all have an innate
understanding of when colors in a graphic/art are aesthetically pleasing and when they are not. For
example, color use is considered harmonious when colors from opposite sides of the color wheel are
used (Section 9.1.3 "Color Choices"), whereas equitable use of several major hues can create an
unbalanced image.
The fourth use of color is abstraction (Figure 9.5 "Use of Color to Provide Abstraction"). Color
abstraction is an effective way to illustrate quantitative and qualitative data, particularly for thematic
products such as choropleth maps. Here, colors are used solely to denote different values for a variable
and may not have any particular rhyme or reason. Figure 9.5 "Use of Color to Provide Abstraction"
shows a typical thematic map with abstract colors representing different countries.
In contrast to abstraction, color can also be used to represent reality (Figure 9.6). Maps showing elevation
(e.g., digital elevation models or DEMs) are often given false colors that approximate reality. Low areas
are colored in variations of green to show areas of lush vegetation growth. Mid-elevations (or low-lying
desert areas) are colored brown to show sparse vegetation growth. Mountain ridges and peaks are
colored white to show accumulated snowfall. Watercourses and water bodies are colored blue. Unless
there is a specific reason not to, natural phenomena represented on maps should always be colored to
approximate their actual color to increase interpretability and to decrease confusion.
Color Models
Color models are systems that allow for the creation of a range of colors from a short list of primary
colors. Color models can be additive or subtractive. Additive color models combine emitted light to
display color variations and are commonly used with computer monitors, televisions, scanners, digital
cameras, and video projectors. The RGB (red-green-blue) color model is the most common additive
model (part (a) of Figure 9.7 "Additive Color Models: (a) RGB, (b) HSL, and (c) HSV"). The RGB model
combines light beams of the primary hues of red, green, and blue to yield additive secondary hues of
magenta, cyan, and yellow. Although there is a substantive difference between pure yellow light (~580
nm) and a mixture of green and red light, the human eye perceives these signals as the same. The RGB
model typically employs three 8-bit numeric values (called an RGB triplet) ranging from 0 to 255 to
model colors. For instance, the RGB triplets for the pure primary and secondary colors are as follows:
Red = (255, 0, 0)
Green = (0, 255, 0)
Blue = (0, 0, 255)
Magenta = (255, 0, 255)
Cyan = (0, 255, 255)
Yellow = (255, 255, 0)
Black, the absence of additive color = (0, 0, 0)
White, the sum of all additive color = (255, 255, 255)
Two other common additive color models, based on the RGB model, are the HSL (hue, saturation,
lightness) and HSV (hue, saturation, value) models (Figure 9.7 "Additive Color Models: (a) RGB, (b) HSL,
and (c) HSV", b and c). These models are based on cylindrical coordinate systems whereby the angle
around the central vertical axis corresponds to the hue; the distance from the central axis corresponds
to saturation; and the distance along the central axis corresponds to either lightness (HSL) or value (HSV).
Because of their basis in the RGB model, values in the HSL and HSV color models can be converted directly
to and from RGB. While these relatively simple additive models require minimal computer-processing time,
they do gloss over some of the complexities of color. For example, the RGB color model does not define an
“absolute” color space, which means that these hues may look different when viewed on different displays.
Also, the RGB hues are not evenly spaced along the color spectrum, so combinations of the hues are less
than exact.
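As an illustration of how these models relate, the following Python sketch (not part of the original text) uses the standard colorsys module to convert an 8-bit RGB triplet to its HSV and HLS (HSL) equivalents; the helper name rgb255_to_hsv_hls is purely illustrative.

import colorsys

def rgb255_to_hsv_hls(r, g, b):
    # colorsys expects channel values in the 0-1 range rather than 0-255.
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    h, s, v = colorsys.rgb_to_hsv(rf, gf, bf)    # hue, saturation, value
    h2, l, s2 = colorsys.rgb_to_hls(rf, gf, bf)  # hue, lightness, saturation
    return (h, s, v), (h2, l, s2)

# Pure magenta (255, 0, 255): its hue falls midway between red and blue.
print(rgb255_to_hsv_hls(255, 0, 255))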
In contrast to an additive model, subtractive color models involve the mixing of paints, dyes, or inks to
create full color ranges. These subtractive models display color on the assumption that white, ambient
light is being scattered, absorbed, and reflected from the page by the printing inks. Subtractive models
therefore create white by restricting ink from the print surface. As such, these models assume the use of
white paper as other paper colors will result in skewed hues. CMYK (cyan, magenta, yellow, black) is the
most common subtractive color model and is occasionally referred to as a “four-color process” (Figure
9.8 "Subtractive Color Model: CMYK"). Although the CMY inks are sufficient to create all of the colors of
the subtractive rainbow, a black ink is included in this model as it is much cheaper than using a CMY mix
for all blacks (black being the most commonly printed color) and because combining CMY often results
in more of a dark brown hue. The CMYK model creates color values by entering percentages for each of
the four colors ranging from 0 percent to 100 percent. For example, pure red is composed of 14 percent
cyan, 100 percent magenta, 99 percent yellow, and 3 percent black.
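The arithmetic behind a simple RGB-to-CMYK conversion can be sketched in Python as follows (an illustrative example, not from the original text). Production conversions rely on ICC color profiles, which is why the quoted percentages for pure red differ slightly from the naive result of 0, 100, 100, 0.

def rgb_to_cmyk(r, g, b):
    # Scale the 8-bit channels to the 0-1 range.
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(rf, gf, bf)            # black is the "missing" light
    if k == 1.0:                         # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 100.0
    c = (1.0 - rf - k) / (1.0 - k)
    m = (1.0 - gf - k) / (1.0 - k)
    y = (1.0 - bf - k) / (1.0 - k)
    return c * 100, m * 100, y * 100, k * 100    # percentages

print(rgb_to_cmyk(255, 0, 0))   # naive conversion: (0.0, 100.0, 100.0, 0.0)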
As you may guess, additive models are the preferred choice when maps are to be displayed on a
computer monitor, while subtractive models are preferred when printing. If in doubt, it is usually best to
use the RGB model as this supports a larger percentage of the visible spectrum in comparison with the
CMYK model. Once an image is converted from RGB to CMYK, the additional RGB information is
irretrievably lost. If possible, collecting both RGB and CMYK versions of an image is ideal, particularly if
your graphic is to be both printed and placed online. One last note: you will also want to be selective in
your use of file formats for these color models. The JPEG and GIF graphic file formats are the best choice
for RGB images, while the EPS and TIFF graphic file formats are preferred with printed CMYK images.
Color Choices
Effective color usage requires a modicum of knowledge about the color wheel. Invented by Sir Isaac
Newton in 1706, the color wheel is a visual representation of colors arranged according to their
chromatic relationships. Primary hues are equidistant from each other with secondary and tertiary
colors intervening. The red-yellow-blue color wheel is the most frequently used (Figure 9.9 "Color
Wheel"); however, the magenta-yellow-cyan wheel is the preferred choice of print makers (for reasons
described in the previous section). Primary colors are those that cannot be created by mixing other
colors; secondary colors are defined as those colors created by mixing two primary hues; tertiary colors
are those created by mixing primary and secondary hues. Furthermore, complementary colors are those
placed opposite each other on the wheel, while analogous colors are located proximal to each other.
Complementary colors emphasize differences. Analogues suggest harmony.
Colors can be further referred to as warm or cool (Figure 9.10 "Warm (Orange) and Cool (Blue) Colors").
Warm colors are those that might be seen during a bright, sunny day. Cool colors are those associated
with overcast days. Warm colors are typified by hues ranging from red to yellow, including browns and
tans. Cool color hues range from blue-green through blue-violet and include the majority of gray
variants. When used in mapping, it is wise to use warm and cool colors with care. Indeed, warm colors
stand out, appear active, and stimulate the viewer. Cool colors appear small, recede, and calm the
viewer. As you might guess, it is important that you apply warm colors to the map features of primary
interest, while using cool colors on the secondary, background, and/or contextual features.
In light of the plethora of color schemes and options available, it is wise to follow some basic color usage
guidelines. For example, changes in hue are best suited to visualizing qualitative data, while changes in
value and saturation are effective at visualizing quantitative data. Likewise, variations in lightness and
saturation are best suited to representing ordered data since these establish hierarchy among features.
In particular, a monochromatic color scale is an effective way to represent the order of data whereby
light colors represent smaller data values and dark colors represent larger values. Keep in mind that it is
best to use more light shades than dark ones as the human eye can better discern lighter shades. Also,
the number of coincident colors that can be distinguished by humans is around seven, so be careful not
to abuse the color palette in your maps. If the data being mapped has a zero point, a dichromatic scale
(Figure 9.11) provides a natural breaking point with increasing color values on each end of the scale
representing increasing data values.
In addition, darker colors result in more important or pronounced graphic features (assuming the
background is not overly dark). Use dark colors on features whose visual impact you wish to magnify.
Finally, do not use all the colors of the spectrum in a single map. It is best to leave such messy, rainbow-
spectacular effects to the late Jackson Pollock and his abstract expressionist ilk.
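To make the monochromatic-scale guideline concrete, the following Python sketch (illustrative only; the helper name and base hue are assumptions) generates an ordered ramp of shades running from a light tint to a full-strength hue, with light colors intended for small data values and dark colors for large ones.

def monochromatic_ramp(base_rgb, n_classes):
    # Return n_classes RGB triplets running from a light tint to the full base hue.
    r, g, b = base_rgb
    ramp = []
    for i in range(n_classes):
        t = (i + 1) / float(n_classes)   # blend fraction: 1/n ... 1.0
        # Blend from near-white (small t) toward the full-strength hue (t = 1).
        ramp.append((round(255 - (255 - r) * t),
                     round(255 - (255 - g) * t),
                     round(255 - (255 - b) * t)))
    return ramp

# Five ordered shades of a blue hue for a choropleth map.
print(monochromatic_ramp((0, 80, 160), 5))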
Symbology
While color is an integral variable when choosing how to best represent spatial data, making informed
decisions on the size, shape, and type of symbols is equally important. Although raster data are
restricted to symbolizing features as a single cell or as cell groupings, vector data allows for a vast array
of options to symbolize points, lines, and polygons in a map. Like color, cartographers must take care to
use symbols judiciously in order to most effectively communicate the meaning and purpose of the map
to the viewer.
Vector points, lines, and polygons can be symbolized in a myriad of ways. The guidelines laid out in this
section will help you to make informed decisions on how best to represent the features in your map.
The primary visual variables associated with symbolization include size, texture, pattern, and shape
(Figure 9.12 "Visual Variables"). Changes to symbol size and texture are most effectively used in
conjunction with ordinal, interval, and ratio data. Changes to symbol pattern and shape are preferred in
conjunction with nominal data.
Variations in the size of symbols are powerful indicators of feature importance. Intuitively, larger
symbols are assumed to be more important than smaller symbols. Although symbol size is most
commonly associated with point features, linear symbols can effectively be altered in size by adjusting
line width. Polygon features can also benefit from resizing. Despite the fact that the area of the polygon
can’t be changed, a point representing the centroid of the polygon can be included in the map. These
polygon centroids can be resized and symbolized as desired, just like any other point feature. Varying
symbol size is moderately effective when applied to ordinal or numerical data but is ineffective with
nominal data.
Symbol texture, also referred to as spacing, refers to the compactness of the marks that make up the
symbol. Points, lines, and polygons can be filled with horizontal hash marks, for instance. The closer
these hash marks are spaced within the feature symbol, the more hierarchically important the feature
will appear. Varying symbol texture is most effective when applied to ordinal or numerical data but is
ineffective with nominal data.
Much like texture, symbols can be filled with different patterns. These patterns are typically some
artistic abstraction that may or may not attempt to visualize real world phenomena. For example, a land-
use map may change the observed fill patterns of various land types to try to depict the dominant plants
associated with each vegetation community. Changes to symbol patterns are most often associated with
polygon features, although there is some limited utility in changing the fill patterns of points and lines.
Varying symbol pattern is moderately effective when applied to ordinal or numerical data and is ineffective
when applied to nominal data.
Altering symbol shape can have dramatic effects on the appearance of map features. Point symbols are
most commonly symbolized with circles. Circles tend to be the default point symbol due to their
unchanging orientation, compact shape, and viewer preference. Other geometric shapes can also
constitute effective symbols due to their visual stability and conservation of map space. Unless specific
conditions allow, volumetric symbols (spheres, cubes, etc.) should be used sparingly as they rarely
contribute more than simple, two-dimensional symbols. In addition to geometric symbols, pictograms
are useful representations of point features and can help to add artistic flair to a map. Pictograms
should clearly denote features of interest and should not require interpretation by the viewer (Figure
9.13 "Pictograms"). Locales that frequently employ pictograms include picnic areas, camping sites, road
signs, bathrooms, airports, and so forth. Varying symbol shape is most effective when applied to
nominal data and is moderately effective with ordinal and numerical data.
Finally, applying variations in lightness/darkness will affect the hierarchical value of a symbol. The darker
the symbol, the more it stands out among lighter features. Variations in the lightness/darkness of a
symbol are most effective when applied to ordinal data, are moderately effective when applied to
numerical data, and are ineffective when applied to nominal data.
Keep in mind that there are many other visual variables that can be employed in a map, depending on
the cartographic software used. Regardless of the chosen symbology, it is important to maintain a
logical relationship between the symbol and the data. Also, visual contrast between different mapped
variables must be preserved. Indeed, the efficacy of your map will be greatly diminished if you do not
ensure that its symbols are readily identifiable and look markedly different from each other.
Proportional Symbolization
In addition to the uniform symbols presented in the previous section, symbols for a single, quantitative
variable can be sized proportionally to match the data values. These proportional symbols are useful for
presenting a fairly exact understanding of the differences in magnitude within a dataset. As the numeric
values for each class increase, so too does the size of the symbol representing that class. This allows the
symbol size of features to be directly related to the attribute values they represent whereby small points
denote small data values and large points denote large data values.
Similar to proportional symbols, range graded symbols group raw data into classes with each class
represented by a differently sized symbol. Both proportional and range graded symbols are most
frequently used with point data, but lines and polygons can benefit from proportional symbolization as
well. In the case of linear datasets, line width is most frequently used as the proportional visual variable.
Polygon datasets typically summarize a quantitative variable within each polygon, place a centroid
within that polygon, and size that centroid point symbol proportionally. Range grading should not be used if the
data range for a given variable is small. In these cases, range grading will suggest larger differences in
the data values than is merited.
The advantage of proportional symbolization is the ease with which the viewer can discriminate symbol
size and thus understand variations in the data values over a given map extent. On the other hand,
viewers may misjudge the magnitude of the proportional symbols if they do not pay close attention to
the legend. In addition, the human eye does not see and interpret symbol size in absolute terms. When
proportional circles are used in maps, it is typical that the viewer will underestimate the larger circles
relative to the smaller circles. To address this potential pitfall, graduated symbols can be based on either
mathematical or perceptual scaling. Mathematical scaling directly relates symbol size with the data
value for that locale. If one value is twice as large as another, it will be represented with a symbol twice
as large as the other. Perceptual scaling overcomes the underestimation of large symbols by making
these symbols much larger than their actual value would indicate (Figure 9.14 "Mathematical versus
Perceptual Scaling").
A disadvantage of proportional symbolization is that the symbol size can appear variable depending on
the surrounding symbols. This is best shown via the Ebbinghaus illusion (also known as Titchener
circles). As you can see in Figure 9.15 "Ebbinghaus Illusion", the central circles are both the same size
but appear different due to the visual influence of the surrounding circles. If you are creating a graphic
with many different symbols, this illusion can wreak havoc on the interpretability of your map.
Cartographic Design
In addition to effective use of colors and symbols, a map that is well designed will greatly enhance its
ability to relate pertinent spatial information to the viewer. Judicious use of map elements,
typography/labels, and design principles will result in maps that minimize confusion and maximize
interpretability. Furthermore, the use of these components must be guided by a keen understanding of
the map’s purpose, intended audience, topic, scale, and production/reproduction method.
Map Elements
Chapter 9 "Cartographic Principles", Section 9.1 "Color" and Section 9.2 "Symbology" discussed visual
variables specific to the spatial features of a map. However, a map is composed of many more elements
than just the spatial features, each of which contributes immensely to the interpretability and flow of
the overall map. This section outlines the basic map elements that should be incorporated into a
“complete” map. Following Slocum et al. (2005),Slocum, T., R. McMaster, F. Kessler, and H. Howard.
2005. Thematic Cartography and Geographic Visualization. 2nd ed. Upper Saddle River, NJ: Pearson
Prentice Hall. these elements are listed in the logical order in which they should be placed into the map
(Figure 9.16 "A US Map Showing Various Map Elements").
The first feature that should be placed into the map layout is the frame line. This line is essentially a
bordering box that surrounds all the map elements described hereafter. All of these map elements
should be balanced within the frame line. To balance a map, ensure that neither large blank spaces nor
jumbled masses of information are present within the map. Similar to frame lines are neat lines. Neat
lines are border boxes that are placed around individual map elements. By definition, neat lines must
occur within the frame line. Both frame lines and neat lines are typically thin, black-lined boxes, but they
can be altered to match the specific aesthetics of an individual map.
The mapped area is the primary geographic component of the overall map. The mapped area contains
all of the features and symbols used to represent the spatial phenomena being displayed. The mapped
area is typically bordered with a neat line.
Insets can be thought of as secondary map areas, each encased within their own neat line. These neat
lines should be of different thickness or type than other line features on the map to adequately
demarcate them from other map features. Insets often display the primary mapped area in relation to a
larger area. For example, if the primary map shows the locales of national parks within a county, an inset
displaying the location of that county within the larger state boundary may be included. Conversely,
insets are also used to display areas related to the primary map but that occur at some far off locale.
This type of inset is often used with maps of the United States whereby Alaska and Hawaii are placed as
insets to a map of the contiguous United States. Finally, insets can be used to clarify areas where
features would otherwise be overcrowded if restricted to the primary mapping area. If the county map
of national parks contained four small, adjacent parks, an inset could be used to expand that jumbled
portion of the map to show the exact spatial extent of each of the four parks. This type of inset is
frequently seen when showing the small northeastern states on a map of the entire United States.
All maps should have a title. The title is one of the first map elements to catch the viewer’s eye, so care
should be taken to most effectively represent the intent of the map with this leading text. The title
should clearly and concisely explain the purpose of the map and should specifically target the intended
viewing audience. When overly verbose or cryptically abbreviated, a poor title will detract immensely
from the interpretability of the cartographic end-product. The title should contain the largest type on
the map and be limited to one line, if possible. It should be placed at the top-center of the map unless
there is a specific reason otherwise. An alternate locale for the title is directly above the legend.
The legend provides a self-explanatory definition for all symbols used within the mapped area. Care
must be taken when developing this map element, as a multitude of features within a dataset can lead
to an overly complex legend. Although placement of the legend is variable, it should be placed within
the white space of the map and not in such a way that it masks any other map elements. Atop the
legend box is the optional legend header. The legend header should not simply repeat the information
from the title, nor should it include extraneous, non-legend-specific information. The symbols
representing mapped features should be to the left of the explanatory text. Placing a neat line around
the legend will help to bring attention to the element and is recommended but not required. Be careful
not to take up too much of the map with the legend, while also not making the legend so small that it
becomes difficult to read or that symbols become cluttered. Removing information related to base map
features (e.g., state boundaries on a US map) or readily identifiable features (e.g., highway or interstate
symbols) is one effective way to minimize legend size. If a large legend is unavoidable, it is acceptable to
place this feature outside of the map’s frame line.
Attribution of the data source within the map allows users to assess from where the data are derived.
Stylistically, the data source attribution should be hierarchically minimized by using a relatively small,
simple font. It is also helpful to preface this map element with “Source:” to avoid confusion with other
typographic elements.
An indicator of scale is invaluable to provide viewers with the means to properly adjudicate the
dimensions of the map. While not as important when mapping large or widely familiar locales such as a
country or continent, the scale element allows viewers to measure distances on the map. The three
primary representations of scale are the representative fraction, verbal scale, and bar scale (for more,
see Chapter 2 "Map Anatomy", Section 2.1 "Maps and Map Types"). The scale indicator should not be
prominently displayed within the map as this element is of secondary importance.
Finally, map orientation notifies the viewer of the direction of the map. To assist in clarifying
orientation, a graticule can also be included in the mapped area. Most maps are made such that the top of the
page points to the north (i.e., a north-up map). If your map is not north-up, there should be a good
reason for it. Orientation is most often indicated with a north arrow, of which there are many stylistic
options available in current geographic information system (GIS) software packages. One of the most
commonly encountered map errors is the use of an overly large or overly ornate north arrow. North
arrows should be fairly inconspicuous as they only need to be viewed once by the reader. Ornate north
arrows can be used on small scale maps, but simple north arrows are preferred on medium to large-
scale maps so as to not detract from the presumably more important information appearing elsewhere.
Taken together, these map elements should work together to achieve the goal of a clear, ordered,
balanced, and unified map product. Since modern GIS packages allow users to add and remove these
graphic elements with little effort, care must be taken to avoid the inclination to employ these
components with as little forethought as it takes to create them. The following sections provide further
guidance on composing these elements on the page to honor and balance the mapped area.
Typography and Labels
Type is found throughout all the elements of a map. Type is similar to map symbols in many senses.
Coloring effects alter typographic hierarchy as lighter type fades into the background and dark type
jumps to the fore. Using all uppercase letters and/or bolded letters will result in more pronounced
textual effects. Larger font sizes increase the hierarchical weight of the type, so ensure that the size of
the type corresponds with the importance of the map feature. Use decorative fonts, bold, and italics
sparingly. These fonts, as well as overly small fonts, can be difficult to read if overused. Most
importantly, always spell check your final cartographic product. After spell checking, spell check again.
Yu wont reegrett teh ecstra efort.
Other typographic options for altering text include the use of serif, sans serif, and display fonts. While
serif fonts are preferred in written documents because they provide horizontal guidelines for the eye, either is
acceptable in a mapping application (Slocum 2005).Slocum, T., R. McMaster, F. Kessler, and H. Howard.
2005. Thematic Cartography and Geographic Visualization. 2nd ed. Upper Saddle River, NJ: Pearson
Prentice Hall. Sans serif fonts, on the other hand, are preferred for maps that are viewed over the
Internet.
Kerning is an effective typographic effect that alters the space between adjacent letters in a word.
Decreasing the kerning of a typeset is useful if the text is too large for the space given. Alternatively,
increasing the kerning is an effective way to label large map areas, particularly in conjunction with all-
uppercase lettering. Like kerning, changes in leading (pronounced “led-ing”) alter the vertical distance
between lines of text. Leading should not be so cramped that lines of text begin to overwrite each other,
nor should it be so wide that lines of text appear unrelated. Other common typographic effects include
masks, callouts, shadows, and halos (Figure 9.17 "Typographic Effects"). All of these effects serve to
increase the visibility and importance of the text to which they are applied.
In addition to the general typographic guidelines discussed earlier, there are specific typographic
suggestions for feature labels. Obviously, labels must be placed proximal to their symbols so they are
directly and readily associated with the features they describe. Labels should maintain a consistent
orientation throughout so the reader does not have to rubberneck about to read various entries. Also,
avoid overprinting labels on top of other graphics or typographic features. If that is not possible,
consider using a halo, mask, callout, or shadow to help the text stand out from the background. In the
case of maps with many symbols, be sure that no features intervene between a symbol and its label.
Some typographic guidelines are specific to labels for point, line, and polygon features. Point labels, for
example, should not employ exaggerated kerning or leading. If leader lines are used, they should not
touch the point symbol nor should they include arrow heads. Leader lines should always be represented
with consistent color and line thickness throughout the map extent. Lastly, point labels should be placed
within the larger polygon in which they reside. For example, if the cities of Illinois were being mapped as
points atop a state polygon layer, the label for the Chicago point symbol should occur entirely over land,
and not reach into Lake Michigan. As this feature is located entirely on land, so should its label.
Line labels should be placed above their associated features but should not touch them. If the linear
feature is complex and meandering, the label should follow the general trend of the feature and not
attempt to match the alignment of each twist and turn. If the linear feature is particularly long, the
feature can be labeled multiple times across its length. Line labels should always read from left to right.
Polygon labels should be placed within the center of the feature whenever possible. If increased
emphasis is desired, all-uppercase letters can be effective. If all-uppercase letters are used, exaggerated
kerning and leading is also appropriate to increase the hierarchical importance of the feature. If the
polygon feature is too small to include text, label the feature as if it were a point symbol. Unlike point
labels, however, leader lines should just enter into the feature.
Map Design
Map design is a complex process that provides many variables and choices to the cartographer. The
British Cartographic Society Design Group presented five “Principles of Cartographic Design” on their
listserv on November 26, 1999. These principles, and a brief summary of each, are as follows:
1. Concept before compilation. A basic understanding of the concept and purpose of the map
must be secured before the actual mapping exercise begins. Furthermore, there is no way to
determine what information to include in a map without having first determined who the end-
user is and in what manner the map will be used. A map without a purpose is of no use to
anyone.
2. Hierarchy with harmony. Important map features must appear prominent on the map. The less
important features should fade into the background. Creating harmony between the primary
and secondary representations on the map will lead to a quality product that will best suit the
needs for which it was developed.
3. Simplicity from sacrifice. Upon creating a map, it is tempting to add as much information into
the graphic view as can possibly fit. In reality, it is best to leave some stones unturned. Just as
the key to good communication is brevity, it can be said that the key to good mapping is
simplicity. A map can be considered complete when no other features can be removed. Less, in
this instance, is more.
4. Maximum information at minimum cost. The purpose of a map is to convey the greatest
amount of information with the least amount of interpretive effort by the user. Map design
should allow complex spatial relationships to be understood at a glance.
5. Engage the emotion to engage the understanding. Well-constructed maps are basically works
of art. All of the artistic and aesthetic rules outlined in this chapter serve to engage the emotive
center of the viewer. If the viewer does not formulate some basic, emotional response to the
map, the message will be lost.
It should become increasingly clear that the cartographic choices made during the mapping process
have as much influence on the interpretation of a map as does the data being mapped. Borrowing
liberally from the popularized Mark Twain quote, it could be said that, “There are three kinds of lies: lies,
damned lies, and maps.” Mapmakers, indeed, have the ability to use (or misuse) cartographic principles
to represent (or misrepresent) the spatial data at their disposal. It is now up to you, the cartographer, to
master the tools presented in this book to harness the power of maps to elucidate and address the
spatial issues with which you are confronted.
Unit No. : 9
Introduction:
Aim: Achieve a basic understanding of the role of a project manager in the lifecycle of a GIS project.
Learning Objectives:
1. Review a sampling of the common tools and techniques available to complete GIS project
management tasks.
Topics:
UNIT 9
GIS PROJECT MANAGEMENT
Project Management Basics
Project management is a fairly recent professional endeavor that is growing rapidly to keep pace with
the increasingly complex job market. Some readers may equate management with the posting of clichéd
artwork that lines the walls of corporate headquarters across the nation (Figure 10.1). These posters
often depict a multitude of parachuters falling arm-in-arm while forming some odd geometric shape,
under which the poster is titled “Teamwork.” Another is a beautiful photo of a landscape titled,
“Motivation.” Clearly, any job that is easy enough that its workers can be motivated by a pretty picture
is a job that will either soon be done by computers or shipped overseas. In reality, proper project
management is a complex task that requires a broad knowledge base and a variety of skills.
The Project Management Institute (PMI) Standards Committee describes project management as “the
application of knowledge, skills, tools, and techniques to project activities in order to meet or exceed
stakeholder needs and expectations.” To assist in the understanding and implementation of project
management, PMI has written a book devoted to this subject titled, “A Guide to the Project
Management Body of Knowledge,” also known as the PMBOK Guide (PMI 2008). This section guides the
reader through the basic tenets of this text.
The primary stakeholders in a given project include the project manager, project team, sponsor/client,
and customer/end-user. As project manager, you will be required to identify and solve potential
problems, issues, and questions as they arise. Although much of this section is applicable to the majority
of information technology (IT) projects, GIS projects are particularly challenging due to the large storage,
integration, and performance requirements associated with this particular field. GIS projects, therefore,
tend to have elevated levels of risk compared to standard IT projects.
Project management is an integrative effort whereby all of the project’s pieces must be aligned properly
for timely completion of the work. Failure anywhere along the project timeline will result in delay, or
outright failure, of the project goals. To accomplish this daunting task, five process groups and nine
project management knowledge areas have been developed to meet project objectives. These process
groups and knowledge areas are described in this section.
The five project management process groups presented here are described separately, but realize that
there is typically a large degree of overlap among each of them.
Initiation, the first process group, defines and authorizes a particular project or project phase. This is the
point at which the scope, available resources, deliverables, schedule, and goals are decided. Initiation is
typically out of the hands of the project management team and, as such, requires a high-level
sponsor/client to approve a given course of action. This approval comes to the project manager in the
form of a project charter that provides the authority to utilize organizational resources to address the
issues at hand.
The planning process group determines how a newly initiated project phase will be carried out. It
focuses on defining the project scope, gathering information, reviewing available resources, identifying
and analyzing potential risks, developing a management plan, and estimating timetables and costs. As
such, all stakeholders should be involved in the planning process group to ensure comprehensive
feedback. The planning process is also iterative, meaning that each planning step may positively or
negatively affect previous decisions. If changes need to be made during these iterations, the project
manager must revisit the plan components and update those now-obsolete activities. This iterative
methodology is referred to as “rolling wave planning.”
The executing process group describes those processes employed to complete the work outlined in the
planning process group. Common activities performed during this process group include directing
project execution, acquiring and developing the project team, performing quality assurance, and
distributing information to the stakeholders. The executing process group, like the planning process
group, is often iterative due to fluctuations in project specifics (e.g., timelines, productivity,
unanticipated risk) and therefore may require reevaluation throughout the lifecycle of the project.
The monitoring and controlling process group is used to observe the project, identify potential
problems, and correct those problems. These processes run concurrently with all of the other process
groups and therefore span the entire project lifecycle. This process group examines all proposed
changes to the project and approves only those that do not alter the overall, stated goals of the project.
Some of the specific activities and actions monitored and controlled by this process group include the
project scope, schedule, cost, output quality, reports, risk, and stakeholder interactions.
Finally, the closing process group essentially terminates all of the actions and activities undertaken
during the four previous process groups. This process group includes handing off all pertinent
deliverables to the proper recipients and the formal completion of all contracts with the sponsor/client.
This process group is also important to signal the sponsor/client that no more charges will be made, and
they can now reassign the project staff and organizational resources as needed.
Each of the five aforementioned process groups is available for use with nine different knowledge areas.
These knowledge areas comprise those subjects that project managers must be familiar with to
successfully complete a given project. A brief description of each of these nine knowledge areas is
provided here.
1. Project integration management describes the ability of the project manager to “identify,
define, combine, unify, and coordinate” the various project activities into a coherent whole
(PMBOK 2008). It is understood by senior project managers that there is no single way to
successfully complete this task. In reality, each manager must apply their specific skills,
techniques, and knowledge to the job at hand. This knowledge area incorporates all five of the
PMBOK process groups.
2. Project scope management entails an understanding of not only what work is required to
complete the project but also what extraneous work should be excluded from the project. Defining
the scope of a project is usually done via the creation of a scope plan document that is
distributed among team members. This knowledge area incorporates the planning, as well as
the monitoring and controlling process groups.
3. Project time management takes into account the fact that all projects are subject to certain
time constraints. These time constraints must be analyzed and an overall project schedule must
be developed based on inputs from all project stakeholders (see Section 10.2.1 "Scheduling" for
more on scheduling). This knowledge area incorporates the planning, as well as the monitoring
and controlling process groups.
4. Project cost management is concerned not only with determining a reasonable budget for each
project task but also with staying within the defined budget. Project cost management is often
either very simple or very complex. Particular care needs to be taken to work with the
sponsor/client as they will be funding this effort. Therefore, any changes or additions to the
project costs must be vetted through the sponsor/client prior to initiating those changes. This
knowledge area incorporates the planning, as well as the monitoring and controlling process
groups.
5. Project quality management identifies the quality standards of the project and determines how
best to satisfy those standards. It incorporates responsibilities such as quality planning, quality
assurance, and quality control. To ensure adequate quality management, the project manager
must evaluate the expectations of the other stakeholders and continually monitor the output of
the various project tasks. This knowledge area incorporates the planning, executing, and
monitoring and controlling process groups.
6. Project human resource management involves the acquisition, development, organization, and
oversight of all team members. Managers should attempt to include team members in as many
aspects of the task as possible so they feel loyal to the work and invested in creating the best
output possible. This knowledge area incorporates the planning, executing, and monitoring and
controlling process groups.
7. Project communication management describes those processes required to maintain open lines
of communication with the project stakeholders. Included in this knowledge area is the
determination of who needs to communicate with whom, how communication will be
maintained (e-mail, letter reports, phone, etc.), how frequently contacts will be made, what
barriers will limit communication, and how past communications will be tracked and archived.
This knowledge area incorporates the planning, executing, and monitoring and controlling
process groups.
8. Project risk management identifies and mitigates risk to the project. It is concerned with
analyzing the severity of risk, planning responses, and monitoring those identified risks. Risk
analysis has become a complex undertaking as experienced project managers understand that
“an ounce of prevention is worth a pound of cure.” Risk management involves working with all
team members to evaluate each individual task and to minimize the potential for that risk to
manifest itself in the project or deliverable. This knowledge area incorporates the planning, as
well as the monitoring and controlling process groups.
9. Project procurement management, the final knowledge area, outlines the process by which
products, services, and/or results are acquired from outside the project team. This includes
selecting business partners, managing contracts, and closing contracts. These contracts are legal
documents supported by the force of law. Therefore, the fine print must be read and
understood to ensure that no confusion arises between the two parties entering into the
agreement. This knowledge area incorporates the planning, executing, monitoring and
controlling, and closing process groups.
Project Failure
Murphy’s Law of Project Management states that no major project is completed on time, within budget,
and with the same staff that started it—do not expect yours to be the first. It has been estimated that
only 16 percent of fully implemented information technology projects are completed on time and within
budget (The Standish Group International 2000).The Standish Group International. 2000. “Our Blog.”
http://www.pm2go.com. These failed projects result in an estimated loss of over $81 billion every year!
David Hamil discusses the reasons for these failures in his web feature titled, “Your Mission, Should You
Choose to Accept It: Project Management Excellence”
(http://spatialnews.geocomm.com/features/mesa1).
The first noted cause for project failure is poor planning. Every project must undergo some type of
planning-level feasibility study to determine the purpose of the project and the methodologies
employed to complete it. A feasibility study is basically used to determine whether or not a project
should be given the “green light.” It outlines the project mission, goals, objectives, scope, and
constraints. A project may be deemed unfeasible for a variety of reasons including an unacceptable level
of risk, unclear project requirements, disagreement among clients regarding project objectives, missing
key stakeholders, and unresolved political issues.
A second cause for project failure is lack of corporate management support. Inadequate staffing and
funding, as well as weak executive sponsorship on the part of the client, will typically result in a project
with little chance of success. One of the most important steps in managing a project will be to
determine which member of the client’s team is championing your project. This individual, or group of
individuals, must be kept abreast of all major decisions related to the project. If the client’s project
champion loses interest in or contact with the effort, failure is not far afield.
A third common cause of project failure is poor project management. A high-level project manager
should have ample experience, education, and leadership abilities, in addition to being a skilled
negotiator, communicator, problem solver, planner, and organizer. Despite the fact that managers with
this wide-ranging expertise are both uncommon and expensive to maintain, it only takes a failed project
or two for a client to learn the importance of securing the proper person for the job at hand.
The final cause of project failure is a lack of client focus and a lack of end-user participation. The
client must be involved in all stages of the lifecycle of the project. More than one GIS project has been
completed and delivered to the client, only to discover that the final product was neither what the client
envisioned nor what the client wanted. Likewise, the end-user, which may or may not be the client, is
the most important participant in the long-term survival of the project. The end-user must participate in
all stages of project development. The creation of a wonderful GIS tool will most likely go unused if the
end-user can find a better and/or more cost-efficient solution to their needs elsewhere.
As a project manager, you will find that there are many tools and techniques that will assist your efforts.
While some of these are packaged in a geographic information system (GIS), many are not. Others are
mere concepts that managers must be mindful of when overseeing large projects with a multitude of
tasks, team members, clients, and end-users. This section outlines a sampling of these tools and
techniques, although their implementation is dependent on the individual project, scope, and
requirements that arise therein. Although these topics could be sprinkled throughout the preceding
chapters, they are not concepts whose mastery is typically required of entry-level GIS analysts or
technicians. Rather, they constitute a suite of skills and techniques that are often applied to a project
after the basic GIS work has been completed. In this sense, this section is used as a platform on which to
present novice GIS users with a sense of future pathways they may be led down, as well as providing
hints to other potential areas of study that will complement their nascent GIS knowledge base.
Scheduling
One of the most difficult and dread-inducing components of project management for many is the need
to oversee a large and diverse group of team members. While this text does not cover tips for getting
along with others (for this, you may want to peruse any of the many available
psychology/sociology texts), ensuring that each project member is on task and up to date is an excellent
way to reduce potential problems associated with a complex project. To achieve this, there are several
tools available to track project schedules and goal completions.
The Gantt chart (named after its creator, Henry Gantt) is a bar chart that is used specifically for tracking
tasks throughout the project lifecycle. Additionally, Gantt charts show the dependencies of interrelated
tasks and focus on the start and completion dates for each specific task. Gantt charts will typically
represent the estimated task completion time in one color and the actual time to completion in a
second color (Figure 10.2 "Gantt Chart"). This color coding allows project members to rapidly assess the
project progress and identify areas of concern in a timely fashion.
PERT (Program Evaluation and Review Technique) charts are similar to Gantt charts in that they are both
used to coordinate task completion for a given project (Figure 10.3 "PERT Chart"). PERT charts focus
more on the events of a project than on the start and completion dates as seen with the Gantt charts.
This methodology is more often used with very large projects where adherence to strict time guidelines
is more important than monetary considerations. PERT charts include the identification of the project’s
critical path. After estimating the best- and worst-case scenarios regarding the time to finish all tasks, the
critical path outlines the sequence of events that results in the longest potential duration for the
project. Delays to any of the critical path tasks will result in a net delay to project completion and
therefore must be closely monitored by the project manager.
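As a minimal sketch of the underlying idea (the task names, durations, and dependencies below are hypothetical), the length of the critical path can be found by computing, for every task, the earliest time it can finish given its predecessors; the project duration is the largest such value.

from functools import lru_cache

durations = {"A": 3, "B": 5, "C": 2, "D": 4}                   # task -> days
depends_on = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

@lru_cache(maxsize=None)
def earliest_finish(task):
    # A task can finish only after all of its predecessors have finished.
    start = max((earliest_finish(t) for t in depends_on[task]), default=0)
    return start + durations[task]

project_length = max(earliest_finish(t) for t in durations)
print(project_length)   # 12 days, set by the critical path A -> B -> D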
There are some advantages and disadvantages to both the Gantt and PERT chart types. Gantt charts are
preferred when working with small, linear projects (with less than thirty or so tasks, each of which
occurs sequentially). Larger projects (1) will not fit onto a single Gantt display, making them more
difficult to visualize, and (2) quickly become too complex for the information therein to be related
effectively. Gantt charts can also be problematic because they require a strong sense of the entire
project’s timing before the first task has even been committed to the page. Also, Gantt charts don’t take
correlations between separate tasks into account. Finally, any change to the scheduling of the tasks in a
Gantt chart results in having to recreate the entire schedule, which can be a time-consuming and mind-
numbing experience.
PERT charts also suffer from some drawbacks. For example, the time to completion for each individual
task is not as clear as it is with the Gantt chart. Also, large projects can become very complex and span
multiple pages. Because neither method is perfect, project managers will often use Gantt and PERT
charts simultaneously to incorporate the benefits of each methodology into their project.
Working with CAD Data
While a GIS commands a large swath of the computer-generated mapping market share, it is not the
only cartographic player in town. GIS, as you now hopefully understand, is primarily a database-driven
mapping solution. Computer-aided design (CAD), on the other hand, is a graphics-based mapping
solution adopted by many cartographers, engineers in particular. Historically speaking, points, lines, and
polygons in a CAD system do not link to attributes but are mere drawings representing some reality.
CAD software, however, has recently begun to incorporate “smart” features whereby attribute
information is explicitly linked to the spatial representations.
CAD is typically used on many projects related to surveying and civil engineering work. For example,
creating a cadastral map for a housing development is a complex matter with a fine scale of exactitude
required to ensure, for example, that all electrical, sewer, transportation, and gas lines meet at precise
locales (Figure 10.4 "CAD Drawing of a Conceptual Land Development Project"). An error of inches, in
either the vertical or horizontal dimension, could result in a need for a major plan redesign that may
cost the client an inordinate amount of time and money. Too many of these types of errors, and you and
your engineer may soon be looking for a new job.
Regardless, the CAD drawing used to create these development plans is usually only concerned with the
local information in and around the project site that directly affects the construction of the housing
units, such as local elevation, soil/substrates, land-use/land-cover types, surface water flows, and
groundwater resources. Therefore, local coordinate systems are typically employed by the civil engineer
whereby the origin coordinate (the 0, 0 point) is based off of some nearby landmark such as a manhole,
fire hydrant, stake, or some other survey control point. While this is acceptable for engineers, the GIS
user typically is concerned not only with local phenomena but also with tying the project into a larger
world.
For example, if a development project impacts a natural watercourse in the state of California, agencies
such as the US Army Corps of Engineers (a nationwide government agency), California Department of
Fish and Game (a statewide government agency), and the Regional Water Quality Control Board (a local
government agency) will each exert some regulatory requirements over the developer. These agencies
will want to know where the watercourse originates, where it flows to, where within the length of the
watercourse the development project occurs, and what percentage of the watercourse will be impacted.
These concerns can only be addressed by looking at the project in the larger context of the surrounding
watershed(s) within which the project occurs. To accomplish this, external, standardized GIS datasets
must be brought to bear on the project (e.g., national river reaches, stream flow and rain gauges,
habitat maps, national soil surveys, and regional land-use/land-cover maps). These datasets will
normally be georeferenced to some global standard and therefore will not automatically overlay with
the engineer’s local CAD data.
As project manager, it will be your team’s duty to import the CAD data (typically DWG, DGN, or DXF file
format) and align it exactly with the other, georeferenced GIS data layers. While this has not been an
easy task historically, sophisticated tools are being developed by both CAD and GIS software packages to
ensure that they “play nicely” with each other. For example, ESRI’s ArcGIS software package contains a
“Georeferencing” toolbar that allows users to shift, pan, resize, rotate, and add control points to assist
in the realignment of CAD data.
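Conceptually, that realignment amounts to a shift-rotate-scale (similarity) transformation computed from control points with known coordinates in both systems. The Python sketch below illustrates only the idea under that assumption (the control-point coordinates are hypothetical); in practice, the georeferencing tools of the GIS or CAD package handle this for you.

import math

def similarity_transform(src_a, src_b, dst_a, dst_b):
    # Build a function mapping local CAD (x, y) to georeferenced (X, Y)
    # from two control points known in both coordinate systems.
    sx, sy = src_b[0] - src_a[0], src_b[1] - src_a[1]
    dx, dy = dst_b[0] - dst_a[0], dst_b[1] - dst_a[1]
    scale = math.hypot(dx, dy) / math.hypot(sx, sy)
    rotation = math.atan2(dy, dx) - math.atan2(sy, sx)

    def transform(x, y):
        # Rotate and scale about the first control point, then translate.
        rx, ry = x - src_a[0], y - src_a[1]
        X = dst_a[0] + scale * (rx * math.cos(rotation) - ry * math.sin(rotation))
        Y = dst_a[1] + scale * (rx * math.sin(rotation) + ry * math.cos(rotation))
        return X, Y

    return transform

# Two control points: local CAD coordinates matched to UTM coordinates (hypothetical).
to_utm = similarity_transform((0, 0), (100, 0), (455000, 3768000), (455100, 3768000))
print(to_utm(50, 25))   # the CAD point (50, 25) expressed in UTM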
Application Development
As project manager, you may discover that the GIS software package employed by your workgroup is
missing some basic functionality that would greatly enhance the productivity of your team. In these
cases, it may be worthwhile to create your own GIS application(s). GIS applications are either stand-
alone GIS software packages or customizations of a preexisting GIS software package that are made to
meet some specific project need. These applications can range from simple (e.g., apply a standard
symbol/color set and text guidelines to mapped features) to complex (e.g., sort layers, select features
based on a predefined set of rules, perform a spatial analysis, and output a hard-copy map).
Some of the more simple applications can be created by using the canned tool sets and functionality
provided in the GIS software. For example, ESRI’s ArcGIS software package includes a visual modeling
environment called ModelBuilder that allows users with no knowledge of programming languages to create a series of
automated tasks, also called workflows, which can be chained together and executed multiple times to
reduce the redundancy associated with many types of GIS analyses. The more complex applications will
most likely require the use of the GIS software’s native macro language or to write original code using
some compatible programming language. To return to the example of ESRI products, ArcGIS provides
the ability to develop and incorporate user-written programs, called scripts, into the standard platform.
These scripts can be written in the Python, VBScript, JScript, and Perl programming languages.
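For instance, a user-written Python script might chain two standard geoprocessing tools into a small workflow. The sketch below assumes an ArcGIS installation (the arcpy site package) and uses hypothetical workspace and dataset names; it is only an example of the kind of automation described above.

import arcpy

arcpy.env.workspace = "C:/project/data"      # hypothetical workspace
arcpy.env.overwriteOutput = True

# Step 1: buffer a streams layer by 1,000 feet.
arcpy.Buffer_analysis("streams.shp", "streams_buffer.shp", "1000 Feet")

# Step 2: clip a parcels layer to the buffered zone.
arcpy.Clip_analysis("parcels.shp", "streams_buffer.shp", "parcels_near_streams.shp")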
While you may want to create a GIS application from the ground up to meet your project needs, there
are many that have already been developed. These pre-written applications, many of which are open
source, may be employed by your project team to reduce the time, money, and headache associated
with such an effort. A sampling of the open-source GIS applications written for the C-family of
programming languages is as follows (Ramsey 2007):Ramsey, P. 2007. “The State of Open Source GIS.”
Refractions Research. http://www.refractions.net/expertise/whitepapers/opensourcesurvey/survey-open-source-2007-12.pdf.
GIS applications, however, are not always created from scratch. Many of them incorporate open-source
shared libraries that perform functions such as format support, geoprocessing, and reprojection of
coordinate systems. A sampling of these libraries is as follows:
While the C-based applications and libraries noted earlier are common due to their extensive time in
development, newer language families are supported as well. For example, Java has been used to
develop unique applications (e.g., gvSIG, OpenMap, uDig, Geoserver, JUMP, and DeeGree) from its
libraries (GeoAPI, WKB4J, GeoTools, and JTS Topology Suite), while .Net applications (e.g., MapWindow,
WorldWind, SharpMap) are a new but powerful application option that support their own libraries
(Proj.Net, NTS) as well as the C-based libraries.
Map Series
A project manager will often be required to produce paper and/or digital maps of the project site. These
maps will typically include standard information such as a title, north arrow, scale bar, corporate contact
information, data source, and so forth. This is simple if the site is small enough that the pertinent
mapped features can be resolved on a single map. However, problems arise if the site is exceedingly
large, follows a linear pathway (e.g., highway improvement projects), or is composed of distant,
noncontiguous site locales. In these cases, the manager will need to create a series of easily referenced
and reproduced maps that are at the exact same scale, have minimal overlap, and maintain consistent
collar material throughout.
To accomplish this task, a map series can be employed to create standardized maps from the GIS (e.g.,
“DS Map Book” for ArcGIS 9; “Data Driven Pages” for ArcGIS 10). A map series is essentially a multipage
document created by dividing the overall data frame into unique tiles based on a user-defined index
grid. Figure 10.5 "Project Site Tiled into an Output Series" shows an example of a map series that divides
a project site into a grid of similar tiles. Figure 10.6 "Output from a Map Series" shows the standardized
maps produced when that series is printed. While these maps can certainly be created without the use
of a map series generator, this functionality greatly assists in the organization and display of projects whose extents cannot be represented within a single map.
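As a sketch only, exporting such a series from ArcGIS 10 with Python (arcpy.mapping) might look like the following, assuming a map document whose Data Driven Pages have already been set up on an index grid; the file paths are hypothetical.

import arcpy

# Hypothetical map document with Data Driven Pages enabled on an index layer.
mxd = arcpy.mapping.MapDocument(r"C:\GISproject\site_mapbook.mxd")

# Export every tile of the series to one multipage PDF at a consistent scale.
mxd.dataDrivenPages.exportToPDF(r"C:\GISproject\site_mapbook.pdf", "ALL")

del mxd    # release the map document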
Grid-to-Ground Transformations
Project managers must be mindful of the transition from in-program mapped units to real-world
locations. As discussed in Chapter 3 "Data, Information, and Where to Find Them", Section 3.2 "Data
about Data", transforming the three-dimensional earth to two dimensions necessarily results in both
accuracy and precision errors. While projects that cover a small areal extent may not noticeably suffer
from this error, projects that cover a large areal extent could run into substantial problems.
When surveyors measure the angles and distances of features on the earth for input into a GIS, they are
taking “ground” measurements. However, spatial datasets in a GIS are based on a predefined coordinate
system, referred to as “grid” measurements. In the case of angles, ground measurements are taken
relative to some north standard such as true north, grid north, or magnetic north. Grid measurements
are always relative to the coordinate system's grid north. Therefore, ground measurements may well need to be rotated so that ground north and grid north align correctly.
In the case of distances, two sources of error may be present: (1) scale error and (2) elevation error.
Scale error refers to the phenomenon whereby points measured on the three-dimensional earth (i.e.,
ground measurement) must first be translated onto the coordinate system’s ellipsoid (i.e., mean sea
level), and then must be translated to the two-dimensional grid plane (Figure 10.7 "Grid-to-Ground
Transformation"). Basically, scale error is associated with the move from three to two dimensions and is
remedied by applying a scale factor (SF) to any measurements made to the dataset.
In addition to scale error, elevation error becomes increasingly pronounced as the project site’s
elevation begins to rise. Consider Figure 10.8 "Grid versus Ground Measurements", where a line
measured as 1,000 feet at altitude must first be scaled down to fit the earth’s ellipsoid measurement,
then scaled again to fit the coordinate system’s grid plane. Each such transition requires compensation,
referred to as the elevation factor (EF). The SF and EF are often combined into a single combination
factor (CF) that is automatically applied to any measurements taken from the GIS.
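As a simple worked example (with made-up factor values), the combined factor converts a measured ground distance to a grid distance as follows.

# Hypothetical scale and elevation factors for a project site.
scale_factor = 0.99992       # SF: accounts for projecting the ellipsoid onto the grid plane
elevation_factor = 0.99976   # EF: accounts for reducing ground distances to the ellipsoid

combined_factor = scale_factor * elevation_factor   # CF = SF * EF

ground_distance_ft = 1000.0
grid_distance_ft = ground_distance_ft * combined_factor
print(round(grid_distance_ft, 3))   # about 999.68 feet

Going the other way, a grid distance is divided by the combined factor to recover the corresponding ground distance.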
In addition to EF and SF errors, care must be taken when surveying areas greater than 5 miles in length.
At these distances, slight errors will begin to compound and may create noticeable discrepancies. In
particular, projects whose length crosses over coordinate system zones (e.g., Universal Transverse Mercator [UTM] zones or State Plane zones) are likely to suffer from unacceptable grid-to-ground errors.
While the tools and techniques outlined in this section may be considered beyond the scope of an
introductory text on GISs, these pages represent some of the concerns that will arise during your tenure
as a GIS project manager. Although you will not need a comprehensive understanding of these issues for
your first GIS-related jobs, it is important that you understand that becoming a competent GIS user will
require a wide-ranging skill set, both technically and interpersonally.
UNIT 10
Introduction to Geostatistics
What is interpolation?
• Three types:
1. Resampling of raster cell size
2. Transforming a continuous surface from one data model to another (e.g. TIN to raster or
raster to vector).
3. Creating a surface based on a sample of values within the domain.
Requirements of interpolation
• Interpolation only works where values are spatially dependent, that is, where values at nearby points tend to be more similar than values at distant points
• Where values across a landscape are geographically independent of one another, interpolation does not work
Interpolation examples
• Elevation (figure source: Lubos Mitas and Helena Mitasova, University of Illinois)
Sampling example
• Imagine an elevation cross section: if each dashed line in the figure represented a sample point (in 1-D), this wide spacing would miss major local sources of variation, like the gorge
• Our interpolated surface (represented in 1-D by the blue line) would smooth over those local features
• With a denser set of sample points, our interpolated surface is much closer to reality at the local level, but we pay for this in the form of higher data-gathering cost
Sampling Approaches
• Often a regular gridded sampling strategy is appropriate and can eliminate sampling biases
• Sometimes, though, it can introduce biases if the grid pattern correlates in frequency with
something in the landscape, such as trees in a plantation or irrigation lines
• Random sampling can avoid this but introduces other problems including difficulty in finding
sample points and uneven distribution of points, leading to geographic gaps.
• This depends partially on the size of the support, or sampling unit
• An intermediate approach is the stratified random sample
• Create geographic or non-geographic subpopulations (strata), from each of which a random sample is taken
• Proportional or equal-probability SRS: enforce a certain sampling rate, π_hj = n_h / N_h, for each stratum h and observation j
• Simple SRS: enforce a certain sample size n_h in each stratum
• Disproportionate SRS: π_hj varies such that certain strata are oversampled and others undersampled
• DSRS is advantageous when subpopulation variances are unequal, which is frequently the case when stratum sizes are considerably different. In DSRS we sample the strata with higher variance at a higher rate. We may also use this when we have an underrepresented subpopulation that would have too few observations to model if sampled with simple SRS (SSRS).
• Proportional samples are self-weighting because the sampling rates are the same for each stratum
• The other two have unequal sampling probabilities (unless a simple SRS happens to have equal N_h) and may require weighting
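A minimal Python sketch of a proportional (self-weighting) stratified random sample is shown here; the strata, population sizes, and 10 percent sampling rate are invented for illustration.

import random

# Hypothetical sampling frame: point IDs grouped into three elevation strata.
population = {
    "low":  list(range(0, 500)),     # N_low  = 500
    "mid":  list(range(500, 800)),   # N_mid  = 300
    "high": list(range(800, 900)),   # N_high = 100
}

rate = 0.10   # proportional rate: pi_hj = n_h / N_h is the same in every stratum

random.seed(42)
sample = {}
for stratum, units in population.items():
    n_h = int(round(rate * len(units)))          # proportional allocation
    sample[stratum] = random.sample(units, n_h)  # simple random sample within the stratum
    print(stratum, "sampled", n_h, "of", len(units))

A disproportionate version would simply assign a different rate (or n_h) to each stratum, in which case the observations would need weighting during analysis.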
• When the stratifying unit is geographical (e.g. county, soil polygon, forest stand), this is called a
cluster sample.
• In a one-stage cluster sample (OSCS), a set of geographic units is sampled and all observations within those units are used: obviously this does not work for interpolation
• More relevant is a two stage cluster sample (TSCS) in which we take a sample of cluster units
and then a subsample of the population of each cluster unit.
• In this type of sample, the variance has two components: that between clusters and that between observations within clusters
• The number of samples we want within each zone depends on the statistical certainty with
which we want to generate our surface
• Do we want to be 95% certain that a given pixel is classified right, or 90% or 80%?
• Our desired confidence level will determine the number of samples we need per stratum
• This is a tradeoff between cost and statistical certainty
• Think of other examples where you could stratify….
• A common problem with sampling points for interpolation is what is not being sampled?
• Very frequently people leave out sample points that are hard to get to or hard to collect data at
• This creates sampling biases and regions whose interpolated values are essentially meaningless
Sampling example
• Say we want to make an average precipitation layer and we find that in our study zone
precipitation is highly spatially variable within 10 miles of the ocean
• We’d use a coastline layer to help us design the sample.
• We’d have a high density of sampling points within 10 miles of the ocean and a much lower density in the inland zones
• Say we were looking at an inland area, far from any ocean, and we decided that precipitation
varied with elevation. How would we set up our sampling design?
• In this case, flat areas would need fewer sample points, while areas of rough topography would
need more
• In our sampling design we would set up zones, or strata, corresponding to different elevation
zones and we would make sure that we get a certain minimum number of samples within each
of those zones
• This ensures we get a representative sample across, in this case, elevation
Sampling
• The number of zones we use will determine how representative our sample is; if zones are big
and broad, we do not ensure that all elevation ranges are represented
• Sampling strategy for interpolation depends on the scale at which you are working and the scale
dependency of the phenomenon you are studying
• In many cases interpolation will work to pick up regional trends but lose the local variation in
the process
• The density of sample points must be chosen to reflect the scale of the phenomenon you are
measuring.
Scale dependency
• If you have a high density of sample points, you will capture local variation, which is
appropriate for large-scale (small-area) studies
• If you have low density of sample points, you will lose sensitivity of local variation and
capture only the regional variation; this is more appropriate for small-scale (large-area)
studies
• In ArcGIS, to interpolate:
• Create or add a point shapefile with some attribute that will be used as a Z value
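As an illustrative sketch only (not a full recipe), an IDW surface could be generated from such a point layer with the Spatial Analyst extension roughly as follows; the shapefile name, Z field, cell size, and power are hypothetical.

import arcpy
from arcpy.sa import Idw, RadiusVariable

arcpy.CheckOutExtension("Spatial")          # requires the Spatial Analyst license
arcpy.env.workspace = r"C:\GISproject"      # hypothetical workspace

# Point shapefile with an attribute ("ELEV") to use as the Z value.
surface = Idw("sample_points.shp", "ELEV", 30, 2, RadiusVariable(12))
surface.save(r"C:\GISproject\elev_idw.tif")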
UNIT 11
GEOSTATISTICAL PREDICTION
The IDW prediction at an unmeasured location x_0 is a weighted average of the n surrounding measured values, where the weights λ_i are given by some weighting function and sum to 1:

\hat{z}(x_0) = \frac{\sum_{i=1}^{n} z(x_i)\, d_{i0}^{-p}}{\sum_{i=1}^{n} d_{i0}^{-p}}, \qquad \sum_{i=1}^{n} \lambda_i = 1

where d_{i0} is the distance from sample point x_i to the prediction location x_0 and p is a user-chosen power.
IDW-How it works
• There are two IDW method options, variable and fixed radius:
• 1. Variable (or nearest neighbor): User defines how many neighbor points are going to
be used to define value for each cell
• 2. Fixed Radius: User defines a radius within which every point will be used to define the
value for each cell
• Can also define “barriers”: the user chooses whether to exclude certain points from being used in the calculation of a new value for a cell, even if the point is near. For example, one wouldn't use an elevation point on one side of a ridge to create an elevation value on the other side of the ridge. The user chooses a line theme to represent the barrier
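To show how the distance weighting works, here is a small NumPy sketch of the variable (nearest-neighbor) option; the coordinates, values, number of neighbors, and power are invented.

import numpy as np

def idw_knn(xy, z, target, k=3, p=2.0):
    """Predict a value at `target` from the k nearest samples using IDW weights."""
    d = np.linalg.norm(xy - target, axis=1)    # distances from target to every sample
    nearest = np.argsort(d)[:k]                # variable option: use the k nearest points
    d_k, z_k = d[nearest], z[nearest]
    if np.any(d_k == 0):                       # target coincides with a sample point
        return z_k[d_k == 0][0]
    w = d_k ** -p                              # weights fall off with distance^p
    return np.sum(w * z_k) / np.sum(w)

# Hypothetical elevation samples (x, y) and values.
xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
z = np.array([100.0, 120.0, 110.0, 150.0])
print(idw_knn(xy, z, np.array([2.0, 3.0])))

The fixed-radius option would instead keep every sample whose distance is within a user-defined radius rather than a fixed number of neighbors.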
• The optimal P value is the one where the Root Mean Squared Prediction Error (RMSPE) is lowest, as in the accompanying graph
• To determine this, we need a test (validation) dataset giving Z values at x,y locations that are not included in the prediction data, and we then look for discrepancies between actual and predicted values. We keep changing the P value until we get the minimum level of error. Without this, we are just guessing.
Optimizing P value
The blue line indicates the degree of spatial autocorrelation (required for interpolation); the closer it lies to the dashed (1:1) line, the more perfectly autocorrelated the data are.
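A minimal sketch of that search for the best P, reusing the hypothetical idw_knn function above with an invented validation set, might look like this.

import numpy as np

# Hypothetical prediction (training) samples and a held-out validation set.
train_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])
train_z = np.array([100.0, 120.0, 110.0, 150.0, 125.0])
valid_xy = np.array([[2.0, 2.0], [8.0, 7.0]])
valid_z = np.array([106.0, 138.0])

best_p, best_rmspe = None, np.inf
for p in [1.0, 1.5, 2.0, 2.5, 3.0]:            # candidate power values
    preds = np.array([idw_knn(train_xy, train_z, pt, k=4, p=p) for pt in valid_xy])
    rmspe = np.sqrt(np.mean((preds - valid_z) ** 2))   # root mean squared prediction error
    if rmspe < best_rmspe:
        best_p, best_rmspe = p, rmspe
print("best P:", best_p, "RMSPE:", round(best_rmspe, 2))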
Spline Method
• The user specifies the number of points around a cell that will be used to fit a polynomial function to a curve
Kriging Method
• Like IDW interpolation, Kriging forms weights from surrounding measured values to predict
values at unmeasured locations. As with IDW interpolation, the closest measured values usually
have the most influence. However, the kriging weights for the surrounding measured points
are more sophisticated than those of IDW. IDW uses a simple algorithm based on distance, but
kriging weights come from a semivariogram that was developed by looking at the spatial
structure of the data. To create a continuous surface or map of the phenomenon, predictions
are made for locations in the study area based on the semivariogram and the spatial
arrangement of measured values that are nearby.
--from ESRI Help
• In other words, kriging replaces the arbitrarily chosen p of IDW with a probabilistically based weighting function that models the spatial dependence of the data.
• The structure of the spatial dependence is quantified in the semi-variogram
• Semivariograms measure the strength of statistical correlation as a function of distance; they
quantify spatial autocorrelation
• Kriging associates some probability with each prediction, hence it provides not just a surface,
but some measure of the accuracy of that surface
• Kriging equations are estimated through least squares
• Kriging has deterministic, stochastic, and random error components: Z(s) = μ(s) + ε’(s) + ε’’(s), where
μ(s) = deterministic component
ε’(s) = stochastic but spatially dependent component
ε’’(s) = spatially independent residual error
• Kriging assumes that the spatial variation in the variable is too irregular to be modeled by a simple smooth function and is better modeled with a stochastic surface
• Interpolation parameters (e.g., the weights) are chosen to optimize the fitting function
• Hence, foundation of Kriging is notion of spatial autocorrelation, or tendency of values of
entities closer in space to be related.
• This is a violation of classical statistical models, which assume that observations are independent.
• Autocorrelation can be assessed using a semivariogram, which plots the difference in pair values
(variance) against their distances.
Semivariance
• Semivariance(distance h) = 0.5 * average[(value at location i - value at location j)^2], or, in equation form,

\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \{ z(x_i) - z(x_i + h) \}^2

where the n point pairs are separated by the lag distance h.
• Based on the scatter of points, the computer (Geostatistical Analyst) fits a curve through those points
• Its inverse is the covariance function, which shows how correlation declines over space
Variogram
• Plots semi-variance against distance between points
• Is binned to simplify
• Can be binned based on just distance (top) or distance and direction (bottom)
• Where autocorrelation exists, the semivariance should have slope
• Look at the variogram to find where the slope levels off
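The binning described above can be sketched in a few lines of NumPy; the sample points, values, bin width, and number of bins are invented.

import numpy as np

def empirical_semivariogram(xy, z, bin_width=5.0, n_bins=6):
    """Distance-binned semivariance: 0.5 * mean squared difference of pairs in each lag bin."""
    i, j = np.triu_indices(len(z), k=1)                 # all unique point pairs
    d = np.linalg.norm(xy[i] - xy[j], axis=1)           # pair separation distances
    sqdiff = (z[i] - z[j]) ** 2
    lags, gammas = [], []
    for b in range(n_bins):
        mask = (d >= b * bin_width) & (d < (b + 1) * bin_width)
        if mask.any():
            lags.append(d[mask].mean())
            gammas.append(0.5 * sqdiff[mask].mean())
    return np.array(lags), np.array(gammas)

# Hypothetical, spatially dependent sample values.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 30, size=(50, 2))
z = 0.5 * xy[:, 0] + rng.normal(0, 1, 50)
lags, gammas = empirical_semivariogram(xy, z)
print(np.round(lags, 1))
print(np.round(gammas, 2))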
Steps
• Variogram cloud; can use bins to make cloud plot of all points or box plot of points
• Empirical variogram: choose bins and lags
• Model variogram: fit function through empirical variogram
– Functional forms?
Functional Forms
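For reference, two commonly used model variogram forms are the exponential and spherical models, written here with nugget c_0, partial sill c, and range a:

\gamma_{exp}(h) = c_0 + c\left(1 - e^{-h/a}\right)

\gamma_{sph}(h) = c_0 + c\left(\frac{3h}{2a} - \frac{h^3}{2a^3}\right) \text{ for } h \le a, \qquad \gamma_{sph}(h) = c_0 + c \text{ for } h > a

The Gaussian model is another common choice.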
Kriging Method
• We can then use a scatter plot of predicted versus actual values to see the extent to which our
model actually predicts the values
• If the blue line and the points lie along the 1:1 line this indicates that the kriging model predicts
the data well
• The fitted variogram results in a series of matrices and vectors that are used in weighting and
locally solving the kriging equation.
• Basically, at this point, it is similar to other interpolation methods in that we are taking a weighted moving average, but the weights (λ) are based on statistically derived autocorrelation measures.
• The λs are chosen so that the estimate ẑ(x_0) is unbiased and its estimated variance is less than that of any other possible linear combination of the variables.
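As a rough sketch (not ESRI's implementation), the ordinary kriging weights for one prediction location can be found by solving a small linear system built from a fitted variogram model; the exponential model parameters and sample data below are invented.

import numpy as np

def gamma_exp(h, nugget=0.0, sill=1.0, a=10.0):
    """Hypothetical fitted exponential variogram model."""
    return nugget + sill * (1.0 - np.exp(-h / a))

def ordinary_kriging_point(xy, z, x0, **vario):
    """Predict z at x0 using ordinary kriging weights derived from the variogram."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)   # pairwise distances
    A = np.ones((n + 1, n + 1))            # kriging system [Gamma 1; 1' 0]
    A[:n, :n] = gamma_exp(d, **vario)
    np.fill_diagonal(A[:n, :n], 0.0)       # gamma(0) = 0 by definition
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma_exp(np.linalg.norm(xy - x0, axis=1), **vario)
    lam = np.linalg.solve(A, b)[:n]        # weights; they sum to 1 (unbiasedness)
    return float(np.dot(lam, z))

# Hypothetical samples and prediction location.
xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])
z = np.array([100.0, 120.0, 110.0, 150.0, 125.0])
print(ordinary_kriging_point(xy, z, np.array([3.0, 4.0]), nugget=0.1, sill=25.0, a=15.0))

The weights here are the λs referred to above; applying the same machinery to a grid of prediction locations produces the prediction surface.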
Kriging produces four types of prediction maps:
• Prediction Map: predicted values
• Probability Map: probability that the value exceeds some threshold x
• Prediction Standard Error Map: indicates the fit (uncertainty) of the model
• Quantile Map: probability that the value exceeds a certain quantile
Example
• Here are some sample elevation points from which surfaces were derived using the three
methods
Example: IDW
• Done with P =2. Notice how it is not as smooth as Spline. This is because of the weighting
function introduced through P
Example: Spline
• Note how smooth the curves of the terrain are; this is because spline fits a simple polynomial equation through the points
Example: Kriging
• This one is somewhere in between, because it fits an equation through the points but weights it based on probabilities
Reference:
Troy, Austin. Fundamentals of GIS. http://www.uvm.edu/rsenr/gradgis/lectures/