Basic Mapping Principles For Visualizing Cancer Data Using Geographic Information Systems (GIS)
Abstract: Maps and other data graphics may play a role in generating ideas and hypotheses at the
beginning of a project. They are useful as part of analyses for evaluating model results and
then at the end of a project when researchers present their results and conclusions to
varied audiences, such as their local research group, decision makers, or a concerned
public. Cancer researchers are gaining skill with geographic information system (GIS)
mapping as one of their many tools and are broadening the symbolization approaches they
use for investigating and illustrating their data. A single map is one of many possible
representations of the data, so making multiple maps is often part of a complete mapping
effort. Symbol types, color choices, and data classing each affect the information revealed
by a map and are best tailored to the specific characteristics of data. Related data can be
examined in series with coordinated classing and can also be compared using multivariate
symbols that build on the basic rules of symbol design. Informative legend wording and
setting suitable map projections are also basic to skilled mapmaking.
(Am J Prev Med 2006;30(2S):S25–S36) © 2006 American Journal of Preventive Medicine
From the Department of Geography, Pennsylvania State University, University Park, Pennsylvania
Address correspondence and reprint requests to: Cynthia A. Brewer, PhD, Department of Geography,
Pennsylvania State University, 302 Walker Building, University Park PA, 16802-5011. E-mail:
[email protected].
A geographic information system (GIS) allows epidemiologists and cancer researchers to
investigate spatial patterns within their data and understand relationships between cancer and
other health, socioeconomic, and environmental variables. High-quality maps also allow
researchers to present a compelling case to others who are interested in their work. GIS is an
additional tool in the exploration, analysis, and communication of cancer data, and knowledge of
the basic principles for representing data can help cancer researchers make the most of GIS and
the opportunities for insight it offers.
This article is structured in three sections: mapping methods, mapping multiple variables, and
map finishing. Two common symbol types, choropleth mapping and proportional symbols, are
featured, and decisions involved in making effective use of these symbols are summarized.
Supporting figures present maps of prostate cancer data to correspond with the topic of this
special issue of the American Journal of Preventive Medicine. These maps were produced in ArcGIS
(ESRI, Redlands CA, version 9) with no further augmentation in illustration software. The data
and geography used for these maps are from the National Cancer Institute (NCI) Cancer Mortality
Maps and Graphs Website.1 [...] the counts, rates per 100,000 person-years (age adjusted using
1970 populations), and upper and lower bounds of 95% confidence interval (CI) offered on the site
for black and white races. The map area is cropped to produce compact demonstration figures that
can be compared in series. The data are freely available through the NCI website to other
mapmakers who would like to work with the methods described.
The basic overview of thematic mapping offered in this article has wide application in cancer
and epidemiologic mapping. Other tools in GIS are also of use to epidemiologists, such as address
geocoding and network analysis. The links between spatial statistics software tools and GIS are
also improving.2 The focus of this short article, however, is limited to symbolizing statistical
data, which is a common use of GIS. Basic criteria for choosing symbols to map derived values,
significance levels, model results, and smoothed rates are the same as for simpler measures such
as crude rates. Likewise, multivariate maps that combine or overlay model results with original
data or related variables can illuminate relationships between them by combining symbolization
approaches.
Cartographers use visual tools, and epidemiologists use statistical tools to investigate their
data. This is an oversimplification, to be sure, but it seems to be a core difference in approach
between the two fields and each could be enhanced by further use of the other’s methods. The
tools cartographers use to improve their visual representations of data can complement epidemiolo-
Color Symbols
The main goal in choosing colors for choropleth maps
is to order lightness so it parallels ordering in the data.
The simple case is light-to-dark color for low-to-high
values with a constant hue (blue is used in Figure 2a).
Adding hue variation can help make it easier to see
differences between color symbols. A lightness se-
quence combined with a progression through adjacent
hues produces some of the best sequential choropleth
color schemes (for example, yellow–green–blue are
adjacent in the ordering of hues through the spectrum;
Figure 2b). These hue and lightness sequences are
more challenging to design, and useful series of ready-
made sequential schemes are offered online through
ColorBrewer.org (Figure 3) to assist mapmakers who
are not experienced with color specification.5,6
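The lightness-ordering rule for sequential schemes can be checked numerically. The following is a minimal sketch, not a method from this article: it uses relative luminance computed from sRGB hex values as a rough proxy for perceived lightness, and the hex codes are illustrative assumptions (a yellow–green–blue sequence and a pure-hue rainbow), as are the function names.

```python
def relative_luminance(hex_color):
    """Approximate relative luminance (0 = black, 1 = white) of an sRGB hex color."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding before weighting the channels.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_lightness_ordered(colors):
    """True if luminance moves strictly in one direction through the scheme."""
    lum = [relative_luminance(c) for c in colors]
    pairs = list(zip(lum, lum[1:]))
    return all(a > b for a, b in pairs) or all(a < b for a, b in pairs)


# Illustrative (assumed) schemes: light-to-dark yellow-green-blue, and a rainbow.
sequential = ["#ffffcc", "#a1dab4", "#41b6c4", "#2c7fb8", "#253494"]
rainbow = ["#ff0000", "#ffff00", "#00ff00", "#0000ff"]

print(is_lightness_ordered(sequential))  # lightness falls steadily
print(is_lightness_ordered(rainbow))     # yellow breaks the ordering
```

A check like this only flags ordering problems; it does not say whether adjacent classes are far enough apart to be told apart on a map.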
Many map readers find spectral (rainbow) schemes
appealing (Figure 2c). These color schemes are not
well suited to sequential data because lightness varies
through the spectrum (yellow and cyan are often
lighter than other hues). Spectral schemes can be
adjusted to better order lightness, and the intrinsically
light yellow hue can also be used to emphasize critical
values within a data range.7,8 For these diverging
schemes (versus sequential schemes), lightness diverges
from a mid-range critical value toward two contrasting
hues. Figure 4 shows a modified spectral diverging scheme (Figure 4a) and other diverging
examples that use fewer hues (Figure 4b,c).
Figure 1. Hue symbolizes categorical difference in counts for two race groups. Pie chart symbols
are scaled to a constant size and show relative proportions of mortality for two populations:
black and white males.
Diverging data may have an obvious structure, such as positive and negative values diverging
from zero (Figure 4c). Dark red to white to dark blue is an example color scheme that parallels
this diverging structure. Data may also be presented as diverging from a calculated value such as
a national rate, threshold value in disease incidence, or median. These data might be equally
well represented using a sequential scheme, and looking at distributions using a variety of
representations may offer the most insight (compare Figure 2b to Figure 2c).
Color blindness in map readers becomes an issue when using multiple hues.9 About 8% of men and
<1% of women have one of the varied forms of red–green color vision deficiency. Color blind
people do see many hues but there are predictable groupings of hues that
will be confused with each other. The extent of color
confusions depends on the severity of a person’s color
vision deficiency. The range of hues from red through
orange, brown, yellow, and green may all look the same
or similar if they are also similar in lightness. This set of
color confusions means that some popular color
schemes, such as spectral and “stop light” (red–yellow–
green) schemes, produce maps that are difficult to read
for a substantial number of people.
Red–green combinations are not the only hues that are confused by people with common color
vision deficiencies. Other example sets of hues that can be confusing are magenta–gray–cyan and
blue–purple. Example hue pairs that work well as the anchors in diverging schemes for color blind
readers are: red–blue, red–purple, orange–blue, orange–purple, brown–blue, brown–purple,
yellow–blue, yellow–purple, yellow–gray, and blue–gray.10 The colorfulness of spectral schemes
can also be taken advantage of while still accommodating most readers’ vision impairments by
using a spectral scheme that skips the greens: dark red, orange, yellow, light blue, blue–purple,
dark purple (Figure 4a).8 ColorBrewer includes a variety of these diverging schemes with full
color specifications. The tools at Vischeck.com are also useful for correcting the appearance of
graphics to accommodate people with color vision impairments.
Figure 3. An example screen from ColorBrewer.org, an online tool offering color specifications
for each color in schemes suited for thematic maps. Color schemes are grouped into sequential,
diverging, and qualitative sets.
Classing
Data classing is another basic decision made when creating choropleth maps of data. For example,
in Figure 5a, counties with data values between 19.23 and 21.54 are grouped into one class and
represented by a green color. There are numerous methods for classing data11 and most GIS and
mapping programs offer a selection that often includes quantiles (Figure 5a), equal intervals
(Figure 5b), and a Jenks optimized method (Figure 5c). Other choices include classing by standard
deviations and minimizing differences across boundaries. There is no one correct way to class a
data set, and different methods will produce different map patterns, especially if data are
skewed or include extreme outliers.
Quantile classing assigns the same number of enumeration units to each class (it is a
generalized form of percentiles). Four quantiles (quartiles) allocate one quarter of the data
values to each class with the median at the middle break. For example, four classes of 391 units
each are shown in Figure 5a. Equal interval classing breaks the data range into equal segments
for predictable and equal class ranges (unlike the variation in quantile ranges, as seen in
Figure 5a, where the first class has a range of 19.2 deaths and the second has a range of 2.3).
The number of counties in each class varies with equal intervals (Figure 5b). Jenks methods
(called natural breaks in ArcGIS; Figure 5c) minimize variation within classes and maximize
variation between classes. With this approach, enumeration units that share a color are
statistically more similar to each other than to units in other color classes.3
Cartographers most commonly choose a Jenks method for their first look at data. In contrast,
quantile classing is the more common choice of epidemiologists, perhaps because variation in
calculated values produced by different types of standardization and age adjustment means death
rates may be usefully seen as ranked values.11 Cartographers recommend looking at a histogram
(Figure 6) or other aspatial graph of the data to assist in choosing classes.3 Generally, a sound
approach is to start with a standard classification and adjust breaks to improve the map based on
knowledge of the data and the audience. For example, a useful adjustment is to group extreme
outliers into their own class and then class the rest of the data range using a standard method.
In Figure 7a, for example, rates <12.71 and >30.64 are in separate classes and equal intervals
are applied to the remainder of the data range (compare Figure 7a to Figure 5b). Likewise, when
there are many zero values in a data set, it works well to separate them into their own class and
then class the remainder of the data set. Another adjustment strategy is to apply Jenks for good
statistical breaks and then adjust classes to include the national rate and round data values to
assist map reading by a general audience (Figure 7b).12
Watch map patterns while changing methods and adjusting breaks to check the sensitivity of the
distribution. The more classes used, the less changeable the map pattern will be with different
classing methods and adjustments. There are diminishing returns with increasing numbers of
classes, and it becomes difficult to assign colors that readers can tell apart with too many
classes (the extreme being an n-class map). Seven classes is often the most you will want to use
on a choropleth map, and an optimal number of classes can be calculated by examining diminishing
reductions in variance with increasing numbers of classes.3 A quick look at a rough proportional
symbol map can also provide an alternative understanding of the data distribution that helps you
judge how well the classed view represents the data.
Proportional Symbols
Another way to represent quantitative data, for either points or areas, is with symbols that
vary by size in proportion to data values. Symbols such as circles and
squares are usually scaled by the software in proportion
to the square root of each data value so that symbol
areas visually represent the data values. Sizes of linearly
scaled symbols, such as bars, are more accurately inter-
preted by map readers, but they soon become imprac-
tical with large data ranges. A symbol scaled by area,
such as a square, is more compact and easier to
associate with the location for which it represents data.
Proportional symbols may be placed directly at data
points, such as cities or address locations, or they may
be centered in areas. The order in which symbols are
drawn, so that smaller symbols appear above larger
ones, aids map reading.
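The square-root scaling and drawing order described above can be illustrated with a short sketch. This is an assumption-laden illustration rather than the routine any particular GIS uses: the function names, the maximum radius, and the death counts are all hypothetical.

```python
import math


def symbol_radius(value, max_value, max_radius):
    """Radius for an area-scaled circle: area, not radius, tracks the data value."""
    return max_radius * math.sqrt(value / max_value)


def draw_order(values):
    """Largest first, so that smaller symbols are drawn last and sit on top."""
    return sorted(values, reverse=True)


# Hypothetical death counts for three areas.
counts = [5, 100, 25]
largest = max(counts)

for count in draw_order(counts):
    radius = symbol_radius(count, largest, max_radius=10.0)
    print(count, round(radius, 2))
```

With this scaling, a count one quarter the size of the largest gets a circle with half the radius and therefore one quarter of the area.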
Use of proportional symbols for enumeration areas is particularly useful for count data (total
number of
Figure 6. Example histogram display in the classify window of ArcMap (ESRI, Redlands CA, v9).
Figure 9. Example of a map series with each map classed separately using quantile classing. The
maps are a time series.
and lowest rates change, with the highest rates shifting to the east by 1990–1994. In contrast,
the shared breaks of Figure 10 also make the overall increase in rates through time more obvious
(Figure 9 requires careful study of the legends for this information).
smoothing and representation of a more generalized surface. Likewise, aggregating to larger
enumeration units, through longer spans of time, and across related cancer types may improve the
meaningfulness of maps of cancer data.4
Wording
Completed maps may be missing critical information
about the calculations behind the symbols they present
or, conversely, they may have such laborious titles that
the main issue presented by the map is obscured. A
map title should present the basic topic of the map and
invite the reader to investigate further. The legend title
should provide details of the map calculations (i.e., it
should not be labeled “legend” or something so terse as
“%”). If the calculation is complex, then the clarifica-
tion is best continued in a note in small type on the
map or in associated text. Map sources, data sources,
and authorship are also in small type on the map. This
format varies with media. For example, a journal pub-
File Export
When preparing a GIS map, prepare in advance to
export it for distribution to others. It is difficult to share
a map project (such as an .MXD from ESRI ArcGIS)
directly with graphics production people, and often it
Map Projection
Another basic mapmaking issue is map projection.3 A
projection transforms the base geography from a spher-
ical model of the Earth to the flat page or screen. GIS
implements map projections by applying a series of
equations to geographic coordinates. Projections that
preserve area are suitable for most epidemiologic maps.
These are called equal area projections. The Albers
Equal Area projection is commonly used for the U.S.
(seen in all figures in this article). In addition, customized projections can be created to
suit any map scale and world area.
If the mapmaker does not attend to map projection,
software defaults usually present the mapped area
underpinned with a regular grid of latitude and longi-
tude lines, producing an inappropriate east–west
stretch and north–south compression at U.S. latitudes.
These distortions interfere with readers’ judgments of
densities and relative areas of cancer rates, which are
crucial for much epidemiologic map interpretation.
Slab-like default projections also mark a map as the
product of an amateur, calling into question competence
of data handling and other GIS decisions.
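Both the east–west stretch of an unprojected latitude–longitude grid and the effect of an equal area projection can be sketched numerically. The code below is a rough illustration under stated assumptions: it uses the spherical form of the Albers equal-area conic equations rather than the ellipsoidal form a GIS would normally apply, and the standard parallels (29.5 and 45.5), origin latitude (23), central meridian (-96), and Earth radius are assumed values often quoted for the conterminous U.S., not settings taken from this article.

```python
import math


def plate_carree_stretch(lat_deg):
    """East-west stretch relative to north-south when degrees are drawn as a square grid."""
    return 1.0 / math.cos(math.radians(lat_deg))


def albers_equal_area(lat_deg, lon_deg, lat1=29.5, lat2=45.5,
                      lat0=23.0, lon0=-96.0, radius_km=6371.0):
    """Spherical Albers equal-area conic forward projection; returns (x, y) in km."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)

    n = (math.sin(phi1) + math.sin(phi2)) / 2.0
    c = math.cos(phi1) ** 2 + 2.0 * n * math.sin(phi1)
    rho = radius_km * math.sqrt(c - 2.0 * n * math.sin(phi)) / n
    rho0 = radius_km * math.sqrt(c - 2.0 * n * math.sin(phi0)) / n
    theta = n * (lam - lam0)  # zero on the central meridian, so north is "up" there

    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


# Stretch of an unprojected grid at a latitude in the southeast U.S.
print(round(plate_carree_stretch(33.0), 3))

# A point on the central meridian and a point farther east at the same latitude.
print(albers_equal_area(33.0, -96.0))
print(albers_equal_area(33.0, -84.0))
```

Because theta is zero only on the central meridian, areas far to the east or west of it appear tilted, which is why moving the central meridian toward the mapped region brings north back to the top.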
Figure 15 shows a portion of the southeast U.S. on
three maps at the same scale with a graticule (latitude
and longitude lines) overlay. Figure 15a has no projec-
tion set, the graticule is square, and the counties and
states are distorted by being stretched east to west.
Figure 15b is projected using an Albers Equal Area
projection with settings for the entire U.S., causing this
eastern portion to be tilted, as seen in the angled
latitude lines. The third example, Figure 15c, is ad-
justed by changing the central meridian so north is up