Lecture 4 - How Not To Lie With Spatial Statistics
From the Spatial Analysis Laboratory, University of Illinois, Urbana, Illinois.
Address correspondence and reprint requests to: Luc Anselin, MA, PhD, Spatial Analysis Laboratory, University of Illinois, 333 Davenport Hall, MC 150, 607 South Mathews Avenue, Urbana, IL 61801-3671. E-mail: [email protected]

[The aim here is to] formulate some cautionary remarks related to the use of methods and software for spatial data analysis, with particular reference to empirical work dealing with cancer prevention and research. Due to length limitations, the discussion will have to be brief, incomplete, and largely nontechnical. For more comprehensive and technical reviews of some of the issues raised, see the articles by Anselin,2 Greenland,3 and Wakefield.4 In this context, I define spatial data analysis broadly as consisting of three important components: exploratory spatial data analysis (ESDA), visualization, and spatial modeling. Although the dividing lines between these areas of interest are not precise, I consider ESDA as concerned with the search for interesting “patterns,” visualization as consisting of methods to show these interesting patterns, and spatial modeling as the collection of techniques (also referred to as spatial regression analysis, spatial econometrics) to explain and predict these patterns. Recent overviews of the methodology of spatial statistics and spatial econometrics can be found in Lawson,5 Anselin et al.,6 Banerjee et al.,7 Waller and Gotway,8 and Schabenberger and Gotway.9

The focus on patterns highlights the importance of location and distance, two central concepts in spatial data analysis. Recent methodologic advances in spatial statistics, combined with the ready availability of cheap and powerful desktop geographic information systems (GIS), have brought spatial analysis within reach of many nonspecialists. The array of techniques available can be bewildering, especially because many of them are easily applied through the use of commercial off-the-shelf point-and-click software, without much guidance as to what is appropriate for the situation at hand. This is further complicated by the fact that spatial data can be represented in many different ways (e.g., as discrete spatial objects, such as counties, or as continuous surfaces, such as a risk surface). In addition, a given representation does not necessarily provide insight into the type of spatial process at hand. For example, a point could represent an event, such as the address of a person undergoing cancer screening, or a discrete object, such as the location of a noxious facility, or even a sample point designed to measure a continuous phenomenon, such as an air quality monitoring station. Although these are all points on a GIS map, they each require a distinct statistical approach, respectively referred to as point pattern analysis (events), lattice data analysis (discrete objects), and geostatistics (continuous surfaces). Methods and properties that are appropriate for one type of analysis do not readily transfer to other types of spatial processes. Unfortunately, the GIS (and spatial analysis software) remains largely ignorant about the nature of the underlying process, and simply deals with the data as “points,” thereby not preventing meaningless analyses (such as the application of geostatistical analysis to discrete lattice data).

The upshot of this situation is that care is needed in the range of activities involved in spatial data analysis, from the collection of data and the use of software to the interpretation of results and their application in policy analysis. Along the way, choices that yield different results must be made, offering the temptation to tailor the method to the desired result (to lie with statistics). I will briefly comment on a few salient points and important tradeoffs, starting with data problems, methodologic challenges, and software issues, and closing with some remarks on interpretation and policy.

Data Problems

Spatial data include the location of the observation as an essential attribute. This is either recorded in a coordinate system as an absolute location (such as latitude–longitude, or some projected x,y coordinates), or referred to as an administrative entity, such as a census tract or ZIP code zone. In practice, the geographic information about patients or hospitals, for example, is not necessarily available in such form, but is more likely recorded as a street address. The process of translating street addresses to the formal spatial location information is referred to as geocoding. Although straightforward to carry out in most commercial GIS, it is also fraught with problems such as inaccurate address information or flaws in the spatial database on street locations. This will result in errors that need to be accounted for in any spatial statistical analysis. Unfortunately, in practice, such errors do not tend to be random, but show systematic spatial variation. For example, more inaccuracies will tend to be found in