
Interactive Techniques and Exploratory Spatial Data Analysis


Regional Research Institute Working Papers Regional Research Institute

1996

Interactive Techniques and Exploratory Spatial Data Analysis


Luc Anselin

Follow this and additional works at: https://researchrepository.wvu.edu/rri_pubs

Part of the Regional Economics Commons

Digital Commons Citation


Anselin, Luc, "Interactive Techniques and Exploratory Spatial Data Analysis" (1996). Regional Research
Institute Working Papers. 200.
https://researchrepository.wvu.edu/rri_pubs/200

This Working Paper is brought to you for free and open access by the Regional Research Institute at The Research
Repository @ WVU. It has been accepted for inclusion in Regional Research Institute Working Papers by an
authorized administrator of The Research Repository @ WVU. For more information, please contact
[email protected].
17
Interactive techniques and exploratory
spatial data analysis
L ANSELIN

This chapter reviews the ideas behind interactive and exploratory spatial data analysis and
their relation to GIS. Three important aspects are considered. First, an overview is
presented of the principles behind interactive spatial data analysis, based on insights from
the use of dynamic graphics in statistics and their extension to spatial data. This is followed
by a review of spatialised exploratory data analysis (EDA) techniques, that is, ways in which
a spatial representation can be given to standard EDA tools by associating them with
particular locations or spatial subsets of the data. The third aspect covers the main ideas
behind true exploratory spatial data analysis, emphasising the concern with visualising
spatial distributions and local patterns of spatial autocorrelation. The geostatistical
perspective is considered, typically taken in the physical sciences, as well as the lattice
perspective, more familiar in the social sciences. The chapter closes with a brief discussion
of implementation issues and future directions.

1 INTRODUCTION

Recent developments in computing hardware and GIS software have made it possible to interact directly with large spatial databases and to obtain almost instantaneous results for a wide range of GIS operations. The sophistication in storage, retrieval, and display provided by the rapidly evolving GIS technology has created a demand for new tools to carry out spatial analysis in general and spatial statistical analysis in particular (see, among others, Anselin and Getis 1992; Bailey 1994; Goodchild 1987; Goodchild et al 1992; Openshaw 1991). This demand grew out of an early awareness that the implementation of ‘traditional’ spatial analysis techniques was insufficient to address the challenges faced in a GIS environment (Goodchild and Longley, Chapter 40). The latter is often characterised by vast numbers of observations (hundreds to several thousands) and ‘dirty’ data, and some go so far as to completely reject ‘traditional’ spatial analysis that is based on statistical inference (Openshaw and Alvanides, Chapter 18; Fischer, Chapter 19; Openshaw 1990, 1991). While this rather extreme viewpoint is not shared by many, it is widely recognised that many of the geographical analysis techniques of the 1960s fail to take advantage of the visualisation and data manipulation capabilities embodied in modern GIS. Specifically, most spatial statistical techniques, such as tests for spatial autocorrelation and spatial regression models, are primarily static in nature, allowing only limited interaction between the data, the models, and the analyst. In contrast, dynamic or interactive approaches to data analysis stress the user interaction with the data in a graphical environment, allowing direct manipulation in the form of instantaneous selection, deletion, rotation, and other transformations of data points to aid in the exploration of structure and the discovery of patterns (Buja et al 1996; Cleveland 1993; Cleveland and McGill 1988).

The importance of EDA to enhance the spatial analytical capabilities of GIS has become widely recognised (Anselin 1994; Anselin and Getis 1992; Bailey and Gatrell 1995; Fotheringham and Charlton 1994). The EDA paradigm for statistical analysis is based on a desire to let the data speak for themselves and to impose as little prior structure upon them as possible. Instead, the emphasis is on creative data displays and the use of simple indicators to elicit patterns and suggest hypotheses in an inductive manner, while avoiding potentially misleading impressions given by ‘outliers’ or ‘atypical’ observations (Good 1983; Tukey 1977). Since spatial data analysis is often characterised as being ‘data rich but theory poor’ (Openshaw 1991), it would seem to form an ideal area for the application of EDA. However, this is not a straightforward exercise, since the special nature of spatial data, such as the prevalence of spatial autocorrelation, may invalidate the interpretation of methods that are based on an assumption of independence, which is the rule in mainstream EDA (Anselin 1990; Anselin and Getis 1992). Hence, the need has arisen to develop specialised methods of exploratory spatial data analysis (ESDA) that take the special nature of spatial data explicitly into account (for recent reviews, see Anselin 1994; Anselin and Bao 1997; Bailey and Gatrell 1995; Cook et al 1996; Cressie 1993; Majure and Cressie 1997).

This chapter reviews the ideas behind interactive and ESDA and their relation to GIS. Many of the ESDA techniques have been developed quite recently and this remains an area of very active research. Therefore, the emphasis will be on general principles, rather than on specific techniques. The latter will only be used to illustrate the overall framework and no attempt is made to cover a comprehensive set of methods. The bulk of the chapter considers three important aspects of the integration of ESDA and interactive methods with GIS. First, an overview is presented of the principles behind interactive spatial data analysis, based on insights from the use of dynamic graphics in statistics and their extension to spatial data. This is followed by a review of spatialised EDA techniques, that is, ways in which a spatial representation can be given to standard EDA tools by associating them with particular locations or spatial subsets of the data. The third aspect covers the main ideas behind true exploratory spatial data analysis, emphasising the concern with visualising spatial distributions and local patterns of spatial autocorrelation (Getis, Chapter 16). The chapter closes with a brief discussion of implementation issues and future directions.

2 PRINCIPLES OF INTERACTIVE SPATIAL DATA ANALYSIS

The principles behind interactive spatial data analysis can be traced back to the work on dynamic graphics for data analysis in general, originated by the statistician John Tukey and a number of research groups at AT&T Bell Laboratories. An excellent review of the origins of these ideas is given in the collection of papers edited by Cleveland and McGill (1988), and early discussions of specific methods are contained in the papers by, among others, Becker et al (1987), Becker and Cleveland (1987), and Stuetzle (1987). More recent reviews of methods for the dynamic analysis of high-dimensional multivariate data and other aspects of interactive statistical graphics can be found in papers by, among others, Becker et al (1996), Buja et al (1991, 1996), Cleveland (1993), and Cook et al (1995).

Dynamic graphical methods started as enhancements to the familiar static displays of data (e.g. histograms, bar charts, pie charts, scatterplots), by allowing direct manipulation by the user that results in ‘immediate’ change in a graph (see Elshaw Thrall and Thrall, Chapter 23, for some examples). This had become possible by the availability of workstations with sufficient computational power to generate the statistical graphs without delays and to allow interaction with the data by means of an input device (light pen or mouse). The overall motivation was to involve the human factor more directly in the exploration of data (i.e. exploiting the inherent capabilities of the brain to detect patterns and structure), and thereby gain richer insights than possible with the traditional rigid and static display. This was achieved by allowing the user to delete data points, highlight (brush) subsections of the data, establish links between the same data points in different graphs, and rotate, cut through, and project higher-dimensional data. Furthermore, the user and not a preset statistical procedure determined which actions to perform. Interactive statistical procedures become particularly effective when datasets are large (many observations) and high-dimensional (many variables), situations where characterisation of the data by a few numbers becomes increasingly unrealistic (for an early assessment see, for example, Andrews et al 1988: 75). While dynamic graphics for statistics were originally mostly experimental and confined to research environments, they have quickly become pervasive features of the EDA capability in modern commercial statistical software packages.
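The brushing and linking operations described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the mechanism of any of the packages cited: the function names `brush` and `highlight` and the four observations are hypothetical, and the only assumption is that every view stores the same observations in the same order, so that a selection made in one view can be carried to another by index.

```python
def brush(points, x_range, y_range):
    """Return the indices of the points that fall inside a rectangular brush."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return {
        i
        for i, (x, y) in enumerate(points)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    }


def highlight(view, selected):
    """Pair each item of another view with a flag saying whether it is selected."""
    return [(item, i in selected) for i, item in enumerate(view)]


# Hypothetical data: a scatterplot view and a label view of the same
# four observations, stored in the same order.
scatter = [(1.0, 2.0), (2.5, 3.5), (4.0, 1.0), (2.0, 3.0)]
labels = ["A", "B", "C", "D"]

# Brushing a rectangle in the scatterplot selects observations 1 and 3 ...
selected = brush(scatter, x_range=(1.5, 3.0), y_range=(2.5, 4.0))

# ... and the link highlights the same observations in the label view.
linked = highlight(labels, selected)
```

Keeping the selection as a set of observation indices, rather than as screen coordinates, is what lets any number of additional views, including a map, reuse it without change.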


An important aspect of dynamic graphics is the representation of data by means of multiple and simultaneously available ‘views’, such as a table, a list of labels, a bar chart, pie chart, histogram, stem and leaf plot, box plot, or scatterplot. These views are shown in different windows on a computer screen. They are linked in the sense that when a location in any one of the windows (e.g. a bar on a bar chart or a set of points on a scatterplot) is selected by means of a pointing device (brushing), the corresponding locations in the other windows are highlighted as well (see Becker et al 1987). While geographical locations have always played an important role in dynamic graphics (see the many examples of Cleveland and McGill 1988), it is only recently that the ‘map’ was introduced explicitly as an additional view of the data, for example by Haslett et al (1990, 1991), MacDougall (1991), and Monmonier (1989).

The most comprehensive set of tools to date that implement dynamic graphics for exploring spatial data is contained in the Regard (formerly Spider) software of Haslett, Unwin and associates, which runs on a Macintosh platform (see also Bradley and Haslett 1992; Haslett and Power 1995; Unwin 1994). Regard, and its successor Manet (Unwin et al 1996) allow for the visualisation of the distribution and associations between data for any subset of locations selected on a map display. Similarly, for any subset of data highlighted in a non-spatial view, such as a category in a histogram, the corresponding locations are highlighted on the map. This is illustrated in Figure 1, where attention focuses on suggesting promising multivariate relations pertaining to electoral change in the new German Bundesländer (formerly East Germany). Six types of dynamically linked views of the data are included, consisting of a map with highlighted constituencies, a bar chart, conditional (trellis) plot, histogram, scatterplot and missing value chart, as well as lists with variable names and values observed at a specific location. (For details on the Manet approach, see Unwin et al 1996 and http://www1.Math.Uni-Augsburg.de/~theus/Manet/ManetEx.html.) While highly dynamic in its statistical graphics, the Spider–Regard–Manet approach is still somewhat limited in terms of the spatial aspects of the data, in the sense that it is based on a fixed map and does not take advantage of GIS functionality, such as specialised data models to facilitate spatial queries and overlays (see also Hazelhoff and Gunnink 1992).

Fig 1. Interactive dynamic graphics for exploring spatial data with Manet.

Several ideas from the methodology of dynamic statistical graphics are reflected in the design of current GIS and mapping software. For example, the ArcView GIS (ESRI 1995b) is organised around several linked ‘views’ of the data (a map, a table, and several types of charts). These allow a limited degree of dynamic interaction in the sense that a selection made in any of the views (spatial selection of features on a map, records in a table) is immediately reflected in all other views. While Version 2.1 is rather limited in terms of its built-in statistical (exploratory) analysis capabilities, enhancements to make ArcView into a tool for interactive ESDA have been developed by linking it to other software modules. For example, at the Statistics Laboratory of Iowa State University, a 2-directional link was established between the XGobi dynamic graphics software of Buja et al (1991, 1996) and ArcView (Cook et al 1996; Majure et al 1996a, 1996b; Symanzik et al 1994, 1995, 1996; http://www.gis.iastate.edu/XGobi-AV2/XGobi-AV2.html). Similarly, the SpaceStat software for spatial data analysis of Anselin (1992, 1995a) was linked with ArcView in a Microsoft Windows environment (Anselin and Bao 1996, 1997; http://www.rri.wvu.edu/utilities.htm). In many respects, these and similar efforts achieve a functionality close to that of Regard, although not as seamless and considerably slower in execution. For example, in Figure 2, ArcView scripts were used to construct a histogram for the median values of housing in West Virginia counties, linked to a map (a view in ArcView). Using a selection tool to click on a given bar (interval) in the histogram, the relevant counties in the map are highlighted (for further details on the dataset and the procedures, see Anselin and Bao 1996, 1997). In contrast to Regard, the linked frameworks allow the exploitation of the full functionality of the GIS to search for other variables that may display similar patterns, using queries and spatial overlays (for example, see Cook et al 1996).

Fig 2. Linked histogram and map in ArcView–SpaceStat. (Histogram of Val 90: count of counties per value interval.)
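The link between a histogram bar and the map in Figure 2 can be illustrated with a minimal sketch. It assumes equal-width intervals and uses invented county names and values; it is not the ArcView script used for the figure, and the helper names `histogram_bins` and `select_bar` are hypothetical.

```python
def histogram_bins(values, n_bins):
    """Assign each observation index to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for i, v in enumerate(values):
        # All observations share one bin when the values are identical;
        # otherwise the maximum value is placed in the last interval.
        k = 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        bins[k].append(i)
    return bins


def select_bar(bins, bar, names):
    """Return the names of the spatial units that fall in the chosen bar."""
    return [names[i] for i in bins[bar]]


# Hypothetical housing values for five hypothetical counties.
counties = ["County A", "County B", "County C", "County D", "County E"]
values = [20000, 30000, 45000, 50000, 80000]

bins = histogram_bins(values, n_bins=3)

# Clicking on the middle bar yields the counties to highlight on the map.
highlighted = select_bar(bins, 1, counties)
```

Because each bar keeps the indices of its observations rather than only a count, the selection can be handed directly to the map view or to a spatial query.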


3 SPATIALISED EXPLORATORY DATA ANALYSIS

Whilst a widely available commercial implementation of interactive and dynamic spatial data analysis integrated with a GIS does not exist at the time of writing, the use of EDA with GIS has become fairly common. For example, in the ‘archaeologist’s workbench’ of Farley et al (1990) and Williams et al (1990), standard EDA tools such as box plots and scatterplots were applied to geographical data, by exporting information from a GIS to a statistical package (a 1-directional link). However, the latter is not ESDA in the sense used by Cressie (1993) and Anselin (1994), but rather non-spatial EDA applied to spatial data (see also Anselin and Getis 1992).

Spatialised EDA (Anselin 1994) is one step closer to true ESDA in the sense that location is combined with a graphic description of the data in the form of a bar chart, pie chart, or various icons. The most familiar example of this may be the positioning of Chernoff faces at geographical locations on a map, such as coordinates of cities or centroids of states, as illustrated by Fotheringham and Charlton (1994) and Haining (1990: 226) (but for a critical assessment see Haslett 1992). The facility to add bar charts and pie charts to areal units on a map is by now a familiar feature in many commercial GIS and mapping packages.

A more meaningful combination of location and data description is obtained when summaries of spatial distributions are visualised for different subsets in the data, providing initial insight into spatial heterogeneity (i.e. different for spatial subsets in the data, such as a north–south differential) or suggesting a spatial trend (a systematic variation of a variable with location, such as an east–west trend). For example, Haining (1990: 224) organises box plots for standardised mortality rates by distance band away from the centre of the city, revealing a clear spatial trend. Similarly, spatialised EDA techniques may be used to carry out a form of exploratory spatial analysis of variance, in which the interest centres on differences in central tendency (mean, median) of the distribution of a variable between spatial subsets (or spatial regimes) in the data. In Figure 3 this is illustrated for the West Virginia data. Two box plots refer respectively to counties at the outer rim and inner counties (generated by applying a spatial selection operation in a GIS). A comparison of the two graphs suggests a systematically higher value for counties at the rim, although a few counties in either group do not fit the pattern. In an interactive data analysis, this could easily be addressed by sequentially removing or adding counties to one or the other subset, providing the groundwork for a spatial analysis of variance (for other examples see Anselin et al 1993). However, it is well recognised that potential spatial autocorrelation among these observations could invalidate the interpretation of any analysis of variance or regression analysis. Therefore, techniques only qualify as true ESDA when this is addressed explicitly.

Fig 3. Exploratory spatial analysis of variance. Panel (a) shows box plots for the Outer and Inner counties.
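The comparison of box plots by spatial regime can be approximated numerically by computing the box plot summary for each subset. The sketch below is a rough illustration under stated assumptions: the values and the `outer`/`inner` labels are invented rather than taken from the West Virginia data, the function names are hypothetical, and quartiles follow the default method of Python's `statistics.quantiles`. As the text cautions, such a comparison ignores spatial autocorrelation and is therefore descriptive only.

```python
from statistics import quantiles


def box_summary(values):
    """Five-number summary underlying a box plot."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


def compare_regimes(values, regimes):
    """Group the observations by spatial regime and summarise each group."""
    groups = {}
    for value, regime in zip(values, regimes):
        groups.setdefault(regime, []).append(value)
    return {regime: box_summary(vals) for regime, vals in groups.items()}


# Hypothetical values; the regime labels stand in for the result of a
# spatial selection operation in a GIS.
values = [40, 50, 60, 70, 80, 90, 100, 20, 30, 40, 50, 60, 70, 80]
regimes = ["outer"] * 7 + ["inner"] * 7

summary = compare_regimes(values, regimes)

# Difference in central tendency between the two regimes.
median_gap = summary["outer"]["median"] - summary["inner"]["median"]
```

Moving an observation from one subset to the other only requires changing its label in `regimes` and recomputing, which mirrors the interactive adding and removing of counties described above.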


4 EXPLORATORY SPATIAL DATA ANALYSIS association, indicating local non-stationarity and


discovering islands of spatial heterogeneity (Anselin
ESDA can be broadly defined as the collection of 1994; Cressie 1993). In the remainder of this section,
techniques to describe and visualise spatial first some techniques are considered to visualise spatial
distributions, identify atypical locations (spatial distributions, with a particular focus on identifying
outliers), discover patterns of spatial association (spatial outliers and atypical observations. These techniques
clusters), and suggest different spatial regimes and other are more specialised than the methods for visualisation
forms of spatial instability or spatial non-stationarity for GIS discussed by Kraak (Chapter 11). This is
(Anselin 1994; see also Beard and Buttenfield, Chapter followed by a short review of ESDA techniques to
15). Central to ESDA is the concept of spatial visualise and assess spatial autocorrelation, for both
autocorrelation, that is, the phenomenon where geostatistical and lattice perspectives.
locational similarity (observations in spatial proximity)
is matched by value similarity (correlation).
4.1 Visualising spatial distributions
Spatial autocorrelation has been conceptualised
from two main perspectives, one prevalent in the Many of the spatialised EDA techniques described
physical sciences, the other in the social sciences. above can be successfully applied to gain insight into
Following Cressie’s (1993) classification, the so-called the distribution of data across locations in a GIS.
geostatistical perspective considers spatial observations These methods can also be integrated in a dynamic
to be a sample of points from an underlying interactive framework in a fairly straightforward way,
continuous spatial distribution (surface). This is for example as in the Manet software. A more explicit
modelled by means of a variogram, which expresses focus on identifying spatial outliers is offered by the so-
the strength of association between pairs of locations called box map, the extension of a familiar quantile
as a continuous function of the distance separating choropleth map (a standard feature in most GIS and
them (for comprehensive reviews see Cressie 1993 and mapping software) with highlighted upper and lower
Isaaks and Srivastava 1989). By contrast, in the so- outliers, defined as observations outside the ‘fences’ in
called lattice perspective, spatial locations are discrete a box plot (Cleveland 1993). A box map can easily be
points or areal units, and spatial data are implemented in many current GIS and mapping
conceptualised as a single realisation of a spatial packages (e.g. Anselin and Bao 1997). By comparing
stochastic process, similar to the approach taken in the box maps for different variables using overlay
analysis of time series. Essential in the analysis of operations in a GIS, an initial look at potential
lattice data is the concept of a spatial weights matrix, multivariate associations can be obtained (e.g. see
which expresses the spatial arrangement (topology, Talen 1997). Other approaches to identify outliers in
contiguity) of the data and which forms the starting spatial data can be envisaged as well, for example by
point for any statistical test or model (for extensive constructing spatial queries for those locations whose
reviews see Cliff and Ord 1981; Cressie 1993; values exceed some criterion of ‘extremeness’. Such
Haining 1990; Upton and Fingleton 1985). devices can be readily implemented in most currently
Juxtaposed on the distinction between the available commercial GIS.
geostatistical and lattice perspective is that between A more rigorous approach, geared towards the
global and local indicators of spatial association. geostatistical perspective, consists of the estimation of
Global indicators, such as the familiar Moran’s I and a spatial cumulative distribution function (SCDF), that
Geary’s c spatial autocorrelation statistics, summarise is, a continuous density function for all observations in
the overall pattern of dependence in the data into a a given region. This is implemented in the
single indicator (see Getis, Chapter 16). A major ArcView–XGobi linked framework mentioned earlier.
practical drawback for GIS analysis is that these global The linkage allows users to highlight regions of the
indicators are based on a strong assumption of spatial data on a map in ArcView and to find an SCDF plot in
stationarity, which, among others, requires a constant XGobi, to brush areas on the map to find the
mean (no spatial drift) and constant variance (no corresponding subset in the SCDF, and to brush
outliers) across space. This is not very meaningful or quantiles of the estimated SCDF and find the
may even be highly misleading in analyses of spatial matching locations on the map. For example, in
association for hundreds or thousands of spatial units Figure 4 (from Majure et al 1996a), the two SCDF
that characterise current GIS applications. The main functions for forest health indicators in the graph on
contribution of ESDA with respect to GIS lies the left-hand side correspond to the two large
therefore in visualising local patterns of spatial sub-regions of New England states in the map on the

258
Interactive techniques and exploratory spatial data analysis

right. An advantage of this form of linkage is that the


GIS can be used to overlay other data onto the
sample points, in order to suggest potential
10 multivariate associations. Clearly, an approach such
as SCDF could be integrated into a more
08 comprehensive Manet-type dynamic interactive
ESDA framework, although this has not been
implemented to date.
06

4.2 Visualising spatial autocorrelation: the


04 geostatistical perspective
The main focus of ESDA in geostatistics is on
02
identifying ‘unusual’ and highly influential (pairs of)
locations in order to obtain more robust estimates of
00 the variogram. Such locations are referred to as
1000 2000 3000 4000 spatial outliers, or pockets of local non-stationarity,
CDI and they require closer scrutiny before proceeding
with geostatistical modelling or spatial prediction
(a) (Kriging). The basic tools are outlined by Cressie
(1993) and include the variogram cloud, the
variogram box plot and the spatial lag scatterplot. A
variogram cloud is a scatterplot of squared
differences (or of square root absolute differences:
see Cressie 1993) between all pairs of observations,
sorted by distance band. An implementation of this
device in an interactive dynamic graphics framework
consisting of ArcView and XGobi is illustrated in
Figure 5 (from Majure et al 1996). By brushing
points in the cloud plot, lines are drawn between
pairs of observations on the map, suggesting
potential regions that are spatial outliers. A similar,
but more encompassing approach is implemented in
the Regard software, where the variogram cloud is
included as one of the linked views of the data to
facilitate a search for local pockets of spatial

(b)

Fig 4. Spatial cumulative distribution function (SCDF) in


ArcView–Xgobi. Fig 5. Brushed variogram cloud plot in ArcView–Xgobi.
Source: Majure et al 1996a Source: Majure et al 1996a

259
L Anselin

non-stationarity (Bradley and Haslett 1992; Haslett geostatistics software packages (e.g. S+SpatialStats,
1992; Haslett et al 1991; Haslett and Power 1995). MathSoft 1996a), although the linkage to GIS is still
The spatial lag scatterplot (also referred to as a limited or non-existent at the time of writing.
lagged scatterplot) and the variogram box plot
provide two different summary views of the 4.3 Visualising spatial autocorrelation: the
information in the cloud plot. The spatial lag
lattice perspective
scatterplot focuses on the observation pairs that
belong to a given distance class, that is, a subsection Central in the lattice perspective to spatial
of the variogram cloud between two distances. The autocorrelation is the concept of a spatial weights
value observed at each point is plotted against the matrix and associated spatially lagged variable or
value observed at the ‘lagged’ point (a point spatial lag. The non-zero elements of the spatial
separated from it by a distance belonging to the weights matrix indicate for each location which
given distance band). The spatial lag scatterplot other locations potentially interact with it (the so-
identifies potential influential locations as points called spatial neighbours). Furthermore, the value of
that are far-removed from the 45 degree line (Majure the non-zero elements is related to the relative
and Cressie 1997). The variogram box plot consists strength of this interaction (for technical details see
of a box plot for each distance band in the Cliff and Ord 1981; Haining 1990; Upton and
variogram cloud, as in the left-hand side of Figure 6, Fingleton 1985). A spatial lag is constructed as a
illustrating the spatial dependence in the West weighted average (using the weights in the spatial
Virginia housing values. For several distance bands, weights matrix) of the values observed for the
outliers may be identified as points outside the neighbours of a given location (see Anselin 1988).
fences of the box plot. These outliers can be The matching of the value observed at a location
associated with the pairs of locations to which they with its spatial lag for a given spatial weights matrix
correspond, as in the right-hand side of Figure 6, provides useful insight into the local pattern of spatial
typically obtained in an interactive manner (and in a association in the data. More precisely, when a high
way similar to the procedure illustrated in Figure 5). degree of positive spatial autocorrelation is present,
Extensions of both types of plots are possible in the observed value at a location and its spatial lag will
many ways, for example by using robustified tend to be similar. Spatial outliers will tend to be
measures of squared difference, by focusing on characterised by very different values for the location
different directions (anisotropy), or by including and its spatial lag, either much higher or much lower in
multiple variables (for extensive examples see Majure the location compared to the average for its
and Cressie 1997). ESDA techniques based on the neighbours. The association between a variable and its
geostatistical perspective can be found in many spatial lag can be visualised by means of so-called
academic as well as a number of commercial spatial lag pies and spatial lag bar charts (Anselin 1994;

40.5

39.5
y coordinate

2
gamma

38.5
1

37.5
0

0.0 0.5 1.0 1.5 2.0 -82 -81 -80 -79 -78
distance x coordinate
Fig 6. Variogram box plot with outlier pairs identified by location.

260
Interactive techniques and exploratory spatial data analysis

Anselin et al 1993; Anselin and Bao 1997). Both of these are made up of visual devices (size of the pie or length of the bar) that indicate the relative value of the spatial lag compared to the value at a location, as illustrated in Figure 7. Other visualisation schemes are possible as well, for example based on the difference, absolute difference, squared difference, or ratio between the value observed at a location and its spatial lag. These devices can be implemented in most GIS and mapping software in a straightforward way. In addition to the usual zooming and querying facilities available in an interactive GIS, the use of spatial lag pies or spatial lag bar charts could be made dynamic by allowing an interactive definition of the spatial weights matrix. It is envisaged that systems implementing these ideas will be available in the near future.

Fig 7. Spatial lag pie chart in ArcView–SpaceStat. (Legend: Val 90, W_val 90.)

A more formal approach towards visualising spatial association can be based on the concept of a Moran scatterplot and associated scatter map (Anselin 1994, 1995b, 1997). It follows from the interpretation of the Moran’s I statistic for spatial autocorrelation as a regression coefficient in a bivariate spatial lag scatterplot. More precisely, in a scatterplot with the spatial lag on the vertical axis and the value at each location on the horizontal axis, Moran’s I corresponds to the slope of the regression line through the points. When the variables are expressed in standardised form (i.e. with mean zero and standard deviation equal to one), this allows for an assessment of both global spatial association (the slope of the line) as well as local spatial association (local trends in the scatterplot). The latter is obtained by the decomposition of the scatterplot into four quadrants, each corresponding to a different type of spatial association: positive association between high values in the upper right and between low values in the lower left quadrants; negative association between high values surrounded by low values in the lower right and the reverse in the upper left quadrant. An illustration of this decomposition for the West Virginia data is given in Figure 8. The spatial locations that correspond to

Fig 8. Moran scatterplot with linear and loess smoother. (Panel (a): linear fit; panel (b): loess smoother. Horizontal axis: housing value; vertical axis: spatial lag; both axes run from -4 to 4.)


the points in the scatterplot can be found in a linked map, where each quadrant is represented by a different shade or colour, as in Figure 9. By interactively identifying particular points in the graph (e.g. extreme values), the corresponding location can be shown on the map. This is a straightforward extension of the notion of brushing scatterplots to assess local spatial association.

Two additional interpretations of the Moran scatterplot are useful in an interactive ESDA setting. One is to identify outliers or high leverage points that unduly influence the slope of the regression line (i.e. the measure of global spatial association). Such outliers can be found by means of standard regression diagnostics and are easily identified on a map in a linked framework. They can also be related to the significance of local indicators of spatial association (LISA) statistics (Getis, Chapter 16; Anselin 1995b; Getis and Ord 1992; Ord and Getis 1995). In conjunction with a map of significant LISA statistics, the Moran scatterplot provides the basis for a substantive interpretation of spatial clusters or spatial outliers (further details are given by Anselin 1995b, 1996). A second interpretation is to consider the extent to which a non-linear smoother (such as a loess smoother; Cleveland 1979) approximates the linear fit in the scatterplot. Strong non-linear patterns may indicate different spatial regimes or other forms of local spatial non-stationarity. For example, on the right-hand side of Figure 8, the loess function suggests two distinct slopes in the graph, one considerably steeper than the other. The Moran scatterplot and associated map (Figure 9) can easily be implemented in a dynamic graphics setting, for example using the ArcView–SpaceStat linked framework.

Fig 9. Moran scatter map.

5 IMPLEMENTATION AND FUTURE DIRECTIONS

To date a fully interactive ESDA functionality is not yet part of commercial GIS. However, several partial implementations exist, where a spatial statistical ‘module’ is added to an existing GIS (a point also made by Aspinall, Chapter 69; Boots, Chapter 36; Fischer, Chapter 19; and Getis, Chapter 16). Early discussions of these approaches were primarily conceptual, and a number of different taxonomies for integration have been advanced, primarily focusing on the nature of the linkage – closely coupled versus loosely coupled – and the types of statistical function that should be included (e.g. Anselin and Getis 1992; Goodchild et al 1992).

Building on the general framework outlined by Anselin and Getis (1992), a schematic overview of the interaction between different analytical functions of a GIS is given in Figure 10 (based on Anselin 1998; see Getis, Chapter 16; and Goodchild and Longley, Chapter 40, for related conceptual schema). Following the usual classification of GIS functionality into four broad groups (input, storage, analysis, and output), the analysis function can be further subdivided into selection, manipulation, exploration and confirmation. Anselin et al (1993) considered the first two of these to form a ‘GIS module’ while the latter two formed a ‘data analysis module’ to emphasise the practical division of labour between typical commercial GIS software and the specialised (add-on) software needed to carry out spatial data analysis. However, this distinction is becoming increasingly irrelevant, since many statistical software packages now have some form of mapping (or even GIS) functionality, and a growing number of (spatial) statistical functions are included in GIS software. More important than classifying these functions as belonging to one or other module is to stress their interaction and the types of information that must be exchanged between them, as illustrated by the linkages in Figure 10. While many other taxonomies are possible, the main point of the classification in Figure 10 is that selection and manipulation (shown on the left) are present in virtually all advanced systems and have become known as ‘spatial analysis’ in the commercial world (e.g. ESRI 1995c: Lesson 8). By contrast, the spatial


data analysis functions (shown on the right) are essentially absent in commercial systems.

The essence of any integration as in Figure 10 is that spatial information (such as location, topology, and distance) must be transferred from the GIS to the statistical module and location-specific results of the statistical analysis must be moved back to the GIS for mapping. Apart from the self-contained approach taken in Spider-Regard-Manet, most implementations to date of ESDA functionality in a GIS are extensions of existing systems by means of macro-language scripts. This typically hides the linked nature of the analysis routines from the user. Recent examples are extensions of ARC/INFO with non-spatial EDA tools, such as scatterplots (e.g. Batty and Xie 1994), and routines for the computation of global and local indicators of spatial association (e.g. Ding and Fotheringham 1992; Bao et al 1995). An alternative is a closely-coupled linkage between two software packages that allow remote procedure calls (in Unix) or dynamic data exchange (in a Microsoft Windows environment). This approach is taken in the only commercial implementation that exists to date of an integrated data analysis and GIS environment, the S+Gislink between the S-Plus statistical software and the ARC/INFO GIS (MathSoft 1996b). On Unix workstations a bi-directional link is established that allows data to be passed back and forth in their native format. In addition, the linkage allows users to call S-Plus statistical functions from within ARC/INFO. A similar approach is taken in the ArcView–XGobi integration at the Statistics Laboratory of Iowa State University. A much looser

Fig 10. Spatial analysis in GIS. (Four linked groups of functions. Selection: views, zooming, browsing, spatial queries, buffers, spatial sampling. Manipulation: aggregation, dissolution, map abstraction, centroids, tessellation, topology, spatial weights, overlay, interpolation. Exploratory spatial data analysis: spatial distribution, global spatial association, local spatial association. Confirmatory spatial data analysis: spatial regression, model specification, estimation, diagnostics, spatial prediction.)


coupling is implemented in the SpaceStat–ArcView linkage. Both of these efforts focus explicitly on ESDA, while the S-Plus–ARC/INFO linkage pertains primarily to traditional non-spatial EDA.

Several promising research directions are being pursued in the quest to develop more powerful tools for spatial analysis in GIS in general, and interactive spatial data analysis in particular. Highly relevant ongoing efforts include the use of the Internet to facilitate interactive mapping and visual data exploration (e.g. the Iris framework of Andrienko and Andrienko 1996; and see Batty, Chapter 21), the extension of data mining techniques to spatial data (e.g. Ng and Han 1994), and the use of massive parallel computing for the estimation of local indicators of spatial association (e.g. Armstrong and Marciano 1995). The extent of commercial and academic research activity devoted to methodological and computational facets will likely lead to a much-enhanced ESDA functionality in the GIS of the near future. This is an area of rapid change, and it is hoped that the general principles outlined in this chapter may provide a basis for the interpretation and assessment of future developments.

References

Andrews D F, Fowlkes E B, Tukey P A 1988 Some approaches to interactive statistical graphics. In Cleveland W S, McGill M E (eds) Dynamic graphics for statistics. Pacific Grove, Wadsworth: 73–90

Andrienko N, Andrienko G 1996 IRIS, a knowledge-based system for visual data exploration. See also http://allanon.gmd.de/and/java/iris/Iris.html

Anselin L 1988 Spatial econometrics: methods and models. Dordrecht, Kluwer

Anselin L 1990a What is special about spatial data? Alternative perspectives on spatial data analysis. In Griffith D A (ed.) Spatial statistics, past, present, and future. Ann Arbor, Institute of Mathematical Geography: 63–77

Anselin L 1992 SpaceStat: a program for the analysis of spatial data. Santa Barbara, NCGIA, University of California

Anselin L 1994a Exploratory spatial data analysis and geographic information systems. In Painho M (ed.) New tools for spatial analysis. Luxembourg, Eurostat: 45–54

Anselin L 1995a SpaceStat version 1.80 user’s guide. Morgantown, Regional Research Institute, West Virginia University

Anselin L 1995b Local indicators of spatial association – LISA. Geographical Analysis 27: 93–115

Anselin L 1997 The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In Fischer M, Scholten H, Unwin D (eds) Spatial analytical perspectives on GIS in environmental and socio-economic sciences. London, Taylor and Francis: 111–25

Anselin L 1998 GIS research infrastructure for spatial analysis of real estate markets. Journal of Housing Research 8

Anselin L, Bao S 1996 SpaceStat.apr user’s guide. Morgantown, Regional Research Institute, West Virginia University

Anselin L, Bao S 1997 Exploratory spatial data analysis: linking SpaceStat and ArcView. In Fischer M, Getis A (eds) Recent developments in spatial analysis – spatial statistics, behavioural modelling and neurocomputing. Berlin, Springer

Anselin L, Dodson R, Hudak S 1993 Linking GIS and spatial data analysis in practice. Geographical Systems 1: 3–23

Anselin L, Getis A 1992 Spatial statistical analysis and geographic information systems. Annals of Regional Science 26: 19–33

Armstrong M P, Marciano R 1995 Massively parallel processing of spatial statistics. International Journal of Geographical Information Systems 9: 169–89

Bailey T C 1994 A review of statistical spatial analysis in geographical information systems. In Fotheringham A S, Rogerson P (eds) Spatial analysis and GIS. London, Taylor and Francis: 13–44

Bailey T C, Gatrell A C 1995 Interactive spatial data analysis. Harlow, Longman/New York, John Wiley & Sons Inc.

Bao S, Henry M, Barkley D, Brooks K 1995 RAS: a regional analysis system integrated with ARC/INFO. Computers, Environment, and Urban Systems 18: 37–56

Batty M, Xie Y 1994a Modelling inside GIS: part I. Model structures, exploratory spatial data analysis and aggregation. International Journal of Geographical Information Systems 8: 291–307

Becker R A, Cleveland W S 1987 Brushing scatterplots. Technometrics 29: 127–42

Becker R A, Cleveland W S, Shyu M-J 1996 The visual design and control of Trellis display. Journal of Computational and Graphical Statistics 5: 123–55

Becker R A, Cleveland W S, Wilks A R 1987 Dynamic graphics for data analysis. Statistical Science 2: 355–95

Bradley R, Haslett J 1992 High interaction diagnostics for geostatistical models of spatially referenced data. The Statistician 41: 371–80

Buja A, Cook D, Swayne D F 1996 Interactive high-dimensional data visualization. Journal of Computational and Graphical Statistics 5: 78–99

Buja A, McDonald J A, Michalak J, Stuetzle W 1991 Interactive data visualisation using focusing and linking. In Nielson G M, Rosenblum L (eds) Proceedings of Visualisation 91. Los Alamitos, IEEE Computer Society Press: 155–62


Cleveland W S 1979 Robust locally weighted regression and smoothing scatter plots. Journal of the American Statistical Association 74: 829–36

Cleveland W S 1993 Visualizing data. Summit, Hobart Press

Cleveland W S, McGill M E (eds) 1988 Dynamic graphics for statistics. Pacific Grove, Wadsworth

Cliff A, Ord J K 1981b Spatial processes: models and applications. London, Pion

Cook D, Buja A, Cabrera J, Hurley C 1995 Grand tour and projection pursuit. Journal of Computational and Graphical Statistics 4: 155–72

Cook D, Majure J, Symanzik J, Cressie N 1996 Dynamic graphics in a GIS: a platform for analysing and exploring multivariate spatial data. Computational Statistics 11: 467–80

Cressie N A C 1993 Statistics for spatial data, revised edition. New York, John Wiley & Sons Inc.

Ding Y, Fotheringham A S 1992 The integration of spatial analysis and GIS. Computers, Environment, and Urban Systems 16: 3–19

ESRI 1995b ArcView 2.1, the geographic information system for everyone. Redlands, ESRI

ESRI 1995c Understanding GIS, the ARC/INFO method. Redlands, ESRI Inc.

Farley J A, Limp W F, Lockhart J 1990 The archaeologist’s workbench: integrating GIS, remote sensing, EDA and database management. In Allen K, Green F, Zubrow E (eds) Interpreting space: GIS and archaeology. London, Taylor and Francis: 141–64

Fotheringham A S, Charlton M 1994 GIS and exploratory spatial data analysis: an overview of some research issues. Geographical Systems 1: 315–27

Getis A, Ord J K 1992 The analysis of spatial association by use of distance statistics. Geographical Analysis 24: 189–206

Good I J 1983 The philosophy of exploratory data analysis. Philosophy of Science 50: 283–95

Goodchild M F 1987 A spatial analytical perspective on geographical information systems. International Journal of Geographical Information Systems 1: 327–34

Goodchild M F, Haining R P, Wise S et al 1992 Integrating GIS and spatial analysis – problems and possibilities. International Journal of Geographical Information Systems 6: 407–23

Haining R P 1990 Spatial data analysis in the social and environmental sciences. Cambridge (UK), Cambridge University Press

Haslett J 1992 Spatial data analysis – challenges. The Statistician 41: 271–84

Haslett J, Bradley R, Craig P, Unwin A, Wills G 1991 Dynamic graphics for exploring spatial data with applications to locating global and local anomalies. The American Statistician 45: 234–42

Haslett J, Power G M 1995 Interactive computer graphics for a more open exploration of stream sediment geochemical data. Computers and Geosciences 21: 77–87

Haslett J, Wills G, Unwin A 1990 SPIDER – an interactive statistical tool for the analysis of spatially distributed data. International Journal of Geographical Information Systems 4: 285–96

Hazelhoff L, Gunnink J L 1992 Linking tools for exploratory analysis of spatial data with GIS. EGIS 92, Proceedings Third European Conference on Geographical Information Systems. Utrecht, EGIS Foundation: 204–13

Isaaks E H, Srivastava R M 1989 An introduction to applied geostatistics. Oxford, Oxford University Press

MacDougall E B 1991 A prototype interface for exploratory analysis of geographic data. Proceedings, Eleventh Annual ESRI User Conference Vol. 2. Redlands, ESRI Inc.: 547–53

Majure J, Cook D, Cressie N, Kaiser M, Lahiri S, Symanzik J 1996a Spatial CDF estimation and visualisation with applications to forest health monitoring. Computing Science and Statistics 27: 93–101

Majure J, Cressie N 1997 Dynamic graphics for exploring spatial dependence in multivariate spatial data. Geographical Systems

Majure J, Cressie N, Cook D, Symanzik J 1996b GIS, spatial statistical graphics, and forest health. Proceedings, Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, 21–26 January. Santa Barbara, NCGIA

MathSoft 1996a S+SpatialStats user’s manual, version 1.0. Seattle, MathSoft, Inc.

MathSoft 1996b S+Gislink. Seattle, MathSoft, Inc.

Monmonier M 1989 Geographic brushing: enhancing exploratory analysis of the scatterplot matrix. Geographical Analysis 21: 81–4

Ng R, Han J 1994 Efficient and effective clustering methods for spatial data mining. Technical Report 94–13. Vancouver, University of British Columbia, Department of Computer Science

Openshaw S 1990 Spatial analysis and geographical information systems: a review of progress and possibilities. In Scholten H, Stillwell J (eds) Geographical information systems for urban and regional planning. Dordrecht, Kluwer: 153–63

Openshaw S 1991c Developing appropriate spatial analysis methods for GIS. In Maguire D, Goodchild M F, Rhind D (eds) Geographical information systems: principles and applications. Harlow, Longman/New York, John Wiley & Sons Inc. Vol. 1: 389–402

Ord J K, Getis A 1995 Local spatial autocorrelation statistics: distributional issues and applications. Geographical Analysis 27: 286–306

Stuetzle W 1987 Plot windows. Journal of the American Statistical Association 82: 466–75

Symanzik J, Majure J, Cook D 1996 Dynamic graphics in a GIS; a bidirectional link between ArcView 2.0 and XGobi. Computing Science and Statistics 27: 299–303

Symanzik J, Majure J, Cook D, Cressie N 1994 Dynamic graphics in a GIS: a link between ARC/INFO and XGobi. Computing Science and Statistics 26: 431–35


Symanzik J, Megretskaia I, Majure J, Cook D 1997 Implementation issues of variogram cloud plots and spatially lagged scatterplots in the linked ArcView 2.1 and XGobi environment. Computing Science and Statistics 28

Talen E 1997 Visualizing fairness: equity maps for planners. Journal of the American Planning Association

Tukey J W 1977 Exploratory data analysis. Reading (USA), Addison-Wesley

Unwin A 1994 REGARDing geographic data. In Dirschedl P, Osterman R (eds) Computational statistics. Heidelberg, Physica: 345–54

Unwin A, Hawkins G, Hofman H, Siegl B 1996 Interactive graphics for data sets with missing values – MANET. Journal of Computational and Graphical Statistics 5: 113–22

Upton G J, Fingleton B 1985 Spatial data analysis by example. New York, John Wiley & Sons Inc.

Williams I, Limp W, Briuer F 1990 Using geographic information systems and exploratory data analysis for archeological site classification and analysis. In Allen K, Green F, Zubrow E (eds) Interpreting space: GIS and archaeology. London, Taylor and Francis: 239–73
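
Worked example. The Moran scatterplot construction described in the chapter (standardised values on the horizontal axis, their spatial lag on the vertical axis, Moran's I as the slope of the regression line, and the four-quadrant decomposition) can be illustrated with a minimal sketch. It assumes a row-standardised contiguity matrix, under which the slope coincides with Moran's I. The function names, the quadrant labels, the convention that a value of exactly zero counts as "high", and the five-location data set are all hypothetical choices made for illustration; this is not the ArcView–SpaceStat implementation and not the West Virginia data. The last lines also compute two of the alternative comparisons mentioned for spatial lag pies and bars (difference and ratio between a value and its spatial lag).

```python
from math import sqrt

def row_standardise(adjacency):
    """Scale each row of a 0/1 contiguity matrix so that it sums to one."""
    weights = []
    for row in adjacency:
        total = sum(row)
        if total == 0:
            raise ValueError("every location needs at least one neighbour")
        weights.append([w / total for w in row])
    return weights

def spatial_lag(weights, values):
    """Weighted average of the neighbouring values for each location."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def standardise(values):
    """Rescale to mean zero and standard deviation one."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def moran_scatterplot(values, weights):
    """Return z, its spatial lag, the regression slope and quadrant labels."""
    z = standardise(values)
    lag = spatial_lag(weights, z)
    # Least-squares slope of lag on z. Because z has mean zero, the slope
    # reduces to sum(z * lag) / sum(z * z); with row-standardised weights
    # this ratio is Moran's I.
    slope = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
    quadrants = []
    for a, b in zip(z, lag):
        if a >= 0 and b >= 0:
            quadrants.append("high-high")  # upper right
        elif a < 0 and b < 0:
            quadrants.append("low-low")    # lower left
        elif a >= 0:
            quadrants.append("high-low")   # lower right
        else:
            quadrants.append("low-high")   # upper left
    return z, lag, slope, quadrants

# Hypothetical data: five locations along a line, each adjacent to the next.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
values = [10, 12, 11, 30, 32]

W = row_standardise(adjacency)
z, lag, slope, quadrants = moran_scatterplot(values, W)

# Alternative comparisons between a value and its spatial lag (raw units).
raw_lag = spatial_lag(W, values)
difference = [v - l for v, l in zip(values, raw_lag)]
ratio = [v / l for v, l in zip(values, raw_lag)]

print(round(slope, 4))
print(quadrants)
```

In a linked setting, the quadrant label of each location would drive the shading of the scatter map, and redefining `adjacency` interactively would correspond to the dynamic definition of the spatial weights matrix discussed for the spatial lag charts.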
