Spatial Pattern Analysis

Chapter 6
Spatial Analysis

6.1 Introduction

Spatial analysis, in a narrow sense, is a set of mathematical (and usually statistical) tools used to find order and patterns in spatial phenomena.

Objectives of spatial analysis are

1. to detect spatial patterns that cannot be detected by visual analysis, and
2. to confirm whether a spatial pattern found in visual analysis is significant (nonrandom).

To achieve the first objective, we do ‘exploratory spatial analysis’, which is considered a part of ‘spatial data mining’.

The second objective emphasizes statistical analysis, because it borrows statistical techniques from spatial statistics. It is often called ‘confirmatory spatial analysis’.

Spatial analysis started as subject-dependent statistics in biometrics, epidemiology, and econometrics. It then drew the attention of statisticians in the 1970s, and its methodology became considerably more sophisticated.

Though spatial analysis includes non-statistical analysis, it is often called ‘spatial statistics’ because a great part of spatial analysis is based on statistical analysis.

One of the questions frequently raised in spatial analysis is

‘Is this spatial pattern statistically significant?’

We want to know whether a pattern emerged only by chance or arises from certain causes.

Points are the most fundamental spatial objects in GIS. They are used for representing zero-dimensional spatial objects, that is, locations in a two- or higher-dimensional space.

In GIS, however, points are also used for representing spatial objects, including lines and polygons, that are small relative to the study region. The distribution of retail stores, for example, is represented as a point distribution, though stores are at least two-dimensional spatial objects in the real world.

Analysis of point distributions, which is often called ‘point pattern analysis’, is one of the basic methods in spatial analysis.

Point pattern analysis is applicable to any spatial distribution represented as a set of points in GIS.

This does not imply that we cannot treat points that are not homogeneous in the real world. In basic point pattern analysis, we focus only on the spatial aspect of point distributions, neglecting their attributes.

Because of this, in point pattern analysis we use a quantitative measure that indicates the degree of clustering.

To describe the degree of spatial clustering of a point distribution, the nearest neighbor distance method uses the average distance from every point to its nearest neighbor point.
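
$$ W = \frac{1}{n} \sum_{i=1}^{n} d_i $$

n: Number of points
di: Distance from point i to its nearest neighbor point

[Figure: example point distributions with W = 23.45, W = 35.71, and W = 72.85]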
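
As a minimal sketch of the definition of W, the following Python computes the mean nearest neighbor distance by brute force. The function name and the sample coordinates are illustrative assumptions, not part of the original material.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Return W, the mean distance from each point to its nearest neighbor."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, p in enumerate(points):
        # d_i: distance from point i to the closest other point
        d_i = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += d_i
    return total / n

# Four corners of a unit square: every nearest neighbor lies at distance 1.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
W = mean_nearest_neighbor_distance(square)
```

The brute-force search compares every pair of points, which is adequate for small point sets; a spatial index would be the usual choice for large ones.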

This indicates that the concept of spatial cluster is based on the pattern of points with respect to the size of the region in which the points are located.

To evaluate the degree of spatial clustering, therefore, we have to standardize the nearest neighbor distance, taking the region size into account.

To evaluate a point distribution, we consider the point distribution under CSR (Complete Spatial Randomness). Points are distributed randomly over an infinite space.

This type of point distribution is called a homogeneous Poisson distribution. The homogeneous Poisson distribution has one parameter λ, the density of points.

The expectation of the nearest neighbor distance of points under CSR is represented as a function of point density λ:

$$ E[W] = \frac{1}{2\sqrt{\lambda}} $$

S: The region in which points are distributed
A: The area of S

The point density in S substitutes for the point density λ:

$$ \lambda = \frac{n}{A} $$

This is an approximate calculation, because the homogeneous Poisson distribution is defined only in an infinite space.

We can standardize the nearest neighbor distance W by dividing it by its expectation under CSR. The standardized nearest neighbor distance is then given by

$$ w = \frac{W}{E[W]} = 2W\sqrt{\frac{n}{A}} $$
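
The standardization can be sketched in a few lines of Python. The function name and the numbers in the example (100 points in a 10 x 10 region, W = 0.5) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def standardized_nn_distance(W, n, area):
    """Return w = W / E[W], with E[W] = 1 / (2 * sqrt(lambda)) and lambda = n / area."""
    density = n / area                      # lambda, estimated from the region S
    expected_W = 1.0 / (2.0 * math.sqrt(density))
    return W / expected_W

# Hypothetical example: density is 1, so E[W] = 0.5 and w = 1.
w = standardized_nn_distance(W=0.5, n=100, area=100.0)
```

A value of w below 1 means the observed nearest neighbor distances are shorter than expected under CSR, and a value above 1 means they are longer.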

The standardized nearest neighbor distance is a descriptive measure of the degree of point clustering. Once we calculate it for a point distribution and obtain a small value, we want to know whether we can say with confidence “points are clustered.”

In statistical tests, we build null and alternative hypotheses.

Null hypothesis H0:
Points are randomly distributed, following a homogeneous Poisson distribution.

In the significance test, we use the nearest neighbor distance W as a statistic.

If W is significantly small (large), we reject H0 in favor of the alternative hypothesis H1, and we can say that the points are clustered (dispersed) at a certain significance level. Otherwise, we cannot reject the null hypothesis H0 and treat the points as randomly distributed.

For the statistical test, we need the probability distribution of the statistic W under the null hypothesis, CSR.

If the points are randomly distributed over an infinite space, the probability distribution of W is given by a normal distribution with the following mean and variance:

$$ N\left( \frac{1}{2\sqrt{\lambda}},\ \frac{4-\pi}{4\pi n \lambda} \right) $$
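
A minimal sketch of how this normal distribution could be used in a test is given below. It assumes λ is estimated as n / A, ignores the edge effect, and reports a two-sided p-value; the function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def nn_distance_z_test(W, n, area):
    """Return (z, p) for W against the CSR normal approximation."""
    density = n / area                                   # lambda = n / A
    mean = 1.0 / (2.0 * math.sqrt(density))              # E[W] under CSR
    variance = (4.0 - math.pi) / (4.0 * math.pi * n * density)
    z = (W - mean) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical example: 100 points in a 10 x 10 region with W = 0.4.
z, p = nn_distance_z_test(W=0.4, n=100, area=100.0)
```

A strongly negative z indicates that W is smaller than expected under CSR (clustering); a strongly positive z indicates dispersion.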

In reality, however, points are distributed in a bounded region, which makes the probability distribution of W under CSR slightly different from the normal distribution.

This is called the ‘edge effect’, which has to be corrected in the statistical test.

Correction of the edge effect depends on the number of points.

1) When n is large enough (n>100), we randomly choose m sample points and calculate the average nearest neighbor distance. We can exclude the edge effect by the random sampling.

$$ W = \frac{1}{m} \sum_{i=1}^{m} d_i $$
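
The sampling step can be sketched as follows. This is only an illustration: it assumes that the nearest neighbor of each sampled point is searched among all n points, which the text does not state explicitly, and the function name and the fixed random seed are hypothetical.

```python
import math
import random

def sampled_mean_nn_distance(points, m, seed=0):
    """Return W computed from m randomly chosen sample points."""
    rng = random.Random(seed)
    sample_indices = rng.sample(range(len(points)), m)   # sampling without replacement
    total = 0.0
    for i in sample_indices:
        # Assumption: the nearest neighbor is searched among all points, not only the sample
        total += min(math.dist(points[i], q)
                     for j, q in enumerate(points) if j != i)
    return total / m

# On a regular unit grid every nearest neighbor distance is 1, whichever points are sampled.
grid = [(x, y) for x in range(5) for y in range(5)]
W_sample = sampled_mean_nn_distance(grid, m=10)
```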

2) When n is not so large (n<100), we use all the n points in calculation of the nearest neighbor distance. In this case, the probability distribution of W under CSR is approximated by a normal distribution

$$ N\left( 0.5\sqrt{\frac{A}{n}} + 0.051\frac{L}{n} + 0.041\frac{L}{n\sqrt{n}},\ 0.070\frac{A}{n^{2}} + 0.037\,L\sqrt{\frac{A}{n^{5}}} \right) $$

L: Perimeter of S

Limitations of the nearest neighbor distance method:

1. We cannot distinguish all point distributions only by the nearest neighbor distance.
2. The result depends on the definition of S, the region in which points are distributed.

The K-function method overcomes the first limitation of the nearest neighbor distance method. It can distinguish various types of point distributions that are not distinguishable by the nearest neighbor distance method.

Definition of K-function

The K-function indicates the average number of points within a certain distance h from points, divided by the point density λ. It is, therefore, a function of the distance h.

In contrast to the nearest neighbor distance, the K-function shows a large value when points are clustered.

[Figure: K(h) plotted against h]

$$ K(h) = \frac{\sum_{i} \sum_{j \neq i} \sigma_{ij}(h)}{n\lambda} $$

h: Distance parameter
xi: Location of point i
σij: Binary function defined by

$$ \sigma_{ij}(h) = \begin{cases} 1 & \text{if } \lVert x_i - x_j \rVert \le h \\ 0 & \text{otherwise} \end{cases} $$

The numerator is the total number of points within a certain distance h from points.
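
The definition translates almost directly into code. The sketch below assumes that λ is estimated as n / A, as in the nearest neighbor distance method, and applies no edge correction; the function name and sample points are illustrative.

```python
import math

def k_function(points, h, area):
    """Return K(h): the double sum of sigma_ij(h) divided by n * lambda."""
    n = len(points)
    density = n / area                       # lambda = n / A (assumed estimate)
    # Sum of sigma_ij(h) over all ordered pairs (i, j) with j != i
    pairs_within_h = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if j != i and math.dist(p, q) <= h
    )
    return pairs_within_h / (n * density)

# Four corners of a unit square inside a region of area 4 (density 1).
# Each point has two neighbors within h = 1, so the double sum is 8.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
K = k_function(square, h=1.0, area=4.0)
```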

The K-function describes the degree of spatial clustering at the scale represented by the distance parameter h.

A large h implies that we are discussing the point distribution at a small scale, in other words, over a large spatial extent. If K(h) shows a large value for a large h, we say that points are globally clustered, or to be exact, points are clustered at the scale of h.

Since both the K-function and the nearest neighbor distance methods use distances between points, they are often called ‘distance methods’.

To evaluate the degree of spatial clustering of points, we again consider the distribution of points under CSR.

The expectation of the K-function of points under CSR is

$$ E[K(h)] = \pi h^{2} $$

Comparing K(h) and its expectation, we can classify point distributions into one of three categories:

K(h) > πh²: Points are clustered
K(h) ≈ πh²: Points are randomly distributed
K(h) < πh²: Points are dispersed

[Figure: K-function and its expectation under CSR]

Standardization of K-function

$$ L(h) = \sqrt{\frac{K(h)}{\pi}} - h $$

The standardized K-function is called the L-function.

Point distributions are then classified as below.

L(h) > 0: Points are clustered
L(h) ≈ 0: Points are randomly distributed
L(h) < 0: Points are dispersed

[Figure: L(h) plotted against h]
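
A short sketch of the L-function shows why it is convenient: under CSR, where K(h) = πh², it is zero at every h. The function name and the example values are illustrative.

```python
import math

def l_function(K_h, h):
    """Return L(h) = sqrt(K(h) / pi) - h."""
    return math.sqrt(K_h / math.pi) - h

h = 2.0
# K(h) equal to its CSR expectation pi * h**2 gives L(h) = 0.
L_csr = l_function(math.pi * h ** 2, h)
# K(h) twice its CSR expectation gives L(h) > 0 (clustered).
L_clustered = l_function(2.0 * math.pi * h ** 2, h)
# K(h) half its CSR expectation gives L(h) < 0 (dispersed).
L_dispersed = l_function(0.5 * math.pi * h ** 2, h)
```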

Statistical tests are also applicable to the K-function method. We can answer questions such as “Are the points significantly clustered?”

Null hypothesis H0:
Points are randomly distributed, following a homogeneous Poisson distribution.

If K(h) is significantly large (small), we reject H0 in favor of the alternative hypothesis H1, and we can say that the points are clustered (dispersed). Otherwise, we cannot reject the null hypothesis H0 and treat the points as randomly distributed.

For the statistical test, we need the probability distribution of K(h) of points under CSR, which depends on the number of points n.

1) When n is large enough (n>100), we randomly choose m sample points and calculate the K-function. The probability distribution of the K-function of points under CSR is approximately given by a normal distribution

$$ N\left( \pi h^{2},\ \frac{\pi h^{2}}{m\lambda} \right) $$

2) When n is not so large (n<100), the probability distribution of the K-function of points under CSR cannot be represented in an analytical form. In such a case, we often use Monte Carlo simulation. We repeatedly distribute n points randomly in S and calculate the K-function. Repeating this at least 10,000 times, we can obtain an approximate probability distribution of the K-function of points under CSR.
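
The Monte Carlo procedure can be sketched as below. Several things here are assumptions made for illustration: S is taken to be a rectangle, λ is estimated as n / A, no edge correction is applied, the test is one-sided (clustering), and only 200 repetitions are run to keep the example fast, whereas the text recommends at least 10,000.

```python
import math
import random

def k_function(points, h, area):
    """K(h) with lambda estimated as n / area and no edge correction."""
    n = len(points)
    within = sum(1 for i, p in enumerate(points)
                 for j, q in enumerate(points)
                 if j != i and math.dist(p, q) <= h)
    return within / (n * (n / area))

def simulate_csr_k(n, width, height, h, n_sims, seed=0):
    """Simulate K(h) for n uniformly random points in a width x height rectangle."""
    rng = random.Random(seed)
    area = width * height
    values = []
    for _ in range(n_sims):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        values.append(k_function(pts, h, area))
    return values

def monte_carlo_p_value(observed_k, simulated):
    """One-sided p-value for clustering: share of simulated K(h) >= observed K(h)."""
    exceed = sum(1 for v in simulated if v >= observed_k)
    return (exceed + 1) / (len(simulated) + 1)

sims = simulate_csr_k(n=30, width=10.0, height=10.0, h=1.0, n_sims=200)
# Hypothetical observed value: four times the CSR expectation pi * h**2.
p_clustered = monte_carlo_p_value(observed_k=4.0 * math.pi, simulated=sims)
```

Because the simulated points lie in the same bounded region as the observed ones, the edge effect is present in both and does not need a separate correction in this comparison.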

Hypotheses

The quadrat method differs from the distance methods in statistical hypotheses.

Null hypothesis H0:
Points are distributed according to the uniform distribution in S. This hypothesis is equivalent to the dispersed distribution discussed in the distance methods.

Alternative hypothesis H1:
Points are spatially clustered.

c: Number of cells covering S
xi: Number of points in cell i
x̄: Average number of points in a cell

The quadrat method uses the χ² statistic defined by

$$ \chi^{2} = \frac{\sum_{i=1}^{c} (x_i - \bar{x})^{2}}{\bar{x}} $$
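
As a small worked sketch of this statistic, the function below takes the list of cell counts and returns χ². The function name and the two sets of counts are hypothetical.

```python
def quadrat_chi_square(counts):
    """Return the sum of (x_i - mean)^2 over cells, divided by the mean count."""
    c = len(counts)
    mean = sum(counts) / c          # average number of points in a cell
    return sum((x - mean) ** 2 for x in counts) / mean

# Eight points in four cells, spread evenly: chi-square is 0.
chi_even = quadrat_chi_square([2, 2, 2, 2])
# The same eight points all in one cell: (36 + 4 + 4 + 4) / 2 = 24.
chi_clustered = quadrat_chi_square([8, 0, 0, 0])
```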

If points are uniformly (dispersedly) distributed, the χ² statistic shows a small value, because the xi will be close to the mean x̄. If χ² is very small, we do not reject the null hypothesis and say that points are uniformly distributed.

If points are spatially clustered, the χ² statistic shows a large value. If χ² is large enough, we can reject the null hypothesis and conclude that points are not uniformly distributed.

When points are distributed according to the uniform distribution, the χ² statistic approximately follows the χ² distribution with c − 1 degrees of freedom, because the cell counts are constrained to sum to n.

Consequently, we can test the significance of χ² by the ordinary χ² test which is used in basic statistics.

χ² test

The χ² test is widely used in statistics. Basically, it is a test for comparing an observed sample distribution with a distribution derived from a theoretical model.

The χ² statistic, in its original form, is

$$ \chi^{2} = \sum_{i=1}^{c} \frac{(x_i - y_i)^{2}}{y_i} $$

where yi is the expectation of xi when events follow the theoretical distribution. The χ² statistic shows a small value if the theory fits the observed data.
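
The general form can be sketched with arbitrary expected counts. In the example, the observed counts and the non-uniform expected counts are made up for illustration; the same observed counts are compared with a model that fits them well and with a uniform expectation that does not.

```python
def chi_square(observed, expected):
    """Return the sum over cells of (x_i - y_i)^2 / y_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((x - y) ** 2 / y for x, y in zip(observed, expected))

observed = [6, 3, 2, 1]                    # hypothetical cell counts (12 points)
model_expected = [4.8, 3.6, 2.4, 1.2]      # hypothetical model with decreasing density
uniform_expected = [3.0, 3.0, 3.0, 3.0]    # uniform expectation, 12 points / 4 cells

chi_model = chi_square(observed, model_expected)      # 0.3 + 0.1 + 0.0667 + 0.0333 = 0.5
chi_uniform = chi_square(observed, uniform_expected)  # (9 + 0 + 1 + 4) / 3 = 4.667
```

The smaller value for the model with decreasing density reflects the statement above: the statistic is small when the theoretical distribution fits the observed counts.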

In point pattern analysis, the theoretical distribution considered in the χ² test is the uniform distribution in S. The expectation of xi, which is denoted by yi, is thus given by the average number of points in a cell:

$$ y_i = \frac{1}{c} \sum_{j=1}^{c} x_j = \bar{x} = \frac{n}{c} $$

History

The quadrat method was first used in geography by the Japanese geographer Isamu Matsui.

Matui, I. (1932): Statistical study of the distribution of scattered villages in two regions of the Tonami Plain, Toyama Prefecture, Japanese Journal of Geology and Geography, 9, 251-255.

One of the advantages of the quadrat method is that we can statistically compare a point distribution with any distribution derived from a theory.

In the null hypothesis, we can consider not only the uniform distribution but also any other distribution derived from a spatial model, say, an inhomogeneous Poisson distribution, a clustered distribution, etc.

The quadrat method aggregates point data into raster data. This implies that the quadrat method ignores a large amount of locational information in the observed point distribution.

Because of this, the quadrat method has several limitations to which we should pay attention.

1. The result depends on the cell size.
2. The result depends on the definition of the region in which points are distributed.
3. The quadrat method cannot distinguish some different distributions.

Those limitations are quite similar to those of the nearest neighbor distance method. Consequently, one solution is to try various cell sizes and interpret the result as a function of the spatial scale represented by the cell size. This method is parallel to the K-function method.
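
The dependence on cell size can be illustrated by computing the χ² statistic on grids of different resolution. The sketch assumes a rectangular region divided into k x k equal cells; the function names and the point coordinates (two tight groups of four points in a 4 x 4 region) are hypothetical.

```python
def quadrat_counts(points, width, height, k):
    """Count points in each cell of a k x k grid over a width x height rectangle."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x / width * k), k - 1)    # clamp points lying on the far edge
        row = min(int(y / height * k), k - 1)
        counts[row * k + col] += 1
    return counts

def quadrat_chi_square(counts):
    """Return the sum of (x_i - mean)^2 over cells, divided by the mean count."""
    mean = sum(counts) / len(counts)
    return sum((x - mean) ** 2 for x in counts) / mean

points = [(0.5, 0.5), (0.6, 0.4), (0.4, 0.6), (0.5, 0.6),
          (3.5, 3.5), (3.4, 3.6), (3.6, 3.5), (3.5, 3.4)]

# The same points give different chi-square values at different cell sizes.
results = {k: quadrat_chi_square(quadrat_counts(points, 4.0, 4.0, k))
           for k in (2, 4)}
```

Here the 2 x 2 grid gives χ² = 8 and the 4 x 4 grid gives χ² = 56 for the same points, so a result is only meaningful together with the cell size at which it was obtained.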

Even if we do so, if cells contain only a few points, the statistical test does not work successfully. We cannot reject the null hypothesis and always have to say “points are uniformly distributed.”

Homework Q.6.1

Suppose a 3 x 3 square lattice as shown below. We locate three points randomly on the lattice. It is prohibited to locate the points on the lattice boundary.

1. What is the probability that all the points are located in the same cell?
2. What is the probability that the three points are arranged lengthways?
3. What is the probability that the three points are arranged lengthways, breadthways, or diagonally, as seen in the bingo game winner?

Homework Q.6.2

Suppose parallel lines equally spaced by a distance l on an infinite plane. We randomly drop a line segment of length l. What is the probability that the line segment crosses one of the parallel lines?

Homework Q.6.3