Fuzzy Geographically Weighted Clustering
All content following this page was uploaded by Dan Jacobson on 22 June 2016.
G. A. Mason (1), R. D. Jacobson (2)
(1) University of Calgary,
Dept. of Geography
2500 University Drive NW
Calgary, AB, T2N 1N4
Telephone: +1 403 210 9723
Fax: +1 403 282 6561
Email: [email protected]
(2) University of Calgary,
Dept. of Geography
2500 University Drive NW
Calgary, AB, T2N 1N4
Telephone: +1 403 220 6192
Fax: +1 403 282 6561
Email: [email protected]
1. Introduction
Geodemographic analysis has been described as “the analysis of spatially referenced
geodemographic and lifestyle data” (See and Openshaw, 2001, p.269). It is widely used in the
public and private sectors for the planning and provision of products and services.
Geodemographic analysis often uses clustering techniques to classify the
geodemographic data into groups, making the data more manageable for analysis.
Clustering identifies a number of geodemographic groups (clusters), each group having a
particular geodemographic profile. Each geographical area under consideration is then assigned
to a group based on its similarity to the group profile.
Fuzzy clustering uses the principles of fuzzy logic to
calculate a membership value for each subject in each of the groups. Rather than assigning a
geographical area to a single group, each area is allocated a membership value in each of the
groups (clusters), thus helping to overcome the issues of ecological fallacy. The fuzzy clustering
algorithm typically used in geodemographic analysis is Bezdek's fuzzy c-means clustering
algorithm, known as FCM (Bezdek et al., 1984). Fuzzy geodemographic analysis using FCM has
been investigated by Feng and Flowerdew (1998, 1999) and See (1999), but has received scant
attention since, an exception being the recent investigation by one of the authors (Mason, 2006).
This paper proposes a modification to the fuzzy clustering algorithm to incorporate
geographical effects, suitable for geodemographic analysis.
values based on “neighbourhood effects”. The neighbourhood effects incorporate geography into
the model. The neighbourhood effects formula adjusts the cluster membership as shown in
equation 1.
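$$\mu'_i = \alpha \, \mu_i + \beta \, \frac{1}{A} \sum_{j}^{n} w_{ij} \, \mu_j \qquad (1)$$
$$\alpha + \beta = 1 \qquad (2)$$
where α and β are scaling variables to affect the proportion of the original
membership vs. the weighted (calculated) membership
$$w_{ij} = \frac{p_{ij}^{\,b}}{d_{ij}^{\,a}} \qquad (3)$$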
A previous investigation by one of the authors (Mason, 2006) identified that the cluster centres
are not only sensitive to changes in variable values but are also sensitive to the amount of
fuzziness applied via a fuzzy exponent configuration variable. This observation led to the
hypothesis that the application of the neighbourhood effects as presented by Feng and Flowerdew
(1998), if included in the fuzzy clustering algorithm, would also affect the cluster centres,
making them “geographically aware”.
[Figure: flow diagram. Socio-economic datasets for geodemographic analysis; fuzzy clustering;
“+ geography”, i.e. neighbourhood effects; leading to (ii) ex post facto adjustment of cluster
membership after the original fuzzy clustering, with the neighbourhood effects incorporated, and
(iii) geographically weighted clustering, where clusters become “geographically aware”, being
sensitive to neighbourhood effects; cluster membership and characteristics evolve throughout
the clustering process.]
The fuzzy c-means algorithm (Bezdek et al., 1984) is an iterative algorithm that re-calculates
the cluster centres and the associated membership values on each iteration until optimality is
achieved. The proposed modification adds a step to each iteration
that applies a weighting to the membership values using the technique noted in 3.2 below,
subsequent to the “standard” calculation of the membership values as prescribed by Bezdek.
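The modified iteration can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fcm_memberships`, `fcm_centres`, `geo_step`, `fgwc`), the caller-supplied initial centres, the precomputed weight matrix `w`, the stopping rule on membership change, and the choice α = 1 − β are all assumptions made for the example.

```python
import math


def fcm_memberships(data, centres, m):
    """Standard fuzzy c-means membership update (Bezdek et al., 1984)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in data:
        d = [math.dist(x, c) for c in centres]
        if 0.0 in d:
            # The point sits exactly on a centre: full membership there.
            hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
            u.append([h / sum(hits) for h in hits])
        else:
            inv = [dk ** -power for dk in d]
            total = sum(inv)
            u.append([value / total for value in inv])
    return u


def fcm_centres(data, u, m):
    """Standard fuzzy c-means centre update: means weighted by u**m."""
    centres = []
    for k in range(len(u[0])):
        wts = [row[k] ** m for row in u]
        total = sum(wts)
        centres.append([sum(wt * x[d] for wt, x in zip(wts, data)) / total
                        for d in range(len(data[0]))])
    return centres


def geo_step(u, w, beta):
    """Added step: blend each area's memberships with its neighbours'."""
    alpha = 1.0 - beta  # assumption: alpha + beta = 1
    n, c = len(u), len(u[0])
    out = []
    for i in range(n):
        pulled = [sum(w[i][j] * u[j][k] for j in range(n)) for k in range(c)]
        A = sum(pulled)  # scales the sum term so each row still sums to one
        if A == 0.0:
            out.append(list(u[i]))  # no neighbours with weight: leave as is
        else:
            out.append([alpha * u[i][k] + beta * pulled[k] / A
                        for k in range(c)])
    return out


def fgwc(data, w, centres, m=1.2, beta=0.1, max_iter=100, tol=1e-6):
    """Iterate: standard memberships, geographic step, then new centres."""
    u = None
    for _ in range(max_iter):
        new_u = geo_step(fcm_memberships(data, centres, m), w, beta)
        centres = fcm_centres(data, new_u, m)
        converged = u is not None and all(
            abs(p - q) < tol for r, s in zip(new_u, u) for p, q in zip(r, s))
        u = new_u
        if converged:
            break
    return centres, u
```

Because the geographic step runs before the centres are recomputed, the adjusted memberships feed back into the centres on every iteration, which is what distinguishes this from adjusting the memberships once after clustering has finished.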
Feng and Flowerdew's neighbourhood effects have some limitations. First, they ignore the effects
of areas which have no common boundary (See, 1999). Second, they exclude the effects of
population, a key geodemographic consideration. To overcome these limitations, a modified
version of the cluster membership adjustment is proposed that incorporates a spatial interaction
effect model similar to that discussed by Birkin and Clarke (1991). This proposed model
calculates the influence of one area upon another as the product of the populations of the areas. A
distance decay effect is implemented in the divisor. This effect is implemented through the
weighting factor as described in equation 6.
The adjusted cluster membership for the fuzzy geographically weighted clustering algorithm,
which is calculated in each iteration of the fuzzy clustering algorithm, is shown in equation 4:
$$\mu'_i = \alpha \, \mu_i + \beta \, \frac{1}{A} \sum_{j}^{n} w_{ij} \, \mu_j \qquad (4)$$
$$w_{ij} = \frac{(m_i \, m_j)^{b}}{d_{ij}^{\,a}} \qquad (6)$$
where m_i, m_j are the populations of areas i and j respectively,
d_ij is the distance between i and j,
a and b are user-definable parameters,
and A is a factor to scale the "sum" term, and is calculated across all
clusters, ensuring that the sum of the memberships for a given area for
all clusters is equal to one.
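As a worked illustration of the weighting and the membership adjustment, the following Python sketch computes the population and distance weights and applies them to a small membership matrix. The function names, the three-area toy data, the exclusion of an area's influence on itself (its distance to itself is zero), and the per-area calculation of A are assumptions made for this example and are not taken from the paper.

```python
def interaction_weights(pops, dists, a=1.0, b=1.0):
    """w_ij = (m_i * m_j)**b / d_ij**a, with w_ii set to zero (assumption)."""
    n = len(pops)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = (pops[i] * pops[j]) ** b / dists[i][j] ** a
    return w


def adjust_memberships(u, w, beta):
    """Blend original memberships with the weighted neighbour memberships."""
    alpha = 1.0 - beta  # assumption: alpha + beta = 1
    n, c = len(u), len(u[0])
    adjusted = []
    for i in range(n):
        pulled = [sum(w[i][j] * u[j][k] for j in range(n)) for k in range(c)]
        A = sum(pulled)  # total over all clusters, so the row sums to one
        if A == 0.0:
            adjusted.append(list(u[i]))
        else:
            adjusted.append([alpha * u[i][k] + beta * pulled[k] / A
                             for k in range(c)])
    return adjusted


# Hypothetical data: three areas, two clusters.
pops = [100.0, 200.0, 50.0]
dists = [[0.0, 2.0, 4.0],
         [2.0, 0.0, 5.0],
         [4.0, 5.0, 0.0]]
u = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

w = interaction_weights(pops, dists)
print(adjust_memberships(u, w, beta=0.3))
```

In this toy case area 0 starts wholly in the first cluster but is pulled towards the second by its large, nearby neighbour (area 1), while each row of memberships still sums to one.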
4. Results
The Fuzzy Geographically Weighted Clustering (FGWC) algorithm was run against a test dataset
of socio-economic demographic variables, using a=1, b=1, and β values from 0.0 to 0.5. A
seven-cluster scenario with a fuzzy exponent of 1.2 was used. The aim was to determine how the
cluster centres and cluster membership values change based on the application of the new spatial
interaction weighting factor in the algorithm.
4.1 Nearest Cluster Distance
Taking the cluster centres for the β=0.0 scenario as the base (this is equivalent to normal
fuzzy clustering), the multidimensional Euclidean distance (based on
standardized variable values) was calculated to the nearest cluster centre in each of the β=0.1,
β=0.2, β=0.3, β=0.4 and β=0.5 scenarios. This distance is often used as a cluster similarity
measure and is used here to show how far each cluster centre has moved as a result of the
spatial membership weighting.
[Table 1: nearest cluster distances, with columns for the scenarios α=0.9/β=0.1, α=0.8/β=0.2,
α=0.7/β=0.3, α=0.6/β=0.4 and α=0.5/β=0.5]
Table 1 shows that, as a general trend, the nearest cluster lies further away as β increases
(and α decreases), indicating a greater difference in the characteristics defined by that
cluster centre. An illustration of the movement of the cluster centre across changing β is shown
in Figure 2.
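The nearest cluster distance can be computed with a few lines of Python. This is a sketch only: the function name and the two-variable toy centres are hypothetical, and the centres are assumed to be expressed in standardized variable values already.

```python
import math


def nearest_cluster_distances(base_centres, scenario_centres):
    """For each base centre, the Euclidean distance to the closest
    centre in the comparison scenario."""
    return [min(math.dist(base, other) for other in scenario_centres)
            for base in base_centres]


# Hypothetical standardized centres for a base and a weighted scenario.
base = [(0.0, 0.0), (1.0, 1.0)]
weighted = [(1.0, 1.5), (0.3, 0.4)]
print(nearest_cluster_distances(base, weighted))
```

Matching on the nearest centre avoids relying on cluster labels, which need not correspond between two separate runs of the algorithm.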
Figure 2: The trajectory of a cluster centre across increasing β for variables 1 and 7.
4.2 Moran's I
To consider the effect of the FGWC algorithm on spatial autocorrelation, it was
hypothesized that the spatial autocorrelation of the cluster membership values would increase with
increasing β, since the weighting factor incorporates the influence of local areas. To test this,
Moran's I was calculated for each cluster in each scenario to identify how the spatial
autocorrelation changes with increasing β. For each β, the totals across all clusters were also
calculated as an indicator of the total spatial autocorrelation.
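For reference, the standard global Moran's I for one cluster's membership values can be sketched as follows. The paper does not state which spatial weights matrix was used for this statistic, so the binary contiguity matrix in the example, like the function name and the toy values, is an assumption.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i**2,
    where z are deviations from the mean and S0 is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


# Hypothetical example: four areas in a row, neighbours share a border.
contiguity = [[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]]
print(morans_i([1.0, 1.0, 0.0, 0.0], contiguity))  # similar values adjacent
print(morans_i([1.0, 0.0, 1.0, 0.0], contiguity))  # alternating values
```

Applied to each column of the membership matrix in turn, this gives one value per cluster and scenario; positive values indicate that neighbouring areas have similar memberships.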
The standard deviations of the membership values for each cluster (labelled c1 through c7) were
calculated across the values of β. The results are shown in Table 3.
This shows that overall the memberships become less distinct as the spatial interaction
effect of the surrounding areas weighs more heavily on the resultant membership values. We
can also see that the maximum membership value decreases with β.
          β = 0.0   β = 0.1   β = 0.2   β = 0.3   β = 0.4   β = 0.5
c1         0.3390    0.2997    0.2441    0.2432    0.1875    0.1172
c2         0.3095    0.2524    0.2565    0.2474    0.1991    0.1570
c3         0.3418    0.2988    0.2457    0.2262    0.2120    0.1869
c4         0.3430    0.2222    0.2687    0.2196    0.1406    0.0962
c5         0.1899    0.2861    0.2623    0.2096    0.1658    0.1629
c6         0.3111    0.2935    0.1955    0.1928    0.2217    0.1907
c7         0.2441    0.2799    0.2657    0.1588    0.1945    0.1554
max mem    1.0000    0.9588    0.9221    0.8799    0.8326    0.7659
Table 3: Standard deviations of the membership values for each cluster, and maximum membership values, across β
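The two summary measures in Table 3 are simple to compute from a membership matrix. The sketch below is illustrative only: the function name and toy matrices are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the paper does not say whether the population or sample form was used.

```python
from statistics import pstdev


def membership_spread(u):
    """Per-cluster standard deviation of memberships across areas,
    plus the largest membership value found anywhere in the matrix."""
    per_cluster = [pstdev(column) for column in zip(*u)]
    return per_cluster, max(max(row) for row in u)


# Hypothetical memberships for two areas and two clusters.
crisp = [[1.0, 0.0], [0.0, 1.0]]
blurred = [[0.6, 0.4], [0.4, 0.6]]
print(membership_spread(crisp))
print(membership_spread(blurred))
```

Lower standard deviations and a lower maximum, as in the second matrix, correspond to the less distinct memberships reported for larger β.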
5. Conclusion
The Fuzzy Geographically Weighted Clustering algorithm offers a geographically aware
alternative to a regular clustering algorithm, providing the capability to apply population
and distance effects in a geodemographic cluster analysis. The resultant cluster centres
differ from those of the unweighted scenario, reflecting the application of the spatial
interaction effects. The spatial autocorrelation of the cluster membership values increases as
the spatial interaction weighting is increased, and the membership values indicate an increasing
homogenization of the clusters, as measured by the standard deviation of the membership values.
There is considerable potential for further research into this technique and for its
application to real-world scenarios.
6. References
Bezdek, J.C., R. Ehrlich, et al. (1984) FCM: the fuzzy c-means clustering algorithm, Computers & Geosciences,
10, pp 191-203.
Birkin, M. and G.P. Clarke (1991) Spatial Interaction in Geography, Geography Review, 4(5), pp 16-24.
Feng, Z. and R. Flowerdew (1998) Fuzzy geodemographics: a contribution from fuzzy clustering methods, in Carver,
S. (ed.) Innovations in GIS 5. London: Taylor & Francis.
Feng, Z. and R. Flowerdew (1999) The use of fuzzy classification to improve geodemographic targeting, in Gittings,
B. (ed.) Innovations in GIS 6. London: Taylor & Francis.
Openshaw, S. (1998) Towards the Marketing System that Thinks, Institute of Direct Marketing Lecture, [online]
http://www.geog.leeds.ac.uk/presentations/98-8/tsld104.htm
See, L. (1999) Geographical applications of fuzzy logic and fuzzy hybrid techniques, PhD dissertation, School of
Geography, University of Leeds.
See, L. and S. Openshaw (2001) Fuzzy geodemographic targeting, in Clarke, G. and M. Madden (eds.) Regional
Science in Business. Berlin: Springer, pp 269-282.