Functions Exam
Visualizing Spatial Autocorrelation
Rationale
Function Design
Function Application
Appendix
References
Rationale
Function Design
Before running the function, users must complete two preparatory steps. First, they must
load the R packages specified in Figure 2 of the Appendix. Second, they must load a
shapefile and its associated attribute file and merge the two on the output area coding
scheme (such as LSOA, MSOA, OA or ward), so that R can match each attribute value to its
output area. The function is then ready to run. Users must supply datafile, the dataset
created by merging the shapefile and attribute data, and variable, the attribute of interest
within that dataset. They may also customize the following parameters: (1) weight
determines how the neighbours of the target area are weighted; (2) p1 and p2 set the
bandwidth within which the local Moran’s estimate must fall; (3) method uses the switch
function to select how neighbours are defined, with users choosing between (4) cust, which
sets whether relationships are defined by queen’s or rook’s case, and (5) distance.x and
distance.y, which set the distance threshold for defining neighbours; (6) output specifies
which visual diagram is printed; and (7) stats specifies which estimate from the statistical
test is mapped in the first output. More information can be found in Figure 2 of the Appendix.
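The distance threshold and weighting options described above can be illustrated with a minimal base R sketch. It is not the code of cluster_scope: the coordinates, the helper name distance_band_weights and its arguments d_min and d_max are hypothetical, and the report's function may instead rely on spdep routines such as dnearneigh() and nb2listw(). The sketch only shows what a distance band between a minimum and a maximum distance, followed by row-standardised (style "W") weights, amounts to.

```r
# Hypothetical centroid coordinates (in metres) for five areas; not from the source data.
coords <- cbind(x = c(0, 1000, 2500, 6000, 6500),
                y = c(0,  500,  500, 4000, 4200))

# Hypothetical helper: binary distance-band neighbours, then row-standardised weights.
distance_band_weights <- function(coords, d_min, d_max) {
  d <- as.matrix(dist(coords))            # pairwise Euclidean distances
  nb <- (d > d_min & d <= d_max) * 1      # 1 where the distance lies in (d_min, d_max]
  diag(nb) <- 0                           # an area is never its own neighbour
  rs <- rowSums(nb)                       # number of neighbours per area
  nb / ifelse(rs == 0, 1, rs)             # divide each row by its sum; isolated rows stay zero
}

w <- distance_band_weights(coords, d_min = 0, d_max = 2000)
rowSums(w)                                # each area with neighbours sums to 1
```

Widening d_max adds neighbours to each area, which lowers the individual weights because every row is rescaled to sum to 1.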
Subsequently, the function processes the user’s inputs and identifies neighbours within the
dataset. The lag values and the local Moran’s I test are calculated, and the statistical
results are bound to the dataset. An if-else statement accounts for errors in row
standardization: if the values obtained do not fall roughly between -1 and 1, the function
prints an error message asking the user to re-customize the method, cust, distance.x and
distance.y parameters. If the values stay within the specified range, the data is passed to
a switch function whose result depends on the output argument. When the user specifies
output = “Moran”, a choropleth map of a local Moran’s estimate is printed. When the user
specifies output = “LISA”, the function creates quadrants to determine whether the clusters
consist of high or low values, assigns a colour to each quadrant and prints the
corresponding map. The last option, output = “Scatter”, signals the function to recentre
the lag values and attribute values and create a scatter plot with lines marking the average
values and the slope of the linear fit. More information can be found in Figure 1 of the
Appendix.
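The sequence described above (lag values, local Moran’s I, range check, then dispatch on output) can be sketched in base R on toy data. This is an illustration under stated assumptions, not the function's actual body: the data, the helper check_and_dispatch and the strings it returns are hypothetical, and the sketch raises an error with stop() where the report's function prints a message.

```r
# Toy data (hypothetical): four areas on a line, each neighbouring the adjacent ones.
x <- c(2, 4, 6, 8)
nb <- matrix(c(0, 1, 0, 0,
               1, 0, 1, 0,
               0, 1, 0, 1,
               0, 0, 1, 0), nrow = 4, byrow = TRUE)
w <- nb / rowSums(nb)                     # row-standardised weights

z   <- x - mean(x)                        # centred attribute values
lag <- as.vector(w %*% z)                 # average centred value among neighbours
m2  <- sum(z^2) / length(x)               # second moment used to scale the statistic
Ii  <- z * lag / m2                       # local Moran's I for each area

# Hypothetical stand-in for the range check and the output switch.
check_and_dispatch <- function(Ii, p1 = 1.2, p2 = -1.2, output = "Moran") {
  if (any(Ii > p1 | Ii < p2)) {
    stop("Local Moran's I outside [p2, p1]; adjust method, cust, distance.x or distance.y")
  }
  switch(output,
         "Moran"   = "choropleth of local Moran's I",
         "LISA"    = "LISA cluster map",
         "Scatter" = "Moran scatter plot",
         stop("Unknown output: ", output))
}

result <- check_and_dispatch(Ii, output = "LISA")
```

Giving switch a final unnamed argument provides a fallback for unrecognised output values, which would otherwise return NULL silently.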
Function Application
First, the output “Scatter” produces a scatter plot of the lagged values against the
observed values from the test. The upper right and lower left quadrants contain areas with
positive spatial autocorrelation, surrounded by high and low values respectively, whereas
the upper left and lower right quadrants contain areas with negative spatial
autocorrelation, surrounded by high and low values respectively. In Figure 5, no obvious
correlation can be determined for unemployment in Southwark, which calls for deeper
analysis. The output option “Moran” can map several of the local Moran’s I estimates;
Figure 6 depicts three maps printed by varying the “stats” option. In the Moran’s I map in
Figure 6.1, clustering is most visible in the southern part of Southwark. Clustering is less
overt throughout the northern part, with occasional signs of strong dispersion. Moreover,
Figure 6.3 shows that the areas of dispersion correspond to higher p-values. This suggests
that they are less statistically significant than the areas where similar unemployment
levels are clustered. Together with the higher z-scores in clustered regions shown in
Figure 6.2, this allows the null hypothesis to be rejected, suggesting that unemployment is
positively spatially autocorrelated in Southwark (). The third output option, “LISA”, goes
further and explores whether the clustering is of high or low values. As Figure 7 shows,
low-low clusters appear at the very top and bottom of Southwark, whereas occasional
high-high clusters are more dispersed throughout the borough. This means that Southwark’s
unemployment levels show a geographical pattern of clustering that is statistically
significant, particularly in the south of Southwark.
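The quadrant reading used for the scatter plot and the LISA map can be made concrete with a small base R sketch. The values and the helper moran_quadrant are hypothetical and only illustrate the labelling rule; a LISA map would normally also filter areas by statistical significance, which is not shown here.

```r
# Hypothetical centred attribute values and centred lag values for five areas.
attr_c <- c(1.5, -0.8, -1.2,  0.9, 0.3)
lag_c  <- c(0.7, -0.5,  0.6, -0.4, 0.2)

# Hypothetical helper: label each area by its quadrant in the Moran scatter plot
# (attribute value on the horizontal axis, lag value on the vertical axis).
moran_quadrant <- function(attr_c, lag_c) {
  ifelse(attr_c >= 0 & lag_c >= 0, "high-high",   # upper right
  ifelse(attr_c <  0 & lag_c <  0, "low-low",     # lower left
  ifelse(attr_c <  0 & lag_c >= 0, "low-high",    # upper left
                                   "high-low")))  # lower right
}

quadrants <- moran_quadrant(attr_c, lag_c)
table(quadrants)
```

Areas labelled high-high or low-low contribute to positive spatial autocorrelation, while low-high and high-low areas indicate dispersion.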
adjustable weighting system and neighbour relationships increase applicability to
studies. For example, researchers studying the impacts of disease, where distance is an
influencing factor, can adjust the method of determining relationships (Kang et al., 2020).
The customizable distance also allows the function to be used on wider and more complex
spatial polygons, such as studies at the national level or higher (Kim et al., 2003).
Further benefits derive from the customizable requirement that the local Moran’s value fall
between “p1” and “p2”, as it safeguards the accuracy of results by flagging errors. The
bandwidth has been left open since accurate results do not always fall exactly between -1
and 1. Together, these factors show that the assumptions of different studies can be
reflected in the function through its adjustable parameters, achieving the goal of creating
an adaptive function that synthesizes the Moran’s I test process.
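Why the bandwidth cannot be fixed at exactly -1 and 1 follows from the usual definition of the statistic. As a sketch in standard notation (the symbols are not taken from the function's code, and the scaling by m_2 assumes the common convention of dividing by n), the local Moran’s I for area i with weights w_ij is:

```latex
I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j \neq i} w_{ij}\,(x_j - \bar{x}),
\qquad
m_2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{x})^2
```

Because I_i is the product of a scaled deviation and a weighted average of neighbouring deviations rather than a correlation coefficient, an area with an extreme value among similar neighbours can produce a value whose magnitude exceeds 1 even when the weights are correctly row-standardised.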
Appendix
Figure 2. Function Inputs
cluster_scope(datafile, variable, weight, method, p1, p2, cust, distance.x, distance.y, output, stats)
Variables               Parameter Customizations
weight                  “minmax” = ‘divides the weights by the minimum of the maximum row sums and
                        maximum column sums of the input weights’ (Bivand, RDocumentation)
p1, p2                  Specify the bandwidth “Ii” can fall within before an error message is
                        printed, usually between -1 and 1.
cust                    Specify whether neighbours are defined by rook’s or queen’s case when using
                        method = “Def”, where TRUE signals queen’s case and FALSE signals rook’s case.
distance.x, distance.y  Specify the distance threshold used to define neighbours when using
                        method = “Dist”, where distance.x is the minimum distance and distance.y is
                        the maximum distance of the threshold.
stats                   Specify which estimate from the local Moran’s I test is mapped with the
                        “Moran” map (first output option):
output                  Specify the map you want to produce:
                        “Moran” = a choropleth map of local Moran’s I estimates
                        “LISA” = a local indicators of spatial association map
                        “Scatter” = scatter plot of attribute values and lag values
abline(h = 0, lty = 2)
abline(v = 0, lty = 2)
xy.lm <- lm(Lag_Values ~ Attribute_Values)
abline(xy.lm, lty=3)}
#Map Output Options
switch(output, "Moran" = print(Map1), "LISA" = Map2(datafile), "Scatter" = Map3(variable)) } }
library(rgeos)
library(tmap)
library(spdep)
Figure 5 Local Moran’s I Scatter Plot
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0,
              method = "Dist", cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "Scatter", stats = "Ii")
Figure 6.1 Local Moran’s I Estimate
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0,
              method = "Dist", cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "Moran", stats = "Ii")
Figure 6.2 Z-Score
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0,
              method = "Dist", cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "Moran", stats = "Z.Ii")
## Legend labels were too wide. The labels have been resized to 0.51, 0.59, 0.63, 0.63, 0.57.
## Increase legend.width (argument of tm_layout) to make the legend wider and therefore the
## labels larger.
Figure 6.3 P-Value
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0,
              method = "Dist", cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "Moran", stats = "Pr(z != E(Ii))")
Figure 7 LISA Map
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0,
              method = "Dist", cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "LISA", stats = "Ii")
References
Anselin, L. (2020). Local Spatial Autocorrelation (1): LISA and Local Moran. GeoDa: An
Introduction to Spatial Data Science. Available at:
https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html#significance-and-interpretation
Anselin, L. (1992). SpaceStat Tutorial: A Workbook for Using SpaceStat in the Analysis of
Spatial Data. University of Illinois, Urbana-Champaign.
Bivand, R., Müller, W.G., Reder, M. (2009). Power calculations for global and local Moran’s I.
In: Computational Statistics & Data Analysis, Volume 53, Issue 8. Pages 2859-2872. ISSN
0167-9473. https://doi.org/10.1016/j.csda.2008.07.021
Bivand, R. (spdep version 1.2-7). poly2nb: Construct neighbours list from polygon list.
RDocumentation. Available at:
https://www.rdocumentation.org/packages/spdep/versions/1.2-7/topics/poly2nb
Espada, R., Apan, A., McDougall, K. (2013). Understanding the January 2011 Queensland
flood: the role of geographic interdependency in flood risk assessment for urban
community. In: Australian and New Zealand Disaster and Emergency Management
Conference (ANZDMC 2013): Earth: Fire and Rain, 28-30 May 2013, Brisbane, Australia.
Humphrey, A.L., Wilson, B.C., Reddy, M., Shroba, J.A., Ciaccio, C.E. (2015). An Association
Between Pediatric Food Allergy and Food Deserts. The Journal of Allergy and Clinical
Immunology. Elsevier Inc. https://doi.org/10.1016/j.jaci.2014.12.1774
Kang, D., Choi, H., Kim, J.H., Choi, J. (2020). Spatial epidemic dynamics of the COVID-19
outbreak in China. In: International Journal of Infectious Diseases, Volume 94. Pages
96-102. ISSN 1201-9712. https://doi.org/10.1016/j.ijid.2020.03.076
Kim, J., Elliot, E., Wang, D.M. (2003). A spatial analysis of county-level outcomes in US
Presidential elections: 1988–2000. Electoral Studies, Volume 22, Issue 4. Pages 741-761.
ISSN 0261-3794. https://doi.org/10.1016/S0261-3794(02)00008-2
Legendre, P. (1993). Spatial Autocorrelation: Trouble or New Paradigm? Ecology, 74:
1659-1673. https://doi.org/10.2307/1939924
Levine, N. (2008). CrimeStat: A Spatial Statistical Program for the Analysis of Crime
Incidents. In: Shekhar, S., Xiong, H. (eds) Encyclopedia of GIS. Springer, Boston, MA.
https://doi.org/10.1007/978-0-387-35973-1_229
Netrdová, A., Nosek, V. (2017). Exploring the variability and geographical patterns of
population characteristics: Regional and spatial perspectives. In: Moravian Geographical
Reports, 25(2): 85-94. Institute of Geonics, The Czech Academy of Sciences. doi:
10.1515/mgr-2017-0008
Watrel, R.H., Weichelt, R., Davidson, F.M., Heppen, J., Fouberg, E.H., Archer, J.C., Morrill, R.L.,
Shelley, F.M., Martis, K.C. (2018). Atlas of the 2016 Election. Rowman & Littlefield. London,
United Kingdom.