Functions Exam

Uploaded by Johnathan Chan

Function Manual

Visualizing Spatial Autocorrelation

Chak Long Chan


21st November, 2022
Table of Contents

Rationale
Function Design
Function Application
Potential and Limitations
Appendix
Figure 1. Function Flow Chart
Figure 2. Function Inputs
Figure 3. R script of cluster_scope
Figure 4. R script of cluster_scope applied to Southwark data
Figure 5. Local Moran’s I Scatter Plot
Figure 6.1. Local Moran’s I Estimate
Figure 6.2. Z-Score
Figure 6.3. P-Value
Figure 7. LISA Map
References

Rationale

Spatial autocorrelation describes how closely the value of a variable at one location resembles the values at neighbouring locations across a given space. It complicates statistical testing because autocorrelated observations violate the assumption of independence on which many classical tests rely (Legendre, 1993). Local Moran’s I is a widely used statistic for detecting it. For each target feature, the deviation of its value from the mean is multiplied by the weighted deviations of its neighbours. These cross products are summed and scaled by the variance to give an estimate (Anselin, 1992).

The sign of the estimate has a clear reading:

- A positive value means the feature is surrounded by similar values (a cluster).
- A negative value means it is surrounded by dissimilar values (a spatial outlier).
- A value near 0 means there is no local association.

With row-standardised weights the estimates often fall roughly between -1 and 1, but not always. Binding these statistics to spatial polygon data frames makes it possible to draw choropleth maps and scatter plots. These show where a variable clusters and whether the clusters are surrounded by neighbours with similar values.

Local Moran’s I has yielded insight in numerous studies, including research on the geographical patterns of food allergy, election choices and flood risk (Humphrey et al., 2015; Watrel et al., 2018; Espada et al., 2013). However, the computational difficulty of the underlying calculations has limited the efficiency of visualizing spatial data (Bivand et al., 2009). Together, these two factors support building a function that applies to many studies and reduces the computational effort required of researchers. As a result, cluster_scope aims to streamline local Moran’s I testing. It gives users a single adaptive function that can visualize spatial patterns and relationships in several complementary ways.
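The estimate described above can be illustrated with a minimal base-R sketch. It assumes the common form I_i = (z_i / m2) Σ_j w_ij z_j, where z_i is the deviation of area i from the mean and m2 = Σ z_i² / n. The helper local_moran_sketch() and the five-area data are hypothetical. They are not part of cluster_scope, which calls spdep’s localmoran() instead.

```r
# Hypothetical helper (not part of cluster_scope): local Moran's I for each
# area, assuming I_i = (z_i / m2) * sum_j w_ij * z_j with m2 = sum(z^2) / n.
local_moran_sketch <- function(x, W) {
  z <- x - mean(x)              # deviations from the mean
  m2 <- sum(z^2) / length(x)    # second central moment, divisor n
  lag_z <- as.vector(W %*% z)   # weighted sum of neighbours' deviations
  (z / m2) * lag_z
}

# Hypothetical toy data: five areas in a row, each bordering the areas
# immediately to its left and right.
x <- c(10, 12, 11, 2, 1)
B <- matrix(0, nrow = 5, ncol = 5)
for (i in 1:4) {
  B[i, i + 1] <- 1
  B[i + 1, i] <- 1
}
W <- B / rowSums(B)   # row-standardised weights (style "W")

Ii <- local_moran_sketch(x, W)
print(round(Ii, 3))
```

In this toy example the last area gets an estimate of about 1.45. This shows why local Moran’s I values are not strictly bounded by -1 and 1.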

Function Design

Before the function can be run, users must complete two preparatory steps:

1. Load the R packages specified in Figure 2 of the Appendix.
2. Load a shapefile and its associated attribute file, then merge the two on a shared area code (such as LSOA, MSOA, OA or ward codes). This lets R match each attribute record to its output area.

The function is then ready to run. Users must supply two inputs:

- datafile: the dataset created by merging the shapefile and attribute data.
- variable: the attribute of interest within that dataset.

Users may also customize the following parameters:

1. weight determines how the neighbours of each target area are weighted.
2. p1 and p2 set the upper and lower bounds of the range within which the local Moran’s I estimates must fall.
3. method uses a switch function to select how neighbours are defined. Depending on the choice, users then set either item 4 or item 5.
4. cust determines whether contiguity is defined by the queen’s or the rook’s case.
5. distance.x and distance.y set the minimum and maximum distance thresholds for defining neighbours.
6. output specifies which visual diagram is printed.
7. stats specifies which statistic from the local Moran’s I test is mapped by the first output.

More information can be found in Figure 2 of the Appendix.
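The distance-band option and the “B” and “W” weighting styles can be sketched in base R as follows. distance_band_weights() is a hypothetical helper written for illustration. It only approximates the idea behind spdep’s dnearneigh() and nb2listw() and is not their implementation. The coordinates are made up.

```r
# Hypothetical helper (illustration only): distance-band neighbours between a
# lower and an upper threshold, the roles played by distance.x and distance.y.
distance_band_weights <- function(coords, d1, d2, style = c("B", "W")) {
  style <- match.arg(style)
  D <- as.matrix(dist(coords))   # Euclidean distances between points
  B <- (D >= d1 & D <= d2) * 1   # 1 if the pair lies within the band
  diag(B) <- 0                   # an area is not its own neighbour
  if (style == "B") {
    return(B)                    # basic binary coding
  }
  rs <- rowSums(B)
  B / pmax(rs, 1)                # row-standardise; islands keep all-zero rows
}

# Hypothetical centroids (metres): three nearby areas and one isolated area.
coords <- cbind(x = c(0, 1000, 2500, 10000), y = c(0, 0, 0, 0))
Bm <- distance_band_weights(coords, d1 = 0, d2 = 2000, style = "B")
Wm <- distance_band_weights(coords, d1 = 0, d2 = 2000, style = "W")
print(rowSums(Bm))   # number of neighbours per area
print(rowSums(Wm))   # 1 for areas with neighbours, 0 for the isolated area
```

The isolated fourth area ends up with no neighbours. This is why spdep functions such as nb2listw() need zero.policy = TRUE when a distance band leaves some areas without neighbours.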

Next, the function processes the user’s inputs and identifies neighbours within the dataset. It calculates the lag values and local Moran’s I statistics and binds the results to the dataset.

An if-else statement then catches errors in row standardization. If the estimates do not fall within the range set by p1 and p2 (roughly between -1 and 1), the function prints an error message. The message asks the user to adjust the weight, method, cust, distance.x and distance.y parameters. If the values stay within the range, the data are passed to a switch function whose outcome depends on the output input:

- output = “Moran” prints a choropleth map of a local Moran’s I statistic.
- output = “LISA” assigns each area to a quadrant according to whether it and its neighbours have high or low values. Areas with a p-value above 0.1 are marked insignificant. The function then colours each quadrant and prints the corresponding map.
- output = “Scatter” recentres the lag values and attribute values. It then draws a scatter plot with reference lines at the average values and a line of linear fit.

More information can be found in Figure 1 of the Appendix.
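The quadrant logic used for the “LISA” output can be isolated in a small base-R sketch. lisa_quadrant() is a hypothetical helper that mirrors the classification in Figure 3. It uses mean-centred values and lags and marks areas with a p-value above 0.1 as insignificant. The p-values below are invented for illustration, not produced by a test.

```r
# Hypothetical helper mirroring the LISA classification in cluster_scope:
# compare each area's centred value and centred lag, then mask areas whose
# p-value exceeds alpha. Areas with an exactly zero deviation stay "Insignificant".
lisa_quadrant <- function(x, lag_x, p, alpha = 0.1) {
  dx <- x - mean(x)
  dl <- lag_x - mean(lag_x)
  q <- rep("Insignificant", length(x))
  q[dx < 0 & dl < 0] <- "Low-low"
  q[dx < 0 & dl > 0] <- "Low-high"
  q[dx > 0 & dl < 0] <- "High-low"
  q[dx > 0 & dl > 0] <- "High-high"
  q[p > alpha] <- "Insignificant"
  q
}

# Hypothetical toy data: five areas in a row with row-standardised weights.
x <- c(10, 12, 11, 2, 1)
B <- matrix(0, nrow = 5, ncol = 5)
for (i in 1:4) {
  B[i, i + 1] <- 1
  B[i + 1, i] <- 1
}
lag_x <- as.vector((B / rowSums(B)) %*% x)   # spatially lagged values
p <- c(0.03, 0.02, 0.60, 0.20, 0.01)         # invented p-values
print(lisa_quadrant(x, lag_x, p))
```

The third area would fall in the high-low quadrant and the fourth in the low-low quadrant. Their invented p-values exceed 0.1, so both are reported as insignificant, as they would be on the map.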

Function Application

Cluster_scope can be used to explore whether there is a geographical pattern in unemployment rates in Southwark (2011); this process is depicted in Figure 4. The attribute data frame and Southwark’s shapefile are loaded separately into R and merged into one dataset by their output area code (“OA”). Supplying the required inputs and adjusting the parameters to fit the dataset yields several insightful outputs.

Firstly, the “Scatter” output plots each area’s mean-centred attribute value against its mean-centred spatially lagged value. The upper-right and lower-left quadrants indicate positive association: high values surrounded by high values, and low values surrounded by low values. The upper-left and lower-right quadrants indicate negative association: low values surrounded by high values, and high values surrounded by low values. Figure 5 shows no obvious correlation for unemployment in Southwark, so deeper analysis is needed.

The “Moran” output can map several statistics from the local Moran’s I test. Figure 6 shows three maps produced by changing the “stats” option. Figure 6.1’s local Moran’s I map shows that clustering is most pronounced in the southern part of Southwark. Clustering is less overt throughout the northern part, with occasional signs of strong dispersion. The areas of dispersion also correspond to higher p-values in Figure 6.3, suggesting that they are less statistically significant than the areas where similar unemployment levels cluster. Together with the higher z-scores in clustered regions shown in Figure 6.2, this means the null hypothesis can be rejected: unemployment is positively spatially autocorrelated in Southwark ().

The third output option, “LISA”, goes further by showing whether the clusters consist of high or low values. In Figure 7, low-low clusters occur at the northern and southern extremities of Southwark, whereas high-high areas are scattered more sparsely across the borough. Southwark’s unemployment levels therefore show a statistically significant geographical pattern of clustering, particularly in the south of the borough.
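One property may help when reading the scatter plot in Figure 5. With row-standardised weights (weight = “W”, as used for Southwark), the slope of the linear fit through the centred values equals the global Moran’s I. If local estimates are defined as I_i = (z_i / m2) Σ_j w_ij z_j, the global I also equals the mean of those local estimates. The sketch below checks both identities on hypothetical toy data in base R; it does not use the Southwark data.

```r
# Hypothetical toy data: five areas in a row with row-standardised weights.
x <- c(10, 12, 11, 2, 1)
B <- matrix(0, nrow = 5, ncol = 5)
for (i in 1:4) {
  B[i, i + 1] <- 1
  B[i + 1, i] <- 1
}
W <- B / rowSums(B)

# Centred values and centred lags, as plotted by the "Scatter" output.
xc <- x - mean(x)
lag_x <- as.vector(W %*% x)
lc <- lag_x - mean(lag_x)
slope <- unname(coef(lm(lc ~ xc))["xc"])

# Global Moran's I with row-standardised weights: sum(z * Wz) / sum(z^2).
global_I <- sum(xc * (W %*% xc)) / sum(xc^2)

# Local estimates I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum(z^2) / n.
Ii <- (xc / (sum(xc^2) / length(x))) * as.vector(W %*% xc)

print(c(slope = slope, global_I = global_I, mean_Ii = mean(Ii)))
```

Here the slope is about 0.605, so the toy data show positive global spatial autocorrelation. A fit line close to horizontal would instead suggest little global association.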

Potential and Limitations

As demonstrated, combining several analytical outputs benefits the researcher. A local Moran’s I map does show geographical variation, but a positive value does not necessarily indicate a hotspot: it can equally arise from a cluster of low values (Levine, 2008). The LISA map addresses this by showing the researcher whether each cluster consists of high or low values.

The many customization options available to the user further increase the function’s applicability:

- Maps are produced through a switch function, so users can easily display different visual representations of the same underlying test.
- The adjustable weighting scheme and neighbour relationships make the function suitable for a wider range of studies. For example, researchers studying the spread of disease, where distance is an influencing factor, can change how neighbour relationships are determined (Kang et al., 2020).
- The customizable distance thresholds let the function be applied to larger and more complex sets of spatial polygons, such as studies at the national level or above (Kim et al., 2003).
- The customizable requirement that local Moran’s I values fall between p1 and p2 helps guard against errors by flagging results outside the expected range. This range was left adjustable because valid results do not always fall exactly between -1 and 1.

Together, these features let the function reflect the assumptions of different studies through its adjustable parameters. This meets the goal of an adaptive function that streamlines the local Moran’s I testing process.

Nevertheless, cluster_scope has potential limitations. Its analytical parameters are customizable, but the maps themselves are not, which makes them harder to integrate into research papers and publications. Several limitations may also affect the accuracy of the function:

1. The local Moran’s I maps may be skewed by how values are allocated to colours. The key in Figure 6.1 shows colour classes spanning very different ranges of values. This happens because the quantile classification places roughly equal numbers of areas in each class regardless of how widely their values are spread. The result can misrepresent the data by exaggerating small differences or masking large ones (Levine, 2008).
2. Choosing an appropriate p-value threshold for statistical significance is methodologically problematic. Because of the “multiple comparisons problem”, treating all values below 0.05 as significant may not suit every study (Anselin, 2020).
3. The Modifiable Areal Unit Problem (MAUP) and the checkerboard problem may also undermine the integrity of results from this function (Netrdová and Nosek, 2017). The MAUP concerns the use of man-made boundaries to aggregate data. Because results depend on where these boundaries are drawn, genuinely significant patterns may be obscured and statistical error increases. The checkerboard problem concerns how accurately neighbourhood relationships are represented. For example, output areas on the border of Southwark may have similar neighbours in adjoining boroughs, but the analysis does not account for these neighbours.

Hence, the function leaves room for inaccuracies arising from flawed assumptions, which could lead to a wrongful rejection of the null hypothesis.
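The multiple comparisons concern can be made concrete with base R’s p.adjust(), which implements corrections such as Bonferroni and Benjamini-Hochberg (false discovery rate). The p-values below are invented for illustration. cluster_scope, as listed in Figure 3, compares unadjusted p-values with a fixed threshold and does not call p.adjust(), so any correction would be an extra step for the user.

```r
# Invented p-values for five areas (illustration only).
p <- c(0.001, 0.008, 0.02, 0.045, 0.2)
alpha <- 0.05

p_bonf <- p.adjust(p, method = "bonferroni")  # controls the family-wise error rate
p_bh   <- p.adjust(p, method = "BH")          # controls the false discovery rate

print(data.frame(raw = p, bonferroni = p_bonf, BH = p_bh))
print(c(raw = sum(p < alpha), bonferroni = sum(p_bonf < alpha), BH = sum(p_bh < alpha)))
```

Stricter corrections flag fewer areas as significant: four, then three, then two in this example. The choice of correction can therefore change which clusters appear on a LISA map.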

Appendix

Figure 1. Function Flow Chart

Figure 2. Function Inputs
cluster_scope(datafile, variable, weight, method, p1, p2, cust, distance.x, distance.y, output, stats)

Required Libraries: sp, spdep, rgeos, rgdal, tmap

Variables

datafile      A merged data frame containing the shapefile and its attributes.

variable      The attribute of interest within datafile, entered as datafile$attribute.

Parameter Customizations

weight        Specify the method of assigning weights to neighbours:
              ‘“B” = the basic binary coding,
              “W” = row standardised (sums over all links to n),
              “C” = globally standardised (sums over all links to n),
              “U” = C divided by the number of neighbours (sums over all links to unity)
              “S” = variance-stabilizing coding scheme (sums over all links to n)
              “minmax” = divides the weights by the minimum of the maximum row sums and maximum column sums of the input weights’ (Bivand, RDocumentation)

p1, p2        Specify the upper (p1) and lower (p2) bounds within which “Ii” must fall before an error message is printed; usually 1 and -1.

method        Specify the neighbour relationship:
              “Dist” = use a distance threshold to define neighbours
              “Def” = use rook’s or queen’s case to define neighbours

cust          Specify whether neighbours are defined by rook’s or queen’s case when using method = “Def”: TRUE signals the queen’s case, FALSE signals the rook’s case.

distance.x    Specify the distance threshold used to define neighbours when using method = “Dist”:
distance.y    distance.x is the minimum and distance.y is the maximum of the threshold.

stats         Specify which statistic from the local Moran’s I test is mapped by the “Moran” output (first output option):
              “Ii” = map local Moran’s estimate
              “E.Ii” = map expectation of local Moran’s estimate
              “Var.Ii” = map variance of local Moran’s estimate
              “Z.Ii” = map z-score
              “Pr(z != E(Ii))” = map p-value

output        Specify the map you want to produce:
              “Moran” = a choropleth map of local Moran’s I estimates
              “LISA” = a local indicators of spatial association map
              “Scatter” = scatter plot of attribute values and lag values

Figure 3. R script of cluster_scope


cluster_scope <- function(datafile, method, cust, p1, p2, distance.x, distance.y,
                          weight, variable, stats, output) {
  # Customize neighbours: distance band ("Dist") or contiguity ("Def")
  neighbours <- switch(method,
                       "Dist" = dnearneigh(coordinates(datafile), distance.x, distance.y),
                       "Def" = poly2nb(datafile, queen = cust),
                       stop("method must be \"Dist\" or \"Def\""))
  # Determine weighting; zero.policy = TRUE tolerates areas with no neighbours
  listw <- nb2listw(neighbours, style = weight, zero.policy = TRUE)
  # Calculate local Moran's I and bind the results to the spatial data
  local <- localmoran(x = variable, listw = listw, zero.policy = TRUE)
  moran.map <- datafile
  moran.map@data <- cbind(datafile@data, local)
  # Spatially lagged values of the attribute
  Lag <- lag.listw(listw, variable, zero.policy = TRUE)
  # Print a message and return early if the estimates fall outside [p2, p1]
  if (max(moran.map$Ii) > p1 || min(moran.map$Ii) < p2) {
    print("Local Moran's I Value does not fall between specified range, please try row standardizing or adjust neighbour relationships")
    return(invisible(NULL))
  }
  # Map 1: choropleth of the statistic selected by stats
  Map1 <- tm_shape(moran.map) + tm_fill(col = stats, style = "quantile", title = "Local Test")
  # Map 2: LISA cluster map
  Map2 <- function(datafile) {
    # Create LISA quadrants from mean-centred values and lags
    quadrant <- vector(mode = "numeric", length = nrow(local))
    m.Rate_num <- variable - mean(variable)
    m.Lag <- Lag - mean(Lag, na.rm = TRUE)
    signif <- 0.1
    quadrant[m.Rate_num < 0 & m.Lag < 0] <- 1   # low-low
    quadrant[m.Rate_num < 0 & m.Lag > 0] <- 2   # low-high
    quadrant[m.Rate_num > 0 & m.Lag < 0] <- 3   # high-low
    quadrant[m.Rate_num > 0 & m.Lag > 0] <- 4   # high-high
    quadrant[local[, 5] > signif] <- 0          # column 5 is the p-value; p > 0.1 is insignificant
    brks <- c(0, 1, 2, 3, 4)
    colors <- c("white", "blue", rgb(0, 0, 1, alpha = 0.4), rgb(1, 0, 0, alpha = 0.4), "red")
    plot(datafile, border = "lightgray",
         col = colors[findInterval(quadrant, brks, all.inside = FALSE)])
    legend("bottomleft",
           legend = c("Insignificant", "Low-low", "Low-high", "High-low", "High-high"),
           fill = colors, bty = "n")
  }
  # Map 3: Moran scatter plot with reference lines at the means and a linear fit
  Map3 <- function(variable) {
    Attribute_Values <- variable - mean(variable)
    Lag_Values <- Lag - mean(Lag)
    plot(Attribute_Values, Lag_Values)
    abline(h = 0, lty = 2)
    abline(v = 0, lty = 2)
    xy.lm <- lm(Lag_Values ~ Attribute_Values)
    abline(xy.lm, lty = 3)
  }
  # Map output options
  switch(output,
         "Moran" = print(Map1),
         "LISA" = Map2(datafile),
         "Scatter" = Map3(variable))
}

Figure 4. R script of cluster_scope applied to Southwark data


library(sp)
library(rgdal)
library(rgeos)
library(tmap)
library(spdep)

#Load relevant data files
southwark.OA <- readOGR("/Users/jfrickinc/Desktop/worksheet_data/Southwark/shapefiles", "Southwark_oa11")
southwark.census <- read.csv("/Users/jfrickinc/Desktop/worksheet_data/Southwark/practical_data_Southwark.csv")

#Merge Data
SOA.Census <- merge(southwark.OA, southwark.census, by.x = "OA11CD", by.y = "OA")

Figure 5. Local Moran’s I Scatter Plot
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0, method = "Dist",
              cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "Scatter", stats = "Ii")

Figure 6.1. Local Moran’s I Estimate
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0, method = "Dist",
              cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "Moran", stats = "Ii")

## Variable(s) "Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Figure 6.2. Z-Score
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0, method = "Dist",
              cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "Moran", stats = "Z.Ii")

## Variable(s) "Z.Ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

## Legend labels were too wide. The labels have been resized to 0.51, 0.59, 0.63, 0.63, 0.57. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

Figure 6.3. P-Value
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0, method = "Dist",
              cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "Moran", stats = "Pr(z != E(Ii))")

Figure 7. LISA Map
cluster_scope(datafile = SOA.Census, p1 = 1.2, p2 = -1.2, distance.x = 0, method = "Dist",
              cust = TRUE, distance.y = 2000, weight = "W",
              variable = SOA.Census$Unemployed, output = "LISA", stats = "Ii")

References
Anselin, L. (1992). SpaceStat Tutorial: A Workbook for Using SpaceStat in the Analysis of Spatial Data. University of Illinois, Urbana-Champaign.
Anselin, L. (2020). Local Spatial Autocorrelation (1): LISA and Local Moran. In: An Introduction to Spatial Data Science. GeoDa. Available at: https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html#significance-and-interpretation
Bivand, R., Müller, W.G., Reder, M. (2009). Power calculations for global and local Moran’s I. Computational Statistics & Data Analysis, 53(8), pp. 2859-2872. ISSN 0167-9473. https://doi.org/10.1016/j.csda.2008.07.021
Bivand, R. (spdep version 1.2-7). poly2nb: Construct neighbours list from polygon list. RDocumentation. Available at: https://www.rdocumentation.org/packages/spdep/versions/1.2-7/topics/poly2nb
Espada, R., Apan, A., McDougall, K. (2013). Understanding the January 2011 Queensland flood: the role of geographic interdependency in flood risk assessment for urban community. In: Australian and New Zealand Disaster and Emergency Management Conference (ANZDMC 2013): Earth: Fire and Rain, 28-30 May 2013, Brisbane, Australia.
Humphrey, A.L., Wilson, B.C., Reddy, M., Shroba, J.A., Ciaccio, C.E. (2015). An Association Between Pediatric Food Allergy and Food Deserts. The Journal of Allergy and Clinical Immunology. Elsevier Inc. https://doi.org/10.1016/j.jaci.2014.12.1774
Kang, D., Choi, H., Kim, J.H., Choi, J. (2020). Spatial epidemic dynamics of the COVID-19 outbreak in China. International Journal of Infectious Diseases, 94, pp. 96-102. ISSN 1201-9712. https://doi.org/10.1016/j.ijid.2020.03.076
Kim, J., Elliott, E., Wang, D.M. (2003). A spatial analysis of county-level outcomes in US Presidential elections: 1988–2000. Electoral Studies, 22(4), pp. 741-761. ISSN 0261-3794. https://doi.org/10.1016/S0261-3794(02)00008-2
Legendre, P. (1993). Spatial Autocorrelation: Trouble or New Paradigm? Ecology, 74, pp. 1659-1673. https://doi.org/10.2307/1939924
Levine, N. (2008). CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents. In: Shekhar, S., Xiong, H. (eds) Encyclopedia of GIS. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-35973-1_229
Netrdová, A., Nosek, V. (2017). Exploring the variability and geographical patterns of population characteristics: Regional and spatial perspectives. Moravian Geographical Reports, 25(2), pp. 85-94. Institute of Geonics, The Czech Academy of Sciences. https://doi.org/10.1515/mgr-2017-0008
Watrel, R.H., Weichelt, R., Davidson, F.M., Heppen, J., Fouberg, E.H., Archer, J.C., Morrill, R.L., Shelley, F.M., Martis, K.C. (2018). Atlas of the 2016 Election. Rowman & Littlefield. London, United Kingdom.
