Clustering of Census Recorded Ethnic Background
Introduction
This work will hopefully form a foundation upon which a more fully-featured tool might be developed, allowing local public health teams to better understand their resident populations and to design services which are both acceptable to and appropriate for those groups.
A more detailed discussion of health and ethnicity in the UK can be found here:
http://www.parliament.uk/documents/post/postpn276.pdf
Load Packages
https://rstudio-pubs-static.s3.amazonaws.com/346625_9b7a90358ca44d0b89db512afedc63b2.html#create_a_neighbourhood 1/40
23/10/2020 Clustering of Census Recorded Ethnic Background
library("tidyverse")
library("dplyr")
library("rgdal")
library("extrafont")
library("sp")
library("maptools")
library("rgeos")
library("MASS")
library("raster")
library("broom") # contains the tidy function which now replaces the fortify function for ggplot
library("viridis") # For nicer ggplot colours
library("spdep")
library("gridExtra")
library("Cairo")
library("RSQLite")
In this instance I use an API to pull the data I need directly at Output Area level. I'm currently downloading data only for Slough (ONS code E06000039). Eventually I would like to download data for the entire country at this level of detail, as this would allow identification of clusters which span authority boundaries (i.e. edge effects).
For now we will look at a single BME ethnic group. The NOMIS census code for this group is 132 (the detailed NOMIS ethnicity codelist is available from https://www.nomisweb.co.uk/api/v01/dataset/NM_575_1/cell.def.htm).
# Unzip
unzip(zipfile="data/OA11_WD11_LAD11_EW_LU.zip", exdir="data")
# Trim out the fields we don't need in the LAD lookup
oa_lad_lkp <- oa_lad_lkp %>% dplyr::select(OA11CD, LAD11CD, LAD11NM)
Now the resident population counts and the Local Authority code can be joined to the census ethnicity table, and the proportion of each Output Area's residents belonging to the group can be calculated.
ethnicity_detailed <- ethnicity_detailed %>%
  left_join(pop_detailed, by="GEOGRAPHY_CODE") %>%
  left_join(oa_lad_lkp, by=c("GEOGRAPHY_CODE"="OA11CD")) %>%
  filter(!is.na(LAD11CD)) %>%
  mutate(
    CELL_NAME = gsub("[.]", " ", CELL_NAME),
    POP_PROPORTION = OBS_VALUE/POPULATION)
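The join-and-divide step above can be illustrated with base R's `merge()`, using tiny hypothetical tables whose column names mirror those used here (the values are invented, not census counts). `all.x = TRUE` reproduces the left joins, and rows without a Local Authority match are dropped just as `filter(!is.na(LAD11CD))` does.

```r
# Hypothetical toy tables mirroring the column names used above
ethnicity <- data.frame(
  GEOGRAPHY_CODE = c("OA1", "OA2", "OA3"),
  OBS_VALUE      = c(12, 30, 5)        # residents in the ethnic group (invented)
)
population <- data.frame(
  GEOGRAPHY_CODE = c("OA1", "OA2", "OA3"),
  POPULATION     = c(300, 250, 100)    # all usual residents (invented)
)
lookup <- data.frame(
  OA11CD  = c("OA1", "OA2"),           # OA3 deliberately has no Local Authority match
  LAD11CD = c("LA1", "LA1")
)

# Left joins, equivalent in effect to dplyr::left_join()
joined <- merge(ethnicity, population, by = "GEOGRAPHY_CODE", all.x = TRUE)
joined <- merge(joined, lookup, by.x = "GEOGRAPHY_CODE", by.y = "OA11CD", all.x = TRUE)

# Drop Output Areas without a Local Authority, then compute the proportion
joined <- joined[!is.na(joined$LAD11CD), ]
joined$POP_PROPORTION <- joined$OBS_VALUE / joined$POPULATION
print(joined)
```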
# Should be able to pull data from my postgres database using the rgdal package, but this doesn't want to work for some reason, so I'm using the postGIStools package instead
# dsn <- "PG:dbname=spatial_data_store host=localhost port=5432 user=postgres password=postgres"
# ogrListLayers(dsn)
library("RPostgreSQL")  # provides the PostgreSQL() driver used by dbConnect()
library("postGIStools") # provides get_postgis_query()

# Create connection to the postgis database where all my shapefiles are stored
con <- dbConnect(PostgreSQL(), dbname = "spatial_data_store", user = "postgres",
                 host = "localhost",
                 password = "postgres")

# Pull all the output areas from the postgis database for a specific local authority
oa_shp <- get_postgis_query(con,
                            "SELECT *
                             FROM output_area_december_2011_generalised_clipped_boundaries_in_eng
                             WHERE lad11cd='E06000039'",
                            geom_name = "geom")
plot_proportion <- ggplot(oa_shp_tidy, aes(long, lat, fill=POP_PROPORTION, group=id)) +
  geom_polygon(col="grey") +
  scale_fill_gradient2(low="white", high="red", labels=scales::percent) +
  labs(fill="Proportion") +
  coord_fixed() +
  theme_void() +
  theme(legend.position="bottom")

plot_numbers <- ggplot(oa_shp_tidy, aes(long, lat, fill=OBS_VALUE, group=id)) +
  geom_polygon(col="grey") +
  scale_fill_gradient2(low="white", high="blue") +
  labs(fill="Count") +
  coord_fixed() +
  theme_void() +
  theme(legend.position="bottom")
# Calculate a 500 metre margin around the shapefile to ensure the density hotspots aren't cut off on the map plot.
map_padding <- 500
map_padding <- c(-map_padding, map_padding,
                 -map_padding, map_padding)

{
  plot(raster_kde2d, col=magma(7))
  plot(oa_shp_data, add=TRUE, border="#555555")
}
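The code that builds `raster_kde2d` isn't shown here. Since MASS is loaded, one plausible route is `MASS::kde2d()` evaluated over the data extent widened by `map_padding`. The sketch below assumes that approach and uses randomly generated coordinates in place of the Slough dots, so treat it as an illustration rather than the original code.

```r
library(MASS)  # for kde2d(); MASS is distributed with R as a recommended package

set.seed(42)
# Invented point locations in metres, standing in for the dots placed in each Output Area
x <- c(rnorm(200, mean = 495000, sd = 300), rnorm(100, mean = 497000, sd = 500))
y <- c(rnorm(200, mean = 180000, sd = 300), rnorm(100, mean = 181000, sd = 500))

# Padding vector in the same c(-p, p, -p, p) form as map_padding above,
# added to the data extent c(xmin, xmax, ymin, ymax)
map_padding <- 500
map_padding <- c(-map_padding, map_padding, -map_padding, map_padding)
lims <- c(range(x), range(y)) + map_padding

# Bivariate normal kernel density estimate on a 100 x 100 grid
kde <- kde2d(x, y, n = 100, lims = lims)

# kde is a list with grid coordinates (x, y) and a density matrix (z)
image(kde, col = hcl.colors(7, "Inferno"), asp = 1)
```

Widening `lims` rather than relying on the default `range(x)`/`range(y)` keeps the density surface from being clipped at the outermost points, which is the problem the 500 metre margin is meant to avoid.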
ggplot() +
  #stat_density_2d(data=as.data.frame(coordinates(x)), aes(x=x, y=y, fill = ..density..), geom = "raster", contour = FALSE) +
  geom_polygon(data=slough_shp_tidy,
               aes(long, lat, group=id),
               fill=NA,
               colour="white",
               size=1) +
  geom_raster(data=as.data.frame(raster_kde2d,
                                 xy=TRUE),
              aes(x, y, fill=layer)) +
  scale_fill_viridis(option="magma",
                     begin=0.1,
                     end=0.9,
                     guide = guide_colorbar(
                       direction = "horizontal",
                       #barheight = unit(30, units = "mm"),
                       #barwidth = unit(300, units = "mm"),
                       draw.ulim = FALSE,
                       title.position = 'top',
                       title.hjust = 0.5,
                       label.hjust = 0.5)) +
  geom_polygon(data=bld_slough_shp_tidy, aes(long, lat, group=id), fill="white", colour=NA, alpha=0.5) +
  geom_path(data=rd_slough_shp_tidy, aes(long, lat, group=id), colour="white", alpha=0.5) +
  geom_polygon(data=slough_shp_tidy, aes(long, lat, group=id), fill=NA, colour="white", size=0.25) +
  coord_fixed() +
  labs(x=NULL,
       y=NULL,
       fill="Density") +
  theme_void() +
  theme(legend.position = "bottom")
Create a Neighbourhood
First we create a neighbourhood object from our Output Area shapefile using the poly2nb function from spdep. We use the 'Queen's case' setting (poly2nb's default, queen=TRUE), meaning that areas which share either a border or a corner are counted as neighbours.
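To make the queen/rook distinction concrete, the sketch below uses a hypothetical helper, `grid_neighbours()`, on a 3 x 3 grid of square cells rather than real Output Area polygons. `poly2nb()` does the equivalent for arbitrary polygon shapes.

```r
# Hypothetical helper: neighbour list for an n_rows x n_cols grid of square cells.
# Cells are numbered column-major, as R does for matrices.
grid_neighbours <- function(n_rows, n_cols, queen = TRUE) {
  offsets <- expand.grid(dr = -1:1, dc = -1:1)
  offsets <- offsets[!(offsets$dr == 0 & offsets$dc == 0), ]  # drop the cell itself
  if (!queen) {
    # rook's case: keep only offsets that share an edge (no diagonal corners)
    offsets <- offsets[offsets$dr == 0 | offsets$dc == 0, ]
  }
  nb <- vector("list", n_rows * n_cols)
  for (id in seq_len(n_rows * n_cols)) {
    ri <- (id - 1) %% n_rows + 1    # row of this cell
    ci <- (id - 1) %/% n_rows + 1   # column of this cell
    rr <- ri + offsets$dr
    cc <- ci + offsets$dc
    ok <- rr >= 1 & rr <= n_rows & cc >= 1 & cc <= n_cols  # stay inside the grid
    nb[[id]] <- sort((cc[ok] - 1) * n_rows + rr[ok])
  }
  nb
}

queen_nb <- grid_neighbours(3, 3, queen = TRUE)
rook_nb  <- grid_neighbours(3, 3, queen = FALSE)
queen_nb[[5]]  # centre cell: all 8 surrounding cells
rook_nb[[5]]   # centre cell: only the 4 edge-sharing cells
```

The queen's case gives more neighbours per area, which tends to smooth the spatial lag slightly compared with the rook's case.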
{
  par(mar=c(0,0,0,0))
  plot(oa_shp_data,
       border="grey")
  plot(neighbourhood,
       coords=coordinates(oa_shp_data),
       col="red",
       add=TRUE)
}
##
## 	Monte-Carlo simulation of Moran I
##
## data:  oa_shp_data$POP_PROPORTION
## weights: neighbourhood_weights_list
## number of simulations + 1: 600
##
## statistic = 0.73272, observed rank = 600, p-value = 0.001667
## alternative hypothesis: greater
# Plot the distribution (note that this is a density plot instead of a histogram)
plot(moran_i_monte_carlo)
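The code that produced `moran_i_monte_carlo` is not shown, but the output (600 = 599 simulations + 1, p = 1/600 ≈ 0.001667) is what a permutation test reports when the observed statistic exceeds every permuted one. The base-R sketch below illustrates that logic on an invented 10 x 10 grid. It is not spdep's `moran.mc()` itself, and the grid, weights and values are hypothetical.

```r
# Global Moran's I for values x and an n x n spatial weights matrix W
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}

# Permutation test: reshuffle the values across areas nsim times
moran_perm_test <- function(x, W, nsim = 599) {
  observed  <- morans_i(x, W)
  simulated <- replicate(nsim, morans_i(sample(x), W))
  # one-sided ("greater") pseudo p-value: (permutations >= observed + 1) / (nsim + 1)
  p_value <- (sum(simulated >= observed) + 1) / (nsim + 1)
  list(statistic = observed, p_value = p_value, simulated = simulated)
}

# Hypothetical 10 x 10 grid with queen contiguity, row-standardised
coords <- expand.grid(col = 1:10, row = 1:10)
d <- as.matrix(dist(coords))
W <- (d > 0 & d < 1.5) * 1
W <- W / rowSums(W)

# Invented proportions: the left half of the grid is clearly higher
set.seed(1)
x <- ifelse(coords$col <= 5, 0.30, 0.05) + rnorm(100, sd = 0.02)

result <- moran_perm_test(x, W, nsim = 599)
result$statistic
result$p_value  # the minimum possible value is 1/600 with 599 permutations

# Density plot of the permuted statistics, with the observed value marked
plot(density(result$simulated), main = "Permutation distribution of Moran's I",
     xlim = range(c(result$simulated, result$statistic)))
abline(v = result$statistic, col = "red")
```

With 599 permutations the smallest attainable p-value is 1/600, so a result of 0.001667 means "more extreme than every permutation" rather than an exact probability.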
A local Moran's I statistic can now be calculated for each Output Area.
# Local Moran
LM_Results <- localmoran(oa_shp_data$POP_PROPORTION,
                         neighbourhood_weights_list,
                         p.adjust.method="bonferroni",
                         na.action=na.exclude,
                         zero.policy=TRUE)

summary(LM_Results)
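For intuition about what `localmoran()` returns, the sketch below computes the usual local Moran statistic, I_i = (z_i / m2) * sum_j w_ij z_j, with z = x - mean(x) and m2 = sum(z^2) / n, on an invented 5 x 5 grid. With row-standardised weights the local values average to the global Moran's I. This follows the standard formula as I understand it. spdep's exact variance convention and its p-values are not reproduced here, and `local_morans_i()` is a hypothetical helper.

```r
# Hypothetical helper: local Moran's I for values x and a spatial weights matrix W
local_morans_i <- function(x, W) {
  z  <- x - mean(x)
  m2 <- sum(z^2) / length(x)
  as.numeric((z / m2) * (W %*% z))
}

# Invented 5 x 5 grid with queen contiguity, row-standardised
coords <- expand.grid(col = 1:5, row = 1:5)
d <- as.matrix(dist(coords))
W <- (d > 0 & d < 1.5) * 1
W <- W / rowSums(W)
x <- ifelse(coords$col <= 2, 0.30, 0.05)  # invented proportions

Ii <- local_morans_i(x, W)
z  <- x - mean(x)
global_I <- sum(z * (W %*% z)) / sum(z^2)  # rows of W sum to 1, so n / sum(W) = 1
mean(Ii)                                   # matches global_I
```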
The results of the local Moran's I are merged back into the Output Area shapefile for plotting.
# manually make a moran plot based on standardised variables
# standardise variables and save to a new column (as.numeric drops the matrix returned by scale)
oa_shp_data$SCALED_POP_PROPORTION <- as.numeric(scale(oa_shp_data$POP_PROPORTION))

# create a lagged variable
oa_shp_data$LAGGED_SCALED_POP_PROPORTION <- lag.listw(neighbourhood_weights_list,
                                                      oa_shp_data$SCALED_POP_PROPORTION)
First we look at the relationship between population proportion and the spatially lagged values.
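The Moran scatterplot built from these two columns has a useful property. With row-standardised weights (what `nb2listw()` produces with `style = "W"`; I am assuming `neighbourhood_weights_list` was built that way) and standardised values, the least-squares slope of the lagged values on the standardised values equals the global Moran's I. Below is a base-R sketch on an invented 6 x 6 grid, with `W %*% z` standing in for `lag.listw()`.

```r
# Hypothetical 6 x 6 grid with queen contiguity, row-standardised
coords <- expand.grid(col = 1:6, row = 1:6)
d <- as.matrix(dist(coords))
W <- (d > 0 & d < 1.5) * 1
W <- W / rowSums(W)

set.seed(7)
x <- ifelse(coords$col <= 3, 0.25, 0.05) + rnorm(36, sd = 0.03)  # invented proportions

z     <- as.numeric(scale(x))  # standardised values (cf. SCALED_POP_PROPORTION)
z_lag <- as.numeric(W %*% z)   # spatial lag (cf. LAGGED_SCALED_POP_PROPORTION)

fit     <- lm(z_lag ~ z)
slope   <- unname(coef(fit)["z"])
moran_I <- sum(z * z_lag) / sum(z^2)  # global Moran's I when rows of W sum to 1

# Moran scatterplot: the fitted line's slope coincides with moran_I
plot(z, z_lag, xlab = "Standardised proportion", ylab = "Spatially lagged value")
abline(h = 0, v = 0, lty = 2)
abline(fit)
```

The identity holds because z has mean zero, so the regression slope reduces to sum(z * z_lag) / sum(z^2).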
Second, we can plot a set of comparison maps using ggplot to look at the clusters which are identified.
gg2 <- ggplot(oa_shp_data_tidy, aes(long, lat, fill=POP_PROPORTION, group=id)) +
geom_polygon(col="white") +
scale_fill_gradient(low="white",high="red", labels=scales::percent) +
coord_fixed() +
theme_void()
gg1
gg2
gg3
gg4
Now we can create a plot which shows only the statistically significant areas with a high proportion that are surrounded by other high-proportion areas (the "high-high" clusters).
ggplot() +
  geom_polygon(data=oa_shp_data_tidy, aes(long, lat, group=id), fill="grey", col="white") +
  geom_polygon(data=oa_shp_data_tidy_sig_high_high, aes(long, lat, group=id), fill="red", col="white") +
  coord_fixed() +
  theme_void()
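This map needs each Output Area labelled by cluster type. The construction of `lmoran_sig` and `oa_shp_data_tidy_sig_high_high` is not shown. One common approach, sketched below with a hypothetical helper `classify_lisa()` and invented values, assigns each area to a Moran scatterplot quadrant from its standardised value and spatial lag, then keeps the label only where the local Moran p-value falls below a threshold.

```r
# Hypothetical helper: label each area by its Moran scatterplot quadrant,
# keeping the label only where the local Moran p-value is below alpha
classify_lisa <- function(z, lag, p_value, alpha = 0.05) {
  quadrant <- ifelse(z >= 0 & lag >= 0, "high-high",
              ifelse(z < 0 & lag < 0, "low-low",
              ifelse(z >= 0, "high-low", "low-high")))
  ifelse(p_value < alpha, quadrant, "not significant")
}

# Invented standardised values, spatial lags and local Moran p-values
z       <- c( 1.2,   0.8,  -0.9,  -1.1,   0.5,  -0.3)
lag     <- c( 0.9,   0.7,  -0.6,   0.4,  -0.2,  -0.1)
p_value <- c(0.001, 0.03,  0.01,  0.20,  0.04,  0.60)

labels_raw  <- classify_lisa(z, lag, p_value)
labels_bonf <- classify_lisa(z, lag, p.adjust(p_value, method = "bonferroni"))

# Areas that would be highlighted in red on the map
which(labels_raw == "high-high")
```

Passing Bonferroni-adjusted p-values, in the spirit of the `localmoran()` call above, is much stricter: with these invented numbers only the first area remains a significant high-high cluster.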
GP Registration Numbers
It would be useful to know whether areas with low GP registration are related to the clusters identified above. For now we will only compare them visually.
# We lose a couple of hundred practices, mainly from Jersey, Guernsey and Northern Ireland
# drop those practices with null coordinates
gpp <- gpp %>% filter(!is.na(oseast1m))
# filter our practices to those within our local authority area using the over function to perform a point-in-polygon selection
point_in_polygon <- sp::over(gpp_shp, slough_shp)
point_in_polygon$row_number <- row.names(point_in_polygon)
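Here `sp::over()` handles the point-in-polygon matching. The underlying geometric test can be sketched with the even-odd (ray casting) rule. `in_polygon()` below is a hypothetical single-polygon helper with invented practice coordinates. Unlike `sp::over()`, it ignores holes, multiple polygons and coordinate reference systems.

```r
# Hypothetical helper: even-odd (ray casting) point-in-polygon test for one simple polygon.
# A horizontal ray from the point is checked against every edge; an odd number of
# crossings means the point lies inside.
in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n                                     # previous vertex (closes the ring)
  for (i in seq_len(n)) {
    straddles <- (poly_y[i] > py) != (poly_y[j] > py)
    if (straddles) {
      # x-coordinate where edge (j -> i) meets the horizontal line y = py
      x_cross <- poly_x[i] + (py - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Invented square "local authority" and three invented practice locations
square_x <- c(0, 10, 10, 0)
square_y <- c(0, 0, 10, 10)
practices <- data.frame(code = c("P1", "P2", "P3"),
                        x = c(5, 15, 2),
                        y = c(5, 5, 9.5))

practices$inside <- mapply(in_polygon, practices$x, practices$y,
                           MoreArgs = list(poly_x = square_x, poly_y = square_y))
practices[practices$inside, ]  # keeps P1 and P3
```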
Note that we will generate practice catchment areas and then select all of those which overlap our selected local authority.
## OGR data source with driver: ESRI Shapefile
## Source: "shp", layer: "Lower_Layer_Super_Output_Areas_December_2011_Generalised_Clipped__Boundaries_in_England_and_Wales"
## with 34753 features
## It has 6 fields
## Integer64 fields read as strings:  objectid
## # A tibble: 7,531 x 5
##    PRACTICE_CODE     n TOTAL_POP MEAN_LSOA_POP MEDIAN_LSOA_POP
##    <chr>         <int>     <int>         <dbl>           <dbl>
##  1 A81001           50      4178      83.56000             6.5
##  2 A81002           91     19902     218.70330           220.0
##  3 A81004          134      9344      69.73134            42.0
##  4 A81005           23      7931     344.82609           167.0
##  5 A81006          106     13661     128.87736            95.0
##  6 A81007           68      9834     144.61765           160.5
##  7 A81008          129      3973      30.79845             4.0
##  8 A81009          128      9084      70.96875            49.5
##  9 A81011           70     11723     167.47143           182.0
## 10 A81012          132      4778      36.19697            25.0
## # ... with 7,521 more rows
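The summary table's columns appear to be the number of LSOAs each practice draws patients from (n), the total registered patients, and the mean and median patients per LSOA. TOTAL_POP / n equals MEAN_LSOA_POP in the rows shown. Assuming a long table with one row per practice and LSOA, the same summary can be sketched in base R with `split()`. The table layout, practice codes and counts below are invented.

```r
# Hypothetical long table: registered patients per practice per LSOA (invented values)
reg <- data.frame(
  PRACTICE_CODE = c("P001", "P001", "P001", "P002", "P002"),
  LSOA_CODE     = c("L1", "L2", "L3", "L1", "L4"),
  PATIENTS      = c(120L, 30L, 6L, 200L, 240L)
)

# One summary row per practice, mirroring the columns of the tibble above
summ <- do.call(rbind, lapply(split(reg, reg$PRACTICE_CODE), function(d) {
  data.frame(
    PRACTICE_CODE   = d$PRACTICE_CODE[1],
    n               = nrow(d),          # number of LSOAs the practice draws from
    TOTAL_POP       = sum(d$PATIENTS),
    MEAN_LSOA_POP   = mean(d$PATIENTS),
    MEDIAN_LSOA_POP = median(d$PATIENTS)
  )
}))
rownames(summ) <- NULL
print(summ)
```

A large gap between the mean and median (as for A81001 above) suggests a practice whose patients are concentrated in a few LSOAs with a long tail of small counts elsewhere.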
# define a function to handle tidying (which used to be called fortifying) and then joining the data items back in
clean <- function(shape){
  shape@data$id <- rownames(shape@data)
  shape.points <- tidy(shape, region="id")
  shape.df <- inner_join(shape.points, shape@data, by="id")
  shape.df  # return the joined data frame explicitly
}
ggplot(data=lsoa_shp_pop_tidy,
       aes(long,
           lat,
           group=id,
           fill=Difference)) +
  geom_polygon(colour="white") +
  geom_text(aes(x, y, label=Number_of_Practices), color = "white", size=3) +
  scale_fill_gradient2(midpoint=0, low="red", mid="white", high="blue") +
  coord_fixed() +
  theme_void()
# ----------
median_ons_pop <- median(lsoa_pop_var$All.Ages, na.rm=TRUE)
# ----------
median_pat_pop_diff <- median(lsoa_pop_var$Difference, na.rm=TRUE)
# ----------
# x is the number of registrants and y the ONS population estimate, so the axis labels follow that order
ggplot(lsoa_pop_var, aes(All_Patients, All.Ages, colour=Difference)) +
  geom_point() +
  scale_color_gradient2(low="#2c7bb6", mid="#ffffbf", high="#d7191c") +
  geom_smooth(method="lm", se=TRUE, colour="black", linetype=3) +
  #geom_abline(intercept=0, slope=1) +
  theme_bw() +
  labs(x="Registrants",
       y="Population Estimate 2016",
       title="LSOA Population Estimates vs Number of Registrants"
  ) +
  coord_equal()
# ----------
median_pat_num_prac <- median(lsoa_pop_var$Number_of_Practices, na.rm=TRUE)