Geospatial Data II
This section provides an overview of geospatial data visualization, geospatial Python
libraries, and, most importantly, basic spatial analysis techniques such as point pattern
analysis, spatial interpolation, and spatial correlation models.
Spatial statistics
A) Spatial Autocorrelation
Spatial autocorrelation refers to the correlation of a variable with itself through space.
It quantifies the degree to which objects or values located near each other in
geographic space are similar or dissimilar.
• Positive Spatial Autocorrelation: Occurs when geographically proximate
locations have similar values. For example, areas with high rainfall are often
surrounded by other areas with high rainfall. This clustering of similar values
indicates positive autocorrelation.
• Negative Spatial Autocorrelation: Occurs when neighboring locations have
dissimilar values. For instance, high property values may be adjacent to areas
with low property values, indicating a pattern of dispersion.
Course Title: Environmental Data Analytics.
Degree Program: Masters in data science.
Instructor: Mohammad Mahdi Rajabi
In Moran’s I formula, the spatial weight 𝜔𝑖𝑗 represents the strength of the spatial
relationship between locations 𝑖 and 𝑗. The weight can be defined in various ways,
often based on distance or on other criteria such as adjacency, but it is not necessarily
the same as distance itself. Instead, 𝜔𝑖𝑗 may be any function of the distance or
connectivity between locations. A common choice is the inverse distance, giving
nearby locations larger weights than distant ones.
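As a minimal sketch of how the global statistic and inverse-distance weights fit together, the code below uses the standard form I = (N / W) · Σ𝑖 Σ𝑗 𝜔𝑖𝑗 (x𝑖 − x̄)(x𝑗 − x̄) / Σ𝑖 (x𝑖 − x̄)², where W is the sum of all weights. The function names, the convention 𝜔𝑖𝑖 = 0, and the sample data are assumptions made for illustration and are not taken from these notes.

```python
import numpy as np

def inverse_distance_weights(coords, power=1.0):
    """Weights w_ij = 1 / d_ij**power, with w_ii = 0 (assumed convention)."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all locations
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(d)
    mask = d > 0                      # skip the diagonal (and duplicate points)
    w[mask] = 1.0 / d[mask] ** power
    return w

def morans_i(values, w):
    """Global Moran's I for values x and a spatial weight matrix w."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                  # deviations from the mean
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

# Illustrative data: six sites on a line, similar values grouped together
coords = [(i, 0.0) for i in range(6)]
values = [1, 1, 1, -1, -1, -1]
print(morans_i(values, inverse_distance_weights(coords)))
```

A positive result suggests clustering of similar values and a negative result suggests dispersion. Judging significance would still require a permutation test or an analytical approximation, which this sketch does not attempt.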
2) Local Measures
Local measures, often referred to as Local Indicators of Spatial Association (LISA),
assess spatial autocorrelation at a specific location or small region within the study area.
They are used to detect spatial heterogeneity, meaning that spatial autocorrelation may
vary across the dataset. Local measures help identify clusters of similar or dissimilar
values (e.g., hotspots or cold spots) and spatial outliers. The most common local
measure of spatial autocorrelation is Local Moran's I. It is a local version of the global
Moran’s I statistic:
Local Moran's I detects clusters of similar values (high-high or low-low) and identifies
spatial outliers (high-low or low-high) within a dataset:
High-High clusters: A positive and significant Local Moran's I at a location with a high
value indicates that it is surrounded by neighbors with high values (hotspot).
Low-Low clusters: A positive and significant Local Moran's I can also mean that the
location has a low value and is surrounded by other low values (cold spot).
High-Low or Low-High outliers: A negative and significant Local Moran's I indicates a
spatial outlier, where a location with a high value is surrounded by low-value neighbors
(High-Low), or a location with a low value is surrounded by high-value neighbors
(Low-High). These patterns highlight significant local deviations from the surrounding
spatial context.
In Local Moran's I maps, the statistically insignificant areas represent locations
where the calculated Local Moran's I value does not show a strong or reliable spatial
autocorrelation. In other words, these locations do not exhibit a clear pattern of
clustering or dispersion that is distinguishable from random chance. This may indicate
a random distribution, no strong spatial dependence, or an unclear spatial pattern.
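The cluster and outlier labels described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the notes' own formula: it uses the common form I𝑖 = (z𝑖 / m₂) Σ𝑗 𝜔𝑖𝑗 z𝑗 with row-standardized weights, where z𝑖 is the deviation from the mean and m₂ is the mean squared deviation. Function names and data are hypothetical, and significance testing (usually done by permutation) is left out.

```python
import numpy as np

def local_morans_i(values, w):
    """Local Moran's I per site, using row-standardized weights (assumed form)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)   # assumes every site has a neighbor
    z = x - x.mean()                       # deviation of each site from the mean
    m2 = (z @ z) / x.size                  # mean squared deviation
    lag = w @ z                            # average deviation of the neighbors
    return z * lag / m2, z, lag

def quadrant_labels(z, lag):
    """Label each site by its own value and its neighbors' average value."""
    return [
        ("High" if zi > 0 else "Low") + "-" + ("High" if li > 0 else "Low")
        for zi, li in zip(z, lag)
    ]

# Illustrative data: seven sites on a line, one low value among high values
values = [8, 8, 8, 1, 8, 8, 8]
w = np.diag(np.ones(6), 1) + np.diag(np.ones(6), -1)   # adjacent sites are neighbors
local_i, z, lag = local_morans_i(values, w)
for i, (li, label) in enumerate(zip(local_i, quadrant_labels(z, lag))):
    print(i, round(float(li), 3), label)
```

In this toy example the middle site receives a negative Local Moran's I and a Low-High label, which matches the outlier case described above. Only sites that also pass a significance test would normally be drawn on a LISA map.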
Quadrat count method: (a) point (event) locations in an area overlaid by an N × N
contiguous grid; (b) number of points observed in each quadrat.
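The two steps in the caption above (overlay an N × N grid, then count the points in each quadrat) can be sketched with NumPy. The function names and sample points are assumptions for illustration. The variance-to-mean ratio is included as one common summary of quadrat counts: values near 1 are consistent with a random pattern, values well above 1 suggest clustering, and values below 1 suggest a regular pattern.

```python
import numpy as np

def quadrat_counts(points, n, bounds):
    """Count points in each cell of an n x n grid over (xmin, xmax, ymin, ymax)."""
    pts = np.asarray(points, dtype=float)
    xmin, xmax, ymin, ymax = bounds
    counts, _, _ = np.histogram2d(
        pts[:, 0], pts[:, 1], bins=n, range=[[xmin, xmax], [ymin, ymax]]
    )
    return counts.astype(int)          # counts[i, j]: i indexes x, j indexes y

def variance_mean_ratio(counts):
    """Sample variance of the quadrat counts divided by their mean."""
    c = np.asarray(counts, dtype=float).ravel()
    return c.var(ddof=1) / c.mean()

# Illustrative data: four events in the unit square, 2 x 2 grid
points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.6, 0.1)]
counts = quadrat_counts(points, n=2, bounds=(0.0, 1.0, 0.0, 1.0))
print(counts)
print(variance_mean_ratio(counts))
```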
C) Spatial Interpolation
There are many variants of the kriging method, which differ in how they handle
trends, data assumptions, and constraints. Here we introduce three of these variants:
1) Ordinary Kriging (OK) (as discussed above):
• Assumes the mean of the data is constant but unknown over the study area.
• Weights are determined based solely on spatial autocorrelation as modeled by
the semivariogram.
• The most commonly used form of kriging.
2) Universal Kriging (UK):
• Accounts for a spatial trend or drift in the data, meaning that the mean is not
constant across the study area.
• Incorporates both a deterministic trend and spatial autocorrelation in the kriging
model.
• Often used when there is a clear trend in the data (e.g., elevation increasing with
latitude).
3) Cokriging:
• A multivariate extension of kriging that interpolates multiple correlated variables
simultaneously.
• Uses the correlation between variables to improve the estimation of a target
variable (e.g., using soil type to predict crop yield).
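To make the ordinary kriging case concrete, here is a minimal NumPy sketch of the kriging system for a single target location. It assumes an exponential semivariogram with made-up parameters (`sill`, `range_`, `nugget`); in practice the model would be fitted to an empirical semivariogram. The function names and data are illustrative, and the sketch is not a substitute for a dedicated library.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, range_=10.0, nugget=0.0):
    """Assumed model: gamma(h) = nugget + sill * (1 - exp(-3h / range_)), gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h == 0, 0.0, gamma)

def ordinary_kriging(coords, values, target, variogram=exponential_variogram):
    """Return (estimate, kriging variance, weights) at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=1)
    # Kriging system: semivariances between observations, plus one extra
    # row and column that force the weights to sum to 1 (unknown constant mean).
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d0)
    solution = np.linalg.solve(a, b)
    weights, mu = solution[:n], solution[n]   # mu is the Lagrange multiplier
    estimate = weights @ values
    variance = weights @ b[:n] + mu
    return estimate, variance, weights

# Illustrative data: four observations at the corners of a unit square
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
values = [1.0, 2.0, 3.0, 4.0]
est, var, wts = ordinary_kriging(coords, values, target=(0.5, 0.5))
print(est, var, wts)
```

The weights depend only on the geometry and the semivariogram, not on the observed values, which reflects the statement above that ordinary kriging weights come solely from the modeled spatial autocorrelation.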
D) Spatial Regression
2) Spatial Weights: GWR uses spatial weights to assign more influence to nearby
observations when estimating the local regression coefficients. This means that
for each location, data points that are geographically closer are given more
weight in the local regression, while distant points have less influence.
Unlike traditional regression models, spatial regression explicitly considers the
influence of the spatial structure of the data, addressing the fact that observations
closer together in space may be more similar than those further apart.
The coefficients 𝛽 for location 𝑖 are not estimated using data from location 𝑖 alone,
but rather using data from all locations in the dataset, with nearby locations
having a larger influence and distant locations having less influence. A weighting
function defines how much influence each observation 𝑗 has on the local regression at
location 𝑖. For example, if location 𝑗 is close to location 𝑖, the weight will be large,
meaning that observation 𝑗 will have a strong influence on the estimation of the local
coefficients at location 𝑖. If location 𝑗 is far away from 𝑖, the weight will be small,
meaning that observation 𝑗 will have little influence.
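The local estimation described above amounts to a separate weighted least squares fit at each location. The sketch below assumes a Gaussian kernel, 𝜔𝑖𝑗 = exp(−½ (d𝑖𝑗 / bandwidth)²), and a fixed bandwidth chosen by hand; both are illustrative choices, as are the function name and the synthetic data. Dedicated packages also handle bandwidth selection and diagnostics, which are omitted here.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Local coefficients [intercept, slopes...] at every observation location."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    betas = np.empty((len(y), design.shape[1]))
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distance from site i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel (assumed)
        xtw = design.T * w                               # X^T W without building W
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic data (made up): the slope of y on x grows from west to east
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
x = np.cos(np.arange(25.0))                  # arbitrary non-constant predictor
y = 1.0 + (1.0 + 0.5 * coords[:, 0]) * x
betas = gwr_coefficients(coords, x, y, bandwidth=1.0)
print(betas[:, 1].reshape(5, 5).round(2))    # local slope estimates on the grid
```

A small bandwidth makes each fit very local and more variable, while a large bandwidth pushes every local fit toward the ordinary global regression.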
Python-based geostatistics
There are several widely used Python libraries that provide support for geostatistics,
including tools for spatial autocorrelation, point pattern analysis, spatial interpolation,
and spatial regression. Below are some of the most commonly used libraries:
1. PySAL (Python Spatial Analysis Library)
PySAL is one of the most comprehensive and widely used libraries for spatial data
analysis in Python, and it includes support for many geostatistical techniques.
Key Features:
• Spatial Autocorrelation: PySAL includes tools to compute global and local
measures of spatial autocorrelation, such as Moran’s I and Local Indicators of
Spatial Association (LISA).
• Spatial Regression: PySAL has modules for spatial econometrics, including
geographically weighted regression (GWR).
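• Point Pattern Analysis: PySAL supports point pattern analysis (e.g., quadrat and
nearest-neighbor statistics) through its pointpats package.
• Spatial Interpolation: PySAL's interpolation support centers on areal interpolation
(the tobler package); kriging is usually done with a dedicated library such as
GSTools.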
2. Geostatistical Modeling Library (GSTools)
GSTools is a specialized Python library for geostatistics, focusing on kriging and
variogram modeling. It is a good choice when you need to fit advanced kriging
models.
3. GeoPandas
GeoPandas extends the capabilities of pandas to handle spatial data. While it does not
natively support many geostatistical algorithms, it integrates well with PySAL for those
advanced functionalities, serving as a tool for reading files, pre-processing, and
visualizing spatial data.
Geospatial data visualization involves the graphical representation of data that is tied
to specific geographic locations, enabling users to analyze patterns, trends, and
relationships based on location. While it shares many concepts, methods, and tools
with general data visualization, it introduces unique challenges and techniques due to
the inherent spatial component. Geospatial visualization includes specialized methods
like heat maps, choropleth maps, and 3D terrain models, and requires handling
geographic projections, spatial relationships, and layers of location-based data,
making it essential for applications in fields such as environmental monitoring, urban
planning, and navigation.
Location-based data can be visualized by plotting the spatial coordinates (e.g., latitude
and longitude) on a 2D or 3D grid. Visualizing spatial data in Python is straightforward
with libraries like matplotlib, plotly, and pyvista, which support various types of 2D and
3D visualizations, including scatter plots, surface plots, and contour maps. These
libraries are standalone, requiring no connection to external software or services,
enabling users to visualize data directly within Python environments. However, when
geospatial data requires geographic context—such as overlaying points or features on
maps of streets, buildings, or terrain—Python can be integrated with external mapping
services like OpenStreetMap, Google Maps, or Google Earth using libraries like folium
and OSMnx. These connections allow data to be displayed in a browser or interactive
map interface, providing real-world context for the spatial data.
Overlaying
In the context of geospatial data visualization, overlaying means placing spatial data
(e.g., points, lines, or polygons representing things like locations, roads, or regions) on
top of a base map that shows real-world features such as streets, buildings, terrain, or
satellite imagery.
The spatial data and the base map must use the same geographic coordinate system
or projection to ensure proper alignment. For example, both must use latitude and
longitude or another compatible system. Furthermore, the precision of the overlay
depends on the quality of the data. Low-resolution data may not align well with detailed
base maps, leading to inaccuracies in visual representation.
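As an illustration of why matching coordinate systems matters, many tile-based web base maps are drawn in Web Mercator (EPSG:3857), while point data is often stored as longitude and latitude in degrees (WGS84). The sketch below applies the standard spherical Web Mercator formulas to convert one into the other. The function name and sample coordinates are hypothetical, and in practice a projection library would normally handle this step.

```python
import math

EARTH_RADIUS_M = 6378137.0   # sphere radius used by Web Mercator (WGS84 semi-major axis)

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Illustrative point (coordinates chosen only for the example)
print(lonlat_to_web_mercator(25.47, 65.01))
```

Plotting the raw degree values on a base map that expects meters would place every point within a few hundred meters of the map origin, which is the kind of misalignment described above.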
Choropleth Maps
Choropleth mapping is a data visualization technique used in geospatial data to
represent the distribution of a variable across predefined regions, such as countries,
states, or districts. In a choropleth map, geographic areas are filled with varying shades
or colors based on the value of the data associated with that area. Typically, darker or
more intense colors represent higher values, while lighter colors indicate lower values.
Choropleth mapping is generally used for discrete variables, especially those that are
aggregated over geographic areas (i.e., the data is summarized or averaged for specific
regions or zones rather than shown at every individual point within those regions).
Each region is shaded or colored based on the value of the variable, making it suitable
for showing region-specific data.
Choropleth maps can also be used for continuous variables, although this is less
common. When used with continuous variables (e.g., temperature or elevation
averaged over regions), the values are still aggregated to predefined geographic
areas, and color gradients are applied to represent different ranges of the continuous
variable.
1) Equal Interval Classification
Divides the full span of the data values into classes of equal
range. This classification is easy to interpret and ensures that the same range of values
is assigned to each class. But it can lead to misleading visualizations if the data is
skewed, as some classes may have few or no observations.
• Example: If the data ranges from 0 to 100 and you want 5 classes, each class
would cover an interval of 20 units (0–20, 20–40, and so on).
2) Quantile Classification
Divides the data so that each class contains an equal number (or proportion) of data
points. It is useful when you want each category to have the same number of regions
or areas. This method ensures that all classes have data, which can help highlight spatial
patterns. But for data with wide variability, it may lead to uneven intervals, where some
classes span large ranges and others cover narrow ranges.
• Example: If there are 100 regions, a quantile classification with 5 classes will
assign 20 regions to each class.
3) Standard Deviation Classification
Divides the data based on how much values deviate from the mean (average). Class
boundaries are typically set at 1 or 0.5 standard deviation intervals. It is useful for
emphasizing how values diverge from the average, especially when interested in
showing outliers or values that are far from the norm. This method clearly highlights
regions that are above or below the average. But it is not ideal for data that doesn’t
follow a normal distribution, as the method assumes a bell-curve distribution.
• Example: A dataset with a mean of 50 and a standard deviation of 10 would
create classes like 30–40, 40–50, 50–60, 60–70, etc.
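The classification schemes above differ only in how the class breaks are computed, which a short NumPy sketch can show. The function names and sample values are assumptions for illustration, and the standard deviation version assumes breaks at whole standard deviations around the mean.

```python
import numpy as np

def equal_interval_breaks(values, k):
    """k classes of equal width between the minimum and maximum."""
    return np.linspace(np.min(values), np.max(values), k + 1)

def quantile_breaks(values, k):
    """k classes holding (approximately) equal numbers of observations."""
    return np.quantile(values, np.linspace(0, 1, k + 1))

def std_dev_breaks(values, n_sd=2):
    """Breaks at mean + m * sd for m = -n_sd, ..., n_sd (1-sd intervals assumed)."""
    mean, sd = np.mean(values), np.std(values)
    return mean + np.arange(-n_sd, n_sd + 1) * sd

def classify(values, breaks):
    """Class index for each value, using the interior breaks as upper edges."""
    return np.digitize(values, breaks[1:-1], right=True)

# Skewed illustrative data: one very large value among small ones
skewed = np.array([1, 2, 2, 3, 3, 4, 5, 6, 8, 100], dtype=float)
eq = classify(skewed, equal_interval_breaks(skewed, 5))
qu = classify(skewed, quantile_breaks(skewed, 5))
print("equal interval counts:", np.bincount(eq, minlength=5))
print("quantile counts:      ", np.bincount(qu, minlength=5))
```

With skewed data like this, the equal interval scheme leaves several classes empty, while the quantile scheme fills every class at the cost of very uneven class widths.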
Contour maps
Contour maps are used to represent continuous data over a 2D plane, where lines
(contours) connect points of equal value. These are often used to visualize data such as
elevation, temperature, or pressure, where the data changes smoothly over space. The
contour lines help to show areas of equal value, and the space between the lines
indicates how rapidly the values are changing.
Hotspot maps
Hotspot maps are focused on showing where events occur frequently. These areas are
called "hotspots" because they indicate heightened activity. These maps are commonly
used to identify clusters or areas of intensity within a geographic region. In a hotspot
map, regions with more events (or higher values of the variable being mapped) are
often marked with warmer or more intense colors (like red), while regions with fewer
events are marked with cooler colors (like blue or green).
In environmental data analytics, hotspot maps are widely used to visualize and analyze
spatial patterns in phenomena such as wildfires, pollution incidents, and species
observations.
Proportional Symbol Maps
Proportional symbol maps use symbols (e.g., circles, squares) whose size varies in
proportion to the value of the data being represented. Larger symbols indicate higher
values, while smaller symbols indicate lower values. Common use cases in
environmental data analytics include visualizing the amount of waste generated or
improperly disposed of across various regions, mapping the capacity or output of
renewable energy sources (e.g., wind farms, solar plants, hydroelectric dams) by
geographic location, and illustrating the volume of available freshwater resources (e.g.,
groundwater or reservoir levels) in different areas. These visualizations help highlight
regional patterns and trends, making it easier to identify areas that require targeted
interventions or further analysis.
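One practical detail is how "size" is scaled. A common cartographic convention, assumed here rather than taken from these notes, is to make the symbol's area proportional to the value, so the radius grows with the square root of the value. The function name and the maximum radius are illustrative.

```python
import math

def symbol_radius(value, max_value, max_radius=20.0):
    """Radius such that the circle's AREA is proportional to value (assumed scaling)."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)

# Illustrative values: a site with a quarter of the maximum gets half the radius
for v in (25, 50, 100):
    print(v, round(symbol_radius(v, max_value=100), 2))
```

Scaling the radius directly by the value would make large values look disproportionately big, because readers tend to compare areas rather than radii.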
An example of a proportional symbol map showing the increase or decrease in jobs in the US.