Exploratory Spatial Data Analysis
This document discusses exploratory spatial data analysis (ESDA) techniques and descriptive statistics that can be used to analyze and visualize spatial data. Key techniques include choropleth maps, histograms, measures of central tendency, outlier detection, and bivariate analyses such as scatter plots. ESDA aims to describe, visualize, and examine relationships in spatial data through both graphical and numerical methods.
Spatial Data Analysis
Lecture-04: Exploratory Spatial Data Analysis (ESDA)

Exploratory Spatial Data Analysis (ESDA) is a collection of visual and numerical methods used to analyze spatial data through
• (a) classical non-spatial ESDA
• (b) non-classical and advanced spatial ESDA (identifying spatial interactions, relationships and patterns)
ESDA methods and tools are used to
• Describe and summarize spatial data distributions
• Visualize spatial distributions
• Examine spatial autocorrelation (i.e., trace spatial relationships and associations)
• Detect spatial outliers
• Locate clusters
• Identify hot or cold spots

Descriptive Statistics
Descriptive statistics is a set of statistical procedures that summarize the essential characteristics of a distribution by calculating or plotting:
• Frequency distribution
• Center, spread and shape (mean, median and standard deviation)
• Standard error
• Percentiles and quartiles
• Outliers
• Boxplot graph
• Normal QQ plot

Other Statistics
• Inferential statistics is the branch of statistics that analyzes samples to draw conclusions about an entire population.
• Spatial statistics employs statistical methods to analyze spatial data, quantify a spatial process, discover hidden patterns or unexpected trends, and model these data in a geographic context.

Why Use Spatial Statistics?
• Centrographic measures
• Analyze spatial patterns
• Identify spatial autocorrelation, hot spots and outliers
• Perform spatial clustering
• Model spatial relationships
• Analyze spatially continuous variables

ESDA Tools and Descriptive Statistics for Visualizing Spatial Data
• Simple ESDA tools and descriptive statistics for visualizing spatial data (univariate data)
• ESDA tools and descriptive statistics for analyzing two or more variables (bivariate analysis)

ESDA Techniques and Descriptive Statistics
The most common ESDA techniques and descriptive statistics for analyzing univariate data (only one variable of the dataset is analyzed at a time) include:
• Choropleth maps
• Frequency distributions and histograms
• Measures of the center, spread and shape of a distribution
• Percentiles and quartiles
• Outlier detection
• Boxplots
• Normal QQ plot

Choropleth Maps
• Choropleth maps are thematic maps in which areas are rendered according to the values of the variable displayed.
• Choropleth maps are used to obtain a graphical perspective of the spatial distribution of the values of a specific variable across the study area.
• There are two main categories of variables displayed in choropleth maps:
• (a) spatially extensive variables (e.g., counts such as total population)
• (b) spatially intensive variables (e.g., rates, densities or ratios)

Frequency Distribution and Histograms
• A frequency distribution table is a table that stores the categories (also called "bins"), the frequency, the relative frequency and the cumulative relative frequency of a single continuous interval variable.
• The frequency of a particular category or value (also called an "observation") of a variable is the number of times the category or value appears in the dataset.
• Relative frequency is the proportion (%) of the observations that belong to a category.

Frequency Distribution and Histograms
• The cumulative relative frequency of each row is the sum of the relative frequency of that row and of all rows above it.

Histogram
• A frequency distribution histogram presents the bins on the x-axis and the frequencies (or relative frequencies) of a single continuous interval variable on the y-axis.
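To make the frequency distribution table concrete, here is a minimal Python sketch; the income variable, its values and the choice of eight bins are all invented for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
income = pd.Series(rng.normal(loc=30000, scale=8000, size=200))  # hypothetical variable

# Cut the continuous variable into 8 equal-width bins and count observations per bin.
binned = pd.cut(income, bins=8)
table = binned.value_counts().sort_index().to_frame("frequency")
table["relative_frequency"] = table["frequency"] / table["frequency"].sum()
table["cumulative_relative_frequency"] = table["relative_frequency"].cumsum()
print(table)
```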
• A probability density histogram is defined so that
• (a) the area of each box equals the relative frequency (probability) of the corresponding bin, and
• (b) the total area of the histogram equals 1.
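The density property can be checked numerically. A short sketch with a made-up sample, using numpy's density-scaled histogram:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # made-up sample

# density=True scales the bar heights so that the histogram integrates to 1.
heights, edges = np.histogram(data, bins=20, density=True)
total_area = np.sum(heights * np.diff(edges))
print(total_area)  # ~1.0
```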
Use of Frequency Distribution & Histogram
Frequency distribution tables and histograms are used to analyze how the values of the studied variable are distributed across the various categories. The histogram can also be used to judge whether the distribution is normal. Additionally, it can display the shape of a distribution and help examine the distribution's statistical properties (e.g., mean value, skewness, kurtosis).

Measures of Center
• Measures of central tendency provide information about where the center of a distribution is located. The most commonly used measures of center for numerical data are the mean and the median.
• The mean is the simple arithmetic average: the sum of the values of a variable divided by the number of observations.
• The median is the value that divides the scores, sorted from smallest to largest, in half.

Mean
The mean of a sample of n observations is
x̄ = Σxᵢ / n

Measures of Shape
• Measures of shape describe how values (e.g., frequencies) are distributed across the intervals (bins) and are measured by skewness and kurtosis.
• Shapes are either
• Symmetrical
• Asymmetrical
• Skewness is a measure of the asymmetry of a distribution around its mean.
• Kurtosis, from the graphical inspection perspective, is the degree of peakedness or flatness of a distribution.
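A minimal sketch of the center and shape measures, computed with numpy and scipy on an invented, deliberately right-skewed sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
values = rng.exponential(scale=2.0, size=500)  # made-up, right-skewed sample

print("mean:    ", np.mean(values))
print("median:  ", np.median(values))
print("skewness:", stats.skew(values))      # > 0 indicates a right (positive) skew
print("kurtosis:", stats.kurtosis(values))  # excess kurtosis; 0 for a normal curve
```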
Measures of Spread/Variability – Variation
• Measures of spread (also called measures of variability, variation, diversity or dispersion) of a dataset provide information about how much the values of a variable differ among themselves and in relation to the mean. The most common measures are as follows:
• Range
• Deviation from the mean
• Variance
• Standard deviation
• Standard distance
• Percentiles and quartiles

Range
• The range is the difference between the largest and smallest values of the variable studied.
• The greater the range, the more variation in the variable's values, which might also reveal potential outliers.

Deviation
• Deviation from the mean is the subtraction of the mean from each score.

Sample Variance
• Sample variance is the sum of the squared deviations from the mean divided by n − 1:
s² = Σ(xᵢ − x̄)² / (n − 1)
• Large values of s² (variance) reveal great variation in the data, indicating that many observations have scores far away from the mean.

Standard Deviation
• Standard deviation is the square root of the variance.
• In terms of z-scores, a positive z-score indicates how many standard deviations an observation lies above the mean, and a negative z-score indicates how many standard deviations it lies below the mean.
• The standard deviation is used to estimate how many objects in the sample lie far away from the mean, with reference to the z-score.

Percentile
• A percentile is a value in a ranked data distribution below which a given percentage of observations falls. Every distribution has 100 percentiles.
• Percentiles are used to compare a value in relation to how many values, as a percentage of the total, are smaller or larger.

Quartile
• The quartiles are the 25th, 50th and 75th percentiles, called the "lower quartile" (Q1), the "median" and the "upper quartile" (Q3) respectively.

Interquartile Range
• The interquartile range (IQR) is obtained by subtracting the lower quartile from the upper quartile: IQR = Q3 − Q1.

Quantile
Quantiles are equal-sized, adjacent subgroups that divide a distribution. Quantiles are often used to divide probability distributions into areas of equal probability. In fact, percentiles are quantiles that divide a distribution into 100 subgroups. GIS software uses quantiles to color and symbolize spatial entities when there are many different values.

Outliers
• Outliers are the most extreme scores of a variable.
• They should be traced for three main reasons:
• Outliers might be wrong measurements
• Outliers tend to distort many statistical results
• Outliers might hide (or reveal) unusual but genuine phenomena worth further investigation
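The sketch below computes the spread measures just listed on made-up data and applies the common 1.5 × IQR rule (one customary way to flag potential outliers; the slides do not prescribe a specific rule):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.append(rng.normal(50, 10, 300), [120, 135])  # invented data plus two extreme scores

q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1

print("range:             ", x.max() - x.min())
print("sample variance:   ", np.var(x, ddof=1))   # ddof=1 gives the n-1 denominator
print("standard deviation:", np.std(x, ddof=1))
print("Q1, median, Q3:    ", q1, median, q3)

# 1.5*IQR fences: observations outside them are flagged as potential outliers.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print("potential outliers:", x[(x < low) | (x > high)])
```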
Boxplot
• A boxplot is a graphical representation of the key descriptive statistics of a distribution.
• It depicts the median, the spread (in terms of percentiles) and the presence of outliers.
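A minimal matplotlib boxplot of the same kind of invented data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = np.append(rng.normal(50, 10, 300), [120, 135])  # invented data with two outliers

fig, ax = plt.subplots()
ax.boxplot(x)  # box: Q1-Q3, line: median, whiskers: 1.5*IQR, points: outliers
ax.set_ylabel("value")
ax.set_title("Boxplot of a hypothetical variable")
plt.show()
```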
Normal QQ Plot
• The normal QQ plot is a graphical technique that plots the quantiles of the data against the quantiles of a theoretical normal distribution.
• If the data are normally distributed, the points fall approximately on the theoretical 45-degree reference line; systematic departures from this line indicate non-normality.
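scipy offers a ready-made normal QQ plot via stats.probplot; a short sketch with an invented sample:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(4)
data = rng.normal(loc=10, scale=2, size=300)  # made-up, roughly normal sample

# probplot orders the data, plots it against theoretical normal quantiles
# and adds a least-squares reference line.
stats.probplot(data, dist="norm", plot=plt)
plt.title("Normal QQ plot")
plt.show()
```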
ESDA Tools and Descriptive Statistics for Analyzing Two or More Variables (Bivariate Analysis)
• Spatial analysis often focuses on two different variables simultaneously. This type of analysis is called "bivariate," and the dataset used is called a "bivariate dataset." The study of more than two variables, along with the dataset used, is called "multivariate."

ESDA Techniques for Bivariate Data
• The most common ESDA techniques and descriptive statistics for analyzing bivariate data include
• Scatter plot
• Scatter plot matrix
• Covariance and variance–covariance matrix
• Correlation coefficient
• Pairwise correlation
• General QQ plot

Scatter Plot & Scatter Plot Matrix
• A scatter plot displays the values of two variables as a set of point coordinates.
• A scatter plot matrix depicts all possible pairs of scatter plots when more than two variables are available.
• The visual inspection of all pair combinations facilitates
• (a) locating variables with high or no association,
• (b) identifying the relationship type (i.e., linear or nonlinear),
• (c) spotting outlying points.
• The more closely the points cluster around a line, the stronger their linear correlation; the more scattered the pattern, the weaker the linear relationship.
• A data point lying far away from the rest is likely to be an outlier.
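pandas provides a scatter plot matrix out of the box. In this sketch the three variables are invented, and one pair is constructed to be linearly related:

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
df = pd.DataFrame({"income": rng.normal(30, 5, 200)})
df["rent"] = 0.4 * df["income"] + rng.normal(0, 1, 200)  # linearly related to income
df["age"] = rng.uniform(18, 80, 200)                     # unrelated variable

# Each off-diagonal panel is the scatter plot of one variable pair;
# the diagonal shows each variable's histogram.
scatter_matrix(df, diagonal="hist", figsize=(7, 7))
plt.show()
```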
Covariance and Variance–Covariance Matrix
• Covariance is a measure of the extent to which two variables vary together (i.e., change in the same linear direction). The sample covariance Cov(X, Y) is calculated as:
Cov(X, Y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
• If the covariance is positive, the variables change in the same direction; if it is negative, they change in opposite directions (one increases while the other decreases). Zero covariance indicates no linear correlation between the variables.
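A small sketch computing the sample covariance both directly from the formula above and via numpy's np.cov, on invented data:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)  # made-up variable that co-varies with x

# Direct application of the sample covariance formula (n - 1 denominator).
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# np.cov returns the 2x2 variance-covariance matrix; the off-diagonal is Cov(X, Y).
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)  # the two values agree
```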
Covariance and Variance–Covariance Matrix
• The variance–covariance matrix is used in many statistical procedures to estimate the parameters of a statistical model, such as the eigenvectors and eigenvalues used in principal component analysis.
• It is also used in the calculation of correlation coefficients.
• Covariance and the variance–covariance matrix are descriptive statistics and are widely used in many spatial statistical approaches.
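To illustrate these two uses, the following sketch builds the variance–covariance matrix of three invented variables, derives the correlation matrix from it, and extracts the eigenvalues and eigenvectors on which principal component analysis relies:

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.multivariate_normal(
    mean=[0, 0, 0],
    cov=[[4, 2, 0], [2, 3, 1], [0, 1, 2]],  # made-up population covariance
    size=500,
)

C = np.cov(data, rowvar=False)  # 3x3 sample variance-covariance matrix

# Correlation coefficients from the covariance matrix: r_ij = C_ij / (s_i * s_j).
s = np.sqrt(np.diag(C))
R = C / np.outer(s, s)

# Eigenvalues/eigenvectors of C: the basis of principal component analysis.
eigvals, eigvecs = np.linalg.eigh(C)

print(C, R, eigvals, sep="\n")
```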
Correlation Coefficient
• The correlation coefficient r(X, Y) analyzes how two variables (X, Y) are linearly related. Among the correlation coefficient metrics available, the most widely used is Pearson's correlation coefficient (also called the Pearson product-moment correlation).

Pairwise Correlation
• Pairwise correlation is the calculation of the correlation coefficients for all pairs of variables.
• It is used to identify potential linear relationships quickly.

General QQ Plot
• A general QQ plot depicts the quantiles of one variable against the quantiles of another variable.
• This plot can be used to assess similarities in the distributions of two variables.
• If the two variables have identical distributions, the points lie on the 45-degree reference line; if they do not, their distributions differ.
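A brief sketch of pairwise correlation and a general QQ plot, with invented variables (one pair correlated by construction):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
df = pd.DataFrame({"density": rng.normal(100, 20, 150)})
df["noise_level"] = 0.5 * df["density"] + rng.normal(0, 5, 150)  # correlated by construction
df["green_space"] = rng.uniform(0, 1, 150)                       # unrelated variable

# Pairwise Pearson correlation coefficients for every pair of variables.
print(df.corr(method="pearson"))

# General QQ plot: quantiles of one variable against quantiles of another.
q = np.linspace(0, 100, 101)
plt.plot(np.percentile(df["density"], q), np.percentile(df["noise_level"], q), "o")
plt.axline((0, 0), slope=1, linestyle="--")  # 45-degree reference line
plt.xlabel("density quantiles")
plt.ylabel("noise_level quantiles")
plt.show()
```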
Rescaling Data
• Rescaling is the mathematical process of changing the values of a variable to a new range.
• By rescaling data, the spread and the values of the data change, but the shape of the distribution and the relative attributes of the curve remain unchanged.
• Rescaling allows comparison of descriptive statistics across variables, comparison of highest and lowest values, and the construction of indices.
• It is especially useful for multivariate datasets.

Methods for Rescaling
• Normalize: A typical method of creating common boundaries is min–max normalization, which rescales values to the [0, 1] range:
x′ = (x − xmin) / (xmax − xmin)
• Adjust: Another way of rescaling data is to divide a variable by a specific value (or multiply it, assigning weights).
• Adjustments can be expressed in many other ways, depending on the problem studied and the research question/hypothesis tested.

Questions
• What are parametric and nonparametric methods and tests?
• What is a test of significance?
• What is the null hypothesis?
• What is a p-value?
• What is a z-score?
• What is a confidence interval?
• What is the standard error of the mean?
• What is so important about the normal distribution?
• How can we identify if a distribution is normal?
What Are Parametric and Nonparametric Methods and Tests?
• Parametric methods and tests are statistical methods that use parameter estimates for statistical inference.
• They assume that the sample is drawn from some known distribution (not necessarily normal) that obeys specific rules. They belong to inferential statistics.
• Statistical methods used when a normal distribution (or another type of probability distribution) is not assumed are called "nonparametric."

Confidence Interval
• A confidence interval is an interval estimate of a population parameter.
• In other words, a confidence interval is a range of values that is likely to contain the true population parameter value.
• A confidence interval is calculated once a confidence level is defined, usually 95% or 99%.
• A confidence level of 95% reflects a significance level of 5%.

Standard Error
• The standard error of a statistic is the standard deviation of its sampling distribution.
• The standard error reveals how far the sample statistic is likely to deviate from the actual population statistic.

Standard Error
• Low values of the standard error of the mean indicate more precise estimates of the population mean.
• The larger the sample, the smaller the standard error. This is rational: the more objects we have, the closer our approximation will be to the real values.

Significance Tests, Hypothesis, p-Value and z-Score
• A test of significance is the process of rejecting, or failing to reject, a hypothesis based on sample data.
• The p-value is the probability of obtaining the observed (or more extreme) results of a sample statistic (test statistic) if the null hypothesis is assumed to be true. It is calculated based on the z-score.

Z-Score
• The z-score (also called the z-value) expresses distance as the number of standard deviations between an observation (for hypothesis testing, calculated by a formula specific to the statistical test) and the mean. For samples it is calculated as:
z = (x − x̄) / s

Significance Level
• The significance level α is a cutoff value used to reject, or fail to reject, the null hypothesis. It is a user-defined probability, usually taking values such as α = 0.05, 0.01 or 0.001 (the 5%, 1% and 0.1% probability levels).
• The smaller the p-value, the more statistically significant the results.

How Can We Identify If a Distribution Is Normal?
• Create a histogram and superimpose a normal curve.
• Calculate the skewness and kurtosis of the distribution.
• Create a normal QQ plot.

What to Do When the Distribution Is Not Normal
• Use nonparametric statistics.
• Apply a variable transformation. An efficient way to deal with a non-normal distribution is to transform it (if possible) into a normal distribution.
• Check the sample size.
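A closing sketch (with an invented sample) pulling the inferential pieces together: the standard error of the mean, a 95% confidence interval under a normal approximation, a z-score for a single observation, and a numerical normality check. The Shapiro–Wilk test is not named in the slides; it is used here as one common normality test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
sample = rng.normal(loc=100, scale=15, size=200)  # made-up sample

mean = sample.mean()
sd = sample.std(ddof=1)
se = sd / np.sqrt(len(sample))  # standard error of the mean

# 95% confidence interval for the population mean (normal approximation, z = 1.96).
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean={mean:.2f}, SE={se:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")

# z-score of a single observation: distance from the mean in standard deviations.
x = 130
print("z-score of 130:", (x - mean) / sd)

# Normality checks: skewness/kurtosis plus a Shapiro-Wilk test
# (a p-value above alpha = 0.05 means we fail to reject normality).
stat, p = stats.shapiro(sample)
print("skewness:", stats.skew(sample), "kurtosis:", stats.kurtosis(sample))
print("Shapiro-Wilk p-value:", p)
```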