Spatial Econometrics with Stata 2022


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/358712548

Spatial Econometrics with Stata: Exploratory Spatial Data Analysis (ESDA), Spatial Models for Cross-Sectional Data, Spatial Models for Panel Data.

Presentation · February 2022


DOI: 10.13140/RG.2.2.24440.93442


1 author:

Marcos Herrera Gomez


National Scientific and Technical Research Council

All content following this page was uploaded by Marcos Herrera Gomez on 18 February 2022.



Overview
Exploratory Spatial Data Analysis
ESDA: Visualising spatial data
ESDA: Discovering patterns of spatial dependence
Summary

Spatial Econometrics with Stata


Exploratory Spatial Data Analysis

Marcos Herrera-Gomez1

1 CONICET-IELDE
National University of Salta (Argentina)

Graduate School of International Development


Nagoya University (Japan)
February 14th, 2022
M. Herrera-Gomez Exploratory Spatial Data Analysis

General index

1 Overview
2 Exploratory Spatial Data Analysis
3 ESDA: Visualising spatial data
Loading data and choropleth maps
4 ESDA: Discovering patterns of spatial dependence
Matrices and spatial tests
Global Spatial Tests
Local Spatial Tests
Spatial Correlogram
5 Summary


Objective of this Workshop

We will explore the main topics of spatial econometric analysis using lattice data:

1 Exploratory Spatial Data Analysis (ESDA): February 14.
2 Spatial Models for Cross-Sectional Data: February 16.
3 Spatial Models for Panel Data: February 18.
We will use Stata as the main software.


What is Spatial Econometrics?

Jean Paelinck introduced the term “Spatial Econometrics” in 1974 to designate “a combination of economic theory, mathematical formalization and statistics to deal with: spatial interdependence, importance of factors in other places, explicit modelling of space”.

Luc Anselin: “SE is an econometric branch dealing with spatial effects in cross-sectional models and panel data models”.
Two spatial effects:
spatial dependence: a variable at location i depends on the same variable, or on another variable, at neighbouring locations (main topic of this workshop).
spatial heterogeneity: structural instability over space.


Why is Spatial Econometrics important nowadays?

Theory-driven
From individual decision to social-spatial interaction,
agent-based modelling.
Peer-effects, contextual effects, neighbourhood effects.

Data-driven
Geo-referenced information.

Technology
Geographical Information Systems.
Capability of statistical software.


Types of spatial data

Spatial data combine attribute information (e.g. name of the spatial object, population density, productivity) with location information (spatial coordinates); they are also called georeferenced data.
Types of spatial data:
Geostatistical data: continuous spatial field (noise surface, pollution surface).
Areal (lattice or regional) data: discrete spatial units, fixed polygons or points (counties, provinces, countries).
Point pattern data: location as a random event (crimes, accidents).


Main books of Spatial Econometrics

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Boston: Kluwer Academic.
LeSage, J. and Pace, R. K. (2009). Introduction to Spatial Econometrics. Taylor & Francis Group, LLC.
Elhorst, J. P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Heidelberg: Springer.
Pesaran, M. H. (2015). Time Series and Panel Data Econometrics. Oxford University Press. (Part VI).
Kelejian, H. and Piras, G. (2017). Spatial Econometrics. Academic Press.


ESDA: Exploratory Spatial Data Analysis

ESDA has its origins in Exploratory Data Analysis (EDA).
Tukey (1977): “Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there.”
The idea is to explore the data to discover potentially explicable patterns.
Data visualization: chart, table, graph.
ESDA extends EDA to settings where location is fundamental.



Why we need the “S” in EDA
[Figure: two panels, labelled “locationally variant” and “locationally invariant”]

Elements of ESDA

Visualize spatial distributions.


Discover patterns of spatial dependence:
global spatial autocorrelation, and spatial relationships.
Identify atypical locations or spatial outliers.
Detect spatial heterogeneity:
Spatial regimes (spatial structural breaks).
Local spatial autocorrelation: hot spots, cold spots.
Regionalization (spatial clustering).



Example used: Impact of net migration on unemployment

Hot topic in economics.
Competing theories:
Orthodox theory: net migration causes more unemployment (positive relationship).
New Economic Geography theory: net migration causes less unemployment (negative relationship).
Definition of variables:
UNEMPLOYMENT RATE: the number of people unemployed as a percentage of the labour force.
NET MIGRATION RATE: the ratio of net migration during the year to the average population in that year, expressed per 1,000 persons. Net migration is the difference between immigration into and emigration from the area during the year (it is therefore negative when the number of emigrants exceeds the number of immigrants).
Level of analysis:
NUTS 2 (Europe 15), 164 regions from 2007 to 2012.

From shape to dta

First we need a file of administrative areas: http://www.diva-gis.org/, http://www.gadm.org/.
Georeferenced information (lattice data) is usually stored in a shapefile: a collection of at least three connected files sharing a common filename:
.shp stores the geometric objects.
.shx is an index file of the geometric objects.
.dbf is the database file, in dBASE format, and contains the attributes of the objects.
Shapefiles cannot be read directly in Stata.
However, the spshape2dta command (or shp2dta before Stata 15) can import shapefiles and convert them to Stata format.


spshape2dta command

Syntax:
spshape2dta "shp.filename", saving(filename) [options]

Example:
spshape2dta "Nuts2_epsg4326", saving(nuts2)

The spshape2dta command generates two files:
nuts2.dta: contains the information from the .dbf file, _ID, latitude (_CY) and longitude (_CX).
nuts2_shp.dta: contains the geometric information from the .shp file.
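After the conversion it can help to inspect what was created. The following is a minimal sketch, assuming the two files nuts2.dta and nuts2_shp.dta from the example above sit in the current working directory; describe, list and spset are official Stata commands (spset requires Stata 15 or later).

```stata
* Sketch: inspect the files produced by spshape2dta
* (assumes nuts2.dta and nuts2_shp.dta are in the working directory)
use nuts2, clear
describe                       // attributes from the .dbf file plus _ID, _CX, _CY
list _ID _CX _CY in 1/5        // identifier and centroid coordinates of the first regions
spset                          // report the Sp settings, including the linked shapefile
```

Running spset without arguments only reports the current settings; it does not modify the data.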


Merging data sets

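The new dataset (nuts2.dta) does not contain information on economic variables.
Using the index of geometric objects, an Excel file was generated with variables from Eurostat: unemployment rate and net migration rate.
Both datasets are easily joined using POLY_ID as the link variable:

* read the Eurostat variables from Excel (first row holds the variable names)
import excel "C:\.\.\.\nuts2_164.xls", firstrow
save "C:\.\.\.\migr_unemp07_12.dta"
* load the shapefile attributes and attach the Eurostat variables;
* the file named after "using" must exist as a .dta file in the working directory
use nuts2, clear
merge 1:1 POLY_ID using migr_unemp, gen(union) force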
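It is worth checking that every region was matched before mapping. The sketch below uses made-up toy data (three regions with invented U2012 values) instead of the workshop files, so it runs on its own; only the variable names POLY_ID, _ID, U2012 and union mirror the slides.

```stata
* Toy illustration with hypothetical data: verify a 1:1 merge
clear
input POLY_ID U2012
1 7.5
2 10.2
3 24.8
end
tempfile attrs
save `attrs'

clear
input POLY_ID _ID
1 1
2 2
3 3
end
merge 1:1 POLY_ID using `attrs', gen(union)
tabulate union                 // 3 = matched in both datasets
assert union == 3              // stop if any region is unmatched
drop union
```

If the assert fails, tabulate union shows whether the unmatched rows come from the map attributes (1) or from the Eurostat file (2).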


Visualizing in maps
A choropleth map colours each area with an intensity proportional to the value of a quantitative variable. Some classical maps:
Quantiles: class breaks correspond to quantiles of the distribution of the variable (each class includes approximately the same number of polygons).
Equal intervals: class breaks divide the range of the variable into k equal-width intervals.
Boxplot: the distribution of the variable is divided into 6 classes: [min, p25 − 1.5·iqr], (p25 − 1.5·iqr, p25], (p25, p50], (p50, p75], (p75, p75 + 1.5·iqr] and (p75 + 1.5·iqr, max], where iqr is the interquartile range.
Standard deviates: the distribution of the variable is divided into k classes (2 ≤ k ≤ 9) whose width is defined as a fraction p of its standard deviation sd.
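The box-map classes can be reproduced by hand, which helps when reading the legend. This is a minimal sketch, assuming Stata's built-in auto dataset and its price variable as a stand-in for a regional attribute (neither belongs to the workshop data); the scalar names q1, q2, q3, iqr and the variable boxclass are illustrative.

```stata
* Sketch: assign the six box-map classes by hand
sysuse auto, clear
quietly summarize price, detail
scalar q1  = r(p25)
scalar q2  = r(p50)
scalar q3  = r(p75)
scalar iqr = scalar(q3) - scalar(q1)

* each true comparison moves the observation one class up
generate byte boxclass = 1                              ///
    + (price > scalar(q1) - 1.5*scalar(iqr))            ///
    + (price > scalar(q1))                              ///
    + (price > scalar(q2))                              ///
    + (price > scalar(q3))                              ///
    + (price > scalar(q3) + 1.5*scalar(iqr))
tabulate boxclass              // classes 1 and 6 hold the lower and upper outliers
```

Classes 1 and 6 are often empty or small; observations that fall in them are the candidates for atypical locations.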

spmap command

Syntax:
spmap [attribute] [if] [in] using basemap [, basemap_options
    polygon(polygon_suboptions)
    line(line_suboptions)
    point(point_suboptions)
    diagram(diagram_suboptions)
    arrow(arrow_suboptions)
    label(label_suboptions)
    scalebar(scalebar_suboptions)
    graph_options]



Quantile map

spmap U2012 using nuts2_shp, id(_ID) clmethod(q) title("Unemployment rate") ///
    legend(size(medium) position(5)) fcolor(Blues2) ///
    note("Europe, 2012" "Source: Eurostat")
Quantile map

spmap NM2012 using nuts2_shp, id(_ID) clmethod(q) title("Net migration rate") ///
    legend(size(medium) position(5)) fcolor(Blues2) ///
    note("Europe, 2012" "Source: Eurostat")
Equal intervals map

spmap NM2012 using nuts2_shp, id(_ID) clmethod(e) title("Net migration rate") ///
    legend(size(medium) position(5)) fcolor(BuRd) ///
    note("Europe, 2012" "Source: Eurostat")
Box map

spmap NM2012 using nuts2_shp, id(_ID) clmethod(boxplot) title("Net migration rate") ///
    legend(size(medium) position(5)) fcolor(Rainbow) ///
    note("Europe, 2012" "Source: Eurostat")
Box map

spmap U2012 using nuts2_shp, id(_ID) clmethod(boxplot) title("Unemployment rate") ///
    legend(size(medium) position(5)) fcolor(Heat) ///
    note("Europe, 2012" "Source: Eurostat")
Deviation map

spmap NM2012 using nuts2_shp, id(_ID) clmethod(s) title("Net migration rate") ///
    legend(size(medium) position(5)) fcolor(BuRd) ///
    note("Europe, 2012" "Source: Eurostat")
Combined map

spmap U2012 using nuts2_shp, id(_ID) fcolor(RdYlBu) cln(8) ///
    point(data(migr_unemp_final) xcoord(X) ycoord(Y) deviation(NM2012) sh(T) ///
    fcolor(dknavy) size(*0.3)) legend(size(medium) position(5)) legt(Unemployment) ///
    note("Solid triangles indicate values over the mean of net-migration." ///
    "Europe, 2012. Source: Eurostat")

Centrality of spatial W

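The previous maps show spatial concentration. Formally, for three regions i, j and k:

\[
\begin{bmatrix} y_i \\ y_j \\ y_k \end{bmatrix}
=
\begin{bmatrix} 0 & \alpha_{ij} & \alpha_{ik} \\ \alpha_{ji} & 0 & \alpha_{jk} \\ \alpha_{ki} & \alpha_{kj} & 0 \end{bmatrix}
\begin{bmatrix} y_i \\ y_j \\ y_k \end{bmatrix}
+
\begin{bmatrix} u_i \\ u_j \\ u_k \end{bmatrix},
\tag{1}
\]
\[
y = Ay + u. \tag{2}
\]

Identification strategy:
\[
A =
\begin{bmatrix} 0 & \alpha_{ij} & \alpha_{ik} \\ \alpha_{ji} & 0 & \alpha_{jk} \\ \alpha_{ki} & \alpha_{kj} & 0 \end{bmatrix}
= \rho
\begin{bmatrix} 0 & w_{ij} & w_{ik} \\ w_{ji} & 0 & w_{jk} \\ w_{ki} & w_{kj} & 0 \end{bmatrix}
= \rho W .
\]
This transforms a non-identified model into one that contains only one parameter: ρ.
W captures ‘who is the neighbour of whom’: it must be EXOGENOUS!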
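A rough count may help to see why the unrestricted model is not identified. This is a sketch in which n denotes the number of regions, and the value 164 is taken from the NUTS 2 example used in this workshop:

\[
\underbrace{n(n-1)}_{\text{free } \alpha_{ij}} = 164 \times 163 = 26{,}732
\quad \text{versus} \quad
n = 164 \text{ observations.}
\]

With A = ρW and W fixed in advance, the single parameter ρ replaces all of these coefficients.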


Criteria used to create W


The construction of W is usually an ad hoc choice of the researcher.
Common criteria are:
1 Geographical:
Distance functions:
inverse
inverse with threshold
Contiguity:
Rook
Queen
K nearest neighbours.
2 Socio-economic:
Similarity degree in economic dimensions (or social networks).
3 Combinations between both criteria.


Advice about W

Griffith (1995):
“It is better to use a reasonable specification of the geographic weight matrix than one that considers all connections null”.
“A relatively large number of regional units must be used in a
spatial statistical analysis.”
“Models with lower orders should be preferred over models
with higher orders”
“In general, it is better to use an under-identified weight matrix
than an over-identified one”.
• Exceptionally, it can be built from theory.
• It can be built based on non-geographical conditions: beware of
endogeneity!
• Generally, we work with a row-normalized matrix.


Generating W using Stata


In Stata there are (at least) three commands to generate W:
spatwmat:
Distance criterion.
Used for spatial univariate analysis.
File format not compatible with spmatrix (or spmat).
spwmatrix:
Generates W using geographic criteria (contiguity not available).
Generates W under socio-economic criteria.
Imports, exports and manipulates matrices from GeoDa.
File format compatible with spatwmat.
spmatrix (official Stata command):
Generates W using geographic criteria (k-nearest neighbours not available).
Imports, exports and reads matrices from GeoDa.
File format not compatible with spatwmat.
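For comparison, contiguity and inverse-distance matrices can be built with the official spmatrix command. The sketch below is not part of the original slides: it assumes nuts2.dta and nuts2_shp.dta were created by spshape2dta as shown earlier, that the coordinates are latitude and longitude in degrees (suggested by the shapefile name, but an assumption), and the matrix names Wq, Wr and Wd are illustrative.

```stata
* Sketch with the official Sp commands (Stata 15 or later)
* Assumes nuts2.dta and nuts2_shp.dta are in the working directory
use nuts2, clear
spset, modify coordsys(latlong, kilometers)    // assumption: _CX/_CY are in degrees

spmatrix create contiguity Wq, normalize(row) replace        // shared border or vertex
spmatrix create contiguity Wr, rook normalize(row) replace   // shared border only
spmatrix create idistance  Wd, normalize(row) replace        // inverse distance

spmatrix summarize Wq          // basic description of the matrix
spmatrix dir                   // list the matrices held in memory
```

With contiguity criteria, islands end up without neighbours, which is one reason a k-nearest-neighbours matrix is a common alternative for regional data.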


Generating W using Stata


We will use a geographic criterion with spwmatrix, for example 5 nearest neighbours (5-nn):
. spwmatrix gecon _CY _CX, wn(W5st) knn(5) row con
Nearest neighbor (knn = 5) spatial weights matrix (164 x 164)
calculated successfully and the following action(s) taken:
- Spatial weights matrix created as Stata object(s): W5st.
- Spatial weights matrix has been row-standardized.
Connectivity Information for the Spatial Weights Matrix
- Sparseness: 3.049%
- Neighbors: Min : 5
Mean : 5
Median: 5
Max : 5

It is not advisable to work with units without neighbours.


In addition, it is usual to standardize W (usually row-standardize).
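Row-standardization simply divides each row of W by its row total, so that the spatial lag Wy becomes an average over neighbours. A minimal Mata sketch on a hypothetical 4-region matrix (regions arranged on a line, 1-2-3-4; the matrix is invented for illustration):

```stata
mata:
// Hypothetical binary contiguity matrix for 4 regions on a line: 1-2-3-4
W = (0,1,0,0 \ 1,0,1,0 \ 0,1,0,1 \ 0,0,1,0)

assert(min(rowsum(W)) > 0)     // no unit without neighbours
Ws = W :/ rowsum(W)            // divide each row by its row total
Ws
rowsum(Ws)                     // every row now sums to 1
end
```

The assert line makes the first recommendation explicit: a unit without neighbours has a row total of zero, and the division would then produce missing values.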

Univariate spatial tests


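The following statistics measure global spatial autocorrelation and allow us to assess its significance.
Moran's I test (1950):
\[
I = \frac{n}{S_0}\,
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} (y_i-\bar{y})\, w_{ij}\, (y_j-\bar{y})}
     {\sum_{i=1}^{n} (y_i-\bar{y})^2}.
\]

Geary's c test (1954):
\[
c = \frac{n-1}{2S_0}\,
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i-y_j)^2}
     {\sum_{i=1}^{n} (y_i-\bar{y})^2}.
\]

Getis-Ord G test (1992):
\[
G = \frac{\sum_{i=1}^{n}\sum_{j\neq i}^{n} w_{ij}\, y_i y_j}
         {\sum_{i=1}^{n}\sum_{j\neq i}^{n} y_i y_j}.
\]

Here S_0 = \sum_i \sum_j w_{ij} is the sum of all weights.
Null hypothesis of the tests: no spatial autocorrelation.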
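The three statistics can be computed directly from their formulas. The following Mata sketch uses invented toy data (four regions on a line with y = 1, 2, 3, 4) and only illustrates the arithmetic; it is not the implementation used by any Stata command, and it does not compute variances or p-values.

```stata
mata:
// Hypothetical data: 4 regions on a line (1-2-3-4) with attribute y
y  = (1 \ 2 \ 3 \ 4)
Wb = (0,1,0,0 \ 1,0,1,0 \ 0,1,0,1 \ 0,0,1,0)   // binary contiguity
W  = Wb :/ rowsum(Wb)                           // row-standardized weights
n  = rows(y)
S0 = sum(W)                                     // sum of all weights
z  = y :- mean(y)                               // deviations from the mean

// Moran's I
moran = (n / S0) * (z' * W * z) / (z' * z)

// Geary's c, with D[i,j] = (y_i - y_j)^2
Y     = y * J(1, n, 1)
D     = (Y - Y') :^ 2
geary = ((n - 1) / (2 * S0)) * sum(W :* D) / (z' * z)

// Getis-Ord G with binary weights (the diagonal of Wb is zero)
getis = (y' * Wb * y) / (sum(y)^2 - y' * y)

moran, geary, getis
end
```

For this monotone toy series the sketch gives I = 0.4 and c = 0.3: similar values sit next to each other, so I lies above its null expectation and c lies below 1.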


Global spatial tests in Stata


. spatgsa U2012, w(W5st) moran geary two
Measures of global spatial autocorrelation
--------------------------------------------------------------
Moran’s I
--------------------------------------------------------------
Variables | I E(I) sd(I) z p-value*
--------------------+-----------------------------------------
U2012 | 0.767 -0.006 0.045 17.084 0.000
--------------------------------------------------------------
Geary’s c
--------------------------------------------------------------
Variables | c E(c) sd(c) z p-value*
--------------------+-----------------------------------------
U2012 | 0.228 1.000 0.054 -14.282 0.000
--------------------------------------------------------------
*2-tail test

. spatgsa U2012, w(W5bin) go two


Measures of global spatial autocorrelation
--------------------------------------------------------------
Getis & Ord’s G
--------------------------------------------------------------
Variables | G E(G) sd(G) z p-value*
--------------------+-----------------------------------------
U2012 | 0.039 0.031 0.001 11.864 0.000
--------------------------------------------------------------
*2-tail test
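As a rough consistency check on the output above (a sketch that uses the rounded values shown in the tables, so small discrepancies are expected): under the null hypothesis E(I) = −1/(n−1), and each z statistic is the standardized deviation from its expectation.

\[
E(I) = -\frac{1}{n-1} = -\frac{1}{163} \approx -0.006,
\qquad
z_I = \frac{I - E(I)}{sd(I)} \approx \frac{0.767 - (-0.006)}{0.045} \approx 17.2,
\]
\[
z_c = \frac{c - E(c)}{sd(c)} \approx \frac{0.228 - 1.000}{0.054} \approx -14.3 .
\]

Both values are close to the reported 17.084 and −14.282; the remaining gap is presumably due to rounding in the displayed table.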



Moran’s I scatterplot

splagvar U2012, wname(W5st) wfrom(Stata) ind(U2012) order(1) plot(U2012) ///
    moran(U2012)
Moran’s I scatterplot

splagvar NM2012, wname(W5st) wfrom(Stata) ind(NM2012) order(1) plot(NM2012) ///
    moran(NM2012)

Local indicators of spatial association


A local version of Moran's I is used to detect spatial clusters:
\[
I_i(d) = \frac{(x_i-\bar{x})}{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}
\sum_{j=1,\, j\neq i}^{n} w_{ij}(d)\,(x_j-\bar{x}), \tag{3}
\]
where w_{ij}(d) is a distance-based weight.
The null hypothesis is no spatial autocorrelation, and the significance of I_i can be assessed using the normal distribution:
\[
z[I_i] = \frac{I_i - E[I_i]}{\sqrt{Var[I_i]}} .
\]
This test allows grouping observations into 4 categories (see the Moran scatterplot): High-High (H-H), Low-Low (L-L), Low-High (L-H) and High-Low (H-L).
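The local statistic and the quadrant of each observation follow directly from equation (3). The Mata sketch below reuses invented toy data (four regions on a line with x = 1, 2, 3, 4) and row-standardized weights; it computes the I_i values only, without the variance needed for inference, and all names are illustrative.

```stata
mata:
// Hypothetical data: 4 regions on a line (1-2-3-4)
x  = (1 \ 2 \ 3 \ 4)
Wb = (0,1,0,0 \ 1,0,1,0 \ 0,1,0,1 \ 0,0,1,0)
W  = Wb :/ rowsum(Wb)          // row-standardized, zero diagonal
n  = rows(x)

z  = x :- mean(x)              // deviations from the mean
m2 = (z' * z) / n              // second moment, the denominator in (3)
Wz = W * z                     // spatial lag of the deviations
Ii = (z :/ m2) :* Wz           // local Moran statistic for each region

hh = (z :> 0) :& (Wz :> 0)     // High-High indicator
ll = (z :< 0) :& (Wz :< 0)     // Low-Low indicator

(z, Wz, Ii, hh, ll)
mean(Ii)                       // equals the global Moran's I when W is row-standardized
end
```

With row-standardized weights the average of the local statistics reproduces the global Moran's I, which is why the local values can be read as each region's contribution to the global pattern.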
Local Moran’s I scatterplot

genmsp_v0 U2012, w(W5st)

graph twoway ///
    (scatter Wstd_U2012 std_U2012 if pval_U2012>=0.05, msymbol(i) mlabel(_ID) ///
        mlabsize(*0.6) mlabpos(c)) ///
    (scatter Wstd_U2012 std_U2012 if pval_U2012<0.05, msymbol(i) mlabel(_ID) ///
        mlabsize(*0.6) mlabpos(c) mlabcol(red)) ///
    (lfit Wstd_U2012 std_U2012), ///
    yline(0, lpattern(--)) xline(0, lpattern(--)) ///
    xlabel(-1.5(1)4.5, labsize(*0.8)) xtitle("{it:z}") ///
    ylabel(-1.5(1)3.5, angle(0) labsize(*0.8)) ytitle("{it:Wz}") ///
    legend(off) scheme(s1color) title("Local Moran I of Unemployment rate")
Local Moran’s I map

spmap msp_U2012 using nuts2_shp, id(_ID) clmethod(unique) title("Unemployment rate") ///
    legend(size(medium) position(4)) ndl("No signif.") fcolor(blue red) ///
    note("Europe, 2012" "Source: Eurostat")

Alternative measure of global spatial autocorrelation: correlations computed for all pairs of observations as a function of distance.
Sample autocorrelation between regions i and j:
\[
\rho_{ij} = \rho(z_i, z_j) =
\frac{(z_i-\bar{z})(z_j-\bar{z})}{\frac{1}{n}\sum_{h=1}^{n}(z_h-\bar{z})^2}
\]
Problem: there are n(n − 1)/2 individual values of ρ_{ij}.
Solution: treat spatial autocorrelation as a function of distance, ρ_{ij} = g(d_{ij}):
\[
\rho(d) =
\frac{\sum_{i}\sum_{j} \mathbf{1}(d_{ij}/h)\,(z_i-\bar{z})(z_j-\bar{z})}
     {\sum_{i}\sum_{j} \mathbf{1}(d_{ij}/h)}
= I^{*}(h),
\]
where 1(·) is an indicator function and h is the bandwidth.



Spatial correlogram

spatcorr U2012, bands(0(2)12) xcoord(_CX) ycoord(_CY) graph



Summing up

ESDA is an important initial step in spatial analysis:
Show spatial dependence qualitatively (mapping).
Find outliers, spatial regimes and clusters.
Quantify spatial autocorrelation and its significance.
Stata has incorporated tools for spatial analysis.
ESDA can be carried out entirely in Stata, as in other software.


Some references

Anselin L (1995) Local indicators of spatial association – LISA. Geogr Anal 27(2):93–115.
Bivand RS (2010) Exploratory spatial data analysis. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis: software tools, methods and applications. Springer, Berlin/Heidelberg, pp 219–254.
Monmonier M (1996) How to lie with maps, 2nd edn. University of Chicago Press, Chicago.
Symanzik J (2014) Exploratory spatial data analysis. In: Handbook of regional science, pp 1295–1310.
Tukey JW (1977) Exploratory data analysis. Addison-Wesley Pub. Co, Reading.
Stata:
Drukker DM et al (2013) Creating and managing spatial-weighting matrices with the spmat command. Stata Journal 13(2):242–286.
Pisati M (2008) SPMAP: Stata module to visualize spatial data. Statistical Software Components.



Introduction
Taxonomy of spatial models
Methods of estimation
Spatial Modelling: Data-driven strategies
Interpretation
Summary

Spatial Econometrics with Stata


Spatial Models for Cross-sectional data

Marcos Herrera-Gomez1

1 CONICET-IELDE
National University of Salta (Argentina)

Graduate School of International Development


Nagoya University (Japan)
February 16th, 2022
M. Herrera-Gomez Spatial Cross-sectional Models

General index

1 Introduction
2 Taxonomy of spatial models
3 Methods of estimation
Maximum likelihood estimation
Instrumental Variables and Generalized Method of Moments
4 Spatial Modelling: Data-driven strategies
Specific to General modelling
General to Specific modelling
5 Interpretation


Sources of spatial dependence


Spatial spillover
Example: the growth rate of a region is affected by characteristics
and performances of its neighbours.
Spatial spillovers are not instantaneous; they require some time to arise
(dynamic feedback effects).
Omitted variables
Unobservable factors (e.g., location amenities) which exert an
influence on the dependent variable and are spatially correlated.
It is unlikely that explanatory variables are readily available to
capture these types of latent variables.
Measurement errors and unobserved heterogeneity
Administrative boundaries (GIS induced) that don’t accurately
reflect the nature of underlying Data Generating Process.
Anselin (2003) proposes a taxonomy of regression models: spatially lagged
dependent variables (Wy), spatially lagged explanatory variables (WX) and
spatially lagged error term (Wu).

Alternatives of specification
General Cliff-Ord model (Manski model)

y = ρWy + Xβ + WXθ + u,
u = λWu + ε.

Imposing restrictions on θ, ρ and λ, we obtain the following models:

θ = 0, ρ ≠ 0, λ = 0 → SLM (Spatial Lag Model).
θ = 0, ρ = 0, λ ≠ 0 → SEM (Spatial Error Model).
θ = 0, ρ ≠ 0, λ ≠ 0 → SARAR (Spatial AutoRegressive model with AutoRegressive error).
θ ≠ 0, ρ = 0, λ = 0 → SLX (Spatial Lag in X).
θ ≠ 0, ρ ≠ 0, λ = 0 → SDM (Spatial Durbin Model).
θ ≠ 0, ρ = 0, λ ≠ 0 → SDEM (Spatial Durbin Error Model).
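
The restrictions above amount to a lookup from "which of θ, ρ and λ are non-zero" to a model name. A minimal Python sketch of that lookup follows. The function name `spatial_model` is hypothetical, and the labels for the two cases not listed on the slide (all three parameters zero, and no restriction at all) are assumptions added for completeness.

```python
def spatial_model(theta_nonzero: bool, rho_nonzero: bool, lambda_nonzero: bool) -> str:
    """Map the pattern of non-zero spatial parameters to the model acronym."""
    taxonomy = {
        # (theta != 0, rho != 0, lambda != 0): model
        (False, False, False): "OLS",        # assumption: non-spatial linear model
        (False, True, False): "SLM",
        (False, False, True): "SEM",
        (False, True, True): "SARAR",
        (True, False, False): "SLX",
        (True, True, False): "SDM",
        (True, False, True): "SDEM",
        (True, True, True): "Cliff-Ord",     # assumption: unrestricted general model
    }
    return taxonomy[(theta_nonzero, rho_nonzero, lambda_nonzero)]


print(spatial_model(theta_nonzero=True, rho_nonzero=True, lambda_nonzero=False))
```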


MLE

The point of departure is the assumption of normality for the error terms:

$$\varepsilon \sim MVN(0, \Omega).$$

The joint likelihood then follows from the multivariate normal distribution for y.

SARAR model
Assuming |ρ| < 1 and |λ| < 1, the log-likelihood function is

$$L(\beta, \rho, \lambda, \sigma^2) = -\frac{n}{2}\ln(\pi) - \frac{1}{2}\ln|\Omega| + \ln|I - \rho W| + \ln|I - \lambda W| - \frac{1}{2}\, v'v,$$

with $v'v = (Ay - X\beta)'\, B'\, \Omega^{-1} B\, (Ay - X\beta)$ as the sum of squares of the transformed errors, where $A = I - \rho W$ and $B = I - \lambda W$, and $E[\varepsilon\varepsilon'] = \Omega$ as the variance-covariance matrix.
The Jacobian term is the determinant of a full n × n matrix, e.g. |I − ρW|.
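
To make the Jacobian term concrete, the following Python sketch evaluates ln|I − ρW| by Gaussian elimination. The 3-region row-normalized matrix `W` and the helper names are hypothetical, chosen only so that the result can be checked by hand: for this W the determinant is 1 − ρ².

```python
import math


def log_abs_det(a):
    """Natural log of |det(a)| via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return float("-inf")  # singular matrix
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return total


def log_jacobian(rho, w):
    """ln|I - rho W|, the Jacobian term of the spatial log likelihood."""
    n = len(w)
    a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)]
         for i in range(n)]
    return log_abs_det(a)


# Hypothetical example: three regions on a line, row-normalized weights.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]

print(log_jacobian(0.5, W), math.log(1 - 0.5 ** 2))
```

The elimination costs on the order of n³ operations and has to be repeated for every trial value of ρ, which is why the Jacobian is the expensive part of the likelihood for large n.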



Stata syntax for MLE (the ml option selects the estimator):
spregress depvar [indepvars], ml [options]

SLM
spregress U2012 NM2012, ml dvarlag(W5st)

SEM
spregress U2012 NM2012, ml errorlag(W5st)

SARAR
spregress U2012 NM2012, ml dvarlag(W5st) errorlag(W5st)

SDM
spregress U2012 NM2012, ml dvarlag(W5st) ivarlag(W5st: NM2012)

SDEM
spregress U2012 NM2012, ml errorlag(W5st) ivarlag(W5st: NM2012)

IV and GMM
The endogeneity of Wy can also be addressed by means of an instrumental variables or two-stage least squares (2SLS) approach:

$$E(y|X) = [I - \rho W]^{-1} X\beta = \left[I + \rho W + \rho^2 W^2 + \cdots\right] X\beta = X\beta + \rho W X\beta + \rho^2 W^2 X\beta + \cdots$$

Then, Wy is instrumented using WX, W²X, ...

For the spatial term in the error, Wu, Kelejian and Prucha (1999) develop a set of moment conditions that yield estimation equations for the parameter λ. Written in terms of the innovations ε = u − λWu:

$$E\left[\varepsilon'\varepsilon/n\right] = \sigma^2,$$
$$E\left[\varepsilon' W' W \varepsilon/n\right] = \left(\sigma^2/n\right) tr\left(W'W\right),$$
$$E\left[\varepsilon' W \varepsilon/n\right] = 0.$$
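
The series expansion above is what justifies WX and W²X as instruments. The Python sketch below builds them for a toy example and checks that the truncated series solves (I − ρW) E(y|X) = Xβ. The weights `W`, the regressor `x` and the values of `beta` and `rho` are hypothetical.

```python
def matvec(w, v):
    """Product of the square matrix w (list of rows) and the vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


# Hypothetical example: three regions on a line, row-normalized weights.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
x = [1.0, 2.0, 4.0]      # a single regressor
beta, rho = 2.0, 0.5

wx = matvec(W, x)        # first spatial lag of x: instrument WX
w2x = matvec(W, wx)      # second spatial lag of x: instrument W^2 X

# E(y|X) = sum_k rho^k W^k X beta, truncated after 200 terms.
term = [beta * xi for xi in x]
ey = [0.0] * len(x)
for _ in range(200):
    ey = [a + b for a, b in zip(ey, term)]
    term = [rho * t for t in matvec(W, term)]

# Check that the series solves (I - rho W) E(y|X) = X beta.
w_ey = matvec(W, ey)
residual = [e - rho * we - beta * xi for e, we, xi in zip(ey, w_ey, x)]
print(wx, w2x, max(abs(r) for r in residual))
```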



Stata syntax for IV/GMM (the gs2sls option selects the estimator):
spregress depvar [indepvars], gs2sls [options]

SLM
spregress U2012 NM2012, gs2sls dvarlag(W5st)

SEM
spregress U2012 NM2012, gs2sls errorlag(W5st)

SARAR
spregress U2012 NM2012, gs2sls dvarlag(W5st) errorlag(W5st)

SDM
spregress U2012 NM2012, gs2sls dvarlag(W5st) ivarlag(W5st: NM2012)

SDEM
spregress U2012 NM2012, gs2sls errorlag(W5st) ivarlag(W5st: NM2012)


Specific-to-General (STGE) Modelling

Residual tests

The first step (under STGE) is to estimate a non-spatial model and obtain the residuals.
In our case, the initial model is

U2012 = β1 + β2 NM2012 + u.

This equation is estimated by OLS:
. reg U2012 NM2012
Source | SS df MS Number of obs = 164
-------------+------------------------------ F( 1, 162) = 56.32
Model | 1453.94714 1 1453.94714 Prob > F = 0.0000
Residual | 4182.20231 162 25.8160636 R-squared = 0.2580
-------------+------------------------------ Adj R-squared = 0.2534
Total | 5636.14945 163 34.577604 Root MSE = 5.081
------------------------------------------------------------------------------
U2012 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
NM2012 | -.7011928 .0934347 -7.50 0.000 -.8856998 -.5166859
_cons | 11.43504 .4697136 24.34 0.000 10.50749 12.36259
------------------------------------------------------------------------------


Residual tests

A set of tests allows the detection of spatial autocorrelation:

Null hypothesis               Error lag in H1   Spatial lag in H1   Test
λ = 0                         yes               -                   LMERROR
λ = 0                         yes               yes                 RLMERROR
ρ = 0                         -                 yes                 LMLAG
ρ = 0                         yes               yes                 RLMLAG
No spatial autocorrelation                                          Moran's I



Moran's I test

1 Null and alternative hypotheses: H0: no spatial autocorrelation, H1: H0 does not hold.
2 Moran (1950) proposes the following test:

$$I = \frac{e'We}{e'e},$$

where e are the OLS residuals and W is the row-normalized weighting matrix.
Asymptotic distribution, under H0:

$$n\,[I - E(I)] \overset{as}{\sim} N[0, V(I)].$$

The rejection of the null hypothesis should lead us to specify a model where the spatial structure is present.
There is no model in the alternative hypothesis.
Moran's test works well even for small sample sizes, although a sample size greater than 40 units is advisable.
Disadvantage: it behaves like a misspecification test.
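
As a small numerical illustration of the statistic I = e'We / e'e, the Python sketch below evaluates it for two made-up residual vectors on a line of four regions. The weights and residuals are hypothetical and are not the workshop data; both vectors sum to zero, as OLS residuals from a model with a constant do.

```python
def matvec(w, v):
    """Product of the square matrix w (list of rows) and the vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def morans_i(e, w):
    """Moran's I for residuals e and a row-normalized weight matrix w."""
    we = matvec(w, e)
    return sum(a * b for a, b in zip(e, we)) / sum(a * a for a in e)


# Hypothetical example: four regions on a line, row-normalized weights.
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]

clustered = [1.0, 1.0, -1.0, -1.0]    # similar residuals are neighbours
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours have opposite signs

print(morans_i(clustered, W))    # positive spatial autocorrelation
print(morans_i(alternating, W))  # negative spatial autocorrelation
```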
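Spatial Dependence in the error term: LMERROR and RLMERROR

1 Null and alternative hypotheses: H0: λ = 0, H1: λ ≠ 0

(SEM) y = Xβ + u, u = λWu + ε

2 Simple version of the test:

$$\mathrm{LMERROR} = \frac{1}{T_1}\left[\frac{e'We}{\tilde{\sigma}^2}\right]^2 \overset{as}{\sim} \chi^2(1)$$

3 Robust test: this version introduces a correction to the LMERROR under the presence of a spatial lag ρ.

$$\mathrm{RLMERROR} = \frac{\left[\dfrac{e'We}{\tilde{\sigma}^2} - T_1\left(n\tilde{J}_{\rho\beta}\right)^{-1}\dfrac{e'Wy}{\tilde{\sigma}^2}\right]^2}{T_1 - T_1^2\left(n\tilde{J}_{\rho\beta}\right)^{-1}} \overset{as}{\sim} \chi^2(1)$$

where e are the OLS residuals, $T_1 = tr(W^2 + W'W)$, $n\tilde{J}_{\rho\beta} = T_1 + \dfrac{(WX\tilde{\beta})' M (WX\tilde{\beta})}{\tilde{\sigma}^2}$, $M = I - X(X'X)^{-1}X'$ and $\tilde{\sigma}^2 = \dfrac{e'e}{n}$.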
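Spatial Dependence in the dependent variable: LMLAG and RLMLAG

1 Null and alternative hypotheses: H0: ρ = 0, H1: ρ ≠ 0

(SLM) y = ρWy + Xβ + ε

2 Simple version of the test:

$$\mathrm{LMLAG} = \frac{\left[\dfrac{e'Wy}{\tilde{\sigma}^2}\right]^2}{n\tilde{J}_{\rho\beta}} \overset{as}{\sim} \chi^2(1)$$

3 Robust test: this version introduces a correction to the LMLAG under the presence of a spatial error term λ.

$$\mathrm{RLMLAG} = \frac{\left[\dfrac{e'Wy}{\tilde{\sigma}^2} - \dfrac{e'We}{\tilde{\sigma}^2}\right]^2}{n\tilde{J}_{\rho\beta} - T_1} \overset{as}{\sim} \chi^2(1)$$

where e are the OLS residuals, $n\tilde{J}_{\rho\beta} = T_1 + \dfrac{(WX\tilde{\beta})' M (WX\tilde{\beta})}{\tilde{\sigma}^2}$, $T_1 = tr(W^2 + W'W)$, $M = I - X(X'X)^{-1}X'$ and $\tilde{\sigma}^2 = \dfrac{e'e}{n}$.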
Spatial test in Stata

reg U2012 NM2012


spatdiag, weights(W5st)

Diagnostic tests for spatial dependence in OLS regression


------------------------------------------------------------
Diagnostics
------------------------------------------------------------
Test | Statistic df p-value
-------------------------------+----------------------------
Spatial error: |
Moran’s I | 12.703 1 0.000
Lagrange multiplier | 148.081 1 0.000
Robust Lagrange multiplier | 0.750 1 0.386
|
Spatial lag: |
Lagrange multiplier | 182.220 1 0.000
Robust Lagrange multiplier | 34.889 1 0.000
------------------------------------------------------------
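According to the evidence of the tests, an SLM should be estimated: both simple LM tests reject their null, but only the robust LM test for the spatial lag remains significant (p = 0.000, against p = 0.386 for the robust error test):

U2012 = ρ (W × U2012) + β1 + β2 NM2012 + u.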
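
The reading of the diagnostics above can be written as a small decision rule. The Python sketch below encodes one commonly used version of it (look at the simple LM tests first and use the robust versions only when both reject). The function name `choose_model`, the 5% threshold and the exact rule are assumptions, since the chart referred to in the slides is not reproduced here.

```python
def choose_model(p_lm_error, p_lm_lag, p_rlm_error, p_rlm_lag, alpha=0.05):
    """Suggest a specification from the p-values of the four LM tests."""
    error_sig = p_lm_error < alpha
    lag_sig = p_lm_lag < alpha
    if not error_sig and not lag_sig:
        return "keep the non-spatial model"
    if error_sig and not lag_sig:
        return "add a spatial error term"
    if lag_sig and not error_sig:
        return "add a spatial lag of y"
    # Both simple tests reject: compare the robust versions.
    r_error_sig = p_rlm_error < alpha
    r_lag_sig = p_rlm_lag < alpha
    if r_lag_sig and not r_error_sig:
        return "add a spatial lag of y"
    if r_error_sig and not r_lag_sig:
        return "add a spatial error term"
    return "undetermined"


# p-values reported by spatdiag after the OLS regression above.
print(choose_model(p_lm_error=0.000, p_lm_lag=0.000,
                   p_rlm_error=0.386, p_rlm_lag=0.000))
```

Adding a spatial lag of y to the OLS model gives the SLM; applying the same rule to the residuals of an SLX model would point to the SDM.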



Don’t forget the SLX

Starting from OLS and using the LM tests, you are exploring only the SEM and SLM models.
However, the third alternative (according to the chart) was not explored.

Now, we check the SLX:

U2012 = β1 + β2 NM2012 + θ1 (W × NM2012) + u.

splagvar, wname(W5st) wfrom(Stata) ind(NM2012)

reg U2012 NM2012 wx_NM2012 (omitted results)

SLX is a competitive model: θ1 is significant.

Now, we check for the presence of spatial effects in the SLX residuals.



Spatial test in Stata

reg U2012 NM2012 wx_NM2012


spatdiag, weights(W5st)

Diagnostics
------------------------------------------------------------
Test | Statistic df p-value
-------------------------------+----------------------------
Spatial error: |
Moran’s I | 12.868 1 0.000
Lagrange multiplier | 151.248 1 0.000
Robust Lagrange multiplier | 2.171 1 0.141
|
Spatial lag: |
Lagrange multiplier | 156.114 1 0.000
Robust Lagrange multiplier | 7.037 1 0.008
------------------------------------------------------------
According to the evidence of the tests, an SDM should be estimated:

U2012 = ρ (W × U2012) + β1 + β2 NM2012 + θ1 (W × NM2012) + u.



Conclusion of STGE

From an OLS model, we detect spatial effects in the dependent variable: SLM.

U2012 = ρ (W × U2012) + β1 + β2 NM2012 + u.

But, following the third alternative, we detect spatial effects in the SLX residuals: SDM.

U2012 = ρ (W × U2012) + β1 + β2 NM2012 + θ1 (W × NM2012) + u.

Also, there is another alternative: the SDM could be reduced to an SEM.



Likelihood Ratio: Common factor test
Assuming the SDM model has been estimated:

y = ρWy + Xβ + WXθ + u.

The null and alternative hypotheses are: H0: θ + ρβ = 0, H1: θ + ρβ ≠ 0.

Under H0, θ = −ρβ, and replacing into the SDM model:

y = ρWy + Xβ + WX(−ρβ) + u = ρWy + Xβ − ρWXβ + u,

(I − ρW) y = (I − ρW) Xβ + u.

The last expression can be written as an SEM, $y = X\beta + (I - \lambda W)^{-1}\varepsilon$, where ρ has been replaced by λ.
Under the null hypothesis we have an SEM and, under the alternative hypothesis, an SDM:

$$LR_{COMFAC} = 2\left[\, l|_{H_1} - l|_{H_0} \right] \overset{as}{\sim} \chi^2_q,$$

where q is the number of restrictions.

lrtest SDM_ml SEM_ml


Likelihood-ratio test LR chi2(1) = 6.81
(Assumption: SEM_ml nested in SDM_ml) Prob > chi2 = 0.0091

Conclusion of STGE

Variable       OLS        SLX        SLM        SDM
NM2012         −0.70**    −0.29**    −0.19**    −0.16**
W × NM2012                −0.95**               −0.10***
const          11.44**    13.02**    2.41**     −2.82**
ρ̂                                    0.82**     −0.80**
loglik         −489.28    −478.67    −419.35    −418.88
AIC            1000.56    963.33     846.70     847.76
Note: ** p < 0.05.

What is the best model?



General-to-Specific (GETS) Modelling

Initial model for GETS

The GETS strategy starts with the most complex model and then, using LR tests, works down, dropping non-significant variables.

LeSage and Pace (2009) suggest starting with the Spatial Durbin Model (it nests most of the other models).

Elhorst (2014) suggests comparing it with the Spatial Durbin Error Model (which produces similar predictions in many cases).


LR test from SDM


lrtest SDM_ml SEM_ml
Likelihood-ratio test LR chi2(1) = 6.81
(Assumption: SEM_ml nested in SDM_ml) Prob > chi2 = 0.0091

lrtest SDM_ml SLX_ml


Likelihood-ratio test LR chi2(1) = 119.57
(Assumption: SLX_ml nested in SDM_ml) Prob > chi2 = 0.0000

lrtest SDM_ml SLM_ml


Likelihood-ratio test LR chi2(1) = 0.94
(Assumption: SLM_ml nested in SDM_ml) Prob > chi2 = 0.3331

We select the SLM: it is the only restriction of the SDM that is not rejected (p = 0.3331).
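
The LR statistics reported by lrtest can be reproduced, up to rounding, from the log likelihoods listed in the model comparison table (SDM −418.88, SLM −419.35, SLX −478.67). The Python sketch below does this for one restriction, using the identity P(χ²(1) > x) = erfc(√(x/2)). The function name `lr_test_df1` is hypothetical.

```python
import math


def lr_test_df1(loglik_unrestricted, loglik_restricted):
    """LR statistic and chi2(1) p-value for a single restriction."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Rounded log likelihoods taken from the comparison table.
lr_slm, p_slm = lr_test_df1(-418.88, -419.35)   # SLM nested in SDM
lr_slx, p_slx = lr_test_df1(-418.88, -478.67)   # SLX nested in SDM

print(round(lr_slm, 2), round(p_slm, 2))
print(round(lr_slx, 2), p_slx)
```

Because the inputs are rounded to two decimals, the results can differ slightly from the Stata output (for example 119.58 here against 119.57 above).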



GETS: selecting the best model

Variable       OLS        SLX        SLM        SDM        SDEM       SARAR
NM2012         −0.70**    −0.29**    −0.19**    −0.16**    −0.21**    −0.16**
W × NM2012                −0.95**               −0.10***   −0.32***
const          11.44**    13.02**    2.41**     −2.82**    −11.78**   −1.74**
ρ̂                                    0.82**     −0.80**               −0.88**
λ̂                                                          0.83**     −0.33**
LRCOMFAC                                        6.81**
loglik         −489.28    −478.67    −419.35    −418.88    −421.12    −418.08
AIC            1000.56    963.33     846.70     847.76     852.25     846.15

Note: ** p < 0.05.


Maximum Likelihood: selecting the best model

From specific to general strategy:

Using LM tests: spatial lag model (SLM).

Between SDM and SEM: LRCOMFAC rejects the SEM restriction (p = 0.0091).

From general to specific strategy:

Start from the SDM and sequentially eliminate non-significant variables: SLM selected.


Results under IV/GMM

Variable SEM SLM SDM SDEM SARAR
NM2012 −0.21** −0.14** −0.14** −0.24** −0.15**
W × NM2012 0.02** −0.47**
const 10.50** −1.57** −1.40** −11.97** −1.60**
ρ̂ −0.89** −0.91** −0.89**
λ̂ 0.78** −0.19**
pseudo-R² 0.26 0.43 0.43 0.41 0.43

Note: ** p < 0.05.


Interpretation of estimated parameters

In SLM, SARAR or SDM models, a change in the variable x_k in region i affects the region itself and potentially affects the other regions indirectly through the spatial multiplier mechanism, $(I - \rho W)^{-1}$.
In a linear model, the marginal effects are:

$$\frac{\partial E(y_i)}{\partial x_{ik}} = \hat{\beta}_k, \qquad \frac{\partial E(y_j)}{\partial x_{ik}} = 0 \quad (j \neq i),$$

but in spatial models with Wy and/or WX, the second effect is not zero.


SLM. Direct and indirect effects


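The marginal effect of the explanatory variable x_k on the dependent variable is:

$$
\begin{aligned}
\left[\frac{\partial y}{\partial x_{1k}} \;\cdots\; \frac{\partial y}{\partial x_{nk}}\right]
&= \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_{1k}} & \cdots & \dfrac{\partial y_1}{\partial x_{nk}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y_n}{\partial x_{1k}} & \cdots & \dfrac{\partial y_n}{\partial x_{nk}}
\end{bmatrix} \\
&= (I_n - \rho W)^{-1}
\begin{bmatrix}
\beta_k & 0 & \cdots & 0 \\
0 & \beta_k & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \beta_k
\end{bmatrix} \\
&= (I_n - \rho W)^{-1}\left[\beta_k I_n\right]. \qquad (1)
\end{aligned}
$$

Direct effect: average of the elements of the principal diagonal of $(I_n - \rho W)^{-1}[\beta_k I_n]$.
Indirect effect (spatial spillover): average of the row sums of $(I_n - \rho W)^{-1}[\beta_k I_n]$, excluding the elements of the principal diagonal.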
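
The definitions of direct, indirect and total effects can be checked on a small example. The Python sketch below uses the SLM estimates reported in this section (β ≈ −0.1898 and ρ ≈ 0.8175) with a hypothetical 3-region row-normalized W, so the direct and indirect effects differ from the workshop results. The total effect does not: for any row-normalized W it equals β/(1 − ρ).

```python
def inverse(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical weights: three regions on a line, row-normalized.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
n = len(W)
beta, rho = -0.1898462498, 0.8174714987   # SLM estimates reported in the slides

A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]
S = inverse(A)                                        # (I - rho W)^(-1)

total = beta * sum(sum(row) for row in S) / n         # average row sum
direct = beta * sum(S[i][i] for i in range(n)) / n    # average diagonal element
indirect = total - direct                             # spatial spillover

print(direct, indirect, total)
```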
Example under SLM

. estat impact
progress :100%
Average impacts Number of obs = 164
--------------------------------------------------------
| Delta-Method
| dy/dx Std. Err. z P>|z|
-------------+------------------------------------------
direct |
NM2012 | -.2414164 .0700594 -3.45 0.001
-------------+------------------------------------------
indirect |
NM2012 | -.7986744 .2491086 -3.21 0.001
-------------+------------------------------------------
total |
NM2012 | -1.040091 .3029917 -3.43 0.001
--------------------------------------------------------
Example under SLM
If we apply manually the above expressions (SLM under MLE):
. mata:
---------------------- mata (type end to exit)-------------------
: b = st_matrix("e(b)")
: b
1 2 3 4
+-------------------------------------------------------------+
1 | -.1898462498 2.40790211 .8174714987 8.182474325 |
+-------------------------------------------------------------+
: rho = b[1,3]
: rho
.8174714987
: S = luinv(I(rows(W))-rho*W)
: end
-----------------------------------------------------------------
. * Total effects
. mata: (b[1,1]/rows(W))*sum(S)
-1.040090862
. * Direct effects
. mata: (b[1,1]/rows(W))*trace(S)
-.2414164387
. * Indirect effects (spatial spillovers)
. mata: (b[1,1]/rows(W))*sum(S) - (b[1,1]/rows(W))*trace(S)
-.7986744231

Summing up

Stata offers one of the most complete sets of tools for estimating spatial econometric models with cross-sectional data:
MLE
IV/GMM.
Also, for cross-section data, the most common spatial specifications can be estimated by ML and/or IV/GMM.
Main results on the impact of net migration:
Cross-section model: the SLM shows a negative impact on unemployment (long-run effect).


Some references
Anselin L. (2003). “Spatial Externalities, Spatial Multipliers and Spatial
Econometrics,” International Regional Science Review, 26, 153-166.
Anselin, L. and A. Bera (1998). “Spatial dependence in linear regression
models with an Introduction to Spatial Econometrics,” Handbook of
Applied Economic Statistics, pp. 237-289.
Brueckner, J. (2003). “Strategic interaction among governments: An
overview of empirical studies,” International Regional Science Review,
26(2).
Kelejian, H. H., and Prucha, I. R. (1998). A generalized spatial two-stage
least squares procedure for estimating a spatial autoregressive model with
autoregressive disturbances. The Journal of Real Estate Finance and
Economics, 17(1), 99-121.
Mur, J., and Angulo, A. (2009). Model selection strategies in a spatial
setting: Some additional results. Regional Science and Urban Economics,
39(2), 200-213.



Introduction to panel data models
Testing spatial effects
Static spatial panel models
Dynamic spatial panel models
Common factors
Summary

Spatial Econometrics with Stata


Spatial Econometrics Models for Panel data

Marcos Herrera-Gomez1
([email protected])

1 CONICET-IELDE
National University of Salta (Argentina)

Graduate School of International Development


Nagoya University (Japan)
February 18th, 2022
M. Herrera-Gomez Spatial Panel Data Models

General index

1 Introduction to panel data models


2 Testing spatial effects
Pooled Model
Fixed and Random models
3 Static spatial panel models
4 Dynamic spatial panel models
5 Common factors
General Nested Spatial model with common factors
Modelling Common Factors


Basic model: pooled model

Consider a linear model:

yt = Xt β + ut,
where:
yt is an n × 1 vector of outcomes for each t ∈ {1, ..., T}.
Xt is an n × k matrix of time-invariant individual explanatory variables.
ut is an n × 1 vector of random error terms.
Problem:
This model does not control for heterogeneity: time-specific or individual-specific factors could affect the dependent variable.


Model with individual and temporal effects


If we reconsider the basic model for each individual, with k independent variables xit:

yit = xit β + uit,

where i = 1, ..., n, t = 1, ..., T.
We can decompose the error term into a two-way error component:

uit = µi + φt + εit,
where µi is a region-specific effect common to all time periods and φt is a time-specific effect common to all regions.
These effects can be treated as fixed or random.
In the fixed effects model, a dummy variable is introduced for each region and each time period.
In the random effects model, µi (i = 1, ..., n) is treated as a random variable that is independently and identically distributed, i.i.d.(0, σ²µ), with cov(µi, εit) = 0 (a similar assumption holds for φt).
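
In a balanced panel, including a dummy for each region and each period is equivalent to double-demeaning the data (subtracting region means and period means and adding back the overall mean). The Python sketch below illustrates, with made-up numbers, that this transformation removes µi and φt from the two-way error and leaves only the transformed εit. All values and the helper name `double_demean` are hypothetical.

```python
def double_demean(m):
    """Two-way within transformation of an n x T matrix (balanced panel)."""
    n, t_len = len(m), len(m[0])
    row_mean = [sum(row) / t_len for row in m]
    col_mean = [sum(m[i][t] for i in range(n)) / n for t in range(t_len)]
    grand = sum(row_mean) / n
    return [[m[i][t] - row_mean[i] - col_mean[t] + grand for t in range(t_len)]
            for i in range(n)]


# Hypothetical components for n = 3 regions and T = 4 periods.
mu = [1.0, -2.0, 0.5]                       # region-specific effects
phi = [0.3, -0.1, 0.4, 0.0]                 # time-specific effects
eps = [[0.2, -0.1, 0.0, 0.3],
       [-0.4, 0.1, 0.2, -0.2],
       [0.1, 0.0, -0.3, 0.1]]               # idiosyncratic errors

u = [[mu[i] + phi[t] + eps[i][t] for t in range(4)] for i in range(3)]

u_within = double_demean(u)
eps_within = double_demean(eps)

# The transformed composite error equals the transformed idiosyncratic error.
max_gap = max(abs(a - b)
              for row_u, row_e in zip(u_within, eps_within)
              for a, b in zip(row_u, row_e))
print(max_gap)
```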

Random effects models


This model is quite popular among applied econometricians, for the following reasons:
1 It may be considered a compromise solution: panel data models with controls for fixed effects only utilize the time-series component of the data, whereas RE models employ both the time-series and the cross-sectional components.
2 The RE model avoids the loss of degrees of freedom incurred by the fixed effects model: it is an efficient estimator under ideal conditions.
3 The RE model avoids the problem that the coefficients of variables that vary only a little cannot be estimated.
However, the random effects model should satisfy three conditions:
(1) The number of units should potentially be able to go to infinity.
(2) The units in the sample should be representative of a larger population.
(3) The correlation between the random effects, µi (i = 1, ..., n), and the explanatory variables needs to be 0.
These three conditions do not tend to be satisfied in spatial research.

Types of asymptotics

There are two types of asymptotics for spatial data:

1 INFILL asymptotic structure: the sampling region remains bounded.
As n goes to infinity, the additional units come from observations
taken between those already observed.
2 INCREASING DOMAIN asymptotic structure: the sampling region
grows as n goes to infinity. In this case, the initial observations
preserve their spatial neighbourhood structure.
Also, there are two types of sampling designs:
(a) stochastic design, where the spatial units are randomly drawn.
(b) fixed design, where the spatial units lie on a non-random field.
The spatial econometric literature mainly focuses on increasing domain
asymptotics under a fixed sample design (Cressie 1993, Elhorst 2014).


Fixed effects models

In addition to the increasing domain asymptotics and the fixed sample
design: when the dataset contains all spatial units within a study area, it is
questionable whether they are still representative of a larger population.
For example, given all the states in a country, the population may
be said to be sampled exhaustively (we have the population). Then,
random effects are not necessary and fixed effects should be
specified.
Also, in spatial econometrics there is a prominent reason for fixed effects:
under infill asymptotics, the spatial weight matrix cannot be consistently
specified and the impact of spatial interaction effects cannot be
consistently estimated.
In general, the fixed effects model is more appropriate than the random
effects model. However, the random effects model remains a competitive
option if the target population is a “super-population”.


Fixed or random effects

The Hausman test (1978) is computed as:

$$H = (\beta_{fe} - \beta_{re})' (V_{fe} - V_{re})^{-1} (\beta_{fe} - \beta_{re}),$$

where $\beta_{fe}$ is the vector of coefficients of the consistent estimator
(fe), $\beta_{re}$ is the vector of coefficients of the efficient estimator (re),
and $V_{fe}$ and $V_{re}$ are the variance-covariance matrices of fe and re,
respectively. Under the null hypothesis, this statistic is distributed as
$\chi^2_q$, with q degrees of freedom (the number of coefficients common to
both models).
The Hausman test can be regarded as a validation test of the re
estimator, which is the null hypothesis.
Hausman’s specification test can also be used in models with
spatial lags Wy and WX.
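The quadratic form above can be illustrated with a minimal Python sketch. It assumes two common coefficients, so that the inverse of $V_{fe} - V_{re}$ has a closed form; the coefficient vectors and variance matrices are hypothetical numbers chosen for illustration, not estimates from the workshop data, and the function name `hausman` is likewise an assumption.

```python
def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman statistic for two common coefficients (2x2 closed-form inverse)."""
    # Difference between the consistent (fe) and efficient (re) estimates
    d = [b_fe[0] - b_re[0], b_fe[1] - b_re[1]]
    # Elements of V_fe - V_re
    a = V_fe[0][0] - V_re[0][0]
    b = V_fe[0][1] - V_re[0][1]
    c = V_fe[1][0] - V_re[1][0]
    e = V_fe[1][1] - V_re[1][1]
    det = a * e - b * c
    if det == 0:
        raise ValueError("V_fe - V_re is singular")
    # Inverse of the 2x2 difference matrix
    inv = [[e / det, -b / det], [-c / det, a / det]]
    # Quadratic form d' (V_fe - V_re)^{-1} d
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


# Hypothetical estimates (illustration only)
b_fe = [0.50, -0.20]
b_re = [0.40, -0.10]
V_fe = [[0.010, 0.0], [0.0, 0.020]]
V_re = [[0.006, 0.0], [0.0, 0.010]]

H = hausman(b_fe, b_re, V_fe, V_re)
CRIT_5PCT_DF2 = 5.991  # 5% critical value of a chi-square with 2 degrees of freedom
reject_re = H > CRIT_5PCT_DF2
print(round(H, 4), reject_re)
```

With these made-up inputs the statistic stays below the 5% critical value, so the re estimator would not be rejected. In finite samples $V_{fe} - V_{re}$ may fail to be positive definite, in which case the statistic is not reliable.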


General Index

1 Introduction to panel data models


2 Testing spatial effects
Pooled Model
Fixed and Random models
3 Static spatial panel models
4 Dynamic spatial panel models
5 Common factors
General Nested Spatial model with common factors
Modelling Common Factors


Simple LM Tests
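Under a non-spatial pooled model, or under its SLX extension, we can test
for spatial autocorrelation in the errors:

$$LM_{ERROR} = \frac{\left[\hat{u}'(I_T \otimes W)\hat{u} / \hat{\sigma}^2\right]^2}{T \times T_1} \underset{as}{\sim} \chi^2_{(1)},$$

where $T_1 = \operatorname{tr}[(W' + W)W]$, $\hat{u}$ are the OLS residuals from the pooled
model and $\hat{\sigma}^2 = \hat{u}'\hat{u}/(n \times T)$.
Also, the presence of a spatial lag can be tested with:

$$LM_{LAG} = \frac{\left[\hat{u}'(I_T \otimes W) y / \hat{\sigma}^2\right]^2}{\hat{J}} \underset{as}{\sim} \chi^2_{(1)},$$

where $\hat{J} = \frac{1}{\hat{\sigma}^2}\left[\left((I_T \otimes W) X\hat{\beta}\right)' M_{Tn} \left((I_T \otimes W) X\hat{\beta}\right) + T \times T_1 \hat{\sigma}^2\right]$, with
$M_{Tn} = I_{Tn} - X(X'X)^{-1}X'$.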
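As a rough numerical illustration of the LM error statistic, the sketch below evaluates it in plain Python. The residuals and the 3 x 3 weight matrix are hypothetical; in practice the residuals would come from the pooled OLS regression. The function name `lm_error` and the data layout (one list of residuals per period) are assumptions made for this example.

```python
def lm_error(resid, W):
    """LM test for spatial error autocorrelation in a pooled panel.

    resid[t][i] is the pooled-OLS residual of unit i in period t;
    W is the n x n spatial weight matrix.
    """
    T = len(resid)
    n = len(W)
    # u'(I_T kron W)u equals the sum over periods of u_t' W u_t
    quad = 0.0
    for u in resid:
        Wu = [sum(W[i][j] * u[j] for j in range(n)) for i in range(n)]
        quad += sum(u[i] * Wu[i] for i in range(n))
    # sigma^2 = u'u / (n * T)
    sigma2 = sum(v * v for u in resid for v in u) / (n * T)
    # T1 = tr[(W' + W) W] = sum over i, j of (W_ij^2 + W_ij * W_ji)
    T1 = sum(W[i][j] ** 2 + W[i][j] * W[j][i] for i in range(n) for j in range(n))
    return (quad / sigma2) ** 2 / (T * T1)


# Hypothetical row-normalised weights for three units on a line
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
# Hypothetical residuals for T = 2 periods (illustration only)
resid = [[1.0, -1.0, 1.0],
         [-1.0, 1.0, -1.0]]

lm = lm_error(resid, W)
CRIT_5PCT_DF1 = 3.841  # 5% critical value of a chi-square with 1 degree of freedom
print(lm, lm > CRIT_5PCT_DF1)
```

The Kronecker product is never formed explicitly: because $I_T \otimes W$ is block diagonal, the quadratic form reduces to a sum of period-by-period terms.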

Robust LM Tests

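The robust version of the LM error:

$$RLM_{ERROR} = \frac{\left[\dfrac{\hat{u}'(I_T \otimes W)\hat{u}}{\hat{\sigma}^2} - T \times T_1\, \hat{J}^{-1} \times \dfrac{\hat{u}'(I_T \otimes W) y}{\hat{\sigma}^2}\right]^2}{T \times T_1 \left[1 - T \times T_1\, \hat{J}^{-1}\right]} \underset{as}{\sim} \chi^2_{(1)}.$$

The robust version of the LM lag:

$$RLM_{LAG} = \frac{\left[\dfrac{\hat{u}'(I_T \otimes W) y}{\hat{\sigma}^2} - \dfrac{\hat{u}'(I_T \otimes W)\hat{u}}{\hat{\sigma}^2}\right]^2}{\hat{J} - T \times T_1} \underset{as}{\sim} \chi^2_{(1)}.$$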



Detection of spatial dependence

To incorporate spatial effects we need some evidence of their presence. One
possible test is the CD test (Pesaran, 2004):

$$CD = \sqrt{\frac{2T}{n(n-1)}} \left( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \hat{\rho}_{ij} \right),$$

where $\hat{\rho}_{ij}$ is the correlation coefficient of the residuals between units i and j:

$$\hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t=1}^{T} \hat{u}_{it}\hat{u}_{jt}}{\left(\sum_{t=1}^{T} \hat{u}_{it}^2\right)^{1/2} \left(\sum_{t=1}^{T} \hat{u}_{jt}^2\right)^{1/2}}. \quad (1)$$

Null hypothesis: no correlation in the cross-section dimension.
In Stata:
. xtreg U NM, fe
(output omitted)
. xtcsd, pes abs

Pesaran’s test of cross sectional independence = 60.169, Pr = 0.0000

Average absolute value of the off-diagonal elements = 0.464
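To make the arithmetic behind the CD statistic concrete, here is a small Python sketch that evaluates the two formulas above on a toy residual matrix. It is not how `xtcsd` is implemented; the residuals are hypothetical and the function name `pesaran_cd` is an assumption. A balanced panel is assumed.

```python
import math


def pesaran_cd(resid):
    """CD statistic; resid[i][t] is the residual of unit i in period t."""
    n = len(resid)
    T = len(resid[0])
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Pairwise correlation of residuals, as in equation (1)
            num = sum(resid[i][t] * resid[j][t] for t in range(T))
            den = (math.sqrt(sum(v * v for v in resid[i]))
                   * math.sqrt(sum(v * v for v in resid[j])))
            total += num / den
    return math.sqrt(2.0 * T / (n * (n - 1))) * total


# Hypothetical residuals for n = 3 units and T = 4 periods (illustration only)
resid = [[1.0, -1.0, 1.0, -1.0],
         [1.0, -1.0, 1.0, -1.0],
         [1.0, 1.0, -1.0, -1.0]]

cd = pesaran_cd(resid)
print(round(cd, 4))
```

Under the null of cross-sectional independence the statistic is asymptotically standard normal, so it is compared with the usual normal critical values (for example 1.96 in absolute value at the 5% level).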

Spatial lag model


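The SLM with fixed effects is:

$$y_t = \rho W y_t + X_t \beta + \mu + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2 I_n), \quad (2)$$

where

$$y_t = \begin{bmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{nt} \end{bmatrix}, \quad
X_t = \begin{bmatrix} x_{11t} & x_{21t} & \cdots & x_{k1t} \\ x_{12t} & x_{22t} & \cdots & x_{k2t} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1nt} & x_{2nt} & \cdots & x_{knt} \end{bmatrix}, \quad
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}.$$

Under random effects, this model can be written as:

$$y_t = \rho W y_t + X_t \beta + \underbrace{\mu + \varepsilon_t}_{u_t}, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2 I_n), \quad \mu \sim N(0, \sigma_\mu^2 I_n). \quad (3)$$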
 


SLM. Direct and indirect effects


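The partial effect of a one-unit increase in the k-th explanatory variable in the SLM is:

$$\left[ \frac{\partial y}{\partial x_{1k}} \cdots \frac{\partial y}{\partial x_{nk}} \right]
= \begin{bmatrix} \frac{\partial y_1}{\partial x_{1k}} & \cdots & \frac{\partial y_1}{\partial x_{nk}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_{1k}} & \cdots & \frac{\partial y_n}{\partial x_{nk}} \end{bmatrix}
= (I_n - \rho W)^{-1} \begin{bmatrix} \beta_k & 0 & \cdots & 0 \\ 0 & \beta_k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_k \end{bmatrix}
= (I_n - \rho W)^{-1} [\beta_k I_n]. \quad (4)$$

Direct effect: average of the main-diagonal elements of
$(I_n - \rho W)^{-1} [\beta_k I_n]$.
Indirect effect (spatial spillover): average of the row sums of the
off-diagonal elements of $(I_n - \rho W)^{-1} [\beta_k I_n]$.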
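The definitions of direct and indirect effects can be checked numerically on a toy example. The sketch below is plain Python with a small Gauss-Jordan inverse; the weight matrix, ρ = 0.5 and β_k = 1 are hypothetical values, and the helper names `invert` and `slm_effects` are assumptions for illustration (this is not the routine used by any Stata command).

```python
def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(a)
    # Augment the matrix with the identity
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def slm_effects(W, rho, beta_k):
    """Average direct, indirect and total effects from (I - rho W)^{-1} beta_k."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    S = [[beta_k * v for v in row] for row in invert(A)]
    direct = sum(S[i][i] for i in range(n)) / n      # mean of the diagonal
    total = sum(sum(row) for row in S) / n           # mean of the row sums
    return direct, total - direct, total


# Hypothetical row-normalised weights for three units on a line
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
direct, indirect, total = slm_effects(W, rho=0.5, beta_k=1.0)
print(round(direct, 4), round(indirect, 4), round(total, 4))
```

Because this W is row-normalised, the average total effect equals β_k / (1 − ρ) = 2, and the spillover is whatever remains after subtracting the direct effect.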

Spatial Error Model

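The SEM with fixed effects is:

$$y_t = X_t \beta + \mu + \varepsilon_t, \qquad \varepsilon_t = \rho W \varepsilon_t + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2 I_n), \quad (5)$$

and the version of the SEM with random effects is:

$$y_t = X_t \beta + \underbrace{\mu + \varepsilon_t}_{u_t}, \qquad \varepsilon_t = \rho W \varepsilon_t + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2 I_n), \quad \mu \sim N(0, \sigma_\mu^2 I_n).$$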



Spatial Durbin Model


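SDM specification:

$$y_t = \rho W y_t + X_t \beta + W X_t \gamma + \varepsilon_t, \quad (6)$$

with direct and indirect effects:

$$\left[ \frac{\partial y}{\partial x_{1k}} \cdots \frac{\partial y}{\partial x_{nk}} \right]
= \begin{bmatrix} \frac{\partial y_1}{\partial x_{1k}} & \cdots & \frac{\partial y_1}{\partial x_{nk}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_{1k}} & \cdots & \frac{\partial y_n}{\partial x_{nk}} \end{bmatrix}
= (I_n - \rho W)^{-1} \begin{bmatrix} \beta_k & w_{12}\gamma_k & \cdots & w_{1n}\gamma_k \\ w_{21}\gamma_k & \beta_k & \cdots & w_{2n}\gamma_k \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1}\gamma_k & w_{n2}\gamma_k & \cdots & \beta_k \end{bmatrix}
= (I_n - \rho W)^{-1} [\beta_k I_n + \gamma_k W]. \quad (7)$$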
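A short worked step may help in reading the effects matrix in (7). It relies on two assumptions that are not stated on the slide: W is row-normalised, so that $W\iota_n = \iota_n$ for the vector of ones $\iota_n$, and $|\rho| < 1$. Under those assumptions the average total effect has a closed form:

```latex
\begin{aligned}
(I_n - \rho W)\,\iota_n &= (1 - \rho)\,\iota_n
  \;\Longrightarrow\; (I_n - \rho W)^{-1}\iota_n = \frac{1}{1-\rho}\,\iota_n, \\
(I_n - \rho W)^{-1}\left[\beta_k I_n + \gamma_k W\right]\iota_n
  &= \frac{\beta_k + \gamma_k}{1-\rho}\,\iota_n, \\
\text{total}_k = \frac{\beta_k + \gamma_k}{1-\rho}, \qquad
\text{direct}_k &= \frac{1}{n}\operatorname{tr}\!\left\{(I_n - \rho W)^{-1}\left[\beta_k I_n + \gamma_k W\right]\right\}, \qquad
\text{indirect}_k = \text{total}_k - \text{direct}_k .
\end{aligned}
```

Every row sum of the effects matrix is then the same, so the total effect does not depend on the particular structure of W, whereas the split between direct and indirect effects does.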



command xsmle

SLM
xsmle U NM t2-t6, fe type(ind, leeyu) wmat(W5_st) mod(sar) ///
    hausman

SEM
xsmle U NM t2-t6, fe type(ind, leeyu) emat(W5_st) ///
    mod(sem) hausman

SDM
xsmle U NM t2-t6, fe type(ind, leeyu) wmat(W5_st) mod(sdm) ///
    durbin(NM) hausman

SDEM
xsmle U NM wx_NM t2-t6, fe type(ind, leeyu) emat(W5_st) ///
    mod(sem)

Alternative Models

Variable       SLM          SEM          SDM          SDEM
NM             −0.169∗∗∗    −0.166∗∗∗    −0.147∗∗∗    −0.190∗∗∗
W × NM                                   −0.048∗∗∗    −0.361∗∗∗
ρ̂              −0.745∗∗∗                 −0.721∗∗∗
λ̂                           −0.840∗∗∗                 −0.735∗∗∗
COMFAC                                   −90.42∗∗∗
Spatial effects (long run)
  Directs      −0.200∗∗∗                 −0.182∗∗∗    −0.190∗∗∗
  Indirects    −0.463∗∗∗                 −0.518∗∗∗    −0.361∗∗∗
  Totals       −0.662∗∗∗                 −0.700∗∗∗    −0.551∗∗∗
AIC            2353         2415         2351         2340


Types of Spatial Lag Models

Following Anselin et al. (2008), there are three types of dynamic spatial lag
panel models (SLM):
1 Simultaneous spatio-temporal

yt = τyt−1 + ρWyt + Xt β + µ + εt .
2 Pure Recursive

yt = γWyt−1 + Xt β + µ + εt .
3 Spatio-temporal Recursive

yt = τyt−1 + γWyt−1 + ρWyt + Xt β + µ + εt .


Simultaneous spatio-temporal Model

Simultaneous spatio-temporal

yt = τyt−1 + ρWyt + Xt β + µ + εt .
1 The dynamic structure is explicit, with an inter-temporal contagion
that is multiplied through the impact of contemporaneous neighbours.
2 The contemporaneous spatial effect hinders the use of this model for
predictive purposes:
the individual reacts immediately to its neighbours, although it is
also affected by its own past.
3 The estimation can be done by GMM or ML.
4 A stationarity condition is required.


Pure recursive Model

Pure Recursive

yt = γWyt−1 + Xt β + µ + εt .
1 The dynamic structure is indirect, but it exists:
Example: $y_{1t}$ depends on $y_{w_i,t-1}$ (the neighbours' values) which, in turn,
depends on $y_{1,t-2}$.
2 It is useful for innovation diffusion models (Upton and Fingleton,
1985) or contagion models (COVID-19).
3 The estimation can be done using instrumental variables in the
traditional way, or by GMM or ML.


Spatio-temporal recursive model

Spatio-temporal Recursive

yt = τyt−1 + γWyt−1 + ρWyt + Xt β + µ + εt .


1 The dynamic structure is explicit in both directions: the spatial and
the temporal direction.
The network of multiplier effects is complex.
2 It has good predictive capacity, as shown by Giacomini and
Granger (2004).
3 The estimation is possible either by GMM or QML.
4 It is necessary to analyse the stationarity conditions (τ + γ + ρ < 1).
5 Several extensions of this model are available nowadays.


Types of Spatial Durbin Models

Again, there are three possible dynamic specifications of the spatial Durbin model (SDM):


1 Simultaneous Spatio-temporal

yt = τyt−1 + ρWyt + Xt β + WXt θ + µ + εt .


2 Pure Recursive

yt = γWyt−1 + Xt β + WXt θ + µ + εt .
3 Spatio-temporal Recursive

yt = τyt−1 + γWyt−1 + ρWyt + Xt β + WXt θ + µ + εt .


Direct and indirect effects

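If we consider the most complete of the previous models, the spatio-temporal
recursive SDM, the direct and indirect short- and long-run effects can be
obtained as follows, where $\theta_k$ is the coefficient on the k-th column of
$W X_t$ (γ is reserved for $W y_{t-1}$):
Short run (assuming τ = γ = 0):

$$\left[ \frac{\partial y}{\partial x_{1k}} \cdots \frac{\partial y}{\partial x_{nk}} \right]_t = (I_n - \rho W)^{-1} [\beta_k I_n + \theta_k W].$$

Long run (assuming $y_t = y_{t-1} = y^*$):

$$\left[ \frac{\partial y}{\partial x_{1k}} \cdots \frac{\partial y}{\partial x_{nk}} \right] = [(1 - \tau) I_n - (\gamma + \rho) W]^{-1} [\beta_k I_n + \theta_k W].$$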
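The long-run expression follows from the steady state of the spatio-temporal recursive SDM. The derivation below is a sketch: it writes $\theta_k$ for the coefficient on the k-th column of $W X_t$, drops µ and the error term because they do not depend on X, and in its last line additionally assumes a row-normalised W (an assumption not stated on the slide).

```latex
\begin{aligned}
y^* &= \tau y^* + \gamma W y^* + \rho W y^* + X\beta + W X\theta \\
\left[(1-\tau) I_n - (\gamma + \rho) W\right] y^* &= X\beta + W X\theta \\
\frac{\partial y^*}{\partial x_k'} &= \left[(1-\tau) I_n - (\gamma + \rho) W\right]^{-1}\left[\beta_k I_n + \theta_k W\right] \\
\text{total}_k^{LR} &= \frac{\beta_k + \theta_k}{1 - \tau - \gamma - \rho}
  \qquad \text{(row-normalised } W\text{, so that } W\iota_n = \iota_n\text{)}
\end{aligned}
```

The last line shows why a condition such as τ + γ + ρ < 1 matters: as the sum approaches 1, the long-run multiplier grows without bound.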


xsmle command

SLM 1
xsmle U NM, dlag(1) fe wmat(W5_st) type(both) mod(sar) ///
    effects nsim(499)

SLM 2
xsmle U NM, dlag(2) fe wmat(W5_st) type(both) mod(sar) ///
    effects nsim(499)

SLM 3
xsmle U NM, dlag(3) fe wmat(W5_st) type(both) mod(sar) ///
    effects nsim(499)



Alternative models of dynamic SLM

Variable          SLM 1        SLM 2        SLM 3
U(t−1)            −0.59∗∗∗                  −0.66∗∗∗
W × U              0.48∗∗∗     −0.56∗∗∗     −0.58∗∗∗
W × U(t−1)                     −0.42∗∗∗     −0.17∗∗∗
NM                 0.03∗∗      −0.06∗∗∗      0.02∗∗
Spatial effects (short run)
  Directs         −0.03∗∗      −0.06∗∗∗     −0.03∗∗∗
  Indirects       −0.02∗       −0.07∗∗∗     −0.03∗∗∗
  Totals          −0.02∗∗      −0.13∗∗∗     −0.06∗∗∗
Spatial effects (long run)
  Directs         −0.05∗∗      −0.08∗∗      −0.12∗∗
  Indirects       −1.21        −0.15∗∗      −0.36∗∗
  Totals          −1.16        −0.23∗∗      −0.49
AIC               1801.82      1966.40      1799.70
stationarity

General Nested model with CF

General Nesting Spatial (GNS) model with Common Factors (Elhorst,
2020):

$$y_t = \tau y_{t-1} + \rho W y_t + \gamma W y_{t-1} + X_t \beta + W X_t \theta + \sum_r \Gamma_r' f_{rt} + u_t, \qquad u_t = \lambda W u_t + \varepsilon_t.$$

Contemporaneous spatial lags of the dependent and explanatory
variables, and of the error term.
Temporal lag and spatio-temporal lag.
Generic common factors $\sum_r \Gamma_r' f_{rt}$: unobserved shocks, possibly
non-linear.


Linear Restrictions in unobservable terms

Common factors can be linearly restricted: $\sum_r \Gamma_r' f_{rt} = \mu + \alpha_t \iota_n$
µ is a vector of individual effects, fixed or random.
αt is a temporal effect, fixed or random.
This type of restriction leads back to the previous panel
models.
The random option should satisfy the assumptions that:
the number of units can potentially go to infinity.
the observations are representative (a sample) of a larger
population.
the effects are orthogonal to the explanatory variables.
These conditions are not adequately met in empirical spatial
research: hence the preponderance of fixed effects models.


Common Factors in the GNS


There are three alternatives for specifying the common factors within the GNS:

Option 1 for $\sum_r \Gamma_r' f_{rt}$:
Consider 2 factors $f_{1t} = (1 \; 1 \; \cdots \; 1)'$ and
$f_{2t} = (\alpha_1 \; \alpha_2 \; \cdots \; \alpha_T)'$ with the imposition of the parametric
constraints: $\Gamma_1' = (\mu_1 \; \mu_2 \; \cdots \; \mu_n)$ and $\Gamma_2' = (1 \; 1 \; \cdots \; 1)$.
This option captures individual and time fixed effects:
Individual fixed effects are captured by f1t, which is constant over
time but has heterogeneous coefficients Γ1.
Time fixed effects are captured by f2t, which changes between
periods but has homogeneous coefficients Γ2.
The number of common factor parameters to be estimated is
n + T + 1.
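Read element by element, Option 1 reproduces the two-way fixed effects restriction. The lines below are a sketch of that reading; the scalar notation $\Gamma_{ri}$ for the loading of unit i on factor r is introduced here for illustration and does not appear on the slides.

```latex
\begin{aligned}
\Gamma_{1i} &= \mu_i, \quad f_{1t} = 1, \qquad \Gamma_{2i} = 1, \quad f_{2t} = \alpha_t, \\
\sum_{r=1}^{2} \Gamma_{ri} f_{rt} &= \mu_i \cdot 1 + 1 \cdot \alpha_t = \mu_i + \alpha_t
  \qquad (i = 1, \dots, n;\; t = 1, \dots, T), \\
\text{stacking over } i: \quad \sum_{r} \Gamma_r' f_{rt} &= \mu + \alpha_t \iota_n .
\end{aligned}
```

The first factor is constant over time with unit-specific loadings, the second varies over time with a common loading, which is exactly the individual-plus-time effects structure.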

Common Factors in the GNS


Option 2 for $\sum_r \Gamma_r' f_{rt}$:
Another alternative to control for common factors is to use individual
fixed effects, but to include the time effects through cross-sectional averages:

$$\bar{y}_t = \frac{1}{n} \sum_{i=1}^{n} y_{it}, \qquad \bar{y}_{t-1} = \frac{1}{n} \sum_{i=1}^{n} y_{i,t-1}, \qquad \bar{x}_{kt} = \frac{1}{n} \sum_{i=1}^{n} x_{ikt} \quad (k = 1, \ldots, K).$$

The problem with time fixed effects is that each dummy has the
same impact on all observations in period t; with cross-sectional averages,
by contrast, heterogeneity in the temporal effects is introduced.
Problem with this strategy: the number of parameters grows to n + (2 + K) × n.
Empirically, introducing the time effects of $\bar{y}_t$ and $\bar{y}_{t-1}$ is effective in
capturing unobservable heterogeneity (Ciccarelli and Elhorst, 2018).
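The cross-sectional averages used in Option 2 are simple period means. The Python sketch below computes them for a toy balanced panel; the data are hypothetical and the function name `cross_sectional_averages` is an assumption for illustration.

```python
def cross_sectional_averages(panel):
    """Period means of a balanced panel; panel[i][t] is unit i in period t."""
    n = len(panel)
    T = len(panel[0])
    return [sum(panel[i][t] for i in range(n)) / n for t in range(T)]


# Hypothetical dependent variable for n = 3 units and T = 3 periods
y = [[2.0, 4.0, 6.0],
     [4.0, 6.0, 8.0],
     [6.0, 8.0, 10.0]]

y_bar = cross_sectional_averages(y)
# Lagged average: the first period has no predecessor
y_bar_lag = [None] + y_bar[:-1]
print(y_bar, y_bar_lag)
```

Each unit then receives its own coefficients on these averages, which is where the n + (2 + K) × n parameter count comes from.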

Common Factors in the GNS

Option 3 for $\sum_r \Gamma_r' f_{rt}$:
Estimate the principal components, following the idea of Shi and Lee (2017):
QML estimation of the GNS model with CF, including a Nickell bias
correction and corrections for the impact of the bias on the other
parameters.
Elhorst extended this analysis by including different measures of
goodness of fit.
Problems:
Principal components are not as easy to interpret as the
cross-sectional averages strategy.
This strategy requires estimating 2 × n additional parameters.



Testing the type of cross-sectional dependence
Recall that the cross-sectional CD test (Pesaran, 2004) uses the correlation
coefficient between pairs of units in a panel:

$$CD = \sqrt{\frac{2T}{n(n-1)}} \left( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \hat{\rho}_{ij} \right)$$

Two null hypotheses can be tested:

1 H0: independence in the cross-section (checked previously).
2 H0: weak cross-section dependence (α ≤ 1/2)
  H1: strong cross-section dependence (α > 1/2)
where α is the exponent of cross-sectional dependence, defined through

$$\bar{\rho}_n = \frac{2}{n(n-1)} \left( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \rho_{ij} \right) = O\left(n^{2\alpha - 2}\right),$$

and it measures the rate at which the variance of the cross-sectional
correlation averages goes to 0.
For α ≤ 1/2, $\bar{\rho}_n$ tends to 0 very fast.
For α ≃ 1, $\bar{\rho}_n$ tends to a non-zero value (common factor).

Testing the type of cross-sectional dependence

Bailey et al. (2016) propose a consistent estimator of α, so that the
type of cross-sectional dependence present in the panel can be tested.
The exponent α can take values within the interval (0, 1]:
1 α ≤ 1/2: weak dependence.
2 α = 1: strong dependence.
3 Intermediate values indicate moderate dependence.
This statistic helps to choose the estimation method to
be used for the panel.


Estimation method according to the α test

Elhorst et al. (2021) propose the following strategy according to the α exponent test:

α                  Cross-section dep.   W matrix                         Method
0 < α ≤ 0.5        weak                 sparse
0.5 < α ≤ 0.75     moderate             still quite sparse               ML/GMM/IV
0.75 < α < 1       quite strong         dense (GVAR)
α = 1              strong               CS averages or PC (without W)    OLS

α can be estimated consistently only for 0.5 < α ≤ 1. Use Pesaran’s
CD test to find out whether α is smaller or greater than 0.5.



Estimation method according to the α test

Practical guide suggested by Elhorst et al. (2021):

1 Assess the degree of cross-sectional dependence in the raw data
using the CD-test of Pesaran (2004) and the corresponding exponent α
of Bailey et al. (2016).
  1 A non-significant CD-test result, or a significant CD-test result with
  a value of α significantly smaller than 3/4, indicates that the data
  are weakly or moderately dependent.
  A spatial econometric model without CF suffices.
  2 A significant CD-test and a value of α not significantly smaller than
  1 suggest the presence of CF.
2 Assess the degree of cross-sectional dependence remaining after controlling
for cross-sectional averages, using the residuals. Apply the CD-test to the
“de-factored” observations from step 1 in case a common factor model has
been chosen.
  1 Failure to reject the null indicates possibly remaining weak
  cross-sectional dependence: the appropriate method is a CF model
  with a sparse connectivity matrix W estimated by means of
  ML/IV/GMM.
Elhorst’s procedure
Alternative competitive models

Variable          SLM 2        SLM 2+CF     SLM 3+CF     SDM+CF
U(t−1)                                      −0.26∗∗∗     −0.25∗∗∗
W × U             −0.56∗∗∗     −0.67∗∗∗     −0.66∗∗∗     −0.67∗∗∗
W × U(t−1)        −0.42∗∗∗     −0.14∗∗∗     −0.12∗∗∗     −0.11∗∗∗
NM                −0.06∗∗∗     −0.03∗∗∗     −0.03∗∗∗     −0.04∗∗∗
W × NM                                                    0.02
Spatial effects (short run)
  Directs         −0.06∗∗∗     −0.04∗∗∗     −0.03∗∗∗     −0.04∗∗
  Indirects       −0.07∗∗∗     −0.06∗∗∗     −0.05∗∗∗     −0.02∗∗
  Totals          −0.13∗∗∗     −0.10∗∗∗     −0.08∗∗∗     −0.06∗∗
Spatial effects (long run)
  Directs         −0.08∗∗      −0.04∗∗∗     −0.05∗∗∗     −0.05∗∗
  Indirects       −0.15∗∗      −0.13∗∗      −0.10∗∗      −0.04∗∗
  Totals          −0.23∗∗      −0.17∗∗      −0.15∗∗      −0.09
AIC               1966.40      2706.95      2706.60      2709.73
stationarity      yes          yes          yes          yes

Summing up
For panel data, recent developments in Stata provide
alternatives for estimating static and dynamic models.
Dynamic spatial econometric models for spatial panels with
common factors (CF) are among the most advanced models
currently available for empirical research.
Stata contains alternative commands to implement these
models and tests.
Main results on the impact of net migration:
Static panel model: the SDM and the SDEM show a negative
impact on unemployment (NEG theory).
Dynamic panel model: all competing models show a negative
impact in the short and long run. Results are in line with the NEG
theory. (Caution: this is an illustrative example; dynamic panel
models require a larger time dimension.)

Some references
Theoretical references:
Anselin, L., Le Gallo, J., & Jayet, H. (2008). Spatial panel econometrics. In The
econometrics of panel data (pp. 625-660). Springer, Berlin, Heidelberg.
Bailey, N., Kapetanios, G., & Pesaran, M. H. (2016). Exponent of
cross-sectional dependence: Estimation and inference. Journal of Applied
Econometrics, 31(6), 929-960.
Elhorst, J. P., Gross, M., & Tereanu, E. (2021). Cross-sectional dependence and
spillovers in space and time: Where spatial econometrics and global VAR models
meet. Journal of Economic Surveys, 35(1), 192-226.
Applied references:
Elhorst et al. (2020): habit persistence in car use (spatio-temporal recursive SDM
with CF).
Jung et al. (2014): impact of poverty programmes (spatio-temporal recursive
SDM).
Keller and Shiue (2007): historical analysis of rice prices (simultaneous
spatio-temporal SLM).
Montmartin and Herrera (2015): R&D investment in OECD countries
(spatio-temporal recursive SDM).