Spatial Econometrics with Stata 2022

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/358712548
Spatial Econometrics with Stata: Exploratory Spatial Data Analysis (ESDA), Spatial Models for Cross-Sectional Data, Spatial Models for Panel Data.
Presentation · February 2022

DOI: 10.13140/RG.2.2.24440.93442
Marcos Herrera Gomez

National Scientific and Technical Research Council
All content following this page was uploaded by Marcos Herrera Gomez on 18 February 2022.

Overview
Exploratory Spatial Data Analysis
ESDA: Visualising spatial data
ESDA: Discovering patterns of spatial dependence
Summary
Spatial Econometrics with Stata

Marcos Herrera-Gomez1
([email protected])
1 CONICET-IELDE
National University of Salta (Argentina)
Graduate School of International Development

Nagoya University (Japan)
February 14th, 2022
M. Herrera-Gomez Exploratory Spatial Data Analysis
General index
1 Overview
2 Exploratory Spatial Data Analysis
3 ESDA: Visualising spatial data
Loading data and choropleth maps
4 ESDA: Discovering patterns of spatial dependence
Matrices and spatial tests
Global Spatial Tests
Local Spatial Tests
Spatial Correlogram
5 Summary

Objective of this Workshop
We will explore the main topics of Spatial Econometrics Analysis using
lattice data:
1 Exploratory Spatial Data Analysis (ESDA): February 14.
2 Spatial Models for Cross-Sectional Data: February 16.
3 Spatial Models for Panel Data: February 18.
We will use Stata as the main software.

What is Spatial Econometrics?
Jean Paelinck introduced the term “Spatial Econometrics” in 1974 to
designate “a combination of economic theory, mathematical formalization
and statistics to deal with: spatial interdependence, importance of
factors in other places, explicit modelling of space”.
Luc Anselin: “SE is an econometric branch dealing with spatial effects
in cross-sectional models and panel data models”.
Two spatial effects:
spatial dependence: a variable at location i depends on the same
variable, or on another variable, at neighbouring locations (main topic
of this workshop).
spatial heterogeneity: structural instability over space.

Why is Spatial Econometrics important nowadays?
Theory-driven
From individual decision to social-spatial interaction,
agent-based modelling.
Peer-effects, contextual effects, neighbourhood effects.
Data-driven
Geo-referenced information.
Technology
Geographical Information Systems.
Capability of statistical software.

Types of spatial data
Spatial data (georeferenced data) combine attribute information (e.g.
name of the spatial object, population density, productivity) with
location information (spatial coordinates).
Types of spatial data:
Geostatistical data: continuous spatial field (noise surface,
pollution surface).
Areal (lattice or regional) data: discrete spatial units, fixed
polygons or points (counties, provinces, countries).
Point pattern data: location as a random event (crimes, accidents).

Main books of Spatial Econometrics
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Boston:
Kluwer Academic.
LeSage, J. and Pace, R. K. (2009). Introduction to Spatial
Econometrics. Taylor & Francis Group, LLC.
Elhorst, J. P. (2014). Spatial Econometrics: From Cross-Sectional Data
to Spatial Panels. Heidelberg: Springer.
Pesaran, M. H. (2015). Time Series and Panel Data Econometrics. Oxford
University Press. (Part VI).
Kelejian, H. and Piras, G. (2017). Spatial Econometrics. Academic
Press.

ESDA: Exploratory Spatial Data Analysis
ESDA has its origins in Exploratory Data Analysis (EDA).
Tukey (1977): “Exploratory data analysis is an attitude, a state
of flexibility, a willingness to look for those things that we
believe are not there, as well as those we believe to be there.”
The idea is to explore the data to discover potentially explicable
patterns.
Data visualization: chart, table, graph.
ESDA expands upon EDA by making location fundamental.

Why we need the “S” in EDA
[Figure: two panels, labelled “locationally variant” and “locationally invariant”.]
Elements of ESDA
Visualize spatial distributions.

Discover patterns of spatial dependence:
global spatial autocorrelation, and spatial relationships.
Identify atypical locations or spatial outliers.
Detecting spatial heterogeneity:
Spatial regimes (spatial structural breaks).
Local spatial autocorrelation: hot spots, cold spots.
Regionalization (spatial clustering).

Example used: Impact of net migration on unemployment
Hot topic in economics.

Competing theories:
Orthodox theory: net migration causes more unemployment
(positive relationship).
New Economic Geography theory: net migration causes less
unemployment (negative relationship).
Definition of variables:
UNEMPLOYMENT RATE as the number of people
unemployed as a percentage of the labour force.
NET MIGRATION RATE as the ratio of net migration during
the year to the average population in that year. The value is
expressed per 1,000 persons. Net migration is the difference
between immigration into and emigration from the area during
the year (net migration is therefore negative when the number
of emigrants exceeds the number of immigrants).
Level of analysis:
NUTS 2 (Europe 15), 164 regions from 2007 to 2012.
ESDA: Visualising spatial data: Loading data and choropleth maps

From shape to dta
First we need a file of administrative areas:
http://www.diva-gis.org/, http://www.gadm.org/.
Georeferenced information (lattice data) is usually stored in a
shapefile (a collection of at least three connected files with a
common filename):
.shp is the file that stores the geometric objects.
.shx is an index file of the geometric objects.
.dbf is the database file, in dBASE format, and
contains the attributes of the objects.
Shapefiles cannot be read directly in Stata.
However, the spshape2dta command (shp2dta in versions before Stata 15)
can import shapefiles and convert them to Stata format.

spshape2dta command
Syntax:
spshape2dta "shp_filename", saving(filename) [options]
Example:
spshape2dta "Nuts2_epsg4326", saving(nuts2)
The spshape2dta command generates two files:
nuts2.dta: contains the information from the .dbf file, plus _ID,
latitude (_CY) and longitude (_CX).
nuts2_shp.dta: contains the geometric information from the .shp file.


Merging data sets
The new database (nuts2.dta) does not contain information about
economic variables.
Using the index of geometric objects, an Excel file was built with
variables from Eurostat: unemployment rate and net migration rate.
Both datasets are easily joined using POLY_ID as the link variable:
import excel "C:\.\.\.\nuts2_164.xls", firstrow
save "C:\.\.\.\migr_unemp07_12.dta"
use nuts2, clear
merge 1:1 POLY_ID using "C:\.\.\.\migr_unemp07_12.dta", gen(union) force
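A minimal sketch of the same kind of merge on toy data, to show how the indicator variable created by gen() can be checked afterwards. The three regions, their values and the temporary file attrs are made up for illustration; only the names POLY_ID, U2012, NM2012 and union come from the slides.

```stata
* toy attribute table (stands in for the Eurostat spreadsheet)
clear
input POLY_ID U2012
1  7.5
2 10.2
3 24.8
end
tempfile attrs
save `attrs'

* toy master table (stands in for nuts2.dta)
clear
input POLY_ID NM2012
1  3.1
2 -0.4
3 -2.7
end

* one-to-one merge on the link variable; union records the match status
merge 1:1 POLY_ID using `attrs', generate(union)
tabulate union
assert union == 3    // 3 = matched: every polygon found its attribute row
```

If some regions appeared in only one of the two files, union would take the value 1 or 2 for those rows and the assert would stop the do-file, which is a cheap way of catching mismatched identifiers before mapping.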

Visualizing in maps
A choropleth is a map in which each area is coloured with an
intensity proportional to the value of a quantitative variable. Some
classical maps:
Quantiles: class breaks correspond to quantiles of the distribution
of variable (each class includes approximately the same number of
polygons).
Equal Intervals: class breaks correspond to values that divide the
distribution of variable attribute into k equal-width intervals.
Boxplot: the distribution of variable attribute is divided into 6
classes defined as follows: [min, p25 − 1.5 ∗ iqr ],
(p25 − 1.5 ∗ iqr , p25], (p25, p50],(p50, p75], (p75, p75 + 1.5 ∗ iqr ]
and (p75 + 1.5 ∗ iqr , max], where iqr is the interquartile range.
Standard Deviates: the distribution of variable attribute is divided
into k classes (2 ≤ k ≤ 9) whose width is defined as a fraction p of
its standard deviation sd.
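The class breaks of the box map can be computed by hand from the quartiles. A minimal sketch on made-up values (the variable x and its eight observations are hypothetical, not the Eurostat data), using the percentiles that summarize, detail leaves in r():

```stata
clear
input x
1
2
3
4
5
6
7
8
end
quietly summarize x, detail
scalar iqr_x    = r(p75) - r(p25)          // interquartile range
scalar fence_lo = r(p25) - 1.5*iqr_x       // lower fence: p25 - 1.5*iqr
scalar fence_hi = r(p75) + 1.5*iqr_x       // upper fence: p75 + 1.5*iqr
display "breaks: " r(min) " | " fence_lo " | " r(p25) " | " r(p50) ///
    " | " r(p75) " | " fence_hi " | " r(max)
```

Observations below fence_lo or above fence_hi fall in the two extreme classes, which is why the box map is useful for spotting outlying regions.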
spmap command
Syntax:
spmap [attribute] [if] [in] using basemap [, basemap_options
    polygon(polygon_suboptions)
    line(line_suboptions)
    point(point_suboptions)
    diagram(diagram_suboptions)
    arrow(arrow_suboptions)
    label(label_suboptions)
    scalebar(scalebar_suboptions)
    graph_options]


Quantile map
spmap U2012 using nuts2_shp, id(_ID) clmethod(q) title("Unemployment rate") ///
    legend(size(medium) position(5)) fcolor(Blues2) ///
    note("Europe, 2012" "Source: Eurostat")
Quantile map
spmap NM2012 using nuts2_shp, id(_ID) clmethod(q) title("Net migration rate") ///
    legend(size(medium) position(5)) fcolor(Blues2) ///
    note("Europe, 2012" "Source: Eurostat")
Equal intervals map
spmap NM2012 using nuts2_shp, id(_ID) clmethod(e) title("Net migration rate") ///
    legend(size(medium) position(5)) fcolor(BuRd) ///
    note("Europe, 2012" "Source: Eurostat")
Box map
spmap NM2012 using nuts2_shp, id(_ID) clmethod(boxplot) title("Net migration rate") ///
    legend(size(medium) position(5)) fcolor(Rainbow) ///
    note("Europe, 2012" "Source: Eurostat")
Box map
spmap U2012 using nuts2_shp, id(_ID) clmethod(boxplot) title("Unemployment rate") ///
    legend(size(medium) position(5)) fcolor(Heat) ///
    note("Europe, 2012" "Source: Eurostat")
Deviation map
spmap NM2012 using nuts2_shp, id(_ID) clmethod(s) title("Net migration rate") ///
    legend(size(medium) position(5)) fcolor(BuRd) ///
    note("Europe, 2012" "Source: Eurostat")
Combined map
spmap U2012 using nuts2_shp, id(_ID) fcolor(RdYlBu) cln(8) ///
    point(data(migr_unemp_final) xcoord(X) ycoord(Y) deviation(NM2012) ///
        sh(T) fcolor(dknavy) size(*0.3)) ///
    legend(size(medium) position(5)) legt(Unemployment) ///
    note("Solid triangles indicate values over the mean of net-migration." ///
        "Europe, 2012. Source: Eurostat")
ESDA: Discovering patterns of spatial dependence: Matrices and spatial tests

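Centrality of spatial W
The previous maps show spatial concentration. In a formal way:
\[
\begin{bmatrix} y_i \\ y_j \\ y_k \end{bmatrix}
=
\begin{bmatrix} 0 & \alpha_{ij} & \alpha_{ik} \\ \alpha_{ji} & 0 & \alpha_{jk} \\ \alpha_{ki} & \alpha_{kj} & 0 \end{bmatrix}
\begin{bmatrix} y_i \\ y_j \\ y_k \end{bmatrix}
+
\begin{bmatrix} u_i \\ u_j \\ u_k \end{bmatrix},
\tag{1}
\]
\[
y = Ay + u. \tag{2}
\]
Strategy of identification:
\[
A =
\begin{bmatrix} 0 & \alpha_{ij} & \alpha_{ik} \\ \alpha_{ji} & 0 & \alpha_{jk} \\ \alpha_{ki} & \alpha_{kj} & 0 \end{bmatrix}
= \rho
\begin{bmatrix} 0 & w_{ij} & w_{ik} \\ w_{ji} & 0 & w_{jk} \\ w_{ki} & w_{kj} & 0 \end{bmatrix}
= \rho W.
\]
We transform a non-identified model into one that contains only one
parameter: ρ.
W captures ‘who is the neighbour of whom’: it must be EXOGENOUS!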

Criteria used to create W

Usually, the building of W is an ad-hoc procedure of the researcher.
Common criteria are:
1 Geographical:
Distance functions:
inverse
inverse with threshold
Contiguity:
Rook
Queen
K nearest neighbours.
2 Socio-economic:
Similarity degree in economic dimensions (or social networks).
3 Combinations between both criteria.

Advice about W
Griffith (1995):
“It is better to use a reasonable geographic weight matrix than to
assume that all connections are null”.
“A relatively large number of regional units must be used in a
spatial statistical analysis.”
“Models with lower orders should be preferred over models
with higher orders”
“In general, it is better to use an under-identified weight matrix
than an over-identified one”.
• Exceptionally, it can be built from theory.
• It can be built based on non-geographical conditions: beware of
endogeneity!
• Generally, we work with row-normalized matrix.

Generating W using Stata
In Stata there are (at least) three commands to generate W:
spatwmat:
Distance criterion.
Used for spatial univariate analysis.
File format not compatible with spmatrix (or spmat).
spwmatrix:
Generates W using geographic criteria (except contiguity).
Generates W under socio-economic criteria.
Imports, exports and manipulates matrices from GeoDa.
File format compatible with spatwmat.
spmatrix (official Stata command):
Generates W using geographic criteria (except k nearest neighbours).
Imports, exports and reads matrices from GeoDa.
File format not compatible with spatwmat.

Generating W using Stata

We will use a geographic criterion:
spwmatrix: for example 5-nn.
. spwmatrix gecon _CY _CX, wn(W5st) knn(5) row con
Nearest neighbor (knn = 5) spatial weights matrix (164 x 164)
calculated successfully and the following action(s) taken:
- Spatial weights matrix created as Stata object(s): W5st.
- Spatial weights matrix has been row-standardized.
Connectivity Information for the Spatial Weights Matrix
- Sparseness: 3.049%
- Neighbors: Min : 5
Mean : 5
Median: 5
Max : 5
It is not advisable to work with units without neighbours.

In addition, it is usual to standardize W (usually row-standardize).
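Row-standardization itself is a one-line operation. A minimal Mata sketch on a made-up binary contiguity matrix for three regions on a line (the matrix and the names W and Wst are illustrative, not the 164-region W5st):

```stata
mata:
// hypothetical binary contiguity: region 2 touches regions 1 and 3
W = (0,1,0 \ 1,0,1 \ 0,1,0)

// a unit without neighbours would have a zero row sum and cannot be standardized
assert(min(rowsum(W)) > 0)

// row-standardize: divide every row by its row sum
Wst = W :/ rowsum(W)
Wst
rowsum(Wst)     // every row now sums to 1
end
```

With a row-standardized matrix the spatial lag Wy is simply the average of y over each region's neighbours, which is what makes the Moran scatterplot easy to read.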
ESDA: Discovering patterns of spatial dependence: Global Spatial Tests

Univariate spatial tests
The following statistics measure global spatial autocorrelation and
allow us to assess its significance.
Moran I test (1950):
\[
I = \frac{n}{S_0}\,
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}
     {\sum_{i=1}^{n}(y_i - \bar{y})^2}.
\]
Geary c test (1954):
\[
c = \frac{n-1}{2S_0}\,
\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - y_j)^2}
     {\sum_{i=1}^{n}(y_i - \bar{y})^2}.
\]
Getis-Ord G test (1992):
\[
G = \frac{\sum_{i=1}^{n}\sum_{j \neq i} w_{ij}\, y_i y_j}
         {\sum_{i=1}^{n}\sum_{j \neq i} y_i y_j}.
\]
Here S_0 = \sum_i \sum_j w_{ij} is the sum of all the weights.
Null hypothesis of the tests: no spatial autocorrelation.
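The formulas for Moran's I and Geary's c can be checked by hand on a tiny example. A minimal Mata sketch, assuming four made-up regions on a line with binary contiguity weights (all names and values are illustrative; spatgsa is not involved):

```stata
mata:
// toy data: regions 1-2-3-4 on a line, neighbours share a border
y = (1 \ 2 \ 3 \ 4)
W = (0,1,0,0 \ 1,0,1,0 \ 0,1,0,1 \ 0,0,1,0)

n  = rows(y)
S0 = sum(W)                      // sum of all weights
z  = y :- mean(y)                // deviations from the mean

// Moran's I: (n/S0) * z'Wz / z'z
moranI = (n/S0) * (z' * W * z) / (z' * z)

// Geary's c: ((n-1)/(2*S0)) * sum_ij w_ij (y_i - y_j)^2 / z'z
D      = (y * J(1,n,1) - J(n,1,1) * y') :^ 2
gearyC = ((n-1)/(2*S0)) * sum(W :* D) / (z' * z)

moranI, gearyC                   // 0.3333 and 0.3
end
```

Both statistics point in the same direction here: I is above its expectation of -1/(n-1) and c is below 1, as expected for values that increase smoothly along the line.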

Global spatial tests in Stata

. spatgsa U2012, w(W5st) moran geary two
Measures of global spatial autocorrelation
--------------------------------------------------------------
Moran’s I
--------------------------------------------------------------
Variables | I E(I) sd(I) z p-value*
--------------------+-----------------------------------------
U2012 | 0.767 -0.006 0.045 17.084 0.000
--------------------------------------------------------------
Geary’s c
--------------------------------------------------------------
Variables | c E(c) sd(c) z p-value*
--------------------+-----------------------------------------
U2012 | 0.228 1.000 0.054 -14.282 0.000
--------------------------------------------------------------
*2-tail test
. spatgsa U2012, w(W5bin) go two

Measures of global spatial autocorrelation
--------------------------------------------------------------
Getis & Ord’s G
--------------------------------------------------------------
Variables | G E(G) sd(G) z p-value*
--------------------+-----------------------------------------
U2012 | 0.039 0.031 0.001 11.864 0.000
--------------------------------------------------------------
*2-tail test

Moran’s I scatterplot
splagvar U2012, wname(W5st) wfrom(Stata) ind(U2012) order(1) plot(U2012) ///
    moran(U2012)
Moran’s I scatterplot
splagvar NM2012, wname(W5st) wfrom(Stata) ind(NM2012) order(1) plot(NM2012) ///
    moran(NM2012)
ESDA: Discovering patterns of spatial dependence: Local Spatial Tests

Local indicators of spatial association
A local version of the Moran I test is used to detect spatial clusters:
\[
I_i(d) = \frac{(x_i - \bar{x})}{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}
\sum_{j=1,\, j \neq i}^{n} w_{ij}(d)\,(x_j - \bar{x}), \tag{3}
\]
where w_{ij}(d) is a distance-based weight.
The null hypothesis is no spatial autocorrelation, and the significance
of I_i can be assessed using the normal distribution:
\[
z[I_i] = \frac{I_i - E[I_i]}{\sqrt{Var[I_i]}}.
\]
This test allows grouping observations into 4 categories (see the Moran
scatterplot): High-High (H-H), Low-Low (L-L), Low-High (L-H) and
High-Low (H-L).
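Equation (3) and the four categories can be illustrated on a toy example. A minimal Mata sketch, assuming four made-up regions on a line with binary contiguity weights; the variable names and the numeric coding of the quadrants are illustrative, and no significance test is computed:

```stata
mata:
// toy data: regions 1-2-3-4 on a line
y  = (1 \ 2 \ 3 \ 4)
W  = (0,1,0,0 \ 1,0,1,0 \ 0,1,0,1 \ 0,0,1,0)

n  = rows(y)
z  = y :- mean(y)                // deviations from the mean
m2 = (z' * z) / n                // (1/n) * sum of squared deviations
Wz = W * z                       // spatial lag of the deviations

// local Moran statistic for every region
Ii = (z :* Wz) / m2

// quadrant of the Moran scatterplot: 1 = H-H, 2 = L-L, 3 = L-H, 4 = H-L
quad = 1*((z:>0):&(Wz:>0)) + 2*((z:<0):&(Wz:<0)) + 3*((z:<0):&(Wz:>0)) + 4*((z:>0):&(Wz:<0))

Ii, quad
end
```

In this example regions 1 and 2 are Low-Low and regions 3 and 4 are High-High, and the local statistics add up to S_0 times the global Moran's I, which is the sense in which the local indicators decompose the global one.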
Local Moran’s I scatterplot
genmsp_v0 U2012, w(W5st)
graph twoway ///
    (scatter Wstd_U2012 std_U2012 if pval_U2012>=0.05, msymbol(i) mlabel(_ID) ///
        mlabsize(*0.6) mlabpos(c)) ///
    (scatter Wstd_U2012 std_U2012 if pval_U2012<0.05, msymbol(i) mlabel(_ID) ///
        mlabsize(*0.6) mlabpos(c) mlabcol(red)) ///
    (lfit Wstd_U2012 std_U2012), ///
    yline(0, lpattern(--)) xline(0, lpattern(--)) ///
    xlabel(-1.5(1)4.5, labsize(*0.8)) xtitle("{it:z}") ///
    ylabel(-1.5(1)3.5, angle(0) labsize(*0.8)) ytitle("{it:Wz}") ///
    legend(off) scheme(s1color) title("Local Moran I of Unemployment rate")
Local Moran’s I map
spmap msp_U2012 using nuts2_shp, id(_ID) clmethod(unique) title("Unemployment rate") ///
    legend(size(medium) position(4)) ndl("No signif.") fcolor(blue red) ///
    note("Europe, 2012" "Source: Eurostat")
ESDA: Discovering patterns of spatial dependence: Spatial Correlogram

Alternative measure of global spatial autocorrelation: correlations
computed for all pairs of observations as a function of the distance.
Sample autocorrelation between regions i and j:
\[
\rho_{ij} = \rho(z_i, z_j) =
\frac{(z_i - \bar{z})(z_j - \bar{z})}{(1/n)\sum_{h=1}^{n}(z_h - \bar{z})^2}
\]
Problem: there are n(n − 1)/2 individual values of ρ_{ij}.
Solution: spatial autocorrelation as a function of distance, ρ_{ij} = g(d_{ij}):
\[
\rho(d) = \frac{\sum_i \sum_j \mathbf{1}(d_{ij}/h)\,(z_i - \bar{z})(z_j - \bar{z})}
               {\sum_i \sum_j \mathbf{1}(d_{ij}/h)} = I^{*}(h)
\]
where 1 is an indicator function and h is the bandwidth.

Spatial correlogram
spatcorr U2012, bands(0(2)12) xcoord(_CX) ycoord(_CY) graph

Summing up
ESDA is an important initial step in spatial analysis.
Show qualitative spatial dependence (mapping).
Find outliers/spatial regimes/clustering.
Quantify the spatial autocorrelation and its significance.
Stata has incorporated tools for spatial analysis.
A complete ESDA can be carried out in Stata, as in other software.

Some references
Anselin L (1995) Local indicators of spatial association – LISA. Geogr Anal
27(2):93–115.
Bivand RS (2010) Exploratory spatial data analysis. In: Fischer MM, Getis A
(eds) Handbook of applied spatial analysis: software tools, methods and
applications. Springer, Berlin/Heidelberg, pp 219–254.
Monmonier M (1996) How to lie with maps, 2nd edn. University of Chicago
Press, Chicago.
Symanzik, J. (2014). Exploratory spatial data analysis. Handbook of regional
science, 1295-1310.
Tukey JW (1977) Exploratory data analysis. Addison-Wesley Pub. Co, Reading.
Stata:
Drukker, D. M. et al. (2013). Creating and managing spatial-weighting matrices
with the spmat command. Stata Journal, 13(2), 242-286.
Pisati, M. (2008). SPMAP: Stata module to visualize spatial data. Statistical
Software Components.

Introduction
Taxonomy of spatial models
Methods of estimation
Spatial Modelling: Data-driven strategies
Interpretation
Summary

Spatial Models for Cross-sectional data
([email protected])
1 CONICET-IELDE

February 16th, 2022
M. Herrera-Gomez Spatial Cross-sectional Models
General index
1 Introduction
2 Taxonomy of spatial models
3 Methods of estimation
Maximum likelihood estimation
Instrumental Variables and Generalized Method of Moments
4 Spatial Modelling: Data-driven strategies
Specific to General modelling
General to Specific modelling
5 Interpretation

Sources of spatial dependence

Spatial spillover
Example: the growth rate of a region is affected by characteristics
and performances of its neighbours.
Spatial spillovers are not instantaneous, require some time to arise
(dynamic feedback effects).
Omitted variables
Unobservable factors (e.g., location amenities) which exert an
influence on the dependent variable and are spatially correlated.
It is unlikely that explanatory variables are readily available to
capture these types of latent variables.
Measurement errors and unobserved heterogeneity
Administrative boundaries (GIS induced) that don’t accurately
reflect the nature of underlying Data Generating Process.
Anselin (2003) proposes a taxonomy of regression models: spatially lagged
dependent variables (Wy), spatially lagged explanatory variables (WX) and
spatially lagged error term (Wu).
Alternatives of specification
General Cliff-Ord model (Manski model)
y = ρWy + Xβ + WXθ + u,
u = λWu + ε.
Imposing restrictions on θ, ρ and λ we obtain the following models:
θ = 0, ρ ≠ 0, λ = 0 → SLM (Spatial Lag Model).
θ = 0, ρ = 0, λ ≠ 0 → SEM (Spatial Error Model).
θ = 0, ρ ≠ 0, λ ≠ 0 → SARAR (Spatial AutoRegressive model with
AutoRegressive error).
θ ≠ 0, ρ = 0, λ = 0 → SLX (Spatial Lag in X).
θ ≠ 0, ρ ≠ 0, λ = 0 → SDM (Spatial Durbin Model).
θ ≠ 0, ρ = 0, λ ≠ 0 → SDEM (Spatial Durbin Error Model).
Methods of estimation: Maximum likelihood estimation

MLE
The point of departure is the assumption of normality for the error
terms, ε ∼ MVN(0, Ω).
The joint likelihood then follows from the multivariate normal
distribution for y.
SARAR model
Assuming |ρ| < 1 and |λ| < 1, the log-likelihood function is
\[
L(\beta, \rho, \lambda, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|\Omega|
+ \ln|I - \rho W| + \ln|I - \lambda W| - \frac{1}{2}\, v'v,
\]
with v'v = (Ay − Xβ)' B' Ω^{-1} B (Ay − Xβ) as the sum of squares of the
transformed errors, A = I − ρW, B = I − λW, and E[εε'] = Ω as the
variance-covariance matrix.
The Jacobian term is the determinant of a full n × n matrix, e.g. |I − ρW|.

Stata syntax for MLE (ml selects the estimator):
spregress depvar [indepvars], ml [options]
SLM
spregress U2012 NM2012, ml dvarlag(W5st)
SEM
spregress U2012 NM2012, ml errorlag(W5st)
SARAR
spregress U2012 NM2012, ml dvarlag(W5st) errorlag(W5st)
SDM
spregress U2012 NM2012, ml dvarlag(W5st) ivarlag(W5st: NM2012)
SDEM
spregress U2012 NM2012, ml errorlag(W5st) ivarlag(W5st: NM2012)
Methods of estimation: Instrumental Variables and Generalized Method of Moments

IV and GMM
The endogeneity of Wy can also be addressed by means of an instrumental
variables or two-stage least squares (2SLS) approach:
\[
E(y|X) = [I - \rho W]^{-1} X\beta = (I + \rho W + \rho^2 W^2 + \cdots)\, X\beta
       = X\beta + \rho W X\beta + \rho^2 W^2 X\beta + \cdots
\]
Then, Wy is instrumented using WX, W²X, ...
For the spatial term in the error, Wu, Kelejian and Prucha (1999) develop
a set of moment conditions that yield estimation equations for the
parameter λ. With ε = u − λWu:
\[
E[\varepsilon'\varepsilon/n] = \sigma^2, \qquad
E[\varepsilon' W' W \varepsilon/n] = \frac{\sigma^2}{n}\, tr(W'W), \qquad
E[\varepsilon' W \varepsilon/n] = 0.
\]

Stata syntax for IV/GMM (gs2sls selects the estimator):
spregress depvar [indepvars], gs2sls [options]
SLM
spregress U2012 NM2012, gs2sls dvarlag(W5st)
SEM
spregress U2012 NM2012, gs2sls errorlag(W5st)
SARAR
spregress U2012 NM2012, gs2sls dvarlag(W5st) errorlag(W5st)
SDM
spregress U2012 NM2012, gs2sls dvarlag(W5st) ivarlag(W5st: NM2012)
SDEM
spregress U2012 NM2012, gs2sls errorlag(W5st) ivarlag(W5st: NM2012)

Specific-to-General (STGE) Modelling
Residual tests
The first step (under STGE) is to estimate a non-spatial model and obtain
the residuals.
In our case the initial model is
U2012 = β1 + β2 NM2012 + u.
This equation is estimated by OLS:
. reg U2012 NM2012
Source | SS df MS Number of obs = 164
-------------+------------------------------ F( 1, 162) = 56.32
Model | 1453.94714 1 1453.94714 Prob > F = 0.0000
Residual | 4182.20231 162 25.8160636 R-squared = 0.2580
-------------+------------------------------ Adj R-squared = 0.2534
Total | 5636.14945 163 34.577604 Root MSE = 5.081
------------------------------------------------------------------------------
U2012 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
NM2012 | -.7011928 .0934347 -7.50 0.000 -.8856998 -.5166859
_cons | 11.43504 .4697136 24.34 0.000 10.50749 12.36259
------------------------------------------------------------------------------
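The log-likelihood and AIC of this OLS fit can be recovered from the residual sum of squares reported in the output above. A minimal sketch, assuming the Gaussian log-likelihood evaluated at the ML variance estimate RSS/n and an AIC that counts the two regression coefficients (k = 2); the scalar names are illustrative:

```stata
scalar n_obs = 164                   // number of observations
scalar rss   = 4182.20231            // residual sum of squares
scalar k     = 2                     // NM2012 and _cons

* concentrated Gaussian log-likelihood: -(n/2) * (1 + ln(2*pi) + ln(RSS/n))
scalar ll  = -0.5*n_obs*(1 + ln(2*_pi) + ln(rss/n_obs))
scalar aic = -2*ll + 2*k

display "loglik = " %9.2f ll "   AIC = " %9.2f aic
* loglik = -498.28, AIC = 1000.56
```

These are the benchmark values against which the spatial models are compared: a spatial specification is only worth its extra parameters if it raises the log-likelihood enough to lower the AIC.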

Residual tests
A set of tests allows the detection of spatial autocorrelation:
Null hypothesis               Parameters in H1            Test
                              Error lag    Spatial lag
λ = 0                         yes          -              LMERROR
λ = 0                         yes          yes            RLMERROR
ρ = 0                         -            yes            LMLAG
ρ = 0                         yes          yes            RLMLAG
No spatial autocorrelation                                Moran's I

Moran I test
1 Null and alternative hypotheses: H0: no spatial autocorrelation,
H1: not H0.
2 Moran (1950) proposes the following test:
\[
I = \frac{e'We}{e'e},
\]
where e are the OLS residuals and W is the row-normalized weighting matrix.
Asymptotic distribution under H0:
\[
\sqrt{n}\,[I - E(I)] \overset{as}{\sim} N[0, V(I)].
\]
The rejection of the null hypothesis should lead us to specify a model
where the spatial structure is present.
There is no model in the alternative hypothesis.
Moran’s test works well even for small sample sizes, although a sample
size greater than 40 units is advisable.
Disadvantage: it behaves like a general misspecification test, so a
rejection does not point to a specific spatial model.
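Spatial Dependence in the error term: LMERROR and RLMERROR
1 Null and alternative hypotheses: H0: λ = 0, H1: λ ≠ 0
(SEM) y = Xβ + u, u = λWu + ε
2 Simple version of the test:
\[
LMERROR = \frac{1}{T_1}\left[\frac{e'We}{\tilde{\sigma}^2}\right]^2 \overset{as}{\sim} \chi^2(1)
\]
3 Robust test: this version corrects the LMERROR for the presence of a
spatial lag ρ.
\[
RLMERROR = \frac{\left[\dfrac{e'We}{\tilde{\sigma}^2}
 - T_1\,(n\tilde{J}_{\rho\beta})^{-1}\,\dfrac{e'Wy}{\tilde{\sigma}^2}\right]^2}
 {T_1 - T_1^2\,(n\tilde{J}_{\rho\beta})^{-1}} \overset{as}{\sim} \chi^2(1)
\]
where e are the OLS residuals and
\[
T_1 = tr(W^2 + W'W), \quad
n\tilde{J}_{\rho\beta} = T_1 + \frac{(WX\tilde{\beta})'\,M\,(WX\tilde{\beta})}{\tilde{\sigma}^2}, \quad
M = I - X(X'X)^{-1}X', \quad
\tilde{\sigma}^2 = \frac{e'e}{n}.
\]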
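Spatial Dependence in the dependent variable: LMLAG and RLMLAG
1 Null and alternative hypotheses: H0: ρ = 0, H1: ρ ≠ 0
(SLM) y = ρWy + Xβ + ε
2 Simple version of the test:
\[
LMLAG = \frac{\left[\dfrac{e'Wy}{\tilde{\sigma}^2}\right]^2}{n\tilde{J}_{\rho\beta}} \overset{as}{\sim} \chi^2(1)
\]
3 Robust test: this version corrects the LMLAG for the presence of a
spatial error term λ.
\[
RLMLAG = \frac{\left[\dfrac{e'Wy}{\tilde{\sigma}^2} - \dfrac{e'We}{\tilde{\sigma}^2}\right]^2}
              {n\tilde{J}_{\rho\beta} - T_1} \overset{as}{\sim} \chi^2(1)
\]
where e are the OLS residuals and
\[
n\tilde{J}_{\rho\beta} = T_1 + \frac{(WX\tilde{\beta})'\,M\,(WX\tilde{\beta})}{\tilde{\sigma}^2}, \quad
T_1 = tr(W^2 + W'W), \quad
M = I - X(X'X)^{-1}X', \quad
\tilde{\sigma}^2 = \frac{e'e}{n}.
\]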
Spatial test in Stata
reg U2012 NM2012

spatdiag, weights(W5st)
Diagnostic tests for spatial dependence in OLS regression

------------------------------------------------------------
Diagnostics
------------------------------------------------------------
Test | Statistic df p-value
-------------------------------+----------------------------
Spatial error: |
Moran’s I | 12.703 1 0.000
Lagrange multiplier | 148.081 1 0.000
Robust Lagrange multiplier | 0.750 1 0.386
|
Spatial lag: |
------------------------------------------------------------
According to the evidence of the tests, an SLM should be estimated:
U2012 = ρ (W × U2012) + β1 + β2 NM2012 + u.

Don’t forget the SLX
Starting from OLS and using the LM tests, you are exploring the SEM and
SLM models.
However, the third alternative (according to the chart) was not explored.
Now, we check the SLX:
U2012 = β1 + β2 NM2012 + θ1 (W × NM2012) + u.
splagvar , wname(W5st) wfrom(Stata) ind(NM2012)
reg U2012 NM2012 wx_NM2012 (omitted results)
SLX is a competitive model: θ1 is significant.
Now, we check the presence of spatial effects in the SLX residuals.

Spatial test in Stata
reg U2012 NM2012 wx_NM2012

spatdiag, weights(W5st)
Diagnostics
------------------------------------------------------------
Test | Statistic df p-value
-------------------------------+----------------------------
Spatial error: |
Moran’s I | 12.868 1 0.000
|
Spatial lag: |
------------------------------------------------------------
According to the evidence of the tests, an SDM should be estimated:
U2012 = ρ (W × U2012) + β1 + β2 NM2012 + θ1 (W × NM2012) + u.

Conclusion of STGE
From an OLS model, we detect spatial effects in the dependent variable: SLM
U2012 = ρ (W × U2012) + β1 + β2 NM2012 + u.
But, following the 3rd alternative, we detect spatial effects in the SLX: SDM
U2012 = ρ (W × U2012) + β1 + β2 NM2012 + θ1 (W × NM2012) + u.
Also, there is another alternative: the SDM could be reduced to an SEM.

Likelihood Ratio: Common factor test
Assuming the SDM model has been estimated:
y = ρWy + Xβ + WXθ + u.
The null and alternative hypotheses are: H0: θ + ρβ = 0, H1: θ + ρβ ≠ 0.
Under H0, θ = −ρβ, and replacing into the SDM model:
y = ρWy + Xβ + WX(−ρβ) + u = ρWy + Xβ − ρWXβ + u,
(I − ρW) y = (I − ρW) Xβ + u.
The last expression is an SEM: y = Xβ + (I − λW)^{-1} ε, where ρ has been
replaced by λ.
Under the null hypothesis we have an SEM and, under the alternative
hypothesis, an SDM:
\[
LR_{COMFAC} = 2\left[\, l|_{H_1} - l|_{H_0} \right] \overset{as}{\sim} \chi^2_q
\]
lrtest SDM_ml SEM_ml

Likelihood-ratio test LR chi2(1) = 6.81
(Assumption: SEM_ml nested in SDM_ml) Prob > chi2 = 0.0091
Conclusion of STGE
Variable      OLS        SLX        SLM        SDM
NM2012        −0.70∗∗    −0.29∗∗    −0.19∗∗    −0.16∗∗
W × NM2012               −0.95∗∗               −0.10∗∗∗
const         11.44∗∗    13.02∗∗    −2.41∗∗    −2.82∗∗
ρ̂                                   0.82∗∗     0.80∗∗
loglik        −498.28    −478.67    −419.35    −418.88
AIC           1000.56    963.33     846.70     847.76
Note: ∗∗ p < 0.05.
What is the best model?


General-to-Specific (GETS) Modelling
Initial model for GETS
GETS starts with the most complex model and then, using LR tests, works
downwards, dropping non-significant variables.
LeSage and Pace (2009) suggest starting with the Spatial Durbin Model
(it nests most of the other models).
Elhorst (2014) suggests comparing it with the Spatial Durbin Error Model
(it produces similar predictions in many cases).

LR test from SDM

lrtest SDM_ml SEM_ml
(Assumption: SEM_ml nested in SDM_ml) Prob > chi2 = 0.0091
lrtest SDM_ml SLX_ml

(Assumption: SLX_ml nested in SDM_ml) Prob > chi2 = 0.0000
lrtest SDM_ml SLM_ml

(Assumption: SLM_ml nested in SDM_ml) Prob > chi2 = 0.3331
We select the SLM
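The LR statistic behind the last test can be reproduced by hand from the log-likelihoods reported for the SDM (−418.88) and the SLM (−419.35) in the results tables. A minimal sketch; the scalar names are illustrative, and because the inputs are rounded to two decimals the p-value differs slightly from the 0.3331 printed by lrtest:

```stata
scalar ll_sdm = -418.88              // unrestricted model (H1)
scalar ll_slm = -419.35              // restricted model (H0: theta = 0)

* LR = 2 * (loglik under H1 - loglik under H0)
scalar LR   = 2*(ll_sdm - ll_slm)
scalar pval = chi2tail(1, LR)        // one restriction, so 1 degree of freedom

display "LR = " %5.2f LR "   p-value = " %6.4f pval
```

Since the p-value is well above 0.05, the restriction θ = 0 is not rejected and the SDM simplifies to the SLM, in line with the selection above.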

GETS: selecting the best model
Variable      OLS        SLX        SLM        SDM        SDEM       SARAR
NM2012        −0.70∗∗    −0.29∗∗    −0.19∗∗    −0.16∗∗    −0.21∗∗    −0.16∗∗
W × NM2012               −0.95∗∗               −0.10∗∗∗   −0.32∗∗∗
const         11.44∗∗    13.02∗∗    −2.41∗∗    −2.82∗∗    −11.78∗∗   −1.74∗∗
ρ̂                                   0.82∗∗     0.80∗∗                0.88∗∗
λ̂                                                         0.83∗∗     −0.33∗∗
LRCOMFAC                                       6.81∗∗
loglik        −498.28    −478.67    −419.35    −418.88    −421.12    −418.08
AIC           1000.56    963.33     846.70     847.76     852.25     846.15
Note: ∗∗ p < 0.05.

Maximum Likelihood: selecting the best model
From specific to general strategy:
Using LM tests: spatial lag model (SLM).
Between SDM and SEM: LRCOMFAC.
From general to specific strategy:
Start with the SDM and sequentially eliminate non-significant
variables: SLM selected.

Results under IV/GMM
Variable SEM SLM SDM SDEM SARAR
NM2012 −0.21** −0.14** −0.14** −0.24** −0.15**
W × NM2012 0.02** −0.47**
const 10.50** −1.57** −1.40** −11.97** −1.60**
ρ̂ −0.89** −0.91** −0.89**
λ̂ 0.78** −0.19**
pseudo-R² 0.26 0.43 0.43 0.41 0.43
Note: ** p < 0.05.

Interpretation of estimated parameters
In SLM, SARAR or SDM models, a change in the variable x_k in region i affects the region itself and can also affect the other regions indirectly through the spatial multiplier $(I - \rho W)^{-1}$.
In a linear model, the marginal effects are:
\[
\frac{\partial E(y_i)}{\partial x_{ik}} = \hat{\beta}_k ,
\qquad
\frac{\partial E(y_j)}{\partial x_{ik}} = 0 \quad (j \neq i),
\]
but in spatial models with Wy and/or Wx, the second effect is not zero.

SLM. Direct and indirect effects

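The marginal effect of the explanatory variable x_k on the dependent variable is:
\[
\begin{aligned}
\left[\frac{\partial y}{\partial x_{1k}} \;\cdots\; \frac{\partial y}{\partial x_{nk}}\right]
&= \begin{bmatrix}
\frac{\partial y_1}{\partial x_{1k}} & \cdots & \frac{\partial y_1}{\partial x_{nk}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_n}{\partial x_{1k}} & \cdots & \frac{\partial y_n}{\partial x_{nk}}
\end{bmatrix} \\
&= (I_n - \rho W)^{-1}
\begin{bmatrix}
\beta_k & 0 & \cdots & 0 \\
0 & \beta_k & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \beta_k
\end{bmatrix} \\
&= (I_n - \rho W)^{-1} \left[\beta_k I_n\right]. \qquad (1)
\end{aligned}
\]
Direct effect: average of the main-diagonal elements of $(I_n - \rho W)^{-1}[\beta_k I_n]$.
Indirect effect (spatial spillover): average of the row sums of the off-diagonal elements of $(I_n - \rho W)^{-1}[\beta_k I_n]$.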
Example under SLM
. estat impact
progress :100%
Average impacts Number of obs = 164
--------------------------------------------------------
| Delta-Method
| dy/dx Std. Err. z P>|z|
-------------+------------------------------------------
direct |
NM2012 | -.2414164 .0700594 -3.45 0.001
-------------+------------------------------------------
indirect |
NM2012 | -.7986744 .2491086 -3.21 0.001
-------------+------------------------------------------
total |
NM2012 | -1.040091 .3029917 -3.43 0.001
--------------------------------------------------------
Example under SLM
If we apply manually the above expressions (SLM under MLE):
. mata:
---------------------- mata (type end to exit)-------------------
: b = st_matrix("e(b)")
: b
1 2 3 4
+-------------------------------------------------------------+
1 | -.1898462498 2.40790211 .8174714987 8.182474325 |
+-------------------------------------------------------------+
: rho = b[1,3]
: rho
.8174714987
: S = luinv(I(rows(W))-rho*W)
: end
-----------------------------------------------------------------
. * Total effects
. mata: (b[1,1]/rows(W))*sum(S)
-1.040090862
. * Direct effects
. mata: (b[1,1]/rows(W))*trace(S)
-.2414164387
. * Indirect effects (spatial spillovers)
. mata: (b[1,1]/rows(W))*sum(S) - (b[1,1]/rows(W))*trace(S)
-.7986744231
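The same calculation can be sketched outside Stata. The Python example below is an illustration under stated assumptions, not the workshop's code: the 3-region matrix `W_TOY` is hypothetical (the slides use a 164-region matrix that is not reproduced here), the helper names are invented, and β and ρ are taken from the Mata output above. With the toy matrix the direct and indirect effects differ from Stata's, but if W is row-standardised (assumed here) the total effect equals β/(1 − ρ) for any such W, which reproduces the reported total of about −1.0401.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def slm_effects(beta, rho, W):
    """Average direct, indirect and total effects of x_k in an SLM."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    S = invert(A)                                   # S = (I - rho W)^(-1)
    total = beta * sum(sum(row) for row in S) / n   # average of all row sums
    direct = beta * sum(S[i][i] for i in range(n)) / n  # average of the diagonal
    return direct, total - direct, total

# Hypothetical row-standardised weights for three regions on a line.
W_TOY = [[0.0, 1.0, 0.0],
         [0.5, 0.0, 0.5],
         [0.0, 1.0, 0.0]]
BETA, RHO = -0.1898462498, 0.8174714987  # b[1,1] and b[1,3] from the Mata output

direct, indirect, total = slm_effects(BETA, RHO, W_TOY)
print(direct, indirect, total)
print(BETA / (1.0 - RHO))  # closed form for the total effect
```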
Summing up
Stata offers one of the most complete toolsets for estimating spatial econometric models with cross-sectional data:
MLE
IV/GMM.
For cross-sectional data, the most common spatial specifications can be estimated by ML and/or IV/GMM.
Main result on the impact of net migration:
Cross-section model: the SLM shows a negative impact on unemployment (long-run effect).

Some references
Anselin L. (2003). “Spatial Externalities, Spatial Multipliers and Spatial
Econometrics,” International Regional Science Review, 26, 153-166.
Anselin, L. and A. Bera (1998). “Spatial dependence in linear regression
models with an Introduction to Spatial Econometrics,” Handbook of
Applied Economic Statistics, pp. 237-289.
Brueckner, J. (2003). “Strategic interaction among governments: An
overview of empirical studies,” International Regional Science Review,
26(2).
Kelejian, H. H., and Prucha, I. R. (1998). A generalized spatial two-stage
least squares procedure for estimating a spatial autoregressive model with
autoregressive disturbances. The Journal of Real Estate Finance and
Economics, 17(1), 99-121.
Mur, J., and Angulo, A. (2009). Model selection strategies in a spatial
setting: Some additional results. Regional Science and Urban Economics,
39(2), 200-213.

Introduction to panel data models
Testing spatial effects
Static spatial panel models
Dynamic spatial panel models
Common factors
Summary

Spatial Econometrics Models for Panel data
1 CONICET-IELDE

February 18th, 2022
M. Herrera-Gomez Spatial Panel Data Models
General index
1 Introduction to panel data models

2 Testing spatial effects
Pooled Model
Fixed and Random models
3 Static spatial panel models
4 Dynamic spatial panel models
5 Common factors
General Nested Spatial model with common factors
Modelling Common Factors

Basic model: pooled model
Consider a linear model:
yt = Xt β + ut ,
where:
yt is an n × 1 vector of outcomes for each t ∈ {1, . . . , T }.
Xt is an n × k matrix of time-invariant individual explanatory variables.
ut is an n × 1 vector of random error terms.
Problem:
This model does not control for heterogeneity: time-specific or individual-specific variables could affect the dependent variable.

Model with individual and temporal effects

Reconsider the basic model for each individual, with k independent variables x_it:
\[ y_{it} = x_{it}\beta + u_{it}, \]
where i = 1, . . . , n and t = 1, . . . , T.
The error term can be decomposed into a two-way error component:
\[ u_{it} = \mu_i + \phi_t + \varepsilon_{it}, \]
where µ_i is a region-specific effect and φ_t is a time-specific effect common to all regions.
These effects can be treated as fixed or random.
In the fixed effects model, a dummy variable is introduced for each region and each time period.
In the random effects model, µ_i (i = 1, . . . , n) is treated as a random variable that is independently and identically distributed, $\mu_i \sim \text{i.i.d.}(0, \sigma_\mu^2)$, with $\text{cov}(\mu_i, \varepsilon_{it}) = 0$ (a similar assumption holds for φ_t).
Random effects models

This model is quite popular among applied econometricians, for the following reasons:
1 It may be considered a compromise solution: panel data models that control for fixed effects use only the time-varying variables, whereas RE models exploit both the time-series and the cross-sectional dimensions.
2 The RE model avoids the loss of degrees of freedom incurred by the fixed effects model: it is an efficient estimator under ideal conditions.
3 The RE model avoids the problem that the effects of variables that vary only a little over time cannot be estimated.
However, the random effects model should satisfy three conditions:
(1) The number of units should potentially be able to go to infinity.
(2) The units in the sample should be representative of a larger population.
(3) The correlation between the random effects µ_i (i = 1, . . . , n) and the explanatory variables needs to be 0.
These three conditions tend not to be satisfied in spatial research.
Types of asymptotic
There are two types of asymptotics in spatial data:
1 INFILL asymptotics: the sampling region remains bounded. As n goes to infinity, the additional units are observations taken between those already observed.
2 INCREASING DOMAIN asymptotics: the sampling region grows as n goes to infinity. In this case, the initial observations preserve their spatial neighbourhood structure.
There are also two types of sampling design:
(a) stochastic design, where the spatial units are randomly drawn.
(b) fixed design, where the spatial units lie on a non-random field.
The spatial econometric literature mainly focuses on increasing domain asymptotics under a fixed sample design (Cressie 1993, Elhorst 2014).

Fixed effects models
In addition to increasing domain asymptotics and a fixed sample design: when the dataset contains all spatial units within a study area, it is questionable whether they are still representative of a larger population.
For example, given all the states in a country, the population may be said to be sampled exhaustively (we have the population). Then random effects are not necessary and fixed effects should be specified.
Also, in spatial econometrics there is a prominent reason for fixed effects: under infill asymptotics, the spatial weight matrix cannot be specified consistently and the impact of spatial interaction effects cannot be estimated consistently.
In general, the fixed effects model is more appropriate than the random effects model. However, the random effects model remains a competitive option if the target population is a “super-population”.

Fixed or random effects
The Hausman test (1978) is computed as:
\[
H = (\beta_{fe} - \beta_{re})' \, (V_{fe} - V_{re})^{-1} \, (\beta_{fe} - \beta_{re}),
\]
where β_fe is the coefficient vector of the consistent estimator (fe), β_re is the coefficient vector of the efficient estimator (re), and V_fe and V_re are the variance-covariance matrices of fe and re, respectively. The statistic is distributed as $\chi^2_q$, with q degrees of freedom (the number of coefficients common to both models).
The Hausman test can be read as a validation test of the re estimator, which is maintained under the null hypothesis.
Hausman’s specification test can also be used in models with spatial lags Wy and WX.
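As an illustration of the quadratic form, here is a minimal Python sketch for the case of two common coefficients. All numbers are hypothetical and only serve to show the arithmetic; the function name `hausman_2x2` is not part of any library. For q = 2 the chi-square survival function is exp(−H/2), which is used for the p-value.

```python
import math

def hausman_2x2(beta_fe, beta_re, V_fe, V_re):
    """Hausman statistic and p-value for two common coefficients (q = 2)."""
    d = [beta_fe[0] - beta_re[0], beta_fe[1] - beta_re[1]]
    D = [[V_fe[i][j] - V_re[i][j] for j in range(2)] for i in range(2)]
    det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
    if D[0][0] <= 0.0 or det <= 0.0:
        raise ValueError("V_fe - V_re is not positive definite")
    inv = [[D[1][1] / det, -D[0][1] / det],
           [-D[1][0] / det, D[0][0] / det]]
    stat = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for 2 df
    return stat, p_value

# Hypothetical estimates, for illustration only.
beta_fe = [-0.20, 0.50]
beta_re = [-0.15, 0.45]
V_fe = [[0.0040, 0.0], [0.0, 0.0090]]
V_re = [[0.0030, 0.0], [0.0, 0.0065]]

H, p = hausman_2x2(beta_fe, beta_re, V_fe, V_re)
print(f"H = {H:.3f}, p = {p:.4f}")
```

A small p-value would reject the random effects estimator in favour of fixed effects; with these made-up inputs the null is not rejected at the 5% level.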

Simple LM Tests
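Under a non-spatial pooled model, or its SLX extension, we can test for spatial autocorrelation in the error term:
\[
LM_{ERROR} = \frac{\left[\hat{u}'(I_T \otimes W)\hat{u} / \hat{\sigma}^2\right]^2}{T \times T_1} \;\underset{as}{\sim}\; \chi^2_{(1)},
\]
where $T_1 = \mathrm{tr}\left[(W' + W)W\right]$, $\hat{u}$ are the OLS residuals from the pooled model and $\hat{\sigma}^2 = \hat{u}'\hat{u}/(n \times T)$.
The presence of a spatial lag can be tested with:
\[
LM_{LAG} = \frac{\left[\hat{u}'(I_T \otimes W)y / \hat{\sigma}^2\right]^2}{\hat{J}} \;\underset{as}{\sim}\; \chi^2_{(1)},
\]
where $\hat{J} = \frac{1}{\hat{\sigma}^2}\left[\left((I_T \otimes W)X\hat{\beta}\right)' M_{Tn}\,(I_T \otimes W)X\hat{\beta} + T \times T_1\,\hat{\sigma}^2\right]$, with $M_{Tn} = I_{Tn} - X(X'X)^{-1}X'$.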
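The LM error statistic can be sketched in a few lines of Python. This is an illustrative sketch, not the workshop's code: the residuals and the 3-region W are hypothetical, and the function name `lm_error` is invented. It relies on the fact that the Kronecker product with I_T applies W period by period, so u'(I_T ⊗ W)u is the sum over t of u_t' W u_t.

```python
import math

def matvec(W, v):
    """Multiply matrix W by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lm_error(residuals_by_period, W):
    """LM error statistic for a pooled panel.

    residuals_by_period: list of T lists, each holding the n pooled-OLS
    residuals of one period. Returns (statistic, T_1, p-value).
    """
    T, n = len(residuals_by_period), len(W)
    # u'(I_T (x) W)u: apply W within each period and accumulate
    cross = sum(sum(u * wu for u, wu in zip(u_t, matvec(W, u_t)))
                for u_t in residuals_by_period)
    sigma2 = sum(u * u for u_t in residuals_by_period for u in u_t) / (n * T)
    # T_1 = tr[(W' + W) W]
    t1 = sum((W[j][i] + W[i][j]) * W[j][i] for i in range(n) for j in range(n))
    stat = (cross / sigma2) ** 2 / (T * t1)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) survival function
    return stat, t1, p_value

# Hypothetical inputs: 3 regions, 2 periods.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
residuals = [[1.0, 0.5, -1.5],
             [-0.5, -1.0, 1.5]]

stat, t1, p_value = lm_error(residuals, W)
print(f"T1 = {t1}, LM_ERROR = {stat:.4f}, p = {p_value:.4f}")
```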
Robust LM Tests
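The robust version of the LM error test:
\[
RLM_{ERROR} = \frac{\left[\dfrac{\hat{u}'(I_T \otimes W)\hat{u}}{\hat{\sigma}^2} - T \times T_1\,\hat{J}^{-1} \times \dfrac{\hat{u}'(I_T \otimes W)y}{\hat{\sigma}^2}\right]^2}{T \times T_1\left[1 - T \times T_1\,\hat{J}^{-1}\right]} \;\underset{as}{\sim}\; \chi^2_{(1)}.
\]
The robust version of the LM lag test:
\[
RLM_{LAG} = \frac{\left[\dfrac{\hat{u}'(I_T \otimes W)y}{\hat{\sigma}^2} - \dfrac{\hat{u}'(I_T \otimes W)\hat{u}}{\hat{\sigma}^2}\right]^2}{\hat{J} - T \times T_1} \;\underset{as}{\sim}\; \chi^2_{(1)}.
\]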

Detection of spatial dependence
To incorporate spatial effects we need some evidence of their presence. One possible test is the CD test (Pesaran, 2004):
\[
CD = \sqrt{\frac{2T}{n(n-1)}} \left( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \hat{\rho}_{ij} \right),
\]
where $\hat{\rho}_{ij}$ is the correlation coefficient between the residuals of units i and j:
\[
\hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t=1}^{T} \hat{u}_{it}\hat{u}_{jt}}{\left(\sum_{t=1}^{T} \hat{u}_{it}^2\right)^{1/2} \left(\sum_{t=1}^{T} \hat{u}_{jt}^2\right)^{1/2}}.
\]
Null hypothesis: no correlation across cross-sectional units.
In Stata:
. xtreg U NM, fe
(output omitted)
. xtcsd, pes abs
Pesaran’s test of cross sectional independence = 60.169, Pr = 0.0000

Average absolute value of the off-diagonal elements = 0.464
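The CD statistic itself is simple to compute from a matrix of residuals. The Python sketch below follows the two formulas above; the residuals are hypothetical and the function name `pesaran_cd` is invented for illustration. Under the null, CD is asymptotically standard normal, so a two-sided p-value is erfc(|CD|/√2).

```python
import math

def pesaran_cd(resid):
    """Pesaran's CD statistic; resid[i][t] is the residual of unit i in period t."""
    n, T = len(resid), len(resid[0])

    def rho(i, j):
        # pairwise correlation of residuals, as defined in the slide
        num = sum(a * b for a, b in zip(resid[i], resid[j]))
        den = (math.sqrt(sum(a * a for a in resid[i]))
               * math.sqrt(sum(b * b for b in resid[j])))
        return num / den

    pair_sum = sum(rho(i, j) for i in range(n - 1) for j in range(i + 1, n))
    return math.sqrt(2.0 * T / (n * (n - 1))) * pair_sum

# Hypothetical residuals: units 0 and 1 move together, unit 2 is unrelated.
resid = [[1.0, -1.0, 1.0, -1.0],
         [1.0, -1.0, 1.0, -1.0],
         [1.0, 1.0, -1.0, -1.0]]

cd = pesaran_cd(resid)
p_value = math.erfc(abs(cd) / math.sqrt(2.0))  # two-sided N(0,1) p-value
print(f"CD = {cd:.4f}, p = {p_value:.4f}")
```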
Spatial lag model

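The SLM with fixed effects is:
\[
y_t = \rho W y_t + X_t\beta + \mu + \varepsilon_t, \qquad
\varepsilon_t \sim N\left(0, \sigma_\varepsilon^2 I_n\right), \qquad (2)
\]
where
\[
y_t = \begin{bmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{nt} \end{bmatrix}, \quad
X_t = \begin{bmatrix}
x_{11t} & x_{21t} & \cdots & x_{k1t} \\
x_{12t} & x_{22t} & \cdots & x_{k2t} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1nt} & x_{2nt} & \cdots & x_{knt}
\end{bmatrix}, \quad
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}.
\]
Under random effects, this model can be written as:
\[
y_t = \rho W y_t + X_t\beta + \underbrace{\mu + \varepsilon_t}_{u_t}, \qquad
\varepsilon_t \sim N\left(0, \sigma_\varepsilon^2 I_n\right), \quad
\mu \sim N\left(0, \sigma_\mu^2 I_n\right). \qquad (3)
\]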


SLM. Direct and indirect effects

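The partial effect of a one-unit increase in x_k in the SLM is as follows:
\[
\begin{aligned}
\left[\frac{\partial y}{\partial x_{1k}} \;\cdots\; \frac{\partial y}{\partial x_{nk}}\right]
&= \begin{bmatrix}
\frac{\partial y_1}{\partial x_{1k}} & \cdots & \frac{\partial y_1}{\partial x_{nk}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_n}{\partial x_{1k}} & \cdots & \frac{\partial y_n}{\partial x_{nk}}
\end{bmatrix} \\
&= (I_n - \rho W)^{-1}
\begin{bmatrix}
\beta_k & 0 & \cdots & 0 \\
0 & \beta_k & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \beta_k
\end{bmatrix} \\
&= (I_n - \rho W)^{-1} \left[\beta_k I_n\right]. \qquad (4)
\end{aligned}
\]
Direct effect: average of the main-diagonal elements of $(I_n - \rho W)^{-1}[\beta_k I_n]$.
Indirect effect (spatial spillover): average of the row sums of the off-diagonal elements of $(I_n - \rho W)^{-1}[\beta_k I_n]$.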
Spatial Error Model
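The SEM with fixed effects is:
\[
y_t = X_t\beta + \mu + \varepsilon_t, \qquad
\varepsilon_t = \rho W \varepsilon_t + \eta_t, \qquad
\eta_t \sim N\left(0, \sigma_\eta^2 I_n\right), \qquad (5)
\]
and the random effects version of the SEM is:
\[
y_t = X_t\beta + \underbrace{\mu + \varepsilon_t}_{u_t}, \qquad
\varepsilon_t = \rho W \varepsilon_t + \eta_t, \qquad
\eta_t \sim N\left(0, \sigma_\eta^2 I_n\right), \quad
\mu \sim N\left(0, \sigma_\mu^2 I_n\right).
\]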


Spatial Durbin Model

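SDM specification:
\[
y_t = \rho W y_t + X_t\beta + W X_t\gamma + \varepsilon_t, \qquad (6)
\]
with direct and indirect effects:
\[
\begin{aligned}
\left[\frac{\partial y}{\partial x_{1k}} \;\cdots\; \frac{\partial y}{\partial x_{nk}}\right]
&= \begin{bmatrix}
\frac{\partial y_1}{\partial x_{1k}} & \cdots & \frac{\partial y_1}{\partial x_{nk}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_n}{\partial x_{1k}} & \cdots & \frac{\partial y_n}{\partial x_{nk}}
\end{bmatrix} \\
&= (I_n - \rho W)^{-1}
\begin{bmatrix}
\beta_k & w_{12}\gamma_k & \cdots & w_{1n}\gamma_k \\
w_{21}\gamma_k & \beta_k & \cdots & w_{2n}\gamma_k \\
\vdots & \vdots & \ddots & \vdots \\
w_{n1}\gamma_k & w_{n2}\gamma_k & \cdots & \beta_k
\end{bmatrix} \\
&= (I_n - \rho W)^{-1} \left[\beta_k I_n + \gamma_k W\right]. \qquad (7)
\end{aligned}
\]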
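A short worked step may help when reading the estimated totals. It assumes that W is row-standardised (each row sums to one, so $W\iota_n = \iota_n$) and that $|\rho| < 1$; neither assumption is stated on the slide. Under these assumptions the average total effect implied by (7) collapses to a scalar:

```latex
\[
W\iota_n = \iota_n
\;\Longrightarrow\;
(I_n - \rho W)\,\iota_n = (1-\rho)\,\iota_n
\;\Longrightarrow\;
(I_n - \rho W)^{-1}\iota_n = \frac{1}{1-\rho}\,\iota_n ,
\]
\[
\text{Total}_k
= \frac{1}{n}\,\iota_n'\,(I_n - \rho W)^{-1}\left[\beta_k I_n + \gamma_k W\right]\iota_n
= \frac{1}{n}\,\iota_n'\,\frac{\beta_k + \gamma_k}{1-\rho}\,\iota_n
= \frac{\beta_k + \gamma_k}{1-\rho}.
\]
```

Setting $\gamma_k = 0$ gives the SLM total effect $\beta_k/(1-\rho)$. The split between direct and indirect effects still depends on the full matrix W.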

command xsmle
SLM
xsmle U NM t2-t6, fe type(ind, leeyu) wmat(W5_st) mod(sar) hausman
SEM
xsmle U NM t2-t6, fe type(ind, leeyu) emat(W5_st) mod(sem) hausman
SDM
xsmle U NM t2-t6, fe type(ind, leeyu) wmat(W5_st) mod(sdm) durbin(NM) hausman
SDEM
xsmle U NM wx_NM t2-t6, fe type(ind, leeyu) emat(W5_st) mod(sem)
Alternative Models
Variable        SLM         SEM         SDM         SDEM
NM           −0.169***   −0.166***   −0.147***   −0.190***
W × NM                               −0.048***   −0.361***
ρ̂            −0.745***               −0.721***
λ̂                        −0.840***               −0.735***
COMFAC                               −90.42***
Spatial effects (long run)
Directs      −0.200***               −0.182***   −0.190***
Indirects    −0.463***               −0.518***   −0.361***
Totals       −0.662***               −0.700***   −0.551***
AIC           2353        2415        2351        2340

Types of Spatial Lag Models
Following Anselin et al. (2008), there are three types of dynamic spatial lag panel models (SLM):
1 Simultaneous spatio-temporal
yt = τyt−1 + ρWyt + Xt β + µ + εt .
2 Pure Recursive
yt = γWyt−1 + Xt β + µ + εt .
3 Spatio-temporal Recursive
yt = τyt−1 + γWyt−1 + ρWyt + Xt β + µ + εt .

Simultaneous spatio-temporal Model
Simultaneous spatio-temporal
yt = τyt−1 + ρWyt + Xt β + µ + εt .
1 The dynamic structure is explicit, with an inter-temporal contagion that is multiplied through the contemporaneous impact of neighbours.
2 The contemporaneous spatial effect hinders the use of this model for predictive purposes: the individual reacts immediately to his neighbours, although he is also affected by his own past.
3 The model can be estimated by GMM or ML.
4 A stationarity condition is required.

Pure recursive Model
Pure Recursive
yt = γWyt−1 + Xt β + µ + εt .
1 The dynamic structure is indirect but exists:
Example: y_{1,t} depends on its neighbours' lagged values y_{w_i,t−1}, which in turn depend on y_{1,t−2}.
2 It is useful for innovation diffusion models (Upton and Fingleton, 1985) or contagion models (COVID-19).
3 The model can be estimated with instrumental variables in the traditional way, or by GMM or ML.

Spatio-temporal recursive model
Spatio-temporal Recursive
yt = τyt−1 + γWyt−1 + ρWyt + Xt β + µ + εt .

1 The dynamic structure is explicit in both the spatial and the temporal direction.
The network of multiplier effects is complex.
2 It has good predictive capacity, as shown by Giacomini and Granger (2004).
3 Estimation is possible by either GMM or QML.
4 The stationarity conditions must be checked (τ + γ + ρ < 1).
5 The model currently has several extensions.

Types of Spatial Durbin Models
Again, there are three possible specifications of the dynamic spatial Durbin model (SDM):

1 Simultaneous Spatio-temporal
yt = τyt−1 + ρWyt + Xt β + WXt θ + µ + εt .

2 Pure Recursive
yt = γWyt−1 + Xt β + WXt θ + µ + εt .
3 Spatio-temporal Recursive
yt = τyt−1 + γWyt−1 + ρWyt + Xt β + WXt θ + µ + εt .

Direct and indirect effects
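Consider the most complete of the previous models: the spatio-temporal recursive SDM.
The direct and indirect short- and long-run effects can be obtained as:
Short run (assuming τ = γ = 0):
\[
\left[\frac{\partial y}{\partial x_{1k}} \;\cdots\; \frac{\partial y}{\partial x_{nk}}\right]_t
= (I_n - \rho W)^{-1} \left[\beta_k I_n + \theta_k W\right].
\]
Long run (assuming y_t = y_{t−1} = y*):
\[
\left[\frac{\partial y}{\partial x_{1k}} \;\cdots\; \frac{\partial y}{\partial x_{nk}}\right]
= \left[(1 - \tau) I_n - (\gamma + \rho) W\right]^{-1} \left[\beta_k I_n + \theta_k W\right].
\]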
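To see how the temporal parameters amplify the long-run response, the two expressions can be reduced to scalars. The Python sketch below assumes a row-standardised W (so that both inverse matrices map the unit vector to a multiple of itself), writes the coefficient on WX as theta, and uses hypothetical parameter values; the function name `total_effects` is invented for illustration.

```python
def total_effects(beta, theta, rho, tau=0.0, gamma=0.0):
    """Average short- and long-run total effects, assuming a row-standardised W.

    Short run: (beta + theta) / (1 - rho)
    Long run:  (beta + theta) / (1 - tau - gamma - rho)
    """
    if abs(rho) >= 1.0:
        raise ValueError("short-run multiplier requires |rho| < 1")
    if tau + gamma + rho >= 1.0:
        raise ValueError("stationarity requires tau + gamma + rho < 1")
    short_run = (beta + theta) / (1.0 - rho)
    long_run = (beta + theta) / (1.0 - tau - gamma - rho)
    return short_run, long_run

# Hypothetical parameter values, for illustration only.
short_run, long_run = total_effects(beta=-0.05, theta=0.0, rho=0.5,
                                    tau=0.3, gamma=0.1)
print(f"short run: {short_run:.3f}, long run: {long_run:.3f}")
```

The closer τ + γ + ρ is to one, the larger the gap between short- and long-run totals, which is why the stationarity condition matters for interpretation.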

xsmle command
SLM 1
xsmle U NM, dlag(1) fe wmat(W5_st) type(both) mod(sar) effects nsim(499)
SLM 2
effects nsim(499)
SLM 3
effects nsim(499)

Alternative models of dynamic SLM
Variable        SLM 1       SLM 2       SLM 3
U_{t−1}       −0.59***                −0.66***
W × U          0.48***    −0.56***    −0.58***
W × U_{t−1}               −0.42***    −0.17***
NM             0.03**     −0.06***     0.02**
Spatial effects (short run)
Directs       −0.03**     −0.06***    −0.03***
Indirects     −0.02*      −0.07***    −0.03***
Totals        −0.02**     −0.13***    −0.06***
Spatial effects (long run)
Directs       −0.05**     −0.08**     −0.12**
Indirects     −1.21       −0.15**     −0.36**
Totals        −1.16       −0.23**     −0.49
AIC          1801.82     1966.40     1799.70
stationarity
General Nested model with CF
General Nesting Spatial (GNS) model with Common Factors (Elhorst, 2020):
\[
y_t = \tau y_{t-1} + \rho W y_t + \gamma W y_{t-1} + X_t\beta + W X_t\theta + \sum_r \Gamma_r' f_{rt} + u_t,
\]
\[
u_t = \lambda W u_t + \varepsilon_t .
\]
Contemporaneous spatial lags in the dependent and explanatory variables, and in the error term.
Temporal lag and spatio-temporal lag.
Generic common factors $\sum_r \Gamma_r' f_{rt}$: unobserved shocks, possibly non-linear.

Linear Restrictions in unobservable terms
Common factors can be linearly restricted: $\sum_r \Gamma_r' f_{rt} = \mu + \alpha_t \iota_N$
µ is a vector of individual effects, fixed or random.
α_t is a time effect, fixed or random.
This type of restriction leads back to the previous panel models.
The random option should satisfy the assumptions that:
the number of units can potentially go to infinity.
the observations are representative (a sample) of a larger population.
the effects are orthogonal to the explanatory variables.
These conditions are not adequately met in empirical spatial research, hence the preponderance of fixed effects models.

Common Factors in the GNS

There are three alternatives for specifying the common factors within the GNS:
Option 1 for $\sum_r \Gamma_r' f_{rt}$:
Consider two factors, $f_{1t} = (1\ 1\ \cdots\ 1)'$ and $f_{2t} = (\alpha_1\ \alpha_2\ \cdots\ \alpha_T)'$, with the parametric constraints $\Gamma_1' = (\mu_1\ \mu_2\ \cdots\ \mu_n)$ and $\Gamma_2' = (1\ 1\ \cdots\ 1)$.
This option captures individual and time fixed effects:
Individual fixed effects are captured by f_1t, which is constant over time but has heterogeneous coefficients Γ_1.
Time fixed effects are captured by f_2t, which changes between periods but has homogeneous coefficients Γ_2.
The number of common factor parameters to be estimated is n + T + 1.
Option 2 for $\sum_r \Gamma_r' f_{rt}$:
Another alternative to control for common factors is to use individual fixed effects, but to capture time effects with cross-sectional averages:
\[
\bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_{it}, \qquad
\bar{y}_{t-1} = \frac{1}{n}\sum_{i=1}^{n} y_{i,t-1}, \qquad
\bar{x}_{kt} = \frac{1}{n}\sum_{i=1}^{n} x_{ikt} \quad (k = 1, \ldots, K).
\]
Time dummies impose the same impact on all observations in period t; cross-sectional averages instead introduce temporal heterogeneity.
Problem with this strategy: the number of parameters grows to n + (2 + K) × n.
Empirically, introducing the time effects through $\bar{y}_t$ and $\bar{y}_{t-1}$ is effective in capturing unobservable heterogeneity (Ciccarelli and Elhorst, 2018).
Option 3 for $\sum_r \Gamma_r' f_{rt}$:
Estimate the principal components following the idea of Shi and Lee (2017): QML estimation of the GNS model with CF, including a Nickell bias correction and corrections for the impact of the bias on the other parameters.
Elhorst extended this analysis by including different measures of goodness of fit.
Problems:
Principal components are not easy to interpret compared with the cross-sectional averages strategy.
This strategy requires estimating 2 × n additional parameters.

Testing the type of cross-sectional dependence
Recall that the cross-sectional dependence (CD) test (Pesaran, 2004) uses the correlation coefficient between pairs of units in a panel:
\[
CD = \sqrt{\frac{2T}{n(n-1)}} \left( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \hat{\rho}_{ij} \right)
\]
Two null hypotheses can be tested:
1 H0 : independence in the cross-section (checked previously).
2 H0 : weak cross-sectional dependence (α ≤ 1/2)
H1 : strong cross-sectional dependence (α > 1/2)
where α is the exponent of cross-sectional dependence, defined through
\[
\bar{\rho}_n = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \rho_{ij} = O\left(n^{2\alpha - 2}\right),
\]
and it measures the rate at which the variance of the cross-sectional correlation averages goes to 0.
For α ≤ 1/2, $\bar{\rho}_n$ goes to 0 very fast.
For α ≈ 1, $\bar{\rho}_n$ tends to a non-zero value (common factor).
Testing the type of cross-sectional dependence
Bailey et al. (2016) propose a consistent estimator of α, so that the type of cross-sectional dependence present in the panel can be tested.
The exponent α takes values in the interval (0, 1]:
1 α ≤ 1/2: weak dependence.
2 α = 1: strong dependence.
3 Intermediate values indicate moderate dependence.
This statistic helps to choose the estimation method to be used for the panel.

Estimation method according to the α test
Elhorst et al. (2021) propose the following strategy according to the α exponent test:
α                 Cross-section dep.   W matrix                          Method
0 < α ≤ 0.5       weak                 sparse
0.5 < α ≤ 0.75    moderate             still quite sparse                ML/GMM/IV
0.75 < α < 1      quite strong         dense (GVAR)
α = 1             strong               CS averages or PC (without W)     OLS
α can be estimated consistently only for 0.5 < α ≤ 1. Use Pesaran’s CD test to find out whether α is smaller or greater than 0.5.
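The α ranges in the table can be written as a simple lookup. The Python sketch below is only an illustration of the thresholds listed above: the function name `classify_alpha` is hypothetical, and in practice α is an estimate with a confidence interval, so the comparison against 0.5, 0.75 and 1 should be a statistical test rather than the exact comparison used here.

```python
def classify_alpha(alpha):
    """Map a value of the exponent alpha to the dependence type and W structure
    listed in the table (thresholds 0.5, 0.75 and 1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if alpha <= 0.5:
        return "weak", "sparse W"
    if alpha <= 0.75:
        return "moderate", "still quite sparse W"
    if alpha < 1.0:
        return "quite strong", "dense W (GVAR)"
    return "strong", "cross-sectional averages or principal components, without W"

for value in (0.4, 0.6, 0.9, 1.0):
    dependence, structure = classify_alpha(value)
    print(f"alpha = {value}: {dependence} dependence -> {structure}")
```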

Estimation method according to the α test
Practical guide suggested by Elhorst et al. (2021):
1 Assess the degree of cross-sectional dependence in the raw data using the CD test of Pesaran (2004) and the corresponding exponent α of Bailey et al. (2016).
(a) A non-significant CD test, or a significant CD test with a value of α significantly smaller than 3/4, indicates that the data are weakly or moderately dependent. A spatial econometric model without CF suffices.
(b) A significant CD test and a value of α not significantly smaller than 1 suggest the presence of CF.
2 Assess the degree of cross-sectional dependence remaining in the residuals: apply the CD test to the “de-factored” observations from step 1 when a common factor model (for example, with cross-sectional averages) has been chosen.
(a) Failure to reject the null indicates possibly remaining weak cross-sectional dependence: the appropriate method is a CF model with a sparse connectivity matrix W estimated by ML/IV/GMM.
Elhorst’s procedure
Alternative competitive models
Variable        SLM 2      SLM 2+CF    SLM 3+CF    SDM+CF
U_{t−1}                                −0.26***    −0.25***
W × U         −0.56***    −0.67***    −0.66***    −0.67***
W × U_{t−1}   −0.42***    −0.14***    −0.12***    −0.11***
NM            −0.06***    −0.03***    −0.03***    −0.04***
W × NM                                              0.02
Spatial effects (short run)
Directs       −0.06***    −0.04***    −0.03***    −0.04**
Indirects     −0.07***    −0.06***    −0.05***    −0.02**
Totals        −0.13***    −0.10***    −0.08***    −0.06**
Spatial effects (long run)
Directs       −0.08**     −0.04***    −0.05***    −0.05**
Indirects     −0.15**     −0.13**     −0.10**     −0.04**
Totals        −0.23**     −0.17**     −0.15**     −0.09
AIC          1966.40     2706.95     2706.60     2709.73
stationarity   yes         yes         yes         yes
Summing up
For panel data, recent developments in Stata provide alternatives for estimating static and dynamic models.
Dynamic spatial econometric models for spatial panels with common factors (CF) are among the most advanced models currently available for empirical research.
Stata provides commands to implement these models and tests.
Static panel model: the SDM and the SDEM show a negative impact on unemployment (NEG theory).
Dynamic panel model: all competing models show a negative impact in the short and long run, in line with NEG theory. (Caution: this is an illustrative example; dynamic panel models require a larger time dimension.)
Some references
Theoretical references:
Anselin, L., Le Gallo, J., & Jayet, H. (2008). Spatial panel econometrics. In The econometrics of panel data (pp. 625-660). Springer, Berlin, Heidelberg.
Bailey, N., Kapetanios, G., & Pesaran, M. H. (2016). Exponent of cross-sectional dependence: Estimation and inference. Journal of Applied Econometrics, 31(6), 929-960.
Elhorst, J. P., Gross, M., & Tereanu, E. (2021). Cross-sectional dependence and spillovers in space and time: Where spatial econometrics and global VAR models meet. Journal of Economic Surveys, 35(1), 192-226.
Applied references:
Elhorst et al. (2020): Persistent habit car (spatio-temporal recursive SDM with CF).
Jung et al. (2014): impact of poverty programs (spatio-temporal recursive SDM).
Keller and Shiue (2007): historical analysis of rice prices (simultaneous spatio-temporal SLM).
Montmartin and Herrera (2015): R&D investment in OECD countries (spatio-temporal recursive SDM).