Notes On Spatial Econometrics

Mauricio Sarrias
Universidad de Talca

October 6, 2020
Contents

2 Spatial Models
 2.1 Taxonomy of Models
  2.1.1 Spatial Lag Model
  2.1.2 Spatial Durbin Model
  2.1.3 Spatial Error Model
  2.1.4 Spatial Autocorrelation Model
 2.2 Motivation of Spatial Models
  2.2.1 SLM as a Long-run Equilibrium
  2.2.2 SEM and Omitted Variables Motivation
  2.2.3 SDM and Omitted Variables Motivation
 2.3 Interpreting Spatial Models
  2.3.1 Measuring Spillovers
  2.3.2 Marginal Effects
  2.3.3 Partitioning Global Effects Estimates Over Space
  2.3.4 LeSage's Book Example

II Estimation Methods

3 Maximum Likelihood Estimation
 3.1 What Are The Consequences of Applying OLS?
  3.1.1 Finite and Asymptotic Properties
  3.1.2 Illustration of Bias
 3.2 Maximum Likelihood Estimation of SLM
  3.2.1 Maximum Likelihood Function
  3.2.2 Score Vector and Estimates
  3.2.3 Ord's Jacobian
  3.2.4 Hessian
 3.3 ML Estimation of the SEM Model
  3.3.1 What Are The Consequences of Applying OLS on a SEM Model?
  3.3.2 Log-likelihood Function
  3.3.3 Score Function and ML Estimates
 3.4 Computing the Standard Errors For The Marginal Effects
 3.5 Spillover Effects on Crime: An Application in R
  3.5.1 Estimation of Spatial Models in R
  3.5.2 Estimation of Marginal Effects in R
 3.6 Asymptotic Properties
  3.6.1 Triangular Arrays
  3.6.2 Consistency of QMLE
  3.6.3 Asymptotic Normality
 Appendix 3.A Terminology in Asymptotic Theory
 Appendix 3.B A Function to Estimate the SLM in R

4 Hypothesis Testing
 4.1 Test for Residual Spatial Autocorrelation Based on the Moran I Statistic
  4.1.1 Cliff and Ord Derivation
  4.1.2 Kelejian and Prucha (2001) Derivation of Moran's I
  4.1.3 Example
 4.2 Common Factor Hypothesis
 4.3 Hausman Test: OLS vs SEM
 4.4 Tests Based on ML
  4.4.1 Likelihood Ratio Test
  4.4.2 Wald Test
  4.4.3 Lagrange Multiplier Test
  4.4.4 Anselin and Florax Recipe
  4.4.5 Lagrange Multiplier Test Statistics in R

Index
List of Figures

3.1 Distribution of ρ̂
3.2 Spatial Distribution of Crime in Columbus, Ohio Neighborhoods
3.3 Effects of a Change in Region 30: Categorization
3.4 Effects of a Change in Region 30: Magnitude
3.5 Distances from R3 to all Regions

List of Tables
Part I

1 Introduction to Spatial Econometrics
1.1 Why Do We Need Spatial Econometrics?
An important aspect of any study involving spatial units (cities, regions, countries, etc.) is the potential relationships and interactions between them. For example, when modeling pollution at the regional level it is awkward to treat each region as an independent unit. In fact, regions cannot be analyzed as isolated entities since they are spatially interrelated by ecological and economic interactions. Therefore, environmental externalities are highly probable: an increase in region i's pollution will affect pollution in neighboring regions, but the impact will be lower for more distant regions. Consider Figure 1.1, where region 3 is highly industrialized, whereas regions 1, 2, 4 and 5 are residential areas. If region 3 increases its economic activity, then pollution will increase not only in that region, but also in the neighboring regions. It is also expected that contamination will increase in regions 1 and 5, but by a lower magnitude. We might think that the environmental externality in R3 causes environmental degradation in other regions through both spatial-economic interactions (e.g., transportation of inputs and outputs from region 3) and spatial-ecological interactions (e.g., carbon emissions).
[Figure 1.1: Five regions R1–R5 arranged in a row; R3 is the industrialized region.]
In the same vein, if we study crime at the city level, then we should somehow incorporate the possibility that crime is localized. For example, the identification of concentrations or clusters of greater criminal activity has emerged as a central mechanism for targeting criminal justice and crime prevention responses to crime problems. These clusters of crime are commonly referred to as hotspots: geographic locations of high crime concentration, relative to the distribution of crime across the whole region of interest.
Both examples implicitly state that geographic location and distance matter. In fact, they reflect the importance of the first law of geography. According to Waldo Tobler: "everything is related to everything else, but near things are more related than distant things." This first law is the foundation of the fundamental concepts of spatial dependence and spatial autocorrelation.
[Figure: Spatial distribution of poverty in the Metropolitan Region, Chile (legend from 10 to 25). Notes: This graph shows the spatial distribution of poverty in the Metropolitan Region, Chile.]
Formally, the existence of spatial autocorrelation may be expressed by the following moment condition:

Cov(y_i, y_j) = E(y_i y_j) - E(y_i)E(y_j) \neq 0 \quad \text{for } i \neq j,

where i and j refer to individual spatial units and y_i, y_j are the values of a random variable at those locations.
Notes: Spatial autocorrelation among 400 spatial units arranged in a 20-by-20 regular square lattice grid. Different gray tones refer to different values of the variable, ranging from low values (white) to high values (black). The left plot shows positive spatial autocorrelation, whereas the right plot shows negative spatial autocorrelation.
Positive autocorrelation is much more common, but negative autocorrelation does exist, for example, in studies of welfare competition or federal grant competition among local governments (Saavedra, 2000; Boarnet and Glazer, 2002), studies of regional employment (Filiztekin, 2009; Pavlyuk, 2011), cross-border lottery shopping (Garrett and Marsh, 2002), foreign direct investment in OECD countries (Garretsen and Peeters, 2009), and the location of Turkish manufacturing industry (Basdas, 2009). In short, we are interested in studying non-random spatial patterns and in trying to explain this non-randomness. Possible causes of non-randomness are (Gibbons et al., 2015):
1. Firms may be randomly allocated across space, but some characteristics of locations vary across space and influence outcomes.
2. Location may have no causal effect on outcomes, but outcomes may be correlated across space because heterogeneous individuals or firms are non-randomly allocated across space.
3. Individuals or firms may be randomly allocated across space, but they interact, so that decisions by one agent affect the outcomes of other agents.
4. Individuals or firms may be non-randomly allocated across space, and the characteristics of others nearby directly influence individual outcomes.
Non-stochastic means that the researcher takes W as known a priori; therefore, all results are conditional upon the specification of W.
Note also that the definition of W requires a rule for w_ij. In other words, we need to figure out how to assign a real number to w_ij, for i ≠ j, representing the strength of the spatial relationship between i and j. There are several ways of doing that, but, in general, there are two basic criteria. The first type establishes a relationship based on shared borders or vertices of lattice or irregular polygon data (contiguity). The second type establishes a relationship based on the distance between locations. Generally speaking, contiguity is most appropriate for geographic data expressed as polygons (so-called areal units), whereas distance is suited for point data, although in practice the distinction is not that absolute.
Rook Contiguity
In this case, two locations are neighbors if they share at least part of a common border or side. In Figure 1.4 we have a regular grid with 9 regions: each square represents a region. If, for example, we want to define the neighbors of region 5 using the rook criterion, then its neighbors will be regions 2, 4, 6 and 8 (the regions filled in red in the figure).
If we continue with this reasoning, then the 9 × 9 W matrix will be:
[Figure 1.4: A 3 × 3 regular grid of regions numbered 1–9; the rook neighbors of region 5 (regions 2, 4, 6 and 8) are shaded.]
W = \begin{pmatrix}
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0
\end{pmatrix} \quad (1.5)
Bishop Contiguity
In bishop contiguity (which is seldom used in practice), region i’s neighbors are located at its corners.
Figure 1.5 shows the neighbors of region 5 under this scheme. The neighbors are regions 1, 3, 7 and 9. Note
that regions in the interior will have more neighbors than those in the periphery.
[Figure 1.5: The same 3 × 3 grid; the bishop neighbors of region 5 (regions 1, 3, 7 and 9) are shaded.]
W = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{pmatrix}
Queen Contiguity
In queen contiguity, any region that touches the boundary of region i, whether on a side or at a single point, is considered a neighbor. Under this criterion, the neighbors of region 5 are regions 1, 2, 3, 4, 6, 7, 8 and 9.
[Figure: The same 3 × 3 grid; under the queen criterion all eight surrounding regions are neighbors of region 5.]
d_{ij}^m = |x_i - x_j| + |y_i - y_j|.
All three measures presented above are useful if we consider the earth as a plane. For example, the Euclidean distance is the length of a straight line on a map, which is not necessarily the shortest distance once the curvature of the earth is taken into account. The great circle distance takes the curvature of the Earth into account. Ships and aircraft usually follow great circle routes to minimize distance and save time and money. In particular, the great circle distance is computed using the standard formula:

d_{ij}^{gc} = R \cdot \arccos\left[\sin(\phi_i)\sin(\phi_j) + \cos(\phi_i)\cos(\phi_j)\cos(\lambda_j - \lambda_i)\right],

where \phi and \lambda denote latitude and longitude, respectively, and R is the Earth's radius.
Inverse Distance
Now we have to transform the information about the distances among spatial points into a weighting scheme. The idea is that w_ij → 0 as d_ij → ∞. In other words, the closer j is to i, the larger w_ij should be, to conform to Tobler's first law.
In the inverse distance weighting scheme, the weights are inversely related to the separation distance, as shown below:
w_{ij} = \begin{cases} 1/d_{ij}^{\alpha} & \text{if } i \neq j \\ 0 & \text{if } i = j, \end{cases}
where the exponent α is a parameter that is usually set by the researcher. In practice, this parameter is seldom estimated, but typically set to α = 1 or α = 2. With α = 1, the weights are given by the reciprocal of the distance: the larger the distance between two spatial units, the lower the spatial weight or spatial connection. Finally, by convention, the diagonal elements of the spatial weights are set to zero and not computed: plugging in a value of d_ii = 0 would yield division by zero for inverse distance weights.
k-nearest Neighbors
An alternative type of spatial weights that avoids the problem of isolates is to select the k-nearest neighbors. In contrast to the distance band, this is not a symmetric relation. A potential problem with this type of neighbors, however, is the occurrence of ties, i.e., when more than one location j has the same distance from i. A number of solutions exist to break the tie, from randomly selecting one of the k-th order neighbors to including all of them.
A common transformation of W is row standardization, in which each weight is divided by its row sum: w^s_{ij} = w_{ij} / \sum_j w_{ij}. This ensures that all weights are between 0 and 1 and facilitates the interpretation of operations with the weights matrix as an averaging of neighboring values, as we will see below. The row-standardized weights matrix also ensures that the spatial parameters in many spatial stochastic processes are comparable between models (Anselin and Bera, 1998).
Another important feature is that, under row standardization, the elements of each row sum to unity, and the sum of all weights is S_0 = \sum_i \sum_j w_{ij} = n, the total number of observations. This is a nice interpretation that we will explore later.
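As a quick numerical illustration, the following sketch (using a small hypothetical binary matrix) verifies both properties in R:

# A small hypothetical binary contiguity matrix
W <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
Ws <- W / rowSums(W)  # row standardization: w_ij / sum_j w_ij
rowSums(Ws)           # each row sums to unity
sum(Ws)               # S0 equals n = 3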
Another important issue is symmetry. As we have learned, some spatial weight matrices are symmetric. An important characteristic of a symmetric matrix is that all its characteristic roots are real. However, after row standardization the matrices are no longer symmetric.
The row-standardized matrix is also known in the literature as the row-stochastic matrix:

Definition 1.2.2 — Row-stochastic Matrix. A real n × n matrix A is called a Markov matrix, or row-stochastic matrix, if
1. a_{ij} ≥ 0 for 1 ≤ i, j ≤ n;
2. \sum_{j=1}^n a_{ij} = 1 for 1 ≤ i ≤ n.

Theorem 1.1 — Eigenvalues of a Row-stochastic Matrix. Every eigenvalue ω_i of a row-stochastic matrix satisfies |ω_i| ≤ 1.
Therefore, the eigenvalues of the row-stochastic (i.e., row-normalized, row-standardized or Markov) neighborhood matrix W^s are in the range [−1, +1].
Finally, the behavior of W^s is important for the asymptotic properties of estimators and test statistics (Anselin and Bera, 1998, pp. 244). In particular, the W matrix should also be exogenous, unless endogeneity is considered explicitly in the model specification.
The spatial lag of y at location i is

[W y]_i = \sum_{j=1}^n w_{ij} y_j,

where the weights w_{ij} consist of the elements of the ith row of the matrix W, matched up with the corresponding elements of the vector y. In other words, this is a weighted sum of the values observed at neighboring locations, since the non-neighbors are not included.
R As stated by Anselin (1988, pp. 23–24), standardization must be done with caution. For example, when the weights are based on an inverse distance function (or a similar concept of distance decay), which has a meaningful economic interpretation, scaling the rows so that the weights sum to one may result in a loss of that interpretation. Can you give an example?
W = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix} \quad (1.6)
Then W 2 = W W based on the 5 × 5 first-order contiguity matrix W from (1.6) is:
W^2 = \begin{pmatrix}
1 & 0 & 1 & 0 & 0 \\
0 & 2 & 0 & 1 & 0 \\
1 & 0 & 2 & 0 & 1 \\
0 & 1 & 0 & 2 & 0 \\
0 & 0 & 1 & 0 & 1
\end{pmatrix} \quad (1.7)
Note that for region R1, the second-order neighbors are regions R1 and R3. That is, region R1 is a
second-order neighbor to itself as well as to region R3, which is a neighbor to the neighboring region R2.
Now consider R2. The first panel of Figure 1.7 shows the first-order neighbors of R2 given by the spatial
weight matrix in (1.6): the first-order neighbors are R1 and R3. Panel B considers the second-order neighbors
of R2: the second-order neighbors are R2 itself and R4. To understand this, note that there is a feedback effect on R2 coming from R1 and R3 (the first-order neighbors of R2). This explains why the element w^2_{22} = 2. Moreover, there is an indirect effect coming from R4 through R3 that finally impacts R2. This is represented by the value of 1 for the element w^2_{24}.
Similarly, for region R3, the second-order neighbors are regions R1 (which is a neighbor to the neighboring
region R2), R3 (a second-order neighbor to itself), and R5 (which is a neighbor to the neighboring region
R4).
[Figure 1.7: Panel A shows the first-order neighbors of R2 (R1 and R3); Panel B shows its second-order neighbors (R2 itself and R4).]
Continuing, the third-order matrix W^3 = W W^2 is:

W^3 = \begin{pmatrix}
0 & 2 & 0 & 1 & 0 \\
2 & 0 & 3 & 0 & 1 \\
0 & 3 & 0 & 3 & 0 \\
1 & 0 & 3 & 0 & 2 \\
0 & 1 & 0 & 2 & 0
\end{pmatrix}
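These powers can be verified numerically. The following sketch builds the 5 × 5 chain contiguity matrix from (1.6) and computes its second and third powers:

# Build the 5x5 chain contiguity matrix from (1.6)
W <- matrix(0, 5, 5)
W[cbind(1:4, 2:5)] <- 1   # link each region i to region i + 1
W <- W + t(W)             # symmetrize: first-order contiguity
W %*% W                   # second-order connections, W^2
(W %*% W) %*% W           # third-order connections, W^3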
1.3 Examples of Weight Matrices in R
For simplicity in showing how to create neighbor objects in R, we work with the map of the communes of the Metropolitan Region in Chile.
We first need to load the Metropolitan Region shapefile into R. To do so, we will use the maptools package (Bivand and Lewin-Koh, 2015), which allows us to read and handle spatial objects.
#Load package
library("maptools")
If the shape file mr_chile.shp is in the same working directory, then we can load it into R using the
command readShapeSpatial:
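# Read the shapefile into a Spatial object (call reconstructed from the
# text above; the object name mr is the one used throughout below)
mr <- readShapeSpatial("mr_chile.shp")
class(mr)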
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
The function readShapeSpatial reads data from the shapefile into a Spatial object of class "sp". The function names gives us the names of the variables in the .dbf file associated with the shapefile.
We can plot the shapefile using the generic function plot in the following way
# Plot shapefile
plot(mr, main = "Metropolitan Region-Chile", axes = TRUE)
[Figure: Plot of the mr shapefile, titled "Metropolitan Region-Chile", with latitude axis running from −34.5 to −33.0.]
#Load package
library("spdep")
In the spdep package, neighbor relationships between n observations are represented by an object of class "nb". This object is a list of length n, with the index numbers of the neighbors of each component recorded as an integer vector. If any observation has no neighbors, the component contains the integer zero.
The function poly2nb is used to construct weight matrices based on contiguity. Specifically, it creates a "neighbors list" of class "nb" based on regions with contiguous boundaries. Check out help(poly2nb) to see all the details and options.
First, we create a neighbor list based on the ‘Queen’ criteria for the communes of the Metropolitan Region:
# Create queen W
queen.w <- poly2nb(mr, row.names = mr$NAME, queen = TRUE)
Since we have an nb object to examine, we can present the standard methods for these objects. There are
print, summary, plot, and other methods. The characteristics of the weights are obtained with the usual
summary command:
# Summary of W
summary(queen.w)
## Number of regions: 52
## Number of nonzero links: 292
## Percentage nonzero weights: 10.79882
## Average number of links: 5.615385
## Link number distribution:
##
## 2 3 4 5 6 7 8 9 10 12
## 3 2 7 15 10 10 2 1 1 1
## 3 least connected regions:
## Tiltil San Pedro Maria Pinto with 2 links
## 1 most connected region:
## San Bernardo with 12 links
The output presents important information about the neighbors: it shows the number of regions, which
corresponds to 52 in this example; the number of nonzero links; the percentage of nonzero weights; the
average number of links, and so on.
The commune of San Bernardo is the most connected region, with 12 neighbors under the queen scheme. The least connected regions are Tiltil, San Pedro, and Maria Pinto, each with 2 neighbors. The output also shows the distribution of neighbors. For example, 7 out of 52 regions have 4 neighbors, and only 2 communes have 8 neighbors.
To transform the list into an actual matrix W , we can use the function nb2listw:
An important argument of the function is style. This argument indicates what type of matrix to create. For example, style = "W" creates a row-standardized matrix, so that w^s_{ij} = w_{ij} / \sum_j w_{ij}. After normalization, each row of W^s will sum to 1. "B" is the basic binary coding; "C" is globally standardized, that is, w^s_{ij} = w_{ij} \cdot (n / \sum_i \sum_j w_{ij}). If style = "U", then w^s_{ij} = w_{ij} / \sum_i \sum_j w_{ij}. In a minmax matrix, the (i, j)th element of W^s becomes w^s_{ij} = w_{ij} / \min\{\max_i(\tau_i), \max_i(c_i)\}, with \max_i(\tau_i) being the largest row sum of W and \max_i(c_i) being the largest column sum of W (Kelejian and Prucha, 2010). Finally, "S" is the variance-stabilizing coding scheme, where w^s_{ij} = w_{ij} / \sqrt{\sum_j w_{ij}^2} (Tiefelsdorf et al., 1999).
Furthermore, the summary function reports constants used in the inference for global spatial autocorrela-
tion statistics, which we will discuss later.
We can also see the attributes of the object using the function attributes:
# Attributes of wlist
attributes(queen.w)
## $class
## [1] "nb"
##
## $region.id
## [1] Santiago Cerillos Cerro Navia
## [4] Conchali El Bosque Estacion Central
## [7] La Cisterna La Florida La Granja
## [10] La Pintana La Reina Lo Espejo
## [13] Lo Prado Macul Nunoa
## [16] Pedro Aguirre Cerda Penalolen Providencia
## [19] Quinta Normal Recoleta Renca
## [22] San Joaquin San Miguel San Ramon
## [25] Independencia Puente Alto Las Condes
## [28] Vitacura Quilicura Huechuraba
## [31] Maipu Pudahuel San Bernardo
## [34] Tiltil Lampa Colina
## [37] Lo Barnechea Pirque Paine
## [40] Buin Alhue Melipilla
## [43] San Pedro Maria Pinto Curacavi
## [46] Penaflor Calera de Tango Padre Hurtado
## [49] El Monte Talagante Isla de Maipo
## [52] San Jose de Maipo
## 52 Levels: Alhue Buin Calera de Tango Cerillos Cerro Navia Colina ... Vitacura
##
## $call
## poly2nb(pl = mr, row.names = mr$NAME, queen = TRUE)
##
## $type
## [1] "queen"
##
## $sym
## [1] TRUE
# Symmetric W
is.symmetric.nb(queen.w)
## [1] TRUE
As we previously discussed, weight matrices based on boundaries are generally symmetric. Now, we construct a binary matrix using the rook criterion:
# Rook W
rook.w <- poly2nb(mr, row.names = mr$NAME, queen = FALSE)
summary(rook.w)
## 2 3 4 5 6 7 8 9 10
## 3 3 12 16 7 6 2 1 2
## 3 least connected regions:
## Tiltil San Pedro Maria Pinto with 2 links
## 2 most connected regions:
## Santiago San Bernardo with 10 links
Finally, we can plot the weight matrices using the following set of commands (see Figure 1.9).
# K-neighbors
coords <- coordinates(mr) # coordinates of centroids
head(coords, 5) # show coordinates
## [,1] [,2]
## 0 -70.65599 -33.45406
## 1 -70.71742 -33.50027
## 2 -70.74504 -33.42278
## 3 -70.67735 -33.38372
## 4 -70.67640 -33.56294
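The k-nearest neighbor objects used below can be created along the following lines (a sketch; the exact calls are not shown here, and the use of longlat = TRUE is an assumption consistent with the discussion that follows):

# k-nearest neighbor objects for k = 1 and k = 2
k1neigh <- knearneigh(coords, k = 1, longlat = TRUE)
k2neigh <- knearneigh(coords, k = 2, longlat = TRUE)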
The function coordinates extracts the spatial coordinates from the shapefile, whereas the function knearneigh returns a matrix with the indices of points belonging to the set of the k-nearest neighbors of each other. The argument k indicates the number of nearest neighbors to be returned. If the point coordinates are longitude-latitude decimal degrees and longlat = TRUE, great circle distances are used and distances are measured in kilometers. Note that the objects k1neigh and k2neigh are of class knn.
Weight matrices based on inverse distance can be computed in the following way (see Section 1.2.2):
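# A sketch consistent with the description below; the exact calls are not
# shown in the original, and the object names mirror the plotting code
dist.mat <- as.matrix(dist(coords))       # distances between centroids
dist.mat.inv <- 1 / dist.mat              # inverse distance weights
diag(dist.mat.inv) <- 0                   # convention: w_ii = 0
dist.mat.inve <- mat2listw(dist.mat.inv)  # convert the matrix to a listw object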
## 0 1 2 3 4
## 0 0.00000000 0.07687010 0.09438408 0.07350782 0.11078109
## 1 0.07687010 0.00000000 0.08226867 0.12324109 0.07489489
## 2 0.09438408 0.08226867 0.00000000 0.07814455 0.15606360
## 3 0.07350782 0.12324109 0.07814455 0.00000000 0.17922003
## 4 0.11078109 0.07489489 0.15606360 0.17922003 0.00000000
## 0 1 2 3 4
## 0 0.000000 13.008960 10.595007 13.603994 9.026811
## 1 13.008960 0.000000 12.155295 8.114177 13.352046
## 2 10.595007 12.155295 0.000000 12.796797 6.407644
## 3 13.603994 8.114177 12.796797 0.000000 5.579733
## 4 9.026811 13.352046 6.407644 5.579733 0.000000
The function dist from the stats package computes and returns the distance matrix obtained by applying the specified distance measure—euclidean distance in this example—to the rows of a data matrix. The other methods that can be used are maximum, manhattan, canberra, binary or minkowski. Finally, the mat2listw function converts a square spatial weights matrix to a listw object, labeling the regions with the sequence 1:nrow(x).2
2 For more about spatial weight matrices see (Stewart and Zhukov, 2010).
# Plot Weights
par(mfrow = c(3, 2))
plot(mr, border = "grey", main = "Queen")
plot(queen.w, coordinates(mr), add = TRUE, col = "red")
plot(mr, border = "grey", main = "1-Neigh")
plot(knn2nb(k1neigh), coords, add = TRUE, col = "red")
plot(mr, border = "grey", main = "2-Neigh")
plot(knn2nb(k2neigh), coordinates(mr), add = TRUE, col = "red")
plot(mr, border = "grey", main = "Inverse Distance")
plot(dist.mat.inve, coordinates(mr), add = TRUE, col = "red")
[Figure 1.9: Plots of the weight structures: Queen, 1-Neigh, 2-Neigh, and Inverse Distance.]
# X matrix
X <- cbind(mr$POVERTY, mr$URB_POP)
head(X, 5)
## [,1] [,2]
## [1,] 8 159919
## [2,] 9 65262
## [3,] 18 131850
## [4,] 12 104634
## [5,] 14 166514
Now, we can construct a spatially lagged version of this matrix, using the queen.w weights:
# Create WX
WX <- lag.listw(nb2listw(queen.w), X)
head(WX)
## [,1] [,2]
## [1,] 9.10000 100138.9
## [2,] 12.40000 299498.4
## [3,] 14.00000 144756.5
## [4,] 14.60000 121974.2
## [5,] 18.25000 170266.5
## [6,] 10.42857 236231.1
I = \frac{n}{S_0} \frac{z' W z}{z' z}, \quad (1.9)

where z = x - \bar{x}. If the W matrix is row standardized, then:

I = \frac{z' W^s z}{z' z}, \quad (1.10)
because S0 = n. Values range from -1 (perfect dispersion) to +1 (perfect correlation). A zero value indicates
a random spatial pattern.
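To make formula (1.9) concrete, a minimal hand-rolled version in R is given below (the helper name moran_I is ours, not part of spdep):

# Moran's I computed directly from (1.9)
moran_I <- function(x, W) {
  z  <- x - mean(x)  # deviations from the mean
  n  <- length(x)
  S0 <- sum(W)       # sum of all weights
  (n / S0) * sum(z * (W %*% z)) / sum(z^2)
}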
A very useful tool for understanding the Moran’s I test is the Moran Scatterplot. The idea of the Moran
scatterplot is to display the variable for each region (on the horizontal axis) against the standardized spatial
weighted average (average of the neighbors’ x, also called spatial lag) on the vertical axis (See Figure 1.11).
As pointed out by Anselin (1996), expressing the variables in standardized form (i.e., with mean zero and standard deviation equal to one) allows assessment of both the global spatial association, since the slope of the line is the Moran's I coefficient, and the local spatial association (the quadrant in the scatterplot). The Moran scatterplot is therefore divided into four different quadrants corresponding to the four types of local spatial association between a region and its neighbors:
• Quadrant I displays the regions with high x (above the average) surrounded by regions with high x (above the average). This quadrant is usually denoted High-High.
• Quadrant II shows the regions with low values surrounded by regions with high values. This quadrant is usually denoted Low-High.
• Quadrant III displays the regions with low values surrounded by regions with low values, and is denoted Low-Low.
• Quadrant IV shows the regions with high values surrounded by regions with low values. It is denoted High-Low.
Regions located in quadrants I and III reflect positive spatial autocorrelation, the spatial clustering of similar values, whereas quadrants II and IV represent negative spatial autocorrelation, the spatial clustering of dissimilar values.
To understand Moran’s I, it is important to note the similarity of the Moran’s I with the OLS coefficient.
Recall that
\hat{\beta} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \quad (1.11)
Then, looking at (1.8), Moran's I is equivalent to the slope coefficient of a linear regression of the spatial lag W x on the observation vector x, both measured in deviations from their means. It is, however, not equivalent to the slope of x on W x, which would be a more natural way to think of such a regression.
The hypothesis tested by the Moran’s I is the following:
• H0 : x is spatially independent; the observed x is assigned at random among locations. In this case I
is close to zero.
• H1 : x is not spatially independent. In this case I is statistically different from zero.
What is the distribution of Moran's I? We are interested in the distribution of:

\frac{I - E[I]}{\sqrt{Var(I)}}.
There are two ways to compute the mean and variance of Moran's I: under the normality assumption for x_i, or under randomization of x_i. Under the normality assumption, the random variables x_i are assumed to be the result of n independent drawings from a normal population. Under the randomization assumption, no matter what the underlying distribution of the population, we treat the observed values of x_i as having been repeatedly randomly permuted across locations.
[Figure 1.11: Moran scatterplot. The horizontal axis shows the standardized variable x and the vertical axis its spatial lag W x; Quadrant I is the upper right, Quadrant II the upper left, Quadrant III the lower left and Quadrant IV the lower right.]
Theorem 1.2 — Moran’s I Under Normality. Assume that {xi } = {x1 , x2 , ..., xn } are independent and dis-
tributed as N(µ, σ 2 ), but µ and σ 2 are unknown. Then:
1
E (I) = − (1.12)
n−1
and
n2 S1 − nS2 + 3S02
E I2 = (1.13)
S02 (n2 − 1)
Pn Pn Pn Pn Pn Pn
where S0 = i=1 j=1 wij , S1 = i=1 j=1 (wij +wji )2 /2, S2 = i=1 (wi. +w.i )2 , where wi. = j=1 wij
Pn
and wi. = j=1 wji Then:
2
Var (I) = E I 2 − E (I) (1.14)
To derive these results, the following moments of the deviations z_i = x_i - \bar{x} are used (for distinct indices i, j, k, l):

E[z_i] = 0
E[z_i^2] = \sigma^2 - \frac{\sigma^2}{n}
E[z_i z_j] = -\frac{\sigma^2}{n}
E[z_i^2 z_j^2] = \frac{(n^2 - 2n + 3)\sigma^4}{n^2}
E[z_i^2 z_j z_k] = -\frac{(n - 3)\sigma^4}{n^2}
E[z_i z_j z_k z_l] = \frac{3\sigma^4}{n^2}
Then:
E[I] = \frac{n}{S_0} \frac{E\left[\sum_{i=1}^n \sum_{j=1}^n w_{ij} z_i z_j\right]}{E\left[\sum_{i=1}^n z_i^2\right]} = \frac{n}{S_0} \sum_{i=1}^n \sum_{j=1}^n w_{ij} \frac{E[z_i z_j]}{\sum_{i=1}^n E[z_i^2]}
     = \frac{-n S_0 \frac{\sigma^2}{n}}{S_0 \, n(1 - 1/n)\sigma^2} \quad (1.15)
     = -\frac{\sigma^2/n}{(1 - 1/n)\sigma^2}
     = -\frac{1}{n - 1}
and

E[I^2] = \frac{n^2}{S_0^2} \frac{E\left[\left(\sum_{i=1}^n \sum_{j=1}^n w_{ij} z_i z_j\right)^2\right]}{E\left[\left(\sum_{i=1}^n z_i^2\right)^2\right]}. \quad (1.16)

Expanding the squared double sum, the numerator collects three types of terms: (1/2)\sum (w_{ij} + w_{ji})^2 z_i^2 z_j^2, terms in (w_{ij} + w_{ji})(w_{ik} + w_{ki}) z_i^2 z_j z_k, and terms in w_{ij} w_{kl} z_i z_j z_k z_l, where the sums run over distinct pairs, triples and quadruples of indices. Substituting the moments given above yields (1.13).
Under randomization, the expectation is likewise:

E(I) = -\frac{1}{n - 1} \quad (1.17)

while the variance takes a longer expression that also involves the sample kurtosis.
It is important to note that the expected value of Moran's I under normality and randomization is the same.
• To test a null hypothesis H_0 (no spatial autocorrelation in our case), we specify a test statistic T such that large values of T are evidence against H_0.
• Let T have observed value t_obs. We generally want to compute the p-value p = Pr(T ≥ t_obs | H_0).
The algorithm for the Morans’ I Monte Carlo test is the following:
Algorithm 1.4 — Moran’s’ I Monte Carlo Test. The procedure is the following:
1. Rearrange the spatial data by shuffling their location and compute the Moran’s I S times. This will
create the distribution under H0 . This operationalizes spatial randomness.
2. Let I1∗ , I2∗ , ..., IS∗ be the Moran’s I for each time. A consistent Monte Carlo p-value is then:
PS
1+ 1(Is∗ ≥ Iobs )
pb = s=1
(1.21)
S+1
3. For tests at the α level or at 100(1 − α)% confidence intervals, there are reasons for choosing S so
that α(S + 1) is an integer. For example, use S = 999 for confidence intervals and hypothesis tests
when α = 0.05.
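A minimal R sketch of this algorithm, reusing the hand-rolled moran_I helper from above (the name moran_mc is hypothetical), is:

# Permutation-based Moran's I test, as in Algorithm 1.4
moran_mc <- function(x, W, S = 999) {
  I_obs  <- moran_I(x, W)                        # observed statistic
  I_star <- replicate(S, moran_I(sample(x), W))  # shuffle locations S times
  (1 + sum(I_star >= I_obs)) / (S + 1)           # p-value as in (1.21)
}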
[Figure 1.12: Quantile map of poverty in the Metropolitan Region (legend from 10 to 25).]
Figure 1.12 provides some useful insights. First, it clearly shows that the spatial pattern of poverty in the MR is not spatially homogeneous; rather, the intensity of poverty varies across space. Second, it provides an example of how disaggregated poverty indicators can reveal information beyond aggregate indicators: it shows that poverty intensity is lower in peripheral communes than in central communes.
How to interpret quantile maps? A quantile classification scheme is an ordinal ranking of the data values,
dividing the distribution into intervals that have an equal number of data values. Quantile classification
ensures maps are easily comparable and can be ‘easy to read’.
We can also plot the data using the equal interval classification. Equal interval divides the data into equal
size classes (e.g., 0-10, 10-20, 20-30, etc) and works best on data that is generally spread across the entire
range. In the following example we use a defined interval classification.
However, regarding the possible spatial association that seems to emerge from the above figures for the poverty variable, it is necessary to note that the results are sensitive to the number of defined intervals (among other things). Therefore, a comprehensive and formal analysis of the potential presence of spatial dependence is needed to ascertain whether there exists a pattern of statistically significant spatial autocorrelation in the spatial distribution of poverty. That is why we now compute the Moran's I test.
# Generate W matrices
queen.w <- poly2nb(mr, row.names = mr$NAME, queen = TRUE)
rook.w <- poly2nb(mr, row.names = mr$NAME, queen = FALSE)
[Figure 1.13: Choropleth map: Poverty in the Metropolitan Region (Equal Interval); classes: <9, 9–12, 12–16, >16.]
Moran's I test statistic for spatial autocorrelation is implemented in spdep (Bivand and Piras, 2015). There are mainly two functions for computing this test: moran.test, where the inference is based on a normality or randomization assumption, and moran.mc, for a permutation-based test.
# Moran's I test
moran.test(mr$POVERTY, listw = nb2listw(queen.w), randomisation = FALSE,
alternative = 'two.sided')
##
## Moran I test under normality
##
## data: mr$POVERTY
## weights: nb2listw(queen.w)
##
## Moran I statistic standard deviate = 4.0453, p-value = 5.225e-05
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.306497992 -0.019607843 0.006498517
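# The analogous call with the rook-based weights (reconstructed to match
# the output below)
moran.test(mr$POVERTY, listw = nb2listw(rook.w), randomisation = FALSE,
           alternative = 'two.sided')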
##
## Moran I test under normality
##
## data: mr$POVERTY
## weights: nb2listw(rook.w)
##
## Moran I statistic standard deviate = 4.3309, p-value = 1.485e-05
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.342282943 -0.019607843 0.006982432
The randomisation option is set to TRUE by default, which implies that in order to get inference based on a normal approximation, it must be explicitly set to FALSE, as in our case. Similarly, the default is a one-sided test, so that in order to obtain the results for the more commonly used two-sided test, the option alternative must be explicitly set to 'two.sided'. Note also that the zero.policy option is set to FALSE by default, which means that islands result in the missing value code NA. Setting this option to TRUE will set the spatial lag for islands to the customary zero value.
The results show that the Moran's I statistics are ≈ 0.30 and 0.34, respectively, and highly significant. This implies that there is evidence of robust positive spatial autocorrelation in the poverty variable (since we reject the null hypothesis of a random spatial distribution).
R If you compute the Moran’s I test for two different variables, but using the same spatial weight
matrix, the expectation and variance of the Moran’s I test statistic will be the same under the normal
approximation. Why?
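# Inference under the randomization assumption (the default; call
# reconstructed to match the output below)
moran.test(mr$POVERTY, listw = nb2listw(queen.w), alternative = 'two.sided')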
##
## Moran I test under randomisation
##
## data: mr$POVERTY
## weights: nb2listw(queen.w)
##
## Moran I statistic standard deviate = 4.0689, p-value = 4.723e-05
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.306497992 -0.019607843 0.006423226
Note how the value of the statistic and its expectation do not change relative to the normal case, only
the variance is different.
We can carry out a Moran's I test based on random permutations with the function moran.mc. Unlike the previous tests, it needs the number of permutations, nsim. Since the rank of the observed statistic is computed relative to the reference distribution of statistics for the permuted data sets, it is good practice to set this number to something ending in 9 (such as 99 or 999). This will lead to rounded pseudo p-values like 0.01 or 0.001.
# Moran's Test
set.seed(1234)
moran.mc(mr$POVERTY, listw = nb2listw(queen.w),
nsim = 99)
##
## Monte-Carlo simulation of Moran I
##
## data: mr$POVERTY
## weights: nb2listw(queen.w)
## number of simulations + 1: 100
##
## statistic = 0.3065, observed rank = 100, p-value = 0.01
## alternative hypothesis: greater
Note that none of the permuted data sets yielded a Moran’s I greater than the observed value of 0.3065,
hence a pseudo p-value of (0 + 1)/(99 + 1) = 0.01.
The Moran scatter plot can also be obtained using the function moran.plot of spdep:
# Moran's plot
moran.plot(mr$POVERTY, listw = nb2listw(queen.w))
[Figure 1.14: Moran scatterplot of the poverty rate (horizontal axis: Poverty Rate; vertical axis: Spatially Lagged Poverty Rate), with communes such as El Bosque, San Ramon, La Pintana, San Miguel, La Granja and Vitacura labeled.]
Figure 1.14 displays the Moran scatterplot of poverty with the queen weight matrix. Positive spatial autocorrelation, detected by the value of Moran's I, is reflected by the fact that most of the communes are located in quadrants I and III. However, there are some exceptions, such as the communes located in quadrants II and IV. For example, San Miguel is a commune with a low poverty rate, but it is surrounded by communes with high poverty.
A major limitation of Moran’s I is that it cannot provide information on the specific locations of spatial
patterns; it only indicates the presence of spatial autocorrelation globally. A single overall indication is given
of whether spatial autocorrelation exists in the dataset, but no indication is given of whether local variations
exist in spatial autocorrelation (e.g., concentrations, outliers) across the spatial extent of the data.
2 Spatial Models
In the previous chapter we learned some preliminary concepts in spatial econometrics, such as spatial dependence and spatial autocorrelation, and we learned how to carry out an exploratory analysis of spatial data.
In this chapter we show the formulation of spatial models. In particular, in Section 2.1 we derive a complete taxonomy of spatial models, including the Spatial Lag Model, Spatial Durbin Model, Spatial Error Model and the Spatial Autocorrelation Model. We give a brief motivation for each of them and some examples. In Section 2.3 we show how to understand 'spillover' effects and how to interpret marginal effects in the spatial model framework.
2.1 Taxonomy of Models
2.1.1 Spatial Lag Model
Consider first the pure spatial lag model:

y_i = \alpha + \rho \sum_{j=1}^n w_{ij} y_j + \varepsilon_i, \quad (2.1)

where w_{ij} is the (i, j)th element of the W matrix; y_i is the dependent variable for spatial unit i, so that \sum_{j=1}^n w_{ij} y_j is the weighted average of the dependent variable for the neighbors of i (or spatial lag); \varepsilon_i is the error term, such that E(\varepsilon_i) = 0; and \rho is the spatial autoregressive parameter, which measures the intensity of the spatial interdependence. \rho > 0 indicates positive spatial dependence, whereas \rho < 0 indicates negative spatial dependence. It should be clear that if \rho = 0, we have a conventional regression model. By including the spatial lag variable we make explicit the existence of spatial spillover effects due to, for example, geographical proximity. This data generating process is known as a Spatial Autoregressive Process, also labeled SAR or Spatial Lag Model (SLM). Since model (2.1) does not include explanatory variables, it is known as the pure SLM.
Figure 2.1 represents the spatial autoregressive model in (2.1) for two regions. The variables (x_1, x_2) and unobserved terms (\varepsilon_1, \varepsilon_2) have a direct effect on y in both regions. Note that the model incorporates spatial spillover effects through the effect of y_1 on y_2 and vice versa. That is, the model reflects the 'simultaneity' inherent in spatial autocorrelation.
The model can also be written in vector form as:

y_i = \alpha + \rho \, w_i^{\top} y + \varepsilon_i, \quad i = 1, ..., n,

where w_i^{\top} is 1 × n and y is n × 1,
[Figure 2.1: The SLM for two regions. Solid arrows (→) denote non-spatial effects of x_i and ε_i on y_i; dashed arrows (⇢) denote the spatial effects between y_1 and y_2.]
where w_i is the ith row of W. A full SLM specification with covariates in matrix form can be written as:

y = \alpha \imath_n + \rho W y + X\beta + \varepsilon, \quad (2.2)

where y is an n × 1 vector of observations on the dependent variable, X is an n × K matrix of observations on the explanatory variables, \beta is the K × 1 vector of parameters, and \imath_n is an n × 1 vector of ones.
R The reduced form of a system of equations is the result of solving the system for the endogenous
variables. This gives the latter as functions of the exogenous variables, if any. For example, the
general expression of a structural form is f (y, X, ε) = 0, whereas the reduced form of this model is
given by y = g(X, ε), with g as function.
Without restrictions on (I_n - \rho W)—and (\alpha \imath_n + X\beta)—the coefficients cannot be identified from data. In other words, in order to obtain the reduced form we need (I_n - \rho W) to be invertible. From standard linear algebra, a matrix A is invertible if \det(A) \neq 0. Thus, we require that \det(I_n - \rho W) \neq 0. The question is: which values of \rho lead to a non-singular (I_n - \rho W)? For symmetric matrices, \rho in the open interval (\omega_{min}^{-1}, \omega_{max}^{-1}) will lead to a symmetric positive definite (I_n - \rho W), where \omega_{min} and \omega_{max} are the minimum and maximum eigenvalues of W, respectively. This gives rise to the following theorem:
Theorem 2.1 — Invertibility. Let W be a weighting matrix such that w_ii = 0 for all i = 1, ..., n, and assume that all of the roots of W are real. Assume also that W is not row normalized. Let \omega_{min} and \omega_{max} be the minimum and maximum eigenvalues of W, and assume that \omega_{max} > 0 and \omega_{min} < 0. Then (I_n - \rho W) is nonsingular for all:

\omega_{min}^{-1} < \rho < \omega_{max}^{-1}.
Recall that for ease of interpretation, it is common practice to normalize W such that the elements of
each row sum to unity. Since W is nonnegative, this ensures that all weights are between 0 and 1, and has
the effect that the weighting operation can be interpreted as an averaging of neighboring values.
According to our Theorem 1.1 (Eigenvalues of a Row-stochastic Matrix), the eigenvalues of the row-stochastic (i.e., row-normalized, row-standardized or Markov) neighborhood matrix W are in the range [−1, +1]. In this case \rho \in (−1, 1); however, it is misleading to consider \rho as a conventional correlation coefficient between the vector y and its spatial lag W y. This interval is only the result of considering the standard row-standardized matrix. Other standardization methods will lead to other potential parameter spaces for \rho.
Theorem 2.2 — Invertibility of a Row-Normalized W Matrix. If W is row-normalized, then (I_n - \rho W)^{-1} exists for all |\rho| < 1.
In spite of its popularity, row-normalized weighting has its drawbacks. As we suggested in the remark in Section 1.2.4, row normalization alters the internal weighting structure of W, so that comparisons between rows become somewhat problematic. In view of this limitation, it is natural to consider a simple scalar normalization, which multiplies W by a single number, say a · W, removing any measure-unit effect while preserving the relations between all rows of W.
In particular, let

a = \min\{r, c\},
r = \max_i \sum_j |w_{ij}| \quad \text{(maximal row sum of the absolute values)}, \quad (2.4)
c = \max_j \sum_i |w_{ij}| \quad \text{(maximal column sum of the absolute values)}.
Then, assuming that the elements of W are nonnegative, (I_n - \rho W) will be nonsingular for all |\rho| < 1/a. Note that this normalization has the advantage of ensuring that the resulting spatial weights w_{ij} are all between 0 and 1, and hence can still be interpreted as relative influence intensities. This interval could be taken as the parameter space.
This is an important result, because a model whose weighting matrix is not row normalized can always be normalized in such a way that the inverse needed to solve the model will exist in an easily established region.
R For more about normalizing W and the parameter space of ρ see Elhorst (2014, section 2.4) and
Kelejian and Prucha (2010, section 2.2)
Considering the reduced form in Equation (2.3), we can find the mean and variance-covariance matrix of the complete system as a function of the exogenous variables. The expectation is given by:

E(y|X, W) = E\left[(I_n - \rho W)^{-1}(\alpha \imath_n + X\beta) + (I_n - \rho W)^{-1}\varepsilon \,\middle|\, X, W\right]
          = (I_n - \rho W)^{-1}(\alpha \imath_n + X\beta). \quad (2.5)
When |\rho| < 1, (I_n - \rho W)^{-1} can be expressed as an infinite series (also called the Leontief expansion), given in the following lemma.

Lemma 2.3 — Leontief Expansion. If |\rho| < 1 and the eigenvalues of W are bounded by 1 in absolute value, then

(I_n - \rho W)^{-1} = I_n + \rho W + \rho^2 W^2 + \rho^3 W^3 + \cdots = \sum_{l=0}^{\infty} \rho^l W^l.
Then, using Lemma 2.3 (Leontief Expansion), the reduced model given in Equation (2.3) can be written as:

y = (I_n + \rho W + \rho^2 W^2 + \cdots)(\alpha \imath_n + X\beta + \varepsilon),

since \alpha is a scalar, |\rho| < 1, and W is row-stochastic. By definition W \imath_n = \imath_n, and therefore W W \imath_n = W \imath_n = \imath_n. Consequently, W^l \imath_n = \imath_n for l \geq 0 (recall that W^0 = I_n). This allows us to write:

y = \frac{1}{1 - \rho} \imath_n \alpha + X\beta + \rho W X\beta + \rho^2 W^2 X\beta + \cdots + \varepsilon + \rho W \varepsilon + \rho^2 W^2 \varepsilon + \cdots
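The Leontief expansion can be checked numerically. The following sketch, using an arbitrary small row-stochastic matrix, compares the exact inverse with a truncated series:

# Numerical sketch of the Leontief expansion
W   <- matrix(c(0,   1,   0,
                0.5, 0,   0.5,
                0,   1,   0), nrow = 3, byrow = TRUE)  # row-stochastic
rho <- 0.4
direct <- solve(diag(3) - rho * W)                  # (I_n - rho W)^{-1}
series <- diag(3) + rho * W + rho^2 * (W %*% W) +
  rho^3 * (W %*% W %*% W)                           # truncated expansion
max(abs(direct - series))                           # small truncation error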
This expression allows us to define two effects: a multiplier effect affecting the explanatory variables and a spatial diffusion effect affecting the error terms. On the one hand, with respect to the explanatory variables, this expression means that, on average, the value of y at one location i is explained not only by the values of the explanatory variables associated with that location, but also by those associated with all other locations (neighbors or not) via the inverse spatial transformation (I_n - \rho W)^{-1}. This spatial multiplier effect decreases with distance, that is, with the powers of W in the series expansion of (I_n - \rho W)^{-1}.
On the other hand, with respect to the error process, this expression means that a random shock in a location i not only affects the value of y in that location, but also has an impact on the values of y in all other locations via the same inverse spatial transformation. To see this, recall that W^2 reflects second-order contiguous neighbors, those that are neighbors to the first-order neighbors (review Section 1.2.5). Since the neighbor of the neighbor (second-order neighbor) of an observation i includes observation i itself, W^2 has positive elements on the diagonal when each observation has at least one neighbor. That is, higher-order spatial lags can lead to a connectivity relation for an observation i such that W\varepsilon will extract observations from the vector \varepsilon that point back to observation i itself. This implies that there exists a simultaneous feedback. This is the diffusion effect, which also declines with distance. We will explore this mechanism more deeply in Section 2.3.
From Equation (2.3), we derive the variance-covariance matrix of y:

Var(y|W, X) = E\left[(I_n - \rho W)^{-1} \varepsilon \varepsilon^{\top} (I_n - \rho W^{\top})^{-1} \,\middle|\, W, X\right]
            = (I_n - \rho W)^{-1} E(\varepsilon \varepsilon^{\top} | W, X)(I_n - \rho W^{\top})^{-1} \quad (2.7)
This variance-covariance matrix is full, which implies that each location is correlated with every other location in the system; however, this correlation decreases with distance. The equation also shows that the covariance between each pair of outcomes is not null and decreases with the order of proximity. Moreover, the elements of the diagonal of Var(y|W, X) are not constant, implying heteroskedasticity. Since we have not assumed anything about the error variance, we can say that E(\varepsilon \varepsilon^{\top}|W, X) is a full matrix, say \Omega. This covers the possibility of heteroskedasticity, spatial autocorrelation, or both. In the absence of either of these complications, the error variance matrix simplifies to the usual \sigma^2 I_n.
Example 2.1 — County Homicide Rates in the US. In the criminology literature there has been great emphasis on the spatial diffusion of crime. The idea is that criminal violence may spread geographically via a diffusion process. For example, the literature suggests that certain social processes, such as illegal drug markets and gang rivalries, may be important for explaining the pattern and mechanisms of the spread of homicides (Cohen and Tita, 1999).
In particular, the empirical literature has focused on homicide rates and their determinants using the following OLS specification:

y_i = x_i^{\top}\beta + \varepsilon_i,
where y_i is the homicide rate in spatial unit i and x_i is a set of covariates that explain homicide rates across spatial units. However, this model does not capture the idea of spatial diffusion and spatial effects of homicide rates. Furthermore, it has generally been found that the homicide rate follows a spatially autocorrelated process. Given this, Baller et al. (2001), after rejecting the null hypothesis of spatial randomness in homicide rates, propose (among other spatial models) the following SLM process for modeling homicide rates using county-level data for the decennial years in the 1960 to 1990 time period:

y = \alpha \imath_n + \rho W y + X\beta + \varepsilon,

where y contains the homicide rates for the US counties, and X includes a deprivation measure, population density, median age, the unemployment rate, percent divorced, and a Southern dummy variable based on census definitions. As explained by Baller et al. (2001), if homicide rates are determined solely by the structural factors included in the X matrix, there should be no spatial patterning of homicide beyond that created by the socio-demographic similarities of geographically proximate counties. If this is the case, once all x_k are included in the model, the spatial relationship between y_i and y_j will become nonsignificant. This implies that \rho = 0.
This is the model most compatible with common notions of diffusion processes, because it implies an influence of neighbors' homicide rates that is not simply an artifact of measured or unmeasured independent variables. Rather, homicide events in one place actually increase the likelihood of homicides in nearby locales.
2.1.2 Spatial Durbin Model
The Spatial Durbin Model (SDM) adds spatially lagged explanatory variables to the SLM:

y = \rho W y + \alpha \imath_n + X\beta + W X\gamma + \varepsilon, \quad (2.8)

with reduced form:

y = (I_n - \rho W)^{-1}(\alpha \imath_n + X\beta + W X\gamma + \varepsilon). \quad (2.9)
The SDM is a spatial autoregressive model of a special form, including not only the spatially lagged dependent variable and the explanatory variables, but also the spatially lagged explanatory variables W X: y depends on own-region factors from matrix X, plus the same factors averaged over the neighboring regions. This idea is shown in Figure 2.2. Note that Region 1 not only exerts an impact on Region 2 (and vice versa) via y, but also via the independent variable x.
[Figure 2.2: The SDM for two regions. Solid arrows (→) denote non-spatial effects; dashed arrows (⇢) denote spatial effects running through both y and x.]
As an example, suppose that y is some measure of air pollution in each region. Then W y states that air pollution in region 1 might affect pollution in region 2, and vice versa. If X contains a measure of population density, the variable W X would indicate that density in region 1 (2) also affects air pollution in region 2 (1). This model also has very good properties in terms of the calculation of marginal effects, which we will explore later.
2.1.3 Spatial Error Model
The Spatial Error Model (SEM) is:

y = \alpha \imath_n + X\beta + u,
u = \lambda W u + \varepsilon, \quad (2.10)

where \lambda is the autoregressive parameter for the error lag W u (the notation distinguishes it from the spatial autoregressive coefficient \rho in a spatial lag model), and \varepsilon is generally i.i.d. noise. Figure 2.3 visualizes the SEM for two regions. Note that the error terms of the two regions are related, and the only spatial effect runs from \varepsilon_1 to \varepsilon_2 and vice versa.
As stated by Anselin and Bera (1998), spatial error dependence may be interpreted as a nuisance (and the parameter \lambda as a nuisance parameter) in the sense that it reflects spatial autocorrelation in measurement errors or in variables that are otherwise not crucial to the model (i.e., the "ignored" variables spill over across the spatial units of observation).
Unlike previous models, interaction effects among the error terms do not require a theoretical model for a spatial or social interaction process; instead, they are consistent with a situation where determinants of the dependent variable omitted from the model are spatially autocorrelated, or with a situation where unobserved shocks follow a spatial pattern.
[Figure 2.3: The SEM for two regions. Solid arrows (→) denote non-spatial effects; dashed arrows (⇢) denote the spatial relationship between the error terms ε_1 and ε_2.]
The spatial diffusion implied by this model can be analyzed by considering the reduced form. If the matrix (I_n - \lambda W) is not singular, then (2.10) can be written in the following reduced form:

y = \alpha \imath_n + X\beta + (I_n - \lambda W)^{-1}\varepsilon. \quad (2.11)

The variance-covariance matrix is then:

Var(y|W, X) = E\left[(I_n - \lambda W)^{-1} \varepsilon \varepsilon^{\top} (I_n - \lambda W^{\top})^{-1} \,\middle|\, W, X\right]
            = (I_n - \lambda W)^{-1} E(\varepsilon \varepsilon^{\top}|W, X)(I_n - \lambda W^{\top})^{-1} \quad (2.12)
R Interaction effects among the unobserved terms may also be interpreted to reflect a mechanism to
correct rent-seeking politicians for unanticipated fiscal policy changes. See for example Allers and
Elhorst (2005).
2.1.4 Spatial Autocorrelation Model
The Spatial Autocorrelation (SAC) model combines a spatial lag of the dependent variable with spatially autocorrelated disturbances. In matrix form:

y = \alpha \imath_n + \rho W y + X\beta + u,
u = \lambda M u + \varepsilon, \quad (2.13)
where the matrices W and M are n × n spatial weighting matrices.1 In this model, spatial interactions in both the dependent variable and the disturbances are considered. As is standard, the spatial weight matrices W and M are taken to be known and nonstochastic. These matrices are part of the model definition, and in many applications M = W. When \rho = 0, the model reduces to the SEM. When \lambda = 0, the model reduces to the SLM (SAR) specification. Setting \rho = 0 and \lambda = 0 reduces the model to a linear regression model with exogenous variables.
1 This model is also known as the SARAR(1, 1) model, or as a Cliff-Ord model because of the impact that Cliff and Ord (1973) had on the subsequent literature. Note that SARAR(1, 1) is a special case of the more general SARAR(p, q) model.
[Figure: Taxonomy of spatial models. The nodes of the diagram are the general specification y = ρW y + Xβ + W Xγ + u with u = λW u + ε; the SAC (y = ρW y + Xβ + u, u = λW u + ε); the Spatial Lag Model (y = ρW y + Xβ + ε); the Spatial Durbin Model (y = ρW y + Xβ + W Xγ + ε); the SLX model (y = Xβ + W Xγ + ε); the linear regression model (y = Xβ + ε); the Spatial Durbin Error Model (y = Xβ + W Xγ + u, u = λW u + ε); and the Spatial Error Model (y = Xβ + u, u = λW u + ε). The arrows correspond to the parameter restrictions γ = 0, λ = 0, ρ = 0 and γ = −ρβ described below.]
• Imposing the restriction γ = 0 leads to the SAC model that includes both a spatial lag for the dependent
variable and spatial lag for the error term, but excludes the influence of the spatially lagged explanatory
variables.
• Imposing the restriction λ = 0 leads to the SDM.
• Imposing the restriction ρ = 0 leads to the Spatial Durbin Error Model (SDEM).
• The so-called common factor parameter restriction (γ = −ρβ) yields the spatial error model (SEM) specification, which assumes that externalities across spatial units are mostly a nuisance spatial dependence problem caused by the regional transmission of random shocks.
• Imposing the restriction γ = 0 leads to the spatial lag model (SLM), whereas the restriction ρ = 0
results in a least-squares spatially lagged X regression model (labeled SLX) that assumes independence
between regions in the dependent variable, but includes characteristics from neighboring regions in the
form of spatially lagged explanatory variables.
2.2 Motivation of Spatial Models
2.2.1 SLM as a Long-run Equilibrium
To illustrate this motivation, consider y_t, which represents some dependent variable vector at time t. Assume that this variable is determined by a spatial autoregressive scheme that depends on space-time lagged values of the dependent variable from neighboring observations. This leads to a time lag of the average neighboring values of the dependent variable observed during the previous period, W y_{t-1}. We can also include current-period own-region characteristics X_t in the model. If the characteristics of regions remain relatively fixed over time, we can write X_t = X. As an illustration, consider a model involving pollution as the dependent variable y_t, which depends on the past-period pollution of neighboring regions, W y_{t-1}. Then the appropriate process is the following:

y_t = \rho W y_{t-1} + X\beta + \varepsilon_t. \quad (2.15)
Note that we can replace y_{t-1} on the right-hand side of (2.15) with:

y_{t-1} = \rho W y_{t-2} + X\beta + \varepsilon_{t-1}, \quad (2.16)

which yields:

y_t = \rho^2 W^2 y_{t-2} + (I_n + \rho W)X\beta + \varepsilon_t + \rho W \varepsilon_{t-1}. \quad (2.17)
Recursive substitution for past values of the vector y_{t-r} on the right-hand side of (2.17) over q periods leads to:

y_t = (I_n + \rho W + \rho^2 W^2 + \cdots + \rho^{q-1} W^{q-1}) X\beta + \rho^q W^q y_{t-q} + u, \quad (2.18)

with u = \sum_{r=0}^{q-1} \rho^r W^r \varepsilon_{t-r},
where we use the fact that E(εt−r ) = 0, r = 0, ..., q − 1, which also implies that E(u) = 0. Finally, taking the
limit of (2.18),
\lim_{q \to \infty} E(y_t) = (I_n - \rho W)^{-1} X\beta. \quad (2.19)
Note that we use the fact that the magnitude of \rho^q W^q y_{t-q} tends to zero for large q, under the assumption that |\rho| < 1 and that W is row-stochastic, so that the matrix W has a principal eigenvalue of 1.
Equation (2.19) states that we can interpret the observed cross-sectional relation as the outcome, or expectation, of a long-run equilibrium or steady state. This provides a dynamic motivation for the data generating process of the cross-sectional SLM that serves as the workhorse of spatial regression modeling. That is, a cross-sectional SLM relation can arise from the time dependence of decisions made by economic agents located at various points in space, when those decisions depend on the decisions of neighbors.
Consider now an omitted-variable motivation for the SEM. Suppose the DGP is:

y = xβ + zθ,
where x and z are uncorrelated vectors of dimension n × 1, and the vector z follows the following spatial
autoregressive process:
z = ρWz + r
z = (In − ρW)−1 r,

where r ∼ N(0, σ²In). Examples of z are culture, social capital, and neighborhood prestige.
If z is not observed, then:
y = xβ + u
u = (In − ρW)−1 ε (2.20)
where ε = θr. Then, we have the DGP for the spatial error model. Now suppose instead that the omitted variable is correlated with the included regressor x, according to:

ε = xγ + v
v ∼ N(0, σ²In) (2.21)

where the scalar parameters γ and σ² govern the strength of the relationship between x and z = (In − ρW)−1 r. Inserting (2.21) into (2.20), we obtain:
y = xβ + (In − ρW)−1 ε
  = xβ + (In − ρW)−1 (xγ + v)
  = xβ + (In − ρW)−1 xγ + (In − ρW)−1 v. (2.22)

Premultiplying both sides by (In − ρW) and rearranging:

(In − ρW)y = (In − ρW)xβ + xγ + v
y = ρWy + x(β + γ) + Wx(−ρβ) + v
This is the Spatial Durbin Model (SDM), which includes a spatial lag of the dependent variable y as well as a spatial lag of the explanatory variable x.
Spatial spillovers arise in many empirical settings. For example:

• Changes in the tax rate of one spatial unit might exert an impact on the tax-rate-setting decisions of nearby regions, a phenomenon that has been labeled tax mimicking and yardstick competition between local governments (see our example below).
• Situations where home improvements made by one homeowner exert a beneficial impact on selling
prices of neighboring homes.
• Innovation by university researchers diffuses to nearby firms.
• Air or water pollution generated in one region spills over to nearby regions.
The models reviewed in the previous section can be used to formally define the concept of a spatial spillover and, more importantly, to provide estimates of the quantitative magnitude of spillovers and to test for their statistical significance. There is, however, a distinction between global and local spillovers, which is discussed in Anselin (2003) and LeSage and Pace (2014).
We start our discussion about spillovers by formally defining global spillovers.
Definition 2.3.1 — Global Spillovers. Global spillovers arise when changes in a characteristic of one region
impact all regions’ outcomes. This applies even to the region itself since impacts can pass to the neighbors
and back to the own region (feedback). Specifically, global spillovers impact the neighbors, neighbors to
the neighbors, neighbors to the neighbors to the neighbors and so on.
The endogenous interactions produced by global spillovers lead to a scenario where changes in one region
set in motion a sequence of adjustments in (potentially) all regions in the sample such that a new long-run
steady state equilibrium arises (LeSage, 2014).
As explained by LeSage (2014), global spillovers might arise when considering interactions among local policies.
For example: “it seems plausible that changes in levels of public assistance (cigarette taxes) in state A would
lead to a reaction by neighboring states B to change their levels of assistances (taxes), which in turn produces a
game-theoretic (feedback) response of state A, and also responses of states C who are neighbors to neighboring
states B, and so on.”
The following definition corresponds to local spillovers.
Definition 2.3.2 — Local Spillovers. Local spillovers represent a situation where the impacts fall only on nearby or immediate neighbors, dying out before they reach regions that are neighbors to the neighbors.
As can be noted from the previous definitions, the main difference is that feedback or endogenous interaction is only possible for global spillovers.
R According to Anselin (2003) and LeSage and Pace (2014), different spatial models give rise to different
measures of spillovers.
To make these measures operational, consider the SDM:

(In − ρW)y = Xβ + WXθ + ε
y = (In − ρW)−1 Xβ + (In − ρW)−1 WXθ + (In − ρW)−1 ε.

Defining A(W) = In − ρW, this becomes:

y = A(W)−1 Xβ + A(W)−1 WXθ + A(W)−1 ε
  = A(W)−1 (Xβ + WXθ) + A(W)−1 ε
  = Σ_{r=1}^{K} A(W)−1 (In βr + W θr) xr + A(W)−1 ε
  = Σ_{r=1}^{K} Sr(W) xr + A(W)−1 ε,

where Sr(W) = A(W)−1 (In βr + W θr) is an (n × n) matrix and xr is the (n × 1) rth column of X. The effect on the expected value of yi of a change in the rth variable in region j is therefore:

∂E(yi)/∂xjr = Sr(W)ij (2.25)
where Sr(W)ij represents the (i, j)th element of the matrix Sr(W). This result implies that, unlike in the OLS model, a change in some variable in a given region will potentially affect the expected value of the dependent variable in all other regions. Given this characteristic, this type of effect is known as an indirect effect.
The impact on the expected value of yi of a change in the same variable in region i itself is given by:

∂E(yi)/∂xir = Sr(W)ii. (2.26)
This impact includes the effect of feedback loops where observation i affects observation j and observation j also affects observation i: a change in xir will affect the expected value of the dependent variable in i, which then passes through the neighbors of i and back to region i itself. To shed more light on this, let us write all the marginal effects in matrix notation as follows:
∂E(y)/∂x>r =

[ ∂E(y1)/∂x1r  ∂E(y1)/∂x2r  ...  ∂E(y1)/∂xnr ]
[ ∂E(y2)/∂x1r  ∂E(y2)/∂x2r  ...  ∂E(y2)/∂xnr ]
[     ...           ...      ...      ...     ]
[ ∂E(yn)/∂x1r  ∂E(yn)/∂x2r  ...  ∂E(yn)/∂xnr ]

= A(W)−1 (In βr + W θr) = Sr(W) (2.27)

= (In − ρW)−1 [ βr      w12 θr  ...  w1n θr ]
              [ w21 θr  βr      ...  w2n θr ]
              [ ...     ...     ...  ...    ]
              [ wn1 θr  wn2 θr  ...  βr     ]

which is an (n × n) matrix.
This expression is somewhat difficult to understand. To provide a better understanding we follow Elhorst
(2010) and consider a model with 3 regions arranged linearly2 with the following matrices:
W = [ 0    1   0   ]
    [ w21  0   w23 ] (2.28)
    [ 0    1   0   ]

A(W)−1 = 1/(1 − ρ²) × [ 1 − w23 ρ²   ρ   ρ² w23      ]
                      [ ρ w21        1   ρ w23       ] (2.29)
                      [ ρ² w21       ρ   1 − w21 ρ²  ]
where w12 = w32 = 1, since units 1 and 3 have only one neighbor, and w21 + w23 = 1, so we explicitly consider a row-standardized matrix. Substituting Equations (2.28) and (2.29) into Equation (2.27), we get:

Sk(W) = 1/(1 − ρ²) × [ (1 − w23 ρ²)βk + ρ w21 θk   ρβk + θk   ρ² w23 βk + ρ w23 θk      ]
                     [ ρ w21 βk + w21 θk           βk + ρθk   ρ w23 βk + w23 θk         ]
                     [ ρ² w21 βk + ρ w21 θk        ρβk + θk   (1 − w21 ρ²)βk + ρ w23 θk ]
Every diagonal element of this matrix represents a direct effect. Consequently, indirect effects do not occur if both ρ = 0 and θk = 0, since all non-diagonal elements will then be zero. Another important insight is that direct and indirect effects differ across the spatial units in the sample. Direct effects are different because the diagonal elements of the matrix (In − ρW)−1 are different for different units, provided that ρ ≠ 0. Indirect effects are different because both the non-diagonal elements of the matrix (In − ρW)−1 and of the matrix W are different for different units, provided that ρ ≠ 0 and/or θk ≠ 0. Finally, note that the indirect effects that occur if θk ≠ 0 are local effects, whereas the indirect effects that occur if ρ ≠ 0 are global effects.
2 Unit 1 is neighbor of unit 2, unit 2 is a neighbor of both units 1 and 3, and unit 3 is a neighbor of unit 2.
Summary Measures
In general, a change of each variable in each region implies n² potential marginal effects. If we have K variables in our model, this implies K × n² potential measures. Even for small values of n and K, it may be rather difficult to report these results compactly. To overcome this problem, LeSage and Pace (2010, pp. 36-37) propose the following scalar summary measures:
Definition 2.3.3 — Average Direct Impact. Let Sr(W) = A(W)−1 (In βr + W θr) for variable r. The impact of changes in the ith observation of xr, which is denoted xir, on yi can be summarized by averaging the diagonal elements Sr(W)ii, which equals:

ADI = (1/n) tr(Sr(W)). (2.30)
Averaging over the direct impacts associated with all observations i is similar in spirit to typical regression coefficient interpretations, which represent the average response of the dependent variable to the independent variables over the sample of observations.
Definition 2.3.4 — Average Total Impact to an Observation. Let Sr(W) = A(W)−1 (In βr + W θr) for variable r. The sum across the ith row of Sr(W) represents the total impact on individual observation yi resulting from changing the rth explanatory variable by the same amount across all n observations. There are n of these sums, given by the column vector cr = Sr(W)ın, so an average of these total impacts is:

ATIT = (1/n) ı>n cr. (2.31)
Definition 2.3.5 — Average Total Impact from an Observation. Let Sr(W) = A(W)−1 (In βr + W θr) for variable r. The sum down the jth column of Sr(W) yields the total impact over all yi from changing the rth explanatory variable by an amount in the jth observation. There are n of these sums, given by the row vector rr = ı>n Sr(W), so an average of these total impacts is:

ATIF = (1/n) rr ın. (2.32)
Definition 2.3.5 relates how changes in a single observation j influence all observations. In contrast, Definition 2.3.4 considers how changes in all observations influence a single observation i. In both cases, averaging over all n observations leads to the same numerical result. The implication of this interesting result is that the average total impact is the average of all derivatives of yi with respect to xjr for any i, j. Therefore:

ATI = (1/n) ı>n Sr(W) ın.

For the three-region example above, averaging the diagonal elements of Sk(W) gives a direct effect of

[(3 − ρ²)/(3(1 − ρ²))] βk + [2ρ/(3(1 − ρ²))] θk,

and an indirect effect of

[(3ρ + ρ²)/(3(1 − ρ²))] βk + [(3 + ρ)/(3(1 − ρ²))] θk.
Unfortunately, since every application will have its own unique number of observations n and spatial
weight matrix (W ), these formulae cannot be generalized.
Example 2.2 — The Effect of Number of workers on Commuting Times. Kirby and LeSage (2009) use an
SDM specification to consider changes in the (logged) number of workers in the US census tracts with
commuting times exceeding 45 minutes one way, between 1990 and 2000 (See also the example in Section
2.3.4). The motivation of this investigation is the fact that the percentage of the US workers with these
long commute times in 1990 was 12.5% compared to 15.4% in 2000, an increase of more than 10%. When
deciding which model to estimate, they note that spillover impacts from an increase in commuters traveling
long distances to work would seem global in nature, since the congestion effects of more travelers on one
segment of a metropolitan area roadway network impact travel times of other travelers on the entire network.
Furthermore, they state that feedback effects seem likely, since congestion arising from commuting decisions by workers in one tract will spill over to neighboring tracts, which in turn creates congestion feedback to the own tract. These two observations led the authors to specify the following SDM:

y = ρWy + αın + Xβ + WXθ + ε,

where y is the (logged) number of workers with long commute times, X includes variables related to the location decisions of households (the age, gender and income distribution of the resident population, and geographical characteristics of the tract), and WX includes these same characteristics of neighboring census tracts. Based on a comparison of direct, indirect and total effects estimates from the 1990 and 2000 models, they conclude that the suite of variables reflecting the age and gender distribution of population in the tracts represents the primary explanation for changes in the number of workers with long commute times between 1990 and 2000. The spillover impacts of the number of employed females in the 1990 model were positive, suggesting that more employed females in a tract produced an increase in long commute times for neighboring tract commuters. In contrast, for the 2000 model, spillovers associated with employed females were negative, so that more employed females in a tract reduced long commute times for workers located in neighboring tracts.
Example 2.3 — Effect of Pollution on Housing Prices. Kim et al. (2003) use a spatial-lag hedonic model in order to assess the direct and indirect effects of air quality on housing prices. The main model is the following:
p = ρW p + X1 β1 + X2 β2 + X3 β3 + ε,
where p is the vector of housing prices, ρ is a spatial autocorrelation parameter, W is the n × n spatial weight
matrix, X1 is a matrix with observations on structural characteristics, X2 is a matrix with observations on
neighborhood characteristics, and X3 is a matrix with observations on environmental quality (SO2 and NOx ).
The marginal implicit price (marginal benefit) of the hedonic equation is derived as:

[∂E(p)/∂x1r  ∂E(p)/∂x2r  ...  ∂E(p)/∂xnr] = A(W)−1 In βr, where A(W)−1 = (In − ρW)−1.
Focusing on the first row, the interpretation is the following: the housing price at location 1 is not only affected by a marginal change in air quality at location 1, but is also affected by marginal changes in air quality at other locations. That is, the total impact of a change in air quality on the housing price at location 1 is the sum of the direct impact ∂p1/∂x1k plus the induced impacts Σ_{i=2}^{n} ∂p1/∂xik (see our Definition 2.3.4).
An important point evidenced by Kim et al. (2003) is that, if the row-sums of W are less than or equal to one and ρ is in the proper parameter space, i.e., ρ < 1, then the total average effect can be computed as βr/(1 − ρ). To see this, note that for a row-stochastic W we have Wın = ın, so that (In − ρW)−1 ın = (1 + ρ + ρ² + ...)ın = ın/(1 − ρ), and therefore:

(1/n) ı>n [A(W)−1 In βr] ın = βr/(1 − ρ). (2.36)
The model is estimated in a semi-log functional form, therefore the estimated coefficients can be inter-
preted as semi-elasticities. In particular, note that the elasticity for SO2 is given by:
εSO2 = (SO2/p) · (dp/dSO2)
     = (SO2/p) · [βr/(1 − ρ)] · p   (since the model is log-lin) (2.37)
     = [βr/(1 − ρ)] · SO2
Using the estimated ρ̂ = 0.549 and replacing SO2 by its mean value, they obtain an elasticity of housing prices with respect to air quality of about 0.348. The marginal benefit per household of a permanent 4% improvement in air quality, computed using βSO2(In − ρW)−1p, is about $2,333 (1.43% of mean house value) for owners.
Example 2.4 — Human Capital and Labor Productivity. Fischer et al. (2009) analyze the role of human capital
in explaining labor productivity variation among European regions. In particular, they estimate the following
model:
y = ρW y + Xβ + W Xγ + ε
where y is the vector of observations on the (log of) labor productivity level at the end of the sample period
(2004) and X contains (the log of) labor productivity and human capital at the beginning of the sample
period (1995). The parameter ρ is expected to be positive indicating that regional productivity levels are
positively related to a linear combination of neighboring regions’ productivity. The parameter vector γ
captures two types of spatial externalities: spatial effects working through the level of labor productivity, and spatial effects working through the level of human capital, both at the beginning of the sample period.
The estimate of the spatial autoregressive parameter is ρ̂ = 0.664, providing evidence for the existence of significant spatial effects working through the dependent variable.
The mean direct impact for the human capital is 0.1317, whereas the indirect impact is -0.1968. They
interpret the indirect impact in two ways. First, they argue that the indirect impact reflects how a change
in the human capital level of all regions by some constant would impact the labor productivity of a typical
region (observation). The sign of the estimated mean indirect impact implies that an increase in the initial
level of human capital of all other regions would decrease the productivity level of a typical region. This
indirect impact takes into account the fact that the change in initial human capital level negatively impacts
other regions’ labor productivity, which in turn negatively influences our typical region’s labor productivity
due to the presence of positive spatial dependence on neighboring regions’ labor productivity levels.
Second, Fischer et al. (2009) measure the cumulative impact of a change in region i's initial level of human capital, averaged over all other regions. The impact from changing a single region's initial level of human
capital on each of the other region’s labor productivity is small, but cumulatively the impact measures
-0.1968.
R A very good paper for those interested in making the connection between global/local spillovers and different spatial model specifications is LeSage (2014). This is a must-read paper.
R Cross-sectional observations can be viewed as reflecting a (comparative static) slice, at one point in time, of a long-run steady-state equilibrium relationship, and the partial derivatives can be viewed as a comparative static analysis of changes that represent the new steady-state relationship that would arise (LeSage, 2014).
Intuition tells us that impacts arising from a change in the explanatory variables will influence low-order neighbors more than higher-order neighbors. Therefore, we would expect a decline in the impacts' magnitude as we move from lower- to higher-order neighbors. To get a better idea of this process, it is necessary to consider the matrix Sr(W) and recognize, by Lemma 2.3, that this matrix can be expressed as a linear combination of powers of the weight matrix W. In particular, recall that if W is a row-standardized matrix such that ρ ∈ (−1, 1), then by Lemma 2.3:
[∂E(y)/∂x1r  ∂E(y)/∂x2r  ...  ∂E(y)/∂xnr] ≈ (In + ρW + ρ²W² + ρ³W³ + ... + ρ^l W^l) In βr (2.38)
This expression allows us to observe the impact associated with each power of W, where these powers correspond to the observations themselves (zero-order), immediate neighbors (first-order), neighbors of neighbors (second-order), and so on. Using this expansion we can decompose both the marginal and the cumulative direct, indirect and total effects associated with different orders of neighbors.
We observe the following set of sample data for these regions, which relates travel times to the CBD (in minutes), contained in the dependent variable vector y, to population density (population per square block) and distance (in miles), contained in the two columns of the matrix X.³
³ This example is further explored in Kirby and LeSage (2009) with a real application.
According to LeSage and Pace (2010), the pattern of longer travel times for more distant regions R1
and R7 versus nearer R3 and R5 found in vector y seems to clearly violate independence, since travel times
appear similar for neighboring regions (see also Example 2.2). However, one can argue that the observed pattern is not due to spatial dependence, but rather is explained by the variables Distance and Density associated with each region, since these also appear similar for neighboring regions. Note that even for individuals residing in the CBD, it takes time to travel somewhere else within the CBD; therefore, the travel time for intra-CBD travel is 26 minutes despite a distance of 0 miles.
If we assume that the observed data were collected on a given day and averaged over a 24-hour period, it can be hypothesized that congestion effects arising from the shared highway explain the observed pattern of travel times. It is reasonable to claim that longer travel times in one region should lead to longer
travel times in neighboring regions on any given day. This is because commuters pass from one region to
another as they travel along the highway to the CBD.
Congestion effects represent one type of spatial spillover; they do not occur simultaneously, but require some time for the traffic delay to arise. From a modeling point of view, this effect cannot be captured by an OLS model with distance and density as independent variables. These are dynamic feedback effects from travel
time on a particular day that impact travel times of neighboring regions in the short time interval required
for the traffic delay to occur. Since the explanatory variable distance would not change from day to day, and
population density would change very slowly on a daily time scale, these variables would not be capable of
explaining daily delay phenomena.
A better way of explaining congestion is by the following DGP:
y = ρ0 W y + Xβ0 + ε,
such that:
ŷ = (In − ρ̂W)−1 Xβ̂,

where the estimated parameters are β̂ = (0.135, 0.561)> and ρ̂ = 0.642 (assume that somehow we have estimated these parameters). Note that the estimated spatial autoregressive parameter indicates positive spatial dependence in the commuting times.
Computing Effects in R
Now think about the following question: What would be the estimated spillovers if region R2 doubles its
population density? To answer this question we first obtain the predicted values of travel times before the
change.4 That is, we first obtain:
ŷ(1) = (In − ρ̂W)−1 Xβ̂.
4 Note that there is a typo in LeSage and Pace (2010), because in their equation (1.19) they double distance, not density.
# Estimated coefficients
b <- c(0.135, 0.561)
rho <- 0.642
# W and X
X <- cbind(c(10, 20, 30, 50, 30, 20, 10),
c(30, 20, 10, 0, 10, 20, 30))
W <- cbind(c(0, 1, 0, 0, 0, 0, 0),
c(1, 0, 1, 0, 0, 0, 0),
c(0, 1, 0, 1, 0, 0, 0),
c(0, 0, 1, 0, 1, 0, 0),
c(0, 0, 0, 1, 0, 1, 0),
c(0, 0, 0, 0, 1, 0, 1),
c(0, 0, 0, 0, 0, 1, 0))
Ws <- W / rowSums(W)
# Prediction
yhat_1 <- solve(diag(nrow(W)) - rho * Ws) %*% crossprod(t(X), b)
Now we estimate the predicted values of travel times after the change in population density in R2 using:
ŷ(2) = (In − ρ̂W)−1 X̃β̂, (2.39)

where X̃ is the new matrix reflecting a doubling of the population density of region R2.⁵ A comparison of the predictions ŷ(1) and ŷ(2) illustrates how the model generates spatial spillovers.
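The construction of X̃ (called X_d below) and the second prediction are omitted above; a minimal sketch, assuming that population density is stored in the first column of X (as in the vector b above, whose first element is the density coefficient):

# Create the modified matrix: density of R2 doubled
X_d <- X
X_d[2, 1] <- 2 * X[2, 1]
# Prediction after the change
yhat_2 <- solve(diag(nrow(W)) - rho * Ws) %*% crossprod(t(X_d), b)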
# Results
result <- cbind(yhat_1, yhat_2, yhat_2 - yhat_1)
colnames(result) <- c("y1", "y2", "y2 - y1")
round(result, 2)
## y1 y2 y2 - y1
## [1,] 41.90 44.46 2.56
## [2,] 36.95 40.93 3.99
## [3,] 29.84 31.28 1.45
## [4,] 25.90 26.43 0.53
## [5,] 29.84 30.03 0.19
## [6,] 36.95 37.03 0.08
## [7,] 41.90 41.95 0.05
sum(yhat_2 - yhat_1)
## [1] 8.846915
The two sets of predictions show that the change in region R2's population density has a direct effect that increases the commuting times for residents of region R2 by ≈ 4 minutes. It also has an indirect or spillover effect that produces an increase in commuting times for the other six regions. Furthermore, it can be noticed that the increases in commuting times for neighboring regions R1 and R3 are the greatest, and these spillovers
5 For more about prediction in the spatial context see Kelejian and Prucha (2007).
decline as we move to regions in the sample that are located farther away from region R2 where the change
in population density occurred.
What are the cumulative indirect impacts? Adding up the increased commuting times across all other regions (excluding the own-region change in commuting time), we find that they equal ≈ 4.86 (= 2.56 + 1.45 + 0.53 + 0.19 + 0.08 + 0.05) minutes, which is larger than the direct (own-region) impact of 4 minutes. Finally, the total impact on all residents of the seven regions from the change in population density of region R2 is the sum of the direct and indirect effects, an increase of 8.85 minutes in travel times to the CBD.
Now assume that the OLS estimates for the example above are β̂OLS = (0.55, 1.25)>. Using these estimates we compute the OLS predictions based on the matrices X and X̃ as shown above.
# Ols prediction
b_ols <- c(0.55, 1.25)
yhat_1 <- crossprod(t(X), b_ols)
yhat_2 <- crossprod(t(X_d), b_ols)
result <- cbind(yhat_1, yhat_2, yhat_2 - yhat_1)
colnames(result) <- c("y1", "y2", "y2 - y1")
round(result, 2)
## y1 y2 y2 - y1
## [1,] 43.0 43.0 0
## [2,] 36.0 47.0 11
## [3,] 29.0 29.0 0
## [4,] 27.5 27.5 0
## [5,] 29.0 29.0 0
## [6,] 36.0 36.0 0
## [7,] 43.0 43.0 0
The results show no spatial spillovers: only the travel time of R2 is affected by the change in population density of region R2. It can also be observed that the OLS prediction for R2 is upward biased. This is the main message here: an OLS model does not allow for spatial spillover impacts and generates biased marginal effects.
Now we further explore our formulas and definitions from the previous section. As we showed in Equation (2.26), the impact of changes in the ith observation of xr on yi is Sr(W)ii. Given the SLM structure of our example, this is equivalent to:

∂E(CTi)/∂densityi = Sdensity(W)ii, where Sdensity = (In − ρW)−1 In βdensity.

We can compute our Sdensity in the following way.
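The computation itself is not shown above; a minimal sketch, reusing rho and Ws from the previous code (b_dens = 0.135 is the density coefficient, the first element of b):

# S matrix for density: (I - rho*W)^(-1) * I * beta_density
b_dens <- 0.135
S <- solve(diag(nrow(Ws)) - rho * Ws) * b_dens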
Then, the direct impact of doubling population density of R2 on the expected value of commuting time
for R2 is given by
# Direct impact of R2 on R2
round(S[2,2] * 20, 2)
## [1] 3.99
Note that this value is the same as that found using the predicted-values procedure: doubling the population density in R2 increases the commuting times for residents of region R2 by ≈ 4 minutes.
Finding the indirect impact on region R1 is similar, given Equation (2.25). The indirect impact on region R1 is given by:
# Indirect impact of R2 on R1
round(S[1,2] * 20, 2)
## [1] 2.56
Again, note that this is the same value computed before: an increase of 100% in the population density of R2 implies an increase in the travel time of region R1 to the CBD of about 2.56 minutes, after considering all feedback effects.
An interesting question is the following: what would be the impact on the commuting time of R1 if population density increased by 20 in all the regions? To answer this question, recall that our Definition 2.3.4 states that the sum across the ith row of Sr(W) represents the total impact on individual observation yi resulting from changing the rth explanatory variable by the same amount across all n observations.
# Total impact to R1 (row sum of S)
round(sum(S[1, ]) * 20, 2)
## [1] 7.54
This number implies that the total impact to R1 will be an increase of commuting time of ≈ 7.5 minutes.
Using the formula for ATIT gives the same result:
# ATIT
n <- nrow(W)
vones <- rep(1, n)
round(((t(vones) %*% S %*% vones) / n ) * 20, 2)
## [,1]
## [1,] 7.54
Similarly, we could ask: what would be the impact of increasing density by 20 in R1 on all the regions? This is equivalent to our Definition 2.3.5, which states that the sum down the jth column of Sr(W) yields the total impact over all yi from changing the rth explanatory variable by an amount in the jth observation.
# ATIF
round(sum(S[, 1]) * 20, 2)
## [1] 5.54
In words, increasing density by 20 in R1 would imply a total effect across all the regions of about 5.54 minutes.
Imagine that you are a policy maker and you are considering implementing a policy to reduce population density and hence reduce commuting times in the regions. However, given that resources are scarce, you must select one region in which to implement this policy. To produce the greatest overall effect of the policy (considering feedback effects), you could use the estimated spatial model and look for the region with the greatest overall impact. Basically, this involves calculating the column sum of Sr(W) for each region in the following way:
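A minimal sketch of this computation, reusing the S matrix from above (the region names are added only for readability):

# Total impact from each region: column sums of S
cs <- colSums(S)
names(cs) <- paste0("R", 1:7)
round(cs, 2)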
## R1 R2 R3 R4 R5 R6 R7
## 0.28 0.44 0.40 0.39 0.40 0.44 0.28
Note that decreasing population density by 1 will produce a greater reduction in commuting time if applied in regions R2 and R6 (why?).
Finally, the average direct, indirect and total effects of an increase of 1 in population density in all the regions can be computed as follows.
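One way to obtain these three summary measures, reusing S, n, and vones from the code above (the original code is omitted):

# Average direct impact (ADI)
sum(diag(S)) / n
# Average total impact (ATI)
(t(vones) %*% S %*% vones) / n
# Average indirect impact (AII = ATI - ADI)
(t(vones) %*% S %*% vones) / n - sum(diag(S)) / n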
## [1] 0.1837
## [,1]
## [1,] 0.3771
## [,1]
## [1,] 0.1934
In Equation (2.36) of Example 2.3 we showed that the total effect can also be computed as βr/(1 − ρ). We now show that this proposition is true for our example:
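The one-line check (omitted in the original) is:

# Total effect computed as beta_r / (1 - rho)
b_dens / (1 - rho)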
## [1] 0.377095
Cumulative Effects
The main idea of this exercise is to show how a change in some explanatory variable produces changes in the dependent variable in all the spatial units, by decomposing the effects into cumulative and marginal impacts for different orders of neighbors, as explained in Section 2.3.3.
First, we load the package expm, which allows us to compute powers of matrices in a loop. Then we create the estimated coefficients along with the W matrix:
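A minimal sketch of this setup, reusing Ws from the example above (the out matrix is a container for the results printed below):

# Packages and estimated coefficients
library("expm")   # provides the %^% matrix-power operator
b_dens <- 0.135   # density coefficient
rho <- 0.642      # estimated spatial autoregressive parameter
n <- nrow(Ws)
# Container for the partitioned impacts at orders 0 to 10
out <- matrix(NA, 11, 3,
              dimnames = list(paste0("W^", 0:10),
                              c("direct", "indirect", "total")))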
In order to create the decomposition for the ADI, AII and ATI, we create the following loop from q = 0
to q = 10:
for (q in 0:10) {
  if (q == 0) { # If q = 0, then Sr = I * beta
    S <- diag(n) * b_dens
  } else {
    S <- (rho ^ q * (Ws %^% q)) * b_dens
  }
  adi <- sum(diag(S)) / n   # partitioned direct impact at order q
  ati <- sum(S) / n         # partitioned total impact at order q
  out[q + 1, ] <- c(adi, ati - adi, ati)
}
# Print results
round(out, 4)
round(colSums(out), 4)
This table shows both the cumulative and partitioned direct, indirect and total impacts associated with orders 0 to 10 for the SLM. The cumulative direct impact from the previous section is equal to 0.1837, which, given the coefficient 0.1350, indicates that there is a feedback equal to (0.1837 − 0.1350) = 0.0487 arising from each region impacting neighbors that in turn impact neighbors to neighbors, and so on.
The column sums of the matrix out show that by the time we reach 10th-order neighbors we have accounted for 0.1834 of the 0.1837 cumulative direct effect. It is important to note that for W⁰ there is no indirect effect, only direct effects, and for W¹ there is no direct effect, only indirect. To see this, note that when q = 0 we obtain W⁰ = In:
Ws %^% 0
Thus, we have Sr(W) = In βr = 0.1350 In. When q = 1 we have only indirect effects, since all diagonal elements of the matrix W are zero. This also occurs for q = 3, 5, 7, 9:
Ws %^% 1
Ws %^% 3
Also, the row-stochastic nature of W leads to an average row sum of βr × ρ = 0.135 × 0.642 = 0.0867 when q = 1.
The matrix out also shows that both direct and indirect effects die out as the order of neighbors increases; however, the indirect (spatial spillover) effects decay more slowly as we move to higher-order neighbors.
Part II

Estimation Methods

3 Maximum Likelihood Estimation
In this chapter we begin the study of estimation methods for spatial models. In particular, we focus on the maximum likelihood estimation method. However, it is important to know some basics of the different estimation methods first.
Spatial econometric models can be estimated by maximum likelihood (ML) (Ord, 1975), quasi-maximum
likelihood (QML) (Lee, 2004), instrumental variables (IV) (Anselin, 1988, pp. 82-86), generalized method
of moments (GMM) (Kelejian and Prucha, 1998, 1999), or by Bayesian Markov Chain Monte Carlo method
(Bayesian MCMC) (LeSage, 1997).
As we will see in this chapter, the main drawback of ML estimation is the assumption of normality of the error terms. QML and IV/GMM have the advantage that they do not rely on the assumption of normality of the disturbances. However, both estimators assume that the disturbance terms are independently and identically distributed for all i with zero mean and variance σ². The IV/GMM estimator has the disadvantage that the estimate for ρ or λ may fall outside the parameter space; in ML estimation these coefficients are restricted to the interval (1/ωmin, 1/ωmax) by the Jacobian term, where ωmin and ωmax denote the smallest and largest eigenvalues of W. This issue motivated the development of the IV/GMM approaches, which do not require the Jacobian term. To instrument the spatially lagged dependent variable, Kelejian et al. (2004) suggest [X, WX, ..., W^g X], where g is a pre-selected constant.
To see the consequences of applying OLS to a spatial process, consider first the pure SLM (a model without exogenous covariates):

y = ρ0 Wy + ε, (3.1)
where ρ0 is the true population parameter of the data generating process (DGP). The reduced form for the
pure SLM in (3.1) is:
y = (In − ρ0 W)−1 ε. (3.2)
As a result, the spatial lag term equals:
Wy = W(In − ρ0 W)−1 ε. (3.3)
This result will be useful later. Now, recall that if the model is y = Xβ + ε, then the OLS estimator is β̂ = (X>X)−1 X>y. Then, considering (3.1), the OLS estimator for ρ0 is:
ρ̂OLS = [(Wy)>(Wy)]−1 (Wy)>y. (3.4)
Substituting the expression for y in the population equation (3.1) into (3.4) gives us the following sampling
error equation:
ρ̂OLS = ρ0 + [(Wy)>(Wy)]−1 (Wy)>ε
     = ρ0 + (Σ_{i=1}^{n} y²Li)−1 (Σ_{i=1}^{n} yLi εi),
where yLi is the ith element of the spatial lag operator W y = yL . Assuming that W is nonstochastic, the
mathematical expectation of ρbOLS is
E(ρ̂OLS | W) = ρ0 + E{[(Wy)>(Wy)]−1 (Wy)>ε | W}
            = ρ0 + E[(Σ_{i=1}^{n} y²Li)−1 (Σ_{i=1}^{n} yLi εi) | W]. (3.5)
From (3.5) it is clear that if the expectation of the last term is zero, then ρbOLS will be unbiased. However,
note that
E(Σ_{i=1}^{n} yLi εi | W) = E[(Wy)>ε | W]
 = E[ε>(In − ρW>)−1 W>ε | W]   using (3.3)
 = E[ε>C>ε | W]
 = E[tr(ε>C>ε) | W]
 = E[tr(C>εε>) | W]
 = σ² tr(C)
 ≠ 0, (3.6)
where C = W(In − ρW)−1. Therefore, given the result in (3.6), we have that E(ρ̂OLS | W) = ρ0 if and only if tr(C) = 0, which occurs only if ρ0 = 0: if ρ = 0, then C = W and tr(C) = tr(W) = 0, because the diagonal elements of W are zeros (see Definition 3.1.1 for properties of the trace). In other words, if the true model follows a spatial autoregressive structure, the OLS estimate of ρ will be biased.
Definition 3.1.1 — Some useful results on the trace. The trace of a square matrix A, denoted tr(A), is defined as the sum of the elements on the main diagonal of A:

tr(A) = Σ_{i=1}^{n} aii = a11 + a22 + ... + ann, (3.7)
where aii denotes the entry on the ith row and ith column of A.
Some properties. Let A and B be square matrices and c a scalar. Then:
1. tr(A + B) = tr(A) + tr(B) and tr(cA) = c · tr(A).
2. tr(A) = tr(A>).
3. tr(AB) = tr(BA).
4. Trace of an idempotent matrix: Let A be an idempotent matrix, then tr(A) = rank(A).
What about consistency? Note that we can write:
ρ̂OLS = ρ0 + [(1/n) Σ_{i=1}^{n} y²Li]−1 [(1/n) Σ_{i=1}^{n} yLi εi]. (3.10)
Under ‘some conditions’ we can show that:
(1/n) Σ_{i=1}^{n} y²Li →p q, (3.11)

where q is some finite scalar (we need some assumptions here about ρ and the structure of the spatial weight matrix). However, for the second term we obtain:
(1/n) Σ_{i=1}^{n} yLi εi →p E(yLi εi) = (1/n) tr(C) · σ² ≠ 0. (3.12)
As a result, the presence of the spatial weight matrix produces a quadratic form in the error terms, which in turn introduces a form of endogeneity, because the spatial lag Wy is correlated with the disturbance vector ε. Therefore ρ̂OLS is inconsistent, and we need to account for the simultaneity either within a maximum likelihood estimation framework or by using a proper set of instrumental variables.
R Lee (2002) shows that in some cases the OLS estimator may still be consistent and even asymptotically efficient relative to some other estimators.
To illustrate this bias, consider a small Monte Carlo experiment with the DGP:

y = ρ0 Wy + ε, (3.13)

where the true value is ρ0 = 0.7; the sample size for each sample is n = 225; εi ∼ N(0, 1); and W is an artificial n × n weight matrix, constructed from a neighbor list for rook contiguity on a 15 × 15 regular lattice.
The syntax for creating the global parameters for the simulation in R is the following:
# Global parameters
library("spdep") # Load package
set.seed(123) # Set seed
S <- 100 # Number of simulations
n <- 225 # Spatial units
rho <- 0.7 # True rho
w <- cell2nb(sqrt(n), sqrt(n)) # Create artificial W matrix
iw <- invIrM(w, rho) # Compute inverse of (I - rho*W)
rho_hat <- vector(mode = "numeric", length = S) # Vector to save results.
The function cell2nb creates a list of neighbors for a grid of cells; by default it creates neighbors based on the rook criterion. The invIrM function generates the full weight matrix W, checks that ρ lies in its feasible range between 1/ωmin and 1/ωmax, where ω denotes the eigenvalues of W, and returns the n × n inverted matrix (In − ρW)−1.
The loop for the simulation is the following:
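The loop itself is omitted in the original; a minimal sketch, assuming a row-standardized listw object (an assumption) and an OLS regression without intercept, as in Equation (3.4):

# Simulation loop (W is fixed, so everything W-related is created outside)
lw <- nb2listw(w, style = "W")
for (s in 1:S) {
  eps <- rnorm(n)                     # draw the disturbances
  y <- as.vector(iw %*% eps)          # DGP: y = (I - rho*W)^(-1) * eps
  Wy <- lag.listw(lw, y)              # spatial lag of y
  rho_hat[s] <- coef(lm(y ~ Wy - 1))  # OLS estimate of rho
}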
Note that since W is considered fixed (nonstochastic), it is created outside the simulation loop. The results are the following:
# Summary of rho_hat
summary(rho_hat)
It can be noticed that the estimated ρ ranges roughly from 0.8 to 1.2; that is, the range does not include the true parameter ρ0 = 0.7. Moreover, the mean of the estimated parameters is about 1, which is very far from 0.7! We can conclude that the OLS estimator of the pure SLM is highly biased.
Finally, we can plot the sampling distribution of the estimated parameters in the following way:
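A minimal sketch of the plotting code (omitted in the original), using a kernel density estimate:

# Sampling distribution of the OLS estimates of rho
plot(density(rho_hat), main = "", xlab = "OLS estimates of rho")
abline(v = rho, col = "red", lty = 2)  # true value rho0 = 0.7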
Figure 3.1 presents the sampling distribution of ρ estimated by OLS for each sample in the Monte Carlo simulation study. The observed pattern is the same as previously discussed: the distribution does not contain ρ0 = 0.7.
Notes: This graph shows the sampling distribution of ρ estimated by OLS for each sample in the Monte Carlo
simulation study. The true DGP follows a pure Spatial Lag Model where the true parameter is ρ0 = 0.7
Consider now the SLM with exogenous covariates:

y = ρ0 Wy + Xβ0 + ε,
ε ∼ N(0n, σ0² In), (3.14)

where y is an n × 1 vector collecting the dependent variable for the spatial units; W is an n × n spatial weight matrix; X is an n × K matrix of independent variables; β0 is a K × 1 vector of parameters; ρ0 measures the degree of spatial correlation; and ε is an n × 1 vector of error terms. Note that we are making the explicit assumption that the error terms follow a multivariate normal distribution with mean 0 and variance-covariance matrix σ0² In. That is, we are assuming that all spatial units have the same error variance.
Since we are explicitly assuming the distribution of the error term, we will be able to use the maximum likelihood estimation procedure. Under the maximum likelihood criterion, the parameter estimates θ̂ = (β̂>, ρ̂, σ̂²)> are chosen so as to maximize the probability of generating or obtaining the observed sample.
However, it should be noted that ML estimation is a highly parametric approach, which means that it is based
on strong assumptions. We will see that within these assumptions, it has optimal asymptotic properties (such
as consistency and asymptotic efficiency), but when the assumptions are violated, the optimal properties may
no longer hold.
How can we estimate θ0? Note that we could rearrange the model as:

y − ρ0 Wy = Xβ0 + ε.

Following the derivation of the linear model, an estimate for β0 would be:

β̂(ρ0) = (X>X)−1 X>(In − ρ0 W)y,

which depends on ρ0. Given this, an estimate for the variance parameter would be σ̂²(ρ0) = (1/n)[(In − ρ0 W)y − Xβ̂(ρ0)]>[(In − ρ0 W)y − Xβ̂(ρ0)]. Concentrating out β and σ² in this fashion will reduce maximum likelihood to a univariate optimization problem in the parameter ρ. This will be very useful later in order to derive the ML algorithm.
In order to derive the joint distribution of the data, we need to find the probability density function f(y1, y2, ..., yn | X; θ) = f(y | X; θ), that is, the joint conditional distribution of y given X. Using the Transformation Theorem, we need the following transformation:

f(y | X; θ) = f(ε(y) | X; θ) · |det(∂ε/∂y)|.
Recall that the model can be written as ε = Ay − Xβ with A = In − ρW, where Ay is the spatially filtered dependent variable, i.e., with the effect of spatial autocorrelation taken out. Note that ε = f(y); that is, the unobserved error is a function of the observed y.¹ To move from the distribution of the error term to the distribution of the observable random variable y we need the Jacobian of the transformation:
¹ Since the yi, and not the εi, are the observed quantities, the parameters must be estimated by maximizing L(y), not L(ε). For more details about this, see Mead (1967) and Doreian (1981).
det(∂ε/∂y) = det(J) = det(A) = det(In − ρW),

where J = ∂ε/∂y is the n × n Jacobian matrix, and det(In − ρW) is the determinant of an n × n matrix. In contrast to the time-series case, the spatial Jacobian is not the determinant of a triangular matrix, but of a full matrix, which may complicate its computation considerably. Recall that this Jacobian reduces to a scalar 1 in the standard regression model, since the partial derivative becomes |∂(y − Xβ)/∂y| = |In| = 1.
Using the density function of the multivariate normal distribution, we can find the joint pdf of ε | X. Recognizing that ε ∼ N(0, σ² In), we can write:

f(ε | X) = (2π · σ²)−n/2 exp[−(1/2σ²) ε>ε].
Given an iid sample of n observations, y and X, the joint density of the observed sample is:

f(y | X; θ) = (2π · σ²)−n/2 exp[−(1/2σ²)(Ay − Xβ)>(Ay − Xβ)] · |det(∂(Ay − Xβ)/∂y)|.
Note that the likelihood function is defined as the joint density treated as a function of the parameters:
L(θ|y, X) = f (y|X; θ). Finally, the log-likelihood function, which will be maximized, takes the form:2
log L(θ) = log |A| − (n/2) log(2π) − (n/2) log(σ²) − (1/2σ²)(Ay − Xβ)>(Ay − Xβ)
         = log |A| − (n/2) log(2π) − (n/2) log(σ²) − (1/2σ²)[y>A>Ay − 2(Ay)>Xβ + β>X>Xβ], (3.15)
where this development uses the fact that the transpose of a scalar is the scalar itself, i.e., y>A>Xβ = (y>A>Xβ)> = β>X>Ay. This is similar to the typical linear-normal likelihood, except that the transformation from ε to y contributes not the usual factor of 1, but the extra term log |A|.
Before deriving the score vector, the following derivatives will be useful:

∂(ρW)/∂ρ = W (3.16)

∂A/∂ρ = ∂(In − ρW)/∂ρ = −W (3.17)

∂ log |A|/∂ρ = tr(A−1 ∂A/∂ρ) = tr[A−1(−W)] (3.18)

Let ε = Ay − Xβ; then:

∂ε/∂ρ = ∂(Ay − Xβ)/∂ρ = −Wy (3.19)
² Since the constant −(n/2) log(2π) is not a function of any of the parameters, some software programs do not include it when reporting the maximized log-likelihood. See Bivand and Piras (2015).
∂(ε>ε)/∂ρ = ε>(∂ε/∂ρ) + (∂ε>/∂ρ)ε = 2ε>(∂ε/∂ρ) = −2ε>Wy (3.20)

∂A−1/∂ρ = −A−1(∂A/∂ρ)A−1 = A−1 W A−1 (3.21)

∂ tr(A−1 W)/∂ρ = tr[(∂A−1/∂ρ)W] (3.22)
Taking the derivative of Equation (3.15) with respect to β, we obtain:

∂ log L(θ)/∂β = −(1/2σ²)[−2X>(Ay) + 2X>Xβ] = (1/σ²) X>(Ay − Xβ), (3.23)
and with respect to σ²:

∂ log L(θ)/∂σ² = −n/(2σ²) + (1/2σ⁴)(Ay − Xβ)>(Ay − Xβ). (3.24)

Setting these derivatives to zero and solving yields:

β̂ML(ρ) = (X>X)−1 X>Ay (3.25)

σ̂²ML(ρ) = (1/n)(Ay − Xβ̂ML)>(Ay − Xβ̂ML). (3.26)

Note that, conditional on ρ (assuming we know ρ), these estimates are simply OLS applied to the spatially filtered dependent variable Ay and the explanatory variables X. Moreover, after some manipulation, Equation (3.25) can be re-written as:
β̂ML(ρ) = β̂O − ρβ̂L, (3.27)

where β̂O = (X>X)−1 X>y and β̂L = (X>X)−1 X>Wy are the estimates from the auxiliary regressions of y and Wy on X, with residuals:

eO = y − Xβ̂O and eL = Wy − Xβ̂L. (3.28)

Then, plugging (3.27) into (3.26):

σ̂²ML[β̂ML(ρ), ρ] = (1/n)(eO − ρeL)>(eO − ρeL). (3.29)
Note that both (3.27) and (3.29) rely only on observables, except for ρ, and so are readily calculable given some estimate of ρ. Therefore, plugging (3.27) and (3.29) back into the likelihood (3.15), we obtain the concentrated log-likelihood function:

log Lc(θ) = −n/2 − (n/2) log(2π) − (n/2) log[(1/n)(eO − ρeL)>(eO − ρeL)] + log |In − ρW|, (3.30)
which is a nonlinear function of a single parameter ρ. A ML estimate of ρ, ρ̂ML, is obtained from a numerical optimization of the concentrated log-likelihood function (3.30). Once we obtain ρ̂, we can easily obtain β̂. The procedure can be summarized in the following steps:
Algorithm 3.1 — ML estimation of SLM. The algorithm to perform the ML estimation of the SLM is the
following:
1. Perform the two auxiliary regressions of y and Wy on X to obtain β̂O and β̂L as in Equation (3.27).
2. Compute the residuals eO = y − Xβ̂O and eL = Wy − Xβ̂L as in Equation (3.28).
3. Maximize the concentrated likelihood given in Equation (3.30) by numerical optimization to obtain an estimate ρ̂.
4. Plug ρ̂ back into the expressions for β (Equation 3.25) and σ² (Equation 3.26).
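As a concrete illustration, the following is a minimal sketch of Algorithm 3.1 (ml_slm is a hypothetical helper, not the spdep implementation; it assumes a row-standardized weight matrix Ws with real eigenvalues, and uses Ord's eigenvalue expression for the Jacobian discussed in the next section):

ml_slm <- function(y, X, Ws) {
  n <- length(y)
  Wy <- as.vector(Ws %*% y)
  bO <- solve(crossprod(X), crossprod(X, y))   # step 1: auxiliary regressions
  bL <- solve(crossprod(X), crossprod(X, Wy))
  eO <- as.vector(y - X %*% bO)                # step 2: residuals
  eL <- as.vector(Wy - X %*% bL)
  omega <- Re(eigen(Ws, only.values = TRUE)$values)
  cloglik <- function(rho) {                   # concentrated log-likelihood (3.30)
    e <- eO - rho * eL
    -(n / 2) * log(sum(e^2) / n) + sum(log(1 - rho * omega))
  }
  rho <- optimize(cloglik, interval = c(1 / min(omega) + 1e-6, 1 - 1e-6),
                  maximum = TRUE)$maximum      # step 3: numerical optimization
  beta <- as.vector(bO - rho * bL)             # step 4: plug rho back in
  sigma2 <- sum((eO - rho * eL)^2) / n
  list(rho = rho, beta = beta, sigma2 = sigma2)
}

In practice one would use lagsarlm from spdep (see Section 3.5); the sketch only mirrors the algebra above.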
Since the score function will be important for understanding the asymptotic theory of the MLE, we will also derive ∂ log L(θ)/∂ρ. Taking the derivative of Equation (3.15) with respect to ρ, we obtain:

∂ log L(θ)/∂ρ = ∂ log |A|/∂ρ − (1/2σ²) ∂(ε>ε)/∂ρ
             = −tr(A−1 W) + (1/2σ²) · 2ε>Wy   using (3.18) and (3.20)
             = −tr(A−1 W) + (1/σ²) ε>Wy. (3.31)
Thus the complete gradient (or score function) is:

∇θ log L(θ) = [ ∂ log L(θ)/∂β  ]   [ (1/σ²) X>ε               ]
              [ ∂ log L(θ)/∂σ² ] = [ (1/2σ⁴)(ε>ε − nσ²)       ]
              [ ∂ log L(θ)/∂ρ  ]   [ −tr(A−1 W) + (1/σ²) ε>Wy ]

Alternatively, substituting Wy = C(Xβ + ε), where C = W A−1, the score with respect to ρ can be written as:

∂ log L(θ)/∂ρ = (1/σ²)(CXβ)>ε + (1/σ²)[ε>Cε − σ² tr(C)].
The main computational burden in maximizing (3.30) is the repeated evaluation of the Jacobian term log |In − ρW|. Ord (1975) suggested expressing the determinant in terms of the eigenvalues ω1, ..., ωn of W:

|In − ρW| = Π_{i=1}^{n} (1 − ρωi),

so that:

log |In − ρW| = Σ_{i=1}^{n} log(1 − ρωi). (3.32)

The advantage of this approach is that the eigenvalues only need to be computed once, which carries some overhead, but greatly speeds up the calculation of the log-likelihood at each iteration. In practice, in all but the smallest data sets (< 4000 observations), Ord's approach will be faster than the brute-force approach.
This formulation also gives us the feasible domain of ρ: we need 1 − ρωi ≠ 0 for all i, which holds for 1/ωmin < ρ < 1/ωmax. For a row-standardized matrix, the largest eigenvalue is 1. With this expression for the Jacobian, the concentrated log-likelihood function becomes:

log Lc(θ) = −n/2 − (n/2) log(2π) − (n/2) log[(1/n)(eO − ρeL)>(eO − ρeL)] + Σ_{i=1}^{n} log(1 − ρωi)
          = const − (n/2) log[(1/n)(e>O eO − 2ρ e>L eO + ρ² e>L eL)] + Σ_{i=1}^{n} log(1 − ρωi). (3.33)
Another approach is the characteristic root method outlined in Smirnov and Anselin (2001). This approach allows for the estimation of spatial lag models for very large data sets (> 100,000 observations) in a very short time. However, it is limited by the requirement that the weight matrix be intrinsically symmetric, which precludes the use of asymmetric weights such as k-nearest-neighbor weights. For other approximations see LeSage and Pace (2010, chapter 4).
3.2.4 Hessian
The Hessian matrix will be very important in the following sections to obtain the asymptotic variance-
covariance matrix. For this reason we devote a complete section in order to derive this matrix for the SLM.
In this case, the Hessian is a (K + 2) × (K + 2) matrix of second derivatives given by:

H(β, σ², ρ) = [ ∂²ℓ/∂β∂β>   ∂²ℓ/∂β∂σ²   ∂²ℓ/∂β∂ρ  ]
              [ ∂²ℓ/∂σ²∂β>  ∂²ℓ/∂(σ²)²  ∂²ℓ/∂σ²∂ρ ]
              [ ∂²ℓ/∂ρ∂β>   ∂²ℓ/∂ρ∂σ²   ∂²ℓ/∂ρ²   ]

where ℓ = log L(β, σ², ρ). The blocks involving β are:
∂² log L(θ)/∂β∂β> = −(1/σ²) X>X (3.34)

∂² log L(θ)/∂β∂σ² = −(1/σ⁴) X>ε (3.35)

∂² log L(θ)/∂β∂ρ = −(1/σ²) X>Wy. (3.36)
Using the first derivative (3.24) and working on the cross-derivatives for σ², we obtain:

∂² log L(θ)/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) ε>ε, (3.37)
and:

∂² log L(θ)/∂σ²∂ρ = (1/2σ⁴) · 2ε>(∂ε/∂ρ)   using Equation (3.20)
                  = −(1/σ⁴) ε>Wy. (3.38)
Finally, working on the second derivative with respect to ρ, and using (3.31), we obtain:

∂² log L(θ)/∂ρ² = −(∂/∂ρ) tr(A−1 W) + (1/σ²)(∂/∂ρ) ε>Wy
               = −tr(A−1 W A−1 W) + (1/σ²)(∂/∂ρ)(Ay − Xβ)>Wy
               = −tr(A−1 W A−1 W) − (1/σ²) y>W>Wy
               = −tr[(W A−1)²] − (1/σ²) y>W>Wy. (3.39)
Collecting the blocks (3.34)–(3.39) gives the full Hessian matrix. To obtain the information matrix, we take expectations of the second derivatives. Recall that ε = Ay − Xβ. It follows that, in terms of expected values:

E[ε | W, X] = 0 (3.40)
E[εε> | W, X] = σ² In, (3.41)

and, for y, E[y | W, X] = A−1 Xβ, so that Wy = W A−1 Xβ + W A−1 ε. Then, for the cross-derivative (3.36):
E[∂² log L(θ)/∂β∂ρ | W, X] = −(1/σ²) X> E[Wy | W, X]
 = −(1/σ²) X> E[W A−1 Xβ + W A−1 ε | W, X]
 = −(1/σ²) X> W A−1 Xβ
 = −(1/σ²) X> CXβ. (3.45)
For (3.37) we obtain:

E[∂² log L(θ)/∂(σ²)² | W, X] = n/(2σ⁴) − (1/σ⁶) E[ε>ε | W, X]
 = n/(2σ⁴) − (1/σ⁶) E[tr(ε>ε) | W, X]
 = n/(2σ⁴) − (1/σ⁶) tr(E[εε> | W, X])
 = n/(2σ⁴) − (1/σ⁶) tr(σ² In)
 = n/(2σ⁴) − n/σ⁴
 = −n/(2σ⁴). (3.46)
From (3.38):

E[∂² log L(θ)/∂σ²∂ρ | W, X] = −(1/σ⁴) E[ε>Wy | W, X]
 = −(1/σ⁴) E[ε>(CXβ + Cε) | W, X]
 = −(1/σ⁴) σ² tr(C)
 = −(1/σ²) tr(C). (3.47)

Finally, for (3.39), use Wy = CXβ + Cε to write y>W>Wy = β>X>C>CXβ + 2ε>C>CXβ + ε>C>Cε, where ε>A−1>W>W A−1 Xβ = (β>X>A−1>W>W A−1 ε)> because it is a scalar. Taking expectations:

E[∂² log L(θ)/∂ρ² | W, X] = −tr[(W A−1)²] − (1/σ²) E[β>X>C>CXβ + 2ε>C>CXβ + ε>C>Cε | W, X]
 = −tr[(W A−1)²] − (1/σ²)[β>X>C>CXβ + tr(C>C) σ²]
 = −tr(C²) − tr(C>C) − (1/σ²)(CXβ)>(CXβ)
 = −tr(Cs C) − (1/σ²)(CXβ)>(CXβ), (3.48)

where Cs = C + C> and C = W A−1, so that tr(Cs C) = tr(C²) + tr(C>C).
Collecting these expectations, the information matrix is:

I(θ) = [ (1/σ²) X>X      0             (1/σ²) X>(CXβ)                  ]
       [ 0>               n/(2σ⁴)       (1/σ²) tr(C)                    ]
       [ (1/σ²)(CXβ)>X   (1/σ²) tr(C)   tr(Cs C) + (1/σ²)(CXβ)>(CXβ)   ]

The asymptotic variance matrix follows as the inverse of the information matrix: Var(β, σ², ρ) = I(θ)−1.
An important feature is that, while the covariance between β and the error variance is zero, as in the standard regression model, this is not the case for ρ and the error variance. This lack of block diagonality in the information matrix for the spatial lag model will lead to some interesting results on the structure of specification tests.
To evaluate the trace terms, we can use the eigenvalues once more. Recall that:

∂ log |A|/∂ρ = −Σ_{i=1}^{n} ωi/(1 − ρωi), (3.51)

so that the asymptotic variance matrix can be computed as:

Var(β, σ², ρ) = [ (1/σ²) X>X   0          (1/σ²) X>(CXβ)                        ]−1
                [ 0>            n/(2σ⁴)    (1/σ²) tr(C)                          ]
                [ ·             ·          α + tr(C>C) + (1/σ²)(CXβ)>(CXβ)       ]

where α = Σ_{i=1}^{n} ωi²/(1 − ρωi)² (the eigenvalue expression for tr(C²)), and the matrix is symmetric.
Consider now the spatial error model (SEM):

y = Xβ0 + u
u = λ0 Wu + ε (3.52)
ε ∼ N(0, σ0² In),

where λ0 is the spatial autoregressive coefficient for the error lag Wu (to distinguish the notation from the spatial autoregressive coefficient ρ in the spatial lag model), and W is the spatial weight matrix. This model does not require a theoretical model for a spatial process but, instead, is consistent with a situation where determinants of the dependent variable omitted from the model are spatially autocorrelated, or with a situation where unobserved shocks follow a spatial pattern (Elhorst, 2014). In summary, the SEM treats spatial correlation primarily as a nuisance.
If λ > 0, then we face positive spatial correlation. This implies clustering of similar values: the errors for spatial unit i tend to vary systematically with the errors for other nearby observations j, so that smaller/larger errors for i tend to go together with smaller/larger errors for j. This violates the typical assumption of no autocorrelation in the error term of OLS.
Under the assumption that the spatial weights matrix is row-standardized and the parameter is less than one in absolute value, the model can also be expressed as:

y = Xβ + (In − λW)−1 ε.

Since u = (In − λW)−1 ε, it can be shown that E(u) = 0. Furthermore, the variance-covariance matrix of u is:

E(uu>) = σ² Ωu−1, (3.53)

where Ωu = (In − λW)>(In − λW). The variance-covariance matrix in (3.53) is a full matrix, implying a spatial autoregressive error process with nonzero error covariance between every pair of observations, decreasing in magnitude with the order of contiguity (Anselin and Bera, 1998). Furthermore, the complex structure of the inverse matrix in (3.53) yields nonconstant diagonal elements in the error covariance matrix, thus inducing heteroskedasticity in u, irrespective of the heteroskedasticity of ε. Finally, u ∼ N(0, σ² Ωu−1).
R The OLS estimates of the model in Equation (3.52) are unbiased, but inefficient if λ ≠ 0.
Given the previous remark, we might use generalized least squares (GLS) for more efficient parameter estimation. Recall that the inefficiency of the OLS estimates of the regression coefficients would invalidate statistical inference in the spatial error model: the invalidity of significance tests arises from biased estimation of the variance and standard errors of the OLS estimates for β.
To set up the ML estimation, write the error term as:

ε = (In − λW)y − (In − λW)Xβ = By − BXβ,

where B = (In − λW). The new error term indicates that ε is a function of y. Recall that in order to create the log-likelihood function we need the joint density function; using the Transformation Theorem, the log-likelihood is:

log L(θ) = −(n/2) log(2π) − (n/2) log(σ²) + log |B| − (1/2σ²)(By − BXβ)>(By − BXβ). (3.54)

Maximizing (3.54) with respect to β yields:
β̂ML(λ) = [X>Ω(λ)X]−1 X>Ω(λ)y
       = [(BX)>(BX)]−1 (BX)>By (3.55)
       = [X(λ)>X(λ)]−1 X(λ)>y(λ),
where:

X(λ) = BX = (In − λW)X = (X − λWX)
y(λ) = By = (y − λWy). (3.56)

If λ is known, this estimator equals the GLS estimator—β̂ML = β̂GLS—and it can be thought of as the OLS estimator resulting from a regression of y(λ) on X(λ). In other words, for a known value of the spatial autoregressive coefficient λ, this is equivalent to OLS on the transformed variables.
In the same way, the first-order condition resulting from the partial derivative of (3.54) with respect to σ² gives the ML estimator for the error variance:

σ̂²ML(λ) = (1/n) ε̂>B>Bε̂ = (1/n) ε̂(λ)>ε̂(λ), (3.57)

where ε̂ = y − Xβ̂ML and ε̂(λ) = B(λ)(y − Xβ̂ML) = B(λ)y − B(λ)Xβ̂ML.
The first-order conditions derived from the likelihood are highly nonlinear, and therefore the likelihood in Equation (3.54) cannot be maximized directly. Again, a concentrated likelihood approach is necessary. The estimators for β and σ² are both functions of the value of λ. A concentrated log-likelihood can then be obtained as:
log Lc(θ) = const − (n/2) log[(1/n) ε̂>B>Bε̂] + log |B|. (3.58)
The residual vector in the concentrated likelihood is itself, indirectly, a function of the spatial autoregressive parameter. A one-time optimization will therefore in general not be sufficient to obtain maximum likelihood estimates of all the parameters: an iterative procedure is needed, alternating back and forth between the estimation of the spatial autoregressive coefficient conditional upon the residuals (for a given value of β), and the estimation of the parameter vector β conditional upon the spatial autoregressive coefficient.
Algorithm 3.2 — ML estimation of SEM. Following Anselin (1988), the procedure can be summarized in the following steps:
1. Carry out an OLS regression of y on X to obtain an initial estimate β̂ and residuals ε̂ = y − Xβ̂.
2. Given the residuals, find the λ̂ that maximizes the concentrated log-likelihood (3.58).
3. Given λ̂, obtain β̂(λ̂) by the spatially weighted least squares in (3.55), and update the residuals.
4. Repeat steps 2 and 3 until numerical convergence of the estimates.
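A minimal sketch of this iterative procedure (ml_sem is a hypothetical helper; it assumes a weight matrix Ws with real eigenvalues and uses the eigenvalue form of log |B|):

ml_sem <- function(y, X, Ws, tol = 1e-8, maxit = 100) {
  n <- length(y)
  In <- diag(n)
  omega <- Re(eigen(Ws, only.values = TRUE)$values)
  beta <- solve(crossprod(X), crossprod(X, y))  # step 1: OLS start
  lambda <- 0
  for (it in 1:maxit) {
    e <- as.vector(y - X %*% beta)              # residuals given beta
    We <- as.vector(Ws %*% e)
    cloglik <- function(l) {                    # concentrated log-likelihood (3.58)
      u <- e - l * We
      -(n / 2) * log(sum(u^2) / n) + sum(log(1 - l * omega))
    }
    lambda_new <- optimize(cloglik, c(1 / min(omega) + 1e-6, 1 - 1e-6),
                           maximum = TRUE)$maximum      # step 2
    B <- In - lambda_new * Ws                   # step 3: filtered least squares
    Xl <- B %*% X
    beta <- solve(crossprod(Xl), crossprod(Xl, B %*% y))
    if (abs(lambda_new - lambda) < tol) break   # step 4: convergence check
    lambda <- lambda_new
  }
  u <- as.vector(B %*% (y - X %*% beta))
  list(lambda = lambda_new, beta = as.vector(beta), sigma2 = sum(u^2) / n)
}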
The marginal effects are functions of the estimated parameters which are highly nonlinear due to Sr(θ).³ Therefore, a procedure such as the Delta Method is not feasible. Instead, we use a Monte Carlo approximation which takes into account the sampling distribution of θ̂. To show this procedure, consider the SDM, where:

Sr(θ) = (In − ρW)−1 (In βr + W γr).
Let g(θ) = M̄ (θ) be a function representing the marginal (direct, indirect or total) effect that depends
on the population parameters θ. If N(θ|θ̄, Σθ ) denotes the multivariate normal density of θ with mean θ̄
and asymptotic variance-covariance matrix Σθ , then the expected value of the marginal effects conditional
on the population parameters θ̄ and Σθ is:
E[g(θ) | θ̄, Σθ] = ∫ E[g(θ) | y, X, θ] N(θ | θ̄, Σθ) dθ. (3.60)
A Monte Carlo approximation to this expectation is obtained by calculation of the empirical marginal
effects evaluated at pseudo draws of θ from the asymptotic distribution of the estimator. The algorithm is
the following:
Algorithm 3.3 — Standard Errors of the Marginal Effects. Estimate the model using MLE. Consider s = 1, ..., S, and start with s = 1.
1. Take a random draw θs from N(θ̂, Σ̂θ), the estimated asymptotic distribution of θ̂.
2. Compute the empirical marginal effects (direct, indirect and total) implied by Sr(θs).
3. Store the resulting effects.
4. Set s = s + 1 and repeat steps 1-3 until s = S.
5. Calculate the empirical mean of the marginal effects across the S draws. The standard deviation of the marginal effects across the S draws is the standard error.
³ Note that we have replaced the parameter for the spatially lagged independent variables so that θ denotes the vector of parameters of the model.
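A minimal sketch of Algorithm 3.3 for the SLM case (me_mc is a hypothetical helper; its inputs theta_hat, a named vector with elements "beta" and "rho", and V_hat, the estimated asymptotic covariance matrix, are assumptions):

library("MASS")  # for mvrnorm
me_mc <- function(theta_hat, V_hat, Ws, S = 1000) {
  n <- nrow(Ws)
  In <- diag(n)
  draws <- MASS::mvrnorm(S, mu = theta_hat, Sigma = V_hat)  # step 1: pseudo draws
  effs <- apply(draws, 1, function(th) {
    Sr <- solve(In - th["rho"] * Ws) * th["beta"]  # Sr(W) = (I - rho*W)^(-1) * I * beta_r
    adi <- sum(diag(Sr)) / n                       # average direct impact
    ati <- sum(Sr) / n                             # average total impact
    c(direct = adi, indirect = ati - adi, total = ati)  # steps 2-3: store effects
  })
  cbind(mean = rowMeans(effs), se = apply(effs, 1, sd))  # step 5: means and SEs
}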
We illustrate the methods of this chapter using Anselin (1988)'s dataset on 49 neighborhoods in Columbus, Ohio. The dependent variable is:
• CRIME: residential burglaries and vehicle thefts per thousand households in the neighborhood.
The explanatory variables are neighborhood income (INC) and housing value (HOVAL), both measured in thousands of dollars.
# Load packages
library("spdep")
library("spatialreg")
library("memisc") # Package for tables
library("maptools")
library("RColorBrewer")
library("classInt")
source("getSummary.sarlm.R") # Function for spdep models
The dataset is currently part of the spdep package. We load the data using the following commands.
# Load data
columbus <- readShapePoly(system.file("etc/shapes/columbus.shp",
package = "spdep")[1])
col.gal.nb <- read.gal(system.file("etc/weights/columbus.gal",
package = "spdep")[1])
As usual in applied work, we start the analysis by asking whether there exists a spatial pattern in the variable we are interested in. To get some insight into the spatial distribution of CRIME we use the following quantile choropleth map:
Figure 3.2 shows the spatial pattern of crime. It can be observed that the spatial distribution of crime follows a clear pattern of positive autocorrelation. However, we must corroborate this statement by using a global test of spatial autocorrelation. To do so, we use a row-normalized binary contiguity matrix W, col.gal.nb, based on the queen criterion, and carry out a Moran's I test. In particular, we use a Moran test based on Monte Carlo simulations using the moran.mc function with 99 simulations.
# Moran's I test
set.seed(1234)
listw <- nb2listw(col.gal.nb, style = "W")
moran.mc(columbus$CRIME, listw = listw,
nsim = 99, alternative = 'greater')
##
## Monte-Carlo simulation of Moran I
##
Notes: This graph shows the spatial distribution of crime on the 49 Columbus, Ohio neighborhoods. Darker color
indicates greater rate of crime.
## data: columbus$CRIME
## weights: listw
## number of simulations + 1: 100
##
## statistic = 0.48577, observed rank = 100, p-value = 0.01
## alternative hypothesis: greater
The results show that the Moran's I statistic is 0.49 with a p-value of 0.01. This implies that we reject the null hypothesis of a random spatial distribution; there is evidence of positive global spatial autocorrelation in the crime variable: places with high (low) crime rates are surrounded by places with high (low) crime rates.
Our next step is to estimate different spatial models using the functions already programmed in spdep.
First, we estimate the classical OLS model followed by the SLX, SLM, SDM, SEM and SAC models. The
functions used for each model are the following:
• OLS: lm function.
• SLX: lm function, where W X is constructed using the function lag.listw from spdep package. This
model can also be estimated using the function lmSLX from spdep package as shown below.
• SLM: lagsarlm from spdep package.
• SDM: lagsarlm from spdep package, using the argument type = "mixed". Note that type = "Durbin"
may be used instead of type = "mixed".
• SEM: errorsarlm from spdep package. Note that the Spatial Durbin Error Model (SDEM)—not shown
here— can be estimated by using type = "emixed".
• SAC: sacsarlm from spdep package.
All models are estimated using the ML procedure outlined in the previous section. In order to compute the determinant of the Jacobian we use Ord (1975)'s procedure, by explicitly using the argument method = "eigen" in each spatial model. That is, the Jacobian is computed as in (3.32).
# Models
columbus$lag.INC <- lag.listw(listw,
columbus$INC) # Create spatial lag of INC
columbus$lag.HOVAL <- lag.listw(listw,
columbus$HOVAL) # Create spatial lag of HOVAL
ols <- lm(CRIME ~ INC + HOVAL,
data = columbus)
slx <- lm(CRIME ~ INC + HOVAL + lag.INC + lag.HOVAL,
data = columbus)
slm <- lagsarlm(CRIME ~ INC + HOVAL,
data = columbus,
listw,
method = "eigen")
sdm <- lagsarlm(CRIME ~ INC + HOVAL,
data = columbus,
listw,
method = "eigen",
type = "mixed")
sem <- errorsarlm(CRIME ~ INC + HOVAL,
data = columbus,
listw,
method = "eigen")
sac <- sacsarlm(CRIME ~ INC + HOVAL,
data = columbus,
listw,
method = "eigen")
The models are presented in Table 3.1. The OLS estimates are presented in the first column. The results show that an increase of 1 thousand dollars in the income of the neighborhood is correlated, on average, with a decrease of 1.6 crimes per thousand households. Similarly, an increase of 1 thousand dollars in the housing value of the neighborhood is correlated, on average, with a decrease of 0.3 crimes per thousand households. Both correlations are statistically significant.4 Both results imply that crimes (residential burglaries and vehicle thefts) are lower in richer neighborhoods.
Column 2 of Table 3.1 shows the results for the SLX. In particular, the model is given by y = Xβ + W Xγ + ε, where W X is a 49 × 2 matrix whose columns correspond to the spatial lags of INC and HOVAL.
The coefficient for the spatial lag of INC, W.INC, is negative and significant. This implies that crime in spatial unit i is correlated with the income in its neighborhood: the higher the income of the neighbors of i, the lower the crime in i. This result does not, however, hold for the housing value of the neighbors of i, whose coefficient is positive but not statistically different from zero.
The results for the SLM are shown in column 3. The spatial autoregressive parameter ρ is positive and significant, indicating strong spatial autocorrelation. This provides evidence of spillover effects on crime. The coefficients for the other variables in the regression are similar to the OLS results, though smaller in absolute value.
The results for the SDM are presented in column 4. Whereas the estimated ρ parameter is positive and significant, the coefficients of the lagged explanatory variables are not. This indicates that once we have taken into account the endogenous interaction effects of crime, the neighbors' characteristics do not matter in explaining the crime in each location. Moreover, the spatial lag of income has the wrong sign: the common factor hypothesis would imply a positive sign, given a positive estimate for ρ and the negative sign for INC. This provides some evidence that an omitted spatial lag may be the main spatial effect, rather than spatial dependence in the error term.
4 Note that we refer to correlation since there may still be some sort of endogeneity problem with either of the two variables.
Column 5 shows the results for the SEM model, which confirm the conclusions from the previous models. It can be noticed that the autoregressive parameter for W u is positive and significant, indicating an important spatial transmission of random shocks. This result may be explained by the omission of important variables that are spatially correlated.
The SAC model, presented in column 6, considers both endogenous interaction effects and interaction effects among the error terms. From the results, we observe that the SAC model produces coefficient estimates for the W y and W u variables that are not significantly different from zero. However, if endogenous interaction effects and interaction effects among the error terms are considered separately, both coefficients turn out to be significant. This might be explained by the fact that the model is overparameterized, as a result of which the significance levels of all variables tend to go down.
For those interested in programming the SLM in R, I provide short code estimating this model using Ord's approximation in Appendix 3.B.
We now estimate the predicted values pre- and post- the change in the income variable. In the following lines we use the reduced-form predictor and the observed values of the exogenous variables to obtain the predicted values for CRIME, ŷ1, using the SLM model previously estimated.
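A minimal sketch of this computation (assumed code: the object names Wm, Xmat and y_hat_pre are hypothetical, and listw2mat() is used to obtain a dense W):
# Reduced-form predictions: y_hat = (I - rho W)^{-1} X beta
n     <- length(listw$neighbours)
Wm    <- listw2mat(listw)
Xmat  <- cbind(1, columbus$INC, columbus$HOVAL)
A_inv <- solve(diag(n) - slm$rho * Wm)
y_hat_pre <- A_inv %*% (Xmat %*% slm$coefficients)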
Next we increase INC by 1 in spatial unit 30 and calculate the reduced-form predictions, ŷ2.
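Under the same assumptions, the post-change predictions might be computed as:
# Raise INC by 1 (US$1,000) in spatial unit 30 and re-predict
Xmat2 <- Xmat
Xmat2[30, 2] <- Xmat2[30, 2] + 1
y_hat_post <- A_inv %*% (Xmat2 %*% slm$coefficients)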
Finally, we compute the difference between pre- and post-predictions, ŷ2 − ŷ1:
# The difference
delta_y <- y_hat_post - y_hat_pre
col_new$delta_y <- delta_y
summary(delta_y)
## V1
## Min. :-1.1141241
## 1st Qu.:-0.0074114
## Median :-0.0012172
## Mean :-0.0336341
## 3rd Qu.:-0.0002604
## Max. :-0.0000081
sum(delta_y)
## [1] -1.648071
According to the result from sum(delta_y), the predicted effect of the change would be a decrease of 1.65 in the crime rate, considering both direct and indirect effects. That is, increasing income by US$1,000 in region 30 generates effects that transmit through the whole system of regions, resulting in a new equilibrium where total crime falls by about 1.65 crimes per thousand households.
Sometimes we would like to plot these effects. Suppose we wanted to show the regions that had low and high impacts due to the increase in INC. Let's define "high-impacted regions" as those regions whose crime rate decreases by more than 0.05. The following code produces Figure 3.3.
# Breaks
breaks <- c(min(col_new$delta_y), -0.05, max(col_new$delta_y))
labels <- c("High-Impacted Regions", "Low-Impacted Regions")
np <- findInterval(col_new$delta_y, breaks)
colors <- c("red", "blue")
# Draw Map
plot(col_new, col = colors[np])
legend("topleft", legend = labels, fill = colors, bty = "n")
points(38.29, 30.35, pch = 19, col = "black", cex = 0.5)
Now we map the magnitude of the changes caused by altering INC in region 30. The code is the following
and the graph is presented in Figure 3.4.
Figure 3.3 — Notes: This graph shows the regions with low and high impacts due to the increase in INC in region 30. Red-colored regions are those with a decrease in the crime rate larger than 0.05, whereas blue-colored regions are those with a smaller decrease.
In the rest of this section we use the impacts() function from the spdep package to understand the direct (local), indirect (spillover), and total effects of a unit change in each of the predictor variables. This function returns the direct, indirect and total impacts for the variables in the model. The spatial lag impact measures are computed using the reduced form:

y = Σ_{r=1}^{K} A(W)^{-1} (I_n β_r) x_r + A(W)^{-1} ε   (3.61)

where

A(W)^{-1} = I_n + ρW + ρ²W² + ...
The exact A(W)^{-1} is computed when listw is given. When the traces are created by powering sparse matrices, the approximation I_n + ρW + ρ²W² + ... is used instead. The exact and the trace-based methods should give very similar results, unless the number of powers used is very small or the spatial coefficient is close to its bounds.
Figure 3.4 — Notes: This graph shows the spatial distribution of the changes caused by altering INC in region 30.
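The call producing the impact estimates discussed below is not reproduced in the text; a sketch of the assumed call is:
# Impact measures using the exact A(W)^{-1} (listw given)
im <- impacts(slm, listw = listw)
im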
The output says that an increase of US$1,000 in income leads to a total decrease of 1.8 crimes per thousand households. The direct effect of the income variable in the SLM model amounts to -1.123, while the coefficient estimate of this variable is -1.074. This implies that the feedback effect is -1.123 - (-1.074) = -0.049. This feedback effect corresponds to about 4.5% of the coefficient estimate.
Let’s corroborate these results by computing the impacts using matrix operations:
## [1] -1.122516
n <- length(listw$neighbours)
Total <- crossprod(rep(1, n), S) %*% rep(1, n) / n
Total
## [,1]
## [1,] -1.800897

The indirect (spillover) effect is the difference between the total and the direct effect:

## [,1]
## [1,] -0.6783818
Note that the results are the same as those computed by impacts().
We can also obtain the p-values of the impacts by using the argument R. This argument indicates the number of simulations used to create distributions for the impact measures, provided that the fitted model object contains a coefficient covariance matrix. Now with p-values:
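A sketch of the assumed call (the choice R = 200 is illustrative):
# Simulation-based inference for the impact measures
im_R <- impacts(slm, listw = listw, R = 200)
summary(im_R, zstats = TRUE, short = TRUE)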
The results show that the variable that exerts the largest negative direct impact is INC. That is, INC exerts the largest reduction on the own-region crime rate. The indirect effects are presented in the second column. These effects help identify which variables produce the largest spatial spillovers. Negative effects could be considered spatial benefits, since they indicate variables that lead to a reduction in the crime rate. Positive indirect effects would represent a negative externality, since they indicate that neighboring regions suffer an increase in the crime rate when these variables increase. From the results we observe that INC has the largest, and significant, negative indirect effect.
The indirect effect for HOVAL is not significant. The weakly significant effect in the SLM model can be explained by the fact that this model imposes the same ratio between the spillover effect and the direct effect for every explanatory variable; the model is therefore too rigid to capture spillover effects adequately. The total effect takes into account both the direct and indirect effects, allowing us to draw inferences regarding which variables are important to reduce the crime rate. We can observe that INC has the largest total effect.
Now we convert the spatial weight matrix into a sparse matrix and power it up using the trW function, as sketched below.
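A sketch of the assumed conversion and trace computation:
# Sparse W and Monte Carlo estimates of the traces of its powers
W_sp <- as(listw, "CsparseMatrix")
trMC <- trW(W_sp, type = "MC")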
We can also examine the cumulative impacts using the argument Q. When Q and tr are given in the impacts function, the output will present the impact components for each step in the traces of the powers of the weight matrix, up to and including the Qth power.
# Cumulative impacts
im2 <- impacts(slm, tr = trMC, R = 100, Q = 5)
sums2 <- summary(im2, zstats = TRUE, reportQ = TRUE, short = TRUE)
sums2
##
## ========================================================
## Simulation results (asymptotic variance matrix):
## ========================================================
## Simulated standard errors
## Direct Indirect Total
## INC 0.34631256 0.4029543 0.6401131
## HOVAL 0.08921795 0.1241116 0.1807874
##
## Simulated z-values:
## Direct Indirect Total
## INC -3.233305 -1.853701 -2.916189
## HOVAL -3.239503 -1.585160 -2.686904
##
## Simulated p-values:
## Direct Indirect Total
## INC 0.0012237 0.063782 0.0035434
## HOVAL 0.0011974 0.112930 0.0072118
## ========================================================
## Simulated impact components z-values:
## $Direct
## INC HOVAL
## Q1 -3.167835 -3.1796183
## Q2 NaN NaN
## Q3 -1.703030 -1.5687862
## Q4 -1.272087 -1.0947684
## Q5 -1.002337 -0.8174607
##
## $Indirect
## INC HOVAL
## Q1 NaN NaN
## Q2 -2.465562 -2.4657025
## Q3 -1.703030 -1.5687862
## Q4 -1.272087 -1.0947684
## Q5 -1.002337 -0.8174607
##
## $Total
## INC HOVAL
## Q1 -3.167835 -3.1796183
## Q2 -2.465562 -2.4657025
## Q3 -1.703030 -1.5687862
## Q4 -1.272087 -1.0947684
## Q5 -1.002337 -0.8174607
##
##
## Simulated impact components p-values:
## $Direct
## INC HOVAL
## Q1 0.0015358 0.0014747
## Q2 NA NA
## Q3 0.0885624 0.1166978
## Q4 0.2033424 0.2736181
## Q5 0.3161810 0.4136652
##
## $Indirect
## INC HOVAL
## Q1 NA NA
## Q2 0.013680 0.013674
## Q3 0.088562 0.116698
## Q4 0.203342 0.273618
## Q5 0.316181 0.413665
##
## $Total
## INC HOVAL
## Q1 0.0015358 0.0014747
## Q2 0.0136799 0.0136745
## Q3 0.0885624 0.1166978
## Q4 0.2033424 0.2736181
## Q5 0.3161810 0.4136652
Central limit theorems applied to triangular arrays of random variables are concerned with the limiting distributions of appropriately defined functions of the row average S_n = n^{-1} Σ_{i=1}^n X_{ni}. For example, for n = 3 (third row) we have S_3 = (1/3)(X_{31} + X_{32} + X_{33}). Note that traditional CLTs deal with functions of averages of the type n^{-1} Σ_{i=1}^n X_i, the X_i's being elements of the sequence {X_n}. However, the triangular array {X_{ni}} is more general than a sequence {X_n} in the sense that the random variables in a row of the array need not be the same as the random variables in other rows. Thus, the triangular nature of the random variables leads to certain statistical problems, especially with respect to the relevant CLT that should be applied. In other words, we will need a CLT applicable to triangular arrays. Both the LLN and the CLT for triangular arrays require slightly stronger conditions than the LLN and CLT for i.i.d. sequences of random variables.
What are the conditions on the random variables such that a properly normalized S_n converges to a normal distribution as n → ∞? In a nutshell, assume:

y_n = A_n^{-1}(X_n β_0 + ε_n)   (3.62)

where A_n = A_n(ρ_0) = I_n − ρ_0 W_n is nonsingular. Let ε_n(δ) = y_n − X_n β − ρW_n y_n, where δ = (β^T, ρ)^T. Thus, ε_n = ε_n(δ_0).
Since the matrices (I_n − ρW)^{-1} generally depend upon the sample size n, the vectors y and ε will also depend upon n, and they will form triangular arrays. This is due to the fact that for the "boundary" elements the sample weights matrix changes as new spatial units — or new data points — are added. That is, new spatial units change the structure for the existing spatial units (see for example Kelejian and Prucha, 1999, 2001; Anselin, 2007). For example, the outcome for the first spatial unit, y_{1,n}, will be different if we consider a total of n = 10 or n = 15 observations, because of the changing nature of W as n changes and given the DGP in Equation (3.62). This implies that these elements and the vector y should be indexed by n:
n = 1 ⟹ y_{11}
n = 2 ⟹ y_{12} y_{22}
n = 3 ⟹ y_{13} y_{23} y_{33}
⋮
n = n ⟹ y_{1n} y_{2n} y_{3n} ... y_{nn}
where y_{11} ≠ y_{12} ≠ y_{13} and y_{22} ≠ y_{23}. Note that the dependent variables in the same row are mutually independent (spatial units are independent) and have the same distribution, but the distributions of the random variables y (and ε) in different rows are allowed to differ.
The triangular array structure of y is partly a consequence of allowing a triangular array structure for the
disturbances in the model. But there is a more fundamental reason for it, and for treating the X observations
as a triangular array also. In allowing for the elements of Xn to depend on n we allow explicitly for some of
the regressors to be spatial lags.
We can identify each of the indices i = 1, ..., n with a location in space. In regularly-observed time series settings, these indices correspond to equidistant points on the real line, and it is evident what we usually mean by letting n increase. However, there is ambiguity when these points lie in space. For example, consider n points on a two-dimensional regularly-spaced lattice, where both the number of rows (n_1) and the number of columns (n_2) increase with n = n_1 · n_2. If we choose to list these points in lexicographic order (say, first row left to right, then the second row, etc.), then as n increases there would have to be some re-labeling, as the triangular array permits. Another consequence of this listing is that dependence between locations i and j is not always naturally expressed as a function of the difference i − j; for example, this is the case when the dependence is isotropic.
Assumption 3.4 1. The disturbances {ε_{i,n} : 1 ≤ i ≤ n, n ≥ 1} are identically distributed. Furthermore, for each sample size n, they are jointly independently distributed with mean E(ε_{i,n}) = 0 and E(ε_{i,n}²) = σ_{ε,n}², where 0 < σ_{ε,n}² < b.
Note that Assumption 3.4(1) allows the error term to depend on the sample size n, i.e., to form a triangular array. (For simplicity of notation we will, for the most part, again drop the subscript n in what follows.) Moreover, because statistics involving quadratic forms of ε_n will be present in the estimation, the existence of the fourth-order moment of ε_{i,n} will guarantee finite variances for the quadratic forms, and we will be able to apply a CLT.
In order to understand the asymptotic behavior of W_n under some regularity conditions, we first need some useful terminology.
Definition 3.6.2 — Triangular array of constants. Let {b_{ni}}, i = 1, ..., n, be a triangular array of constants.
1. {b_{ni}} are at most of order 1/h_n, denoted O(1/h_n) uniformly in i, if there exists a finite constant c independent of i and n such that |b_{ni}| ≤ c/h_n for all i and n.
2. {b_{ni}} are bounded away from zero uniformly in i at rate h_n if there exist a positive sequence {h_n} and a constant c > 0 independent of i and n such that c ≤ h_n |b_{ni}| for all i, for sufficiently large n.
Again, we must think of the W matrices as triangular arrays of constants. Recall that the elements of W are denoted w_{ij}. However, since the spatial structure changes as we add more spatial units, it might be the case that the element w_{ij} is not the same when n = 50 as when n = 55. Therefore, we need triangular arrays in order to make this possibility explicit. That is why we will index the elements of W_n as w_{n,ij}.
Another question is whether the elements of W_n — viewed as sequences — are bounded, that is, whether they remain limited as n → ∞. In this context, Definition 3.6.2 provides a specific setting for sequences bounded away from zero; if sequences are divergent, the definition describes how fast they tend to infinity. Now, we apply this definition to the spatial weight matrices:
Assumption 3.5 — Weight Matrix. The elements w_{n,ij} of W_n are at most of order h_n^{-1}, denoted O(1/h_n), uniformly in all i, j, where the rate sequence {h_n} can be bounded or divergent. As a normalization, w_{n,ii} = 0 for all i.
Recall that in econometrics we are often interested in the asymptotic behavior of variables. For example, we say that:

X_n = O(b_n) ⟹ lim_{n→∞} X_n/b_n = c, with −∞ < c < ∞.
This implies that X_n is a bounded sequence of rate b_n. You probably recall from your econometrics class that we can write:

√n(β̂ − β) = (X^T X/n)^{-1} (1/√n) X^T ε,

and we usually state that X^T X = O(n) and X^T ε = O_p(n^{1/2}). That is, the sequence (1/n) X^T X is a bounded sequence, and (1/n^{1/2}) X^T ε is a bounded sequence in terms of probability (it converges to something as fast as rate 1/√n). Assumption 3.5 states that the elements of W_n are sequences that are at most of order 1/h_n, where the rate sequence {h_n} itself may be bounded or divergent.
Assumptions 3.5 and 3.6 link the spatial weight matrix directly to the sample size n. The intuition tells us that as the sample size n increases, the row sums of the weight matrices will also tend to increase, since one region could have more neighbors (see our discussion in Section 3.6.1). The rate at which the spatial weights w_{n,ij} change as n increases can be bounded (a limit on the number of neighbors) or divergent (no limit on the number of neighbors). Therefore, Assumptions 3.5 and 3.6 are intended to cover weight matrices whose elements are not restricted to be nonnegative and those that might not be row-standardized.
What are the implications of these assumptions? They have to do with the row and column sums of the matrix W. In particular, the row and column sums of W, before W is row-normalized, should not diverge to infinity at a rate equal to or faster than the rate of the sample size n. This condition is slightly different in Kelejian and Prucha (1998, 1999). Their condition states that the row and column sums of the matrices W and (I_n − ρW)^{-1}, before W is row-normalized, should be uniformly bounded in absolute value as n goes to infinity. In both cases these conditions limit the cross-sectional correlation to a manageable degree, i.e., the correlation between two spatial units should converge to zero as the distance separating them increases to infinity.
In addition to their technical role, these assumptions have practical implications. Normally, no spatial unit is assumed to be a neighbor to more than a given number, say q, of other units. In that case the number of neighbors is limited, and Lee (2004)'s and Kelejian and Prucha (1998, 1999)'s assumptions are satisfied.
By contrast, when the spatial weights matrix is an inverse distance matrix, Kelejian and Prucha (1998, 1999)'s condition may not be satisfied. To see this, consider an infinite number of spatial units arranged linearly. Let the distance of each spatial unit to its first left- and right-hand neighbors be d; to its second left- and right-hand neighbors, 2d; and so on. See for example Figure 3.5.
Figure 3.5: Spatial units R1, ..., R5 arranged on a line (the distances between consecutive units shown are 2d, d, d, 2d).
When W is an inverse distance matrix and its off-diagonal elements are of the form 1/d_{ij}, where d_{ij} is the distance between two spatial units i and j, each row sum is a partial sum of a multiple of the harmonic series, which diverges (slowly) as n → ∞. Consider instead a matrix in which every unit is a neighbor of every other unit with equal weight. Since the row and column sums are then n − 1, these sums also diverge to infinity as n → ∞. In contrast to the previous case, however, (n − 1)/n → 1 instead of 0 as n → ∞. This implies that a spatial weight matrix that has equal weights, w_{ij} = 1/(n − 1) after row-normalization, must be excluded for reasons of consistency, since it satisfies neither Lee (2004)'s nor Kelejian and Prucha (1998, 1999)'s condition.
The alternative is a group interaction matrix, introduced by Case (1991). Here "neighbors" refers to farmers who live in the same district. Suppose that there are R districts and m farmers in each district, so the sample size is n = mR. Case assumed that, within a district, each neighbor of a farmer is given equal weight. In that case, W_n = I_R ⊗ B_m, where B_m = (ι_m ι_m^T − I_m)/(m − 1). For this example, h_n = m − 1 and h_n/n = (m − 1)/(mR) = O(1/R). If the sample size n increases by increasing both R and m, then h_n goes to infinity and h_n/n goes to zero as n tends to infinity. Thus, this matrix satisfies Lee (2004)'s condition.
Remark: Whether {h_n} is a bounded or a divergent sequence has interesting implications for OLS estimation. The OLS estimators of β and ρ are inconsistent when {h_n} is bounded, but they can be consistent when {h_n} is divergent (see Lee, 2002).
In summary, when {h_n} is a bounded sequence, each cross-sectional unit has only a small number of neighbors, where the spatial dependence is usually defined based on geographical considerations. When {h_n} is divergent, each unit has a large number of neighbors, a scenario that often emerges in empirical studies of social interactions or cluster-sampled data.
Under Assumption 3.7, the SLM model (system) has the reduced form (equilibrium) given by Equation (3.62), and:

E(y_n) = (I_n − ρ_0 W_n)^{-1} X_n β_0 = A_n^{-1} X_n β_0   (3.63)

Var(y_n) = σ_0² (I_n − ρ_0 W_n)^{-1} (I_n − ρ_0 W_n)^{-T} = σ_0² A_n^{-1} A_n^{-T}   (3.64)
Before explaining the rest of the assumptions, we need the notion of bounded matrices.
Definition 3.6.3 — Bounded Matrices. Let {A_n} be a sequence of n-dimensional square matrices, where A_n = [a_{n,ij}].
1. The column sums of {A_n} are uniformly bounded (in absolute value) if there exists a finite constant c that does not depend on n such that

‖A_n‖_1 = max_{1≤j≤n} Σ_{i=1}^n |a_{n,ij}| ≤ c.

2. The row sums of {A_n} are uniformly bounded (in absolute value) if there exists a finite constant c that does not depend on n such that

‖A_n‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |a_{n,ij}| ≤ c.

Then {A_n} is said to be uniformly bounded in row sums if {‖A_n‖_∞} is a bounded sequence, and uniformly bounded in column sums if {‖A_n‖_1} is a bounded sequence.
The following lemmas will be very useful:

Lemma 3.8 If {A_n} and {B_n} are uniformly bounded in row sums (column sums), then {A_n B_n} is also uniformly bounded in row sums (column sums).
Lemma 3.9 If {A_n} is absolutely summable and Z_n has bounded elements, then the elements of Z_n^T A_n Z_n are O(n).
Assumption 3.10 The sequences of matrices {W_n} and {A_n^{-1}} are uniformly bounded in both row and column sums.
The uniform boundedness of the matrices is a condition to limit the spatial correlation to a manageable
degree. For example, it guarantees that the variances of yn are bounded as n goes to infinity.
Technically, this assumes that {‖W_n‖_1} and {‖W_n‖_∞} are bounded sequences. Formally, using Definition 3.6.3, the row and column sums of a sequence of square matrices {A_n} are bounded uniformly in absolute value if there exists a constant c < ∞ that does not depend on n such that

‖A_n‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |a_{n,ij}| < c  and  ‖A_n‖_1 = max_{1≤j≤n} Σ_{i=1}^n |a_{n,ij}| < c  for all n.
Why do we care about these conditions? Because we need the variance to go to zero as the sample size goes to infinity in order to apply a consistency theorem.5
Lemma 3.11 — Uniform Boundedness of Matrices in Row and Column Sums. Suppose that the spatial weights matrix W_n is a non-negative matrix with (i, j)th element

w_{n,ij} = d_{ij} / Σ_{l=1}^n d_{il},

and d_{ij} > 0 for all i, j.
1. If the row sums Σ_{j=1}^n d_{ij} are bounded away from zero at rate h_n uniformly in i, and the column sums Σ_{i=1}^n d_{ij} are O(h_n) uniformly in j, then {W_n} are uniformly bounded in column sums.
2. (Symmetric Matrix) If d_{ij} = d_{ji} for all i and j, and the row sums Σ_{j=1}^n d_{ij} are O(h_n) and bounded away from zero at rate h_n uniformly in i, then {W_n} are uniformly bounded in column sums.
Assumption 3.12 The elements of X_n are uniformly bounded constants for all n. The limit lim_{n→∞} X_n^T X_n/n exists and is nonsingular.

This rules out multicollinearity among the regressors. Note also that we are assuming that X_n is non-stochastic; if X_n were stochastic, analogous conditions would be required to hold in probability.
Assumption 3.13 {A_n^{-1}(ρ)} are uniformly bounded in either row or column sums, uniformly in ρ in a compact parameter space P. The true parameter ρ_0 is in the interior of P.
−1
This assumption is needed to deal with the nonlinearity of log (In − ρW ) in the log-likelihood function.
Recall
that if kW
k < 1, then In −ρWn is invertible for all n. Then if kW k < 1, then the sequence of matrices
−1
(In − Wn )
are uniformly bounded in any subset of (−1, 1) bounded away from the boundary. As we
−1
previously see, if W is row-standardized (In − W ) is uniformly bounded in row sums norm uniformly in
any closed subset of (−1, 1). Therefore, P from Assumption 3.13 can be considered as a single closed set
contained in (-1, 1).
−1
What if W is not row-normalized but its eigenvalues are real? Then, the Jacobian of (In − W ) will
be positive if −1/ωmin < ρ < 1/ωmax , where ωmin and ωmax are the minimum and maximum eigenvalues of
W , and P will be a closed interval contained in (−1/ωmin , 1/ωmax ) for all n. Thus, Assumption 3.13 rules
out models where ρ0 is close to -1 and 1.
Now, noting that:
5 Equivalently, this assumption rules out the unit root case in time series.
y_n = X_n β_0 + ρ_0 W_n y_n + ε_n
    = X_n β_0 + ρ_0 W_n A_n^{-1}(X_n β_0 + ε_n) + ε_n
    = X_n β_0 + ρ_0 W_n A_n^{-1} X_n β_0 + ρ_0 W_n A_n^{-1} ε_n + ε_n
    = X_n β_0 + ρ_0 W_n A_n^{-1} X_n β_0 + (I_n + ρ_0 W_n A_n^{-1}) ε_n   (3.65)
    = X_n β_0 + ρ_0 C_n X_n β_0 + (I_n + ρ_0 C_n) ε_n
    = X_n β_0 + ρ_0 C_n X_n β_0 + A_n^{-1} ε_n

because I_n + ρ_0 C_n = A_n^{-1} (show this), where C_n = W_n A_n^{-1}.
Theorem 3.15 — Consistency. Under Assumptions 3.4–3.14, θ_0 is globally identifiable and θ̂_n is a consistent estimator of θ_0.

The proof is given in Lee (2004). Identification of ρ_0 can be based on the maximum values of the concentrated log-likelihood function Q_n(ρ)/n. With identification and uniform convergence of [log L_n(ρ) − Q_n(ρ)]/n to zero on P, consistency of the QMLE θ̂_n follows.
(1/√n) ∂log L_n(θ_0)/∂β = (1/(σ_0² √n)) X_n^T ε_n   (3.68)

(1/√n) ∂log L_n(θ_0)/∂σ² = (1/(2σ_0⁴ √n)) (ε_n^T ε_n − nσ_0²)   (3.69)

and

(1/√n) ∂log L_n(θ_0)/∂ρ = (1/(σ_0² √n)) (C_n X_n β_0)^T ε_n + (1/(σ_0² √n)) (ε_n^T C_n ε_n − σ_0² tr(C_n))   (3.70)
As explained by Lee (2004, p. 1905), these are linear and quadratic functions of ε_n. In particular, the asymptotic distribution of (3.70) may be derived from a central limit theorem for linear-quadratic forms. The matrix C_n is uniformly bounded in row sums. As the elements of X_n are bounded, the elements of C_n X_n β_0 are uniformly bounded for all n by Lemma 3.8. With the existence of higher-order moments of ε in Assumption 3.4, the central limit theorem for quadratic forms of double arrays of Kelejian and Prucha (2001) can be applied, and the limit distribution of the score vector follows.
Since E[(1/√n) ∂log L_n/∂θ] = 0, the variance matrix of (1/√n) ∂log L_n/∂θ is:

−E[(1/n) ∂² log L_n(θ)/∂θ∂θ^T] =
[ (1/(σ² n)) X^T X        0                  (1/(σ² n)) X^T (CXβ)                                ]
[ 0                       1/(2σ⁴)            (1/(σ² n)) tr(C)                                    ]   (3.72)
[ (1/(σ² n)) (CXβ)^T X    (1/(σ² n)) tr(C)   (1/n) tr(C^s C) + (1/(σ² n)) (CXβ)^T (CXβ)         ]

which represents the average Hessian matrix (or information matrix when the ε's are normal). The matrix Ω_{θ,n} collects the second, third, and fourth moments of ε; if ε_n is normally distributed, then Ω_{θ,n} = 0.
Σ_θ = − lim_{n→∞} E[(1/n) ∂² log L_n(θ_0)/∂θ∂θ^T],   (3.74)

which is assumed to exist. If the ε_i's are normally distributed, then:

√n(θ̂_n − θ_0) →d N(0, Σ_θ^{-1}).   (3.75)
The following lemmas and statements summarize some basic properties of spatial weight matrices and some laws of large numbers and central limit theorems for linear and quadratic forms. For proofs of these lemmas see the appendix of Lee (2004). The error terms ε_n are assumed to be i.i.d. with zero mean and finite variance σ_0², according to Assumption 3.4. For quadratic forms involving ε, the fourth moment μ_4 of the ε's is assumed to exist.
Consider the following properties:
Lemma 3.17 — Limiting Distribution. Suppose that A_n is an n × n matrix with its column sums uniformly bounded, and that the elements of the n × K matrix C_n are uniformly bounded. The elements ε_i of ε_n = (ε_1, ..., ε_n)^T are i.i.d.(0, σ²). Then:

(1/√n) C_n^T A_n ε_n = O_p(1).   (3.76)

That is, it is bounded in probability. Furthermore, if the limit of (1/n) C_n^T A_n A_n^T C_n exists and is positive definite, then:

(1/√n) C_n^T A_n ε_n →d N(0, σ_0² lim_{n→∞} (1/n) C_n^T A_n A_n^T C_n).   (3.77)
Lemma 3.18 — First and Second Moments. Let A_n = [a_{ij}] be an n-dimensional square matrix. Then:
1. E(ε_n^T A_n ε_n) = σ_0² tr(A_n),
2. E[(ε_n^T A_n ε_n)²] = (μ_4 − 3σ_0⁴) Σ_{i=1}^n a_{ii}² + σ_0⁴ [tr²(A_n) + tr(A_n A_n^T) + tr(A_n²)], and
3. Var(ε_n^T A_n ε_n) = (μ_4 − 3σ_0⁴) Σ_{i=1}^n a_{ii}² + σ_0⁴ [tr(A_n A_n^T) + tr(A_n²)].
In particular, if the ε's are normally distributed, then μ_4 = 3σ_0⁴ and:
• Var(ε_n^T A_n ε_n) = σ_0⁴ [tr(A_n A_n^T) + tr(A_n²)].
Lemma 3.19 Suppose that {A_n} is uniformly bounded in either row or column sums, and that the elements a_{n,ij} of A_n are O(1/h_n) uniformly in all i and j. Then:
• E(ε_n^T A_n ε_n) = O(n/h_n),
• Var(ε_n^T A_n ε_n) = O(n/h_n), and
• ε_n^T A_n ε_n = O_p(n/h_n).
Furthermore, if lim_{n→∞} h_n/n = 0, then:

(h_n/n) ε_n^T A_n ε_n − (h_n/n) E(ε_n^T A_n ε_n) = o_p(1).
Sketch of Proof of Asymptotic Normality. We will sketch the proof of asymptotic normality assuming consistency (Theorem 3.15). The sketch consists of the following steps:

1. Show that

Σ_θ = − lim_{n→∞} E[(1/n) ∂² log L_n(θ_0)/∂θ∂θ^T]

is non-singular. Showing this is beyond the scope of these class notes; we take it as given.

2. Show that:

(1/n) X_n^T C_n^T ε_n = o_p(1)
(1/n) X_n^T C_n^T C_n ε_n = o_p(1)

It follows that:

(1/n) X_n^T W_n y_n = (1/n) X_n^T C_n X_n β_0 + o_p(1)
(1/n) y_n^T W_n^T ε_n = (1/n) ε_n^T C_n^T ε_n + o_p(1)
(1/n) y_n^T W_n^T W_n y_n = (1/n) (X_n β_0)^T C_n^T C_n X_n β_0 + (1/n) ε_n^T C_n^T C_n ε_n + o_p(1)
As X_n^T W_n y_n/n = O_p(1) (it is bounded in probability), it follows from Equation (3.36):
Then, taking into account Equation (3.35) and using our result in Equation (3.78), yields:

∂² log L_n(θ)/∂ρ² = −tr[(C_n(ρ))²] − (1/σ²) (y_n^T W_n^T W_n y_n),  where C_n(ρ) = W_n A_n(ρ)^{-1}.

A mean value expansion gives:

tr[(C_n(ρ̃_n))²] = tr[(C_n(ρ_0))²] + 2 tr[(C_n(ρ̄))³] (ρ_0 − ρ̃_n)
tr[(C_n(ρ̃_n))²] − tr[(C_n(ρ_0))²] = 2 tr[(C_n(ρ̄))³] (ρ_0 − ρ̃_n)

Note that C_n(ρ̄) is uniformly bounded in row and column sums uniformly in a neighborhood of ρ_0 by Assumptions 3.10 and 3.13, and that tr[(C_n(ρ̄))³] = O(n/h_n) by Lemma 3.19.
Considering Equation (3.38):

(1/n) ∂² log L(θ̃)/∂(σ²)² − (1/n) ∂² log L(θ_0)/∂(σ²)²
  = [1/(2(σ̃²)²) − (1/(σ̃²)³) (1/n) ε(δ̃)^T ε(δ̃)] − [1/(2(σ_0²)²) − (1/(σ_0²)³) (1/n) ε^T ε]
  = [1/(2(σ̃²)²) − 1/(2(σ_0²)²)] − [(1/(σ̃²)³) − (1/(σ_0²)³)] (1/n) ε^T ε + o_p(1)
  = o_p(1),

since σ̃² →p σ_0² and (1/n) ε(δ̃)^T ε(δ̃) = (1/n) ε^T ε + o_p(1).
3. The conditional expectations of the second derivatives are:

E[(1/n) ∂² log L(θ_0)/∂β∂β^T | W, X] = −(1/(σ_0² n)) X^T X
E[(1/n) ∂² log L(θ_0)/∂β∂σ² | W, X] = 0
E[(1/n) ∂² log L(θ_0)/∂β∂ρ | W, X] = −(1/(σ_0² n)) X^T C X β_0
E[(1/n) ∂² log L(θ_0)/∂(σ²)² | W, X] = −1/(2σ_0⁴)
E[(1/n) ∂² log L(θ_0)/∂σ²∂ρ | W, X] = −(1/n) tr(C)/σ_0²
E[(1/n) ∂² log L(θ_0)/∂ρ² | W, X] = −(1/n) tr(C^s C) − (1/(σ_0² n)) (C X β_0)^T (C X β_0)
All these expectations exist in the limit by Assumption 3.14 and Lemmas 3.18–3.19. Then, by nonsingularity of E[H(w_i; θ)], we can say that

[(1/n) H(w; θ̂)]^{-1} →p (E[H(w; θ_0)])^{-1}.
4. Recall that the first-order derivatives of the log-likelihood function at θ_0 are given by (see Section 3.2.2):

(1/√n) ∂log L_n(θ_0)/∂θ =
[ (1/(σ_0² √n)) X_n^T ε_n                                                        ]
[ (1/(2σ_0⁴ √n)) (ε_n^T ε_n − nσ_0²)                                             ]
[ (1/(σ_0² √n)) (C_n X_n β_0)^T ε_n + (1/(σ_0² √n)) (ε_n^T C_n ε_n − σ_0² tr(C_n)) ]
As explained by Lee (2004, p. 1905), these are linear and quadratic functions of ε_n. In particular, the asymptotic distribution of (1/√n) ∂log L_n(θ_0)/∂θ may be derived from a central limit theorem for linear-quadratic forms. The matrix C_n is uniformly bounded in row sums. As the elements of X_n are bounded, the elements of C_n X_n β_0 are uniformly bounded for all n by Lemma 3.8. With the existence of higher-order moments of ε in Assumption 3.4, the central limit theorem for quadratic forms of double arrays of Kelejian and Prucha (2001) can be applied, and the limit distribution of the score vector follows.
Since E[(1/√n) ∂log L_n/∂θ] = 0, the variance matrix of (1/√n) ∂log L_n/∂θ under normality is:

−E[(1/n) ∂² log L_n(θ)/∂θ∂θ^T] =
[ (1/(σ² n)) X^T X        0                  (1/(σ² n)) X^T (CXβ)                                ]
[ 0                       1/(2σ⁴)            (1/(σ² n)) tr(C)                                    ]   (3.81)
[ (1/(σ² n)) (CXβ)^T X    (1/(σ² n)) tr(C)   (1/n) tr(C^s C) + (1/(σ² n)) (CXβ)^T (CXβ)         ]
which represents the average Hessian matrix (or information matrix when the ε's are normal). Then:

(1/√n) ∂log L_n(θ_0)/∂θ →d N(0, −E[H(w_i; θ)]),   (3.82)

and:

√n(θ̂_n − θ_0) →d (−E[H(w_i; θ)])^{-1} N(0, −E[H(w_i; θ)]) = N(0, Σ_θ^{-1}).
Appendix
3.A Terminology in Asymptotic Theory
3.B A function to estimate the SLM in R
For those interested in programming spatial models via ML, here I provide a small function to estimate the SLM, based on the spdep package and Algorithm 3.1.
##################################
# Spatial Lag Model Estimated via Maximum Likelihood
# By: Mauricio Sarrias
# Based on spdep code
#################################
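# NOTE (assumed context): the function header is not reproduced here. The
# lines below presuppose that it defines y, X (with an intercept column),
# the weight matrix W, n = nrow(X), k = ncol(X), omega (the eigenvalues of
# W), and rho_hat, the estimate of rho obtained by maximizing the
# concentrated log-likelihood (Algorithm 3.1).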
# Generate estimates
A <- (diag(n) - rho_hat * W)
Ay <- crossprod(t(A), y)
beta_hat <- solve(crossprod(X)) %*% crossprod(X, Ay) # See Equation (3.25)
error <- Ay - crossprod(t(X), beta_hat)
sigma2_hat <- crossprod(error) / n # See Equation (3.26)
# Hessian
C <- crossprod(t(W), solve(A)) # C = WA^{-1}
alpha <- sum(omega ^ 2 / ((1 - rho_hat * omega) ^ 2))
if (is.complex(alpha)) alpha <- Re(alpha)
b_b <- drop(1 / sigma2_hat) * crossprod(X) # k * k
b_rho <- drop(1 / sigma2_hat) * (t(X) %*% C %*% X %*% beta_hat) # k * 1
sig_sig <- n / (2 * sigma2_hat ^ 2) # 1 * 1
sig_rho <- drop(1 / sigma2_hat) * sum(diag(C)) # 1 * 1
rho_rho <- sum(diag(crossprod(C))) + alpha +
drop(1 / sigma2_hat) * crossprod(C %*% X %*% beta_hat) # 1*1
row_1 <- cbind(b_b, rep(0, k), b_rho)
row_2 <- cbind(t(rep(0, k)), sig_sig, sig_rho)
row_3 <- cbind(t(b_rho), sig_rho, rho_rho)
Hessian <- rbind(row_1, row_2, row_3)
std.err <- sqrt(diag(solve(Hessian)))
# Table of coefficients
all_names <- c(colnames(X), "sigma2", "rho")
all_coef <- c(beta_hat, sigma2_hat, rho_hat)
z <- all_coef / std.err
p <- pnorm(abs(z), lower.tail = FALSE) * 2
sar_table <- cbind(all_coef, std.err, z, p)
cat(paste("\nEstimates from SAR Model \n\n"))
colnames(sar_table) <- c("Estimate", "Std. Error", "z-value", "Pr(>|z|)")
rownames(sar_table) <- all_names
printCoefmat(sar_table)
}
# Concentrated log-likelihood function; det is the Jacobian |I - rho*W| (see Equation (3.32))
l_c <- - (n / 2) - (n / 2) * log(2 * pi) - (n / 2) * log(sigma2) + log(det)
return(l_c)
}
4 Hypothesis Testing
In the previous chapter we presented the spatial autoregressive models, the intuition underlying their DGP, and their estimation by ML. At this stage the following question arises: which model is more convenient for empirical analysis? There exist two ways to proceed. The first is to choose a spatial model according to theoretical considerations. The second approach suggests that a series of statistical tests should be carried out on the different specifications of the spatial autocorrelation models, to adopt the one that best controls for spatial autocorrelation among the residuals.
In this chapter we present some approaches to test whether the true spatial parameters are zero or not. In other words, we would like to assess the null H_0: λ = 0 or H_0: ρ = 0, against the alternative H_1: λ ≠ 0 or H_1: ρ ≠ 0. We first start with the Moran's I statistic, used to test whether there is evidence of spatial autocorrelation in the error term. Then, we present several tests based on the ML principle.
The Moran's I statistic based on the OLS residuals ε̂ is:

I = (ε̂^T W ε̂)/(ε̂^T ε̂)   (4.1)
The asymptotic distribution of the Moran statistic with regression residuals was developed by Cliff and Ord (1972, 1973). In particular, the following theorem gives us the moments of the Moran's I statistic and its distribution.
Theorem 4.1 — Moran’s I. Consider H0 : no spatial autocorrelation, and assume that ε ∼ N(0, σ 2 In ). Let
the Moran’s I statistic be:
I = (n/S_0) (ε̂^T W ε̂)/(ε̂^T ε̂)   (4.2)

where ε̂ = y − Xβ̂ is the vector of OLS residuals, β̂ = (X^T X)^{-1} X^T y, W is a spatial weight matrix, n is the number of observations, and S_0 is a standardization factor equal to the sum of all elements in the weight matrix. Then, the moments under the null are:

E(I) = (n/S_0) tr(MW)/(n − K)

E(I²) = (n/S_0)² [tr(MWMW^T) + tr((MW)²) + (tr(MW))²] / [(n − K)(n − K + 2)]   (4.3)

where M = I − X(X^T X)^{-1} X^T. Then:

z_I = (I − E(I))/Var(I)^{1/2} ∼ N(0, 1)   (4.4)

where Var(I) = E(I²) − E(I)².
According to Anselin (1988, p. 102), the interpretation of this test is not always straightforward, even though it is by far the most widely used approach. While the null hypothesis is obviously the absence of spatial dependence, a precise expression for the alternative hypothesis does not exist. Intuitively, the spatial weight matrix is taken to represent the pattern of potential spatial interaction that causes dependence, but the nature of the underlying DGP is not specified. Usually it is assumed to be of a spatial autoregressive form. However, the coefficient in (4.1) is mathematically equivalent to the coefficient from an OLS regression of W ε̂ on ε̂, rather than of ε̂ on W ε̂, which would correspond to an autoregressive process as in the SEM. In other words, Moran's I is a misspecification test that has power against a host of alternatives. These include spatial error autocorrelation, but also residual correlation caused by a spatial lag alternative, and even heteroskedasticity! Thus, the rejection of the null hypothesis of no spatial autocorrelation does not imply the alternative of spatial error autocorrelation, which is how this result is typically (and incorrectly) interpreted. Specifically, Moran's I also has considerable power against a spatial lag alternative, so rejection of the null does not provide any guidance in the choice between a spatial error and a spatial lag as the alternative spatial regression specification.
Ī = (ε̂^T W ε̂)/σ̃²,   (4.5)

with σ̃² being a normalizing factor that depends on the particular model chosen as the alternative hypothesis. In particular, if the alternative hypothesis is a SEM, the normalizing factor assumes the expression:

σ̃² = (ε̂^T ε̂/n) {tr[(W^T + W) W]}^{1/2}.   (4.6)

As a consequence, the test statistic can be defined as:

Ī = (n ε̂^T W ε̂) / (ε̂^T ε̂ {tr[(W^T + W) W]}^{1/2}).   (4.7)
The two expressions reported in Equations (4.2) and (4.7) coincide if the weight matrix has dichotomous entries, in which case w_{ij} = w_{ij}² and, therefore,

Σ_i Σ_j w_{ij} = {tr[(W^T + W) W]}^{1/2}.
In their paper, Kelejian and Prucha (2001) prove that the modified Moran test Ī converges in distribution to a standard normal distribution even when the a priori assumption of normality of the errors is not satisfied. Even if in large samples Ī ∼ N(0, 1), in small samples its expected value and variance may differ.
4.1.3 Example
We continue here with Anselin (1988)'s example (see Section 3.5) and analyze whether the regression residuals from an OLS model show evidence of spatial autocorrelation. To carry out the Moran's I test on the residuals in R we need to pass the regression object and the spatial weights object (listw) to the lm.morantest function, as sketched below.
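A sketch of the call (the objects ols and listw come from the running example):
# Moran's I test on the OLS residuals (two-sided)
lm.morantest(ols, listw, alternative = "two.sided")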
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = CRIME ~ INC + HOVAL, data = columbus)
## weights: listw
##
## Moran I statistic standard deviate = 2.681, p-value = 0.00734
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I Expectation Variance
## 0.212374153 -0.033268284 0.008394853
The default setting in this function is to compute the p-value for a one-sided test; to get a two-sided test, the alternative argument must be specified explicitly. The results show a Moran's I statistic of 0.212, which is highly significant and rejects the null hypothesis of uncorrelated error terms.
Recall that the Moran’s I statistic has high power against a range of alternatives. However, it does not
provide much help in terms of which alternative model would be most appropriate.
y = Xβ + (I_n − λW)^{-1} ε
(I_n − λW) y = (I_n − λW) Xβ + ε
y − λWy = (X − λWX)β + ε   (4.8)
y = λWy + Xβ − WX(λβ) + ε
resulting in a model that includes not only the spatially lagged dependent variable, W y, but also the spatially lagged explanatory variables, W X. Under some nonlinear restrictions we can see that (4.8) is equivalent to the SDM. Writing the unconstrained form of the model—the SDM—as y = γ_1 W y + Xγ_2 + W Xγ_3 + ε, the common factor hypothesis is:

H_0: γ_3 + γ_1 γ_2 = 0.   (4.11)

If the constraint holds, it follows that the SDM is equivalent to the SEM model.
Let q̂ = β̂_OLS − β̂_SEM, with

Var(q̂) = Var(β̂_OLS) − Var(β̂_SEM).   (4.12)

Then the Hausman statistic:

H = q̂^T [Var(q̂)]^{-1} q̂,   (4.13)

is distributed asymptotically chi-square with #β degrees of freedom.
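In R, a spatial Hausman test along these lines can be computed from the fitted SEM; the call below is a sketch and assumes the Hausman.test method for spatial error models is available:
# Spatial Hausman test contrasting OLS and SEM estimates of beta
Hausman.test(sem)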
β̂_OLS = (X^T X)^{-1} X^T y
       = (X^T X)^{-1} X^T [Xβ_0 + (I − λW)^{-1} ε]
β̂_OLS − β_0 = (X^T X)^{-1} X^T Bε

where B = (I − λW)^{-1}. Taking expectations, we get:

E[β̂_OLS − β_0] = E[(X^T X)^{-1} X^T Bε] = (X^T X)^{-1} X^T B E(ε) = 0
So the OLS estimator is unbiased. For the variance, we obtain:

Var(β̂_OLS) = E[(β̂ − E(β̂))(β̂ − E(β̂))^T]
            = E[(X^T X)^{-1} X^T B εε^T B^T X (X^T X)^{-1}]
            = (X^T X)^{-1} X^T B E(εε^T) B^T X (X^T X)^{-1}   (4.15)
            = σ² (X^T X)^{-1} X^T B B^T X (X^T X)^{-1}
Under the null of the spatial error process, the ML estimate σ̂², based on the variance of the residuals from the SEM, provides a consistent estimate of σ². The ML estimate λ̂ provides a consistent estimate of λ. With these estimates, we can compute the variance of the OLS estimates as in Equation (4.15) (Pace and LeSage, 2008).
Definition 4.4.1 — Likelihood Ratio Test. The Likelihood Ratio (LR) test is formally defined as:

LR = 2 · n [ (1/n) Σ_{i=1}^n log L_i(θ̂) − (1/n) Σ_{i=1}^n log L_i(θ̃) ] →d χ²(r)   (4.19)

where θ̂ is the unrestricted estimator, θ̃ the restricted estimator, and r is the number of constraints.
The number of constraints imposed may vary depending on the specification. In spatial models, the number of constraints is generally one or two, since we have the restriction ρ = 0, λ = 0, or λ = ρ = 0. The likelihood ratio test is designed to evaluate the distance that separates the values of the two likelihoods: if the distance is small, then the constrained model is comparable to the unconstrained model. In this case, the constrained version is "acceptable" and does not reduce the performance of the model. It is thus statistically possible not to reject the null hypothesis (the postulated constraints prove to be credible). In other words, if the likelihood value of the unconstrained model strays too far from that of the constrained model, we cannot accept the null hypothesis: the gap is too large for the constraint to be considered realistic.
log L(θ) = log|A| − (n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (Ay − Xβ)^T (Ay − Xβ)   (4.20)
The log-likelihood for the constrained model is found by setting ρ = 0 in Equation (4.20). Recall that if ρ = 0, then A = I − ρW = I, so:

log L(θ) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)^T (y − Xβ)   (4.21)
Therefore, the LR statistic for the SLM follows from our definition in Equation (4.19). For the SEM, the log-likelihood of the unconstrained model is:
log L(θ) = log|B| − (n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)^T Ω(λ) (y − Xβ)   (4.23)
Then the LR for the SEM model is:

LR = 2 log|B| + (1/σ²) [(y − Xβ)^T (y − Xβ) − (y − Xβ)^T Ω(λ) (y − Xβ)]   (4.24)
which is also distributed as χ²(1). We can use the formulae above or use the following algorithm:
1. Compute the restricted MLE θ̃ and record the value of the log-likelihood function at convergence, log L(θ̃).
2. Compute the unrestricted MLE θ̂ and record the value of the log-likelihood function at convergence, log L(θ̂).
3. Compute

LR = 2 [log L(θ̂) − log L(θ̃)].

This statistic is always positive because the unrestricted maximum value always exceeds the restricted one.
4. Compare LR with the critical value of the chi-square distribution with 1 degree of freedom.
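A minimal sketch of this algorithm in R, using the models fitted in Section 3.5 (objects slm and ols are assumed):
# LR test of the SLM (unrestricted) against OLS (restricted, rho = 0)
LR <- 2 * (as.numeric(logLik(slm)) - as.numeric(logLik(ols)))
LR
pchisq(LR, df = 1, lower.tail = FALSE)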
Consider a null hypothesis expressed as a set of r (possibly nonlinear) restrictions:

r(θ_0) = 0.

Let also

R(θ) = ∂r(θ)/∂θ^T.

The Wald test is given by:

W = n · r(θ̂)^T [R(θ̂) V̂ R(θ̂)^T]^{-1} r(θ̂) →d χ²(r)   (4.25)
For the SLM, the Wald test of H_0: ρ = 0 is:

W_ρ = ρ̂² / V̂ar(ρ̂)   (4.26)

where V̂ar(ρ̂) can be obtained from Equation 3.50 as:

V̂ar(ρ̂) = [tr(C^s C) + (1/σ²) (CXβ)^T (CXβ)]^{-1}   (4.27)

Clearly,

ρ̂/se(ρ̂) ∼a N(0, 1)   (4.28)

with se(ρ̂) the estimated standard deviation.
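A minimal sketch in R, using the SLM fitted in Section 3.5 (it assumes, as sarlm objects do, that the fit stores the estimate rho and its standard error rho.se):
# Wald test of rho = 0 from the fitted SLM
W_rho <- (slm$rho / slm$rho.se)^2
pchisq(W_rho, df = 1, lower.tail = FALSE)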
Extensions to hypotheses consisting of linear and nonlinear combinations of model parameters can be obtained in a straightforward way. Computationally, the Wald and LR tests are more demanding since they require ML estimation under the alternative, and the explicit forms of the tests are more complicated.
Analogously, for the SEM, the Wald test of H_0: λ = 0 is:

W_λ = λ̂² / V̂ar(λ̂)   (4.29)

where V̂ar(λ̂) can be obtained from Equation 3.59 as:

V̂ar(λ̂) = [tr(W_B²) + tr(W_B^T W_B) − (2/n) (tr(W_B))²]^{-1}   (4.30)
Algorithm 4.3 — Wald Test. Let θ = (θ_1^T, θ_2^T)^T. In general, to compute the Wald test statistic for H_0: θ_{02} = 0, use

V̂_w = [I_{22}(θ̂) − I_{21}(θ̂) I_{11}(θ̂)^{-1} I_{12}(θ̂)]^{-1}   (4.32)
Theorem 4.4 — Lagrange Multiplier Test. The Lagrange multiplier test statistic is:

LM = (∂log L(θ̃)/∂θ)^T [I(θ̃)]^{-1} (∂log L(θ̃)/∂θ) →d χ²(r)   (4.33)

Under the null hypothesis, LM has a limiting chi-square distribution with degrees of freedom equal to the number of restrictions. All terms are computed at the restricted estimator.
The main advantage of the LM statistic is that it only requires the constrained model to be estimated, and it is very often less complex since it mainly relies on OLS. This is one of the reasons that has led to the widespread use of this approach.
The construction of the LM test depends on the postulated specification of the spatial autoregressive DGP: SEM or SLM. The usual practice is to initially use a general test for detecting residual spatial autocorrelation (Moran's I test, for example) and then carry out the LM tests to identify the specific type of autoregressive process.
where W_B = W(I − λW)^{-1}. Under the null, E_{H0}[∂² ln L/∂β∂λ] = 0 and E_{H0}[∂² ln L/∂σ²∂λ] = 0, and

E_{H0}[∂² log L(θ)/∂λ²] = −tr(W² + W^T W)   (4.38)
Then the expression for the LM test for a SEM specification is:

LM_ERR = (1/T) [ε̂^T W ε̂ / σ̂²]²   (4.39)

where T = tr[(W + W^T) W]. Therefore, the test requires only OLS estimates. Under the null hypothesis, this statistic converges asymptotically to a χ²(1). For example, at the 5% significance level the critical value is 3.84; thus, we reject the null hypothesis if the value of the LM_ERR statistic is greater than 3.84. In that case we conclude that spatial autocorrelation is present in the standard linear model residuals, and we should proceed to estimate the SEM specification.
Note also that it is similar in expression to Moran’s I: except for the scaling factor T , this statistic is
essentially the square of Moran’s I.
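A minimal sketch computing LM_ERR by hand from the OLS residuals (the objects ols and listw come from the running example; listw2mat() is used to obtain a dense W):
# LM_ERR = (e'We / sigma2)^2 / T, with T = tr[(W + W')W]
e  <- residuals(ols)
s2 <- sum(e^2) / length(e)
Wm <- listw2mat(listw)
Tc <- sum(diag((Wm + t(Wm)) %*% Wm))
LM_err <- (drop(crossprod(e, Wm %*% e)) / s2)^2 / Tc
LM_err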
s_{ρ=0} = ∂log L(θ)/∂ρ |_{ρ=0} = (1/σ²) ε^T W y   (4.40)
The inverse of the information matrix is given in (3.50). The complicating feature of this matrix is that, even under ρ = 0, it is not block diagonal; the (ρ, β) term is equal to (X^T W Xβ)/σ², obtained by inserting ρ = 0, i.e., C = W. The main consequence is that, even under ρ = 0, we cannot ignore one of the off-diagonal terms. This is not the case for s_{λ=0}, whose asymptotic variance was obtained using just the (2, 2) element of the corresponding information matrix. For the spatial lag model, the asymptotic variance of s_{ρ=0} is obtained from the reciprocal of the last element of:
Var(β, σ², ρ)|_{ρ=0} =
[ (1/σ²) X^T X   0          (1/σ²) X^T W Xβ                                  ]^{-1}
[ ·              n/(2σ⁴)    0                                                 ]
[ ·              ·          tr(W² + W^T W) + (1/σ²) (W Xβ)^T (W Xβ)          ]

where the dots denote the symmetric entries.
Under ρ = 0 we have C = W and tr(W) = 0. Recalling that T = tr[(W^T + W) W], we can write:

LM_SAR = (1/T_1) [ε̂^T W y / σ̂²]²   (4.41)

where T_1 = [(W Xβ̂)^T M (W Xβ̂) + T σ̂²]/σ̂², with M = I − X(X^T X)^{-1} X^T. Under the null hypothesis, the test converges asymptotically to a χ² distribution with 1 degree of freedom.
• When the LM_LAG test value is significant and LM_ERR is insignificant, the most appropriate model is the SLM;
• in the same vein, when the LM_ERR test is significant and the LM_LAG value is insignificant, the most appropriate model is the SEM.

As you can guess, sometimes both statistical tests are significant. In this case, one decision rule can be as follows:

• when the LM_LAG test value is higher than the LM_ERR test value, it would be best to consider the SLM model;
• when the LM_ERR test value is higher than the LM_LAG test value, it would be best to consider the SEM model.

Of course, if both statistics are significant, it could also be appropriate to estimate a general autoregressive model (SAC).
# LM test
lm.LMtests(ols, listw,
test = c("LMerr", "RLMerr", "LMlag", "RLMlag"))
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = CRIME ~ INC + HOVAL, data = columbus)
## weights: listw
##
## LMerr = 4.6111, df = 1, p-value = 0.03177
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = CRIME ~ INC + HOVAL, data = columbus)
## weights: listw
##
## RLMerr = 0.033514, df = 1, p-value = 0.8547
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = CRIME ~ INC + HOVAL, data = columbus)
## weights: listw
##
## LMlag = 7.8557, df = 1, p-value = 0.005066
##
##
## Lagrange multiplier diagnostics for spatial dependence
##
## data:
## model: lm(formula = CRIME ~ INC + HOVAL, data = columbus)
## weights: listw
##
## RLMlag = 3.2781, df = 1, p-value = 0.07021
Note that both LMerr and LMlag are significant. However, the robust statistics point to the lag model as the proper alternative. With this information in hand, we can select the spatial lag model as the proper model.
5 Instrumental Variables and GMM
In the previous chapter, we learnt how to estimate spatial models using ML. One of the main disadvantages of this method is that it may be computationally intensive when the number of spatial units is large. Recall that this procedure requires the manipulation of n × n matrices: matrix multiplication, matrix inversion, the computation of characteristic roots, and so on.
In this chapter, we will study the instrumental variables and generalized method of moments (IV/GMM) estimators. One of the reasons for developing IV/GMM estimators was as a response to the perceived computational difficulties of the ML method (Kelejian and Prucha, 1998, 1999). Unlike ML, the IV/GMM procedure does not require the computation of the Jacobian, and it does not rely on the normality assumption.
θ̂_n = argmin_θ g_n(w_1, ..., w_n, θ)^T Υ g_n(w_1, ..., w_n, θ),   (5.1)

where g_n(·) is (S × 1) and Υ is (S × S). If S = K, the weighting matrix is irrelevant and θ̂_n can be found as a solution to the moment condition:

g_n(w_1, ..., w_n, θ̂) = 0.   (5.2)
The classical GMM literature exploits linear moment conditions of the form

E[(1/n) Σ_{i=1}^n h_i^T u_i] = 0,

which holds since E[h_i^T u_i] = h_i^T E[u_i] = 0 under the maintained assumptions. The spatial literature frequently considers quadratic moment conditions. Let A_q, with elements (a_{ijq}), be some n × n matrix with tr(A_q) = 0, and assume for ease of exposition that A_q is non-stochastic. Then the quadratic moment conditions considered in the spatial literature are of the form:

E[(1/n) Σ_{i=1}^n Σ_{j=1}^n a_{ijq} u_i u_j] = 0,   (5.3)
which clearly holds under the maintained assumptions. To see this, let u = [u_1, ..., u_n]^T; then the moment conditions in (5.3) can be rewritten as:

E[u^T A_q u / n] = tr(A_q E[uu^T])/n = σ² tr(A_q)/n = 0,

since under the maintained assumptions E[uu^T] = σ² I_n.
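A quick Monte Carlo illustration of this property (a sketch; the trace-zero matrix A below is hypothetical):
# E[u' A u]/n = sigma^2 tr(A)/n, which is zero when tr(A) = 0
set.seed(42)
n <- 100
A <- matrix(rnorm(n * n), n, n)
diag(A) <- 0                                   # enforce tr(A) = 0
qm <- replicate(2000, { u <- rnorm(n); drop(t(u) %*% A %*% u) / n })
mean(qm)                                       # approximately zero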
Now let θ_0 = [λ_0, δ_0^T]^T and suppose the sample moment vector in (5.2) can be decomposed into:

g_n(w_1, ..., w_n, θ) = [ g_n^λ(w_1, ..., w_n, λ, δ) ; g_n^δ(w_1, ..., w_n, λ, δ) ],

where λ is, for example, the spatial autoregressive parameter and δ collects the rest of the parameters of the model, such that:
and that some easily (and consistently) computable initial estimator for δ_0, say δ̂_n, is available. In this case we may consider the following GMM estimator for λ_0 corresponding to some weighting matrix Υ_n^{λλ}:

Utilizing λ̂_n, we may further consider the following estimator for δ_0 corresponding to some weighting matrix Υ_n^{δδ}:
GMM estimators like θ̂_n in Equation (5.1) are often referred to as one-step estimators. Estimators like λ̂_n and δ̂_n in Equations (5.4) and (5.5) above, where the sample moments depend on some initial estimator, are often referred to as two-step estimators.
If the model conditions are valid, we would expect the most efficient one-step estimator to be more efficient than the most efficient two-step estimator. However, as usual, there are trade-offs. One trade-off is in terms of computation. Recall that for small sample sizes ML is available as an alternative to GMM; for large sample sizes, statistical efficiency may be less important than computational efficiency and feasibility, and thus the use of two-step GMM estimators may be attractive. Also, Monte Carlo studies suggest that in many situations the loss of efficiency may be relatively small. Another trade-off is that the misspecification of one moment condition will typically result in inconsistent estimates of all model parameters.
G_n(θ) ≡ ∂g_n(θ)/∂θ^T.

Now, applying a Taylor expansion to g_n(θ̂) yields:

g_n(θ̂) = g_n(θ_0) + G_n(θ̄)(θ̂_n − θ_0),   (5.7)

where g_n is (S × 1), G_n is (S × K), and θ̄ lies between θ̂_n and θ_0. Assume that:

G_n(θ̂) →p G by some LLN   (5.8)
Υ_n →p Υ by some LLN   (5.9)
√n g_n(θ_0) →d N(0, Ψ) by some CLT   (5.10)
where Ψ is some positive definite matrix. Then, applying traditional asymptotic arguments:

√n(θ̂_n − θ_0) →d N(0, Φ),

where:

Φ = (G^T Υ G)^{-1} G^T Υ Ψ Υ G (G^T Υ G)^{-1}.
It can be seen that if we choose Υ = Ψ̂_n^{-1} (weights given by the inverse of the variance-covariance matrix of the moment conditions), where Ψ̂_n →p Ψ, the variance-covariance matrix simplifies to

Var(θ̂_n) = Φ = (G^T Ψ^{-1} G)^{-1}.
Since (G^T Υ G)^{-1} G^T Υ Ψ Υ G (G^T Υ G)^{-1} − (G^T Ψ^{-1} G)^{-1} is positive semidefinite, it follows that Υ = Ψ̂_n^{-1} gives the optimal GMM estimator (the smallest asymptotic variance).
However, note that we need a CLT applicable to triangular arrays; in particular, we need a CLT for linear-quadratic forms. The following theorems will be useful when deriving the asymptotic properties of spatial GMM estimators.
Theorem 5.1 — CLT for triangular arrays with homoskedastic errors (Kelejian and Prucha, 1998). Let {v_{i,n}, 1 ≤ i ≤ n, n ≥ 1} be a triangular array of identically distributed random variables. Assume that the random variables {v_{i,n}, 1 ≤ i ≤ n} are jointly independently distributed for each n, with E(v_{i,n}) = 0 and E(v_{i,n}²) = σ² < ∞. Let {a_{ij,n}, 1 ≤ i ≤ n, n ≥ 1}, j = 1, ..., k, be triangular arrays of real numbers that are bounded in absolute value, and let

v_n = (v_{1,n}, ..., v_{n,n})^T,  A_n = [a_{ij,n}] an n × k matrix.

Then, provided Q_{AA} = lim_{n→∞} (1/n) A_n^T A_n exists and is nonsingular,

(1/√n) A_n^T v_n →d N(0, σ² Q_{AA}).
Theorem 5.2 — CLT for Vectors of Linear Quadratic Forms with Heteroskedastic Innovations. Assume the following:
1. For r = 1, \ldots, m let A_{r,n} with elements (a_{ijr})_{i,j=1,\ldots,n} be an n × n non-stochastic symmetric real matrix with \sup_{1\le j\le n, n\ge 1} \sum_{i=1}^n |a_{ijr}| < \infty,
2. and let a_r = (a_{1r}, \ldots, a_{nr})^\top be an n × 1 non-stochastic real vector with \sup_n n^{-1}\sum_{i=1}^n |a_{ir}|^{\delta_1} < \infty for some \delta_1 > 2.
3. Let \varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^\top be an n × 1 random vector with the \varepsilon_i distributed totally independently with E[\varepsilon_i] = 0, E(\varepsilon_i^2) = \sigma_i^2, and \sup_{1\le i\le n, n\ge 1} E|\varepsilon_i|^{\delta_2} < \infty for some \delta_2 > 4.
Consider the m × 1 vector of linear quadratic forms v_n = [Q_{1n}, \ldots, Q_{mn}]^\top with:
\[ Q_{rn} = \varepsilon^\top A_r \varepsilon + a_r^\top \varepsilon = \sum_{i=1}^n \sum_{j=1}^n a_{ijr}\varepsilon_i\varepsilon_j + \sum_{i=1}^n a_{ir}\varepsilon_i. \tag{5.11} \]
Let \mu_{v_n} = E[v_n] = [\mu_{Q_1}, \ldots, \mu_{Q_m}]^\top and \Sigma_{v_n} = [\sigma_{Q_{rs}}]_{r,s=1,\ldots,m} denote the mean and VC matrix of v_n, respectively. Then:
\[ \mu_{Q_r} = \sum_{i=1}^n a_{iir}\sigma_i^2, \]
\[ \sigma_{Q_{rs}} = 2\sum_{i=1}^n\sum_{j=1}^n a_{ijr}a_{ijs}\sigma_i^2\sigma_j^2 + \sum_{i=1}^n a_{ir}a_{is}\sigma_i^2 + \sum_{i=1}^n a_{iir}a_{iis}\left[\mu_i^{(4)} - 3\sigma_i^4\right] + \sum_{i=1}^n \left(a_{ir}a_{iis} + a_{is}a_{iir}\right)\mu_i^{(3)}, \]
with \mu_i^{(3)} = E(\varepsilon_i^3) and \mu_i^{(4)} = E(\varepsilon_i^4). Furthermore, given that n^{-1}\lambda_{\min}(\Sigma_{v_n}) \ge c for some c > 0, then
\[ \Sigma_{v_n}^{-1/2}\left(v_n - \mu_{v_n}\right) \xrightarrow{d} N(0, I_m), \]
and thus:
\[ n^{-1/2}\left(v_n - \mu_{v_n}\right) \overset{a}{\sim} N\left(0, n^{-1}\Sigma_{v_n}\right). \]
Kelejian and Prucha (2001) introduced a CLT for a single quadratic form under the assumptions useful
for spatial models. The generalization to vectors of linear quadratic forms is given in Kelejian and Prucha
(2010).
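To make Theorem 5.2 concrete, the following R sketch simulates a single linear quadratic form Q = ε'Aε + a'ε with independent heteroskedastic normal errors and checks the analytic mean and variance against Monte Carlo draws; for normal errors the third- and fourth-moment terms vanish, since μ(3) = 0 and μ(4) = 3σ⁴. The matrix A and vector a are arbitrary illustrative choices.
# Monte Carlo check of the mean/variance of Q = e'Ae + a'e with
# independent heteroskedastic normal errors (illustrative A and a).
set.seed(42)
n  <- 50
A  <- matrix(rnorm(n * n), n, n); A <- (A + t(A)) / 2   # symmetrize
a  <- rnorm(n)
s2 <- runif(n, 0.5, 2)                                  # sigma_i^2

mu_Q  <- sum(diag(A) * s2)                              # sum_i a_ii sigma_i^2
var_Q <- 2 * sum((A^2 %*% s2) * s2) + sum(a^2 * s2)     # normal-error case

Q <- replicate(20000, {
  e <- rnorm(n, sd = sqrt(s2))
  drop(t(e) %*% A %*% e) + sum(a * e)
})
c(analytic_mean = mu_Q, simulated_mean = mean(Q))
c(analytic_var = var_Q, simulated_var = var(Q))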
Consider the two-step GMM estimator for \lambda_0 defined in Equation (5.4). Applying this approach, and assuming typical regularity conditions, we get:
\[ \sqrt{n}\left(\hat\lambda_n - \lambda_0\right) = -\left[\left(G_n^{\lambda\lambda}\right)^\top \Upsilon_n^{\lambda\lambda} G_n^{\lambda\lambda}\right]^{-1}\left(G_n^{\lambda\lambda}\right)^\top \Upsilon_n^{\lambda\lambda}\left[\sqrt{n}\,g_n^\lambda(\lambda_0, \delta_0) + G_n^{\lambda\delta}\sqrt{n}\left(\hat\delta_n - \delta_0\right)\right] + o_p(1), \tag{5.12} \]
where
\[ \frac{\partial g_n^\lambda(\lambda_0,\delta_0)}{\partial\lambda} \xrightarrow{p} G^{\lambda\lambda}, \qquad \frac{\partial g_n^\lambda(\lambda_0,\delta_0)}{\partial\delta} \xrightarrow{p} G^{\lambda\delta}, \qquad \Upsilon_n^{\lambda\lambda} \xrightarrow{p} \Upsilon^{\lambda\lambda}. \]
In many cases the estimator \hat\delta_n will be asymptotically linear in the sense that
\[ \sqrt{n}\left(\hat\delta_n - \delta_0\right) = \frac{1}{\sqrt{n}} T_n^\top u_n + o_p(1), \]
where T_n is a non-stochastic n × k_\delta matrix, k_\delta is the dimension of \delta_0, and u_n = (u_1, \ldots, u_n)^\top. Now define:
\[ g_{*n}^\lambda(\lambda_0, \delta_0) = g_n^\lambda(\lambda_0, \delta_0) + \frac{1}{n} G^{\lambda\delta} T_n^\top u_n. \]
Then Equation (5.12) can be rewritten as:
\[ \sqrt{n}\left(\hat\lambda_n - \lambda_0\right) = -\left[\left(G^{\lambda\lambda}\right)^\top \Upsilon^{\lambda\lambda} G^{\lambda\lambda}\right]^{-1}\left(G^{\lambda\lambda}\right)^\top \Upsilon^{\lambda\lambda}\,\sqrt{n}\,g_{*n}^\lambda(\lambda_0, \delta_0) + o_p(1), \tag{5.13} \]
with:
\[ \Phi_*^{\lambda\lambda} = \left[\left(G^{\lambda\lambda}\right)^\top \Upsilon^{\lambda\lambda} G^{\lambda\lambda}\right]^{-1}\left(G^{\lambda\lambda}\right)^\top \Upsilon^{\lambda\lambda}\,\Psi_*^{\lambda\lambda}\,\Upsilon^{\lambda\lambda} G^{\lambda\lambda}\left[\left(G^{\lambda\lambda}\right)^\top \Upsilon^{\lambda\lambda} G^{\lambda\lambda}\right]^{-1}, \]
where \Psi_*^{\lambda\lambda} denotes the limiting VC matrix of \sqrt{n}\,g_{*n}^\lambda(\lambda_0,\delta_0). From this it is seen that if we choose \Upsilon_n^{\lambda\lambda} = \left(\hat\Psi_{*n}^{\lambda\lambda}\right)^{-1}, where \hat\Psi_{*n}^{\lambda\lambda} \xrightarrow{p} \Psi_*^{\lambda\lambda}, then the variance-covariance matrix simplifies to:
\[ \Phi_*^{\lambda\lambda} = \left[\left(G^{\lambda\lambda}\right)^\top \left(\Psi_*^{\lambda\lambda}\right)^{-1} G^{\lambda\lambda}\right]^{-1}. \]
So, using as weighting matrix \Upsilon_n^{\lambda\lambda} a consistent estimator of the inverse of the limiting variance-covariance matrix \Psi_*^{\lambda\lambda} yields the efficient two-step GMM estimator.
Suppose that Equation (5.10) holds and:
\[ \Psi = \begin{bmatrix} \Psi^{\lambda\lambda} & \Psi^{\lambda\delta} \\ \Psi^{\delta\lambda} & \Psi^{\delta\delta} \end{bmatrix}; \]
then the limiting distribution of the sample moment vector g_n^\lambda evaluated at the true parameters is given by:
\[ \sqrt{n}\,g_n^\lambda(\lambda_0, \delta_0) \xrightarrow{d} N\left(0, \Psi^{\lambda\lambda}\right). \]
Note that in general \Psi_*^{\lambda\lambda} \neq \Psi^{\lambda\lambda} unless G^{\lambda\delta} = 0, and that in general \Psi_*^{\lambda\lambda} will depend on T_n, which in turn will depend on the employed estimator \hat\delta_n. In other words, unless G^{\lambda\delta} = 0, for a two-step GMM estimator we cannot simply use the variance-covariance matrix \Psi^{\lambda\lambda} of the sample moment vector g_n^\lambda(\lambda_0, \delta_0); rather, we need to work with the variance-covariance matrix \Psi_*^{\lambda\lambda}.
Prucha (2014) illustrates the difference between \Psi^{\lambda\lambda}, with elements \Psi_{rs}^{\lambda\lambda}, and \Psi_*^{\lambda\lambda}, with elements \Psi_{*rs}^{\lambda\lambda}, for the important special case where the moment conditions are quadratic and u_i is i.i.d. N(0, \sigma^2). For simplicity assume that:
\[ g_n^\lambda(\lambda_0, \delta_0) = \begin{bmatrix} \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n a_{ij1}u_iu_j \\ \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n a_{ij2}u_iu_j \end{bmatrix}. \]
Now, for r = 1, 2, let a_{ir} denote the (r, i)th element of G^{\lambda\delta}T_n^\top. Then, by the definition of g_{*n}^\lambda:
\[ g_{*n}^\lambda(\lambda_0, \delta_0) = \begin{bmatrix} \frac{1}{n}\sum_i\sum_j a_{ij1}u_iu_j + \frac{1}{n}\sum_i a_{i1}u_i \\ \frac{1}{n}\sum_i\sum_j a_{ij2}u_iu_j + \frac{1}{n}\sum_i a_{i2}u_i \end{bmatrix}, \]
but
\[ \Psi_{*rs}^{\lambda\lambda} = 2\sigma^4\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n a_{ijr}a_{ijs} + \sigma^2\frac{1}{n}\sum_{i=1}^n a_{ir}a_{is}. \]
Note that a_{ir} and a_{is} in the last sum on the RHS of the expression for \Psi_{*rs}^{\lambda\lambda} depend on which estimator \hat\delta_n is employed in the sample moment vector g_n^\lambda(\lambda_0, \hat\delta) used to form the objective function for the two-step GMM estimator \hat\lambda_n defined in Equation (5.4). It is for this reason that, in the literature on two-step GMM estimation, users are often advised to follow a specific sequence of steps to ensure proper estimation of the respective variance-covariance matrices.
5.2 Spatial Two Stage Estimation of SLM
Recall the SLM:²
\[ y = \rho W y + X\beta + \varepsilon. \]
A more concise way to express the model is:
\[ y = Z\delta + \varepsilon, \]
where Z = [X, W y] and the (K + 1) × 1 coefficient vector is rearranged as \delta = (\beta^\top, \rho)^\top. As we have previously shown in Section 3.1, the presence of the spatially lagged dependent variable on the right-hand side of the equation induces endogeneity, or simultaneous equation bias; therefore the OLS estimates are inconsistent.
Instead of applying the QML or ML estimation procedure, we might rely on the instrumental variables approach to deal with the endogeneity caused by the spatial lag variable. The principle of instrumental variables estimation is the existence of a set of instruments H that are strongly correlated with Z but asymptotically uncorrelated with \varepsilon.
At this point it is important to stress that the only endogenous variable in this model is the spatially lagged dependent variable. Therefore, the matrix H should contain all the predetermined variables, that is, X and the instrument(s) for W y. As we will see later, an important feature of this estimation procedure is that it does not require computing the Jacobian term. Another important feature is that it does not impose the strong assumption of normality on the error terms.
2 In particular, Kelejian and Prucha (1998) derived this model as the first step in their Generalized S2SLS.
In principle, the problem is to approximate E(y|X) as closely as possible without computing the inverse of (I_n - \rho_0 W). Note that (5.14) can be expressed as a linear function of X, W X, W^2X, \ldots. As a result, and given that the characteristic roots of \rho W are less than one in absolute value, the conditional expectation can also be written as:
\[ \begin{aligned} E(W y|X) &= W\,E(y|X) \\ &= W\left(I_n - \rho W\right)^{-1}X\beta \\ &= W\left(I_n + \rho W + \rho^2 W^2 + \rho^3 W^3 + \cdots\right)X\beta \\ &= W\left[\sum_{l=0}^{\infty}\rho_0^l W^l\right]X\beta \\ &= W X\beta + W^2 X(\rho\beta) + W^3 X(\rho^2\beta) + W^4 X(\rho^3\beta) + \cdots \end{aligned} \]
To avoid issues associated with the computation of the inverse of the n × n matrix (I_n - \rho_0 W), Kelejian and Prucha (1998, 1999) suggest using an approximation of the best instruments. More specifically, since E(y|X) is linear in X, W X, W^2X, \ldots, they suggest using a set of instruments H which contains, say, X, W X, W^2X, \ldots, W^lX, and computing approximations of the best instruments from a regression of the right-hand-side variables on H, where l is a pre-selected finite constant, generally set to 2 in applied studies. Thus, in general we can write the instrument matrix as:
\[ H = \left(X, W X, W^2 X\right). \]
R The intuition behind the instruments is the following: since X determines y, it must be true that W X, W^2X, \ldots determine W y. Furthermore, since X is uncorrelated with \varepsilon, W X must also be uncorrelated with \varepsilon.
In the theoretical literature, some other suggestions for so-called optimal instruments have been made. For example, using the conditional expectation in (5.14), Lee (2003) suggested the instrument matrix:
\[ H = \left[X,\; W\left(I - \rho W\right)^{-1}X\beta\right], \]
which requires consistent first-round estimates of \rho and \beta. In Kelejian et al. (2004), a similar approach is outlined in which the matrix inverse is replaced by its power expansion. This yields an instrument matrix of the form:
\[ H = \left[X,\; W\left(\sum_{l=0}^{\infty}\rho_0^l W^l\right)X\beta\right]. \]
In any practical implementation, the power expansion must be truncated at some point.
Assumption 5.3 (Heteroskedastic Errors) states the first two moments of the error terms; we do not assume that they are normally distributed. We also allow the error terms to be heteroskedastic, i.e., the unobservables may have a different variance for each spatial unit. Finally, this assumption also allows the innovations to depend on the sample size n, i.e., to form a triangular array. See our discussion of triangular arrays in Section 3.6.1.
Now, we state some assumptions about the behavior of the spatial weight matrix W .
Assumption 5.4 — Diagonal elements of Wn (Kelejian and Prucha, 1998). All diagonal elements of the spatial weighting matrix Wn are zero.
Assumption 5.4 (Diagonal elements of Wn) is a normalization of the model; it also implies that no spatial unit is viewed as its own neighbor.
Assumption 5.5 — Nonsingularity (Kelejian and Prucha, 1998). The matrix (In − ρ0Wn) is nonsingular with |ρ0| < 1.
Under the Nonsingularity Assumption 5.5, we can write the reduced form of the true model as:
\[ y_n = \left(I_n - \rho_0 W_n\right)^{-1}X_n\beta_0 + \left(I_n - \rho_0 W_n\right)^{-1}\varepsilon_n. \tag{5.15} \]
Assumption 5.6 — Bounded matrices (Kelejian and Prucha, 1998). The row and column sums of the matrices Wn and (In − ρ0Wn)⁻¹ are bounded uniformly in absolute value.
This assumption guarantees that the variance of yn in Equation (5.15), which depends on Wn and (In − ρ0Wn)⁻¹, is uniformly bounded in absolute value as n goes to infinity, thus limiting the degree of correlation between the elements of εn and yn. The assumption is technical and will be used in the large-sample derivations of the regression parameter estimator.
R Applied to W, Assumption 5.6 (Bounded matrices) means that each cross-sectional unit can have only a limited number of neighbors; applied to (I − ρW)⁻¹, it limits the degree of correlation across units.
Assumption 5.7 — No Perfect Multicollinearity (Kelejian and Prucha, 1998). The regressor matrices Xn have full column rank (for n large enough). Furthermore, the elements of the matrices Xn are uniformly bounded in absolute value.
Assumption 5.8 — Rank Instruments, (Kelejian and Prucha, 1998). The instrument matrices Hn have full
column rank L ≥ K + 1 for all n large enough. Furthermore, the elements of the matrices Hn are
uniformly bounded in absolute value. They are composed of a subset of the linearly independent columns
of (X, W X, W 2 X, ...).
Assumption 5.9 — Limits of Instruments (Kelejian and Prucha, 1998) . Let Hn be a matrix of instruments,
then:
1. limn→∞ n−1 Hn> Hn = QHH where QHH is finite and nonsingular (full rank).
2. plimn→∞ n−1 Hn> Zn = QHZ where QHZ is finite and has full column rank.
Since the instrument matrix Hn contains the spatially lagged explanatory variables, the first condition in Assumption 5.9 (Limits of Instruments), limn→∞ n⁻¹Hn>Hn = QHH, implies that WnXn and Xn cannot be linearly dependent. This condition would be violated if, for example, WnXn included a spatial lag of the constant term, or if the model were the pure SLM. The second condition in Assumption 5.9 requires a non-null correlation between the instruments and the original variables.
Given all these assumptions, we can define the S2SLS estimator as follows.
Definition 5.2.1 — Spatial Two Stage Least Squares Estimator. Let Hn be the n × L matrix of instruments. Then the S2SLS estimator is given by:
\[ \hat\delta_{S2SLS} = \left(\hat Z_n^\top Z_n\right)^{-1}\hat Z_n^\top y_n, \tag{5.16} \]
where:
\[ \hat Z_n = P_H Z_n = H_n\left(H_n^\top H_n\right)^{-1}H_n^\top Z_n. \tag{5.17} \]
Note that the S2SLS estimator in (5.16) is similar to the standard 2SLS. In the first stage we need the predicted values for Z based on the OLS regression of Z on H. Consider this first stage as the regression Z = H\theta + \xi, so that \hat\theta = (H^\top H)^{-1}H^\top Z. Then the predicted values \hat Z are obtained using Equation (5.17), where P_H is the projection matrix, which is symmetric and idempotent (and hence singular). Note also that H is an n × L matrix which includes the exogenous variables X. It is also important to note that the projection matrix does not affect X, but it does affect the endogenous variable W y:
\[ P_H Z = \left[X, P_H W y\right] = \left[X, \widehat{W y}\right]. \tag{5.18} \]
Note that this approach is in the same spirit as the traditional treatment in a simultaneous equations setting, where each endogenous variable (including the spatial lag) is regressed on the complete set of exogenous variables to form its instrument.
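As a minimal sketch of Definition 5.2.1, the following R code computes the S2SLS estimator "by hand" on simulated data; the data-generating process and all object names here are illustrative assumptions rather than part of the formal results.
# S2SLS by hand on a simulated SLM (illustrative DGP).
set.seed(123)
n <- 400
W <- matrix(0, n, n)
W[cbind(1:n, c(2:n, 1))] <- 1                 # simple "ring" neighbours (assumption)
W <- W / rowSums(W)                           # row-standardize
X <- cbind(1, rnorm(n))
beta <- c(1, 2); rho <- 0.4
y <- solve(diag(n) - rho * W, X %*% beta + rnorm(n))  # reduced form

Z  <- cbind(X, W %*% y)                               # Z = [X, Wy]
H  <- cbind(X, W %*% X[, 2], W %*% (W %*% X[, 2]))    # H = [X, Wx, W^2 x]
Zh <- H %*% solve(crossprod(H), crossprod(H, Z))      # first stage: P_H Z
solve(crossprod(Zh, Z), crossprod(Zh, y))             # (Zhat'Z)^{-1} Zhat'y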
The S2SLS estimator can also be given a GMM interpretation:
\[ \hat\delta_{GMM} = \underset{\delta}{\arg\min}\; g_n(\delta)^\top \Upsilon_n^{-1} g_n(\delta), \]
where the L × 1 sample moment vector is:
\[ g_n(\delta) = \frac{1}{n}H^\top\varepsilon = \frac{1}{n}H^\top\left(y - Z\delta\right). \]
The matrix \Upsilon_n^{-1} is the optimal weighting matrix, which corresponds to the inverse of the covariance matrix of the sample moments:
\[ \Upsilon_n = \frac{1}{n}\hat\sigma^2 H^\top H. \]
Then the function to minimize is:
\[ Q = \frac{1}{n\hat\sigma^2}\left(H^\top y - H^\top Z\delta\right)^\top\left(H^\top H\right)^{-1}\left(H^\top y - H^\top Z\delta\right). \]
Obtaining the first-order conditions and solving for \delta, we obtain:
\[ \hat\delta_{GMM} = \left(Z^\top P_H Z\right)^{-1}Z^\top P_H y. \tag{5.19} \]
Sometimes the model includes additional endogenous variables. A well-known example is the hedonic house price model of Anselin and Lozano-Gracia (2008):
\[ y_i = x_i^\top\beta + \gamma_1\,pol1_i + \gamma_2\,pol2_i + \varepsilon_i, \]
where y_i is the house price, x_i is a vector of controls, pol1_i and pol2_i are the air quality variables, and \varepsilon_i is the error term. Since actual pollution is not observed at location i of the house transaction, it is replaced by a spatially interpolated value, such as a kriging prediction. This interpolated value measures the true pollution with error, causing simultaneous equation bias, so proper instruments are needed for these variables. The authors instrument these endogenous variables using latitude, longitude, and their product.
In particular, we can write the general model with additional endogenous variables as:
\[ y = \rho W y + X\beta + Y\gamma + \varepsilon, \]
where Y collects the endogenous explanatory variables. In a spatial lag model, an additional question is whether these instruments (for the endogenous explanatory variables) should also be included in spatially lagged form, similar to what is done for the exogenous variables. As before, the rationale for this comes from the structure of the reduced form, which in this case is:
\[ E\left[W y|Z\right] = W\left(I - \rho W\right)^{-1}X\beta + W\left(I - \rho W\right)^{-1}Y\gamma, \]
where Z = [X, Y]. The problem here is that the Y are endogenous, and thus they do not belong on the right-hand side of the reduced form! If they are replaced by their instruments, the presence of the term W(I − ρW)⁻¹ would suggest the need to include spatial lags as well. In other words, since the system determining y and Y is not completely specified, the optimal instruments are not known (Bivand and Piras, 2015).
We now sketch the consistency argument. First, note that H_n(H_n^\top H_n)^{-1}H_n^\top is symmetric and idempotent, so that \hat Z_n^\top Z_n = \hat Z_n^\top \hat Z_n. As usual, we first write the estimator in terms of the population error term:
\[ \begin{aligned} \hat\delta_n &= \delta_0 + \left(\hat Z_n^\top \hat Z_n\right)^{-1}\hat Z_n^\top\varepsilon_n \\ &= \delta_0 + \left[Z_n^\top H_n\left(H_n^\top H_n\right)^{-1}H_n^\top Z_n\right]^{-1}Z_n^\top H_n\left(H_n^\top H_n\right)^{-1}H_n^\top\varepsilon_n, \end{aligned} \tag{5.20} \]
where we used Assumption 5.8 (Rank Instruments). Solving for \hat\delta_n - \delta_0, the key term is n^{-1}H_n^\top\varepsilon_n, for which:
\[ E\left(\frac{1}{n}H_n^\top\varepsilon_n\right) = 0, \qquad \operatorname{Var}\left(\frac{1}{n}H_n^\top\varepsilon_n\right) = \frac{1}{n^2}H_n^\top\Sigma_n H_n. \]
Since \operatorname{Var}\left(n^{-1}H_n^\top\varepsilon_n\right) \to 0, by Chebyshev's inequality (Theorem 5.10) n^{-1}H_n^\top\varepsilon_n \xrightarrow{p} 0, and hence \hat\delta_n \xrightarrow{p} \delta_0.
Theorem 5.11 — Spatial 2SLS Estimator for SLM. Suppose that Assumptions 5.3 to 5.9 hold. Then the S2SLS estimator defined as
\[ \hat\delta_n = \left(\hat Z_n^\top \hat Z_n\right)^{-1}\hat Z_n^\top y_n \tag{5.25} \]
is consistent and asymptotically normal.
Theorem 5.11 gives us a very general asymptotic distribution for the S2SLS estimator. The estimator of Σ will be based on HAC estimators. However, under certain conditions the asymptotic variance-covariance matrix of the estimator can be simplified. For example, under homoskedasticity the asymptotic variance-covariance matrix reduces to:
\[ \operatorname{Var}\left(\hat\delta_{2SLS}\right) = \sigma^2\left(Q_{HZ}^\top Q_{HH}^{-1}Q_{HZ}\right)^{-1}. \tag{5.29} \]
A good estimator of the asymptotic variance is:
\[ \widehat{\operatorname{Var}}\left(\hat\delta_{2SLS}\right) = \hat\sigma^2\left[Z^\top H\left(H^\top H\right)^{-1}H^\top Z\right]^{-1}, \tag{5.30} \]
where:
\[ \hat\sigma^2 = \frac{\hat\varepsilon^\top\hat\varepsilon}{n}, \qquad \hat\varepsilon = y - \hat y. \tag{5.31} \]
To illustrate, consider again the SLM
\[ y = \rho W y + X\beta + \varepsilon, \]
where y is our crime variable and X contains a vector of ones and the variables INC and HOVAL. We estimate this model again by the ML procedure and then compare it with the S2SLS procedure. In R there exist two functions to compute the S2SLS estimates: stsls from spdep, and stslshac from the sphet package (Piras, 2010). The latter also allows estimating S2SLS under heteroskedasticity using HAC standard errors.
We first load the required packages and dataset:
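A minimal sketch, assuming the classic Columbus data and GAL neighbours file distributed with spData (lagsarlm and friends now live in spatialreg, which spdep re-exports):
# Load packages and the Columbus crime data
library(spdep)
library(spatialreg)
library(sphet)

data(columbus, package = "spData")
col.gal.nb <- read.gal(system.file("weights/columbus.gal", package = "spData"))
listw <- nb2listw(col.gal.nb, style = "W")   # row-standardized weights list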
Now we estimate the SLM by ML, using Ord's eigen approximation of the determinant, and by S2SLS with homoskedastic and robust standard errors.
# Estimate models
slm <- lagsarlm(CRIME ~ INC + HOVAL,
data = columbus,
listw,
method = "eigen")
s2sls <- stsls(CRIME ~ HOVAL + INC,
data = columbus,
listw = listw,
robust = FALSE,
W2X = TRUE)
s2sls_rob <- stsls(CRIME ~ HOVAL + INC,
data = columbus,
listw = listw,
robust = TRUE,
W2X = TRUE)
s2sls_pir <- stslshac(CRIME ~ INC + HOVAL,
data = columbus,
listw = listw,
HAC = FALSE)
The stsls function fits the SLM by S2SLS, with the option of adjusting the results for heteroskedasticity. Note that the arguments are similar to those of lagsarlm. The robust argument of stsls defaults to FALSE; if TRUE, the function applies a heteroskedasticity correction to the coefficient covariances. Note that the third model, s2sls_rob, uses this option. The argument W2X controls the number of instruments: when W2X = FALSE only W X is used as instruments, whereas when W2X = TRUE both W X and W²X are used as instruments for W y. The function stslshac from sphet with the argument HAC = FALSE computes the S2SLS estimates with homoskedastic standard errors, without adjusting for heteroskedasticity.
Some caution should be exercised regarding the standard errors. When the argument robust = FALSE is used, the variance-covariance matrix is computed as:
\[ \widehat{\operatorname{Var}}\left(\hat\delta_{2SLS}\right) = \hat\sigma^2\left(\hat Z^\top \hat Z\right)^{-1}, \]
where:
\[ \hat\sigma^2 = \frac{\hat\varepsilon^\top\hat\varepsilon}{n - K}, \qquad \hat\varepsilon = y - \hat y. \]
Note that the error variance is calculated with a degrees-of-freedom correction (i.e., dividing by n − K). When robust = TRUE, the variance-covariance matrix is computed as we have previously stated, that is:
\[ \widehat{\operatorname{Var}}\left(\hat\delta_{2SLS}\right) = \hat\sigma^2\left[Z^\top H\left(H^\top H\right)^{-1}H^\top Z\right]^{-1}. \]
5.3 Generalized Moment Estimation of SEM Model
As Kelejian and Prucha (1999) state, the generalized moments estimator they suggest is computationally simple irrespective of the sample size, which makes it very attractive for very large spatial databases. Since IV/GMM estimators ignore the Jacobian term, many of the problems related to matrix inversion, the computation of characteristic roots, and/or Cholesky decompositions can be avoided. Another motivation was that, at the time, no formal results were available regarding the consistency and asymptotic normality of the ML estimator (Prucha, 2014, p. 1608). Recall that Lee formally derived the asymptotic properties of the ML estimator for the SLM only in 2004.3
Recall that the SEM model is given by:
\[ \begin{aligned} y &= X\beta + u, \\ u &= \lambda M u + \varepsilon. \end{aligned} \tag{5.32} \]
In brief, Kelejian and Prucha (1999) suggest using nonlinear least squares to obtain a consistent generalized moments estimator for λ, which can then be used to obtain consistent estimates of β via an FGLS approach. The main difference between the Generalized Moments (GM) estimation discussed here and the Generalized Method of Moments (GMM) estimation discussed later is that in the former there is no inference on the spatial autoregressive coefficient. In other words, λ is viewed purely as a nuisance parameter whose only function is to aid in obtaining consistent estimates of β.
R The GM procedure proposed by Kelejian and Prucha (1999) was originally motivated by the computational difficulties of ML.
R Kelejian and Prucha (1999) do not provide an asymptotic variance for λ. Thus, some software provides the estimate λ̂ but not its standard error.
One advantage of the GM estimator (and of QML) is that it does not rely on the assumption of normality of the disturbances ε. Nonetheless, both estimators assume that the εi are independently and identically distributed for all i with zero mean and variance σ². To begin, we state the same assumption about the error terms as in Kelejian and Prucha (1999).
Assumption 5.12 — Homoskedastic Errors (Kelejian and Prucha, 1999). The innovations {εi,n, 1 ≤ i ≤ n, n ≥ 1} are independently and identically distributed for all n with zero mean and variance σ², where 0 < σ² < b with b < ∞. Additionally, the innovations are assumed to possess finite fourth moments.
Assumption 5.13 — Weight Matrix Mn (Kelejian and Prucha, 1999). Assume the following:
Given Equation (5.32) and Assumption 5.13 (Weight Matrix Mn), we can write u = (I − λM)⁻¹ε. Therefore, the expectation and variance of u are E(u) = 0 and E(uu>) = Ω(λ), respectively, where:
\[ E\left(uu^\top\right) = \Omega = \sigma^2\left[\left(I_n - \lambda M\right)^\top\left(I_n - \lambda M\right)\right]^{-1}, \tag{5.34} \]
and the corresponding generalized least squares (GLS) estimator of β (assuming we know λ0) is:
\[ \hat\beta_{GLS} = \left(X^\top\Omega^{-1}X\right)^{-1}X^\top\Omega^{-1}y. \]
From Equation (5.34) it can be observed that Ω contains a matrix inverse, so its inverse Ω⁻¹ is simply the product of the two spatial filters scaled by 1/σ². Thus, the expression for the GLS estimator simplifies to:
\[ \begin{aligned} \hat\beta_{GLS} &= \left[X^\top\frac{1}{\sigma^2}\left(I_n - \lambda M\right)^\top\left(I_n - \lambda M\right)X\right]^{-1}X^\top\frac{1}{\sigma^2}\left(I_n - \lambda M\right)^\top\left(I_n - \lambda M\right)y \\ &= \left[X^\top\left(I_n - \lambda M\right)^\top\left(I_n - \lambda M\right)X\right]^{-1}X^\top\left(I_n - \lambda M\right)^\top\left(I_n - \lambda M\right)y. \end{aligned} \]
The FGLS estimator substitutes a consistent estimate of λ into this expression:
\[ \hat\beta_{FGLS} = \left[X^\top\left(I_n - \hat\lambda M\right)^\top\left(I_n - \hat\lambda M\right)X\right]^{-1}X^\top\left(I_n - \hat\lambda M\right)^\top\left(I_n - \hat\lambda M\right)y. \]
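Given a consistent λ̂, the FGLS estimator amounts to spatially filtering y and X and running OLS on the filtered variables. A minimal sketch, assuming y, X, M, and lambda_hat already exist in the workspace:
# FGLS for the SEM given a consistent estimate lambda_hat (assumed available)
fgls_sem <- function(y, X, M, lambda_hat) {
  A  <- diag(nrow(X)) - lambda_hat * M      # spatial filter (I - lambda M)
  ys <- A %*% y                             # filtered dependent variable
  Xs <- A %*% X                             # filtered regressors
  solve(crossprod(Xs), crossprod(Xs, ys))   # OLS on the filtered variables
}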
Solving the second equation of (5.32) for the innovations gives:
\[ \varepsilon = u - \lambda M u, \]
where ε is the idiosyncratic error and u is the regression error. The GM estimation approach employs the following simple quadratic moment conditions:⁴
\[ E\left(n^{-1}\varepsilon^\top\varepsilon\right) = \sigma^2, \qquad E\left(n^{-1}\varepsilon^\top M^\top M\varepsilon\right) = \frac{1}{n}E\left[\operatorname{tr}\left(M^\top M\varepsilon\varepsilon^\top\right)\right] = \frac{\sigma^2}{n}\operatorname{tr}\left(M^\top M\right), \qquad E\left(n^{-1}\varepsilon^\top M\varepsilon\right) = 0. \]
The GM estimator of λ in Kelejian and Prucha (1999) is based on these three moments, where we use the fact that tr(X>AX) = tr(AXX>). The value of E(n⁻¹ε>M>Mε) depends on the assumption made about the variance of ε; under homoskedasticity:
\[ E\left(n^{-1}\varepsilon^\top M^\top M\varepsilon\right) = \frac{\sigma^2}{n}\operatorname{tr}\left(M^\top M\right). \]
Definition 5.3.1 — Moment Conditions. Under homoskedasticity (Kelejian and Prucha, 1999) the moment conditions are:
\[ E\left(n^{-1}\varepsilon^\top\varepsilon\right) = \sigma^2, \qquad E\left(n^{-1}\varepsilon^\top M^\top M\varepsilon\right) = \frac{\sigma^2}{n}\operatorname{tr}\left(M^\top M\right), \qquad E\left(n^{-1}\varepsilon^\top M\varepsilon\right) = 0. \]
Under heteroskedasticity (Kelejian and Prucha, 2010) the moment conditions are:
\[ E\left(n^{-1}\varepsilon^\top\varepsilon\right) = \sigma^2, \qquad E\left(n^{-1}\varepsilon^\top M\varepsilon\right) = 0. \]
In order to operationalize the moment conditions, we need to convert conditions on ε into conditions on u (since ε is not observed). Since u = λMu + ε, it follows that ε = u − λMu, i.e., ε is the spatially filtered regression error. Then:
\[ \begin{aligned} \varepsilon^\top\varepsilon &= \left(u - \lambda Mu\right)^\top\left(u - \lambda Mu\right) = u^\top u - 2\lambda u^\top Mu + \lambda^2 u^\top M^\top Mu, \\ \varepsilon^\top M^\top M\varepsilon &= \left(u - \lambda Mu\right)^\top M^\top M\left(u - \lambda Mu\right). \end{aligned} \tag{5.35} \]
Let uL = M u and uLL = M M u.⁵ Taking expectations over (5.35) and assuming homoskedasticity (Assumption 5.12), the condition E(n⁻¹ε>ε) = σ² gives:
\[ \sigma^2 = \frac{1}{n}E\left(u^\top u\right) - \frac{2\lambda}{n}E\left(u^\top u_L\right) + \frac{\lambda^2}{n}E\left(u_L^\top u_L\right), \]
which can be rearranged as:
\[ 0 = \frac{2\lambda}{n}E\left(u^\top u_L\right) - \frac{\lambda^2}{n}E\left(u_L^\top u_L\right) + \sigma^2 - \frac{1}{n}E\left(u^\top u\right). \tag{5.38} \]
In similar fashion, the second and third moment conditions yield:
\[ 0 = \frac{2\lambda}{n}E\left(u_{LL}^\top u_L\right) - \frac{\lambda^2}{n}E\left(u_{LL}^\top u_{LL}\right) + \frac{\sigma^2}{n}\operatorname{tr}\left(M^\top M\right) - \frac{1}{n}E\left(u_L^\top u_L\right), \tag{5.39} \]
\[ 0 = \frac{\lambda}{n}E\left(u^\top u_{LL} + u_L^\top u_L\right) - \frac{\lambda^2}{n}E\left(u_L^\top u_{LL}\right) - \frac{1}{n}E\left(u^\top u_L\right). \tag{5.40} \]
5 Spatially lagged variables are denoted by bar superscripts in the articles. Instead, we will use the L subscript throughout.
That is, a first order spatial lag of y, W y, is denoted by yL . Higher order spatial lags are symbolized by adding additional L
subscripts.
At this point it is important to realize that we have three equations in three unknowns: λ, λ², and σ². Consider the following three-equation system implied by Equations (5.38), (5.39), and (5.40):
\[ \Gamma_n\alpha = \gamma_n, \tag{5.41} \]
where Γn is given in Equation (5.42) and α = (λ, λ², σ²)>.⁶ If Γn were known, Assumption 5.16 (Identification) would imply that Equation (5.41) determines α as:
\[ \alpha = \Gamma_n^{-1}\gamma_n, \]
where:
\[ \Gamma_n = \begin{bmatrix} \frac{2}{n}E\left(u^\top u_L\right) & -\frac{1}{n}E\left(u_L^\top u_L\right) & 1 \\ \frac{2}{n}E\left(u_{LL}^\top u_L\right) & -\frac{1}{n}E\left(u_{LL}^\top u_{LL}\right) & \frac{1}{n}\operatorname{tr}\left(M^\top M\right) \\ \frac{1}{n}E\left(u^\top u_{LL} + u_L^\top u_L\right) & -\frac{1}{n}E\left(u_L^\top u_{LL}\right) & 0 \end{bmatrix} \tag{5.42} \]
and
\[ \gamma_n = \begin{bmatrix} \frac{1}{n}E\left(u^\top u\right) \\ \frac{1}{n}E\left(u_L^\top u_L\right) \\ \frac{1}{n}E\left(u^\top u_L\right) \end{bmatrix}. \tag{5.43} \]
Now we express the moment conditions γn = Γnα as sample averages of observable spatial lags of OLS residuals:
\[ g_n = G_n\alpha + \upsilon_n\left(\lambda, \sigma^2\right), \tag{5.44} \]
where
\[ G_n = \begin{bmatrix} \frac{2}{n}\hat u^\top\hat u_L & -\frac{1}{n}\hat u_L^\top\hat u_L & 1 \\ \frac{2}{n}\hat u_{LL}^\top\hat u_L & -\frac{1}{n}\hat u_{LL}^\top\hat u_{LL} & \frac{1}{n}\operatorname{tr}\left(M^\top M\right) \\ \frac{1}{n}\hat u^\top\hat u_{LL} + \frac{1}{n}\hat u_L^\top\hat u_L & -\frac{1}{n}\hat u_L^\top\hat u_{LL} & 0 \end{bmatrix} \]
and
\[ g_n = \begin{bmatrix} \frac{1}{n}\hat u^\top\hat u \\ \frac{1}{n}\hat u_L^\top\hat u_L \\ \frac{1}{n}\hat u^\top\hat u_L \end{bmatrix}, \]
where Gn is a 3 × 3 matrix and υn(λ, σ²) can be viewed as a vector of residuals. This can be thought of as an OLS regression, where (Kelejian and Prucha, 1998):
\[ \tilde\alpha_n = G_n^{-1}g_n. \tag{5.45} \]
However, the estimator in (5.45) is based on an overparameterization, in the sense that it does not use the information that the second element of α, λ², is the square of the first. Given this, Kelejian and Prucha (1998, 1999) define the GM estimators of λ and σ² as the nonlinear least squares estimators corresponding to Equation (5.44):⁷
\[ \left(\hat\lambda_{NLS,n}, \hat\sigma^2_{NLS,n}\right) = \underset{\lambda,\sigma^2}{\operatorname{argmin}}\left\{\upsilon_n\left(\lambda, \sigma^2\right)^\top\upsilon_n\left(\lambda, \sigma^2\right) : \lambda \in [-a, a],\ \sigma^2 \in [0, b]\right\}. \tag{5.46} \]
Note that \left(\hat\lambda_{NLS,n}, \hat\sigma^2_{NLS,n}\right) are defined as the minimizers of
\[ \left[g_n - G_n\begin{pmatrix}\lambda \\ \lambda^2 \\ \sigma^2\end{pmatrix}\right]^\top\left[g_n - G_n\begin{pmatrix}\lambda \\ \lambda^2 \\ \sigma^2\end{pmatrix}\right]. \]
⁶ Note that we are treating λ² as if it were a new parameter.
⁷ They state that this estimator is more efficient than the OLS estimator; however, both estimators are consistent. See Theorem 2 in Kelejian and Prucha (1998).
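A minimal sketch of the GM machinery: given OLS residuals u_hat and a weights matrix M (both assumed to already exist), the function below builds Gn and gn and minimizes the NLS criterion (5.46) with optim. The parameter bounds are illustrative.
# GM estimator of (lambda, sigma^2) from residuals (sketch).
gm_sem <- function(u_hat, M) {
  n   <- length(u_hat)
  uL  <- drop(M %*% u_hat)                  # first spatial lag of residuals
  uLL <- drop(M %*% uL)                     # second spatial lag
  G <- rbind(c(2 * sum(u_hat * uL) / n,  -sum(uL^2) / n,       1),
             c(2 * sum(uLL * uL) / n,    -sum(uLL^2) / n,      sum(M * M) / n),
             c((sum(u_hat * uLL) + sum(uL^2)) / n, -sum(uL * uLL) / n, 0))
  g <- c(sum(u_hat^2) / n, sum(uL^2) / n, sum(u_hat * uL) / n)
  obj <- function(p) {                      # p = (lambda, sigma^2)
    v <- g - G %*% c(p[1], p[1]^2, p[2])    # residuals of the moment system
    sum(v^2)
  }
  optim(c(0, var(u_hat)), obj, method = "L-BFGS-B",
        lower = c(-0.99, 1e-8), upper = c(0.99, Inf))$par
}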
Assumption 5.14 — Bounded Matrices (Kelejian and Prucha, 1999). The row and column sums of the matrices Mn and (I − λMn)⁻¹ are bounded uniformly in absolute value.
Assumption 5.16 — Identification (Kelejian and Prucha, 1999). Let Γn be the matrix in Equation (5.42). The smallest eigenvalue of Γn>Γn is bounded away from zero; that is, ωmin(Γn>Γn) ≥ ω∗ > 0, where ω∗ may depend on λ and σ².
Theorem 5.17 — Consistency of the GM Estimator (Kelejian and Prucha, 1999). Let
\[ \left(\hat\lambda_{NLS,n}, \hat\sigma^2_{NLS,n}\right) = \underset{\lambda,\sigma^2}{\operatorname{argmin}}\left\{\upsilon_n\left(\lambda, \sigma^2\right)^\top\upsilon_n\left(\lambda, \sigma^2\right) : \lambda \in [-a, a],\ \sigma^2 \in [0, b]\right\}. \]
Then, given Assumptions 5.12 (Homoskedastic Errors), 5.13 (Weight Matrix Mn), 5.14 (Bounded Matrices), 5.15 (Residuals), and 5.16 (Identification),
\[ \left(\hat\lambda_{NLS,n}, \hat\sigma^2_{NLS,n}\right) \xrightarrow{p} \left(\lambda, \sigma^2\right) \quad \text{as } n \to \infty. \tag{5.48} \]
An important remark is that Theorem 5.17 only states that the NLS estimates are consistent; it does not tell us the asymptotic distribution of λ̂NLS,n.
The following theorem is very useful for deriving the asymptotic results:
Theorem 5.18 — Consistency of quadratic forms in spatial models. Let S be an n × n nonstochastic matrix whose row and column sums are uniformly bounded in absolute value. Let v> = (v1, ..., vn), where the vi are iid(0, σ²) with E(vi⁴) < ∞. Then:
\[ \frac{v^\top Sv}{n} - E\left(\frac{v^\top Sv}{n}\right) \xrightarrow{p} 0, \qquad E\left(\frac{v^\top Sv}{n}\right) = \sigma^2\frac{\operatorname{tr}(S)}{n}. \]
If the limit of tr(S)/n exists, then:
\[ \lim_{n\to\infty}\frac{\operatorname{tr}(S)}{n} = S^*, \qquad \frac{v^\top Sv}{n} \xrightarrow{p} \sigma^2 S^*. \]
Sketch of proof for the GM estimator of λ̂. The proof is based on Kelejian and Piras (2017) and consists of two steps. First, we prove consistency of λ̂ for the OLS estimate of α (which is simpler), assuming that the vector u is observed. We then show that u can be replaced by û in the GM estimator of λ. For a more general proof see Kelejian and Prucha (1998, 1999).
1. Assuming that u is observed. Recall that in Equation (5.44) the sample moments are based on the estimated û. If u were observed, we would instead use the following sample moments:
\[ g_n^* = G_n^*\alpha, \]
where
\[ G_n^* = \begin{bmatrix} \frac{2}{n}u^\top u_L & -\frac{1}{n}u_L^\top u_L & 1 \\ \frac{2}{n}u_{LL}^\top u_L & -\frac{1}{n}u_{LL}^\top u_{LL} & \frac{1}{n}\operatorname{tr}\left(M^\top M\right) \\ \frac{1}{n}u^\top u_{LL} + \frac{1}{n}u_L^\top u_L & -\frac{1}{n}u_L^\top u_{LL} & 0 \end{bmatrix} \]
and
\[ g_n^* = \begin{bmatrix} \frac{1}{n}u^\top u \\ \frac{1}{n}u_L^\top u_L \\ \frac{1}{n}u^\top u_L \end{bmatrix}. \]
Recall that:
\[ u = \left(I_n - \lambda M\right)^{-1}\varepsilon, \qquad u_L = M\left(I_n - \lambda M\right)^{-1}\varepsilon, \qquad u_{LL} = MM\left(I_n - \lambda M\right)^{-1}\varepsilon, \]
so the first and second columns of G_n^* are quadratic forms in ε. Then, using Theorem 5.18, we can state that:
\[ G_n^* \xrightarrow{p} \Gamma_n. \]
Also:
\[ \tilde\alpha = \left(G_n^*\right)^{-1}g_n^*, \]
since G_n^* is a nonsingular 3 × 3 matrix. Thus, using our previous results:
\[ \operatorname{plim}\tilde\alpha = \operatorname{plim}\left(G_n^*\right)^{-1}\operatorname{plim}g_n^* = \Gamma_n^{-1}\gamma_n = \alpha. \tag{5.49} \]
2. Replacing u by û. Now suppose u is estimated by OLS residuals, with
\[ \hat\beta = \beta_0 + \Delta_n, \qquad \Delta_n \xrightarrow{p} 0. \]
Then:
\[ \hat u = y - X\hat\beta = y - X\left(\beta_0 + \Delta_n\right) = y - X\beta_0 - X\Delta_n = u - X\Delta_n. \]
Note that, with the exception of the constants in the third column of G_n^*, every element of G_n^* and g_n^* can be expressed as a quadratic form ε>Sε/n, where S is an n × n matrix whose row and column sums are uniformly bounded in absolute value given Assumption 5.14. For example:
\[ \frac{1}{n}u^\top u_L = \frac{1}{n}\varepsilon^\top\left(I_n - \lambda M\right)^{-\top}M\left(I_n - \lambda M\right)^{-1}\varepsilon = \frac{1}{n}\varepsilon^\top S\varepsilon. \]
Then:
\[ \frac{\hat u^\top S\hat u}{n} = \frac{\left(u - X\Delta_n\right)^\top S\left(u - X\Delta_n\right)}{n} = \frac{u^\top Su}{n} - \frac{2\Delta_n^\top X^\top Su}{n} + \frac{\Delta_n^\top X^\top SX\Delta_n}{n}. \]
We need to show that (this would be part of your homework):
\[ \frac{2\Delta_n^\top X^\top Su}{n} \xrightarrow{p} 0, \qquad \frac{\Delta_n^\top X^\top SX\Delta_n}{n} \xrightarrow{p} 0, \]
so that we can say that:
\[ \frac{\hat u^\top S\hat u}{n} \xrightarrow{p} \frac{1}{n}u^\top Su, \]
and finally:
\[ g_n \xrightarrow{p} g_n^* \xrightarrow{p} \gamma_n, \qquad G_n \xrightarrow{p} G_n^* \xrightarrow{p} \Gamma_n. \]
Since we now have a consistent estimate of λ, we can estimate β using the FGLS estimator:
\[ \hat\beta_{FGLS}(\hat\lambda) = \left[X^\top\Omega(\hat\lambda)^{-1}X\right]^{-1}X^\top\Omega(\hat\lambda)^{-1}y, \tag{5.51} \]
where Ω(λ) = (I − λM)⁻¹(I − λM>)⁻¹.
Assumption 5.19 — Limiting Behavior. The elements of X are non-stochastic and bounded in absolute value by cX, 0 < cX < ∞. Also, X has full rank, and the matrix QX = limn→∞ n⁻¹X>X is finite and nonsingular. Furthermore, the matrix QX(λ) = limn→∞ n⁻¹X>Ω(λ)⁻¹X is finite and nonsingular for all |λ| < 1.
The following theorem gives the asymptotic distribution of the FGLS estimator:
Theorem 5.20 — Asymptotic Properties of the FGLS Estimator. If Assumptions 5.12 (Homoskedastic Errors), 5.13 (Weight Matrix Mn), 5.14 (Bounded Matrices), and 5.19 (Limiting Behavior) hold:
1. \[ \sqrt{n}\left(\hat\beta_{GLS} - \beta\right) \xrightarrow{d} N\left(0, \sigma^2 Q_X(\lambda)^{-1}\right). \tag{5.52} \]
2. Let λ̂n be a consistent estimator of λ. Then the true GLS estimator β̂GLS and the feasible GLS estimator β̂FGLS have the same asymptotic distribution.
3. Suppose further that σ̂n² is a consistent estimator of σ². Then σ̂n²[n⁻¹X>Ω(λ̂n)⁻¹X]⁻¹ is a consistent estimator of the asymptotic variance-covariance matrix.
Note that Theorem 5.20 assumes the existence of consistent estimators of λ and σ². It can be shown that the OLS estimator:
\[ \hat\beta_n = \left(X^\top X\right)^{-1}X^\top y \]
is √n-consistent. Thus, the OLS residuals ũi = yi − xi>β̂n satisfy Assumption 5.15 with di,n = |xi| and ∆n = β̂n − β, so OLS residuals can be used to obtain consistent estimators of λ and σ².
Then the feasible GLS estimator is given by:
\[ \hat\beta_{FGLS} = \left[X(\tilde\lambda)^\top X(\tilde\lambda)\right]^{-1}X(\tilde\lambda)^\top y(\tilde\lambda), \]
where:
\[ X(\tilde\lambda) = \left(I - \tilde\lambda M\right)X, \qquad y(\tilde\lambda) = \left(I - \tilde\lambda M\right)y, \]
and:
\[ \hat\sigma^2 = \frac{\hat\varepsilon(\tilde\lambda)^\top\hat\varepsilon(\tilde\lambda)}{n}, \qquad \hat\varepsilon(\tilde\lambda) = y(\tilde\lambda) - X(\tilde\lambda)\hat\beta_{FGLS} = \left(I - \tilde\lambda M\right)\hat u, \qquad \hat u = y - X\hat\beta_{FGLS}. \]
Sketch of Proof of Theorem 5.20. We first prove part (a). Recall that the GLS and FGLS estimators are given by:
\[ \hat\beta_{GLS} = \left(X^\top\Omega(\lambda)^{-1}X\right)^{-1}X^\top\Omega(\lambda)^{-1}y, \qquad \hat\beta_{FGLS} = \left[X^\top\Omega(\hat\lambda)^{-1}X\right]^{-1}X^\top\Omega(\hat\lambda)^{-1}y. \]
Since y = Xβ + u = Xβ + (In − λM)⁻¹ε, the sampling error of β̂GLS is:
\[ \begin{aligned} \hat\beta - \beta &= \left(X^\top\Omega(\lambda)^{-1}X\right)^{-1}X^\top\Omega(\lambda)^{-1}u \\ &= \left(X^\top\Omega(\lambda)^{-1}X\right)^{-1}X^\top\left(I_n - \lambda M\right)^\top\left(I_n - \lambda M\right)\left(I_n - \lambda M\right)^{-1}\varepsilon \\ &= \left(X^\top\Omega(\lambda)^{-1}X\right)^{-1}X^\top\left(I_n - \lambda M\right)^\top\varepsilon, \end{aligned} \]
so that
\[ \sqrt{n}\left(\hat\beta - \beta\right) = \left(\frac{1}{n}X^\top\Omega(\lambda)^{-1}X\right)^{-1}\frac{1}{\sqrt n}A^\top\varepsilon, \]
where A = (In − λM)X. By Assumption 5.19 (Limiting Behavior):
\[ \frac{1}{n}X^\top\Omega(\lambda)^{-1}X \to Q_X(\lambda), \]
and since Q_X(\lambda) is nonsingular:
\[ \left(\frac{1}{n}X^\top\Omega(\lambda)^{-1}X\right)^{-1} \to Q_X(\lambda)^{-1}. \]
Since the elements of A are bounded in absolute value, by Theorem 5.1 it follows that:
\[ \frac{1}{\sqrt n}A^\top\varepsilon \xrightarrow{d} N\left(0, \lim_{n\to\infty}n^{-1}\sigma^2 A^\top A\right), \tag{5.53} \]
where limn→∞ n⁻¹σ²A>A = σ² limn→∞ n⁻¹X>(In − λM)>(In − λM)X = σ²QX(λ). Consequently:
\[ \sqrt n\left(\hat\beta - \beta\right) = \underbrace{\left(\frac{1}{n}X^\top\Omega(\lambda)^{-1}X\right)^{-1}}_{\to Q_X(\lambda)^{-1}}\underbrace{\frac{1}{\sqrt n}A^\top\varepsilon}_{\xrightarrow{d} N(0,\,\sigma^2 Q_X(\lambda))} \xrightarrow{d} N\left(0, Q_X(\lambda)^{-1}\sigma^2 Q_X(\lambda)Q_X(\lambda)^{-1}\right) = N\left(0, \sigma^2 Q_X(\lambda)^{-1}\right). \]
This also implies that β̂GLS is consistent. To show part (b), we can show that:
\[ \sqrt n\left(\hat\beta_{GLS} - \hat\beta_{FGLS}\right) \xrightarrow{p} 0. \]
Following Kelejian and Prucha (1999), it suffices to show that:
\[ \frac{1}{n}X^\top\left[\Omega(\hat\lambda_n)^{-1} - \Omega(\lambda)^{-1}\right]X \xrightarrow{p} 0 \tag{5.54} \]
and
\[ \frac{1}{\sqrt n}X^\top\left[\Omega(\hat\lambda_n)^{-1} - \Omega(\lambda)^{-1}\right]u \xrightarrow{p} 0. \]
Note that:
\[ \Omega(\hat\lambda_n)^{-1} - \Omega(\lambda)^{-1} = \left(\lambda - \hat\lambda_n\right)\left(M + M^\top\right) + \left(\hat\lambda_n^2 - \lambda^2\right)M^\top M, \]
so that
\[ \frac{1}{n}X^\top\left[\Omega(\hat\lambda_n)^{-1} - \Omega(\lambda)^{-1}\right]X = \underbrace{\left(\lambda - \hat\lambda_n\right)}_{o_p(1)}\underbrace{n^{-1}X^\top\left(M + M^\top\right)X}_{O(1)} + \underbrace{\left(\hat\lambda_n^2 - \lambda^2\right)}_{o_p(1)}\underbrace{n^{-1}X^\top M^\top MX}_{O(1)} \xrightarrow{p} 0, \]
where (λ − λ̂n) = op(1) since λ̂n is a consistent estimate of λ, and:
\[ \frac{1}{\sqrt n}X^\top\left[\Omega(\hat\lambda_n)^{-1} - \Omega(\lambda)^{-1}\right]u = \underbrace{\left(\lambda - \hat\lambda_n\right)}_{o_p(1)}\underbrace{n^{-1/2}X^\top\left(M + M^\top\right)u}_{O_p(1)} + \underbrace{\left(\hat\lambda_n^2 - \lambda^2\right)}_{o_p(1)}\underbrace{n^{-1/2}X^\top M^\top Mu}_{O_p(1)} = o_p(1). \tag{5.55} \]
The Op(1) claims follow since:
\[ E\left[n^{-1/2}X^\top\left(M + M^\top\right)u\right] = 0, \qquad \operatorname{Var}\left[n^{-1/2}X^\top\left(M + M^\top\right)u\right] = n^{-1}X^\top\left(M + M^\top\right)\Omega\left(M^\top + M\right)X = O(1), \]
where the variance is O(1) because the matrix in the middle is absolutely summable (O(n) overall, scaled by n⁻¹).
R Any random variable X with cdf F is Op(1) (White, 2014, p. 28).
A feasible GLS (FGLS) estimator can be obtained with the following steps:
Algorithm 5.21 — FGLS Algorithm for the SEM. The steps are the following:
1. Obtain a consistent estimate of β, say β̃, using either OLS or NLS.
2. Use this estimate to obtain an estimate of u, say û.
3. Use û to estimate λ, say λ̂, using (5.46).
4. Use λ̂ to construct Ω(λ̂)⁻¹ and compute the FGLS estimator (5.51).
5.3.4 FGLS in R
The GM estimation procedure is carried out by the GMerrorsar function from the spatialreg package. To show its functionality, we first load the required packages and dataset:
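As before, a minimal sketch assuming the Columbus data shipped with spData:
# Load packages and the Columbus crime data
library(spdep)
library(spatialreg)

data(columbus, package = "spData")
col.gal.nb <- read.gal(system.file("weights/columbus.gal", package = "spData"))
listw <- nb2listw(col.gal.nb, style = "W")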
Now we estimate the SEM model by ML, using Ord's eigen approximation of the determinant, and by the Kelejian and Prucha (1999) GM procedure:
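A sketch consistent with the Call shown in the output below; returnHcov = TRUE is required for the Hausman test:
# SEM by ML (Ord's eigen method) and by the GM procedure
sem_ml <- errorsarlm(CRIME ~ HOVAL + INC,
                     data = columbus, listw = listw, method = "eigen")
sem_mm <- GMerrorsar(CRIME ~ HOVAL + INC,
                     data = columbus, listw = listw, returnHcov = TRUE)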
A Hausman test comparing an OLS and SEM model can be obtained using
# Hausman test
summary(sem_mm, Hausman = TRUE)
##
## Call:GMerrorsar(formula = CRIME ~ HOVAL + INC, data = columbus, listw = listw,
## returnHcov = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.8212 -6.8764 -2.1781 9.5693 28.5779
##
## Type: GM SAR estimator
## Coefficients: (GM standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 63.487150 5.083612 12.4886 < 2.2e-16
## HOVAL -0.300365 0.096799 -3.1030 0.0019160
## INC -1.180414 0.341788 -3.4536 0.0005531
##
## Lambda: 0.3643 (standard error): 0.4318 (z-value): 0.84366
## Residual variance (sigma squared): 109.37, (sigma: 10.458)
The default model specification is shown above; the output follows the familiar R format. Note that even though the estimation procedure is GM, the output presents inference for λ. In this case, the inference is based on the analytical method described in http://econweb.umd.edu/~prucha/STATPROG/OLS/desols.pdf. The output also shows the Hausman test. Recall that this test can be used whenever there are two estimators, one of which is inefficient but consistent (OLS in this case, under the maintained hypothesis of the SEM), while the other is efficient (SEM in this case). The null hypothesis is that the SEM and OLS estimates are not significantly different (see LeSage and Pace, 2010, p. 62). We reject the null hypothesis; thus the SEM model is more appropriate. Table 5.2 compares the estimates.
Table 5.2: ML and GM estimates

              ML           GM
Constant      61.054***    63.487***
              (5.315)      (5.084)
INC           -0.995**     -1.180***
              (0.337)      (0.342)
HOVAL         -0.308***    -0.300**
              (0.093)      (0.097)
λ             0.521***     0.364
              (0.141)      (0.432)
N             49           49

Significance: *** ≡ p < 0.001; ** ≡ p < 0.01; * ≡ p < 0.05
5.4 Estimation of SAC Model: The Feasible Generalized Two Stage Least Squares Estimator Procedure
Consider now the SAC model:
\[ \begin{aligned} y &= X\beta + \rho Wy + u = Z\delta + u, \\ u &= \lambda Mu + \varepsilon, \end{aligned} \tag{5.56} \]
where Z = [X, W y], δ = (β>, ρ)>, y is the n × 1 vector of observations on the dependent variable, and X is the matrix of exogenous regressors.
R This model is generally referred to as the Spatial-ARAR(1, 1) model, to emphasize its autoregressive structure in both the dependent variable and the error term.
The SAC model can be estimated by ML (see Anselin, 1988). However, the estimation process requires the inversion of A = (In − ρW) and B = (In − λM), which can be very costly computationally in large samples. Furthermore, ML relies on the normality assumption for the error terms. One way of dealing with this issue is to incorporate the estimation ideas from the S2SLS and GM procedures previously presented. To see this, we can rewrite the first equation of model (5.56) by applying the following spatial Cochrane-Orcutt transformation:
\[ \begin{aligned} y &= Z\delta + \left(I - \lambda M\right)^{-1}\varepsilon \\ \left(I - \lambda M\right)y &= \left(I - \lambda M\right)Z\delta + \varepsilon \\ y_s(\lambda) &= Z_s(\lambda)\delta + \varepsilon, \end{aligned} \tag{5.57} \]
where the spatially filtered variables are given by:
\[ y_s(\lambda) = y - \lambda My = y - \lambda y_L = \left(I - \lambda M\right)y, \qquad Z_s(\lambda) = Z - \lambda MZ = Z - \lambda Z_L = \left(I - \lambda M\right)Z. \]
If we knew λ, we could apply an IV approach to the transformed model (5.57). For the discussion below, assume that we do. The ideal instruments in this case are:
\[ H = \left(X, WX, \ldots, W^lX, MX, MWX, \ldots, MW^lX\right), \]
which are used to project the endogenous variables:
\[ P_H Z = \left(X, P_H Wy\right), \qquad P_H MZ = \left(MX, P_H MWy\right). \]
Since we have the instruments H, and we have assumed that a consistent estimator λ̂ with λ̂ →p λ0 is available, we might apply a GMM-type procedure using the following moment conditions:
\[ m(\lambda_0, \delta_0) = E\left(\frac{1}{n}H^\top u\right) = 0. \]
Obviously, the corresponding GMM estimator is just the 2SLS estimator. Note that for the transformed model (5.57) the moment conditions would be:
\[ m(\lambda_0, \delta_0) = E\left(\frac{1}{\sqrt n}H^\top\varepsilon\right) = 0. \]
Now let λ̃ be some consistent estimator of λ0, obtained in a previous step; then the sample moment vector is:
\[ m^\delta(\tilde\lambda, \delta) = \frac{1}{\sqrt n}H^\top\underbrace{\left[y_s(\tilde\lambda) - Z_s(\tilde\lambda)\delta\right]}_{\tilde\varepsilon}, \]
where we explicitly state that the moments depend on δ (which will be estimated) and on a consistent estimate of λ. Under homoskedasticity, the variance-covariance matrix of the moment vector is proportional to n⁻¹H>H, so the weighting matrix is taken as:
\[ \Upsilon_n^{\delta\delta} = \left(\frac{1}{n}H^\top H\right)^{-1}. \]
Note that the GMM objective function is:
\[ \begin{aligned} J_n &= \frac{1}{\sqrt n}\left[H^\top\left(y_s(\tilde\lambda) - Z_s(\tilde\lambda)\delta\right)\right]^\top\left(\frac{1}{n}H^\top H\right)^{-1}\frac{1}{\sqrt n}H^\top\left(y_s(\tilde\lambda) - Z_s(\tilde\lambda)\delta\right) \\ &= \left[y_s(\tilde\lambda) - Z_s(\tilde\lambda)\delta\right]^\top H\left(H^\top H\right)^{-1}H^\top\left[y_s(\tilde\lambda) - Z_s(\tilde\lambda)\delta\right] \\ &= \left[y_s(\tilde\lambda) - Z_s(\tilde\lambda)\delta\right]^\top P_H\left[y_s(\tilde\lambda) - Z_s(\tilde\lambda)\delta\right]. \end{aligned} \]
Minimizing Jn with respect to δ yields:
\[ \hat\delta_{FGS2SLS} = \left(\hat Z_s^\top Z_s\right)^{-1}\hat Z_s^\top y_s, \]
where Ẑs = H(H>H)⁻¹H>Zs. This estimator has been called the feasible generalized spatial two-stage least squares (FGS2SLS) estimator (Kelejian and Prucha, 1998). However, this estimator is not fully efficient. The remaining question is how to obtain a consistent estimator λ̂. As you can probably guess, this consistent estimator is obtained in a previous step by GM.
Recall the homoskedastic moment conditions:
\[ E\left(n^{-1}\varepsilon^\top\varepsilon\right) = \sigma^2, \qquad E\left(n^{-1}\varepsilon^\top M^\top M\varepsilon\right) = \frac{\sigma^2}{n}\operatorname{tr}\left(M^\top M\right), \qquad E\left(n^{-1}\varepsilon^\top M\varepsilon\right) = 0. \]
Combining the first two conditions gives:
\[ E\left[\frac{1}{n}\varepsilon^\top\left(M^\top M - \frac{1}{n}\operatorname{tr}\left(M^\top M\right)I\right)\varepsilon\right] = 0, \quad\text{i.e.,}\quad E\left(\frac{1}{n}\varepsilon^\top A_1\varepsilon\right) = 0. \]
Generalizing this expression, we end up with two (instead of three) quadratic moment conditions:
\[ E\left(\frac{1}{n}\varepsilon^\top A_1\varepsilon\right) = 0, \qquad E\left(\frac{1}{n}\varepsilon^\top A_2\varepsilon\right) = 0, \tag{5.59} \]
with
\[ A_1 = M^\top M - \frac{1}{n}\operatorname{tr}\left(M^\top M\right)I, \qquad A_2 = M. \]
Note that A1 is symmetric with tr(A1) = 0 (you should be able to prove this), but its diagonal elements are nonzero (in the heteroskedastic case the diagonal must be zero!). In Drukker et al. (2013), an additional scaling factor is included:
\[ \nu = 1\Big/\left[1 + \left(\frac{1}{n}\operatorname{tr}\left(M^\top M\right)\right)^2\right]. \]
In this case the weighting matrices for the quadratic moments are:
\[ A_1 = \nu\left[M^\top M - \frac{1}{n}\operatorname{tr}\left(M^\top M\right)I\right], \qquad A_2 = M. \]
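A short R sketch of these weighting matrices, assuming an n × n weights matrix M is already in memory; the exact form of the scaling factor ν follows the expression above, which is itself a reconstruction:
# Quadratic-moment weighting matrices for the homoskedastic case
build_A <- function(M) {
  n    <- nrow(M)
  MtM  <- crossprod(M)                       # M'M
  trMM <- sum(diag(MtM))                     # tr(M'M)
  nu   <- 1 / (1 + (trMM / n)^2)             # scaling factor (assumed form)
  A1   <- nu * (MtM - (trMM / n) * diag(n))  # tr(A1) = 0 by construction
  list(A1 = A1, A2 = M)
}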
If the errors are heteroskedastic, then:
\[ A_1 = M^\top M - \operatorname{diag}\left(m_i^\top m_i\right), \qquad A_2 = M, \]
where mi is the ith column of the weights matrix M. Note that diag(mi>mi) consists of the sums of squared elements of the columns of M, so that A1 has zero diagonal elements. For a (possibly asymmetric) matrix Aq, the following symmetrization identity is useful:
\[ u^\top A_q u_L + u_L^\top A_q u = u_L^\top A_q^\top u + u_L^\top A_q u = u_L^\top\left(A_q + A_q^\top\right)u = 2u_L^\top A_q u \quad\text{(if } A_q \text{ is symmetric)}. \tag{5.63} \]
Here it is important to note that in some cases A2 = M might not be symmetric; however, we can always work with the symmetrized form using Equation (5.63). Substituting ε = (I − λM)u into the moment conditions (5.59) yields, for r = 1, 2:
\[ \frac{1}{n}E\left(u^\top A_r u\right) - \lambda\frac{1}{n}E\left[u^\top\left(A_r + A_r^\top\right)Mu\right] + \lambda^2\frac{1}{n}E\left(u^\top M^\top A_r Mu\right) = 0, \]
which can be stacked as:
\[ \gamma_n - \Gamma_n\alpha_n = 0, \tag{5.64} \]
with αn = (λ, λ²)>,
\[ \Gamma_n = \begin{bmatrix} \frac{1}{n}E\left[u^\top\left(A_1 + A_1^\top\right)Mu\right] & -\frac{1}{n}E\left(u^\top M^\top A_1 Mu\right) \\ \frac{1}{n}E\left[u^\top\left(A_2 + A_2^\top\right)Mu\right] & -\frac{1}{n}E\left(u^\top M^\top A_2 Mu\right) \end{bmatrix}, \qquad \gamma_n = \begin{bmatrix} \frac{1}{n}E\left(u^\top A_1 u\right) \\ \frac{1}{n}E\left(u^\top A_2 u\right) \end{bmatrix}, \]
where we used Equation (5.63) for the cross terms. Now we can express the sample moment conditions as in Section 5.3.2:
\[ \underset{2\times 1}{\tilde m} = \underset{2\times 1}{\tilde g} - \underset{2\times 2}{\tilde G}\begin{bmatrix}\lambda \\ \lambda^2\end{bmatrix} = 0, \]
with
\[ \tilde g_1 = \frac{1}{n}\tilde u^\top A_1\tilde u, \qquad \tilde g_2 = \frac{1}{n}\tilde u^\top A_2\tilde u = \frac{1}{n}\tilde u^\top\tilde u_L. \]
The G̃ matrix is given by:
\[ \tilde G_{11} = \frac{2}{n}\tilde u^\top M^\top A_1\tilde u, \tag{5.65} \]
\[ \tilde G_{12} = -\frac{1}{n}\tilde u^\top M^\top A_1 M\tilde u, \tag{5.66} \]
\[ \tilde G_{21} = \frac{1}{n}\tilde u^\top M^\top\left(A_2 + A_2^\top\right)\tilde u, \tag{5.67} \]
\[ \tilde G_{22} = -\frac{1}{n}\tilde u^\top M^\top A_2 M\tilde u. \tag{5.68} \]
R Kelejian and Prucha (1999) show consistency of the Method of Moments estimator of λ, but not asymptotic normality of the estimator.
The variance of the moment conditions will be useful later for the GMM procedure. For this we need some results on quadratic forms. It can be shown that:
\[ E\left(\varepsilon^\top A\varepsilon\right) = \operatorname{tr}\left(A\Sigma\right) + \mu^\top A\mu, \]
where μ and Σ are the expected value and variance-covariance matrix of ε, respectively. This result only depends on the existence of μ and Σ; it does not require normality of ε. For the moment, assume that A is symmetric and ε is normally distributed; then:
\[ \operatorname{Var}\left(\varepsilon^\top A\varepsilon\right) = 2\operatorname{tr}\left(A\Sigma A\Sigma\right) + 4\mu^\top A\Sigma A\mu. \]
If A is not symmetric, we can use the trick in Equation (5.63), which yields:
\[ \operatorname{Cov}\left(\varepsilon^\top A_1\varepsilon, \varepsilon^\top A_2\varepsilon\right) = 2\operatorname{tr}\left[\frac{A_1 + A_1^\top}{2}\Sigma\frac{A_2 + A_2^\top}{2}\Sigma\right] + 4\mu^\top\frac{A_1 + A_1^\top}{2}\Sigma\frac{A_2 + A_2^\top}{2}\mu. \tag{5.69} \]
Now, let Ψ be the 2 × 2 variance-covariance matrix of the moment conditions n⁻¹ε>A_rε, r = 1, 2. In the heteroskedastic case the unknown variances are estimated from the spatially filtered residuals:
\[ \hat\varepsilon_i^2 = \left(\tilde u_i - \tilde\lambda\tilde u_{Li}\right)^2 = \tilde u_{si}^2. \]
5.4.3 Assumptions
Now we state the assumptions for the SAC model under heteroskedasticity, following Arraiz et al. (2010). The assumptions regarding the spatial weight matrices are the following:
Assumption 5.22 — Spatial Weights Matrices (Arraiz et al., 2010). Assume the following:
Assumption 5.22(a) is a normalization rule: a region cannot be a neighbor of itself. Assumption 5.22(b) has to do with the parameter space; this assumption is discussed by Kelejian and Prucha (2010, Section 2.2). Assumption 5.22(c) ensures that y and u are uniquely defined. Thus, under Assumption 5.22 (Spatial Weights Matrices), the reduced form of the model is:
\[ y_n = \left(I_n - \rho W_n\right)^{-1}\left[X_n\beta + u_n\right], \qquad u_n = \left(I_n - \lambda M_n\right)^{-1}\varepsilon_n. \]
Assumption 5.23 — Heteroskedastic Errors (Arraiz et al., 2010). The error terms {εi,n : 1 ≤ i ≤ n, n ≥ 1} satisfy E(εi,n) = 0 and E(ε²i,n) = σ²i,n, with 0 < a ≤ σ²i,n ≤ ā < ∞ for some constants a and ā. Furthermore, for each n ≥ 1 the random variables ε1,n, ..., εn,n are totally independent.
Assumption 5.23 allows the innovations to be heteroskedastic with uniformly bounded variances. This assumption also allows the innovations to depend on the sample size n, i.e., to form a triangular array.
Assumption 5.24 — Bounded Spatial Weight Matrices (Arraiz et al., 2010). The row and column sums of the matrices Wn and Mn are bounded uniformly in absolute value by, respectively, one and some finite constant, and the row and column sums of the matrices (In − ρWn)⁻¹ and (In − λMn)⁻¹ are bounded uniformly in absolute value by some finite constant.
This is a technical assumption used in the large-sample derivation of the regression parameter estimator. It limits the extent of spatial autocorrelation between u and y, and ensures that the disturbance process and the process of the dependent variable exhibit a "fading" memory.
Note that:
\[ E\left[u_n\right] = E\left[\left(I_n - \lambda M_n\right)^{-1}\varepsilon_n\right] = \left(I_n - \lambda M_n\right)^{-1}E\left[\varepsilon_n\right] = 0 \quad\text{by Assumption 5.23 (Heteroskedastic Errors)}, \tag{5.70} \]
\[ E\left[u_n u_n^\top\right] = \left(I_n - \lambda M_n\right)^{-1}E\left[\varepsilon_n\varepsilon_n^\top\right]\left(I_n - \lambda M_n^\top\right)^{-1} = \left(I_n - \lambda M_n\right)^{-1}\Sigma\left(I_n - \lambda M_n^\top\right)^{-1}, \tag{5.71} \]
where Σ = diag(σ²i,n).
Assumption 5.25 — Regressors (Arraiz et al., 2010). The regressor matrices Xn have full column rank (for
n large enough). Furthermore, the elements of the matrices Xn are uniformly bounded in absolute value.
This assumption rules out multicollinearity problems, as well as unbounded exogenous variables.
Assumption 5.26 — Instruments I (Arraiz et al., 2010). The instruments matrices Hn have full column rank
L ≥ K + 1 (for all n large enough). Furthermore, the elements of the matrices Hn are uniformly bounded
in absolute value. Additionally, Hn is assumed to, at least, contain the linearly independent columns of
(Xn , Mn Xn )
There are several papers that discuss the use of optimal instruments for spatial models (see, for example, Lee, 2003; Das et al., 2003; Kelejian et al., 2004; Lee, 2007).
R The effect of the selection of instruments on the efficiency of the estimators remains to be further investigated.
Assumption 5.27 — Instruments II (Identification) (Arraiz et al., 2010). The instruments Hn satisfy further-
more:
The first-step S2SLS residuals are:
\[ \tilde u_{2SLS} = y - Z\tilde\delta_{2SLS}. \tag{5.73} \]
The following theorem states that δ̃2SLS is consistent:
Theorem 5.28 — Consistency of δ̃2SLS (Kelejian and Prucha, 2010). Suppose the assumptions above hold. Then δ̃2SLS = δ0 + Op(n⁻¹ᐟ²), and hence δ̃2SLS is consistent for δ0:
\[ \tilde\delta_{2SLS} \xrightarrow{p} \delta_0. \]
Recall the model:
\[ y_n = Z_n\delta + u_n, \qquad u_n = \lambda M_n u_n + \varepsilon_n. \]
The sampling error is given by:
\[ \begin{aligned} \hat\delta_n &= \delta_0 + \left(\hat Z_n^\top\hat Z_n\right)^{-1}\hat Z_n^\top u_n \\ &= \delta_0 + \left[Z_n^\top H_n\left(H_n^\top H_n\right)^{-1}H_n^\top Z_n\right]^{-1}Z_n^\top H_n\left(H_n^\top H_n\right)^{-1}H_n^\top\left(I_n - \lambda M_n\right)^{-1}\varepsilon_n. \end{aligned} \]
Solving for δ̂n − δ0 and multiplying by √n we obtain:
\[ \sqrt n\left(\hat\delta_n - \delta_0\right) = \left[\frac{Z_n^\top H_n}{n}\left(\frac{H_n^\top H_n}{n}\right)^{-1}\frac{H_n^\top Z_n}{n}\right]^{-1}\frac{Z_n^\top H_n}{n}\left(\frac{H_n^\top H_n}{n}\right)^{-1}\frac{1}{\sqrt n}F_n^\top\varepsilon_n. \]
Before turning to the asymptotic argument, the overall estimation procedure can be summarized in three steps:
1. Estimate δ by S2SLS, δ̃2SLS = (Z̃>Z)⁻¹Z̃>y, and obtain the residuals û2SLS.
2. Obtain the efficient GMM estimator of λ: use ûGMM to compute the weighting matrix Ψ̃ and obtain λ̃OGMM.
3. Estimate the FGS2SLS using λ̃OGMM, δ̂FGS2SLS = (Ẑs>Zs)⁻¹Ẑs>ys, and obtain ûFGS2SLS.
In the expansion above:
\[ F_n^\top = H_n^\top\left(I_n - \lambda M_n\right)^{-1}, \]
whose elements are bounded in absolute value. Assumption 5.27 implies that:
\[ \lim_{n\to\infty}\frac{1}{n}H_n^\top H_n = Q_{HH}, \qquad \operatorname*{plim}_{n\to\infty}\frac{1}{n}H_n^\top Z_n = Q_{HZ}, \]
which are finite and nonsingular.
Furthermore, note that E(n⁻¹ᐟ²Fn>εn) = 0 and
\[ E\left[\left(n^{-1/2}F_n^\top\varepsilon_n\right)\left(n^{-1/2}F_n^\top\varepsilon_n\right)^\top\right] = \frac{1}{n}E\left[H_n^\top\left(I - \lambda M_n\right)^{-1}\varepsilon\varepsilon^\top\left(I - \lambda M_n^\top\right)^{-1}H_n\right] = \sigma^2\frac{1}{n}H_n^\top\left(I - \lambda M_n\right)^{-1}\left(I - \lambda M_n^\top\right)^{-1}H_n. \]
Assume that
\[ \lim_{n\to\infty}\frac{1}{n}H_n^\top\left(I - \lambda M_n\right)^{-1}\left(I - \lambda M_n^\top\right)^{-1}H_n = \lim_{n\to\infty}\frac{1}{n}F_n^\top F_n = \Phi \quad\text{exists}. \]
Then, assuming homoskedasticity and applying the CLT in Theorem 5.1:
\[ n^{-1/2}F_n^\top\varepsilon_n \xrightarrow{d} N\left(0, \sigma^2\Phi\right). \]
Therefore:
\[ \sqrt n\left(\hat\delta_n - \delta_0\right) \xrightarrow{d} N\left(0, \Delta\right), \]
with
\[ \Delta = \sigma^2\left(Q_{HZ}^\top Q_{HH}^{-1}Q_{HZ}\right)^{-1}Q_{HZ}^\top Q_{HH}^{-1}\Phi Q_{HH}^{-1}Q_{HZ}\left(Q_{HZ}^\top Q_{HH}^{-1}Q_{HZ}\right)^{-1}. \]
The GM estimator of λ is then based on the sample moment vector evaluated at the S2SLS residuals:
\[ m\left(\lambda, \tilde\delta_{2SLS}\right) = \frac{1}{n}\begin{bmatrix} \tilde u_{2SLS}^\top\left(I - \lambda M^\top\right)A_1\left(I - \lambda M\right)\tilde u_{2SLS} \\ \tilde u_{2SLS}^\top\left(I - \lambda M^\top\right)A_2\left(I - \lambda M\right)\tilde u_{2SLS} \end{bmatrix} = \tilde G\begin{bmatrix}\lambda \\ \lambda^2\end{bmatrix} - \tilde g, \tag{5.74} \]
where G̃ and g̃ are built from ũ2SLS as above, and the initial GMM estimator of λ minimizes m(λ, δ̃2SLS)>Υλλm(λ, δ̃2SLS) with Υλλ = I. This estimator is consistent but not efficient; for efficiency we need to replace Υλλ by the inverse of the variance-covariance matrix of the sample moments. Furthermore, the expression above can be interpreted as a nonlinear least squares system of equations; the initial estimate is thus obtained as the solution of the above system.
Now we need to define the expressions for the matrices Ar. For the homoskedastic case, Drukker et al. (2013) suggest:
\[ A_1 = \upsilon\left[M^\top M - \frac{1}{n}\operatorname{tr}\left(M^\top M\right)I\right], \qquad A_2 = M, \]
where υ is the scaling factor needed to obtain the same estimator as in Kelejian and Prucha (1998, 1999).
On the other hand, when heteroskedasticity is assumed, Kelejian and Prucha (2010) recommend estimating the (r, s) element of the variance-covariance matrix of the sample moments as:
\[ \tilde\Psi_{rs} = \frac{1}{2n}\operatorname{tr}\left[\left(A_r + A_r^\top\right)\tilde\Sigma\left(A_s + A_s^\top\right)\tilde\Sigma\right] + \frac{1}{n}\tilde a_r^\top\tilde\Sigma\,\tilde a_s, \]
where:
\[ \tilde\Sigma = \operatorname{diag}_{i=1,\ldots,n}\left(\tilde\varepsilon_i^2\right), \qquad \tilde\varepsilon = \left(I - \breve\lambda_{gmm}M\right)\tilde u, \]
\[ \tilde a_r = \left(I - \breve\lambda_{gmm}M\right)H\tilde P\tilde\alpha_r, \qquad \tilde\alpha_r = -\frac{1}{n}Z^\top\left(I - \breve\lambda_{gmm}M\right)^\top\left(A_r + A_r^\top\right)\left(I - \breve\lambda_{gmm}M\right)\tilde u, \tag{5.78} \]
\[ \tilde P = \left(\frac{1}{n}H^\top H\right)^{-1}\left(\frac{1}{n}H^\top Z\right)\left[\left(\frac{1}{n}Z^\top H\right)\left(\frac{1}{n}H^\top H\right)^{-1}\left(\frac{1}{n}H^\top Z\right)\right]^{-1}. \]
It is important to note that this step is not necessary since the previous estimator of λ is already consistent.
where
\[ y_s = y - \tilde\lambda_{ogmm}My, \qquad Z_s = Z - \tilde\lambda_{ogmm}MZ, \qquad \hat Z_s = P_H Z_s, \qquad P_H = H\left(H^\top H\right)^{-1}H^\top. \tag{5.80} \]
where Ψ̂λ̂λ̂ is an estimator of the variance-covariance matrix of the (normalized) sample moment vector based on the GS2SLS residuals. This estimator differs between the cases of homoskedastic and heteroskedastic errors. For the homoskedastic case, the (r, s) element (r, s = 1, 2) of Ψ̂λ̂λ̂ is given by:
\[ \hat\Psi_{rs}^{\hat\lambda\hat\lambda} = \hat\sigma^4\left(2n\right)^{-1}\operatorname{tr}\left[\left(A_r + A_r^\top\right)\left(A_s + A_s^\top\right)\right] + \hat\sigma^2 n^{-1}\hat a_r^\top\hat a_s + n^{-1}\left[\hat\mu^{(4)} - 3\hat\sigma^4\right]\operatorname{vec}_D\left(A_r\right)^\top\operatorname{vec}_D\left(A_s\right) + n^{-1}\hat\mu^{(3)}\left[\hat a_r^\top\operatorname{vec}_D\left(A_s\right) + \hat a_s^\top\operatorname{vec}_D\left(A_r\right)\right], \tag{5.82} \]
where:
\[ \hat a_r = \hat T\hat\alpha_r, \qquad \hat T = H\hat P, \qquad \hat P = \hat Q_{HH}^{-1}\hat Q_{HZ}\left[\hat Q_{HZ}^\top\hat Q_{HH}^{-1}\hat Q_{HZ}\right]^{-1}, \]
\[ \hat Q_{HH} = \frac{1}{n}H^\top H, \qquad \hat Q_{HZ} = \frac{1}{n}H^\top\bar Z, \qquad \bar Z = \left(I - \hat\lambda M\right)Z, \tag{5.83} \]
\[ \hat\alpha_r = -\frac{1}{n}\bar Z^\top\left(A_r + A_r^\top\right)\hat\varepsilon, \qquad \hat\sigma^2 = \frac{1}{n}\hat\varepsilon^\top\hat\varepsilon, \qquad \hat\mu^{(3)} = \frac{1}{n}\sum_{i=1}^n\hat\varepsilon_i^3, \qquad \hat\mu^{(4)} = \frac{1}{n}\sum_{i=1}^n\hat\varepsilon_i^4. \]
For the heteroskedastic case, the same expressions are used with Σ̂ in place of σ̂²I, where Σ̂ is a diagonal matrix whose ith diagonal element is ε̂i².
5.5 Application in R
In this example we use the simulated US Driving Under the Influence (DUI) county data set used in Drukker et al. (2011). The dependent variable dui is defined as the alcohol-related arrest rate per 100,000 daily vehicle miles traveled (DVMT). The explanatory variables include police, nondui, vehicles, dry, and elect (the latter is used below as an additional instrument; see the regression output for the full specification).
library("maptools")
library("spdep")
library("sphet")
# Load Data
us_shape <- readShapeSpatial("ccountyR") # Load shape file
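The construction of the weights list lw and the model call are not shown in the extracted source; the sketch below mirrors the Call in the output that follows, with the contiguity-based neighbour construction being an assumption (gstsls is provided by spatialreg, formerly spdep):
# Build a weights list and estimate the SARAR model by GM/IV
nb <- poly2nb(us_shape)              # queen-contiguity neighbours (assumption)
lw <- nb2listw(nb, style = "W")      # row-standardized weights list

sarar <- gstsls(dui ~ police + nondui + vehicles + dry,
                data = us_shape, listw = lw)
summary(sarar)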
##
## Call:gstsls(formula = dui ~ police + nondui + vehicles + dry, data = us_shape,
## listw = lw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.655535 -0.362165 -0.070363 0.277261 2.418849
##
## Type: GM SARAR estimator
## Coefficients: (GM standard errors)
## Estimate Std. Error z value Pr(>|z|)
## Rho_Wy 0.04692763 0.01698220 2.7633 0.005721
## (Intercept) -6.40991922 0.41836312 -15.3214 < 2.2e-16
## police 0.59810726 0.01491778 40.0936 < 2.2e-16
## nondui 0.00024688 0.00108699 0.2271 0.820328
## vehicles 0.01571247 0.00066881 23.4933 < 2.2e-16
## dry 0.10608849 0.03496242 3.0344 0.002410
##
## Lambda: 0.00095701
## Residual variance (sigma squared): 0.31811, (sigma: 0.56402)
## GM argmin sigma squared: 0.31789
## Number of observations: 3109
## Number of parameters estimated: 8
The results show that all the variables are significant except nondui. Importantly, a higher number of sworn officers is positively correlated with the DUI arrest rate, after controlling for nondui, vehicles, and dry! The spatial autoregressive coefficient ρ is positive and significant, indicating autocorrelation in the dependent variable. Drukker et al. (2011) give some theoretical explanations for this result. On the one hand, the positive coefficient may be explained in terms of coordination efforts among police departments in different counties. On the other hand, it might well be that an enforcement effort in one county leads people living close to the border to drink in neighboring counties. The estimate of λ is close to zero, and the output does not produce inference for it. Lastly, it is important to stress that the standard errors include a degrees-of-freedom correction in the variance-covariance matrix.
##
## Generalized stsls
##
## Call:
## spreg(formula = dui ~ nondui + vehicles + dry, data = us_shape,
## listw = lw, endog = ~police, instruments = ~elect, lag.instr = TRUE,
## model = "sarar", het = FALSE)
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6.1862 -0.8838 0.0147 -0.0161 0.9213 8.3616
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 11.60596811 1.66674437 6.9633 3.325e-12 ***
## nondui -0.00019624 0.00275912 -0.0711 0.943299
## vehicles 0.09299562 0.00564911 16.4620 < 2.2e-16 ***
## dry 0.39825983 0.09090201 4.3812 1.180e-05 ***
## police -1.35130834 0.14101772 -9.5825 < 2.2e-16 ***
## lambda 0.19319018 0.04431011 4.3600 1.301e-05 ***
## rho -0.08597523 0.03018333 -2.8484 0.004393 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
An important issue here is that the optimal instruments are unknown. Including the spatial lags of these additional exogenous variables in the matrix of instruments is not generally recommended; however, results reported in ? do consider the spatial lags of elect.
Now we assume that the errors are heteroskedastic of unknown form.
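A sketch of the heteroskedasticity-robust variant, changing only het = TRUE relative to the spreg call shown above:
# Same SARAR specification, now allowing heteroskedastic innovations
sarar_het <- spreg(dui ~ nondui + vehicles + dry,
                   data = us_shape, listw = lw,
                   endog = ~ police, instruments = ~ elect,
                   lag.instr = TRUE, model = "sarar", het = TRUE)
summary(sarar_het)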
Appendix
5.A Proof of Theorem 3 in Kelejian and Prucha (1998)
Recall that the GS2SLS estimator is given by:
\[ \hat\delta_n = \left[\hat Z_s(\lambda)^\top\hat Z_s(\lambda)\right]^{-1}\hat Z_s(\lambda)^\top y_s(\lambda), \tag{5.85} \]
whereas the FGS2SLS estimator is given by:
\[ \hat\delta_{F,n} = \left[\hat Z_s(\hat\lambda)^\top\hat Z_s(\hat\lambda)\right]^{-1}\hat Z_s(\hat\lambda)^\top y_s(\hat\lambda), \tag{5.86} \]
where
\[ \hat Z_s(\hat\lambda_n) = P_{H_n}Z_s(\hat\lambda_n), \qquad Z_s(\hat\lambda_n) = Z_n - \hat\lambda_n M_n Z_n, \qquad y_s(\hat\lambda_n) = y_n - \hat\lambda_n M_n y_n, \tag{5.87} \]
so that
\[ \hat Z_s(\hat\lambda_n) = \left[X_n - \hat\lambda_n M_n X_n,\; P_{H_n}\left(W_n y_n - \hat\lambda_n M_n W_n y_n\right)\right]. \]
Also note that, adding and subtracting εn,
\[ u_s(\hat\lambda_n) = \left(I - \hat\lambda_n M_n\right)u_n = \varepsilon_n - \left(\hat\lambda_n - \lambda\right)M_n u_n. \]
Then:
\[ \begin{aligned} \hat\delta_{F,n} - \delta &= \left[\hat Z_s(\hat\lambda)^\top\hat Z_s(\hat\lambda)\right]^{-1}\hat Z_s(\hat\lambda)^\top\left[\varepsilon_n - \left(\hat\lambda_n - \lambda\right)M_n u_n\right] \\ &= \left[\hat Z_s(\hat\lambda)^\top\hat Z_s(\hat\lambda)\right]^{-1}\hat Z_s(\hat\lambda)^\top\varepsilon_n - \left[\hat Z_s(\hat\lambda)^\top\hat Z_s(\hat\lambda)\right]^{-1}\hat Z_s(\hat\lambda)^\top\left(\hat\lambda_n - \lambda\right)M_n u_n, \end{aligned} \]
so that
\[ \sqrt n\left(\hat\delta_{F,n} - \delta\right) = \left[\frac{1}{n}\hat Z_s(\hat\lambda)^\top\hat Z_s(\hat\lambda)\right]^{-1}\frac{1}{\sqrt n}\hat Z_s(\hat\lambda)^\top\varepsilon_n - \left[\frac{1}{n}\hat Z_s(\hat\lambda)^\top\hat Z_s(\hat\lambda)\right]^{-1}\left(\hat\lambda_n - \lambda\right)\frac{1}{\sqrt n}\hat Z_s(\hat\lambda)^\top M_n u_n. \tag{5.90} \]
By consistency, λ̂n − λ = op(1). Now we need to show that:
\[ \frac{1}{n}\hat Z_s(\hat\lambda)^\top\hat Z_s(\hat\lambda) \xrightarrow{p} \bar Q, \tag{5.91} \]
\[ \frac{1}{\sqrt n}\hat Z_s(\hat\lambda)^\top\varepsilon_n \xrightarrow{d} N\left(0, \sigma^2\bar Q\right), \tag{5.92} \]
\[ \left(\hat\lambda_n - \lambda\right)\frac{1}{\sqrt n}\hat Z_s(\hat\lambda)^\top M_n u_n \xrightarrow{p} 0, \tag{5.93} \]
where:
\[ \bar Q = \left[Q_{HZ} - \lambda Q_{HMZ}\right]^\top Q_{HH}^{-1}\left[Q_{HZ} - \lambda Q_{HMZ}\right] \tag{5.94} \]
is finite and nonsingular. For (5.91), note that:
\[ \begin{aligned} \frac{1}{n}\hat Z_s(\hat\lambda)^\top\hat Z_s(\hat\lambda) &= \frac{1}{n}\left(Z_n - \hat\lambda_n M_n Z_n\right)^\top P_H\left(Z_n - \hat\lambda_n M_n Z_n\right) \\ &= \left(\frac{1}{n}Z_n^\top H_n - \hat\lambda_n\frac{1}{n}Z_n^\top M_n^\top H_n\right)\left(\frac{1}{n}H_n^\top H_n\right)^{-1}\left(\frac{1}{n}H_n^\top Z_n - \hat\lambda_n\frac{1}{n}H_n^\top M_n Z_n\right), \end{aligned} \tag{5.95} \]
and since \( n^{-1}Z_n^\top H_n \xrightarrow{p} Q_{HZ}^\top \), \( n^{-1}Z_n^\top M_n^\top H_n \xrightarrow{p} Q_{HMZ}^\top \), \( n^{-1}H_n^\top H_n \to Q_{HH} \), and \( \hat\lambda_n \xrightarrow{p} \lambda \), the whole expression converges in probability to \( \bar Q \).
Similarly,
\[ \frac{1}{\sqrt n}\hat Z_s(\hat\lambda)^\top\varepsilon_n = \left(\frac{1}{n}Z_n^\top H_n - \hat\lambda_n\frac{1}{n}Z_n^\top M_n^\top H_n\right)\left(\frac{1}{n}H_n^\top H_n\right)^{-1}\frac{1}{\sqrt n}H_n^\top\varepsilon_n, \tag{5.96} \]
where \( n^{-1/2}H_n^\top\varepsilon_n \xrightarrow{d} N\left(0, \sigma^2 Q_{HH}\right) \); combined with the limits above, this yields (5.92).
Finally,
\[ \left(\hat\lambda_n - \lambda\right)\frac{1}{\sqrt n}\hat Z_s(\hat\lambda)^\top M_n u_n = \underbrace{\left(\hat\lambda_n - \lambda\right)}_{o_p(1)}\left(\frac{1}{n}Z_n^\top H_n - \hat\lambda_n\frac{1}{n}Z_n^\top M_n^\top H_n\right)\left(\frac{1}{n}H_n^\top H_n\right)^{-1}\frac{1}{\sqrt n}H_n^\top M_n u_n. \tag{5.97} \]
Note that \( E\left(n^{-1/2}H_n^\top M_n u_n\right) = 0 \) and \( E\left(n^{-1}H_n^\top M_n u_n u_n^\top M_n^\top H_n\right) = n^{-1}H_n^\top M_n\Sigma_u M_n^\top H_n \), whose elements are uniformly bounded; hence \( n^{-1/2}H_n^\top M_n u_n = O_p(1) \) and (5.93) follows.
Finally, the error variance is estimated as:
\[ \hat\sigma^2 = \frac{\hat\varepsilon^\top\hat\varepsilon}{n}, \qquad \hat\varepsilon = y_s(\hat\lambda) - Z_s(\hat\lambda)\hat\delta_F. \tag{5.100} \]
Bibliography
Allers, M. A. and Elhorst, J. P. (2005). Tax Mimicking and Yardstick Competition Among Local Governments
in The Netherlands. International tax and public finance, 12(4):493–513.
Anselin, L. (1988). Spatial Econometrics: Methods and Models, volume 4. Springer.
Anselin, L. (1996). Chapter Eight: The Moran Scatterplot as an ESDA Tool to Assess Local Instability in
Spatial Association. Spatial Analytical, 4:121.
Anselin, L. (2003). Spatial Externalities, Spatial Multipliers, and Spatial Econometrics. International regional
science review, 26(2):153–166.
Anselin, L. (2007). Spatial Econometrics, pages 310–330. Blackwell Publishing Ltd.
Anselin, L. and Bera, A. (1998). Spatial Dependence in Linear Regression Models with an Introduction to
Spatial Econometrics. In Ullah, A. and Giles, D., editors, Handbook of Applied Economic Statistics, pages
237–289. Marcel Dekker, New York.
Anselin, L. and Lozano-Gracia, N. (2008). Errors in Variables and Spatial Effects in Hedonic House Price
Models of Ambient Air Quality. Empirical economics, 34(1):5–34.
Anselin, L. and Rey, S. (1991). Properties of Tests for Spatial Dependence in Linear Regression Models.
Geographical analysis, 23(2):112–131.
Anselin, L. and Rey, S. (2014). Modern Spatial Econometrics in Practice: A Guide to Geoda, Geodaspace
and Pysal. GeoDa Press LLC.
Arraiz, I., Drukker, D. M., Kelejian, H. H., and Prucha, I. R. (2010). A Spatial Cliff-Ord-Type Model with
Heteroskedastic Innovations: Small and Large Sample Results. Journal of Regional Science, 50(2):592–614.
Baller, R. D., Anselin, L., Messner, S. F., Deane, G., and Hawkins, D. F. (2001). Structural Covariates of
US County Homicide Rates: Incorporating Spatial Effects. Criminology, 39(3):561–588.
Basdas, U. (2009). Spatial Econometric Analysis of the Determinants of Location in Turkish Manufacturing
Industry. Available at SSRN 1506888.
Bivand, R., Hauke, J., and Kossowski, T. (2013). Computing the Jacobian in Gaussian Spatial Autoregressive
Models: An Illustrated Comparison of Available Methods. Geographical Analysis, 45(2):150–179.
Bivand, R. and Lewin-Koh, N. (2015). maptools: Tools for Reading and Handling Spatial Objects. R package
version 0.8-36.
Bivand, R. and Piras, G. (2015). Comparing Implementations of Estimation Methods for Spatial Economet-
rics. Journal of Statistical Software, 63(1):1–36.
Boarnet, M. G. and Glazer, A. (2002). Federal Grants and Yardstick Competition. Journal of urban Eco-
nomics, 52(1):53–64.
Cliff, A. and Ord, K. (1972). Testing for Spatial Autocorrelation Among Regression Residuals. Geographical
analysis, 4(3):267–284.
Cliff, A. D. and Ord, J. K. (1973). Spatial Autocorrelation. London:Pion.
Cohen, J. and Tita, G. (1999). Diffusion in Homicide: Exploring a General Method for Detecting Spatial
Diffusion Processes. Journal of Quantitative Criminology, 15(4):451–493.
Cordy, C. B. and Griffith, D. A. (1993). Efficiency of least squares estimators in the presence of spatial
autocorrelation. Communications in Statistics-Simulation and Computation, 22(4):1161–1179.
Das, D., Kelejian, H. H., and Prucha, I. R. (2003). Finite Sample Properties of Estimators of Spatial
Autoregressive Models with Autoregressive Disturbances. Papers in Regional Science, 82(1):1–26.
Doreian, P. (1981). Estimating Linear Models with Spatially Distributed Data. Sociological methodology,
pages 359–388.
Drukker, D. M., Egger, P., and Prucha, I. R. (2013). On Two-step Estimation of a Spatial Autoregressive
Model with Autoregressive Disturbances and Endogenous Regressors. Econometric Reviews, 32(5-6):686–
733.
Drukker, D. M., Prucha, I. R., and Raciborski, R. (2011). A Command for Estimating Spatial-autoregressive
Models with Spatial-autoregressive Disturbances and Additional Endogenous Variables. Econometric Re-
views, 32:686–733.
Elhorst, J. P. (2010). Applied Spatial Econometrics: Raising the Bar. Spatial Economic Analysis, 5(1):9–28.
Elhorst, J. P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Springer.
Filiztekin, A. (2009). Regional Unemployment in Turkey. Papers in Regional Science, 88(4):863–878.
Fischer, M. M., Bartkowska, M., Riedl, A., Sardadvar, S., and Kunnert, A. (2009). The Impact of Human
Capital on Regional Labor Productivity in Europe. Letters in Spatial and Resource Sciences, 2(2-3):97–108.
Garretsen, H. and Peeters, J. (2009). FDI and the Relevance of Spatial Linkages: Do Third-Country Effects
Matter for Dutch FDI? Review of World Economics, 145(2):319–338.
Garrett, T. A. and Marsh, T. L. (2002). The Revenue Impacts of Cross-Border Lottery Shopping in the
Presence of Spatial Autocorrelation. Regional Science and Urban Economics, 32(4):501–519.
Gibbons, S., Overman, H. G., and Patacchini, E. (2015). Spatial Methods. In Handbook of Regional and
Urban Economics, page 115.
Kelejian, H. and Piras, G. (2017). Spatial Econometrics. Academic Press.
Kelejian, H. H. and Prucha, I. R. (1998). A Generalized Spatial Two-Stage Least Squares Procedure for
Estimating a Spatial Autoregressive Model with Autoregressive Disturbances. The Journal of Real Estate
Finance and Economics, 17(1):99–121.
Kelejian, H. H. and Prucha, I. R. (1999). A Generalized Moments Estimator for the Autoregressive Parameter
in a Spatial Model. International Economic Review, 40(2):509–533.
Kelejian, H. H. and Prucha, I. R. (2001). On the Asymptotic Distribution of the Moran I Test Statistic with
Applications. Journal of Econometrics, 104(2):219–257.
Kelejian, H. H. and Prucha, I. R. (2007). The Relative Efficiencies of Various Predictors in Spatial Econo-
metric Models Containing Spatial Lags. Regional Science and Urban Economics, 37(3):363–374.
Kelejian, H. H. and Prucha, I. R. (2010). Specification and Estimation of Spatial Autoregressive Models with
Autoregressive and Heteroskedastic Disturbances. Journal of Econometrics, 157(1):53–67.
Kelejian, H. H., Prucha, I. R., and Yuzefovich, Y. (2004). Instrumental Variable Estimation of a Spatial
Autoregressive Model with Autoregressive Disturbances: Large and Small Sample Results. In LeSage, J.
and Pace, R., editors, Spatial and Spatiotemporal Econometrics, pages 163–198. Emerald Group Publishing
Limited.
Kim, C. W., Phipps, T. T., and Anselin, L. (2003). Measuring the Benefits of Air Quality Improvement: A
Spatial Hedonic Approach. Journal of Environmental Economics and Management, 45(1):24–39.
Kirby, D. K. and LeSage, J. P. (2009). Changes in Commuting to Work Times Over the 1990 to 2000 Period.
Regional Science and Urban Economics, 39(4):460–471.
Lee, L.-F. (2002). Consistency and Efficiency of Least Squares Estimation for Mixed Regressive, Spatial
Autoregressive Models. Econometric Theory, 18(2):252–277.
Lee, L.-F. (2003). Best Spatial Two-Stage Least Squares Estimators for a Spatial Autoregressive Model with
Autoregressive Disturbances. Econometric Reviews, 22(4):307–335.
Lee, L.-F. (2004). Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autore-
gressive Models. Econometrica, 72(6):1899–1925.
Lee, L.-F. (2007). GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models. Journal
of Econometrics, 137(2):489–514.
LeSage, J. and Pace, R. K. (2010). Introduction to Spatial Econometrics. CRC Press.
LeSage, J. P. (1997). Bayesian Estimation of Spatial Autoregressive Models. International Regional Science
Review, 20(1-2):113–129.
LeSage, J. P. (2014). What Regional Scientists Need to Know about Spatial Econometrics. The Review of
Regional Studies, 44(1):13–32.
LeSage, J. P. and Pace, R. K. (2014). Interpreting Spatial Econometric Models. In Fischer, M. M. and
Nijkamp, P., editors, Handbook of Regional Science, pages 1535–1552. Springer Berlin Heidelberg, Berlin,
Heidelberg.
Mead, R. (1967). A Mathematical Model for the Estimation of Inter-Plant Competition. Biometrics, pages
189–205.
Newey, W. K. and McFadden, D. (1994). Large Sample Estimation and Hypothesis Testing. Handbook of
Econometrics, 4:2111–2245.
Ord, J. K. (1975). Estimation Methods for Models of Spatial Interaction. Journal of the American Statistical
Association, 70(349):120–126.
Pace, R. K. and LeSage, J. P. (2008). A Spatial Hausman Test. Economics Letters, 101(3):282–284.
Pavlyuk, D. (2011). Spatial Analysis of Regional Employment Rates in Latvia.
Piras, G. (2010). sphet: Spatial Models with Heteroskedastic Innovations in R. Journal of Statistical Software,
35(1):1–21.
Prucha, I. (2014). Instrumental Variables/Method of Moments Estimation. In Fischer, M. M. and Nijkamp,
P., editors, Handbook of Regional Science, pages 1597–1617. Springer Berlin Heidelberg.
Saavedra, L. A. (2000). A Model of Welfare Competition with Evidence from AFDC. Journal of Urban
Economics, 47(2):248–279.
Smirnov, O. and Anselin, L. (2001). Fast Maximum Likelihood Estimation of Very Large Spatial Autoregres-
sive Models: A Characteristic Polynomial Approach. Computational Statistics & Data Analysis, 35(3):301–
319.
Stewart, B. M. and Zhukov, Y. (2010). Choosing Your Neighbors: The Sensitivity of Geographical Diffusion
in International Relations. In APSA 2010 Annual Meeting Paper.
Tiefelsdorf, M., Griffith, D., and Boots, B. (1999). A Variance-Stabilizing Coding Scheme for Spatial Link
Matrices. Environment and Planning A, 31(1):165–180.
White, H. (2014). Asymptotic Theory for Econometricians. Academic Press.