Cross-sectional Dependence in Panel Data Analysis

Vasilis Sarafidis*    Tom Wansbeek†


University of Sydney University of Groningen

February 2010

Abstract
This paper provides an overview of the existing literature on panel data models
with error cross-sectional dependence. We distinguish between spatial dependence
and factor structure dependence and we analyse the implications of weak and strong
cross-sectional dependence on the properties of the estimators. We consider estimation
under strong and weak exogeneity of the regressors for both $T$ fixed and $T$ large
cases. Available tests for error cross-sectional dependence and methods for determining
the number of factors are discussed in detail. The finite-sample properties of
some estimators and statistics are investigated using Monte Carlo experiments.
Key words: Panel data, Cross-sectional dependence, Spatial dependence, Factor
structure, Strong/Weak exogeneity.
JEL Classification: C33; C50.

1 Introduction
The analysis of longitudinal data is common across many fields of research. In econometrics,
the topic is invariably called panel data analysis. Over the last forty years, it
has grown into a major subfield of econometrics. Traditionally, the focus has been on
panels involving a large number of individual units $i = 1, \ldots, N$, with a few observations
over time, $t = 1, \ldots, T$.¹ Often, the data come from surveys where a large group of
people or households has been followed over a few years. The National Longitudinal
Surveys of Labor Market Experience and the University of Michigan’s Panel Study of
Income Dynamics are prominent examples. One of the primary reasons for collecting
these data has been to overcome aggregation problems that arise with time series data
in modelling the behaviour of heterogeneous agents on the basis of the “representative
agent” assumption. More recently, considerable interest has also been directed to panels
where the cross-sectional and time series dimensions are of similar magnitude. For
instance, the Penn World Tables cover several countries over relatively long periods and
the main focus of study lies in cross-country economic, social and political comparisons.

* Corresponding author. Faculty of Economics and Business, University of Sydney, NSW 2006, Australia.
Tel: +61-2-9036 9120; E-mail: vasilis.sara…[email protected].
† University of Groningen, P.O. Box 800, 9700 AV Groningen, The Netherlands. Tel: +31-50-363-8339;
E-mail: [email protected].
¹ An exception to this is the seemingly unrelated regression (SUR) approach due to Zellner (1962);
see Section 4.1.

One major issue that inherently arises in every panel data study, with potential
implications for parameter estimation and inference, is the possibility that the individual
units are interdependent. In fact, this notion of ‘between group’ dependence has been familiar
in the social sciences since the 1930s, i.e. well before the emergence of panel data
econometrics. Specifically, Stephan (1934, pg. 165) argues that “in dealing with social
data, we know that by virtue of their very social character, persons, groups and their
characteristics are interrelated and not independent”. Neprash (1934, pg. 168) asserts
that “the correlation of spatially distributed variables must be accepted with severe
limitations of interpretation. The data involved violate two important conditions of
sound application of correlation and sample techniques namely, the independence of
the units of which the traits are measured, and the homogeneity of distribution of the
traits within a given area”. Fisher in his “Design of Experiments” book (1935, pg. 66)
claims that “patches in close proximity are commonly more alike ... than those further
apart”. Later on, in the field of economic geography, Tobler (1970, pg. 236) invoked his
‘first law of geography’: “everything is related to everything else; but near things are
more related than distant things”.
Naturally, the issue of how to characterise cross-sectional dependence has attracted
considerable attention among researchers over the years. Perhaps the earliest methodo-
logy put forward to deal with this issue was the spatial approach. Spatial models were
developed primarily for cross-sectional data using a concept of a distance metric, which
allowed formulating models with a structure similar to that provided by the time index
in time series. The concept of ‘economic distance’ eventually allowed the use of spatial
models in certain economic applications as well, mainly drawn from regional science and
urban economics. The increasing availability of panel data during the last decades gave
rise to new possibilities in characterising error cross-sectional dependence. A prominent
alternative to the spatial approach is the factor structure approach, which assumes that
the disturbance term contains a finite number of unobserved factors that influence each
individual separately. Initially, the inferential theory for factor models was developed
for cases where one dimension was fixed and the other went to infinity. Recently, this
theory has been extended to large panels, where both dimensions can go to infinity; see
Bai (2003) and Bai and Ng (2002).
In this paper we attempt to provide an overview of some of the recent developments
that have been made in the field and link them to earlier related work. Realistically,
it is impossible to do justice to the voluminous and still rapidly growing literature of
panel data models with error cross-sectional dependence. In what follows, we shall focus
on stationary models with a static error structure. Throughout, we try to employ a
unified notation. Some of the issues discussed in the paper are also briefly mentioned
by Baltagi and Pesaran (2007), which is an introduction to a special issue of the Journal
of Applied Econometrics, and Hsiao (2007) in a paper reviewing the state of the art and
the current issues in panel data analysis. There is a growing literature on dynamic
factor models (see e.g. Forni and Lippi, 2001, and Forni, Hallin, Lippi and Reichlin,
2000), which is mainly concerned with extraction of common components of economic
variables rather than with estimation of structural (regression) parameters; therefore,
we do not review this in the present paper.² There are also important developments
in non-stationary panel data models with error cross-sectional dependence (see e.g. Bai
and Ng, 2004, Moon and Perron, 2004, and Phillips and Sul, 2003) for which a succinct
overview has already been provided by Hurlin and Mignon (2004) and Breitung and
Pesaran (2008). The main theoretical results for large dimensional factor analysis using
principal components are reviewed in an excellent survey by Bai and Ng (2008).
The set-up of the paper is as follows. The next section describes the spatial and
factor structure approaches. Section 3 links these approaches with the concepts of
weak and strong cross-sectional dependence. Sections 4 and 5 analyse estimation under
strong and weak exogeneity respectively. Section 6 discusses available tests of error
cross-sectional dependence and methods for determining the number of factors. We
conclude by indicating a number of topics for future research.
In what follows we adopt the conventional mathematics notation where capital letters
denote matrices and small letters in bold denote vectors.

2 Spatial Dependence and Factor Structure


Consider the following panel data model:
$$ y_{it} = \beta' x_{it} + \eta_i + \nu_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T, \tag{1} $$

where $y_{it}$ is the observation on the dependent variable for individual $i$ at time $t$, $x_{it}$ is a
column vector of regressors with dimension $K$, $\beta$ is the corresponding parameter vector
of fixed coefficients, $\eta_i$ is an individual-specific time-invariant unobserved effect, and $\nu_{it}$
is the error component that may be cross-sectionally correlated. The latter would imply
that the following is true:

$$ \mathrm{Cov}(\nu_{it}, \nu_{jt}) \neq 0 \quad \text{for some } t \text{ and some } i \neq j, \tag{2} $$

where the number of possible pairings $(\nu_{it}, \nu_{jt})$ increases with $N$.


We view the presence of cross-sectional dependence in the error term as a consequence
of model misspecification. In other words, had the model been specified correctly,
cross-sectional dependence would have been taken into account and the resulting disturbance
would be purely idiosyncratic and uncorrelated across individuals. The advantage of
this approach is that it makes the distinction between strongly and weakly exogenous
regressors clear. In particular, let $\nu^0_{it}$ be the disturbance term of a correctly specified
model. We call the vector $x_{it}$ strongly exogenous if $\nu^0_{it}$ is mean-independent of the past,
present and future values of the regressors, i.e.
$$ E(\nu^0_{it} \mid x_{i1}, \ldots, x_{iT}) = 0. \tag{3} $$
On the other hand, the vector $x_{it}$ is weakly exogenous if $\nu^0_{it}$ is mean-independent of the
past and present values of the regressors only, so
$$ E(\nu^0_{it} \mid x_{i1}, \ldots, x_{it}) = 0. \tag{4} $$

² For dynamic factor models see also the special issue in the Journal of Econometrics, Vol. 119, 2004.

Ignoring cross-sectional dependence may affect the first-order properties (unbiasedness,
consistency) of standard panel estimators because even if $x_{it}$ is strongly, or weakly,
exogenous with respect to $\nu^0_{it}$, it may not be so with respect to $\nu_{it}$. In addition, even
if the first-order properties of these estimators remain unaffected, the presence of error
cross-sectional dependence may largely reduce the extent to which they can provide efficiency
gains over estimating (1) using, say, OLS for each individual $i$. In a sense, if all
individuals behave similarly there is little gain to be obtained by looking at more than
one of them.
Unfortunately, modelling general forms of cross-sectional dependence is not a straightforward
task. Specifically, contrary to a time series model, where it is natural to specify
the correlations between the disturbances to be functions of distance measured by time,
in a cross-section there is no such natural ordering of the observations. To deal with this
issue, the panel data literature has mainly adopted two different approaches to modelling
error cross-sectional dependence: the spatial approach and the factor structure approach.
The former assumes that the structure of cross-sectional dependence is a function of
an immutable distance measure, defined according to a pre-specified metric. In economic
applications, spatial techniques are adapted using alternative measures of “economic
distance” (Conley, 1999), or “policy and social distance” (Conley and Topa, 2002). A
number of different spatial processes have been proposed in the literature to model
cross-sectional dependence, the most popular of which have been the Spatial Moving Average
(SMA), Spatial Auto-Regressive (SAR) and Spatial Error Components (SEC) processes.
These can be defined as follows:
$$
\begin{aligned}
\text{SMA:} \quad \nu_{it} &= \sum_{j=1}^{N} w_{ij} \varepsilon_{jt} + \varepsilon_{it}, \\
\text{SAR:} \quad \nu_{it} &= \sum_{j=1}^{N} w_{ij} \nu_{jt} + \varepsilon_{it}, \\
\text{SEC:} \quad \nu_{it} &= \sum_{j=1}^{N} w_{ij} \xi_{jt} + \varepsilon_{it},
\end{aligned} \tag{5}
$$

where $w_{ij}$ is the $i$-specific spatial weight attached to individual $j$, typically determined
before estimation, $\varepsilon_{it}$ is white noise, and for SEC $\xi_{jt}$ denotes a zero mean random
component, uncorrelated with $\varepsilon_{it}$ and $x_{it}$. These spatial models can be estimated using
a generalized method of moments (GMM) approach (see e.g. Kapoor, Kelejian and
Prucha, 2007; Kelejian and Prucha, 2009), or a method based on maximum likelihood
(e.g. Lee, 2004).
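
As a rough illustration of the three processes in (5), the following sketch simulates them for a hypothetical "ring" weight matrix in which every unit has two neighbours. The function names, the ring layout, the value `rho = 0.4` and the Gaussian draws are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def ring_weights(n, rho=0.4):
    """Hypothetical weight matrix: unit i is linked to its two ring neighbours.

    Each row sums to rho < 1, so I - W is invertible and the SAR process exists.
    """
    w = np.zeros((n, n))
    for i in range(n):
        w[i, (i - 1) % n] = rho / 2.0
        w[i, (i + 1) % n] = rho / 2.0
    return w

def simulate_spatial_errors(w, t_periods, seed=0):
    """Draw N x T error arrays following the SMA, SAR and SEC processes in (5)."""
    rng = np.random.default_rng(seed)
    n = w.shape[0]
    eps = rng.standard_normal((n, t_periods))  # white noise eps_it
    xi = rng.standard_normal((n, t_periods))   # SEC component, independent of eps
    eye = np.eye(n)
    sma = (eye + w) @ eps                      # nu_t = W eps_t + eps_t
    sar = np.linalg.solve(eye - w, eps)        # solves nu_t = W nu_t + eps_t
    sec = w @ xi + eps                         # nu_t = W xi_t + eps_t
    return {"eps": eps, "xi": xi, "SMA": sma, "SAR": sar, "SEC": sec}
```

In this sketch the SAR errors are obtained by solving the simultaneous system rather than by iterating, which is only possible because the assumed weights make $I - W$ non-singular.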
The factor structure approach assumes the presence of an unobserved common component
in the disturbance which is a linear combination of a fixed number of factors (e.g.
Lawley and Maxwell, 1971; Goldberger, 1972, and Jöreskog and Goldberger, 1975). In
this case the error can be written as
$$ \nu_{it} = \lambda_i' \phi_t + \varepsilon_{it}, \tag{6} $$
where $\phi_t = (\phi_{1t}, \ldots, \phi_{M_0 t})'$ denotes an $M_0 \times 1$ vector of unobserved factors,
$\lambda_i = (\lambda_{1i}, \ldots, \lambda_{M_0 i})'$ is an $M_0 \times 1$ vector of factor loadings³ and $\varepsilon_{it}$ is a purely
idiosyncratic component such that $E(\varepsilon_{it}) = 0$ and
$$
E(\varepsilon_{it} \varepsilon_{js}) =
\begin{cases}
\sigma^2_{\varepsilon} & \text{for } t = s \text{ and } i = j, \\
0 & \text{otherwise.}
\end{cases}
$$
This formulation generates a taxonomy of models depending on whether the $\lambda_i$ and/or
$\phi_t$ are correlated with $x_{it}$ or not. The relative size of $N$ and $T$ is also important. As an
example, suppose that (1) combined with (6) is used to model the returns to education,
where, as is typical in micro-econometric panels, $N$ is large and $T$ small; in this case the
vector of covariates, $x_{it}$, may include variables like education, experience, and tenure of
individual $i$ with the same employer, $\eta_i$ may capture innate ability (which is constant
by definition) and $\lambda_i$ may reflect time-varying productivity of individual $i$. Both $\eta_i$ and
$\lambda_i$ are likely to be correlated with $x_{it}$.⁴ For small $T$, the factors can be treated as
time-specific parameters that reflect how productivity varies over time. Another example can
be drawn from the estimation of production and cost functions. For a cost function the
vector $x_{it}$ represents input prices and output, $\eta_i$ may capture cost efficiency of firm $i$, $\phi_t$
may reflect changes in the regulatory regime over time, with $\lambda_i$, the impact on firm $i$,
depending on the size of the firm in the market, on financial constraints, technology and
other considerations. In this case both $\phi_t$ and $\lambda_i$ are likely to be correlated with input
prices and output.⁵ Depending upon the size of $N$ and $T$, as well as on the properties
of $x_{it}$, different methods can be used to estimate these models, as we shall see in the
following sections.
Sarafidis (2009) shows that all spatial processes can be expressed in the following
form:
$$ \nu_{it} = (\lambda_i \odot w_i)' \phi_t + \varepsilon_{it}, $$
by setting $M_0 = N$ and imposing appropriate zero restrictions on $w_i$ and homogeneity
restrictions on $\lambda_i$.⁶ This may be useful because spatial dependence can be viewed in
this case as a special form of factor structure dependence, in which one may think of the
unobserved components, $\phi_t$, as shocks, the impact of which is either ‘global’ (factors) or
‘local’ (spatially correlated components).

³ There is large variation in the literature regarding the notation used for factor models. Following
Kiviet and Sarafidis (2009), our choice is based on the following reasoning: we use Greek symbols for
unobserved variables/parameters and Latin symbols for observed ones. Consequently, we use $\varepsilon$ (epsilon)
to denote the purely idiosyncratic error component, $\phi$ (phi) to denote the factors and $\lambda$ (lambda) to denote
their loadings. Similarly, $\eta$ (eta) is used to denote the individual-specific effect.
⁴ The argument here would be that it is the most able and productive individuals who embark on
higher education, all other things being equal, such as equal opportunities and so on.
⁵ Many other examples are provided by Ahn, Lee and Schmidt (2001) and Bai (2009).
⁶ For the factor structure $w_i = \iota_N$, where $\iota_N$ is an $N \times 1$ column vector of ones.

3 Weak and Strong Cross-sectional Dependence
The spatial approach and the factor structure approach imply different degrees of error
cross-sectional dependence. However, there is no unique definition of what is ‘weak’
and what is ‘strong’ dependence in the literature. In particular, let $\{\nu_{it}\}_{i \geq 1}$ be the
scalar sequence $\nu_{1t}, \nu_{2t}, \nu_{3t}, \ldots$, and notice that there are $T$ such scalar sequences, for
$t = 1, \ldots, T$. Weak dependence can be defined in the following ways:

Definition 1 (Chudik, Pesaran and Tosetti, 2009) The double-indexed sequence
$\{\nu_{it}, i \geq 1, t \geq 1\}$ is said to be weakly dependent at a given point in time if its weighted
average, conditional on the information set available in the previous period, $I_{t-1}$, converges
to its expectation in quadratic mean, as $N \to \infty$, for all weights that satisfy certain
‘granularity conditions’.⁷

Definition 2 (Sarafidis, 2009) The double-indexed sequence $\{\nu_{it}, i \geq 1, t \geq 1\}$ is said
to be cross-sectionally weakly correlated if, for each $i$ and $j > 0$, $\{\sigma^{t,s}_{i,j}, j \neq i\}$ is
absolutely summable, that is,
$$ \sum_{j \neq i} \left| \sigma^{t,s}_{i,j} \right| < \infty, \quad \text{for all } t \text{ and } s, \tag{7} $$

where $\sigma^{t,s}_{i,j} = \mathrm{Cov}(\nu_{it}, \nu_{js} \mid \Omega_{i,j})$, and $\Omega_{i,j}$ denotes the conditioning set of all
time-invariant characteristics of individuals $i$ and $j$.

Neither of these definitions requires the process to be covariance stationary.⁸ Furthermore,
both definitions imply that spatial dependence is a weak form of dependence.
This comes directly from the standard assumption employed in spatial processes that the
row and column sums of the weighting matrix $W = [w_{ij}]$ satisfy a uniform boundedness
condition.⁹ This can be stated as follows:¹⁰
$$ \sum_{i=1}^{N} |w_{ij}| \leq B_w < \infty \;\; \forall j \quad \text{and} \quad \sum_{j=1}^{N} |w_{ij}| \leq B_w < \infty \;\; \forall i. \tag{8} $$

It is important to emphasise that spatial dependence in the residuals does not affect the
first-order properties (consistency) of standard panel data estimators. In particular,
it is straightforward to show that the mean-independence conditions (3) and (4) are
preserved when the error term of the misspecified model, $\nu_{it}$, follows any one of the
three processes in (5). Therefore, the potential gains from modelling spatial dependence
arise with respect to estimation efficiency and the validity of inference.

⁷ These conditions ensure that the weights are not dominated by a few individuals.
⁸ For alternative definitions of weak cross-sectional dependence that require covariance stationarity
see Forni and Lippi (2001).
⁹ Notice, however, that uniform boundedness is not actually necessary for weak dependence; see
Sarafidis (2009).
¹⁰ See e.g. Kapoor, Kelejian and Prucha (2007, pg. 106) and Lee (2007, pg. 491).

The difference between the two definitions provided above lies mainly in factor structures
and it can be illustrated through an example. Consider a single-factor error process
$\nu_{it} = \lambda_i \phi_t + \varepsilon_{it}$, where the following assumptions are made:
$$
\begin{gathered}
E(\phi_t) = E(\varepsilon_{it}) = 0, \quad E(\phi_t^2) = \sigma^2_{\phi}, \quad E(\varepsilon_{it}^2) = \sigma^2_{\varepsilon}, \\
E(\phi_t \varepsilon_{it}) = 0, \quad E(\phi_t \phi_s) = E(\varepsilon_{is} \varepsilon_{it}) = 0 \text{ for } s \neq t.
\end{gathered} \tag{9}
$$

According to Definition 1, the error process is weakly dependent so long as
$$ \lim_{N \to \infty} N^{-1} \sum_{i=1}^{N} \lambda_i = 0. \tag{10} $$

This is because $E(\nu_{it}) = E(\lambda_i \phi_t + \varepsilon_{it}) = \lambda_i E(\phi_t) + E(\varepsilon_{it}) = 0$ and
$$ N^{-1} \sum_{i=1}^{N} \nu_{it} = \left[ \phi_t N^{-1} \sum_{i=1}^{N} \lambda_i + N^{-1} \sum_{i=1}^{N} \varepsilon_{it} \right] \xrightarrow{p} 0. $$

On the other hand, according to Definition 2 the error process (9) is not weakly dependent
because $\sigma^{t,t}_{i,j} = \mathrm{Cov}(\nu_{it}, \nu_{jt} \mid \lambda_i, \lambda_j) = \lambda_i \lambda_j \sigma^2_{\phi} \neq 0$ and therefore
$\sum_{j \neq i} |\sigma^{t,t}_{i,j}|$ is unbounded. Intuitively, since all individuals are subject to the same shock,
$\phi_t$, the sum of the absolute conditional covariances between individual disturbances grows with
$N$ regardless of whether $N^{-1} \sum_{i=1}^{N} \lambda_i \to 0$ or not. As a result, all factor structures,
provided they are non-degenerate, imply strong dependence under Definition 2 but not
under Definition 1.
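
The contrast between the bounded sum in (7) for spatial processes and its growth with $N$ under a factor structure can be checked numerically. The sketch below is illustrative only: the ring-shaped weight matrix, the choice `rho = 0.5`, unit variances and loadings all equal to one are assumptions made for the example, and the helper names are hypothetical.

```python
import numpy as np

def offdiag_abs_row_sum(cov, i=0):
    """Sum over j != i of |Cov(nu_it, nu_jt)| for one unit i, as in (7) with s = t."""
    row = np.abs(cov[i]).copy()
    row[i] = 0.0
    return row.sum()

def sma_cov(n, rho=0.5, sigma2_eps=1.0):
    """Covariance of an SMA process nu_t = (I + W) eps_t with assumed ring weights."""
    w = np.zeros((n, n))
    for i in range(n):
        w[i, (i - 1) % n] = rho / 2.0
        w[i, (i + 1) % n] = rho / 2.0
    a = np.eye(n) + w
    return sigma2_eps * a @ a.T

def factor_cov(n, sigma2_phi=1.0, sigma2_eps=1.0):
    """Covariance of nu_it = lam_i * phi_t + eps_it, given loadings lam_i = 1 (assumed)."""
    lam = np.ones(n)
    return sigma2_phi * np.outer(lam, lam) + sigma2_eps * np.eye(n)

for n in (10, 100, 1000):
    print(n, offdiag_abs_row_sum(sma_cov(n)), offdiag_abs_row_sum(factor_cov(n)))
```

Under these assumptions the spatial sum stays at 1.125 for every $N \geq 5$, whereas the factor sum equals $N - 1$ and so diverges, which is the sense in which the factor process is strongly dependent under Definition 2.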
Definition 1 has an encompassing property in the sense that any factor structure
with $E(\lambda_i' \phi_t) = 0$ reduces to a weakly dependent process when the observations are
expressed in terms of deviations from time-specific averages. Specifically, suppose that
the error term follows process (9). Averaging $\nu_{it}$ over all $i$ for each $t$ and subtracting
yields
$$ \underline{\nu}_{it} = \underline{\lambda}_i \phi_t + \underline{\varepsilon}_{it}, \tag{11} $$
where $\underline{\nu}_{it} = \nu_{it} - \bar{\nu}_{\cdot t}$ with $\bar{\nu}_{\cdot t} = N^{-1} \sum_{i=1}^{N} \nu_{it}$, and so on. Therefore, even if
$N^{-1} \sum_{i=1}^{N} \lambda_i \nrightarrow 0$, thus violating (10), we have
$$ \lim_{N \to \infty} N^{-1} \sum_{i=1}^{N} \underline{\lambda}_i = 0 \tag{12} $$

by construction. As a result, in a linear regression model $\nu_{it}$ can always be transformed
such that it becomes weakly dependent.
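
The transformation in (11) and (12) can be verified on simulated data. The following is a minimal sketch in which the sample sizes, the loading mean of 2 and the Gaussian draws are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 500, 4

lam = 2.0 + rng.standard_normal(n)   # loadings with non-zero mean, so (10) fails
phi = rng.standard_normal(t)         # one common factor phi_t
eps = rng.standard_normal((n, t))    # idiosyncratic errors
nu = np.outer(lam, phi) + eps        # nu_it = lam_i * phi_t + eps_it

# Deviations from time-specific (cross-sectional) averages, as in (11).
nu_dev = nu - nu.mean(axis=0, keepdims=True)
lam_dev = lam - lam.mean()
eps_dev = eps - eps.mean(axis=0, keepdims=True)

print("mean loading before:", lam.mean())
print("mean loading after: ", lam_dev.mean())
```

The demeaned loadings average to zero exactly (up to floating-point error), so the transformed errors keep the single-factor form of (11) while satisfying the condition in (12).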
Definition 2 has the property that weak dependence is preserved by product. In
other words, the product of two (or more) weakly dependent processes is also weakly
dependent.¹¹ As we shall see later on, this is important in panel data models with
weakly exogenous regressors. Under Definition 1, weak dependence is not preserved by
product. For instance, consider the product between $\nu_{it}$ and $\nu_{is}$, as defined in (9). This
product then involves $N^{-1} \sum_{i=1}^{N} \lambda_i^2$, which does not converge to zero.

¹¹ See Sarafidis (2009) for a proof.

One final remark. Definition 1 does not imply that weak error cross-sectional dependence
cannot affect the first-order properties of standard panel data estimators, nor does
Definition 2 imply that strong dependence always has an adverse effect on the first-order
properties of these estimators. For the former case, one can think of a regression
model with a single covariate, $x_{it}$, and one-factor error structure in which $\lambda_i$ and $\phi_t$
are both random with zero mean and $x_{it}$ contains $\gamma_i \phi_t$ with $\mathrm{Cov}(\lambda_i, \gamma_i) \neq 0$. For the
latter case one can think of a single-factor error process in which neither $\lambda_i$ nor $\phi_t$ is
correlated with $x_{it}$.
The literature on spatial dependence is rich and is developing rapidly. Notwithstanding this,
the remainder of this paper focuses mainly on residual factor structures. This
is partly because of the generality of this approach relative to spatial dependence, in
that it does not require the a priori specification of a distance metric, which may or
may not be appropriate in certain economic applications. Furthermore, modelling a
factor structure is likely to sweep out the spatial correlations as well (see Pesaran and
Tosetti, 2009). Finally, notice that in the spatial dependence case standard panel data
estimators can still be used to make robust inferences on the parameters. In particular,
one may employ spectral density matrix estimation techniques of the sort popularised
in econometrics by Newey and West (1987), valid for large $T$ and fixed $N$ (see Arellano,
pg. 19, for details) or the methods of Driscoll and Kraay (1998) and Pesaran and Tosetti
(2009), valid for large N and large T .

4 Estimation Under Strict Exogeneity


4.1 The Seemingly Unrelated Regressions Approach
By far the most classic model of error cross-sectional dependence in econometrics is the
Seemingly Unrelated Regressions (SUR) approach, due to Zellner (1962).¹² In the form
where the same regressors enter the model for all individuals the model is
$$ y_{it} = \beta_i' x_{it} + \eta_i + \nu_{it}, \tag{13} $$

where both $\beta_i$ and $\eta_i$ are treated as fixed. There are two underlying assumptions
behind this approach. Firstly, $E(\nu_{it} \mid x_{it}) = 0$, that is, all regressors remain strongly
exogenous in the misspecified model. Therefore, neglecting cross-sectional dependence
does not affect the first-order properties of standard panel estimators. Secondly, the
asymptotics are fixed $N$ and $T \to \infty$. These assumptions combined imply that the
error covariance matrix, $\Sigma = [\sigma_{ij}]$, can be left unrestricted, i.e. there is no need to
impose a factor structure in the residuals. The SUR approach leads to a feasible
GLS estimator, in which OLS is used at the first stage for each individual-specific equation
to obtain consistent estimates of the parameters, including the $N(N+1)/2$ distinct
entries in the error covariance matrix. The resulting estimator of $\beta_i$ is consistent and
asymptotically efficient. When $T$ is only slightly greater than $N$, the estimate of $\Sigma$ may
be ill-conditioned. Kontoghiorghes and Clarke (1995) propose a numerical procedure
for estimating a SUR model that avoids the difficulty in directly computing the inverse
of the estimated covariance matrix.

¹² A thorough review of the large literature accumulated on SUR can be found in Srivastava and
Dwivedi (1979) and Srivastava and Giles (1987). A survey of more recent developments is provided by
Fiebig (2001) and Moon and Perron (2006).

When $N > T$, the least-squares estimate of the general, unstructured error covariance
matrix, $\hat{\Sigma}$, with typical entry $T^{-1} \sum_{t=1}^{T} \hat{\nu}_{it} \hat{\nu}_{jt}$, is singular. This implies that the standard
SUR estimator is not feasible. Robertson and Symons (2007) propose imposing a
factor structure in the residuals, according to (6), and then they estimate the residual
covariance matrix using maximum likelihood. Therefore, their method allows SUR
estimation of panel models by providing a full-rank estimator of the error covariance
matrix when the usual estimate is rank-deficient.
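
The singularity for $N > T$ follows because the $N \times N$ matrix with entries $T^{-1} \sum_t \hat{\nu}_{it} \hat{\nu}_{jt}$ is built from only $T$ outer products, so its rank cannot exceed $T$. The sketch below illustrates this with random draws standing in for the first-stage OLS residuals; the array sizes and the helper name `residual_cov` are assumptions for the example.

```python
import numpy as np

def residual_cov(resid):
    """resid is N x T; returns the N x N matrix with entries T^{-1} sum_t r_it r_jt."""
    t = resid.shape[1]
    return resid @ resid.T / t

rng = np.random.default_rng(2)
wide = rng.standard_normal((20, 8))   # N = 20 units, T = 8 periods: N > T
tall = rng.standard_normal((8, 50))   # N = 8 units, T = 50 periods: N < T

rank_wide = np.linalg.matrix_rank(residual_cov(wide))
rank_tall = np.linalg.matrix_rank(residual_cov(tall))
print("N=20, T=8 :", rank_wide, "of 20")
print("N=8,  T=50:", rank_tall, "of 8")
```

With actual OLS residuals the rank would be lower still, since each individual regression uses up degrees of freedom; the random draws are only meant to show the upper bound.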
An alternative approach, valid for fixed $T$, is to impose a factor structure, as in (6),
and use a GMM estimator that makes use of the second-order moment restrictions imposed
on the covariance matrix of $\Phi \lambda_i + \varepsilon_i$, where $\Phi = (\phi_1, \ldots, \phi_T)'$ and $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$;
see e.g. Wansbeek and Meijer (2000). This method requires that $\lambda_i$ is random with zero
mean, and is uncorrelated with the covariates. Under these assumptions, the model
becomes a particular case of a so-called structural equation model (SEM), which can
be handled routinely by software such as LISREL, EQS, AMOS, MX, Mplus, MECOSA,
RAMONA, LINCS and PROC CALIS. As Wansbeek and Meijer (2007) indicate, the
availability of these programs is not generally known among econometricians, leading
sometimes to papers dealing with special cases of a SEM, which in fact are not needed.

4.2 The Principal Components Approach


Coakley, Fuertes and Smith (2002) propose an arguably simpler estimation approach,
based on residual principal components analysis. Specifically, they estimate individual
OLS regressions of $y_{it}$ on $x_{it}$ to extract the $M_0 < N$ principal components of
$\hat{\Sigma} = [T^{-1} \sum_{t=1}^{T} \hat{\nu}_{it} \hat{\nu}_{jt}]$ as proxies for the latent factors. In the second stage these proxies are
used as additional regressors with individual-specific coefficients. Similarly to the SUR
approach, the estimator requires that the unobserved factors are uncorrelated with $x_{it}$.
Otherwise, the first-stage estimate of $\beta$ is inconsistent, thus invalidating the properties
of the estimators.
An alternative estimator based on principal components analysis that does not depend
on an initial estimate of $\beta$ is given by
$$ \hat{\beta}_{PC} = \left[ \sum_{i=1}^{N} \tilde{X}_i' M_{\hat{\Phi}} \tilde{X}_i \right]^{-1} \sum_{i=1}^{N} \tilde{X}_i' M_{\hat{\Phi}} \tilde{y}_i, \tag{14} $$

where $\tilde{X}_i = Q_T X_i$, $Q_T = I_T - T^{-1} \iota_T \iota_T'$, the matrix that transforms the observations in
terms of deviations from individual-specific averages to remove $\eta_i$, $\iota_T$ is a $T$ column vector
of ones, $X_i = (x_{i1}, \ldots, x_{iT})'$, $\tilde{y}_i = Q_T y_i$, $y_i = (y_{i1}, \ldots, y_{iT})'$,
$M_{\hat{\Phi}} = I_T - \hat{\Phi} (\hat{\Phi}' \hat{\Phi})^{-1} \hat{\Phi}'$
and $\hat{\Phi} = (\hat{\phi}_1, \ldots, \hat{\phi}_T)'$, a $T \times M_0$ matrix, which is computed as the vector of principal
components extracted from the covariates, $x_{it}$. Intuitively, the idea is to sweep out the
factors that are common between the $y_{it}$ and $x_{it}$ processes by orthogonalising the data
prior to estimation. A similar estimator is proposed by Kapetanios and Pesaran (2007)
except that $\hat{\Phi}$ is computed from the vector of principal components extracted from
$z_{it} = (y_{it}, x_{it}')'$. This can be useful when different factors hit the $y$ and $x$ processes. The rate of
convergence of $\hat{\Phi}$ is $\min\{\sqrt{N}, T\}$. Therefore for fixed $T$, $\hat{\Phi}$ is not consistent, in general,
unless the purely idiosyncratic component is serially uncorrelated and homoskedastic (see
Bai, 2003).
Bai (2009) proposes an iterative principal components (IPC) estimator such that
$(\hat{\beta}_{IPC}, \hat{\Phi})$ is the solution to (14) and the following non-linear equation:
$$ \left[ \frac{1}{NT} \sum_{i=1}^{N} \left( \tilde{y}_i - \tilde{X}_i \hat{\beta}_{IPC} \right) \left( \tilde{y}_i - \tilde{X}_i \hat{\beta}_{IPC} \right)' \right] \hat{\Phi} = \hat{\Phi} \hat{V}_{NT}, \tag{15} $$

where $\hat{V}_{NT}$ is a diagonal matrix that consists of the $M_0$ largest eigenvalues of the matrix
$\frac{1}{NT} \sum_{i=1}^{N} ( \tilde{y}_i - \tilde{X}_i \hat{\beta}_{IPC} ) ( \tilde{y}_i - \tilde{X}_i \hat{\beta}_{IPC} )'$, arranged in decreasing order. Therefore,
given $\hat{\Phi}$ one can estimate $\beta$ and given $\beta$ one can estimate $\hat{\Phi}$. The solution can simply be
obtained by iteration. The resulting estimator is consistent and asymptotically normal
as $(N, T) \to \infty$ jointly, i.e. $\sqrt{NT} ( \hat{\beta}_{IPC} - \beta ) \xrightarrow{d} N(0, \Psi_{IPC})$, where $\Psi_{IPC}$ is the
asymptotic variance of $\sqrt{NT} ( \hat{\beta}_{IPC} - \beta )$. For fixed $T$ the estimator is inconsistent under
serial correlation or heteroskedasticity.¹³
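
The iteration between (14) and (15) can be sketched as follows. This is a simplified illustration rather than Bai's (2009) implementation: the function names, the pooled within-OLS starting value, the stopping rule and the array layout (`y` as $N \times T$, `x` as $N \times T \times K$) are assumptions made for the example, and the normalisation $\hat{\Phi}' \hat{\Phi} / T = I$ is imposed so that the projection matrix has a simple form.

```python
import numpy as np

def within_time(a):
    """Apply Q_T: subtract each individual's time average (axis 1 is time)."""
    return a - a.mean(axis=1, keepdims=True)

def iterated_pc(y, x, m0, n_iter=200, tol=1e-10):
    """Iterate between (14) and (15). Returns (beta_hat, phi_hat)."""
    n, t, k = x.shape
    yt, xt = within_time(y), within_time(x)
    # Starting value (an assumption): pooled within OLS that ignores the factors.
    beta = np.linalg.lstsq(xt.reshape(n * t, k), yt.reshape(n * t), rcond=None)[0]
    phi = None
    for _ in range(n_iter):
        resid = yt - xt @ beta                       # N x T residuals given beta
        s = resid.T @ resid / (n * t)                # T x T matrix on the left of (15)
        _, eigvec = np.linalg.eigh(s)                # eigenvalues in ascending order
        phi = np.sqrt(t) * eigvec[:, ::-1][:, :m0]   # M0 leading eigenvectors
        m_phi = np.eye(t) - phi @ phi.T / t          # annihilator, as phi'phi = T * I
        a = np.einsum("itk,ts,isl->kl", xt, m_phi, xt)
        b = np.einsum("itk,ts,is->k", xt, m_phi, yt)
        beta_new = np.linalg.solve(a, b)             # update of beta via (14)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, phi
```

A call such as `iterated_pc(y, x, m0=1)` returns the slope vector and a $T \times M_0$ factor estimate. The number of factors `m0` must be supplied by the user, and the starting value can matter in practice, so the routine should be read as a sketch only.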

4.3 The Common Correlated Effects Estimator


In practice, $M_0$ is most likely to be unknown and so it needs to be estimated.¹⁴ Pesaran
(2006) proposes an alternative which does not require estimating the number of latent
factors and is valid even when $\phi_t$ is correlated with $x_{it}$. His ‘Common Correlated
Effects’ (CCE) estimator is given by
$$ \hat{\beta}_{i,CCE} = \left( X_i' M_w X_i \right)^{-1} X_i' M_w y_i, \tag{16} $$
where $M_w = I_T - Z_w (Z_w' Z_w)^{-1} Z_w'$, and $Z_w = [(z_{w1}, \ldots, z_{wT})', \iota_T]$ is the $T \times (K + 2)$
matrix of observations on the weighted cross-sectional averages of the observed variables
in (13) including a vector of ones, i.e. the typical entry is $z_{wt} = \sum_{i=1}^{N} w_i z_{it}$.¹⁵ The
intuition of this method lies in that even if $\phi_t$ is unobserved, it is in the space spanned
by the cross-sectional weighted averages of the observed variables. As a result, the
projection as in (16) eliminates the factors and hence the inconsistency due to possible
correlations that exist between the factors and the regressors.¹⁶ To see this, assume the
following general model for the correlation between $\phi_t$ and $x_{it}$:
$$ x_{it} = \Gamma_i' \phi_t + \mu_i + \zeta_{it}, \tag{17} $$

where $\Gamma_i$ is the $M_0 \times K$ matrix of factor loadings of the covariates, $\zeta_{it}$ is the $K \times 1$
vector of the specific disturbances of $x_{it}$ such that $E(\zeta_{it} \mid \Gamma_i' \phi_t, \mu_i) = 0$. Combining
(13) with (17) yields
$$
\begin{aligned}
\underset{(K+1) \times 1}{z_{it}} = \begin{pmatrix} y_{it} \\ x_{it} \end{pmatrix}
&= \begin{pmatrix} \lambda_i' + \beta_i' \Gamma_i' \\ \Gamma_i' \end{pmatrix} \phi_t
+ \begin{pmatrix} \eta_i + \beta_i' \mu_i \\ \mu_i \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{it} + \beta_i' \zeta_{it} \\ \zeta_{it} \end{pmatrix} \\
&= \underset{(K+1) \times M_0}{\Lambda_i'} \, \underset{M_0 \times 1}{\phi_t}
+ \underset{(K+1) \times 1}{\theta_i} + \underset{(K+1) \times 1}{\omega_{it}}.
\end{aligned} \tag{18}
$$

Setting $w_{ij} = N^{-1}$ for all $i$ and averaging over $i$ gives
$$ \bar{z}_t = \bar{\Lambda}' \phi_t + \bar{\theta} + \bar{\omega}_t. $$

Assuming that
$$ \mathrm{Rank}(\bar{\Lambda}) = M_0 \leq K + 1 \quad \text{for all } N \tag{19} $$
and using the result that $\bar{\omega}_t \xrightarrow{p} 0$ as $N \to \infty$ for each $t$, we have
$$ \phi_t - \left( \bar{\Lambda} \bar{\Lambda}' \right)^{-1} \bar{\Lambda} \left( \bar{z}_t - \bar{\theta} \right) \xrightarrow{p} 0 \quad \text{as } N \to \infty. \tag{20} $$

Thus employing the Frisch-Waugh theorem, (20) suggests using $\bar{y}_t$, $\bar{x}_t$ and $\iota_T$ as observable
proxies for $\phi_t$.¹⁷

¹³ Notice that if the $\lambda_i$ were known, $\phi_t$ could be estimated using a cross-sectional regression for each $t$
and the rate of convergence would be $\sqrt{N}$, under arbitrary serial correlation and heteroskedasticity.
¹⁴ The issue of estimating the number of factors is discussed in Section 6.2.
¹⁵ For consistency, it is only required that the chosen weights satisfy, for each $i$, the condition
$\sum_{j=1}^{N} w_{ij}^2 \to 0$ as $N \to \infty$. Therefore, an obvious choice is $w_{ij} = N^{-1}$ for all $i$.
E¢ ciency gains from pooling the observations over the cross-sectional dimension can
be achieved when the individual slope coe¢ cients are the same, i.e. i = . Setting
wij = N 1 for all i yields the following pooled CCE estimator:
" N
# 1 N
X X
b = Xi0 M Xi Xi0 M yi , (21)
P CCE
i=1 i=1

1 b
where M =IT Z Z 0Z Z 0 with Z = (z1 ; :::zT )0 ; T and zt = N 1 N z .
j=1 jt P CCE
jointly
is asymptotically (large N ) unbiased for , and as (N; T ) ! 1,
p d
N T b P CCE ! N (0; P CCE ) , (22)
16
A similar projection is proposed by Mundlak (1978) with the di¤erence being that Pesaran’s approach
includes the cross-sectional mean of the dependent variable as well. Mundlak’s projection will not work
if the regressors are correlated with the factors.
17
The scaling does not a¤ect t .

11
where $\Sigma_{PCCE}$ is the asymptotic variance of $\sqrt{NT}\,(\hat\beta_{PCCE} - \beta)$. For fixed $T$ the distribution of $\hat\beta_{PCCE}$ is non-standard because it depends on nuisance parameters. Bootstrapping could be used to obtain standard errors for $\hat\beta_{PCCE}$ in this case, although this is still a matter of research. Kapetanios, Pesaran, and Yamagata (2009) have extended the results of Pesaran (2006) by allowing unobserved common factors to follow unit root processes.
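
The pooled CCE formula (21) can be illustrated with a short NumPy sketch. This is only a minimal illustration under stated assumptions: the function name `pooled_cce`, the single-factor simulated data and all parameter values are hypothetical and are not taken from the Monte Carlo design of the paper.

```python
import numpy as np

def pooled_cce(y, X):
    """Pooled CCE slope estimate; y has shape (N, T), X has shape (N, T, K)."""
    N, T, K = X.shape
    # Observable proxies for the factors: a constant plus the
    # cross-sectional averages of y and of each regressor.
    Z = np.column_stack([np.ones(T), y.mean(axis=0), X.mean(axis=0)])
    # Annihilator of Z; the pseudo-inverse guards against collinear columns.
    M = np.eye(T) - Z @ np.linalg.pinv(Z)
    MX = np.einsum("ts,nsk->ntk", M, X)        # M X_i for every i
    A = np.einsum("ntk,ntl->kl", X, MX)        # sum_i X_i' M X_i
    b = np.einsum("ntk,nt->k", MX, y)          # sum_i X_i' M y_i
    return np.linalg.solve(A, b)

# Illustrative data (assumed values): one factor, loadings with non-zero
# means so that the rank condition holds, true slope equal to one.
rng = np.random.default_rng(0)
N, T, beta = 200, 30, 1.0
phi = rng.normal(size=T)                       # common factor
lam = rng.normal(1.0, 0.5, size=N)             # loadings in the y equation
gam = rng.normal(1.0, 0.5, size=N)             # loadings in the x equation
eta = rng.normal(size=N)                       # individual effects
x = gam[:, None] * phi + rng.normal(size=(N, T))
y = beta * x + eta[:, None] + lam[:, None] * phi + rng.normal(size=(N, T))

beta_hat = pooled_cce(y, x[:, :, None])
print(beta_hat)
```

Note that the number of factors is never passed to the function, which mirrors the practical appeal of the estimator discussed in the text.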
The CCE estimator is attractive because it is computationally very simple. Furthermore, the estimator has the additional advantage that it does not require specifying the number of factors, which is necessary if the latent factors are estimated using maximum likelihood or an approach based on principal components analysis. On the other hand, it is clear from (20) that the rank condition (19) might be crucial for the estimator. This will be violated if the number of unobserved factors is larger than $K + 1$ or if, for example, the average of the factor loadings in the $y_{it}$ and $x_{it}$ equations tends to a zero vector, in which case $\mathrm{Rank}(\bar\Xi) < M_0$.18 When the rank condition is violated, the CCE estimator requires, for consistency, that the factor loadings satisfy a random coefficients type assumption, specifically that $\lambda_i$ and $\Gamma_i$ are mutually independent and also independent from $\phi_t$. Notice that under such an assumption, the fixed effects estimator remains unbiased and consistent even if $\phi_t$ is correlated with $x_{it}$, provided that the observations are expressed in terms of deviations from time-specific averages.19 This is because

$$E(\upsilon_{it} \mid x_{i1}, \ldots, x_{iT}) = E(\lambda_i'\phi_t + \varepsilon_{it} \mid x_{i1}, \ldots, x_{iT}) = E(\lambda_i'\phi_t \mid x_{i1}, \ldots, x_{iT})$$
$$= E(\lambda_i' \mid x_{i1}, \ldots, x_{iT})\,E(\phi_t \mid x_{i1}, \ldots, x_{iT}) = E(\lambda_i')\,E(\phi_t \mid x_{i1}, \ldots, x_{iT}) = 0, \tag{23}$$

where the second equality holds under strong exogeneity of the covariates with respect to the idiosyncratic error, the third equality holds because $\lambda_i$ and $\phi_t$ are mutually independent, and the fourth equality holds because $\lambda_i$ and $\Gamma_i$ are mutually independent. This result implies that even if $\phi_t$ is correlated with $x_{it}$ it is still possible to obtain consistent estimates of the parameters using the SUR approach of Robertson and Symons (2007) and the residual principal components estimator of Coakley, Fuertes and Smith (2002), provided that the first-stage estimated error covariance matrix, $\hat\Sigma$, is based on the two-way fixed effects regression.
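
The argument around (23) can be checked by simulation. The sketch below is an illustration under assumed values (single factor, scalar regressor, loadings of $y$ and $x$ drawn independently); it is not the design used in Section 4.4, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 1000, 20, 1.0                     # assumed sizes and slope
phi = rng.normal(size=T)                       # common factor
lam = rng.normal(1.0, 1.0, size=N)             # loadings of y
gam = rng.normal(1.0, 1.0, size=N)             # loadings of x, independent of lam
eta = rng.normal(size=N)                       # individual effects

x = gam[:, None] * phi + rng.normal(size=(N, T))
y = beta * x + eta[:, None] + lam[:, None] * phi + rng.normal(size=(N, T))

def within(a):
    """One-way transformation: deviations from individual means."""
    return a - a.mean(axis=1, keepdims=True)

def two_way(a):
    """Two-way transformation: also removes time-specific averages."""
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True) + a.mean())

def ols_slope(yy, xx):
    return (xx * yy).sum() / (xx * xx).sum()

b_fe = ols_slope(within(y), within(x))         # biased: factor left in the error
b_twfe = ols_slope(two_way(y), two_way(x))     # close to beta in this design
print(b_fe, b_twfe)
```

Drawing `lam` and `gam` with a non-zero correlation would break the fourth equality in (23), and the two-way estimate would then be expected to drift away from the true slope as well.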

4.4 A Monte Carlo Study


4.4.1 Design
We investigate the finite sample performance of the above estimators using a limited
Monte Carlo study. The underlying data generating process is given by
1 1 2 2
yit = xit + ! it , ! it = i + it , it = i t + i t + "it ,
1 1 2 2
xit = i t + i t + i + "it , i = 1; :::; N , t = 1; :::; T , (24)
18 The latter implies weak cross-sectional dependence under Definition 1. See e.g. example (9).
19 Essentially this is the two-way error component fixed effects estimator. See e.g. Hsiao (2003, section 3.6) and Baltagi (2008, chapter 3).

where i i:i:d:N 0; 2 ; "it i:i:d:N 0; 2" and "it i:i:d:N (0; 1).20 Furthemore,
m
t i:i:d:N (0; 1), m
i i:i:d:N m
2
; m for m = 1; 2, and
m m 2
i = m
[ m i = m + (1 m
)1=2 i
m
]= m
m
i + m i
m
, (25)

where m = E[( m i m
)( i m m
)]=( m m ), m = m = m , m = m (1
m
2 )1=2 and
m i i:i:d:N ( m ; 1). Hence, i m is a weighted sum of the two mu-
tually independent random components, m i and i
m
, which are weighted such that
m 2 m m
V ar( i ) = and E[( i m
)( i m
)] = m m m
, as required. We set
m
= and 2 2
= m for m = 1; 2. We also set = 2, N = 100, T = 50. 2,000
m m
replications are performed.
Following Kiviet and Sara…dis (2009) we choose values for the simulation parameters
on the basis of (i) 1 , the fraction of the ‘structured’noise, it , over the total noise, ! it ,
(hence just excluding the idiosyncratic disturbance noise), i.e.
2 +
P2 2 +
P2 2
m=1 m m=1 m
1 P 2 2 +
P 2 2
; (26)
2+ 2 +
" m=1 m m=1 m

(ii) 2, which is the fraction of the factor noise over all structured noise, i.e.
P2 2 +
P2 2
m=1 m m=1 m
2 P 2 2 +
P 2 2
; (27)
2 +
m=1 m m=1 m

and (iii) 3 , which re‡ects the closeness of the factor structure to an ordinary time e¤ect
(which it is when 2 m = 0 for m = 1; 2), i.e.
P2 2
m=1
3 P2 2
Pm
2 2
. (28)
m=1 m
+ m=1 m

Normalising 2" = 1 implies that these three fractions parameterise completely 2 ,


P2 2
P2 2 .
m=1 and m=1 In particular, it is straightforward to show that 2 =
m
1 ( 1 + 2 )+ 1 2 P m
P2
( )2
, 2
m=1
2
m
= 1 ( 1 +( 3 )+
)2
1 3
and m=1
2
m
= 11 2 1 3 .21 We set
2 3 2 3
2 = 2 = 22 = 2 =
1 = 0:85, 2 = 0:80 and 3 = 0:80. Further, we set 1 1 2

1
P2 2 ,
P2 2
1=2
2 m=1 m 1
= m=1 m
1 and 2 = 1 We consider Case I in which
20 We also performed the experiments with $\varepsilon_{it} = u_{it} + \theta u_{it-1}$, $\theta = 0.5$, $u_{it} \sim i.i.d.\ N(0, 1)$. However, the results were very similar and therefore they are not reported in the paper. They are available from the authors upon request.
21 An alternative design, which is common practice in the literature, would be to choose the mean and variance of the error components such that the average error cross-sectional correlation, $\bar\rho$, equals a specific value. However, as noted in Kiviet and Sarafidis (2009), this is problematic because a particular value of the average error cross-sectional correlation can be obtained at a multitude of combinations of parameter values. By contrast, reporting the values of these ratios enhances the transparency of the design.

the rank condition (19) is satis…ed and Case II in which the rank condition is violated.
1 1 1 1
The former sets = 1 2 1=2
and = 2 1=2 .
1
(1 ) 1
2
(1 2 ) 2
1 2
This implies that E i = 1
but E i = 2
6= 2
. The latter sets m
=
1
m 1=2
1
, which implies that E ( i m ) = m for m = 1; 2. These two
(1 2 ) m
cases yield the following expectation for the matrix of factor loadings:
1 1
i i 1:347 1:347
Case I : E 2 2 ,
i i 1 1

and
1 1
i i 1:347 1:347
Case II : E 2 2 .
i i 1 1
We also consider two sub-cases for the correlation between $\lambda_i$ and $\gamma_i$, specifically $\rho \in \{0, 0.5\}$, which generates Case I(I)a and Case I(I)b respectively.

4.4.2 Results
Table 1 reports bias, expressed as a percentage, and root mean square error (RMSE) for all estimators.22 FE and TWFE denote the one-way and two-way error component fixed effects estimators respectively; FE-PC and TWFE-PC denote the principal components estimator proposed by Coakley, Fuertes and Smith (2002), based on FE and TWFE residuals; and PCCE and PC denote (21) and the iterative version of (14) respectively. Firstly, we can see that FE exhibits a large bias in all cases. This is because the within transformation does not eliminate the factor structure and the $\phi_t$s are correlated with $x_{it}$ given (24). TWFE performs, perhaps surprisingly, very well in terms of both bias and RMSE so long as the factor loadings of $y$ and $x$ are uncorrelated, namely $\rho = 0$. As expected, the estimator is not affected by whether the rank condition is satisfied or not. However, when $\rho = 0.5$ both bias and RMSE of TWFE increase substantially because (23) no longer holds. The performance of FE-PC and TWFE-PC is naturally affected by the properties of the residuals they use at the first stage. Hence, FE-PC is biased in all circumstances, while TWFE-PC appears to perform very well when TWFE also does well. Of course in this case TWFE-PC outperforms TWFE in terms of variance and RMSE, since TWFE-PC augments the TWFE model by including estimates of the $M_0$ principal components in the set of regressors. PCCE performs best when the rank condition is satisfied. In fact, in this case PCCE outperforms PC, which is remarkable given that $M_0$ is assumed to be known. When the rank condition is violated the estimator seems to do well for $\rho = 0$, although it is outperformed by TWFE-PC and PC in this case. When $\rho = 0.5$ the performance
22 Specifically, we report bias in terms of $100\,(\bar\beta_c - \beta)/\beta$, where $\bar\beta_c$ is the average estimate over all replications of $\beta$, obtained using method $c$. Since $\beta = 1$, the entries represent essentially bias multiplied by one hundred.

of PCCE deteriorates substantially. On the other hand, while PC is not affected by the rank condition, it is affected by the value of $\rho$. Therefore, for $\rho = 0.5$ PC outperforms PCCE only when the rank condition is violated. In this case TWFE-PC does best.

Table 1. Bias in % and RMSE of Estimators (RMSE in parentheses)

                 FE       TWFE     FE-PC    TWFE-PC   PCCE     PC
Case Ia: Rank condition satisfied, $\rho = 0$.
  Bias (%)       13.7     .000     3.67     .027      .027     .063
  RMSE           (.290)   (.034)   (.089)   (.029)    (.013)   (.036)
Case Ib: Rank condition satisfied, $\rho = 0.5$.
  Bias (%)       32.0     10.5     4.01     4.72      .271     4.04
  RMSE           (.643)   (.213)   (.087)   (.107)    (.014)   (.094)
Case IIa: Rank condition violated, $\rho = 0$.
  Bias (%)       31.7     .000     2.43     .051      .051     .067
  RMSE           (.625)   (.034)   (.051)   (.023)    (.027)   (.023)
Case IIb: Rank condition violated, $\rho = 0.5$.
  Bias (%)       35.1     10.5     4.48     3.57      5.95     4.06
  RMSE           (.704)   (.213)   (.092)   (.083)    (.125)   (.093)

In conclusion, we can see that possible non-zero correlations between the factor loadings of $y$ and $x$ can have an adverse effect on the estimators. Whether this issue applies in practice depends, of course, on the specific application. As an example, suppose that (1) represents a cost function, where $y_{it}$ denotes cost, $x_{it}$ denotes a vector of input prices and output, and $\phi_t$ denotes an oil price shock that hits the industry as a whole at time $t$; in this case, it is natural to think that the factor loadings of input prices will be (highly) correlated with the factor loadings of cost. Similarly, one may think of examples where the factor loadings would be mutually uncorrelated.
As mentioned above, PCCE and PC are valid under $N$ and $T$ both large. When $T$ is small it may be more natural to employ a fixed effects treatment of $\lambda_i$ and use one of the methods described in Sections 5.3 and 5.4.

5 Estimation Under Weak Exogeneity


Economic behaviour is intrinsically dynamic. For example, as a result of the force of habit, individual agents may change their consumption and investment patterns with a lapse of time. Similarly, technological and institutional reasons may prevent firms from switching between optimal levels of capital and labor instantaneously. Imperfect knowledge and uncertainty may also contribute to persistence, or a delayed response to shocks by decision makers. In most cases future expectations can play an important role in decision making and expectational errors may imply some degree of dependence between a subset of the regressors and lagged disturbances, leading to weak exogeneity as in (4).

5.1 Asymptotic Properties of Least Squares Estimators
Often the weakly exogenous regressor takes the form of a lagged dependent variable. Since this variable is by construction correlated with the individual effect, $\eta_i$, estimation of the dynamic panel data model is not straightforward and indeed it has spawned a vast literature, which is still growing. In its simplest form, where the lagged dependent variable is the only regressor, the model is
$$y_{it} = \alpha y_{it-1} + \eta_i + \upsilon_{it}, \quad |\alpha| < 1, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T. \tag{29}$$
Since strong exogeneity is violated in (29), standard least-squares-based estimators that rely on the elimination of the individual effect yield inconsistent parameter estimates even if there is no error cross-sectional dependence. Two such estimators are the fixed effects and first-differenced estimators, which converge to the following limiting values:23
$$\operatorname{plim}_{N \to \infty}(\hat\alpha_{FE} - \alpha) = -A(\alpha, T)\,B(\alpha, T)^{-1}, \text{ and} \tag{30}$$
$$\operatorname{plim}_{N \to \infty}\hat\alpha_{FD} = \frac{\alpha - 1}{2}, \tag{31}$$
where $A(\alpha, T) = [T(1-\alpha)]^{-1}\left[T - (1 - \alpha^T)(1-\alpha)^{-1}\right]$ and $B(\alpha, T) = (T-1)(1-\alpha^2)^{-1}\left\{1 - 2\alpha[(1-\alpha)(T-1)]^{-1}\left[1 - (1-\alpha^T)[T(1-\alpha)]^{-1}\right]\right\}$. It follows that both $\hat\alpha_{FE}$ and $\hat\alpha_{FD}$ are inconsistent for fixed $T$ as $N \to \infty$. For $T \to \infty$, $\hat\alpha_{FE}$ is consistent but $\hat\alpha_{FD}$ is not, unless $\mathrm{Var}(\eta_i) = 0$. One way to obtain consistent parameter estimates is to start from (30) or (31); since both estimators converge to functions of $\alpha$ (and $T$) alone, it is possible to solve in terms of $\alpha$ and obtain $\sqrt{N}$-consistent estimates of the autoregressive parameter. In the case of (30) the solution requires a numerical approach due to the fact that $A$ and $B$ are highly nonlinear. However, (31) involves a linear function, making the construction of a consistent, or "bias-corrected", estimator trivial:24
$$\hat\alpha_{BCFD} = 2\hat\alpha_{FD} + 1. \tag{32}$$
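
The bias correction in (32) is easy to reproduce numerically. The following sketch assumes a stationary AR(1) panel with i.i.d. standard normal errors and no cross-sectional dependence; the sample sizes, the value 0.5 for the autoregressive parameter and the burn-in length are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, alpha, burn = 5000, 6, 0.5, 50           # assumed design values

# Simulate y_it = alpha * y_it-1 + eta_i + eps_it with a burn-in period
# so that the retained observations are close to stationary.
eta = rng.normal(size=N)
y = np.zeros((N, T + burn + 1))
for t in range(1, T + burn + 1):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)
y = y[:, burn:]                                # columns y_0, ..., y_T

dy = np.diff(y, axis=1)                        # first differences, t = 1..T
# OLS of the differenced variable on its own lag (first-differenced estimator)
alpha_fd = (dy[:, 1:] * dy[:, :-1]).sum() / (dy[:, :-1] ** 2).sum()
alpha_bcfd = 2.0 * alpha_fd + 1.0              # bias-corrected estimator

print(alpha_fd, alpha_bcfd)
```

In this design the first-differenced estimate should settle near $(\alpha - 1)/2$ and the corrected estimate near $\alpha$; adding a factor component to the errors is expected to break the correction, in line with the discussion of cross-sectional dependence later in this section.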
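Phillips and Sul (2007) analyse the properties of $\hat\alpha_{FE}$ under error cross-sectional dependence. They show that the estimator converges, for fixed $T$, to
$$\operatorname{plim}_{N \to \infty}(\hat\alpha_{FE} - \alpha) = -\left[\sigma_\varepsilon^2 A(\alpha, T) + A_T^{(1)}\right]\left[\sigma_\varepsilon^2 B(\alpha, T) + B_T^{(1)}\right]^{-1}, \tag{33}$$

where $A(\alpha, T)$ and $B(\alpha, T)$ are defined in (30), and

$$A_T^{(1)} = \sum_{t=1}^{T}\phi_t'\left(\Sigma_\lambda + \mu_\lambda\mu_\lambda'\right)(w_{t-1} - \bar w_{\cdot,-1}), \text{ and}$$
$$B_T^{(1)} = \sum_{t=1}^{T}(w_{t-1} - \bar w_{\cdot,-1})'\left(\Sigma_\lambda + \mu_\lambda\mu_\lambda'\right)(w_{t-1} - \bar w_{\cdot,-1}), \tag{34}$$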
23
See Nickell (1981) and Phillips and Sul (2007).
24
See Chowdhury (1987). It is worth mentioning that bias-corrected estimators of this type have not
found many applications in the literature since it is not straightforward to generalise them into models
that include weakly exogenous regressors other than the lagged dependent variable.

where $w_{t-1} = \sum_{\tau=0}^{\infty}\alpha^{\tau}\phi_{t-1-\tau}$, $\bar w_{\cdot,-1} = T^{-1}\sum_{t=1}^{T} w_{t-1}$, $E(\lambda_i) = \mu_\lambda$ and $E[(\lambda_i - \mu_\lambda)(\lambda_i - \mu_\lambda)'] = \Sigma_\lambda$. Therefore, cross-sectional dependence adds an extra source of bias; for $\lambda_i = 0$ (33) reduces to (30). It is worth noting that, contrary to (30), the probability limit in (33) depends on nuisance parameters, in particular $\phi_t$. According to Phillips and Sul (2007), this may explain the substantial variability observed in dynamic panel estimates when there is cross-sectional dependence, even in situations where $N$ is large.
It is worth mentioning that time-specific demeaning of the observations will not remove the source of bias that is due to the factor structure from (33), even if the factor loadings satisfy a random coefficients type assumption. Instead, it is straightforward to show that the only difference in the plim of $\hat\alpha_{FE}$ is that (34) changes to
$$\tilde A_T^{(1)} = \sum_{t=1}^{T}\phi_t'\,\Sigma_\lambda\,(w_{t-1} - \bar w_{\cdot,-1}), \text{ and}$$
$$\tilde B_T^{(1)} = \sum_{t=1}^{T}(w_{t-1} - \bar w_{\cdot,-1})'\,\Sigma_\lambda\,(w_{t-1} - \bar w_{\cdot,-1}). \tag{35}$$

This is contrary to the case of strong exogeneity, in which the two-way error component fixed effects estimator is consistent under a random coefficients type assumption for the factor loadings, even if the latent factors are correlated with the regressors.
Unfortunately, under weak exogeneity the methods discussed in the previous section, based on maximum likelihood and principal components analysis, will not generally yield consistent parameter estimates either. For instance, the PC estimator may be thought of as a two-stage process, whereby $\Phi$ is purged from the model by multiplying through by the projection $M_\Phi$ and then $\hat\beta$ is obtained by (non-)linear regression. However, this procedure transforms the residuals of the model to $M_\Phi\tilde\varepsilon_i$. Each entry of this vector is a linear combination of the elements of the whole time-series $\tilde\varepsilon_i$ and thus it is not orthogonal to the corresponding entry in $\tilde X_i$. This means that writing (14) as
$$\hat\beta_{PC} = \beta + \left[N^{-1}\sum_{i}^{N}\tilde X_i' M_{\hat\Phi}\tilde X_i\right]^{-1} N^{-1}\sum_{i}^{N}\tilde X_i' M_{\hat\Phi}\tilde\varepsilon_i, \tag{36}$$

the second term on the right-hand side will not have zero probability limit as $N \to \infty$ for fixed $T$, or even as $(N, T) \to \infty$ jointly. The same issue arises with the CCE estimator, which is equivalent to a (stacked) linear regression of $y_i$ on $\bar M X_i$, where in this case $\bar M$ is the projection that removes $\bar y_t$ and $\bar x_t$, and as such the transformed residuals are not orthogonal to the regressor unless the latter is strongly exogenous.
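The properties of the bias-corrected FD estimator (32) under error cross-sectional dependence are investigated by Hayakawa (2007). The author shows that

$$\operatorname{plim}_{N \to \infty}\hat\alpha_{BCFD} = 2\alpha + 1 + \frac{2T^{-1}A_T^{(2)} - 2\sigma_\varepsilon^2}{T^{-1}B_T^{(2)} + \frac{2}{1+\alpha}\sigma_\varepsilon^2}, \tag{37}$$
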
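where
$$A_T^{(2)} = \sum_{t=1}^{T}\Delta\phi_t'\left(\Sigma_\lambda + \mu_\lambda\mu_\lambda'\right)\Delta w_{t-1} \quad \text{and} \quad B_T^{(2)} = \sum_{t=1}^{T}\Delta w_{t-1}'\left(\Sigma_\lambda + \mu_\lambda\mu_\lambda'\right)\Delta w_{t-1}, \tag{38}$$
where $\Delta w_{t-1} = \sum_{\tau=0}^{\infty}\alpha^{\tau}\Delta\phi_{t-1-\tau}$. Therefore, in this case cross-sectional dependence turns an otherwise consistent estimator inconsistent, for fixed $T$. On the other hand, for $T \to \infty$ we have that $\operatorname{plim}_{N \to \infty}\hat\alpha_{BCFD} = \alpha$, regardless of whether $N$ is fixed or $N \to \infty$.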

5.2 Asymptotic Properties of IV and GMM Estimators


By far the most popular approach for estimating dynamic panel data models is the method of instrumental variables and the Generalised Method of Moments (see Anderson and Hsiao, 1981, Holtz-Eakin, Newey and Rosen, 1988, and Arellano and Bond, 1991). The point of departure is the simple observation that

$$E[y_{it-s}\,\Delta\upsilon_{it}] = 0; \quad \text{for } t = 2, \ldots, T \text{ and } 2 \le s \le t. \tag{39}$$

Replacing the expectation by the sample average and minimising a weighted quadratic distance function with respect to $\alpha$ produces a consistent estimator. Subsequent extensions augment the standard first-differenced GMM estimator with additional moment conditions implied either by the same basic assumptions (Ahn and Schmidt, 1995) or by additional assumptions regarding the initial conditions (Arellano and Bover, 1995, and Blundell and Bond, 1998). The latter allows one to combine the equations in first-differences with the equations in levels, constructing a 'system' GMM estimator.
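
A just-identified version of the moment condition (39), using only the second lag of the level as instrument (in the spirit of Anderson and Hsiao, 1981), can be sketched as follows. The function names and all simulation values are assumptions made for illustration; the errors here are cross-sectionally independent, which is the case in which the estimator is consistent.

```python
import numpy as np

def simulate_ar1_panel(N, T, alpha, rng, burn=50):
    """AR(1) panel with individual effects; returns columns y_0, ..., y_T."""
    eta = rng.normal(size=N)
    y = np.zeros((N, T + burn + 1))
    for t in range(1, T + burn + 1):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)
    return y[:, burn:]

def simple_iv(y):
    """IV estimate of alpha: instrument the lagged difference with y_it-2."""
    dy = np.diff(y, axis=1)                    # differences for t = 1..T
    dy_t, dy_lag = dy[:, 1:], dy[:, :-1]       # current and lagged difference, t = 2..T
    z = y[:, :-2]                              # instrument: level lagged twice
    return (z * dy_t).sum() / (z * dy_lag).sum()

rng = np.random.default_rng(3)
alpha_iv = simple_iv(simulate_ar1_panel(N=5000, T=6, alpha=0.5, rng=rng))
print(alpha_iv)
```

A full GMM estimator would stack all available lags and use a weighting matrix; the single-instrument ratio above is only the simplest member of that family.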
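Sarafidis and Robertson (2009) analyse the behavior of the standard IV estimator under error cross-sectional dependence. They show that the estimator has the following probability limit:
$$\operatorname{plim}_{N \to \infty}(\hat\alpha_{IV} - \alpha) = \frac{\operatorname{plim}_{N \to \infty} N^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T} y_{it-2}\,\Delta\upsilon_{it}}{\operatorname{plim}_{N \to \infty} N^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T} y_{it-2}\,\Delta y_{it-1}} = A_T^{(3)}\left[B_T^{(3)} - \frac{(T-1)\,\sigma_\varepsilon^2}{1+\alpha}\right]^{-1}, \tag{40}$$
where
$$A_T^{(3)} = \sum_{t=2}^{T}\Delta\phi_t'\left(\Sigma_\lambda + \mu_\lambda\mu_\lambda'\right)w_{t-2} \quad \text{and} \quad B_T^{(3)} = \sum_{t=2}^{T}\Delta w_{t-1}'\left(\Sigma_\lambda + \mu_\lambda\mu_\lambda'\right)w_{t-2}. \tag{41}$$

Therefore, similarly to the result for BCFD, cross-sectional dependence renders an otherwise consistent estimator inconsistent for fixed $T$.25 Essentially, this is because the numerator in (40) converges to the population moment condition, conditional on $\{\phi_\tau\}_{-\infty}^{t}$, which is $E[(y_{it-2}\,\Delta\upsilon_{it}) \mid \{\phi_\tau\}_{-\infty}^{t}] \neq 0$, even if the unconditional expectation $E[(y_{it-2}\,\Delta\upsilon_{it})] = 0$. A direct by-product of this result is that all standard GMM estimators that make use of lagged values of the dependent variable as instruments for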
25 Notice, however, that for large $T$ the bias diminishes because $T^{-1}A_T^{(3)} = o_p(1)$.

the endogenous regressor are inconsistent. This holds true for any lag length of the instruments used. For instance, (39) becomes

$$E[y_{it-s}\,\Delta\upsilon_{it} \mid \{\phi_\tau\}_{-\infty}^{t}] = \Delta\phi_t'\left(\Sigma_\lambda + \mu_\lambda\mu_\lambda'\right)w_{t-s} \neq 0; \quad \text{for } t = 2, \ldots, T \text{ and } 2 \le s \le t - 1. \tag{42}$$

A similar result applies for system GMM. In general, the asymptotic bias of these estimators will depend on the particular transformation employed, the number of instruments used and the choice of the weighting matrix. It is worth emphasizing that not all forms of cross-sectional dependence are detrimental to GMM. Sarafidis (2009) focuses on the conditions required on the cross-sectional dimension of the error process for the standard dynamic panel GMM estimator to remain consistent. He demonstrates that, if there is cross-sectional dependence in the errors, it suffices that this is weak (under Definition 2).
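Notice that for the single-factor case, the asymptotic bias of $\hat\alpha_{IV}$ reduces to
$$\operatorname{plim}_{N \to \infty}(\hat\alpha_{IV} - \alpha) = \frac{(\sigma_\lambda^2 + \mu_\lambda^2)\,\kappa_1}{(\sigma_\lambda^2 + \mu_\lambda^2)\,\kappa_2 - \frac{(T-1)\,\sigma_\varepsilon^2}{1+\alpha}}, \tag{43}$$

where $\kappa_1 = \sum_{t=2}^{T} w_{t-2}\,\Delta\phi_t$ and $\kappa_2 = \sum_{t=2}^{T}\Delta w_{t-1}\,w_{t-2} = \sum_{t=2}^{T} w_{t-1}w_{t-2} - \sum_{t=2}^{T}(w_{t-2})^2$. Sarafidis and Robertson (2009) demonstrate that $\hat\alpha_{IV}$ is biased downwards in this case.
When the observations are expressed in terms of deviations from time-specific averages, the asymptotic bias of the IV estimator is
$$\operatorname{plim}_{N \to \infty}(\tilde\alpha_{IV} - \alpha) = \frac{\operatorname{plim}_{N \to \infty} N^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\underline{y}_{it-2}\,\Delta\underline{\upsilon}_{it}}{\operatorname{plim}_{N \to \infty} N^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\underline{y}_{it-2}\,\Delta\underline{y}_{it-1}} = A_T^{(3)}\left[B_T^{(3)} - \frac{(T-1)\,\sigma_\varepsilon^2}{1+\alpha}\right]^{-1}, \tag{44}$$
where $\underline{y}_{it} = y_{it} - \bar y_t$, and now $A_T^{(3)} = \sum_{t=2}^{T}\Delta\phi_t'\,\Sigma_\lambda\,w_{t-2}$ and $B_T^{(3)} = \sum_{t=2}^{T}\Delta w_{t-1}'\,\Sigma_\lambda\,w_{t-2}$, while for $M_0 = 1$ (44) reduces to
$$\operatorname{plim}_{N \to \infty}(\tilde\alpha_{IV} - \alpha) = \frac{\sigma_\lambda^2\,\kappa_1}{\sigma_\lambda^2\,\kappa_2 - \frac{(T-1)\,\sigma_\varepsilon^2}{1+\alpha}}. \tag{45}$$

Notice that for $\sigma_\lambda^2 = 0$ (i.e. factor loadings have zero variance) the bias in $\tilde\alpha_{IV}$ disappears. Sarafidis and Robertson demonstrate that unless $\mu_\lambda = 0$ the asymptotic bias of $\tilde\alpha_{IV}$ is, in general, going to be smaller than that of $\hat\alpha_{IV}$. Intuitively, this is because time-specific demeaning reduces the impact of the factor structure (by removing the mean value of $\lambda_i$), which is the reason for the asymptotic bias of the IV estimator. Simulation results confirm these findings and provide a formal justification for the practice of including common time effects in the context of a short dynamic panel data model with large $N$ and $T$ fixed.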
Although time-specific demeaning may reduce the impact of cross-sectional dependence (provided that $\mu_\lambda \neq 0$) it will not eliminate it, unless $\Sigma_\lambda = 0$. Sarafidis, Yamagata and Robertson (2009) show that one can still obtain fixed-$T$, $\sqrt{N}$-consistent estimates

of the parameters within the GMM framework by using instruments with respect to the subset of regressors that are strongly exogenous (if any), provided that they remain so in the misspecified model. Strong exogeneity of a subset of $x_{it}$ will be maintained in the misspecified model if their factor loadings are either zero, or mutually uncorrelated with the factor loadings involved in the $y$ process. Empirically, this can be determined using Sargan's (1958) or Hansen's (1982) overidentification restrictions test statistic.
Sarafidis (2009) demonstrates that under weakly correlated errors (using Definition 2), an additional, non-redundant, set of moment conditions arises for each individual $i$, specifically instruments with respect to the individual(s) with which unit $i$ is weakly correlated. This set of instruments can be used to obtain consistent estimates of the parameters in situations where the error structure is subject to both weak and strong correlations.26 For instance, consider (29) and let
$$\upsilon_{it} = \lambda_i'\phi_t + \varepsilon_{it} + \theta\varepsilon_{jt}. \tag{46}$$

Hence the composite error, $\upsilon_{it}$, is subject to a multi-factor structure and the purely idiosyncratic component, $\varepsilon_{it}$, is spatially correlated and follows an MA(1) process.27 In other words, misspecification of the model results in both global (factors) and local (spatial) correlations. Transforming in terms of deviations from time-specific averages yields
$$\underline{y}_{it} = \alpha\,\underline{y}_{it-1} + \underline{\eta}_i + \underline{\lambda}_i'\phi_t + \underline{\varepsilon}_{i,t} + \theta\,\underline{\varepsilon}_{j,t}. \tag{47}$$
As mentioned above, the moment conditions with respect to lagged values of $y$ are invalidated under the multi-factor structure (see e.g. (42)). However, it turns out that

$$E[\underline{y}_{j,t-s}\,\Delta\underline{\upsilon}_{i,t} \mid \{\phi_\tau\}_{-\infty}^{t}] = 0; \quad \text{for } t = 2, \ldots, T \text{ and } 2 \le s \le t - 1. \tag{48}$$

The required assumption for the above result is that the factor loadings are cross-
sectionally uncorrelated, i.e. E i 0j j . A similar expression to (48) (mutatis mutandis)
applies for system GMM. The resulting GMM approach is attractive because it does
not require strongly exogenous regressors under the misspecified model, although it does
require the specification of a weighting matrix, which may or may not be appropriate in
certain economic applications.

5.3 Estimation Using Quasi-Differencing Approaches


A different approach, valid for fixed $T$, which does not require strongly exogenous regressors or spatially correlated errors, is to treat $\phi_t$ as a vector of fixed parameters, specific to each time period, and the vector $\lambda_i$ as random variables which are correlated with the covariates but uncorrelated with the purely idiosyncratic component, $\varepsilon_{it}$. Essentially, this is the usual fixed effects assumption extended to the factor case, although
26 This structure is also studied by Pesaran and Tosetti (2009) and Chudik, Pesaran and Tosetti (2009).
27 Spatial dependence is only a device here; in fact, ordering of the observations is not necessary to obtain the results.

the within transformation cannot eliminate the common factors in this case. To this end, a number of different transformations, all based on quasi-differencing, have been proposed to eliminate the factors from the model and estimate the structural parameters using the generalised method of moments.
An early application of this approach has been considered by Wansbeek and Knaap (1999), who imposed $M_0 = 1$ and $\phi_t = t$. So instead of an arbitrary sequence of time fixed effects $\phi_1, \ldots, \phi_T$ entering the model multiplicatively, there is a linear trend with individual-specific coefficients. After taking first-differences $\eta_i$ drops out of the model. The linear trend becomes a constant, which disappears after taking first-differences again. Double-differencing may eliminate much of the variation of the data and the issue of weak instruments might arise, cf. Bekker (1994), also discussed by Wansbeek and Knaap (1999).
A generalization of the model above is given by Nauges and Thomas (2003), employing a transformation proposed by Holtz-Eakin, Newey and Rosen (1988). In particular, they consider (29) with a single-factor error structure, i.e. $\upsilon_{it} = \lambda_i\phi_t + \varepsilon_{it}$, and $T$ fixed. They use first-differencing to eliminate the individual effects, which yields
$$\Delta y_{it} = \alpha\,\Delta y_{it-1} + \Delta\upsilon_{it}, \quad \Delta\upsilon_{it} = \lambda_i\,\Delta\phi_t + \Delta\varepsilon_{it}. \tag{49}$$
Define $\varrho_t = \Delta\phi_t/\Delta\phi_{t-1}$; lagging (49), multiplying by $\varrho_t$ and subtracting yields
$$\Delta y_{it} - \varrho_t\,\Delta y_{it-1} = \alpha(\Delta y_{it-1} - \varrho_t\,\Delta y_{it-2}) + \lambda_i(\Delta\phi_t - \varrho_t\,\Delta\phi_{t-1}) + (\Delta\varepsilon_{it} - \varrho_t\,\Delta\varepsilon_{it-1})$$
$$= \alpha(\Delta y_{it-1} - \varrho_t\,\Delta y_{it-2}) + (\Delta\varepsilon_{it} - \varrho_t\,\Delta\varepsilon_{it-1}). \tag{50}$$
Notice that appropriate lagged values of the dependent variable will be uncorrelated with the transformed error term, leading to a GMM estimator based on Arellano-Bond type moment conditions. Assuming $\varepsilon_{it}$ is serially uncorrelated, this set of moment conditions is
$$E[y_{it-s}(\Delta\varepsilon_{it} - \varrho_t\,\Delta\varepsilon_{it-1})] = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 3 \le s \le t. \tag{51}$$
The main difference with (39) is that the moment conditions above are non-linear because the time-specific nuisance parameters, $\varrho_t$, need to be estimated jointly with the structural parameter, $\alpha$. The results from their Monte Carlo study are mixed; while the proposed GMM estimator exhibits, in general, smaller bias compared to the standard first-differenced GMM estimator, it also has larger variance to the extent that it is outperformed in terms of RMSE.28
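
The algebra behind (50) can be verified numerically: once the ratio $\varrho_t$ is known, quasi-differencing the first-differenced factor component removes it exactly. The sketch below uses arbitrary simulated values for a single factor and leaves out the idiosyncratic error so that the cancellation is visible; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 5, 8                                    # assumed small sizes
phi = rng.normal(size=T)                       # factor values, treated as fixed
lam = rng.normal(size=N)                       # factor loadings
eta = rng.normal(size=N)                       # individual effects

# Error component without idiosyncratic noise: eta_i + lam_i * phi_t
u = eta[:, None] + lam[:, None] * phi[None, :]

du = np.diff(u, axis=1)                        # first difference removes eta_i
dphi = np.diff(phi)
rho = dphi[1:] / dphi[:-1]                     # ratio of consecutive factor differences

# Quasi-difference: current difference minus rho_t times the lagged difference
quasi_diff = du[:, 1:] - rho[None, :] * du[:, :-1]
print(np.abs(quasi_diff).max())
```

In an actual application the ratios are unknown and must be estimated jointly with the autoregressive parameter, which is what makes the moment conditions (51) non-linear.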
Ahn, Lee and Schmidt (2006) consider a model with a multi-factor error structure and weakly/strongly exogenous regressors. They use a different transformation, based on multi-quasi-differencing, and propose a GMM estimator applied on the multi-quasi-differenced model. To see how this method works, assume, without loss of generality, that $M_0 = 2$ and consider the following model:
$$y_{it} = \alpha y_{it-1} + \upsilon_{it}, \quad \upsilon_{it} = \lambda_i^1\phi_t^1 + \lambda_i^2\phi_t^2 + \varepsilon_{it}, \tag{52}$$
28 An alternative transformation for the single-factor model, based on quasi-differencing as well, is provided by Ahn, Lee and Schmidt (2001).

where $\varepsilon_{it}$ is serially uncorrelated. Identification of this factor model requires $M_0^2$ [= 4] restrictions. Typically, $M_0(M_0 + 1)/2$ restrictions arise by normalising
$$\sum_{t=1}^{T}\phi_t^m\phi_t^n = \begin{cases} 1 & \text{for } m = n, \\ 0 & \text{otherwise.} \end{cases} \tag{53}$$

Additional $M_0(M_0 - 1)/2$ restrictions are usually obtained by requiring that the factor loadings are mutually uncorrelated. Since $M_0 = 2$, this would yield one extra restriction in the present case. Alternatively, one can impose $M_0^2$ restrictions solely on the factors, which are treated as parameters. This is the approach followed by Ahn, Lee and Schmidt. In particular, they normalise $\phi_T^1 = 1$, $\phi_{T-1}^1 = 0$, $\phi_T^2 = 0$, $\phi_{T-1}^2 = 1$. In this case model (52) becomes, for periods $T - 1$ and $T$, respectively,
$$y_{iT-1} = \alpha y_{iT-2} + \lambda_i^2 + \varepsilon_{iT-1}, \tag{54}$$

and
$$y_{iT} = \alpha y_{iT-1} + \lambda_i^1 + \varepsilon_{iT}. \tag{55}$$
Multiplying (55) and (54) by $\phi_t^1$ and $\phi_t^2$ respectively and subtracting from (52) yields
$$y_{it} - \phi_t^1 y_{iT} - \phi_t^2 y_{iT-1} = \alpha\left(y_{it-1} - \phi_t^1 y_{iT-1} - \phi_t^2 y_{iT-2}\right) + \left(\varepsilon_{it} - \phi_t^1\varepsilon_{iT} - \phi_t^2\varepsilon_{iT-1}\right). \tag{56}$$

This suggests the following $(T - M_0)(T - M_0 + 1)/2$ non-linear moment conditions:

$$E\left[y_{it-s}\left(\varepsilon_{it} - \phi_t^1\varepsilon_{iT} - \phi_t^2\varepsilon_{iT-1}\right)\right] = 0; \quad \text{for } t = 2, \ldots, T \text{ and } 2 \le s \le t, \tag{57}$$

which lead to joint estimation of the structural parameter, $\alpha$, and the $(T - 2) \times 2$ nuisance parameters. In a compact form, for any fixed number of factors one may write the model as
$$y_i = \alpha y_{i,-1} + \Phi\lambda_i + \varepsilon_i, \tag{58}$$
where = ( 0u ; IM ) and u is the (T M0 1) M0 matrix of unrestricted parameters
with typical entry mt for t = 2; :::; T M0 and m = 1; :::; M0 . The transformation that
makes the error orthogonal to the factors amounts to pre-multiplying (58) by 01 , where
0 0
1 = (IT M ; u ) , since 1 = 0 by construction.
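
The orthogonality that drives this transformation can be checked directly. In the sketch below the factor matrix is written as unrestricted rows stacked on an identity block, and the transformation matrix is built from an identity block and the negative of the unrestricted rows. The names `Phi`, `Phi_u` and `H`, the use of $T$ rows and the numerical values are assumptions of this sketch and simplify the indexing used in the text.

```python
import numpy as np

rng = np.random.default_rng(4)
T, M = 7, 2                                    # illustrative sizes (assumptions)

Phi_u = rng.normal(size=(T - M, M))            # unrestricted factor values
Phi = np.vstack([Phi_u, np.eye(M)])            # normalised factor matrix, T x M
H = np.vstack([np.eye(T - M), -Phi_u.T])       # T x (T - M), built so that H'Phi = 0

lam_i = rng.normal(size=M)                     # loadings of one individual
eps_i = rng.normal(size=T)                     # idiosyncratic errors
composite = Phi @ lam_i + eps_i                # factor component plus noise

transformed = H.T @ composite                  # factor component is removed
print(np.abs(H.T @ Phi).max())
```

After the transformation only a linear combination of the idiosyncratic errors remains, which is why lagged levels can serve as instruments in the transformed equations.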

5.4 Alternative Approaches


An alternative approach to quasi-differencing involves introducing explicitly a new set of parameters, which, under the fixed effects assumption, represent the unobserved covariances between the covariates and the common factor component of the disturbance. This is the method followed by Robertson, Sarafidis and Symons (2010) and Bai (2010). The former is based on the generalised method of moments and the latter on the method of maximum likelihood. It is instructive to illustrate these methods using model (52) and letting $M_0$ free. Let $E(\lambda_i y_{is}) = \psi_s$ for any $t$ and $s$. Notice that under no serial

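correlation in $\varepsilon_{it}$, we have $E(\varepsilon_{it}y_{is}) = 0$ for any $s \le t - 1$. Therefore, the following $T(T + 1)/2$ centered moment conditions exist:
$$E\left(y_{is}\upsilon_{it} - \psi_s'\phi_t\right) = 0, \quad \text{for } t = 1, \ldots, T \text{ and } s \le t - 1. \tag{59}$$

The above expression is similar to a moment condition like $E(X_i\upsilon_i) = 0$, except that the former is non-linear because some of the parameters enter multiplicatively. Writing the model in vector form, we have

$$y_i = \alpha y_{i,-1} + \upsilon_i, \quad \upsilon_i = \Phi\lambda_i + \varepsilon_i, \tag{60}$$

where $y_i = (y_{i1}, \ldots, y_{iT})'$, $y_{i,-1} = (y_{i0}, \ldots, y_{iT-1})'$, $\Phi = (\phi_1, \ldots, \phi_T)'$ is a $T \times M_0$ matrix, and $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$. Define the $T \times T(T+1)/2$ matrix of instruments as follows:
$$Z_i = \begin{bmatrix} y_{i0} & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & y_{i0} & y_{i1} & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i0} & y_{i1} & \cdots & y_{iT-1} \end{bmatrix}. \tag{61}$$
We have
$$E\left[Z_i'u_i - S(I_T \otimes \Psi)\phi^T\right] = 0, \tag{62}$$
where $S$ is a selector matrix of order $T(T + 1)/2 \times T^2$ that consists of 0s and 1s, with a single 1 in each row29, $\Psi = (\psi_0, \ldots, \psi_{T-1})'$ is a $T \times M_0$ matrix, and $\phi^T = (\phi_1', \phi_2', \ldots, \phi_T')'$ is a $TM_0 \times 1$ vector.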
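Replacing the population moments with their sample averages yields
$$N^{-1}\sum_{i=1}^{N}\left[Z_i'u_i - S(I_T \otimes \Psi)\phi^T\right] = 0. \tag{63}$$

Defining $\theta = \left(\alpha, \mathrm{vec}(\Psi)', \phi^{T\prime}\right)'$, the GMM estimator is
$$\hat\theta = \arg\min_{\theta}\left(N^{-1}\sum_{i=1}^{N}\left[Z_i'u_i - S(I_T \otimes \Psi)\phi^T\right]\right)' A_N\left(N^{-1}\sum_{i=1}^{N}\left[Z_i'u_i - S(I_T \otimes \Psi)\phi^T\right]\right), \tag{64}$$
where $A_N$ is a non-negative definite weight matrix.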
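
The block structure of the instrument matrix in (61) is simple to build in code. The following sketch constructs it for one individual; the function name `instrument_matrix` and the toy input are hypothetical and serve only to show the layout.

```python
import numpy as np

def instrument_matrix(y_i):
    """Instrument matrix for one individual.

    y_i holds (y_i0, ..., y_iT). Row t (t = 1..T) of the result carries the
    levels y_i0, ..., y_it-1 in its own block of columns, zeros elsewhere,
    giving a T x T(T+1)/2 matrix.
    """
    T = len(y_i) - 1
    Z = np.zeros((T, T * (T + 1) // 2))
    col = 0
    for t in range(1, T + 1):
        Z[t - 1, col:col + t] = y_i[:t]        # all levels dated before t
        col += t
    return Z

Z = instrument_matrix(np.array([1.0, 2.0, 3.0, 4.0]))   # toy series, T = 3
print(Z)
```

Multiplying the transpose of this matrix by the residual vector of the same individual stacks every product of a lagged level with a later residual, which are the sample counterparts of the terms in (59).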
Robertson, Sarafidis and Symons (2010) call estimators in this class Factor Instrumental Variable (FIV) estimators. They note that in most practical circumstances a set of linear restrictions can be demonstrated to hold among the parameters, namely the matrix $\Psi$. These can be obtained by writing the model as

$$z_{it}'(1, -\alpha)' = \lambda_i'\phi_t + \varepsilon_{it}, \tag{65}$$
29
The number of rows of S corresponds to the number of moment conditions available and the number
of columns corresponds to the number of regressors (1 at present) times the number of time periods
available squared.

where $z_{it} = (y_{it}, y_{it-1})'$, and then multiplying through by $\lambda_i$ and taking expectations:
$$E(\lambda_i z_{it}')(1, -\alpha)' = \Sigma_\lambda\phi_t, \tag{66}$$

where $\Sigma_\lambda = E(\lambda_i\lambda_i')$. The key point here is that the elements in $E(\lambda_i z_{it}')$ include terms in various of the $\psi_s$ because the instrument set includes elements of $z_{it}$, so the left-hand side of (66) is a linear function of the entries in $\Psi$. For example, for the single-factor model the linear restrictions take the form
$$\psi_{s+1} = \alpha\psi_s + \sigma_\lambda^2\phi_{s+1}, \quad s = 0, \ldots, T - 1,$$

where $\sigma_\lambda^2 = E(\lambda_i^2)$. For $s = T - 1$, $\psi_{s+1} = \psi_T = E(y_{iT}\lambda_i)$, which can be regarded as a constant to be estimated. Robertson, Sarafidis and Symons (2010) call the GMM estimator that exploits these restrictions FIVR (restricted FIV), in contrast to the estimator obtained when these restrictions are not imposed, FIVU (unrestricted FIV). They show that FIVU is asymptotically equivalent to the quasi-differenced GMM estimator of Ahn, Lee and Schmidt (2006), while FIVR is asymptotically more efficient.
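Bai (2010) proposes controlling the correlations between the regressors and the common factor component using the method of Chamberlain (1982). In particular, consider the linear projection of $y_{i0}$ on $\lambda_i$:
$$y_{i0} = \delta_0^* + \lambda_i'\phi_0^* + \varepsilon_{i0}^*, \tag{67}$$

which can be derived using the reduced form of $y_{i0}$,

$$y_{i0} = \frac{\delta_0}{1-\alpha} + \lambda_i'\sum_{j=0}^{\infty}\alpha^j\phi_{-j} + \sum_{j=0}^{\infty}\alpha^j\varepsilon_{i,-j}, \tag{68}$$
with $\delta_0^* = \delta_0/(1-\alpha)$, $\phi_0^* = \sum_{j=0}^{\infty}\alpha^j\phi_{-j}$ and $\varepsilon_{i0}^* = \sum_{j=0}^{\infty}\alpha^j\varepsilon_{i,-j}$.
This implies a system of $T + 1$ equations

$$Ay_i^+ = \delta^+ + \Phi^+\lambda_i + \varepsilon_i^+, \tag{69}$$

where $y_i^+ = (y_{i0}, y_{i1}, \ldots, y_{iT})'$, $\delta^+ = (\delta_0^*, 0, \ldots, 0)'$, $\Phi^+ = (\phi_0^*, \phi_1, \ldots, \phi_T)'$, $\varepsilon_i^+ = (\varepsilon_{i0}^*, \varepsilon_{i1}, \ldots, \varepsilon_{iT})'$ and
$$A = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ -\alpha & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & -\alpha & 1 \end{bmatrix}.$$
Let $\Sigma^+ = \Phi^+\Sigma_\lambda\Phi^{+\prime} + \Sigma_\varepsilon^+$, the covariance of $\Phi^+\lambda_i + \varepsilon_i^+$, and $\upsilon_i^+ = Ay_i^+ - \delta^+$. The log-likelihood function for $y_i^+$ is
$$-\frac{N}{2}\ln\left|\Sigma^+\right| - \frac{1}{2}\sum_{i=1}^{N}\upsilon_i^{+\prime}\left(\Sigma^+\right)^{-1}\upsilon_i^+. \tag{70}$$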

Notice that since $A$ is a lower-triangular matrix with ones on the diagonal, $\det(A) = 1$ and therefore the Jacobian term does not enter into the likelihood. Bai suggests estimating (70) using a quasi-maximum likelihood (QML) procedure based on the ECM (expectation and conditional maximization) algorithm.
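
The claim about the Jacobian can be confirmed with a few lines of NumPy. The sketch assumes that the matrix has ones on the diagonal and the negative of the autoregressive parameter on the first sub-diagonal, as the system of $T + 1$ equations suggests; the values of `T` and `alpha` are arbitrary.

```python
import numpy as np

T, alpha = 5, 0.6                              # assumed illustrative values

# (T+1) x (T+1) lower-triangular matrix: ones on the diagonal and
# -alpha on the first sub-diagonal.
A = np.eye(T + 1) - alpha * np.eye(T + 1, k=-1)
det_A = np.linalg.det(A)

# Applying A to a series leaves the initial value unchanged and
# quasi-differences every later observation.
y_plus = np.arange(T + 1, dtype=float)         # toy series y_0, ..., y_T
quasi_diff = A @ y_plus

print(det_A, quasi_diff)
```

Because the determinant of a triangular matrix is the product of its diagonal entries, the result does not depend on the value chosen for `alpha`.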
When the model includes covariates, xit , the reduced form of yi0 can be written as
0
yi0 = 0 + i 0 + wi0 0 + "0 , (71)

where wi = vec (x0i ) and 0 is loosely speaking the linear projection of xi on i . Hence
0 0
the residual is + i = Ayi
+ +
wi , with = 0
0 IT , and the likelihood
function is identical to (70).
The attractive feature of the FIVR and QML estimators is that they can both allow
a fixed effects specification as a special case, in which one of the factors is constant over
time. Furthermore, they are valid under strongly and weakly exogenous regressors and
permit unit roots. In this way, these estimators generalise the classical error components
formulation for a wide range of panel data models. Moreover, FIV estimators share
the traditional advantage of method of moments estimators in that they exploit only
a set of orthogonality conditions and make no use of subsidiary assumptions such as
homoskedasticity or other assumed distributional properties of the error process. One
difference between FIV and QML is that in the former approach the factors whose loadings
are uncorrelated with the regressors will enter into the residuals of the model,
thus resulting in fewer parameters to be estimated. QML will estimate all factors, which
can also be desirable if these factors have a structural significance.
When T is large, treating the factors as fixed parameters leads to an incidental parameters
problem, so the methods described above are not appropriate. One way to proceed is to treat
the factors as random and use the panel feasible generalised median unbiased (PFGMU)
estimator proposed by Phillips and Sul (2003). This involves using the residuals obtained
from a first-stage panel median unbiased estimator to construct an invertible estimate of the
error covariance matrix by means of a method of moments procedure, estimating the
regression model using a feasible generalised FE (FGFE) estimator and subsequently
calculating PFGMU using the median function of FGFE. Alternative methods for projecting
out estimates of the factor loadings have been proposed by Moon and Perron
(2004) and Bai and Ng (2004). All these methods are valid for large T only, and it
is not straightforward to generalise them to models that include weakly exogenous
regressors other than the lagged dependent variable.

6 Testing for Cross-Sectional Dependence


6.1 Available Tests for the Presence of Error Cross-Sectional Dependence
The null hypothesis of interest is

H0 : Cov( it ; jt ) = 0 for all t and all i 6= j, (72)

against the alternative hypothesis (2). Several tests for error cross-sectional dependence have
been proposed in the literature. Perhaps the most widely known is the LM statistic
of Breusch and Pagan (1980). The basic idea of this kind of test is to substitute into the
score vector the parameter estimates obtained from the restricted model under the null
hypothesis and check whether the resulting vector is sufficiently close to zero. It turns out
that under the null the test statistic can be based on the residuals from individual-specific
OLS regressions. Let
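$$
\hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t=1}^{T} \hat{\nu}_{it} \hat{\nu}_{jt}}{\left( \sum_{t=1}^{T} \hat{\nu}_{it}^{2} \right)^{1/2} \left( \sum_{t=1}^{T} \hat{\nu}_{jt}^{2} \right)^{1/2}}, \tag{73}
$$

with $\hat{\nu}_{it}$ denoting the OLS residual of unit i at time t. Then under H0, as
$T \to \infty$ for fixed N, we have

$$
LM = T \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\rho}_{ij}^{2} \xrightarrow{d} \chi^{2}_{N(N-1)/2}, \tag{74}
$$

where the number of degrees of freedom equals the number of distinct off-diagonal elements
of the error covariance matrix.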
As noted by Pesaran (2004) and Pesaran, Ullah and Yamagata (2008), the LM statistic
(74) is likely to have poor size properties when N is large, which is clearly an empirically
relevant situation. Pesaran (2004) shows that when both N and T are large, (74) can
be modified in a straightforward way. In particular, under H0, for any given pair $i \neq j$,
we have

$$
T \hat{\rho}_{ij}^{2} \xrightarrow{d} \chi^{2}_{1}, \tag{75}
$$

for $T \to \infty$. A $\chi^{2}_{1}$ variable has mean 1 and variance 2, so the sum of the
$N(N-1)/2$ centred terms has variance $N(N-1)$. Therefore, since the $\hat{\rho}_{ij}^{2}$ are
asymptotically uncorrelated, the following scaled version of the LM statistic can be considered:

$$
LM_{2} = \sqrt{\frac{1}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left( T \hat{\rho}_{ij}^{2} - 1 \right) \xrightarrow{d} N(0, 1), \tag{76}
$$

for $T \to \infty$ and $N \to \infty$ sequentially.
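The statistics in (74) and (76) can be computed directly from the unit-by-unit residuals. The following is a minimal sketch using only the Python standard library; the function names and the toy residuals are hypothetical and serve only to illustrate the arithmetic, and no p-values are computed.

```python
import math


def pairwise_correlation(u_i, u_j):
    """rho_hat_ij as in (73): residual cross-product scaled by both norms."""
    num = sum(a * b for a, b in zip(u_i, u_j))
    den = math.sqrt(sum(a * a for a in u_i) * sum(b * b for b in u_j))
    return num / den


def lm_statistics(resid):
    """resid: list of N residual series (one per unit), each of length T.

    Returns (LM, degrees of freedom, LM2) following (74) and (76).
    """
    n, t = len(resid), len(resid[0])
    sq_sum = 0.0    # sum of rho_hat_ij^2 over all pairs i < j
    centred = 0.0   # sum of (T * rho_hat_ij^2 - 1) over all pairs i < j
    for i in range(n - 1):
        for j in range(i + 1, n):
            rho = pairwise_correlation(resid[i], resid[j])
            sq_sum += rho ** 2
            centred += t * rho ** 2 - 1.0
    lm = t * sq_sum
    df = n * (n - 1) // 2
    lm2 = math.sqrt(1.0 / (n * (n - 1))) * centred
    return lm, df, lm2


# Hypothetical residuals for N = 3 units and T = 4 periods.
resid = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],   # perfectly correlated with unit 1
    [1.0, 1.0, -1.0, -1.0],   # orthogonal to units 1 and 2
]
lm, df, lm2 = lm_statistics(resid)
print(lm, df, round(lm2, 4))
```

In this toy example only one of the three pairs is correlated, so LM equals T times one squared correlation, while LM2 recentres each of the three terms by its asymptotic mean of one.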


For fixed T both the LM and LM2 statistics are likely to exhibit substantial size
distortions (Pesaran, 2004). This is mainly due to the fact that $E(T \hat{\rho}_{ij}^{2} - 1)$ will not be
correctly centered at zero when T is small, and with large N the incorrect centering of
the statistics is likely to be accentuated, resulting potentially in large size distortions.
Friedman (1937) proposed a non-parametric test, appropriate for large N and fixed
T, based on Spearman's rank correlation coefficient. The latter can be thought of as the
regular product-moment correlation coefficient except that it is computed from ranks.
In particular, under the null, as $N \to \infty$ for T fixed we have

$$
FR = (T - 1) \left[ (N - 1) R_{AVE} + 1 \right] \xrightarrow{d} \chi^{2}_{T-1},
$$

where

$$
R_{AVE} = \frac{1}{N(N-1)/2} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{r}_{ij} \tag{77}
$$

and $\hat{r}_{ij}$ denotes Spearman's rank correlation coefficient given by

$$
\hat{r}_{ij} = \hat{r}_{ji} = \frac{\sum_{t=1}^{T} \left( r_{i,t} - (T+1)/2 \right) \left( r_{j,t} - (T+1)/2 \right)}{\sum_{t=1}^{T} \left( r_{i,t} - (T+1)/2 \right)^{2}}, \tag{78}
$$

where $r_{i,t}$ is the rank of the residual of unit i at time t among its T residuals, and
$(T+1)/2$ is the average rank.

A closely related test is developed by Pesaran (2004). He proposes a simple alternative,
based on regular product-moment correlation coefficients, which has exactly mean
zero for fixed values of either N or T,

$$
CD = \sqrt{\frac{2T}{N(N-1)}} \left( \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\rho}_{ij} \right) \xrightarrow{d} N(0, 1). \tag{79}
$$

Pesaran shows that the above statistic is valid under a wide class of panel data models,
including heterogeneous models, dynamic models and regression models with multiple
structural breaks, provided that the unconditional means of $y_{it}$ and $x_{it}$ are time-invariant
and their innovations are symmetrically distributed. Chen, Gao and Li (2009) extend
this method by developing a nonparametric counterpart of the CD statistic for testing
error cross-sectional dependence in nonparametric models.
Both the CD and FR statistics share a common weakness in that they may lack
power to detect the alternative hypothesis under which the sign of the elements of the
error covariance matrix is alternating, that is, there are both positive and negative
correlations in the residuals. This can arise if, for example, cross-sectional dependence is
characterised by a factor model with zero mean factor loadings. Notice that the same
problem might arise even if the factor loadings have mean different from zero; one such
instance is when time-specific dummies are included in the regression model to capture
possible common variations in the dependent variable. In fact, this practice is not
uncommon for fixed T and amounts to transforming the observations in terms of deviations
from time-specific averages. Thus, suppose that the disturbance follows a single-factor
process, as in (9), in which case time-specific demeaning yields process (11). Observe that

Cov it ; jt = E ( i) E j = 0. (80)

Therefore the CD and $R_{AVE}$ statistics will be centered around zero, which implies that
the power of the tests will not increase with N and therefore they may be inconsistent.
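The cancellation described above can be illustrated with a small numerical sketch. The code below computes the CD statistic as in (79) alongside the LM statistic as in (74); the function names and the toy residuals are hypothetical, chosen so that one pair of units is perfectly positively correlated and another pair perfectly negatively correlated.

```python
import math


def pairwise_correlation(u_i, u_j):
    """Product-moment correlation of two residual series, as in (73)."""
    num = sum(a * b for a, b in zip(u_i, u_j))
    den = math.sqrt(sum(a * a for a in u_i) * sum(b * b for b in u_j))
    return num / den


def cd_and_lm(resid):
    """Returns (CD, LM) for a list of N residual series of length T."""
    n, t = len(resid), len(resid[0])
    rhos = [pairwise_correlation(resid[i], resid[j])
            for i in range(n - 1) for j in range(i + 1, n)]
    cd = math.sqrt(2.0 * t / (n * (n - 1))) * sum(rhos)   # sums the correlations
    lm = t * sum(r * r for r in rhos)                     # sums their squares
    return cd, lm


a = [1.0, -1.0, 1.0, -1.0]
b = [1.0, 1.0, -1.0, -1.0]        # orthogonal to a
neg_b = [-x for x in b]

# Units 1 and 2 move together, units 3 and 4 move in opposite directions.
resid = [a, a, b, neg_b]
cd, lm = cd_and_lm(resid)
print(cd, lm)
```

Because CD sums the correlations themselves, the positive and negative pairs offset each other exactly, whereas LM sums squared correlations and still registers both dependent pairs.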
Frees (1995) proposes a test statistic that is not subject to this problem and is valid
for fixed T, large N. Specifically, define

$$
Q = b_{1}(T) \left[ \chi^{2}_{1} - (T - 1) \right] + b_{2}(T) \left[ \chi^{2}_{2} - T(T-3)/2 \right], \tag{81}
$$

where $\chi^{2}_{1}$ and $\chi^{2}_{2}$ are independent chi-square distributed variables with $T - 1$ and
$T(T-3)/2$ degrees of freedom respectively and

$$
b_{1}(T) = \frac{4 (T + 2)}{5 (T - 1)^{2} (T + 1)}, \qquad b_{2}(T) = \frac{2 (5T + 6)}{5 T (T - 1) (T + 1)}. \tag{82}
$$
Also, let

$$
R^{2}_{AVE} = \frac{1}{N(N-1)/2} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{r}_{ij}^{2}. \tag{83}
$$

Frees (1995) shows that $FRE = N \left[ R^{2}_{AVE} - (T - 1)^{-1} \right]$ follows asymptotically a Q
distribution for $N \to \infty$, T fixed. Therefore, the null is rejected if $R^{2}_{AVE}$ is larger than
$(T - 1)^{-1} + Q_{q}/N$, where $Q_{q}$ is an appropriate quantile from the Q distribution.³⁰
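The rank-based quantities in (77), (78) and (83) and the resulting FR and FRE statistics can be sketched as follows. This is a minimal illustration using only the standard library; the function names and data are hypothetical, ties in the residuals are assumed away for simplicity, and the quantiles of the Q distribution needed for the Frees decision rule are not computed.

```python
def ranks(series):
    """Ranks 1..T of a series (assumes no ties, for simplicity)."""
    order = sorted(range(len(series)), key=lambda t: series[t])
    r = [0] * len(series)
    for rank, t in enumerate(order, start=1):
        r[t] = rank
    return r


def spearman(u_i, u_j):
    """Spearman rank correlation as in (78), with mean rank (T + 1) / 2."""
    mean_rank = (len(u_i) + 1) / 2.0
    r_i = [r - mean_rank for r in ranks(u_i)]
    r_j = [r - mean_rank for r in ranks(u_j)]
    return sum(a * b for a, b in zip(r_i, r_j)) / sum(a * a for a in r_i)


def friedman_frees(resid):
    """Returns (R_AVE, FR, R2_AVE, FRE) for N residual series of length T."""
    n, t = len(resid), len(resid[0])
    r = [spearman(resid[i], resid[j])
         for i in range(n - 1) for j in range(i + 1, n)]
    n_pairs = n * (n - 1) / 2.0
    r_ave = sum(r) / n_pairs                   # (77)
    r2_ave = sum(x * x for x in r) / n_pairs   # (83)
    fr = (t - 1) * ((n - 1) * r_ave + 1.0)     # Friedman statistic
    fre = n * (r2_ave - 1.0 / (t - 1))         # Frees statistic
    return r_ave, fr, r2_ave, fre


# Hypothetical residuals: units 1 and 2 share a ranking, unit 3 reverses it.
resid = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
]
r_ave, fr, r2_ave, fre = friedman_frees(resid)
print(r_ave, fr, r2_ave, fre)
```

Averaging squared rank correlations, as FRE does, keeps perfectly negative pairs from offsetting perfectly positive ones, which is the property that distinguishes it from FR.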
Pesaran, Ullah and Yamagata (2008) argue that the FRE statistic tends to behave
similarly to the uncorrected version of the LM statistic for large N when the model
involves more than one explanatory variable (intercept). They propose a bias-adjusted
version of the LM test that makes use of the exact mean and variance of the LM statistic
and is valid under strongly exogenous regressors and normal errors. This is defined as

$$
LM_{3} = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{(T - K) \hat{\rho}_{ij}^{2} - \mu_{Tij}}{\upsilon_{Tij}} \xrightarrow{d} N(0, 1), \tag{84}
$$

where

$$
\mu_{Tij} = E \left[ (T - K) \hat{\rho}_{ij}^{2} \right] = \frac{1}{T - K} \mathrm{tr} (M_{i} M_{j}), \tag{85}
$$

with $M_{i} = I_{T} - X_{i} (X_{i}' X_{i})^{-1} X_{i}'$, $M_{j} = I_{T} - X_{j} (X_{j}' X_{j})^{-1} X_{j}'$ and

$$
\upsilon^{2}_{Tij} = \mathrm{var} \left[ (T - K) \hat{\rho}_{ij}^{2} \right] = \left[ \mathrm{tr} (M_{i} M_{j}) \right]^{2} a_{1T} + 2\, \mathrm{tr} \left[ (M_{i} M_{j})^{2} \right] a_{2T}, \tag{86}
$$

with $a_{1T} = a_{2T} - (T - K)^{-2}$ and

$$
a_{2T} = 3 \left[ \frac{(T - K - 8)(T - K + 2) + 24}{(T - K + 2)(T - K - 2)(T - K - 4)} \right]^{2}. \tag{87}
$$

Notice that the test statistic is feasible only when T > K + 8, and it has exactly mean
zero regardless of the value of T, unlike the LM statistic. On the other hand, unless T
is large the covariance between $\sqrt{T - K}\, \hat{\rho}_{ij}^{2}$ and $\sqrt{T - K}\, \hat{\rho}_{ij'}^{2}$, for any $j \neq j'$, is different
from zero even under the normality assumption. Therefore (84) is valid under the
sequential asymptotic $T \to \infty$ first and then $N \to \infty$. Simulation evidence provided by
Pesaran, Ullah and Yamagata (2008) indicates that the test has good size and power for
$T \geq 20$.
Sarafidis, Robertson and Yamagata (2009) propose a testing procedure that does not
require normality and is valid for fixed T, large N panel data models with weakly
exogenous regressors. Their testing procedure is based on Sargan's difference-test statistic
for overidentifying restrictions. In particular, consider the following model
T
yi = X w;i w + X s;i s + i T
+ i, i = i + "i , (88)
³⁰ de Hoyos and Sarafidis (2006) show how to perform all these tests in Stata using the command xtcsd;
see http://ideas.repec.org/c/boc/bocode/s456736.html.
where $y_{i}$ is a $T \times 1$ vector of stacked time series observations expressed in terms of
deviations from time-specific averages, and similarly for the remaining variables, while
$X_{w,i}$ and $X_{s,i}$ are $T \times K_{w}$ and $T \times K_{s}$ matrices of weakly and strongly exogenous
regressors respectively. The null hypothesis of interest is

H0 : var( i ) = =0 (89)

against the alternative


H1 : 6= 0, (90)
as opposed to (72) and (2). The aim of the test is to examine whether there is cross-sectional
dependence left in the errors after time-specific demeaning takes place.³¹
Let $X_{se,i}$ be the $T \times K_{se}$ matrix of regressors that remain strongly exogenous in the
misspecified model. Thus, $X_{se,i}$ is a subset of $X_{s,i}$ and includes covariates, the factor loadings
of which are either zero (so these covariates are not hit by the factors) or mutually
uncorrelated with the factor loadings of the errors.³² Furthermore, let $Z_{i}$ be the matrix of
instrumental variables that makes use of the full set of moment conditions, and let $Z_{se,i}$ be the
corresponding matrix that makes use of the moment conditions that arise with respect to $X_{se,i}$ only.
Sargan's (1958) or Hansen's (1982) test of overidentifying restrictions based on the full
set of moment conditions is given by
N
! N
!
X b 1 X
0 0
SF = N 1
b Zi _ i
0
Z b , (91)
i i
i=1 i=1

where b i is the residual vector obtained from the following two-stage linear GMM es-
0
timator of = 0w ; s with the general form

N N
! 1 N N
b X 1X X 1X
• F = W i Z i b_ F
0
Z 0i W i W i Z i b_ F
0
Z 0i yi , (92)
i=1 i=1 i=1 i=1

.
where yi and W i denote some transformation33 of yi and W i = X w;i .. X s;i respect-

ively and b_ F is the estimated weight matrix obtained from a …rst-stage GMM estimator.
Similarly, Sargan’s/Hansen’s test of overidentifying restrictions based on the subset of
moment conditions with respect to X se;i is given by

N
! N
!
X e_ 1 X
0
SR = N 1
e i Z se;i Z 0se;i e i 0 , (93)
i=1 i=1
³¹ The authors phrase H0 and H1 as 'homogeneous' and 'heterogeneous' cross-sectional dependence
respectively.
³² Membership in the subset $X_{se,i}$ is testable using Sargan's/Hansen's test for overidentifying
restrictions.
³³ For example, first-differences, orthogonal deviations and so on.
where e i is the residual obtained from the two-stage linear GMM estimator of the
following general form
N N
! 1 N N
b X b 1X X 1X
•R = W i Z se;i _ R
0 0
Z se;i W i W i 0 Z se;i b_ F Z 0se;i yi , (94)
i=1 i=1 i=1 i=1

with similar definitions applying as before (mutatis mutandis) and we assume that the
number of columns of $Z_{se,i}$ is larger than that of $W_{i}$. Under the null hypothesis, as $N \to \infty$ for
fixed T,

$$
D_{SYR} = (S_{F} - S_{R}) \xrightarrow{d} \chi^{2}_{h_{d}}, \tag{95}
$$

where $h_{d}$ is the difference between the number of columns of $Z_{i}$ and $Z_{se,i}$.
The $D_{SYR}$ statistic is very general since it can be performed using alternative GMM
estimators which are not necessarily asymptotically efficient. On the other hand, it
requires $h_{d} > K_{w} + K_{s}$ valid moment restrictions under the alternative. This will
be violated if, for example, the (non-zero) factor loadings of the covariates included
in $X_{se,i}$ are correlated with the factor loadings of the errors, or if all regressors are weakly
exogenous in the correctly specified model.³⁴

Yamagata (2008) proposes testing for error cross-sectional dependence using a joint serial
correlation test applied after estimating the model using the first-differenced GMM estimator
(Arellano and Bond, 1991). Essentially the procedure involves an examination of the joint
significance of estimates of second and up to pth-order (first-differenced) error serial
correlations. The intuition of the test lies in that error cross-sectional dependence is also
likely to show up as serial correlation
in the residuals. To see this, consider the single-factor error process (9) and let i
i:i:d: 0; 2 . Applying time-speci…c demeaning and taking expectations, conditional
upon t , yields
2
E bit bt+s = E ( i t + "it ) i t+s + "it+s = t t+s 6= 0. (96)
Notice that the magnitude of E bit bt+s does not necessarily decrease as s increases
for a given t. Therefore, the null hypothesis of interest becomes
H0 : E bit bt+s = 0 jointly for s = 2; 3; :::; p [ T 2] , (97)
against the alternative
H1 : E bit bt+s 6= 0 for some s, (98)
and t = 2; 3; :::; T s. Under the null hypothesis, as N ! 1 for …xed T , the joint
statistic for second up to pth-order serial correlation is
1 d
m2(2;p) = 0
NH G0 G H0 N ! 2
(p 1) , (99)
0 0 PT s
where H = ( 1 ; :::; N) , i =( i2 ; :::; ip ) , is = t=2 bit bt+s , G = (g1 ; :::; gN ) 0 ,
1 0 b
1 P PT s
gi = (gi2 ; :::; gip )0 , gis = is
0
N s QN AN
• Z0
i bi, Ns =N 1 N i=1 t=2 bit wit+s ,
34
In this case, the null hypothesis could be addressed using a simple overidentifying restrictions test.

QN = A0N b• AN , AN = N 1 PN Z 0 W i b • N 1 PN W 0 Z i and b • is the estim-
i i i i
ated weighting matrix obtained from the two-stage …rst-di¤erenced GMM estimator.
It would be interesting to extend this approach to alternative models and estimation
methods but we do not have any results as yet.

6.2 Determining the Number of Factors


Once the null hypothesis of no (heterogeneous) error cross-sectional dependence is rejected,
an important issue comes into play for all estimators allowing for a multi-factor error
structure except the CCE estimator: how to determine the appropriate number
of factors. The simplest way to decide upon this is to use the 'Kaiser criterion', which
retains all those factors associated with eigenvalues that are above average, or equivalently
greater than one for standardised data. Intuitively, this is because a retained
factor should extract at least as much variation as one of the original variables. However,
in practice this criterion is often found to be too conservative.
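As a rough illustration, the Kaiser rule amounts to counting the eigenvalues that exceed their mean. The sketch below assumes the eigenvalues of the (standardised) covariance matrix have already been computed; the function name and the numbers are hypothetical.

```python
def kaiser_count(eigenvalues):
    """Number of eigenvalues strictly above their average (Kaiser criterion)."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for e in eigenvalues if e > mean)


# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5,
# so the average eigenvalue is one).
eigs = [2.6, 1.4, 0.5, 0.3, 0.2]
n_factors = kaiser_count(eigs)
print(n_factors)
```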
Bai and Ng (2002) propose determining the number of factors by minimising certain
model selection information criterion functions. In particular, consider the $T \times K$ matrix
of observed variables $X_{i} = (x_{i1}, \ldots, x_{iT})'$ and let

$$
W_{i} = \Phi^{T}_{(M_{0})} \lambda_{i} + \varepsilon_{i}, \tag{100}
$$

where $\Phi^{T}_{(M_{0})}$ contains T observations for each of the $M_{0} \geq 1$ largest principal components
of the covariance matrix of $W_{i}$. The main task is to estimate $M_{0}$. Define

$$
V \left( M, \hat{\Phi}^{T}_{(M)} \right) = \frac{1}{NT} \sum_{i=1}^{N} \left( W_{i}' W_{i} - W_{i}' P_{\hat{\Phi}^{T}_{(M)}} W_{i} \right), \tag{101}
$$

where $P_{\hat{\Phi}^{T}_{(M)}}$ is the projection matrix onto the column space of $\hat{\Phi}^{T}_{(M)}$, for any
$M \leq M_{max}$, where $M_{max}$ is the maximum possible value of $M_{0}$. Bai and Ng (2002)
estimate $M_{0}$ as the solution to either one of the following minimisation problems:

$$
\hat{M}_{1} = \arg \min_{M \leq M_{max}} \left\{ \ln V \left( M, \hat{\Phi}^{T}_{(M)} \right) + M \left( \frac{N + T}{NT} \right) \ln \left( \frac{NT}{N + T} \right) \right\}, \tag{102}
$$

$$
\hat{M}_{2} = \arg \min_{M \leq M_{max}} \left\{ \ln V \left( M, \hat{\Phi}^{T}_{(M)} \right) + M \left( \frac{N + T}{NT} \right) \ln C^{2}_{NT} \right\}, \tag{103}
$$

and

$$
\hat{M}_{3} = \arg \min_{M \leq M_{max}} \left\{ \ln V \left( M, \hat{\Phi}^{T}_{(M)} \right) + M \frac{\ln C^{2}_{NT}}{C^{2}_{NT}} \right\}, \tag{104}
$$

where $C^{2}_{NT} = \min (N, T)$. The authors demonstrate that (102)-(104) are asymptotically
equivalent and that they estimate the true number of factors consistently as $\min (N, T) \to \infty$,
i.e. $\hat{M}_{j} \xrightarrow{p} M_{0}$ for j = 1, 2, 3. In finite samples the performance of the above information
criteria will be different. Using simulated data, Bai and Ng show that $\hat{M}_{1}$ and $\hat{M}_{2}$ are
more robust than $\hat{M}_{3}$ when either N or T is fairly small and they perform well so long
as $\min (N, T) \geq 40$. Otherwise, these criteria may not work well, leading to too many
factors being estimated.
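The three criteria can be evaluated once the eigenvalues of the second-moment matrix of the data are available. The sketch below relies on an assumption not spelled out in the text: with principal-components estimation and a single observed variable per unit, V(M) in (101) equals the sum of the eigenvalues of $(NT)^{-1} \sum_{i} W_{i} W_{i}'$ that remain after the M largest are removed. The function name and the eigenvalues are hypothetical.

```python
import math


def bai_ng_num_factors(eigenvalues, n, t, m_max):
    """eigenvalues: descending eigenvalues of (NT)^{-1} sum_i W_i W_i'.

    Assumes V(M) equals the sum of the eigenvalues left after dropping the
    M largest. Returns the minimisers of the criteria (102), (103), (104).
    """
    c2 = min(n, t)
    penalties = (
        (n + t) / (n * t) * math.log(n * t / (n + t)),   # penalty in (102)
        (n + t) / (n * t) * math.log(c2),                # penalty in (103)
        math.log(c2) / c2,                               # penalty in (104)
    )
    total = sum(eigenvalues)
    choices = []
    for g in penalties:
        best_m, best_val = None, float("inf")
        for m in range(m_max + 1):
            v = total - sum(eigenvalues[:m])   # V(M): unexplained variation
            val = math.log(v) + m * g
            if val < best_val:
                best_m, best_val = m, val
        choices.append(best_m)
    return tuple(choices)


# Hypothetical spectrum with one dominant eigenvalue, N = 100 and T = 50.
eigs = [20.0] + [1.0] * 49
m1, m2, m3 = bai_ng_num_factors(eigs, n=100, t=50, m_max=8)
print(m1, m2, m3)
```

Since only differences in ln V(M) matter, the scale of the eigenvalues does not affect the selected number of factors.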
Kapetanios (2009) proposes a different method to determine the appropriate number
of factors. This is based on the result that the largest eigenvalue of the sample covariance
matrix of the data converges almost surely to $(1 + \sqrt{c})^{2}$, where $c = \lim_{N,T \to \infty} N/T$, which
implies that if there is no factor structure in the data, the maximum eigenvalue of the
sample covariance matrix should not exceed $(1 + \sqrt{c})^{2}$ almost surely, in large samples.
Therefore, the method starts essentially by checking whether a factor structure is supported
by the data at all, using as threshold the value $(1 + \sqrt{c})^{2} + d$, where d > 0 is
chosen a priori. Kapetanios suggests choosing for d the mean eigenvalue of the covariance
matrix (for standardised data this equals 1). Hence, if the maximum eigenvalue
of the covariance matrix exceeds this threshold, the first principal component is
obtained and the data are orthogonalised by taking the residuals from a regression on the
first principal component. Next, the maximum eigenvalue of the resulting covariance matrix
is compared against $(1 + \sqrt{c})^{2} + d$ and the process is repeated until the maximum
eigenvalue of the resulting covariance matrix does not exceed the threshold value. Using
simulated data, Kapetanios shows that in a majority of circumstances of empirical interest
this method outperforms the information criteria (102)-(104).
The method proposed by Kapetanios requires that the idiosyncratic errors of the
approximate factor model are i.i.d. Onatski (2007) develops a similar estimator that
makes less stringent assumptions on the serial correlation and heteroskedasticity pattern
of the idiosyncratic errors. His method is based on the mirror image of Kapetanios'
argument, i.e. for data characterised by $M_{0}$ latent common factors, the largest $M_{0}$
eigenvalues of the covariance matrix of the data grow with N, while the rest of the
eigenvalues are bounded. Hence, the Onatski estimator equals the number of eigenvalues
greater than a threshold value:

$$
\hat{M}_{4} = \arg \max_{M \leq M_{max}} \left[ M \mid \mu_{M} > (1 + \delta) c_{1} \right], \tag{105}
$$

where $\mu_{M}$ denotes the Mth largest eigenvalue of the sample covariance matrix of $W_{i}$,
$\delta$ is a parameter to be chosen a priori and $c_{1} = \vartheta \mu_{M_{max}+1} + (1 - \vartheta) \mu_{2M_{max}+1}$, with
$\vartheta = 2^{2/3} \left( 2^{2/3} - 1 \right)^{-1}$, is a threshold obtained from the empirical distribution of the
eigenvalues to distinguish the diverging ones from the bounded ones. Under the assumption
that the idiosyncratic errors of the approximate factor model are either serially
uncorrelated, or cross-sectionally independent (but not both), the above estimator
is shown to be consistent.
Ahn and Horenstein (2008) argue that the above methods can be somewhat generous
in penalizing large M. Another potential problem in using these methods is that they
all require a choice of $M_{max}$. In large samples this is certainly not an issue provided
that $M_{max} > M_{0}$. However, in finite samples the estimate of $M_{0}$ could be sensitive to
the choice of $M_{max}$. To this end, Ahn and Horenstein propose estimating the number of
factors by maximising the ratio of two adjacent eigenvalues, or the ratio of their growth
rate. In particular, with $\mu_{M}$ denoting the Mth largest eigenvalue of the sample covariance
matrix, we have

$$
\hat{M}_{5} = \arg \max_{M \leq M_{max}} \frac{\mu_{M}}{\mu_{M+1}}, \tag{106}
$$

and

$$
\hat{M}_{6} = \arg \max_{M \leq M_{max}} \left[ \frac{\ln \left( \mu^{*}_{M} \right)}{\ln \left( \mu^{*}_{M+1} \right)} \right], \tag{107}
$$

where $\mu^{*}_{M} = \sum_{j=M}^{T} \mu_{j} \big/ \sum_{j=M+1}^{T} \mu_{j}$ and $M \leq M_{max}$. Using simulated data they
show that the proposed estimators outperform the existing ones even in samples with
small N and T unless the signal-to-noise ratio of the model is too small.³⁵
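Both ratio estimators in (106) and (107) depend on the data only through the ordered eigenvalues, so they are straightforward to sketch. In the illustration below the function name and the eigenvalues are hypothetical, and the list of eigenvalues is assumed to contain at least $M_{max} + 2$ entries so that every ratio is defined.

```python
import math


def ahn_horenstein(eigenvalues, m_max):
    """Returns (M5, M6): maximisers of the eigenvalue ratio (106) and of
    the growth ratio (107). Eigenvalues must be sorted in descending order."""
    mu = list(eigenvalues)

    def tail(m):
        # sum of mu_j for j = m, ..., T (m is 1-based)
        return sum(mu[m - 1:])

    def mu_star(m):
        return tail(m) / tail(m + 1)

    er = {m: mu[m - 1] / mu[m] for m in range(1, m_max + 1)}
    gr = {m: math.log(mu_star(m)) / math.log(mu_star(m + 1))
          for m in range(1, m_max + 1)}
    return max(er, key=er.get), max(gr, key=gr.get)


# Hypothetical spectrum: three large eigenvalues followed by a flat tail.
eigs = [30.0, 20.0, 10.0] + [1.0] * 7
m5, m6 = ahn_horenstein(eigs, m_max=5)
print(m5, m6)
```

Both criteria look for the sharpest drop in the spectrum, so neither involves a penalty function, although both still search over M up to $M_{max}$.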
Notice that the set-up in all the above methods is such that the factors are extracted
from observed variables. Therefore, it is not clear what the properties of these methods
are when the factors are extracted from estimated residuals, which is precisely what is
of main interest in this paper. We explore this issue via Monte Carlo experiments.

6.3 A Monte Carlo Study


6.3.1 Design
The underlying data generating process is given by
M
X
m m
yit = 1 x1it + 2 x2it + ! it , ! it = i+ it , it = i t + "it ,
m=1
M
X
m m
x1it = i t + "it , i = 1; :::; N , t = 1; :::; T , (108)
m=1

which is similar to Section 4.4 except that we add an extra regressor, x2it . To examine
the impact of strict/weak exogeneity on the properties of the tests we set (i) x2it =
i + $ it , $ it i:i:d:N x2 ; 2x2 and (ii) x2it = yit 1 . In the former case we specify the
parameters such that the signal-to-noise ratio depends solely on the slope coe¢ cients,
1 and 2 . In particular, de…ne yit = yit i such that

yit = 1 x1it + 2 x2it + it , (109)

and let the signal-to-noise ratio be denoted by = 2s = 2 ; where 2s is the variance of


the signal and 2 is the total error variance. The signal variance equals
2 2
s = var 1 x1it + 2 x2it = var (yit it ) = var (yit )+var ( it ) 2cov (yit ; it ) . (110)
³⁵ The issue of determining the number of factors in the dynamic factor model case is analysed, among
others, by Amengual and Watson (2007) and Hallin and Liska (2007).
We consider each of the terms in (110) sequentially. We have
2 2
var (yit ) = 1 var (x1it ) + 2 var (x2it ) + var ( it ) +2 1 cov (x1it ; it ) =
" M M
#
X X
2 2 2
= 1 m
+ m
+ 2" + 2 2
2 x2
m=1 m=1
" M M
# M
X X X
2 2 2
+ m
+ m
+ " +2 1 m m m
, (111)
m=1 m=1 m=1

M
X M
X
2 2 2
var ( it ) = m
+ m
+ ", (112)
m=1 m=1
and
2
2cov (yit ; it ) = 2cov 1 x1it + 2 x2it + it ; it = 2[ 1 cov (x1it ; it ) + var ( it )]
M
" M M
#
X X X
2 2 2
= 2 1 m m m
+ m
+ m
+ " . (113)
m=1 m=1 m=1

Setting
2 2
" = " ;
" M M
#
X X
2 2 2 2
x2 = m
+ m
+ " ; and
m=1 m=1
2 2
m
= m
; m
= m
, m
= for m = 1; ::; M , (114)

and combining (110)-(113) yields


" M M
#
X X
2 2 2 2 2 2
s = 1 + 2 m
+ m
+ " .
m=1 m=1

Therefore,
2 2 2 2
= s= = 1 + 2. (115)
In the case of weak exogeneity (so x2it = yit 1) we de…ne yit = yit 1
i
such that
2

1 1
yit = x1it + it . (116)
1 2 1 2

The variance of the signal equals


2
s = var (yit it ) = var (yit ) + var ( it ) 2cov (yit ; it )
2 2
1 1 1
= var (x1it ) + var ( it ) +2 2 cov (x1it ; it ) + var ( it )
1 2 1 2 (1 2)
2
[ 1 cov (x1it ; it ) + var ( it )] . (117)
1 2

Using (114) and imposing = 0 it is straightforward to show that
2 2 2 2 2
= s= = 1 + 2 = (1 2) . (118)

To examine the impact of the signal-to-noisepratio on the performance of the statist-


ics we set 2 f1; 5g and we select k = =2 for k = 1; 2. As in Section 4.4
we choose values for the remaining parameters subject to the three fractions 1 , 2 ,
and 3 . Normalising 2" = 1 implies that these fractions parameterise completely 2 ,
P2 P P
m=1
2
m
and 2m=1 2 m . To simplify things we let 2 m = M 1 M m=1
2 ,
m m
=
P 1=2
M 1 M m=1
2
m
for all m. Further, we fix the second fraction at 0.9 and we set the first fraction to values in
{0.5, 0.9} and the third fraction to values in {0.5, 0.8} to examine, respectively, the impact of
the relative size of the purely idiosyncratic error component over factor noise and the
closeness of the factor structure to an ordinary time effect. We expect the performance of the
statistics to deteriorate with high values of the first and third fractions. Notice, however, that
as the first fraction approaches 1 the impact of the factor structure on the properties of the
estimates of the structural parameters is likely to become smaller. Furthermore, when the third
fraction equals 1 the multi-factor structure degenerates to a single individual-invariant effect
which can be accounted for using time-specific dummy variables. Finally, it is also worth
pointing out that consistent estimation of the structural parameters only requires that
$\hat{M} \geq M_{0}$. Therefore, the cost of underestimating the number of factors is greater than
that of overestimating it.
We consider $M \in \{1, 3\}$, N = 100 and $T \in \{10, 50, 100\}$. As before, we perform
2,000 replications. The starting value of y, $y_{i0}$, is drawn from a stationary process. All
statistics are calculated using the residuals obtained by OLS for each individual. Prior
to computation of the eigenvectors each i-specific residual vector is standardised to have
unit variance.

6.3.2 Results
Table 2 reports the results in terms of the frequency with which the statistics select the true
number of factors, $M_{0}$. If the statistic selects an incorrect number of factors with higher
frequency than $M_{0}$, then we report both frequencies, as well as the value of $\hat{M} \neq M_{0}$
in brackets. For example, '.000 (0; 1.00)' means that the frequency of selecting $M_{0}$ is
zero and the statistic has selected $\hat{M} = 0$ with frequency 1. The $\hat{M}_{j}$ refer to the
corresponding statistics defined in (102)-(107). Following Onatski (2007) we choose
$\delta \in \left\{ 0, \max \left( N^{-1/2}, T^{-1/2} \right), \max \left( N^{-2/3}, T^{-2/3} \right) \right\}$. Therefore, $\hat{M}_{4}$ contains three cases,
$\hat{M}_{4(1)}$, $\hat{M}_{4(2)}$ and $\hat{M}_{4(3)}$, that correspond to each of these different values of $\delta$ respectively.
As we can see, the performance of the statistics varies across different experiments
and depends crucially upon the size of $M_{0}$ (the smaller the better), T (the larger the
better) and the values of fractions 1 and 2 (the smaller the better). For T = 100, most
statistics perform well even if the factors are extracted from residuals rather than observed
variables, unless fractions 1 and 2 are both close to unity and $M_{0} = 1$. In this case all statistics
heavily underestimate $M_{0}$, although $\hat{M}_{4(1)}$ and $\hat{M}_{4(3)}$ do less so than the others. This
finding is not surprising because most of the noise is idiosyncratic in this case. $\hat{M}_{3}$
outperforms $\hat{M}_{1}$ and $\hat{M}_{2}$, unless fractions 1 and 2 both equal 0.5, in which case it compares
less favorably. $\hat{M}_{4(1)}$ does relatively poorly in most circumstances but $\hat{M}_{4(2)}$ and $\hat{M}_{4(3)}$
perform quite well even for large values of either fraction 1 or fraction 2, although they are
both sensitive to small values of T. $\hat{M}_{5}$ appears to perform well even for T = 10 and it
underestimates $M_{0}$
mostly when 1 is large. Similar results have been obtained for = 5 and = 0:5
and therefore it appears that these two parameters are not crucial for the performance
of the statistics. Furthermore, we have reached similar conclusions for the case where
x2it = yit 1 but to save space we do not report these here.36
In summary, we may argue that some of the statistics considered here, based on residuals
rather than observed variables, perform reasonably well (especially $\hat{M}_{4(2)}$, $\hat{M}_{4(3)}$ and
$\hat{M}_{5}$) under both strong and weak exogeneity, unless a large proportion of the variation
in total noise is due to the purely idiosyncratic component, or there is little variation in
the factor loadings, or T is small. The first case might be less of a problem in practice
because the impact of the factor structure can be small, while the second case can be
accounted for quite effectively using time dummies. Determining the number of factors
under small T is certainly an issue that requires further research.
To this end, Ahn, Lee and Schmidt (2006) propose determining the number of factors
using a sequential method, based on GMM and Sargan's (1958) or Hansen's (1982) test
statistic. Their method is appropriate for fixed T. The intuition of this approach is
that if $\hat{M} < M_{0}$, this is likely to show up as a significant overidentifying restrictions test
statistic. Therefore, one may start by testing the null $M_{0} = 0$ against the alternative
$M_{0} > 0$. Then if the null is rejected, one can move to test the null $M_{0} = 1$ against
the alternative $M_{0} > 1$ and so on until the null hypothesis is not rejected. Naturally,
the significance level used for this sequential method needs to be appropriately adjusted.
This approach is valid under strongly and weakly exogenous regressors. In the
former case, it will identify only the factors whose factor loadings are correlated with
the regressors. An alternative approach can be based on the joint serial correlation test
(Yamagata, 2008) combined with a GMM estimator that allows for factor residuals in
the same sequential manner, but we have no results as yet. A further possibility under
fixed T is to construct a criterion based on a likelihood ratio test statistic. Lawley
and Maxwell (1971, section 2.6) provide details for the case of extracting latent factors
from observed variables, although the case of extracting factors from regression residuals
remains unexplored in the literature. We do expect the issue of determining the number
of factors in fixed T cases to attract more attention in the near future.
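The sequential logic described above can be sketched generically. The code below is not the authors' implementation: it assumes that a user-supplied function returns the p-value of the overidentifying restrictions test after the model has been estimated with m factors, and the function name, the p-values and the significance level are all hypothetical. How the significance level should be adjusted is left to the user.

```python
def select_num_factors(p_value_for, m_max, alpha):
    """Sequentially test M0 = 0, 1, ..., m_max and stop at the first null
    that is not rejected at level alpha.

    p_value_for(m) is assumed to return the p-value of the overidentifying
    restrictions test obtained after estimating the model with m factors.
    """
    for m in range(m_max + 1):
        if p_value_for(m) >= alpha:
            return m      # first null M0 = m that is not rejected
    return None           # every null up to m_max was rejected


# Hypothetical p-values for models estimated with 0, 1, 2 and 3 factors.
hypothetical_p_values = {0: 0.001, 1: 0.004, 2: 0.31, 3: 0.55}
m_hat = select_num_factors(hypothetical_p_values.get, m_max=3, alpha=0.01)
print(m_hat)
```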

³⁶ The results are available from the authors upon request.
Table 2 Performance of statistics for selecting the number of factors, signal-to-noise ratio equal to 1.

| T | $M_{0}$ | Fraction 1 | Fraction 3 | $\hat{M}_{1}$ | $\hat{M}_{2}$ | $\hat{M}_{3}$ | $\hat{M}_{4(1)}$ | $\hat{M}_{4(2)}$ | $\hat{M}_{4(3)}$ | $\hat{M}_{5}$ | $\hat{M}_{6}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 0.5 | 0.5 | 1.00 | 1.00 | .995 | .314 (2; .45) | .885 | .613 | 1.00 | 1.00 |
| 50 | 1 | 0.5 | 0.5 | 1.00 | 1.00 | 1.00 | .174 (2; .43) | .835 | .562 | 1.00 | 1.00 |
| 10 | 1 | 0.5 | 0.5 | .003 (6; .97) | .054 (6; .70) | .000 (0; 1.0) | .156 (0; .84) | .002 (0; .99) | .006 (0; .99) | .904 | .000 (7; 1.0) |
| 100 | 1 | 0.9 | 0.5 | 1.00 | .994 | 1.00 | .351 (2; .45) | .913 | .672 | 1.00 | 1.00 |
| 50 | 1 | 0.9 | 0.5 | .850 | .609 | .999 | .208 (2; .45) | .880 | .602 | 1.00 | 1.00 |
| 10 | 1 | 0.9 | 0.5 | .000 (6; .70) | .001 (0; .71) | .000 (6; .98) | .001 (0; .99) | .000 (0; 1.0) | .000 (0; 1.0) | .218 (7; .25) | .000 (7; 1.0) |
| 100 | 1 | 0.5 | 0.8 | 1.00 | 1.00 | .999 | .301 (2; .48) | .902 | .628 | 1.00 | 1.00 |
| 50 | 1 | 0.5 | 0.8 | 1.00 | 1.00 | 1.00 | .184 (2; .43) | .850 | .554 | 1.00 | 1.00 |
| 10 | 1 | 0.5 | 0.8 | .007 (6; .93) | .026 (6; .63) | .000 (6; 1.0) | .074 (0; .93) | .000 (0; 1.0) | .001 (0; 1.0) | .849 | .000 (7; 1.0) |
| 100 | 1 | 0.9 | 0.8 | .448 (0; .55) | .055 (0; .95) | 1.00 | .368 (2; .42) | .907 | .682 | 1.00 | 1.00 |
| 50 | 1 | 0.9 | 0.8 | .023 (0; .97) | .002 (0; .99) | .573 | .201 (2; .45) | .856 | .590 | .979 | .992 |
| 10 | 1 | 0.9 | 0.8 | .000 (6; .88) | .000 (6; .61) | .000 (6; .99) | .000 (0; 1.0) | .000 (0; 1.0) | .000 (0; 1.0) | .147 (7; .30) | .000 (7; 1.0) |
| 100 | 3 | 0.5 | 0.5 | 1.00 | 1.00 | .907 | .726 | .980 | .907 | 1.00 | .119 (1; .72) |
| 50 | 3 | 0.5 | 0.5 | .998 | .998 | 1.00 | .649 | .998 | .914 | .999 | .118 (1; .67) |
| 10 | 3 | 0.5 | 0.5 | .000 (6; .99) | .000 (6; .98) | .000 (6; 1.0) | .002 (0; .53) | .000 (0; .97) | .000 (0; .90) | .340 | .000 (7; 1.0) |
| 100 | 3 | 0.9 | 0.5 | .238 (0; .55) | .012 (0; .75) | .999 | .813 | .995 | .995 | .785 | .000 (0; .99) |
| 50 | 3 | 0.9 | 0.5 | .017 (1; .68) | .000 (1; .60) | .619 | .720 | .938 | .926 | .449 | .002 (1; .96) |
| 10 | 3 | 0.9 | 0.5 | .000 (6; 1.0) | .000 (6; .50) | .000 (6; .99) | .000 (0; .60) | .000 (0; 1.0) | .000 (0; 1.0) | .074 (1; .34) | .000 (7; 1.0) |
| 100 | 3 | 0.5 | 0.8 | .014 (1; .72) | .000 (1; .98) | .997 | .796 | .988 | .944 | .746 | .000 (1; 1.0) |
| 50 | 3 | 0.5 | 0.8 | .001 (1; .96) | .000 (1; 1.0) | .237 (2; .54) | .701 | .770 | .841 | .449 | .000 (1; 1.0) |
| 10 | 3 | 0.5 | 0.8 | .000 (6; .97) | .000 (6; .71) | .000 (6; 1.0) | .000 (0; .95) | .000 (0; 1.0) | .000 (0; 1.0) | .094 (1; .90) | .002 (7; 1.0) |
| 100 | 3 | 0.9 | 0.8 | .000 (0; .55) | .000 (0; .96) | .000 (1; .98) | .296 (2; .52) | .017 (1; .68) | .108 (2; .51) | .000 (1; 1.0) | .000 (1; 1.0) |
| 50 | 3 | 0.9 | 0.8 | .000 (0; .96) | .000 (0; .99) | .000 (1; .59) | .359 (2; .37) | .018 (1; .78) | .096 (1; .50) | .005 (1; .97) | .002 (1; .99) |
| 10 | 3 | 0.9 | 0.8 | .000 (6; .89) | .000 (6; .62) | .000 (6; .99) | .000 (0; 1.0) | .000 (0; 1.0) | .000 (0; 1.0) | .085 (7; .31) | .000 (7; 1.0) |

7 Current Challenges and Future Directions
There have been several major advances in the theoretical literature on panel data
analysis with error cross-sectional dependence over the last ten years. Methods developed
for dealing with fixed- and large-T cases, strongly and weakly exogenous regressors,
non-stationary panels and testing for non-zero correlations across individuals have all
helped to (re)address more effectively the issue of cross-sectional dependence and,
ultimately, that of unobserved heterogeneity. Nevertheless, there is still an abundance
of non-trivial problems that require research attention. For instance, the literature is
silent on dealing with cross-sectional dependence in non-linear panel data models,
where it is typically assumed, for identification purposes rather than descriptive
accuracy, that all observations are independent across individuals. Testing for
cross-sectional dependence in non-linear models is not straightforward either, although some
progress has been made by Hsiao, Pesaran and Pick (2009). There is a large range of
other models that await possible extensions of the existing methods, such as panel VARs
with a multi-factor error structure, systems of simultaneous equations and models with
heterogeneous coefficients.
Finally, the empirical literature that deals with cross-sectional dependence in
practice is still relatively small. It will be useful, as well as interesting, to see the
extent to which economic applications can benefit from theoretical advances in the field.

References
[1] Ahn, S.C. and Schmidt, P. 1995. Efficient Estimation of Models for Dynamic Panel
Data. Journal of Econometrics, 68, 5-28.
[2] Ahn, S. C. and Horenstein, A. 2008. Eigenvalue ratio test for the number of factors.
Mimeo.
[3] Ahn, S. C., Y. H. Lee and P. Schmidt. 2006. GMM estimation of linear panel data
models with time-varying individual effects. Journal of Econometrics, 101, 219-255.
[4] Ahn, S. C., Y. H. Lee and P. Schmidt. 2006. Panel Data Models with Multiple
Time-Varying Individual Effects. Mimeo.
[5] Amengual, D. and Watson, M. W. 2007. Consistent estimation of the number of
dynamic factors in a large N and T panel. Journal of Business & Economic Statistics,
25(1), 91-96.
[6] Anderson, T.W. and Hsiao, C. 1981. Estimation of Dynamic Models with Error
Components. Journal of the American Statistical Association, 76, 598-606.
[7] Arellano, M. 2003. Panel Data Econometrics. Oxford University Press, Oxford.
[8] Arellano, M. and Bond, S. 1991. Some Tests of Specification for Panel Data: Monte
Carlo Evidence and an Application to Employment Equations. Review of Economic
Studies, 58, 277-297.

[9] Arellano, M. and Bover, O. 1995. Another Look at the Instrumental Variable
Estimation of Error-Component Models. Journal of Econometrics, 68, 29-51.

[10] Bai, J. 2009. Panel Data Models with Interactive Fixed Effects. Econometrica, 77,
1229-1279.

[11] Bai, J. 2010. Likelihood approach to small T dynamic panel models with interactive
effects. Mimeo.

[12] Bai, J. and Ng, S. 2002. Determining the Number of Factors in Approximate Factor
Models. Econometrica, 70, 191-221.

[13] Bai, J. and Ng, S. 2002. A PANIC Attack on Unit Roots and Cointegration.
Econometrica, 72(4), 1127-1177.

[14] Baltagi, B. 2008. Econometric Analysis of Panel Data, 4th ed. John Wiley & Sons,
West Sussex.

[15] Baltagi, B. H. and Pesaran, M. H. 2007. Heterogeneity and cross section dependence
in panel data models: theory and applications - Introduction. Journal of Applied
Econometrics, 22(2), 229-232.

[16] Bekker, P.A. 1994. Alternative Approximations to the Distributions of Instrumental
Variable Estimators. Econometrica, 62, 657-681.

[17] Blundell, R. and Bond, S. 1998. Initial Conditions and Moment Restrictions in
Dynamic Panel Data Models. Journal of Econometrics, 87, 115-143.

[18] Breitung, J. and Pesaran, M.H. 2008. Unit Roots and Cointegration in Panels, in
L. Matyas and P. Sevestre (eds.) The Econometrics of Panel Data: Fundamentals
and Recent Developments in Theory and Practice, Kluwer Academic Publishers.

[19] Breusch, T. and A. Pagan. 1980. The Lagrange multiplier test and its application
to model specification in econometrics. Review of Economic Studies, 47, 239-253.

[20] Chen, J., Gao, J. and Li, D. 2009. A New Diagnostic Test for Cross-Section
Independence in Nonparametric Panel Data Models. Mimeo.

[21] Chudik, A., Pesaran, M. H. and Tosetti, E. 2009. Weak and Strong Cross Section
Dependence and Estimation of Large Panels. Mimeo.

[22] Coakley, J., A. Fuertes and R. Smith. 2002. A Principal Components Approach to
Cross-Section Dependence in Panels. Working paper, Birkbeck College, University
of London.

[23] Conley, T.G. 1999. GMM Estimation with Cross Sectional Dependence. Journal of
Econometrics, 92, 1-45.

[24] Conley, T.G., and Topa, G. 2002. Socio-economic Distance and Spatial Patterns in
Unemployment. Journal of Applied Econometrics, 17, 303-327.

[25] de Hoyos, R. E. and Sarafidis, V. 2006. Testing for Cross-sectional Dependence in
Panel Data Models. The Stata Journal, 6(4), 482-496.

[26] Driscoll, J.C., and Kraay, A.C. 1998. Consistent Covariance Matrix Estimation with
Spatially Dependent Data. The Review of Economics and Statistics, 80, 549-560.

[27] Fiebig, D. G. 2001. Seemingly Unrelated Regression, in Baltagi, B. (ed.), A
Companion to Theoretical Econometrics, Blackwell Publishers, 101-121.

[28] Fisher, R.A. 1935. The Design of Experiments. Oliver and Boyd, Edinburgh.

[29] Forni, M. and Lippi, M. 2001. The Generalized Dynamic Factor Model:
Representation Theory. Econometric Theory, 17, 1113-1141.

[30] Forni, M., M. Hallin, M. Lippi, and L. Reichlin. 2000. The generalized factor model:
identification and estimation. The Review of Economics and Statistics, 82, 540-554.

[31] Frees, E. W. 1995. Assessing Cross-sectional Correlation in Panel Data. Journal of
Econometrics, 69, 393-414.

[32] Friedman, M. 1937. The use of ranks to avoid the assumption of normality implicit
in the analysis of variance. Journal of the American Statistical Association, 32,
675-701.

[33] Goldberger, A. 1972. Structural equation methods in the social sciences.
Econometrica, 40(6), 979-1001.

[34] Hallin, M. and Liška, R. 2007. Determining the number of factors in the general
dynamic factor model. Journal of the American Statistical Association, 102(478),
603-617.

[35] Hansen, L. P. 1982. Large Sample Properties of Generalized Method of Moments
Estimators. Econometrica, 50, 1029-1054.

[36] Hayakawa, K. 2009. Bias Corrected Estimation of Dynamic Panel Data Models with
Interactive Fixed Effects. Mimeo.

[37] Holtz-Eakin, D., Newey, W. and Rosen, H. 1988. Estimating Vector Autoregressions
with Panel Data. Econometrica, 56, 1371-1395.

[38] Hurlin, C. and Mignon, V. 2004. Second generation panel unit root tests. Mimeo.

[39] Hsiao, C. Analysis of Panel Data. 2nd ed. Cambridge University Press, Cambridge.

[40] Hsiao, C. 2007. Panel Data Analysis - Advantages and Challenges. TEST. Vol. 16,
pp. 1-22.

[41] Hsiao, C., Pesaran, M. H. and Pick, A. 2009. Diagnostic Tests of Cross Section
Independence for Nonlinear Panel Data Models. Mimeo.

[42] Jöreskog, K. G. and Goldberger, A. S. 1975. Estimation of a model with multiple
indicators and multiple causes of a single latent variable. Journal of the American
Statistical Association, 70, 631-639.

[43] Kapetanios, G. An Alternative Method for Determining the Number of Factors
in Factor Models with Large Data Sets. Journal of Business and
Economic Statistics, forthcoming.

[44] Kapetanios, G. and Pesaran, M. H. 2007. Small Sample Properties of Cross Section
Augmented Estimators for Panel Data Models with Residual Multi-factor Structures.
In The Refinement of Econometric Estimation and Test Procedures: Finite Sample
and Asymptotic Analysis, Garry Phillips and Elias Tzavalis (eds.), Cambridge
University Press, Cambridge.

[45] Kapetanios, G., Pesaran, M. H. and Yamagata, T. 2009. Panels with Nonstationary
Multifactor Error Structures. Mimeo.

[46] Kapoor, M., Kelejian, H. and Prucha, I. 2007. Panel Data Models with Spatially
Correlated Error Components. Journal of Econometrics, 140, 97–130.

[47] Kelejian, H. and Prucha, I. 2010. Specification and Estimation of Spatial
Autoregressive Models with Autoregressive and Heteroskedastic Disturbances. Journal of
Econometrics, forthcoming.

[48] Kiviet, J. and Sarafidis, V. 2000. Cross-sectional Correlation in Panel Data
Relationships. Mimeo.

[49] Kontoghiorghes, E. J. and Clarke, M. R. B. 1995. An alternative approach for the
numerical solution of seemingly unrelated regression equations models.
Computational Statistics & Data Analysis, 19(4), 369-377.

[50] Lee, L. F. 2004. Asymptotic Distributions of Quasi-Maximum Likelihood Estimators
for Spatial Autoregressive Models. Econometrica, 72, 1899-1925.

[51] Lee, L. F. 2007. GMM and 2SLS estimation of mixed regressive, spatial
autoregressive models. Journal of Econometrics, 137, 489-514.

[52] Lawley, D.N. and Maxwell A.E. 1971. Factor Analysis as a Statistical Method.
Butterworth, London.

[53] Moon, H. R. and Perron, B. 2004. Efficient Estimation of the SUR Cointegrating
Regression Model and Testing for Purchasing Power Parity. Econometric Reviews,
23, 293-323.

[54] Moon, H. R. and Perron, B. 2006. Seemingly Unrelated Regressions. Mimeo.

[55] Mundlak, Y. 1978. On the pooling of time series and cross section data.
Econometrica, 46, 69-85.

[56] Nauges, C. and Thomas, A. 2003. Consistent estimation of dynamic panel data
models with time-varying individual effects. Annales d’Economie et de Statistique,
70, 53-74.

[57] Neprash, J.A. 1934. Some Problems in the Correlation of Spatially Distributed
Variables. Journal of the American Statistical Association, 29, 167-168.

[58] Newey, W. and West, K. 1987. A Simple, Positive Semi-definite, Heteroskedasticity
and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3), 703-708.

[59] Nickell, S. 1981. Biases in Dynamic Models with Fixed Effects. Econometrica, 49,
1417-1426.

[60] Onatski, A. 2007. A formal statistical test for the number of factors in the
approximate factor models. Mimeo.

[61] Pesaran, M. H. 2004. General diagnostic tests for cross section dependence in
panels. University of Cambridge, Faculty of Economics, Cambridge Working Papers in
Economics No. 0435.

[62] Pesaran, M. H. and Tosetti, E. 2009. Large panels with common factors and spatial
correlations. Mimeo.

[63] Pesaran, M. H., A. Ullah, and Yamagata, T. 2008. A bias-adjusted test of error
cross section dependence. The Econometrics Journal, 11, 105-127.

[64] Phillips, P. and Sul, D. 2003. Dynamic Panel Estimation and Homogeneity Testing
under cross-sectional Dependence. Econometrics Journal 6, 217-259.

[65] Phillips, P. and Sul, D. 2007. Bias in Dynamic Panel Estimation with Fixed Effects,
Incidental Trends and cross-sectional Dependence. Journal of Econometrics, 137,
162-188.

[66] Robertson, D. and Symons, J. 2007. Maximum Likelihood Factor Analysis with
Rank Deficient Sample Covariance Matrices. Journal of Multivariate Analysis,
98(4), 813-828.

[67] Robertson, D., V. Sarafidis, and J. Symons. 2010. IV Estimation of Panels with
Factor Residuals. Mimeo.

[68] Sarafidis, V. 2009. GMM Estimation of Short Dynamic Panel Data Models with
Error Cross-sectional Dependence. Mimeo.

[69] Sarafidis, V. and Robertson, D. 2009. On the Impact of Error Cross-sectional
Dependence in Short Dynamic Panel Estimation. The Econometrics Journal, 12(1),
62-81.

[70] Sarafidis, V., Yamagata, T. and Robertson, D. 2009. A Test of Cross Section
Dependence for a Linear Dynamic Panel Model with Regressors. Journal of
Econometrics, 148(2), 149-161.

[71] Sargan, J.D. 1958. The Estimation of Economic Relationships Using Instrumental
Variables. Econometrica, 26, 393-415.

[72] Stephan, F.F. 1934. Sampling Errors and Interpretations of Social Data Ordered in
Time and Space. Journal of the American Statistical Association, 29, 165-166.

[73] Srivastava, V. K. and Dwivedi, T. D. 1979. Estimation of seemingly unrelated
regression equations - a brief survey. Journal of Econometrics, 10, 15-32.

[74] Srivastava, V. K. and Giles, D. 1987. Seemingly Unrelated Regression Equations
Models. Marcel Dekker, New York.

[75] Tobler, W. 1970. A Computer Movie Simulating Urban Growth in the Detroit
Region. Economic Geography, 46, 234-240.

[76] Yamagata, T. 2008. A Joint Serial Correlation Test for Linear Panel Data Models.
Journal of Econometrics, 146, 135-145.

[77] Wansbeek, T., and Knaap, T. 1999. Estimating a Dynamic Panel Data Model with
Heterogenous Trends. Annales d’Economie et de Statistique, 55-56, 331-349.

[78] Wansbeek, T., and E. Meijer. 2000. Measurement Error and Latent Variables in
Econometrics. Amsterdam, Elsevier.

[79] Wansbeek, T., and E. Meijer. 2007. Comments on: Panel Data Analysis - Advantages
and Challenges. TEST. Vol. 16, pp. 33-36.

[80] Zellner, A. 1962. An Efficient Method of Estimating Seemingly Unrelated
Regressions and Tests for Aggregation Bias. Journal of the American Statistical
Association, 57, 348-368.
